Elementary Linear Algebra, Applications Version [11 ed.]
 1118474228, 9781118474228, 9781118434413

Table of contents :
Cover......Page 1
Title Page......Page 5
Copyright Page......Page 6
Dedication......Page 7
Preface......Page 8
CONTENTS......Page 12
CHAPTER 1 Systems of Linear Equations and Matrices......Page 15
1.1 Introduction to Systems of Linear Equations......Page 16
1.2 Gaussian Elimination......Page 25
1.3 Matrices and Matrix Operations......Page 39
1.4 Inverses; Algebraic Properties of Matrices......Page 53
1.5 Elementary Matrices and a Method for Finding A^{-1}......Page 66
1.6 More on Linear Systems and Invertible Matrices......Page 75
1.7 Diagonal, Triangular, and Symmetric Matrices......Page 81
1.8 Matrix Transformations......Page 89
Network Analysis (Traffic Flow)......Page 98
Electrical Circuits......Page 100
Balancing Chemical Equations......Page 102
Polynomial Interpolation......Page 105
1.10 Application: Leontief Input-Output Models......Page 110
2.1 Determinants by Cofactor Expansion......Page 119
2.2 Evaluating Determinants by Row Reduction......Page 127
2.3 Properties of Determinants; Cramer’s Rule......Page 132
3.1 Vectors in 2-Space, 3-Space, and n-Space......Page 145
3.2 Norm, Dot Product, and Distance in R^n......Page 156
3.3 Orthogonality......Page 169
3.4 The Geometry of Linear Systems......Page 178
3.5 Cross Product......Page 186
4.1 Real Vector Spaces......Page 197
4.2 Subspaces......Page 205
4.3 Linear Independence......Page 216
4.4 Coordinates and Basis......Page 226
4.5 Dimension......Page 235
4.6 Change of Basis......Page 243
4.7 Row Space, Column Space, and Null Space......Page 251
4.8 Rank, Nullity, and the Fundamental Matrix Spaces......Page 262
4.9 Basic Matrix Transformations in R^2 and R^3......Page 273
4.10 Properties of Matrix Transformations......Page 284
4.11 Application: Geometry of Matrix Operators on R^2......Page 294
5.1 Eigenvalues and Eigenvectors......Page 305
5.2 Diagonalization......Page 316
5.3 Complex Vector Spaces......Page 327
5.4 Application: Differential Equations......Page 340
5.5 Application: Dynamical Systems and Markov Chains......Page 346
6.1 Inner Products......Page 359
6.2 Angle and Orthogonality in Inner Product Spaces......Page 369
6.3 Gram–Schmidt Process; QR-Decomposition......Page 378
6.4 Best Approximation; Least Squares......Page 392
6.5 Application: Mathematical Modeling Using Least Squares......Page 401
6.6 Application: Function Approximation; Fourier Series......Page 408
7.1 Orthogonal Matrices......Page 415
7.2 Orthogonal Diagonalization......Page 423
7.3 Quadratic Forms......Page 431
7.4 Optimization Using Quadratic Forms......Page 443
7.5 Hermitian, Unitary, and Normal Matrices......Page 451
8.1 General Linear Transformations......Page 461
8.2 Compositions and Inverse Transformations......Page 472
8.3 Isomorphism......Page 480
8.4 Matrices for General Linear Transformations......Page 486
8.5 Similarity......Page 495
9.1 LU-Decompositions......Page 505
9.2 The Power Method......Page 515
9.3 Comparison of Procedures for Solving Linear Systems......Page 523
9.4 Singular Value Decomposition......Page 528
9.5 Application: Data Compression Using Singular Value Decomposition......Page 535
CHAPTER 10 Applications of Linear Algebra......Page 541
10.1 Constructing Curves and Surfaces Through Specified Points......Page 542
10.2 The Earliest Applications of Linear Algebra......Page 547
10.3 Cubic Spline Interpolation......Page 554
10.4 Markov Chains......Page 565
10.5 Graph Theory......Page 575
10.6 Games of Strategy......Page 584
10.7 Leontief Economic Models......Page 593
10.8 Forest Management......Page 602
10.9 Computer Graphics......Page 609
10.10 Equilibrium Temperature Distributions......Page 617
10.11 Computed Tomography......Page 627
10.12 Fractals......Page 638
10.13 Chaos......Page 653
10.14 Cryptography......Page 666
10.15 Genetics......Page 677
10.16 Age-Specific Population Growth......Page 687
10.17 Harvesting of Animal Populations......Page 697
10.18 A Least Squares Model for Human Hearing......Page 705
10.19 Warps and Morphs......Page 711
10.20 Internet Search Engines......Page 720
APPENDIX A Working with Proofs......Page 729
APPENDIX B Complex Numbers......Page 733
Answers to Exercises......Page 741
Index......Page 787
Index of Applications and Historical Topics......Page 801



11TH EDITION

Elementary Linear Algebra
Applications Version

HOWARD ANTON
Professor Emeritus, Drexel University

CHRIS RORRES
University of Pennsylvania

VICE PRESIDENT AND PUBLISHER      Laurie Rosatone
SENIOR ACQUISITIONS EDITOR        David Dietz
ASSOCIATE CONTENT EDITOR          Jacqueline Sinacori
FREELANCE DEVELOPMENT EDITOR      Anne Scanlan-Rohrer
MARKETING MANAGER                 Melanie Kurkjian
EDITORIAL ASSISTANT               Michael O’Neal
SENIOR PRODUCT DESIGNER           Thomas Kulesa
SENIOR PRODUCTION EDITOR          Ken Santor
SENIOR CONTENT MANAGER            Karoline Luciano
OPERATIONS MANAGER                Melissa Edwards
SENIOR DESIGNER                   Maddy Lesure
MEDIA SPECIALIST                  Laura Abrams
PHOTO RESEARCH EDITOR             Felicia Ruocco
COPY EDITOR                       Lilian Brady
PRODUCTION SERVICES               Carol Sawyer/The Perfect Proof
COVER ART                         Norm Christiansen

This book was set in Times New Roman STD by Techsetters, Inc. and printed and bound by Quad Graphics/Versailles. The cover was printed by Quad Graphics/Versailles. This book is printed on acid-free paper. Copyright 2014, 2010, 2005, 2000, 1994, 1991, 1987, 1984, 1981, 1977, 1973 by Anton Textbooks, Inc. All rights reserved. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011, fax (201) 748-6008, website www.wiley.com/go/permissions. Best efforts have been made to determine whether the images of mathematicians shown in the text are in the public domain or properly licensed. If you believe that an error has been made, please contact the Permissions Department. Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in their courses during the next academic year. These copies are licensed and may not be sold or transferred to a third party. Upon completion of the review period, please return the evaluation copy to Wiley. Return instructions and a free of charge return shipping label are available at www.wiley.com/go/returnlabel. Outside of the United States, please contact your local representative. Library of Congress Cataloging-in-Publication Data Anton, Howard, author. Elementary linear algebra : applications version / Howard Anton, Chris Rorres. 
-- 11th edition. pages cm Includes index. ISBN 978-1-118-43441-3 (cloth) 1. Algebras, Linear--Textbooks. I. Rorres, Chris, author. II. Title. QA184.2.A58 2013 512'.5--dc23 2013033542 ISBN 978-1-118-43441-3 ISBN Binder-Ready Version 978-1-118-47422-8 Printed in the United States of America 10 9 8 7 6 5 4 3 2 1

ABOUT THE AUTHOR

Howard Anton obtained his B.A. from Lehigh University, his M.A. from the University of Illinois, and his Ph.D. from the Polytechnic University of Brooklyn, all in mathematics. In the early 1960s he worked for Burroughs Corporation and Avco Corporation at Cape Canaveral, Florida, where he was involved with the manned space program. In 1968 he joined the Mathematics Department at Drexel University, where he taught full time until 1983. Since then he has devoted the majority of his time to textbook writing and activities for mathematical associations. Dr. Anton was president of the EPADEL Section of the Mathematical Association of America (MAA), served on the Board of Governors of that organization, and guided the creation of the Student Chapters of the MAA. In addition to various pedagogical articles, he has published numerous research papers in functional analysis, approximation theory, and topology. He is best known for his textbooks in mathematics, which are among the most widely used in the world. There are currently more than 175 versions of his books, including translations into Spanish, Arabic, Portuguese, Italian, Indonesian, French, Japanese, Chinese, Hebrew, and German. For relaxation, Dr. Anton enjoys travel and photography.

Chris Rorres earned his B.S. degree from Drexel University and his Ph.D. from the Courant Institute of New York University. He was a faculty member of the Department of Mathematics at Drexel University for more than 30 years where, in addition to teaching, he did applied research in solar engineering, acoustic scattering, population dynamics, computer system reliability, geometry of archaeological sites, optimal animal harvesting policies, and decision theory. He retired from Drexel in 2001 as a Professor Emeritus of Mathematics and is now a mathematical consultant. He also has a research position at the School of Veterinary Medicine at the University of Pennsylvania where he does mathematical modeling of animal epidemics. Dr. Rorres is a recognized expert on the life and work of Archimedes and has appeared in various television documentaries on that subject. His highly acclaimed website on Archimedes (http://www.math.nyu.edu/~crorres/Archimedes/contents.html) is a virtual book that has become an important teaching tool in mathematical history for students around the world.

To:
My wife, Pat
My children, Brian, David, and Lauren
My parents, Shirley and Benjamin
My benefactor, Stephen Girard (1750–1831), whose philanthropy changed my life

Howard Anton

To:
Billie

Chris Rorres

PREFACE

This textbook is an expanded version of Elementary Linear Algebra, eleventh edition, by Howard Anton. The first nine chapters of this book are identical to the first nine chapters of that text; the tenth chapter consists of twenty applications of linear algebra drawn from business, economics, engineering, physics, computer science, approximation theory, ecology, demography, and genetics. The applications are largely independent of each other, and each includes a list of mathematical prerequisites. Thus, each instructor has the flexibility to choose those applications that are suitable for his or her students and to incorporate each application anywhere in the course after the mathematical prerequisites have been satisfied. Chapters 1–9 include simpler treatments of some of the applications covered in more depth in Chapter 10.

This edition gives an introductory treatment of linear algebra that is suitable for a first undergraduate course. Its aim is to present the fundamentals of linear algebra in the clearest possible way; sound pedagogy is the main consideration. Although calculus is not a prerequisite, there is some optional material that is clearly marked for students with a calculus background. If desired, that material can be omitted without loss of continuity.

Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica, Maple, or calculators with linear algebra capabilities, we have posted some supporting material that can be accessed at either of the following companion websites:

www.howardanton.com
www.wiley.com/college/anton

Summary of Changes in This Edition

Many parts of the text have been revised based on an extensive set of reviews. Here are the primary changes:

• Earlier Linear Transformations: Linear transformations are introduced earlier (starting in Section 1.8). Many exercise sets, as well as parts of Chapters 4 and 8, have been revised in keeping with the earlier introduction of linear transformations.
• New Exercises: Hundreds of new exercises of all types have been added throughout the text.
• Technology: Exercises requiring technology such as MATLAB, Mathematica, or Maple have been added and supporting data sets have been posted on the companion websites for this text. The use of technology is not essential, and these exercises can be omitted without affecting the flow of the text.
• Exercise Sets Reorganized: Many multiple-part exercises have been subdivided to create a better balance between odd and even exercise types. To simplify the instructor’s task of creating assignments, exercise sets have been arranged in clearly defined categories.
• Reorganization: In addition to the earlier introduction of linear transformations, the old Section 4.12 on Dynamical Systems and Markov Chains has been moved to Chapter 5 in order to incorporate material on eigenvalues and eigenvectors.
• Rewriting: Section 9.3 on Internet Search Engines from the previous edition has been rewritten to reflect more accurately how the Google PageRank algorithm works in practice. That section is now Section 10.20 of the applications version of this text.
• Appendix A Rewritten: The appendix on reading and writing proofs has been expanded and revised to better support courses that focus on proving theorems.
• Web Materials: Supplementary web materials now include various applications modules, three modules on linear programming, and an alternative presentation of determinants based on permutations.
• Applications Chapter: Section 10.2 of the previous edition has been moved to the websites that accompany this text, so it is now part of a three-module set on Linear Programming. A new section on Internet search engines has been added that explains the PageRank algorithm used by Google.

Hallmark Features

• Relationships Among Concepts: One of our main pedagogical goals is to convey to the student that linear algebra is a cohesive subject and not simply a collection of isolated definitions and techniques. One way in which we do this is by using a crescendo of Equivalent Statements theorems that continually revisit relationships among systems of equations, matrices, determinants, vectors, linear transformations, and eigenvalues. To get a general sense of how we use this technique see Theorems 1.5.3, 1.6.4, 2.3.8, 4.8.8, and then Theorem 5.1.5, for example.
• Smooth Transition to Abstraction: Because the transition from R^n to general vector spaces is difficult for many students, considerable effort is devoted to explaining the purpose of abstraction and helping the student to “visualize” abstract ideas by drawing analogies to familiar geometric ideas.
• Mathematical Precision: When reasonable, we try to be mathematically precise. In keeping with the level of student audience, proofs are presented in a patient style that is tailored for beginners.
• Suitability for a Diverse Audience: This text is designed to serve the needs of students in engineering, computer science, biology, physics, business, and economics as well as those majoring in mathematics.
• Historical Notes: To give the students a sense of mathematical history and to convey that real people created the mathematical theorems and equations they are studying, we have included numerous Historical Notes that put the topic being studied in historical perspective.

About the Exercises

• Graded Exercise Sets: Each exercise set in the first nine chapters begins with routine drill problems and progresses to problems with more substance. These are followed by three categories of exercises, the first focusing on proofs, the second on true/false exercises, and the third on problems requiring technology. This compartmentalization is designed to simplify the instructor’s task of selecting exercises for homework.
• Proof Exercises: Linear algebra courses vary widely in their emphasis on proofs, so exercises involving proofs have been grouped and compartmentalized for easy identification. Appendix A has been rewritten to provide students more guidance on proving theorems.
• True/False Exercises: The True/False exercises are designed to check conceptual understanding and logical reasoning. To avoid pure guesswork, the students are required to justify their responses in some way.
• Technology Exercises: Exercises that require technology have also been grouped. To avoid burdening the student with keyboarding, the relevant data files have been posted on the websites that accompany this text.
• Supplementary Exercises: Each of the first nine chapters ends with a set of supplementary exercises that draw on all topics in the chapter. These tend to be more challenging.

Supplementary Materials for Students

• Student Solutions Manual: This supplement provides detailed solutions to most odd-numbered exercises (ISBN 978-1-118-464427).
• Data Files: Data files for the technology exercises are posted on the companion websites that accompany this text.
• MATLAB Manual and Linear Algebra Labs: This supplement contains a set of MATLAB laboratory projects written by Dan Seth of West Texas A&M University. It is designed to help students learn key linear algebra concepts by using MATLAB and is available in PDF form without charge to students at schools adopting the 11th edition of the text.
• Videos: A complete set of Daniel Solow’s How to Read and Do Proofs videos is available to students through WileyPLUS as well as the companion websites that accompany this text. Those materials include a guide to help students locate the lecture videos appropriate for specific proofs in the text.

Supplementary Materials for Instructors

• Instructor’s Solutions Manual: This supplement provides worked-out solutions to most exercises in the text (ISBN 978-1-118-434482).
• PowerPoint Presentations: PowerPoint slides are provided that display important definitions, examples, graphics, and theorems in the book. These can also be distributed to students as review materials or to simplify note taking.
• Test Bank: Test questions and sample exams are available in PDF or LaTeX form.
• WileyPLUS: An online environment for effective teaching and learning. WileyPLUS builds student confidence by taking the guesswork out of studying and by providing a clear roadmap of what to do, how to do it, and whether it was done right. Its purpose is to motivate and foster initiative so instructors can have a greater impact on classroom achievement and beyond.

A Guide for the Instructor

Although linear algebra courses vary widely in content and philosophy, most courses fall into two categories: those with about 40 lectures and those with about 30 lectures. Accordingly, we have created long and short templates as possible starting points for constructing a course outline. Of course, these are just guides, and you will certainly want to customize them to fit your local interests and requirements. Neither of these sample templates includes applications or the numerical methods in Chapter 9. Those can be added, if desired, and as time permits.

                                                       Long Template   Short Template
Chapter 1: Systems of Linear Equations and Matrices    8 lectures      6 lectures
Chapter 2: Determinants                                3 lectures      2 lectures
Chapter 3: Euclidean Vector Spaces                     4 lectures      3 lectures
Chapter 4: General Vector Spaces                       10 lectures     9 lectures
Chapter 5: Eigenvalues and Eigenvectors                3 lectures      3 lectures
Chapter 6: Inner Product Spaces                        3 lectures      1 lecture
Chapter 7: Diagonalization and Quadratic Forms         4 lectures      3 lectures
Chapter 8: General Linear Transformations              4 lectures      3 lectures
Total:                                                 39 lectures     30 lectures

Reviewers

The following people reviewed the plans for this edition, critiqued much of the content, and provided me with insightful pedagogical advice:

John Alongi, Northwestern University
Jiu Ding, University of Southern Mississippi
Eugene Don, City University of New York at Queens
John Gilbert, University of Texas Austin
Danrun Huang, St. Cloud State University
Craig Jensen, University of New Orleans
Steve Kahan, City University of New York at Queens
Harihar Khanal, Embry-Riddle Aeronautical University
Firooz Khosraviyani, Texas A&M International University
Y. George Lai, Wilfrid Laurier University
Kouok Law, Georgia Perimeter College
Mark MacLean, Seattle University
Vasileios Maroulas, University of Tennessee, Knoxville
Daniel Reynolds, Southern Methodist University
Qin Sheng, Baylor University
Laura Smithies, Kent State University
Larry Susanka, Bellevue College
Cristina Tone, University of Louisville
Yvonne Yaz, Milwaukee School of Engineering
Ruhan Zhao, State University of New York at Brockport

Exercise Contributions

Special thanks are due to four talented people who worked on various aspects of the exercises:

Przemyslaw Bogacki, Old Dominion University – who solved the exercises and created the solutions manuals.
Roger Lipsett, Brandeis University – who proofread the manuscript and exercise solutions for mathematical accuracy.
Daniel Solow, Case Western Reserve University – author of “How to Read and Do Proofs,” for providing videos on techniques of proof and a key to using those videos in coordination with this text.
Sky Pelletier Waterpeace – who critiqued the technology exercises, suggested improvements, and provided the data sets.

Special Contributions

I would also like to express my deep appreciation to the following people with whom I worked on a daily basis:

Anton Kaul – who worked closely with me at every stage of the project and helped to write some new text material and exercises. On the many occasions that I needed mathematical or pedagogical advice, he was the person I turned to. I cannot thank him enough for his guidance and the many contributions he has made to this edition.
David Dietz – my editor, for his patience, sound judgment, and dedication to producing a quality book.
Anne Scanlan-Rohrer – of Two Ravens Editorial, who coordinated the entire project and brought all of the pieces together.
Jacqueline Sinacori – who managed many aspects of the content and was always there to answer my often obscure questions.
Carol Sawyer – of The Perfect Proof, who managed the myriad of details in the production process and helped with proofreading.
Maddy Lesure – with whom I have worked for many years and whose elegant sense of design is apparent in the pages of this book.
Lilian Brady – my copy editor for almost 25 years. I feel fortunate to have been the beneficiary of her remarkable knowledge of typography, style, grammar, and mathematics.
Pat Anton – of Anton Textbooks, Inc., who helped with the mundane chores of duplicating, shipping, accuracy checking, and tasks too numerous to mention.
John Rogosich – of Techsetters, Inc., who programmed the design, managed the composition, and resolved many difficult technical issues.
Brian Haughwout – of Techsetters, Inc., for his careful and accurate work on the illustrations.
Josh Elkan – for providing valuable assistance in accuracy checking.

Howard Anton Chris Rorres

CONTENTS

CHAPTER 1  Systems of Linear Equations and Matrices  1
1.1 Introduction to Systems of Linear Equations  2
1.2 Gaussian Elimination  11
1.3 Matrices and Matrix Operations  25
1.4 Inverses; Algebraic Properties of Matrices  39
1.5 Elementary Matrices and a Method for Finding A^{-1}  52
1.6 More on Linear Systems and Invertible Matrices  61
1.7 Diagonal, Triangular, and Symmetric Matrices  67
1.8 Matrix Transformations  75
1.9 Applications of Linear Systems  84
    • Network Analysis (Traffic Flow)  84
    • Electrical Circuits  86
    • Balancing Chemical Equations  88
    • Polynomial Interpolation  91
1.10 Application: Leontief Input-Output Models  96

CHAPTER 2  Determinants  105
2.1 Determinants by Cofactor Expansion  105
2.2 Evaluating Determinants by Row Reduction  113
2.3 Properties of Determinants; Cramer’s Rule  118

CHAPTER 3  Euclidean Vector Spaces  131
3.1 Vectors in 2-Space, 3-Space, and n-Space  131
3.2 Norm, Dot Product, and Distance in R^n  142
3.3 Orthogonality  155
3.4 The Geometry of Linear Systems  164
3.5 Cross Product  172

CHAPTER 4  General Vector Spaces  183
4.1 Real Vector Spaces  183
4.2 Subspaces  191
4.3 Linear Independence  202
4.4 Coordinates and Basis  212
4.5 Dimension  221
4.6 Change of Basis  229
4.7 Row Space, Column Space, and Null Space  237
4.8 Rank, Nullity, and the Fundamental Matrix Spaces  248
4.9 Basic Matrix Transformations in R^2 and R^3  259
4.10 Properties of Matrix Transformations  270
4.11 Application: Geometry of Matrix Operators on R^2  280

CHAPTER 5  Eigenvalues and Eigenvectors  291
5.1 Eigenvalues and Eigenvectors  291
5.2 Diagonalization  302
5.3 Complex Vector Spaces  313
5.4 Application: Differential Equations  326
5.5 Application: Dynamical Systems and Markov Chains  332

CHAPTER 6  Inner Product Spaces  345
6.1 Inner Products  345
6.2 Angle and Orthogonality in Inner Product Spaces  355
6.3 Gram–Schmidt Process; QR-Decomposition  364
6.4 Best Approximation; Least Squares  378
6.5 Application: Mathematical Modeling Using Least Squares  387
6.6 Application: Function Approximation; Fourier Series  394

CHAPTER 7  Diagonalization and Quadratic Forms  401
7.1 Orthogonal Matrices  401
7.2 Orthogonal Diagonalization  409
7.3 Quadratic Forms  417
7.4 Optimization Using Quadratic Forms  429
7.5 Hermitian, Unitary, and Normal Matrices  437

CHAPTER 8  General Linear Transformations  447
8.1 General Linear Transformations  447
8.2 Compositions and Inverse Transformations  458
8.3 Isomorphism  466
8.4 Matrices for General Linear Transformations  472
8.5 Similarity  481

CHAPTER 9  Numerical Methods  491
9.1 LU-Decompositions  491
9.2 The Power Method  501
9.3 Comparison of Procedures for Solving Linear Systems  509
9.4 Singular Value Decomposition  514
9.5 Application: Data Compression Using Singular Value Decomposition  521

CHAPTER 10  Applications of Linear Algebra  527
10.1 Constructing Curves and Surfaces Through Specified Points  528
10.2 The Earliest Applications of Linear Algebra  533
10.3 Cubic Spline Interpolation  540
10.4 Markov Chains  551
10.5 Graph Theory  561
10.6 Games of Strategy  570
10.7 Leontief Economic Models  579
10.8 Forest Management  588
10.9 Computer Graphics  595
10.10 Equilibrium Temperature Distributions  603
10.11 Computed Tomography  613
10.12 Fractals  624
10.13 Chaos  639
10.14 Cryptography  652
10.15 Genetics  663
10.16 Age-Specific Population Growth  673
10.17 Harvesting of Animal Populations  683
10.18 A Least Squares Model for Human Hearing  691
10.19 Warps and Morphs  697
10.20 Internet Search Engines  706

APPENDIX A  Working with Proofs  A1
APPENDIX B  Complex Numbers  A5
Answers to Exercises  A13
Index  I1

CHAPTER 1

Systems of Linear Equations and Matrices

CHAPTER CONTENTS

1.1 Introduction to Systems of Linear Equations  2
1.2 Gaussian Elimination  11
1.3 Matrices and Matrix Operations  25
1.4 Inverses; Algebraic Properties of Matrices  39
1.5 Elementary Matrices and a Method for Finding A^{-1}  52
1.6 More on Linear Systems and Invertible Matrices  61
1.7 Diagonal, Triangular, and Symmetric Matrices  67
1.8 Matrix Transformations  75
1.9 Applications of Linear Systems  84
    • Network Analysis (Traffic Flow)  84
    • Electrical Circuits  86
    • Balancing Chemical Equations  88
    • Polynomial Interpolation  91
1.10 Leontief Input-Output Models  96

INTRODUCTION

Information in science, business, and mathematics is often organized into rows and columns to form rectangular arrays called “matrices” (plural of “matrix”). Matrices often appear as tables of numerical data that arise from physical observations, but they occur in various mathematical contexts as well. For example, we will see in this chapter that all of the information required to solve a system of equations such as

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\]

is embodied in the matrix

\[
\begin{bmatrix}
5 & 1 & 3 \\
2 & -1 & 4
\end{bmatrix}
\]

and that the solution of the system can be obtained by performing appropriate operations on this matrix. This is particularly important in developing computer programs for solving systems of equations because computers are well suited for manipulating arrays of numerical information. However, matrices are not simply a notational tool for solving systems of equations; they can be viewed as mathematical objects in their own right, and there is a rich and important theory associated with them that has a multitude of practical applications. It is the study of matrices and related topics that forms the mathematical field that we call “linear algebra.” In this chapter we will begin our study of matrices.
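As a rough illustration of "performing appropriate operations on this matrix," the sketch below row-reduces the array for the system 5x + y = 3, 2x − y = 4 until the solution can be read from the last column. The function name `solve_augmented` is a hypothetical helper, not something from the text, and the procedure shown is ordinary Gauss-Jordan elimination, which the book develops later; it assumes a square system with exactly one solution.

```python
from fractions import Fraction

def solve_augmented(rows):
    """Row-reduce an augmented matrix [A | b] and return the solution.

    Hypothetical helper: assumes a square system with exactly one solution.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Subtract multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # The last column now holds the values of the unknowns.
    return [row[-1] for row in m]

augmented = [[5, 1, 3],
             [2, -1, 4]]
solution = solve_augmented(augmented)
print(solution)  # [Fraction(1, 1), Fraction(-2, 1)], i.e. x = 1, y = -2
```

Exact fractions are used instead of floating-point numbers so that the arithmetic matches a hand computation.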


1.1 Introduction to Systems of Linear Equations

Systems of linear equations and their solutions constitute one of the major topics that we will study in this course. In this first section we will introduce some basic terminology and discuss a method for solving such systems.

Linear Equations

Recall that in two dimensions a line in a rectangular $xy$-coordinate system can be represented by an equation of the form

\[
ax + by = c \quad (a, b \text{ not both } 0)
\]

and in three dimensions a plane in a rectangular $xyz$-coordinate system can be represented by an equation of the form

\[
ax + by + cz = d \quad (a, b, c \text{ not all } 0)
\]

These are examples of “linear equations,” the first being a linear equation in the variables $x$ and $y$ and the second a linear equation in the variables $x$, $y$, and $z$. More generally, we define a linear equation in the $n$ variables $x_1, x_2, \ldots, x_n$ to be one that can be expressed in the form

\[
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \tag{1}
\]

where $a_1, a_2, \ldots, a_n$ and $b$ are constants, and the $a$’s are not all zero. In the special cases where $n = 2$ or $n = 3$, we will often use variables without subscripts and write linear equations as

\[
a_1 x + a_2 y = b \quad (a_1, a_2 \text{ not both } 0) \tag{2}
\]

\[
a_1 x + a_2 y + a_3 z = b \quad (a_1, a_2, a_3 \text{ not all } 0) \tag{3}
\]

In the special case where $b = 0$, Equation (1) has the form

\[
a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0 \tag{4}
\]

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$.

EXAMPLE 1 Linear Equations

Observe that a linear equation does not involve any products or roots of variables. All variables occur only to the first power and do not appear, for example, as arguments of trigonometric, logarithmic, or exponential functions. The following are linear equations:

\[
\begin{array}{ll}
x + 3y = 7 & x_1 - 2x_2 - 3x_3 + x_4 = 0 \\
\frac{1}{2}x - y + 3z = -1 & x_1 + x_2 + \cdots + x_n = 1
\end{array}
\]

The following are not linear equations:

\[
\begin{array}{ll}
x + 3y^2 = 4 & 3x + 2y - xy = 5 \\
\sin x + y = 0 & \sqrt{x_1} + 2x_2 + x_3 = 1
\end{array}
\]
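One informal way to see the difference between the two lists in Example 1 is to test whether the left side of an equation respects addition and scaling of its inputs, which expressions of the form a1 x1 + ... + an xn always do. The sketch below is only a spot check on a few sample points: a failure shows that an expression is not linear, but passing the samples does not prove linearity. The names `looks_linear`, `lhs_linear`, `lhs_square`, and `lhs_product` are hypothetical.

```python
def looks_linear(f, samples):
    """Spot-check f(u + v) == f(u) + f(v) and f(c*u) == c*f(u) on sample inputs."""
    for u, v, c in samples:
        total = tuple(a + b for a, b in zip(u, v))
        scaled = tuple(c * a for a in u)
        if f(*total) != f(*u) + f(*v):
            return False
        if f(*scaled) != c * f(*u):
            return False
    return True

def lhs_linear(x, y):
    return x + 3 * y               # left side of x + 3y = 7

def lhs_square(x, y):
    return x + 3 * y ** 2          # left side of x + 3y^2 = 4

def lhs_product(x, y):
    return 3 * x + 2 * y - x * y   # left side of 3x + 2y - xy = 5

# Each sample is (u, v, c): two input points and a scalar.
samples = [((1, 2), (3, -1), 2), ((0, 5), (-4, 7), -3), ((2, 2), (1, 1), 5)]

print(looks_linear(lhs_linear, samples))   # True
print(looks_linear(lhs_square, samples))   # False
print(looks_linear(lhs_product, samples))  # False
```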

A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The variables are called unknowns. For example, system (5) that follows has unknowns $x$ and $y$, and system (6) has unknowns $x_1$, $x_2$, and $x_3$.

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\tag{5}
\]

\[
\begin{aligned}
4x_1 - x_2 + 3x_3 &= -1 \\
3x_1 + x_2 + 9x_3 &= -4
\end{aligned}
\tag{6}
\]

1.1 Introduction to Systems of Linear Equations

The double subscripting on the coefficients aij of the unknowns gives their location in the system—the first subscript indicates the equation in which the coefficient occurs, and the second indicates which unknown it multiplies. Thus, a12 is in the first equation and multiplies x2 .

3

A general linear system of m equations in the n unknowns x1 , x2 , . . . , xn can be written as

a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. .. .. .. . . . . am1 x1 + am2 x2 + · · · + amn xn = bm

(7)

A solution of a linear system in n unknowns x1, x2, . . . , xn is a sequence of n numbers s1, s2, . . . , sn for which the substitution

x1 = s1, x2 = s2, . . . , xn = sn

makes each equation a true statement. For example, the system in (5) has the solution

x = 1, y = −2

and the system in (6) has the solution

x1 = 1, x2 = 2, x3 = −1

These solutions can be written more succinctly as

(1, −2) and (1, 2, −1)

in which the names of the variables are omitted. This notation allows us to interpret these solutions geometrically as points in two-dimensional and three-dimensional space. More generally, a solution

x1 = s1, x2 = s2, . . . , xn = sn

of a linear system in n unknowns can be written as

(s1, s2, . . . , sn)

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order in each equation. If n = 2, then the n-tuple is called an ordered pair, and if n = 3, then it is called an ordered triple.
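The definition of a solution can be checked mechanically by substitution. The following is a minimal Python sketch, not part of the original text; the function name is_solution and the list-of-rows representation are illustrative assumptions.

```python
def is_solution(coefficients, constants, candidate):
    """Return True if the n-tuple `candidate` satisfies every equation.

    coefficients[i][j] plays the role of a_ij and constants[i] the role of b_i.
    """
    for row, b in zip(coefficients, constants):
        lhs = sum(a * s for a, s in zip(row, candidate))
        if lhs != b:
            return False
    return True

# System (5): 5x + y = 3, 2x - y = 4
system5 = ([[5, 1], [2, -1]], [3, 4])
# System (6): 4x1 - x2 + 3x3 = -1, 3x1 + x2 + 9x3 = -4
system6 = ([[4, -1, 3], [3, 1, 9]], [-1, -4])

print(is_solution(*system5, (1, -2)))     # True
print(is_solution(*system6, (1, 2, -1)))  # True
print(is_solution(*system5, (1, 2)))      # False, since 5(1) + 2 is not 3
```

Exact integer arithmetic is used here; with floating-point data an equality test like this would need a tolerance.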

Linear Systems in Two and Three Unknowns

Linear systems in two unknowns arise in connection with intersections of lines. For example, consider the linear system

a1 x + b1 y = c1
a2 x + b2 y = c2

in which the graphs of the equations are lines in the xy-plane. Each solution (x, y) of this system corresponds to a point of intersection of the lines, so there are three possibilities (Figure 1.1.1):

1. The lines may be parallel and distinct, in which case there is no intersection and consequently no solution.
2. The lines may intersect at only one point, in which case the system has exactly one solution.
3. The lines may coincide, in which case there are infinitely many points of intersection (the points on the common line) and consequently infinitely many solutions.

[Figure 1.1.1: panels labeled “No solution,” “One solution,” and “Infinitely many solutions (coincident lines)”; axes x and y.]

In general, we say that a linear system is consistent if it has at least one solution and inconsistent if it has no solutions. Thus, a consistent linear system of two equations in two unknowns has either one solution or infinitely many solutions; there are no other possibilities. The same is true for a linear system of three equations in three unknowns

a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2
a3 x + b3 y + c3 z = d3

in which the graphs of the equations are planes. The solutions of the system, if any, correspond to points where all three planes intersect, so again we see that there are only three possibilities: no solutions, one solution, or infinitely many solutions (Figure 1.1.2).

[Figure 1.1.2: eight panels showing configurations of three planes, with the following captions.]
No solutions (three parallel planes; no common intersection)
No solutions (two parallel planes; no common intersection)
No solutions (no common intersection)
No solutions (two coincident planes parallel to the third; no common intersection)
One solution (intersection is a point)
Infinitely many solutions (intersection is a line)
Infinitely many solutions (planes are all coincident; intersection is a plane)
Infinitely many solutions (two coincident planes; intersection is a line)

We will prove later that our observations about the number of solutions of linear systems of two equations in two unknowns and linear systems of three equations in three unknowns actually hold for all linear systems. That is: Every system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities.


EXAMPLE 2 A Linear System with One Solution

Solve the linear system

x − y = 1
2x + y = 6

Solution We can eliminate x from the second equation by adding −2 times the first equation to the second. This yields the simplified system

x − y = 1
3y = 4

From the second equation we obtain y = 4/3, and on substituting this value in the first equation we obtain x = 1 + y = 7/3. Thus, the system has the unique solution

x = 7/3, y = 4/3

Geometrically, this means that the lines represented by the equations in the system intersect at the single point (7/3, 4/3). We leave it for you to check this by graphing the lines.

EXAMPLE 3 A Linear System with No Solutions

Solve the linear system

x + y = 4
3x + 3y = 6

Solution We can eliminate x from the second equation by adding −3 times the first equation to the second equation. This yields the simplified system

x + y = 4
0 = −6

The second equation is contradictory, so the given system has no solution. Geometrically, this means that the lines corresponding to the equations in the original system are parallel and distinct. We leave it for you to check this by graphing the lines or by showing that they have the same slope but different y-intercepts.

EXAMPLE 4 A Linear System with Infinitely Many Solutions

Solve the linear system

4x − 2y = 1
16x − 8y = 4

Solution We can eliminate x from the second equation by adding −4 times the first equation to the second. This yields the simplified system

4x − 2y = 1
0 = 0

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus, the solutions of the system are those values of x and y that satisfy the single equation

4x − 2y = 1 (8)

Geometrically, this means the lines corresponding to the two equations in the original system coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain x = 1/4 + (1/2)y and then assign an arbitrary value t (called a parameter) to y. This allows us to express the solution by the pair of equations (called parametric equations)

x = 1/4 + (1/2)t, y = t

We can obtain specific numerical solutions from these equations by substituting numerical values for the parameter t. For example, t = 0 yields the solution (1/4, 0), t = 1 yields the solution (3/4, 1), and t = −1 yields the solution (−1/4, −1). You can confirm that these are solutions by substituting their coordinates into the given equations.

In Example 4 we could have also obtained parametric equations for the solutions by solving (8) for y in terms of x and letting x = t be the parameter. The resulting parametric equations would look different but would define the same solution set.
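The three outcomes seen in Examples 2, 3, and 4 can be told apart with a short computation. The sketch below is an illustration only: it uses a determinant test rather than the elimination carried out in the examples, it assumes each equation really describes a line (the coefficients of x and y are not both zero), and the name classify_2x2 is hypothetical.

```python
from fractions import Fraction

def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes (a1, b1) and (a2, b2) are each not both zero.
    """
    det = a1 * b2 - a2 * b1
    if det != 0:
        # The lines cross at exactly one point.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        return ("one solution", (x, y))
    # det == 0: the lines are parallel; they coincide exactly when the
    # constants are in the same proportion as the coefficients.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return ("infinitely many solutions", None)
    return ("no solution", None)

print(classify_2x2(1, -1, 1, 2, 1, 6))    # Example 2: one solution, (7/3, 4/3)
print(classify_2x2(1, 1, 4, 3, 3, 6))     # Example 3: no solution
print(classify_2x2(4, -2, 1, 16, -8, 4))  # Example 4: infinitely many solutions
```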

EXAMPLE 5 A Linear System with Infinitely Many Solutions

Solve the linear system

x − y + 2z = 5
2x − 2y + 4z = 10
3x − 3y + 6z = 15

Solution This system can be solved by inspection, since the second and third equations are multiples of the first. Geometrically, this means that the three planes coincide and that those values of x, y, and z that satisfy the equation

x − y + 2z = 5 (9)

automatically satisfy all three equations. Thus, it suffices to find the solutions of (9). We can do this by first solving this equation for x in terms of y and z, then assigning arbitrary values r and s (parameters) to these two variables, and then expressing the solution by the three parametric equations

x = 5 + r − 2s, y = r, z = s

Specific solutions can be obtained by choosing numerical values for the parameters r and s. For example, taking r = 1 and s = 0 yields the solution (6, 1, 0).

Augmented Matrices and Elementary Row Operations

As the number of equations and unknowns in a linear system increases, so does the complexity of the algebra involved in finding solutions. The required computations can be made more manageable by simplifying notation and standardizing procedures. For example, by mentally keeping track of the location of the +’s, the x’s, and the =’s in the linear system

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

we can abbreviate the system by writing only the rectangular array of numbers

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
\]

This is called the augmented matrix for the system. For example, the augmented matrix for the system of equations

x1 + x2 + 2x3 = 9
2x1 + 4x2 − 3x3 = 1
3x1 + 6x2 − 5x3 = 0

is

\[
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]

As noted in the introduction to this chapter, the term “matrix” is used in mathematics to denote a rectangular array of numbers. In a later section we will study matrices in detail, but for now we will only be concerned with augmented matrices for linear systems.
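As a small illustration of this bookkeeping, an augmented matrix can be assembled by appending each constant b_i to its row of coefficients. This is a sketch only; the nested-list representation and the name augmented_matrix are assumptions made for the example.

```python
def augmented_matrix(coefficients, constants):
    """Append each right-hand side b_i to the corresponding coefficient row."""
    return [list(row) + [b] for row, b in zip(coefficients, constants)]

# x1 + x2 + 2x3 = 9, 2x1 + 4x2 - 3x3 = 1, 3x1 + 6x2 - 5x3 = 0
A = [[1, 1, 2], [2, 4, -3], [3, 6, -5]]
b = [9, 1, 0]
for row in augmented_matrix(A, b):
    print(row)
```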




The basic method for solving a linear system is to perform algebraic operations on the system that do not alter the solution set and that produce a succession of increasingly simpler systems, until a point is reached where it can be ascertained whether the system is consistent, and if so, what its solutions are. Typically, the algebraic operations are:

1. Multiply an equation through by a nonzero constant.
2. Interchange two equations.
3. Add a constant times one equation to another.

Since the rows (horizontal lines) of an augmented matrix correspond to the equations in the associated system, these three operations correspond to the following operations on the rows of the augmented matrix:

1. Multiply a row through by a nonzero constant.
2. Interchange two rows.
3. Add a constant times one row to another.

These are called elementary row operations on a matrix.

In the following example we will illustrate how to use elementary row operations and an augmented matrix to solve a linear system in three unknowns. Since a systematic procedure for solving linear systems will be developed in the next section, do not worry about how the steps in the example were chosen. Your objective here should be simply to understand the computations.

EXAMPLE 6 Using Elementary Row Operations

At each step below we first solve a system of linear equations by operating on the equations in the system, and then solve the same system by operating on the rows of the augmented matrix.



x + y + 2z = 9
2x + 4y − 3z = 1
3x + 6y − 5z = 0

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 2 & 4 & -3 & 1 \\ 3 & 6 & -5 & 0 \end{bmatrix}
\]

Add −2 times the first equation to the second (add −2 times the first row to the second) to obtain

x + y + 2z = 9
2y − 7z = −17
3x + 6y − 5z = 0

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 3 & 6 & -5 & 0 \end{bmatrix}
\]

Add −3 times the first equation to the third (add −3 times the first row to the third) to obtain

x + y + 2z = 9
2y − 7z = −17
3y − 11z = −27

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 2 & -7 & -17 \\ 0 & 3 & -11 & -27 \end{bmatrix}
\]

Multiply the second equation by 1/2 (multiply the second row by 1/2) to obtain

x + y + 2z = 9
y − (7/2)z = −17/2
3y − 11z = −27

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 3 & -11 & -27 \end{bmatrix}
\]

Add −3 times the second equation to the third (add −3 times the second row to the third) to obtain

x + y + 2z = 9
y − (7/2)z = −17/2
−(1/2)z = −3/2

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2} \end{bmatrix}
\]

Multiply the third equation by −2 (multiply the third row by −2) to obtain

x + y + 2z = 9
y − (7/2)z = −17/2
z = 3

\[
\begin{bmatrix} 1 & 1 & 2 & 9 \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

Add −1 times the second equation to the first (add −1 times the second row to the first) to obtain

x + (11/2)z = 35/2
y − (7/2)z = −17/2
z = 3

\[
\begin{bmatrix} 1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\ 0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

Add −11/2 times the third equation to the first and 7/2 times the third equation to the second (add −11/2 times the third row to the first and 7/2 times the third row to the second) to obtain

x = 1
y = 2
z = 3

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

The solution x = 1, y = 2, z = 3 is now evident.

The solution in this example can also be expressed as the ordered triple (1, 2, 3) with the understanding that the numbers in the triple are in the same order as the variables in the system, namely, x, y, z.

Historical Note The first known use of augmented matrices appeared between 200 B.C. and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The coefficients were arranged in columns rather than in rows, as today, but remarkably the system was solved by performing a succession of operations on the columns. The actual use of the term augmented matrix appears to have been introduced by the American mathematician Maxime Bôcher (1867–1918) in his book Introduction to Higher Algebra, published in 1907. In addition to being an outstanding research mathematician and an expert in Latin, chemistry, philosophy, zoology, geography, meteorology, art, and music, Bôcher was an outstanding expositor of mathematics whose elementary textbooks were greatly appreciated by students and are still in demand today. [Image: Courtesy of the American Mathematical Society www.ams.org]
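The row operations of Example 6 can be replayed with exact rational arithmetic. The following Python sketch is illustrative only: the helper names scale, swap, and add_multiple are assumptions, and rows are indexed from 0, so "the first row" is row 0.

```python
from fractions import Fraction

def scale(m, i, c):
    """Elementary operation 1: multiply row i through by a nonzero constant c."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    m[i] = [c * v for v in m[i]]

def swap(m, i, j):
    """Elementary operation 2: interchange rows i and j."""
    m[i], m[j] = m[j], m[i]

def add_multiple(m, src, dst, c):
    """Elementary operation 3: add c times row src to row dst."""
    m[dst] = [d + c * s for s, d in zip(m[src], m[dst])]

# Augmented matrix of Example 6.
m = [[Fraction(v) for v in row]
     for row in [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]]

add_multiple(m, 0, 1, -2)               # -2 times row 1 added to row 2
add_multiple(m, 0, 2, -3)               # -3 times row 1 added to row 3
scale(m, 1, Fraction(1, 2))             # row 2 multiplied by 1/2
add_multiple(m, 1, 2, -3)               # -3 times row 2 added to row 3
scale(m, 2, -2)                         # row 3 multiplied by -2
add_multiple(m, 1, 0, -1)               # -1 times row 2 added to row 1
add_multiple(m, 2, 0, Fraction(-11, 2)) # -11/2 times row 3 added to row 1
add_multiple(m, 2, 1, Fraction(7, 2))   # 7/2 times row 3 added to row 2

solution = [row[-1] for row in m]
print(*solution)  # 1 2 3
```

Fractions are used instead of floating-point numbers so that entries such as −17/2 are represented exactly, as in the hand computation.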

Exercise Set 1.1

1. In each part, determine whether the equation is linear in x1, x2, and x3.
(a) x1 + 5x2 − √2 x3 = 1
(b) x1 + 3x2 + x1 x3 = 2
(c) x1 = −7x2 + 3x3
(d) x1^(−2) + x2 + 8x3 = 5
(e) x1^(3/5) − 2x2 + x3 = 4
(f) πx1 − √2 x2 = 7^(1/3)

2. In each part, determine whether the equation is linear in x and y.
(a) 2^(1/3) x + √3 y = 1
(b) 2x^(1/3) + 3√y = 1
(c) cos(π/7) x − 4y = log 3
(d) (π/7) cos x − 4y = 0
(e) xy = 1
(f) y + 7 = x

3. Using the notation of Formula (7), write down a general linear system of
(a) two equations in two unknowns.
(b) three equations in three unknowns.
(c) two equations in four unknowns.

4. Write down the augmented matrix for each of the linear systems in Exercise 3.

In each part of Exercises 5–6, find a linear system in the unknowns x1, x2, x3, . . . , that corresponds to the given augmented matrix.

5. (a) \[ \begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix} \]

6. (a) \[ \begin{bmatrix} 0 & 3 & -1 & -1 & -1 \\ 5 & 2 & 0 & -3 & -6 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 3 & 0 & 1 & -4 & 3 \\ -4 & 0 & 4 & 1 & -3 \\ -1 & 3 & 0 & -2 & -9 \\ 0 & 0 & 0 & -1 & -2 \end{bmatrix} \]

In each part of Exercises 7–8, find the augmented matrix for the linear system.

7. (a) −2x1 = 6
3x1 = 8
9x1 = −3
(b) 6x1 − x2 + 3x3 = 4
5x2 − x3 = 1
(c) 2x2 − 3x4 + x5 = 0
−3x1 − x2 + x3 = −1
6x1 + 2x2 − x3 + 2x4 − 3x5 = 6

8. (a) 3x1 − 2x2 = −1
4x1 + 5x2 = 3
7x1 + 3x2 = 2
(b) 2x1 + 2x3 = 1
3x1 − x2 + 4x3 = 7
6x1 + x2 − x3 = 0
(c) x1 = 1
x2 = 2
x3 = 3

9. In each part, determine whether the given 3-tuple is a solution of the linear system

2x1 − 4x2 − x3 = 1
x1 − 3x2 + x3 = 1
3x1 − 5x2 − 3x3 = 1

(a) (3, 1, 1)
(b) (3, −1, 1)
(c) (13, 5, 2)
(d) (13/2, 5/2, 2)
(e) (17, 7, 5)

10. In each part, determine whether the given 3-tuple is a solution of the linear system

x + 2y − 2z = 3
3x − y + z = 1
−x + 5y − 5z = 5

(a) (5/7, 8/7, 1)
(b) (5/7, 8/7, 0)
(c) (5, 8, 1)
(d) (5/7, 10/7, 2/7)
(e) (5/7, 22/7, 2)

11. In each part, solve the linear system, if possible, and use the result to determine whether the lines represented by the equations in the system have zero, one, or infinitely many points of intersection. If there is a single point of intersection, give its coordinates, and if there are infinitely many, find parametric equations for them.
(a) 3x − 2y = 4
6x − 4y = 9
(b) 2x − 4y = 1
4x − 8y = 2
(c) x − 2y = 0
x − 4y = 8

12. Under what conditions on a and b will the following linear system have no solutions, one solution, infinitely many solutions?

2x − 3y = a
4x − 6y = b

In each part of Exercises 13–14, use parametric equations to describe the solution set of the linear equation.

13. (a) 7x − 5y = 3
(b) 3x1 − 5x2 + 4x3 = 7
(c) −8x1 + 2x2 − 5x3 + 6x4 = 1
(d) 3v − 8w + 2x − y + 4z = 0

14. (a) x + 10y = 2
(b) x1 + 3x2 − 12x3 = 3
(c) 4x1 + 2x2 + 3x3 + x4 = 20
(d) v + w + x − 5y + 7z = 0

In Exercises 15–16, each linear system has infinitely many solutions. Use parametric equations to describe its solution set.

15. (a) 2x − 3y = 1
6x − 9y = 3
(b) x1 + 3x2 − x3 = −4
3x1 + 9x2 − 3x3 = −12
−x1 − 3x2 + x3 = 4

16. (a) 6x1 + 2x2 = −8
3x1 + x2 = −4
(b) 2x − y + 2z = −4
6x − 3y + 6z = −12
−4x + 2y − 4z = 8

In Exercises 17–18, find a single elementary row operation that will create a 1 in the upper left corner of the given augmented matrix and will not create any fractions in its first row.

17. (a) \[ \begin{bmatrix} -3 & -1 & 2 & 4 \\ 2 & -3 & 3 & 2 \\ 0 & 2 & -3 & 1 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 0 & -1 & -5 & 0 \\ 2 & -9 & 3 & 2 \\ 1 & 4 & -3 & 3 \end{bmatrix} \]

18. (a) \[ \begin{bmatrix} 2 & 4 & -6 & 8 \\ 7 & 1 & 4 & 3 \\ -5 & 4 & 2 & 7 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 7 & -4 & -2 & 2 \\ 3 & -1 & 8 & 1 \\ -6 & 3 & -1 & 4 \end{bmatrix} \]

In Exercises 19–20, find all values of k for which the given augmented matrix corresponds to a consistent linear system.

19. (a) \[ \begin{bmatrix} 1 & k & -4 \\ 4 & 8 & 2 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 1 & k & -1 \\ 4 & 8 & -4 \end{bmatrix} \]

20. (a) \[ \begin{bmatrix} 3 & -4 & k \\ -6 & 8 & 5 \end{bmatrix} \]
(b) \[ \begin{bmatrix} k & 1 & -2 \\ 4 & -1 & 2 \end{bmatrix} \]

21. The curve y = ax^2 + bx + c shown in the accompanying figure passes through the points (x1, y1), (x2, y2), and (x3, y3). Show that the coefficients a, b, and c form a solution of the system of linear equations whose augmented matrix is

\[
\begin{bmatrix} x_1^2 & x_1 & 1 & y_1 \\ x_2^2 & x_2 & 1 & y_2 \\ x_3^2 & x_3 & 1 & y_3 \end{bmatrix}
\]

[Figure Ex-21: the curve y = ax^2 + bx + c through the points (x1, y1), (x2, y2), and (x3, y3); axes x and y.]

22. Explain why each of the three elementary row operations does not affect the solution set of a linear system.

23. Show that if the linear equations

x1 + kx2 = c and x1 + lx2 = d

have the same solution set, then the two equations are identical (i.e., k = l and c = d).

24. Consider the system of equations

ax + by = k
cx + dy = l
ex + fy = m

Discuss the relative positions of the lines ax + by = k, cx + dy = l, and ex + fy = m when
(a) the system has no solutions.
(b) the system has exactly one solution.
(c) the system has infinitely many solutions.

25. Suppose that a certain diet calls for 7 units of fat, 9 units of protein, and 16 units of carbohydrates for the main meal, and suppose that an individual has three possible foods to choose from to meet these requirements:
Food 1: Each ounce contains 2 units of fat, 2 units of protein, and 4 units of carbohydrates.
Food 2: Each ounce contains 3 units of fat, 1 unit of protein, and 2 units of carbohydrates.
Food 3: Each ounce contains 1 unit of fat, 3 units of protein, and 5 units of carbohydrates.
Let x, y, and z denote the number of ounces of the first, second, and third foods that the dieter will consume at the main meal. Find (but do not solve) a linear system in x, y, and z whose solution tells how many ounces of each food must be consumed to meet the diet requirements.

26. Suppose that you want to find values for a, b, and c such that the parabola y = ax^2 + bx + c passes through the points (1, 1), (2, 4), and (−1, 1). Find (but do not solve) a system of linear equations whose solutions provide values for a, b, and c. How many solutions would you expect this system of equations to have, and why?

27. Suppose you are asked to find three real numbers such that the sum of the numbers is 12, the sum of two times the first plus the second plus two times the third is 5, and the third number is one more than the first. Find (but do not solve) a linear system whose equations describe the three conditions.

True-False Exercises

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer.
(a) A linear system whose equations are all homogeneous must be consistent.
(b) Multiplying a row of an augmented matrix through by zero is an acceptable elementary row operation.
(c) The linear system

x − y = 3
2x − 2y = k

cannot have a unique solution, regardless of the value of k.
(d) A single linear equation with two or more unknowns must have infinitely many solutions.
(e) If the number of equations in a linear system exceeds the number of unknowns, then the system must be inconsistent.
(f) If each equation in a consistent linear system is multiplied through by a constant c, then all solutions to the new system can be obtained by multiplying solutions from the original system by c.
(g) Elementary row operations permit one row of an augmented matrix to be subtracted from another.
(h) The linear system with corresponding augmented matrix

\[
\begin{bmatrix} 2 & -1 & 4 \\ 0 & 0 & -1 \end{bmatrix}
\]

is consistent.

Working with Technology

T1. Solve the linear systems in Examples 2, 3, and 4 to see how your technology utility handles the three types of systems.

T2. Use the result in Exercise 21 to find values of a, b, and c for which the curve y = ax^2 + bx + c passes through the points (−1, 1, 4), (0, 0, 8), and (1, 1, 7).

1.2 Gaussian Elimination

In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on the idea of performing certain operations on the rows of the augmented matrix that simplify it to a form from which the solution of the system can be ascertained by inspection.

Considerations in Solving Linear Systems

When considering methods for solving systems of linear equations, it is important to distinguish between large systems that must be solved by computer and small systems that can be solved by hand. For example, there are many applications that lead to linear systems in thousands or even millions of unknowns. Large systems require special techniques to deal with issues of memory size, roundoff errors, solution time, and so forth. Such techniques are studied in the field of numerical analysis and will only be touched on in this text. However, almost all of the methods that are used for large systems are based on the ideas that we will develop in this section.

Echelon Forms

In Example 6 of the last section, we solved a linear system in the unknowns x, y, and z by reducing the augmented matrix to the form

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

from which the solution x = 1, y = 2, z = 3 became evident. This is an example of a matrix that is in reduced row echelon form. To be of this form, a matrix must have the following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1.
2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix.
3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the right than the leading 1 in the higher row.
4. Each column that contains a leading 1 has zeros everywhere else in that column.

A matrix that has the first three properties is said to be in row echelon form. (Thus, a matrix in reduced row echelon form is of necessity in row echelon form, but not conversely.)

EXAMPLE 1 Row Echelon and Reduced Row Echelon Form

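The following matrices are in reduced row echelon form.

\[
\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & -2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

The following matrices are in row echelon form but not reduced row echelon form.

\[
\begin{bmatrix} 1 & 4 & -3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]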
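The four defining properties translate directly into a check that can be run on any matrix. The sketch below is an illustration, not part of the original text; the function names are assumptions, and matrices are represented as lists of rows.

```python
def leading_index(row):
    """Column index of the first nonzero entry of a row, or None for a zero row."""
    for j, v in enumerate(row):
        if v != 0:
            return j
    return None

def is_row_echelon(m):
    """Check properties 1-3 of the definition."""
    previous_lead = -1
    seen_zero_row = False
    for row in m:
        j = leading_index(row)
        if j is None:
            seen_zero_row = True
            continue
        if seen_zero_row:        # property 2: zero rows must be at the bottom
            return False
        if row[j] != 1:          # property 1: first nonzero entry is a leading 1
            return False
        if j <= previous_lead:   # property 3: leading 1's move to the right
            return False
        previous_lead = j
    return True

def is_reduced_row_echelon(m):
    """Check properties 1-4 of the definition."""
    if not is_row_echelon(m):
        return False
    for i, row in enumerate(m):
        j = leading_index(row)
        if j is None:
            continue
        # property 4: a column with a leading 1 is zero everywhere else
        if any(m[k][j] != 0 for k in range(len(m)) if k != i):
            return False
    return True

reduced = [[0, 1, -2, 0, 1], [0, 0, 0, 1, 3], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
echelon_only = [[1, 4, -3, 7], [0, 1, 6, 2], [0, 0, 1, 5]]
print(is_reduced_row_echelon(reduced))                                       # True
print(is_row_echelon(echelon_only), is_reduced_row_echelon(echelon_only))    # True False
```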


EXAMPLE 2 More on Row Echelon and Reduced Row Echelon Form

As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for the ∗’s, all matrices of the following types are in row echelon form:

\[
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]
\[
\begin{bmatrix}
0 & 1 & * & * & * & * & * & * & * & * \\
0 & 0 & 0 & 1 & * & * & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]

All matrices of the following types are in reduced row echelon form:

\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]
\[
\begin{bmatrix}
0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]

If, by a sequence of elementary row operations, the augmented matrix for a system of linear equations is put in reduced row echelon form, then the solution set can be obtained either by inspection or by converting certain linear equations to parametric form. Here are some examples.

EXAMPLE 3 Unique Solution

Suppose that the augmented matrix for a linear system in the unknowns x1, x2, x3, and x4 has been reduced by elementary row operations to

\[
\begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 \end{bmatrix}
\]

This matrix is in reduced row echelon form and corresponds to the equations

x1 = 3
x2 = −1
x3 = 0
x4 = 5

Thus, the system has a unique solution, namely, x1 = 3, x2 = −1, x3 = 0, x4 = 5.

In Example 3 we could, if desired, express the solution more succinctly as the 4-tuple (3, −1, 0, 5).

EXAMPLE 4 Linear Systems in Three Unknowns

In each part, suppose that the augmented matrix for a linear system in the unknowns x, y, and z has been reduced by elementary row operations to the given reduced row echelon form. Solve the system.

(a) \[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
(b) \[ \begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -4 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
(c) \[ \begin{bmatrix} 1 & -5 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]

Solution (a) The equation that corresponds to the last row of the augmented matrix is

0x + 0y + 0z = 1

Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

Solution (b) The equation that corresponds to the last row of the augmented matrix is

0x + 0y + 0z = 0

This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system corresponding to the augmented matrix is

x + 3z = −1
y − 4z = 2

Since x and y correspond to the leading 1’s in the augmented matrix, we call these the leading variables. The remaining variables (in this case z) are called free variables. Solving for the leading variables in terms of the free variables gives

x = −1 − 3z
y = 2 + 4z

From these equations we see that the free variable z can be treated as a parameter and assigned an arbitrary value t, which then determines values for x and y. Thus, the solution set can be represented by the parametric equations

x = −1 − 3t, y = 2 + 4t, z = t

By substituting various values for t in these equations we can obtain various solutions of the system. For example, setting t = 0 yields the solution

x = −1, y = 2, z = 0

and setting t = 1 yields the solution

x = −4, y = 6, z = 1

Solution (c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the linear system associated with the augmented matrix consists of the single equation

x − 5y + z = 4 (1)

from which we see that the solution set is a plane in three-dimensional space. Although (1) is a valid form of the solution set, there are many applications in which it is preferable to express the solution set in parametric form. We can convert (1) to parametric form by solving for the leading variable x in terms of the free variables y and z to obtain

x = 4 + 5y − z

From this equation we see that the free variables can be assigned arbitrary values, say y = s and z = t, which then determine the value of x. Thus, the solution set can be expressed parametrically as

x = 4 + 5s − t, y = s, z = t (2)

We will usually denote parameters in a general solution by the letters r, s, t, . . . , but any letters that do not conflict with the names of the unknowns can be used. For systems with more than three unknowns, subscripted letters such as t1, t2, t3, . . . are convenient.

Formulas, such as (2), that express the solution set of a linear system parametrically have some associated terminology.

DEFINITION 1 If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can be obtained by assigning numerical values to the parameters is called a general solution of the system.
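A general solution can be spot-checked by substituting several parameter values back into the equations it came from. The following sketch does this for parts (b) and (c) of Example 4; the function names are illustrative assumptions, and checking finitely many parameter values is a sanity check rather than a proof.

```python
def general_solution_b(t):
    """Parametric equations from Example 4(b): x = -1 - 3t, y = 2 + 4t, z = t."""
    return (-1 - 3 * t, 2 + 4 * t, t)

def general_solution_c(s, t):
    """Parametric equations (2) from Example 4(c): x = 4 + 5s - t, y = s, z = t."""
    return (4 + 5 * s - t, s, t)

def satisfies_b(x, y, z):
    """The reduced system of part (b): x + 3z = -1 and y - 4z = 2."""
    return x + 3 * z == -1 and y - 4 * z == 2

def satisfies_c(x, y, z):
    """Equation (1) of part (c): x - 5y + z = 4."""
    return x - 5 * y + z == 4

print(general_solution_b(0))  # (-1, 2, 0)
print(general_solution_b(1))  # (-4, 6, 1)
print(all(satisfies_b(*general_solution_b(t)) for t in range(-5, 6)))   # True
print(all(satisfies_c(*general_solution_c(s, t))
          for s in range(-3, 4) for t in range(-3, 4)))                 # True
```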


Elimination Methods

We have just seen how easy it is to solve a system of linear equations once its augmented matrix is in reduced row echelon form. Now we will give a step-by-step elimination procedure that can be used to reduce any matrix to reduced row echelon form. As we state each step in the procedure, we illustrate the idea by reducing the following matrix to reduced row echelon form.

\[
\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

Step 1. Locate the leftmost column that does not consist entirely of zeros.

In the matrix above, the leftmost nonzero column is the first column.

Step 2. Interchange the top row with another row, if necessary, to bring a nonzero entry to the top of the column found in Step 1.

\[
\begin{bmatrix} 2 & 4 & -10 & 6 & 12 & 28 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

The first and second rows in the preceding matrix were interchanged.

Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to introduce a leading 1.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

The first row of the preceding matrix was multiplied by 1/2.

Step 4. Add suitable multiples of the top row to the rows below so that all entries below the leading 1 become zeros.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}
\]

−2 times the first row of the preceding matrix was added to the third row.

Step 5. Now cover the top row in the matrix and begin again with Step 1 applied to the submatrix that remains. Continue in this way until the entire matrix is in row echelon form.

In the submatrix formed by the second and third rows of the preceding matrix, the leftmost nonzero column is the third column.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}
\]

The first row in the submatrix was multiplied by −1/2 to introduce a leading 1.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}
\]

−5 times the first row of the submatrix was added to the second row of the submatrix to introduce a zero below the leading 1. The top row in the submatrix was then covered, and we returned again to Step 1. In the new submatrix, which consists of the third row only, the leftmost nonzero column is the fifth column.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

The first (and only) row in the new submatrix was multiplied by 2 to introduce a leading 1.

The entire matrix is now in row echelon form. To find the reduced row echelon form we need the following additional step.

Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above to introduce zeros above the leading 1’s.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

7/2 times the third row of the preceding matrix was added to the second row.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

−6 times the third row was added to the first row.

\[
\begin{bmatrix} 1 & 2 & 0 & 3 & 0 & 7 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

5 times the second row was added to the first row.

The last matrix is in reduced row echelon form.

The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called Gauss–Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the leading 1’s and a backward phase in which zeros are introduced above the leading 1’s. If only the forward phase is used, then the procedure produces a row echelon form and is called Gaussian elimination. For example, in the preceding computations a row echelon form was obtained at the end of Step 5.

Historical Note Although versions of Gaussian elimination were known much earlier, its importance in scientific computation became clear when the great German mathematician Carl Friedrich Gauss (1777–1855) used it to help compute the orbit of the asteroid Ceres from limited data. What happened was this: On January 1, 1801 the Sicilian astronomer and Catholic priest Giuseppe Piazzi (1746–1826) noticed a dim celestial object that he believed might be a “missing planet.” He named the object Ceres and made a limited number of positional observations but then lost the object as it neared the Sun. Gauss, then only 24 years old, undertook the problem of computing the orbit of Ceres from the limited data using a technique called “least squares,” the equations of which he solved by the method that we now call “Gaussian elimination.” The work of Gauss created a sensation when Ceres reappeared a year later in the constellation Virgo at almost the precise position that he predicted! The basic idea of the method was further popularized by the German engineer Wilhelm Jordan (1842–1899) in his book on geodesy (the science of measuring Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888. [Images: Photo Inc/Photo Researchers/Getty Images (Gauss); Leemage/Universal Images Group/Getty Images (Jordan)]

EXAMPLE 5 Gauss–Jordan Elimination

Solve by Gauss–Jordan elimination.

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\
5x_3 + 10x_4 + 15x_6 &= 5 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6
\end{aligned}
\]

Solution The augmented matrix for the system is



\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{bmatrix}
\]

Adding −2 times the first row to the second and fourth rows gives



\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{bmatrix}
\]

Multiplying the second row by −1 and then adding −5 times the new second row to the third row and −4 times the new second row to the fourth row gives



\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{bmatrix}
\]

Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by 1/6 gives the row echelon form

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the forward phase since there are zeros below the leading 1’s.

Adding −3 times the third row to the second row and then adding 2 times the second row of the resulting matrix to the first row yields the reduced row echelon form



\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the backward phase since there are zeros above the leading 1’s.

The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\tag{3}
\]

Note that in constructing the linear system in (3) we ignored the row of zeros in the corresponding augmented matrix. Why is this justified?

Solving for the leading variables, we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Finally, we express the general solution of the system parametrically by assigning the free variables x2, x4, and x5 arbitrary values r, s, and t, respectively. This yields

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]

Homogeneous Linear Systems

A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\]

Every homogeneous system of linear equations is consistent because all such systems have x1 = 0, x2 = 0, . . . , xn = 0 as a solution. This solution is called the trivial solution; if there are other solutions, they are called nontrivial solutions. Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions:

• The system has only the trivial solution.
• The system has infinitely many solutions in addition to the trivial solution.

In the special case of a homogeneous linear system of two equations in two unknowns, say

\[
\begin{aligned}
a_1 x + b_1 y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\
a_2 x + b_2 y &= 0 \quad (a_2, b_2 \text{ not both zero})
\end{aligned}
\]

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at the origin (Figure 1.2.1).

[Figure 1.2.1: Two panels showing the lines a1x + b1y = 0 and a2x + b2y = 0 through the origin. In one panel the lines are distinct ("Only the trivial solution"); in the other they coincide ("Infinitely many solutions").]

There is one case in which a homogeneous system is assured of having nontrivial solutions—namely, whenever the system involves more unknowns than equations. To see why, consider the following example of four equations in six unknowns.


EXAMPLE 6 A Homogeneous System

Use Gauss–Jordan elimination to solve the homogeneous linear system

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\tag{4}
\]

Solution Observe first that the coefficients of the unknowns in this system are the same as those in Example 5; that is, the two systems differ only in the constants on the right side. The augmented matrix for the given homogeneous system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & 0 \\
0 & 0 & 5 & 10 & 0 & 15 & 0 \\
2 & 6 & 0 & 8 & 4 & 18 & 0
\end{bmatrix}
\tag{5}
\]

which is the same as the augmented matrix for the system in Example 5, except for zeros in the last column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented matrix in Example 5, except for the last column. However, a moment’s reflection will make it evident that a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of (5) is



\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{6}
\]

The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= 0
\end{aligned}
\]

Solving for the leading variables, we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= 0
\end{aligned}
\tag{7}
\]

If we now assign the free variables x2, x4, and x5 arbitrary values r, s, and t, respectively, then we can express the solution set parametrically as

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0
\]

Note that the trivial solution results when r = s = t = 0.

Free Variables in Homogeneous Linear Systems

Example 6 illustrates two important points about solving homogeneous linear systems:

1. Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system corresponding to the reduced row echelon form is homogeneous, just like the original system.

2. When we constructed the homogeneous linear system corresponding to augmented matrix (6), we ignored the row of zeros because the corresponding equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0
\]

does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form of the augmented matrix for a homogeneous linear system has any rows of zeros, the linear system corresponding to that reduced row echelon form will either have the same number of equations as the original system or it will have fewer.

Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix must have r leading variables and n − r free variables. Thus, this system is of the form

\[
\begin{aligned}
x_{k_1} + \textstyle\sum(\ ) &= 0 \\
x_{k_2} + \textstyle\sum(\ ) &= 0 \\
&\ \,\vdots \\
x_{k_r} + \textstyle\sum(\ ) &= 0
\end{aligned}
\tag{8}
\]

where in each equation the expression Σ( ) denotes a sum that involves the free variables, if any [see (7), for example]. In summary, we have the following result.

THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems

If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix has r nonzero rows, then the system has n − r free variables.

Note that Theorem 1.2.2 applies only to homogeneous systems; a nonhomogeneous system with more unknowns than equations need not be consistent. However, we will prove later that if a nonhomogeneous system with more unknowns than equations is consistent, then it has infinitely many solutions.

Theorem 1.2.1 has an important implication for homogeneous linear systems with more unknowns than equations. Specifically, if a homogeneous linear system has m equations in n unknowns, and if m < n, then it must also be true that r < n (why?). This being the case, the theorem implies that there is at least one free variable, and this implies that the system has infinitely many solutions. Thus, we have the following result.

THEOREM 1.2.2 A homogeneous linear system with more unknowns than equations has infinitely many solutions.

In retrospect, we could have anticipated that the homogeneous system in Example 6 would have infinitely many solutions since it has four equations in six unknowns.
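The count of free variables in Theorem 1.2.1 can be checked by machine for Example 6. The following Python sketch is an illustration only: the function name `rref` and the variable names are assumptions, and the routine clears entries above and below each leading 1 in a single pass rather than in separate forward and backward phases. Exact rational arithmetic (`fractions.Fraction`) is used so that no roundoff occurs.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    lead_row = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below lead_row.
        pivot = next((r for r in range(lead_row, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[lead_row], m[pivot] = m[pivot], m[lead_row]   # row interchange
        lead = m[lead_row][col]
        m[lead_row] = [x / lead for x in m[lead_row]]   # introduce a leading 1
        for r in range(n_rows):                         # zeros above and below it
            if r != lead_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[lead_row])]
        lead_row += 1
        if lead_row == n_rows:
            break
    return m

# Augmented matrix (5) of the homogeneous system in Example 6.
augmented = [
    [1, 3, -2, 0, 2, 0, 0],
    [2, 6, -5, -2, 4, -3, 0],
    [0, 0, 5, 10, 0, 15, 0],
    [2, 6, 0, 8, 4, 18, 0],
]
reduced = rref(augmented)
n = len(augmented[0]) - 1                                    # number of unknowns
r = sum(1 for row in reduced if any(x != 0 for x in row))    # nonzero rows
free_variables = n - r
print(r, free_variables)
```

Here `reduced` agrees with matrix (6), so r = 3 and the system has 6 − 3 = 3 free variables, matching the parameters r, s, and t in Example 6.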

Gaussian Elimination and Back-Substitution

For small linear systems that are solved by hand (such as most of those in this text), Gauss–Jordan elimination (reduction to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a technique known as back-substitution to complete the process of solving the system. The next example illustrates this technique.


EXAMPLE 7 Example 5 Solved by Back-Substitution

From the computations in Example 5, a row echelon form of the augmented matrix is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

To solve the corresponding system of equations

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
x_3 + 2x_4 + 3x_6 &= 1 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

we proceed as follows:

Step 1. Solve the equations for the leading variables.

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= 1 - 2x_4 - 3x_6 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 2. Beginning with the bottom equation and working upward, successively substitute each equation into all the equations above it. Substituting x6 = 1/3 into the second equation yields

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Substituting x3 = −2x4 into the first equation yields

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 3. Assign arbitrary values to the free variables, if any. If we now assign x2, x4, and x5 the arbitrary values r, s, and t, respectively, the general solution is given by the formulas

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]

This agrees with the solution obtained in Example 5.
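A general solution of this kind can be spot-checked by substituting it back into the original system. The Python sketch below does this for the system of Example 5 over a small grid of parameter values; the names `general_solution` and `satisfies_system` are hypothetical, and a check at finitely many points is a sanity test, not a proof.

```python
from fractions import Fraction
from itertools import product

# Coefficients and right-hand sides of the system in Example 5.
coefficients = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
constants = [0, -1, 5, 6]

def general_solution(r, s, t):
    """Return (x1, ..., x6) for the parameter values r, s, t."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]

def satisfies_system(x):
    """Check every equation exactly, using rational arithmetic."""
    return all(
        sum(a * xi for a, xi in zip(row, x)) == b
        for row, b in zip(coefficients, constants)
    )

# Try every (r, s, t) with each parameter in {-2, -1, 0, 1, 2}.
all_ok = all(
    satisfies_system(general_solution(r, s, t))
    for r, s, t in product(range(-2, 3), repeat=3)
)
print(all_ok)
```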

EXAMPLE 8

Suppose that the matrices below are augmented matrices for linear systems in the unknowns x1, x2, x3, and x4. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence and uniqueness of solutions to the corresponding linear systems.

\[
\text{(a)}\ \begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix}
1 & -3 & 7 & 2 & 5 \\
0 & 1 & 2 & -4 & 1 \\
0 & 0 & 1 & 6 & 9 \\
0 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

Solution (a) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 1
\]

from which it is evident that the system is inconsistent.

Solution (b) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 0
\]

which has no effect on the solution set. In the remaining three equations the variables x1, x2, and x3 correspond to leading 1’s and hence are leading variables. The variable x4 is a free variable. With a little algebra, the leading variables can be expressed in terms of the free variable, and the free variable can be assigned an arbitrary value. Thus, the system must have infinitely many solutions.

Solution (c) The last row corresponds to the equation

\[
x_4 = 0
\]

which gives us a numerical value for x4. If we substitute this value into the third equation, namely,

\[
x_3 + 6x_4 = 9
\]

we obtain x3 = 9. You should now be able to see that if we continue this process and substitute the known values of x3 and x4 into the equation corresponding to the second row, we will obtain a unique numerical value for x2; and if, finally, we substitute the known values of x4, x3, and x2 into the equation corresponding to the first row, we will produce a unique numerical value for x1. Thus, the system has a unique solution.
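The substitution process described in Solution (c) can be carried through to the end. The Python sketch below is one way to do so, assuming a square system whose row echelon form has a leading 1 in every diagonal position (as in part (c)); the function name `back_substitute` is hypothetical.

```python
from fractions import Fraction

def back_substitute(augmented):
    """Solve a square system from its row echelon form.

    Assumes n equations in n unknowns with a leading 1 on the diagonal of
    every row, so there are no free variables.
    """
    n = len(augmented)
    x = [Fraction(0)] * n
    # Work upward from the bottom equation.
    for i in range(n - 1, -1, -1):
        row = augmented[i]
        known = sum(row[j] * x[j] for j in range(i + 1, n))  # already-solved terms
        x[i] = Fraction(row[n]) - known
    return x

# Augmented matrix (c) of Example 8.
matrix_c = [
    [1, -3, 7, 2, 5],
    [0, 1, 2, -4, 1],
    [0, 0, 1, 6, 9],
    [0, 0, 0, 1, 0],
]
solution = back_substitute(matrix_c)
print(solution)   # x1 = -109, x2 = -17, x3 = 9, x4 = 0
```

The values x4 = 0 and x3 = 9 match those found in the text; the remaining two follow from x2 = 1 − 2(9) + 4(0) = −17 and x1 = 5 + 3(−17) − 7(9) − 2(0) = −109.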

Some Facts About Echelon Forms

There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not prove:

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss–Jordan elimination or some other sequence of elementary row operations, the same reduced row echelon form will result in the end.*

2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different row echelon forms.

3. Although row echelon forms are not unique, the reduced row echelon form and all row echelon forms of a matrix A have the same number of zero rows, and the leading 1’s always occur in the same positions. Those are called the pivot positions of A. A column that contains a pivot position is called a pivot column of A.

* A proof of this result can be found in the article “The Reduced Row Echelon Form of a Matrix Is Unique: A Simple Proof,” by Thomas Yuster, Mathematics Magazine, Vol. 57, No. 2, 1984, pp. 93–94.


EXAMPLE 9 Pivot Positions and Columns

Earlier in this section (immediately after Definition 1) we found a row echelon form of

\[
A = \begin{bmatrix}
0 & 0 & -2 & 0 & 7 & 12 \\
2 & 4 & -10 & 6 & 12 & 28 \\
2 & 4 & -5 & 6 & -5 & -1
\end{bmatrix}
\]

to be

\[
\begin{bmatrix}
1 & 2 & -5 & 3 & 6 & 14 \\
0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\
0 & 0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

The leading 1’s occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are the pivot positions. The pivot columns are columns 1, 3, and 5.

If A is the augmented matrix for a linear system, then the pivot columns identify the leading variables. As an illustration, in Example 5 the pivot columns are 1, 3, and 6, and the leading variables are x1, x3, and x6.
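The pivot positions of the matrix A in Example 9 can be found by carrying out the forward phase mechanically. The Python sketch below follows Steps 1–5 of this section under one stated assumption: when a row interchange is needed, the first row with a nonzero entry is used. The function name `row_echelon_form` is hypothetical, and positions are reported with rows and columns numbered from 1, as in the text.

```python
from fractions import Fraction

def row_echelon_form(rows):
    """Forward phase only: return (row echelon form, list of pivot positions)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    top = 0                                  # first row of the current submatrix
    for col in range(n_cols):
        if top == n_rows:
            break
        # Step 1: is this column nonzero within the submatrix?
        nonzero = [r for r in range(top, n_rows) if m[r][col] != 0]
        if not nonzero:
            continue
        # Step 2: bring a nonzero entry to the top of the column.
        m[top], m[nonzero[0]] = m[nonzero[0]], m[top]
        # Step 3: scale the top row to introduce a leading 1.
        lead = m[top][col]
        m[top] = [x / lead for x in m[top]]
        # Step 4: add multiples of the top row to the rows below.
        for r in range(top + 1, n_rows):
            factor = m[r][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[top])]
        pivots.append((top + 1, col + 1))
        # Step 5: cover the top row and repeat on the remaining submatrix.
        top += 1
    return m, pivots

A = [
    [0, 0, -2, 0, 7, 12],
    [2, 4, -10, 6, 12, 28],
    [2, 4, -5, 6, -5, -1],
]
echelon, pivot_positions = row_echelon_form(A)
pivot_columns = [col for _, col in pivot_positions]
print(pivot_positions)   # [(1, 1), (2, 3), (3, 5)]
```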

Roundoff Error and Instability

There is often a gap between mathematical theory and its practical implementation— Gauss–Jordan elimination and Gaussian elimination being good examples. The problem is that computers generally approximate numbers, thereby introducing roundoff errors, so unless precautions are taken, successive calculations may degrade an answer to a degree that makes it useless. Algorithms (procedures) in which this happens are called unstable. There are various techniques for minimizing roundoff error and instability. For example, it can be shown that for large linear systems Gauss–Jordan elimination involves roughly 50% more operations than Gaussian elimination, so most computer algorithms are based on the latter method. Some of these matters will be considered in Chapter 9.
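The effect of roundoff can be seen even in a system of two equations. The following Python sketch is an illustration that is not taken from the text: the function name `eliminate_2x2` and the test system are assumptions chosen for the demonstration. It applies elimination and back-substitution in ordinary floating-point arithmetic to the system εx + y = 1, x + y = 2 with ε = 10⁻²⁰, first with the equations in the given order and then with the two equations interchanged. The exact solution has x and y both very close to 1.

```python
def eliminate_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a 2x2 system by elimination with no row interchange.

    Assumes a11 is nonzero. All arithmetic is ordinary floating point.
    """
    multiplier = a21 / a11               # multiple of row 1 subtracted from row 2
    a22_new = a22 - multiplier * a12     # entry left in row 2 after elimination
    b2_new = b2 - multiplier * b1
    y = b2_new / a22_new                 # back-substitution
    x = (b1 - a12 * y) / a11
    return x, y

eps = 1e-20

# Equations in the given order: eps*x + y = 1, then x + y = 2.
x_bad, y_bad = eliminate_2x2(eps, 1.0, 1.0, 1.0, 1.0, 2.0)

# The same equations with the two rows interchanged first.
x_good, y_good = eliminate_2x2(1.0, 1.0, 2.0, eps, 1.0, 1.0)

print(x_bad, y_bad)      # 0.0 1.0  (x is badly wrong)
print(x_good, y_good)    # 1.0 1.0
```

In the first ordering the huge multiplier 1/ε swamps the other entries, and the computed x is 0 instead of approximately 1. Interchanging the rows so that the larger entry is used to eliminate avoids the problem in this instance; this is one common precaution, offered here as an example and not as the specific technique the text has in mind.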

Exercise Set 1.2



In Exercises 1–2, determine whether the matrix is in row echelon form, reduced row echelon form, both, or neither.



1 ⎢ 1. (a) ⎣0 0

 (d)



0 1 0



0 ⎥ 0⎦ 1

1 ⎢ (b) ⎣0 0

1

0

3

1

0

1

2

4



0



(f ) ⎣0 0



0

⎢0 ⎢ ⎣0 0



0⎦

(g)

0



2

0

1

0⎦

0

0



⎥ ⎤

5 1

−3

0

0

0

⎥ 1⎦



0 ⎢ (c) ⎣0 0



1 0 0 0

0 ⎥ 1⎦ 0



2

0

3

0

1

1

0

0

0

⎥ 1⎦

0

0

0

0

−7

5

5

0

1

3

2



1

0

0

1

0⎦

2

0





1

⎢ (e) ⎣0 0







3

4

0

1⎦

0

0

2

3

0

0⎦

0

1







2

3

4

5

0

7

1

0

0

0

1⎦

0

0

0

0



3⎥ ⎥



(g)

1

−2

0

1

0

0

1

−2



In Exercises 3–4, suppose that the augmented matrix for a linear system has been reduced by row operations to the given row echelon form. Solve the system.



1

−3

4

7

3. (a) ⎣0 0

1

2

2⎦

0

1

5

1

0

8

6

(b) ⎣0 0

1

4

−5 −9

0

1

1

2

1 ⎢0 ⎢ (c) ⎢ ⎣0 0

7 0 0 0

−2

−8

1 0 0

0 1 1 0

1 ⎢ (d) ⎣0 0

−3

7 4 0

1 ⎥ 0⎦ 1

⎡ ⎢

1



0



(c) ⎣0 0



⎢1 ⎢ (f ) ⎢ ⎣0



0⎥ ⎥

1

(b) ⎣0 0



⎢ (d) ⎣0

1

1

(e) ⎢



1



0 ⎥ 0⎦ 0







2. (a) ⎣0 0



0 1 0



1





1 0





⎤ ⎥

3⎦

6 3 0

⎤ −3 5⎥ ⎥ ⎥ 9⎦ 0




1

⎤ −3 ⎥ 0⎦

17. 3x1 + x2 + x3 + x4 = 0 5x1 − x2 + x3 − x4 = 0

0

0 1 0

0 0 1

1 ⎢ (b) ⎣0 0

0 1 0

0 0 1

−7 3 1

8 ⎥ 2⎦ −5

1 ⎢0 ⎢ (c) ⎢ ⎣0 0

−6

0 1 0 0

0 0 1 0

3 4 5 0

1

−3

0

0 0

0 1 0

0 ⎥ 0⎦ 1

⎢ 4. (a) ⎣0 ⎡





⎢ (d) ⎣0

0 0 0

7



−2



7⎥ ⎥ ⎥ 8⎦ 0

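In Exercises 5–8, solve the linear system by Gaussian elimination.

5.
\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 8 \\
-x_1 - 2x_2 + 3x_3 &= 1 \\
3x_1 - 7x_2 + 4x_3 &= 10
\end{aligned}
\]

6.
\[
\begin{aligned}
2x_1 + 2x_2 + 2x_3 &= 0 \\
-2x_1 + 5x_2 + 2x_3 &= 1 \\
8x_1 + x_2 + 4x_3 &= -1
\end{aligned}
\]

7.
\[
\begin{aligned}
x - y + 2z - w &= -1 \\
2x + y - 2z - 2w &= -2 \\
-x + 2y - 4z + w &= 1 \\
3x - 3w &= -3
\end{aligned}
\]

8.
\[
\begin{aligned}
-2b + 3c &= 1 \\
3a + 6b - 3c &= -2 \\
6a + 6b + 3c &= 5
\end{aligned}
\]

18.
\[
\begin{aligned}
v + 3w - 2x &= 0 \\
2u + v - 4w + 3x &= 0 \\
2u + 3v + 2w - x &= 0 \\
-4u - 3v + 5w - 4x &= 0
\end{aligned}
\]

19.
\[
\begin{aligned}
2x + 2y + 4z &= 0 \\
w - y - 3z &= 0 \\
2w + 3x + y + z &= 0 \\
-2w + x + 3y - 2z &= 0
\end{aligned}
\]

20.
\[
\begin{aligned}
x_1 + 3x_2 + x_4 &= 0 \\
x_1 + 4x_2 + 2x_3 &= 0 \\
-2x_2 - 2x_3 - x_4 &= 0 \\
2x_1 - 4x_2 + x_3 + x_4 &= 0 \\
x_1 - 2x_2 - x_3 + x_4 &= 0
\end{aligned}
\]

21.
\[
\begin{aligned}
2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\
I_1 - 2I_3 + 7I_4 &= 11 \\
3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\
2I_1 + I_2 + 4I_3 + 4I_4 &= 10
\end{aligned}
\]

22.
\[
\begin{aligned}
Z_3 + Z_4 + Z_5 &= 0 \\
-Z_1 - Z_2 + 2Z_3 - 3Z_4 + Z_5 &= 0 \\
Z_1 + Z_2 - 2Z_3 - Z_5 &= 0 \\
2Z_1 + 2Z_2 - Z_3 + Z_5 &= 0
\end{aligned}
\]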

In each part of Exercises 23–24, the augmented matrix for a linear system is given in which the asterisk represents an unspecified real number. Determine whether the system is consistent, and if so whether the solution is unique. Answer “inconclusive” if there is not enough information to make a decision.

In Exercises 9–12, solve the linear system by Gauss–Jordan elimination. 9. Exercise 5

10. Exercise 6

11. Exercise 7

12. Exercise 8

In Exercises 13–14, determine whether the homogeneous system has nontrivial solutions by inspection (without pencil and paper).



1 23. (a) ⎣0 0



1 (c) ⎣0 0



1 24. (a) ⎣0 0

13. 2x1 − 3x2 + 4x3 − x4 = 0 7x1 + x2 − 8x3 + 9x4 = 0 2x1 + 8x2 + x3 − x4 = 0



1 (c) ⎣1 1

14. x1 + 3x2 − x3 = 0 x2 − 8x3 = 0 4 x3 = 0

∗ 1 0

∗ 1 0

∗ 1 0

∗ ∗

⎤ ∗ ∗⎦ ∗ ⎤ ∗ ∗⎦

0

1

∗ ∗

⎤ ∗ ∗⎦

1

1

∗ ∗ 1



0 0

0 0

0 1⎦









1 (b) ⎣0 0



1 (d) ⎣0 0



∗ 1 0

∗ 0 0

∗ ∗

⎤ ∗ ∗⎦

0

0

∗ ∗

⎤ ∗ 0⎦ ∗

1

1

0 1

∗ ⎡



0 0 1





0 0

0 0

(b) ⎣∗

1 (d) ⎣1 1

⎤ ∗ ∗⎦ ∗ ⎤ ∗ 1⎦ 1

In Exercises 15–22, solve the given linear system by any method.

15.
\[
\begin{aligned}
2x_1 + x_2 + 3x_3 &= 0 \\
x_1 + 2x_2 &= 0 \\
x_2 + x_3 &= 0
\end{aligned}
\]

16.
\[
\begin{aligned}
2x - y - 3z &= 0 \\
-x + 2y - 3z &= 0 \\
x + y + 4z &= 0
\end{aligned}
\]

In Exercises 25–26, determine the values of a for which the system has no solutions, exactly one solution, or infinitely many solutions.

25.
\[
\begin{aligned}
x + 2y - 3z &= 4 \\
3x - y + 5z &= 2 \\
4x + y + (a^2 - 14)z &= a + 2
\end{aligned}
\]

26.
\[
\begin{aligned}
x + 2y + z &= 2 \\
2x - 2y + 3z &= 1 \\
x + 2y - (a^2 - 3)z &= a
\end{aligned}
\]

In Exercises 27–28, what condition, if any, must a, b, and c satisfy for the linear system to be consistent?

27.
\[
\begin{aligned}
x + 3y - z &= a \\
x + y + 2z &= b \\
2y - 3z &= c
\end{aligned}
\]

28.
\[
\begin{aligned}
x + 3y + z &= a \\
-x - 2y + z &= b \\
3x + 7y - z &= c
\end{aligned}
\]

In Exercises 29–30, solve the following systems, where a, b, and c are constants.

29.
\[
\begin{aligned}
2x + y &= a \\
3x + 6y &= b
\end{aligned}
\]

30.
\[
\begin{aligned}
x_1 + x_2 + x_3 &= a \\
2x_1 + 2x_3 &= b \\
3x_2 + 3x_3 &= c
\end{aligned}
\]

31. Find two different row echelon forms of

\[
\begin{bmatrix}
1 & 3 \\
2 & 7
\end{bmatrix}
\]

This exercise shows that a matrix can have multiple row echelon forms.

32. Reduce

\[
\begin{bmatrix}
2 & 1 & 3 \\
0 & -2 & -29 \\
3 & 4 & 5
\end{bmatrix}
\]

to reduced row echelon form without introducing fractions at any intermediate stage.

33. Show that the following nonlinear system has 18 solutions if 0 ≤ α ≤ 2π, 0 ≤ β ≤ 2π, and 0 ≤ γ ≤ 2π.

\[
\begin{aligned}
\sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\
2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\
-\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0
\end{aligned}
\]

[Hint: Begin by making the substitutions x = sin α, y = cos β, and z = tan γ.]

34. Solve the following system of nonlinear equations for the unknown angles α, β, and γ, where 0 ≤ α ≤ 2π, 0 ≤ β ≤ 2π, and 0 ≤ γ < π.

\[
\begin{aligned}
2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\
4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2 \\
6\sin\alpha - 3\cos\beta + \tan\gamma &= 9
\end{aligned}
\]

35. Solve the following system of nonlinear equations for x, y, and z.

\[
\begin{aligned}
x^2 + y^2 + z^2 &= 6 \\
x^2 - y^2 + 2z^2 &= 2 \\
2x^2 + y^2 - z^2 &= 3
\end{aligned}
\]

[Hint: Begin by making the substitutions X = x², Y = y², Z = z².]

36. Solve the following system for x, y, and z.

\[
\begin{aligned}
\frac{1}{x} + \frac{2}{y} - \frac{4}{z} &= 1 \\
\frac{2}{x} + \frac{3}{y} + \frac{8}{z} &= 0 \\
-\frac{1}{x} + \frac{9}{y} + \frac{10}{z} &= 5
\end{aligned}
\]

37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation y = ax³ + bx² + cx + d.

[Figure Ex-37: a curve passing through the points (0, 10), (1, 7), (3, −11), and (4, −14).]

38. Find the coefficients a, b, c, and d so that the circle shown in the accompanying figure is given by the equation ax² + ay² + bx + cy + d = 0.

[Figure Ex-38: a circle passing through the points (−2, 7), (−4, 5), and (4, −3).]

39. If the linear system

\[
\begin{aligned}
a_1 x + b_1 y + c_1 z &= 0 \\
a_2 x - b_2 y + c_2 z &= 0 \\
a_3 x + b_3 y - c_3 z &= 0
\end{aligned}
\]

has only the trivial solution, what can be said about the solutions of the following system?

\[
\begin{aligned}
a_1 x + b_1 y + c_1 z &= 3 \\
a_2 x - b_2 y + c_2 z &= 7 \\
a_3 x + b_3 y - c_3 z &= 11
\end{aligned}
\]

40. (a) If A is a matrix with three rows and five columns, then what is the maximum possible number of leading 1’s in its reduced row echelon form?

(b) If B is a matrix with three rows and six columns, then what is the maximum possible number of parameters in the general solution of the linear system with augmented matrix B?

(c) If C is a matrix with five rows and three columns, then what is the minimum possible number of rows of zeros in any row echelon form of C?


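41. Describe all possible reduced row echelon forms of

\[
\text{(a)}\ \begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i
\end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix}
a & b & c & d \\
e & f & g & h \\
i & j & k & l \\
m & n & p & q
\end{bmatrix}
\]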

42. Consider the system of equations

\[
\begin{aligned}
ax + by &= 0 \\
cx + dy &= 0 \\
ex + fy &= 0
\end{aligned}
\]

Discuss the relative positions of the lines ax + by = 0, cx + dy = 0, and ex + fy = 0 when the system has only the trivial solution and when it has nontrivial solutions.

Working with Proofs

43. (a) Prove that if ad − bc ≠ 0, then the reduced row echelon form of

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\quad\text{is}\quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

(b) Use the result in part (a) to prove that if ad − bc ≠ 0, then the linear system

\[
\begin{aligned}
ax + by &= k \\
cx + dy &= l
\end{aligned}
\]

has exactly one solution.

True-False Exercises

TF. In parts (a)–(i) determine whether the statement is true or false, and justify your answer.

(a) If a matrix is in reduced row echelon form, then it is also in row echelon form.

(b) If an elementary row operation is applied to a matrix that is in row echelon form, the resulting matrix will still be in row echelon form.

(c) Every matrix has a unique row echelon form.

(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon form with r leading 1’s has n − r free variables.

(e) All leading 1’s in a matrix in row echelon form must occur in different columns.

(f) If every column of a matrix in row echelon form has a leading 1, then all entries that are not leading 1’s are zero.

(g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced row echelon form containing n leading 1’s, then the linear system has only the trivial solution.

(h) If the reduced row echelon form of the augmented matrix for a linear system has a row of zeros, then the system must have infinitely many solutions.

(i) If a linear system has more unknowns than equations, then it must have infinitely many solutions.

Working with Technology

T1. Find the reduced row echelon form of the augmented matrix for the linear system:

\[
\begin{aligned}
6x_1 + x_2 + 4x_4 &= -3 \\
-9x_1 + 2x_2 + 3x_3 - 8x_4 &= 1 \\
7x_1 - 4x_3 + 5x_4 &= 2
\end{aligned}
\]

Use your result to determine whether the system is consistent and, if so, find its solution.

T2. Find values of the constants A, B, C, and D that make the following equation an identity (i.e., true for all values of x).

\[
\frac{3x^3 + 4x^2 - 6x}{(x^2 + 2x + 2)(x^2 - 1)} = \frac{Ax + B}{x^2 + 2x + 2} + \frac{C}{x - 1} + \frac{D}{x + 1}
\]

[Hint: Obtain a common denominator on the right, and then equate corresponding coefficients of the various powers of x in the two numerators. Students of calculus will recognize this as a problem in partial fractions.]

1.3 Matrices and Matrix Operations

Rectangular arrays of real numbers arise in contexts other than as augmented matrices for linear systems. In this section we will begin to study matrices as objects in their own right by defining operations of addition, subtraction, and multiplication on them.

Matrix Notation and Terminology

In Section 1.2 we used rectangular arrays of numbers, called augmented matrices, to abbreviate systems of linear equations. However, rectangular arrays of numbers occur in other contexts as well. For example, the following rectangular array with three rows and seven columns might describe the number of hours that a student spent studying three subjects during a certain week:


            Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
Math          2      3       2      4       1      4      2
History       0      3       1      4       3      2      2
Language      4      1       3      1       0      0      2

If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and seven columns, called a “matrix”:



\[
\begin{bmatrix}
2 & 3 & 2 & 4 & 1 & 4 & 2 \\
0 & 3 & 1 & 4 & 3 & 2 & 2 \\
4 & 1 & 3 & 1 & 0 & 0 & 2
\end{bmatrix}
\]

More generally, we make the following definition.

DEFINITION 1 A matrix is a rectangular array of numbers. The numbers in the array are called the entries in the matrix.

EXAMPLE 1 Examples of Matrices

Some examples of matrices are

\[
\begin{bmatrix}
1 & 2 \\
3 & 0 \\
-1 & 4
\end{bmatrix},
\quad
\begin{bmatrix} 2 & 1 & 0 & -3 \end{bmatrix},
\quad
\begin{bmatrix}
e & \pi & -\sqrt{2} \\
0 & \tfrac{1}{2} & 1 \\
0 & 0 & 0
\end{bmatrix},
\quad
\begin{bmatrix} 1 \\ 3 \end{bmatrix},
\quad
[4]
\]

Matrix brackets are often omitted from 1 × 1 matrices, making it impossible to tell, for example, whether the symbol 4 denotes the number “four” or the matrix [4]. This rarely causes problems because it is usually possible to tell which is meant from the context.

The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written 3 × 2). In a size description, the first number always denotes the number of rows, and the second denotes the number of columns. The remaining matrices in Example 1 have sizes 1 × 4, 3 × 3, 2 × 1, and 1 × 1, respectively. A matrix with only one row, such as the second in Example 1, is called a row vector (or a row matrix), and a matrix with only one column, such as the fourth in that example, is called a column vector (or a column matrix). The fifth matrix in that example is both a row vector and a column vector. We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write



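\[
A = \begin{bmatrix} 2 & 1 & 7 \\ 3 & 4 & 2 \end{bmatrix}
\quad\text{or}\quad
C = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}
\]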

When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars will be real numbers; complex scalars will be considered later in the text. The entry that occurs in row i and column j of a matrix A will be denoted by aij . Thus a general 3 × 4 matrix might be written as

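\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix}
\]

and a general m × n matrix as

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\tag{1}
\]

A matrix with n rows and n columns is said to be a square matrix of order n.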

When a compact notation is desired, the preceding matrix can be written as

\[
[a_{ij}]_{m \times n} \quad\text{or}\quad [a_{ij}]
\]

the first notation being used when it is important in the discussion to know the size, and the second when the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its entries; thus, for a matrix B we would generally use bij for the entry in row i and column j, and for a matrix C we would use the notation cij.

The entry in row i and column j of a matrix A is also commonly denoted by the symbol (A)ij. Thus, for matrix (1) above, we have

\[
(A)_{ij} = a_{ij}
\]

and for the matrix

\[
A = \begin{bmatrix} 2 & -3 \\ 7 & 0 \end{bmatrix}
\]

we have (A)11 = 2, (A)12 = −3, (A)21 = 7, and (A)22 = 0.

Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general 1 × n row vector a and a general m × 1 column vector b would be written as

\[
\mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
\quad\text{and}\quad
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

A matrix A with n rows and n columns is called a square matrix of order n, and the diagonal entries a11, a22, . . . , ann in (2) are said to be on the main diagonal of A.

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\tag{2}
\]

Operations on Matrices

So far, we have used matrices to abbreviate the work in solving systems of linear equations. For other applications, however, it is desirable to develop an “arithmetic of matrices” in which matrices can be added, subtracted, and multiplied in a useful way. The remainder of this section will be devoted to developing this arithmetic.

DEFINITION 2 Two matrices are defined to be equal if they have the same size and their corresponding entries are equal.


EXAMPLE 2 Equality of Matrices

Consider the matrices

\[
A = \begin{bmatrix} 2 & 1 \\ 3 & x \end{bmatrix},
\quad
B = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix},
\quad
C = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix}
\]

If x = 5, then A = B, but for all other values of x the matrices A and B are not equal, since not all of their corresponding entries are equal. There is no value of x for which A = C since A and C have different sizes.

The equality of two matrices A = [aij] and B = [bij] of the same size can be expressed either by writing (A)ij = (B)ij or by writing aij = bij, where it is understood that the equalities hold for all values of i and j.

DEFINITION 3 If A and B are matrices of the same size, then the sum A + B is the matrix obtained by adding the entries of B to the corresponding entries of A, and the difference A − B is the matrix obtained by subtracting the entries of B from the corresponding entries of A. Matrices of different sizes cannot be added or subtracted.

In matrix notation, if A = [aij ] and B = [bij ] have the same size, then

\[
(A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij}
\quad\text{and}\quad
(A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij}
\]

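EXAMPLE 3 Addition and Subtraction

Consider the matrices

\[
A = \begin{bmatrix}
2 & 1 & 0 & 3 \\
-1 & 0 & 2 & 4 \\
4 & -2 & 7 & 0
\end{bmatrix},
\quad
B = \begin{bmatrix}
-4 & 3 & 5 & 1 \\
2 & 2 & 0 & -1 \\
3 & 2 & -4 & 5
\end{bmatrix},
\quad
C = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}
\]

Then

\[
A + B = \begin{bmatrix}
-2 & 4 & 5 & 4 \\
1 & 2 & 2 & 3 \\
7 & 0 & 3 & 5
\end{bmatrix}
\quad\text{and}\quad
A - B = \begin{bmatrix}
6 & -2 & -5 & 2 \\
-3 & -2 & 2 & 5 \\
1 & -4 & 11 & -5
\end{bmatrix}
\]

The expressions A + C, B + C, A − C, and B − C are undefined.

DEFINITION 4 If A is any matrix and c is any scalar, then the product cA is the matrix obtained by multiplying each entry of the matrix A by c. The matrix cA is said to be a scalar multiple of A.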

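In matrix notation, if A = [aij], then

\[
(cA)_{ij} = c(A)_{ij} = ca_{ij}
\]

EXAMPLE 4 Scalar Multiples

For the matrices

\[
A = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 3 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 0 & 2 & 7 \\ -1 & 3 & -5 \end{bmatrix},
\quad
C = \begin{bmatrix} 9 & -6 & 3 \\ 3 & 0 & 12 \end{bmatrix}
\]

we have

\[
2A = \begin{bmatrix} 4 & 6 & 8 \\ 2 & 6 & 2 \end{bmatrix},
\quad
(-1)B = \begin{bmatrix} 0 & -2 & -7 \\ 1 & -3 & 5 \end{bmatrix},
\quad
\tfrac{1}{3}C = \begin{bmatrix} 3 & -2 & 1 \\ 1 & 0 & 4 \end{bmatrix}
\]

It is common practice to denote (−1)B by −B.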
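Definitions 3 and 4 translate directly into entry-by-entry computations. The Python sketch below represents a matrix as a list of rows and reproduces the results of Examples 3 and 4; the function names `add`, `subtract`, and `scalar_multiple` are hypothetical, and raising an error for mismatched sizes is one possible way to model "undefined."

```python
from fractions import Fraction

def same_size(A, B):
    """True if A and B have the same number of rows and columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def add(A, B):
    if not same_size(A, B):
        raise ValueError("A + B is undefined for matrices of different sizes")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    if not same_size(A, B):
        raise ValueError("A - B is undefined for matrices of different sizes")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_multiple(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]

# Matrices of Example 3.
A3 = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B3 = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]
C3 = [[1, 1], [2, 2]]

# Matrix C of Example 4.
C4 = [[9, -6, 3], [3, 0, 12]]

print(add(A3, B3))
print(subtract(A3, B3))
print(scalar_multiple(Fraction(1, 3), C4))
```

Calling `add(A3, C3)` raises `ValueError`, which corresponds to the statement in Example 3 that A + C is undefined.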


Thus far we have defined multiplication of a matrix by a scalar but not the multiplication of two matrices. Since matrices are added by adding corresponding entries and subtracted by subtracting corresponding entries, it would seem natural to define multiplication of matrices by multiplying corresponding entries. However, it turns out that such a definition would not be very useful for most problems. Experience has led mathematicians to the following more useful definition of matrix multiplication.

DEFINITION 5 If A is an m × r matrix and B is an r × n matrix, then the product AB is the m × n matrix whose entries are determined as follows: To find the entry in row i and column j of AB, single out row i from the matrix A and column j from the matrix B. Multiply the corresponding entries from the row and column together, and then add up the resulting products.

E X A M P L E 5 Multiplying Matrices

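Consider the matrices

\[
A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
\]

Since A is a 2 × 3 matrix and B is a 3 × 4 matrix, the product AB is a 2 × 4 matrix. To determine, for example, the entry in row 2 and column 3 of AB, we single out row 2 from A and column 3 from B. Then, as illustrated below, we multiply corresponding entries together and add up these products.

\[
\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} \square & \square & \square & \square \\ \square & \square & 26 & \square \end{bmatrix}
\]

(2 · 4) + (6 · 3) + (0 · 5) = 26

The entry in row 1 and column 4 of AB is computed as follows:

\[
\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} \square & \square & \square & 13 \\ \square & \square & \square & \square \end{bmatrix}
\]

(1 · 3) + (2 · 1) + (4 · 2) = 13

The computations for the remaining entries are

(1 · 4) + (2 · 0) + (4 · 2) = 12
(1 · 1) − (2 · 1) + (4 · 7) = 27
(1 · 4) + (2 · 3) + (4 · 5) = 30
(2 · 4) + (6 · 0) + (0 · 2) = 8
(2 · 1) − (6 · 1) + (0 · 7) = −4
(2 · 3) + (6 · 1) + (0 · 2) = 12

\[
AB = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}
\]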
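The procedure of Definition 5 can be sketched in Python as follows, assuming matrices are stored as lists of rows. The function name `mat_mul` is an illustrative choice; the data are the matrices of Example 5.

```python
def mat_mul(A, B):
    """Return AB: entry (i, j) pairs row i of A with column j of B."""
    m, r = len(A), len(A[0])
    if len(B) != r:
        raise ValueError("AB is undefined: columns of A must equal rows of B")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
            for i in range(m)]


# The matrices of Example 5.
A = [[1, 2, 4],
     [2, 6, 0]]
B = [[4, 1, 4, 3],
     [0, -1, 3, 1],
     [2, 7, 5, 2]]

AB = mat_mul(A, B)
print(AB)        # [[12, 27, 30, 13], [8, -4, 26, 12]]
print(AB[1][2])  # row 2, column 3 of AB (zero-based indices 1 and 2): 26
```

Note that Python indexes from zero, so the entry the text calls row 2, column 3 is `AB[1][2]`.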

The definition of matrix multiplication requires that the number of columns of the first factor A be the same as the number of rows of the second factor B in order to form the product AB. If this condition is not satisfied, the product is undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of the first factor and, to the right of it, write down the size of the second factor. If, as in (3), the inside numbers are the same, then the product is defined. The outside numbers then give the size of the product.

\[
\underset{m \times r}{A} \;\; \underset{r \times n}{B} \;=\; \underset{m \times n}{AB} \tag{3}
\]

In (3) the inside numbers are the two r's, and the outside numbers are m and n.
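The inside/outside rule in (3) amounts to a one-line size check. Here is a minimal Python sketch; the function name `product_size` and the convention of returning `None` for an undefined product are assumptions made for illustration.

```python
def product_size(size_a, size_b):
    """Return the size of AB as (rows, columns), or None if AB is undefined."""
    m, r_a = size_a
    r_b, n = size_b
    if r_a != r_b:   # the inside numbers must agree
        return None
    return (m, n)    # the outside numbers give the size of the product


print(product_size((2, 3), (3, 4)))  # (2, 4)
print(product_size((3, 4), (2, 3)))  # None
```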

E X A M P L E 6 Determining Whether a Product Is Defined

Suppose that A, B, and C are matrices with the following sizes:

  A       B       C
3 × 4   4 × 7   7 × 3

Then by (3), AB is defined and is a 3 × 7 matrix; BC is defined and is a 4 × 3 matrix; and CA is defined and is a 7 × 4 matrix. The products AC, CB, and BA are all undefined.

In general, if A = [aij] is an m × r matrix and B = [bij] is an r × n matrix, then, as indicated by row i of A and column j of B in the following display,

\[
AB = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1r} \\
a_{21} & a_{22} & \cdots & a_{2r} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ir} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mr}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{r1} & b_{r2} & \cdots & b_{rj} & \cdots & b_{rn}
\end{bmatrix} \tag{4}
\]

the entry (AB)ij in row i and column j of AB is given by

\[
(AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj} \tag{5}
\]

Formula (5) is called the row-column rule for matrix multiplication.

Partitioned Matrices

A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between selected rows and columns. For example, the following are three possible partitions of a general 3 × 4 matrix A: the first is a partition of A into four submatrices A11, A12, A21, and A22; the second is a partition of A into its row vectors r1, r2, and r3; and the third is a partition of A into its column vectors c1, c2, c3, and c4:

\[
A = \left[\begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]

\[
A = \left[\begin{array}{cccc}
a_{11} & a_{12} & a_{13} & a_{14} \\ \hline
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix}
\]

\[
A = \left[\begin{array}{c|c|c|c}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{array}\right]
= \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 & \mathbf{c}_4 \end{bmatrix}
\]

Gotthold Eisenstein (1823–1852)

Historical Note The concept of matrix multiplication is due to the German mathematician Gotthold Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear systems. The idea was then expanded on and formalized by Cayley in his Memoir on the Theory of Matrices that was published in 1858. Eisenstein was a pupil of Gauss, who ranked him as the equal of Isaac Newton and Archimedes. However, Eisenstein, suffering from bad health his entire life, died at age 30, so his potential was never realized. [Image: http://www-history.mcs.st-andrews.ac.uk/Biographies/Eisenstein.html]

Matrix Multiplication by Columns and by Rows

Partitioning has many uses, one of which is for finding particular rows or columns of a matrix product AB without computing the entire product. Specifically, the following formulas, whose proofs are left as exercises, show how individual column vectors of AB can be obtained by partitioning B into column vectors and how individual row vectors of AB can be obtained by partitioning A into row vectors.

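\[
AB = A\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix}
= \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix} \tag{6}
\]

(AB computed column by column)

\[
AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B
= \begin{bmatrix} \mathbf{a}_1 B \\ \mathbf{a}_2 B \\ \vdots \\ \mathbf{a}_m B \end{bmatrix} \tag{7}
\]

(AB computed row by row)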

In words, these formulas state that

j th column vector of AB = A[ j th column vector of B]     (8)

i th row vector of AB = [i th row vector of A]B     (9)

We now have three methods for computing a product of two matrices: entry by entry using Definition 5, column by column using Formula (8), and row by row using Formula (9). We will call these the entry method, the column method, and the row method, respectively.
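Formulas (8) and (9) can be tried out directly. The sketch below, in Python with matrices as lists of rows, extracts one column or one row of a product without forming the whole product; the helper names `column_of_product` and `row_of_product` are hypothetical, and indices are zero-based. The data are the matrices of Example 5.

```python
def mat_mul(A, B):
    """Return AB by the row-column rule."""
    if len(A[0]) != len(B):
        raise ValueError("AB is undefined")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def column_of_product(A, B, j):
    """Column j of AB, computed as A times column j of B (Formula (8))."""
    b_j = [[row[j]] for row in B]  # column j of B as an r x 1 matrix
    return mat_mul(A, b_j)


def row_of_product(A, B, i):
    """Row i of AB, computed as row i of A times B (Formula (9))."""
    return mat_mul([A[i]], B)      # row i of A as a 1 x r matrix


A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

print(column_of_product(A, B, 1))  # second column of AB: [[27], [-4]]
print(row_of_product(A, B, 0))     # first row of AB: [[12, 27, 30, 13]]
```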

E X A M P L E 7 Example 5 Revisited

If A and B are the matrices in Example 5, then from (8) the second column vector of AB can be obtained by the computation

\[
\begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}
= \begin{bmatrix} 27 \\ -4 \end{bmatrix}
\]

in which the column vector on the left is the second column of B and the result is the second column of AB, and from (9) the first row vector of AB can be obtained by the computation

\[
\begin{bmatrix} 1 & 2 & 4 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} 12 & 27 & 30 & 13 \end{bmatrix}
\]

in which the row vector on the left is the first row of A and the result is the first row of AB.

Matrix Products as Linear Combinations

The following definition provides yet another way of thinking about matrix multiplication.

DEFINITION 6 If A1, A2, . . . , Ar are matrices of the same size, and if c1, c2, . . . , cr are scalars, then an expression of the form

c1A1 + c2A2 + · · · + crAr

is called a linear combination of A1, A2, . . . , Ar with coefficients c1, c2, . . . , cr.

Definition 6 is applicable, in particular, to row and column vectors. Thus, for example, a linear combination of column vectors x1, x2, . . . , xr of the same size is an expression of the form

c1x1 + c2x2 + · · · + crxr

To see how matrix products can be viewed as linear combinations, let A be an m × n matrix and x an n × 1 column vector, say



\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

Then

\[
A\mathbf{x} = \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \tag{10}
\]

This proves the following theorem.

THEOREM 1.3.1 If A is an m × n matrix, and if x is an n × 1 column vector, then the product Ax can be expressed as a linear combination of the column vectors of A in which the coefficients are the entries of x.

E X A M P L E 8 Matrix Products as Linear Combinations

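The matrix product

\[
\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix}
\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]

can be written as the following linear combination of column vectors:

\[
2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
- 1 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
+ 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]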
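Theorem 1.3.1 can be checked numerically on the data of Example 8. The following Python sketch computes Ax two ways, entry by entry and as a linear combination of the columns of A; the helper names `columns`, `linear_combination`, and `mat_vec` are assumptions made for illustration, and vectors are stored as plain lists.

```python
def columns(A):
    """Return the column vectors of A as plain lists."""
    return [list(col) for col in zip(*A)]


def linear_combination(coefficients, vectors):
    """Return c1*v1 + c2*v2 + ... for vectors of equal length."""
    length = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coefficients, vectors))
            for i in range(length)]


def mat_vec(A, x):
    """Return Ax entry by entry (each row of A against x)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# The matrix and column vector of Example 8.
A = [[-1, 3, 2],
     [1, 2, -3],
     [2, 1, -2]]
x = [2, -1, 3]

print(mat_vec(A, x))                      # [1, -9, -3]
print(linear_combination(x, columns(A)))  # [1, -9, -3]
```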


E X A M P L E 9 Columns of a Product AB as Linear Combinations

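We showed in Example 5 that

\[
AB = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}
\begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix}
= \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix}
\]

It follows from Formula (6) and Theorem 1.3.1 that the j th column vector of AB can be expressed as a linear combination of the column vectors of A in which the coefficients in the linear combination are the entries from the j th column of B. The computations are as follows:

\[
\begin{bmatrix} 12 \\ 8 \end{bmatrix}
= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix}
\]

\[
\begin{bmatrix} 27 \\ -4 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 7 \begin{bmatrix} 4 \\ 0 \end{bmatrix}
\]

\[
\begin{bmatrix} 30 \\ 26 \end{bmatrix}
= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 5 \begin{bmatrix} 4 \\ 0 \end{bmatrix}
\]

\[
\begin{bmatrix} 13 \\ 12 \end{bmatrix}
= 3 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix}
\]

Column-Row Expansion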

Partitioning provides yet another way to view matrix multiplication. Specifically, suppose that an m × r matrix A is partitioned into its r column vectors c1 , c2 , . . . , cr (each of size m × 1) and an r × n matrix B is partitioned into its r row vectors r1 , r2 , . . . , rr (each of size 1 × n). Each term in the sum c1 r1 + c2 r2 + · · · + cr rr has size m × n so the sum itself is an m × n matrix. We leave it as an exercise for you to verify that the entry in row i and column j of the sum is given by the expression on the right side of Formula (5), from which it follows that

AB = c1 r1 + c2 r2 + · · · + cr rr

(11)

We call (11) the column-row expansion of AB .
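Formula (11) can be illustrated with a short Python sketch that builds each term c_k r_k and then sums the terms. The function names `outer`, `column_row_expansion`, and `mat_sum` are hypothetical, as are the small 2 × 2 matrices used for the demonstration.

```python
def outer(c, r):
    """Product of an m x 1 column c and a 1 x n row r (both plain lists)."""
    return [[ci * rj for rj in r] for ci in c]


def column_row_expansion(A, B):
    """Return the list of m x n terms c_1 r_1, c_2 r_2, ..., c_r r_r."""
    cols = [list(col) for col in zip(*A)]
    if len(cols) != len(B):
        raise ValueError("AB is undefined")
    return [outer(c, r) for c, r in zip(cols, B)]


def mat_sum(terms):
    """Add a list of matrices that all have the same size."""
    total = [[0] * len(terms[0][0]) for _ in terms[0]]
    for T in terms:
        total = [[t + u for t, u in zip(rt, ru)] for rt, ru in zip(total, T)]
    return total


# Hypothetical matrices chosen only for the demonstration.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
terms = column_row_expansion(A, B)
print(terms)           # [[[5, 6], [15, 18]], [[14, 16], [28, 32]]]
print(mat_sum(terms))  # [[19, 22], [43, 50]], which equals AB
```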

E X A M P L E 10 Column-Row Expansion

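Find the column-row expansion of the product

\[
AB = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 4 \\ -3 & 5 & 1 \end{bmatrix} \tag{12}
\]

Solution The column vectors of A and the row vectors of B are, respectively,

\[
\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}; \quad
\mathbf{r}_1 = \begin{bmatrix} 2 & 0 & 4 \end{bmatrix}, \quad
\mathbf{r}_2 = \begin{bmatrix} -3 & 5 & 1 \end{bmatrix}
\]

Thus, it follows from (11) that the column-row expansion of AB is

\[
AB = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 0 & 4 \end{bmatrix}
+ \begin{bmatrix} 3 \\ -1 \end{bmatrix} \begin{bmatrix} -3 & 5 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 0 & 4 \\ 4 & 0 & 8 \end{bmatrix}
+ \begin{bmatrix} -9 & 15 & 3 \\ 3 & -5 & -1 \end{bmatrix} \tag{13}
\]

As a check, we leave it for you to confirm that the product in (12) and the sum in (13) both yield

\[
AB = \begin{bmatrix} -7 & 15 & 7 \\ 7 & -5 & 7 \end{bmatrix}
\]

The main use of the column-row expansion is for developing theoretical results rather than for numerical computations.

Matrix Form of a Linear System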

Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear equations in n unknowns:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in this system by the single matrix equation

\[
\begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

The m × 1 matrix on the left side of this equation can be written as a product to give

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

If we designate these matrices by A, x, and b, respectively, then we can replace the original system of m equations in n unknowns by the single matrix equation

Ax = b

The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is obtained by adjoining b to A as the last column; thus the augmented matrix is

\[
[A \mid \mathbf{b}] = \left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right]
\]

The vertical partition line in the augmented matrix [A | b] is optional, but is a useful way of visually separating the coefficient matrix A from the column vector b.

Transpose of a Matrix

We conclude this section by defining two matrix operations that have no analogs in the arithmetic of real numbers.

DEFINITION 7 If A is any m × n matrix, then the transpose of A, denoted by A^T, is defined to be the n × m matrix that results by interchanging the rows and columns of A; that is, the first column of A^T is the first row of A, the second column of A^T is the second row of A, and so forth.

E X A M P L E 11 Some Transposes

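The following are some examples of matrices and their transposes.

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34}
\end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 3 \\ 1 & 4 \\ 5 & 6 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad
D = [4]
\]

\[
A^T = \begin{bmatrix}
a_{11} & a_{21} & a_{31} \\
a_{12} & a_{22} & a_{32} \\
a_{13} & a_{23} & a_{33} \\
a_{14} & a_{24} & a_{34}
\end{bmatrix}, \quad
B^T = \begin{bmatrix} 2 & 1 & 5 \\ 3 & 4 & 6 \end{bmatrix}, \quad
C^T = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad
D^T = [4]
\]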

Observe that not only are the columns of A^T the rows of A, but the rows of A^T are the columns of A. Thus the entry in row i and column j of A^T is the entry in row j and column i of A; that is,

\[
(A^T)_{ij} = (A)_{ji} \tag{14}
\]

Note the reversal of the subscripts. In the special case where A is a square matrix, the transpose of A can be obtained by interchanging entries that are symmetrically positioned about the main diagonal. In (15) we see that A^T can also be obtained by “reflecting” A about its main diagonal.

\[
A = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 7 & 0 \\ 5 & 8 & 6 \end{bmatrix}
\quad \longrightarrow \quad
A^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 7 & 8 \\ 4 & 0 & 6 \end{bmatrix} \tag{15}
\]

Interchange entries that are symmetrically positioned about the main diagonal.
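Formula (14) is essentially a recipe for computing a transpose. A minimal Python sketch, assuming matrices are lists of rows and using the illustrative name `transpose`, is shown below with the matrices B and C of Example 11.

```python
def transpose(A):
    """Return A^T: entry (i, j) of the result is entry (j, i) of A."""
    rows, cols = len(A), len(A[0])
    return [[A[j][i] for j in range(rows)] for i in range(cols)]


B = [[2, 3], [1, 4], [5, 6]]   # B from Example 11
C = [[1, 3, 5]]                # C from Example 11, a 1 x 3 matrix

print(transpose(B))                  # [[2, 1, 5], [3, 4, 6]]
print(transpose(C))                  # [[1], [3], [5]]
print(transpose(transpose(B)) == B)  # True
```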

James Sylvester (1814–1897)

Arthur Cayley (1821–1895)

Historical Note The term matrix was first used by the English mathematician James Sylvester, who defined the term in 1850 to be an “oblong arrangement of terms.” Sylvester communicated his work on matrices to a fellow English mathematician and lawyer named Arthur Cayley, who then introduced some of the basic operations on matrices in a book entitled Memoir on the Theory of Matrices that was published in 1858. As a matter of interest, Sylvester, who was Jewish, did not get his college degree because he refused to sign a required oath to the Church of England. He was appointed to a chair at the University of Virginia in the United States but resigned after swatting a student with a stick because he was reading a newspaper in class. Sylvester, thinking he had killed the student, fled back to England on the first available ship. Fortunately, the student was not dead, just in shock! [Images: © Bettmann/CORBIS (Sylvester ); Photo Researchers/Getty Images (Cayley )]


Trace of a Matrix

DEFINITION 8 If A is a square matrix, then the trace of A, denoted by tr(A), is defined to be the sum of the entries on the main diagonal of A. The trace of A is undefined if A is not a square matrix.

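E X A M P L E 12 Trace

The following are examples of matrices and their traces.

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}, \quad
B = \begin{bmatrix}
-1 & 2 & 7 & 0 \\
3 & 5 & -8 & 4 \\
1 & 2 & 7 & -3 \\
4 & -2 & 1 & 0
\end{bmatrix}
\]

tr(A) = a11 + a22 + a33

tr(B) = −1 + 5 + 7 + 0 = 11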
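Definition 8 can be sketched in Python as follows. The function name `trace` is an illustrative choice, and raising an error for a non-square matrix is one possible way to model "undefined"; the matrix is B from Example 12.

```python
def trace(A):
    """Return tr(A), the sum of the main-diagonal entries of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("tr(A) is undefined: A is not square")
    return sum(A[i][i] for i in range(len(A)))


B = [[-1, 2, 7, 0],
     [3, 5, -8, 4],
     [1, 2, 7, -3],
     [4, -2, 1, 0]]

print(trace(B))  # 11
```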

In the exercises you will have some practice working with the transpose and trace operations.

Exercise Set 1.3

In Exercises 1–2, suppose that A, B, C, D, and E are matrices with the following sizes:

   A         B         C         D         E
(4 × 5)   (4 × 5)   (5 × 2)   (4 × 2)   (5 × 4)

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.

1. (a) BA   (b) AB^T   (c) AC + D   (d) E(AC)   (e) A − 3E^T   (f) E(5B + A)

2. (a) CD^T   (b) DC   (c) BC − 3D   (d) D^T(BE)   (e) B^T D + ED   (f) BA^T + D

In Exercises 3–6, use the following matrices to compute the indicated expression if it is defined.

\[
A = \begin{bmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 1 & 5 \end{bmatrix},
\]

\[
D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad
E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}
\]

3. (a) D + E   (b) D − E   (c) 5A   (d) −7C   (e) 2B − C   (f) 4E − 2D   (g) −3(D + 2E)   (h) A − A   (i) tr(D)   (j) tr(D − 3E)   (k) 4 tr(7B)   (l) tr(A)

4. (a) 2A^T + C   (b) D^T − E^T   (c) (D − E)^T   (d) B^T + 5C^T   (e) (1/2)C^T − (1/4)A   (f) B − B^T   (g) 2E^T − 3D^T   (h) (2E^T − 3D^T)^T   (i) (CD)E   (j) C(BA)   (k) tr(DE^T)   (l) tr(BC)

5. (a) AB   (b) BA   (c) (3E)D   (d) (AB)C   (e) A(BC)   (f) CC^T   (g) (DA)^T   (h) (C^T B)A^T   (i) tr(DD^T)   (j) tr(4E^T − D)   (k) tr(C^T A^T + 2E^T)   (l) tr((EC^T)^T A)

6. (a) (2D^T − E)A   (b) (4B)C + 2B   (c) (−AC)^T + 5D^T   (d) (BA^T − 2C)^T   (e) B^T(CC^T − A^T A)   (f) D^T E^T − (ED)^T

In Exercises 7–8, use the following matrices and either the row method or the column method, as appropriate, to find the indicated row or column.

\[
A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix}
\]

7. (a) the first row of AB   (b) the third row of AB   (c) the second column of AB   (d) the first column of BA   (e) the third row of AA   (f) the third column of AA

8. (a) the first column of AB   (b) the third column of BB   (c) the second row of BB   (d) the first column of AA   (e) the third column of AB   (f) the first row of BA

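In Exercises 9–10, use matrices A and B from Exercises 7–8.

9. (a) Express each column vector of AA as a linear combination of the column vectors of A.
(b) Express each column vector of BB as a linear combination of the column vectors of B.

10. (a) Express each column vector of AB as a linear combination of the column vectors of A.
(b) Express each column vector of BA as a linear combination of the column vectors of B.

In each part of Exercises 11–12, find matrices A, x, and b that express the given linear system as a single matrix equation Ax = b, and write out this matrix equation.

11. (a) 2x1 − 3x2 + 5x3 = 7
        9x1 − x2 + x3 = −1
        x1 + 5x2 + 4x3 = 0
    (b) 4x1 − 3x3 + x4 = 1
        5x1 + x2 − 8x4 = 3
        2x1 − 5x2 + 9x3 − x4 = 0
        3x2 − x3 + 7x4 = 2

12. (a) x1 − 2x2 + 3x3 = −3
        2x1 + x2 = 0
        −3x2 + 4x3 = 1
        x1 + x3 = 5
    (b) 3x1 + 3x2 + 3x3 = −3
        −x1 − 5x2 − 2x3 = 3
        −4x2 + x3 = 0

In each part of Exercises 13–14, express the matrix equation as a system of linear equations.

13. (a) \[ \begin{bmatrix} 5 & 6 & -7 \\ -1 & -2 & 3 \\ 0 & 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix} \]
    (b) \[ \begin{bmatrix} 1 & 1 & 1 \\ 2 & 3 & 0 \\ 5 & -3 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ -9 \end{bmatrix} \]

14. (a) \[ \begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix} \]
    (b) \[ \begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix} \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]

In Exercises 15–16, find all values of k, if any, that satisfy the equation.

15. \[ \begin{bmatrix} k & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & -3 \end{bmatrix} \begin{bmatrix} k \\ 1 \\ 1 \end{bmatrix} = 0 \]

16. \[ \begin{bmatrix} 2 & 2 & k \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ k \end{bmatrix} = 0 \]

In Exercises 17–20, use the column-row expansion of AB to express this product as a sum of matrices.

17. \[ A = \begin{bmatrix} 4 & -3 \\ 2 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 & 2 \\ -2 & 3 & 1 \end{bmatrix} \]

18. \[ A = \begin{bmatrix} 0 & -2 \\ 4 & -3 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 4 & 1 \\ -3 & 0 & 2 \end{bmatrix} \]

19. \[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \]

20. \[ A = \begin{bmatrix} 0 & 4 & 2 \\ 1 & -2 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -1 \\ 4 & 0 \\ 1 & -1 \end{bmatrix} \]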

21. For the linear system in Example 5 of Section 1.2, express the general solution that we obtained in that example as a linear combination of column vectors that contain only numerical entries. [Suggestion: Rewrite the general solution as a single column vector, then write that column vector as a sum of column vectors each of which contains at most one parameter, and then factor out the parameters.]

22. Follow the directions of Exercise 21 for the linear system in Example 6 of Section 1.2.

In Exercises 23–24, solve the matrix equation for a, b, c, and d.

23. \[ \begin{bmatrix} a & 3 \\ -1 & a + b \end{bmatrix} = \begin{bmatrix} 4 & d - 2c \\ d + 2c & -2 \end{bmatrix} \]

24. \[ \begin{bmatrix} a - b & b + a \\ 3d + c & 2d - c \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 7 & 6 \end{bmatrix} \]

25. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of zeros.
(b) Find a similar result involving a column of zeros.

26. In each part, find a 6 × 6 matrix [aij] that satisfies the stated condition. Make your answers as general as possible by using letters rather than specific numbers for the nonzero entries.
(a) aij = 0 if i ≠ j   (b) aij = 0 if i > j   (c) aij = 0 if i < j   (d) aij = 0 if |i − j| > 1

In Exercises 27–28, how many 3 × 3 matrices A can you find for which the equation is satisfied for all choices of x, y, and z?

27. \[ A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + y \\ x - y \\ 0 \end{bmatrix} \]

28. \[ A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} xy \\ 0 \\ 0 \end{bmatrix} \]

29. A matrix B is said to be a square root of a matrix A if BB = A.
(a) Find two square roots of \( A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} \).
(b) How many different square roots can you find of \( A = \begin{bmatrix} 5 & 0 \\ 0 & 9 \end{bmatrix} \)?
(c) Do you think that every 2 × 2 matrix has at least one square root? Explain your reasoning.

30. Let 0 denote a 2 × 2 matrix, each of whose entries is zero.
(a) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer.
(b) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = A? Justify your answer.

31. Establish Formula (11) by using Formula (5) to show that

(AB)ij = (c1 r1 + c2 r2 + · · · + cr rr)ij

32. Find a 4 × 4 matrix A = [aij] whose entries satisfy the stated condition.
(a) aij = i + j   (b) aij = i^(j−1)   (c) aij = 1 if |i − j| > 1, and aij = −1 if |i − j| ≤ 1

33. Suppose that type I items cost $1 each, type II items cost $2 each, and type III items cost $3 each. Also, suppose that the accompanying table describes the number of items of each type purchased during the first four months of the year.

Table Ex-33
         Type I   Type II   Type III
Jan.       3        4         3
Feb.       5        6         0
Mar.       2        9         4
Apr.       1        1         7

What information is represented by the following product?

\[
\begin{bmatrix} 3 & 4 & 3 \\ 5 & 6 & 0 \\ 2 & 9 & 4 \\ 1 & 1 & 7 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]

34. The accompanying table shows a record of May and June unit sales for a clothing store. Let M denote the 4 × 3 matrix of May sales and J the 4 × 3 matrix of June sales.
(a) What does the matrix M + J represent?
(b) What does the matrix M − J represent?
(c) Find a column vector x for which Mx provides a list of the number of shirts, jeans, suits, and raincoats sold in May.
(d) Find a row vector y for which yM provides a list of the number of small, medium, and large items sold in May.
(e) Using the matrices x and y that you found in parts (c) and (d), what does yMx represent?

Table Ex-34

May Sales
             Small   Medium   Large
Shirts         45      60       75
Jeans          30      30       40
Suits          12      65       45
Raincoats      15      40       35

June Sales
             Small   Medium   Large
Shirts         30      33       40
Jeans          21      23       25
Suits           9      12       11
Raincoats       8      10        9

Working with Proofs

35. Prove: If A and B are n × n matrices, then tr(A + B) = tr(A) + tr(B)

36. (a) Prove: If AB and BA are both defined, then AB and BA are square matrices.
(b) Prove: If A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.

True-False Exercises

TF. In parts (a)–(o) determine whether the statement is true or false, and justify your answer.

(a) The matrix \( \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \) has no main diagonal.
(b) An m × n matrix has m column vectors and n row vectors.
(c) If A and B are 2 × 2 matrices, then AB = BA.
(d) The i th row vector of a matrix product AB can be computed by multiplying A by the i th row vector of B.
(e) For every matrix A, it is true that (A^T)^T = A.
(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B).
(g) If A and B are square matrices of the same order, then (AB)^T = A^T B^T.
(h) For every square matrix A, it is true that tr(A^T) = tr(A).
(i) If A is a 6 × 4 matrix and B is an m × n matrix such that B^T A^T is a 2 × 6 matrix, then m = 4 and n = 2.
(j) If A is an n × n matrix and c is a scalar, then tr(cA) = c tr(A).
(k) If A, B, and C are matrices of the same size such that A − C = B − C, then A = B.
(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B.
(m) If AB + BA is defined, then A and B are square matrices of the same size.
(n) If B has a column of zeros, then so does AB if this product is defined.
(o) If B has a column of zeros, then so does BA if this product is defined.

Working with Technology

T1. (a) Compute the product AB of the matrices in Example 5, and compare your answer to that in the text.
(b) Use your technology utility to extract the columns of A and the rows of B, and then calculate the product AB by a column-row expansion.

T2. Suppose that a manufacturer uses Type I items at $1.35 each, Type II items at $2.15 each, and Type III items at $3.95 each. Suppose also that the accompanying table describes the purchases of those items (in thousands of units) for the first quarter of the year. Write down a matrix product, the computation of which produces a matrix that lists the manufacturer’s expenditure in each month of the first quarter. Compute that product.

         Type I   Type II   Type III
Jan.      3.1      4.2       3.5
Feb.      5.1      6.8       0
Mar.      2.2      9.5       4.0
Apr.      1.0      1.0       7.4

1.4 Inverses; Algebraic Properties of Matrices

In this section we will discuss some of the algebraic properties of matrix operations. We will see that many of the basic rules of arithmetic for real numbers hold for matrices, but we will also see that some do not.

Properties of Matrix Addition and Scalar Multiplication

The following theorem lists the basic algebraic properties of the matrix operations.

THEOREM 1.4.1 Properties of Matrix Arithmetic

Assuming that the sizes of the matrices are such that the indicated operations can be performed, the following rules of matrix arithmetic are valid.

(a) A + B = B + A   [Commutative law for matrix addition]
(b) A + (B + C) = (A + B) + C   [Associative law for matrix addition]
(c) A(BC) = (AB)C   [Associative law for matrix multiplication]
(d) A(B + C) = AB + AC   [Left distributive law]
(e) (B + C)A = BA + CA   [Right distributive law]
(f) A(B − C) = AB − AC
(g) (B − C)A = BA − CA
(h) a(B + C) = aB + aC
(i) a(B − C) = aB − aC
(j) (a + b)C = aC + bC
(k) (a − b)C = aC − bC
(l) a(bC) = (ab)C
(m) a(BC) = (aB)C = B(aC)


To prove any of the equalities in this theorem we must show that the matrix on the left side has the same size as that on the right and that the corresponding entries on the two sides are the same. Most of the proofs follow the same pattern, so we will prove part (d ) as a sample. The proof of the associative law for multiplication is more complicated than the rest and is outlined in the exercises.

There are three basic ways to prove that two matrices of the same size are equal— prove that corresponding entries are the same, prove that corresponding row vectors are the same, or prove that corresponding column vectors are the same.

Proof (d) We must show that A(B + C) and AB + AC have the same size and that corresponding entries are equal. To form A(B + C), the matrices B and C must have the same size, say m × n, and the matrix A must then have m columns, so its size must be of the form r × m. This makes A(B + C) an r × n matrix. It follows that AB + AC is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size. Suppose that A = [aij], B = [bij], and C = [cij]. We want to show that corresponding entries of A(B + C) and AB + AC are equal; that is,

\[
\bigl(A(B + C)\bigr)_{ij} = (AB + AC)_{ij}
\]

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have

\[
\begin{aligned}
\bigl(A(B + C)\bigr)_{ij} &= a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}) \\
&= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}) \\
&= (AB)_{ij} + (AC)_{ij} = (AB + AC)_{ij}
\end{aligned}
\]

Remark Although the operations of matrix addition and matrix multiplication were defined for pairs of matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression without affecting the end result.

E X A M P L E 1 Associativity of Matrix Multiplication

As an illustration of the associative law for matrix multiplication, consider



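\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
\]

Then

\[
AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}
= \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}
\quad \text{and} \quad
BC = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}
\]

Thus

\[
(AB)C = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

and

\[
A(BC) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}
= \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}
\]

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c).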
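The computation in Example 1 is easy to confirm by machine. The following Python sketch multiplies the matrices in both groupings and compares the results; the helper name `mat_mul` is an assumption, and a single numerical check of course illustrates the associative law rather than proving it.

```python
def mat_mul(A, B):
    """Return AB by the row-column rule."""
    if len(A[0]) != len(B):
        raise ValueError("AB is undefined")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# The matrices of Example 1.
A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

left = mat_mul(mat_mul(A, B), C)    # (AB)C
right = mat_mul(A, mat_mul(B, C))   # A(BC)
print(left)           # [[18, 15], [46, 39], [4, 3]]
print(left == right)  # True
```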

Properties of Matrix Multiplication

Do not let Theorem 1.4.1 lull you into believing that all laws of real arithmetic carry over to matrix arithmetic. For example, you know that in real arithmetic it is always true that ab = ba, which is called the commutative law for multiplication. In matrix arithmetic, however, the equality of AB and BA can fail for three possible reasons:

1. AB may be defined and BA may not (for example, if A is 2 × 3 and B is 3 × 4).
2. AB and BA may both be defined, but they may have different sizes (for example, if A is 2 × 3 and B is 3 × 2).
3. AB and BA may both be defined and have the same size, but the two products may be different (as illustrated in the next example).

E X A M P L E 2 Order Matters in Matrix Multiplication

Consider the matrices

\[
A = \begin{bmatrix} -1 & 0 \\ 2 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}
\]

Multiplying gives

\[
AB = \begin{bmatrix} -1 & -2 \\ 11 & 4 \end{bmatrix}
\quad \text{and} \quad
BA = \begin{bmatrix} 3 & 6 \\ -3 & 0 \end{bmatrix}
\]

Thus, AB ≠ BA.

Do not read too much into Example 2. It does not rule out the possibility that AB and BA may be equal in certain cases, just that they are not equal in all cases. If it so happens that AB = BA, then we say that A and B commute.

Zero Matrices

A matrix whose entries are all zero is called a zero matrix. Some examples are



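\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
[0]
\]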

We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the m × n zero matrix by 0m×n . It should be evident that if A and 0 are matrices with the same size, then

A + 0 = 0 + A = A

Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation a + 0 = 0 + a = a. The following theorem lists the basic properties of zero matrices. Since the results should be self-evident, we will omit the formal proofs.

THEOREM 1.4.2 Properties of Zero Matrices

If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then:

(a) A + 0 = 0 + A = A
(b) A − 0 = A
(c) A − A = A + (−A) = 0
(d) 0A = 0
(e) If cA = 0, then c = 0 or A = 0.


Since we know that the commutative law of real arithmetic is not valid in matrix arithmetic, it should not be surprising that there are other rules that fail as well. For example, consider the following two laws of real arithmetic:

• If ab = ac and a ≠ 0, then b = c. [The cancellation law]
• If ab = 0, then at least one of the factors on the left is 0.

The next two examples show that these laws are not true in matrix arithmetic.

E X A M P L E 3 Failure of the Cancellation Law

Consider the matrices

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}, \quad
C = \begin{bmatrix} 2 & 5 \\ 3 & 4 \end{bmatrix}
\]

We leave it for you to confirm that

\[
AB = AC = \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix}
\]

Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead to the incorrect conclusion that B = C. Thus, the cancellation law does not hold, in general, for matrix multiplication (though there may be particular cases where it is true).
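The failures of commutativity and cancellation in Examples 2 and 3 can be confirmed with a short Python sketch. The helper name `mat_mul` and the variable names `A2` and `B2` (for the matrices of Example 2) are illustrative assumptions.

```python
def mat_mul(A, B):
    """Return AB by the row-column rule."""
    if len(A[0]) != len(B):
        raise ValueError("AB is undefined")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Example 2: AB and BA have the same size but differ.
A2 = [[-1, 0], [2, 3]]
B2 = [[1, 2], [3, 0]]
print(mat_mul(A2, B2))  # [[-1, -2], [11, 4]]
print(mat_mul(B2, A2))  # [[3, 6], [-3, 0]]

# Example 3: AB = AC although B and C are different.
A = [[0, 1], [0, 2]]
B = [[1, 1], [3, 4]]
C = [[2, 5], [3, 4]]
print(mat_mul(A, B) == mat_mul(A, C))  # True
print(B == C)                          # False
```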

E X A M P L E 4 A Zero Product with Nonzero Factors

Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0:

\[
A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 7 \\ 0 & 0 \end{bmatrix}
\]

Identity Matrices

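A square matrix with 1’s on the main diagonal and zeros elsewhere is called an identity matrix. Some examples are

\[
[1], \quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]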

An identity matrix is denoted by the letter I . If it is important to emphasize the size, we will write In for the n × n identity matrix. To explain the role of identity matrices in matrix arithmetic, let us consider the effect of multiplying a general 2 × 3 matrix A on each side by an identity matrix. Multiplying on the right by the 3 × 3 identity matrix yields

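\[
AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]

and multiplying on the left by the 2 × 2 identity matrix yields

\[
I_2 A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A
\]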
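The two products above can be mirrored numerically. In the Python sketch below, the function names `identity` and `mat_mul` and the particular 2 × 3 matrix are assumptions chosen for illustration; the point is only that a non-square matrix needs identity matrices of different sizes on its two sides.

```python
def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Return AB by the row-column rule."""
    if len(A[0]) != len(B):
        raise ValueError("AB is undefined")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 3], [4, 5, 6]]  # a hypothetical 2 x 3 matrix

print(mat_mul(A, identity(3)) == A)  # True: A I_3 = A
print(mat_mul(identity(2), A) == A)  # True: I_2 A = A
```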

The same result holds in general; that is, if A is any m × n matrix, then

\[ AI_n = A \quad \text{and} \quad I_m A = A \]

Thus, the identity matrices play the same role in matrix arithmetic that the number 1 plays in the numerical equation a · 1 = 1 · a = a.

As the next theorem shows, identity matrices arise naturally in studying reduced row echelon forms of square matrices.

THEOREM 1.4.3 If R is the reduced row echelon form of an n × n matrix A, then either R has a row of zeros or R is the identity matrix $I_n$.

Proof Suppose that the reduced row echelon form of A is

\[ R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix} \]

Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur progressively farther to the right as we move down the matrix, each of these 1's must occur on the main diagonal. Since the other entries in the same column as one of these 1's are zero, R must be $I_n$. Thus, either R has a row of zeros or R = $I_n$.

Inverse of a Matrix

In real arithmetic every nonzero number a has a reciprocal $a^{-1}$ (= 1/a) with the property

\[ a \cdot a^{-1} = a^{-1} \cdot a = 1 \]

The number $a^{-1}$ is sometimes called the multiplicative inverse of a. Our next objective is to develop an analog of this result for matrix arithmetic. For this purpose we make the following definition.

DEFINITION 1 If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be found, then A is said to be singular.

Remark The relationship AB = BA = I is not changed by interchanging A and B , so if A is invertible and B is an inverse of A, then it is also true that B is invertible, and A is an inverse of B . Thus, when AB = BA = I we say that A and B are inverses of one another.

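EXAMPLE 5 An Invertible Matrix

Let

\[ A = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} \]

Then

\[ AB = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]

\[ BA = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \]

Thus, A and B are invertible and each is an inverse of the other.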

EXAMPLE 6 A Class of Singular Matrices

A square matrix with a row or column of zeros is singular. To help understand why this is so, consider the matrix

\[ A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 0 \end{bmatrix} \]

To prove that A is singular we must show that there is no 3 × 3 matrix B such that AB = BA = I. For this purpose let $\mathbf{c}_1, \mathbf{c}_2, \mathbf{0}$ be the column vectors of A. Thus, for any 3 × 3 matrix B we can express the product BA as

\[ BA = B[\mathbf{c}_1 \ \ \mathbf{c}_2 \ \ \mathbf{0}] = [B\mathbf{c}_1 \ \ B\mathbf{c}_2 \ \ \mathbf{0}] \qquad \text{[Formula (6) of Section 1.3]} \]

The column of zeros shows that BA ≠ I and hence that A is singular.

Note: As in Example 6, we will frequently denote a zero matrix with one row or one column by a boldface zero.

Properties of Inverses

It is reasonable to ask whether an invertible matrix can have more than one inverse. The next theorem shows that the answer is no—an invertible matrix has exactly one inverse.

THEOREM 1.4.4 If B and C are both inverses of the matrix A, then B = C.

Proof Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives (BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so C = B.

As a consequence of this important result, we can now speak of "the" inverse of an invertible matrix. If A is invertible, then its inverse will be denoted by the symbol $A^{-1}$. Thus,

\[ AA^{-1} = I \quad \text{and} \quad A^{-1}A = I \qquad (1) \]

WARNING The symbol $A^{-1}$ should not be interpreted as 1/A. Division by matrices will not be a defined operation in this text.

The inverse of A plays much the same role in matrix arithmetic that the reciprocal $a^{-1}$ plays in the numerical relationships $aa^{-1} = 1$ and $a^{-1}a = 1$. In the next section we will develop a method for computing the inverse of an invertible matrix of any size. For now we give the following theorem that specifies conditions under which a 2 × 2 matrix is invertible and provides a simple formula for its inverse.

Historical Note The formula for A−1 given in Theorem 1.4.5 first appeared (in a more general form) in Arthur Cayley’s 1858 Memoir on the Theory of Matrices. The more general result that Cayley discovered will be studied later.

THEOREM 1.4.5 The matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if ad − bc ≠ 0, in which case the inverse is given by the formula

\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad (2) \]

Note: The quantity ad − bc in Theorem 1.4.5 is called the determinant of the 2 × 2 matrix A and is denoted by

\[ \det(A) = ad - bc \]

or alternatively by

\[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \]

[Figure 1.4.1: det(A) = ad − bc, shown as the product of the main-diagonal entries minus the product of the off-diagonal entries.]

We will omit the proof, because we will study a more general version of this theorem later. For now, you should at least confirm the validity of Formula (2) by showing that $AA^{-1} = A^{-1}A = I$.

Remark Figure 1.4.1 illustrates that the determinant of a 2 × 2 matrix A is the product of the entries on its main diagonal minus the product of the entries off its main diagonal.

EXAMPLE 7 Calculating the Inverse of a 2 × 2 Matrix

In each part, determine whether the matrix is invertible. If so, find its inverse.

\[ \text{(a)} \ A = \begin{bmatrix} 6 & 1 \\ 5 & 2 \end{bmatrix} \qquad \text{(b)} \ A = \begin{bmatrix} -1 & 2 \\ 3 & -6 \end{bmatrix} \]

Solution (a) The determinant of A is det(A) = (6)(2) − (1)(5) = 7, which is nonzero. Thus, A is invertible, and its inverse is

\[ A^{-1} = \frac{1}{7} \begin{bmatrix} 2 & -1 \\ -5 & 6 \end{bmatrix} = \begin{bmatrix} \frac{2}{7} & -\frac{1}{7} \\ -\frac{5}{7} & \frac{6}{7} \end{bmatrix} \]

We leave it for you to confirm that $AA^{-1} = A^{-1}A = I$.

Solution (b) The matrix is not invertible since det(A) = (−1)(−6) − (2)(3) = 0.
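Formula (2) translates directly into a few lines of code. The sketch below uses exact rational arithmetic from Python's standard library; the function name `inverse_2x2` and the convention of returning `None` for a singular matrix are assumptions made for illustration.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] by Formula (2), or None if ad - bc = 0."""
    det = a * d - b * c
    if det == 0:
        return None  # Theorem 1.4.5: not invertible
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

# Parts (a) and (b) of Example 7.
inv_a = inverse_2x2(6, 1, 5, 2)
inv_b = inverse_2x2(-1, 2, 3, -6)
print(inv_a)
print(inv_b)
```

Using `Fraction` rather than floating-point division keeps entries such as 2/7 exact, which makes it easy to compare the result with a hand computation.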

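EXAMPLE 8 Solution of a Linear System by Matrix Inversion

A problem that arises in many applications is to solve a pair of equations of the form

\[ \begin{aligned} u &= ax + by \\ v &= cx + dy \end{aligned} \]

for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the unknowns x and y and use Gauss–Jordan elimination to solve for x and y. However, because the coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an alternative approach, let us replace the two equations by the single matrix equation

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix} \]

which we can rewrite as

\[ \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

If we assume that the 2 × 2 matrix is invertible (i.e., ad − bc ≠ 0), then we can multiply through on the left by the inverse and rewrite the equation as

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

which simplifies to

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]

Using Theorem 1.4.5, we can rewrite this equation as

\[ \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix} \]

from which we obtain

\[ x = \frac{du - bv}{ad - bc}, \quad y = \frac{av - cu}{ad - bc} \]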
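The closed-form solution obtained in Example 8 can be sketched as a small function. The name `solve_2x2` and the sample coefficients are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    """Solve u = ax + by, v = cx + dy for (x, y), assuming ad - bc != 0."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    x = Fraction(d * u - b * v, det)
    y = Fraction(a * v - c * u, det)
    return x, y

# Hypothetical system: 5 = x + 2y, 7 = x + 3y.
x, y = solve_2x2(1, 2, 1, 3, 5, 7)
print(x, y)
```

Substituting the returned values back into both equations is a quick way to confirm the formulas for x and y.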

The next theorem is concerned with inverses of matrix products.

THEOREM 1.4.6 If A and B are invertible matrices with the same size, then AB is invertible and

\[ (AB)^{-1} = B^{-1}A^{-1} \]

Proof We can establish the invertibility and obtain the stated formula at the same time by showing that

\[ (AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I \]

But

\[ (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I \]

and similarly, $(B^{-1}A^{-1})(AB) = I$.

Although we will not prove it, this result can be extended to three or more factors: A product of any number of invertible matrices is invertible, and the inverse of the product is the product of the inverses in the reverse order.

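EXAMPLE 9 The Inverse of a Product

Consider the matrices

\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} \]

We leave it for you to show that

\[ AB = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}, \quad (AB)^{-1} = \begin{bmatrix} 4 & -3 \\ -\frac{9}{2} & \frac{7}{2} \end{bmatrix} \]

and also that

\[ A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}, \quad B^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \frac{3}{2} \end{bmatrix}, \quad B^{-1}A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \frac{3}{2} \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -3 \\ -\frac{9}{2} & \frac{7}{2} \end{bmatrix} \]

Thus, $(AB)^{-1} = B^{-1}A^{-1}$ as guaranteed by Theorem 1.4.6.

Note: If a product of matrices is singular, then at least one of the factors must be singular. Why?

Powers of a Matrix

If A is a square matrix, then we define the nonnegative integer powers of A to be

\[ A^0 = I \quad \text{and} \quad A^n = AA \cdots A \quad [n \text{ factors}] \]

and if A is invertible, then we define the negative integer powers of A to be

\[ A^{-n} = (A^{-1})^n = A^{-1}A^{-1} \cdots A^{-1} \quad [n \text{ factors}] \]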

Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for example,

\[ A^r A^s = A^{r+s} \quad \text{and} \quad (A^r)^s = A^{rs} \]

In addition, we have the following properties of negative exponents.

THEOREM 1.4.7 If A is invertible and n is a nonnegative integer, then:

(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.
(b) $A^n$ is invertible and $(A^n)^{-1} = A^{-n} = (A^{-1})^n$.
(c) kA is invertible for any nonzero scalar k, and $(kA)^{-1} = k^{-1}A^{-1}$.

We will prove part (c) and leave the proofs of parts (a) and (b) as exercises.

Proof (c) Properties (m) and (l) of Theorem 1.4.1 imply that

\[ (kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I \]

and similarly, $(k^{-1}A^{-1})(kA) = I$. Thus, kA is invertible and $(kA)^{-1} = k^{-1}A^{-1}$.

EXAMPLE 10 Properties of Exponents

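Let A and $A^{-1}$ be the matrices in Example 9; that is,

\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \]

Then

\[ A^{-3} = (A^{-1})^3 = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} \]

Also,

\[ A^3 = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 11 & 30 \\ 15 & 41 \end{bmatrix} \]

so, as expected from Theorem 1.4.7(b),

\[ (A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)} \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = (A^{-1})^3 \]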
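The definitions of positive, zero, and negative integer powers can be sketched for 2 × 2 matrices as follows. The helper names `matmul`, `inverse_2x2`, and `mat_pow` are illustrative assumptions; the power is computed by plain repeated multiplication, exactly as in the definition, rather than by any faster scheme.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by Formula (2) of Theorem 1.4.5."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

def mat_pow(M, n):
    """A^n for any integer n; negative n uses A^(-n) = (A^(-1))^n."""
    base = M if n >= 0 else inverse_2x2(M)
    result = [[1, 0], [0, 1]]  # A^0 = I
    for _ in range(abs(n)):
        result = matmul(result, base)
    return result

A = [[1, 2], [1, 3]]  # the matrix of Example 10
A_cubed = mat_pow(A, 3)
A_neg_cubed = mat_pow(A, -3)
print(A_cubed, A_neg_cubed)
print(matmul(A_cubed, A_neg_cubed))
```

The last line multiplies the two powers together; by Theorem 1.4.7(b) the product should be the identity matrix.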

EXAMPLE 11 The Square of a Matrix Sum

In real arithmetic, where we have a commutative law for multiplication, we can write

\[ (a + b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2 \]

However, in matrix arithmetic, where we have no commutative law for multiplication, the best we can do is to write

\[ (A + B)^2 = A^2 + AB + BA + B^2 \]

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step further and write

\[ (A + B)^2 = A^2 + 2AB + B^2 \]

Matrix Polynomials

If A is a square matrix, say n × n, and if

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m \]

is any polynomial, then we define the n × n matrix p(A) to be

\[ p(A) = a_0 I + a_1 A + a_2 A^2 + \cdots + a_m A^m \qquad (3) \]

where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant term $a_0$ by the matrix $a_0 I$. An expression of form (3) is called a matrix polynomial in A.

EXAMPLE 12 A Matrix Polynomial

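Find p(A) for

\[ p(x) = x^2 - 2x - 3 \quad \text{and} \quad A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} \]

Solution

\[ \begin{aligned} p(A) &= A^2 - 2A - 3I \\ &= \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}^2 - 2 \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} - 3 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix} - \begin{bmatrix} -2 & 4 \\ 0 & 6 \end{bmatrix} - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \end{aligned} \]

or more briefly, p(A) = 0.

Remark It follows from the fact that $A^r A^s = A^{r+s} = A^{s+r} = A^s A^r$ that powers of a square matrix commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A also commute; that is, for any polynomials $p_1$ and $p_2$ we have

\[ p_1(A)p_2(A) = p_2(A)p_1(A) \qquad (4) \]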
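Evaluating a matrix polynomial of form (3) can be sketched with Horner's scheme, which avoids computing each power of A separately. The function name `poly_at_matrix` and the convention of listing coefficients from the constant term upward are assumptions made for this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_matrix(coeffs, A):
    """Evaluate a0*I + a1*A + ... + am*A^m, with coeffs = [a0, a1, ..., am]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, A)   # multiply the running value by A
        for i in range(n):
            result[i][i] += a        # add a*I (the constant term becomes a0*I)
    return result

A = [[-1, 2], [0, 3]]                # the matrix of Example 12
p_of_A = poly_at_matrix([-3, -2, 1], A)   # p(x) = x^2 - 2x - 3
print(p_of_A)

# Two polynomials in the same matrix commute, as stated in the Remark.
p1 = poly_at_matrix([3, 1], A)       # x + 3
p2 = poly_at_matrix([-3, 1], A)      # x - 3
print(matmul(p1, p2) == matmul(p2, p1))
```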

Properties of the Transpose

The following theorem lists the main properties of the transpose.

THEOREM 1.4.8 If the sizes of the matrices are such that the stated operations can be performed, then:

(a) $(A^T)^T = A$
(b) $(A + B)^T = A^T + B^T$
(c) $(A - B)^T = A^T - B^T$
(d) $(kA)^T = kA^T$
(e) $(AB)^T = B^T A^T$

If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little trouble visualizing the results in parts (a)–(d ). For example, part (a) states the obvious fact that interchanging rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then interchanging the rows and columns produces the same result as interchanging the rows and columns before adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as well. The result in that part can be extended to three or more factors and restated as: The transpose of a product of any number of matrices is the product of the transposes in the reverse order.

The following theorem establishes a relationship between the inverse of a matrix and the inverse of its transpose.

THEOREM 1.4.9 If A is an invertible matrix, then $A^T$ is also invertible and

\[ (A^T)^{-1} = (A^{-1})^T \]

Proof We can establish the invertibility and obtain the formula at the same time by showing that

\[ A^T(A^{-1})^T = (A^{-1})^T A^T = I \]

But from part (e) of Theorem 1.4.8 and the fact that $I^T = I$, we have

\[ \begin{aligned} A^T(A^{-1})^T &= (A^{-1}A)^T = I^T = I \\ (A^{-1})^T A^T &= (AA^{-1})^T = I^T = I \end{aligned} \]

which completes the proof.

EXAMPLE 13 Inverse of a Transpose

Consider a general 2 × 2 invertible matrix and its transpose:

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{and} \quad A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]

Since A is invertible, its determinant ad − bc is nonzero. But the determinant of $A^T$ is also ad − bc (verify), so $A^T$ is also invertible. It follows from Theorem 1.4.5 that

\[ (A^T)^{-1} = \begin{bmatrix} \dfrac{d}{ad - bc} & -\dfrac{c}{ad - bc} \\[2ex] -\dfrac{b}{ad - bc} & \dfrac{a}{ad - bc} \end{bmatrix} \]

which is the same matrix that results if $A^{-1}$ is transposed (verify). Thus,

\[ (A^T)^{-1} = (A^{-1})^T \]

as guaranteed by Theorem 1.4.9.
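A numerical spot check of Theorem 1.4.8(e) and Theorem 1.4.9 can be sketched as follows, reusing the matrices A and B of Example 9. The helper names are illustrative assumptions, and a check on one pair of matrices is of course not a proof.

```python
from fractions import Fraction

def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by Formula (2) of Theorem 1.4.5."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]

A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

inv_of_transpose = inverse_2x2(transpose(A))   # (A^T)^(-1)
transpose_of_inv = transpose(inverse_2x2(A))   # (A^(-1))^T
product_rule_holds = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
print(inv_of_transpose == transpose_of_inv, product_rule_holds)
```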

Exercise Set 1.4

In Exercises 1–2, verify that the following matrices and scalars satisfy the stated properties of Theorem 1.4.1.

\[ A = \begin{bmatrix} 3 & -1 \\ 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 2 \\ 1 & -4 \end{bmatrix}, \quad C = \begin{bmatrix} 4 & 1 \\ -3 & -2 \end{bmatrix}, \quad a = 4, \quad b = -7 \]

1. (a) The associative law for matrix addition.
   (b) The associative law for matrix multiplication.
   (c) The left distributive law.
   (d) (a + b)C = aC + bC

2. (a) a(BC) = (aB)C = B(aC)
   (b) A(B − C) = AB − AC
   (c) (B + C)A = BA + CA
   (d) a(bC) = (ab)C

In Exercises 3–4, verify that the matrices and scalars in Exercise 1 satisfy the stated properties.

3. (a) $(A^T)^T = A$
   (b) $(AB)^T = B^T A^T$

4. (a) $(A + B)^T = A^T + B^T$
   (b) $(aC)^T = aC^T$

In Exercises 5–8, use Theorem 1.4.5 to compute the inverse of the matrix.

5. $A = \begin{bmatrix} 2 & -3 \\ 4 & 4 \end{bmatrix}$

6. $B = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix}$

7. $C = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$

8. $D = \begin{bmatrix} 6 & 4 \\ -2 & -1 \end{bmatrix}$

9. Find the inverse of

\[ \begin{bmatrix} \frac{1}{2}(e^x + e^{-x}) & \frac{1}{2}(e^x - e^{-x}) \\ \frac{1}{2}(e^x - e^{-x}) & \frac{1}{2}(e^x + e^{-x}) \end{bmatrix} \]

10. Find the inverse of

\[ \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]

In Exercises 11–14, verify that the equations are valid for the matrices in Exercises 5–8.

11. $(A^T)^{-1} = (A^{-1})^T$

12. $(A^{-1})^{-1} = A$

13. $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$

14. $(ABC)^T = C^T B^T A^T$

In Exercises 15–18, use the given information to find A.

15. $(7A)^{-1} = \begin{bmatrix} -3 & 7 \\ 1 & -2 \end{bmatrix}$

16. $(5A^T)^{-1} = \begin{bmatrix} -3 & -1 \\ 5 & 2 \end{bmatrix}$

17. $(I + 2A)^{-1} = \begin{bmatrix} -1 & 2 \\ 4 & 5 \end{bmatrix}$

18. $A^{-1} = \begin{bmatrix} 2 & -1 \\ 3 & 5 \end{bmatrix}$

In Exercises 19–20, compute the following using the given matrix A.

(a) $A^3$   (b) $A^{-3}$   (c) $A^2 - 2A + I$

19. $A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 0 \\ 4 & 1 \end{bmatrix}$

In Exercises 21–22, compute p(A) for the given matrix A and the following polynomials.

(a) $p(x) = x - 2$
(b) $p(x) = 2x^2 - x + 1$
(c) $p(x) = x^3 - 2x + 1$

21. $A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$

22. $A = \begin{bmatrix} 2 & 0 \\ 4 & 1 \end{bmatrix}$

In Exercises 23–24, let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \]

23. Find all values of a, b, c, and d (if any) for which the matrices A and B commute.

24. Find all values of a, b, c, and d (if any) for which the matrices A and C commute.

In Exercises 25–28, use the method of Example 8 to find the unique solution of the given linear system.

25. $3x_1 - 2x_2 = -1$
    $4x_1 + 5x_2 = 3$

26. $-x_1 + 5x_2 = 4$
    $-x_1 - 3x_2 = 1$

27. $6x_1 + x_2 = 0$
    $4x_1 - 3x_2 = -2$

28. $2x_1 - 2x_2 = 4$
    $x_1 + 4x_2 = 4$

If a polynomial p(x) can be factored as a product of lower degree polynomials, say

\[ p(x) = p_1(x)p_2(x) \]

and if A is a square matrix, then it can be proved that

\[ p(A) = p_1(A)p_2(A) \]

In Exercises 29–30, verify this statement for the stated matrix A and polynomials

\[ p(x) = x^2 - 9, \quad p_1(x) = x + 3, \quad p_2(x) = x - 3 \]

29. The matrix A in Exercise 21.

30. An arbitrary square matrix A.

31. (a) Give an example of two 2 × 2 matrices such that $(A + B)(A - B) \neq A^2 - B^2$.
    (b) State a valid formula for multiplying out $(A + B)(A - B)$.
    (c) What condition can you impose on A and B that will allow you to write $(A + B)(A - B) = A^2 - B^2$?

32. The numerical equation $a^2 = 1$ has exactly two solutions. Find at least eight solutions of the matrix equation $A^2 = I_3$. [Hint: Look for solutions in which all entries off the main diagonal are zero.]

33. (a) Show that if a square matrix A satisfies the equation $A^2 + 2A + I = 0$, then A must be invertible. What is the inverse?
    (b) Show that if p(x) is a polynomial with a nonzero constant term, and if A is a square matrix for which p(A) = 0, then A is invertible.

34. Is it possible for $A^3$ to be an identity matrix without A being invertible? Explain.

35. Can a matrix with a row of zeros or a column of zeros have an inverse? Explain.

36. Can a matrix with two identical rows or two identical columns have an inverse? Explain.

In Exercises 37–38, determine whether A is invertible, and if so, find the inverse. [Hint: Solve AX = I for X by equating corresponding entries on the two sides.]

37. $A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

38. $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

In Exercises 39–40, simplify the expression assuming that A, B, C, and D are invertible.

39. $(AB)^{-1}(AC^{-1})(D^{-1}C^{-1})^{-1}D^{-1}$

40. $(AC^{-1})^{-1}(AC^{-1})(AC^{-1})^{-1}AD^{-1}$

41. Show that if R is a 1 × n matrix and C is an n × 1 matrix, then RC = tr(CR).

42. If A is a square matrix and n is a positive integer, is it true that $(A^n)^T = (A^T)^n$? Justify your answer.

43. (a) Show that if A is invertible and AB = AC, then B = C.
    (b) Explain why part (a) and Example 3 do not contradict one another.

44. Show that if A is invertible and k is any nonzero scalar, then $(kA)^n = k^n A^n$ for all integer values of n.

45. (a) Show that if A, B, and A + B are invertible matrices with the same size, then

\[ A(A^{-1} + B^{-1})B(A + B)^{-1} = I \]

    (b) What does the result in part (a) tell you about the matrix $A^{-1} + B^{-1}$?

46. A square matrix A is said to be idempotent if $A^2 = A$.
    (a) Show that if A is idempotent, then so is I − A.
    (b) Show that if A is idempotent, then 2A − I is invertible and is its own inverse.

47. Show that if A is a square matrix such that $A^k = 0$ for some positive integer k, then the matrix I − A is invertible and

\[ (I - A)^{-1} = I + A + A^2 + \cdots + A^{k-1} \]

48. Show that the matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

satisfies the equation

\[ A^2 - (a + d)A + (ad - bc)I = 0 \]

49. Assuming that all matrices are n × n and invertible, solve for D.

\[ C^T B^{-1} A^2 B A C^{-1} D A^{-2} B^T C^{-2} = C^T \]

50. Assuming that all matrices are n × n and invertible, solve for D.

\[ A B C^T D B A^T C = A B^T \]

Working with Proofs

In Exercises 51–58, prove the stated result.

51. Theorem 1.4.1(a)
52. Theorem 1.4.1(b)
53. Theorem 1.4.1(f)
54. Theorem 1.4.1(c)
55. Theorem 1.4.2(c)
56. Theorem 1.4.2(b)
57. Theorem 1.4.8(d)
58. Theorem 1.4.8(e)

True-False Exercises

TF. In parts (a)–(k) determine whether the statement is true or false, and justify your answer.

(a) Two n × n matrices, A and B, are inverses of one another if and only if AB = BA = 0.
(b) For all square matrices A and B of the same size, it is true that $(A + B)^2 = A^2 + 2AB + B^2$.
(c) For all square matrices A and B of the same size, it is true that $A^2 - B^2 = (A - B)(A + B)$.
(d) If A and B are invertible matrices of the same size, then AB is invertible and $(AB)^{-1} = A^{-1}B^{-1}$.
(e) If A and B are matrices such that AB is defined, then it is true that $(AB)^T = A^T B^T$.
(f) The matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is invertible if and only if ad − bc ≠ 0.
(g) If A and B are matrices of the same size and k is a constant, then $(kA + B)^T = kA^T + B^T$.
(h) If A is an invertible matrix, then so is $A^T$.
(i) If $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$ and I is an identity matrix, then $p(I) = a_0 + a_1 + a_2 + \cdots + a_m$.
(j) A square matrix containing a row or column of zeros cannot be invertible.
(k) The sum of two invertible matrices of the same size must be invertible.

Working with Technology

T1. Let A be the matrix

\[ A = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{4} & 0 & \frac{1}{5} \\ \frac{1}{6} & \frac{1}{7} & 0 \end{bmatrix} \]

Discuss the behavior of $A^k$ as k increases indefinitely, that is, as k → ∞.

T2. In each part use your technology utility to make a conjecture about the form of $A^n$ for positive integer powers of n.

\[ \text{(a)} \ A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix} \qquad \text{(b)} \ A = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]

T3. The Fibonacci sequence (named for the Italian mathematician Leonardo Fibonacci 1170–1250) is

\[ 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, \ldots \]

the terms of which are commonly denoted as

\[ F_0, F_1, F_2, F_3, \ldots, F_n, \ldots \]

After the initial terms $F_0 = 0$ and $F_1 = 1$, each term is the sum of the previous two; that is,

\[ F_n = F_{n-1} + F_{n-2} \]

Confirm that if

\[ Q = \begin{bmatrix} F_2 & F_1 \\ F_1 & F_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]

then

\[ Q^n = \begin{bmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{bmatrix} \]
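For readers without a dedicated technology utility, the identity in T3 can be explored with a short script. This is only a sketch in plain Python: the helper names are illustrative assumptions, and the lower-right entry of $Q^n$ is taken to be $F_{n-1}$, which is what the recurrence gives.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_pow(M, n):
    """M^n for a nonnegative integer n, by repeated multiplication."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul(result, M)
    return result

def fib(n):
    """The n-th Fibonacci number, with F0 = 0 and F1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

Q = [[1, 1], [1, 0]]

def identity_holds(n):
    """Compare Q^n with [[F(n+1), F(n)], [F(n), F(n-1)]] for n >= 1."""
    return mat_pow(Q, n) == [[fib(n + 1), fib(n)], [fib(n), fib(n - 1)]]

print(all(identity_holds(n) for n in range(1, 20)))
```

A check over finitely many n supports the conjecture but does not prove it; an induction argument on n would be the natural next step.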



1.5 Elementary Matrices and a Method for Finding A−1

In this section we will develop an algorithm for finding the inverse of a matrix, and we will discuss some of the basic properties of invertible matrices.

In Section 1.1 we defined three elementary row operations on a matrix A:

1. Multiply a row by a nonzero constant c.
2. Interchange two rows.
3. Add a constant c times one row to another.

It should be evident that if we let B be the matrix that results from A by performing one of the operations in this list, then the matrix A can be recovered from B by performing the corresponding operation in the following list:

1. Multiply the same row by 1/c.
2. Interchange the same two rows.
3. If B resulted by adding c times row $r_i$ of A to row $r_j$, then add −c times $r_i$ to $r_j$.

It follows that if B is obtained from A by performing a sequence of elementary row operations, then there is a second sequence of elementary row operations, which when applied to B recovers A (Exercise 33). Accordingly, we make the following definition.

DEFINITION 1 Matrices A and B are said to be row equivalent if either (hence each) can be obtained from the other by a sequence of elementary row operations.

Our next goal is to show how matrix multiplication can be used to carry out an elementary row operation.

DEFINITION 2 A matrix E is called an elementary matrix if it can be obtained from an identity matrix by performing a single elementary row operation.

EXAMPLE 1 Elementary Matrices and Row Operations

Listed below are four elementary matrices and the operations that produce them.

\[ \begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix} \qquad \text{Multiply the second row of } I_2 \text{ by } -3. \]

\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \qquad \text{Interchange the second and fourth rows of } I_4. \]

\[ \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{Add 3 times the third row of } I_3 \text{ to the first row.} \]

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{Multiply the first row of } I_3 \text{ by 1.} \]

The following theorem, whose proof is left as an exercise, shows that when a matrix A is multiplied on the left by an elementary matrix E, the effect is to perform an elementary row operation on A.

THEOREM 1.5.1 Row Operations by Matrix Multiplication

If the elementary matrix E results from performing a certain row operation on $I_m$ and if A is an m × n matrix, then the product EA is the matrix that results when this same row operation is performed on A.

EXAMPLE 2 Using Elementary Matrices

Consider the matrix

\[ A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix} \]

and consider the elementary matrix

\[ E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix} \]

which results from adding 3 times the first row of $I_3$ to the third row. The product EA is

\[ EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix} \]

which is precisely the matrix that results when we add 3 times the first row of A to the third row.

Note: Theorem 1.5.1 will be a useful tool for developing new results about matrices, but as a practical matter it is usually preferable to perform row operations directly.
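Theorem 1.5.1 can be illustrated by building E from the identity and comparing the product EA with the row operation applied directly to A. In this sketch the helper names (`identity`, `add_multiple`, `matmul`) are illustrative assumptions, and rows are indexed from 0 as is usual in Python.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def add_multiple(M, c, src, dst):
    """Return a copy of M with c times row src added to row dst."""
    out = [row[:] for row in M]
    out[dst] = [x + c * y for x, y in zip(M[dst], M[src])]
    return out

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# The matrix A of Example 2.
A = [[1, 0, 2, 3],
     [2, -1, 3, 6],
     [1, 4, 4, 0]]

# E: add 3 times the first row of I3 to the third row.
E = add_multiple(identity(3), 3, 0, 2)

EA = matmul(E, A)
direct = add_multiple(A, 3, 0, 2)
print(EA == direct)
```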

We know from the discussion at the beginning of this section that if E is an elementary matrix that results from performing an elementary row operation on an identity matrix I , then there is a second elementary row operation, which when applied to E produces I back again. Table 1 lists these operations. The operations on the right side of the table are called the inverse operations of the corresponding operations on the left.

Table 1

Row Operation on I That Produces E      Row Operation on E That Reproduces I
----------------------------------      ------------------------------------
Multiply row i by c ≠ 0                 Multiply row i by 1/c
Interchange rows i and j                Interchange rows i and j
Add c times row i to row j              Add −c times row i to row j

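EXAMPLE 3 Row Operations and Inverse Row Operations

In each of the following, an elementary row operation is applied to the 2 × 2 identity matrix to obtain an elementary matrix E, then E is restored to the identity matrix by applying the inverse row operation.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

First step: Multiply the second row by 7. Second step: Multiply the second row by 1/7.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

First step: Interchange the first and second rows. Second step: Interchange the first and second rows.

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

First step: Add 5 times the second row to the first. Second step: Add −5 times the second row to the first.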

The next theorem is a key result about invertibility of elementary matrices. It will be a building block for many results that follow.

THEOREM 1.5.2 Every elementary matrix is invertible, and the inverse is also an elementary matrix.

Proof If E is an elementary matrix, then E results by performing some row operation on I. Let $E_0$ be the matrix that results when the inverse of this operation is performed on I. Applying Theorem 1.5.1 and using the fact that inverse row operations cancel the effect of each other, it follows that

\[ E_0 E = I \quad \text{and} \quad E E_0 = I \]

Thus, the elementary matrix $E_0$ is the inverse of E.

Equivalence Theorem

One of our objectives as we progress through this text is to show how seemingly diverse ideas in linear algebra are related. The following theorem, which relates results we have obtained about invertibility of matrices, homogeneous linear systems, reduced row echelon forms, and elementary matrices, is our first step in that direction. As we study new topics, more statements will be added to this theorem.

THEOREM 1.5.3 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent, that is, all true or all false.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.

Note: The following figure illustrates visually that from the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a) we can conclude that (d) ⇒ (c) ⇒ (b) ⇒ (a) and hence that (a) ⇔ (b) ⇔ (c) ⇔ (d) (see Appendix A).

[Figure: statements (a), (b), (c), (d) arranged in a cycle of implications.]

Proof We will prove the equivalence by establishing the chain of implications: (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a).

(a) ⇒ (b) Assume A is invertible and let $\mathbf{x}_0$ be any solution of Ax = 0. Multiplying both sides of this equation by the matrix $A^{-1}$ gives $A^{-1}(A\mathbf{x}_0) = A^{-1}\mathbf{0}$, or $(A^{-1}A)\mathbf{x}_0 = \mathbf{0}$, or $I\mathbf{x}_0 = \mathbf{0}$, or $\mathbf{x}_0 = \mathbf{0}$. Thus, Ax = 0 has only the trivial solution.

(b) ⇒ (c) Let Ax = 0 be the matrix form of the system

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\ \ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= 0 \end{aligned} \qquad (1) \]

and assume that the system has only the trivial solution. If we solve by Gauss–Jordan elimination, then the system of equations corresponding to the reduced row echelon form of the augmented matrix will be

\[ \begin{aligned} x_1 &= 0 \\ x_2 &= 0 \\ &\ \ \vdots \\ x_n &= 0 \end{aligned} \qquad (2) \]

Thus the augmented matrix

\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{bmatrix} \]

for (1) can be reduced to the augmented matrix

\[ \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \]

for (2) by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these matrices, we can conclude that the reduced row echelon form of A is $I_n$.

(c) ⇒ (d) Assume that the reduced row echelon form of A is $I_n$, so that A can be reduced to $I_n$ by a finite sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices $E_1, E_2, \ldots, E_k$ such that

\[ E_k \cdots E_2 E_1 A = I_n \qquad (3) \]

By Theorem 1.5.2, $E_1, E_2, \ldots, E_k$ are invertible. Multiplying both sides of Equation (3) on the left successively by $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ we obtain

\[ A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} I_n = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \qquad (4) \]

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices.

(d) ⇒ (a) If A is a product of elementary matrices, then from Theorems 1.4.7 and 1.5.2, the matrix A is a product of invertible matrices and hence is invertible.

A Method for Inverting Matrices

As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the moment that A is an invertible n × n matrix. In Equation (3), the elementary matrices execute a sequence of row operations that reduce A to $I_n$. If we multiply both sides of this equation on the right by $A^{-1}$ and simplify, we obtain

\[ A^{-1} = E_k \cdots E_2 E_1 I_n \]

But this equation tells us that the same sequence of row operations that reduces A to $I_n$ will transform $I_n$ to $A^{-1}$. Thus, we have established the following result.

Inversion Algorithm To find the inverse of an invertible matrix A, find a sequence of elementary row operations that reduces A to the identity and then perform that same sequence of operations on $I_n$ to obtain $A^{-1}$.

A simple method for carrying out this procedure is given in the following example.

EXAMPLE 4 Using Row Operations to Find A−1

Find the inverse of

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix} \]

Solution We want to reduce A to the identity matrix by row operations and simultaneously apply these operations to I to produce $A^{-1}$. To accomplish this we will adjoin the identity matrix to the right side of A, thereby producing a partitioned matrix of the form

\[ [A \mid I] \]

Then we will apply row operations to this matrix until the left side is reduced to I; these operations will convert the right side to $A^{-1}$, so the final matrix will have the form

\[ [I \mid A^{-1}] \]

The computations are as follows:

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right] \]

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right] \]

We added −2 times the first row to the second and −1 times the first row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right] \]

We added 2 times the second row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]

We multiplied the third row by −1.

\[ \left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]

We added 3 times the third row to the second and −3 times the third row to the first.

\[ \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right] \]

We added −2 times the second row to the first.

Thus,

\[ A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \]

Often it will not be known in advance if a given n × n matrix A is invertible. However, if it is not, then by parts (a) and (c) of Theorem 1.5.3 it will be impossible to reduce A to $I_n$ by elementary row operations. This will be signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If this occurs, then you can stop the computations and conclude that A is not invertible.

EXAMPLE 5 Showing That a Matrix Is Not Invertible

Consider the matrix

\[ A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix} \]

Applying the procedure of Example 4 yields

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right] \]

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right] \]

We added −2 times the first row to the second and added the first row to the third.

\[ \left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right] \]

We added the second row to the third.

Since we have obtained a row of zeros on the left side, A is not invertible.

EXAMPLE 6 Analyzing Homogeneous Systems

Use Theorem 1.5.3 to determine whether the given homogeneous system has nontrivial solutions.

(a)
\[
\begin{array}{rcrcrcl}
x_1 &+& 2x_2 &+& 3x_3 &=& 0\\
2x_1 &+& 5x_2 &+& 3x_3 &=& 0\\
x_1 & & &+& 8x_3 &=& 0
\end{array}
\]

(b)
\[
\begin{array}{rcrcrcl}
x_1 &+& 6x_2 &+& 4x_3 &=& 0\\
2x_1 &+& 4x_2 &-& x_3 &=& 0\\
-x_1 &+& 2x_2 &+& 5x_3 &=& 0
\end{array}
\]

Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has only the trivial solution if and only if its coefficient matrix is invertible. From Examples 4 and 5 the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has only the trivial solution while system (b) has nontrivial solutions.
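The inversion algorithm of Examples 4 and 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the text: the function name `invert` is hypothetical, exact arithmetic comes from the standard `fractions` module, and the routine may apply the row operations in a different order than the hand computations. It gives up as soon as some column has no usable pivot, which happens exactly when the left side of the partition cannot be reduced to the identity.

```python
from fractions import Fraction

def invert(matrix):
    """Row-reduce [A | I]; return the inverse of A, or None if A is not invertible."""
    n = len(matrix)
    # Build the partitioned matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # the left block cannot be reduced to I
        aug[col], aug[pivot] = aug[pivot], aug[col]   # interchange two rows
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # multiply a row by a nonzero constant
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                # Add a multiple of the pivot row to another row.
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                   # the right block is now the inverse

print(invert([[1, 2, 3], [2, 5, 3], [1, 0, 8]]))    # matrix of Example 4
print(invert([[1, 6, 4], [2, 4, -1], [-1, 2, 5]]))  # matrix of Example 5
```

Only the three elementary row operations are used, so the sketch mirrors the hand method; the entries are returned as `Fraction` objects, which compare equal to the corresponding integers.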

Exercise Set 1.5

In Exercises 1–2, determine whether the given matrix is elementary.



1. (a)

1

0

−5

1



1

⎢ (c) ⎣0 0

1 0





(b)

1

0



0

⎥ 1⎦

0

0

 2. (a)



1

0



2

⎢0 ⎢ (d) ⎢ ⎣0 ⎡

3



0

0

(c) ⎣0 0

1

9⎦

0

1





 2

0

1

0

0

1

0⎦

0

0

1





0

1

1

0⎦

0

0

1

−3

0

1







1

0

0

(c) ⎣ 0 −5

1

0⎦

0

1



 ⎤

0

0

(d) ⎣ 0 0

0

1⎦

1

0



4. (a)

1

0

−3

1



0

⎢0 ⎢ ⎣0

(c) ⎢

1

0

0

(b) ⎣ 0 0

1

0⎦

0

1





0

1

0

0

0

0⎦

0

0

1

1

0

0

(b) ⎣0 0

1

0⎦

0

3

⎡ ⎢

1



0

0

1

0

0

1

⎥ 0⎦

0

0

0

0⎥ ⎥



1

⎢0 ⎢ ⎣0

(d) ⎢

0

0



1

0





0

⎢0 ⎢ (d) ⎢ ⎣1





−7





−1



3. (a)

0⎥ ⎥

0





0

(b) ⎣0 1



0

1



1

0





−5

In Exercises 3–4, find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to the identity matrix.

0⎥ ⎥



⎤ ⎥ ⎤

0

− 17

1

0

0

1

0⎦

0

0

1

0

0⎥ ⎥




In Exercises 5–6 an elementary matrix E and a matrix A are given. Identify the row operation corresponding to E and verify that the product EA results from applying the row operation to A.

 5. (a) E =

0

1

1

0





 , A=

1



0

⎢ (b) E = ⎣0

−2 −6 ⎡

3

0 −3

−6

1

0

4

(c) E = ⎣0

1

0⎦ , A = ⎣2

5⎦

0

0

1

6



6. (a) E =

 −6

0

0

1







1

4

3



 , A=

−1 3









0

0

(b) E = ⎣−4

1

0⎦ , A = ⎣ 1

0

0

1



1

⎢ (c) E = ⎣0 0

0

0

2



4

3

6

5 0

1

3 ⎢ A = ⎣2 8



3 ⎢ C = ⎣2 2



8 ⎢ F = ⎣8 3



0

−4

−1

5

0

1

3

−4

⎤ ⎥

3⎦

−1



1 8 ⎥ ⎢ −1⎦ , B = ⎣2 5 3

4 −7 −7

1 8 ⎥ ⎢ −1⎦ , D = ⎣−6 3 3

1 1 4



1 −7 4



1 21 4

0

8

⎡1







(d) EF = B

0

1

13. ⎣0 1

1

1⎦

1

0





(b) A =

6

6

7

6⎦

7

7



0

2

1 ⎥ 10 ⎦ 1 10

(b)

−4

⎤ ⎥

1⎦

−9 − 25

1 5 3 −5 − 45



⎥ − 103 ⎦ 1 10





2

3 2





14. ⎣−4 2 0



1

⎢1 ⎢ ⎣1



16. ⎢

1

0

2

12

0

2

−1

−4

0



0⎥ ⎥

⎥ 0⎦ −5

⎢0 ⎢ ⎣0

0

0

k2

0

0

k3

0

0

0

⎢0 ⎢ ⎣0 k4

8

−4

⎡ √

−4

20. (a) ⎢

−4







0

0

1

2

0

3

0

3

5

0⎦

3

5

7

0⎥ ⎥



0

2

⎢1 ⎢ ⎣0

0

0

−1

3

2

1

5

18. ⎢



0⎦



0

0

0



0



1⎥ ⎥



0⎦

−3

In Exercises 19–20, find the inverse of each of the following 4 × 4 matrices, where k1 , k2 , k3 , k4 , and k are all nonzero.



−4

−2

4

5 ⎢2 ⎣5 1 5



2

⎢1 ⎢ ⎣0

5 ⎥ 3⎦ 1

2

−3

(b) ⎣ 2

⎡1



15. ⎣2 2

17. ⎢

In Exercises 9–10, first use Theorem 1.4.5 and then use the inversion algorithm to find A−1 , if it exists.



4

3





1

19. (a) ⎢

(c) EB = F

6

−1



− 25

1 5 1 5 − 45

5 ⎢1 ⎣5 1 5

k1

(b) ED = B

7

1



8. (a) EB = D

2

3⎦



(d) EC = A

4

5



2

5 ⎥ 1⎦ 1

(c) EA = C

1

11. (a) ⎣2



5 ⎥ −1 ⎦ 1

(b) EB = A

9. (a) A =

3



7. (a) EA = B





2



⎥ 5⎦

4 −7 1



(b) A =

In Exercises 13–18, use the inversion algorithm to find the inverse of the matrix (if the inverse exists).

In Exercises 7–8, use the following matrices and find an elementary matrix E that satisfies the stated equation.









1

⎥ ⎢ 0⎦ , A = ⎣2

−6



−1 −3

2



−1 −6

3

1

12. (a)

5

−5 −16

1

In Exercises 11–12, use the inversion algorithm to find the inverse of the matrix (if the inverse exists).

⎡ ⎤



−2 −6

1



−1 −6

5

2 −1 0 −4 −4 ⎥ ⎢ ⎥ 0⎦ , A = ⎣1 −3 −1 5 3⎦ 2 0 1 3 −1 1 ⎤ ⎡ ⎤ 0

1



−1



 10. (a) A =


0



0⎥ ⎥



k

0

1

0

0

k

1⎦

0

0

0

1

k

⎢1 ⎢ ⎣0

0

0

0

k

0

1

k

0⎦

0

0

1

k

⎢0 ⎢ ⎣0

(b) ⎢

0

⎥ 0⎦ k4

0

0

k1





0

k2

k3

0

⎥ 0⎦

0

0

0

0⎥ ⎥



1

(b) ⎢

0

0⎥ ⎥





0⎥ ⎥



In Exercises 21–22, find all values of $c$, if any, for which the given matrix is invertible.

21. $\begin{bmatrix} c & c & c\\ 1 & c & c\\ 1 & 1 & c \end{bmatrix}$

22. $\begin{bmatrix} c & 1 & 0\\ 1 & c & 1\\ 0 & 1 & c \end{bmatrix}$

In Exercises 23–26, express the matrix and its inverse as products of elementary matrices.

23. $\begin{bmatrix} -3 & 1\\ 2 & 2 \end{bmatrix}$

24. $\begin{bmatrix} 1 & 0\\ -5 & 2 \end{bmatrix}$

25. $\begin{bmatrix} 1 & 0 & -2\\ 0 & 4 & 3\\ 0 & 0 & 1 \end{bmatrix}$

26. $\begin{bmatrix} 1 & 1 & 0\\ 1 & 1 & 1\\ 0 & 1 & 1 \end{bmatrix}$

In Exercises 27–28, show that the matrices $A$ and $B$ are row equivalent by finding a sequence of elementary row operations that produces $B$ from $A$, and then use that result to find a matrix $C$ such that $CA = B$.

27. $A = \begin{bmatrix} 1 & 2 & 3\\ 1 & 4 & 1\\ 2 & 1 & 9 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & 5\\ 0 & 2 & -2\\ 1 & 1 & 4 \end{bmatrix}$

28. $A = \begin{bmatrix} 2 & 1 & 0\\ -1 & 1 & 0\\ 3 & 0 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 6 & 9 & 4\\ -5 & -1 & 0\\ -1 & -2 & -1 \end{bmatrix}$

29. Show that if

\[
A = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ a & b & c \end{bmatrix}
\]

is an elementary matrix, then at least one entry in the third row must be zero.

30. Show that

\[
A = \begin{bmatrix}
0 & a & 0 & 0 & 0\\
b & 0 & c & 0 & 0\\
0 & d & 0 & e & 0\\
0 & 0 & f & 0 & g\\
0 & 0 & 0 & h & 0
\end{bmatrix}
\]

is not invertible for any values of the entries.

Working with Proofs

31. Prove that if $A$ and $B$ are $m \times n$ matrices, then $A$ and $B$ are row equivalent if and only if $A$ and $B$ have the same reduced row echelon form.

32. Prove that if $A$ is an invertible matrix and $B$ is row equivalent to $A$, then $B$ is also invertible.

33. Prove that if $B$ is obtained from $A$ by performing a sequence of elementary row operations, then there is a second sequence of elementary row operations, which when applied to $B$ recovers $A$.

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

(a) The product of two elementary matrices of the same size must be an elementary matrix.

(b) Every elementary matrix is invertible.

(c) If $A$ and $B$ are row equivalent, and if $B$ and $C$ are row equivalent, then $A$ and $C$ are row equivalent.

(d) If $A$ is an $n \times n$ matrix that is not invertible, then the linear system $A\mathbf{x} = \mathbf{0}$ has infinitely many solutions.

(e) If $A$ is an $n \times n$ matrix that is not invertible, then the matrix obtained by interchanging two rows of $A$ cannot be invertible.

(f) If $A$ is invertible and a multiple of the first row of $A$ is added to the second row, then the resulting matrix is invertible.

(g) An expression of an invertible matrix $A$ as a product of elementary matrices is unique.

Working with Technology

T1. It can be proved that if the partitioned matrix

\[
\begin{bmatrix} A & B\\ C & D \end{bmatrix}
\]

is invertible, then its inverse is

\[
\begin{bmatrix}
A^{-1} + A^{-1}B(D - CA^{-1}B)^{-1}CA^{-1} & -A^{-1}B(D - CA^{-1}B)^{-1}\\
-(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1}
\end{bmatrix}
\]

provided that all of the inverses on the right side exist. Use this result to find the inverse of the matrix

\[
\begin{bmatrix}
1 & 2 & 1 & 0\\
0 & -1 & 0 & 1\\
0 & 0 & 2 & 0\\
0 & 0 & 3 & 3
\end{bmatrix}
\]


1.6 More on Linear Systems and Invertible Matrices

In this section we will show how the inverse of a matrix can be used to solve a linear system and we will develop some more results about invertible matrices.

Number of Solutions of a Linear System

In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system either has no solutions, has exactly one solution, or has infinitely many solutions. We are now in a position to prove this fundamental result.

THEOREM 1.6.1 A system of linear equations has zero, one, or infinitely many solutions. There are no other possibilities.

Proof If $A\mathbf{x} = \mathbf{b}$ is a system of linear equations, exactly one of the following is true: (a) the system has no solutions, (b) the system has exactly one solution, or (c) the system has more than one solution. The proof will be complete if we can show that the system has infinitely many solutions in case (c).

Assume that $A\mathbf{x} = \mathbf{b}$ has more than one solution, and let $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$, where $\mathbf{x}_1$ and $\mathbf{x}_2$ are any two distinct solutions. Because $\mathbf{x}_1$ and $\mathbf{x}_2$ are distinct, the matrix $\mathbf{x}_0$ is nonzero; moreover,

\[
A\mathbf{x}_0 = A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}
\]

If we now let $k$ be any scalar, then

\[
A(\mathbf{x}_1 + k\mathbf{x}_0) = A\mathbf{x}_1 + A(k\mathbf{x}_0) = A\mathbf{x}_1 + k(A\mathbf{x}_0) = \mathbf{b} + k\mathbf{0} = \mathbf{b} + \mathbf{0} = \mathbf{b}
\]

But this says that $\mathbf{x}_1 + k\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{b}$. Since $\mathbf{x}_0$ is nonzero and there are infinitely many choices for $k$, the system $A\mathbf{x} = \mathbf{b}$ has infinitely many solutions.

Solving Linear Systems by Matrix Inversion

Thus far we have studied two procedures for solving linear systems—Gauss–Jordan elimination and Gaussian elimination. The following theorem provides an actual formula for the solution of a linear system of n equations in n unknowns in the case where the coefficient matrix is invertible.

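THEOREM 1.6.2 If $A$ is an invertible $n \times n$ matrix, then for each $n \times 1$ matrix $\mathbf{b}$, the system of equations $A\mathbf{x} = \mathbf{b}$ has exactly one solution, namely, $\mathbf{x} = A^{-1}\mathbf{b}$.

Proof Since $A(A^{-1}\mathbf{b}) = \mathbf{b}$, it follows that $\mathbf{x} = A^{-1}\mathbf{b}$ is a solution of $A\mathbf{x} = \mathbf{b}$. To show that this is the only solution, we will assume that $\mathbf{x}_0$ is an arbitrary solution and then show that $\mathbf{x}_0$ must be the solution $A^{-1}\mathbf{b}$. If $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{b}$, then $A\mathbf{x}_0 = \mathbf{b}$. Multiplying both sides of this equation by $A^{-1}$, we obtain $\mathbf{x}_0 = A^{-1}\mathbf{b}$.

EXAMPLE 1 Solution of a Linear System Using $A^{-1}$

Consider the system of linear equations

\[
\begin{array}{rcrcrcr}
x_1 &+& 2x_2 &+& 3x_3 &=& 5\\
2x_1 &+& 5x_2 &+& 3x_3 &=& 3\\
x_1 & & &+& 8x_3 &=& 17
\end{array}
\]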


In matrix form this system can be written as $A\mathbf{x} = \mathbf{b}$, where

\[
A = \begin{bmatrix} 1 & 2 & 3\\ 2 & 5 & 3\\ 1 & 0 & 8 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix},\quad
\mathbf{b} = \begin{bmatrix} 5\\ 3\\ 17 \end{bmatrix}
\]

In Example 4 of the preceding section, we showed that $A$ is invertible and

\[
A^{-1} = \begin{bmatrix} -40 & 16 & 9\\ 13 & -5 & -3\\ 5 & -2 & -1 \end{bmatrix}
\]

By Theorem 1.6.2, the solution of the system is

\[
\mathbf{x} = A^{-1}\mathbf{b} =
\begin{bmatrix} -40 & 16 & 9\\ 13 & -5 & -3\\ 5 & -2 & -1 \end{bmatrix}
\begin{bmatrix} 5\\ 3\\ 17 \end{bmatrix}
= \begin{bmatrix} 1\\ -1\\ 2 \end{bmatrix}
\]

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.

Keep in mind that the method of Example 1 only applies when the system has as many equations as unknowns and the coefficient matrix is invertible.

Linear Systems with a Common Coefficient Matrix

Frequently, one is concerned with solving a sequence of systems

\[
A\mathbf{x} = \mathbf{b}_1,\quad A\mathbf{x} = \mathbf{b}_2,\quad A\mathbf{x} = \mathbf{b}_3,\ \ldots,\ A\mathbf{x} = \mathbf{b}_k
\]

each of which has the same square coefficient matrix $A$. If $A$ is invertible, then the solutions

\[
\mathbf{x}_1 = A^{-1}\mathbf{b}_1,\quad \mathbf{x}_2 = A^{-1}\mathbf{b}_2,\quad \mathbf{x}_3 = A^{-1}\mathbf{b}_3,\ \ldots,\ \mathbf{x}_k = A^{-1}\mathbf{b}_k
\]

can be obtained with one matrix inversion and $k$ matrix multiplications. An efficient way to do this is to form the partitioned matrix

\[
[\,A \mid \mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_k\,] \tag{1}
\]

in which the coefficient matrix $A$ is "augmented" by all $k$ of the matrices $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k$, and then reduce (1) to reduced row echelon form by Gauss–Jordan elimination. In this way we can solve all $k$ systems at once. This method has the added advantage that it applies even when $A$ is not invertible.

EXAMPLE 2 Solving Two Linear Systems at Once

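Solve the systems

(a)
\[
\begin{array}{rcrcrcr}
x_1 &+& 2x_2 &+& 3x_3 &=& 4\\
2x_1 &+& 5x_2 &+& 3x_3 &=& 5\\
x_1 & & &+& 8x_3 &=& 9
\end{array}
\]

(b)
\[
\begin{array}{rcrcrcr}
x_1 &+& 2x_2 &+& 3x_3 &=& 1\\
2x_1 &+& 5x_2 &+& 3x_3 &=& 6\\
x_1 & & &+& 8x_3 &=& -6
\end{array}
\]

Solution The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on the right sides of these systems, we obtain

\[
\left[\begin{array}{rrr|r|r}
1 & 2 & 3 & 4 & 1\\
2 & 5 & 3 & 5 & 6\\
1 & 0 & 8 & 9 & -6
\end{array}\right]
\]

Reducing this matrix to reduced row echelon form yields (verify)

\[
\left[\begin{array}{rrr|r|r}
1 & 0 & 0 & 1 & 2\\
0 & 1 & 0 & 0 & 1\\
0 & 0 & 1 & 1 & -1
\end{array}\right]
\]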


It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ and the solution of system (b) is $x_1 = 2$, $x_2 = 1$, $x_3 = -1$.

Properties of Invertible Matrices

Up to now, to show that an $n \times n$ matrix $A$ is invertible, it has been necessary to find an $n \times n$ matrix $B$ such that

\[
AB = I \quad\text{and}\quad BA = I
\]

The next theorem shows that if we produce an $n \times n$ matrix $B$ satisfying either condition, then the other condition will hold automatically.

THEOREM 1.6.3 Let $A$ be a square matrix.

(a) If $B$ is a square matrix satisfying $BA = I$, then $B = A^{-1}$.

(b) If $B$ is a square matrix satisfying $AB = I$, then $B = A^{-1}$.

We will prove part (a) and leave part (b) as an exercise.

Proof (a) Assume that $BA = I$. If we can show that $A$ is invertible, the proof can be completed by multiplying $BA = I$ on both sides by $A^{-1}$ to obtain

\[
BAA^{-1} = IA^{-1} \quad\text{or}\quad BI = IA^{-1} \quad\text{or}\quad B = A^{-1}
\]

To show that $A$ is invertible, it suffices to show that the system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution (see Theorem 1.5.3). Let $\mathbf{x}_0$ be any solution of this system. If we multiply both sides of $A\mathbf{x}_0 = \mathbf{0}$ on the left by $B$, we obtain $BA\mathbf{x}_0 = B\mathbf{0}$ or $I\mathbf{x}_0 = \mathbf{0}$ or $\mathbf{x}_0 = \mathbf{0}$. Thus, the system of equations $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

Equivalence Theorem

We are now in a position to add two more statements to the four given in Theorem 1.5.3.

THEOREM 1.6.4 Equivalent Statements

If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

Proof Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that (a) ⇒ (f) ⇒ (e) ⇒ (a).

(a) ⇒ (f) This was already proved in Theorem 1.6.2.

(f) ⇒ (e) This is almost self-evident, for if $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$, then $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(e) ⇒ (a) If the system $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$, then, in particular, this is so for the systems

\[
A\mathbf{x} = \begin{bmatrix} 1\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix},\quad
A\mathbf{x} = \begin{bmatrix} 0\\ 1\\ 0\\ \vdots\\ 0 \end{bmatrix},\ \ldots,\quad
A\mathbf{x} = \begin{bmatrix} 0\\ 0\\ 0\\ \vdots\\ 1 \end{bmatrix}
\]

Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be solutions of the respective systems, and let us form an $n \times n$ matrix $C$ having these solutions as columns. Thus $C$ has the form

\[
C = [\,\mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n\,]
\]

As discussed in Section 1.3, the successive columns of the product $AC$ will be $A\mathbf{x}_1, A\mathbf{x}_2, \ldots, A\mathbf{x}_n$ [see Formula (8) of Section 1.3]. Thus,

\[
AC = [\,A\mathbf{x}_1 \mid A\mathbf{x}_2 \mid \cdots \mid A\mathbf{x}_n\,] =
\begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix} = I
\]

By part (b) of Theorem 1.6.3, it follows that $C = A^{-1}$. Thus, $A$ is invertible.

It follows from the equivalency of parts (e) and (f) that if you can show that $A\mathbf{x} = \mathbf{b}$ has at least one solution for every $n \times 1$ matrix $\mathbf{b}$, then you can conclude that it has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the product of square matrices is invertible, then the factors themselves must be invertible.

THEOREM 1.6.5 Let $A$ and $B$ be square matrices of the same size. If $AB$ is invertible, then $A$ and $B$ must also be invertible.

Proof We will show first that $B$ is invertible by showing that the homogeneous system $B\mathbf{x} = \mathbf{0}$ has only the trivial solution. If we assume that $\mathbf{x}_0$ is any solution of this system, then

\[
(AB)\mathbf{x}_0 = A(B\mathbf{x}_0) = A\mathbf{0} = \mathbf{0}
\]

so $\mathbf{x}_0 = \mathbf{0}$ by parts (a) and (b) of Theorem 1.6.4 applied to the invertible matrix $AB$. But the invertibility of $B$ implies the invertibility of $B^{-1}$ (Theorem 1.4.7), which in turn implies that

\[
(AB)B^{-1} = A(BB^{-1}) = AI = A
\]

is invertible since the left side is a product of invertible matrices. This completes the proof.

In our later work the following fundamental problem will occur frequently in various contexts.

A Fundamental Problem Let $A$ be a fixed $m \times n$ matrix. Find all $m \times 1$ matrices $\mathbf{b}$ such that the system of equations $A\mathbf{x} = \mathbf{b}$ is consistent.

If $A$ is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every $m \times 1$ matrix $\mathbf{b}$, the linear system $A\mathbf{x} = \mathbf{b}$ has the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. If $A$ is not square, or if $A$ is square but not invertible, then Theorem 1.6.2 does not apply. In these cases $\mathbf{b}$ must usually satisfy certain conditions in order for $A\mathbf{x} = \mathbf{b}$ to be consistent. The following example illustrates how the methods of Section 1.2 can be used to determine such conditions.

EXAMPLE 3 Determining Consistency by Elimination

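What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{array}{rcrcrcl}
x_1 &+& x_2 &+& 2x_3 &=& b_1\\
x_1 & & &+& x_3 &=& b_2\\
2x_1 &+& x_2 &+& 3x_3 &=& b_3
\end{array}
\]

to be consistent?

Solution The augmented matrix is

\[
\left[\begin{array}{rrr|l}
1 & 1 & 2 & b_1\\
1 & 0 & 1 & b_2\\
2 & 1 & 3 & b_3
\end{array}\right]
\]

which can be reduced to row echelon form as follows:

\[
\left[\begin{array}{rrr|l}
1 & 1 & 2 & b_1\\
0 & -1 & -1 & b_2 - b_1\\
0 & -1 & -1 & b_3 - 2b_1
\end{array}\right]
\]

−1 times the first row was added to the second and −2 times the first row was added to the third.

\[
\left[\begin{array}{rrr|l}
1 & 1 & 2 & b_1\\
0 & 1 & 1 & b_1 - b_2\\
0 & -1 & -1 & b_3 - 2b_1
\end{array}\right]
\]

The second row was multiplied by −1.

\[
\left[\begin{array}{rrr|l}
1 & 1 & 2 & b_1\\
0 & 1 & 1 & b_1 - b_2\\
0 & 0 & 0 & b_3 - b_2 - b_1
\end{array}\right]
\]

The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

\[
b_3 - b_2 - b_1 = 0 \quad\text{or}\quad b_3 = b_1 + b_2
\]

To express this condition another way, $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is a matrix of the form

\[
\mathbf{b} = \begin{bmatrix} b_1\\ b_2\\ b_1 + b_2 \end{bmatrix}
\]

where $b_1$ and $b_2$ are arbitrary.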
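The consistency test of Example 3 can also be tried on specific right-hand sides. The sketch below is illustrative only: the function name `is_consistent` is an assumption, and it performs ordinary forward elimination on the augmented matrix with exact `Fraction` arithmetic, then looks for a row that reads "0 = nonzero".

```python
from fractions import Fraction

def is_consistent(A, b):
    """Row-reduce [A | b] and report whether Ax = b has at least one solution."""
    rows = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    m, n = len(rows), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        for r in range(pivot_row + 1, m):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # The system is inconsistent exactly when some row has zero coefficients
    # but a nonzero constant on the right.
    return all(any(x != 0 for x in row[:n]) or row[n] == 0 for row in rows)

A = [[1, 1, 2], [1, 0, 1], [2, 1, 3]]   # coefficient matrix of Example 3
print(is_consistent(A, [1, 2, 3]))      # b3 = b1 + b2 holds
print(is_consistent(A, [1, 2, 4]))      # b3 = b1 + b2 fails
```

A numerical check like this only tests one vector at a time; the symbolic elimination in the example is what yields the general condition on the entries of the right-hand side.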

EXAMPLE 4 Determining Consistency by Elimination

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{array}{rcrcrcl}
x_1 &+& 2x_2 &+& 3x_3 &=& b_1\\
2x_1 &+& 5x_2 &+& 3x_3 &=& b_2\\
x_1 & & &+& 8x_3 &=& b_3
\end{array}
\]

to be consistent?

Solution The augmented matrix is

\[
\left[\begin{array}{rrr|l}
1 & 2 & 3 & b_1\\
2 & 5 & 3 & b_2\\
1 & 0 & 8 & b_3
\end{array}\right]
\]

Reducing this to reduced row echelon form yields (verify)

\[
\left[\begin{array}{rrr|l}
1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3\\
0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3\\
0 & 0 & 1 & 5b_1 - 2b_2 - b_3
\end{array}\right] \tag{2}
\]

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

\[
x_1 = -40b_1 + 16b_2 + 9b_3,\quad x_2 = 13b_1 - 5b_2 - 3b_3,\quad x_3 = 5b_1 - 2b_2 - b_3 \tag{3}
\]

for all values of $b_1$, $b_2$, and $b_3$.

What does the result in Example 4 tell you about the coefficient matrix of the system?
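Formula (3) can be read as a matrix-vector product: the coefficients of $b_1$, $b_2$, $b_3$ form a $3 \times 3$ array that is applied to the column of constants. The short sketch below (the helper name `apply` and the variable names are assumptions made for illustration) evaluates that product for the right-hand sides that appeared in Examples 1 and 2 of this section and checks each result by substituting back into the system.

```python
def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of entries)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Coefficients of b1, b2, b3 read off from Formula (3).
FORMULA_3 = [[-40, 16, 9], [13, -5, -3], [5, -2, -1]]
# Coefficient matrix of the system in Example 4.
A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]

for b in ([5, 3, 17], [4, 5, 9], [1, 6, -6]):
    x = apply(FORMULA_3, b)
    assert apply(A, x) == b      # x really satisfies Ax = b
    print(b, "->", x)
```

The three printed solutions agree with the ones obtained in Examples 1 and 2, where the same coefficient matrix was used.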

Exercise Set 1.6

In Exercises 1–8, solve the system by inverting the coefficient matrix and using Theorem 1.6.2.

1. $\begin{aligned} x_1 + x_2 &= 2\\ 5x_1 + 6x_2 &= 9 \end{aligned}$

2. $\begin{aligned} 4x_1 - 3x_2 &= -3\\ 2x_1 - 5x_2 &= 9 \end{aligned}$

3. $\begin{aligned} x_1 + 3x_2 + x_3 &= 4\\ 2x_1 + 2x_2 + x_3 &= -1\\ 2x_1 + 3x_2 + x_3 &= 3 \end{aligned}$

4. $\begin{aligned} 5x_1 + 3x_2 + 2x_3 &= 4\\ 3x_1 + 3x_2 + 2x_3 &= 2\\ x_2 + x_3 &= 5 \end{aligned}$

5. $\begin{aligned} x + y + z &= 5\\ x + y - 4z &= 10\\ -4x + y + z &= 0 \end{aligned}$

6. $\begin{aligned} -x - 2y - 3z &= 0\\ w + x + 4y + 4z &= 7\\ w + 3x + 7y + 9z &= 4\\ -w - 2x - 4y - 6z &= 6 \end{aligned}$

7. $\begin{aligned} 3x_1 + 5x_2 &= b_1\\ x_1 + 2x_2 &= b_2 \end{aligned}$

8. $\begin{aligned} x_1 + 2x_2 + 3x_3 &= b_1\\ 2x_1 + 5x_2 + 5x_3 &= b_2\\ 3x_1 + 5x_2 + 8x_3 &= b_3 \end{aligned}$

In Exercises 9–12, solve the linear systems together by reducing the appropriate augmented matrix.

9. $\begin{aligned} x_1 - 5x_2 &= b_1\\ 3x_1 + 2x_2 &= b_2 \end{aligned}$
(i) $b_1 = 1$, $b_2 = 4$ (ii) $b_1 = -2$, $b_2 = 5$

10. $\begin{aligned} -x_1 + 4x_2 + x_3 &= b_1\\ x_1 + 9x_2 - 2x_3 &= b_2\\ 6x_1 + 4x_2 - 8x_3 &= b_3 \end{aligned}$
(i) $b_1 = 0$, $b_2 = 1$, $b_3 = 0$ (ii) $b_1 = -3$, $b_2 = 4$, $b_3 = -5$

11. $\begin{aligned} 4x_1 - 7x_2 &= b_1\\ x_1 + 2x_2 &= b_2 \end{aligned}$
(i) $b_1 = 0$, $b_2 = 1$ (ii) $b_1 = -4$, $b_2 = 6$ (iii) $b_1 = -1$, $b_2 = 3$ (iv) $b_1 = -5$, $b_2 = 1$

12. $\begin{aligned} x_1 + 3x_2 + 5x_3 &= b_1\\ -x_1 - 2x_2 &= b_2\\ 2x_1 + 5x_2 + 4x_3 &= b_3 \end{aligned}$
(i) $b_1 = 1$, $b_2 = 0$, $b_3 = -1$ (ii) $b_1 = 0$, $b_2 = 1$, $b_3 = 1$ (iii) $b_1 = -1$, $b_2 = -1$, $b_3 = 0$

In Exercises 13–17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.

13. $\begin{aligned} x_1 + 3x_2 &= b_1\\ -2x_1 + x_2 &= b_2 \end{aligned}$

14. $\begin{aligned} 6x_1 - 4x_2 &= b_1\\ 3x_1 - 2x_2 &= b_2 \end{aligned}$

15. $\begin{aligned} x_1 - 2x_2 + 5x_3 &= b_1\\ 4x_1 - 5x_2 + 8x_3 &= b_2\\ -3x_1 + 3x_2 - 3x_3 &= b_3 \end{aligned}$

16. $\begin{aligned} x_1 - 2x_2 - x_3 &= b_1\\ -4x_1 + 5x_2 + 2x_3 &= b_2\\ -4x_1 + 7x_2 + 4x_3 &= b_3 \end{aligned}$

17. $\begin{aligned} x_1 - x_2 + 3x_3 + 2x_4 &= b_1\\ -2x_1 + x_2 + 5x_3 + x_4 &= b_2\\ -3x_1 + 2x_2 + 2x_3 - x_4 &= b_3\\ 4x_1 - 3x_2 + x_3 + 3x_4 &= b_4 \end{aligned}$

18. Consider the matrices

\[
A = \begin{bmatrix} 2 & 1 & 2\\ 2 & 2 & -2\\ 3 & 1 & 1 \end{bmatrix}
\quad\text{and}\quad
\mathbf{x} = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}
\]

(a) Show that the equation $A\mathbf{x} = \mathbf{x}$ can be rewritten as $(A - I)\mathbf{x} = \mathbf{0}$ and use this result to solve $A\mathbf{x} = \mathbf{x}$ for $\mathbf{x}$.

(b) Solve $A\mathbf{x} = 4\mathbf{x}$.

In Exercises 19–20, solve the matrix equation for $X$.

19. $\begin{bmatrix} 1 & -1 & 1\\ 2 & 3 & 0\\ 0 & 2 & -1 \end{bmatrix} X = \begin{bmatrix} 2 & -1 & 5 & 7 & 8\\ 4 & 0 & -3 & 0 & 1\\ 3 & 5 & -7 & 2 & 1 \end{bmatrix}$

20. $\begin{bmatrix} -2 & 0 & 1\\ 0 & -1 & -1\\ 1 & 1 & -4 \end{bmatrix} X = \begin{bmatrix} 4 & 3 & 2 & 1\\ 6 & 7 & 8 & 9\\ 1 & 3 & 7 & 9 \end{bmatrix}$

Working with Proofs

21. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of $n$ linear equations in $n$ unknowns that has only the trivial solution. Prove that if $k$ is any positive integer, then the system $A^k\mathbf{x} = \mathbf{0}$ also has only the trivial solution.

22. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of $n$ linear equations in $n$ unknowns, and let $Q$ be an invertible $n \times n$ matrix. Prove that $A\mathbf{x} = \mathbf{0}$ has only the trivial solution if and only if $(QA)\mathbf{x} = \mathbf{0}$ has only the trivial solution.

23. Let $A\mathbf{x} = \mathbf{b}$ be any consistent system of linear equations, and let $\mathbf{x}_1$ be a fixed solution. Prove that every solution to the system can be written in the form $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_0$, where $\mathbf{x}_0$ is a solution to $A\mathbf{x} = \mathbf{0}$. Prove also that every matrix of this form is a solution.

24. Use part (a) of Theorem 1.6.3 to prove part (b).

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

(a) It is impossible for a system of linear equations to have exactly two solutions.

(b) If $A$ is a square matrix, and if the linear system $A\mathbf{x} = \mathbf{b}$ has a unique solution, then the linear system $A\mathbf{x} = \mathbf{c}$ also must have a unique solution.

(c) If $A$ and $B$ are $n \times n$ matrices such that $AB = I_n$, then $BA = I_n$.

(d) If $A$ and $B$ are row equivalent matrices, then the linear systems $A\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have the same solution set.

(e) Let $A$ be an $n \times n$ matrix and $S$ is an $n \times n$ invertible matrix. If $\mathbf{x}$ is a solution to the linear system $(S^{-1}AS)\mathbf{x} = \mathbf{b}$, then $S\mathbf{x}$ is a solution to the linear system $A\mathbf{y} = S\mathbf{b}$.

(f) Let $A$ be an $n \times n$ matrix. The linear system $A\mathbf{x} = 4\mathbf{x}$ has a unique solution if and only if $A - 4I$ is an invertible matrix.

(g) Let $A$ and $B$ be $n \times n$ matrices. If $A$ or $B$ (or both) are not invertible, then neither is $AB$.

Working with Technology

T1. Colors in print media, on computer monitors, and on television screens are implemented using what are called "color models". For example, in the RGB model, colors are created by mixing percentages of red (R), green (G), and blue (B), and in the YIQ model (used in TV broadcasting), colors are created by mixing percentages of luminescence (Y) with percentages of a chrominance factor (I) and a chrominance factor (Q). The conversion from the RGB model to the YIQ model is accomplished by the matrix equation

\[
\begin{bmatrix} Y\\ I\\ Q \end{bmatrix} =
\begin{bmatrix} .299 & .587 & .114\\ .596 & -.275 & -.321\\ .212 & -.523 & .311 \end{bmatrix}
\begin{bmatrix} R\\ G\\ B \end{bmatrix}
\]

What matrix would you use to convert the YIQ model to the RGB model?

T2. Let

\[
A = \begin{bmatrix} 1 & -2 & 2\\ 4 & 5 & 1\\ 0 & 3 & -1 \end{bmatrix},\quad
B_1 = \begin{bmatrix} 0\\ 1\\ 7 \end{bmatrix},\quad
B_2 = \begin{bmatrix} 11\\ 5\\ 3 \end{bmatrix},\quad
B_3 = \begin{bmatrix} 1\\ -4\\ 2 \end{bmatrix}
\]

Solve the linear systems $A\mathbf{x} = B_1$, $A\mathbf{x} = B_2$, $A\mathbf{x} = B_3$ using the method of Example 2.

1.7 Diagonal, Triangular, and Symmetric Matrices

In this section we will discuss matrices that have various special forms. These matrices arise in a wide variety of applications and will play an important role in our subsequent work.

Diagonal Matrices

A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples:

\[
\begin{bmatrix} 2 & 0\\ 0 & -5 \end{bmatrix},\quad
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix},\quad
\begin{bmatrix} 6 & 0 & 0 & 0\\ 0 & -4 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 8 \end{bmatrix},\quad
\begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix}
\]


A general $n \times n$ diagonal matrix $D$ can be written as

\[
D = \begin{bmatrix}
d_1 & 0 & \cdots & 0\\
0 & d_2 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & d_n
\end{bmatrix} \tag{1}
\]

A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

\[
D^{-1} = \begin{bmatrix}
1/d_1 & 0 & \cdots & 0\\
0 & 1/d_2 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1/d_n
\end{bmatrix} \tag{2}
\]

You can verify that this is so by multiplying (1) and (2).

Confirm Formula (2) by showing that $DD^{-1} = D^{-1}D = I$.

Powers of diagonal matrices are easy to compute; we leave it for you to verify that if $D$ is the diagonal matrix (1) and $k$ is a positive integer, then

\[
D^k = \begin{bmatrix}
d_1^k & 0 & \cdots & 0\\
0 & d_2^k & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & d_n^k
\end{bmatrix} \tag{3}
\]

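EXAMPLE 1 Inverses and Powers of Diagonal Matrices

If

\[
A = \begin{bmatrix} 1 & 0 & 0\\ 0 & -3 & 0\\ 0 & 0 & 2 \end{bmatrix}
\]

then

\[
A^{-1} = \begin{bmatrix} 1 & 0 & 0\\ 0 & -\tfrac{1}{3} & 0\\ 0 & 0 & \tfrac{1}{2} \end{bmatrix},\quad
A^{5} = \begin{bmatrix} 1 & 0 & 0\\ 0 & -243 & 0\\ 0 & 0 & 32 \end{bmatrix},\quad
A^{-5} = \begin{bmatrix} 1 & 0 & 0\\ 0 & -\tfrac{1}{243} & 0\\ 0 & 0 & \tfrac{1}{32} \end{bmatrix}
\]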
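Because Formulas (2) and (3) act on each diagonal entry separately, a diagonal matrix can be represented by the list of its diagonal entries. The sketch below takes that shortcut (the representation and the function name `diagonal_power` are assumptions for illustration) and recomputes the three matrices of Example 1 with exact fractions.

```python
from fractions import Fraction

def diagonal_power(diagonal, k):
    """Diagonal entries of D**k for an integer k.

    A negative k requires every entry to be nonzero; otherwise
    Python raises ZeroDivisionError, matching the fact that D is
    then not invertible.
    """
    return [Fraction(d) ** k for d in diagonal]

d = [1, -3, 2]                  # diagonal of the matrix A in Example 1
print(diagonal_power(d, -1))    # entries of the inverse
print(diagonal_power(d, 5))     # entries of the fifth power
print(diagonal_power(d, -5))    # entries of the inverse of the fifth power
```

Storing only the diagonal keeps the cost proportional to $n$ rather than to the $n^2$ entries of the full matrix.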

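Matrix products that involve diagonal factors are especially easy to compute. For example,

\[
\begin{bmatrix} d_1 & 0 & 0\\ 0 & d_2 & 0\\ 0 & 0 & d_3 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14}\\ a_{21} & a_{22} & a_{23} & a_{24}\\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14}\\ d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24}\\ d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34} \end{bmatrix}
\]

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\\ a_{41} & a_{42} & a_{43} \end{bmatrix}
\begin{bmatrix} d_1 & 0 & 0\\ 0 & d_2 & 0\\ 0 & 0 & d_3 \end{bmatrix}
=
\begin{bmatrix} d_1a_{11} & d_2a_{12} & d_3a_{13}\\ d_1a_{21} & d_2a_{22} & d_3a_{23}\\ d_1a_{31} & d_2a_{32} & d_3a_{33}\\ d_1a_{41} & d_2a_{42} & d_3a_{43} \end{bmatrix}
\]

In words, to multiply a matrix $A$ on the left by a diagonal matrix $D$, multiply successive rows of $A$ by the successive diagonal entries of $D$, and to multiply $A$ on the right by $D$, multiply successive columns of $A$ by the successive diagonal entries of $D$.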
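The row-scaling and column-scaling rule can be confirmed on a small numerical case. In the sketch below the helper names `matmul` and `diag` and the particular numbers are assumptions chosen for illustration; the products are computed from the definition of matrix multiplication and then compared with direct scaling of rows and columns.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def diag(entries):
    """Square matrix with the given entries on the main diagonal."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
d = [2, 3, 5]
D = diag(d)

left = matmul(D, A)    # D on the left scales the rows of A
right = matmul(A, D)   # D on the right scales the columns of A

assert left == [[d[i] * x for x in A[i]] for i in range(3)]
assert right == [[d[j] * row[j] for j in range(3)] for row in A]
print(left)
print(right)
```

In practice this means a product with a diagonal factor never needs a full matrix multiplication; scaling the rows or columns directly gives the same result.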

Triangular Matrices

A square matrix in which all the entries above the main diagonal are zero is called lower triangular, and a square matrix in which all the entries below the main diagonal are zero is called upper triangular. A matrix that is either upper triangular or lower triangular is called triangular.

EXAMPLE 2 Upper and Lower Triangular Matrices

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14}\\
0 & a_{22} & a_{23} & a_{24}\\
0 & 0 & a_{33} & a_{34}\\
0 & 0 & 0 & a_{44}
\end{bmatrix}
\]

A general $4 \times 4$ upper triangular matrix

\[
\begin{bmatrix}
a_{11} & 0 & 0 & 0\\
a_{21} & a_{22} & 0 & 0\\
a_{31} & a_{32} & a_{33} & 0\\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\]

A general $4 \times 4$ lower triangular matrix

Remark Observe that diagonal matrices are both upper triangular and lower triangular since they have zeros below and above the main diagonal. Observe also that a square matrix in row echelon form is upper triangular since it has zeros below the main diagonal.

Properties of Triangular Matrices

[Figure 1.7.1]

Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof:

• A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is, $a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is, $a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

• A square matrix $A = [a_{ij}]$ is upper triangular if and only if the $i$th row starts with at least $i - 1$ zeros for every $i$.

• A square matrix $A = [a_{ij}]$ is lower triangular if and only if the $j$th column starts with at least $j - 1$ zeros for every $j$.

The following theorem lists some of the basic properties of triangular matrices.

THEOREM 1.7.1

(a) The transpose of a lower triangular matrix is upper triangular, and the transpose of an upper triangular matrix is lower triangular.

(b) The product of lower triangular matrices is lower triangular, and the product of upper triangular matrices is upper triangular.

(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.

(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.
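Parts (a) and (b) of the theorem are easy to spot-check numerically. The sketch below is only an illustration on one pair of matrices, not a proof; the helper names and the two sample matrices are assumptions. It tests the condition $a_{ij} = 0$ for $i < j$ on a product of lower triangular matrices and the condition $a_{ij} = 0$ for $i > j$ on a transpose.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_lower_triangular(A):
    """True when every entry above the main diagonal (i < j) is zero."""
    return all(A[i][j] == 0 for i in range(len(A)) for j in range(i + 1, len(A)))

def is_upper_triangular(A):
    """True when every entry below the main diagonal (i > j) is zero."""
    return is_lower_triangular(transpose(A))

L1 = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
L2 = [[7, 0, 0], [8, 9, 0], [1, 2, 3]]

product = matmul(L1, L2)
print(product)
print(is_lower_triangular(product))          # part (b)
print(is_upper_triangular(transpose(L1)))    # part (a)
```

For these two matrices the diagonal of the product consists of the products of the corresponding diagonal entries of the factors.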

Part (a) is evident from the fact that transposing a square matrix can be accomplished by reflecting the entries about the main diagonal; we omit the formal proof. We will prove (b), but we will defer the proofs of (c) and (d) to the next chapter, where we will have the tools to prove those results more efficiently.

Proof (b) We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let $A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular $n \times n$ matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that $C$ is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

\[
c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}
\]

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

\[
c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\substack{\text{Terms in which the row number of } b\\ \text{is less than the column number of } b}}
+ \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\substack{\text{Terms in which the row number of } a\\ \text{is less than the column number of } a}}
\]

In the first grouping all of the $b$ factors are zero since $B$ is lower triangular, and in the second grouping all of the $a$ factors are zero since $A$ is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.

EXAMPLE 3 Computations with Triangular Matrices

Consider the upper triangular matrices

\[
A = \begin{bmatrix} 1 & 3 & -1\\ 0 & 2 & 4\\ 0 & 0 & 5 \end{bmatrix},\quad
B = \begin{bmatrix} 3 & -2 & 2\\ 0 & 0 & -1\\ 0 & 0 & 1 \end{bmatrix}
\]

It follows from part (c) of Theorem 1.7.1 that the matrix $A$ is invertible but the matrix $B$ is not. Moreover, the theorem also tells us that $A^{-1}$, $AB$, and $BA$ must be upper triangular. We leave it for you to confirm these three statements by showing that

\[
A^{-1} = \begin{bmatrix} 1 & -\tfrac{3}{2} & \tfrac{7}{5}\\ 0 & \tfrac{1}{2} & -\tfrac{2}{5}\\ 0 & 0 & \tfrac{1}{5} \end{bmatrix},\quad
AB = \begin{bmatrix} 3 & -2 & -2\\ 0 & 0 & 2\\ 0 & 0 & 5 \end{bmatrix},\quad
BA = \begin{bmatrix} 3 & 5 & -1\\ 0 & 0 & -5\\ 0 & 0 & 5 \end{bmatrix}
\]

Observe that in Example 3 the diagonal entries of $AB$ and $BA$ are the same, and in both cases they are the products of the corresponding diagonal entries of $A$ and $B$. In the exercises we will ask you to prove that this happens whenever two upper triangular matrices or two lower triangular matrices are multiplied.

Symmetric Matrices

DEFINITION 1 A square matrix $A$ is said to be symmetric if $A = A^T$.

EXAMPLE 4 Symmetric Matrices

The following matrices are symmetric, since each is equal to its own transpose (verify).

\[
\begin{bmatrix} 7 & -3\\ -3 & 5 \end{bmatrix},\quad
\begin{bmatrix} 1 & 4 & 5\\ 4 & -3 & 0\\ 5 & 0 & 7 \end{bmatrix},\quad
\begin{bmatrix} d_1 & 0 & 0 & 0\\ 0 & d_2 & 0 & 0\\ 0 & 0 & d_3 & 0\\ 0 & 0 & 0 & d_4 \end{bmatrix}
\]

It is easy to recognize a symmetric matrix by inspection: The entries on the main diagonal have no restrictions, but mirror images of entries across the main diagonal must be equal. Here is a picture using the second matrix in Example 4:

\[
\begin{bmatrix} 1 & 4 & 5\\ 4 & -3 & 0\\ 5 & 0 & 7 \end{bmatrix}
\]

Remark It follows from Formula (14) of Section 1.3 that a square matrix $A$ is symmetric if and only if

\[
(A)_{ij} = (A)_{ji} \tag{4}
\]

for all values of $i$ and $j$.

The following theorem lists the main algebraic properties of symmetric matrices. The proofs are direct consequences of Theorem 1.4.8 and are omitted.


THEOREM 1.7.2 If $A$ and $B$ are symmetric matrices with the same size, and if $k$ is any scalar, then:

(a) $A^T$ is symmetric.

(b) $A + B$ and $A - B$ are symmetric.

(c) $kA$ is symmetric.

It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let $A$ and $B$ be symmetric matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of $A$ and $B$ that

\[
(AB)^T = B^TA^T = BA
\]

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if $A$ and $B$ commute. In summary, we have the following result.

THEOREM 1.7.3 The product of two symmetric matrices is symmetric if and only if the matrices commute.
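Theorem 1.7.3 can be checked on concrete $2 \times 2$ symmetric matrices. In the sketch below the helper names are assumptions made for illustration; for each pair it compares "the product is symmetric" with "the factors commute" and confirms that the two answers always agree.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    return A == transpose(A)

A = [[1, 2], [2, 3]]
B = [[-4, 1], [1, 0]]     # symmetric, does not commute with A
C = [[-4, 3], [3, -1]]    # symmetric, commutes with A

for M in (B, C):
    product = matmul(A, M)
    commute = product == matmul(M, A)
    # The theorem says these two truth values must coincide.
    assert is_symmetric(product) == commute
    print(product, "symmetric:", is_symmetric(product), "commute:", commute)
```

The check covers only the two sample pairs; the general statement rests on the identity $(AB)^T = BA$ derived above.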

EXAMPLE 5 Products of Symmetric Matrices

The first of the following equations shows a product of symmetric matrices that is not symmetric, and the second shows a product of symmetric matrices that is symmetric. We conclude that the factors in the first equation do not commute, but those in the second equation do. We leave it for you to verify that this is so.

\[
\begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 1\\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} -2 & 1\\ -5 & 2 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}
\begin{bmatrix} -4 & 3\\ 3 & -1 \end{bmatrix}
= \begin{bmatrix} 2 & 1\\ 1 & 3 \end{bmatrix}
\]

Invertibility of Symmetric Matrices

In general, a symmetric matrix need not be invertible. For example, a diagonal matrix with a zero on the main diagonal is symmetric but not invertible. However, the following theorem shows that if a symmetric matrix happens to be invertible, then its inverse must also be symmetric.

THEOREM 1.7.4 If $A$ is an invertible symmetric matrix, then $A^{-1}$ is symmetric.

Proof Assume that $A$ is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$, we have

\[
(A^{-1})^T = (A^T)^{-1} = A^{-1}
\]

which proves that $A^{-1}$ is symmetric.

Products $AA^T$ and $A^TA$ Are Symmetric

Matrix products of the form AAT and ATA arise in a variety of applications. If A is an m × n matrix, then AT is an n × m matrix, so the products AAT and ATA are both square matrices—the matrix AAT has size m × m, and the matrix ATA has size n × n. Such products are always symmetric since

(AAT )T = (AT )TAT = AAT and (ATA)T = AT(AT )T = ATA


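EXAMPLE 6 The Product of a Matrix and Its Transpose Is Symmetric

Let $A$ be the $2 \times 3$ matrix

\[ A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \]

Then

\[ A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} = \begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix} \]

\[ AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} = \begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix} \]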
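The computation in Example 6 is easy to reproduce by machine. The sketch below uses plain Python lists rather than a matrix library; the helper names `transpose` and `matmul` are assumptions made for illustration and do not come from the text.

```python
def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, -2, 4],
     [3, 0, -5]]           # the 2 x 3 matrix of Example 6

At = transpose(A)
AtA = matmul(At, A)         # 3 x 3
AAt = matmul(A, At)         # 2 x 2

print(AtA)
print(AAt)
# A matrix is symmetric exactly when it equals its own transpose.
print(AtA == transpose(AtA), AAt == transpose(AAt))
```

The two products have different sizes ($3 \times 3$ and $2 \times 2$), but each equals its own transpose, in agreement with the identities derived just before the example.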

Observe that $A^TA$ and $AA^T$ are symmetric as expected.

Later in this text, we will obtain general conditions on $A$ under which $AA^T$ and $A^TA$ are invertible. However, in the special case where $A$ is square, we have the following result.

THEOREM 1.7.5 If $A$ is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.

Proof Since $A$ is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are the products of invertible matrices.

Exercise Set 1.7

In Exercises 1–2, classify the matrix as upper triangular, lower triangular, or diagonal, and decide by inspection whether the matrix is invertible. [Note: Recall that a diagonal matrix is both upper and lower triangular, so there may be more than one answer in some parts.]



1. (a)



2

1

0

3



(b)





−1

0

0

(c) ⎣ 0

2

0⎦

0

0

1 5



 2. (a)

0

1

7





4

0

−2

(d) ⎣0

0

3⎦

0

0

8

0

−3

0

0



(b)



4

0

0

3 5

0⎦

0

0





7





3 3. ⎣0 0

0 −1 0

⎤⎡

0 2 0⎦ ⎣−4 2 2



1 1⎦ 5

5 5. ⎣0 0

0 2 0



2

0

6. ⎣0 0

−1





−4 −5 ⎣ 0

2 −1



0

0

⎤⎡

−3 0 0⎦ ⎣ 1 −3 −6

0

0 3 0



0 0⎦ 2

−4

−3

0

0

0⎦ ⎣ 0

5

0⎦

0

2

0 3 2

4

−1

3

0⎦ ⎣ 1

2

−5

1

⎤⎡ ⎥⎢

4



4 0 2

2 −5 2

0





3

0

0

1

0⎦

7

0

0



In Exercises 3–6, find the product by inspection.



1 4. −3

3⎦ 2

⎤⎡ ⎥⎢

−2

0

⎤ ⎥

In Exercises 7–10, find A2 , A−2 , and A−k (where k is any integer) by inspection.

(d) ⎣3



−2



3



(c) ⎣0



0





4

0





7. A =

1

0

0

−2

⎡1







0

9. A = ⎣ 0

1 3

0⎦

0

0

1 4



2

0

0

8. A = ⎣ 0

3

0⎦

0

0

5



0





−6



−2

0



0

⎢ 0 −4 0 ⎢ ⎣ 0 0 −3

10. A = ⎢

0

0

0



0

0⎥ ⎥



0⎦ 2


In Exercises 23–24, find the diagonal entries of AB by inspection.

In Exercises 11–12, compute the product by inspection.



⎤⎡

⎤⎡



1

0

0

2

0

0

0

0

0

11. ⎣0

0

0⎦ ⎣ 0

5

0⎦ ⎣0

2

0⎦

0

0

3

0

0

0

1





−1

⎥⎢

0

⎤⎡

0

0

3

⎢ 12. ⎣ 0

2

⎥⎢ 0⎦ ⎣0

0

0

4

0

0

⎥⎢

0 0

⎤⎡

5

5

⎥⎢ 0⎦ ⎣0

0

7





0

0

13.

39

1

0

0

−1

⎤⎡

a

0

0

15. (a) ⎣ 0

b

0⎦ ⎣w

v ⎥ x⎦

0

0

c

z



⎡ ⎢

⎥⎢

⎤ v  ⎥ a x⎦ 0 z

u

16. (a) ⎣w

y

u

0

0

−1

y

s v

x

y

a

0

0

(b) ⎣ 0 0





0

b



t a ⎥⎢ w⎦ ⎣ 0

0

0

0⎦

0

c

r

s

t

b

0⎦ ⎣u

v

⎥ w⎦

0

c

y

z

0

⎤⎡ ⎥⎢

x



17. (a)

2

−1

×

3

 18. (a)

0

×

3

0





1

×

×





⎢3 ⎢ (b) ⎢ ⎣7

1

×

−8

0

×⎥ ⎥ ⎥ ×⎦

2

−3

9

0

7

−3

2





×

1

0

⎤ −1 ⎥ −4 ⎦ −2

1

0

0 0

1

−5 −3 −2

0

6

4

5

×

1

−7 ⎥ ⎥ ⎥ −6 ⎦

×

×

×

3

19. ⎣0 0

7





⎢2 ⎢ 21. ⎢ ⎣4



2

4

20. ⎣ 0 0

3

0⎦

0

5

⎢ −3 ⎢ ⎣−4

0

0

−1 −6

0

0

3

8



0



0⎥ ⎥

4

⎥ 0⎦

1

3



2

22. ⎢

0

3⎦

0

6



0

6

0

0

5

0⎦

−3

0

2

6







−3 −1

4

a+5 ⎡



a − 2b + 2c

2



26. A = ⎣3

5

0

−2

2a + b + c

a+c

⎤ ⎥ ⎦

7

In Exercises 27–28, find all values of x for which A is invertible.

⎡ ⎢

27. A = ⎣ 0 0



x−



0

⎤ x4 ⎥ x3 ⎦ x−4

0

0

x2 x+2

1 2

28. A = ⎢ ⎣ x

x−

x2

x3

1 3

⎤ ⎥ ⎥ ⎦

0

x+

1 4

30. Show that if A is a symmetric n × n matrix and B is any n × m matrix, then the following products are symmetric:

B TB, BB T , B TAB In Exercises 31–32, find a diagonal matrix A that satisfies the given condition.





−1

5

29. If A is an invertible upper triangular or lower triangular matrix, what can you say about the diagonal entries of A−1 ?

In Exercises 19–22, determine by inspection whether the matrix is invertible.



7

0⎦ , B = ⎣1 3 7





2

0



⎢× ⎢ ⎣×

(b) ⎢

⎥ ⎢ −2⎦ , B = ⎣ 0 0 −1 ⎤ ⎡

x−1

In Exercises 17–18, create a symmetric matrix by substituting appropriate numbers for the ×’s.



−1

0



b

z



4



⎤⎡

r



24. A = ⎣−2

25. A =

(b) ⎣u



0

6

In Exercises 25–26, find all values of the unknown constant(s) for which A is symmetric.

1000

1





0



In Exercises 15–16, use what you have learned in this section about multiplying by diagonal matrices to compute the product by inspection.



1

3

 14.

23. A = ⎣0



In Exercises 13–14, compute the indicated quantity.



2



⎥ 0⎦

−2

0

3



0


1 31. A = ⎣0 0 5

0 −1 0





0 0⎦ −1

32. A

−2

9 = ⎣0 0

0 4 0



0 0⎦ 1

33. Verify Theorem 1.7.1(b) for the matrix product AB and Theorem 1.7.1(d) for the matrix A, where





−1

⎢ A=⎣ 0 0



0⎥ ⎥

⎥ 0⎦ −5

0











2

5

2

−8

1

3⎦, B = ⎣0

2

1⎦

0

3

0

−4

0

34. Let A be an n × n symmetric matrix. (a) Show that A2 is symmetric. (b) Show that 2A2 − 3A + I is symmetric.

0




35. Verify Theorem 1.7.4 for the given matrix A.

 (a) A =





2

−1

−1

3

1

−2

(b) A = ⎣−2

1

3

−7



3



⎥ −7 ⎦ 4

36. Find all $3 \times 3$ diagonal matrices $A$ that satisfy $A^2 - 3A - 4I = 0$.

37. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether $A$ is symmetric.
(a) $a_{ij} = i^2 + j^2$
(b) $a_{ij} = i^2 - j^2$
(c) $a_{ij} = 2i + 2j$
(d) $a_{ij} = 2i^2 + 2j^3$

38. On the basis of your experience with Exercise 37, devise a general test that can be applied to a formula for $a_{ij}$ to determine whether $A = [a_{ij}]$ is symmetric.

39. Find an upper triangular matrix that satisfies

\[ A^3 = \begin{bmatrix} 1 & 30 \\ 0 & -8 \end{bmatrix} \]

40. If the $n \times n$ matrix $A$ can be expressed as $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix, then the linear system $A\mathbf{x} = \mathbf{b}$ can be expressed as $LU\mathbf{x} = \mathbf{b}$ and can be solved in two steps:
Step 1. Let $U\mathbf{x} = \mathbf{y}$, so that $LU\mathbf{x} = \mathbf{b}$ can be expressed as $L\mathbf{y} = \mathbf{b}$. Solve this system.
Step 2. Solve the system $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.
In each part, use this two-step method to solve the given system.

(a) \[ \begin{bmatrix} 1 & 0 & 0 \\ -2 & 3 & 0 \\ 2 & 4 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 2 & 0 & 0 \\ 4 & 1 & 0 \\ -3 & -2 & 3 \end{bmatrix} \begin{bmatrix} 3 & -5 & 2 \\ 0 & 4 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -5 \\ 2 \end{bmatrix} \]

In the text we defined a matrix $A$ to be symmetric if $A^T = A$. Analogously, a matrix $A$ is said to be skew-symmetric if $A^T = -A$. Exercises 41–45 are concerned with matrices of this type.

41. Fill in the missing entries (marked with ×) so the matrix $A$ is skew-symmetric.

(a) \[ A = \begin{bmatrix} \times & \times & 4 \\ 0 & \times & \times \\ \times & -1 & \times \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} \times & 0 & \times \\ \times & \times & -4 \\ 8 & \times & \times \end{bmatrix} \]

42. Find all values of $a$, $b$, $c$, and $d$ for which $A$ is skew-symmetric.

\[ A = \begin{bmatrix} 0 & 2a - 3b + c & 3a - 5b + 5c \\ -2 & 0 & 5a - 8b + 6c \\ -3 & -5 & d \end{bmatrix} \]

43. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the product of commuting skew-symmetric matrices skew-symmetric? Explain.

Working with Proofs

44. Prove that every square matrix $A$ can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note the identity $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$.]

45. Prove the following facts about skew-symmetric matrices.
(a) If $A$ is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.
(b) If $A$ and $B$ are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar $k$.

46. Prove: If the matrices $A$ and $B$ are both upper triangular or both lower triangular, then the diagonal entries of both $AB$ and $BA$ are the products of the diagonal entries of $A$ and $B$.

47. Prove: If $A^TA = A$, then $A$ is symmetric and $A = A^2$.

True-False Exercises

TF. In parts (a)–(m) determine whether the statement is true or false, and justify your answer.
(a) The transpose of a diagonal matrix is a diagonal matrix.
(b) The transpose of an upper triangular matrix is an upper triangular matrix.
(c) The sum of an upper triangular matrix and a lower triangular matrix is a diagonal matrix.
(d) All entries of a symmetric matrix are determined by the entries occurring on and above the main diagonal.
(e) All entries of an upper triangular matrix are determined by the entries occurring on and above the main diagonal.
(f) The inverse of an invertible lower triangular matrix is an upper triangular matrix.
(g) A diagonal matrix is invertible if and only if all of its diagonal entries are positive.
(h) The sum of a diagonal matrix and a lower triangular matrix is a lower triangular matrix.
(i) A matrix that is both symmetric and upper triangular must be a diagonal matrix.
(j) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is symmetric, then $A$ and $B$ are symmetric.
(k) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is upper triangular, then $A$ and $B$ are upper triangular.
(l) If $A^2$ is a symmetric matrix, then $A$ is a symmetric matrix.
(m) If $kA$ is a symmetric matrix for some $k \neq 0$, then $A$ is a symmetric matrix.

Working with Technology

T1. Starting with the formula stated in Exercise T1 of Section 1.5, derive a formula for the inverse of the “block diagonal” matrix

\[ \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix} \]

in which $D_1$ and $D_2$ are invertible, and use your result to compute the inverse of the matrix

\[ M = \begin{bmatrix} 1.24 & 2.37 & 0 & 0 \\ 3.08 & -1.01 & 0 & 0 \\ 0 & 0 & 2.76 & 4.92 \\ 0 & 0 & 3.23 & 5.54 \end{bmatrix} \]

1.8 Matrix Transformations

In this section we will introduce a special class of functions that arise from matrix multiplication. Such functions, called “matrix transformations,” are fundamental in the study of linear algebra and have important applications in physics, engineering, social sciences, and various branches of mathematics.

Recall that in Section 1.1 we defined an “ordered n-tuple” to be a sequence of $n$ real numbers, and we observed that a solution of a linear system in $n$ unknowns, say

\[ x_1 = s_1, \quad x_2 = s_2, \ldots, \quad x_n = s_n \]

can be expressed as the ordered n-tuple

\[ (s_1, s_2, \ldots, s_n) \tag{1} \]

Recall also that if $n = 2$, then the n-tuple is called an “ordered pair,” and if $n = 3$, it is called an “ordered triple.” For two ordered n-tuples to be regarded as the same, they must list the same numbers in the same order. Thus, for example, $(1, 2)$ and $(2, 1)$ are different ordered pairs. The set of all ordered n-tuples of real numbers is denoted by the symbol $R^n$. The elements of $R^n$ are called vectors and are denoted in boldface type, such as $\mathbf{a}$, $\mathbf{b}$, $\mathbf{v}$, $\mathbf{w}$, and $\mathbf{x}$. When convenient, ordered n-tuples can be denoted in matrix notation as column vectors. For example, the matrix

\[ \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix} \tag{2} \]

can be used as an alternative to (1). We call (1) the comma-delimited form of a vector and (2) the column-vector form.

The term “vector” is used in various ways in mathematics, physics, engineering, and other applications. The idea of viewing n-tuples as vectors will be discussed in more detail in Chapter 3, at which point we will also explain how this idea relates to the more familiar notion of a vector.

For each $i = 1, 2, \ldots, n$, let $\mathbf{e}_i$ denote the vector in $R^n$ with a 1 in the $i$th position and zeros elsewhere. In column form these vectors are

\[ \mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \quad \mathbf{e}_n = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

We call the vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ the standard basis vectors for $R^n$. For example, the vectors

\[ \mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

are the standard basis vectors for $R^3$.

The vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in $R^n$ are termed “basis vectors” because all other vectors in $R^n$ are expressible in exactly one way as a linear combination of them. For example, if

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

then we can express $\mathbf{x}$ as

\[ \mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n \]

Functions and Transformations

Recall that a function is a rule that associates with each element of a set $A$ one and only one element in a set $B$. If $f$ associates the element $b$ with the element $a$, then we write

\[ b = f(a) \]

and we say that $b$ is the image of $a$ under $f$ or that $f(a)$ is the value of $f$ at $a$. The set $A$ is called the domain of $f$ and the set $B$ the codomain of $f$ (Figure 1.8.1). The subset of the codomain that consists of all images of elements in the domain is called the range of $f$.

[Figure 1.8.1: $f$ maps an element $a$ of the domain $A$ to its image $b = f(a)$ in the codomain $B$.]

In many applications the domain and codomain of a function are sets of real numbers, but in this text we will be concerned with functions for which the domain is $R^n$ and the codomain is $R^m$ for some positive integers $m$ and $n$.

DEFINITION 1 If $f$ is a function with domain $R^n$ and codomain $R^m$, then we say that $f$ is a transformation from $R^n$ to $R^m$ or that $f$ maps from $R^n$ to $R^m$, which we denote by writing

\[ f : R^n \to R^m \]

In the special case where $m = n$, a transformation is sometimes called an operator on $R^n$.

Matrix Transformations

It is common in linear algebra to use the letter $T$ to denote a transformation. In keeping with this usage, we will usually denote a transformation from $R^n$ to $R^m$ by writing

\[ T : R^n \to R^m \]

In this section we will be concerned with the class of transformations from $R^n$ to $R^m$ that arise from linear systems. Specifically, suppose that we have the system of linear equations

\[ \begin{aligned} w_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ w_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ &\;\;\vdots \\ w_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{aligned} \tag{3} \]

which we can write in matrix notation as

\[ \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \tag{4} \]

or more briefly as

\[ \mathbf{w} = A\mathbf{x} \tag{5} \]

Although we could view (5) as a compact way of writing linear system (3), we will view it instead as a transformation that maps a vector $\mathbf{x}$ in $R^n$ into the vector $\mathbf{w}$ in $R^m$ by multiplying $\mathbf{x}$ on the left by $A$. We call this a matrix transformation (or matrix operator in the special case where $m = n$). We denote it by

\[ T_A : R^n \to R^m \]

(see Figure 1.8.2). This notation is useful when it is important to make the domain and codomain clear. The subscript on $T_A$ serves as a reminder that the transformation results from multiplying vectors in $R^n$ by the matrix $A$. In situations where specifying the domain and codomain is not essential, we will express (4) as

\[ \mathbf{w} = T_A(\mathbf{x}) \tag{6} \]

[Figure 1.8.2: $T_A : R^n \to R^m$ maps a vector $\mathbf{x}$ in $R^n$ to the vector $T_A(\mathbf{x})$ in $R^m$.]

We call the transformation $T_A$ multiplication by $A$. On occasion we will find it convenient to express (6) in the schematic form

\[ \mathbf{x} \xrightarrow{\;T_A\;} \mathbf{w} \tag{7} \]

which is read “$T_A$ maps $\mathbf{x}$ into $\mathbf{w}$.”

EXAMPLE 1 A Matrix Transformation from $R^4$ to $R^3$

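The transformation from $R^4$ to $R^3$ defined by the equations

\[ \begin{aligned} w_1 &= 2x_1 - 3x_2 + x_3 - 5x_4 \\ w_2 &= 4x_1 + x_2 - 2x_3 + x_4 \\ w_3 &= 5x_1 - x_2 + 4x_3 \end{aligned} \tag{8} \]

can be expressed in matrix form as

\[ \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \]

from which we see that the transformation can be interpreted as multiplication by

\[ A = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \tag{9} \]

Although the image under the transformation $T_A$ of any vector

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \]

in $R^4$ could be computed directly from the defining equations in (8), we will find it preferable to use the matrix in (9). For example, if

\[ \mathbf{x} = \begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix} \]

then it follows from (9) that

\[ \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = T_A(\mathbf{x}) = A\mathbf{x} = \begin{bmatrix} 2 & -3 & 1 & -5 \\ 4 & 1 & -2 & 1 \\ 5 & -1 & 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -3 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 8 \end{bmatrix} \]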
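As a small illustration of "multiplication by $A$", the calculation in Example 1 can be sketched in plain Python. The function name `apply_matrix` is an assumption introduced here for illustration; each output component is the dot product of one row of $A$ with $\mathbf{x}$.

```python
def apply_matrix(A, x):
    """Return T_A(x) = Ax for a matrix A (list of rows) and a vector x (list)."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("x must have as many entries as A has columns")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# The matrix in Formula (9) and the vector x used in Example 1.
A = [[2, -3, 1, -5],
     [4, 1, -2, 1],
     [5, -1, 4, 0]]
x = [1, -3, 0, 2]

w = apply_matrix(A, x)
print(w)  # [1, 3, 8]
```

The length check reflects the fact that a $3 \times 4$ matrix defines a transformation whose domain is $R^4$ and whose codomain is $R^3$.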


EXAMPLE 2 Zero Transformations

If $0$ is the $m \times n$ zero matrix, then

\[ T_0(\mathbf{x}) = 0\mathbf{x} = \mathbf{0} \]

so multiplication by zero maps every vector in $R^n$ into the zero vector in $R^m$. We call $T_0$ the zero transformation from $R^n$ to $R^m$.

EXAMPLE 3 Identity Operators

If $I$ is the $n \times n$ identity matrix, then

\[ T_I(\mathbf{x}) = I\mathbf{x} = \mathbf{x} \]

so multiplication by $I$ maps every vector in $R^n$ to itself. We call $T_I$ the identity operator on $R^n$.

Properties of Matrix Transformations

The following theorem lists four basic properties of matrix transformations that follow from properties of matrix multiplication.

THEOREM 1.8.1 For every matrix $A$ the matrix transformation $T_A : R^n \to R^m$ has the following properties for all vectors $\mathbf{u}$ and $\mathbf{v}$ and for every scalar $k$:
(a) $T_A(\mathbf{0}) = \mathbf{0}$
(b) $T_A(k\mathbf{u}) = kT_A(\mathbf{u})$ [Homogeneity property]
(c) $T_A(\mathbf{u} + \mathbf{v}) = T_A(\mathbf{u}) + T_A(\mathbf{v})$ [Additivity property]
(d) $T_A(\mathbf{u} - \mathbf{v}) = T_A(\mathbf{u}) - T_A(\mathbf{v})$

Proof All four parts are restatements of the following properties of matrix arithmetic given in Theorem 1.4.1:

\[ A\mathbf{0} = \mathbf{0}, \quad A(k\mathbf{u}) = k(A\mathbf{u}), \quad A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}, \quad A(\mathbf{u} - \mathbf{v}) = A\mathbf{u} - A\mathbf{v} \]

It follows from parts (b) and (c) of Theorem 1.8.1 that a matrix transformation maps a linear combination of vectors in $R^n$ into the corresponding linear combination of vectors in $R^m$ in the sense that

\[ T_A(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T_A(\mathbf{u}_1) + k_2T_A(\mathbf{u}_2) + \cdots + k_rT_A(\mathbf{u}_r) \tag{10} \]

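Matrix transformations are not the only kinds of transformations. For example, if

\[ \begin{aligned} w_1 &= x_1^2 + x_2^2 \\ w_2 &= x_1x_2 \end{aligned} \tag{11} \]

then there are no constants $a$, $b$, $c$, and $d$ for which

\[ \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1^2 + x_2^2 \\ x_1x_2 \end{bmatrix} \]

so that the equations in (11) do not define a matrix transformation from $R^2$ to $R^2$.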
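One way to see that (11) cannot come from a matrix is to test it against part (b) of Theorem 1.8.1: every matrix transformation is homogeneous, so a single vector on which homogeneity fails settles the question. The sketch below is an illustration only; the test vector $(1, 1)$ and the scalar $k = 2$ are arbitrary choices, not values from the text.

```python
def T(x1, x2):
    """The transformation defined by the equations in (11)."""
    return (x1**2 + x2**2, x1 * x2)

u = (1, 1)   # arbitrary test vector (an assumption for illustration)
k = 2        # arbitrary scalar

lhs = T(k * u[0], k * u[1])           # T(ku)
rhs = tuple(k * c for c in T(*u))     # kT(u)

print(lhs, rhs)   # (8, 4) (4, 2)
print(lhs == rhs) # False, so T is not homogeneous
```

Because $T(2\mathbf{u}) \neq 2T(\mathbf{u})$ for this vector, no matrix $A$ can satisfy $T(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x}$.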


This leads us to the following two questions.

Question 1. Are there algebraic properties of a transformation $T : R^n \to R^m$ that can be used to determine whether $T$ is a matrix transformation?

Question 2. If we discover that a transformation $T : R^n \to R^m$ is a matrix transformation, how can we find a matrix for it?

The following theorem and its proof will provide the answers.

THEOREM 1.8.2 $T : R^n \to R^m$ is a matrix transformation if and only if the following relationships hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ and for every scalar $k$:
(i) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]
(ii) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

Proof If $T$ is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of Theorem 1.8.1. Conversely, assume that properties (i) and (ii) hold. We must show that there exists an $m \times n$ matrix $A$ such that

\[ T(\mathbf{x}) = A\mathbf{x} \]

for every vector $\mathbf{x}$ in $R^n$. Recall that the derivation of Formula (10) used only the additivity and homogeneity properties of $T_A$. Since we are assuming that $T$ has those properties, it must be true that

\[ T(k_1\mathbf{u}_1 + k_2\mathbf{u}_2 + \cdots + k_r\mathbf{u}_r) = k_1T(\mathbf{u}_1) + k_2T(\mathbf{u}_2) + \cdots + k_rT(\mathbf{u}_r) \tag{12} \]

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$ in $R^n$. Let $A$ be the matrix

\[ A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,] \tag{13} \]

where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$. It follows from Theorem 1.3.1 that $A\mathbf{x}$ is a linear combination of the columns of $A$ in which the successive coefficients are the entries $x_1, x_2, \ldots, x_n$ of $\mathbf{x}$. That is,

\[ A\mathbf{x} = x_1T(\mathbf{e}_1) + x_2T(\mathbf{e}_2) + \cdots + x_nT(\mathbf{e}_n) \]

Using Formula (12) we can rewrite this as

\[ A\mathbf{x} = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) = T(\mathbf{x}) \]

which completes the proof.

The additivity and homogeneity properties in Theorem 1.8.2 are called linearity conditions, and a transformation that satisfies these conditions is called a linear transformation. Using this terminology Theorem 1.8.2 can be restated as follows.

THEOREM 1.8.3 Every linear transformation from $R^n$ to $R^m$ is a matrix transformation, and conversely, every matrix transformation from $R^n$ to $R^m$ is a linear transformation.

Theorem 1.8.3 tells us that for transformations from $R^n$ to $R^m$, the terms “matrix transformation” and “linear transformation” are synonymous.


Depending on whether n-tuples and m-tuples are regarded as vectors or points, the geometric effect of a matrix transformation $T_A : R^n \to R^m$ is to map each vector (point) in $R^n$ into a vector (point) in $R^m$ (Figure 1.8.3).

[Figure 1.8.3: two panels showing $\mathbf{x}$ in $R^n$ and $T_A(\mathbf{x})$ in $R^m$. Left panel: $T_A$ maps vectors to vectors. Right panel: $T_A$ maps points to points.]

The following theorem states that if two matrix transformations from $R^n$ to $R^m$ have the same image at each point of $R^n$, then the matrices themselves must be the same.

THEOREM 1.8.4 If $T_A : R^n \to R^m$ and $T_B : R^n \to R^m$ are matrix transformations, and if $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$, then $A = B$.

Proof To say that $T_A(\mathbf{x}) = T_B(\mathbf{x})$ for every vector in $R^n$ is the same as saying that

\[ A\mathbf{x} = B\mathbf{x} \]

for every vector $\mathbf{x}$ in $R^n$. This will be true, in particular, if $\mathbf{x}$ is any of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$; that is,

\[ A\mathbf{e}_j = B\mathbf{e}_j \quad (j = 1, 2, \ldots, n) \tag{14} \]

Since every entry of $\mathbf{e}_j$ is 0 except for the $j$th, which is 1, it follows from Theorem 1.3.1 that $A\mathbf{e}_j$ is the $j$th column of $A$ and $B\mathbf{e}_j$ is the $j$th column of $B$. Thus, (14) implies that corresponding columns of $A$ and $B$ are the same, and hence that $A = B$.

Theorem 1.8.4 is significant because it tells us that there is a one-to-one correspondence between $m \times n$ matrices and matrix transformations from $R^n$ to $R^m$ in the sense that every $m \times n$ matrix $A$ produces exactly one matrix transformation (multiplication by $A$) and every matrix transformation from $R^n$ to $R^m$ arises from exactly one $m \times n$ matrix; we call that matrix the standard matrix for the transformation.

A Procedure for Finding Standard Matrices

In the course of proving Theorem 1.8.2 we showed in Formula (13) that if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard basis vectors for $R^n$ (in column form), then the standard matrix for a linear transformation $T : R^n \to R^m$ is given by the formula

\[ A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2) \mid \cdots \mid T(\mathbf{e}_n)\,] \tag{15} \]

This suggests the following procedure for finding standard matrices.

Finding the Standard Matrix for a Matrix Transformation
Step 1. Find the images of the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $R^n$.
Step 2. Construct the matrix that has the images obtained in Step 1 as its successive columns. This matrix is the standard matrix for the transformation.
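The two steps above can be sketched in a few lines of Python. The names `standard_basis` and `standard_matrix` are illustrative assumptions, and the sketch presumes that the function `T` passed in really is linear; it does not check this. As a test case it uses the transformation from $R^4$ to $R^3$ defined by the equations in (8), whose matrix is already known from (9).

```python
def standard_basis(n):
    """Return e_1, ..., e_n as lists: a 1 in position j, zeros elsewhere."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def standard_matrix(T, n):
    """Build the standard matrix of a linear T: R^n -> R^m (assumed linear)."""
    images = [T(e) for e in standard_basis(n)]   # Step 1: images T(e_j)
    m = len(images[0])
    # Step 2: the images become the successive columns of A.
    return [[images[j][i] for j in range(n)] for i in range(m)]

def T(x):
    """The transformation defined by the equations in (8)."""
    x1, x2, x3, x4 = x
    return [2*x1 - 3*x2 + x3 - 5*x4,
            4*x1 + x2 - 2*x3 + x4,
            5*x1 - x2 + 4*x3]

A = standard_matrix(T, 4)
for row in A:
    print(row)
```

The rows printed are those of the matrix in Formula (9), which is what Theorem 1.8.4 leads us to expect: a matrix transformation has exactly one standard matrix.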


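EXAMPLE 4 Finding a Standard Matrix

Find the standard matrix $A$ for the linear transformation $T : R^2 \to R^3$ defined by the formula

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - 3x_2 \\ -x_1 + x_2 \end{bmatrix} \tag{16} \]

Solution We leave it for you to verify that

\[ T(\mathbf{e}_1) = T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad T(\mathbf{e}_2) = T\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix} \]

Thus, it follows from Formulas (15) and (16) that the standard matrix is

\[ A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2)\,] = \begin{bmatrix} 2 & 1 \\ 1 & -3 \\ -1 & 1 \end{bmatrix} \]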

EXAMPLE 5 Computing with Standard Matrices

For the linear transformation in Example 4, use the standard matrix $A$ obtained in that example to find

\[ T\left(\begin{bmatrix} 1 \\ 4 \end{bmatrix}\right) \]

Solution The transformation is multiplication by $A$, so

\[ T\left(\begin{bmatrix} 1 \\ 4 \end{bmatrix}\right) = \begin{bmatrix} 2 & 1 \\ 1 & -3 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 6 \\ -11 \\ 3 \end{bmatrix} \]

Although we could have obtained the result in Example 5 by substituting values for the variables in (16), the method used in Example 5 is preferable for large-scale problems in that matrix multiplication is better suited for computer computations.
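The result of Example 5 can be cross-checked by computing the image in two ways: by substituting into the defining formula of Example 4 and by multiplying by the standard matrix. The following sketch is illustrative only; the names `T_formula` and `apply_matrix` are assumptions, not notation from the text.

```python
def T_formula(x1, x2):
    """Direct substitution into the defining formula of Example 4."""
    return [2*x1 + x2, x1 - 3*x2, -x1 + x2]

def apply_matrix(A, x):
    """Matrix-vector product Ax, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Standard matrix found in Example 4.
A = [[2, 1],
     [1, -3],
     [-1, 1]]

direct = T_formula(1, 4)
via_matrix = apply_matrix(A, [1, 4])

print(direct)      # [6, -11, 3]
print(via_matrix)  # [6, -11, 3]
```

Both routes give the same vector. The matrix route has the practical advantage that one general-purpose routine handles every linear transformation once its standard matrix is known.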

For transformation problems posed in comma-delimited form, a good procedure is to rewrite the problem in column-vector form and use the methods previously illustrated.

EXAMPLE 6 Finding a Standard Matrix

Rewrite the transformation $T(x_1, x_2) = (3x_1 + x_2, 2x_1 - 4x_2)$ in column-vector form and find its standard matrix.

Solution

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + x_2 \\ 2x_1 - 4x_2 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

Thus, the standard matrix is

\[ \begin{bmatrix} 3 & 1 \\ 2 & -4 \end{bmatrix} \]

Remark This section is but a first step in the study of linear transformations, which is one of the major themes in this text. We will delve deeper into this topic in Chapter 4, at which point we will have more background and a richer source of examples to work with.


Exercise Set 1.8

In Exercises 1–2, find the domain and codomain of the transformation $T_A(\mathbf{x}) = A\mathbf{x}$.

1. (a) $A$ has size $3 \times 2$.
(b) $A$ has size $2 \times 3$.
(c) $A$ has size $3 \times 3$.
(d) $A$ has size $1 \times 6$.

2. (a) $A$ has size $4 \times 5$.
(b) $A$ has size $5 \times 4$.
(c) $A$ has size $4 \times 4$.
(d) $A$ has size $3 \times 1$.

In Exercises 3–4, find the domain and codomain of the transformation defined by the equations. 3. (a) w1 = 4x1 + 5x2

(b) w1 = 5x1 − 7x2

w2 = x1 − 8x2

w2 = 6x1 + x2

w2 = −x1 + 4x2 + 2x3

w2 = 4x1 − 3x2 + 2x3

w3 = −3x1 + 2x2 − 5x3

3

1

6

7

 6. (a)

6

−1

⎡ ⎢

2

(b) ⎣4 2



  3 x1 7 x2

⎤ −1   ⎥ x1 3⎦ x2 −5

(c) T (x1 , x2 , x3 ) = (0, 0, 0, 0, 0)

(b) T (x1 , x2 ) = (x1 , x2 ) (c) T (x1 , x2 , x3 ) = (x1 + 2x2 + x3 , x1 + 5x2 , x3 )

2

1

(b) ⎣3 1

7



0

15. Find the standard matrix for the operator T : R 3 →R 3 defined by w1 = 3x1 + 5x2 − x3

w3 = 3x1 + 2x2 − x3

⎤⎡ ⎤ x1 −6 ⎥⎢ ⎥ −4⎦ ⎣x2 ⎦ 3 x3

7. (a) T (x1 , x2 ) = (2x1 − x2 , x1 + x2 ) (b) T (x1 , x2 , x3 ) = (4x1 + x2 , x1 + x2 )

and then compute T (−1, 2, 4) by directly substituting in the equations and then by matrix multiplication. 16. Find the standard matrix for the transformation T : R 4 →R 2 defined by w1 = 2x1 + 3x2 − 5x3 − x4

w2 = x1 − 5x2 + 2x3 − 3x4 and then compute T (1, −1, 2, 4) by directly substituting in the equations and then by matrix multiplication. In Exercises 17–18, find the standard matrix for the transformation and use it to compute T (x). Check your result by substituting directly in the formula for T .

8. (a) T (x1 , x2 , x3 , x4 ) = (x1 , x2 ) (b) T (x1 , x2 , x3 ) = (x1 , x2 − x3 , x2 )

17. (a) T (x1 , x2 ) = (−x1 + x2 , x2 ); x = (−1, 4)

In Exercises 9–10, find the domain and codomain of the transformation T defined by the formula.

⎤ ⎡ ⎛⎡ ⎤⎞ x1 x1 ⎥ ⎢ ⎜⎢ ⎥⎟ ⎢ x2 ⎥ 10. T ⎝⎣x2 ⎦⎠ = ⎢ ⎥ ⎣x1 − x3 ⎦ x3 0

In Exercises 11–12, find the standard matrix for the transformation defined by the equations. 11. (a) w1 = 2x1 − 3x2 + x3 w2 = 3x1 + 5x2 − x3

(b) T (x1 , x2 , x3 , x4 ) = (7x1 + 2x2 − x3 + x4 , x2 + x3 , −x1 )

w2 = 4x1 − x2 + x3

In Exercises 7–8, find the domain and codomain of the transformation T defined by the formula.

⎤ ⎡   4x1 x1 ⎥ ⎢ 9. T = ⎣x1 − x2 ⎦ x2 3x2

(a) T (x1 , x2 ) = (x2 , −x1 , x1 + 3x2 , x1 − x2 )

(d) T (x1 , x2 , x3 ) = (4x1 , 7x2 , −8x3 )

In Exercises 5–6, find the domain and codomain of the transformation defined by the matrix product.

5. (a)

13. Find the standard matrix for the transformation T defined by the formula.

(a) T (x1 , x2 ) = (2x1 − x2 , x1 + x2 )

(b) w1 = 2x1 + 7x2 − 4x3

⎡ ⎤  x1 2 ⎢ ⎥ ⎣x2 ⎦ 1 x3

w2 = x1 + x2 w3 = x1 + x2 + x3 w4 = x1 + x2 + x3 + x4

14. Find the standard matrix for the operator T defined by the formula.

x1 − 4x2 + 8x3



(b) w1 = x1

(d) T (x1 , x2 , x3 , x4 ) = (x4 , x1 , x3 , x2 , x1 − x3 )

w3 = 2x1 + 3x2 4. (a) w1 =

12. (a) w1 = −x1 + x2 w2 = 3x1 − 2x2 w3 = 5x1 − 7x2

(b) w1 = 7x1 + 2x2 − 8x3 w2 = − x2 + 5x3 w3 = 4x1 + 7x2 − x3

(b) T (x1 , x2 , x3 ) = (2x1 − x2 + x3 , x2 + x3 , 0); x = (2, 1, −3) 18. (a) T (x1 , x2 ) = (2x1 − x2 , x1 + x2 ); x = (−2, 2) (b) T (x1 , x2 , x3 ) = (x1 , x2 − x3 , x2 ); x = (1, 0, 5) In Exercises 19–20, find TA (x), and express your answer in matrix form.



19. (a) A =

1 3

(b) A =



2 3 ; x= 4 −2

−1 3

2 1





−1 0 ⎢ ⎥ ; x = ⎣ 1⎦ 5 3




−2 ⎢ 20. (a) A = ⎣ 3 ⎡ ⎢

6

−1

(b) A = ⎣ 2 7

⎡ ⎤



30. We proved in the text that if T : R n →R m is a matrix transformation, then T (0) = 0. Show that the converse of this result is false by finding a mapping T : R n →R m that is not a matrix transformation but for which T (0) = 0.

4 x1 ⎢ ⎥ ⎥ 7⎦; x = ⎣x2 ⎦ −1 x3

1 5 0



1 x1 ⎥ 4⎦; x = x2 8

31. Let TA : R 3 →R 3 be multiplication by



−1

In Exercises 21–22, use Theorem 1.8.2 to show that T is a matrix transformation. 21. (a) T (x, y) = (2x + y, x − y) (b) T (x1 , x2 , x3 ) = (x1 , x3 , x1 + x2 )

In Exercises 23–24, use Theorem 1.8.2 to show that T is not a matrix transformation. 23. (a) T (x, y) = (x 2 , y) (b) T (x, y, z) = (x, y, xz)

√  (b) T (x1 , x2 , x3 ) = x1 , x2 , x3 

25. A function of the form f(x) = mx + b is commonly called a “linear function” because the graph of y = mx + b is a line. Is f a matrix transformation on R ? 26. Show that T (x, y) = (0, 0) defines a matrix operator on R 2 but T (x, y) = (1, 1) does not. In Exercises 27–28, the images of the standard basis vectors for R 3 are given for a linear transformation T : R 3 →R 3 . Find the standard matrix for the transformation, and find T (x). 0



4



⎡ ⎤ 2

⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 27. T (e1 ) = ⎣3⎦ , T (e2 ) = ⎣0⎦ , T (e3 ) = ⎣−3⎦ ; x = ⎣1⎦ 0 1 0 −1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −3 2 1 3 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 28. T (e1 ) = ⎣1⎦ , T (e2 ) = ⎣−1⎦ , T (e3 ) = ⎣0⎦ ; x = ⎣2⎦ 3

0

0

1

2⎦

5



−3

and let e1 , e2 , and e3 be the standard basis vectors for R 3 . Find the following vectors by inspection.

(c) TA (7e3 )

Working with Proofs 32. (a) Prove: If T : R n →R m is a matrix transformation, then T (0) = 0; that is, T maps the zero vector in R n into the zero vector in R m . (b) The converse of this is not true. Find an example of a function T for which T (0) = 0 but which is not a matrix transformation.

24. (a) T (x, y) = (x, y + 1)

1

4

(b) TA (e1 + e2 + e3 )

(b) T (x1 , x2 ) = (x2 , x1 )

⎡ ⎤

⎢ A=⎣ 2



3

(a) TA (e1 ), TA (e2 ), and TA (e3 )

22. (a) T (x, y, z) = (x + y, y + z, x)

⎡ ⎤


2

1

29. Let T : R 2 →R 2 be a linear operator for which the images of the standard basis vectors for R 2 are T (e1 ) = (a, b) and T (e2 ) = (c, d). Find T (1, 1).

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.
(a) If $A$ is a $2 \times 3$ matrix, then the domain of the transformation $T_A$ is $R^2$.
(b) If $A$ is an $m \times n$ matrix, then the codomain of the transformation $T_A$ is $R^n$.
(c) There is at least one linear transformation $T : R^n \to R^m$ for which $T(2\mathbf{x}) = 4T(\mathbf{x})$ for some vector $\mathbf{x}$ in $R^n$.
(d) There are linear transformations from $R^n$ to $R^m$ that are not matrix transformations.
(e) If $T_A : R^n \to R^n$ and if $T_A(\mathbf{x}) = \mathbf{0}$ for every vector $\mathbf{x}$ in $R^n$, then $A$ is the $n \times n$ zero matrix.
(f) There is only one matrix transformation $T : R^n \to R^m$ such that $T(-\mathbf{x}) = -T(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$.
(g) If $\mathbf{b}$ is a nonzero vector in $R^n$, then $T(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ is a matrix operator on $R^n$.


1.9 Applications of Linear Systems

In this section we will discuss some brief applications of linear systems. These are but a small sample of the wide variety of real-world problems to which our study of linear systems is applicable.

Network Analysis

The concept of a network appears in a variety of applications. Loosely stated, a network is a set of branches through which something “flows.” For example, the branches might be electrical wires through which electricity flows, pipes through which water or oil flows, traffic lanes through which vehicular traffic flows, or economic linkages through which money flows, to name a few possibilities. In most networks, the branches meet at points, called nodes or junctions, where the flow divides. For example, in an electrical network, nodes occur where three or more wires join, in a traffic network they occur at street intersections, and in a financial network they occur at banking centers where incoming money is distributed to individuals or other institutions. In the study of networks, there is generally some numerical measure of the rate at which the medium flows through a branch. For example, the flow rate of electricity is often measured in amperes, the flow rate of water or oil in gallons per minute, the flow rate of traffic in vehicles per hour, and the flow rate of European currency in millions of Euros per day. We will restrict our attention to networks in which there is flow conservation at each node, by which we mean that the rate of flow into any node is equal to the rate of flow out of that node. This ensures that the flow medium does not build up at the nodes and block the free movement of the medium through the network. A common problem in network analysis is to use known flow rates in certain branches to find the flow rates in all of the branches. Here is an example.

EXAMPLE 1 Network Analysis Using Linear Systems

Figure 1.9.1 shows a network with four nodes in which the flow rate and direction of flow in certain branches are known. Find the flow rates and directions of flow in the remaining branches.

[Figure 1.9.1: network with four nodes; the known branch flow rates are 30, 35, 55, 15, and 60]

Solution As illustrated in Figure 1.9.2, we have assigned arbitrary directions to the unknown flow rates x1, x2, and x3. We need not be concerned if some of the directions are incorrect, since an incorrect direction will be signaled by a negative value for the flow rate when we solve for the unknowns. It follows from the conservation of flow at node A that

\[ x_1 + x_2 = 30 \]

Similarly, at the other nodes we have

\[
\begin{aligned}
x_2 + x_3 &= 35 && \text{(node B)} \\
x_3 + 15 &= 60 && \text{(node C)} \\
x_1 + 15 &= 55 && \text{(node D)}
\end{aligned}
\]

[Figure 1.9.2: the network of Figure 1.9.1 with nodes A, B, C, D labeled and directions assigned to the unknown flow rates x1, x2, x3]

These four conditions produce the linear system

\[
\begin{aligned}
x_1 + x_2 &= 30 \\
x_2 + x_3 &= 35 \\
x_3 &= 45 \\
x_1 &= 40
\end{aligned}
\]


which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the solution is

x1 = 40, x2 = −10, x3 = 45

The fact that x2 is negative tells us that the direction assigned to that flow in Figure 1.9.2 is incorrect; that is, the flow in that branch is into node A.
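The "bottom up" solution of Example 1 can be retraced in a few lines of code. The following is only a sketch of that arithmetic; the variable names mirror the unknowns in the text, and the printed wording is our own.

```python
# Flow-conservation equations of Example 1, solved from the bottom up.
x1 = 55 - 15          # node D: x1 + 15 = 55
x3 = 60 - 15          # node C: x3 + 15 = 60
x2 = 35 - x3          # node B: x2 + x3 = 35

# The node A equation was not used above, so it serves as a consistency check.
assert x1 + x2 == 30

for name, value in (("x1", x1), ("x2", x2), ("x3", x3)):
    note = "direction as drawn" if value >= 0 else "direction opposite to the one drawn"
    print(f"{name} = {value} ({note})")
```

The sign test at the end is the same reading of a negative value that the text applies to x2.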

EXAMPLE 2 Design of Traffic Patterns

The network in Figure 1.9.3 shows a proposed plan for the traffic flow around a new park that will house the Liberty Bell in Philadelphia, Pennsylvania. The plan calls for a computerized traffic light at the north exit on Fifth Street, and the diagram indicates the average number of vehicles per hour that are expected to flow in and out of the streets that border the complex. All streets are one-way.

(a) How many vehicles per hour should the traffic light let through to ensure that the average number of vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out?

(b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can you say about the average number of vehicles per hour that will flow along the streets that border the complex?

[Figure 1.9.3: (a) street map of Liberty Park, bordered by Market St., Chestnut St., Fifth St., and Sixth St., showing the traffic light and the expected flows in vehicles per hour; (b) the corresponding network with intersections A, B, C, D, unknown flow rates x1, x2, x3, x4, and traffic-light flow x]

Solution (a) If, as indicated in Figure 1.9.3b, we let x denote the number of vehicles per hour that the traffic light must let through, then the total number of vehicles per hour that flow in and out of the complex will be

Flowing in: 500 + 400 + 600 + 200 = 1700
Flowing out: x + 700 + 400

Equating the flows in and out shows that the traffic light should let x = 600 vehicles per hour pass through.

Solution (b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen, the following conditions must be satisfied:

Intersection    Flow In          Flow Out
A               400 + 600    =   x1 + x2
B               x2 + x3      =   400 + x
C               500 + 200    =   x3 + x4
D               x1 + x4      =   700


Thus, with x = 600, as computed in part (a), we obtain the following linear system:

\[
\begin{aligned}
x_1 + x_2 &= 1000 \\
x_2 + x_3 &= 1000 \\
x_3 + x_4 &= 700 \\
x_1 + x_4 &= 700
\end{aligned}
\]

We leave it for you to show that the system has infinitely many solutions and that these are given by the parametric equations

x1 = 700 − t, x2 = 300 + t, x3 = 700 − t, x4 = t

(1)
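The parametric equations (1) can be checked by direct substitution into the four intersection equations. The sketch below does this for a handful of parameter values; the function names `flows` and `satisfies_system` are hypothetical helpers introduced only for this illustration.

```python
def flows(t):
    """Flow rates (x1, x2, x3, x4) given by the parametric equations (1)."""
    return (700 - t, 300 + t, 700 - t, t)


def satisfies_system(x1, x2, x3, x4):
    """Check the four intersection equations obtained with x = 600."""
    return (x1 + x2 == 1000 and x2 + x3 == 1000
            and x3 + x4 == 700 and x1 + x4 == 700)


# Any value of t solves the linear system, whether or not it is physically sensible.
for t in (-100, 0, 250, 700, 900):
    print(t, flows(t), satisfies_system(*flows(t)))
```

Every sampled value of t satisfies all four equations, which is what it means for the system to have infinitely many solutions.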

However, the parameter t is not completely arbitrary here, since there are physical constraints to be considered. For example, the average flow rates must be nonnegative since we have assumed the streets to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the case, we see from (1) that t can be any real number that satisfies 0 ≤ t ≤ 700, which implies that the average flow rates along the streets will fall in the ranges

0 ≤ x1 ≤ 700, 300 ≤ x2 ≤ 1000, 0 ≤ x3 ≤ 700, 0 ≤ x4 ≤ 700

Electrical Circuits

[Figure 1.9.4: circuit with one battery, one resistor, and a switch]

Next we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure 1.9.4 shows a schematic diagram of a circuit with one battery and one resistor (each represented by its circuit symbol in the figure), and a switch. The battery has a positive pole (+) and a negative pole (−). When the switch is closed, electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative pole (indicated by the arrowhead in the figure).

Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery acts like a pump that creates “electrical pressure” to increase the flow rate of electrons, and a resistor acts like a restriction in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and is commonly measured in ohms (Ω). The rate of flow of electrons in a wire is called current and is commonly measured in amperes (also called amps) (A). The precise effect of a resistor is given by the following law:

Ohm’s Law If a current of I amperes passes through a resistor with a resistance of R ohms, then there is a resulting drop of E volts in electrical potential that is the product of the current and resistance; that is,

E = IR

[Figure 1.9.5: electrical network with two nodes and three closed loops]

A typical electrical network will have multiple batteries and resistors joined by some configuration of wires. A point at which three or more wires in a network are joined is called a node (or junction point). A branch is a wire connecting two nodes, and a closed loop is a succession of connected branches that begin and end at the same node. For example, the electrical network in Figure 1.9.5 has two nodes and three closed loops— two inner loops and one outer loop. As current flows through an electrical network, it undergoes increases and decreases in electrical potential, called voltage rises and voltage drops, respectively. The behavior of the current at the nodes and around closed loops is governed by two fundamental laws:


Kirchhoff’s Current Law The sum of the currents flowing into any node is equal to the sum of the currents flowing out.

Kirchhoff’s Voltage Law In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops.

[Figure 1.9.6: currents I1, I2, and I3 at a node]

Kirchhoff’s current law is a restatement of the principle of flow conservation at a node that was stated for general networks. Thus, for example, the currents at the top node in Figure 1.9.6 satisfy the equation I1 = I2 + I3. In circuits with multiple loops and batteries there is usually no way to tell in advance which way the currents are flowing, so the usual procedure in circuit analysis is to assign arbitrary directions to the current flows in the branches and let the mathematical computations determine whether the assignments are correct. In addition to assigning directions to the current flows, Kirchhoff’s voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for consistency we will always take this direction to be clockwise (Figure 1.9.7). We also make the following conventions:

• A voltage drop occurs at a resistor if the direction assigned to the current through the resistor is the same as the direction assigned to the loop, and a voltage rise occurs at a resistor if the direction assigned to the current through the resistor is opposite to that assigned to the loop.

• A voltage rise occurs at a battery if the direction assigned to the loop is from − to + through the battery, and a voltage drop occurs at a battery if the direction assigned to the loop is from + to − through the battery.

[Figure 1.9.7: Clockwise closed-loop convention with arbitrary direction assignments to currents in the branches]

If you follow these conventions when calculating currents, then those currents whose directions were assigned correctly will have positive values and those whose directions were assigned incorrectly will have negative values.

EXAMPLE 3 A Circuit with One Closed Loop

Determine the current I in the circuit shown in Figure 1.9.8.

[Figure 1.9.8: one-loop circuit with a 6 V battery, a 3 Ω resistor, and current I]

Solution Since the direction assigned to the current through the resistor is the same as the direction of the loop, there is a voltage drop at the resistor. By Ohm’s law this voltage drop is E = IR = 3I . Also, since the direction assigned to the loop is from − to + through the battery, there is a voltage rise of 6 volts at the battery. Thus, it follows from Kirchhoff’s voltage law that 3I = 6

from which we conclude that the current is I = 2 A. Since I is positive, the direction assigned to the current flow is correct.
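For a circuit with a single closed loop, the conventions above reduce Kirchhoff’s voltage law to one equation in one unknown. The sketch below assumes a slightly more general one-loop circuit than Example 3 (several batteries and resistors in series); the function `single_loop_current` and its parameters are hypothetical names chosen for this illustration.

```python
def single_loop_current(battery_rises, resistances):
    """Current I (in amperes) in a one-loop circuit traversed clockwise.

    battery_rises: signed battery voltages in volts, positive where the loop
                   passes from - to + through the battery, negative otherwise.
    resistances:   resistor values in ohms; every resistor carries the same
                   current I, assigned in the loop direction.
    """
    # Kirchhoff's voltage law with Ohm's law:
    #   sum of rises = sum of drops = I * (R1 + R2 + ...)
    return sum(battery_rises) / sum(resistances)


current = single_loop_current([6], [3])   # the circuit of Example 3
print(current)
```

A negative return value would indicate, as in the text, that the current actually flows opposite to the assigned direction.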

EXAMPLE 4 A Circuit with Three Closed Loops

Determine the currents I1, I2, and I3 in the circuit shown in Figure 1.9.9.

[Figure 1.9.9: circuit with nodes A and B, a 50 V battery, a 30 V battery, resistors of 5 Ω, 20 Ω, and 10 Ω, and branch currents I1, I2, I3]

Solution Using the assigned directions for the currents, Kirchhoff’s current law provides one equation for each node:

Node    Current In       Current Out
A       I1 + I2      =   I3
B       I3           =   I1 + I2


However, these equations are really the same, since both can be expressed as

I1 + I 2 − I 3 = 0

(2)

To find unique values for the currents we will need two more equations, which we will obtain from Kirchhoff’s voltage law. We can see from the network diagram that there are three closed loops, a left inner loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that contains both batteries. Thus, Kirchhoff’s voltage law will actually produce three equations. With a clockwise traversal of the loops, the voltage rises and drops in these loops are as follows:

                     Voltage Rises          Voltage Drops
Left Inside Loop     50                     5I1 + 20I3
Right Inside Loop    30 + 10I2 + 20I3       0
Outside Loop         30 + 50 + 10I2         5I1

These conditions can be rewritten as

\[
\begin{aligned}
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30 \\
5I_1 - 10I_2 &= 80
\end{aligned}
\tag{3}
\]

However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine (2) and the first two equations in (3), we obtain the following linear system of three equations in the three unknown currents:

\[
\begin{aligned}
I_1 + I_2 - I_3 &= 0 \\
5I_1 + 20I_3 &= 50 \\
10I_2 + 20I_3 &= -30
\end{aligned}
\]

We leave it for you to show that the solution of this system in amps is I1 = 6, I2 = −5, and I3 = 1. The fact that I2 is negative tells us that the direction of this current is opposite to that indicated in Figure 1.9.9.

Balancing Chemical Equations

Historical Note The German physicist Gustav Kirchhoff (1824–1887) was a student of Gauss. His work on Kirchhoff’s laws, announced in 1845, was a major advance in the calculation of currents, voltages, and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on crutches or in a wheelchair. [Image: ullstein bild histopics/akg-im]

Chemical compounds are represented by chemical formulas that describe the atomic makeup of their molecules. For example, water is composed of two hydrogen atoms and one oxygen atom, so its chemical formula is H2O; and stable oxygen is composed of two oxygen atoms, so its chemical formula is O2. When chemical compounds are combined under the right conditions, the atoms in their molecules rearrange to form new compounds. For example, when methane burns, the methane (CH4) and stable oxygen (O2) react to form carbon dioxide (CO2) and water (H2O). This is indicated by the chemical equation

CH4 + O2 −→ CO2 + H2O    (4)

The molecules to the left of the arrow are called the reactants and those to the right the products. In this equation the plus signs serve to separate the molecules and are not intended as algebraic operations. However, this equation does not tell the whole story, since it fails to account for the proportions of molecules required for a complete reaction (no reactants left over). For example, we can see from the right side of (4) that to produce one molecule of carbon dioxide and one molecule of water, one needs three oxygen atoms for each carbon atom. However, from the left side of (4) we see that one molecule of methane and one molecule of stable oxygen have only two oxygen atoms for each carbon atom. Thus, on the reactant side the ratio of methane to stable oxygen cannot be one-to-one in a complete reaction.

A chemical equation is said to be balanced if for each type of atom in the reaction, the same number of atoms appears on each side of the arrow. For example, the balanced version of Equation (4) is

CH4 + 2O2 −→ CO2 + 2H2O    (5)

by which we mean that one methane molecule combines with two stable oxygen molecules to produce one carbon dioxide molecule and two water molecules. In theory, one could multiply this equation through by any positive integer. For example, multiplying through by 2 yields the balanced chemical equation

2CH4 + 4O2 −→ 2CO2 + 4H2O

However, the standard convention is to use the smallest positive integers that will balance the equation.

Equation (4) is sufficiently simple that it could have been balanced by trial and error, but for more complicated chemical equations we will need a systematic method. There are various methods that can be used, but we will give one that uses systems of linear equations. To illustrate the method let us reexamine Equation (4). To balance this equation we must find positive integers, x1, x2, x3, and x4 such that

x1 (CH4 ) + x2 (O2 ) −→ x3 (CO2 ) + x4 (H2 O)

(6)

For each of the atoms in the equation, the number of atoms on the left must be equal to the number of atoms on the right. Expressing this in tabular form we have

            Left Side       Right Side
Carbon      x1          =   x3
Hydrogen    4x1         =   2x4
Oxygen      2x2         =   2x3 + x4

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - x_3 &= 0 \\
4x_1 - 2x_4 &= 0 \\
2x_2 - 2x_3 - x_4 &= 0
\end{aligned}
\]

The augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 0 \\
4 & 0 & 0 & -2 & 0 \\
0 & 2 & -2 & -1 & 0
\end{bmatrix}
\]


We leave it for you to show that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & -\tfrac{1}{2} & 0 \\
0 & 1 & 0 & -1 & 0 \\
0 & 0 & 1 & -\tfrac{1}{2} & 0
\end{bmatrix}
\]

from which we conclude that the general solution of the system is

x1 = t/2, x2 = t, x3 = t/2, x4 = t

where t is arbitrary. The smallest positive integer values for the unknowns occur when we let t = 2, so the equation can be balanced by letting x1 = 1, x2 = 2, x3 = 1, x4 = 2. This agrees with our earlier conclusions, since substituting these values into Equation (6) yields Equation (5).
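The procedure just described (row reduce, read off the one-parameter solution, then choose the smallest t that clears the fractions) can be sketched in code. This is an illustration under stated assumptions, not a general-purpose balancer: `rref` and `balance` are hypothetical helper names, the zero right-hand column is omitted because the system is homogeneous, and the last unknown is assumed to be the only free variable, as it is for Equation (6).

```python
from fractions import Fraction
from math import lcm


def rref(rows):
    """Reduced row echelon form of a matrix, computed with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue                      # no leading 1 in this column
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [v / lead for v in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


def balance(atom_matrix):
    """Smallest positive integers solving the homogeneous system A x = 0."""
    reduced = [row for row in rref(atom_matrix) if any(row)]
    n = len(atom_matrix[0])
    # Assumption: leading 1's in columns 0..n-2, so only the last unknown is free.
    if len(reduced) != n - 1 or any(reduced[i][i] != 1 for i in range(n - 1)):
        raise ValueError("expected exactly one free variable (the last unknown)")
    solution = [-row[-1] for row in reduced] + [Fraction(1)]   # general solution at t = 1
    scale = lcm(*(v.denominator for v in solution))            # smallest t giving integers
    return [int(v * scale) for v in solution]


# Rows: carbon, hydrogen, oxygen. Columns: x1 (CH4), x2 (O2), x3 (CO2), x4 (H2O).
methane = [
    [1, 0, -1,  0],
    [4, 0,  0, -2],
    [0, 2, -2, -1],
]
print(balance(methane))
```

Exact fractions are used instead of floating-point numbers so that entries such as −1/2 appear exactly as they do in the reduced row echelon form above.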

EXAMPLE 5 Balancing Chemical Equations Using Linear Systems

Balance the chemical equation

HCl + Na3PO4 −→ H3PO4 + NaCl
[hydrochloric acid] + [sodium phosphate] −→ [phosphoric acid] + [sodium chloride]

Solution Let x1 , x2 , x3 , and x4 be positive integers that balance the equation

x1 (HCl) + x2 (Na3 PO4 ) −→ x3 (H3 PO4 ) + x4 (NaCl)

(7)

Equating the number of atoms of each type on the two sides yields

1x1 = 3x3    Hydrogen (H)
1x1 = 1x4    Chlorine (Cl)
3x2 = 1x4    Sodium (Na)
1x2 = 1x3    Phosphorus (P)
4x2 = 4x3    Oxygen (O)

from which we obtain the homogeneous linear system

\[
\begin{aligned}
x_1 - 3x_3 &= 0 \\
x_1 - x_4 &= 0 \\
3x_2 - x_4 &= 0 \\
x_2 - x_3 &= 0 \\
4x_2 - 4x_3 &= 0
\end{aligned}
\]

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[
\begin{bmatrix}
1 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & -\tfrac{1}{3} & 0 \\
0 & 0 & 1 & -\tfrac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]


from which we conclude that the general solution of the system is

x1 = t, x2 = t/3, x3 = t/3, x4 = t

where t is arbitrary. To obtain the smallest positive integers that balance the equation, we let t = 3, in which case we obtain x1 = 3, x2 = 1, x3 = 1, and x4 = 3. Substituting these values in (7) produces the balanced equation

3HCl + Na3PO4 −→ H3PO4 + 3NaCl

Polynomial Interpolation

An important problem in various applications is to find a polynomial whose graph passes through a specified set of points in the plane; this is called an interpolating polynomial for the points. The simplest example of such a problem is to find a linear polynomial

\[ p(x) = ax + b \tag{8} \]

whose graph passes through two known distinct points, (x1, y1) and (x2, y2), in the xy-plane (Figure 1.9.10). You have probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here we will give a method based on linear systems that can be adapted to general polynomial interpolation. The graph of (8) is the line y = ax + b, and for this line to pass through the points (x1, y1) and (x2, y2), we must have

\[ y_1 = ax_1 + b \quad\text{and}\quad y_2 = ax_2 + b \]

[Figure 1.9.10: the line y = ax + b through the points (x1, y1) and (x2, y2)]

Therefore, the unknown coefficients a and b can be obtained by solving the linear system

\[
\begin{aligned}
ax_1 + b &= y_1 \\
ax_2 + b &= y_2
\end{aligned}
\]

We don’t need any fancy methods to solve this system; the value of a can be obtained by subtracting the equations to eliminate b, and then the value of a can be substituted into either equation to find b. We leave it as an exercise for you to find a and b and then show that they can be expressed in the form

\[ a = \frac{y_2 - y_1}{x_2 - x_1} \quad\text{and}\quad b = \frac{y_1 x_2 - y_2 x_1}{x_2 - x_1} \tag{9} \]

provided x1 ≠ x2. Thus, for example, the line y = ax + b that passes through the points (2, 1) and (5, 4) can be obtained by taking (x1, y1) = (2, 1) and (x2, y2) = (5, 4), in which case (9) yields

\[ a = \frac{4 - 1}{5 - 2} = 1 \quad\text{and}\quad b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1 \]

Therefore, the equation of the line is

\[ y = x - 1 \]

(Figure 1.9.11).

[Figure 1.9.11: the line y = x − 1 through the points (2, 1) and (5, 4)]

Now let us consider the more general problem of finding a polynomial whose graph passes through n points with distinct x-coordinates

(x1 , y1 ), (x2 , y2 ), (x3 , y3 ), . . . , (xn , yn )

(10)

Since there are n conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{11} \]


since a polynomial of this form has n coefficients that are at our disposal to satisfy the n conditions. However, we want to allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a polynomial whose degree is less than n − 1; thus, we allow for the possibility that an−1 and other coefficients in (11) may be zero. The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation.

THEOREM 1.9.1 Polynomial Interpolation

Given any n points in the xy-plane that have distinct x-coordinates, there is a unique polynomial of degree n − 1 or less whose graph passes through those points.

Let us now consider how we might go about finding the interpolating polynomial (11) whose graph passes through the points in (10). Since the graph of this polynomial is the graph of the equation

\[ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1} x^{n-1} \tag{12} \]

it follows that the coordinates of the points must satisfy

\[
\begin{aligned}
a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_{n-1} x_1^{n-1} &= y_1 \\
a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_{n-1} x_2^{n-1} &= y_2 \\
&\;\;\vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \cdots + a_{n-1} x_n^{n-1} &= y_n
\end{aligned}
\tag{13}
\]

In these equations the values of the x’s and y’s are assumed to be known, so we can view this as a linear system in the unknowns a0, a1, . . . , an−1. From this point of view the augmented matrix for the system is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\
1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n
\end{bmatrix}
\tag{14}
\]

and hence the interpolating polynomial can be found by reducing this matrix to reduced row echelon form (Gauss–Jordan elimination).
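As a small illustration of how the augmented matrix (14) is assembled from data, the sketch below builds it for an arbitrary list of points. The function name `interpolation_matrix` is a hypothetical helper, and the sample data are the two points used for the line earlier in this section.

```python
def interpolation_matrix(points):
    """Augmented matrix (14) for the points [(x1, y1), ..., (xn, yn)]."""
    n = len(points)
    # Each row holds 1, x, x^2, ..., x^(n-1), followed by the y-value.
    return [[x ** k for k in range(n)] + [y] for x, y in points]


for row in interpolation_matrix([(2, 1), (5, 4)]):
    print(row)
```

With two points the rows encode the equations a0 + 2a1 = 1 and a0 + 5a1 = 4, which is the system ax1 + b = y1, ax2 + b = y2 with a0 = b and a1 = a.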

EXAMPLE 6 Polynomial Interpolation by Gauss–Jordan Elimination

Find a cubic polynomial whose graph passes through the points

(1, 3), (2, −2), (3, −5), (4, 0)

Solution Since there are four points, we will use an interpolating polynomial of degree n = 3. Denote this polynomial by

\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 \]

and denote the x- and y-coordinates of the given points by

x1 = 1, x2 = 2, x3 = 3, x4 = 4 and y1 = 3, y2 = −2, y3 = −5, y4 = 0


Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns a0, a1, a2, and a3 is

\[
\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 & y_1 \\
1 & x_2 & x_2^2 & x_2^3 & y_2 \\
1 & x_3 & x_3^2 & x_3^3 & y_3 \\
1 & x_4 & x_4^2 & x_4^3 & y_4
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 3 \\
1 & 2 & 4 & 8 & -2 \\
1 & 3 & 9 & 27 & -5 \\
1 & 4 & 16 & 64 & 0
\end{bmatrix}
\]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 4 \\
0 & 1 & 0 & 0 & 3 \\
0 & 0 & 1 & 0 & -5 \\
0 & 0 & 0 & 1 & 1
\end{bmatrix}
\]

from which it follows that a0 = 4, a1 = 3, a2 = −5, a3 = 1. Thus, the interpolating polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]

The graph of this polynomial and the given points are shown in Figure 1.9.12.

[Figure 1.9.12: graph of p(x) = 4 + 3x − 5x^2 + x^3 together with the four given points]
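The row reduction left to the reader in Example 6 can be carried out with exact arithmetic. The sketch below is one possible way to do so; `rref` is a hypothetical helper written for this illustration, and it assumes (as Theorem 1.9.1 guarantees for distinct x-coordinates) that the coefficient part reduces to the identity matrix.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form of a matrix, computed with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [v / lead for v in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


points = [(1, 3), (2, -2), (3, -5), (4, 0)]
n = len(points)
augmented = [[x ** k for k in range(n)] + [y] for x, y in points]   # matrix (14)
coeffs = [row[-1] for row in rref(augmented)]                       # a0, a1, a2, a3


def p(x):
    """Evaluate the interpolating polynomial a0 + a1 x + a2 x^2 + a3 x^3."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


print([int(a) for a in coeffs])
print(all(p(x) == y for x, y in points))
```

The final check evaluates p at each data point, which confirms that the computed polynomial actually interpolates the data.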


Remark Later we will give a more efficient method for finding interpolating polynomials that is better suited for problems in which the number of data points is large.

CALCULUS AND CALCULATING UTILITY REQUIRED

EXAMPLE 7 Approximate Integration

There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions. This integral could be approximated by Simpson’s rule or some comparable method, but an alternative approach is to approximate the integrand by an interpolating polynomial and integrate the approximating polynomial. For example, let us consider the five points

x0 = 0, x1 = 0.25, x2 = 0.5, x3 = 0.75, x4 = 1

that divide the interval [0, 1] into four equally spaced subintervals (Figure 1.9.13). The values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

f(0) = 0, f(0.25) = 0.098017, f(0.5) = 0.382683, f(0.75) = 0.77301, f(1) = 1

[Figure 1.9.13: graphs of p(x) and sin(πx^2/2) over the interval [0, 1]]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \tag{15} \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \tag{16} \]

As shown in Figure 1.9.13, the graphs of f and p match very closely over the interval [0, 1], so the approximation is quite good.
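The "(verify)" step of Example 7 can be reproduced numerically. The following is a sketch rather than the text’s own procedure: it solves the interpolation system in floating-point arithmetic with a hypothetical helper `solve` (Gauss–Jordan elimination with partial pivoting, a choice made here for numerical stability), and then integrates the polynomial term by term.

```python
import math


def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Use the largest available entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [row[-1] for row in m]


def f(x):
    return math.sin(math.pi * x ** 2 / 2)


xs = [0, 0.25, 0.5, 0.75, 1]
vandermonde = [[x ** k for k in range(len(xs))] for x in xs]
coeffs = solve(vandermonde, [f(x) for x in xs])        # a0, a1, ..., a4

# Integrating a_k x^k over [0, 1] gives a_k / (k + 1).
integral = sum(a / (k + 1) for k, a in enumerate(coeffs))
print([round(a, 6) for a in coeffs])
print(round(integral, 6))
```

The computed coefficients and integral can be compared with (15) and (16); small differences in the last digit are to be expected from rounding.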


Exercise Set 1.9

1. The accompanying figure shows a network in which the flow rate and direction of flow in certain branches are known. Find the flow rates and directions of flow in the remaining branches.

[Figure Ex-1]

2. The accompanying figure shows known flow rates of hydrocarbons into and out of a network of pipes at an oil refinery.
(a) Set up a linear system whose solution provides the unknown flow rates.
(b) Solve the system for the unknown flow rates.
(c) Find the flow rates and directions of flow if x4 = 50 and x6 = 0.

[Figure Ex-2]

3. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow rates along the streets are measured as the average number of vehicles per hour.
(a) Set up a linear system whose solution provides the unknown flow rates.
(b) Solve the system for the unknown flow rates.
(c) If the flow along the road from A to B must be reduced for construction, what is the minimum flow that is required to keep traffic flowing on all roads?

[Figure Ex-3]

4. The accompanying figure shows a network of one-way streets with traffic flowing in the directions indicated. The flow rates along the streets are measured as the average number of vehicles per hour.
(a) Set up a linear system whose solution provides the unknown flow rates.
(b) Solve the system for the unknown flow rates.
(c) Is it possible to close the road from A to B for construction and keep traffic flowing on the other streets? Explain.

[Figure Ex-4]

In Exercises 5–8, analyze the given electrical circuits by finding the unknown currents.

5. [Figure: circuit for Exercise 5]

6. [Figure: circuit for Exercise 6]

7. [Figure: circuit for Exercise 7]

8. [Figure: circuit for Exercise 8]

In Exercises 9–12, write a balanced equation for the given chemical reaction.

9. C3H8 + O2 → CO2 + H2O (propane combustion)

10. C6H12O6 → CO2 + C2H5OH (fermentation of sugar)

11. CH3COF + H2O → CH3COOH + HF

12. CO2 + H2O → C6H12O6 + O2 (photosynthesis)

13. Find the quadratic polynomial whose graph passes through the points (1, 1), (2, 2), and (3, 5).

14. Find the quadratic polynomial whose graph passes through the points (0, 0), (−1, 1), and (1, 1).

15. Find the cubic polynomial whose graph passes through the points (−1, −1), (0, 1), (1, 3), (4, −1).

16. The accompanying figure shows the graph of a cubic polynomial. Find the polynomial.

[Figure Ex-16: graph of a cubic polynomial]

17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and (1, 2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when varied.]
(b) By hand, or with the help of a graphing utility, sketch four curves in the family.

18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it.

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) In any network, the sum of the flows out of a node must equal the sum of the flows into a node.
(b) When a current passes through a resistor, there is an increase in the electrical potential in a circuit.
(c) Kirchhoff’s current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out of the node.
(d) A chemical equation is called balanced if the total number of atoms on each side of the equation is the same.
(e) Given any n points in the xy-plane, there is a unique polynomial of degree n − 1 or less whose graph passes through those points.

Working with Technology

T1. The following table shows the lifting force on an aircraft wing measured in a wind tunnel at various wind velocities. Model the data with an interpolating polynomial of degree 5, and use that polynomial to estimate the lifting force at 2000 ft/s.

Velocity (100 ft/s)       1     2      4       8      16     32
Lifting Force (100 lb)    0     3.12   15.86   33.7   81.5   123.0

T2. (Calculus required) Use the method of Example 7 to approximate the integral

\[ \int_0^1 e^{x^2}\,dx \]

by subdividing the interval of integration into five equal parts and using an interpolating polynomial to approximate the integrand. Compare your answer to that obtained using the numerical integration capability of your technology utility.

T3. Use the method of Example 5 to balance the chemical equation

Fe2O3 + Al → Al2O3 + Fe

(Fe = iron, Al = aluminum, O = oxygen)

T4. Determine the currents in the accompanying circuit.

[Figure: circuit for Exercise T4]


1.10 Leontief Input-Output Models

In 1973 the economist Wassily Leontief was awarded the Nobel prize for his work on economic modeling in which he used matrix methods to study the relationships among different sectors in an economy. In this section we will discuss some of the ideas developed by Leontief.

Inputs and Outputs in an Economy

One way to analyze an economy is to divide it into sectors and study how the sectors interact with one another. For example, a simple economy might be divided into three sectors: manufacturing, agriculture, and utilities. Typically, a sector will produce certain outputs but will require inputs from the other sectors and itself. For example, the agricultural sector may produce wheat as an output but will require inputs of farm machinery from the manufacturing sector, electrical power from the utilities sector, and food from its own sector to feed its workers. Thus, we can imagine an economy to be a network in which inputs and outputs flow in and out of the sectors; the study of such flows is called input-output analysis. Inputs and outputs are commonly measured in monetary units (dollars or millions of dollars, for example) but other units of measurement are also possible.

The flows between sectors of a real economy are not always obvious. For example, in World War II the United States had a demand for 50,000 new airplanes that required the construction of many new aluminum manufacturing plants. This produced an unexpectedly large demand for certain copper electrical components, which in turn produced a copper shortage. The problem was eventually resolved by using silver borrowed from Fort Knox as a copper substitute. In all likelihood modern input-output analysis would have anticipated the copper shortage.

Most sectors of an economy will produce outputs, but there may exist sectors that consume outputs without producing anything themselves (the consumer market, for example). Those sectors that do not produce outputs are called open sectors. Economies with no open sectors are called closed economies, and economies with one or more open sectors are called open economies (Figure 1.10.1).

[Figure 1.10.1: an economy with manufacturing, agriculture, and utilities sectors and an open sector]

In this section we will be concerned with economies with one open sector, and our primary goal will be to determine the output levels that are required for the productive sectors to sustain themselves and satisfy the demand of the open sector.

Leontief Model of an Open Economy

Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing, agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the productive sectors to produce one dollar’s worth of output are in accordance with Table 1.

Wassily Leontief (1906–1999)

Historical Note It is somewhat ironic that it was the Russian-born Wassily Leontief who won the Nobel prize in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious student who entered the University of Leningrad at age 15. Bothered by the intellectual restrictions of the Soviet system, he was put in jail for anti-Communist activities, after which he headed for the University of Berlin, receiving his Ph.D. there in 1928. He came to the United States in 1931, where he held professorships at Harvard and then New York University. [Image: © Bettmann/CORBIS]


Table 1

                  Input Required per Dollar Output
Provider          Manufacturing    Agriculture    Utilities
Manufacturing     $0.50            $0.10          $0.10
Agriculture       $0.20            $0.50          $0.30
Utilities         $0.10            $0.30          $0.40

Usually, one would suppress the labeling and express this matrix as

\[
C = \begin{bmatrix}
0.5 & 0.1 & 0.1 \\
0.2 & 0.5 & 0.3 \\
0.1 & 0.3 & 0.4
\end{bmatrix}
\tag{1}
\]

This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}, \quad
\mathbf{c}_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

in C list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth of output. These are called the consumption vectors of the sectors. For example, c1 tells us that to produce $1.00 worth of output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and $0.10 worth of utilities output.

[Margin note: What is the economic significance of the row sums of the consumption matrix?]

Continuing with the above example, suppose that the open sector wants the economy to supply it manufactured goods, agricultural products, and utilities with dollar values:

d1 dollars of manufactured goods
d2 dollars of agricultural products
d3 dollars of utilities

The column vector d that has these numbers as successive components is called the outside demand vector. Since the product-producing sectors consume some of their own output, the dollar value of their output must cover their own needs plus the outside demand. Suppose that the dollar values required to do this are

x1 dollars of manufactured goods
x2 dollars of agricultural products
x3 dollars of utilities

The column vector x that has these numbers as successive components is called the production vector for the economy. For the economy with consumption matrix (1), that portion of the production vector x that will be consumed by the three productive sectors is





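\[
x_1 \underbrace{\begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by manufacturing}}}
+ x_2 \underbrace{\begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by agriculture}}}
+ x_3 \underbrace{\begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}}_{\substack{\text{Fractions consumed} \\ \text{by utilities}}}
= \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = C\mathbf{x}
\]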


The vector Cx is called the intermediate demand vector for the economy. Once the intermediate demand is met, the portion of the production that is left to satisfy the outside demand is x − Cx. Thus, if the outside demand vector is d, then x must satisfy the equation

\[
\underbrace{\mathbf{x}}_{\text{Amount produced}} - \underbrace{C\mathbf{x}}_{\text{Intermediate demand}} = \underbrace{\mathbf{d}}_{\text{Outside demand}}
\]

which we will find convenient to rewrite as

\[
(I - C)\mathbf{x} = \mathbf{d} \tag{2}
\]

The matrix I − C is called the Leontief matrix and (2) is called the Leontief equation.

EXAMPLE 1 Satisfying Outside Demand

Consider the economy described in Table 1. Suppose that the open sector has a demand for $7900 worth of manufacturing products, $3950 worth of agricultural products, and $1975 worth of utilities.

(a) Can the economy meet this demand?
(b) If so, find a production vector x that will meet it exactly.

Solution The consumption matrix, production vector, and outside demand vector are

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix} \tag{3}
\]

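To meet the outside demand, the vector x must satisfy the Leontief equation (2), so the problem reduces to solving the linear system

\[
\underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
= \underbrace{\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}}_{\mathbf{d}} \tag{4}
\]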

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 27{,}500 \\ 0 & 1 & 0 & 33{,}750 \\ 0 & 0 & 1 & 24{,}750 \end{array}\right]
\]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open sector exactly by producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750 worth of utilities output.

Productive Open Economies

In the preceding discussion we considered an open economy with three product-producing sectors; the same ideas apply to an open economy with n product-producing sectors. In this case, the consumption matrix, production vector, and outside demand vector have the form

\[
C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\]


where all entries are nonnegative and

cij = the monetary value of the output of the ith sector that is needed by the jth sector to produce one unit of output
xi = the monetary value of the output of the ith sector
di = the monetary value of the output of the ith sector that is required to meet the demand of the open sector

Remark Note that the jth column vector of C contains the monetary values that the jth sector requires of the other sectors to produce one monetary unit of output, and the ith row vector of C contains the monetary values required of the ith sector by the other sectors for each of them to produce one monetary unit of output.

As discussed in our example above, a production vector x that meets the demand d of the outside sector must satisfy the Leontief equation

\[
(I - C)\mathbf{x} = \mathbf{d}
\]

If the matrix I − C is invertible, then this equation has the unique solution

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}
\]

for every demand vector d. However, for x to be a valid production vector it must have nonnegative entries, so the problem of importance in economics is to determine conditions under which the Leontief equation has a solution with nonnegative entries. It is evident from the form of (5) that if I − C is invertible, and if \((I - C)^{-1}\) has nonnegative entries, then for every demand vector d the corresponding x will also have nonnegative entries, and hence will be a valid production vector for the economy. Economies for which \((I - C)^{-1}\) has nonnegative entries are said to be productive. Such economies are desirable because demand can always be met by some level of production. The following theorem, whose proof can be found in many books on economics, gives conditions under which open economies are productive.

THEOREM 1.10.1 If C is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix I − C is invertible, the entries of \((I - C)^{-1}\) are nonnegative, and the economy is productive.

Remark The jth column sum of C represents the total dollar value of input that the jth sector requires to produce $1 of output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in this case we say that the jth sector is profitable. Thus, Theorem 1.10.1 states that if all product-producing sectors of an open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open economy is productive if all of the row sums of C are less than 1 (Exercise 11). Thus, an open economy is productive if either all of the column sums or all of the row sums of C are less than 1.
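The column-sum and row-sum tests in the remark above are easy to check mechanically. The following is a minimal sketch in Python for the consumption matrix (1); the function names are illustrative and not part of the text, and the test it implements is only the sufficient condition stated in Theorem 1.10.1 and Exercise 11, not a full characterization of productive economies.

```python
from fractions import Fraction

# Consumption matrix (1) from Table 1, entered as exact fractions.
C = [
    [Fraction(5, 10), Fraction(1, 10), Fraction(1, 10)],
    [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)],
    [Fraction(1, 10), Fraction(3, 10), Fraction(4, 10)],
]


def column_sums(matrix):
    """Total input each sector needs per dollar of its own output."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


def row_sums(matrix):
    """Total amount required of each sector when every sector produces $1."""
    return [sum(row) for row in matrix]


def passes_sufficient_test(matrix):
    """True if all column sums, or all row sums, are less than 1.

    This is a sufficient condition for productivity, not a necessary one.
    """
    return all(s < 1 for s in column_sums(matrix)) or all(
        s < 1 for s in row_sums(matrix)
    )


print([float(s) for s in column_sums(C)])  # [0.8, 0.9, 0.8]
print([float(s) for s in row_sums(C)])     # [0.7, 1.0, 0.8]
print(passes_sufficient_test(C))           # True
```

For this matrix every column sum is below 1, so the column test succeeds even though the second row sum equals 1 and the row test alone would be inconclusive.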

EXAMPLE 2 An Open Economy Whose Sectors Are All Profitable

The column sums of the consumption matrix C in (1) are less than 1, so \((I - C)^{-1}\) exists and has nonnegative entries. Use a calculating utility to confirm this, and use this inverse to solve Equation (4) in Example 1.

Solution We leave it for you to show that

\[
(I - C)^{-1} \approx \begin{bmatrix} 2.65823 & 1.13924 & 1.01266 \\ 1.89873 & 3.67089 & 2.15190 \\ 1.39241 & 2.02532 & 2.91139 \end{bmatrix}
\]

This matrix has nonnegative entries, and

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \approx
\begin{bmatrix} 2.65823 & 1.13924 & 1.01266 \\ 1.89873 & 3.67089 & 2.15190 \\ 1.39241 & 2.02532 & 2.91139 \end{bmatrix}
\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\approx \begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix}
\]

which is consistent with the solution in Example 1.
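The row reduction that Example 1 leaves to the reader can be sketched in a few lines. The code below is one possible implementation, assuming exact rational arithmetic via Python's `fractions` module; the function name `solve_leontief` is hypothetical and chosen only for illustration. It forms the augmented matrix [I − C | d] and applies Gauss-Jordan elimination.

```python
from fractions import Fraction


def solve_leontief(C, d):
    """Solve (I - C) x = d by Gauss-Jordan elimination with exact fractions.

    Assumes I - C is invertible; raises ValueError if a pivot cannot be found.
    """
    n = len(C)
    # Build the augmented matrix [I - C | d].
    aug = [
        [Fraction(int(i == j)) - Fraction(C[i][j]) for j in range(n)]
        + [Fraction(d[i])]
        for i in range(n)
    ]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("I - C is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The last column of the reduced matrix is the production vector.
    return [row[n] for row in aug]


# Data from Example 1; decimal strings keep the entries exact.
C = [["0.5", "0.1", "0.1"], ["0.2", "0.5", "0.3"], ["0.1", "0.3", "0.4"]]
d = [7900, 3950, 1975]
x = solve_leontief(C, d)
print([int(v) for v in x])  # [27500, 33750, 24750]
```

Using fractions rather than floating point avoids rounding, so the output matches the production vector in Examples 1 and 2 exactly; a numerical library would typically be used for larger systems.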

Exercise Set 1.10

1. An automobile mechanic (M) and a body shop (B) use each other’s services. For each $1.00 of business that M does, it uses $0.50 of its own services and $0.25 of B’s services, and for each $1.00 of business that B does it uses $0.10 of its own services and $0.25 of M’s services.
(a) Construct a consumption matrix for this economy.
(b) How much must M and B each produce to provide customers with $7000 worth of mechanical work and $14,000 worth of body work?

2. A simple economy produces food (F) and housing (H). The production of $1.00 worth of food requires $0.30 worth of food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and $0.60 worth of housing.
(a) Construct a consumption matrix for this economy.
(b) What dollar value of food and housing must be produced for the economy to provide consumers $130,000 worth of food and $130,000 worth of housing?

3. Consider the open economy described by the accompanying table, where the input is in dollars needed for $1.00 of output.
(a) Find the consumption matrix for the economy.
(b) Suppose that the open sector has a demand for $1930 worth of housing, $3860 worth of food, and $5790 worth of utilities. Use row reduction to find a production vector that will meet this demand exactly.

Table Ex-3

             Input Required per Dollar Output
Provider     Housing   Food    Utilities
Housing      $0.10     $0.60   $0.40
Food         $0.30     $0.20   $0.30
Utilities    $0.40     $0.10   $0.20

4. A company produces Web design, software, and networking services. View the company as an open economy described by the accompanying table, where input is in dollars needed for $1.00 of output.
(a) Find the consumption matrix for the company.
(b) Suppose that the customers (the open sector) have a demand for $5400 worth of Web design, $2700 worth of software, and $900 worth of networking. Use row reduction to find a production vector that will meet this demand exactly.

Table Ex-4

              Input Required per Dollar Output
Provider      Web Design   Software   Networking
Web Design    $0.40        $0.20      $0.45
Software      $0.30        $0.35      $0.30
Networking    $0.15        $0.10      $0.20

In Exercises 5–6, use matrix inversion to find the production vector x that meets the demand d for the consumption matrix C.

5. \( C = \begin{bmatrix} 0.1 & 0.3 \\ 0.5 & 0.4 \end{bmatrix}; \quad \mathbf{d} = \begin{bmatrix} 50 \\ 60 \end{bmatrix} \)

6. \( C = \begin{bmatrix} 0.3 & 0.1 \\ 0.3 & 0.7 \end{bmatrix}; \quad \mathbf{d} = \begin{bmatrix} 22 \\ 14 \end{bmatrix} \)

7. Consider an open economy with consumption matrix

\[
C = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}
\]

(a) Show that the economy can meet a demand of d1 = 2 units from the first sector and d2 = 0 units from the second sector, but it cannot meet a demand of d1 = 2 units from the first sector and d2 = 1 unit from the second sector.
(b) Give both a mathematical and an economic explanation of the result in part (a).

8. Consider an open economy with consumption matrix

\[
C = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{8} & \tfrac{1}{4} \\ \tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{8} \end{bmatrix}
\]

If the open sector demands the same dollar value from each product-producing sector, which such sector must produce the greatest dollar value to meet the demand? Is the economy productive?

9. Consider an open economy with consumption matrix

\[
C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix}
\]

Show that the Leontief equation x − Cx = d has a unique solution for every demand vector d if \( c_{21}c_{12} < 1 - c_{11} \).

Working with Proofs

10. (a) Consider an open economy with a consumption matrix C whose column sums are less than 1, and let x be the production vector that satisfies an outside demand d; that is, \((I - C)^{-1}\mathbf{d} = \mathbf{x}\). Let dj be the demand vector that is obtained by increasing the jth entry of d by 1 and leaving the other entries fixed. Prove that the production vector xj that meets this demand is

xj = x + jth column vector of \((I - C)^{-1}\)

(b) In words, what is the economic significance of the jth column vector of \((I - C)^{-1}\)? [Hint: Look at xj − x.]

11. Prove: If C is an n × n matrix whose entries are nonnegative and whose row sums are less than 1, then I − C is invertible and has nonnegative entries. [Hint: \((A^T)^{-1} = (A^{-1})^T\) for any invertible matrix A.]

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.
(a) Sectors of an economy that produce outputs are called open sectors.
(b) A closed economy is an economy that has no open sectors.
(c) The rows of a consumption matrix represent the outputs in a sector of an economy.
(d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible.
(e) The Leontief equation relates the production vector for an economy to the outside demand vector.

Working with Technology

T1. The following table describes an open economy with three sectors in which the table entries are the dollar inputs required to produce one dollar of output. The outside demand during a 1-week period is $50,000 of coal, $75,000 of electricity, and $1,250,000 of manufacturing. Determine whether the economy can meet the demand.

                 Input Required per Dollar Output
Provider         Electricity   Coal    Manufacturing
Electricity      $0.1          $0.25   $0.2
Coal             $0.3          $0.4    $0.5
Manufacturing    $0.1          $0.15   $0.1

Chapter 1 Supplementary Exercises In Exercises 1–4 the given matrix represents an augmented matrix for a linear system. Write the corresponding set of linear equations for the system, and use Gaussian elimination to solve the linear system. Introduce free parameters as necessary.

 1.

3

−1

0

4

1

2

0

3

3

−1



2

−4

1

3. ⎣−4 0

0

3

1

−1



6



⎥ −1⎦ 3





1

4

−1



5. Use Gauss–Jordan elimination to solve for x′ and y′ in terms of x and y.

\[
x = \tfrac{3}{5}x' - \tfrac{4}{5}y', \qquad y = \tfrac{4}{5}x' + \tfrac{3}{5}y'
\]

⎢−2 ⎢ 2. ⎢ ⎣ 3

−8 12

⎥ −3⎦

6. Use Gauss–Jordan elimination to solve for x′ and y′ in terms of x and y.

x = x′ cos θ − y′ sin θ

0

0

0

y = x′ sin θ + y′ cos θ

3

1

−2

4. ⎣−9 6

−3

⎡ ⎢

2

2⎥ ⎥

⎤ ⎥

6⎦ 1

7. Find positive integers that satisfy

x+ y+

z= 9

x + 5y + 10z = 44


8. A box containing pennies, nickels, and dimes has 13 coins with a total value of 83 cents. How many coins of each type are in the box?



9. Let

a ⎢ ⎣a 0



0

b

2

a a

4

4⎦

2

b

15. Find values of a , b, and c such that the graph of the polynomial p(x) = ax 2 + bx + c passes through the points (1, 2), (−1, 6), and (2, 3). 16. (Calculus required ) Find values of a , b, and c such that the graph of p(x) = ax 2 + bx + c passes through the point (−1, 0) and has a horizontal tangent at (2, −9).



be the augmented matrix for a linear system. Find for what values of a and b the system has

17. Let Jn be the n × n matrix each of whose entries is 1. Show that if n > 1, then

(I − Jn )−1 = I −

(a) a unique solution. (b) a one-parameter solution. (c) a two-parameter solution.

x1 + x 2 + x 3 = 4 (a 2 − 4)x3 = a − 2 ⎤

 2 ⎥ 3⎦, B =

⎢ A = ⎣−2

0

−2 ⎡

1

8

−6

6

⎢ C=⎣ 6 −4

A3 + 4A2 − 2A + 7I = 0 then so does AT .

20. Prove: If A is invertible, then A + B and I + BA−1 are both invertible or both not invertible.

11. Find a matrix K such that AKB = C given that 4

−1



0

0

1

−1

,

21. Prove: If A is an m × n matrix and B is the n × 1 matrix each of whose entries is 1/n, then



⎤ r1 ⎢r2 ⎥ ⎢ ⎥ AB = ⎢ . ⎥ ⎣ .. ⎦

⎤ ⎥

1⎦

0

rm

0

12. How should the coefficients a , b, and c be chosen so that the system ax + by − 3z = −3

where r i is the average of the entries in the i th row of A. 22. (Calculus required ) If the entries of the matrix



−2x − by + cz = −1 ax + 3y − cz = −3 has the solution x = 1, y = −1, and z = 2? 13. In each part, solve the matrix equation for X .







−1

0

1

(a) X ⎣ 1 3

1

0⎦ =

⎢ 

(b) X

 (c)

1

−1 

1

−1

2

3

0

1

3

1

−1

2





X−X

1

2

0

−3

1

5

 =



−5

−1 −3 

6

1

4

2

0



=



c11 (x) ⎢ c21 (x) ⎢ C=⎢ . ⎣ ..

c12 (x) c22 (x) .. .

··· ···

cm1 (x)

cm2 (x)

···

(x) c11

⎢ c dC 21 ⎢ (x) =⎢ . ⎣ .. dx



0

cm 1 (x)

7

2

−2

5

4



(a) Show that (I − A)−1 = I + A + A2 + A3 if A4 = 0. (b) Show that

(I − A)−1 = I + A + A2 + · · · + An

⎤ c1n (x) c2n (x) ⎥ ⎥ .. ⎥ . ⎦ cmn (x)

are differentiable functions of x , then we define



14. Let A be a square matrix.

if An+1 = 0.

Jn

19. Prove: If B is invertible, then AB −1 = B −1 A if and only if AB = BA.

x3 = 2

1

n−1

18. Show that if a square matrix A satisfies

(d) no solution.

10. For which value(s) of a does the following system have zero solutions? One solution? Infinitely many solutions?



1

c12 (x)

c22 (x) .. .

··· ···

cm 2 (x)

···

⎤ c1 n (x) c2 n (x) ⎥ ⎥ .. ⎥ . ⎦

cmn (x)

Show that if the entries in A and B are differentiable functions of x and the sizes of the matrices are such that the stated operations can be performed, then

dA d (kA) = k dx dx dA dB d (A + B) = + (b) dx dx dx dA dB d (AB) = B +A (c) dx dx dx (a)


23. (Calculus required ) Use part (c) of Exercise 22 to show that −1

dA dx

= −A−1

dA −1 A dx

State all the assumptions you make in obtaining this formula. 24. Assuming that the stated inverses exist, prove the following equalities. (a) (C

−1

−1 −1

+D )

(a) Confirm that the sizes of all matrices are such that the product AB can be obtained using Formula (∗). (b) Confirm that the result obtained using Formula (∗) agrees with that obtained using ordinary matrix multiplication. 26. Suppose that an invertible matrix A is partitioned as



A=

−1

= C(C + D) D

−1

−1

(b) (I + CD) C = C(I + DC)

−1

A Partitioned matrices can be multiplied by the row-column rule just as if the matrix entries were numbers provided that the sizes of all matrices are such that the necessary operations can be performed. Thus, for example, if A is partitioned into a 2 × 2 matrix and B into a 2 × 1 matrix, then



AB =

A11

A12

A21

A22



B1





=

B2

A11 B1 + A12 B2



A21 B1 + A22 B2

⎢ 4 A=⎢ ⎣

0

2

1

1

0

3

0

−3

4

2

3

0



1

⎢ ⎢2 ⎢ ⎢ B = ⎢4 ⎢ ⎢ ⎣0 2



⎥   ⎥ B1 −1 ⎥ ⎥= ⎥ B2 ⎥ 3⎦ 1⎥

5

4





⎥ A11 −1⎥ = ⎦ A21 −2

A12 A22

A12

A21

A22

=



B11

B12

B21

B22



where −1 B11 = (A11 − A12 A22 A21 )−1 , −1 B21 = −A22 A21 B11 ,

−1 B12 = −B11 A12 A22

−1 B22 = (A22 − A21 A11 A12 )−1

provided all the inverses in these formulas exist. (*)

provided that the sizes are such that AB , the two sums, and the four products are all defined. 25. Let A and B be the following partitioned matrices.



A11



Show that

(c) (C + DD T )−1 D = C −1 D(I + D T C −1 D)−1




27. In the special case where matrix A21 in Exercise 26 is zero, the matrix A simplifies to



A=

A11

A12

0

A22



which is said to be in block upper triangular form. Use the result of Exercise 26 to show that in this case



A−1 =

−1 A11

−1 −1 −A11 A12 A22

0

−1 A22



28. A linear system whose coefficient matrix has a pivot position in every row must be consistent. Explain why this must be so. 29. What can you say about the consistency or inconsistency of a linear system of three equations in five unknowns whose coefficient matrix has three pivot columns?

CHAPTER 2

Determinants

CHAPTER CONTENTS

2.1 Determinants by Cofactor Expansion
2.2 Evaluating Determinants by Row Reduction
2.3 Properties of Determinants; Cramer’s Rule

INTRODUCTION

In this chapter we will study “determinants” or, more precisely, “determinant functions.” Unlike real-valued functions, such as \(f(x) = x^2\), that assign a real number to a real variable x, determinant functions assign a real number f(A) to a matrix variable A. Although determinants first arose in the context of solving systems of linear equations, they are rarely used for that purpose in real-world applications. While they can be useful for solving very small linear systems (say two or three unknowns), our main interest in them stems from the fact that they link together various concepts in linear algebra and provide a useful formula for the inverse of a matrix.

2.1 Determinants by Cofactor Expansion

In this section we will define the notion of a “determinant.” This will enable us to develop a specific formula for the inverse of an invertible matrix, whereas up to now we have had only a computational procedure for finding it. This, in turn, will eventually provide us with a formula for solutions of certain kinds of linear systems.

Recall from Theorem 1.4.5 that the 2 × 2 matrix

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is invertible if and only if ad − bc ≠ 0 and that the expression ad − bc is called the determinant of the matrix A. Recall also that this determinant is denoted by writing

\[
\det(A) = ad - bc \quad \text{or} \quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \tag{1}
\]

and that the inverse of A can be expressed in terms of the determinant as

\[
A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2}
\]

WARNING It is important to keep in mind that det(A) is a number, whereas A is a matrix.

Minors and Cofactors

One of our main goals in this chapter is to obtain an analog of Formula (2) that is applicable to square matrices of all orders. For this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a 2 × 2 matrix as

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

then the two equations in (1) take the form

\[
\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{3}
\]

In situations where it is inconvenient to assign a name to the matrix, we can express this formula as

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{4}
\]

There are various methods for defining determinants of higher-order square matrices. In this text, we will use an “inductive definition,” by which we mean that the determinant of a square matrix of a given order will be defined in terms of determinants of square matrices of the next lower order. To start the process, let us define the determinant of a 1 × 1 matrix [a11] as

\[
\det[a_{11}] = a_{11} \tag{5}
\]

from which it follows that Formula (4) can be expressed as

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det[a_{11}]\det[a_{22}] - \det[a_{12}]\det[a_{21}]
\]

Now that we have established a starting point, we can define determinants of 3 × 3 matrices in terms of determinants of 2 × 2 matrices, then determinants of 4 × 4 matrices in terms of determinants of 3 × 3 matrices, and so forth, ad infinitum. The following terminology and notation will help to make this inductive process more efficient.

DEFINITION 1 If A is a square matrix, then the minor of entry \(a_{ij}\) is denoted by \(M_{ij}\) and is defined to be the determinant of the submatrix that remains after the ith row and jth column are deleted from A. The number \((-1)^{i+j}M_{ij}\) is denoted by \(C_{ij}\) and is called the cofactor of entry \(a_{ij}\).

EXAMPLE 1 Finding Minors and Cofactors

Let

\[
A = \begin{bmatrix} 3 & 1 & -4 \\ 2 & 5 & 6 \\ 1 & 4 & 8 \end{bmatrix}
\]

The minor of entry a11 is obtained by deleting the first row and first column of A:

\[
M_{11} = \begin{vmatrix} 5 & 6 \\ 4 & 8 \end{vmatrix} = 16
\]

The cofactor of a11 is

\[
C_{11} = (-1)^{1+1}M_{11} = M_{11} = 16
\]

WARNING We have followed the standard convention of using capital letters to denote minors and cofactors even though they are numbers, not matrices.

Historical Note The term determinant was first introduced by the German mathematician Carl Friedrich Gauss in 1801 (see p. 15), who used them to “determine” properties of certain kinds of functions. Interestingly, the term matrix is derived from a Latin word for “womb” because it was viewed as a container of determinants.


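Similarly, the minor of entry a32 is obtained by deleting the third row and second column of A:

\[
M_{32} = \begin{vmatrix} 3 & -4 \\ 2 & 6 \end{vmatrix} = 26
\]

The cofactor of a32 is

\[
C_{32} = (-1)^{3+2}M_{32} = -M_{32} = -26
\]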
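The two computations in Example 1 can be reproduced with a short sketch. The helper names below (`submatrix`, `minor3`, `cofactor3`) are illustrative only, the indices are 1-based to match the text, and the code is restricted to 3 × 3 matrices so that each minor is a 2 × 2 determinant.

```python
def submatrix(A, i, j):
    """Delete row i and column j (1-based, as in the text) from A."""
    return [
        [entry for c, entry in enumerate(row, start=1) if c != j]
        for r, row in enumerate(A, start=1)
        if r != i
    ]


def det2(M):
    """Determinant of a 2 x 2 matrix, Formula (3)."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def minor3(A, i, j):
    """Minor M_ij of a 3 x 3 matrix A."""
    return det2(submatrix(A, i, j))


def cofactor3(A, i, j):
    """Cofactor C_ij = (-1)^(i+j) M_ij of a 3 x 3 matrix A."""
    return (-1) ** (i + j) * minor3(A, i, j)


# Matrix from Example 1.
A = [[3, 1, -4], [2, 5, 6], [1, 4, 8]]
print(minor3(A, 1, 1), cofactor3(A, 1, 1))  # 16 16
print(minor3(A, 3, 2), cofactor3(A, 3, 2))  # 26 -26
```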

Remark Note that a minor \(M_{ij}\) and its corresponding cofactor \(C_{ij}\) are either the same or negatives of each other and that the relating sign \((-1)^{i+j}\) is either +1 or −1 in accordance with the pattern in the “checkerboard” array

\[
\begin{bmatrix}
+ & - & + & - & + & \cdots \\
- & + & - & + & - & \cdots \\
+ & - & + & - & + & \cdots \\
- & + & - & + & - & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}
\]

For example,

\[
C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22}
\]

and so forth. Thus, it is never really necessary to calculate \((-1)^{i+j}\) to calculate \(C_{ij}\); you can simply compute the minor \(M_{ij}\) and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1.

Historical Note The term minor is apparently due to the English mathematician James Sylvester (see p. 35), who wrote the following in a paper published in 1850: “Now conceive any one line and any one column be struck out, we get…a square, one term less in breadth and depth than the original square; and by varying in every possible selection of the line and column excluded, we obtain, supposing the original square to consist of n lines and n columns, \(n^2\) such minor squares, each of which will represent what I term a “First Minor Determinant” relative to the principal or complete determinant.”

EXAMPLE 2 Cofactor Expansions of a 2 × 2 Matrix

The checkerboard pattern for a 2 × 2 matrix A = [aij] is

\[
\begin{bmatrix} + & - \\ - & + \end{bmatrix}
\]

so that

\[
\begin{aligned}
C_{11} &= M_{11} = a_{22} & C_{12} &= -M_{12} = -a_{21} \\
C_{21} &= -M_{21} = -a_{12} & C_{22} &= M_{22} = a_{11}
\end{aligned}
\]

We leave it for you to use Formula (3) to verify that det(A) can be expressed in terms of cofactors in the following four ways:

\[
\begin{aligned}
\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}
&= a_{11}C_{11} + a_{12}C_{12} \\
&= a_{21}C_{21} + a_{22}C_{22} \\
&= a_{11}C_{11} + a_{21}C_{21} \\
&= a_{12}C_{12} + a_{22}C_{22}
\end{aligned} \tag{6}
\]

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and cofactors all come from the same row or same column of A.


For example, in the first equation the entries and cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all come from the first column of A, and in the fourth they all come from the second column of A.

Definition of a General Determinant

Formula (6) is a special case of the following general result, which we will state without proof.

THEOREM 2.1.1 If A is an n × n matrix, then regardless of which row or column of A is chosen, the number obtained by multiplying the entries in that row or column by the corresponding cofactors and adding the resulting products is always the same.

This result allows us to make the following definition.

DEFINITION 2 If A is an n × n matrix, then the number obtained by multiplying the entries in any row or column of A by the corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are called cofactor expansions of A. That is,

\[
\det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \tag{7}
\]

[cofactor expansion along the jth column]

and

\[
\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \tag{8}
\]

[cofactor expansion along the ith row]

Historical Note Cofactor expansion is not the only method for expressing the determinant of a matrix in terms of determinants of lower order. For example, although it is not well known, the English mathematician Charles Dodgson, who was the author of Alice’s Adventures in Wonderland and Through the Looking Glass under the pen name of Lewis Carroll, invented such a method, called condensation. That method has recently been resurrected from obscurity because of its suitability for parallel processing on computers. [Image: Charles Lutwidge Dodgson (Lewis Carroll) (1832–1898). Oscar G. Rejlander/Time & Life Pictures/Getty Images]

EXAMPLE 3 Cofactor Expansion Along the First Row

Find the determinant of the matrix

\[
A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}
\]

by cofactor expansion along the first row.

Solution

\[
\begin{aligned}
\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix}
&= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1\begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0\begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix} \\
&= 3(-4) - (1)(-11) + 0 = -1
\end{aligned}
\]

EXAMPLE 4 Cofactor Expansion Along the First Column

Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the first column of A.

Solution

\[
\begin{aligned}
\det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix}
&= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5\begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix} \\
&= 3(-4) - (-2)(-2) + 5(3) = -1
\end{aligned}
\]

This agrees with the result obtained in Example 3.

Note that in Example 4 we had to compute three cofactors, whereas in Example 3 only two were needed because the third was multiplied by zero. As a rule, the best strategy for cofactor expansion is to expand along a row or column with the most zeros.
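Theorem 2.1.1 says that every row and every column gives the same value, which can be checked numerically. The following is a minimal recursive sketch of Definition 2; the function names are illustrative, indices are 0-based (the sign \((-1)^{i+j}\) is unaffected by the shift), and the approach is meant for small matrices only, since the number of operations grows factorially with the size.

```python
def submatrix(A, i, j):
    """Delete row i and column j (0-based) from A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by recursive cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]  # Formula (5): det[a11] = a11
    # Zero entries contribute nothing, so their cofactors are skipped.
    return sum(
        (-1) ** j * A[0][j] * det(submatrix(A, 0, j))
        for j in range(n)
        if A[0][j] != 0
    )


def expand_along_row(A, i):
    """Formula (8): cofactor expansion along row i."""
    return sum(
        (-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j)) for j in range(len(A))
    )


def expand_along_column(A, j):
    """Formula (7): cofactor expansion along column j."""
    return sum(
        (-1) ** (i + j) * A[i][j] * det(submatrix(A, i, j)) for i in range(len(A))
    )


# Matrix from Examples 3 and 4.
A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]
values = [expand_along_row(A, i) for i in range(3)]
values += [expand_along_column(A, j) for j in range(3)]
print(values)  # [-1, -1, -1, -1, -1, -1]
```

All six expansions agree with the value −1 found in Examples 3 and 4.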

EXAMPLE 5 Smart Choice of Row or Column

If A is the 4 × 4 matrix

\[
A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 3 & 1 & 2 & 2 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix}
\]

then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros:

\[
\det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix}
\]

For the 3 × 3 determinant, it will be easiest to use cofactor expansion along its second column, since it has the most zeros:

\[
\det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6
\]

EXAMPLE 6 Determinant of a Lower Triangular Matrix

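The following computation shows that the determinant of a 4 × 4 lower triangular matrix is the product of its diagonal entries. Each part of the computation uses a cofactor expansion along the first row.

\[
\begin{aligned}
\begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix}
&= a_{11}\begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} \\
&= a_{11}a_{22}\begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix} \\
&= a_{11}a_{22}a_{33}\,|a_{44}| = a_{11}a_{22}a_{33}a_{44}
\end{aligned}
\]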


The method illustrated in Example 6 can be easily adapted to prove the following general result.

THEOREM 2.1.2 If A is an n × n triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the entries on the main diagonal of the matrix; that is, \(\det(A) = a_{11}a_{22} \cdots a_{nn}\).
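As a quick numerical check of Theorem 2.1.2, the sketch below compares the product of the diagonal entries with a determinant computed by cofactor expansion along the first row. The two sample matrices and the function names are chosen only for illustration.

```python
from math import prod


def det_by_first_row(A):
    """Determinant by recursive cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, entry in enumerate(A[0]):
        if entry != 0:  # zero entries contribute nothing to the expansion
            sub = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * entry * det_by_first_row(sub)
    return total


def diagonal_product(A):
    """Product of the main-diagonal entries of a square matrix."""
    return prod(A[i][i] for i in range(len(A)))


# A lower triangular and an upper triangular sample matrix.
L = [[2, 0, 0, 0], [5, -1, 0, 0], [7, 3, 4, 0], [1, -2, 6, 3]]
U = [[1, 2, 7, -3], [0, 1, -4, 1], [0, 0, 2, 7], [0, 0, 0, 3]]

print(det_by_first_row(L), diagonal_product(L))  # -24 -24
print(det_by_first_row(U), diagonal_product(U))  # 6 6
```

For a triangular matrix the diagonal product needs only n − 1 multiplications, whereas the recursive expansion does far more work in general; the agreement of the two values is what the theorem asserts.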

A Useful Technique for Evaluating 2 × 2 and 3 × 3 Determinants

Determinants of 2 × 2 and 3 × 3 matrices can be evaluated very efficiently using the pattern suggested in Figure 2.1.1.

Figure 2.1.1 [Arrow diagrams: the 2 × 2 array with entries a11, a12, a21, a22, and the 3 × 3 array with entries a11 through a33 and its first and second columns recopied on the right, each crossed by rightward and leftward diagonal arrows.]

WARNING The arrow technique works only for determinants of 2 × 2 and 3 × 3 matrices. It does not work for matrices of size 4 × 4 or higher.

In the 2 × 2 case, the determinant can be computed by forming the product of the entries on the rightward arrow and subtracting the product of the entries on the leftward arrow. In the 3 × 3 case we first recopy the first and second columns as shown in the figure, after which we can compute the determinant by summing the products of the entries on the rightward arrows and subtracting the products on the leftward arrows. These procedures execute the computations

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}
\]

\[
\begin{aligned}
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
 - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
 + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\
&= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}
\end{aligned}
\]

which agrees with the cofactor expansions along the first row.

EXAMPLE 7 A Technique for Evaluating 2 × 2 and 3 × 3 Determinants

\[
\begin{vmatrix} 3 & 1 \\ 4 & -2 \end{vmatrix} = (3)(-2) - (1)(4) = -10
\]

\[
\begin{vmatrix} 1 & 2 & 3 \\ -4 & 5 & 6 \\ 7 & -8 & 9 \end{vmatrix} = [45 + 84 + 96] - [105 - 48 - 72] = 240
\]
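The arrow technique translates directly into code. The sketch below is one way to write it; the function names are illustrative, and the matrices are the ones of Example 7 taken with entries −2 in the 2 × 2 case and −4 and −8 in the 3 × 3 case, which is an assumption inferred from the products 45, 84, 96, 105, −48, −72 shown in the computation.

```python
def det2_arrow(A):
    """2 x 2 case: rightward product minus leftward product."""
    (a11, a12), (a21, a22) = A
    return a11 * a22 - a12 * a21


def det3_arrow(A):
    """3 x 3 case: recopy the first two columns, then follow the arrows."""
    # Append columns 1 and 2 to the right of the matrix (a 3 x 5 array).
    ext = [row + row[:2] for row in A]
    # Three diagonals running down and to the right.
    rightward = sum(ext[0][k] * ext[1][k + 1] * ext[2][k + 2] for k in range(3))
    # Three diagonals running down and to the left.
    leftward = sum(ext[0][k + 2] * ext[1][k + 1] * ext[2][k] for k in range(3))
    return rightward - leftward


print(det2_arrow([[3, 1], [4, -2]]))                    # -10
print(det3_arrow([[1, 2, 3], [-4, 5, 6], [7, -8, 9]]))  # 240
```

As the warning in this section notes, this shortcut does not extend to 4 × 4 or larger matrices, so neither function should be generalized by simply adding more diagonals.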


Exercise Set 2.1 In Exercises 1–2, find all the minors and cofactors of the matrix A.





−2

1 ⎢ 1. A = ⎣ 6 −3

3 ⎥ −1 ⎦ 4

7 1

3. Let



4 ⎢0 ⎢ A=⎢ ⎣4 4



1 ⎢ 2. A = ⎣3 0

−1 0 1 1



1 3 1

2 ⎥ 6⎦ 4

6 3⎥ ⎥ ⎥ 14⎦ 2

(b) M23 and C23 .

(c) M22 and C22 .

(d) M21 and C21 .

4. Let



2 ⎢−3 ⎢ A=⎢ ⎣ 3 3

3 2 −2 −2



−1

(d) M24 and C24 .

4 6. 8

1 2



−5 7. −7

√

7 −2

8.

2

4

√  6 √ 3

In Exercises 9–14, use the arrow technique to evaluate the determinant.

 a − 3 9.  −3  −2   11.  3   1  3   13. 2  1

 5  a − 2 

4  −7  2

1 5 6 0 −1 9



 −2   10.  5   3

7 1 8

6  −2  4

 −1   12.  3   1

1 0 7

2  −5  2

 c   14. 2  4

0  5  −4



λ−2 15. A = −5

1

λ+4

1

c−1

λ−4 ⎢ 16. A = ⎣ 0 0

(e) the third row.

(f ) the third column.

(a) the first row.

(b) the first column.

(c) the second row.

(d) the second column.

(e) the third row.

(f ) the third column.



c  2

3

3 ⎢2 ⎢ 25. A = ⎢ ⎣4 2



1 ⎢ 27. ⎣0 0

2

λ





3

0

1 ⎢ 23. A = ⎣1 1



0 5 0



7 ⎥ 1⎦ 5

k k k

⎤ k2 ⎥ k2 ⎦ k2

3 2 1 10

0 0 −3 3

0 3 2 4 2

0 3 4 6 4

3 ⎢ 22. A = ⎣1 1



3 0 −3



1 ⎥ −4 ⎦ 5

k−1 k−3 k+1

k+1



24. A = ⎣ 2 5



7 ⎥ 4⎦

k



5 −2⎥ ⎥ ⎥ 0⎦ 2 1 −1 2 2 2



0 0⎥ ⎥ ⎥ 3⎥ ⎥ 3⎦ 3

In Exercises 27–32, evaluate the determinant of the given matrix by inspection.

In Exercises 15–18, find all values of λ for which det(A) = 0.



(d) the second column.

4 ⎢3 ⎢ ⎢ 26. A = ⎢1 ⎢ ⎣9 2



−4



(b) the first column.



In Exercises 5–8, evaluate the determinant of the given matrix. If the matrix is invertible, use Equation (2) to find its inverse.



0

(c) the second row.



(c) M41 and C41 .

λ

0



0 ⎥ 0 ⎦ λ−5

(a) the first row.

21. A = ⎣ 2 −1

(b) M44 and C44 .

5 4

λ+1

−3

(a) M32 and C32 .

3 5. −2

0

2



Find



λ−1

4

In Exercises 21–26, evaluate det(A) by a cofactor expansion along a row or column of your choice.

1 3⎥ ⎥ ⎥ 0⎦ 4

0 1 1

λ−4 ⎢ 18. A = ⎣ −1

20. Evaluate the determinant in Exercise 12 by a cofactor expansion along

Find (a) M13 and C13 .

17. A =



19. Evaluate the determinant in Exercise 13 by a cofactor expansion along



1 −3 0 3





0 ⎥ 2 ⎦ λ−1



0

⎢1 ⎢ ⎣0

29. ⎢

1



0 −1 0

0 ⎥ 0⎦ 1 0



0

0

2

0

4

3

⎥ 0⎦

2

3

8

0⎥ ⎥



2 ⎢ 28. ⎣0 0



0 2 0

0 ⎥ 0⎦ 2

⎢0 ⎢ ⎣0

1

1

2

2

0

3

3⎦

0

0

0

4



1

30. ⎢

1



2⎥ ⎥






1

⎢0 ⎢ 31. ⎢ ⎣0 0

2

7

1

−4

0

2

⎤ −3 1⎥ ⎥ ⎥ 7⎦

0

0

3



−3

⎢ 1 ⎢ 32. ⎢ ⎣ 40

0

0

2 10

0 −1

100

200

−23



0

0⎥ ⎥ ⎥ 0⎦



0

A=

a

b c

0

1 0

42. Prove that if A is upper triangular and Bij is the matrix that results when the i th row and j th column of A are deleted, then Bij is upper triangular if i < j .



 and B =

d

e f

0



a c

b is ad + bc. d

(b) Two square matrices that have the same determinant must have the same size. (c) The minor Mij is the same as the cofactor Cij if i + j is even. (d) If A is a 3 × 3 symmetric matrix, then Cij = Cj i for all i and j .

 a − c  =0 d − f

(e) The number obtained by a cofactor expansion of a matrix A is independent of the row or column chosen for the expansion.

35. By inspection, what is the relationship between the following determinants?

b

  1

1 = 0

(a) The determinant of the 2 × 2 matrix

commute if and only if

 a   d1 = d  g



1 

TF. In parts (a)–( j) determine whether the statement is true or false, and justify your answer.



 b   e

y b1 b2

True-False Exercises

  1 0

34. Show that the matrices



 x    a1   a2

3

33. In each part, show that the value of the determinant is independent of θ .

   sin θ cos θ    (a)   − cos θ sin θ    sin θ cos θ   sin θ (b)  − cos θ  sin θ − cos θ sin θ + cos θ

41. Prove that the equation of the line through the distinct points (a1 , b1 ) and (a2 , b2 ) can be written as

  a + λ c    f  and d2 =  d    g 1

36. Show that 1 det(A) = 2

  tr(A)  tr(A2 )

b 1 0

 c   f  1

(f ) If A is a square matrix whose minors are all zero, then det(A) = 0. (g) The determinant of a lower triangular matrix is the sum of the entries along the main diagonal. (h) For every square matrix A and every scalar c, it is true that det(cA) = c det(A).



(i) For all square matrices A and B , it is true that

tr(A)

det(A + B) = det(A) + det(B)

1 

( j) For every 2 × 2 matrix A it is true that det(A2 ) = (det(A))2 .

for every 2 × 2 matrix A. 37. What can you say about an nth-order determinant all of whose entries are 1? Explain. 38. What is the maximum number of zeros that a 3 × 3 matrix can have without having a zero determinant? Explain. 39. Explain why the determinant of a matrix with integer entries must be an integer.

Working with Technology

T1. (a) Use the determinant capability of your technology utility to find the determinant of the matrix

$$
A = \begin{bmatrix} 4.2 & -1.3 & 1.1 & 6.0 \\ 0.0 & 0.0 & -3.2 & 3.4 \\ 4.5 & 1.3 & 0.0 & 14.8 \\ 4.7 & 1.0 & 3.4 & 2.3 \end{bmatrix}
$$



Working with Proofs 40. Prove that (x1 , y1 ), (x2 , y2 ), and (x3 , y3 ) are collinear points if and only if

 x1   x2  x3

y1 y2 y3



1  1 = 0  1

(b) Compare the result obtained in part (a) to that obtained by a cofactor expansion along the second row of A. T2. Let An be the n × n matrix with 2’s along the main diagonal, 1’s along the diagonal lines immediately above and below the main diagonal, and zeros everywhere else. Make a conjecture about the relationship between n and det(An ).

2.2 Evaluating Determinants by Row Reduction

In this section we will show how to evaluate a determinant by reducing the associated matrix to row echelon form. In general, this method requires less computation than cofactor expansion and hence is the method of choice for large matrices.

A Basic Theorem

We begin with a fundamental theorem that will lead us to an efficient procedure for evaluating the determinant of a square matrix of any size.

THEOREM 2.2.1 Let A be a square matrix. If A has a row of zeros or a column of zeros, then det(A) = 0.

Proof Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or column of zeros. Thus, if we let C1, C2, . . . , Cn denote the cofactors of A along that row or column, then it follows from Formula (7) or (8) in Section 2.1 that

det(A) = 0 · C1 + 0 · C2 + · · · + 0 · Cn = 0

The following useful theorem relates the determinant of a matrix and the determinant of its transpose. Because transposing a matrix changes its columns to rows and its rows to columns, almost every theorem about the rows of a determinant has a companion version about columns, and vice versa.

THEOREM 2.2.2 Let A be a square matrix. Then det(A) = det(A^T).

Proof Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of A along any row is the same as the cofactor expansion of A^T along the corresponding column. Thus, both have the same determinant.

Elementary Row Operations

The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In place of a formal proof we have provided a table to illustrate the ideas in the 3 × 3 case (see Table 1).

Table 1

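Relationship:

$$
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},
\qquad \det(B) = k\det(A)
$$

Operation: In the matrix B the first row of A was multiplied by k.

Relationship:

$$
\begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= - \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},
\qquad \det(B) = -\det(A)
$$

Operation: In the matrix B the first and second rows of A were interchanged.

Relationship:

$$
\begin{vmatrix} a_{11} + ka_{21} & a_{12} + ka_{22} & a_{13} + ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},
\qquad \det(B) = \det(A)
$$

Operation: In the matrix B a multiple of the second row of A was added to the first row.

The first panel of Table 1 shows that you can bring a common factor from any row (column) of a determinant through the determinant sign. This is a slightly different way of thinking about part (a) of Theorem 2.2.3.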

THEOREM 2.2.3 Let A be an n × n matrix.

(a) If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then det(B) = k det(A).

(b) If B is the matrix that results when two rows or two columns of A are interchanged, then det(B) = − det(A).

(c) If B is the matrix that results when a multiple of one row of A is added to another or when a multiple of one column is added to another, then det(B) = det(A).

We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two sides of the equation differ only in the first row, so these determinants have the same cofactors, C11, C12, C13, along that row (since those cofactors depend only on the entries in the second and third rows). Thus, expanding the left side by cofactors along the first row yields

$$
\begin{aligned}
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
&= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\
&= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\
&= k \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\end{aligned}
$$

Elementary Matrices

It will be useful to consider the special case of Theorem 2.2.3 in which A = In is the n × n identity matrix and E (rather than B) denotes the elementary matrix that results when the row operation is performed on In. In this special case Theorem 2.2.3 implies the following result.

THEOREM 2.2.4 Let E be an n × n elementary matrix.

(a) If E results from multiplying a row of In by a nonzero number k, then det(E) = k.

(b) If E results from interchanging two rows of In, then det(E) = −1.

(c) If E results from adding a multiple of one row of In to another, then det(E) = 1.
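The three rules in Theorems 2.2.3 and 2.2.4 can be checked numerically. The following Python sketch is only an illustration, not part of the text's development: the helper names `det`, `scale_row`, `swap_rows`, and `add_multiple` are hypothetical, the determinant is computed by plain cofactor expansion, and the 3 × 3 test matrix and the scalar k = 4 are arbitrary choices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def scale_row(m, i, k):
    """Return a copy of m with row i multiplied by k."""
    return [[k * x for x in row] if r == i else list(row) for r, row in enumerate(m)]


def swap_rows(m, i, j):
    """Return a copy of m with rows i and j interchanged."""
    out = [list(row) for row in m]
    out[i], out[j] = out[j], out[i]
    return out


def add_multiple(m, src, dst, k):
    """Return a copy of m with k times row src added to row dst."""
    out = [list(row) for row in m]
    out[dst] = [x + k * y for x, y in zip(m[dst], m[src])]
    return out


A = [[0, 1, 5], [3, -6, 9], [2, 6, 1]]   # arbitrary test matrix
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # identity, used to build elementary matrices
k = 4

# Theorem 2.2.3: effect of each row operation on det(A)
print(det(scale_row(A, 0, k)) == k * det(A))        # part (a)
print(det(swap_rows(A, 0, 1)) == -det(A))           # part (b)
print(det(add_multiple(A, 1, 0, k)) == det(A))      # part (c)

# Theorem 2.2.4: determinants of the corresponding elementary matrices
print(det(scale_row(I3, 0, k)), det(swap_rows(I3, 0, 1)), det(add_multiple(I3, 1, 0, k)))
```

Each helper returns a new matrix rather than modifying its argument, so the same A can be reused for all three checks.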

Observe that the determinant of an elementary matrix cannot be zero.

EXAMPLE 1 Determinants of Elementary Matrices

The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem 2.2.4.

$$
\begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3, \qquad
\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1, \qquad
\begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1
$$

In the first determinant the second row of I4 was multiplied by 3; in the second, the first and last rows of I4 were interchanged; in the third, 7 times the last row of I4 was added to the first row.

Matrices with Proportional Rows or Columns

If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change the determinant, so from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem.

THEOREM 2.2.5 If A is a square matrix with two proportional rows or two proportional columns, then det(A) = 0.

EXAMPLE 2 Proportional Rows or Columns

Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero.

$$
\begin{bmatrix} -1 & 4 \\ -2 & 8 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & -2 & 7 \\ -4 & 8 & 5 \\ 2 & -4 & 3 \end{bmatrix}, \qquad
\begin{bmatrix} 3 & -1 & 4 & -5 \\ 6 & -2 & 5 & 2 \\ 5 & 8 & 1 & 4 \\ -9 & 3 & -12 & 15 \end{bmatrix}
$$

Evaluating Determinants by Row Reduction

We will now give a method for evaluating determinants that involves substantially less computation than cofactor expansion. The idea of the method is to reduce the given matrix to upper triangular form by elementary row operations, then compute the determinant of the upper triangular matrix (an easy computation), and then relate that determinant to that of the original matrix. Here is an example.

EXAMPLE 3 Using Row Reduction to Evaluate a Determinant

Evaluate det(A) where

$$
A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix}
$$

Solution We will reduce A to row echelon form (which is upper triangular) and then apply Theorem 2.1.2.

$$
\begin{aligned}
\det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix}
&= -\begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix}
&& \text{The first and second rows of } A \text{ were interchanged.} \\
&= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix}
&& \text{A common factor of 3 from the first row was taken through the determinant sign.} \\
&= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix}
&& -2 \text{ times the first row was added to the third row.} \\
&= -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix}
&& -10 \text{ times the second row was added to the third row.} \\
&= (-3)(-55)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix}
&& \text{A common factor of } -55 \text{ from the last row was taken through the determinant sign.} \\
&= (-3)(-55)(1) = 165
\end{aligned}
$$

Even with today’s fastest computers it would take millions of years to calculate a 25 × 25 determinant by cofactor expansion, so methods based on row reduction are often used for large determinants. For determinants of small size (such as those in this text), cofactor expansion is often a reasonable choice.
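The procedure used in Example 3 can be sketched in Python. This is a minimal illustration under stated assumptions rather than the text's own algorithm: the function name `det_by_row_reduction` is hypothetical, exact arithmetic is done with `fractions.Fraction`, and, unlike the hand computation, the sketch does not factor scalars out of rows. It uses only row interchanges (each flips the sign, Theorem 2.2.3(b)) and additions of multiples of rows (no change, Theorem 2.2.3(c)), then multiplies the diagonal entries of the resulting upper triangular matrix.

```python
from fractions import Fraction


def det_by_row_reduction(matrix):
    """Reduce to upper triangular form, tracking the sign changes from row swaps."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot: the determinant is zero
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                # an interchange multiplies det by -1
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # Adding a multiple of one row to another leaves det unchanged.
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]               # det of a triangular matrix: product of diagonal
    return result


A = [[0, 1, 5], [3, -6, 9], [2, 6, 1]]  # the matrix of Example 3
print(det_by_row_reduction(A))          # 165
```

Because no row is ever rescaled, the only bookkeeping needed is the running sign; the factors 3 and −55 that Example 3 pulls through the determinant sign show up here as diagonal entries instead.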

EXAMPLE 4 Using Column Operations to Evaluate a Determinant

Compute the determinant of

$$
A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix}
$$

Solution This determinant could be computed as above by using elementary row operations to reduce A to row echelon form, but we can put A in lower triangular form in one step by adding −3 times the first column to the fourth to obtain

$$
\det(A) = \det \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546
$$

Example 4 points out that it is always wise to keep an eye open for column operations that can shorten computations.

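Cofactor expansion and row or column operations can sometimes be used in combination to provide an effective method for evaluating determinants. The following example illustrates this idea.

EXAMPLE 5 Row Operations and Cofactor Expansion

Evaluate det(A) where

$$
A = \begin{bmatrix} 3 & 5 & -2 & 6 \\ 1 & 2 & -1 & 1 \\ 2 & 4 & 1 & 5 \\ 3 & 7 & 5 & 3 \end{bmatrix}
$$

Solution By adding suitable multiples of the second row to the remaining rows, we obtain

$$
\begin{aligned}
\det(A) &= \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix} \\
&= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix}
&& \text{Cofactor expansion along the first column} \\
&= -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix}
&& \text{We added the first row to the third row.} \\
&= -(-1)\begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix}
&& \text{Cofactor expansion along the first column} \\
&= -18
\end{aligned}
$$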

Exercise Set 2.2

In Exercises 1–4, verify that det(A) = det(A^T).



1. A =

−2

3 4

1





1

2

−2

4 ⎢ 4. A = ⎣ 0 −1

2 2 1





−1

2 ⎢ 3. A = ⎣1 5

−6

2. A =

3 ⎥ 4⎦ 6

2

−3

 d   15. g  a

⎤ −1 ⎥ −3 ⎦ 5

In Exercises 5–8, find the determinant of the given elementary matrix by inspection.



1

0

0

−5

0

1 0 0

⎢0 ⎢ 5. ⎢ ⎣0 ⎡

1 ⎢0 ⎢ 7. ⎢ ⎣0 0

0 0 1 0

0



⎥ 0⎥ ⎥ 0⎦

0 0 0 1 0 0



1 ⎢ 6. ⎣ 0 −5

1





0 0⎥ ⎥ ⎥ 0⎦ 1

1 ⎢0 ⎢ 8. ⎢ ⎣0 0



0 1 0

0 ⎥ 0⎦ 1

0

0 0 1 0

− 13 0 0



0 0⎥ ⎥ ⎥ 0⎦ 1

In Exercises 9–14, evaluate the determinant of the matrix by first reducing the matrix to row echelon form and then using some combination of row operations and cofactor expansion.



−6

3 ⎢ 9. ⎣−2 0



2

1 0 2 1

⎢1 ⎢ 11. ⎢ ⎣0 0



1

⎢ ⎢−2 ⎢ 13. ⎢ ⎢ 0 ⎢ 0 ⎣ 0



7 1

1 ⎢ 5 ⎢ 14. ⎢ ⎣−1 2





9 ⎥ −2 ⎦ 5

3 ⎢ 10. ⎣ 0 −2





3 1 1 2

1 1⎥ ⎥ ⎥ 0⎦ 3

3

1

5

−7

0 1 2 0

−4

0 0 0

−2 −9 2 8

3 6 −6 6

1

−3

5

4 −2

⎢ 12. ⎣−2

3

6 0 1

⎤ −9 ⎥ −2 ⎦

 1   a  2 a



(a) det ⎣0

b e h

 c + f   −f   i 

b b2



1 

 c  = (b − a)(c − a)(c − b)  c2 

a31 0

0 0

a32 a42

⎥ a23 ⎦ = −a13 a22 a31 a33 ⎤ 0 a14 a23 a24 ⎥ ⎥ ⎥ = a14 a23 a32 a41 a33 a34 ⎦ a43 a44

In Exercises 25–28, confirm the identities without evaluating the determinants directly.

⎥ 25.

26.

27.

 c  f  = −6 i

1

a22 a32

⎢0 ⎢ (b) det ⎢ ⎣0 a41

In Exercises 15–22, evaluate the determinant, given that

 a  d  g

b+e −e h

24. Verify the formulas in parts (a) and (b) and then make a conjecture about a general result of which these results are special cases. ⎡ ⎤ 0 0 a13

0 ⎥ 1⎦ 2



 i   f  c

23. Use row reduction to show that



1 3⎥ ⎥ ⎥ −2 ⎦ 1

 −f   4i 

−e 4h

h e b

 a + d   18.  −d   g



3c

3b





 g   16. d  a

 f   i  c

    a + g b + h c + i   a b c         2e 2f  e f  19.  d 20.  2d      g g + 3a h + 3b i + 3c h i       −3a a −3b −3c  b c        e f  e f 21.  d 22.  d     g − 4d h − 4e i − 4f  2a 2b 2c

5

2⎥ ⎥ 1⎥ ⎥ 1⎥ ⎦ 1

0 1 1

  3a   17. −d   4g

e h b

28.

    a1 b1 a1 + b1 + c1  a1 b1 c1          a2 b2 a2 + b2 + c2  = a2 b2 c2      a3 b3 a3 + b3 + c3  a3 b3 c3     a1 a2 a1 + b1 t a2 + b2 t a3 + b3 t       2  a1 t + b1 a2 t + b2 a3 t + b3  = (1 − t ) b1 b2    c1 c2  c1 c2 c3      a1 + b1 a1 − b1 c1  a1 b1 c1          a + b a − b c 2 = −  2 a2 b2 c2  2 2 2 2     a3 + b3 a3 − b3 c3  a3 b3 c3      a1 b1 + ta1 c1 + rb1 + sa1  a1 a2 a3          a2 b2 + ta2 c2 + rb2 + sa2  = b1 b2 b3      a3 b3 + ta3 c3 + rb3 + sa3  c1 c2 c3 

 a3   b3   c3 




In Exercises 29–30, show that det(A) = 0 without directly evaluating the determinant.



−2

8 2 10 −6

⎢ 3

29. A = ⎢ ⎣

1 4



−4 ⎢ 1 ⎢ ⎢ 30. A = ⎢ 1 ⎢ ⎣ 1

1 −4 1 1 1

1



1 5 6 4

4 1⎥ ⎥ 5⎦ −3

1 1 −4 1 1

1 1 1 −4 1

A M= C

0

(a) If A is a 4 × 4 matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two rows, then det(B) = det(A).



or M =

A 0

C B



in which A and B are square, then det(M) = det(A) det(B). Use this result to compute the determinants of the matrices in Exercises 31 and 32.



−9



1 ⎢ 2 ⎢ ⎢ ⎢−1

2 5 3

0 0 2

8 4 6

6 7 9

5⎥ ⎥ ⎥ −2⎥

⎢ 0 ⎢ ⎣ 0

0 0 0

0 0 0

3 2 −3

0 1 8

0⎥ ⎥ 0⎦ −4

31. M = ⎢ ⎢

0



1 ⎢0 ⎢ ⎢ 0 32. M = ⎢ ⎢

⎢ ⎣0 2

2 1 0

0 2 1

0 0 0

0 0

0 0

1 0

⎤ b b⎥ ⎥ ⎥ b⎦ a

b b a b

TF. In parts (a)–(f ) determine whether the statement is true or false, and justify your answer.

1 1⎥ ⎥ ⎥ 1⎥ ⎥ 1⎦ −4



B

b a b b

True-False Exercises ⎤

It can be proved that if a square matrix M is partitioned into block triangular form as



a ⎢b ⎢ ⎢ ⎣b b

(b) If A is a 3 × 3 matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column by 43 , then det(B) = 3 det(A). (c) If A is a 3 × 3 matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows, then det(B) = 25 det(A). (d) If A is an n × n matrix and B is obtained from A by multiplying each row of A by its row number, then det(B) =

⎥ ⎥

n(n + 1) 2

det(A)

(e) If A is a square matrix with two identical columns, then det(A) = 0. (f ) If the sum of the second and fourth row vectors of a 6 × 6 matrix A is equal to the last row vector, then det(A) = 0.



0 0⎥ ⎥ ⎥ 0⎥

Working with Technology

T1. Find the determinant of



⎥ ⎥ 2⎦

4.2

1

33. Let A be an n × n matrix, and let B be the matrix that results when the rows of A are written in reverse order. State a theorem that describes how det(A) and det(B) are related. 34. Find the determinant of the following matrix.

⎢0.0 ⎢ A=⎢ ⎣4.5 4.7



−1.3

1.1

0 .0

−3.2

1 .3

0.0

14.8⎦

1 .0

3 .4

2 .3

6 .0

3.4⎥ ⎥



by reducing the matrix to reduced row echelon form, and compare the result obtained in this way to that obtained in Exercise T1 of Section 2.1.

2.3 Properties of Determinants; Cramer’s Rule

In this section we will develop some fundamental properties of matrices, and we will use these results to derive a formula for the inverse of an invertible matrix and formulas for the solutions of certain kinds of linear systems.

Basic Properties of Determinants

Suppose that A and B are n × n matrices and k is any scalar. We begin by considering possible relationships among det(A), det(B), and

det(kA), det(A + B), and det(AB)

Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the n rows in kA has a common factor of k, it follows that

$$
\det(kA) = k^n \det(A) \tag{1}
$$

For example,

$$
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix}
= k^3 \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
$$

Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.

EXAMPLE 1 det(A + B) ≠ det(A) + det(B)

Consider

$$
A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \quad
A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix}
$$

We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus

det(A + B) ≠ det(A) + det(B)
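Formula (1) and the conclusion of Example 1 are easy to confirm by direct computation. The short Python sketch below is illustrative only: the helper names `det2`, `scale`, and `add` are hypothetical, and the scalar k = 3 is an arbitrary choice. It uses the matrices A and B of Example 1.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def scale(m, k):
    """Multiply every entry of m by k."""
    return [[k * x for x in row] for row in m]


def add(m1, m2):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


A = [[1, 2], [2, 5]]
B = [[3, 1], [1, 3]]
k, n = 3, 2

# Formula (1): det(kA) = k^n det(A)
print(det2(scale(A, k)), k ** n * det2(A))

# Example 1: det(A + B) is 23, while det(A) + det(B) is 9
print(det2(add(A, B)), det2(A) + det2(B))
```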

In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable when the matrices involved are the same except for one row (column). For example, consider the following two matrices that differ only in the second row:

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
$$

Calculating the determinants of A and B, we obtain

$$
\begin{aligned}
\det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\
&= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\
&= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\end{aligned}
$$

Thus

$$
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
+ \det \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
$$

This is a special case of the following general result.

THEOREM 2.3.1 Let A, B, and C be n × n matrices that differ only in a single row, say the r th, and assume that the r th row of C can be obtained by adding corresponding entries in the r th rows of A and B. Then

det(C) = det(A) + det(B)

The same result holds for columns.

EXAMPLE 2 Sums of Determinants

We leave it to you to confirm the following equality by evaluating the determinants.

$$
\det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 + 0 & 4 + 1 & 7 + (-1) \end{bmatrix}
= \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix}
+ \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix}
$$

Determinant of a Matrix Product

Considering the complexity of the formulas for determinants and matrix multiplication, it would seem unlikely that a simple relationship should exist between them. This is what makes the simplicity of our next result so surprising. We will show that if A and B are square matrices of the same size, then

$$
\det(AB) = \det(A)\det(B) \tag{2}
$$

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin with the special case of (2) in which A is an elementary matrix. Because this special case is only a prelude to (2), we call it a lemma.

LEMMA 2.3.2 If B is an n × n matrix and E is an n × n elementary matrix, then

det(EB) = det(E) det(B)

Proof We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1 If E results from multiplying a row of In by k, then by Theorem 1.5.1, EB results from B by multiplying the corresponding row by k; so from Theorem 2.2.3(a) we have

det(EB) = k det(B)

But from Theorem 2.2.4(a) we have det(E) = k, so

det(EB) = det(E) det(B)

Cases 2 and 3 The proofs of the cases where E results from interchanging two rows of In or from adding a multiple of one row to another follow the same pattern as Case 1 and are left as exercises.

Remark It follows by repeated applications of Lemma 2.3.2 that if B is an n × n matrix and E1, E2, . . . , Er are n × n elementary matrices, then

$$
\det(E_1 E_2 \cdots E_r B) = \det(E_1)\det(E_2) \cdots \det(E_r)\det(B) \tag{3}
$$

Determinant Test for Invertibility

Our next theorem provides an important criterion for determining whether a matrix is invertible. It also takes us a step closer to establishing Formula (2).

THEOREM 2.3.3 A square matrix A is invertible if and only if det(A) ≠ 0.

Proof Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R) are both zero or both nonzero: Let E1, E2, . . . , Er be the elementary matrices that correspond to the elementary row operations that produce R from A. Thus

R = Er · · · E2 E1 A

and from (3),

$$
\det(R) = \det(E_r) \cdots \det(E_2)\det(E_1)\det(A) \tag{4}
$$

We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix is nonzero. Thus, it follows from Formula (4) that det(A) and det(R) are either both zero or both nonzero, which sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem 1.6.4 that R = I and hence that det(R) = 1 (≠ 0). This, in turn, implies that det(A) ≠ 0, which is what we wanted to show.

Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row of zeros. Thus, it follows from Theorem 1.4.3 that R = I and hence that A is invertible by Theorem 1.6.4.

It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional rows or two proportional columns is not invertible.

EXAMPLE 3 Determinant Test for Invertibility

Since the first and third rows of

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 2 & 4 & 6 \end{bmatrix}
$$

are proportional, det(A) = 0. Thus A is not invertible.

We are now ready for the main result concerning products of matrices.

THEOREM 2.3.4 If A and B are square matrices of the same size, then

det(AB) = det(A) det(B)

Proof We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have det(AB) = 0 and det(A) = 0, so it follows that det(AB) = det(A) det(B).

Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary matrices, say

$$
A = E_1 E_2 \cdots E_r \tag{5}
$$

so

AB = E1 E2 · · · Er B

Applying (3) to this equation yields

det(AB) = det(E1) det(E2) · · · det(Er) det(B)

and applying (3) again yields

det(AB) = det(E1 E2 · · · Er) det(B)

which, from (5), can be written as det(AB) = det(A) det(B).

Augustin Louis Cauchy (1789–1857)
Historical Note In 1815 the great French mathematician Augustin Cauchy published a landmark paper in which he gave the first systematic and modern treatment of determinants. It was in that paper that Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had been stated and proved earlier, but it was Cauchy who made the final jump. [Image: © Bettmann/CORBIS]

EXAMPLE 4 Verifying that det(AB) = det(A) det(B)

Consider the matrices

$$
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad
AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}
$$

We leave it for you to verify that

det(A) = 1, det(B) = −23, and det(AB) = −23

Thus det(AB) = det(A) det(B), as guaranteed by Theorem 2.3.4.

The following theorem gives a useful relationship between the determinant of an invertible matrix and the determinant of its inverse.

THEOREM 2.3.5 If A is invertible, then

$$
\det(A^{-1}) = \frac{1}{\det(A)}
$$

Proof Since A⁻¹A = I, it follows that det(A⁻¹A) = det(I). Therefore, we must have det(A⁻¹) det(A) = 1. Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).
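Theorems 2.3.4 and 2.3.5 can be spot-checked on the 2 × 2 matrices of Example 4. The Python sketch below is a hedged illustration: the helper names `det2`, `matmul`, and `inverse2` are hypothetical, and the inverse is computed from the standard 2 × 2 inverse formula using exact fractions.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def inverse2(m):
    """Inverse of an invertible 2 x 2 matrix: (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    dt = Fraction(det2(m))
    if dt == 0:
        raise ValueError("matrix is not invertible (determinant is zero)")
    return [[d / dt, -b / dt], [-c / dt, a / dt]]


A = [[3, 1], [2, 1]]
B = [[-1, 3], [5, 8]]

print(matmul(A, B))                                # [[2, 17], [3, 14]]
print(det2(matmul(A, B)), det2(A) * det2(B))       # both -23
print(det2(inverse2(B)), Fraction(1, det2(B)))     # both -1/23
```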

Adjoint of a Matrix

In a cofactor expansion we compute det(A) by multiplying the entries in a row or column by their cofactors and adding the resulting products. It turns out that if one multiplies the entries in any row by the corresponding cofactors from a different row, the sum of these products is always zero. (This result also holds for columns.) Although we omit the general proof, the next example illustrates this fact.

EXAMPLE 5 Entries and Cofactors from Different Rows

Let

$$
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
$$

We leave it for you to verify that the cofactors of A are

C11 = 12, C12 = 6, C13 = −16
C21 = 4, C22 = 2, C23 = 16
C31 = 12, C32 = −10, C33 = 16

so, for example, the cofactor expansion of det(A) along the first row is

det(A) = 3C11 + 2C12 + (−1)C13 = 36 + 12 + 16 = 64

and along the first column is

det(A) = 3C11 + C21 + 2C31 = 36 + 4 + 24 = 64

Suppose, however, we multiply the entries in the first row by the corresponding cofactors from the second row and add the resulting products. The result is

3C21 + 2C22 + (−1)C23 = 12 + 4 − 16 = 0

Or suppose we multiply the entries in the first column by the corresponding cofactors from the second column and add the resulting products. The result is again zero since

3C12 + 1C22 + 2C32 = 18 + 2 − 20 = 0

Leonard Eugene Dickson (1874–1954)
Historical Note The use of the term adjoint for the transpose of the matrix of cofactors appears to have been introduced by the American mathematician L. E. Dickson in a research paper that he published in 1902. [Image: Courtesy of the American Mathematical Society www.ams.org]

DEFINITION 1 If A is any n × n matrix and Cij is the cofactor of aij, then the matrix

$$
\begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}
$$

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is denoted by adj(A).

EXAMPLE 6 Adjoint of a 3 × 3 Matrix

Let

$$
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
$$

As noted in Example 5, the cofactors of A are

C11 = 12, C12 = 6, C13 = −16
C21 = 4, C22 = 2, C23 = 16
C31 = 12, C32 = −10, C33 = 16

so the matrix of cofactors is

$$
\begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix}
$$

and the adjoint of A is

$$
\operatorname{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
$$

In Theorem 1.4.5 we gave a formula for the inverse of a 2 × 2 invertible matrix. Our next theorem extends that result to n × n invertible matrices.

THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint

If A is an invertible matrix, then

$$
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \tag{6}
$$

It follows from Theorems 2.3.5 and 2.1.2 that if A is an invertible triangular matrix, then

$$
\det(A^{-1}) = \frac{1}{a_{11}} \frac{1}{a_{22}} \cdots \frac{1}{a_{nn}}
$$

Moreover, by using the adjoint formula it is possible to show that

$$
\frac{1}{a_{11}}, \ \frac{1}{a_{22}}, \ \ldots, \ \frac{1}{a_{nn}}
$$

are actually the successive diagonal entries of A⁻¹ (compare A and A⁻¹ in Example 3 of Section 1.7).

Proof We show first that

A adj(A) = det(A)I

Consider the product

$$
A \operatorname{adj}(A) =
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn} \end{bmatrix}
$$

The entry in the i th row and j th column of the product A adj(A) is

$$
a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{7}
$$

(the i th row of A times the j th column of adj(A) in the product above). If i = j, then (7) is the cofactor expansion of det(A) along the i th row of A (Theorem 2.1.1), and if i ≠ j, then the a’s and the cofactors come from different rows of A, so the value of (7) is zero (as illustrated in Example 5). Therefore,

$$
A \operatorname{adj}(A) =
\begin{bmatrix} \det(A) & 0 & \cdots & 0 \\ 0 & \det(A) & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \det(A) \end{bmatrix}
= \det(A) I \tag{8}
$$

Since A is invertible, det(A) ≠ 0. Therefore, Equation (8) can be rewritten as

$$
\frac{1}{\det(A)} [A \operatorname{adj}(A)] = I
\quad \text{or} \quad
A \left[ \frac{1}{\det(A)} \operatorname{adj}(A) \right] = I
$$

Multiplying both sides on the left by A⁻¹ yields

$$
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
$$
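Formula (6) translates directly into a short computation. The following Python sketch is an illustration under stated assumptions, not an efficient method: the helper names `minor`, `det`, `adjoint`, and `inverse` are hypothetical, determinants are computed by cofactor expansion, integer entries are assumed so that `fractions.Fraction` gives exact results, and the matrix is the one from Examples 5 and 6.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix left after deleting row i and column j of m."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1                       # convention: the empty determinant is 1
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def adjoint(m):
    """Transpose of the matrix of cofactors C_ij = (-1)^(i+j) M_ij."""
    n = len(m)
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)] for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]


def inverse(m):
    """Inverse via Formula (6): adj(A) divided by det(A). Assumes integer entries."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible (determinant is zero)")
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]


A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]
print(det(A))       # 64
print(adjoint(A))   # [[12, 4, 12], [6, 2, -10], [-16, 16, 16]]
print(inverse(A))   # entries are adj(A) entries over 64, in lowest terms
```

Computing every cofactor this way costs far more than row reduction for large n, so the sketch is meant for checking small hand computations.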

EXAMPLE 7 Using the Adjoint to Find an Inverse Matrix

Use Formula (6) to find the inverse of the matrix A in Example 6.

Solution We showed in Example 5 that det(A) = 64. Thus,

$$
A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)
= \frac{1}{64} \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
= \begin{bmatrix} \frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\ \frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\ -\frac{16}{64} & \frac{16}{64} & \frac{16}{64} \end{bmatrix}
$$

Cramer’s Rule

Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer’s rule, for the solution of a linear system Ax = b of n equations in n unknowns in the case where the coefficient matrix A is invertible (or, equivalently, when det(A) ≠ 0).

THEOREM 2.3.7 Cramer’s Rule

If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a unique solution. This solution is

$$
x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)}
$$

where Aj is the matrix obtained by replacing the entries in the j th column of A by the entries in the matrix

$$
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
$$

Gabriel Cramer (1704–1752)
Historical Note Variations of Cramer’s rule were fairly well known before the Swiss mathematician discussed it in work he published in 1750. It was Cramer’s superior notation that popularized the method and led mathematicians to attach his name to it. [Image: Science Source/Photo Researchers]

Proof If det(A) ≠ 0, then A is invertible, and by Theorem 1.6.2, x = A⁻¹b is the unique solution of Ax = b. Therefore, by Theorem 2.3.6 we have

$$
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)} \operatorname{adj}(A)\,\mathbf{b}
= \frac{1}{\det(A)}
\begin{bmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
$$

Multiplying the matrices out gives

$$
\mathbf{x} = \frac{1}{\det(A)}
\begin{bmatrix} b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\ b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\ \vdots \\ b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn} \end{bmatrix}
$$

The entry in the j th row of x is therefore

$$
x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{9}
$$

Now let

$$
A_j = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn}
\end{bmatrix}
$$

Since Aj differs from A only in the j th column, it follows that the cofactors of entries b1, b2, . . . , bn in Aj are the same as the cofactors of the corresponding entries in the j th column of A. The cofactor expansion of det(Aj) along the j th column is therefore

det(Aj) = b1 C1j + b2 C2j + · · · + bn Cnj

Substituting this result in (9) gives

$$
x_j = \frac{\det(A_j)}{\det(A)}
$$
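Theorem 2.3.7 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `det` and `cramer` are hypothetical, the determinant is computed by cofactor expansion, integer data are assumed so that `fractions.Fraction` keeps the quotients exact, and the sample system is the one solved by hand in Example 8 (x1 + 2x3 = 6, −3x1 + 4x2 + 6x3 = 30, −x1 − 2x2 + 3x3 = 8).

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def cramer(a, b):
    """Solve a x = b by Cramer's rule; requires det(a) != 0 and integer entries."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule does not apply: det(A) = 0")
    solution = []
    for j in range(len(a)):
        # A_j: replace the j-th column of A by the column b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution


A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
print(cramer(A, b))   # x1 = -10/11, x2 = 18/11, x3 = 38/11
```

The sketch evaluates n + 1 determinants, which is why elimination is usually preferred for larger systems.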

EXAMPLE 8 Using Cramer’s Rule to Solve a Linear System

Use Cramer’s rule to solve

$$
\begin{aligned}
x_1 \phantom{{}+ 4x_2} + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8
\end{aligned}
$$

Solution

$$
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}
$$

Therefore,

$$
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = \frac{-10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}
$$

For n > 3, it is usually more efficient to solve a linear system with n equations in n unknowns by Gauss–Jordan elimination than by Cramer’s rule. The main use of Cramer’s rule is for obtaining properties of solutions of a linear system without actually solving the system.

Equivalence Theorem

In Theorem 1.6.4 we listed five results that are equivalent to the invertibility of a matrix A. We conclude this section by merging Theorem 2.3.3 with that list to produce the following theorem that relates all of the major topics we have studied thus far.

126

Chapter 2 Determinants THEOREM 2.3.8 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

O PT I O N A L

(a)

A is invertible.

(b) (c)

Ax = 0 has only the trivial solution. The reduced row echelon form of A is In .

(d )

A can be expressed as a product of elementary matrices.

(e)

Ax = b is consistent for every n × 1 matrix b.

( f)

Ax = b has exactly one solution for every n × 1 matrix b.

( g)

det(A)  = 0.

We now have all of the machinery necessary to prove the following two results, which we stated without proof in Theorem 1.7.1: • Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero. • Theorem 1.7.1(d ) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an invertible upper triangular matrix is upper triangular.

Proof of Theorem 1.7.1(c) Let $A = [a_{ij}]$ be a triangular matrix, so that its diagonal entries are

\[ a_{11}, a_{22}, \ldots, a_{nn} \]

From Theorem 2.1.2, $\det(A) = a_{11} a_{22} \cdots a_{nn}$, and by Theorem 2.3.8 the matrix A is invertible if and only if this determinant is nonzero, which is true if and only if the diagonal entries are all nonzero.

Proof of Theorem 1.7.1(d) We will prove the result for upper triangular matrices and leave the lower triangular case for you. Assume that A is upper triangular and invertible. Since

\[ A^{-1} = \frac{1}{\det(A)}\,\operatorname{adj}(A) \]

we can prove that $A^{-1}$ is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the matrix of cofactors is lower triangular. We can do this by showing that every cofactor $C_{ij}$ with i < j (i.e., above the main diagonal) is zero. Since

\[ C_{ij} = (-1)^{i+j} M_{ij} \]

it suffices to show that each minor $M_{ij}$ with i < j is zero. For this purpose, let $B_{ij}$ be the matrix that results when the ith row and jth column of A are deleted, so

\[ M_{ij} = \det(B_{ij}) \tag{10} \]

From the assumption that i < j , it follows that Bij is upper triangular (see Figure 1.7.1). Since A is upper triangular, its (i + 1)-st row begins with at least i zeros. But the i th row of Bij is the (i + 1)-st row of A with the entry in the j th column removed. Since i < j , none of the first i zeros is removed by deleting the j th column; thus the i th row of Bij starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now follows from Theorem 2.1.2 that det(Bij ) = 0 and from (10) that Mij = 0.
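The argument can be checked on a concrete case. The sketch below is an illustration, not part of the proof: the function names are our own and the matrix `U` is a hypothetical example. It builds the inverse from the cofactors exactly as in the formula $A^{-1} = \operatorname{adj}(A)/\det(A)$ and confirms that every entry below the main diagonal of the inverse is zero.

```python
from fractions import Fraction


def minor_matrix(m, i, j):
    """Delete row i and column j (0-based) of m."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor_matrix(m, 0, j))
               for j in range(len(m)))


def adjoint_inverse(m):
    """Inverse via A^{-1} = adj(A) / det(A), in exact rational arithmetic."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible")
    # Cofactor C_ij = (-1)^(i+j) * M_ij
    cof = [[(-1) ** (i + j) * det(minor_matrix(m, i, j)) for j in range(n)]
           for i in range(n)]
    # adj(A) is the transpose of the matrix of cofactors
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


# Hypothetical upper triangular matrix with nonzero diagonal entries
U = [[2, 1, 3],
     [0, 4, 5],
     [0, 0, 1]]
U_inv = adjoint_inverse(U)
below_diagonal = [U_inv[i][j] for i in range(3) for j in range(i)]
print(U_inv)
print(below_diagonal)
```

For this matrix the cofactors $C_{12}$, $C_{13}$, and $C_{23}$ all vanish, so the matrix of cofactors is lower triangular and its transpose, adj(U), is upper triangular, as the proof predicts.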


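Exercise Set 2.3

In Exercises 1–4, verify that $\det(kA) = k^n \det(A)$.

1. $A = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix}$; k = 2

2. $A = \begin{bmatrix} 2 & 2 \\ 5 & -2 \end{bmatrix}$; k = −4

3. $A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}$; k = −2

4. $A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}$; k = 3

In Exercises 5–6, verify that det(AB) = det(BA) and determine whether the equality det(A + B) = det(A) + det(B) holds.

5. $A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1 & 3 \\ 7 & 1 & 2 \\ 5 & 0 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} -1 & 8 & 2 \\ 1 & 0 & -1 \\ -2 & 2 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & -1 & -4 \\ 1 & 1 & 3 \\ 0 & 3 & -1 \end{bmatrix}$

In Exercises 7–14, use determinants to decide whether the given matrix is invertible.

7. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

8. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

9. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

10. $A = \begin{bmatrix} -3 & 0 & 1 \\ 5 & 0 & 6 \\ 8 & 0 & 3 \end{bmatrix}$

11. $A = \begin{bmatrix} 4 & 2 & 8 \\ -2 & 1 & -4 \\ 3 & 1 & 6 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 0 & -1 \\ 9 & -1 & 4 \\ 8 & 9 & -1 \end{bmatrix}$

13. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

14. $A = \begin{bmatrix} \sqrt{2} & -\sqrt{7} & 0 \\ 3\sqrt{2} & -3\sqrt{7} & 0 \\ 5 & -9 & 0 \end{bmatrix}$

In Exercises 15–18, find the values of k for which the matrix A is invertible.

15. $A = \begin{bmatrix} k-3 & -2 \\ -2 & k-2 \end{bmatrix}$

16. $A = \begin{bmatrix} k & 2 \\ 2 & k \end{bmatrix}$

17. $A = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 1 & 6 \\ k & 3 & 2 \end{bmatrix}$

18. $A = \begin{bmatrix} 1 & 2 & 0 \\ k & 1 & k \\ 0 & 2 & 1 \end{bmatrix}$

In Exercises 19–23, decide whether the matrix is invertible, and if so, use the adjoint method to find its inverse.

19. $A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}$

21. $A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}$

22. $A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}$

23. $A = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 2 & 5 & 2 & 2 \\ 1 & 3 & 8 & 9 \\ 1 & 3 & 2 & 2 \end{bmatrix}$

In Exercises 24–29, solve by Cramer’s rule, where it applies.

24. 7x1 − 2x2 = 3
    3x1 + x2 = 5

25. 4x + 5y = 2
    11x + y + 2z = 3
    x + 5y + 2z = 1

26. x − 4y + z = 6
    4x − y + 2z = −1
    2x + 2y − 3z = −20

27. x1 − 3x2 + x3 = 4
    2x1 − x2 = −2
    4x1 − 3x3 = 0

28. −x1 − 4x2 + 2x3 + x4 = −32
    2x1 − x2 + 7x3 + 9x4 = 14
    −x1 + x2 + 3x3 + x4 = 11
    x1 − 2x2 + x3 − 4x4 = −4

29. 3x1 − x2 + x3 = 4
    −x1 + 7x2 − 2x3 = 1
    2x1 + 6x2 − x3 = 5

30. Show that the matrix

\[ A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

is invertible for all values of θ; then find $A^{-1}$ using Theorem 2.3.6.

31. Use Cramer’s rule to solve for y without solving for the unknowns x, z, and w.

    4x + y + z + w = 6
    3x + 7y − z + w = 1
    7x + 3y − 5z + 8w = −3
    x + y + z + 2w = 0

32. Let Ax = b be the system in Exercise 31.
(a) Solve by Cramer’s rule.
(b) Solve by Gauss–Jordan elimination.
(c) Which method involves fewer computations?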



33. Let

\[ A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \]

Assuming that det(A) = −7, find
(a) det(3A)
(b) $\det(A^{-1})$
(c) $\det(2A^{-1})$
(d) $\det((2A)^{-1})$
(e) $\det \begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix}$

34. In each part, find the determinant given that A is a 4 × 4 matrix for which det(A) = −2.
(a) det(−A)
(b) $\det(A^{-1})$
(c) $\det(2A^T)$
(d) $\det(A^3)$

35. In each part, find the determinant given that A is a 3 × 3 matrix for which det(A) = 7.
(a) det(3A)
(b) $\det(A^{-1})$
(c) $\det(2A^{-1})$
(d) $\det((2A)^{-1})$

Working with Proofs

36. Prove that a square matrix A is invertible if and only if $A^TA$ is invertible.

37. Prove that if A is a square matrix, then $\det(A^TA) = \det(AA^T)$.

38. Let Ax = b be a system of n linear equations in n unknowns with integer coefficients and integer constants. Prove that if det(A) = 1, the solution x has integer entries.

39. Prove that if det(A) = 1 and all the entries in A are integers, then all the entries in $A^{-1}$ are integers.

True-False Exercises

TF. In parts (a)–(l) determine whether the statement is true or false, and justify your answer.
(a) If A is a 3 × 3 matrix, then det(2A) = 2 det(A).
(b) If A and B are square matrices of the same size such that det(A) = det(B), then det(A + B) = 2 det(A).
(c) If A and B are square matrices of the same size and A is invertible, then $\det(A^{-1}BA) = \det(B)$.
(d) A square matrix A is invertible if and only if det(A) = 0.
(e) The matrix of cofactors of A is precisely $[\operatorname{adj}(A)]^T$.
(f) For every n × n matrix A, we have $A \cdot \operatorname{adj}(A) = (\det(A))I_n$.
(g) If A is a square matrix and the linear system Ax = 0 has multiple solutions for x, then det(A) = 0.
(h) If A is an n × n matrix and there exists an n × 1 matrix b such that the linear system Ax = b has no solutions, then the reduced row echelon form of A cannot be $I_n$.
(i) If E is an elementary matrix, then Ex = 0 has only the trivial solution.
(j) If A is an invertible matrix, then the linear system Ax = 0 has only the trivial solution if and only if the linear system $A^{-1}$x = 0 has only the trivial solution.
(k) If A is invertible, then adj(A) must also be invertible.
(l) If A has a row of zeros, then so does adj(A).

Working with Technology

T1. Consider the matrix

\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 + \epsilon \end{bmatrix} \]

in which ε > 0. Since det(A) = ε ≠ 0, it follows from Theorem 2.3.8 that A is invertible. Compute det(A) for various small nonzero values of ε until you find a value that produces det(A) = 0, thereby leading you to conclude erroneously that A is not invertible. Discuss the cause of this.

T2. We know from Exercise 37 that if A is a square matrix then $\det(A^TA) = \det(AA^T)$. By experimenting, make a conjecture as to whether this is true if A is not square.

T3. The French mathematician Jacques Hadamard (1865–1963) proved that if A is an n × n matrix each of whose entries satisfies the condition $|a_{ij}| \le M$, then

\[ |\det(A)| \le \sqrt{n^n}\, M^n \]

(Hadamard’s inequality). For the following matrix A, use this result to find an interval of possible values for det(A), and then use your technology utility to show that the value of det(A) falls within this interval.



\[
A = \begin{bmatrix}
0.3 & -2.4 & -1.7 & 2.5 \\
0.2 & -0.3 & -1.2 & 1.4 \\
2.5 & 2.3 & 0.0 & 1.8 \\
1.7 & 1.0 & -2.1 & 2.3
\end{bmatrix}
\]

Chapter 2 Supplementary Exercises

In Exercises 1–8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using elementary row operations to introduce zeros into the matrix.

1. $\begin{bmatrix} -4 & 2 \\ 3 & 3 \end{bmatrix}$

2. $\begin{bmatrix} 7 & -1 \\ -2 & -6 \end{bmatrix}$

3. $\begin{bmatrix} -1 & 5 & 2 \\ 0 & 2 & -1 \\ -3 & 1 & 1 \end{bmatrix}$

4. $\begin{bmatrix} -1 & -2 & -3 \\ -4 & -5 & -6 \\ -7 & -8 & -9 \end{bmatrix}$

5. $\begin{bmatrix} 3 & 0 & -1 \\ 1 & 1 & 1 \\ 0 & 4 & 2 \end{bmatrix}$

6. $\begin{bmatrix} -5 & 1 & 4 \\ 3 & 0 & 2 \\ 1 & -2 & 2 \end{bmatrix}$

7. $\begin{bmatrix} 3 & 6 & 0 & 1 \\ -2 & 3 & 1 & 4 \\ 1 & 0 & -1 & 1 \\ -9 & 2 & -2 & 2 \end{bmatrix}$

8. $\begin{bmatrix} -1 & -2 & -3 & -4 \\ 4 & 3 & 2 & 1 \\ 1 & 2 & 3 & 4 \\ -4 & -3 & -2 & -1 \end{bmatrix}$

9. Evaluate the determinants in Exercises 3–6 by using the arrow technique (see Example 7 in Section 2.1).

10. (a) Construct a 4 × 4 matrix whose determinant is easy to compute using cofactor expansion but hard to evaluate using elementary row operations.
(b) Construct a 4 × 4 matrix whose determinant is easy to compute using elementary row operations but hard to evaluate using cofactor expansion.

11. Use the determinant to decide whether the matrices in Exercises 1–4 are invertible.

12. Use the determinant to decide whether the matrices in Exercises 5–8 are invertible.

In Exercises 13–15, find the given determinant by any method.

13. $\begin{vmatrix} 5 & b-3 \\ b-2 & -3 \end{vmatrix}$

14. $\begin{vmatrix} 3 & -4 & a \\ a^2 & 1 & 2 \\ 2 & a-1 & 4 \end{vmatrix}$

15. $\begin{vmatrix} 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 5 & 0 & 0 & 0 & 0 \end{vmatrix}$

16. Solve for x.

\[ \begin{vmatrix} x & -1 \\ 3 & 1-x \end{vmatrix} = \begin{vmatrix} 1 & 0 & -3 \\ 2 & x & -6 \\ 1 & 3 & x-5 \end{vmatrix} \]

In Exercises 17–24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it exists.

17. The matrix in Exercise 1.
18. The matrix in Exercise 2.
19. The matrix in Exercise 3.
20. The matrix in Exercise 4.
21. The matrix in Exercise 5.
22. The matrix in Exercise 6.
23. The matrix in Exercise 7.
24. The matrix in Exercise 8.

25. Use Cramer’s rule to solve for x′ and y′ in terms of x and y.

\[ x = \tfrac{3}{5}x' - \tfrac{4}{5}y', \qquad y = \tfrac{4}{5}x' + \tfrac{3}{5}y' \]

26. Use Cramer’s rule to solve for x′ and y′ in terms of x and y.

\[ x = x'\cos\theta - y'\sin\theta, \qquad y = x'\sin\theta + y'\cos\theta \]

27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial solution if and only if α = β.

    x + y + αz = 0
    x + y + βz = 0
    αx + βy + z = 0

28. Let A be a 3 × 3 matrix, each of whose entries is 1 or 0. What is the largest possible value for det(A)?

29. (a) For the triangle in the accompanying figure, use trigonometry to show that

    b cos γ + c cos β = a
    c cos α + a cos γ = b
    a cos β + b cos α = c

and then apply Cramer’s rule to show that

\[ \cos\alpha = \frac{b^2 + c^2 - a^2}{2bc} \]

(b) Use Cramer’s rule to obtain similar formulas for cos β and cos γ.

[Figure Ex-29: a triangle with sides a, b, c and angles α, β, γ]

30. Use determinants to show that for all real values of λ, the only solution of

    x − 2y = λx
    x − y = λy

is x = 0, y = 0.

31. Prove: If A is invertible, then adj(A) is invertible and

\[ [\operatorname{adj}(A)]^{-1} = \frac{1}{\det(A)}\,A = \operatorname{adj}(A^{-1}) \]

32. Prove: If A is an n × n matrix, then

\[ \det[\operatorname{adj}(A)] = [\det(A)]^{n-1} \]

33. Prove: If the entries in each row of an n × n matrix A add up to zero, then the determinant of A is zero. [Hint: Consider the product Ax, where x is the n × 1 matrix, each of whose entries is one.]

34. (a) In the accompanying figure, the area of the triangle ABC can be expressed as

    area ABC = area ADEC + area CEFB − area ADFB

Use this and the fact that the area of a trapezoid equals 1/2 the altitude times the sum of the parallel sides to show that

\[ \text{area } ABC = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \]

[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced counterclockwise proceeding from (x1, y1) to (x2, y2) to (x3, y3). For a clockwise orientation, the determinant above yields the negative of the area.]
(b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (−2, −1).

[Figure Ex-34: triangle with vertices A(x1, y1), B(x2, y2), C(x3, y3) and the points D, E, F used to form the trapezoids]

35. Use the fact that 21375, 38798, 34162, 40223, 79154 are all divisible by 19 to show that

\[ \begin{vmatrix} 2 & 1 & 3 & 7 & 5 \\ 3 & 8 & 7 & 9 & 8 \\ 3 & 4 & 1 & 6 & 2 \\ 4 & 0 & 2 & 2 & 3 \\ 7 & 9 & 1 & 5 & 4 \end{vmatrix} \]

is divisible by 19 without directly evaluating the determinant.

36. Without directly evaluating the determinant, show that

\[ \begin{vmatrix} \sin\alpha & \cos\alpha & \sin(\alpha+\delta) \\ \sin\beta & \cos\beta & \sin(\beta+\delta) \\ \sin\gamma & \cos\gamma & \sin(\gamma+\delta) \end{vmatrix} = 0 \]

CHAPTER 3

Euclidean Vector Spaces

CHAPTER CONTENTS

3.1 Vectors in 2-Space, 3-Space, and n-Space
3.2 Norm, Dot Product, and Distance in R^n
3.3 Orthogonality
3.4 The Geometry of Linear Systems
3.5 Cross Product

INTRODUCTION

Engineers and physicists distinguish between two types of physical quantities: scalars, which are quantities that can be described by a numerical value alone, and vectors, which are quantities that require both a number and a direction for their complete physical description. For example, temperature, length, and speed are scalars because they can be fully described by a number that tells “how much” (a temperature of 20°C, a length of 5 cm, or a speed of 75 km/h). In contrast, velocity and force are vectors because they require a number that tells “how much” and a direction that tells “which way” (say, a boat moving at 10 knots in a direction 45° northeast, or a force of 100 lb acting vertically). Although the notions of vectors and scalars that we will study in this text have their origins in physics and engineering, we will be more concerned with using them to build mathematical structures and then applying those structures to such diverse fields as genetics, computer science, economics, telecommunications, and environmental science.

3.1 Vectors in 2-Space, 3-Space, and n-Space

Linear algebra is primarily concerned with two types of mathematical objects, “matrices” and “vectors.” In Chapter 1 we discussed the basic properties of matrices, we introduced the idea of viewing n-tuples of real numbers as vectors, and we denoted the set of all such n-tuples as R^n. In this section we will review the basic properties of vectors in two and three dimensions with the goal of extending these properties to vectors in R^n.

Geometric Vectors

Engineers and physicists represent vectors in two dimensions (also called 2-space) or in three dimensions (also called 3-space) by arrows. The direction of the arrowhead specifies the direction of the vector and the length of the arrow specifies the magnitude. Mathematicians call these geometric vectors. The tail of the arrow is called the initial point of the vector and the tip the terminal point (Figure 3.1.1).

[Figure 3.1.1: an arrow with its initial point and terminal point labeled]

In this text we will denote vectors in boldface type such as a, b, v, w, and x, and we will denote scalars in lowercase italic type such as a, k, v, w, and x. When we want to indicate that a vector v has initial point A and terminal point B, then, as shown in Figure 3.1.2, we will write

\[ \mathbf{v} = \overrightarrow{AB} \]

Vectors with the same length and direction, such as those in Figure 3.1.3, are said to be equivalent. Since we want a vector to be determined solely by its length and direction, equivalent vectors are regarded as the same vector even though they may be in different positions. Equivalent vectors are also said to be equal, which we indicate by writing

\[ \mathbf{v} = \mathbf{w} \]

The vector whose initial and terminal points coincide has length zero, so we call this the zero vector and denote it by 0. The zero vector has no natural direction, so we will agree that it can be assigned any direction that is convenient for the problem at hand.

[Figure 3.1.2: the vector v = AB from initial point A to terminal point B]

[Figure 3.1.3: Equivalent vectors]

Vector Addition

There are a number of important algebraic operations on vectors, all of which have their origin in laws of physics.

Parallelogram Rule for Vector Addition If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the two vectors form adjacent sides of a parallelogram, and the sum v + w is the vector represented by the arrow from the common initial point of v and w to the opposite vertex of the parallelogram (Figure 3.1.4a).

Here is another way to form the sum of two vectors.

Triangle Rule for Vector Addition If v and w are vectors in 2-space or 3-space that are positioned so the initial point of w is at the terminal point of v, then the sum v + w is represented by the arrow from the initial point of v to the terminal point of w (Figure 3.1.4b).

In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. This construction makes it evident that

\[ \mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v} \tag{1} \]

and that the sum obtained by the triangle rule is the same as the sum obtained by the parallelogram rule.

[Figure 3.1.4: (a) the parallelogram rule; (b) the triangle rule; (c) v + w and w + v constructed by the triangle rule]

Vector addition can also be viewed as a process of translating points.

Vector Addition Viewed as Translation If v, w, and v + w are positioned so their initial points coincide, then the terminal point of v + w can be viewed in two ways:

1. The terminal point of v + w is the point that results when the terminal point of v is translated in the direction of w by a distance equal to the length of w (Figure 3.1.5a).
2. The terminal point of v + w is the point that results when the terminal point of w is translated in the direction of v by a distance equal to the length of v (Figure 3.1.5b).

Accordingly, we say that v + w is the translation of v by w or, alternatively, the translation of w by v.

[Figure 3.1.5: (a) the translation of v by w; (b) the translation of w by v]

Vector Subtraction

In ordinary arithmetic we can write a − b = a + (−b), which expresses subtraction in terms of addition. There is an analogous idea in vector arithmetic.

Vector Subtraction The negative of a vector v, denoted by −v, is the vector that has the same length as v but is oppositely directed (Figure 3.1.6a), and the difference of v from w, denoted by w − v, is taken to be the sum

\[ \mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) \tag{2} \]

The difference of v from w can be obtained geometrically by the parallelogram method shown in Figure 3.1.6b, or more directly by positioning w and v so their initial points coincide and drawing the vector from the terminal point of v to the terminal point of w (Figure 3.1.6c).

[Figure 3.1.6: (a) v and its negative −v; (b) w − v by the parallelogram method; (c) w − v drawn from the terminal point of v to the terminal point of w]

Scalar Multiplication

Sometimes there is a need to change the length of a vector or change its length and reverse its direction. This is accomplished by a type of multiplication in which vectors are multiplied by scalars. As an example, the product 2v denotes the vector that has the same direction as v but twice the length, and the product −2v denotes the vector that is oppositely directed to v and has twice the length. Here is the general result.

Scalar Multiplication If v is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar product of v by k to be the vector whose length is |k| times the length of v and whose direction is the same as that of v if k is positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we define kv to be 0.

Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar multiples. In particular, observe that (−1)v has the same length as v but is oppositely directed; therefore,

\[ (-1)\mathbf{v} = -\mathbf{v} \tag{3} \]

[Figure 3.1.7: a vector v and its scalar multiples (1/2)v, (−1)v, 2v, and (−3)v]

Parallel and Collinear Vectors

Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the same thing when applied to vectors. Although the vector 0 has no clearly defined direction, we will regard it as parallel to all vectors when convenient.

[Figure 3.1.8: (a) v and kv on a common line; (b) v and a translated copy of kv]

Sums of Three or More Vectors

Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, v, and w, it does not matter which two we add first; that is,

\[ \mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w} \]

It follows from this that there is no ambiguity in the expression u + v + w because the same result is obtained no matter how the vectors are grouped. A simple way to construct u + v + w is to place the vectors “tip to tail” in succession and then draw the vector from the initial point of u to the terminal point of w (Figure 3.1.9a). The tip-to-tail method also works for four or more vectors (Figure 3.1.9b). The tip-to-tail method makes it evident that if u, v, and w are vectors in 3-space with a common initial point, then u + v + w is the diagonal of the parallelepiped that has the three vectors as adjacent sides (Figure 3.1.9c).

[Figure 3.1.9: (a) u + v + w by the tip-to-tail method; (b) the tip-to-tail method for the four vectors u, v, w, x; (c) u + v + w as the diagonal of a parallelepiped]

Vectors in Coordinate Systems

Up until now we have discussed vectors without reference to a coordinate system. However, as we will soon see, computations with vectors are much simpler to perform if a coordinate system is present to work with. If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We call these coordinates the components of v relative to the coordinate system. We will write v = (v1, v2) to denote a vector v in 2-space with components (v1, v2), and v = (v1, v2, v3) to denote a vector v in 3-space with components (v1, v2, v3).

The component forms of the zero vector are 0 = (0, 0) in 2-space and 0 = (0, 0, 0) in 3-space.

[Figure 3.1.10: a vector v with initial point at the origin and terminal point (v1, v2) in 2-space and (v1, v2, v3) in 3-space]

It should be evident geometrically that two vectors in 2-space or 3-space are equivalent if and only if they have the same terminal point when their initial points are at the origin. Algebraically, this means that two vectors are equivalent if and only if their corresponding components are equal. Thus, for example, the vectors

\[ \mathbf{v} = (v_1, v_2, v_3) \quad \text{and} \quad \mathbf{w} = (w_1, w_2, w_3) \]

in 3-space are equivalent if and only if

\[ v_1 = w_1, \quad v_2 = w_2, \quad v_3 = w_3 \]

Remark It may have occurred to you that an ordered pair (v1, v2) can represent either a vector with components v1 and v2 or a point with coordinates v1 and v2 (and similarly for ordered triples). Both are valid geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to emphasize (Figure 3.1.11).

[Figure 3.1.11: The ordered pair (v1, v2) can represent a point or a vector.]

Vectors Whose Initial Point Is Not at the Origin

It is sometimes necessary to consider vectors whose initial points are not at the origin. If $\overrightarrow{P_1P_2}$ denotes the vector with initial point P1(x1, y1) and terminal point P2(x2, y2), then the components of this vector are given by the formula

\[ \overrightarrow{P_1P_2} = (x_2 - x_1,\; y_2 - y_1) \tag{4} \]

That is, the components of $\overrightarrow{P_1P_2}$ are obtained by subtracting the coordinates of the initial point from the coordinates of the terminal point. For example, in Figure 3.1.12 the vector $\overrightarrow{P_1P_2}$ is the difference of vectors $\overrightarrow{OP_2}$ and $\overrightarrow{OP_1}$, so

\[ \overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1} = (x_2, y_2) - (x_1, y_1) = (x_2 - x_1,\; y_2 - y_1) \]

[Figure 3.1.12: v = P1P2 = OP2 − OP1]

As you might expect, the components of a vector in 3-space that has initial point P1(x1, y1, z1) and terminal point P2(x2, y2, z2) are given by

\[ \overrightarrow{P_1P_2} = (x_2 - x_1,\; y_2 - y_1,\; z_2 - z_1) \tag{5} \]
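Formulas (4) and (5) follow the same pattern in any number of coordinates: subtract the initial point from the terminal point, coordinate by coordinate. A minimal Python sketch follows; the function name `components` is our own choice for illustration, and points are assumed to be given as tuples of numbers.

```python
def components(p1, p2):
    """Components of the vector with initial point p1 and terminal point p2."""
    if len(p1) != len(p2):
        raise ValueError("points must have the same number of coordinates")
    # Terminal coordinate minus initial coordinate, position by position
    return tuple(b - a for a, b in zip(p1, p2))


print(components((2, -1, 4), (7, 5, -8)))  # (5, 6, -12)
```

The sample call uses the points P1(2, −1, 4) and P2(7, 5, −8).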

EXAMPLE 1 Finding the Components of a Vector

The components of the vector $\mathbf{v} = \overrightarrow{P_1P_2}$ with initial point P1(2, −1, 4) and terminal point P2(7, 5, −8) are

\[ \mathbf{v} = (7 - 2,\; 5 - (-1),\; (-8) - 4) = (5, 6, -12) \]

n-Space

The idea of using ordered pairs and triples of real numbers to represent points in two-dimensional space and three-dimensional space was well known in the eighteenth and nineteenth centuries. By the dawn of the twentieth century, mathematicians and physicists were exploring the use of “higher dimensional” spaces in mathematics and physics. Today, even the layman is familiar with the notion of time as a fourth dimension, an idea used by Albert Einstein in developing the general theory of relativity. Today, physicists working in the field of “string theory” commonly use 11-dimensional space in their quest for a unified theory that will explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned with extending the notion of space to n dimensions.

To explore these ideas further, we start with some terminology and notation. The set of all real numbers can be viewed geometrically as a line. It is called the real line and is denoted by R or R^1. The superscript reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called 2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by R^2 and R^3, respectively. The superscript reinforces the idea that the ordered pairs correspond to points in the plane (two-dimensional) and ordered triples to points in space (three-dimensional). The following definition extends this idea.

DEFINITION 1 If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (v1, v2, ..., vn). The set of all ordered n-tuples is called n-space and is denoted by R^n.

Remark You can think of the numbers in an n-tuple (v1, v2, ..., vn) as either the coordinates of a generalized point or the components of a generalized vector, depending on the geometric image you want to bring to mind; the choice makes no difference mathematically, since it is the algebraic properties of n-tuples that are of concern.

Here are some typical applications that lead to n-tuples.

• Experimental Data: A scientist performs an experiment and makes n numerical measurements each time the experiment is performed. The result of each experiment can be regarded as a vector y = (y1, y2, ..., yn) in R^n in which y1, y2, ..., yn are the measured values.
• Storage and Warehousing: A national trucking company has 15 depots for storing and servicing its trucks. At each point in time the distribution of trucks in the service depots can be described by a 15-tuple x = (x1, x2, ..., x15) in which x1 is the number of trucks in the first depot, x2 is the number in the second depot, and so forth.
• Electrical Circuits: A certain kind of processing chip is designed to receive four input voltages and produce three output voltages in response. The input voltages can be regarded as vectors in R^4 and the output voltages as vectors in R^3. Thus, the chip can be viewed as a device that transforms an input vector v = (v1, v2, v3, v4) in R^4 into an output vector w = (w1, w2, w3) in R^3.
• Graphical Images: One way in which color images are created on computer screens is by assigning each pixel (an addressable point on the screen) three numbers that describe the hue, saturation, and brightness of the pixel. Thus, a complete color image can be viewed as a set of 5-tuples of the form v = (x, y, h, s, b) in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.
• Economics: One approach to economic analysis is to divide an economy into sectors (manufacturing, services, utilities, and so forth) and measure the output of each sector by a dollar value. Thus, in an economy with 10 sectors the economic output of the entire economy can be represented by a 10-tuple s = (s1, s2, ..., s10) in which the numbers s1, s2, ..., s10 are the outputs of the individual sectors.
• Mechanical Systems: Suppose that six particles move along the same coordinate line so that at time t their coordinates are x1, x2, ..., x6 and their velocities are v1, v2, ..., v6, respectively. This information can be represented by the vector v = (x1, x2, x3, x4, x5, x6, v1, v2, v3, v4, v5, v6, t) in R^13. This vector is called the state of the particle system at time t.

Albert Einstein (1879–1955)

Historical Note The German-born physicist Albert Einstein immigrated to the United States in 1935, where he settled at Princeton University. Einstein spent the last three decades of his life working unsuccessfully at producing a unified field theory that would establish an underlying link between the forces of gravity and electromagnetism. Recently, physicists have made progress on the problem using a framework known as string theory. In this theory the smallest, indivisible components of the Universe are not particles but loops that behave like vibrating strings. Whereas Einstein’s space-time universe was four-dimensional, strings reside in an 11-dimensional world that is the focus of current research. [Image: © Bettmann/CORBIS]

Operations on Vectors in R^n

Our next goal is to define useful operations on vectors in R^n. These operations will all be natural extensions of the familiar operations on vectors in R^2 and R^3. We will denote a vector v in R^n using the notation

\[ \mathbf{v} = (v_1, v_2, \ldots, v_n) \]

and we will call 0 = (0, 0, ..., 0) the zero vector.

We noted earlier that in R^2 and R^3 two vectors are equivalent (equal) if and only if their corresponding components are the same. Thus, we make the following definition.

DEFINITION 2 Vectors v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) in R^n are said to be equivalent (also called equal) if

\[ v_1 = w_1, \quad v_2 = w_2, \quad \ldots, \quad v_n = w_n \]

We indicate this by writing v = w.

EXAMPLE 2 Equality of Vectors

(a, b, c, d) = (1, −4, 2, 7) if and only if a = 1, b = −4, c = 2, and d = 7.

Our next objective is to define the operations of addition, subtraction, and scalar multiplication for vectors in R^n. To motivate these ideas, we will consider how these operations can be performed on vectors in R^2 using components. By studying Figure 3.1.13 you should be able to deduce that if v = (v1, v2) and w = (w1, w2), then

\[ \mathbf{v} + \mathbf{w} = (v_1 + w_1,\; v_2 + w_2) \tag{6} \]

\[ k\mathbf{v} = (kv_1,\; kv_2) \tag{7} \]

In particular, it follows from (7) that

\[ -\mathbf{v} = (-1)\mathbf{v} = (-v_1,\; -v_2) \tag{8} \]

and hence that

\[ \mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1,\; w_2 - v_2) \tag{9} \]

[Figure 3.1.13: componentwise addition of v and w, with sum (v1 + w1, v2 + w2), and the scalar multiple kv = (kv1, kv2)]

Motivated by Formulas (6)–(9), we make the following definition.

DEFINITION 3 If v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) are vectors in R^n, and if k is any scalar, then we define

\[ \mathbf{v} + \mathbf{w} = (v_1 + w_1,\; v_2 + w_2,\; \ldots,\; v_n + w_n) \tag{10} \]

\[ k\mathbf{v} = (kv_1,\; kv_2,\; \ldots,\; kv_n) \tag{11} \]

\[ -\mathbf{v} = (-v_1,\; -v_2,\; \ldots,\; -v_n) \tag{12} \]

\[ \mathbf{w} - \mathbf{v} = \mathbf{w} + (-\mathbf{v}) = (w_1 - v_1,\; w_2 - v_2,\; \ldots,\; w_n - v_n) \tag{13} \]

In words, vectors are added (or subtracted) by adding (or subtracting) their corresponding components, and a vector is multiplied by a scalar by multiplying each component by that scalar.

EXAMPLE 3 Algebraic Operations Using Components

If v = (1, −3, 2) and w = (4, 2, 1), then

\[ \mathbf{v} + \mathbf{w} = (5, -1, 3), \qquad 2\mathbf{v} = (2, -6, 4) \]

\[ -\mathbf{w} = (-4, -2, -1), \qquad \mathbf{v} - \mathbf{w} = \mathbf{v} + (-\mathbf{w}) = (-3, -5, 1) \]
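The componentwise operations of Definition 3 can be sketched in a few lines of Python. This is an illustration only: vectors are represented as tuples, and the function names `add`, `scale`, `negate`, and `subtract` are our own. The sample vectors are those of Example 3.

```python
def add(v, w):
    """v + w, Formula (10): add corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """kv, Formula (11): multiply each component by the scalar k."""
    return tuple(k * a for a in v)


def negate(v):
    """-v, Formula (12)."""
    return tuple(-a for a in v)


def subtract(w, v):
    """w - v, Formula (13), defined as w + (-v)."""
    return add(w, negate(v))


v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))       # (5, -1, 3)
print(scale(2, v))     # (2, -6, 4)
print(negate(w))       # (-4, -2, -1)
print(subtract(v, w))  # (-3, -5, 1)
```

Note that `subtract` is written in terms of `add` and `negate`, mirroring the way Formula (13) defines subtraction through addition.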

The following theorem summarizes the most important properties of vector operations.

THEOREM 3.1.1 If u, v, and w are vectors in R^n, and if k and m are scalars, then:

(a) u + v = v + u
(b) (u + v) + w = u + (v + w)
(c) u + 0 = 0 + u = u
(d) u + (−u) = 0
(e) k(u + v) = ku + kv
(f) (k + m)u = ku + mu
(g) k(mu) = (km)u
(h) 1u = u

We will prove part (b) and leave some of the other proofs as exercises.

Proof (b) Let u = (u1, u2, ..., un), v = (v1, v2, ..., vn), and w = (w1, w2, ..., wn). Then

\[
\begin{aligned}
(\mathbf{u} + \mathbf{v}) + \mathbf{w} &= \big((u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n)\big) + (w_1, w_2, \ldots, w_n) \\
&= (u_1 + v_1,\; u_2 + v_2,\; \ldots,\; u_n + v_n) + (w_1, w_2, \ldots, w_n) && \text{[Vector addition]} \\
&= \big((u_1 + v_1) + w_1,\; (u_2 + v_2) + w_2,\; \ldots,\; (u_n + v_n) + w_n\big) && \text{[Vector addition]} \\
&= \big(u_1 + (v_1 + w_1),\; u_2 + (v_2 + w_2),\; \ldots,\; u_n + (v_n + w_n)\big) && \text{[Regroup]} \\
&= (u_1, u_2, \ldots, u_n) + (v_1 + w_1,\; v_2 + w_2,\; \ldots,\; v_n + w_n) && \text{[Vector addition]} \\
&= \mathbf{u} + (\mathbf{v} + \mathbf{w})
\end{aligned}
\]
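A numerical spot check is no substitute for a proof, but it can help make the eight properties of Theorem 3.1.1 concrete. The following Python sketch is an illustration under our own assumptions: vectors are tuples of integers in R^5, chosen at random with a fixed seed, the scalars k and m are arbitrary sample values, and the helper names are hypothetical. Integer entries are used so that every comparison is exact.

```python
import random


def add(u, v):
    """Componentwise sum of two vectors of equal length."""
    return tuple(a + b for a, b in zip(u, v))


def scale(k, u):
    """Scalar multiple ku."""
    return tuple(k * a for a in u)


random.seed(0)
n = 5


def random_vector():
    return tuple(random.randint(-9, 9) for _ in range(n))


u, v, w = random_vector(), random_vector(), random_vector()
k, m = 3, -2
zero = (0,) * n
neg_u = tuple(-a for a in u)

checks = {
    "(a)": add(u, v) == add(v, u),
    "(b)": add(add(u, v), w) == add(u, add(v, w)),
    "(c)": add(u, zero) == u and add(zero, u) == u,
    "(d)": add(u, neg_u) == zero,
    "(e)": scale(k, add(u, v)) == add(scale(k, u), scale(k, v)),
    "(f)": scale(k + m, u) == add(scale(k, u), scale(m, u)),
    "(g)": scale(k, scale(m, u)) == scale(k * m, u),
    "(h)": scale(1, u) == u,
}
print(checks)
```

Each entry of `checks` reduces to the corresponding property of real-number arithmetic applied component by component, which is exactly how the proof of part (b) proceeds.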

The following additional properties of vectors in R^n can be deduced easily by expressing the vectors in terms of components (verify).

THEOREM 3.1.2 If v is a vector in R^n and k is a scalar, then:

(a) 0v = 0
(b) k0 = 0
(c) (−1)v = −v

Calculating Without Components

One of the powerful consequences of Theorems 3.1.1 and 3.1.2 is that they allow calculations to be performed without expressing the vectors in terms of components. For example, suppose that x, a, and b are vectors in R n , and we want to solve the vector equation x + a = b for the vector x without using components. We could proceed as follows: x+a=b [ Given ] (x + a) + (−a) = b + (−a) [ Add the negative of a to both sides ] x + (a + (−a)) = b − a

[ Part (b) of Theorem 3.1.1 ]

x+0=b−a

[ Part (d ) of Theorem 3.1.1 ]

x=b−a

[ Part (c) of Theorem 3.1.1 ]

While this method is obviously more cumbersome than computing with components in R n , it will become important later in the text where we will encounter more general kinds of vectors. Linear Combinations

Addition, subtraction, and scalar multiplication are frequently used in combination to form new vectors. For example, if v1 , v2 , and v3 are vectors in R n , then the vectors u = 2v1 + 3v2 + v3 and w = 7v1 − 6v2 + 8v3 are formed in this way. In general, we make the following definition.

DEFINITION 4 If $\mathbf{w}$ is a vector in $R^n$, then $\mathbf{w}$ is said to be a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_r$ in $R^n$ if it can be expressed in the form

$$\mathbf{w} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r \tag{14}$$

where $k_1, k_2, \dots, k_r$ are scalars. These scalars are called the coefficients of the linear combination. In the case where $r = 1$, Formula (14) becomes $\mathbf{w} = k_1\mathbf{v}_1$, so that a linear combination of a single vector is just a scalar multiple of that vector.

Note that this definition of a linear combination is consistent with that given in the context of matrices (see Definition 6 in Section 1.3).

Alternative Notations for Vectors

Up to now we have been writing vectors in $R^n$ using the notation

$$\mathbf{v} = (v_1, v_2, \dots, v_n) \tag{15}$$

We call this the comma-delimited form. However, since a vector in $R^n$ is just a list of its $n$ components in a specific order, any notation that displays those components in the correct order is a valid way of representing the vector. For example, the vector in (15) can be written as

$$\mathbf{v} = [v_1 \;\; v_2 \;\; \cdots \;\; v_n] \tag{16}$$

which is called row-vector form, or as

$$\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \tag{17}$$

which is called column-vector form. The choice of notation is often a matter of taste or convenience, but sometimes the nature of a problem will suggest a preferred notation. Notations (15), (16), and (17) will all be used at various places in this text.

Application of Linear Combinations to Color Models

Colors on computer monitors are commonly based on what is called the RGB color model. Colors in this system are created by adding together percentages of the primary colors red (R), green (G), and blue (B). One way to do this is to identify the primary colors with the vectors

$$\mathbf{r} = (1, 0, 0) \text{ (pure red)}, \quad \mathbf{g} = (0, 1, 0) \text{ (pure green)}, \quad \mathbf{b} = (0, 0, 1) \text{ (pure blue)}$$

in $R^3$ and to create all other colors by forming linear combinations of $\mathbf{r}$, $\mathbf{g}$, and $\mathbf{b}$ using coefficients between 0 and 1, inclusive; these coefficients represent the percentage of each pure color in the mix. The set of all such color vectors is called RGB space or the RGB color cube (Figure 3.1.14). Thus, each color vector $\mathbf{c}$ in this cube is expressible as a linear combination of the form

$$\mathbf{c} = k_1\mathbf{r} + k_2\mathbf{g} + k_3\mathbf{b} = k_1(1, 0, 0) + k_2(0, 1, 0) + k_3(0, 0, 1) = (k_1, k_2, k_3)$$

where $0 \le k_i \le 1$. As indicated in the figure, the corners of the cube represent the pure primary colors together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal running from black to white correspond to shades of gray.

Figure 3.1.14 The RGB color cube, with corners Black (0, 0, 0), Red (1, 0, 0), Green (0, 1, 0), Blue (0, 0, 1), Yellow (1, 1, 0), Magenta (1, 0, 1), Cyan (0, 1, 1), and White (1, 1, 1).
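As a rough illustration of how a color vector arises as a linear combination of the three primary vectors, the following Python sketch forms $k_1\mathbf{r} + k_2\mathbf{g} + k_3\mathbf{b}$ directly. The function names `linear_combination` and `rgb_color` are hypothetical helpers introduced here, and the range check on the coefficients is an assumption that mirrors the condition $0 \le k_i \le 1$.

```python
def linear_combination(coeffs, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr for vectors given as tuples."""
    if len(coeffs) != len(vectors):
        raise ValueError("need exactly one coefficient per vector")
    n = len(vectors[0])
    return tuple(sum(k * vec[i] for k, vec in zip(coeffs, vectors))
                 for i in range(n))

# Primary color vectors of the RGB model
R = (1, 0, 0)
G = (0, 1, 0)
B = (0, 0, 1)

def rgb_color(k1, k2, k3):
    """Color vector k1*r + k2*g + k3*b with each coefficient in [0, 1]."""
    for k in (k1, k2, k3):
        if not 0 <= k <= 1:
            raise ValueError("coefficients must lie between 0 and 1")
    return linear_combination((k1, k2, k3), (R, G, B))

print(rgb_color(1, 1, 0))        # yellow corner: (1, 1, 0)
print(rgb_color(0.5, 0.5, 0.5))  # a shade of gray: (0.5, 0.5, 0.5)
```

Equal coefficients land on the diagonal from black to white, which is why they produce shades of gray.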

Exercise Set 3.1

In Exercises 1–2, find the components of the vector.

1. (a) [Figure: vector in the xy-plane; labeled points (1, 5) and (4, 1)]
   (b) [Figure: vector in xyz-space; labeled points (0, 0, 4) and (2, 3, 0)]

2. (a) [Figure: vector in the xy-plane; labeled points (−3, 3) and (2, 3)]
   (b) [Figure: vector in xyz-space; labeled points (0, 4, 4) and (3, 0, 4)]

In Exercises 3–4, find the components of the vector $\overrightarrow{P_1P_2}$.

3. (a) $P_1(3, 5)$, $P_2(2, 8)$
   (b) $P_1(5, -2, 1)$, $P_2(2, 4, 2)$

4. (a) $P_1(-6, 2)$, $P_2(-4, -1)$
   (b) $P_1(0, 0, 0)$, $P_2(-1, 6, 1)$

5. (a) Find the terminal point of the vector that is equivalent to $\mathbf{u} = (1, 2)$ and whose initial point is $A(1, 1)$.
   (b) Find the initial point of the vector that is equivalent to $\mathbf{u} = (1, 1, 3)$ and whose terminal point is $B(-1, -1, 2)$.

6. (a) Find the initial point of the vector that is equivalent to $\mathbf{u} = (1, 2)$ and whose terminal point is $B(2, 0)$.
   (b) Find the terminal point of the vector that is equivalent to $\mathbf{u} = (1, 1, 3)$ and whose initial point is $A(0, 2, 0)$.

7. Find an initial point $P$ of a nonzero vector $\mathbf{u} = \overrightarrow{PQ}$ with terminal point $Q(3, 0, -5)$ and such that
   (a) $\mathbf{u}$ has the same direction as $\mathbf{v} = (4, -2, -1)$.
   (b) $\mathbf{u}$ is oppositely directed to $\mathbf{v} = (4, -2, -1)$.

8. Find a terminal point $Q$ of a nonzero vector $\mathbf{u} = \overrightarrow{PQ}$ with initial point $P(-1, 3, -5)$ and such that
   (a) $\mathbf{u}$ has the same direction as $\mathbf{v} = (6, 7, -3)$.
   (b) $\mathbf{u}$ is oppositely directed to $\mathbf{v} = (6, 7, -3)$.

9. Let $\mathbf{u} = (4, -1)$, $\mathbf{v} = (0, 5)$, and $\mathbf{w} = (-3, -3)$. Find the components of
   (a) $\mathbf{u} + \mathbf{w}$
   (b) $\mathbf{v} - 3\mathbf{u}$
   (c) $2(\mathbf{u} - 5\mathbf{w})$
   (d) $3\mathbf{v} - 2(\mathbf{u} + 2\mathbf{w})$

10. Let $\mathbf{u} = (-3, 1, 2)$, $\mathbf{v} = (4, 0, -8)$, and $\mathbf{w} = (6, -1, -4)$. Find the components of
   (a) $\mathbf{v} - \mathbf{w}$
   (b) $6\mathbf{u} + 2\mathbf{v}$
   (c) $-3(\mathbf{v} - 8\mathbf{w})$
   (d) $(2\mathbf{u} - 7\mathbf{w}) - (8\mathbf{v} + \mathbf{u})$

11. Let $\mathbf{u} = (-3, 2, 1, 0)$, $\mathbf{v} = (4, 7, -3, 2)$, and $\mathbf{w} = (5, -2, 8, 1)$. Find the components of
   (a) $\mathbf{v} - \mathbf{w}$
   (b) $-\mathbf{u} + (\mathbf{v} - 4\mathbf{w})$
   (c) $6(\mathbf{u} - 3\mathbf{v})$
   (d) $(6\mathbf{v} - \mathbf{w}) - (4\mathbf{u} + \mathbf{v})$

12. Let $\mathbf{u} = (1, 2, -3, 5, 0)$, $\mathbf{v} = (0, 4, -1, 1, 2)$, and $\mathbf{w} = (7, 1, -4, -2, 3)$. Find the components of
   (a) $\mathbf{v} + \mathbf{w}$
   (b) $3(2\mathbf{u} - \mathbf{v})$
   (c) $(3\mathbf{u} - \mathbf{v}) - (2\mathbf{u} + 4\mathbf{w})$
   (d) $\tfrac{1}{2}(\mathbf{w} - 5\mathbf{v} + 2\mathbf{u}) + \mathbf{v}$

13. Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be the vectors in Exercise 11. Find the components of the vector $\mathbf{x}$ that satisfies the equation $3\mathbf{u} + \mathbf{v} - 2\mathbf{w} = 3\mathbf{x} + 2\mathbf{w}$.

14. Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be the vectors in Exercise 12. Find the components of the vector $\mathbf{x}$ that satisfies the equation $2\mathbf{u} - \mathbf{v} + \mathbf{x} = 7\mathbf{x} + \mathbf{w}$.

15. Which of the following vectors in $R^6$, if any, are parallel to $\mathbf{u} = (-2, 1, 0, 3, 5, 1)$?
   (a) $(4, 2, 0, 6, 10, 2)$
   (b) $(4, -2, 0, -6, -10, -2)$
   (c) $(0, 0, 0, 0, 0, 0)$

16. For what value(s) of $t$, if any, is the given vector parallel to $\mathbf{u} = (4, -1)$?
   (a) $(8t, -2)$
   (b) $(8t, 2t)$
   (c) $(1, t^2)$

17. Let $\mathbf{u} = (1, -1, 3, 5)$ and $\mathbf{v} = (2, 1, 0, -3)$. Find scalars $a$ and $b$ so that $a\mathbf{u} + b\mathbf{v} = (1, -4, 9, 18)$.

18. Let $\mathbf{u} = (2, 1, 0, 1, -1)$ and $\mathbf{v} = (-2, 3, 1, 0, 2)$. Find scalars $a$ and $b$ so that $a\mathbf{u} + b\mathbf{v} = (-8, 8, 3, -1, 7)$.

In Exercises 19–20, find scalars $c_1$, $c_2$, and $c_3$ for which the equation is satisfied.

19. $c_1(1, -1, 0) + c_2(3, 2, 1) + c_3(0, 1, 4) = (-1, 1, 19)$

20. $c_1(-1, 0, 2) + c_2(2, 2, -2) + c_3(1, -2, 1) = (-6, 12, 4)$

21. Show that there do not exist scalars $c_1$, $c_2$, and $c_3$ such that
   $c_1(-2, 9, 6) + c_2(-3, 2, 1) + c_3(1, 7, 5) = (0, 5, 4)$

22. Show that there do not exist scalars $c_1$, $c_2$, and $c_3$ such that
   $c_1(1, 0, 1, 0) + c_2(1, 0, -2, 1) + c_3(2, 0, 1, 2) = (1, -2, 2, 3)$

23. Let $P$ be the point $(2, 3, -2)$ and $Q$ the point $(7, -4, 1)$.
   (a) Find the midpoint of the line segment connecting the points $P$ and $Q$.
   (b) Find the point on the line segment connecting the points $P$ and $Q$ that is $\tfrac{3}{4}$ of the way from $P$ to $Q$.

24. In relation to the points $P_1$ and $P_2$ in Figure 3.1.12, what can you say about the terminal point of the following vector if its initial point is at the origin?
   $\mathbf{u} = \overrightarrow{OP_1} + \tfrac{1}{2}\big(\overrightarrow{OP_2} - \overrightarrow{OP_1}\big)$

25. In each part, find the components of the vector $\mathbf{u} + \mathbf{v} + \mathbf{w}$.
   (a) [Figure: vectors u, v, and w drawn in the xy-plane]
   (b) [Figure: vectors u, v, and w drawn in the xy-plane]

26. Referring to the vectors pictured in Exercise 25, find the components of the vector $\mathbf{u} - \mathbf{v} + \mathbf{w}$.

27. Let $P$ be the point $(1, 3, 7)$. If the point $(4, 0, -6)$ is the midpoint of the line segment connecting $P$ and $Q$, what is $Q$?

28. If the sum of three vectors in $R^3$ is zero, must they lie in the same plane? Explain.

29. Consider the regular hexagon shown in the accompanying figure.
   (a) What is the sum of the six radial vectors that run from the center to the vertices?
   (b) How is the sum affected if each radial vector is multiplied by $\tfrac{1}{2}$?
   (c) What is the sum of the five radial vectors that remain if $\mathbf{a}$ is removed?
   (d) Discuss some variations and generalizations of the result in part (c).

   Figure Ex-29 [Regular hexagon with radial vectors a, b, c, d, e, f]

30. What is the sum of all radial vectors of a regular $n$-sided polygon? (See Exercise 29.)

Working with Proofs

31. Prove parts (a), (c), and (d) of Theorem 3.1.1.

32. Prove parts (e)–(h) of Theorem 3.1.1.

33. Prove parts (a)–(c) of Theorem 3.1.2.

True-False Exercises

TF. In parts (a)–(k) determine whether the statement is true or false, and justify your answer.

(a) Two equivalent vectors must have the same initial point.
(b) The vectors $(a, b)$ and $(a, b, 0)$ are equivalent.
(c) If $k$ is a scalar and $\mathbf{v}$ is a vector, then $\mathbf{v}$ and $k\mathbf{v}$ are parallel if and only if $k \ge 0$.
(d) The vectors $\mathbf{v} + (\mathbf{u} + \mathbf{w})$ and $(\mathbf{w} + \mathbf{v}) + \mathbf{u}$ are the same.
(e) If $\mathbf{u} + \mathbf{v} = \mathbf{u} + \mathbf{w}$, then $\mathbf{v} = \mathbf{w}$.
(f) If $a$ and $b$ are scalars such that $a\mathbf{u} + b\mathbf{v} = \mathbf{0}$, then $\mathbf{u}$ and $\mathbf{v}$ are parallel vectors.
(g) Collinear vectors with the same length are equal.
(h) If $(a, b, c) + (x, y, z) = (x, y, z)$, then $(a, b, c)$ must be the zero vector.
(i) If $k$ and $m$ are scalars and $\mathbf{u}$ and $\mathbf{v}$ are vectors, then $(k + m)(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + m\mathbf{v}$.
(j) If the vectors $\mathbf{v}$ and $\mathbf{w}$ are given, then the vector equation $3(2\mathbf{v} - \mathbf{x}) = 5\mathbf{x} - 4\mathbf{w} + \mathbf{v}$ can be solved for $\mathbf{x}$.
(k) The linear combinations $a_1\mathbf{v}_1 + a_2\mathbf{v}_2$ and $b_1\mathbf{v}_1 + b_2\mathbf{v}_2$ can only be equal if $a_1 = b_1$ and $a_2 = b_2$.

3.2 Norm, Dot Product, and Distance in $R^n$

In this section we will be concerned with the notions of length and distance as they relate to vectors. We will first discuss these ideas in $R^2$ and $R^3$ and then extend them algebraically to $R^n$.

Norm of a Vector

In this text we will denote the length of a vector $\mathbf{v}$ by the symbol $\|\mathbf{v}\|$, which is read as the norm of $\mathbf{v}$, the length of $\mathbf{v}$, or the magnitude of $\mathbf{v}$ (the term “norm” being a common mathematical synonym for length). As suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector $(v_1, v_2)$ in $R^2$ is

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2} \tag{1}$$

Similarly, for a vector $(v_1, v_2, v_3)$ in $R^3$, it follows from Figure 3.2.1b and two applications of the Theorem of Pythagoras that

$$\|\mathbf{v}\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$$

and hence that

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2} \tag{2}$$

Figure 3.2.1 (a) A vector $(v_1, v_2)$ in the xy-plane with norm $\|\mathbf{v}\|$. (b) A vector from the origin $O$ to the point $P(v_1, v_2, v_3)$ in xyz-space, with auxiliary points $Q$, $R$, and $S$.

Motivated by the pattern of Formulas (1) and (2), we make the following definition.

DEFINITION 1 If $\mathbf{v} = (v_1, v_2, \dots, v_n)$ is a vector in $R^n$, then the norm of $\mathbf{v}$ (also called the length of $\mathbf{v}$ or the magnitude of $\mathbf{v}$) is denoted by $\|\mathbf{v}\|$, and is defined by the formula

$$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \tag{3}$$

EXAMPLE 1 Calculating Norms

It follows from Formula (2) that the norm of the vector $\mathbf{v} = (-3, 2, 1)$ in $R^3$ is

$$\|\mathbf{v}\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$$

and it follows from Formula (3) that the norm of the vector $\mathbf{v} = (2, -1, 3, -5)$ in $R^4$ is

$$\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$$

Our first theorem in this section will generalize to $R^n$ the following three familiar facts about vectors in $R^2$ and $R^3$:

• Distances are nonnegative.
• The zero vector is the only vector of length zero.
• Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar.

It is important to recognize that just because these results hold in $R^2$ and $R^3$ does not guarantee that they hold in $R^n$; their validity in $R^n$ must be proved using algebraic properties of $n$-tuples.

THEOREM 3.2.1 If $\mathbf{v}$ is a vector in $R^n$, and if $k$ is any scalar, then:

(a) $\|\mathbf{v}\| \ge 0$
(b) $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$
(c) $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$

We will prove part (c) and leave (a) and (b) as exercises.

Proof (c) If $\mathbf{v} = (v_1, v_2, \dots, v_n)$, then $k\mathbf{v} = (kv_1, kv_2, \dots, kv_n)$, so

$$\begin{aligned}
\|k\mathbf{v}\| &= \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2} \\
&= \sqrt{k^2\,(v_1^2 + v_2^2 + \cdots + v_n^2)} \\
&= |k|\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \\
&= |k|\,\|\mathbf{v}\|
\end{aligned}$$

Unit Vectors

A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero vector $\mathbf{v}$ in that direction and multiplying $\mathbf{v}$ by the reciprocal of its length. For example, if $\mathbf{v}$ is a vector of length 2 in $R^2$ or $R^3$, then $\tfrac{1}{2}\mathbf{v}$ is a unit vector in the same direction as $\mathbf{v}$. More generally, if $\mathbf{v}$ is any nonzero vector in $R^n$, then

$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\,\mathbf{v} \tag{4}$$

defines a unit vector that is in the same direction as $\mathbf{v}$. We can confirm that (4) is a unit vector by applying part (c) of Theorem 3.2.1 with $k = 1/\|\mathbf{v}\|$ to obtain

$$\|\mathbf{u}\| = \|k\mathbf{v}\| = |k|\,\|\mathbf{v}\| = k\,\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1$$

WARNING Sometimes you will see Formula (4) expressed as

$$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$

This is just a more compact way of writing that formula and is not intended to convey that $\mathbf{v}$ is being divided by $\|\mathbf{v}\|$.
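Formulas (3) and (4) translate directly into code. The sketch below is illustrative only; the names `norm` and `normalize` are choices made here, vectors are assumed to be tuples of numbers, and the results are floating-point approximations.

```python
import math

def norm(v):
    """Norm of v as in Formula (3): square root of the sum of squared components."""
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    """Unit vector (1/||v||) v as in Formula (4); v must be nonzero."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)

# The two vectors of Example 1
print(norm((-3, 2, 1)))      # square root of 14
print(norm((2, -1, 3, -5)))  # square root of 39

# A unit vector in the direction of (-3, 2, 1); its norm is 1 up to rounding
u = normalize((-3, 2, 1))
print(math.isclose(norm(u), 1.0))  # True
```

The explicit check for the zero vector reflects the requirement in Formula (4) that $\mathbf{v}$ be nonzero.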

The process of multiplying a nonzero vector $\mathbf{v}$ by the reciprocal of its length to obtain a unit vector is called normalizing $\mathbf{v}$.

EXAMPLE 2 Normalizing a Vector

Find the unit vector $\mathbf{u}$ that has the same direction as $\mathbf{v} = (2, 2, -1)$.

Solution The vector $\mathbf{v}$ has length

$$\|\mathbf{v}\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$$

Thus, from (4)

$$\mathbf{u} = \tfrac{1}{3}(2, 2, -1) = \left(\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{1}{3}\right)$$



As a check, you may want to confirm that $\|\mathbf{u}\| = 1$.

The Standard Unit Vectors

When a rectangular coordinate system is introduced in $R^2$ or $R^3$, the unit vectors in the positive directions of the coordinate axes are called the standard unit vectors. In $R^2$ these vectors are denoted by

$$\mathbf{i} = (1, 0) \quad\text{and}\quad \mathbf{j} = (0, 1)$$

and in $R^3$ by

$$\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad\text{and}\quad \mathbf{k} = (0, 0, 1)$$

(Figure 3.2.2). Every vector $\mathbf{v} = (v_1, v_2)$ in $R^2$ and every vector $\mathbf{v} = (v_1, v_2, v_3)$ in $R^3$ can be expressed as a linear combination of standard unit vectors by writing

$$\mathbf{v} = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1\mathbf{i} + v_2\mathbf{j} \tag{5}$$

$$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k} \tag{6}$$

Moreover, we can generalize these formulas to $R^n$ by defining the standard unit vectors in $R^n$ to be

$$\mathbf{e}_1 = (1, 0, 0, \dots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad \mathbf{e}_n = (0, 0, 0, \dots, 1) \tag{7}$$

in which case every vector $\mathbf{v} = (v_1, v_2, \dots, v_n)$ in $R^n$ can be expressed as

$$\mathbf{v} = (v_1, v_2, \dots, v_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n \tag{8}$$

Figure 3.2.2 (a) The standard unit vectors $\mathbf{i}$ and $\mathbf{j}$ in $R^2$. (b) The standard unit vectors $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ in $R^3$.

EXAMPLE 3 Linear Combinations of Standard Unit Vectors

$$(2, -3, 4) = 2\mathbf{i} - 3\mathbf{j} + 4\mathbf{k}$$

$$(7, 3, -4, 5) = 7\mathbf{e}_1 + 3\mathbf{e}_2 - 4\mathbf{e}_3 + 5\mathbf{e}_4$$

Distance in $R^n$

If $P_1$ and $P_2$ are points in $R^2$ or $R^3$, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the distance $d$ between the two points (Figure 3.2.3). Specifically, if $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ are points in $R^2$, then Formula (4) of Section 3.1 implies that

$$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{9}$$

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points $P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ in 3-space is

$$d(\mathbf{u}, \mathbf{v}) = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \tag{10}$$

Figure 3.2.3 The distance between $P_1$ and $P_2$ is $d = \|\overrightarrow{P_1P_2}\|$.

Motivated by Formulas (9) and (10), we make the following definition.

DEFINITION 2 If $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$ are points in $R^n$, then we denote the distance between $\mathbf{u}$ and $\mathbf{v}$ by $d(\mathbf{u}, \mathbf{v})$ and define it to be

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2} \tag{11}$$

We noted in the previous section that n-tuples can be viewed either as vectors or points in R n . In Definition 2 we chose to describe them as points, as that seemed the more natural interpretation.
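The distance in Definition 2 can be sketched in Python as follows. The function name `distance` and the tuple representation of points are assumptions made for this illustration.

```python
import math

def distance(u, v):
    """Distance d(u, v) = ||u - v|| between two points of R^n, as in Formula (11)."""
    if len(u) != len(v):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(distance((0, 0, 0), (2, 3, 6)))        # 7.0, since 4 + 9 + 36 = 49
print(distance((1, 2, 3, 4), (1, 2, 3, 4)))  # 0.0, a point is at distance 0 from itself
```

Because each difference is squared, the order of the two points does not affect the result.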

EXAMPLE 4 Calculating Distance in $R^n$

If $\mathbf{u} = (1, 3, -2, 7)$ and $\mathbf{v} = (0, 7, 2, 2)$, then the distance between $\mathbf{u}$ and $\mathbf{v}$ is

$$d(\mathbf{u}, \mathbf{v}) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$$

Dot Product

Our next objective is to define a useful multiplication operation on vectors in R 2 and R 3 and then extend that operation to R n . To do this we will first need to define exactly what we mean by the “angle” between two vectors in R 2 or R 3 . For this purpose, let u and v be nonzero vectors in R 2 or R 3 that have been positioned so that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v that satisfies the inequalities 0 ≤ θ ≤ π (Figure 3.2.4).

Figure 3.2.4 The angle θ between u and v satisfies 0 ≤ θ ≤ π.

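DEFINITION 3 If $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in $R^2$ or $R^3$, and if $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$, then the dot product (also called the Euclidean inner product) of $\mathbf{u}$ and $\mathbf{v}$ is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta \tag{12}$$

If $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$, then we define $\mathbf{u} \cdot \mathbf{v}$ to be 0.

The sign of the dot product reveals information about the angle $\theta$ that we can obtain by rewriting Formula (12) as

$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \tag{13}$$

Since $0 \le \theta \le \pi$, it follows from Formula (13) and properties of the cosine function studied in trigonometry that

• $\theta$ is acute if $\mathbf{u} \cdot \mathbf{v} > 0$.
• $\theta$ is obtuse if $\mathbf{u} \cdot \mathbf{v} < 0$.
• $\theta = \pi/2$ if $\mathbf{u} \cdot \mathbf{v} = 0$.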

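EXAMPLE 5 Dot Product

Find the dot product of the vectors shown in Figure 3.2.5.

Solution The lengths of the vectors are

$$\|\mathbf{u}\| = 1 \quad\text{and}\quad \|\mathbf{v}\| = \sqrt{8} = 2\sqrt{2}$$

and the cosine of the angle $\theta$ between them is

$$\cos(45^\circ) = 1/\sqrt{2}$$

Thus, it follows from Formula (12) that

$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$$

Figure 3.2.5 The vectors $\mathbf{u}$, with terminal point (0, 0, 1), and $\mathbf{v}$, with terminal point (0, 2, 2), separated by the angle θ = 45°.

Component Form of the Dot Product

For computational purposes it is desirable to have a formula that expresses the dot product of two vectors in terms of components. We will derive such a formula for vectors in 3-space; the derivation for vectors in 2-space is similar.

Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ be two nonzero vectors. If, as shown in Figure 3.2.6, $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$, then the law of cosines yields

$$\|\overrightarrow{PQ}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta \tag{14}$$

Since $\overrightarrow{PQ} = \mathbf{v} - \mathbf{u}$, we can rewrite (14) as

$$\|\mathbf{u}\|\,\|\mathbf{v}\| \cos\theta = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

or

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$$

Substituting

$$\|\mathbf{u}\|^2 = u_1^2 + u_2^2 + u_3^2, \qquad \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + v_3^2$$

and

$$\|\mathbf{v} - \mathbf{u}\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2$$

we obtain, after simplifying,

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3 \tag{15}$$

The companion formula for vectors in 2-space is

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 \tag{16}$$

Although we derived Formula (15) and its 2-space companion under the assumption that $\mathbf{u}$ and $\mathbf{v}$ are nonzero, it turned out that these formulas are also applicable if $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$ (verify).

Figure 3.2.6 The vectors $\mathbf{u}$ and $\mathbf{v}$ with terminal points $P(u_1, u_2, u_3)$ and $Q(v_1, v_2, v_3)$ and angle θ between them.

Motivated by the pattern in Formulas (15) and (16), we make the following definition.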

Josiah Willard Gibbs (1839–1903)

Historical Note The dot product notation was first introduced by the American physicist and mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the 1880s. The product was originally written on the baseline, rather than centered as today, and was referred to as the direct product. Gibbs’s pamphlet was eventually incorporated into a book entitled Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as the greatest American physicist of the nineteenth century. [Image: SCIENCE SOURCE/Photo Researchers/ Getty Images]

DEFINITION 4 If $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$ are vectors in $R^n$, then the dot product (also called the Euclidean inner product) of $\mathbf{u}$ and $\mathbf{v}$ is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined by

$$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n \tag{17}$$

In words, to calculate the dot product (Euclidean inner product) multiply corresponding components and add the resulting products.
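Formula (17) amounts to a single sum over paired components. A minimal Python sketch, assuming vectors are stored as tuples and using the hypothetical function name `dot`:

```python
def dot(u, v):
    """Dot product u . v = u1*v1 + u2*v2 + ... + un*vn, as in Formula (17)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(u, v))

print(dot((1, 2, 3), (4, -5, 6)))  # 4 - 10 + 18 = 12
print(dot((1, 0), (0, 1)))         # 0, the two vectors are perpendicular
```

The length check guards against pairing vectors from different spaces, for which the dot product is not defined.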

EXAMPLE 6 Calculating Dot Products Using Components

(a) Use Formula (15) to compute the dot product of the vectors $\mathbf{u}$ and $\mathbf{v}$ in Example 5.
(b) Calculate $\mathbf{u} \cdot \mathbf{v}$ for the following vectors in $R^4$:

$$\mathbf{u} = (-1, 3, 5, 7), \qquad \mathbf{v} = (-3, -4, 1, 0)$$

Solution (a) The component forms of the vectors are $\mathbf{u} = (0, 0, 1)$ and $\mathbf{v} = (0, 2, 2)$. Thus,

$$\mathbf{u} \cdot \mathbf{v} = (0)(0) + (0)(2) + (1)(2) = 2$$

which agrees with the result obtained geometrically in Example 5.

Solution (b)

$$\mathbf{u} \cdot \mathbf{v} = (-1)(-3) + (3)(-4) + (5)(1) + (7)(0) = -4$$

EXAMPLE 7 A Geometry Problem Solved Using Dot Product

Find the angle between a diagonal of a cube and one of its edges.

Solution Let $k$ be the length of an edge and introduce a coordinate system as shown in Figure 3.2.7. If we let $\mathbf{u}_1 = (k, 0, 0)$, $\mathbf{u}_2 = (0, k, 0)$, and $\mathbf{u}_3 = (0, 0, k)$, then the vector

$$\mathbf{d} = (k, k, k) = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$$

is a diagonal of the cube. It follows from Formula (13) that the angle $\theta$ between $\mathbf{d}$ and the edge $\mathbf{u}_1$ satisfies

$$\cos\theta = \frac{\mathbf{u}_1 \cdot \mathbf{d}}{\|\mathbf{u}_1\|\,\|\mathbf{d}\|} = \frac{k^2}{(k)(\sqrt{3k^2})} = \frac{1}{\sqrt{3}}$$

With the help of a calculator we obtain

$$\theta = \cos^{-1}\left(\frac{1}{\sqrt{3}}\right) \approx 54.74^\circ$$

Figure 3.2.7 A cube with edges $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ ending at $(k, 0, 0)$, $(0, k, 0)$, $(0, 0, k)$ and diagonal $\mathbf{d}$ ending at $(k, k, k)$.

Note that the angle $\theta$ obtained in Example 7 does not involve $k$. Why was this to be expected?

Algebraic Properties of the Dot Product

In the special case where $\mathbf{u} = \mathbf{v}$ in Definition 4, we obtain the relationship

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2 \tag{18}$$

This yields the following formula for expressing the length of a vector in terms of a dot product:

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \tag{19}$$

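Dot products have many of the same algebraic properties as products of real numbers.

THEOREM 3.2.2 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $R^n$, and if $k$ is a scalar, then:

(a) $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$ [Symmetry property]
(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]
(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]
(d) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity property]

We will prove parts (c) and (d) and leave the other proofs as exercises.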

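Proof (c) Let $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$. Then

$$\begin{aligned}
k(\mathbf{u} \cdot \mathbf{v}) &= k(u_1v_1 + u_2v_2 + \cdots + u_nv_n) \\
&= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n \\
&= (k\mathbf{u}) \cdot \mathbf{v}
\end{aligned}$$

Proof (d) The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

$$\mathbf{v} \cdot \mathbf{v} = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2$$

The next theorem gives additional properties of dot products. The proofs can be obtained either by expressing the vectors in terms of components or by using the algebraic properties established in Theorem 3.2.2.

THEOREM 3.2.3 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $R^n$, and if $k$ is a scalar, then:

(a) $\mathbf{0} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{0} = 0$
(b) $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$
(c) $\mathbf{u} \cdot (\mathbf{v} - \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} - \mathbf{u} \cdot \mathbf{w}$
(d) $(\mathbf{u} - \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} - \mathbf{v} \cdot \mathbf{w}$
(e) $k(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (k\mathbf{v})$

We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components. The other proofs are left as exercises.

Proof (b)

$$\begin{aligned}
(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} &= \mathbf{w} \cdot (\mathbf{u} + \mathbf{v}) && \text{[By symmetry]} \\
&= \mathbf{w} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{v} && \text{[By distributivity]} \\
&= \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w} && \text{[By symmetry]}
\end{aligned}$$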

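Formulas (18) and (19) together with Theorems 3.2.2 and 3.2.3 make it possible to manipulate expressions involving dot products using familiar algebraic techniques.

EXAMPLE 8 Calculating with Dot Products

$$\begin{aligned}
(\mathbf{u} - 2\mathbf{v}) \cdot (3\mathbf{u} + 4\mathbf{v}) &= \mathbf{u} \cdot (3\mathbf{u} + 4\mathbf{v}) - 2\mathbf{v} \cdot (3\mathbf{u} + 4\mathbf{v}) \\
&= 3(\mathbf{u} \cdot \mathbf{u}) + 4(\mathbf{u} \cdot \mathbf{v}) - 6(\mathbf{v} \cdot \mathbf{u}) - 8(\mathbf{v} \cdot \mathbf{v}) \\
&= 3\|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) - 8\|\mathbf{v}\|^2
\end{aligned}$$

Cauchy–Schwarz Inequality and Angles in $R^n$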

Our next objective is to extend to $R^n$ the notion of “angle” between nonzero vectors $\mathbf{u}$ and $\mathbf{v}$. We will do this by starting with the formula

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \tag{20}$$

which we previously derived for nonzero vectors in $R^2$ and $R^3$. Since dot products and norms have been defined for vectors in $R^n$, it would seem that this formula has all the ingredients to serve as a definition of the angle $\theta$ between two vectors, $\mathbf{u}$ and $\mathbf{v}$, in $R^n$. However, there is a fly in the ointment, the problem being that the inverse cosine in Formula (20) is not defined unless its argument satisfies the inequalities

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \tag{21}$$

Fortunately, these inequalities do hold for all nonzero vectors in $R^n$ as a result of the following fundamental result known as the Cauchy–Schwarz inequality.

THEOREM 3.2.4 Cauchy–Schwarz Inequality

If $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$ are vectors in $R^n$, then

$$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \tag{22}$$

or in terms of components

$$|u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le (u_1^2 + u_2^2 + \cdots + u_n^2)^{1/2}\,(v_1^2 + v_2^2 + \cdots + v_n^2)^{1/2} \tag{23}$$

We will omit the proof of this theorem because later in the text we will prove a more general version of which this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in (21) hold for all nonzero vectors in $R^n$. Once that is done we will have established all the results required to use Formula (20) as our definition of the angle between nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$.

To prove that the inequalities in (21) hold for all nonzero vectors in $R^n$, divide both sides of Formula (22) by the product $\|\mathbf{u}\|\,\|\mathbf{v}\|$ to obtain

$$\frac{|\mathbf{u} \cdot \mathbf{v}|}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \quad\text{or equivalently}\quad \left|\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right| \le 1$$

from which (21) follows.

Geometry in $R^n$

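Earlier in this section we extended various concepts to $R^n$ with the idea that familiar results that we can visualize in $R^2$ and $R^3$ might be valid in $R^n$ as well. Here are two fundamental theorems from plane geometry whose validity extends to $R^n$:

• The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).
• The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to $R^n$.

THEOREM 3.2.5 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $R^n$, then:

(a) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ [Triangle inequality for vectors]
(b) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$ [Triangle inequality for distances]

Figure 3.2.8 A triangle with sides $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} + \mathbf{v}$, illustrating $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.

Figure 3.2.9 A triangle with vertices $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$, illustrating $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$.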
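A numerical spot check can make the Cauchy–Schwarz inequality (22) and the triangle inequality in part (a) of Theorem 3.2.5 more concrete. The sketch below tests both on one pair of vectors in $R^4$; it is a check on a single example, not a proof, and the helper names are assumptions made here.

```python
import math

def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Norm computed as the square root of v . v."""
    return math.sqrt(dot(v, v))

def add(u, v):
    """Componentwise sum u + v."""
    return tuple(a + b for a, b in zip(u, v))

u = (-1, 3, 5, 7)
v = (-3, -4, 1, 0)

# Cauchy-Schwarz: |u . v| <= ||u|| ||v||
cauchy_schwarz_holds = abs(dot(u, v)) <= norm(u) * norm(v)

# Triangle inequality: ||u + v|| <= ||u|| + ||v||
triangle_holds = norm(add(u, v)) <= norm(u) + norm(v)

print(cauchy_schwarz_holds, triangle_holds)  # True True
```

Here $|\mathbf{u} \cdot \mathbf{v}| = 4$ while $\|\mathbf{u}\|\,\|\mathbf{v}\| = \sqrt{84}\sqrt{26} \approx 46.7$, so the first inequality holds with a wide margin.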

Hermann Amandus Schwarz (1843–1921)

Viktor Yakovlevich Bunyakovsky (1804–1889)

Historical Note The Cauchy–Schwarz inequality is named in honor of the French mathematician Augustin Cauchy (see p. 121) and the German mathematician Hermann Schwarz. Variations of this inequality occur in many different settings and under various names. Depending on the context in which the inequality occurs, you may find it called Cauchy’s inequality, the Schwarz inequality, or sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who published his version of the inequality in 1859, about 25 years before Schwarz. [Images: © Rudolph Duehrkoop/ ullstein bild/The Image Works (Schwarz); http://www-history.mcs.st-and.ac.uk/ Biographies/Bunyakovsky.html (Bunyakovsky)]

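Proof (a)

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = (\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{u} \cdot \mathbf{v}) + (\mathbf{v} \cdot \mathbf{v}) \\
&= \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2 \\
&\le \|\mathbf{u}\|^2 + 2|\mathbf{u} \cdot \mathbf{v}| + \|\mathbf{v}\|^2 && \text{[Property of absolute value]} \\
&\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 && \text{[Cauchy–Schwarz inequality]} \\
&= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2
\end{aligned}$$

This completes the proof since both sides of the inequality in part (a) are nonnegative.

Proof (b) It follows from part (a) and Formula (11) that

$$\begin{aligned}
d(\mathbf{u}, \mathbf{v}) &= \|\mathbf{u} - \mathbf{v}\| = \|(\mathbf{u} - \mathbf{w}) + (\mathbf{w} - \mathbf{v})\| \\
&\le \|\mathbf{u} - \mathbf{w}\| + \|\mathbf{w} - \mathbf{v}\| = d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})
\end{aligned}$$

It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to $R^n$.

Figure 3.2.10 A parallelogram with sides $\mathbf{u}$ and $\mathbf{v}$ and diagonals $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$.

THEOREM 3.2.6 Parallelogram Equation for Vectors

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $R^n$, then

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right) \tag{24}$$

Proof

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 &= (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) + (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) \\
&= 2(\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{v} \cdot \mathbf{v}) \\
&= 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)
\end{aligned}$$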

We could state and prove many more theorems from plane geometry that generalize to R n , but the ones already given should suffice to convince you that R n is not so different from R 2 and R 3 even though we cannot visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm in R n .

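THEOREM 3.2.7 If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $R^n$ with the Euclidean inner product, then

$$\mathbf{u} \cdot \mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 \tag{25}$$

Note that Formula (25) expresses the dot product in terms of norms.

Proof

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$

$$\|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$$

from which (25) follows by simple algebra.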
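Formulas (24) and (25) can be checked on a concrete pair of vectors. The following sketch uses integer components and squared norms so that every value is exact; the helper names are assumptions made for this illustration.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(a * b for a, b in zip(u, v))

def norm_squared(v):
    """Squared norm ||v||^2 = v . v, as in Formula (18)."""
    return dot(v, v)

u = (1, 2, 3)
v = (4, -5, 6)
u_plus_v = tuple(a + b for a, b in zip(u, v))   # (5, -3, 9)
u_minus_v = tuple(a - b for a, b in zip(u, v))  # (-3, 7, -3)

# Parallelogram equation (24): both sides equal 182
lhs = norm_squared(u_plus_v) + norm_squared(u_minus_v)
rhs = 2 * (norm_squared(u) + norm_squared(v))
print(lhs, rhs)  # 182 182

# Formula (25): the dot product recovered from norms alone
from_norms = (norm_squared(u_plus_v) - norm_squared(u_minus_v)) / 4
print(from_norms, dot(u, v))  # 12.0 12
```

Subtracting the two expansions in the proof gives $\|\mathbf{u} + \mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{v}\|^2 = 4(\mathbf{u} \cdot \mathbf{v})$, which is the identity the last two lines evaluate.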

Dot Products as Matrix Multiplication

There are various ways to express the dot product of vectors using matrix notation. The formulas depend on whether the vectors are expressed as row matrices or column matrices. Table 1 shows the possibilities.

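Table 1

| Form | Dot Product | Example |
|---|---|---|
| $\mathbf{u}$ a column matrix and $\mathbf{v}$ a column matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$ | $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $\mathbf{u}^T\mathbf{v} = [1 \;\; {-3} \;\; 5]\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}^T\mathbf{u} = [5 \;\; 4 \;\; 0]\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |
| $\mathbf{u}$ a row matrix and $\mathbf{v}$ a column matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v} = \mathbf{v}^T\mathbf{u}^T$ | $\mathbf{u} = [1 \;\; {-3} \;\; 5]$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $\mathbf{u}\mathbf{v} = [1 \;\; {-3} \;\; 5]\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}^T\mathbf{u}^T = [5 \;\; 4 \;\; 0]\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |
| $\mathbf{u}$ a column matrix and $\mathbf{v}$ a row matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{v}\mathbf{u} = \mathbf{u}^T\mathbf{v}^T$ | $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = [5 \;\; 4 \;\; 0]$; $\mathbf{v}\mathbf{u} = [5 \;\; 4 \;\; 0]\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$; $\mathbf{u}^T\mathbf{v}^T = [1 \;\; {-3} \;\; 5]\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$ |
| $\mathbf{u}$ a row matrix and $\mathbf{v}$ a row matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v}^T = \mathbf{v}\mathbf{u}^T$ | $\mathbf{u} = [1 \;\; {-3} \;\; 5]$, $\mathbf{v} = [5 \;\; 4 \;\; 0]$; $\mathbf{u}\mathbf{v}^T = [1 \;\; {-3} \;\; 5]\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}\mathbf{u}^T = [5 \;\; 4 \;\; 0]\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |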

Application of Dot Products to ISBN Numbers

Although the system has recently changed, most older books have been assigned a unique 10-digit number called an International Standard Book Number or ISBN. The first nine digits of this number are split into three groups: the first group represents the country or group of countries in which the book originates, the second identifies the publisher, and the third is assigned to the book title itself. The tenth and final digit, called a check digit, is computed from the first nine digits and is used to ensure that an electronic transmission of the ISBN, say over the Internet, occurs without error.

To explain how this is done, regard the first nine digits of the ISBN as a vector $\mathbf{b}$ in $R^9$, and let $\mathbf{a}$ be the vector

$$\mathbf{a} = (1, 2, 3, 4, 5, 6, 7, 8, 9)$$

Then the check digit $c$ is computed using the following procedure:

1. Form the dot product $\mathbf{a} \cdot \mathbf{b}$.
2. Divide $\mathbf{a} \cdot \mathbf{b}$ by 11, thereby producing a remainder $c$ that is an integer between 0 and 10, inclusive. The check digit is taken to be $c$, with the proviso that $c = 10$ is written as X to avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9

which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since

$$\mathbf{a} \cdot \mathbf{b} = (1, 2, 3, 4, 5, 6, 7, 8, 9) \cdot (0, 4, 7, 1, 1, 5, 3, 0, 7) = 152$$

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is $c = 9$. If an electronic order is placed for a book with a certain ISBN, then the warehouse can use the above procedure to verify that the check digit is consistent with the first nine digits, thereby reducing the possibility of a costly shipping error.
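The two-step procedure above can be sketched in Python. The function name `isbn10_check_digit` and the choice to accept the first nine digits as a string with optional hyphens are assumptions made for this illustration.

```python
def isbn10_check_digit(first_nine):
    """Check digit for a 10-digit ISBN, given its first nine digits as a string.

    Hyphens and other non-digit characters in the input are ignored.
    """
    b = [int(ch) for ch in first_nine if ch.isdigit()]
    if len(b) != 9:
        raise ValueError("expected exactly nine digits")
    a = range(1, 10)                             # the vector a = (1, 2, ..., 9)
    c = sum(ai * bi for ai, bi in zip(a, b)) % 11  # remainder of a . b on division by 11
    return "X" if c == 10 else str(c)

print(isbn10_check_digit("0-471-15307"))  # 9
```

For the ISBN in the text, the dot product is 152 and the remainder on division by 11 is 9, matching the printed check digit.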

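If $A$ is an $n \times n$ matrix and $\mathbf{u}$ and $\mathbf{v}$ are $n \times 1$ matrices, then it follows from the first row in Table 1 and properties of the transpose that

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T(A\mathbf{u}) = (\mathbf{v}^TA)\mathbf{u} = (A^T\mathbf{v})^T\mathbf{u} = \mathbf{u} \cdot A^T\mathbf{v}$$

$$\mathbf{u} \cdot A\mathbf{v} = (A\mathbf{v})^T\mathbf{u} = (\mathbf{v}^TA^T)\mathbf{u} = \mathbf{v}^T(A^T\mathbf{u}) = A^T\mathbf{u} \cdot \mathbf{v}$$

The resulting formulas

$$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v} \tag{26}$$

$$\mathbf{u} \cdot A\mathbf{v} = A^T\mathbf{u} \cdot \mathbf{v} \tag{27}$$

provide an important link between multiplication by an $n \times n$ matrix $A$ and multiplication by $A^T$.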

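EXAMPLE 9 Verifying that $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$

Suppose that

$$A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \quad \mathbf{u} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix}$$

Then

$$A\mathbf{u} = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix}$$

$$A^T\mathbf{v} = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix} \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix}$$

from which we obtain

$$A\mathbf{u} \cdot \mathbf{v} = 7(-2) + 10(0) + 5(5) = 11$$

$$\mathbf{u} \cdot A^T\mathbf{v} = (-1)(-7) + 2(4) + 4(-1) = 11$$

Thus, $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$ as guaranteed by Formula (26). We leave it for you to verify that Formula (27) also holds.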
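The verification in Example 9, together with the check of Formula (27) that is left to the reader, can be sketched in Python. Matrices are assumed here to be lists of rows, and the helper names `dot`, `matvec`, and `transpose` are hypothetical.

```python
def dot(u, v):
    """Dot product of two vectors with the same number of components."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [dot(row, x) for row in A]

def transpose(A):
    """Transpose of A: the columns of A become the rows."""
    return [list(col) for col in zip(*A)]

A = [[1, -2, 3],
     [2, 4, 1],
     [-1, 0, 1]]
u = [-1, 2, 4]
v = [-2, 0, 5]
At = transpose(A)

# Formula (26): Au . v = u . A^T v
print(dot(matvec(A, u), v), dot(u, matvec(At, v)))  # 11 11

# Formula (27): u . Av = A^T u . v
print(dot(u, matvec(A, v)), dot(matvec(At, u), v))  # 17 17
```

The first line of output reproduces the value 11 found in Example 9; the second shows that both sides of Formula (27) equal 17 for the same data.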

A Dot Product View of Matrix Multiplication

Dot products provide another way of thinking about matrix multiplication. Recall that if A = [a_ij] is an m × r matrix and B = [b_ij] is an r × n matrix, then the ij-th entry of AB is

\[ a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj} \]

which is the dot product of the i-th row vector of A

\[ \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ir} \end{bmatrix} \]

and the j-th column vector of B

\[ \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix} \]

Thus, if the row vectors of A are r1, r2, . . . , rm and the column vectors of B are c1, c2, . . . , cn, then the matrix product AB can be expressed as

\[
AB = \begin{bmatrix}
\mathbf{r}_1\cdot\mathbf{c}_1 & \mathbf{r}_1\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_1\cdot\mathbf{c}_n \\
\mathbf{r}_2\cdot\mathbf{c}_1 & \mathbf{r}_2\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_2\cdot\mathbf{c}_n \\
\vdots & \vdots & & \vdots \\
\mathbf{r}_m\cdot\mathbf{c}_1 & \mathbf{r}_m\cdot\mathbf{c}_2 & \cdots & \mathbf{r}_m\cdot\mathbf{c}_n
\end{bmatrix}
\tag{28}
\]
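Formula (28) translates almost directly into code. The sketch below builds AB entry by entry as dot products of rows of A with columns of B; the function name matmul_by_dots and the two sample matrices are assumptions made for illustration only.

```python
def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def matmul_by_dots(A, B):
    """Return AB, where entry (i, j) is (row i of A) . (column j of B)."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("row length of A must equal the number of rows of B")
    cols = list(zip(*B))  # column vectors c1, ..., cn of B
    return [[dot(r, c) for c in cols] for r in A]


# Illustrative 2 x 3 and 3 x 4 matrices (not taken from this section).
A = [[1, 2, 4],
     [2, 6, 0]]
B = [[4, 1, 4, 3],
     [0, -1, 3, 1],
     [2, 7, 5, 2]]

print(matmul_by_dots(A, B))  # [[12, 27, 30, 13], [8, -4, 26, 12]]
```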

Exercise Set 3.2

In Exercises 1–2, find the norm of v, and a unit vector that is oppositely directed to v.

1. (a) v = (2, 2, 2) (b) v = (1, 0, 2, 1, 3)
2. (a) v = (1, −1, 2) (b) v = (−2, 3, 3, −1)

In Exercises 3–4, evaluate the given expression with u = (2, −2, 3), v = (1, −3, 4), and w = (3, 6, −4).

3. (a) ‖u + v‖ (b) ‖u‖ + ‖v‖ (c) ‖−2u + 2v‖ (d) ‖3u − 5v + w‖
4. (a) ‖u + v + w‖ (b) ‖u − v‖ (c) ‖3v‖ − 3‖v‖ (d) ‖u‖ − ‖v‖

In Exercises 5–6, evaluate the given expression with u = (−2, −1, 4, 5), v = (3, 1, −5, 7), and w = (−6, 2, 1, 1).

5. (a) ‖3u − 5v + w‖ (b) ‖3u‖ − 5‖v‖ + ‖w‖ (c) ‖−‖u‖v‖
6. (a) ‖u‖ + ‖−2v‖ + ‖−3w‖ (b) ‖‖u − v‖w‖

7. Let v = (−2, 3, 0, 6). Find all scalars k such that ‖kv‖ = 5.
8. Let v = (1, 1, 2, −3, 1). Find all scalars k such that ‖kv‖ = 4.

In Exercises 9–10, find u · v, u · u, and v · v.

9. (a) u = (3, 1, 4), v = (2, 2, −4) (b) u = (1, 1, 4, 6), v = (2, −2, 3, −2)
10. (a) u = (1, 1, −2, 3), v = (−1, 0, 5, 1) (b) u = (2, −1, 1, 0, −2), v = (1, 2, 2, 2, 1)

In Exercises 11–12, find the Euclidean distance between u and v and the cosine of the angle between those vectors. State whether that angle is acute, obtuse, or 90°.

11. (a) u = (3, 3, 3), v = (1, 0, 4) (b) u = (0, −2, −1, 1), v = (−3, 2, 4, 4)
12. (a) u = (1, 2, −3, 0), v = (5, 1, 2, −2) (b) u = (0, 1, 1, 1, 2), v = (2, 1, 0, −1, 3)

13. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120° counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in the positive y-direction. Find a · b.
14. Suppose that a vector a in the xy-plane points in a direction that is 47° counterclockwise from the positive x-axis, and a vector b in that plane points in a direction that is 43° clockwise from the positive x-axis. What can you say about the value of a · b?

In Exercises 15–16, determine whether the expression makes sense mathematically. If not, explain why.

15. (a) u · (v · w) (b) u · (v + w) (c) ‖u · v‖ (d) (u · v) − ‖u‖
16. (a) ‖u‖ · ‖v‖ (b) (u · v) − w (c) (u · v) − k (d) k · u

In Exercises 17–18, verify that the Cauchy–Schwarz inequality holds.

17. (a) u = (−3, 1, 0), v = (2, −1, 3) (b) u = (0, 2, 2, 1), v = (1, 1, 1, 1)
18. (a) u = (4, 1, 1), v = (1, 2, 3) (b) u = (1, 2, 1, 2, 3), v = (0, 1, 1, 5, −2)

19. Let r0 = (x0, y0) be a fixed vector in R^2. In each part, describe in words the set of all vectors r = (x, y) that satisfy the stated condition.
(a) ‖r − r0‖ = 1 (b) ‖r − r0‖ ≤ 1 (c) ‖r − r0‖ > 1
20. Repeat the directions of Exercise 19 for vectors r = (x, y, z) and r0 = (x0, y0, z0) in R^3.

Exercises 21–25 The direction of a nonzero vector v in an xyz-coordinate system is completely determined by the angles α, β, and γ between v and the standard unit vectors i, j, and k (Figure Ex-21). These are called the direction angles of v, and their cosines are called the direction cosines of v.

21. Use Formula (13) to show that the direction cosines of a vector v = (v1, v2, v3) in R^3 are

\[ \cos\alpha = \frac{v_1}{\|\mathbf{v}\|}, \quad \cos\beta = \frac{v_2}{\|\mathbf{v}\|}, \quad \cos\gamma = \frac{v_3}{\|\mathbf{v}\|} \]

[Figure Ex-21: a vector v in an xyz-coordinate system with the angles α, β, and γ that it makes with i, j, and k.]

22. Use the result in Exercise 21 to show that cos^2 α + cos^2 β + cos^2 γ = 1

23. Show that two nonzero vectors v1 and v2 in R^3 are orthogonal if and only if their direction cosines satisfy

cos α1 cos α2 + cos β1 cos β2 + cos γ1 cos γ2 = 0

24. The accompanying figure shows a cube.
(a) Find the angle between the vectors d and u to the nearest degree.
(b) Make a conjecture about the angle between the vectors d and v, and confirm your conjecture by computing the angle.

[Figure Ex-24: a cube in an xyz-coordinate system with the vectors d, u, and v.]

25. Estimate, to the nearest degree, the angles that a diagonal of a box with dimensions 10 cm × 15 cm × 25 cm makes with the edges of the box.

26. If ‖v‖ = 2 and ‖w‖ = 3, what are the largest and smallest values possible for ‖v − w‖? Give a geometric explanation of your results.

27. What can you say about two nonzero vectors, u and v, that satisfy the equation ‖u + v‖ = ‖u‖ + ‖v‖?

28. (a) What relationship must hold for the point p = (a, b, c) to be equidistant from the origin and the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.
(b) What relationship must hold for the point p = (a, b, c) to be farther from the origin than from the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.

29. State a procedure for finding a vector of a specified length m that points in the same direction as a given vector v.

30. Under what conditions will the triangle inequality (Theorem 3.2.5a) be an equality? Explain your answer geometrically.

Exercises 31–32 The effect that a force has on an object depends on the magnitude of the force and the direction in which it is applied. Thus, forces can be regarded as vectors and represented as arrows in which the length of the arrow specifies the magnitude of the force, and the direction of the arrow specifies the direction in which the force is applied. It is a fact of physics that force vectors obey the parallelogram law in the sense that if two force vectors F1 and F2 are applied at a point on an object, then the effect is the same as if the single force F1 + F2 (called the resultant) were applied at that point (see accompanying figure). Forces are commonly measured in units called pounds-force (abbreviated lbf) or Newtons (abbreviated N).

[Figure: The single force F1 + F2 has the same effect as the two forces F1 and F2.]

31. A particle is said to be in static equilibrium if the resultant of all forces applied to it is zero. For the forces in the accompanying figure, find the resultant F that must be applied to the indicated point to produce static equilibrium. Describe F by giving its magnitude and the angle in degrees that it makes with the positive x-axis.

32. Follow the directions of Exercise 31.

[Figure Ex-31: forces of 10 lb and 8 lb in the xy-plane, with an angle of 60° marked.]

[Figure Ex-32: forces of 120 N, 150 N, and 100 N in the xy-plane, with angles of 75° and 45° marked.]

Working with Proofs

33. Prove parts (a) and (b) of Theorem 3.2.1.
34. Prove parts (a) and (c) of Theorem 3.2.3.
35. Prove parts (d) and (e) of Theorem 3.2.3.

True-False Exercises

TF. In parts (a)–(j) determine whether the statement is true or false, and justify your answer.

(a) If each component of a vector in R^3 is doubled, the norm of that vector is doubled.
(b) In R^2, the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of radius 5 centered at the origin.
(c) Every vector in R^n has a positive norm.
(d) If v is a nonzero vector in R^n, there are exactly two unit vectors that are parallel to v.
(e) If ‖u‖ = 2, ‖v‖ = 1, and u · v = 1, then the angle between u and v is π/3 radians.
(f) The expressions (u · v) + w and u · (v + w) are both meaningful and equal to each other.
(g) If u · v = u · w, then v = w.
(h) If u · v = 0, then either u = 0 or v = 0.
(i) In R^2, if u lies in the first quadrant and v lies in the third quadrant, then u · v cannot be positive.
(j) For all vectors u, v, and w in R^n, we have ‖u + v + w‖ ≤ ‖u‖ + ‖v‖ + ‖w‖

Working with Technology

T1. Let u be a vector in R^100 whose i-th component is i, and let v be the vector in R^100 whose i-th component is 1/(i + 1). Find the dot product of u and v.

T2. Find, to the nearest degree, the angles that a diagonal of a box with dimensions 10 cm × 11 cm × 25 cm makes with the edges of the box.

3.3 Orthogonality

In the last section we defined the notion of “angle” between vectors in R^n. In this section we will focus on the notion of “perpendicularity.” Perpendicular vectors in R^n play an important role in a wide variety of applications.

Orthogonal Vectors

Recall from Formula (20) in the previous section that the angle θ between two nonzero vectors u and v in R^n is defined by the formula

\[ \theta = \cos^{-1}\left( \frac{\mathbf{u}\cdot\mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \right) \]

It follows from this that θ = π/2 if and only if u · v = 0. Thus, we make the following definition.

DEFINITION 1 Two nonzero vectors u and v in R^n are said to be orthogonal (or perpendicular) if u · v = 0. We will also agree that the zero vector in R^n is orthogonal to every vector in R^n.

EXAMPLE 1 Orthogonal Vectors

(a) Show that u = (−2, 3, 1, 4) and v = (1, 2, 0, −1) are orthogonal vectors in R^4.
(b) Let S = {i, j, k} be the set of standard unit vectors in R^3. Show that each ordered pair of vectors in S is orthogonal.

Solution (a) The vectors are orthogonal since

u · v = (−2)(1) + (3)(2) + (1)(0) + (4)(−1) = 0

Solution (b) It suffices to show that

i · j = i · k = j · k = 0

because it will follow automatically from the symmetry property of the dot product that

j · i = k · i = k · j = 0

Although the orthogonality of the vectors in S is evident geometrically from Figure 3.2.2, it is confirmed algebraically by the computations

i · j = (1, 0, 0) · (0, 1, 0) = 0
i · k = (1, 0, 0) · (0, 0, 1) = 0
j · k = (0, 1, 0) · (0, 0, 1) = 0

Using the computations in R^3 as a model, you should be able to see that each ordered pair of standard unit vectors in R^n is orthogonal.

Lines and Planes Determined by Points and Normals

One learns in analytic geometry that a line in R^2 is determined uniquely by its slope and one of its points, and that a plane in R^3 is determined uniquely by its “inclination” and one of its points. One way of specifying slope and inclination is to use a nonzero vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the point P0(x0, y0) that has normal n = (a, b) and the plane through the point P0(x0, y0, z0) that has normal n = (a, b, c). Both the line and the plane are represented by the vector equation

\[ \mathbf{n}\cdot\overrightarrow{P_0P} = 0 \tag{1} \]

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in the plane. The vector from P0 to P can be expressed in terms of components as

\[ \overrightarrow{P_0P} = (x - x_0,\; y - y_0) \quad [\text{line}] \]

\[ \overrightarrow{P_0P} = (x - x_0,\; y - y_0,\; z - z_0) \quad [\text{plane}] \]

Thus, Equation (1) can be written as

\[ a(x - x_0) + b(y - y_0) = 0 \quad [\text{line}] \tag{2} \]

\[ a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 \quad [\text{plane}] \tag{3} \]

These are called the point-normal equations of the line and plane.

Formula (1) is called the point-normal form of a line or plane and Formulas (2) and (3) the component forms.

[Figure 3.3.1: the line through P0(x0, y0) with normal n = (a, b) and the plane through P0(x0, y0, z0) with normal n = (a, b, c), each with an arbitrary point P.]

EXAMPLE 2 Point-Normal Equations

It follows from (2) that in R^2 the equation

6(x − 3) + (y + 7) = 0

represents the line through the point (3, −7) with normal n = (6, 1); and it follows from (3) that in R^3 the equation

4(x − 3) + 2y − 5(z − 7) = 0

represents the plane through the point (3, 0, 7) with normal n = (4, 2, −5).
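The point-normal equations of this example can be explored with a short Python sketch. It tests whether a point satisfies n · (P − P0) = 0 and multiplies out the equation to collect the constant term. The function names and the two test points (4, −13) and (4, 3, 9) are assumptions introduced here for illustration.

```python
def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_point_normal(n, p0, p):
    """True when the point p satisfies n . (p - p0) = 0."""
    diff = [pi - qi for pi, qi in zip(p, p0)]
    return dot(n, diff) == 0


def general_form(n, p0):
    """Coefficients of n . x + d = 0, where d = -(n . p0)."""
    return list(n) + [-dot(n, p0)]


# Line through (3, -7) with normal (6, 1): 6x + y - 11 = 0
print(general_form((6, 1), (3, -7)))                       # [6, 1, -11]
print(satisfies_point_normal((6, 1), (3, -7), (4, -13)))   # True

# Plane through (3, 0, 7) with normal (4, 2, -5): 4x + 2y - 5z + 23 = 0
print(general_form((4, 2, -5), (3, 0, 7)))                        # [4, 2, -5, 23]
print(satisfies_point_normal((4, 2, -5), (3, 0, 7), (4, 3, 9)))   # True
```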


When convenient, the terms in Equations (2) and (3) can be multiplied out and the constants combined. This leads to the following theorem.

THEOREM 3.3.1

(a) If a and b are constants that are not both zero, then an equation of the form

\[ ax + by + c = 0 \tag{4} \]

represents a line in R^2 with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

\[ ax + by + cz + d = 0 \tag{5} \]

represents a plane in R^3 with normal n = (a, b, c).

EXAMPLE 3 Vectors Orthogonal to Lines and Planes Through the Origin

(a) The equation ax + by = 0 represents a line through the origin in R^2. Show that the vector n1 = (a, b) formed from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line.
(b) The equation ax + by + cz = 0 represents a plane through the origin in R^3. Show that the vector n2 = (a, b, c) formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that lies in the plane.

Solution We will solve both problems together. The two equations can be written as

(a, b) · (x, y) = 0 and (a, b, c) · (x, y, z) = 0

or, alternatively, as

n1 · (x, y) = 0 and n2 · (x, y, z) = 0

These equations show that n1 is orthogonal to every vector (x, y) on the line and that n2 is orthogonal to every vector (x, y, z) in the plane (Figure 3.3.1).

Recall that

ax + by = 0 and ax + by + cz = 0

are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in the vector form

\[ \mathbf{n}\cdot\mathbf{x} = 0 \tag{6} \]

where n is the vector of coefficients and x is the vector of unknowns. In R^2 this is called the vector form of a line through the origin, and in R^3 it is called the vector form of a plane through the origin.

Referring to Table 1 of Section 3.2, in what other ways can you write (6) if n and x are expressed in matrix form?

Orthogonal Projections

In many applications it is necessary to “decompose” a vector u into a sum of two terms, one term being a scalar multiple of a specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in R 2 that are positioned so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):

• Drop a perpendicular from the tip of u to the line through a.
• Construct the vector w1 from Q to the foot of the perpendicular.
• Construct the vector w2 = u − w1.

Since

w1 + w2 = w1 + (u − w1) = u

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being orthogonal to a.

[Figure 3.3.2: Three possible cases, panels (a), (b), and (c), each showing u, a, w1, and w2 with common initial point Q.]

The following theorem shows that the foregoing results, which we illustrated using vectors in R^2, apply as well in R^n.

THEOREM 3.3.2 Projection Theorem

If u and a are vectors in R^n, and if a ≠ 0, then u can be expressed in exactly one way in the form u = w1 + w2, where w1 is a scalar multiple of a and w2 is orthogonal to a.

Proof Since the vector w1 is to be a scalar multiple of a, it must have the form

\[ \mathbf{w}_1 = k\mathbf{a} \tag{7} \]

Our goal is to find a value of the scalar k and a vector w2 that is orthogonal to a such that

\[ \mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{8} \]

We can determine k by using (7) to rewrite (8) as

\[ \mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 = k\mathbf{a} + \mathbf{w}_2 \]

and then applying Theorems 3.2.2 and 3.2.3 to obtain

\[ \mathbf{u}\cdot\mathbf{a} = (k\mathbf{a} + \mathbf{w}_2)\cdot\mathbf{a} = k\|\mathbf{a}\|^2 + (\mathbf{w}_2\cdot\mathbf{a}) \tag{9} \]

Since w2 is to be orthogonal to a, the last term in (9) must be 0, and hence k must satisfy the equation

\[ \mathbf{u}\cdot\mathbf{a} = k\|\mathbf{a}\|^2 \]

from which we obtain

\[ k = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2} \]

as the only possible value for k. The proof can be completed by rewriting (8) as

\[ \mathbf{w}_2 = \mathbf{u} - \mathbf{w}_1 = \mathbf{u} - k\mathbf{a} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \]

and then confirming that w2 is orthogonal to a by showing that w2 · a = 0 (we leave the details for you).


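The vectors w1 and w2 in the Projection Theorem have associated names: the vector w1 is called the orthogonal projection of u on a or sometimes the vector component of u along a, and the vector w2 is called the vector component of u orthogonal to a. The vector w1 is commonly denoted by the symbol proj_a u, in which case it follows from (8) that w2 = u − proj_a u. In summary,

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad (\text{vector component of } \mathbf{u} \text{ along } \mathbf{a}) \tag{10} \]

\[ \mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \mathbf{u} - \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \quad (\text{vector component of } \mathbf{u} \text{ orthogonal to } \mathbf{a}) \tag{11} \]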

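EXAMPLE 4 Orthogonal Projection on a Line

Find the orthogonal projections of the vectors e1 = (1, 0) and e2 = (0, 1) on the line L that makes an angle θ with the positive x-axis in R^2.

Solution As illustrated in Figure 3.3.3, a = (cos θ, sin θ) is a unit vector along the line L, so our first problem is to find the orthogonal projection of e1 along a. Since

\[ \|\mathbf{a}\| = \sqrt{\sin^2\theta + \cos^2\theta} = 1 \quad\text{and}\quad \mathbf{e}_1\cdot\mathbf{a} = (1, 0)\cdot(\cos\theta, \sin\theta) = \cos\theta \]

it follows from Formula (10) that this projection is

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{e}_1 = \frac{\mathbf{e}_1\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\cos\theta)(\cos\theta, \sin\theta) = (\cos^2\theta,\; \sin\theta\cos\theta) \]

Similarly, since e2 · a = (0, 1) · (cos θ, sin θ) = sin θ, it follows from Formula (10) that

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{e}_2 = \frac{\mathbf{e}_2\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = (\sin\theta)(\cos\theta, \sin\theta) = (\sin\theta\cos\theta,\; \sin^2\theta) \]

[Figure 3.3.3: the line L at angle θ to the positive x-axis, the unit vector (cos θ, sin θ) along L, and the vectors e1 = (1, 0) and e2 = (0, 1).]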

EXAMPLE 5 Vector Component of u Along a

Let u = (2, −1, 3) and a = (4, −1, 2). Find the vector component of u along a and the vector component of u orthogonal to a.

Solution

u · a = (2)(4) + (−1)(−1) + (3)(2) = 15
‖a‖^2 = 4^2 + (−1)^2 + 2^2 = 21

Thus the vector component of u along a is

\[ \operatorname{proj}_{\mathbf{a}}\mathbf{u} = \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} = \tfrac{15}{21}(4, -1, 2) = \left(\tfrac{20}{7},\; -\tfrac{5}{7},\; \tfrac{10}{7}\right) \]

and the vector component of u orthogonal to a is

\[ \mathbf{u} - \operatorname{proj}_{\mathbf{a}}\mathbf{u} = (2, -1, 3) - \left(\tfrac{20}{7},\; -\tfrac{5}{7},\; \tfrac{10}{7}\right) = \left(-\tfrac{6}{7},\; -\tfrac{2}{7},\; \tfrac{11}{7}\right) \]

As a check, you may wish to verify that the vectors u − proj_a u and a are perpendicular by showing that their dot product is zero.
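The suggested check can be carried out exactly with rational arithmetic. The following sketch applies Formulas (10) and (11) to the vectors of this example using Python's fractions module; the function name proj is a hypothetical helper chosen here.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def proj(u, a):
    """Orthogonal projection of u on a nonzero vector a, Formula (10)."""
    if not any(a):
        raise ValueError("a must be a nonzero vector")
    k = Fraction(dot(u, a), dot(a, a))  # k = (u . a) / ||a||^2
    return [k * ai for ai in a]


u = [2, -1, 3]
a = [4, -1, 2]

w1 = proj(u, a)                          # vector component of u along a
w2 = [ui - wi for ui, wi in zip(u, w1)]  # vector component of u orthogonal to a

print([str(x) for x in w1])  # ['20/7', '-5/7', '10/7']
print([str(x) for x in w2])  # ['-6/7', '-2/7', '11/7']
print(dot(w2, a))            # 0, so w2 is orthogonal to a
```

Exact fractions are used instead of floating point so that the dot product comes out as exactly zero rather than a small rounding error.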

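Sometimes we will be more interested in the norm of the vector component of u along a than in the vector component itself. A formula for this norm can be derived as follows:

\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \left\| \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2}\,\mathbf{a} \right\| = \left| \frac{\mathbf{u}\cdot\mathbf{a}}{\|\mathbf{a}\|^2} \right| \|\mathbf{a}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|^2}\,\|\mathbf{a}\| \]

where the second equality uses the fact that ‖kv‖ = |k|‖v‖ for every scalar k, and the third uses the fact that ‖a‖^2 > 0. Thus,

\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \frac{|\mathbf{u}\cdot\mathbf{a}|}{\|\mathbf{a}\|} \tag{12} \]

If θ denotes the angle between u and a, then u · a = ‖u‖‖a‖ cos θ, so (12) can also be written as

\[ \|\operatorname{proj}_{\mathbf{a}}\mathbf{u}\| = \|\mathbf{u}\|\,|\cos\theta| \]

[Figure: the projection of u on a has length ‖u‖ cos θ when (a) 0 ≤ θ < π/2 and length −‖u‖ cos θ when (b) π/2 < θ ≤ π.]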

THEOREM 4.3.3 Let S = {v1, v2, . . . , vr} be a set of vectors in R^n. If r > n, then S is linearly dependent.

Proof Suppose that

\[
\begin{aligned}
\mathbf{v}_1 &= (v_{11}, v_{12}, \ldots, v_{1n}) \\
\mathbf{v}_2 &= (v_{21}, v_{22}, \ldots, v_{2n}) \\
&\;\;\vdots \\
\mathbf{v}_r &= (v_{r1}, v_{r2}, \ldots, v_{rn})
\end{aligned}
\]

and consider the equation

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \]

If we express both sides of this equation in terms of components and then equate the corresponding components, we obtain the system

\[
\begin{aligned}
v_{11}k_1 + v_{21}k_2 + \cdots + v_{r1}k_r &= 0 \\
v_{12}k_1 + v_{22}k_2 + \cdots + v_{r2}k_r &= 0 \\
&\;\;\vdots \\
v_{1n}k_1 + v_{2n}k_2 + \cdots + v_{rn}k_r &= 0
\end{aligned}
\]

This is a homogeneous system of n equations in the r unknowns k1, . . . , kr. Since r > n, it follows from Theorem 1.2.2 that the system has nontrivial solutions. Therefore, S = {v1, v2, . . . , vr} is a linearly dependent set.

It follows from Theorem 4.3.3 that a set in R^2 with more than two vectors is linearly dependent and a set in R^3 with more than three vectors is linearly dependent.
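The argument in this proof can be illustrated computationally: a set of r vectors is linearly independent exactly when the matrix having those vectors as rows has rank r, and the rank of a matrix with n columns can never exceed n. The sketch below computes the rank by Gaussian elimination over exact fractions. The function names rank and is_linearly_independent are hypothetical, and the rank criterion is a standard fact used here as an assumption rather than something established at this point in the text.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via forward elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    ncols = len(M[0]) if M else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def is_linearly_independent(vectors):
    """True when the only solution of k1 v1 + ... + kr vr = 0 is trivial."""
    return rank(vectors) == len(vectors)


# Three vectors in R^2: r = 3 > n = 2, so the set must be dependent.
print(is_linearly_independent([(3, -1), (4, 5), (-4, 7)]))            # False
# The standard unit vectors in R^3 are independent.
print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))     # True
```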

CALCULUS REQUIRED

Linear Independence of Functions

Sometimes linear dependence of functions can be deduced from known identities. For example, the functions f1 = sin^2 x, f2 = cos^2 x, and f3 = 5 form a linearly dependent set in F(−∞, ∞), since the equation

\[ 5\mathbf{f}_1 + 5\mathbf{f}_2 - \mathbf{f}_3 = 5\sin^2 x + 5\cos^2 x - 5 = 5(\sin^2 x + \cos^2 x) - 5 = \mathbf{0} \]

expresses 0 as a linear combination of f1, f2, and f3 with coefficients that are not all zero. However, it is relatively rare that linear independence or dependence of functions can be ascertained by algebraic or trigonometric methods. To make matters worse, there is no general method for doing that either. That said, there does exist a theorem that can be useful for that purpose in certain cases. The following definition is needed for that theorem.

DEFINITION 2 If f1 = f1(x), f2 = f2(x), . . . , fn = fn(x) are functions that are n − 1 times differentiable on the interval (−∞, ∞), then the determinant

\[
W(x) = \begin{vmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x) \\
f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{vmatrix}
\]

is called the Wronskian of f1, f2, . . . , fn.

Historical Note Józef Hoëné de Wroński (1778–1853). The Polish-French mathematician Józef Hoëné de Wroński was born Józef Hoëné and adopted the name Wroński after he married. Wroński's life was fraught with controversy and conflict, which some say was due to psychopathic tendencies and his exaggeration of the importance of his own work. Although Wroński's work was dismissed as rubbish for many years, and much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived. Among other things, Wroński designed a caterpillar vehicle to compete with trains (though it was never manufactured) and did research on the famous problem of determining the longitude of a ship at sea. His final years were spent in poverty.

4.3 Linear Independence


Suppose for the moment that f1 = f1(x), f2 = f2(x), . . . , fn = fn(x) are linearly dependent vectors in C^(n−1)(−∞, ∞). This implies that the vector equation

\[ k_1\mathbf{f}_1 + k_2\mathbf{f}_2 + \cdots + k_n\mathbf{f}_n = \mathbf{0} \]

is satisfied by values of the coefficients k1, k2, . . . , kn that are not all zero, and for these coefficients the equation

\[ k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) = 0 \]

is satisfied for all x in (−∞, ∞). Using this equation together with those that result by differentiating it n − 1 times we obtain the linear system

\[
\begin{aligned}
k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) &= 0 \\
k_1 f_1'(x) + k_2 f_2'(x) + \cdots + k_n f_n'(x) &= 0 \\
&\;\;\vdots \\
k_1 f_1^{(n-1)}(x) + k_2 f_2^{(n-1)}(x) + \cdots + k_n f_n^{(n-1)}(x) &= 0
\end{aligned}
\]

Thus, the linear dependence of f1, f2, . . . , fn implies that the linear system

\[
\begin{bmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x) \\
f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{10}
\]

has a nontrivial solution for every x in the interval (−∞, ∞), and this in turn implies that the determinant of the coefficient matrix of (10) is zero for every such x. Since this determinant is the Wronskian of f1, f2, . . . , fn, we have established the following result.

THEOREM 4.3.4 If the functions f1, f2, . . . , fn have n − 1 continuous derivatives on the interval (−∞, ∞), and if the Wronskian of these functions is not identically zero on (−∞, ∞), then these functions form a linearly independent set of vectors in C^(n−1)(−∞, ∞).

WARNING The converse of Theorem 4.3.4 is false. If the Wronskian of f1, f2, . . . , fn is identically zero on (−∞, ∞), then no conclusion can be reached about the linear independence of {f1, f2, . . . , fn}; this set of vectors may be linearly independent or linearly dependent.

In Example 6 we showed that x and sin x are linearly independent functions by observing that neither is a scalar multiple of the other. The following example illustrates how to obtain the same result using the Wronskian (though it is a more complicated procedure in this particular case).

EXAMPLE 7 Linear Independence Using the Wronskian

Use the Wronskian to show that f1 = x and f2 = sin x are linearly independent vectors in C^∞(−∞, ∞).

Solution The Wronskian is

\[ W(x) = \begin{vmatrix} x & \sin x \\ 1 & \cos x \end{vmatrix} = x\cos x - \sin x \]

This function is not identically zero on the interval (−∞, ∞) since, for example,

\[ W\!\left(\frac{\pi}{2}\right) = \frac{\pi}{2}\cos\!\left(\frac{\pi}{2}\right) - \sin\!\left(\frac{\pi}{2}\right) = -1 \]

Thus, the functions are linearly independent.
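The evaluation of the Wronskian at a single point can be reproduced numerically. The sketch below assumes the derivatives are supplied by hand as Python callables; the helper names det and wronskian are hypothetical, and the determinant uses cofactor expansion along the first row, which is adequate only for small n.

```python
import math


def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def wronskian(derivs, x):
    """Evaluate W(x); derivs[i][k] is the k-th derivative of f_i as a callable."""
    n = len(derivs)
    # Row k holds the k-th derivatives of f_1, ..., f_n evaluated at x.
    return det([[derivs[i][k](x) for i in range(n)] for k in range(n)])


# f1 = x with f1' = 1, and f2 = sin x with f2' = cos x.
f1 = [lambda x: x, lambda x: 1.0]
f2 = [math.sin, math.cos]

w = wronskian([f1, f2], math.pi / 2)
print(round(w, 12))  # -1.0, which is nonzero, so the functions are independent
```

A single point where W is nonzero is enough to apply Theorem 4.3.4; a zero value at one point, by contrast, proves nothing.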


Chapter 4 General Vector Spaces

EXAMPLE 8 Linear Independence Using the Wronskian

Use the Wronskian to show that f1 = 1, f2 = e^x, and f3 = e^(2x) are linearly independent vectors in C^∞(−∞, ∞).

Solution The Wronskian is

\[
W(x) = \begin{vmatrix}
1 & e^x & e^{2x} \\
0 & e^x & 2e^{2x} \\
0 & e^x & 4e^{2x}
\end{vmatrix} = 2e^{3x}
\]

This function is never zero on (−∞, ∞), so f1, f2, and f3 form a linearly independent set.

We will close this section by proving Theorem 4.3.1.

OPTIONAL

Proof of Theorem 4.3.1 We will prove this theorem in the case where the set S has two or more vectors, and leave the case where S has only one vector as an exercise. Assume first that S is linearly independent. We will show that if the equation

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \tag{11} \]

can be satisfied with coefficients that are not all zero, then at least one of the vectors in S must be expressible as a linear combination of the others, thereby contradicting the assumption of linear independence. To be specific, suppose that k1 ≠ 0. Then we can rewrite (11) as

\[ \mathbf{v}_1 = \left(-\frac{k_2}{k_1}\right)\mathbf{v}_2 + \cdots + \left(-\frac{k_r}{k_1}\right)\mathbf{v}_r \]

which expresses v1 as a linear combination of the other vectors in S.

Conversely, we must show that if the only coefficients satisfying (11) are

k1 = 0, k2 = 0, . . . , kr = 0

then the vectors in S must be linearly independent. But if this were true of the coefficients and the vectors were not linearly independent, then at least one of them would be expressible as a linear combination of the others, say

v1 = c2 v2 + · · · + cr vr

which we can rewrite as

v1 + (−c2)v2 + · · · + (−cr)vr = 0

But this contradicts our assumption that (11) can only be satisfied by coefficients that are all zero. Thus, the vectors in S must be linearly independent.

Exercise Set 4.3

1. Explain why the following form linearly dependent sets of vectors. (Solve this problem by inspection.)
(a) u1 = (−1, 2, 4) and u2 = (5, −10, −20) in R^3
(b) u1 = (3, −1), u2 = (4, 5), u3 = (−4, 7) in R^2
(c) p1 = 3 − 2x + x^2 and p2 = 6 − 4x + 2x^2 in P2
(d) \( A = \begin{bmatrix} -3 & 4 \\ 2 & 0 \end{bmatrix} \) and \( B = \begin{bmatrix} 3 & -4 \\ -2 & 0 \end{bmatrix} \) in M22

2. In each part, determine whether the vectors are linearly independent or are linearly dependent in R^3.
(a) (−3, 0, 4), (5, −1, 2), (1, 1, 3)
(b) (−2, 0, 1), (3, 2, 5), (6, −1, 1), (7, 0, −2)

3. In each part, determine whether the vectors are linearly independent or are linearly dependent in R^4.
(a) (3, 8, 7, −3), (1, 5, 3, −1), (2, −1, 2, 6), (4, 2, 6, 4)
(b) (3, 0, −3, 6), (0, 2, 3, 1), (0, −2, −2, 0), (−2, 1, 2, 1)

whether the set {TA (u1 ), TA (u2 ), TA (u3 )} is linearly independent in R 3 .

4. In each part, determine whether the vectors are linearly independent or are linearly dependent in P2 .



(a) 2 − x + 4x 2 , 3 + 6x + 2x 2 , 2 + 10x − 4x 2



(a)



0

1

2

1

0

0

0

0

0

 (b)



1

, 

1

2

2

1



,





,



0

1

2

1

0

0

1

0

0

0



in M22



,

0

0

0

0

1

0

1

0

1

k





,

−1 k

0



1



,

2

0

1

3

1



(a) A = ⎣1

0

⎥ −3⎦

2

2

0

1

(b) A = ⎣1

1

⎥ −3⎦

2

2

0



1



1

15. Are the vectors v1 , v2 , and v3 in part (a) of the accompanying figure linearly independent? What about those in part (b)? Explain.



z

in M23

z v3

6. Determine all values of k for which the following matrices are linearly independent in M22 .



2



1



(b) 1 + 3x + 3x 2 , x + 4x 2 , 5 + 6x + 3x 2 , 7 + 2x − x 2 5. In each part, determine whether the matrices are linearly independent or dependent.


v3 v2



v2

v1

y v1

7. In each part, determine whether the three vectors lie in a plane in R 3 . (a) v1 = (2, −2, 0), v2 = (6, 1, 4), v3 = (2, 0, −4)

x

x

(a)

(b)

Figure Ex-15

(b) v1 = (−6, 7, 2), v2 = (3, 2, 4), v3 = (4, −1, 2)

8. In each part, determine whether the three vectors lie on the same line in R^3.
(a) v1 = (−1, 2, 3), v2 = (2, −4, −6), v3 = (−3, 6, 0)
(b) v1 = (2, −1, 4), v2 = (4, 2, 3), v3 = (2, 7, −6)
(c) v1 = (4, 6, 8), v2 = (2, 3, 4), v3 = (−2, −3, −4)

16. By using appropriate identities, where required, determine which of the following sets of vectors in F(−∞, ∞) are linearly dependent.
(a) 6, 3 sin^2 x, 2 cos^2 x
(b) x, cos x
(c) 1, sin x, sin 2x
(d) cos 2x, sin^2 x, cos^2 x
(e) (3 − x)^2, x^2 − 6x, 5
(f) 0, cos^3 πx, sin^5 3πx

9. (a) Show that the three vectors v1 = (0, 3, 1, −1), v2 = (6, 0, 5, 1), and v3 = (4, −7, 1, 3) form a linearly dependent set in R^4.
(b) Express each vector in part (a) as a linear combination of the other two.

10. (a) Show that the vectors v1 = (1, 2, 3, 4), v2 = (0, 1, 0, −1), and v3 = (1, 3, 3, 3) form a linearly dependent set in R^4.
(b) Express each vector in part (a) as a linear combination of the other two.

11. For which real values of λ do the following vectors form a linearly dependent set in R^3?

\[ \mathbf{v}_1 = \left(\lambda, -\tfrac12, -\tfrac12\right), \quad \mathbf{v}_2 = \left(-\tfrac12, \lambda, -\tfrac12\right), \quad \mathbf{v}_3 = \left(-\tfrac12, -\tfrac12, \lambda\right) \]

12. Under what conditions is a set with one vector linearly independent?

13. In each part, let T_A : R^2 → R^2 be multiplication by A, and let u1 = (1, 2) and u2 = (−1, 1). Determine whether the set {T_A(u1), T_A(u2)} is linearly independent in R^2.



(a) A =

1

−1

0

2





(b) A =

y

1

−1

−2

2



14. In each part, let TA : R 3 →R 3 be multiplication by A, and let u1 = (1, 0, 0), u2 = (2, −1, 1), and u3 = (0, 1, 1). Determine

17. (Calculus required) The functions

f1(x) = x and f2(x) = cos x

are linearly independent in F(−∞, ∞) because neither function is a scalar multiple of the other. Confirm the linear independence using the Wronskian.

18. (Calculus required) The functions

f1(x) = sin x and f2(x) = cos x

are linearly independent in F(−∞, ∞) because neither function is a scalar multiple of the other. Confirm the linear independence using the Wronskian.

19. (Calculus required) Use the Wronskian to show that the following sets of vectors are linearly independent.
(a) 1, x, e^x
(b) 1, x, x^2

20. (Calculus required) Use the Wronskian to show that the functions f1(x) = e^x, f2(x) = xe^x, and f3(x) = x^2 e^x are linearly independent vectors in C^∞(−∞, ∞).

21. (Calculus required) Use the Wronskian to show that the functions f1(x) = sin x, f2(x) = cos x, and f3(x) = x cos x are linearly independent vectors in C^∞(−∞, ∞).


22. Show that for any vectors u, v, and w in a vector space V, the vectors u − v, v − w, and w − u form a linearly dependent set.

23. (a) In Example 1 we showed that the mutually orthogonal vectors i, j, and k form a linearly independent set of vectors in R^3. Do you think that every set of three nonzero mutually orthogonal vectors in R^3 is linearly independent? Justify your conclusion with a geometric argument.
(b) Justify your conclusion with an algebraic argument. [Hint: Use dot products.]

Working with Proofs

24. Prove that if {v1, v2, v3} is a linearly independent set of vectors, then so are {v1, v2}, {v1, v3}, {v2, v3}, {v1}, {v2}, and {v3}.
25. Prove that if S = {v1, v2, . . . , vr} is a linearly independent set of vectors, then so is every nonempty subset of S.
26. Prove that if S = {v1, v2, v3} is a linearly dependent set of vectors in a vector space V, and v4 is any vector in V that is not in S, then {v1, v2, v3, v4} is also linearly dependent.
27. Prove that if S = {v1, v2, . . . , vr} is a linearly dependent set of vectors in a vector space V, and if vr+1, . . . , vn are any vectors in V that are not in S, then {v1, v2, . . . , vr, vr+1, . . . , vn} is also linearly dependent.
28. Prove that in P2 every set with more than three vectors is linearly dependent.
29. Prove that if {v1, v2} is linearly independent and v3 does not lie in span{v1, v2}, then {v1, v2, v3} is linearly independent.
30. Use part (a) of Theorem 4.3.1 to prove part (b).
31. Prove part (b) of Theorem 4.3.2.
32. Prove part (c) of Theorem 4.3.2.

True-False Exercises

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer.

(a) A set containing a single vector is linearly independent.
(b) The set of vectors {v, kv} is linearly dependent for every scalar k.
(c) Every linearly dependent set contains the zero vector.
(d) If the set of vectors {v1, v2, v3} is linearly independent, then {kv1, kv2, kv3} is also linearly independent for every nonzero scalar k.
(e) If v1, . . . , vn are linearly dependent nonzero vectors, then at least one vector vk is a unique linear combination of v1, . . . , vk−1.
(f) The set of 2 × 2 matrices that contain exactly two 1's and two 0's is a linearly independent set in M22.
(g) The three polynomials (x − 1)(x + 2), x(x + 2), and x(x − 1) are linearly independent.
(h) The functions f1 and f2 are linearly dependent if there is a real number x such that k1 f1(x) + k2 f2(x) = 0 for some scalars k1 and k2.

Working with Technology

T1. Devise three different methods for using your technology utility to determine whether a set of vectors in R^n is linearly independent, and then use each of those methods to determine whether the following vectors are linearly independent.

v1 = (4, −5, 2, 6), v2 = (2, −2, 1, 3), v3 = (6, −3, 3, 9), v4 = (4, −1, 5, 6)

T2. Show that S = {cos t, sin t, cos 2t, sin 2t} is a linearly independent set in C(−∞, ∞) by evaluating the left side of the equation

c1 cos t + c2 sin t + c3 cos 2t + c4 sin 2t = 0

at sufficiently many values of t to obtain a linear system whose only solution is c1 = c2 = c3 = c4 = 0.

4.4 Coordinates and Basis

We usually think of a line as being one-dimensional, a plane as two-dimensional, and the space around us as three-dimensional. It is the primary goal of this section and the next to make this intuitive notion of dimension precise. In this section we will discuss coordinate systems in general vector spaces and lay the groundwork for a precise definition of dimension in the next section.

Coordinate Systems in Linear Algebra

In analytic geometry one uses rectangular coordinate systems to create a one-to-one correspondence between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of real numbers (Figure 4.4.1). Although rectangular coordinate systems are common, they are not essential. For example, Figure 4.4.2 shows coordinate systems in 2-space and 3-space in which the coordinate axes are not mutually perpendicular.


[Figure 4.4.1: Coordinates of P in a rectangular coordinate system in 2-space, P(a, b), and in 3-space, P(a, b, c).]

[Figure 4.4.2: Coordinates of P in a nonrectangular coordinate system in 2-space, P(a, b), and in 3-space, P(a, b, c).]

In linear algebra coordinate systems are commonly specified using vectors rather than coordinate axes. For example, in Figure 4.4.3 we have re-created the coordinate systems in Figure 4.4.2 by using unit vectors to identify the positive directions and then attaching coordinates to a point P using the scalar coefficients in the equations

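$$
\overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2
\quad\text{and}\quad
\overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2 + c\mathbf{u}_3
$$

[Figure 4.4.3: The coordinate systems of Figure 4.4.2 re-created with unit vectors u1, u2 in 2-space and u1, u2, u3 in 3-space; the points P(a, b) and P(a, b, c) are located by the vectors au1, bu2, and cu3.]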

Units of measurement are essential ingredients of any coordinate system. In geometry problems one tries to use the same unit of measurement on all axes to avoid distorting the shapes of figures. This is less important in applications where coordinates represent physical quantities with diverse units (for example, time in seconds on one axis and temperature in degrees Celsius on another axis). To allow for this level of generality, we will relax the requirement that unit vectors be used to identify the positive directions and require only that those vectors be linearly independent. We will refer to these as the “basis vectors” for the coordinate system. In summary, it is the directions of the basis vectors that establish the positive directions, and it is the lengths of the basis vectors that establish the spacing between the integer points on the axes (Figure 4.4.4).

[Figure 4.4.4: Four coordinate systems in the xy-plane: equal spacing with perpendicular axes; unequal spacing with perpendicular axes; equal spacing with skew axes; unequal spacing with skew axes.]

Basis for a Vector Space

Our next goal is to extend the concepts of “basis vectors” and “coordinate systems” to general vector spaces, and for that purpose we will need some definitions. Vector spaces fall into two categories: A vector space V is said to be finite-dimensional if there is a finite set of vectors in V that spans V and is said to be infinite-dimensional if no such set exists.

DEFINITION 1 If S = {v1, v2, ..., vn} is a set of vectors in a finite-dimensional vector space V, then S is called a basis for V if:
(a) S spans V.
(b) S is linearly independent.

If you think of a basis as describing a coordinate system for a finite-dimensional vector space V, then part (a) of this definition guarantees that there are enough basis vectors to provide coordinates for all vectors in V, and part (b) guarantees that there is no interrelationship between the basis vectors. Here are some examples.

EXAMPLE 1 The Standard Basis for R^n

Recall from Example 11 of Section 4.2 that the standard unit vectors
e1 = (1, 0, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, 0, 0, ..., 1)
span R^n and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a basis for R^n that we call the standard basis for R^n. In particular,
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
is the standard basis for R^3.

EXAMPLE 2 The Standard Basis for Pn

Show that S = {1, x, x^2, ..., x^n} is a basis for the vector space Pn of polynomials of degree n or less.

Solution We must show that the polynomials in S are linearly independent and span Pn. Let us denote these polynomials by
p0 = 1, p1 = x, p2 = x^2, ..., pn = x^n
We showed in Example 13 of Section 4.2 that these vectors span Pn and in Example 4 of Section 4.3 that they are linearly independent. Thus, they form a basis for Pn that we call the standard basis for Pn.


EXAMPLE 3 Another Basis for R^3

Show that the vectors v1 = (1, 2, 1), v2 = (2, 9, 0), and v3 = (3, 3, 4) form a basis for R^3.

Solution We must show that these vectors are linearly independent and span R^3. To prove linear independence we must show that the vector equation

c1 v1 + c2 v2 + c3 v3 = 0    (1)

has only the trivial solution; and to prove that the vectors span R^3 we must show that every vector b = (b1, b2, b3) in R^3 can be expressed as

c1 v1 + c2 v2 + c3 v3 = b    (2)

By equating corresponding components on the two sides, these two equations can be expressed as the linear systems

$$
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= 0 \\
2c_1 + 9c_2 + 3c_3 &= 0 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= 0
\end{aligned}
\qquad\text{and}\qquad
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= b_1 \\
2c_1 + 9c_2 + 3c_3 &= b_2 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= b_3
\end{aligned}
\tag{3}
$$

(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous system has only the trivial solution and that the nonhomogeneous system is consistent for all values of b1, b2, and b3. But the two systems have the same coefficient matrix

$$
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix}
$$

so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same time by showing that det(A) ≠ 0. We leave it for you to confirm that det(A) = −1, which proves that the vectors v1, v2, and v3 form a basis for R^3.

From Examples 1 and 3 you can see that a vector space can have more than one basis.

EXAMPLE 4 The Standard Basis for Mmn

Show that the matrices

$$
M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad
M_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\quad
M_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\quad
M_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
$$

form a basis for the vector space M22 of 2 × 2 matrices.

Solution We must show that the matrices are linearly independent and span M22. To prove linear independence we must show that the equation

c1 M1 + c2 M2 + c3 M3 + c4 M4 = 0    (4)

has only the trivial solution, where 0 is the 2 × 2 zero matrix; and to prove that the matrices span M22 we must show that every 2 × 2 matrix

$$
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

can be expressed as

c1 M1 + c2 M2 + c3 M3 + c4 M4 = B    (5)

The matrix forms of Equations (4) and (5) are

$$
c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
$$

and

$$
c_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ c_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ c_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

which can be rewritten as

$$
\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

Since the first equation has only the trivial solution

c1 = c2 = c3 = c4 = 0

the matrices are linearly independent, and since the second equation has the solution

c1 = a, c2 = b, c3 = c, c4 = d

the matrices span M22. This proves that the matrices M1, M2, M3, M4 form a basis for M22. More generally, the mn different matrices whose entries are zero except for a single entry of 1 form a basis for Mmn called the standard basis for Mmn.

The simplest of all vector spaces is the zero vector space V = {0}. This space is finite-dimensional because it is spanned by the vector 0. However, it has no basis in the sense of Definition 1 because {0} is not a linearly independent set (why?). However, we will find it useful to define the empty set Ø to be a basis for this vector space.

EXAMPLE 5 An Infinite-Dimensional Vector Space

Show that the vector space P∞ of all polynomials with real coefficients is infinite-dimensional by showing that it has no finite spanning set.

Solution If there were a finite spanning set, say S = {p1, p2, ..., pr}, then the degrees of the polynomials in S would have a maximum value, say n; and this in turn would imply that any linear combination of the polynomials in S would have degree at most n. Thus, there would be no way to express the polynomial x^(n+1) as a linear combination of the polynomials in S, contradicting the fact that the vectors in S span P∞.

EXAMPLE 6 Some Finite- and Infinite-Dimensional Spaces

In Examples 1, 2, and 4 we found bases for R^n, Pn, and Mmn, so these vector spaces are finite-dimensional. We showed in Example 5 that the vector space P∞ is not spanned by finitely many vectors and hence is infinite-dimensional. Some other examples of infinite-dimensional vector spaces are R^∞, F(−∞, ∞), C(−∞, ∞), C^m(−∞, ∞), and C^∞(−∞, ∞).

Coordinates Relative to a Basis

Earlier in this section we drew an informal analogy between basis vectors and coordinate systems. Our next goal is to make this informal idea precise by defining the notion of a coordinate system in a general vector space. The following theorem will be our first step in that direction.

THEOREM 4.4.1 Uniqueness of Basis Representation

If S = {v1, v2, ..., vn} is a basis for a vector space V, then every vector v in V can be expressed in the form v = c1 v1 + c2 v2 + ··· + cn vn in exactly one way.

Proof Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination of the vectors in S, suppose that some vector v can be written as

v = c1 v1 + c2 v2 + ··· + cn vn

and also as

v = k1 v1 + k2 v2 + ··· + kn vn

Subtracting the second equation from the first gives

0 = (c1 − k1)v1 + (c2 − k2)v2 + ··· + (cn − kn)vn

Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies that

c1 − k1 = 0, c2 − k2 = 0, ..., cn − kn = 0

that is,

c1 = k1, c2 = k2, ..., cn = kn

Thus, the two expressions for v are the same.

We now have all of the ingredients required to define the notion of "coordinates" in a general vector space V. For motivation, observe that in R^3, for example, the coordinates (a, b, c) of a vector v are precisely the coefficients in the formula

v = ai + bj + ck

that expresses v as a linear combination of the standard basis vectors for R^3 (see Figure 4.4.5). The following definition generalizes this idea.

[Figure 4.4.5: The vector (a, b, c) in xyz-space, with components ai, bj, and ck along the standard basis vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).]

DEFINITION 2 If S = {v1, v2, ..., vn} is a basis for a vector space V, and

v = c1 v1 + c2 v2 + ··· + cn vn

is the expression for a vector v in terms of the basis S, then the scalars c1, c2, ..., cn are called the coordinates of v relative to the basis S. The vector (c1, c2, ..., cn) in R^n constructed from these coordinates is called the coordinate vector of v relative to S; it is denoted by

(v)S = (c1, c2, ..., cn)    (6)

Sometimes it will be desirable to write a coordinate vector as a column matrix or row matrix, in which case we will denote it with square brackets as [v]S. We will refer to this as the matrix form of the coordinate vector and (6) as the comma-delimited form.

Remark It is standard to regard two sets to be the same if they have the same members, even if those members are written in a different order. In particular, in a basis for a vector space V , which is a set of linearly independent vectors that span V , the order in which those vectors are listed does not generally matter. However, the order in which they are listed is critical for coordinate vectors, since changing the order of the basis vectors changes the coordinate vectors [for example, in R 2 the coordinate pair (1, 2) is not the same as the coordinate pair (2, 1)]. To deal with this complication, many authors define an ordered basis to be one in which the listing order of the basis vectors remains fixed. In all discussions involving coordinate vectors we will assume that the underlying basis is ordered, even though we may not say so explicitly.

Observe that (v)S is a vector in R n , so that once an ordered basis S is given for a vector space V , Theorem 4.4.1 establishes a one-to-one correspondence between vectors in V and vectors in R n (Figure 4.4.6).

[Figure 4.4.6: A one-to-one correspondence between a vector v in V and its coordinate vector (v)S in R^n.]

EXAMPLE 7 Coordinates Relative to the Standard Basis for R^n

In the special case where V = R^n and S is the standard basis, the coordinate vector (v)S and the vector v are the same; that is,

v = (v)S

For example, in R^3 the representation of a vector v = (a, b, c) as a linear combination of the vectors in the standard basis S = {i, j, k} is

v = ai + bj + ck

so the coordinate vector relative to this basis is (v)S = (a, b, c), which is the same as the vector v.

EXAMPLE 8 Coordinate Vectors Relative to Standard Bases

(a) Find the coordinate vector for the polynomial
p(x) = c0 + c1 x + c2 x^2 + ··· + cn x^n
relative to the standard basis for the vector space Pn.
(b) Find the coordinate vector of

$$
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

relative to the standard basis for M22.

Solution (a) The given formula for p(x) expresses this polynomial as a linear combination of the standard basis vectors S = {1, x, x^2, ..., x^n}. Thus, the coordinate vector for p relative to S is

(p)S = (c0, c1, c2, ..., cn)

Solution (b) We showed in Example 4 that the representation of a vector

$$
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
$$

as a linear combination of the standard basis vectors is

$$
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= a \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ b \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ d \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
$$

so the coordinate vector of B relative to S is

(B)S = (a, b, c, d)

EXAMPLE 9 Coordinates in R^3

(a) We showed in Example 3 that the vectors
v1 = (1, 2, 1), v2 = (2, 9, 0), v3 = (3, 3, 4)
form a basis for R^3. Find the coordinate vector of v = (5, −1, 9) relative to the basis S = {v1, v2, v3}.
(b) Find the vector v in R^3 whose coordinate vector relative to S is (v)S = (−1, 3, 2).

Solution (a) To find (v)S we must first express v as a linear combination of the vectors in S; that is, we must find values of c1, c2, and c3 such that

v = c1 v1 + c2 v2 + c3 v3

or, in terms of components,

(5, −1, 9) = c1(1, 2, 1) + c2(2, 9, 0) + c3(3, 3, 4)

Equating corresponding components gives

$$
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= 5 \\
2c_1 + 9c_2 + 3c_3 &= -1 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= 9
\end{aligned}
$$

Solving this system we obtain c1 = 1, c2 = −1, c3 = 2 (verify). Therefore,

(v)S = (1, −1, 2)

Solution (b) Using the definition of (v)S, we obtain

v = (−1)v1 + 3v2 + 2v3 = (−1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)
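The two computations in Example 9 can also be checked by machine. The following Python sketch is an illustration only, not a method prescribed by the text: the function names `solve_coordinates` and `from_coordinates` are hypothetical, and exact rational arithmetic from the standard `fractions` module is assumed so that no rounding occurs. The first function solves the system for c1, c2, c3 by Gauss-Jordan elimination; the second rebuilds a vector from its coordinates.

```python
from fractions import Fraction

def solve_coordinates(basis, v):
    """Return [c1, ..., cn] with v = c1*basis[0] + ... + cn*basis[n-1].

    Raises ValueError if the n given vectors are not a basis for R^n.
    (Hypothetical helper written for this illustration.)
    """
    n = len(v)
    if len(basis) != n:
        raise ValueError("a basis for R^n must contain exactly n vectors")
    # Augmented matrix [A | v]; the basis vectors are the COLUMNS of A.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are linearly dependent, so not a basis")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]          # make the pivot 1
        for r in range(n):                              # clear the rest of the column
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def from_coordinates(basis, coords):
    """Return the vector c1*basis[0] + ... + cn*basis[n-1]."""
    return tuple(sum(c * b[i] for c, b in zip(coords, basis))
                 for i in range(len(basis[0])))

S = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]
coords = solve_coordinates(S, (5, -1, 9))
print(*coords)                             # 1 -1 2
print(from_coordinates(S, (-1, 3, 2)))     # (11, 31, 7)
```

Placing the basis vectors in the columns of the coefficient matrix mirrors system (3) of Example 3, so a failed pivot search corresponds to det(A) = 0.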

Exercise Set 4.4 1. Use the method of Example 3 to show that the following set of vectors forms a basis for R 2 .

)

*

(2, 1), (3, 0)

2. Use the method of Example 3 to show that the following set of vectors forms a basis for R 3 .

*

)

3. Show that the following polynomials form a basis for P2 .

4. Show that the following polynomials form a basis for P3 .

5. Show that the following matrices form a basis for M22 . 3

6

3

−6

,

0

−1

−1

0





,

0

−12





−8 , −4

1

0

−1

2



6. Show that the following matrices form a basis for M22 .





1

1

1

1



,



1

−1

0

0



,



0

−1

1

0



,

1

0

0

0



7. In each part, show that the set of vectors is not a basis for R 3 .

)

*

)

*

(a) (2, −3, 1), (4, 1, 1), (0, −7, 1)

(b) (1, 6, 4), (2, 4, −1), (−1, 2, 5)

8. Show that the following vectors do not form a basis for P2 . 1 − 3x + 2 x 2 ,

1 + x + 4x 2 ,

0

1

1



,

2

−2

3

2





,

1

 −1

1

0



,

0

 −1

1

1

10. Let V be the space spanned by v1 = cos2 x , v2 = sin2 x , v3 = cos 2x . (b) Find a basis for V .

(a) u1 = (2, −4), u2 = (3, 8); w = (1, 1) (b) u1 = (1, 1), u2 = (0, 2); w = (a, b)

1 + x, 1 − x, 1 − x 2 , 1 − x 3





1

11. Find the coordinate vector of w relative to the basis S = {u1 , u2 } for R 2 .

x 2 + 1, x 2 − 1, 2x − 1





(a) Show that S = {v1 , v2 , v3 } is not a basis for V.

(3, 1, −4), (2, 5, 6), (1, 4, 8)



9. Show that the following matrices do not form a basis for M22 .

1 − 7x

12. Find the coordinate vector of w relative to the basis S = {u1 , u2 } for R 2 . (a) u1 = (1, −1), u2 = (1, 1); w = (1, 0) (b) u1 = (1, −1), u2 = (1, 1); w = (0, 1) 13. Find the coordinate vector of v relative to the basis S = {v1 , v2 , v3 } for R 3 . (a) v = (2, −1, 3); v1 = (1, 0, 0), v2 = (2, 2, 0), v3 = (3, 3, 3) (b) v = (5, −12, 3); v1 = (1, 2, 3), v2 = (−4, 5, 6), v3 = (7, −8, 9) 14. Find the coordinate vector of p relative to the basis S = {p1 , p2 , p3 } for P2 . (a) p = 4 − 3x + x 2 ; p1 = 1, p2 = x , p3 = x 2 (b) p = 2 − x + x 2 ; p1 = 1 + x , p2 = 1 + x 2 , p3 = x + x 2


In Exercises 15–16, first show that the set S = {A1 , A2 , A3 , A4 } is a basis for M22 , then express A as a linear combination of the vectors in S , and then find the coordinate vector of A relative to S .

15. A1 =

1 0 , A2 = 1 1

0 0

0 1 ; A= 1 1

1 1

0 1 , A2 = 0 0

0 1

0 6 ; A= 0 5

A4 = 16. A1 =

0 0







1 0 , A3 = 1 1



A4 =



1 1



2 3

(b) (1, 0)

(c) (0, 1)

(d) (a, b)

x' j and u2

1 1 , A3 = 0 0





(a) ( 3, 1) y and y'

0 , 1

and u2 . Find the x y -coordinates of the points whose xy coordinates are given.

u1

0 , 1

x

30° i

In Exercises 17–18, first show that the set S = {p1 , p2 , p3 } is a basis for P2 , then express p as a linear combination of the vectors in S , and then find the coordinate vector of p relative to S . 17. p1 = 1 + x + x 2 , p2 = x + x 2 , p3 = x 2 ; p = 7 − x + 2x 2

24. The accompanying figure shows a rectangular xy -coordinate system and an x y -coordinate system with skewed axes. Assuming that 1-unit scales are used on all the axes, find the x y coordinates of the points whose xy -coordinates are given. (a) (1, 1)

18. p1 = 1 + 2x + x 2 , p2 = 2 + 9x, p3 = 3 + 3x + 4x 2 ; p = 2 + 17x − 3x 2

Figure Ex-23

(b) (1, 0)

y

(c) (0, 1)

(d) (a, b)



19. In words, explain why the sets of vectors in parts (a) to (d) are not bases for the indicated vector spaces. (a) u1 = (1, 2), u2 = (0, 3), u3 = (1, 5) for R 2

45°

x and x´

(b) u1 = (−1, 3, 2), u2 = (6, 1, 1) for R 3

Figure Ex-24

(c) p1 = 1 + x + x 2 , p2 = x for P2



(d) A =



1

0

2

3

5

0

4

2

 D=



, B= 

6

0

−1

4



 , C=

3

0

1

7

 25. The first four Hermite polynomials [named for the French mathematician Charles Hermite (1822–1901)] are

,

1, 2t, −2 + 4t 2 , −12t + 8t 3

for M22

20. In any vector space a set that contains the zero vector must be linearly dependent. Explain why this is so. 21. In each part, let TA : R 3 →R 3 be multiplication by A, and let {e1 , e2 , e3 } be the standard basis for R 3 . Determine whether the set {TA (e1 ), TA (e2 ), TA (e3 )} is linearly independent in R 2 .



1

1

⎢ (a) A = ⎣ 0 −1

1



1

⎥ −3⎦

2

0



1

1

⎢ (b) A = ⎣ 0 −1

2

2

−1

(a) A = ⎣1

1

0

−1



0

⎤ ⎥

1⎦ 2



1

⎥ 1⎦

2

1



0

1

0

(b) A = ⎣1

0

1⎦

0

0

1



(a) Show that the first four Hermite polynomials form a basis for P3 . (b) Let B be the basis in part (a). Find the coordinate vector of the polynomial



22. In each part, let TA : R 3 →R 3 be multiplication by A, and let u = (1, −2, −1). Find the coordinate vector of TA (u) relative to the basis S = {(1, 1, 0), (0, 1, 1), (1, 1, 1)} for R 3 .



These polynomials have a wide variety of applications in physics and engineering.



23. The accompanying figure shows a rectangular xy -coordinate system determined by the unit basis vectors i and j and an x y -coordinate system determined by unit basis vectors u1

p(t) = −1 − 4t + 8t 2 + 8t 3 relative to B . 26. The first four Laguerre polynomials [named for the French mathematician Edmond Laguerre (1834–1886)] are 1, 1 − t, 2 − 4t + t 2 , 6 − 18t + 9t 2 − t 3 (a) Show that the first four Laguerre polynomials form a basis for P3 . (b) Let B be the basis in part (a). Find the coordinate vector of the polynomial p(t) = −10t + 9t 2 − t 3 relative to B .

27. Consider the coordinate vectors

$$
[\mathbf{w}]_S = \begin{bmatrix} 6 \\ -1 \\ 4 \end{bmatrix},\quad
[\mathbf{q}]_S = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix},\quad
[B]_S = \begin{bmatrix} -8 \\ 7 \\ 6 \\ 3 \end{bmatrix}
$$

(a) Find w if S is the basis in Exercise 2.
(b) Find q if S is the basis in Exercise 3.
(c) Find B if S is the basis in Exercise 5.

28. The basis that we gave for M22 in Example 4 consisted of noninvertible matrices. Do you think that there is a basis for M22 consisting of invertible matrices? Justify your answer.

Working with Proofs

29. Prove that R^∞ is an infinite-dimensional vector space.
30. Let TA : R^n → R^n be multiplication by an invertible matrix A, and let {u1, u2, ..., un} be a basis for R^n. Prove that {TA(u1), TA(u2), ..., TA(un)} is also a basis for R^n.
31. Prove that if V is a subspace of a vector space W and if V is infinite-dimensional, then so is W.

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.
(a) If V = span{v1, ..., vn}, then {v1, ..., vn} is a basis for V.
(b) Every linearly independent subset of a vector space V is a basis for V.
(c) If {v1, v2, ..., vn} is a basis for a vector space V, then every vector in V can be expressed as a linear combination of v1, v2, ..., vn.
(d) The coordinate vector of a vector x in R^n relative to the standard basis for R^n is x.
(e) Every basis of P4 contains at least one polynomial of degree 3 or less.

Working with Technology

T1. Let V be the subspace of P3 spanned by the vectors
p1 = 1 + 5x − 3x^2 − 11x^3, p2 = 7 + 4x − x^2 + 2x^3,
p3 = 5 + x + 9x^2 + 2x^3, p4 = 3 − x + 7x^2 + 5x^3
(a) Find a basis S for V.
(b) Find the coordinate vector of p = 19 + 18x − 13x^2 − 10x^3 relative to the basis S you obtained in part (a).

T2. Let V be the subspace of C^∞(−∞, ∞) spanned by the vectors in the set
B = {1, cos x, cos^2 x, cos^3 x, cos^4 x, cos^5 x}
and accept without proof that B is a basis for V. Confirm that the following vectors are in V, and find their coordinate vectors relative to B.
f0 = 1, f1 = cos x, f2 = cos 2x, f3 = cos 3x, f4 = cos 4x, f5 = cos 5x

4.5 Dimension

We showed in the previous section that the standard basis for R^n has n vectors and hence that the standard basis for R^3 has three vectors, the standard basis for R^2 has two vectors, and the standard basis for R^1 (= R) has one vector. Since we think of space as three-dimensional, a plane as two-dimensional, and a line as one-dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector space. We will develop this idea in this section.

Number of Vectors in a Basis

Our first goal in this section is to establish the following fundamental theorem.

THEOREM 4.5.1 All bases for a finite-dimensional vector space have the same number of vectors.

To prove this theorem we will need the following preliminary result, whose proof is deferred to the end of the section.

THEOREM 4.5.2 Let V be an n-dimensional vector space, and let {v1, v2, ..., vn} be any basis.
(a) If a set in V has more than n vectors, then it is linearly dependent.
(b) If a set in V has fewer than n vectors, then it does not span V.

We can now see rather easily why Theorem 4.5.1 is true; for if S = {v1, v2, ..., vn} is an arbitrary basis for V, then the linear independence of S implies that any set in V with more than n vectors is linearly dependent and any set in V with fewer than n vectors does not span V. Thus, unless a set in V has exactly n vectors it cannot be a basis.

We noted in the introduction to this section that for certain familiar vector spaces the intuitive notion of dimension coincides with the number of vectors in a basis. The following definition makes this idea precise.

DEFINITION 1 The dimension of a finite-dimensional vector space V is denoted by dim(V) and is defined to be the number of vectors in a basis for V. In addition, the zero vector space is defined to have dimension zero.

Engineers often use the term degrees of freedom as a synonym for dimension.

EXAMPLE 1 Dimensions of Some Familiar Vector Spaces

dim(R^n) = n [The standard basis has n vectors.]
dim(Pn) = n + 1 [The standard basis has n + 1 vectors.]
dim(Mmn) = mn [The standard basis has mn vectors.]

EXAMPLE 2 Dimension of Span(S)

If S = {v1, v2, ..., vr} then every vector in span(S) is expressible as a linear combination of the vectors in S. Thus, if the vectors in S are linearly independent, they automatically form a basis for span(S), from which we can conclude that

dim[span{v1, v2, ..., vr}] = r

In words, the dimension of the space spanned by a linearly independent set of vectors is equal to the number of vectors in that set.

EXAMPLE 3 Dimension of a Solution Space

Find a basis for and the dimension of the solution space of the homogeneous system

$$
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
$$

Solution In Example 6 of Section 1.2 we found the solution of this system to be

x1 = −3r − 4s − 2t, x2 = r, x3 = −2s, x4 = s, x5 = t, x6 = 0

which can be written in vector form as

(x1, x2, x3, x4, x5, x6) = (−3r − 4s − 2t, r, −2s, s, t, 0)

or, alternatively, as

(x1, x2, x3, x4, x5, x6) = r(−3, 1, 0, 0, 0, 0) + s(−4, 0, −2, 1, 0, 0) + t(−2, 0, 0, 0, 1, 0)

This shows that the vectors

v1 = (−3, 1, 0, 0, 0, 0), v2 = (−4, 0, −2, 1, 0, 0), v3 = (−2, 0, 0, 0, 1, 0)

span the solution space. We leave it for you to check that these vectors are linearly independent by showing that none of them is a linear combination of the other two (but see the remark that follows). Thus, the solution space has dimension 3.

Remark It can be shown that for any homogeneous linear system, the method of the last example always produces a basis for the solution space of the system. We omit the formal proof.
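The method of Example 3 can be sketched in code. The following Python illustration is an assumption-laden sketch rather than the text's own procedure: the helper names `rref` and `null_space_basis` are hypothetical, exact arithmetic with the standard `fractions` module is assumed, and the free variables are taken to be the non-pivot columns of the reduced row echelon form. Setting one free variable to 1 and the others to 0 in turn reproduces the vectors v1, v2, v3 found above.

```python
from fractions import Fraction

def rref(matrix):
    """Return (reduced row echelon form, list of pivot column indices)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots = []
    r = 0                                   # index of the next pivot row
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot here: column c is free
        rows[r], rows[pivot] = rows[pivot], rows[r]
        p = rows[r][c]
        rows[r] = [x / p for x in rows[r]]  # leading 1
        for i in range(len(rows)):          # zeros above and below the leading 1
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

def null_space_basis(matrix):
    """Return one basis vector of the solution space of Ax = 0 per free variable."""
    rows, pivots = rref(matrix)
    ncols = len(matrix[0])
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * ncols
        vec[f] = Fraction(1)                # this free variable is 1, the others 0
        for r, p in enumerate(pivots):
            vec[p] = -rows[r][f]            # solve row r for its leading variable
        basis.append(vec)
    return basis

# Coefficient matrix of the homogeneous system in Example 3.
A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]

for vec in null_space_basis(A):
    print(", ".join(str(x) for x in vec))
# -3, 1, 0, 0, 0, 0
# -4, 0, -2, 1, 0, 0
# -2, 0, 0, 0, 1, 0
```

The number of vectors returned equals the number of free variables, which is the dimension of the solution space (3 in this example).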

Some Fundamental Theorems

We will devote the remainder of this section to a series of theorems that reveal the subtle interrelationships among the concepts of linear independence, spanning sets, basis, and dimension. These theorems are not simply exercises in mathematical theory—they are essential to the understanding of vector spaces and the applications that build on them. We will start with a theorem (proved at the end of this section) that is concerned with the effect on linear independence and spanning if a vector is added to or removed from a nonempty set of vectors. Informally stated, if you start with a linearly independent set S and adjoin to it a vector that is not a linear combination of those already in S , then the enlarged set will still be linearly independent. Also, if you start with a set S of two or more vectors in which one of the vectors is a linear combination of the others, then that vector can be removed from S without affecting span(S ) (Figure 4.5.1).

[Figure 4.5.1: Three panels. The vector outside the plane can be adjoined to the other two without affecting their linear independence. Any of the vectors can be removed, and the remaining two will still span the plane. Either of the collinear vectors can be removed, and the remaining two will still span the plane.]

THEOREM 4.5.3 Plus/Minus Theorem

Let S be a nonempty set of vectors in a vector space V. (a) If S is a linearly independent set, and if v is a vector in V that is outside of span(S), then the set S ∪ {v} that results by inserting v into S is still linearly independent. (b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if S − {v} denotes the set obtained by removing v from S, then S and S − {v} span the same space; that is, span(S) = span(S − {v})


EXAMPLE 4 Applying the Plus/Minus Theorem

Show that p1 = 1 − x^2, p2 = 2 − x^2, and p3 = x^3 are linearly independent vectors.

Solution The set S = {p1, p2} is linearly independent since neither vector in S is a scalar multiple of the other. Since the vector p3 cannot be expressed as a linear combination of the vectors in S (why?), it can be adjoined to S to produce a linearly independent set S ∪ {p3} = {p1, p2, p3}.
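The conclusion of Example 4 can be cross-checked numerically. The sketch below is an illustration under stated assumptions, not part of the example: each polynomial is replaced by its coordinate vector relative to the standard basis {1, x, x^2, x^3} of P3, and a hypothetical helper `rank` counts the pivots found by Gaussian elimination. A set of vectors is linearly independent exactly when its rank equals the number of vectors, and adjoining p3 raises the rank precisely because p3 lies outside span{p1, p2}.

```python
from fractions import Fraction

def rank(vectors):
    """Number of pivots found by Gaussian elimination on the given row vectors."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):      # eliminate below the pivot
            f = rows[i][c] / rows[rk][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

# Coordinate vectors relative to the standard basis {1, x, x^2, x^3} of P3.
p1 = (1, 0, -1, 0)   # 1 - x^2
p2 = (2, 0, -1, 0)   # 2 - x^2
p3 = (0, 0, 0, 1)    # x^3

print(rank([p1, p2]))       # 2: {p1, p2} is linearly independent
print(rank([p1, p2, p3]))   # 3: adjoining p3 keeps the set independent
```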

In general, to show that a set of vectors {v1, v2, ..., vn} is a basis for a vector space V, one must show that the vectors are linearly independent and span V. However, if we happen to know that V has dimension n (so that {v1, v2, ..., vn} contains the right number of vectors for a basis), then it suffices to check either linear independence or spanning; the remaining condition will hold automatically. This is the content of the following theorem.

THEOREM 4.5.4 Let V be an n-dimensional vector space, and let S be a set in V with exactly n vectors. Then S is a basis for V if and only if S spans V or S is linearly independent.

Proof Assume that S has exactly n vectors and spans V. To prove that S is a basis, we must show that S is a linearly independent set. But if this is not so, then some vector v in S is a linear combination of the remaining vectors. If we remove this vector from S, then it follows from Theorem 4.5.3(b) that the remaining set of n − 1 vectors still spans V. But this is impossible since Theorem 4.5.2(b) states that no set with fewer than n vectors can span an n-dimensional vector space. Thus S is linearly independent.

Assume that S has exactly n vectors and is a linearly independent set. To prove that S is a basis, we must show that S spans V. But if this is not so, then there is some vector v in V that is not in span(S). If we insert this vector into S, then it follows from Theorem 4.5.3(a) that this set of n + 1 vectors is still linearly independent. But this is impossible, since Theorem 4.5.2(a) states that no set with more than n vectors in an n-dimensional vector space can be linearly independent. Thus S spans V.

EXAMPLE 5 Bases by Inspection

(a) Explain why the vectors v1 = (−3, 7) and v2 = (5, 5) form a basis for R^2.
(b) Explain why the vectors v1 = (2, 0, −1), v2 = (4, 0, 7), and v3 = (−1, 1, 4) form a basis for R^3.

Solution (a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly independent set in the two-dimensional space R^2, and hence they form a basis by Theorem 4.5.4.

Solution (b) The vectors v1 and v2 form a linearly independent set in the xz-plane (why?). The vector v3 is outside of the xz-plane, so the set {v1, v2, v3} is also linearly independent. Since R^3 is three-dimensional, Theorem 4.5.4 implies that {v1, v2, v3} is a basis for the vector space R^3.
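As a cross-check on the inspection arguments in Example 5, one could also compute determinants in the manner of Example 3 of Section 4.4. This computation is an added illustration, not part of the example; it assumes the vectors are placed in the columns of a matrix, and a nonzero determinant then confirms linear independence.

```latex
% Part (a): columns v1 = (-3, 7), v2 = (5, 5)
\det\begin{bmatrix} -3 & 5 \\ 7 & 5 \end{bmatrix}
  = (-3)(5) - (5)(7) = -50 \neq 0

% Part (b): columns v1 = (2, 0, -1), v2 = (4, 0, 7), v3 = (-1, 1, 4);
% cofactor expansion along the second row, whose only nonzero entry is in column 3
\det\begin{bmatrix} 2 & 4 & -1 \\ 0 & 0 & 1 \\ -1 & 7 & 4 \end{bmatrix}
  = (-1)^{2+3}(1)\det\begin{bmatrix} 2 & 4 \\ -1 & 7 \end{bmatrix}
  = -\bigl((2)(7) - (4)(-1)\bigr) = -18 \neq 0
```

Both determinants are nonzero, in agreement with the conclusions reached by inspection.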

The next theorem (whose proof is deferred to the end of this section) reveals two important facts about the vectors in a finite-dimensional vector space V :


1. Every spanning set for a subspace is either a basis for that subspace or has a basis as a subset. 2. Every linearly independent set in a subspace is either a basis for that subspace or can be extended to a basis for it.

THEOREM 4.5.5 Let S be a finite set of vectors in a finite-dimensional vector space V.

(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate vectors from S . (b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis for V by inserting appropriate vectors into S .

We conclude this section with a theorem that relates the dimension of a vector space to the dimensions of its subspaces.

THEOREM 4.5.6 If W is a subspace of a finite-dimensional vector space V, then:

(a) W is finite-dimensional.
(b) dim(W) ≤ dim(V).
(c) W = V if and only if dim(W) = dim(V).

Proof (a) We will leave the proof of this part as an exercise.

Proof (b) Part (a) shows that W is finite-dimensional, so it has a basis

S = {w1, w2, . . . , wm}

Either S is also a basis for V or it is not. If so, then dim(V) = m, which means that dim(V) = dim(W). If not, then because S is a linearly independent set it can be enlarged to a basis for V by part (b) of Theorem 4.5.5. But this implies that dim(W) < dim(V), so we have shown that dim(W) ≤ dim(V) in all cases.

Proof (c) Assume that dim(W) = dim(V) and that

S = {w1, w2, . . . , wm}

is a basis for W. If S is not also a basis for V, then, being linearly independent, S can be extended to a basis for V by part (b) of Theorem 4.5.5. But this would mean that dim(V) > dim(W), which contradicts our hypothesis. Thus S must also be a basis for V, which means that W = V. The converse is obvious.

Figure 4.5.2 illustrates the geometric relationship between the subspaces of R^3 in order of increasing dimension.

Figure 4.5.2 The origin (0-dimensional); a line through the origin (1-dimensional); a plane through the origin (2-dimensional); R^3 (3-dimensional).

OPTIONAL

We conclude this section with optional proofs of Theorems 4.5.2, 4.5.3, and 4.5.5.

Proof of Theorem 4.5.2 (a) Let $S' = \{w_1, w_2, \ldots, w_m\}$ be any set of $m$ vectors in $V$, where $m > n$. We want to show that $S'$ is linearly dependent. Since $S = \{v_1, v_2, \ldots, v_n\}$ is a basis, each $w_i$ can be expressed as a linear combination of the vectors in $S$, say

\[
\begin{aligned}
w_1 &= a_{11}v_1 + a_{21}v_2 + \cdots + a_{n1}v_n \\
w_2 &= a_{12}v_1 + a_{22}v_2 + \cdots + a_{n2}v_n \\
&\;\;\vdots \\
w_m &= a_{1m}v_1 + a_{2m}v_2 + \cdots + a_{nm}v_n
\end{aligned}
\tag{1}
\]

To show that $S'$ is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero, such that

\[
k_1w_1 + k_2w_2 + \cdots + k_mw_m = 0 \tag{2}
\]

We leave it for you to verify that the equations in (1) can be rewritten in the partitioned form

\[
[w_1 \mid w_2 \mid \cdots \mid w_m] = [v_1 \mid v_2 \mid \cdots \mid v_n]
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
\tag{3}
\]

Since $m > n$, the linear system

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{4}
\]

has more unknowns than equations and hence has a nontrivial solution

\[
x_1 = k_1, \quad x_2 = k_2, \quad \ldots, \quad x_m = k_m
\]

Creating a column vector from this solution and multiplying both sides of (3) on the right by this vector yields

\[
[w_1 \mid w_2 \mid \cdots \mid w_m]
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix}
= [v_1 \mid v_2 \mid \cdots \mid v_n]
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix}
\]

By (4), this simplifies to

\[
[w_1 \mid w_2 \mid \cdots \mid w_m]
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{bmatrix}
= \mathbf{0}
\]

which we can rewrite as

\[
k_1w_1 + k_2w_2 + \cdots + k_mw_m = 0
\]

Since the scalar coefficients in this equation are not all zero, we have proved that $S' = \{w_1, w_2, \ldots, w_m\}$ is linearly dependent.
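The argument in this proof is constructive when V = R^n: a nontrivial solution of the homogeneous system whose coefficient columns are the vectors w_1, ..., w_m gives the scalars k_1, ..., k_m. The following sketch illustrates that idea under the assumption that the vectors are given as tuples of numbers; the function names `rref` and `dependency` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # make the pivot equal to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def dependency(vectors):
    """Scalars k_1..k_m, not all zero, with k_1 w_1 + ... + k_m w_m = 0, or None."""
    m = len(vectors)
    rows = [list(row) for row in zip(*vectors)]     # the vectors become columns
    red, pivots = rref(rows)
    free = [j for j in range(m) if j not in pivots]
    if not free:
        return None                                 # linearly independent set
    k = [Fraction(0)] * m
    k[free[0]] = Fraction(1)                        # set one free variable to 1
    for row, p in zip(red, pivots):
        k[p] = -row[free[0]]                        # solve for the leading variables
    return k

# Three vectors in R^2 (m = 3 > n = 2) must be linearly dependent.
print(dependency([(1, 0), (0, 1), (2, 3)]))
```

When m > n there are at most n pivot columns, so a free variable always exists, which mirrors the counting step in the proof.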


The proof of Theorem 4.5.2(b) closely parallels that of Theorem 4.5.2(a) and will be omitted.

Proof of Theorem 4.5.3 (a) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set of vectors in $V$, and $v$ is a vector in $V$ that is outside of span($S$). To show that $S' = \{v_1, v_2, \ldots, v_r, v\}$ is a linearly independent set, we must show that the only scalars that satisfy

\[
k_1v_1 + k_2v_2 + \cdots + k_rv_r + k_{r+1}v = 0 \tag{5}
\]

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$, for otherwise we could solve (5) for $v$ as a linear combination of $v_1, v_2, \ldots, v_r$, contradicting the assumption that $v$ is outside of span($S$). Thus, (5) simplifies to

\[
k_1v_1 + k_2v_2 + \cdots + k_rv_r = 0 \tag{6}
\]

which, by the linear independence of $\{v_1, v_2, \ldots, v_r\}$, implies that

\[
k_1 = k_2 = \cdots = k_r = 0
\]

Proof of Theorem 4.5.3 (b) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a set of vectors in $V$, and (to be specific) suppose that $v_r$ is a linear combination of $v_1, v_2, \ldots, v_{r-1}$, say

\[
v_r = c_1v_1 + c_2v_2 + \cdots + c_{r-1}v_{r-1} \tag{7}
\]

We want to show that if $v_r$ is removed from $S$, then the remaining set of vectors $\{v_1, v_2, \ldots, v_{r-1}\}$ still spans span($S$); that is, we must show that every vector $w$ in span($S$) is expressible as a linear combination of $\{v_1, v_2, \ldots, v_{r-1}\}$. But if $w$ is in span($S$), then $w$ is expressible in the form

\[
w = k_1v_1 + k_2v_2 + \cdots + k_{r-1}v_{r-1} + k_rv_r
\]

or, on substituting (7),

\[
w = k_1v_1 + k_2v_2 + \cdots + k_{r-1}v_{r-1} + k_r(c_1v_1 + c_2v_2 + \cdots + c_{r-1}v_{r-1})
\]

which expresses $w$ as a linear combination of $v_1, v_2, \ldots, v_{r-1}$.

Proof of Theorem 4.5.5 (a) If $S$ is a set of vectors that spans $V$ but is not a basis for $V$, then $S$ is a linearly dependent set. Thus some vector $v$ in $S$ is expressible as a linear combination of the other vectors in $S$. By the Plus/Minus Theorem (4.5.3b), we can remove $v$ from $S$, and the resulting set $S'$ will still span $V$. If $S'$ is linearly independent, then $S'$ is a basis for $V$, and we are done. If $S'$ is linearly dependent, then we can remove some appropriate vector from $S'$ to produce a set $S''$ that still spans $V$. We can continue removing vectors in this way until we finally arrive at a set of vectors in $S$ that is linearly independent and spans $V$. This subset of $S$ is a basis for $V$.

Proof of Theorem 4.5.5 (b) Suppose that dim($V$) = $n$. If $S$ is a linearly independent set that is not already a basis for $V$, then $S$ fails to span $V$, so there is some vector $v$ in $V$ that is not in span($S$). By the Plus/Minus Theorem (4.5.3a), we can insert $v$ into $S$, and the resulting set $S'$ will still be linearly independent. If $S'$ spans $V$, then $S'$ is a basis for $V$, and we are finished. If $S'$ does not span $V$, then we can insert an appropriate vector into $S'$ to produce a set $S''$ that is still linearly independent. We can continue inserting vectors in this way until we reach a set with $n$ linearly independent vectors in $V$. This set will be a basis for $V$ by Theorem 4.5.4.
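For vectors in R^n, the removal and insertion processes in these proofs can be carried out mechanically. The sketch below is one possible realization, not the procedure of the text: it assumes that "is a linear combination of the vectors kept so far" is tested by comparing ranks, and that candidates for insertion are taken from the standard basis. The names `rank`, `reduce_to_basis`, and `extend_to_basis` are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    m = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def reduce_to_basis(spanning_set):
    """Keep a vector only if it is not a linear combination of those already kept."""
    basis = []
    for v in spanning_set:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

def extend_to_basis(independent_set, n):
    """Insert standard basis vectors of R^n that lie outside the current span."""
    standard = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    return reduce_to_basis(list(independent_set) + standard)

print(reduce_to_basis([(1, 2), (2, 4), (0, 1)]))
print(extend_to_basis([(1, 1, 0)], 3))
```

Discarding a vector that depends on the kept ones leaves the span unchanged, as in Theorem 4.5.3(b), and inserting a vector outside the span preserves independence, as in Theorem 4.5.3(a).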

Exercise Set 4.5

In Exercises 1–6, find a basis for the solution space of the homogeneous linear system, and find the dimension of that space.

1. x1 + x2 − x3 = 0
   −2x1 − x2 + 2x3 = 0
   −x1 + x3 = 0

2. 3x1 + x2 + x3 + x4 = 0
   5x1 − x2 + x3 − x4 = 0

3. 2x1 + x2 + 3x3 = 0
   x1 + 5x3 = 0
   x2 + x3 = 0

4. x1 − 4x2 + 3x3 − x4 = 0
   2x1 − 8x2 + 6x3 − 2x4 = 0

5. x1 − 3x2 + x3 = 0
   2x1 − 6x2 + 2x3 = 0
   3x1 − 9x2 + 3x3 = 0

6. x + y + z = 0
   3x + 2y − 2z = 0
   4x + 3y − z = 0
   6x + 5y + z = 0

7. In each part, find a basis for the given subspace of R^3, and state its dimension.
(a) The plane 3x − 2y + 5z = 0.
(b) The plane x − y = 0.
(c) The line x = 2t, y = −t, z = 4t.
(d) All vectors of the form (a, b, c), where b = a + c.

8. In each part, find a basis for the given subspace of R^4, and state its dimension.
(a) All vectors of the form (a, b, c, 0).
(b) All vectors of the form (a, b, c, d), where d = a + b and c = a − b.
(c) All vectors of the form (a, b, c, d), where a = b = c = d.

9. Find the dimension of each of the following vector spaces.
(a) The vector space of all diagonal n × n matrices.
(b) The vector space of all symmetric n × n matrices.
(c) The vector space of all upper triangular n × n matrices.

10. Find the dimension of the subspace of P3 consisting of all polynomials a0 + a1 x + a2 x^2 + a3 x^3 for which a0 = 0.

11. (a) Show that the set W of all polynomials in P2 such that p(1) = 0 is a subspace of P2.
(b) Make a conjecture about the dimension of W.
(c) Confirm your conjecture by finding a basis for W.

12. Find a standard basis vector for R^3 that can be added to the set {v1, v2} to produce a basis for R^3.
(a) v1 = (−1, 2, 3), v2 = (1, −2, −2)
(b) v1 = (1, −1, 0), v2 = (3, 1, −2)

13. Find standard basis vectors for R^4 that can be added to the set {v1, v2} to produce a basis for R^4.
v1 = (1, −4, 2, −3), v2 = (−3, 8, −4, 6)

14. Let {v1, v2, v3} be a basis for a vector space V. Show that {u1, u2, u3} is also a basis, where u1 = v1, u2 = v1 + v2, and u3 = v1 + v2 + v3.

15. The vectors v1 = (1, −2, 3) and v2 = (0, 5, −3) are linearly independent. Enlarge {v1, v2} to a basis for R^3.

16. The vectors v1 = (1, 0, 0, 0) and v2 = (1, 1, 0, 0) are linearly independent. Enlarge {v1, v2} to a basis for R^4.

17. Find a basis for the subspace of R^3 that is spanned by the vectors
v1 = (1, 0, 0), v2 = (1, 0, 1), v3 = (2, 0, 1), v4 = (0, 0, −1)

18. Find a basis for the subspace of R^4 that is spanned by the vectors
v1 = (1, 1, 1, 1), v2 = (2, 2, 2, 0), v3 = (0, 0, 0, 3), v4 = (3, 3, 3, 4)

19. In each part, let T_A : R^3 → R^3 be multiplication by A and find the dimension of the subspace of R^3 consisting of all vectors x for which T_A(x) = 0.

\[
\text{(a) } A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\qquad
\text{(b) } A = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 0 \end{bmatrix}
\qquad
\text{(c) } A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\]

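20. In each part, let T_A be multiplication by A and find the dimension of the subspace of R^4 consisting of all vectors x for which T_A(x) = 0.

\[
\text{(a) } A = \begin{bmatrix} 1 & 0 & 2 & -1 \\ -1 & 4 & 0 & 0 \end{bmatrix}
\qquad
\text{(b) } A = \begin{bmatrix} 0 & 0 & 1 & 1 \\ -1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}
\]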

Working with Proofs

21. (a) Prove that for every positive integer n, one can find n + 1 linearly independent vectors in F(−∞, ∞). [Hint: Look for polynomials.]
(b) Use the result in part (a) to prove that F(−∞, ∞) is infinite-dimensional.
(c) Prove that C(−∞, ∞), C^m(−∞, ∞), and C^∞(−∞, ∞) are infinite-dimensional.

22. Let S be a basis for an n-dimensional vector space V. Prove that if v1, v2, . . . , vr form a linearly independent set of vectors in V, then the coordinate vectors (v1)_S, (v2)_S, . . . , (vr)_S form a linearly independent set in R^n, and conversely.

23. Let S = {v1, v2, . . . , vr} be a nonempty set of vectors in an n-dimensional vector space V. Prove that if the vectors in S span V, then the coordinate vectors (v1)_S, (v2)_S, . . . , (vr)_S span R^n, and conversely.

24. Prove part (a) of Theorem 4.5.6.

25. Prove: A subspace of a finite-dimensional vector space is finite-dimensional.

26. State the two parts of Theorem 4.5.2 in contrapositive form.

27. In each part, let S be the standard basis for P2. Use the results proved in Exercises 22 and 23 to find a basis for the subspace of P2 spanned by the given vectors.
(a) −1 + x − 2x^2, 3 + 3x + 6x^2, 9
(b) 1 + x, x^2, 2 + 2x + 3x^2
(c) 1 + x − 3x^2, 2 + 2x − 6x^2, 3 + 3x − 9x^2

True-False Exercises

TF. In parts (a)–(k) determine whether the statement is true or false, and justify your answer.
(a) The zero vector space has dimension zero.
(b) There is a set of 17 linearly independent vectors in R^17.
(c) There is a set of 11 vectors that span R^17.
(d) Every linearly independent set of five vectors in R^5 is a basis for R^5.
(e) Every set of five vectors that spans R^5 is a basis for R^5.
(f) Every set of vectors that spans R^n contains a basis for R^n.
(g) Every linearly independent set of vectors in R^n is contained in some basis for R^n.
(h) There is a basis for M22 consisting of invertible matrices.
(i) If A has size n × n and I_n, A, A^2, . . . , A^(n^2) are distinct matrices, then {I_n, A, A^2, . . . , A^(n^2)} is a linearly dependent set.
(j) There are at least two distinct three-dimensional subspaces of P2.
(k) There are only three distinct two-dimensional subspaces of P2.

Working with Technology

T1. Devise three different procedures for using your technology utility to determine the dimension of the subspace spanned by a set of vectors in R^n, and then use each of those procedures to determine the dimension of the subspace of R^5 spanned by the vectors
v1 = (2, 2, −1, 0, 1), v2 = (−1, −1, 2, −3, 1), v3 = (1, 1, −2, 0, −1), v4 = (0, 0, 1, 1, 1)

T2. Find a basis for the row space of A by starting at the top and successively removing each row that is a linear combination of its predecessors.

\[
A = \begin{bmatrix}
3.4 & 2.2 & 1.0 & -1.8 \\
2.1 & 3.6 & 4.0 & -3.4 \\
8.9 & 8.0 & 6.0 & 7.0 \\
7.6 & 9.4 & 9.0 & -8.6 \\
1.0 & 2.2 & 0.0 & 2.2
\end{bmatrix}
\]

4.6 Change of Basis

A basis that is suitable for one problem may not be suitable for another, so it is a common process in the study of vector spaces to change from one basis to another. Because a basis is the vector space generalization of a coordinate system, changing bases is akin to changing coordinate axes in R^2 and R^3. In this section we will study problems related to changing bases.

Coordinate Maps

If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for a finite-dimensional vector space $V$, and if

\[
(v)_S = (c_1, c_2, \ldots, c_n)
\]

is the coordinate vector of $v$ relative to $S$, then, as illustrated in Figure 4.4.6, the mapping

\[
v \to (v)_S \tag{1}
\]

creates a connection (a one-to-one correspondence) between vectors in the general vector space $V$ and vectors in the Euclidean vector space $R^n$. We call (1) the coordinate map relative to $S$ from $V$ to $R^n$. In this section we will find it convenient to express coordinate vectors in the matrix form

\[
[v]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2}
\]

where the square brackets emphasize the matrix notation (Figure 4.6.1).

Figure 4.6.1 The coordinate map $[\;]_S$ sends a vector $v$ in $V$ to the column vector in $R^n$ with entries $c_1, c_2, \ldots, c_n$.

Change of Basis

There are many applications in which it is necessary to work with more than one coordinate system. In such cases it becomes important to know how the coordinates of a fixed vector relative to each coordinate system are related. This leads to the following problem.

The Change-of-Basis Problem If $v$ is a vector in a finite-dimensional vector space $V$, and if we change the basis for $V$ from a basis $B$ to a basis $B'$, how are the coordinate vectors $[v]_B$ and $[v]_{B'}$ related?

Remark To solve this problem, it will be convenient to refer to $B$ as the “old basis” and $B'$ as the “new basis.” Thus, our objective is to find a relationship between the old and new coordinates of a fixed vector $v$ in $V$.

For simplicity, we will solve this problem for two-dimensional spaces. The solution for $n$-dimensional spaces is similar. Let

\[
B = \{u_1, u_2\} \quad \text{and} \quad B' = \{u_1', u_2'\}
\]

be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative to the old basis. Suppose they are

\[
[u_1']_B = \begin{bmatrix} a \\ b \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} c \\ d \end{bmatrix} \tag{3}
\]

That is,

\[
\begin{aligned}
u_1' &= a u_1 + b u_2 \\
u_2' &= c u_1 + d u_2
\end{aligned}
\tag{4}
\]

Now let $v$ be any vector in $V$, and let

\[
[v]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \tag{5}
\]

be the new coordinate vector, so that

\[
v = k_1 u_1' + k_2 u_2' \tag{6}
\]

In order to find the old coordinates of $v$, we must express $v$ in terms of the old basis $B$. To do this, we substitute (4) into (6). This yields

\[
v = k_1(a u_1 + b u_2) + k_2(c u_1 + d u_2)
\]

or

\[
v = (k_1 a + k_2 c)u_1 + (k_1 b + k_2 d)u_2
\]

Thus, the old coordinate vector for $v$ is

\[
[v]_B = \begin{bmatrix} k_1 a + k_2 c \\ k_1 b + k_2 d \end{bmatrix}
\]

which, by using (5), can be written as

\[
[v]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} [v]_{B'}
\]

This equation states that the old coordinate vector $[v]_B$ results when we multiply the new coordinate vector $[v]_{B'}$ on the left by the matrix

\[
P = \begin{bmatrix} a & c \\ b & d \end{bmatrix}
\]

Since the columns of this matrix are the coordinates of the new basis vectors relative to the old basis [see (3)], we have the following solution of the change-of-basis problem.

Solution of the Change-of-Basis Problem If we change the basis for a vector space $V$ from an old basis $B = \{u_1, u_2, \ldots, u_n\}$ to a new basis $B' = \{u_1', u_2', \ldots, u_n'\}$, then for each vector $v$ in $V$, the old coordinate vector $[v]_B$ is related to the new coordinate vector $[v]_{B'}$ by the equation

\[
[v]_B = P[v]_{B'} \tag{7}
\]

where the columns of $P$ are the coordinate vectors of the new basis vectors relative to the old basis; that is, the column vectors of $P$ are

\[
[u_1']_B, \; [u_2']_B, \; \ldots, \; [u_n']_B \tag{8}
\]

Transition Matrices

The matrix $P$ in Equation (7) is called the transition matrix from $B'$ to $B$. For emphasis, we will often denote it by $P_{B' \to B}$. It follows from (8) that this matrix can be expressed in terms of its column vectors as

\[
P_{B' \to B} = \big[\, [u_1']_B \mid [u_2']_B \mid \cdots \mid [u_n']_B \,\big] \tag{9}
\]

Similarly, the transition matrix from $B$ to $B'$ can be expressed in terms of its column vectors as

\[
P_{B \to B'} = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \tag{10}
\]

Remark There is a simple way to remember both of these formulas using the terms “old basis” and “new basis” defined earlier in this section: In Formula (9) the old basis is $B'$ and the new basis is $B$, whereas in Formula (10) the old basis is $B$ and the new basis is $B'$. Thus, both formulas can be restated as follows:

The columns of the transition matrix from an old basis to a new basis are the coordinate vectors of the old basis relative to the new basis.

EXAMPLE 1 Finding Transition Matrices

Consider the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

\[
u_1 = (1, 0), \quad u_2 = (0, 1), \quad u_1' = (1, 1), \quad u_2' = (2, 1)
\]

(a) Find the transition matrix $P_{B' \to B}$ from $B'$ to $B$.
(b) Find the transition matrix $P_{B \to B'}$ from $B$ to $B'$.

Solution (a) Here the old basis vectors are $u_1'$ and $u_2'$ and the new basis vectors are $u_1$ and $u_2$. We want to find the coordinate matrices of the old basis vectors $u_1'$ and $u_2'$ relative to the new basis vectors $u_1$ and $u_2$. To do this, observe that

\[
\begin{aligned}
u_1' &= u_1 + u_2 \\
u_2' &= 2u_1 + u_2
\end{aligned}
\]

from which it follows that

\[
[u_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u_2']_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]

and hence that

\[
P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}
\]

Solution (b) Here the old basis vectors are $u_1$ and $u_2$ and the new basis vectors are $u_1'$ and $u_2'$. As in part (a), we want to find the coordinate matrices of the old basis vectors $u_1$ and $u_2$ relative to the new basis vectors $u_1'$ and $u_2'$. To do this, observe that

\[
\begin{aligned}
u_1 &= -u_1' + u_2' \\
u_2 &= 2u_1' - u_2'
\end{aligned}
\]

from which it follows that

\[
[u_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}
\]

and hence that

\[
P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}
\]

Suppose now that $B$ and $B'$ are bases for a finite-dimensional vector space $V$. Since multiplication by $P_{B' \to B}$ maps coordinate vectors relative to the basis $B'$ into coordinate vectors relative to the basis $B$, and $P_{B \to B'}$ maps coordinate vectors relative to $B$ into coordinate vectors relative to $B'$, it follows that for every vector $v$ in $V$ we have

\[
[v]_B = P_{B' \to B}\,[v]_{B'} \tag{11}
\]

\[
[v]_{B'} = P_{B \to B'}\,[v]_B \tag{12}
\]

EXAMPLE 2 Computing Coordinate Vectors

Let $B$ and $B'$ be the bases in Example 1. Use an appropriate formula to find $[v]_B$ given that

\[
[v]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}
\]

Solution To find $[v]_B$ we need to make the transition from $B'$ to $B$. It follows from Formula (11) and part (a) of Example 1 that

\[
[v]_B = P_{B' \to B}\,[v]_{B'} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}
\]

Invertibility of Transition Matrices

If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, then

\[
(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}
\]

because multiplication by the product $(P_{B' \to B})(P_{B \to B'})$ first maps the $B$-coordinates of a vector into its $B'$-coordinates, and then maps those $B'$-coordinates back into the original $B$-coordinates. Since the net effect of the two operations is to leave each coordinate vector unchanged, we are led to conclude that $P_{B \to B}$ must be the identity matrix, that is,

\[
(P_{B' \to B})(P_{B \to B'}) = I \tag{13}
\]

(we omit the formal proof). For example, for the transition matrices obtained in Example 1 we have

\[
(P_{B' \to B})(P_{B \to B'}) = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]

It follows from (13) that $P_{B' \to B}$ is invertible and that its inverse is $P_{B \to B'}$. Thus, we have the following theorem.

THEOREM 4.6.1 If $P$ is the transition matrix from a basis $B'$ to a basis $B$ for a finite-dimensional vector space $V$, then $P$ is invertible and $P^{-1}$ is the transition matrix from $B$ to $B'$.
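The computations in Example 2 and the check of Formula (13) can be reproduced with a few lines of code. This is only an illustrative sketch; the helper `matmul` and the variable names are assumptions introduced here, while the matrices are the ones obtained in Example 1.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P_Bp_to_B = [[1, 2], [1, 1]]     # transition matrix from B' to B, Example 1(a)
P_B_to_Bp = [[-1, 2], [1, -1]]   # transition matrix from B to B', Example 1(b)

v_Bp = [[-3], [5]]                      # [v]_{B'} as a column
v_B = matmul(P_Bp_to_B, v_Bp)           # Formula (11)
back = matmul(P_B_to_Bp, v_B)           # Formula (12) recovers [v]_{B'}
identity = matmul(P_Bp_to_B, P_B_to_Bp) # Formula (13)

print(v_B, back, identity)
```

Applying (11) and then (12) returns the coordinate vector that was started with, which is the content of Theorem 4.6.1 for this pair of bases.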

An Efficient Method for Computing Transition Matrices for R^n

Our next objective is to develop an efficient procedure for computing transition matrices between bases for $R^n$. As illustrated in Example 1, the first step in computing a transition matrix is to express each old basis vector as a linear combination of the new basis vectors. For $R^n$ this involves solving $n$ linear systems of $n$ equations in $n$ unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is by the method illustrated in Example 2 of Section 1.6, which is as follows:

A Procedure for Computing $P_{B \to B'}$

Step 1. Form the matrix $[B' \mid B]$.
Step 2. Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.
Step 3. The resulting matrix will be $[I \mid P_{B \to B'}]$.
Step 4. Extract the matrix $P_{B \to B'}$ from the right side of the matrix in Step 3.

This procedure is captured in the following diagram.

\[
[\text{new basis} \mid \text{old basis}] \;\xrightarrow{\text{row operations}}\; [I \mid \text{transition from old to new}] \tag{14}
\]
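The four steps above can be sketched in code as follows. This is a minimal illustration, assuming that both bases are given as lists of vectors in R^n and that exact rational arithmetic is acceptable; the function name `transition_matrix` is hypothetical.

```python
from fractions import Fraction

def transition_matrix(old_basis, new_basis):
    """Transition matrix from old_basis to new_basis via [new basis | old basis]."""
    n = len(old_basis)
    # Step 1: basis vectors become columns, so row i of the augmented matrix holds
    # the i-th components of the new vectors followed by those of the old vectors.
    aug = [[Fraction(v[i]) for v in new_basis] + [Fraction(v[i]) for v in old_basis]
           for i in range(n)]
    # Step 2: Gauss-Jordan elimination on the left block.
    for c in range(n):
        # A nonzero pivot exists because new_basis is assumed to be a basis.
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        aug[c] = [x / aug[c][c] for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    # Steps 3 and 4: the left block is now I; return the right block.
    return [row[n:] for row in aug]

B = [(1, 0), (0, 1)]     # u1, u2 from Example 1
Bp = [(1, 1), (2, 1)]    # u1', u2' from Example 1
print(transition_matrix(B, Bp))   # transition from B to B'
print(transition_matrix(Bp, B))   # transition from B' to B
```

A single elimination handles all n right-hand sides at once, which is why the augmented form is more economical than solving n separate systems.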

EXAMPLE 3 Example 1 Revisited

In Example 1 we considered the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

\[
u_1 = (1, 0), \quad u_2 = (0, 1), \quad u_1' = (1, 1), \quad u_2' = (2, 1)
\]

(a) Use Formula (14) to find the transition matrix from $B'$ to $B$.
(b) Use Formula (14) to find the transition matrix from $B$ to $B'$.

Solution (a) Here $B'$ is the old basis and $B$ is the new basis, so

\[
[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]
\]

Since the left side is already the identity matrix, no reduction is needed. We see by inspection that the transition matrix is

\[
P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}
\]

which agrees with the result in Example 1.

Solution (b) Here $B$ is the old basis and $B'$ is the new basis, so

\[
[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]
\]

By reducing this matrix, so the left side becomes the identity, we obtain (verify)

\[
[I \mid \text{transition from old to new}] = \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]
\]

so the transition matrix is

\[
P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}
\]

which also agrees with the result in Example 1.

Transition to the Standard Basis for R^n

Note that in part (a) of the last example the column vectors of the matrix that made the transition from the basis $B'$ to the standard basis turned out to be the vectors in $B'$ written in column form. This illustrates the following general result.

THEOREM 4.6.2 Let $B' = \{u_1, u_2, \ldots, u_n\}$ be any basis for the vector space $R^n$ and let $S = \{e_1, e_2, \ldots, e_n\}$ be the standard basis for $R^n$. If the vectors in these bases are written in column form, then

\[
P_{B' \to S} = [u_1 \mid u_2 \mid \cdots \mid u_n] \tag{15}
\]

It follows from this theorem that if

\[
A = [u_1 \mid u_2 \mid \cdots \mid u_n]
\]

is any invertible $n \times n$ matrix, then $A$ can be viewed as the transition matrix from the basis $\{u_1, u_2, \ldots, u_n\}$ for $R^n$ to the standard basis for $R^n$. Thus, for example, the matrix

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}
\]

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis

\[
u_1 = (1, 2, 1), \quad u_2 = (2, 5, 0), \quad u_3 = (3, 3, 8)
\]

to the basis

\[
e_1 = (1, 0, 0), \quad e_2 = (0, 1, 0), \quad e_3 = (0, 0, 1)
\]


Exercise Set 4.6

1. Consider the bases $B = \{u_1, u_2\}$ and $B' = \{u_1', u_2'\}$ for $R^2$, where

\[
u_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \quad u_1' = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -1 \\ -1 \end{bmatrix}
\]

(a) Find the transition matrix from $B'$ to $B$.
(b) Find the transition matrix from $B$ to $B'$.
(c) Compute the coordinate vector $[w]_B$, where

\[
w = \begin{bmatrix} 3 \\ -5 \end{bmatrix}
\]

and use (12) to compute $[w]_{B'}$.
(d) Check your work by computing $[w]_{B'}$ directly.

2. Repeat the directions of Exercise 1 with the same vector $w$ but with

\[
u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad u_1' = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -3 \\ 4 \end{bmatrix}
\]

3. Consider the bases $B = \{u_1, u_2, u_3\}$ and $B' = \{u_1', u_2', u_3'\}$ for $R^3$, where

\[
u_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\]

\[
u_1' = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}, \quad u_2' = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}, \quad u_3' = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}
\]

(a) Find the transition matrix from $B$ to $B'$.
(b) Compute the coordinate vector $[w]_B$, where

\[
w = \begin{bmatrix} -5 \\ 8 \\ -5 \end{bmatrix}
\]

and use (12) to compute $[w]_{B'}$.
(c) Check your work by computing $[w]_{B'}$ directly.

4. Repeat the directions of Exercise 3 with the same vector $w$, but with

\[
u_1 = \begin{bmatrix} -3 \\ 0 \\ -3 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 1 \\ 6 \\ -1 \end{bmatrix}
\]

\[
u_1' = \begin{bmatrix} -6 \\ -6 \\ 0 \end{bmatrix}, \quad u_2' = \begin{bmatrix} -2 \\ -6 \\ 4 \end{bmatrix}, \quad u_3' = \begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}
\]

5. Let $V$ be the space spanned by $f_1 = \sin x$ and $f_2 = \cos x$.
(a) Show that $g_1 = 2\sin x + \cos x$ and $g_2 = 3\cos x$ form a basis for $V$.
(b) Find the transition matrix from $B' = \{g_1, g_2\}$ to $B = \{f_1, f_2\}$.
(c) Find the transition matrix from $B$ to $B'$.
(d) Compute the coordinate vector $[h]_B$, where $h = 2\sin x - 5\cos x$, and use (12) to obtain $[h]_{B'}$.
(e) Check your work by computing $[h]_{B'}$ directly.

6. Consider the bases $B = \{p_1, p_2\}$ and $B' = \{q_1, q_2\}$ for $P_1$, where

\[
p_1 = 6 + 3x, \quad p_2 = 10 + 2x, \quad q_1 = 2, \quad q_2 = 3 + 2x
\]

(a) Find the transition matrix from $B'$ to $B$.
(b) Find the transition matrix from $B$ to $B'$.
(c) Compute the coordinate vector $[p]_B$, where $p = -4 + x$, and use (12) to compute $[p]_{B'}$.
(d) Check your work by computing $[p]_{B'}$ directly.

7. Let $B_1 = \{u_1, u_2\}$ and $B_2 = \{v_1, v_2\}$ be the bases for $R^2$ in which $u_1 = (1, 2)$, $u_2 = (2, 3)$, $v_1 = (1, 3)$, and $v_2 = (1, 4)$.
(a) Use Formula (14) to find the transition matrix $P_{B_2 \to B_1}$.
(b) Use Formula (14) to find the transition matrix $P_{B_1 \to B_2}$.
(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.
(d) Let $w = (0, 1)$. Find $[w]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[w]_{B_2}$ from $[w]_{B_1}$.
(e) Let $w = (2, 5)$. Find $[w]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[w]_{B_1}$ from $[w]_{B_2}$.

8. Let $S$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis in which $v_1 = (2, 1)$ and $v_2 = (-3, 4)$.
(a) Find the transition matrix $P_{B \to S}$ by inspection.
(b) Use Formula (14) to find the transition matrix $P_{S \to B}$.
(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.
(d) Let $w = (5, -3)$. Find $[w]_B$ and then use Formula (11) to compute $[w]_S$.
(e) Let $w = (3, -5)$. Find $[w]_S$ and then use Formula (12) to compute $[w]_B$.

9. Let $S$ be the standard basis for $R^3$, and let $B = \{v_1, v_2, v_3\}$ be the basis in which $v_1 = (1, 2, 1)$, $v_2 = (2, 5, 0)$, and $v_3 = (3, 3, 8)$.
(a) Find the transition matrix $P_{B \to S}$ by inspection.
(b) Use Formula (14) to find the transition matrix $P_{S \to B}$.
(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.
(d) Let $w = (5, -3, 1)$. Find $[w]_B$ and then use Formula (11) to compute $[w]_S$.
(e) Let $w = (3, -5, 0)$. Find $[w]_S$ and then use Formula (12) to compute $[w]_B$.

10. Let $S = \{e_1, e_2\}$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis that results when the vectors in $S$ are reflected about the line $y = x$.
(a) Find the transition matrix $P_{B \to S}$.
(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

11. Let $S = \{e_1, e_2\}$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis that results when the vectors in $S$ are reflected about the line that makes an angle $\theta$ with the positive $x$-axis.
(a) Find the transition matrix $P_{B \to S}$.
(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

12. If $B_1$, $B_2$, and $B_3$ are bases for $R^2$, and if

\[
P_{B_1 \to B_2} = \begin{bmatrix} 3 & 1 \\ 5 & 2 \end{bmatrix} \quad \text{and} \quad P_{B_2 \to B_3} = \begin{bmatrix} 7 & 2 \\ 4 & -1 \end{bmatrix}
\]

then $P_{B_3 \to B_1} =$ ______.

13. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, and $Q$ is the transition matrix from $B$ to a basis $C$, what is the transition matrix from $B'$ to $C$? What is the transition matrix from $C$ to $B'$?

14. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, what is the effect on $P$ if we reverse the order of vectors in $B$ from $v_1, \ldots, v_n$ to $v_n, \ldots, v_1$? What is the effect on $P$ if we reverse the order of vectors in both $B'$ and $B$?

15. Consider the matrix

\[
P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}
\]

(a) $P$ is the transition matrix from what basis $B$ to the standard basis $S = \{e_1, e_2, e_3\}$ for $R^3$?
(b) $P$ is the transition matrix from the standard basis $S = \{e_1, e_2, e_3\}$ to what basis $B$ for $R^3$?

16. The matrix

\[
P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}
\]

is the transition matrix from what basis $B$ to the basis $\{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ for $R^3$?

17. Let $S = \{e_1, e_2\}$ be the standard basis for $R^2$, and let $B = \{v_1, v_2\}$ be the basis that results when the linear transformation defined by

\[
T(x_1, x_2) = (2x_1 + 3x_2, \; 5x_1 - x_2)
\]

is applied to each vector in $S$. Find the transition matrix $P_{B \to S}$.

18. Let $S = \{e_1, e_2, e_3\}$ be the standard basis for $R^3$, and let $B = \{v_1, v_2, v_3\}$ be the basis that results when the linear transformation defined by

\[
T(x_1, x_2, x_3) = (x_1 + x_2, \; 2x_1 - x_2 + 4x_3, \; x_2 + 3x_3)
\]

is applied to each vector in $S$. Find the transition matrix $P_{B \to S}$.

19. If $[w]_B = w$ holds for all vectors $w$ in $R^n$, what can you say about the basis $B$?

Working with Proofs

20. Let $B$ be a basis for $R^n$. Prove that the vectors $v_1, v_2, \ldots, v_k$ span $R^n$ if and only if the vectors $[v_1]_B, [v_2]_B, \ldots, [v_k]_B$ span $R^n$.

21. Let $B$ be a basis for $R^n$. Prove that the vectors $v_1, v_2, \ldots, v_k$ form a linearly independent set in $R^n$ if and only if the vectors $[v_1]_B, [v_2]_B, \ldots, [v_k]_B$ form a linearly independent set in $R^n$.

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.
(a) If $B_1$ and $B_2$ are bases for a vector space $V$, then there exists a transition matrix from $B_1$ to $B_2$.
(b) Transition matrices are invertible.
(c) If $B$ is a basis for a vector space $R^n$, then $P_{B \to B}$ is the identity matrix.
(d) If $P_{B_1 \to B_2}$ is a diagonal matrix, then each vector in $B_2$ is a scalar multiple of some vector in $B_1$.
(e) If each vector in $B_2$ is a scalar multiple of some vector in $B_1$, then $P_{B_1 \to B_2}$ is a diagonal matrix.
(f) If $A$ is a square matrix, then $A = P_{B_1 \to B_2}$ for some bases $B_1$ and $B_2$ for $R^n$.

Working with Technology

T1. Let

\[
P = \begin{bmatrix} 5 & 8 & 6 & -13 \\ 3 & -1 & 0 & -9 \\ 0 & 1 & -1 & 0 \\ 2 & 4 & 3 & -5 \end{bmatrix}
\]

and

\[
v_1 = (2, 4, 3, -5), \quad v_2 = (0, 1, -1, 0), \quad v_3 = (3, -1, 0, -9), \quad v_4 = (5, 8, 6, -13)
\]

Find a basis $B = \{u_1, u_2, u_3, u_4\}$ for $R^4$ for which $P$ is the transition matrix from $B$ to $B' = \{v_1, v_2, v_3, v_4\}$.

T2. Given that the matrix for a linear transformation $T : R^4 \to R^4$ relative to the standard basis $B = \{e_1, e_2, e_3, e_4\}$ for $R^4$ is

\[
\begin{bmatrix} 1 & 2 & 0 & 1 \\ 3 & 0 & -1 & 2 \\ 2 & 5 & 3 & 1 \\ 1 & 2 & 1 & 3 \end{bmatrix}
\]

find the matrix for $T$ relative to the basis

\[
B' = \{e_1, \; e_1 + e_2, \; e_1 + e_2 + e_3, \; e_1 + e_2 + e_3 + e_4\}
\]

4.7 Row Space, Column Space, and Null Space

In this section we will study some important vector spaces that are associated with matrices. Our work here will provide us with a deeper understanding of the relationships between the solutions of a linear system and properties of its coefficient matrix.

Row Space, Column Space, and Null Space

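Recall that vectors can be written in comma-delimited form or in matrix form as either row vectors or column vectors. In this section we will use the latter two.

DEFINITION 1 For an $m \times n$ matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

the vectors

\[
\begin{aligned}
r_1 &= [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1n}] \\
r_2 &= [a_{21} \;\; a_{22} \;\; \cdots \;\; a_{2n}] \\
&\;\;\vdots \\
r_m &= [a_{m1} \;\; a_{m2} \;\; \cdots \;\; a_{mn}]
\end{aligned}
\]

in $R^n$ that are formed from the rows of $A$ are called the row vectors of $A$, and the vectors

\[
c_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad
c_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad
c_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
\]

in $R^m$ formed from the columns of $A$ are called the column vectors of $A$.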

EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix

Let

\[
A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}
\]

The row vectors of $A$ are

\[
r_1 = [2 \;\; 1 \;\; 0] \quad \text{and} \quad r_2 = [3 \;\; {-1} \;\; 4]
\]

and the column vectors of $A$ are

\[
c_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad c_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \text{and} \quad c_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}
\]

The following definition defines three important vector spaces associated with a matrix. We will sometimes denote the row space of $A$, the column space of $A$, and the null space of $A$ by row($A$), col($A$), and null($A$), respectively.

DEFINITION 2 If $A$ is an $m \times n$ matrix, then the subspace of $R^n$ spanned by the row vectors of $A$ is called the row space of $A$, and the subspace of $R^m$ spanned by the column vectors of $A$ is called the column space of $A$. The solution space of the homogeneous system of equations $Ax = 0$, which is a subspace of $R^n$, is called the null space of $A$.


Chapter 4 General Vector Spaces

In this section and the next we will be concerned with two general questions:

Question 1. What relationships exist among the solutions of a linear system Ax = b and the row space, column space, and null space of the coefficient matrix A?

Question 2. What relationships exist among the row space, column space, and null space of a matrix?

Starting with the first question, suppose that

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

It follows from Formula (10) of Section 1.3 that if c1 , c2 , . . . , cn denote the column vectors of A, then the product Ax can be expressed as a linear combination of these vectors with coefficients from x; that is,

Ax = x1 c1 + x2 c2 + · · · + xn cn

(1)

Thus, a linear system, Ax = b, of m equations in n unknowns can be written as

x1 c1 + x2 c2 + · · · + xn cn = b

(2)

from which we conclude that Ax = b is consistent if and only if b is expressible as a linear combination of the column vectors of A. This yields the following theorem.

THEOREM 4.7.1 A system of linear equations Ax = b is consistent if and only if b is in the column space of A.

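EXAMPLE 2 A Vector b in the Column Space of A

Let Ax = b be the linear system

\[
\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]

Show that b is in the column space of A by expressing it as a linear combination of the column vectors of A.

Solution Solving the system by Gaussian elimination yields (verify)

\[
x_1 = 2, \quad x_2 = -1, \quad x_3 = 3
\]

It follows from this and Formula (2) that

\[
2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}
- \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}
+ 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix}
= \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}
\]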
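The check in Example 2 can also be carried out mechanically. The following is a minimal sketch, not part of the text's own method: it row-reduces the augmented matrix [A | b] with exact fractions and reports b as outside the column space when a leading 1 appears in the augmented column. The helper names `rref` and `column_space_coefficients` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]  # scale to get a leading 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def column_space_coefficients(A, b):
    """Return x with Ax = b (free variables set to 0), or None if b is not in col(A)."""
    n = len(A[0])
    R, pivots = rref([list(row) + [bi] for row, bi in zip(A, b)])
    if n in pivots:  # leading 1 in the augmented column: system is inconsistent
        return None
    x = [Fraction(0)] * n
    for i, p in enumerate(pivots):
        x[p] = R[i][n]
    return x

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
b = [1, -9, -3]
x = column_space_coefficients(A, b)

# Rebuild b as x1*c1 + x2*c2 + x3*c3, as in Formula (2).
combo = [sum(x[j] * A[i][j] for j in range(3)) for i in range(3)]

# A vector outside the column space: col([[1, 3], [2, 6]]) is spanned by (1, 2).
outside = column_space_coefficients([[1, 3], [2, 6]], [1, 0])
```

Here `x` holds the coefficients found in Example 2, `combo` reproduces b, and `outside` is `None` because (1, 0) is not a multiple of (1, 2).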

Recall from Theorem 3.4.4 that the general solution of a consistent linear system Ax = b can be obtained by adding any specific solution of the system to the general solution of the corresponding homogeneous system Ax = 0. Keeping in mind that the null space of A is the same as the solution space of Ax = 0, we can rephrase that theorem in the following vector form.


THEOREM 4.7.2 If x0 is any solution of a consistent linear system Ax = b, and if S = {v1, v2, . . . , vk} is a basis for the null space of A, then every solution of Ax = b can be expressed in the form

\[
\mathbf{x} = \mathbf{x}_0 + c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k \tag{3}
\]

Conversely, for all choices of scalars c1 , c2 , . . . , ck , the vector x in this formula is a solution of Ax = b. The vector x0 in Formula (3) is called a particular solution of Ax = b, and the remaining part of the formula is called the general solution of Ax = 0. With this terminology Theorem 4.7.2 can be rephrased as:

The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system and the general solution of the corresponding homogeneous system. Geometrically, the solution set of Ax = b can be viewed as the translation by x0 of the solution space of Ax = 0 (Figure 4.7.1).

Figure 4.7.1 The solution set of Ax = b as the translation by x0 of the solution space of Ax = 0.

EXAMPLE 3 General Solution of a Linear System Ax = b

In the concluding subsection of Section 3.4 we compared solutions of the linear systems

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}
\]

and deduced that the general solution x of the nonhomogeneous system and the general solution xh of the corresponding homogeneous system (when written in column-vector form) are related by

\[
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{\mathbf{x}}
= \begin{bmatrix} -3r - 4s - 2t \\ r \\ -2s \\ s \\ t \\ \tfrac{1}{3} \end{bmatrix}
= \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}}_{\mathbf{x}_0}
+ \underbrace{r \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_h}
\]

Recall from the Remark following Example 3 of Section 4.5 that the vectors in xh form a basis for the solution space of Ax = 0.

Bases for Row Spaces, Column Spaces, and Null Spaces

We know that performing elementary row operations on the augmented matrix [A | b] of a linear system does not change the solution set of that system. This is true, in particular, if the system is homogeneous, in which case the augmented matrix is [A | 0]. But elementary row operations have no effect on the column of zeros, so it follows that the solution set of Ax = 0 is unaffected by performing elementary row operations on A itself. Thus, we have the following theorem.

THEOREM 4.7.3 Elementary row operations do not change the null space of a matrix.

The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3.

THEOREM 4.7.4 Elementary row operations do not change the row space of a matrix.

Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the column space of a matrix. To see why this is not true, compare the matrices

\[
A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}
\]

The matrix B can be obtained from A by adding −2 times the first row to the second. However, this operation has changed the column space of A, since that column space consists of all scalar multiples of

\[
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
\]

whereas the column space of B consists of all scalar multiples of

\[
\begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

and the two are different spaces.

EXAMPLE 4 Finding a Basis for the Null Space of a Matrix

Find a basis for the null space of the matrix

\[
A = \begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 \\
0 & 0 & 5 & 10 & 0 & 15 \\
2 & 6 & 0 & 8 & 4 & 18
\end{bmatrix}
\]

Solution The null space of A is the solution space of the homogeneous linear system Ax = 0, which, as shown in Example 3, has the basis

\[
\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
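The basis in Example 4 can be reproduced by a short computation. The sketch below is an illustration under stated assumptions rather than the text's own procedure: it reduces A to reduced row echelon form with exact fractions, then builds one basis vector per free variable by setting that variable to 1 and the other free variables to 0. The names `rref` and `null_space_basis` are hypothetical helpers introduced here.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # column c has no leading 1, so x_c is a free variable
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def null_space_basis(A):
    """One basis vector of null(A) per free variable of Ax = 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)            # this free variable is set to 1
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]           # solve row i for its leading variable
        basis.append(v)
    return basis

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
basis = null_space_basis(A)

# Each basis vector should satisfy Av = 0.
residuals = [[sum(a * x for a, x in zip(row, v)) for row in A] for v in basis]
```

For this matrix the three vectors in `basis` agree with v1, v2, and v3 above, and every entry of `residuals` is zero.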

Remark Observe that the basis vectors v1, v2, and v3 in the last example are the vectors that result by successively setting one of the parameters in the general solution equal to 1 and the others equal to 0.

The following theorem makes it possible to find bases for the row and column spaces of a matrix in row echelon form by inspection.

THEOREM 4.7.5 If a matrix R is in row echelon form, then the row vectors with the leading 1's (the nonzero row vectors) form a basis for the row space of R, and the column vectors with the leading 1's of the row vectors form a basis for the column space of R.

The proof essentially involves an analysis of the positions of the 0’s and 1’s of R . We omit the details.

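EXAMPLE 5 Bases for the Row and Column Spaces of a Matrix in Row Echelon Form

Find bases for the row and column spaces of the matrix

\[
R = \begin{bmatrix}
1 & -2 & 5 & 0 & 3 \\
0 & 1 & 3 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

Solution Since the matrix R is in row echelon form, it follows from Theorem 4.7.5 that the vectors

\[
\begin{aligned}
\mathbf{r}_1 &= [1 \quad -2 \quad 5 \quad 0 \quad 3] \\
\mathbf{r}_2 &= [0 \quad 1 \quad 3 \quad 0 \quad 0] \\
\mathbf{r}_3 &= [0 \quad 0 \quad 0 \quad 1 \quad 0]
\end{aligned}
\]

form a basis for the row space of R, and the vectors

\[
\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]

form a basis for the column space of R.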
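The "by inspection" step of Theorem 4.7.5 amounts to locating the leading entry of each nonzero row. A minimal sketch follows, assuming the input is already in row echelon form (the function does not check this). The name `bases_by_inspection` is a hypothetical helper for this illustration.

```python
def bases_by_inspection(R):
    """For R in row echelon form, return (row_basis, col_basis, lead_cols)."""
    row_basis, lead_cols = [], []
    for row in R:
        # Index of the first nonzero entry, or None for a zero row.
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            row_basis.append(list(row))   # nonzero rows span the row space
            lead_cols.append(lead)
    # Columns that contain the leading 1's span the column space.
    col_basis = [[row[j] for row in R] for j in lead_cols]
    return row_basis, col_basis, lead_cols

R = [[1, -2, 5, 0, 3],
     [0, 1, 3, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0]]
row_basis, col_basis, lead_cols = bases_by_inspection(R)
```

With the matrix of Example 5, `lead_cols` is `[0, 1, 3]` (columns 1, 2, and 4 in the text's 1-based numbering), `row_basis` holds r1, r2, r3, and `col_basis` holds c1, c2, c4.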


EXAMPLE 6 Basis for a Row Space by Row Reduction

Find a basis for the row space of the matrix

\[
A = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
2 & -6 & 9 & -1 & 8 & 2 \\
2 & -6 & 9 & -1 & 9 & 7 \\
-1 & 3 & -4 & 2 & -5 & -4
\end{bmatrix}
\]

Solution Since elementary row operations do not change the row space of a matrix, we can find a basis for the row space of A by finding a basis for the row space of any row echelon form of A. Reducing A to row echelon form, we obtain (verify)

\[
R = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
0 & 0 & 1 & 3 & -2 & -6 \\
0 & 0 & 0 & 0 & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

By Theorem 4.7.5, the nonzero row vectors of R form a basis for the row space of R and hence form a basis for the row space of A. These basis vectors are

\[
\begin{aligned}
\mathbf{r}_1 &= [1 \quad -3 \quad 4 \quad -2 \quad 5 \quad 4] \\
\mathbf{r}_2 &= [0 \quad 0 \quad 1 \quad 3 \quad -2 \quad -6] \\
\mathbf{r}_3 &= [0 \quad 0 \quad 0 \quad 0 \quad 1 \quad 5]
\end{aligned}
\]

Basis for the Column Space of a Matrix

The problem of finding a basis for the column space of a matrix A in Example 6 is complicated by the fact that an elementary row operation can alter its column space. However, the good news is that elementary row operations do not alter dependence relationships among the column vectors. To make this more precise, suppose that w1 , w2 , . . . , wk are linearly dependent column vectors of A, so there are scalars c1 , c2 , . . . , ck that are not all zero and such that

c1 w1 + c2 w2 + · · · + ck wk = 0

(4)

If we perform an elementary row operation on A, then these vectors will be changed into new column vectors $\mathbf{w}_1', \mathbf{w}_2', \ldots, \mathbf{w}_k'$. At first glance it would seem possible that the transformed vectors might be linearly independent. However, this is not so, since it can be proved that these new column vectors are linearly dependent and, in fact, related by an equation

\[
c_1 \mathbf{w}_1' + c_2 \mathbf{w}_2' + \cdots + c_k \mathbf{w}_k' = \mathbf{0}
\]

that has exactly the same coefficients as (4). It can also be proved that elementary row operations do not alter the linear independence of a set of column vectors. All of these results are summarized in the following theorem.

Although elementary row operations can change the column space of a matrix, it follows from Theorem 4.7.6(b) that they do not change the dimension of its column space.

THEOREM 4.7.6 If A and B are row equivalent matrices, then:

(a) A given set of column vectors of A is linearly independent if and only if the corresponding column vectors of B are linearly independent.

(b) A given set of column vectors of A forms a basis for the column space of A if and only if the corresponding column vectors of B form a basis for the column space of B.
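Theorem 4.7.6(b) suggests a simple computation: find which columns of a row-reduced form of A carry the leading 1's, and then take the columns of A itself in those positions. The following is a minimal sketch of that idea using exact fractions. The helper names `rref` and `column_space_basis` are hypothetical, and the matrix is the one from Example 6.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def column_space_basis(A):
    """Columns of A (not of its reduced form) in the pivot positions."""
    _, pivots = rref(A)
    return pivots, [[row[j] for row in A] for j in pivots]

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]
pivots, basis = column_space_basis(A)
```

Here `pivots` is `[0, 2, 4]`, so the first, third, and fifth columns of A form the basis. Taking columns from A rather than from the reduced matrix matters, because row operations can change the column space even though they preserve the dependence relations among columns.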


EXAMPLE 7 Basis for a Column Space by Row Reduction

Find a basis for the column space of the matrix

\[
A = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
2 & -6 & 9 & -1 & 8 & 2 \\
2 & -6 & 9 & -1 & 9 & 7 \\
-1 & 3 & -4 & 2 & -5 & -4
\end{bmatrix}
\]

that consists of column vectors of A.

Solution We observed in Example 6 that the matrix

\[
R = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
0 & 0 & 1 & 3 & -2 & -6 \\
0 & 0 & 0 & 0 & 1 & 5 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

is a row echelon form of A. Keeping in mind that A and R can have different column spaces, we cannot find a basis for the column space of A directly from the column vectors of R. However, it follows from Theorem 4.7.6(b) that if we can find a set of column vectors of R that forms a basis for the column space of R, then the corresponding column vectors of A will form a basis for the column space of A. Since the first, third, and fifth columns of R contain the leading 1's of the row vectors, the vectors

\[
\mathbf{c}_1' = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_3' = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{c}_5' = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}
\]

form a basis for the column space of R. Thus, the corresponding column vectors of A, which are

\[
\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad
\mathbf{c}_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad
\mathbf{c}_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}
\]

form a basis for the column space of A.

Up to now we have focused on methods for finding bases associated with matrices. Those methods can readily be adapted to the more general problem of finding a basis for the subspace spanned by a set of vectors in $R^n$.

EXAMPLE 8 Basis for the Space Spanned by a Set of Vectors

The following vectors span a subspace of $R^4$. Find a subset of these vectors that forms a basis of this subspace.

v1 = (1, 2, 2, −1), v2 = (−3, −6, −6, 3), v3 = (4, 9, 9, −4),
v4 = (−2, −1, −1, 2), v5 = (5, 8, 9, −5), v6 = (4, 2, 7, −4)

Solution If we rewrite these vectors in column form and construct the matrix that has those vectors as its successive columns, then we obtain the matrix A in Example 7 (verify). Thus,

span{v1, v2, v3, v4, v5, v6} = col(A)

Proceeding as in that example (and adjusting the notation appropriately), we see that the vectors v1, v3, and v5 form a basis for

span{v1, v2, v3, v4, v5, v6}

Bases Formed from Row and Column Vectors of a Matrix

In Example 6, we found a basis for the row space of a matrix by reducing that matrix to row echelon form. However, the basis vectors produced by that method were not all row vectors of the original matrix. The following adaptation of the technique used in Example 7 shows how to find a basis for the row space of a matrix that consists entirely of row vectors of that matrix.

EXAMPLE 9 Basis for the Row Space of a Matrix

Find a basis for the row space of

\[
A = \begin{bmatrix}
1 & -2 & 0 & 0 & 3 \\
2 & -5 & -3 & -2 & 6 \\
0 & 5 & 15 & 10 & 0 \\
2 & 6 & 18 & 8 & 6
\end{bmatrix}
\]

consisting entirely of row vectors from A.

Solution We will transpose A, thereby converting the row space of A into the column space of $A^T$; then we will use the method of Example 7 to find a basis for the column space of $A^T$; and then we will transpose again to convert column vectors back to row vectors. Transposing A yields

\[
A^T = \begin{bmatrix}
1 & 2 & 0 & 2 \\
-2 & -5 & 5 & 6 \\
0 & -3 & 15 & 18 \\
0 & -2 & 10 & 8 \\
3 & 6 & 0 & 6
\end{bmatrix}
\]

and then reducing this matrix to row echelon form we obtain

\[
\begin{bmatrix}
1 & 2 & 0 & 2 \\
0 & 1 & -5 & -10 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

The first, second, and fourth columns contain the leading 1's, so the corresponding column vectors in $A^T$ form a basis for the column space of $A^T$; these are

\[
\mathbf{c}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \quad
\mathbf{c}_2 = \begin{bmatrix} 2 \\ -5 \\ -3 \\ -2 \\ 6 \end{bmatrix}, \quad \text{and} \quad
\mathbf{c}_4 = \begin{bmatrix} 2 \\ 6 \\ 18 \\ 8 \\ 6 \end{bmatrix}
\]

Transposing again and adjusting the notation appropriately yields the basis vectors

\[
\begin{aligned}
\mathbf{r}_1 &= [1 \quad -2 \quad 0 \quad 0 \quad 3], \\
\mathbf{r}_2 &= [2 \quad -5 \quad -3 \quad -2 \quad 6], \\
\mathbf{r}_4 &= [2 \quad 6 \quad 18 \quad 8 \quad 6]
\end{aligned}
\]

for the row space of A.
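The transpose technique of Example 9 is easy to express in code. The sketch below assumes nothing beyond the standard library: it transposes A, finds the pivot columns of the transpose by row reduction, and returns the rows of A with those indices. The helper names `rref` and `row_basis_from_rows` are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def row_basis_from_rows(A):
    """Basis for row(A) made of rows of A, via the pivot columns of A^T."""
    At = [list(col) for col in zip(*A)]   # transpose: rows of A become columns
    _, pivots = rref(At)
    return pivots, [list(A[i]) for i in pivots]

A = [[1, -2, 0, 0, 3],
     [2, -5, -3, -2, 6],
     [0, 5, 15, 10, 0],
     [2, 6, 18, 8, 6]]
pivots, basis = row_basis_from_rows(A)
```

For this matrix `pivots` is `[0, 1, 3]`, which selects r1, r2, and r4 as in the worked solution. The positions of the leading 1's are the same in every row echelon form of a matrix, so using the reduced row echelon form here selects the same rows as the row echelon form shown in the example.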


Next we will give an example that adapts the method of Example 7 to solve the following general problem in $R^n$:

Problem Given a set of vectors S = {v1, v2, . . . , vk} in $R^n$, find a subset of these vectors that forms a basis for span(S), and express each vector that is not in that basis as a linear combination of the basis vectors.

EXAMPLE 10 Basis and Linear Combinations

(a) Find a subset of the vectors

v1 = (1, −2, 0, 3), v2 = (2, −5, −3, 6), v3 = (0, 1, 3, 0), v4 = (2, −1, 4, −7), v5 = (5, −8, 1, 2)

that forms a basis for the subspace of $R^4$ spanned by these vectors.

(b) Express each vector not in the basis as a linear combination of the basis vectors.

Solution (a) We begin by constructing a matrix that has v1, v2, . . . , v5 as its column vectors:

\[
\begin{bmatrix}
1 & 2 & 0 & 2 & 5 \\
-2 & -5 & 1 & -1 & -8 \\
0 & -3 & 3 & 4 & 1 \\
3 & 6 & 0 & -7 & 2
\end{bmatrix} \tag{5}
\]

(the columns, from left to right, are v1, v2, v3, v4, v5).

The first part of our problem can be solved by finding a basis for the column space of this matrix. Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting matrix by w1, w2, w3, w4, and w5 yields

\[
\begin{bmatrix}
1 & 0 & 2 & 0 & 1 \\
0 & 1 & -1 & 0 & 1 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{6}
\]

(the columns, from left to right, are w1, w2, w3, w4, w5).

The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5,

{w1, w2, w4}

is a basis for the column space of (6), and consequently,

{v1, v2, v4}

is a basis for the column space of (5).

Had we only been interested in part (a) of this example, it would have sufficed to reduce the matrix to row echelon form. It is for part (b) that the reduced row echelon form is most useful.

Solution (b) We will start by expressing w3 and w5 as linear combinations of the basis vectors w1, w2, w4. The simplest way of doing this is to express w3 and w5 in terms of basis vectors with smaller subscripts. Accordingly, we will express w3 as a linear combination of w1 and w2, and we will express w5 as a linear combination of w1, w2, and w4. By inspection of (6), these linear combinations are

w3 = 2w1 − w2
w5 = w1 + w2 + w4

We call these the dependency equations. The corresponding relationships in (5) are

v3 = 2v1 − v2
v5 = v1 + v2 + v4

The following is a summary of the steps that we followed in our last example to solve the problem posed above.

Basis for the Space Spanned by a Set of Vectors

Step 1. Form the matrix A whose columns are the vectors in the set S = {v1, v2, . . . , vk}.

Step 2. Reduce the matrix A to reduced row echelon form R.

Step 3. Denote the column vectors of R by w1, w2, . . . , wk.

Step 4. Identify the columns of R that contain the leading 1's. The corresponding column vectors of A form a basis for span(S). This completes the first part of the problem.

Step 5. Obtain a set of dependency equations for the column vectors w1, w2, . . . , wk of R by successively expressing each wi that does not contain a leading 1 of R as a linear combination of predecessors that do.

Step 6. In each dependency equation obtained in Step 5, replace the vector wi by the vector vi for i = 1, 2, . . . , k. This completes the second part of the problem.
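Steps 1 through 6 can be sketched as a short program. This is an illustrative implementation under stated assumptions, not a definitive one: the names `rref` and `basis_and_dependencies` are hypothetical, indices are 0-based (so index 2 means v3), and exact fractions are used so that the coefficients come out as in the hand computation.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def basis_and_dependencies(vectors):
    """Return (pivots, deps) for the span of the given vectors.

    pivots: indices of the vectors that form a basis (Step 4).
    deps:   {j: {p: coeff}} meaning v_j = sum of coeff * v_p (Steps 5 and 6).
    """
    # Step 1: the vectors become the columns of A.
    A = [[v[i] for v in vectors] for i in range(len(vectors[0]))]
    R, pivots = rref(A)                       # Steps 2 and 3
    deps = {}
    for j in range(len(vectors)):
        if j not in pivots:
            # Entry i of column j of R is the coefficient on the i-th pivot column.
            deps[j] = {p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0}
    return pivots, deps

vs = [(1, -2, 0, 3), (2, -5, -3, 6), (0, 1, 3, 0), (2, -1, 4, -7), (5, -8, 1, 2)]
pivots, deps = basis_and_dependencies(vs)

# Rebuild each dependent vector from its dependency equation as a check.
rebuilt = {j: [sum(c * vs[p][i] for p, c in d.items()) for i in range(4)]
           for j, d in deps.items()}
```

Applied to the vectors of Example 10, `pivots` is `[0, 1, 3]` (v1, v2, v4), `deps[2]` encodes v3 = 2v1 − v2, and `deps[4]` encodes v5 = v1 + v2 + v4. In reduced row echelon form a column without a leading 1 has nonzero entries only in rows whose leading 1's lie to its left, which is why each dependency involves only predecessors.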

Exercise Set 4.7



In Exercises 1–2, express the product Ax as a linear combination of the column vectors of A.

1. (a)





2

3 4

−1 ⎡

−3

1 2

1

0 6 −1



2 ⎡ ⎤ −1 0⎥ ⎥⎢ ⎥ ⎥ ⎣ 2⎦ −1 ⎦ 5 3

6 −4 3 8

⎢ 5 ⎢ 2. (a) ⎢ ⎣ 2

4 ⎢ (b) ⎣3 0

−1

⎤⎡



⎥⎢



−2

2 ⎦ ⎣ 3⎦ 5 4





2 (b) 6

1 3



3 5 ⎢ ⎥ ⎣ 0⎦ −8 −5

1 ⎢ 3. (a) A = ⎣1 2



1 ⎢ (b) A = ⎣9 1



1 ⎢ 4. (a) A = ⎣−1 −1







2 −1 ⎢ ⎥ ⎥ 1⎦; b = ⎣ 0⎦ 3 2

1 0 1

−1 3 1

−1 1 −1







1 5 ⎢ ⎥ ⎥ 1⎦; b = ⎣ 1⎦ 1 −1



1

⎡ ⎤ 2

⎢ ⎥ ⎥ −1⎦; b = ⎣0⎦ 1

2 1 2 1

0 2 1 2



⎡ ⎤

1 4 ⎢ 3⎥ 1⎥ ⎥ ⎢ ⎥ ⎥; b = ⎢ ⎥ 3⎦ ⎣ 5⎦ 2 7

5. Suppose that x1 = 3, x2 = 0, x3 = −1, x4 = 5 is a solution of a nonhomogeneous linear system Ax = b and that the solution set of the homogeneous system Ax = 0 is given by the formulas

x1 = 5r − 2s, x2 = s, x3 = s + t, x4 = t (a) Find a vector form of the general solution of Ax = 0.

In Exercises 3–4, determine whether b is in the column space of A, and if so, express b as a linear combination of the column vectors of A



1 ⎢0 ⎢ (b) A = ⎢ ⎣1 0

0

(b) Find a vector form of the general solution of Ax = b. 6. Suppose that x1 = −1, x2 = 2, x3 = 4, x4 = −3 is a solution of a nonhomogeneous linear system Ax = b and that the solution set of the homogeneous system Ax = 0 is given by the formulas

x1 = −3r + 4s, x2 = r − s, x3 = r, x4 = s (a) Find a vector form of the general solution of Ax = 0. (b) Find a vector form of the general solution of Ax = b. In Exercises 7–8, find the vector form of the general solution of the linear system Ax = b, and then use that result to find the vector form of the general solution of Ax = 0.


7. (a) x1 − 3x2 = 1 2x1 − 6x2 = 2 8. (a)

(b)

x1 − 2x2 2x1 − 4x2 −x1 + 2x2 3x1 − 6x2 x1 −2x1 −x1 4x1

(b) x1 + x2 + 2x3 =

5

x1 + x3 = −2 2x1 + x2 + 3x3 = 3

+ x3 + 2x3 − x3 + 3x3

+ 2x4 + 4x4 − 2x4 + 6x4

= −1 = −2 = 1 = −3

+ 2x2 − 3x3 + x4 + x2 + 2x3 + x4 + 3x2 − x3 + 2x4 − 7x2 − 5x4

17. v1 = (1, −1, 5, 2), v2 = (−2, 3, 1, 0), v3 = (4, −5, 9, 4), v4 = (0, 4, 2, −3), v5 = (−7, 18, 2, −8)

= 4 = −1 = 3 = −5

−1 −4 −6

1 ⎢ 9. (a) A = ⎣5 7



1 ⎢ 10. (a) A = ⎣ 2 −1



3

1

2 5 3 2

4 −2 0 3

2

2 ⎢ (b) A = ⎣4 0

⎥ −4⎦

4 1 3

⎢ 3 ⎢ (b) A = ⎢ ⎣−1





0 0 0



−1 ⎥ −2⎦ 0



2 ⎥ 0⎦ 2 5 1 −1 5

6 4 −2 7

1

⎢ 11. (a) ⎣0 0



1

⎢0 ⎢ ⎢ 12. (a) ⎢0 ⎢ ⎣0 0

0 0 0

1 ⎢0 ⎢ (b) ⎢ ⎣0 0

2 ⎥ 1⎦ 0 2 1 0 0 0

4

−3 1 0 0



20. Construct a matrix whose null space consists of all linear combinations of the⎡vectors ⎡ ⎤ ⎤ 1 2 ⎢−1⎥ ⎢ 0⎥ ⎢ ⎥ ⎢ ⎥ v1 = ⎢ ⎥ and v2 = ⎢ ⎥ ⎣ 3⎦ ⎣−2⎦ 4 2

(a) b = (0, 0)

1 ⎢−2 A=⎢ ⎣−1 −3

−3 1 0 0

1 ⎢0 ⎢ (b) ⎢ ⎣0 0

−2 5 3 8

5 −7 −2 −9

2

0

1

−1

4

 . For the given vector b,

2 1 0 0



0 0 0 0

0 0⎥ ⎥ ⎥ 0⎦ 0

−1

5 3⎥ ⎥ ⎥ −7⎦ 1

4 1 0



0 0 1 1

(c) b = (−1, 1)



0

1⎥ ⎥

⎥. For the given vector b, find

1⎦

2 0 the general form of all vectors x in R 2 for which TA (x) = b if such vectors exist. (a) b = (0, 0, 0, 0)

(b) b = (1, 1, −1, −1)

(c) b = (2, 0, 0, 2) 23. (a) Let



0

⎢ A = ⎣1



1

0

0

0⎦



0 0 0 Show that relative to an xyz-coordinate system in 3-space the null space of A consists of all points on the z-axis and that the column space consists of all points in the xy -plane (see the accompanying figure). (b) Find a 3 × 3 matrix whose null space is the x -axis and whose column space is the yz-plane. z

(b) Use the method of Example 9 to find a basis for the row space of A that consists entirely of row vectors of A.

Null space of A

In Exercises 14–15, find a basis for the subspace of R 4 that is spanned by the given vectors.

y

14. (1, 1, −4, −3), (2, 0, 2, −2), (2, −1, 3, 2) 15. (1, 1, 0, 0), (0, 0, 1, 1), (−2, 0, 2, 2), (0, −3, 0, 3)



2



3 −6⎥ ⎥ −3⎦ −9

(b) b = (1, 3)

⎢0 ⎢ 22. In each part, let A = ⎢ ⎣1

13. (a) Use the methods of Examples 6 and 7 to find bases for the row space and column space of the matrix



1

find the general form of all vectors x in R 3 for which TA (x) = b if such vectors exist.



5 0⎥ ⎥ ⎥ −3⎥ ⎥ 1⎦ 0

19. The matrix in Exercise 10(b).

21. In each part, let A =

9 −1⎥ ⎥ ⎥ −1⎦ 8





In Exercises 18–19, find a basis for the row space of A that consists entirely of row vectors of A. 18. The matrix in Exercise 10(a).





In Exercises 11–12, a matrix in row echelon form is given. By inspection, find a basis for the row space and for the column space of that matrix.



In Exercises 16–17, find a subset of the given vectors that forms a basis for the space spanned by those vectors, and then express each vector that is not in the basis as a linear combination of the basis vectors.

16. v1 = (1, 0, 1, 1), v2 = (−3, 3, 7, 1), v3 = (−1, 3, 9, 3), v4 = (−5, 3, 5, −1)

In Exercises 9–10, find bases for the null space and row space of A.




Figure Ex-23 (axis label: x; region label: Column space of A)


24. Find a 3 × 3 matrix whose null space is (a) a point.

(b) a line.

(d) The set of nonzero row vectors of a matrix A is a basis for the row space of A.

(c) a plane.

(e) If A and B are n × n matrices that have the same row space, then A and B have the same column space.

25. (a) Find all 2 × 2 matrices whose null space is the line 3x − 5y = 0. (b) Describe the null spaces of the following matrices:



A=



1

4

0

5



, B=

1

0

0

5





, C=

6

2

3

1





, D=

0

0

0

0



(f ) If E is an m × m elementary matrix and A is an m × n matrix, then the null space of EA is the same as the null space of A. (g) If E is an m × m elementary matrix and A is an m × n matrix, then the row space of EA is the same as the row space of A. (h) If E is an m × m elementary matrix and A is an m × n matrix, then the column space of EA is the same as the column space of A.

Working with Proofs

26. Prove Theorem 4.7.4.

27. Prove that the row vectors of an n × n invertible matrix A form a basis for $R^n$.

(i) The system Ax = b is inconsistent if and only if b is not in the column space of A.

28. Suppose that A and B are n × n matrices and A is invertible. Invent and prove a theorem that describes how the row spaces of AB and B are related.

( j) There is an invertible matrix A and a singular matrix B such that the row spaces of A and B are the same.

Working with Technology

True-False Exercises

T1. Find a basis for the column space of

TF. In parts (a)–( j) determine whether the statement is true or false, and justify your answer. (a) The span of v1 , . . . , vn is the column space of the matrix whose column vectors are v1 , . . . , vn . (b) The column space of a matrix A is the set of solutions of Ax = b. (c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading 1’s form a basis for the column space of A.



2

⎢3 ⎢ ⎢ A=⎢ ⎢3 ⎢ ⎣2 1

4



6

0

8

4

12

9

−2

8

6

18

9

−7

−2

6

−3

6

5

18

4

33

⎥ ⎥ −1⎥ ⎥ ⎥ 11⎦

3

−2

0

2

6

2

6⎥

that consists of column vectors of A. T2. Find a basis for the row space of the matrix A in Exercise T1 that consists of row vectors of A.

4.8 Rank, Nullity, and the Fundamental Matrix Spaces

In the last section we investigated relationships between a system of linear equations and the row space, column space, and null space of its coefficient matrix. In this section we will be concerned with the dimensions of those spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its coefficient matrix.

Row and Column Spaces Have Equal Dimensions

In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix

\[
A = \begin{bmatrix}
1 & -3 & 4 & -2 & 5 & 4 \\
2 & -6 & 9 & -1 & 8 & 2 \\
2 & -6 & 9 & -1 & 9 & 7 \\
-1 & 3 & -4 & 2 & -5 & -4
\end{bmatrix}
\]

both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same dimension is not accidental, but rather a consequence of the following theorem.

THEOREM 4.8.1 The row space and the column space of a matrix A have the same dimension.

Proof It follows from Theorems 4.7.4 and 4.7.6(b) that elementary row operations do not change the dimension of the row space or of the column space of a matrix. Thus, if R is any row echelon form of A, it must be true that

dim(row space of A) = dim(row space of R)
dim(column space of A) = dim(column space of R)

so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the number of leading 1's. Since these two numbers are the same, the row and column space have the same dimension.

Rank and Nullity

The dimensions of the row space, column space, and null space of a matrix are such important numbers that there is some notation and terminology associated with them.

DEFINITION 1 The common dimension of the row space and column space of a matrix A is called the rank of A and is denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by nullity(A).

The proof of Theorem 4.8.1 shows that the rank of A can be interpreted as the number of leading 1's in any row echelon form of A.

EXAMPLE 1 Rank and Nullity of a 4 × 6 Matrix

Find the rank and nullity of the matrix

\[
A = \begin{bmatrix}
-1 & 2 & 0 & 4 & 5 & -3 \\
3 & -7 & 2 & 0 & 1 & 4 \\
2 & -5 & 2 & 4 & 6 & 1 \\
4 & -9 & 2 & -4 & -4 & 7
\end{bmatrix}
\]

Solution The reduced row echelon form of A is

\[
\begin{bmatrix}
1 & 0 & -4 & -28 & -37 & 13 \\
0 & 1 & -2 & -12 & -16 & 5 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{1}
\]

(verify). Since this matrix has two leading 1's, its row and column spaces are two-dimensional and rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear system Ax = 0. This system can be solved by reducing its augmented matrix to reduced row echelon form. The resulting matrix will be identical to (1), except that it will have an additional last column of zeros, and hence the corresponding system of equations will be

\[
\begin{aligned}
x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0 \\
x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0
\end{aligned}
\]

Solving these equations for the leading variables yields

\[
\begin{aligned}
x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6 \\
x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6
\end{aligned} \tag{2}
\]

from which we obtain the general solution

\[
\begin{aligned}
x_1 &= 4r + 28s + 37t - 13u \\
x_2 &= 2r + 12s + 16t - 5u \\
x_3 &= r \\
x_4 &= s \\
x_5 &= t \\
x_6 &= u
\end{aligned}
\]

or in column vector form

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}
= r \begin{bmatrix} 4 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ s \begin{bmatrix} 28 \\ 12 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ t \begin{bmatrix} 37 \\ 16 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ u \begin{bmatrix} -13 \\ -5 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \tag{3}
\]

Because the four vectors on the right side of (3) form a basis for the solution space, nullity(A) = 4.

EXAMPLE 2 Maximum Value for Rank

What is the maximum possible rank of an m × n matrix A that is not square?

Solution Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row space of A is at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the common dimension of its row and column space, it follows that the rank is at most the smaller of m and n. We denote this by writing

\[
\operatorname{rank}(A) \le \min(m, n)
\]

in which min(m, n) is the minimum of m and n.

The following theorem establishes a fundamental relationship between the rank and nullity of a matrix.

THEOREM 4.8.2 Dimension Theorem for Matrices
If A is a matrix with n columns, then

\[
\operatorname{rank}(A) + \operatorname{nullity}(A) = n \tag{4}
\]

Proof Since A has n columns, the homogeneous linear system Ax = 0 has n unknowns (variables). These fall into two distinct categories: the leading variables and the free variables. Thus,

\[
\begin{bmatrix} \text{number of leading} \\ \text{variables} \end{bmatrix}
+ \begin{bmatrix} \text{number of free} \\ \text{variables} \end{bmatrix} = n
\]

But the number of leading variables is the same as the number of leading 1's in any row echelon form of A, which is the same as the dimension of the row space of A, which is the same as the rank of A. Also, the number of free variables in the general solution of Ax = 0 is the same as the number of parameters in that solution, which is the same as the dimension of the solution space of Ax = 0, which is the same as the nullity of A. This yields Formula (4).


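EXAMPLE 3 The Sum of Rank and Nullity

The matrix

\[
A = \begin{bmatrix}
-1 & 2 & 0 & 4 & 5 & -3 \\
3 & -7 & 2 & 0 & 1 & 4 \\
2 & -5 & 2 & 4 & 6 & 1 \\
4 & -9 & 2 & -4 & -4 & 7
\end{bmatrix}
\]

has 6 columns, so

\[
\operatorname{rank}(A) + \operatorname{nullity}(A) = 6
\]

This is consistent with Example 1, where we showed that

\[
\operatorname{rank}(A) = 2 \quad \text{and} \quad \operatorname{nullity}(A) = 4
\]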
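The numbers in Examples 1 and 3 can be checked with a few lines of code. The following sketch is an illustration only: it counts the leading 1's of the reduced row echelon form to get the rank, counts the remaining (free) columns to get the nullity, and compares their sum with the number of columns. The helper names `rref` and `rank_and_nullity` are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and pivot column indices."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        if r == len(M):
            break
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots

def rank_and_nullity(A):
    """rank = number of leading 1's; nullity = number of free variables."""
    R, pivots = rref(A)
    free = [j for j in range(len(A[0])) if j not in pivots]
    return len(pivots), len(free), R

A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]
rank, nullity, R = rank_and_nullity(A)
n = len(A[0])
```

For this matrix `rank` is 2, `nullity` is 4, and their sum equals `n = 6`, in agreement with Formula (4). The first two rows of `R` match the nonzero rows of matrix (1) in Example 1.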

The following theorem, which summarizes results already obtained, interprets rank and nullity in the context of a homogeneous linear system.

THEOREM 4.8.3 If A is an m × n matrix, then

(a) rank(A) = the number of leading variables in the general solution of Ax = 0.

(b) nullity(A) = the number of parameters in the general solution of Ax = 0.

EXAMPLE 4 Rank, Nullity, and Linear Systems

(a) Find the number of parameters in the general solution of Ax = 0 if A is a 5 × 7 matrix of rank 3.

(b) Find the rank of a 5 × 7 matrix A for which Ax = 0 has a two-dimensional solution space.

Solution (a) From (4),

nullity(A) = n − rank(A) = 7 − 3 = 4

Thus, there are four parameters.

Solution (b) The matrix A has nullity 2, so

rank(A) = n − nullity(A) = 7 − 2 = 5

Recall from Section 4.7 that if Ax = b is a consistent linear system, then its general solution can be expressed as the sum of a particular solution of this system and the general solution of Ax = 0. We leave it as an exercise for you to use this fact and Theorem 4.8.3 to prove the following result.

THEOREM 4.8.4 If Ax = b is a consistent linear system of m equations in n unknowns, and if A has rank r, then the general solution of the system contains n − r parameters.

The Fundamental Spaces of a Matrix

There are six important vector spaces associated with a matrix A and its transpose $A^T$:

row space of A          row space of $A^T$
column space of A       column space of $A^T$
null space of A         null space of $A^T$


If A is an m × n matrix, then the row space and null space of A are subspaces of R n , and the column space of A and the null space of AT are subspaces of R m .

However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference in notation, the row space of $A^T$ is the same as the column space of A, and the column space of $A^T$ is the same as the row space of A. Thus, of the six spaces listed above, only the following four are distinct:

row space of A          column space of A
null space of A         null space of $A^T$

These are called the fundamental spaces of a matrix A. We will now consider how these four subspaces are related.

Let us focus for a moment on the matrix $A^T$. Since the row space and column space of a matrix have the same dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following result should not be surprising.

THEOREM 4.8.5 If A is any matrix, then rank(A) = rank($A^T$).

Proof rank(A) = dim(row space of A) = dim(column space of $A^T$) = rank($A^T$).

This result has some important implications. For example, if A is an m × n matrix, then applying Formula (4) to the matrix $A^T$ and using the fact that this matrix has m columns yields

\[
\operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m
\]

which, by virtue of Theorem 4.8.5, can be rewritten as

\[
\operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \tag{5}
\]

This alternative form of Formula (4) makes it possible to express the dimensions of all four fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r , then

A Geometric Link Between the Fundamental Spaces

dim[row(A)] = r

dim[col(A)] = r

dim[null(A)] = n − r

dim[null(AT )] = m − r

(6)

The four formulas in (6) provide an algebraic relationship between the size of a matrix and the dimensions of its fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces themselves. For this purpose recall from Theorem 3.4.3 that if A is an m × n matrix, then the null space of A consists of those vectors that are orthogonal to each of the row vectors of A. To develop that idea in more detail, we make the following definition.

DEFINITION 2 If W is a subspace of R^n, then the set of all vectors in R^n that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol W^⊥.

The following theorem lists three basic properties of orthogonal complements. We will omit the formal proof because a more general version of this theorem will be proved later in the text.

THEOREM 4.8.6 If W is a subspace of R^n, then:

(a) W^⊥ is a subspace of R^n.
(b) The only vector common to W and W^⊥ is 0.
(c) The orthogonal complement of W^⊥ is W.

Part (b) of Theorem 4.8.6 can be expressed as W ∩ W^⊥ = {0} and part (c) as (W^⊥)^⊥ = W.

EXAMPLE 5 Orthogonal Complements

In R^2 the orthogonal complement of a line W through the origin is the line through the origin that is perpendicular to W (Figure 4.8.1a); and in R^3 the orthogonal complement of a plane W through the origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).

Explain why {0} and R^n are orthogonal complements.

[Figure 4.8.1: (a) a line W through the origin in the xy-plane and the perpendicular line W^⊥; (b) a plane W through the origin in xyz-space and the perpendicular line W^⊥]

The next theorem will provide a geometric link between the fundamental spaces of a matrix. In the exercises we will ask you to prove that if a vector in R^n is orthogonal to each vector in a basis for a subspace of R^n, then it is orthogonal to every vector in that subspace. Thus, part (a) of the following theorem is essentially a restatement of Theorem 3.4.3 in the language of orthogonal complements; it is illustrated in Example 6 of Section 3.4. The proof of part (b), which is left as an exercise, follows from part (a). The essential idea of the theorem is illustrated in Figure 4.8.2.

[Figure 4.8.2: Row A and Null A drawn as perpendicular subspaces through the origin; Col A and Null A^T drawn as perpendicular subspaces through the origin]

THEOREM 4.8.7 If A is an m × n matrix, then:

(a) The null space of A and the row space of A are orthogonal complements in R^n.
(b) The null space of A^T and the column space of A are orthogonal complements in R^m.
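The dimension formulas in (6) and the orthogonality relations in Theorem 4.8.7 can be checked numerically. The following is only a minimal sketch: the helper names (rref, null_space_basis, transpose, dot) and the sample 3 × 4 matrix are illustrative choices, not part of the text, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]          # make the leading entry 1
        for i in range(m):
            if i != r and M[i][c] != 0:             # clear the rest of column c
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def null_space_basis(rows):
    """One basis vector of the null space per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

def transpose(rows):
    return [list(col) for col in zip(*rows)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Illustrative matrix (row 3 = row 1 + row 2, so the rank is 2).
A = [[1, 2, 4, 0],
     [-3, 1, 5, 2],
     [-2, 3, 9, 2]]
m, n = len(A), len(A[0])
r = len(rref(A)[1])                      # rank(A) = number of pivots
null_A = null_space_basis(A)             # basis for null(A), a subspace of R^n
null_AT = null_space_basis(transpose(A)) # basis for null(A^T), a subspace of R^m

print("rank:", r, "nullity(A):", len(null_A), "nullity(A^T):", len(null_AT))
```

Every vector in null_A should be orthogonal to every row of A, and every vector in null_AT to every column of A, which is what parts (a) and (b) of the theorem assert for basis vectors.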


More on the Equivalence Theorem

In Theorem 2.3.8 we listed six results that are equivalent to the invertibility of a square matrix A. We are now in a position to add ten more statements to that list to produce a single theorem that summarizes and links together all of the topics that we have covered thus far. We will prove some of the equivalences and leave others as exercises.

THEOREM 4.8.8 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is I_n.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is R^n.
(q) The orthogonal complement of the row space of A is {0}.

Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications (b) ⇒ (o) ⇒ (n) ⇒ (b).

(b) ⇒ (o) If Ax = 0 has only the trivial solution, then there are no parameters in that solution, so nullity(A) = 0 by Theorem 4.8.3(b).

(o) ⇒ (n) Theorem 4.8.2.

(n) ⇒ (b) If A has rank n, then Theorem 4.8.3(a) implies that there are n leading variables (hence no free variables) in the general solution of Ax = 0. This leaves the trivial solution as the only possibility.
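As a small illustration of how statements (g), (n), and (o) stand or fall together, the sketch below computes the rank and determinant of a square matrix by exact Gaussian elimination. The function name rank_and_det and the two sample matrices are assumptions made for this example only.

```python
from fractions import Fraction

def rank_and_det(rows):
    """Forward elimination with exact fractions; returns (rank, determinant)."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    det = Fraction(1)
    rank = 0
    for c in range(n):
        p = next((i for i in range(rank, n) if M[i][c] != 0), None)
        if p is None:
            det = Fraction(0)        # a column without a pivot forces det = 0
            continue
        if p != rank:
            M[rank], M[p] = M[p], M[rank]
            det = -det               # a row interchange flips the sign
        det *= M[rank][c]
        for i in range(rank + 1, n):
            f = M[i][c] / M[rank][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank, det

invertible = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
singular = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

for name, A in (("invertible", invertible), ("singular", singular)):
    n = len(A)
    rank, det = rank_and_det(A)
    nullity = n - rank               # Formula (4): rank + nullity = n
    print(name, "rank =", rank, "nullity =", nullity, "det =", det)
```

For each matrix the three tests det ≠ 0, rank = n, and nullity = 0 give the same answer, as the theorem requires.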

Applications of Rank

The advent of the Internet has stimulated research on finding efficient methods for transmitting large amounts of digital data over communications lines with limited bandwidths. Digital data are commonly stored in matrix form, and many techniques for improving transmission speed use the rank of a matrix in some way. Rank plays a role because it measures the “redundancy” in a matrix in the sense that if A is an m × n matrix of rank k , then n − k of the column vectors and m − k of the row vectors can be expressed in terms of k linearly independent column or row vectors. The essential idea in many data compression schemes is to approximate the original data set by a data set with smaller rank that conveys nearly the same information, then eliminate redundant vectors in the approximating set to speed up the transmission time.

Overdetermined and Underdetermined Systems (OPTIONAL)

In engineering and physics, the occurrence of an overdetermined or underdetermined linear system often signals that one or more variables were omitted in formulating the problem or that extraneous variables were included. This often leads to some kind of complication.

In many applications the equations in a linear system correspond to physical constraints or conditions that must be satisfied. In general, the most desirable systems are those that have the same number of constraints as unknowns since such systems often have a unique solution. Unfortunately, it is not always possible to match the number of constraints and unknowns, so researchers are often faced with linear systems that have more constraints than unknowns, called overdetermined systems, or with fewer constraints than unknowns, called underdetermined systems. The following theorem will help us to analyze both overdetermined and underdetermined systems.

THEOREM 4.8.9 Let A be an m × n matrix.

(a) (Overdetermined Case). If m > n, then the linear system Ax = b is inconsistent for at least one vector b in R^m.
(b) (Underdetermined Case). If m < n, then for each vector b in R^m the linear system Ax = b is either inconsistent or has infinitely many solutions.

Proof (a) Assume that m > n, in which case the column vectors of A cannot span R^m (fewer vectors than the dimension of R^m). Thus, there is at least one vector b in R^m that is not in the column space of A, and for any such b the system Ax = b is inconsistent by Theorem 4.7.1.

Proof (b) Assume that m < n. For each vector b in R^m there are two possibilities: either the system Ax = b is consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.4 implies that the general solution has n − r parameters, where r = rank(A). But we know from Example 2 that rank(A) is at most the smaller of m and n (which is m), so

n − r ≥ n − m > 0

This means that the general solution has at least one parameter and hence there are infinitely many solutions.

EXAMPLE 6 Overdetermined and Underdetermined Systems

(a) What can you say about the solutions of an overdetermined system Ax = b of 7 equations in 5 unknowns in which A has rank r = 4?
(b) What can you say about the solutions of an underdetermined system Ax = b of 5 equations in 7 unknowns in which A has rank r = 4?

Solution (a) The system is consistent for some vector b in R^7, and for any such b the number of parameters in the general solution is n − r = 5 − 4 = 1.

Solution (b) The system may be consistent or inconsistent, but if it is consistent for the vector b in R^5, then the general solution has n − r = 7 − 4 = 3 parameters.

EXAMPLE 7 An Overdetermined System

The linear system

$\begin{aligned} x_1 - 2x_2 &= b_1 \\ x_1 - x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 + 2x_2 &= b_4 \\ x_1 + 3x_2 &= b_5 \end{aligned}$

is overdetermined, so it cannot be consistent for all possible values of b1, b2, b3, b4, and b5. Conditions under which the system is consistent can be obtained by solving the linear system by Gauss–Jordan elimination. We leave it for you to show that the augmented matrix is row equivalent to

$\left[\begin{array}{cc|c} 1 & 0 & 2b_2 - b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 - 3b_2 + 2b_1 \\ 0 & 0 & b_4 - 4b_2 + 3b_1 \\ 0 & 0 & b_5 - 5b_2 + 4b_1 \end{array}\right]$    (7)

Thus, the system is consistent if and only if b1, b2, b3, b4, and b5 satisfy the conditions

$\begin{aligned} 2b_1 - 3b_2 + b_3 &= 0 \\ 3b_1 - 4b_2 + b_4 &= 0 \\ 4b_1 - 5b_2 + b_5 &= 0 \end{aligned}$

Solving this homogeneous linear system yields

b1 = 5r − 4s, b2 = 4r − 3s, b3 = 2r − s, b4 = r, b5 = s

where r and s are arbitrary.

Remark The coefficient matrix for the given linear system in the last example has n = 2 columns, and it has rank r = 2 because there are two nonzero rows in its reduced row echelon form. This implies that when the system is consistent its general solution will contain n − r = 0 parameters; that is, the solution will be unique. With a moment’s thought, you should be able to see that this is so from (7).
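The consistency conditions of Example 7 are easy to test by direct substitution. In the sketch below (the function names consistent and b_from_params are illustrative), the first two equations determine x1 = 2b2 − b1 and x2 = b2 − b1, exactly as in the first two rows of (7), and the remaining equations are then checked.

```python
# Coefficients (a1, a2) of x1 and x2 in the five equations of Example 7.
coeffs = [(1, -2), (1, -1), (1, 1), (1, 2), (1, 3)]

def consistent(b):
    """True if the system with right-hand side b = (b1, ..., b5) has a solution."""
    b1, b2 = b[0], b[1]
    x1, x2 = 2 * b2 - b1, b2 - b1        # forced by the first two equations
    return all(a1 * x1 + a2 * x2 == bi for (a1, a2), bi in zip(coeffs, b))

def b_from_params(r, s):
    """Right-hand sides given by the parametric solution in the example."""
    return (5 * r - 4 * s, 4 * r - 3 * s, 2 * r - s, r, s)

print(consistent(b_from_params(2, -1)))   # a right-hand side built from the parameters
print(consistent((1, 0, 0, 0, 0)))        # a right-hand side that violates the conditions
```

Because the candidate (x1, x2) is the only one permitted by the first two equations, the check also reflects the uniqueness noted in the Remark.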

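Exercise Set 4.8

In Exercises 1–2, find the rank and nullity of the matrix A by reducing it to row echelon form.

1. (a) $A = \begin{bmatrix} 1 & 2 & -1 & 1 \\ 2 & 4 & -2 & 2 \\ 3 & 6 & -3 & 3 \\ 4 & 8 & -4 & 4 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & -2 & 2 & 3 & -1 \\ -3 & 6 & -1 & 1 & -7 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix}$

2. (a) $A = \begin{bmatrix} 1 & 0 & -2 & 1 & 0 \\ 0 & -1 & -3 & 1 & 3 \\ -2 & -1 & 1 & -1 & 3 \\ 0 & 1 & 3 & 0 & -4 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 3 & 1 & 3 \\ 0 & 1 & 1 & 0 \\ -3 & 0 & 6 & -1 \\ 3 & 4 & -2 & 1 \\ 2 & 0 & -4 & -2 \end{bmatrix}$

In Exercises 3–6, the matrix R is the reduced row echelon form of the matrix A.

(a) By inspection of the matrix R, find the rank and nullity of A.
(b) Confirm that the rank and nullity satisfy Formula (4).
(c) Find the number of leading variables and the number of parameters in the general solution of Ax = 0 without solving the system.

3. $A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & -3 \\ 1 & 1 & 4 \end{bmatrix}; \quad R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & -3 \\ 1 & 1 & -6 \end{bmatrix}; \quad R = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix}$

5. $A = \begin{bmatrix} 2 & -1 & -3 \\ -2 & 1 & 3 \\ -4 & 2 & 6 \end{bmatrix}; \quad R = \begin{bmatrix} 1 & -\frac{1}{2} & -\frac{3}{2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

6. $A = \begin{bmatrix} 0 & 2 & 2 & 4 \\ 1 & 0 & -1 & -3 \\ 2 & 3 & 1 & 1 \\ -2 & 1 & 3 & -2 \end{bmatrix}; \quad R = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$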

7. In each part, find the largest possible value for the rank of A and the smallest possible value for the nullity of A.
(a) A is 4 × 4
(b) A is 3 × 5
(c) A is 5 × 3

8. If A is an m × n matrix, what is the largest possible value for its rank and the smallest possible value for its nullity?

9. In each part, use the information in the table to: (i) find the dimensions of the row space of A, column space of A, null space of A, and null space of A^T; (ii) determine whether or not the linear system Ax = b is consistent; (iii) find the number of parameters in the general solution of each system in (ii) that is consistent.

              (a)     (b)     (c)     (d)     (e)     (f)     (g)
Size of A     3 × 3   3 × 3   3 × 3   5 × 9   5 × 9   4 × 4   6 × 2
Rank(A)       3       2       1       2       2       0       2
Rank[A | b]   3       3       1       2       3       0       2

10. Verify that rank(A) = rank(A^T).

$A = \begin{bmatrix} 1 & 2 & 4 & 0 \\ -3 & 1 & 5 & 2 \\ -2 & 3 & 9 & 2 \end{bmatrix}$

11. (a) Find an equation relating nullity(A) and nullity(A^T) for the matrix in Exercise 10.
(b) Find an equation relating nullity(A) and nullity(A^T) for a general m × n matrix.

12. Let T : R^2 → R^3 be the linear transformation defined by the formula

T(x1, x2) = (x1 + 3x2, x1 − x2, x1)

(a) Find the rank of the standard matrix for T.
(b) Find the nullity of the standard matrix for T.

13. Let T : R^5 → R^3 be the linear transformation defined by the formula

T(x1, x2, x3, x4, x5) = (x1 + x2, x2 + x3 + x4, x4 + x5)

(a) Find the rank of the standard matrix for T.
(b) Find the nullity of the standard matrix for T.

14. Discuss how the rank of A varies with t.

(a) $A = \begin{bmatrix} 1 & 1 & t \\ 1 & t & 1 \\ t & 1 & 1 \end{bmatrix}$    (b) $A = \begin{bmatrix} t & 3 & -1 \\ 3 & 6 & -2 \\ -1 & -3 & t \end{bmatrix}$

15. Are there values of r and s for which

$\begin{bmatrix} 1 & 0 & 0 \\ 0 & r-2 & 2 \\ 0 & s-1 & r+2 \\ 0 & 0 & 3 \end{bmatrix}$

has rank 1? Has rank 2? If so, find those values.

16. (a) Give an example of a 3 × 3 matrix whose column space is a plane through the origin in 3-space.
(b) What kind of geometric object is the null space of your matrix?
(c) What kind of geometric object is the row space of your matrix?

17. Suppose that A is a 3 × 3 matrix whose null space is a line through the origin in 3-space. Can the row or column space of A also be a line through the origin? Explain.

18. (a) If A is a 3 × 5 matrix, then the rank of A is at most ______. Why?
(b) If A is a 3 × 5 matrix, then the nullity of A is at most ______. Why?
(c) If A is a 3 × 5 matrix, then the rank of A^T is at most ______. Why?
(d) If A is a 3 × 5 matrix, then the nullity of A^T is at most ______. Why?

19. (a) If A is a 3 × 5 matrix, then the number of leading 1’s in the reduced row echelon form of A is at most ______. Why?
(b) If A is a 3 × 5 matrix, then the number of parameters in the general solution of Ax = 0 is at most ______. Why?
(c) If A is a 5 × 3 matrix, then the number of leading 1’s in the reduced row echelon form of A is at most ______. Why?
(d) If A is a 5 × 3 matrix, then the number of parameters in the general solution of Ax = 0 is at most ______. Why?

20. Let A be a 7 × 6 matrix such that Ax = 0 has only the trivial solution. Find the rank and nullity of A.

21. Let A be a 5 × 7 matrix with rank 4.
(a) What is the dimension of the solution space of Ax = 0?
(b) Is Ax = b consistent for all vectors b in R^5? Explain.

22. Let

$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$

Show that A has rank 2 if and only if one or more of the following determinants is nonzero.

$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \quad \begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \quad \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}$

23. Use the result in Exercise 22 to show that the set of points (x, y, z) in R^3 for which the matrix

$\begin{bmatrix} x & y & z \\ 1 & x & y \end{bmatrix}$

has rank 1 is the curve with parametric equations x = t, y = t^2, z = t^3.

24. Find matrices A and B for which rank(A) = rank(B), but rank(A^2) ≠ rank(B^2).

25. In Example 6 of Section 3.4 we showed that the row space and the null space of the matrix

$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$

are orthogonal complements in R^6, as guaranteed by part (a) of Theorem 4.8.7. Show that the null space of A^T and the column space of A are orthogonal complements in R^4, as guaranteed by part (b) of Theorem 4.8.7. [Suggestion: Show that each column vector of A is orthogonal to each vector in a basis for the null space of A^T.]

26. Confirm the results stated in Theorem 4.8.7 for the matrix

$A = \begin{bmatrix} -2 & -5 & 8 & 0 & -17 \\ 1 & 3 & -5 & 1 & 5 \\ 3 & 11 & -19 & 7 & 1 \\ 1 & 7 & -13 & 5 & -3 \end{bmatrix}$

27. In each part, state whether the system is overdetermined or underdetermined. If overdetermined, find all values of the b’s for which it is inconsistent, and if underdetermined, find all values of the b’s for which it is inconsistent and all values for which it has infinitely many solutions.

(a) $\begin{bmatrix} 1 & -1 \\ -3 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -3 & 4 \\ -2 & -6 & 8 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -3 & 0 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$

28. What conditions must be satisfied by b1, b2, b3, b4, and b5 for the overdetermined linear system

$\begin{aligned} x_1 - 3x_2 &= b_1 \\ x_1 - 2x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 - 4x_2 &= b_4 \\ x_1 + 5x_2 &= b_5 \end{aligned}$

to be consistent?

Working with Proofs

29. Prove: If k ≠ 0, then A and kA have the same rank.

30. Prove: If a matrix A is not square, then either the row vectors or the column vectors of A are linearly dependent.

31. Use Theorem 4.8.3 to prove Theorem 4.8.4.

32. Prove Theorem 4.8.7(b).

33. Prove: If a vector v in R^n is orthogonal to each vector in a basis for a subspace W of R^n, then v is orthogonal to every vector in W.

True-False Exercises

TF. In parts (a)–(j) determine whether the statement is true or false, and justify your answer.

(a) Either the row vectors or the column vectors of a square matrix are linearly independent.
(b) A matrix with linearly independent row vectors and linearly independent column vectors is square.
(c) The nullity of a nonzero m × n matrix is at most m.
(d) Adding one additional column to a matrix increases its rank by one.
(e) The nullity of a square matrix with linearly dependent rows is at least one.
(f) If A is square and Ax = b is inconsistent for some vector b, then the nullity of A is zero.
(g) If a matrix A has more rows than columns, then the dimension of the row space is greater than the dimension of the column space.
(h) If rank(A^T) = rank(A), then A is square.
(i) There is no 3 × 3 matrix whose row space and null space are both lines in 3-space.
(j) If V is a subspace of R^n and W is a subspace of V, then W^⊥ is a subspace of V^⊥.

Working with Technology

T1. It can be proved that a nonzero matrix A has rank k if and only if some k × k submatrix has a nonzero determinant and all square submatrices of larger size have determinant zero. Use this fact to find the rank of

$A = \begin{bmatrix} 3 & -1 & 3 & 2 & 5 \\ 5 & -3 & 2 & 3 & 4 \\ 1 & -3 & -5 & 0 & -7 \\ 7 & -5 & 1 & 4 & 1 \end{bmatrix}$

Check your result by computing the rank of A in a different way.

T2. Sylvester’s inequality states that if A and B are n × n matrices with rank r_A and r_B, respectively, then the rank r_AB of AB satisfies the inequality

r_A + r_B − n ≤ r_AB ≤ min(r_A, r_B)

where min(r_A, r_B) denotes the smaller of r_A and r_B or their common value if the two ranks are the same. Use your technology utility to confirm this result for some matrices of your choice.

4.9 Basic Matrix Transformations in R^2 and R^3

In this section we will continue our study of linear transformations by considering some basic types of matrix transformations in R^2 and R^3 that have simple geometric interpretations. The transformations we will study here are important in such fields as computer graphics, engineering, and physics.

There are many ways to transform the vector spaces R^2 and R^3, some of the most important of which can be accomplished by matrix transformations using the methods introduced in Section 1.8. For example, rotations about the origin, reflections about lines and planes through the origin, and projections onto lines and planes through the origin can all be accomplished using a linear operator T_A in which A is an appropriate 2 × 2 or 3 × 3 matrix.

Reflection Operators

Some of the most basic matrix operators on R^2 and R^3 are those that map each point into its symmetric image about a fixed line or a fixed plane that contains the origin; these are called reflection operators. Table 1 shows the standard matrices for the reflections about the coordinate axes in R^2, and Table 2 shows the standard matrices for the reflections about the coordinate planes in R^3. In each case the standard matrix was obtained using the following procedure introduced in Section 1.8: Find the images of the standard basis vectors, convert those images to column vectors, and then use those column vectors as successive columns of the standard matrix.

Table 1

Operator | Images of e1 and e2 | Standard Matrix
Reflection about the x-axis, T(x, y) = (x, −y) | T(e1) = T(1, 0) = (1, 0), T(e2) = T(0, 1) = (0, −1) | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
Reflection about the y-axis, T(x, y) = (−x, y) | T(e1) = T(1, 0) = (−1, 0), T(e2) = T(0, 1) = (0, 1) | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$
Reflection about the line y = x, T(x, y) = (y, x) | T(e1) = T(1, 0) = (0, 1), T(e2) = T(0, 1) = (1, 0) | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

[Illustrations: each shows a vector x with tip (x, y) and its reflected image T(x).]

Table 2

Operator | Images of e1, e2, e3 | Standard Matrix
Reflection about the xy-plane, T(x, y, z) = (x, y, −z) | T(e1) = T(1, 0, 0) = (1, 0, 0), T(e2) = T(0, 1, 0) = (0, 1, 0), T(e3) = T(0, 0, 1) = (0, 0, −1) | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$
Reflection about the xz-plane, T(x, y, z) = (x, −y, z) | T(e1) = T(1, 0, 0) = (1, 0, 0), T(e2) = T(0, 1, 0) = (0, −1, 0), T(e3) = T(0, 0, 1) = (0, 0, 1) | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Reflection about the yz-plane, T(x, y, z) = (−x, y, z) | T(e1) = T(1, 0, 0) = (−1, 0, 0), T(e2) = T(0, 1, 0) = (0, 1, 0), T(e3) = T(0, 0, 1) = (0, 0, 1) | $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

[Illustrations: each shows a vector x with tip (x, y, z) and its reflected image T(x).]
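The procedure used to build Tables 1 and 2 (apply T to each standard basis vector and use the images as columns) can be sketched in a few lines. The function name standard_matrix and the three sample operators are illustrative; the operators themselves are taken from the tables.

```python
def standard_matrix(T, n):
    """Build the standard matrix of T on R^n; its columns are T(e_1), ..., T(e_n)."""
    images = []
    for j in range(n):
        e_j = tuple(1 if i == j else 0 for i in range(n))   # j-th standard basis vector
        images.append(T(*e_j))
    # The images are the columns, so transpose the list of images to get the rows.
    return [list(row) for row in zip(*images)]

def reflect_x_axis(x, y):
    return (x, -y)

def reflect_line_y_equals_x(x, y):
    return (y, x)

def reflect_xz_plane(x, y, z):
    return (x, -y, z)

print(standard_matrix(reflect_x_axis, 2))
print(standard_matrix(reflect_line_y_equals_x, 2))
print(standard_matrix(reflect_xz_plane, 3))
```

The same helper works for any of the operators tabulated in this section, since each is given by a formula for T(x, y) or T(x, y, z).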

Projection Operators

Matrix operators on R^2 and R^3 that map each point into its orthogonal projection onto a fixed line or plane through the origin are called projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the orthogonal projections onto the coordinate axes in R^2, and Table 4 shows the standard matrices for the orthogonal projections onto the coordinate planes in R^3.

Table 3

Operator | Images of e1 and e2 | Standard Matrix
Orthogonal projection onto the x-axis, T(x, y) = (x, 0) | T(e1) = T(1, 0) = (1, 0), T(e2) = T(0, 1) = (0, 0) | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$
Orthogonal projection onto the y-axis, T(x, y) = (0, y) | T(e1) = T(1, 0) = (0, 0), T(e2) = T(0, 1) = (0, 1) | $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

[Illustrations: each shows a vector x with tip (x, y) and its projection T(x) on the axis.]

Table 4

Operator | Images of e1, e2, e3 | Standard Matrix
Orthogonal projection onto the xy-plane, T(x, y, z) = (x, y, 0) | T(e1) = T(1, 0, 0) = (1, 0, 0), T(e2) = T(0, 1, 0) = (0, 1, 0), T(e3) = T(0, 0, 1) = (0, 0, 0) | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$
Orthogonal projection onto the xz-plane, T(x, y, z) = (x, 0, z) | T(e1) = T(1, 0, 0) = (1, 0, 0), T(e2) = T(0, 1, 0) = (0, 0, 0), T(e3) = T(0, 0, 1) = (0, 0, 1) | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
Orthogonal projection onto the yz-plane, T(x, y, z) = (0, y, z) | T(e1) = T(1, 0, 0) = (0, 0, 0), T(e2) = T(0, 1, 0) = (0, 1, 0), T(e3) = T(0, 0, 1) = (0, 0, 1) | $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

[Illustrations: each shows a vector x with tip (x, y, z) and its projection T(x) on the coordinate plane.]

Rotation Operators

Matrix operators on R^2 and R^3 that move points along arcs of circles centered at the origin are called rotation operators. Let us consider how to find the standard matrix for the rotation operator T : R^2 → R^2 that moves points counterclockwise about the origin through a positive angle θ. As illustrated in Figure 4.9.1, the images of the standard basis vectors are

T(e1) = T(1, 0) = (cos θ, sin θ) and T(e2) = T(0, 1) = (−sin θ, cos θ)

so it follows from Formula (14) of Section 1.8 that the standard matrix for T is

$A = [T(\mathbf{e}_1) \mid T(\mathbf{e}_2)] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

[Figure 4.9.1: the unit vectors e1 and e2 and their images (cos θ, sin θ) and (−sin θ, cos θ) under T]

In keeping with common usage we will denote this operator by R_θ and call

$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$    (1)

the rotation matrix for R^2. If x = (x, y) is a vector in R^2, and if w = (w1, w2) is its image under the rotation, then the relationship w = R_θ x can be written in component form as

w1 = x cos θ − y sin θ
w2 = x sin θ + y cos θ    (2)

These are called the rotation equations for R^2. These ideas are summarized in Table 5.

Table 5

Operator | Rotation Equations | Standard Matrix
Counterclockwise rotation about the origin through an angle θ | w1 = x cos θ − y sin θ, w2 = x sin θ + y cos θ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

[Illustration: a vector x with tip (x, y) rotated through the angle θ to the vector w with tip (w1, w2).]

In the plane, counterclockwise angles are positive and clockwise angles are negative. The rotation matrix for a clockwise rotation of −θ radians can be obtained by replacing θ by −θ in (1). After simplification this yields

$R_{-\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$
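The rotation matrix (1) and the rotation equations (2) translate directly into code. The sketch below uses floating-point arithmetic, so results agree with the exact values only up to rounding; the names rotation_matrix and apply are illustrative.

```python
import math

def rotation_matrix(theta):
    """Standard matrix R_theta for a counterclockwise rotation through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply(M, v):
    """Matrix-vector product M v, returned as a tuple."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in M)

# Rotating e1 = (1, 0) through a quarter turn should give approximately e2 = (0, 1).
w = apply(rotation_matrix(math.pi / 2), (1.0, 0.0))
print(w)

# Rotating through theta and then through -theta should return the original vector.
theta, v = 0.7, (2.0, -3.0)
back = apply(rotation_matrix(-theta), apply(rotation_matrix(theta), v))
print(back)
```

The second check reflects the fact that the matrix obtained by replacing θ with −θ undoes the rotation R_θ.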

EXAMPLE 1 A Rotation Operator

Find the image of x = (1, 1) under a rotation of π/6 radians (= 30°) about the origin.

Solution It follows from (1) with θ = π/6 that

$R_{\pi/6}\,\mathbf{x} = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{3}-1}{2} \\ \frac{1+\sqrt{3}}{2} \end{bmatrix} \approx \begin{bmatrix} 0.37 \\ 1.37 \end{bmatrix}$

or in comma-delimited notation, R_{π/6}(1, 1) ≈ (0.37, 1.37).

Rotations in R^3

A rotation of vectors in R^3 is commonly described in relation to a line through the origin called the axis of rotation and a unit vector u along that line (Figure 4.9.2a). The unit vector and what is called the right-hand rule can be used to establish a sign for the angle of rotation by cupping the fingers of your right hand so they curl in the direction of rotation and observing the direction of your thumb. If your thumb points in the direction of u, then the angle of rotation is regarded to be positive relative to u, and if it points in the direction opposite to u, then it is regarded to be negative relative to u (Figure 4.9.2b).

[Figure 4.9.2: (a) Angle of rotation: a vector x rotated through θ to w about the axis of rotation l with unit vector u; (b) Right-hand rule: positive and negative rotations relative to u]

For rotations about the coordinate axes in R^3, we will take the unit vectors to be i, j, and k, in which case an angle of rotation will be positive if it is counterclockwise looking toward the origin along the positive coordinate axis and will be negative if it is clockwise. Table 6 shows the standard matrices for the rotation operators on R^3 that rotate each vector about one of the coordinate axes through an angle θ. You will find it instructive to compare these matrices to that in Table 5.

Table 6

Operator | Rotation Equations | Standard Matrix
Counterclockwise rotation about the positive x-axis through an angle θ | w1 = x, w2 = y cos θ − z sin θ, w3 = y sin θ + z cos θ | $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$
Counterclockwise rotation about the positive y-axis through an angle θ | w1 = x cos θ + z sin θ, w2 = y, w3 = −x sin θ + z cos θ | $\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$
Counterclockwise rotation about the positive z-axis through an angle θ | w1 = x cos θ − y sin θ, w2 = x sin θ + y cos θ, w3 = z | $\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$

[Illustrations: each shows a vector x rotated through the angle θ about the indicated axis to the vector w.]

Yaw, Pitch, and Roll

In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an xyz-coordinate system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying along the y-axis and the xy-plane defines the horizontal, then the aircraft’s angle of rotation about the z-axis is called the yaw, its angle of rotation about the x-axis is called the pitch, and its angle of rotation about the y-axis is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis through the origin. This is, in fact, how a space shuttle makes attitude adjustments: it doesn’t perform each rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for docking.

[Figure: yaw about the z-axis, pitch about the x-axis, and roll about the y-axis]

For completeness, we note that the standard matrix for a counterclockwise rotation through an angle θ about an axis in R^3, which is determined by an arbitrary unit vector u = (a, b, c) that has its initial point at the origin, is

$\begin{bmatrix} a^2(1-\cos\theta)+\cos\theta & ab(1-\cos\theta)-c\sin\theta & ac(1-\cos\theta)+b\sin\theta \\ ab(1-\cos\theta)+c\sin\theta & b^2(1-\cos\theta)+\cos\theta & bc(1-\cos\theta)-a\sin\theta \\ ac(1-\cos\theta)-b\sin\theta & bc(1-\cos\theta)+a\sin\theta & c^2(1-\cos\theta)+\cos\theta \end{bmatrix}$    (3)

The derivation can be found in the book Principles of Interactive Computer Graphics, by W. M. Newman and R. F. Sproull (New York: McGraw-Hill, 1979). You may find it instructive to derive the results in Table 6 as special cases of this more general result.
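A numerical check is no substitute for the derivation, but it does show how Formula (3) specializes to the matrices in Table 6. In this sketch the function names (axis_rotation, rot_x, rot_y, rot_z, close) are illustrative, and the comparison is made in floating point for one sample angle.

```python
import math

def axis_rotation(u, theta):
    """Formula (3): rotation through theta about the unit vector u = (a, b, c)."""
    a, b, c = u
    C, S = math.cos(theta), math.sin(theta)
    k = 1 - C
    return [
        [a * a * k + C,     a * b * k - c * S, a * c * k + b * S],
        [a * b * k + c * S, b * b * k + C,     b * c * k - a * S],
        [a * c * k - b * S, b * c * k + a * S, c * c * k + C],
    ]

# The three matrices of Table 6.
def rot_x(t):
    return [[1, 0, 0], [0, math.cos(t), -math.sin(t)], [0, math.sin(t), math.cos(t)]]

def rot_y(t):
    return [[math.cos(t), 0, math.sin(t)], [0, 1, 0], [-math.sin(t), 0, math.cos(t)]]

def rot_z(t):
    return [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]

def close(M, N):
    """Entrywise comparison of two matrices up to rounding error."""
    return all(math.isclose(x, y, abs_tol=1e-12)
               for row_m, row_n in zip(M, N) for x, y in zip(row_m, row_n))

theta = 0.4
print(close(axis_rotation((1, 0, 0), theta), rot_x(theta)))
print(close(axis_rotation((0, 1, 0), theta), rot_y(theta)))
print(close(axis_rotation((0, 0, 1), theta), rot_z(theta)))
```

Taking u to be i, j, or k makes two of a, b, c zero and the third equal to 1, which is why most entries of (3) collapse to 0, 1, cos θ, or ±sin θ.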

Dilations and Contractions

If k is a nonnegative scalar, then the operator T(x) = kx on R^2 or R^3 has the effect of increasing or decreasing the length of each vector by a factor of k. If 0 ≤ k < 1 the operator is called a contraction with factor k, and if k > 1 it is called a dilation with factor k (Figure 4.9.3). Tables 7 and 8 illustrate these operators. If k = 1, then T is the identity operator.

[Figure 4.9.3: a vector x and its image T(x) = kx for (a) 0 ≤ k < 1 and (b) k > 1]

Table 7

Operator | Formula | Effect on the Unit Square | Standard Matrix
Contraction with factor k in R^2 (0 ≤ k < 1) | T(x, y) = (kx, ky) | the square with vertices (1, 0) and (0, 1) shrinks to the square with vertices (k, 0) and (0, k) | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$
Dilation with factor k in R^2 (k > 1) | T(x, y) = (kx, ky) | the square with vertices (1, 0) and (0, 1) grows to the square with vertices (k, 0) and (0, k) | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$

Table 8

Operator | Formula | Standard Matrix
Contraction with factor k in R^3 (0 ≤ k < 1) | T(x, y, z) = (kx, ky, kz) | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$
Dilation with factor k in R^3 (k > 1) | T(x, y, z) = (kx, ky, kz) | $\begin{bmatrix} k & 0 & 0 \\ 0 & k & 0 \\ 0 & 0 & k \end{bmatrix}$

Expansions and Compressions

In a dilation or contraction of R^2 or R^3, all coordinates are multiplied by a nonnegative factor k. If only one coordinate is multiplied by k, then, depending on the value of k, the resulting operator is called a compression or expansion with factor k in the direction of a coordinate axis. This is illustrated in Table 9 for R^2. The extension to R^3 is left as an exercise.

Table 9

Operator | Formula | Effect on the Unit Square | Standard Matrix
Compression in the x-direction with factor k in R^2 (0 ≤ k < 1) | T(x, y) = (kx, y) | vertex (1, 0) moves to (k, 0); vertex (0, 1) is fixed | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$
Expansion in the x-direction with factor k in R^2 (k > 1) | T(x, y) = (kx, y) | vertex (1, 0) moves to (k, 0); vertex (0, 1) is fixed | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$
Compression in the y-direction with factor k in R^2 (0 ≤ k < 1) | T(x, y) = (x, ky) | vertex (0, 1) moves to (0, k); vertex (1, 0) is fixed | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$
Expansion in the y-direction with factor k in R^2 (k > 1) | T(x, y) = (x, ky) | vertex (0, 1) moves to (0, k); vertex (1, 0) is fixed | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$

Shears

A matrix operator of the form T(x, y) = (x + ky, y) translates a point (x, y) in the xy-plane parallel to the x-axis by an amount ky that is proportional to the y-coordinate of the point. This operator leaves the points on the x-axis fixed (since y = 0), but as we progress away from the x-axis, the translation distance increases. We call this operator the shear in the x-direction by a factor k. Similarly, a matrix operator of the form T(x, y) = (x, y + kx) is called the shear in the y-direction by a factor k. Table 10, which illustrates the basic information about shears in R^2, shows that a shear is in the positive direction if k > 0 and the negative direction if k < 0.

Table 10

Operator | Formula | Effect on the Unit Square | Standard Matrix
Shear in the x-direction by a factor k in R^2 | T(x, y) = (x + ky, y) | vertex (1, 0) is fixed; vertex (0, 1) moves to (k, 1), to the right if k > 0 and to the left if k < 0 | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$
Shear in the y-direction by a factor k in R^2 | T(x, y) = (x, y + kx) | vertex (0, 1) is fixed; vertex (1, 0) moves to (1, k), upward if k > 0 and downward if k < 0 | $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$
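The effect of a shear or an expansion on the unit square can be traced by applying the standard matrix to the four vertices. The sketch below is illustrative only; the helper names (apply, shear_x, shear_y, expansion_x) are not from the text, and the vertices are listed counterclockwise starting at the origin.

```python
def apply(M, v):
    """Matrix-vector product M v for a 2 x 2 matrix, returned as a tuple."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in M)

def shear_x(k):
    return [[1, k], [0, 1]]      # T(x, y) = (x + ky, y)

def shear_y(k):
    return [[1, 0], [k, 1]]      # T(x, y) = (x, y + kx)

def expansion_x(k):
    return [[k, 0], [0, 1]]      # T(x, y) = (kx, y)

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

for name, M in (("shear_x(2)", shear_x(2)),
                ("shear_y(2)", shear_y(2)),
                ("expansion_x(2)", expansion_x(2))):
    print(name, [apply(M, p) for p in unit_square])
```

Under the shear in the x-direction the bottom edge stays on the x-axis while the top edge slides by k units, which turns the square into a parallelogram of the same area.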

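EXAMPLE 2 Effect of Matrix Operators on the Unit Square

In each part, describe the matrix operator whose standard matrix is shown, and show its effect on the unit square.

\[
\text{(a) } A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad
\text{(b) } A_2 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \qquad
\text{(c) } A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad
\text{(d) } A_4 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}
\]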

Solution By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the matrix A1 corresponds to a shear in the x-direction by a factor 2, the matrix A2 corresponds to a shear in the x-direction by a factor −2, the matrix A3 corresponds to a dilation with factor 2, and the matrix A4 corresponds to an expansion in the x-direction with factor 2. The effects of these operators on the unit square are shown in Figure 4.9.4.
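The effect on the unit square described in this solution can be checked by multiplying each corner of the square by each matrix. The sketch below uses plain Python. The matrices are written out from the descriptions in the solution (shear factors 2 and −2, dilation factor 2, expansion factor 2 in the x-direction), and the helper name `apply` is an illustrative choice.

```python
def apply(matrix, point):
    """Multiply a 2x2 matrix (given as two rows) by the column vector (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


matrices = {
    "A1": ((1, 2), (0, 1)),   # shear in the x-direction by a factor 2
    "A2": ((1, -2), (0, 1)),  # shear in the x-direction by a factor -2
    "A3": ((2, 0), (0, 2)),   # dilation with factor 2
    "A4": ((2, 0), (0, 1)),   # expansion in the x-direction with factor 2
}

# Corners of the unit square, listed counterclockwise from the origin.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

images = {
    name: [apply(m, corner) for corner in unit_square]
    for name, m in matrices.items()
}

for name, corners in images.items():
    print(name, corners)
```

Plotting each list of image corners as a closed polygon gives the four parallelograms and rectangles that the text refers to as Figure 4.9.4.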

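Figure 4.9.4 [Four panels, labeled A1, A2, A3, and A4, each showing the image of the unit square in the xy-plane.]

Orthogonal Projections onto Lines Through the Origin

Figure 4.9.5 [The line L through the origin at angle θ to the positive x-axis, a vector x, and its orthogonal projection T(x) onto L.]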

In Table 3 we listed the standard matrices for the orthogonal projections onto the coordinate axes in R 2 . These are special cases of the more general matrix operator TA : R 2 →R 2 that maps each point into its orthogonal projection onto a line L through the origin that makes an angle θ with the positive x -axis (Figure 4.9.5). In Example 4 of Section 3.3 we used Formula (10) of that section to find the orthogonal projections of the standard basis vectors for R 2 onto that line. Expressed in matrix form, we found those projections to be



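\[
T(\mathbf{e}_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix}
\quad\text{and}\quad
T(\mathbf{e}_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix}
\]

Thus, the standard matrix for $T_A$ is

\[
A = [\,T(\mathbf{e}_1) \mid T(\mathbf{e}_2)\,]
= \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta & \tfrac{1}{2}\sin 2\theta \\ \tfrac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}
\]

In keeping with common usage, we will denote this operator by

\[
P_\theta
= \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta & \tfrac{1}{2}\sin 2\theta \\ \tfrac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}
\tag{4}
\]

[Margin note] We have included two versions of Formula (4) because both are commonly used. Whereas the first version involves only the angle θ, the second involves both θ and 2θ.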

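EXAMPLE 3 Orthogonal Projection onto a Line Through the Origin

Use Formula (4) to find the orthogonal projection of the vector x = (1, 5) onto the line through the origin that makes an angle of π/6 (= 30°) with the positive x-axis.

Solution Since sin(π/6) = 1/2 and cos(π/6) = √3/2, it follows from (4) that the standard matrix for this projection is

\[
P_{\pi/6}
= \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix}
= \begin{bmatrix} \tfrac{3}{4} & \tfrac{\sqrt{3}}{4} \\ \tfrac{\sqrt{3}}{4} & \tfrac{1}{4} \end{bmatrix}
\]

Thus,

\[
P_{\pi/6}\,\mathbf{x}
= \begin{bmatrix} \tfrac{3}{4} & \tfrac{\sqrt{3}}{4} \\ \tfrac{\sqrt{3}}{4} & \tfrac{1}{4} \end{bmatrix}
\begin{bmatrix} 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} \tfrac{3 + 5\sqrt{3}}{4} \\ \tfrac{\sqrt{3} + 5}{4} \end{bmatrix}
\approx \begin{bmatrix} 2.91 \\ 1.68 \end{bmatrix}
\]

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.91, 1.68)$.

Reflections About Lines Through the Origin

Figure 4.9.6 [The line L through the origin at angle θ to the positive x-axis, a vector x, and its reflection $H_\theta \mathbf{x}$ about L.]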

In Table 1 we listed the reflections about the coordinate axes in R^2. These are special cases of the more general operator $H_\theta : R^2 \to R^2$ that maps each point into its reflection about a line L through the origin that makes an angle θ with the positive x-axis (Figure 4.9.6). We could find the standard matrix for $H_\theta$ by finding the images of the standard basis vectors, but instead we will take advantage of our work on orthogonal projections by using Formula (4) for $P_\theta$ to find a formula for $H_\theta$. You should be able to see from Figure 4.9.7 that for every vector x in R^2

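\[
P_\theta \mathbf{x} - \mathbf{x} = \tfrac{1}{2}\,(H_\theta \mathbf{x} - \mathbf{x})
\quad\text{or equivalently}\quad
H_\theta \mathbf{x} = (2P_\theta - I)\,\mathbf{x}
\]

Thus, it follows from Theorem 1.8.4 that

\[
H_\theta = 2P_\theta - I \tag{5}
\]

and hence from (4) that

\[
H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \tag{6}
\]

Figure 4.9.7 [The line L at angle θ, a vector x, its projection $P_\theta \mathbf{x}$ onto L, and its reflection $H_\theta \mathbf{x}$ about L.]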

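EXAMPLE 4 Reflection About a Line Through the Origin

Find the reflection of the vector x = (1, 5) about the line through the origin that makes an angle of π/6 (= 30°) with the x-axis.

Solution Since sin(π/3) = √3/2 and cos(π/3) = 1/2, it follows from (6) that the standard matrix for this reflection is

\[
H_{\pi/6}
= \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2} & \tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & -\tfrac{1}{2} \end{bmatrix}
\]

Thus,

\[
H_{\pi/6}\,\mathbf{x}
= \begin{bmatrix} \tfrac{1}{2} & \tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & -\tfrac{1}{2} \end{bmatrix}
\begin{bmatrix} 1 \\ 5 \end{bmatrix}
= \begin{bmatrix} \tfrac{1 + 5\sqrt{3}}{2} \\ \tfrac{\sqrt{3} - 5}{2} \end{bmatrix}
\approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix}
\]

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.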
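As a numerical check on Formulas (4), (5), and (6), the sketch below builds the projection and reflection matrices for the angle π/6 and applies them to x = (1, 5). It uses only the Python standard library. The function names `projection_matrix`, `reflection_matrix`, and `apply` are illustrative and do not come from the text.

```python
import math


def projection_matrix(theta):
    """Formula (4): orthogonal projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, s * c), (s * c, s * s))


def reflection_matrix(theta):
    """Formula (6): reflection about the line at angle theta."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return ((c2, s2), (s2, -c2))


def apply(matrix, vector):
    """Multiply a 2x2 matrix (two rows) by a column vector (x, y)."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)


theta = math.pi / 6
x = (1, 5)

proj = apply(projection_matrix(theta), x)  # approximately (2.915, 1.683)
refl = apply(reflection_matrix(theta), x)  # approximately (4.830, -1.634)

# Formula (5) says H = 2P - I, so H x should equal 2(P x) - x.
refl_from_proj = (2 * proj[0] - x[0], 2 * proj[1] - x[1])

print(proj, refl, refl_from_proj)
```

The last line of the computation confirms that the reflection can be obtained from the projection alone, which is the shortcut the text uses to derive Formula (6).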


Exercise Set 4.9

1. Use matrix multiplication to find the reflection of (−1, 2) about the (a) x -axis.

(b) y -axis.

(c) line y = x .

2. Use matrix multiplication to find the reflection of (a, b) about the (a) x -axis.

(b) y -axis.

(c) line y = x .

3. Use matrix multiplication to find the reflection of (2, −5, 3) about the (a) xy -plane.

(b) xz-plane.

(c) yz-plane.

4. Use matrix multiplication to find the reflection of (a, b, c) about the (a) xy -plane.

(b) xz-plane.

(c) yz-plane.

5. Use matrix multiplication to find the orthogonal projection of (2, −5) onto the (a) x -axis.

(b) y -axis.

6. Use matrix multiplication to find the orthogonal projection of (a, b) onto the (a) x -axis.

(b) y -axis.

7. Use matrix multiplication to find the orthogonal projection of (−2, 1, 3) onto the (a) xy -plane.

(b) xz-plane.

(c) yz-plane.

8. Use matrix multiplication to find the orthogonal projection of (a, b, c) onto the (a) xy -plane.

(b) xz-plane.

(c) yz-plane.

9. Use matrix multiplication to find the image of the vector (3, −4) when it is rotated about the origin through an angle of (a) θ = 30◦ .

(b) θ = −60◦ .

(c) θ = 45◦ .

(d) θ = 90◦ .

10. Use matrix multiplication to find the image of the nonzero vector v = (v1 , v2 ) when it is rotated about the origin through (a) a positive angle α .

(b) a negative angle −α .

11. Use matrix multiplication to find the image of the vector (2, −1, 2) if it is rotated
(a) 30° clockwise about the positive x-axis.
(b) 30° counterclockwise about the positive y-axis.
(c) 45° clockwise about the positive y-axis.
(d) 90° counterclockwise about the positive z-axis.

12. Use matrix multiplication to find the image of the vector (2, −1, 2) if it is rotated
(a) 30° counterclockwise about the positive x-axis.
(b) 30° clockwise about the positive y-axis.
(c) 45° counterclockwise about the positive y-axis.
(d) 90° clockwise about the positive z-axis.

13. (a) Use matrix multiplication to find the contraction of (−1, 2) with factor k = 1/2.
(b) Use matrix multiplication to find the dilation of (−1, 2) with factor k = 3.

14. (a) Use matrix multiplication to find the contraction of (a, b) with factor k = 1/α, where α > 1.
(b) Use matrix multiplication to find the dilation of (a, b) with factor k = α, where α > 1.

15. (a) Use matrix multiplication to find the contraction of (2, −1, 3) with factor k = 1/4.
(b) Use matrix multiplication to find the dilation of (2, −1, 3) with factor k = 2.

16. (a) Use matrix multiplication to find the contraction of (a, b, c) with factor k = 1/α, where α > 1.
(b) Use matrix multiplication to find the dilation of (a, b, c) with factor k = α, where α > 1.

17. (a) Use matrix multiplication to find the compression of (−1, 2) in the x-direction with factor k = 1/2.
(b) Use matrix multiplication to find the compression of (−1, 2) in the y-direction with factor k = 1/2.

18. (a) Use matrix multiplication to find the expansion of (−1, 2) in the x-direction with factor k = 3.
(b) Use matrix multiplication to find the expansion of (−1, 2) in the y-direction with factor k = 3.

19. (a) Use matrix multiplication to find the compression of (a, b) in the x-direction with factor k = 1/α, where α > 1.
(b) Use matrix multiplication to find the expansion of (a, b) in the y-direction with factor k = α, where α > 1.

20. Based on Table 9, make a conjecture about the standard matrices for the compressions with factor k in the directions of the coordinate axes in R^3.

Exercises 21–22 Using Example 2 as a model, describe the matrix operator whose standard matrix is given, and then show in a coordinate system its effect on the unit square.

21.
\[
\text{(a) } A_1 = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \qquad
\text{(b) } A_2 = \begin{bmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix} \qquad
\text{(c) } A_3 = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix} \qquad
\text{(d) } A_4 = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ 0 & 1 \end{bmatrix}
\]

22.
\[
\text{(a) } A_1 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} \qquad
\text{(b) } A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \qquad
\text{(c) } A_3 = \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \qquad
\text{(d) } A_4 = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}
\]

In each part of Exercises 23–24, the effect of some matrix operator on the unit square is shown. Find the standard matrix for an operator with that effect.

23. (a), (b) [Figures showing the image of the unit square in the xy-plane.]

24. (a), (b) [Figures showing the image of the unit square in the xy-plane.]

In Exercises 25–26, find the standard matrix for the orthogonal projection of R^2 onto the stated line, and then use that matrix to find the orthogonal projection of the given point onto that line.

25. The orthogonal projection of (3, 4) onto the line that makes an angle of π/3 (= 60°) with the positive x-axis.

26. The orthogonal projection of (1, 2) onto the line that makes an angle of π/4 (= 45°) with the positive x-axis.

In Exercises 27–28, find the standard matrix for the reflection of R^2 about the stated line, and then use that matrix to find the reflection of the given point about that line.

27. The reflection of (3, 4) about the line that makes an angle of π/3 (= 60°) with the positive x-axis.

28. The reflection of (1, 2) about the line that makes an angle of π/4 (= 45°) with the positive x-axis.

29. For each reflection operator in Table 2 use the standard matrix to compute T(1, 2, 3), and convince yourself that your result makes sense geometrically.

30. For each orthogonal projection operator in Table 4 use the standard matrix to compute T(1, 2, 3), and convince yourself that your result makes sense geometrically.

31. Find the standard matrix for the operator T : R^3 → R^3 that
(a) rotates each vector 30° counterclockwise about the z-axis (looking along the positive z-axis toward the origin).
(b) rotates each vector 45° counterclockwise about the x-axis (looking along the positive x-axis toward the origin).
(c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin).

32. In each part of the accompanying figure, find the standard matrix for the pictured operator.

Figure Ex-32 [Three panels, (a), (b), and (c), each showing a point (x, y, z) in xyz-space and its image; the image labels that appear in the figure are (z, y, x), (y, x, z), and (x, z, y).]

33. Use Formula (3) to find the standard matrix for a rotation of 180° about the axis determined by the vector v = (2, 2, 1). [Note: Formula (3) requires that the vector defining the axis of rotation have length 1.]

34. Use Formula (3) to find the standard matrix for a rotation of π/2 radians about the axis determined by v = (1, 1, 1). [Note: Formula (3) requires that the vector defining the axis of rotation have length 1.]

35. Use Formula (3) to derive the standard matrices for the rotations about the x-axis, the y-axis, and the z-axis through an angle of 90° in R^3.

36. Show that the standard matrices listed in Tables 1 and 3 are special cases of Formulas (4) and (6).

37. In a sentence, describe the geometric effect of multiplying a vector x by the matrix

\[
A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}
\]

38. If multiplication by A rotates a vector x in the xy-plane through an angle θ, what is the effect of multiplying x by A^T? Explain your reasoning.

39. Let x0 be a nonzero column vector in R^2, and suppose that T : R^2 → R^2 is the transformation defined by the formula T(x) = x0 + Rθ x, where Rθ is the standard matrix of the rotation of R^2 about the origin through the angle θ. Give a geometric description of this transformation. Is it a matrix transformation? Explain.

40. In R^3 the orthogonal projections onto the x-axis, y-axis, and z-axis are

T1(x, y, z) = (x, 0, 0), T2(x, y, z) = (0, y, 0), T3(x, y, z) = (0, 0, z)

respectively.
(a) Show that the orthogonal projections onto the coordinate axes are matrix operators, and then find their standard matrices.
(b) Show that if T : R^3 → R^3 is an orthogonal projection onto one of the coordinate axes, then for every vector x in R^3, the vectors T(x) and x − T(x) are orthogonal.
(c) Make a sketch showing x and x − T(x) in the case where T is the orthogonal projection onto the x-axis.

4.10 Properties of Matrix Transformations

In this section we will discuss properties of matrix transformations. We will show, for example, that if several matrix transformations are performed in succession, then the same result can be obtained by a single matrix transformation that is chosen appropriately. We will also explore the relationship between the invertibility of a matrix and properties of the corresponding transformation.

Compositions of Matrix Transformations

Suppose that TA is a matrix transformation from R n to R k and TB is a matrix transformation from R k to R m . If x is a vector in R n , then TA maps this vector into a vector TA (x) in R k , and TB , in turn, maps that vector into the vector TB (TA (x)) in R m . This process creates a transformation from R n to R m that we call the composition of TB with TA and denote by the symbol

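\[
T_B \circ T_A
\]

which is read “$T_B$ circle $T_A$.” As illustrated in Figure 4.10.1, the transformation $T_A$ in the formula is performed first; that is,

\[
(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) \tag{1}
\]

This composition is itself a matrix transformation since

\[
(T_B \circ T_A)(\mathbf{x}) = T_B(T_A(\mathbf{x})) = B(T_A(\mathbf{x})) = B(A\mathbf{x}) = (BA)\mathbf{x}
\]

which shows that it is multiplication by BA. This is expressed by the formula

\[
T_B \circ T_A = T_{BA} \tag{2}
\]

Figure 4.10.1 [A vector x in R^n is mapped by $T_A$ to $T_A(\mathbf{x})$ in R^k, which is mapped by $T_B$ to $T_B(T_A(\mathbf{x}))$ in R^m; the composite map from R^n to R^m is $T_B \circ T_A$.]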

Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have the appropriate dimensions. For example, to extend Formula (2) to three factors, consider the matrix transformations

\[
T_A : R^n \to R^k, \qquad T_B : R^k \to R^l, \qquad T_C : R^l \to R^m
\]

We define the composition $(T_C \circ T_B \circ T_A) : R^n \to R^m$ by

\[
(T_C \circ T_B \circ T_A)(\mathbf{x}) = T_C(T_B(T_A(\mathbf{x})))
\]

As above, it can be shown that this is a matrix transformation whose standard matrix is CBA and that

\[
T_C \circ T_B \circ T_A = T_{CBA} \tag{3}
\]

Sometimes we will want to refer to the standard matrix for a matrix transformation $T : R^n \to R^m$ without giving a name to the matrix itself. In such cases we will denote the standard matrix for T by the symbol [T]. Thus, the equation

\[
T(\mathbf{x}) = [T]\mathbf{x}
\]

states that T(x) is the product of the standard matrix [T] and the column vector x. For example, if $T_1 : R^n \to R^k$ and if $T_2 : R^k \to R^m$, then Formula (2) can be restated as

\[
[T_2 \circ T_1] = [T_2][T_1] \tag{4}
\]

Similarly, Formula (3) can be restated as

\[
[T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1] \tag{5}
\]
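Formulas (2) and (3) can be checked on a small numerical case. The sketch below, in plain Python, compares applying the transformations one after another with a single multiplication by the product matrix. The matrices A, B, C and the vector x are arbitrary illustrative values, and the helper names `matmul` and `matvec` are not from the text.

```python
def matmul(P, Q):
    """Product PQ of two matrices stored as lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


def matvec(M, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


A = [[1, 2], [0, 1], [3, -1]]  # T_A maps R^2 into R^3
B = [[2, 0, 1], [1, -1, 0]]    # T_B maps R^3 into R^2
C = [[1, 1], [0, 2]]           # T_C maps R^2 into R^2
x = [4, -3]

# Formula (2): applying T_A and then T_B equals multiplication by BA.
two_steps = matvec(B, matvec(A, x))
one_step = matvec(matmul(B, A), x)

# Formula (3): three transformations in succession equal multiplication by CBA.
three_steps = matvec(C, matvec(B, matvec(A, x)))
one_step_cba = matvec(matmul(C, matmul(B, A)), x)

print(two_steps, one_step)         # [11, 1] [11, 1]
print(three_steps, one_step_cba)   # [12, 2] [12, 2]
```

Note the order of the factors: the transformation applied first corresponds to the rightmost matrix in the product.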

EXAMPLE 1 Composition Is Not Commutative

Let $T_1 : R^2 \to R^2$ be the reflection about the line y = x, and let $T_2 : R^2 \to R^2$ be the orthogonal projection onto the y-axis. Figure 4.10.2 illustrates graphically that $T_1 \circ T_2$ and $T_2 \circ T_1$ have different effects on a vector x. This same conclusion can be reached by showing that the standard matrices for $T_1$ and $T_2$ do not commute:

\[
[T_1 \circ T_2] = [T_1][T_2]
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\]

\[
[T_2 \circ T_1] = [T_2][T_1]
= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\]

so $[T_2 \circ T_1] \neq [T_1 \circ T_2]$.

[Margin note] WARNING Just as it is not generally true for matrices that AB = BA, so it is not generally true that

\[
T_B \circ T_A = T_A \circ T_B
\]

That is, order matters when matrix transformations are composed. In those special cases where the order does not matter we say that the linear transformations commute.

Figure 4.10.2 [Two panels showing the line y = x and a vector x: one panel shows $T_1(\mathbf{x})$ and $T_2(T_1(\mathbf{x}))$ for $T_2 \circ T_1$; the other shows $T_2(\mathbf{x})$ and $T_1(T_2(\mathbf{x}))$ for $T_1 \circ T_2$.]

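EXAMPLE 2 Composition of Rotations Is Commutative

Let $T_1 : R^2 \to R^2$ and $T_2 : R^2 \to R^2$ be the matrix operators that rotate vectors about the origin through the angles θ1 and θ2, respectively. Thus the operation

\[
(T_2 \circ T_1)(\mathbf{x}) = T_2(T_1(\mathbf{x}))
\]

first rotates x through the angle θ1, then rotates $T_1(\mathbf{x})$ through the angle θ2. It follows that the net effect of $T_2 \circ T_1$ is to rotate each vector in R^2 through the angle θ1 + θ2 (Figure 4.10.3). The standard matrices for these matrix operators, which are

\[
[T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad
[T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}, \quad
[T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix}
\]

should satisfy (4). With the help of some basic trigonometric identities, we can confirm that this is so as follows:

\[
\begin{aligned}
[T_2][T_1]
&= \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix}
\begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix} \\
&= \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} \\
&= [T_2 \circ T_1]
\end{aligned}
\]

Figure 4.10.3 [A vector x, its image $T_1(\mathbf{x})$ after rotation through θ1, and $T_2(T_1(\mathbf{x}))$ after a further rotation through θ2; the total angle is θ1 + θ2.]

[Margin note] Using the notation $R_\theta$ for a rotation of R^2 about the origin through an angle θ, the computation in Example 2 shows that

\[
R_{\theta_1} R_{\theta_2} = R_{\theta_1 + \theta_2}
\]

This makes sense since rotating a vector through an angle θ1 and then rotating the resulting vector through an angle θ2 is the same as rotating the original vector through the angle θ1 + θ2.

EXAMPLE 3 Composition of Two Reflections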

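Let $T_1 : R^2 \to R^2$ be the reflection about the y-axis, and let $T_2 : R^2 \to R^2$ be the reflection about the x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every vector x = (x, y) into its negative −x = (−x, −y) (Figure 4.10.4):

\[
\begin{aligned}
(T_1 \circ T_2)(x, y) &= T_1(x, -y) = (-x, -y) \\
(T_2 \circ T_1)(x, y) &= T_2(-x, y) = (-x, -y)
\end{aligned}
\]

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard matrices for $T_1$ and $T_2$ commute:

\[
[T_1 \circ T_2] = [T_1][T_2]
= \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}
\]

\[
[T_2 \circ T_1] = [T_2][T_1]
= \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}
\]

The operator T(x) = −x on R^2 or R^3 is called the reflection about the origin. As the foregoing computations show, the standard matrix for this operator on R^2 is

\[
[T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}
\]

Figure 4.10.4 [Two panels: for $T_1 \circ T_2$, the point (x, y) is mapped by $T_2$ to (x, −y) and then by $T_1$ to (−x, −y); for $T_2 \circ T_1$, the point (x, y) is mapped by $T_1$ to (−x, y) and then by $T_2$ to (−x, −y).]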
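The three examples above can be compared side by side with a short computation. The sketch below, in plain Python with the standard `math` module, multiplies the standard matrices in both orders for each pair of operators. The angles 0.7 and 1.9 radians are arbitrary sample values, and the helper names `matmul`, `rotation`, and `close` are illustrative.

```python
import math


def matmul(P, Q):
    """Product PQ of two 2x2 matrices stored as lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def rotation(theta):
    """Standard matrix for the rotation of R^2 through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def close(P, Q, tol=1e-12):
    """True if two 2x2 matrices agree entrywise up to a small tolerance."""
    return all(abs(P[i][j] - Q[i][j]) < tol for i in range(2) for j in range(2))


# Example 1: reflection about y = x and projection onto the y-axis.
reflect_diag = [[0, 1], [1, 0]]
project_y = [[0, 0], [0, 1]]
ex1_a = matmul(reflect_diag, project_y)  # [[0, 1], [0, 0]]
ex1_b = matmul(project_y, reflect_diag)  # [[0, 0], [1, 0]]

# Example 2: rotations through two angles, composed in both orders.
t1, t2 = 0.7, 1.9
ex2_a = matmul(rotation(t2), rotation(t1))
ex2_b = matmul(rotation(t1), rotation(t2))
ex2_sum = rotation(t1 + t2)

# Example 3: reflections about the y-axis and about the x-axis.
reflect_y = [[-1, 0], [0, 1]]
reflect_x = [[1, 0], [0, -1]]
ex3_a = matmul(reflect_y, reflect_x)
ex3_b = matmul(reflect_x, reflect_y)

print(ex1_a == ex1_b)                                   # False
print(close(ex2_a, ex2_b) and close(ex2_a, ex2_sum))    # True
print(ex3_a == ex3_b == [[-1, 0], [0, -1]])             # True
```

The rotation products are compared with a tolerance because they involve floating-point sines and cosines; the other matrices have integer entries and can be compared exactly.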

EXAMPLE 4 Composition of Three Transformations

Find the standard matrix for the operator $T : R^3 \to R^3$ that first rotates a vector counterclockwise about the z-axis through an angle θ, then reflects the resulting vector about the yz-plane, and then projects that vector orthogonally onto the xy-plane.

Solution The operator T can be expressed as the composition

\[
T = T_3 \circ T_2 \circ T_1
\]

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$ is the orthogonal projection onto the xy-plane. From Tables 6, 2, and 4 of Section 4.9, the standard matrices for these operators are

\[
[T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
[T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
[T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

Thus, it follows from (5) that the standard matrix for T is

\[
[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

One-to-One Matrix Transformations

Our next objective is to establish a link between the invertibility of a matrix A and properties of the corresponding matrix transformation $T_A$.

DEFINITION 1 A matrix transformation $T_A : R^n \to R^m$ is said to be one-to-one if $T_A$ maps distinct vectors (points) in R^n into distinct vectors (points) in R^m.

(See Figure 4.10.5.) This idea can be expressed in various ways. For example, you should be able to see that the following are just restatements of Definition 1:

1. $T_A$ is one-to-one if for each vector b in the range of $T_A$ there is exactly one vector x in R^n such that $T_A(\mathbf{x}) = \mathbf{b}$.
2. $T_A$ is one-to-one if the equality $T_A(\mathbf{u}) = T_A(\mathbf{v})$ implies that u = v.

Figure 4.10.5 [Two panels, each showing a map from R^n to R^m: one labeled "One-to-one" and one labeled "Not one-to-one".]

Rotation operators on R^2 are one-to-one since distinct vectors that are rotated through the same angle have distinct images (Figure 4.10.6). In contrast, the orthogonal projection of R^2 onto the x-axis is not one-to-one because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).

Figure 4.10.6 Distinct vectors u and v are rotated into distinct vectors T(u) and T(v).

Figure 4.10.7 The distinct points P and Q are mapped into the same point M.

Kernel and Range

In the discussion leading up to Theorem 4.2.5 we introduced the notion of the “kernel” of a matrix transformation. The following definition formalizes this idea and defines the companion notion of “range.”

DEFINITION 2 If $T_A : R^n \to R^m$ is a matrix transformation, then the set of all vectors in R^n that $T_A$ maps into 0 is called the kernel of $T_A$ and is denoted by $\ker(T_A)$. The set of all vectors in R^m that are images under this transformation of at least one vector in R^n is called the range of $T_A$ and is denoted by $R(T_A)$.

In brief:

\[
\ker(T_A) = \text{null space of } A \tag{6}
\]

\[
R(T_A) = \text{column space of } A \tag{7}
\]

The key to solving a mathematical problem is often adopting the right point of view; and this is why, in linear algebra, we develop different ways of thinking about the same vector space. For example, if A is an m × n matrix, here are three ways of viewing the same subspace of R^n:

• Matrix view: the null space of A
• System view: the solution space of Ax = 0
• Transformation view: the kernel of $T_A$

and here are three ways of viewing the same subspace of R^m:

• Matrix view: the column space of A
• System view: all b in R^m for which Ax = b is consistent
• Transformation view: the range of $T_A$

In the special case of a linear operator $T_A : R^n \to R^n$, the following theorem establishes fundamental relationships between the invertibility of A and properties of $T_A$.

THEOREM 4.10.1 If A is an n × n matrix and $T_A : R^n \to R^n$ is the corresponding matrix operator, then the following statements are equivalent.
(a) A is invertible.
(b) The kernel of $T_A$ is {0}.
(c) The range of $T_A$ is R^n.
(d) $T_A$ is one-to-one.

Proof We can prove this theorem by establishing the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). We will prove the first two implications and leave the rest as exercises.

(a) ⇒ (b) Assume that A is invertible. It follows from parts (a) and (b) of Theorem 4.8.8 that the system Ax = 0 has only the trivial solution and hence that the null space of A is {0}. Formula (6) now implies that the kernel of $T_A$ is {0}.

(b) ⇒ (c) Assume that the kernel of $T_A$ is {0}. It follows from Formula (6) that the null space of A is {0} and hence that A has nullity 0. This in turn implies that the rank of A is n and hence that the column space of A is all of R^n. Formula (7) now implies that the range of $T_A$ is R^n.

EXAMPLE 5 The Rotation Operator on R^2 Is One-to-One

As was illustrated in Figure 4.10.6, the operator $T : R^2 \to R^2$ that rotates vectors through an angle θ is one-to-one. In accordance with parts (a) and (d) of Theorem 4.10.1, show that the standard matrix for T is invertible.

Solution We will show that the standard matrix for T is invertible by showing that its determinant is nonzero. From Table 5 of Section 4.9 the standard matrix for T is

\[
[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

This matrix is invertible because

\[
\det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix}
= \cos^2\theta + \sin^2\theta = 1 \neq 0
\]

EXAMPLE 6 Projection Operators Are Not One-to-One

As illustrated in Figure 4.10.7, the operator $T : R^2 \to R^2$ that projects onto the x-axis in the xy-plane is not one-to-one. In accordance with parts (a) and (d) of Theorem 4.10.1, show that the standard matrix for T is not invertible.

Solution We will show that the standard matrix for T is not invertible by showing that its determinant is zero. From Table 3 of Section 4.9 the standard matrix for T is

\[
[T] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\]

Since det[T] = 0, the operator T is not one-to-one.
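Examples 5 and 6 can be illustrated numerically. The sketch below, in plain Python, computes the two determinants and also exhibits two distinct points that the projection sends to the same image. The angle 1.2 radians and the points P and Q are arbitrary illustrative values, and the helper names are not from the text.

```python
import math


def det2(M):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def apply(M, v):
    """Product of a 2x2 matrix with a column vector (x, y)."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])


theta = 1.2
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
projection_x = [[1, 0], [0, 0]]

det_rotation = det2(rotation)          # cos^2 + sin^2, equal to 1 up to rounding
det_projection = det2(projection_x)    # exactly 0

# Two distinct points on the same vertical line x = 3.
P, Q = (3, 1), (3, -4)
image_P = apply(projection_x, P)
image_Q = apply(projection_x, Q)

print(det_rotation, det_projection)
print(image_P, image_Q)  # (3, 0) (3, 0)
```

The nonzero determinant of the rotation matrix agrees with part (a) of Theorem 4.10.1, while the shared image (3, 0) shows directly why the projection fails to be one-to-one.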

Inverse of a One-to-One Matrix Operator

If $T_A : R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that A is invertible. The matrix operator

\[
T_{A^{-1}} : R^n \to R^n
\]

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$. This terminology is appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in the sense that if x is any vector in R^n, then

\[
\begin{aligned}
T_A(T_{A^{-1}}(\mathbf{x})) &= AA^{-1}\mathbf{x} = I\mathbf{x} = \mathbf{x} \\
T_{A^{-1}}(T_A(\mathbf{x})) &= A^{-1}A\mathbf{x} = I\mathbf{x} = \mathbf{x}
\end{aligned}
\]

or, equivalently,

\[
\begin{aligned}
T_A \circ T_{A^{-1}} &= T_{AA^{-1}} = T_I \\
T_{A^{-1}} \circ T_A &= T_{A^{-1}A} = T_I
\end{aligned}
\]

From a more geometric viewpoint, if w is the image of x under $T_A$, then $T_{A^{-1}}$ maps w back into x, since

\[
T_{A^{-1}}(\mathbf{w}) = T_{A^{-1}}(T_A(\mathbf{x})) = \mathbf{x}
\]

This is illustrated in Figure 4.10.8 for R^2.

Figure 4.10.8 [$T_A$ maps x to w; $T_{A^{-1}}$ maps w to x.]

Before considering examples, it will be helpful to touch on some notational matters. If TA : R n →R n is a one-to-one matrix operator, and if TA−1 : R n →R n is its inverse, then the standard matrices for these operators are related by the equation

\[
T_A^{-1} = T_{A^{-1}} \tag{8}
\]

In cases where it is preferable not to assign a name to the matrix, we can express this equation as

\[
[T^{-1}] = [T]^{-1} \tag{9}
\]

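EXAMPLE 7 Standard Matrix for $T^{-1}$

Let $T : R^2 \to R^2$ be the operator that rotates each vector in R^2 through the angle θ, so from Table 5 of Section 4.9,

\[
[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{10}
\]

It is evident geometrically that to undo the effect of T, one must rotate each vector in R^2 through the angle −θ. But this is exactly what the operator $T^{-1}$ does, since the standard matrix for $T^{-1}$ is

\[
[T^{-1}] = [T]^{-1}
= \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
= \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}
\]

(verify), which is the standard matrix for a rotation through the angle −θ.

EXAMPLE 8 Finding $T^{-1}$

Show that the operator $T : R^2 \to R^2$ defined by the equations

\[
\begin{aligned}
w_1 &= 2x_1 + x_2 \\
w_2 &= 3x_1 + 4x_2
\end{aligned}
\]

is one-to-one, and find $T^{-1}(w_1, w_2)$.

Solution The matrix form of these equations is

\[
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\]

so the standard matrix for T is

\[
[T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}
\]

This matrix is invertible (so T is one-to-one) and the standard matrix for $T^{-1}$ is

\[
[T^{-1}] = [T]^{-1} = \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}
\]

Thus

\[
[T^{-1}] \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= \begin{bmatrix} \tfrac{4}{5}w_1 - \tfrac{1}{5}w_2 \\ -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2 \end{bmatrix}
\]

from which we conclude that

\[
T^{-1}(w_1, w_2) = \left( \tfrac{4}{5}w_1 - \tfrac{1}{5}w_2,\; -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2 \right)
\]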
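The inverse found in Example 8 can be reproduced with exact rational arithmetic. The sketch below uses the standard `fractions` module and the usual 2 × 2 inverse formula (swap the diagonal entries, negate the off-diagonal entries, divide by the determinant). The function name `inverse_2x2` and the test vector x = (1, 2) are illustrative choices.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2x2 integer matrix, returned with exact Fraction entries."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible, so the operator is not one-to-one")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def apply(M, v):
    """Product of a 2x2 matrix with a column vector (x, y)."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])


T = [[2, 1], [3, 4]]
T_inv = inverse_2x2(T)   # entries 4/5, -1/5, -3/5, 2/5

# Round trip: map x to w = T(x), then recover x with the inverse operator.
x = (1, 2)
w = apply(T, x)          # (4, 11)
recovered = apply(T_inv, w)

print(T_inv)
print(w, recovered)
```

Applying the same function to a matrix with determinant zero, such as the projection matrix of Example 6, raises the error instead of returning an inverse, which mirrors the equivalence of parts (a) and (d) in Theorem 4.10.1.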



More on the Equivalence Theorem

As our final result in this section, we will add parts (b), (c), and (d) of Theorem 4.10.1 to Theorem 4.8.8.

THEOREM 4.10.2 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.
(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span R^n.
(k) The row vectors of A span R^n.
(l) The column vectors of A form a basis for R^n.
(m) The row vectors of A form a basis for R^n.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is R^n.
(q) The orthogonal complement of the row space of A is {0}.
(r) The kernel of $T_A$ is {0}.
(s) The range of $T_A$ is R^n.
(t) $T_A$ is one-to-one.

Exercise Set 4.10

In Exercises 1–4, determine whether the operators T1 and T2 commute; that is, whether T1 ∘ T2 = T2 ∘ T1.

1. (a) T1 : R^2 → R^2 is the reflection about the line y = x, and T2 : R^2 → R^2 is the orthogonal projection onto the x-axis.
(b) T1 : R^2 → R^2 is the reflection about the x-axis, and T2 : R^2 → R^2 is the reflection about the line y = x.

2. (a) T1 : R^2 → R^2 is the orthogonal projection onto the x-axis, and T2 : R^2 → R^2 is the orthogonal projection onto the y-axis.
(b) T1 : R^2 → R^2 is the rotation about the origin through an angle of π/4, and T2 : R^2 → R^2 is the reflection about the y-axis.

3. T1 : R^3 → R^3 is a dilation with factor k, and T2 : R^3 → R^3 is a contraction with factor 1/k.

4. T1 : R^3 → R^3 is the rotation about the x-axis through an angle θ1, and T2 : R^3 → R^3 is the rotation about the z-axis through an angle θ2.

In Exercises 5–6, let TA and TB be the operators whose standard matrices are given. Find the standard matrices for TB ∘ TA and TA ∘ TB.



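5. \[
A = \begin{bmatrix} 1 & -2 \\ 4 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & -3 \\ 5 & 0 \end{bmatrix}
\]

6. \[
A = \begin{bmatrix} 6 & 3 & -1 \\ 2 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}, \quad
B = \begin{bmatrix} 4 & 0 & 4 \\ -1 & 5 & 2 \\ 2 & -3 & 8 \end{bmatrix}
\]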

7. Find the standard matrix for the stated composition in R^2.
(a) A rotation of 90°, followed by a reflection about the line y = x.
(b) An orthogonal projection onto the y-axis, followed by a contraction with factor k = 1/2.
(c) A reflection about the x-axis, followed by a dilation with factor k = 3, followed by a rotation about the origin of 60°.

8. Find the standard matrix for the stated composition in R^2.
(a) A rotation about the origin of 60°, followed by an orthogonal projection onto the x-axis, followed by a reflection about the line y = x.
(b) A dilation with factor k = 2, followed by a rotation about the origin of 45°, followed by a reflection about the y-axis.
(c) A rotation about the origin of 15°, followed by a rotation about the origin of 105°, followed by a rotation about the origin of 60°.

9. Find the standard matrix for the stated composition in R^3.
(a) A reflection about the yz-plane, followed by an orthogonal projection onto the xz-plane.
(b) A rotation of 45° about the y-axis, followed by a dilation with factor k = √2.
(c) An orthogonal projection onto the xy-plane, followed by a reflection about the yz-plane.

10. Find the standard matrix for the stated composition in R^3.
(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a contraction with factor k = 1/4.
(b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal projection onto the yz-plane.
(c) A rotation of 270° about the x-axis, followed by a rotation of 90° about the y-axis, followed by a rotation of 180° about the z-axis.

11. Let T1(x1, x2) = (x1 + x2, x1 − x2) and T2(x1, x2) = (3x1, 2x1 + 4x2).
(a) Find the standard matrices for T1 and T2.
(b) Find the standard matrices for T2 ∘ T1 and T1 ∘ T2.
(c) Use the matrices obtained in part (b) to find formulas for T1(T2(x1, x2)) and T2(T1(x1, x2)).

12. Let T1(x1, x2, x3) = (4x1, −2x1 + x2, −x1 − 3x2) and T2(x1, x2, x3) = (x1 + 2x2, −x3, 4x1 − x3).
(a) Find the standard matrices for T1 and T2.
(b) Find the standard matrices for T2 ∘ T1 and T1 ∘ T2.
(c) Use the matrices obtained in part (b) to find formulas for T1(T2(x1, x2, x3)) and T2(T1(x1, x2, x3)).

In Exercises 13–14, determine by inspection whether the stated matrix operator is one-to-one.

13. (a) The orthogonal projection onto the x-axis in R^2.
(b) The reflection about the y-axis in R^2.
(c) The reflection about the line y = x in R^2.
(d) A contraction with factor k > 0 in R^2.

14. (a) A rotation about the z-axis in R^3.
(b) A reflection about the xy-plane in R^3.
(c) A dilation with factor k > 0 in R^3.
(d) An orthogonal projection onto the xz-plane in R^3.

In Exercises 15–16, describe in words the inverse of the given one-to-one operator.

15. (a) The reflection about the x-axis on R^2.
(b) The rotation about the origin through an angle of π/4 on R^2.
(c) The dilation with factor of 3 on R^2.

16. (a) The reflection about the yz-plane in R^3.
(b) The contraction with factor 1/5 in R^3.
(c) The rotation through an angle of −18° about the z-axis in R^3.

In Exercises 17–18, express the equations in matrix form, and then use parts (g) and (s) of Theorem 4.10.2 to determine whether the operator defined by the equations is one-to-one.

17. (a) w1 = 8x1 + 4x2
        w2 = 2x1 + x2
    (b) w1 = −x1 + 3x2 + 2x3
        w2 = 2x1 + 4x3
        w3 = x1 + 3x2 + 6x3

18. (a) w1 = 2x1 − 3x2
        w2 = 5x1 + x2
    (b) w1 = x1 + 2x2 + 3x3
        w2 = 2x1 + 5x2 + 3x3
        w3 = x1 + 8x3

19. Determine whether the matrix operator T : R^2 → R^2 defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find T^{−1}(w1, w2).
    (a) w1 = x1 + 2x2
        w2 = −x1 + x2
    (b) w1 = 4x1 − 6x2
        w2 = −2x1 + 3x2

20. Determine whether the matrix operator T : R^3 → R^3 defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find T^{−1}(w1, w2, w3).
    (a) w1 = x1 − 2x2 + 2x3
        w2 = 2x1 + x2 + x3
        w3 = x1 + x2
    (b) w1 = x1 − 3x2 + 4x3
        w2 = −x1 + x2 + x3
        w3 = −2x2 + 5x3

In Exercises 21–22, determine whether multiplication by A is a one-to-one matrix transformation.



3

⎤ −1 ⎥ 0⎦ −4

1

2



1

21. (a) A = ⎣2



⎢0 ⎢ 22. (a) A = ⎢ ⎣1 1

1 1 0

 (b) A =

1



1⎥ ⎥



0⎦

−1

1

2

3

−1

0

−4

 (b) A =



4

3

1

1



4.10 Properties of Matrix Transformations

In Exercises 23–24, let T be multiplication by the matrix A. Find (a) a basis for the range of T . (c) the rank and nullity of T .

Working with Proofs

(d) the rank and nullity of A. 1

−1

7

6 4

⎢ 23. A = ⎣5





3 ⎥ −4 ⎦ 2

2

⎢ 24. A = ⎣ 4

20

0 0 0



−1 ⎥ −2 ⎦ 0

In Exercises 25–26, let TA : R →R be multiplication by A. Find a basis for the kernel of TA , and then find a basis for the range of TA that consists of column vectors of A. 4



1 ⎢ 25. A = ⎣−3 −3



1 ⎢ 26. A = ⎣−2 −1

−1

2 1 8 1 4 8

−2

0 2 3



2

27. Let A be an n × n matrix such that det(A) = 0, and let T : R n →R n be multiplication by A. (a) What can you say about the range of the matrix operator T ? Give an example that illustrates your conclusion. (b) What can you say about the number of vectors that T maps into 0? 28. Answer the questions in Exercise 27 in the case where det(A)  = 0. 29. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion. (b) Can the composition of a one-to-one matrix transformation and a matrix transformation that is not one-to-one be one-to-one? Account for both possible orders of composition and justify your conclusion. 30. Let TA : R 2 →R 2 be multiplication by

A=

2 sin θ cos θ

35. Prove the implication (d) ⇒ (a) in Theorem 4.10.1.

True-False Exercises

(a) If TA and TB are matrix operators on R n , then TA (TB (x)) = TB (TA (x)) for every vector x in R n .

1 ⎥ 2⎦ 5

cos2 θ − sin2 θ

34. Prove the implication (c) ⇒ (d) in Theorem 4.10.1.

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.





33. Prove that the matrix transformations TA and TB commute if and only if the matrices A and B commute.

3

⎥ 4⎦

3 4

32. (a) The inverse transformation for a reflections about a coordinate axis is a reflection about that axis. (b) The inverse transformation for a shear along a coordinate axis is a shear along that axis.

(b) a basis for the kernel of T .




−2 sin θ cos θ cos2 θ − sin2 θ



(a) What is the geometric effect of applying this transformation to a vector x in R 2 ? (b) Express the operator TA as a composition of two linear operators on R 2 . In Exercises 31–32, use matrix inversion to confirm the stated result in R 2 . 31. (a) The inverse transformation for a reflection about y = x is a reflection about y = x . (b) The inverse transformation for a compression along an axis is an expansion along that axis.

(b) If T1 and T2 are matrix operators on R n , then [T2 ◦ T1 ] = [T2 ][T1 ]. (c) A composition of two rotation operators about the origin of R 2 is another rotation about the origin. (d) A composition of two reflection operators in R 2 is another reflection operator. (e) The kernel of a matrix transformation TA : R n →R m is the same as the null space of A. (f ) If there is a nonzero vector in the kernel of the matrix operator TA : R n →R n , then this operator is not one-to-one. (g) If A is an n × n matrix and if the linear system Ax = 0 has a nontrivial solution, then the range of the matrix operator is not R n .

Working with Technology

T1. (a) Find the standard matrix for the linear operator on $R^3$ that performs a counterclockwise rotation of 47° about the x-axis, followed by a counterclockwise rotation of 68° about the y-axis, followed by a counterclockwise rotation of 33° about the z-axis. (b) Find the image of the point (1, 1, 1) under the operator in part (a).

T2. Find the standard matrix for the linear operator on $R^2$ that first reflects each point in the plane about the line through the origin that makes an angle of 27° with the positive x-axis and then projects the resulting point orthogonally onto the line through the origin that makes an angle of 51° with the positive x-axis.


4.11 Geometry of Matrix Operators on $R^2$

In applications such as computer graphics it is important to understand not only how linear operators on $R^2$ and $R^3$ affect individual vectors but also how they affect two-dimensional or three-dimensional regions. That is the focus of this section.

Transformations of Regions

Figure 4.11.1 shows a famous picture of Albert Einstein that has been transformed in various ways using matrix operators on $R^2$. The original image was scanned and then digitized to decompose it into a rectangular array of pixels. Those pixels were then transformed as follows:

• The program MATLAB was used to assign coordinates and a gray level to each pixel.
• The coordinates of the pixels were transformed by matrix multiplication.
• The pixels were then assigned their original gray levels to produce the transformed picture.

In computer games a perception of motion is created by using matrices to rapidly and repeatedly transform the arrays of pixels that form the visual images.

Figure 4.11.1 [Panels: digitized scan; rotated; sheared horizontally; compressed horizontally. Image: ARTHUR SASSE/AFP/Getty Images]

The effect of a matrix operator on R 2 can often be deduced by studying how it transforms the points that form the unit square. The following theorem, which we state without proof, shows that if the operator is invertible, then it maps each line segment in the unit square into the line segment connecting the images of its endpoints. In particular, the edges of the unit square get mapped into edges of the image (see Figure 4.11.2 in which the edges of a unit square and the corresponding edges of its image have been numbered).

Images of Lines Under Matrix Operators

Figure 4.11.2 [Panels: unit square; unit square rotated; unit square reflected about the y-axis; unit square reflected about the line y = x. The edges of the unit square and the corresponding edges of each image are numbered 1 through 4.]


THEOREM 4.11.1 If $T : R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a) The image of a straight line is a straight line.
(b) The image of a line through the origin is a line through the origin.
(c) The images of parallel lines are parallel lines.
(d) The image of the line segment joining points P and Q is the line segment joining the images of P and Q.
(e) The images of three points lie on a line if and only if the points themselves lie on a line.

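EXAMPLE 1 Image of a Line

According to Theorem 4.11.1, the invertible matrix

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$$

maps the line $y = 2x + 1$ into another line. Find its equation.

Solution Let $(x, y)$ be a point on the line $y = 2x + 1$, and let $(x', y')$ be its image under multiplication by $A$. Then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1} \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}$$

so

$$x = x' - y', \qquad y = -2x' + 3y'$$

Substituting these expressions in $y = 2x + 1$ yields

$$-2x' + 3y' = 2(x' - y') + 1$$

or, equivalently,

$$y' = \tfrac{4}{5} x' + \tfrac{1}{5}$$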
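The result of Example 1 can be cross-checked numerically. The following is a minimal sketch, not the method of the example itself: instead of inverting A, it relies on Theorem 4.11.1, maps two points of the original line, and fits the line through their images. The helper names `apply` and `image_of_line` are illustrative assumptions, and the sketch assumes the image line is not vertical.

```python
from fractions import Fraction


def apply(matrix, point):
    """Multiply a 2x2 matrix (given as a tuple of rows) by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def image_of_line(matrix, slope, intercept):
    """Return (slope, intercept) of the image of y = slope*x + intercept.

    Assumes the matrix is invertible and the image line is not vertical.
    """
    # Two distinct points on the original line.
    p = (Fraction(0), Fraction(intercept))
    q = (Fraction(1), Fraction(slope) + Fraction(intercept))
    # By Theorem 4.11.1 the image is the line through the two image points.
    (x1, y1), (x2, y2) = apply(matrix, p), apply(matrix, q)
    new_slope = (y2 - y1) / (x2 - x1)
    new_intercept = y1 - new_slope * x1
    return new_slope, new_intercept


A = ((3, 1), (2, 1))
print(image_of_line(A, 2, 1))  # slope 4/5 and intercept 1/5, as in Example 1
```

Exact rational arithmetic is used so that the slope and intercept can be compared directly with the fractions obtained by hand.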

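EXAMPLE 2 Transformation of the Unit Square

Sketch the image of the unit square under multiplication by the invertible matrix

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}$$

Label the vertices of the image with their coordinates, and number the edges of the unit square and their corresponding images (as in Figure 4.11.2).

Solution Since

$$\begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$

the image of the unit square is a parallelogram with vertices (0, 0), (0, 2), (1, 1), and (1, 3) (Figure 4.11.3).

Figure 4.11.3 [The unit square with vertices (0, 0), (1, 0), (1, 1), (0, 1) and its image, the parallelogram with vertices (0, 0), (0, 2), (1, 3), (1, 1); corresponding edges are numbered 1 through 4.]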
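The vertex computations in Example 2 are easy to reproduce by machine. The sketch below is illustrative only; the helper `apply` and the variable names are assumptions introduced here. It maps the four vertices of the unit square, listed counterclockwise, so that consecutive images are joined by the images of the corresponding edges.

```python
def apply(matrix, point):
    """Multiply a 2x2 matrix (given as a tuple of rows) by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


A = ((0, 1), (2, 1))

# Vertices of the unit square in counterclockwise order.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Because A is invertible, each edge maps to the segment joining the
# images of its endpoints (Theorem 4.11.1(d)).
image = [apply(A, v) for v in unit_square]
print(image)  # [(0, 0), (0, 2), (1, 3), (1, 1)]
```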

The next example illustrates a transformation of the unit square under a composition of matrix operators.


EXAMPLE 3 Transformation of the Unit Square

(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in the x-direction and then reflects the result about the line y = x. Sketch the image of the unit square under this operator.
(b) Find the standard matrix for the operator on $R^2$ that first reflects about y = x and then shears by a factor of 2 in the x-direction. Sketch the image of the unit square under this operator.
(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute.

Solution (a) The standard matrix for the shear is

$$A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

and for the reflection is

$$A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Thus, the standard matrix for the shear followed by the reflection is

$$A_2 A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$$

Solution (b) The standard matrix for the reflection followed by the shear is

$$A_1 A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$$

Solution (c) The computations in Solutions (a) and (b) show that $A_1 A_2 \neq A_2 A_1$, so the standard matrices, and hence the operators, do not commute. The same conclusion follows from Figures 4.11.4 and 4.11.5 since the two operators produce different images of the unit square.
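The two products in Example 3 can be checked with a few lines of code. This is a small sketch under the assumption that matrices are stored as tuples of rows; `matmul` is a helper written here for illustration, not a library routine. Note that the operator applied first appears on the right of the product.

```python
def matmul(p, q):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


A1 = ((1, 2), (0, 1))  # shear by a factor of 2 in the x-direction
A2 = ((0, 1), (1, 0))  # reflection about the line y = x

shear_then_reflect = matmul(A2, A1)  # part (a)
reflect_then_shear = matmul(A1, A2)  # part (b)

print(shear_then_reflect)  # ((0, 1), (1, 2))
print(reflect_then_shear)  # ((2, 1), (1, 0))
print(shear_then_reflect == reflect_then_shear)  # False: they do not commute
```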

Figure 4.11.4 [The unit square, its image under the shear in the x-direction by a factor k = 2, and the reflection of that image about y = x.]

Figure 4.11.5 [The unit square, its reflection about y = x, and the image of that reflection under the shear in the x-direction by a factor k = 2.]


In Example 3 we illustrated the effect on the unit square in R 2 of a composition of shears and reflections. Our next objective is to show how to decompose any 2 × 2 invertible matrix into a product of matrices in Table 1, thereby allowing us to analyze the geometric effect of a matrix operator in R 2 as a composition of simpler matrix operators. The next theorem is our first step in this direction.

Geometry of Invertible Matrix Operators

Table 1

| Operator | Standard Matrix | Effect on the Unit Square: image of the vertex (1, 1) |
|---|---|---|
| Reflection about the y-axis | $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ | (−1, 1) |
| Reflection about the x-axis | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ | (1, −1) |
| Reflection about the line y = x | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ | (1, 1) |
| Rotation about the origin through a positive angle θ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | (cos θ − sin θ, sin θ + cos θ) |
| Compression in the x-direction with factor k (0 < k < 1) | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ | (k, 1) |
| Expansion in the x-direction with factor k (k > 1) | $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$ | (k, 1) |
| Compression in the y-direction with factor k (0 < k < 1) | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$ | (1, k) |
| Expansion in the y-direction with factor k (k > 1) | $\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$ | (1, k) |
| Shear in the positive x-direction by a factor k (k > 0) | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | (1 + k, 1) |
| Shear in the negative x-direction by a factor k (k < 0) | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | (1 + k, 1) |
| Shear in the positive y-direction by a factor k (k > 0) | $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$ | (1, 1 + k) |
| Shear in the negative y-direction by a factor k (k < 0) | $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$ | (1, 1 + k) |

THEOREM 4.11.2 If E is an elementary matrix, then $T_E : R^2 \to R^2$ is one of the following:

(a) A shear along a coordinate axis.
(b) A reflection about y = x.
(c) A compression along a coordinate axis.
(d) An expansion along a coordinate axis.
(e) A reflection about a coordinate axis.
(f) A compression or expansion along a coordinate axis followed by a reflection about a coordinate axis.


Proof Because a 2 × 2 elementary matrix results from performing a single elementary row operation on the 2 × 2 identity matrix, such a matrix must have one of the following forms (verify):

$$\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$$

The first two matrices represent shears along coordinate axes, and the third represents a reflection about y = x. If k > 0, the last two matrices represent compressions or expansions along coordinate axes, depending on whether 0 < k < 1 or k > 1. If k < 0, and if we express k in the form $k = -k_1$, where $k_1 > 0$, then the last two matrices can be written as

$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{1}$$

$$\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{2}$$

Since $k_1 > 0$, the product in (1) represents a compression or expansion along the x-axis followed by a reflection about the y-axis, and (2) represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the case where k = −1, transformations (1) and (2) are simply reflections about the y-axis and x-axis, respectively.

We know from Theorem 4.10.2(d) that an invertible matrix can be expressed as a product of elementary matrices, so Theorem 4.11.2 implies the following result.

THEOREM 4.11.3 If $T_A : R^2 \to R^2$ is multiplication by an invertible matrix A, then the geometric effect of $T_A$ is the same as an appropriate succession of shears, compressions, expansions, and reflections.

The next example will illustrate how Theorems 4.11.2 and 4.11.3 together with Table 1 can be used to analyze the geometric effect of multiplication by a 2 × 2 invertible matrix.

EXAMPLE 4 Decomposing a Matrix Operator

In Example 2 we illustrated the effect on the unit square of multiplication by

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}$$

(see Figure 4.11.3). Express this matrix as a product of elementary matrices, and then describe the effect of multiplication by A in terms of shears, compressions, expansions, and reflections.

Solution The matrix A can be reduced to the identity matrix as follows:

$$\begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The three steps are, in order: interchange the first and second rows; multiply the first row by $\tfrac{1}{2}$; add $-\tfrac{1}{2}$ times the second row to the first.


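These three successive row operations can be performed by multiplying A on the left successively by

$$E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -\tfrac{1}{2} \\ 0 & 1 \end{bmatrix}$$

Inverting these matrices and using Formula (4) of Section 1.5 yields

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix}$$

Reading from right to left we can now see that the geometric effect of multiplying by A is equivalent to successively

1. shearing by a factor of $\tfrac{1}{2}$ in the x-direction;
2. expanding by a factor of 2 in the x-direction;
3. reflecting about the line y = x.

This is illustrated in Figure 4.11.6, whose end result agrees with that in Example 2.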
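The factorization in Example 4 can be verified by multiplying the three factors back together and by following the vertex (1, 1) through the three steps. The sketch below is illustrative; `matmul` and `apply` are helper names assumed here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def matmul(p, q):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(matrix, point):
    """Multiply a 2x2 matrix by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


half = Fraction(1, 2)
shear = ((1, half), (0, 1))   # inverse of E3: shear by 1/2 in the x-direction
expand = ((2, 0), (0, 1))     # inverse of E2: expansion by 2 in the x-direction
reflect = ((0, 1), (1, 0))    # inverse of E1: reflection about y = x

# The operator applied first sits rightmost in the product.
product = matmul(reflect, matmul(expand, shear))
print(product == ((0, 1), (2, 1)))  # True: the product is A

# Follow the vertex (1, 1) through the three steps.
step1 = apply(shear, (1, 1))    # (3/2, 1)
step2 = apply(expand, step1)    # (3, 1)
step3 = apply(reflect, step2)   # (1, 3)
print(step1, step2, step3)
```

The intermediate points (3/2, 1), (3, 1), and (1, 3) agree with the vertex labels that survive from Figure 4.11.6.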

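Figure 4.11.6 [Four panels following the unit square through the three steps: the vertex (1, 1) moves to (3/2, 1) under the shear, to (3, 1) under the expansion, and to (1, 3) under the reflection about y = x; the final image has vertices (0, 0), (0, 2), (1, 3), and (1, 1).]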

EXAMPLE 5 Transformations with Diagonal Matrices

Discuss the geometric effect on the unit square of multiplication by a diagonal matrix

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

in which the entries $k_1$ and $k_2$ are positive real numbers ($\neq 1$).

Solution The matrix A is invertible and can be expressed as

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix} \begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$

which shows that multiplication by A causes a compression or expansion of the unit square by a factor of $k_1$ in the x-direction followed by an expansion or compression of the unit square by a factor of $k_2$ in the y-direction.

EXAMPLE 6 Reflection About the Origin

As illustrated in Figure 4.11.7, multiplication by the matrix

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

has the geometric effect of reflecting the unit square about the origin. Note, however, that the matrix equation

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

together with Table 1 shows that the same result can be obtained by first reflecting the unit square about the x-axis and then reflecting that result about the y-axis. You should be able to see this as well from Figure 4.11.7.

Figure 4.11.7 [The unit square with vertex (1, 1) and its reflection about the origin with vertex (−1, −1).]

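EXAMPLE 7 Reflection About the Line y = −x

We leave it for you to verify that multiplication by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$

reflects the unit square about the line y = −x (Figure 4.11.8).

Figure 4.11.8 [The unit square with vertex (1, 1) and its reflection about the line y = −x with vertex (−1, −1).]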
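One way to carry out the verification requested in Example 7 is sketched below. It relies on the standard fact, assumed here rather than taken from the text, that reflection about the line y = −x sends (x, y) to (−y, −x); the helper `apply` is an illustrative name.

```python
def apply(matrix, point):
    """Multiply a 2x2 matrix (given as a tuple of rows) by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


A = ((0, -1), (-1, 0))
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Images of the vertices under multiplication by A.
images = [apply(A, v) for v in unit_square]

# Reflection about y = -x sends (x, y) to (-y, -x).
expected = [(-y, -x) for (x, y) in unit_square]
print(images == expected)  # True

# Points on the line y = -x are left fixed, as they should be for a reflection.
print(apply(A, (1, -1)) == (1, -1))  # True
```

Checking the four vertices suffices for the unit square because, by Theorem 4.11.1(d), an invertible matrix operator maps each edge to the segment joining the images of its endpoints.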

Exercise Set 4.11

1. Use the method of Example 1 to find an equation for the image of the line y = 4x under multiplication by the matrix

$$A = \begin{bmatrix} 5 & 2 \\ 2 & 1 \end{bmatrix}$$

2. Use the method of Example 1 to find an equation for the image of the line y = −4x + 3 under multiplication by the matrix

$$A = \begin{bmatrix} 4 & -3 \\ 3 & -2 \end{bmatrix}$$

In Exercises 3–4, find an equation for the image of the line y = 2x that results from the stated transformation.

3. A shear by a factor 3 in the x-direction.

4. A compression with factor $\tfrac{1}{2}$ in the y-direction.

In Exercises 5–6, sketch the image of the unit square under multiplication by the given invertible matrix. As in Example 2, number the edges of the unit square and its image so it is clear how those edges correspond.

5. $\begin{bmatrix} 3 & -1 \\ 1 & -2 \end{bmatrix}$

6. $\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}$

In each part of Exercises 7–8, find the standard matrix for a single operator that performs the stated succession of operations.

7. (a) Compresses by a factor of $\tfrac{1}{2}$ in the x-direction, then expands by a factor of 5 in the y-direction.
(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction.
(c) Reflects about y = x, then rotates through an angle of 180° about the origin.

8. (a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about y = x.
(b) Rotates through 30° about the origin, then shears by a factor of −2 in the y-direction, and then expands by a factor of 3 in the y-direction.

In each part of Exercises 9–10, determine whether the stated operators commute.

9. (a) A reflection about the x-axis and a compression in the x-direction with factor $\tfrac{1}{3}$.
(b) A reflection about the line y = x and an expansion in the x-direction with factor 2.

10. (a) A shear in the y-direction by a factor $\tfrac{1}{4}$ and a shear in the y-direction by a factor $\tfrac{3}{5}$.
(b) A shear in the y-direction by a factor $\tfrac{1}{4}$ and a shear in the x-direction by a factor $\tfrac{3}{5}$.

In Exercises 11–14, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by A in terms of shears, compressions, expansions, and reflections.

11. $A = \begin{bmatrix} 4 & 4 \\ 0 & -2 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 4 \\ 2 & 9 \end{bmatrix}$

13. $A = \begin{bmatrix} 0 & -2 \\ 4 & 0 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix}$

In each part of Exercises 15–16, describe, in words, the effect on the unit square of multiplication by the given diagonal matrix.

15. (a) $A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$

16. (a) $A = \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix}$ (b) $A = \begin{bmatrix} -3 & 0 \\ 0 & -1 \end{bmatrix}$


17. (a) Show that multiplication by



3 6

A=

1 2

maps each point in the plane onto the line y = 2x . (b) It follows from part (a) that the noncollinear points (1, 0), (0, 1), (−1, 0) are mapped onto a line. Does this violate part (e) of Theorem 4.11.1? 18. Find the matrix for a shear in the x -direction that transforms the triangle with vertices (0, 0), (2, 1), and (3, 0) into a right triangle with the right angle at the origin.

25. Find the image of the triangle with vertices (0, 0), (1, 1), (2, 0) under multiplication by

2 −1 A= 0 0 Does your answer violate part (e) of Theorem 4.11.1? Explain. 26. In R 3 the shear in the xy-direction by a factor k is the matrix transformation that moves each point (x, y, z) parallel to the xy -plane to the new position (x + kz, y + kz, z). (See the accompanying figure.) (a) Find the standard matrix for the shear in the xy -direction by a factor k . (b) How would you define the shear in the xz-direction by a factor k and the shear in the yz-direction by a factor k ? What are the standard matrices for these matrix transformations?

19. In accordance with part (c) of Theorem 4.11.1, show that multiplication by the invertible matrix



3 1

A=

2 1

z

maps the parallel lines y = 3x + 1 and y = 3x − 2 into parallel lines.

z

(x, y, z)

20. Draw a figure that shows the image of the triangle with vertices (0, 0), (1, 0), and (0.5, 1) under a shear by a factor of 2 in the x -direction. 21. (a) Draw a figure that shows the image of the triangle with vertices (0, 0), (1, 0), and (0.5, 1) under multiplication by

A=

1 1

−1

1

(b) Find a succession of shears, compressions, expansions, and reflections that produces the same image. 22. Find the endpoints of the line segment that results when the line segment from P (1, 2) to Q(3, 4) is transformed by (a) a compression with factor

1 2

in the y -direction.

(b) a rotation of 30◦ about the origin. 23. Draw a figure showing the italicized letter “T ” that results when the letter in the accompanying figure is sheared by a factor 41 in the x -direction. y

(x + kz, y + kz, z) y

x

y

x

Figure Ex-26

Working with Proofs 27. Prove part (a) of Theorem 4.11.1. [Hint: A line in the plane has an equation of the form Ax + By + C = 0, where A and B are not both zero. Use the method of Example 1 to show that the image of this line under multiplication by the invertible matrix

a b c d has the equation A x + B y + C = 0, where A = (dA − cB)/(ad − bc) and

B = (−bA + aB)/(ad − bc) Then show that A and B are not both zero to conclude that the image is a line.] 28. Use the hint in Exercise 27 to prove parts (b) and (c) of Theorem 4.11.1.

1 (0, .90)

True-False Exercises TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

x 1 (.45, 0) (.55, 0)

Figure Ex-23

24. Can an invertible matrix operator on R 2 map a square region into a triangular region? Justify your answer.

(a) The image of the unit square under a one-to-one matrix operator is a square. (b) A 2 × 2 invertible matrix operator has the geometric effect of a succession of shears, compressions, expansions, and reflections.

Chapter 4 Supplementary Exercises

(c) The image of a line under an invertible matrix operator is a line.

(f ) The matrix

(d) Every reflection operator on R 2 is its own inverse.



1 (e) The matrix 1

1 represents reflection about a line. −1

−2

1 2

1




represents a shear.

1 (g) The matrix 0

0 represents an expansion. 3

Chapter 4 Supplementary Exercises 1. Let V be the set of all ordered triples of real numbers, and consider the following addition and scalar multiplication operations on u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ): u + v = (u1 + v1 , u2 + v2 , u3 + v3 ), k u = (ku1 , 0, 0) (a) Compute u + v and k u for u = (3, −2, 4), v = (1, 5, −2), and k = −1. (b) In words, explain why V is closed under addition and scalar multiplication. (c) Since the addition operation on V is the standard addition operation on R 3 , certain vector space axioms hold for V because they are known to hold for R 3 . Which axioms in Definition 1 of Section 4.1 are they? (d) Show that Axioms 7, 8, and 9 hold.

5. Let W be the space spanned by f = sin x and g = cos x . (a) Show that for any value of θ , f1 = sin(x + θ) and g1 = cos(x + θ) are vectors in W . (b) Show that f1 and g1 form a basis for W . 6. (a) Express v = (1, 1) as a linear combination of v1 = (1, −1), v2 = (3, 0), and v3 = (2, 1) in two different ways. (b) Explain why this does not violate Theorem 4.4.1. 7. Let A be an n × n matrix, and let v1 , v2 , . . . , vn be linearly independent vectors in R n expressed as n × 1 matrices. What must be true about A for Av1 , Av2 , . . . , Avn to be linearly independent?

(e) Show that Axiom 10 fails for the given operations. 2. In each part, the solution space of the system is a subspace of R 3 and so must be a line through the origin, a plane through the origin, all of R 3 , or the origin only. For each system, determine which is the case. If the subspace is a plane, find an equation for it, and if it is a line, find parametric equations. (a) 0x + 0y + 0z = 0

(c)

x − 2y + 7z = 0 −4x + 8y + 5z = 0 2x − 4y + 3z = 0

(b)

8. Must a basis for Pn contain a polynomial of degree k for each k = 0, 1, 2, . . . , n? Justify your answer. 9. For the purpose of this exercise, let us define a “checkerboard matrix” to be a square matrix A = [aij ] such that



2 x − 3y + z = 0 6 x − 9y + 3z = 0 −4 x + 6 y − 2 z = 0

(d) x + 4y + 8z = 0 2 x + 5y + 6z = 0 3x + y − 4 z = 0

3. For what values of s is the solution space of

x1 + x2 + sx3 = 0 x1 + sx2 + x3 = 0 sx1 + x2 + x3 = 0 the origin only, a line through the origin, a plane through the origin, or all of R 3 ? 4. (a) Express (4a, a − b, a + 2b) as a linear combination of (4, 1, 1) and (0, −1, 2).

aij =

1

if i + j is even

0

if i + j is odd

Find the rank and nullity of the following checkerboard matrices. (a) The 3 × 3 checkerboard matrix. (b) The 4 × 4 checkerboard matrix. (c) The n × n checkerboard matrix. 10. For the purpose of this exercise, let us define an “X -matrix” to be a square matrix with an odd number of rows and columns that has 0’s everywhere except on the two diagonals where it has 1’s. Find the rank and nullity of the following X -matrices.





0 1 0



1 ⎥ 0⎦ 1

1 ⎢ ⎢0 ⎢ (b) ⎢ ⎢0 ⎢0 ⎣ 1

0 1 0 1 0

(b) Express (3a + b + 3c, −a + 4b − c, 2a + b + 2c) as a linear combination of (3, −1, 2) and (1, 4, 1).

1 ⎢ (a) ⎣0 1

(c) Express (2a − b + 4c, 3a − c, 4b + c) as a linear combination of three nonzero vectors.

(c) the X -matrix of size (2n + 1) × (2n + 1)

0 0 1 0 0

0 1 0 1 0



1 ⎥ 0⎥ ⎥ 0⎥ ⎥ 0⎥ ⎦ 1


11. In each part, show that the stated set of polynomials is a subspace of Pn and find a basis for it. (a) All polynomials in Pn such that p(−x) = p(x). (b) All polynomials in Pn such that p(0) = p(1). 12. (Calculus required ) Show that the set of all polynomials in Pn that have a horizontal tangent at x = 0 is a subspace of Pn . Find a basis for this subspace. 13. (a) Find a basis for the vector space of all 3 × 3 symmetric matrices. (b) Find a basis for the vector space of all 3 × 3 skewsymmetric matrices. 14. Various advanced texts in linear algebra prove the following determinant criterion for rank: The rank of a matrix A is r if and only if A has some r × r submatrix with a nonzero determinant, and all square submatrices of larger size have determinant zero. [Note: A submatrix of A is any matrix obtained by deleting rows or columns of A. The matrix A itself is also considered to be a submatrix of A.] In each part, use this criterion to find the rank of the matrix.



(a)

1 2



1 ⎢ (c) ⎣2 3

2 4 0 −1 −1

0 −1



(b)



1 ⎥ 3⎦ 4

1 2



1 ⎢ (d) ⎣ 3 −1

2 4

3 6

−1 1 2

2 0 4



0 ⎥ 0⎦ 0

15. Use the result in Exercise 14 above to find the possible ranks for matrices of the form



0 0 0 0

⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣ a51

0 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

a52

a53

a54

a55

⎤ a16 ⎥ a26 ⎥ ⎥ a36 ⎥ ⎥ a46 ⎥ ⎦ a56

16. Prove: If S is a basis for a vector space V, then for any vectors u and v in V and any scalar k , the following relationships hold. (a) (u + v)S = (u)S + (v)S

(b) (k u)S = k(u)S

17. Let Dk , Rθ , and Sk be a dilation of R 2 with factor k , a counterclockwise rotation about the origin of R 2 through an angle θ , and a shear of R 2 by a factor k , respectively. (a) Do Dk and Rθ commute? (b) Do Rθ and Sk commute? (c) Do Dk and Sk commute? 18. A vector space V is said to be the direct sum of its subspaces U and W , written V = U ⊕W , if every vector in V can be expressed in exactly one way as v = u + w, where u is a vector in U and w is a vector in W . (a) Prove that V = U ⊕W if and only if every vector in V is the sum of some vector in U and some vector in W and U ∩ W = {0}. (b) Let U be the xy -plane and W the z-axis in R 3 . Is it true that R 3 = U ⊕W ? Explain. (c) Let U be the xy -plane and W the yz-plane in R 3 . Can every vector in R 3 be expressed as the sum of a vector in U and a vector in W ? Is it true that R 3 = U ⊕W ? Explain.

CHAPTER 5

Eigenvalues and Eigenvectors

CHAPTER CONTENTS

5.1 Eigenvalues and Eigenvectors
5.2 Diagonalization
5.3 Complex Vector Spaces
5.4 Differential Equations
5.5 Dynamical Systems and Markov Chains

INTRODUCTION

In this chapter we will focus on classes of scalars and vectors known as “eigenvalues” and “eigenvectors,” terms derived from the German word eigen, meaning “own,” “peculiar to,” “characteristic,” or “individual.” The underlying idea first appeared in the study of rotational motion but was later used to classify various kinds of surfaces and to describe solutions of certain differential equations. In the early 1900s it was applied to matrices and matrix transformations, and today it has applications in such diverse fields as computer graphics, mechanical vibrations, heat flow, population dynamics, quantum mechanics, and economics, to name just a few.

5.1 Eigenvalues and Eigenvectors

In this section we will define the notions of “eigenvalue” and “eigenvector” and discuss some of their basic properties.

Definition of Eigenvalue and Eigenvector

We begin with the main definition in this section.

DEFINITION 1 If A is an n × n matrix, then a nonzero vector x in $R^n$ is called an eigenvector of A (or of the matrix operator $T_A$) if Ax is a scalar multiple of x; that is,

$$A\mathbf{x} = \lambda \mathbf{x}$$

for some scalar λ. The scalar λ is called an eigenvalue of A (or of $T_A$), and x is said to be an eigenvector corresponding to λ.

The requirement that an eigenvector be nonzero is imposed to avoid the unimportant case A0 = λ0, which holds for every A and λ.

In general, the image of a vector x under multiplication by a square matrix A differs from x in both magnitude and direction. However, in the special case where x is an eigenvector of A, multiplication by A leaves the direction unchanged. For example, in $R^2$ or $R^3$ multiplication by A maps each eigenvector x of A (if any) along the same line through the origin as x. Depending on the sign and magnitude of the eigenvalue λ corresponding to x, the operation Ax = λx compresses or stretches x by a factor of λ, with a reversal of direction in the case where λ is negative (Figure 5.1.1).

Figure 5.1.1 [Four panels showing x and λx drawn from the origin: (a) 0 ≤ λ ≤ 1; (b) λ ≥ 1; (c) −1 ≤ λ ≤ 0; (d) λ ≤ −1.]

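EXAMPLE 1 Eigenvector of a 2 × 2 Matrix

The vector $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector of

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

corresponding to the eigenvalue λ = 3, since

$$A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x}$$

Geometrically, multiplication by A has stretched the vector x by a factor of 3 (Figure 5.1.2).

Figure 5.1.2 [The vectors x = (1, 2) and 3x = (3, 6) drawn from the origin in the xy-plane.]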
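Definition 1 translates directly into a test that a machine can run. The sketch below is a minimal illustration; the function names `matvec` and `is_eigenpair` are assumptions introduced here. It checks that x is nonzero and that Ax equals λx, using the matrix and vector of Example 1.

```python
def matvec(matrix, vector):
    """Multiply a matrix (tuple of rows) by a vector (tuple of entries)."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


def is_eigenpair(matrix, vector, eigenvalue):
    """Return True if vector is a nonzero vector with A x = lambda x."""
    if all(x == 0 for x in vector):
        return False  # eigenvectors are required to be nonzero
    return matvec(matrix, vector) == tuple(eigenvalue * x for x in vector)


A = ((3, 0), (8, -1))
print(matvec(A, (1, 2)))           # (3, 6), which is 3 times (1, 2)
print(is_eigenpair(A, (1, 2), 3))  # True
print(is_eigenpair(A, (1, 0), 3))  # False: A(1, 0) = (3, 8) is not 3(1, 0)
```

The comparison is exact because the entries are integers; with floating-point entries a tolerance would be needed.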

Computing Eigenvalues and Eigenvectors

Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an n × n matrix A. We will begin with the problem of finding the eigenvalues of A. Note first that the equation Ax = λx can be rewritten as Ax = λIx, or equivalently, as

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

For λ to be an eigenvalue of A this equation must have a nonzero solution for x. But it follows from parts (b) and (g) of Theorem 4.10.2 that this is so if and only if the coefficient matrix λI − A has a zero determinant. Thus, we have the following result.

THEOREM 5.1.1 If A is an n × n matrix, then λ is an eigenvalue of A if and only if it satisfies the equation

$$\det(\lambda I - A) = 0 \tag{1}$$

This is called the characteristic equation of A.

Note that if $(A)_{ij} = a_{ij}$, then formula (1) can be written in expanded form as

$$\begin{vmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{vmatrix} = 0$$

EXAMPLE 2 Finding Eigenvalues

In Example 1 we observed that λ = 3 is an eigenvalue of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues of this matrix.

Solution It follows from Formula (1) that the eigenvalues of A are the solutions of the equation det(λI − A) = 0, which we can write as

$$\begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0$$

from which we obtain

$$(\lambda - 3)(\lambda + 1) = 0 \tag{2}$$

This shows that the eigenvalues of A are λ = 3 and λ = −1. Thus, in addition to the eigenvalue λ = 3 noted in Example 1, we have discovered a second eigenvalue λ = −1.
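For a 2 × 2 matrix with rows (a, b) and (c, d), expanding det(λI − A) gives λ² − (a + d)λ + (ad − bc), so the characteristic equation can be solved with the quadratic formula. The following is a rough sketch of that computation, limited by assumption to the case of real roots; the function name `eigenvalues_2x2` is hypothetical.

```python
import math


def eigenvalues_2x2(matrix):
    """Real eigenvalues of a 2x2 matrix from its characteristic equation.

    Solves lambda^2 - (a + d) lambda + (a d - b c) = 0 and returns the
    distinct real roots in increasing order; returns [] if the roots are
    complex (that case is not handled in this sketch).
    """
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})


A = ((3, 0), (8, -1))
print(eigenvalues_2x2(A))  # [-1.0, 3.0]
```

For the matrix of Example 2 the discriminant is 16, so the roots are exactly −1 and 3, in agreement with (2).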

When the determinant det(λI − A) in (1) is expanded, the characteristic equation of A takes the form

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{3} \]

where the left side of this equation is a polynomial of degree n in which the coefficient of $\lambda^n$ is 1 (Exercise 37). The polynomial

\[ p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \tag{4} \]

is called the characteristic polynomial of A. For example, it follows from (2) that the characteristic polynomial of the 2 × 2 matrix in Example 2 is

\[ p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3 \]

which is a polynomial of degree 2.

Since a polynomial of degree n has at most n distinct roots, it follows from (3) that the characteristic equation of an n × n matrix A has at most n distinct solutions and consequently the matrix has at most n distinct eigenvalues. Since some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues, even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will focus on examples in which the eigenvalues are real numbers.

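EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix

Find the eigenvalues of

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4 \]

The eigenvalues of A must therefore satisfy the cubic equation

\[ \lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \tag{5} \]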


To solve this equation, we will begin by searching for integer solutions. This task can be simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial equation with integer coefficients

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \]

must be divisors of the constant term, $c_n$. Thus, the only possible integer solutions of (5) are the divisors of −4, that is, ±1, ±2, ±4. Successively substituting these values in (5) shows that λ = 4 is an integer solution and hence that λ − 4 is a factor of the left side of (5). Dividing λ − 4 into $\lambda^3 - 8\lambda^2 + 17\lambda - 4$ shows that (5) can be rewritten as

\[ (\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0 \]

Thus, the remaining solutions of (5) satisfy the quadratic equation

\[ \lambda^2 - 4\lambda + 1 = 0 \]

which can be solved by the quadratic formula. Thus, the eigenvalues of A are

\[ \lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \text{and} \quad \lambda = 2 - \sqrt{3} \]

In applications involving large matrices it is often not feasible to compute the characteristic equation directly, so other methods must be used to find eigenvalues. We will consider such methods in Chapter 9.
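The hand computation in Example 3 (test the divisors of the constant term, divide out the factor, then apply the quadratic formula) can be sketched as follows. The function names are our own, the polynomial is assumed to be monic with integer coefficients and a nonzero constant term, and the remaining quadratic is assumed to have real roots, as it does here.

```python
import math


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term.

    Every integer root must divide the constant term, so only divisors are tried.
    """
    constant = abs(coeffs[-1])
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]
    candidates = [sign * d for d in divisors for sign in (1, -1)]
    return [r for r in candidates if poly_eval(coeffs, r) == 0]


def deflate(coeffs, root):
    """Divide the polynomial by (x - root) using synthetic division."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + root * quotient[-1])
    return quotient


p = [1, -8, 17, -4]             # lambda^3 - 8 lambda^2 + 17 lambda - 4
roots = integer_roots(p)        # [4]
a, b, c = deflate(p, roots[0])  # coefficients of lambda^2 - 4 lambda + 1
disc = b * b - 4 * a * c
others = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]

print(roots, [a, b, c])  # [4] [1, -4, 1]
print(others)            # approximately 2 + sqrt(3) and 2 - sqrt(3)
```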

EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix

Find the eigenvalues of the upper triangular matrix

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix} \]

Solution Recalling that the determinant of a triangular matrix is the product of the entries on the main diagonal (Theorem 2.1.2), we obtain

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) \]

Thus, the characteristic equation is

\[ (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0 \]

and the eigenvalues are

\[ \lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44} \]

which are precisely the diagonal entries of A.

The following general theorem should be evident from the computations in the preceding example.

THEOREM 5.1.2 If A is an n × n triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues of A are the entries on the main diagonal of A.
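As a small sanity check of Theorem 5.1.2, the sketch below evaluates det(λI − A) at each diagonal entry of a triangular matrix and confirms that it vanishes. The matrix `T` is an arbitrary illustrative choice, not one taken from the text, and the determinant is computed by cofactor expansion, which is adequate only for small matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def char_poly_at(A, lam):
    """Evaluate the characteristic polynomial det(lam*I - A) at the scalar lam."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return det(M)


# Illustrative upper triangular matrix (our own choice).
T = [[2, 5, -1],
     [0, -3, 4],
     [0, 0, 7]]

diagonal = [T[i][i] for i in range(len(T))]
print([char_poly_at(T, d) for d in diagonal])  # [0, 0, 0]
print(char_poly_at(T, 1))                      # 24, so 1 is not an eigenvalue
```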


EXAMPLE 5 Eigenvalues of a Lower Triangular Matrix

By inspection, the eigenvalues of the lower triangular matrix

\[ A = \begin{bmatrix} \tfrac{1}{2} & 0 & 0 \\ -1 & \tfrac{2}{3} & 0 \\ 5 & -8 & -\tfrac{1}{4} \end{bmatrix} \]

are $\lambda = \tfrac{1}{2}$, $\lambda = \tfrac{2}{3}$, and $\lambda = -\tfrac{1}{4}$.

Had Theorem 5.1.2 been available earlier, we could have anticipated the result obtained in Example 2.

The following theorem gives some alternative ways of describing eigenvalues.

THEOREM 5.1.3 If A is an n × n matrix, the following statements are equivalent.

(a) λ is an eigenvalue of A.
(b) λ is a solution of the characteristic equation det(λI − A) = 0.
(c) The system of equations (λI − A)x = 0 has nontrivial solutions.
(d) There is a nonzero vector x such that Ax = λx.

Finding Eigenvectors and Bases for Eigenspaces

Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the corresponding eigenvectors. By definition, the eigenvectors of A corresponding to an eigenvalue λ are the nonzero vectors that satisfy

\[ (\lambda I - A)\mathbf{x} = \mathbf{0} \]

Thus, we can find the eigenvectors of A corresponding to λ by finding the nonzero vectors in the solution space of this linear system. This solution space, which is called the eigenspace of A corresponding to λ, can also be viewed as:

1. the null space of the matrix λI − A
2. the kernel of the matrix operator $T_{\lambda I - A}: R^n \to R^n$
3. the set of vectors for which Ax = λx

Notice that x = 0 is in every eigenspace but is not an eigenvector (see Definition 1). In the exercises we will ask you to show that this is the only vector that distinct eigenspaces have in common.

EXAMPLE 6 Bases for Eigenspaces

Find bases for the eigenspaces of the matrix

\[ A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} \]

Historical Note Methods of linear algebra are used in the emerging field of computerized face recognition. Researchers are working with the idea that every human face in a racial group is a combination of a few dozen primary shapes. For example, by analyzing threedimensional scans of many faces, researchers at Rockefeller University have produced both an average head shape in the Caucasian group— dubbed the meanhead (top row left in the figure to the left)—and a set of standardized variations from that shape, called eigenheads (15 of which are shown in the picture). These are so named because they are eigenvectors of a certain matrix that stores digitized facial information. Face shapes are represented mathematically as linear combinations of the eigenheads. [Image: © Dr. Joseph J. Atick, adapted from Scientific American]

Solution The characteristic equation of A is

\[ \begin{vmatrix} \lambda + 1 & -3 \\ -2 & \lambda \end{vmatrix} = \lambda(\lambda + 1) - 6 = (\lambda - 2)(\lambda + 3) = 0 \]

so the eigenvalues of A are λ = 2 and λ = −3. Thus, there are two eigenspaces of A, one for each eigenvalue. By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of A corresponding to an eigenvalue λ if and only if (λI − A)x = 0, that is,

\[ \begin{bmatrix} \lambda + 1 & -3 \\ -2 & \lambda \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

In the case where λ = 2 this equation becomes

\[ \begin{bmatrix} 3 & -3 \\ -2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

whose general solution is

\[ x_1 = t, \quad x_2 = t \]

(verify). Since this can be written in matrix form as

\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

it follows that

\[ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = 2. We leave it for you to follow the pattern of these computations and show that

\[ \begin{bmatrix} -\tfrac{3}{2} \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = −3.

Figure 5.1.3 illustrates the geometric effect of multiplication by the matrix A in Example 6. The eigenspace corresponding to λ = 2 is the line $L_1$ through the origin and the point (1, 1), and the eigenspace corresponding to λ = −3 is the line $L_2$ through the origin and the point $(-\tfrac{3}{2}, 1)$. As indicated in the figure, multiplication by A maps each vector in $L_1$ back into $L_1$, scaling it by a factor of 2, and it maps each vector in $L_2$ back into $L_2$, scaling it by a factor of −3.

EXAMPLE 7 Eigenvectors and Bases for Eigenspaces

Find bases for the eigenspaces of

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Figure 5.1.3 The eigenspaces $L_1$ and $L_2$ of the matrix in Example 6. Multiplication by λ = 2 maps (1, 1) to (2, 2); multiplication by λ = −3 maps $(-\tfrac{3}{2}, 1)$ to $(\tfrac{9}{2}, -3)$.

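Solution The characteristic equation of A is $\lambda^3 - 5\lambda^2 + 8\lambda - 4 = 0$, or in factored form, $(\lambda - 1)(\lambda - 2)^2 = 0$ (verify). Thus, the distinct eigenvalues of A are λ = 1 and λ = 2, so there are two eigenspaces of A.

By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

is an eigenvector of A corresponding to λ if and only if x is a nontrivial solution of (λI − A)x = 0, or in matrix form,

\[ \begin{bmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{6} \]

In the case where λ = 2, Formula (6) becomes

\[ \begin{bmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Solving this system using Gaussian elimination yields (verify)

\[ x_1 = -s, \quad x_2 = t, \quad x_3 = s \]

Thus, the eigenvectors of A corresponding to λ = 2 are the nonzero vectors of the form

\[ \mathbf{x} = \begin{bmatrix} -s \\ t \\ s \end{bmatrix} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} + \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

Since

\[ \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to λ = 2.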


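If λ = 1, then (6) becomes

\[ \begin{bmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Solving this system yields (verify)

\[ x_1 = -2s, \quad x_2 = s, \quad x_3 = s \]

Thus, the eigenvectors corresponding to λ = 1 are the nonzero vectors of the form

\[ \begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \quad \text{so that} \quad \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to λ = 1.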
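The eigenspace computations in Example 7 amount to finding a basis for the null space of λI − A. The following sketch does this by reducing the matrix to reduced row echelon form with exact rational arithmetic; the function names are our own, and the basis it returns may list the same vectors as the text in a different order.

```python
from fractions import Fraction


def null_space_basis(M):
    """Basis for the null space of M, found from its reduced row echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    pivots = []  # pivots[k] is the pivot column of row k
    r = 0
    for c in range(n_cols):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue  # no pivot here, so column c is a free column
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)  # set one free variable to 1, the others to 0
        for row_index, p in enumerate(pivots):
            vec[p] = -rows[row_index][free]
        basis.append(vec)
    return basis


def eigenspace_basis(A, lam):
    """Basis for the solution space of (lam*I - A)x = 0."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return null_space_basis(M)


A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]  # matrix from Example 7
print(eigenspace_basis(A, 2) == [[0, 1, 0], [-1, 0, 1]])  # True
print(eigenspace_basis(A, 1) == [[-2, 1, 1]])             # True
```

If the scalar passed in is not an eigenvalue, λI − A is invertible and the function returns an empty list.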

Eigenvalues and Invertibility

The next theorem establishes a relationship between the eigenvalues and the invertibility of a matrix.

THEOREM 5.1.4 A square matrix A is invertible if and only if λ = 0 is not an eigenvalue of A.

Proof Assume that A is an n × n matrix and observe first that λ = 0 is a solution of the characteristic equation

\[ \lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \]

if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that A is invertible if and only if $c_n \neq 0$. But

\[ \det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n \]

or, on setting λ = 0,

\[ \det(-A) = c_n \quad \text{or} \quad (-1)^n \det(A) = c_n \]

It follows from the last equation that det(A) = 0 if and only if $c_n = 0$, and this in turn implies that A is invertible if and only if $c_n \neq 0$.

EXAMPLE 8 Eigenvalues and Invertibility

The matrix A in Example 7 is invertible since it has eigenvalues λ = 1 and λ = 2, neither of which is zero. We leave it for you to check this conclusion by showing that det(A) ≠ 0.
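One way to carry out the check suggested in Example 8 is a cofactor expansion along the first row of the matrix from Example 7, where only the last entry is nonzero. The worked computation below is offered as a sketch; it also confirms the relation $(-1)^n \det(A) = c_n$ from the proof of Theorem 5.1.4, using the constant term $c_3 = -4$ of the characteristic polynomial found in Example 7.

```latex
\[
\det(A) =
\begin{vmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{vmatrix}
= (-2)\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix}
= (-2)(0 - 2) = 4 \neq 0
\]
\[
c_3 = -4 = (-1)^3 \det(A)
\]
```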

More on the Equivalence Theorem

As our final result in this section, we will use Theorem 5.1.4 to add one additional part to Theorem 4.10.2.

THEOREM 5.1.5 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) Ax = 0 has only the trivial solution.
(c) The reduced row echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.
(e) Ax = b is consistent for every n × 1 matrix b.
(f) Ax = b has exactly one solution for every n × 1 matrix b.
(g) det(A) ≠ 0.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span $R^n$.
(k) The row vectors of A span $R^n$.
(l) The column vectors of A form a basis for $R^n$.
(m) The row vectors of A form a basis for $R^n$.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is $R^n$.
(q) The orthogonal complement of the row space of A is {0}.
(r) The kernel of $T_A$ is {0}.
(s) The range of $T_A$ is $R^n$.
(t) $T_A$ is one-to-one.
(u) λ = 0 is not an eigenvalue of A.

Eigenvalues of General Linear Transformations

Thus far, we have only defined eigenvalues and eigenvectors for matrices and linear operators on $R^n$. The following definition, which parallels Definition 1, extends this concept to general vector spaces.

DEFINITION 2 If T : V → V is a linear operator on a vector space V, then a nonzero vector x in V is called an eigenvector of T if T(x) is a scalar multiple of x; that is, T(x) = λx for some scalar λ. The scalar λ is called an eigenvalue of T, and x is said to be an eigenvector corresponding to λ.

As with matrix operators, we call the kernel of the operator λI − T the eigenspace of T corresponding to λ. Stated another way, this is the subspace of all vectors in V for which T(x) = λx.

In vector spaces of functions eigenvectors are commonly referred to as eigenfunctions.

EXAMPLE 9 Eigenvalue of a Differentiation Operator (Calculus Required)

If $D: C^\infty \to C^\infty$ is the differentiation operator on the vector space of functions with continuous derivatives of all orders on the interval (−∞, ∞), and if λ is a constant, then

\[ D(e^{\lambda x}) = \lambda e^{\lambda x} \]

so that λ is an eigenvalue of D and $e^{\lambda x}$ is a corresponding eigenvector.


Exercise Set 5.1 In Exercises 1–4, confirm by multiplication that x is an eigenvector of A, and find the corresponding eigenvalue.



2 1 ; x= 2 −1



−1

5 1

2. A =



1 3

1. A =



4 ⎢ 3. A = ⎣2 1



17. (Calculus required) Let $D^2: C^\infty(-\infty, \infty) \to C^\infty(-\infty, \infty)$ be the operator that maps a function into its second derivative.

1 1 ⎢ ⎥ ⎥ 2⎦ ; x = ⎣2⎦ 4 1

−1

2 ⎢ 4. A = ⎣−1 −1

16. T (x, y, z) = (2x − y − z, x − z, −x + y + 2z)

1 1

⎡ ⎤



0 3 0

15. T (x, y) = (x + 4y, 2x + 3y)

; x=

3

2

−1

(a) Show that D 2 is linear.

1

In each part of Exercises 5–6, find the characteristic equation, the eigenvalues, and bases for the eigenspaces of the matrix.



1 2

5. (a)



1 0

(c)

0 1



2 1

6. (a)

2 0

0 2

1



(b)

(d)

19. (a) Reflection about the line y = x .

1

−3

2 0

(b) Orthogonal projection onto the x -axis. (c) Rotation about the origin through a positive angle of 90◦ .

2

(d) Contraction with factor k (0 ≤ k < 1).

1 −2

2 −1

(e) Shear in the x-direction by a factor k (k ≠ 0).

In Exercises 7–12, find the characteristic equation, the eigenvalues, and bases for the eigenspaces of the matrix.



4 ⎢ 7. ⎣−2 −2



6 ⎢ 9. ⎣0 1



4

⎢ 11. ⎣0 1

0 1 0 3

−2 0 0 3 0





1 ⎥ 0⎦ 1

−8

1 ⎢ 8. ⎣ 0 −2

⎤ ⎥

0⎦ −3



−1 ⎥ 0⎦ 2



0 ⎢ 10. ⎣1 1



1

⎢ 12. ⎣3 6

−2

0 0 0

(c) Dilation with factor k (k > 1).



0⎦ 4

(e) Shear in the y-direction by a factor k (k ≠ 0).

In each part of Exercises 21–22, find the eigenvalues and the corresponding eigenspaces of the stated matrix operator on $R^3$. Refer to the tables in Section 4.9 and use geometric reasoning to find the answers. No computations are needed.

21. (a) Reflection about the xy-plane.



(b) Orthogonal projection onto the xz-plane.

3 ⎥ 3⎦ 4

(c) Counterclockwise rotation about the positive x -axis through an angle of 90◦ .

In Exercises 13–14, find the characteristic equation of the matrix by inspection.



3 ⎢ 13. ⎣−2 4

0 7 8



0 ⎥ 0⎦ 1



9

⎢0 ⎢ 14. ⎢ ⎣0 0

−8 −1 0 0

(b) Rotation about the origin through a positive angle of 180◦ . (d) Expansion in the y -direction with factor k (k > 1).

1 ⎥ 1⎦ 0

−3 −5 −6

20. (a) Reflection about the y -axis.





1 0 1

18. (Calculus required) Let $D^2: C^\infty \to C^\infty$ be the linear operator in Exercise 17. Show that if ω is a positive constant, then $\sinh\sqrt{\omega}\,x$ and $\cosh\sqrt{\omega}\,x$ are eigenvectors of $D^2$, and find their corresponding eigenvalues.

In each part of Exercises 19–20, find the eigenvalues and the corresponding eigenspaces of the stated matrix operator on $R^2$. Refer to the tables in Section 4.9 and use geometric reasoning to find the answers. No computations are needed.

2

−2

1 0

(d)



−7

−2

(b)

1 2

(c)



4 3



(b) Show that if ω is a positive constant, then $\sin\sqrt{\omega}\,x$ and $\cos\sqrt{\omega}\,x$ are eigenvectors of $D^2$, and find their corresponding eigenvalues.

⎡ ⎤ ⎤ −1 1 ⎢ ⎥ ⎥ −1⎦ ; x = ⎣1⎦ 2

In Exercises 15–16, find the eigenvalues and a basis for each eigenspace of the linear operator defined by the stated formula. [Suggestion: Work with the standard matrix for the operator.]

6 0 3 0



3 0⎥ ⎥ ⎥ 0⎦ 7

(d) Contraction with factor k (0 ≤ k < 1). 22. (a) Reflection about the xz-plane. (b) Orthogonal projection onto the yz-plane. (c) Counterclockwise rotation about the positive y -axis through an angle of 180◦ . (d) Dilation with factor k (k > 1).


23. Let A be a 2 × 2 matrix, and call a line through the origin of R 2 invariant under A if Ax lies on the line when x does. Find equations for all lines in R 2 , if any, that are invariant under the given matrix.

(a) A =

4 2

−1

(b) A =

1

0 −1

1 0

24. Find det(A) given that A has p(λ) as its characteristic polynomial.

30. Let A be the matrix in Exercise 29. Show that if b ≠ 0, then



x1 =

and x2 =

−b a − λ2

1 2

λ2 =

1 2

%

(a + d) +

%

(a + d) −

"

(a − d)2 + 4bc

"

(a − d)2 + 4bc

&

&

p(λ) = λ2 + c1 λ + c2

(a) What is the size of A?

is the characteristic polynomial of a 2 × 2 matrix, then

p(A) = A2 + c1 A + c2 I = 0 (Stated informally, A satisfies its characteristic equation. This result is true as well for n × n matrices.)

(b) Is A invertible? (c) How many eigenspaces does A have? 26. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them from left eigenvectors, which are n × 1 column matrices x that satisfy the equation xTA = μxT for some scalar μ. For a given matrix A, how are the right eigenvectors and their corresponding eigenvalues related to the left eigenvectors and their corresponding eigenvalues? 27. Find a 3 × 3 matrix A that has eigenvalues 1, −1, and 0, and for which ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 1 1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣−1⎦ , ⎣1⎦ , ⎣−1⎦ 0 1 0 are their corresponding eigenvectors.

32. Prove: If a , b, c, and d are integers such that a + b = c + d , then

A=

a c

b d

has integer eigenvalues.

33. Prove: If λ is an eigenvalue of an invertible matrix A and x is a corresponding eigenvector, then 1/λ is an eigenvalue of $A^{-1}$ and x is a corresponding eigenvector.

34. Prove: If λ is an eigenvalue of A, x is a corresponding eigenvector, and s is a scalar, then λ − s is an eigenvalue of A − sI and x is a corresponding eigenvector.

35. Prove: If λ is an eigenvalue of A and x is a corresponding eigenvector, then sλ is an eigenvalue of sA for every scalar s and x is a corresponding eigenvector.

36. Find the eigenvalues and bases for the eigenspaces of



Working with Proofs

−2 ⎢ A = ⎣ −2 −4

28. Prove that the characteristic equation of a 2 × 2 matrix A can be expressed as λ2 − tr(A)λ + det(A) = 0, where tr(A) is the trace of A.

A=

a c

b d

(a) A−1

then the solutions of the characteristic equation of A are

(a + d) ±

"

(a − d)2 + 4bc

2 3 2



3 ⎥ 2⎦ 5

and then use Exercises 33 and 34 to find the eigenvalues and bases for the eigenspaces of

29. Use the result in Exercise 28 to show that if

%



31. Use the result of Exercise 28 to prove that if

25. Suppose that the characteristic polynomial of some matrix A is found to be p(λ) = (λ − 1)(λ − 3)2 (λ − 4)3 . In each part, answer the question and explain your reasoning.

1 2

λ1 = and

[Hint: See the proof of Theorem 5.1.4.]

λ=

−b a − λ1

are eigenvectors of A that correspond, respectively, to the eigenvalues

(a) p(λ) = λ3 − 2λ2 + λ + 5 (b) p(λ) = λ4 − λ3 + 7


&

Use this result to show that A has (a) two distinct real eigenvalues if (a − d)2 + 4bc > 0. (b) two repeated real eigenvalues if (a − d)2 + 4bc = 0. (c) complex conjugate eigenvalues if (a − d)2 + 4bc < 0.

(b) A − 3I

(c) A + 2I

37. Prove that the characteristic polynomial of an n × n matrix A has degree n and that the coefficient of $\lambda^n$ in that polynomial is 1.

38. (a) Prove that if A is a square matrix, then A and $A^T$ have the same eigenvalues. [Hint: Look at the characteristic equation det(λI − A) = 0.]
(b) Show that A and $A^T$ need not have the same eigenspaces. [Hint: Use the result in Exercise 30 to find a 2 × 2 matrix for which A and $A^T$ have different eigenspaces.]


Working with Technology

39. Prove that the intersection of any two distinct eigenspaces of a matrix A is {0}.

T1. For the given matrix A, find the characteristic polynomial and the eigenvalues, and then use the method of Example 7 to find bases for the eigenspaces.

True-False Exercises



TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

−8

⎢ 0 ⎢ ⎢ A=⎢ ⎢ 0 ⎢ ⎣ 0

(a) If A is a square matrix and Ax = λx for some nonzero scalar λ, then x is an eigenvector of A. (b) If λ is an eigenvalue of a matrix A, then the linear system (λI − A)x = 0 has only the trivial solution.

4

33

38

173

0

−1

−4

0

−5

−25

0

1

5

−16

−19

−86

⎤ −30 ⎥ 0⎥ ⎥ 1⎥ ⎥ ⎥ 0⎦ 15

T2. The Cayley–Hamilton Theorem states that every square matrix satisfies its characteristic equation; that is, if A is an n × n matrix whose characteristic equation is

(c) If the characteristic polynomial of a matrix A is p(λ) = λ2 + 1, then A is invertible.

λ

+ c1 λn−1 + · · · + cn = 0

(d) If λ is an eigenvalue of a matrix A, then the eigenspace of A corresponding to λ is the set of eigenvectors of A corresponding to λ.

then An + c1 An−1 + · · · + cn = 0. (a) Verify the Cayley–Hamilton Theorem for the matrix



(e) The eigenvalues of a matrix A are the same as the eigenvalues of the reduced row echelon form of A.

0

⎢ A = ⎣0 2

(f ) If 0 is an eigenvalue of a matrix A, then the set of columns of A is linearly independent.



1

0

0

1⎦

−5



4

(b) Use the result in Exercise 28 to prove the Cayley–Hamilton Theorem for 2 × 2 matrices.

5.2 Diagonalization

In this section we will be concerned with the problem of finding a basis for $R^n$ that consists of eigenvectors of an n × n matrix A. Such bases can be used to study geometric properties of A and to simplify various numerical computations. These bases are also of physical significance in a wide variety of applications, some of which will be considered later in this text.

The Matrix Diagonalization Problem

Products of the form $P^{-1}AP$ in which A and P are n × n matrices and P is invertible will be our main topic of study in this section. There are various ways to think about such products, one of which is to view them as transformations

\[ A \to P^{-1}AP \]

in which the matrix A is mapped into the matrix $P^{-1}AP$. These are called similarity transformations. Such transformations are important because they preserve many properties of the matrix A. For example, if we let $B = P^{-1}AP$, then A and B have the same determinant since

\[ \det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A) \]


In general, any property that is preserved by a similarity transformation is called a similarity invariant and is said to be invariant under similarity. Table 1 lists the most important similarity invariants. The proofs of some of these are given as exercises.

Table 1 Similarity Invariants

Property                    Description
Determinant                 A and $P^{-1}AP$ have the same determinant.
Invertibility               A is invertible if and only if $P^{-1}AP$ is invertible.
Rank                        A and $P^{-1}AP$ have the same rank.
Nullity                     A and $P^{-1}AP$ have the same nullity.
Trace                       A and $P^{-1}AP$ have the same trace.
Characteristic polynomial   A and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues                 A and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension        If λ is an eigenvalue of A (and hence of $P^{-1}AP$), then the eigenspace of A corresponding to λ and the eigenspace of $P^{-1}AP$ corresponding to λ have the same dimension.
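Two of the invariants in Table 1 are easy to spot-check numerically. The sketch below forms $B = P^{-1}AP$ for a 2 × 2 example and compares traces and determinants. The matrix A is the one from Example 6 of Section 5.1, whereas P is an arbitrary invertible matrix chosen for illustration; the helper names are our own, and a single example is of course not a proof.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inverse_2x2(P):
    """Inverse of an invertible 2x2 matrix, kept exact with Fractions."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("P is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))


def det_2x2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


A = [[-1, 3], [2, 0]]  # matrix from Example 6 of Section 5.1
P = [[2, 1], [1, 1]]   # arbitrary invertible matrix (illustrative choice)
B = mat_mul(mat_mul(inverse_2x2(P), A), P)

print(trace(A) == trace(B), det_2x2(A) == det_2x2(B))  # True True
```

Here B turns out to be lower triangular with diagonal entries −3 and 2, which are the eigenvalues of A found in Section 5.1.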

We will find the following terminology useful in our study of similarity transformations.

DEFINITION 1 If A and B are square matrices, then we say that B is similar to A if there is an invertible matrix P such that $B = P^{-1}AP$.

Note that if B is similar to A, then it is also true that A is similar to B since we can express A as $A = Q^{-1}BQ$ by taking $Q = P^{-1}$. This being the case, we will usually say that A and B are similar matrices if either is similar to the other.

Because diagonal matrices have such a simple form, it is natural to inquire whether a given n × n matrix A is similar to a matrix of this type. Should this turn out to be the case, and should we be able to actually find a diagonal matrix D that is similar to A, then we would be able to ascertain many of the similarity invariant properties of A directly from the diagonal entries of D. For example, the diagonal entries of D will be the eigenvalues of A (Theorem 5.1.2), and the product of the diagonal entries of D will be the determinant of A (Theorem 2.1.2). This leads us to introduce the following terminology.

DEFINITION 2 A square matrix A is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists an invertible matrix P such that $P^{-1}AP$ is diagonal. In this case the matrix P is said to diagonalize A.

The following theorem and the ideas used in its proof will provide us with a roadmap for devising a technique for determining whether a matrix is diagonalizable and, if so, for finding a matrix P that will perform the diagonalization.


THEOREM 5.2.1 If A is an n × n matrix, the following statements are equivalent.

(a) A is diagonalizable.
(b) A has n linearly independent eigenvectors.

Part (b) of Theorem 5.2.1 is equivalent to saying that there is a basis for $R^n$ consisting of eigenvectors of A. Why?

Proof (a) ⇒ (b) Since A is assumed to be diagonalizable, it follows that there exist an invertible matrix P and a diagonal matrix D such that $P^{-1}AP = D$ or, equivalently,

\[ AP = PD \tag{1} \]

If we denote the column vectors of P by $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and if we assume that the diagonal entries of D are $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Formula (6) of Section 1.3 the left side of (1) can be expressed as

\[ AP = A[\mathbf{p}_1 \ \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_n] = [A\mathbf{p}_1 \ \ A\mathbf{p}_2 \ \cdots \ A\mathbf{p}_n] \]

and, as noted in the comment following Example 1 of Section 1.7, the right side of (1) can be expressed as

\[ PD = [\lambda_1\mathbf{p}_1 \ \ \lambda_2\mathbf{p}_2 \ \cdots \ \lambda_n\mathbf{p}_n] \]

Thus, it follows from (1) that

\[ A\mathbf{p}_1 = \lambda_1\mathbf{p}_1, \quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2, \quad \ldots, \quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n \tag{2} \]

Since P is invertible, we know from Theorem 5.1.5 that its column vectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ are linearly independent (and hence nonzero). Thus, it follows from (2) that these n column vectors are eigenvectors of A.

Proof (b) ⇒ (a) Assume that A has n linearly independent eigenvectors, $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the corresponding eigenvalues. If we let

\[ P = [\mathbf{p}_1 \ \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_n] \]

and if we let D be the diagonal matrix that has $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its successive diagonal entries, then

\[ AP = A[\mathbf{p}_1 \ \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_n] = [A\mathbf{p}_1 \ \ A\mathbf{p}_2 \ \cdots \ A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \ \ \lambda_2\mathbf{p}_2 \ \cdots \ \lambda_n\mathbf{p}_n] = PD \]

Since the column vectors of P are linearly independent, it follows from Theorem 5.1.5 that P is invertible, so that this last equation can be rewritten as $P^{-1}AP = D$, which shows that A is diagonalizable.

Whereas Theorem 5.2.1 tells us that we need to find n linearly independent eigenvectors to diagonalize a matrix, the following theorem tells us where such vectors might be found. Part (a) is proved at the end of this section, and part (b) is an immediate consequence of part (a) and Theorem 5.2.1 (why?).

THEOREM 5.2.2

(a) If $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues of a matrix A, and if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are corresponding eigenvectors, then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set.
(b) An n × n matrix with n distinct eigenvalues is diagonalizable.

Remark Part (a) of Theorem 5.2.2 is a special case of a more general result: Specifically, if $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct eigenvalues, and if $S_1, S_2, \ldots, S_k$ are corresponding sets of linearly independent eigenvectors, then the union of these sets is linearly independent.

Procedure for Diagonalizing a Matrix

Theorem 5.2.1 guarantees that an n × n matrix A with n linearly independent eigenvectors is diagonalizable, and the proof of that theorem together with Theorem 5.2.2 suggests the following procedure for diagonalizing A.

A Procedure for Diagonalizing an n × n Matrix

Step 1. Determine first whether the matrix is actually diagonalizable by searching for n linearly independent eigenvectors. One way to do this is to find a basis for each eigenspace and count the total number of vectors obtained. If there is a total of n vectors, then the matrix is diagonalizable, and if the total is less than n, then it is not.

Step 2. If you ascertained that the matrix is diagonalizable, then form the matrix $P = [\mathbf{p}_1 \ \ \mathbf{p}_2 \ \cdots \ \mathbf{p}_n]$ whose column vectors are the n basis vectors you obtained in Step 1.

Step 3. $P^{-1}AP$ will be a diagonal matrix whose successive diagonal entries are the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ that correspond to the successive columns of P.

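EXAMPLE 1 Finding a Matrix P That Diagonalizes a Matrix A

Find a matrix P that diagonalizes

\[ A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Solution In Example 7 of the preceding section we found the characteristic equation of A to be

\[ (\lambda - 1)(\lambda - 2)^2 = 0 \]

and we found the following bases for the eigenspaces:

\[ \lambda = 2: \quad \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1: \quad \mathbf{p}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix} \]

There are three basis vectors in total, so the matrix

\[ P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

diagonalizes A. As a check, you should verify that

\[ P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]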

In general, there is no preferred order for the columns of P. Since the i th diagonal entry of $P^{-1}AP$ is an eigenvalue for the i th column vector of P, changing the order of the columns of P just changes the order of the eigenvalues on the diagonal of $P^{-1}AP$. Thus, had we written

\[ P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \]

in the preceding example, we would have obtained

\[ P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]
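The check requested in Example 1 can be carried out without inverting P by verifying the equivalent relation AP = PD from the proof of Theorem 5.2.1. This is sufficient here because the columns of P are linearly independent, so P is invertible. The sketch below does this for both column orders discussed above; the helper name `mat_mul` and the variable names are our own.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]

# Columns p1, p2, p3 with eigenvalues 2, 2, 1 (the order used in Example 1).
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
D = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]

# Columns p1, p3, p2, so the eigenvalues appear in the order 2, 1, 2.
P_reordered = [[-1, -2, 0], [0, 1, 1], [1, 1, 0]]
D_reordered = [[2, 0, 0], [0, 1, 0], [0, 0, 2]]

print(mat_mul(A, P) == mat_mul(P, D))                                # True
print(mat_mul(A, P_reordered) == mat_mul(P_reordered, D_reordered))  # True
```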

EXAMPLE 2 A Matrix That Is Not Diagonalizable

Show that the following matrix is not diagonalizable:

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix} \]

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2 \]

so the characteristic equation is

\[ (\lambda - 1)(\lambda - 2)^2 = 0 \]

and the distinct eigenvalues of A are λ = 1 and λ = 2. We leave it for you to show that bases for the eigenspaces are

\[ \lambda = 1: \quad \mathbf{p}_1 = \begin{bmatrix} \tfrac{1}{8} \\ -\tfrac{1}{8} \\ 1 \end{bmatrix}; \qquad \lambda = 2: \quad \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Since A is a 3 × 3 matrix and there are only two basis vectors in total, A is not diagonalizable.

Alternative Solution If you are interested only in determining whether a matrix is diagonalizable and not in actually finding a diagonalizing matrix P, then it is not necessary to compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this example, the eigenspace corresponding to λ = 1 is the solution space of the system

\[ \begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and hence the eigenspace corresponding to λ = 1 is one-dimensional.

The eigenspace corresponding to λ = 2 is the solution space of the system

\[ \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to λ = 2 is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since three are needed, the matrix A is not diagonalizable.
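The dimension count in the alternative solution can be sketched in code: compute the rank of λI − A by row reduction, take the nullity as n minus the rank, and add up the nullities over the distinct eigenvalues. The function names are our own, and the eigenvalues are supplied by hand rather than computed.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix, found by forward elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot_row = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def eigenspace_dimension(A, lam):
    """Nullity of lam*I - A, that is, n minus its rank."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return n - rank(M)


def is_diagonalizable(A, distinct_eigenvalues):
    """True when the eigenspace dimensions add up to n."""
    total = sum(eigenspace_dimension(A, lam) for lam in distinct_eigenvalues)
    return total == len(A)


A2 = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]  # matrix from Example 2
A1 = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]  # matrix from Example 1

print(is_diagonalizable(A2, [1, 2]))  # False
print(is_diagonalizable(A1, [1, 2]))  # True
```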


EXAMPLE 3 Recognizing Diagonalizability

We saw in Example 3 of the preceding section that

\[ A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix} \]

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, A is diagonalizable and

\[ P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix} \]

for some invertible matrix P. If needed, the matrix P can be found using the method shown in Example 1 of this section.

EXAMPLE 4 Diagonalizability of Triangular Matrices

From Theorem 5.1.2, the eigenvalues of a triangular matrix are the entries on its main diagonal. Thus, a triangular matrix with distinct entries on the main diagonal is diagonalizable. For example,

\[ A = \begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix} \]

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.

Eigenvalues of Powers of a Matrix

Since there are many applications in which it is necessary to compute high powers of a square matrix $A$, we will now turn our attention to that important problem. As we will see, the most efficient way to compute $A^k$, particularly for large values of $k$, is to first diagonalize $A$. But because diagonalizing a matrix $A$ involves finding its eigenvalues and eigenvectors, we will need to know how these quantities are related to those of $A^k$. As an illustration, suppose that $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector. Then

\[
A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}
\]

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that $\mathbf{x}$ is a corresponding eigenvector. In general, we have the following result.

THEOREM 5.2.3

If $k$ is a positive integer, $\lambda$ is an eigenvalue of a matrix $A$, and $\mathbf{x}$ is a corresponding eigenvector, then $\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding eigenvector.

Note that diagonalizability is not a requirement in Theorem 5.2.3.
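Theorem 5.2.3 can be spot-checked numerically. The following is a minimal sketch in plain Python, not part of the text's development: the helper names `mat_mul`, `mat_pow`, and `mat_vec` are illustrative, and the eigenvectors are assumed integer multiples of basis eigenvectors for the (nondiagonalizable) matrix of Example 2, obtained by solving $(\lambda I - A)\mathbf{x} = \mathbf{0}$ by hand.

```python
def mat_mul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def mat_pow(A, k):
    """Return A^k for a positive integer k by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

def mat_vec(A, x):
    """Multiply the matrix A by the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Matrix of Example 2 (it is not diagonalizable).
A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]

# Assumed eigenpairs (eigenvalue, eigenvector), computed by hand.
eigenpairs = [(1, [1, -1, 8]), (2, [0, 0, 1])]

k = 7
A_k = mat_pow(A, k)
for lam, x in eigenpairs:
    lhs = mat_vec(A_k, x)               # A^k x
    rhs = [lam ** k * xi for xi in x]   # lambda^k x
    print(lam, lam ** k, lhs == rhs)
```

Because the arithmetic is done with integers, the comparison is exact; the check succeeds even though this matrix is not diagonalizable, in line with the note above.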

EXAMPLE 5 Eigenvalues and Eigenvectors of Matrix Powers

In Example 2 we found the eigenvalues and corresponding eigenvectors of the matrix

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix}
\]

Do the same for $A^7$.

Solution We know from Example 2 that the eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$, so the eigenvalues of $A^7$ are $\lambda = 1^7 = 1$ and $\lambda = 2^7 = 128$. The eigenvectors $\mathbf{p}_1$ and $\mathbf{p}_2$ obtained in Example 2 corresponding to the eigenvalues $\lambda = 1$ and $\lambda = 2$ of $A$ are also the eigenvectors corresponding to the eigenvalues $\lambda = 1$ and $\lambda = 128$ of $A^7$.

Computing Powers of a Matrix

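The problem of computing powers of a matrix is greatly simplified when the matrix is diagonalizable. To see why this is so, suppose that $A$ is a diagonalizable $n \times n$ matrix, that $P$ diagonalizes $A$, and that

\[
P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D
\]

Squaring both sides of this equation yields

\[
(P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2
\]

We can rewrite the left side of this equation as

\[
(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P
\]

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if $k$ is a positive integer, then a similar computation will show that

\[
P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}
\]

which we can rewrite as

\[
A^k = PD^kP^{-1} = P\begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}P^{-1} \tag{3}
\]

Formula (3) reveals that raising a diagonalizable matrix $A$ to a positive integer power has the effect of raising its eigenvalues to that power.

EXAMPLE 6 Powers of a Matrix

Use (3) to find $A^{13}$, where

\[
A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}
\]

Solution We showed in Example 1 that the matrix $A$ is diagonalized by

\[
P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\]

and that

\[
D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Thus, it follows from (3) that

\[
A^{13} = PD^{13}P^{-1} =
\begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \tag{4}
\]

\[
= \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix}
\]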
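The computation in Example 6 can be reproduced in a few lines. This is a sketch under stated assumptions rather than a general-purpose routine: it takes $P$, $P^{-1}$, and the diagonal entries of $D$ as given (they are copied from Example 6), and the function names are illustrative.

```python
def mat_mul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def diagonal_power(eigenvalues, k):
    """Build D^k from the diagonal entries of D."""
    n = len(eigenvalues)
    return [[eigenvalues[i] ** k if i == j else 0 for j in range(n)]
            for i in range(n)]

def power_via_diagonalization(P, eigenvalues, P_inv, k):
    """Compute A^k = P D^k P^{-1} for a positive integer k."""
    return mat_mul(mat_mul(P, diagonal_power(eigenvalues, k)), P_inv)

# Data from Example 6.
A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]
eigenvalues = [2, 2, 1]

A13 = power_via_diagonalization(P, eigenvalues, P_inv, 13)

# Cross-check against twelve successive multiplications by A.
direct = A
for _ in range(12):
    direct = mat_mul(direct, A)

print(A13)
print(A13 == direct)
```

Changing the exponent argument from 13 to 1000 yields $A^{1000}$ with no additional diagonalization work, since Python integers do not overflow.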

Remark With the method in the preceding example, most of the work is in diagonalizing $A$. Once that work is done, it can be used to compute any power of $A$. Thus, to compute $A^{1000}$ we need only change the exponents from 13 to 1000 in (4).

Geometric and Algebraic Multiplicity

Theorem 5.2.2(b) does not completely settle the diagonalizability question since it only guarantees that a square matrix with $n$ distinct eigenvalues is diagonalizable; it does not preclude the possibility that there may exist diagonalizable matrices with fewer than $n$ distinct eigenvalues. The following example shows that this is indeed the case.

EXAMPLE 7 The Converse of Theorem 5.2.2(b) Is False

Consider the matrices

\[
I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad\text{and}\quad
J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely $\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you to solve the homogeneous systems

\[
(\lambda I - I)\mathbf{x} = \mathbf{0} \quad\text{and}\quad (\lambda I - J)\mathbf{x} = \mathbf{0}
\]

with $\lambda = 1$ and show that for $I$ the eigenspace is three-dimensional (all of $R^3$) and for $J$ it is one-dimensional, consisting of all scalar multiples of

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
\]

This shows that the converse of Theorem 5.2.2(b) is false, since we have produced two $3 \times 3$ matrices with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is not.

A full excursion into the study of diagonalizability is left for more advanced courses, but we will touch on one theorem that is important for a fuller understanding of diagonalizability. It can be proved that if $\lambda_0$ is an eigenvalue of $A$, then the dimension of the eigenspace corresponding to $\lambda_0$ cannot exceed the number of times that $\lambda - \lambda_0$ appears as a factor of the characteristic polynomial of $A$. For example, in Examples 1 and 2 the characteristic polynomial is

\[
(\lambda - 1)(\lambda - 2)^2
\]

Thus, the eigenspace corresponding to $\lambda = 1$ is at most (hence exactly) one-dimensional, and the eigenspace corresponding to $\lambda = 2$ is at most two-dimensional. In Example 1 the eigenspace corresponding to $\lambda = 2$ actually had dimension 2, resulting in diagonalizability, but in Example 2 the eigenspace corresponding to $\lambda = 2$ had only dimension 1, resulting in nondiagonalizability.

There is some terminology that is related to these ideas. If $\lambda_0$ is an eigenvalue of an $n \times n$ matrix $A$, then the dimension of the eigenspace corresponding to $\lambda_0$ is called the geometric multiplicity of $\lambda_0$, and the number of times that $\lambda - \lambda_0$ appears as a factor in the characteristic polynomial of $A$ is called the algebraic multiplicity of $\lambda_0$. The following theorem, which we state without proof, summarizes the preceding discussion.

THEOREM 5.2.4 Geometric and Algebraic Multiplicity

If A is a square matrix, then: (a) For every eigenvalue of A, the geometric multiplicity is less than or equal to the algebraic multiplicity. (b) A is diagonalizable if and only if the geometric multiplicity of every eigenvalue is equal to the algebraic multiplicity.
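As an illustration of Theorem 5.2.4(b), the geometric multiplicity of an eigenvalue $\lambda_0$ can be computed as the nullity of $\lambda_0 I - A$, that is, $n$ minus its rank. The sketch below is a hedged example, not a method given in the text: it assumes the eigenvalues and their algebraic multiplicities are already known (here read off from the characteristic polynomial $(\lambda - 1)(\lambda - 2)^2$ of Examples 1 and 2), and the function names are hypothetical. Exact rational arithmetic avoids rounding issues in the rank computation.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def geometric_multiplicity(A, lam):
    """Dimension of the eigenspace of A for lam: the nullity of lam*I - A."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

def is_diagonalizable(A, eigen_data):
    """eigen_data lists (eigenvalue, algebraic multiplicity) pairs supplied by the caller."""
    return all(geometric_multiplicity(A, lam) == alg for lam, alg in eigen_data)

A1 = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]   # matrix of Example 1
A2 = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]   # matrix of Example 2
eigen_data = [(1, 1), (2, 2)]             # from (lambda - 1)(lambda - 2)^2

print(is_diagonalizable(A1, eigen_data))
print(is_diagonalizable(A2, eigen_data))
```

For the matrix of Example 1 both multiplicities agree for each eigenvalue, while for the matrix of Example 2 the eigenvalue $\lambda = 2$ has geometric multiplicity 1 and algebraic multiplicity 2.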

We will complete this section with an optional proof of Theorem 5.2.2(a).

OPTIONAL

Proof of Theorem 5.2.2(a) Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent and obtain a contradiction. We can then conclude that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent.

Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent. Let $r$ be the largest integer such that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent. Since we are assuming that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, $r$ satisfies $1 \le r < k$. Moreover, by the definition of $r$, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is linearly dependent. Thus, there are scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

\[
c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{5}
\]

Multiplying both sides of (5) by $A$ and using the fact that

\[
A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}
\]

we obtain

\[
c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{6}
\]

If we now multiply both sides of (5) by $\lambda_{r+1}$ and subtract the resulting equation from (6) we obtain

\[
c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}
\]

Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set, this equation implies that

\[
c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0
\]

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

\[
c_1 = c_2 = \cdots = c_r = 0 \tag{7}
\]

Substituting these values in (5) yields

\[
c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}
\]

Since the eigenvector $\mathbf{v}_{r+1}$ is nonzero, it follows that

\[
c_{r+1} = 0 \tag{8}
\]

But equations (7) and (8) contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the proof is complete.

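Exercise Set 5.2

In Exercises 1–4, show that $A$ and $B$ are not similar matrices.

1. $A = \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 3 & -2 \end{bmatrix}$

2. $A = \begin{bmatrix} 4 & -1 \\ 2 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 4 & 1 \\ 2 & 4 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4. $A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \\ 3 & 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

In Exercises 5–8, find a matrix $P$ that diagonalizes $A$, and check your work by computing $P^{-1}AP$.

5. $A = \begin{bmatrix} 1 & 0 \\ 6 & -1 \end{bmatrix}$

6. $A = \begin{bmatrix} -14 & 12 \\ -20 & 17 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

8. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

9. Let
\[
A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}
\]
(a) Find the eigenvalues of $A$.
(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.
(c) Is $A$ diagonalizable? Justify your conclusion.

10. Follow the directions in Exercise 9 for the matrix
\[
\begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}
\]

In Exercises 11–14, find the geometric and algebraic multiplicity of each eigenvalue of the matrix $A$, and determine whether $A$ is diagonalizable. If $A$ is diagonalizable, then find a matrix $P$ that diagonalizes $A$, and find $P^{-1}AP$.

11. $A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}$

12. $A = \begin{bmatrix} 19 & -9 & -6 \\ 25 & -11 & -9 \\ 17 & -9 & -4 \end{bmatrix}$

13. $A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix}$

14. $A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \end{bmatrix}$

In each part of Exercises 15–16, the characteristic equation of a matrix $A$ is given. Find the size of the matrix and the possible dimensions of its eigenspaces.

15. (a) $(\lambda - 1)(\lambda + 3)(\lambda - 5) = 0$
(b) $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$

16. (a) $\lambda^3(\lambda^2 - 5\lambda - 6) = 0$
(b) $\lambda^3 - 3\lambda^2 + 3\lambda - 1 = 0$

In Exercises 17–18, use the method of Example 6 to compute the matrix $A^{10}$.

17. $A = \begin{bmatrix} 0 & 3 \\ 2 & -1 \end{bmatrix}$

18. $A = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$

19. Let
\[
A = \begin{bmatrix} -1 & 7 & -1 \\ 0 & 1 & 0 \\ 0 & 15 & -2 \end{bmatrix}
\quad\text{and}\quad
P = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 5 \end{bmatrix}
\]
Confirm that $P$ diagonalizes $A$, and then compute $A^{11}$.

20. Let
\[
A = \begin{bmatrix} 1 & -2 & 8 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
\quad\text{and}\quad
P = \begin{bmatrix} 1 & -4 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]
Confirm that $P$ diagonalizes $A$, and then compute each of the following powers of $A$.
(a) $A^{1000}$
(b) $A^{-1000}$
(c) $A^{2301}$
(d) $A^{-2301}$

21. Find $A^n$ if $n$ is a positive integer and
\[
A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}
\]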


22. Show that the matrices
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
are similar.

23. We know from Table 1 that similar matrices have the same rank. Show that the converse is false by showing that the matrices
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\]
have the same rank but are not similar. [Suggestion: If they were similar, then there would be an invertible $2 \times 2$ matrix $P$ for which $AP = PB$. Show that there is no such matrix.]

24. We know from Table 1 that similar matrices have the same eigenvalues. Use the method of Exercise 23 to show that the converse is false by showing that the matrices
\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
have the same eigenvalues but are not similar.

25. If $A$, $B$, and $C$ are $n \times n$ matrices such that $A$ is similar to $B$ and $B$ is similar to $C$, do you think that $A$ must be similar to $C$? Justify your answer.

26. (a) Is it possible for an $n \times n$ matrix to be similar to itself? Justify your answer.
(b) What can you say about an $n \times n$ matrix that is similar to $0_{n \times n}$? Justify your answer.
(c) Is it possible for a nonsingular matrix to be similar to a singular matrix? Justify your answer.

27. Suppose that the characteristic polynomial of some matrix $A$ is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.
(a) What can you say about the dimensions of the eigenspaces of $A$?
(b) What can you say about the dimensions of the eigenspaces if you know that $A$ is diagonalizable?
(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of $A$, all of which correspond to the same eigenvalue of $A$, what can you say about that eigenvalue?

28. Let
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
Show that
(a) $A$ is diagonalizable if $(a - d)^2 + 4bc > 0$.
(b) $A$ is not diagonalizable if $(a - d)^2 + 4bc < 0$.
[Hint: See Exercise 29 of Section 5.1.]

29. In the case where the matrix $A$ in Exercise 28 is diagonalizable, find a matrix $P$ that diagonalizes $A$. [Hint: See Exercise 30 of Section 5.1.]

In Exercises 30–33, find the standard matrix $A$ for the given linear operator, and determine whether that matrix is diagonalizable. If diagonalizable, find a matrix $P$ that diagonalizes $A$.

30. $T(x_1, x_2) = (2x_1 - x_2,\; x_1 + x_2)$

31. $T(x_1, x_2) = (-x_2,\; -x_1)$

32. $T(x_1, x_2, x_3) = (8x_1 + 3x_2 - 4x_3,\; -3x_1 + x_2 + 3x_3,\; 4x_1 + 3x_2)$

33. $T(x_1, x_2, x_3) = (3x_1,\; x_2,\; x_1 - x_2)$

34. If $P$ is a fixed $n \times n$ matrix, then the similarity transformation
\[
A \to P^{-1}AP
\]
can be viewed as an operator $S_P(A) = P^{-1}AP$ on the vector space $M_{nn}$ of $n \times n$ matrices.
(a) Show that $S_P$ is a linear operator.
(b) Find the kernel of $S_P$.
(c) Find the rank of $S_P$.

Working with Proofs

35. Prove that similar matrices have the same rank and nullity.

36. Prove that similar matrices have the same trace.

37. Prove that if $A$ is diagonalizable, then so is $A^k$ for every positive integer $k$.

38. We know from Table 1 that similar matrices, $A$ and $B$, have the same eigenvalues. However, it is not true that those eigenvalues have the same corresponding eigenvectors for the two matrices. Prove that if $B = P^{-1}AP$, and $\mathbf{v}$ is an eigenvector of $B$ corresponding to the eigenvalue $\lambda$, then $P\mathbf{v}$ is the eigenvector of $A$ corresponding to $\lambda$.

39. Let $A$ be an $n \times n$ matrix, and let $q(A)$ be the matrix
\[
q(A) = a_nA^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I_n
\]
(a) Prove that if $B = P^{-1}AP$, then $q(B) = P^{-1}q(A)P$.
(b) Prove that if $A$ is diagonalizable, then so is $q(A)$.

40. Prove that if $A$ is a diagonalizable matrix, then the rank of $A$ is the number of nonzero eigenvalues of $A$.

41. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an $n \times n$ matrix $A$ is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an eigenvalue with geometric multiplicity $k$.
(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first $k$ vectors of $B$ form a basis for the eigenspace corresponding to $\lambda_0$.
(b) Let $P$ be the matrix having the vectors in $B$ as columns. Prove that the product $AP$ can be expressed as
\[
AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}
\]
[Hint: Compare the first $k$ column vectors on both sides.]
(c) Use the result in part (b) to prove that $A$ is similar to
\[
C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}
\]
and hence that $A$ and $C$ have the same characteristic polynomial.
(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of $C$ (and hence $A$) contains the factor $(\lambda - \lambda_0)$ at least $k$ times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal to the geometric multiplicity $k$.

True-False Exercises

TF. In parts (a)–(i) determine whether the statement is true or false, and justify your answer.
(a) An $n \times n$ matrix with fewer than $n$ distinct eigenvalues is not diagonalizable.
(b) An $n \times n$ matrix with fewer than $n$ linearly independent eigenvectors is not diagonalizable.
(c) If $A$ and $B$ are similar $n \times n$ matrices, then there exists an invertible $n \times n$ matrix $P$ such that $PA = BP$.
(d) If $A$ is diagonalizable, then there is a unique matrix $P$ such that $P^{-1}AP$ is diagonal.
(e) If $A$ is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.
(f) If $A$ is diagonalizable, then $A^T$ is diagonalizable.
(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix $A$, then $A$ is diagonalizable.
(h) If every eigenvalue of a matrix $A$ has algebraic multiplicity 1, then $A$ is diagonalizable.
(i) If 0 is an eigenvalue of a matrix $A$, then $A^2$ is singular.

Working with Technology

T1. Generate a random $4 \times 4$ matrix $A$ and an invertible $4 \times 4$ matrix $P$ and then confirm, as stated in Table 1, that $P^{-1}AP$ and $A$ have the same
(a) determinant.
(b) rank.
(c) nullity.
(d) trace.
(e) characteristic polynomial.
(f) eigenvalues.

T2. (a) Use Theorem 5.2.1 to show that the following matrix is diagonalizable.
\[
A = \begin{bmatrix} -13 & -60 & -60 \\ 10 & 42 & 40 \\ -5 & -20 & -18 \end{bmatrix}
\]
(b) Find a matrix $P$ that diagonalizes $A$.
(c) Use the method of Example 6 to compute $A^{10}$, and check your result by computing $A^{10}$ directly.

T3. Use Theorem 5.2.1 to show that the following matrix is not diagonalizable.
\[
A = \begin{bmatrix} -10 & 11 & -6 \\ -15 & 16 & -10 \\ -3 & 3 & -2 \end{bmatrix}
\]

5.3 Complex Vector Spaces

Because the characteristic equation of any square matrix can have complex solutions, the notions of complex eigenvalues and eigenvectors arise naturally, even within the context of matrices with real entries. In this section we will discuss this idea and apply our results to study symmetric matrices in more detail. A review of the essentials of complex numbers appears in the back of this text.

Review of Complex Numbers

Recall that if $z = a + bi$ is a complex number, then:
• $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$ are called the real part of $z$ and the imaginary part of $z$, respectively,
• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of $z$,
• $\bar{z} = a - bi$ is called the complex conjugate of $z$,
• $z\bar{z} = a^2 + b^2 = |z|^2$,
• the angle $\varphi$ in Figure 5.3.1 is called an argument of $z$,
• $\operatorname{Re}(z) = |z|\cos\varphi$,
• $\operatorname{Im}(z) = |z|\sin\varphi$,
• $z = |z|(\cos\varphi + i\sin\varphi)$ is called the polar form of $z$.

[Figure 5.3.1: the point $z = a + bi$ in the complex plane, showing $\operatorname{Re}(z) = a$, $\operatorname{Im}(z) = b$, the modulus $|z|$, and the argument $\varphi$.]

Complex Eigenvalues

In Formula (3) of Section 5.1 we observed that the characteristic equation of a general $n \times n$ matrix $A$ has the form

\[
\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{1}
\]

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of (1) are real numbers. However, it is possible for the characteristic equation of a matrix $A$ with real entries to have imaginary solutions; for example, the characteristic equation of the matrix

\[
A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\]

is

\[
\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0
\]

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.

Vectors in $C^n$

A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space $R^n$.

DEFINITION 1 If $n$ is a positive integer, then a complex $n$-tuple is a sequence of $n$ complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all complex $n$-tuples is called complex $n$-space and is denoted by $C^n$. Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.

The terminology used for $n$-tuples of real numbers applies to complex $n$-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are complex numbers, then we call $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of vectors in $C^3$ are

\[
\mathbf{u} = (1 + i,\; -4i,\; 3 + 2i), \quad \mathbf{v} = (0,\; i,\; 5), \quad \mathbf{w} = \left(6 - \sqrt{2}\,i,\; 9 + \tfrac{1}{2}i,\; \pi i\right)
\]

Every vector

\[
\mathbf{v} = (v_1, v_2, \ldots, v_n) = (a_1 + b_1i,\; a_2 + b_2i,\; \ldots,\; a_n + b_ni)
\]

in $C^n$ can be split into real and imaginary parts as

\[
\mathbf{v} = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n)
\]

which we also denote as

\[
\mathbf{v} = \operatorname{Re}(\mathbf{v}) + i\operatorname{Im}(\mathbf{v})
\]

where

\[
\operatorname{Re}(\mathbf{v}) = (a_1, a_2, \ldots, a_n) \quad\text{and}\quad \operatorname{Im}(\mathbf{v}) = (b_1, b_2, \ldots, b_n)
\]

The vector

\[
\bar{\mathbf{v}} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n) = (a_1 - b_1i,\; a_2 - b_2i,\; \ldots,\; a_n - b_ni)
\]

is called the complex conjugate of $\mathbf{v}$ and can be expressed in terms of $\operatorname{Re}(\mathbf{v})$ and $\operatorname{Im}(\mathbf{v})$ as

\[
\bar{\mathbf{v}} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \operatorname{Re}(\mathbf{v}) - i\operatorname{Im}(\mathbf{v}) \tag{2}
\]

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero; or stated another way, a vector $\mathbf{v}$ in $C^n$ is in $R^n$ if and only if $\bar{\mathbf{v}} = \mathbf{v}$.

In this section we will need to distinguish between matrices whose entries must be real numbers, called real matrices, and matrices whose entries may be either real numbers or complex numbers, called complex matrices. When convenient, you can think of a real matrix as a complex matrix each of whose entries has a zero imaginary part. The standard operations on real matrices carry over without change to complex matrices, and all of the familiar properties of matrices continue to hold.

If $A$ is a complex matrix, then $\operatorname{Re}(A)$ and $\operatorname{Im}(A)$ are the matrices formed from the real and imaginary parts of the entries of $A$, and $\bar{A}$ is the matrix formed by taking the complex conjugate of each entry in $A$.

As you might expect, if $A$ is a complex matrix, then $A$ and $\bar{A}$ can be expressed in terms of $\operatorname{Re}(A)$ and $\operatorname{Im}(A)$ as

\[
A = \operatorname{Re}(A) + i\operatorname{Im}(A), \qquad \bar{A} = \operatorname{Re}(A) - i\operatorname{Im}(A)
\]

EXAMPLE 1 Real and Imaginary Parts of Vectors and Matrices

Let

\[
\mathbf{v} = (3 + i,\; -2i,\; 5) \quad\text{and}\quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix}
\]

Then

\[
\bar{\mathbf{v}} = (3 - i,\; 2i,\; 5), \quad \operatorname{Re}(\mathbf{v}) = (3, 0, 5), \quad \operatorname{Im}(\mathbf{v}) = (1, -2, 0)
\]

\[
\bar{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad
\operatorname{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad
\operatorname{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix}
\]

\[
\det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i
\]

Algebraic Properties of the Complex Conjugate

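The next two theorems list some properties of complex vectors and matrices that we will need in this section. Some of the proofs are given as exercises.

THEOREM 5.3.1 If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$, and if $k$ is a scalar, then:
(a) $\bar{\bar{\mathbf{u}}} = \mathbf{u}$
(b) $\overline{k\mathbf{u}} = \bar{k}\,\bar{\mathbf{u}}$
(c) $\overline{\mathbf{u} + \mathbf{v}} = \bar{\mathbf{u}} + \bar{\mathbf{v}}$
(d) $\overline{\mathbf{u} - \mathbf{v}} = \bar{\mathbf{u}} - \bar{\mathbf{v}}$

THEOREM 5.3.2 If $A$ is an $m \times k$ complex matrix and $B$ is a $k \times n$ complex matrix, then:
(a) $\bar{\bar{A}} = A$
(b) $\overline{(A^T)} = (\bar{A})^T$
(c) $\overline{AB} = \bar{A}\,\bar{B}$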


The Complex Euclidean Inner Product

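The following definition extends the notions of dot product and norm to $C^n$.

DEFINITION 2 If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of $\mathbf{u}$ and $\mathbf{v}$ (also called the complex dot product) is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

\[
\mathbf{u} \cdot \mathbf{v} = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n \tag{3}
\]

We also define the Euclidean norm on $C^n$ to be

\[
\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \tag{4}
\]

The complex conjugates in (3) ensure that $\|\mathbf{v}\|$ is a real number, for without them the quantity $\mathbf{v} \cdot \mathbf{v}$ in (4) might be imaginary.

As in the real case, we call $\mathbf{v}$ a unit vector in $C^n$ if $\|\mathbf{v}\| = 1$, and we say two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if $\mathbf{u} \cdot \mathbf{v} = 0$.

EXAMPLE 2 Complex Euclidean Inner Product and Norm

Find $\mathbf{u} \cdot \mathbf{v}$, $\mathbf{v} \cdot \mathbf{u}$, $\|\mathbf{u}\|$, and $\|\mathbf{v}\|$ for the vectors

\[
\mathbf{u} = (1 + i,\; i,\; 3 - i) \quad\text{and}\quad \mathbf{v} = (1 + i,\; 2,\; 4i)
\]

Solution

\[
\mathbf{u} \cdot \mathbf{v} = (1 + i)\overline{(1 + i)} + i(\bar{2}) + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i
\]

\[
\mathbf{v} \cdot \mathbf{u} = (1 + i)\overline{(1 + i)} + 2(\bar{i}) + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i
\]

\[
\|\mathbf{u}\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13}
\]

\[
\|\mathbf{v}\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22}
\]

Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are column vectors in $R^n$, then their dot product can be expressed as

\[
\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}
\]

The analogous formulas in $C^n$ are (verify)

\[
\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\bar{\mathbf{v}} = \bar{\mathbf{v}}^T\mathbf{u} \tag{5}
\]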
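The computations in Example 2 can be checked with Python's built-in complex type. This is a small sketch; the function names `complex_dot` and `complex_norm` are illustrative and simply transcribe Formulas (3) and (4).

```python
import math

def complex_dot(u, v):
    """Complex Euclidean inner product of Formula (3): sum of u_j * conj(v_j)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def complex_norm(v):
    """Euclidean norm of Formula (4); v . v is real, so only its real part is used."""
    return math.sqrt(complex_dot(v, v).real)

# Vectors from Example 2.
u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2, 4j]

print(complex_dot(u, v))   # u . v
print(complex_dot(v, u))   # v . u, the conjugate of u . v
print(complex_norm(u) ** 2, complex_norm(v) ** 2)
```

Note that the conjugate is applied to the second argument, following the convention of Formula (3); some software libraries conjugate the first argument instead, so results may differ by a conjugate.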

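Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product on $R^n$ we always have $\mathbf{v} \cdot \mathbf{u} = \mathbf{u} \cdot \mathbf{v}$ (the symmetry property), but for the complex dot product the corresponding relationship is given by $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.

THEOREM 5.3.3 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in $C^n$, and if $k$ is a scalar, then the complex Euclidean inner product has the following properties:
(a) $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$ [Antisymmetry property]
(b) $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$ [Distributive property]
(c) $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$ [Homogeneity property]
(d) $\mathbf{u} \cdot k\mathbf{v} = \bar{k}(\mathbf{u} \cdot \mathbf{v})$ [Antihomogeneity property]
(e) $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if $\mathbf{v} = \mathbf{0}$. [Positivity property]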


Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and leave the others as exercises.

Proof (d)

\[
k(\mathbf{u} \cdot \mathbf{v}) = k(\overline{\mathbf{v} \cdot \mathbf{u}}) = \overline{\bar{k}}\,(\overline{\mathbf{v} \cdot \mathbf{u}}) = \overline{\bar{k}(\mathbf{v} \cdot \mathbf{u})} = \overline{(\bar{k}\mathbf{v}) \cdot \mathbf{u}} = \mathbf{u} \cdot (\bar{k}\mathbf{v})
\]

To complete the proof, substitute $\bar{k}$ for $k$ and use the fact that $\bar{\bar{k}} = k$.

Vector Concepts in $C^n$

Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to $C^n$.

Is $R^n$ a subspace of $C^n$? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If $A$ is an $n \times n$ matrix with complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of $A$. As in the real case, $\lambda$ is a complex eigenvalue of $A$ if and only if there exists a nonzero vector $\mathbf{x}$ in $C^n$ such that $A\mathbf{x} = \lambda\mathbf{x}$. Each such $\mathbf{x}$ is called a complex eigenvector of $A$ corresponding to $\lambda$. The complex eigenvectors of $A$ corresponding to $\lambda$ are the nonzero solutions of the linear system $(\lambda I - A)\mathbf{x} = \mathbf{0}$, and the set of all such solutions is a subspace of $C^n$, called the complex eigenspace of $A$ corresponding to $\lambda$.

The following theorem states that if a real matrix has complex eigenvalues, then those eigenvalues and their corresponding eigenvectors occur in conjugate pairs.

THEOREM 5.3.4 If $\lambda$ is an eigenvalue of a real $n \times n$ matrix $A$, and if $\mathbf{x}$ is a corresponding eigenvector, then $\bar{\lambda}$ is also an eigenvalue of $A$, and $\bar{\mathbf{x}}$ is a corresponding eigenvector.

Proof Since $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, we have

\[
\overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}} \tag{6}
\]

However, $\bar{A} = A$, since $A$ has real entries, so it follows from part (c) of Theorem 5.3.2 that

\[
\overline{A\mathbf{x}} = \bar{A}\bar{\mathbf{x}} = A\bar{\mathbf{x}} \tag{7}
\]

Equations (6) and (7) together imply that

\[
A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \bar{\lambda}\bar{\mathbf{x}}
\]

in which $\bar{\mathbf{x}} \ne \mathbf{0}$ (why?); this tells us that $\bar{\lambda}$ is an eigenvalue of $A$ and $\bar{\mathbf{x}}$ is a corresponding eigenvector.

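EXAMPLE 3 Complex Eigenvalues and Eigenvectors

Find the eigenvalues and bases for the eigenspaces of

\[
A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\]

Solution The characteristic polynomial of $A$ is

\[
\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i)
\]

so the eigenvalues of $A$ are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as guaranteed by Theorem 5.3.4. To find the eigenvectors we must solve the system

\[
\begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

\[
\begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{8}
\]

We could solve this system by reducing the augmented matrix

\[
\left[\begin{array}{cc|c} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{array}\right] \tag{9}
\]

to reduced row echelon form by Gauss–Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of (9) must have a row of zeros because (8) has nontrivial solutions. This being the case, each row of (9) must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by $-\tfrac{1}{5}$ to obtain the reduced row echelon form

\[
\left[\begin{array}{cc|c} 1 & \tfrac{2}{5} - \tfrac{1}{5}i & 0 \\ 0 & 0 & 0 \end{array}\right]
\]

Thus, a general solution of the system is

\[
x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \quad x_2 = t
\]

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar multiples of the basis vector

\[
\mathbf{x} = \begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{10}
\]

As a check, let us confirm that $A\mathbf{x} = i\mathbf{x}$. We obtain

\[
A\mathbf{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} -\tfrac{2}{5} + \tfrac{1}{5}i \\ 1 \end{bmatrix}
= \begin{bmatrix} -2\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} + \tfrac{1}{5}i\right) + 2 \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{5} - \tfrac{2}{5}i \\ i \end{bmatrix} = i\mathbf{x}
\]

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary since Theorem 5.3.4 implies that

\[
\bar{\mathbf{x}} = \begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix} \tag{11}
\]

must be a basis for this eigenspace. The following computations confirm that $\bar{\mathbf{x}}$ is an eigenvector of $A$ corresponding to $\lambda = -i$:

\[
A\bar{\mathbf{x}} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} -\tfrac{2}{5} - \tfrac{1}{5}i \\ 1 \end{bmatrix}
= \begin{bmatrix} -2\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) - 1 \\ 5\left(-\tfrac{2}{5} - \tfrac{1}{5}i\right) + 2 \end{bmatrix}
= \begin{bmatrix} -\tfrac{1}{5} + \tfrac{2}{5}i \\ -i \end{bmatrix} = -i\bar{\mathbf{x}}
\]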
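The two checks in Example 3 can also be carried out numerically. The sketch below uses Python's built-in complex numbers; `is_eigenpair` is an illustrative helper, and the tolerance is an assumption made because the entries $-\tfrac{2}{5} \pm \tfrac{1}{5}i$ are not exactly representable in floating point.

```python
def mat_vec(A, x):
    """Multiply the matrix A by the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x, tol=1e-12):
    """Return True if A x equals lam x up to the tolerance tol."""
    Ax = mat_vec(A, x)
    return all(abs(lhs - lam * xi) < tol for lhs, xi in zip(Ax, x))

# Matrix and eigenvector (10) from Example 3.
A = [[-2, -1], [5, 2]]
x = [complex(-2, 1) / 5, 1]

# Theorem 5.3.4: the conjugate vector pairs with the conjugate eigenvalue.
x_bar = [complex(z).conjugate() for z in x]

print(is_eigenpair(A, 1j, x))       # A x = i x
print(is_eigenpair(A, -1j, x_bar))  # A x_bar = -i x_bar
```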

Since a number of our subsequent examples will involve $2 \times 2$ matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

is

\[
\det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)
\]

We can express this in terms of the trace and determinant of $A$ as

\[
\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) \tag{12}
\]

from which it follows that the characteristic equation of $A$ is

\[
\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \tag{13}
\]

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant $b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$ [Two distinct real roots]
$b^2 - 4ac = 0$ [One repeated real root]
$b^2 - 4ac < 0$ [Two conjugate imaginary roots]

Applying this to (13) with $a = 1$, $b = -\operatorname{tr}(A)$, and $c = \det(A)$ yields the following theorem.

THEOREM 5.3.5 If $A$ is a $2 \times 2$ matrix with real entries, then the characteristic equation of $A$ is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and
(a) $A$ has two distinct real eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) > 0$;
(b) $A$ has one repeated real eigenvalue if $\operatorname{tr}(A)^2 - 4\det(A) = 0$;
(c) $A$ has two complex conjugate eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) < 0$.

EXAMPLE 4 Eigenvalues of a 2 × 2 Matrix

In each part, use Formula (13) for the characteristic equation to find the eigenvalues of

(a) $A = \begin{bmatrix} 2 & 2 \\ -1 & 5 \end{bmatrix}$
(b) $A = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix}$
(c) $A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}$

Solution (a) We have $\operatorname{tr}(A) = 7$ and $\det(A) = 12$, so the characteristic equation of $A$ is

\[
\lambda^2 - 7\lambda + 12 = 0
\]

Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of $A$ are $\lambda = 4$ and $\lambda = 3$.

Solution (b) We have $\operatorname{tr}(A) = 2$ and $\det(A) = 1$, so the characteristic equation of $A$ is

\[
\lambda^2 - 2\lambda + 1 = 0
\]

Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of $A$; it has algebraic multiplicity 2.

Solution (c) We have $\operatorname{tr}(A) = 4$ and $\det(A) = 13$, so the characteristic equation of $A$ is

\[
\lambda^2 - 4\lambda + 13 = 0
\]

Solving this equation by the quadratic formula yields

\[
\lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i
\]

Thus, the eigenvalues of $A$ are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$.

Historical Note Olga Taussky-Todd (1906–1995) was one of the pioneering women in matrix analysis and the first woman appointed to the faculty at the California Institute of Technology. She worked at the National Physical Laboratory in London during World War II, where she was assigned to study flutter in supersonic aircraft. While there, she realized that some results about the eigenvalues of a certain $6 \times 6$ complex matrix could be used to answer key questions about the flutter problem that would otherwise have required laborious calculation. After World War II Olga Taussky-Todd continued her work on matrix-related subjects and helped to draw many known but disparate results about matrices into the coherent subject that we now call matrix theory. [Image: Courtesy of the Archives, California Institute of Technology]
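The procedure of Theorem 5.3.5 and Example 4 translates directly into code. The following is a minimal sketch, with `eigenvalues_2x2` as a hypothetical helper name; it classifies a real $2 \times 2$ matrix by the sign of $\operatorname{tr}(A)^2 - 4\det(A)$ and then applies the quadratic formula to (13), using `cmath.sqrt` so that a negative discriminant produces complex roots.

```python
import cmath

def eigenvalues_2x2(A):
    """Classify and compute the eigenvalues of a real 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det          # discriminant of equation (13)
    if disc > 0:
        kind = "two distinct real eigenvalues"
    elif disc == 0:
        kind = "one repeated real eigenvalue"
    else:
        kind = "two complex conjugate eigenvalues"
    root = cmath.sqrt(disc)           # complex square root handles disc < 0
    return kind, ((tr + root) / 2, (tr - root) / 2)

# The three matrices of Example 4.
for A in ([[2, 2], [-1, 5]], [[0, -1], [1, 2]], [[2, 3], [-3, 2]]):
    print(eigenvalues_2x2(A))
```

The eigenvalues are always returned as complex numbers, with zero imaginary part in the real cases, which keeps the return type uniform across the three cases.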

Symmetric Matrices Have Real Eigenvalues

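Our next result, which is concerned with the eigenvalues of real symmetric matrices, is important in a wide variety of applications. The key to its proof is to think of a real symmetric matrix as a complex matrix whose entries have an imaginary part of zero.

THEOREM 5.3.6 If $A$ is a real symmetric matrix, then $A$ has real eigenvalues.

Proof Suppose that $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is complex and $\mathbf{x}$ is in $C^n$. Thus,

\[
A\mathbf{x} = \lambda\mathbf{x}
\]

where $\mathbf{x} \ne \mathbf{0}$. If we multiply both sides of this equation by $\bar{\mathbf{x}}^T$ and use the fact that

\[
\bar{\mathbf{x}}^TA\mathbf{x} = \bar{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\bar{\mathbf{x}}^T\mathbf{x}) = \lambda(\mathbf{x} \cdot \mathbf{x}) = \lambda\|\mathbf{x}\|^2
\]

then we obtain

\[
\lambda = \frac{\bar{\mathbf{x}}^TA\mathbf{x}}{\|\mathbf{x}\|^2}
\]

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

\[
\overline{\bar{\mathbf{x}}^TA\mathbf{x}} = \bar{\mathbf{x}}^TA\mathbf{x} \tag{14}
\]

But $A$ is symmetric and has real entries, so it follows from the second equality in (5) and properties of the conjugate that

\[
\overline{\bar{\mathbf{x}}^TA\mathbf{x}} = \bar{\bar{\mathbf{x}}}^T\,\overline{A\mathbf{x}} = \mathbf{x}^T\,\overline{A\mathbf{x}} = (\overline{A\mathbf{x}})^T\mathbf{x} = (\bar{A}\bar{\mathbf{x}})^T\mathbf{x} = (A\bar{\mathbf{x}})^T\mathbf{x} = \bar{\mathbf{x}}^TA^T\mathbf{x} = \bar{\mathbf{x}}^TA\mathbf{x}
\]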

A Geometric Interpretation of Complex Eigenvalues

The following theorem is the key to understanding the geometric significance of complex eigenvalues of real 2 × 2 matrices.

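THEOREM 5.3.7 The eigenvalues of the real matrix

\[ C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{15} \]

are $\lambda = a \pm bi$. If $a$ and $b$ are not both zero, then this matrix can be factored as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{16} \]

where $\phi$ is the angle from the positive $x$-axis to the ray that joins the origin to the point $(a, b)$ (Figure 5.3.2).

Figure 5.3.2 [The point $(a, b)$ in the $xy$-plane, at distance $|\lambda|$ from the origin along a ray making the angle $\phi$ with the positive $x$-axis.]

Geometrically, this theorem states that multiplication by a matrix of form (15) can be viewed as a rotation through the angle $\phi$ followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).

Figure 5.3.3 [A vector $\mathbf{x}$ is rotated through the angle $\phi$ and then scaled to give $C\mathbf{x}$.]

Proof The characteristic equation of $C$ is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of $C$ are $\lambda = a \pm bi$. Assuming that $a$ and $b$ are not both zero, let $\phi$ be the angle from the positive $x$-axis to the ray that joins the origin to the point $(a, b)$. The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

\[ a = |\lambda|\cos\phi \quad\text{and}\quad b = |\lambda|\sin\phi \]

It follows from this that the matrix in (15) can be written as

\[ \begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \dfrac{a}{|\lambda|} & -\dfrac{b}{|\lambda|} \\[2mm] \dfrac{b}{|\lambda|} & \dfrac{a}{|\lambda|} \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]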
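The factorization in Theorem 5.3.7 can be tried out numerically. The following Python sketch, offered only as an illustration, computes |λ| and φ from a and b and then multiplies the scaling and rotation factors back together. The helper names `scale_rotation_factors` and `rebuild` are hypothetical, and the choice of `math.atan2` (which returns an angle with −π < φ ≤ π) is an assumption about which argument of λ is wanted.

```python
import math

def scale_rotation_factors(a, b):
    """Return (|lambda|, phi) for C = [[a, -b], [b, a]], with a, b not both zero."""
    modulus = math.hypot(a, b)   # |lambda| = sqrt(a^2 + b^2)
    phi = math.atan2(b, a)       # angle from the positive x-axis to the point (a, b)
    return modulus, phi

def rebuild(modulus, phi):
    """Product of the scaling matrix |lambda| I and the rotation matrix R_phi."""
    return [[modulus * math.cos(phi), -modulus * math.sin(phi)],
            [modulus * math.sin(phi),  modulus * math.cos(phi)]]

# Sample matrix of form (15) with a = 1 and b = 1, that is, C = [[1, -1], [1, 1]].
modulus, phi = scale_rotation_factors(1.0, 1.0)
C = rebuild(modulus, phi)
print(modulus, phi, C)
```

Up to rounding, the rebuilt matrix should reproduce the entries a, −b, b, a of the matrix that was factored.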

The following theorem, whose proof is considered in the exercises, shows that every real 2 × 2 matrix with complex eigenvalues is similar to a matrix of form (15).

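THEOREM 5.3.8 Let $A$ be a real $2 \times 2$ matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \neq 0$). If $\mathbf{x}$ is an eigenvector of $A$ corresponding to $\lambda = a - bi$, then the matrix $P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})]$ is invertible and

\[ A = P \begin{bmatrix} a & -b \\ b & a \end{bmatrix} P^{-1} \tag{17} \]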

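E X A M P L E 5 A Matrix Factorization Using Complex Eigenvalues

Factor the matrix in Example 3 into form (17) using the eigenvalue $\lambda = -i$ and the corresponding eigenvector that was given in (11).

Solution For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in (11) that corresponds to $\lambda = -i$ by $\mathbf{x}$ (rather than $\overline{\mathbf{x}}$ as before). For this $\lambda$ and $\mathbf{x}$ we have

\[ a = 0, \quad b = 1, \quad \operatorname{Re}(\mathbf{x}) = \begin{bmatrix} -\tfrac{2}{5} \\ 1 \end{bmatrix}, \quad \operatorname{Im}(\mathbf{x}) = \begin{bmatrix} -\tfrac{1}{5} \\ 0 \end{bmatrix} \]

Thus,

\[ P = [\operatorname{Re}(\mathbf{x}) \;\; \operatorname{Im}(\mathbf{x})] = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \]

so $A$ can be factored in form (17) as

\[ \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} -\tfrac{2}{5} & -\tfrac{1}{5} \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix} \]

You may want to confirm this by multiplying out the right side.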
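One way to carry out the suggested confirmation is with exact rational arithmetic. The sketch below uses Python's `fractions` module to multiply out the three factors of Example 5; the helper name `matmul` is an assumption introduced for this illustration.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

P = [[F(-2, 5), F(-1, 5)], [F(1), F(0)]]   # columns are Re(x) and Im(x)
C = [[F(0), F(-1)], [F(1), F(0)]]          # form (15) with a = 0, b = 1
P_inv = [[F(0), F(1)], [F(-5), F(-2)]]

A = matmul(matmul(P, C), P_inv)
print(A)                   # entries of P C P^{-1} as exact fractions
print(matmul(P, P_inv))    # should be the identity matrix
```

Because the arithmetic is exact, the product can be compared entry by entry with the matrix on the left side of the factorization.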

A Geometric Interpretation of Theorem 5.3.8

To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of (16) by $S$ and $R_\phi$, respectively, and then use (16) to rewrite (17) as

\[ A = P S R_\phi P^{-1} = P \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} P^{-1} \tag{18} \]

If we now view P as the transition matrix from the basis B = {Re(x), Im(x)} to the standard basis, then (18) tells us that computing a product Ax0 can be broken down into a three-step process:

Interpreting Formula (18)
Step 1. Map $\mathbf{x}_0$ from standard coordinates into $B$-coordinates by forming the product $P^{-1}\mathbf{x}_0$.
Step 2. Rotate and scale the vector $P^{-1}\mathbf{x}_0$ by forming the product $S R_\phi P^{-1}\mathbf{x}_0$.
Step 3. Map the rotated and scaled vector back to standard coordinates to obtain $A\mathbf{x}_0 = P S R_\phi P^{-1}\mathbf{x}_0$.

Power Sequences

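There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if $A$ is the standard matrix for an operator on $R^n$ and $\mathbf{x}_0$ is some fixed vector in $R^n$, then one might be interested in the behavior of the power sequence

\[ \mathbf{x}_0,\ A\mathbf{x}_0,\ A^2\mathbf{x}_0,\ \ldots,\ A^k\mathbf{x}_0,\ \ldots \]

For example, if

\[ A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \quad\text{and}\quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

\[ \mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad A\mathbf{x}_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad A^2\mathbf{x}_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad A^3\mathbf{x}_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix} \]

With the help of MATLAB or a computer algebra system one can show that if the first 100 terms are plotted as ordered pairs $(x, y)$, then the points move along the elliptical path shown in Figure 5.3.4a.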
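The first few terms of this power sequence can be reproduced with a short program. The following Python sketch is one possible way to do so, using exact fractions so that no rounding enters; the function names `apply` and `power_sequence` are assumptions made for this illustration rather than anything prescribed by the text.

```python
from fractions import Fraction as F

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]

def apply(M, v):
    """Multiply a 2 x 2 matrix by a vector with two entries."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def power_sequence(M, x0, count):
    """Return the list [x0, M x0, M^2 x0, ...] containing `count` terms."""
    terms = [list(x0)]
    for _ in range(count - 1):
        terms.append(apply(M, terms[-1]))
    return terms

terms = power_sequence(A, [F(1), F(1)], 4)
for k, term in enumerate(terms):
    print(k, [float(entry) for entry in term])
```

Raising `count` to 100 and plotting the resulting pairs is one way to explore the elliptical path described above.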

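Figure 5.3.4 [Panels (a), (b), and (c). Labeled points include $\mathbf{x}_0 = (1, 1)$, $A\mathbf{x}_0$, $A^2\mathbf{x}_0$, $A^3\mathbf{x}_0$, $A^4\mathbf{x}_0$, $(1, \tfrac{1}{2})$, $(\tfrac{1}{2}, 1)$, and $(\tfrac{5}{4}, \tfrac{1}{2})$; panel (c) marks the angle $\phi$ and the steps (1), (2), and (3).]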

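To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of $A$. We leave it for you to show that the eigenvalues of $A$ are $\lambda = \tfrac{4}{5} \pm \tfrac{3}{5}i$ and that the corresponding eigenvectors are

\[ \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i:\ \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\ 1\right) \quad\text{and}\quad \lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i:\ \mathbf{v}_2 = \left(\tfrac{1}{2} - i,\ 1\right) \]

If we take $\lambda = \lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i$ and $\mathbf{x} = \mathbf{v}_1 = \left(\tfrac{1}{2} + i,\ 1\right)$ in (17) and use the fact that $|\lambda| = 1$, then we obtain the factorization

\[ \underbrace{\begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}}_{P} \underbrace{\begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix}}_{R_\phi} \underbrace{\begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix}}_{P^{-1}} \tag{19} \]

where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

\[ \tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4} \qquad \left(\phi = \tan^{-1}\tfrac{3}{4} \approx 36.9^\circ\right) \]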

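The matrix $P$ in (19) is the transition matrix from the basis

\[ B = \{\operatorname{Re}(\mathbf{x}), \operatorname{Im}(\mathbf{x})\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\} \]

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis $B$ (Figure 5.3.5).

Figure 5.3.5 [The basis vectors $\operatorname{Re}(\mathbf{x}) = (\tfrac{1}{2}, 1)$ and $\operatorname{Im}(\mathbf{x}) = (1, 0)$ drawn together with the standard basis vectors $(1, 0)$ and $(0, 1)$.]

Next, observe that if $n$ is a positive integer, then (19) implies that

\[ A^n\mathbf{x}_0 = (P R_\phi P^{-1})^n\mathbf{x}_0 = P R_\phi^n P^{-1}\mathbf{x}_0 \]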

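so the product $A^n\mathbf{x}_0$ can be computed by first mapping $\mathbf{x}_0$ into the point $P^{-1}\mathbf{x}_0$ in $B$-coordinates, then multiplying by $R_\phi^n$ to rotate this point about the origin through the angle $n\phi$, and then multiplying $R_\phi^n P^{-1}\mathbf{x}_0$ by $P$ to map the resulting point back to standard coordinates. We can now see what is happening geometrically: In $B$-coordinates each successive multiplication by $A$ causes the point $P^{-1}\mathbf{x}_0$ to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis $B$ is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by $A^n\mathbf{x}_0$ (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):

\[
\begin{aligned}
\begin{bmatrix} \tfrac{1}{2} & \tfrac{3}{4} \\ -\tfrac{3}{5} & \tfrac{11}{10} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
&= \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{4}{5} & -\tfrac{3}{5} \\ \tfrac{3}{5} & \tfrac{4}{5} \end{bmatrix} \begin{bmatrix} 1 \\ \tfrac{1}{2} \end{bmatrix} && [\mathbf{x}_0 \text{ is mapped to } B\text{-coordinates.}] \\
&= \begin{bmatrix} \tfrac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} && [\text{The point } (1, \tfrac{1}{2}) \text{ is rotated through the angle } \phi.] \\
&= \begin{bmatrix} \tfrac{5}{4} \\ \tfrac{1}{2} \end{bmatrix} && [\text{The point } (\tfrac{1}{2}, 1) \text{ is mapped to standard coordinates.}]
\end{aligned}
\]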
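The three-step description above can be compared numerically with direct repeated multiplication. The following Python sketch is a minimal illustration under the assumption that floating-point arithmetic is accurate enough for the comparison; the names `direct` and `via_rotation` are hypothetical, and the matrices are those of factorization (19).

```python
import math

A = [[0.5, 0.75], [-0.6, 1.1]]
P = [[0.5, 1.0], [1.0, 0.0]]       # columns are Re(x) and Im(x)
P_INV = [[0.0, 1.0], [1.0, -0.5]]
PHI = math.atan2(3, 4)             # cos(phi) = 4/5 and sin(phi) = 3/5

def apply(M, v):
    """Multiply a 2 x 2 matrix by a vector with two entries."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def direct(n, x0):
    """Compute A^n x0 by multiplying by A a total of n times."""
    v = list(x0)
    for _ in range(n):
        v = apply(A, v)
    return v

def via_rotation(n, x0):
    """Compute P R^n P^{-1} x0, where R^n is the rotation through n*phi."""
    c, s = math.cos(n * PHI), math.sin(n * PHI)
    rotated = apply([[c, -s], [s, c]], apply(P_INV, x0))
    return apply(P, rotated)

for n in (1, 2, 3, 100):
    print(n, direct(n, [1.0, 1.0]), via_rotation(n, [1.0, 1.0]))
```

The two columns of output should agree up to rounding, and for n = 1 both should be close to (1.25, 0.5), the point found in the hand computation.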

Exercise Set 5.3

In Exercises 1–2, find $\overline{\mathbf{u}}$, Re(u), Im(u), and $\|\mathbf{u}\|$.

1. u = (2 − i, 4i, 1 + i)

2. u = (6, 1 + 4i, 6 − 2i)

In Exercises 3–4, show that u, v, and k satisfy Theorem 5.3.1.

3. u = (3 − 4i, 2 + i, −6i), v = (1 + i, 2 − i, 4), k = i

4. u = (6, 1 + 4i, 6 − 2i), v = (4, 3 + 2i, i − 3), k = −i

5. Solve the equation i x − 3v = u for x, where u and v are the vectors in Exercise 3.

6. Solve the equation (1 + i)x + 2u = v for x, where u and v are the vectors in Exercise 4.

In Exercises 7–8, find $\overline{A}$, Re(A), Im(A), det(A), and tr(A).

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

8. $A = \begin{bmatrix} 4i & 2 - 3i \\ 2 + 3i & 1 \end{bmatrix}$

9. Let A be the matrix given in Exercise 7, and let B be the matrix

\[ B = \begin{bmatrix} 1 - i \\ 2i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.

10. Let A be the matrix given in Exercise 8, and let B be the matrix

\[ B = \begin{bmatrix} 5i \\ 1 - 4i \end{bmatrix} \]

Confirm that these matrices have the properties stated in Theorem 5.3.2.

In Exercises 11–12, compute u · v, u · w, and v · w, and show that the vectors satisfy Formula (5) and parts (a), (b), and (c) of Theorem 5.3.3.

11. u = (i, 2i, 3), v = (4, −2i, 1 + i), w = (2 − i, 2i, 5 + 3i), k = 2i

12. u = (1 + i, 4, 3i), v = (3, −4i, 2 + 3i), w = (1 − i, 4i, 4 − 5i), k = 1 + i

13. Compute (u · v) − w · u for the vectors u, v, and w in Exercise 11.

14. Compute (i u · w) + ($\|\mathbf{u}\|$ v) · u for the vectors u, v, and w in Exercise 12.

In Exercises 15–18, find the eigenvalues and bases for the eigenspaces of A.

15. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

16. $A = \begin{bmatrix} -1 & -5 \\ 4 & 7 \end{bmatrix}$

17. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$

18. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$

In Exercises 19–22, each matrix C has form (15). Theorem 5.3.7 implies that C is the product of a scaling matrix with factor |λ| and a rotation matrix with angle φ. Find |λ| and φ for which −π < φ ≤ π.

19. $C = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

20. $C = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$

21. $C = \begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix}$

22. $C = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -\sqrt{2} & \sqrt{2} \end{bmatrix}$

In Exercises 23–26, find an invertible matrix P and a matrix C of form (15) such that $A = PCP^{-1}$.

23. $A = \begin{bmatrix} -1 & -5 \\ 4 & 7 \end{bmatrix}$

24. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

25. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$

26. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$

27. Find all complex scalars k, if any, for which u and v are orthogonal in $C^3$.

(a) u = (2i, i, 3i), v = (i, 6i, k)
(b) u = (k, k, 1 + i), v = (1, −1, 1 − i)

28. Show that if A is a real n × n matrix and x is a column vector in $C^n$, then Re(Ax) = A(Re(x)) and Im(Ax) = A(Im(x)).

29. The matrices

\[ \sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used in quantum mechanics, are expressed in terms of the Pauli spin matrices and the 2 × 2 identity matrix $I_2$ as

\[ \beta = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}, \quad \alpha_x = \begin{bmatrix} 0 & \sigma_1 \\ \sigma_1 & 0 \end{bmatrix}, \quad \alpha_y = \begin{bmatrix} 0 & \sigma_2 \\ \sigma_2 & 0 \end{bmatrix}, \quad \alpha_z = \begin{bmatrix} 0 & \sigma_3 \\ \sigma_3 & 0 \end{bmatrix} \]

(a) Show that $\beta^2 = \alpha_x^2 = \alpha_y^2 = \alpha_z^2$.

(b) Matrices A and B for which AB = −BA are said to be anticommutative. Show that the Dirac matrices are anticommutative.

30. If k is a real scalar and v is a vector in $R^n$, then Theorem 3.2.1 states that $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$. Is this relationship also true if k is a complex scalar and v is a vector in $C^n$? Justify your answer.

Working with Proofs

31. Prove part (c) of Theorem 5.3.1.

32. Prove Theorem 5.3.2.

33. Prove that if u and v are vectors in $C^n$, then

\[ \mathbf{u}\cdot\mathbf{v} = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2 + \tfrac{i}{4}\|\mathbf{u} + i\mathbf{v}\|^2 - \tfrac{i}{4}\|\mathbf{u} - i\mathbf{v}\|^2 \]

34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

\[ R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \]

are λ = cos φ ± i sin φ. Prove that if x is an eigenvector corresponding to either eigenvalue, then Re(x) and Im(x) are orthogonal and have the same length. [Note: This implies that P = [Re(x) | Im(x)] is a real scalar multiple of an orthogonal matrix.]

35. The two parts of this exercise lead you through a proof of Theorem 5.3.8.

(a) For notational simplicity, let

\[ M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \]

and let u = Re(x) and v = Im(x), so P = [u | v]. Show that the relationship Ax = λx implies that

\[ A\mathbf{x} = (a\mathbf{u} + b\mathbf{v}) + i(-b\mathbf{u} + a\mathbf{v}) \]

and then equate real and imaginary parts in this equation to show that

\[ AP = [A\mathbf{u} \mid A\mathbf{v}] = [a\mathbf{u} + b\mathbf{v} \mid -b\mathbf{u} + a\mathbf{v}] = PM \]

(b) Show that P is invertible, thereby completing the proof, since the result in part (a) implies that $A = PMP^{-1}$. [Hint: If P is not invertible, then one of its column vectors is a real scalar multiple of the other, say v = cu. Substitute this into the equations Au = au + bv and Av = −bu + av obtained in part (a), and show that $(1 + c^2)b\mathbf{u} = \mathbf{0}$. Finally, show that this leads to a contradiction, thereby proving that P is invertible.]

36. In this problem you will prove the complex analog of the Cauchy–Schwarz inequality.

(a) Prove: If k is a complex number, and u and v are vectors in $C^n$, then

\[ (\mathbf{u} - k\mathbf{v})\cdot(\mathbf{u} - k\mathbf{v}) = \mathbf{u}\cdot\mathbf{u} - \overline{k}(\mathbf{u}\cdot\mathbf{v}) - k\,\overline{(\mathbf{u}\cdot\mathbf{v})} + k\overline{k}(\mathbf{v}\cdot\mathbf{v}) \]

(b) Use the result in part (a) to prove that

\[ 0 \le \mathbf{u}\cdot\mathbf{u} - \overline{k}(\mathbf{u}\cdot\mathbf{v}) - k\,\overline{(\mathbf{u}\cdot\mathbf{v})} + k\overline{k}(\mathbf{v}\cdot\mathbf{v}) \]

(c) Take k = (u · v)/(v · v) in part (b) to prove that

\[ |\mathbf{u}\cdot\mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \]

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

(a) There is a real 5 × 5 matrix with no real eigenvalues.

(b) The eigenvalues of a 2 × 2 complex matrix are the solutions of the equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.

(c) A 2 × 2 matrix A with real entries has two distinct eigenvalues if and only if $\operatorname{tr}(A)^2 \neq 4\det(A)$.

(d) If λ is a complex eigenvalue of a real matrix A with a corresponding complex eigenvector v, then $\overline{\lambda}$ is a complex eigenvalue of A and $\overline{\mathbf{v}}$ is a complex eigenvector of A corresponding to $\overline{\lambda}$.

(e) Every eigenvalue of a complex symmetric matrix is real.

(f) If a 2 × 2 real matrix A has complex eigenvalues and $\mathbf{x}_0$ is a vector in $R^2$, then the vectors $\mathbf{x}_0, A\mathbf{x}_0, A^2\mathbf{x}_0, \ldots, A^n\mathbf{x}_0, \ldots$ lie on an ellipse.


5.4 Differential Equations

Many laws of physics, chemistry, biology, engineering, and economics are described in terms of “differential equations,” that is, equations involving functions and their derivatives. In this section we will illustrate one way in which matrix diagonalization can be used to solve systems of differential equations. Calculus is a prerequisite for this section.

Terminology

Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives. The order of a differential equation is the order of the highest derivative it contains. The simplest differential equations are the first-order equations of the form

\[ y' = ay \tag{1} \]

where $y = f(x)$ is an unknown differentiable function to be determined, $y' = dy/dx$ is its derivative, and $a$ is a constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the form

\[ y = ce^{ax} \tag{2} \]

where $c$ is an arbitrary constant. That every function of this form is a solution of (1) follows from the computation

\[ y' = cae^{ax} = ay \]

and that these are the only solutions is shown in the exercises. Accordingly, we call (2) the general solution of (1). As an example, the general solution of the differential equation $y' = 5y$ is

\[ y = ce^{5x} \tag{3} \]

Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one particular solution from the general solution. For example, if we require that solution (3) of the equation $y' = 5y$ satisfy the added condition

\[ y(0) = 6 \tag{4} \]

(that is, $y = 6$ when $x = 0$), then on substituting these values in (3), we obtain $6 = ce^0 = c$, from which we conclude that

\[ y = 6e^{5x} \]

is the only solution of $y' = 5y$ that satisfies (4). A condition such as (4), which specifies the value of the general solution at a point, is called an initial condition, and the problem of solving a differential equation subject to an initial condition is called an initial-value problem.

First-Order Linear Systems

In this section we will be concerned with solving systems of differential equations of the form

\[
\begin{aligned}
y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\
y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\
&\ \ \vdots \\
y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n
\end{aligned}
\tag{5}
\]

where $y_1 = f_1(x),\ y_2 = f_2(x),\ \ldots,\ y_n = f_n(x)$ are functions to be determined, and the $a_{ij}$’s are constants. In matrix notation, (5) can be written as

\[ \begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]

or more briefly as

\[ \mathbf{y}' = A\mathbf{y} \tag{6} \]

where the notation $\mathbf{y}'$ denotes the vector obtained by differentiating each component of $\mathbf{y}$. We call (5) or its matrix form (6) a constant coefficient first-order homogeneous linear system. It is of first order because all derivatives are of that order, it is linear because differentiation and matrix multiplication are linear transformations, and it is homogeneous because $y_1 = y_2 = \cdots = y_n = 0$ is a solution regardless of the values of the coefficients. As expected, this is called the trivial solution. In this section we will work primarily with the matrix form. Here is an example.

E X A M P L E 1 Solution of a Linear System with Initial Conditions

(a) Write the following system in matrix form:

\[
\begin{aligned}
y_1' &= 3y_1 \\
y_2' &= -2y_2 \\
y_3' &= 5y_3
\end{aligned}
\tag{7}
\]

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 4$, and $y_3(0) = -2$.

Solution (a)

\[ \begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \tag{8} \]

or

\[ \mathbf{y}' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \mathbf{y} \tag{9} \]

Solution (b) Because each equation in (7) involves only one unknown function, we can solve the equations individually. It follows from (2) that these solutions are

\[ y_1 = c_1e^{3x}, \quad y_2 = c_2e^{-2x}, \quad y_3 = c_3e^{5x} \]

or, in matrix notation,

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} c_1e^{3x} \\ c_2e^{-2x} \\ c_3e^{5x} \end{bmatrix} \]

Solution (c) From the given initial conditions, we obtain

\[
\begin{aligned}
1 &= y_1(0) = c_1e^0 = c_1 \\
4 &= y_2(0) = c_2e^0 = c_2 \\
-2 &= y_3(0) = c_3e^0 = c_3
\end{aligned}
\tag{10}
\]

so the solution satisfying these conditions is

\[ y_1 = e^{3x}, \quad y_2 = 4e^{-2x}, \quad y_3 = -2e^{5x} \]

or, in matrix notation,

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix} \]

Solution by Diagonalization

What made the system in Example 1 easy to solve was the fact that each equation involved only one of the unknown functions, so its matrix formulation, $\mathbf{y}' = A\mathbf{y}$, had a diagonal coefficient matrix $A$ [Formula (9)]. A more complicated situation occurs when some or all of the equations in the system involve more than one of the unknown functions, for in this case the coefficient matrix is not diagonal. Let us now consider how we might solve such a system.

The basic idea for solving a system $\mathbf{y}' = A\mathbf{y}$ whose coefficient matrix $A$ is not diagonal is to introduce a new unknown vector $\mathbf{u}$ that is related to the unknown vector $\mathbf{y}$ by an equation of the form $\mathbf{y} = P\mathbf{u}$ in which $P$ is an invertible matrix that diagonalizes $A$. Of course, such a matrix may or may not exist, but if it does, then we can rewrite the equation $\mathbf{y}' = A\mathbf{y}$ as

\[ P\mathbf{u}' = A(P\mathbf{u}) \]

or alternatively as

\[ \mathbf{u}' = (P^{-1}AP)\mathbf{u} \]

Since $P$ is assumed to diagonalize $A$, this equation has the form

\[ \mathbf{u}' = D\mathbf{u} \]

where $D$ is diagonal. We can now solve this equation for $\mathbf{u}$ using the method of Example 1, and then obtain $\mathbf{y}$ by matrix multiplication using the relationship $\mathbf{y} = P\mathbf{u}$. In summary, we have the following procedure for solving a system $\mathbf{y}' = A\mathbf{y}$ in the case where $A$ is diagonalizable.

A Procedure for Solving $\mathbf{y}' = A\mathbf{y}$ If $A$ Is Diagonalizable
Step 1. Find a matrix $P$ that diagonalizes $A$.
Step 2. Make the substitutions $\mathbf{y} = P\mathbf{u}$ and $\mathbf{y}' = P\mathbf{u}'$ to obtain a new “diagonal system” $\mathbf{u}' = D\mathbf{u}$, where $D = P^{-1}AP$.
Step 3. Solve $\mathbf{u}' = D\mathbf{u}$.
Step 4. Determine $\mathbf{y}$ from the equation $\mathbf{y} = P\mathbf{u}$.
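The four-step procedure can be sketched in code for a small special case. The Python function below is only an illustration, written under the stated assumptions that A is 2 × 2, has two distinct real eigenvalues, and has a nonzero upper-right entry (so that (b, λ − a) is an eigenvector for each eigenvalue λ). The name `solve_2x2` and the way the initial condition is passed are choices made for this sketch, not part of the text.

```python
import math

def solve_2x2(A, y0):
    """Return a function y(x) solving y' = Ay with y(0) = y0.

    Assumes A = [[a, b], [c, d]] has two distinct real eigenvalues and b != 0.
    """
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc <= 0 or b == 0:
        raise ValueError("sketch requires distinct real eigenvalues and b != 0")
    root = math.sqrt(disc)
    lam1, lam2 = (tr + root) / 2, (tr - root) / 2

    # Step 1: the columns of P are eigenvectors p1 and p2 of A.
    p1, p2 = (b, lam1 - a), (b, lam2 - a)

    # Steps 2 and 3: u' = Du has solutions u_i = c_i * exp(lam_i * x), and
    # the constants come from u(0) = P^{-1} y0.
    det_p = p1[0] * p2[1] - p2[0] * p1[1]
    c1 = (p2[1] * y0[0] - p2[0] * y0[1]) / det_p
    c2 = (p1[0] * y0[1] - p1[1] * y0[0]) / det_p

    # Step 4: y = Pu.
    def y(x):
        u1, u2 = c1 * math.exp(lam1 * x), c2 * math.exp(lam2 * x)
        return (p1[0] * u1 + p2[0] * u2, p1[1] * u1 + p2[1] * u2)

    return y

y = solve_2x2([[1, 1], [4, -2]], (1, 6))
print(y(0.0), y(0.5))
```

The sample call uses a coefficient matrix with eigenvalues 2 and −3; any eigenvector scaling gives the same solution because the constants are fixed by the initial condition.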

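E X A M P L E 2 Solution Using Diagonalization

(a) Solve the system

\[
\begin{aligned}
y_1' &= y_1 + y_2 \\
y_2' &= 4y_1 - 2y_2
\end{aligned}
\]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 6$.

Solution (a) The coefficient matrix for the system is

\[ A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix} \]

As discussed in Section 5.2, $A$ will be diagonalized by any matrix $P$ whose columns are linearly independent eigenvectors of $A$. Since

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix} = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2) \]

the eigenvalues of $A$ are $\lambda = 2$ and $\lambda = -3$. By definition,

\[ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

is an eigenvector of $A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of

\[ \begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

If $\lambda = 2$, this system becomes

\[ \begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

Solving this system yields $x_1 = t$, $x_2 = t$, so

\[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

Thus,

\[ \mathbf{p}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = 2$. Similarly, you can show that

\[ \mathbf{p}_2 = \begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix} \]

is a basis for the eigenspace corresponding to $\lambda = -3$. Thus,

\[ P = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix} \]

diagonalizes $A$, and

\[ D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} \]

Thus, as noted in Step 2 of the procedure stated above, the substitution

\[ \mathbf{y} = P\mathbf{u} \quad\text{and}\quad \mathbf{y}' = P\mathbf{u}' \]

yields the “diagonal system”

\[ \mathbf{u}' = D\mathbf{u} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}\mathbf{u} \quad\text{or}\quad \begin{aligned} u_1' &= 2u_1 \\ u_2' &= -3u_2 \end{aligned} \]

From (2) the solution of this system is

\[ \begin{aligned} u_1 &= c_1e^{2x} \\ u_2 &= c_2e^{-3x} \end{aligned} \quad\text{or}\quad \mathbf{u} = \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix} \]

so the equation $\mathbf{y} = P\mathbf{u}$ yields, as the solution for $\mathbf{y}$,

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix} = \begin{bmatrix} c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ c_1e^{2x} + c_2e^{-3x} \end{bmatrix} \]

or

\[
\begin{aligned}
y_1 &= c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\
y_2 &= c_1e^{2x} + c_2e^{-3x}
\end{aligned}
\tag{11}
\]

Solution (b) If we substitute the given initial conditions in (11), we obtain

\[
\begin{aligned}
c_1 - \tfrac{1}{4}c_2 &= 1 \\
c_1 + c_2 &= 6
\end{aligned}
\]

Solving this system, we obtain $c_1 = 2$, $c_2 = 4$, so it follows from (11) that the solution satisfying the initial conditions is

\[
\begin{aligned}
y_1 &= 2e^{2x} - e^{-3x} \\
y_2 &= 2e^{2x} + 4e^{-3x}
\end{aligned}
\]

Remark Keep in mind that the method of Example 2 works because the coefficient matrix of the system is diagonalizable. In cases where this is not so, other methods are required. These are typically discussed in books devoted to differential equations.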
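The particular solution found in Example 2 can be checked numerically. The following Python sketch evaluates the solution, approximates its derivatives by central differences (an approximation chosen here for simplicity, with a hypothetical step size `h`), and compares them with the right sides of the system; the function names `y` and `residual` are assumptions made for this illustration.

```python
import math

def y(x):
    """Particular solution of Example 2: (y1(x), y2(x))."""
    return (2 * math.exp(2 * x) - math.exp(-3 * x),
            2 * math.exp(2 * x) + 4 * math.exp(-3 * x))

def residual(x, h=1e-6):
    """Approximate y1' - (y1 + y2) and y2' - (4*y1 - 2*y2) at the point x."""
    ahead, behind = y(x + h), y(x - h)
    d1 = (ahead[0] - behind[0]) / (2 * h)   # central-difference estimate of y1'
    d2 = (ahead[1] - behind[1]) / (2 * h)   # central-difference estimate of y2'
    y1, y2 = y(x)
    return (d1 - (y1 + y2), d2 - (4 * y1 - 2 * y2))

print(y(0.0))          # the initial conditions
print(residual(0.3))   # both entries should be close to zero
```

Small nonzero residuals are expected because the derivatives are only approximated.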

Exercise Set 5.4

1. (a) Solve the system

\[ \begin{aligned} y_1' &= y_1 + 4y_2 \\ y_2' &= 2y_1 + 3y_2 \end{aligned} \]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 0$, $y_2(0) = 0$.

2. (a) Solve the system

\[ \begin{aligned} y_1' &= y_1 + 3y_2 \\ y_2' &= 4y_1 + 5y_2 \end{aligned} \]

(b) Find the solution that satisfies the conditions $y_1(0) = 2$, $y_2(0) = 1$.

3. (a) Solve the system

\[ \begin{aligned} y_1' &= 4y_1 + y_3 \\ y_2' &= -2y_1 + y_2 \\ y_3' &= -2y_1 + y_3 \end{aligned} \]

(b) Find the solution that satisfies the initial conditions $y_1(0) = -1$, $y_2(0) = 1$, $y_3(0) = 0$.

4. Solve the system

\[ \begin{aligned} y_1' &= 4y_1 + 2y_2 + 2y_3 \\ y_2' &= 2y_1 + 4y_2 + 2y_3 \\ y_3' &= 2y_1 + 2y_2 + 4y_3 \end{aligned} \]

5. Show that every solution of $y' = ay$ has the form $y = ce^{ax}$. [Hint: Let $y = f(x)$ be a solution of the equation, and show that $f(x)e^{-ax}$ is constant.]

6. Show that if $A$ is diagonalizable and

\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \]

is a solution of the system $\mathbf{y}' = A\mathbf{y}$, then each $y_i$ is a linear combination of $e^{\lambda_1 x}, e^{\lambda_2 x}, \ldots, e^{\lambda_n x}$, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$.

7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by expressing it as a system and applying the methods of this section. For the differential equation $y'' - y' - 6y = 0$, show that the substitutions $y_1 = y$ and $y_2 = y'$ lead to the system

\[ \begin{aligned} y_1' &= y_2 \\ y_2' &= 6y_1 + y_2 \end{aligned} \]

Solve this system, and use the result to solve the original differential equation.

8. Use the procedure in Exercise 7 to solve $y'' + y' - 12y = 0$.

9. Explain how you might use the procedure in Exercise 7 to solve $y''' - 6y'' + 11y' - 6y = 0$. Use that procedure to solve the equation.

10. Solve the nondiagonalizable system

\[ \begin{aligned} y_1' &= y_1 + y_2 \\ y_2' &= y_2 \end{aligned} \]

[Hint: Solve the second equation for $y_2$, substitute in the first equation, and then multiply both sides of the resulting equation by $e^{-x}$.]

11. Consider a system of differential equations $\mathbf{y}' = A\mathbf{y}$, where $A$ is a 2 × 2 matrix. For what values of $a_{11}, a_{12}, a_{21}, a_{22}$ do the component solutions $y_1(t)$, $y_2(t)$ tend to zero as $t \to \infty$? In particular, what must be true about the determinant and the trace of $A$ for this to happen?

12. (a) By rewriting (11) in matrix form, show that the solution of the system in Example 2 can be expressed as

\[ \mathbf{y} = c_1e^{2x}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{-3x}\begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix} \]

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue $\lambda_1 = 2$, and the vector in the second term is an eigenvector corresponding to the eigenvalue $\lambda_2 = -3$. This is a special case of the following general result:

Theorem. If the coefficient matrix $A$ of the system $\mathbf{y}' = A\mathbf{y}$ is diagonalizable, then the general solution of the system can be expressed as

\[ \mathbf{y} = c_1e^{\lambda_1 x}\mathbf{x}_1 + c_2e^{\lambda_2 x}\mathbf{x}_2 + \cdots + c_ne^{\lambda_n x}\mathbf{x}_n \]

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$, and $\mathbf{x}_i$ is an eigenvector of $A$ corresponding to $\lambda_i$.

13. The electrical circuit in the accompanying figure is called a parallel LRC circuit; it contains a resistor with resistance R ohms (Ω), an inductor with inductance L henries (H), and a capacitor with capacitance C farads (F). It is shown in electrical circuit analysis that at time t the current $i_L$ through the inductor and the voltage $v_C$ across the capacitor are solutions of the system

\[ \begin{bmatrix} i_L'(t) \\ v_C'(t) \end{bmatrix} = \begin{bmatrix} 0 & 1/L \\ -1/C & -1/(RC) \end{bmatrix} \begin{bmatrix} i_L(t) \\ v_C(t) \end{bmatrix} \]

(a) Find the general solution of this system in the case where R = 1 ohm, L = 1 henry, and C = 0.5 farad.

(b) Find $i_L(t)$ and $v_C(t)$ subject to the initial conditions $i_L(0) = 2$ amperes and $v_C(0) = 1$ volt.

(c) What can you say about the current and voltage in part (b) over the “long term” (that is, as $t \to \infty$)?

Figure Ex-13 [Parallel LRC circuit with components C, R, and L.]

In Exercises 14–15, a mapping $L: C^\infty(-\infty, \infty) \to C^\infty(-\infty, \infty)$ is given. (a) Show that L is a linear operator. (b) Use the ideas in Exercises 7 and 9 to solve the differential equation L(y) = 0.

14. $L(y) = y'' + 2y' - 3y$

15. $L(y) = y''' - 2y'' - y' + 2y$

Working with Proofs

16. Prove the theorem in Exercise 12 by tracing through the four-step procedure preceding Example 2 with

\[ D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \quad\text{and}\quad P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n] \]

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) Every system of differential equations $\mathbf{y}' = A\mathbf{y}$ has a solution.

(b) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $\mathbf{x} = \mathbf{y}$.

(c) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $(c\mathbf{x} + d\mathbf{y})' = A(c\mathbf{x} + d\mathbf{y})$ for all scalars c and d.

(d) If $A$ is a square matrix with distinct real eigenvalues, then it is possible to solve $\mathbf{x}' = A\mathbf{x}$ by diagonalization.

(e) If $A$ and $P$ are similar matrices, then $\mathbf{y}' = A\mathbf{y}$ and $\mathbf{u}' = P\mathbf{u}$ have the same solutions.

Working with Technology

T1. (a) Find the general solution of the following system by computing appropriate eigenvalues and eigenvectors.

\[ \begin{aligned} y_1' &= 3y_1 + 2y_2 + 2y_3 \\ y_2' &= y_1 + 4y_2 + y_3 \\ y_3' &= -2y_1 - 4y_2 - y_3 \end{aligned} \]

(b) Find the solution that satisfies the initial conditions $y_1(0) = 0$, $y_2(0) = 1$, $y_3(0) = -3$. [Technology not required.]

T2. It is shown in electrical circuit theory that for the LRC circuit in Figure Ex-13 the current I in amperes (A) through the inductor and the voltage drop V in volts (V) across the capacitor satisfy the system of differential equations

\[ \begin{aligned} \frac{dI}{dt} &= \frac{V}{L} \\ \frac{dV}{dt} &= -\frac{I}{C} - \frac{V}{RC} \end{aligned} \]

where the derivatives are with respect to the time t. Find I and V as functions of t if L = 0.5 H, C = 0.2 F, R = 2 Ω, and the initial values of V and I are V(0) = 1 V and I(0) = 2 A.

5.5 Dynamical Systems and Markov Chains

In this optional section we will show how matrix methods can be used to analyze the behavior of physical systems that evolve over time. The methods that we will study here have been applied to problems in business, ecology, demographics, sociology, and most of the physical sciences.

Dynamical Systems

A dynamical system is a finite set of variables whose values change with time. The value of a variable at a point in time is called the state of the variable at that time, and the vector formed from these states is called the state vector of the dynamical system at that time. Our primary objective in this section is to analyze how the state vector of a dynamical system changes with time. Let us begin with an example.

E X A M P L E 1 Market Share as a Dynamical System

Suppose that two competing television channels, channel 1 and channel 2, each have 50% of the viewer market at some initial point in time. Assume that over each one-year period channel 1 captures 10% of channel 2’s share, and channel 2 captures 20% of channel 1’s share (see Figure 5.5.1). What is each channel’s market share after one year?

Figure 5.5.1 [Channel 1 loses 20% and holds 80%. Channel 2 loses 10% and holds 90%.]

Solution Let us begin by introducing the time-dependent variables

x1(t) = fraction of the market held by channel 1 at time t
x2(t) = fraction of the market held by channel 2 at time t

and the column vector

\[ \mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \begin{matrix} \leftarrow \text{Channel 1’s fraction of the market at time } t \text{ in years} \\ \leftarrow \text{Channel 2’s fraction of the market at time } t \text{ in years} \end{matrix} \]

The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time t is the vector $\mathbf{x}(t)$. If we take t = 0 to be the starting point at which the two channels had 50% of the market, then the state of the system at that time is

\[ \mathbf{x}(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \begin{matrix} \leftarrow \text{Channel 1’s fraction of the market at time } t = 0 \\ \leftarrow \text{Channel 2’s fraction of the market at time } t = 0 \end{matrix} \tag{1} \]

Now let us try to find the state of the system at time t = 1 (one year later). Over the one-year period, channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2’s initial 50%. Thus,

\[ x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2} \]

Similarly, channel 2 gains 20% of channel 1’s initial 50%, and retains 90% of its initial 50%. Thus,

\[ x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3} \]

Therefore, the state of the system at time t = 1 is

\[ \mathbf{x}(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \begin{matrix} \leftarrow \text{Channel 1’s fraction of the market at time } t = 1 \\ \leftarrow \text{Channel 2’s fraction of the market at time } t = 1 \end{matrix} \tag{4} \]

E X A M P L E 2 Evolution of Market Share over Five Years

Track the market shares of channels 1 and 2 in Example 1 over a five-year period.

Solution To solve this problem suppose that we have already computed the market share of each channel at time t = k and we are interested in using the known values of $x_1(k)$ and $x_2(k)$ to compute the market shares $x_1(k + 1)$ and $x_2(k + 1)$ one year later. The analysis is exactly the same as that used to obtain Equations (2) and (3). Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(k)$ and gains 10% of channel 2’s starting fraction $x_2(k)$. Thus,

\[ x_1(k + 1) = (0.8)x_1(k) + (0.1)x_2(k) \tag{5} \]

Similarly, channel 2 gains 20% of channel 1’s starting fraction $x_1(k)$ and retains 90% of its own starting fraction $x_2(k)$. Thus,

\[ x_2(k + 1) = (0.2)x_1(k) + (0.9)x_2(k) \tag{6} \]

Equations (5) and (6) can be expressed in matrix form as

\[ \begin{bmatrix} x_1(k + 1) \\ x_2(k + 1) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} \tag{7} \]

which provides a way of using matrix multiplication to compute the state of the system at time t = k + 1 from the state at time t = k. For example, using (1) and (7) we obtain

\[ \mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\mathbf{x}(0) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \]

which agrees with (4). Similarly,

\[ \mathbf{x}(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\mathbf{x}(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix} \]

We can now continue this process, using Formula (7) to compute $\mathbf{x}(3)$ from $\mathbf{x}(2)$, then $\mathbf{x}(4)$ from $\mathbf{x}(3)$, and so on. This yields (verify)

\[ \mathbf{x}(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad \mathbf{x}(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad \mathbf{x}(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8} \]

Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of the market.

If desired, we can continue the market analysis in the last example beyond the five-year period and explore what happens to the market share over the long term. We did so, using a computer, and obtained the following state vectors (rounded to six decimal places):

\[ \mathbf{x}(10) \approx \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad \mathbf{x}(20) \approx \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad \mathbf{x}(40) \approx \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix} \tag{9} \]
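The state vectors in (8) and (9) can be reproduced by repeated use of Formula (7). The following Python sketch shows one way this might be done; the function names `step` and `state_after` are assumptions introduced for this illustration, and floating-point arithmetic is assumed to be accurate enough for six-decimal rounding.

```python
def step(P, x):
    """One application of Formula (7): the state one year later."""
    return [P[0][0] * x[0] + P[0][1] * x[1],
            P[1][0] * x[0] + P[1][1] * x[1]]

def state_after(P, x0, years):
    """State vector after the given number of one-year steps."""
    x = list(x0)
    for _ in range(years):
        x = step(P, x)
    return x

P = [[0.8, 0.1], [0.2, 0.9]]
x0 = [0.5, 0.5]
for k in (1, 2, 3, 4, 5, 10, 20, 40):
    print(k, [round(entry, 6) for entry in state_after(P, x0, k)])
```

Printing further years is a simple way to watch the entries settle toward fixed values.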

All subsequent state vectors, when rounded to six decimal places, are the same as x(40), so we see that the market shares eventually stabilize with channel 1 holding about one-third of the market and channel 2 holding about two-thirds. Later in this section, we will explain why this stabilization occurs.

Markov Chains

In many dynamical systems the states of the variables are not known with certainty but can be expressed as probabilities; such dynamical systems are called stochastic processes (from the Greek word stochastikos, meaning “proceeding by guesswork”). A detailed study of stochastic processes requires a precise definition of the term probability, which is outside the scope of this course. However, the following interpretation will suffice for our present purposes:

Stated informally, the probability that an experiment or observation will have a certain outcome is the fraction of the time that the outcome would occur if the experiment could be repeated indefinitely under constant conditions; the greater the number of actual repetitions, the more accurately the probability describes the fraction of time that the outcome occurs.

For example, when we say that the probability of tossing heads with a fair coin is $\tfrac{1}{2}$, we mean that if the coin were tossed many times under constant conditions, then we would expect about half of the outcomes to be heads. Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can also be expressed as 0.5 or 50%.

If an experiment or observation has n possible outcomes, then the probabilities of those outcomes must be nonnegative fractions whose sum is 1. The probabilities are nonnegative because each describes the fraction of occurrences of an outcome over the long term, and the sum is 1 because they account for all possible outcomes. For example, if a box containing 10 balls has one red ball, three green balls, and six yellow balls, and if a ball is drawn at random from the box, then the probabilities of the various outcomes are

p1 = prob(red) = 1/10 = 0.1
p2 = prob(green) = 3/10 = 0.3
p3 = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

p1 + p2 + p3 = 0.1 + 0.3 + 0.6 = 1

In a stochastic process with n possible states, the state vector at each time t has the form

\[ \mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} \begin{matrix} \text{Probability that the system is in state 1} \\ \text{Probability that the system is in state 2} \\ \vdots \\ \text{Probability that the system is in state } n \end{matrix} \]

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with nonnegative entries that add up to 1 is called a probability vector.

E X A M P L E 3 Example 1 Revisited from the Probability Viewpoint

Observe that the state vectors in Examples 1 and 2 are all probability vectors. This is to be expected since the entries in each state vector are the fractional market shares of the channels, and together they account for the entire market. In practice, it is preferable

5.5 Dynamical Systems and Markov Chains

335

to interpret the entries in the state vectors as probabilities rather than exact market fractions, since market information is usually obtained by statistical sampling procedures with intrinsic uncertainties. Thus, for example, the state vector



x1 (1) 0.45 x(1) = = x2 (1) 0.55

which we interpreted in Example 1 to mean that channel 1 has 45% of the market and channel 2 has 55%, can also be interpreted to mean that an individual picked at random from the market will be a channel 1 viewer with probability 0.45 and a channel 2 viewer with probability 0.55.
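The definition of a probability vector can be checked mechanically. The following is a minimal sketch, not part of the original text; the function name `is_probability_vector` and the tolerance `tol` are illustrative assumptions chosen to absorb floating-point roundoff.

```python
import math

def is_probability_vector(v, tol=1e-9):
    """Return True if every entry is nonnegative and the entries sum to 1.

    The tolerance is an assumption of this sketch, needed because decimal
    fractions such as 0.1 are not stored exactly in floating point.
    """
    return all(x >= 0 for x in v) and math.isclose(sum(v), 1.0, abs_tol=tol)

# Box example from the text: one red, three green, six yellow balls out of 10.
ball_probs = [1 / 10, 3 / 10, 6 / 10]
# State vector x(1) from Example 1.
x1 = [0.45, 0.55]

print(is_probability_vector(ball_probs))  # nonnegative entries that sum to 1
print(is_probability_vector(x1))
print(is_probability_vector([0.7, 0.5]))  # entries sum to 1.2, so this fails
```

A vector such as (1.5, -0.5) also fails the check: its entries sum to 1, but one of them is negative.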

A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly occur in formulas that relate successive states of a stochastic process. For example, the state vectors x(k + 1) and x(k) in (7) are related by an equation of the form x(k + 1) = P x(k) in which

\[
P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \tag{10}
\]

is a stochastic matrix. It should not be surprising that the column vectors of P are probability vectors, since the entries in each column provide a breakdown of what happens to each channel’s market share over the year: the entries in column 1 convey that each year channel 1 retains 80% of its market share and loses 20%, and the entries in column 2 convey that each year channel 2 retains 90% of its market share and loses 10%. The entries in (10) can also be viewed as probabilities:

p11 = 0.8 = probability that a channel 1 viewer remains a channel 1 viewer
p21 = 0.2 = probability that a channel 1 viewer becomes a channel 2 viewer
p12 = 0.1 = probability that a channel 2 viewer becomes a channel 1 viewer
p22 = 0.9 = probability that a channel 2 viewer remains a channel 2 viewer
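As an illustration of the column condition, the sketch below tests whether a matrix is stochastic and applies one transition x(k + 1) = P x(k) to the matrix in (10). The helper names `is_stochastic` and `step`, the list-of-rows storage, and the tolerance are assumptions made for this example only.

```python
import math

def is_stochastic(P, tol=1e-9):
    """True if P is square, has nonnegative entries, and each column sums to 1."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    if any(entry < 0 for row in P for entry in row):
        return False
    return all(
        math.isclose(sum(P[i][j] for i in range(n)), 1.0, abs_tol=tol)
        for j in range(n)
    )

def step(P, x):
    """One transition x(k+1) = P x(k), with P stored as a list of rows."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]

# Transition matrix (10): column j describes what happens to channel j's viewers.
P = [[0.8, 0.1],
     [0.2, 0.9]]

print(is_stochastic(P))
print(step(P, [0.5, 0.5]))  # approximately [0.45, 0.55]
```

Note that the check is on columns, not rows: swapping the off-diagonal entries of P would give rows that sum to 1 but columns that do not.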

Example 1 is a special case of a large class of stochastic processes called Markov chains.

DEFINITION 1 A Markov chain is a dynamical system whose state vectors at a succession of equally spaced times are probability vectors and for which the state vectors at successive times are related by an equation of the form

\[
\mathbf{x}(k + 1) = P\,\mathbf{x}(k)
\]

in which P = [pij] is a stochastic matrix and pij is the probability that the system will be in state i at time t = k + 1 if it is in state j at time t = k. The matrix P is called the transition matrix for the system.

WARNING Note that in this definition the row index i corresponds to the later state and the column index j to the earlier state (Figure 5.5.2).

[Figure 5.5.2: The entry pij is the probability that the system is in state i at time t = k + 1 if it is in state j at time t = k.]


EXAMPLE 4 Wildlife Migration as a Markov Chain

Suppose that a tagged lion can migrate over three adjacent game reserves in search of food, reserve 1, reserve 2, and reserve 3. Based on data about the food resources, researchers conclude that the monthly migration pattern of the lion can be modeled by a Markov chain with transition matrix

\[
P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}
\]

in which the columns are indexed by the reserve at time t = k and the rows by the reserve at time t = k + 1 (see Figure 5.5.3). That is,

p11 = 0.5 = probability that the lion will stay in reserve 1 when it is in reserve 1
p12 = 0.4 = probability that the lion will move from reserve 2 to reserve 1
p13 = 0.6 = probability that the lion will move from reserve 3 to reserve 1
p21 = 0.2 = probability that the lion will move from reserve 1 to reserve 2
p22 = 0.2 = probability that the lion will stay in reserve 2 when it is in reserve 2
p23 = 0.3 = probability that the lion will move from reserve 3 to reserve 2
p31 = 0.3 = probability that the lion will move from reserve 1 to reserve 3
p32 = 0.4 = probability that the lion will move from reserve 2 to reserve 3
p33 = 0.1 = probability that the lion will stay in reserve 3 when it is in reserve 3

[Figure 5.5.3: Diagram of reserves 1, 2, and 3 labeled with the transition probabilities.]

Assuming that t is in months and the lion is released in reserve 2 at time t = 0, track its probable locations over a six-month period.

Solution Let x1(k), x2(k), and x3(k) be the probabilities that the lion is in reserve 1, 2, or 3, respectively, at time t = k, and let

\[
\mathbf{x}(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}
\]

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time t = 0, the initial state vector is

\[
\mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

Historical Note Markov chains are named in honor of the Russian mathematician A. A. Markov, a lover of poetry, who used them to analyze the alternation of vowels and consonants in the poem Eugene Onegin by Pushkin. Markov believed that the only applications of his chains were to the analysis of literary works, so he would be astonished to learn that his discovery is used today in the social sciences, quantum theory, and genetics! [Image: SPL/Science Source]

Andrei Andreyevich Markov (1856–1922)

We leave it for you to show that the state vectors over a six-month period are

\[
\mathbf{x}(1) = P\mathbf{x}(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad
\mathbf{x}(2) = P\mathbf{x}(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad
\mathbf{x}(3) = P\mathbf{x}(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix}
\]

\[
\mathbf{x}(4) = P\mathbf{x}(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad
\mathbf{x}(5) = P\mathbf{x}(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad
\mathbf{x}(6) = P\mathbf{x}(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}
\]

As in Example 2, the state vectors here seem to stabilize over time with a probability of approximately 0.504 that the lion is in reserve 1, a probability of approximately 0.227 that it is in reserve 2, and a probability of approximately 0.269 that it is in reserve 3.

Markov Chains in Terms of Powers of the Transition Matrix

In a Markov chain with an initial state of x(0), the successive state vectors are

\[
\mathbf{x}(1) = P\mathbf{x}(0), \quad \mathbf{x}(2) = P\mathbf{x}(1), \quad \mathbf{x}(3) = P\mathbf{x}(2), \quad \mathbf{x}(4) = P\mathbf{x}(3), \ldots
\]

For brevity, it is common to denote x(k) by x_k, which allows us to write the successive state vectors more briefly as

\[
\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P\mathbf{x}_1, \quad \mathbf{x}_3 = P\mathbf{x}_2, \quad \mathbf{x}_4 = P\mathbf{x}_3, \ldots \tag{11}
\]

Alternatively, these state vectors can be expressed in terms of the initial state vector x_0 as

\[
\mathbf{x}_1 = P\mathbf{x}_0, \quad \mathbf{x}_2 = P(P\mathbf{x}_0) = P^2\mathbf{x}_0, \quad \mathbf{x}_3 = P(P^2\mathbf{x}_0) = P^3\mathbf{x}_0, \quad \mathbf{x}_4 = P(P^3\mathbf{x}_0) = P^4\mathbf{x}_0, \ldots
\]

from which it follows that

\[
\mathbf{x}_k = P^k \mathbf{x}_0 \tag{12}
\]

Note that Formula (12) makes it possible to compute the state vector x_k without first computing the earlier state vectors as required in Formula (11).
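Formulas (11) and (12) can be compared numerically. The sketch below, which is an illustration rather than part of the original text, steps through the lion chain of Example 4 one month at a time and then computes the six-month state directly from a matrix power. The helper names `mat_vec`, `mat_mul`, and `mat_pow` are hypothetical, and the power is formed by plain repeated multiplication.

```python
def mat_vec(P, x):
    """Matrix-vector product P x, with P stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def mat_mul(A, B):
    """Matrix product A B for square matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mat_pow(P, k):
    """P^k for an integer k >= 0 by repeated multiplication."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result

# Transition matrix and initial state from Example 4.
P = [[0.5, 0.4, 0.6],
     [0.2, 0.2, 0.3],
     [0.3, 0.4, 0.1]]
x0 = [0.0, 1.0, 0.0]

# Formula (11): compute x1, x2, ..., x6 one step at a time.
x = x0
for k in range(1, 7):
    x = mat_vec(P, x)
    print(k, [round(v, 3) for v in x])

# Formula (12): jump straight to x6 = P^6 x0.
x6_direct = mat_vec(mat_pow(P, 6), x0)
print([round(v, 3) for v in x6_direct])
```

Both routes give the same vector up to roundoff. The direct route is attractive when only a single late state is wanted, while the step-by-step route is natural when the whole trajectory is of interest.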

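EXAMPLE 5 Finding a State Vector Directly from x0

Use Formula (12) to find the state vector x(3) in Example 2.

Solution From (1) and (7), the initial state vector and transition matrix are

\[
\mathbf{x}_0 = \mathbf{x}(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}
\quad\text{and}\quad
P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}
\]

We leave it for you to calculate P^3 and show that

\[
\mathbf{x}(3) = \mathbf{x}_3 = P^3\mathbf{x}_0 =
\begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix}
\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}
= \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}
\]

which agrees with the result in (8).

Long-Term Behavior of a Markov Chain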

We have seen two examples of Markov chains in which the state vectors seem to stabilize after a period of time. Thus, it is reasonable to ask whether all Markov chains have this property. The following example shows that this is not the case.

EXAMPLE 6 A Markov Chain That Does Not Stabilize

The matrix

\[
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation shows that P^2 = I, from which it follows that

\[
I = P^2 = P^4 = P^6 = \cdots \quad\text{and}\quad P = P^3 = P^5 = P^7 = \cdots
\]

Thus, the successive states in the Markov chain with initial vector x_0 are

\[
\mathbf{x}_0,\; P\mathbf{x}_0,\; \mathbf{x}_0,\; P\mathbf{x}_0,\; \mathbf{x}_0, \ldots
\]

which oscillate between x_0 and P x_0. Thus, the Markov chain does not stabilize unless both components of x_0 are 1/2 (verify).
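The oscillation in Example 6 is easy to see numerically. In the sketch below the helper names `mat_vec` and `orbit` and the sample initial vector (0.3, 0.7) are assumptions chosen for illustration; any probability vector other than (0.5, 0.5) would show the same behavior.

```python
def mat_vec(P, x):
    """Matrix-vector product P x, with P stored as a list of rows."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]

def orbit(P, x0, steps):
    """Return the list of states x0, P x0, P^2 x0, ..., P^steps x0."""
    states = [x0]
    for _ in range(steps):
        states.append(mat_vec(P, states[-1]))
    return states

# Transition matrix from Example 6: it swaps the two components of a vector.
P = [[0, 1],
     [1, 0]]

print(orbit(P, [0.3, 0.7], 4))  # alternates between [0.3, 0.7] and [0.7, 0.3]
print(orbit(P, [0.5, 0.5], 4))  # every state equals [0.5, 0.5]
```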

A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however, that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors

x_1, x_2, . . . , x_k, . . .

approaches a limit q or that it converges to q if all entries in x_k can be made as close as we like to the corresponding entries in the vector q by taking k sufficiently large. We denote this by writing x_k → q as k → ∞. Similarly, we say that a sequence of matrices

P_1, P_2, P_3, . . . , P_k, . . .

converges to a matrix Q, written P_k → Q as k → ∞, if each entry of P_k can be made as close as we like to the corresponding entry of Q by taking k sufficiently large.

We saw in Example 6 that the state vectors of a Markov chain need not approach a limit in all cases. However, by imposing a mild condition on the transition matrix of a Markov chain, we can guarantee that the state vectors will approach a limit.

DEFINITION 2 A stochastic matrix P is said to be regular if P or some positive power of P has all positive entries, and a Markov chain whose transition matrix is regular is said to be a regular Markov chain.

EXAMPLE 7 Regular Stochastic Matrices

The transition matrices in Examples 2 and 4 are regular because their entries are positive. The matrix

\[
P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix}
\]

is regular because

\[
P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix}
\]

has positive entries. The matrix P in Example 6 is not regular because P and every positive power of P have some zero entries (verify).
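Definition 2 suggests a direct test for regularity: examine P, P^2, P^3, and so on until a power with all positive entries appears. The sketch below tracks only which entries are positive, which is enough because the entries of a stochastic matrix are nonnegative. The function name `is_regular` is hypothetical, and the default cutoff of (n - 1)^2 + 1 powers relies on Wielandt's bound for primitive matrices, a result that is not stated in this text and is used here as an assumption.

```python
def is_regular(P, max_power=None):
    """Return True if some power P^k, 1 <= k <= max_power, has all positive entries.

    P is assumed to be a stochastic matrix stored as a list of rows.
    The default max_power uses Wielandt's bound (n - 1)^2 + 1, which is an
    outside result assumed by this sketch.
    """
    n = len(P)
    if max_power is None:
        max_power = (n - 1) ** 2 + 1
    # Boolean pattern of P: True where the entry is positive.
    pattern = [[entry > 0 for entry in row] for row in P]
    power = pattern
    for _ in range(max_power):
        if all(all(row) for row in power):
            return True
        # Pattern of the next power: entry (i, j) is positive exactly when
        # some k has a positive (i, k) entry in the current power and a
        # positive (k, j) entry in P.
        power = [[any(power[i][k] and pattern[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False

print(is_regular([[0.5, 1], [0.5, 0]]))  # P^2 has all positive entries
print(is_regular([[0, 1], [1, 0]]))      # powers alternate between P and I
```

Working with the zero pattern rather than the numerical entries avoids any concern about very small positive entries being rounded to zero in high powers.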

The following theorem, which we state without proof, is the fundamental result about the long-term behavior of Markov chains.

THEOREM 5.5.1 If P is the transition matrix for a regular Markov chain, then:

(a) There is a unique probability vector q with positive entries such that P q = q.
(b) For any initial probability vector x_0, the sequence of state vectors x_0, P x_0, . . . , P^k x_0, . . . converges to q.
(c) The sequence P, P^2, P^3, . . . , P^k, . . . converges to the matrix Q each of whose column vectors is q.

The vector q in Theorem 5.5.1 is called the steady-state vector of the Markov chain. Because it is a nonzero vector that satisfies the equation P q = q, it is an eigenvector corresponding to the eigenvalue λ = 1 of P. Thus, q can be found by solving the linear system

\[
(I - P)\mathbf{q} = \mathbf{0} \tag{13}
\]

subject to the requirement that q be a probability vector. Here are some examples.

EXAMPLE 8 Examples 1 and 2 Revisited

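The transition matrix for the Markov chain in Example 2 is

\[
P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}
\]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector q. To find q we will solve the system (I − P)q = 0, which we can write as

\[
\begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

The general solution of this system is

\[
q_1 = 0.5s, \quad q_2 = s
\]

(verify), which we can write in vector form as

\[
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2}s \\ s \end{bmatrix} \tag{14}
\]

For q to be a probability vector, we must have

\[
1 = q_1 + q_2 = \tfrac{3}{2}s
\]

which implies that s = 2/3. Substituting this value in (14) yields the steady-state vector

\[
\mathbf{q} = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix}
\]

which is consistent with the numerical results obtained in (9).

EXAMPLE 9 Example 4 Revisited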

The transition matrix for the Markov chain in Example 4 is

\[
P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}
\]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector q. To find q we will solve the system (I − P)q = 0, which we can write (using fractions) as

\[
\begin{bmatrix}
\tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\
-\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\
-\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{15}
\]

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you to confirm that the reduced row echelon form of the coefficient matrix is

\[
\begin{bmatrix}
1 & 0 & -\tfrac{15}{8} \\
0 & 1 & -\tfrac{27}{32} \\
0 & 0 & 0
\end{bmatrix}
\]

and that the general solution of (15) is

\[
q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \tag{16}
\]

For q to be a probability vector we must have q1 + q2 + q3 = 1, from which it follows that s = 32/119 (verify). Substituting this value in (16) yields the steady-state vector

\[
\mathbf{q} = \begin{bmatrix} \tfrac{60}{119} \\ \tfrac{27}{119} \\ \tfrac{32}{119} \end{bmatrix}
\approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix}
\]

(verify), which is consistent with the results obtained in Example 4.
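The hand computations in Examples 8 and 9 can be reproduced with exact rational arithmetic. The sketch below is one possible approach, not the text's own procedure: it keeps the first n − 1 equations of (I − P)q = 0, replaces the last equation with q1 + · · · + qn = 1, and solves the resulting square system by Gauss-Jordan elimination. Dropping the last equation loses nothing because the rows of I − P sum to the zero row. The function name `steady_state` is hypothetical, and the code assumes P is regular so that the system has a unique solution.

```python
from fractions import Fraction

def steady_state(P):
    """Solve (I - P) q = 0 together with q1 + ... + qn = 1 exactly.

    P is a regular stochastic matrix given as a list of rows of Fractions.
    Regularity is assumed, not checked; a singular system would raise
    StopIteration when no pivot is found.
    """
    n = len(P)
    # First n - 1 rows of I - P, then a row of ones for the sum condition.
    A = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n - 1)]
    A.append([Fraction(1)] * n)
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]

    # Gauss-Jordan elimination with exact fractions (no roundoff).
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        inv = 1 / A[col][col]
        A[col] = [a * inv for a in A[col]]
        b[col] *= inv
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return b

# Example 8: entries of P in tenths.
P2 = [[Fraction(k, 10) for k in row] for row in [[8, 1], [2, 9]]]
# Example 9: the lion migration matrix from Example 4.
P3 = [[Fraction(k, 10) for k in row] for row in [[5, 4, 6], [2, 2, 3], [3, 4, 1]]]

print(steady_state(P2))  # fractions 1/3 and 2/3
print(steady_state(P3))  # fractions 60/119, 27/119, 32/119
```

Using `Fraction` mirrors the text's choice to convert to fractions in (15); with floating-point entries the same elimination would give the decimal approximations instead.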

Exercise Set 5.5

In Exercises 1–2, determine whether A is a stochastic matrix. If A is not stochastic, then explain why not.



0.4 0.6

1. (a) A =



0.3 0.7





⎡1

0.6 0 .7



3

1 3

1 (d) A = ⎢ ⎣6

1 3

⎥ − 21 ⎥ ⎦

1 2

1 3

1

1

1 2

1 3

(c) A = ⎢ ⎣0

0

1⎥ 3⎦

0

1 2

1 3





0.2 0.8

2. (a) A =

⎡1

(b) A =



1 9

1 6

2

0

5⎥ 6⎦

5 12

8 9



12

⎢1

(c) A = ⎢ ⎣

0 .9 0 .1

0.4 0.3

(b) A =

0.2 0.9





0.8 0.1

1 2





1 3

1 2

(d) A = ⎢ ⎣ 0

1 3

1⎥ 2⎦

2

1 3

0



0

In Exercises 3–4, use Formulas (11) and (12) to compute the state vector x4 in two different ways.



0.5 3. P = 0.5



0.6 0.5 ; x0 = 0.5 0 .4

0.8 0.2



0 .5 1 ; x0 = 0 0 .5

In Exercises 5–6, determine whether P is a regular stochastic matrix. 1 1 1  1  0 1 5 7 5 5 5. (a) P = (b) P = (c) P = 4 6 4 4 1 0 5 7 5 5

1

6. (a) P =

−1



4. P =

2

1

1 2

0





(b) P =

1

2 3

0

1 3



3

(c) P =

4

1 3

1 4

2 3



In Exercises 7–10, verify that P is a regular stochastic matrix, and find the steady-state vector for the associated Markov chain.

1 7. P =

4

2 3

3 4

1 3

⎡1 2

⎢1 9. P = ⎢ ⎣4 1 4





 8. P =

0.2 0.6 0.8 0.4



⎡1

1 2

⎥ 1⎥ 3⎦

⎢ 10. P = ⎢ ⎣0

0

2 3

1 2

0

3

2 3



1 4

2 5

3 4

2⎥ 5⎦

0

1 5



11. Consider a Markov process with transition matrix

             State 1   State 2
   State 1     0.2       0.1
   State 2     0.8       0.9

(a) What does the entry 0.2 represent?
(b) What does the entry 0.1 represent?
(c) If the system is in state 1 initially, what is the probability that it will be in state 2 at the next observation?
(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the next observation?

12. Consider a Markov process with transition matrix

             State 1   State 2
   State 1      0        1/7
   State 2      1        6/7

(a) What does the entry 6/7 represent?
(b) What does the entry 0 represent?
(c) If the system is in state 1 initially, what is the probability that it will be in state 1 at the next observation?
(d) If the system has a 50% chance of being in state 1 initially, what is the probability that it will be in state 2 at the next observation?

13. On a given day the air quality in a certain city is either good or bad. Records show that when the air quality is good on one day, then there is a 95% chance that it will be good the next day, and when the air quality is bad on one day, then there is a 45% chance that it will be bad the next day.
(a) Find a transition matrix for this phenomenon.
(b) If the air quality is good today, what is the probability that it will be good two days from now?
(c) If the air quality is bad today, what is the probability that it will be bad three days from now?
(d) If there is a 20% chance that the air quality will be good today, what is the probability that it will be good tomorrow?

14. In a laboratory experiment, a mouse can choose one of two food types each day, type I or type II. Records show that if the mouse chooses type I on a given day, then there is a 75% chance that it will choose type I the next day, and if it chooses type II on one day, then there is a 50% chance that it will choose type II the next day.
(a) Find a transition matrix for this phenomenon.
(b) If the mouse chooses type I today, what is the probability that it will choose type I two days from now?
(c) If the mouse chooses type II today, what is the probability that it will choose type II three days from now?
(d) If there is a 10% chance that the mouse will choose type I today, what is the probability that it will choose type I tomorrow?

15. Suppose that at some initial point in time 100,000 people live in a certain city and 25,000 people live in its suburbs. The Regional Planning Commission determines that each year 5% of the city population moves to the suburbs and 3% of the suburban population moves to the city.
(a) Assuming that the total population remains constant, make a table that shows the populations of the city and its suburbs over a five-year period (round to the nearest integer).
(b) Over the long term, how will the population be distributed between the city and its suburbs?

16. Suppose that two competing television stations, station 1 and station 2, each have 50% of the viewer market at some initial point in time. Assume that over each one-year period station 1 captures 5% of station 2’s market share and station 2 captures 10% of station 1’s market share.
(a) Make a table that shows the market share of each station over a five-year period.
(b) Over the long term, how will the market share be distributed between the two stations?

17. Fill in the missing entries of the stochastic matrix

\[
P = \begin{bmatrix}
\tfrac{7}{10} & * & \tfrac{1}{5} \\
* & \tfrac{3}{10} & * \\
\tfrac{1}{10} & \tfrac{3}{5} & \tfrac{3}{10}
\end{bmatrix}
\]

and find its steady-state vector.

18. If P is an n × n stochastic matrix, and if M is a 1 × n matrix whose entries are all 1’s, then MP = ______.

19. If P is a regular stochastic matrix with steady-state vector q, what can you say about the sequence of products

P q, P^2 q, P^3 q, . . . , P^k q, . . .

as k → ∞?

20. (a) If P is a regular n × n stochastic matrix with steady-state vector q, and if e_1, e_2, . . . , e_n are the standard unit vectors in column form, what can you say about the behavior of the sequence

P e_i, P^2 e_i, P^3 e_i, . . . , P^k e_i, . . .

as k → ∞ for each i = 1, 2, . . . , n?
(b) What does this tell you about the behavior of the column vectors of P^k as k → ∞?


Working with Proofs

21. Prove that the product of two stochastic matrices with the same size is a stochastic matrix. [Hint: Write each column of the product as a linear combination of the columns of the first factor.]

22. Prove that if P is a stochastic matrix whose entries are all greater than or equal to ρ, then the entries of P^2 are greater than or equal to ρ.

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

(a) The vector \(\begin{bmatrix} \tfrac{1}{3} \\ 0 \\ \tfrac{2}{3} \end{bmatrix}\) is a probability vector.
(b) The matrix \(\begin{bmatrix} 0.2 & 1 \\ 0.8 & 0 \end{bmatrix}\) is a regular stochastic matrix.
(c) The column vectors of a transition matrix are probability vectors.
(d) A steady-state vector for a Markov chain with transition matrix P is any solution of the linear system (I − P)q = 0.
(e) The square of every regular stochastic matrix is stochastic.
(f) A vector with real entries that sum to 1 is a probability vector.
(g) Every regular stochastic matrix has λ = 1 as an eigenvalue.

Working with Technology

T1. In Examples 4 and 9 we considered the Markov chain with transition matrix P and initial state vector x(0) where

\[
P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix}
\quad\text{and}\quad
\mathbf{x}(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

(a) Confirm the numerical values of x(1), x(2), . . . , x(6) obtained in Example 4 using the method given in that example.
(b) As guaranteed by part (c) of Theorem 5.5.1, confirm that the sequence P, P^2, P^3, . . . , P^k, . . . converges to the matrix Q each of whose column vectors is the steady-state vector q obtained in Example 9.

T2. Suppose that a car rental agency has three locations, numbered 1, 2, and 3. A customer may rent a car from any of the three locations and return it to any of the three locations. Records show that cars are rented and returned in accordance with the following probabilities:

                            Rented from Location
                             1       2       3
   Returned to        1     1/10    1/5     3/5
   Location           2     4/5     3/10    1/5
                      3     1/10    1/2     1/5

(a) Assuming that a car is rented from location 1, what is the probability that it will be at location 1 after two rentals?
(b) Assuming that this dynamical system can be modeled as a Markov chain, find the steady-state vector.
(c) If the rental agency owns 120 cars, how many parking spaces should it allocate at each location to be reasonably certain that it will have enough spaces for the cars over the long term? Explain your reasoning.

T3. Physical traits are determined by the genes that an offspring receives from its parents. In the simplest case a trait in the offspring is determined by one pair of genes, one member of the pair inherited from the male parent and the other from the female parent. Typically, each gene in a pair can assume one of two forms, called alleles, denoted by A and a. This leads to three possible pairings:

AA, Aa, aa

called genotypes (the pairs Aa and aA determine the same trait and hence are not distinguished from one another). It is shown in the study of heredity that if a parent of known genotype is crossed with a random parent of unknown genotype, then the offspring will have the genotype probabilities given in the following table, which can be viewed as a transition matrix for a Markov process:

                            Genotype of Parent
                             AA      Aa      aa
   Genotype of       AA     1/2     1/4      0
   Offspring         Aa     1/2     1/2     1/2
                     aa      0      1/4     1/2

Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa.
(a) Show that the transition matrix is regular.
(b) Find the steady-state vector, and discuss its physical interpretation.

Chapter 5 Supplementary Exercises


Chapter 5 Supplementary Exercises 1. (a) Show that if 0 < θ < π , then

Verify this result for

A=

− sin θ cos θ

cos θ sin θ



(a) A =

has no real eigenvalues and consequently no real eigenvectors. (b) Give a geometric explanation of the result in part (a). 2. Find the eigenvalues of



0 A = ⎣0

k3

0 1⎦ 3k

3. (a) Show that if D is a diagonal matrix with nonnegative entries on the main diagonal, then there is a matrix S such that S 2 = D . (b) Show that if A is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix S such that S 2 = A. (c) Find a matrix S such that S 2 = A, given that



1 A = ⎣0 0



3 4 0

6

1

2

1

0

(b) A = ⎣0

0

1⎦

1

−3



1 5⎦ 9

c0 + c1 λ + λ2 = 0 then c0 I + c1 A + A2 = 0, so

A2 = −c1 A − c0 I Multiplying through by A yields A3 = −c1 A2 − c0 A, which expresses A3 in terms of A2 and A, and multiplying through by A2 yields A4 = −c1 A3 − c0 A2 , which expresses A4 in terms of A3 and A2 . Continuing in this way, we can calculate successive powers of A by expressing them in terms of lower powers. Use this procedure to calculate A2 , A3 , A4 , and A5 for

A=

(c) A−1 and B −1 (if A is invertible) 5. Prove: If A is a square matrix and p(λ) = det(λI − A) is the characteristic polynomial of A, then the coefficient of λn−1 in p(λ) is the negative of the trace of A.

0

is not diagonalizable. 7. In advanced linear algebra, one proves the Cayley–Hamilton Theorem, which states that a square matrix A satisfies its characteristic equation; that is, if

c0 + c1 λ + c2 λ + · · · + cn−1 λ 2

n−1

+λ =0 n

is the characteristic equation of A, then

c0 I + c1 A + c2 A + · · · + cn−1 A 2

6 2

⎢ A = ⎣0

0

1

−3

n−1

+A =0 n



1⎦ 3

11. Find the eigenvalues of the matrix



6. Prove: If b  = 0, then

b a

3 1

10. Use the method of the preceding exercise to calculate A3 and A4 for ⎡ ⎤ 0 1 0

(b) Ak and B k (k a positive integer)

a

3

9. The Cayley–Hamilton Theorem provides a method for calculating powers of a matrix. For example, if A is a 2 × 2 matrix with characteristic equation

(a) AT and B T





In Exercises 8–10, use the Cayley–Hamilton Theorem, stated in Exercise 7. 8. (a) Use Exercise 28 of Section 5.1 to establish the Cayley– Hamilton Theorem for 2 × 2 matrices.

4. Given that A and B are similar matrices, in each part determine whether the given matrices are also similar.

A=



0

(b) Prove the Cayley–Hamilton Theorem for n × n diagonalizable matrices.



1 0 −3k 2

3





c1 ⎢c ⎢ 1 A=⎢. ⎣ ..

c2 c2 .. .

c1

c2

··· ··· ···

⎤ cn cn ⎥ ⎥ .. ⎥ .⎦ cn

12. (a) It was shown in Exercise 37 of Section 5.1 that if A is an n × n matrix, then the coefficient of λn in the characteristic polynomial of A is 1. (A polynomial with this property is called monic.) Show that the matrix



0

⎢ ⎢1 ⎢ ⎢0 ⎢ ⎢ .. ⎣. 0

0

0

··· ··· ···

0

···

1

0

0

0

0

1 0

.. .

.. .

0 0

.. .

−c0 −c1 −c2 .. .

−cn−1

⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦

344

Chapter 5 Eigenvalues and Eigenvectors

has characteristic polynomial

p(λ) = c0 + c1 λ + · · · + cn−1 λ

n−1



n

This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix in this example is called the companion matrix of p(λ). [Hint: Evaluate all determinants in the problem by adding a multiple of the second row to the first to introduce a zero at the top of the first column, and then expanding by cofactors along the first column.]

16. Suppose that a 4 × 4 matrix A has eigenvalues λ1 = 1, λ2 = −2, λ3 = 3, and λ4 = −3. (a) Use the method of Exercise 24 of Section 5.1 to find det(A). (b) Use Exercise 5 above to find tr(A). 17. Let A be a square matrix such that A3 = A. What can you say about the eigenvalues of A? 18. (a) Solve the system

y1 = y1 + 3y2 y2 = 2y1 + 4y2

(b) Find a matrix with characteristic polynomial

p(λ) = 1 − 2λ + λ2 + 3λ3 + λ4 13. A square matrix A is called nilpotent if An = 0 for some positive integer n. What can you say about the eigenvalues of a nilpotent matrix? 14. Prove: If A is an n × n matrix and n is odd, then A has at least one real eigenvalue. 15. Find a 3 × 3 matrix A that has eigenvalues λ = 0, 1, and −1 with corresponding eigenvectors



0



⎢ ⎥ ⎣ 1⎦, −1 respectively.





1

⎡ ⎤

(b) Find the solution satisfying the initial conditions y1 (0) = 5 and y2 (0) = 6. 19. Let A be a 3 × 3 matrix, one of whose eigenvalues is 1. Given that both the sum and the product of all three eigenvalues is 6, what are the possible values for the remaining two eigenvalues? 20. Show that the matrices



0

⎢ A = ⎣0

0

⎢ ⎥ ⎣−1⎦,

⎢ ⎥ ⎣1⎦

1

1

1









1

0

d1

0

0

1⎦ and D = ⎣ 0

d2

0

0

0

0



0



0⎦

d3

are similar if

dk = cos

2πk 2πk + i sin 3 3

(k = 1, 2, 3)

CHAPTER 6

Inner Product Spaces

CHAPTER CONTENTS
6.1 Inner Products
6.2 Angle and Orthogonality in Inner Product Spaces
6.3 Gram–Schmidt Process; QR-Decomposition
6.4 Best Approximation; Least Squares
6.5 Mathematical Modeling Using Least Squares
6.6 Function Approximation; Fourier Series

INTRODUCTION

In Chapter 3 we defined the dot product of vectors in R^n, and we used that concept to define notions of length, angle, distance, and orthogonality. In this chapter we will generalize those ideas so they are applicable in any vector space, not just R^n. We will also discuss various applications of these ideas.

6.1 Inner Products

In this section we will use the most important properties of the dot product on R^n as axioms, which, if satisfied by the vectors in a vector space V, will enable us to extend the notions of length, distance, angle, and perpendicularity to general vector spaces.

General Inner Products

In Definition 4 of Section 3.2 we defined the dot product of two vectors in R^n, and in Theorem 3.2.2 we listed four fundamental properties of such products. Our first goal in this section is to extend the notion of a dot product to general real vector spaces by using those four properties as axioms. We make the following definition.

DEFINITION 1 An inner product on a real vector space V is a function that associates a real number ⟨u, v⟩ with each pair of vectors in V in such a way that the following axioms are satisfied for all vectors u, v, and w in V and all scalars k.

1. ⟨u, v⟩ = ⟨v, u⟩ [Symmetry axiom]
2. ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩ [Additivity axiom]
3. ⟨ku, v⟩ = k⟨u, v⟩ [Homogeneity axiom]
4. ⟨v, v⟩ ≥ 0 and ⟨v, v⟩ = 0 if and only if v = 0 [Positivity axiom]

A real vector space with an inner product is called a real inner product space.

Note that Definition 1 applies only to real vector spaces. A definition of inner products on complex vector spaces is given in the exercises. Since we will have little need for complex vector spaces from this point on, you can assume that all vector spaces under discussion are real, even though some of the theorems are also valid in complex vector spaces.

Because the axioms for a real inner product space are based on properties of the dot product, these inner product space axioms will be satisfied automatically if we define the inner product of two vectors u and v in R^n to be

\[
\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n \tag{1}
\]


This inner product is commonly called the Euclidean inner product (or the standard inner product) on R^n to distinguish it from other possible inner products that might be defined on R^n. We call R^n with the Euclidean inner product Euclidean n-space.

Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot products in R^n. Recall from Formulas (11) and (19) of Section 3.2 that if u and v are vectors in Euclidean n-space, then norm and distance can be expressed in terms of the dot product as

\[
\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}}
\quad\text{and}\quad
d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})}
\]

Motivated by these formulas, we make the following definition.

DEFINITION 2 If V is a real inner product space, then the norm (or length) of a vector v in V is denoted by ‖v‖ and is defined by

\[
\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}
\]

and the distance between two vectors is denoted by d(u, v) and is defined by

\[
d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle}
\]

A vector of norm 1 is called a unit vector.

The following theorem, whose proof is left for the exercises, shows that norms and distances in real inner product spaces have many of the properties that you might expect.

THEOREM 6.1.1 If u and v are vectors in a real inner product space V, and if k is a scalar, then:

(a) ‖v‖ ≥ 0 with equality if and only if v = 0.
(b) ‖kv‖ = |k| ‖v‖.
(c) d(u, v) = d(v, u).
(d) d(u, v) ≥ 0 with equality if and only if u = v.

Although the Euclidean inner product is the most important inner product on R^n, there are various applications in which it is desirable to modify it by weighting each term differently. More precisely, if

w_1, w_2, . . . , w_n

are positive real numbers, which we will call weights, and if u = (u_1, u_2, . . . , u_n) and v = (v_1, v_2, . . . , v_n) are vectors in R^n, then it can be shown that the formula

\[
\langle \mathbf{u}, \mathbf{v} \rangle = w_1 u_1 v_1 + w_2 u_2 v_2 + \cdots + w_n u_n v_n \tag{2}
\]

defines an inner product on R^n that we call the weighted Euclidean inner product with weights w_1, w_2, . . . , w_n. Note that the standard Euclidean inner product in Formula (1) is the special case of the weighted Euclidean inner product in which all the weights are 1.
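Formula (2) and Definition 2 translate directly into a few lines of code. The sketch below is an illustration only; the function names `weighted_inner`, `norm`, and `distance` are hypothetical, and the sample vectors u = (1, 0), v = (0, 1) and weights 3 and 2 are chosen simply to show how the norm and distance change with the weights.

```python
import math

def weighted_inner(u, v, weights):
    """<u, v> = w1*u1*v1 + ... + wn*un*vn, as in Formula (2)."""
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def norm(v, weights):
    """Norm induced by the weighted inner product: sqrt(<v, v>)."""
    return math.sqrt(weighted_inner(v, v, weights))

def distance(u, v, weights):
    """Distance induced by the weighted inner product: the norm of u - v."""
    diff = [a - b for a, b in zip(u, v)]
    return norm(diff, weights)

u, v = (1, 0), (0, 1)

# All weights equal to 1 gives the Euclidean inner product of Formula (1).
print(norm(u, (1, 1)), distance(u, v, (1, 1)))  # 1 and sqrt(2)

# Sample weights 3 and 2 change both the norm and the distance.
print(norm(u, (3, 2)), distance(u, v, (3, 2)))  # sqrt(3) and sqrt(5)
```

The check that the weights are positive reflects the requirement in the text; with a zero or negative weight the positivity axiom could fail.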

EXAMPLE 1 Weighted Euclidean Inner Product

Let u = (u_1, u_2) and v = (v_1, v_2) be vectors in R^2. Verify that the weighted Euclidean inner product

\[
\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1 v_1 + 2u_2 v_2 \tag{3}
\]

satisfies the four inner product axioms.

Solution

Axiom 1: Interchanging u and v in Formula (3) does not change the sum on the right side, so ⟨u, v⟩ = ⟨v, u⟩.

Axiom 2: If w = (w_1, w_2), then

\[
\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\
&= 3(u_1 w_1 + v_1 w_1) + 2(u_2 w_2 + v_2 w_2) \\
&= (3u_1 w_1 + 2u_2 w_2) + (3v_1 w_1 + 2v_2 w_2) \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle
\end{aligned}
\]

Axiom 3:

\[
\langle k\mathbf{u}, \mathbf{v} \rangle = 3(ku_1)v_1 + 2(ku_2)v_2 = k(3u_1 v_1 + 2u_2 v_2) = k\langle \mathbf{u}, \mathbf{v} \rangle
\]

Axiom 4:

\[
\langle \mathbf{v}, \mathbf{v} \rangle = 3(v_1 v_1) + 2(v_2 v_2) = 3v_1^2 + 2v_2^2 \ge 0
\]

with equality if and only if v_1 = v_2 = 0, that is, if and only if v = 0.

In Example 1, we are using subscripted w’s to denote the components of the vector w, not the weights. The weights are the numbers 3 and 2 in Formula (3).

An Application of Weighted Euclidean Inner Products

To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has n possible numerical outcomes

x_1, x_2, . . . , x_n

and that a series of m repetitions of the experiment yields these values with various frequencies. Specifically, suppose that x_1 occurs f_1 times, x_2 occurs f_2 times, and so forth. Since there is a total of m repetitions of the experiment, it follows that

\[
f_1 + f_2 + \cdots + f_n = m
\]

Thus, the arithmetic average of the observed numerical values (denoted by x̄) is

\[
\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{m}(f_1 x_1 + f_2 x_2 + \cdots + f_n x_n) \tag{4}
\]

If we let

\[
\mathbf{f} = (f_1, f_2, \ldots, f_n), \quad \mathbf{x} = (x_1, x_2, \ldots, x_n), \quad w_1 = w_2 = \cdots = w_n = 1/m
\]

then (4) can be expressed as the weighted Euclidean inner product

\[
\bar{x} = \langle \mathbf{f}, \mathbf{x} \rangle = w_1 f_1 x_1 + w_2 f_2 x_2 + \cdots + w_n f_n x_n
\]

EXAMPLE 2 Calculating with a Weighted Euclidean Inner Product

It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product is changed, then the norms and distances between vectors also change. For example, for the vectors $\mathbf{u} = (1, 0)$ and $\mathbf{v} = (0, 1)$ in $R^2$ with the Euclidean inner product we have

$\|\mathbf{u}\| = \sqrt{1^2 + 0^2} = 1$

and

$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$

but if we change to the weighted Euclidean inner product

$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1 v_1 + 2u_2 v_2$

we have

$\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3}$

and

$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \langle (1, -1), (1, -1) \rangle^{1/2} = [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5}$

Unit Circles and Spheres in Inner Product Spaces

DEFINITION 3 If V is an inner product space, then the set of points in V that satisfy

$\|\mathbf{u}\| = 1$

is called the unit sphere or sometimes the unit circle in V.
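The dependence of norm, distance, and unit-sphere membership on the inner product can be checked numerically. The sketch below reuses the vectors (1, 0) and (0, 1) and the weights 3 and 2 from Example 2; the helper names (`norm`, `distance`, `on_unit_sphere`) and the tolerance are illustrative assumptions.

```python
import math


def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product with the given positive weights."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def norm(u, weights):
    """Norm induced by the weighted inner product: <u, u>^(1/2)."""
    return math.sqrt(weighted_inner(u, u, weights))


def distance(u, v, weights):
    """Distance d(u, v) = ||u - v|| relative to the weighted inner product."""
    diff = [a - b for a, b in zip(u, v)]
    return norm(diff, weights)


def on_unit_sphere(u, weights, tol=1e-12):
    """True when ||u|| = 1 (up to a small tolerance) for this inner product."""
    return abs(norm(u, weights) - 1.0) <= tol


u, v = (1, 0), (0, 1)
euclidean = (1, 1)   # all weights 1: the standard Euclidean inner product
weighted = (3, 2)    # the weights used in Example 2

print(norm(u, euclidean), distance(u, v, euclidean))   # 1 and sqrt(2)
print(norm(u, weighted), distance(u, v, weighted))     # sqrt(3) and sqrt(5)

# The same vector lies on one unit circle but not on the other.
print(on_unit_sphere(u, euclidean), on_unit_sphere(u, weighted))
```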

EXAMPLE 3 Unusual Unit Circles in $R^2$

(a) Sketch the unit circle in an xy-coordinate system in $R^2$ using the Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = u_1 v_1 + u_2 v_2$.
(b) Sketch the unit circle in an xy-coordinate system in $R^2$ using the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \frac{1}{9} u_1 v_1 + \frac{1}{4} u_2 v_2$.

Solution (a) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit circle is $\sqrt{x^2 + y^2} = 1$, or on squaring both sides,

$x^2 + y^2 = 1$

As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1a).

Solution (b) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{\frac{1}{9} x^2 + \frac{1}{4} y^2}$, so the equation of the unit circle is $\sqrt{\frac{1}{9} x^2 + \frac{1}{4} y^2} = 1$, or on squaring both sides,

$\dfrac{x^2}{9} + \dfrac{y^2}{4} = 1$

The graph of this equation is the ellipse shown in Figure 6.1.1b. Though this may seem odd when viewed geometrically, it makes sense algebraically since all points on the ellipse are 1 unit away from the origin relative to the given weighted Euclidean inner product. In short, weighting has the effect of distorting the space that we are used to seeing through “unweighted Euclidean eyes.”

Figure 6.1.1 (a) The unit circle using the standard Euclidean inner product. (b) The unit circle using a weighted Euclidean inner product.

Inner Products Generated by Matrices

The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products on R n called matrix inner products. To define this class of inner products, let u and v be vectors in R n that are expressed in column form, and let A be an invertible n × n matrix. It can be shown (Exercise 47) that if u · v is the Euclidean inner product on R n , then the formula

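$\langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u} \cdot A\mathbf{v}$    (5)

also defines an inner product; it is called the inner product on $R^n$ generated by A. Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are in column form, then $\mathbf{u} \cdot \mathbf{v}$ can be written as $\mathbf{v}^T \mathbf{u}$, from which it follows that (5) can be expressed as

$\langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{v})^T A\mathbf{u}$

or equivalently as

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^T A^T A \mathbf{u}$    (6)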
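Formulas (5) and (6) can be compared on a small case. The following sketch uses plain Python lists rather than a matrix library; the helper names and the particular matrix A and vectors are assumptions chosen for illustration (A has determinant 1, so it is invertible).

```python
def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def mat_vec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    B_cols = transpose(B)
    return [[dot(row, col) for col in B_cols] for row in A]


def inner_formula5(A, u, v):
    """<u, v> = Au . Av"""
    return dot(mat_vec(A, u), mat_vec(A, v))


def inner_formula6(A, u, v):
    """<u, v> = v^T (A^T A) u"""
    AtA = mat_mul(transpose(A), A)
    return dot(v, mat_vec(AtA, u))


A = [[1, 2],
     [0, 1]]
u, v = [1, -1], [2, 3]

# Both formulas should produce the same number.
print(inner_formula5(A, u, v), inner_formula6(A, u, v))
```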

EXAMPLE 4 Matrices Generating Weighted Euclidean Inner Products

The standard Euclidean and weighted Euclidean inner products are special cases of matrix inner products. The standard Euclidean inner product on $R^n$ is generated by the n × n identity matrix, since setting A = I in Formula (5) yields

$\langle \mathbf{u}, \mathbf{v} \rangle = I\mathbf{u} \cdot I\mathbf{v} = \mathbf{u} \cdot \mathbf{v}$

and the weighted Euclidean inner product

$\langle \mathbf{u}, \mathbf{v} \rangle = w_1 u_1 v_1 + w_2 u_2 v_2 + \cdots + w_n u_n v_n$

is generated by the matrix

$A = \begin{bmatrix} \sqrt{w_1} & 0 & \cdots & 0 \\ 0 & \sqrt{w_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{w_n} \end{bmatrix}$    (7)

This can be seen by observing that $A^T A$ is the n × n diagonal matrix whose diagonal entries are the weights $w_1, w_2, \ldots, w_n$.

EXAMPLE 5 Example 1 Revisited

The weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1 v_1 + 2u_2 v_2$ discussed in Example 1 is the inner product on $R^2$ generated by

$A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}$

Note: Every diagonal matrix with positive diagonal entries generates a weighted inner product. Why?

Other Examples of Inner Products

So far, we have only considered examples of inner products on $R^n$. We will now consider examples of inner products on some of the other kinds of vector spaces that we discussed earlier.

EXAMPLE 6 The Standard Inner Product on $M_{nn}$

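If $\mathbf{u} = U$ and $\mathbf{v} = V$ are matrices in the vector space $M_{nn}$, then the formula

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathrm{tr}(U^T V)$    (8)

defines an inner product on $M_{nn}$ called the standard inner product on that space (see Definition 8 of Section 1.3 for a definition of trace). This can be proved by confirming that the four inner product space axioms are satisfied, but we can see why this is so by computing (8) for the 2 × 2 matrices

$U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$

This yields

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathrm{tr}(U^T V) = u_1 v_1 + u_2 v_2 + u_3 v_3 + u_4 v_4$

which is just the dot product of the corresponding entries in the two matrices. And it follows from this that

$\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle} = \sqrt{\mathrm{tr}(U^T U)} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2}$

For example, if

$\mathbf{u} = U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{v} = V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$

then

$\langle \mathbf{u}, \mathbf{v} \rangle = \mathrm{tr}(U^T V) = 1(-1) + 2(0) + 3(3) + 4(2) = 16$

and

$\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle} = \sqrt{\mathrm{tr}(U^T U)} = \sqrt{1^2 + 2^2 + 3^2 + 4^2} = \sqrt{30}$

$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle} = \sqrt{\mathrm{tr}(V^T V)} = \sqrt{(-1)^2 + 0^2 + 3^2 + 2^2} = \sqrt{14}$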

EXAMPLE 7 The Standard Inner Product on $P_n$

If $\mathbf{p} = a_0 + a_1 x + \cdots + a_n x^n$ and $\mathbf{q} = b_0 + b_1 x + \cdots + b_n x^n$ are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the standard inner product on this space:

$\langle \mathbf{p}, \mathbf{q} \rangle = a_0 b_0 + a_1 b_1 + \cdots + a_n b_n$    (9)

The norm of a polynomial $\mathbf{p}$ relative to this inner product is

$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$
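Examples 6 and 7 both reduce to a dot product of corresponding entries or coefficients. The sketch below illustrates this with the 2 × 2 matrices of Example 6 and with two polynomials chosen here for illustration; the function names are assumptions, and polynomials are stored as coefficient lists [a0, a1, ..., an].

```python
import math


def trace_inner(U, V):
    """Standard inner product on M_nn: tr(U^T V).

    The j-th diagonal entry of U^T V is sum_i U[i][j] * V[i][j], so the
    trace is the sum of products of corresponding entries.
    """
    n = len(U)
    return sum(U[i][j] * V[i][j] for j in range(n) for i in range(n))


def poly_inner(a, b):
    """Standard inner product on P_n for coefficient lists a and b."""
    return sum(x * y for x, y in zip(a, b))


U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]
print(trace_inner(U, V))                  # inner product from Example 6
print(math.sqrt(trace_inner(U, U)))       # norm of U
print(math.sqrt(trace_inner(V, V)))       # norm of V

p = [1, 2, 0]    # 1 + 2x        (illustrative choice)
q = [3, -1, 1]   # 3 - x + x^2   (illustrative choice)
print(poly_inner(p, q))
print(math.sqrt(poly_inner(p, p)))
```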

EXAMPLE 8 The Evaluation Inner Product on $P_n$

If $\mathbf{p} = p(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $\mathbf{q} = q(x) = b_0 + b_1 x + \cdots + b_n x^n$ are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample points), then the formula

$\langle \mathbf{p}, \mathbf{q} \rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n)$    (10)

defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$. Algebraically, this can be viewed as the dot product in $R^{n+1}$ of the (n + 1)-tuples

$(p(x_0), p(x_1), \ldots, p(x_n))$ and $(q(x_0), q(x_1), \ldots, q(x_n))$

and hence the first three inner product axioms follow from properties of the dot product. The fourth inner product axiom follows from the fact that

$\langle \mathbf{p}, \mathbf{p} \rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \geq 0$

with equality holding if and only if

$p(x_0) = p(x_1) = \cdots = p(x_n) = 0$

But a nonzero polynomial of degree n or less can have at most n distinct roots, so a polynomial in $P_n$ that vanishes at all n + 1 sample points must be $\mathbf{p} = \mathbf{0}$, which proves that the fourth inner product axiom holds.

The norm of a polynomial $\mathbf{p}$ relative to the evaluation inner product is

$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2}$    (11)

EXAMPLE 9 Working with the Evaluation Inner Product

Let $P_2$ have the evaluation inner product at the points

$x_0 = -2$, $x_1 = 0$, and $x_2 = 2$

Compute $\langle \mathbf{p}, \mathbf{q} \rangle$ and $\|\mathbf{p}\|$ for the polynomials $\mathbf{p} = p(x) = x^2$ and $\mathbf{q} = q(x) = 1 + x$.

Solution It follows from (10) and (11) that

$\langle \mathbf{p}, \mathbf{q} \rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8$

$\|\mathbf{p}\| = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2} = \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2}$
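The computation in Example 9 can be reproduced with a few lines of Python. This is a sketch in which polynomials are represented as ordinary Python functions; the name `evaluation_inner` is an assumption made for illustration.

```python
import math


def evaluation_inner(p, q, points):
    """Evaluation inner product: sum of p(x_i) * q(x_i) over the sample points."""
    return sum(p(x) * q(x) for x in points)


def p(x):
    return x ** 2


def q(x):
    return 1 + x


points = (-2, 0, 2)

inner_pq = evaluation_inner(p, q, points)
norm_p = math.sqrt(evaluation_inner(p, p, points))

print(inner_pq)                    # 8
print(norm_p, 4 * math.sqrt(2))    # both approximately 5.657
```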

EXAMPLE 10 An Integral Inner Product on C[a, b] (Calculus Required)

Let $\mathbf{f} = f(x)$ and $\mathbf{g} = g(x)$ be two functions in C[a, b] and define

$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$    (12)

We will show that this formula defines an inner product on C[a, b] by verifying the four inner product axioms for functions $\mathbf{f} = f(x)$, $\mathbf{g} = g(x)$, and $\mathbf{h} = h(x)$ in C[a, b]:

Axiom 1: $\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle \mathbf{g}, \mathbf{f} \rangle$

Axiom 2: $\langle \mathbf{f} + \mathbf{g}, \mathbf{h} \rangle = \int_a^b (f(x) + g(x))h(x)\,dx = \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx = \langle \mathbf{f}, \mathbf{h} \rangle + \langle \mathbf{g}, \mathbf{h} \rangle$

Axiom 3: $\langle k\mathbf{f}, \mathbf{g} \rangle = \int_a^b kf(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle \mathbf{f}, \mathbf{g} \rangle$

Axiom 4: If $\mathbf{f} = f(x)$ is any function in C[a, b], then

$\langle \mathbf{f}, \mathbf{f} \rangle = \int_a^b f^2(x)\,dx \geq 0$    (13)

since $f^2(x) \geq 0$ for all x in the interval [a, b]. Moreover, because f is continuous on [a, b], the equality in Formula (13) holds if and only if the function f is identically zero on [a, b], that is, if and only if $\mathbf{f} = \mathbf{0}$; and this proves that Axiom 4 holds.

EXAMPLE 11 Norm of a Vector in C[a, b] (Calculus Required)

If C[a, b] has the inner product that was defined in Example 10, then the norm of a function $\mathbf{f} = f(x)$ relative to this inner product is

$\|\mathbf{f}\| = \langle \mathbf{f}, \mathbf{f} \rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx}$    (14)

and the unit sphere in this space consists of all functions f in C[a, b] that satisfy the equation

$\int_a^b f^2(x)\,dx = 1$

Remark Note that the vector space Pn is a subspace of C[a, b] because polynomials are continuous functions. Thus, Formula (12) defines an inner product on Pn that is different from both the standard inner product and the evaluation inner product.
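To see concretely that these inner products on $P_n$ can differ, the sketch below computes all three for the polynomials 1 and x^2 in P2. The interval [-1, 1] and the sample points -2, 0, 2 are choices made here for illustration, as are the helper names; polynomials are stored as coefficient lists [a0, a1, a2], and the integral is evaluated exactly with rational arithmetic.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Coefficient list of the product of two polynomials."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    return prod


def poly_eval(a, x):
    """Value of the polynomial with coefficient list a at x."""
    return sum(c * x ** k for k, c in enumerate(a))


def integral_inner(a, b, lo, hi):
    """Exact integral of p(x) q(x) over [lo, hi], using the power rule termwise."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(a, b)):
        total += Fraction(c) * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1)) / (k + 1)
    return total


def standard_inner(a, b):
    """Standard inner product on P_n: dot product of coefficients."""
    return sum(x * y for x, y in zip(a, b))


def evaluation_inner(a, b, points):
    """Evaluation inner product at the given sample points."""
    return sum(poly_eval(a, x) * poly_eval(b, x) for x in points)


p = [1, 0, 0]   # p(x) = 1
q = [0, 0, 1]   # q(x) = x^2

print(integral_inner(p, q, -1, 1))           # 2/3
print(standard_inner(p, q))                  # 0
print(evaluation_inner(p, q, (-2, 0, 2)))    # 8
```

The same pair of polynomials is orthogonal under the standard inner product but not under the other two, which is one way to see that the three inner products are genuinely different.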

WARNING Recall from calculus that the arc length of a curve y = f(x) over an interval [a, b] is given by the formula

$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx$    (15)

Do not confuse this concept of arc length with $\|\mathbf{f}\|$, which is the length (norm) of $\mathbf{f}$ when $\mathbf{f}$ is viewed as a vector in C[a, b]. Formulas (14) and (15) have different meanings.

Algebraic Properties of Inner Products

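The following theorem lists some of the algebraic properties of inner products that follow from the inner product axioms. This result is a generalization of Theorem 3.2.3, which applied only to the dot product on $R^n$.

THEOREM 6.1.2 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a real inner product space V, and if k is a scalar, then:
(a) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$
(b) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$
(c) $\langle \mathbf{u}, \mathbf{v} - \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle - \langle \mathbf{u}, \mathbf{w} \rangle$
(d) $\langle \mathbf{u} - \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle - \langle \mathbf{v}, \mathbf{w} \rangle$
(e) $k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

Proof We will prove part (b) and leave the proofs of the remaining parts as exercises.

$\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle$    [By symmetry]
$= \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle$    [By additivity]
$= \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$    [By symmetry]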

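The following example illustrates how Theorem 6.1.2 and the defining properties of inner products can be used to perform algebraic computations with inner products. As you read through the example, you will find it instructive to justify the steps.

EXAMPLE 12 Calculating with Inner Products

$\langle \mathbf{u} - 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle = \langle \mathbf{u}, 3\mathbf{u} + 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle$
$= \langle \mathbf{u}, 3\mathbf{u} \rangle + \langle \mathbf{u}, 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} \rangle - \langle 2\mathbf{v}, 4\mathbf{v} \rangle$
$= 3\langle \mathbf{u}, \mathbf{u} \rangle + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{v}, \mathbf{u} \rangle - 8\langle \mathbf{v}, \mathbf{v} \rangle$
$= 3\|\mathbf{u}\|^2 + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2$
$= 3\|\mathbf{u}\|^2 - 2\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2$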
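The identity derived in Example 12 holds in every real inner product space, so it can be spot-checked numerically in any particular one. The sketch below does this with the weighted Euclidean inner product 3u1v1 + 2u2v2 from Example 1; the sample vectors and helper names are illustrative assumptions, and a single numerical check is of course not a proof.

```python
def inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product 3*u1*v1 + 2*u2*v2 by default."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def combo(a, u, b, v):
    """Linear combination a*u + b*v of two vectors."""
    return tuple(a * x + b * y for x, y in zip(u, v))


u, v = (1, 2), (-3, 4)

# Left side: <u - 2v, 3u + 4v> computed directly.
lhs = inner(combo(1, u, -2, v), combo(3, u, 4, v))

# Right side: 3||u||^2 - 2<u, v> - 8||v||^2.
rhs = 3 * inner(u, u) - 2 * inner(u, v) - 8 * inner(v, v)

print(lhs, rhs)   # the two values agree
```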


Exercise Set 6.1 1. Let R 2 have the weighted Euclidean inner product

u, v = 2u1 v1 + 3u2 v2

16. x0 = −1, x1 = 0, x2 = 1, x3 = 2

and let u = (1, 1), v = (3, 2), w = (0, −1), and k = 3. Compute the stated quantities. (a) u, v

(b) k v, w

(c) u + v, w

(d) v

(e) d(u, v)

(f ) u − k v

2. Follow the directions of Exercise 1 using the weighted Euclidean inner product

u, v = 21 u1 v1 + 5u2 v2

3. A =

2

1

1

1



 4. A =



1

0

2

−1

In Exercises 5–6, find a matrix that generates the stated weighted inner product on R 2 . 5. u, v = 2u1 v1 + 3u2 v2

7. A =

4

1

2

−3



3 9. U = 4

8. A =



1 10. U = −3

−2

8



2 4 , V = 0 5

6 8



−2

3 4

21. U =

8



1 −3

−1

, V =

22. U =

3 1

1



2 4 , V = 0 5

6 8

In Exercises 23–24, let p = x + x 3 and q = 1 + x 2

3

23. x0 = −2, x1 = −1, x2 = 0, x3 = 1 24. x0 = −1, x1 = 0, x2 = 1, x3 = 2 In Exercises 25–26, find u and d(u, v) for the vectors u = (−1, 2) and v = (2, 5) relative to the inner product on R 2 generated by the matrix A.

11. p = −2 + x + 3x , q = 4 − 7x

 25. A =

2

12. p = −5 + 2x + x 2 , q = 3 + 2x − 4x 2 In Exercises 13–14, a weighted Euclidean inner product on R 2 is given for the vectors u = (u1 , u2 ) and v = (v1 , v2 ). Find a matrix that generates it. 13. u, v = 3u1 v1 + 5u2 v2

In Exercises 21–22, find U and d(U, V ) relative to the standard inner product on M22 .

−1

In Exercises 11–12, find the standard inner product on P2 of the given polynomials. 2

20. p = −5 + 2x + x 2 , q = 3 + 2x − 4x 2

Find p and d(p, q) relative to the evaluation inner product on P3 at the stated sample points.



3 1

1



In Exercises 19–20, find p and d(p, q) relative to the standard inner product on P2 .

1

−1

, V =

18. u = (−1, 2) and v = (2, 5)

2



In Exercises 9–10, compute the standard inner product on M22 of the given matrices.



17. u = (−3, 2) and v = (1, 7)

6. u, v = 21 u1 v1 + 5u2 v2

In Exercises 7–8, use the inner product on R 2 generated by the matrix A to find u, v for the vectors u = (0, −3) and v = (6, 2).



In Exercises 17–18, find u and d(u, v) relative to the weighted Euclidean inner product u, v = 2u1 v1 + 3u2 v2 on R 2 .

19. p = −2 + x + 3x 2 , q = 4 − 7x 2

In Exercises 3–4, compute the quantities in parts (a)–(f) of Exercise 1 using the inner product on R 2 generated by A.



15. x0 = −2, x1 = −1, x2 = 0, x3 = 1

14. u, v = 4u1 v1 + 6u2 v2

In Exercises 15–16, a sequence of sample points is given. Use the evaluation inner product on P3 at those sample points to find p, q for the polynomials p = x + x 3 and q = 1 + x 2



4

0

3

5

 26. A =



1

2

−1

3

In Exercises 27–28, suppose that u, v, and w are vectors in an inner product space such that

u, v = 2, v, w = −6, u, w = −3

u = 1,

v = 2,

w = 7

Evaluate the given expression. 27. (a) 2v − w, 3u + 2w

(b) u + v

28. (a) u − v − 2w, 4u + v

(b) 2w − v

In Exercises 29–30, sketch the unit circle in R 2 using the given inner product. 29. u, v = 41 u1 v1 +

1 uv 16 2 2

30. u, v = 2u1 v1 + u2 v2


In Exercises 31–32, find a weighted Euclidean inner product on R 2 for which the “unit circle” is the ellipse shown in the accompanying figure. y

31.

(b) What conditions must k1 and k2 satisfy for u, v = k1 u1 v1 + k2 u2 v2 to define an inner product on R 2 ? Justify your answer.

y

32.

1

1

x

x 3 4

3

43. (a) Let u = (u1 , u2 ) and v = (v1 , v2 ). Prove that u, v = 3u1 v1 + 5u2 v2 defines an inner product on R 2 by showing that the inner product axioms hold.

44. Prove that the following identity holds for vectors in any inner product space.

u, v = 41 u + v 2 − 41 u − v 2 Figure Ex-31

Figure Ex-31

In Exercises 33–34, let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ). Show that the expression does not define an inner product on R 3 , and list all inner product axioms that fail to hold. 33. u, v =

u21 v12

+

u22 v22

+

u23 v32

34. u, v = u1 v1 − u2 v2 + u3 v3 In Exercises 35–36, suppose that u and v are vectors in an inner product space. Rewrite the given expression in terms of u, v,

u 2 , and v 2 . 35. 2v − 4u, u − 3v

36. 5u + 6v, 4v − 3u

37. (Calculus required ) Let the vector space P2 have the inner product



p, q =

1

p(x)q(x) dx −1

Find the following for p = 1 and q = x 2 . (a) p, q

(b) d(p, q)

(c) p

(d) q

38. (Calculus required ) Let the vector space P3 have the inner product



p, q =

1

45. Prove that the following identity holds for vectors in any inner product space.

u + v 2 + u − v 2 = 2 u 2 + 2 v 2 46. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex inner product on a complex vector space V is identical to that in Definition 1 except that scalars are allowed to be complex numbers, and Axiom 1 is replaced by u, v = v, u. The remaining axioms are unchanged. A complex vector space with a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space, then u, k v = ku, v. 47. Prove that Formula (5) defines an inner product on R n . 48. (a) Prove that if v is a fixed vector in a real inner product space V , then the mapping T : V →R defined by T (x) = x, v is a linear transformation. (b) Let V = R 3 have the Euclidean inner product, and let v = (1, 0, 2). Compute T (1, 1, 1). (c) Let V = P2 have the standard inner product, and let v = 1 + x . Compute T (x + x 2 ). (d) Let V = P2 have the evaluation inner product at the points x0 = 1, x1 = 0, x2 = −1, and let v = 1 + x . Compute T (x + x 2 ).

p(x)q(x) dx −1

Find the following for p = 2x 3 and q = 1 − x 3 . (a) p, q

(b) d(p, q)

(c) p

(d) q

True-False Exercises TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer. (a) The dot product on R 2 is an example of a weighted inner product.

(Calculus required) In Exercises 39–40, use the inner product



1

f, g =

f (x)g(x)dx 0

on C[0, 1] to compute f, g. 39. f = cos 2π x, g = sin 2πx 40. f = x, g = ex

Working with Proofs 41. Prove parts (a) and (b) of Theorem 6.1.1. 42. Prove parts (c) and (d) of Theorem 6.1.1.

(b) The inner product of two vectors cannot be a negative real number. (c) u, v + w = v, u + w, u. (d) k u, k v = k 2 u, v. (e) If u, v = 0, then u = 0 or v = 0. (f ) If v 2 = 0, then v = 0. (g) If A is an n × n matrix, then u, v = Au · Av defines an inner product on R n .

Working with Technology

and let

T1. (a) Confirm that the following matrix generates an inner product. ⎡ ⎤ 5 8 6 −13

⎢3 ⎢ A=⎢ ⎣0

−1

0

1

−1

2

4

3

−9 ⎥ ⎥ ⎥ 0⎦

(a) Compute p, q, p , and q . (b) Verify that the identities in Exercises 44 and 45 hold for the vectors p and q.

−5

(b) For the following vectors, use the inner product in part (a) to compute u, v, first by Formula (5) and then by Formula (6). ⎡ ⎤ ⎡ ⎤ 1 0

⎢ 1⎥ ⎢−2⎥ ⎢ ⎥ ⎢ ⎥ u = ⎢ ⎥ and v = ⎢ ⎥ ⎣−1⎦ ⎣ 0⎦ 3

p = p(x) = x + x 3 and q = q(x) = 1 + x 2 + x 4

T3. Let the vector space M33 have the standard inner product and let



1

−2

u = U = ⎣−2 3













2

−1

4

1⎦ and v = V = ⎣1

4

3⎦

1

0

1

0

2

3

0



(a) Use Formula (8) to compute u, v, u , and v .

2

T2. Let the vector space P4 have the evaluation inner product at the points −2, −1, 0, 1, 2

(b) Verify that the identities in Exercises 44 and 45 hold for the vectors u and v.

6.2 Angle and Orthogonality in Inner Product Spaces

In Section 3.2 we defined the notion of “angle” between vectors in $R^n$. In this section we will extend this idea to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the groundwork for a variety of new applications.

Cauchy–Schwarz Inequality

Recall from Formula (20) of Section 3.2 that the angle θ between two vectors $\mathbf{u}$ and $\mathbf{v}$ in $R^n$ is

$\theta = \cos^{-1}\left(\dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$    (1)

We were assured that this formula was valid because it followed from the Cauchy–Schwarz inequality (Theorem 3.2.4) that

$-1 \leq \dfrac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \leq 1$    (2)

as required for the inverse cosine to be defined. The following generalization of the Cauchy–Schwarz inequality will enable us to define the angle between two vectors in any real inner product space.

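THEOREM 6.2.1 Cauchy–Schwarz Inequality

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a real inner product space V, then

$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|$    (3)

Proof We warn you in advance that the proof presented here depends on a clever trick that is not easy to motivate. In the case where $\mathbf{u} = \mathbf{0}$ the two sides of (3) are equal since $\langle \mathbf{u}, \mathbf{v} \rangle$ and $\|\mathbf{u}\|$ are both zero. Thus, we need only consider the case where $\mathbf{u} \neq \mathbf{0}$. Making this assumption, let

$a = \langle \mathbf{u}, \mathbf{u} \rangle, \quad b = 2\langle \mathbf{u}, \mathbf{v} \rangle, \quad c = \langle \mathbf{v}, \mathbf{v} \rangle$

and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is nonnegative, it follows that

$0 \leq \langle t\mathbf{u} + \mathbf{v}, t\mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle t^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle t + \langle \mathbf{v}, \mathbf{v} \rangle = at^2 + bt + c$

This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \leq 0$. Expressing the coefficients a, b, and c in terms of the vectors $\mathbf{u}$ and $\mathbf{v}$ gives

$4\langle \mathbf{u}, \mathbf{v} \rangle^2 - 4\langle \mathbf{u}, \mathbf{u} \rangle \langle \mathbf{v}, \mathbf{v} \rangle \leq 0$

or, equivalently,

$\langle \mathbf{u}, \mathbf{v} \rangle^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \langle \mathbf{v}, \mathbf{v} \rangle$

Taking square roots of both sides and using the fact that $\langle \mathbf{u}, \mathbf{u} \rangle$ and $\langle \mathbf{v}, \mathbf{v} \rangle$ are nonnegative yields

$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} \langle \mathbf{v}, \mathbf{v} \rangle^{1/2}$ or equivalently $|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|$

which completes the proof.

The following two alternative forms of the Cauchy–Schwarz inequality are useful to know:

$\langle \mathbf{u}, \mathbf{v} \rangle^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \langle \mathbf{v}, \mathbf{v} \rangle$    (4)

$\langle \mathbf{u}, \mathbf{v} \rangle^2 \leq \|\mathbf{u}\|^2 \|\mathbf{v}\|^2$    (5)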

The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the first.
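As a numerical illustration of the proof, the sketch below computes the coefficients a, b, c of the quadratic for one pair of vectors, checks that the discriminant is nonpositive, and confirms form (4) of the inequality. The weighted inner product 3u1v1 + 2u2v2 and the sample vectors are assumptions chosen for illustration.

```python
def inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product 3*u1*v1 + 2*u2*v2 by default."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def quadratic_coefficients(u, v):
    """Coefficients a, b, c of <tu + v, tu + v> = a t^2 + b t + c."""
    return inner(u, u), 2 * inner(u, v), inner(v, v)


u, v = (1, 2), (-3, 4)

a, b, c = quadratic_coefficients(u, v)
discriminant = b * b - 4 * a * c

print(a, b, c)              # 11 14 59
print(discriminant <= 0)    # True: the quadratic never changes sign

# Form (4) of the inequality: <u, v>^2 <= <u, u><v, v>.
print(inner(u, v) ** 2 <= inner(u, u) * inner(v, v))   # True
```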

Angle Between Vectors

Our next goal is to define what is meant by the “angle” between vectors in a real inner product space. As a first step, we leave it as an exercise for you to use the Cauchy–Schwarz inequality to show that

$-1 \leq \dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \leq 1$    (6)

This being the case, there is a unique angle θ in radian measure for which

$\cos\theta = \dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$ and $0 \leq \theta \leq \pi$    (7)

(Figure 6.2.1). This enables us to define the angle θ between $\mathbf{u}$ and $\mathbf{v}$ to be

$\theta = \cos^{-1}\left(\dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$    (8)

[Figure 6.2.1]
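Formula (8) translates directly into code. The following sketch computes the angle between two vectors of R^2 under a weighted Euclidean inner product; the vectors, the weights, and the function names are illustrative assumptions. The cosine is clamped to [-1, 1] only to guard against floating-point rounding.

```python
import math


def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product with the given positive weights."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def angle(u, v, weights):
    """Angle between nonzero vectors u and v, following Formula (8)."""
    cos_theta = weighted_inner(u, v, weights) / math.sqrt(
        weighted_inner(u, u, weights) * weighted_inner(v, v, weights)
    )
    cos_theta = max(-1.0, min(1.0, cos_theta))   # rounding guard
    return math.acos(cos_theta)


u, v = (1, 0), (1, 1)

theta_euclidean = angle(u, v, (1, 1))   # standard Euclidean inner product
theta_weighted = angle(u, v, (3, 2))    # weights 3 and 2

print(theta_euclidean, math.pi / 4)     # the familiar 45-degree angle
print(theta_weighted)                   # a different angle for the same vectors
```

Changing the inner product changes the angle between the same two vectors, just as it changes norms and distances.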

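EXAMPLE 1 Cosine of the Angle Between Vectors in $M_{22}$

Let $M_{22}$ have the standard inner product. Find the cosine of the angle between the vectors

$\mathbf{u} = U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{v} = V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$

Solution We showed in Example 6 of the previous section that

$\langle \mathbf{u}, \mathbf{v} \rangle = 16, \quad \|\mathbf{u}\| = \sqrt{30}, \quad \|\mathbf{v}\| = \sqrt{14}$

from which it follows that

$\cos\theta = \dfrac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = \dfrac{16}{\sqrt{30}\sqrt{14}} \approx 0.78$

Properties of Length and Distance in General Inner Product Spaces

In Section 3.2 we used the dot product to extend the notions of length and distance to $R^n$, and we showed that various basic geometry theorems remained valid (see Theorems 3.2.5, 3.2.6, and 3.2.7). By making only minor adjustments to the proofs of those theorems, one can show that they remain valid in any real inner product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities).

THEOREM 6.2.2 If $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a real inner product space V, and if k is any scalar, then:
(a) $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$    [Triangle inequality for vectors]
(b) $d(\mathbf{u}, \mathbf{v}) \leq d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$    [Triangle inequality for distances]

Proof (a)

$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle$
$= \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle$
$\leq \langle \mathbf{u}, \mathbf{u} \rangle + 2|\langle \mathbf{u}, \mathbf{v} \rangle| + \langle \mathbf{v}, \mathbf{v} \rangle$    [Property of absolute value]
$\leq \langle \mathbf{u}, \mathbf{u} \rangle + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \langle \mathbf{v}, \mathbf{v} \rangle$    [By (3)]
$= \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2$

Taking square roots gives $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$.

Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.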

Orthogonality

Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining whether the angle between vectors is π/2. You should be able to see from Formula (8) that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors, then the angle between them is θ = π/2 if and only if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. Accordingly, we make the following definition, which is a generalization of Definition 1 in Section 3.3 and is applicable even if one or both of the vectors is zero.

DEFINITION 1 Two vectors $\mathbf{u}$ and $\mathbf{v}$ in an inner product space V are called orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.

As the following example shows, orthogonality depends on the inner product in the sense that for different inner products two vectors can be orthogonal with respect to one but not the other.

EXAMPLE 2 Orthogonality Depends on the Inner Product

The vectors $\mathbf{u} = (1, 1)$ and $\mathbf{v} = (1, -1)$ are orthogonal with respect to the Euclidean inner product on $R^2$ since

$\mathbf{u} \cdot \mathbf{v} = (1)(1) + (1)(-1) = 0$

However, they are not orthogonal with respect to the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1 v_1 + 2u_2 v_2$ since

$\langle \mathbf{u}, \mathbf{v} \rangle = 3(1)(1) + 2(1)(-1) = 1 \neq 0$

EXAMPLE 3 Orthogonal Vectors in $M_{22}$

If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices

$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$

are orthogonal since

$\langle U, V \rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0$

EXAMPLE 4 Orthogonal Vectors in $P_2$ (Calculus Required)

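Let $P_2$ have the inner product

$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$

and let $\mathbf{p} = x$ and $\mathbf{q} = x^2$. Then

$\|\mathbf{p}\| = \langle \mathbf{p}, \mathbf{p} \rangle^{1/2} = \left[\int_{-1}^{1} x \cdot x\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^2\,dx\right]^{1/2} = \sqrt{\dfrac{2}{3}}$

$\|\mathbf{q}\| = \langle \mathbf{q}, \mathbf{q} \rangle^{1/2} = \left[\int_{-1}^{1} x^2 \cdot x^2\,dx\right]^{1/2} = \left[\int_{-1}^{1} x^4\,dx\right]^{1/2} = \sqrt{\dfrac{2}{5}}$

$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} x \cdot x^2\,dx = \int_{-1}^{1} x^3\,dx = 0$

Because $\langle \mathbf{p}, \mathbf{q} \rangle = 0$, the vectors $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal relative to the given inner product.

In Theorem 3.3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem extends this result to vectors in any real inner product space.

THEOREM 6.2.3 Generalized Theorem of Pythagoras

If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal vectors in a real inner product space, then

$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$

Proof The orthogonality of $\mathbf{u}$ and $\mathbf{v}$ implies that $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, so

$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$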
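The theorem can be checked in a space other than $R^n$. The sketch below uses the standard inner product on M22 and the matrices U = [[1, 0], [1, 1]] and V = [[0, 2], [0, 0]], as read from Example 3 above; the helper names are assumptions made for illustration.

```python
def trace_inner(U, V):
    """Standard inner product on M_nn: sum of products of corresponding entries."""
    return sum(a * b for row_u, row_v in zip(U, V) for a, b in zip(row_u, row_v))


def mat_add(U, V):
    """Entrywise sum of two matrices stored as lists of rows."""
    return [[a + b for a, b in zip(row_u, row_v)] for row_u, row_v in zip(U, V)]


U = [[1, 0], [1, 1]]
V = [[0, 2], [0, 0]]

# U and V are orthogonal: their inner product is zero.
print(trace_inner(U, V))

# Squared norms: ||U + V||^2 on the left, ||U||^2 + ||V||^2 on the right.
S = mat_add(U, V)
left = trace_inner(S, S)
right = trace_inner(U, U) + trace_inner(V, V)
print(left, right)   # both equal 7
```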

EXAMPLE 5 Theorem of Pythagoras in $P_2$ (Calculus Required)

In Example 4 we showed that $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal with respect to the inner product

$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$

on $P_2$. It follows from Theorem 6.2.3 that

$\|\mathbf{p} + \mathbf{q}\|^2 = \|\mathbf{p}\|^2 + \|\mathbf{q}\|^2$

Thus, from the computations in Example 4, we have

$\|\mathbf{p} + \mathbf{q}\|^2 = \left(\sqrt{\dfrac{2}{3}}\right)^2 + \left(\sqrt{\dfrac{2}{5}}\right)^2 = \dfrac{2}{3} + \dfrac{2}{5} = \dfrac{16}{15}$

We can check this result by direct integration:

$\|\mathbf{p} + \mathbf{q}\|^2 = \langle \mathbf{p} + \mathbf{q}, \mathbf{p} + \mathbf{q} \rangle = \int_{-1}^{1} (x + x^2)(x + x^2)\,dx$
$= \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \dfrac{2}{3} + 0 + \dfrac{2}{5} = \dfrac{16}{15}$

Orthogonal Complements

In Section 4.8 we defined the notion of an orthogonal complement for subspaces of $R^n$, and we used that definition to establish a geometric link between the fundamental spaces of a matrix. The following definition extends that idea to general inner product spaces.

DEFINITION 2 If W is a subspace of a real inner product space V, then the set of all vectors in V that are orthogonal to every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.

In Theorem 4.8.6 we stated three properties of orthogonal complements in $R^n$. The following theorem generalizes parts (a) and (b) of that theorem to general real inner product spaces.

THEOREM 6.2.4 If W is a subspace of a real inner product space V, then:
(a) $W^\perp$ is a subspace of V.
(b) $W \cap W^\perp = \{\mathbf{0}\}$.

Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle \mathbf{0}, \mathbf{w} \rangle = 0$ for every vector $\mathbf{w}$ in W. Thus, it remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W^\perp$, so that for every vector $\mathbf{w}$ in W we have $\langle \mathbf{u}, \mathbf{w} \rangle = 0$ and $\langle \mathbf{v}, \mathbf{w} \rangle = 0$. It follows from the additivity and homogeneity axioms of inner products that

$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle = 0 + 0 = 0$
$\langle k\mathbf{u}, \mathbf{w} \rangle = k\langle \mathbf{u}, \mathbf{w} \rangle = k(0) = 0$

which proves that $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ are in $W^\perp$.

Proof (b) If $\mathbf{v}$ is any vector in both W and $W^\perp$, then $\mathbf{v}$ is orthogonal to itself; that is, $\langle \mathbf{v}, \mathbf{v} \rangle = 0$. It follows from the positivity axiom for inner products that $\mathbf{v} = \mathbf{0}$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.6. Note, however, that this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 4.8.6 does not have this restriction.

THEOREM 6.2.5 If W is a subspace of a real finite-dimensional inner product space V, then the orthogonal complement of $W^\perp$ is W; that is,

$(W^\perp)^\perp = W$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs, each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in $W^\perp$ and conversely.

In our study of the fundamental spaces of a matrix in Section 4.8 we showed that the row space and null space of a matrix are orthogonal complements with respect to the Euclidean inner product on $R^n$ (Theorem 4.8.7). The following example takes advantage of that fact.

EXAMPLE 6 Basis for an Orthogonal Complement

Let W be the subspace of $R^6$ spanned by the vectors

$\mathbf{w}_1 = (1, 3, -2, 0, 2, 0)$, $\mathbf{w}_2 = (2, 6, -5, -2, 4, -3)$,
$\mathbf{w}_3 = (0, 0, 5, 10, 0, 15)$, $\mathbf{w}_4 = (2, 6, 0, 8, 4, 18)$

Find a basis for the orthogonal complement of W.

Solution The subspace W is the same as the row space of the matrix

$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$

Since the row space and null space of A are orthogonal complements, our problem reduces to finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$

form a basis for this null space. Expressing these vectors in comma-delimited form (to match that of $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$), we obtain the basis vectors

$\mathbf{v}_1 = (-3, 1, 0, 0, 0, 0)$, $\mathbf{v}_2 = (-4, 0, -2, 1, 0, 0)$, $\mathbf{v}_3 = (-2, 0, 0, 0, 1, 0)$

You may want to check that these vectors are orthogonal to $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$ by computing the necessary dot products.
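The check suggested at the end of Example 6 amounts to twelve dot products, which can be sketched in a few lines of Python. The variable names below are illustrative; the vectors are those given in the example.

```python
def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


# Spanning vectors of W (the rows of A).
W = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

# Proposed basis for the orthogonal complement (the null space of A).
V = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]

# One row per basis vector v, one column per spanning vector w.
products = [[dot(v, w) for w in W] for v in V]
is_orthogonal = all(entry == 0 for row in products for entry in row)

for row in products:
    print(row)
print(is_orthogonal)   # True when every dot product is zero
```

Because every vector of W is a linear combination of the four spanning vectors, orthogonality to those four is enough to conclude orthogonality to all of W.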


Exercise Set 6.2 In Exercises 1–2, find the cosine of the angle between the vectors with respect to the Euclidean inner product. 1. (a) u = (1, −3), v = (2, 4) (b) u = (−1, 5, 2), v = (2, 4, −9) (c) u = (1, 0, 1, 0), v = (−3, −3, −3, −3) 2. (a) u = (−1, 0), v = (3, 8) (b) u = (4, 1, 8), v = (1, 0, −3)

p1 = 2 + kx + 6x 2 , p2 = l + 5x + 3x 2 , p3 = 1 + 2x + 3x 2

In Exercises 3–4, find the cosine of the angle between the vectors with respect to the standard inner product on P2 . 3. p = −1 + 5x + 2x , q = 2 + 4x − 9x

16. Let R 4 have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the vectors u = (2, 1, −4, 0), v = (−1, −1, 2, 2), and w = (3, 2, 5, 4). 17. Do there exist scalars k and l such that the vectors

(c) u = (2, 1, 7, −1), v = (4, 0, 0, 0)

2

15. If the vectors u = (1, 2) and v = (2, −4) are orthogonal with respect to the weighted Euclidean inner product u, v = w1 u1 v1 + w2 u2 v2 , what must be true of the weights w1 and w2 ?

2

are mutually orthogonal with respect to the standard inner product on P2 ? 18. Show that the vectors

4. p = x − x 2 , q = 7 + 3x + 3x 2

 

u=

In Exercises 5–6, find the cosine of the angle between A and B with respect to the standard inner product on M22 .



5. A =

2 1



2 6. A = −1





6 3 , B= 1 −3

2 0

4 −3 , B= 4 3

1 2

3 3

 and v =

5



−8

are orthogonal with respect to the inner product on R 2 that is generated by the matrix



A=

In Exercises 7–8, determine whether the vectors are orthogonal with respect to the Euclidean inner product. 7. (a) u = (−1, 3, 2), v = (4, 2, −1) (b) u = (−2, −2, −2), v = (1, 1, 1) (c) u = (a, b), v = (−b, a) 8. (a) u = (u1 , u2 , u3 ), v = (0, 0, 0) (b) u = (−4, 6, −10, 1), v = (2, 1, −2, 9) (c) u = (a, b, c), v = (−c, 0, a) In Exercises 9–10, show that the vectors are orthogonal with respect to the standard inner product on P2 . 9. p = −1 − x + 2x , q = 2x + x 2

2

1

1

1



[See Formulas (5) and (6) of Section 6.1.] 19. Let P2 have the evaluation inner product at the points

x0 = −2, x1 = 0, x2 = 2 Show that the vectors p = x and q = x 2 are orthogonal with respect to this inner product. 20. Let M22 have the standard inner product. Determine whether the matrix A is in the subspace spanned by the matrices U and V .



A=

−1

1

0

2





, U=

1

 −1

3

0



, V =

4

0

9

2



2

10. p = 2 − 3x + x 2 , q = 4 + 2x − 2x 2

In Exercises 21–24, confirm that the Cauchy–Schwarz inequality holds for the given vectors using the stated inner product.

In Exercises 11–12, show that the matrices are orthogonal with respect to the standard inner product on M22 .

21. u = (1, 0, 3), v = (2, 1, −1) using the weighted Euclidean inner product u, v = 2u1 v1 + 3u2 v2 + u3 v3 in R 3 .



2 11. U = −1

12. U =

5 2



1 −3 , V = 0 3



−1 1 , V = −2 −1

0 2

3 0

In Exercises 13–14, show that the vectors are not orthogonal with respect to the Euclidean inner product on R 2 , and then find a value of k for which the vectors are orthogonal with respect to the weighted Euclidean inner product u, v = 2u1 v1 + ku2 v2 . 13. u = (1, 3), v = (2, −1)



22. U =

14. u = (2, −4), v = (0, 3)

−1 6

2 1



and V =

1 3

0 3

using the standard inner product on M22 . 23. p = −1 + 2x + x 2 and q = 2 − 4x 2 using the standard inner product on P2 . 24. The vectors

u=

1 1

and v =

1 −1

with respect to the inner product in Exercise 18.


Chapter 6 Inner Product Spaces

25. Let R 4 have the Euclidean inner product, and let u = (−1, 1, 0, 2). Determine whether the vector u is orthogonal to the subspace spanned by the vectors w1 = (1, −1, 3, 0) and w2 = (4, 0, 9, 2). 26. Let P3 have the standard inner product, and let p = −1 − x + 2x 2 + 4x 3 Determine whether p is orthogonal to the subspace spanned by the polynomials w1 = 2 − x 2 + x 3 and w2 = 4x − 2x 2 + 2x 3 . In Exercises 27–28, find a basis for the orthogonal complement of the subspace of R n spanned by the vectors.

35. (Calculus required ) Let C[0, 1] have the inner product in Exercise 31. (a) Show that the vectors p = p(x) = 1 and q = q(x) =

In Exercises 29–30, assume that R n has the Euclidean inner product. 29. (a) Let W be the line in R 2 with equation y = 2x . Find an equation for W ⊥ . (b) Let W be the plane in R 3 with equation x − 2y − 3z = 0. Find parametric equations for W ⊥ . 30. (a) Let W be the y -axis in an xyz-coordinate system in R 3 . Describe the subspace W ⊥ .

are orthogonal.

36. (Calculus required) Let $C[-1, 1]$ have the inner product in Exercise 33.
(a) Show that the vectors $\mathbf{p} = p(x) = x$ and $\mathbf{q} = q(x) = x^2 - 1$ are orthogonal.
(b) Show that the vectors in part (a) satisfy the Theorem of Pythagoras.

37. Let $V$ be an inner product space. Show that if $\mathbf{u}$ and $\mathbf{v}$ are orthogonal unit vectors in $V$, then $\|\mathbf{u} - \mathbf{v}\| = \sqrt{2}$.

38. Let $V$ be an inner product space. Show that if $\mathbf{w}$ is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$, then it is orthogonal to $k_1\mathbf{u}_1 + k_2\mathbf{u}_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where $V$ is $R^3$ with the Euclidean inner product.

39. (Calculus required) Let $C[0, \pi]$ have the inner product



(b) Let W be the yz-plane of an xyz-coordinate system in R 3 . Describe the subspace W ⊥ . 31. (Calculus required ) Let C[0, 1] have the integral inner product



1

p, q =

p(x)q(x) dx 0

and let p = p(x) = x and q = q(x) = x 2 . (a) Find p, q. (b) Find p and q .

f(x)g(x) dx 0

and let $\mathbf{f}_n = \cos nx$ $(n = 0, 1, 2, \ldots)$. Show that if $k \neq l$, then $\mathbf{f}_k$ and $\mathbf{f}_l$ are orthogonal vectors.

40. As illustrated in the accompanying figure, the vectors $\mathbf{u} = (1, \sqrt{3})$ and $\mathbf{v} = (-1, \sqrt{3})$ have norm 2 and an angle of $60^\circ$ between them relative to the Euclidean inner product. Find a weighted Euclidean inner product with respect to which $\mathbf{u}$ and $\mathbf{v}$ are orthogonal unit vectors.

(b) Find the distance between the vectors p and q in Exercise 31.

v

y

(1, √3)

60° u

x 2

33. (Calculus required ) Let C[−1, 1] have the integral inner product



π

f, g =

32. (a) Find the cosine of the angle between the vectors p and q in Exercise 31.

p, q =

−x

(b) Show that the vectors in part (a) satisfy the Theorem of Pythagoras.

27. v1 = (1, 4, 5, 2), v2 = (2, 1, 3, 0), v3 = (−1, 3, 2, 2) 28. v1 = (1, 4, 5, 6, 9), v2 = (3, −2, 1, 4, −1), v3 = (−1, 0, −1, −2, −1), v4 = (2, 3, 5, 7, 8)

1 2

1

Figure Ex-40

p(x)q(x) dx −1

and let p = p(x) = x 2 − x and q = q(x) = x + 1. (a) Find p, q. (b) Find p and q . 34. (a) Find the cosine of the angle between the vectors p and q in Exercise 33. (b) Find the distance between the vectors p and q in Exercise 33.

Working with Proofs 41. Let V be an inner product space. Prove that if w is orthogonal to each of the vectors u1 , u2 , . . . , ur , then it is orthogonal to every vector in span{u1 , u2 , . . . , ur }. 42. Let {v1 , v2 , . . . , vr } be a basis for an inner product space V . Prove that the zero vector is the only vector in V that is orthogonal to all of the basis vectors.

6.2 Angle and Orthogonality in Inner Product Spaces

43. Let {w1 , w2 , . . . , wk } be a basis for a subspace W of V . Prove that W ⊥ consists of all vectors in V that are orthogonal to every basis vector. 44. Prove the following generalization of Theorem 6.2.3: If v1 , v2 , . . . , vr are pairwise orthogonal vectors in an inner product space V, then


(a) Assuming that P2 has the standard inner product, find all vectors q in P2 such that p, q = T (p), T (q). (b) Assuming that P2 has the evaluation inner product at the points x0 = −1, x1 = 0, x2 = 1, find all vectors q in P2 such that p, q = T (p), T (q).

True-False Exercises

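\[ \|\mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_r\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 + \cdots + \|\mathbf{v}_r\|^2 \]

45. Prove: If $\mathbf{u}$ and $\mathbf{v}$ are $n \times 1$ matrices and $A$ is an $n \times n$ matrix, then
\[ (\mathbf{v}^T A^T A \mathbf{u})^2 \le (\mathbf{u}^T A^T A \mathbf{u})(\mathbf{v}^T A^T A \mathbf{v}) \]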

TF. In parts (a)–(f ) determine whether the statement is true or false, and justify your answer.

46. Use the Cauchy–Schwarz inequality to prove that for all real values of a , b, and θ ,

(b) If u is a vector in both W and W ⊥ , then u = 0.

(a cos θ + b sin θ )2 ≤ a 2 + b2

(c) If u and v are vectors in W ⊥ , then u + v is in W ⊥ .

47. Prove: If $w_1, w_2, \ldots, w_n$ are positive real numbers, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are any two vectors in $R^n$, then
\[ |w_1 u_1 v_1 + w_2 u_2 v_2 + \cdots + w_n u_n v_n| \le (w_1 u_1^2 + w_2 u_2^2 + \cdots + w_n u_n^2)^{1/2} (w_1 v_1^2 + w_2 v_2^2 + \cdots + w_n v_n^2)^{1/2} \]

48. Prove that equality holds in the Cauchy–Schwarz inequality if and only if u and v are linearly dependent. 49. (Calculus required ) Let f(x) and g(x) be continuous functions on [0, 1]. Prove:



2

1

f(x)g(x) dx

(a)



(b)

g 2 (x) dx

0 1

(d) If u is a vector in W ⊥ and k is a real number, then k u is in W ⊥ . (e) If u and v are orthogonal, then |u, v| = u

v . (f ) If u and v are orthogonal, then u + v = u + v .

Working withTechnology T1. (a) We know that the row space and null space of a matrix are orthogonal complements relative to the Euclidean inner product. Confirm this fact for the matrix



0

1/2 [f(x) + g(x)]2 dx



1

f 2 (x) dx

0





1



(a) If u is orthogonal to every vector of a subspace W , then u = 0.

 ≤

1/2

1

f 2 (x) dx

0

0

 +

1/2

1

g 2 (x) dx

2

−1

3

5



⎢4 ⎢ ⎢ A=⎢ ⎢3 ⎢ ⎣4

−3

1

−2

3

−1

15

⎥ ⎥ 4⎥ ⎥ ⎥ 17⎦

7

−6

−7

0

0

3⎥

[Hint: Use the Cauchy–Schwarz inequality.] 50. Prove that Formula (4) holds for all nonzero vectors u and v in a real inner product space V . 51. Let TA : R 2 →R 2 be multiplication by



A=

1

1

−1

1



and let x = (1, 1). 2

(a) Assuming that R has the Euclidean inner product, find all vectors v in R 2 such that x, v = TA (x), TA (v). (b) Assuming that R 2 has the weighted Euclidean inner product u, v = 2u1 v1 + 3u2 v2 , find all vectors v in R 2 such that x, v = TA (x), TA (v). 52. Let T : P2 →P2 be the linear transformation defined by

T (a + bx + cx 2 ) = 3a − cx 2 and let p = 1 + x .

(b) Find a basis for the orthogonal complement of the column space of A. T2. In each part, confirm that the vectors u and v satisfy the Cauchy–Schwarz inequality relative to the stated inner product. (a) M44 with the standard inner product.



1

0

2

⎢0 ⎢ u=⎢ ⎣3

−1

0

0

0

0

4

−3



0





2

2

1

3

⎢ 3 ⎢ ⎥ and v = ⎢ ⎣ 1 2⎦

−1

0

1⎥ ⎥

0

0

⎥ −2 ⎦

−3

1

2

0

1⎥ ⎥ 0

(b) $R^4$ with the weighted Euclidean inner product with weights $w_1 = \tfrac{1}{2}$, $w_2 = \tfrac{1}{4}$, $w_3 = \tfrac{1}{8}$, $w_4 = \tfrac{1}{8}$.
\[ \mathbf{u} = (1, -2, 2, 1) \quad \text{and} \quad \mathbf{v} = (0, -3, 3, -2) \]


6.3 Gram–Schmidt Process; QR-Decomposition

In many problems involving vector spaces, the problem solver is free to choose any basis for the vector space that seems appropriate. In inner product spaces, the solution of a problem can often be simplified by choosing a basis in which the vectors are orthogonal to one another. In this section we will show how such bases can be obtained.

Orthogonal and Orthonormal Sets

Recall from Section 6.2 that two vectors in an inner product space are said to be orthogonal if their inner product is zero. The following definition extends the notion of orthogonality to sets of vectors in an inner product space.

DEFINITION 1 A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be orthonormal.

EXAMPLE 1 An Orthogonal Set in $R^3$

Let
\[ \mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = (1, 0, 1), \quad \mathbf{v}_3 = (1, 0, -1) \]
and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is orthogonal since $\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \langle \mathbf{v}_1, \mathbf{v}_3 \rangle = \langle \mathbf{v}_2, \mathbf{v}_3 \rangle = 0$.

Note that Formula (1) is identical to Formula (4) of Section 3.2, but whereas Formula (4) was valid only for vectors in R n with the Euclidean inner product, Formula (1) is valid in general inner product spaces.

It frequently happens that one has found a set of orthogonal vectors in an inner product space but what is actually needed is a set of orthonormal vectors. A simple way to convert an orthogonal set of nonzero vectors into an orthonormal set is to multiply each vector $\mathbf{v}$ in the orthogonal set by the reciprocal of its length to create a vector of norm 1 (called a unit vector). To see why this works, suppose that $\mathbf{v}$ is a nonzero vector in an inner product space, and let
\[ \mathbf{u} = \frac{1}{\|\mathbf{v}\|}\,\mathbf{v} \tag{1} \]
Then it follows from Theorem 6.1.1(b) with $k = 1/\|\mathbf{v}\|$ that
\[ \|\mathbf{u}\| = \left\| \frac{1}{\|\mathbf{v}\|}\,\mathbf{v} \right\| = \left| \frac{1}{\|\mathbf{v}\|} \right| \|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\,\|\mathbf{v}\| = 1 \]

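This process of multiplying a vector $\mathbf{v}$ by the reciprocal of its length is called normalizing $\mathbf{v}$. We leave it as an exercise to show that normalizing the vectors in an orthogonal set of nonzero vectors preserves the orthogonality of the vectors and produces an orthonormal set.

EXAMPLE 2 Constructing an Orthonormal Set

The Euclidean norms of the vectors in Example 1 are
\[ \|\mathbf{v}_1\| = 1, \quad \|\mathbf{v}_2\| = \sqrt{2}, \quad \|\mathbf{v}_3\| = \sqrt{2} \]
Consequently, normalizing $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ yields
\[ \mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = (0, 1, 0), \quad \mathbf{u}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right), \quad \mathbf{u}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left( \frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}} \right) \]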
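The normalization in Example 2 can also be checked numerically. The following is a minimal sketch, assuming the Euclidean inner product on $R^3$ and floating-point arithmetic; the helper names `dot`, `norm`, and `normalize` are our own and are not part of the text.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Norm induced by the Euclidean inner product."""
    return math.sqrt(dot(v, v))

def normalize(v):
    """Multiply v by the reciprocal of its length, as in Formula (1)."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / length for a in v)

# The orthogonal set of Example 1.
orthogonal_set = [(0, 1, 0), (1, 0, 1), (1, 0, -1)]
orthonormal_set = [normalize(v) for v in orthogonal_set]

for u in orthonormal_set:
    print(u, norm(u))
```

Each printed norm should agree with 1 up to rounding error, and the pairwise inner products of the normalized vectors remain zero.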


We leave it for you to verify that the set $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is orthonormal by showing that
\[ \langle \mathbf{u}_1, \mathbf{u}_2 \rangle = \langle \mathbf{u}_1, \mathbf{u}_3 \rangle = \langle \mathbf{u}_2, \mathbf{u}_3 \rangle = 0 \quad \text{and} \quad \|\mathbf{u}_1\| = \|\mathbf{u}_2\| = \|\mathbf{u}_3\| = 1 \]

In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in the plane of the other two (and hence is not expressible as a linear combination of the other two). The following theorem generalizes these observations.

THEOREM 6.3.1 If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then $S$ is linearly independent.

Proof Assume that
\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{2} \]
To demonstrate that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$. For each $\mathbf{v}_i$ in $S$, it follows from (2) that
\[ \langle k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n, \mathbf{v}_i \rangle = \langle \mathbf{0}, \mathbf{v}_i \rangle = 0 \]
or, equivalently,
\[ k_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + k_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + k_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle = 0 \]
From the orthogonality of $S$ it follows that $\langle \mathbf{v}_j, \mathbf{v}_i \rangle = 0$ when $j \neq i$, so this equation reduces to
\[ k_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = 0 \]
Since the vectors in $S$ are assumed to be nonzero, it follows from the positivity axiom for inner products that $\langle \mathbf{v}_i, \mathbf{v}_i \rangle \neq 0$. Thus, the preceding equation implies that each $k_i$ in Equation (2) is zero, which is what we wanted to prove.

Since an orthonormal set is orthogonal, and since its vectors are nonzero (norm 1), it follows from Theorem 6.3.1 that every orthonormal set is linearly independent.

In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the standard basis for $R^n$ with the Euclidean inner product:
\[ \mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1) \]

EXAMPLE 3 An Orthonormal Basis for $P_n$

Recall from Example 7 of Section 6.1 that the standard inner product of the polynomials
\[ \mathbf{p} = a_0 + a_1 x + \cdots + a_n x^n \quad \text{and} \quad \mathbf{q} = b_0 + b_1 x + \cdots + b_n x^n \]
is
\[ \langle \mathbf{p}, \mathbf{q} \rangle = a_0 b_0 + a_1 b_1 + \cdots + a_n b_n \]
and the norm of $\mathbf{p}$ relative to this inner product is
\[ \|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2} \]
You should be able to see from these formulas that the standard basis
\[ S = \{1, x, x^2, \ldots, x^n\} \]
is orthonormal with respect to this inner product.


EXAMPLE 4 An Orthonormal Basis

In Example 2 we showed that the vectors
\[ \mathbf{u}_1 = (0, 1, 0), \quad \mathbf{u}_2 = \left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right), \quad \text{and} \quad \mathbf{u}_3 = \left( \frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}} \right) \]
form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem 4.5.4 that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is an orthonormal basis for $R^3$.

Coordinates Relative to Orthonormal Bases

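One way to express a vector $\mathbf{u}$ as a linear combination of basis vectors
\[ S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\} \]
is to convert the vector equation
\[ \mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \]
to a linear system and solve for the coefficients $c_1, c_2, \ldots, c_n$. However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.

THEOREM 6.3.2

(a) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$, then
\[ \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\,\mathbf{v}_n \tag{3} \]

(b) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$, then
\[ \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle \mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle \mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_n \rangle \mathbf{v}_n \tag{4} \]

Proof (a) Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$, every vector $\mathbf{u}$ in $V$ can be expressed in the form
\[ \mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \]
We will complete the proof by showing that
\[ c_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\|\mathbf{v}_i\|^2} \tag{5} \]
for $i = 1, 2, \ldots, n$. To do this, observe first that
\[ \langle \mathbf{u}, \mathbf{v}_i \rangle = \langle c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n, \mathbf{v}_i \rangle = c_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + c_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + c_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle \]
Since $S$ is an orthogonal set, all of the inner products in the last equality are zero except the $i$th, so we have
\[ \langle \mathbf{u}, \mathbf{v}_i \rangle = c_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = c_i\|\mathbf{v}_i\|^2 \]
Solving this equation for $c_i$ yields (5), which completes the proof.

Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_n\| = 1$, so Formula (3) simplifies to Formula (4).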


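Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector $\mathbf{u}$ in $V$ relative to an orthogonal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is
\[ (\mathbf{u})_S = \left( \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}, \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}, \ldots, \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2} \right) \tag{6} \]
and relative to an orthonormal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is
\[ (\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \ldots, \langle \mathbf{u}, \mathbf{v}_n \rangle) \tag{7} \]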
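Formulas (6) and (7) translate into a short computation. The sketch below is illustrative only: it assumes the Euclidean inner product on $R^3$, uses exact rational arithmetic, and the names `dot` and `coordinates` are our own. The basis and vector are those of Example 5.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def coordinates(u, orthogonal_basis):
    """Coordinate vector of u relative to an orthogonal basis, Formula (6).

    For an orthonormal basis every denominator equals 1, so the result
    reduces to Formula (7).
    """
    return tuple(dot(u, v) / dot(v, v) for v in orthogonal_basis)

# Orthonormal basis and vector from Example 5.
S = [
    (F(0), F(1), F(0)),
    (F(-4, 5), F(0), F(3, 5)),
    (F(3, 5), F(0), F(4, 5)),
]
u = (F(1), F(1), F(1))

coords = coordinates(u, S)

# Rebuild u from its coordinates as a check on Theorem 6.3.2.
rebuilt = tuple(sum(c * v[i] for c, v in zip(coords, S)) for i in range(3))
print(coords, rebuilt == u)
```

No linear system is solved here: each coordinate comes from a single inner product, which is the point of the theorem.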

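EXAMPLE 5 A Coordinate Vector Relative to an Orthonormal Basis

Let
\[ \mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left( -\tfrac{4}{5}, 0, \tfrac{3}{5} \right), \quad \mathbf{v}_3 = \left( \tfrac{3}{5}, 0, \tfrac{4}{5} \right) \]
It is easy to check that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$ with the Euclidean inner product. Express the vector $\mathbf{u} = (1, 1, 1)$ as a linear combination of the vectors in $S$, and find the coordinate vector $(\mathbf{u})_S$.

Solution We leave it for you to verify that
\[ \langle \mathbf{u}, \mathbf{v}_1 \rangle = 1, \quad \langle \mathbf{u}, \mathbf{v}_2 \rangle = -\tfrac{1}{5}, \quad \text{and} \quad \langle \mathbf{u}, \mathbf{v}_3 \rangle = \tfrac{7}{5} \]
Therefore, by Theorem 6.3.2 we have
\[ \mathbf{u} = \mathbf{v}_1 - \tfrac{1}{5}\mathbf{v}_2 + \tfrac{7}{5}\mathbf{v}_3 \]
that is,
\[ (1, 1, 1) = (0, 1, 0) - \tfrac{1}{5}\left( -\tfrac{4}{5}, 0, \tfrac{3}{5} \right) + \tfrac{7}{5}\left( \tfrac{3}{5}, 0, \tfrac{4}{5} \right) \]
Thus, the coordinate vector of $\mathbf{u}$ relative to $S$ is
\[ (\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \langle \mathbf{u}, \mathbf{v}_3 \rangle) = \left( 1, -\tfrac{1}{5}, \tfrac{7}{5} \right) \]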

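EXAMPLE 6 An Orthonormal Basis from an Orthogonal Basis

(a) Show that the vectors
\[ \mathbf{w}_1 = (0, 2, 0), \quad \mathbf{w}_2 = (3, 0, 3), \quad \mathbf{w}_3 = (-4, 0, 4) \]
form an orthogonal basis for $R^3$ with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector.

(b) Express the vector $\mathbf{u} = (1, 2, 4)$ as a linear combination of the orthonormal basis vectors obtained in part (a).

Solution (a) The given vectors form an orthogonal set since
\[ \langle \mathbf{w}_1, \mathbf{w}_2 \rangle = 0, \quad \langle \mathbf{w}_1, \mathbf{w}_3 \rangle = 0, \quad \langle \mathbf{w}_2, \mathbf{w}_3 \rangle = 0 \]
It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ and then obtain the orthonormal basis
\[ \mathbf{v}_1 = \frac{\mathbf{w}_1}{\|\mathbf{w}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right), \quad \mathbf{v}_3 = \frac{\mathbf{w}_3}{\|\mathbf{w}_3\|} = \left( -\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right) \]

Solution (b) It follows from Formula (4) that
\[ \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle \mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle \mathbf{v}_2 + \langle \mathbf{u}, \mathbf{v}_3 \rangle \mathbf{v}_3 \]
We leave it for you to confirm that
\[ \langle \mathbf{u}, \mathbf{v}_1 \rangle = (1, 2, 4) \cdot (0, 1, 0) = 2 \]
\[ \langle \mathbf{u}, \mathbf{v}_2 \rangle = (1, 2, 4) \cdot \left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right) = \frac{5}{\sqrt{2}} \]
\[ \langle \mathbf{u}, \mathbf{v}_3 \rangle = (1, 2, 4) \cdot \left( -\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right) = \frac{3}{\sqrt{2}} \]
and hence that
\[ (1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left( \frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right) + \frac{3}{\sqrt{2}}\left( -\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}} \right) \]

Orthogonal Projections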

Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections. In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) that dealt with the problem of decomposing a vector u in R n into a sum of two terms, w1 and w2 , in which w1 is the orthogonal projection of u on some nonzero vector a and w2 is orthogonal to w1 (Figure 3.3.2). That result is a special case of the following more general theorem, which we will state without proof.

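THEOREM 6.3.3 Projection Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $\mathbf{u}$ in $V$ can be expressed in exactly one way as
\[ \mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{8} \]
where $\mathbf{w}_1$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$.

The vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ in Formula (8) are commonly denoted by
\[ \mathbf{w}_1 = \operatorname{proj}_W \mathbf{u} \quad \text{and} \quad \mathbf{w}_2 = \operatorname{proj}_{W^\perp} \mathbf{u} \tag{9} \]
These are called the orthogonal projection of $\mathbf{u}$ on $W$ and the orthogonal projection of $\mathbf{u}$ on $W^\perp$, respectively. The vector $\mathbf{w}_2$ is also called the component of $\mathbf{u}$ orthogonal to $W$. Using the notation in (9), Formula (8) can be expressed as
\[ \mathbf{u} = \operatorname{proj}_W \mathbf{u} + \operatorname{proj}_{W^\perp} \mathbf{u} \tag{10} \]
(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u}$, we can also express Formula (10) as
\[ \mathbf{u} = \operatorname{proj}_W \mathbf{u} + (\mathbf{u} - \operatorname{proj}_W \mathbf{u}) \tag{11} \]

[Figure 6.3.1: the vector $\mathbf{u}$ with its orthogonal projections $\operatorname{proj}_W \mathbf{u}$ in $W$ and $\operatorname{proj}_{W^\perp} \mathbf{u}$ in $W^\perp$]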


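The following theorem provides formulas for calculating orthogonal projections.

THEOREM 6.3.4 Let $W$ be a finite-dimensional subspace of an inner product space $V$.

(a) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthogonal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then
\[ \operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\,\mathbf{v}_r \tag{12} \]

(b) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthonormal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then
\[ \operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle \mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle \mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_r \rangle \mathbf{v}_r \tag{13} \]

Although Formulas (12) and (13) are expressed in terms of orthogonal and orthonormal basis vectors, the resulting vector $\operatorname{proj}_W \mathbf{u}$ does not depend on the basis vectors that are used.

Proof (a) It follows from Theorem 6.3.3 that the vector $\mathbf{u}$ can be expressed in the form $\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u}$ is in $W$ and $\mathbf{w}_2$ is in $W^\perp$; and it follows from Theorem 6.3.2 that the component $\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1$ can be expressed in terms of the basis vectors for $W$ as
\[ \operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 + \frac{\langle \mathbf{w}_1, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\,\mathbf{v}_r \tag{14} \]
Since $\mathbf{w}_2$ is orthogonal to $W$, it follows that
\[ \langle \mathbf{w}_2, \mathbf{v}_1 \rangle = \langle \mathbf{w}_2, \mathbf{v}_2 \rangle = \cdots = \langle \mathbf{w}_2, \mathbf{v}_r \rangle = 0 \]
so we can rewrite (14) as
\[ \operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\,\mathbf{v}_r \]
or, equivalently, since $\mathbf{w}_1 + \mathbf{w}_2 = \mathbf{u}$, as
\[ \operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\,\mathbf{v}_r \]

Proof (b) In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_r\| = 1$, so Formula (12) simplifies to Formula (13).

EXAMPLE 7 Calculating Projections

Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the orthonormal vectors $\mathbf{v}_1 = (0, 1, 0)$ and $\mathbf{v}_2 = \left( -\tfrac{4}{5}, 0, \tfrac{3}{5} \right)$. From Formula (13) the orthogonal projection of $\mathbf{u} = (1, 1, 1)$ on $W$ is
\[ \operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle \mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle \mathbf{v}_2 = (1)(0, 1, 0) + \left( -\tfrac{1}{5} \right)\left( -\tfrac{4}{5}, 0, \tfrac{3}{5} \right) = \left( \tfrac{4}{25}, 1, -\tfrac{3}{25} \right) \]
The component of $\mathbf{u}$ orthogonal to $W$ is
\[ \operatorname{proj}_{W^\perp} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u} = (1, 1, 1) - \left( \tfrac{4}{25}, 1, -\tfrac{3}{25} \right) = \left( \tfrac{21}{25}, 0, \tfrac{28}{25} \right) \]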

Observe that $\operatorname{proj}_{W^\perp} \mathbf{u}$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, so this vector is orthogonal to each vector in the space $W$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$, as it should be.

A Geometric Interpretation of Orthogonal Projections

If $W$ is a one-dimensional subspace of an inner product space $V$, say $\operatorname{span}\{\mathbf{a}\}$, then Formula (12) has only the one term
\[ \operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{a} \rangle}{\|\mathbf{a}\|^2}\,\mathbf{a} \]
In the special case where $V$ is $R^3$ with the Euclidean inner product, this is exactly Formula (10) of Section 3.3 for the orthogonal projection of $\mathbf{u}$ along $\mathbf{a}$. This suggests that we can think of (12) as the sum of orthogonal projections on "axes" determined by the basis vectors for the subspace $W$ (Figure 6.3.2).
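The reading of Formula (12) as a sum of one-dimensional projections can be tried out directly. The following sketch is an illustration under stated assumptions: it uses the Euclidean inner product on $R^3$, exact rational arithmetic, and helper names (`dot`, `project_onto_vector`, `project_onto_subspace`) that are our own. The data are those of Example 7.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_vector(u, a):
    """Orthogonal projection of u on span{a}: (<u, a> / ||a||^2) a."""
    c = dot(u, a) / dot(a, a)
    return tuple(c * x for x in a)

def project_onto_subspace(u, orthogonal_basis):
    """Formula (12): add the projections of u on each orthogonal basis vector."""
    total = tuple(F(0) for _ in u)
    for v in orthogonal_basis:
        piece = project_onto_vector(u, v)
        total = tuple(t + p for t, p in zip(total, piece))
    return total

# Data of Example 7: W is spanned by the orthonormal vectors v1 and v2.
v1 = (F(0), F(1), F(0))
v2 = (F(-4, 5), F(0), F(3, 5))
u = (F(1), F(1), F(1))

proj_W = project_onto_subspace(u, [v1, v2])         # (4/25, 1, -3/25)
proj_W_perp = tuple(a - b for a, b in zip(u, proj_W))  # (21/25, 0, 28/25)

# The component orthogonal to W has zero inner product with v1 and v2.
print(proj_W, proj_W_perp, dot(proj_W_perp, v1), dot(proj_W_perp, v2))
```

The function requires an orthogonal basis for $W$; for a basis that is not orthogonal, the individual projections would not simply add up to $\operatorname{proj}_W \mathbf{u}$.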

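[Figure 6.3.2: the vector $\mathbf{u}$, its projection $\operatorname{proj}_W \mathbf{u}$ on $W$, and the projections $\operatorname{proj}_{\mathbf{v}_1} \mathbf{u}$ and $\operatorname{proj}_{\mathbf{v}_2} \mathbf{u}$ on the axes determined by $\mathbf{v}_1$ and $\mathbf{v}_2$]

The Gram–Schmidt Process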

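We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof of this result is extremely important since it provides an algorithm, or method, for converting an arbitrary basis into an orthonormal basis.

THEOREM 6.3.5 Every nonzero finite-dimensional inner product space has an orthonormal basis.

Proof Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and suppose that $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal basis since the vectors in that basis can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ for $W$:

Step 1. Let $\mathbf{v}_1 = \mathbf{u}_1$.

Step 2. As illustrated in Figure 6.3.3, we can obtain a vector $\mathbf{v}_2$ that is orthogonal to $\mathbf{v}_1$ by computing the component of $\mathbf{u}_2$ that is orthogonal to the space $W_1$ spanned by $\mathbf{v}_1$. Using Formula (12) to perform this computation, we obtain
\[ \mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 \]
Of course, if $\mathbf{v}_2 = \mathbf{0}$, then $\mathbf{v}_2$ is not a basis vector. But this cannot happen, since it would then follow from the preceding formula for $\mathbf{v}_2$ that
\[ \mathbf{u}_2 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{u}_1\|^2}\,\mathbf{u}_1 \]
which implies that $\mathbf{u}_2$ is a multiple of $\mathbf{u}_1$, contradicting the linear independence of the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$.

[Figure 6.3.3: $\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2$, with $\operatorname{proj}_{W_1} \mathbf{u}_2$ lying in the space $W_1$ spanned by $\mathbf{v}_1$]

Step 3. To construct a vector $\mathbf{v}_3$ that is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, we compute the component of $\mathbf{u}_3$ orthogonal to the space $W_2$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$ (Figure 6.3.4). Using Formula (12) to perform this computation, we obtain
\[ \mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 \]
As in Step 2, the linear independence of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ ensures that $\mathbf{v}_3 \neq \mathbf{0}$. We leave the details for you.

[Figure 6.3.4: $\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3$, with $\operatorname{proj}_{W_2} \mathbf{u}_3$ lying in the space $W_2$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$]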


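Step 4. To determine a vector $\mathbf{v}_4$ that is orthogonal to $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we compute the component of $\mathbf{u}_4$ orthogonal to the space $W_3$ spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. From (12),
\[ \mathbf{v}_4 = \mathbf{u}_4 - \operatorname{proj}_{W_3} \mathbf{u}_4 = \mathbf{u}_4 - \frac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 - \frac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\,\mathbf{v}_3 \]
Continuing in this way we will produce after $r$ steps an orthogonal set of nonzero vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$. Since such sets are linearly independent, we will have produced an orthogonal basis for the $r$-dimensional space $W$. By normalizing these basis vectors we can obtain an orthonormal basis.

The step-by-step construction of an orthogonal (or orthonormal) basis given in the foregoing proof is called the Gram–Schmidt process. For reference, we provide the following summary of the steps.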

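The Gram–Schmidt Process

To convert a basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, perform the following computations:

Step 1. $\mathbf{v}_1 = \mathbf{u}_1$

Step 2. $\mathbf{v}_2 = \mathbf{u}_2 - \dfrac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1$

Step 3. $\mathbf{v}_3 = \mathbf{u}_3 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2$

Step 4. $\mathbf{v}_4 = \mathbf{u}_4 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\,\mathbf{v}_3$

(continue for $r$ steps)

Optional Step. To convert the orthogonal basis into an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_r\}$, normalize the orthogonal basis vectors.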
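The steps summarized above can be sketched as a short routine. This is a minimal illustration, not a production implementation: it assumes the Euclidean inner product on $R^n$, works in exact rational arithmetic so that the fractions match hand calculation, and the names `dot` and `gram_schmidt` are our own. The optional normalization step is omitted because it introduces square roots.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Convert a basis {u1, ..., ur} into an orthogonal basis {v1, ..., vr}.

    At step k the projections of u_k on the previously constructed
    v_1, ..., v_{k-1} are subtracted from u_k.
    """
    orthogonal = []
    for u in basis:
        v = [F(a) for a in u]
        for w in orthogonal:
            coefficient = dot(u, w) / dot(w, w)   # <u_k, v_j> / ||v_j||^2
            v = [a - coefficient * b for a, b in zip(v, w)]
        if all(a == 0 for a in v):
            raise ValueError("the input vectors are linearly dependent")
        orthogonal.append(tuple(v))
    return orthogonal

# The basis u1 = (1, 1, 1), u2 = (0, 1, 1), u3 = (0, 0, 1) of R^3.
v1, v2, v3 = gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
print(v1, v2, v3)
```

For floating-point work on large or nearly dependent sets of vectors, numerical software generally uses more stable variants; the routine above simply mirrors the hand computation.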

Historical Note Erhardt Schmidt (1875–1959) was a German mathematician who studied for his doctoral degree at Göttingen University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at Berlin University where, in addition to making important contributions to many branches of mathematics, he fashioned some of Hilbert’s ideas into a general concept, called a Hilbert space—a fundamental structure in the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper on integral equations that he published in 1907.

Jorgen Pederson Gram (1850–1916)

Historical Note Gram was a Danish actuary whose early education was at village schools supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his dissertation that his contributions to the Gram–Schmidt process were formulated. He eventually became interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however, and he produced a variety of treatises on Danish forest management. [Image: http://www-history.mcs.st-and.ac.uk/PictDisplay/Gram.html]


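EXAMPLE 8 Using the Gram–Schmidt Process

Assume that the vector space $R^3$ has the Euclidean inner product. Apply the Gram–Schmidt process to transform the basis vectors
\[ \mathbf{u}_1 = (1, 1, 1), \quad \mathbf{u}_2 = (0, 1, 1), \quad \mathbf{u}_3 = (0, 0, 1) \]
into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and then normalize the orthogonal basis vectors to obtain an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$.

Solution

Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = (1, 1, 1)$

Step 2.
\[ \mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 = (0, 1, 1) - \frac{2}{3}(1, 1, 1) = \left( -\frac{2}{3}, \frac{1}{3}, \frac{1}{3} \right) \]

Step 3.
\[ \mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 = (0, 0, 1) - \frac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left( -\frac{2}{3}, \frac{1}{3}, \frac{1}{3} \right) = \left( 0, -\frac{1}{2}, \frac{1}{2} \right) \]

Thus,
\[ \mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = \left( -\frac{2}{3}, \frac{1}{3}, \frac{1}{3} \right), \quad \mathbf{v}_3 = \left( 0, -\frac{1}{2}, \frac{1}{2} \right) \]
form an orthogonal basis for $R^3$. The norms of these vectors are
\[ \|\mathbf{v}_1\| = \sqrt{3}, \quad \|\mathbf{v}_2\| = \frac{\sqrt{6}}{3}, \quad \|\mathbf{v}_3\| = \frac{1}{\sqrt{2}} \]
so an orthonormal basis for $R^3$ is
\[ \mathbf{q}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right), \quad \mathbf{q}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left( -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}} \right), \quad \mathbf{q}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left( 0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right) \]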

Remark In the last example we normalized at the end to convert the orthogonal basis into an orthonormal basis. Alternatively, we could have normalized each orthogonal basis vector as soon as it was obtained, thereby producing an orthonormal basis step by step. However, that procedure generally has the disadvantage in hand calculation of producing more square roots to manipulate. A more useful variation is to “scale” the orthogonal basis vectors at each step to eliminate some of the fractions. For example, after Step 2 above, we could have multiplied by 3 to produce (−2, 1, 1) as the second orthogonal basis vector, thereby simplifying the calculations in Step 3.

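CALCULUS REQUIRED

EXAMPLE 9 Legendre Polynomials

Let the vector space $P_2$ have the inner product
\[ \langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx \]
Apply the Gram–Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into an orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.

Solution Take $\mathbf{u}_1 = 1$, $\mathbf{u}_2 = x$, and $\mathbf{u}_3 = x^2$.

Step 1. $\mathbf{v}_1 = \mathbf{u}_1 = 1$

Step 2. We have
\[ \langle \mathbf{u}_2, \mathbf{v}_1 \rangle = \int_{-1}^{1} x\,dx = 0 \]
so
\[ \mathbf{v}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 = \mathbf{u}_2 = x \]

Step 3. We have
\[ \langle \mathbf{u}_3, \mathbf{v}_1 \rangle = \int_{-1}^{1} x^2\,dx = \left[ \frac{x^3}{3} \right]_{-1}^{1} = \frac{2}{3} \]
\[ \langle \mathbf{u}_3, \mathbf{v}_2 \rangle = \int_{-1}^{1} x^3\,dx = \left[ \frac{x^4}{4} \right]_{-1}^{1} = 0 \]
\[ \|\mathbf{v}_1\|^2 = \langle \mathbf{v}_1, \mathbf{v}_1 \rangle = \int_{-1}^{1} 1\,dx = \Big[ x \Big]_{-1}^{1} = 2 \]
so
\[ \mathbf{v}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\,\mathbf{v}_2 = x^2 - \frac{1}{3} \]
Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which
\[ \phi_1(x) = 1, \quad \phi_2(x) = x, \quad \phi_3(x) = x^2 - \frac{1}{3} \]

Remark The orthogonal basis vectors in the last example are often scaled so all three functions have a value of 1 at $x = 1$. The resulting polynomials
\[ 1, \quad x, \quad \tfrac{1}{2}(3x^2 - 1) \]
which are known as the first three Legendre polynomials, play an important role in a variety of applications. The scaling does not affect the orthogonality.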
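The computation in Example 9 can be reproduced with exact arithmetic. In the sketch below, which is illustrative only, a polynomial $a_0 + a_1 x + a_2 x^2$ is represented by its coefficient list `[a0, a1, a2]` (a representation we assume for convenience), and the names `poly_mul`, `inner`, and `gram_schmidt` are our own. The inner product uses the fact that $\int_{-1}^{1} x^k\,dx$ equals $2/(k+1)$ for even $k$ and 0 for odd $k$.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Coefficient list of the product of two polynomials."""
    product = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product

def inner(p, q):
    """Integral of p(x) q(x) over [-1, 1], computed term by term."""
    return sum(c * F(2, k + 1)
               for k, c in enumerate(poly_mul(p, q)) if k % 2 == 0)

def gram_schmidt(basis):
    """Gram-Schmidt process with respect to the integral inner product."""
    orthogonal = []
    for u in basis:
        v = [F(c) for c in u]
        for w in orthogonal:
            coefficient = inner(u, w) / inner(w, w)
            v = [a - coefficient * b for a, b in zip(v, w)]
        orthogonal.append(v)
    return orthogonal

# Standard basis 1, x, x^2 of P2 as coefficient lists.
phi1, phi2, phi3 = gram_schmidt([[1, 0, 0], [0, 1, 0], [0, 0, 1]])

# Scale phi3 so that its value at x = 1 is 1.
value_at_one = sum(phi3)
legendre2 = [c / value_at_one for c in phi3]
print(phi3, legendre2)
```

The list `phi3` corresponds to $x^2 - \tfrac{1}{3}$ and `legendre2` to $\tfrac{1}{2}(3x^2 - 1)$.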

Extending Orthonormal Sets to Orthonormal Bases

Recall from part (b) of Theorem 4.5.5 that a linearly independent set in a finite-dimensional vector space can be enlarged to a basis by adding appropriate vectors. The following theorem is an analog of that result for orthogonal and orthonormal sets in finite-dimensional inner product spaces.

THEOREM 6.3.6 If $W$ is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in $W$ can be enlarged to an orthogonal basis for $W$.

(b) Every orthonormal set in $W$ can be enlarged to an orthonormal basis for $W$.

We will prove part (b) and leave part (a) as an exercise.

Proof (b) Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s\}$ is an orthonormal set of vectors in $W$. Part (b) of Theorem 4.5.5 tells us that we can enlarge $S$ to some basis
\[ S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}_{s+1}, \ldots, \mathbf{v}_k\} \]
for $W$. If we now apply the Gram–Schmidt process to the set $S'$, then the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s$ will not be affected since they are already orthonormal, and the resulting set
\[ S'' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}'_{s+1}, \ldots, \mathbf{v}'_k\} \]
will be an orthonormal basis for $W$.

OPTIONAL

QR-Decomposition

In recent years a numerical algorithm based on the Gram–Schmidt process, and known as QR-decomposition, has assumed growing importance as the mathematical foundation for a wide variety of numerical algorithms, including those for computing eigenvalues of large matrices. The technical aspects of such algorithms are discussed in textbooks that specialize in the numerical aspects of linear algebra. However, we will discuss some of the underlying ideas here. We begin by posing the following problem.

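Problem If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $Q$ is the matrix that results by applying the Gram–Schmidt process to the column vectors of $A$, what relationship, if any, exists between $A$ and $Q$?

To solve this problem, suppose that the column vectors of $A$ are $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ and that $Q$ has orthonormal column vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$. Thus, $A$ and $Q$ can be written in partitioned form as
\[ A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \quad \text{and} \quad Q = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n] \]
It follows from Theorem 6.3.2(b) that $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ are expressible in terms of the vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$ as
\[ \begin{aligned} \mathbf{u}_1 &= \langle \mathbf{u}_1, \mathbf{q}_1 \rangle \mathbf{q}_1 + \langle \mathbf{u}_1, \mathbf{q}_2 \rangle \mathbf{q}_2 + \cdots + \langle \mathbf{u}_1, \mathbf{q}_n \rangle \mathbf{q}_n \\ \mathbf{u}_2 &= \langle \mathbf{u}_2, \mathbf{q}_1 \rangle \mathbf{q}_1 + \langle \mathbf{u}_2, \mathbf{q}_2 \rangle \mathbf{q}_2 + \cdots + \langle \mathbf{u}_2, \mathbf{q}_n \rangle \mathbf{q}_n \\ &\;\;\vdots \\ \mathbf{u}_n &= \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \mathbf{q}_1 + \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \mathbf{q}_2 + \cdots + \langle \mathbf{u}_n, \mathbf{q}_n \rangle \mathbf{q}_n \end{aligned} \]
Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product is a linear combination of the column vectors of the first factor with coefficients coming from the $j$th column of the second factor, it follows that these relationships can be expressed in matrix form as
\[ [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n] \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\ \langle \mathbf{u}_1, \mathbf{q}_2 \rangle & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\ \vdots & \vdots & & \vdots \\ \langle \mathbf{u}_1, \mathbf{q}_n \rangle & \langle \mathbf{u}_2, \mathbf{q}_n \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle \end{bmatrix} \]
or more briefly as
\[ A = QR \tag{15} \]
where $R$ is the second factor in the product. However, it is a property of the Gram–Schmidt process that for $j \ge 2$, the vector $\mathbf{q}_j$ is orthogonal to $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{j-1}$. Thus, all entries below the main diagonal of $R$ are zero, and $R$ has the form
\[ R = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\ 0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle \end{bmatrix} \tag{16} \]
We leave it for you to show that $R$ is invertible by showing that its diagonal entries are nonzero. Thus, Equation (15) is a factorization of $A$ into the product of a matrix $Q$ with orthonormal column vectors and an invertible upper triangular matrix $R$. We call Equation (15) a QR-decomposition of $A$. In summary, we have the following theorem.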

THEOREM 6.3.7 QR-Decomposition

If A is an m × n matrix with linearly independent column vectors, then A can be factored as

$$A = QR$$

where Q is an m × n matrix with orthonormal column vectors, and R is an n × n invertible upper triangular matrix.

Remark: It is common in numerical linear algebra to say that a matrix with linearly independent columns has full column rank.

Recall from Theorem 5.1.5 (the Equivalence Theorem) that a square matrix has linearly independent column vectors if and only if it is invertible. Thus, it follows from Theorem 6.3.7 that every invertible matrix has a QR-decomposition.

EXAMPLE 10 QR-Decomposition of a 3 × 3 Matrix

Find a QR-decomposition of

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

Solution The column vectors of A are

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Applying the Gram–Schmidt process with normalization to these column vectors yields the orthonormal vectors (see Example 8)

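$$\mathbf{q}_1 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}, \quad
\mathbf{q}_2 = \begin{bmatrix} -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{bmatrix}, \quad
\mathbf{q}_3 = \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$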

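Thus, it follows from Formula (16) that R is

$$R = \begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1\rangle & \langle \mathbf{u}_2, \mathbf{q}_1\rangle & \langle \mathbf{u}_3, \mathbf{q}_1\rangle \\
0 & \langle \mathbf{u}_2, \mathbf{q}_2\rangle & \langle \mathbf{u}_3, \mathbf{q}_2\rangle \\
0 & 0 & \langle \mathbf{u}_3, \mathbf{q}_3\rangle
\end{bmatrix}
= \begin{bmatrix}
\frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\
0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\
0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}$$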

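from which it follows that a QR-decomposition of A is

$$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{A}
= \underbrace{\begin{bmatrix}
\frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\
\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}}
\end{bmatrix}}_{Q}
\underbrace{\begin{bmatrix}
\frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\
0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\
0 & 0 & \frac{1}{\sqrt{2}}
\end{bmatrix}}_{R}$$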
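The construction behind Formulas (15) and (16) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `dot` and `gram_schmidt_qr` are hypothetical, the arithmetic is floating point, and classical Gram–Schmidt is used as in the text even though numerical libraries usually prefer more stable variants.

```python
import math

def dot(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt_qr(columns):
    """Return (q_cols, R) for a list of linearly independent column vectors.

    q_cols holds the orthonormal vectors q_1, ..., q_n, and R is the upper
    triangular matrix of Formula (16), with R[i][j] = <u_j, q_i> for i <= j.
    """
    q_cols = []
    for u in columns:
        # Subtract from u its projections onto the q's found so far.
        v = list(u)
        for q in q_cols:
            c = dot(u, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        q_cols.append([vi / norm for vi in v])
    n = len(columns)
    R = [[dot(columns[j], q_cols[i]) if i <= j else 0.0 for j in range(n)]
         for i in range(n)]
    return q_cols, R

# Column vectors u1, u2, u3 of the matrix A in Example 10.
U = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
Q_cols, R = gram_schmidt_qr(U)

# Column j of the product QR is sum_i R[i][j] * q_i; it should reproduce u_j.
for j, u in enumerate(U):
    rebuilt = [sum(R[i][j] * Q_cols[i][k] for i in range(3)) for k in range(3)]
    assert all(abs(x - y) < 1e-9 for x, y in zip(rebuilt, u))
```

The final loop checks the relationship A = QR column by column, which is the same column interpretation of a matrix product that the derivation of Formula (15) relies on.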


Chapter 6 Inner Product Spaces

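Exercise Set 6.3

1. In each part, determine whether the set of vectors is orthogonal and whether it is orthonormal with respect to the Euclidean inner product on $R^2$.
(a) $(0, 1),\ (2, 0)$
(b) $\left(-\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right),\ \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$
(c) $\left(-\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}\right),\ \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$
(d) $(0, 0),\ (0, 1)$

2. In each part, determine whether the set of vectors is orthogonal and whether it is orthonormal with respect to the Euclidean inner product on $R^3$.
(a) $\left(\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right),\ \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, -\tfrac{1}{\sqrt{3}}\right),\ \left(-\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right)$
(b) $\left(\tfrac{2}{3}, -\tfrac{2}{3}, \tfrac{1}{3}\right),\ \left(\tfrac{2}{3}, \tfrac{1}{3}, -\tfrac{2}{3}\right),\ \left(\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{2}{3}\right)$
(c) $(1, 0, 0),\ \left(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right),\ (0, 0, 1)$
(d) $\left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}\right),\ \left(\tfrac{1}{\sqrt{2}}, -\tfrac{1}{\sqrt{2}}, 0\right)$

3. In each part, determine whether the set of vectors is orthogonal with respect to the standard inner product on $P_2$ (see Example 7 of Section 6.1).
(a) $p_1(x) = \tfrac{2}{3} - \tfrac{2}{3}x + \tfrac{1}{3}x^2,\ p_2(x) = \tfrac{2}{3} + \tfrac{1}{3}x - \tfrac{2}{3}x^2,\ p_3(x) = \tfrac{1}{3} + \tfrac{2}{3}x + \tfrac{2}{3}x^2$
(b) $p_1(x) = 1,\ p_2(x) = \tfrac{1}{\sqrt{2}}x + \tfrac{1}{\sqrt{2}}x^2,\ p_3(x) = x^2$

4. In each part, determine whether the set of vectors is orthogonal with respect to the standard inner product on $M_{22}$ (see Example 6 of Section 6.1).
(a) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & \tfrac{2}{3} \\ \tfrac{1}{3} & -\tfrac{2}{3} \end{bmatrix},\ \begin{bmatrix} 0 & \tfrac{2}{3} \\ -\tfrac{2}{3} & \tfrac{1}{3} \end{bmatrix},\ \begin{bmatrix} 0 & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{2}{3} \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 1 & -1 \end{bmatrix}$

In Exercises 5–6, show that the column vectors of A form an orthogonal basis for the column space of A with respect to the Euclidean inner product, and then find an orthonormal basis for that column space.

5. $A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 5 \\ -1 & 2 & 0 \end{bmatrix}$

6. $A = \begin{bmatrix} \tfrac{1}{5} & -\tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{5} & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{5} & 0 & -\tfrac{2}{3} \end{bmatrix}$

7. Verify that the vectors $\mathbf{v}_1 = \left(-\tfrac{3}{5}, \tfrac{4}{5}, 0\right)$, $\mathbf{v}_2 = \left(\tfrac{4}{5}, \tfrac{3}{5}, 0\right)$, $\mathbf{v}_3 = (0, 0, 1)$ form an orthonormal basis for $R^3$ with respect to the Euclidean inner product, and then use Theorem 6.3.2(b) to express the vector $\mathbf{u} = (1, -2, 2)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

8. Use Theorem 6.3.2(b) to express the vector $\mathbf{u} = (3, -7, 4)$ as a linear combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in Exercise 7.

9. Verify that the vectors $\mathbf{v}_1 = (2, -2, 1)$, $\mathbf{v}_2 = (2, 1, -2)$, $\mathbf{v}_3 = (1, 2, 2)$ form an orthogonal basis for $R^3$ with respect to the Euclidean inner product, and then use Theorem 6.3.2(a) to express the vector $\mathbf{u} = (-1, 0, 2)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.

10. Verify that the vectors $\mathbf{v}_1 = (1, -1, 2, -1)$, $\mathbf{v}_2 = (-2, 2, 3, 2)$, $\mathbf{v}_3 = (1, 2, 0, -1)$, $\mathbf{v}_4 = (1, 0, 0, 1)$ form an orthogonal basis for $R^4$ with respect to the Euclidean inner product, and then use Theorem 6.3.2(a) to express the vector $\mathbf{u} = (1, 1, 1, 1)$ as a linear combination of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.

In Exercises 11–14, find the coordinate vector $(\mathbf{u})_S$ for the vector $\mathbf{u}$ and the basis S that were given in the stated exercise.

11. Exercise 7

12. Exercise 8

13. Exercise 9

14. Exercise 10

In Exercises 15–18, let $R^2$ have the Euclidean inner product.
(a) Find the orthogonal projection of $\mathbf{u}$ onto the line spanned by the vector $\mathbf{v}$.
(b) Find the component of $\mathbf{u}$ orthogonal to the line spanned by the vector $\mathbf{v}$, and confirm that this component is orthogonal to the line.

15. $\mathbf{u} = (-1, 6)$; $\mathbf{v} = \left(\tfrac{3}{5}, \tfrac{4}{5}\right)$

16. $\mathbf{u} = (2, 3)$; $\mathbf{v} = \left(\tfrac{5}{13}, \tfrac{12}{13}\right)$

17. $\mathbf{u} = (2, 3)$; $\mathbf{v} = (1, 1)$

18. $\mathbf{u} = (3, -1)$; $\mathbf{v} = (3, 4)$

In Exercises 19–22, let $R^3$ have the Euclidean inner product.
(a) Find the orthogonal projection of $\mathbf{u}$ onto the plane spanned by the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.
(b) Find the component of $\mathbf{u}$ orthogonal to the plane spanned by the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, and confirm that this component is orthogonal to the plane.

19. $\mathbf{u} = (4, 2, 1)$; $\mathbf{v}_1 = \left(\tfrac{1}{3}, \tfrac{2}{3}, -\tfrac{2}{3}\right)$, $\mathbf{v}_2 = \left(\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{2}{3}\right)$

20. $\mathbf{u} = (3, -1, 2)$; $\mathbf{v}_1 = \left(\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, -\tfrac{2}{\sqrt{6}}\right)$, $\mathbf{v}_2 = \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right)$

21. $\mathbf{u} = (1, 0, 3)$; $\mathbf{v}_1 = (1, -2, 1)$, $\mathbf{v}_2 = (2, 1, 0)$

22. $\mathbf{u} = (1, 0, 2)$; $\mathbf{v}_1 = (3, 1, 2)$, $\mathbf{v}_2 = (-1, 1, 1)$

In Exercises 23–24, the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal with respect to the Euclidean inner product on $R^4$. Find the orthogonal projection of $\mathbf{b} = (1, 2, 0, -2)$ on the subspace W spanned by these vectors.

23. $\mathbf{v}_1 = (1, 1, 1, 1)$, $\mathbf{v}_2 = (1, 1, -1, -1)$

24. $\mathbf{v}_1 = (0, 1, -4, -1)$, $\mathbf{v}_2 = (3, 5, 1, 1)$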

In Exercises 25–26, the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are orthonormal with respect to the Euclidean inner product on $R^4$. Find the orthogonal projection of $\mathbf{b} = (1, 2, 0, -1)$ onto the subspace W spanned by these vectors.

25. $\mathbf{v}_1 = \left(0, \tfrac{1}{\sqrt{18}}, -\tfrac{4}{\sqrt{18}}, -\tfrac{1}{\sqrt{18}}\right)$, $\mathbf{v}_2 = \left(\tfrac{1}{2}, \tfrac{5}{6}, \tfrac{1}{6}, \tfrac{1}{6}\right)$, $\mathbf{v}_3 = \left(\tfrac{1}{\sqrt{18}}, 0, \tfrac{1}{\sqrt{18}}, -\tfrac{4}{\sqrt{18}}\right)$

26. $\mathbf{v}_1 = \left(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2}\right)$, $\mathbf{v}_2 = \left(\tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}, -\tfrac{1}{2}\right)$, $\mathbf{v}_3 = \left(\tfrac{1}{2}, -\tfrac{1}{2}, \tfrac{1}{2}, -\tfrac{1}{2}\right)$

In Exercises 27–28, let $R^2$ have the Euclidean inner product and use the Gram–Schmidt process to transform the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ into an orthonormal basis. Draw both sets of basis vectors in the xy-plane.

27. $\mathbf{u}_1 = (1, -3)$, $\mathbf{u}_2 = (2, 2)$

28. $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (3, -5)$

In Exercises 29–30, let $R^3$ have the Euclidean inner product and use the Gram–Schmidt process to transform the basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ into an orthonormal basis.

29. $\mathbf{u}_1 = (1, 1, 1)$, $\mathbf{u}_2 = (-1, 1, 0)$, $\mathbf{u}_3 = (1, 2, 1)$

30. $\mathbf{u}_1 = (1, 0, 0)$, $\mathbf{u}_2 = (3, 7, -2)$, $\mathbf{u}_3 = (0, 4, 1)$

31. Let $R^4$ have the Euclidean inner product. Use the Gram–Schmidt process to transform the basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \mathbf{u}_4\}$ into an orthonormal basis.
$\mathbf{u}_1 = (0, 2, 1, 0)$, $\mathbf{u}_2 = (1, -1, 0, 0)$, $\mathbf{u}_3 = (1, 2, 0, -1)$, $\mathbf{u}_4 = (1, 0, 0, 1)$

32. Let $R^3$ have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by $(0, 1, 2)$, $(-1, 0, 1)$, $(-1, 1, 3)$.

33. Let $\mathbf{b}$ and W be as in Exercise 23. Find vectors $\mathbf{w}_1$ in W and $\mathbf{w}_2$ in $W^\perp$ such that $\mathbf{b} = \mathbf{w}_1 + \mathbf{w}_2$.

34. Let $\mathbf{b}$ and W be as in Exercise 25. Find vectors $\mathbf{w}_1$ in W and $\mathbf{w}_2$ in $W^\perp$ such that $\mathbf{b} = \mathbf{w}_1 + \mathbf{w}_2$.

35. Let $R^3$ have the Euclidean inner product. The subspace of $R^3$ spanned by the vectors $\mathbf{u}_1 = (1, 1, 1)$ and $\mathbf{u}_2 = (2, 0, -1)$ is a plane passing through the origin. Express $\mathbf{w} = (1, 2, 3)$ in the form $\mathbf{w} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1$ lies in the plane and $\mathbf{w}_2$ is perpendicular to the plane.

36. Let $R^4$ have the Euclidean inner product. Express the vector $\mathbf{w} = (-1, 2, 6, 0)$ in the form $\mathbf{w} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1$ is in the space W spanned by $\mathbf{u}_1 = (-1, 0, 1, 2)$ and $\mathbf{u}_2 = (0, 1, 0, 1)$, and $\mathbf{w}_2$ is orthogonal to W.

37. Let $R^3$ have the inner product
$$\langle \mathbf{u}, \mathbf{v}\rangle = u_1v_1 + 2u_2v_2 + 3u_3v_3$$
Use the Gram–Schmidt process to transform $\mathbf{u}_1 = (1, 1, 1)$, $\mathbf{u}_2 = (1, 1, 0)$, $\mathbf{u}_3 = (1, 0, 0)$ into an orthonormal basis.

38. Verify that the set of vectors $\{(1, 0), (0, 1)\}$ is orthogonal with respect to the inner product $\langle \mathbf{u}, \mathbf{v}\rangle = 4u_1v_1 + u_2v_2$ on $R^2$; then convert it to an orthonormal set by normalizing the vectors.

39. Find vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^2$ that are orthonormal with respect to the inner product $\langle \mathbf{u}, \mathbf{v}\rangle = 3u_1v_1 + 2u_2v_2$ but are not orthonormal with respect to the Euclidean inner product.

40. In Example 3 of Section 4.9 we found the orthogonal projection of the vector $\mathbf{x} = (1, 5)$ onto the line through the origin making an angle of $\pi/6$ radians with the positive x-axis. Solve that same problem using Theorem 6.3.4.

41. This exercise illustrates that the orthogonal projection resulting from Formula (12) in Theorem 6.3.4 does not depend on which orthogonal basis vectors are used.
(a) Let $R^3$ have the Euclidean inner product, and let W be the subspace of $R^3$ spanned by the orthogonal vectors
$$\mathbf{v}_1 = (1, 0, 1) \quad\text{and}\quad \mathbf{v}_2 = (0, 1, 0)$$
Show that the orthogonal vectors
$$\mathbf{v}_1' = (1, 1, 1) \quad\text{and}\quad \mathbf{v}_2' = (1, -2, 1)$$
span the same subspace W.
(b) Let $\mathbf{u} = (-3, 1, 7)$ and show that the same vector $\operatorname{proj}_W \mathbf{u}$ results regardless of which of the bases in part (a) is used for its computation.

42. (Calculus required) Use Theorem 6.3.2(a) to express the following polynomials as linear combinations of the first three Legendre polynomials (see the Remark following Example 9).
(a) $1 + x + 4x^2$
(b) $2 - 7x^2$
(c) $4 + 3x$

43. (Calculus required) Let $P_2$ have the inner product
$$\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$$
Apply the Gram–Schmidt process to transform the standard basis $S = \{1, x, x^2\}$ into an orthonormal basis.

44. Find an orthogonal basis for the column space of the matrix
$$A = \begin{bmatrix} 6 & 1 & -5 \\ 2 & 1 & 1 \\ -2 & -2 & 5 \\ 6 & 8 & -7 \end{bmatrix}$$

In Exercises 45–48, we obtained the column vectors of Q by applying the Gram–Schmidt process to the column vectors of A. Find a QR-decomposition of the matrix A.

45. $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$, $Q = \begin{bmatrix} \tfrac{1}{\sqrt{5}} & -\tfrac{2}{\sqrt{5}} \\ \tfrac{2}{\sqrt{5}} & \tfrac{1}{\sqrt{5}} \end{bmatrix}$

46. $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 4 \end{bmatrix}$, $Q = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{3}} \\ 0 & \tfrac{1}{\sqrt{3}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{3}} \end{bmatrix}$

47. $A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ 1 & 2 & 0 \end{bmatrix}$, $Q = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{3}} & \tfrac{1}{\sqrt{6}} \\ 0 & \tfrac{1}{\sqrt{3}} & \tfrac{2}{\sqrt{6}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{3}} & -\tfrac{1}{\sqrt{6}} \end{bmatrix}$

48. $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 1 & 1 \\ 0 & 3 & 1 \end{bmatrix}$, $Q = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & \tfrac{\sqrt{2}}{2\sqrt{19}} & -\tfrac{3}{\sqrt{19}} \\ \tfrac{1}{\sqrt{2}} & -\tfrac{\sqrt{2}}{2\sqrt{19}} & \tfrac{3}{\sqrt{19}} \\ 0 & \tfrac{3\sqrt{2}}{\sqrt{19}} & \tfrac{1}{\sqrt{19}} \end{bmatrix}$

49. Find a QR-decomposition of the matrix
$$A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 1 \\ -1 & 1 & 1 \end{bmatrix}$$

50. In the Remark following Example 8 we discussed two alternative ways to perform the calculations in the Gram–Schmidt process: normalizing each orthogonal basis vector as soon as it is calculated and scaling the orthogonal basis vectors at each step to eliminate fractions. Try these methods in Example 8.

Working with Proofs

51. Prove part (a) of Theorem 6.3.6.

52. In Step 3 of the proof of Theorem 6.3.5, it was stated that "the linear independence of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ ensures that $\mathbf{v}_3 \ne \mathbf{0}$." Prove this statement.

53. Prove that the diagonal entries of R in Formula (16) are nonzero.

54. Show that matrix Q in Example 10 has the property $QQ^T = I_3$, and prove that every m × n matrix Q with orthonormal column vectors has the property $Q^TQ = I_n$.

55. (a) Prove that if W is a subspace of a finite-dimensional vector space V, then the mapping $T: V \to W$ defined by $T(\mathbf{v}) = \operatorname{proj}_W \mathbf{v}$ is a linear transformation.
(b) What are the range and kernel of the transformation in part (a)?

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.
(a) Every linearly independent set of vectors in an inner product space is orthogonal.
(b) Every orthogonal set of vectors in an inner product space is linearly independent.
(c) Every nontrivial subspace of $R^3$ has an orthonormal basis with respect to the Euclidean inner product.
(d) Every nonzero finite-dimensional inner product space has an orthonormal basis.
(e) $\operatorname{proj}_W \mathbf{x}$ is orthogonal to every vector of W.
(f) If A is an n × n matrix with a nonzero determinant, then A has a QR-decomposition.

Working with Technology

T1. (a) Use the Gram–Schmidt process to find an orthonormal basis relative to the Euclidean inner product for the column space of
$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 2 & -1 & 1 & 1 \end{bmatrix}$$
(b) Use the method of Example 10 to find a QR-decomposition of A.

T2. Let $P_4$ have the evaluation inner product at the points $-2, -1, 0, 1, 2$. Find an orthogonal basis for $P_4$ relative to this inner product by applying the Gram–Schmidt process to the vectors
$$p_0 = 1,\quad p_1 = x,\quad p_2 = x^2,\quad p_3 = x^3,\quad p_4 = x^4$$

6.4 Best Approximation; Least Squares

There are many applications in which some linear system $A\mathbf{x} = \mathbf{b}$ of m equations in n unknowns should be consistent on physical grounds but fails to be so because of measurement errors in the entries of A or $\mathbf{b}$. In such cases one looks for vectors that come as close as possible to being solutions in the sense that they minimize $\|\mathbf{b} - A\mathbf{x}\|$ with respect to the Euclidean inner product on $R^m$. In this section we will discuss methods for finding such minimizing vectors.

Least Squares Solutions of Linear Systems

Suppose that $A\mathbf{x} = \mathbf{b}$ is an inconsistent linear system of m equations in n unknowns in which we suspect the inconsistency to be caused by errors in the entries of A or $\mathbf{b}$. Since no exact solution is possible, we will look for a vector $\mathbf{x}$ that comes as "close as possible" to being a solution in the sense that it minimizes $\|\mathbf{b} - A\mathbf{x}\|$ with respect to the Euclidean inner product on $R^m$. You can think of $A\mathbf{x}$ as an approximation to $\mathbf{b}$ and $\|\mathbf{b} - A\mathbf{x}\|$ as the error in that approximation: the smaller the error, the better the approximation. (If a linear system is consistent, then its exact solutions are the same as its least squares solutions, in which case the least squares error is zero.) This leads to the following problem.

Least Squares Problem Given a linear system $A\mathbf{x} = \mathbf{b}$ of m equations in n unknowns, find a vector $\mathbf{x}$ in $R^n$ that minimizes $\|\mathbf{b} - A\mathbf{x}\|$ with respect to the Euclidean inner product on $R^m$. We call such a vector, if it exists, a least squares solution of $A\mathbf{x} = \mathbf{b}$, we call $\mathbf{b} - A\mathbf{x}$ the least squares error vector, and we call $\|\mathbf{b} - A\mathbf{x}\|$ the least squares error.

To explain the terminology in this problem, suppose that the column form of $\mathbf{b} - A\mathbf{x}$ is

$$\mathbf{b} - A\mathbf{x} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix}$$

The term "least squares solution" results from the fact that minimizing $\|\mathbf{b} - A\mathbf{x}\|$ also has the effect of minimizing $\|\mathbf{b} - A\mathbf{x}\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2$.

What is important to keep in mind about the least squares problem is that for every vector $\mathbf{x}$ in $R^n$, the product $A\mathbf{x}$ is in the column space of A because it is a linear combination of the column vectors of A. That being the case, to find a least squares solution of $A\mathbf{x} = \mathbf{b}$ is equivalent to finding a vector $A\hat{\mathbf{x}}$ in the column space of A that is closest to $\mathbf{b}$ in the sense that it minimizes the length of the vector $\mathbf{b} - A\mathbf{x}$. This is illustrated in Figure 6.4.1a, which also suggests that $A\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{b}$ on the column space of A, that is, $A\hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{col}(A)} \mathbf{b}$ (Figure 6.4.1b). The next theorem will confirm this conjecture.

[Figure 6.4.1: (a) the vectors b, Ax, Ax̂, and b − Ax relative to col(A); (b) Ax̂ = proj_col(A) b in col(A).]

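THEOREM 6.4.1 Best Approximation Theorem

If W is a finite-dimensional subspace of an inner product space V, and if $\mathbf{b}$ is a vector in V, then $\operatorname{proj}_W \mathbf{b}$ is the best approximation to $\mathbf{b}$ from W in the sense that

$$\|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\| < \|\mathbf{b} - \mathbf{w}\|$$

for every vector $\mathbf{w}$ in W that is different from $\operatorname{proj}_W \mathbf{b}$.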

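Proof For every vector $\mathbf{w}$ in W, we can write

$$\mathbf{b} - \mathbf{w} = (\mathbf{b} - \operatorname{proj}_W \mathbf{b}) + (\operatorname{proj}_W \mathbf{b} - \mathbf{w}) \tag{1}$$

But $\operatorname{proj}_W \mathbf{b} - \mathbf{w}$, being a difference of vectors in W, is itself in W; and since $\mathbf{b} - \operatorname{proj}_W \mathbf{b}$ is orthogonal to W, the two terms on the right side of (1) are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that

$$\|\mathbf{b} - \mathbf{w}\|^2 = \|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\|^2 + \|\operatorname{proj}_W \mathbf{b} - \mathbf{w}\|^2$$

If $\mathbf{w} \ne \operatorname{proj}_W \mathbf{b}$, it follows that the second term in this sum is positive, and hence that

$$\|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\|^2 < \|\mathbf{b} - \mathbf{w}\|^2$$

Since norms are nonnegative, it follows (from a property of inequalities) that

$$\|\mathbf{b} - \operatorname{proj}_W \mathbf{b}\| < \|\mathbf{b} - \mathbf{w}\|$$

It follows from Theorem 6.4.1 that if $V = R^m$ and $W = \operatorname{col}(A)$, then the best approximation to $\mathbf{b}$ from $\operatorname{col}(A)$ is $\operatorname{proj}_{\operatorname{col}(A)} \mathbf{b}$. But every vector in the column space of A is expressible in the form $A\mathbf{x}$ for some vector $\mathbf{x}$, so there is at least one vector $\hat{\mathbf{x}}$ in $R^n$ for which $A\hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{col}(A)} \mathbf{b}$. Each such vector is a least squares solution of $A\mathbf{x} = \mathbf{b}$. Note, however, that although there may be more than one least squares solution of $A\mathbf{x} = \mathbf{b}$, each such solution $\hat{\mathbf{x}}$ has the same error vector $\mathbf{b} - A\hat{\mathbf{x}}$.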

Finding Least Squares Solutions

One way to find a least squares solution of $A\mathbf{x} = \mathbf{b}$ is to calculate the orthogonal projection $\operatorname{proj}_W \mathbf{b}$ on the column space W of A and then solve the equation

$$A\mathbf{x} = \operatorname{proj}_W \mathbf{b} \tag{2}$$

However, we can avoid calculating the projection by rewriting (2) as

$$\mathbf{b} - A\mathbf{x} = \mathbf{b} - \operatorname{proj}_W \mathbf{b}$$

and then multiplying both sides of this equation by $A^T$ to obtain

$$A^T(\mathbf{b} - A\mathbf{x}) = A^T(\mathbf{b} - \operatorname{proj}_W \mathbf{b}) \tag{3}$$

Since $\mathbf{b} - \operatorname{proj}_W \mathbf{b}$ is the component of $\mathbf{b}$ that is orthogonal to the column space of A, it follows from Theorem 4.8.7(b) that this vector lies in the null space of $A^T$, and hence that

$$A^T(\mathbf{b} - \operatorname{proj}_W \mathbf{b}) = \mathbf{0}$$

Thus, (3) simplifies to

$$A^T(\mathbf{b} - A\mathbf{x}) = \mathbf{0}$$

which we can rewrite as

$$A^TA\mathbf{x} = A^T\mathbf{b} \tag{4}$$

This is called the normal equation or the normal system associated with $A\mathbf{x} = \mathbf{b}$. When viewed as a linear system, the individual equations are called the normal equations associated with $A\mathbf{x} = \mathbf{b}$. In summary, we have established the following result.

THEOREM 6.4.2 For every linear system $A\mathbf{x} = \mathbf{b}$, the associated normal system

$$A^TA\mathbf{x} = A^T\mathbf{b} \tag{5}$$

is consistent, and all solutions of (5) are least squares solutions of $A\mathbf{x} = \mathbf{b}$. Moreover, if W is the column space of A, and $\mathbf{x}$ is any least squares solution of $A\mathbf{x} = \mathbf{b}$, then the orthogonal projection of $\mathbf{b}$ on W is

$$\operatorname{proj}_W \mathbf{b} = A\mathbf{x} \tag{6}$$
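The normal system can be formed and solved mechanically. The following Python sketch illustrates this with exact rational arithmetic; the helper names (`transpose`, `matvec`, `matmul`, `solve`) and the small 3 × 2 system are assumptions made for illustration only and do not come from the text. The solver assumes the coefficient matrix of the normal system is invertible.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in M[i]] + [Fraction(rhs[i])] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)  # locate a pivot row
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]

# Hypothetical inconsistent system: x1 = 1, x2 = 1, x1 + x2 = 1.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 1]

At = transpose(A)
x = solve(matmul(At, A), matvec(At, b))          # solves A^T A x = A^T b
residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]

# The error vector b - Ax lies in the null space of A^T, as in the derivation of (4).
assert matvec(At, residual) == [0, 0]
```

The closing assertion mirrors the step that turns Equation (3) into the normal equation: the error vector of a least squares solution is orthogonal to every column of A.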

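EXAMPLE 1 Unique Least Squares Solution

Find the least squares solution, the least squares error vector, and the least squares error of the linear system

$$\begin{aligned} x_1 - x_2 &= 4 \\ 3x_1 + 2x_2 &= 1 \\ -2x_1 + 4x_2 &= 3 \end{aligned}$$

Solution It will be convenient to express the system in the matrix form $A\mathbf{x} = \mathbf{b}$, where

$$A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} \tag{7}$$

It follows that

$$A^TA = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \tag{8}$$

$$A^T\mathbf{b} = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

so the normal system $A^TA\mathbf{x} = A^T\mathbf{b}$ is

$$\begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

Solving this system yields a unique least squares solution, namely,

$$x_1 = \tfrac{17}{95}, \quad x_2 = \tfrac{143}{285}$$

The least squares error vector is

$$\mathbf{b} - A\mathbf{x} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \tfrac{17}{95} \\ \tfrac{143}{285} \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\tfrac{92}{285} \\ \tfrac{439}{285} \\ \tfrac{94}{57} \end{bmatrix}
= \begin{bmatrix} \tfrac{1232}{285} \\ -\tfrac{154}{285} \\ \tfrac{77}{57} \end{bmatrix}$$

and the least squares error is

$$\|\mathbf{b} - A\mathbf{x}\| = \frac{77}{\sqrt{285}} \approx 4.561$$

(The third entry of $A\mathbf{x}$ is $-2\cdot\tfrac{17}{95} + 4\cdot\tfrac{143}{285} = \tfrac{470}{285} = \tfrac{94}{57}$, so the third entry of the error vector is $3 - \tfrac{94}{57} = \tfrac{77}{57}$; with this value the error vector is orthogonal to both columns of A, as it must be.)

The computations in the next example are a little tedious for hand computation, so in the absence of a calculating utility you may want to just read through it for its ideas and logical flow.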


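EXAMPLE 2 Infinitely Many Least Squares Solutions

Find the least squares solutions, the least squares error vector, and the least squares error of the linear system

$$\begin{aligned} 3x_1 + 2x_2 - x_3 &= 2 \\ x_1 - 4x_2 + 3x_3 &= -2 \\ x_1 + 10x_2 - 7x_3 &= 1 \end{aligned}$$

Solution The matrix form of the system is $A\mathbf{x} = \mathbf{b}$, where

$$A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}$$

It follows that

$$A^TA = \begin{bmatrix} 11 & 12 & -7 \\ 12 & 120 & -84 \\ -7 & -84 & 59 \end{bmatrix} \quad\text{and}\quad A^T\mathbf{b} = \begin{bmatrix} 5 \\ 22 \\ -15 \end{bmatrix}$$

so the augmented matrix for the normal system $A^TA\mathbf{x} = A^T\mathbf{b}$ is

$$\left[\begin{array}{rrr|r} 11 & 12 & -7 & 5 \\ 12 & 120 & -84 & 22 \\ -7 & -84 & 59 & -15 \end{array}\right]$$

The reduced row echelon form of this matrix is

$$\left[\begin{array}{rrr|r} 1 & 0 & \tfrac{1}{7} & \tfrac{2}{7} \\ 0 & 1 & -\tfrac{5}{7} & \tfrac{13}{84} \\ 0 & 0 & 0 & 0 \end{array}\right]$$

from which it follows that there are infinitely many least squares solutions, and that they are given by the parametric equations

$$x_1 = \tfrac{2}{7} - \tfrac{1}{7}t, \quad x_2 = \tfrac{13}{84} + \tfrac{5}{7}t, \quad x_3 = t$$

As a check, let us verify that all least squares solutions produce the same least squares error vector and the same least squares error. To see that this is so, we first compute

$$\mathbf{b} - A\mathbf{x} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix} \begin{bmatrix} \tfrac{2}{7} - \tfrac{1}{7}t \\ \tfrac{13}{84} + \tfrac{5}{7}t \\ t \end{bmatrix}
= \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} - \begin{bmatrix} \tfrac{7}{6} \\ -\tfrac{1}{3} \\ \tfrac{11}{6} \end{bmatrix}
= \begin{bmatrix} \tfrac{5}{6} \\ -\tfrac{5}{3} \\ -\tfrac{5}{6} \end{bmatrix}$$

Since $\mathbf{b} - A\mathbf{x}$ does not depend on t, all least squares solutions produce the same error vector and hence the same least squares error, namely

$$\|\mathbf{b} - A\mathbf{x}\| = \sqrt{\left(\tfrac{5}{6}\right)^2 + \left(-\tfrac{5}{3}\right)^2 + \left(-\tfrac{5}{6}\right)^2} = \tfrac{5}{6}\sqrt{6}$$

Conditions for Uniqueness of Least Squares Solutions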

We know from Theorem 6.4.2 that the system $A^TA\mathbf{x} = A^T\mathbf{b}$ of normal equations that is associated with the system $A\mathbf{x} = \mathbf{b}$ is consistent. Thus, it follows from Theorem 1.6.1 that every linear system $A\mathbf{x} = \mathbf{b}$ has either one least squares solution (as in Example 1) or infinitely many least squares solutions (as in Example 2). Since $A^TA$ is a square matrix, the former occurs if $A^TA$ is invertible and the latter if it is not. The next two theorems are concerned with this idea.
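The two cases can be told apart by machine. The following Python sketch, which uses hypothetical helper names (`gram`, `rank`, `error_vector`) and exact rational arithmetic, computes the rank of $A^TA$ for the coefficient matrices of Examples 1 and 2 and evaluates the error vector of Example 2 from its parametric solution. It is an illustration under those assumptions, not a general-purpose routine.

```python
from fractions import Fraction as F

def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(M):
    """Rank of M by Gaussian elimination in exact rational arithmetic."""
    rows = [[F(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A1 = [[1, -1], [3, 2], [-2, 4]]                 # coefficient matrix of Example 1
A2 = [[3, 2, -1], [1, -4, 3], [1, 10, -7]]      # coefficient matrix of Example 2
b2 = [2, -2, 1]

# A^T A is invertible exactly when its rank equals its size.
unique_1 = rank(gram(A1)) == 2   # one least squares solution expected
unique_2 = rank(gram(A2)) == 3   # fails, so infinitely many solutions

def error_vector(t):
    """b - Ax for the parametric least squares solution of Example 2."""
    x = [F(2, 7) - F(1, 7) * t, F(13, 84) + F(5, 7) * t, F(t)]
    return [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A2, b2)]

# The error vector should not depend on the parameter t.
assert error_vector(0) == error_vector(1) == error_vector(-3)
```

Checking the rank of $A^TA$ rather than a determinant keeps the same helper usable for matrices of any size.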

THEOREM 6.4.3 If A is an m × n matrix, then the following are equivalent.

(a) The column vectors of A are linearly independent.
(b) $A^TA$ is invertible.

Proof We will prove that (a) ⇒ (b) and leave the proof that (b) ⇒ (a) as an exercise.

(a) ⇒ (b) Assume that the column vectors of A are linearly independent. The matrix $A^TA$ has size n × n, so we can prove that this matrix is invertible by showing that the linear system $A^TA\mathbf{x} = \mathbf{0}$ has only the trivial solution. But if $\mathbf{x}$ is any solution of this system, then $A\mathbf{x}$ is in the null space of $A^T$ and also in the column space of A. By Theorem 4.8.7(b) these spaces are orthogonal complements, so part (b) of Theorem 6.2.4 implies that $A\mathbf{x} = \mathbf{0}$. But A is assumed to have linearly independent column vectors, so $\mathbf{x} = \mathbf{0}$ by Theorem 1.3.1.

The next theorem, which follows directly from Theorems 6.4.2 and 6.4.3, gives an explicit formula for the least squares solution of a linear system in which the coefficient matrix has linearly independent column vectors.

THEOREM 6.4.4 If A is an m × n matrix with linearly independent column vectors, then for every m × 1 matrix $\mathbf{b}$, the linear system $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution. This solution is given by

$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{b} \tag{9}$$

Moreover, if W is the column space of A, then the orthogonal projection of $\mathbf{b}$ on W is

$$\operatorname{proj}_W \mathbf{b} = A\mathbf{x} = A(A^TA)^{-1}A^T\mathbf{b} \tag{10}$$
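Formula (10) can be tried out on a small case. The Python sketch below builds the matrix $A(A^TA)^{-1}A^T$ for a matrix with two independent columns and applies it to a vector. The matrix A, the vector b, and the helper names (`matmul`, `transpose`, `inverse_2x2`) are hypothetical choices made for illustration, and the explicit 2 × 2 inverse is used only because the example is tiny.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjoint formula, in exact arithmetic."""
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: W is the plane in R^3 spanned by (1, 1, 0) and (0, 1, 1).
A = [[1, 0], [1, 1], [0, 1]]
b = [[1], [2], [3]]          # b written as a 3 x 1 column matrix

At = transpose(A)
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)   # A (A^T A)^{-1} A^T
proj = matmul(P, b)                                      # proj_W b by Formula (10)

# b - proj_W b should be orthogonal to both columns of A.
error = [[bi[0] - pi[0]] for bi, pi in zip(b, proj)]
assert matmul(At, error) == [[0], [0]]
```

Applying P twice gives the same result as applying it once, which is what one expects of a projection; the check `matmul(P, P) == P` confirms this for the data above.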

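EXAMPLE 3 A Formula Solution to Example 1

Use Formula (9) and the matrices in Formulas (7) and (8) to find the least squares solution of the linear system in Example 1.

Solution We leave it for you to verify that

$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{b}
= \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}
= \frac{1}{285} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix} \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}
= \begin{bmatrix} \tfrac{17}{95} \\ \tfrac{143}{285} \end{bmatrix}$$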

which agrees with the result obtained in Example 1.

It follows from Formula (10) that the standard matrix for the orthogonal projection on the column space of a matrix A is

$$P = A(A^TA)^{-1}A^T \tag{11}$$

We will use this result in the next example.

EXAMPLE 4 Orthogonal Projection on a Column Space

We showed in Formula (4) of Section 4.9 that the standard matrix for the orthogonal projection onto the line W through the origin of $R^2$ that makes an angle θ with the positive x-axis is

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$$

Derive this result using Formula (11).

Solution To apply Formula (11) we must find a matrix A for which the line W is the column space. Since the line is one-dimensional and consists of all scalar multiples of the vector $\mathbf{w} = (\cos\theta, \sin\theta)$ (see Figure 6.4.2), we can take A to be

$$A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

Since $A^TA$ is the 1 × 1 identity matrix (verify), it follows that

$$A(A^TA)^{-1}A^T = AA^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix}
= \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta$$

[Figure 6.4.2: the unit vector w = (cos θ, sin θ) on the line W through the origin that makes an angle θ with the positive x-axis.]

More on the Equivalence Theorem

As our final result in the main part of this section we will add one additional part to Theorem 5.1.5.

THEOREM 6.4.5 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent.

(a) A is invertible.
(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.
(c) The reduced row echelon form of A is $I_n$.
(d) A is expressible as a product of elementary matrices.
(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every n × 1 matrix $\mathbf{b}$.
(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every n × 1 matrix $\mathbf{b}$.
(g) $\det(A) \ne 0$.
(h) The column vectors of A are linearly independent.
(i) The row vectors of A are linearly independent.
(j) The column vectors of A span $R^n$.
(k) The row vectors of A span $R^n$.
(l) The column vectors of A form a basis for $R^n$.
(m) The row vectors of A form a basis for $R^n$.
(n) A has rank n.
(o) A has nullity 0.
(p) The orthogonal complement of the null space of A is $R^n$.
(q) The orthogonal complement of the row space of A is $\{\mathbf{0}\}$.
(r) The kernel of $T_A$ is $\{\mathbf{0}\}$.
(s) The range of $T_A$ is $R^n$.
(t) $T_A$ is one-to-one.
(u) λ = 0 is not an eigenvalue of A.
(v) $A^TA$ is invertible.

The proof of part (v) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.

OPTIONAL

Another View of Least Squares

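Recall from Theorem 4.8.7 that the null space and row space of an m × n matrix A are orthogonal complements, as are the null space of $A^T$ and the column space of A. Thus, given a linear system $A\mathbf{x} = \mathbf{b}$ in which A is an m × n matrix, the Projection Theorem (6.3.3) tells us that the vectors $\mathbf{x}$ and $\mathbf{b}$ can each be decomposed into sums of orthogonal terms as

$$\mathbf{x} = \mathbf{x}_{\operatorname{row}(A)} + \mathbf{x}_{\operatorname{null}(A)} \quad\text{and}\quad \mathbf{b} = \mathbf{b}_{\operatorname{null}(A^T)} + \mathbf{b}_{\operatorname{col}(A)}$$

where $\mathbf{x}_{\operatorname{row}(A)}$ and $\mathbf{x}_{\operatorname{null}(A)}$ are the orthogonal projections of $\mathbf{x}$ on the row space of A and the null space of A, and the vectors $\mathbf{b}_{\operatorname{null}(A^T)}$ and $\mathbf{b}_{\operatorname{col}(A)}$ are the orthogonal projections of $\mathbf{b}$ on the null space of $A^T$ and the column space of A.

In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in $R^n$ and $R^m$ on which we indicated the orthogonal projections of $\mathbf{x}$ and $\mathbf{b}$. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.) The figure shows $A\mathbf{x}$ as a point in the column space of A and conveys that $\mathbf{b}_{\operatorname{col}(A)}$ is the point in $\operatorname{col}(A)$ that is closest to $\mathbf{b}$. This illustrates that the least squares solutions of $A\mathbf{x} = \mathbf{b}$ are the exact solutions of the equation $A\mathbf{x} = \mathbf{b}_{\operatorname{col}(A)}$.

[Figure 6.4.3: in R^n, the vector x with its projections x_row(A) and x_null(A) on the perpendicular lines row(A) and null(A); in R^m, the vector b with its projections b_col(A) and b_null(A^T) on the perpendicular lines col(A) and null(A^T), with Ax shown in col(A).]

OPTIONAL

The Role of QR-Decomposition in Least Squares Problems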

Formulas (9) and (10) have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of $A\mathbf{x} = \mathbf{b}$ are typically found by using some variation of Gaussian elimination to solve the normal equations or by using QR-decomposition and the following theorem.

THEOREM 6.4.6 If A is an m × n matrix with linearly independent column vectors, and if A = QR is a QR-decomposition of A (see Theorem 6.3.7), then for each $\mathbf{b}$ in $R^m$ the system $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution given by

$$\mathbf{x} = R^{-1}Q^T\mathbf{b} \tag{12}$$

A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However, you can obtain Formula (12) by making the substitution A = QR in (9) and using the fact that $Q^TQ = I$ to obtain

$$\mathbf{x} = \left((QR)^T(QR)\right)^{-1}(QR)^T\mathbf{b} = (R^TQ^TQR)^{-1}(QR)^T\mathbf{b} = (R^TR)^{-1}R^TQ^T\mathbf{b} = R^{-1}(R^T)^{-1}R^TQ^T\mathbf{b} = R^{-1}Q^T\mathbf{b}$$
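Formula (12) can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the helper names (`dot`, `qr_columns`, `back_substitute`, `least_squares_qr`) are hypothetical, Q and R are produced by classical Gram–Schmidt in floating point, and instead of forming $R^{-1}$ the code solves the triangular system $R\mathbf{x} = Q^T\mathbf{b}$ by back substitution, which gives the same vector. The data are those of Example 1.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qr_columns(columns):
    """Classical Gram-Schmidt: return orthonormal columns q_i and upper triangular R."""
    q_cols = []
    for u in columns:
        v = list(u)
        for q in q_cols:
            c = dot(u, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        q_cols.append([vi / norm for vi in v])
    n = len(columns)
    R = [[dot(columns[j], q_cols[i]) if i <= j else 0.0 for j in range(n)]
         for i in range(n)]
    return q_cols, R

def back_substitute(R, y):
    """Solve R x = y for an invertible upper triangular matrix R."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x

def least_squares_qr(columns, b):
    q_cols, R = qr_columns(columns)
    y = [dot(q, b) for q in q_cols]   # y = Q^T b
    return back_substitute(R, y)      # x with R x = Q^T b, i.e. x = R^{-1} Q^T b

# Columns of A and the vector b from Example 1.
cols = [[1, 3, -2], [-1, 2, 4]]
b = [4, 1, 3]
x = least_squares_qr(cols, b)

# Compare with the exact solution x1 = 17/95, x2 = 143/285 found in Example 1.
assert abs(x[0] - 17 / 95) < 1e-9 and abs(x[1] - 143 / 285) < 1e-9
```

Back substitution is used because R is upper triangular, so the system can be solved from the last unknown upward without computing an inverse.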


Exercise Set 6.4

In Exercises 1–2, find the associated normal equation.

1. $\begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$

2. $\begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix}$

In Exercises 3–6, find the least squares solution of the equation $A\mathbf{x} = \mathbf{b}$.

3. $A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & -2 \\ 1 & 1 \\ 3 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$

5. $A = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & -2 \\ 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 6 \\ 0 \\ 9 \\ 3 \end{bmatrix}$

6. $A = \begin{bmatrix} 2 & 0 & -1 \\ 1 & -2 & 2 \\ 2 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 0 \\ 6 \\ 0 \\ 6 \end{bmatrix}$

In Exercises 7–10, find the least squares error vector and least squares error of the stated equation. Verify that the least squares error vector is orthogonal to the column space of A.

7. The equation in Exercise 3.

8. The equation in Exercise 4.

9. The equation in Exercise 5.

10. The equation in Exercise 6.

In Exercises 11–14, find parametric equations for all least squares solutions of $A\mathbf{x} = \mathbf{b}$, and confirm that all of the solutions have the same error vector.

11. $A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \\ -2 & -1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$

13. $A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix}$

14. $A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}$

In Exercises 15–16, use Theorem 6.4.2 to find the orthogonal projection of $\mathbf{b}$ on the column space of A, and check your result using Theorem 6.4.4.

15. $A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix}$

16. $A = \begin{bmatrix} 5 & 1 \\ 1 & 3 \\ 4 & -2 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -4 \\ 2 \\ 3 \end{bmatrix}$

17. Find the orthogonal projection of $\mathbf{u}$ on the subspace of $R^3$ spanned by the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$.
$\mathbf{u} = (1, -6, 1)$; $\mathbf{v}_1 = (-1, 2, 1)$, $\mathbf{v}_2 = (2, 2, 4)$

18. Find the orthogonal projection of $\mathbf{u}$ on the subspace of $R^4$ spanned by the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$.
$\mathbf{u} = (6, 3, 9, 6)$; $\mathbf{v}_1 = (2, 1, 1, 1)$, $\mathbf{v}_2 = (1, 0, 1, 1)$, $\mathbf{v}_3 = (-2, -1, 0, -1)$

In Exercises 19–20, use the method of Example 4 to find the standard matrix for the orthogonal projection on the stated subspace of $R^2$. Compare your result to that in Table 3 of Section 4.9.

19. the x-axis

20. the y-axis

In Exercises 21–22, use the method of Example 4 to find the standard matrix for the orthogonal projection on the stated subspace of $R^3$. Compare your result to that in Table 4 of Section 4.9.

21. the xz-plane

22. the yz-plane

In Exercises 23–24, a QR-factorization of A is given. Use it to find the least squares solution of $A\mathbf{x} = \mathbf{b}$.

23. $A = \begin{bmatrix} 3 & 1 \\ -4 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{5} & \tfrac{4}{5} \\ -\tfrac{4}{5} & \tfrac{3}{5} \end{bmatrix} \begin{bmatrix} 5 & -\tfrac{1}{5} \\ 0 & \tfrac{7}{5} \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$

24. $A = \begin{bmatrix} 3 & -6 \\ 4 & -8 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{3}{5} & 0 \\ \tfrac{4}{5} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 5 & -10 \\ 0 & 1 \end{bmatrix}$; $\mathbf{b} = \begin{bmatrix} -1 \\ 7 \\ 2 \end{bmatrix}$

25. Let W be the plane with equation $5x - 3y + z = 0$.
(a) Find a basis for W.
(b) Find the standard matrix for the orthogonal projection onto W.

26. Let W be the line with parametric equations
$$x = 2t, \quad y = -t, \quad z = 4t$$
(a) Find a basis for W.
(b) Find the standard matrix for the orthogonal projection on W.

27. Find the orthogonal projection of $\mathbf{u} = (5, 6, 7, 2)$ on the solution space of the homogeneous linear system
$$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 2x_2 + x_3 + x_4 &= 0 \end{aligned}$$

28. Show that if $\mathbf{w} = (a, b, c)$ is a nonzero vector, then the standard matrix for the orthogonal projection of $R^3$ onto the line span$\{\mathbf{w}\}$ is
$$P = \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix}$$

29. Let A be an m × n matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of $R^n$ onto the row space of A.

Working with Proofs

30. Prove: If A has linearly independent column vectors, and if $A\mathbf{x} = \mathbf{b}$ is consistent, then the least squares solution of $A\mathbf{x} = \mathbf{b}$ and the exact solution of $A\mathbf{x} = \mathbf{b}$ are the same.

31. Prove: If A has linearly independent column vectors, and if $\mathbf{b}$ is orthogonal to the column space of A, then the least squares solution of $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \mathbf{0}$.

32. Prove the implication (b) ⇒ (a) of Theorem 6.4.3.

True-False Exercises

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer.
(a) If A is an m × n matrix, then $A^TA$ is a square matrix.
(b) If $A^TA$ is invertible, then A is invertible.
(c) If A is invertible, then $A^TA$ is invertible.
(d) If $A\mathbf{x} = \mathbf{b}$ is a consistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also consistent.
(e) If $A\mathbf{x} = \mathbf{b}$ is an inconsistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also inconsistent.
(f) Every linear system has a least squares solution.
(g) Every linear system has a unique least squares solution.
(h) If A is an m × n matrix with linearly independent columns and $\mathbf{b}$ is in $R^m$, then $A\mathbf{x} = \mathbf{b}$ has a unique least squares solution.

Working with Technology

T1. (a) Use Theorem 6.4.4 to show that the following linear system has a unique least squares solution, and use the method of Example 1 to find it.
$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ 4x_1 + 2x_2 + x_3 &= 10 \\ 9x_1 + 3x_2 + x_3 &= 9 \\ 16x_1 + 4x_2 + x_3 &= 16 \end{aligned}$$
(b) Check your result in part (a) using Formula (9).

T2. Use your technology utility to perform the computations and confirm the results obtained in Example 2.

6.5 Mathematical Modeling Using Least Squares

In this section we will use results about orthogonal projections in inner product spaces to obtain a technique for fitting a line or other polynomial curve to a set of experimentally determined points in the plane.

Fitting a Curve to Data

A common problem in experimental work is to obtain a mathematical relationship y = f(x) between two variables x and y by “fitting” a curve to points in the plane corresponding to various experimentally determined values of x and y , say

(x1, y1), (x2, y2), . . . , (xn, yn)

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter decides on the general form of the curve y = f(x) to be fitted. This curve is called a mathematical model of the data. Some examples are (Figure 6.5.1):

(a) A straight line: $y = a + bx$
(b) A quadratic polynomial: $y = a + bx + cx^2$
(c) A cubic polynomial: $y = a + bx + cx^2 + dx^3$

[Figure 6.5.1: panels (a) $y = a + bx$, (b) $y = a + bx + cx^2$, (c) $y = a + bx + cx^2 + dx^3$]

Least Squares Fit of a Straight Line

When data points are obtained experimentally, there is generally some measurement “error,” making it impossible to find a curve of the desired form that passes through all the points. Thus, the idea is to choose the curve (by determining its coefficients) that “best fits” the data. We begin with the simplest case: fitting a straight line to data points.

Suppose we want to fit a straight line y = a + bx to the experimentally determined points

(x1, y1), (x2, y2), . . . , (xn, yn)

If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and b would satisfy the equations

\[
\begin{aligned}
y_1 &= a + bx_1 \\
y_2 &= a + bx_2 \\
&\;\;\vdots \\
y_n &= a + bx_n
\end{aligned}
\tag{1}
\]

We can write this system in matrix form as

\[
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} a \\ b \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

or more compactly as

\[
M\mathbf{v} = \mathbf{y} \tag{2}
\]

where

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix}
\tag{3}
\]

If there are measurement errors in the data, then the data points will typically not lie on a line, and (1) will be inconsistent. In this case we look for a least squares approximation to the values of a and b by solving the normal system

\[
M^TM\mathbf{v} = M^T\mathbf{y}
\]

For simplicity, let us assume that the x-coordinates of the data points are not all the same, so M has linearly independent column vectors (Exercise 7) and the normal system has the unique solution

\[
\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y}
\]

[see Formula (9) of Theorem 6.4.4]. The line $y = a^* + b^*x$ that results from this solution is called the least squares line of best fit or the regression line. It follows from (2) and (3) that this line minimizes

\[
\|\mathbf{y} - M\mathbf{v}\|^2 = [y_1 - (a + bx_1)]^2 + [y_2 - (a + bx_2)]^2 + \cdots + [y_n - (a + bx_n)]^2
\]

The quantities

\[
d_1 = |y_1 - (a + bx_1)|, \quad d_2 = |y_2 - (a + bx_2)|, \quad \ldots, \quad d_n = |y_n - (a + bx_n)|
\]

are called residuals. Since the residual $d_i$ is the vertical distance between the data point $(x_i, y_i)$ and the regression line (Figure 6.5.2), we can interpret its value as the “error” in $y_i$ at the point $x_i$. If we assume that the value of each $x_i$ is exact, then all the errors are in the $y_i$, so the regression line can be described as the line that minimizes the sum of the squares of the data errors; hence the name, “least squares line of best fit.” In summary, we have the following theorem.

[Figure 6.5.2: $d_i$ measures the vertical error.]

THEOREM 6.5.1 Uniqueness of the Least Squares Solution

Let (x1, y1), (x2, y2), . . . , (xn, yn) be a set of two or more data points, not all lying on a vertical line, and let

\[
M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\quad\text{and}\quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\tag{4}
\]

Then there is a unique least squares straight line fit

\[
y = a^* + b^*x \tag{5}
\]

to the data points. Moreover,

\[
\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} \tag{6}
\]

is given by the formula

\[
\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{7}
\]

which expresses the fact that $\mathbf{v} = \mathbf{v}^*$ is the unique solution of the normal equation

\[
M^TM\mathbf{v} = M^T\mathbf{y} \tag{8}
\]
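For a straight line the normal equation (8) is only a 2 × 2 system, so Formula (7) can be evaluated with the explicit inverse of a 2 × 2 matrix. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample points are illustrative assumptions, not part of the text.

```python
def fit_line(points):
    """Return (a, b) for the least squares line y = a + b*x.

    Hypothetical helper: it forms M^T M and M^T y from the data and
    applies the closed-form inverse of the 2x2 matrix M^T M.
    """
    n = len(points)
    sx = sum(x for x, _ in points)        # sum of x_i
    sxx = sum(x * x for x, _ in points)   # sum of x_i^2
    sy = sum(y for _, y in points)        # sum of y_i
    sxy = sum(x * y for x, y in points)   # sum of x_i * y_i

    # M^T M = [[n, sx], [sx, sxx]] and M^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        # All x-coordinates equal: the columns of M are dependent.
        raise ValueError("x-coordinates are all equal; no unique fit")

    # (M^T M)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]]
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Sanity check with assumed data lying exactly on y = 1 + 2x.
a, b = fit_line([(0, 1), (1, 3), (2, 5)])
print(a, b)
```

When the points are collinear, as in this check, the least squares line is the line through them; for noisy data the same function returns the coefficients that minimize the sum of squared residuals.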


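EXAMPLE 1 Least Squares Straight Line Fit

Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See Figure 6.5.3.)

Solution We have

\[
M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad
M^TM = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad\text{and}\quad
(M^TM)^{-1} = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}
\]

\[
\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y}
= \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix}
= \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}
\]

so the desired line is y = 1.5 + x.

[Figure 6.5.3]

EXAMPLE 2 Spring Constant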

Hooke’s law in physics states that the length x of a uniform spring is a linear function of the force y applied to it. If we express this relationship as y = a + bx , then the coefficient b is called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1 inches (i.e., x = 6.1 when y = 0). Suppose further that, as illustrated in Figure 6.5.4, various weights are attached to the end of the spring and the following table of resulting spring lengths is recorded. Find the least squares straight line fit to the data and use it to approximate the spring constant.

Weight y (lb)    0      2      4      6
Length x (in)    6.1    7.6    8.7    10.4

[Figure 6.5.4]

Solution The mathematical problem is to fit a line y = a + bx to the four data points

(6.1, 0), (7.6, 2), (8.7, 4), (10.4, 6)

For these data the matrices M and y in (4) are

\[
M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}
\]

so

\[
\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}
\]

where the numerical values have been rounded to one decimal place. Thus, the estimated value of the spring constant is $b^* \approx 1.4$ pounds/inch.

Least Squares Fit of a Polynomial

The technique described for fitting a straight line to data points can be generalized to fitting a polynomial of specified degree to data points. Let us attempt to fit a polynomial of fixed degree m

\[
y = a_0 + a_1x + \cdots + a_mx^m \tag{9}
\]

to n points

(x1, y1), (x2, y2), . . . , (xn, yn)

Substituting these n values of x and y into (9) yields the n equations

\[
\begin{aligned}
y_1 &= a_0 + a_1x_1 + \cdots + a_mx_1^m \\
y_2 &= a_0 + a_1x_2 + \cdots + a_mx_2^m \\
&\;\;\vdots \\
y_n &= a_0 + a_1x_n + \cdots + a_mx_n^m
\end{aligned}
\]

or in matrix form,

\[
\mathbf{y} = M\mathbf{v} \tag{10}
\]

where

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
M = \begin{bmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^m \\
1 & x_2 & x_2^2 & \cdots & x_2^m \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^m
\end{bmatrix}, \quad
\mathbf{v} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix}
\tag{11}
\]

As before, the solutions of the normal equations

\[
M^TM\mathbf{v} = M^T\mathbf{y}
\]

determine the coefficients of the polynomial, and the vector v minimizes

\[
\|\mathbf{y} - M\mathbf{v}\|
\]

Conditions that guarantee the invertibility of $M^TM$ are discussed in the exercises (Exercise 9). If $M^TM$ is invertible, then the normal equations have a unique solution $\mathbf{v} = \mathbf{v}^*$, which is given by

\[
\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{12}
\]
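The general polynomial fit can be sketched in code by building M from the data, forming the normal equations, and solving them. The sketch below is plain Python and solves the system by Gaussian elimination with partial pivoting rather than by forming the inverse in (12); the names `solve_linear_system` and `fit_polynomial` and the sample data are assumptions made for illustration.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left unchanged.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_polynomial(points, degree):
    """Return [a0, ..., am] solving the normal equations M^T M v = M^T y."""
    M = [[x ** k for k in range(degree + 1)] for x, _ in points]
    y = [yi for _, yi in points]
    size = degree + 1
    MtM = [[sum(row[i] * row[j] for row in M) for j in range(size)]
           for i in range(size)]
    Mty = [sum(row[i] * yi for row, yi in zip(M, y)) for i in range(size)]
    return solve_linear_system(MtM, Mty)


# Assumed test data taken from the exact quadratic y = 1 + 2x + 3x^2.
coeffs = fit_polynomial([(-1, 2), (0, 1), (1, 6), (2, 17)], 2)
print(coeffs)
```

Because the sample points lie exactly on a quadratic, the fit should recover its coefficients up to rounding. For high degrees or badly scaled data the normal equations can be poorly conditioned, so this direct approach is best read as an illustration of the formula rather than a robust numerical method.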

EXAMPLE 3 Fitting a Quadratic Curve to Data

According to Newton’s second law of motion, a body near the Earth’s surface falls vertically downward in accordance with the equation

\[
s = s_0 + v_0t + \tfrac{1}{2}gt^2 \tag{13}
\]

where

s = vertical displacement downward relative to some reference point
s0 = displacement from the reference point at time t = 0
v0 = velocity at time t = 0
g = acceleration of gravity at the Earth’s surface

Suppose that a laboratory experiment is performed to approximate g by measuring the displacement s relative to a fixed reference point of a falling weight at various times. Use the experimental results shown in the following table to approximate g.

Time t (sec)           .1       .2      .3      .4      .5
Displacement s (ft)    −0.18    0.31    1.03    2.48    3.73

Solution For notational simplicity, let $a_0 = s_0$, $a_1 = v_0$, and $a_2 = \tfrac{1}{2}g$ in (13), so our mathematical problem is to fit a quadratic curve

\[
s = a_0 + a_1t + a_2t^2 \tag{14}
\]

to the five data points:

(.1, −0.18), (.2, 0.31), (.3, 1.03), (.4, 2.48), (.5, 3.73)

With the appropriate adjustments in notation, the matrices M and y in (11) are

\[
M = \begin{bmatrix}
1 & t_1 & t_1^2 \\
1 & t_2 & t_2^2 \\
1 & t_3 & t_3^2 \\
1 & t_4 & t_4^2 \\
1 & t_5 & t_5^2
\end{bmatrix}
= \begin{bmatrix}
1 & .1 & .01 \\
1 & .2 & .04 \\
1 & .3 & .09 \\
1 & .4 & .16 \\
1 & .5 & .25
\end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix}
= \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}
\]

Thus, from (12),

\[
\mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix}
= (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}
\]

so the least squares quadratic fit is

\[
s = -0.40 + 0.35t + 16.1t^2
\]

From this equation we estimate that $\tfrac{1}{2}g = 16.1$ and hence that $g = 32.2$ ft/sec$^2$. Note that this equation also provides the following estimates of the initial displacement and velocity of the weight:

$s_0 = a_0^* = -0.40$ ft
$v_0 = a_1^* = 0.35$ ft/sec

In Figure 6.5.5 we have plotted the data points and the approximating polynomial.

[Figure 6.5.5: horizontal axis Time t (in seconds), vertical axis Distance s (in feet)]

Historical Note On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly suggested a linear relationship, so a least squares straight line fit was used on the linear part of the data to obtain the equation

T = 737.5 − 8.125h

By setting h = 0 in this equation, the surface temperature of Venus was estimated at T ≈ 737.5 K. The accuracy of this result has been confirmed by more recent flybys of Venus.

[Figure: Temperature of Venusian Atmosphere. Magellan orbit 3213, Date: 5 October 1991, Latitude: 67 N, LTST: 22:05. Horizontal axis Altitude h (km), vertical axis Temperature T (K). Source: NASA]
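The quadratic fit of Example 3 can be checked numerically from the five data points. The sketch below is plain Python and, as an assumption made to keep it short, solves the 3 × 3 normal equations by Cramer's rule instead of computing the inverse used in the text; the helper name `det3` and the variable names are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Data from the table in Example 3.
times = [0.1, 0.2, 0.3, 0.4, 0.5]
displacements = [-0.18, 0.31, 1.03, 2.48, 3.73]

# Rows of M are (1, t, t^2); then form M^T M and M^T y.
M = [[1.0, t, t * t] for t in times]
MtM = [[sum(r[i] * r[j] for r in M) for j in range(3)] for i in range(3)]
Mty = [sum(r[i] * s for r, s in zip(M, displacements)) for i in range(3)]

# Cramer's rule: replace column k of M^T M by M^T y.
d = det3(MtM)
coeffs = []
for k in range(3):
    mk = [[Mty[i] if j == k else MtM[i][j] for j in range(3)]
          for i in range(3)]
    coeffs.append(det3(mk) / d)

a0, a1, a2 = coeffs
g_estimate = 2 * a2  # since a2 = g / 2
print(round(a0, 2), round(a1, 2), round(a2, 1), round(g_estimate, 1))
```

The rounded coefficients should agree with the values −0.40, 0.35, and 16.1 quoted in the example. Doubling the unrounded a2 gives an estimate of g slightly below the 32.2 obtained in the text from the rounded value 16.1.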


Exercise Set 6.5

In Exercises 1–2, find the least squares straight line fit y = ax + b to the data points, and show that the result is reasonable by graphing the fitted line and plotting the data in the same coordinate system.

1. (0, 0), (1, 2), (2, 7)
2. (0, 1), (2, 0), (3, 1), (3, 2)

In Exercises 3–4, find the least squares quadratic fit $y = a_0 + a_1x + a_2x^2$ to the data points, and show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

3. (2, 0), (3, −10), (5, −48), (6, −76)
4. (1, −2), (0, −1), (1, 0), (2, 4)

5. Find a curve of the form y = a + (b/x) that best fits the data points (1, 7), (3, 3), (6, 1) by making the substitution X = 1/x.

6. Find a curve of the form $y = a + b\sqrt{x}$ that best fits the data points (3, 1.5), (7, 2.5), (10, 3) by making the substitution $X = \sqrt{x}$. Show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

Working with Proofs

7. Prove that the matrix M in Equation (3) has linearly independent columns if and only if at least two of the numbers x1, x2, . . . , xn are distinct.
8. Prove that the columns of the n × (m + 1) matrix M in Equation (11) are linearly independent if n > m and at least m + 1 of the numbers x1, x2, . . . , xn are distinct. [Hint: A nonzero polynomial of degree m has at most m distinct roots.]
9. Let M be the matrix in Equation (11). Using Exercise 8, show that a sufficient condition for the matrix $M^TM$ to be invertible is that n > m and that at least m + 1 of the numbers x1, x2, . . . , xn are distinct.

True-False Exercises

TF. In parts (a)–(d) determine whether the statement is true or false, and justify your answer.

(a) Every set of data points has a unique least squares straight line fit.
(b) If the data points (x1, y1), (x2, y2), . . . , (xn, yn) are not collinear, then (2) is an inconsistent system.
(c) If the data points (x1, y1), (x2, y2), . . . , (xn, yn) do not lie on a vertical line, then the expression

\[
|y_1 - (a + bx_1)|^2 + |y_2 - (a + bx_2)|^2 + \cdots + |y_n - (a + bx_n)|^2
\]

is minimized by taking a and b to be the coefficients in the least squares line y = a + bx of best fit to the data.
(d) If the data points (x1, y1), (x2, y2), . . . , (xn, yn) do not lie on a vertical line, then the expression

\[
|y_1 - (a + bx_1)| + |y_2 - (a + bx_2)| + \cdots + |y_n - (a + bx_n)|
\]

is minimized by taking a and b to be the coefficients in the least squares line y = a + bx of best fit to the data.

Working with Technology

In Exercises T1–T7, find the normal system for the least squares cubic fit $y = a_0 + a_1x + a_2x^2 + a_3x^3$ to the data points. Solve the system and show that the result is reasonable by graphing the fitted curve and plotting the data in the same coordinate system.

T1. (−1, −14), (0, −5), (1, −4), (2, 1), (3, 22)

T2. (0, −10), (1, −1), (2, 0), (3, 5), (4, 26)

T3. The owner of a rapidly expanding business finds that for the first five months of the year the sales (in thousands) are $4.0, $4.4, $5.2, $6.4, and $8.0. The owner plots these figures on a graph and conjectures that for the rest of the year, the sales curve can be approximated by a quadratic polynomial. Find the least squares quadratic polynomial fit to the sales curve, and use it to project the sales for the twelfth month of the year.

T4. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-altitude flights. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude H and temperature T. Show that a linear model is reasonable by plotting the data, and then find the least squares line H = H0 + kT of best fit.

Table Ex-T4
Altitude H (thousands of feet)    15     20      25       30       35       40       45
Temperature T (°C)                4.5    −5.9    −16.1    −27.6    −39.8    −50.2    −62.9

Three important models in applications are

exponential models ($y = ae^{bx}$)
power function models ($y = ax^b$)
logarithmic models ($y = a + b \ln x$)

where a and b are to be determined to fit experimental data as closely as possible. Exercises T5–T7 are concerned with a procedure, called linearization, by which the data are transformed to a form in which a least squares straight line fit can be used to approximate the constants. Calculus is required for these exercises.

T5. (a) Show that making the substitution Y = ln y in the equation $y = ae^{bx}$ produces the equation Y = bx + ln a whose graph in the xY-plane is a line of slope b and Y-intercept ln a.

(b) Part (a) suggests that a curve of the form $y = ae^{bx}$ can be fitted to n data points $(x_i, y_i)$ by letting $Y_i = \ln y_i$, then fitting a straight line to the transformed data points $(x_i, Y_i)$ by least squares to find b and ln a, and then computing a from ln a. Use this method to fit an exponential model to the following data, and graph the curve and data in the same coordinate system.

x    0      1      2      3      4     5     6     7
y    3.9    5.3    7.2    9.6    12    17    23    31

T6. (a) Show that making the substitutions X = ln x and Y = ln y in the equation $y = ax^b$ produces the equation Y = bX + ln a whose graph in the XY-plane is a line of slope b and Y-intercept ln a.

(b) Part (a) suggests that a curve of the form $y = ax^b$ can be fitted to n data points $(x_i, y_i)$ by letting $X_i = \ln x_i$ and $Y_i = \ln y_i$, then fitting a straight line to the transformed data points $(X_i, Y_i)$ by least squares to find b and ln a, and then computing a from ln a. Use this method to fit a power function model to the following data, and graph the curve and data in the same coordinate system.

x    2       3       4       5       6       7       8       9
y    1.75    1.91    2.03    2.13    2.22    2.30    2.37    2.43

T7. (a) Show that making the substitution X = ln x in the equation y = a + b ln x produces the equation y = a + bX whose graph in the Xy-plane is a line of slope b and y-intercept a.

(b) Part (a) suggests that a curve of the form y = a + b ln x can be fitted to n data points $(x_i, y_i)$ by letting $X_i = \ln x_i$ and then fitting a straight line to the transformed data points $(X_i, y_i)$ by least squares to find b and a. Use this method to fit a logarithmic model to the following data, and graph the curve and data in the same coordinate system.

x    2       3       4       5       6       7       8       9
y    4.07    5.30    6.21    6.79    7.32    7.91    8.23    8.51

6.6 Function Approximation; Fourier Series

In this section we will show how orthogonal projections can be used to approximate certain types of functions by simpler functions. The ideas explained here have important applications in engineering and science. Calculus is required.

Best Approximations

All of the problems that we will study in this section will be special cases of the following general problem.

Approximation Problem Given a function f that is continuous on an interval [a, b], find the “best possible approximation” to f using only functions from a specified subspace W of C[a, b].

Here are some examples of such problems:

(a) Find the best possible approximation to $e^x$ over [0, 1] by a polynomial of the form $a_0 + a_1x + a_2x^2$.
(b) Find the best possible approximation to sin πx over [−1, 1] by a function of the form $a_0 + a_1e^x + a_2e^{2x} + a_3e^{3x}$.
(c) Find the best possible approximation to x over [0, 2π] by a function of the form $a_0 + a_1 \sin x + a_2 \sin 2x + b_1 \cos x + b_2 \cos 2x$.

In the first example W is the subspace of C[0, 1] spanned by 1, x, and $x^2$; in the second example W is the subspace of C[−1, 1] spanned by 1, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third example W is the subspace of C[0, 2π] spanned by 1, sin x, sin 2x, cos x, and cos 2x.

Measurements of Error

To solve approximation problems of the preceding types, we first need to make the phrase “best approximation over [a, b]” mathematically precise. To do this we will need some way of quantifying the error that results when one continuous function is approximated by another over an interval [a, b]. If we were to approximate f(x) by g(x), and if we were concerned only with the error in that approximation at a single point $x_0$, then it would be natural to define the error to be

\[
\text{error} = |f(x_0) - g(x_0)|
\]

sometimes called the deviation between f and g at $x_0$ (Figure 6.6.1). However, we are not concerned simply with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The difficulty is that an approximation may have small deviations in one part of the interval and large deviations in another. One possible way of accounting for this is to integrate the deviation |f(x) − g(x)| over the interval [a, b] and define the error over the interval to be

\[
\text{error} = \int_a^b |f(x) - g(x)|\,dx \tag{1}
\]

[Figure 6.6.1: The deviation between f and g at $x_0$.]

[Figure 6.6.2: The area between the graphs of f and g over [a, b] measures the error in approximating f by g over [a, b].]

Geometrically, (1) is the area between the graphs of f(x) and g(x) over the interval [a, b] (Figure 6.6.2): the greater the area, the greater the overall error. Although (1) is natural and appealing geometrically, most mathematicians and scientists generally favor the following alternative measure of error, called the mean square error:

\[
\text{mean square error} = \int_a^b [f(x) - g(x)]^2\,dx
\]

Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose that C[a, b] is given the inner product

\[
\langle f, g \rangle = \int_a^b f(x)g(x)\,dx
\]

It follows that

\[
\|f - g\|^2 = \langle f - g, f - g \rangle = \int_a^b [f(x) - g(x)]^2\,dx = \text{mean square error}
\]

so minimizing the mean square error is the same as minimizing $\|f - g\|^2$. Thus, the approximation problem posed informally at the beginning of this section can be restated more precisely as follows.

Least Squares Approximation

Least Squares Approximation Problem Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product

\[
\langle f, g \rangle = \int_a^b f(x)g(x)\,dx
\]

and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes

\[
\|f - g\|^2 = \int_a^b [f(x) - g(x)]^2\,dx
\]
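The two error measures discussed above, the area in Formula (1) and the mean square error, can be compared numerically for a specific pair of functions. The sketch below is plain Python and approximates both integrals with composite Simpson's rule; the helper names and the choice f(x) = x, g(x) = x² on [0, 1] are assumptions made for illustration.

```python
def simpson(func, a, b, n=1000):
    """Approximate the integral of func over [a, b] (composite Simpson)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def area_error(f, g, a, b):
    """Error measure (1): integral of |f(x) - g(x)| over [a, b]."""
    return simpson(lambda x: abs(f(x) - g(x)), a, b)


def mean_square_error(f, g, a, b):
    """Integral of [f(x) - g(x)]^2 over [a, b], i.e. ||f - g||^2."""
    return simpson(lambda x: (f(x) - g(x)) ** 2, a, b)


# Assumed example: approximate f(x) = x by g(x) = x^2 on [0, 1].
f = lambda x: x
g = lambda x: x * x
print(area_error(f, g, 0.0, 1.0))         # exact value is 1/6
print(mean_square_error(f, g, 0.0, 1.0))  # exact value is 1/30
```

For this pair the exact values follow from elementary integration: the area is 1/2 − 1/3 = 1/6 and the mean square error is 1/3 − 1/2 + 1/5 = 1/30, so the numerical results can be compared against them.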


Since $\|f - g\|^2$ and $\|f - g\|$ are minimized by the same function g, this problem is equivalent to looking for a function g in W that is closest to f. But we know from Theorem 6.4.1 that $g = \operatorname{proj}_W f$ is such a function (Figure 6.6.3). Thus, we have the following result.

[Figure 6.6.3: f is the function in C[a, b] to be approximated, W is the subspace of approximating functions, and $g = \operatorname{proj}_W f$ is the least squares approximation to f from W.]

THEOREM 6.6.1 If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that minimizes the mean square error

\[
\int_a^b [f(x) - g(x)]^2\,dx
\]

is $g = \operatorname{proj}_W f$, where the orthogonal projection is relative to the inner product

\[
\langle f, g \rangle = \int_a^b f(x)g(x)\,dx
\]

The function $g = \operatorname{proj}_W f$ is called the least squares approximation to f from W.

Fourier Series

A function of the form

\[
T(x) = c_0 + c_1 \cos x + c_2 \cos 2x + \cdots + c_n \cos nx + d_1 \sin x + d_2 \sin 2x + \cdots + d_n \sin nx \tag{2}
\]

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then T(x) is said to have order n. For example,

\[
T(x) = 2 + \cos x - 3\cos 2x + 7\sin 4x
\]

is a trigonometric polynomial of order 4 with

\[
c_0 = 2,\; c_1 = 1,\; c_2 = -3,\; c_3 = 0,\; c_4 = 0,\; d_1 = 0,\; d_2 = 0,\; d_3 = 0,\; d_4 = 7
\]

It is evident from (2) that the trigonometric polynomials of order n or less are the various possible linear combinations of

\[
1,\; \cos x,\; \cos 2x,\; \ldots,\; \cos nx,\; \sin x,\; \sin 2x,\; \ldots,\; \sin nx \tag{3}
\]

It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a (2n + 1)-dimensional subspace of C[a, b].

Let us now consider the problem of finding the least squares approximation of a continuous function f(x) over the interval [0, 2π] by a trigonometric polynomial of order n or less. As noted above, the least squares approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must find an orthonormal basis $g_0, g_1, \ldots, g_{2n}$ for W, after which we can compute the orthogonal projection on W from the formula

\[
\operatorname{proj}_W f = \langle f, g_0 \rangle g_0 + \langle f, g_1 \rangle g_1 + \cdots + \langle f, g_{2n} \rangle g_{2n} \tag{4}
\]


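[see Theorem 6.3.4(b)]. An orthonormal basis for W can be obtained by applying the Gram–Schmidt process to the basis vectors in (3) using the inner product

\[
\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx
\]

This yields the orthonormal basis

\[
g_0 = \frac{1}{\sqrt{2\pi}},\quad g_1 = \frac{1}{\sqrt{\pi}}\cos x,\quad \ldots,\quad g_n = \frac{1}{\sqrt{\pi}}\cos nx,\quad
g_{n+1} = \frac{1}{\sqrt{\pi}}\sin x,\quad \ldots,\quad g_{2n} = \frac{1}{\sqrt{\pi}}\sin nx \tag{5}
\]

(see Exercise 6). If we introduce the notation

\[
\begin{aligned}
a_0 &= \frac{2}{\sqrt{2\pi}}\langle f, g_0 \rangle,\quad a_1 = \frac{1}{\sqrt{\pi}}\langle f, g_1 \rangle,\quad \ldots,\quad a_n = \frac{1}{\sqrt{\pi}}\langle f, g_n \rangle \\
b_1 &= \frac{1}{\sqrt{\pi}}\langle f, g_{n+1} \rangle,\quad \ldots,\quad b_n = \frac{1}{\sqrt{\pi}}\langle f, g_{2n} \rangle
\end{aligned}
\tag{6}
\]

then on substituting (5) in (4), we obtain

\[
\operatorname{proj}_W f = \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx] \tag{7}
\]

where

\[
\begin{aligned}
a_0 &= \frac{2}{\sqrt{2\pi}}\langle f, g_0 \rangle = \frac{2}{\sqrt{2\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{2\pi}}\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx \\
a_1 &= \frac{1}{\sqrt{\pi}}\langle f, g_1 \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos x\,dx \\
&\;\;\vdots \\
a_n &= \frac{1}{\sqrt{\pi}}\langle f, g_n \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\,dx \\
b_1 &= \frac{1}{\sqrt{\pi}}\langle f, g_{n+1} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin x\,dx \\
&\;\;\vdots \\
b_n &= \frac{1}{\sqrt{\pi}}\langle f, g_{2n} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin nx\,dx
\end{aligned}
\]

In short,

\[
a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx, \quad b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx \tag{8}
\]

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of f.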
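The integral formulas for the Fourier coefficients can be approximated numerically. The sketch below is plain Python and uses composite Simpson's rule in place of exact integration; the function names and the test function, a trigonometric polynomial whose coefficients are known in advance, are assumptions chosen for illustration.

```python
import math


def simpson(func, a, b, n=2000):
    """Approximate the integral of func over [a, b] (composite Simpson)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def fourier_coefficients(f, n, steps=2000):
    """Return lists (a, b) with a[k] = a_k for k = 0..n and b[k] = b_k.

    b[0] is unused padding so that the list index matches the subscript.
    """
    two_pi = 2 * math.pi
    a, b = [], [0.0]
    for k in range(n + 1):
        a.append(simpson(lambda x: f(x) * math.cos(k * x),
                         0.0, two_pi, steps) / math.pi)
        if k >= 1:
            b.append(simpson(lambda x: f(x) * math.sin(k * x),
                             0.0, two_pi, steps) / math.pi)
    return a, b


# Assumed test function: T(x) = 3 + 2 cos x - 5 sin 2x.
T = lambda x: 3 + 2 * math.cos(x) - 5 * math.sin(2 * x)
a, b = fourier_coefficients(T, 2)
print([round(v, 6) for v in a], [round(v, 6) for v in b])
```

Since the constant term of the projection is $a_0/2$, the computed $a_0$ for this test function should be close to 6, with $a_1$ close to 2, $b_2$ close to −5, and the remaining coefficients close to 0.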

EXAMPLE 1 Least Squares Approximations

Find the least squares approximation of f(x) = x on [0, 2π] by

(a) a trigonometric polynomial of order 2 or less;
(b) a trigonometric polynomial of order n or less.

Solution (a)

\[
a_0 = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx = \frac{1}{\pi}\int_0^{2\pi} x\,dx = 2\pi \tag{9a}
\]

For k = 1, 2, . . . , integration by parts yields (verify)

\[
a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\cos kx\,dx = 0 \tag{9b}
\]

\[
b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\sin kx\,dx = -\frac{2}{k} \tag{9c}
\]

Thus, the least squares approximation to x on [0, 2π] by a trigonometric polynomial of order 2 or less is

\[
x \approx \frac{a_0}{2} + a_1 \cos x + a_2 \cos 2x + b_1 \sin x + b_2 \sin 2x
\]

or, from (9a), (9b), and (9c),

\[
x \approx \pi - 2\sin x - \sin 2x
\]

Solution (b) The least squares approximation to x on [0, 2π] by a trigonometric polynomial of order n or less is

\[
x \approx \frac{a_0}{2} + [a_1 \cos x + \cdots + a_n \cos nx] + [b_1 \sin x + \cdots + b_n \sin nx]
\]

or, from (9a), (9b), and (9c),

\[
x \approx \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right)
\]

The graphs of y = x and some of these approximations are shown in Figure 6.6.4.

[Figure 6.6.4: graphs over [0, 2π] of $y = x$ and the approximations $y = \pi$, $y = \pi - 2\sin x$, $y = \pi - 2\left(\sin x + \frac{\sin 2x}{2}\right)$, $y = \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3}\right)$, and $y = \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \frac{\sin 4x}{4}\right)$]

Historical Note Jean Baptiste Fourier (1768–1830). Fourier was a French mathematician and physicist who discovered the Fourier series and related ideas while working on problems of heat diffusion. This discovery was one of the most influential in the history of mathematics; it is the cornerstone of many fields of mathematical research and a basic tool in many branches of engineering. Fourier, a political activist during the French revolution, spent time in jail for his defense of many victims during the Terror. He later became a favorite of Napoleon who made him a baron. [Image: Hulton Archive/Getty Images]

It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

\[
f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx)
\]

increases. It can be proved that for functions f in C[0, 2π], the mean square error approaches zero as n → +∞; this is denoted by writing

\[
f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx)
\]

The right side of this equation is called the Fourier series for f over the interval [0, 2π]. Such series are of major importance in engineering, science, and mathematics.
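The claim that the mean square error diminishes as terms are added can be observed numerically for f(x) = x, using the approximation π − 2(sin x + (sin 2x)/2 + · · · + (sin nx)/n) from Example 1. The sketch below is plain Python with composite Simpson's rule standing in for exact integration; the helper names and the chosen orders n = 1, 2, 4, 8 are illustrative assumptions.

```python
import math


def simpson(func, a, b, n=2000):
    """Approximate the integral of func over [a, b] (composite Simpson)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def partial_sum(x, n):
    """Order-n least squares approximation of f(x) = x on [0, 2*pi]."""
    return math.pi - 2 * sum(math.sin(k * x) / k for k in range(1, n + 1))


def mean_square_error(n):
    """Integral of [x - partial_sum(x, n)]^2 over [0, 2*pi]."""
    return simpson(lambda x: (x - partial_sum(x, n)) ** 2,
                   0.0, 2 * math.pi)


orders = (1, 2, 4, 8)
errors = [mean_square_error(n) for n in orders]
for n, err in zip(orders, errors):
    print(n, round(err, 4))
```

Each added order removes the squared length of one more projection component from the error, so the printed values should decrease steadily, although slowly, because the coefficients −2/k decay only like 1/k.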


Exercise Set 6.6

1. Find the least squares approximation of f(x) = 1 + x over the interval [0, 2π] by
(a) a trigonometric polynomial of order 2 or less.
(b) a trigonometric polynomial of order n or less.

2. Find the least squares approximation of $f(x) = x^2$ over the interval [0, 2π] by
(a) a trigonometric polynomial of order 3 or less.
(b) a trigonometric polynomial of order n or less.

3. (a) Find the least squares approximation of x over the interval [0, 1] by a function of the form $a + be^x$.
(b) Find the mean square error of the approximation.

4. (a) Find the least squares approximation of $e^x$ over the interval [0, 1] by a polynomial of the form $a_0 + a_1x$.
(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of sin πx over the interval [−1, 1] by a polynomial of the form $a_0 + a_1x + a_2x^2$.
(b) Find the mean square error of the approximation.

6. Use the Gram–Schmidt process to obtain the orthonormal basis (5) from the basis (3).

7. Carry out the integrations indicated in Formulas (9a), (9b), and (9c).

8. Find the Fourier series of f(x) = π − x over the interval [0, 2π].

9. Find the Fourier series of f(x) = 1, 0 < x < π and f(x) = 0, π ≤ x ≤ 2π over the interval [0, 2π].

10. What is the Fourier series of sin(3x)?

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the area between the graphs of f(x) and g(x) over the interval [a, b].
(b) Given a finite-dimensional subspace W of C[a, b], the function $g = \operatorname{proj}_W f$ minimizes the mean square error.
(c) {1, cos x, sin x, cos 2x, sin 2x} is an orthogonal subset of the vector space C[0, 2π] with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx$.
(d) {1, cos x, sin x, cos 2x, sin 2x} is an orthonormal subset of the vector space C[0, 2π] with respect to the inner product $\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx$.
(e) {1, cos x, sin x, cos 2x, sin 2x} is a linearly independent subset of C[0, 2π].

Chapter 6 Supplementary Exercises

1. Let $R^4$ have the Euclidean inner product.
(a) Find a vector in $R^4$ that is orthogonal to u1 = (1, 0, 0, 0) and u4 = (0, 0, 0, 1) and makes equal angles with u2 = (0, 1, 0, 0) and u3 = (0, 0, 1, 0).
(b) Find a vector x = (x1, x2, x3, x4) of length 1 that is orthogonal to u1 and u4 above and such that the cosine of the angle between x and u2 is twice the cosine of the angle between x and u3.

2. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle$ is the Euclidean inner product on $R^n$, and if A is an n × n matrix, then

\[
\langle \mathbf{u}, A\mathbf{v} \rangle = \langle A^T\mathbf{u}, \mathbf{v} \rangle
\]

[Hint: Use the fact that $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T\mathbf{u}$.]

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(U^TV) = \operatorname{tr}(V^TU)$ that was defined in Example 6 of Section 6.1. Describe the orthogonal complement of
(a) the subspace of all diagonal matrices.
(b) the subspace of symmetric matrices.

4. Let Ax = 0 be a system of m equations in n unknowns. Show that

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

is a solution of this system if and only if the vector x = (x1, x2, . . . , xn) is orthogonal to every row vector of A with respect to the Euclidean inner product on $R^n$.

5. Use the Cauchy–Schwarz inequality to show that if a1, a2, . . . , an are positive real numbers, then

\[
(a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \geq n^2
\]

6. Show that if x and y are vectors in an inner product space and c is any scalar, then

\[
\|c\mathbf{x} + \mathbf{y}\|^2 = c^2\|\mathbf{x}\|^2 + 2c\langle \mathbf{x}, \mathbf{y} \rangle + \|\mathbf{y}\|^2
\]

7. Let $R^3$ have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of the vectors u1 = (1, 1, −1), u2 = (−2, −1, 2), and u3 = (−1, 0, 1).

8. Find a weighted Euclidean inner product on $R^n$ such that the vectors

\[
\begin{aligned}
\mathbf{v}_1 &= (1, 0, 0, \ldots, 0) \\
\mathbf{v}_2 &= (0, \sqrt{2}, 0, \ldots, 0) \\
\mathbf{v}_3 &= (0, 0, \sqrt{3}, \ldots, 0) \\
&\;\;\vdots \\
\mathbf{v}_n &= (0, 0, 0, \ldots, \sqrt{n})
\end{aligned}
\]

form an orthonormal set.

9. Is there a weighted Euclidean inner product on $R^2$ for which the vectors (1, 2) and (3, −1) form an orthonormal set? Justify your answer.

10. If u and v are vectors in an inner product space V, then u, v, and u − v can be regarded as sides of a “triangle” in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle; that is,

\[
\|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta
\]

where θ is the angle between u and v.

[Figure Ex-10: triangle with sides u, v, and u − v, and angle θ between u and v]

11. (a) As shown in Figure 3.2.6, the vectors (k, 0, 0), (0, k, 0), and (0, 0, k) form the edges of a cube in $R^3$ with diagonal (k, k, k). Similarly, the vectors

(k, 0, 0, . . . , 0), (0, k, 0, . . . , 0), . . . , (0, 0, 0, . . . , k)

can be regarded as edges of a “cube” in $R^n$ with diagonal (k, k, k, . . . , k). Show that each of the above edges makes an angle of θ with the diagonal, where $\cos\theta = 1/\sqrt{n}$.
(b) (Calculus required) What happens to the angle θ in part (a) as the dimension of $R^n$ approaches ∞?

12. Let u and v be vectors in an inner product space.
(a) Prove that $\|\mathbf{u}\| = \|\mathbf{v}\|$ if and only if u + v and u − v are orthogonal.
(b) Give a geometric interpretation of this result in $R^2$ with the Euclidean inner product.

13. Let u be a vector in an inner product space V, and let {v1, v2, . . . , vn} be an orthonormal basis for V. Show that if $\alpha_i$ is the angle between u and $\mathbf{v}_i$, then

\[
\cos^2\alpha_1 + \cos^2\alpha_2 + \cdots + \cos^2\alpha_n = 1
\]

14. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle_1$ and $\langle \mathbf{u}, \mathbf{v} \rangle_2$ are two inner products on a vector space V, then the quantity $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle_1 + \langle \mathbf{u}, \mathbf{v} \rangle_2$ is also an inner product.

15. Prove Theorem 6.2.5.

16. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares solution of Ax = b is x = 0.

17. Is there any value of s for which x1 = 1 and x2 = 2 is the least squares solution of the following linear system?

x1 − x2 = 1
2x1 + 3x2 = 1
4x1 + 5x2 = s

Explain your reasoning.

18. Show that if p and q are distinct positive integers, then the functions f(x) = sin px and g(x) = sin qx are orthogonal with respect to the inner product

\[
\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx
\]

19. Show that if p and q are positive integers, then the functions f(x) = cos px and g(x) = sin qx are orthogonal with respect to the inner product

\[
\langle f, g \rangle = \int_0^{2\pi} f(x)g(x)\,dx
\]

20. Let W be the intersection of the planes

x + y + z = 0 and x − y + z = 0

in $R^3$. Find an equation for $W^\perp$.

21. Prove that if ad − bc ≠ 0, then the matrix

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

has a unique QR-decomposition A = QR, where

\[
Q = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a & -c \\ c & a \end{bmatrix}, \quad
R = \frac{1}{\sqrt{a^2 + c^2}}\begin{bmatrix} a^2 + c^2 & ab + cd \\ 0 & ad - bc \end{bmatrix}
\]

CHAPTER 7

Diagonalization and Quadratic Forms

CHAPTER CONTENTS

7.1 Orthogonal Matrices 401
7.2 Orthogonal Diagonalization 409
7.3 Quadratic Forms 417
7.4 Optimization Using Quadratic Forms 429
7.5 Hermitian, Unitary, and Normal Matrices 437

INTRODUCTION

In Section 5.2 we found conditions that guaranteed the diagonalizability of an n × n matrix, but we did not consider what class or classes of matrices might actually satisfy those conditions. In this chapter we will show that every symmetric matrix is diagonalizable. This is an extremely important result because many applications utilize it in some essential way.

7.1 Orthogonal Matrices

In this section we will discuss the class of matrices whose inverses can be obtained by transposition. Such matrices occur in a variety of applications and also arise as transition matrices when one orthonormal basis is changed to another.

Orthogonal Matrices

We begin with the following definition.

DEFINITION 1 A square matrix $A$ is said to be orthogonal if its transpose is the same as its inverse, that is, if

\[ A^{-1} = A^T \]

or, equivalently, if

\[ AA^T = A^TA = I \tag{1} \]

Recall from Theorem 1.6.3 that if either product in (1) holds, then so does the other. Thus, $A$ is orthogonal if either $AA^T = I$ or $A^TA = I$.

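EXAMPLE 1 A 3 × 3 Orthogonal Matrix

The matrix

\[ A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\[2pt] -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\[2pt] \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} \]

is orthogonal since

\[ A^TA = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\[2pt] \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\[2pt] \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\[2pt] -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\[2pt] \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]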
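The computation in Example 1 can also be checked mechanically. The following is a minimal sketch in Python, using exact rational arithmetic from the standard library; the helper names `transpose`, `matmul`, and `identity` are illustrative choices and do not come from the text.

```python
from fractions import Fraction as F


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def identity(n):
    """Return the n x n identity matrix with Fraction entries."""
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


# The matrix of Example 1, entered with exact fractions.
A = [[F(3, 7), F(2, 7), F(6, 7)],
     [F(-6, 7), F(3, 7), F(2, 7)],
     [F(2, 7), F(6, 7), F(-3, 7)]]

print(matmul(transpose(A), A) == identity(3))  # True
print(matmul(A, transpose(A)) == identity(3))  # True
```

Exact fractions are used here so that the comparison with the identity matrix involves no rounding error; with floating-point entries one would compare within a tolerance instead.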

402

Chapter 7 Diagonalization and Quadratic Forms

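EXAMPLE 2 Rotation and Reflection Matrices Are Orthogonal

Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of $R^2$ through an angle $\theta$ is

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

This matrix is orthogonal for all choices of $\theta$ since

\[ A^TA = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]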

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of Section 4.9 are all orthogonal.
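A numerical spot check can complement the hand verification requested above. The sketch below is only an illustration: the function names `rotation`, `gram`, and `is_orthogonal`, the sample angles, and the tolerance are assumptions made for this example, and a floating-point check of a few angles is not a proof.

```python
import math


def rotation(theta):
    """Standard matrix for counterclockwise rotation of R^2 through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def gram(m):
    """Return M^T M; its (i, j) entry is the dot product of columns i and j."""
    cols = list(zip(*m))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]


def is_orthogonal(m, tol=1e-12):
    """Check M^T M = I entrywise, up to a floating-point tolerance."""
    g = gram(m)
    n = len(m)
    return all(abs(g[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


# Rotation matrices pass the test for each sampled angle.
print(all(is_orthogonal(rotation(t)) for t in (0.0, 0.3, math.pi / 4, 2.0, 5.5)))

reflection_x = [[1.0, 0.0], [0.0, -1.0]]         # reflection about the x-axis
print(is_orthogonal(reflection_x))               # True
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))   # False: a shear is not orthogonal
```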

Observe that for the orthogonal matrices in Examples 1 and 2, both the row vectors and the column vectors form orthonormal sets with respect to the Euclidean inner product. This is a consequence of the following theorem.

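THEOREM 7.1.1 The following are equivalent for an $n \times n$ matrix $A$.

(a) $A$ is orthogonal.
(b) The row vectors of $A$ form an orthonormal set in $R^n$ with the Euclidean inner product.
(c) The column vectors of $A$ form an orthonormal set in $R^n$ with the Euclidean inner product.

Proof We will prove the equivalence of (a) and (b) and leave the equivalence of (a) and (c) as an exercise.

(a) ⇔ (b) Let $\mathbf{r}_i$ be the $i$th row vector of $A$ and $\mathbf{c}_j$ the $j$th column vector of $A^T$. Since transposing a matrix converts its columns to rows and rows to columns, it follows that $\mathbf{c}_j = \mathbf{r}_j^T$. Thus, it follows from the row-column rule [Formula (5) of Section 1.3] and the bottom form listed in Table 1 of Section 3.2 that

\[ AA^T = \begin{bmatrix} \mathbf{r}_1\mathbf{c}_1 & \mathbf{r}_1\mathbf{c}_2 & \cdots & \mathbf{r}_1\mathbf{c}_n \\ \mathbf{r}_2\mathbf{c}_1 & \mathbf{r}_2\mathbf{c}_2 & \cdots & \mathbf{r}_2\mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n\mathbf{c}_1 & \mathbf{r}_n\mathbf{c}_2 & \cdots & \mathbf{r}_n\mathbf{c}_n \end{bmatrix} = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{r}_1 & \mathbf{r}_1 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{r}_n \\ \mathbf{r}_2 \cdot \mathbf{r}_1 & \mathbf{r}_2 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{r}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n \cdot \mathbf{r}_1 & \mathbf{r}_n \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_n \cdot \mathbf{r}_n \end{bmatrix} \]

It is evident from this formula that $AA^T = I$ if and only if

\[ \mathbf{r}_1 \cdot \mathbf{r}_1 = \mathbf{r}_2 \cdot \mathbf{r}_2 = \cdots = \mathbf{r}_n \cdot \mathbf{r}_n = 1 \]

and

\[ \mathbf{r}_i \cdot \mathbf{r}_j = 0 \quad\text{when } i \neq j \]

which are true if and only if $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is an orthonormal set in $R^n$.

WARNING Note that an orthogonal matrix has orthonormal rows and columns, not simply orthogonal rows and columns.

The following theorem lists four more fundamental properties of orthogonal matrices. The proofs are all straightforward and are left as exercises.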


THEOREM 7.1.2

(a) The transpose of an orthogonal matrix is orthogonal.
(b) The inverse of an orthogonal matrix is orthogonal.
(c) A product of orthogonal matrices is orthogonal.
(d) If $A$ is orthogonal, then $\det(A) = 1$ or $\det(A) = -1$.

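EXAMPLE 3 det(A) = ±1 for an Orthogonal Matrix A

The matrix

\[ A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \]

is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with the Euclidean inner product. We leave it for you to verify that $\det(A) = 1$ and that interchanging the rows produces an orthogonal matrix whose determinant is $-1$.

Orthogonal Matrices as Linear Operators

We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on $R^2$ and $R^3$ are orthogonal. The next theorem will explain why this is so.

THEOREM 7.1.3 If $A$ is an $n \times n$ matrix, then the following are equivalent.

(a) $A$ is orthogonal.
(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$.
(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.

Proof We will prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that $A$ is orthogonal, so that $A^TA = I$. It follows from Formula (26) of Section 3.2 that

\[ \|A\mathbf{x}\| = (A\mathbf{x} \cdot A\mathbf{x})^{1/2} = (\mathbf{x} \cdot A^TA\mathbf{x})^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \|\mathbf{x}\| \]

(b) ⇒ (c) Assume that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$. From Theorem 3.2.7 we have

\[ \begin{aligned} A\mathbf{x} \cdot A\mathbf{y} &= \tfrac{1}{4}\|A\mathbf{x} + A\mathbf{y}\|^2 - \tfrac{1}{4}\|A\mathbf{x} - A\mathbf{y}\|^2 = \tfrac{1}{4}\|A(\mathbf{x} + \mathbf{y})\|^2 - \tfrac{1}{4}\|A(\mathbf{x} - \mathbf{y})\|^2 \\ &= \tfrac{1}{4}\|\mathbf{x} + \mathbf{y}\|^2 - \tfrac{1}{4}\|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x} \cdot \mathbf{y} \end{aligned} \]

(c) ⇒ (a) Assume that $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$. It follows from Formula (26) of Section 3.2 that

\[ \mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^TA\mathbf{y} \]

which can be rewritten as $\mathbf{x} \cdot (A^TA\mathbf{y} - \mathbf{y}) = 0$ or as

\[ \mathbf{x} \cdot (A^TA - I)\mathbf{y} = 0 \]

Since this equation holds for all $\mathbf{x}$ in $R^n$, it holds in particular if $\mathbf{x} = (A^TA - I)\mathbf{y}$, so

\[ (A^TA - I)\mathbf{y} \cdot (A^TA - I)\mathbf{y} = 0 \]

Thus, it follows from the positivity axiom for inner products that

\[ (A^TA - I)\mathbf{y} = \mathbf{0} \]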


Since this equation is satisfied by every vector y in R n , it must be that ATA − I is the zero matrix (why?) and hence that ATA = I . Thus, A is orthogonal.
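Parts (b) and (c) of Theorem 7.1.3 can be illustrated numerically. The sketch below applies the orthogonal matrix of Example 1 to a few randomly chosen vectors and compares norms and dot products before and after; the helper names, the random seed, and the tolerances are assumptions made for this illustration, and the check does not replace the proof.

```python
import math
import random

# The orthogonal matrix from Example 1, in floating point.
A = [[3 / 7, 2 / 7, 6 / 7],
     [-6 / 7, 3 / 7, 2 / 7],
     [2 / 7, 6 / 7, -3 / 7]]


def apply(m, x):
    """Return the matrix-vector product m x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm on R^n."""
    return math.sqrt(dot(x, x))


random.seed(0)  # arbitrary seed, fixed only for repeatability
for _ in range(5):
    x = [random.uniform(-10, 10) for _ in range(3)]
    y = [random.uniform(-10, 10) for _ in range(3)]
    Ax, Ay = apply(A, x), apply(A, y)
    # Part (b): lengths are preserved.
    assert math.isclose(norm(Ax), norm(x), rel_tol=1e-9)
    # Part (c): dot products are preserved.
    assert math.isclose(dot(Ax, Ay), dot(x, y), rel_tol=1e-9, abs_tol=1e-9)
print("norms and dot products agree for all sampled vectors")
```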

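[Figure 7.1.1: vectors $\mathbf{u}$ and $\mathbf{v}$ with angle $\alpha$ between them, and their images $T_A(\mathbf{u})$ and $T_A(\mathbf{v})$ with angle $\beta$ between them; $\|T_A(\mathbf{u})\| = \|\mathbf{u}\|$, $\|T_A(\mathbf{v})\| = \|\mathbf{v}\|$, $\alpha = \beta$, and $d(T_A(\mathbf{u}), T_A(\mathbf{v})) = d(\mathbf{u}, \mathbf{v})$.]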

Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If $A$ is an orthogonal matrix and $T_A: R^n \to R^n$ is multiplication by $A$, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts (a) and (b) of Theorem 7.1.3 that the orthogonal operators on $R^n$ are precisely those operators that leave the lengths (norms) of vectors unchanged. However, as illustrated in Figure 7.1.1, this implies that orthogonal operators also leave angles and distances between vectors in $R^n$ unchanged since these can be expressed in terms of norms [see Definition 2 and Formula (20) of Section 3.2].

Change of Orthonormal Basis

Orthonormal bases for inner product spaces are convenient because, as the following theorem shows, many familiar formulas hold for such bases. We leave the proof as an exercise.

THEOREM 7.1.4 If $S$ is an orthonormal basis for an $n$-dimensional inner product space $V$, and if

\[ (\mathbf{u})_S = (u_1, u_2, \ldots, u_n) \quad\text{and}\quad (\mathbf{v})_S = (v_1, v_2, \ldots, v_n) \]

then:

(a) $\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$
(b) $d(\mathbf{u}, \mathbf{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$
(c) $\langle \mathbf{u}, \mathbf{v}\rangle = u_1v_1 + u_2v_2 + \cdots + u_nv_n$

Remark Note that the three parts of Theorem 7.1.4 can be expressed as

\[ \|\mathbf{u}\| = \|(\mathbf{u})_S\|, \qquad d(\mathbf{u}, \mathbf{v}) = d\big((\mathbf{u})_S, (\mathbf{v})_S\big), \qquad \langle \mathbf{u}, \mathbf{v}\rangle = \big\langle (\mathbf{u})_S, (\mathbf{v})_S \big\rangle \]

where the norm, distance, and inner product on the left sides are relative to the inner product on $V$ and on the right sides are relative to the Euclidean inner product on $R^n$.

Transitions between orthonormal bases for an inner product space are of special importance in geometry and various applications. The following theorem, whose proof is deferred to the end of this section, is concerned with transitions of this type.

THEOREM 7.1.5 Let $V$ be a finite-dimensional inner product space. If $P$ is the transition matrix from one orthonormal basis for $V$ to another orthonormal basis for $V$, then $P$ is an orthogonal matrix.

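EXAMPLE 4 Rotation of Axes in 2-Space

In many problems a rectangular $xy$-coordinate system is given, and a new $x'y'$-coordinate system is obtained by rotating the $xy$-system counterclockwise about the origin through an angle $\theta$. When this is done, each point $Q$ in the plane has two sets of coordinates: coordinates $(x, y)$ relative to the $xy$-system and coordinates $(x', y')$ relative to the $x'y'$-system (Figure 7.1.2a).

By introducing unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ along the positive $x$- and $y$-axes and unit vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ along the positive $x'$- and $y'$-axes, we can regard this rotation as a change from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ to a new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ (Figure 7.1.2b). Thus, the new coordinates $(x', y')$ and the old coordinates $(x, y)$ of a point $Q$ will be related by

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = P^{-1} \begin{bmatrix} x \\ y \end{bmatrix} \tag{2} \]

where $P$ is the transition matrix from $B'$ to $B$. To find $P$ we must determine the coordinate matrices of the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ relative to the old basis. As indicated in Figure 7.1.2c, the components of $\mathbf{u}_1'$ in the old basis are $\cos\theta$ and $\sin\theta$, so

\[ [\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

Similarly, from Figure 7.1.2d we see that the components of $\mathbf{u}_2'$ in the old basis are $\cos(\theta + \pi/2) = -\sin\theta$ and $\sin(\theta + \pi/2) = \cos\theta$, so

\[ [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \]

Thus the transition matrix from $B'$ to $B$ is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{3} \]

Observe that $P$ is an orthogonal matrix, as expected, since $B$ and $B'$ are orthonormal bases. Thus

\[ P^{-1} = P^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]

so (2) yields

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4} \]

or, equivalently,

\[ \begin{aligned} x' &= x\cos\theta + y\sin\theta \\ y' &= -x\sin\theta + y\cos\theta \end{aligned} \tag{5} \]

These are sometimes called the rotation equations for $R^2$.

[Figure 7.1.2: (a) a point $Q$ with coordinates $(x, y)$ and $(x', y')$; (b) the old basis vectors $\mathbf{u}_1, \mathbf{u}_2$ and the new basis vectors $\mathbf{u}_1', \mathbf{u}_2'$; (c) the components $\cos\theta$ and $\sin\theta$ of $\mathbf{u}_1'$; (d) the components $\cos(\theta + \pi/2)$ and $\sin(\theta + \pi/2)$ of $\mathbf{u}_2'$.]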

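EXAMPLE 5 Rotation of Axes in 2-Space

Use form (4) of the rotation equations for $R^2$ to find the new coordinates of the point $Q(2, -1)$ if the coordinate axes of a rectangular coordinate system are rotated through an angle of $\theta = \pi/4$.

Solution Since

\[ \sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}} \]

the equation in (4) becomes

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

Thus, if the old coordinates of a point $Q$ are $(x, y) = (2, -1)$, then

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\[2pt] -\frac{3}{\sqrt{2}} \end{bmatrix} \]

so the new coordinates of $Q$ are $(x', y') = \left( \frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}} \right)$.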
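The rotation equations translate directly into a short function. The sketch below is an illustration only: the names `rotate_axes` and `unrotate_axes` are hypothetical, the first implements Formula (5), and the second applies the transition matrix $P$ of Formula (3) to go back from new to old coordinates.

```python
import math


def rotate_axes(point, theta):
    """New coordinates (x', y') of a point after the coordinate axes are
    rotated counterclockwise through theta, using rotation equations (5)."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s, -x * s + y * c)


def unrotate_axes(new_point, theta):
    """Old coordinates (x, y) recovered from (x', y') by applying the
    transition matrix P of Formula (3)."""
    xp, yp = new_point
    c, s = math.cos(theta), math.sin(theta)
    return (xp * c - yp * s, xp * s + yp * c)


# The data of Example 5: Q = (2, -1) and theta = pi/4.
new = rotate_axes((2.0, -1.0), math.pi / 4)
print(new)                                # approximately (1/sqrt(2), -3/sqrt(2))
print(unrotate_axes(new, math.pi / 4))    # approximately (2.0, -1.0)
```

Because $P$ is orthogonal, the inverse map needs no matrix inversion; it simply uses the transpose of the coefficient matrix in (4).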

Remark Observe that the coefficient matrix in (4) is the same as the standard matrix for the linear operator that rotates the vectors of $R^2$ through the angle $-\theta$ (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle $\theta$ with the vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle $-\theta$ with the axes kept fixed.

EXAMPLE 6 Application to Rotation of Axes in 3-Space

Suppose that a rectangular $xyz$-coordinate system is rotated around its $z$-axis counterclockwise (looking down the positive $z$-axis) through an angle $\theta$ (Figure 7.1.3). If we introduce unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ along the positive $x$-, $y$-, and $z$-axes and unit vectors $\mathbf{u}_1'$, $\mathbf{u}_2'$, and $\mathbf{u}_3'$ along the positive $x'$-, $y'$-, and $z'$-axes, we can regard the rotation as a change from the old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the new basis $B' = \{\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'\}$. In light of Example 4, it should be evident that

\[ [\mathbf{u}_1']_B = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix} \]

Moreover, since $\mathbf{u}_3'$ extends 1 unit up the positive $z'$-axis,

\[ [\mathbf{u}_3']_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

It follows that the transition matrix from $B'$ to $B$ is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

and the transition matrix from $B$ to $B'$ is

\[ P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(verify). Thus, the new coordinates $(x', y', z')$ of a point $Q$ can be computed from its old coordinates $(x, y, z)$ by

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

[Figure 7.1.3: the $xyz$-system rotated through the angle $\theta$ about the $z$-axis, showing the old basis vectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ and the new basis vectors $\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'$.]

OPTIONAL

We conclude this section with an optional proof of Theorem 7.1.5.

Proof of Theorem 7.1.5 Assume that $V$ is an $n$-dimensional inner product space and that $P$ is the transition matrix from an orthonormal basis $B$ to an orthonormal basis $B'$. We will denote the norm relative to the inner product on $V$ by the symbol $\|\ \|_V$ to distinguish it from the norm relative to the Euclidean inner product on $R^n$, which we will denote by $\|\ \|$.

To prove that $P$ is orthogonal, we will use Theorem 7.1.3 and show that $\|P\mathbf{x}\| = \|\mathbf{x}\|$ for every vector $\mathbf{x}$ in $R^n$. As a first step in this direction, recall from Theorem 7.1.4(a) that for any orthonormal basis for $V$ the norm of any vector $\mathbf{u}$ in $V$ is the same as the norm of its coordinate vector with respect to the Euclidean inner product, that is,

\[ \|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|[\mathbf{u}]_B\| \]

or

\[ \|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|P[\mathbf{u}]_B\| \tag{6} \]

Now let $\mathbf{x}$ be any vector in $R^n$, and let $\mathbf{u}$ be the vector in $V$ whose coordinate vector with respect to the basis $B$ is $\mathbf{x}$, that is, $[\mathbf{u}]_B = \mathbf{x}$. Thus, from (6),

\[ \|\mathbf{u}\|_V = \|\mathbf{x}\| = \|P\mathbf{x}\| \]

which proves that $P$ is orthogonal.

Recall that $(\mathbf{u})_S$ denotes a coordinate vector expressed in comma-delimited form whereas $[\mathbf{u}]_S$ denotes a coordinate vector expressed in column form.

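Exercise Set 7.1

In each part of Exercises 1–4, determine whether the matrix is orthogonal, and if so find its inverse.

1. (a) $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ (b) $\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$

2. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (b) $\begin{bmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\[2pt] \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix}$

3. (a) $\begin{bmatrix} 0 & 1 & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}$ (b) $\begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$

4. (a) $\begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\[2pt] \frac{1}{2} & -\frac{5}{6} & \frac{1}{6} & \frac{1}{6} \\[2pt] \frac{1}{2} & \frac{1}{6} & \frac{1}{6} & -\frac{5}{6} \\[2pt] \frac{1}{2} & \frac{1}{6} & -\frac{5}{6} & \frac{1}{6} \end{bmatrix}$ (b) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{\sqrt{3}} & -\frac{1}{2} & 0 \\ 0 & \frac{1}{\sqrt{3}} & 0 & 1 \\ 0 & \frac{1}{\sqrt{3}} & \frac{1}{2} & 0 \end{bmatrix}$

In Exercises 5–6, show that the matrix is orthogonal three ways: first by calculating $A^TA$, then by using part (b) of Theorem 7.1.1, and then by using part (c) of Theorem 7.1.1.

5. $A = \begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\[2pt] -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\[2pt] \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$

6. $A = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\[2pt] \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \\[2pt] -\frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \end{bmatrix}$

7. Let $T_A: R^3 \to R^3$ be multiplication by the orthogonal matrix in Exercise 5. Find $T_A(\mathbf{x})$ for the vector $\mathbf{x} = (-2, 3, 5)$, and confirm that $\|T_A(\mathbf{x})\| = \|\mathbf{x}\|$ relative to the Euclidean inner product on $R^3$.

8. Let $T_A: R^3 \to R^3$ be multiplication by the orthogonal matrix in Exercise 6. Find $T_A(\mathbf{x})$ for the vector $\mathbf{x} = (0, 1, 4)$, and confirm that $\|T_A(\mathbf{x})\| = \|\mathbf{x}\|$ relative to the Euclidean inner product on $R^3$.

9. Are the standard matrices for the reflections in Tables 1 and 2 of Section 4.9 orthogonal?

10. Are the standard matrices for the orthogonal projections in Tables 3 and 4 of Section 4.9 orthogonal?

11. What conditions must $a$ and $b$ satisfy for the matrix

\[ \begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix} \]

to be orthogonal?

12. Under what conditions will a diagonal matrix be orthogonal?

13. Let a rectangular $x'y'$-coordinate system be obtained by rotating a rectangular $xy$-coordinate system counterclockwise through the angle $\theta = \pi/3$.
(a) Find the $x'y'$-coordinates of the point whose $xy$-coordinates are $(-2, 6)$.
(b) Find the $xy$-coordinates of the point whose $x'y'$-coordinates are $(5, 2)$.

14. Repeat Exercise 13 with $\theta = 3\pi/4$.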


15. Let a rectangular $x'y'z'$-coordinate system be obtained by rotating a rectangular $xyz$-coordinate system counterclockwise about the $z$-axis (looking down the $z$-axis) through the angle $\theta = \pi/4$.
(a) Find the $x'y'z'$-coordinates of the point whose $xyz$-coordinates are $(-1, 2, 5)$.
(b) Find the $xyz$-coordinates of the point whose $x'y'z'$-coordinates are $(1, 6, -3)$.

16. Repeat Exercise 15 for a rotation of $\theta = 3\pi/4$ counterclockwise about the $x$-axis (looking along the positive $x$-axis toward the origin).

17. Repeat Exercise 15 for a rotation of $\theta = \pi/3$ counterclockwise about the $y$-axis (looking along the positive $y$-axis toward the origin).

18. A rectangular $x'y'z'$-coordinate system is obtained by rotating an $xyz$-coordinate system counterclockwise about the $y$-axis through an angle $\theta$ (looking along the positive $y$-axis toward the origin). Find a matrix $A$ such that

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x', y', z')$ are the coordinates of the same point in the $xyz$- and $x'y'z'$-systems, respectively.

19. Repeat Exercise 18 for a rotation about the $x$-axis.

20. A rectangular $x''y''z''$-coordinate system is obtained by first rotating a rectangular $xyz$-coordinate system 60° counterclockwise about the $z$-axis (looking down the positive $z$-axis) to obtain an $x'y'z'$-coordinate system, and then rotating the $x'y'z'$-coordinate system 45° counterclockwise about the $y'$-axis (looking along the positive $y'$-axis toward the origin). Find a matrix $A$ such that

\[ \begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x'', y'', z'')$ are the $xyz$- and $x''y''z''$-coordinates of the same point.

21. A linear operator on $R^2$ is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the angle between nonzero vectors.
(a) Identify two different types of linear operators that are rigid.
(b) Identify two different types of linear operators that are angle preserving.
(c) Are there any linear operators on $R^2$ that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.

22. Can an orthogonal operator $T_A: R^n \to R^n$ map nonzero vectors that are not orthogonal into orthogonal vectors? Justify your answer.

23. The set

\[ S = \left\{ \frac{1}{\sqrt{3}},\ \frac{1}{\sqrt{2}}x,\ \sqrt{\frac{3}{2}}\,x^2 - \sqrt{\frac{2}{3}} \right\} \]

is an orthonormal basis for $P_2$ with respect to the evaluation inner product at the points $x_0 = -1$, $x_1 = 0$, $x_2 = 1$. Let $\mathbf{p} = p(x) = 1 + x + x^2$ and $\mathbf{q} = q(x) = 2x - x^2$.
(a) Find $(\mathbf{p})_S$ and $(\mathbf{q})_S$.
(b) Use Theorem 7.1.4 to compute $\|\mathbf{p}\|$, $d(\mathbf{p}, \mathbf{q})$, and $\langle \mathbf{p}, \mathbf{q}\rangle$.

24. The sets $S = \{1, x\}$ and $S' = \left\{ \frac{1}{\sqrt{2}}(1 + x),\ \frac{1}{\sqrt{2}}(1 - x) \right\}$ are orthonormal bases for $P_1$ with respect to the standard inner product. Find the transition matrix $P$ from $S$ to $S'$, and verify that the conclusion of Theorem 7.1.5 holds for $P$.

Working with Proofs

25. Prove that if $\mathbf{x}$ is an $n \times 1$ matrix, then the matrix

\[ A = I_n - \frac{2}{\mathbf{x}^T\mathbf{x}}\,\mathbf{x}\mathbf{x}^T \]

is both orthogonal and symmetric.

26. Prove that a $2 \times 2$ orthogonal matrix $A$ has only one of two possible forms:

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \]

where $0 \le \theta < 2\pi$. [Hint: Start with a general $2 \times 2$ matrix $A$, and use the fact that the column vectors form an orthonormal set in $R^2$.]

27. (a) Use the result in Exercise 26 to prove that multiplication by a $2 \times 2$ orthogonal matrix is a rotation if $\det(A) = 1$ and a reflection followed by a rotation if $\det(A) = -1$.
(b) In the case where the transformation in part (a) is a reflection followed by a rotation, show that the same transformation can be accomplished by a single reflection about an appropriate line through the origin. What is that line? [Hint: See Formula (6) of Section 4.9.]

28. In each part, use the result in Exercise 27(a) to determine whether multiplication by $A$ is a rotation or a reflection followed by a rotation. Find the angle of rotation in both cases, and in the case where it is a reflection followed by a rotation find an equation for the line through the origin referenced in Exercise 27(b).

(a) $A = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$ (b) $A = \begin{bmatrix} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\[2pt] \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$

29. The result in Exercise 27(a) has an analog for 3 × 3 orthogonal matrices. It can be proved that multiplication by a 3 × 3 orthogonal matrix A is a rotation about some line through the origin of R 3 if det(A) = 1 and is a reflection about some coordinate plane followed by a rotation about some line through the origin if det(A) = −1. Use the first of these facts and Theorem 7.1.2 to prove that any composition of rotations about lines through the origin in R 3 can be accomplished by a single rotation about an appropriate line through the origin.

30. Euler's Axis of Rotation Theorem states that: If $A$ is an orthogonal $3 \times 3$ matrix for which $\det(A) = 1$, then multiplication by $A$ is a rotation about a line through the origin in $R^3$. Moreover, if $\mathbf{u}$ is a unit vector along this line, then $A\mathbf{u} = \mathbf{u}$.

(a) Confirm that the following matrix $A$ is orthogonal, that $\det(A) = 1$, and that there is a unit vector $\mathbf{u}$ for which $A\mathbf{u} = \mathbf{u}$.

\[ A = \begin{bmatrix} \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\[2pt] \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\[2pt] \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \]

(b) Use Formula (3) of Section 4.9 to prove that if $A$ is a $3 \times 3$ orthogonal matrix for which $\det(A) = 1$, then the angle of rotation resulting from multiplication by $A$ satisfies the equation $\cos\theta = \frac{1}{2}[\operatorname{tr}(A) - 1]$. Use this result to find the angle of rotation for the rotation matrix in part (a).

31. Prove the equivalence of statements (a) and (c) that are given in Theorem 7.1.1.

True-False Exercises

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer.

(a) The matrix $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$ is orthogonal.
(b) The matrix $\begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$ is orthogonal.
(c) An $m \times n$ matrix $A$ is orthogonal if $A^TA = I$.
(d) A square matrix whose columns form an orthogonal set is orthogonal.
(e) Every orthogonal matrix is invertible.
(f) If $A$ is an orthogonal matrix, then $A^2$ is orthogonal and $(\det A)^2 = 1$.
(g) Every eigenvalue of an orthogonal matrix has absolute value 1.
(h) If $A$ is a square matrix and $\|A\mathbf{u}\| = 1$ for all unit vectors $\mathbf{u}$, then $A$ is orthogonal.

Working with Technology

T1. If $\mathbf{a}$ is a nonzero vector in $R^n$, then $\mathbf{a}\mathbf{a}^T$ is called the outer product of $\mathbf{a}$ with itself, the subspace $\mathbf{a}^\perp$ is called the hyperplane in $R^n$ orthogonal to $\mathbf{a}$, and the $n \times n$ orthogonal matrix

\[ H_{\mathbf{a}^\perp} = I - \frac{2}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}\mathbf{a}^T \]

is called the Householder matrix or the Householder reflection about $\mathbf{a}^\perp$, named in honor of the American mathematician Alston S. Householder (1904–1993). In $R^2$ the matrix $H_{\mathbf{a}^\perp}$ represents a reflection about the line through the origin that is orthogonal to $\mathbf{a}$, and in $R^3$ it represents a reflection about the plane through the origin that is orthogonal to $\mathbf{a}$. In higher dimensions we can view $H_{\mathbf{a}^\perp}$ as a "reflection" about the hyperplane $\mathbf{a}^\perp$. Householder reflections are important in large-scale implementations of numerical algorithms, particularly $QR$-decompositions, because they can be used to transform a given vector into a vector with specified zero components while leaving the other components unchanged. This is a consequence of the following theorem [see Contemporary Linear Algebra, by Howard Anton and Robert C. Busby (Hoboken, NJ: John Wiley & Sons, 2003, p. 422)].

Theorem. If $\mathbf{v}$ and $\mathbf{w}$ are distinct vectors in $R^n$ with the same norm, then the Householder reflection about the hyperplane $(\mathbf{v} - \mathbf{w})^\perp$ maps $\mathbf{v}$ into $\mathbf{w}$ and conversely.

(a) Find a Householder reflection that maps the vector $\mathbf{v} = (4, 2, 4)$ into a vector $\mathbf{w}$ that has zeros as its second and third components. Find $\mathbf{w}$.
(b) Find a Householder reflection that maps the vector $\mathbf{v} = (3, 4, 2, 4)$ into the vector whose last two entries are zero, while leaving the first entry unchanged. Find $\mathbf{w}$.
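The Householder construction above is easy to experiment with. The following sketch builds $H_{\mathbf{a}^\perp}$ for $\mathbf{a} = \mathbf{v} - \mathbf{w}$ and applies it to $\mathbf{v}$. The vectors $\mathbf{v} = (1, 2, 2)$ and $\mathbf{w} = (3, 0, 0)$ are chosen here purely for illustration (they are not the data of parts (a) or (b)), and the function names `householder` and `apply` are assumptions of this example.

```python
def householder(a):
    """Return the Householder matrix I - (2 / a^T a) a a^T for a nonzero vector a."""
    n = len(a)
    aa = sum(t * t for t in a)  # the scalar a^T a
    return [[(1.0 if i == j else 0.0) - 2.0 * a[i] * a[j] / aa
             for j in range(n)] for i in range(n)]


def apply(m, x):
    """Return the matrix-vector product m x."""
    return [sum(p * q for p, q in zip(row, x)) for row in m]


# Illustrative data: v and w are distinct and have the same norm (both equal 3).
v = [1.0, 2.0, 2.0]
w = [3.0, 0.0, 0.0]

a = [vi - wi for vi, wi in zip(v, w)]   # normal vector of the hyperplane (v - w)-perp
H = householder(a)

print(apply(H, v))   # approximately [3.0, 0.0, 0.0], that is, w
print(apply(H, w))   # approximately [1.0, 2.0, 2.0], that is, v
```

The second product illustrates the "and conversely" part of the theorem: a reflection is its own inverse, so applying $H$ to $\mathbf{w}$ returns $\mathbf{v}$.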

7.2 Orthogonal Diagonalization

In this section we will be concerned with the problem of diagonalizing a symmetric matrix $A$. As we will see, this problem is closely related to that of finding an orthonormal basis for $R^n$ that consists of eigenvectors of $A$. Problems of this type are important because many of the matrices that arise in applications are symmetric.

The Orthogonal Diagonalization Problem

In Section 5.2 we defined two square matrices, $A$ and $B$, to be similar if there is an invertible matrix $P$ such that $P^{-1}AP = B$. In this section we will be concerned with the special case in which it is possible to find an orthogonal matrix $P$ for which this relationship holds. We begin with the following definition.

DEFINITION 1 If $A$ and $B$ are square matrices, then we say that $B$ is orthogonally similar to $A$ if there is an orthogonal matrix $P$ such that $B = P^TAP$.

Note that if $B$ is orthogonally similar to $A$, then it is also true that $A$ is orthogonally similar to $B$ since we can express $A$ as $A = Q^TBQ$ by taking $Q = P^T$ (verify). This being the case we will say that $A$ and $B$ are orthogonally similar matrices if either is orthogonally similar to the other.

If $A$ is orthogonally similar to some diagonal matrix, say

\[ P^TAP = D \]

then we say that $A$ is orthogonally diagonalizable and that $P$ orthogonally diagonalizes $A$.

Our first goal in this section is to determine what conditions a matrix must satisfy to be orthogonally diagonalizable. As an initial step, observe that there is no hope of orthogonally diagonalizing a matrix that is not symmetric. To see why this is so, suppose that

\[ P^TAP = D \tag{1} \]

where $P$ is an orthogonal matrix and $D$ is a diagonal matrix. Multiplying the left side of (1) by $P$, the right side by $P^T$, and then using the fact that $PP^T = P^TP = I$, we can rewrite this equation as

\[ A = PDP^T \tag{2} \]

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose we obtain

\[ A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A \]

so $A$ must be symmetric if it is orthogonally diagonalizable.

Conditions for Orthogonal Diagonalizability

The following theorem shows that every symmetric matrix with real entries is, in fact, orthogonally diagonalizable. In this theorem, and for the remainder of this section, orthogonal will mean orthogonal with respect to the Euclidean inner product on $R^n$.

THEOREM 7.2.1 If $A$ is an $n \times n$ matrix with real entries, then the following are equivalent.

(a) $A$ is orthogonally diagonalizable.
(b) $A$ has an orthonormal set of $n$ eigenvectors.
(c) $A$ is symmetric.

Proof (a) ⇒ (b) Since $A$ is orthogonally diagonalizable, there is an orthogonal matrix $P$ such that $P^{-1}AP$ is diagonal. As shown in Formula (2) in the proof of Theorem 5.2.1, the $n$ column vectors of $P$ are eigenvectors of $A$. Since $P$ is orthogonal, these column vectors are orthonormal, so $A$ has $n$ orthonormal eigenvectors.

(b) ⇒ (a) Assume that $A$ has an orthonormal set of $n$ eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As shown in the proof of Theorem 5.2.1, the matrix $P$ with these eigenvectors as columns diagonalizes $A$. Since these eigenvectors are orthonormal, $P$ is orthogonal and thus orthogonally diagonalizes $A$.

(a) ⇒ (c) In the proof that (a) ⇒ (b) we showed that an orthogonally diagonalizable $n \times n$ matrix $A$ is orthogonally diagonalized by an $n \times n$ matrix $P$ whose columns form an orthonormal set of eigenvectors of $A$. Let $D$ be the diagonal matrix

\[ D = P^TAP \]

from which it follows that

\[ A = PDP^T \]

Thus,

\[ A^T = (PDP^T)^T = PD^TP^T = PDP^T = A \]

which shows that $A$ is symmetric.

(c) ⇒ (a) The proof of this part is beyond the scope of this text. However, because it is such an important result we have outlined the structure of its proof in the exercises (see Exercise 31).

Properties of Symmetric Matrices

Our next goal is to devise a procedure for orthogonally diagonalizing a symmetric matrix, but before we can do so, we need the following critical theorem about eigenvalues and eigenvectors of symmetric matrices.

THEOREM 7.2.2 If $A$ is a symmetric matrix with real entries, then:

(a) The eigenvalues of $A$ are all real numbers.
(b) Eigenvectors from different eigenspaces are orthogonal.

Part (a), which requires results about complex vector spaces, will be discussed in Section 7.5.

Proof (b) Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $A$. We want to show that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. Our proof of this involves the trick of starting with the expression $A\mathbf{v}_1 \cdot \mathbf{v}_2$. It follows from Formula (26) of Section 3.2 and the symmetry of $A$ that

\[ A\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot A^T\mathbf{v}_2 = \mathbf{v}_1 \cdot A\mathbf{v}_2 \tag{3} \]

But $\mathbf{v}_1$ is an eigenvector of $A$ corresponding to $\lambda_1$, and $\mathbf{v}_2$ is an eigenvector of $A$ corresponding to $\lambda_2$, so (3) yields the relationship

\[ \lambda_1\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \lambda_2\mathbf{v}_2 \]

which can be rewritten as

\[ (\lambda_1 - \lambda_2)(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0 \tag{4} \]

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.

Theorem 7.2.2 yields the following procedure for orthogonally diagonalizing a symmetric matrix.

Orthogonally Diagonalizing an $n \times n$ Symmetric Matrix

Step 1. Find a basis for each eigenspace of $A$.
Step 2. Apply the Gram–Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace.
Step 3. Form the matrix $P$ whose columns are the vectors constructed in Step 2. This matrix will orthogonally diagonalize $A$, and the eigenvalues on the diagonal of $D = P^TAP$ will be in the same order as their corresponding eigenvectors in $P$.


Remark The justification of this procedure should be clear: Theorem 7.2.2 ensures that eigenvectors from different eigenspaces are orthogonal, and applying the Gram–Schmidt process ensures that the eigenvectors within the same eigenspace are orthonormal. Thus the entire set of eigenvectors obtained by this procedure will be orthonormal.

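EXAMPLE 1 Orthogonally Diagonalizing a Symmetric Matrix

Find an orthogonal matrix $P$ that diagonalizes

\[ A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \]

Solution We leave it for you to verify that the characteristic equation of $A$ is

\[ \det(\lambda I - A) = \det \begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8) = 0 \]

Thus, the distinct eigenvalues of $A$ are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7 of Section 5.1, it can be shown that

\[ \mathbf{u}_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \tag{5} \]

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram–Schmidt process to $\{\mathbf{u}_1, \mathbf{u}_2\}$ yields the following orthonormal eigenvectors (verify):

\[ \mathbf{v}_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} \\[2pt] 0 \end{bmatrix} \quad\text{and}\quad \mathbf{v}_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\[2pt] -\frac{1}{\sqrt{6}} \\[2pt] \frac{2}{\sqrt{6}} \end{bmatrix} \tag{6} \]

The eigenspace corresponding to $\lambda = 8$ has

\[ \mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

as a basis. Applying the Gram–Schmidt process to $\{\mathbf{u}_3\}$ (i.e., normalizing $\mathbf{u}_3$) yields

\[ \mathbf{v}_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{3}} \end{bmatrix} \]

Finally, using $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ as column vectors, we obtain

\[ P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]

which orthogonally diagonalizes $A$. As a check, we leave it for you to confirm that

\[ P^TAP = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\[2pt] -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\[2pt] \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\[2pt] 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix} \]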
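Steps 2 and 3 of the procedure, together with the final check of Example 1, can be sketched in a few lines of Python. In this illustration the eigenspace bases from (5) and $\mathbf{u}_3$ are taken as given rather than computed, the Gram–Schmidt routine is a plain classical version, and the names `gram_schmidt`, `apply`, and `dot` are assumptions of this sketch.

```python
import math


def dot(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def apply(m, x):
    """Return the matrix-vector product m x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(v, q)                      # component of v along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis


A = [[4.0, 2.0, 2.0], [2.0, 4.0, 2.0], [2.0, 2.0, 4.0]]

# Step 1 (taken from Example 1): bases for the eigenspaces of 2 and 8.
basis_2 = [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
basis_8 = [[1.0, 1.0, 1.0]]

# Step 2: orthonormalize within each eigenspace.
columns = gram_schmidt(basis_2) + gram_schmidt(basis_8)

# Step 3: with the vectors p_j as the columns of P, the (i, j) entry of
# P^T A P is p_i . (A p_j).
D = [[dot(pi, apply(A, pj)) for pj in columns] for pi in columns]
for row in D:
    print([round(entry, 10) for entry in row])   # approximately diag(2, 2, 8)
```

Gram–Schmidt is applied separately to each eigenspace; orthogonality between vectors from different eigenspaces is supplied by Theorem 7.2.2(b) rather than by the algorithm.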

7.2 Orthogonal Diagonalization

Spectral Decomposition

413

If A is a symmetric matrix that is orthogonally diagonalized by

P = [u1

···

u2

un ]

and if λ1 , λ2 , . . . , λn are the eigenvalues of A corresponding to the unit eigenvectors u1 , u2 , . . . , un , then we know that D = P TAP , where D is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix A can be expressed as

⎡ A = PDP T = [u1

···

u2

λ1

⎢0 ⎢ un ] ⎢ ..

0

⎣.

λ2 .. .

0

0



T

u1

= [λ1 u1

λ2 u2



··· ··· .. .

⎤⎡



uT1 0 ⎢ T⎥ 0⎥ ⎥ ⎢u2 ⎥

.. ⎥ ⎢ .⎥ ⎥ . ⎦⎢ ⎣ .. ⎦ · · · λn uTn

⎢ T⎥ ⎢u2 ⎥ ⎥ · · · λn un ] ⎢ ⎢ .. ⎥ ⎣.⎦ uTn

Multiplying out, we obtain the formula

\[
A = \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + \lambda_n\mathbf{u}_n\mathbf{u}_n^T \tag{7}
\]

which is called a spectral decomposition of $A$.* Note that each term of the spectral decomposition of $A$ has the form $\lambda\mathbf{u}\mathbf{u}^T$, where $\mathbf{u}$ is a unit eigenvector of $A$ in column form, and $\lambda$ is an eigenvalue of $A$ corresponding to $\mathbf{u}$. Since $\mathbf{u}$ has size $n \times 1$, it follows that the product $\mathbf{u}\mathbf{u}^T$ has size $n \times n$. It can be proved (though we will not do it) that $\mathbf{u}\mathbf{u}^T$ is the standard matrix for the orthogonal projection of $R^n$ on the subspace spanned by the vector $\mathbf{u}$. Accepting this to be so, the spectral decomposition of $A$ tells us that the image of a vector $\mathbf{x}$ under multiplication by a symmetric matrix $A$ can be obtained by projecting $\mathbf{x}$ orthogonally on the lines (one-dimensional subspaces) determined by the eigenvectors of $A$, then scaling those projections by the eigenvalues, and then adding the scaled projections. Here is an example.

EXAMPLE 2 A Geometric Interpretation of a Spectral Decomposition



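The matrix

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}
\]

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]

(verify). Normalizing these basis vectors yields

\[
\mathbf{u}_1 = \frac{\mathbf{x}_1}{\|\mathbf{x}_1\|} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad\text{and}\quad \mathbf{u}_2 = \frac{\mathbf{x}_2}{\|\mathbf{x}_2\|} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}
\]

* The terminology spectral decomposition is derived from the fact that the set of all eigenvalues of a matrix A is sometimes called the spectrum of A. The terminology eigenvalue decomposition is due to Professor Dan Kalman, who introduced it in an award-winning paper entitled “A Singularly Valuable Decomposition: The SVD of a Matrix,” The College Mathematics Journal, Vol. 27, No. 1, January 1996.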


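so a spectral decomposition of $A$ is

\[
\begin{aligned}
\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}
&= \lambda_1\mathbf{u}_1\mathbf{u}_1^T + \lambda_2\mathbf{u}_2\mathbf{u}_2^T
= (-3)\begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \end{bmatrix}
+ (2)\begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} \\
&= (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}
+ (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}
\end{aligned} \tag{8}
\]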

where, as noted above, the 2 × 2 matrices on the right side of (8) are the standard matrices for the orthogonal projections onto the eigenspaces corresponding to the eigenvalues λ1 = −3 and λ2 = 2, respectively. Now let us see what this spectral decomposition tells us about the image of the vector x = (1, 1) under multiplication by A. Writing x in column form, it follows that



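\[
A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix} \tag{9}
\]

and from (8) that

\[
\begin{aligned}
A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
&= (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
+ (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= (-3)\begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix}
= \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix}
= \begin{bmatrix} 3 \\ 0 \end{bmatrix}
\end{aligned} \tag{10}
\]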

Formulas (9) and (10) provide two different ways of viewing the image of the vector (1, 1) under multiplication by A: Formula (9) tells us directly that the image of this vector is (3, 0), whereas Formula (10) tells us that this image can also be obtained by projecting (1, 1) onto the eigenspaces corresponding to λ1 = −3 and λ2 = 2 to obtain the vectors (−1/5, 2/5) and (6/5, 3/5), then scaling by the eigenvalues to obtain (3/5, −6/5) and (12/5, 6/5), and then adding these vectors (see Figure 7.2.1).
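The project, scale, and add description of Formula (10) can be traced step by step in code. The sketch below is illustrative only: it assumes the eigenvalues and unit eigenvectors of this example, and the helper names `outer` and `matvec` are not from the text.

```python
import math

def outer(u):
    """Return the matrix u u^T for a vector u given as a list."""
    return [[ui * uj for uj in u] for ui in u]

def matvec(M, x):
    """Multiply the matrix M (list of rows) by the vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

r5 = math.sqrt(5)
eigenpairs = [(-3, [1 / r5, -2 / r5]),   # (lambda_1, u_1)
              ( 2, [2 / r5,  1 / r5])]   # (lambda_2, u_2)

x = [1, 1]
image = [0.0, 0.0]
steps = []
for lam, u in eigenpairs:
    projection = matvec(outer(u), x)         # orthogonal projection of x on span{u}
    scaled = [lam * p for p in projection]   # scale by the eigenvalue
    image = [a + b for a, b in zip(image, scaled)]
    steps.append((projection, scaled))

print(steps)
print(image)   # should agree with Ax = (3, 0) up to rounding
```

Each entry of `steps` holds one projection and its scaled version, which can be compared with the vectors listed in the paragraph above.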

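[Figure 7.2.1: the vector x = (1, 1), its projections (−1/5, 2/5) and (6/5, 3/5) onto the eigenspaces for λ1 = −3 and λ2 = 2, the scaled projections (3/5, −6/5) and (12/5, 6/5), and their sum Ax = (3, 0)]

The Nondiagonalizable Case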

If A is an n × n matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification in the form of P TAP by choosing the orthogonal matrix P appropriately. We will consider two theorems (without proof) that illustrate this. The first, due to the German mathematician Issai Schur, states that every square matrix A is orthogonally similar to an upper triangular matrix that has the eigenvalues of A on the main diagonal.


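THEOREM 7.2.3 Schur’s Theorem

If $A$ is an $n \times n$ matrix with real entries and real eigenvalues, then there is an orthogonal matrix $P$ such that $P^TAP$ is an upper triangular matrix of the form

\[
P^TAP = \begin{bmatrix}
\lambda_1 & \times & \times & \cdots & \times \\
0 & \lambda_2 & \times & \cdots & \times \\
0 & 0 & \lambda_3 & \cdots & \times \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix} \tag{11}
\]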

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$ repeated according to multiplicity.

It is common to denote the upper triangular matrix in (11) by $S$ (for Schur), in which case that equation would be rewritten as

\[
A = PSP^T \tag{12}
\]

which is called a Schur decomposition of $A$.

The next theorem, due to the German electrical engineer Karl Hessenberg (1904–1959), states that every square matrix with real entries is orthogonally similar to a matrix in which each entry below the first subdiagonal is zero (Figure 7.2.2). Such a matrix is said to be in upper Hessenberg form.

[Figure 7.2.2: a square matrix with its first subdiagonal marked]

THEOREM 7.2.4 Hessenberg’s Theorem

If $A$ is an $n \times n$ matrix with real entries, then there is an orthogonal matrix $P$ such that $P^TAP$ is a matrix of the form

\[
P^TAP = \begin{bmatrix}
\times & \times & \cdots & \times & \times & \times \\
\times & \times & \cdots & \times & \times & \times \\
0 & \times & \ddots & \times & \times & \times \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & \times & \times & \times \\
0 & 0 & \cdots & 0 & \times & \times
\end{bmatrix} \tag{13}
\]

Note that unlike those in (11), the diagonal entries in (13) are usually not the eigenvalues of $A$.

It is common to denote the upper Hessenberg matrix in (13) by $H$ (for Hessenberg), in which case that equation can be rewritten as

\[
A = PHP^T \tag{14}
\]

which is called an upper Hessenberg decomposition of $A$.

Issai Schur (1875–1941)

Historical Note The life of the German mathematician Issai Schur is a sad reminder of the effect that Nazi policies had on Jewish intellectuals during the 1930s. Schur was a brilliant mathematician and a popular lecturer who attracted many students and researchers to the University of Berlin, where he worked and taught. His lectures sometimes attracted so many students that opera glasses were needed to see him from the back row. Schur’s life became increasingly difficult under Nazi rule, and in April of 1933 he was forced to “retire” from the university under a law that prohibited non-Aryans from holding “civil service” positions. There was an outcry from many of his students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur, who thought of himself as a loyal German, never understood the persecution and humiliation he received at Nazi hands. He left Germany for Palestine in 1939, a broken man. Lacking in financial resources, he had to sell his beloved mathematics books and lived in poverty until his death in 1941. [Image: Courtesy Electronic Publishing Services, Inc., NewYork City ]


Remark In many numerical algorithms the initial matrix is first converted to upper Hessenberg form to reduce the amount of computation in subsequent parts of the algorithm. Many computer packages have built-in commands for finding Schur and Hessenberg decompositions.

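Exercise Set 7.2

In Exercises 1–6, find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the eigenspaces.

1. $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

2. $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$

3. $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

4. $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

5. $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

6. $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$

In Exercises 7–14, find a matrix $P$ that orthogonally diagonalizes $A$, and determine $P^{-1}AP$.

7. $A = \begin{bmatrix} 6 & 2\sqrt{3} \\ 2\sqrt{3} & 7 \end{bmatrix}$

8. $A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

9. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

10. $A = \begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

11. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$

12. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

13. $A = \begin{bmatrix} -7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & -7 & 24 \\ 0 & 0 & 24 & 7 \end{bmatrix}$

14. $A = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

In Exercises 15–18, find the spectral decomposition of the matrix.

15. $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

16. $\begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

17. $\begin{bmatrix} -3 & 1 & 2 \\ 1 & -3 & 2 \\ 2 & 2 & 0 \end{bmatrix}$

18. $\begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

In Exercises 19–20, determine whether there exists a 3 × 3 symmetric matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and for which the corresponding eigenvectors are as stated. If there is such a matrix, find it, and if there is none, explain why not.

19. $\mathbf{x}_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{x}_3 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$

20. $\mathbf{x}_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$, $\mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{x}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$

21. Let $A$ be a diagonalizable matrix with the property that eigenvectors corresponding to distinct eigenvalues are orthogonal. Must $A$ be symmetric? Explain your reasoning.

22. Assuming that $b \neq 0$, find a matrix that orthogonally diagonalizes

\[
\begin{bmatrix} a & b \\ b & a \end{bmatrix}
\]

23. Let $T_A : R^2 \to R^2$ be multiplication by $A$. Find two orthogonal unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ such that $T_A(\mathbf{u}_1)$ and $T_A(\mathbf{u}_2)$ are orthogonal.

(a) $A = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

24. Let $T_A : R^3 \to R^3$ be multiplication by $A$. Find two orthogonal unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ such that $T_A(\mathbf{u}_1)$ and $T_A(\mathbf{u}_2)$ are orthogonal.

(a) $A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

Working with Proofs

25. Prove that if $A$ is any $m \times n$ matrix, then $A^TA$ has an orthonormal set of $n$ eigenvectors.

26. Prove: If $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is an orthonormal basis for $R^n$, and if $A$ can be expressed as

\[
A = c_1\mathbf{u}_1\mathbf{u}_1^T + c_2\mathbf{u}_2\mathbf{u}_2^T + \cdots + c_n\mathbf{u}_n\mathbf{u}_n^T
\]

then $A$ is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.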

27. Use the result in Exercise 29 of Section 5.1 to prove Theorem 7.2.2(a) for 2 × 2 symmetric matrices.

28. (a) Prove that if $\mathbf{v}$ is any $n \times 1$ matrix and $I$ is the $n \times n$ identity matrix, then $I - \mathbf{v}\mathbf{v}^T$ is orthogonally diagonalizable.

(b) Find a matrix $P$ that orthogonally diagonalizes $I - \mathbf{v}\mathbf{v}^T$ if

\[
\mathbf{v} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
\]

29. Prove that if $A$ is a symmetric orthogonal matrix, then 1 and −1 are the only possible eigenvalues.

30. Is the converse of Theorem 7.2.2(b) true? Justify your answer.

31. In this exercise we will show that a symmetric matrix $A$ is orthogonally diagonalizable, thereby completing the missing part of Theorem 7.2.1. We will proceed in two steps: first we will show that $A$ is diagonalizable, and then we will build on that result to show that $A$ is orthogonally diagonalizable.

(a) Assume that $A$ is a symmetric $n \times n$ matrix. One way to prove that $A$ is diagonalizable is to show that for each eigenvalue $\lambda_0$ the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the geometric multiplicity of $\lambda_0$ is $k$, let $B_0 = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ be an orthonormal basis for the eigenspace corresponding to the eigenvalue $\lambda_0$, extend this to an orthonormal basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$, and let $P$ be the matrix having the vectors of $B$ as columns. As shown in Exercise 40(b) of Section 5.2, the product $AP$ can be written as

\[
AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}
\]

Use the fact that $B$ is an orthonormal basis to prove that $X = 0$ [a zero matrix of size $k \times (n - k)$].

(b) It follows from part (a) and Exercise 40(c) of Section 5.2 that $A$ has the same characteristic polynomial as

\[
C = P\begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix}
\]

Use this fact and Exercise 40(d) of Section 5.2 to prove that the algebraic multiplicity of $\lambda_0$ is the same as the geometric multiplicity of $\lambda_0$. This establishes that $A$ is diagonalizable.

(c) Use Theorem 7.2.2(b) and the fact that $A$ is diagonalizable to prove that $A$ is orthogonally diagonalizable.

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

(a) If $A$ is a square matrix, then $AA^T$ and $A^TA$ are orthogonally diagonalizable.

(b) If $\mathbf{v}_1$ and $\mathbf{v}_2$ are eigenvectors from distinct eigenspaces of a symmetric matrix with real entries, then

\[
\|\mathbf{v}_1 + \mathbf{v}_2\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2
\]

(c) Every orthogonal matrix is orthogonally diagonalizable.

(d) If $A$ is both invertible and orthogonally diagonalizable, then $A^{-1}$ is orthogonally diagonalizable.

(e) Every eigenvalue of an orthogonal matrix has absolute value 1.

(f) If $A$ is an $n \times n$ orthogonally diagonalizable matrix, then there exists an orthonormal basis for $R^n$ consisting of eigenvectors of $A$.

(g) If $A$ is orthogonally diagonalizable, then $A$ has real eigenvalues.

Working with Technology

T1. If your technology utility has an orthogonal diagonalization capability, use it to confirm the final result obtained in Example 1.

T2. For the given matrix $A$, find orthonormal bases for the eigenspaces of $A$, and use those basis vectors to construct an orthogonal matrix $P$ for which $P^TAP$ is diagonal.

\[
A = \begin{bmatrix} -4 & 2 & -2 \\ 2 & -7 & 4 \\ -2 & 4 & -7 \end{bmatrix}
\]

T3. Find a spectral decomposition of the matrix $A$ in Exercise T2.

7.3 Quadratic Forms

In this section we will use matrix methods to study real-valued functions of several variables in which each term is either the square of a variable or the product of two variables. Such functions arise in a variety of applications, including geometry, vibrations of mechanical systems, statistics, and electrical engineering.

Definition of a Quadratic Form

Expressions of the form

\[
a_1x_1 + a_2x_2 + \cdots + a_nx_n
\]

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are treated as fixed constants, then this expression is a real-valued function of the $n$ variables $x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur to the first power and there are no products of variables. Here we will be concerned with quadratic forms on $R^n$, which are functions of the form

\[
a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{all possible terms } a_kx_ix_j \text{ in which } i \neq j)
\]

The terms of the form $a_kx_ix_j$ are called cross product terms. It is common to combine the cross product terms involving $x_ix_j$ with those involving $x_jx_i$ to avoid duplication. Thus, a general quadratic form on $R^2$ would typically be expressed as

\[
a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2 \tag{1}
\]

and a general quadratic form on $R^3$ as

\[
a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3 \tag{2}
\]

If, as usual, we do not distinguish between the number a and the 1 × 1 matrix [a ], and if we let x be the column vector of variables, then (1) and (2) can be expressed in matrix form as





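\[
\begin{bmatrix} x_1 & x_2 \end{bmatrix}
\begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}
\]

\[
\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{x}^TA\mathbf{x}
\]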

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms, and its off-diagonal entries are half the coefficients of the cross product terms. In general, if A is a symmetric n × n matrix and x is an n × 1 column vector of variables, then we call the function

QA (x) = xTAx

(3)

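the quadratic form associated with $A$. When convenient, (3) can be expressed in dot product notation as

\[
\mathbf{x}^TA\mathbf{x} = \mathbf{x} \cdot A\mathbf{x} = A\mathbf{x} \cdot \mathbf{x} \tag{4}
\]

In the case where $A$ is a diagonal matrix, the quadratic form $\mathbf{x}^TA\mathbf{x}$ has no cross product terms; for example, if $A$ has diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

\[
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2
\]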

EXAMPLE 1 Expressing Quadratic Forms in Matrix Notation

In each part, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where $A$ is symmetric.

(a) $2x^2 + 6xy - 5y^2$

(b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$

Solution The diagonal entries of $A$ are the coefficients of the squared terms, and the off-diagonal entries are half the coefficients of the cross product terms, so

\[
2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
\]

\[
x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]
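The rule used in this solution (squared-term coefficients on the diagonal, half of each cross product coefficient off the diagonal) is mechanical enough to code. The following is a small sketch under that rule; the function names `quadratic_form_matrix` and `evaluate` and the 0-based indexing are choices made here for illustration, not notation from the text.

```python
def quadratic_form_matrix(n, squares, cross):
    """Build the symmetric matrix A of a quadratic form on R^n.

    squares[i] is the coefficient of x_i^2, and cross[(i, j)] with i < j is
    the coefficient of the combined cross product term x_i x_j (0-based).
    """
    A = [[0.0] * n for _ in range(n)]
    for i, coeff in enumerate(squares):
        A[i][i] = coeff
    for (i, j), coeff in cross.items():
        A[i][j] = A[j][i] = coeff / 2   # half the coefficient in each position
    return A

def evaluate(A, x):
    """Evaluate the quadratic form x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Part (b) of the example: x1^2 + 7x2^2 - 3x3^2 + 4x1x2 - 2x1x3 + 8x2x3
A = quadratic_form_matrix(3, [1, 7, -3], {(0, 1): 4, (0, 2): -2, (1, 2): 8})

x = [1, 2, 3]
direct = (x[0]**2 + 7*x[1]**2 - 3*x[2]**2
          + 4*x[0]*x[1] - 2*x[0]*x[2] + 8*x[1]*x[2])
print(A)
print(evaluate(A, x), direct)   # the two values should agree
```

Evaluating both the matrix form and the original polynomial at a test point is a quick way to catch a forgotten factor of one half.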


Change of Variable in a Quadratic Form


There are three important kinds of problems that occur in applications of quadratic forms:

Problem 1 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is represented by the equation $\mathbf{x}^TA\mathbf{x} = k$?

Problem 2 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what conditions must $A$ satisfy for $\mathbf{x}^TA\mathbf{x}$ to have positive values for $\mathbf{x} \neq \mathbf{0}$?

Problem 3 If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^n$, what are its maximum and minimum values if $\mathbf{x}$ is constrained to satisfy $\|\mathbf{x}\| = 1$?

We will consider the first two problems in this section and the third problem in the next section. Many of the techniques for solving these problems are based on simplifying the quadratic form $\mathbf{x}^TA\mathbf{x}$ by making a substitution

\[
\mathbf{x} = P\mathbf{y} \tag{5}
\]

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If $P$ is invertible, then we call (5) a change of variable, and if $P$ is orthogonal, then we call (5) an orthogonal change of variable. If we make the change of variable $\mathbf{x} = P\mathbf{y}$ in the quadratic form $\mathbf{x}^TA\mathbf{x}$, then we obtain

\[
\mathbf{x}^TA\mathbf{x} = (P\mathbf{y})^TA(P\mathbf{y}) = \mathbf{y}^TP^TAP\mathbf{y} = \mathbf{y}^T(P^TAP)\mathbf{y} \tag{6}
\]

Since the matrix B = P TAP is symmetric (verify), the effect of the change of variable is to produce a new quadratic form yTB y in the variables y1 , y2 , . . . , yn . In particular, if we choose P to orthogonally diagonalize A, then the new quadratic form will be yTD y, where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,

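\[
\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2
\]

Thus, we have the following result, called the principal axes theorem.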

THEOREM 7.3.1 The Principal Axes Theorem

If A is a symmetric n × n matrix, then there is an orthogonal change of variable that transforms the quadratic form xTAx into a quadratic form yTD y with no cross product terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable x = P y in the quadratic form xTAx yields the quadratic form xTAx = yTD y = λ1 y12 + λ2 y22 + · · · + λn yn2 in which λ1 , λ2 , . . . , λn are the eigenvalues of A corresponding to the eigenvectors that form the successive columns of P .


EXAMPLE 2 An Illustration of the Principal Axes Theorem

Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form $Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express $Q$ in terms of the new variables.

Solution The quadratic form can be expressed in matrix notation as

\[
Q = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

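The characteristic equation of the matrix $A$ is

\[
\begin{vmatrix} \lambda-1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda+1 \end{vmatrix} = \lambda^3 - 9\lambda = \lambda(\lambda+3)(\lambda-3) = 0
\]

so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases for the three eigenspaces are

\[
\lambda = 0\colon \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \qquad
\lambda = -3\colon \begin{bmatrix} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \qquad
\lambda = 3\colon \begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{bmatrix}
\]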

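Thus, a substitution $\mathbf{x} = P\mathbf{y}$ that eliminates the cross product terms is

\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
\]

This produces the new quadratic form

\[
Q = \mathbf{y}^T(P^TAP)\mathbf{y} = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -3y_2^2 + 3y_3^2
\]

in which there are no cross product terms.

Remark If $A$ is a symmetric $n \times n$ matrix, then the quadratic form $\mathbf{x}^TA\mathbf{x}$ is a real-valued function whose range is the set of all possible values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$. It can be shown that an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ does not alter the range of a quadratic form; that is, the set of all values for $\mathbf{x}^TA\mathbf{x}$ as $\mathbf{x}$ varies over $R^n$ is the same as the set of all values for $\mathbf{y}^T(P^TAP)\mathbf{y}$ as $\mathbf{y}$ varies over $R^n$.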
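The change of variable in Example 2 can be spot-checked with exact arithmetic. The sketch below assumes the matrices $A$ and $P$ of that example and compares $\mathbf{x}^TA\mathbf{x}$ at $\mathbf{x} = P\mathbf{y}$ with $-3y_2^2 + 3y_3^2$ for a few sample vectors $\mathbf{y}$; the sample vectors and helper names are arbitrary choices made here.

```python
from fractions import Fraction as F

A = [[1, -2, 0], [-2, 0, 2], [0, 2, -1]]
# Columns of P are the orthonormal eigenvectors for lambda = 0, -3, 3.
P = [[F(2, 3), F(-1, 3), F(-2, 3)],
     [F(1, 3), F(-2, 3), F(2, 3)],
     [F(2, 3), F(2, 3),  F(1, 3)]]

def matvec(M, v):
    """Multiply the matrix M (list of rows) by the vector v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def q_old(x):
    """The quadratic form x^T A x in the original variables."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def q_new(y):
    """The diagonal form -3*y2^2 + 3*y3^2 in the new variables."""
    return -3 * y[1]**2 + 3 * y[2]**2

samples = [[1, 0, 0], [0, 1, 0], [1, 2, 3], [-2, 5, 7]]
results = [(q_old(matvec(P, y)), q_new(y)) for y in samples]
print(results)   # each pair should contain two equal values
```

Because `Fraction` arithmetic is exact, the two values in each pair agree exactly rather than only up to rounding.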

Quadratic Forms in Geometry

Recall that a conic section or conic is a curve that results by cutting a double-napped cone with a plane (Figure 7.3.1). The most important conic sections are ellipses, hyperbolas, and parabolas, which result when the cutting plane does not pass through the vertex. Circles are special cases of ellipses that result when the cutting plane is perpendicular to the axis of symmetry of the cone. If the cutting plane passes through the vertex, then the resulting intersection is called a degenerate conic. The possibilities are a point, a pair of intersecting lines, or a single line.

[Figure 7.3.1: Circle, ellipse, parabola, and hyperbola obtained by cutting a double-napped cone with a plane]

Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an equation of the form

\[
ax^2 + 2bxy + cy^2 + dx + ey + f = 0 \tag{7}
\]

in which $a$, $b$, and $c$ are not all zero, represents a conic section.* If $d = e = 0$ in (7), then there are no linear terms, so the equation becomes

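\[
ax^2 + 2bxy + cy^2 + f = 0 \tag{8}
\]

and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if $b = 0$ in (8), then there is no cross product term (i.e., term involving $xy$), and the equation

\[
ax^2 + cy^2 + f = 0 \tag{9}
\]

is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1.

Table 1

| Equation | Conditions | Conic |
| $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ | $\alpha \geq \beta > 0$ | Ellipse with intercepts $(\pm\alpha, 0)$ and $(0, \pm\beta)$ |
| $\dfrac{x^2}{\alpha^2} + \dfrac{y^2}{\beta^2} = 1$ | $\beta \geq \alpha > 0$ | Ellipse with intercepts $(\pm\alpha, 0)$ and $(0, \pm\beta)$ |
| $\dfrac{x^2}{\alpha^2} - \dfrac{y^2}{\beta^2} = 1$ | $\alpha > 0,\ \beta > 0$ | Hyperbola with vertices $(\pm\alpha, 0)$ |
| $\dfrac{y^2}{\beta^2} - \dfrac{x^2}{\alpha^2} = 1$ | $\alpha > 0,\ \beta > 0$ | Hyperbola with vertices $(0, \pm\beta)$ |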

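If we take the constant $f$ in Equations (8) and (9) to the right side and let $k = -f$, then we can rewrite these equations in matrix form as

\[
\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k
\quad\text{and}\quad
\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \tag{10}
\]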

* We must also allow for the possibility that there are no real values of x and y that satisfy the equation, as with x 2 + y 2 + 1 = 0. In such cases we say that the equation has no graph or has an empty graph.


The first of these corresponds to Equation (8) in which there is a cross product term 2bxy , and the second corresponds to Equation (9) in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in (10) are





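\[
\begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k
\quad\text{and}\quad
\begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \tag{11}
\]

If $a$, $b$, and $c$ are not all zero, then the graphs in $R^3$ of the equations in (11) are called central quadrics; the graph of the second of these equations, which is a special case of the first, is called a central quadric in standard position.

[Figure 7.3.2: A central conic rotated out of standard position]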

Identifying Conic Sections

We are now ready to consider the first of the three problems posed earlier, identifying the curve or surface represented by an equation xTAx = k in two or three variables. We will focus on the two-variable case. We noted above that an equation of the form

\[
ax^2 + 2bxy + cy^2 + f = 0 \tag{12}
\]

represents a central conic. If $b = 0$, then the conic is in standard position, and if $b \neq 0$, it is rotated. It is an easy matter to identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation

\[
9x^2 + 16y^2 - 144 = 0
\]

can be rewritten as

\[
\frac{x^2}{16} + \frac{y^2}{9} = 1
\]

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.

[Figure 7.3.3: the ellipse $x^2/16 + y^2/9 = 1$, with intercepts $(\pm 4, 0)$ and $(0, \pm 3)$]

If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that eliminates the cross product term in the equation

\[
ax^2 + 2bxy + cy^2 = k \tag{13}
\]

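it will be convenient to express the equation in the matrix form

\[
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \tag{14}
\]

and look for a change of variable

\[
\mathbf{x} = P\mathbf{x}'
\]

that diagonalizes $A$ and for which $\det(P) = 1$. Since we saw in Example 4 of Section 7.1 that the transition matrix

\[
P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{15}
\]

has the effect of rotating the $xy$-axes of a rectangular coordinate system through an angle $\theta$, our problem reduces to finding $\theta$ that diagonalizes $A$, thereby eliminating the cross product term in (13). If we make this change of variable, then in the $x'y'$-coordinate system, Equation (14) will become

\[
\mathbf{x}'^TD\mathbf{x}' = \begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = k \tag{16}
\]

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$. The conic can now be identified by writing (16) in the form

\[
\lambda_1x'^2 + \lambda_2y'^2 = k \tag{17}
\]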


and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if $\lambda_1$, $\lambda_2$, and $k$ are positive, then (17) represents an ellipse with an axis of length $2\sqrt{k/\lambda_1}$ in the $x'$-direction and $2\sqrt{k/\lambda_2}$ in the $y'$-direction. The first column vector of $P$, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive $x'$-axis; and the second column vector of $P$, which is a unit eigenvector corresponding to $\lambda_2$, is a unit vector along the $y'$-axis. These are called the principal axes of the ellipse, which explains why Theorem 7.3.1 is called “the principal axes theorem.” (See Figure 7.3.4.)

[Figure 7.3.4: an ellipse whose principal axes $x'$ and $y'$ are rotated through the angle $\theta$ from the $x$- and $y$-axes; the unit eigenvector for $\lambda_1$ is $(\cos\theta, \sin\theta)$, the unit eigenvector for $\lambda_2$ is $(-\sin\theta, \cos\theta)$, and the semi-axes have lengths $\sqrt{k/\lambda_1}$ and $\sqrt{k/\lambda_2}$]

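EXAMPLE 3 Identifying a Conic by Eliminating the Cross Product Term

(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the $xy$-axes to put the conic in standard position.

(b) Find the angle $\theta$ through which you rotated the $xy$-axes in part (a).

Solution (a) The given equation can be written in the matrix form

\[
\mathbf{x}^TA\mathbf{x} = 36 \quad\text{where}\quad A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}
\]

The characteristic polynomial of $A$ is

\[
\begin{vmatrix} \lambda-5 & 2 \\ 2 & \lambda-8 \end{vmatrix} = (\lambda-4)(\lambda-9)
\]

so the eigenvalues are $\lambda = 4$ and $\lambda = 9$. We leave it for you to show that orthonormal bases for the eigenspaces are

\[
\lambda = 4\colon \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}, \qquad
\lambda = 9\colon \begin{bmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix}
\]

Thus, $A$ is orthogonally diagonalized by

\[
P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} \tag{18}
\]

Moreover, it happens by chance that $\det(P) = 1$, so we are assured that the substitution $\mathbf{x} = P\mathbf{x}'$ performs a rotation of axes. (Had it turned out that $\det(P) = -1$, then we would have interchanged the columns to reverse the sign.) It follows from (16) that the equation of the conic in the $x'y'$-coordinate system is

\[
\begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = 36
\]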


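which we can write as

\[
4x'^2 + 9y'^2 = 36 \quad\text{or}\quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1
\]

We can now see from Table 1 that the conic is an ellipse whose axis has length $2\alpha = 6$ in the $x'$-direction and length $2\beta = 4$ in the $y'$-direction.

[Figure 7.3.5: the ellipse in the rotated $x'y'$-coordinate system, with the $x'$-axis making an angle of about 26.6° with the $x$-axis; the points labeled (3, 0) and (0, 2) are given in $x'y'$-coordinates, and the unit vectors along the $x'$- and $y'$-axes are $(2/\sqrt{5}, 1/\sqrt{5})$ and $(-1/\sqrt{5}, 2/\sqrt{5})$]

Solution (b) It follows from (15) that

\[
P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\]

which implies that

\[
\cos\theta = \frac{2}{\sqrt{5}}, \qquad \sin\theta = \frac{1}{\sqrt{5}}, \qquad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{2}
\]

Thus, $\theta = \tan^{-1}\frac{1}{2} \approx 26.6^\circ$ (Figure 7.3.5).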

Remark In the exercises we will ask you to show that if $b \neq 0$, then the cross product term in the equation

\[
ax^2 + 2bxy + cy^2 = k
\]

can be eliminated by a rotation through an angle $\theta$ that satisfies

\[
\cot 2\theta = \frac{a - c}{2b} \tag{19}
\]

We leave it for you to confirm that this is consistent with part (b) of the last example.
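Formula (19) can be checked against Example 3 numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the angle is taken with $0 < 2\theta < \pi$ (one of several valid choices, since the cotangent has period $\pi$), and the new coefficients are obtained by substituting $x = x'\cos\theta - y'\sin\theta$, $y = x'\sin\theta + y'\cos\theta$ into $ax^2 + 2bxy + cy^2$.

```python
import math

def rotation_angle(a, b, c):
    """Return theta with 0 < 2*theta < pi and cot(2*theta) = (a - c)/(2*b); needs b != 0."""
    two_theta = math.atan2(2 * b, a - c) % math.pi
    return two_theta / 2

def rotated_coefficients(a, b, c, theta):
    """Coefficients (a', b', c') of a'x'^2 + 2b'x'y' + c'y'^2 after rotating the axes by theta."""
    s, co = math.sin(theta), math.cos(theta)
    new_a = a * co * co + 2 * b * s * co + c * s * s
    new_b = (c - a) * s * co + b * (co * co - s * s)
    new_c = a * s * s - 2 * b * s * co + c * co * co
    return new_a, new_b, new_c

# Example 3: 5x^2 - 4xy + 8y^2 = 36, so a = 5, 2b = -4, c = 8.
a, b, c = 5, -2, 8
theta = rotation_angle(a, b, c)
new_a, new_b, new_c = rotated_coefficients(a, b, c, theta)
print(math.degrees(theta))
print(new_a, new_b, new_c)
```

For these coefficients the angle should come out near 26.6 degrees, the cross product coefficient `new_b` should vanish up to rounding, and `new_a` and `new_c` should be close to the eigenvalues 4 and 9 found in Example 3.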

Positive Definite Quadratic Forms

We will now consider the second of the three problems posed earlier, determining conditions under which $\mathbf{x}^TA\mathbf{x} > 0$ for all nonzero values of $\mathbf{x}$. We will explain why this is important shortly, but first we introduce some terminology.

DEFINITION 1 A quadratic form $\mathbf{x}^TA\mathbf{x}$ is said to be

positive definite if $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$;
negative definite if $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$;
indefinite if $\mathbf{x}^TA\mathbf{x}$ has both positive and negative values.

The terminology in Definition 1 also applies to the matrix $A$; that is, $A$ is positive definite, negative definite, or indefinite in accordance with whether the associated quadratic form has that property.

The following theorem, whose proof is deferred to the end of the section, provides a way of using eigenvalues to determine whether a matrix $A$ and its associated quadratic form $\mathbf{x}^TA\mathbf{x}$ are positive definite, negative definite, or indefinite.

THEOREM 7.3.2 If $A$ is a symmetric matrix, then:

(a) $\mathbf{x}^TA\mathbf{x}$ is positive definite if and only if all eigenvalues of $A$ are positive.
(b) $\mathbf{x}^TA\mathbf{x}$ is negative definite if and only if all eigenvalues of $A$ are negative.
(c) $\mathbf{x}^TA\mathbf{x}$ is indefinite if and only if $A$ has at least one positive eigenvalue and at least one negative eigenvalue.

Remark The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for which $\mathbf{x}^TA\mathbf{x} \geq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for which $\mathbf{x}^TA\mathbf{x} \leq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite. Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that $\mathbf{x}^TA\mathbf{x}$ is positive semidefinite if and only if all eigenvalues of $A$ are nonnegative and is negative semidefinite if and only if all eigenvalues of $A$ are nonpositive.


EXAMPLE 4 Positive Definite Quadratic Forms

It is not usually possible to tell from the signs of the entries in a symmetric matrix $A$ whether that matrix is positive definite, negative definite, or indefinite. For example, the entries of the matrix

\[
A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\]

are nonnegative, but the matrix is indefinite since its eigenvalues are $\lambda = 1, 4, -2$ (verify). To see this another way, let us write out the quadratic form as

\[
\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3
\]

We can now see, for example, that

\[
\mathbf{x}^TA\mathbf{x} = 4 \quad\text{for}\quad x_1 = 0,\ x_2 = 1,\ x_3 = 1
\]

and

\[
\mathbf{x}^TA\mathbf{x} = -4 \quad\text{for}\quad x_1 = 0,\ x_2 = 1,\ x_3 = -1
\]

Positive definite and negative definite matrices are invertible. Why?

Classifying Conic Sections Using Eigenvalues

If $\mathbf{x}^TB\mathbf{x} = k$ is the equation of a conic, and if $k \neq 0$, then we can divide through by $k$ and rewrite the equation in the form

\[
\mathbf{x}^TA\mathbf{x} = 1 \tag{20}
\]

where $A = (1/k)B$. If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the equation of the conic in the new coordinate system will be of the form

\[
\lambda_1x'^2 + \lambda_2y'^2 = 1 \tag{21}
\]

in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$. The particular type of conic represented by this equation will depend on the signs of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you should be able to see from (21) that:

• $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.
• $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.
• $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.

In the case of the ellipse, Equation (21) can be rewritten as

\[
\frac{x'^2}{(1/\sqrt{\lambda_1})^2} + \frac{y'^2}{(1/\sqrt{\lambda_2})^2} = 1 \tag{22}
\]

so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).

[Figure 7.3.6: an ellipse in the rotated $x'y'$-coordinate system with semi-axes of lengths $1/\sqrt{\lambda_1}$ and $1/\sqrt{\lambda_2}$]

The following theorem is an immediate consequence of this discussion and Theorem 7.3.2.

THEOREM 7.3.3 If $A$ is a symmetric 2 × 2 matrix, then:

(a) $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $A$ is positive definite.
(b) $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $A$ is negative definite.
(c) $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $A$ is indefinite.


In Example 3 we performed a rotation to show that the equation

\[
5x^2 - 4xy + 8y^2 - 36 = 0
\]

represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by rewriting the equation in the form

\[
\tfrac{5}{36}x^2 - \tfrac{1}{9}xy + \tfrac{2}{9}y^2 = 1
\]

and showing that the associated matrix

\[
A = \begin{bmatrix} \frac{5}{36} & -\frac{1}{18} \\ -\frac{1}{18} & \frac{2}{9} \end{bmatrix}
\]

has eigenvalues $\lambda_1 = \frac{1}{9}$ and $\lambda_2 = \frac{1}{4}$. These eigenvalues are positive, so the matrix $A$ is positive definite and the equation represents an ellipse. Moreover, it follows from (21) that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which is consistent with Example 3.
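For a symmetric 2 × 2 matrix the eigenvalues have a closed form, so the classification in Theorem 7.3.3 is easy to automate. The following is a minimal sketch, assuming the matrix is given by its entries $a$, $b$, $c$ as in $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$; the function names and the tolerance `tol` are illustrative choices, and the case of a zero eigenvalue is reported separately because the theorem does not address it.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smaller one first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def classify_conic(a, b, c, tol=1e-12):
    """Classify the graph of x^T A x = 1 from the signs of the eigenvalues of A."""
    low, high = eigenvalues_sym2(a, b, c)
    if low > tol:
        return "ellipse"
    if high < -tol:
        return "no graph"
    if low < -tol and high > tol:
        return "hyperbola"
    return "not covered (zero eigenvalue)"

# The matrix obtained above from 5x^2 - 4xy + 8y^2 = 36 after dividing by 36.
a, b, c = 5 / 36, -1 / 18, 2 / 9
lam1, lam2 = eigenvalues_sym2(a, b, c)
kind = classify_conic(a, b, c)
axes = (2 / math.sqrt(lam1), 2 / math.sqrt(lam2))
print(lam1, lam2, kind, axes)
```

Up to rounding, the computed eigenvalues should be 1/9 and 1/4, the conic should be reported as an ellipse, and the axis lengths should be 6 and 4.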

Identifying Positive Definite Matrices



Positive definite matrices are the most important symmetric matrices in applications, so it will be useful to learn a little more about them. We already know that a symmetric matrix is positive definite if and only if its eigenvalues are all positive; now we will give a criterion that can be used to determine whether a symmetric matrix is positive definite without finding the eigenvalues. For this purpose we define the $k$th principal submatrix of an $n \times n$ matrix $A$ to be the $k \times k$ submatrix consisting of the first $k$ rows and columns of $A$. For example, here are the principal submatrices of a general 4 × 4 matrix:

First principal submatrix:

\[
\begin{bmatrix} a_{11} \end{bmatrix}
\]

Second principal submatrix:

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

Third principal submatrix:

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

Fourth principal submatrix $= A$:

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}
\]

The following theorem, which we state without proof, provides a determinant test for ascertaining whether a symmetric matrix is positive definite.

THEOREM 7.3.4 If $A$ is a symmetric matrix, then:

(a) $A$ is positive definite if and only if the determinant of every principal submatrix is positive.
(b) $A$ is negative definite if and only if the determinants of the principal submatrices alternate between negative and positive values starting with a negative value for the determinant of the first principal submatrix.
(c) $A$ is indefinite if and only if it is neither positive definite nor negative definite and at least one principal submatrix has a positive determinant and at least one has a negative determinant.


EXAMPLE 5 Working with Principal Submatrices

The matrix

$$A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix}$$

is positive definite since the determinants

$$|2| = 2, \qquad \begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad \begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1$$

are all positive. Thus, we are guaranteed that all eigenvalues of $A$ are positive and $x^TAx > 0$ for $x \neq 0$.
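The determinant test of Theorem 7.3.4 (a) and (b) is easy to automate for small matrices. The sketch below is illustrative only: the helper names `det`, `leading_principal_minors`, and `classify` are assumptions, not notation from the text, and the cofactor expansion is chosen for readability rather than efficiency. Part (c) of the theorem is not implemented; anything that is not definite is simply reported as such.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_principal_minors(a):
    """Determinants of the 1st, 2nd, ..., nth principal submatrices of a."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def classify(a):
    """Apply parts (a) and (b) of the determinant test to a symmetric matrix."""
    minors = leading_principal_minors(a)
    if all(d > 0 for d in minors):
        return "positive definite"
    # Signs must alternate -, +, -, ... starting with the first minor.
    if all((d < 0) if k % 2 == 1 else (d > 0)
           for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


A = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]   # matrix from Example 5
print(leading_principal_minors(A))          # [2, 3, 1]
print(classify(A))                          # positive definite
```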

We conclude this section with an optional proof of Theorem 7.3.2.

Proofs of Theorem 7.3.2 (a) and (b) It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal change of variable $x = Py$ for which

$$x^TAx = y^TDy = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{23}$$

where the λ's are the eigenvalues of $A$. Moreover, it follows from the invertibility of $P$ that $y \neq 0$ if and only if $x \neq 0$, so the values of $x^TAx$ for $x \neq 0$ are the same as the values of $y^TDy$ for $y \neq 0$. Thus, it follows from (23) that $x^TAx > 0$ for $x \neq 0$ if and only if all of the λ's in that equation are positive, and that $x^TAx < 0$ for $x \neq 0$ if and only if all of the λ's are negative. This proves parts (a) and (b).

Proof (c) Assume that $A$ has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose that $\lambda_1 > 0$ and $\lambda_2 < 0$ in (23). Then

$$x^TAx > 0 \ \text{ if } y_1 = 1 \text{ and all other } y\text{'s are } 0$$

and

$$x^TAx < 0 \ \text{ if } y_2 = 1 \text{ and all other } y\text{'s are } 0$$

which proves that $x^TAx$ is indefinite. Conversely, if $x^TAx > 0$ for some $x$, then $y^TDy > 0$ for some $y$, so at least one of the λ's in (23) must be positive. Similarly, if $x^TAx < 0$ for some $x$, then $y^TDy < 0$ for some $y$, so at least one of the λ's in (23) must be negative, which completes the proof.

Exercise Set 7.3

In Exercises 1–2, express the quadratic form in the matrix notation $x^TAx$, where $A$ is a symmetric matrix.

1. (a) $3x_1^2 + 7x_2^2$ (b) $4x_1^2 - 9x_2^2 - 6x_1x_2$ (c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

2. (a) $5x_1^2 + 5x_1x_2$ (b) $-7x_1x_2$ (c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$

In Exercises 3–4, find a formula for the quadratic form that does not use matrices.

3. $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

4. $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} -2 & \frac{7}{2} & 1 \\ \frac{7}{2} & 0 & 6 \\ 1 & 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$

In Exercises 5–8, find an orthogonal change of variables that eliminates the cross product terms in the quadratic form $Q$, and express $Q$ in terms of the new variables.

5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$

6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$

7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$

8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$

In Exercises 9–10, express the quadratic equation in the matrix form $x^TAx + Kx + f = 0$, where $x^TAx$ is the associated quadratic form and $K$ is an appropriate matrix.

9. (a) $2x^2 + xy + x - 6y + 2 = 0$ (b) $y^2 + 7x - 8y - 5 = 0$

10. (a) $x^2 - xy + 5x + 8y - 3 = 0$ (b) $5xy = 8$

In Exercises 11–12, identify the conic section represented by the equation.

11. (a) $2x^2 + 5y^2 = 20$ (b) $x^2 - y^2 - 8 = 0$ (c) $7y^2 - 2x = 0$ (d) $x^2 + y^2 - 25 = 0$

12. (a) $4x^2 + 9y^2 = 1$ (b) $4x^2 - 5y^2 = 20$ (c) $-x^2 = 2y$ (d) $x^2 - 3 = -y^2$

In Exercises 13–16, identify the conic section represented by the equation by rotating axes to place the conic in standard position. Find an equation of the conic in the rotated coordinates, and find the angle of rotation.

13. $2x^2 - 4xy - y^2 + 8 = 0$

14. $5x^2 + 4xy + 5y^2 = 9$

15. $11x^2 + 24xy + 4y^2 - 15 = 0$

16. $x^2 + xy + y^2 = \frac{1}{2}$

In Exercises 17–18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

17. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ (b) $\begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$ (c) $\begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$ (d) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (e) $\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}$

18. (a) $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$ (b) $\begin{bmatrix} -2 & 0 \\ 0 & -5 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$ (d) $\begin{bmatrix} 0 & 0 \\ 0 & -5 \end{bmatrix}$ (e) $\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$

In Exercises 19–24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

19. $x_1^2 + x_2^2$

20. $-x_1^2 - 3x_2^2$

21. $(x_1 - x_2)^2$

22. $-(x_1 - x_2)^2$

23. $x_1^2 - x_2^2$

24. $x_1x_2$

In Exercises 25–26, show that the matrix $A$ is positive definite first by using Theorem 7.3.2 and then by using Theorem 7.3.4.

25. (a) $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$ (b) $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

26. (a) $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$

In Exercises 27–28, use Theorem 7.3.4 to classify the matrix as positive definite, negative definite, or indefinite.

27. (a) $A = \begin{bmatrix} 3 & 1 & 2 \\ 1 & -1 & 3 \\ 2 & 3 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} -3 & 2 & 0 \\ 2 & -3 & 0 \\ 0 & 0 & -5 \end{bmatrix}$

28. (a) $A = \begin{bmatrix} 4 & 1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{bmatrix}$ (b) $A = \begin{bmatrix} -4 & -1 & 1 \\ -1 & -2 & -1 \\ 1 & -1 & -2 \end{bmatrix}$

In Exercises 29–30, find all values of $k$ for which the quadratic form is positive definite.

29. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$

30. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$

31. Let $x^TAx$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(x) = x^TAx$.
(a) Show that $T(x + y) = T(x) + 2x^TAy + T(y)$.
(b) Show that $T(cx) = c^2T(x)$.

32. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $x^TAx$, where $A$ is symmetric.

33. In statistics, the quantities

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

and

$$s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$

are called, respectively, the sample mean and sample variance of $x = (x_1, x_2, \ldots, x_n)$.
(a) Express the quadratic form $s_x^2$ in the matrix notation $x^TAx$, where $A$ is symmetric.
(b) Is $s_x^2$ a positive definite quadratic form? Explain.

34. The graph in an xyz-coordinate system of an equation of form $ax^2 + by^2 + cz^2 = 1$ in which $a$, $b$, and $c$ are positive is a surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse $ax^2 + by^2 = 1$ in the xy-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms.
(a) Show that the equation

$$\tfrac{4}{3}x^2 + \tfrac{4}{3}y^2 + \tfrac{4}{3}z^2 + \tfrac{4}{3}xy + \tfrac{4}{3}xz + \tfrac{4}{3}yz = 1$$

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $x^TAx = 1$ and make an orthogonal change of variable to eliminate the cross product terms.]
(b) What property must a symmetric 3 × 3 matrix have in order for the equation $x^TAx = 1$ to represent an ellipsoid?

[Figure Ex-34: a central ellipsoid in an xyz-coordinate system.]

35. What property must a symmetric 2 × 2 matrix $A$ have for $x^TAx = 1$ to represent a circle?

Working with Proofs

36. Prove: If $b \neq 0$, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the coordinate axes through an angle θ that satisfies the equation

$$\cot 2\theta = \frac{a - c}{2b}$$

37. Prove: If $A$ is an n × n symmetric matrix all of whose eigenvalues are nonnegative, then $x^TAx \geq 0$ for all nonzero $x$ in the vector space $R^n$.

True-False Exercises

TF. In parts (a)–(l) determine whether the statement is true or false, and justify your answer.

(a) If all eigenvalues of a symmetric matrix $A$ are positive, then $A$ is positive definite.
(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.
(c) $(x_1 - 3x_2)^2$ is a quadratic form.
(d) A positive definite matrix is invertible.
(e) A symmetric matrix is either positive definite, negative definite, or indefinite.
(f) If $A$ is positive definite, then $-A$ is negative definite.
(g) $x \cdot x$ is a quadratic form for all $x$ in $R^n$.
(h) If $A$ is symmetric and invertible, and if $x^TAx$ is a positive definite quadratic form, then $x^TA^{-1}x$ is also a positive definite quadratic form.
(i) If $A$ is symmetric and has only positive eigenvalues, then $x^TAx$ is a positive definite quadratic form.
(j) If $A$ is a 2 × 2 symmetric matrix with positive entries and $\det(A) > 0$, then $A$ is positive definite.
(k) If $A$ is symmetric, and if the quadratic form $x^TAx$ has no cross product terms, then $A$ must be a diagonal matrix.
(l) If $x^TAx$ is a positive definite quadratic form in two variables and $c \neq 0$, then the graph of the equation $x^TAx = c$ is an ellipse.

Working with Technology

T1. Find an orthogonal matrix $P$ such that $P^TAP$ is diagonal.

$$A = \begin{bmatrix} -2 & 1 & 1 & 1 \\ 1 & -2 & 1 & 1 \\ 1 & 1 & -2 & 1 \\ 1 & 1 & 1 & -2 \end{bmatrix}$$

T2. Use the eigenvalues of the following matrix to determine whether it is positive definite, negative definite, or indefinite, and then confirm your conclusion using Theorem 7.3.4.

$$A = \begin{bmatrix} -5 & -3 & 0 & 3 & 0 \\ -3 & -2 & 0 & 2 & 0 \\ 0 & 0 & -1 & 1 & 1 \\ 3 & 2 & 1 & -8 & 2 \\ 0 & 0 & 1 & 2 & -7 \end{bmatrix}$$

7.4 Optimization Using Quadratic Forms

Quadratic forms arise in various problems in which the maximum or minimum value of some quantity is required. In this section we will discuss some problems of this type.

Constrained Extremum Problems

Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form $x^TAx$ subject to the constraint $\|x\| = 1$. Problems of this type arise in a wide variety of applications.

To visualize this problem geometrically in the case where $x^TAx$ is a quadratic form on $R^2$, view $z = x^TAx$ as the equation of some surface in a rectangular xyz-coordinate system and view $\|x\| = 1$ as the unit circle centered at the origin of the xy-plane. Geometrically, the problem of finding the maximum and minimum values of $x^TAx$ subject to the requirement $\|x\| = 1$ amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1). The following theorem, whose proof is deferred to the end of the section, is the key result for solving problems of this type.

[Figure 7.4.1: the surface $z = x^TAx$ above the unit circle in the xy-plane, with the constrained maximum and constrained minimum marked on the curve of intersection.]

THEOREM 7.4.1 Constrained Extremum Theorem

Let $A$ be a symmetric n × n matrix whose eigenvalues in order of decreasing size are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then:

(a) The quadratic form $x^TAx$ attains a maximum value and a minimum value on the set of vectors for which $\|x\| = 1$.
(b) The maximum value attained in part (a) occurs at a vector corresponding to the eigenvalue $\lambda_1$.
(c) The minimum value attained in part (a) occurs at a vector corresponding to the eigenvalue $\lambda_n$.

Remark The condition $\|x\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of $x^TAx$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as $x^Tx = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$, when convenient.

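EXAMPLE 1 Finding Constrained Extrema

Find the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$

subject to the constraint $x^2 + y^2 = 1$.

Solution The quadratic form can be expressed in matrix notation as

$$z = 5x^2 + 5y^2 + 4xy = x^TAx = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

We leave it for you to show that the eigenvalues of $A$ are $\lambda_1 = 7$ and $\lambda_2 = 3$ and that corresponding eigenvectors are

$$\lambda_1 = 7: \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

Normalizing these eigenvectors yields

$$\lambda_1 = 7: \begin{bmatrix} \frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -\frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} \end{bmatrix} \tag{1}$$

Thus, the constrained extrema are

constrained maximum: $z = 7$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$
constrained minimum: $z = 3$ at $(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

Remark Since the negatives of the eigenvectors in (1) are also unit eigenvectors, they too produce the maximum and minimum values of $z$; that is, the constrained maximum $z = 7$ also occurs at the point $(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the constrained minimum $z = 3$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$.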
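The computation in Example 1 can be checked numerically. The following is a minimal sketch for the 2 × 2 case only, assuming the quadratic form is written as $ax^2 + 2bxy + dy^2$; the names `quadratic_form` and `constrained_extrema_2x2` are hypothetical. It finds the two eigenvalues in closed form, builds a unit eigenvector for each, and evaluates the form there. Note that it may return the negative of an eigenvector listed in (1), which the remark above shows is equally valid.

```python
import math


def quadratic_form(a, b, d, x, y):
    """Value of a*x^2 + 2*b*x*y + d*y^2, i.e. x^T A x with A = [[a, b], [b, d]]."""
    return a * x * x + 2 * b * x * y + d * y * y


def constrained_extrema_2x2(a, b, d):
    """Max and min of the form on x^2 + y^2 = 1, each with a unit eigenvector."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam_max, lam_min = mean + radius, mean - radius

    def unit_eigenvector(lam):
        if b != 0:
            v = (b, lam - a)        # solves the first row of (A - lam*I) v = 0
        elif abs(lam - a) <= abs(lam - d):
            v = (1.0, 0.0)          # diagonal matrix, lam is the (1,1) entry
        else:
            v = (0.0, 1.0)          # diagonal matrix, lam is the (2,2) entry
        norm = math.hypot(v[0], v[1])
        return (v[0] / norm, v[1] / norm)

    return (lam_max, unit_eigenvector(lam_max)), (lam_min, unit_eigenvector(lam_min))


# Example 1: z = 5x^2 + 5y^2 + 4xy, so a = 5, b = 2, d = 5.
(z_max, p_max), (z_min, p_min) = constrained_extrema_2x2(5, 2, 5)
print(z_max, p_max, quadratic_form(5, 2, 5, *p_max))
print(z_min, p_min, quadratic_form(5, 2, 5, *p_min))
```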

EXAMPLE 2 A Constrained Extremum Problem

A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use eigenvalue methods to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.

[Figure 7.4.2: A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$, with corner $(x, y)$ in the first quadrant.]

Solution The area $z$ of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

$$\left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1$$

and defining new variables, $x_1$ and $y_1$, by the equations

$$x = 3x_1 \quad \text{and} \quad y = 2y_1$$

This enables us to reformulate the problem as follows:

maximize $z = 4xy = 24x_1y_1$ subject to the constraint $x_1^2 + y_1^2 = 1$

To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

$$z = x^TAx = \begin{bmatrix} x_1 & y_1 \end{bmatrix} \begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

We now leave it for you to show that the largest eigenvalue of $A$ is $\lambda = 12$ and that the only corresponding unit eigenvector with nonnegative entries is

$$x = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} \end{bmatrix}$$

Thus, the maximum area is $z = 12$, and this occurs when

$$x = 3x_1 = \frac{3}{\sqrt{2}} \quad \text{and} \quad y = 2y_1 = \frac{2}{\sqrt{2}}$$

Constrained Extrema and Level Curves

[Figure 7.4.3: the surface $z = f(x, y)$ cut by the plane $z = k$, with the level curve $f(x, y) = k$ shown in the xy-plane.]

A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the xy-plane along which $f(x, y)$ is constant. These curves have equations of the form

$$f(x, y) = k$$

and are called the level curves of $f$ (Figure 7.4.3). In particular, the level curves of a quadratic form $x^TAx$ on $R^2$ have equations of the form

$$x^TAx = k \tag{2}$$

so the maximum and minimum values of $x^TAx$ subject to the constraint $\|x\| = 1$ are the largest and smallest values of $k$ for which the graph of (2) intersects the unit circle. Typically, such values of $k$ produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize $x^TAx$ subject to the constraint $\|x\| = 1$.

[Figure 7.4.4: a level curve $x^TAx = k$ touching the unit circle $\|x\| = 1$ at a point $x$.]

EXAMPLE 3 Example 1 Revisited Using Level Curves

In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$

subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, which is attained at the points

$$(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{3}$$

and that the constrained minimum is $z = 3$, which is attained at the points

$$(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad \text{and} \quad (x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{4}$$

Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit circle at the points in (3), and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points in (4). All of this is consistent with Figure 7.4.5.

[Figure 7.4.5: the unit circle $x^2 + y^2 = 1$ together with the level curves $5x^2 + 5y^2 + 4xy = 7$ and $5x^2 + 5y^2 + 4xy = 3$, which touch the circle at the points in (3) and (4); the angle $\pi/4$ is marked.]

Relative Extrema of Functions of Two Variables (Calculus Required)

We will conclude this section by showing how quadratic forms can be used to study characteristics of real-valued functions of two variables. Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where the conditions

$$f_x(x, y) = 0 \quad \text{and} \quad f_y(x, y) = 0$$

are both true. These are called critical points of $f$. The specific behavior of $f$ at a critical point $(x_0, y_0)$ is determined by the sign of

$$D(x, y) = f(x, y) - f(x_0, y_0) \tag{5}$$

at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$:

• If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) < f(x, y)$ at such points and $f$ is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).
• If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) > f(x, y)$ at such points and $f$ is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).
• If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that $f$ has a saddle point at $(x_0, y_0)$ (Figure 7.4.6c).

[Figure 7.4.6: three surfaces showing (a) a relative minimum at (0, 0), (b) a relative maximum at (0, 0), and (c) a saddle point at (0, 0).]

In general, it can be difficult to determine the sign of (5) directly. However, the following theorem, which is proved in calculus, makes it possible to analyze critical points using derivatives.

THEOREM 7.4.2 Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) > 0$$

(b) $f$ has a relative maximum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) < 0$$

(c) $f$ has a saddle point at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0$$

(d) The test is inconclusive if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0$$

Our interest here is in showing how to reformulate this theorem using properties of symmetric matrices. For this purpose we consider the symmetric matrix

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}$$

which is called the Hessian or Hessian matrix of $f$ in honor of the German mathematician and scientist Ludwig Otto Hesse (1811–1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on $x$ and $y$. The Hessian is of interest because

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0)$$

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.

THEOREM 7.4.3 Hessian Form of the Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that $f$ has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of $f$ at $(x_0, y_0)$, then:

(a) $f$ has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.
(b) $f$ has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.
(c) $f$ has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.
(d) The test is inconclusive otherwise.

We will prove part (a). The proofs of the remaining parts will be left as exercises.

Proof (a) If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of $H(x_0, y_0)$ have positive determinants. Thus,

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$$

and

$$\det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0$$

so $f$ has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.

EXAMPLE 4 Using the Hessian to Classify Relative Extrema

Find the critical points of the function

$$f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3$$

and use the eigenvalues of the Hessian matrix at those points to determine which of them, if any, are relative maxima, relative minima, or saddle points.

Solution To find both the critical points and the Hessian matrix we will need to calculate the first and second partial derivatives of $f$. These derivatives are

$$f_x(x, y) = x^2 + y^2 - 8y, \quad f_y(x, y) = 2xy - 8x, \quad f_{xy}(x, y) = 2y - 8$$

$$f_{xx}(x, y) = 2x, \quad f_{yy}(x, y) = 2x$$

Thus, the Hessian matrix is

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix}$$

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

$$f_x(x, y) = x^2 + y^2 - 8y = 0 \quad \text{and} \quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0$$

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and solving for $y$ yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for $x$ yields $x = 4$ or $x = -4$. Thus, we have four critical points:

$$(0, 0), \quad (0, 8), \quad (4, 4), \quad (-4, 4)$$

Evaluating the Hessian matrix at these points yields

$$H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix}$$

$$H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix}$$

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications of the critical points:

Critical Point (x0, y0)    λ1     λ2     Classification
(0, 0)                      8     −8     Saddle point
(0, 8)                      8     −8     Saddle point
(4, 4)                      8      8     Relative minimum
(−4, 4)                    −8     −8     Relative maximum
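The classifications in Example 4 can be reproduced with a short script. This is only a sketch under stated assumptions: the partial derivatives are entered by hand from the example, the function names `gradient`, `hessian`, and `classify` are illustrative, and definiteness of the 2 × 2 Hessian is decided with the determinant conditions of Theorem 7.4.2 rather than by computing eigenvalues.

```python
def gradient(x, y):
    """First partials of f(x, y) = x**3/3 + x*y**2 - 8*x*y + 3."""
    return x * x + y * y - 8 * y, 2 * x * y - 8 * x


def hessian(x, y):
    """Entries (f_xx, f_xy, f_yy) of the Hessian of the same function."""
    return 2 * x, 2 * y - 8, 2 * x


def classify(fxx, fxy, fyy):
    """Second derivative test for a critical point with the given Hessian entries."""
    det_h = fxx * fyy - fxy ** 2
    if det_h > 0 and fxx > 0:
        return "relative minimum"    # Hessian positive definite
    if det_h > 0 and fxx < 0:
        return "relative maximum"    # Hessian negative definite
    if det_h < 0:
        return "saddle point"        # Hessian indefinite
    return "inconclusive"


critical_points = [(0, 0), (0, 8), (4, 4), (-4, 4)]
for point in critical_points:
    # Each listed point should make both first partials vanish.
    print(point, gradient(*point), classify(*hessian(*point)))
```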

We conclude this section with an optional proof of Theorem 7.4.1.

Proof of Theorem 7.4.1 The first step in the proof is to show that $x^TAx$ has constrained maximum and minimum values for $\|x\| = 1$. Since $A$ is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an orthogonal change of variable $x = Py$ such that

$$x^TAx = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{6}$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$. Let us assume that $\|x\| = 1$ and that the column vectors of $P$ (which are unit eigenvectors of $A$) have been ordered so that

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \tag{7}$$

Since the matrix $P$ is orthogonal, multiplication by $P$ is length preserving, from which it follows that $\|y\| = \|x\| = 1$; that is,

$$y_1^2 + y_2^2 + \cdots + y_n^2 = 1$$

It follows from this equation and (7) that

$$\lambda_n = \lambda_n(y_1^2 + y_2^2 + \cdots + y_n^2) \leq \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1(y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1$$

and hence from (6) that

$$\lambda_n \leq x^TAx \leq \lambda_1$$

This shows that all values of $x^TAx$ for which $\|x\| = 1$ lie between the largest and smallest eigenvalues of $A$. Now let $x$ be a unit eigenvector corresponding to $\lambda_1$. Then

$$x^TAx = x^T(\lambda_1 x) = \lambda_1 x^Tx = \lambda_1\|x\|^2 = \lambda_1$$

which shows that $x^TAx$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if $x$ is a unit eigenvector of $A$ corresponding to $\lambda_1$. Similarly, if $x$ is a unit eigenvector corresponding to $\lambda_n$, then

$$x^TAx = x^T(\lambda_n x) = \lambda_n x^Tx = \lambda_n\|x\|^2 = \lambda_n$$

so $x^TAx$ has $\lambda_n$ as a constrained minimum and this minimum occurs if $x$ is a unit eigenvector of $A$ corresponding to $\lambda_n$. This completes the proof.

Exercise Set 7.4

In Exercises 1–4, find the maximum and minimum values of the given quadratic form subject to the constraint $x^2 + y^2 = 1$, and determine the values of $x$ and $y$ at which the maximum and minimum occur.

1. $5x^2 - y^2$

2. $xy$

3. $3x^2 + 7y^2$

4. $5x^2 + 5xy$

In Exercises 5–6, find the maximum and minimum values of the given quadratic form subject to the constraint

$$x^2 + y^2 + z^2 = 1$$

and determine the values of $x$, $y$, and $z$ at which the maximum and minimum occur.

5. $9x^2 + 4y^2 + 3z^2$

6. $2x^2 + y^2 + z^2 + 2xy + 2xz$

7. Use the method of Example 2 to find the maximum and minimum values of $xy$ subject to the constraint $4x^2 + 8y^2 = 16$.

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the constraint $x^2 + 3y^2 = 16$.

In Exercises 9–10, draw the unit circle and the level curves corresponding to the given quadratic form. Show that the unit circle intersects each of these curves in exactly two places, label the intersection points, and verify that the constrained extrema occur at those points.

9. $5x^2 - y^2$

10. $xy$

11. (a) Show that the function $f(x, y) = 4xy - x^4 - y^4$ has critical points at (0, 0), (1, 1), and (−1, −1).
(b) Use the Hessian form of the second derivative test to show that $f$ has relative maxima at (1, 1) and (−1, −1) and a saddle point at (0, 0).

12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at (0, 0) and (−2, 2).
(b) Use the Hessian form of the second derivative test to show that $f$ has a relative maximum at (−2, 2) and a saddle point at (0, 0).

In Exercises 13–16, find the critical points of $f$, if any, and classify them as relative maxima, relative minima, or saddle points.

13. $f(x, y) = x^3 - 3xy - y^3$

14. $f(x, y) = x^3 - 3xy + y^3$

15. $f(x, y) = x^2 + 2y^2 - x^2y$

16. $f(x, y) = x^3 + y^3 - 3x - 3y$

17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of $x$ and $y$ that produce the inscribed rectangle with maximum area.

18. Suppose that $x$ is a unit eigenvector of a matrix $A$ corresponding to an eigenvalue 2. What is the value of $x^TAx$?

19. (a) Show that the functions

$$f(x, y) = x^4 + y^4 \quad \text{and} \quad g(x, y) = x^4 - y^4$$

have a critical point at (0, 0) but the second derivative test is inconclusive at that point.
(b) Give a reasonable argument to show that $f$ has a relative minimum at (0, 0) and $g$ has a saddle point at (0, 0).

20. Suppose that the Hessian matrix of a certain quadratic form $f(x, y)$ is

$$H = \begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix}$$

What can you say about the location and classification of the critical points of $f$?

21. Suppose that $A$ is an n × n symmetric matrix and

$$q(x) = x^TAx$$

where $x$ is a vector in $R^n$ that is expressed in column form. What can you say about the value of $q$ if $x$ is a unit eigenvector corresponding to an eigenvalue λ of $A$?

Working with Proofs

22. Prove: If $x^TAx$ is a quadratic form whose minimum and maximum values subject to the constraint $\|x\| = 1$ are $m$ and $M$, respectively, then for each number $c$ in the interval $m \leq c \leq M$, there is a unit vector $x_c$ such that $x_c^TAx_c = c$. [Hint: In the case where $m < M$, let $u_m$ and $u_M$ be unit eigenvectors of $A$ such that $u_m^TAu_m = m$ and $u_M^TAu_M = M$, and let

$$x_c = \sqrt{\frac{M - c}{M - m}}\, u_m + \sqrt{\frac{c - m}{M - m}}\, u_M$$

Show that $x_c^TAx_c = c$.]

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) A quadratic form must have either a maximum or minimum value.
(b) The maximum value of a quadratic form $x^TAx$ subject to the constraint $\|x\| = 1$ occurs at a unit eigenvector corresponding to the largest eigenvalue of $A$.
(c) The Hessian matrix of a function $f$ with continuous second-order partial derivatives is a symmetric matrix.
(d) If $(x_0, y_0)$ is a critical point of a function $f$ and the Hessian of $f$ at $(x_0, y_0)$ is 0, then $f$ has neither a relative maximum nor a relative minimum at $(x_0, y_0)$.
(e) If $A$ is a symmetric matrix and $\det(A) < 0$, then the minimum of $x^TAx$ subject to the constraint $\|x\| = 1$ is negative.

Working with Technology

T1. Find the maximum and minimum values of the following quadratic form subject to the stated constraint, and specify the points at which those values are attained.

$$w = 2x^2 + y^2 + z^2 + 2xy + 2xz; \qquad x^2 + y^2 + z^2 = 1$$

T2. Suppose that the temperature at a point $(x, y)$ on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant walking on the plate traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures encountered by the ant?

T3. The accompanying figure shows the intersection of the surface $z = x^2 + 4y^2$ (called an elliptic paraboloid) and the surface $x^2 + y^2 = 1$ (called a right circular cylinder). Find the highest and lowest points on the curve of intersection.

[Figure Ex-T3: the elliptic paraboloid and the right circular cylinder in an xyz-coordinate system.]

7.5 Hermitian, Unitary, and Normal Matrices

We showed in Section 7.2 that every symmetric matrix with real entries is orthogonally diagonalizable, and conversely that every orthogonally diagonalizable matrix with real entries is symmetric. In this section we will be concerned with the diagonalization problem for matrices with complex entries.

Real Matrices Versus Complex Matrices

As discussed in Section 5.3, we distinguish between matrices whose entries must be real numbers, called real matrices, and matrices whose entries may be either real numbers or complex numbers, called complex matrices. When convenient, you can think of a real matrix as a complex matrix each of whose entries has zero as its imaginary part. Similarly, we distinguish between real vectors (those in $R^n$) and complex vectors (those in $C^n$).

Hermitian and Unitary Matrices

The transpose operation is less important for complex matrices than for real matrices. A more useful operation for complex matrices is given in the following definition.

DEFINITION 1 If $A$ is a complex matrix, then the conjugate transpose of $A$, denoted by $A^*$, is defined by

$$A^* = \overline{A}^T \tag{1}$$

Remark Note that the order in which the transpose and conjugation operations are performed in Formula (1) does not matter (see Theorem 5.3.2b). Moreover, if $A$ is a real matrix, then Formula (1) simplifies to $A^* = (\overline{A})^T = A^T$, so the conjugate transpose is the same as the transpose in that case.

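EXAMPLE 1 Conjugate Transpose

Find the conjugate transpose $A^*$ of the matrix

$$A = \begin{bmatrix} 1 + i & -i & 0 \\ 2 & 3 - 2i & i \end{bmatrix}$$

Solution We have

$$\overline{A} = \begin{bmatrix} 1 - i & i & 0 \\ 2 & 3 + 2i & -i \end{bmatrix} \quad \text{and hence} \quad A^* = \overline{A}^T = \begin{bmatrix} 1 - i & 2 \\ i & 3 + 2i \\ 0 & -i \end{bmatrix}$$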
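As a quick check of Example 1, the conjugate transpose can be computed with Python's built-in complex numbers. This is a minimal sketch: the function name `conjugate_transpose` and the nested-list representation of a matrix are assumptions made for illustration.

```python
def conjugate_transpose(a):
    """Return A* for a matrix stored as a list of rows."""
    rows, cols = len(a), len(a[0])
    # Entry (j, i) of A* is the complex conjugate of entry (i, j) of A.
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]


# Matrix A from Example 1 (1j is Python's imaginary unit).
A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)

# Applying the operation twice should return the original matrix, (A*)* = A.
print(conjugate_transpose(A_star) == A)
```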

The following theorem, parts of which are given as exercises, shows that the basic algebraic properties of the conjugate transpose operation are similar to those of the transpose (compare to Theorem 1.4.8).

THEOREM 7.5.1 If $k$ is a complex scalar, and if $A$ and $B$ are complex matrices whose sizes are such that the stated operations can be performed, then:

(a) $(A^*)^* = A$
(b) $(A + B)^* = A^* + B^*$
(c) $(A - B)^* = A^* - B^*$
(d) $(kA)^* = \overline{k}A^*$
(e) $(AB)^* = B^*A^*$

We now define two new classes of matrices that will be important in our study of diagonalization in $C^n$.

DEFINITION 2 A square matrix $A$ is said to be unitary if

$$AA^* = A^*A = I \tag{2}$$

or, equivalently, if

$$A^* = A^{-1} \tag{3}$$

and it is said to be Hermitian* if

$$A^* = A \tag{4}$$

To show that a matrix is unitary it suffices to show that either $AA^* = I$ or $A^*A = I$, since either equation implies the other.

If $A$ is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^T = A^{-1}$ and (4) becomes $A^T = A$. Thus, the unitary matrices are complex generalizations of the real orthogonal matrices and the Hermitian matrices are complex generalizations of the real symmetric matrices.

EXAMPLE 2 Recognizing Hermitian Matrices

Hermitian matrices are easy to recognize because their diagonal entries are real (why?) and the entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, for example, we can tell by inspection that

$$A = \begin{bmatrix} 1 & i & 1 + i \\ -i & -5 & 2 - i \\ 1 - i & 2 + i & 3 \end{bmatrix}$$

is Hermitian.

* In honor of the French mathematician Charles Hermite (1822–1901).
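The inspection rule for Hermitian matrices amounts to comparing a matrix with its conjugate transpose. The following is a small illustrative sketch, not a library routine; the helper names `conjugate_transpose` and `is_hermitian` are assumptions, and exact equality is used because the entries here are small integers and Gaussian integers.

```python
def conjugate_transpose(a):
    """Return A* for a matrix stored as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]


def is_hermitian(a):
    """True when a is square and equal to its conjugate transpose."""
    if any(len(row) != len(a) for row in a):
        return False
    return a == conjugate_transpose(a)


# The matrix recognized as Hermitian by inspection in Example 2.
A = [[1, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]

# A complex symmetric matrix that is not Hermitian: the off-diagonal
# entries are equal rather than complex conjugates of each other.
B = [[1, 1j],
     [1j, 1]]

print(is_hermitian(A), is_hermitian(B))
```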

EXAMPLE 3 Recognizing Unitary Matrices

Unlike Hermitian matrices, unitary matrices are not readily identifiable by inspection. The most direct way to identify such matrices is to determine whether the matrix satisfies Equation (2) or Equation (3). We leave it for you to verify that the following matrix is unitary:

\[ A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}}\,i & \frac{1}{\sqrt{2}}\,i \end{bmatrix} \]
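Equations (2) and (4) translate directly into mechanical checks. The sketch below is one way to do this in plain Python, assuming square matrices stored as lists of rows; the function names `is_hermitian` and `is_unitary` and the tolerance `tol` are assumptions made for illustration. `H` is the matrix of Example 2, and `U` holds the entries of the matrix in Example 3.

```python
import math

def conj_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_hermitian(M):
    """Check A* = A entry by entry (exact comparison)."""
    n = len(M)
    return all(M[i][j] == complex(M[j][i]).conjugate()
               for i in range(n) for j in range(n))

def is_unitary(M, tol=1e-12):
    """Check A A* = I up to a small floating-point tolerance."""
    n = len(M)
    P = matmul(M, conj_transpose(M))
    return all(abs(P[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

H = [[1, 1j, 1 + 1j], [-1j, -5, 2 - 1j], [1 - 1j, 2 + 1j, 3]]
s = 1 / math.sqrt(2)
U = [[s, s], [-s * 1j, s * 1j]]

h_ok = is_hermitian(H)
u_ok = is_unitary(U)
```

Only AA∗ = I is tested in `is_unitary`, which suffices for a square matrix because either equation in (2) implies the other.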

In Theorem 7.2.2 we established that real symmetric matrices have real eigenvalues and that eigenvectors from different eigenspaces are orthogonal. That theorem is a special case of our next theorem, in which orthogonality is with respect to the complex Euclidean inner product on C^n. We will prove part (b) of the theorem and leave the proof of part (a) for the exercises. In our proof we will make use of the fact that the relationship u · v = v̄ᵀu given in Formula (5) of Section 5.3 (the bar denotes complex conjugation) can be expressed in terms of the conjugate transpose as

u · v = v∗u (5)

THEOREM 7.5.2 If A is a Hermitian matrix, then:
(a) The eigenvalues of A are all real numbers.
(b) Eigenvectors from different eigenspaces are orthogonal.

Proof (b) Let v1 and v2 be eigenvectors of A corresponding to distinct eigenvalues λ1 and λ2. Using Formula (5) and the facts that λ1 = λ̄1, λ2 = λ̄2, and A = A∗, we can write

λ1(v2 · v1) = (λ1v1)∗v2 = (Av1)∗v2 = (v1∗A∗)v2 = (v1∗A)v2 = v1∗(Av2) = v1∗(λ2v2) = λ2(v1∗v2) = λ2(v2 · v1)

This implies that (λ1 − λ2)(v2 · v1) = 0 and hence that v2 · v1 = 0 (since λ1 ≠ λ2).

EXAMPLE 4 Eigenvalues and Eigenvectors of a Hermitian Matrix

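Confirm that the Hermitian matrix

\[ A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \]

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.

Solution The characteristic polynomial of A is

\[ \det(\lambda I - A) = \begin{vmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{vmatrix} = (\lambda-2)(\lambda-3) - (-1-i)(-1+i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda-1)(\lambda-4) \]

so the eigenvalues of A are λ = 1 and λ = 4, which are real. Bases for the eigenspaces of A can be obtained by solving the linear system

\[ \begin{bmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]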

Chapter 7 Diagonalization and Quadratic Forms

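with λ = 1 and with λ = 4. We leave it for you to do this and to show that the general solutions of these systems are

\[ \lambda = 1:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t \begin{bmatrix} \tfrac12(1+i) \\ 1 \end{bmatrix} \]

Thus, bases for these eigenspaces are

\[ \lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \tfrac12(1+i) \\ 1 \end{bmatrix} \]

The vectors v1 and v2 are orthogonal since

\[ \mathbf{v}_1 \cdot \mathbf{v}_2 = (-1-i)\,\overline{\left(\tfrac12(1+i)\right)} + (1)(1) = \tfrac12(-1-i)(1-i) + 1 = 0 \]

and hence all scalar multiples of them are also orthogonal.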
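The claims of Example 4 can be confirmed by direct computation. The following Python sketch is illustrative only; the helper names `inner` and `apply` are assumptions, and `inner` implements the complex Euclidean inner product u · v = u1 v̄1 + u2 v̄2 + ···.

```python
def inner(u, v):
    """Complex Euclidean inner product: sum of u_i * conj(v_i)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def apply(M, x):
    """Matrix-vector product Mx for M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[2, 1 + 1j], [1 - 1j, 3]]
v1 = [-1 - 1j, 1]            # basis vector for the eigenspace of lambda = 1
v2 = [(1 + 1j) / 2, 1]       # basis vector for the eigenspace of lambda = 4

Av1 = apply(A, v1)           # should equal 1 * v1
Av2 = apply(A, v2)           # should equal 4 * v2
dot = inner(v1, v2)          # should be 0
```

All quantities here are exactly representable in floating point, so the comparisons can be made with plain equality.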

Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems 7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is unitary without computing its inverse.

THEOREM 7.5.3 If A is an n × n matrix with complex entries, then the following are equivalent.
(a) A is unitary.
(b) ‖Ax‖ = ‖x‖ for all x in C^n.
(c) Ax · Ay = x · y for all x and y in C^n.
(d) The column vectors of A form an orthonormal set in C^n with respect to the complex Euclidean inner product.
(e) The row vectors of A form an orthonormal set in C^n with respect to the complex Euclidean inner product.
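Parts (b) and (c) of Theorem 7.5.3 say that a unitary matrix preserves norms and inner products. A small numerical illustration in Python follows, using the entries of the unitary matrix from Example 3; the test vectors `x` and `y`, the helper names, and the tolerances are arbitrary choices made for this sketch.

```python
import math

def inner(u, v):
    """Complex Euclidean inner product: sum of u_i * conj(v_i)."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    """Euclidean norm induced by the inner product."""
    return math.sqrt(inner(u, u).real)

def apply(M, x):
    """Matrix-vector product Mx for M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

s = 1 / math.sqrt(2)
A = [[s, s], [-s * 1j, s * 1j]]   # entries of the matrix in Example 3
x = [1 + 1j, 2 - 1j]              # hypothetical test vectors
y = [1, 1 - 1j]

norm_preserved = math.isclose(norm(apply(A, x)), norm(x))                     # part (b)
inner_preserved = abs(inner(apply(A, x), apply(A, y)) - inner(x, y)) < 1e-12  # part (c)
```

A check on two vectors does not prove unitarity; it only illustrates what parts (b) and (c) assert.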

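EXAMPLE 5 A Unitary Matrix

Use Theorem 7.5.3 to show that

\[ A = \begin{bmatrix} \tfrac12(1+i) & \tfrac12(1+i) \\ \tfrac12(1-i) & \tfrac12(-1+i) \end{bmatrix} \]

is unitary, and then find A−1.

Solution We will show that the row vectors

\[ \mathbf{r}_1 = \begin{bmatrix} \tfrac12(1+i) & \tfrac12(1+i) \end{bmatrix} \quad\text{and}\quad \mathbf{r}_2 = \begin{bmatrix} \tfrac12(1-i) & \tfrac12(-1+i) \end{bmatrix} \]

are orthonormal. The relevant computations are

\[ \|\mathbf{r}_1\| = \sqrt{\left|\tfrac12(1+i)\right|^2 + \left|\tfrac12(1+i)\right|^2} = \sqrt{\tfrac12 + \tfrac12} = 1 \]

\[ \|\mathbf{r}_2\| = \sqrt{\left|\tfrac12(1-i)\right|^2 + \left|\tfrac12(-1+i)\right|^2} = \sqrt{\tfrac12 + \tfrac12} = 1 \]

\[ \mathbf{r}_1 \cdot \mathbf{r}_2 = \tfrac12(1+i)\,\overline{\tfrac12(1-i)} + \tfrac12(1+i)\,\overline{\tfrac12(-1+i)} = \tfrac12(1+i)\,\tfrac12(1+i) + \tfrac12(1+i)\,\tfrac12(-1-i) = \tfrac12 i - \tfrac12 i = 0 \]

Since we now know that A is unitary, it follows that

\[ A^{-1} = A^* = \begin{bmatrix} \tfrac12(1-i) & \tfrac12(1+i) \\ \tfrac12(1-i) & \tfrac12(-1-i) \end{bmatrix} \]

You can confirm the validity of this result by showing that AA∗ = A∗A = I.

Unitary Diagonalizability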

Since unitary matrices are the complex analogs of the real orthogonal matrices, the following definition is a natural generalization of orthogonal diagonalizability for real matrices.

DEFINITION 3 A square complex matrix A is said to be unitarily diagonalizable if there is a unitary matrix P such that P∗AP = D is a complex diagonal matrix. Any such matrix P is said to unitarily diagonalize A.

Recall that a real symmetric n × n matrix A has an orthonormal set of n eigenvectors and is orthogonally diagonalized by any n × n matrix whose column vectors are an orthonormal set of eigenvectors of A. Here is the complex analog of that result.

THEOREM 7.5.4 Every n × n Hermitian matrix A has an orthonormal set of n eigenvectors and is unitarily diagonalized by any n × n matrix P whose column vectors form an orthonormal set of eigenvectors of A.

The procedure for unitarily diagonalizing a Hermitian matrix A is exactly the same as that for orthogonally diagonalizing a symmetric matrix:

Unitarily Diagonalizing a Hermitian Matrix
Step 1. Find a basis for each eigenspace of A.
Step 2. Apply the Gram–Schmidt process to each of these bases to obtain orthonormal bases for the eigenspaces.
Step 3. Form the matrix P whose column vectors are the basis vectors obtained in Step 2. This will be a unitary matrix (Theorem 7.5.3) and will unitarily diagonalize A.
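Steps 2 and 3 can be sketched in Python for the Hermitian matrix of Example 4, whose eigenspace bases were found there (Step 1). Each eigenspace is one-dimensional, so the Gram–Schmidt process reduces to normalizing a single vector. The helper names below are illustrative assumptions, and the result is compared with the diagonal matrix only up to floating-point tolerance.

```python
import math

def conj_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def normalize(v):
    """Divide v by its Euclidean norm (Step 2 for a one-vector basis)."""
    length = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / length for c in v]

A = [[2, 1 + 1j], [1 - 1j, 3]]
p1 = normalize([-1 - 1j, 1])          # eigenvalue 1
p2 = normalize([(1 + 1j) / 2, 1])     # eigenvalue 4

# Step 3: the normalized eigenvectors become the columns of P.
P = [[p1[0], p2[0]],
     [p1[1], p2[1]]]

D = matmul(matmul(conj_transpose(P), A), P)   # should be close to diag(1, 4)
```

The order of the columns of P determines the order of the eigenvalues on the diagonal of D.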

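EXAMPLE 6 Unitary Diagonalization of a Hermitian Matrix

Find a matrix P that unitarily diagonalizes the Hermitian matrix

\[ A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \]

Solution We showed in Example 4 that the eigenvalues of A are λ = 1 and λ = 4 and that bases for the corresponding eigenspaces are

\[ \lambda = 1:\ \mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = 4:\ \mathbf{v}_2 = \begin{bmatrix} \tfrac12(1+i) \\ 1 \end{bmatrix} \]

Since each eigenspace has only one basis vector, the Gram–Schmidt process is simply a matter of normalizing these basis vectors. We leave it for you to show that

\[ \mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix} \quad\text{and}\quad \mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1+i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} \]

Thus, A is unitarily diagonalized by the matrix

\[ P = [\mathbf{p}_1 \;\; \mathbf{p}_2] = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} \]

Although it is a little tedious, you may want to check this result by showing that

\[ P^*AP = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1-i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{bmatrix} \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix} \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix} \]

Skew-Symmetric and Skew-Hermitian Matrices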

We will now consider two more classes of matrices that play a role in the analysis of the diagonalization problem. A square real matrix A is said to be skew-symmetric if AT = −A, and a square complex matrix A is said to be skew-Hermitian if A∗ = −A. We leave it as an exercise to show that a skew-symmetric matrix must have zeros on the main diagonal, and a skew-Hermitian matrix must have zeros or pure imaginary numbers on the main diagonal. Here are two examples:



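\[ A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix} \quad [\text{skew-symmetric}] \qquad A = \begin{bmatrix} i & 1-i & 5 \\ -1-i & 2i & i \\ -5 & i & 0 \end{bmatrix} \quad [\text{skew-Hermitian}] \]

Normal Matrices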

Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices. Specifically, it can be proved that a square complex matrix A is unitarily diagonalizable if and only if

AA∗ = A∗A

(6)

Matrices with this property are said to be normal. Normal matrices include the Hermitian, skew-Hermitian, and unitary matrices in the complex case and the symmetric, skew-symmetric, and orthogonal matrices in the real case. The nonzero skew-symmetric matrices are particularly interesting because they are examples of real matrices that are not orthogonally diagonalizable but are unitarily diagonalizable.

A Comparison of Eigenvalues

We have seen that Hermitian matrices have real eigenvalues. In the exercises we will ask you to show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary (have real part of zero) and that the eigenvalues of unitary matrices have modulus 1. These ideas are illustrated schematically in Figure 7.5.1.

Figure 7.5.1 [Complex plane: real eigenvalues (Hermitian) lie on the x-axis, pure imaginary eigenvalues (skew-Hermitian) lie on the y-axis, and eigenvalues with |λ| = 1 (unitary) lie on the unit circle.]
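Condition (6) and the eigenvalue comparison above can be illustrated with a short Python sketch. The matrices `S` and `K` are the skew-symmetric and skew-Hermitian examples given earlier in this section; the matrix `N`, the 2 × 2 matrix `J`, and the helper names are hypothetical additions. The function `eigenvalues_2x2` simply applies the quadratic formula to the characteristic polynomial λ² − tr(M)λ + det(M) of a 2 × 2 matrix.

```python
import cmath

def conj_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_normal(M):
    """Check A A* = A* A (exact comparison; entries here are Gaussian integers)."""
    M_star = conj_transpose(M)
    return matmul(M, M_star) == matmul(M_star, M)

def eigenvalues_2x2(M):
    """Roots of lambda^2 - tr(M) lambda + det(M) for a 2 x 2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

S = [[0, 1, -2], [-1, 0, 4], [2, -4, 0]]                # skew-symmetric
K = [[1j, 1 - 1j, 5], [-1 - 1j, 2j, 1j], [-5, 1j, 0]]   # skew-Hermitian
N = [[1, 1], [0, 1]]                                    # hypothetical, not normal

J = [[0, 1], [-1, 0]]                                   # hypothetical 2 x 2 skew-symmetric
herm_eigs = eigenvalues_2x2([[2, 1 + 1j], [1 - 1j, 3]]) # Hermitian matrix of Example 4
skew_eigs = eigenvalues_2x2(J)
```

For `J` the eigenvalues are not real, which is consistent with the remark that a nonzero skew-symmetric matrix cannot be orthogonally diagonalized even though it is normal.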


Exercise Set 7.5

In Exercises 1–2, find A∗ .





1−i ⎥ 3 + i⎦ 0

2i ⎢ 1. A = ⎣ 4 5+i



1 − i −1 + i 5 − 7i −i

2i 4

2. A =

In Exercises 3–4, substitute numbers for the ×’s so that A is Hermitian.





1

i

3. A = ⎣×

−3 ×



×



2 − 3i ⎥ 1 ⎦ 2

4. A = ⎣×

0 −4

×

×



2



5. (a) A = ⎣ −i 2 − 3i



−3 ×

×



× i i

(b) A = ⎣ 0 3 − 5i



6. (a) A = ⎣ 1 + i 6 − 2i



×

1

⎢ (b) A = ⎣ × 3 − 5i

0

⎢ ⎢ ⎣

3

6



7. A =

2 − 3i −1

3

2 + 3i

4 i 5

− 45

3 i 5

9. A = ⎣

⎡ 11. A = ⎣

1 √ 2 2



1 √

2 2



√1

3

12. A =

1+i 3

(−1 + i)

1 √

0 −2 i

2i 2

2 2

√1

6

√1

√1 2

1 (1 2



+ i)



4

i−

√ ⎦

1+i





2 − 3i ⎥ 1 ⎦ 4i

×



0

20. A = ⎣×

×

2 − 3i

i

0

0 0

3 − 5i

×

0

−i

⎤ ⎥ ⎦

⎥ × ⎦ ×

0

× × 2i i

1



3 − 5i



0

−1 − i

0

−i

× −4 − 7i

×

0



⎥ −i ⎦ 3i

×

i ⎢ 22. (a) A = ⎣ × 2 + 3i ⎡



2 − 3i ⎥ 1+i ⎦

× ⎤ 4 + 7i ⎥ × ⎦ 1

In Exercises 23–24, verify that the eigenvalues of the skew-Hermitian matrix A are pure imaginary numbers.



0

1+i

−1 + i i

24. A =

0 3i

3

In Exercises 25–26, show that A is normal.





1 + 2i ⎢ 25. A = ⎣ 2 + i −2 − i

6

1−i 5

0

23. A =

In Exercises 13–18, find a unitary matrix P that diagonalizes the Hermitian matrix A, and determine P −1AP . 13. A =

19. A = ⎣×



√1 2

√2

3

i



√ ⎤

(1 − i)

0

(b) A = ⎣

1−i 3





× −3 + 5i

− 21 (1 + i)

√ 



2



10. A = ⎣



0

(b) A = ⎣



1 √ 2 2



√1









0 ⎥ ⎥





3+i

− √12 i

i

In Exercises 21–22, show that A is not skew-Hermitian for any choice of the ×’s.



√

0

21. (a) A = ⎣ −i 2 + 3i

8. A =

3−i

⎥ −1 + i ⎦

2



3+i −3

0



0

1

×

In Exercises 9–12, show that A is unitary, and find A . 3 5

2

16. A =

In Exercises 19–20, substitute numbers for the ×’s so that A is skew-Hermitian.



−1



√1

2

i 2



In Exercises 7–8, verify that the eigenvalues of the Hermitian matrix A are real and that eigenvectors from different eigenspaces are orthogonal (see Theorem 7.5.2).



0 −1 −1 − i

18. A = ⎢− √2 i

3 + 5i ⎥ 1−i ⎦ 2+i

×

5 ⎢ 17. A = ⎣0 0

⎥ −i ⎦

⎤ × ⎥ ×⎦

×





⎥ × ⎦ × ⎤ 3 + 5i ⎥ −i ⎦ ×

1+i 7

1



2 − 3i

i

1

2 − 2i



2 + 2i 4

6



3 + 5i

In Exercises 5–6, show that A is not Hermitian for any choice of the ×’s.



15. A =



14. A =

3

−i

i

3



2 + 2i



26. A = ⎣

i 1−i

−i

⎤ −2 − i ⎥ −i ⎦ 1+i

i −2 i 1 − 3i

1−i ⎥ 1 − 3i ⎦ −3 + 8i

2+i 1+i



3i 0


27. Let A be any n × n matrix with complex entries, and define the matrices B and C to be

\[ B = \tfrac12 (A + A^*) \quad\text{and}\quad C = \tfrac{1}{2i}(A - A^*) \]

(a) Show that B and C are Hermitian.
(b) Show that A = B + iC and A∗ = B − iC.
(c) What condition must B and C satisfy for A to be normal?

38. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary number.

39. Prove that if A is a unitary matrix, then so is A∗.

40. Prove that the eigenvalues of a skew-Hermitian matrix are either zero or pure imaginary.

41. Prove that the eigenvalues of a unitary matrix have modulus 1.

28. Show that if A is an n × n matrix with complex entries, and if u and v are vectors in C n that are expressed in column form, then Au · v = u · A∗ v and u · Av = A∗ u · v

42. Prove that if u is a nonzero vector in C n that is expressed in column form, then P = u u∗ is Hermitian.

29. Show that

\[ A = \frac{1}{\sqrt{2}} \begin{bmatrix} e^{i\theta} & e^{-i\theta} \\ ie^{i\theta} & -ie^{-i\theta} \end{bmatrix} \]

is unitary for all real values of θ. [Note: See Formula (17) in Appendix B for the definition of e^{iθ}.]



30. Show that

\[ A = \begin{bmatrix} \alpha + i\gamma & -\beta + i\delta \\ \beta + i\delta & \alpha - i\gamma \end{bmatrix} \]

is unitary if α² + β² + γ² + δ² = 1.

31. Let A be the unitary matrix in Exercise 9, and verify that the conclusions in parts (b) and (c) of Theorem 7.5.3 hold for the vectors x = (1 + i, 2 − i) and y = (1, 1 − i).

32. Let TA : C² → C² be multiplication by the Hermitian matrix A in Exercise 14, and find two orthogonal unit vectors u1 and u2 for which TA(u1) and TA(u2) are orthogonal.

33. Under what conditions is the following matrix normal?

\[ A = \begin{bmatrix} a & 0 & 0 \\ 0 & 0 & c \\ 0 & b & 0 \end{bmatrix} \]

34. What relationship must exist between a matrix and its inverse if it is both Hermitian and unitary? 35. Find a 2 × 2 matrix that is both Hermitian and unitary and whose entries are not all real numbers.

43. Prove that if u is a unit vector in C^n that is expressed in column form, then H = I − 2uu∗ is Hermitian and unitary.

44. Prove that if A is an invertible matrix, then A∗ is invertible, and (A∗)−1 = (A−1)∗.

45. (a) Prove that \det(\overline{A}) = \overline{\det(A)}.
(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant to prove that \det(A^*) = \overline{\det(A)}.

46. Use part (b) of Exercise 45 to prove:
(a) If A is Hermitian, then det(A) is real.
(b) If A is unitary, then |det(A)| = 1.

47. Prove that an n × n matrix A with complex entries is unitary if and only if the columns of A form an orthonormal set in C^n.

48. Prove that the eigenvalues of a Hermitian matrix are real.

Working with Proofs

36. Use properties of the transpose and complex conjugate to prove parts (b) and (d) of Theorem 7.5.1.

37. Use properties of the transpose and complex conjugate to prove parts (a) and (e) of Theorem 7.5.1.

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) The matrix \begin{bmatrix} 0 & i \\ i & 2 \end{bmatrix} is Hermitian.

(b) The matrix \begin{bmatrix} -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{6}} & \frac{i}{\sqrt{3}} \\ 0 & -\frac{i}{\sqrt{6}} & \frac{i}{\sqrt{3}} \\ \frac{i}{\sqrt{2}} & \frac{i}{\sqrt{6}} & \frac{i}{\sqrt{3}} \end{bmatrix} is unitary.

(c) The conjugate transpose of a unitary matrix is unitary.

(d) Every unitarily diagonalizable matrix is Hermitian.

(e) A positive integer power of a skew-Hermitian matrix is skew-Hermitian.

Chapter 7 Supplementary Exercises

1. Verify that each matrix is orthogonal, and find its inverse.

\[ \text{(a)}\ \begin{bmatrix} \frac35 & -\frac45 \\ \frac45 & \frac35 \end{bmatrix} \qquad \text{(b)}\ \begin{bmatrix} \frac45 & 0 & -\frac35 \\ -\frac{9}{25} & \frac45 & -\frac{12}{25} \\ \frac{12}{25} & \frac35 & \frac{16}{25} \end{bmatrix} \]

2. Prove: If Q is an orthogonal matrix, then each entry of Q is the same as its cofactor if det(Q) = 1 and is the negative of its cofactor if det(Q) = −1.

3. Prove that if A is a positive definite symmetric matrix, and if u and v are vectors in R^n in column form, then

⟨u, v⟩ = uᵀAv

is an inner product on R^n.

4. Find the characteristic polynomial and the dimensions of the eigenspaces of the symmetric matrix

\[ \begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix} \]

5. Find a matrix P that orthogonally diagonalizes

\[ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]

and determine the diagonal matrix D = PᵀAP.

6. Express each quadratic form in the matrix notation xᵀAx.
(a) −4x1² + 16x2² − 15x1x2
(b) 9x1² − x2² + 4x3² + 6x1x2 − 8x1x3 + x2x3

7. Classify the quadratic form

x1² − 3x1x2 + 4x2²

as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

8. Find an orthogonal change of variable that eliminates the cross product terms in each quadratic form, and express the quadratic form in terms of the new variables.
(a) −3x1² + 5x2² + 2x1x2
(b) −5x1² + x2² − x3² + 6x1x3 + 4x1x2

9. Identify the type of conic section represented by each equation.
(a) y − x² = 0
(b) 3x − 11y² = 0

10. Find a unitary matrix U that diagonalizes

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

and determine the diagonal matrix D = U−1AU.

11. Show that if U is an n × n unitary matrix and

|z1| = |z2| = · · · = |zn| = 1

then the product

\[ U \begin{bmatrix} z_1 & 0 & \cdots & 0 \\ 0 & z_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & z_n \end{bmatrix} \]

is also unitary.

12. Suppose that A is skew-Hermitian.
(a) Show that iA is Hermitian.
(b) Show that A is unitarily diagonalizable and has pure imaginary eigenvalues.

13. Find a, b, and c for which the matrix

\[ \begin{bmatrix} a & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ b & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ c & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]

is orthogonal. Are the values of a, b, and c unique? Explain.

14. In each part, suppose that A is a 4 × 4 matrix in which det(Mj) is the determinant of the jth principal submatrix of A. Determine whether A is positive definite, negative definite, or indefinite.
(a) det(M1) < 0, det(M2) > 0, det(M3) < 0, det(M4) > 0
(b) det(M1) > 0, det(M2) > 0, det(M3) > 0, det(M4) > 0
(c) det(M1) < 0, det(M2) < 0, det(M3) < 0, det(M4) < 0
(d) det(M1) > 0, det(M2) < 0, det(M3) > 0, det(M4) < 0
(e) det(M1) = 0, det(M2) < 0, det(M3) = 0, det(M4) > 0
(f) det(M1) = 0, det(M2) > 0, det(M3) = 0, det(M4) = 0

CHAPTER 8

General Linear Transformations

CHAPTER CONTENTS
8.1 General Linear Transformations
8.2 Compositions and Inverse Transformations
8.3 Isomorphism
8.4 Matrices for General Linear Transformations
8.5 Similarity

INTRODUCTION

In earlier sections we studied linear transformations from R n to R m . In this chapter we will define and study linear transformations from a general vector space V to a general vector space W . The results we will obtain here have important applications in physics, engineering, and various branches of mathematics.

8.1 General Linear Transformations

Up to now our study of linear transformations has focused on transformations from R^n to R^m. In this section we will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which such transformations arise, and we will establish a fundamental relationship between general n-dimensional vector spaces and R^n.

Definitions and Terminology

In Section 1.8 we defined a matrix transformation TA : R^n → R^m to be a mapping of the form

TA(x) = Ax

in which A is an m × n matrix. We subsequently established in Theorem 1.8.3 that the matrix transformations are precisely the linear transformations from R^n to R^m, that is, the transformations with the linearity properties

T(u + v) = T(u) + T(v) and T(ku) = kT(u)

We will use these two properties as the starting point for defining more general linear transformations.

DEFINITION 1 If T : V → W is a mapping from a vector space V to a vector space W, then T is called a linear transformation from V to W if the following two properties hold for all vectors u and v in V and for all scalars k:
(i) T(ku) = kT(u) [Homogeneity property]
(ii) T(u + v) = T(u) + T(v) [Additivity property]
In the special case where V = W, the linear transformation T is called a linear operator on the vector space V.

The homogeneity and additivity properties of a linear transformation T : V → W can be used in combination to show that if v1 and v2 are vectors in V and k1 and k2 are any scalars, then

T(k1v1 + k2v2) = k1T(v1) + k2T(v2)

More generally, if v1, v2, . . . , vr are vectors in V and k1, k2, . . . , kr are any scalars, then

T(k1v1 + k2v2 + · · · + krvr) = k1T(v1) + k2T(v2) + · · · + krT(vr) (1)

The following theorem is an analog of parts (a) and (d) of Theorem 1.8.2.

THEOREM 8.1.1 If T : V → W is a linear transformation, then:
(a) T(0) = 0.
(b) T(u − v) = T(u) − T(v) for all u and v in V.

Proof Let u be any vector in V. Since 0u = 0, it follows from the homogeneity property in Definition 1 that

T(0) = T(0u) = 0T(u) = 0

which proves (a). We can prove part (b) by rewriting T(u − v) as

T(u − v) = T(u + (−1)v) = T(u) + (−1)T(v) = T(u) − T(v)

We leave it for you to justify each step.

Use the two parts of Theorem 8.1.1 to prove that T(−v) = −T(v) for all v in V.

EXAMPLE 1 Matrix Transformations

Because we have based the definition of a general linear transformation on the homogeneity and additivity properties of matrix transformations, it follows that every matrix transformation TA : R^n → R^m is also a linear transformation in this more general sense with V = R^n and W = R^m.

EXAMPLE 2 The Zero Transformation

Let V and W be any two vector spaces. The mapping T : V → W such that T(v) = 0 for every v in V is a linear transformation called the zero transformation. To see that T is linear, observe that

T(u + v) = 0, T(u) = 0, T(v) = 0, and T(ku) = 0

Therefore,

T(u + v) = T(u) + T(v) and T(ku) = kT(u)

EXAMPLE 3 The Identity Operator

Let V be any vector space. The mapping I : V → V defined by I(v) = v is called the identity operator on V. We will leave it for you to verify that I is linear.


EXAMPLE 4 Dilation and Contraction Operators

If V is a vector space and k is any scalar, then the mapping T : V → V given by T(x) = kx is a linear operator on V, for if c is any scalar and if u and v are any vectors in V, then

T(cu) = k(cu) = c(ku) = cT(u)
T(u + v) = k(u + v) = ku + kv = T(u) + T(v)

If 0 < k < 1, then T is called the contraction of V with factor k, and if k > 1, it is called the dilation of V with factor k.

EXAMPLE 5 A Linear Transformation from Pn to Pn+1

Let p = p(x) = c0 + c1x + · · · + cnx^n be a polynomial in Pn, and define the transformation T : Pn → Pn+1 by

T(p) = T(p(x)) = xp(x) = c0x + c1x² + · · · + cnx^(n+1)

This transformation is linear because for any scalar k and any polynomials p1 and p2 in Pn we have

T(kp) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(p)

and

T(p1 + p2) = T(p1(x) + p2(x)) = x(p1(x) + p2(x)) = xp1(x) + xp2(x) = T(p1) + T(p2)
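The transformation of Example 5 is easy to model on coefficient lists. In the Python sketch below, a polynomial c0 + c1x + · · · + cnx^n is represented by the list `[c0, c1, ..., cn]`; this representation, the helper names, and the sample polynomials are assumptions made for illustration. Multiplying by x shifts every coefficient up one degree.

```python
def T(p):
    """Coefficients of x*p(x): every coefficient moves up one degree."""
    return [0] + list(p)

def add(p, q):
    """Sum of two polynomials of the same degree bound."""
    return [a + b for a, b in zip(p, q)]

def scale(k, p):
    """Scalar multiple of a polynomial."""
    return [k * a for a in p]

p1 = [1, 2, 3]     # 1 + 2x + 3x^2
p2 = [4, 0, -1]    # 4 - x^2
k = 5

homogeneous = T(scale(k, p1)) == scale(k, T(p1))     # T(kp) = kT(p)
additive = T(add(p1, p2)) == add(T(p1), T(p2))       # T(p1 + p2) = T(p1) + T(p2)
```

Here `T(p1)` is `[0, 1, 2, 3]`, that is, x + 2x² + 3x³.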

EXAMPLE 6 A Linear Transformation Using the Dot Product

Let v0 be any fixed vector in R^n, and let T : R^n → R be the transformation

T(x) = x · v0

that maps a vector x to its dot product with v0. This transformation is linear, for if k is any scalar, and if u and v are any vectors in R^n, then it follows from properties of the dot product in Theorem 3.2.2 that

T(ku) = (ku) · v0 = k(u · v0) = kT(u)
T(u + v) = (u + v) · v0 = (u · v0) + (v · v0) = T(u) + T(v)

EXAMPLE 7 Transformations on Matrix Spaces

Let Mnn be the vector space of n × n matrices. In each part determine whether the transformation is linear.
(a) T1(A) = Aᵀ
(b) T2(A) = det(A)

Solution (a) It follows from parts (b) and (d) of Theorem 1.4.8 that

T1(kA) = (kA)ᵀ = kAᵀ = kT1(A)
T1(A + B) = (A + B)ᵀ = Aᵀ + Bᵀ = T1(A) + T1(B)

so T1 is linear.

Solution (b) It follows from Formula (1) of Section 2.3 that

T2(kA) = det(kA) = k^n det(A) = k^n T2(A)

Thus, T2 is not homogeneous and hence not linear if n > 1. Note that additivity also fails because we showed in Example 1 of Section 2.3 that det(A + B) and det(A) + det(B) are not generally equal.
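A concrete 2 × 2 instance makes the contrast in Example 7 visible. The sketch below is illustrative Python; the matrices `A` and `B`, the scalar `k`, and the helper names are hypothetical choices, and `det2` handles only the 2 × 2 case.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def scale(k, M):
    return [[k * entry for entry in row] for row in M]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
k = 3

# The transpose respects both linearity properties.
t_homogeneous = transpose(scale(k, A)) == scale(k, transpose(A))
t_additive = transpose(add(A, B)) == add(transpose(A), transpose(B))

# The determinant scales by k^n (here n = 2), not by k.
det_scaled = det2(scale(k, A))     # equals k**2 * det2(A)
det_sum = det2(add(A, B))          # generally differs from det2(A) + det2(B)
```

With these values det(A) = −2, so det(3A) = −18 rather than −6, and det(A + B) = −8 while det(A) + det(B) = −3.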


EXAMPLE 8 Translation Is Not Linear

Part (a) of Theorem 8.1.1 states that a linear transformation maps 0 to 0. This property is useful for identifying transformations that are not linear. For example, if x0 is a fixed nonzero vector in R², then the transformation

T(x) = x + x0

has the geometric effect of translating each point x in a direction parallel to x0 through a distance of ‖x0‖ (Figure 8.1.1). This cannot be a linear transformation since T(0) = x0, so T does not map 0 to 0.

Figure 8.1.1 T(x) = x + x0 translates each point x along a line parallel to x0 through a distance ‖x0‖.

EXAMPLE 9 The Evaluation Transformation

Let V be a subspace of F(−∞, ∞), let

x1, x2, . . . , xn

be a sequence of distinct real numbers, and let T : V → R^n be the transformation

T(f) = (f(x1), f(x2), . . . , f(xn)) (2)

that associates with f the n-tuple of function values at x1, x2, . . . , xn. We call this the evaluation transformation on V at x1, x2, . . . , xn. Thus, for example, if

x1 = −1, x2 = 2, x3 = 4

and if f(x) = x² − 1, then

T(f) = (f(x1), f(x2), f(x3)) = (0, 3, 15)

The evaluation transformation in (2) is linear, for if k is any scalar, and if f and g are any functions in V, then

T(kf) = ((kf)(x1), (kf)(x2), . . . , (kf)(xn))
= (kf(x1), kf(x2), . . . , kf(xn))
= k(f(x1), f(x2), . . . , f(xn)) = kT(f)

and

T(f + g) = ((f + g)(x1), (f + g)(x2), . . . , (f + g)(xn))
= (f(x1) + g(x1), f(x2) + g(x2), . . . , f(xn) + g(xn))
= (f(x1), f(x2), . . . , f(xn)) + (g(x1), g(x2), . . . , g(xn)) = T(f) + T(g)

Finding Linear Transformations from Images of Basis Vectors

We saw in Formula (15) of Section 1.8 that if TA : R^n → R^m is multiplication by A, and if e1, e2, . . . , en are the standard basis vectors for R^n, then A can be expressed as

A = [T(e1) | T(e2) | · · · | T(en)]

It follows from this that the image of any vector v = (c1, c2, . . . , cn) in R^n under multiplication by A can be expressed as

TA(v) = c1TA(e1) + c2TA(e2) + · · · + cnTA(en)

This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination of the images of the standard basis vectors. This is a special case of the following more general result.


THEOREM 8.1.2 Let T : V → W be a linear transformation, where V is finite-dimensional. If S = {v1, v2, . . . , vn} is a basis for V, then the image of any vector v in V can be expressed as

T(v) = c1T(v1) + c2T(v2) + · · · + cnT(vn) (3)

where c1, c2, . . . , cn are the coefficients required to express v as a linear combination of the vectors in the basis S.

Proof Express v as v = c1v1 + c2v2 + · · · + cnvn and use the linearity of T.

EXAMPLE 10 Computing with Images of Basis Vectors

Consider the basis S = {v1, v2, v3} for R³, where

v1 = (1, 1, 1), v2 = (1, 1, 0), v3 = (1, 0, 0)

Let T : R³ → R² be the linear transformation for which

T(v1) = (1, 0), T(v2) = (2, −1), T(v3) = (4, 3)

Find a formula for T(x1, x2, x3), and then use that formula to compute T(2, −3, 5).

Solution We first need to express x = (x1, x2, x3) as a linear combination of v1, v2, and v3. If we write

(x1, x2, x3) = c1(1, 1, 1) + c2(1, 1, 0) + c3(1, 0, 0)

then on equating corresponding components, we obtain

c1 + c2 + c3 = x1
c1 + c2 = x2
c1 = x3

which yields c1 = x3, c2 = x2 − x3, c3 = x1 − x2, so

(x1, x2, x3) = x3(1, 1, 1) + (x2 − x3)(1, 1, 0) + (x1 − x2)(1, 0, 0) = x3v1 + (x2 − x3)v2 + (x1 − x2)v3

Thus

T(x1, x2, x3) = x3T(v1) + (x2 − x3)T(v2) + (x1 − x2)T(v3)
= x3(1, 0) + (x2 − x3)(2, −1) + (x1 − x2)(4, 3)
= (4x1 − 2x2 − x3, 3x1 − 4x2 + x3)

From this formula we obtain

T(2, −3, 5) = (9, 23)
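The computation in Example 10 follows Formula (3) step by step: find the coordinates of x relative to S, then combine the known images with those coordinates. A minimal Python sketch of that recipe follows; the function names `coords` and `T` and the constant `IMAGES` are illustrative, and `coords` hard-codes the solution c1 = x3, c2 = x2 − x3, c3 = x1 − x2 derived in the example rather than solving a general system.

```python
# Images of the basis vectors v1, v2, v3 under T, as given in Example 10.
IMAGES = [(1, 0), (2, -1), (4, 3)]

def coords(x):
    """Coordinates (c1, c2, c3) of x relative to S = {(1,1,1), (1,1,0), (1,0,0)}."""
    x1, x2, x3 = x
    return (x3, x2 - x3, x1 - x2)

def T(x):
    """T(x) = c1*T(v1) + c2*T(v2) + c3*T(v3), as in Formula (3)."""
    c = coords(x)
    return tuple(sum(ci * image[j] for ci, image in zip(c, IMAGES))
                 for j in range(2))

result = T((2, -3, 5))
```

Evaluating `T` at the basis vectors themselves returns the prescribed images, which is a quick consistency check on the coordinate formulas.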

CALCULUS REQUIRED

EXAMPLE 11 A Linear Transformation from C¹(−∞, ∞) to F(−∞, ∞)

Let V = C¹(−∞, ∞) be the vector space of functions with continuous first derivatives on (−∞, ∞), and let W = F(−∞, ∞) be the vector space of all real-valued functions defined on (−∞, ∞). Let D : V → W be the transformation that maps a function f = f(x) into its derivative, that is,

D(f) = f′(x)

From the properties of differentiation, we have

D(f + g) = D(f) + D(g) and D(kf) = kD(f)

Thus, D is a linear transformation.


CALCULUS REQUIRED

EXAMPLE 12 An Integral Transformation

Let V = C(−∞, ∞) be the vector space of continuous functions on the interval (−∞, ∞), let W = C¹(−∞, ∞) be the vector space of functions with continuous first derivatives on (−∞, ∞), and let J : V → W be the transformation that maps a function f in V into

\[ J(f) = \int_0^x f(t)\,dt \]

For example, if f(x) = x², then

\[ J(f) = \int_0^x t^2\,dt = \left.\frac{t^3}{3}\right]_0^x = \frac{x^3}{3} \]

The transformation J : V → W is linear, for if k is any constant, and if f and g are any functions in V, then properties of the integral imply that

\[ J(kf) = \int_0^x kf(t)\,dt = k\int_0^x f(t)\,dt = kJ(f) \]

\[ J(f+g) = \int_0^x \bigl(f(t) + g(t)\bigr)\,dt = \int_0^x f(t)\,dt + \int_0^x g(t)\,dt = J(f) + J(g) \]

Kernel and Range

Recall that if A is an m × n matrix, then the null space of A consists of all vectors x in R n such that Ax = 0, and by Theorem 4.7.1 the column space of A consists of all vectors b in R m for which there is at least one vector x in R n such that Ax = b. From the viewpoint of matrix transformations, the null space of A consists of all vectors in R n that multiplication by A maps into 0, and the column space of A consists of all vectors in R m that are images of at least one vector in R n under multiplication by A. The following definition extends these ideas to general linear transformations.

DEFINITION 2 If T : V → W is a linear transformation, then the set of vectors in V that T maps into 0 is called the kernel of T and is denoted by ker(T). The set of all vectors in W that are images under T of at least one vector in V is called the range of T and is denoted by R(T).

EXAMPLE 13 Kernel and Range of a Matrix Transformation

If TA : R^n → R^m is multiplication by the m × n matrix A, then, as discussed above, the kernel of TA is the null space of A, and the range of TA is the column space of A.

EXAMPLE 14 Kernel and Range of the Zero Transformation

Let T : V → W be the zero transformation. Since T maps every vector in V into 0, it follows that ker(T) = V. Moreover, since 0 is the only image under T of vectors in V, it follows that R(T) = {0}.

EXAMPLE 15 Kernel and Range of the Identity Operator

Let I : V → V be the identity operator. Since I(v) = v for all vectors in V, every vector in V is the image of some vector (namely, itself); thus R(I) = V. Since the only vector that I maps into 0 is 0, it follows that ker(I) = {0}.

EXAMPLE 16 Kernel and Range of an Orthogonal Projection

Let T : R³ → R³ be the orthogonal projection onto the xy-plane. As illustrated in Figure 8.1.2a, the points that T maps into 0 = (0, 0, 0) are precisely those on the z-axis, so ker(T) is the set of points of the form (0, 0, z). As illustrated in Figure 8.1.2b, T maps the points in R³ to the xy-plane, where each point in that plane is the image of each point on the vertical line above it. Thus, R(T) is the set of points of the form (x, y, 0).

Figure 8.1.2 (a) ker(T) is the z-axis. (b) R(T) is the entire xy-plane.

EXAMPLE 17 Kernel and Range of a Rotation

Let T : R² → R² be the linear operator that rotates each vector in the xy-plane through the angle θ (Figure 8.1.3). Since every vector in the xy-plane can be obtained by rotating some vector through the angle θ, it follows that R(T) = R². Moreover, the only vector that rotates into 0 is 0, so ker(T) = {0}.

Figure 8.1.3 [The vector v and its image T(v), rotated through the angle θ.]

CALCULUS REQUIRED

EXAMPLE 18 Kernel of a Differentiation Transformation

Let V = C¹(−∞, ∞) be the vector space of functions with continuous first derivatives on (−∞, ∞), let W = F(−∞, ∞) be the vector space of all real-valued functions defined on (−∞, ∞), and let D : V → W be the differentiation transformation D(f) = f′(x). The kernel of D is the set of functions in V with derivative zero. From calculus, this is the set of constant functions on (−∞, ∞).

Properties of Kernel and Range

In all of the preceding examples, ker(T ) and R(T ) turned out to be subspaces. In Examples 14, 15, and 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line through the origin, and the range was a plane through the origin, both of which are subspaces of R 3 . All of this is a consequence of the following general theorem. THEOREM 8.1.3 If T : V

→W is a linear transformation, then:

(a) The kernel of T is a subspace of V . (b) The range of T is a subspace of W .

Proof (a) To show that ker(T ) is a subspace, we must show that it contains at least one vector and is closed under addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector 0 is in ker(T ), so the kernel contains at least one vector. Let v1 and v2 be vectors in ker(T ), and let k be any scalar. Then

T(v1 + v2 ) = T(v1 ) + T(v2 ) = 0 + 0 = 0 so v1 + v2 is in ker(T ). Also,

T(k v1 ) = kT(v1 ) = k 0 = 0 so k v1 is in ker(T ).

Proof (b) To show that $R(T)$ is a subspace of $W$, we must show that it contains at least one vector and is closed under addition and scalar multiplication. It contains at least the zero vector of $W$, since $T(\mathbf{0}) = \mathbf{0}$ by part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if $\mathbf{w}_1$ and $\mathbf{w}_2$ are vectors in $R(T)$, and if $k$ is any scalar, then there exist vectors $\mathbf{a}$ and $\mathbf{b}$ in $V$ for which

$$T(\mathbf{a}) = \mathbf{w}_1 + \mathbf{w}_2 \quad \text{and} \quad T(\mathbf{b}) = k\mathbf{w}_1 \tag{4}$$

But the fact that $\mathbf{w}_1$ and $\mathbf{w}_2$ are in $R(T)$ tells us there exist vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ such that

$$T(\mathbf{v}_1) = \mathbf{w}_1 \quad \text{and} \quad T(\mathbf{v}_2) = \mathbf{w}_2$$

The following computations complete the proof by showing that the vectors $\mathbf{a} = \mathbf{v}_1 + \mathbf{v}_2$ and $\mathbf{b} = k\mathbf{v}_1$ satisfy the equations in (4):

$$T(\mathbf{a}) = T(\mathbf{v}_1 + \mathbf{v}_2) = T(\mathbf{v}_1) + T(\mathbf{v}_2) = \mathbf{w}_1 + \mathbf{w}_2$$

$$T(\mathbf{b}) = T(k\mathbf{v}_1) = kT(\mathbf{v}_1) = k\mathbf{w}_1$$
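The closure computations in the proof of Theorem 8.1.3 can be spot-checked numerically for a matrix transformation. The following is only a minimal sketch: the matrix `A`, the vectors `v1`, `v2`, `u1`, `u2`, and the helper names `apply`, `add`, and `scale` are hypothetical choices made for illustration and do not come from the text.

```python
def apply(A, v):
    """Return the matrix-vector product A v (the image of v under T_A)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def add(u, v):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def scale(k, v):
    """Scalar multiple k v."""
    return [k * a for a in v]

# Hypothetical matrix, chosen only for illustration (not from the text).
A = [[1, 2, -1],
     [2, 4, -2]]

# Two vectors in ker(T_A): each is sent to the zero vector of R^2.
v1, v2 = [2, -1, 0], [1, 0, 1]
k = 5
kernel_closed = (apply(A, add(v1, v2)) == [0, 0]
                 and apply(A, scale(k, v1)) == [0, 0])

# Two vectors in R(T_A), obtained as the images of u1 and u2.
u1, u2 = [1, 1, 1], [0, 3, -2]
w1, w2 = apply(A, u1), apply(A, u2)
range_closed = (apply(A, add(u1, u2)) == add(w1, w2)        # a = u1 + u2
                and apply(A, scale(k, u1)) == scale(k, w1))  # b = k u1

print(kernel_closed, range_closed)
```

A check of this kind only tests particular vectors; the proof above is what establishes closure for all vectors.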

CALCULUS REQUIRED

EXAMPLE 19 Application to Differential Equations

Differential equations of the form

$$y'' + \omega^2 y = 0 \quad (\omega \text{ a positive constant}) \tag{5}$$

arise in the study of vibrations. The set of all solutions of this equation on the interval $(-\infty, \infty)$ is the kernel of the linear transformation $D: C^2(-\infty, \infty) \to C(-\infty, \infty)$, given by

$$D(y) = y'' + \omega^2 y$$

It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace of $C^2(-\infty, \infty)$, so that if we can find two linearly independent solutions of (5), then all other solutions can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating that

$$y_1 = \cos \omega x \quad \text{and} \quad y_2 = \sin \omega x$$

are solutions of (5). These functions are linearly independent since neither is a scalar multiple of the other, and thus

$$y = c_1 \cos \omega x + c_2 \sin \omega x \tag{6}$$

is a "general solution" of (5) in the sense that every choice of $c_1$ and $c_2$ produces a solution, and every solution is of this form.

Rank and Nullity of Linear Transformations

In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an $m \times n$ matrix, and in Theorem 4.8.2, which we called the Dimension Theorem for Matrices, we proved that the sum of the rank and nullity is $n$. We will show next that this result is a special case of a more general result about linear transformations. We start with the following definition.

DEFINITION 3 Let $T: V \to W$ be a linear transformation. If the range of $T$ is finite-dimensional, then its dimension is called the rank of $T$; and if the kernel of $T$ is finite-dimensional, then its dimension is called the nullity of $T$. The rank of $T$ is denoted by rank($T$) and the nullity of $T$ by nullity($T$).

The following theorem, whose proof is optional, generalizes Theorem 4.8.2.


THEOREM 8.1.4 Dimension Theorem for Linear Transformations

If $T: V \to W$ is a linear transformation from a finite-dimensional vector space $V$ to a vector space $W$, then the range of $T$ is finite-dimensional, and

$$\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim(V) \tag{7}$$

In the special case where $A$ is an $m \times n$ matrix and $T_A: R^n \to R^m$ is multiplication by $A$, the kernel of $T_A$ is the null space of $A$, and the range of $T_A$ is the column space of $A$. Thus, it follows from Theorem 8.1.4 that

$$\operatorname{rank}(T_A) + \operatorname{nullity}(T_A) = n$$

OPTIONAL

Proof of Theorem 8.1.4 Assume that $V$ is $n$-dimensional. We must show that

$$\dim(R(T)) + \dim(\ker(T)) = n$$

We will give the proof for the case where $1 \le \dim(\ker(T)) < n$. The cases where $\dim(\ker(T)) = 0$ and $\dim(\ker(T)) = n$ are left as exercises. Assume $\dim(\ker(T)) = r$, and let $\mathbf{v}_1, \ldots, \mathbf{v}_r$ be a basis for the kernel. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$ is linearly independent, Theorem 4.5.5(b) states that there are $n - r$ vectors, $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$, such that the extended set $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$. To complete the proof, we will show that the $n - r$ vectors in the set

$$S = \{T(\mathbf{v}_{r+1}), \ldots, T(\mathbf{v}_n)\}$$

form a basis for the range of $T$. It will then follow that

$$\dim(R(T)) + \dim(\ker(T)) = (n - r) + r = n$$

First we show that $S$ spans the range of $T$. If $\mathbf{b}$ is any vector in the range of $T$, then $\mathbf{b} = T(\mathbf{v})$ for some vector $\mathbf{v}$ in $V$. Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is a basis for $V$, the vector $\mathbf{v}$ can be written in the form

$$\mathbf{v} = c_1\mathbf{v}_1 + \cdots + c_r\mathbf{v}_r + c_{r+1}\mathbf{v}_{r+1} + \cdots + c_n\mathbf{v}_n$$

Since $\mathbf{v}_1, \ldots, \mathbf{v}_r$ lie in the kernel of $T$, we have $T(\mathbf{v}_1) = \cdots = T(\mathbf{v}_r) = \mathbf{0}$, so

$$\mathbf{b} = T(\mathbf{v}) = c_{r+1}T(\mathbf{v}_{r+1}) + \cdots + c_nT(\mathbf{v}_n)$$

Thus $S$ spans the range of $T$.

Finally, we show that $S$ is a linearly independent set and consequently forms a basis for the range of $T$. Suppose that some linear combination of the vectors in $S$ is zero; that is,

$$k_{r+1}T(\mathbf{v}_{r+1}) + \cdots + k_nT(\mathbf{v}_n) = \mathbf{0} \tag{8}$$

We must show that $k_{r+1} = \cdots = k_n = 0$. Since $T$ is linear, (8) can be rewritten as

$$T(k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n) = \mathbf{0}$$

which says that $k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n$ is in the kernel of $T$. This vector can therefore be written as a linear combination of the basis vectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_r\}$, say

$$k_{r+1}\mathbf{v}_{r+1} + \cdots + k_n\mathbf{v}_n = k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r$$

Thus,

$$k_1\mathbf{v}_1 + \cdots + k_r\mathbf{v}_r - k_{r+1}\mathbf{v}_{r+1} - \cdots - k_n\mathbf{v}_n = \mathbf{0}$$

Since $\{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ is linearly independent, all of the $k$'s are zero; in particular, $k_{r+1} = \cdots = k_n = 0$, which completes the proof.
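For a matrix transformation $T_A$, the Dimension Theorem can be illustrated by row reduction: the pivot columns give the rank, and each free column yields one basis vector of the kernel. The sketch below uses exact rational arithmetic. The function names `rref` and `kernel_basis` and the matrix `A` are assumptions introduced for illustration, not notation from the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # make the pivot equal to 1
        for i in range(n_rows):
            if i != r and m[i][c] != 0:             # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return m, pivots

def kernel_basis(rows):
    """Return a basis for the null space of the matrix: one vector per free column."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]                          # solve for the pivot variables
        basis.append(v)
    return basis

# Hypothetical 3 x 4 matrix (third row = first row + second row).
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]

rank = len(rref(A)[1])
nullity = len(kernel_basis(A))
print(rank, nullity, rank + nullity == len(A[0]))
```

Here the domain is $R^4$, so the theorem predicts that the two counts add up to 4.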


Exercise Set 8.1

In Exercises 1–2, suppose that $T$ is a mapping whose domain is the vector space $M_{22}$. In each part, determine whether $T$ is a linear transformation, and if so, find its kernel.

1. (a) $T(A) = A^2$
   (b) $T(A) = \operatorname{tr}(A)$
   (c) $T(A) = A + A^T$

2. (a) $T(A) = (A)_{11}$
   (b) $T(A) = 0_{2 \times 2}$
   (c) $T(A) = cA$

In Exercises 3–9, determine whether the mapping $T$ is a linear transformation, and if so, find its kernel.

3. $T: R^3 \to R$, where $T(\mathbf{u}) = \|\mathbf{u}\|$.

4. $T: R^3 \to R^3$, where $\mathbf{v}_0$ is a fixed vector in $R^3$ and $T(\mathbf{u}) = \mathbf{u} \times \mathbf{v}_0$.

5. $T: M_{22} \to M_{23}$, where $B$ is a fixed $2 \times 3$ matrix and $T(A) = AB$.

6. $T: M_{22} \to R$, where
   (a) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = 3a - 4b + c - d$
   (b) $T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = a^2 + b^2$

7. $T: P_2 \to P_2$, where
   (a) $T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x + 1) + a_2(x + 1)^2$
   (b) $T(a_0 + a_1x + a_2x^2) = (a_0 + 1) + (a_1 + 1)x + (a_2 + 1)x^2$

8. $T: F(-\infty, \infty) \to F(-\infty, \infty)$, where
   (a) $T(f(x)) = 1 + f(x)$
   (b) $T(f(x)) = f(x + 1)$

9. $T: R^\infty \to R^\infty$, where
   $T(a_0, a_1, a_2, \ldots, a_n, \ldots) = (0, a_0, a_1, a_2, \ldots, a_n, \ldots)$

10. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$. Which of the following are in $\ker(T)$?
   (a) $x^2$
   (b) $0$
   (c) $1 + x$
   (d) $-x$

11. Let $T: P_2 \to P_3$ be the linear transformation in Exercise 10. Which of the following are in $R(T)$?
   (a) $x + x^2$
   (b) $1 + x$
   (c) $3 - x^2$
   (d) $-x$

12. Let $V$ be any vector space, and let $T: V \to V$ be defined by $T(\mathbf{v}) = 3\mathbf{v}$.
   (a) What is the kernel of $T$?
   (b) What is the range of $T$?

13. In each part, use the given information to find the nullity of the linear transformation $T$.
   (a) $T: R^5 \to P_5$ has rank 3.
   (b) $T: P_4 \to P_3$ has rank 1.
   (c) The range of $T: M_{mn} \to R^3$ is $R^3$.
   (d) $T: M_{22} \to M_{22}$ has rank 3.

14. In each part, use the given information to find the rank of the linear transformation $T$.
   (a) $T: R^7 \to M_{32}$ has nullity 2.
   (b) $T: P_3 \to R$ has nullity 1.
   (c) The null space of $T: P_5 \to P_5$ is $P_5$.
   (d) $T: P_n \to M_{mn}$ has nullity 3.

15. Let $T: M_{22} \to M_{22}$ be the dilation operator with factor $k = 3$.
   (a) Find $T\left(\begin{bmatrix} 1 & 2 \\ -4 & 3 \end{bmatrix}\right)$.
   (b) Find the rank and nullity of $T$.

16. Let $T: P_2 \to P_2$ be the contraction operator with factor $k = 1/4$.
   (a) Find $T(1 + 4x + 8x^2)$.
   (b) Find the rank and nullity of $T$.

17. Let $T: P_2 \to R^3$ be the evaluation transformation at the sequence of points $-1, 0, 1$. Find
   (a) $T(x^2)$
   (b) $\ker(T)$
   (c) $R(T)$

18. Let $V$ be the subspace of $C[0, 2\pi]$ spanned by the vectors $1$, $\sin x$, and $\cos x$, and let $T: V \to R^3$ be the evaluation transformation at the sequence of points $0, \pi, 2\pi$. Find
   (a) $T(1 + \sin x + \cos x)$
   (b) $\ker(T)$
   (c) $R(T)$

19. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (1, 1)$ and $\mathbf{v}_2 = (1, 0)$, and let $T: R^2 \to R^2$ be the linear operator for which

   $T(\mathbf{v}_1) = (1, -2)$ and $T(\mathbf{v}_2) = (-4, 1)$

   Find a formula for $T(x_1, x_2)$, and use that formula to find $T(5, -3)$.

20. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2\}$ for $R^2$, where $\mathbf{v}_1 = (-2, 1)$ and $\mathbf{v}_2 = (1, 3)$, and let $T: R^2 \to R^3$ be the linear transformation such that

   $T(\mathbf{v}_1) = (-1, 2, 0)$ and $T(\mathbf{v}_2) = (0, -3, 5)$

   Find a formula for $T(x_1, x_2)$, and use that formula to find $T(2, -3)$.

21. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 1, 1)$, $\mathbf{v}_2 = (1, 1, 0)$, and $\mathbf{v}_3 = (1, 0, 0)$, and let $T: R^3 \to R^3$ be the linear operator for which

   $T(\mathbf{v}_1) = (2, -1, 4)$, $T(\mathbf{v}_2) = (3, 0, 1)$, $T(\mathbf{v}_3) = (-1, 5, 1)$

   Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(2, 4, -1)$.


22. Consider the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$, and let $T: R^3 \to R^2$ be the linear transformation for which

   $T(\mathbf{v}_1) = (1, 0)$, $T(\mathbf{v}_2) = (-1, 1)$, $T(\mathbf{v}_3) = (0, 1)$

   Find a formula for $T(x_1, x_2, x_3)$, and use that formula to find $T(7, 13, 7)$.

23. Let $T: P_3 \to P_2$ be the mapping defined by

   $T(a_0 + a_1x + a_2x^2 + a_3x^3) = 5a_0 + a_3x^2$

   (a) Show that $T$ is linear.
   (b) Find a basis for the kernel of $T$.
   (c) Find a basis for the range of $T$.

24. Let $T: P_2 \to P_2$ be the mapping defined by

   $T(a_0 + a_1x + a_2x^2) = 3a_0 + a_1x + (a_0 + a_1)x^2$

   (a) Show that $T$ is linear.
   (b) Find a basis for the kernel of $T$.
   (c) Find a basis for the range of $T$.

25. (a) (Calculus required) Let $D: P_3 \to P_2$ be the differentiation transformation $D(p) = p'(x)$. What is the kernel of $D$?
   (b) (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(p) = \int_{-1}^{1} p(x)\,dx$. What is the kernel of $J$?

26. (Calculus required) Let $V = C[a, b]$ be the vector space of continuous functions on $[a, b]$, and let $T: V \to V$ be the transformation defined by

   $T(f) = 5f(x) + 3\int_a^x f(t)\,dt$

   Is $T$ a linear operator?

27. (Calculus required) Let $V$ be the vector space of real-valued functions with continuous derivatives of all orders on the interval $(-\infty, \infty)$, and let $W = F(-\infty, \infty)$ be the vector space of real-valued functions defined on $(-\infty, \infty)$.
   (a) Find a linear transformation $T: V \to W$ whose kernel is $P_3$.
   (b) Find a linear transformation $T: V \to W$ whose kernel is $P_n$.

28. For a positive integer $n > 1$, let $T: M_{nn} \to R$ be the linear transformation defined by $T(A) = \operatorname{tr}(A)$, where $A$ is an $n \times n$ matrix with real entries. Determine the dimension of $\ker(T)$.

29. (a) Let $T: V \to R^3$ be a linear transformation from a vector space $V$ to $R^3$. Geometrically, what are the possibilities for the range of $T$?
   (b) Let $T: R^3 \to W$ be a linear transformation from $R^3$ to a vector space $W$. Geometrically, what are the possibilities for the kernel of $T$?

30. In each part, determine whether the mapping $T: P_n \to P_n$ is linear.
   (a) $T(p(x)) = p(x + 1)$
   (b) $T(p(x)) = p(x) + 1$

31. Let $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ be vectors in a vector space $V$, and let $T: V \to R^3$ be a linear transformation for which

   $T(\mathbf{v}_1) = (1, -1, 2)$, $T(\mathbf{v}_2) = (0, 3, 2)$, $T(\mathbf{v}_3) = (-3, 1, 2)$

   Find $T(2\mathbf{v}_1 - 3\mathbf{v}_2 + 4\mathbf{v}_3)$.

Working with Proofs

32. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to W$ be a linear transformation. Prove that if

   $T(\mathbf{v}_1) = T(\mathbf{v}_2) = \cdots = T(\mathbf{v}_n) = \mathbf{0}$

   then $T$ is the zero transformation.

33. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator. Prove that if

   $T(\mathbf{v}_1) = \mathbf{v}_1$, $T(\mathbf{v}_2) = \mathbf{v}_2$, $\ldots$, $T(\mathbf{v}_n) = \mathbf{v}_n$

   then $T$ is the identity transformation on $V$.

34. Prove: If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space $V$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ are vectors in a vector space $W$, not necessarily distinct, then there exists a linear transformation $T: V \to W$ such that

   $T(\mathbf{v}_1) = \mathbf{w}_1$, $T(\mathbf{v}_2) = \mathbf{w}_2$, $\ldots$, $T(\mathbf{v}_n) = \mathbf{w}_n$

True-False Exercises

TF. In parts (a)–(i) determine whether the statement is true or false, and justify your answer.

(a) If $T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2)$ for all vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in $V$ and all scalars $c_1$ and $c_2$, then $T$ is a linear transformation.

(b) If $\mathbf{v}$ is a nonzero vector in $V$, then there is exactly one linear transformation $T: V \to W$ such that $T(-\mathbf{v}) = -T(\mathbf{v})$.

(c) There is exactly one linear transformation $T: V \to W$ for which $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u} - \mathbf{v})$ for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$.

(d) If $\mathbf{v}_0$ is a nonzero vector in $V$, then the formula $T(\mathbf{v}) = \mathbf{v}_0 + \mathbf{v}$ defines a linear operator on $V$.

(e) The kernel of a linear transformation is a vector space.

(f) The range of a linear transformation is a vector space.

(g) If $T: P_6 \to M_{22}$ is a linear transformation, then the nullity of $T$ is 3.

(h) The function $T: M_{22} \to R$ defined by $T(A) = \det A$ is a linear transformation.

(i) The linear transformation $T: M_{22} \to M_{22}$ defined by

   $T(A) = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} A$

   has rank 1.


8.2 Compositions and Inverse Transformations

In Section 4.10 we discussed compositions and inverses of matrix transformations. In this section we will extend some of those ideas to general linear transformations.

One-to-One and Onto

To set the groundwork for our discussion in this section we will need the following definitions that are illustrated in Figure 8.2.1.

DEFINITION 1 If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be one-to-one if $T$ maps distinct vectors in $V$ into distinct vectors in $W$.

DEFINITION 2 If $T: V \to W$ is a linear transformation from a vector space $V$ to a vector space $W$, then $T$ is said to be onto (or onto $W$) if every vector in $W$ is the image of at least one vector in $V$.

[Figure 8.2.1: four panels, each showing a map from V to W. One-to-one: distinct vectors in V have distinct images in W. Onto W: every vector in W is the image of some vector in V. Not one-to-one: there exist distinct vectors in V with the same image. Not onto W: not every vector in W is the image of some vector in V.]

THEOREM 8.2.1 If $T: V \to W$ is a linear transformation, then the following statements are equivalent.

(a) $T$ is one-to-one.
(b) $\ker(T) = \{\mathbf{0}\}$.

Proof (a) ⇒ (b) Since $T$ is linear, we know that $T(\mathbf{0}) = \mathbf{0}$ by Theorem 8.1.1(a). Since $T$ is one-to-one, there can be no other vectors in $V$ that map into $\mathbf{0}$, so $\ker(T) = \{\mathbf{0}\}$.

(b) ⇒ (a) Assume that $\ker(T) = \{\mathbf{0}\}$. If $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $V$, then $\mathbf{u} - \mathbf{v} \neq \mathbf{0}$. This implies that $T(\mathbf{u} - \mathbf{v}) \neq \mathbf{0}$, for otherwise $\ker(T)$ would contain a nonzero vector. Since $T$ is linear, it follows that

$$T(\mathbf{u}) - T(\mathbf{v}) = T(\mathbf{u} - \mathbf{v}) \neq \mathbf{0}$$

so $T$ maps distinct vectors in $V$ into distinct vectors in $W$ and hence is one-to-one.

In the special case where $V$ is finite-dimensional and $T$ is a linear operator on $V$, we can add a third statement to those in Theorem 8.2.1.


THEOREM 8.2.2 If $V$ and $W$ are finite-dimensional vector spaces with the same dimension, and if $T: V \to W$ is a linear transformation, then the following statements are equivalent.

(a) $T$ is one-to-one.
(b) $\ker(T) = \{\mathbf{0}\}$.
(c) $T$ is onto [i.e., $R(T) = W$].

Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are equivalent. We leave it for you to do this by assuming that dim(V ) = n and applying Theorem 8.1.4.
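For matrix transformations, the statements in Theorem 8.2.2 can be tested through the rank: $T_A$ is one-to-one exactly when the nullity is 0 (rank equals the number of columns), and onto exactly when the rank equals the number of rows. The sketch below is only illustrative; the function names and all four matrices are hypothetical. The two square matrices show the three statements holding or failing together, and the two non-square matrices are included for contrast.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix by forward elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):    # clear entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_one_to_one(A):
    """ker(T_A) = {0} exactly when rank(A) equals the number of columns."""
    return rank(A) == len(A[0])

def is_onto(A):
    """R(T_A) = R^m exactly when rank(A) equals the number of rows."""
    return rank(A) == len(A)

square_invertible = [[2, 1], [1, 1]]
square_singular = [[1, 2], [2, 4]]
tall = [[1, 0], [0, 1], [1, 1]]      # maps R^2 into R^3
wide = [[1, 0, 1], [0, 1, 1]]        # maps R^3 into R^2

for name, A in [("square_invertible", square_invertible),
                ("square_singular", square_singular),
                ("tall", tall), ("wide", wide)]:
    print(name, is_one_to_one(A), is_onto(A))
```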

The requirement in Theorem 8.2.2 that $V$ and $W$ have the same dimension is essential for the validity of the theorem. In the exercises we will ask you to prove the following facts for the case where they do not have the same dimension.

• If $\dim(W) < \dim(V)$, then $T$ cannot be one-to-one.
• If $\dim(V) < \dim(W)$, then $T$ cannot be onto.

Stated informally, if a linear transformation maps a "bigger" space to a "smaller" space, then some points in the "bigger" space must have the same image; and if a linear transformation maps a "smaller" space to a "bigger" space, then there must be points in the "bigger" space that are not images of any points in the "smaller" space.

EXAMPLE 1 Matrix Transformations

If $T_A: R^n \to R^m$ is multiplication by an $m \times n$ matrix $A$, then it follows from the foregoing discussion that $T_A$ is not one-to-one if $m < n$ and is not onto if $n < m$. In the case where $m = n$ we know from Theorem 4.10.2 that $T_A$ is both one-to-one and onto if and only if $A$ is invertible.

EXAMPLE 2 Basic Transformations That Are One-to-One and Onto

The linear transformations $T_1: P_3 \to R^4$ and $T_2: M_{22} \to R^4$ defined by

$$T_1(a + bx + cx^2 + dx^3) = (a, b, c, d)$$

$$T_2\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a, b, c, d)$$

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).

EXAMPLE 3 A One-to-One Linear Transformation That Is Not Onto

Let $T: P_n \to P_{n+1}$ be the linear transformation

$$T(\mathbf{p}) = T(p(x)) = xp(x)$$

discussed in Example 5 of Section 8.1. If

$$\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n \quad \text{and} \quad \mathbf{q} = q(x) = d_0 + d_1x + \cdots + d_nx^n$$

are distinct polynomials, then they differ in at least one coefficient. Thus,

$$T(\mathbf{p}) = c_0x + c_1x^2 + \cdots + c_nx^{n+1} \quad \text{and} \quad T(\mathbf{q}) = d_0x + d_1x^2 + \cdots + d_nx^{n+1}$$

also differ in at least one coefficient. It follows that $T$ is one-to-one since it maps distinct polynomials $\mathbf{p}$ and $\mathbf{q}$ into distinct polynomials $T(\mathbf{p})$ and $T(\mathbf{q})$. However, it is not onto because all images under $T$ have a zero constant term. Thus, for example, there is no vector in $P_n$ that maps into the constant polynomial 1.

EXAMPLE 4 Shifting Operators

Let $V = R^\infty$ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear "shifting operators" on $V$ defined by

$$T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots)$$

$$T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots)$$

(a) Show that $T_1$ is one-to-one but not onto.
(b) Show that $T_2$ is onto but not one-to-one.

Solution (a) The operator $T_1$ is one-to-one because distinct sequences in $R^\infty$ obviously have distinct images. This operator is not onto because no vector in $R^\infty$ maps into the sequence $(1, 0, 0, \ldots, 0, \ldots)$, for example.

Solution (b) The operator $T_2$ is not one-to-one because, for example, the vectors $(1, 0, 0, \ldots, 0, \ldots)$ and $(2, 0, 0, \ldots, 0, \ldots)$ both map into $(0, 0, 0, \ldots, 0, \ldots)$. This operator is onto because every possible sequence of real numbers can be obtained with an appropriate choice of the numbers $u_2, u_3, \ldots, u_n, \ldots$.

Why does Example 4 not violate Theorem 8.2.2?

EXAMPLE 5 Differentiation Is Not One-to-One

CALCULUS REQUIRED

Let

$$D: C^1(-\infty, \infty) \to F(-\infty, \infty)$$

be the differentiation transformation discussed in Example 11 of Section 8.1. This linear transformation is not one-to-one because it maps functions that differ by a constant into the same function. For example,

$$D(x^2) = D(x^2 + 1) = 2x$$

Composition of Linear Transformations

The following definition extends Formula (1) of Section 4.10 to general linear transformations.

DEFINITION 3 If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then the composition of $T_2$ with $T_1$, denoted by $T_2 \circ T_1$ (which is read "$T_2$ circle $T_1$"), is the function defined by the formula

$$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \tag{1}$$

where $\mathbf{u}$ is a vector in $U$.

Note that the word "with" establishes the order of the operations in a composition. The composition of $T_2$ with $T_1$ is

$$(T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u}))$$

whereas the composition of $T_1$ with $T_2$ is

$$(T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u}))$$

It is not true, in general, that $T_1 \circ T_2 = T_2 \circ T_1$.

Remark Observe that this definition requires that the domain of $T_2$ (which is $V$) contain the range of $T_1$. This is essential for the formula $T_2(T_1(\mathbf{u}))$ to make sense (Figure 8.2.2).

[Figure 8.2.2 The composition of T2 with T1: T1 maps u in U to T1(u) in V, and T2 maps T1(u) to T2(T1(u)) in W.]


Our next theorem shows that the composition of two linear transformations is itself a linear transformation.

THEOREM 8.2.3 If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then $(T_2 \circ T_1): U \to W$ is also a linear transformation.

Proof If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $U$ and $c$ is a scalar, then it follows from (1) and the linearity of $T_1$ and $T_2$ that

$$(T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) = T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v})) = T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v})) = (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v})$$

and

$$(T_2 \circ T_1)(c\mathbf{u}) = T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u})) = cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u})$$

Thus, $T_2 \circ T_1$ satisfies the two requirements of a linear transformation.

EXAMPLE 6 Composition of Linear Transformations

Let $T_1: P_1 \to P_2$ and $T_2: P_2 \to P_2$ be the linear transformations given by the formulas

$$T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(2x + 4)$$

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by the formula

$$(T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4)$$

In particular, if $p(x) = c_0 + c_1x$, then

$$(T_2 \circ T_1)(p(x)) = (T_2 \circ T_1)(c_0 + c_1x) = (2x + 4)(c_0 + c_1(2x + 4)) = c_0(2x + 4) + c_1(2x + 4)^2$$
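The computation in Example 6 can be reproduced with polynomials stored as coefficient lists. The representation `[c0, c1, ...]` for $c_0 + c_1x + \cdots$ and the helper names `poly_add`, `poly_mul`, `substitute`, and `compose` are assumptions made for this sketch. The sample coefficients $c_0 = 3$, $c_1 = 5$ are likewise arbitrary.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def substitute(p, q):
    """Return p(q(x)) using Horner's rule; p must be a non-empty list."""
    result = [p[-1]]
    for c in reversed(p[:-1]):
        result = poly_add(poly_mul(result, q), [c])
    return result

def T1(p):
    """T1(p(x)) = x p(x): shift every coefficient up by one degree."""
    return [0] + p

def T2(p):
    """T2(p(x)) = p(2x + 4)."""
    return substitute(p, [4, 2])

def compose(f, g):
    """Return the composition f o g, which applies g first and then f."""
    return lambda u: f(g(u))

c0, c1 = 3, 5                       # arbitrary sample coefficients
image = compose(T2, T1)([c0, c1])   # (T2 o T1)(c0 + c1 x)

# Right-hand side of the formula in Example 6: c0(2x + 4) + c1(2x + 4)^2
expected = poly_add(poly_mul([c0], [4, 2]),
                    poly_mul([c1], poly_mul([4, 2], [4, 2])))
print(image, image == expected)
```

Note that `compose(T2, T1)` follows the order convention of Definition 3: the map written on the right is applied first.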

EXAMPLE 7 Composition with the Identity Operator

If $T: V \to V$ is any linear operator, and if $I: V \to V$ is the identity operator (Example 3 of Section 8.1), then for all vectors $\mathbf{v}$ in $V$, we have

$$(T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v})$$

$$(I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v})$$

It follows that $T \circ I$ and $I \circ T$ are the same as $T$; that is,

$$T \circ I = T \quad \text{and} \quad I \circ T = T \tag{2}$$

As illustrated in Figure 8.2.3, compositions can be defined for more than two linear transformations. For example, if

$$T_1: U \to V, \quad T_2: V \to W, \quad \text{and} \quad T_3: W \to Y$$

are linear transformations, then the composition $T_3 \circ T_2 \circ T_1$ is defined by

$$(T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u}))) \tag{3}$$
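Formula (3) can be mirrored by a small helper that chains any number of maps, applying the rightmost one first. The helper name `compose` and the three maps below are hypothetical examples chosen for illustration, with $U = R^2$, $V = R^3$, $W = R^2$, and $Y = R$.

```python
from functools import reduce

def compose(*maps):
    """compose(T3, T2, T1)(u) == T3(T2(T1(u))): the rightmost map acts first."""
    return reduce(lambda f, g: lambda u: f(g(u)), maps)

def T1(u):
    """Hypothetical linear map from R^2 to R^3."""
    x, y = u
    return (x + y, x - y, 2 * x)

def T2(v):
    """Hypothetical linear map from R^3 to R^2."""
    x, y, z = v
    return (x + z, y)

def T3(w):
    """Hypothetical linear map from R^2 to R^1."""
    x, y = w
    return (x + 2 * y,)

T = compose(T3, T2, T1)
u = (1, 2)
print(T(u), T(u) == T3(T2(T1(u))))

# Grouping the maps either way gives the same function values.
left = compose(compose(T3, T2), T1)
right = compose(T3, compose(T2, T1))
print(left(u) == right(u) == T(u))
```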

[Figure 8.2.3 The composition of three linear transformations: T1 maps u in U to T1(u) in V, T2 maps T1(u) to T2(T1(u)) in W, and T3 maps T2(T1(u)) to T3(T2(T1(u))) in Y.]

Inverse Linear Transformations

In Theorem 4.10.1 we showed that a matrix operator $T_A: R^n \to R^n$ is one-to-one if and only if the matrix $A$ is invertible, in which case the inverse operator is $T_{A^{-1}}$. We then showed that if $\mathbf{w}$ is the image of a vector $\mathbf{x}$ under the operator $T_A$, then $\mathbf{x}$ is the image under $T_{A^{-1}}$ of the vector $\mathbf{w}$ (see Figure 4.10.8). Our next objective is to extend the notion of invertibility to general linear transformations.

If $T: V \to W$ is a one-to-one linear transformation with range $R(T)$, and if $\mathbf{w}$ is any vector in $R(T)$, then the fact that $T$ is one-to-one means that there is exactly one vector $\mathbf{v}$ in $V$ for which $T(\mathbf{v}) = \mathbf{w}$. This fact allows us to define a new function, called the inverse of $T$ (and denoted by $T^{-1}$), that is defined on the range of $T$ and that maps $\mathbf{w}$ back into $\mathbf{v}$ (Figure 8.2.4).

[Figure 8.2.4 The inverse of T maps T(v) back into v: T sends v in V to w = T(v) in R(T), and T^{-1} sends w back to v.]

It can be proved (Exercise 33) that $T^{-1}: R(T) \to V$ is a linear transformation. Moreover, it follows from the definition of $T^{-1}$ that

$$T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v} \tag{4}$$

$$T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w} \tag{5}$$

so that $T$ and $T^{-1}$, when applied in succession in either order, cancel the effect of each other.

EXAMPLE 8 An Inverse Transformation

We showed in Example 3 of this section that the linear transformation $T: P_n \to P_{n+1}$ given by

$$T(\mathbf{p}) = T(p(x)) = xp(x)$$

is one-to-one but not onto. The fact that it is not onto can be seen explicitly from the formula

$$T(c_0 + c_1x + \cdots + c_nx^n) = c_0x + c_1x^2 + \cdots + c_nx^{n+1} \tag{6}$$

which makes it clear that the range of $T$ consists of all polynomials in $P_{n+1}$ that have zero constant term. Since $T$ is one-to-one it has an inverse, and from (6) this inverse is given by the formula

$$T^{-1}(c_0x + c_1x^2 + \cdots + c_nx^{n+1}) = c_0 + c_1x + \cdots + c_nx^n$$

For example, in the case where $n \ge 3$,

$$T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3$$

EXAMPLE 9 An Inverse Transformation

Let $T: R^3 \to R^3$ be the linear operator defined by the formula

$$T(x_1, x_2, x_3) = (3x_1 + x_2,\; -2x_1 - 4x_2 + 3x_3,\; 5x_1 + 4x_2 - 2x_3)$$

Determine whether $T$ is one-to-one; if so, find $T^{-1}(x_1, x_2, x_3)$.


Solution It follows from Formula (15) of Section 1.8 that the standard matrix for $T$ is

$$[T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix}$$

(verify). This matrix is invertible, and from Formula (9) of Section 4.10 the standard matrix for $T^{-1}$ is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}$$

It follows that

$$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = [T^{-1}]\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix}$$

Expressing this result in horizontal notation yields

$$T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3,\; -11x_1 + 6x_2 + 9x_3,\; -12x_1 + 7x_2 + 10x_3)$$

Composition of One-to-One Linear Transformations

We conclude this section with a theorem that shows that the composition of one-to-one linear transformations is one-to-one and that the inverse of a composition is the composition of the inverses in the reverse order.

THEOREM 8.2.4 If $T_1: U \to V$ and $T_2: V \to W$ are one-to-one linear transformations, then:

(a) $T_2 \circ T_1$ is one-to-one.
(b) $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

Proof (a) We want to show that $T_2 \circ T_1$ maps distinct vectors in $U$ into distinct vectors in $W$. But if $\mathbf{u}$ and $\mathbf{v}$ are distinct vectors in $U$, then $T_1(\mathbf{u})$ and $T_1(\mathbf{v})$ are distinct vectors in $V$ since $T_1$ is one-to-one. This and the fact that $T_2$ is one-to-one imply that

$$T_2(T_1(\mathbf{u})) \quad \text{and} \quad T_2(T_1(\mathbf{v}))$$

are also distinct vectors. But these expressions can also be written as

$$(T_2 \circ T_1)(\mathbf{u}) \quad \text{and} \quad (T_2 \circ T_1)(\mathbf{v})$$

so $T_2 \circ T_1$ maps $\mathbf{u}$ and $\mathbf{v}$ into distinct vectors in $W$.

Proof (b) We want to show that

$$(T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$

for every vector $\mathbf{w}$ in the range of $T_2 \circ T_1$. For this purpose, let

$$\mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w}) \tag{7}$$

so our goal is to show that

$$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$

But it follows from (7) that

$$(T_2 \circ T_1)(\mathbf{u}) = \mathbf{w}$$

or, equivalently,

$$T_2(T_1(\mathbf{u})) = \mathbf{w}$$


Now, taking $T_2^{-1}$ of each side of this equation, then taking $T_1^{-1}$ of each side of the result, and then using (4) yields (verify)

$$\mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w}))$$

or, equivalently,

$$\mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w})$$

In words, part (b) of Theorem 8.2.4 states that the inverse of a composition is the composition of the inverses in the reverse order. This result can be extended to compositions of three or more linear transformations; for example,

$$(T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1} \tag{8}$$

In the case where $T_A$, $T_B$, and $T_C$ are matrix operators on $R^n$, Formula (8) can be written as

$$(T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1}$$

or alternatively as

$$(T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}} \tag{9}$$

Note the order of the subscripts on the two sides of Formula (9).
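Both the inverse found in Example 9 and the order reversal in Formula (9) can be checked with exact arithmetic. In the sketch below, `A` is the standard matrix $[T]$ from Example 9, while `B` and `C` are hypothetical invertible matrices chosen only for illustration. The helper names `matmul` and `inverse` are also assumptions of this sketch; `inverse` uses Gauss-Jordan elimination on the augmented matrix $[A \mid I]$.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is not invertible")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]            # scale the pivot row
        for i in range(n):
            if i != c and M[i][c] != 0:             # clear the rest of the column
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

# Standard matrix [T] from Example 9.
A = [[3, 1, 0],
     [-2, -4, 3],
     [5, 4, -2]]

# Hypothetical invertible matrices, not taken from the text.
B = [[1, 0, 2],
     [0, 1, 0],
     [0, 0, 1]]
C = [[1, 0, 0],
     [0, 2, 0],
     [1, 0, 1]]

print(inverse(A) == [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]])

# Formula (9): the inverse of T_{CBA} is multiplication by A^{-1} B^{-1} C^{-1}.
lhs = inverse(matmul(C, matmul(B, A)))
rhs = matmul(inverse(A), matmul(inverse(B), inverse(C)))
print(lhs == rhs)
```

Since the composition $T_C \circ T_B \circ T_A$ is multiplication by the product $CBA$, comparing `lhs` and `rhs` is the matrix form of the order reversal stated in Theorem 8.2.4(b).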

Exercise Set 8.2

In Exercises 1–2, determine whether the linear transformation is one-to-one by finding its kernel and then applying Theorem 8.2.1.

1. (a) $T: R^2 \to R^2$, where $T(x, y) = (y, x)$
   (b) $T: R^2 \to R^3$, where $T(x, y) = (x, y, x + y)$
   (c) $T: R^3 \to R^2$, where $T(x, y, z) = (x + y + z, x - y - z)$

2. (a) $T: R^2 \to R^3$, where $T(x, y) = (x - y, y - x, 2x - 2y)$
   (b) $T: R^2 \to R^2$, where $T(x, y) = (0, 2x + 3y)$
   (c) $T: R^2 \to R^2$, where $T(x, y) = (x + y, x - y)$

In Exercises 3–4, determine whether multiplication by $A$ is one-to-one by computing the nullity of $A$ and then applying Theorem 8.2.1.

3. (a) $A = \begin{bmatrix} 1 & -2 \\ 2 & -4 \\ -3 & 6 \end{bmatrix}$
   (b) $A = \begin{bmatrix} 1 & 3 & 1 & 7 \\ 2 & 7 & 2 & 4 \\ -1 & -3 & 0 & 0 \end{bmatrix}$

4. (a) $A = \begin{bmatrix} 1 & 2 \\ 2 & 7 \\ 3 & 9 \end{bmatrix}$
   (b) $A = \begin{bmatrix} 1 & -3 & 6 & 1 \\ 0 & 1 & 2 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

5. Use the given information to determine whether the linear transformation is one-to-one.
   (a) $T: V \to W$; nullity($T$) = 0
   (b) $T: V \to W$; rank($T$) = $\dim(V)$
   (c) $T: V \to W$; $\dim(W) < \dim(V)$

6. Use the given information to determine whether the linear operator is one-to-one, onto, both, or neither.
   (a) $T: V \to V$; nullity($T$) = 0
   (b) $T: V \to V$; rank($T$) < $\dim(V)$
   (c) $T: V \to V$; $R(T) = V$

7. Show that the linear transformation $T: P_2 \to R^2$ defined by $T(p(x)) = (p(-1), p(1))$ is not one-to-one by finding a nonzero polynomial that maps into $\mathbf{0} = (0, 0)$. Do you think that this transformation is onto?

8. Show that the linear transformation $T: P_2 \to P_2$ defined by $T(p(x)) = p(x + 1)$ is one-to-one. Do you think that this transformation is onto?

9. Let $\mathbf{a}$ be a fixed vector in $R^3$. Does the formula $T(\mathbf{v}) = \mathbf{a} \times \mathbf{v}$ define a one-to-one linear operator on $R^3$? Explain your reasoning.

10. Let $E$ be a fixed $2 \times 2$ elementary matrix. Does the formula $T(A) = EA$ define a one-to-one linear operator on $M_{22}$? Explain your reasoning.

In Exercises 11–12, compute $(T_2 \circ T_1)(x, y)$.

11. $T_1(x, y) = (2x, 3y)$, $T_2(x, y) = (x - y, x + y)$

12. $T_1(x, y) = (2x, -3y, x + y)$, $T_2(x, y, z) = (x - y, y + z)$

In Exercises 13–14, compute $(T_3 \circ T_2 \circ T_1)(x, y)$.

13. $T_1(x, y) = (-2y, 3x, x - 2y)$, $T_2(x, y, z) = (y, z, x)$, $T_3(x, y, z) = (x + z, y - z)$

14. $T_1(x, y) = (x + y, y, -x)$, $T_2(x, y, z) = (0, x + y + z, 3y)$, $T_3(x, y, z) = (3x + 2y, 4z - x - 3y)$

15. Let $T_1: M_{22} \to R$ and $T_2: M_{22} \to M_{22}$ be the linear transformations given by $T_1(A) = \operatorname{tr}(A)$ and $T_2(A) = A^T$.
   (a) Find $(T_1 \circ T_2)(A)$, where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.
   (b) Can you find $(T_2 \circ T_1)(A)$? Explain.

16. Rework Exercise 15 given that $T_1: M_{22} \to M_{22}$ and $T_2: M_{22} \to M_{22}$ are the linear transformations $T_1(A) = kA$ and $T_2(A) = A^T$, where $k$ is a scalar.

17. Suppose that the linear transformations $T_1: P_2 \to P_2$ and $T_2: P_2 \to P_3$ are given by the formulas $T_1(p(x)) = p(x + 1)$ and $T_2(p(x)) = xp(x)$. Find $(T_2 \circ T_1)(a_0 + a_1x + a_2x^2)$.

18. Let $T_1: P_n \to P_n$ and $T_2: P_n \to P_n$ be the linear operators given by $T_1(p(x)) = p(x - 1)$ and $T_2(p(x)) = p(x + 1)$. Find $(T_1 \circ T_2)(p(x))$ and $(T_2 \circ T_1)(p(x))$.

19. Let $T: P_1 \to R^2$ be the function defined by the formula

   $T(p(x)) = (p(0), p(1))$

   (a) Find $T(1 - 2x)$.
   (b) Show that $T$ is a linear transformation.
   (c) Show that $T$ is one-to-one.
   (d) Find $T^{-1}(2, 3)$, and sketch its graph.

20. In each part, determine whether the linear operator $T: R^n \to R^n$ is one-to-one; if so, find $T^{-1}(x_1, x_2, \ldots, x_n)$.
   (a) $T(x_1, x_2, \ldots, x_n) = (0, x_1, x_2, \ldots, x_{n-1})$
   (b) $T(x_1, x_2, \ldots, x_n) = (x_n, x_{n-1}, \ldots, x_2, x_1)$
   (c) $T(x_1, x_2, \ldots, x_n) = (x_2, x_3, \ldots, x_n, x_1)$

21. Let $T: R^n \to R^n$ be the linear operator defined by the formula

   $T(x_1, x_2, \ldots, x_n) = (a_1x_1, a_2x_2, \ldots, a_nx_n)$

   where $a_1, \ldots, a_n$ are constants.
   (a) Under what conditions will $T$ have an inverse?
   (b) Assuming that the conditions determined in part (a) are satisfied, find a formula for $T^{-1}(x_1, x_2, \ldots, x_n)$.

22. Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the linear operators given by the formulas

   $T_1(x, y) = (x + y, x - y)$ and $T_2(x, y) = (2x + y, x - 2y)$

   (a) Show that $T_1$ and $T_2$ are one-to-one.
   (b) Find formulas for $T_1^{-1}(x, y)$, $T_2^{-1}(x, y)$, $(T_2 \circ T_1)^{-1}(x, y)$.
   (c) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

23. Let $T_1: P_2 \to P_3$ and $T_2: P_3 \to P_3$ be the linear transformations given by the formulas

   $T_1(p(x)) = xp(x)$ and $T_2(p(x)) = p(x + 1)$

   (a) Find formulas for $T_1^{-1}(p(x))$, $T_2^{-1}(p(x))$, and $(T_1^{-1} \circ T_2^{-1})(p(x))$.
   (b) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

24. Let $T_A: R^3 \to R^3$, $T_B: R^3 \to R^3$, and $T_C: R^3 \to R^3$ be the reflections about the $xy$-plane, the $xz$-plane, and the $yz$-plane, respectively. Verify Formula (9) for these linear operators.

25. Let $T_1: V \to V$ be the dilation $T_1(\mathbf{v}) = 4\mathbf{v}$. Find a linear operator $T_2: V \to V$ such that $T_1 \circ T_2 = I$ and $T_2 \circ T_1 = I$.

26. Let $T_1: M_{22} \to P_1$ and $T_2: P_1 \to R^3$ be the linear transformations given by

   $T_1\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = (a + b) + (c + d)x$ and $T_2(a + bx) = (a, b, a)$

   (a) Find the formula for $T_2 \circ T_1$.
   (b) Show that $T_2 \circ T_1$ is not one-to-one by finding distinct $2 \times 2$ matrices $A$ and $B$ such that $(T_2 \circ T_1)(A) = (T_2 \circ T_1)(B)$.
   (c) Show that $T_2 \circ T_1$ is not onto by finding a vector $(a, b, c)$ in $R^3$ that is not in the range of $T_2 \circ T_1$.

27. Let $T: R^3 \to R^3$ be the orthogonal projection of $R^3$ onto the $xy$-plane. Show that $T \circ T = T$.

28. (Calculus required) Let $V$ be the vector space $C^1[0, 1]$ and let $T: V \to R$ be defined by

   $T(f) = f(0) + 2f'(0) + 3f'(1)$

   Verify that $T$ is a linear transformation. Determine whether $T$ is one-to-one, and justify your conclusion.

29. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation reverse the actions of each other. Define a transformation $D: P_n \to P_{n-1}$ by $D(p(x)) = p'(x)$, and define $J: P_{n-1} \to P_n$ by

   $J(p(x)) = \int_0^x p(t)\,dt$

   (a) Show that $D$ and $J$ are linear transformations.
   (b) Explain why $J$ is not the inverse transformation of $D$.
   (c) Can the domains and/or codomains of $D$ and $J$ be restricted so they are inverse linear transformations?


30. (Calculus required) Let

   $D(f) = f'(x)$ and $J(f) = \int_0^x f(t)\,dt$

   be the linear transformations in Examples 11 and 12 of Section 8.1. Find $(J \circ D)(f)$ for
   (a) $f(x) = x^2 + 3x + 2$.
   (b) $f(x) = \sin x$.

31. (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(p) = \int_{-1}^{1} p(x)\,dx$. Determine whether $J$ is one-to-one. Justify your answer.

32. (Calculus required) Let $D: P_n \to P_{n-1}$ be the differentiation transformation $D(p(x)) = p'(x)$. Determine whether $D$ is onto, and justify your answer.

Working with Proofs

33. Prove: If $T: V \to W$ is a one-to-one linear transformation, then $T^{-1}: R(T) \to V$ is a one-to-one linear transformation.

34. Use the definition of $T_3 \circ T_2 \circ T_1$ given by Formula (3) to prove that
   (a) $T_3 \circ T_2 \circ T_1$ is a linear transformation.
   (b) $T_3 \circ T_2 \circ T_1 = (T_3 \circ T_2) \circ T_1$.
   (c) $T_3 \circ T_2 \circ T_1 = T_3 \circ (T_2 \circ T_1)$.

35. Let $q_0(x)$ be a fixed polynomial of degree $m$, and define a function $T$ with domain $P_n$ by the formula $T(p(x)) = p(q_0(x))$. Prove that $T$ is a linear transformation.

36. Prove: If there exists an onto linear transformation $T: V \to W$ then $\dim(V) \ge \dim(W)$.

37. Prove: If $V$ and $W$ are finite-dimensional vector spaces such that $\dim(W) < \dim(V)$, then there is no one-to-one linear transformation $T: V \to W$.

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

(a) The composition of two linear transformations is also a linear transformation.

(b) If $T_1: V \to V$ and $T_2: V \to V$ are any two linear operators, then $T_1 \circ T_2 = T_2 \circ T_1$.

(c) The inverse of a one-to-one linear transformation is a linear transformation.

(d) If a linear transformation $T$ has an inverse, then the kernel of $T$ is the zero subspace.

(e) If $T: R^2 \to R^2$ is the orthogonal projection onto the $x$-axis, then $T^{-1}: R^2 \to R^2$ maps each point on the $x$-axis onto a line that is perpendicular to the $x$-axis.

(f) If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if $T_1$ is not one-to-one, then neither is $T_2 \circ T_1$.

8.3 Isomorphism

In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean space R n . This connection is not only important theoretically, but it also has practical applications in that it allows us to perform vector computations in general vector spaces by working with the vectors in R n .

Isomorphism

Although many of the theorems in this text have been concerned exclusively with the vector space R n , this is not as limiting as it might seem. We will show that the vector space R n is the “mother” of all real n-dimensional vector spaces in the sense that every n-dimensional vector space must have the same algebraic structure as R n even though its vectors may not be expressed as n-tuples. To explain what we mean by this, we will need the following definition.

DEFINITION 1 A linear transformation T : V →W that is both one-to-one and onto is said to be an isomorphism, and W is said to be isomorphic to V .

In the exercises we will ask you to show that if T : V →W is an isomorphism, then T −1 : W →V is also an isomorphism. Accordingly, we will usually say simply that V and W are isomorphic and that T is an isomorphism between V and W . The word isomorphic is derived from the Greek words iso, meaning “identical,” and morphe, meaning “form.” This terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same “algebraic form,” even though they


may consist of different kinds of objects. For example, the following diagram illustrates an isomorphism between P2 and R 3 :

$$c_0 + c_1x + c_2x^2 \;\overset{T}{\underset{T^{-1}}{\rightleftarrows}}\; (c_0, c_1, c_2)$$

Although the vectors on the two sides of the arrows are different kinds of objects, the vector operations on each side mirror those on the other side. For example, for scalar multiplication we have

$$k(c_0 + c_1x + c_2x^2) \;\overset{T}{\underset{T^{-1}}{\rightleftarrows}}\; k(c_0, c_1, c_2)$$

$$kc_0 + kc_1x + kc_2x^2 \;\overset{T}{\underset{T^{-1}}{\rightleftarrows}}\; (kc_0, kc_1, kc_2)$$

and for vector addition we have

$$(c_0 + c_1x + c_2x^2) + (d_0 + d_1x + d_2x^2) \;\overset{T}{\underset{T^{-1}}{\rightleftarrows}}\; (c_0, c_1, c_2) + (d_0, d_1, d_2)$$

$$(c_0 + d_0) + (c_1 + d_1)x + (c_2 + d_2)x^2 \;\overset{T}{\underset{T^{-1}}{\rightleftarrows}}\; (c_0 + d_0, c_1 + d_1, c_2 + d_2)$$

The following theorem, which is one of the most basic results in linear algebra, reveals the fundamental importance of the vector space R n . THEOREM 8.3.1 Every real n-dimensional vector space is isomorphic to R n .

Theorem 8.3.1 tells us that every real n-dimensional vector space differs from R n only in notation; the algebraic structures of the two spaces are the same.

Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to R n

we must find a linear transformation T : V →R n that is one-to-one and onto. For this purpose, let S = {v1 , v2 , . . . , vn } be any basis for V , let u = k1 v1 + k2 v2 + · · · + kn vn

(1)

be the representation of a vector u in V as a linear combination of the basis vectors, and let T : V →R n be the coordinate map

T(u) = (u)S = (k1 , k2 , . . . , kn )

(2)

We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let u and v be vectors in V , let c be a scalar, and let u = k1 v1 + k2 v2 + · · · + kn vn and v = d1 v1 + d2 v2 + · · · + dn vn

(3)

be the representations of u and v as linear combinations of the basis vectors. Then it follows from (3) that

T(cu) = T(ck1 v1 + ck2 v2 + · · · + ckn vn ) = (ck1 , ck2 , . . . , ckn ) = c(k1 , k2 , . . . , kn ) = cT(u)

and that

T(u + v) = T((k1 + d1 )v1 + (k2 + d2 )v2 + · · · + (kn + dn )vn ) = (k1 + d1 , k2 + d2 , . . . , kn + dn ) = (k1 , k2 , . . . , kn ) + (d1 , d2 , . . . , dn ) = T(u) + T(v)

which shows that T is linear. To show that T is one-to-one, we must show that if u and v are distinct vectors in V , then so are their images in R n . But if u ≠ v, and if the representations of these vectors in terms of the basis vectors are as in (3), then we must have ki ≠ di for at least one i . Thus,

T(u) = (k1 , k2 , . . . , kn ) ≠ (d1 , d2 , . . . , dn ) = T(v)

which shows that u and v have distinct images under T . Finally, the transformation T is onto, for if w = (k1 , k2 , . . . , kn ) is any vector in R n , then it follows from (2) that w is the image under T of the vector u = k1 v1 + k2 v2 + · · · + kn vn .

Whereas Theorem 8.3.1 tells us, in general, that every n-dimensional vector space is isomorphic to R n , it is Formula (2) in its proof that tells us how to find isomorphisms.

THEOREM 8.3.2 If S = {v1 , v2 , . . . , vn } is a basis for a vector space V , then the coordinate map

$$u \xrightarrow{\;T\;} (u)_S$$

is an isomorphism between V and R n .

Remark Recall that coordinate maps depend on the order in which the basis vectors are listed. Thus, Theorem 8.3.2 actually describes many possible isomorphisms, one for each of the n! possible orders in which the basis vectors can be listed.
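To make the coordinate map concrete, the following sketch works in P1 with the basis S = {1 + x, 1 − x}. The basis, the storage of a + bx as the pair (a, b), and the function names are all illustrative assumptions, not part of the text. It checks linearity on one pair of polynomials, checks that the map can be undone, and shows that listing the same basis vectors in the opposite order gives a different isomorphism.

```python
from fractions import Fraction

# Hypothetical illustration: V = P1, with a + b*x stored as the pair (a, b).
# Basis S = {1 + x, 1 - x}; solving k1*(1 + x) + k2*(1 - x) = a + b*x gives
# k1 = (a + b)/2 and k2 = (a - b)/2.

def coords_S(p):
    """Coordinate map T(p) = (p)_S for the ordered basis S = {1 + x, 1 - x}."""
    a, b = map(Fraction, p)
    return ((a + b) / 2, (a - b) / 2)

def coords_S_reversed(p):
    """Coordinate map for the same basis vectors listed as {1 - x, 1 + x}."""
    k1, k2 = coords_S(p)
    return (k2, k1)

def from_coords_S(k):
    """Inverse map: rebuild (a, b) from the coordinates (k1, k2)."""
    k1, k2 = k
    return (k1 + k2, k1 - k2)

p, q, c = (3, 1), (0, 4), 5
p_plus_q = (p[0] + q[0], p[1] + q[1])
cp = (c * p[0], c * p[1])

# Linearity: T(p + q) = T(p) + T(q) and T(cp) = cT(p)
sum_ok = coords_S(p_plus_q) == tuple(x + y for x, y in zip(coords_S(p), coords_S(q)))
scale_ok = coords_S(cp) == tuple(c * x for x in coords_S(p))
# The inverse map undoes T, as expected of a one-to-one and onto map
roundtrip_ok = from_coords_S(coords_S(p)) == p

print(coords_S(p), coords_S_reversed(p))
print(sum_ok, scale_ok, roundtrip_ok)
```

A single numerical check is of course not a proof; the proof of Theorem 8.3.1 is what guarantees these properties for every vector and every basis.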

EXAMPLE 1 The Natural Isomorphism Between Pn−1 and R n

It follows from Theorem 8.3.2 that the coordinate map

$$a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \xrightarrow{\;T\;} (a_0, a_1, \ldots, a_{n-1})$$

defines an isomorphism between Pn−1 and R n . This is called the natural isomorphism between those vector spaces.

EXAMPLE 2 The Natural Isomorphism Between M22 and R 4

It follows from Theorem 8.3.2 that the coordinate map

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \xrightarrow{\;T\;} (a, b, c, d)$$

defines an isomorphism between M22 and R 4 . This is a special case of the isomorphism that maps an m × n matrix into its coordinate vector. We call this the natural isomorphism between Mmn and R mn .

CALCULUS REQUIRED

EXAMPLE 3 Differentiation by Matrix Multiplication

Consider the differentiation transformation D : P3 →P2 on the vector space of polynomials of degree 3 or less. If we map P3 and P2 into R 4 and R 3 , respectively, by the natural isomorphisms, then the transformation D produces a corresponding matrix transformation from R 4 to R 3 . Specifically, the derivative transformation

$$a_0 + a_1x + a_2x^2 + a_3x^3 \xrightarrow{\;D\;} a_1 + 2a_2x + 3a_3x^2$$

produces the matrix transformation

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix}$$

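Thus, for example, the derivative

$$\frac{d}{dx}(2 + x + 4x^2 - x^3) = 1 + 8x - 3x^2$$

can be calculated as the matrix product

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix}$$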
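The same product can be carried out in a few lines of code. The sketch below is a minimal illustration in plain Python; the helper names `diff_matrix` and `matvec` are our own, and polynomials are assumed to be stored as coefficient lists with the constant term first, which is exactly the natural isomorphism.

```python
def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def diff_matrix(n):
    """n x (n+1) matrix of D : P_n -> P_(n-1) under the natural isomorphisms.

    Row i has the single nonzero entry i + 1 in column i + 1, because
    the derivative of x^(i+1) is (i + 1) x^i.
    """
    return [[(j if j == i + 1 else 0) for j in range(n + 1)] for i in range(n)]

D3 = diff_matrix(3)
p = [2, 1, 4, -1]        # 2 + x + 4x^2 - x^3
dp = matvec(D3, p)       # coefficients of the derivative

print(D3)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
print(dp)  # [1, 8, -3], that is, 1 + 8x - 3x^2
```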

This idea is useful for constructing numerical algorithms to calculate derivatives.

EXAMPLE 4 Working with Isomorphisms

Use the natural isomorphism between P5 and R 6 to determine whether the following polynomials are linearly independent.

$$\begin{aligned} p_1 &= 1 + 2x - 3x^2 + 4x^3 + x^5 \\ p_2 &= 1 + 3x - 4x^2 + 6x^3 + 5x^4 + 4x^5 \\ p_3 &= 3 + 8x - 11x^2 + 16x^3 + 10x^4 + 9x^5 \end{aligned}$$

Solution We will convert this to a matrix problem by creating a matrix whose rows are the coordinate vectors of the polynomials under the natural isomorphism and then determine whether those rows are linearly independent using elementary row operations. The matrix whose rows are the coordinate vectors of the polynomials under the natural isomorphism is

$$A = \begin{bmatrix} 1 & 2 & -3 & 4 & 0 & 1 \\ 1 & 3 & -4 & 6 & 5 & 4 \\ 3 & 8 & -11 & 16 & 10 & 9 \end{bmatrix}$$

We leave it for you to use elementary row operations to reduce this matrix to the row echelon form

$$R = \begin{bmatrix} 1 & 2 & -3 & 4 & 0 & 1 \\ 0 & 1 & -1 & 2 & 5 & 3 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
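The reduction to R can be cross-checked by machine. The sketch below is one possible way to do so, assuming exact arithmetic with Python's `fractions` module; the function name `rank` is ours. It performs forward elimination on the rows of A and counts the pivots.

```python
from fractions import Fraction

def rank(rows):
    """Count the pivots found by forward Gaussian elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0  # index of the next pivot row
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

# Rows are the coordinate vectors of p1, p2, p3 under the natural isomorphism
A = [
    [1, 2, -3, 4, 0, 1],
    [1, 3, -4, 6, 5, 4],
    [3, 8, -11, 16, 10, 9],
]
print(rank(A))  # 2
```

The rank of 2 agrees with the two nonzero rows of R.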

This matrix has only two nonzero rows, so the row space of A is two-dimensional, which means that its row vectors are linearly dependent. Hence so are the given polynomials.

Inner Product Space Isomorphisms

In the case where V is a real n-dimensional inner product space, both V and R n have, in addition to their algebraic structure, a geometric structure arising from their respective inner products. Thus, it is reasonable to inquire if there exists an isomorphism from V to R n that preserves the geometric structure as well as the algebraic structure. For example, we would want orthogonal vectors in V to have orthogonal counterparts in R n , and we would want orthonormal sets in V to correspond to orthonormal sets in R n . In order for an isomorphism to preserve geometric structure, it has to preserve inner products, since notions of length, angle, and orthogonality are all based on the inner product. Thus, if V and W are inner product spaces, then we call an isomorphism T : V →W an inner product space isomorphism if

$$\langle T(u), T(v) \rangle = \langle u, v \rangle \quad \text{for all } u \text{ and } v \text{ in } V$$

The following analog of Theorem 8.3.2 provides an important method for obtaining inner product space isomorphisms between real inner product spaces and Euclidean vector spaces.

THEOREM 8.3.3 If S = {v1 , v2 , . . . , vn } is an ordered orthonormal basis for a real inner product space V , then the coordinate map

$$u \xrightarrow{\;T\;} (u)_S$$

is an inner product space isomorphism between V and the vector space R n with the Euclidean inner product.
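As a small numerical illustration of Theorem 8.3.3, the sketch below takes V = R 2 with the Euclidean inner product and the orthonormal basis S = {(3/5, 4/5), (−4/5, 3/5)}. This basis and the function names are assumptions chosen for the example. It uses the standard fact that coordinates relative to an orthonormal basis are the inner products with the basis vectors, and then compares inner products before and after applying the coordinate map.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical orthonormal basis S = {v1, v2} of R^2
v1 = (F(3, 5), F(4, 5))
v2 = (F(-4, 5), F(3, 5))

def coords_S(u):
    """Coordinate map u -> (u)_S; for an orthonormal S, k_i = <u, v_i>."""
    return (dot(u, v1), dot(u, v2))

u, w = (1, 2), (3, -1)
print(coords_S(u), coords_S(w))
print(dot(u, w), dot(coords_S(u), coords_S(w)))  # the two inner products agree
```

Exact fractions are used so that the comparison is not blurred by rounding.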

EXAMPLE 5 An Inner Product Space Isomorphism

We saw in Example 1 that the coordinate map

$$a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \xrightarrow{\;T\;} (a_0, a_1, \ldots, a_{n-1})$$

with respect to the standard basis for Pn−1 is an isomorphism between Pn−1 and R n . However, the standard basis is orthonormal with respect to the standard inner product on Pn−1 (see Example 3 of Section 6.3), so it follows that T is actually an inner product space isomorphism with respect to the standard inner product on Pn−1 and the Euclidean inner product on R n . To verify that this is so, recall from Example 7 of Section 6.1 that the standard inner product on Pn−1 of two vectors

$$p = a_0 + a_1x + \cdots + a_{n-1}x^{n-1} \quad\text{and}\quad q = b_0 + b_1x + \cdots + b_{n-1}x^{n-1}$$

is

$$\langle p, q \rangle = a_0b_0 + a_1b_1 + \cdots + a_{n-1}b_{n-1}$$

But this is exactly the Euclidean inner product on R n of the n-tuples

(a0 , a1 , . . . , an−1 ) and (b0 , b1 , . . . , bn−1 )

EXAMPLE 6 A Notational Matter

Let R n be the vector space of real n-tuples in comma-delimited form, let Mn be the vector space of real n × 1 matrices, let R n have the Euclidean inner product ⟨u, v⟩ = u · v, and let Mn have the inner product ⟨u, v⟩ = uᵀv in which u and v are expressed in column form. The mapping T : R n →Mn defined by

$$(v_1, v_2, \ldots, v_n) \xrightarrow{\;T\;} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$

is an inner product space isomorphism, so the distinction between the inner product space R n and the inner product space Mn is essentially notational, a fact that we have used many times in this text.


Exercise Set 8.3

In Exercises 1–8, state whether the transformation is an isomorphism. No proof required.

1. $c_0 + c_1x \to (c_0 - c_1, c_1)$ from $P_1$ to $R^2$.

2. $(x, y) \to (x, y, 0)$ from $R^2$ to $R^3$.

3. $a + bx + cx^2 + dx^3 \to \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ from $P_3$ to $M_{22}$.

4. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to ad - bc$ from $M_{22}$ to $R$.

5. $(a, b, c, d) \to a + bx + cx^2 + (d + 1)x^3$ from $R^4$ to $P_3$.

6. $A \to A^T$ from $M_{nn}$ to $M_{nn}$.

7. $c_1 \sin x + c_2 \cos x \to (c_1, c_2)$ from the subspace of $C(-\infty, \infty)$ spanned by $S = \{\sin x, \cos x\}$ to $R^2$.

8. The map $(u_1, u_2, \ldots, u_n, \ldots) \to (0, u_1, u_2, \ldots, u_n, \ldots)$ from $R^\infty$ to $R^\infty$.

16. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a + b \\ a + b \\ a + b + c \\ a + b + c + d \end{bmatrix}$

17. Do you think that $R^2$ is isomorphic to the xy-plane in $R^3$? Justify your answer.

18. (a) For what value or values of k, if any, is $M_{mn}$ isomorphic to $R^k$?
(b) For what value or values of k, if any, is $M_{mn}$ isomorphic to $P_k$?

19. Let $T : P_2 \to M_{22}$ be the mapping

$$T(p) = T(p(x)) = \begin{bmatrix} p(0) & p(1) \\ p(1) & p(0) \end{bmatrix}$$

Is this an isomorphism? Justify your answer.

9. (a) Find an isomorphism between the vector space of all 3 × 3 symmetric matrices and R 6 .
(b) Find two different isomorphisms between the vector space of all 2 × 2 matrices and R 4 .

10. (a) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that p(0) = 0 and R 3 .
(b) Find an isomorphism between the vector spaces span{1, sin(x), cos(x)} and R 3 .

In Exercises 11–12, determine whether the matrix transformation TA : R 3 →R 3 is an isomorphism.



0

1

⎢ 11. A = ⎣ 1 −1

−1





0

⎥ 2⎦

1

0

−1

1

⎢ 12. A = ⎣ 0 −1

0



0

⎥ 2⎦

1

0

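In Exercises 13–14, find the dimension n of the solution space W of Ax = 0, and then construct an isomorphism between W and R n .

$$\textbf{13. } A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 \end{bmatrix}$$

$$\textbf{14. } A = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 \end{bmatrix}$$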

In Exercises 15–16, determine whether the transformation is an isomorphism from M22 to R 4 .

15. $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a \\ a + b \\ a + b + c \\ a + b + c + d \end{bmatrix}$


20. Show that if M22 and P3 have the standard inner products given in Examples 6 and 7 of Section 6.1, then the mapping

$$\begin{bmatrix} a_0 & a_1 \\ a_2 & a_3 \end{bmatrix} \to a_0 + a_1x + a_2x^2 + a_3x^3$$

is an inner product space isomorphism between those spaces.

21. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space span{1, sin(x), cos(x), sin(2x), cos(2x)}. Use your method to find the derivative of 3 − 4 sin(x) + sin(2x) + 5 cos(2x).

Working with Proofs

22. Prove that if T : V →W is an isomorphism, then so is T −1 : W →V .

23. Prove that if U , V , and W are vector spaces such that U is isomorphic to V and V is isomorphic to W , then U is isomorphic to W .

24. Use the result in Exercise 22 to prove that any two real finite-dimensional vector spaces with the same dimension are isomorphic to one another.

25. Prove that an inner product space isomorphism preserves angles and distances; that is, the angle between u and v in V is equal to the angle between T(u) and T(v) in W , and $\|u - v\|_V = \|T(u) - T(v)\|_W$.

26. Prove that an inner product space isomorphism maps orthonormal sets into orthonormal sets.

True-False Exercises

TF. In parts (a)–(f) determine whether the statement is true or false, and justify your answer.

(a) The vector spaces R 2 and P2 are isomorphic.

(b) If the kernel of a linear transformation T : P3 →P3 is {0}, then T is an isomorphism.

(c) Every linear transformation from M33 to P9 is an isomorphism.

(d) There is a subspace of M23 that is isomorphic to R 4 .

(e) Isomorphic finite-dimensional vector spaces must have the same number of basis vectors.

(f ) R n is isomorphic to a subspace of R n+1 .

8.4 Matrices for General Linear Transformations

In this section we will show that a general linear transformation from any n-dimensional vector space V to any m-dimensional vector space W can be performed using an appropriate matrix transformation from R n to R m . This idea is used in computer computations since computers are well suited for performing matrix computations.

Matrices of Linear Transformations

Suppose that V is an n-dimensional vector space, that W is an m-dimensional vector space, and that T : V →W is a linear transformation. Suppose further that B is a basis for V , that B′ is a basis for W , and that for each vector x in V , the coordinate matrices for x and T(x) are [x]B and [T(x)]B′ , respectively (Figure 8.4.1).

[Figure 8.4.1: T maps a vector x in V (n-dimensional) to the vector T(x) in W (m-dimensional); [x]B is a vector in R n and [T(x)]B′ is a vector in R m .]

It will be our goal to find an m × n matrix A such that multiplication by A maps the vector [x]B into the vector [T(x)]B′ for each x in V (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation T by using matrix multiplication and the following indirect procedure:

Finding T(x) Indirectly
Step 1. Compute the coordinate vector [x]B .
Step 2. Multiply [x]B on the left by A to produce [T(x)]B′ .
Step 3. Reconstruct T(x) from its coordinate vector [T(x)]B′ .

[Figure 8.4.2: (a) T maps V into W , and multiplication by A maps R n into R m , taking [x]B to [T(x)]B′ . (b) Direct computation takes x to T(x); the indirect route is (1) x to [x]B , (2) multiply by A to obtain [T(x)]B′ , (3) [T(x)]B′ to T(x).]


The key to executing this plan is to find an m × n matrix A with the property that

$$A[x]_B = [T(x)]_{B'} \tag{1}$$

For this purpose, let B = {u1 , u2 , . . . , un } be a basis for the n-dimensional space V and B′ = {v1 , v2 , . . . , vm } a basis for the m-dimensional space W . Since Equation (1) must hold for all vectors in V , it must hold, in particular, for the basis vectors in B ; that is,

$$A[u_1]_B = [T(u_1)]_{B'}, \quad A[u_2]_B = [T(u_2)]_{B'}, \quad \ldots, \quad A[u_n]_B = [T(u_n)]_{B'} \tag{2}$$

But

$$[u_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad [u_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad [u_n]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}$$

so

$$A[u_1]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}$$

$$A[u_2]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}$$

$$\vdots$$

$$A[u_n]_B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

Substituting these results into (2) yields

$$\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} = [T(u_1)]_{B'}, \quad \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} = [T(u_2)]_{B'}, \quad \ldots, \quad \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} = [T(u_n)]_{B'}$$

which shows that the successive columns of A are the coordinate vectors of

T(u1 ), T(u2 ), . . . , T(un )

with respect to the basis B′ . Thus, the matrix A that completes the link in Figure 8.4.2a is

$$A = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\big] \tag{3}$$

We will call this the matrix for T relative to the bases B and B′ and will denote it by the symbol [T ]B′,B . Using this notation, Formula (3) can be written as

$$[T]_{B',B} = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \mid \cdots \mid [T(u_n)]_{B'} \,\big] \tag{4}$$

and from (1), this matrix has the property

$$[T]_{B',B}\,[x]_B = [T(x)]_{B'} \tag{5}$$

We leave it as an exercise to show that in the special case where TC : R n →R m is multiplication by C , and where B and B′ are the standard bases for R n and R m , respectively, then

$$[T_C]_{B',B} = C \tag{6}$$

Remark Observe that in the notation [T ]B′,B the right subscript is a basis for the domain of T , and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how the subscript B seems to “cancel out” in Formula (5) (Figure 8.4.4).

[Figure 8.4.3: In [T ]B′,B the left subscript B′ is the basis for the image space and the right subscript B is the basis for the domain.]

[Figure 8.4.4: In [T ]B′,B [x]B = [T(x)]B′ the two adjacent subscripts B cancel.]

EXAMPLE 1 Matrix for a Linear Transformation

Let T : P1 →P2 be the linear transformation defined by

T(p(x)) = xp(x)

Find the matrix for T with respect to the standard bases

B = {u1 , u2 } and B′ = {v1 , v2 , v3 }

where

u1 = 1, u2 = x; v1 = 1, v2 = x, v3 = x²

Solution From the given formula for T we obtain

T(u1 ) = T(1) = (x)(1) = x
T(u2 ) = T(x) = (x)(x) = x²

By inspection, the coordinate vectors for T(u1 ) and T(u2 ) relative to B′ are

$$[T(u_1)]_{B'} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad [T(u_2)]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Thus, the matrix for T with respect to B and B′ is

$$[T]_{B',B} = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \,\big] = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$

EXAMPLE 2 The Three-Step Procedure

Let T : P1 →P2 be the linear transformation in Example 1, and use the three-step procedure described in the following figure to perform the computation

T(a + bx) = x(a + bx) = ax + bx²

[Figure: direct computation takes x to T(x); the indirect route is (1) x to [x]B , (2) multiply by [T ]B′,B to obtain [T(x)]B′ , (3) [T(x)]B′ to T(x).]

Solution

Step 1. The coordinate matrix for x = a + bx relative to the basis B = {1, x} is

$$[x]_B = \begin{bmatrix} a \\ b \end{bmatrix}$$

Step 2. Multiplying [x]B by the matrix [T ]B′,B found in Example 1 we obtain

$$[T]_{B',B}\,[x]_B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ b \end{bmatrix} = [T(x)]_{B'}$$

Step 3. Reconstructing T(x) = T(a + bx) from [T(x)]B′ we obtain

T(a + bx) = 0 + ax + bx² = ax + bx²

Although Example 2 is simple, the procedure that it illustrates is applicable to problems of great complexity.
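The construction in Example 1 and the three-step procedure can be sketched in code as follows. This is a minimal illustration, assuming that polynomials are stored as coefficient lists with the constant term first, so that coordinates relative to the standard bases are the lists themselves; the names `T`, `T_matrix`, and `T_indirect` are ours.

```python
def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def T(p):
    """T(p(x)) = x p(x) on coefficient lists [c0, c1, ...]."""
    return [0] + list(p)

# Formula (4): the columns of [T]_{B',B} are the coordinate vectors of T(u1), T(u2)
B = [[1, 0], [0, 1]]                 # u1 = 1, u2 = x
columns = [T(u) for u in B]          # coordinates relative to B' = {1, x, x^2}
T_matrix = [list(row) for row in zip(*columns)]

def T_indirect(a, b):
    """Compute T(a + bx) by the three-step procedure."""
    x_B = [a, b]                     # Step 1: coordinate vector [x]_B
    Tx_Bp = matvec(T_matrix, x_B)    # Step 2: multiply by [T]_{B',B}
    return Tx_Bp                     # Step 3: read off the coefficients of T(x)

print(T_matrix)          # [[0, 0], [1, 0], [0, 1]]
print(T_indirect(2, 7))  # [0, 2, 7], that is, 2x + 7x^2
```

For the standard bases, Steps 1 and 3 involve no computation; with other bases they would each require solving a linear system.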

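EXAMPLE 3 Matrix for a Linear Transformation

Let T : R 2 →R 3 be the linear transformation defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ -5x_1 + 13x_2 \\ -7x_1 + 16x_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

Find the matrix for the transformation T with respect to the bases B = {u1 , u2 } for R 2 and B′ = {v1 , v2 , v3 } for R 3 , where

$$u_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}; \qquad v_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$$

Solution From the formula for T ,

$$T(u_1) = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \quad T(u_2) = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}$$

Expressing these vectors as linear combinations of v1 , v2 , and v3 , we obtain (verify)

T(u1 ) = v1 − 2v3 , T(u2 ) = 3v1 + v2 − v3

Thus,

$$[T(u_1)]_{B'} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \quad [T(u_2)]_{B'} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}$$

so

$$[T]_{B',B} = \big[\, [T(u_1)]_{B'} \mid [T(u_2)]_{B'} \,\big] = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix}$$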
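The step marked "verify" amounts to solving a linear system for each image vector. One way to carry it out is sketched below with a small exact Gauss-Jordan solver; the function names are ours, and the solver assumes that the given vectors really do form a basis.

```python
from fractions import Fraction

def solve(cols, b):
    """Solve k1*cols[0] + ... + kn*cols[n-1] = b exactly; cols must form a basis."""
    n = len(b)
    # Augmented matrix: basis vectors as columns, then b
    M = [[Fraction(cols[j][i]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

def T(x):
    """The transformation of Example 3."""
    x1, x2 = x
    return [x2, -5 * x1 + 13 * x2, -7 * x1 + 16 * x2]

B = [[3, 1], [5, 2]]                            # u1, u2
B_prime = [[1, 0, -1], [-1, 2, 2], [0, 1, 2]]   # v1, v2, v3

# Formula (4): column j is the coordinate vector of T(u_j) relative to B'
columns = [solve(B_prime, T(u)) for u in B]
T_matrix = [list(row) for row in zip(*columns)]
print(T_matrix)  # [[1, 3], [0, 1], [-2, -1]] (displayed as Fraction objects)
```

The entries are exact fractions, so the result can be compared directly with the matrix obtained by hand.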

Remark Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

$$[T] = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \quad\text{and}\quad [T]_{B',B} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix}$$

both represent the transformation T , the first relative to the standard bases for R 2 and R 3 , the second relative to the bases B and B′ stated in the example.

Matrices of Linear Operators

In the special case where V = W (so that T : V →V is a linear operator), it is usual to take B′ = B when constructing a matrix for T . In this case the resulting matrix is called the matrix for T relative to the basis B and is usually denoted by [T ]B rather than [T ]B,B . If B = {u1 , u2 , . . . , un }, then Formulas (4) and (5) become

$$[T]_B = \big[\, [T(u_1)]_B \mid [T(u_2)]_B \mid \cdots \mid [T(u_n)]_B \,\big] \tag{7}$$

$$[T]_B\,[x]_B = [T(x)]_B \tag{8}$$

Phrased informally, Formulas (7) and (8) state that the matrix for T , when multiplied by the coordinate vector for x, produces the coordinate vector for T(x).

In the special case where T : R n →R n is a matrix operator, say multiplication by A, and B is the standard basis for R n , then Formula (7) simplifies to

$$[T]_B = A \tag{9}$$

Matrices of Identity Operators

Recall that the identity operator I : V →V maps every vector in V into itself, that is, I (x) = x for every vector x in V. The following example shows that if V is n-dimensional, then the matrix for I relative to any basis B for V is the n × n identity matrix.

EXAMPLE 4 Matrices of Identity Operators

If B = {u1 , u2 , . . . , un } is a basis for a finite-dimensional vector space V , and if I : V →V is the identity operator on V , then

I (u1 ) = u1 , I (u2 ) = u2 , . . . , I (un ) = un

Therefore,

$$[I]_B = \big[\, [I(u_1)]_B \mid [I(u_2)]_B \mid \cdots \mid [I(u_n)]_B \,\big] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I$$

EXAMPLE 5 Linear Operator on P2

Let T : P2 →P2 be the linear operator defined by

T(p(x)) = p(3x − 5)

that is, T(c0 + c1 x + c2 x²) = c0 + c1 (3x − 5) + c2 (3x − 5)².

(a) Find [T ]B relative to the basis B = {1, x, x²}.
(b) Use the indirect procedure to compute T(1 + 2x + 3x²).
(c) Check the result in (b) by computing T(1 + 2x + 3x²) directly.

Solution (a) From the formula for T ,

T(1) = 1, T(x) = 3x − 5, T(x²) = (3x − 5)² = 9x² − 30x + 25

so

$$[T(1)]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(x)]_B = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}, \quad [T(x^2)]_B = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix}$$

Thus,

$$[T]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}$$

Solution (b)

Step 1. The coordinate matrix for p = 1 + 2x + 3x² relative to the basis B = {1, x, x²} is

$$[p]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$

Step 2. Multiplying [p]B by the matrix [T ]B found in part (a) we obtain

$$[T]_B\,[p]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 66 \\ -84 \\ 27 \end{bmatrix} = [T(p)]_B$$

Step 3. Reconstructing T(p) = T(1 + 2x + 3x²) from [T(p)]B we obtain

T(1 + 2x + 3x²) = 66 − 84x + 27x²

Solution (c) By direct computation,

T(1 + 2x + 3x²) = 1 + 2(3x − 5) + 3(3x − 5)² = 1 + 6x − 10 + 27x² − 90x + 75 = 66 − 84x + 27x²

which agrees with the result in (b).

Matrices of Compositions and Inverse Transformations

We will conclude this section by mentioning two theorems without proof that are generalizations of Formulas (4) and (9) of Section 4.10.

THEOREM 8.4.1 If T1 : U →V and T2 : V →W are linear transformations, and if B, B″, and B′ are bases for U, V, and W, respectively, then

$$[T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}\,[T_1]_{B'',B} \tag{10}$$

THEOREM 8.4.2 If T : V →V is a linear operator, and if B is a basis for V, then the following are equivalent.

(a) T is one-to-one.
(b) [T ]B is invertible.

Moreover, when these equivalent conditions hold,

$$[T^{-1}]_B = [T]_B^{-1} \tag{11}$$
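Formula (11) can be illustrated with the operator T(p(x)) = p(3x − 5) on P2 from Example 5. Its inverse is the substitution p(x) → p((x + 5)/3), a fact we supply here rather than one stated in the text. Applying that substitution to 1, x, and x² gives the columns of the candidate matrix below, and the sketch checks that it is the inverse of [T ]B. The helper name `matmul` is ours.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# [T]_B from Example 5, relative to B = {1, x, x^2}
T_B = [[1, -5, 25],
       [0, 3, -30],
       [0, 0, 9]]

# Columns are the coordinates of T^{-1}(1) = 1, T^{-1}(x) = (x + 5)/3,
# and T^{-1}(x^2) = (x + 5)^2 / 9 = 25/9 + (10/9)x + (1/9)x^2
Tinv_B = [[1, F(5, 3), F(25, 9)],
          [0, F(1, 3), F(10, 9)],
          [0, 0, F(1, 9)]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(T_B, Tinv_B) == I3, matmul(Tinv_B, T_B) == I3)  # True True
```

Since both products are the identity, the matrix of the inverse operator is the inverse of the matrix of the operator, as Formula (11) asserts.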

Remark In (10), observe how the interior subscript B″ (the basis for the intermediate space V ) seems to “cancel out,” leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This “cancellation” of interior subscripts suggests the following extension of Formula (10) to compositions of three linear transformations (Figure 8.4.6):

$$[T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{B',B'''}\,[T_2]_{B''',B''}\,[T_1]_{B'',B} \tag{12}$$

[Figure 8.4.5: In [T2 ◦ T1 ]B′,B = [T2 ]B′,B″ [T1 ]B″,B the interior subscripts B″ cancel.]

[Figure 8.4.6: T1 maps the space with basis B to the space with basis B″, T2 maps it to the space with basis B‴, and T3 maps it to the space with basis B′.]

The following example illustrates Theorem 8.4.1.

EXAMPLE 6 Composition

Let T1 : P1 →P2 be the linear transformation defined by

T1 (p(x)) = xp(x)

and let T2 : P2 →P2 be the linear operator defined by

T2 (p(x)) = p(3x − 5)

Then the composition (T2 ◦ T1 ): P1 →P2 is given by

(T2 ◦ T1 )(p(x)) = T2 (T1 (p(x))) = T2 (xp(x)) = (3x − 5)p(3x − 5)

Thus, if p(x) = c0 + c1 x , then

$$(T_2 \circ T_1)(c_0 + c_1x) = (3x - 5)\big(c_0 + c_1(3x - 5)\big) = c_0(3x - 5) + c_1(3x - 5)^2 \tag{13}$$

In this example, P1 plays the role of U in Theorem 8.4.1, and P2 plays the roles of both V and W ; thus we can take B′ = B″ in (10) so that the formula simplifies to

$$[T_2 \circ T_1]_{B',B} = [T_2]_{B'}\,[T_1]_{B',B} \tag{14}$$

Let us choose B = {1, x} to be the basis for P1 and choose B′ = {1, x, x²} to be the basis for P2 . We showed in Examples 1 and 5 that

$$[T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad [T_2]_{B'} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}$$

Thus, it follows from (14) that

$$[T_2 \circ T_1]_{B',B} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \tag{15}$$

As a check, we will calculate [T2 ◦ T1 ]B′,B directly from Formula (4). Since B = {1, x}, it follows from Formula (4) with u1 = 1 and u2 = x that

$$[T_2 \circ T_1]_{B',B} = \big[\, [(T_2 \circ T_1)(1)]_{B'} \mid [(T_2 \circ T_1)(x)]_{B'} \,\big] \tag{16}$$

Using (13) yields

(T2 ◦ T1 )(1) = 3x − 5 and (T2 ◦ T1 )(x) = (3x − 5)² = 9x² − 30x + 25

From this and the fact that B′ = {1, x, x²}, it follows that

$$[(T_2 \circ T_1)(1)]_{B'} = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix} \quad\text{and}\quad [(T_2 \circ T_1)(x)]_{B'} = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix}$$

Substituting in (16) yields

$$[T_2 \circ T_1]_{B',B} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix}$$

which agrees with (15).
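Both routes in Example 6 can be reproduced in code. The sketch below is an illustration under our own conventions: polynomials are coefficient lists with the constant term first, `matrix_of` builds a matrix column by column as in Formula (4), and all helper names are hypothetical.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Sum of two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def T1(p):
    """T1(p(x)) = x p(x)."""
    return [0] + list(p)

def T2(p):
    """T2(p(x)) = p(3x - 5), evaluated by Horner's rule on polynomials."""
    result = [0]
    for c in reversed(p):
        result = poly_add(poly_mul(result, [-5, 3]), [c])
    return result

def matrix_of(T, basis, m):
    """Matrix whose columns are the first m coefficients of T(u) for u in basis."""
    cols = [(T(u) + [0] * m)[:m] for u in basis]
    return [list(r) for r in zip(*cols)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

std_P1 = [[1, 0], [0, 1]]
std_P2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

T1_mat = matrix_of(T1, std_P1, 3)
T2_mat = matrix_of(T2, std_P2, 3)
direct = matrix_of(lambda p: T2(T1(p)), std_P1, 3)

print(matmul(T2_mat, T1_mat))  # [[-5, 25], [3, -30], [0, 9]]
print(direct)                  # the same matrix, computed from Formula (4)
```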


Exercise Set 8.4

1. Let T : P2 →P3 be the linear transformation defined by T(p(x)) = xp(x).

6. Let T : R 3 →R 3 be the linear operator defined by

T(x1 , x2 , x3 ) = (x1 − x2 , x2 − x1 , x1 − x3 )

(a) Find the matrix for T relative to the standard bases

B = {u1 , u2 , u3 } and B = {v1 , v2 , v3 , v4 } where u1 = 1, v1 = 1,

u2 = x, v2 = x,

v1 = (1, 0, 1), v2 = (0, 1, 1), v3 = (1, 1, 0)

u3 = x 2 v3 = x 2 ,

v4 = x 3

(b) Verify that the matrix [T ]B ,B obtained in part (a) satisfies Formula (5) for every vector x = c0 + c1 x + c2 x 2 in P2 . 2. Let T : P2 →P1 be the linear transformation defined by

T(a0 + a1 x + a2 x 2 ) = (a0 + a1 ) − (2a1 + 3a2 )x (a) Find the matrix for T relative to the standard bases B = {1, x, x 2 } and B = {1, x} for P2 and P1 . (b) Verify that the matrix [T ]B ,B obtained in part (a) satisfies Formula (5) for every vector x = c0 + c1 x + c2 x 2 in P2 . 3. Let T : P2 →P2 be the linear operator defined by

T(a0 + a1 x + a2 x 2 ) = a0 + a1 (x − 1) + a2 (x − 1)2 (a) Find the matrix for T relative to the standard basis B = {1, x, x 2 } for P2 . (b) Verify that the matrix [T ]B obtained in part (a) satisfies Formula (8) for every vector x = a0 + a1 x + a2 x 2 in P2 . 4. Let T : R 2 →R 2 be the linear operator defined by

T

! x1 − x2 x1 = x2 x1 + x2

and let B = {u1 , u2 } be the basis for which



1 u1 = 1



and u2 =

−1 0

⎤ ⎡ ! x1 + 2x2 x1 ⎥ ⎢ = ⎣ −x1 ⎦ x2

⎡ ⎤

(a) Find [T ]B with respect to the basis B = {1, x, x 2 }. (b) Use the three-step procedure illustrated in Example 2 to compute T(2 − 3x + 4x 2 ). (c) Check the result obtained in part (b) by computing T(2 − 3x + 4x 2 ) directly. 8. Let T : P2 →P3 be the linear transformation defined by T(p(x)) = xp(x − 3), that is,

  T(c0 + c1 x + c2 x 2 ) = x c0 + c1 (x − 3) + c2 (x − 3)2







−1 1 1 3 , and let A = and v2 = be the 3 4 −2 5 matrix for T : R 2 →R 2 relative to the basis B = {v1 , v2 }.

9. Let v1 =

(b) Find T(v1 ) and T(v2 ).

(a) Find the matrix [T ]B ,B relative to the bases B = {u1 , u2 } and B = {v1 , v2 , v3 }, where

⎡ ⎤

T(c0 + c1 x + c2 x 2 ) = c0 + c1 (2x + 1) + c2 (2x + 1)2

(a) Find [T(v1 )]B and [T(v2 )]B .

0



7. Let T : P2 →P2 be the linear operator defined by T(p(x)) = p(2x + 1), that is,

(c) Check the result obtained in part (b) by computing T(1 + x − x 2 ) directly.

5. Let T : R 2 →R 3 be defined by

−2 1 , u2 = u1 = 3 4

(c) Is T one-to-one? If so, find the matrix of T −1 with respect to the basis B .

(b) Use the three-step procedure illustrated in Example 2 to compute T(1 + x − x 2 ).

(a) Find [T ]B .



(b) Verify that Formula (8) holds for every vector x = (x1 , x2 , x3 ) in R 3 .

(a) Find [T ]B ,B relative to the bases B = {1, x, x 2 } and B = {1, x, x 2 , x 3 }.

(b) Verify that Formula (8) holds for every vector x in R 2 .

T

(a) Find the matrix for T with respect to the basis B = {v1 , v2 , v3 }, where

(c) Find a formula for T

! x1 . x2

(d) Use the formula obtained in (c) to compute T

⎡ ⎤

1 2 3 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ v1 = ⎣1⎦, v2 = ⎣2⎦, v3 = ⎣0⎦ 1 0 0 (b) Verify that Formula (5) holds for every vector in R 2 .



1 1



3 −2

1

0

6 0

2 7

1⎦ be the matrix for 1

⎢ 10. Let A = ⎣ 1 −3

!



T : R 4 →R 3 relative to the bases B = {v1 , v2 , v3 , v4 } and

.


B = {w1 , w2 , w3 }, where ⎡ ⎤ ⎡ 0

2





⎡ ⎤



1

6

⎢1⎥ ⎢ 1⎥ ⎢ 4⎥ ⎢9⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ v1 = ⎢ ⎣1⎦, v2 = ⎣−1⎦, v3 = ⎣−1⎦, v4 = ⎣4⎦ 1 2 2 −1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −7 −6 0 ⎦ ⎦ ⎣ ⎣ ⎣ 8 8 9⎦ , w2 = , w3 = w1 = 8

1

14. Let B = {v1 , v2 , v3 , v4 } be a basis for a vector space V. Find the matrix with respect to B for the linear operator T : V →V defined by T(v1 ) = v2 , T(v2 ) = v3 , T(v3 ) = v4 , T(v4 ) = v1 . 15. Let T : P2 →M22 be the linear transformation defined by



T (p) =

1

(a) Find [T(v1 )]B , [T(v2 )]B , [T(v3 )]B , and [T(v4 )]B .

⎛⎡ ⎤⎞ x1 ⎜⎢x2 ⎥⎟ ⎢ ⎥⎟ (c) Find a formula for T ⎜ ⎝⎣x3 ⎦⎠. x4



p(0)

p(1)

p(−1) p(0)

let B be the standard basis for M22 , and let B = {1, x, x 2 }, B

= {1, 1 + x, 1 + x 2 } be bases for P2 .

(b) Find T(v1 ), T(v2 ), T(v3 ), and T(v4 ).

(a) Find [T ]B,B and [T ]B,B

.

⎛⎡ ⎤⎞ 2

⎜⎢2⎥⎟ ⎢ ⎥⎟ (d) Use the formula obtained in (c) to compute T ⎜ ⎝⎣0⎦⎠. ⎡

(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).



0

1 3 −1 0 5⎦ be the matrix for T : P2 →P2 with 11. Let A = ⎣2 6 −2 4 respect to the basis B = {v1 , v2 , v3 }, where v1 = 3x + 3x 2 , v2 = −1 + 3x + 2x 2 , v3 = 3 + 7x + 2x 2 . (a) Find [T(v1 )]B , [T(v2 )]B , and [T(v3 )]B . (b) Find T(v1 ), T(v2 ), and T(v3 ). (c) Find a formula for T(a0 + a1 x + a2 x 2 ). (d) Use the formula obtained in (c) to compute T(1 + x 2 ). 12. Let T1 : P1 →P2 be the linear transformation defined by

T1 (p(x)) = xp(x) and let T2 : P2 →P2 be the linear operator defined by

T2 (p(x)) = p(2x + 1) Let B = {1, x} and B = {1, x, x 2 } be the standard bases for P1 and P2 . (a) Find [T2 ◦ T1 ]B ,B , [T2 ]B , and [T1 ]B ,B . (b) State a formula relating the matrices in part (a). (c) Verify that the matrices in part (a) satisfy the formula you stated in part (b). 13. Let T1 : P1 →P2 be the linear transformation defined by

T1 (c0 + c1 x) = 2c0 − 3c1 x and let T2 : P2 →P3 be the linear transformation defined by

T2 (c0 + c1 x + c2 x 2 ) = 3c0 x + 3c1 x 2 + 3c2 x 3 Let B = {1, x}, B

= {1, x, x 2 }, and B = {1, x, x 2 , x 3 }. (a) Find [T2 ◦ T1 ]B ,B , [T2 ]B ,B

, and [T1 ]B

,B . (b) State a formula relating the matrices in part (a).

(b) For the matrices obtained in part (a), compute T(2 + 2x + x 2 ) using the three-step procedure illustrated in Example 2. (c) Check the results obtained in part (b) by computing T(2 + 2x + x 2 ) directly. 16. Let T : M22 →R 2 be the linear transformation given by

T

  a b c d



a+b+c

=



d

and let B be the standard basis for M22 , B the standard basis for R 2 , and    3 1 −1



B =

1

,

0

(a) Find [T ]B ,B and [T ]B

,B .



(b) Compute T

1 2



using the three-step procedure 3 4 that was illustrated in Example 2 for both matrices found in part (a).

(c) Check the results obtained in part (b) by computing $T\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right)$ directly.

17. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation operator $D(p) = p'(x)$.
(a) Find the matrix for D relative to the basis $B = \{p_1, p_2, p_3\}$ for $P_2$ in which $p_1 = 1$, $p_2 = x$, $p_3 = x^2$.
(b) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

18. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation operator $D(p) = p'(x)$.
(a) Find the matrix for D relative to the basis $B = \{p_1, p_2, p_3\}$ for $P_2$ in which $p_1 = 2$, $p_2 = 2 - 3x$, $p_3 = 2 - 3x + 8x^2$.
(b) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

19. (Calculus required) Let V be the vector space of real-valued functions defined on the interval $(-\infty, \infty)$, and let $D: V \to V$ be the differentiation operator.
(a) Find the matrix for D relative to the basis $B = \{f_1, f_2, f_3\}$ for V in which $f_1 = 1$, $f_2 = \sin x$, $f_3 = \cos x$


23. Prove that if B and B are the standard bases for R n and R m , respectively, then the matrix for a linear transformation T : R n →R m relative to the bases B and B is the standard matrix for T .

(b) Use the matrix in part (a) to compute

D(2 + 3 sin x − 4 cos x) 20. Let V be a four-dimensional vector space with basis B , let W be a seven-dimensional vector space with basis B , and let T : V →W be a linear transformation. Identify the four vector spaces that contain the vectors at the corners of the accompanying diagram.

x

Direct computation

(1)

True-False Exercises

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

(a) If the matrix of a linear transformation $T: V \to W$ relative to some bases of V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that T(x) = 2x.

T(x) (3)

[x]B

Multiply by [T]B´, B (2)


[T(x)]B´

(b) If the matrix of a linear transformation $T: V \to W$ relative to bases for V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that T(x) = 4x.

Figure Ex-20

21. In each part, fill in the missing part of the equation.

(c) If the matrix of a linear transformation $T: V \to W$ relative to certain bases for V and W is $\begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$, then T is one-to-one.

(a) [T2 ◦ T1 ]B ,B = [T2 ] ? [T1 ]B

,B (b) [T3 ◦ T2 ◦ T1 ]B ,B = [T3 ] ? [T2 ]B

,B

[T1 ]B

,B

(d) If S : V →V and T : V →V are linear operators and B is a basis for V, then the matrix of S ◦ T relative to B is [T ]B [S]B .

Working with Proofs 22. Prove that if T : V →W is the zero transformation, then the matrix for T with respect to any bases for V and W is a zero matrix.

(e) If $T: V \to V$ is an invertible linear operator and B is a basis for V, then the matrix for $T^{-1}$ relative to B is $[T]_B^{-1}$.

8.5 Similarity

The matrix for a linear operator $T: V \to V$ depends on the basis selected for V. One of the fundamental problems of linear algebra is to choose a basis for V that makes the matrix for T as simple as possible (a diagonal or a triangular matrix, for example). In this section we will study this problem.

Simple Matrices for Linear Operators

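Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix operator $T: R^2 \to R^2$ whose matrix relative to the standard basis $B = \{e_1, e_2\}$ for $R^2$ is

\[ [T]_B = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \tag{1} \]

Let us compare this matrix to the matrix $[T]_{B'}$ for the same operator T but relative to the basis $B' = \{u'_1, u'_2\}$ for $R^2$ in which

\[ u'_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u'_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{2} \]

Since

\[ T(u'_1) = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2u'_1 \quad \text{and} \quad T(u'_2) = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3u'_2 \]

it follows that

\[ [T(u'_1)]_{B'} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \text{and} \quad [T(u'_2)]_{B'} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \]

so the matrix for T relative to the basis $B'$ is

\[ [T]_{B'} = \big[\, [T(u'_1)]_{B'} \mid [T(u'_2)]_{B'} \,\big] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]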
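The computation above can be checked numerically. The following is a minimal sketch in plain Python, not part of the original text; the helper names `apply` and `coords` are illustrative assumptions. It applies the matrix in Formula (1) to the basis vectors in Formula (2) and then expresses the images in coordinates relative to the new basis.

```python
from fractions import Fraction

# Matrix of T relative to the standard basis, Formula (1).
T_std = [[1, 1], [-2, 4]]

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def coords(v, b1, b2):
    """Coordinates of v relative to the basis {b1, b2} of R^2 (Cramer's rule)."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return [c1, c2]

u1 = [1, 1]  # u'_1 from Formula (2)
u2 = [1, 2]  # u'_2 from Formula (2)

Tu1 = apply(T_std, u1)  # image of u'_1 in standard coordinates
Tu2 = apply(T_std, u2)  # image of u'_2 in standard coordinates

# The columns of the new matrix are the coordinate vectors of the images.
col1 = coords(Tu1, u1, u2)
col2 = coords(Tu2, u1, u2)
T_new = [[col1[0], col2[0]], [col1[1], col2[1]]]

print(Tu1, Tu2)
print(T_new)
```

Exact fractions are used instead of floating point so that the diagonal form is reproduced without roundoff.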

This matrix, being diagonal, has a simpler form than $[T]_B$ and conveys clearly that the operator T scales $u'_1$ by a factor of 2 and $u'_2$ by a factor of 3, information that is not immediately evident from $[T]_B$.

One of the major themes in more advanced linear algebra courses is to determine the “simplest possible form” that can be obtained for the matrix of a linear operator by choosing the basis appropriately. Sometimes it is possible to obtain a diagonal matrix (as above, for example), whereas other times one must settle for a triangular matrix or some other form. We will only be able to touch on this important topic in this text.

The problem of finding a basis that produces the simplest possible matrix for a linear operator $T: V \to V$ can be attacked by first finding a matrix for T relative to any basis, typically a standard basis, where applicable, and then changing the basis in a way that simplifies the matrix. Before pursuing this idea, it will be helpful to revisit some concepts about changing bases.

A New View of Transition Matrices

Recall from Formulas (7) and (8) of Section 4.6 that if $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u'_1, u'_2, \ldots, u'_n\}$ are bases for a vector space V, then the transition matrices from B to $B'$ and from $B'$ to B are

\[ P_{B \to B'} = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] \tag{3} \]

\[ P_{B' \to B} = \big[\, [u'_1]_{B} \mid [u'_2]_{B} \mid \cdots \mid [u'_n]_{B} \,\big] \tag{4} \]

where the matrices $P_{B \to B'}$ and $P_{B' \to B}$ are inverses of each other. We also showed in Formulas (11) and (12) of that section that if v is any vector in V, then

\[ P_{B \to B'} [v]_B = [v]_{B'} \tag{5} \]

\[ P_{B' \to B} [v]_{B'} = [v]_B \tag{6} \]
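As a concrete illustration of Formulas (5) and (6), here is a small sketch in plain Python, assuming B is the standard basis of $R^2$ and $B'$ is the basis from Formula (2). The coordinate vector `v_new` is a hypothetical value chosen only for the demonstration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Basis B' = {u'_1, u'_2} in standard coordinates (B is the standard basis).
u1, u2 = [1, 1], [1, 2]

# P_{B'->B}: its columns are [u'_1]_B and [u'_2]_B, as in Formula (4).
P_new_to_std = [[u1[0], u2[0]], [u1[1], u2[1]]]

# P_{B->B'} is the inverse matrix; the product below confirms this.
P_std_to_new = [[2, -1], [-1, 1]]
identity_check = mat_mul(P_std_to_new, P_new_to_std)

v_new = [3, -4]                           # hypothetical [v]_{B'}
v_std = mat_vec(P_new_to_std, v_new)      # Formula (6): [v]_B
v_back = mat_vec(P_std_to_new, v_std)     # Formula (5): back to [v]_{B'}

print(identity_check, v_std, v_back)
```

The vector `v_std` agrees with forming $3u'_1 - 4u'_2$ directly, which is what a coordinate vector relative to $B'$ means.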

The following theorem shows that transition matrices in Formulas (3) and (4) can be viewed as matrices for identity operators.

THEOREM 8.5.1 If B and $B'$ are bases for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then

\[ P_{B \to B'} = [I]_{B',B} \quad \text{and} \quad P_{B' \to B} = [I]_{B,B'} \]

Proof Suppose that $B = \{u_1, u_2, \ldots, u_n\}$ and $B' = \{u'_1, u'_2, \ldots, u'_n\}$ are bases for V. Using the fact that $I(v) = v$ for all v in V, it follows from Formula (4) of Section 8.4 that

\[ [I]_{B',B} = \big[\, [I(u_1)]_{B'} \mid [I(u_2)]_{B'} \mid \cdots \mid [I(u_n)]_{B'} \,\big] = \big[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\big] = P_{B \to B'} \quad \text{[Formula (3) above]} \]

The proof that $[I]_{B,B'} = P_{B' \to B}$ is similar.

Effect of Changing Bases on Matrices of Linear Operators

We are now ready to consider the main problem in this section.

Problem If B and $B'$ are two bases for a finite-dimensional vector space V, and if $T: V \to V$ is a linear operator, what relationship, if any, exists between the matrices $[T]_B$ and $[T]_{B'}$?

The answer to this question can be obtained by considering the composition of the three linear operators on V pictured in Figure 8.5.1.

[Figure 8.5.1: the chain $v \to v \to T(v) \to T(v)$ through four copies of V under the maps I, T, I; the first and last copies carry the basis $B'$ and the middle two carry the basis B.]

In this figure, v is first mapped into itself by the identity operator, then v is mapped into T(v) by T, and then T(v) is mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, V), but the bases for the spaces vary. Since the starting vector is v and the final vector is T(v), the composition produces the same result as applying T directly; that is,

\[ T = I \circ T \circ I \tag{7} \]

If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis $B'$ and the middle two spaces are assigned the basis B, then it follows from (7) and Formula (12) of Section 8.4 (with an appropriate adjustment to the names of the bases) that

\[ [T]_{B',B'} = [I \circ T \circ I]_{B',B'} = [I]_{B',B}\,[T]_{B,B}\,[I]_{B,B'} \tag{8} \]

or, in simpler notation,

\[ [T]_{B'} = [I]_{B',B}\,[T]_{B}\,[I]_{B,B'} \tag{9} \]

We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

\[ [T]_{B'} = P_{B \to B'}\,[T]_{B}\,P_{B' \to B} \tag{10} \]

In summary, we have the following theorem.

THEOREM 8.5.2 Let $T: V \to V$ be a linear operator on a finite-dimensional vector space V, and let B and $B'$ be bases for V. Then

\[ [T]_{B'} = P^{-1}[T]_B P \tag{11} \]

where $P = P_{B' \to B}$ and $P^{-1} = P_{B \to B'}$.

Warning When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B' \to B}$ (correct) or $P = P_{B \to B'}$ (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

[Figure 8.5.2: $[T]_{B'} = P_{B \to B'}\,[T]_B\,P_{B' \to B}$, with the exterior subscripts $B'$ marked.]
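Formula (11) and the warning that follows it can be explored with a short sketch. The code below is an illustration in plain Python with exact fractions; the function names `inverse` and `change_basis` are assumptions made for this example, and the test data are the matrix from Formula (1) together with the transition matrix whose columns are the basis vectors in Formula (2).

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Assumes A is invertible; a singular A makes the pivot search fail.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def change_basis(T_B, P):
    """Return P^{-1} T_B P, where P is the transition matrix from B' to B."""
    return mat_mul(mat_mul(inverse(P), T_B), P)

T_B = [[1, 1], [-2, 4]]   # matrix relative to the standard basis B
P = [[1, 1], [1, 2]]      # P_{B'->B}: columns are u'_1 and u'_2

correct = change_basis(T_B, P)            # uses P = P_{B'->B}
wrong = change_basis(T_B, inverse(P))     # uses P_{B->B'} by mistake

print(correct)
print(wrong)
```

With the correct transition matrix the result is diagonal; with the transition matrix in the wrong direction the result is still similar to the original matrix but is not the matrix of T relative to $B'$.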

In the terminology of Definition 1 of Section 5.2, Theorem 8.5.2 tells us that matrices representing the same linear operator relative to different bases must be similar. The following theorem, which we state without proof, shows that the converse of Theorem 8.5.2 is also true.

THEOREM 8.5.3 If V is a finite-dimensional vector space, then two matrices A and B represent the same linear operator (but possibly with respect to different bases) if and only if they are similar. Moreover, if $B = P^{-1}AP$, then P is the transition matrix from the basis used for B to the basis used for A.

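EXAMPLE 1 Similar Matrices Represent the Same Linear Operator

We showed at the beginning of this section that the matrices

\[ C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by finding a matrix P for which $D = P^{-1}CP$.

Solution We need to find the transition matrix

\[ P = P_{B' \to B} = \big[\, [u'_1]_B \mid [u'_2]_B \,\big] \]

where $B' = \{u'_1, u'_2\}$ is the basis for $R^2$ given by (2) and $B = \{e_1, e_2\}$ is the standard basis for $R^2$. We see by inspection that

\[ u'_1 = e_1 + e_2, \qquad u'_2 = e_1 + 2e_2 \]

from which it follows that

\[ [u'_1]_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{and} \quad [u'_2]_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

Thus,

\[ P = P_{B' \to B} = \big[\, [u'_1]_B \mid [u'_2]_B \,\big] = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \]

We leave it for you to verify that

\[ P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \]

and hence that

\[ \underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D} = \underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}}_{P^{-1}} \underbrace{\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}}_{P} \]

Similarity Invariants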

Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all similar matrices. In Table 1 of that section we listed the most important similarity invariants. Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator $T: V \to V$, it follows that if B and $B'$ are bases for V, then every similarity invariant property of $[T]_B$ is also a similarity invariant property of $[T]_{B'}$. For example, for any two bases B and $B'$ we must have

\[ \det[T]_B = \det[T]_{B'} \]

It follows from this equation that the value of the determinant depends on T, but not on the particular basis that is used to represent T in matrix form. Thus, the determinant can be regarded as a property of the linear operator T, and we can define the determinant of the linear operator T to be

\[ \det(T) = \det[T]_B \tag{12} \]

where B is any basis for V. Table 1 lists the basic similarity invariants of a linear operator $T: V \to V$.
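The basis independence in Formula (12) can be spot-checked on the two matrices from the beginning of this section. The sketch below is plain Python; the helper names are illustrative assumptions. It compares the determinant, the trace, and the coefficients of the characteristic polynomial $\lambda^2 - \operatorname{tr}(M)\lambda + \det(M)$ of a 2 × 2 matrix M.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix stored as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of det(lambda*I - M) for a 2 x 2 matrix."""
    return (1, -trace(M), det2(M))

T_B = [[1, 1], [-2, 4]]     # matrix of T relative to the standard basis
T_Bprime = [[2, 0], [0, 3]]  # matrix of the same T relative to B'

for name, M in (("[T]_B", T_B), ("[T]_B'", T_Bprime)):
    print(name, "det =", det2(M), "trace =", trace(M),
          "char poly =", char_poly_2x2(M))
```

Agreement of these quantities is necessary for similarity but does not by itself prove it; here similarity is already known because both matrices represent the same operator.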

Table 1 Similarity Invariants

Determinant: $[T]_B$ and $P^{-1}[T]_B P$ have the same determinant.
Invertibility: $[T]_B$ is invertible if and only if $P^{-1}[T]_B P$ is invertible.
Rank: $[T]_B$ and $P^{-1}[T]_B P$ have the same rank.
Nullity: $[T]_B$ and $P^{-1}[T]_B P$ have the same nullity.
Trace: $[T]_B$ and $P^{-1}[T]_B P$ have the same trace.
Characteristic polynomial: $[T]_B$ and $P^{-1}[T]_B P$ have the same characteristic polynomial.
Eigenvalues: $[T]_B$ and $P^{-1}[T]_B P$ have the same eigenvalues.
Eigenspace dimension: If λ is an eigenvalue of $[T]_B$ and $P^{-1}[T]_B P$, then the eigenspace of $[T]_B$ corresponding to λ and the eigenspace of $P^{-1}[T]_B P$ corresponding to λ have the same dimension.

EXAMPLE 2 Determinant of a Linear Operator

At the beginning of this section we showed that the matrices

\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad [T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

represent the same linear operator relative to different bases, the first relative to the standard basis $B = \{e_1, e_2\}$ for $R^2$ and the second relative to the basis $B' = \{u'_1, u'_2\}$ for which

\[ u'_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u'_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]

This means that $[T]$ and $[T]_{B'}$ must be similar matrices and hence must have the same similarity invariant properties. In particular, they must have the same determinant. We leave it for you to verify that

\[ \det[T] = \begin{vmatrix} 1 & 1 \\ -2 & 4 \end{vmatrix} = 6 \quad \text{and} \quad \det[T]_{B'} = \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} = 6 \]

EXAMPLE 3 Eigenvalues of a Linear Operator

Find the eigenvalues of the linear operator $T: P_2 \to P_2$ defined by

\[ T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2 \]

Solution Because eigenvalues are similarity invariants, we can find the eigenvalues of T by choosing any basis B for $P_2$ and computing the eigenvalues of the matrix $[T]_B$. We leave it for you to show that the matrix for T relative to the standard basis $B = \{1, x, x^2\}$ is

\[ [T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \]

Thus, the eigenvalues of T are λ = 1 and λ = 2 (Example 7 of Section 5.1).
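The step left to the reader in Example 3 can be sketched as follows. This is an illustrative check in plain Python rather than a general eigenvalue routine: polynomials $a + bx + cx^2$ are stored as coefficient triples (an assumption of this sketch), the columns of the matrix are the images of 1, x, and $x^2$, and the characteristic polynomial $\det(\lambda I - [T]_B)$ is evaluated at a few values of λ.

```python
def T(p):
    """Apply T to p = (a, b, c), the coefficients of a + b*x + c*x^2."""
    a, b, c = p
    return (-2 * c, a + 2 * b + c, a + 3 * c)

# Standard basis 1, x, x^2 as coefficient triples.
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
images = [T(p) for p in basis]

# Column j of the matrix is the coordinate vector of T(basis[j]).
A = [[images[j][i] for j in range(3)] for i in range(3)]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 1."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def char_poly_at(lam):
    """Value of det(lam*I - A) at the scalar lam."""
    return det3([[lam * (i == j) - A[i][j] for j in range(3)]
                 for i in range(3)])

print(A)
print([char_poly_at(lam) for lam in (1, 2, 3)])
```

The characteristic polynomial vanishes at λ = 1 and λ = 2 and not at λ = 3, in agreement with the eigenvalues stated in the example.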


Exercise Set 8.5

In Exercises 1–2, use a property from Table 1 to show that the matrices A and B are not similar.



1. (a) A =

1

3

1

1

1

1

1

2

 (b) A =





1

⎢ 2. (a) A = ⎣1



, B=



2

1

1



, B=

1

1

1



0

0

−1 1

1

⎢ ⎥ 0⎦, B = ⎣1

1

0

0

1

0

1

(b) A = ⎣0 0

⎡ ⎢

and B = {u1 , u2 } and B = {v1 , v2 }, where





1



u1 = 1



1

⎥ 0⎦

1

1

0

0

0

1

1

0⎦, B = ⎣0

0

1⎦

1

0

0

0









1

[T ]B =

0

1

1

10. T : R 3 →R 3 is defined by



and PB→B =

3

2

1

1



T(x1 , x2 , x3 ) = (x1 + 2x2 − x3 , −x2 , x1 + 7x3 ) B is the standard basis, and B = {v1 , v2 , v3 }, where v1 = (1, 0, 0), v2 = (1, 1, 0), v3 = (1, 1, 1) 11. T : R →R 2 is the rotation about the origin through an angle of 45◦ , B is the standard basis, and B = {v1 , v2 }, where 2

4. Let T : R 2 →R 2 be a linear operator, and let B and B be bases for R 2 for which

[T ]B =

3

2

−1

1



and PB →B =

4

5

1

−1



v1 =

[T ]B =

0

1

1





and PB→B =

3

2

1

1



[T ]B =

2

−1

1





and PB →B =



4

5

1

−1

In Exercises 7–14, find the matrix for T relative to the basis B , and use Theorem 8.5.2 to compute the matrix for T relative to the basis B . 7. T : R 2 →R 2 is defined by

! x1 − 2x2 x1 = x2 −x2

and B = {u1 , u2 } and B = {v1 , v2 }, where





1 0 u1 = , u2 = ; 0 1



(

2

' , v2 = − √12 ,

√1

(

2

13. T : P1 →P1 is defined by

T(a0 + a1 x) = −a0 + (a0 + a1 )x

q1 = x + 1, q2 = x − 1

Find the matrix for T relative to the basis B .

T

√1

B is the standard basis for P1 , and B = {q1 , q2 }, where

6. Let T : R 2 →R 2 be a linear operator, and let B and B be bases for R 2 for which 3

, 2

2

Find the matrix for T relative to the basis B.



√1

v1 = (k, 1), v2 = (1, 0)

5. Let T : R 2 →R 2 be a linear operator, and let B and B be bases for R 2 for which 2

'

12. T : R →R is the shear in the x -direction by a positive factor k , B is the standard basis, and B = {v1 , v2 }, where 2

Find the matrix for T relative to the basis B .





18 10 , v2 = 8 5

v1 = (−2, 1, 0), v2 = (−1, 0, 1), v3 = (0, 1, 0)





v1 =

B is the standard basis, and B = {v1 , v2 , v3 }, where

Find the matrix for T relative to the basis B .





T(x1 , x2 , x3 ) = (−2x1 − x2 , x1 + x3 , x2 )





2



2 4 , u2 = ; 2 −1

9. T : R 3 →R 3 is defined by

3. Let T : R 2 →R 2 be a linear operator, and let B and B be bases for R 2 for which



! x1 + 7x2 x1 = x2 3x1 − 4x2

T

−1



8. T : R 2 →R 2 is defined by

14. T : P1 →P1 is defined by T(a0 + a1 x) = a0 + a1 (x + 1), and B = {p1 , p2 } and B = {q1 , q2 }, where p1 = 6 + 3x, p2 = 10 + 2x;

q1 = 2 , q2 = 3 + 2 x

15. Let T : P2 →P2 be defined by

T(a0 + a1 x + a2 x 2 ) = (5a0 + 6a1 + 2a2 ) − (a1 + 8a2 )x + (a0 − 2a2 )x 2 (a) Find the eigenvalues of T . (b) Find bases for the eigenspaces of T . 16. Let T : M22 →M22 be defined by



T

4 7 , v2 = v1 = 1 2

a c

b d

!



=

2c b − 2c

(a) Find the eigenvalues of T . (b) Find bases for the eigenspaces of T .

a+c d


17. Since the standard basis for R n is so simple, why would one want to represent a linear operator on R n in another basis?

30. (a) Prove that if A and B are similar matrices, then A2 and B 2 are also similar.

18. Find two nonzero 2 × 2 matrices (different from those in Exercise 1) that are not similar, and explain why they are not.

(b) If A2 and B 2 are similar, must A and B be similar? Explain.

In Exercises 19–21, find the determinant and the eigenvalues of the linear operator T .

31. Let C and D be m × n matrices, and let B = {v1 , v2 , . . . , vn } be a basis for a vector space V. Prove that if C[x]B = D[x]B for all x in V, then C = D .

19. T : R 2 →R 2 , where T(x1 , x2 ) = (3x1 − 4x2 , −x1 + 7x2 )

True-False Exercises

20. T : R 3 →R 3 , where

T(x1 , x2 , x3 ) = (x1 − x2 , x2 − x3 , x3 − x1 )

21. T : P2 →P2 , where

TF. In parts (a)–(h) determine whether the statement is true or false, and justify your answer. (a) A matrix cannot be similar to itself.

T(p(x)) = p(x − 1)

22. Let $T: P_4 \to P_4$ be the linear operator given by the formula T(p(x)) = p(2x + 1).
(a) Find a matrix for T relative to some convenient basis, and then use it to find the rank and nullity of T.
(b) Use the result in part (a) to determine whether T is one-to-one.

Working with Proofs 23. Complete the proof below by justifying each step. Hypothesis: A and B are similar matrices.

(b) If A is similar to B , and B is similar to C , then A is similar to C . (c) If A and B are similar and B is singular, then A is singular. (d) If A and B are invertible and similar, then A−1 and B −1 are similar. (e) If T1 : R n →R n and T2 : R n →R n are linear operators, and if [T1 ]B ,B = [T2 ]B ,B with respect to two bases B and B for R n , then T1 (x) = T2 (x) for every vector x in R n .

Conclusion: A and B have the same characteristic polynomial. (f ) If T1 : R n →R n is a linear operator, and if [T1 ]B = [T1 ]B with respect to two bases B and B for R n , then B = B . Proof: (1) det(λI − B) = det(λI − P −1AP ) (2)

= det(λP −1 P − P −1AP )

(3)

= det(P −1 (λI − A)P )

(4)

= det(P −1 ) det(λI − A) det(P )

(5)

= det(P −1 ) det(P ) det(λI − A)

(6)

= det(λI − A)

24. If A and B are similar matrices, say B = P −1AP , then it follows from Exercise 23 that A and B have the same eigenvalues. Suppose that λ is one of the common eigenvalues and x is a corresponding eigenvector of A. See if you can find an eigenvector of B corresponding to λ (expressed in terms of λ, x, and P ). In Exercises 25–28, prove that the stated property is a similarity invariant. 25. Trace

26. Rank

27. Nullity

28. Invertibility

29. Let λ be an eigenvalue of a linear operator T : V →V. Prove that the eigenvectors of T corresponding to λ are the nonzero vectors in the kernel of λI − T .

(g) If T : R n →R n is a linear operator, and if [T ]B = In with respect to some basis B for R n , then T is the identity operator on R n . (h) If T : R n →R n is a linear operator, and if [T ]B ,B = In with respect to two bases B and B for R n , then T is the identity operator on R n .

Working with Technology

T1. Use the matrices A and P given below to construct a matrix $B = P^{-1}AP$ that is similar to A, and confirm, in accordance with Table 1, that A and B have the same determinant, trace, rank, characteristic equation, and eigenvalues.



−13

⎢ A = ⎣ 10 −5

−60 42

−20

⎤ ⎡ −60 1 ⎥ ⎢ 40⎦ and P = ⎣ 2 −18 −1

−1

1



−1

⎥ −1 ⎦

−1

0

T2. Let T : R 3 →R 3 be the linear transformation whose standard matrix is the matrix A in Exercise T1. Find a basis S for R 3 for which [T ]S is diagonal.


Chapter 8 Supplementary Exercises

1. Let A be an n × n matrix, B a nonzero n × 1 matrix, and x a vector in $R^n$ expressed in matrix notation. Is T(x) = Ax + B a linear operator on $R^n$? Justify your answer.



2. Let

cos θ A= sin θ

(a) Show that



A2 =

cos 2θ sin 2θ

− sin 2θ cos 2θ

− sin θ cos θ

(T1 + T2 )(x) = T1 (x) + T2 (x) (kT )(x) = k(T(x))



and A3 =

− sin 3θ cos 3θ

cos 3θ sin 3θ

(b) Based on your answer to part (a), make a guess at the form of the matrix An for any positive integer n. (c) By considering the geometric effect of multiplication by A, obtain the result in part (b) geometrically. 3. Devise a method for finding two n × n matrices that are not similar. Use your method to find two 3 × 3 matrices that are not similar. 4. Let v1 , v2 , . . . , vm be fixed vectors in R n , and let T : R n →R m be the function defined by T(x) = (x · v1 , x · v2 , . . . , x · vm ), where x · vi is the Euclidean inner product on R n . (a) Show that T is a linear transformation. (b) Show that the matrix with row vectors v1 , v2 , . . . , vm is the standard matrix for T . 5. Let {e1 , e2 , e3 , e4 } be the standard basis for R 4 , and let T : R 4 →R 3 be the linear transformation for which

T(e1 ) = (1, 2, 1), T(e3 ) = (1, 3, 0),

T(e2 ) = (0, 1, 0), T(e4 ) = (1, 1, 1)

(b) Find the rank and nullity of T . 6. Suppose that vectors in R 3 are denoted by 1 × 3 matrices, and define T : R 3 →R 3 by



−1

⎢ T([x1 x2 x3 ]) = [x1 x2 x3 ] ⎣ 3 2



2

4

0

1⎦

2

5



(a) Find a basis for the kernel of T . (b) Find a basis for the range of T . 7. Let B = {v1 , v2 , v3 , v4 } be a basis for a vector space V, and let T : V →V be the linear operator for which

T(v1 ) = v1 + v2 + v3 + 3v4 T(v2 ) = v1 − v2 + 2v3 + 2v4 T(v3 ) = 2v1 − 4v2 + 5v3 + 3v4 T(v4 ) = −2v1 + 6v2 − 6v3 − 2v4 (b) Determine whether T is one-to-one.

(a) Show that (T1 + T2 ): V →W and kT : V →W are both linear transformations. (b) Show that the set of all linear transformations from V to W with the operations in part (a) is a vector space. 9. Let A and B be similar matrices. Prove: (a) AT and B T are similar. (b) If A and B are invertible, then A−1 and B −1 are similar. 10. (Fredholm Alternative Theorem) Let T : V →V be a linear operator on an n-dimensional vector space. Prove that exactly one of the following statements holds: (i) The equation T(x) = b has a solution for all vectors b in V. (ii) Nullity of T > 0. 11. Let T : M22 →M22 be the linear operator defined by



T(X) =

1 0



1 0 X+X 0 1

0 1

Find the rank and nullity of T . 12. Prove: If A and B are similar matrices, and if B and C are also similar matrices, then A and C are similar matrices. 13. Let L: M22 →M22 be the linear operator that is defined by L(M) = M T . Find the matrix for L with respect to the standard basis for M22 .

(a) Find bases for the range and kernel of T .

(a) Find the rank and nullity of T .

8. Let V and W be vector spaces, let T , T1 , and T2 be linear transformations from V to W , and let k be a scalar. Define new transformations, T1 + T2 and kT , by the formulas

14. Let B = {u1 , u2 , u3 } and B = {v1 , v2 , v3 } be bases for a vector space V, and let





−1

2 P = ⎣1 0

3 4⎦ 2

1 1

be the transition matrix from B to B . (a) Express v1 , v2 , v3 as linear combinations of u1 , u2 , u3 . (b) Express u1 , u2 , u3 as linear combinations of v1 , v2 , v3 . 15. Let B = {u1 , u2 , u3 } be a basis for a vector space V, and let T : V →V be a linear operator for which



−3

[T ]B = ⎣ 1 0

4 0 1



7 −2 ⎦ 0

Find [T ]B , where B = {v1 , v2 , v3 } is the basis for V defined by v1 = u1 , v2 = u1 + u2 , v3 = u1 + u2 + u3


16. Show that the matrices



1 −1

1 4

are similar but that



3 −6

1 −2

and

(x − x2 )(x − x3 ) (x1 − x2 )(x1 − x3 ) (x − x1 )(x − x3 ) P2 (x) = (x2 − x1 )(x2 − x3 ) (x − x1 )(x − x2 ) P3 (x) = (x3 − x1 )(x3 − x2 ) P1 (x) =

1 3

and

where

2 1

−1

2 0

1


are not.

(d) What relationship exists between the graph of the function

17. Suppose that T : V →V is a linear operator, and B is a basis for V for which





x1 − x2 + x3 ⎥ x2 ⎦ if x1 − x3

⎢ [T(x)]B = ⎣

⎡ ⎤ x1 ⎢ ⎥ [x]B = ⎣x2 ⎦ x3

Find $[T]_B$.

18. Let $T: V \to V$ be a linear operator. Prove that T is one-to-one if and only if $\det(T) \neq 0$.

$a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x)$ and the points $(x_1, a_1)$, $(x_2, a_2)$, and $(x_3, a_3)$?

22. (Calculus required) Let p(x) and q(x) be continuous functions, and let V be the subspace of $C(-\infty, \infty)$ consisting of all twice differentiable functions. Define $L: V \to V$ by

\[ L(y(x)) = y''(x) + p(x)y'(x) + q(x)y(x) \]

(a) Show that L is a linear transformation.

19. (Calculus required) (a) Show that if f = f(x) is twice differentiable, then the function $D: C^2(-\infty, \infty) \to F(-\infty, \infty)$ defined by $D(f) = f''(x)$ is a linear transformation.

(b) Consider the special case where p(x) = 0 and q(x) = 1. Show that the function

φ(x) = c1 sin x + c2 cos x

(b) Find a basis for the kernel of D.
(c) Show that the set of functions satisfying the equation D(f) = f(x) is a two-dimensional subspace of $C^2(-\infty, \infty)$, and find a basis for this subspace.

20. Let $T: P_2 \to R^3$ be the function defined by the formula

⎤ p(−1) ⎥ ⎢ T(p(x)) = ⎣ p(0) ⎦ p(1) ⎡

(a) Find T(x + 5x + 6). 2

(b) Show that T is a linear transformation.

is in the kernel of L for all real values of c1 and c2 . 23. (Calculus required ) Let D : Pn →Pn be the differentiation operator D(p) = p . Show that the matrix for D relative to the basis B = {1, x, x 2 , . . . , x n } is



0

⎢0 ⎢ ⎢ ⎢0 ⎢. ⎢. ⎢. ⎢ ⎣0

(c) Show that T is one-to-one.

(e) Sketch the graph of the polynomial in part (d). 21. Let x1 , x2 , and x3 be distinct real numbers such that

x1 < x2 < x3 and let T : P2 →R 3 be the function defined by the formula

⎤ p(x1 ) ⎥ ⎢ T(p(x)) = ⎣p(x2 )⎦ p(x3 ) ⎡

0

0

2

0

0

0

3

0

0

0

0

0

0

.. .

.. .

··· ··· ··· ··· ···

0



0⎥ ⎥

⎥ .. ⎥ ⎥ .⎥ ⎥ n⎦ 0⎥

0

24. (Calculus required ) It can be shown that for any real number c, the vectors 1, x − c,

(x − c)2 ,..., 2!

(x − c)n n!

form a basis for Pn . Find the matrix for the differentiation operator of Exercise 23 with respect to this basis. 25. (Calculus required ) Let J : Pn →Pn+1 be the integration transformation defined by



(a) Show that T is a linear transformation. (b) Show that T is one-to-one.

0

.. .

0

(d) Find T −1 (0, 3, 0).

1

J (p) =

x

(a0 + a1 t + · · · + an t n ) dt

0

(c) Verify that if a1 , a2 , and a3 are any real numbers, then

⎛⎡ ⎤⎞ a1 −1 ⎜⎢ ⎥⎟ T ⎝⎣a2 ⎦⎠ = a1 P1 (x) + a2 P2 (x) + a3 P3 (x) a3

= a0 x +

a1 2

x2 + · · · +

an n+1 x n+1

where p = a0 + a1 x + · · · + an x n . Find the matrix for J with respect to the standard bases for Pn and Pn+1 .

CHAPTER 9 Numerical Methods

CHAPTER CONTENTS

9.1 LU-Decompositions
9.2 The Power Method
9.3 Comparison of Procedures for Solving Linear Systems
9.4 Singular Value Decomposition
9.5 Data Compression Using Singular Value Decomposition

INTRODUCTION

This chapter is concerned with “numerical methods” of linear algebra, an area of study that encompasses techniques for solving large-scale linear systems and for finding numerical approximations of various kinds. It is not our objective to discuss algorithms and technical issues in fine detail since there are many excellent books on the subject. Rather, we will be concerned with introducing some of the basic ideas and exploring two important contemporary applications that rely heavily on numerical ideas: singular value decomposition and data compression. A computing utility such as MATLAB, Mathematica, or Maple is recommended for Sections 9.2 to 9.5.

9.1 LU-Decompositions

Up to now, we have focused on two methods for solving linear systems, Gaussian elimination (reduction to row echelon form) and Gauss–Jordan elimination (reduction to reduced row echelon form). While these methods are fine for the small-scale problems in this text, they are not suitable for large-scale problems in which computer roundoff error, memory usage, and speed are concerns. In this section we will discuss a method for solving a linear system of n equations in n unknowns that is based on factoring its coefficient matrix into a product of lower and upper triangular matrices. This method, called “LU-decomposition,” is the basis for many computer algorithms in common use.

Solving Linear Systems by Factoring

Our first goal in this section is to show how to solve a linear system Ax = b of n equations in n unknowns by factoring the coefficient matrix A. We begin with some terminology.

DEFINITION 1 A factorization of a square matrix A as

\[ A = LU \tag{1} \]

where L is lower triangular and U is upper triangular, is called an LU-decomposition (or LU-factorization) of A.

Before we consider the problem of obtaining an LU-decomposition, we will explain how such decompositions can be used to solve linear systems, and we will give an illustrative example.

The Method of LU-Decomposition

Step 1. Rewrite the system Ax = b as

\[ LUx = b \tag{2} \]

Step 2. Define a new n × 1 matrix y by

\[ Ux = y \tag{3} \]

Step 3. Use (3) to rewrite (2) as Ly = b and solve this system for y.

Step 4. Substitute y in (3) and solve for x.

This procedure, which is illustrated in Figure 9.1.1, replaces the single linear system Ax = b by a pair of linear systems

\[ Ux = y, \qquad Ly = b \]

that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it generally turns out to involve no more computation to solve the two systems than to solve the original system directly.

[Figure 9.1.1: solving Ax = b directly takes b to x; the LU route first solves Ly = b for y and then solves Ux = y for x.]
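Steps 3 and 4 of the method can be sketched in a few lines. The code below is a minimal illustration in plain Python with exact fractions, assuming L and U are already known and have nonzero diagonal entries; the function names and the 2 × 2 test data are hypothetical and are not taken from the text.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    y = []
    for i, row in enumerate(L):
        known = sum(row[j] * y[j] for j in range(i))
        y.append(Fraction(b[i] - known) / row[i])
    return y

def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(y[i] - known) / U[i][i]
    return x

def lu_solve(L, U, b):
    """Solve LUx = b: first Ly = b (Step 3), then Ux = y (Step 4)."""
    return back_substitution(U, forward_substitution(L, b))

# Hypothetical data: A = LU = [[2, 8], [1, 7]].
L = [[2, 0], [1, 3]]
U = [[1, 4], [0, 1]]
b = [2, 10]

y = forward_substitution(L, b)
x = lu_solve(L, U, b)
print(y, x)
```

Each triangular solve touches every entry of its matrix at most once, which is why the pair of systems costs no more than solving the original system directly.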

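EXAMPLE 1 Solving Ax = b by LU-Decomposition

Later in this section we will derive the factorization

\[ \underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \tag{4} \]

Use this result to solve the linear system

\[ \underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b} \]

From (4) we can rewrite this system as

\[ \underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b} \tag{5} \]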

Historical Note In 1979 an important library of machine-independent linear algebra programs called LINPACK was developed at Argonne National Laboratories. Many of the programs in that library use the decomposition methods that we will study in this section. Variations of the LINPACK routines are used in many computer programs, including MATLAB, Mathematica, and Maple.


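As specified in Step 2 above, let us define y1, y2, and y3 by the equation

\[
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{y}
\tag{6}
\]

which allows us to rewrite (5) as

\[
\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L}
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{y}
=
\underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{b}
\tag{7}
\]

or equivalently as

\[
\begin{aligned}
2y_1 &= 2 \\
-3y_1 + y_2 &= 2 \\
4y_1 - 3y_2 + 7y_3 &= 3
\end{aligned}
\]

This system can be solved by a procedure that is similar to back substitution, except that we solve the equations from the top down instead of from the bottom up. This procedure, called forward substitution, yields

y1 = 1, y2 = 5, y3 = 2 (verify).

As indicated in Step 4 above, we substitute these values into (6), which yields the linear system

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}
\]

or, equivalently,

\[
\begin{aligned}
x_1 + 3x_2 + x_3 &= 1 \\
x_2 + 3x_3 &= 5 \\
x_3 &= 2
\end{aligned}
\]

Solving this system by back substitution yields

x1 = 2, x2 = −1, x3 = 2 (verify).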
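The two triangular solves in Example 1 can be checked mechanically. The following is a minimal sketch, not part of the text: the function names `forward_substitution` and `back_substitution` are illustrative choices, and exact rational arithmetic (`fractions.Fraction`) is assumed so that the results match the hand computation. Both functions assume nonzero diagonal entries.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, working from the top row down."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x

# Factors and right-hand side from Example 1.
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
b = [2, 2, 3]

y = forward_substitution(L, b)   # Step 3: solve Ly = b
x = back_substitution(U, y)      # Step 4: solve Ux = y
print([int(v) for v in y], [int(v) for v in x])
```

Because L and U do not depend on b, the same two calls can be repeated for a different right-hand side without refactoring A.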

Alan Mathison Turing (1912–1954)

Historical Note Although the ideas were known earlier, credit for popularizing the matrix formulation of the LU-decomposition is often given to the British mathematician Alan Turing for his work on the subject in 1948. Turing, one of the great geniuses of the twentieth century, is the founder of the field of artificial intelligence. Among his many accomplishments in that field, he developed the concept of an internally programmed computer before the practical technology had reached the point where the construction of such a machine was possible. During World War II Turing was secretly recruited by the British government’s Code and Cypher School at Bletchley Park to help break the Nazi Enigma codes; it was Turing’s statistical approach that provided the breakthrough. In addition to being a brilliant mathematician, Turing was a world-class runner who competed successfully with Olympic-level competition. Sadly, Turing, a homosexual, was tried and convicted of “gross indecency” in 1952, in violation of the then-existing British statutes. Depressed, he committed suicide at age 41 by eating an apple laced with cyanide.


Finding LU-Decompositions

The preceding example illustrates that once an LU-decomposition of A is obtained, a linear system Ax = b can be solved by one forward substitution and one backward substitution. The main advantage of this method over Gaussian and Gauss–Jordan elimination is that it “decouples” A from b so that for solving a sequence of linear systems with the same coefficient matrix A, say

\[ Ax = b_1, \quad Ax = b_2, \quad \ldots, \quad Ax = b_k \]

the work in factoring A need only be performed once, after which it can be reused for each system in the sequence. Such sequences occur in problems in which the matrix A remains fixed but the matrix b varies with time.

Not every square matrix has an LU-decomposition. However, if it is possible to reduce a square matrix A to row echelon form by Gaussian elimination without performing any row interchanges, then A will have an LU-decomposition, though it may not be unique. To see why this is so, assume that A has been reduced to a row echelon form U using a sequence of row operations that does not include row interchanges. We know from Theorem 1.5.1 that these operations can be accomplished by multiplying A on the left by an appropriate sequence of elementary matrices; that is, there exist elementary matrices E1, E2, . . . , Ek such that

\[ E_k \cdots E_2 E_1 A = U \tag{8} \]

Since elementary matrices are invertible, we can solve (8) for A as

\[ A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U \]

or more briefly as

\[ A = LU \tag{9} \]

where

\[ L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10} \]

We now have all of the ingredients to prove the following result.

THEOREM 9.1.1 If A is a square matrix that can be reduced to a row echelon form U by Gaussian elimination without row interchanges, then A can be factored as A = LU, where L is a lower triangular matrix.

Proof Let L and U be the matrices in Formulas (10) and (8), respectively. The matrix U is upper triangular because it is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that L is lower triangular, it suffices to prove that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1(b) will then imply that L itself is lower triangular. Since row interchanges are excluded, each Ej results either by adding a scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a nonzero scalar. In either case, the resulting matrix Ej is lower triangular and hence so is Ej⁻¹ by Theorem 1.7.1(d). This completes the proof.

EXAMPLE 2 An LU-Decomposition

Find an LU-decomposition of

\[
A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]


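Solution To obtain an LU-decomposition, A = LU, we will reduce A to a row echelon form U using Gaussian elimination and then calculate L from (10). The steps are as follows. For each step we list the row operation, the matrix that results from it in the reduction to row echelon form, the elementary matrix corresponding to the row operation, and the inverse of that elementary matrix.

Start:

\[
A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
\]

Step 1. Row operation: 1/2 × row 1

\[
\begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}, \qquad
E_1 = \begin{bmatrix} \tfrac12 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Step 2. Row operation: (3 × row 1) + row 2

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}, \qquad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

Step 3. Row operation: (−4 × row 1) + row 3

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}, \qquad
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \qquad
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}
\]

Step 4. Row operation: (3 × row 2) + row 3

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}, \qquad
E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \qquad
E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\]

Step 5. Row operation: 1/7 × row 3

\[
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = U, \qquad
E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tfrac17 \end{bmatrix}, \qquad
E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\]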


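and, from (10),

\[
\begin{aligned}
L &= \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\end{aligned}
\tag{11}
\]

so

\[
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
=
\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\]

is an LU-decomposition of A.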
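As a quick check of Formula (10) for this example, the product of the five inverse elementary matrices can be formed directly and multiplied by U. This is only an illustrative sketch; the helper `matmul` is a hypothetical name introduced here, and the matrices are entered by hand from Example 2.

```python
from functools import reduce

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Inverses E1^-1, ..., E5^-1 of the elementary matrices used in Example 2.
E_inverses = [
    [[2, 0, 0], [0, 1, 0], [0, 0, 1]],    # undoes (1/2) x row 1
    [[1, 0, 0], [-3, 1, 0], [0, 0, 1]],   # undoes (3 x row 1) + row 2
    [[1, 0, 0], [0, 1, 0], [4, 0, 1]],    # undoes (-4 x row 1) + row 3
    [[1, 0, 0], [0, 1, 0], [0, -3, 1]],   # undoes (3 x row 2) + row 3
    [[1, 0, 0], [0, 1, 0], [0, 0, 7]],    # undoes (1/7) x row 3
]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]

L = reduce(matmul, E_inverses)   # L = E1^-1 E2^-1 E3^-1 E4^-1 E5^-1, as in (10)
A = matmul(L, U)                 # should reproduce the original matrix
print(L)
print(A)
```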

Bookkeeping

As Example 2 shows, most of the work in constructing an LU-decomposition is expended in calculating L. However, all this work can be eliminated by some careful bookkeeping of the operations used to reduce A to U.

Because we are assuming that no row interchanges are required to reduce A to U, there are only two types of operations involved: multiplying a row by a nonzero constant, and adding a scalar multiple of one row to another. The first operation is used to introduce the leading 1’s and the second to introduce zeros below the leading 1’s.

In Example 2, a multiplier of 1/2 was needed in Step 1 to introduce a leading 1 in the first row, and a multiplier of 1/7 was needed in Step 5 to introduce a leading 1 in the third row. No actual multiplier was required to introduce a leading 1 in the second row because it was already a 1 at the end of Step 2, but for convenience let us say that the multiplier was 1. Comparing these multipliers with the successive diagonal entries of L, we see that these diagonal entries are precisely the reciprocals of the multipliers used to construct U:

\[
L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\tag{12}
\]

Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations

add 3 times the first row to the second
add −4 times the first row to the third

and to introduce the zero below the leading 1 in the second row, we used the operation

add 3 times the second row to the third

Now note in (11) that in each position below the main diagonal of L, the entry is the negative of the multiplier in the operation that introduced the zero in that position in U. This suggests the following procedure for constructing an LU-decomposition of a square matrix A, assuming that this matrix can be reduced to row echelon form without row interchanges.


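Procedure for Constructing an LU-Decomposition

Step 1. Reduce A to a row echelon form U by Gaussian elimination without row interchanges, keeping track of the multipliers used to introduce the leading 1’s and the multipliers used to introduce the zeros below the leading 1’s.

Step 2. In each position along the main diagonal of L, place the reciprocal of the multiplier that introduced the leading 1 in that position in U.

Step 3. In each position below the main diagonal of L, place the negative of the multiplier used to introduce the zero in that position in U.

Step 4. Form the decomposition A = LU.

EXAMPLE 3 Constructing an LU-Decomposition

Find an LU-decomposition of

\[
A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\]

Solution We will reduce A to a row echelon form U and at each step we will fill in an entry of L in accordance with the four-step procedure above. In the matrices on the right, • denotes an unknown entry of L.

Start:

\[
A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} \bullet & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}
\]

Row 1, multiplier = 1/6:

\[
\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ \bullet & \bullet & 0 \\ \bullet & \bullet & \bullet \end{bmatrix}
\]

Row 2, multiplier = −9; row 3, multiplier = −3:

\[
\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 2 & 1 \\ 0 & 8 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & \bullet & 0 \\ 3 & \bullet & \bullet \end{bmatrix}
\]

Row 2, multiplier = 1/2:

\[
\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 8 & 5 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & \bullet & \bullet \end{bmatrix}
\]

Row 3, multiplier = −8:

\[
\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & \bullet \end{bmatrix}
\]

Row 3, multiplier = 1 (no actual operation is performed here since there is already a leading 1 in the third row):

\[
U = \begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
L = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\]

Thus, we have constructed the LU-decomposition

\[
A = LU = \begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac12 \\ 0 & 0 & 1 \end{bmatrix}
\]

We leave it for you to confirm this end result by multiplying the factors.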
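The bookkeeping procedure lends itself to a short program. The sketch below is one possible rendering, under stated assumptions: the function name `lu_bookkeeping` is hypothetical, exact fractions are used to avoid roundoff, and every pivot is assumed to land on the main diagonal and be nonzero (so no row interchanges are needed). It is not a general-purpose routine.

```python
from fractions import Fraction

def lu_bookkeeping(A):
    """Return (L, U) with A = LU, where U has 1's on its main diagonal.

    Assumes A is square and can be reduced without row interchanges,
    with a nonzero pivot in every diagonal position.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        pivot = U[j][j]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        # The multiplier 1/pivot introduces the leading 1; L gets its reciprocal.
        L[j][j] = pivot
        U[j] = [v / pivot for v in U[j]]
        for i in range(j + 1, n):
            m = -U[i][j]        # multiplier that introduces the zero in position (i, j)
            L[i][j] = -m        # L gets the negative of that multiplier
            U[i] = [U[i][k] + m * U[j][k] for k in range(n)]
    return L, U

# Matrix from Example 3.
A = [[6, -2, 0], [9, -1, 1], [3, 7, 5]]
L, U = lu_bookkeeping(A)
print(L)
print(U)
```

Running this on the matrix of Example 3 reproduces the factors found by hand, which is a useful way to confirm the end result without multiplying the factors manually.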


LU-Decompositions Are Not Unique

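In general, LU-decompositions are not unique. For example, if

\[
A = LU = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
\]

and L has nonzero diagonal entries (which will be true if A is invertible), then we can shift the diagonal entries from the left factor to the right factor by writing

\[
\begin{aligned}
A &= \begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & 0 & 0 \\ 0 & l_{22} & 0 \\ 0 & 0 & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}
\end{aligned}
\]

which is another LU-decomposition of A.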

LDU-Decompositions

The method we have given for computing LU-decompositions may result in an “asymmetry” in that the matrix U has 1’s on the main diagonal but L need not. However, if it is preferred to have 1’s on the main diagonal of both the lower triangular factor and the upper triangular factor, then we can “shift” the diagonal entries of L to a diagonal matrix D and write L as

\[ L = L'D \]

where L′ is a lower triangular matrix with 1’s on the main diagonal. For example, a general 3 × 3 lower triangular matrix with nonzero entries on the main diagonal can be factored as

\[
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{L}
=
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}}_{L'}
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}}_{D}
\]

Note that the columns of L′ are obtained by dividing each entry in the corresponding column of L by the diagonal entry in the column. Thus, for example, we can rewrite (4) as

\[
\begin{aligned}
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
&= \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 & 0 \\ -\tfrac32 & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\end{aligned}
\tag{13}
\]

If desired, the diagonal matrix and the upper triangular matrix in (13) can be multiplied to produce an LU-decomposition in which the 1’s are on the main diagonal of L rather than U.
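The column-scaling rule for obtaining L′ and D from L is easy to express in code. The following sketch assumes a lower triangular L with nonzero diagonal entries; the function name `split_unit_lower` is an illustrative choice, not notation from the text.

```python
from fractions import Fraction

def split_unit_lower(L):
    """Return (L_unit, D) with L = L_unit D, where L_unit has 1's on its diagonal.

    Each column of L is divided by the diagonal entry in that column.
    Assumes L is lower triangular with nonzero diagonal entries.
    """
    n = len(L)
    D = [[Fraction(L[i][i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    L_unit = [[Fraction(L[i][j]) / L[j][j] for j in range(n)] for i in range(n)]
    return L_unit, D

# The factor L from Formula (4).
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
L_unit, D = split_unit_lower(L)
print(L_unit)
print(D)
```

Applied to the factor L in (4), this yields the unit lower triangular matrix and the diagonal matrix diag(2, 1, 7) that appear in the LDU form of the same matrix.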

One can prove that if A is an invertible matrix that can be reduced to row echelon form without row interchanges, then A can be factored uniquely as

\[ A = LDU \]

where L is a lower triangular matrix with 1’s on the main diagonal, D is a diagonal matrix, and U is an upper triangular matrix with 1’s on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of A.

PLU-Decompositions

Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem by “preprocessing” the coefficient matrix A so that the row interchanges are performed prior to computing the LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is assured to have an LU-decomposition

\[ QA = LU \tag{14} \]

Because the matrix Q is invertible (being a product of elementary matrices), the systems Ax = b and QAx = Qb will have the same solutions. But it follows from (14) that the latter system can be rewritten as LUx = Qb and hence can be solved using LU-decomposition. It is common to see Equation (14) expressed as

\[ A = PLU \tag{15} \]

in which P = Q⁻¹. This is called a PLU-decomposition (or PLU-factorization) of A.
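To make the relationship between (14) and (15) concrete, here is a minimal sketch of a factorization with row interchanges. It is an assumption-laden illustration rather than the text's method: the function name `plu` is hypothetical, the pivot rule (largest entry in absolute value in the current column) is the common partial-pivoting choice, the factor L is built with 1's on its diagonal, and the interchanges are interleaved with the elimination instead of being performed first. The 3 × 3 test matrix is made up for the demonstration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def plu(A):
    """Return (P, L, U) with A = PLU for a square matrix A.

    L is lower triangular with 1's on its diagonal, U is upper triangular,
    and P is a permutation matrix.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))   # perm[i] = index of the original row now in position i
    for j in range(n):
        # Pick the row (at or below row j) with the largest entry in column j.
        p = max(range(j, n), key=lambda r: abs(U[r][j]))
        if U[p][j] == 0:
            continue        # nothing to eliminate in this column
        U[j], U[p] = U[p], U[j]
        L[j], L[p] = L[p], L[j]
        perm[j], perm[p] = perm[p], perm[j]
        for i in range(j + 1, n):
            L[i][j] = U[i][j] / U[j][j]
            U[i] = [U[i][k] - L[i][j] * U[j][k] for k in range(n)]
    for i in range(n):
        L[i][i] = Fraction(1)
    # Q has a 1 in position (i, perm[i]), so QA = LU; P = Q^-1 is the transpose of Q.
    P = [[1 if perm[j] == i else 0 for j in range(n)] for i in range(n)]
    return P, L, U

# Hypothetical matrix whose (1, 1) entry is zero, so a row interchange is required.
A = [[0, 1, 2], [1, 0, 3], [4, -3, 8]]
P, L, U = plu(A)
print(matmul(P, matmul(L, U)) == A)
```

Since a permutation matrix is orthogonal, the inverse needed to pass from QA = LU to A = PLU is just a transpose, which is why no matrix inversion appears in the sketch.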

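Exercise Set 9.1

1. Use the method of Example 1 and the LU-decomposition

\[
\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\]

to solve the system

3x1 − 6x2 = 0
−2x1 + 5x2 = 1

2. Use the method of Example 1 and the LU-decomposition

\[
\begin{bmatrix} 3 & -6 & -3 \\ 2 & 0 & 6 \\ -4 & 7 & 4 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 2 & 4 & 0 \\ -4 & -1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\]

to solve the system

3x1 − 6x2 − 3x3 = −3
2x1 + 6x3 = −22
−4x1 + 7x2 + 4x3 = 3

In Exercises 3–6, find an LU-decomposition of the coefficient matrix, and then use the method of Example 1 to solve the system.

3. \[ \begin{bmatrix} 2 & 8 \\ -1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \end{bmatrix} \]

4. \[ \begin{bmatrix} -5 & -10 \\ 6 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -10 \\ 19 \end{bmatrix} \]

5. \[ \begin{bmatrix} 2 & -2 & -2 \\ 0 & -2 & 2 \\ -1 & 5 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -4 \\ -2 \\ 6 \end{bmatrix} \]

6. \[ \begin{bmatrix} -3 & 12 & -6 \\ 1 & -2 & 2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -33 \\ 7 \\ -1 \end{bmatrix} \]

In Exercises 7–8, an LU-decomposition of a matrix A is given. (a) Compute L⁻¹ and U⁻¹. (b) Use the result in part (a) to find the inverse of A.

7. \[ A = \begin{bmatrix} 2 & -1 & 3 \\ 4 & 2 & 1 \\ -6 & -1 & 2 \end{bmatrix}; \qquad A = LU = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -3 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & 3 \\ 0 & 4 & -5 \\ 0 & 0 & 6 \end{bmatrix} \]

8. The LU-decomposition obtained in Example 2.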


⎡ 9. Let



2

−1

1

⎢ A = ⎣−2

−1

2

1



0

⎢ 16. A = ⎣1



2

2⎦ 0

(a) Find an LU-decomposition of A. (b) Express A in the form A = L1 DU1 , where L1 is lower triangular with 1’s along the main diagonal, U1 is upper triangular, and D is a diagonal matrix. (c) Express A in the form A = L2 U2 , where L2 is lower triangular with 1’s along the main diagonal and U2 is upper triangular.

3 1 2

−2





17. Let Ax = b be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can be reduced to row echelon form without row interchanges. How many additions and multiplications are required to solve the system by the method of Example 1?

Working with Proofs 18. Let

A=

10. (a) Show that the matrix



0 1

1 0

In Exercises 11–12, use the given PLU-decomposition of A to solve the linear system Ax = b by rewriting it as P⁻¹Ax = P⁻¹b and solving this system by LU-decomposition.





0 ⎢ A = ⎣1 0

⎤⎡

1 0 0

0 1 ⎥⎢ 0⎦ ⎣0 1 3

⎡ ⎤



3 4 ⎢ ⎢ ⎥ 12. b = ⎣0⎦ ; A = ⎣0 6 8



1 ⎢ A = ⎣0 0

⎤⎡

0 0 1

0 4 ⎥⎢ 1⎦ ⎣ 0 0 0

1 2 1



1 2 1

⎤⎡



2 1 0

(a) Every square matrix has an LU-decomposition.

2 ⎥ 4⎦ = PLU 1

2 13. A = 4

2 1

⎤⎡

1 4

2 1 ⎥⎢ 4⎦ ⎣0 9 0

1 0



1 2

(d) If an invertible matrix A has an LU-decomposition, then A has a unique LDU-decomposition.



⎥ −4⎦ = PLU

3

−12

6

−28

⎢ 14. A = ⎣0

2

3 ⎢ 15. A = ⎣3 0

−1 −1 2







0 −2 ⎢ ⎥ ⎥ 1⎦ ; b = ⎣ 1⎦ 4 1

(e) Every square matrix has a PLU-decomposition.

Working with Technology

1



6 ⎥ 0⎦ 13

In Exercises 15–16, find a PLU-decomposition of A, and use it to solve the linear system Ax = b by the method of Exercises 11 and 12.



(b) If a square matrix A is row equivalent to an upper triangular matrix U , then A has an LU-decomposition. (c) If L1 , L2 , . . . , Lk are n × n lower triangular matrices, then the product L1 L2 · · · Lk is lower triangular.



2 ⎥ 1⎦ ; 8

In Exercises 13–14, find the LDU-decomposition of A.



19. Prove: If A is any n × n matrix, then A can be factored as A = PLU , where L is lower triangular, U is upper triangular, and P can be obtained by interchanging the rows of In appropriately. [Hint: Let U be a row echelon form of A, and let all row interchanges required in the reduction of A to U be performed first.]

TF. In parts (a)–(e) determine whether the statement is true or false, and justify your answer.

0 1 ⎥⎢ 0⎦ ⎣0 0 17

1 −1 0

True-False Exercises

4 ⎥ 2⎦ ; 3

0 1 −5

b d

(b) Find the LU-decomposition described in part (a).

(b) Find a PLU-decomposition of this matrix.

2 0 ⎢ ⎥ ⎢ 11. b = ⎣1⎦ ; A = ⎣1 5 3

a c

(a) Prove: If a ≠ 0, then the matrix A has a unique LU-decomposition with 1’s along the main diagonal of L.

has no LU-decomposition.

⎡ ⎤



7

⎢ ⎥ ⎥ 4⎦ ; b = ⎣ 5⎦ 5 −2

T1. Technology utilities vary in how they handle LU-decompositions. For example, many utilities perform row interchanges to reduce roundoff error and hence produce PLU-decompositions, even when asked for LU-decompositions. See what happens when you use your utility to find an LU-decomposition of the matrix A in Example 2.

T2. The accompanying figure shows a metal plate whose edges are held at the temperatures shown. It follows from thermodynamic principles that the temperature at each of the six interior nodes will eventually stabilize at a value that is approximately the average of the temperatures at the four neighboring nodes. These are called the steady-state temperatures at the nodes. Thus, for example, if we denote the steady-state temperatures at the interior nodes in the accompanying figure as T1, T2, T3, T4, T5, and T6, then at the node labeled T1 that temperature will be T1 = (1/4)(0 + 5 + T2 + T3) or, equivalently,

4T1 − T2 − T3 = 5

Find a linear system whose solution gives the steady-state temperatures at the nodes, and use your technology utility to solve that system by LU-decomposition.

Figure Ex-T2 [Figure: metal plate with interior nodes T1 through T6 and labeled edge temperatures.]

9.2 The Power Method

The eigenvalues of a square matrix can, in theory, be found by solving the characteristic equation. However, this procedure has so many computational difficulties that it is almost never used in applications. In this section we will discuss an algorithm that can be used to approximate the eigenvalue with greatest absolute value and a corresponding eigenvector. This particular eigenvalue and its corresponding eigenvectors are important because they arise naturally in many iterative processes. The methods we will study in this section have recently been used to create Internet search engines such as Google.

The Power Method

There are many applications in which some vector x0 in R^n is multiplied repeatedly by an n × n matrix A to produce a sequence

\[ x_0, \; Ax_0, \; A^2x_0, \; \ldots, \; A^kx_0, \; \ldots \]

We call a sequence of this form a power sequence generated by A. In this section we will be concerned with the convergence of power sequences and how such sequences can be used to approximate eigenvalues and eigenvectors. For this purpose, we make the following definition.

DEFINITION 1 If the distinct eigenvalues of a matrix A are λ1, λ2, . . . , λk, and if |λ1| is larger than |λ2|, . . . , |λk|, then λ1 is called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector of A.

EXAMPLE 1 Dominant Eigenvalues

Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a matrix are

λ1 = −4, λ2 = −2, λ3 = 1, λ4 = 3

then λ1 = −4 is dominant since |λ1| = 4 is greater than the absolute values of all the other eigenvalues; but if the distinct eigenvalues of a matrix are

λ1 = 7, λ2 = −7, λ3 = −2, λ4 = 5

then |λ1| = |λ2| = 7, so there is no eigenvalue whose absolute value is greater than the absolute value of all the other eigenvalues.

The most important theorems about convergence of power sequences apply to n × n matrices with n linearly independent eigenvectors (symmetric matrices, for example), so we will limit our discussion to this case in this section.
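The test in Definition 1 can be stated in a few lines of code. This is a small illustrative sketch only; the function name `dominant_eigenvalue` is hypothetical, and the input is assumed to be a nonempty list of the distinct eigenvalues, which are taken as already known.

```python
def dominant_eigenvalue(distinct_eigenvalues):
    """Return the dominant eigenvalue, or None if no eigenvalue is dominant.

    Assumes a nonempty list of distinct eigenvalues.
    """
    ranked = sorted(distinct_eigenvalues, key=abs, reverse=True)
    # Dominant means strictly larger in absolute value than every other eigenvalue.
    if len(ranked) == 1 or abs(ranked[0]) > abs(ranked[1]):
        return ranked[0]
    return None

# The two eigenvalue lists from Example 1.
print(dominant_eigenvalue([-4, -2, 1, 3]))   # -4 is dominant
print(dominant_eigenvalue([7, -7, -2, 5]))   # None: |7| = |-7|
```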


THEOREM 9.2.1 Let A be a symmetric n × n matrix that has a positive* dominant eigenvalue λ. If x0 is a unit vector in R^n that is not orthogonal to the eigenspace corresponding to λ, then the normalized power sequence

\[
x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{Ax_1}{\|Ax_1\|}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\|Ax_{k-1}\|}, \quad \ldots
\tag{1}
\]

converges to a unit dominant eigenvector, and the sequence

\[
Ax_1 \cdot x_1, \quad Ax_2 \cdot x_2, \quad Ax_3 \cdot x_3, \quad \ldots, \quad Ax_k \cdot x_k, \quad \ldots
\tag{2}
\]

converges to the dominant eigenvalue λ.

Remark In the exercises we will ask you to show that (1) can also be expressed as

\[
x_0, \quad x_1 = \frac{Ax_0}{\|Ax_0\|}, \quad x_2 = \frac{A^2x_0}{\|A^2x_0\|}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\|A^kx_0\|}, \quad \ldots
\tag{3}
\]

This form of the power sequence expresses each iterate in terms of the starting vector x0, rather than in terms of its predecessor.

We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the 2 × 2 case where A is a symmetric matrix with distinct positive eigenvalues, λ1 and λ2 , one of which is dominant. To be specific, assume that λ1 is dominant and

\[ \lambda_1 > \lambda_2 > 0 \]

Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the eigenspaces corresponding to λ1 and λ2 are perpendicular lines through the origin. Thus, the assumption that x0 is a unit vector that is not orthogonal to the eigenspace corresponding to λ1 implies that x0 does not lie in the eigenspace corresponding to λ2. To see the geometric effect of multiplying x0 by A, it will be useful to split x0 into the sum

\[ x_0 = v_0 + w_0 \tag{4} \]

where v0 and w0 are the orthogonal projections of x0 on the eigenspaces of λ1 and λ2, respectively (Figure 9.2.1a).

Figure 9.2.1 [Three panels: (a) x0 split into its projections v0 and w0 on the eigenspaces of λ1 and λ2; (b) the vector λ1v0 + λ2w0 and the unit vector x1; (c) the iterates x0, x1, . . . on the unit circle approaching x in the eigenspace of λ1.]

This enables us to express Ax0 as

\[ Ax_0 = Av_0 + Aw_0 = \lambda_1 v_0 + \lambda_2 w_0 \tag{5} \]

* If the dominant eigenvalue is not positive, sequence (2) will still converge to the dominant eigenvalue, but sequence (1) may not converge to a specific dominant eigenvector because of alternation (see Exercise 11). Nevertheless, each term of (1) will closely approximate some dominant eigenvector for sufficiently large values of k .


which tells us that multiplying x0 by A “scales” the terms v0 and w0 in (4) by λ1 and λ2, respectively. However, λ1 is larger than λ2, so the scaling is greater in the direction of v0 than in the direction of w0. Thus, multiplying x0 by A “pulls” x0 toward the eigenspace of λ1, and normalizing produces a vector x1 = Ax0/‖Ax0‖, which is on the unit circle and is closer to the eigenspace of λ1 than x0 (Figure 9.2.1b). Similarly, multiplying x1 by A and normalizing produces a unit vector x2 that is closer to the eigenspace of λ1 than x1. Thus, it seems reasonable that by repeatedly multiplying by A and normalizing we will produce a sequence of vectors xk that lie on the unit circle and converge to a unit vector x in the eigenspace of λ1 (Figure 9.2.1c). Moreover, if xk converges to x, then it also seems reasonable that Axk · xk will converge to

\[ Ax \cdot x = \lambda_1 x \cdot x = \lambda_1 \|x\|^2 = \lambda_1 \]

which is the dominant eigenvalue of A.

The Power Method with Euclidean Scaling

Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit eigenvector of a symmetric matrix A, provided the dominant eigenvalue is positive. This algorithm, called the power method with Euclidean scaling, is as follows:

The Power Method with Euclidean Scaling

Step 0. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector x0.

Step 1. Compute Ax0 and normalize it to obtain the first approximation x1 to a dominant unit eigenvector. Compute Ax1 · x1 to obtain the first approximation to the dominant eigenvalue.

Step 2. Compute Ax1 and normalize it to obtain the second approximation x2 to a dominant unit eigenvector. Compute Ax2 · x2 to obtain the second approximation to the dominant eigenvalue.

Step 3. Compute Ax2 and normalize it to obtain the third approximation x3 to a dominant unit eigenvector. Compute Ax3 · x3 to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will usually generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding unit eigenvector.*
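The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the names `matvec`, `dot`, and `power_method_euclidean` are hypothetical, floating-point arithmetic is used, and the number of steps is fixed in advance rather than chosen by a stopping rule. The matrix and starting vector are the symmetric 2 × 2 matrix with rows (3, 2) and (2, 3) and x0 = (1, 0), whose dominant eigenvalue is 5.

```python
import math

def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def power_method_euclidean(A, x0, steps):
    """Return a list of (x_k, lambda_k) pairs for k = 1, ..., steps."""
    norm0 = math.sqrt(dot(x0, x0))
    x = [xi / norm0 for xi in x0]          # Step 0: normalize the start vector
    history = []
    for _ in range(steps):
        Ax = matvec(A, x)
        norm = math.sqrt(dot(Ax, Ax))
        x = [v / norm for v in Ax]         # next unit-vector approximation
        lam = dot(matvec(A, x), x)         # A x_k . x_k approximates the eigenvalue
        history.append((x, lam))
    return history

A = [[3, 2], [2, 3]]
history = power_method_euclidean(A, [1, 0], 5)
for k, (x, lam) in enumerate(history, start=1):
    print(k, [round(v, 5) for v in x], round(lam, 5))
```

Keeping the whole history, rather than only the last iterate, makes it easy to see how quickly the eigenvalue estimates settle.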

EXAMPLE 2 The Power Method with Euclidean Scaling

Apply the power method with Euclidean scaling to

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\quad \text{with} \quad
x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at x5 and compare the resulting approximations to the exact values of the dominant eigenvalue and eigenvector.

* If the vector x0 happens to be orthogonal to the eigenspace of the dominant eigenvalue, then the hypotheses of Theorem 9.2.1 will be violated and the method may fail. However, the reality is that computer roundoff errors usually perturb x0 enough to destroy any orthogonality and make the algorithm work. This is one instance in which errors help to obtain correct results!

Solution We will leave it for you to show that the eigenvalues of A are λ = 1 and λ = 5 and that the eigenspace corresponding to the dominant eigenvalue λ = 5 is the line represented by the parametric equations x1 = t, x2 = t, which we can write in vector form as

\[ x = t \begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{6} \]

Setting t = 1/√2 yields the normalized dominant eigenvector

\[
v_1 = \begin{bmatrix} \tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt2} \end{bmatrix}
\approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix}
\tag{7}
\]

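Now let us see what happens when we use the power method, starting with the unit vector x0.

\[
Ax_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
\qquad
x_1 = \frac{Ax_0}{\|Ax_0\|} = \frac{1}{\sqrt{13}} \begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \frac{1}{3.60555} \begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix}
\]

\[
Ax_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix},
\qquad
x_2 = \frac{Ax_1}{\|Ax_1\|} \approx \frac{1}{4.90682} \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix}
\]

\[
Ax_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix},
\qquad
x_3 = \frac{Ax_2}{\|Ax_2\|} \approx \frac{1}{4.99616} \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix}
\]

\[
Ax_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix},
\qquad
x_4 = \frac{Ax_3}{\|Ax_3\|} \approx \frac{1}{4.99985} \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix}
\]

\[
Ax_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix},
\qquad
x_5 = \frac{Ax_4}{\|Ax_4\|} \approx \frac{1}{4.99999} \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix}
\]

\[
\begin{aligned}
\lambda^{(1)} &= (Ax_1) \cdot x_1 = (Ax_1)^T x_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix} \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615 \\
\lambda^{(2)} &= (Ax_2) \cdot x_2 = (Ax_2)^T x_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix} \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361 \\
\lambda^{(3)} &= (Ax_3) \cdot x_3 = (Ax_3)^T x_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix} \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974 \\
\lambda^{(4)} &= (Ax_4) \cdot x_4 = (Ax_4)^T x_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix} \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999 \\
\lambda^{(5)} &= (Ax_5) \cdot x_5 = (Ax_5)^T x_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix} \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000
\end{aligned}
\]

Thus, λ(5) approximates the dominant eigenvalue to five decimal place accuracy and x5 approximates the dominant eigenvector in (7) to three decimal place accuracy.

It is accidental that λ(5) (the fifth approximation) produced five decimal place accuracy. In general, n iterations need not produce n decimal place accuracy.

The Power Method with Maximum Entry Scaling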

There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the entries in a vector x by max(x). Thus, for example, if

\[ x = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 2 \end{bmatrix} \]

then max(x) = 7. We will need the following variation of Theorem 9.2.1.

THEOREM 9.2.2 Let A be a symmetric n × n matrix that has a positive dominant* eigenvalue λ. If x0 is a nonzero vector in R^n that is not orthogonal to the eigenspace corresponding to λ, then the sequence

\[
x_0, \quad x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{Ax_1}{\max(Ax_1)}, \quad \ldots, \quad x_k = \frac{Ax_{k-1}}{\max(Ax_{k-1})}, \quad \ldots
\tag{8}
\]

converges to an eigenvector corresponding to λ, and the sequence

\[
\frac{Ax_1 \cdot x_1}{x_1 \cdot x_1}, \quad \frac{Ax_2 \cdot x_2}{x_2 \cdot x_2}, \quad \frac{Ax_3 \cdot x_3}{x_3 \cdot x_3}, \quad \ldots, \quad \frac{Ax_k \cdot x_k}{x_k \cdot x_k}, \quad \ldots
\tag{9}
\]

converges to λ.

Remark In the exercises we will ask you to show that (8) can be written in the alternative form

\[
x_0, \quad x_1 = \frac{Ax_0}{\max(Ax_0)}, \quad x_2 = \frac{A^2x_0}{\max(A^2x_0)}, \quad \ldots, \quad x_k = \frac{A^kx_0}{\max(A^kx_0)}, \quad \ldots
\tag{10}
\]

which expresses the iterates in terms of the initial vector x0.

We will omit the proof of this theorem, but if we accept that (8) converges to an eigenvector of A, then it is not hard to see why (9) converges to the dominant eigenvalue. To see this, note that each term in (9) is of the form

\[ \frac{Ax \cdot x}{x \cdot x} \tag{11} \]

which is called a Rayleigh quotient of A. In the case where λ is an eigenvalue of A and x is a corresponding eigenvector, the Rayleigh quotient is

\[ \frac{Ax \cdot x}{x \cdot x} = \frac{\lambda x \cdot x}{x \cdot x} = \frac{\lambda (x \cdot x)}{x \cdot x} = \lambda \]

Thus, if xk converges to a dominant eigenvector x, then it seems reasonable that

\[ \frac{Ax_k \cdot x_k}{x_k \cdot x_k} \quad \text{converges to} \quad \frac{Ax \cdot x}{x \cdot x} = \lambda \]

which is the dominant eigenvalue. Theorem 9.2.2 produces the following algorithm, which is called the power method with maximum entry scaling.

* As in Theorem 9.2.1, if the dominant eigenvalue is not positive, sequence (9) will still converge to the dominant eigenvalue, but sequence (8) may not converge to a specific dominant eigenvector. Nevertheless, each term of (8) will closely approximate some dominant eigenvector for sufficiently large values of k (see Exercise 11).


The Power Method with Maximum Entry Scaling

Step 0. Choose an arbitrary nonzero vector x0.

Step 1. Compute Ax0 and multiply it by the factor 1/max(Ax0) to obtain the first approximation x1 to a dominant eigenvector. Compute the Rayleigh quotient of x1 to obtain the first approximation to the dominant eigenvalue.

Step 2. Compute Ax1 and scale it by the factor 1/max(Ax1) to obtain the second approximation x2 to a dominant eigenvector. Compute the Rayleigh quotient of x2 to obtain the second approximation to the dominant eigenvalue.

Step 3. Compute Ax2 and scale it by the factor 1/max(Ax2) to obtain the third approximation x3 to a dominant eigenvector. Compute the Rayleigh quotient of x3 to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding eigenvector.
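The maximum entry variant differs from the Euclidean one only in the scaling factor and in the use of the Rayleigh quotient. The sketch below is illustrative and makes the same kind of assumptions as any short demonstration: the names `matvec`, `dot`, and `power_method_max_entry` are hypothetical, floating-point arithmetic is used, and the step count is fixed. The test data are the symmetric matrix with rows (3, 2) and (2, 3) and x0 = (1, 0).

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def power_method_max_entry(A, x0, steps):
    """Return a list of (x_k, lambda_k) pairs for k = 1, ..., steps."""
    x = list(x0)
    history = []
    for _ in range(steps):
        Ax = matvec(A, x)
        scale = max(abs(v) for v in Ax)            # max(Ax): largest absolute entry
        x = [v / scale for v in Ax]                # scale by the factor 1/max(Ax)
        lam = dot(matvec(A, x), x) / dot(x, x)     # Rayleigh quotient of x_k
        history.append((x, lam))
    return history

A = [[3, 2], [2, 3]]
history = power_method_max_entry(A, [1, 0], 5)
for k, (x, lam) in enumerate(history, start=1):
    print(k, [round(v, 5) for v in x], round(lam, 5))
```

Because the Rayleigh quotient is unchanged when x is multiplied by a nonzero scalar, the eigenvalue estimates agree with those from Euclidean scaling; only the eigenvector iterates are scaled differently.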

John William Strutt Rayleigh (1842–1919)

Historical Note The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for his discovery of the inert gas argon. Rayleigh also made fundamental discoveries in acoustics and optics, and his work in wave phenomena enabled him to give the first accurate explanation of why the sky is blue.

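EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling

Apply the power method with maximum entry scaling to

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\quad \text{with} \quad
x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at x5 and compare the resulting approximations to the exact values and to the approximations obtained in Example 2.

Solution We leave it for you to confirm that

\[
Ax_0 = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix},
\qquad
x_1 = \frac{Ax_0}{\max(Ax_0)} = \frac{1}{3} \begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix}
\]

\[
Ax_1 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix},
\qquad
x_2 = \frac{Ax_1}{\max(Ax_1)} \approx \frac{1}{4.33333} \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix}
\]

\[
Ax_2 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix},
\qquad
x_3 = \frac{Ax_2}{\max(Ax_2)} \approx \frac{1}{4.84615} \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix}
\]

\[
Ax_3 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix},
\qquad
x_4 = \frac{Ax_3}{\max(Ax_3)} \approx \frac{1}{4.96825} \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix}
\]

\[
Ax_4 \approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix},
\qquad
x_5 = \frac{Ax_4}{\max(Ax_4)} \approx \frac{1}{4.99361} \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}
\]

\[
\begin{aligned}
\lambda^{(1)} &= \frac{Ax_1 \cdot x_1}{x_1 \cdot x_1} = \frac{(Ax_1)^T x_1}{x_1^T x_1} \approx \frac{7.00000}{1.44444} \approx 4.84615 \\
\lambda^{(2)} &= \frac{Ax_2 \cdot x_2}{x_2 \cdot x_2} = \frac{(Ax_2)^T x_2}{x_2^T x_2} \approx \frac{9.24852}{1.85207} \approx 4.99361 \\
\lambda^{(3)} &= \frac{Ax_3 \cdot x_3}{x_3 \cdot x_3} = \frac{(Ax_3)^T x_3}{x_3^T x_3} \approx \frac{9.84203}{1.96851} \approx 4.99974 \\
\lambda^{(4)} &= \frac{Ax_4 \cdot x_4}{x_4 \cdot x_4} = \frac{(Ax_4)^T x_4}{x_4^T x_4} \approx \frac{9.96808}{1.99362} \approx 4.99999 \\
\lambda^{(5)} &= \frac{Ax_5 \cdot x_5}{x_5 \cdot x_5} = \frac{(Ax_5)^T x_5}{x_5^T x_5} \approx \frac{9.99360}{1.99872} \approx 5.00000
\end{aligned}
\]

Thus, λ(5) approximates the dominant eigenvalue correctly to five decimal places and x5 closely approximates the dominant eigenvector

\[ x = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

that results by taking t = 1 in (6).

Whereas the power method with Euclidean scaling produces a sequence that approaches a unit dominant eigenvector, maximum entry scaling produces a sequence that approaches an eigenvector whose largest component is 1.

Rate of Convergence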

If A is a symmetric matrix whose distinct eigenvalues can be arranged so that

|λ1 | > |λ2 | ≥ |λ3 | ≥ · · · ≥ |λk | then the “rate” at which the Rayleigh quotients converge to the dominant eigenvalue λ1 depends on the ratio |λ1 |/|λ2 |; that is, the convergence is slow when this ratio is near 1 and rapid when it is large—the greater the ratio, the more rapid the convergence. For example, if A is a 2 × 2 symmetric matrix, then the greater the ratio |λ1 |/|λ2 |, the greater the disparity between the scaling effects of λ1 and λ2 in Figure 9.2.1, and hence the greater the effect that multiplication by A has on pulling the iterates toward the eigenspace of λ1 . Indeed, the rapid convergence in Example 3 is due to the fact that |λ1 |/|λ2 | = 5/1 = 5, which is considered to be a large ratio. In cases where the ratio is close to 1, the convergence of the power method may be so slow that other methods must be used.
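The iteration of Example 3 is short enough to sketch in code. The following Python fragment is a minimal illustration, not a production routine: the function name `power_method_max_scaling` and its return format are assumptions made here, and max(Ax) is taken to be the largest absolute value among the entries of Ax.

```python
def power_method_max_scaling(A, x0, steps):
    """Power method with maximum entry scaling for a symmetric matrix A.

    Returns a list of (x_k, Rayleigh quotient) pairs for k = 1..steps.
    The name and the return format are illustrative choices only.
    """
    def matvec(M, v):
        return [sum(m * c for m, c in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = list(x0)
    history = []
    for _ in range(steps):
        y = matvec(A, x)
        scale = max(abs(c) for c in y)      # max(Ax): largest absolute entry
        x = [c / scale for c in y]          # x_k = A x_{k-1} / max(A x_{k-1})
        ax = matvec(A, x)
        history.append((x, dot(ax, x) / dot(x, x)))   # Rayleigh quotient
    return history


A = [[3.0, 2.0], [2.0, 3.0]]
for k, (x, lam) in enumerate(power_method_max_scaling(A, [1.0, 0.0], 5), start=1):
    print(k, [round(c, 5) for c in x], round(lam, 5))
```

With the matrix and starting vector of Example 3, the fifth iterate should agree with x5 ≈ (1.00000, 0.99936) and the fifth Rayleigh quotient with 5.00000 to five decimal places.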

Stopping Procedures

If λ is the exact value of the dominant eigenvalue, and if a power method produces the approximation λ(k) at the kth iteration, then we call

$$\left| \frac{\lambda - \lambda^{(k)}}{\lambda} \right| \tag{12}$$

the relative error in λ(k). If this is expressed as a percentage, then it is called the percentage error in λ(k). For example, if λ = 5 and the approximation after three iterations is λ(3) = 5.1, then

$$\text{relative error in } \lambda^{(3)} = \left| \frac{\lambda - \lambda^{(3)}}{\lambda} \right| = \left| \frac{5 - 5.1}{5} \right| = |-0.02| = 0.02$$

$$\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%$$

In applications one usually knows the relative error E that can be tolerated in the dominant eigenvalue, so the goal is to stop computing iterates once the relative error in the approximation to that eigenvalue is less than E. However, there is a problem in computing the relative error from (12) in that the eigenvalue λ is unknown. To circumvent this problem, it is usual to estimate λ by λ(k) and stop the computations when

 (k)   λ − λ(k−1)    k , that

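$$U\Sigma = [\sigma_1 \mathbf{u}_1 \;\; \sigma_2 \mathbf{u}_2 \;\; \cdots \;\; \sigma_k \mathbf{u}_k \;\; \mathbf{0} \;\; \cdots \;\; \mathbf{0}] = [A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_k \;\; A\mathbf{v}_{k+1} \;\; \cdots \;\; A\mathbf{v}_n] = AV$$

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.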

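Exercise Set 9.4

In Exercises 1–4, find the distinct singular values of A.

1. $A = \begin{bmatrix} 1 & 2 & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix}$

4. $A = \begin{bmatrix} \sqrt{2} & 0 \\ 1 & \sqrt{2} \end{bmatrix}$

In Exercises 5–12, find a singular value decomposition of A.

5. $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

6. $A = \begin{bmatrix} -3 & 0 \\ 0 & -4 \end{bmatrix}$

7. $A = \begin{bmatrix} 4 & 6 \\ 0 & 4 \end{bmatrix}$

8. $A = \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix}$

9. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$

10. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$

11. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$

12. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$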

Working with Proofs

13. Prove: If A is an m × n matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.4.1 by using part (a) of the theorem and the fact that A and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.4.1 by first showing that row($A^TA$) is a subspace of row(A).
(b) Prove part (c) of Theorem 9.4.1 by using part (b).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition $A = U\Sigma V^T$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ be the column vectors of V and U, respectively. Prove that $\Sigma = [T]_{B',B}$.

17. Prove that the singular values of $A^TA$ are the squares of the singular values of A.

18. Prove that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

19. A polar decomposition of an n × n matrix A is a factorization A = PQ in which P is a positive semidefinite n × n matrix with the same rank as A, and Q is an orthogonal n × n matrix.
(a) Prove that if $A = U\Sigma V^T$ is the singular value decomposition of A, then $A = (U\Sigma U^T)(UV^T)$ is a polar decomposition of A.
(b) Find a polar decomposition of the matrix in Exercise 5.

True-False Exercises

TF. In parts (a)–(g) determine whether the statement is true or false, and justify your answer.

(a) If A is an m × n matrix, then $A^TA$ is an m × m matrix.
(b) If A is an m × n matrix, then $A^TA$ is a symmetric matrix.
(c) If A is an m × n matrix, then the eigenvalues of $A^TA$ are positive real numbers.
(d) If A is an n × n matrix, then A is orthogonally diagonalizable.
(e) If A is an m × n matrix, then $A^TA$ is orthogonally diagonalizable.
(f) The eigenvalues of $A^TA$ are the singular values of A.
(g) Every m × n matrix has a singular value decomposition.

Working with Technology

T1. Use your technology utility to duplicate the computations in Example 2.

T2. For the given matrix A, use the steps in Example 2 to find matrices U, Σ, and $V^T$ in a singular value decomposition $A = U\Sigma V^T$.

(a) $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$

9.5 Data Compression Using Singular Value Decomposition

Efficient transmission and storage of large quantities of digital data has become a major problem in our technological world. In this section we will discuss the role that singular value decomposition plays in compressing digital data so that it can be transmitted more rapidly and stored in less space. We assume here that you have read Section 9.4.

Reduced Singular Value Decomposition

Algebraically, the zero rows and columns of the matrix Σ in Theorem 9.4.4 are superfluous and can be eliminated by multiplying out the expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors drop out, leaving

$$A = [\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k]
\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_k
\end{bmatrix}
\begin{bmatrix}
\mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T
\end{bmatrix} \tag{1}$$

which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and $V_1^T$, respectively, and we will write this equation as

$$A = U_1 \Sigma_1 V_1^T \tag{2}$$


Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are m × k, k × k, and k × n, respectively, and that the matrix $\Sigma_1$ is invertible since its diagonal entries are positive. If we multiply out on the right side of (1) using the column-row rule, then we obtain

$$A = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \cdots + \sigma_k \mathbf{u}_k \mathbf{v}_k^T \tag{3}$$

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula (7) of Section 7.2] applies only to symmetric matrices.

Remark It can be proved that an m × n matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where u is a column vector in $R^m$ and v is a column vector in $R^n$. Thus, a reduced singular value expansion expresses a matrix A of rank k as a linear combination of k rank 1 matrices.

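EXAMPLE 1 Reduced Singular Value Decomposition

Find a reduced singular value decomposition and a reduced singular value expansion of the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Solution In Example 2 of Section 9.4 we found the singular value decomposition

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix}
\frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\
\frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\
\frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}}
\end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix}
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}
\end{bmatrix}}_{V^T} \tag{4}$$

Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value decomposition of A corresponding to (4) is

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
=
\begin{bmatrix}
\frac{\sqrt{6}}{3} & 0 \\
\frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\
\frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2}
\end{bmatrix}
\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix}
\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\
\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}
\end{bmatrix}$$

This yields the reduced singular value expansion

$$\begin{aligned}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
= \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T
&= \sqrt{3} \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}
\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix}
+ (1) \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}
\begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix} \\
&= \sqrt{3} \begin{bmatrix}
\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\
\frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\
\frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6}
\end{bmatrix}
+ (1) \begin{bmatrix}
0 & 0 \\
-\frac{1}{2} & \frac{1}{2} \\
\frac{1}{2} & -\frac{1}{2}
\end{bmatrix}
\end{aligned}$$

Note that the matrices in the expansion have rank 1, as expected.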
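The factors in Example 1 can be checked numerically. The sketch below is an illustration only: it uses plain Python lists rather than a linear algebra library, and the names `matmul`, `U1`, `S1`, and `V1T` are choices made here. It multiplies out the reduced decomposition and also sums the two rank 1 terms of the reduced expansion.

```python
import math


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

U1 = [[s6 / 3, 0.0],
      [s6 / 6, -s2 / 2],
      [s6 / 6, s2 / 2]]        # m x k = 3 x 2
S1 = [[s3, 0.0],
      [0.0, 1.0]]              # k x k = 2 x 2
V1T = [[s2 / 2, s2 / 2],
       [s2 / 2, -s2 / 2]]      # k x n = 2 x 2

# Reduced singular value decomposition: A = U1 * S1 * V1T
A_from_factors = matmul(matmul(U1, S1), V1T)

# Reduced singular value expansion: A = sum over j of sigma_j * u_j * v_j^T
A_from_expansion = [[sum(S1[j][j] * U1[i][j] * V1T[j][c] for j in range(2))
                     for c in range(2)] for i in range(3)]

for row in A_from_factors:
    print([round(entry, 10) for entry in row])
```

Both products should reproduce the matrix A of Example 1 up to floating-point rounding.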

Data Compression and Image Processing

Singular value decompositions can be used to “compress” visual information for the purpose of reducing its required storage space and speeding up its electronic transmission. The first step in compressing a visual image is to represent it as a numerical matrix from which the visual image can be recovered when needed. For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black), then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or displaying the pixels with their assigned gray levels. If the matrix A has size m × n, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced singular value decomposition

$$A = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \cdots + \sigma_k \mathbf{u}_k \mathbf{v}_k^T \tag{5}$$

in which σ1 ≥ σ2 ≥ · · · ≥ σk, and store the σ’s, the u’s, and the v’s. When needed, the matrix A (and hence the image it represents) can be reconstructed from (5). Since each uj has m entries and each vj has n entries, this method requires storage space for

$$km + kn + k = k(m + n + 1)$$

numbers. Suppose, however, that the singular values σr+1, . . . , σk are sufficiently small that dropping the corresponding terms in (5) produces an acceptable approximation

$$A_r = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \cdots + \sigma_r \mathbf{u}_r \mathbf{v}_r^T \tag{6}$$

to A and the image that it represents. We call (6) the rank r approximation of A. This matrix requires storage space for only

$$rm + rn + r = r(m + n + 1)$$

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a 1000 × 1000 matrix A requires storage for only

$$100(1000 + 1000 + 1) = 200{,}100$$

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A, a compression of almost 80%. Figure 9.5.1 shows some approximations of a digitized mandrill image obtained using (6).
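The storage count is simple enough to tabulate directly. The following Python sketch uses the hypothetical helper names `svd_storage` and `compression_fraction` (they do not come from the text) to reproduce the 1000 × 1000 example.

```python
def svd_storage(m, n, r):
    """Numbers stored for a rank r approximation: r sigmas, r u's, r v's."""
    return r * (m + n + 1)


def compression_fraction(m, n, r):
    """Fraction of the mn entry-by-entry storage that the rank r form saves."""
    return 1 - svd_storage(m, n, r) / (m * n)


print(svd_storage(1000, 1000, 100))                      # 200100
print(round(compression_fraction(1000, 1000, 100), 4))   # 0.7999
```

The same formula shows that the rank r form saves space only when r(m + n + 1) < mn, that is, when r < mn/(m + n + 1).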

Historical Note In 1924 the U.S. Federal Bureau of Investigation (FBI) began collecting fingerprints and handprints and now has more than 100 million such prints in its files. To reduce the storage cost, the FBI began working with the Los Alamos National Laboratory, the National Bureau of Standards, and other groups in 1993 to devise rank-based compression methods for storing prints in digital form. The following figure shows an original fingerprint and a reconstruction from digital data that was compressed at a ratio of 26:1.

[Figure: fingerprint panels labeled Original and Reconstruction]

Figure 9.5.1 Approximations of the mandrill image of rank 4, rank 10, rank 20, rank 50, and rank 128. [Image: Digital Vision/Age Fotostock America, Inc.]

Exercise Set 9.5

In Exercises 1–4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.4, where you were asked to find its (unreduced) singular value decomposition.]

1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$

2. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$

3. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$

4. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$

In Exercises 5–8, find a reduced singular value expansion of A.

5. The matrix A in Exercise 1.

6. The matrix A in Exercise 2.

7. The matrix A in Exercise 3.

8. The matrix A in Exercise 4.

9. Suppose A is a 200 × 500 matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the number of entries of A.

True-False Exercises

TF. In parts (a)–(c) determine whether the statement is true or false, and justify your answer. Assume that $U_1 \Sigma_1 V_1^T$ is a reduced singular value decomposition of an m × n matrix of rank k.

(a) $U_1$ has size m × k.

(b) $\Sigma_1$ has size k × k.

(c) $V_1$ has size k × n.

Chapter 9 Supplementary Exercises

1. Find an LU-decomposition of $A = \begin{bmatrix} -6 & 2 \\ 6 & 0 \end{bmatrix}$.

2. Find the LDU-decomposition of the matrix A in Exercise 1.

3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$.

4. Find the LDU-decomposition of the matrix A in Exercise 3.

5. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector v with positive entries.
(b) Apply the power method with Euclidean scaling to A and x0, stopping at x5. Compare your value of x5 to the eigenvector v found in part (a).
(c) Apply the power method with maximum entry scaling to A and x0, stopping at x5. Compare your result with the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

6. Consider the symmetric matrix

$$A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Discuss the behavior of the power sequence x0, x1, . . . , xk, . . . with Euclidean scaling for a general nonzero vector x0. What is it about the matrix that causes the observed behavior?

7. Suppose that a symmetric matrix A has distinct eigenvalues λ1 = 8, λ2 = 1.4, λ3 = 2.3, and λ4 = −8.1. What can you say about the convergence of the Rayleigh quotients?

8. Find a singular value decomposition of $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.

9. Find a singular value decomposition of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$.

10. Find a reduced singular value decomposition and a reduced singular value expansion of the matrix A in Exercise 9.

11. Find the reduced singular value decomposition of the matrix whose singular value decomposition is

$$A = \begin{bmatrix}
\frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\
\frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\
\frac{1}{2} & -\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\
\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2}
\end{bmatrix}
\begin{bmatrix}
24 & 0 & 0 \\
0 & 12 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
\frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \\
\frac{2}{3} & \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & \frac{2}{3}
\end{bmatrix}$$

12. Do orthogonally similar matrices have the same singular values? Justify your answer.

13. If P is the standard matrix for the orthogonal projection of $R^n$ onto a subspace W, what can you say about the singular values of P?

14. Prove: If A has rank 1, then there exists a scalar k such that $A^2 = kA$.

CHAPTER 10 Applications of Linear Algebra

CHAPTER CONTENTS

10.1 Constructing Curves and Surfaces Through Specified Points
10.2 The Earliest Applications of Linear Algebra
10.3 Cubic Spline Interpolation
10.4 Markov Chains
10.5 Graph Theory
10.6 Games of Strategy
10.7 Leontief Economic Models
10.8 Forest Management
10.9 Computer Graphics
10.10 Equilibrium Temperature Distributions
10.11 Computed Tomography
10.12 Fractals
10.13 Chaos
10.14 Cryptography
10.15 Genetics
10.16 Age-Specific Population Growth
10.17 Harvesting of Animal Populations
10.18 A Least Squares Model for Human Hearing
10.19 Warps and Morphs
10.20 Internet Search Engines

INTRODUCTION

This chapter consists of 20 applications of linear algebra. With one clearly marked exception, each application is in its own independent section, so sections can be deleted or permuted as desired. Each topic begins with a list of linear algebra prerequisites. Because our primary objective in this chapter is to present applications of linear algebra, proofs are often omitted. Whenever results from other fields are needed, they are stated precisely, with motivation where possible, but usually without proof.


10.1 Constructing Curves and Surfaces Through Specified Points

In this section we describe a technique that uses determinants to construct lines, circles, and general conic sections through specified points in the plane. The procedure is also used to pass planes and spheres in 3-space through fixed points.

PREREQUISITES: Linear Systems; Determinants; Analytic Geometry

The following theorem follows from Theorem 2.3.8.

THEOREM 10.1.1 A homogeneous linear system with as many equations as unknowns has a nontrivial solution if and only if the determinant of the coefficient matrix is zero.

We will now show how this result can be used to determine equations of various curves and surfaces through specified points.

A Line Through Two Points

Suppose that (x1, y1) and (x2, y2) are two distinct points in the plane. There exists a unique line

$$c_1 x + c_2 y + c_3 = 0 \tag{1}$$

that passes through these two points (Figure 10.1.1). Note that c1, c2, and c3 are not all zero and that these coefficients are unique only up to a multiplicative constant. Because (x1, y1) and (x2, y2) lie on the line, substituting them in (1) gives the two equations

$$c_1 x_1 + c_2 y_1 + c_3 = 0 \tag{2}$$

$$c_1 x_2 + c_2 y_2 + c_3 = 0 \tag{3}$$

[Figure 10.1.1: the line through the points (x1, y1) and (x2, y2)]

The three equations, (1), (2), and (3), can be grouped together and rewritten as

$$\begin{aligned}
x c_1 + y c_2 + c_3 &= 0 \\
x_1 c_1 + y_1 c_2 + c_3 &= 0 \\
x_2 c_1 + y_2 c_2 + c_3 &= 0
\end{aligned}$$

which is a homogeneous linear system of three equations for c1, c2, and c3. Because c1, c2, and c3 are not all zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be zero. That is,

$$\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}$$

Consequently, every point (x, y) on the line satisfies (4); conversely, it can be shown that every point (x, y) that satisfies (4) lies on the line.


EXAMPLE 1 Equation of a Line

Find the equation of the line that passes through the two points (2, 1) and (3, 7).

Solution Substituting the coordinates of the two points into Equation (4) gives

$$\begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row then gives

$$-6x + y + 11 = 0$$
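Expanding the determinant in (4) along its first row gives the coefficients of the line as three 2 × 2 cofactors. The short Python sketch below writes those cofactors out explicitly; the function name `line_through` is a hypothetical choice made for this illustration.

```python
def line_through(p, q):
    """Coefficients (c1, c2, c3) of c1*x + c2*y + c3 = 0 through p and q.

    They are the cofactors of the first row of the determinant
        | x   y   1 |
        | x1  y1  1 |
        | x2  y2  1 |
    """
    (x1, y1), (x2, y2) = p, q
    c1 = y1 - y2                # cofactor of x
    c2 = -(x1 - x2)             # cofactor of y
    c3 = x1 * y2 - x2 * y1      # cofactor of 1
    return c1, c2, c3


print(line_through((2, 1), (3, 7)))   # (-6, 1, 11)
```

For the points of Example 1 this returns the coefficients of −6x + y + 11 = 0. As the text notes, any nonzero multiple of these coefficients describes the same line.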

A Circle Through Three Points

Suppose that there are three distinct points in the plane, (x1, y1), (x2, y2), and (x3, y3), not all lying on a straight line. From analytic geometry we know that there is a unique circle, say,

$$c_1(x^2 + y^2) + c_2 x + c_3 y + c_4 = 0 \tag{5}$$

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives

$$c_1(x_1^2 + y_1^2) + c_2 x_1 + c_3 y_1 + c_4 = 0 \tag{6}$$

$$c_1(x_2^2 + y_2^2) + c_2 x_2 + c_3 y_2 + c_4 = 0 \tag{7}$$

$$c_1(x_3^2 + y_3^2) + c_2 x_3 + c_3 y_3 + c_4 = 0 \tag{8}$$

[Figure 10.1.2: the circle through the points (x1, y1), (x2, y2), and (x3, y3)]

As before, Equations (5) through (8) form a homogeneous linear system with a nontrivial solution for c1, c2, c3, and c4. Thus the determinant of the coefficient matrix is zero:

$$\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
x_1^2 + y_1^2 & x_1 & y_1 & 1 \\
x_2^2 + y_2^2 & x_2 & y_2 & 1 \\
x_3^2 + y_3^2 & x_3 & y_3 & 1
\end{vmatrix} = 0 \tag{9}$$

This is a determinant form for the equation of the circle.

EXAMPLE 2 Equation of a Circle

Find the equation of the circle that passes through the three points (1, 7), (6, 2), and (4, 6).

Solution Substituting the coordinates of the three points into Equation (9) gives

$$\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
50 & 1 & 7 & 1 \\
40 & 6 & 2 & 1 \\
52 & 4 & 6 & 1
\end{vmatrix} = 0$$

which reduces to

$$10(x^2 + y^2) - 20x - 40y - 200 = 0$$

In standard form this is

$$(x - 1)^2 + (y - 2)^2 = 5^2$$

Thus the circle has center (1, 2) and radius 5.
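The reduction in Example 2 can be reproduced by computing the cofactors of the first row of (9). The Python sketch below is one possible way to do this, using exact rational arithmetic from the standard library; the names `det`, `first_row_cofactors`, and `circle_through` are illustrative assumptions, and the recursive determinant is intended only for the small matrices of this section.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def first_row_cofactors(rows):
    """Cofactors of the symbolic first row, given the numeric rows below it."""
    size = len(rows) + 1
    return [(-1) ** j * det([r[:j] + r[j + 1:] for r in rows]) for j in range(size)]


def circle_through(points):
    """Coefficients c1..c4 of c1(x^2 + y^2) + c2 x + c3 y + c4 = 0."""
    rows = [[Fraction(x * x + y * y), Fraction(x), Fraction(y), Fraction(1)]
            for x, y in points]
    return first_row_cofactors(rows)


c1, c2, c3, c4 = circle_through([(1, 7), (6, 2), (4, 6)])
center = (-c2 / (2 * c1), -c3 / (2 * c1))
radius_squared = center[0] ** 2 + center[1] ** 2 - c4 / c1
print(c1, c2, c3, c4)            # 10 -20 -40 -200
print(center, radius_squared)
```

Completing the square in code, as in the last lines, recovers the center (1, 2) and the squared radius 25 found in the example.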


A General Conic Section Through Five Points

In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book I, Proposition 22, Problem 14): “To describe a conic that shall pass through five given points.” Newton solved this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D, P, C; however, the methods of this section can also be applied.

[Figure 10.1.3: Newton's geometric construction of an ellipse through the points A, B, D, P, C]

The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of these curves) is given by

$$c_1 x^2 + c_2 xy + c_3 y^2 + c_4 x + c_5 y + c_6 = 0$$

This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in determinant form (see Exercise 7):

$$\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
x_1^2 & x_1 y_1 & y_1^2 & x_1 & y_1 & 1 \\
x_2^2 & x_2 y_2 & y_2^2 & x_2 & y_2 & 1 \\
x_3^2 & x_3 y_3 & y_3^2 & x_3 & y_3 & 1 \\
x_4^2 & x_4 y_4 & y_4^2 & x_4 & y_4 & 1 \\
x_5^2 & x_5 y_5 & y_5^2 & x_5 & y_5 & 1
\end{vmatrix} = 0 \tag{10}$$

[Figure 10.1.4: a conic section through the five points (x1, y1), (x2, y2), (x3, y3), (x4, y4), and (x5, y5)]

EXAMPLE 3 Equation of an Orbit

An astronomer who wants to determine the orbit of an asteroid about the Sun sets up a Cartesian coordinate system in the plane of the orbit with the Sun at the origin. Astronomical units of measurement are used along the axes (1 astronomical unit = mean distance of Earth to Sun = 93 million miles). By Kepler’s first law, the orbit must be an ellipse, so the astronomer makes five observations of the asteroid at five different times and finds five points along the orbit to be

(8.025, 8.310), (10.170, 6.355), (11.202, 3.212), (10.736, 0.375), (9.092, −2.267)

Find the equation of the orbit.

Solution Substituting the coordinates of the five given points into (10) and rounding to three decimal places give

$$\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\
103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\
125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\
115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\
82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1
\end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row yields

$$386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0$$

Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points.

[Figure 10.1.5: the elliptical orbit with the Sun at the origin and the five observed points (8.025, 8.310), (10.170, 6.355), (11.202, 3.212), (10.736, 0.375), and (9.092, −2.267)]

A Plane Through Three Points

In Exercise 8 we ask you to show the following: The plane in 3-space with equation

$$c_1 x + c_2 y + c_3 z + c_4 = 0$$

that passes through three noncollinear points (x1, y1, z1), (x2, y2, z2), and (x3, y3, z3) is given by the determinant equation

$$\begin{vmatrix}
x & y & z & 1 \\
x_1 & y_1 & z_1 & 1 \\
x_2 & y_2 & z_2 & 1 \\
x_3 & y_3 & z_3 & 1
\end{vmatrix} = 0 \tag{11}$$

EXAMPLE 4 Equation of a Plane

The equation of the plane that passes through the three noncollinear points (1, 1, 0), (2, 0, −1), and (2, 9, 2) is

$$\begin{vmatrix}
x & y & z & 1 \\
1 & 1 & 0 & 1 \\
2 & 0 & -1 & 1 \\
2 & 9 & 2 & 1
\end{vmatrix} = 0$$

which reduces to

$$2x - y + 3z - 1 = 0$$

A Sphere Through Four Points

In Exercise 9 we ask you to show the following: The sphere in 3-space with equation

$$c_1(x^2 + y^2 + z^2) + c_2 x + c_3 y + c_4 z + c_5 = 0$$

that passes through four noncoplanar points (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), and (x4, y4, z4) is given by the following determinant equation:

$$\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\
x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\
x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1
\end{vmatrix} = 0 \tag{12}$$


EXAMPLE 5 Equation of a Sphere

The equation of the sphere that passes through the four points (0, 3, 2), (1, −1, 1), (2, 1, 0), and (5, 1, 3) is

$$\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
13 & 0 & 3 & 2 & 1 \\
3 & 1 & -1 & 1 & 1 \\
5 & 2 & 1 & 0 & 1 \\
35 & 5 & 1 & 3 & 1
\end{vmatrix} = 0$$

This reduces to

$$x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0$$

which in standard form is

$$(x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9$$
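The plane of Example 4 and the sphere of Example 5 both come from the same first-row cofactor expansion, applied to (11) and (12) respectively. The following Python sketch is an illustration under that reading; the helper names `det`, `coefficients`, `plane_through`, and `sphere_through` are hypothetical, and exact fractions are used so that the results can be compared with the text without rounding.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def coefficients(rows):
    """Cofactors along the symbolic first row of a determinant equation."""
    size = len(rows) + 1
    return [(-1) ** j * det([r[:j] + r[j + 1:] for r in rows]) for j in range(size)]


def plane_through(points):
    """Coefficients of c1 x + c2 y + c3 z + c4 = 0 from Equation (11)."""
    return coefficients([[Fraction(x), Fraction(y), Fraction(z), Fraction(1)]
                         for x, y, z in points])


def sphere_through(points):
    """Coefficients of c1(x^2+y^2+z^2) + c2 x + c3 y + c4 z + c5 = 0 from (12)."""
    return coefficients([[Fraction(x * x + y * y + z * z), Fraction(x),
                          Fraction(y), Fraction(z), Fraction(1)]
                         for x, y, z in points])


plane = plane_through([(1, 1, 0), (2, 0, -1), (2, 9, 2)])
sphere = sphere_through([(0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3)])
sphere_normalized = [c / sphere[0] for c in sphere]   # divide through by c1

print([str(c) for c in plane])              # a multiple of 2, -1, 3, -1
print([str(c) for c in sphere_normalized])
```

The raw cofactors are determined only up to a common factor, so the plane coefficients appear as a multiple of those in Example 4, and the sphere coefficients are divided by c1 before comparing them with Example 5.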

Exercise Set 10.1

1. Find the equations of the lines that pass through the following points:
(a) (1, −1), (2, 2)
(b) (0, 1), (1, −1)

2. Find the equations of the circles that pass through the following points:
(a) (2, 6), (2, 0), (5, 3)
(b) (2, −2), (3, 5), (−4, 6)

3. Find the equation of the conic section that passes through the points (0, 0), (0, −1), (2, 0), (2, −5), and (4, −1).

4. Find the equations of the planes in 3-space that pass through the following points:
(a) (1, 1, −3), (1, −1, 1), (0, −1, 2)
(b) (2, 3, 1), (2, −1, −1), (1, 2, 1)

5. (a) Alter Equation (11) so that it determines the plane that passes through the origin and is parallel to the plane that passes through three specified noncollinear points.
(b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b).

6. Find the equations of the spheres in 3-space that pass through the following points:
(a) (1, 2, 3), (−1, 2, 1), (1, 0, 1), (1, 2, −1)
(b) (0, 1, −2), (1, 3, 1), (2, −1, 0), (3, 1, −1)

7. Show that Equation (10) is the equation of the conic section that passes through five given distinct points in the plane.

8. Show that Equation (11) is the equation of the plane in 3-space that passes through three given noncollinear points.

9. Show that Equation (12) is the equation of the sphere in 3-space that passes through four given noncoplanar points.

10. Find a determinant equation for the parabola of the form

$$c_1 y + c_2 x^2 + c_3 x + c_4 = 0$$

that passes through three given noncollinear points in the plane.

11. What does Equation (9) become if the three distinct points are collinear?

12. What does Equation (11) become if the three distinct points are collinear?

13. What does Equation (12) become if the four points are coplanar?

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. The general equation of a quadric surface is given by

$$a_1 x^2 + a_2 y^2 + a_3 z^2 + a_4 xy + a_5 xz + a_6 yz + a_7 x + a_8 y + a_9 z + a_{10} = 0$$

Given nine points on this surface, it may be possible to determine its equation.

(a) Show that if the nine points (xi, yi, zi) for i = 1, 2, 3, . . . , 9 lie on this surface, and if they determine uniquely the equation of this surface, then its equation can be written in determinant form as

$$\begin{vmatrix}
x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\
x_1^2 & y_1^2 & z_1^2 & x_1 y_1 & x_1 z_1 & y_1 z_1 & x_1 & y_1 & z_1 & 1 \\
x_2^2 & y_2^2 & z_2^2 & x_2 y_2 & x_2 z_2 & y_2 z_2 & x_2 & y_2 & z_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_9^2 & y_9^2 & z_9^2 & x_9 y_9 & x_9 z_9 & y_9 z_9 & x_9 & y_9 & z_9 & 1
\end{vmatrix} = 0$$

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points (1, 2, 3), (2, 1, 7), (0, 4, 6), (3, −1, 4), (3, 0, 11), (−1, 5, 8), (9, −8, 3), (4, 5, 3), and (−2, 6, 10).

T2. (a) A hyperplane in the n-dimensional Euclidean space $R^n$ has an equation of the form

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + a_{n+1} = 0$$

where ai, i = 1, 2, 3, . . . , n + 1, are constants, not all zero, and xi, i = 1, 2, 3, . . . , n, are variables for which

$$(x_1, x_2, x_3, \ldots, x_n) \in R^n$$

A point

$$(x_{10}, x_{20}, x_{30}, \ldots, x_{n0}) \in R^n$$

lies on this hyperplane if

$$a_1 x_{10} + a_2 x_{20} + a_3 x_{30} + \cdots + a_n x_{n0} + a_{n+1} = 0$$

Given that the n points (x1i, x2i, x3i, . . . , xni), i = 1, 2, 3, . . . , n, lie on this hyperplane and that they uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written in determinant form as

$$\begin{vmatrix}
x_1 & x_2 & x_3 & \cdots & x_n & 1 \\
x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\
x_{13} & x_{23} & x_{33} & \cdots & x_{n3} & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0$$

(b) Determine the equation of the hyperplane in $R^9$ that goes through the following nine points:

(1, 2, 3, 4, 5, 6, 7, 8, 9)
(2, 3, 4, 5, 6, 7, 8, 9, 1)
(3, 4, 5, 6, 7, 8, 9, 1, 2)
(4, 5, 6, 7, 8, 9, 1, 2, 3)
(5, 6, 7, 8, 9, 1, 2, 3, 4)
(6, 7, 8, 9, 1, 2, 3, 4, 5)
(7, 8, 9, 1, 2, 3, 4, 5, 6)
(8, 9, 1, 2, 3, 4, 5, 6, 7)
(9, 1, 2, 3, 4, 5, 6, 7, 8)

10.2 The Earliest Applications of Linear Algebra

Linear systems can be found in the earliest writings of many ancient civilizations. In this section we give some examples of the types of problems that they used to solve.

PREREQUISITES: Linear Systems

The practical problems of early civilizations included the measurement of land, the distribution of goods, the tracking of resources such as wheat and cattle, and taxation and inheritance calculations. In many cases, these problems led to linear systems of equations since linearity is one of the simplest relationships that can exist among variables. In this section we present examples from five diverse ancient cultures illustrating how they used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a field that ultimately led in the nineteenth century to the branch of mathematics now called linear algebra.

EXAMPLE 1 Egypt (about 1650 B.C.)

[Image: Problem 40 of the Ahmes Papyrus. © The Trustees of the British Museum]

The Ahmes (or Rhind) Papyrus is the source of most of our information about ancient Egyptian mathematics. This 5-meter-long papyrus contains 84 short mathematical problems, together with their solutions, and dates from about 1650 B.C. Problem 40 in this papyrus is the following:

Divide 100 hekats of barley among five men in arithmetic progression so that the sum of the two smallest is one-seventh the sum of the three largest.

Let a be the least amount that any man obtains, and let d be the common difference of the terms in the arithmetic progression. Then the other four men receive a + d, a + 2d, a + 3d, and a + 4d hekats. The two conditions of the problem require that

$$a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) = 100$$

$$\tfrac{1}{7}\,[(a + 2d) + (a + 3d) + (a + 4d)] = a + (a + d)$$

These equations reduce to the following system of two equations in two unknowns:

$$\begin{aligned}
5a + 10d &= 100 \\
11a - 2d &= 0
\end{aligned} \tag{1}$$

The solution technique described in the papyrus is known as the method of false position or false assumption. It begins by assuming some convenient value of a (in our case a = 1), substituting that value into the second equation, and obtaining d = 11/2. Substituting a = 1 and d = 11/2 into the left-hand side of the first equation gives 60, whereas the right-hand side is 100. Adjusting the initial guess for a by multiplying it by 100/60 leads to the correct value a = 5/3. Substituting a = 5/3 into the second equation then gives d = 55/6, so the quantities of barley received by the five men are 10/6, 65/6, 120/6, 175/6, and 230/6 hekats. This technique of guessing a value of an unknown and later adjusting it has been used by many cultures throughout the ages.

EXAMPLE 2 Babylonia (1900–1600 B.C.)

The Old Babylonian Empire flourished in Mesopotamia between 1900 and 1600 B.C. Many clay tablets containing mathematical tables and problems survive from that period, one of which (designated Ca MLA 1950) contains the next problem. The statement of the problem is a bit muddled because of the condition of the tablet, but the diagram and the solution on the tablet indicate that the problem is as follows:

[Image: Babylonian clay tablet Ca MLA 1950. American Oriental Society/American Schools of Oriental Research]

A trapezoid with an area of 320 square units is cut off from a right triangle by a line parallel to one of its sides. The other side has length 50 units, and the height of the trapezoid is 20 units. What are the upper and the lower widths of the trapezoid?

[Figure: a right triangle whose side of length 50 is divided into segments of lengths 20 and 30; the trapezoid of height 20 and area 320 has lower width x and upper width y]

Let x be the lower width of the trapezoid and y its upper width. The area of the trapezoid is its height times its average width, so

$$20\left(\frac{x + y}{2}\right) = 320$$

Using similar triangles, we also have

$$\frac{x}{50} = \frac{y}{30}$$

The solution on the tablet uses these relations to generate the linear system

$$\begin{aligned}
\tfrac{1}{2}(x + y) &= 16 \\
\tfrac{1}{2}(x - y) &= 4
\end{aligned} \tag{2}$$

Adding and subtracting these two equations then gives the solution x = 20 and y = 12.

EXAMPLE 3 China (A.D. 263)

Chiu Chang Suan Shu in Chinese characters

The most important treatise in the history of Chinese mathematics is the Chiu Chang Suan Shu, or “The Nine Chapters of the Mathematical Art.” This treatise, which is a collection of 246 problems and their solutions, was assembled in its final form by Liu Hui in A.D. 263. Its contents, however, go back to at least the beginning of the Han dynasty in the second century B.C. The eighth of its nine chapters, entitled “The Way of Calculating by Arrays,” contains 18 word problems that lead to linear systems in three to six unknowns. The general solution procedure described is almost identical to the Gaussian elimination technique developed in Europe in the nineteenth century by Carl Friedrich Gauss (see page 15).

The first problem in the eighth chapter is the following:

There are three classes of corn, of which three bundles of the first class, two of the second, and one of the third make 39 measures. Two of the first, three of the second, and one of the third make 34 measures. And one of the first, two of the second, and three of the third make 26 measures. How many measures of grain are contained in one bundle of each class?

Let x, y, and z be the measures of the first, second, and third classes of corn. Then the conditions of the problem lead to the following linear system of three equations in three unknowns:

\[
\begin{aligned}
3x + 2y + z &= 39 \\
2x + 3y + z &= 34 \\
x + 2y + 3z &= 26
\end{aligned}
\tag{3}
\]


Chapter 10 Applications of Linear Algebra

The solution described in the treatise represented the coefficients of each equation by an appropriate number of rods placed within squares on a counting table. Positive coefficients were represented by black rods, negative coefficients were represented by red rods, and the squares corresponding to zero coefficients were left empty. The counting table was laid out as follows so that the coefficients of each equation appear in columns with the first equation in the rightmost column:

\[
\begin{array}{|c|c|c|}
\hline
1 & 2 & 3 \\ \hline
2 & 3 & 2 \\ \hline
3 & 1 & 1 \\ \hline
26 & 34 & 39 \\ \hline
\end{array}
\]

Next, the numbers of rods within the squares were adjusted to accomplish the following two steps: (1) two times the numbers of the third column were subtracted from three times the numbers in the second column and (2) the numbers in the third column were subtracted from three times the numbers in the first column. The result was the following array:

\[
\begin{array}{|c|c|c|}
\hline
 &  & 3 \\ \hline
4 & 5 & 2 \\ \hline
8 & 1 & 1 \\ \hline
39 & 24 & 39 \\ \hline
\end{array}
\]

In this array, four times the numbers in the second column were subtracted from five times the numbers in the first column, yielding

\[
\begin{array}{|c|c|c|}
\hline
 &  & 3 \\ \hline
 & 5 & 2 \\ \hline
36 & 1 & 1 \\ \hline
99 & 24 & 39 \\ \hline
\end{array}
\]

This last array is equivalent to the linear system

\[
\begin{aligned}
3x + 2y + z &= 39 \\
5y + z &= 24 \\
36z &= 99
\end{aligned}
\]

This triangular system was solved by a method equivalent to back substitution to obtain

x = 37/4, y = 17/4, and z = 11/4.
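The column operations of Example 3 can be replayed in a few lines of Python. This is only an illustrative sketch: the names `col1`, `col2`, `col3` and the helper `combine` are our own labels for the three columns of the counting table, read from left to right, and exact fractions are used so that the back substitution reproduces the values stated above.

```python
from fractions import Fraction

# Each column holds one equation as [coeff of x, coeff of y, coeff of z, constant].
# Columns are listed left to right, so the first equation is the rightmost column.
col1 = [1, 2, 3, 26]   # x + 2y + 3z = 26
col2 = [2, 3, 1, 34]   # 2x + 3y + z = 34
col3 = [3, 2, 1, 39]   # 3x + 2y + z = 39


def combine(p, a, q, b):
    """Return p*a - q*b, entry by entry, for two columns a and b (hypothetical helper)."""
    return [p * u - q * v for u, v in zip(a, b)]


# Step (1): three times the second column minus two times the third column.
col2 = combine(3, col2, 2, col3)
# Step (2): three times the first column minus the third column.
col1 = combine(3, col1, 1, col3)
# Last step: five times the first column minus four times the second column.
col1 = combine(5, col1, 4, col2)

# Back substitution on the triangular system, using exact arithmetic.
z = Fraction(col1[3], col1[2])
y = (col2[3] - col2[2] * z) / col2[1]
x = (col3[3] - col3[1] * y - col3[2] * z) / col3[0]

print(col1, col2, col3)
print(x, y, z)
```

After the three steps the columns hold the final array of the example, and the printed fractions agree with the solution given in the text.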

EXAMPLE 4 Greece (third century B.C.)

Archimedes c. 287–212 B.C.

Perhaps the most famous system of linear equations from antiquity is the one associated with the first part of Archimedes’ celebrated Cattle Problem. This problem supposedly was posed by Archimedes as a challenge to his colleague Eratosthenes. No solution has come down to us from ancient times, so that it is not known how, or even whether, either of these two geometers solved it.

10.2 The Earliest Applications of Linear Algebra


If thou art diligent and wise, O stranger, compute the number of cattle of the Sun, who once upon a time grazed on the fields of the Thrinacian isle of Sicily, divided into four herds of different colors, one milk white, another glossy black, a third yellow, and the last dappled.

In each herd were bulls, mighty in number according to these proportions: Understand, stranger, that the white bulls were equal to a half and a third of the black together with the whole of the yellow, while the black were equal to the fourth part of the dappled and a fifth, together with, once more, the whole of the yellow. Observe further that the remaining bulls, the dappled, were equal to a sixth part of the white and a seventh, together with all of the yellow.

These were the proportions of the cows: The white were precisely equal to the third part and a fourth of the whole herd of the black; while the black were equal to the fourth part once more of the dappled and with it a fifth part, when all, including the bulls, went to pasture together. Now the dappled in four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally the yellow were in number equal to a sixth part and a seventh of the white herd.

If thou canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the number of well-fed bulls and again the number of females according to each color, thou wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be numbered among the wise.

The conventional designation of the eight variables in this problem is

W = number of white bulls
B = number of black bulls
Y = number of yellow bulls
D = number of dappled bulls
w = number of white cows
b = number of black cows
y = number of yellow cows
d = number of dappled cows

The problem can now be stated as the following seven homogeneous equations in eight unknowns:

1. \( W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y \)
   (The white bulls were equal to a half and a third of the black [bulls] together with the whole of the yellow [bulls].)

2. \( B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y \)
   (The black [bulls] were equal to the fourth part of the dappled [bulls] and a fifth, together with, once more, the whole of the yellow [bulls].)

3. \( D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y \)
   (The remaining bulls, the dappled, were equal to a sixth part of the white [bulls] and a seventh, together with all of the yellow [bulls].)

4. \( w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b) \)
   (The white [cows] were precisely equal to the third part and a fourth of the whole herd of the black.)

5. \( b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d) \)
   (The black [cows] were equal to the fourth part once more of the dappled and with it a fifth part, when all, including the bulls, went to pasture together.)

6. \( d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y) \)
   (The dappled [cows] in four parts [that is, in totality] were equal in number to a fifth part and a sixth of the yellow herd.)

7. \( y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w) \)
   (The yellow [cows] were in number equal to a sixth part and a seventh of the white herd.)

As we ask you to show in the exercises, this system has infinitely many solutions of the form

\[
\begin{aligned}
W &= 10{,}366{,}482k \\
B &= 7{,}460{,}514k \\
Y &= 4{,}149{,}387k \\
D &= 7{,}358{,}060k \\
w &= 7{,}206{,}360k \\
b &= 4{,}893{,}246k \\
y &= 5{,}439{,}213k \\
d &= 3{,}515{,}820k
\end{aligned}
\tag{4}
\]

where k is any real number. The values k = 1, 2, . . . give infinitely many positive integer solutions to the problem, with k = 1 giving the smallest solution.
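As a quick sanity check (not a derivation, which is left to the exercises), the following Python sketch substitutes the values in (4) into the seven equations and confirms that every residual is exactly zero. The function name `residuals` is our own; exact rational arithmetic from the standard `fractions` module avoids any rounding.

```python
from fractions import Fraction as F

# Values from Eq. (4) with k = 1.
W, B, Y, D = 10_366_482, 7_460_514, 4_149_387, 7_358_060
w, b, y, d = 7_206_360, 4_893_246, 5_439_213, 3_515_820


def residuals(k):
    """Left side minus right side of equations 1-7 for the multiplier k."""
    Wk, Bk, Yk, Dk = k * W, k * B, k * Y, k * D
    wk, bk, yk, dk = k * w, k * b, k * y, k * d
    return [
        Wk - ((F(1, 2) + F(1, 3)) * Bk + Yk),
        Bk - ((F(1, 4) + F(1, 5)) * Dk + Yk),
        Dk - ((F(1, 6) + F(1, 7)) * Wk + Yk),
        wk - (F(1, 3) + F(1, 4)) * (Bk + bk),
        bk - (F(1, 4) + F(1, 5)) * (Dk + dk),
        dk - (F(1, 5) + F(1, 6)) * (Yk + yk),
        yk - (F(1, 6) + F(1, 7)) * (Wk + wk),
    ]


# Every equation is satisfied for k = 1 and for any other rational multiplier.
print(all(r == 0 for r in residuals(1)))
print(all(r == 0 for r in residuals(F(3, 7))))

# Total number of cattle in the smallest positive integer solution (k = 1).
total = W + B + Y + D + w + b + y + d
print(total)
```

Because the system is homogeneous, scaling all eight values by the same k leaves every residual at zero, which is why the second check also succeeds.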

EXAMPLE 5 India (fourth century A.D.)

The Bakhshali Manuscript is an ancient work of Indian/Hindu mathematics dating from around the fourth century A.D., although some of its materials undoubtedly come from many centuries before. It consists of about 70 leaves or sheets of birch bark containing mathematical problems and their solutions. Many of its problems are so-called equalization problems that lead to systems of linear equations. One such problem on the fragment shown is the following:

Fragment III-5-3v of the Bakhshali Manuscript [Image: Bodleian Library, University of Oxford, MS. Sansk. d. 14, fragment III 5 3v.]

One merchant has seven asava horses, a second has nine haya horses, and a third has ten camels. They are equally well off in the value of their animals if each gives two animals, one to each of the others. Find the price of each animal and the total value of the animals possessed by each merchant.

Let x be the price of an asava horse, let y be the price of a haya horse, let z be the price of a camel, and let K be the total value of the animals possessed by each merchant. Then the conditions of the problem lead to the following system of equations:

\[
\begin{aligned}
5x + y + z &= K \\
x + 7y + z &= K \\
x + y + 8z &= K
\end{aligned}
\tag{5}
\]

The method of solution described in the manuscript begins by subtracting the quantity (x + y + z) from both sides of the three equations to obtain 4x = 6y = 7z = K − (x + y + z). This shows that if the prices x , y , and z are to be integers, then the quantity K − (x + y + z) must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these three numbers, or 168, for the value of K − (x + y + z), which yields x = 42, y = 28, and z = 24 for the prices and K = 262 for the total value. (See Exercise 6 for more solutions to this problem.)
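The manuscript's reasoning can be mirrored in a short Python sketch. The function name `prices_from_common_value` and its parameter `t`, which stands for the common quantity K − (x + y + z), are our own notation; the code simply recovers the prices from 4x = 6y = 7z = t and checks them against Equations (5).

```python
from fractions import Fraction


def prices_from_common_value(t):
    """Return (x, y, z, K) when 4x = 6y = 7z = K - (x + y + z) = t."""
    x, y, z = Fraction(t, 4), Fraction(t, 6), Fraction(t, 7)
    K = t + x + y + z
    return x, y, z, K


# The manuscript's choice: t = 4 * 6 * 7 = 168.
x, y, z, K = prices_from_common_value(4 * 6 * 7)
print(x, y, z, K)

# Check the three conditions in Equations (5).
print(5 * x + y + z == K, x + 7 * y + z == K, x + y + 8 * z == K)
```

Other values of `t` give other solutions of (5); the prices are integers only when `t` is divisible by 4, 6, and 7.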


Exercise Set 10.2

1. The following lines from Book 12 of Homer’s Odyssey relate a precursor of Archimedes’ Cattle Problem:

   Thou shalt ascend the isle triangular,
   Where many oxen of the Sun are fed,
   And fatted flocks. Of oxen fifty head
   In every herd feed, and their herds are seven;
   And of his fat flocks is their number even.

   The last line means that there are as many sheep in all the flocks as there are oxen in all the herds. What is the total number of oxen and sheep that belong to the god of the Sun? (This was a difficult problem in Homer’s day.)

2. Solve the following problems from the Bakhshali Manuscript.
   (a) B possesses two times as much as A; C has three times as much as A and B together; D has four times as much as A, B, and C together. Their total possessions are 300. What is the possession of A?
   (b) B gives 2 times as much as A; C gives 3 times as much as B; D gives 4 times as much as C. Their total gift is 132. What is the gift of A?

3. A problem on a Babylonian tablet requires finding the length and width of a rectangle given that the length and the width add up to 10, while the length and one-fourth of the width add up to 7. The solution provided on the tablet consists of the following four statements:

   Multiply 7 by 4 to obtain 28.
   Take away 10 from 28 to obtain 18.
   Take one-third of 18 to obtain 6, the length.
   Take away 6 from 10 to obtain 4, the width.

   Explain how these steps lead to the answer.

4. The following two problems are from “The Nine Chapters of the Mathematical Art.” Solve them using the array technique described in Example 3.
   (a) Five oxen and two sheep are worth 10 units and two oxen and five sheep are worth 8 units. What is the value of each ox and sheep?
   (b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of these three classes of corn, are not sufficient to make a whole measure. However, if we added to them one bundle of the second, third, and first classes, respectively, then the grains would become one full measure in each case. How many measures of grain does each bundle of the different classes contain?

5. The problem in part (a) is known as the “Flower of Thymaridas,” named after a Pythagorean of the fourth century B.C.
   (a) Given the n numbers a1, a2, . . . , an, solve for x1, x2, . . . , xn in the following linear system:

   \[
   \begin{aligned}
   x_1 + x_2 + \cdots + x_n &= a_1 \\
   x_1 + x_2 &= a_2 \\
   x_1 + x_3 &= a_3 \\
   &\;\;\vdots \\
   x_1 + x_n &= a_n
   \end{aligned}
   \]

   (b) Identify a problem in this exercise set that fits the pattern in part (a), and solve it using your general solution.

6. For Example 5 from the Bakhshali Manuscript:
   (a) Express Equations (5) as a homogeneous linear system of three equations in four unknowns (x, y, z, and K) and show that the solution set has one arbitrary parameter.
   (b) Find the smallest solution for which all four variables are positive integers.
   (c) Show that the solution given in Example 5 is included among your solutions.

7. Solve the problems posed in the following three epigrams, which appear in a collection entitled “The Greek Anthology,” compiled in part by a scholar named Metrodorus around A.D. 500. Some of its 46 mathematical problems are believed to date as far back as 600 B.C. [Note: Before solving parts (a) and (c), you will have to formulate the question.]
   (a) I desire my two sons to receive the thousand staters of which I am possessed, but let the fifth part of the legitimate one’s share exceed by ten the fourth part of what falls to the illegitimate one.
   (b) Make me a crown weighing sixty minae, mixing gold and brass, and with them tin and much-wrought iron. Let the gold and brass together form two-thirds, the gold and tin together three-fourths, and the gold and iron three-fifths. Tell me how much gold you must put in, how much brass, how much tin, and how much iron, so as to make the whole crown weigh sixty minae.
   (c) First person: I have what the second has and the third of what the third has. Second person: I have what the third has and the third of what the first has. Third person: And I have ten minae and the third of what the second has.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.


T1. (a) Solve Archimedes’ Cattle Problem using a symbolic algebra program.
    (b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these states that “When the white bulls mingled their number with the black, they stood firm, equal in depth and breadth.” This requires that W + B be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this requires that the values of k in Eq. (4) be restricted as follows:

    \[
    k = 4{,}456{,}749\,r^2, \qquad r = 1, 2, 3, \ldots
    \]

    and find the smallest total number of cattle that satisfies this second condition.

Remark The second condition imposed in the second part of the Cattle Problem states that “When the yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number, beginning from one, grew slowly greater ’til it completed a triangular figure.” This requires that the quantity Y + D be a triangular number, that is, a number of the form 1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4, . . . . This final part of the problem was not completely solved until 1965 when all 206,545 digits of the smallest number of cattle that satisfies this condition were found using a computer.

T2. The following problem is from “The Nine Chapters of the Mathematical Art” and determines a homogeneous linear system of five equations in six unknowns. Show that the system has infinitely many solutions, and find the one for which the depth of the well and the lengths of the five ropes are the smallest possible positive integers.

    Suppose that five families share a well. Suppose further that

    2 of A’s ropes are short of the well’s depth by one of B’s ropes.
    3 of B’s ropes are short of the well’s depth by one of C’s ropes.
    4 of C’s ropes are short of the well’s depth by one of D’s ropes.
    5 of D’s ropes are short of the well’s depth by one of E’s ropes.
    6 of E’s ropes are short of the well’s depth by one of A’s ropes.

10.3 Cubic Spline Interpolation

In this section an artist’s drafting aid is used as a physical model for the mathematical problem of finding a curve that passes through specified points in the plane. The parameters of the curve are determined by solving a linear system of equations.

PREREQUISITES: Linear Systems; Matrix Algebra; Differential Calculus

Curve Fitting

Fitting a curve through specified points in the plane is a common problem encountered in analyzing experimental data, in ascertaining the relations among variables, and in design work. A ubiquitous application is in the design and description of computer and printer fonts, such as PostScript™ and TrueType™ fonts (Figure 10.3.1). In Figure 10.3.2 seven points in the xy-plane are displayed, and in Figure 10.3.4 a smooth curve has been drawn that passes through them. A curve that passes through a set of points in the plane is said to interpolate those points, and the curve is called an interpolating curve for those points. The interpolating curve in Figure 10.3.4 was drawn with the aid of a drafting spline (Figure 10.3.3). This drafting aid consists of a thin, flexible strip of wood or other material that is bent to pass through the points to be interpolated. Attached sliding weights hold the spline in position while the artist draws the interpolating curve. The drafting spline will serve as the physical model for a mathematical theory of interpolation that we will discuss in this section.

Figure 10.3.1

Figure 10.3.2 [seven points in the xy-plane]

Figure 10.3.3 [a drafting spline]

Figure 10.3.4 [a smooth curve through the seven points]

Statement of the Problem

Suppose that we are given n points in the xy-plane,

\[
(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)
\]

which we wish to interpolate with a “well-behaved” curve (Figure 10.3.5). For convenience, we take the points to be equally spaced in the x-direction, although our results can easily be extended to the case of unequally spaced points. If we let the common distance between the x-coordinates of the points be h, then we have

\[
x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h
\]

Let y = S(x), x1 ≤ x ≤ xn, denote the interpolating curve that we seek. We assume that this curve describes the displacement of a drafting spline that interpolates the n points when the weights holding down the spline are situated precisely at the n points. It is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along any interval of the x-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and realize that the only external forces acting on it arise from the weights at the n specified points, then it follows that

\[
S^{(iv)}(x) \equiv 0
\tag{1}
\]

for values of x lying in the n − 1 open intervals

\[
(x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n)
\]

between the n points.

Figure 10.3.5 [the curve y = S(x) through the points (x1, y1), (x2, y2), (x3, y3), . . . , (xn−1, yn−1), (xn, yn), with consecutive x-coordinates a distance h apart]


We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement must have two continuous derivatives. In the case of the interpolating curve y = S(x) constructed by the drafting spline, this means that S(x), S′(x), and S″(x) must be continuous for x1 ≤ x ≤ xn.

The condition that S″(x) be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous curvature. The eye can perceive sudden changes in curvature (that is, discontinuities in S″(x)), but sudden changes in higher derivatives are not discernible. Thus, the condition that S″(x) be continuous is the minimal prerequisite for the interpolating curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.

To determine the mathematical form of the function S(x), we observe that because S^(iv)(x) ≡ 0 in the intervals between the n specified points, it follows by integrating this equation four times that S(x) must be a cubic polynomial in x in each such interval. In general, however, S(x) will be a different cubic polynomial in each interval, so S(x) must have the form

\[
S(x) =
\begin{cases}
S_1(x), & x_1 \le x \le x_2 \\
S_2(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
S_{n-1}(x), & x_{n-1} \le x \le x_n
\end{cases}
\tag{2}
\]

where S1(x), S2(x), . . . , Sn−1(x) are cubic polynomials. For convenience, we will write these in the form

\[
\begin{aligned}
S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{3}
\]

The ai’s, bi’s, ci’s, and di’s constitute a total of 4n − 4 coefficients that we must determine to specify S(x) completely. If we choose these coefficients so that S(x) interpolates the n specified points in the plane and S(x), S′(x), and S″(x) are continuous, then the resulting interpolating curve is called a cubic spline.

Derivation of the Formula of a Cubic Spline

From Equations (2) and (3), we have

\[
\begin{aligned}
S(x) = S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S(x) = S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S(x) = S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{4}
\]

so

\[
\begin{aligned}
S'(x) = S_1'(x) &= 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\
S'(x) = S_2'(x) &= 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S'(x) = S_{n-1}'(x) &= 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{5}
\]


and

\[
\begin{aligned}
S''(x) = S_1''(x) &= 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\
S''(x) = S_2''(x) &= 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\
&\;\;\vdots \\
S''(x) = S_{n-1}''(x) &= 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{6}
\]

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients ai, bi, ci, di, i = 1, 2, . . . , n − 1, in terms of the known coordinates y1, y2, . . . , yn.

1. S(x) interpolates the points (xi, yi), i = 1, 2, . . . , n. Because S(x) interpolates the points (xi, yi), i = 1, 2, . . . , n, we have

\[
S(x_1) = y_1,\ S(x_2) = y_2,\ \ldots,\ S(x_n) = y_n
\tag{7}
\]

From the first n − 1 of these equations and (4), we obtain

\[
\begin{aligned}
d_1 &= y_1 \\
d_2 &= y_2 \\
&\;\;\vdots \\
d_{n-1} &= y_{n-1}
\end{aligned}
\tag{8}
\]

From the last equation in (7), the last equation in (4), and the fact that xn − xn−1 = h, we obtain

\[
a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n
\tag{9}
\]

2. S(x) is continuous on [x1, xn]. Because S(x) is continuous for x1 ≤ x ≤ xn, it follows that at each point xi in the set x2, x3, . . . , xn−1 we must have

\[
S_{i-1}(x_i) = S_i(x_i), \qquad i = 2, 3, \ldots, n - 1
\tag{10}
\]

Otherwise, the graphs of Si−1(x) and Si(x) would not join together to form a continuous curve at xi. When we apply the interpolating property Si(xi) = yi, it follows from (10) that Si−1(xi) = yi, i = 2, 3, . . . , n − 1, or from (4) that

\[
\begin{aligned}
a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\
a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\
&\;\;\vdots \\
a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1}
\end{aligned}
\tag{11}
\]

3. S′(x) is continuous on [x1, xn]. Because S′(x) is continuous for x1 ≤ x ≤ xn, it follows that

\[
S_{i-1}'(x_i) = S_i'(x_i), \qquad i = 2, 3, \ldots, n - 1
\]

or, from (5),

\[
\begin{aligned}
3a_1h^2 + 2b_1h + c_1 &= c_2 \\
3a_2h^2 + 2b_2h + c_2 &= c_3 \\
&\;\;\vdots \\
3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1}
\end{aligned}
\tag{12}
\]

4. S″(x) is continuous on [x1, xn]. Because S″(x) is continuous for x1 ≤ x ≤ xn, it follows that

\[
S_{i-1}''(x_i) = S_i''(x_i), \qquad i = 2, 3, \ldots, n - 1
\]

or, from (6),

\[
\begin{aligned}
6a_1h + 2b_1 &= 2b_2 \\
6a_2h + 2b_2 &= 2b_3 \\
&\;\;\vdots \\
6a_{n-2}h + 2b_{n-2} &= 2b_{n-1}
\end{aligned}
\tag{13}
\]

Equations (8), (9), (11), (12), and (13) constitute a system of 4n − 6 linear equations in the 4n − 4 unknown coefficients ai, bi, ci, di, i = 1, 2, . . . , n − 1. Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these additional equations, however, we can simplify our existing system by expressing the unknowns ai, bi, ci, and di in terms of new unknown quantities

\[
M_1 = S''(x_1),\ M_2 = S''(x_2),\ \ldots,\ M_n = S''(x_n)
\]

and the known quantities

\[
y_1,\ y_2,\ \ldots,\ y_n
\]

For example, from (6) it follows that

\[
\begin{aligned}
M_1 &= 2b_1 \\
M_2 &= 2b_2 \\
&\;\;\vdots \\
M_{n-1} &= 2b_{n-1}
\end{aligned}
\]

so

\[
b_1 = \tfrac{1}{2}M_1,\ b_2 = \tfrac{1}{2}M_2,\ \ldots,\ b_{n-1} = \tfrac{1}{2}M_{n-1}
\]

Moreover, we already know from (8) that

\[
d_1 = y_1,\ d_2 = y_2,\ \ldots,\ d_{n-1} = y_{n-1}
\]

We leave it as an exercise for you to derive the expressions for the ai’s and ci’s in terms of the Mi’s and yi’s. The final result is as follows:


THEOREM 10.3.1 Cubic Spline Interpolation

Given n points (x1, y1), (x2, y2), . . . , (xn, yn) with xi+1 − xi = h, i = 1, 2, . . . , n − 1, the cubic spline

\[
S(x) =
\begin{cases}
a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
\quad\vdots \\
a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{cases}
\]

that interpolates these points has coefficients given by

\[
\begin{aligned}
a_i &= (M_{i+1} - M_i)/6h \\
b_i &= M_i/2 \\
c_i &= (y_{i+1} - y_i)/h - [(M_{i+1} + 2M_i)h/6] \\
d_i &= y_i
\end{aligned}
\tag{14}
\]

for i = 1, 2, . . . , n − 1, where Mi = S″(xi), i = 1, 2, . . . , n.
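Formulas (14) translate directly into code. The sketch below is illustrative only: the function name `spline_coefficients` is ours, the lists are 0-based (so `y[0]` plays the role of y1), and the test data are points on y = x³, for which the second derivatives Mi = 6xi are known in advance.

```python
def spline_coefficients(y, M, h):
    """Return [(a_i, b_i, c_i, d_i), ...] computed from Eq. (14).

    y -- the ordinates y_1, ..., y_n (0-based list)
    M -- the second derivatives M_1, ..., M_n at the same points
    h -- the common spacing between consecutive x-coordinates
    """
    coefficients = []
    for i in range(len(y) - 1):
        a = (M[i + 1] - M[i]) / (6 * h)
        b = M[i] / 2
        c = (y[i + 1] - y[i]) / h - (M[i + 1] + 2 * M[i]) * h / 6
        d = y[i]
        coefficients.append((a, b, c, d))
    return coefficients


# Points on y = x**3 at x = 0, 1, 2, with M_i = 6 * x_i and h = 1.
coeffs = spline_coefficients([0, 1, 8], [0, 6, 12], 1)
print(coeffs)
```

On the first interval the coefficients describe x³ itself, and on the second they describe (x − 1)³ + 3(x − 1)² + 3(x − 1) + 1, which expands to the same cubic, as expected for data taken from a single cubic curve.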

From this result, we see that the quantities M1, M2, . . . , Mn uniquely determine the cubic spline. To find these quantities, we substitute the expressions for ai, bi, and ci given in (14) into (12). After some algebraic simplification, we obtain

\[
\begin{aligned}
M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\
M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\
&\;\;\vdots \\
M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2
\end{aligned}
\tag{15}
\]

or, in matrix form,

\[
\begin{bmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{bmatrix}
\begin{bmatrix}
M_1 \\ M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-3} \\ M_{n-2} \\ M_{n-1} \\ M_n
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-4} - 2y_{n-3} + y_{n-2} \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

This is a linear system of n − 2 equations for the n unknowns M1 , M2 , . . . , Mn . Thus, we still need two additional equations to determine M1 , M2 , . . . , Mn uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through the points. (The exercises present two more.) They are summarized in Table 1.

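Table 1

Natural Spline
Description: The second derivative of the spline is zero at the endpoints.
End conditions: M1 = 0, Mn = 0
Linear system:

\[
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

Parabolic Runout Spline
Description: The spline reduces to a parabolic curve on the first and last intervals.
End conditions: M1 = M2, Mn = Mn−1
Linear system:

\[
\begin{bmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

Cubic Runout Spline
Description: The spline is a single cubic curve on the first two and last two intervals.
End conditions: M1 = 2M2 − M3, Mn = 2Mn−1 − Mn−2
Linear system:

\[
\begin{bmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]

The Natural Spline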

The two simplest mathematical conditions we can impose are

\[
M_1 = M_n = 0
\]

These conditions together with (15) result in an n × n linear system for M1, M2, . . . , Mn, which can be written in matrix form as

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
0 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ 0
\end{bmatrix}
\]

For numerical calculations it is more convenient to eliminate M1 and Mn from this system and write

\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{16}
\]

together with

\[
M_1 = 0
\tag{17}
\]

\[
M_n = 0
\tag{18}
\]

Thus, the (n − 2) × (n − 2) linear system can be solved for the n − 2 coefficients M2, M3, . . . , Mn−1, and M1 and Mn are determined by (17) and (18).

Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without constraint. The end portions of the spline outside the interpolating points will fall on straight line paths, causing S″(x) to vanish at the endpoints x1 and xn and resulting in the mathematical conditions M1 = Mn = 0. The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required that S″(x) vanish at the endpoints, then the natural spline must be used.

The Parabolic Runout Spline

The two additional constraints imposed for this type of spline are

\[
M_1 = M_2
\tag{19}
\]

\[
M_n = M_{n-1}
\tag{20}
\]

If we use the preceding two equations to eliminate M1 and Mn from (15), we obtain the (n − 2) × (n − 2) linear system

\[
\begin{bmatrix}
5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{21}
\]

for M2, M3, . . . , Mn−1. Once these n − 2 values have been determined, M1 and Mn are determined from (19) and (20).

From (14) we see that M1 = M2 implies that a1 = 0, and Mn = Mn−1 implies that an−1 = 0. Thus, from (3) there are no cubic terms in the formula for the spline over the end intervals [x1, x2] and [xn−1, xn]. Hence, as the name suggests, the parabolic runout spline reduces to a parabolic curve over these end intervals.

The Cubic Runout Spline

For this type of spline, we impose the two additional conditions

\[
M_1 = 2M_2 - M_3
\tag{22}
\]

\[
M_n = 2M_{n-1} - M_{n-2}
\tag{23}
\]

Using these two equations to eliminate M1 and Mn from (15) results in the following (n − 2) × (n − 2) linear system for M2, M3, . . . , Mn−1:

\[
\begin{bmatrix}
6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1}
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\tag{24}
\]

After we solve this linear system for M2, M3, . . . , Mn−1, we can use (22) and (23) to determine M1 and Mn.

If we rewrite (22) as

\[
M_2 - M_1 = M_3 - M_2
\]

it follows from (14) that a1 = a2. Because S‴(x) = 6a1 on [x1, x2] and S‴(x) = 6a2 on [x2, x3], we see that S‴(x) is constant over the entire interval [x1, x3]. Consequently, S(x) consists of a single cubic curve over the interval [x1, x3] rather than two different cubic curves pieced together at x2. [To see this, integrate S‴(x) three times.] A similar analysis shows that S(x) consists of a single cubic curve over the last two intervals.


Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic runout spline is a reasonable compromise.

EXAMPLE 1 Using a Parabolic Runout Spline

The density of water is well known to reach a maximum at a temperature slightly above freezing. Table 2, from the Handbook of Chemistry and Physics (CRC Press, 2009), gives the density of water in grams per cubic centimeter for five equally spaced temperatures from −10°C to 30°C. We will interpolate these five temperature–density measurements with a parabolic runout spline and attempt to find the maximum density of water in this range by finding the maximum value on this cubic spline. In the exercises we ask you to perform similar calculations using a natural spline and a cubic runout spline to interpolate the data points.

Table 2

Temperature (°C)    Density (g/cm³)
−10                 .99815
0                   .99987
10                  .99973
20                  .99823
30                  .99567

Set

\[
\begin{aligned}
x_1 &= -10, & y_1 &= .99815 \\
x_2 &= 0, & y_2 &= .99987 \\
x_3 &= 10, & y_3 &= .99973 \\
x_4 &= 20, & y_4 &= .99823 \\
x_5 &= 30, & y_5 &= .99567
\end{aligned}
\]

Then

\[
\begin{aligned}
6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\
6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\
6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636
\end{aligned}
\]

and the linear system (21) for the parabolic runout spline becomes

\[
\begin{bmatrix}
5 & 1 & 0 \\
1 & 4 & 1 \\
0 & 1 & 5
\end{bmatrix}
\begin{bmatrix}
M_2 \\ M_3 \\ M_4
\end{bmatrix}
=
\begin{bmatrix}
-.0001116 \\ -.0000816 \\ -.0000636
\end{bmatrix}
\]

Solving this system yields

M2 = −.00001973 M3 = −.00001293 M4 = −.00001013

From (19) and (20), we have

M1 = M2 = −.00001973 M5 = M4 = −.00001013
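The calculation above can be reproduced with a short program. What follows is a minimal sketch rather than a definitive implementation: the function names `solve_tridiagonal` and `second_derivatives` and the string labels `"natural"`, `"parabolic"`, and `"cubic"` are our own, the tridiagonal solver is ordinary Gaussian elimination without pivoting, and the code assumes at least four data points.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by elimination and back substitution.

    lower[i-1] is the entry left of the diagonal in row i,
    upper[i] is the entry right of the diagonal in row i.
    """
    n = len(diag)
    diag, rhs = list(diag), list(rhs)
    for i in range(1, n):
        m = lower[i - 1] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - upper[i] * x[i + 1]) / diag[i]
    return x


def second_derivatives(y, h, kind):
    """Return [M_1, ..., M_n] from system (16), (21), or (24)."""
    n = len(y)
    rhs = [6 * (y[i] - 2 * y[i + 1] + y[i + 2]) / h**2 for i in range(n - 2)]
    diag = [4.0] * (n - 2)
    lower = [1.0] * (n - 3)
    upper = [1.0] * (n - 3)
    if kind == "parabolic":          # corner entries of (21)
        diag[0] = diag[-1] = 5.0
    elif kind == "cubic":            # first and last rows of (24)
        diag[0] = diag[-1] = 6.0
        upper[0] = 0.0
        lower[-1] = 0.0
    elif kind != "natural":
        raise ValueError("kind must be 'natural', 'parabolic', or 'cubic'")
    inner = solve_tridiagonal(lower, diag, upper, rhs)
    if kind == "natural":            # (17), (18)
        first, last = 0.0, 0.0
    elif kind == "parabolic":        # (19), (20)
        first, last = inner[0], inner[-1]
    else:                            # (22), (23)
        first = 2 * inner[0] - inner[1]
        last = 2 * inner[-1] - inner[-2]
    return [first] + inner + [last]


densities = [0.99815, 0.99987, 0.99973, 0.99823, 0.99567]  # Table 2
M = second_derivatives(densities, 10.0, "parabolic")
print(M)
```

Rounded to four significant digits, the entries of `M` should agree with the values of M1 through M5 obtained above.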


Solving for the ai’s, bi’s, ci’s, and di’s in (14), we obtain the following expression for the interpolating parabolic runout spline:

\[
S(x) =
\begin{cases}
-.00000987(x + 10)^2 + .0002707(x + 10) + .99815, & -10 \le x \le 0 \\
.000000113(x - 0)^3 - .00000987(x - 0)^2 + .0000733(x - 0) + .99987, & 0 \le x \le 10 \\
.000000047(x - 10)^3 - .00000647(x - 10)^2 - .0000900(x - 10) + .99973, & 10 \le x \le 20 \\
-.00000507(x - 20)^2 - .0002053(x - 20) + .99823, & 20 \le x \le 30
\end{cases}
\]

This spline is plotted in Figure 10.3.6. From that figure we see that the maximum is attained in the interval [0, 10]. To find this maximum, we set $S'(x)$ equal to zero in the interval [0, 10]:

\[
S'(x) = .000000339x^2 - .0000197x + .0000733 = 0
\]

To three significant digits the root of this quadratic in the interval [0, 10] is x = 3.99, and for this value of x, S(3.99) = 1.00001. Thus, according to our interpolated estimate, the maximum density of water is 1.00001 g/cm³ attained at 3.99 °C. This agrees well with the experimental maximum density of 1.00000 g/cm³ attained at 3.98 °C. (In the original metric system, the gram was defined as the mass of one cubic centimeter of water at its maximum density.)
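The root-finding step can be reproduced with the quadratic formula. This is a small sketch that uses the rounded coefficients of the spline piece on [0, 10] as printed in the example; the variable names are illustrative only.

```python
import math

# Coefficients of the spline piece on [0, 10], copied from Example 1:
# S(x) = a x^3 + b x^2 + c x + d.
a, b, c, d = 0.000000113, -0.00000987, 0.0000733, 0.99987

def s(x):
    """Evaluate the spline piece on [0, 10] by Horner's rule."""
    return ((a * x + b) * x + c) * x + d

# S'(x) = 3a x^2 + 2b x + c; solve S'(x) = 0 with the quadratic formula.
qa, qb, qc = 3 * a, 2 * b, c
disc = qb * qb - 4 * qa * qc
roots = [(-qb - math.sqrt(disc)) / (2 * qa),
         (-qb + math.sqrt(disc)) / (2 * qa)]

# Keep the root that lies in the interval where the maximum was observed.
x_max = next(r for r in roots if 0 <= r <= 10)

print(x_max, s(x_max))
```

Because the printed coefficients are rounded, the computed root can differ from the quoted 3.99 in the third significant digit; the second root lies far outside [0, 10] and is discarded.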

Figure 10.3.6 [Plot of the interpolating spline: density (g/cm³) versus temperature (°C) for temperatures from −10 to 30; image not reproduced.]

Closing Remarks

In addition to producing excellent interpolating curves, cubic splines and their generalizations are useful for numerical integration and differentiation, for the numerical solution of differential and integral equations, and in optimization theory.

Exercise Set 10.3

1. Derive the expressions for $a_i$ and $c_i$ in Equations (14) of Theorem 10.3.1.

2. The six points

(0, .00000), (.2, .19867), (.4, .38942), (.6, .56464), (.8, .71736), (1.0, .84147)

lie on the graph of y = sin x, where x is in radians.
(a) Find the portion of the parabolic runout spline that interpolates these six points for .4 ≤ x ≤ .6. Maintain an accuracy of five decimal places in your calculations.
(b) Calculate S(.5) for the spline you found in part (a). What is the percentage error of S(.5) with respect to the "exact" value of sin(.5) = .47943?

3. The following five points

(0, 1), (1, 7), (2, 27), (3, 79), (4, 181)

lie on a single cubic curve.
(a) Which of the three types of cubic splines (natural, parabolic runout, or cubic runout) would agree exactly with the single cubic curve on which the five points lie?
(b) Determine the cubic spline you chose in part (a), and verify that it is a single cubic curve that interpolates the five points.

4. Repeat the calculations in Example 1 using a natural spline to interpolate the five data points.

5. Repeat the calculations in Example 1 using a cubic runout spline to interpolate the five data points.

6. Consider the five points (0, 0), (.5, 1), (1, 0), (1.5, −1), and (2, 0) on the graph of y = sin(πx).
(a) Use a natural spline to interpolate the data points (0, 0), (.5, 1), and (1, 0).
(b) Use a natural spline to interpolate the data points (.5, 1), (1, 0), and (1.5, −1).
(c) Explain the unusual nature of your result in part (b).

7. (The Periodic Spline) If it is known or if it is desired that the n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie on a single cycle of a periodic curve with period $x_n - x_1$, then an interpolating cubic spline S(x) must satisfy

\[
S(x_1) = S(x_n), \qquad S'(x_1) = S'(x_n), \qquad S''(x_1) = S''(x_n)
\]

(a) Show that these three periodicity conditions require that

\[
\begin{aligned}
y_1 &= y_n \\
M_1 &= M_n \\
4M_1 + M_2 + M_{n-1} &= 6(y_{n-1} - 2y_1 + y_2)/h^2
\end{aligned}
\]

(b) Using the three equations in part (a) and Equations (15), construct an (n − 1) × (n − 1) linear system for $M_1, M_2, \ldots, M_{n-1}$ in matrix form.

8. (The Clamped Spline) Suppose that, in addition to the n points to be interpolated, we are given specific values $y_1'$ and $y_n'$ for the slopes $S'(x_1)$ and $S'(x_n)$ of the interpolating cubic spline at the endpoints $x_1$ and $x_n$.
(a) Show that

\[
\begin{aligned}
2M_1 + M_2 &= 6(y_2 - y_1 - hy_1')/h^2 \\
2M_n + M_{n-1} &= 6(y_{n-1} - y_n + hy_n')/h^2
\end{aligned}
\]

(b) Using the equations in part (a) and Equations (15), construct an n × n linear system for $M_1, M_2, \ldots, M_n$ in matrix form.

Remark The clamped spline described in this exercise is the most accurate type of spline for interpolation work if the slopes at the endpoints are known or can be estimated.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix

\[
A_n =
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4
\end{bmatrix}
\]

If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an expression for the determinant of $A_n$, denoted by the symbol $D_n$. Given that

\[
A_1 = [4] \quad \text{and} \quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}
\]

we see that

\[
D_1 = \det(A_1) = \det[4] = 4 \quad \text{and} \quad D_2 = \det(A_2) = \det\begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15
\]

(a) Use the cofactor expansion of determinants to show that

\[
D_n = 4D_{n-1} - D_{n-2}
\]

for n = 3, 4, 5, . . . . This says, for example, that

\[
\begin{aligned}
D_3 &= 4D_2 - D_1 = 4(15) - 4 = 56 \\
D_4 &= 4D_3 - D_2 = 4(56) - 15 = 209
\end{aligned}
\]

and so on. Using a computer, check this result for 5 ≤ n ≤ 10.

(b) By writing $D_n = 4D_{n-1} - D_{n-2}$ and the identity $D_{n-1} = D_{n-1}$ in matrix form,

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}
\]

show that

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} D_2 \\ D_1 \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} 15 \\ 4 \end{bmatrix}
\]

(c) Use the methods in Section 5.2 and a computer to show that

\[
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
=
\frac{1}{2\sqrt{3}}
\begin{bmatrix}
(2 + \sqrt{3})^{n-1} - (2 - \sqrt{3})^{n-1} & (2 - \sqrt{3})^{n-2} - (2 + \sqrt{3})^{n-2} \\
(2 + \sqrt{3})^{n-2} - (2 - \sqrt{3})^{n-2} & (2 - \sqrt{3})^{n-3} - (2 + \sqrt{3})^{n-3}
\end{bmatrix}
\]

and hence

\[
D_n = \frac{(2 + \sqrt{3})^{n+1} - (2 - \sqrt{3})^{n+1}}{2\sqrt{3}}
\]

for n = 1, 2, 3, . . . .

(d) Using a computer, check this result for 1 ≤ n ≤ 10.

T2. In this exercise, we determine a formula for calculating $A_n^{-1}$ from $D_k$ for k = 0, 1, 2, 3, . . . , n, assuming that $D_0$ is defined to be 1.
(a) Use a computer to compute $A_k^{-1}$ for k = 1, 2, 3, 4, and 5.
(b) From your results in part (a), discover the conjecture that

\[
A_n^{-1} = [\alpha_{ij}]
\]

where $\alpha_{ij} = \alpha_{ji}$ and

\[
\alpha_{ij} = (-1)^{i+j} \left( \frac{D_{n-j} D_{i-1}}{D_n} \right)
\]

for i ≤ j.
(c) Use the result in part (b) to compute $A_7^{-1}$ and compare it to the result obtained using the computer.

10.4 Markov Chains

In this section we describe a general model of a system that changes from state to state. We then apply the model to several concrete problems.

PREREQUISITES: Linear Systems; Matrices; Intuitive Understanding of Limits

A Markov Process

Suppose a physical or mathematical system undergoes a process of change such that at any moment it can occupy one of a finite number of states. For example, the weather in a certain city could be in one of three possible states: sunny, cloudy, or rainy. Or an individual could be in one of four possible emotional states: happy, sad, angry, or apprehensive. Suppose that such a system changes with time from one state to another and at scheduled times the state of the system is observed. If the state of the system at any observation cannot be predicted with certainty, but the probability that a given state occurs can be predicted by just knowing the state of the system at the preceding observation, then the process of change is called a Markov chain or Markov process.

DEFINITION 1 If a Markov chain has k possible states, which we label as 1, 2, . . . , k, then the probability that the system is in state i at any observation after it was in state j at the preceding observation is denoted by $p_{ij}$ and is called the transition probability from state j to state i. The matrix $P = [p_{ij}]$ is called the transition matrix of the Markov chain.

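For example, in a three-state Markov chain, the transition matrix has the form

\[
P =
\begin{bmatrix}
p_{11} & p_{12} & p_{13} \\
p_{21} & p_{22} & p_{23} \\
p_{31} & p_{32} & p_{33}
\end{bmatrix}
\]

where the columns are indexed by the preceding state (1, 2, 3) and the rows by the new state (1, 2, 3).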

In this matrix, p32 is the probability that the system will change from state 2 to state 3, p11 is the probability that the system will still be in state 1 if it was previously in state 1, and so forth.


EXAMPLE 1 Transition Matrix of the Markov Chain

A car rental agency has three rental locations, denoted by 1, 2, and 3. A customer may rent a car from any of the three locations and return the car to any of the three locations. The manager finds that customers return the cars to the various locations according to the following probabilities:

\[
\begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\]

Here the columns correspond to the location the car was rented from (1, 2, 3) and the rows to the location it was returned to (1, 2, 3).

This matrix is the transition matrix of the system considered as a Markov chain. From this matrix, the probability is .6 that a car rented from location 3 will be returned to location 2, the probability is .8 that a car rented from location 1 will be returned to location 1, and so forth.

EXAMPLE 2 Transition Matrix of the Markov Chain

By reviewing its donation records, the alumni office of a college finds that 80% of its alumni who contribute to the annual fund one year will also contribute the next year, and 30% of those who do not contribute one year will contribute the next. This can be viewed as a Markov chain with two states: state 1 corresponds to an alumnus giving a donation in any one year, and state 2 corresponds to the alumnus not giving a donation in that year. The transition matrix is

\[
P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\]

In the examples above, the transition matrices of the Markov chains have the property that the entries in any column sum to 1. This is not accidental. If P = [pij ] is the transition matrix of any Markov chain with k states, then for each j we must have

\[
p_{1j} + p_{2j} + \cdots + p_{kj} = 1 \tag{1}
\]

because if the system is in state j at one observation, it is certain to be in one of the k possible states at the next observation. A matrix with property (1) is called a stochastic matrix, a probability matrix, or a Markov matrix. From the preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix. In a Markov chain, the state of the system at any observation time cannot generally be determined with certainty. The best one can usually do is specify probabilities for each of the possible states. For example, in a Markov chain with three states, we might describe the possible state of the system at some observation time by a column vector

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

in which x1 is the probability that the system is in state 1, x2 the probability that it is in state 2, and x3 the probability that it is in state 3. In general we make the following definition.

DEFINITION 2 The state vector for an observation of a Markov chain with k states is a column vector x whose i th component $x_i$ is the probability that the system is in the i th state at that time.

Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A column vector that has this property is called a probability vector. Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some initial observation. The following theorem will enable us to determine the state vectors $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}, \ldots$ at the subsequent observation times.

THEOREM 10.4.1 If P is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the nth observation, then $\mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)}$.

The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem, it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\;\;\vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}
\]

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix P determine $\mathbf{x}^{(n)}$ for n = 1, 2, . . . .
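The recursion of Theorem 10.4.1 is easy to carry out by machine. The sketch below applies it to the two-state alumni transition matrix of Example 2, starting from a graduate who is in state 2 with certainty; the function names `step` and `state_vectors` are illustrative and are not taken from the text.

```python
def step(p, x):
    """Return the product P x for a transition matrix p (list of rows) and a state vector x."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in p]

def state_vectors(p, x0, n):
    """Return the list [x(0), x(1), ..., x(n)] generated by x(k+1) = P x(k)."""
    states = [x0]
    for _ in range(n):
        states.append(step(p, states[-1]))
    return states

# Transition matrix of Example 2 (column j holds the probabilities of leaving state j).
p = [[0.8, 0.3],
     [0.2, 0.7]]

# Initial state: the system is in state 2 (no donation) with certainty.
states = state_vectors(p, [0.0, 1.0], 11)

for k, x in enumerate(states):
    print(k, [round(v, 3) for v in x])
```

Each vector produced this way remains a probability vector, because every column of P sums to 1.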

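EXAMPLE 3 Example 2 Revisited

The transition matrix in Example 2 was

\[
P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\]

We now construct the probable future donation record of a new graduate who did not give a donation in the initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial state vector is

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

From Theorem 10.4.1 we then have

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}
\end{aligned}
\]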


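Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three years, we find the following state vectors (to three decimal places):

\[
\begin{aligned}
\mathbf{x}^{(4)} &= \begin{bmatrix} .563 \\ .438 \end{bmatrix}, &
\mathbf{x}^{(5)} &= \begin{bmatrix} .581 \\ .419 \end{bmatrix}, &
\mathbf{x}^{(6)} &= \begin{bmatrix} .591 \\ .409 \end{bmatrix}, &
\mathbf{x}^{(7)} &= \begin{bmatrix} .595 \\ .405 \end{bmatrix} \\
\mathbf{x}^{(8)} &= \begin{bmatrix} .598 \\ .402 \end{bmatrix}, &
\mathbf{x}^{(9)} &= \begin{bmatrix} .599 \\ .401 \end{bmatrix}, &
\mathbf{x}^{(10)} &= \begin{bmatrix} .599 \\ .401 \end{bmatrix}, &
\mathbf{x}^{(11)} &= \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\end{aligned}
\]

For all n beyond 11, we have

\[
\mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]

to three decimal places. In other words, the state vectors converge to a fixed vector as the number of observations increases. (We will discuss this further below.)

EXAMPLE 4 Example 1 Revisited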

The transition matrix in Example 1 was

\[
P =
\begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\]

If a car is rented initially from location 2, then the initial state vector is

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

Using this vector and Theorem 10.4.1, one obtains the later state vectors listed in Table 1.

Table 1

n          0     1     2     3     4     5     6     7     8     9     10    11
x_1^(n)    0   .300  .400  .477  .511  .533  .544  .550  .553  .555  .556  .557
x_2^(n)    1   .200  .370  .252  .261  .240  .238  .233  .232  .231  .230  .230
x_3^(n)    0   .500  .230  .271  .228  .227  .219  .217  .215  .214  .214  .213

For all values of n greater than 11, all state vectors are equal to $\mathbf{x}^{(11)}$ to three decimal places. Two things should be observed in this example. First, it was not necessary to know how long a customer kept the car. That is, in a Markov process the time period between observations need not be regular. Second, the state vectors approach a fixed vector as n increases, just as in the first example.

EXAMPLE 5 Using Theorem 10.4.1

A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.4.1. She is instructed to remain at each intersection for an hour and then to either remain at the same intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to choose her new intersection on a random basis, with each possible choice equally likely. For example, if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability 1/4. Every day she starts at the location where she stopped the day before. The transition matrix for this Markov chain is

\[
P =
\begin{bmatrix}
\frac13 & \frac13 & 0 & \frac15 & 0 & 0 & 0 & 0 \\
\frac13 & \frac13 & 0 & 0 & \frac14 & 0 & 0 & 0 \\
0 & 0 & \frac13 & \frac15 & 0 & \frac13 & 0 & 0 \\
\frac13 & 0 & \frac13 & \frac15 & \frac14 & 0 & \frac14 & 0 \\
0 & \frac13 & 0 & \frac15 & \frac14 & 0 & 0 & \frac13 \\
0 & 0 & \frac13 & 0 & 0 & \frac13 & \frac14 & 0 \\
0 & 0 & 0 & \frac15 & 0 & \frac13 & \frac14 & \frac13 \\
0 & 0 & 0 & 0 & \frac14 & 0 & \frac14 & \frac13
\end{bmatrix}
\]

where the columns correspond to the old intersection (1 through 8) and the rows to the new intersection (1 through 8).

Figure 10.4.1 [Map of the eight numbered intersections; image not reproduced.]

If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the state vectors in Table 2. For all values of n greater than 22, all state vectors are equal to $\mathbf{x}^{(22)}$ to three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as n increases.

Table 2

n          0     1     2     3     4     5     10    15    20    22
x_1^(n)    0   .000  .133  .116  .130  .123  .113  .109  .108  .107
x_2^(n)    0   .250  .146  .163  .140  .138  .115  .109  .108  .107
x_3^(n)    0   .000  .050  .039  .067  .073  .100  .106  .107  .107
x_4^(n)    0   .250  .113  .187  .162  .178  .178  .179  .179  .179
x_5^(n)    1   .250  .279  .190  .190  .168  .149  .144  .143  .143
x_6^(n)    0   .000  .000  .050  .056  .074  .099  .105  .107  .107
x_7^(n)    0   .000  .133  .104  .131  .125  .138  .142  .143  .143
x_8^(n)    0   .250  .146  .152  .124  .121  .108  .107  .107  .107

Limiting Behavior of the State Vectors

In our examples we saw that the state vectors approached some fixed vector as the number of observations increased. We now ask whether the state vectors always approach a fixed vector in a Markov chain. A simple example shows that this is not the case.

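EXAMPLE 6 System Oscillates Between Two State Vectors

Let

\[
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Then, because $P^2 = I$ and $P^3 = P$, we have that

\[
\mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

and

\[
\mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

This system oscillates indefinitely between the two state vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does not approach any fixed vector.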

However, if we impose a mild condition on the transition matrix, we can show that a fixed limiting state vector is approached. This condition is described by the following definition.

DEFINITION 3 A transition matrix is regular if some integer power of it has all positive entries.

Thus, for a regular transition matrix P, there is some positive integer m such that all entries of $P^m$ are positive. This is the case with the transition matrices of Examples 1 and 2 for m = 1. In Example 5 it turns out that $P^4$ has all positive entries. Consequently, in all three examples the transition matrices are regular. A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see that every regular Markov chain has a fixed state vector q such that $P^n\mathbf{x}^{(0)}$ approaches q as n increases for any choice of $\mathbf{x}^{(0)}$. This result is of major importance in the theory of Markov chains. It is based on the following theorem.

THEOREM 10.4.2 Behavior of $P^n$ as n → ∞

If P is a regular transition matrix, then as n → ∞,

\[
P^n \to
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\]

where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.
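As a numerical illustration of Definition 3 and Theorem 10.4.2 (not a proof), the sketch below searches for the first power of a transition matrix with all positive entries and then inspects a high power. Exact fractions are used so that zero entries stay exactly zero. The helper names, the cutoff `max_power=50`, and the dictionary `support` (which lists, for each column of the Example 5 matrix, the rows holding its equal nonzero entries) are assumptions made for this illustration.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """Return P^n for n >= 1 by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

def first_positive_power(p, max_power=50):
    """Return the smallest m <= max_power with all entries of P^m positive, else None."""
    power = p
    for m in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return m
        power = mat_mul(power, p)
    return None

# Example 2 (alumni) and Example 6 (oscillating system).
alumni = [[Fraction(4, 5), Fraction(3, 10)],
          [Fraction(1, 5), Fraction(7, 10)]]
swap = [[Fraction(0), Fraction(1)],
        [Fraction(1), Fraction(0)]]

# Example 5 (traffic officer): column j has equal entries in the listed rows.
support = {1: [1, 2, 4], 2: [1, 2, 5], 3: [3, 4, 6], 4: [1, 3, 4, 5, 7],
           5: [2, 4, 5, 8], 6: [3, 6, 7], 7: [4, 6, 7, 8], 8: [5, 7, 8]}
traffic = [[Fraction(1, len(support[j])) if i in support[j] else Fraction(0)
            for j in range(1, 9)] for i in range(1, 9)]

print(first_positive_power(alumni))   # regular
print(first_positive_power(swap))     # no positive power found up to the cutoff
print(first_positive_power(traffic))

# A high power of the alumni matrix: both columns should be close to each other.
p50 = mat_pow(alumni, 50)
print([[float(v) for v in row] for row in p50])
```

A result of `None` only says that no positive power was found up to the cutoff; for the matrix of Example 6 the powers alternate between P and I, so none will ever be found.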

We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell, Finite Markov Chains (New York: Springer-Verlag, 1976). Let us set

\[
Q =
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\quad \text{and} \quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
\]

Thus, Q is a transition matrix, all of whose columns are equal to the probability vector q. Q has the property that if x is any probability vector, then

\[
Q\mathbf{x} =
\begin{bmatrix}
q_1 & q_1 & \cdots & q_1 \\
q_2 & q_2 & \cdots & q_2 \\
\vdots & \vdots & & \vdots \\
q_k & q_k & \cdots & q_k
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
=
\begin{bmatrix}
q_1x_1 + q_1x_2 + \cdots + q_1x_k \\
q_2x_1 + q_2x_2 + \cdots + q_2x_k \\
\vdots \\
q_kx_1 + q_kx_2 + \cdots + q_kx_k
\end{bmatrix}
= (x_1 + x_2 + \cdots + x_k)
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}
= (1)\mathbf{q} = \mathbf{q}
\]


That is, Q transforms any probability vector x into the fixed probability vector q. This result leads to the following theorem.

THEOREM 10.4.3 Behavior of $P^n\mathbf{x}$ as n → ∞

If P is a regular transition matrix and x is any probability vector, then as n → ∞,

\[
P^n\mathbf{x} \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = \mathbf{q}
\]

where q is a fixed probability vector, independent of n, all of whose entries are positive.

This result holds since Theorem 10.4.2 implies that $P^n \to Q$ as n → ∞. This in turn implies that $P^n\mathbf{x} \to Q\mathbf{x} = \mathbf{q}$ as n → ∞. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector q. The vector q is called the steady-state vector of the regular Markov chain. For systems with many states, usually the most efficient technique of computing the steady-state vector q is simply to calculate $P^n\mathbf{x}$ for some large n. Our examples illustrate this procedure. Each is a regular Markov process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state vector is to make use of the following theorem.

THEOREM 10.4.4 Steady-State Vector

The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the equation $P\mathbf{q} = \mathbf{q}$.

To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.4.2, both $P^n$ and $P^{n+1}$ approach Q as n → ∞. Thus, we have PQ = Q. Any one column of this matrix equation gives $P\mathbf{q} = \mathbf{q}$. To show that q is the only probability vector that satisfies this equation, suppose r is another probability vector such that $P\mathbf{r} = \mathbf{r}$. Then also $P^n\mathbf{r} = \mathbf{r}$ for n = 1, 2, . . . . When we let n → ∞, Theorem 10.4.3 leads to q = r.

Theorem 10.4.4 can also be expressed by the statement that the homogeneous linear system

\[
(I - P)\mathbf{q} = \mathbf{0}
\]

has a unique solution vector q with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can apply this technique to the computation of the steady-state vectors for our examples.

EXAMPLE 7 Example 2 Revisited

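In Example 2 the transition matrix was

\[
P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\tag{2}
\]

This leads to the single independent equation

\[
.2q_1 - .3q_2 = 0
\]

or

\[
q_1 = 1.5q_2
\]

Thus, when we set $q_2 = s$, any solution of (2) is of the form

\[
\mathbf{q} = s \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}
\]

where s is an arbitrary constant. To make the vector q a probability vector, we set s = 1/(1.5 + 1) = .4. Consequently,

\[
\mathbf{q} = \begin{bmatrix} .6 \\ .4 \end{bmatrix}
\]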

is the steady-state vector of this regular Markov chain. This means that over the long run, 60% of the alumni will give a donation in any one year, and 40% will not. Observe that this agrees with the result obtained numerically in Example 3.

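EXAMPLE 8 Example 1 Revisited

In Example 1 the transition matrix was

\[
P =
\begin{bmatrix}
.8 & .3 & .2 \\
.1 & .2 & .6 \\
.1 & .5 & .2
\end{bmatrix}
\]

so the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\begin{bmatrix}
.2 & -.3 & -.2 \\
-.1 & .8 & -.6 \\
-.1 & -.5 & .8
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The reduced row echelon form of the coefficient matrix is (verify)

\[
\begin{bmatrix}
1 & 0 & -\frac{34}{13} \\
0 & 1 & -\frac{14}{13} \\
0 & 0 & 0
\end{bmatrix}
\]

so the original linear system is equivalent to the system

\[
\begin{aligned}
q_1 &= \left(\tfrac{34}{13}\right) q_3 \\
q_2 &= \left(\tfrac{14}{13}\right) q_3
\end{aligned}
\]

When we set $q_3 = s$, any solution of the linear system is of the form

\[
\mathbf{q} = s \begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix}
\]

To make this a probability vector, we set

\[
s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61}
\]

Thus, the steady-state vector of the system is

\[
\mathbf{q} = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix}
= \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}
\]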

This agrees with the result obtained numerically in Table 1. The entries of q give the long-run probabilities that any one car will be returned to location 1, 2, or 3, respectively. If the car rental agency has a fleet of 1000 cars, it should design its facilities so that there are at least 558 spaces at location 1, at least 230 spaces at location 2, and at least 214 spaces at location 3.

EXAMPLE 9 Example 5 Revisited

We will not give the details of the calculations but simply state that the unique probability vector solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$ is

\[
\mathbf{q} =
\begin{bmatrix}
\frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28}
\end{bmatrix}
=
\begin{bmatrix}
.1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots
\end{bmatrix}
\]

The entries in this vector indicate the proportion of time the traffic officer spends at each intersection over the long term. Thus, if the objective is for her to spend the same proportion of time at each intersection, then the strategy of random movement with equal probabilities from one intersection to another is not a good one. (See Exercise 5.)
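The steady-state computations of Examples 7 and 8 can also be done exactly by machine. The sketch below is one possible approach, assuming a regular transition matrix: it replaces the last equation of $(I - P)\mathbf{q} = \mathbf{0}$ by the normalization $q_1 + \cdots + q_k = 1$ and solves the resulting system with exact fractions. The function name `steady_state` is illustrative and not from the text.

```python
from fractions import Fraction

def steady_state(p):
    """Solve (I - P) q = 0 together with q1 + ... + qk = 1 using exact fractions.

    Assumes p is a regular transition matrix given as a list of rows, so that
    the solution is unique.
    """
    k = len(p)
    # First k - 1 rows of I - P, augmented with a zero right-hand side.
    rows = [[Fraction(int(i == j)) - Fraction(p[i][j]) for j in range(k)] + [Fraction(0)]
            for i in range(k - 1)]
    # The last equation is replaced by the normalization sum(q) = 1.
    rows.append([Fraction(1)] * k + [Fraction(1)])
    # Gauss-Jordan elimination on the augmented matrix.
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [v * inv for v in rows[col]]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]

# Transition matrices of Example 2 (alumni) and Example 1 (car rental).
alumni = [[Fraction(8, 10), Fraction(3, 10)],
          [Fraction(2, 10), Fraction(7, 10)]]
car_rental = [[Fraction(8, 10), Fraction(3, 10), Fraction(2, 10)],
              [Fraction(1, 10), Fraction(2, 10), Fraction(6, 10)],
              [Fraction(1, 10), Fraction(5, 10), Fraction(2, 10)]]

print(steady_state(alumni))
print(steady_state(car_rental))
```

Dropping one equation is harmless here because the rows of I − P sum to the zero row (each column of P sums to 1), so any one of them is redundant.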

Exercise Set 10.4

1. Consider the transition matrix

\[
P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix}
\]

(a) Calculate $\mathbf{x}^{(n)}$ for n = 1, 2, 3, 4, 5 if $\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.
(b) State why P is regular and find its steady-state vector.

2. Consider the transition matrix

\[
P =
\begin{bmatrix}
.2 & .1 & .7 \\
.6 & .4 & .2 \\
.2 & .5 & .1
\end{bmatrix}
\]

(a) Calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, and $\mathbf{x}^{(3)}$ to three decimal places if

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\]

(b) State why P is regular and find its steady-state vector.

3. Find the steady-state vectors of the following regular transition matrices:

\[
\text{(a)} \;
\begin{bmatrix} \frac13 & \frac34 \\ \frac23 & \frac14 \end{bmatrix}
\qquad
\text{(b)} \;
\begin{bmatrix} .81 & .26 \\ .19 & .74 \end{bmatrix}
\qquad
\text{(c)} \;
\begin{bmatrix}
\frac13 & \frac12 & 0 \\
\frac13 & 0 & \frac14 \\
\frac13 & \frac12 & \frac34
\end{bmatrix}
\]

4. Let P be the transition matrix

\[
P = \begin{bmatrix} \frac12 & 0 \\ \frac12 & 1 \end{bmatrix}
\]

(a) Show that P is not regular.
(b) Show that as n increases, $P^n\mathbf{x}^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any initial state vector $\mathbf{x}^{(0)}$.
(c) What conclusion of Theorem 10.4.3 is not valid for the steady state of this transition matrix?

5. Verify that if P is a k × k regular transition matrix all of whose row sums are equal to 1, then the entries of its steady-state vector are all equal to 1/k.

6. Show that the transition matrix

\[
P =
\begin{bmatrix}
0 & \frac12 & \frac12 \\
\frac12 & \frac12 & 0 \\
\frac12 & 0 & \frac12
\end{bmatrix}
\]

is regular, and use Exercise 5 to find its steady-state vector.

7. John is either happy or sad. If he is happy one day, then he is happy the next day four times out of five. If he is sad one day, then he is sad the next day one time out of three. Over the long term, what are the chances that John is happy on any given day?

8. A country is divided into three demographic regions. It is found that each year 5% of the residents of region 1 move to region 2, and 5% move to region 3. Of the residents of region 2, 15% move to region 1 and 10% move to region 3. And of the residents of region 3, 10% move to region 1 and 5% move to region 2. What percentage of the population resides in each of the three regions after a long period of time?

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Consider the sequence of transition matrices

\[
\{P_2, P_3, P_4, \ldots\}
\]

with

\[
P_2 = \begin{bmatrix} 0 & \frac12 \\ 1 & \frac12 \end{bmatrix},
\qquad
P_3 = \begin{bmatrix} 0 & 0 & \frac13 \\ 0 & \frac12 & \frac13 \\ 1 & \frac12 & \frac13 \end{bmatrix},
\]

\[
P_4 = \begin{bmatrix}
0 & 0 & 0 & \frac14 \\
0 & 0 & \frac13 & \frac14 \\
0 & \frac12 & \frac13 & \frac14 \\
1 & \frac12 & \frac13 & \frac14
\end{bmatrix},
\qquad
P_5 = \begin{bmatrix}
0 & 0 & 0 & 0 & \frac15 \\
0 & 0 & 0 & \frac14 & \frac15 \\
0 & 0 & \frac13 & \frac14 & \frac15 \\
0 & \frac12 & \frac13 & \frac14 & \frac15 \\
1 & \frac12 & \frac13 & \frac14 & \frac15
\end{bmatrix},
\]

and so on.
(a) Use a computer to show that each of these four matrices is regular by computing their squares.
(b) Verify Theorem 10.4.2 by computing the 100th power of $P_k$ for k = 2, 3, 4, 5. Then make a conjecture as to the limiting value of $P_k^n$ as n → ∞ for all k = 2, 3, 4, . . . .
(c) Verify that the common column $\mathbf{q}_k$ of the limiting matrix you found in part (b) satisfies the equation $P_k\mathbf{q}_k = \mathbf{q}_k$, as required by Theorem 10.4.4.

T2. A mouse is placed in a box with nine rooms as shown in the accompanying figure. Assume that it is equally likely that the mouse goes through any door in the room or stays in the room.
(a) Construct the 9 × 9 transition matrix for this problem and show that it is regular.
(b) Determine the steady-state vector for the matrix.
(c) Use a symmetry argument to show that this problem may be solved using only a 3 × 3 matrix.

Figure Ex-T2 [Box with nine rooms numbered 1 through 9; image not reproduced.]


10.5 Graph Theory

In this section we introduce matrix representations of relations among members of a set. We use matrix arithmetic to analyze these relationships.

PREREQUISITES: Matrix Addition and Multiplication

Relations Among Members of a Set

There are countless examples of sets with finitely many members in which some relation exists among members of the set. For example, the set could consist of a collection of people, animals, countries, companies, sports teams, or cities; and the relation between two members, A and B , of such a set could be that person A dominates person B , animal A feeds on animal B , country A militarily supports country B , company A sells its product to company B , sports team A consistently beats sports team B , or city A has a direct airline flight to city B . We will now show how the theory of directed graphs can be used to mathematically model relations such as those in the preceding examples.

Directed Graphs

A directed graph is a finite set of elements, {P1 , P2 , . . . , Pn }, together with a finite collection of ordered pairs (Pi , Pj ) of distinct elements of this set, with no ordered pair being repeated. The elements of the set are called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation Pi → Pj (which is read “Pi is connected to Pj ”) to indicate that the directed edge (Pi , Pj ) belongs to the directed graph. Geometrically, we can visualize a directed graph (Figure 10.5.1) by representing the vertices as points in the plane and representing the directed edge Pi → Pj by drawing a line or arc from vertex Pi to vertex Pj , with an arrow pointing from Pi to Pj . If both Pi → Pj and Pj → Pi hold (denoted Pi ↔ Pj ), we draw a single line between Pi and Pj with two oppositely pointing arrows (as with P2 and P3 in the figure). As in Figure 10.5.1, for example, a directed graph may have separate “components” of vertices that are connected only among themselves; and some vertices, such as P5 , may not be connected with any other vertex. Also, because Pi → Pi is not permitted in a directed graph, a vertex cannot be connected with itself by a single arc that does not pass through any other vertex. Figure 10.5.2 shows diagrams representing three more examples of directed graphs. With a directed graph having n vertices, we may associate an n × n matrix M = [mij ], called the vertex matrix of the directed graph. Its elements are defined by

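\[
m_{ij} =
\begin{cases}
1, & \text{if } P_i \to P_j \\
0, & \text{otherwise}
\end{cases}
\]

for i, j = 1, 2, . . . , n. For the three directed graphs in Figure 10.5.2, the corresponding vertex matrices are

\[
\text{Figure 10.5.2a:} \quad
M = \begin{bmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

\[
\text{Figure 10.5.2b:} \quad
M = \begin{bmatrix}
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0 & 0
\end{bmatrix}
\]

\[
\text{Figure 10.5.2c:} \quad
M = \begin{bmatrix}
0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}
\]

Figure 10.5.1 [Directed graph on the vertices P1 through P7; image not reproduced.]

Figure 10.5.2 [Three directed graphs, panels (a), (b), and (c); images not reproduced.]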

By their definition, vertex matrices have the following two properties:

(i) All entries are either 0 or 1.
(ii) All diagonal entries are 0.

Conversely, any matrix with these two properties determines a unique directed graph having the given matrix as its vertex matrix. For example, the matrix

\[
M = \begin{bmatrix}
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

determines the directed graph in Figure 10.5.3.

Figure 10.5.3 [Directed graph on the vertices P1 through P4; image not reproduced.]

EXAMPLE 1 Influences Within a Family

A certain family consists of a mother, father, daughter, and two sons. The family members have influence, or power, over each other in the following ways: the mother can influence the daughter and the oldest son; the father can influence the two sons; the daughter can influence the father; the oldest son can influence the youngest son; and the youngest son can influence the mother. We may model this family influence pattern with a directed graph whose vertices are the five family members. If family member A influences family member B, we write A → B. Figure 10.5.4 is the resulting directed graph, where we have used obvious letter designations for the five family members. The vertex matrix of this directed graph is

\[
\begin{array}{c|ccccc}
 & M & F & D & OS & YS \\ \hline
M  & 0 & 0 & 1 & 1 & 0 \\
F  & 0 & 0 & 0 & 1 & 1 \\
D  & 0 & 1 & 0 & 0 & 0 \\
OS & 0 & 0 & 0 & 0 & 1 \\
YS & 1 & 0 & 0 & 0 & 0
\end{array}
\]

Figure 10.5.4 [Directed graph on the vertices M, F, D, OS, and YS; image not reproduced.]

E X A M P L E 2 Vertex Matrix: Moves on a Chessboard

In chess the knight moves in an “L”-shaped pattern about the chessboard. For the board in Figure 10.5.5 it may move horizontally two squares and then vertically one square, or it may move vertically two squares and then horizontally one square. Thus, from the center square in the figure, the knight may move to any of the eight marked shaded squares. Suppose that the knight is restricted to the nine numbered squares in Figure 10.5.6. If by i → j we mean that the knight may move from square i to square j , the directed graph in Figure 10.5.7 illustrates all possible moves that the knight may make among these nine squares. In Figure 10.5.8 we have “unraveled” Figure 10.5.7 to make the pattern of possible moves clearer. The vertex matrix of this directed graph is given by

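\[
M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

[Figure 10.5.5]

[Figure 10.5.6: the nine squares numbered 1 2 3 / 4 5 6 / 7 8 9, row by row]

[Figure 10.5.7]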
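The vertex matrix of Example 2 can also be generated mechanically from the rule for knight moves. The following is a minimal sketch, assuming the squares are numbered 1 to 9 row by row as in Figure 10.5.6; the function name `knight_vertex_matrix` is only an illustrative choice, not something from the text.

```python
# The eight (row, column) displacements of an L-shaped knight move.
KNIGHT_STEPS = [(1, 2), (1, -2), (-1, 2), (-1, -2),
                (2, 1), (2, -1), (-2, 1), (-2, -1)]


def knight_vertex_matrix(size=3):
    """Vertex matrix (list of rows) for knight moves on a size x size board.

    Squares are numbered row by row, so square k has 0-based index k - 1.
    """
    n = size * size
    M = [[0] * n for _ in range(n)]
    for r in range(size):
        for c in range(size):
            i = r * size + c                      # starting square
            for dr, dc in KNIGHT_STEPS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size:
                    M[i][rr * size + cc] = 1      # the knight can move i -> j
    return M


M = knight_vertex_matrix()
for row in M:
    print(" ".join(str(entry) for entry in row))
```

Row 5 of the result is all zeros, reflecting that the center square has no knight move that stays on the 3 × 3 board.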

In Example 1 the father cannot directly influence the mother; that is, F → M is not true. But he can influence the youngest son, who can then influence the mother. We write this as F → YS → M and call it a 2-step connection from F to M. Analogously, we call M → D a 1-step connection, F → OS → YS → M a 3-step connection, and so forth. Let us now consider a technique for finding the number of all possible r-step connections (r = 1, 2, . . .) from one vertex Pi to another vertex Pj of an arbitrary directed graph. (This will include the case when Pi and Pj are the same vertex.) The number of 1-step connections from Pi to Pj is simply $m_{ij}$. That is, there is either zero or one 1-step connection from Pi to Pj, depending on whether $m_{ij}$ is zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let $m_{ij}^{(2)}$ be the (i, j)-th element of $M^2$, we have

\[
m_{ij}^{(2)} = m_{i1}m_{1j} + m_{i2}m_{2j} + \cdots + m_{in}m_{nj} \tag{1}
\]

[Figure 10.5.8]

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection Pi → P1 → Pj from Pi to Pj. But if either $m_{i1}$ or $m_{1j}$ is zero, such a 2-step connection is not possible. Thus Pi → P1 → Pj is a 2-step connection if and only if $m_{i1}m_{1j} = 1$. Similarly, for any k = 1, 2, . . . , n, Pi → Pk → Pj is a 2-step connection from Pi to Pj if and only if the term $m_{ik}m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is the total number of 2-step connections from Pi to Pj. A similar argument will work for finding the number of 3-, 4-, . . . , r-step connections from Pi to Pj. In general, we have the following result.

THEOREM 10.5.1 Let M be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the (i, j)-th element of $M^r$. Then $m_{ij}^{(r)}$ is equal to the number of r-step connections from Pi to Pj.
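As an illustration of Theorem 10.5.1, the sketch below counts r-step connections in the family graph of Example 1 by raising its vertex matrix to the r-th power. The helper names `mat_mul`, `mat_pow`, and `connections` are illustrative assumptions; plain Python lists are used so that nothing beyond the standard library is needed.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, r):
    """M raised to the integer power r >= 1 by repeated multiplication."""
    result = M
    for _ in range(r - 1):
        result = mat_mul(result, M)
    return result


# Vertex matrix of Example 1; rows and columns are ordered M, F, D, OS, YS.
names = ["M", "F", "D", "OS", "YS"]
family = [
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]


def connections(src, dst, r):
    """Number of r-step connections from vertex src to vertex dst."""
    return mat_pow(family, r)[names.index(src)][names.index(dst)]


for r in (1, 2, 3):
    print(f"{r}-step connections from F to M:", connections("F", "M", r))
```

The counts agree with the discussion above: there is no 1-step connection from F to M, one 2-step connection (F → YS → M), and one 3-step connection (F → OS → YS → M).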


EXAMPLE 3 Using Theorem 10.5.1

Figure 10.5.9 is the route map of a small airline that services the four cities P1 , P2 , P3 , P4 . As a directed graph, its vertex matrix is

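\[
M = \begin{bmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
\]

[Figure 10.5.9]

We have that

\[
M^2 = \begin{bmatrix}
2 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 \\
0 & 2 & 2 & 0 \\
2 & 0 & 1 & 1
\end{bmatrix}
\quad\text{and}\quad
M^3 = \begin{bmatrix}
1 & 3 & 3 & 1 \\
2 & 2 & 3 & 1 \\
4 & 0 & 2 & 2 \\
1 & 3 & 3 & 1
\end{bmatrix}
\]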

If we are interested in connections from city P4 to city P3, we may use Theorem 10.5.1 to find their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$, there is one 2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections. To verify this, from Figure 10.5.9 we find

1-step connections from P4 to P3: P4 → P3
2-step connections from P4 to P3: P4 → P2 → P3
3-step connections from P4 to P3: P4 → P3 → P4 → P3
                                  P4 → P2 → P1 → P3
                                  P4 → P3 → P1 → P3

Cliques

In everyday language a “clique” is a closely knit group of people (usually three or more) that tends to communicate within itself and has no place for outsiders. In graph theory this concept is given a more precise meaning.

DEFINITION 1 A subset of a directed graph is called a clique if it satisfies the following three conditions:

(i) The subset contains at least three vertices.
(ii) For each pair of vertices Pi and Pj in the subset, both Pi → Pj and Pj → Pi are true.
(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and still satisfy condition (ii).

This definition suggests that cliques are maximal subsets that are in perfect “communication” with each other. For example, if the vertices represent cities, and Pi → Pj means that there is a direct airline flight from city Pi to city Pj, then there is a direct flight between any two cities within a clique in either direction.

EXAMPLE 4 A Directed Graph with Two Cliques

The directed graph illustrated in Figure 10.5.10 (which might represent the route map of an airline) has two cliques:

{P1, P2, P3, P4} and {P3, P4, P6}

[Figure 10.5.10]

This example shows that a directed graph may contain several cliques and that a vertex may simultaneously belong to more than one clique.


For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a matrix $S = [s_{ij}]$ related to a given directed graph as follows:

\[
s_{ij} = \begin{cases} 1, & \text{if } P_i \leftrightarrow P_j \\ 0, & \text{otherwise} \end{cases}
\]

The matrix S determines a directed graph that is the same as the given directed graph, with the exception that the directed edges with only one arrow are deleted. For example, if the original directed graph is given by Figure 10.5.11a, the directed graph that has S as its vertex matrix is given in Figure 10.5.11b. The matrix S may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$ and setting $s_{ij} = 0$ otherwise. The following theorem, which uses the matrix S, is helpful for identifying cliques.

[Figure 10.5.11: (a) the original directed graph; (b) the directed graph with vertex matrix S]

THEOREM 10.5.2 Identifying Cliques

Let $s_{ij}^{(3)}$ be the (i, j)-th element of $S^3$. Then a vertex Pi belongs to some clique if and only if $s_{ii}^{(3)} \neq 0$.

Proof If $s_{ii}^{(3)} \neq 0$, then there is at least one 3-step connection from Pi to itself in the modified directed graph determined by S. Suppose it is Pi → Pj → Pk → Pi. In the modified directed graph, all directed relations are two-way, so we also have the connections Pi ↔ Pj ↔ Pk ↔ Pi. But this means that {Pi, Pj, Pk} is either a clique or a subset of a clique. In either case, Pi must belong to some clique. The converse statement, “if Pi belongs to a clique, then $s_{ii}^{(3)} \neq 0$,” follows in a similar manner.

EXAMPLE 5 Using Theorem 10.5.2

Suppose that a directed graph has as its vertex matrix

\[
M = \begin{bmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}
\]

Then

\[
S = \begin{bmatrix}
0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0
\end{bmatrix}
\quad\text{and}\quad
S^3 = \begin{bmatrix}
0 & 3 & 0 & 2 \\
3 & 0 & 2 & 0 \\
0 & 2 & 0 & 1 \\
2 & 0 & 1 & 0
\end{bmatrix}
\]

Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.5.2 that the directed graph has no cliques.

EXAMPLE 6 Using Theorem 10.5.2

Suppose that a directed graph has as its vertex matrix

\[
M = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

Then

\[
S = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad\text{and}\quad
S^3 = \begin{bmatrix}
2 & 4 & 0 & 4 & 3 \\
4 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 \\
4 & 3 & 0 & 2 & 1 \\
3 & 1 & 0 & 1 & 0
\end{bmatrix}
\]

The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given directed graph, P1, P2, and P4 belong to cliques. Because a clique must contain at least three vertices, the directed graph has only one clique, {P1, P2, P4}.
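The test in Theorem 10.5.2 is easy to automate. The following sketch, with illustrative helper names (`symmetric_part`, `vertices_in_cliques`) that are not from the text, builds S from a vertex matrix M, forms $S^3$, and reports the vertices whose diagonal entry is nonzero. It is applied to the vertex matrices of Examples 5 and 6.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def symmetric_part(M):
    """Matrix S with s_ij = 1 exactly when m_ij = m_ji = 1."""
    n = len(M)
    return [[1 if M[i][j] == 1 and M[j][i] == 1 else 0 for j in range(n)]
            for i in range(n)]


def vertices_in_cliques(M):
    """1-based indices i for which the (i, i) entry of S^3 is nonzero."""
    S = symmetric_part(M)
    S3 = mat_mul(mat_mul(S, S), S)
    return [i + 1 for i in range(len(M)) if S3[i][i] != 0]


example5 = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
]
example6 = [
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]

print(vertices_in_cliques(example5))  # [] : no vertex lies in a clique
print(vertices_in_cliques(example6))  # [1, 2, 4]
```

Note that the theorem only says which vertices belong to some clique; grouping those vertices into the cliques themselves still takes a separate argument, as in Example 6.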

Dominance-Directed Graphs

In many groups of individuals or animals, there is a definite “pecking order” or dominance relation between any two members of the group. That is, given any two individuals A and B, either A dominates B or B dominates A, but not both. In terms of a directed graph in which Pi → Pj means Pi dominates Pj, this means that for all distinct pairs, either Pi → Pj or Pj → Pi, but not both. In general, we have the following definition.

DEFINITION 2 A dominance-directed graph is a directed graph such that for any distinct pair of vertices Pi and Pj, either Pi → Pj or Pj → Pi, but not both.

An example of a directed graph satisfying this definition is a league of n sports teams that play each other exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If Pi → Pj means that team Pi beat team Pj in their single match, it is easy to see that the definition of a dominance-directed graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments. Figure 10.5.12 illustrates some dominance-directed graphs with three, four, and five vertices, respectively. In these three graphs, the circled vertices have the following interesting property: from each one there is either a 1-step or a 2-step connection to any other vertex in its graph. In a sports tournament, these vertices would correspond to the most “powerful” teams in the sense that these teams either beat any given team or beat some other team that beat the given team. We can now state and prove a theorem that guarantees that any dominance-directed graph has at least one vertex with this property.

[Figure 10.5.12: dominance-directed graphs with (a) three, (b) four, and (c) five vertices]

THEOREM 10.5.3 Connections in Dominance-Directed Graphs

In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step connection to any other vertex.

Proof Consider a vertex (there may be several) with the largest total number of 1-step and 2-step connections to other vertices in the graph. By renumbering the vertices, we may assume that P1 is such a vertex. Suppose there is some vertex Pi such that there is no 1-step or 2-step connection from P1 to Pi. Then, in particular, P1 → Pi is not true, so that by definition of a dominance-directed graph, it must be that Pi → P1. Next, let Pk be any vertex such that P1 → Pk is true. Then we cannot have Pk → Pi, as then P1 → Pk → Pi would be a 2-step connection from P1 to Pi. Thus, it must be that Pi → Pk. That is, Pi has 1-step connections to all the vertices to which P1 has 1-step connections. The vertex Pi must then also have 2-step connections to all the vertices to which P1 has 2-step connections. But because, in addition, we have that Pi → P1, this means that Pi has more 1-step and 2-step connections to other vertices than does P1. However, this contradicts the way in which P1 was chosen. Hence, there can be no vertex Pi to which P1 has no 1-step or 2-step connection.

This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix M and its square $M^2$. The sum of the entries in the ith row of M is the total number of 1-step connections from Pi to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step connections from Pi to other vertices. Consequently, the sum of the entries of the ith row of the matrix $A = M + M^2$ is the total number of 1-step and 2-step connections from Pi to other vertices. In other words, a row of $A = M + M^2$ with the largest row sum identifies a vertex having the property stated in Theorem 10.5.3.

EXAMPLE 7 Using Theorem 10.5.3

Suppose that five baseball teams play each other exactly once, and the results are as indicated in the dominance-directed graph of Figure 10.5.13. The vertex matrix of the graph is

\[
M = \begin{bmatrix}
0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix}
\]

[Figure 10.5.13]

so

\[
A = M + M^2 =
\begin{bmatrix}
0 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix}
+
\begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 2 & 3 & 0 \\
0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 2 & 0
\end{bmatrix}
=
\begin{bmatrix}
0 & 1 & 1 & 2 & 0 \\
2 & 0 & 3 & 3 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 0 & 1 \\
1 & 1 & 2 & 3 & 0
\end{bmatrix}
\]

The row sums of A are

1st row sum = 4
2nd row sum = 9
3rd row sum = 2
4th row sum = 4
5th row sum = 7

Because the second row has the largest row sum, the vertex P2 must have a 1-step or 2-step connection to any other vertex. This is easily verified from Figure 10.5.13.

We have informally suggested that a vertex with the largest number of 1-step and 2-step connections to other vertices is a “powerful” vertex. We can formalize this concept with the following definition.

DEFINITION 3 The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step connections from it to other vertices. Alternatively, the power of a vertex Pi is the sum of the entries of the ith row of the matrix $A = M + M^2$, where M is the vertex matrix of the directed graph.

EXAMPLE 8 Example 7 Revisited

Let us rank the five baseball teams in Example 7 according to their powers. From the calculations for the row sums in that example, we have

Power of team P1 = 4
Power of team P2 = 9
Power of team P3 = 2
Power of team P4 = 4
Power of team P5 = 7

Hence, the ranking of the teams according to their powers would be

P2 (first), P5 (second), P1 and P4 (tied for third), P3 (last)
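The powers in Definition 3 are row sums of $A = M + M^2$, so the ranking of Example 8 can be reproduced in a few lines. This is a minimal sketch; the function names `is_dominance_directed` and `powers` are illustrative assumptions, and the first function simply checks the condition of Definition 2 before the powers are computed.

```python
def is_dominance_directed(M):
    """True if, for every distinct pair i, j, exactly one of m_ij, m_ji is 1."""
    n = len(M)
    zero_diagonal = all(M[i][i] == 0 for i in range(n))
    one_way = all(M[i][j] + M[j][i] == 1
                  for i in range(n) for j in range(i + 1, n))
    return zero_diagonal and one_way


def powers(M):
    """Row sums of A = M + M^2, i.e. the power of each vertex."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [sum(M[i]) + sum(M2[i]) for i in range(n)]


# Vertex matrix of the baseball tournament in Example 7.
baseball = [
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
]

assert is_dominance_directed(baseball)
team_powers = powers(baseball)
print(team_powers)  # [4, 9, 2, 4, 7]

# Teams sorted from highest to lowest power (ties keep their original order).
ranking = sorted(range(1, len(baseball) + 1), key=lambda t: -team_powers[t - 1])
print(ranking)      # [2, 5, 1, 4, 3]
```

Teams P1 and P4 both have power 4, so their relative order in `ranking` is only an artifact of the stable sort; the text treats them as tied.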

Exercise Set 10.5

1. Construct the vertex matrix for each of the directed graphs illustrated in Figure Ex-1.

[Figure Ex-1: directed graphs (a), (b), and (c)]

2. Draw a diagram of the directed graph corresponding to each of the following vertex matrices.

\[
\text{(a)}\ \begin{bmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0
\end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 1 & 0 & 0
\end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix}
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

3. Let M be the following vertex matrix of a directed graph:

\[
\begin{bmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}
\]

(a) Draw a diagram of the directed graph.
(b) Use Theorem 10.5.1 to find the number of 1-, 2-, and 3-step connections from the vertex P1 to the vertex P2. Verify your answer by listing the various connections as in Example 3.
(c) Repeat part (b) for the 1-, 2-, and 3-step connections from P1 to P4.

4. (a) Compute the matrix product $M^TM$ for the vertex matrix M in Example 1.
(b) Verify that the kth diagonal entry of $M^TM$ is the number of family members who influence the kth family member. Why is this true?
(c) Find a similar interpretation for the values of the nondiagonal entries of $M^TM$.

5. By inspection, locate all cliques in each of the directed graphs illustrated in Figure Ex-5.

[Figure Ex-5: directed graphs (a), (b), and (c)]

6. For each of the following vertex matrices, use Theorem 10.5.2 to find all cliques in the corresponding directed graphs.

\[
\text{(a)}\ \begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0
\end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 0
\end{bmatrix}
\]

7. For the dominance-directed graph illustrated in Figure Ex-7 construct the vertex matrix and find the power of each vertex.

[Figure Ex-7]

8. Five baseball teams play each other one time with the following results:

A beats B, C, D
B beats C, E
C beats D, E
D beats B
E beats A, D

Rank the five baseball teams in accordance with the powers of the vertices they correspond to in the dominance-directed graph representing the outcomes of the games.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix given by

\[
M_n = \begin{bmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 0
\end{bmatrix}
\]

In this problem we develop a formula for $M_n^k$ whose (i, j)-th entry equals the number of k-step connections from Pi to Pj.

(a) Use a computer to compute the eight matrices $M_n^k$ for n = 2, 3 and for k = 2, 3, 4, 5.

(b) Use the results in part (a) and symmetry arguments to show that $M_n^k$ can be written as

\[
M_n^k = \begin{bmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 0
\end{bmatrix}^{k}
= \begin{bmatrix}
\alpha_k & \beta_k & \beta_k & \cdots & \beta_k \\
\beta_k & \alpha_k & \beta_k & \cdots & \beta_k \\
\beta_k & \beta_k & \alpha_k & \cdots & \beta_k \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\beta_k & \beta_k & \beta_k & \cdots & \alpha_k
\end{bmatrix}
\]

(c) Using the fact that $M_n^k = M_n M_n^{k-1}$, show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}
\begin{bmatrix} \alpha_{k-1} \\ \beta_{k-1} \end{bmatrix}
\quad\text{with}\quad
\begin{bmatrix} \alpha_1 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(d) Using part (c), show that

\[
\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}
= \begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

(e) Use the methods of Section 5.2 to compute

\[
\begin{bmatrix} 0 & n-1 \\ 1 & n-2 \end{bmatrix}^{k-1}
\]

and thereby obtain expressions for $\alpha_k$ and $\beta_k$, and eventually show that

\[
M_n^k = \left( \frac{(n-1)^k - (-1)^k}{n} \right) U_n + (-1)^k I_n
\]

where $U_n$ is the n × n matrix all of whose entries are ones and $I_n$ is the n × n identity matrix.

(f) Show that for n > 2, all vertices for these directed graphs belong to cliques.

T2. Consider a round-robin tournament among n players (labeled $a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$ beats $a_3$, $a_3$ beats $a_4$, . . . , $a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the “power” of each player, showing that they all have the same power; then determine that common power. [Hint: Use a computer to study the cases n = 3, 4, 5, 6; then make a conjecture and prove your conjecture to be true.]

10.6 Games of Strategy

In this section we discuss a general game in which two competing players choose separate strategies to reach opposing objectives. The optimal strategy of each player is found in certain cases with the use of matrix techniques.

PREREQUISITES: Matrix Multiplication; Basic Probability Concepts

Game Theory

To introduce the basic concepts in the theory of games, we will consider the following carnival-type game that two people agree to play. We will call the participants in the game player R and player C . Each player has a stationary wheel with a movable pointer on it as in Figure 10.6.1. For reasons that will become clear, we will call player R ’s wheel the row-wheel and player C ’s wheel the column-wheel. The row-wheel is divided into three sectors numbered 1, 2, and 3, and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player. Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player makes, player C then makes a payment of money to player R according to Table 1.

[Figure 10.6.1: Row-wheel of player R and column-wheel of player C]

Table 1 Payment to Player R

                         Player C's Move
Player R's Move      1       2       3       4
      1             $3      $5     −$2     −$1
      2            −$2      $4     −$3     −$4
      3             $6     −$5      $0      $3

For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then player R pays player C the sum of $4, because the corresponding entry in the table is −$4. In this way the positive entries of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the losses of player R.

In this game the players have no control over their moves; each move is determined by chance. However, if each player can decide whether he or she wants to play, then each would want to know how much he or she can expect to win or lose over the long term if he or she chooses to play. (Later in the section we will discuss this question and also consider a more complicated situation in which the players can exercise some control over their moves by varying the sectors of their wheels.)

Two-Person Zero-Sum Matrix Games

The game described above is an example of a two-person zero-sum matrix game. The term zero-sum means that in each play of the game, the positive gain of one player is equal to the negative gain (loss) of the other player. That is, the sum of the two gains is zero. The term matrix game is used to describe a two-person game in which each player has only a finite number of moves, so that all possible outcomes of each play, and the corresponding gains of the players, can be displayed in tabular or matrix form, as in Table 1. In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R , depending on the moves. For i = 1, 2, . . . , m, and j = 1, 2, . . . , n, let us set

$a_{ij}$ = payoff that player C makes to player R if player R makes move i and player C makes move j

This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible payoffs in the form of an m × n matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\]

which we will call the payoff matrix of the game. Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the move corresponding to that sector. Thus, from Figure 10.6.1, we see that player R would make move 2 with probability 1/3, and player C would make move 2 with probability 1/4. In the general case we make the following definitions:

$p_i$ = probability that player R makes move i (i = 1, 2, . . . , m)
$q_j$ = probability that player C makes move j (j = 1, 2, . . . , n)

It follows from these definitions that

\[
p_1 + p_2 + \cdots + p_m = 1 \quad\text{and}\quad q_1 + q_2 + \cdots + q_n = 1
\]

With the probabilities $p_i$ and $q_j$ we form two vectors:

\[
\mathbf{p} = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\quad\text{and}\quad
\mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
\]

We call the row vector p the strategy of player R and the column vector q the strategy of player C. For example, from Figure 10.6.1 we have

\[
\mathbf{p} = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix}
\quad\text{and}\quad
\mathbf{q} = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix}
\]

for the carnival game described earlier. From the theory of probability, if the probability that player R makes move i is pi , and independently the probability that player C makes move j is qj , then pi qj is the probability that for any one play of the game, player R makes move i and player C makes move j . The payoff to player R for such a pair of moves is aij . If we multiply each possible payoff by its corresponding probability and sum over all possible payoffs, we obtain the expression

\[
a_{11}p_1q_1 + a_{12}p_1q_2 + \cdots + a_{1n}p_1q_n + a_{21}p_2q_1 + \cdots + a_{mn}p_mq_n \tag{1}
\]

Equation (1) is a weighted average of the payoffs to player R ; each payoff is weighted according to the probability of its occurrence. In the theory of probability, this weighted average is called the expected payoff to player R . It can be shown that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We denote this expected payoff by E(p, q) to emphasize the fact that it depends on the strategies of the two players. From the definition of the payoff matrix A and the strategies p and q, it can be verified that we may express the expected payoff in matrix notation as



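\[
E(\mathbf{p}, \mathbf{q}) = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}
= \mathbf{p}A\mathbf{q} \tag{2}
\]

Because E(p, q) is the expected payoff to player R, it follows that −E(p, q) is the expected payoff to player C.

EXAMPLE 1 Expected Payoff to Player R

For the carnival game described earlier, we have

\[
E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} =
\begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix}
\begin{bmatrix}
3 & 5 & -2 & -1 \\
-2 & 4 & -3 & -4 \\
6 & -5 & 0 & 3
\end{bmatrix}
\begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix}
= \frac{13}{72} = .1805\ldots
\]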

Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each play of the game.
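The value 13/72 in Example 1 can be checked with exact rational arithmetic. The sketch below evaluates the double sum in Equation (1) directly, using the payoff matrix of Table 1 and the strategies read off from Figure 10.6.1; the function name `expected_payoff` is an illustrative choice.

```python
from fractions import Fraction

# Payoff matrix of the carnival game (Table 1) and the two strategies.
A = [
    [3, 5, -2, -1],
    [-2, 4, -3, -4],
    [6, -5, 0, 3],
]
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]                   # player R
q = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 3), Fraction(1, 6)]   # player C


def expected_payoff(p, A, q):
    """E(p, q) = sum over i, j of a_ij * p_i * q_j, which equals pAq."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


E = expected_payoff(p, A, q)
print(E)          # 13/72
print(float(E))   # about 0.18
```

Using `Fraction` rather than floating point keeps the result exact, so it can be compared with the fraction given in the text.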


So far we have been discussing the situation in which each player has a predetermined strategy. We will now consider the more difficult situation in which both players can change their strategies independently. For example, in the game described in the introduction, we would allow both players to alter the areas of the sectors of their wheels and thereby control the probabilities of their respective moves. This qualitatively changes the nature of the problem and puts us firmly in the field of true game theory. It is understood that neither player knows what strategy the other will choose. It is also assumed that each player will make the best possible choice of strategy and that the other player knows this. Thus, player R attempts to choose a strategy p such that E(p, q) is as large as possible for the best strategy q that player C can choose; and similarly, player C attempts to choose a strategy q such that E(p, q) is as small as possible for the best strategy p that player R can choose. To see that such choices are actually possible, we will need the following theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called strictly determined games and 2 × 2 matrix games.)

THEOREM 10.6.1 Fundamental Theorem of Zero-Sum Games

There exist strategies p∗ and q∗ such that

\[
E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{3}
\]

for all strategies p and q.

The strategies p∗ and q∗ in this theorem are the best possible strategies for players R and C, respectively. To see why this is so, let v = E(p∗, q∗). The left-hand inequality of Equation (3) then reads

E(p∗, q) ≥ v for all strategies q

This means that if player R chooses the strategy p∗, then no matter what strategy q player C chooses, the expected payoff to player R will never be below v. Moreover, it is not possible for player R to achieve an expected payoff greater than v. To see why, suppose there is some strategy p∗∗ that player R can choose such that

E(p∗∗, q) > v for all strategies q

Then, in particular,

E(p∗∗, q∗) > v

But this contradicts the right-hand inequality of Equation (3), which requires that v ≥ E(p∗∗, q∗). Consequently, the best player R can do is prevent his or her expected payoff from falling below the value v. Similarly, the best player C can do is ensure that player R’s expected payoff does not exceed v, and this can be achieved by using strategy q∗. On the basis of this discussion, we arrive at the following definitions.

DEFINITION 1 If p∗ and q∗ are strategies such that

\[
E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{4}
\]

for all strategies p and q, then

(i) p∗ is called an optimal strategy for player R.
(ii) q∗ is called an optimal strategy for player C.
(iii) v = E(p∗, q∗) is called the value of the game.


The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in the same value v of the game. That is, if p∗ , q∗ and p∗∗ , q∗∗ are optimal strategies, then

E(p∗ , q∗ ) = E(p∗∗ , q∗∗ )

(5)

The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies. To find optimal strategies, we must find vectors p∗ and q∗ that satisfy Equation (4). This is generally done by using linear programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary techniques. We now introduce the following definition.

DEFINITION 2 An entry $a_{rs}$ in a payoff matrix A is called a saddle point if

(i) $a_{rs}$ is the smallest entry in its row, and
(ii) $a_{rs}$ is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined.

For example, the boxed element in each of the following payoff matrices is a saddle point:

\[
\begin{bmatrix} 3 & \boxed{1} \\ 4 & 0 \end{bmatrix},
\qquad
\begin{bmatrix} 30 & 50 & 5 \\ \boxed{60} & 90 & 75 \\ 10 & 60 & 30 \end{bmatrix},
\qquad
\begin{bmatrix} 0 & 3 & 5 & 9 \\ 15 & 8 & 2 & 10 \\ 7 & 10 & \boxed{6} & 9 \\ 6 & 11 & 3 & 2 \end{bmatrix}
\]

If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal strategies for the two players:

\[
\mathbf{p}^* = \begin{bmatrix} 0 & 0 & \cdots & 1 & \cdots & 0 \end{bmatrix}
\ (\text{1 in the } r\text{th entry}),
\qquad
\mathbf{q}^* = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}
\ (\text{1 in the } s\text{th entry})
\]

That is, an optimal strategy for player R is to always make the rth move, and an optimal strategy for player C is to always make the sth move. Such strategies for which only one move is possible are called pure strategies. Strategies for which more than one move is possible are called mixed strategies. To show that the above pure strategies are optimal, you can verify the following three equations (see Exercise 6):

\[
E(\mathbf{p}^*, \mathbf{q}^*) = \mathbf{p}^*A\mathbf{q}^* = a_{rs} \tag{6}
\]
\[
E(\mathbf{p}^*, \mathbf{q}) = \mathbf{p}^*A\mathbf{q} \ge a_{rs} \quad \text{for any strategy } \mathbf{q} \tag{7}
\]
\[
E(\mathbf{p}, \mathbf{q}^*) = \mathbf{p}A\mathbf{q}^* \le a_{rs} \quad \text{for any strategy } \mathbf{p} \tag{8}
\]

Together, these three equations imply that

E(p∗ , q) ≥ E(p∗ , q∗ ) ≥ E(p, q∗ ) for all strategies p and q. Because this is exactly Equation (4), it follows that p∗ and q∗ are optimal strategies.
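Definition 2 can be checked entry by entry, which makes it easy to test whether a game is strictly determined. The following sketch uses an illustrative function name, `saddle_points`, and applies it to two matrices from this section: the carnival payoff matrix of Table 1, and the matrix obtained in Example 2 by subtracting 50 from each entry of Table 2.

```python
def saddle_points(A):
    """All saddle points of A as (row, column, value) triples, 1-based.

    An entry qualifies when it is the smallest entry in its row and the
    largest entry in its column.
    """
    found = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if entry == min(row) and entry == max(column):
                found.append((r + 1, s + 1, entry))
    return found


carnival = [
    [3, 5, -2, -1],
    [-2, 4, -3, -4],
    [6, -5, 0, 3],
]
audience = [
    [10, -30, -20, 5],
    [0, 25, -5, 10],
    [20, -5, -15, -20],
]

print(saddle_points(carnival))  # [] : the carnival game is not strictly determined
print(saddle_points(audience))  # [(2, 3, -5)]
```

When the list is nonempty, the row and column indices of any saddle point give the pure strategies described above, and the entry itself is the value of the game.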


From Equation (6) the value of a strictly determined game is simply the numerical value of a saddle point $a_{rs}$. It is possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that the numerical values of all saddle points are the same.

EXAMPLE 2 Optimal Strategies to Maximize a Viewing Audience

Two competing television networks, R and C, are scheduling one-hour programs in the same time period. Network R can schedule one of three possible programs, and network C can schedule one of four possible programs. Neither network knows which program the other will schedule. Both networks ask the same outside polling agency to give them an estimate of how all possible pairings of the programs will divide the viewing audience. The agency gives them each Table 2, whose (i, j)-th entry is the percentage of the viewing audience that will watch network R if network R’s program i is paired against network C’s program j. What program should each network schedule in order to maximize its viewing audience?

Table 2 Audience Percentage for Network R

                          Network C's Program
Network R's Program      1      2      3      4
        1               60     20     30     55
        2               50     75     45     60
        3               70     45     35     30

Solution Subtract 50 from each entry in Table 2 to construct the following matrix:

\[
\begin{bmatrix}
10 & -30 & -20 & 5 \\
0 & 25 & -5 & 10 \\
20 & -5 & -15 & -20
\end{bmatrix}
\]

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start with 50% of the audience, and the (i, j)-th entry of the matrix is the percentage of the viewing audience that network C loses to network R if programs i and j are paired against each other. It is easy to see that the entry

\[
a_{23} = -5
\]

is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2, and the optimal strategy of network C is to schedule program 3. This will result in network R’s receiving 45% of the audience and network C’s receiving 55% of the audience.

2 × 2 Matrix Games

Another case in which the optimal strategies can be found by elementary means occurs when each player has only two possible moves. In this case, the payoff matrix is a 2 × 2 matrix

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]


If the game is strictly determined, at least one of the four entries of A is a saddle point, and the techniques discussed above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we first compute the expected payoff for arbitrary strategies p and q:

\[
E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q}
= \begin{bmatrix} p_1 & p_2 \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix}
= a_{11}p_1q_1 + a_{12}p_1q_2 + a_{21}p_2q_1 + a_{22}p_2q_2 \tag{9}
\]

Because

\[
p_1 + p_2 = 1 \quad\text{and}\quad q_1 + q_2 = 1 \tag{10}
\]

we may substitute $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$ into (9) to obtain

\[
E(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1 - q_1) + a_{21}(1 - p_1)q_1 + a_{22}(1 - p_1)(1 - q_1) \tag{11}
\]

If we rearrange the terms in Equation (11), we can write

\[
E(\mathbf{p}, \mathbf{q}) = \left[(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})\right]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}
\]

By examining the coefficient of the $q_1$ term in (12), we see that if we set

\[
p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}
\]

then that coefficient is zero, and (12) reduces to

\[
E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}
\]

Equation (14) is independent of q; that is, if player R chooses the strategy determined by (13), player C cannot change the expected payoff by varying his or her strategy. In a similar manner, it can be verified that if player C chooses the strategy determined by

\[
q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}
\]

then substituting in (12) gives

\[
E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}
\]

Equations (14) and (16) show that

\[
E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}
\]

for all strategies p and q. Thus, the strategies determined by (13), (15), and (10) are optimal strategies for players R and C, respectively, and so we have the following result.


THEOREM 10.6.2 Optimal Strategies for a 2 × 2 Matrix Game

For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C are

\[
\mathbf{p}^* = \begin{bmatrix}
\dfrac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} &
\dfrac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}}
\end{bmatrix}
\]

and

\[
\mathbf{q}^* = \begin{bmatrix}
\dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex]
\dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}
\end{bmatrix}
\]

The value of the game is

\[
v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}
\]

In order to be complete, we must show that the entries in the vectors p∗ and q∗ are numbers strictly between 0 and 1. In Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined.

Equation (17) is interesting in that it implies that either player can force the expected payoff to be the value of the game by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in general, for games in which either player has more than two moves.

E X A M P L E 3 Using Theorem 10.6.2

The federal government desires to inoculate its citizens against a certain flu virus. The virus has two strains, and the proportions in which the two strains occur in the virus population are not known. Two vaccines have been developed and each citizen is given only one of them. Vaccine 1 is 85% effective against strain 1 and 70% effective against strain 2. Vaccine 2 is 60% effective against strain 1 and 90% effective against strain 2. What inoculation policy should the government adopt?

Solution We can consider this a two-person game in which player R (the government) desires to make the payoff (the fraction of citizens resistant to the virus) as large as possible, and player C (the virus) desires to make the payoff as small as possible. The payoff matrix is

                  Strain
                 1      2
Vaccine   1     .85    .70
          2     .60    .90

This matrix has no saddle points, so Theorem 10.6.2 is applicable. Consequently,

\[
p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3}
\]

\[
p_2^* = 1 - p_1^* = 1 - \frac{2}{3} = \frac{1}{3}
\]

\[
q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9}
\]

\[
q_2^* = 1 - q_1^* = 1 - \frac{4}{9} = \frac{5}{9}
\]

\[
v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots
\]


Thus, the optimal strategy for the government is to inoculate $\frac{2}{3}$ of the citizens with vaccine 1 and $\frac{1}{3}$ of the citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus attack regardless of the distribution of the two strains. In contrast, a virus distribution of $\frac{4}{9}$ of strain 1 and $\frac{5}{9}$ of strain 2 will result in the same 76.7% of resistant citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7).
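The formulas of Theorem 10.6.2 are easy to evaluate exactly. The following is a minimal sketch using Python's `fractions` module; the function name `solve_2x2_game` and the saddle-point guard are illustrative choices rather than anything prescribed by the text. It reproduces the numbers of Example 3 and confirms that the government's strategy yields the same payoff against either pure strain.

```python
from fractions import Fraction


def solve_2x2_game(a11, a12, a21, a22):
    """Optimal strategies (p*, q*) and value v of a 2 x 2 game without saddle points."""
    A = [[a11, a12], [a21, a22]]
    # Theorem 10.6.2 assumes the game is not strictly determined.
    for i in range(2):
        for j in range(2):
            if A[i][j] == min(A[i]) and A[i][j] == max(A[0][j], A[1][j]):
                raise ValueError("game is strictly determined; use the saddle point")
    d = a11 + a22 - a12 - a21
    p1 = (a22 - a21) / d
    q1 = (a22 - a12) / d
    v = (a11 * a22 - a12 * a21) / d
    return (p1, 1 - p1), (q1, 1 - q1), v


# Example 3: rows are vaccines 1 and 2, columns are strains 1 and 2.
a11, a12 = Fraction(85, 100), Fraction(70, 100)
a21, a22 = Fraction(60, 100), Fraction(90, 100)
p, q, v = solve_2x2_game(a11, a12, a21, a22)

print(p)  # (Fraction(2, 3), Fraction(1, 3))
print(q)  # (Fraction(4, 9), Fraction(5, 9))
print(v, float(v))

# Expected payoff of p* against each pure strain (each column of A).
against_strain_1 = p[0] * a11 + p[1] * a21
against_strain_2 = p[0] * a12 + p[1] * a22
print(against_strain_1 == v and against_strain_2 == v)  # True
```

Exact fractions are used so that the value 23/30 = .7666... appears without rounding error.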

Exercise Set 10.6

1. Suppose that a game has a payoff matrix

\[
A = \begin{bmatrix}
-4 & 6 & -4 & 1 \\
5 & -7 & 3 & 8 \\
-8 & 0 & 6 & -2
\end{bmatrix}
\]

(a) If players R and C use strategies

\[
\mathbf{p} = \begin{bmatrix} \tfrac{1}{2} & 0 & \tfrac{1}{2} \end{bmatrix}
\quad\text{and}\quad
\mathbf{q} = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \end{bmatrix}
\]

respectively, what is the expected payoff of the game?
(b) If player C keeps his strategy fixed as in part (a), what strategy should player R choose to maximize his expected payoff?
(c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected payoff to player R?

2. Construct a simple example to show that optimal strategies are not necessarily unique. For example, find a payoff matrix with several equal saddle points.

3. For the strictly determined games with the following payoff matrices, find optimal strategies for the two players, and find the values of the games.

\[
\text{(a)}\ \begin{bmatrix} 5 & 2 \\ 7 & 3 \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} -3 & -2 \\ 2 & 4 \\ -4 & 1 \end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix} 2 & -2 & 0 \\ -6 & 0 & -5 \\ 5 & 2 & 3 \end{bmatrix}
\qquad
\text{(d)}\ \begin{bmatrix} -3 & 2 & -1 \\ -2 & -1 & 5 \\ -4 & 1 & 0 \\ -3 & 4 & 6 \end{bmatrix}
\]

4. For the 2 × 2 games with the following payoff matrices, find optimal strategies for the two players, and find the values of the games.

\[
\text{(a)}\ \begin{bmatrix} 6 & 3 \\ -1 & 4 \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} 40 & 20 \\ -10 & 30 \end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix} 3 & 7 \\ -5 & 4 \end{bmatrix}
\qquad
\text{(d)}\ \begin{bmatrix} 3 & 5 \\ 5 & 2 \end{bmatrix}
\qquad
\text{(e)}\ \begin{bmatrix} 7 & -3 \\ -5 & -2 \end{bmatrix}
\]

5. Player R has two playing cards: a black ace and a red four. Player C also has two cards: a black two and a red three. Each player secretly selects one of his or her cards. If both selected cards are the same color, player C pays player R the sum of the face values in dollars. If the cards are different colors, player R pays player C the sum of the face values. What are optimal strategies for both players, and what is the value of the game?

6. Verify Equations (6), (7), and (8).

7. Verify the statement in the last paragraph of Example 3.

8. Show that the entries of the optimal strategies p∗ and q∗ given in Theorem 10.6.2 are numbers strictly between zero and one.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Consider a game between two players where each player can make up to n different moves (n > 1). If the i th move of player R and the j th move of player C are such that i + j is even, then C pays R $1. If i + j is odd, then R pays C $1. Assume that both players have the same strategy; that is, p_n = [ρ_i] (a 1 × n row vector) and q_n = [ρ_i] (an n × 1 column vector), where ρ_1 + ρ_2 + ρ_3 + · · · + ρ_n = 1. Use a computer to show that

\[
\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) &= (\rho_1 - \rho_2)^2 \\
E(\mathbf{p}_3, \mathbf{q}_3) &= (\rho_1 - \rho_2 + \rho_3)^2 \\
E(\mathbf{p}_4, \mathbf{q}_4) &= (\rho_1 - \rho_2 + \rho_3 - \rho_4)^2 \\
E(\mathbf{p}_5, \mathbf{q}_5) &= (\rho_1 - \rho_2 + \rho_3 - \rho_4 + \rho_5)^2
\end{aligned}
\]

Using these results as a guide, prove in general that the expected payoff to player R is

\[
E(\mathbf{p}_n, \mathbf{q}_n) = \left( \sum_{j=1}^{n} (-1)^{j+1} \rho_j \right)^{2} \ge 0
\]

which shows that in the long run, player R will not lose in this game.


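T2. Consider a game between two players where each player can make up to n different moves (n > 1). If both players make the same move, then player C pays player R $(n − 1). However, if both players make different moves, then player R pays player C $1. Assume that both players have the same strategy; that is, p_n = [ρ_i] (a 1 × n row vector) and q_n = [ρ_i] (an n × 1 column vector), where ρ_1 + ρ_2 + ρ_3 + · · · + ρ_n = 1. Use a computer to show that

\[
\begin{aligned}
E(\mathbf{p}_2, \mathbf{q}_2) ={}& \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2 \\
E(\mathbf{p}_3, \mathbf{q}_3) ={}& \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_1 - \rho_3)^2 \\
&+ \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_3)^2 \\
&+ \tfrac{1}{2}(\rho_3 - \rho_1)^2 + \tfrac{1}{2}(\rho_3 - \rho_2)^2 + \tfrac{1}{2}(\rho_3 - \rho_3)^2 \\
E(\mathbf{p}_4, \mathbf{q}_4) ={}& \tfrac{1}{2}(\rho_1 - \rho_1)^2 + \tfrac{1}{2}(\rho_1 - \rho_2)^2 + \tfrac{1}{2}(\rho_1 - \rho_3)^2 + \tfrac{1}{2}(\rho_1 - \rho_4)^2 \\
&+ \tfrac{1}{2}(\rho_2 - \rho_1)^2 + \tfrac{1}{2}(\rho_2 - \rho_2)^2 + \tfrac{1}{2}(\rho_2 - \rho_3)^2 + \tfrac{1}{2}(\rho_2 - \rho_4)^2 \\
&+ \tfrac{1}{2}(\rho_3 - \rho_1)^2 + \tfrac{1}{2}(\rho_3 - \rho_2)^2 + \tfrac{1}{2}(\rho_3 - \rho_3)^2 + \tfrac{1}{2}(\rho_3 - \rho_4)^2 \\
&+ \tfrac{1}{2}(\rho_4 - \rho_1)^2 + \tfrac{1}{2}(\rho_4 - \rho_2)^2 + \tfrac{1}{2}(\rho_4 - \rho_3)^2 + \tfrac{1}{2}(\rho_4 - \rho_4)^2
\end{aligned}
\]

Using these results as a guide, prove in general that the expected payoff to player R is

\[
E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\rho_i - \rho_j)^2 \ge 0
\]

which shows that in the long run, player R will not lose in this game.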

10.7 Leontief Economic Models

In this section we discuss two linear models for economic systems. Some results about nonnegative matrices are applied to determine equilibrium price structures and outputs necessary to satisfy demand.

PREREQUISITES: Linear Systems, Matrices

Economic Systems

Matrix theory has been very successful in describing the interrelations among prices, outputs, and demands in economic systems. In this section we discuss some simple models based on the ideas of Nobel laureate Wassily Leontief. We examine two different but related models: the closed or input-output model, and the open or production model. In each, we are given certain economic parameters that describe the interrelations between the “industries” in the economy under consideration. Using matrix theory, we then evaluate certain other parameters, such as prices or output levels, in order to satisfy a desired economic objective. We begin with the closed model.

Leontief Closed (Input-Output) Model

First we present a simple example; then we proceed to the general theory of the model.

E X A M P L E 1 An Input-Output Model

Three homeowners (a carpenter, an electrician, and a plumber) agree to make repairs in their three homes. They agree to work a total of 10 days each according to the following schedule:

                                            Work Performed by
                                      Carpenter   Electrician   Plumber
Days of Work in Home of Carpenter         2            1           6
Days of Work in Home of Electrician       4            5           1
Days of Work in Home of Plumber           4            4           3

For tax purposes, they must report and pay each other a reasonable daily wage, even for the work each does on his or her own home. Their normal daily wages are about $100, but they agree to adjust their respective daily wages so that each homeowner will come out even—that is, so that the total amount paid out by each is the same as the total amount each receives. We can set

\[
\begin{aligned}
p_1 &= \text{daily wage of carpenter} \\
p_2 &= \text{daily wage of electrician} \\
p_3 &= \text{daily wage of plumber}
\end{aligned}
\]

To satisfy the “equilibrium” condition that each homeowner comes out even, we require that

total expenditures = total income

for each of the homeowners for the 10-day period. For example, the carpenter pays a total of $2p_1 + p_2 + 6p_3$ for the repairs in his own home and receives a total income of $10p_1$ for the repairs that he performs on all three homes. Equating these two expressions then gives the first of the following three equations:

\[
\begin{aligned}
2p_1 + p_2 + 6p_3 &= 10p_1 \\
4p_1 + 5p_2 + p_3 &= 10p_2 \\
4p_1 + 4p_2 + 3p_3 &= 10p_3
\end{aligned}
\]

The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations by 10 and rewriting them in matrix form yields

\[
\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
=
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \tag{1}
\]

Equation (1) can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain

\[
\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The solution of this homogeneous system is found to be (verify)

\[
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = s \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
\]

where s is an arbitrary constant. This constant is a scale factor, which the homeowners may choose for their convenience. For example, they can set s = 3 so that the corresponding daily wages (93, 96, and 108 dollars) are about 100 dollars.
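The "(verify)" step above can be carried out by direct multiplication. The sketch below is one way to do it with exact fractions; the helper name `mat_vec` is illustrative. It checks that each column of the coefficient matrix in (1) sums to 1, that the vector (31, 32, 36) is left unchanged by the matrix, and that s = 3 gives the quoted wages.

```python
from fractions import Fraction

# Coefficient matrix of Equation (1), written in tenths so the arithmetic is exact.
E = [[Fraction(n, 10) for n in row] for row in ([2, 1, 6], [4, 5, 1], [4, 4, 3])]
p = [31, 32, 36]  # claimed solution for s = 1


def mat_vec(M, v):
    """Product of the matrix M (list of rows) with the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


column_sums = [sum(col) for col in zip(*E)]
print(column_sums == [1, 1, 1])  # True: each worker's 10 days are fully distributed
print(mat_vec(E, p) == p)        # True: expenditures equal income for each homeowner

wages = [3 * x for x in p]       # scale factor s = 3
print(wages)                     # [93, 96, 108]
```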


This example illustrates the salient features of the Leontief input-output model of a closed economy. In the basic Equation (1), each column sum of the coefficient matrix is 1, corresponding to the fact that each of the homeowners’ “output” of labor is completely distributed among these same homeowners in the proportions given by the entries in the column. Our problem is to determine suitable “prices” for these outputs so as to put the system in equilibrium—that is, so that each homeowner’s total expenditures equal his or her total income. In the general model we have an economic system consisting of a finite number of “industries,” which we number as industries 1, 2, . . . , k . Over some fixed period of time, each industry produces an “output” of some good or service that is completely utilized in a predetermined manner by the k industries. An important problem is to find suitable “prices” to be charged for these k outputs so that for each industry, total expenditures equal total income. Such a price structure represents an equilibrium position for the economy. For the fixed time period in question, let us set

\[
\begin{aligned}
p_i &= \text{price charged by the } i\text{th industry for its total output} \\
e_{ij} &= \text{fraction of the total output of the } j\text{th industry purchased by the } i\text{th industry}
\end{aligned}
\]

for i, j = 1, 2, . . . , k. By definition, we have

(i) $p_i \ge 0$, i = 1, 2, . . . , k
(ii) $e_{ij} \ge 0$, i, j = 1, 2, . . . , k
(iii) $e_{1j} + e_{2j} + \cdots + e_{kj} = 1$, j = 1, 2, . . . , k

With these quantities, we form the price vector

\[
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}
\]

and the exchange matrix or input-output matrix

\[
E = \begin{bmatrix}
e_{11} & e_{12} & \cdots & e_{1k} \\
e_{21} & e_{22} & \cdots & e_{2k} \\
\vdots & \vdots & & \vdots \\
e_{k1} & e_{k2} & \cdots & e_{kk}
\end{bmatrix}
\]

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1. As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be satisfied [see (1)]:

\[
E\mathbf{p} = \mathbf{p} \tag{2}
\]

or

\[
(I - E)\mathbf{p} = \mathbf{0} \tag{3}
\]

Equation (3) is a homogeneous linear system for the price vector p. It will have a nontrivial solution if and only if the determinant of its coefficient matrix I − E is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix E. Thus, (3) always has nontrivial solutions for the price vector p.

Actually, for our economic model to make sense, we need more than just the fact that (3) has nontrivial solutions for p. We also need the prices $p_i$ of the k outputs to be nonnegative numbers. We express this condition as p ≥ 0. (In general, if A is any vector or matrix, the notation A ≥ 0 means that every entry of A is nonnegative, and the notation A > 0 means that every entry of A is positive. Similarly, A ≥ B means A − B ≥ 0, and A > B means A − B > 0.) To show that (3) has a nontrivial solution for which p ≥ 0 is a bit more difficult than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem.

THEOREM 10.7.1 If E is an exchange matrix, then Ep = p always has a nontrivial solution p whose entries are nonnegative.

Let us consider a few simple examples of this theorem.

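E X A M P L E 2 Using Theorem 10.7.1

Let

\[
E = \begin{bmatrix} \tfrac{1}{2} & 0 \\ \tfrac{1}{2} & 1 \end{bmatrix}
\]

Then (I − E)p = 0 is

\[
\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{2} & 0 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]

which has the general solution

\[
\mathbf{p} = s \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

where s is an arbitrary constant. We then have nontrivial solutions p ≥ 0 for any s > 0.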

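E X A M P L E 3 Using Theorem 10.7.1

Let

\[
E = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

Then (I − E)p = 0 has the general solution

\[
\mathbf{p} = s \begin{bmatrix} 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

where s and t are independent arbitrary constants. Nontrivial solutions p ≥ 0 then result from any s ≥ 0 and t ≥ 0, not both zero.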

Example 2 indicates that in some situations one of the prices must be zero in order to satisfy the equilibrium condition. Example 3 indicates that there may be several linearly independent price structures available. Neither of these situations describes a truly interdependent economic structure. The following theorem gives sufficient conditions for both cases to be excluded.

THEOREM 10.7.2 Let E be an exchange matrix such that for some positive integer m all the entries of $E^m$ are positive. Then there is exactly one linearly independent solution of (I − E)p = 0, and it may be chosen so that all its entries are positive.


We will not give a proof of this theorem. If you have read Section 10.4 on Markov chains, observe that this theorem is essentially the same as Theorem 10.4.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section 10.4.
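The hypothesis of Theorem 10.7.2, that some power of E has all positive entries, can be explored numerically. The following is a small sketch; the function names and the cutoff `max_power` are illustrative assumptions, and a result of `None` only means that no positive power was found up to the cutoff. For the matrices of Examples 2 and 3, however, the zero entries clearly persist in every power because both matrices are triangular.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def first_positive_power(E, max_power=20):
    """Smallest m <= max_power with every entry of E**m positive, else None."""
    P = E
    for m in range(1, max_power + 1):
        if all(x > 0 for row in P for x in row):
            return m
        P = mat_mul(P, E)
    return None


half = Fraction(1, 2)
E_example_1 = [[Fraction(n, 10) for n in row] for row in ([2, 1, 6], [4, 5, 1], [4, 4, 3])]
E_example_2 = [[half, 0], [half, 1]]
E_example_3 = [[1, 0], [0, 1]]

print(first_positive_power(E_example_1))  # 1
print(first_positive_power(E_example_2))  # None
print(first_positive_power(E_example_3))  # None
```

The outcome matches the discussion above: only the exchange matrix of Example 1 satisfies the hypothesis, and it is the only one of the three with an essentially unique, strictly positive price vector.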

E X A M P L E 4 Using Theorem 10.7.2

The exchange matrix in Example 1 was

\[
E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
\]

Because E > 0, the condition $E^m > 0$ in Theorem 10.7.2 is satisfied for m = 1. Consequently, we are guaranteed that there is exactly one linearly independent solution of (I − E)p = 0, and it can be chosen so that p > 0. In that example, we found that

\[
\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
\]

is such a solution.

Leontief Open (Production) Model

In contrast with the closed model, in which the outputs of k industries are distributed only among themselves, the open model attempts to satisfy an outside demand for the outputs. Portions of these outputs can still be distributed among the industries themselves, to keep them operating, but there is to be some excess, some net production, with which to satisfy the outside demand. In the closed model the outputs of the industries are fixed, and our objective is to determine prices for these outputs so that the equilibrium condition, that expenditures equal incomes, is satisfied. In the open model it is the prices that are fixed, and our objective is to determine levels of the outputs of the industries needed to satisfy the outside demand. We will measure the levels of the outputs in terms of their economic values using the fixed prices. To be precise, over some fixed period of time, let

\[
\begin{aligned}
x_i &= \text{monetary value of the total output of the } i\text{th industry} \\
d_i &= \text{monetary value of the output of the } i\text{th industry needed to satisfy the outside demand} \\
c_{ij} &= \text{monetary value of the output of the } i\text{th industry needed by the } j\text{th industry} \\
&\phantom{={}} \text{to produce one unit of monetary value of its own output}
\end{aligned}
\]

With these quantities, we define the production vector

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
\]

the demand vector

\[
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}
\]

and the consumption matrix

\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1k} \\
c_{21} & c_{22} & \cdots & c_{2k} \\
\vdots & \vdots & & \vdots \\
c_{k1} & c_{k2} & \cdots & c_{kk}
\end{bmatrix}
\]

By their nature, we have that

\[
\mathbf{x} \ge 0, \quad \mathbf{d} \ge 0, \quad\text{and}\quad C \ge 0
\]

From the definition of $c_{ij}$ and $x_j$, it can be seen that the quantity

\[
c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k
\]

is the value of the output of the i th industry needed by all k industries to produce a total output specified by the production vector x. Because this quantity is simply the i th entry of the column vector Cx, we can say further that the i th entry of the column vector

\[
\mathbf{x} - C\mathbf{x}
\]

is the value of the excess output of the i th industry available to satisfy the outside demand. The value of the outside demand for the output of the i th industry is the i th entry of the demand vector d. Consequently, we are led to the following equation

\[
\mathbf{x} - C\mathbf{x} = \mathbf{d}
\]

or

\[
(I - C)\mathbf{x} = \mathbf{d} \tag{4}
\]

for the demand to be exactly met, without any surpluses or shortages. Thus, given C and d, our objective is to find a production vector x ≥ 0 that satisfies Equation (4).

E X A M P L E 5 Production Vector for a Town

A town has three main industries: a coal-mining operation, an electric power-generating plant, and a local railroad. To mine $1 of coal, the mining operation must purchase $.25 of electricity to run its equipment and $.25 of transportation for its shipping needs. To produce $1 of electricity, the generating plant requires $.65 of coal for fuel, $.05 of its own electricity to run auxiliary equipment, and $.05 of transportation. To provide $1 of transportation, the railroad requires $.55 of coal for fuel and $.10 of electricity for its auxiliary equipment. In a certain week the coal-mining operation receives orders for $50,000 of coal from outside the town, and the generating plant receives orders for $25,000 of electricity from outside. There is no outside demand for the local railroad. How much must each of the three industries produce in that week to exactly satisfy their own demand and the outside demand?

Solution For the one-week period let

\[
\begin{aligned}
x_1 &= \text{value of total output of coal-mining operation} \\
x_2 &= \text{value of total output of power-generating plant} \\
x_3 &= \text{value of total output of local railroad}
\end{aligned}
\]

From the information supplied, the consumption matrix of the system is

\[
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
\]

The linear system (I − C)x = d is then

\[
\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
\]

The coefficient matrix on the left is invertible, and the solution is given by

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d}
= \frac{1}{503}
\begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix}
\begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
=
\begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}
\]

Thus, the total output of the coal-mining operation should be $102,087, the total output of the power-generating plant should be $56,163, and the total output of the railroad should be $28,330.

Let us reconsider Equation (4):

\[
(I - C)\mathbf{x} = \mathbf{d}
\]

If the square matrix I − C is invertible, we can write

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}
\]

In addition, if the matrix (I − C)^{-1} has only nonnegative entries, then we are guaranteed that for any d ≥ 0, Equation (5) has a unique nonnegative solution for x. This is a particularly desirable situation, as it means that any outside demand can be met. The terminology used to describe this case is given in the following definition.

DEFINITION 1 A consumption matrix C is said to be productive if (I − C)^{-1} exists and

\[
(I - C)^{-1} \ge 0
\]

We will now consider some simple criteria that guarantee that a consumption matrix is productive. The first is given in the following theorem.

THEOREM 10.7.3 Productive Consumption Matrix

A consumption matrix C is productive if and only if there is some production vector x ≥ 0 such that x > Cx.

(The proof is outlined in Exercise 9.) The condition x > Cx means that there is some production schedule possible such that each industry produces more than it consumes.

Theorem 10.7.3 has two interesting corollaries. Suppose that all the row sums of C are less than 1. If

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]

then Cx is a column vector whose entries are these row sums. Therefore, x > Cx, and the condition of Theorem 10.7.3 is satisfied. Thus, we arrive at the following corollary:

COROLLARY 10.7.4 A consumption matrix is productive if each of its row sums is less than 1.

As we ask you to show in Exercise 8, this corollary leads to the following:

COROLLARY 10.7.5 A consumption matrix is productive if each of its column sums is less than 1.

Recalling the definition of the entries of the consumption matrix C, we see that the j th column sum of C is the total value of the outputs of all k industries needed to produce one unit of value of output of the j th industry. The j th industry is thus said to be profitable if that j th column sum is less than 1. In other words, Corollary 10.7.5 says that a consumption matrix is productive if all k industries in the economic system are profitable.

E X A M P L E 6 Using Corollary 10.7.5

The consumption matrix in Example 5 was

\[
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
\]

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary 10.7.5, the consumption matrix C is productive. This can also be seen in the calculations in Example 5, as (I − C)^{-1} is nonnegative.
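The calculations of Examples 5 and 6 can be reproduced with a few lines of exact arithmetic. The sketch below solves (I − C)x = d by Gauss-Jordan elimination over fractions instead of forming the inverse; the helper name `solve` is illustrative, and the routine assumes a square, invertible coefficient matrix.

```python
from fractions import Fraction


def solve(A, b):
    """Solve Ax = b by Gauss-Jordan elimination; assumes A is square and invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]            # scale pivot row to a leading 1
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


# Consumption matrix of Example 5, entered in hundredths so the arithmetic is exact.
C = [[Fraction(n, 100) for n in row] for row in ([0, 65, 55], [25, 5, 10], [25, 5, 0])]
d = [50000, 25000, 0]
I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(3)] for i in range(3)]

x = solve(I_minus_C, d)
print([round(v) for v in x])             # [102087, 56163, 28330]

column_sums = [sum(col) for col in zip(*C)]
print(all(s < 1 for s in column_sums))   # True: every industry is profitable
```

Solving the system directly avoids computing the inverse when only one demand vector is of interest; the inverse becomes worthwhile when many demand vectors must be handled for the same C.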

Exercise Set 10.7

1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition (3).

\[
\text{(a)}\ \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{2} & \tfrac{2}{3} \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ \tfrac{1}{3} & 0 & \tfrac{1}{2} \\ \tfrac{1}{6} & 1 & 0 \end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix} .35 & .50 & .30 \\ .25 & .20 & .30 \\ .40 & .30 & .40 \end{bmatrix}
\]

2. Using Theorem 10.7.3 and its corollaries, show that each of the following consumption matrices is productive.

\[
\text{(a)}\ \begin{bmatrix} .8 & .1 \\ .3 & .6 \end{bmatrix}
\qquad
\text{(b)}\ \begin{bmatrix} .70 & .30 & .25 \\ .20 & .40 & .25 \\ .05 & .15 & .25 \end{bmatrix}
\qquad
\text{(c)}\ \begin{bmatrix} .7 & .3 & .2 \\ .1 & .4 & .3 \\ .2 & .4 & .1 \end{bmatrix}
\]

3. Using Theorem 10.7.2, show that there is only one linearly independent price vector for the closed economic system with exchange matrix

\[
E = \begin{bmatrix} 0 & .2 & .5 \\ 1 & .2 & .5 \\ 0 & .6 & 0 \end{bmatrix}
\]

4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows lettuce. They agree to divide their crops among themselves as follows: A gets 1/2 of the tomatoes, 1/3 of the corn, and 1/4 of the lettuce. B gets 1/3 of the tomatoes, 1/3 of the corn, and 1/4 of the lettuce. C gets 1/6 of the tomatoes, 1/3 of the corn, and 1/2 of the lettuce. What prices should the neighbors assign to their respective crops if the equilibrium condition of a closed economy is to be satisfied, and if the lowest-priced crop is to have a price of $100?

5. Three engineers, a civil engineer (CE), an electrical engineer (EE), and a mechanical engineer (ME), each have a consulting firm. The consulting they do is of a multidisciplinary nature, so they buy a portion of each other's services. For each $1 of consulting the CE does, she buys $.10 of the EE's services and $.30 of the ME's services. For each $1 of consulting the EE does, she buys $.20 of the CE's services and $.40 of the ME's services. And for each $1 of consulting the ME does, she buys $.30 of the CE's services and $.40 of the EE's services. In a certain week the CE receives outside consulting orders of $500, the EE receives outside consulting orders of $700, and the ME receives outside consulting orders of $600. What dollar amount of consulting does each engineer perform in that week?

6. (a) Suppose that the demand d_i for the output of the i th industry increases by one unit. Explain why the i th column of the matrix (I − C)^{-1} is the increase that must be made to the production vector x to satisfy this additional demand.
(b) Referring to Example 5, use the result in part (a) to determine the increase in the value of the output of the coal-mining operation needed to satisfy a demand of one additional unit in the value of the output of the power-generating plant.

7. Using the fact that the column sums of an exchange matrix E are all 1, show that the column sums of I − E are zero. From this, show that I − E has zero determinant, and so (I − E)p = 0 has nontrivial solutions for p.

8. Show that Corollary 10.7.5 follows from Corollary 10.7.4. [Hint: Use the fact that (A^T)^{-1} = (A^{-1})^T for any invertible matrix A.]

9. (Calculus required) Prove Theorem 10.7.3 as follows:
(a) Prove the “only if” part of the theorem; that is, show that if C is a productive consumption matrix, then there is a vector x ≥ 0 such that x > Cx.
(b) Prove the “if” part of the theorem as follows:
Step 1. Show that if there is a vector x∗ ≥ 0 such that Cx∗ < x∗, then x∗ > 0.
Step 2. Show that there is a number λ such that 0 < λ < 1 and Cx∗ < λx∗.
Step 3. Show that C^n x∗ < λ^n x∗ for n = 1, 2, . . . .
Step 4. Show that C^n → 0 as n → ∞.
Step 5. By multiplying out, show that

\[
(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n
\]

for n = 1, 2, . . . .
Step 6. By letting n → ∞ in Step 5, show that the matrix infinite sum

\[
S = I + C + C^2 + \cdots
\]

exists and that (I − C)S = I.
Step 7. Show that S ≥ 0 and that S = (I − C)^{-1}.
Step 8. Show that C is a productive consumption matrix.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Consider a sequence of exchange matrices {E_2, E_3, E_4, E_5, . . . , E_n}, where

\[
E_2 = \begin{bmatrix} 0 & \tfrac{1}{2} \\ 1 & \tfrac{1}{2} \end{bmatrix},
\qquad
E_3 = \begin{bmatrix} 0 & \tfrac{1}{2} & \tfrac{1}{3} \\ 1 & 0 & \tfrac{1}{3} \\ 0 & \tfrac{1}{2} & \tfrac{1}{3} \end{bmatrix},
\]

\[
E_4 = \begin{bmatrix}
0 & \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} \\
1 & 0 & \tfrac{1}{3} & \tfrac{1}{4} \\
0 & \tfrac{1}{2} & 0 & \tfrac{1}{4} \\
0 & 0 & \tfrac{1}{3} & \tfrac{1}{4}
\end{bmatrix},
\qquad
E_5 = \begin{bmatrix}
0 & \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} \\
1 & 0 & \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{1}{5} \\
0 & \tfrac{1}{2} & 0 & \tfrac{1}{4} & \tfrac{1}{5} \\
0 & 0 & \tfrac{1}{3} & 0 & \tfrac{1}{5} \\
0 & 0 & 0 & \tfrac{1}{4} & \tfrac{1}{5}
\end{bmatrix}
\]

and so on. Use a computer to show that $E_2^2 > 0_2$, $E_3^3 > 0_3$, $E_4^4 > 0_4$, $E_5^5 > 0_5$, and make the conjecture that although $E_n^n > 0_n$ is true, $E_n^k > 0_n$ is not true for k = 1, 2, 3, . . . , n − 1. Next, use a computer to determine the vectors $\mathbf{p}_n$ such that $E_n\mathbf{p}_n = \mathbf{p}_n$ (for n = 2, 3, 4, 5, 6), and then see if you can discover a pattern that would allow you to compute $\mathbf{p}_{n+1}$ easily from $\mathbf{p}_n$. Test your discovery by first constructing $\mathbf{p}_8$ from

\[
\mathbf{p}_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix}
\]

and then checking to see whether $E_8\mathbf{p}_8 = \mathbf{p}_8$.

T2. Consider an open production model having n industries with n > 1. In order to produce one dollar of its own output, the j th industry must spend 1/n dollars for the output of the i th industry (for all i ≠ j), but the j th industry (for all j = 1, 2, 3, . . . , n) spends nothing for its own output. Construct the consumption matrix $C_n$, show that it is productive, and determine an expression for $(I_n - C_n)^{-1}$. In determining an expression for $(I_n - C_n)^{-1}$, use a computer to study the cases when n = 2, 3, 4, and 5; then make a conjecture and prove your conjecture to be true. [Hint: If $F_n = [1]_{n \times n}$ (i.e., the n × n matrix with every entry equal to 1), first show that

\[
F_n^2 = nF_n
\]

and then express your value of $(I_n - C_n)^{-1}$ in terms of n, $I_n$, and $F_n$.]


10.8 Forest Management

In this section we discuss a matrix model for the management of a forest where trees are grouped into classes according to height. The optimal sustainable yield of a periodic harvest is calculated when the trees of different height classes can have different economic values.

PREREQUISITES: Matrix Operations

Optimal Sustainable Yield

Our objective is to introduce a simplified model for the sustainable harvesting of a forest whose trees are classified by height. The height of a tree is assumed to determine its economic value when it is cut down and sold. Initially, there is a distribution of trees of various heights. The forest is then allowed to grow for a certain period of time, after which some of the trees of various heights are harvested. The trees left unharvested are to be of the same height configuration as the original forest, so that the harvest is sustainable. As we will see, there are many such sustainable harvesting procedures. We want to find one for which the total economic value of all the trees removed is as large as possible. This determines the optimal sustainable yield of the forest and is the largest yield that can be attained continually without depleting the forest.

The Model

Suppose that a harvester has a forest of Douglas fir trees that are to be sold as Christmas trees year after year. Every December the harvester cuts down some of the trees to be sold. For each tree cut down, a seedling is planted in its place. In this way the total number of trees in the forest is always the same. (In this simplified model, we will not take into account trees that die between harvests. We assume that every seedling planted survives and grows until it is harvested.) In the marketplace, trees of different heights have different economic values. Suppose that there are n different price classes corresponding to certain height intervals, as shown in Table 1 and Figure 10.8.1. The first class consists of seedlings with heights in the interval [0, h1 ), and these seedlings are of no economic value. The nth class consists of trees with heights greater than or equal to hn−1 .

Table 1

Class            Value (dollars)    Height Interval
1 (seedlings)    None               $[0, h_1)$
2                $p_2$              $[h_1, h_2)$
3                $p_3$              $[h_2, h_3)$
...              ...                ...
$n-1$            $p_{n-1}$          $[h_{n-2}, h_{n-1})$
$n$              $p_n$              $[h_{n-1}, \infty)$

[Figure 10.8.1: Height of tree (marked at $h_1, h_2, h_3, \ldots, h_{n-2}, h_{n-1}$) against value of tree ($0, p_2, p_3, \ldots, p_{n-1}, p_n$).]


Let $x_i$ ($i = 1, 2, \ldots, n$) be the number of trees within the $i$th class that remain after each harvest. We form a column vector with the numbers and call it the nonharvest vector:

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

For a sustainable harvesting policy, the forest is to be returned after each harvest to the fixed configuration given by the nonharvest vector x. Part of our problem is to find those nonharvest vectors x for which sustainable harvesting is possible. Because the total number of trees in the forest is fixed, we can set

x1 + x2 + · · · + xn = s

(1)

where s is predetermined by the amount of land available and the amount of space each tree requires. Referring to Figure 10.8.2, we have the following situation. The forest configuration is given by the vector x after each harvest. Between harvests the trees grow and produce a new forest configuration before each harvest. A certain number of trees are removed from each class at the harvest. Finally, a seedling is planted in place of each tree removed, to return the forest again to the configuration x.

[Figure 10.8.2: The harvest cycle. Forest before growth (nonharvest vector x) → Growth → Forest after growth → Harvest (trees removed; trees not removed) → Plant seedlings → Forest after harvest (nonharvest vector x), the same forest configuration.]

Consider first the growth of the forest between harvests. During this period a tree in the i th class may grow and move up to a higher height class. Or its growth may be retarded for some reason, and it will remain in the same class. We consequently define the following growth parameters gi for i = 1, 2, . . . , n − 1:

$g_i$ = the fraction of trees in the $i$th class that grow into the $(i + 1)$-st class during a growth period

For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, we have

$1 - g_i$ = the fraction of trees in the $i$th class that remain in the $i$th class during a growth period


Chapter 10 Applications of Linear Algebra

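With these $n - 1$ growth parameters, we form the following $n \times n$ growth matrix:

\[
G = \begin{bmatrix}
1 - g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1 - g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1 - g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix}
\tag{2}
\]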

Because the entries of the vector x are the numbers of trees in the n classes before the growth period, you can verify that the entries of the vector

\[
G\mathbf{x} = \begin{bmatrix}
(1 - g_1)x_1 \\
g_1 x_1 + (1 - g_2)x_2 \\
g_2 x_2 + (1 - g_3)x_3 \\
\vdots \\
g_{n-2} x_{n-2} + (1 - g_{n-1})x_{n-1} \\
g_{n-1} x_{n-1} + x_n
\end{bmatrix}
\tag{3}
\]

are the numbers of trees in the n classes after the growth period.

Suppose that during the harvest we remove $y_i$ ($i = 1, 2, \ldots, n$) trees from the $i$th class. We will call the column vector

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

the harvest vector. Thus, a total of

\[
y_1 + y_2 + \cdots + y_n
\]

trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each harvest. If we define the following $n \times n$ replacement matrix

\[
R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\tag{4}
\]

then the column vector

\[
R\mathbf{y} = \begin{bmatrix}
y_1 + y_2 + \cdots + y_n \\
0 \\
0 \\
\vdots \\
0
\end{bmatrix}
\tag{5}
\]

specifies the configuration of trees planted after each harvest.

At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy:

\[
\begin{bmatrix} \text{configuration} \\ \text{at end of} \\ \text{growth period} \end{bmatrix}
- [\text{harvest}]
+ \begin{bmatrix} \text{new seedling} \\ \text{replacement} \end{bmatrix}
= \begin{bmatrix} \text{configuration} \\ \text{at beginning of} \\ \text{growth period} \end{bmatrix}
\]

or mathematically,

\[
G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}
\]


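This equation can be rewritten as

\[
(I - R)\mathbf{y} = (G - I)\mathbf{x}
\tag{6}
\]

or more comprehensively as

\[
\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}
\]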

We will refer to Equation (6) as the sustainable harvesting condition. Any vectors x and y with nonnegative entries, and such that x1 + x2 + · · · + xn = s , which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that if y1 > 0, then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is no point in doing this, we assume that

y1 = 0

(7)

With this assumption, it can be verified that (6) is the matrix form of the following set of equations:

\[
\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1 x_1 \\
y_2 &= g_1 x_1 - g_2 x_2 \\
y_3 &= g_2 x_2 - g_3 x_3 \\
&\;\;\vdots \\
y_{n-1} &= g_{n-2} x_{n-2} - g_{n-1} x_{n-1} \\
y_n &= g_{n-1} x_{n-1}
\end{aligned}
\tag{8}
\]

Note that the first equation in (8) is the sum of the remaining $n - 1$ equations. Because we must have $y_i \ge 0$ for $i = 2, 3, \ldots, n$, Equations (8) require that

g1 x1 ≥ g2 x2 ≥ · · · ≥ gn−1 xn−1 ≥ 0

(9)

Conversely, if x is a column vector with nonnegative entries that satisfy Equation (9), then (7) and (8) define a column vector y with nonnegative entries. Furthermore, x and y then satisfy the sustainable harvesting condition (6). In other words, a necessary and sufficient condition for a nonnegative column vector x to determine a forest configuration that is capable of sustainable harvesting is that its entries satisfy (9).

Optimal Sustainable Yield

Because we remove $y_i$ trees from the $i$th class ($i = 2, 3, \ldots, n$) and each tree in the $i$th class has an economic value of $p_i$, the total yield of the harvest, $\mathrm{Yld}$, is given by

\[
\mathrm{Yld} = p_2 y_2 + p_3 y_3 + \cdots + p_n y_n
\tag{10}
\]

Using (8), we may substitute for the $y_i$'s in (10) to obtain

\[
\mathrm{Yld} = p_2 g_1 x_1 + (p_3 - p_2) g_2 x_2 + \cdots + (p_n - p_{n-1}) g_{n-1} x_{n-1}
\tag{11}
\]
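The relations (3), (5), (8), (9), and (11) can be checked numerically on a small forest. The following is a minimal sketch in plain Python, not part of the original text: the function names (`harvest_vector`, `is_sustainable`, `grow`, `total_yield`) and the three-class data are hypothetical and chosen only for illustration.

```python
def harvest_vector(g, x):
    """Harvest vector y from Equations (7) and (8); g = [g_1..g_{n-1}], x = [x_1..x_n]."""
    n = len(x)
    y = [0.0] * n                                  # y_1 = 0 by assumption (7)
    for i in range(1, n - 1):                      # y_k = g_{k-1} x_{k-1} - g_k x_k, k = 2..n-1
        y[i] = g[i - 1] * x[i - 1] - g[i] * x[i]
    y[n - 1] = g[n - 2] * x[n - 2]                 # y_n = g_{n-1} x_{n-1}
    return y


def is_sustainable(g, x):
    """Condition (9): g_1 x_1 >= g_2 x_2 >= ... >= g_{n-1} x_{n-1} >= 0, with x nonnegative."""
    flows = [gi * xi for gi, xi in zip(g, x)]      # zip stops after g_{n-1} x_{n-1}
    chain = all(a >= b for a, b in zip(flows, flows[1:]))
    return chain and flows[-1] >= 0 and all(xi >= 0 for xi in x)


def grow(g, x):
    """Entries of Gx, written out as in Equation (3)."""
    n = len(x)
    out = [(1 - g[0]) * x[0]]
    for i in range(1, n - 1):
        out.append(g[i - 1] * x[i - 1] + (1 - g[i]) * x[i])
    out.append(g[n - 2] * x[n - 2] + x[n - 1])
    return out


def total_yield(p, y):
    """Equation (10); p[0] is the (zero) value of a seedling."""
    return sum(pi * yi for pi, yi in zip(p, y))


# Hypothetical three-class forest (illustrative numbers only).
g = [0.5, 0.25]
x = [40.0, 40.0, 20.0]
p = [0.0, 30.0, 50.0]

y = harvest_vector(g, x)
after_growth = grow(g, x)
replanted = [sum(y)] + [0.0] * (len(x) - 1)        # Ry, Equation (5)
restored = [a - b + c for a, b, c in zip(after_growth, y, replanted)]  # Gx - y + Ry
yield_10 = total_yield(p, y)
yield_11 = p[1] * g[0] * x[0] + (p[2] - p[1]) * g[1] * x[1]            # Equation (11), n = 3
```

With these numbers `restored` equals `x`, so the policy returns the forest to its starting configuration, and the two yield expressions agree.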


Combining (11), (1), and (9), we can now state the problem of maximizing the yield of the forest over all possible sustainable harvesting policies as follows:

Problem. Find nonnegative numbers $x_1, x_2, \ldots, x_n$ that maximize

\[
\mathrm{Yld} = p_2 g_1 x_1 + (p_3 - p_2) g_2 x_2 + \cdots + (p_n - p_{n-1}) g_{n-1} x_{n-1}
\]

subject to

\[
x_1 + x_2 + \cdots + x_n = s
\]

and

\[
g_1 x_1 \ge g_2 x_2 \ge \cdots \ge g_{n-1} x_{n-1} \ge 0
\]

As formulated above, this problem belongs to the field of linear programming. However, we will illustrate the following result, without linear programming theory, by actually exhibiting a sustainable harvesting policy.

THEOREM 10.8.1 Optimal Sustainable Yield

The optimal sustainable yield is achieved by harvesting all the trees from one particular height class and none of the trees from any other height class.

Let us first set

$\mathrm{Yld}_k$ = yield obtained by harvesting all of the $k$th class and none of the other classes

The largest value of $\mathrm{Yld}_k$ for $k = 2, 3, \ldots, n$ will then be the optimal sustainable yield, and the corresponding value of $k$ will be the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the $k$th is harvested, we have

\[
y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0
\tag{12}
\]

In addition, because all of the $k$th class is harvested, no trees are ever present in the height classes above the $k$th class. Thus,

\[
x_k = x_{k+1} = \cdots = x_n = 0
\tag{13}
\]

Substituting (12) and (13) into the sustainable harvesting condition (8) gives

\[
\begin{aligned}
y_k &= g_1 x_1 \\
0 &= g_1 x_1 - g_2 x_2 \\
0 &= g_2 x_2 - g_3 x_3 \\
&\;\;\vdots \\
0 &= g_{k-2} x_{k-2} - g_{k-1} x_{k-1} \\
y_k &= g_{k-1} x_{k-1}
\end{aligned}
\tag{14}
\]

Equations (14) can also be written as

\[
y_k = g_1 x_1 = g_2 x_2 = \cdots = g_{k-1} x_{k-1}
\tag{15}
\]

from which it follows that

\[
\begin{aligned}
x_2 &= g_1 x_1 / g_2 \\
x_3 &= g_1 x_1 / g_3 \\
&\;\;\vdots \\
x_{k-1} &= g_1 x_1 / g_{k-1}
\end{aligned}
\tag{16}
\]


If we substitute Equations (13) and (16) into

\[
x_1 + x_2 + \cdots + x_n = s
\]

[which is Equation (1)], we can solve for $x_1$ and obtain

\[
x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}}
\tag{17}
\]

For the yield $\mathrm{Yld}_k$, we combine (10), (12), (15), and (17) to obtain

\[
\mathrm{Yld}_k = p_2 y_2 + p_3 y_3 + \cdots + p_n y_n = p_k y_k = p_k g_1 x_1
= \frac{p_k s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\tag{18}
\]

Equation (18) determines $\mathrm{Yld}_k$ in terms of the known growth and economic parameters for any $k = 2, 3, \ldots, n$. Thus, the optimal sustainable yield is found as follows.

THEOREM 10.8.2 Finding the Optimal Sustainable Yield

The optimal sustainable yield is the largest value of

\[
\frac{p_k s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\]

for $k = 2, 3, \ldots, n$. The corresponding value of $k$ is the number of the class that is completely harvested.

In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable yield is

\[
\mathbf{x} = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix}
1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ 0 \\ \vdots \\ 0
\end{bmatrix}
\tag{19}
\]

Theorem 10.8.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth parameters $g_i$ must also be taken into account to determine the optimal sustainable yield.

EXAMPLE 1 Using Theorem 10.8.2

For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see M. B. Usher, “A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection Forests,” Journal of Applied Ecology, vol. 3, 1966, pp. 355–367):

\[
G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}
\]

Suppose that the prices of trees in the five tallest height classes are

\[
p_2 = \$50, \quad p_3 = \$100, \quad p_4 = \$150, \quad p_5 = \$200, \quad p_6 = \$250
\]

Which class should be completely harvested to obtain the optimal sustainable yield, and what is that yield?

Solution. From matrix $G$ we have that

\[
g_1 = .28, \quad g_2 = .31, \quad g_3 = .25, \quad g_4 = .23, \quad g_5 = .37
\]

Equation (18) then gives

\[
\begin{aligned}
\mathrm{Yld}_2 &= 50s/(.28^{-1}) = 14.0s \\
\mathrm{Yld}_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
\mathrm{Yld}_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
\mathrm{Yld}_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
\mathrm{Yld}_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}
\]

We see that $\mathrm{Yld}_3$ is the largest of these five quantities, so from Theorem 10.8.2 the third class should be completely harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $\$14.7s$, where $s$ is the total number of trees in the forest.
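The arithmetic in Example 1 is easy to reproduce by machine. The sketch below evaluates Equation (18) for each class using the growth parameters and prices of the example; the function name `sustainable_yields` and the dictionary layout are assumptions made for illustration, and the yields are reported per tree (that is, with s = 1).

```python
def sustainable_yields(g, p, s=1.0):
    """Return {k: Yld_k} for k = 2..n from Equation (18).

    g = [g_1, ..., g_{n-1}]; p maps class number k to its price p_k.
    """
    yields = {}
    inv_sum = 0.0
    for k in range(2, len(g) + 2):
        inv_sum += 1.0 / g[k - 2]          # running sum 1/g_1 + ... + 1/g_{k-1}
        yields[k] = p[k] * s / inv_sum
    return yields


# Data of Example 1 (Scots pine forest, six-year growth period).
g = [0.28, 0.31, 0.25, 0.23, 0.37]
p = {2: 50, 3: 100, 4: 150, 5: 200, 6: 250}

ylds = sustainable_yields(g, p)
best_class = max(ylds, key=ylds.get)       # class to harvest completely

for k in sorted(ylds):
    print(f"Yld_{k} = {ylds[k]:.1f} s")
print("harvest class", best_class)
```

Keeping a running sum of the reciprocals avoids recomputing the denominator for each class, which matters only when n is large, as in the technology exercises.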

Exercise Set 10.8

1. A certain forest is divided into three height classes and has a growth matrix between harvests given by

\[
G = \begin{bmatrix}
\tfrac{1}{2} & 0 & 0 \\
\tfrac{1}{2} & \tfrac{1}{3} & 0 \\
0 & \tfrac{2}{3} & 1
\end{bmatrix}
\]

If the price of trees in the second class is $30 and the price of trees in the third class is $50, which class should be completely harvested to attain the optimal sustainable yield? What is the optimal yield if there are 1000 trees in the forest?

2. In Example 1, to what level must the price of trees in the fifth class rise so that the fifth class is the one to harvest completely in order to attain the optimal sustainable yield?

3. In Example 1, what must the ratio of the prices $p_2 : p_3 : p_4 : p_5 : p_6$ be in order that the yields $\mathrm{Yld}_k$, $k = 2, 3, 4, 5, 6$, all be the same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.)

4. Derive Equation (19) for the nonharvest vector x corresponding to the optimal sustainable harvesting policy described in Theorem 10.8.2.

5. For the optimal sustainable harvesting policy described in Theorem 10.8.2, how many trees are removed from the forest during each harvest?

6. If all the growth parameters $g_1, g_2, \ldots, g_{n-1}$ in the growth matrix $G$ are equal, what should the ratio of the prices $p_2 : p_3 : \cdots : p_n$ be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.)

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. A particular forest has growth parameters given by

\[
g_i = \frac{1}{i}
\]

for $i = 1, 2, 3, \ldots, n - 1$, where $n$ (the total number of height classes) can be chosen as large as needed. Suppose that the value of a tree in the $k$th height interval is given by

\[
p_k = a(k - 1)^{\rho}
\]

where $a$ is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho \le 2$.

(a) Show that the yield $\mathrm{Yld}_k$ is given by

\[
\mathrm{Yld}_k = \frac{2a(k - 1)^{\rho - 1} s}{k}
\]

(b) For

\[
\rho = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9
\]

use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable yield in each case. Make sure that you allow $k$ to take on only integer values in your calculations.

(c) Repeat the calculations in part (b) using

\[
\rho = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99
\]

(d) Show that if $\rho = 2$, then the optimal sustainable yield can never be larger than $2as$.

(e) Compare the values of $k$ determined in parts (b) and (c) to $1/(2 - \rho)$, and use some calculus to explain why

\[
k \approx \frac{1}{2 - \rho}
\]

T2. A particular forest has growth parameters given by

\[
g_i = \frac{1}{2^i}
\]

for $i = 1, 2, 3, \ldots, n - 1$, where $n$ (the total number of height classes) can be chosen as large as needed. Suppose that the value of a tree in the $k$th height interval is given by

\[
p_k = a(k - 1)^{\rho}
\]

where $a$ is a constant (in dollars) and $\rho$ is a parameter satisfying $1 \le \rho$.

(a) Show that the yield $\mathrm{Yld}_k$ is given by

\[
\mathrm{Yld}_k = \frac{a(k - 1)^{\rho} s}{2^k - 2}
\]

(b) For

\[
\rho = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
\]

use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and determine the optimal sustainable yield in each case. Make sure that you allow $k$ to take on only integer values in your calculations.

(c) Compare the values of $k$ determined in part (b) to $1 + \rho/\ln(2)$ and use some calculus to explain why

\[
k \approx 1 + \frac{\rho}{\ln(2)}
\]

10.9 Computer Graphics

In this section we assume that a view of a three-dimensional object is displayed on a video screen and show how matrix algebra can be used to obtain new views of the object by rotation, translation, and scaling.

PREREQUISITES: Matrix Algebra Analytic Geometry

Visualization of a Three-Dimensional Object

Suppose that we want to visualize a three-dimensional object by displaying various views of it on a video screen. The object we have in mind to display is to be determined by a finite number of straight line segments. As an example, consider the truncated right pyramid with hexagonal base illustrated in Figure 10.9.1. We first introduce an xyz-coordinate system in which to embed the object. As in Figure 10.9.1, we orient the coordinate system so that its origin is at the center of the video screen and the xy -plane coincides with the plane of the screen. Consequently, an observer will see only the projection of the view of the three-dimensional object onto the two-dimensional xy -plane.

[Figure 10.9.1: Truncated right pyramid with hexagonal base and vertices $P_1, \ldots, P_{12}$, shown in the xyz-coordinate system.]

In the xyz-coordinate system, the endpoints $P_1, P_2, \ldots, P_n$ of the straight line segments that determine the view of the object will have certain coordinates, say,

\[
(x_1, y_1, z_1), \; (x_2, y_2, z_2), \; \ldots, \; (x_n, y_n, z_n)
\]

These coordinates, together with a specification of which pairs are to be connected by straight line segments, are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the truncated pyramid in Figure 10.9.1 have the following coordinates (the screen is 4 units wide by 3 units high):

P1: (1.000, −.800, .000)       P2: (.500, −.800, −.866)
P3: (−.500, −.800, −.866)      P4: (−1.000, −.800, .000)
P5: (−.500, −.800, .866)       P6: (.500, −.800, .866)
P7: (.840, −.400, .000)        P8: (.315, .125, −.546)
P9: (−.210, .650, −.364)       P10: (−.360, .800, .000)
P11: (−.210, .650, .364)       P12: (.315, .125, .546)

These 12 vertices are connected pairwise by 18 straight line segments as follows, where Pi ↔ Pj denotes that point Pi is connected to point Pj:

P1 ↔ P2,  P2 ↔ P3,  P3 ↔ P4,  P4 ↔ P5,  P5 ↔ P6,  P6 ↔ P1,
P7 ↔ P8,  P8 ↔ P9,  P9 ↔ P10,  P10 ↔ P11,  P11 ↔ P12,  P12 ↔ P7,
P1 ↔ P7,  P2 ↔ P8,  P3 ↔ P9,  P4 ↔ P10,  P5 ↔ P11,  P6 ↔ P12

[View 1: The truncated pyramid as it appears on the video screen.]

In View 1 these 18 straight line segments are shown as they would appear on the video screen. It should be noticed that only the x - and y -coordinates of the vertices are needed by the video display system to draw the view, because only the projection of the object onto the xy -plane is displayed. However, we must keep track of the z-coordinates to carry out certain transformations discussed later. We now show how to form new views of the object by scaling, translating, or rotating the initial view. We first construct a 3 × n matrix P , referred to as the coordinate matrix of the view, whose columns are the coordinates of the n points of a view:



\[
P = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}
\]

For example, the coordinate matrix $P$ corresponding to View 1 is the $3 \times 12$ matrix

\[
\begin{bmatrix}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{bmatrix}
\]

We will show below how to transform the coordinate matrix $P$ of a view to a new coordinate matrix $P'$ corresponding to a new view of the object. The straight line segments connecting the various points move with the points as they are transformed. In this way, each view is uniquely determined by its coordinate matrix once we have specified which pairs of points in the original view are to be connected by straight lines.

Scaling

The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors of $\alpha$, $\beta$, and $\gamma$, respectively. By this we mean that if a point $P_i$ has coordinates $(x_i, y_i, z_i)$ in the original view, it is to move to a new point $P_i'$ with coordinates $(\alpha x_i, \beta y_i, \gamma z_i)$ in the new view. This has the effect of transforming a unit cube in the original view to a rectangular parallelepiped of dimensions $\alpha \times \beta \times \gamma$ (Figure 10.9.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define a $3 \times 3$ diagonal matrix

\[
S = \begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\]

Then, if a point $P_i$ in the original view is represented by the column vector

\[
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]

then the transformed point $P_i'$ is represented by the column vector

\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
=
\begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]

[Figure 10.9.2: (a) A unit cube in the original view. (b) The cube scaled to a rectangular parallelepiped of dimensions $\alpha \times \beta \times \gamma$.]

Using the coordinate matrix $P$, which contains the coordinates of all $n$ points of the original view as its columns, we can transform these $n$ points simultaneously to produce the coordinate matrix $P'$ of the scaled view, as follows:

\[
SP =
\begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}
=
\begin{bmatrix}
\alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\
\beta y_1 & \beta y_2 & \cdots & \beta y_n \\
\gamma z_1 & \gamma z_2 & \cdots & \gamma z_n
\end{bmatrix}
= P'
\]

[View 2: View 1 scaled by $\alpha = 1.8$, $\beta = 0.5$, $\gamma = 3.0$.]

The new coordinate matrix can then be entered into the video display system to produce the new view of the object. As an example, View 2 is View 1 scaled by setting α = 1.8, β = 0.5, and γ = 3.0. Note that the scaling γ = 3.0 along the z-axis is not visible in View 2, since we see only the projection of the object onto the xy -plane.
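As an illustration of the scaling step, the sketch below forms the product SP for View 1 with the factors used for View 2. It is a plain-Python sketch, not the procedure of any particular display system; the helper names `matmul` and `scaling_matrix` are hypothetical, and matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def scaling_matrix(alpha, beta, gamma):
    """The 3 x 3 diagonal matrix S with scale factors along x, y, and z."""
    return [[alpha, 0.0, 0.0],
            [0.0, beta, 0.0],
            [0.0, 0.0, gamma]]


# Coordinate matrix of View 1: column j holds the coordinates of vertex P_{j+1}.
P = [
    [1.000, .500, -.500, -1.000, -.500, .500, .840, .315, -.210, -.360, -.210, .315],
    [-.800, -.800, -.800, -.800, -.800, -.800, -.400, .125, .650, .800, .650, .125],
    [.000, -.866, -.866, .000, .866, .866, .000, -.546, -.364, .000, .364, .546],
]

S = scaling_matrix(1.8, 0.5, 3.0)      # factors used for View 2
P_scaled = matmul(S, P)                # coordinate matrix P' of the scaled view

# Only the x- and y-rows are needed to draw the projection onto the screen.
screen_points = list(zip(P_scaled[0], P_scaled[1]))
```

The z-row of `P_scaled` is still computed and stored, since later rotations about the x- or y-axis would bring it into view.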


Translation

We next consider the transformation of translating or displacing an object to a new position on the screen. Referring to Figure 10.9.3, suppose we desire to change an existing view so that each point $P_i$ with coordinates $(x_i, y_i, z_i)$ moves to a new point $P_i'$ with coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$. The vector

\[
\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}
\]

is called the translation vector of the transformation. By defining a $3 \times n$ matrix $T$ as

\[
T = \begin{bmatrix}
x_0 & x_0 & \cdots & x_0 \\
y_0 & y_0 & \cdots & y_0 \\
z_0 & z_0 & \cdots & z_0
\end{bmatrix}
\]

we can translate all $n$ points of the view determined by the coordinate matrix $P$ by matrix addition via the equation

\[
P' = P + T
\]

The coordinate matrix $P'$ then specifies the new coordinates of the $n$ points. For example, if we wish to translate View 1 according to the translation vector

\[
\begin{bmatrix} 1.2 \\ .4 \\ 1.7 \end{bmatrix}
\]

the result is View 3. Note, again, that the translation $z_0 = 1.7$ along the z-axis does not show up explicitly in View 3. In Exercise 7, a technique of performing translations by matrix multiplication rather than by matrix addition is explained.

[View 3: View 1 translated by $x_0 = 1.2$, $y_0 = 0.4$, $z_0 = 1.7$.]

[Figure 10.9.3: A point $P_i(x_i, y_i, z_i)$ translated to $P_i'(x_i + x_0, y_i + y_0, z_i + z_0)$.]
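The translation P′ = P + T can be sketched in the same way. The function name `translate` is hypothetical, and for brevity the example uses only three vertices of View 1 (P1, P7, and P10) rather than the full 3 × 12 coordinate matrix.

```python
def translate(P, t):
    """Return P + T, where T is the 3 x n matrix whose columns all equal t."""
    n = len(P[0])
    T = [[t_k] * n for t_k in t]                       # repeat the translation vector n times
    return [[p + q for p, q in zip(row_p, row_t)]      # entrywise matrix addition
            for row_p, row_t in zip(P, T)]


# Columns: vertices P1, P7, and P10 of View 1.
P = [
    [1.000, .840, -.360],
    [-.800, -.400, .800],
    [.000, .000, .000],
]

t = [1.2, 0.4, 1.7]                                    # translation vector used for View 3
P_translated = translate(P, t)
```

Building T explicitly mirrors the matrix equation in the text; in practice one could add the components of t to each row directly without storing T.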

Rotation

A more complicated type of transformation is a rotation of a view about one of the three coordinate axes. We begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle $\theta$. Given a point $P_i$ in the original view with coordinates $(x_i, y_i, z_i)$, we wish to compute the new coordinates $(x_i', y_i', z_i')$ of the rotated point $P_i'$. Referring to Figure 10.9.4 and using a little trigonometry, you should be able to derive the following:

\[
\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}
\]

[Figure 10.9.4: A point $P_i(x_i, y_i, z_i)$ at distance $\rho$ from the z-axis and angle $\phi$ from the x-axis, rotated through the angle $\theta$ to $P_i'(x_i', y_i', z_i')$.]

These equations can be written in matrix form as

\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
=
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]

If we let $R$ denote the $3 \times 3$ matrix in this equation, all $n$ points can be rotated by the matrix product

\[
P' = RP
\]

to yield the coordinate matrix $P'$ of the rotated view. Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1 about the x-, y-, and z-axes, respectively, each through an angle of 90°.

Rotation about the x-axis:

\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{bmatrix}
\]

[View 4: View 1 rotated 90° about the x-axis.]

Rotation about the y-axis:

\[
\begin{bmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{bmatrix}
\]

[View 5: View 1 rotated 90° about the y-axis.]

Rotation about the z-axis:

\[
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

[View 6: View 1 rotated 90° about the z-axis.]

Rotations about three coordinate axes may be combined to give oblique views of an object. For example, View 7 is View 1 rotated first about the x-axis through 30°, then about the y-axis through −70°, and finally about the z-axis through −27°. Mathematically, these three successive rotations can be embodied in the single transformation equation $P' = RP$, where $R$ is the product of three individual rotation matrices:

\[
R_1 = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos(30^\circ) & -\sin(30^\circ) \\
0 & \sin(30^\circ) & \cos(30^\circ)
\end{bmatrix}
\]

\[
R_2 = \begin{bmatrix}
\cos(-70^\circ) & 0 & \sin(-70^\circ) \\
0 & 1 & 0 \\
-\sin(-70^\circ) & 0 & \cos(-70^\circ)
\end{bmatrix}
\]

\[
R_3 = \begin{bmatrix}
\cos(-27^\circ) & -\sin(-27^\circ) & 0 \\
\sin(-27^\circ) & \cos(-27^\circ) & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

in the order

\[
R = R_3 R_2 R_1 = \begin{bmatrix}
.305 & -.025 & -.952 \\
-.155 & .985 & -.076 \\
.940 & .171 & .296
\end{bmatrix}
\]

[View 7: Oblique view of truncated pyramid.]
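The composite matrix for View 7 can be reproduced with a few lines of code. The following sketch builds the three axis rotations from the matrices given with Views 4, 5, and 6 and multiplies them in the order R3 R2 R1; the helper names (`rot_x`, `rot_y`, `rot_z`, `matmul`) are hypothetical, and angles are taken in degrees.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rot_x(deg):
    """Rotation through deg degrees about the x-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    """Rotation through deg degrees about the y-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(deg):
    """Rotation through deg degrees about the z-axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


R1 = rot_x(30)      # applied first
R2 = rot_y(-70)     # applied second
R3 = rot_z(-27)     # applied last
R = matmul(R3, matmul(R2, R1))

for row in R:
    print("  ".join(f"{entry: .3f}" for entry in row))
```

The rotation applied first sits rightmost in the product because R acts on the columns of P from the left; reversing the order generally gives a different view, since matrix multiplication is not commutative.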

As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of −3° and translating it to the right, then rotating the same View 7 about the y-axis through an angle of +3° and translating it to the left. The translation distances were chosen so that the stereoscopic views are about 2½ inches apart, the approximate distance between a pair of eyes.

[View 8: Stereoscopic figure of truncated pyramid. The three-dimensionality of the diagram can be seen by holding the book about one foot away and focusing on a distant object. Then by shifting your gaze to View 8 without refocusing, you can make the two views of the stereoscopic pair merge together and produce the desired effect.]

Exercise Set 10.9

1. View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0).

(a) What is the coordinate matrix of View 9?

(b) What is the coordinate matrix of View 9 after it is scaled by a factor 1½ in the x-direction and ½ in the y-direction? Draw a sketch of the scaled view.

(c) What is the coordinate matrix of View 9 after it is translated by the following vector?

\[
\begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}
\]

Draw a sketch of the translated view.

(d) What is the coordinate matrix of View 9 after it is rotated through an angle of −30° about the z-axis? Draw a sketch of the rotated view.

[View 9: Square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0) (Exercises 1 and 2).]

2. (a) If the coordinate matrix of View 9 is multiplied by the matrix

\[
\begin{bmatrix}
1 & \tfrac{1}{2} & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction with factor ½ with respect to the y-coordinate. Show that under such a transformation, a point with coordinates $(x_i, y_i, z_i)$ has new coordinates $(x_i + \tfrac{1}{2} y_i, y_i, z_i)$.

(b) What are the coordinates of the four vertices of the sheared square in View 10?

(c) The matrix

\[
\begin{bmatrix}
1 & 0 & 0 \\
.6 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

determines a shear in the y-direction with factor .6 with respect to the x-coordinate (an example appears in View 11). Sketch a view of the square in View 9 after such a shearing transformation, and find the new coordinates of its four vertices.

[View 10: View 9 sheared along the x-axis by ½ with respect to the y-coordinate (Exercise 2).]

[View 11: View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2).]

3. (a) The reflection about the xz-plane is defined as the transformation that takes a point $(x_i, y_i, z_i)$ to the point $(x_i, -y_i, z_i)$ (e.g., View 12). If $P$ and $P'$ are the coordinate matrices of a view and its reflection about the xz-plane, respectively, find a matrix $M$ such that $P' = MP$.

(b) Analogous to part (a), define the reflection about the yz-plane and construct the corresponding transformation matrix. Draw a sketch of View 1 reflected about the yz-plane.

(c) Analogous to part (a), define the reflection about the xy-plane and construct the corresponding transformation matrix. Draw a sketch of View 1 reflected about the xy-plane.

[View 12: View 1 reflected about the xz-plane (Exercise 3).]

4. (a) View 13 is View 1 subject to the following five transformations:

1. Scale by a factor of ½ in the x-direction, 2 in the y-direction, and ⅓ in the z-direction.
2. Translate ½ unit in the x-direction.
3. Rotate 20° about the x-axis.
4. Rotate −45° about the y-axis.
5. Rotate 90° about the z-axis.

Construct the five matrices $M_1$, $M_2$, $M_3$, $M_4$, and $M_5$ associated with these five transformations.

(b) If $P$ is the coordinate matrix of View 1 and $P'$ is the coordinate matrix of View 13, express $P'$ in terms of $M_1$, $M_2$, $M_3$, $M_4$, $M_5$, and $P$.

[View 13: View 1 scaled, translated, and rotated (Exercise 4).]

5. (a) View 14 is View 1 subject to the following seven transformations:

1. Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction.
2. Rotate 45° about the x-axis.
3. Translate 1 unit in the x-direction.
4. Rotate 35° about the y-axis.
5. Rotate −45° about the z-axis.
6. Translate 1 unit in the z-direction.
7. Scale by a factor of 2 in the x-direction.

Construct the matrices $M_1, M_2, \ldots, M_7$ associated with these seven transformations.

(b) If $P$ is the coordinate matrix of View 1 and $P'$ is the coordinate matrix of View 14, express $P'$ in terms of $M_1, M_2, \ldots, M_7$, and $P$.

[View 14: View 1 scaled, translated, and rotated (Exercise 5).]

6. Suppose that a view with coordinate matrix $P$ is to be rotated through an angle $\theta$ about an axis through the origin and specified by two angles $\alpha$ and $\beta$ (see Figure Ex-6). If $P'$ is the coordinate matrix of the rotated view, find rotation matrices $R_1$, $R_2$, $R_3$, $R_4$, and $R_5$ such that

\[
P' = R_5 R_4 R_3 R_2 R_1 P
\]

[Hint: The desired rotation can be accomplished in the following five steps:

1. Rotate through an angle of $\beta$ about the y-axis.
2. Rotate through an angle of $\alpha$ about the z-axis.
3. Rotate through an angle of $\theta$ about the y-axis.
4. Rotate through an angle of $-\alpha$ about the z-axis.
5. Rotate through an angle of $-\beta$ about the y-axis.]

[Figure Ex-6: An axis of rotation through the origin, specified by the angles $\alpha$ and $\beta$, with rotation angle $\theta$.]

7. This exercise illustrates a technique for translating a point with coordinates $(x_i, y_i, z_i)$ to a point with coordinates $(x_i + x_0, y_i + y_0, z_i + z_0)$ by matrix multiplication rather than matrix addition.

(a) Let the point $(x_i, y_i, z_i)$ be associated with the column vector

\[
\mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}
\]

and let the point $(x_i + x_0, y_i + y_0, z_i + z_0)$ be associated with the column vector

\[
\mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix}
\]

Find a $4 \times 4$ matrix $M$ such that $\mathbf{v}_i' = M\mathbf{v}_i$.

(b) Find the specific $4 \times 4$ matrix of the above form that will effect the translation of the point (4, −2, 3) to the point (−1, 7, 0).

8. For the three rotation matrices given with Views 4, 5, and 6, show that

\[
R^{-1} = R^{T}
\]

(A matrix with this property is called an orthogonal matrix. See Section 7.1.)

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Let (a, b, c) be a unit vector normal to the plane ax + by + cz = 0, and let r = (x, y, z) be a vector. It can be shown that the mirror image of the vector r through the above plane has coordinates $r_m = (x_m, y_m, z_m)$, where

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
M = I - 2nn^T =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- 2 \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
\]

(a) Show that $M^2 = I$ and give a physical reason why this must be so. [Hint: Use the fact that (a, b, c) is a unit vector to show that $n^Tn = 1$.]

(b) Use a computer to show that det(M) = −1.

(c) The eigenvectors of M satisfy the equation

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

and therefore correspond to those vectors whose direction is not affected by a reflection through the plane. Use a computer to determine the eigenvectors and eigenvalues of M, and then give a physical argument to support your answer.

T2. A vector v = (x, y, z) is rotated by an angle θ about an axis having unit vector (a, b, c), thereby forming the rotated vector $v_R = (x_R, y_R, z_R)$. It can be shown that

\[
\begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix} = R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
R(\theta) = \cos(\theta) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ (1 - \cos(\theta)) \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
+ \sin(\theta) \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}
\]

(a) Use a computer to show that $R(\theta)R(\varphi) = R(\theta + \varphi)$, and then give a physical reason why this must be so. Depending on the sophistication of the computer you are using, you may have to experiment using different values of a, b, and $c = \sqrt{1 - a^2 - b^2}$.

(b) Show also that $R^{-1}(\theta) = R(-\theta)$ and give a physical reason why this must be so.

(c) Use a computer to show that det(R(θ)) = +1.
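The identities in Exercises T1 and T2 can be spot-checked numerically. The following is a minimal sketch, assuming NumPy as the technology utility; the function names and the sample values a = 0.3, b = 0.5, and the two angles are illustrative choices, not part of the exercises. A numerical check for one axis is evidence, not a proof.

```python
import numpy as np

def reflection_matrix(n):
    """M = I - 2 n n^T for a unit normal n (Exercise T1)."""
    n = np.asarray(n, dtype=float).reshape(3, 1)
    return np.eye(3) - 2.0 * (n @ n.T)

def rotation_matrix(axis, theta):
    """R(theta) from Exercise T2 for a unit axis (a, b, c)."""
    a, b, c = axis
    n = np.array([[a], [b], [c]], dtype=float)
    K = np.array([[0.0, -c, b],
                  [c, 0.0, -a],
                  [-b, a, 0.0]])
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * (n @ n.T)
            + np.sin(theta) * K)

# Sample unit vector: pick a and b, then c = sqrt(1 - a^2 - b^2).
a, b = 0.3, 0.5
axis = (a, b, np.sqrt(1.0 - a**2 - b**2))

M = reflection_matrix(axis)
R1 = rotation_matrix(axis, 0.7)
R2 = rotation_matrix(axis, 1.1)

print(np.allclose(M @ M, np.eye(3)), np.linalg.det(M))
print(np.allclose(R1 @ R2, rotation_matrix(axis, 1.8)), np.linalg.det(R1))
```

The eigenvalues requested in T1(c) could be obtained in the same setting with `np.linalg.eig(M)`.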

10.10 Equilibrium Temperature Distributions

In this section we will see that the equilibrium temperature distribution within a trapezoidal plate can be found when the temperatures around the edges of the plate are specified. The problem is reduced to solving a system of linear equations. Also, an iterative technique for solving the problem and a “random walk” approach to the problem are described.

PREREQUISITES: Linear Systems; Matrices; Intuitive Understanding of Limits

Boundary Data

Suppose that the two faces of the thin trapezoidal plate shown in Figure 10.10.1a are insulated from heat. Suppose that we are also given the temperature along the four edges of the plate. For example, let the temperature be constant on each edge with values of 0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature inside the plate will stabilize. Our objective in this section is to determine this equilibrium temperature distribution at the points inside the plate. As we will see, the interior equilibrium temperature is completely determined by the boundary data, that is, the temperature along the edges of the plate.

[Figure 10.10.1: (a) the trapezoidal plate, with edge temperatures of 0°, 0°, 1°, and 2°; (b) isotherms of the equilibrium temperature distribution, labeled from 0.00 to 2.00 in steps of 0.25.]

The equilibrium temperature distribution can be visualized by the use of curves that connect points of equal temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.10.1b we have sketched a few isotherms, using information we derive later in the chapter.

Although all our calculations will be for the trapezoidal plate illustrated, our techniques generalize easily to a plate of any practical shape. They also generalize to the problem of finding the temperature within a three-dimensional body. In fact, our “plate” could be the cross section of some solid object if the flow of heat perpendicular to the cross section is negligible. For example, Figure 10.10.1 could represent the cross section of a long dam. The dam is exposed to three different temperatures: the temperature of the ground at its base, the temperature of the water on one side, and the temperature of the air on the other side. A knowledge of the temperature distribution inside the dam is necessary to determine the thermal stresses to which it is subjected.

Next we will consider a certain thermodynamic principle that characterizes the temperature distribution we are seeking.

The Mean-Value Property

There are many different ways to obtain a mathematical model for our problem. The approach we use is based on the following property of equilibrium temperature distributions.

THEOREM 10.10.1 The Mean-Value Property

Let a plate be in thermal equilibrium and let P be a point inside the plate. Then if C is any circle with center at P that is completely contained in the plate, the temperature at P is the average value of the temperature on the circle (Figure 10.10.2).

[Figure 10.10.2: a circle C with center P, completely contained in the plate.]

This property is a consequence of certain basic laws of molecular motion, and we will not attempt to derive it. Basically, this property states that in equilibrium, thermal energy tends to distribute itself as evenly as possible consistent with the boundary conditions.


It can be shown that the mean-value property uniquely determines the equilibrium temperature distribution of a plate. Unfortunately, determining the equilibrium temperature distribution from the mean-value property is not an easy matter. However, if we restrict ourselves to finding the temperature only at a finite set of points within the plate, the problem can be reduced to solving a linear system. We pursue this idea next.

Discrete Formulation of the Problem

We can overlay our trapezoidal plate with a succession of finer and finer square nets or meshes (Figure 10.10.3). In (a) we have a rather coarse net; in (b) we have a net with half the spacing as in (a); and in (c) we have a net with the spacing again reduced by half. The points of intersection of the net lines are called mesh points. We classify them as boundary mesh points if they fall on the boundary of the plate or as interior mesh points if they lie in the interior of the plate. For the three net spacings we have chosen, there are 1, 9, and 49 interior mesh points, respectively.

[Figure 10.10.3: the trapezoidal plate overlaid with three square nets. (a) 1 interior mesh point, labeled t0. (b) 9 interior mesh points, labeled t1 through t9. (c) 49 interior mesh points. In each panel the boundary mesh points are labeled with their temperatures (0, 1, or 2).]

In the discrete formulation of our problem, we try to find the temperature only at the interior mesh points of some particular net. For a rather fine net, as in (c), this will provide an excellent picture of the temperature distribution throughout the entire plate. At the boundary mesh points, the temperature is given by the boundary data. (In Figure 10.10.3 we have labeled all the boundary mesh points with their corresponding temperatures.) At the interior mesh points, we will apply the following discrete version of the mean-value property.

THEOREM 10.10.2 Discrete Mean-Value Property

At each interior mesh point, the temperature is approximately the average of the temperatures at the four neighboring mesh points.

This discrete version is a reasonable approximation to the true mean-value property. But because it is only an approximation, it will provide only an approximation to the true temperatures at the interior mesh points. However, the approximations will get better as the mesh spacing decreases. In fact, as the mesh spacing approaches zero, the approximations approach the exact temperature distribution, a fact proved in advanced courses in numerical analysis. We will illustrate this convergence by computing the approximate temperatures at the mesh points for the three mesh spacings given in Figure 10.10.3.

Case (a) of Figure 10.10.3 is simple, for there is only one interior mesh point. If we let t0 be the temperature at this mesh point, the discrete mean-value property immediately gives

\[
t_0 = \tfrac14(2 + 1 + 0 + 0) = 0.75
\]

In case (b) we can label the temperatures at the nine interior mesh points t1, t2, . . . , t9, as in Figure 10.10.3b. (The particular ordering is not important.) By applying the discrete mean-value property successively to each of these nine mesh points, we obtain the following nine equations:

\[
\begin{aligned}
t_1 &= \tfrac14(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14(t_1 + t_3 + t_4 + 2) \\
t_3 &= \tfrac14(t_2 + t_5 + 0 + 0) \\
t_4 &= \tfrac14(t_2 + t_5 + t_7 + 2) \\
t_5 &= \tfrac14(t_3 + t_4 + t_6 + t_8) \\
t_6 &= \tfrac14(t_5 + t_9 + 0 + 0) \\
t_7 &= \tfrac14(t_4 + t_8 + 1 + 2) \\
t_8 &= \tfrac14(t_5 + t_7 + t_9 + 1) \\
t_9 &= \tfrac14(t_6 + t_8 + 1 + 0)
\end{aligned}
\tag{1}
\]

This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as

\[
t = Mt + b \tag{2}
\]

where

\[
t = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \\ t_7 \\ t_8 \\ t_9 \end{bmatrix}, \quad
M = \begin{bmatrix}
0 & \tfrac14 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac14 & 0 & \tfrac14 & \tfrac14 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 0 & 0 & \tfrac14 & 0 & 0 & 0 & 0 \\
0 & \tfrac14 & 0 & 0 & \tfrac14 & 0 & \tfrac14 & 0 & 0 \\
0 & 0 & \tfrac14 & \tfrac14 & 0 & \tfrac14 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 0 & 0 & 0 & \tfrac14 \\
0 & 0 & 0 & \tfrac14 & 0 & 0 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 0 & \tfrac14 & 0 & \tfrac14 & 0 & \tfrac14 \\
0 & 0 & 0 & 0 & 0 & \tfrac14 & 0 & \tfrac14 & 0
\end{bmatrix}, \quad
b = \begin{bmatrix} \tfrac12 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \\ \tfrac34 \\ \tfrac14 \\ \tfrac14 \end{bmatrix}
\]

To solve Equation (2), we write it as

\[
(I - M)t = b
\]

The solution for t is thus

\[
t = (I - M)^{-1} b \tag{3}
\]

as long as the matrix (I − M) is invertible. This is indeed the case, and the solution for t as calculated by (3) is

\[
t = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix} \tag{4}
\]
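The computation behind (4) can be reproduced with a few lines of code. The sketch below assumes NumPy; the dictionary `NEIGHBORS` and the helper `build_system` are illustrative names, and the neighbor lists are simply transcribed from Equations (1). It solves (I − M)t = b directly rather than forming the inverse, which gives the same t.

```python
import numpy as np

# For each interior point k (1..9): (interior neighbours, boundary temperatures),
# transcribed from Equations (1).
NEIGHBORS = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [1, 2]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}

def build_system(neighbors):
    """Assemble M and b of t = M t + b from the discrete mean-value property."""
    n = len(neighbors)
    M = np.zeros((n, n))
    b = np.zeros(n)
    for k, (interior, boundary) in neighbors.items():
        for j in interior:
            M[k - 1, j - 1] = 0.25          # interior neighbours go into M
        b[k - 1] = 0.25 * sum(boundary)     # boundary neighbours go into b
    return M, b

M, b = build_system(NEIGHBORS)
t = np.linalg.solve(np.eye(9) - M, b)       # solves (I - M) t = b
print(np.round(t, 4))
```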

Figure 10.10.4 is a diagram of the plate with the nine interior mesh points labeled with their temperatures as given by this solution.

[Figure 10.10.4: the plate of Figure 10.10.3b, with the nine interior mesh points labeled with the temperatures in (4) and the boundary mesh points labeled 0, 1, or 2.]

For case (c) of Figure 10.10.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh points as t1, t2, . . . , t49 in some manner. For example, we may begin at the top of the plate and proceed from left to right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system of 49 linear equations in 49 unknowns:

\[
\begin{aligned}
t_1 &= \tfrac14(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac14(t_1 + t_3 + t_4 + 2) \\
&\;\;\vdots \\
t_{48} &= \tfrac14(t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac14(t_{42} + t_{48} + 0 + 1)
\end{aligned}
\tag{5}
\]

In matrix form, Equations (5) are

\[
t = Mt + b
\]

where t and b are column vectors with 49 entries, and M is a 49 × 49 matrix. As in (3), the solution for t is

\[
t = (I - M)^{-1} b \tag{6}
\]

In Figure 10.10.5 we display the temperatures at the 49 mesh points found by Equation (6). The nine unshaded temperatures in this figure fall on the mesh points of Figure 10.10.4. In Table 1 we compare the temperatures at these nine common mesh points for the three different mesh spacings used.

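Figure 10.10.5 Temperatures at the 49 interior mesh points of case (c), listed row by row from the top of the plate. Each row lies between a boundary mesh point at temperature 2 on its left and one at temperature 0 on its right; the boundary mesh points along the bottom edge are at temperature 1. Entries marked * are the nine mesh points shared with Figure 10.10.4 (the unshaded entries of the original figure).

Row 1:   0.7903
Row 2:   1.1611   0.4915
Row 3:   1.3625   0.8048*  0.3528
Row 4:   1.4844   1.0122   0.6064   0.2710
Row 5:   1.5627   1.1533*  0.7896   0.4778*  0.2162
Row 6:   1.6131   1.2488   0.9210   0.6342   0.3868   0.1756
Row 7:   1.6409   1.3078*  1.0114   0.7513*  0.5214   0.3157*  0.1344
Row 8:   1.6426   1.3301   1.0657   0.8380   0.6318   0.4312   0.2221
Row 9:   1.5994   1.3042*  1.0834   0.9032*  0.7365   0.5554*  0.3227
Row 10:  1.4508   1.2039   1.0605   0.9548   0.8556   0.7311   0.5135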

Table 1 Temperatures at Common Mesh Points

        Case (a)   Case (b)   Case (c)
t1         -        0.7846     0.8048
t2         -        1.1383     1.1533
t3         -        0.4719     0.4778
t4         -        1.2967     1.3078
t5       0.7500     0.7491     0.7513
t6         -        0.3265     0.3157
t7         -        1.2995     1.3042
t8         -        0.9014     0.9032
t9         -        0.5570     0.5554


Knowing that the temperatures of the discrete problem approach the exact temperatures as the mesh spacing decreases, we may surmise that the nine temperatures obtained in case (c) are closer to the exact values than those in case (b).

A Numerical Technique

To obtain the 49 temperatures in case (c) of Figure 10.10.3, it was necessary to solve a linear system with 49 unknowns. A finer net might involve a linear system with hundreds or even thousands of unknowns. Exact algorithms for the solutions of such large systems are impractical, and for this reason we now discuss a numerical technique for the practical solution of these systems. To describe this technique, we look again at Equation (2):

\[
t = Mt + b \tag{7}
\]

The vector t we are seeking appears on both sides of this equation. We consider a way of generating better and better approximations to the vector solution t. For the initial approximation $t^{(0)}$ we can take $t^{(0)} = 0$ if no better choice is available. If we substitute $t^{(0)}$ into the right side of (7) and label the resulting left side as $t^{(1)}$, we have

\[
t^{(1)} = M t^{(0)} + b \tag{8}
\]

If we substitute $t^{(1)}$ into the right side of (7), we generate another approximation, which we label $t^{(2)}$:

\[
t^{(2)} = M t^{(1)} + b \tag{9}
\]

Continuing in this way, we generate a sequence of approximations as follows:

\[
\begin{aligned}
t^{(1)} &= M t^{(0)} + b \\
t^{(2)} &= M t^{(1)} + b \\
t^{(3)} &= M t^{(2)} + b \\
&\;\;\vdots \\
t^{(n)} &= M t^{(n-1)} + b \\
&\;\;\vdots
\end{aligned}
\tag{10}
\]

One would hope that this sequence of approximations $t^{(0)}, t^{(1)}, t^{(2)}, \ldots$ converges to the exact solution of (7). We do not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any initial approximation $t^{(0)}$.

This technique of generating successive approximations to the solution of (7) is a variation of a technique called Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi iteration to the calculation of the nine mesh point temperatures of case (b). Setting $t^{(0)} = 0$, we have, from Equation (2),

\[
t^{(1)} = M t^{(0)} + b = M0 + b = b =
\begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix}
\]

\[
t^{(2)} = M t^{(1)} + b =
M \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix}
+ \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix}
= \begin{bmatrix} .6250 \\ .7500 \\ .1250 \\ .8125 \\ .1875 \\ .0625 \\ .9375 \\ .5000 \\ .3125 \end{bmatrix}
\]

where M is the 9 × 9 matrix of Equation (2). Some additional iterates are

\[
t^{(3)} = \begin{bmatrix} 0.6875 \\ 0.8906 \\ 0.2344 \\ 0.9688 \\ 0.3750 \\ 0.1250 \\ 1.0781 \\ 0.6094 \\ 0.3906 \end{bmatrix}, \quad
t^{(10)} = \begin{bmatrix} 0.7791 \\ 1.1230 \\ 0.4573 \\ 1.2770 \\ 0.7236 \\ 0.3131 \\ 1.2848 \\ 0.8827 \\ 0.5446 \end{bmatrix}, \quad
t^{(20)} = \begin{bmatrix} 0.7845 \\ 1.1380 \\ 0.4716 \\ 1.2963 \\ 0.7486 \\ 0.3263 \\ 1.2992 \\ 0.9010 \\ 0.5567 \end{bmatrix}, \quad
t^{(30)} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix}
\]
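The iteration (10) for case (b) can be sketched in code as follows. This is a minimal illustration, assuming NumPy; `PATTERN` records where the entries 1/4 sit in the matrix M of Equation (2), and `jacobi_iterates` is a hypothetical helper name.

```python
import numpy as np

# M and b of Equation (2), case (b); every nonzero entry of M equals 1/4.
PATTERN = [
    [0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
]
M = np.array(PATTERN, dtype=float) / 4.0
b = np.array([2, 2, 0, 2, 0, 0, 3, 1, 1], dtype=float) / 4.0

def jacobi_iterates(M, b, t0, steps):
    """Return [t(0), t(1), ..., t(steps)] with t(n) = M t(n-1) + b."""
    iterates = [np.asarray(t0, dtype=float)]
    for _ in range(steps):
        iterates.append(M @ iterates[-1] + b)
    return iterates

its = jacobi_iterates(M, b, np.zeros(9), 30)
print(its[2])                 # second iterate
print(np.round(its[30], 4))   # thirtieth iterate, rounded to four places
```

Each step costs one matrix-vector product, which is why the method remains practical when M is large and sparse.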

All iterates beginning with the thirtieth are equal to $t^{(30)}$ to four decimal places. Consequently, $t^{(30)}$ is the exact solution to four decimal places. This agrees with our previous result given in Equation (4).

The Jacobi iteration scheme applied to the linear system (5) with 49 unknowns produces iterates that begin repeating to four decimal places after 119 iterations. Thus, $t^{(119)}$ would provide the 49 temperatures of case (c) correct to four decimal places.

A Monte Carlo Technique

[Figure 10.10.6: the nine-point net of case (b), with boundary mesh points labeled 0, 1, or 2 and the interior mesh point t5 marked as the start of a random walk.]

In this section we describe a so-called Monte Carlo technique for computing the temperature at a single interior mesh point of the discrete problem without having to compute the temperatures at the remaining interior mesh points. First we define a discrete random walk along the net. By this we mean a directed path along the net lines (Figure 10.10.6) that joins a succession of mesh points such that the direction of departure from each mesh point is chosen at random. Each of the four possible directions of departure from each mesh point along the path is to be equally probable. By the use of random walks, we can compute the temperature at a specified interior mesh point on the basis of the following property.

THEOREM 10.10.3 Random Walk Property

Let $W_1, W_2, \ldots, W_n$ be a succession of random walks, all of which begin at a specified interior mesh point. Let $t_1^*, t_2^*, \ldots, t_n^*$ be the temperatures at the boundary mesh points first encountered along each of these random walks. Then the average value $(t_1^* + t_2^* + \cdots + t_n^*)/n$ of these boundary temperatures approaches the temperature at the specified interior mesh point as the number of random walks n increases without bound.


This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The proof of the random walk property involves elementary concepts from probability theory, and we will not give it here.

In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the temperature t5 of the nine-point mesh of case (b) in Figure 10.10.6. The first column lists the number n of the random walk. The second column lists the temperature $t_n^*$ of the boundary point first encountered along the corresponding random walk. The last column contains the cumulative average of the boundary temperatures encountered along the n random walks. Thus, after 1000 random walks we have the approximation t5 ≈ .7550. This compares with the exact value t5 = .7491 that we had previously evaluated. As can be seen, the convergence to the exact value is not too rapid.

Table 2

    n     t*_n    (t*_1 + ... + t*_n)/n
    1      1      1.0000
    2      2      1.5000
    3      1      1.3333
    4      0      1.0000
    5      2      1.2000
    6      0      1.0000
    7      2      1.1429
    8      0      1.0000
    9      2      1.1111
   10      0      1.0000
   20      1      0.9500
   30      0      0.8000
   40      0      0.8250
   50      2      0.8400
  100      0      0.8300
  150      1      0.8000
  200      0      0.8050
  250      1      0.8240
  500      1      0.7860
 1000      0      0.7550
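An experiment of the kind summarized in Table 2 can be sketched as follows. This is an illustration only: it uses Python's standard `random` module with an arbitrary seed, so it will not reproduce the particular walks of Table 2. The `NEIGHBORS` table is an assumed encoding of the four neighbors of each interior mesh point of case (b), read off Equations (1); `("t", k)` denotes the interior point t_k and `("b", T)` a boundary point at temperature T.

```python
import random

# Four equally likely moves from each interior mesh point of case (b).
NEIGHBORS = {
    1: [("t", 2), ("b", 2), ("b", 0), ("b", 0)],
    2: [("t", 1), ("t", 3), ("t", 4), ("b", 2)],
    3: [("t", 2), ("t", 5), ("b", 0), ("b", 0)],
    4: [("t", 2), ("t", 5), ("t", 7), ("b", 2)],
    5: [("t", 3), ("t", 4), ("t", 6), ("t", 8)],
    6: [("t", 5), ("t", 9), ("b", 0), ("b", 0)],
    7: [("t", 4), ("t", 8), ("b", 1), ("b", 2)],
    8: [("t", 5), ("t", 7), ("t", 9), ("b", 1)],
    9: [("t", 6), ("t", 8), ("b", 1), ("b", 0)],
}

def boundary_temperature(start, rng):
    """Walk from an interior point until a boundary point is reached;
    return the temperature of that boundary point."""
    point = start
    while True:
        kind, value = rng.choice(NEIGHBORS[point])
        if kind == "b":
            return value
        point = value

def estimate(start, walks, seed=0):
    """Average first-encountered boundary temperature over many walks."""
    rng = random.Random(seed)
    total = sum(boundary_temperature(start, rng) for _ in range(walks))
    return total / walks

print(estimate(5, 1000))      # comparable in accuracy to the last row of Table 2
print(estimate(5, 100000))    # should lie closer to the value .7491
```

Because the statistical error shrinks only like one over the square root of the number of walks, many walks are needed for each additional correct digit, which matches the slow convergence noted above.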

Exercise Set 10.10

1. A plate in the form of a circular disk has boundary temperatures of 0° on the left half of its circumference and 1° on the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure Ex-1).

(a) Using the discrete mean-value property, write the 4 × 4 linear system t = M t + b that determines the approximate temperatures at the four interior mesh points.

(b) Solve the linear system in part (a).

(c) Use the Jacobi iteration scheme with $t^{(0)} = 0$ to generate the iterates $t^{(1)}, t^{(2)}, t^{(3)}, t^{(4)}$, and $t^{(5)}$ for the linear system in part (a). What is the “error vector” $t^{(5)} - t$, where t is the solution found in part (b)?

(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the four mesh points are t1 = t3 = .2871 and t2 = t4 = .7129. What are the percentage errors in the values found in part (b)?

[Figure Ex-1: the disk with four interior mesh points t1, t2, t3, t4; boundary mesh points on the left half of the circumference are labeled 0 and those on the right half are labeled 1.]

2. Use Theorem 10.10.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1.

3. Calculate the first two iterates $t^{(1)}$ and $t^{(2)}$ for case (b) of Figure 10.10.3 with nine interior mesh points [Equation (2)] when the initial iterate is chosen as

\[
t^{(0)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T
\]

4. The random walk illustrated in Figure Ex-4a can be described by six arrows

←↓→→↑→

that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array of 100 computer-generated, randomly oriented arrows arranged in a 10 × 10 array. Use these arrows to determine random walks to approximate the temperature t5, as in Table 2. Proceed as follows:

1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify a column.

2. Go to the arrow in the array with that row and column number.

3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right and top to bottom). Beginning at the point labeled t5 in Figure Ex-4a and using this sequence of arrows to specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you reach the end of the arrow array, continue with the arrow in the upper left corner.)

4. Return to the interior mesh point labeled t5 and begin where you left off in the arrow array; generate your next random walk. Repeat this process until you have completed 10 random walks and have recorded 10 boundary temperatures.

5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is t5 = .7491.)

[Figure Ex-4: (a) the nine-point net of case (b), with boundary mesh points labeled 0, 1, or 2 and a sample random walk starting at t5; (b) a 10 × 10 array of randomly oriented arrows, with rows and columns numbered 0 through 9. The arrow array itself is not reproduced here.]

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Suppose that we have the square region described by

\[
R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}
\]

and suppose that the equilibrium temperature distribution u(x, y) along the boundary is given by $u(x, 0) = T_B$, $u(x, 1) = T_T$, $u(0, y) = T_L$, and $u(1, y) = T_R$. Suppose next that this region is partitioned into an (n + 1) × (n + 1) mesh using

\[
x_i = \frac{i}{n} \quad \text{and} \quad y_j = \frac{j}{n}
\]

for i = 0, 1, 2, . . . , n and j = 0, 1, 2, . . . , n. If the temperatures of the interior mesh points are labeled by

\[
u_{i,j} = u(x_i, y_j) = u(i/n, j/n)
\]

then show that

\[
u_{i,j} = \tfrac14\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)
\]

for i = 1, 2, 3, . . . , n − 1 and j = 1, 2, 3, . . . , n − 1. To handle the boundary points, define

\[
u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad \text{and} \quad u_{i,n} = T_T
\]

for i = 1, 2, 3, . . . , n − 1 and j = 1, 2, 3, . . . , n − 1. Next let

\[
F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}
\]

be the (n + 1) × (n + 1) matrix with the n × n identity matrix in the upper right-hand corner, a one in the lower left-hand corner, and zeros everywhere else. For example,

\[
F_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
F_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad
F_5 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

and so on. By defining the (n + 1) × (n + 1) matrix

\[
M_{n+1} = F_{n+1} + F_{n+1}^T = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}^T
\]

show that if $U_{n+1}$ is the (n + 1) × (n + 1) matrix with entries $u_{ij}$, then the set of equations

\[
u_{i,j} = \tfrac14\left(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1}\right)
\]

for i = 1, 2, 3, . . . , n − 1 and j = 1, 2, 3, . . . , n − 1 can be written as the matrix equation

\[
U_{n+1} = \tfrac14\left(M_{n+1} U_{n+1} + U_{n+1} M_{n+1}\right)
\]

where we consider only those elements of $U_{n+1}$ with i = 1, 2, 3, . . . , n − 1 and j = 1, 2, 3, . . . , n − 1.

T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving for the equilibrium temperature in the square region

\[
R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}
\]

given the boundary conditions

\[
u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R
\]

1. Choose a value for n, and then choose an initial guess, say

\[
U_{n+1}^{(0)} = \begin{bmatrix}
0 & T_L & \cdots & T_L & 0 \\
T_B & 0 & \cdots & 0 & T_T \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
T_B & 0 & \cdots & 0 & T_T \\
0 & T_R & \cdots & T_R & 0
\end{bmatrix}
\]

2. For each value of k = 0, 1, 2, 3, . . . , compute $U_{n+1}^{(k+1)}$ using

\[
U_{n+1}^{(k+1)} = \tfrac14\left(M_{n+1} U_{n+1}^{(k)} + U_{n+1}^{(k)} M_{n+1}\right)
\]

where $M_{n+1}$ is as defined in Exercise T1. Then adjust $U_{n+1}^{(k+1)}$ by replacing all edge entries by the initial edge entries in $U_{n+1}^{(0)}$. [Note: The edge entries of a matrix are the entries in the first and last columns and first and last rows.]

3. Continue this process until $U_{n+1}^{(k+1)} - U_{n+1}^{(k)}$ is approximately the zero matrix. This suggests that

\[
U_{n+1} = \lim_{k \to \infty} U_{n+1}^{(k)}
\]

Use a computer and this algorithm to solve for u(x, y) given that

\[
u(x, 0) = 0, \quad u(x, 1) = 0, \quad u(0, y) = 0, \quad u(1, y) = 2
\]

Choose n = 6 and compute up to $U_{n+1}^{(30)}$. The exact solution can be expressed as

\[
u(x, y) = \frac{8}{\pi} \sum_{m=1}^{\infty} \frac{\sinh[(2m-1)\pi x]\,\sin[(2m-1)\pi y]}{(2m-1)\sinh[(2m-1)\pi]}
\]

Use a computer to compute u(i/6, j/6) for i, j = 0, 1, 2, 3, 4, 5, 6, and then compare your results to the values of u(i/6, j/6) in $U_{n+1}^{(30)}$.

T3. Using the exact solution u(x, y) for the temperature distribution described in Exercise T2, use a graphing program to do the following:

(a) Plot the surface z = u(x, y) in three-dimensional xyz-space in which z is the temperature at the point (x, y) in the square region.

(b) Plot several isotherms of the temperature distribution (curves in the xy-plane over which the temperature is a constant).

(c) Plot several curves of the temperature as a function of x with y held constant.

(d) Plot several curves of the temperature as a function of y with x held constant.
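One possible way to set up the algorithm of Exercise T2 is sketched below, assuming NumPy as the technology utility. The function names are hypothetical, and truncating the series after 50 terms is an arbitrary choice made for this illustration; the comparison asked for in the exercise is left to the reader.

```python
import numpy as np

def shift_sum_matrix(size):
    """M = F + F^T, where F has an identity block in its upper right-hand
    corner and a one in its lower left-hand corner (Exercise T1)."""
    F = np.zeros((size, size))
    F[:-1, 1:] = np.eye(size - 1)
    F[-1, 0] = 1.0
    return F + F.T

def initial_guess(n, TB, TT, TL, TR):
    """U^(0): boundary values on the edges, zeros at corners and interior."""
    U = np.zeros((n + 1, n + 1))
    U[0, 1:n] = TL      # u_{0,j}
    U[n, 1:n] = TR      # u_{n,j}
    U[1:n, 0] = TB      # u_{i,0}
    U[1:n, n] = TT      # u_{i,n}
    return U

def iterate(U0, steps):
    """Apply U <- (M U + U M)/4, then restore the edge entries, `steps` times."""
    M = shift_sum_matrix(U0.shape[0])
    U = U0.copy()
    for _ in range(steps):
        V = 0.25 * (M @ U + U @ M)
        V[0, :], V[-1, :] = U0[0, :], U0[-1, :]    # restore first and last rows
        V[:, 0], V[:, -1] = U0[:, 0], U0[:, -1]    # restore first and last columns
        U = V
    return U

def exact(x, y, terms=50):
    """Partial sum of the series solution quoted in Exercise T2."""
    k = 2 * np.arange(1, terms + 1) - 1
    ratio = np.sinh(k * np.pi * x) / np.sinh(k * np.pi)
    return (8 / np.pi) * np.sum(ratio * np.sin(k * np.pi * y) / k)

n = 6
U30 = iterate(initial_guess(n, TB=0, TT=0, TL=0, TR=2), 30)
print(np.round(U30[1:n, 1:n], 4))                  # interior entries of U^(30)
print(exact(0.5, 0.5), U30[3, 3])                  # center of the square
```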

10.11 Computed Tomography

In this section we will see how constructing a cross-sectional view of a human body by analyzing X-ray scans leads to an inconsistent linear system. We present an iteration technique that provides an “approximate solution” of the linear system.

PREREQUISITES: Linear Systems; Natural Logarithms; Euclidean Space $R^n$

The basic problem of computed tomography is to construct an image of a cross section of the human body using data collected from many individual beams of X rays that are passed through the cross section. These data are processed by a computer, and the computed cross section is displayed on a video monitor. Figure 10.11.1 is a diagram of General Electric’s CT system showing a patient prepared to have a cross section of his head scanned by X-ray beams. Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.11.2 shows a typical cross section of a human head produced by the system.

[Figure 10.11.1: diagram of General Electric’s CT system.]

[Figure 10.11.2: cross section of a human head produced by the system. Image: Edward Kinsman/Photo Researchers/Getty Images]

The first commercial system of computed tomography for medical use was developed in 1971 by G. N. Hounsfield of EMI, Ltd., in England. In 1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the construction of a cross section, or tomograph, requires the solution of a large linear system of equations. Certain algorithms, called algebraic reconstruction techniques (ARTs), can be used to solve these linear systems, whose solutions yield the cross sections in digital form.

Scanning Modes

Unlike conventional X-ray pictures that are formed by X rays that are projected perpendicular to the plane of the picture, tomographs are constructed from thousands of individual, hairline-thin X-ray beams that lie in the plane of the cross section. After they pass through the cross section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are processed. Figures 10.11.3 and 10.11.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode.

In the parallel mode a single X-ray source and X-ray detector pair are translated across the field of view containing the cross section, and many measurements of the parallel beams are recorded. Then the source and detector pair are rotated through a small angle, and another set of measurements is taken. This is repeated until the desired number of beam measurements is completed. For example, in the original 1971 machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam measurements. Each such scan took approximately 5½ minutes.

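[Figure 10.11.3 Parallel mode: an X-ray source and a single X-ray detector on opposite sides of the patient, translated across the field of view and then rotated.]

[Figure 10.11.4 Fan-beam mode: an X-ray source and an X-ray detector array on opposite sides of the patient, rotated about the patient.]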

In the fan-beam mode of scanning, a single X-ray tube generates a fan of collimated beams whose intensities are measured simultaneously by an array of detectors on the other side of the field of view. The X-ray tube and detector array are rotated through many angles, and a set of measurements is taken at each angle until the scan is completed. In the General Electric CT system, which uses the fan-beam mode, each scan takes 1 second.

Derivation of Equations

To see how the cross section is reconstructed from the many individual beam measurements, refer to Figure 10.11.5. Here the field of view in which the cross section is situated has been divided into many square pixels (picture elements) numbered 1 through N as indicated. It is our desire to determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the method we will describe, they are reproduced on a video monitor, with each pixel shaded a level of gray proportional to its X-ray density. Because different tissues within the human body have different X-ray densities, the video display clearly distinguishes the various tissues and organs within the cross section.

[Figure 10.11.5: the field of view divided into pixels numbered from the 1st to the Nth, with the ith beam passing from the X-ray source through the jth pixel to the X-ray detector.]

Figure 10.11.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively, the X-ray density of the jth pixel is denoted by $x_j$ and is defined by

\[
x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right)
\]

where “ln” denotes the natural logarithmic function. Using the logarithm property ln(a/b) = − ln(b/a), we also have

\[
x_j = -\ln\left(\begin{array}{c}\text{fraction of photons that pass through}\\ \text{the } j\text{th pixel without being absorbed}\end{array}\right)
\]

If the X-ray beam passes through an entire row of pixels (Figure 10.11.7), then the number of photons leaving one pixel is equal to the number of photons entering the next pixel in the row. If the pixels are numbered 1, 2, . . . , n, then the additive property of the logarithmic function gives

\[
\begin{aligned}
x_1 + x_2 + \cdots + x_n
&= \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) \\
&= -\ln\left(\begin{array}{c}\text{fraction of photons that pass}\\ \text{through the row of } n \text{ pixels}\\ \text{without being absorbed}\end{array}\right)
\end{aligned}
\tag{1}
\]

Thus, to determine the total X-ray density of a row of pixels, we simply sum the individual pixel densities.
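The additive property in Equation (1) can be checked on made-up numbers. In the sketch below the photon counts are hypothetical values chosen only for illustration; `counts[0]` is the number of photons entering the first pixel and `counts[j]` is the number leaving the jth pixel of a row of three.

```python
import math

# Hypothetical photon counts: entering pixel 1, then leaving pixels 1, 2, 3.
counts = [1000.0, 800.0, 720.0, 360.0]

# x_j = ln(photons entering pixel j / photons leaving pixel j)
densities = [math.log(counts[j] / counts[j + 1]) for j in range(len(counts) - 1)]

total_from_sum = sum(densities)                        # x_1 + x_2 + x_3
total_from_ends = math.log(counts[0] / counts[-1])     # ln(entering first / leaving last)

print(densities)
print(total_from_sum, total_from_ends)
```

The intermediate counts cancel in the sum of logarithms, so only the fraction of photons that survive the whole row (here 360/1000) matters.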

[Figure 10.11.6: the jth pixel, with photons entering on one side and photons leaving on the other.]

[Figure 10.11.7: a row of n pixels (first pixel, second pixel, third pixel, . . . , nth pixel), with photons entering the first pixel and leaving the nth pixel.]

Next, consider the X-ray beam in Figure 10.11.5. By the beam density of the ith beam of a scan, denoted by $b_i$, we mean

\[
\begin{aligned}
b_i &= \ln\left(\frac{\begin{array}{c}\text{number of photons of the } i\text{th beam entering the detector}\\ \text{without the cross section in the field of view}\end{array}}{\begin{array}{c}\text{number of photons of the } i\text{th beam entering the detector}\\ \text{with the cross section in the field of view}\end{array}}\right) \\
&= -\ln\left(\begin{array}{c}\text{fraction of photons of the } i\text{th beam that}\\ \text{pass through the cross section without}\\ \text{being absorbed}\end{array}\right)
\end{aligned}
\tag{2}
\]

The numerator in the first expression for bi is obtained by performing a calibration scan without the cross section in the field of view. The resulting detector measurements are stored within the computer’s memory. Then a clinical scan is performed with the cross section in the field of view, the bi’s of all the beams constituting the scan are computed, and the values are stored for further processing. For each beam that passes squarely through a row of pixels, we must have









\[
\left(\begin{array}{c}\text{fraction of photons of the}\\ \text{beam that pass through the}\\ \text{row of pixels without being}\\ \text{absorbed}\end{array}\right)
=
\left(\begin{array}{c}\text{fraction of photons of the}\\ \text{beam that pass through the}\\ \text{cross section without being}\\ \text{absorbed}\end{array}\right)
\]

Thus, if the ith beam passes squarely through a row of n pixels, then it follows from Equations (1) and (2) that

\[
x_1 + x_2 + \cdots + x_n = b_i
\]

In this equation, $b_i$ is known from the clinical and calibration measurements, and $x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be determined. More generally, if the ith beam passes squarely through a row (or column) of pixels with numbers $j_1, j_2, \ldots, j_i$, then we have

\[
x_{j_1} + x_{j_2} + \cdots + x_{j_i} = b_i
\]

If we set

\[
a_{ij} = \begin{cases} 1, & \text{if } j = j_1, j_2, \ldots, j_i \\ 0, & \text{otherwise} \end{cases}
\]

then we can write this equation as

\[
a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3}
\]

We will refer to Equation (3) as the i th beam equation. Referring to Figure 10.11.5, however, we see that the beams of a scan do not necessarily pass through a row or column of pixels squarely. Instead, a typical beam passes diagonally through each pixel in its path. There are many ways to take this into account. In Figure 10.11.8 we outline three methods of defining the quantities aij that appear in Equation (3), each of which reduces to our previous definition when the beam passes squarely through a row or column of pixels. Reading down the figure, each method is more exact than its predecessor, but with successively more computational difficulty.
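For the idealized case of a beam that crosses a row or column of pixels squarely, the two ingredients of a beam equation can be sketched in a few lines of Python. This is only an illustration: the function names `beam_density` and `square_beam_row` and the photon counts are hypothetical and do not come from the text.

```python
import math

def beam_density(photons_without, photons_with):
    """Equation (2): b_i = ln(count without cross section / count with it)."""
    return math.log(photons_without / photons_with)

def square_beam_row(pixels_hit, n_pixels):
    """Coefficients a_ij of Equation (3) for a beam that passes squarely
    through the pixels listed in pixels_hit (1-based pixel numbers)."""
    hit = set(pixels_hit)
    return [1 if j in hit else 0 for j in range(1, n_pixels + 1)]

# Hypothetical counts: 10000 photons in the calibration scan,
# 2500 photons in the clinical scan.
b = beam_density(10000, 2500)

# A beam passing squarely through pixels 4, 5, 6 of a 9-pixel field of view.
row = square_beam_row([4, 5, 6], 9)
print(b, row)
```

With these made-up counts one quarter of the photons survive, so the beam density is ln 4, and the coefficient row has ones exactly in positions 4, 5, and 6.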


Center-of-Pixel Method

$$a_{ij} = \begin{cases} 1 & \text{if the } i\text{th beam passes through the center of the } j\text{th pixel} \\ 0 & \text{otherwise} \end{cases}$$

Center Line Method

$$a_{ij} = \frac{\text{length of the center line of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{width of the } j\text{th pixel}}$$

Area Method

$$a_{ij} = \frac{\text{area of the } i\text{th beam that lies in the } j\text{th pixel}}{\text{area of the } i\text{th beam that would lie in the } j\text{th pixel if the } i\text{th beam were to cross the pixel squarely}}$$

Figure 10.11.8

Using any one of the three methods to define the aij’s in the i th beam equation, we can write the set of M beam equations in a complete scan as

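$$\begin{array}{ccccccccc} a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1N}x_N &=& b_1 \\ a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2N}x_N &=& b_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{M1}x_1 &+& a_{M2}x_2 &+& \cdots &+& a_{MN}x_N &=& b_M \end{array} \qquad (4)$$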

In this way we have a linear system of M equations (the M beam equations) in N unknowns (the N pixel densities). Depending on the number of beams and pixels used, we can have M > N, M = N, or M < N. We will consider only the case M > N, the so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and experimental errors in the problem, we should not expect our linear system to have an exact mathematical solution for the pixel densities. In the next section we attempt to find an “approximate” solution to this linear system.

Algebraic Reconstruction Techniques

There have been many mathematical algorithms devised to treat the overdetermined linear system (4). The one we will describe belongs to the class of so-called Algebraic Reconstruction Techniques (ARTs). This method, which can be traced to an iterative technique originally introduced by S. Kaczmarz in 1937, was the one used in the first commercial machine. To introduce this technique, consider the following system of three equations in two unknowns:

$$\begin{aligned} L_1:&\quad x_1 + x_2 = 2 \\ L_2:&\quad x_1 - 2x_2 = -2 \\ L_3:&\quad 3x_1 - x_2 = 3 \end{aligned} \qquad (5)$$

[Figure 10.11.9: (a) the three lines L1, L2, L3 in the x1x2-plane; (b) successive orthogonal projections starting from x0; (c) the limit cycle x1*, x2*, x3*.]

The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane. As shown in Figure 10.11.9a, the three lines do not have a common intersection, and so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on the shaded triangle formed by the three lines are all situated “near” these three lines and can be thought of as constituting “approximate” solutions to our system. The following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure 10.11.9b):

Algorithm 1

Step 0. Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1. Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The superscript (1) indicates that this is the first of several cycles through the steps.

Step 2. Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3. Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4. Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In the second cycle, label the projected points $\mathbf{x}_1^{(2)}, \mathbf{x}_2^{(2)}, \mathbf{x}_3^{(2)}$; in the third cycle, label the projected points $\mathbf{x}_1^{(3)}, \mathbf{x}_2^{(3)}, \mathbf{x}_3^{(3)}$; and so forth.

This algorithm generates three sequences of points

$$\begin{aligned} L_1:&\quad \mathbf{x}_1^{(1)}, \mathbf{x}_1^{(2)}, \mathbf{x}_1^{(3)}, \ldots \\ L_2:&\quad \mathbf{x}_2^{(1)}, \mathbf{x}_2^{(2)}, \mathbf{x}_2^{(3)}, \ldots \\ L_3:&\quad \mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \mathbf{x}_3^{(3)}, \ldots \end{aligned}$$

that lie on the three lines L1 , L2 , and L3 , respectively. It can be shown that as long as the three lines are not all parallel, then the first sequence converges to a point x∗1 on L1 , the second sequence converges to a point x∗2 on L2 , and the third sequence converges to a point x∗3 on L3 (Figure 10.11.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent of the starting point x0 . Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in x1 x2 -space is

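$$a_1x_1 + a_2x_2 = b$$

we can express it in vector form as

$$\mathbf{a}^T\mathbf{x} = b$$

where

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$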

The following theorem gives the necessary projection formula (Exercise 5).

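THEOREM 10.11.1 Orthogonal Projection Formula

Let $L$ be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure 10.11.10). Then the orthogonal projection, $\mathbf{x}_p$, of $\mathbf{x}^*$ onto $L$ is given by

$$\mathbf{x}_p = \mathbf{x}^* + \frac{(b - \mathbf{a}^T\mathbf{x}^*)}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}$$

[Figure 10.11.10: the point x*, the line L, and the projection xp of x* onto L.]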
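The projection formula of Theorem 10.11.1 translates directly into code. The following is a minimal sketch in plain Python; the function name `project_onto_line` is an illustrative choice, and the sample point and line are taken from Example 1 (the starting point (1, 3) and the line x1 + x2 = 2).

```python
def project_onto_line(a, b, x):
    """Orthogonal projection of the point x onto the line a^T x = b,
    computed as x + ((b - a^T x) / (a^T a)) a."""
    dot_ax = sum(ai * xi for ai, xi in zip(a, x))
    dot_aa = sum(ai * ai for ai in a)
    t = (b - dot_ax) / dot_aa
    return [xi + t * ai for ai, xi in zip(a, x)]

# Project (1, 3) onto the line x1 + x2 = 2.
xp = project_onto_line([1.0, 1.0], 2.0, [1.0, 3.0])
print(xp)
```

The result lies on the line, and the displacement from the original point is a multiple of a, which are the two properties that Exercise 5 asks you to verify in general.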

EXAMPLE 1 Using Algorithm 1

We can use Algorithm 1 to find an approximate solution of the linear system given in (5) and illustrated in Figure 10.11.9. If we write the equations of the three lines as

$$L_1:\ \mathbf{a}_1^T\mathbf{x} = b_1, \qquad L_2:\ \mathbf{a}_2^T\mathbf{x} = b_2, \qquad L_3:\ \mathbf{a}_3^T\mathbf{x} = b_3$$

where

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}, \quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix}, \quad b_1 = 2, \quad b_2 = -2, \quad b_3 = 3$$

then, using Theorem 10.11.1, we can express the iteration scheme in Algorithm 1 as

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

where p = 1 for the first cycle of iterates, p = 2 for the second cycle of iterates, and so forth. After each cycle of iterates (i.e., after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.

Table 1 gives the numerical results of six cycles of iterations starting with the initial point $\mathbf{x}_0 = (1, 3)$. Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle in this example to be

$$\begin{aligned} \mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots,\ .90909\ldots) \\ \mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots,\ 1.41818\ldots) \\ \mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots,\ 1.22727\ldots) \end{aligned}$$

Table 1

              x1         x2
x_0         1.00000    3.00000

x_1^(1)      .00000    2.00000
x_2^(1)      .40000    1.20000
x_3^(1)     1.30000     .90000

x_1^(2)     1.20000     .80000
x_2^(2)      .88000    1.44000
x_3^(2)     1.42000    1.26000

x_1^(3)     1.08000     .92000
x_2^(3)      .83200    1.41600
x_3^(3)     1.40800    1.22400

x_1^(4)     1.09200     .90800
x_2^(4)      .83680    1.41840
x_3^(4)     1.40920    1.22760

x_1^(5)     1.09080     .90920
x_2^(5)      .83632    1.41816
x_3^(5)     1.40908    1.22724

x_1^(6)     1.09092     .90908
x_2^(6)      .83637    1.41818
x_3^(6)     1.40909    1.22728

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates $\mathbf{x}_1^{(6)}, \mathbf{x}_2^{(6)}, \mathbf{x}_3^{(6)}$ can be used as an approximate solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and $\mathbf{x}_3^{(6)}$ are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)
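The six cycles of Table 1 can be regenerated with a short script. This is a sketch rather than a prescribed implementation: the names `project`, `LINES`, and `run_cycles` are illustrative, while the line data and the starting point (1, 3) are those of Example 1.

```python
def project(a, b, x):
    """Orthogonal projection of x onto the line a^T x = b (Theorem 10.11.1)."""
    t = (b - (a[0] * x[0] + a[1] * x[1])) / (a[0] ** 2 + a[1] ** 2)
    return (x[0] + t * a[0], x[1] + t * a[1])

# The three lines of system (5), stored as (a_k, b_k).
LINES = [((1.0, 1.0), 2.0), ((1.0, -2.0), -2.0), ((3.0, -1.0), 3.0)]

def run_cycles(x0, cycles):
    """Return history with history[p-1][k-1] = x_k^(p)."""
    history = []
    x = x0
    for _ in range(cycles):
        cycle = []
        for a, b in LINES:
            x = project(a, b, x)   # Steps 1-3 of Algorithm 1
            cycle.append(x)
        history.append(cycle)      # Step 4: x_3^(p) seeds the next cycle
    return history

history = run_cycles((1.0, 3.0), 6)
for p, cycle in enumerate(history, start=1):
    for k, (u, v) in enumerate(cycle, start=1):
        print(f"x_{k}^({p}) = ({u:.5f}, {v:.5f})")
```

Changing the starting point passed to `run_cycles` is an easy way to observe that the later cycles settle onto the same three limit points.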


To generalize Algorithm 1 so that it applies to an overdetermined system of M equations in N unknowns,

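$$\begin{array}{ccccccccc} a_{11}x_1 &+& a_{12}x_2 &+& \cdots &+& a_{1N}x_N &=& b_1 \\ a_{21}x_1 &+& a_{22}x_2 &+& \cdots &+& a_{2N}x_N &=& b_2 \\ \vdots & & \vdots & & & & \vdots & & \vdots \\ a_{M1}x_1 &+& a_{M2}x_2 &+& \cdots &+& a_{MN}x_N &=& b_M \end{array} \qquad (6)$$

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$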

With these vectors, the M equations constituting our linear system (6) can be written in vector form as

$$\mathbf{a}_i^T\mathbf{x} = b_i, \qquad i = 1, 2, \ldots, M$$

Each of these M equations defines what is called a hyperplane in the N-dimensional Euclidean space $R^N$. In general these M hyperplanes have no common intersection, and so we seek instead some point in $R^N$ that is reasonably “close” to all of them. Such a point will constitute an approximate solution of the linear system, and its N entries will determine approximate pixel densities with which to form the desired cross section.

As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the M hyperplanes beginning with some arbitrary initial point in $R^N$. Our notation for these successive iterates is

$$\mathbf{x}_k^{(p)} = \left(\text{the iterate lying on the } k\text{th hyperplane generated during the } p\text{th cycle of iterations}\right)$$

The algorithm is as follows:

Algorithm 2

Step 0. Choose any point in $R^N$ and label it $\mathbf{x}_0$.

Step 1. For the first cycle of iterates, set p = 1.

Step 2. For k = 1, 2, . . . , M, compute

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k$$

Step 3. Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4. Increase the cycle number p by 1 and return to Step 2.

In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane $\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first hyperplane after each projection onto the last hyperplane.

It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ lying on the Mth hyperplane will converge to a point $\mathbf{x}_M^*$ on that hyperplane which does not depend on the choice of the initial point $\mathbf{x}_0$. In computed tomography, one of the iterates $\mathbf{x}_M^{(p)}$ for p sufficiently large is taken as an approximate solution of the linear system for the pixel densities.


Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the equation in Step 2 of the algorithm is simply the number of pixels in which the kth beam passes through the center. Similarly, note that the scalar quantity

$$b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}$$

in that same equation can be interpreted as the excess kth beam density that results if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This provides the following interpretation of our ART iteration scheme for the center-of-pixel method:

Generate the pixel densities of each iterate by distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center. When the last beam in the scan has been reached, return to the first beam and continue.

EXAMPLE 2 Using Algorithm 2

We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the 3 × 3 array illustrated in Figure 10.11.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the beam equations using the center line and area methods.) As you can verify, the beam equations are

x7 + x8 + x9 = 13.00        x3 + x6 + x9 = 18.00
x4 + x5 + x6 = 15.00        x2 + x5 + x8 = 12.00
x1 + x2 + x3 =  8.00        x1 + x4 + x7 =  6.00
x6 + x8 + x9 = 14.79        x2 + x3 + x6 = 10.51
x3 + x5 + x7 = 14.31        x1 + x5 + x9 = 16.13
x1 + x2 + x4 =  3.81        x4 + x7 + x8 =  7.04

Table 2 illustrates the results of the iteration scheme starting with an initial iterate $\mathbf{x}_0 = \mathbf{0}$. The table gives the values of each of the first cycle of iterates, $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of p. The iterates $\mathbf{x}_{12}^{(p)}$ start repeating to two decimal places for p ≥ 45, and so we take the entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.
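Algorithm 2 applied to this example can be sketched as follows. The function name `art` and the variable names are illustrative. The beam order used here (beams 1 through 12) is read from the labels b1, ..., b12 in Figure 10.11.11, which is an assumption about how the twelve equations above are numbered; it is consistent with the first cycle of iterates in Table 2.

```python
def art(rows, b, x0, cycles):
    """Algorithm 2: cyclic orthogonal projections onto the hyperplanes
    a_k^T x = b_k. Returns the iterate x_M^(p) after `cycles` cycles."""
    x = list(x0)
    for _ in range(cycles):
        for a, bk in zip(rows, b):
            excess = bk - sum(aj * xj for aj, xj in zip(a, x))
            norm2 = sum(aj * aj for aj in a)   # pixels hit, for 0/1 rows
            x = [xj + (excess / norm2) * aj for aj, xj in zip(a, x)]
    return x

# Pixels whose centers each beam crosses (center-of-pixel method),
# in the assumed beam order 1..12, with the measured beam densities.
beams = [(7, 8, 9), (4, 5, 6), (1, 2, 3), (6, 8, 9), (3, 5, 7), (1, 2, 4),
         (3, 6, 9), (2, 5, 8), (1, 4, 7), (2, 3, 6), (1, 5, 9), (4, 7, 8)]
b = [13.00, 15.00, 8.00, 14.79, 14.31, 3.81,
     18.00, 12.00, 6.00, 10.51, 16.13, 7.04]
rows = [[1.0 if j in beam else 0.0 for j in range(1, 10)] for beam in beams]

first_cycle = art(rows, b, [0.0] * 9, 1)   # x_12^(1)
x45 = art(rows, b, [0.0] * 9, 45)          # x_12^(45)
print([round(v, 2) for v in first_cycle])
print([round(v, 2) for v in x45])
```

Because every row here contains only zeros and ones, each update adds the same amount to every pixel the beam crosses, which is exactly the "distribute the excess evenly" interpretation given above.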

[Figure 10.11.11: the 3 × 3 array of pixels numbered 1 through 9, scanned by 12 beams with measured beam densities b1 = 13.00, b2 = 15.00, b3 = 8.00, b4 = 14.79, b5 = 14.31, b6 = 3.81, b7 = 18.00, b8 = 12.00, b9 = 6.00, b10 = 10.51, b11 = 16.13, b12 = 7.04.]

We close this section by noting that the field of computed tomography is presently a very active research area. In fact, the ART scheme discussed here has been replaced in commercial systems by more sophisticated techniques that are faster and provide a more accurate view of the cross section. However, all the new techniques address the same basic mathematical problem: finding a good approximate solution of a large overdetermined inconsistent linear system of equations.

Table 2

                                 Pixel Densities
             x1     x2     x3     x4     x5     x6     x7     x8     x9

First Cycle of Iterates
x_0          .00    .00    .00    .00    .00    .00    .00    .00    .00
x_1^(1)      .00    .00    .00    .00    .00    .00   4.33   4.33   4.33
x_2^(1)      .00    .00    .00   5.00   5.00   5.00   4.33   4.33   4.33
x_3^(1)     2.67   2.67   2.67   5.00   5.00   5.00   4.33   4.33   4.33
x_4^(1)     2.67   2.67   2.67   5.00   5.00   5.37   4.33   4.71   4.71
x_5^(1)     2.67   2.67   3.44   5.00   5.77   5.37   5.10   4.71   4.71
x_6^(1)      .49    .49   3.44   2.83   5.77   5.37   5.10   4.71   4.71
x_7^(1)      .49    .49   4.93   2.83   5.77   6.87   5.10   4.71   6.20
x_8^(1)      .49    .84   4.93   2.83   6.11   6.87   5.10   5.05   6.20
x_9^(1)     -.31    .84   4.93   2.02   6.11   6.87   4.30   5.05   6.20
x_10^(1)    -.31    .13   4.22   2.02   6.11   6.16   4.30   5.05   6.20
x_11^(1)    1.06    .13   4.22   2.02   7.49   6.16   4.30   5.05   7.58
x_12^(1)    1.06    .13   4.22    .58   7.49   6.16   2.85   3.61   7.58

x_12^(2)    2.03    .69   4.42   1.34   7.49   5.39   2.65   3.04   6.61
x_12^(3)    1.78    .51   4.52   1.26   7.49   5.48   2.56   3.22   6.86
x_12^(4)    1.82    .52   4.62   1.37   7.49   5.37   2.45   3.22   6.82
x_12^(5)    1.79    .49   4.71   1.43   7.49   5.31   2.37   3.25   6.85
x_12^(10)   1.68    .44   5.03   1.70   7.49   5.03   2.04   3.29   6.96
x_12^(20)   1.49    .48   5.29   2.00   7.49   4.73   1.79   3.25   7.15
x_12^(30)   1.38    .55   5.34   2.11   7.49   4.62   1.74   3.19   7.26
x_12^(40)   1.33    .59   5.33   2.14   7.49   4.59   1.75   3.15   7.31
x_12^(45)   1.32    .60   5.32   2.15   7.49   4.59   1.76   3.14   7.32

Exercise Set 10.11

1. (a) Setting $\mathbf{x}_k^{(p)} = (x_{k1}^{(p)}, x_{k2}^{(p)})$, show that the three projection equations

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{(b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)})}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

for the three lines in Equation (5) can be written as

$$\begin{aligned} k = 1:&\quad x_{11}^{(p)} = \tfrac{1}{2}\,[2 + x_{01}^{(p)} - x_{02}^{(p)}], \qquad x_{12}^{(p)} = \tfrac{1}{2}\,[2 - x_{01}^{(p)} + x_{02}^{(p)}] \\ k = 2:&\quad x_{21}^{(p)} = \tfrac{1}{5}\,[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}], \qquad x_{22}^{(p)} = \tfrac{1}{5}\,[4 + 2x_{11}^{(p)} + x_{12}^{(p)}] \\ k = 3:&\quad x_{31}^{(p)} = \tfrac{1}{10}\,[9 + x_{21}^{(p)} + 3x_{22}^{(p)}], \qquad x_{32}^{(p)} = \tfrac{1}{10}\,[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}] \end{aligned}$$

where $(x_{01}^{(p+1)}, x_{02}^{(p+1)}) = (x_{31}^{(p)}, x_{32}^{(p)})$ for p = 1, 2, . . . .

(b) Show that the three pairs of equations in part (a) can be combined to produce

$$\begin{aligned} x_{31}^{(p)} &= \tfrac{1}{20}\,[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}] \\ x_{32}^{(p)} &= \tfrac{1}{20}\,[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}] \end{aligned} \qquad p = 1, 2, \ldots$$

where $(x_{31}^{(0)}, x_{32}^{(0)}) = (x_{01}^{(1)}, x_{02}^{(1)}) = \mathbf{x}_0$. [Note: Using this pair of equations, we can perform one complete cycle of three orthogonal projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as p → ∞, the equations in part (b) become

$$\begin{aligned} x_{31}^* &= \tfrac{1}{20}\,[28 + x_{31}^* - x_{32}^*] \\ x_{32}^* &= \tfrac{1}{20}\,[24 + 3x_{31}^* - 3x_{32}^*] \end{aligned}$$

as p → ∞. Solve this linear system for $\mathbf{x}_3^* = (x_{31}^*, x_{32}^*)$. [Note: The simplifications of the ART formulas described in this exercise are impractical for the large linear systems that arise in realistic computed tomography problems.]

2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \ldots, \mathbf{x}_3^{(6)}$ to five decimal places in Example 1 using the following initial points:
(a) $\mathbf{x}_0 = (0, 0)$
(b) $\mathbf{x}_0 = (1, 1)$
(c) $\mathbf{x}_0 = (148, -15)$

3. (a) Show directly that the points of the limit cycle in Example 1,

$$\mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \quad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \quad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and $L_3$ and whose sides are perpendicular to these lines (Figure 10.11.9c).

(b) Using the equations derived in Exercise 1(a), show that if $\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

$$\begin{aligned} \mathbf{x}_1^{(1)} &= \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right) \\ \mathbf{x}_2^{(1)} &= \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right) \\ \mathbf{x}_3^{(1)} &= \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right) \end{aligned}$$

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the limit cycle indefinitely.]

4. The following three lines in the $x_1x_2$-plane,

$$\begin{aligned} L_1:&\quad x_2 = 1 \\ L_2:&\quad x_1 - x_2 = 2 \\ L_3:&\quad x_1 - x_2 = 0 \end{aligned}$$

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal projections described in Algorithm 1, beginning with the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch, determine the three points of the limit cycle.

5. Prove Theorem 10.11.1 by verifying that
(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).
(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).

6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ defined in Algorithm 2 will converge to a unique limit point $\mathbf{x}_M^*$ if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and if the center-of-pixel method is used, then the center of each of the N pixels in the field of view is crossed by at least one of the M beams in the scan.

7. Construct the 12 beam equations in Example 2 using the center line method. Assume that the distance between the center lines of adjacent beams is equal to the width of a single pixel.

8. Construct the 12 beam equations in Example 2 using the area method. Assume that the width of each beam is equal to the width of a single pixel and that the distance between the center lines of adjacent beams is also equal to the width of a single pixel.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, . . . , n (with n > 2), let us consider the following algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations

$$a_ix + b_iy = c_i \quad\text{and}\quad a_jx + b_jy = c_j$$

for i, j = 1, 2, 3, . . . , n and i < j for their unique solutions. This leads to

$$\tfrac{1}{2}\,n(n-1)$$

solutions, which we label as

$$(x_{ij}, y_{ij})$$

for i, j = 1, 2, 3, . . . , n and i < j.

2. Construct the geometric center of these points defined by

$$(x_C, y_C) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\ \ \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right)$$

and use this as the approximate solution to the original system.

Use this algorithm to approximate the solution to the system

$$\begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned}$$

and compare your results to those in this section.

T2. (Calculus required) Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, . . . , n (with n > 2), let us consider the following least squares algorithm for obtaining an approximate solution $(x^*, y^*)$ to the system. Given a point $(\alpha, \beta)$ and the line $a_ix + b_iy = c_i$, the distance from this point to the line is given by

$$\frac{|a_i\alpha + b_i\beta - c_i|}{\sqrt{a_i^2 + b_i^2}}$$

If we define a function $f(x, y)$ by

$$f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2}$$

and then determine the point $(x^*, y^*)$ that minimizes this function, we will determine the point that is closest to each of these lines in a summed least squares sense. Show that $x^*$ and $y^*$ are solutions to the system

$$\left(\sum_{i=1}^{n} \frac{a_i^2}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{a_ic_i}{a_i^2 + b_i^2}$$

and

$$\left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{b_i^2}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{b_ic_i}{a_i^2 + b_i^2}$$

Apply this algorithm to the system

$$\begin{aligned} x + y &= 2 \\ x - 2y &= -2 \\ 3x - y &= 3 \end{aligned}$$

and compare your results to those in this section.

10.12 Fractals

In this section we will use certain classes of linear transformations to describe and generate intricate sets in the Euclidean plane. These sets, called fractals, are currently the focus of much mathematical and scientific research.

PREREQUISITES:
Geometry of Linear Operators on $R^2$ (Section 4.11)
Euclidean Space $R^n$
Natural Logarithms
Intuitive Understanding of Limits

Fractals in the Euclidean Plane

At the end of the nineteenth century and the beginning of the twentieth century, various bizarre and wild sets of points in the Euclidean plane began appearing in mathematics. Although they were initially mathematical curiosities, these sets, called fractals, are rapidly growing in importance. It is now recognized that they reveal a regularity in physical and biological phenomena previously dismissed as “random,” “noisy,” or “chaotic.” For example, fractals are all around us in the shapes of clouds, mountains, coastlines, trees, and ferns. In this section we give a brief description of certain types of fractals in the Euclidean plane R 2 . Much of this description is an outgrowth of the work of two mathematicians, Benoit B. Mandelbrot and Michael Barnsley, who are both active researchers in the field.

Self-Similar Sets

To begin our study of fractals, we need to introduce some terminology about sets in R 2 . We will call a set in R 2 bounded if it can be enclosed by a suitably large circle (Figure 10.12.1) and closed if it contains all of its boundary points (Figure 10.12.2). Two sets in R 2 will be called congruent if they can be made to coincide exactly by translating and rotating them appropriately within R 2 (Figure 10.12.3). We will also rely on your intuitive concept of overlapping and nonoverlapping sets, as illustrated in Figure 10.12.4. If T : R 2 → R 2 is the linear operator that scales by a factor of s (see Table 7 of Section 4.9), and if Q is a set in R 2 , then the set T (Q) (the set of images of points in Q under T ) is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1 (Figure 10.12.5). In either case we say that T (Q) is the set Q scaled by the factor s .

[Figure 10.12.1: (a) a bounded set, enclosed by a circle; (b) an unbounded set, which cannot be enclosed by any circle.]

[Figure 10.12.2: a closed set. The boundary points (solid color) lie in the set.]

[Figure 10.12.3: congruent sets.]

[Figure 10.12.4: (a) overlapping sets; (b) nonoverlapping sets.]

[Figure 10.12.5: a contraction of Q. The operator $T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$ maps Q onto T(Q).]

The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in $R^2$ as follows:

DEFINITION 1 A closed and bounded subset of the Euclidean plane $R^2$ is said to be self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \qquad (1)$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to S scaled by the same factor s (0 < s < 1).

If S is a self-similar set, then (1) is sometimes called a decomposition of S into nonoverlapping congruent sets.

EXAMPLE 1 Line Segment

A line segment in $R^2$ (Figure 10.12.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.12.6b). In Figure 10.12.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to the original line segment scaled by a factor of 1/2. Hence, a line segment is a self-similar set with k = 2 and s = 1/2.

[Figure 10.12.6, panels (a) and (b)]


EXAMPLE 2 Square

A square (Figure 10.12.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.12.7b), where we have again separated the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of 1/2. Hence, a square is a self-similar set with k = 4 and s = 1/2.

[Figure 10.12.7, panels (a) and (b)]

EXAMPLE 3 Sierpinski Carpet

The set suggested by Figure 10.12.8a, the Sierpinski “carpet,” was first described by the Polish mathematician Waclaw Sierpinski (1882–1969). It can be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.12.8b), each of which is congruent to the original set scaled by a factor of 1/3. Hence, it is a self-similar set with k = 8 and s = 1/3. Note that the intricate square-within-a-square pattern continues forever on a smaller and smaller scale (although this can only be suggested in a figure such as the one shown).

[Figure 10.12.8, panels (a) and (b)]

EXAMPLE 4 Sierpinski Triangle

Figure 10.12.9a illustrates another set described by Sierpinski. It is a self-similar set with k = 3 and s = 1/2 (Figure 10.12.9b). As with the Sierpinski carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.

[Figure 10.12.9, panels (a) and (b)]

The Sierpinski carpet and triangle have a more intricate structure than the line segment and the square in that they exhibit a pattern that is repeated indefinitely. This difference will be explored later in this section.

Topological Dimension of a Set

In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our intuitive sense of dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin are one-dimensional, and $R^2$ itself is two-dimensional. This definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in $R^n$ that are not necessarily subspaces. A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state informally that

• a point in $R^2$ has topological dimension zero;
• a curve in $R^2$ has topological dimension one;
• a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between 0 and n, inclusive. In this text we will denote the topological dimension of a set S by $d_T(S)$.

EXAMPLE 5 Topological Dimensions of Sets

Table 1 gives the topological dimensions of the sets studied in our earlier examples. The first two results in this table are intuitively obvious; however, the last two are not. Informally stated, the Sierpinski carpet and triangle both contain so many “holes” that those sets resemble web-like networks of lines rather than regions. Hence they have topological dimension one. The proofs are quite difficult.

Table 1

Set S                   d_T(S)
Line segment              1
Square                    2
Sierpinski carpet         1
Sierpinski triangle       1

Hausdorff Dimension of a Self-Similar Set

In 1919 the German mathematician Felix Hausdorff (1868–1942) gave an alternative definition for the dimension of an arbitrary set in $R^n$. His definition is quite complicated, but for a self-similar set, it reduces to something rather simple:

DEFINITION 2 The Hausdorff dimension of a self-similar set S of form (1) is denoted by $d_H(S)$ and is defined by

$$d_H(S) = \frac{\ln k}{\ln(1/s)} \qquad (2)$$

In this definition, “ln” denotes the natural logarithm function. Equation (2) can also be expressed as

$$s^{d_H(S)} = \frac{1}{k} \qquad (3)$$

in which the Hausdorff dimension $d_H(S)$ appears as an exponent. Formula (3) is more helpful for interpreting the concept of Hausdorff dimension; it states, for example, that if you scale a self-similar set by a factor of s = 1/2, then its area (or more properly its measure) decreases by a factor of $\left(\tfrac{1}{2}\right)^{d_H(S)}$. Thus, scaling a line segment by a factor of 1/2 reduces its measure (length) by a factor of $\left(\tfrac{1}{2}\right)^1 = \tfrac{1}{2}$, and scaling a square region by a factor of 1/2 reduces its measure (area) by a factor of $\left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$.

Before proceeding to some examples, we should note a few facts about the Hausdorff dimension of a set:

• The topological dimension and Hausdorff dimension of a set need not be the same.
• The Hausdorff dimension of a set need not be an integer.
• The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, $d_T(S) \le d_H(S)$.
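Formula (2) is easy to evaluate numerically. The sketch below uses the values of k and s found in Examples 1 through 4; the function name `hausdorff_dimension` is an illustrative choice.

```python
import math

def hausdorff_dimension(k, s):
    """Formula (2): d_H(S) = ln k / ln(1/s) for a self-similar set made of
    k nonoverlapping copies of itself, each scaled by the factor s."""
    return math.log(k) / math.log(1.0 / s)

# (k, s) pairs from Examples 1-4.
examples = {
    "Line segment": (2, 1 / 2),
    "Square": (4, 1 / 2),
    "Sierpinski carpet": (8, 1 / 3),
    "Sierpinski triangle": (3, 1 / 2),
}
for name, (k, s) in examples.items():
    d = hausdorff_dimension(k, s)
    # Formula (3) as a consistency check: s ** d should equal 1 / k.
    print(f"{name}: d_H = {d:.4f}, s**d_H = {s ** d:.4f}, 1/k = {1 / k:.4f}")
```

The last two columns of the printout agree for every set, which is formula (3) restated numerically.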


EXAMPLE 6 Hausdorff Dimensions of Sets

Table 2 lists the Hausdorff dimensions of the sets studied in our earlier examples.

Table 2

Set S                    s      k     d_H(S) = ln k / ln(1/s)
Line segment            1/2     2     ln 2/ln 2 = 1
Square                  1/2     4     ln 4/ln 2 = 2
Sierpinski carpet       1/3     8     ln 8/ln 3 = 1.892 . . .
Sierpinski triangle     1/2     3     ln 3/ln 2 = 1.584 . . .

Fractals

Comparing Tables 1 and 2, we see that the Hausdorff and topological dimensions are equal for both the line segment and square but are unequal for the Sierpinski carpet and triangle. In 1977 Benoit B. Mandelbrot suggested that sets for which the topological and Hausdorff dimensions differ must be quite complicated (as Hausdorff had earlier suggested in 1919). Mandelbrot proposed calling such sets fractals, and he offered the following definition.

DEFINITION 3 A fractal is a subset of a Euclidean space whose Hausdorff dimension and topological dimension are not equal.

According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not. It follows from the preceding definition that a set whose Hausdorff dimension is not an integer must be a fractal (why?). However, we will see later that the converse is not true; that is, it is possible for a fractal to have an integer Hausdorff dimension.

Similitudes

We will now show how some techniques from linear algebra can be used to generate fractals. This linear algebra approach also leads to algorithms that can be exploited to draw fractals on a computer. We begin with a definition.

DEFINITION 4 A similitude with scale factor s is a mapping of $R^2$ into $R^2$ of the form

$$T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}$$

where s, θ, e, and f are scalars. Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of s, a rotation about the origin through an angle θ, and a translation (e units in the x-direction and f units in the y-direction). Figure 10.12.10 illustrates the effect of a similitude on the unit square U.

For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor s is restricted to the range 0 < s < 1. Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction.
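A similitude as described in Definition 4 can be sketched as a small Python function factory. The name `similitude` and the sample parameters below (s = 1/2, θ = π/2, e = 1, f = 0) are hypothetical choices made only to show the scaling, rotation, and translation acting together on the corners of the unit square.

```python
import math

def similitude(s, theta, e, f):
    """Return the map T(x, y) = s * R(theta) * (x, y) + (e, f),
    where R(theta) is the rotation matrix of Definition 4."""
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T

# Hypothetical parameters: contract by 1/2, rotate 90 degrees, shift right by 1.
T = similitude(0.5, math.pi / 2, 1.0, 0.0)
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [T(p) for p in unit_square]
for p, q in zip(unit_square, image):
    print(p, "->", (round(q[0], 6), round(q[1], 6)))
```

The image is again a square, with side length equal to the scale factor s, which is the sense in which T(U) is congruent to U scaled by s.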

[Figure 10.12.10: (a) the unit square U; (b) the unit square after a similitude, T(U): scaled by s, rotated through the angle θ, and translated to (e, f).]

Similitudes are important in the study of fractals because of the following fact:

If T : R^2 → R^2 is a similitude with scale factor s and if S is a closed and bounded set in R^2, then the image T(S) of the set S under T is congruent to S scaled by s.

Recall from the definition of a self-similar set in R^2 that a closed and bounded set S in R^2 is self-similar if it can be expressed in the form

S = S1 ∪ S2 ∪ S3 ∪ · · · ∪ Sk

where S1, S2, S3, . . . , Sk are nonoverlapping sets each of which is congruent to S scaled by the same factor s (0 < s < 1) [see (1)]. In the following examples, we will find similitudes that produce the sets S1, S2, S3, . . . , Sk from S for the line segment, square, Sierpinski carpet, and Sierpinski triangle.

EXAMPLE 7 Line Segment

We will take as our line segment the line segment S connecting the points (0, 0) and (1, 0) in the xy-plane (Figure 10.12.11a). Consider the two similitudes

\[
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},
\qquad
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix}
\tag{4}
\]

both of which have s = 1/2 and θ = 0. In Figure 10.12.11b we show how these two similitudes map the unit square U. The similitude T1 maps U onto the smaller square T1(U), and the similitude T2 maps U onto the smaller square T2(U). At the same time, T1 maps the line segment S onto the smaller line segment T1(S), and T2 maps S onto the smaller nonoverlapping line segment T2(S). The union of these two smaller nonoverlapping line segments is precisely the original line segment S; that is,

S = T1(S) ∪ T2(S)    (5)

Figure 10.12.11 (a) The unit square U and the line segment S. (b) The images T1(U), T2(U), T1(S), and T2(S).

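EXAMPLE 8 Square

Let us consider the unit square U in the xy-plane (Figure 10.12.12a) and the following four similitudes, all having s = 1/2 and θ = 0:

\[
\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},
&
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix},
\\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix},
&
T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}
\end{aligned}
\tag{6}
\]

The images of the unit square U under these four similitudes are the four squares shown in Figure 10.12.12b. Thus,

U = T1(U) ∪ T2(U) ∪ T3(U) ∪ T4(U)

is a decomposition of U into four nonoverlapping squares that are congruent to U scaled by the same scale factor s = 1/2.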

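Figure 10.12.12 (a) The unit square U. (b) The images T1(U), T2(U), T3(U), and T4(U).

EXAMPLE 9 Sierpinski Carpet

Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 10.12.13a) and the following eight similitudes, all having s = 1/3 and θ = 0:

\[
T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} e_i \\ f_i \end{bmatrix},
\qquad i = 1, 2, 3, \ldots, 8
\tag{7}
\]

where the eight values of \(\begin{bmatrix} e_i \\ f_i \end{bmatrix}\) are

\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix},\;
\begin{bmatrix} \tfrac{1}{3} \\ 0 \end{bmatrix},\;
\begin{bmatrix} \tfrac{2}{3} \\ 0 \end{bmatrix},\;
\begin{bmatrix} 0 \\ \tfrac{1}{3} \end{bmatrix},\;
\begin{bmatrix} \tfrac{2}{3} \\ \tfrac{1}{3} \end{bmatrix},\;
\begin{bmatrix} 0 \\ \tfrac{2}{3} \end{bmatrix},\;
\begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix},\;
\begin{bmatrix} \tfrac{2}{3} \\ \tfrac{2}{3} \end{bmatrix}
\tag{8}
\]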

The images of S under these eight similitudes are the eight sets shown in Figure 10.12.13b. Thus,

S = T1(S) ∪ T2(S) ∪ T3(S) ∪ · · · ∪ T8(S)    (9)

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by the same scale factor s = 1/3.

Figure 10.12.13 (a) The Sierpinski carpet S. (b) The images T1(S), T2(S), . . . , T8(S).

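EXAMPLE 10 Sierpinski Triangle

Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, as shown in Figure 10.12.14a, and the following three similitudes, all having s = 1/2 and θ = 0:

\[
\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \\
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}
\end{aligned}
\tag{10}
\]

The images of S under these three similitudes are the three sets in Figure 10.12.14b. Thus,

S = T1(S) ∪ T2(S) ∪ T3(S)    (11)

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by the same scale factor s = 1/2.

Figure 10.12.14 (a) The Sierpinski triangle S inside the unit square U. (b) The images T1(S), T2(S), and T3(S).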

In the preceding examples we started with a specific set S and showed that it was self-similar by finding similitudes T1 , T2 , T3 , . . . , Tk with the same scale factor such that T1 (S), T2 (S), T3 (S), . . . , Tk (S) were nonoverlapping sets and such that

S = T1 (S) ∪ T2 (S) ∪ T3 (S) ∪ · · · ∪ Tk (S)

(12)

The following theorem addresses the converse problem of determining a self-similar set from a collection of similitudes.

THEOREM 10.12.1 If T1, T2, T3, . . . , Tk are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane such that

S = T1(S) ∪ T2(S) ∪ T3(S) ∪ · · · ∪ Tk(S)

Furthermore, if the sets T1(S), T2(S), T3(S), . . . , Tk(S) are nonoverlapping, then S is self-similar.

Algorithms for Generating Fractals

In general, there is no simple way to obtain the set S in the preceding theorem directly. We now describe an iterative procedure that will determine S from the similitudes that define it. We first give an example of the procedure and then give an algorithm for the general case.

EXAMPLE 11 Sierpinski Carpet

Figure 10.12.15 shows the unit square region S0 in the xy-plane, which will serve as an “initial” set for an iterative procedure for the construction of the Sierpinski carpet. The set S1 in the figure is the result of mapping S0 with each of the eight similitudes Ti (i = 1, 2, . . . , 8) in (8) that determine the Sierpinski carpet. It consists of eight square regions, each of side length 1/3, surrounding an empty middle square. Next we apply the eight similitudes to S1 and arrive at the set S2. Similarly, applying the eight similitudes to S2 results in the set S3. If we continue this process indefinitely, the sequence of sets S1, S2, S3, . . . will “converge” to a set S, which is the Sierpinski carpet.

Figure 10.12.15 The initial set S0 (the unit square region), the iterates S1, S2, S3, S4, and the limiting set S (the Sierpinski carpet).

Remark Although we should properly give a definition of what it means for a sequence of sets to “converge” to a given set, an intuitive interpretation will suffice in this introductory treatment.

Although we started in Figure 10.12.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set S0. The only restriction is that the set S0 be closed and bounded. For example, if we start with the particular set S0 shown in Figure 10.12.16, then S1 is the set obtained by applying each of the eight similitudes in (8). Applying the eight similitudes to S1 results in the set S2. As before, applying the eight similitudes indefinitely yields the Sierpinski carpet S as the limiting set.

Figure 10.12.16 A different initial set S0, the iterates S1, S2, S3, S4, and the limiting set S.

The general algorithm illustrated in the preceding example is as follows: Let T1, T2, T3, . . . , Tk be contracting similitudes with the same scale factor, and for an arbitrary set Q in R^2, define the set J(Q) by

J(Q) = T1(Q) ∪ T2(Q) ∪ T3(Q) ∪ · · · ∪ Tk(Q)

The following algorithm generates a sequence of sets S0, S1, . . . , Sn, . . . that converges to the set S in Theorem 10.12.1.

Algorithm 1
Step 0. Choose an arbitrary nonempty closed and bounded set S0 in R^2.
Step 1. Compute S1 = J(S0).
Step 2. Compute S2 = J(S1).
Step 3. Compute S3 = J(S2).
. . .
Step n. Compute Sn = J(Sn−1).
. . .
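Algorithm 1 can be sketched in a few lines when the initial set is finite (a finite set of points is closed and bounded). The sketch below is only illustrative: the names `CARPET_SHIFTS`, `J`, and `iterate` are assumptions, exact fractions are used to avoid rounding, and the eight translation vectors are those of the Sierpinski carpet, namely every pair from {0, 1/3, 2/3} except the middle cell (1/3, 1/3).

```python
from fractions import Fraction as F

# Translation vectors (e_i, f_i) of the eight Sierpinski carpet similitudes:
# every pair from {0, 1/3, 2/3} except the middle cell (1/3, 1/3).
THIRDS = (F(0), F(1, 3), F(2, 3))
CARPET_SHIFTS = [(e, f) for f in THIRDS for e in THIRDS
                 if (e, f) != (F(1, 3), F(1, 3))]

def J(Q, shifts=CARPET_SHIFTS, s=F(1, 3)):
    """Return T_1(Q) U ... U T_k(Q), where T_i(x, y) = s*(x, y) + (e_i, f_i)."""
    return {(s * x + e, s * y + f) for (x, y) in Q for (e, f) in shifts}

def iterate(S0, n):
    """Return the set S_n of Algorithm 1 for a finite initial set S0."""
    S = set(S0)
    for _ in range(n):
        S = J(S)
    return S

# Start from the single point (0, 0) and apply the set mapping twice.
S2 = iterate({(F(0), F(0))}, 2)
print(len(S2))
```

Each pass multiplies the number of points by k = 8 here, which hints at why the set-mapping approach becomes expensive after only a few iterations.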

EXAMPLE 12 Sierpinski Triangle

Let us construct the Sierpinski triangle determined by the three similitudes given in (10). The corresponding set mapping is J(Q) = T1(Q) ∪ T2(Q) ∪ T3(Q). Figure 10.12.17 shows an arbitrary closed and bounded set S0; the first four iterates S1, S2, S3, S4; and the limiting set S (the Sierpinski triangle).

Figure 10.12.17 The initial set S0, the iterates S1, S2, S3, S4, and the limiting set S.

EXAMPLE 13 Using Algorithm 1

Consider the following two similitudes:

\[
\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \\
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{1}{2}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} .3 \\ .3 \end{bmatrix}
\end{aligned}
\]

The actions of these two similitudes on the unit square U are illustrated in Figure 10.12.18. Here, the rotation angle θ is a parameter that we will vary to generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.12.19 for various values of θ. For simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using Algorithm 1 for the various values of θ. Because k = 2 and s = 1/2, it follows from (2) that the Hausdorff dimension of these sets for any value of θ is 1. It can be shown that the topological dimension of these sets is 1 for θ = 0 and 0 for all other values of θ. It follows that the self-similar set for θ = 0 is not a fractal [it is the straight line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of θ are fractals. In particular, they are examples of fractals with integer Hausdorff dimension.

Figure 10.12.18 (a) The unit square U. (b) The images T1(U) and T2(U); T2(U) is rotated through the angle θ and translated to (.3, .3).

Figure 10.12.19 The self-similar sets for θ = 0°, 10°, 20°, 30°, 40°, 50°, and 60°. For θ = 0° the set is the line segment from (0, 0) to (.6, .6).

A Monte Carlo Approach

The set-mapping approach of constructing self-similar sets described in Algorithm 1 is rather time-consuming on a computer because the similitudes involved must be applied to each of the many computer screen pixels in the successive iterated sets. In 1985 Michael Barnsley described an alternative, more practical method of generating a self-similar set defined through its similitudes. It is a so-called Monte Carlo method that takes advantage of probability theory. Barnsley refers to it as the Random Iteration Algorithm. Let T1 , T2 , T3 , . . . , Tk be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points

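\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix},\;
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix},\; \ldots,\;
\begin{bmatrix} x_n \\ y_n \end{bmatrix},\; \ldots
\]

that collectively converge to the set S in Theorem 10.12.1.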

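Algorithm 2

Step 0. Choose an arbitrary point \(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\) in S.

Step 1. Choose one of the k similitudes at random, say \(T_{k_1}\), and compute

\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right)
\]

Step 2. Choose one of the k similitudes at random, say \(T_{k_2}\), and compute

\[
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right)
\]

. . .

Step n. Choose one of the k similitudes at random, say \(T_{k_n}\), and compute

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right)
\]

. . .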

On a computer screen the pixels corresponding to the points generated by this algorithm will fill out the pixel representation of the limiting set S. Figure 10.12.20 shows four stages of the Random Iteration Algorithm that generate the Sierpinski carpet, starting with the initial point (0, 0).

Figure 10.12.20 The Sierpinski carpet after 5000, 15,000, 45,000, and 100,000 iterations.

Remark Although Step 0 in the preceding algorithm requires the selection of an initial point in the set S , which may not be known in advance, this is not a serious problem. In practice, one can usually start with any point in R 2 and after a few iterations (say ten or so), the point generated will be sufficiently close to S that the algorithm will work correctly from that point on.
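The Random Iteration Algorithm, together with the remark about discarding the first few points, might be sketched as follows. This is an illustration under stated assumptions rather than Barnsley's own implementation: the function names, the uniform random choice among the maps, the fixed seed, and the `burn_in` parameter (the "ten or so" initial iterations) are all choices made for this example. The maps used are the three similitudes of the Sierpinski triangle.

```python
import random

def shift_map(s, e, f):
    """Similitude with theta = 0: (x, y) -> s*(x, y) + (e, f)."""
    return lambda x, y: (s * x + e, s * y + f)

def random_iteration(maps, n_points, start=(0.0, 0.0), burn_in=10, seed=0):
    """Generate n_points by repeatedly applying a randomly chosen map.

    The first burn_in points are discarded so that an arbitrary starting
    point has time to move close to the limiting set.
    """
    rng = random.Random(seed)
    x, y = start
    points = []
    for step in range(burn_in + n_points):
        T = rng.choice(maps)      # each map is equally likely in this sketch
        x, y = T(x, y)
        if step >= burn_in:
            points.append((x, y))
    return points

# The three similitudes of the Sierpinski triangle (s = 1/2, theta = 0).
triangle = [shift_map(0.5, 0.0, 0.0),
            shift_map(0.5, 0.5, 0.0),
            shift_map(0.5, 0.0, 0.5)]
pts = random_iteration(triangle, 5000)
print(len(pts), pts[:3])
```

Plotting `pts` with any scatter-plot tool should reveal the triangle; only one point is transformed per step, in contrast to Algorithm 1, which transforms an entire set.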

More General Fractals

So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in R^2. However, Theorem 10.12.1 remains true if the similitudes T1, T2, . . . , Tk are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:

DEFINITION 5 An affine transformation is a mapping of R^2 into R^2 of the form

\[
T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} e \\ f \end{bmatrix}
\]

where a, b, c, d, e, and f are scalars.

Figure 10.12.21 shows how an affine transformation maps the unit square U onto a parallelogram T(U). An affine transformation is said to be contracting if the Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any k contracting affine transformations T1, T2, . . . , Tk determine a unique closed and bounded set S satisfying the equation

S = T1(S) ∪ T2(S) ∪ T3(S) ∪ · · · ∪ Tk(S)    (13)

Figure 10.12.21 (a) Unit square U with corners (0, 0), (1, 0), (1, 1), and (0, 1). (b) Unit square after affine transformation: the parallelogram T(U) with corners (e, f), (a + e, c + f), (a + b + e, c + d + f), and (b + e, d + f).

Equation (13) has the same form as Equation (12), which we used to find self-similar sets. Although Equation (13), which uses contracting affine transformations, does not determine a self-similar set S, the set it does determine has many of the features of self-similar sets. For example, Figure 10.12.22 shows how a set in the plane resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly overlapping union of the four smaller affine-image ferns surrounding it. Note also how T3, because the determinant of its matrix part is zero, maps the entire fern onto the small straight line segment between the points (.50, 0) and (.50, .16). Figure 10.12.22 contains a wealth of information and should be studied carefully.

Figure 10.12.22 The fern inside the unit square, surrounded by its four affine images. The transformations are

\[
\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \begin{bmatrix} .20 & -.26 \\ .23 & .22 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} .400 \\ .045 \end{bmatrix},
&
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \begin{bmatrix} .85 & .04 \\ -.04 & .85 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} .075 \\ .180 \end{bmatrix},
\\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \begin{bmatrix} 0 & 0 \\ 0 & .16 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} .50 \\ 0 \end{bmatrix},
&
T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \begin{bmatrix} -.15 & .28 \\ .26 & .24 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} .575 \\ -.086 \end{bmatrix}
\end{aligned}
\]

The corners of the image parallelograms are labeled as follows. T1(U): (.400, .045), (.600, .275), (.340, .495), (.140, .265). T2(U): (.075, .180), (.925, .140), (.965, .990), (.115, 1.030). T3(U): the segment from (.50, 0) to (.50, .16). T4(U): (.575, −.086), (.425, .174), (.705, .414), (.855, .154).

Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine transformations T1, T2, T3, T4. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.12.22 defining their corresponding values of a, b, c, d, e, and f. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.
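To make the "24 numbers" concrete, the sketch below stores each transformation of Figure 10.12.22 as a tuple (a, b, c, d, e, f) and applies it to a point. The dictionary name `FERN` and the helper names are assumptions introduced for this illustration; the coefficients are the ones read from the figure.

```python
# Each entry is (a, b, c, d, e, f) for T(x, y) = (a*x + b*y + e, c*x + d*y + f).
FERN = {
    "T1": (0.20, -0.26, 0.23, 0.22, 0.400, 0.045),
    "T2": (0.85, 0.04, -0.04, 0.85, 0.075, 0.180),
    "T3": (0.00, 0.00, 0.00, 0.16, 0.500, 0.000),
    "T4": (-0.15, 0.28, 0.26, 0.24, 0.575, -0.086),
}

def apply_affine(coeffs, x, y):
    """Apply the affine transformation with coefficients (a, b, c, d, e, f)."""
    a, b, c, d, e, f = coeffs
    return (a * x + b * y + e, c * x + d * y + f)

def det(coeffs):
    """Determinant of the matrix part [[a, b], [c, d]]."""
    a, b, c, d, _, _ = coeffs
    return a * d - b * c

# Images of the corners of the unit square under T4, and the determinant of T3.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
print([apply_affine(FERN["T4"], x, y) for x, y in corners])
print(det(FERN["T3"]))
```

The printed corners can be compared with the corner labels in the figure, and the zero determinant of T3 is what collapses the fern onto a line segment. Feeding these four maps to a random-iteration loop is one way to redraw the fern from the 24 stored numbers.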

FURTHER READINGS

Readers interested in learning more about fractals are referred to the following books, the first of which elaborates on the linear transformation approach of this section.

1. MICHAEL BARNSLEY, Fractals Everywhere (New York: Academic Press, 1993).
2. BENOIT B. MANDELBROT, The Fractal Geometry of Nature (New York: W. H. Freeman, 1982).
3. HEINZ-OTTO PEITGEN AND P. H. RICHTER, The Beauty of Fractals (New York: Springer-Verlag, 1986).
4. HEINZ-OTTO PEITGEN AND DIETMAR SAUPE, The Science of Fractal Images (New York: Springer-Verlag, 2011).

Exercise Set 10.12

1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that determine the set. What is its Hausdorff dimension? Is it a fractal?

Figure Ex-1

2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor s. What are the rotation angles of the similitudes determining this set?

Figure Ex-2

3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor of 1/2, and so all have Hausdorff dimension ln 3/ ln 2 = 1.584 . . . . The rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers (n1, n2, n3), where ni is the corresponding integer multiple of 90° in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the triplet (0, 0, 0).

Figure Ex-3

4. For each of the self-similar sets in Figure Ex-4, find: (i) the scale factor s of the similitudes describing the set; (ii) the rotation angles θ of all similitudes describing the set (all rotation angles are multiples of 90°); and (iii) the Hausdorff dimension of the set. Which of the sets are fractals and why?

Figure Ex-4 Panels (a), (b), (c), and (d).

5. Show that of the four affine transformations shown in Figure 10.12.22, only the transformation T2 is a similitude. Determine its scale factor s and rotation angle θ.

6. Find the coordinates of the tip of the fern in Figure 10.12.22. [Hint: The transformation T2 maps the tip of the fern to itself.]

7. The square in Figure 10.12.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.12.7b. Suppose that it is expressed instead as the union of 16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation (2).

8. Show that the four similitudes

\[
\begin{aligned}
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix},
&
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{4} \\ 0 \end{bmatrix},
\\
T_3\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} 0 \\ \tfrac{1}{4} \end{bmatrix},
&
T_4\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
&= \frac{3}{4}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \end{bmatrix}
\end{aligned}
\]

express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation (2) for the values of k and s determined by these similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.]

9. All of the results in this section can be extended to R^n. Compute the Hausdorff dimension of the unit cube in R^3 (see Figure Ex-9). Given that the topological dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.]

Figure Ex-9 The unit cube in xyz-space.

10. The set in R^3 in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values of k and s for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal?

Figure Ex-10 The Menger sponge in xyz-space.

11. The two similitudes

\[
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
\quad\text{and}\quad
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
+ \begin{bmatrix} \tfrac{2}{3} \\ 0 \end{bmatrix}
\]

determine a fractal known as the Cantor set. Starting with the unit square region U as an initial set, sketch the first four sets that Algorithm 1 determines. Also, find the Hausdorff dimension of the Cantor set. (This famous set was the first example that Hausdorff gave in his 1919 paper of a set whose Hausdorff dimension is not equal to its topological dimension.)

12. Compute the areas of the sets S0, S1, S2, S3, and S4 in Figure 10.12.15.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Use similitudes of the form

\[
T_i\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right)
= \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}
+ \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix}
\]

to show that the Menger sponge (see Exercise 10) is the set S satisfying

\[
S = \bigcup_{i=1}^{20} T_i(S)
\]

for appropriately chosen similitudes Ti (for i = 1, 2, 3, . . . , 20). Determine these similitudes by determining the collection of 3 × 1 matrices

\[
\left\{ \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \;\middle|\; i = 1, 2, 3, \ldots, 20 \right\}
\]

T2. Generalize the ideas involved in the Cantor set (in R^1), the Sierpinski carpet (in R^2), and the Menger sponge (in R^3) to R^n by considering the set S satisfying

\[
S = \bigcup_{i=1}^{m_n} T_i(S)
\]

with

\[
T_i\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}\right)
= \frac{1}{3}\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}
+ \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix}
\]

where each a_{ki} equals 0, 1/3, or 2/3, and no two of them ever equal 1/3 at the same time. Use a computer to construct the set

\[
\left\{ \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix} \;\middle|\; i = 1, 2, 3, \ldots, m_n \right\}
\]

thereby determining the value of m_n for n = 2, 3, 4. Then develop an expression for m_n.

10.13 Chaos

In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping.

PREREQUISITES:
Geometry of Linear Operators on R^2 (Section 4.11)
Eigenvalues and Eigenvectors
Intuitive Understanding of Limits and Continuity

Chaos

The word chaos was first used in a mathematical sense in 1975 by Tien-Yien Li and James Yorke in a paper entitled “Period Three Implies Chaos.” The term is now used to describe the behavior of certain mathematical mappings and physical phenomena that at first glance seem to behave in a random or disorderly fashion but actually have an underlying element of order (examples include random-number generation, shuffling cards, cardiac arrhythmia, fluttering airplane wings, changes in the red spot of Jupiter, and deviations in the orbit of Pluto). In this section we discuss a particular chaotic mapping called Arnold’s cat map, after the Russian mathematician Vladimir I. Arnold who first described it using a diagram of a cat.

Arnold’s Cat Map

To describe Arnold’s cat map, we need a few ideas about modular arithmetic. If x is a real number, then the notation x mod 1 denotes the unique number in the interval [0, 1) that differs from x by an integer. For example,

2.3 mod 1 = 0.3,  0.9 mod 1 = 0.9,  −3.7 mod 1 = 0.3,  2.0 mod 1 = 0

Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of x. If (x, y) is an ordered pair of real numbers, then the notation (x, y) mod 1 denotes (x mod 1, y mod 1). For example,

(2.3, −7.9) mod 1 = (0.3, 0.1)

Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1) and that for every ordered pair (x, y), the point (x, y) mod 1 lies in the unit square

S = {(x, y) | 0 ≤ x < 1, 0 ≤ y < 1}

Also observe that the upper boundary and the right-hand boundary of the square are not included in S. Arnold’s cat map is the transformation Γ : R^2 → R^2 defined by the formula

Γ : (x, y) → (x + y, x + 2y) mod 1

or, in matrix notation,

\[
\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\tag{1}
\]

To understand the geometry of Arnold’s cat map, it is helpful to write (1) in the factored form

\[
\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right)
= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

which expresses Arnold’s cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the y-direction with factor 1. Because the computations are performed mod 1, Γ maps all points of R^2 into the unit square S. We will illustrate the effect of Arnold’s cat map on the unit square S, which is shaded in Figure 10.13.1a and contains a picture of a cat. It can be shown that it does not matter whether the mod 1 computations are carried out after each shear or at the very end. We will discuss both methods, first performing them at the end. The steps are as follows:

Step 1. Shear in the x-direction with factor 1 (Figure 10.13.1b):

(x, y) → (x + y, y)

or, in matrix notation,

\[
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x + y \\ y \end{bmatrix}
\]

Step 2. Shear in the y-direction with factor 1 (Figure 10.13.1c):

(x, y) → (x, x + y)

or, in matrix notation,

\[
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x \\ x + y \end{bmatrix}
\]

Step 3. Reassembly into S (Figure 10.13.1d):

(x, y) → (x, y) mod 1

The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.13.1c and reassemble the pieces into S as shown in Figure 10.13.1d.

Figure 10.13.1 (a) The unit square S with the picture of the cat. (b) Step 1: (x, y) → (x + y, y). (c) Step 2: (x, y) → (x, x + y). (d) Step 3: (x, y) → (x, y) mod 1.

For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step, rather than at the end. With this approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:

Step 1. Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 10.13.2b):

(x, y) → (x + y, y) mod 1

Step 2. Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 10.13.2c):

(x, y) → (x, x + y) mod 1

Figure 10.13.2 (a) The unit square S. (b) Step 1: (x, y) → (x + y, y), followed by (x, y) → (x, y) mod 1. (c) Step 2: (x, y) → (x, x + y), followed by (x, y) → (x, y) mod 1.
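The two ways of carrying out the mod 1 arithmetic can be compared directly. The following sketch is illustrative only: the function names `mod1`, `cat_map_end`, and `cat_map_stepwise` are assumptions, and exact fractions are used so that the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction as F

def mod1(t):
    """Return the unique number in [0, 1) that differs from t by an integer."""
    return t - (t // 1)          # t // 1 is the floor of t

def cat_map_end(x, y):
    """Both shears first, then a single reduction mod 1."""
    x1, y1 = x + y, y            # Step 1: shear in the x-direction
    x2, y2 = x1, x1 + y1         # Step 2: shear in the y-direction
    return mod1(x2), mod1(y2)    # Step 3: reassembly into S

def cat_map_stepwise(x, y):
    """Reduce mod 1 after each shear, as suggested for computer use."""
    x1, y1 = mod1(x + y), mod1(y)
    return mod1(x1), mod1(x1 + y1)

print(mod1(F(-37, 10)))                   # -3.7 mod 1
print(cat_map_end(F(27, 76), F(58, 76)))
print(cat_map_stepwise(F(27, 76), F(58, 76)))
```

Both functions return the same point for every input, which reflects the statement that the net effect of reassembling after each shear is the same as reassembling once at the end.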

Repeated Mappings

Chaotic mappings such as Arnold’s cat map usually arise in physical models in which an operation is performed repeatedly. For example, cards are mixed by repeated shuffles, paint is mixed by repeated stirs, water in a tidal basin is mixed by repeated tidal changes, and so forth. Thus, we are interested in examining the effect on S of repeated applications (or iterations) of Arnold’s cat map. Figure 10.13.3, which was generated on a computer, shows the effect of 25 iterations of Arnold’s cat map on the cat in the unit square S. Two interesting phenomena occur:

• The cat returns to its original form at the 25th iteration.
• At some of the intermediate iterations, the cat is decomposed into streaks that seem to have a specific direction.

Much of the remainder of this section is devoted to explaining these phenomena.

Figure 10.13.3 Iterations 1 through 25 of Arnold’s cat map applied to a picture of the cat that is 101 pixels by 101 pixels.

Periodic Points

Our first goal is to explain why the cat in Figure 10.13.3 returns to its original configuration at the 25th iteration. For this purpose it will be helpful to think of a picture in the xy-plane as an assignment of colors to the points in the plane. For pictures generated on a computer screen or other digital device, hardware limitations require that a picture be broken up into discrete squares, called pixels. For example, in the computer-generated pictures in Figure 10.13.3 the unit square S is divided into a grid with 101 pixels on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.13.4). An assignment of colors to pixels to create a picture is called a pixel map.

Figure 10.13.4 Enlarged view of cat’s face showing individual pixels.

As shown in Figure 10.13.5, each pixel in S can be assigned a unique pair of coordinates of the form (m/101, n/101) that identifies its lower left-hand corner, where m and n are integers in the range 0, 1, 2, . . . , 100. We call these points pixel points because each such point identifies a unique pixel. Instead of restricting the discussion to the case where S is subdivided into an array with 101 pixels on a side, let us consider the more general case where there are p pixels per side. Thus, each pixel map in S consists of p^2 pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel points in S have coordinates of the form (m/p, n/p), where m and n are integers ranging from 0 to p − 1.

Figure 10.13.5 The pixel whose lower left-hand corner is the pixel point (m/101, n/101); both axes are marked 0/101, 1/101, 2/101, 3/101, . . . , 100/101.

Under Arnold’s cat map each pixel point of S is transformed into another pixel point of S. To see why this is so, observe that the image of the pixel point (m/p, n/p) under Γ is given in matrix form by

\[
\Gamma\left(\begin{bmatrix} \dfrac{m}{p} \\[2mm] \dfrac{n}{p} \end{bmatrix}\right)
= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} \dfrac{m}{p} \\[2mm] \dfrac{n}{p} \end{bmatrix} \bmod 1
= \begin{bmatrix} \dfrac{m + n}{p} \\[2mm] \dfrac{m + 2n}{p} \end{bmatrix} \bmod 1
\tag{2}
\]

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m′/p, n′/p), where m′ and n′ lie in the range 0, 1, 2, . . . , p − 1. Specifically, m′ and n′ are the remainders when m + n and m + 2n are divided by p, respectively. Consequently, each point in S of the form (m/p, n/p) is mapped onto another point of the same form. Because Arnold’s cat map transforms every pixel point of S into another pixel point of S, and because there are only p^2 different pixel points in S, it follows that any given pixel point must return to its original position after at most p^2 iterations of Arnold’s cat map.

EXAMPLE 1 Using Formula (2)

If p = 76, then (2) becomes

\[
\Gamma\left(\begin{bmatrix} \dfrac{m}{76} \\[2mm] \dfrac{n}{76} \end{bmatrix}\right)
= \begin{bmatrix} \dfrac{m + n}{76} \\[2mm] \dfrac{m + 2n}{76} \end{bmatrix} \bmod 1
\]

In this case the successive iterates of the point (27/76, 58/76) are

Iteration   0      1      2      3      4      5      6      7      8
x           27/76  9/76   0/76   67/76  49/76  4/76   39/76  37/76  72/76
y           58/76  67/76  67/76  58/76  31/76  35/76  74/76  35/76  31/76

(verify). Because the point returns to its initial position on the ninth application of Arnold’s cat map (but no sooner), the point is said to have period 9, and the set of nine distinct iterates of the point is called a 9-cycle. Figure 10.13.6 shows this 9-cycle with the initial point labeled 0 and its successive iterates labeled accordingly.

Figure 10.13.6 The 9-cycle of the point (27/76, 58/76), with the iterates labeled 0 through 8.
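The "(verify)" step in Example 1 can be checked with integer arithmetic, since formula (2) only requires the remainders of m + n and m + 2n on division by p. The sketch below is an illustration; the function names `cat_step` and `orbit` are assumptions made for this example.

```python
def cat_step(m, n, p):
    """One application of formula (2) to the pixel point (m/p, n/p).

    Returns the numerators (m', n') of the image point.
    """
    return (m + n) % p, (m + 2 * n) % p

def orbit(m, n, p):
    """List the distinct iterates of (m/p, n/p), starting with the point itself."""
    start = (m, n)
    points = [start]
    m, n = cat_step(m, n, p)
    while (m, n) != start:       # stop when the point returns to its start
        points.append((m, n))
        m, n = cat_step(m, n, p)
    return points

cycle = orbit(27, 58, 76)
print(len(cycle))
print(cycle)
```

The length of the returned list is the period of the point, and the list itself is its cycle; the same helper can be used to explore the periods of other pixel points.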

Period Versus Pixel Width

In general, a point that returns to its initial position after n applications of Arnold’s cat map, but does not return with fewer than n applications, is said to have period n, and its set of n distinct iterates is called an n-cycle. Arnold’s cat map maps (0, 0) into (0, 0), so this point has period 1. Points with period 1 are also called fixed points. We leave it as an exercise (Exercise 11) to show that (0, 0) is the only fixed point of Arnold’s cat map.

If P1 and P2 are points with periods q1 and q2, respectively, then P1 returns to its initial position in q1 iterations (but no sooner), and P2 returns to its initial position in q2 iterations (but no sooner); thus, both points return to their initial positions in any number of iterations that is a multiple of both q1 and q2. In general, for a pixel map with p^2 pixel points of the form (m/p, n/p), we let Π(p) denote the least common multiple of the periods of all the pixel points in the map [i.e., Π(p) is the smallest integer that is divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in Π(p) iterations of Arnold’s cat map (but no sooner). For this reason, we call Π(p) the period of the pixel map. In Exercise 4 we ask you to show that if p = 101, then all pixel points have period 1, 5, or 25, so Π(101) = 25. This explains why the cat in Figure 10.13.3 returned to its initial configuration in 25 iterations.

Figure 10.13.7 shows how the period of a pixel map varies with p. Although the general tendency is for the period to increase as p increases, there is a surprising amount of irregularity in the graph. Indeed, there is no simple function that specifies this relationship (see Exercise 1).

Figure 10.13.7 The period Π(p) of the pixel map (vertical axis, 0 to 1000) plotted against p, the side length of the unit square in pixels (horizontal axis, 0 to 500).
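The period of a pixel map can be computed without tracking every pixel point. The sketch below rests on an observation that is not stated in the text and should be read as an assumption of this example: all pixel points return to their initial positions after n iterations exactly when the nth power of the matrix with rows (1, 1) and (1, 2) is congruent to the identity matrix mod p (apply the power to the points with numerators (1, 0) and (0, 1)). The function name `pixel_map_period` is likewise hypothetical.

```python
def pixel_map_period(p):
    """Smallest n >= 1 with A^n = I (mod p), where A = [[1, 1], [1, 2]]."""
    one = 1 % p                               # handles the trivial grid p = 1
    a, b, c, d = one, one, one, 2 % p         # entries of A^1 reduced mod p
    n = 1
    while (a, b, c, d) != (one, 0, 0, one):
        # Multiply the current power [[a, b], [c, d]] by A on the right.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n

for p in (76, 101, 250):
    print(p, pixel_map_period(p))
```

Evaluating this function over a range of p is one way to reproduce a plot like Figure 10.13.7 and to see the irregular dependence of the period on p.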

Although a pixel map with p pixels on a side does not return to its initial configuration until Π(p) iterations have occurred, various unexpected things can occur at intermediate iterations. For example, Figure 10.13.8 shows a pixel map with p = 250 of the famous Hungarian-American mathematician John von Neumann. It can be shown that Π(250) = 750; hence, the pixel map will return to its initial configuration after 750 iterations of Arnold’s cat map (but no sooner). However, after 375 iterations the pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows.

Figure 10.13.8 [A 250 × 250 pixel map of John von Neumann. Panels: 5, 10, 75, 125, 195, 250, and 375 iterations. Image: Photographer unknown. Courtesy of The Shelby White and Leon Levy Archives Center, Institute for Advanced Study, Princeton, NJ, USA]

The Tiled Plane

Our next objective is to explain the cause of the linear streaks that occur in Figure 10.13.3. For this purpose it will be helpful to view Arnold’s cat map another way. As defined, Arnold’s cat map is not a linear transformation because of the mod 1 arithmetic. However, there is an alternative way of defining Arnold’s cat map that avoids the mod 1 arithmetic and results in a linear transformation. For this purpose, imagine that the unit square S with its picture of the cat is a “tile,” and suppose that the entire plane is covered with such tiles, as in Figure 10.13.9. We say that the xy -plane has been tiled with the unit square. If we apply the matrix transformation in (1) to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion of the image within S will be identical to the image that we obtained using the mod 1 arithmetic (Figure 10.13.9). In short, the tiling results in the same pixel map in S as the mod 1 arithmetic, but in the tiled case Arnold’s cat map is a linear transformation. It is important to understand, however, that tiling and mod 1 arithmetic produce periodicity in different ways. If a pixel map in S has period n, then in the case of mod 1 arithmetic, each point returns to its original position at the end of n iterations. In the case of tiling, points need not return to their original positions; rather, each point is replaced by a point of the same color at the end of n iterations.

Figure 10.13.9 [Arnold’s cat map applied to the tiled plane. Panels: Step 1: (x, y) → (x + y, y); Step 2: (x, y) → (x, x + y); Step 3: (x, y) → (x, y) mod 1.]

Properties of Arnold’s Cat Map

To understand the cause of the streaks in Figure 10.13.3, think of Arnold’s cat map as a linear transformation on the tiled plane. Observe that the matrix

\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

that defines Arnold’s cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the pieces without any overlap, as shown in Figure 10.13.1d. Thus, in Figure 10.13.3 the area of the cat (whatever it is) is the same as the total area of the blotches in each iteration. The fact that the matrix is symmetric means that its eigenvalues are real and the corresponding eigenvectors are perpendicular. We leave it for you to show that the eigenvalues and corresponding eigenvectors of C are



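\[
\lambda_1 = \frac{3+\sqrt{5}}{2} = 2.6180\ldots, \qquad
v_1 = \begin{bmatrix} 1 \\[2pt] \dfrac{1+\sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1.6180\ldots \end{bmatrix}
\]

\[
\lambda_2 = \frac{3-\sqrt{5}}{2} = 0.3819\ldots, \qquad
v_2 = \begin{bmatrix} \dfrac{-1-\sqrt{5}}{2} \\[2pt] 1 \end{bmatrix} = \begin{bmatrix} -1.6180\ldots \\ 1 \end{bmatrix}
\]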
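The stated eigenvalues and eigenvectors can be checked numerically. The sketch below is one possible check using only the standard library; the helper name `apply` is illustrative, and the eigenvalues are obtained from the characteristic polynomial λ² − (trace)λ + det = 0 of a 2 × 2 matrix.

```python
import math

C = ((1, 1), (1, 2))  # matrix of Arnold's cat map

trace = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
disc = math.sqrt(trace**2 - 4 * det)
lam1 = (trace + disc) / 2  # larger eigenvalue (stretching)
lam2 = (trace - disc) / 2  # smaller eigenvalue (compression)

v1 = (1.0, (1 + math.sqrt(5)) / 2)
v2 = ((-1 - math.sqrt(5)) / 2, 1.0)


def apply(M, v):
    """Multiply the 2 x 2 matrix M by the vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])


print(lam1, lam2, lam1 * lam2)           # the product equals det(C)
print(apply(C, v1), tuple(lam1 * t for t in v1))
print(apply(C, v2), tuple(lam2 * t for t in v2))
print(v1[0] * v2[0] + v1[1] * v2[1])     # dot product of the eigenvectors
```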

For each application of Arnold’s cat map, the eigenvalue λ1 causes a stretching in the direction of the eigenvector v1 by a factor of 2.6180 . . . , and the eigenvalue λ2 causes a compression in the direction of the eigenvector v2 by a factor of 0.3819 . . . . Figure 10.13.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under the above mapping, this square is deformed into a rectangle whose sides are also parallel to the two eigenvector directions. The areas of the square and the rectangle are the same.

To explain the cause of the streaks in Figure 10.13.3, consider S to be part of the tiled plane, and let p be a point of S with period n. Because we are considering tiling, there is a point q in the plane with the same color as p that on successive iterations moves toward the position initially occupied by p, reaching that position on the nth iteration. This point is $q = (A^{-1})^n p = A^{-n} p$ (here A denotes the matrix C of the cat map), since

\[
A^n q = A^n (A^{-n} p) = p
\]

Thus, with successive iterations, points of S flow away from their initial positions, while at the same time other points in the plane (with corresponding colors) flow toward those initial positions, completing their trip on the final iteration of the cycle. Figure 10.13.11 illustrates this in the case where n = 4, q = (−8/3, 5/3), and p = A⁴q = (1/3, 2/3). Note that p mod 1 = q mod 1 = (1/3, 2/3), so both points occupy the same positions on their respective tiles. The outgoing point moves in the general direction of the eigenvector v1, as indicated by the arrows in Figure 10.13.11, and the incoming point moves in the general direction of eigenvector v2. It is the “flow lines” in the general directions of the eigenvectors that form the streaks in Figure 10.13.3.
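The numbers quoted for the case n = 4 can be verified with exact rational arithmetic. This is a small sketch; the names `cat_linear` and `mod1` are illustrative, and the map is applied on the tiled plane, that is, without reducing mod 1 between steps.

```python
from fractions import Fraction


def cat_linear(v):
    """One application of the matrix [[1, 1], [1, 2]] with no mod 1 reduction."""
    x, y = v
    return (x + y, x + 2 * y)


def mod1(v):
    """Position of a point within its own tile."""
    return tuple(c % 1 for c in v)


q = (Fraction(-8, 3), Fraction(5, 3))
p = q
for _ in range(4):  # four iterations of the linear map
    p = cat_linear(p)

print(p)
print(mod1(q), mod1(p))
```

Using `Fraction` rather than floats avoids any rounding, so equality of the two tile positions can be tested directly.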

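Figure 10.13.10 [A square centered at the origin and its image rectangle, with sides parallel to the eigenvectors v1 = (1, (1 + √5)/2) and v2 = ((−1 − √5)/2, 1).]

Figure 10.13.11 [The points q and p = A⁴q in the tiled plane, with arrows indicating the directions of flow.]

Nonperiodic Points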

Thus far we have considered the effect of Arnold’s cat map on pixel points of the form (m/p, n/p) for an arbitrary positive integer p. We know that all such points are periodic. We now consider the effect of Arnold’s cat map on an arbitrary point (a, b) in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the coordinates is irrational. Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the rational point (r1/s1, r2/s2) can be written as (r1s2/(s1s2), r2s1/(s1s2)), so it is a pixel point with p = s1s2. It can be shown (Exercise 13) that the converse is also true: Every periodic point must be a rational point.

It follows from the preceding discussion that the irrational points in S are nonperiodic, so that successive iterates of an irrational point (x0, y0) in S must all be distinct points in S. Figure 10.13.12, which was computer generated, shows an irrational point and selected iterates up to 100,000. For the particular irrational point that we selected, the iterates do not seem to cluster in any particular region of S; rather, they appear to be spread throughout S, becoming denser with successive iterations.

The behavior of the iterates in Figure 10.13.12 is sufficiently important that there is some terminology associated with it. We say that a set D of points in S is dense in S if every circle centered at any point of S encloses points of D, no matter how small the radius of the circle is taken (Figure 10.13.13). It can be shown that the rational points are dense in S and the iterates of most (but not all) of the irrational points are dense in S.

Figure 10.13.12 [Computer-generated iterates of an irrational point. Panels: initial point; 1000, 2000, 5000, 10,000, 25,000, 50,000, and 100,000 iterations.]

Figure 10.13.13 [An arbitrary circle in S enclosing points of the set D.]

Definition of Chaos

We know that under Arnold’s cat map, the rational points of S are periodic and dense in S and that some but not all of the irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in current use, but the following one, which is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company), is most closely related to our work.

DEFINITION 1 A mapping T of S onto itself is said to be chaotic if:

(i) S contains a dense set of periodic points of the mapping T.
(ii) There is a point in S whose iterates under T are dense in S.

Thus Arnold’s cat map satisfies the definition of a chaotic mapping. What is noteworthy about this definition is that a chaotic mapping exhibits an element of order and an element of disorder—the periodic points move regularly in cycles, but the points with dense iterates move irregularly, often obscuring the regularity of the periodic points. This fusion of order and disorder characterizes chaotic mappings.

Dynamical Systems

Chaotic mappings arise in the study of dynamical systems. Informally stated, a dynamical system can be viewed as a system that has a specific state or configuration at each point of time but that changes its state with time. Chemical systems, ecological systems, electrical systems, biological systems, economic systems, and so forth can be looked at in this way. In a discrete-time dynamical system, the state changes at discrete points of time rather than at each instant. In a discrete-time chaotic dynamical system, each state results from a chaotic mapping of the preceding state. For example, if one imagines that Arnold’s cat map is applied at discrete points of time, then the pixel maps in Figure 10.13.3 can be viewed as the evolution of a discrete-time chaotic dynamical system from some initial set of states (each point of the cat is a single initial state) to successive sets of states. One of the fundamental problems in the study of dynamical systems is to predict future states of the system from a known initial state. In practice, however, the exact initial state is rarely known because of errors in the devices used to measure the initial state. It was believed at one time that if the measuring devices were sufficiently accurate and the computers used to perform the iteration were sufficiently powerful, then one could predict the future states of the system to any degree of accuracy. But the discovery of chaotic systems shattered this belief because it was found that for such systems the slightest error in measuring the initial state or in the computation of the iterates becomes magnified exponentially, thereby preventing an accurate prediction of future states. Let us demonstrate this sensitivity to initial conditions with Arnold’s cat map. Suppose that P0 is a point in the xy -plane whose exact coordinates are (0.77837, 0.70904). 
A measurement error of 0.00001 is made in the y-coordinate, such that the point is thought to be located at (0.77837, 0.70905), which we denote by Q0. Both P0 and Q0 are pixel points with p = 100,000 (why?), and thus, since Π(100,000) = 75,000, both return to their initial positions after 75,000 iterations. In Figure 10.13.14 we show the first 50 iterates of P0 under Arnold’s cat map as crosses and the first 50 iterates of Q0 as circles. Although P0 and Q0 are close enough that their symbols overlap initially, only their first eight iterates have overlapping symbols; from the ninth iteration on their iterates follow divergent paths.

It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold’s cat map. For this purpose we will think of Arnold’s cat map as a linear transformation on the tiled plane. Recall from Figure 10.13.10 and the related discussion that the projected distance between two points in S in the direction of the eigenvector v1 increases by a factor of 2.6180 . . . (= λ1) with each iteration (Figure 10.13.15). After nine iterations this projected distance increases by a factor of (2.6180 . . .)⁹ = 5777.99 . . . , and with an initial error of roughly 1/100,000 in the direction of v1, this distance is 0.05777 . . . , or about 1/17 the width of the unit square S. After 12 iterations this small initial error grows to (2.6180 . . .)¹²/100,000 = 1.0368 . . . , which is greater than the width of S. Thus, we completely lose track of the true iterates within S after 12 iterations because of the exponential growth of the initial error.

Although sensitivity to initial conditions limits the ability to predict the future evolution of dynamical systems, new techniques are presently being investigated to describe this future evolution in alternative ways.
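The divergence of the two orbits can be reproduced with a short experiment. The sketch below makes two assumptions of its own: the points are stored as integer pixel coordinates with p = 100,000 so that the arithmetic is exact, and the separation is measured by an illustrative helper `torus_gap` (the larger coordinate difference, taken the short way around the unit square), which is not the projected distance used in the text.

```python
import math

P = 100_000  # both points are pixel points with p = 100,000


def iterate(point, steps, p=P):
    """Apply Arnold's cat map `steps` times to an integer pixel point."""
    x, y = point
    for _ in range(steps):
        x, y = (x + y) % p, (x + 2 * y) % p
    return (x, y)


def torus_gap(a, b, p=P):
    """Largest coordinate difference between a and b, as a fraction of the side of S."""
    return max(min((u - v) % p, (v - u) % p) for u, v in zip(a, b)) / p


P0 = (77837, 70904)  # exact point (0.77837, 0.70904)
Q0 = (77837, 70905)  # measured point (0.77837, 0.70905)

for n in (0, 3, 6, 9, 12):
    print(n, torus_gap(iterate(P0, n), iterate(Q0, n)))

lam1 = (3 + math.sqrt(5)) / 2
print(lam1**9 / P, lam1**12 / P)  # growth estimates from the eigenvalue
```

The printed gaps grow by roughly a factor of λ1 per iteration until they become comparable to the width of S, after which reduction mod 1 makes them irregular.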

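Figure 10.13.14 [The first 50 iterates of P0 (crosses) and Q0 (circles) under Arnold’s cat map; the early iterates are labeled 0 through 9.]

Figure 10.13.15 [Successive iterates P_i, P_{i+1}, P_{i+2} and the projected distances d_i, d_{i+1} in the direction of the eigenvector v1.]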

Exercise Set 10.13

1. In a journal article [F. J. Dyson and H. Falk, “Period of a Discrete Cat Mapping,” The American Mathematical Monthly, 99 (August–September 1992), pp. 603–614] the following results concerning the nature of the function Π(p) were established:
(i) Π(p) = 3p if and only if p = 2 · 5^k for k = 1, 2, . . . .
(ii) Π(p) = 2p if and only if p = 5^k for k = 1, 2, . . . or p = 6 · 5^k for k = 0, 1, 2, . . . .
(iii) Π(p) ≤ 12p/7 for all other choices of p.
Find Π(250), Π(25), Π(125), Π(30), Π(10), Π(50), Π(3750), Π(6), and Π(5).

2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5. Then find Π(6).

3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of “pseudorandom” integers x0, x1, x2, x3, . . . in the interval from 0 to p − 1 is based on the following algorithm:
(i) Pick any two integers x0 and x1 from the range 0, 1, 2, . . . , p − 1.
(ii) Set $x_{n+1} = (x_n + x_{n-1}) \bmod p$ for n = 1, 2, . . . .
Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x by a multiple of p. For example, 35 mod 9 = 8 (because 8 = 35 − 3 · 9); 36 mod 9 = 0 (because 0 = 36 − 4 · 9); and −3 mod 9 = 6 (because 6 = −3 + 1 · 9).
(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, x0 = 3, and x1 = 7 until the sequence starts repeating.
(b) Show that the following formula is equivalent to step (ii) of the algorithm:

\[
\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_{n-1} \\ x_n \end{bmatrix} \bmod p \qquad \text{for } n = 1, 2, 3, \ldots
\]

(c) Use the formula in part (b) to generate the sequence of vectors for the choices p = 21, x0 = 5, and x1 = 5 until the sequence starts repeating.

Remark If we take p = 1 and pick x0 and x1 from the interval [0, 1), then the above random-number generator produces pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold’s cat map. Furthermore, if we eliminate the modular arithmetic in the algorithm and take x0 = x1 = 1, then the resulting sequence of integers is the famous Fibonacci sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . , in which each number after the first two is the sum of the preceding two numbers.

4. For $C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$, it can be verified that

\[
C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix}
\]

It can also be verified that 12,586,269,025 is divisible by 101 and that when 7,778,742,049 and 20,365,011,074 are divided by 101, the remainder is 1.
(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold’s cat map.
(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.
(c) Show that the point (1/101, 0) has period greater than 5 by iterating it five times.
(d) Show that Π(101) = 25.

5. Show that for the mapping T : S → S defined by T(x, y) = (x + 5/12, y) mod 1, every point in S is a periodic point. Why does this show that the mapping is not chaotic?

6. An Anosov automorphism on R² is a mapping from the unit square S onto S of the form

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

in which (i) a, b, c, and d are integers, (ii) the determinant of the matrix is ±1, and (iii) the eigenvalues of the matrix do not have magnitude 1. It can be shown that all Anosov automorphisms are chaotic mappings.
(a) Show that Arnold’s cat map is an Anosov automorphism.
(b) Which of the following are the matrices of an Anosov automorphism?

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 5 & 7 \\ 2 & 3 \end{bmatrix}, \quad
\begin{bmatrix} 6 & 2 \\ 5 & 2 \end{bmatrix}
\]

(c) Show that the following mapping of S onto S is not an Anosov automorphism.

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

What is the geometric effect of this transformation on S? Use your observation to show that the mapping is not a chaotic mapping by showing that all points in S are periodic points.

7. Show that Arnold’s cat map is one-to-one over the unit square S and that its range is S.

8. Show that the inverse of Arnold’s cat map is given by

\[
\Gamma^{-1}(x, y) = (2x - y,\; -x + y) \bmod 1
\]

9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold’s cat map is a transformation of the form

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} a \\ b \end{bmatrix}
\]

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the parallelogram in Figure 10.13.1d.]

10. If (x0, y0) is a point in S and (xn, yn) is its nth iterate under Arnold’s cat map, show that

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

This result implies that the modular arithmetic need only be performed once rather than after each iteration.

11. Show that (0, 0) is the only fixed point of Arnold’s cat map by showing that the only solution of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with 0 ≤ x0 < 1 and 0 ≤ y0 < 1 is x0 = y0 = 0. [Hint: For appropriate nonnegative integers, r and s, we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]

12. Find all 2-cycles of Arnold’s cat map by finding all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{2} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with 0 ≤ x0 < 1 and 0 ≤ y0 < 1. [Hint: For appropriate nonnegative integers, r and s, we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]

13. Show that every periodic point of Arnold’s cat map must be a rational point by showing that for all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

the numbers x0 and y0 are quotients of integers.

14. Let T be the Arnold’s cat map applied five times in a row; that is, T = Γ⁵. Figure Ex-14 represents four successive mappings of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image because this cat map has a period of 25. Explain how you might generate this particular sequence of images.

Figure Ex-14

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest positive integer satisfying the equation

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)} \bmod p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

This suggests that one way to determine Π(p) is to compute

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \bmod p
\]

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to compute Π(p) for p = 2, 3, . . . , 10. Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\frac{1}{2}\Pi(p)} \bmod p
\]

when Π(p) is even?

T2. The eigenvalues and eigenvectors for the cat map matrix

\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

are

\[
\lambda_1 = \frac{3+\sqrt{5}}{2}, \quad \lambda_2 = \frac{3-\sqrt{5}}{2}, \quad
v_1 = \begin{bmatrix} 1 \\[2pt] \dfrac{1+\sqrt{5}}{2} \end{bmatrix}, \quad
v_2 = \begin{bmatrix} 1 \\[2pt] \dfrac{1-\sqrt{5}}{2} \end{bmatrix}
\]

Using these eigenvalues and eigenvectors, we can define

\[
D = \begin{bmatrix} \dfrac{3+\sqrt{5}}{2} & 0 \\[4pt] 0 & \dfrac{3-\sqrt{5}}{2} \end{bmatrix}
\quad \text{and} \quad
P = \begin{bmatrix} 1 & 1 \\[2pt] \dfrac{1+\sqrt{5}}{2} & \dfrac{1-\sqrt{5}}{2} \end{bmatrix}
\]

and write $C = PDP^{-1}$; hence, $C^n = PD^nP^{-1}$. Use a computer to show that

\[
C^n = \begin{bmatrix} c_{11}^{(n)} & c_{12}^{(n)} \\[2pt] c_{21}^{(n)} & c_{22}^{(n)} \end{bmatrix}
\]

where

\[
c_{11}^{(n)} = \frac{1+\sqrt{5}}{2\sqrt{5}} \left( \frac{3-\sqrt{5}}{2} \right)^{n} - \frac{1-\sqrt{5}}{2\sqrt{5}} \left( \frac{3+\sqrt{5}}{2} \right)^{n}
\]

\[
c_{22}^{(n)} = \frac{1+\sqrt{5}}{2\sqrt{5}} \left( \frac{3+\sqrt{5}}{2} \right)^{n} - \frac{1-\sqrt{5}}{2\sqrt{5}} \left( \frac{3-\sqrt{5}}{2} \right)^{n}
\]

and

\[
c_{12}^{(n)} = c_{21}^{(n)} = \frac{1}{\sqrt{5}} \left[ \left( \frac{3+\sqrt{5}}{2} \right)^{n} - \left( \frac{3-\sqrt{5}}{2} \right)^{n} \right]
\]

How can you use these results and your conclusions in Exercise T1 to simplify the method for computing Π(p)?

10.14 Cryptography

In this section we present a method of encoding and decoding messages. We also examine modular arithmetic and show how Gaussian elimination can sometimes be used to break an opponent’s code.

PREREQUISITES: Matrices; Gaussian Elimination; Matrix Operations; Linear Independence; Linear Transformations (Section 4.9)

Ciphers

The study of encoding and decoding secret messages is called cryptography. Although secret codes date to the earliest days of written communication, there has been a recent surge of interest in the subject because of the need to maintain the privacy of information transmitted over public lines of communication. In the language of cryptography, codes are called ciphers, uncoded messages are called plaintext, and coded messages are called ciphertext. The process of converting from plaintext to ciphertext is called enciphering, and the reverse process of converting from ciphertext to plaintext is called deciphering.

The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For example, in the substitution cipher

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher the plaintext message

ROME WAS NOT BUILT IN A DAY

becomes

URPH ZDV QRW EXLOW LQ D GDB

Hill Ciphers

A disadvantage of substitution ciphers is that they preserve the frequencies of individual letters, making it relatively easy to break the code by statistical methods. One way to overcome this problem is to divide the plaintext into groups of letters and encipher the plaintext group by group, rather than one letter at a time. A system of cryptography in which the plaintext is divided into sets of n letters, each of which is replaced by a set of n cipher letters, is called a polygraphic system. In this section we will study a class of polygraphic systems based on matrix transformations. [The ciphers that we will discuss are called Hill ciphers after Lester S. Hill, who introduced them in two papers: “Cryptography in an Algebraic Alphabet,” American Mathematical Monthly, 36 (June–July 1929), pp. 306–312; and “Concerning Certain Linear Transformation Apparatus of Cryptography,” American Mathematical Monthly, 38 (March 1931), pp. 135–154.]

In the discussion to follow, we assume that each plaintext and ciphertext letter except Z is assigned the numerical value that specifies its position in the standard alphabet (Table 1). For reasons that will become clear later, Z is assigned a value of zero.

Table 1

A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0

In the simplest Hill ciphers, successive pairs of plaintext are transformed into ciphertext by the following procedure:

Step 1. Choose a 2 × 2 matrix with integer entries

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

to perform the encoding. Certain additional conditions on A will be imposed later.

Step 2. Group successive plaintext letters into pairs, adding an arbitrary “dummy” letter to fill out the last pair if the plaintext has an odd number of letters, and replace each plaintext letter by its numerical value.

Step 3. Successively convert each plaintext pair p1 p2 into a column vector

\[
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
\]

and form the product Ap. We will call p a plaintext vector and Ap the corresponding ciphertext vector.

Step 4. Convert each ciphertext vector into its alphabetic equivalent.

EXAMPLE 1 Hill Cipher of a Message

Use the matrix

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}
\]

to obtain the Hill cipher for the plaintext message

I AM HIDING

Solution If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain

IA   MH   ID   IN   GG

or, equivalently, from Table 1,

9 1   13 8   9 4   9 14   7 7

To encipher the pair IA, we form the matrix product

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ 3 \end{bmatrix}
\]

which, from Table 1, yields the ciphertext KC. To encipher the pair MH, we form the product

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 13 \\ 8 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \end{bmatrix} \tag{1}
\]

However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve this problem, we make the following agreement: Whenever an integer greater than 25 occurs, it will be replaced by the remainder that results when this integer is divided by 26. Because the remainder after division by 26 is one of the integers 0, 1, 2, . . . , 25, this procedure will always yield an integer with an alphabet equivalent. Thus, in (1) we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1 that the ciphertext for the pair MH is CX.

The computations for the remaining ciphertext vectors are

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 12 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 14 \end{bmatrix} = \begin{bmatrix} 37 \\ 42 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 11 \\ 16 \end{bmatrix}
\]

\[
\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 7 \\ 7 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix}
\]

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext message is

KC   CX   QL   KP   UU

which would usually be transmitted as a single string without spaces:

KCCXQLKPUU

Because the plaintext was grouped in pairs and enciphered by a 2 × 2 matrix, the Hill cipher in Example 1 is referred to as a Hill 2-cipher. It is obviously also possible to group the plaintext in triples and encipher by a 3 × 3 matrix with integer entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and enciphered by an n × n matrix with integer entries.

Modular Arithmetic

In Example 1, integers greater than 25 were replaced by their remainders after division by 26. This technique of working with remainders is at the core of a body of mathematics called modular arithmetic. Because of its importance in cryptography, we will digress for a moment to touch on some of the main ideas in this area.
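The enciphering procedure of Example 1, including the replacement of integers greater than 25 by their remainders, can be sketched in a few lines. The function names below are illustrative, the input is assumed to contain only the letters A to Z and spaces, and the dummy letter is chosen by repeating the last plaintext letter, which matches the choice of G in Example 1 but is otherwise arbitrary.

```python
A = ((1, 2), (0, 3))  # enciphering matrix of Example 1


def letter_to_number(ch):
    """Table 1: A = 1, B = 2, ..., Y = 25, Z = 0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(n):
    """Inverse of Table 1 for a residue n in 0..25."""
    return "Z" if n % 26 == 0 else chr(ord("A") + n % 26 - 1)


def hill2_encipher(plaintext, key=A):
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    if len(letters) % 2:
        letters.append(letters[-1])  # dummy letter to fill out the last pair
    out = []
    for i in range(0, len(letters), 2):
        p1, p2 = letter_to_number(letters[i]), letter_to_number(letters[i + 1])
        # Matrix product Ap, with each entry reduced to its remainder mod 26.
        c1 = (key[0][0] * p1 + key[0][1] * p2) % 26
        c2 = (key[1][0] * p1 + key[1][1] * p2) % 26
        out += [number_to_letter(c1), number_to_letter(c2)]
    return "".join(out)


print(hill2_encipher("I AM HIDING"))
```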


In modular arithmetic we are given a positive integer m, called the modulus, and any two integers whose difference is an integer multiple of the modulus are regarded as “equal” or “equivalent” with respect to the modulus. More precisely, we make the following definition.

DEFINITION 1 If m is a positive integer and a and b are any integers, then we say that a is equivalent to b modulo m, written

a = b (mod m)

if a − b is an integer multiple of m.

EXAMPLE 2 Various Equivalences

7 = 2 (mod 5)
19 = 3 (mod 2)
−1 = 25 (mod 26)
12 = 0 (mod 4)

For any modulus m it can be proved that every integer a is equivalent, modulo m, to exactly one of the integers

0, 1, 2, . . . , m − 1

We call this integer the residue of a modulo m, and we write

\[
Z_m = \{0, 1, 2, \ldots, m-1\}
\]

to denote the set of residues modulo m. If a is a nonnegative integer, then its residue modulo m is simply the remainder that results when a is divided by m. For an arbitrary integer a, the residue can be found using the following theorem.

THEOREM 10.14.1 For any integer a and modulus m, let

\[
R = \text{remainder of } \frac{|a|}{m}
\]

Then the residue r of a modulo m is given by

\[
r = \begin{cases} R & \text{if } a \ge 0 \\ m - R & \text{if } a < 0 \text{ and } R \ne 0 \\ 0 & \text{if } a < 0 \text{ and } R = 0 \end{cases}
\]

EXAMPLE 3 Residues mod 26

Find the residue modulo 26 of (a) 87, (b) −38, and (c) −26.

Solution (a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus,

87 = 9 (mod 26)

Solution (b) Dividing |−38| = 38 by 26 yields a remainder of R = 12, so r = 26 − 12 = 14. Thus,

−38 = 14 (mod 26)

Solution (c) Dividing |−26| = 26 by 26 yields a remainder of R = 0. Thus,

−26 = 0 (mod 26)
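Theorem 10.14.1 translates directly into code. The sketch below follows the three cases of the theorem; the function name `residue` is illustrative. In Python the built-in `%` operator already returns a value in 0, 1, . . . , m − 1 for a positive modulus, so the loop at the end compares the two.

```python
def residue(a, m):
    """Residue of a modulo m, computed case by case as in Theorem 10.14.1."""
    R = abs(a) % m  # remainder of |a| / m
    if a >= 0:
        return R
    return m - R if R != 0 else 0


for a in (87, -38, -26):
    print(a, residue(a, 26), a % 26)  # theorem versus Python's % operator
```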

In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by $a^{-1}$, such that

\[
a a^{-1} = a^{-1} a = 1
\]

In modular arithmetic we have the following corresponding concept:

DEFINITION 2 If a is a number in $Z_m$, then a number $a^{-1}$ in $Z_m$ is called a reciprocal or multiplicative inverse of a modulo m if $a a^{-1} = a^{-1} a = 1 \pmod{m}$.

It can be proved that if a and m have no common prime factors, then a has a unique reciprocal modulo m; conversely, if a and m have a common prime factor, then a has no reciprocal modulo m.

EXAMPLE 4 Reciprocal of 3 mod 26

The number 3 has a reciprocal modulo 26 because 3 and 26 have no common prime factors. This reciprocal can be obtained by finding the number x in $Z_{26}$ that satisfies the modular equation

3x = 1 (mod 26)

Although there are general methods for solving such modular equations, it would take us too far afield to study them. However, because 26 is relatively small, this equation can be solved by trying the possible solutions, 0 to 25, one at a time. With this approach we find that x = 9 is the solution, because

3 · 9 = 27 = 1 (mod 26)

Thus,

\[
3^{-1} = 9 \pmod{26}
\]
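The trial-and-error search used in Example 4 is easy to automate. The following sketch tries each candidate in turn; the function name `reciprocal` and the choice to return `None` when no reciprocal exists are illustrative conventions, not part of the text.

```python
def reciprocal(a, m):
    """Return x in 0..m-1 with a*x = 1 (mod m), or None if no such x exists."""
    for x in range(m):
        if (a * x) % m == 1:
            return x
    return None


print(reciprocal(3, 26))  # the search of Example 4

# All residues modulo 26 that have a reciprocal, with their reciprocals.
table = {a: reciprocal(a, 26) for a in range(26) if reciprocal(a, 26) is not None}
print(table)
```

For a modulus as small as 26 the exhaustive search is adequate; larger moduli would call for one of the general methods the text alludes to.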

EXAMPLE 5 A Number with No Reciprocal mod 26

The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise 9).

For future reference, in Table 2 we provide the following reciprocals modulo 26:

Table 2 Reciprocals Modulo 26

a      1   3   5   7   9   11  15  17  19  21  23  25
a⁻¹    1   9   21  15  3   19  7   23  11  5   17  25

Deciphering

Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse (mod 26) of the enciphering matrix. To be precise, if m is a positive integer, then a square matrix A with entries in $Z_m$ is said to be invertible modulo m if there is a matrix B with entries in $Z_m$ such that

\[
AB = BA = I \pmod{m}
\]

Suppose now that

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

\[
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
\]

is a plaintext vector, then

\[
\mathbf{c} = A\mathbf{p} \pmod{26}
\]

is the corresponding ciphertext vector and

\[
\mathbf{p} = A^{-1}\mathbf{c} \pmod{26}
\]

Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by $A^{-1}$ (mod 26).

In cryptography it is important to know which matrices are invertible modulo 26 and how to obtain their inverses. We now investigate these questions. In ordinary arithmetic, a square matrix A is invertible if and only if det(A) ≠ 0, or, equivalently, if and only if det(A) has a reciprocal. The following theorem is the analog of this result in modular arithmetic.

THEOREM 10.14.2 A square matrix A with entries in $Z_m$ is invertible modulo m if and only if the residue of det(A) modulo m has a reciprocal modulo m.

Because the residue of det(A) modulo m will have a reciprocal modulo m if and only if this residue and m have no common prime factors, we have the following corollary.

COROLLARY 10.14.3 A square matrix A with entries in $Z_m$ is invertible modulo m if and only if m and the residue of det(A) modulo m have no common prime factors.

Because the only prime factors of m = 26 are 2 and 13, we have the following corollary, which is useful in cryptography.

COROLLARY 10.14.4 A square matrix A with entries in $Z_{26}$ is invertible modulo 26 if and only if the residue of det(A) modulo 26 is not divisible by 2 or 13.

We leave it for you to verify that if

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

has entries in $Z_{26}$ and the residue of det(A) = ad − bc modulo 26 is not divisible by 2 or 13, then the inverse of A (mod 26) is given by

\[
A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \tag{2}
\]

where $(ad - bc)^{-1}$ is the reciprocal of the residue of ad − bc (mod 26).

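EXAMPLE 6 Inverse of a Matrix mod 26

Find the inverse of

\[
A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix}
\]

modulo 26.

Solution

\[
\det(A) = ad - bc = 5 \cdot 3 - 6 \cdot 2 = 3
\]

so from Table 2,

\[
(ad - bc)^{-1} = 3^{-1} = 9 \pmod{26}
\]

Thus, from (2),

\[
A^{-1} = 9 \begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} = \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26}
\]

As a check,

\[
AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26}
\]

Similarly, $A^{-1}A = I$.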
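Formula (2) can be turned into a short routine. The sketch below is one way to do it; the function names are illustrative, the reciprocal of the determinant is found by exhaustive search, and the invertibility test is the one in Corollary 10.14.4.

```python
def inverse_mod26(A):
    """Inverse modulo 26 of a 2 x 2 integer matrix ((a, b), (c, d)), using formula (2)."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % 26  # residue of the determinant
    if det % 2 == 0 or det % 13 == 0:
        raise ValueError("matrix is not invertible modulo 26")
    det_inv = next(x for x in range(26) if (det * x) % 26 == 1)
    return (
        ((det_inv * d) % 26, (det_inv * -b) % 26),
        ((det_inv * -c) % 26, (det_inv * a) % 26),
    )


def matmul_mod26(X, Y):
    """Product of two 2 x 2 matrices with entries reduced modulo 26."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) % 26 for j in range(2))
        for i in range(2)
    )


A = ((5, 6), (2, 3))  # matrix of Example 6
A_inv = inverse_mod26(A)
print(A_inv)
print(matmul_mod26(A, A_inv), matmul_mod26(A_inv, A))
```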

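EXAMPLE 7 Decoding a Hill 2-Cipher

Decode the following Hill 2-cipher, which was enciphered by the matrix in Example 6:

GTNKGKDUSK

Solution From Table 1 the numerical equivalent of this ciphertext is

7 20    14 11    7 11    4 21    19 11

To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

\[
\begin{aligned}
\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 20 \end{bmatrix} &= \begin{bmatrix} 487 \\ 436 \end{bmatrix} = \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26} \\
\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 14 \\ 11 \end{bmatrix} &= \begin{bmatrix} 278 \\ 321 \end{bmatrix} = \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26} \\
\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 11 \end{bmatrix} &= \begin{bmatrix} 271 \\ 265 \end{bmatrix} = \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26} \\
\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 4 \\ 21 \end{bmatrix} &= \begin{bmatrix} 508 \\ 431 \end{bmatrix} = \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26} \\
\begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} &= \begin{bmatrix} 283 \\ 361 \end{bmatrix} = \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26}
\end{aligned}
\]

From Table 1, the alphabet equivalents of these vectors are

ST RI KE NO WW

which yields the message STRIKE NOW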
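The decoding steps of Example 7 can be sketched as follows. The letter numbering A = 1, B = 2, ..., Y = 25, Z = 0 is assumed to match Table 1 (it agrees with the numerical equivalents used in the examples), and the function names are hypothetical.

```python
def letter_to_number(ch):
    """Table 1 numbering (assumed): A=1, B=2, ..., Y=25, Z=0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    """Inverse of letter_to_number; the residue 0 maps back to Z."""
    return chr((k - 1) % 26 + ord("A"))


def decode_hill2(ciphertext, a_inv, m=26):
    """Multiply each ciphertext pair on the left by the deciphering matrix."""
    nums = [letter_to_number(ch) for ch in ciphertext]
    plain = []
    for i in range(0, len(nums), 2):
        c1, c2 = nums[i], nums[i + 1]
        plain.append((a_inv[0][0] * c1 + a_inv[0][1] * c2) % m)
        plain.append((a_inv[1][0] * c1 + a_inv[1][1] * c2) % m)
    return "".join(number_to_letter(k) for k in plain)


# Deciphering matrix from Example 6
print(decode_hill2("GTNKGKDUSK", [[1, 24], [8, 19]]))  # STRIKENOWW
```

The trailing W is the padding letter that completed the last pair when the message was enciphered.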

Breaking a Hill Cipher

Because the purpose of enciphering messages and information is to prevent “opponents” from learning their contents, cryptographers are concerned with the security of their ciphers—that is, how readily they can be broken (deciphered by their opponents). We will conclude this section by discussing one technique for breaking Hill ciphers. Suppose that you are able to obtain some corresponding plaintext and ciphertext from an opponent’s message. For example, on examining some intercepted ciphertext, you may be able to deduce that the message is a letter that begins DEAR SIR. We will show that with a small amount of such data, it may be possible to determine the deciphering matrix of a Hill code and consequently obtain access to the rest of the message. It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This principle suggests that if we have a Hill n-cipher, and if p1 , p2 , . . . , pn are linearly independent plaintext vectors whose corresponding ciphertext vectors

Ap1 , Ap2 , . . . , Apn are known, then there is enough information available to determine the matrix A and hence A−1 (mod m). The following theorem, whose proof is discussed in the exercises, provides a way to do this. THEOREM 10.14.5 Determining the Deciphering Matrix

Let $p_1, p_2, \ldots, p_n$ be linearly independent plaintext vectors, and let $c_1, c_2, \ldots, c_n$ be the corresponding ciphertext vectors in a Hill n-cipher. If

\[
P = \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_n^T \end{bmatrix}
\]

is the n × n matrix with row vectors $p_1^T, p_2^T, \ldots, p_n^T$ and if

\[
C = \begin{bmatrix} c_1^T \\ c_2^T \\ \vdots \\ c_n^T \end{bmatrix}
\]

is the n × n matrix with row vectors $c_1^T, c_2^T, \ldots, c_n^T$, then the sequence of elementary row operations that reduces C to I transforms P to $(A^{-1})^T$.

This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we must find a sequence of row operations that reduces C to I and then perform this same sequence of operations on P. The following example illustrates a simple algorithm for doing this.

EXAMPLE 8 Using Theorem 10.14.5

The following Hill 2-cipher is intercepted:

IOSBTGXESPXHOPDE

Decipher the message, given that it starts with the word DEAR.

Solution From Table 1, the numerical equivalent of the known plaintext is

DE     AR
4 5    1 18

and the numerical equivalent of the corresponding ciphertext is

IO      SB
9 15    19 2

so the corresponding plaintext and ciphertext vectors are

\[
p_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow c_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix},
\qquad
p_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow c_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix}
\]

We want to reduce

\[
C = \begin{bmatrix} c_1^T \\ c_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix}
\]

to I by elementary row operations and simultaneously apply these operations to

\[
P = \begin{bmatrix} p_1^T \\ p_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix}
\]

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by adjoining P to the right of C and applying row operations to the resulting matrix [C | P] until the left side is reduced to I. The final matrix will then have the form $[I \mid (A^{-1})^T]$. The computations can be carried out as follows:



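\[
\left[\begin{array}{cc|cc} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right]
\]
We formed the matrix [C | P].

\[
\left[\begin{array}{cc|cc} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right]
\]
We multiplied the first row by $9^{-1} = 3$.

\[
\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right]
\]
We replaced 45 by its residue modulo 26.

\[
\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right]
\]
We added −19 times the first row to the second.

\[
\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right]
\]
We replaced the entries in the second row by their residues modulo 26.

\[
\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 147 & 399 \end{array}\right]
\]
We multiplied the second row by $5^{-1} = 21$.

\[
\left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right]
\]
We replaced the entries in the second row by their residues modulo 26.

\[
\left[\begin{array}{cc|cc} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right]
\]
We added −19 times the second row to the first.

\[
\left[\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right]
\]
We replaced the entries in the first row by their residues modulo 26.

Thus,

\[
(A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix}
\]

so the deciphering matrix is

\[
A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix}
\]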

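To decipher the message, we first group the ciphertext into pairs and find the numerical equivalent of each letter:

IO      SB      TG      XE      SP       XH      OP       DE
9 15    19 2    20 7    24 5    19 16    24 8    15 16    4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet equivalents of the resulting plaintext pairs (all products are taken mod 26):

\[
\begin{aligned}
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 9 \\ 15 \end{bmatrix} &= \begin{bmatrix} 4 \\ 5 \end{bmatrix} && \text{D E} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 2 \end{bmatrix} &= \begin{bmatrix} 1 \\ 18 \end{bmatrix} && \text{A R} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 20 \\ 7 \end{bmatrix} &= \begin{bmatrix} 9 \\ 11 \end{bmatrix} && \text{I K} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 5 \end{bmatrix} &= \begin{bmatrix} 5 \\ 19 \end{bmatrix} && \text{E S} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 16 \end{bmatrix} &= \begin{bmatrix} 5 \\ 14 \end{bmatrix} && \text{E N} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 8 \end{bmatrix} &= \begin{bmatrix} 4 \\ 20 \end{bmatrix} && \text{D T} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 15 \\ 16 \end{bmatrix} &= \begin{bmatrix} 1 \\ 14 \end{bmatrix} && \text{A N} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} &= \begin{bmatrix} 11 \\ 19 \end{bmatrix} && \text{K S}
\end{aligned}
\pmod{26}
\]

Finally, we construct the message from the plaintext pairs:

DE AR IK ES EN DT AN KS

DEAR IKE SEND TANKS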
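The procedure of Theorem 10.14.5 can be sketched as a Gauss-Jordan reduction of [C | P] with all arithmetic done modulo 26. The sketch below is not a general solver: because Z26 is not a field, it assumes that each column offers a pivot with a reciprocal modulo 26 and raises an error otherwise. All function names are hypothetical, and the letter numbering A = 1, ..., Y = 25, Z = 0 is assumed to match Table 1.

```python
from math import gcd

M = 26


def to_numbers(text):
    """Table 1 numbering (assumed): A=1, ..., Y=25, Z=0."""
    return [(ord(ch) - ord("A") + 1) % M for ch in text]


def to_letters(nums):
    return "".join(chr((k - 1) % M + ord("A")) for k in nums)


def reduce_augmented(C, P, m=M):
    """Row-reduce [C | P] modulo m until the left block is I; return the right block."""
    n = len(C)
    aug = [[x % m for x in C[i]] + [x % m for x in P[i]] for i in range(n)]
    for col in range(n):
        # Choose a pivot row whose entry has a reciprocal modulo m.
        pivot = next((r for r in range(col, n) if gcd(aug[r][col], m) == 1), None)
        if pivot is None:
            raise ValueError("no invertible pivot in column %d" % col)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, m)
        aug[col] = [(inv * x) % m for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [(x - factor * y) % m for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def deciphering_matrix(known_plain, known_cipher, n=2):
    """Build P and C from n*n known letters and return A^{-1} (mod 26)."""
    p, c = to_numbers(known_plain), to_numbers(known_cipher)
    P = [p[i:i + n] for i in range(0, n * n, n)]
    C = [c[i:i + n] for i in range(0, n * n, n)]
    inv_transposed = reduce_augmented(C, P)
    return [list(col) for col in zip(*inv_transposed)]  # transpose


def decipher(ciphertext, a_inv, m=M):
    n = len(a_inv)
    nums = to_numbers(ciphertext)
    out = []
    for i in range(0, len(nums), n):
        block = nums[i:i + n]
        out.extend(sum(a_inv[r][k] * block[k] for k in range(n)) % m
                   for r in range(n))
    return to_letters(out)


A_inv = deciphering_matrix("DEAR", "IOSB")
print(A_inv)                                # [[1, 17], [0, 9]]
print(decipher("IOSBTGXESPXHOPDE", A_inv))  # DEARIKESENDTANKS
```

The intermediate matrices differ from those in Example 8 only in that residues are taken after every row operation rather than in a separate step.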

FURTHER READINGS

Readers interested in learning more about mathematical cryptography are referred to the following books, the first of which is elementary and the second more advanced.

1. ABRAHAM SINKOV, Elementary Cryptanalysis, a Mathematical Approach (Mathematical Association of America, 2009).
2. ALAN G. KONHEIM, Cryptography, a Primer (New York: Wiley-Interscience, 1981).

Exercise Set 10.14

1. Obtain the Hill cipher of the message

DARK NIGHT

for each of the following enciphering matrices:

(a) $\begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$   (b) $\begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix}$

2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work by verifying that $AA^{-1} = A^{-1}A = I$ (mod 26).

(a) $A = \begin{bmatrix} 9 & 1 \\ 7 & 2 \end{bmatrix}$   (b) $A = \begin{bmatrix} 3 & 1 \\ 5 & 3 \end{bmatrix}$   (c) $A = \begin{bmatrix} 8 & 11 \\ 1 & 9 \end{bmatrix}$

(d) $A = \begin{bmatrix} 2 & 1 \\ 1 & 7 \end{bmatrix}$   (e) $A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix}$   (f) $A = \begin{bmatrix} 1 & 8 \\ 1 & 3 \end{bmatrix}$

3. Decode the message

SAKNOXAOJX

given that it is a Hill cipher with enciphering matrix

\[
\begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix}
\]

4. A Hill 2-cipher is intercepted that starts with the pairs

SL HK

Find the deciphering and enciphering matrices, given that the plaintext is known to start with the word ARMY.

5. Decode the following Hill 2-cipher if the last four plaintext letters are known to be ATOM.

LNGIHGYBVRENJYQO

6. Decode the following Hill 3-cipher if the first nine plaintext letters are IHAVECOME:

HPAFQGGDUGDDHPGODYNOR

7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus, for example, 1 + 1 = 0 (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to form the three vectors

\[
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
\]

and let us take

\[
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\]

as our enciphering matrix.
(a) Find the encoded message.
(b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message.

8. If, in addition to the standard alphabet, a period, comma, and question mark were allowed, then 29 plaintext and ciphertext symbols would be available and all matrix arithmetic would be done modulo 29. Under what conditions would a matrix with entries in Z29 be invertible modulo 29?

9. Show that the modular equation 4x = 1 (mod 26) has no solution in Z26 by successively substituting the values x = 0, 1, 2, ..., 25.

10. (a) Let P and C be the matrices in Theorem 10.14.5. Show that $P = C(A^{-1})^T$.
(b) To prove Theorem 10.14.5, let $E_1, E_2, \ldots, E_n$ be the elementary matrices that correspond to the row operations that reduce C to I, so

\[
E_n \cdots E_2 E_1 C = I
\]

Show that

\[
E_n \cdots E_2 E_1 P = (A^{-1})^T
\]

from which it follows that the same sequence of row operations that reduces C to I converts P to $(A^{-1})^T$.

11. (a) If A is the enciphering matrix of a Hill n-cipher, show that

\[
A^{-1} = (C^{-1}P)^T \pmod{26}
\]

where C and P are the matrices defined in Theorem 10.14.5.
(b) Instead of using Theorem 10.14.5 as in the text, find the deciphering matrix $A^{-1}$ of Example 8 by using the result in part (a) and Equation (2) to compute $C^{-1}$. [Note: Although this method is practical for Hill 2-ciphers, Theorem 10.14.5 is more efficient for Hill n-ciphers with n > 2.]

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer n, let $S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set of all positive integers less than n and relatively prime to n. For example, if n = 9, then

\[
S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\}
\]

(a) Construct a table consisting of n and $S_n$ for n = 2, 3, ..., 15, and then compute

\[
\sum_{k=1}^{m} a_k \quad \text{and} \quad \sum_{k=1}^{m} a_k \pmod{n}
\]

in each case. Draw a conjecture for n > 15 and prove your conjecture to be true. [Hint: Use the fact that if a is relatively prime to n, then n − a is also relatively prime to n.]

(b) Given a positive integer n and the set $S_n$, let $P_n$ be the m × m matrix

\[
P_n = \begin{bmatrix}
a_1 & a_2 & a_3 & \cdots & a_{m-1} & a_m \\
a_2 & a_3 & a_4 & \cdots & a_m & a_1 \\
a_3 & a_4 & a_5 & \cdots & a_1 & a_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m-1} & a_m & a_1 & \cdots & a_{m-3} & a_{m-2} \\
a_m & a_1 & a_2 & \cdots & a_{m-2} & a_{m-1}
\end{bmatrix}
\]

so that, for example,

\[
P_9 = \begin{bmatrix}
1 & 2 & 4 & 5 & 7 & 8 \\
2 & 4 & 5 & 7 & 8 & 1 \\
4 & 5 & 7 & 8 & 1 & 2 \\
5 & 7 & 8 & 1 & 2 & 4 \\
7 & 8 & 1 & 2 & 4 & 5 \\
8 & 1 & 2 & 4 & 5 & 7
\end{bmatrix}
\]

Use a computer to compute $\det(P_n)$ and $\det(P_n)$ (mod n) for n = 2, 3, ..., 15, and then use these results to construct a conjecture.

(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first m − 1 rows of $P_n$ to its last row and then use Theorem 2.2.3.] What do these results imply about the inverse of $P_n$ (mod n)?

T2. Given a positive integer n greater than 1, the number of positive integers less than n and relatively prime to n is called the Euler phi function of n and is denoted by ϕ(n). For example, ϕ(6) = 2 since only two positive integers (1 and 5) are less than 6 and have no common factor with 6.

(a) Using a computer, for each value of n = 2, 3, ..., 25 compute and print out all positive integers that are less than n and relatively prime to n. Then use these integers to determine the values of ϕ(n) for n = 2, 3, ..., 25. Can you discover a pattern in the results?

(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct prime factors of n, then

\[
\varphi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \left(1 - \frac{1}{p_3}\right) \cdots \left(1 - \frac{1}{p_m}\right)
\]

For example, since {2, 3} are the distinct prime factors of 12, we have

\[
\varphi(12) = 12 \left(1 - \frac{1}{2}\right) \left(1 - \frac{1}{3}\right) = 4
\]

which agrees with the fact that {1, 5, 7, 11} are the only positive integers less than 12 and relatively prime to 12. Using a computer, print out all the prime factors of n for n = 2, 3, ..., 25. Then compute ϕ(n) using the formula above and compare it to your results in part (a).

10.15 Genetics In this section we investigate the propagation of an inherited trait in successive generations by computing powers of a matrix.

PREREQUISITES: Eigenvalues and Eigenvectors; Diagonalization of a Matrix; Intuitive Understanding of Limits

Inheritance Traits

In this section we examine the inheritance of traits in animals or plants. The inherited trait under consideration is assumed to be governed by a set of two genes, which we designate by A and a . Under autosomal inheritance each individual in the population of either gender possesses two of these genes, the possible pairings being designated AA, Aa , and aa . This pair of genes is called the individual’s genotype, and it determines how the trait controlled by the genes is manifested in the individual. For example, in snapdragons a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this case we say that gene A dominates gene a , or that gene a is recessive to gene A, because genotype Aa has the same outward trait as genotype AA. In addition to autosomal inheritance we will also discuss X -linked inheritance. In this type of inheritance, the male of the species possesses only one of the two possible

genes (A or a), and the female possesses a pair of the two genes (AA, Aa, or aa). In humans, color blindness, hereditary baldness, hemophilia, and muscular dystrophy, to name a few, are traits controlled by X-linked inheritance. Below we explain the manner in which the genes of the parents are passed on to their offspring for the two types of inheritance. We construct matrix models that give the probable genotypes of the offspring in terms of the genotypes of the parents, and we use these matrix models to follow the genotype distribution of a population through successive generations.

Autosomal Inheritance

In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all possible combinations of the genotypes of the parents.

Table 1

Genotype of                  Genotypes of Parents
Offspring      AA–AA   AA–Aa   AA–aa   Aa–Aa   Aa–aa   aa–aa
AA               1      1/2      0      1/4      0       0
Aa               0      1/2      1      1/2     1/2      0
aa               0       0       0      1/4     1/2      1

EXAMPLE 1 Distribution of Genotypes in a Population

Suppose that a farmer has a large population of plants consisting of some distribution of all three possible genotypes AA, Aa, and aa. The farmer desires to undertake a breeding program in which each plant in the population is always fertilized with a plant of genotype AA and is then replaced by one of its offspring. We want to derive an expression for the distribution of the three possible genotypes in the population after any number of generations. For n = 0, 1, 2, ..., let us set

$a_n$ = fraction of plants of genotype AA in nth generation
$b_n$ = fraction of plants of genotype Aa in nth generation
$c_n$ = fraction of plants of genotype aa in nth generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

\[
a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots
\]

From Table 1 we can determine the genotype distribution of each generation from the genotype distribution of the preceding generation by the following equations:

\[
\begin{aligned}
a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\
b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\
c_n &= 0
\end{aligned}
\qquad n = 1, 2, \ldots \tag{1}
\]

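For example, the first of these three equations states that all the offspring of a plant of genotype AA will be of genotype AA under this breeding program and that half of the offspring of a plant of genotype Aa will be of genotype AA. Equations (1) can be written in matrix notation as

\[
\mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \tag{2}
\]

where

\[
\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad
\mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad \text{and} \quad
M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}
\]

Note that the three columns of the matrix M are the same as the first three columns of Table 1. From Equation (2) it follows that

\[
\mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)} = M^2 \mathbf{x}^{(n-2)} = \cdots = M^n \mathbf{x}^{(0)} \tag{3}
\]

Consequently, if we can find an explicit expression for $M^n$, we can use (3) to obtain an explicit expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize M. That is, we find an invertible matrix P and a diagonal matrix D such that

\[
M = PDP^{-1} \tag{4}
\]

With such a diagonalization, we then have (see Exercise 1)

\[
M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots
\]

where

\[
D^n = \begin{bmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k
\end{bmatrix}^n
= \begin{bmatrix}
\lambda_1^n & 0 & \cdots & 0 \\
0 & \lambda_2^n & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k^n
\end{bmatrix}
\]

The diagonalization of M is accomplished by finding its eigenvalues and corresponding eigenvectors. These are as follows (verify):

\[
\text{Eigenvalues:} \quad \lambda_1 = 1, \quad \lambda_2 = \tfrac{1}{2}, \quad \lambda_3 = 0
\]

\[
\text{Corresponding eigenvectors:} \quad
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}
\]

Thus, in Equation (4) we have

\[
D = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]

and

\[
P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \mathbf{v}_3] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}
\]

Therefore,

\[
\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} =
\begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}
\]

or

\[
\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}
= \begin{bmatrix}
1 & 1 - \left(\tfrac{1}{2}\right)^n & 1 - \left(\tfrac{1}{2}\right)^{n-1} \\
0 & \left(\tfrac{1}{2}\right)^n & \left(\tfrac{1}{2}\right)^{n-1} \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}
= \begin{bmatrix}
a_0 + b_0 + c_0 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\
\left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\
0
\end{bmatrix}
\]

Using the fact that $a_0 + b_0 + c_0 = 1$, we thus have

\[
\begin{aligned}
a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 - \left(\tfrac{1}{2}\right)^{n-1} c_0 \\
b_n &= \left(\tfrac{1}{2}\right)^n b_0 + \left(\tfrac{1}{2}\right)^{n-1} c_0 \\
c_n &= 0
\end{aligned}
\qquad n = 1, 2, \ldots \tag{5}
\]

These are explicit formulas for the fractions of the three genotypes in the nth generation of plants in terms of the initial genotype fractions. Because $\left(\tfrac{1}{2}\right)^n$ tends to zero as n approaches infinity, it follows from these equations that

\[
a_n \to 1, \quad b_n \to 0, \quad c_n = 0
\]

as n approaches infinity. That is, in the limit all plants in the population will be genotype AA.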
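As a numerical check, the closed form (5) can be compared with direct iteration of Equation (2). The sketch below uses exact rational arithmetic from the standard library; the initial distribution (1/4, 1/2, 1/4) is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from fractions import Fraction

# Transition matrix M of Example 1 (each plant is fertilized with genotype AA)
M_EXAMPLE1 = [
    [Fraction(1), Fraction(1, 2), Fraction(0)],
    [Fraction(0), Fraction(1, 2), Fraction(1)],
    [Fraction(0), Fraction(0), Fraction(0)],
]


def iterate(matrix, x0, n):
    """Apply x(k) = M x(k-1) a total of n times, starting from x0."""
    x = list(x0)
    for _ in range(n):
        x = [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]
    return x


def closed_form(x0, n):
    """Formulas (5); valid for n >= 1 and a0 + b0 + c0 = 1."""
    _, b0, c0 = x0
    half = Fraction(1, 2)
    a_n = 1 - half**n * b0 - half**(n - 1) * c0
    b_n = half**n * b0 + half**(n - 1) * c0
    return [a_n, b_n, Fraction(0)]


x0 = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]  # assumed initial fractions
for n in (1, 2, 5, 10):
    print(n, iterate(M_EXAMPLE1, x0, n) == closed_form(x0, n))
```

Because the arithmetic is exact, the two computations agree term by term rather than only to rounding error.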

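EXAMPLE 2 Modifying Example 1

We can modify Example 1 so that instead of each plant being fertilized with one of genotype AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in Example 1, we then find

\[
\mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)}
\]

where

\[
M = \begin{bmatrix} 1 & \tfrac{1}{4} & 0 \\ 0 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{4} & 1 \end{bmatrix}
\]

The columns of this new matrix M are the same as the columns of Table 1 corresponding to parents with genotypes AA–AA, Aa–Aa, and aa–aa. The eigenvalues of M are (verify)

\[
\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}
\]

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace, and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac{1}{2}$, we have (verify)

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}
\]

The calculations for $\mathbf{x}^{(n)}$ are then

\[
\begin{aligned}
\mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)} = PD^nP^{-1}\mathbf{x}^{(0)}
&= \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & -\tfrac{1}{2} & 0 \end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} \\
&= \begin{bmatrix}
1 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 0 \\
0 & \left(\tfrac{1}{2}\right)^n & 0 \\
0 & \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1} & 1
\end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}
\end{aligned}
\]

Thus,

\[
\begin{aligned}
a_n &= a_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0 \\
b_n &= \left(\tfrac{1}{2}\right)^n b_0 \\
c_n &= c_0 + \left[\tfrac{1}{2} - \left(\tfrac{1}{2}\right)^{n+1}\right] b_0
\end{aligned}
\qquad n = 1, 2, \ldots \tag{6}
\]

In the limit, as n tends to infinity, $\left(\tfrac{1}{2}\right)^n \to 0$ and $\left(\tfrac{1}{2}\right)^{n+1} \to 0$, so

\[
a_n \to a_0 + \tfrac{1}{2} b_0, \quad b_n \to 0, \quad c_n \to c_0 + \tfrac{1}{2} b_0
\]

Thus, fertilization of each plant with one of its own genotype produces a population that in the limit contains only genotypes AA and aa.

Autosomal Recessive Diseases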

There are many genetic diseases governed by autosomal inheritance in which a normal gene A dominates an abnormal gene a . Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not afflicted with the disease; and genotype aa is afflicted with the disease. In humans such genetic diseases are often associated with a particular racial group—for instance, cystic fibrosis (predominant among Caucasians), sickle-cell anemia (predominant among people of African origin), Cooley’s anemia (predominant among people of Mediterranean origin), and Tay-Sachs disease (predominant among Eastern European Jews). Suppose that an animal breeder has a population of animals that carries an autosomal recessive disease. Suppose further that those animals afflicted with the disease do not survive to maturity. One possible way to control such a disease is for the breeder to always mate a female, regardless of her genotype, with a normal male. In this way, all future offspring will either have a normal father and a normal mother (AA–AA matings) or a normal father and a carrier mother (AA–Aa matings). There can be no AA–aa matings since animals of genotype aa do not survive to maturity. Under this type of mating program no future offspring will be afflicted with the disease, although there will still be carriers in future generations. Let us now determine the fraction of carriers in future generations. We set x(n) =

\[
\begin{bmatrix} a_n \\ b_n \end{bmatrix}, \quad n = 1, 2, \ldots
\]

where

$a_n$ = fraction of population of genotype AA in nth generation
$b_n$ = fraction of population of genotype Aa (carriers) in nth generation


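Because each offspring has at least one normal parent, we may consider the controlled mating program as one of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from one generation to the next is governed by the equation

\[
\mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots
\]

where

\[
M = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix}
\]

Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the nth generation is thus given by

\[
\mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)}, \quad n = 1, 2, \ldots
\]

The diagonalization of M is easily carried out (see Exercise 4) and leads to

\[
\begin{aligned}
\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}
&= \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \end{bmatrix} \\
&= \begin{bmatrix} 1 & 1 - \left(\tfrac{1}{2}\right)^n \\ 0 & \left(\tfrac{1}{2}\right)^n \end{bmatrix}
\begin{bmatrix} a_0 \\ b_0 \end{bmatrix}
= \begin{bmatrix} a_0 + b_0 - \left(\tfrac{1}{2}\right)^n b_0 \\ \left(\tfrac{1}{2}\right)^n b_0 \end{bmatrix}
\end{aligned}
\]

Because $a_0 + b_0 = 1$, we have

\[
\begin{aligned}
a_n &= 1 - \left(\tfrac{1}{2}\right)^n b_0 \\
b_n &= \left(\tfrac{1}{2}\right)^n b_0
\end{aligned}
\qquad n = 1, 2, \ldots \tag{7}
\]

Thus, as n tends to infinity, we have

\[
a_n \to 1, \quad b_n \to 0
\]

so in the limit there will be no carriers in the population. From (7) we see that

\[
b_n = \tfrac{1}{2} b_{n-1}, \quad n = 1, 2, \ldots \tag{8}
\]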

That is, the fraction of carriers in each generation is one-half the fraction of carriers in the preceding generation. It would be of interest also to investigate the propagation of carriers under random mating, when two animals mate without regard to their genotypes. Unfortunately, such random mating leads to nonlinear equations, and the techniques of this section are not applicable. However, by other techniques it can be shown that under random mating, Equation (8) is replaced by

\[
b_n = \frac{b_{n-1}}{1 + \tfrac{1}{2} b_{n-1}}, \quad n = 1, 2, \ldots \tag{9}
\]

As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are carriers. Under the controlled-mating program governed by Equation (8), the percentage of carriers can be reduced to 5% in one generation. But under random mating, Equation (9) predicts that about 9.5% of the population will be carriers after one generation ($b_n \approx 0.095$ if $b_{n-1} = 0.10$). In addition, under controlled mating no offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400 offspring will be born with the disease when 10% of the population are carriers.

X-Linked Inheritance

As mentioned in the introduction, in X-linked inheritance the male possesses one gene (A or a ) and the female possesses two genes (AA, Aa , or aa ). The term X-linked is used because such genes are found on the X-chromosome, of which the male has one and the


female has two. The inheritance of such genes is as follows: A male offspring receives one of his mother's two genes with equal probability, and a female offspring receives the one gene of her father and one of her mother's two genes with equal probability. Readers familiar with basic probability can verify that this type of inheritance leads to the genotype probabilities in Table 2.

Table 2

                          Genotypes of Parents (Father, Mother)
Offspring          (A, AA)  (A, Aa)  (A, aa)  (a, AA)  (a, Aa)  (a, aa)
Male      A           1       1/2       0        1       1/2       0
          a           0       1/2       1        0       1/2       1
Female    AA          1       1/2       0        0        0        0
          Aa          0       1/2       1        1       1/2       0
          aa          0        0        0        0       1/2       1

We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and female; select two of their offspring at random, one of each gender, and mate them; select two of the resulting offspring and mate them; and so forth. Such inbreeding is commonly performed with animals. (Among humans, such brother-sister marriages were used by the rulers of ancient Egypt to keep the royal line pure.) The original male-female pair can be one of the six types, corresponding to the six columns of Table 2:

(A, AA),

(A, Aa),

(A, aa),

(a, AA),

(a, Aa),

(a, aa)

The sibling pairs mated in each successive generation have certain probabilities of being one of these six types. To compute these probabilities, for n = 0, 1, 2, . . . , let us set

$a_n$ = probability sibling-pair mated in nth generation is type (A, AA)
$b_n$ = probability sibling-pair mated in nth generation is type (A, Aa)
$c_n$ = probability sibling-pair mated in nth generation is type (A, aa)
$d_n$ = probability sibling-pair mated in nth generation is type (a, AA)
$e_n$ = probability sibling-pair mated in nth generation is type (a, Aa)
$f_n$ = probability sibling-pair mated in nth generation is type (a, aa)

With these probabilities we form a column vector

\[
\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \quad n = 0, 1, 2, \ldots
\]

From Table 2 it follows that

\[
\mathbf{x}^{(n)} = M \mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \tag{10}
\]

where

\[
M = \begin{array}{c|cccccc}
 & (A, AA) & (A, Aa) & (A, aa) & (a, AA) & (a, Aa) & (a, aa) \\ \hline
(A, AA) & 1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
(A, Aa) & 0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\
(A, aa) & 0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\
(a, AA) & 0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
(a, Aa) & 0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\
(a, aa) & 0 & 0 & 0 & 0 & \tfrac{1}{4} & 1
\end{array}
\]

For example, suppose that in the (n − 1)-st generation, the sibling pair mated is type (A, Aa). Then their male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at random for mating, the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with equal probability. Thus, the second column of M contains 1/4 in each of the four rows corresponding to these four sibling pairs. (See Exercise 9 for the remaining columns.) As in our previous examples, it follows from (10) that

\[
\mathbf{x}^{(n)} = M^n \mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \tag{11}
\]

After lengthy calculations, the eigenvalues and eigenvectors of M turn out to be

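\[
\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac{1}{2}, \quad \lambda_4 = -\tfrac{1}{2}, \quad \lambda_5 = \tfrac{1}{4}(1 + \sqrt{5}), \quad \lambda_6 = \tfrac{1}{4}(1 - \sqrt{5})
\]

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} -1 \\ 2 \\ -1 \\ 1 \\ -2 \\ 1 \end{bmatrix}, \quad
\mathbf{v}_4 = \begin{bmatrix} 1 \\ -6 \\ -3 \\ 3 \\ 6 \\ -1 \end{bmatrix},
\]

\[
\mathbf{v}_5 = \begin{bmatrix} \tfrac{1}{4}(-3 - \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-1 + \sqrt{5}) \\ \tfrac{1}{4}(-1 + \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-3 - \sqrt{5}) \end{bmatrix}, \quad
\mathbf{v}_6 = \begin{bmatrix} \tfrac{1}{4}(-3 + \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-1 - \sqrt{5}) \\ \tfrac{1}{4}(-1 - \sqrt{5}) \\ 1 \\ \tfrac{1}{4}(-3 + \sqrt{5}) \end{bmatrix}
\]

The diagonalization of M then leads to

\[
\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \quad n = 1, 2, \ldots \tag{12}
\]

where

\[
P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5}) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & 1 & 3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5})
\end{bmatrix}
\]

\[
D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \left(\tfrac{1}{2}\right)^n & 0 & 0 & 0 \\
0 & 0 & 0 & \left(-\tfrac{1}{2}\right)^n & 0 & 0 \\
0 & 0 & 0 & 0 & \left[\tfrac{1}{4}(1 + \sqrt{5})\right]^n & 0 \\
0 & 0 & 0 & 0 & 0 & \left[\tfrac{1}{4}(1 - \sqrt{5})\right]^n
\end{bmatrix}
\]

\[
P^{-1} = \begin{bmatrix}
1 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 \\
0 & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \\
0 & \tfrac{1}{8} & -\tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{8} & 0 \\
0 & -\tfrac{1}{24} & -\tfrac{1}{12} & \tfrac{1}{12} & \tfrac{1}{24} & 0 \\
0 & \tfrac{1}{20}(5 + \sqrt{5}) & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5 + \sqrt{5}) & 0 \\
0 & \tfrac{1}{20}(5 - \sqrt{5}) & -\tfrac{1}{5}\sqrt{5} & -\tfrac{1}{5}\sqrt{5} & \tfrac{1}{20}(5 - \sqrt{5}) & 0
\end{bmatrix}
\]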

We will not write out the matrix product in (12), as it is rather unwieldy. However, if a specific vector x(0) is given, the calculation for x(n) is not too cumbersome (see Exercise 6). Because the absolute values of the last four diagonal entries of D are less than 1, we see that as n tends to infinity,



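\[
D^n \to \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

And so, from Equation (12),

\[
\mathbf{x}^{(n)} \to P \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix} P^{-1} \mathbf{x}^{(0)}
\]

Performing the matrix multiplication on the right, we obtain (verify)

\[
\mathbf{x}^{(n)} \to \begin{bmatrix}
a_0 + \tfrac{2}{3} b_0 + \tfrac{1}{3} c_0 + \tfrac{2}{3} d_0 + \tfrac{1}{3} e_0 \\
0 \\ 0 \\ 0 \\ 0 \\
f_0 + \tfrac{1}{3} b_0 + \tfrac{2}{3} c_0 + \tfrac{1}{3} d_0 + \tfrac{2}{3} e_0
\end{bmatrix} \tag{13}
\]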

That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For example, if the initial parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$), then as n tends to infinity,

\[
\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac{2}{3} \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}
\]

Thus, in the limit there is probability 2/3 that the sibling pairs will be (A, AA), and probability 1/3 that they will be (a, aa).
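The limit (13) can also be checked numerically without diagonalizing M. The sketch below iterates Equation (10) with exact fractions and compares the result with formula (13). The names `M_XLINKED`, `iterate`, and `limit_from_eq13` are illustrative only, and 200 generations is an arbitrary choice that is large enough for the decaying terms to be negligible.

```python
from fractions import Fraction as F

q = F(1, 4)
# Transition matrix of Equation (10); rows and columns are ordered
# (A,AA), (A,Aa), (A,aa), (a,AA), (a,Aa), (a,aa).
M_XLINKED = [
    [1, q, 0, 0, 0, 0],
    [0, q, 0, 1, q, 0],
    [0, 0, 0, 0, q, 0],
    [0, q, 0, 0, 0, 0],
    [0, q, 1, 0, q, 0],
    [0, 0, 0, 0, q, 1],
]


def iterate(matrix, x0, n):
    """Apply x(k) = M x(k-1) a total of n times, starting from x0."""
    x = list(x0)
    for _ in range(n):
        x = [sum(row[j] * x[j] for j in range(len(x))) for row in matrix]
    return x


def limit_from_eq13(x0):
    """Limiting distribution predicted by formula (13)."""
    a0, b0, c0, d0, e0, f0 = x0
    first = a0 + F(2, 3) * b0 + F(1, 3) * c0 + F(2, 3) * d0 + F(1, 3) * e0
    last = f0 + F(1, 3) * b0 + F(2, 3) * c0 + F(1, 3) * d0 + F(2, 3) * e0
    return [first, 0, 0, 0, 0, last]


x0 = [0, 1, 0, 0, 0, 0]  # initial parents of type (A, Aa)
x200 = iterate(M_XLINKED, x0, 200)
print([float(v) for v in x200])
print([float(v) for v in limit_from_eq13(x0)])
```

Each column of M sums to 1, so the entries of every iterate continue to sum to 1, which gives a quick sanity check on the matrix.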

Exercise Set 10.15

1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for n = 1, 2, ....

2. In Example 1 suppose that the plants are always fertilized with a plant of genotype Aa rather than one of genotype AA. Derive formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth generation. Also, find the limiting genotype distribution as n tends to infinity.

3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa in the nth generation.

4. In the section on autosomal recessive diseases, find the eigenvalues and eigenvectors of the matrix M and verify Equation (7).

5. Suppose that a breeder has an animal population in which 25% of the population are carriers of an autosomal recessive disease. If the breeder allows the animals to mate irrespective of their genotype, use Equation (9) to calculate the number of generations required for the percentage of carriers to fall from 25% to 10%. If the breeder instead implements the controlled-mating program determined by Equation (8), what will the percentage of carriers be after the same number of generations?

6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the six possible genotype parents; that is,

\[
\mathbf{x}^{(0)} = \begin{bmatrix} \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \\ \tfrac{1}{6} \end{bmatrix}
\]

Using Equation (12), calculate $\mathbf{x}^{(n)}$ and also calculate the limit of $\mathbf{x}^{(n)}$ as n tends to infinity.

7. From (13) show that under X-linked inheritance with inbreeding, the probability that the limiting sibling pairs will be of type (A, AA) is the same as the proportion of A genes in the initial population.

8. In X-linked inheritance suppose that none of the females of genotype Aa survive to maturity. Under inbreeding the possible sibling pairs are then

(A, AA), (A, aa), (a, AA), and (a, aa)

Find the transition matrix that describes how the genotype distribution changes in one generation.

9. Derive the matrix M in Equation (10) from Table 2.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. (a) Use a computer to verify that the eigenvalues and eigenvectors of

\[
M = \begin{bmatrix}
1 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{4} & 0 & 1 & \tfrac{1}{4} & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{4} & 0 \\
0 & \tfrac{1}{4} & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{4} & 1 & 0 & \tfrac{1}{4} & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{4} & 1
\end{bmatrix}
\]

as given in the text are correct.

(b) Starting with $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ and the assumption that

\[
\lim_{n \to \infty} \mathbf{x}^{(n)} = \mathbf{x}
\]

exists, we must have

\[
\lim_{n \to \infty} \mathbf{x}^{(n)} = M \lim_{n \to \infty} \mathbf{x}^{(n-1)} \quad \text{or} \quad \mathbf{x} = M\mathbf{x}
\]

This suggests that $\mathbf{x}$ can be solved directly using the equation $(M - I)\mathbf{x} = \mathbf{0}$. Use a computer to solve the equation $\mathbf{x} = M\mathbf{x}$, where

\[
\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}
\]

and a + b + c + d + e + f = 1; compare your results to Equation (13). Explain why the solution to $(M - I)\mathbf{x} = \mathbf{0}$ along with a + b + c + d + e + f = 1 is not specific enough to determine $\lim_{n \to \infty} \mathbf{x}^{(n)}$.

T2. (a) Given

\[
P = \begin{bmatrix}
1 & 0 & -1 & 1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5}) \\
0 & 0 & 2 & -6 & 1 & 1 \\
0 & 0 & -1 & -3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & 1 & 3 & \tfrac{1}{4}(-1 + \sqrt{5}) & \tfrac{1}{4}(-1 - \sqrt{5}) \\
0 & 0 & -2 & 6 & 1 & 1 \\
0 & 1 & 1 & -1 & \tfrac{1}{4}(-3 - \sqrt{5}) & \tfrac{1}{4}(-3 + \sqrt{5})
\end{bmatrix}
\]

from Equation (12) and

\[
\lim_{n \to \infty} D^n = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

use a computer to show that

\[
\lim_{n \to \infty} M^n = \begin{bmatrix}
1 & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{3} & \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1
\end{bmatrix}
\]

(b) Use a computer to calculate $M^n$ for n = 10, 20, 30, 40, 50, 60, 70, and then compare your results to the limit in part (a).

10.16 Age-Specific Population Growth In this section we investigate, using the Leslie matrix model, the growth over time of a female population that is divided into age classes. We then determine the limiting age distribution and growth rate of the population.

PREREQUISITES: Eigenvalues and Eigenvectors Diagonalization of a Matrix Intuitive Understanding of Limits

One of the most common models of population growth used by demographers is the so-called Leslie model developed in the 1940s. This model describes the growth of the female portion of a human or animal population. In this model the females are divided into age classes of equal duration. To be specific, suppose that the maximum age attained by any female in the population is L years (or some other time unit) and we divide the population into n age classes. Then each class is L/n years in duration. We label the age classes according to Table 1. Table 1 Age Class

1 2 3

n–1 n

Age Interval

[0, L/n) [L/n, 2L/n) [2L/n, 3L/n)

...

1

0 1 0 0 0 0

use a computer to show that

...



673

[(n – 2)L/n, (n – 1)L/n) [(n – 1)L/n, L]


Chapter 10 Applications of Linear Algebra

Suppose that we know the number of females in each of the n classes at time t = 0. In particular, let there be $x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and so forth. With these n numbers we form a column vector:

$$\mathbf{x}^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}$$

We call this vector the initial age distribution vector. As time progresses, the number of females within each of the n classes changes because of three biological processes: birth, death, and aging. By describing these three processes quantitatively, we will see how to project the initial age distribution vector into the future. The easiest way to study the aging process is to observe the population at discrete times—say, t0 , t1 , t2 , . . . , tk , . . . . The Leslie model requires that the duration between any two successive observation times be the same as the duration of the age intervals. Therefore, we set

$$t_0 = 0,\quad t_1 = L/n,\quad t_2 = 2L/n,\quad \ldots,\quad t_k = kL/n,\quad \ldots$$

With this assumption, all females in the (i + 1)-st class at time $t_{k+1}$ were in the ith class at time $t_k$. The birth and death processes between two successive observation times can be described by means of the following demographic parameters:

$a_i$ (i = 1, 2, . . . , n): The average number of daughters born to each female during the time she is in the ith age class

$b_i$ (i = 1, 2, . . . , n − 1): The fraction of females in the ith age class that can be expected to survive and pass into the (i + 1)-st age class

By their definitions, we have that

(i) $a_i \ge 0$ for i = 1, 2, . . . , n

(ii) $0 < b_i \le 1$ for i = 1, 2, . . . , n − 1

Note that we do not allow any bi to equal zero, because then no females would survive beyond the i th age class. We also assume that at least one ai is positive so that some births occur. Any age class for which the corresponding value of ai is positive is called a fertile age class.


We next define the age distribution vector $\mathbf{x}^{(k)}$ at time $t_k$ by

$$\mathbf{x}^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}$$

where $x_i^{(k)}$ is the number of females in the ith age class at time $t_k$. Now, at time $t_k$, the females in the first age class are just those daughters born between times $t_{k-1}$ and $t_k$. Thus, we can write

$$\left\{\begin{matrix}\text{number of}\\ \text{females}\\ \text{in class 1}\\ \text{at time } t_k\end{matrix}\right\} = \left\{\begin{matrix}\text{number of daughters}\\ \text{born to females in class 1}\\ \text{between times } t_{k-1} \text{ and } t_k\end{matrix}\right\} + \left\{\begin{matrix}\text{number of daughters}\\ \text{born to females in class 2}\\ \text{between times } t_{k-1} \text{ and } t_k\end{matrix}\right\} + \cdots + \left\{\begin{matrix}\text{number of daughters}\\ \text{born to females in class } n\\ \text{between times } t_{k-1} \text{ and } t_k\end{matrix}\right\}$$

or, mathematically,

$$x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}$$

The females in the (i + 1)-st age class (i = 1, 2, . . . , n − 1) at time $t_k$ are those females in the ith class at time $t_{k-1}$ who are still alive at time $t_k$. Thus,

$$\left\{\begin{matrix}\text{number of}\\ \text{females in}\\ \text{class } i+1\\ \text{at time } t_k\end{matrix}\right\} = \left\{\begin{matrix}\text{fraction of}\\ \text{females in class } i\\ \text{who survive and}\\ \text{pass into class } i+1\end{matrix}\right\} \left\{\begin{matrix}\text{number of}\\ \text{females in}\\ \text{class } i\\ \text{at time } t_{k-1}\end{matrix}\right\}$$

or, mathematically,

$$x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \qquad i = 1, 2, \ldots, n-1 \tag{2}$$

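Using matrix notation, we can write Equations (1) and (2) as

$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}$$

or more compactly as

$$\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}, \qquad k = 1, 2, \ldots \tag{3}$$

where L is the Leslie matrix

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \tag{4}$$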


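From Equation (3) it follows that

$$\begin{aligned} \mathbf{x}^{(1)} &= L\mathbf{x}^{(0)} \\ \mathbf{x}^{(2)} &= L\mathbf{x}^{(1)} = L^2\mathbf{x}^{(0)} \\ \mathbf{x}^{(3)} &= L\mathbf{x}^{(2)} = L^3\mathbf{x}^{(0)} \\ &\;\;\vdots \\ \mathbf{x}^{(k)} &= L\mathbf{x}^{(k-1)} = L^k\mathbf{x}^{(0)} \end{aligned} \tag{5}$$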

Thus, if we know the initial age distribution $\mathbf{x}^{(0)}$ and the Leslie matrix L, we can determine the female age distribution at any later time.

E X A M P L E 1 Female Age Distribution for Animals

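Suppose that the oldest age attained by the females in a certain animal population is 15 years and we divide the population into three age classes with equal durations of five years. Let the Leslie matrix for this population be

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

If there are initially 1000 females in each of the three age classes, then from Equation (3) we have

$$\mathbf{x}^{(0)} = \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix}$$

$$\mathbf{x}^{(1)} = L\mathbf{x}^{(0)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 1{,}000 \\ 1{,}000 \\ 1{,}000 \end{bmatrix} = \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = L\mathbf{x}^{(1)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 7{,}000 \\ 500 \\ 250 \end{bmatrix} = \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = L\mathbf{x}^{(2)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 2{,}750 \\ 3{,}500 \\ 125 \end{bmatrix} = \begin{bmatrix} 14{,}375 \\ 1{,}375 \\ 875 \end{bmatrix}$$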
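The three matrix products in this example can be checked with a few lines of code. The following is a minimal sketch in plain Python using exact fractions; the helper names `leslie_matrix` and `step` are illustrative choices, not notation from the text.

```python
from fractions import Fraction as F

def leslie_matrix(a, b):
    """Build the n x n Leslie matrix of Equation (4) from birth
    parameters a_1..a_n and survival fractions b_1..b_{n-1}."""
    n = len(a)
    L = [[F(0)] * n for _ in range(n)]
    L[0] = [F(v) for v in a]          # first row: a_1, ..., a_n
    for i, bi in enumerate(b):        # subdiagonal: b_1, ..., b_{n-1}
        L[i + 1][i] = F(bi)
    return L

def step(L, x):
    """One projection step x^(k) = L x^(k-1), Equation (3)."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in L]

# Data of Example 1: three age classes, 1000 females in each.
L = leslie_matrix([0, 4, 3], [F(1, 2), F(1, 4)])
x = [F(1000)] * 3
history = []
for k in range(3):
    x = step(L, x)
    history.append([int(v) for v in x])

print(history)
```

Exact fractions are used so that the entries 1/2 and 1/4 introduce no rounding; for large matrices floating-point arithmetic would be the more usual choice.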

Thus, after 15 years there are 14,375 females between 0 and 5 years of age, 1,375 females between 5 and 10 years of age, and 875 females between 10 and 15 years of age.

Limiting Behavior

Although Equation (5) gives the age distribution of the population at any time, it does not immediately give a general picture of the dynamics of the growth process. For this we need to investigate the eigenvalues and eigenvectors of the Leslie matrix. The eigenvalues of L are the roots of its characteristic polynomial. As we ask you to verify in Exercise 2, this characteristic polynomial is

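$$p(\lambda) = |\lambda I - L| = \lambda^n - a_1\lambda^{n-1} - a_2 b_1 \lambda^{n-2} - a_3 b_1 b_2 \lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}$$

To analyze the roots of this polynomial, it will be convenient to introduce the function

$$q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}$$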


Using this function, the characteristic equation p(λ) = 0 can be written (verify)

$$q(\lambda) = 1 \quad \text{for } \lambda \neq 0 \tag{7}$$

Because all the $a_i$ and $b_i$ are nonnegative, we see that q(λ) is monotonically decreasing for λ greater than zero. Furthermore, q(λ) has a vertical asymptote at λ = 0 and approaches zero as λ → ∞. Consequently, as Figure 10.16.1 indicates, there is a unique λ, say λ = λ1, such that q(λ1) = 1. That is, the matrix L has a unique positive eigenvalue. It can also be shown (see Exercise 3) that λ1 has multiplicity 1; that is, λ1 is not a repeated root of the characteristic equation.

Figure 10.16.1 (graph of q(λ) against λ, with the level q = 1 marked on the vertical axis and λ1 marked on the horizontal axis)

Although we omit the computational details, you can verify that an eigenvector corresponding to λ1 is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{8}$$

Because λ1 has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any eigenvector corresponding to it is some multiple of x1 . We can summarize these results in the following theorem.

THEOREM 10.16.1 Existence of a Positive Eigenvalue

A Leslie matrix L has a unique positive eigenvalue λ1 . This eigenvalue has multiplicity 1 and an eigenvector x1 all of whose entries are positive.
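Because q(λ) is decreasing for λ > 0, the equation q(λ) = 1 can be solved numerically by bisection, and Equation (8) then gives the eigenvector directly. The sketch below assumes floating-point data and uses the parameters of Example 1 as a test case; the function names and the bracketing strategy are illustrative choices, not part of the text.

```python
def q(lam, a, b):
    """q(lambda) from Equation (6)."""
    total, prod = 0.0, 1.0
    for i, ai in enumerate(a):
        if i > 0:
            prod *= b[i - 1]                 # running product b_1 b_2 ... b_i
        total += ai * prod / lam ** (i + 1)
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lambda) = 1 by bisection, using that q decreases for lambda > 0."""
    lo, hi = 1e-9, 1.0
    while q(hi, a, b) > 1.0:                 # widen the bracket until q(hi) <= 1
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid, a, b) > 1.0:               # mid lies to the left of lambda_1
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eigenvector(lam, b):
    """Eigenvector of Equation (8): 1, b_1/lam, b_1 b_2/lam^2, ..."""
    x, entry = [1.0], 1.0
    for bi in b:
        entry *= bi / lam
        x.append(entry)
    return x

a, b = [0.0, 4.0, 3.0], [0.5, 0.25]          # parameters of Example 1
lam1 = positive_eigenvalue(a, b)
x1 = eigenvector(lam1, b)
print(lam1, x1)
```

Bisection is chosen here only because the monotonicity argument in the text guarantees it works; a general-purpose eigenvalue routine would serve equally well.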

We will now show that the long-term behavior of the age distribution of the population is determined by the positive eigenvalue λ1 and its eigenvector x1 . In Exercise 9 we ask you to prove the following result.

THEOREM 10.16.2 Eigenvalues of a Leslie Matrix

If λ1 is the unique positive eigenvalue of a Leslie matrix L, and λk is any other real or complex eigenvalue of L, then |λk | ≤ λ1 .

For our purposes the conclusion in Theorem 10.16.2 is not strong enough; we need λ1 to satisfy |λk | < λ1 . In this case λ1 would be called the dominant eigenvalue of L. However, as the following example shows, not all Leslie matrices satisfy this condition.

E X A M P L E 2 Leslie Matrix with No Dominant Eigenvalue

Let

$$L = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac13 & 0 \end{bmatrix}$$

Then the characteristic polynomial of L is

$$p(\lambda) = |\lambda I - L| = \lambda^3 - 1$$

The eigenvalues of L are thus the solutions of $\lambda^3 = 1$, namely,

$$\lambda = 1, \qquad -\frac12 + \frac{\sqrt3}{2}\,i, \qquad -\frac12 - \frac{\sqrt3}{2}\,i$$

All three eigenvalues have absolute value 1, so the unique positive eigenvalue λ1 = 1 is not dominant. Note that this matrix has the property that $L^3 = I$. This means that for any choice of the initial age distribution $\mathbf{x}^{(0)}$, we have

$$\mathbf{x}^{(0)} = \mathbf{x}^{(3)} = \mathbf{x}^{(6)} = \cdots = \mathbf{x}^{(3k)} = \cdots$$

The age distribution vector thus oscillates with a period of three time units. Such oscillations (or population waves, as they are called) could not occur if λ1 were dominant, as we will see below.
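The period-three behavior is easy to confirm numerically. The following sketch, in plain Python with exact fractions, checks that $L^3 = I$ and follows one initial vector through six time steps; the initial distribution `x0` is a hypothetical choice made only for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(r[j] * x[j] for j in range(len(x))) for r in A]

# Leslie matrix of Example 2.
L = [[F(0), F(0), F(6)],
     [F(1, 2), F(0), F(0)],
     [F(0), F(1, 3), F(0)]]

L3 = matmul(L, matmul(L, L))
identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]

x0 = [F(100), F(50), F(20)]      # hypothetical initial age distribution
xs = [x0]
for _ in range(6):
    xs.append(matvec(L, xs[-1]))

print(L3 == identity, xs[3] == x0, xs[6] == x0)
```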

It is beyond the scope of this book to discuss necessary and sufficient conditions for λ1 to be a dominant eigenvalue. However, we will state the following sufficient condition without proof.

THEOREM 10.16.3 Dominant Eigenvalue

If two successive entries ai and ai+1 in the first row of a Leslie matrix L are nonzero, then the positive eigenvalue of L is dominant.

Thus, if the female population has two successive fertile age classes, then its Leslie matrix has a dominant eigenvalue. This is always the case for realistic populations if the duration of the age classes is sufficiently small. Note that in Example 2 there is only one fertile age class (the third), so the condition of Theorem 10.16.3 is not satisfied. In what follows, we always assume that the condition of Theorem 10.16.3 is satisfied. Let us assume that L is diagonalizable. This is not really necessary for the conclusions we will draw, but it does simplify the arguments. In this case, L has n eigenvalues, λ1 , λ2 , . . . , λn , not necessarily distinct, and n linearly independent eigenvectors, x1 , x2 , . . . , xn , corresponding to them. In this listing we place the dominant eigenvalue λ1 first. We construct a matrix P whose columns are the eigenvectors of L:

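$$P = [\,\mathbf{x}_1 \mid \mathbf{x}_2 \mid \mathbf{x}_3 \mid \cdots \mid \mathbf{x}_n\,]$$

The diagonalization of L is then given by the equation

$$L = P \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix} P^{-1}$$

From this it follows that

$$L^k = P \begin{bmatrix} \lambda_1^k & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}$$

for k = 1, 2, . . . . For any initial age distribution vector $\mathbf{x}^{(0)}$, we then have

$$L^k \mathbf{x}^{(0)} = P \begin{bmatrix} \lambda_1^k & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1} \mathbf{x}^{(0)}$$

for k = 1, 2, . . . . Dividing both sides of this equation by $\lambda_1^k$ and using the fact that $\mathbf{x}^{(k)} = L^k \mathbf{x}^{(0)}$, we have

$$\frac{1}{\lambda_1^k}\,\mathbf{x}^{(k)} = P \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & \left(\dfrac{\lambda_2}{\lambda_1}\right)^k & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \left(\dfrac{\lambda_n}{\lambda_1}\right)^k \end{bmatrix} P^{-1} \mathbf{x}^{(0)} \tag{9}$$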

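Because λ1 is the dominant eigenvalue, we have $|\lambda_i/\lambda_1| < 1$ for i = 2, 3, . . . , n. It follows that

$$(\lambda_i/\lambda_1)^k \to 0 \quad \text{as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n$$

Using this fact, we can take the limit of both sides of (9) to obtain

$$\lim_{k\to\infty} \left\{ \frac{1}{\lambda_1^k}\,\mathbf{x}^{(k)} \right\} = P \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix} P^{-1} \mathbf{x}^{(0)} \tag{10}$$

Let us denote the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$ by the constant c. As we ask you to show in Exercise 4, the right side of (10) can be written as $c\,\mathbf{x}_1$, where c is a positive constant that depends only on the initial age distribution vector $\mathbf{x}^{(0)}$. Thus, (10) becomes

$$\lim_{k\to\infty} \left\{ \frac{1}{\lambda_1^k}\,\mathbf{x}^{(k)} \right\} = c\,\mathbf{x}_1 \tag{11}$$

Equation (11) gives us the approximation

$$\mathbf{x}^{(k)} \simeq c\lambda_1^k \mathbf{x}_1 \tag{12}$$

for large values of k. From (12) we also have

$$\mathbf{x}^{(k-1)} \simeq c\lambda_1^{k-1} \mathbf{x}_1 \tag{13}$$

Comparing Equations (12) and (13), we see that

$$\mathbf{x}^{(k)} \simeq \lambda_1 \mathbf{x}^{(k-1)} \tag{14}$$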

for large values of k . This means that for large values of time, each age distribution vector is a scalar multiple of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix. Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the following example, these limiting proportions can be determined from the eigenvector x1 .
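The convergence described by (12) and (14) can be observed by iterating Equation (3) and watching the ratio of successive population totals. The sketch below uses the Leslie matrix of Example 1 and rescales the vector to proportions at every step (a choice made here only to keep the numbers bounded; it is not part of the text's argument).

```python
def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(r[j] * x[j] for j in range(len(x))) for r in A]

# Leslie matrix and initial distribution of Example 1.
L = [[0.0, 4.0, 3.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.25, 0.0]]
x = [1000.0, 1000.0, 1000.0]

for k in range(1, 201):
    x_next = matvec(L, x)
    growth = sum(x_next) / sum(x)        # estimate of lambda_1 suggested by (14)
    total = sum(x_next)
    x = [v / total for v in x_next]      # keep only the proportions
    if k in (1, 5, 20, 200):
        print(k, round(growth, 5), [round(v, 4) for v in x])
```

The early iterations fluctuate noticeably, while the later ones settle toward a fixed growth factor and fixed proportions, which is the behavior the approximation (14) predicts when λ1 is dominant.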


E X A M P L E 3 Example 1 Revisited

The Leslie matrix in Example 1 was

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \tfrac38$, and you can verify that the positive eigenvalue is $\lambda_1 = \tfrac32$. From (8) the corresponding eigenvector $\mathbf{x}_1$ is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \dfrac{1/2}{3/2} \\ \dfrac{(1/2)(1/4)}{(3/2)^2} \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

From (14) we have $\mathbf{x}^{(k)} \simeq \tfrac32\,\mathbf{x}^{(k-1)}$ for large values of k. Hence, every five years the number of females in each of the three classes will increase by about 50%, as will the total number of females in the population. From (12) we have

$$\mathbf{x}^{(k)} \simeq c \left(\tfrac32\right)^k \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

Consequently, eventually the females will be distributed among the three age classes in the ratios $1 : \tfrac13 : \tfrac1{18}$. This corresponds to a distribution of 72% of the females in the first age class, 24% of the females in the second age class, and 4% of the females in the third age class.
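The arithmetic behind these percentages can be reproduced exactly. The following sketch checks that 3/2 is a root of the characteristic polynomial, builds the eigenvector from Equation (8), and normalizes it to shares of the total; the variable names are illustrative.

```python
from fractions import Fraction as F

b = [F(1, 2), F(1, 4)]        # survival fractions b_1, b_2 from Example 1
lam1 = F(3, 2)                # positive eigenvalue quoted in the text

# Check that lam1 is a root of p(lambda) = lambda^3 - 2*lambda - 3/8.
residual = lam1 ** 3 - 2 * lam1 - F(3, 8)

# Eigenvector from Equation (8): 1, b_1/lam1, b_1*b_2/lam1^2.
x1, entry = [F(1)], F(1)
for bi in b:
    entry *= bi / lam1
    x1.append(entry)

# Normalize so the entries sum to 1, then express as percentages.
shares = [v / sum(x1) for v in x1]
percent = [float(100 * s) for s in shares]
print(residual, x1, percent)
```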

E X A M P L E 4 Female Age Distribution for Humans

In this example we use birth and death parameters from the year 1965 for Canadian females. Because few women over 50 years of age bear children, we restrict ourselves to the portion of the female population between 0 and 50 years of age. The data are for 5-year age classes, so there are a total of 10 age classes. Rather than writing out the 10 × 10 Leslie matrix in full, we list the birth and death parameters as follows:

Age Interval    a_i        b_i
[0, 5)          0.00000    0.99651
[5, 10)         0.00024    0.99820
[10, 15)        0.05861    0.99802
[15, 20)        0.28608    0.99729
[20, 25)        0.44791    0.99694
[25, 30)        0.36399    0.99621
[30, 35)        0.22259    0.99460
[35, 40)        0.10457    0.99184
[40, 45)        0.02826    0.98700
[45, 50)        0.00240    -


Using numerical techniques, we can approximate the positive eigenvalue and corresponding eigenvector by

$$\lambda_1 = 1.07622 \quad \text{and} \quad \mathbf{x}_1 = \begin{bmatrix} 1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429 \end{bmatrix}$$

Thus, if Canadian women continued to reproduce and die as they did in 1965, eventually every 5 years their numbers would increase by 7.622%. From the eigenvector $\mathbf{x}_1$, we see that, in the limit, for every 100,000 females between 0 and 5 years of age, there will be 92,594 females between 5 and 10 years of age, 85,881 females between 10 and 15 years of age, and so forth.

Let us look again at Equation (12), which gives the age distribution vector of the population for large times:

$$\mathbf{x}^{(k)} \simeq c\lambda_1^k \mathbf{x}_1 \tag{15}$$

Three cases arise according to the value of the positive eigenvalue λ1:

(i) The population is eventually increasing if λ1 > 1.
(ii) The population is eventually decreasing if λ1 < 1.
(iii) The population eventually stabilizes if λ1 = 1.

The case λ1 = 1 is particularly interesting because it determines a population that has zero population growth. For any initial age distribution, the population approaches a limiting age distribution that is some multiple of the eigenvector $\mathbf{x}_1$. From Equations (6) and (7), we see that λ1 = 1 is an eigenvalue if and only if

$$a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}$$

The expression

$$R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}$$

is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of R.) Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
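Equation (17) translates directly into a short loop that accumulates the survival products. As an illustration, the sketch below evaluates R for the Leslie matrix of Example 2, whose positive eigenvalue is 1, and for a variant in which the single birth parameter is raised from 6 to 7 (a hypothetical change made only to show a case with R > 1). The function name is an illustrative choice.

```python
from fractions import Fraction as F

def net_reproduction_rate(a, b):
    """R = a_1 + a_2 b_1 + a_3 b_1 b_2 + ... + a_n b_1 ... b_{n-1}, Equation (17)."""
    R, survival = F(0), F(1)
    for i, ai in enumerate(a):
        if i > 0:
            survival *= b[i - 1]      # probability of surviving into class i+1
        R += ai * survival
    return R

b = [F(1, 2), F(1, 3)]
R = net_reproduction_rate([F(0), F(0), F(6)], b)      # Example 2 data
R_up = net_reproduction_rate([F(0), F(0), F(7)], b)   # hypothetical variant
print(R, R_up)
```

The first value equals 1, in agreement with the statement that λ1 = 1 exactly when the net reproduction rate is 1.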

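Exercise Set 10.16

1. Suppose that a certain animal population is divided into two age classes and has a Leslie matrix

$$L = \begin{bmatrix} 1 & \tfrac32 \\ \tfrac12 & 0 \end{bmatrix}$$

(a) Calculate the positive eigenvalue λ1 of L and the corresponding eigenvector $\mathbf{x}_1$.

(b) Beginning with the initial age distribution vector

$$\mathbf{x}^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}$$

calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(3)}$, $\mathbf{x}^{(4)}$, and $\mathbf{x}^{(5)}$, rounding off to the nearest integer when necessary.

(c) Calculate $\mathbf{x}^{(6)}$ using the exact formula $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)}$ and using the approximation formula $\mathbf{x}^{(6)} \simeq \lambda_1 \mathbf{x}^{(5)}$.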


2. Find the characteristic polynomial of a general Leslie matrix given by Equation (4).

3. (a) Show that the positive eigenvalue λ1 of a Leslie matrix is always simple. Recall that a root λ0 of a polynomial q(λ) is simple if and only if $q'(\lambda_0) \neq 0$.

(b) Show that the eigenspace corresponding to λ1 has dimension 1.

4. Show that the right side of Equation (10) is $c\,\mathbf{x}_1$, where c is the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$.

5. Show that the net reproduction rate R, defined by (17), can be interpreted as the average number of daughters born to a single female during her expected lifetime.

6. Show that a population is eventually decreasing if and only if its net reproduction rate is less than 1. Similarly, show that a population is eventually increasing if and only if its net reproduction rate is greater than 1.

7. Calculate the net reproduction rate of the animal population in Example 1.

8. (For readers with a hand calculator) Calculate the net reproduction rate of the Canadian female population in Example 4.

9. (For readers who have read Sections 10.1–10.3) Prove Theorem 10.16.2. [Hint: Write $\lambda_k = re^{i\theta}$, substitute into (7), take the real parts of both sides, and show that r ≤ λ1.]

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix}, \quad L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}, \ \ldots$$

(a) Use a computer to show that

$$L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \ \ldots$$

for a suitable choice of a in terms of $b_1, b_2, \ldots, b_{n-1}$.

(b) From your results in part (a), conjecture a relationship between a and $b_1, b_2, \ldots, b_{n-1}$ that will make $L_n^n = I_n$, where

$$L_n = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & a \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & b_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix}$$

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to show that all eigenvalues of $L_n$ satisfy |λ| = 1 when a and $b_1, b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).

T2. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} a & ap \\ b & 0 \end{bmatrix}, \quad L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix},$$

$$L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \ \ldots, \quad L_n = \begin{bmatrix} a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\ b & 0 & 0 & \cdots & 0 & 0 \\ 0 & b & 0 & \cdots & 0 & 0 \\ 0 & 0 & b & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b & 0 \end{bmatrix}$$

where 0 < p < 1, 0 < b < 1, and 1 < a.

(a) Choose a value for n (say, n = 8). For various values of a, b, and p, use a computer to determine the dominant eigenvalue of $L_n$, and then compare your results to the value of a + bp.

(b) Show that

$$p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a\left(\frac{\lambda^n - (bp)^n}{\lambda - bp}\right)$$

which means that the eigenvalues of $L_n$ must satisfy

$$\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0$$

(c) Can you now provide a rough proof to explain the fact that $\lambda_1 \simeq a + bp$?

T3. Suppose that a population of mice has a Leslie matrix L over a 1-month period and an initial age distribution vector $\mathbf{x}^{(0)}$ given by

$$L = \begin{bmatrix} 0 & 0 & \tfrac12 & \tfrac45 & \tfrac3{10} & 0 \\ \tfrac45 & 0 & 0 & 0 & 0 & 0 \\ 0 & \tfrac9{10} & 0 & 0 & 0 & 0 \\ 0 & 0 & \tfrac9{10} & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac45 & 0 & 0 \\ 0 & 0 & 0 & 0 & \tfrac3{10} & 0 \end{bmatrix} \quad \text{and} \quad \mathbf{x}^{(0)} = \begin{bmatrix} 50 \\ 40 \\ 30 \\ 20 \\ 10 \\ 5 \end{bmatrix}$$

(a) Compute the net reproduction rate of the population.

(b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101 months is approximately a scalar multiple of the vector after 100 months.

(c) Compute the dominant eigenvalue of L and its corresponding eigenvector. How are they related to your results in part (b)?

(d) Suppose you wish to control the mouse population by feeding it a substance that decreases its age-specific birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the population eventually to decrease?

10.17 Harvesting of Animal Populations

In this section we employ the Leslie matrix model of population growth to model the sustainable harvesting of an animal population. We also examine the effect of harvesting different fractions of different age groups.

PREREQUISITES: Age-Specific Population Growth (Section 10.16)

Harvesting

In Section 10.16 we used the Leslie matrix model to examine the growth of a female population that was divided into discrete age classes. In this section, we investigate the effects of harvesting an animal population growing according to such a model. By harvesting we mean the removal of animals from the population. (The word harvesting is not necessarily a euphemism for “slaughtering”; the animals may be removed from the population for other purposes.) In this section we restrict ourselves to sustainable harvesting policies. By this we mean the following:

DEFINITION 1 A harvesting policy in which an animal population is periodically harvested is said to be sustainable if the yield of each harvest is the same and the age distribution of the population remaining after each harvest is the same.

Thus, the animal population is not depleted by a sustainable harvesting policy; only the excess growth is removed. As in Section 10.16, we will discuss only the females of the population. If the number of males in each age class is equal to the number of females (a reasonable assumption for many populations), then our harvesting policies will also apply to the male portion of the population.

The Harvesting Model

Figure 10.17.1 illustrates the basic idea of the model. We begin with a population having a particular age distribution. It undergoes a growth period that will be described by the Leslie matrix. At the end of the growth period, a certain fraction of each age class is harvested in such a way that the unharvested population has the same age distribution as the original population. This cycle repeats after each harvest so that the yield is sustainable. The duration of the harvest is assumed to be short in comparison with the growth period so that any growth or change in the population during the harvest period can be neglected.

Figure 10.17.1 (diagram of one harvesting cycle: the population before the growth period grows into the population after the growth period, which is then divided into the part harvested and the part not harvested)

To describe this harvesting model mathematically, let

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

be the age distribution vector of the population at the beginning of the growth period. Thus $x_i$ is the number of females in the ith class left unharvested. As in Section 10.16, we require that the duration of each age class be identical with the duration of the growth period. For example, if the population is harvested once a year, then the population is divided into 1-year age classes. If L is the Leslie matrix describing the growth of the population, then the vector $L\mathbf{x}$ is the age distribution vector of the population at the end of the growth period, immediately before the periodic harvest. Let $h_i$, for i = 1, 2, . . . , n, be the fraction of females from the ith class that is harvested. We use these n numbers to form an n × n diagonal matrix

$$H = \begin{bmatrix} h_1 & 0 & 0 & \cdots & 0 \\ 0 & h_2 & 0 & \cdots & 0 \\ 0 & 0 & h_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & h_n \end{bmatrix}$$

which we will call the harvesting matrix. By definition, we have

$$0 \le h_i \le 1 \qquad (i = 1, 2, \ldots, n)$$

That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of each of the n classes. Because the number of females in the ith class immediately before each harvest is the ith entry $(L\mathbf{x})_i$ of the vector $L\mathbf{x}$, the ith entry of the column vector

$$HL\mathbf{x} = \begin{bmatrix} h_1 (L\mathbf{x})_1 \\ h_2 (L\mathbf{x})_2 \\ \vdots \\ h_n (L\mathbf{x})_n \end{bmatrix}$$

is the number of females harvested from the ith class.

From the definition of a sustainable harvesting policy, we have

$$\begin{bmatrix} \text{age distribution} \\ \text{at end of} \\ \text{growth period} \end{bmatrix} - [\text{harvest}] = \begin{bmatrix} \text{age distribution} \\ \text{at beginning of} \\ \text{growth period} \end{bmatrix}$$

or, mathematically,

$$L\mathbf{x} - HL\mathbf{x} = \mathbf{x} \tag{1}$$

If we write Equation (1) in the form

$$(I - H)L\mathbf{x} = \mathbf{x} \tag{2}$$

we see that $\mathbf{x}$ must be an eigenvector of the matrix (I − H)L corresponding to the eigenvalue 1. As we will now show, this places certain restrictions on the values of $h_i$ and $\mathbf{x}$. Suppose that the Leslie matrix of the population is

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \tag{3}$$

Then the matrix (I − H)L is (verify)

$$(I - H)L = \begin{bmatrix} (1-h_1)a_1 & (1-h_1)a_2 & (1-h_1)a_3 & \cdots & (1-h_1)a_{n-1} & (1-h_1)a_n \\ (1-h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & (1-h_3)b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & (1-h_n)b_{n-1} & 0 \end{bmatrix}$$

Thus, we see that (I − H )L is a matrix with the same mathematical form as a Leslie matrix. In Section 10.16 we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its net reproduction rate also be 1 [see Eq. (16) of Section 10.16]. Calculating the net reproduction rate of (I − H )L and setting it equal to 1, we obtain (verify)

$$(1 - h_1)\,[\,a_1 + a_2 b_1 (1 - h_2) + a_3 b_1 b_2 (1 - h_2)(1 - h_3) + \cdots + a_n b_1 b_2 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n)\,] = 1 \tag{4}$$

This equation places a restriction on the allowable harvesting fractions. Only those values of $h_1, h_2, \ldots, h_n$ that satisfy (4) and that lie in the interval [0, 1] can produce a sustainable yield. If $h_1, h_2, \ldots, h_n$ do satisfy (4), then the matrix (I − H)L has the desired eigenvalue λ1 = 1. Furthermore, this eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1 (Theorem 10.16.1). This means that there is only one linearly independent eigenvector $\mathbf{x}$ satisfying Equation (2). [See Exercise 3(b) of Section 10.16.] One possible choice for $\mathbf{x}$ is the following normalized eigenvector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 (1 - h_2) \\ b_1 b_2 (1 - h_2)(1 - h_3) \\ b_1 b_2 b_3 (1 - h_2)(1 - h_3)(1 - h_4) \\ \vdots \\ b_1 b_2 b_3 \cdots b_{n-1} (1 - h_2)(1 - h_3) \cdots (1 - h_n) \end{bmatrix} \tag{5}$$
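Conditions (2), (4), and (5) can be checked together on a small case. The sketch below borrows the three-class Leslie matrix of Example 1 in Section 10.16 and a hypothetical policy that harvests only from the first class, with $h_1 = 11/19$ chosen so that (4) holds; the helper names and the choice of policy are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction as F

def harvested_growth_matrix(L, h):
    """(I - H)L: row i of L scaled by (1 - h_i)."""
    return [[(1 - hi) * v for v in row] for hi, row in zip(h, L)]

def sustainable_vector(b, h):
    """Eigenvector of Equation (5): 1, b_1(1-h_2), b_1 b_2 (1-h_2)(1-h_3), ..."""
    x, entry = [F(1)], F(1)
    for bi, hi in zip(b, h[1:]):
        entry *= bi * (1 - hi)
        x.append(entry)
    return x

a, b = [F(0), F(4), F(3)], [F(1, 2), F(1, 4)]
L = [a, [b[0], F(0), F(0)], [F(0), b[1], F(0)]]
h = [F(11, 19), F(0), F(0)]          # hypothetical policy chosen to satisfy (4)

# Left side of Equation (4) for n = 3.
lhs = (1 - h[0]) * (a[0] + a[1] * b[0] * (1 - h[1])
                    + a[2] * b[0] * b[1] * (1 - h[1]) * (1 - h[2]))

M = harvested_growth_matrix(L, h)
x1 = sustainable_vector(b, h)
Mx = [sum(r[j] * x1[j] for j in range(3)) for r in M]   # should reproduce x1
print(lhs, x1, Mx)
```

With this policy the left side of (4) equals 1 and (I − H)L maps the vector from (5) to itself, so the age distribution after each harvest is unchanged, as the definition of a sustainable policy requires.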


Any other solution $\mathbf{x}$ of (2) is a multiple of $\mathbf{x}_1$. Thus, the vector $\mathbf{x}_1$ determines the proportion of females within each of the n classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the total number of females in the population after each harvest. This can be determined by some auxiliary condition, such as an ecological or economic constraint. For example, for a population economically supported by the harvester, the largest population the harvester can afford to raise between harvests would determine the particular constant that $\mathbf{x}_1$ is multiplied by to produce the appropriate vector $\mathbf{x}$ in Equation (2). For a wild population, the natural habitat of the population would determine how large the total population could be between harvests.

Summarizing our results so far, we see that there is a wide choice in the values of $h_1, h_2, \ldots, h_n$ that will produce a sustainable yield. But once these values are selected, the proportional age distribution of the population after each harvest is uniquely determined by the normalized eigenvector $\mathbf{x}_1$ defined by Equation (5). We now consider a few particular harvesting strategies of this type.

Uniform Harvesting

With many populations it is difficult to distinguish or catch animals of specific ages. If animals are caught at random, we can reasonably assume that the same fraction of each age class is harvested. We therefore set

$$h = h_1 = h_2 = \cdots = h_n$$

Equation (2) then reduces to (verify)

$$L\mathbf{x} = \left(\frac{1}{1 - h}\right)\mathbf{x}$$

Hence, 1/(1 − h) must be the unique positive eigenvalue λ1 of the Leslie growth matrix L. That is,

$$\lambda_1 = \frac{1}{1 - h}$$

Solving for the harvesting fraction h, we obtain

$$h = 1 - (1/\lambda_1) \tag{6}$$

The vector $\mathbf{x}_1$, in this case, is the same as the eigenvector of L corresponding to the eigenvalue λ1. From Equation (8) of Section 10.16, this is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{7}$$

From (6) we can see that the larger λ1 is, the larger is the fraction of animals we can harvest without depleting the population. Note that we need λ1 > 1 in order for the harvesting fraction h to lie in the interval (0, 1). This is to be expected, because λ1 > 1 is the condition that the population be increasing.

E X A M P L E 1 Harvesting Sheep

For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally Breeding Populations,” Ecology, 48, 1967, pp. 834–839).

$$L = \left[\begin{array}{cccccccccccc}
.000 & .045 & .391 & .472 & .484 & .546 & .543 & .502 & .468 & .459 & .433 & .421 \\
.845 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & .975 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & .965 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & .950 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & .926 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & .895 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & .850 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .786 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .691 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .561 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .370 & 0
\end{array}\right]$$

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year each. By the use of numerical techniques, the unique positive eigenvalue of L can be found to be

$$\lambda_1 = 1.176$$

From Equation (6), the harvesting fraction h is

$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0% of the sheep from each of the 12 age classes is harvested every year. From (7) the age distribution vector of the sheep after each harvest is proportional to

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \tag{8}$$

From (8) we see that for every 1000 sheep between 0 and 1 year of age that are not harvested, there are 719 sheep between 1 and 2 years of age, 596 sheep between 2 and 3 years of age, and so forth.
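The numbers in Example 1 can be checked with a short computation. The following Python fragment is a minimal sketch, not the numerical technique used in the text: it assumes the fact from Section 10.16 that the positive eigenvalue λ1 of a Leslie matrix is the positive root of q(λ) = a1/λ + a2b1/λ² + · · · + an b1 b2 · · · bn−1/λⁿ = 1, and it locates that root by bisection. The names `a`, `b`, `survival_products`, `leslie_q`, and `dominant_eigenvalue`, as well as the bracket [0.5, 5], are illustrative choices.

```python
# Sheep data from Example 1: a = first row of L, b = subdiagonal of L.
a = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
b = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]

def survival_products(b):
    """Return [1, b1, b1*b2, ..., b1*b2*...*b(n-1)]."""
    c = [1.0]
    for bi in b:
        c.append(c[-1] * bi)
    return c

def leslie_q(lam, a, b):
    """q(lam) = sum over k of a_k * b_1...b_(k-1) / lam**k."""
    c = survival_products(b)
    return sum(ak * ck / lam ** (k + 1)
               for k, (ak, ck) in enumerate(zip(a, c)))

def dominant_eigenvalue(a, b, lo=0.5, hi=5.0, steps=60):
    """Bisection on q(lam) = 1; q decreases for lam > 0 (bracket is assumed)."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if leslie_q(mid, a, b) > 1:
            lo = mid          # root lies to the right of mid
        else:
            hi = mid          # root lies to the left of mid
    return (lo + hi) / 2

lam1 = dominant_eigenvalue(a, b)
h = 1 - 1 / lam1                                   # Equation (6)
c = survival_products(b)
x1 = [ck / lam1 ** k for k, ck in enumerate(c)]    # Equation (7)
print(round(lam1, 3), round(h, 3), [round(v, 3) for v in x1])
```

The printed values should agree with λ1, h, and the vector in (8) up to rounding in the last digit.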

Harvesting Only the Youngest Age Class

In some populations only the youngest females are of any economic value, so the harvester seeks to harvest only the females from the youngest age class. Accordingly, let us set

$$h_1 = h, \qquad h_2 = h_3 = \cdots = h_n = 0$$

Equation (4) then reduces to

$$(1-h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}) = 1$$

or

$$(1-h)R = 1$$

where R is the net reproduction rate of the population. [See Equation (17) of Section 10.16.] Solving for h, we obtain

$$h = 1 - \frac{1}{R} \tag{9}$$

Note from this equation that a sustainable harvesting policy is possible only if R > 1. This is reasonable because only if R > 1 is the population increasing. From Equation (5), the age distribution vector after each harvest is proportional to the vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1 b_2 \\ b_1 b_2 b_3 \\ \vdots \\ b_1 b_2 b_3 \cdots b_{n-1} \end{bmatrix} \tag{10}$$

EXAMPLE 2 Sustainable Harvesting Policy

Let us apply this type of sustainable harvesting policy to the sheep population in Example 1. For the net reproduction rate of the population we find

$$\begin{aligned} R &= a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \\ &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975)\cdots(.370) \\ &= 2.514 \end{aligned}$$

From Equation (9), the fraction of the first age class harvested is

$$h = 1 - (1/R) = 1 - (1/2.514) = .602$$

From Equation (10), the age distribution of the sheep population after the harvest is proportional to the vector

$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975)\cdots(.370) \end{bmatrix} = \begin{bmatrix} 1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{11}$$


A direct calculation gives us the following (see also Exercise 3):





$$L\mathbf{x}_1 = \begin{bmatrix} 2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{12}$$

The vector Lx1 is the age distribution vector immediately before the harvest. The total of all entries in Lx1 is 8.520, so the first entry 2.514 is 29.5% of the total. This means that immediately before each harvest, 29.5% of the population is in the youngest age class. Since 60.2% of this class is harvested, it follows that 17.8% (= 60.2% of 29.5%) of the entire sheep population is harvested each year. This can be compared with the uniform harvesting policy of Example 1, in which 15.0% of the sheep population is harvested each year.
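The arithmetic of Example 2 can be reproduced in a few lines. This is a minimal sketch under the assumptions of the example (only the youngest class is harvested); the variable names are illustrative, and the data are the first row `a` and subdiagonal `b` of the Leslie matrix in Example 1.

```python
# Sheep data from Example 1: a = first row of L, b = subdiagonal of L.
a = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
b = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]

# x1 = [1, b1, b1*b2, ...], the vector of Equation (10).
x1 = [1.0]
for bi in b:
    x1.append(x1[-1] * bi)

R = sum(ak * xk for ak, xk in zip(a, x1))   # net reproduction rate
h = 1 - 1 / R                               # Equation (9)

# L x1: births in the first entry, survivors b_i * x_i in the entries below.
Lx1 = [R] + [bi * xi for bi, xi in zip(b, x1)]
share_youngest = Lx1[0] / sum(Lx1)          # youngest class just before harvest
yield_fraction = h * share_youngest         # fraction of the whole population
print(round(R, 3), round(h, 3), round(yield_fraction, 3))
```

Because the code carries more digits than the three-decimal entries printed in (11) and (12), the results may differ from the text in the last digit.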

Optimal Sustainable Yield

We saw in Example 1 that a sustainable harvesting policy in which the same fraction of each age class is harvested produces a yield of 15.0% of the sheep population. In Example 2 we saw that if only the youngest age class is harvested, the resulting yield is 17.8% of the population. There are many other possible sustainable harvesting policies, and each generally provides a different yield. It would be of interest to find a sustainable harvesting policy that produces the largest possible yield. Such a policy is called an optimal sustainable harvesting policy, and the resulting yield is called the optimal sustainable yield. However, determining the optimal sustainable yield requires linear programming theory, which we will not discuss here. We refer you to the following result, which appears in J. R. Beddington and D. B. Taylor, “Optimum Age Specific Harvesting of a Population,” Biometrics, 29, 1973, pp. 801–809.

THEOREM 10.17.1 Optimal Sustainable Yield

An optimal sustainable harvesting policy is one in which either one or two age classes are harvested. If two age classes are harvested, then the older age class is completely harvested.

As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when

$$h_1 = 0.522, \qquad h_9 = 1.000 \tag{13}$$

and all other values of hi are zero. Thus, 52.2% of the sheep between 0 and 1 year of age and all the sheep between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal sustainable yield is 19.9% of the population.
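A two-age-class policy of the kind described in Theorem 10.17.1 can be explored numerically. The sketch below is an illustration under stated assumptions, not the linear programming argument of Beddington and Taylor: it assumes that harvesting all of class j leaves classes j, . . . , n empty after each harvest, and that the fraction h taken from class 1 is chosen so that the first class returns to its previous size. The function name `two_class_policy` is hypothetical.

```python
# Sheep data from Example 1: a = first row of L, b = subdiagonal of L.
a = [0.000, 0.045, 0.391, 0.472, 0.484, 0.546,
     0.543, 0.502, 0.468, 0.459, 0.433, 0.421]
b = [0.845, 0.975, 0.965, 0.950, 0.926, 0.895,
     0.850, 0.786, 0.691, 0.561, 0.370]

def two_class_policy(a, b, j):
    """Harvest a fraction h of class 1 and all of class j (1-based, j >= 2).

    Returns (h, yield) where yield is the harvested share of the
    population present just before the harvest.
    """
    # Population after a harvest, scaled so class 1 has size 1;
    # classes j, ..., n are empty because class j is removed entirely.
    x = [1.0]
    for k in range(1, j - 1):
        x.append(x[-1] * b[k - 1])
    x += [0.0] * (len(a) - len(x))
    # Population just before the next harvest is L x.
    births = sum(ak * xk for ak, xk in zip(a, x))
    before = [births] + [b[k] * x[k] for k in range(len(b))]
    h = 1.0 - 1.0 / births           # returns class 1 to size 1
    harvested = h * before[0] + before[j - 1]
    return h, harvested / sum(before)

h1, y = two_class_policy(a, b, 9)
print(f"h1 = {h1:.2f}, yield = {100 * y:.1f}%")
```

For j = 9 the results should be close to the values in (13) and to the 19.9% yield quoted above; differences in the last digit can arise from rounding. A value of h outside the interval from 0 to 1 would signal that the policy is not sustainable.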


Exercise Set 10.17

1. Let a certain animal population be divided into three 1-year age classes and have as its Leslie matrix

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{4} & 0 \end{bmatrix}$$

(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the three age classes is harvested every year.
(b) Find the yield and the age distribution vector after each harvest if only the youngest age class is harvested every year. Also, find the fraction of the youngest age class that is harvested.

2. For the optimal sustainable harvesting policy described by Equations (13), find the vector x1 that specifies the age distribution of the population after each harvest. Also calculate the vector Lx1 and verify that the optimal sustainable yield is 19.9% of the population.

3. Use Equation (10) to show that if only the first age class of an animal population is harvested,

$$L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

where R is the net reproduction rate of the population.

4. If only the Ith class of an animal population is to be periodically harvested (I = 1, 2, . . . , n), find the corresponding harvesting fraction hI.

5. Suppose that all of the Jth class and a certain fraction hI of the Ith class of an animal population is to be periodically harvested (1 ≤ I < J ≤ n). Calculate hI.

Working with Technology

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some linear algebra capabilities. For each exercise you will need to read the relevant documentation for the particular utility you are using. The goal of these exercises is to provide you with a basic proficiency with your technology utility. Once you have mastered the techniques in these exercises, you will be able to use your technology utility to solve many of the problems in the regular exercise sets.

T1. The results of Theorem 10.17.1 suggest the following algorithm for determining the optimal sustainable yield.

(i) For each value of i = 1, 2, . . . , n, set hi = h and hk = 0 for k ≠ i and calculate the respective yields. These n calculations give the one-age-class results. Of course, any calculation leading to a value of h not between 0 and 1 is rejected.
(ii) For each value of i = 1, 2, . . . , n − 1 and j = i + 1, i + 2, . . . , n, set hi = h, hj = 1, and hk = 0 for k ≠ i, j and calculate the respective yields. These ½n(n − 1) calculations give the two-age-class results. Of course, any calculation leading to a value of h not between 0 and 1 is again rejected.
(iii) Of the yields calculated in parts (i) and (ii), the largest is the optimal sustainable yield. Note that there will be at most

$$n + \tfrac{1}{2}n(n-1) = \tfrac{1}{2}n(n+1)$$

calculations in all. Once again, some of these may lead to a value of h not between 0 and 1 and must therefore be rejected.

If we use this algorithm for the sheep example in the text, there will be at most ½(12)(12 + 1) = 78 calculations to consider. Use a computer to do the two-age-class calculations for h1 = h, hj = 1, and hk = 0 for k ≠ 1 or j, for j = 2, 3, . . . , 12. Construct a summary table consisting of the values of h1 and the percentage yields using j = 2, 3, . . . , 12, which will show that the largest of these yields occurs when j = 9.

T2. Using the algorithm in Exercise T1, do the one-age-class calculations for hi = h and hk = 0 for k ≠ i for i = 1, 2, . . . , 12. Construct a summary table consisting of the values of hi and the percentage yields using i = 1, 2, . . . , 12, which will show that the largest of these yields occurs when i = 9.

T3. Referring to the mouse population in Exercise T3 of Section 10.16, suppose that reducing the birthrates is not practical, so you instead decide to control the population by uniformly harvesting all of the age classes monthly.

(a) What fraction of the population must be harvested monthly to bring the mouse population to equilibrium eventually?
(b) What is the equilibrium age distribution vector under this uniform harvesting policy?
(c) The total number of mice in the original mouse population was 155. What would be the total number of mice after 5, 10, and 200 months under your uniform harvesting policy?


10.18 A Least Squares Model for Human Hearing

In this section we apply the method of least squares approximation to a model for human hearing. The use of this method is motivated by energy considerations.

PREREQUISITES: Inner Product Spaces; Orthogonal Projection; Fourier Series (Section 6.6)

Anatomy of the Ear

We begin with a brief discussion of the nature of sound and human hearing. Figure 10.18.1 is a schematic diagram of the ear showing its three main components: the outer ear, middle ear, and inner ear. Sound waves enter the outer ear where they are channeled to the eardrum, causing it to vibrate. Three tiny bones in the middle ear mechanically link the eardrum with the snail-shaped cochlea within the inner ear. These bones pass on the vibrations of the eardrum to a fluid within the cochlea. The cochlea contains thousands of minute hairs that oscillate with the fluid. Those near the entrance of the cochlea are stimulated by high frequencies, and those near the tip are stimulated by low frequencies. The movements of these hairs activate nerve cells that send signals along various neural pathways to the brain, where the signals are interpreted as sound.

Figure 10.18.1 [Schematic of the ear. Labels: sound wave, outer ear, eardrum, middle ear, inner ear, cochlea, auditory nerve, to brain.]

The sound waves themselves are variations in time of the air pressure. For the auditory system, the most elementary type of sound wave is a sinusoidal variation in the air pressure. This type of sound wave stimulates the hairs within the cochlea in such a way that a nerve impulse along a single neural pathway is produced (Figure 10.18.2).

Figure 10.18.2 [Graph of q(t) against t, marking the level A0, the amplitude A, the period 2π/ω, and the shift δ/ω, together with the ear and a single neural pathway to the brain.]

A sinusoidal sound wave can be described by a function of time

$$q(t) = A_0 + A\sin(\omega t - \delta) \tag{1}$$

where q(t) is the atmospheric pressure at the eardrum, A0 is the normal atmospheric pressure, A is the maximum deviation of the pressure from the normal atmospheric pressure, ω/2π is the frequency of the wave in cycles per second, and δ is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have frequencies within a certain range. For humans this range is roughly 20 cycles per second (cps) to 20,000 cps. Frequencies outside this range will not stimulate the hairs within the cochlea enough to produce nerve signals. To a reasonable degree of accuracy, the ear is a linear system. This means that if a complex sound wave is a finite sum of sinusoidal components of different amplitudes, frequencies, and phase angles, say,

$$q(t) = A_0 + A_1\sin(\omega_1 t - \delta_1) + A_2\sin(\omega_2 t - \delta_2) + \cdots + A_n\sin(\omega_n t - \delta_n) \tag{2}$$

then the response of the ear consists of nerve impulses along the same neural pathways that would be stimulated by the individual components (Figure 10.18.3).

Figure 10.18.3 [A complex wave q(t) shown as the sum of its sinusoidal components, each stimulating its own neural pathway.]

Let us now consider some periodic sound wave p(t) with period T [i.e., p(t) ≡ p(t + T )] that is not a finite sum of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave q(t) as given by Equation (2) that produces the same response as p(t), even though p(t) and q(t) are different functions of time. We now want to determine the frequencies, amplitudes, and phase angles of the sinusoidal components of q(t). Because q(t) produces the same response as the periodic wave p(t), it is reasonable to expect that q(t) has the same period T as p(t). This requires that each sinusoidal term in q(t) have period T . Consequently, the frequencies of the sinusoidal components must be integer multiples of the basic frequency 1/T of the function p(t). Thus, the ωk in Equation (2) must be of the form

$$\omega_k = \frac{2k\pi}{T}, \qquad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those values of k for which ωk /2π = k/T is greater than 20,000. Thus, q(t) is of the form

$$q(t) = A_0 + A_1\sin\left(\frac{2\pi}{T}t - \delta_1\right) + \cdots + A_n\sin\left(\frac{2n\pi}{T}t - \delta_n\right) \tag{3}$$

where n is the largest integer such that n/T is not greater than 20,000. We now turn our attention to the values of the amplitudes A0, A1, . . . , An and the phase angles δ1, δ2, . . . , δn that appear in Equation (3). There is some criterion by which the auditory system “picks” these values so that q(t) produces the same response as p(t). To examine this criterion, let us set

$$e(t) = p(t) - q(t)$$

If we consider q(t) as an approximation to p(t), then e(t) is the error in this approximation, an error that the ear cannot perceive. In terms of e(t), the criterion for the determination of the amplitudes and the phase angles is that the quantity

$$\int_0^T [e(t)]^2\,dt = \int_0^T [p(t) - q(t)]^2\,dt \tag{4}$$

be as small as possible. We cannot go into the physiological reasons for this, but we note that this expression is proportional to the acoustic energy of the error wave e(t) over one period. In other words, it is the energy of the difference between the two sound waves p(t) and q(t) that determines whether the ear perceives any difference between them. If this energy is as small as possible, then the two waves produce the same sensation of sound. Mathematically, the function q(t) in (4) is the least squares approximation to p(t) from the vector space C[0, T ] of continuous functions on the interval [0, T ]. (See Section 6.6.) Least squares approximations by continuous functions arise in a wide variety of engineering and scientific approximation problems. Apart from the acoustics problem just discussed, some other examples follow.

Figure 10.18.4 [Axial strain S(x) plotted against x from x = 0 to x = l.]

1. Let S(x) be the axial strain distribution in a uniform rod lying along the x-axis from x = 0 to x = l (Figure 10.18.4). The strain energy in the rod is proportional to the integral

$$\int_0^l [S(x)]^2\,dx$$

The closeness of an approximation q(x) to S(x) can be judged according to the strain energy of the difference of the two strain distributions. That energy is proportional to

$$\int_0^l [S(x) - q(x)]^2\,dx$$

which is a least squares criterion.

Figure 10.18.5 [Periodic voltage E(t) plotted against t over one period from 0 to T.]

2. Let E(t) be a periodic voltage across a resistor in an electrical circuit (Figure 10.18.5). The electrical energy transferred to the resistor during one period T is proportional to

$$\int_0^T [E(t)]^2\,dt$$

If q(t) has the same period as E(t) and is to be an approximation to E(t), then the criterion of closeness might be taken as the energy of the difference voltage. This is proportional to

$$\int_0^T [E(t) - q(t)]^2\,dt$$

which is again a least squares criterion.

Figure 10.18.6 [Displacement y(x) of a string plotted against x from x = 0 to x = l.]

3. Let y(x) be the vertical displacement of a uniform flexible string whose equilibrium position is along the x-axis from x = 0 to x = l (Figure 10.18.6). The elastic potential energy of the string is proportional to

$$\int_0^l [y(x)]^2\,dx$$

If q(x) is to be an approximation to the displacement, then as before, the energy integral

$$\int_0^l [y(x) - q(x)]^2\,dx$$

determines a least squares criterion for the closeness of the approximation. Least squares approximation is also used in situations where there is no a priori justification for its use, such as for approximating business cycles, population growth curves, sales curves, and so forth. It is used in these cases because of its mathematical simplicity. In general, if no other error criterion is immediately apparent for an approximation problem, the least squares criterion is the one most often chosen. The following result was obtained in Section 6.6.

THEOREM 10.18.1 Minimizing Mean Square Error on [0, 2π ]

If f(t) is continuous on [0, 2π], then the trigonometric function g(t) of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt$$

that minimizes the mean square error

$$\int_0^{2\pi} [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt, \qquad k = 1, 2, \ldots, n$$

If the original function f (t) is defined over the interval [0, T ] instead of [0, 2π ], a change of scale will yield the following result (see Exercise 8):

THEOREM 10.18.2 Minimizing Mean Square Error on [0, T ]

If f(t) is continuous on [0, T], then the trigonometric function g(t) of the form

$$g(t) = \tfrac{1}{2}a_0 + a_1\cos\frac{2\pi}{T}t + \cdots + a_n\cos\frac{2n\pi}{T}t + b_1\sin\frac{2\pi}{T}t + \cdots + b_n\sin\frac{2n\pi}{T}t$$

that minimizes the mean square error

$$\int_0^T [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{2}{T}\int_0^T f(t)\cos\frac{2k\pi t}{T}\,dt, \qquad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T}\int_0^T f(t)\sin\frac{2k\pi t}{T}\,dt, \qquad k = 1, 2, \ldots, n$$
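The change of scale behind Theorem 10.18.2 can be sketched as follows. The variable u and the function F are names introduced here for illustration, and the full argument is the subject of Exercise 8. Substituting t = Tu/2π in the coefficient formulas of Theorem 10.18.1 gives, for the cosine coefficients,

```latex
\begin{aligned}
u &= \frac{2\pi t}{T}, \qquad F(u) = f\!\left(\frac{Tu}{2\pi}\right), \qquad 0 \le u \le 2\pi, \\
a_k &= \frac{1}{\pi}\int_0^{2\pi} F(u)\cos ku\,du
     = \frac{1}{\pi}\int_0^{T} f(t)\cos\frac{2k\pi t}{T}\cdot\frac{2\pi}{T}\,dt
     = \frac{2}{T}\int_0^{T} f(t)\cos\frac{2k\pi t}{T}\,dt .
\end{aligned}
```

The sine coefficients transform in the same way. The same substitution multiplies the mean square error integral by the constant factor T/2π, so the minimizing coefficients are unchanged.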


EXAMPLE 1 Least Squares Approximation to a Sound Wave

Figure 10.18.7 [Saw-tooth wave p(t) plotted against t from 0 to 2T, ranging between A and −A, with T = .0002.]

Let a sound wave p(t) have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.18.7). Assume units are chosen so that the normal atmospheric pressure is at the zero level and the maximum amplitude of the wave is A. The basic period of the wave is T = 1/5000 = .0002 second. From t = 0 to t = T, the function p(t) has the equation

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.18.2 then yields the following (verify):

$$a_0 = \frac{2}{T}\int_0^T p(t)\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)dt = 0$$

$$a_k = \frac{2}{T}\int_0^T p(t)\cos\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\cos\frac{2k\pi t}{T}\,dt = 0, \qquad k = 1, 2, \ldots$$

$$b_k = \frac{2}{T}\int_0^T p(t)\sin\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\sin\frac{2k\pi t}{T}\,dt = \frac{2A}{k\pi}, \qquad k = 1, 2, \ldots$$

We can now investigate how the sound wave p(t) is perceived by the human ear. We note that 4/T = 20,000 cps, so we need only go up to k = 4 in the formulas above. The least squares approximation to p(t) is then

$$q(t) = \frac{2A}{\pi}\left(\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right)$$



The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In Figure 10.18.8 we have plotted p(t) and q(t) over one period. Although q(t) is not a very good point-by-point approximation to p(t), to the ear, both p(t) and q(t) produce the same sensation of sound.
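The coefficients in this example can be checked numerically. The following Python fragment is a minimal sketch: it approximates the integrals of Theorem 10.18.2 by the midpoint rule (a choice made here for simplicity, not a method from the text) and takes A = 1 as an arbitrary unit. The names `fourier_coefficients`, `samples`, and `base_freq` are illustrative.

```python
import math

def fourier_coefficients(f, T, n, samples=20000):
    """Approximate a_0..a_n and b_1..b_n of Theorem 10.18.2 (midpoint rule)."""
    dt = T / samples
    ts = [(i + 0.5) * dt for i in range(samples)]
    a = [2 / T * sum(f(t) * math.cos(2 * k * math.pi * t / T) for t in ts) * dt
         for k in range(n + 1)]
    b = [2 / T * sum(f(t) * math.sin(2 * k * math.pi * t / T) for t in ts) * dt
         for k in range(1, n + 1)]
    return a, b

A = 1.0                    # amplitude, in arbitrary units (assumption)
base_freq = 5000           # cycles per second, so T = 1/5000
T = 1 / base_freq
n = 20000 // base_freq     # largest n with n/T <= 20,000 cps

def p(t):
    """Saw-tooth wave on 0 <= t <= T."""
    return 2 * A / T * (T / 2 - t)

a, b = fourier_coefficients(p, T, n)

def q(t):
    """Least squares approximation built from the computed coefficients."""
    return a[0] / 2 + sum(
        a[k] * math.cos(2 * k * math.pi * t / T)
        + b[k - 1] * math.sin(2 * k * math.pi * t / T)
        for k in range(1, n + 1))

for k in range(1, n + 1):
    print(k, round(b[k - 1], 4), round(2 * A / (k * math.pi), 4))
```

Each printed row compares a computed b_k with the closed form 2A/(kπ); the computed a_k should all be negligibly small.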

Figure 10.18.8 [Graphs of p(t) and q(t) over one period, 0 ≤ t ≤ T = .0002, with the vertical axis marked at −A, −½A, ½A, and A.]

As discussed in Section 6.6, the least squares approximation becomes better as the number of terms in the approximating trigonometric polynomial becomes larger. More precisely,



$$\int_0^{2\pi}\left[f(t) - \frac{1}{2}a_0 - \sum_{k=1}^{n}(a_k\cos kt + b_k\sin kt)\right]^2 dt$$

tends to zero as n approaches infinity. We denote this by writing

$$f(t) \sim \frac{1}{2}a_0 + \sum_{k=1}^{\infty}(a_k\cos kt + b_k\sin kt)$$

where the right side of this equation is the Fourier series of f (t). Whether the Fourier series of f (t) converges to f (t) for each t is another question, and a more difficult one. For most continuous functions encountered in applications, the Fourier series does indeed converge to its corresponding function for each value of t .
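The decrease of the mean square error with n can be observed numerically. The sketch below reuses the saw-tooth wave of Example 1 with its coefficients b_k = 2A/(kπ), and assumes A = 1 and T = 1 for convenience; the integral is estimated by the midpoint rule, and the name `mean_square_error` is illustrative.

```python
import math

def mean_square_error(n, A=1.0, T=1.0, samples=20000):
    """Midpoint-rule estimate of the integral of [p(t) - q_n(t)]^2 over [0, T],
    where p is the saw-tooth wave and q_n its order-n approximation."""
    dt = T / samples
    total = 0.0
    for i in range(samples):
        t = (i + 0.5) * dt
        p = 2 * A / T * (T / 2 - t)
        q = sum(2 * A / (k * math.pi) * math.sin(2 * k * math.pi * t / T)
                for k in range(1, n + 1))
        total += (p - q) ** 2 * dt
    return total

orders = (1, 2, 4, 8, 16)
errors = [mean_square_error(n) for n in orders]
for n, e in zip(orders, errors):
    print(n, round(e, 5))
```

The printed errors shrink as the order grows, in line with the statement above, even though the approximation near the jump of the saw-tooth remains poor point by point.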

Exercise Set 10.18

1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function f(t) = (t − π)² over the interval [0, 2π].

2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function f(t) = t² over the interval [0, T].

3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function f(t) over the interval [0, 2π], where

$$f(t) = \begin{cases} \sin t, & 0 \le t \le \pi \\ 0, & \pi < t \le 2\pi \end{cases}$$

4. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function f(t) = sin ½t over the interval [0, 2π].

5. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function f(t) over the interval [0, T], where

$$f(t) = \begin{cases} t, & 0 \le t \le \tfrac{1}{2}T \\ T - t, & \tfrac{1}{2}T < t \le T \end{cases}$$

$$a > b \ \text{ if and only if } \ (a - b) > 0 \tag{10}$$

To prove an “if and only if” statement of form (9), you must prove both H ⇒ C and C ⇒ H. Equivalent statements are often phrased in other ways. For example, statement (10) might also be expressed as

If a > b, then (a − b) > 0 and conversely.

Sometimes two true statements will give you a third true statement for free. Specifically, if it is true that H ⇒ C and C ⇒ D, then it follows that H ⇒ D must also be true. For example, consider the following two theorems from geometry.

THEOREM 1 If opposite sides of a quadrilateral are parallel, then the quadrilateral is a parallelogram.

THEOREM 2 Opposite sides of a parallelogram have equal lengths.

Because the conclusion of the first theorem is essentially the hypothesis of the second, the two theorems together yield the following third theorem.

THEOREM 3 If opposite sides of a quadrilateral are parallel, then they have equal lengths.

Appendix A Working with Proofs


To take this idea a step further, three true statements can sometimes yield three other true statements for free. Specifically, if

$$H \Rightarrow C, \qquad C \Rightarrow D, \qquad D \Rightarrow H \tag{11}$$

then we have the implication loop in Figure A.1, from which we see that

$$C \Rightarrow H, \qquad D \Rightarrow C, \qquad H \Rightarrow D$$

Figure A.1 [Implication loop joining H, C, and D.]

By combining this result with (11) we obtain

$$H \iff C, \qquad C \iff D, \qquad D \iff H \tag{12}$$
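The step from (11) to (12) can be confirmed by brute force over truth values. The following Python fragment is only an illustrative check, not a substitute for the argument based on Figure A.1; the helper `implies` is a name introduced here.

```python
from itertools import product

def implies(p, q):
    """Truth value of the implication p => q."""
    return (not p) or q

# Check every truth assignment: whenever the three implications in (11)
# hold, H, C, and D must share one truth value, which is what (12) asserts.
loop_forces_equivalence = all(
    h == c == d
    for h, c, d in product([False, True], repeat=3)
    if implies(h, c) and implies(c, d) and implies(d, h)
)
print(loop_forces_equivalence)
```

Only the assignments in which H, C, and D are all true or all false satisfy the loop, so the three statements are equivalent.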

In summary, if you want to prove the three equivalences in (12) you need only prove the three implications in (11).

Reductio ad Absurdum

It is a matter of logic that a statement cannot be both true and false. This fact is the basis for a method of proof, called “reductio ad absurdum” or, more commonly, “proof by contradiction,” the idea of which is to make the assumption that the conclusion of a statement is false and show that this leads to a contradiction of some sort. The underlying logic is that if H ⇒ C is a true statement, then the statement

$$(H \text{ and } {\sim}C) \Rightarrow C$$

must be false, for otherwise C would be both true and false.

Sets

Many of the proofs in this text are concerned with sets (or collections) of objects, the objects being called the elements of the set. Although a set can generally include any kinds of objects, in linear algebra the objects are typically “scalars,” “matrices,” or “vectors” (terms that are all defined in the text). We assume that you are already familiar with the basic terminology and notation of sets, but we will review it quickly here. Sets are generally denoted by capital letters and their elements by lowercase letters. One way to describe a set is to simply list its elements enclosed by braces; for example,

S = {1, 3, 5}

(13)

By agreement, the elements of a set must all be different, and the order in which the elements are listed does not matter. Thus, for example, the above set might also be written as

S = {3, 5, 1} or S = {5, 1, 3}

To indicate that an element a is a member of a set S we write a ∈ S (read, “a belongs to S”), and to indicate that a is not a member of S we write a ∉ S (read, “a does not belong to S”). Thus, for the set in (13) we have

3 ∈ S and 4 ∉ S

There are two common ways of denoting sets with infinitely many elements: If the elements have some obvious notational pattern, then the set can be denoted by explicitly specifying some initial elements and using dots to indicate that the remaining elements follow the same pattern. For example, the set of positive integers might be denoted as

S = {1, 2, 3, . . .} (14)

An alternative method for denoting the set S in (14) is to write

S = {x : x is a positive integer}

where the right side is read, “the set of all x such that x is a positive integer.” This is called set-builder notation. In general, set-builder notation has the form

S = {x : ________ } (15)


where the blank line is replaced by a description that defines those and only those elements in the set S. Of particular interest in this text are the set of real numbers, denoted by R, the set of points in the plane, denoted by R², and the set of points in three-dimensional space, denoted by R³. The latter two can be described in set-builder notation as

R² = {(x, y) : x, y ∈ R} and R³ = {(x, y, z) : x, y, z ∈ R}

Operations on Sets

If A and B are arbitrary sets, then the union of A and B , denoted by A ∪ B , is the set of elements that belong to A or B or both; and the intersection of A and B , denoted by A ∩ B , is the set of elements that belong to both A and B . These operations are illustrated in Figure A.2 using Venn Diagrams, named for the British logician John A. Venn (1834–1923). In those diagrams the sets A and B are the regions enclosed by the circles, and the sets A ∪ B and A ∩ B are shaded. In the event that the sets A and B have no common elements, then we say that the sets are disjoint and we write A ∩ B = Ø, where the symbol Ø denotes a set with no elements called the empty set.
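These operations correspond directly to the built-in `set` type in Python. The fragment below is a small illustration; the particular sets A and B are made up for the example.

```python
A = {1, 3, 5}
B = {3, 4, 5, 6}

union = A | B                      # elements in A or B or both
intersection = A & B               # elements in both A and B
disjoint = A.isdisjoint({2, 4})    # True when the intersection is empty
same = {1, 3, 5} == {3, 5, 1}      # listing order does not matter

print(union, intersection, disjoint, same)
```

An empty intersection such as `A & {2, 4}` evaluates to `set()`, which plays the role of the empty set Ø.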

Figure A.2 [Venn diagrams of two overlapping circles A and B: in the first, A ∪ B is shaded; in the second, A ∩ B is shaded.]

If every element of a set A belongs as well to a set B, then we say that A is a subset of B and we write A ⊂ B. If A ⊂ B and B ⊂ A, then A and B have exactly the same elements, so we say that A and B are equal and we write A = B.

Ordered Sets

In certain linear algebra problems the order in which elements are listed is important, so we will want to consider ordered sets, that is, sets in which duplicate elements are not allowed but order matters. Thus, for example,

S1 = {3, 5, 1} and S2 = {5, 1, 3}

are the same sets, but not the same ordered sets.

How to Do Proofs

• A good first step in a proof is to write down in complete sentences what is given (i.e., the hypothesis H) and what is to be proved (i.e., the conclusion C).
• Once you clearly understand what is given and what is to be proved, you must decide whether you want to prove the theorem directly, or in contrapositive form, or by reductio ad absurdum. You might restate the theorem in the three ways and see which form seems most promising.
• Next, you might want to review earlier theorems that could be relevant to your proof.
• From this point on it is a matter of experience and intuition, but keep in mind that proving theorems is not an easy task, so don’t be discouraged. As you read through the proofs in the text, observe the techniques and try to make them part of your own repertoire.

APPENDIX B

COMPLEX NUMBERS

Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of the quadratic equation ax² + bx + c = 0, which are given by the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of the basic ideas about complex numbers that are used in this text.
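As a small illustration, the quadratic formula can be evaluated with Python's standard `cmath` module, whose square root accepts negative arguments. The function name `quadratic_roots` and the sample equation x² + 2x + 5 = 0 are choices made for this sketch.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0 (a != 0) by the quadratic formula."""
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

# x**2 + 2x + 5 = 0 has discriminant 4 - 20 = -16, so the roots are complex.
r1, r2 = quadratic_roots(1, 2, 5)
print(r1, r2)
```

Here the expression inside the radical is negative, and the two roots come out as the pair −1 + 2i and −1 − 2i.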

Complex Numbers

To deal with the problem that the equation x² = −1 has no real solutions, mathematicians of the eighteenth century invented the “imaginary” number

$$i = \sqrt{-1}$$

which is assumed to have the property

$$i^2 = \left(\sqrt{-1}\right)^2 = -1$$

but which otherwise has the algebraic properties of a real number. An expression of the form

a + bi or a + ib

in which a and b are real numbers is called a complex number. Sometimes it will be convenient to use a single letter, typically z, to denote a complex number, in which case we write

z = a + bi or z = a + ib

The number a is called the real part of z and is denoted by Re(z), and the number b is called the imaginary part of z and is denoted by Im(z). Thus,

$$\begin{array}{ll}
\operatorname{Re}(3 + 2i) = 3, & \operatorname{Im}(3 + 2i) = 2 \\
\operatorname{Re}(1 - 5i) = 1, & \operatorname{Im}(1 - 5i) = \operatorname{Im}(1 + (-5)i) = -5 \\
\operatorname{Re}(7i) = \operatorname{Re}(0 + 7i) = 0, & \operatorname{Im}(7i) = \operatorname{Im}(0 + 7i) = 7 \\
\operatorname{Re}(4) = \operatorname{Re}(4 + 0i) = 4, & \operatorname{Im}(4) = \operatorname{Im}(4 + 0i) = 0
\end{array}$$

Two complex numbers are considered equal if and only if their real parts are equal and their imaginary parts are equal; that is,

$$a + bi = c + di \ \text{ if and only if } \ a = c \text{ and } b = d$$

A complex number z = bi whose real part is zero is said to be pure imaginary. A complex number z = a whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex numbers. Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but with i² = −1:

$$(a + bi) + (c + di) = (a + c) + (b + d)i \tag{1}$$

$$(a + bi) - (c + di) = (a - c) + (b - d)i \tag{2}$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \tag{3}$$

Multiplication formula (3) is obtained by expanding the left side and using the fact that i² = −1. Also note that if b = 0, then the multiplication formula simplifies to

$$a(c + di) = ac + adi \tag{4}$$

The set of complex numbers with these operations is commonly denoted by the symbol C and is called the complex number system.


EXAMPLE 1 Multiplying Complex Numbers

As a practical matter, it is usually more convenient to compute products of complex numbers by expansion, rather than substituting in (3). For example,

$$(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i$$
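The product in this example can be checked two ways in Python: by coding formula (3) on pairs (a, b), and by using the built-in complex type, which writes i as `j`. The function name `multiply` is introduced here for illustration.

```python
def multiply(z, w):
    """Product of a + bi and c + di, given as pairs (a, b) and (c, d),
    following formula (3)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

by_formula = multiply((3, -2), (4, 5))   # pair (real part, imaginary part)
built_in = (3 - 2j) * (4 + 5j)           # Python's own complex arithmetic
print(by_formula, built_in)
```

Both computations give real part 22 and imaginary part 7, in agreement with the expansion above.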

The Complex Plane

A complex number z = a + bi can be associated with the ordered pair (a, b) of real numbers and represented geometrically by a point or a vector in the xy-plane (Figure B.1). We call this the complex plane. Points on the x -axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the y -axis have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the x -axis the real axis and the y -axis the imaginary axis (Figure B.2).

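Figure B.1 [The complex number a + bi plotted as the point, or vector, with coordinates (a, b) in the xy-plane.]

Figure B.2 [The complex plane: the horizontal real axis carries the real part a of z = a + bi, and the vertical imaginary axis carries the imaginary part b.]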

Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these operations on their associated vectors (Figure B.3, for example). In this sense the complex number system C is closely related to R 2 , the main difference being that complex numbers can be multiplied to produce other complex numbers, whereas there is no multiplication operation on R 2 that produces other vectors in R 2 (the dot product defined in Section 3.2 produces a scalar, not a vector in R 2 ).

Figure B.3 [The sum of two complex numbers; the difference of two complex numbers]

Figure B.4 [z = a + bi at the point (a, b) and its conjugate a − bi at the point (a, −b)]

If z = a + bi is a complex number, then the complex conjugate of z, or more simply, the conjugate of z, is denoted by $\bar{z}$ (read, "z bar") and is defined by

\[ \bar{z} = a - bi \tag{5} \]

Numerically, $\bar{z}$ is obtained from z by reversing the sign of the imaginary part, and geometrically it is obtained by reflecting the vector for z about the real axis (Figure B.4).


EXAMPLE 2 Some Complex Conjugates

$z = 3 + 4i$, $\bar{z} = 3 - 4i$
$z = -2 - 5i$, $\bar{z} = -2 + 5i$
$z = i$, $\bar{z} = -i$
$z = 7$, $\bar{z} = 7$

Remark The last computation in this example illustrates the fact that a real number is equal to its complex conjugate. More generally, $z = \bar{z}$ if and only if z is a real number.

The following computation shows that the product of a complex number $z = a + bi$ and its conjugate $\bar{z} = a - bi$ is a nonnegative real number:

\[ z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2i^2 = a^2 + b^2 \tag{6} \]

You will recognize that

\[ \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \]

is the length of the vector corresponding to z (Figure B.5); we call this length the modulus (or absolute value) of z and denote it by |z|. Thus,

\[ |z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \tag{7} \]

Figure B.5 [the vector z = a + bi with horizontal component a, vertical component b, and length |z|]



Note that if b = 0, then z = a is a real number and $|z| = \sqrt{a^2} = |a|$, which tells us that the modulus of a real number is the same as its absolute value.

EXAMPLE 3 Some Modulus Computations

$z = 3 + 4i$: $|z| = \sqrt{3^2 + 4^2} = 5$
$z = -4 - 5i$: $|z| = \sqrt{(-4)^2 + (-5)^2} = \sqrt{41}$
$z = i$: $|z| = \sqrt{0^2 + 1^2} = 1$

Reciprocals and Division

If $z \neq 0$, then the reciprocal (or multiplicative inverse) of z is denoted by 1/z (or $z^{-1}$) and is defined by the property

\[ \left(\frac{1}{z}\right) z = 1 \]

This equation has a unique solution for 1/z, which we can obtain by multiplying both sides by $\bar{z}$ and using the fact that $z\bar{z} = |z|^2$ [see (7)]. This yields

\[ \frac{1}{z} = \frac{\bar{z}}{|z|^2} \tag{8} \]

If $z_2 \neq 0$, then the quotient $z_1/z_2$ is defined to be the product of $z_1$ and $1/z_2$. This yields the formula

\[ \frac{z_1}{z_2} = z_1 \cdot \frac{\bar{z}_2}{|z_2|^2} = \frac{z_1\bar{z}_2}{|z_2|^2} \tag{9} \]

Observe that the expression on the right side of (9) results if the numerator and denominator of $z_1/z_2$ are multiplied by $\bar{z}_2$. As a practical matter, this is often the best way to perform divisions of complex numbers.


EXAMPLE 4 Division of Complex Numbers

Let $z_1 = 3 + 4i$ and $z_2 = 1 - 2i$. Express $z_1/z_2$ in the form a + bi.

Solution We will multiply the numerator and denominator of $z_1/z_2$ by $\bar{z}_2 = 1 + 2i$. This yields

\[ \frac{z_1}{z_2} = \frac{z_1\bar{z}_2}{z_2\bar{z}_2} = \frac{3 + 4i}{1 - 2i} \cdot \frac{1 + 2i}{1 + 2i} = \frac{3 + 6i + 4i + 8i^2}{1 - 4i^2} = \frac{-5 + 10i}{5} = -1 + 2i \]
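The conjugate, the modulus, and division by way of formula (9) can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original text; the function names `conjugate`, `modulus`, and `divide` are assumptions chosen for readability, and Python's built-in complex type is used only as a container for the real and imaginary parts.

```python
def conjugate(z):
    """Reverse the sign of the imaginary part, as in definition (5)."""
    return complex(z.real, -z.imag)

def modulus(z):
    """|z| = sqrt(z * conj(z)), as in formula (7)."""
    return (z * conjugate(z)).real ** 0.5

def divide(z1, z2):
    """z1 / z2 computed as z1 * conj(z2) / |z2|^2, as in formula (9)."""
    denom = (z2 * conjugate(z2)).real  # |z2|^2, a positive real number when z2 != 0
    return z1 * conjugate(z2) / denom

z1 = complex(3, 4)
z2 = complex(1, -2)
print(modulus(z1))     # 5.0
print(divide(z1, z2))  # (-1+2j)
```

The quotient agrees with Example 4, and `modulus(z1)` reproduces the first computation of Example 3.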

The following theorems list some useful properties of the modulus and conjugate operations.

THEOREM 1 The following results hold for any complex numbers $z$, $z_1$, and $z_2$.

(a) $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$
(b) $\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$
(c) $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$
(d) $\overline{z_1/z_2} = \bar{z}_1/\bar{z}_2$
(e) $\overline{\bar{z}} = z$

THEOREM 2 The following results hold for any complex numbers $z$, $z_1$, and $z_2$.

(a) $|\bar{z}| = |z|$
(b) $|z_1 z_2| = |z_1||z_2|$
(c) $|z_1/z_2| = |z_1|/|z_2|$
(d) $|z_1 + z_2| \le |z_1| + |z_2|$

Polar Form of a Complex Number

If z = a + bi is a nonzero complex number, and if φ is an angle from the real axis to the vector z, then, as suggested in Figure B.6, the real and imaginary parts of z can be expressed as

\[ a = |z|\cos\phi \quad\text{and}\quad b = |z|\sin\phi \tag{10} \]

Figure B.6 [the vector from the origin to (a, b), with length |z|, angle φ from the real axis, a = |z| cos φ, and b = |z| sin φ]

Thus, the complex number z = a + bi can be expressed as

\[ z = |z|(\cos\phi + i\sin\phi) \tag{11} \]

which is called a polar form of z. The angle φ in this formula is called an argument of z. The argument of z is not unique because we can add or subtract any multiple of 2π to it to obtain a different argument of z. However, there is only one argument whose radian measure satisfies

\[ -\pi < \phi \le \pi \tag{12} \]

This is called the principal argument of z.


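EXAMPLE 5 Polar Form of a Complex Number

Express $z = 1 - \sqrt{3}\,i$ in polar form using the principal argument.

Solution The modulus of z is

\[ |z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2 \]

Thus, it follows from (10) with a = 1 and $b = -\sqrt{3}$ that

\[ 1 = 2\cos\phi \quad\text{and}\quad -\sqrt{3} = 2\sin\phi \]

and this implies that

\[ \cos\phi = \frac{1}{2} \quad\text{and}\quad \sin\phi = -\frac{\sqrt{3}}{2} \]

The unique angle φ that satisfies these equations and whose radian measure satisfies (12) is φ = −π/3 (Figure B.7). Thus, a polar form of z is

\[ z = 2\left(\cos\left(-\frac{\pi}{3}\right) + i\sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\frac{\pi}{3} - i\sin\frac{\pi}{3}\right) \]

Figure B.7 [the point (1, −√3) at distance 2 from the origin, with the angle π/3 measured below the positive real axis]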
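The computation in Example 5 can be reproduced numerically. The sketch below is an illustration, not part of the original text: the function name `polar_form` is an assumption, and it relies on the standard-library function `math.atan2`, which returns an angle between −π and π and so gives the principal argument for a nonzero complex number.

```python
import math

def polar_form(a, b):
    """Return (modulus, argument) of the nonzero complex number a + bi."""
    r = math.hypot(a, b)    # sqrt(a^2 + b^2), the modulus
    phi = math.atan2(b, a)  # angle from the positive real axis, between -pi and pi
    return r, phi

r, phi = polar_form(1.0, -math.sqrt(3.0))
print(r, phi, -math.pi / 3)  # modulus, computed argument, and the expected -pi/3

# Rebuild a and b from formula (10) as a consistency check.
a, b = r * math.cos(phi), r * math.sin(phi)
print(a, b)
```

Up to floating-point rounding, the modulus is 2 and the argument is −π/3, matching the solution above.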

Geometric Interpretation of Multiplication and Division of Complex Numbers


We will now show how polar forms of complex numbers provide geometric interpretations of multiplication and division. Let

\[ z_1 = |z_1|(\cos\phi_1 + i\sin\phi_1) \quad\text{and}\quad z_2 = |z_2|(\cos\phi_2 + i\sin\phi_2) \]

be polar forms of the nonzero complex numbers $z_1$ and $z_2$. Multiplying, we obtain

\[ z_1 z_2 = |z_1||z_2|[(\cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2) + i(\sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2)] \]

Now applying the trigonometric identities

\[ \cos(\phi_1 + \phi_2) = \cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2 \]
\[ \sin(\phi_1 + \phi_2) = \sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2 \]

yields

\[ z_1 z_2 = |z_1||z_2|[\cos(\phi_1 + \phi_2) + i\sin(\phi_1 + \phi_2)] \tag{13} \]

which is a polar form of the complex number that has modulus $|z_1||z_2|$ and argument $\phi_1 + \phi_2$. Thus, we have shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding their arguments (Figure B.8).

Figure B.8 [the vectors $z_1$, $z_2$, and $z_1 z_2$, with moduli $|z_1|$, $|z_2|$, $|z_1||z_2|$ and arguments $\phi_1$, $\phi_2$, $\phi_1 + \phi_2$]

Similar kinds of computations show that

\[ \frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)] \tag{14} \]

which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting their arguments (each in the appropriate order).
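Formulas (13) and (14) lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the standard-library functions `cmath.polar` (returning modulus and argument) and `cmath.rect` (rebuilding a complex number from them), and the two sample numbers are chosen only for convenience.

```python
import cmath
import math

z1 = complex(1, math.sqrt(3))
z2 = complex(math.sqrt(3), 1)

r1, phi1 = cmath.polar(z1)  # modulus and argument of z1
r2, phi2 = cmath.polar(z2)  # modulus and argument of z2

# Formula (13): multiply the moduli, add the arguments.
product = cmath.rect(r1 * r2, phi1 + phi2)
# Formula (14): divide the moduli, subtract the arguments.
quotient = cmath.rect(r1 / r2, phi1 - phi2)

# Compare with direct complex arithmetic, allowing for rounding error.
print(abs(product - z1 * z2) < 1e-12)   # True
print(abs(quotient - z1 / z2) < 1e-12)  # True
```

The comparison uses a small tolerance because the trigonometric functions are evaluated in floating point.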

EXAMPLE 6 Multiplying and Dividing in Polar Form

Use polar forms of the complex numbers $z_1 = 1 + \sqrt{3}\,i$ and $z_2 = \sqrt{3} + i$ to compute $z_1 z_2$ and $z_1/z_2$.

Solution Polar forms of these complex numbers are

\[ z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) \quad\text{and}\quad z_2 = 2\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right) \]

(verify). Thus, it follows from (13) that

\[ z_1 z_2 = 4\left[\cos\left(\frac{\pi}{3} + \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} + \frac{\pi}{6}\right)\right] = 4\left[\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right] = 4i \]

and from (14) that

\[ \frac{z_1}{z_2} = 1 \cdot \left[\cos\left(\frac{\pi}{3} - \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} - \frac{\pi}{6}\right)\right] = \cos\frac{\pi}{6} + i\sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{1}{2}i \]

As a check, let us calculate $z_1 z_2$ and $z_1/z_2$ directly:

\[ z_1 z_2 = (1 + \sqrt{3}\,i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3}\,i^2 = 4i \]

\[ \frac{z_1}{z_2} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} \cdot \frac{\sqrt{3} - i}{\sqrt{3} - i} = \frac{\sqrt{3} - i + 3i - \sqrt{3}\,i^2}{3 - i^2} = \frac{2\sqrt{3} + 2i}{4} = \frac{\sqrt{3}}{2} + \frac{1}{2}i \]

which agrees with the results obtained using polar forms.

Remark The complex number i has a modulus of 1 and a principal argument of π/2. Thus, if z is a complex number, then iz has the same modulus as z but its argument is greater by π/2 (= 90°); that is, multiplication by i has the geometric effect of rotating the vector z counterclockwise by 90° (Figure B.9).

Figure B.9 [the vectors z and iz, with iz obtained by rotating z counterclockwise through 90°]

DeMoivre’s Formula

If n is a positive integer, and if z is a nonzero complex number with polar form

\[ z = |z|(\cos\phi + i\sin\phi) \]

then raising z to the nth power yields

\[ z^n = \underbrace{z \cdot z \cdots z}_{n\ \text{factors}} = |z|^n\bigl[\cos(\underbrace{\phi + \phi + \cdots + \phi}_{n\ \text{terms}}) + i\sin(\underbrace{\phi + \phi + \cdots + \phi}_{n\ \text{terms}})\bigr] \]

which we can write more succinctly as

\[ z^n = |z|^n(\cos n\phi + i\sin n\phi) \tag{15} \]

In the special case where |z| = 1 this formula simplifies to

\[ z^n = \cos n\phi + i\sin n\phi \]

which, using the polar form for z, becomes

\[ (\cos\phi + i\sin\phi)^n = \cos n\phi + i\sin n\phi \tag{16} \]

This result is called DeMoivre’s formula, named for the French mathematician Abraham de Moivre (1667–1754).

Euler’s Formula

If θ is a real number, say the radian measure of some angle, then the complex exponential function $e^{i\theta}$ is defined to be

\[ e^{i\theta} = \cos\theta + i\sin\theta \tag{17} \]

which is sometimes called Euler’s formula, named for the Swiss mathematician Leonhard Euler (1707–1783). One motivation for this formula comes from the Maclaurin series in calculus. Readers who have studied infinite series in calculus can deduce (17) by formally


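substituting iθ for x in the Maclaurin series for $e^x$ and writing

\[
\begin{aligned}
e^{i\theta} &= 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \cdots \\
&= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} + \cdots \\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\
&= \cos\theta + i\sin\theta
\end{aligned}
\]

where the last step follows from the Maclaurin series for cos θ and sin θ.

If z = a + bi is any complex number, then the complex exponential $e^z$ is defined to be

\[ e^z = e^{a+bi} = e^a e^{ib} = e^a(\cos b + i\sin b) \tag{18} \]

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example,

\[ e^{z_1}e^{z_2} = e^{z_1 + z_2}, \qquad \frac{e^{z_1}}{e^{z_2}} = e^{z_1 - z_2}, \qquad \frac{1}{e^z} = e^{-z} \]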
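DeMoivre's formula (16), Euler's formula (17), and definition (18) can all be spot-checked numerically. The sketch below is an illustration, not part of the original text: the sample values of θ, n, and z are arbitrary assumptions, and `cmath.exp` is the standard-library complex exponential.

```python
import cmath
import math

theta = 0.7  # arbitrary sample angle in radians
n = 5        # arbitrary positive integer

# Euler's formula (17): e^{i theta} = cos(theta) + i sin(theta)
rhs17 = complex(math.cos(theta), math.sin(theta))
euler_ok = abs(cmath.exp(1j * theta) - rhs17) < 1e-12

# DeMoivre's formula (16): (cos(theta) + i sin(theta))^n = cos(n theta) + i sin(n theta)
rhs16 = complex(math.cos(n * theta), math.sin(n * theta))
demoivre_ok = abs(rhs17 ** n - rhs16) < 1e-12

# Definition (18): e^{a+bi} = e^a (cos b + i sin b)
z = complex(0.5, 2.0)
rhs18 = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))
exp_ok = abs(cmath.exp(z) - rhs18) < 1e-12

print(euler_ok, demoivre_ok, exp_ok)  # True True True
```

A numerical check of this kind does not prove the formulas; it only confirms that they hold for the sampled values up to rounding error.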

ANSWERS TO EXERCISES

Exercise Set 1.1 (page 8)

1. (a), (c), and (f) are linear equations; (b), (d), and (e) are not linear equations

3. (a) $a_{11}x_1 + a_{12}x_2 = b_1$
       $a_{21}x_1 + a_{22}x_2 = b_2$
   (b) $a_{11}x_1 + a_{12}x_2 + a_{13}x_3 = b_1$
       $a_{21}x_1 + a_{22}x_2 + a_{23}x_3 = b_2$
       $a_{31}x_1 + a_{32}x_2 + a_{33}x_3 = b_3$
   (c) $a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 = b_1$
       $a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 = b_2$

5. (a) $2x_1 = 0$, $3x_1 - 4x_2 = 0$, $x_2 = 1$
   (b) $3x_1 - 2x_3 = 5$, $7x_1 + x_2 + 4x_3 = -3$, $-2x_2 + x_3 = 7$

7. (a) $\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}$
   (b) $\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}$
   (c) $\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}$

9. (a), (d), and (e) are solutions; (b) and (c) are not solutions

11. (a) No points of intersection
    (b) Infinitely many points of intersection: $x = \frac{1}{2} + 2t$, $y = t$
    (c) One point of intersection: (−8, −4)

13. (a) $x = \frac{3}{7} + \frac{5}{7}t$, $y = t$
    (b) $x_1 = \frac{7}{3} + \frac{5}{3}r - \frac{4}{3}s$, $x_2 = r$, $x_3 = s$
    (c) $x_1 = -\frac{1}{8} + \frac{1}{4}r - \frac{5}{8}s + \frac{3}{4}t$, $x_2 = r$, $x_3 = s$, $x_4 = t$
    (d) $v = \frac{8}{3}t_1 - \frac{2}{3}t_2 + \frac{1}{3}t_3 - \frac{4}{3}t_4$, $w = t_1$, $x = t_2$, $y = t_3$, $z = t_4$

15. (a) $x = \frac{1}{2} + \frac{3}{2}t$, $y = t$
    (b) $x_1 = -4 - 3r + s$, $x_2 = r$, $x_3 = s$

17. (a) Add 2 times the second row to the first row.
    (b) Add the third row to the first row, or interchange the first row and the third row.

19. (a) All values of $k \neq 2$
    (b) All values of k

25. 2x + 3y + z = 7
    2x + y + 3z = 9
    4x + 2y + 5z = 16

27. x + y + z = 12
    2x + y + 2z = 5
    −x + z = 1

True/False 1.1 (a) True

(b) False

(c) True

(d) True

(e) False

(f) False

(g) True

(h) False

Exercise Set 1.2 (page 22)

1. (a) Both (b) Both (c) Both (d) Both (e) Both (f) Both (g) Row echelon form

3. (a) x = −37, y = −8, z = 5
   (b) w = −10 + 13t, x = −5 + 13t, y = 2 − t, z = t
   (c) $x_1 = -11 - 7s + 2t$, $x_2 = s$, $x_3 = -4 - 3t$, $x_4 = 9 - 3t$, $x_5 = t$
   (d) No solution

5. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$

7. x = −1 + t, y = 2s, z = s, w = t

9. $x_1 = 3$, $x_2 = 1$, $x_3 = 2$

11. x = −1 + t, y = 2s, z = s, w = t

13. Has nontrivial solutions

15. $x_1 = 0$, $x_2 = 0$, $x_3 = 0$

17. $x_1 = -\frac{1}{4}s$, $x_2 = -\frac{1}{4}s - t$, $x_3 = s$, $x_4 = t$

19. w = t, x = −t, y = t, z = 0

21. $I_1 = -1$, $I_2 = 0$, $I_3 = 1$, $I_4 = 2$

23. (a) Consistent; unique solution
    (b) Consistent; infinitely many solutions
    (c) Inconsistent
    (d) Insufficient information provided


25. No solutions when a = −4; infinitely many solutions when a = 4; one solution for all values $a \neq -4$ and $a \neq 4$

27. −a + b + c = 0

29. $x = \frac{2}{3}a - \frac{1}{9}b$, $y = -\frac{1}{3}a + \frac{2}{9}b$

31. E.g., $\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ (other answers are possible)

35. $x = \pm 1$, $y = \pm\sqrt{3}$, $z = \pm\sqrt{2}$

37. a = 1, b = −6, c = 2, d = 10

39. The nonhomogeneous system has only one solution.

True/False 1.2 (a) True

(b) False

(c) False

(d) True

(e) True

(f) False

(g) True

(h) False

(i) False

Exercise Set 1.3 (page 36) (b) Defined; 4 × 4 matrix

1. (a) Undefined

(d) Defined; 5 × 2 matrix



7

6

⎢ 3. (a) ⎣−2

5



(e) Defined; 4 × 5 matrix



−5

⎥ 3⎦

(c) Defined; 4 × 2 matrix

⎤ −1 ⎥ −1⎦

4



⎢ 1 (b) ⎣ 0 −1 7 3 7 −1 1 1 ⎡ ⎤ 22 −6 8 ⎢ ⎥ 4 6⎦ (e) Undefined (f) ⎣−2 (g) 10 (i) 5



(j) −25

 (f)

(k) 168



−3 5⎦

12 5. (a) ⎣−4 4

0

17

35



−39 −33

42

108

75

(c) ⎣12 36

−3

21⎦

⎢ ⎣

9





11 8

1



41



41



(b) 63

78

63

12

6

−20

24

8

41

⎡ ⎤





⎢ ⎥





−2

3

7

4

⎡ ⎤

⎢ ⎥









−2

3

⎢ ⎥

9

⎡ ⎤ ⎢ ⎥





⎡ ⎤

⎢ ⎥





⎢ ⎥

7

third column of AA = 7 ⎣6⎦ + 4 ⎣ 5⎦ + 9 ⎣4⎦



⎢ 11. (a) A = ⎣9

2

−3 −1

1

5

4

0



⎢5 ⎢ (b) A = ⎢ ⎣2

−5

0

3

1

0

4

9

(d) ⎣11 7

−11



0







13

(e) ⎣11 7

−11

9

17

9

⎤ ⎥

17⎦ 13

9



14⎦

(i) 61

(j) 35

(k) 28

(l) 99

16

⎡ ⎤

⎡ ⎤ 

(e) 24

56

76



⎢ ⎥

(f) ⎣98⎦ 97

97

⎡ ⎤





⎡ ⎤

⎢ ⎥





⎢ ⎥

7

−2 7

⎡ ⎤

4









−2

6

⎢ ⎥

5

⎡ ⎤ 4

⎢ ⎥

second column of BB = −2 ⎣0⎦ + 1 ⎣ 1⎦ + 7 ⎣3⎦ 7 7 5

⎡ ⎤





⎡ ⎤

⎢ ⎥





⎢ ⎥

6

−2

4

third column of BB = 4 ⎣0⎦ + 3 ⎣ 1⎦ + 5 ⎣3⎦

⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ x1 5 7 2 −3 5 7 x1 ⎥ ⎥⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 1⎦, x = ⎣x2 ⎦, b = ⎣−1⎦; ⎣9 −1 1⎦ x2 = ⎣−1⎦ x3 4 0 1 5 4 0 x3 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎤⎡ ⎤ ⎡ ⎤ −3 1 1 4 0 −3 1 1 x1 x1 ⎢x ⎥ ⎢ 3⎥ ⎢ 5 ⎢ ⎥ ⎢ ⎥ 0 −8 ⎥ 1 0 −8⎥ ⎢ 2⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢x2 ⎥ ⎢3⎥ ⎥, x = ⎢ ⎥ , b = ⎢ ⎥; ⎢ ⎥⎢ ⎥ = ⎢ ⎥ ⎣x3 ⎦ ⎣0⎦ ⎣2 −5 9 −1 ⎦ 9 −1⎦ ⎣x3 ⎦ ⎣0⎦ 2 2 0 3 −1 −1 7 x4 7 x4 ⎤

45



17⎦

17

3

7

⎡ ⎤

−2



(b) first column of BB = 6 ⎣0⎦ + 0 ⎣ 1⎦ + 7 ⎣3⎦

second column of AA = −2 ⎣6⎦ + 5 ⎣ 5⎦ + 4 ⎣4⎦ 0 4 9 3



6

9. (a) first column of AA = 3 ⎣6⎦ + 6 ⎣ 5⎦ + 0 ⎣4⎦ 0

0

45

(d) ⎣ 6⎦ 63

⎡ ⎤



0

3

⎢ ⎥

(c) ⎣21⎦ 67



0⎦





 −14 −35

−28 −7

(h) ⎣0 0

6

⎢ ⎥

57

−7 −21

⎤ −24 ⎥ −15⎦ −30



⎢ (h) ⎣48



67

(d)



⎡ ⎤

7. (a) 67

5

−21 −6 −12



−2

0 (g) 12

5





⎥ 10⎦

(l) Undefined

(b) Undefined



17



0

⎢ (c) ⎣−5

4

1

21

(f) Defined; 5 × 5 matrix

15

7

7

5


13. (a) 5x1 + 6x2 − 7x3 = 2 −x1 − 2x2 + 3x3 = 0 4x2 − x3 = 3 15. k = −1

 

17.

4  0 2

  19.

 



1  1 4

 −3  −2 −1





2 +

1

(b) x + y + z = 2 = 2 2 x + 3y 5x − 3y − 6z = −9

 



2  2 + 3 5





0

1 =

3

0





3  4 + 5 6

6 =

4

8

2

4

1

2

4

8



 +



2

 +

  −9 −3 6 −5 = 2 −1 −3 −1    

6

6

8

15

+

20

15

18

30

36

=

5



3



22

28

49

64

⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ x1 −4 −2 −3 0 ⎢ 0⎥ ⎢ 0⎥ ⎢x ⎥ ⎢ ⎥ ⎢ 1⎥ ⎢ 2 ⎥ ⎢ 0⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢x3 ⎥ ⎢ 0⎥ ⎢ −2 ⎥ ⎢ ⎥ ⎢ 0⎥ ⎥ = ⎢ ⎥ + r ⎢ ⎥ + s ⎢ ⎥ + t ⎢ 0⎥ 21. ⎢ ⎢x ⎥ ⎢ 0⎥ ⎢ 1⎥ ⎢ 0⎥ ⎢ 0⎥ 4 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ x5 ⎦ ⎣ 0⎦ ⎣ 1⎦ ⎣ 0⎦ 0 1 0 0 0 x6 3 23. a = 4 , b = −6 , c = −1 , d = 1



1

1

27. The only matrix satisfying the given condition is A = ⎣1

−1

0

0



 29. (a)

1

1

1

1





−1 −1

and

−1 −1

0

⎤ ⎥

0⎦. 0

 √

(b) Four square roots can be found:

5

0

  √ − 5

0

0

3

0

3

,

 √ ,

5

0

0

−3

 , and



 √ − 5

0

0

−3

 .



the total cost of items purchased in January ⎢the total cost of items purchased in February⎥ ⎥ 33. The matrix product represents ⎢ ⎣the total cost of items purchased in March ⎦. the total cost of items purchased in April

True/False 1.3 (a) True (l) False

(b) False (m) True

(c) False (n) True

(d) False (o) False

Exercise Set 1.4 (page 49) 

5.

3 20

1 5

− 15 

19. (a)

1 10





7.



41

15

30

11



1 2

0

0

1 3

 (b)



1 2





1 0

ex + e−x



  − 21 ex − e−x   

9.

11

−15

−30

41

(c)

23. The matrices commute if c = 0 and a = d . 31. (a) E.g., A =

(e) True



0 0 and B = 0 0

1 0

6

2

4

2

(f) False

(g) False

(h) True

   2 − 21 ex − e−x 15. 71   1 ex + e−x 7 2   

25. x1 =

21. (a) 1 23

, x2 =

1

1

2

−1

13 23

(b)





1 3 7

17.



20

7

14

6

1 27. x1 = − 11 , x2 =

(b) (A + B) (A − B) = A (A − B) + B (A − B) = A2 − AB + BA − B 2 (c) AB = BA

(i) True

− 139

 (c) 6 11

2 13

(j) True

1 13 − 136



36

13

26

10



(k) True


⎡ 37. Invertible; A−1

35. No



1⎥ 2⎦

1 2

− 21

1 2



− 21

1 2

1 2

⎢ 1 =⎢ ⎣− 2

39. B −1

1 2

True/False 1.4 (a) False

(b) False

(c) False

(d) False

(e) False

(f) True

(g) True

(h) True

(i) False

(j) True

(k) False

Exercise Set 1.5 (page 58) 1. (a) Elementary

(b) Not elementary

(c) Not elementary

 3. (a) Add 3 times the second row to the first row:

1

3

0

1



− 17



(b) Multiply the first row by − 17 : ⎣ 0 0

0

0

1

0⎦

0

1

5. (a) Interchange the first and second rows: EA =

3

−1



−6 −2 ⎡

−6

0

⎢ 7. (a) ⎣0 1

0



1

1

⎥ 0⎦

0

0

0

⎢ (b) ⎣0 1







1

⎥ 0⎦

0

0

⎢ (c) ⎣ 0 −2

0

−40

16

11. (a) The inverse is ⎣ 13

−5 −2



5



−3

1 2 1 4

1 4 ⎢− 1 ⎢ 8

17. ⎢

⎣ 0

0

1 40

− 201

− 23 1 2 1 − 10

0

1

2

13

23. A =

1

−2

0

1



⎡1 ⎢ ⎢ 0 19. (a) ⎢ ⎢ 0 ⎣

k2

0

0



1

0

1

0

1

5

2

1

0

−8

0

1

(answer is not unique)

0

k1

1

1

1

0

−4

−1

5

9

4

−12



 ;A

=

0

0

0⎦

0

0

1

0⎥ ⎥



−10



1

−5

1

1

1 2

− 21

1⎤ 2



0

1 13. ⎢ ⎣− 2

1 2

1⎥ 2⎦

7 2

15. ⎣−1

1

1 2

1 2

0

−1

k

− k1

0 (b) ⎢ ⎣0 0

1 0 0



0

1





0

⎡1

1



2



0

9. (a)



− 21 0 0



0 0⎥ ⎥

k

− k1 ⎦

0

1

1



1

0

1

0

0

− 18

−2

1



−7

4

2

−1



k4

 −1

0

3⎥ ⎦

⎥ 0⎦

⎢ (d) ⎣0

⎥ 0⎥ ⎥ 0⎥ ⎦

0

1



0



k3

−4

1

0

0

1

6



0

1

0

0



0



⎥ 0⎦

0

0

5⎦

(b) Not invertible

− 15 

0

−1 −3



⎥ −3⎦. −1

1

28



21. Any value of c other than 0 and 1



0

0







9



0⎥ ⎥ ⎥ 0⎦

1

−6 −1

5

3



0⎦

0

(c) Add 4 times the third row to the first row: EA = ⎣ 2



1

(d) Interchange the first and third rows: ⎢

⎢ (b) Add −3 times the second row to the third row: EA = ⎣ 1 −1 ⎡

0

⎢0 ⎢ ⎣1





0





1







(c) Add 5 times the first row to the third row: ⎣0 5



(d) Not elementary

1

2

0

1



(b) Not invertible

−3

⎤ ⎥

0⎦ 1




⎤⎡

⎤⎡

1

0

0

1

0

25. A = ⎣0

4

0⎦ ⎣0

1

3⎥⎢ 0 4⎦⎣

0

0

1

0

1



⎥⎢

0

0

1

−2

0

0









⎤⎡

1

0

2

1

0

1

0⎦; A−1 = ⎣0

1

0⎦ ⎣0

1

0

1

0

1

0

0

⎥⎢

0

⎤⎡

0

1

⎥⎢ − 43 ⎦ ⎣0 0

1



0

0

1 4

0⎦

0

1



(answer is not unique) 27. Add −1 times the first row to the second row; add −1 times the second row to the first row; add −1 times the first row to the third row (answer is not unique)

True/False 1.5 (a) False

(b) True

(c) True

(d) True

(e) True

(f) True

(g) False

Exercise Set 1.6 (page 66) 1. x1 = 3, x2 = −1 9. (i) x1 =

22 , 17

11. (i) x1 =

7 , 15

3. x1 = −1, x2 = 4, x3 = −7

x2 =

1 17

(ii) x1 =

21 , 17

x2 =

4 15

(ii) x1 =

34 15

x2 =

5. x = 1, y = 5, and z = −1

, x2 =

28 15

(iii) x1 =



11

12

−3

27

19. X = ⎣ −6

−8

1

−18

⎥ −17⎦

−15

−21

9

−38

−35



26

19 , 15

x2 =

(iv) x1 = − 15 , x2 =

13 15

15. b1 = b2 + b3

13. The system is consistent for all values of b1 and b2 .



7. x1 = 2b1 − 5b2 , x2 = −b1 + 3b2

11 17 3 5

17. b1 = b3 + b4 and b2 = 2b3 + b4

True/False 1.6 (a) True

(b) True

(c) True

(d) True

(e) True

(f) True

(g) True

Exercise Set 1.7 (page 72) 1. (a) (b) (c) (d)

⎡ ⎢

Upper triangular and invertible Lower triangular and not invertible Diagonal, upper triangular, lower triangular, and invertible Upper triangular and not invertible

6





⎥ −1⎦

5. ⎣

−15

3

3. ⎣4 4



10

⎡1 4

⎢ 9. A = ⎣ 0 2

0

⎡ ⎢

au

15. (a) ⎣bw

cy

0

0



10

0

20

2

−10

6

0

18

−6 ⎡

−6

−6 ⎤

4

1 9

⎥ ⎢ 0 ⎦, A−2 = ⎣0

0

1 16

av



⎥ bx ⎦ cz

19. Not invertible

0



0

0

0

tc

(b) ⎣ua

vb

⎥ wc⎦

xa

yb

zc

 17. (a)

23. −3, 5, −6

0

3

0

4k

2

−1

−1

3



1

0

0

4



25. a = −8



, A−2 =



⎥ 0⎦

k

0



sb

21. Invertible

2k

16

ra



 7. A2 =

⎥ ⎢ 0⎦, A−k = ⎣ 0

9 0

⎤ −20 ⎥ 6⎦ −6 ⎡



0

⎢ 11. ⎣0 0



1

0



1

0

0

1 4

0



, A−k =



0

⎥ 0⎦

0

0

3

7

 13.

2

0 1

0

1

0

0

−1



(−2)k





⎢3 ⎢ ⎣7

1

−8

−8

0

−3⎥ ⎥ ⎥ 9⎦

2

−3

9

0

(b) ⎢

1

27. All x such that $x \neq 1$, $x \neq -2$, and $x \neq 4$

29. They are reciprocals of the corresponding diagonal entries of the matrix A.



1

0

31. ⎣0

−1

0

0



0

⎤ ⎥

0⎦

−1


(b) Not symmetric (unless n = 1)

37. (a) Symmetric

 39.

1

10

0

−2







0

0

4

41. (a) ⎣ 0

0

1⎦

−4

−1







0

0

(b) ⎣0

0

−8 ⎥ −4 ⎦

8

4

0



0

(d) Not symmetric (unless n = 1)

(c) Symmetric



43. No

True/False 1.7 (a) True (l) False

(b) False (m) True

(c) False

(d) True

(e) True

(f) False

(g) False

(h) True

(i) True

(j) False

(k) False

Exercise Set 1.8 (page 82) 1. (a) (b) (c) (d)

Domain: Domain: Domain: Domain:

R 2 ; codomain: R 3 ; codomain: R 3 ; codomain: R 6 ; codomain:

R3 R2 R3 R

3. (a) Domain: R 2 ; codomain: R 2 (b) Domain: R 2 ; codomain: R 3

5. (a) Domain: R 3 ; codomain: R 2 (b) Domain: R 2 ; codomain: R 3

 7. (a) Domain: R 2 ; codomain: R 2 (b) Domain: R 3 ; codomain: R 2



0

1

⎢−1 ⎢ 13. (a) ⎢ ⎣ 1 ⎡







0⎥ ⎥

7

⎢ (b) ⎣ 0 −1



3⎦

−1

1

9. Domain: R 2 ; codomain: R 3

2

−1

1

1

⎥ 0⎦

0

0

0

1

0



5

−1



0

1

0

19. (a) TA (x) =

1

0

0

⎥ 0⎥ ⎥ 0⎥ ⎥ ⎥ 0⎦

0

0

0



0

⎢1 ⎢ ⎢ (d) ⎢ ⎢0 ⎢ ⎣0 1

1

5

−1

0

1

0

0

0⎥

0

1 0

0

−1

4

1

2

3

4







3

−2

 =

−1 1

0

 (b) TA (x) =

(b) False

 −1

2

3

1

⎡ ⎤    −1 0 ⎢ ⎥ 3 ⎣ 1⎦ = 5

13

3

⎡ ⎤ 2

(c) True

(d) False

2

(b) ⎣0

−1

4

7





0

1

7

⎥ ⎥ 0⎥ ⎥ ⎥ 0⎦ 0



⎡ ⎤ 2

⎢ ⎥

(b) ⎣5⎦ 6

⎡ ⎢

0

(e) True

(f) False

⎤ ⎥

(c) ⎣ 14⎦

−21

True/False 1.8 (a) False

−3

3



1 1⎦; T (2, 1, −3) = (0, −2, 0) 0

⎢ ⎥ ⎥ 0 −3⎦ ; T (x) = ⎣6⎦ 29. (a + c, b + d) 1 0 1 −1 ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ −1 3 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 31. (a) TA (e1 ) = ⎣ 2⎦, TA (e2 ) = ⎣1⎦, TA (e3 ) = ⎣ 2⎦ 4 5 −3 ⎢ 27. ⎣3



2







0

0

0

−1 ⎥ 1⎦; T (−1, 2, 4) = (3, −2, −3) 3 2 −1 ⎡   2 −1 −1 1 1 17. (a) ; T (−1, 4) = (5, 4) (b) ⎣0 3

15. ⎣4

0

⎢0 ⎢ ⎢ (c) ⎢ ⎢0 ⎢ ⎣0

11. (a)



(g) False

25. No, unless b = 0

⎤ −8 ⎥ 5⎦ −1


Exercise Set 1.9 (page 94) 1.

x2 − x3 = 100 x3 − x4 = −500 x1 − x2 = 300 −x1 + x4 = 100 (b) x1 = −100 + s , x2 = −400 + s , x3 = −500 + s , x4 = s (c) To keep the traffic flowing on all roads, the flow from A to B

3. (a)

50 40

10

10

50

30

60

must exceed 500 vehicles per hour. 40

5. I1 = 2.6A , I2 = −0.4A , I3 = 2.2A

7. I1 = I4 = I5 = I6 = 0.5A, I2 = I3 = 0A

11. CH3 COF + H2 O → CH3 COOH + HF

4 3

15. 1 +

13 x 6



9. C3 H8 + 5O2 → 3CO2 + 4H2 O

1 3 x 6

2

t=2 t= 1

17. (a) p (x) = 1 + (1 − t) x + tx y (b)

13. 2 − 2x + x

2

2

–1 t= 2 t=–

–2 –1

x

1

True/False 1.9 (a) False

(b) False

(c) True

(d) False

(e) False

Exercise Set 1.10 (page 100)



0.50 0.25 0.25 0.10 (b) M must produce approximately $25,290.32 worth of mechanical work and B must produce approximately $22,580.65 worth of body work

1. (a)



0.10 3. (a) ⎣0.30 0.40

0.60 0.20 0.10



0.40 0.30⎦ 0.20







$31,500 (b) ⎣$26,500⎦ $26,300

5. x ≈

123.08 202.56

True/False 1.10 (a) False

(b) True

(c) False

(d) True

(e) True

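Chapter 1 Supplementary Exercises (page 101)

1. $3x_1 - x_2 + 4x_4 = 1$
   $2x_1 + 3x_3 + 3x_4 = -1$
   $x_1 = -\frac{3}{2}s - \frac{3}{2}t - \frac{1}{2}$, $x_2 = -\frac{9}{2}s - \frac{1}{2}t - \frac{5}{2}$, $x_3 = s$, $x_4 = t$

3.

5. $x' = \frac{3}{5}x + \frac{4}{5}y$, $y' = -\frac{4}{5}x + \frac{3}{5}y$

9. (a) $a \neq 0$ and $b \neq 2$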



11.

0 1

2 1

13. (a)

−1 6

3 0

$x_1 = -\frac{17}{2}$, $x_2 = -\frac{26}{3}$, $x_3 = -\frac{35}{3}$

7. x = 4, y = 2, z = 3

(b) $a \neq 0$ and $b = 2$



2x1 − 4x2 + x3 = 6 −4x1 + 3x3 = −1 x2 − x3 = 3

−1

1

(b)



1 3

(c) a = 0 and b = 2

−2 1



(c)

(d) $a = 0$ and $b \neq 2$

− 113 37

− 160 37

− 20 37

− 46 37



15. a = 1, b = −2, c = 3


Exercise Set 2.1 (page 111) 1. M11 = 29, C11 = 29 M12 = 21, C12 = −21 M13 = 27, C13 = 27 M21 = −11, C21 = 11 M22 = 13, C22 = 13 M23 = −5, C23 = 5 M31 = −19, C31 = −19 M32 = −19, C32 = 19 M33 = 19, C33 = 19 9. a 2 −5a+21 23. 0

3. (a) (b) (c) (d)

11. −65

25. −240

M13 M23 M22 M21

= 0, C13 = 0 = −96, C23 = 96 = −48, C22 = −48 = 72, C21 = −72

13. −123

27. −1

29. 0

−5

11

22

1 11

3 22

5. 22;

15. λ = −3 or λ = 1 31. 6

2

 7. 59;

17. λ = 1 or λ = −1

33. (a) The determinant is 1.

 −2

−7

59

59

7 59

−5



59

19. (all parts) −123

(b) The determinant is 1.

21. −40

35. d1 + λ = d2

37. If n = 1 then the determinant is 1. If n ≥ 2 then the determinant is 0.

True/False 2.1 (a) False

(b) False

(c) True

(d) True

(e) True

(f) True

(g) False

(h) False

(i) False

(j) True

Exercise Set 2.2 (page 117) 5. −5 21. 18

7. −1

9. 33

31. −24

13. −2

11. 6

33. det(B) = (−1)

n/2"

15. −6

19. −6

17. 72

det(A)

True/False 2.2 (a) True

(b) True

(c) False

(d) False

(e) True

(f) True

Exercise Set 2.3 (page 127) 5. det(A + B)  = det(A) + det(B)

7. Invertible

9. Invertible

11. Not invertible



15. $k \neq \frac{5 - \sqrt{17}}{2}$ and $k \neq \frac{5 + \sqrt{17}}{2}$

−4

17. $k \neq -1$

3

0

⎢ 2 ⎢ ⎣ −7

−1

0

0

−1

6

0

1

23. Invertible; A−1 = ⎢

35. (a) 189

(b)

(c)

8 7

(d)

25. x =

3 , 11

y=

2 , 11

−5 4 −2

z = − 111

−5



5⎦ −3

⎡1



3 2

1

21. Invertible; A−1 = ⎢ ⎣0

1

3⎥ 2⎦

0

0

1 2



2



27. x1 = − 30 , x2 = − 38 , x3 = − 40 11 11 11

−7

31. y = 0

29. Cramer’s rule does not apply. 1 7

⎤ −1 0⎥ ⎥ ⎥ 8⎦

19. Invertible; A−1

3 = ⎣−3 2

13. Invertible

33. (a) −189

(b) − 17

(c) − 87

1 (d) − 56

(e) 7

1 56

True/False 2.3 (a) False (l) False

(b) False

(c) True

(d) False

(e) True

(f) True

(g) True

(h) True

(i) True

(j) True

(k) True


Chapter 2 Supplementary Exercises (page 129) 1. −18

5. −10

3. 24

9. Exercise 3: 24; Exercise 4: 0; Exercise 5: −10; Exercise 6: −48

7. 329

11. The matrices in Exercises 1–3 are invertible; the matrix in Exercise 4 is not.

 17.

− 16

1 9

1 6

2 9

⎡1 8

− 18

1 19. ⎢ ⎣8

5 24

1 4

− 127









⎥ − 241 ⎥ ⎦

21. ⎢ ⎣

− 38

2 5

1 5

− 35

− 25

6 5



− 121

25. $x' = \frac{3}{5}x + \frac{4}{5}y$, $y' = \frac{3}{5}y - \frac{4}{5}x$

1 5

29. (b) cos β =

− 101



13. −b2 + 5b − 21 10 329

⎢ 55 ⎢ ⎢ 329 ⎢−3 ⎣ 47



2⎥ 5⎦

23. ⎢

− 103

a 2 +c2 −b2 , 2ac



31 − 329

cos γ =

2 − 329

52 329

11 − 329

43 − 329

10 47

25 − 47

72 329

102 329

15. −120 27 − 329

⎤ ⎥

16 ⎥ 329 ⎥

⎥ − 476 ⎥ ⎦ 15 − 329

a 2 +b2 −c2 2ab

Exercise Set 3.1 (page 140) 1. (a) (3, −4)

(b) (2, −3, 4)

3. (a) (−1, 3)

7. (a) (−1, 2, −4) is one possible answer 9. (a) (1, −4)

(b) (−12, 8)

11. (a) (−1, 9, −11, 1) 13.



− 253 , 7, − 323 , − 23



33.

lb ≈ 183.01 lb and

500 √ 1+ 3

(c) (−90, −114, 60, −36)

9

, − 21 , − 21 2

(b) Parallel to u



31. Magnitude of F is

√ 750√ 2 3+ 3

(b) (−2, −2, −1)

(d) (4, 29)

15. (a) Not parallel to u

(c) −a

(b) 0

(c) (38, 28)

23. (a)

5. (a) (2, 3)

(b) (7, −2, −6) is one possible answer

(b) (−13, 13, −36, −2)

19. c1 = 2, c2 = −1, c3 = 5 29. (a) 0

(b) (−3, 6, 1)

(b)



 23

, − 49 , 41 4



(d) (27, 29, −27, 9) 17. a = 3, b = −1

(c) Parallel to u

(b) (3, −8)

25. (a) (−2, 5)

27. (7, −3, −19)

84 lb ≈ 9.17 lb; the angle with the positive x -axis ≈ −70.9◦

lb ≈ 224.14 lb

True/False 3.1 (a) False

(b) False

(c) False

(d) True

(e) True

Exercise Set 3.2 (page 153) ( '

(f) False

(h) True

(i) False

1. (a) v = 2 3 ; v1 v = √13 , √13 , √13 ; − v1 v = − √13 , − √13 , − √13 ( ( ' ' √ (b) v = 15 ; v1 v = √115 , 0, √215 , √115 , √315 ; − v1 v = − √115 , 0, − √215 , − √115 , − √315 3. (a)



83

7. k =

5 7

(b)

or k= − 57

11. (a) d (u, v) = (b) d (u, v) = 15. (a) (b) (c) (d)









17+ 26

(c) 2 3

(d)



466

5. (a)

9. (a) u · v = −8; u · u = 26; v · v = 24

14; cos θ = √551 ; the angle is acute



59; cos θ

= √ −√4

6 45

(j) True

(

'



(g) False



2570





(b) 3 46 − 10 21 +



42



(c) 2 966

(b) u · v = 0; u · u = 54; v · v = 21 √

13.

45 3 2

; the angle is obtuse

Does not make sense; v · w is a scalar, whereas the dot product is only defined for vectors Makes sense Does not make sense; u · v is a scalar, whereas the norm is only defined for vectors Makes sense

25. 71◦ , 61◦ , 36◦

True/False 3.2 (a) True

(b) True

(c) False

(d) True

(e) True

(f) False

(g) False

(h) False

(i) True

(j) True

(k) False


Exercise Set 3.3 (page 162) 1. (a) Orthogonal

(b) Not orthogonal

3. −2 (x+1) + (y−3) − (z+2) = 0 13. (a)

2 5

21. 1

(b) 23.

√18

25.

17

5. 2z = 0

15. (0, 0), (6, 2)

22

√1

(c) Not orthogonal

5 3

27.

11 √

6



'

29.

7. Not parallel

, 0, − 80 − 16 13 13

17.

(d) Not orthogonal

  55 ,

, 1, − 11 13 13



9. Parallel 19.

11. Not perpendicular

1

( √1 , − √1 , is one possible answer 3 3 3

, − 15 , 101 , − 101 5

√1

31. Yes

 9 ,

, 6 , 9 , 21 5 5 10 10

37.

50√ ,000 2



Nm ≈ 35,355 Nm

True/False 3.3 (a) True

(b) True

(c) True

(d) True

(e) True

(f) False

(g) False

Exercise Set 3.4 (page 170)

1. Vector equation: (x, y) = (−4, 1) + t(0, −8); parametric equations: x = −4, y = 1 − 8t

3. Vector equation: (x, y, z) = t(−3, 0, 1); parametric equations: x = −3t, y = 0, z = t

5. Point: (3, −6); vector: (−5, −1)

7. Point: (4, 6); vector: (−6, −6)

9. Vector equation: $(x, y, z) = (-3, 1, 0) + t_1(0, -3, 6) + t_2(-5, 1, 2)$; parametric equations: $x = -3 - 5t_2$, $y = 1 - 3t_1 + t_2$, $z = 6t_1 + 2t_2$

11. Vector equation: $(x, y, z) = (-1, 1, 4) + t_1(6, -1, 0) + t_2(-1, 3, 1)$; parametric equations: $x = -1 + 6t_1 - t_2$, $y = 1 - t_1 + 3t_2$, $z = 4 + t_2$

13. Vector equation: (x, y) = t(3, 2); parametric equations: x = 3t and y = 2t

15. Vector equation: $(x, y, z) = t_1(5, 0, 4) + t_2(0, 1, 0)$; parametric equations: $x = 5t_1$, $y = t_2$, and $z = 4t_1$

17. $x_1 = -s - t$, $x_2 = s$, $x_3 = t$

19. $x_1 = \frac{3}{7}r - \frac{19}{7}s - \frac{8}{7}t$, $x_2 = -\frac{2}{7}r + \frac{1}{7}s + \frac{3}{7}t$, $x_3 = r$, $x_4 = s$, $x_5 = t$

21. (a) (x, y, z) = (1, 0, 0) + (−s − t, s, t) (b) A plane passing through the point (1, 0, 0) and parallel to the vectors (−1, 1, 0) and (−1, 0, 1).

x+ y+z =0

23. (a)

−2x + 3y 25. 27.

(b) A straight line passing through the origin

=0

(a) x1 = − 23 s + 13 t , x2 = s , x3 = t x1 = 13 − 43 r − 13 s , x2 = r , x3 = s ,

(c) x = − 35 t , y = − 25 t , z = t



(c) (x1 , x2 , x3 ) = (1, 0, 1) + − 23 s + 13 t, s, t



x4 = 1;  4  1 general solution of the associated homogeneous system:  1 − 3 r− 3 s, r, s, 0 ; particular solution of the nonhomogeneous system: 3 , 0, 0, 1

29. If T (v) = 0 then the image is a single point; otherwise the image is a line. True/False 3.4 (a) True

(b) False

(c) True

(d) True

(e) False

(f) True

Exercise Set 3.5 (page 179) 1. (a) (32, −6, −4) 3. u × w = 1125 2

(b) (−32, 6, 4)

(c) (52, −29, 10)

5. u × (v × w) = (−14, −20, −82)

(d) 0

(e) (0, 0, 0)

(f) (0, 0, 0)

7. u × v = (18, 36, −18)

9.



59

11. 3

√

13. 7

15. √

27. (a)

26 2

374 2

17. 16 √

(b)

21. −92

19. The vectors do not lie in the same plane.



29. 2(v × u)

26 3

23. abc

(b) 132◦ , 109◦ , 132◦

31. (a) 1500 2 Nm ≈ 2121.32 Nm

25. (a) −3 39. (a)

(b) 3 17 6

(b)

(c) 3 1 2

True/False 3.5 (a) True

(b) True

(c) False

(d) True

(e) False

(f) False

Chapter 3 Supplementary Exercises (page 181) 1. (a) (13, −3, 10)



(b)

3. (a) (−5, −12, 20, −2)



70 (b)





(d) − 89 ,

(c) 3 86

106

(c)



20 20 , 9 9

(e) −122



(f) (−3150, −2430, 1170)

 90

(d) − 135 , − 15 , 90 , 77 77 77 77

2810

#

7. (−1, −1, 5)

5. The plane containing A, B , and C .



9.

14 17

√11

11.

35

13. Vector equation: (x, y, z) = (−2, 1, 3) + t1 (1, −2, −2) + t2 (5, −1, −5); parametric equations: x = −2 + t1 + 5t2 , y = 1 − 2t1 − t2 , z = 3 − 2t1 − 5t2 15. Vector equation: (x, y) = (0, −3) + t (8, −1); parametric equations: x = 8t , y = −3 − t 17. Vector equation: (x, y) = (0, −5) + t (1, 3); parametric equations: x = t , y = −5 + 3t 19. 3 (x + 1) + 6 (y − 5) + 2 (z − 6) = 0

21. −18 (x − 9) − 51y − 24 (z − 4) = 0

25. A plane through the origin

Exercise Set 4.1 (page 190) 1. (a) u + v = (2, 6); k u = (0, 6)

(c) Axioms 1-5

7. Not a vector space; Axiom 8 fails.

3. Vector space

9. Vector space

5. Not a vector space; Axioms 5 and 6 fail.

11. Vector space

19.

1

u

= u−1

True/False 4.1 (a) True

(b) False

(c) False

(d) False

(e) True

(f) False

Exercise Set 4.2 (page 200) 1. (a), (c), (e)

3. (a), (b), (d)

11. (a) The vectors span R 15. (a) Line; x =

− 21 t ,

3

y=

19. (a) The set spans R 2

5. (a), (c), (d)

7. (a), (c)

(b) The vectors do not span R

− 23 t ,

z=t

(b) Origin

3

9. (a), (b)

13. The polynomials do not span P2

(c) Plane; x − 3y + z = 0

(d) Line; x = −3t , y = −2t , z = t

(b) The set does not span R 2

True/False 4.2 (a) True

(b) True

(c) False

(d) False

(e) False

(f) True

(g) True

(h) False

Exercise Set 4.3 (page 210) 1. (a) u2 = −5u1 (c) p2 = 2p1

(b) A set of 3 vectors in R 2 must be linearly dependent by Theorem 4.3.3. (d) A = (−1) B

5. (a) Linearly independent

3. (a) Linearly dependent

(b) Linearly independent

(b) Linearly independent

(i) False

(j) True

(k) False


7. (a) The vectors do not lie in a plane (b) The vectors lie in a plane
9. (b) $v_1 = \frac{2}{7}v_2 - \frac{3}{7}v_3$; $v_2 = \frac{7}{2}v_1 + \frac{3}{2}v_3$; $v_3 = -\frac{7}{3}v_1 + \frac{2}{3}v_2$
11. $\lambda = -\frac{1}{2}$, $\lambda = 1$
13. (a) Linearly independent (b) Linearly dependent
15. (a) Linearly independent (b) Linearly dependent

True/False 4.3 (a) False

(b) True

(c) False

(d) True

(e) True

(f) False

(g) True

(h) False

Exercise Set 4.4 (page 219) 11. (a)

5

28



3 14

,



(b) a,

b−a



17. p = 7p1 − 8p2 + 3p3 ; (p)S = (7, −8, 3)

'

23. (a) (2, 0)

(b)

13. (a) (3, −2, 1)

2

21. (a) Linearly independent

, − √13 3

√2

(

' (c) (0, 1)



(b) 3x 2 + 8x − 1

27. (a) (20, 17, 2)

(d)

(c)

15. A = 1A1 − 1A2 + 1A3 − 1A4 ; (A)S = (1, −1, 1, −1)

(b) (−2, 0, 1)

(b) Linearly dependent

, b − √a3 3

−103

2a √

−21 −106

( 25. (b) (3, 4, 2, 1)

30

True/False 4.4 (a) False

(b) False

(c) True

(d) True

(e) False

Exercise Set 4.5 (page 228)

1. Basis: {(1, 0, 1)}; dimension: 1
3. No basis; dimension: 0
5. Basis: {(3, 1, 0), (−1, 0, 1)}; dimension: 2
7. (a) Basis: $\{(\frac{2}{3}, 1, 0), (-\frac{5}{3}, 0, 1)\}$; dimension: 2 (b) Basis: {(1, 1, 0), (0, 0, 1)}; dimension: 2 (c) Basis: {(2, −1, 4)}; dimension: 1 (d) Basis: S = {(1, 1, 0), (0, 1, 1)}; dimension: 2
9. (a) $n$ (b) $\frac{n(n+1)}{2}$ (c) $\frac{n(n+1)}{2}$
11. (b) Dimension: 2 (c) Basis: $\{-1 + x, -1 + x^2\}$
13. e2 and e3 (the answer is not unique)
15. v1, v2, and e1 form a basis for $R^3$ (the answer is not unique)
17. {v1, v2} (the answer is not unique)
19. (a) 1 (b) 2 (c) 1
27. (a) $\{-1 + x - 2x^2, 3 + 3x + 6x^2, 9\}$ (the answer is not unique) (b) $\{1 + x, x^2\}$ (the answer is not unique) (c) $\{1 + x - 3x^2\}$ (the answer is not unique)

True/False 4.5 (a) True

(b) True

(c) False

(d) True

(e) True

Exercise Set 4.6 (page 235) − 21

13 10 − 25

1. (a)



3

2

⎢ 3. (a) ⎣−2

−3

5

5. (b)

2 1

(b)

0



5 2 ⎥ − 21 ⎦

1

− 25

−2

− 132

(c)

1 2 − 16

 (c) [w]B =



0 1 3



9

−5

8 5



; [w]B =



− 27 ⎢ 23 ⎥ ⎣ 2⎦

2 1 ; [h]B = −5 −2

(g) True



6

(d) [h]B =

− 17 10 ⎡



⎢ ⎥ (b) [w]B = ⎣−9⎦; [w]B =

6



0 3

0

(f) True

−4



−7

(h) True

(i) True

(j) False

(k) False




3 −1

7. (a)



5 −2

(b)



1

2

3

9. (a) ⎣2

5

3⎦

1

0

8



5 −3







(d) [w]B1 =

−5

⎥ −3 ⎦

5

−2

−1

− cos (2θ )

−1

1













3

−1

; [w]B1 =





4



−1 ⎡

⎤ −200 ⎢ ⎥ ⎢ ⎥ (e) [w]S = ⎣−5⎦; [w]B = ⎣ 64⎦

5



77⎦; [w]S = ⎣−3⎦ 30

(b) B =



5

(e) [w]B2 =

1

3



25

0

13. P −1 Q−1

15. (a) B = {(1, 1, 0), (1, 0, 2), (0, 2, 1)} 17.



−239







sin (2θ )

3

9

−1

; [w]B2 =

(d) [w]B = ⎣

(b) ⎣ 13

sin (2θ )

2

−1







16

cos (2θ )



2



−40



 11. (a)

2 −1

) 4 5

    * , 15 , − 25 , 15 , − 15 , 25 , − 25 , 25 , 15

19. B must be the standard basis.

True/False 4.6 (a) True

(b) True

(c) True

(d) True

Exercise Set 4.7 (page 246)  1. (a) 1

2



−1

  +2

3

4

⎡ ⎤

(e) False



⎤ −1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ (b) −2 ⎣3⎦ + 3 ⎣ 6⎦ + 5 ⎣ 2⎦ 0 4 −1 4

0



(f) False



⎡ ⎤

⎤ ⎡ ⎤ ⎡ ⎤ 1 5 −1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ (b) b is in the column space of A; ⎣9⎦ − 3 ⎣ 3⎦ + ⎣1⎦ = ⎣ 1⎦ 1 1 1 −1 1

3. (a) b is not in the column space of A

⎡ ⎤ 5



−2



⎡ ⎤



0

⎢0⎥ ⎢ 1⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 5. (a) r ⎢ ⎥ + s ⎢ ⎥ + t ⎢ ⎥ ⎣0⎦ ⎣ 1⎦ ⎣1 ⎦ 0

0

3



⎡ ⎤ 5



−2





⎡ ⎤ 0

⎢ 0⎥ ⎢0⎥ ⎢ 1⎥ ⎢0 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ (b) ⎢ ⎥ + r ⎢ ⎥ + s ⎢ ⎥ + t ⎢ ⎥ ⎣−1⎦ ⎣0⎦ ⎣ 1⎦ ⎣1 ⎦

1

0

5

0

1

7. (a) (1, 0) + t (3, 1); t (3, 1)

(b) (−2, 7, 0) + t (−1, −1, 1); t (−1, −1, 1) ⎧⎡ ⎤⎫ ⎪ ⎨ 16 ⎪ ⎬ )   ⎢ ⎥ 0 −16 , 0 9. (a) Basis for the null space: ⎣19⎦ ; basis for the row space: 1 ⎪ ⎪ ⎩ ⎭ 1 ⎧⎡ ⎤ ⎡ 1 ⎤⎫ ⎪ 2 ⎪ ⎬ ⎨ 0 ) * ⎢ ⎥ ⎢ ⎥ (b) Basis for the null space: ⎣1⎦ , ⎣ 0⎦ ; basis for the row space: 1 0 − 21 ⎪ ⎪ ⎭ ⎩ 0

1

* −19

1

⎧⎡ ⎤ ⎡ ⎤⎫ 2 ⎪ ⎪ ⎨ 1 ⎬ )   * ⎢ ⎥ ⎢ ⎥ 11. (a) Basis for the column space: ⎣0⎦ , ⎣1⎦ ; basis for the row space: 1 0 2 , 0 0 1 ⎪ ⎪ ⎩ ⎭ 0 0 ⎧⎡ ⎤ ⎡ ⎤⎫ −3 ⎪ 1 ⎪ ⎪ ⎪ ⎪ ⎪ ⎨⎢0⎥ ⎢ 1⎥⎬ )   ⎢ ⎥ ⎢ ⎥ (b) Basis for the column space: ⎢ ⎥ , ⎢ ⎥ ; basis for the row space: 1 −3 0 0 , 0 1 ⎪ ⎪ ⎦ ⎣ ⎦ ⎣ 0 ⎪ ⎪ ⎪ ⎪ 0 ⎭ ⎩ 0

0

0

0

*


13. (a) Basis for the row space:

)

 

3 , 0 ⎡ ⎤⎫ 0 ⎪

1

0 11 0 ⎧⎡ ⎤ ⎡ ⎤ −2 1 ⎪ ⎪ ⎪ ⎨⎢−2⎥ ⎢ 5⎥ ⎢ ⎥ ⎢ ⎥

1

⎪ ⎢0⎥⎪ ⎬ ⎢ ⎥ basis for the column space: ⎢ ⎥ , ⎢ ⎥ , ⎢ ⎥ ⎪ ⎣−1⎦ ⎣ 3⎦ ⎣1⎦⎪ ⎪ ⎪ ⎪ ⎪ ⎭ ⎩ 8 1 −3 )     (b) 1 −2 5 0 3 , −2 5 −7 0 −6 , −1

19.

1

4

5

6

  9 , 3

−2

1

0

 

0 , 0

−2

3

1

0

−3

0

*

1

0 ;

*

17. Basis: {v1 , v2 , v4 }; v3 = 2v1 − v2 ; v5 = −v1 + 3v2 + 2v4

15. {(1, 1, 0, 0) , (0, 0, 1, 1) , (−2, 0, 2, 2) , (0, −3, 0, 3)}

)

3

*

−1

4

21. Since TA (x) = Ax, we are seeking the general solution of the linear system Ax = b. (a) x = t − 83 , 43 , 1 (b) x = (c) x =





7

, − 23 , 0 3  1 , − 23 , 0 3 ⎤

  + t − 83 , 43 , 1   + t − 83 , 43 , 1

23. (b) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ is an example of such a matrix



25. (a)



− 5a where a and b are not both zero − 5b

3a 3b

(b) Only the zero vector forms the null space for both A and B . The line 3x + y = 0 forms the null space for C . The entire plane forms the null space for D .

True/False 4.7 (a) True

(b) False

(c) False

(d) False

(e) False

(f) True

(g) True

(h) False

(i) True

(j) False

Exercise Set 4.8 (page 256) 1. (a) rank(A) = 1; nullity(A) = 3

(b) rank(A) = 2; nullity(A) = 3

3. (a) rank(A) = 3; nullity(A) = 0

(c) 3 leading variables; 0 parameters in the general solution (the solution is unique)

5. (a) rank(A) = 1; nullity(A) = 2

(c) 1 leading variable; 2 parameters in the general solution

7. (a) largest possible value for the rank: 4; smallest possible value for the nullity: 0
(b) largest possible value for the rank: 3; smallest possible value for the nullity: 2
(c) largest possible value for the rank: 3; smallest possible value for the nullity: 0

9.
                                                                  (a)   (b)   (c)   (d)   (e)   (f)   (g)
(i)   dimension of the row space of A                              3     2     1     2     2     0     2
      dimension of the column space of A                           3     2     1     2     2     0     2
      dimension of the null space of A                             0     1     2     7     7     4     0
      dimension of the null space of A^T                           0     1     2     3     3     4     4
(ii)  is the system Ax = b consistent?                             Yes   No    Yes   Yes   No    Yes   Yes
(iii) number of parameters in the general solution of Ax = b       0     –     2     7     –     4     0
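The entries in the table for Exercise 9 follow from the dimension theorem: for an m × n matrix A, the row and column spaces both have dimension rank(A), the null space of A has dimension n − rank(A), and the null space of A^T has dimension m − rank(A). The system Ax = b is consistent exactly when rank(A) = rank([A | b]). The sketch below recomputes the table under that rule. The sizes and ranks in `CASES` are hypothetical inputs chosen to be consistent with the table; the exercise statement itself is not part of this answer key.

```python
def fundamental_dimensions(m, n, rank_a, rank_aug):
    """Dimensions of the fundamental spaces of an m x n matrix A and
    consistency data for Ax = b, given rank(A) and rank([A | b])."""
    consistent = rank_a == rank_aug
    return {
        "row": rank_a,                     # dim of row space
        "col": rank_a,                     # dim of column space
        "null": n - rank_a,                # dim of null space of A
        "null_T": m - rank_a,              # dim of null space of A^T
        "consistent": consistent,
        # Parameters only make sense when the system is consistent.
        "parameters": (n - rank_a) if consistent else None,
    }


# Hypothetical (m, n, rank(A), rank([A | b])) values, assumed for illustration.
CASES = {
    "a": (3, 3, 3, 3),
    "b": (3, 3, 2, 3),
    "c": (3, 3, 1, 1),
    "d": (5, 9, 2, 2),
    "e": (5, 9, 2, 3),
    "f": (4, 4, 0, 0),
    "g": (6, 2, 2, 2),
}

for label, case in CASES.items():
    print(label, fundamental_dimensions(*case))
```

An inconsistent system has no general solution, which is why parts (b) and (e) carry no parameter count.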





11. (a) $\text{nullity}(A) - \text{nullity}(A^T) = 1$ (b) $\text{nullity}(A) - \text{nullity}(A^T) = n - m$

15. The matrix cannot have rank 1. It has rank 2 if r = 2 and s = 1.

13. (a) 3

(b) 2


17. No, both row and column spaces of A must be planes. 19. (a) 3

(b) 5

(c) 3

(d) 3

21. (a) 3

(b) No

27. (a) Overdetermined; inconsistent if 3b1 + b2 + 2b3 ≠ 0
(b) Underdetermined; infinitely many solutions for all b’s; (cannot be inconsistent)
(c) Underdetermined; infinitely many solutions for all b’s; (cannot be inconsistent)

True/False 4.8 (a) False

(b) True

(c) False

(d) False

Exercise Set 4.9 (page 268) 

1. (a)



1

0

0

−1



 −1



=

2

−1



⎤⎡

0

0

3. (a) ⎣0

1

0⎦ ⎣−5⎦ = ⎣−5⎦

 5. (a)

0

−1 

0

1

0

0

0



⎥⎢







2

2

=

(b)

0

−2









−2

0

0

7. (a) ⎣0

1

0⎦ ⎣ 1⎦ = ⎣ 1⎦

0

0

0

√

1

2 2





2 2

1

⎢ 11. (a) ⎢ ⎣0 0

⎡√

0

− 21 0 1

2 2



0

0

1 2

3



−1



− 21

=

2

⎤⎡

0

1 4

1/α 0

2

(b) ⎣0

0

0

0



2

⎥⎢



3

0

−1

1

2



=

  0 a 1

⎤⎡ ⎥⎢

1



b

− 21

1

=

a/α b

3

⎢ ⎥

1 2

0

0

=









0

0



2

1

  0 a

0

α

b

−1 1

 =

3



a

αb



4



0

=

 =

3 −2 2



6

3

√ 



3

− 3 2 3 −2

3

2

0⎦ ⎣−1⎦ = ⎣−2⎦

−1

⎢ ⎥

4

2

2

1 2



1

1

(b) ⎣0

0

⎥⎢

0



1



2



⎡ ⎤

0⎦ ⎣ 1⎦ = ⎣1⎦

2

6

⎥⎢



1

⎥⎢ ⎥ ⎢ ⎥ 0⎦ ⎣−1⎦ = ⎣2⎦

−3

⎤⎡

⎤⎡

−2



0 3



2



3

(c) ⎣0

0



1

0

⎤⎡



0

⎤ ⎡√ ⎤ 2 3+1 ⎥⎢ ⎥ ⎢ ⎥ 0⎥ ⎦ ⎣−1⎦ = ⎣ −1√ ⎦ √ 3 2 −1 + 3 2 ⎤⎡ ⎤ ⎡ ⎤ 1 2

0

=

−4



0

⎥⎢

0

−4  



0⎦ ⎣−5⎦ = ⎣−5⎦

0

3



1





0

1

⎤⎡

(c) ⎣ 0

0

3 2



1

0

2

−1 0







1 − 23   2 −1 3









0

3

0



=

2

(j) False

−1

2

2



(b)





(i) True



3

(b)



−1

 −1 2

⎡ ⎤

3



0

2





0



3





(b)

2

0

0

⎤ 1





3

1



(h) False

0

3

(b)

2

1

−5 ⎤⎡ ⎤ ⎡ ⎤ 0 −2 −2 ⎥⎢ ⎥ ⎢ ⎥ 0⎦ ⎣ 1⎦ = ⎣ 0⎦





(c)

2

0

0⎦ ⎣−5⎦ = ⎣5⎦

=

−5



1

0

0



0





0

19. (a)

0



2 2



2 ⎢ 1⎥ ⎣− 4 ⎦ 3 4

0⎦ ⎣−1⎦ =

0

−1

1





1 4



(b) ⎣0





15. (a) ⎣ 0

17. (a)

0



3 3 2

1

0

1 2

1



2



0



1

(g) False

  =

2

4



0

0

2

2 2

0

1 2

⎡1

0



2



(f) False

√ ⎥ ⎢ 2 ⎥ ⎢ 1⎥⎢ 3 ⎥ ⎢ 0 1 − 1 = (b) 1 − ⎦ ⎣ ⎦ ⎣ 2⎦ ⎣ 2 √ √ 1 3 2 + 3 − 21 0 2 2 √ ⎤⎡ ⎤ ⎡ ⎤ ⎡ 2 − 2 2 0 −1 0 ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 0 0⎦ ⎣−1⎦ = ⎣ −1⎦ (d) ⎣1 √ √

3 2

⎢ (c) ⎢ ⎣ √0

13. (a)





−1



0



0



2 2



1



4.60 +2 √ = 3 ≈ √ 3 −4 −1.96 −2 3 2√    2 √    2 7 2 − 2 3 4.95 2 √ √ = ≈ (d) 2 −4 −0.71 − 22 2 ⎤⎡ ⎤ ⎡ ⎡√ ⎤

 √2 (c)

3



− 21

3 2

9. (a)

⎥⎢

0



1



0

−3 

3

 

−5 ⎤⎡

2

−1



1



2



(b)

−2 ⎤ ⎡

(e) True

 −1.96

−4.60

−2 3

21. (a) The matrix $A_1$ corresponds to the contraction with factor $\frac{1}{2}$.
(b) The matrix $A_2$ corresponds to the compression in the y-direction with factor $\frac{1}{2}$.
(c) The matrix $A_3$ corresponds to the shear in the y-direction by a factor $\frac{1}{2}$.
(d) The matrix $A_4$ corresponds to the shear in the y-direction by a factor $-\frac{1}{2}$.
[Figures for parts (a) to (d) omitted.]

23. (a) $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ (dilation with factor 2) (b) $\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ (shear in the x-direction by a factor 2)
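The operators named in answers 21 and 23 can be checked by applying their standard matrices to the corners of the unit square. The sketch below assumes the usual conventions: a shear in the y-direction by a factor k sends (x, y) to (x, y + kx), and a shear in the x-direction sends (x, y) to (x + ky, y). The dictionary keys and helper names are illustrative only.

```python
from fractions import Fraction as F


def apply(matrix, point):
    """Multiply a 2 x 2 matrix by a point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


half = F(1, 2)

# Standard matrices under the conventions stated in the lead-in.
OPERATORS = {
    "contraction, factor 1/2": ((half, 0), (0, half)),
    "compression in y, factor 1/2": ((1, 0), (0, half)),
    "shear in y, factor 1/2": ((1, 0), (half, 1)),
    "shear in y, factor -1/2": ((1, 0), (-half, 1)),
    "dilation, factor 2": ((2, 0), (0, 2)),
    "shear in x, factor 2": ((1, 2), (0, 1)),
}

UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]


def image_of_square(matrix):
    """Images of the unit-square corners under the given matrix."""
    return [apply(matrix, corner) for corner in UNIT_SQUARE]


for name, matrix in OPERATORS.items():
    print(name, image_of_square(matrix))
```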





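25. The standard matrix: $\begin{bmatrix} \frac{1}{4} & \frac{\sqrt{3}}{4} \\ \frac{\sqrt{3}}{4} & \frac{3}{4} \end{bmatrix}$; $P_{\pi/3}(3, 4) = \left( \frac{3}{4} + \sqrt{3},\ \frac{3\sqrt{3}}{4} + 3 \right) \approx (2.48, 4.30)$

27. The standard matrix: $\begin{bmatrix} -\frac{1}{2} & \frac{\sqrt{3}}{2} \\ \frac{\sqrt{3}}{2} & \frac{1}{2} \end{bmatrix}$; $H_{\pi/3}(3, 4) = \left( -\frac{3}{2} + 2\sqrt{3},\ \frac{3\sqrt{3}}{2} + 2 \right) \approx (1.96, 4.60)$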
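The decimal values in answers 25 and 27 can be reproduced numerically. The sketch below assumes the standard formulas for the orthogonal projection onto, and the reflection about, the line through the origin that makes an angle θ with the positive x-axis; the function names are illustrative.

```python
import math


def projection_matrix(theta):
    """Orthogonal projection onto the line through the origin at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]


def reflection_matrix(theta):
    """Reflection about the line through the origin at angle theta."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return [[c2, s2], [s2, -c2]]


def apply(matrix, vector):
    """Matrix-vector product returned as a tuple."""
    return tuple(sum(a * x for a, x in zip(row, vector)) for row in matrix)


projected = apply(projection_matrix(math.pi / 3), (3, 4))
reflected = apply(reflection_matrix(math.pi / 3), (3, 4))
print(projected)
print(reflected)
```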

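29. Reflection about the xy-plane: $T(1, 2, 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$;
Reflection about the xz-plane: $T(1, 2, 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$;
Reflection about the yz-plane: $T(1, 2, 3) = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 3 \end{bmatrix}$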



⎡√ ⎢

3 2

31. (a) ⎢ ⎣

1



− 21

0





1 2

0



3 2

0⎥ ⎦

0

1



1

0

0

− √12 ⎥ ⎦

1 2 √1 2

(b) ⎣0 0



1



−1

1 3

3



0

0

1

(c) ⎣ 0

1

0⎦

−1

0

0



√1 2

⎥ ⎡

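33. $\begin{bmatrix} -\frac{1}{9} & \frac{8}{9} & \frac{4}{9} \\ \frac{8}{9} & -\frac{1}{9} & \frac{4}{9} \\ \frac{4}{9} & \frac{4}{9} & -\frac{7}{9} \end{bmatrix}$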
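A rotation matrix in 3-space must be orthogonal, so the matrix in answer 33 (entries −1/9, 8/9, 4/9 and −7/9) can be sanity-checked with exact arithmetic. The sketch below assumes the symmetric arrangement of those entries and additionally tests two properties of a 180° rotation: applying it twice gives the identity, and vectors on the axis are unchanged. The axis vector (2, 2, 1) is a hypothetical choice that this matrix happens to fix; it is not stated in the answer key.

```python
from fractions import Fraction as F

M = [
    [F(-1, 9), F(8, 9), F(4, 9)],
    [F(8, 9), F(-1, 9), F(4, 9)],
    [F(4, 9), F(4, 9), F(-7, 9)],
]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def transpose(a):
    return [list(col) for col in zip(*a)]


IDENTITY = [[F(int(i == j)) for j in range(3)] for i in range(3)]

is_orthogonal = matmul(M, transpose(M)) == IDENTITY   # M M^T = I
is_half_turn = matmul(M, M) == IDENTITY               # M^2 = I

axis = [2, 2, 1]  # hypothetical axis vector, assumed for illustration
image_of_axis = [sum(row[j] * axis[j] for j in range(3)) for row in M]

print(is_orthogonal, is_half_turn, image_of_axis == axis)
```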

37. Rotation through the angle 2θ 39. Rotation through the angle θ , then translation by x0 ; not a matrix transformation

Exercise Set 4.10 (page 277) 1. (a) Operators do not commute



5. [TB ◦ TA ] =



−10

−7

5

−10



−1

0

0

9. (a) ⎣ 0

0

0⎦

0

0

1







(b) Operators do not commute



; [ TA ◦ T B ] =

⎡ ⎢

1

(b) ⎣ 0

−1

−8

−3

13

−12 ⎡ −1 ⎢ (c) ⎣ 0

0

0

1

0⎦

0

0

0



0

1

2

0⎦

0

1







7. (a)

⎤ ⎥



3. Operators commute

1

0

0

−1





(b)

0

0

0

1 2



 (c)



3 2

3 3 2

3 3 2

− 23





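11. (a) $[T_1] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$; $[T_2] = \begin{bmatrix} 3 & 0 \\ 2 & 4 \end{bmatrix}$
(b) $[T_2 \circ T_1] = \begin{bmatrix} 3 & 3 \\ 6 & -2 \end{bmatrix}$; $[T_1 \circ T_2] = \begin{bmatrix} 5 & 4 \\ 1 & -4 \end{bmatrix}$
(c) $T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2,\ x_1 - 4x_2)$; $T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2,\ 6x_1 - 2x_2)$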
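The standard matrix of a composition is the product of the standard matrices, with the operator applied first written on the right. The sketch below uses the matrices [T1] and [T2] as read off from the formulas in part (c) of answer 11 and recomputes both compositions; `matmul` is an illustrative helper.

```python
def matmul(a, b):
    """Product of matrices a (p x q) and b (q x r), given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


T1 = [[1, 1], [1, -1]]
T2 = [[3, 0], [2, 4]]

# [T2 o T1] = [T2][T1]: T1 acts first, then T2.
T2_after_T1 = matmul(T2, T1)
# [T1 o T2] = [T1][T2]: T2 acts first, then T1.
T1_after_T2 = matmul(T1, T2)

print(T2_after_T1)
print(T1_after_T2)
```

The two products differ, which is the usual situation: matrix operators commute only in special cases.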

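13. (a) Not one-to-one (b) One-to-one (c) One-to-one (d) One-to-one
15. (a) Reflection about the x-axis (b) Rotation through an angle of −π/4 (c) Contraction by a factor of $\frac{1}{3}$
17. (a) $\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 8 & 4 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$; the operator is not one-to-one
(b) $\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 0 & 4 \\ 1 & 3 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$; the operator is not one-to-one
19. (a) One-to-one; standard matrix of $T^{-1}$: $\begin{bmatrix} \frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & \frac{1}{3} \end{bmatrix}$; $T^{-1}(w_1, w_2) = \left( \frac{1}{3}w_1 - \frac{2}{3}w_2,\ \frac{1}{3}w_1 + \frac{1}{3}w_2 \right)$ (b) Not one-to-one
21. (a) One-to-one (b) Not one-to-one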
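A matrix operator on R^n is one-to-one exactly when its standard matrix is invertible, that is, when the determinant is nonzero. The sketch below applies that test to the two matrices in answer 17 and inverts the matrix of T⁻¹ from answer 19(a) to recover the matrix of T itself. The helper names are illustrative, and the recovered matrix of T is a computed consequence of the answer, not a quotation from the exercise.

```python
from fractions import Fraction as F


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse2(m):
    """Inverse of a 2 x 2 matrix via the adjoint formula."""
    d = F(det2(m))
    if d == 0:
        raise ValueError("matrix is singular, operator is not one-to-one")
    return [[F(m[1][1]) / d, F(-m[0][1]) / d],
            [F(-m[1][0]) / d, F(m[0][0]) / d]]


A17a = [[8, 4], [2, 1]]
A17b = [[-1, 3, 2], [2, 0, 4], [1, 3, 6]]
T_inverse = [[F(1, 3), F(-2, 3)], [F(1, 3), F(1, 3)]]

print(det2(A17a), det3(A17b))   # both zero: neither operator is one-to-one
T = inverse2(T_inverse)         # matrix of T recovered from its inverse
print(T)
```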

⎧⎡ ⎤ ⎡−1⎤⎫ ⎪ ⎪ ⎨ 1 ⎬ ⎢ ⎥ 23. (a) ⎣ 5 ⎦ , ⎣ 6⎦ ⎪ ⎪ ⎩ 7 ⎭

(b) {(−14, 19, 11)}

(c) rank(T ) = 2; nullity(T ) = 1

(d) rank(A) = 2; nullity(A) = 1

4

⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ 1 2 −1 ⎪ ⎪ ⎬ ⎨ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 25. Basis for ker(TA ): {(10, 2, 0, 7)}; basis for R(TA ): ⎣−3⎦ , ⎣1⎦ , ⎣ 3⎦ ⎪ ⎪ ⎭ ⎩ 8 4 −3 27. (a) Range of T must be a proper subset of R n 29. (a) Yes

(b) T maps infinitely many vectors into 0

(b) Yes

True/False 4.10 (a) False

(b) True

(c) True

(d) False

(e) True

(f) True

(g) True

Exercise Set 4.11 (page 287) 1. y =

6

x 13

3. y = 27 x

5.

y (0, 1)

y 3

(3, 1)

(1, 1) 1

2

4

(0, 0)

x (0, 0)

1

x

(1, 0)

2

4

(2, –1) 3

 7. (a)



1 2

0

0

5

 (b)

1

0

2

5



 (c)

0

−1

−1

0



(–1, –2)

9. (a) Operators commute

(b) Operators do not commute

11. Shearing by a factor of 1 in the x -direction, then reflection about the x -axis, then expanding by a factor of 2 in the y -direction, then expanding by a factor of 4 in the x -direction.


13. Reflection about the x-axis, then expanding by a factor of 2 in the y-direction, then expanding by a factor of 4 in the x-direction, then reflection about the line y = x.
15. (a) The unit square is expanded in the x-direction by a factor of 3. (b) The unit square is reflected about the x-axis and expanded in the y-direction by a factor of 5.
17. (b) No, Theorem 4.11.1 applies only to invertible matrices.
y

21. (a)

y

(– 12 , 32 )

( 12 , 1)

(1, 1)

x (0, 0)

x (0, 0)

(1, 0)

(b) Shearing by a factor of −1 in the x -direction, then expanding by a factor of 2 in the y -direction, then shearing by a factor of 1 in the y -direction. y

23.

y

(0, 1)

( 14 , 1)

(1, 1)

1

( 54 , 1)

1

x

x

1

1

25. The line segment from (0,0) to (2,0). Theorem 4.11.1 does not apply here because A is singular.

True/False 4.11 (a) False

(b) True

(c) True

(d) True

(e) False

(f) False

(g) True

Chapter 4 Supplementary Exercises (page 289)

1. (a) u + v = (4, 3, 2); ku = (−3, 0, 0) (c) Axioms 1–5
3. A plane if s = 1; a line if s = −2; the origin if s ≠ −2 and s ≠ 1
7. A must be invertible
9. (a) rank is 2; nullity is 1 (b) rank is 2; nullity is 2 (c) For n = 1, rank is 1 and nullity is 0; for n ≥ 2, rank is 2 and nullity is n − 2.

11. (a) $\{1, x^2, x^4, \ldots, x^{2\lfloor n/2 \rfloor}\}$

⎧⎡ ⎪ ⎨ 1 ⎢ 13. (a) ⎣0 ⎪ ⎩ 0

⎧⎡ 0 ⎪ ⎨ ⎢ (b) ⎣−1 ⎪ ⎩ 0

⎤ ⎡

*

)

(b) $\{1, x - x^n, x^2 - x^n, \ldots, x^{n-1} - x^n\}$

⎤ ⎡

⎤ ⎡

⎤ ⎡

*

⎤ ⎡

⎤⎫

0 ⎪ ⎬

0

0

0

1

0

0

0

1

0

0

0

0

0

0

0

0

0

0⎦ , ⎣1

0

0⎦ , ⎣0

0

0⎦ , ⎣0

1

0⎦ , ⎣0

0

1⎦ , ⎣0

0

0⎦

0

0

0

0

0

0

0

0

1

0

0

1

⎥ ⎢

0

⎤ ⎡

⎥ ⎢

1

⎤ ⎡

⎥ ⎢

0

1

0

0

0

1

0

0

0

0⎦ , ⎣ 0

0

0⎦ , ⎣0

0

0

0

−1

0

0

0

−1

⎥ ⎢

15. Possible ranks are 0, 1, and 2.

⎥ ⎢

17. (a) Yes

⎤⎫

⎥ ⎢

0

0 ⎪ ⎬



1⎦ 0

⎪ ⎭

(b) No

(c) Yes

⎥ ⎢

0



⎪ ⎭


Exercise Set 5.1 (page 302)

1. eigenvalue: −1
3. eigenvalue: 5
5. (a) Characteristic equation: $(\lambda - 5)(\lambda + 1) = 0$; eigenvalue: 5, basis for eigenspace: {(1, 1)}; eigenvalue: −1, basis for eigenspace: {(−2, 1)}
(b) Characteristic equation: $\lambda^2 + 3 = 0$; no real eigenvalues
(c) Characteristic equation: $(\lambda - 1)^2 = 0$; eigenvalue: 1, basis for eigenspace: {(1, 0), (0, 1)}
(d) Characteristic equation: $(\lambda - 1)^2 = 0$; eigenvalue: λ = 1, basis for eigenspace: {(1, 0)}
7. Characteristic equation: $(\lambda - 1)(\lambda - 2)(\lambda - 3) = 0$; eigenvalue: 1, basis for eigenspace: {(0, 1, 0)}; eigenvalue: 2, basis for eigenspace: {(−1, 2, 2)}; eigenvalue: 3, basis for eigenspace: {(−1, 1, 1)}
9. Characteristic equation: $(\lambda + 2)^2(\lambda - 5) = 0$; eigenvalue: −2, basis for eigenspace: {(1, 0, 1)}; eigenvalue: 5, basis for eigenspace: {(8, 0, 1)}
11. Characteristic equation: $(\lambda - 3)^3 = 0$; eigenvalue: 3, basis for eigenspace: {(0, 1, 0), (1, 0, 1)}
13. $(\lambda - 3)(\lambda - 7)(\lambda - 1) = 0$
15. Eigenvalue: 5, basis for eigenspace: {(1, 1)}; eigenvalue: −1, basis for eigenspace: {(−2, 1)}
17. (b) λ = −ω is the eigenvalue associated with the given eigenvectors.
19. (a) Eigenvalue: 1, eigenspace: span{(1, 1)}; eigenvalue: −1, eigenspace: span{(−1, 1)}
(b) Eigenvalue: 1, eigenspace: span{(1, 0)}; eigenvalue: 0, eigenspace: span{(0, 1)}
(c) No real eigenvalues
(d) Eigenvalue: k, eigenspace: $R^2$
(e) Eigenvalue: 1, eigenspace: span{(1, 0)}
21. (a) Eigenvalue: 1, eigenspace: span{(1, 0, 0), (0, 1, 0)}; eigenvalue: −1, eigenspace: span{(0, 0, 1)}
(b) Eigenvalue: 1, eigenspace: span{(1, 0, 0), (0, 0, 1)}; eigenvalue: 0, eigenspace: span{(0, 1, 0)}
(c) Eigenvalue: 1, eigenspace: span{(1, 0, 0)}
(d) Eigenvalue: k, eigenspace: $R^3$
23. (a) y = 2x and y = x
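Answer 5(a) lists two eigenvalues with one eigenvector each, which is enough to rebuild a 2 × 2 matrix having exactly those eigenpairs and to confirm the stated characteristic equation. The sketch below forms A = P D P⁻¹ from the listed data with exact fractions. The matrix it produces is a reconstruction for illustration; the exercise's own matrix is not printed in this answer key.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(m):
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


# Columns of P are the listed eigenvectors (1, 1) and (-2, 1);
# D holds the matching eigenvalues 5 and -1.
P = [[F(1), F(-2)], [F(1), F(1)]]
D = [[F(5), F(0)], [F(0), F(-1)]]

A = matmul(matmul(P, D), inverse2(P))

# For a 2 x 2 matrix the characteristic polynomial is
# lambda^2 - trace * lambda + det.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(A, trace, det)
```

With trace 4 and determinant −5 the polynomial is λ² − 4λ − 5, which factors as (λ − 5)(λ + 1), in agreement with the answer.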



27.

− 21 ⎢ 1 ⎣− 2

− 21 − 21

0

0



25. (a) 6 × 6

(b) No invariant lines

(b) Yes

(c) Three

1

⎥ −1⎦ 1

True/False 5.1 (a) False

(b) False

(c) True

(d) False

Exercise Set 5.2 (page 313) 

5. P =

1

0

3

1



(answer is not unique)

(e) False



(f) False

−2



1

0

7. P = ⎣0

1

0⎦ (answer is not unique)

0

0

1






9. (a) 3 and 5

(b) rank (3I − A) = 1; rank (5I − A) = 2

(c) Yes

11. Eigenvalues: 1, 2, and 3; each has algebraic multiplicity 1 and geometric multiplicity 1; A is diagonalizable; $P = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 4 \end{bmatrix}$ (answer is not unique); $P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

13. Eigenvalue λ = 0 has both algebraic and geometric multiplicity 2; eigenvalue λ = 1 has both algebraic and geometric multiplicity 1; A is diagonalizable; $P = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 3 & 1 \end{bmatrix}$ (answer is not unique); $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

15. (a) A is a 3 × 3 matrix; all three eigenspaces (for λ = 1, λ = −3, and λ = 5) must have dimension 1. (b) A is a 6 × 6 matrix; the possible dimensions of the eigenspace corresponding to λ = 0 are 1 or 2; the dimension of the eigenspace corresponding to λ = 1 must be 1; the possible dimensions of the eigenspace corresponding to λ = 2 are 1, 2, or 3.

17.

24,234

−23,210

−34,815



−1

10,237

19. A11 = ⎣ 0

1

0

10,245



35,839

−2,047

⎤ ⎥

0⎦



1

−1

21. ⎣2

0

1

1



−2,048

1

⎤⎡

1

0

⎥⎢ −1⎦ ⎣0 1

3n

0

0

⎤⎡

1 6 ⎥⎢ 1 0 ⎦ ⎣− 2 1 4n 3

0

1 3

0

− 13



1 6 1⎥ 2⎦ 1 3

25. Yes 27. (a) The dimension of the eigenspace corresponding to λ = 1 must be 1; the possible dimensions of the eigenspace corresponding to λ = 3 are 1 or 2; the possible dimensions of the eigenspace corresponding to λ = 4 are 1, 2, or 3. (b) The dimension of the eigenspace corresponding to λ = 1 must be 1; the dimension of the eigenspace corresponding to λ = 3 must be 2; the dimension of the eigenspace corresponding to λ = 4 must be 3. (c) This eigenvalue must be λ = 4.



31. Standard matrix:

0

−1

−1 ⎡

0





; diagonalizable; P =

−1

1

1

1











(answer is not unique)

3

0

0

0

0

33. Standard matrix: ⎣0

1

0⎦; diagonalizable; P = ⎣0

−1

1

−1

1

1



0

3

⎤ ⎥

0⎦ (answer is not unique) 1

True/False 5.2 (a) False

(b) True

(c) True

(d) False

(e) True

(f) True

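Exercise Set 5.3 (page 326)

1. $\bar{u} = (2 + i, -4i, 1 - i)$; Re(u) = (2, 0, 1); Im(u) = (−1, 4, 1); $\|u\| = \sqrt{23}$
5. x = (7 − 6i, −4 − 8i, 6 − 12i)
7. $\bar{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}$; $\text{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}$; $\text{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}$; det(A) = 17 − i; tr(A) = 1
11. u · v = −1 + i; u · w = 18 − 7i; v · w = 12 + 6i
13. −11 − 14i
15. Eigenvalue: 2 + i, basis for eigenspace: $\left\{ \begin{bmatrix} 2 + i \\ 1 \end{bmatrix} \right\}$; eigenvalue: 2 − i, basis for eigenspace: $\left\{ \begin{bmatrix} 2 - i \\ 1 \end{bmatrix} \right\}$
17. Eigenvalue: 4 + i, basis for eigenspace: $\left\{ \begin{bmatrix} 1 + i \\ 1 \end{bmatrix} \right\}$; eigenvalue: 4 − i, basis for eigenspace: $\left\{ \begin{bmatrix} 1 - i \\ 1 \end{bmatrix} \right\}$

(g) True
(h) True
(i) True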
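The quantities in answers 1 and 7 are tied together by complex conjugation, so they can be cross-checked with Python's built-in complex numbers. The sketch below starts from the conjugates listed in the answers, recovers u and A by conjugating again, and recomputes the real and imaginary parts, the norm, the determinant and the trace. Variable names are illustrative.

```python
# Conjugates as listed in the answers.
u_bar = (2 + 1j, -4j, 1 - 1j)
A_bar = [[5j, 4], [2 + 1j, 1 - 5j]]

# Conjugating again recovers the original vector and matrix.
u = tuple(complex(z).conjugate() for z in u_bar)
A = [[complex(z).conjugate() for z in row] for row in A_bar]

re_u = tuple(z.real for z in u)
im_u = tuple(z.imag for z in u)
# Squared Euclidean norm: sum of squared real and imaginary parts.
norm_squared = sum(z.real ** 2 + z.imag ** 2 for z in u)

re_A = [[z.real for z in row] for row in A]
im_A = [[z.imag for z in row] for row in A]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
tr_A = A[0][0] + A[1][1]

print(re_u, im_u, norm_squared)
print(re_A, im_A, det_A, tr_A)
```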


19. |λ| =



2; φ =

− 83 i

27. (a) k=

21. |λ| = 2; φ = − π3

π 4

23. P =

 −2

−1

2

0



 ;C =

3

−2

2

3

 25. P =

 −1

1

1

0

 ;C =



5

 −3

3

5

(b) None

True/False 5.3 (a) False

(b) True

(c) False

(d) True

(e) False

(f) False

Exercise Set 5.4 (page 332)

1. (a) $y_1 = c_1e^{5x} - 2c_2e^{-x}$, $y_2 = c_1e^{5x} + c_2e^{-x}$ (b) $y_1 = 0$, $y_2 = 0$
3. (a) $y_1 = -c_2e^{2x} - c_3e^{3x}$, $y_2 = c_1e^{x} + 2c_2e^{2x} + c_3e^{3x}$, $y_3 = 2c_2e^{2x} + c_3e^{3x}$ (b) $y_1 = e^{2x} - 2e^{3x}$, $y_2 = e^{x} - 2e^{2x} + 2e^{3x}$, $y_3 = -2e^{2x} + 2e^{3x}$
7. $y = c_1e^{3x} - c_2e^{-2x}$
9. $y = c_1e^{x} + c_2e^{2x} + c_3e^{3x}$
15. (b) $\mathbf{y}' = A\mathbf{y}$ where $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ and $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2 & 1 & 2 \end{bmatrix}$
(c) The solution of the system: $y_1 = c_1e^{2x} + c_2e^{x} + c_3e^{-x}$, $y_2 = 2c_1e^{2x} + c_2e^{x} - c_3e^{-x}$, and $y_3 = 4c_1e^{2x} + c_2e^{x} + c_3e^{-x}$; the solution of the differential equation: $y = c_1e^{2x} + c_2e^{x} + c_3e^{-x}$

True/False 5.4 (a) True
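In answer 15 the coefficient matrix has the companion form, with last row (−2, 1, 2), and the solution uses the exponents 2, 1 and −1. For a companion matrix, each eigenvalue λ has the eigenvector (1, λ, λ²), which is why the coefficients in y2 and y3 are λ and λ² times those in y1. The sketch below checks this directly; `is_eigenpair` is an illustrative helper, and the matrix is taken as read from the answer.

```python
A = [[0, 1, 0],
     [0, 0, 1],
     [-2, 1, 2]]


def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def is_eigenpair(lam):
    """True if (1, lam, lam^2) is an eigenvector of A with eigenvalue lam."""
    v = [1, lam, lam * lam]
    return matvec(A, v) == [lam * x for x in v]


for lam in (2, 1, -1):
    print(lam, [1, lam, lam * lam], is_eigenpair(lam))
```

The three eigenvectors (1, 2, 4), (1, 1, 1) and (1, −1, 1) match the coefficient columns of the solution in part (c).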

(b) False

(c) True

(d) True

(e) False

Exercise Set 5.5 (page 342)

1. (a) Stochastic (b) Not stochastic (c) Stochastic (d) Not stochastic
3. $x_4 = \begin{bmatrix} 0.54545 \\ 0.45455 \end{bmatrix}$
5. (a) Regular (b) Not regular (c) Regular
7. $\begin{bmatrix} \frac{8}{17} \\ \frac{9}{17} \end{bmatrix}$
11. (a) Probability that the system will stay in state 1 when it is in state 1 (b) Probability that the system will move to state 1 when it is in state 2 (c) 0.8 (d) 0.85



13. (a) $\begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix}$ (b) 0.93 (c) 0.142 (d) 0.63

15. (a)
                     initial state   after 1 year   after 2 years   after 3 years   after 4 years   after 5 years
city population      100,000         95,750         91,840          88,243          84,933          81,889
suburb population    25,000          29,250         33,160          36,757          40,067          43,111

(b) City population will approach 46,875 and the suburbs population will approach 78,125.
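The population figures in answer 15 are the iterates of a two-state Markov chain. The sketch below assumes a transition matrix in which 5% of city residents move to the suburbs and 3% of suburban residents move to the city each year; that matrix is an assumption inferred from the first-year figures, not a quotation from the exercise. Exact fractions are used and values are rounded only for display.

```python
from fractions import Fraction as F

# Hypothetical column-stochastic transition matrix (states: city, suburbs).
P = [[F(95, 100), F(3, 100)],
     [F(5, 100), F(97, 100)]]


def step(matrix, state):
    """One year of migration: the matrix-vector product P x."""
    return [sum(p * x for p, x in zip(row, state)) for row in matrix]


def trajectory(matrix, start, years):
    states = [list(start)]
    for _ in range(years):
        states.append(step(matrix, states[-1]))
    return states


states = trajectory(P, [F(100000), F(25000)], 5)
rounded = [[round(v) for v in state] for state in states]

# The steady state splits the fixed total in the ratio p12 : p21.
total = 125000
to_city, to_suburb = P[0][1], P[1][0]
steady = [total * to_city / (to_city + to_suburb),
          total * to_suburb / (to_city + to_suburb)]

for year, state in enumerate(rounded):
    print(year, state)
print(steady)
```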

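17. $P = \begin{bmatrix} \frac{7}{10} & \frac{1}{10} & \frac{1}{5} \\ \frac{1}{5} & \frac{3}{10} & \frac{1}{2} \\ \frac{1}{10} & \frac{3}{5} & \frac{3}{10} \end{bmatrix}$; steady-state vector: $\begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$

19. For any positive integer k, $P^k\mathbf{q} = \mathbf{q}$.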
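A steady-state vector q of a stochastic matrix P is a probability vector with Pq = q. The sketch below checks this for the matrix and vector in answer 17 using exact fractions, and also confirms that both the columns and the rows of P sum to 1, which explains why the uniform vector is the steady state. The entry layout is the one read from the answer.

```python
from fractions import Fraction as F

P = [[F(7, 10), F(1, 10), F(1, 5)],
     [F(1, 5), F(3, 10), F(1, 2)],
     [F(1, 10), F(3, 5), F(3, 10)]]
q = [F(1, 3), F(1, 3), F(1, 3)]

# P q, computed row by row.
Pq = [sum(p * x for p, x in zip(row, q)) for row in P]

column_sums = [sum(P[i][j] for i in range(3)) for j in range(3)]
row_sums = [sum(row) for row in P]

print(Pq == q, column_sums, row_sums)
```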

True/False 5.5 (a) True

(b) True

(c) True

(d) False

(e) True

(f) False

(g) True

9. $\begin{bmatrix} \frac{4}{11} \\ \frac{4}{11} \\ \frac{3}{11} \end{bmatrix}$


Chapter 5 Supplementary Exercises (page 345) 1. (b) A is the standard matrix of the rotation in the plane about the origin through a positive angle θ . Unless the angle is an integer multiple of π , no vector resulting from such a rotation is a scalar multiple of the original nonzero vector.



1



1

⎢ 3. (c) ⎣0 0

2

⎥ 1⎦

0

3





0

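9. $A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}$, $A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}$, $A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}$, $A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$
11. 0, tr(A)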
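The powers listed in answer 9 grow by a factor of 5 at each step, which is what the Cayley–Hamilton theorem gives for a 2 × 2 matrix with determinant 0 and trace 5 (A² = 5A). The sketch below verifies that pattern on the listed powers. The matrix `A` is a hypothetical reconstruction obtained as A²/5; it is not printed in the answer key.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, m):
    return [[c * entry for entry in row] for row in m]


# Powers of A as listed in the answer.
POWERS = {
    2: [[15, 30], [5, 10]],
    3: [[75, 150], [25, 50]],
    4: [[375, 750], [125, 250]],
    5: [[1875, 3750], [625, 1250]],
}

# Each listed power should be 5 times the previous one.
pattern_holds = all(POWERS[k + 1] == scale(5, POWERS[k]) for k in (2, 3, 4))

# Hypothetical A, assumed for illustration: A = A^2 / 5.
A = [[3, 6], [1, 2]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(pattern_holds, matmul(A, A) == POWERS[2], trace, det)
```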



1

0

15. ⎣−1

− 21

− 21 ⎥ ⎦

1

− 21

− 21



13. All eigenvalues must be 0

 4

17. The only possible eigenvalues are −1, 0, and 1. 19. The remaining eigenvalues are 2 and 3.

Exercise Set 6.1 (page 353) 1. (a) 12

(b) −18

(c) −9

3. (a) 34

(b) −39

(c) −18



5.

2 0

√0

7. −24

3

19. p = 25. u =



14 , d(p, q) =



(d)

30



137



65, d(u, v) = 12 5

(e)

89



(e)

11 −29

9. 3





(d)

21. U =

11



34

13.

(f)



(f)



3 0



93, d(U, V ) =

27. (a) −101

203



610

√0

5



15. −50



99 = 3 11

17. u =



30, d(u, v) =





107



23. p = 6 3 , d(p, q) = 11 2

(b) 3

y

29. 4

x –2

2 –4

31. $\langle u, v \rangle = \frac{1}{9}u_1v_1 + u_2v_2$ 37. (a)

2 3

(b)

√4

(c)

15

33. Axioms 2 and 3 do not hold.



# 2

(d)

2 5

35. $14\langle u, v \rangle - 4\|u\|^2 - 6\|v\|^2$

43. (b) k1 and k2 must both be positive.

39. 0

True/False 6.1 (a) True

(b) False

(c) True

(d) True

(e) False

(f) True

(g) False

Exercise Set 6.2 (page 361) 1. (a) − √12

(c) − √12

(b) 0

13. Orthogonal if k =

4 3

3. 0

5.

19 √ 10 7

7. (a) Orthogonal

(b) Not orthogonal

15. The weights must be positive numbers such that w1 = 4w2 .

17. No

(c) Orthogonal 25. No


)

27. (−1, −1, 1, 0), 31. (a)

1 4

2 7

, − 47 , 0, 1

(b) p =

√1 3

q =

√1 5

*

29. (a) y = − 21 x 33. (a) 0

(b) x = t , y = −2t , z = −3t

(b) p =

51. (a) v = a (1, −1)

√4 15

q = 2

#

(b) v = a(1, −2)

2 3

True/False 6.2 (a) False

(b) True

(c) True

(d) True

(e) False

(f) False

Exercise Set 6.3 (page 376)

1. (a) Orthogonal but not orthonormal (b) Orthogonal and orthonormal (c) Not orthogonal and not orthonormal (d) Orthogonal but not orthonormal
3. (a) Orthogonal

(b) Not orthogonal

1'

( ' ( 2 , 0, − √12 , √12 , 0, √12 (0, 1, 0)     u = − 115 v1 − 25 v2 + 2v3 9. u = 0v1 − 23 v2 + 13 v3 11. − 11 , − 25 , 2 13. 0, − 23 , 13 5             (a) 63 , 84 (b) − 88 , 66 17. (a) 25 , 25 (b) − 21 , 21 19. (a) 10 , 8, 4 (b) 23 , − 23 , − 13 25 25 25 25 3 3 3    7 14 7      (a) 22 , − 14 ,2 (b) − 15 , 15 , 3 23. 23 , 23 , −1, −1 25. 23 , 11 , − 181 , − 17 15 15 3 18 6 18 ( ( 1' ( ' ( ' (2 ' ' √1 , √1 , √1 , − √12 , √12 , 0 , √16 , √16 , − √26 q1 = √110 , − √310 , q2 = √310 , √110 29. 3 3 3

5. An orthonormal basis: 7. 15. 21. 27.

√1

2

y u2

q2

x

q1

1' 31. 33. 35. 39. 43.

45.

u1

( ' ( ' (2 , 0 , √110 , √110 , − √210 , − √210 , √115 , √115 , − √215 , √315     From Exercise 23, w1 = projW b = 23 , 23 , −1, −1 , so w2 = b − projW b = − 21 , 21 , 1, −1 . 1' ( ' ( ' (2   13 31 20   √1 , √1 , √1 √1 , √1 , − √1 √2 , − √1 , 0 w1 = 14 , , , 14 , 7 , w2 = 141 , − 143 , 17 37. An orthonormal basis: 6 6 6 6 6 6 6 6 ( ( ' ' For example, x = √13 , 0 and y = 0, √12 41. (b) projW u = (2, 1, 2) (using both methods) √ √   An orthonormal basis: {1, 3 (−1 + 2x) , 5 1 − 6x + 6x 2 } √ √ ⎤ ⎡√ √ 2 2 2 √  √ 5 5 ⎢ 1 ⎥ √ 3 − 3 ⎦ (Q is given) R= 47. R = ⎣ 0 √ (Q is given) 0,

√2

5

√1

,

5

0

( ' , 0 , √530 , − √130 ,

√2

30

5

0

49. A does not have a QR -decomposition.

0

√4

6

55. (b) The range of T is W ; the kernel of T is W ⊥ .

True/False 6.3 (a) False

(b) False

(c) True

(d) True

(e) False

(f) True


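Exercise Set 6.4 (page 386)

1. $\begin{bmatrix} 21 & 25 \\ 25 & 35 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix}$
3. $x_1 = \frac{20}{11}$, $x_2 = -\frac{8}{11}$
5. $x_1 = 12$, $x_2 = -3$, $x_3 = 9$
7. Least squares error vector: $\begin{bmatrix} -\frac{6}{11} \\ -\frac{27}{11} \\ \frac{15}{11} \end{bmatrix}$; least squares error: $\frac{3}{11}\sqrt{110} \approx 2.86$
9. Least squares error vector: $\begin{bmatrix} 3 \\ -3 \\ 0 \\ 3 \end{bmatrix}$; least squares error: $3\sqrt{3} \approx 5.196$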
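Answers 1, 3 and 7 fit together as one least squares computation: form the normal system AᵀA x = Aᵀb, solve it, then measure the error vector b − Ax. The sketch below walks through those steps with exact fractions. The matrix `A` and vector `b` are hypothetical data chosen so that the normal system matches answer 1; they are not printed in this answer key.

```python
from fractions import Fraction as F

# Hypothetical data, assumed for illustration.
A = [[1, -1], [2, 3], [4, 5]]
b = [2, -1, 5]

# Normal system A^T A x = A^T b.
AtA = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(2)]

# Solve the 2 x 2 system with Cramer's rule.
d = F(AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0])
x = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / d,
     (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / d]

# Error vector b - Ax and its squared length.
residual = [bi - (row[0] * x[0] + row[1] * x[1]) for row, bi in zip(A, b)]
error_squared = sum(r * r for r in residual)

print(AtA, Atb)
print(x, residual, float(error_squared) ** 0.5)
```

The squared error is 90/11, and its square root equals (3/11)√110, about 2.86.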

⎡ ⎤ 2

11. Least squares solutions: x1 =

1 2

⎢ ⎥ − 21 t , x2 = t ; error vector: ⎣0⎦ 2

⎡ 13. Least squares solutions: x1 = − 76 − t , x2 =





⎢ 15. ⎣

439 ⎥ 285 ⎦ 94 57

92 − 285



3



⎢ ⎥ 17. ⎣−4⎦

 19.

−1

1

0

0

0

⎡ 25. (a) {(1, 0, −5) , (0, 1, 3)}

(b)

1 35

7 6

⎢ − t , x3 = t ; error vector: ⎢ ⎣ ⎡



1

0

⎢ 21. ⎣0 0

10

15

⎢ ⎣ 15 −5

7⎤ 3

−5

26 3

0

− 496



0

⎥ 0⎦

0

1

  1 7 18 7

23.





0



⎢−1⎥ ⎢ ⎥ ⎥ ⎣ 1⎦





27. ⎢

3⎦ 34



7⎥ 6⎦

29. AT AAT

−1

A

1

True/False 6.4 (a) True

(b) False

(c) True

(d) True

(e) False

(f) True

(g) False

Exercise Set 6.5 (page 393)

1. $y = -\frac{1}{2} + \frac{7}{2}x$
3. $y = 2 + 5x - 3x^2$
5. $y = \frac{5}{21} + \frac{48}{7x}$
[Graphs of the fitted curves omitted.]

True/False 6.5 (a) False
(b) True
(c) True
(d) False
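A least squares line y = a + bx is obtained from the normal equations, which for a straight line reduce to closed formulas in the sums of x, y, x² and xy. The sketch below implements those formulas with exact fractions. The three sample points are hypothetical data chosen so that the fit reproduces the line in answer 1; the exercise's data are not printed in this answer key.

```python
from fractions import Fraction as F


def fit_line(points):
    """Return (intercept, slope) of the least squares line y = a + b x."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = F(n * sxy - sx * sy, n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical sample points, assumed for illustration.
points = [(0, 0), (1, 2), (2, 7)]
intercept, slope = fit_line(points)
print(intercept, slope)
```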

(h) True


Exercise Set 6.6 (page 399) 1. (a) 1 + π − 2 sin x − sin 2x 3. (a) 5. (a)

ex 1 − e−1 2 3x

π

7e − 19 ≈ 0.00136 12e − 12

(b) 6

(b) 1 −

2 2 2 sin x − sin (2x) − · · · − sin(nx) 1 2 n

(b) 1 + π −

≈ 0.392

π2

9. $\frac{1}{2} + \sum_{k=1}^{\infty} \frac{1}{k\pi}\left( 1 - (-1)^k \right) \sin kx$

True/False 6.6 (a) False

(b) True

(c) True

(d) False

(e) True

Chapter 6 Supplementary Exercises (page 399)

1. (a) (0, a, a, 0) with a ≠ 0

'

(b) ± 0, √25 ,

√1

(

5

,0

3. (a) The subspace of all matrices in M22 with zeros on the main diagonal. (b) The subspace of all 2 × 2 skew-symmetric matrices. 7. ±

'

√1

2

(

√1

, 0,

11. (b) θ approaches

9. No

2

π

17. No

2

Exercise Set 7.1 (page 407)

1 0

1. (a) Orthogonal; A−1 =



0 −1



− √12



(b) Orthogonal; A−1 = ⎢ ⎣

3. (a) Not orthogonal

6

11. a 2 + b2 =

9. Yes



− 21 −

⎢ 17. (a) ⎢ ⎣







5 3 2

2

3 2

+



1 2

13. (a)





−1 + 3 3 √ 3+ 3



1 − 323 ⎢2 ⎥ ⎢ ⎥ 6 (b) ⎣ ⎦ √ − 23 − 23

⎥ ⎥ ⎦

5 2





1 19. ⎣0 0

(b)

0 cos θ − sin θ

5 2

⎥ ⎥

6⎦

2



− 235



7. TA (x) = ⎢ ⎣

√1

3

√ 

√1

√1

√1

3



2

⎤ √1 2

− √26

√1

√1

− √12

0

√1



√1

2

(b) Orthogonal; A−1 =

3



1+

√  3

5 2



3



√1

2



⎢ 3 ⎥ √ ⎥ 15. (a) ⎢ ⎣ 2⎦

⎤ ⎥

18 ⎥ ; 25 ⎦ 101 25

TA (x) = x =



− √52

⎢ (b) ⎢ ⎣

5

√7



38

⎤ ⎥ ⎥

2⎦

−3



0 sin θ ⎦ cos θ

21. (a) Rotations about the origin, reflections about any line through the origin, and any combination of these (b) Rotations about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these (c) No; dilations and contractions

' √ ( √ √2 , 2 2, − √2 q = − ( ) S 3 3 3 √ √ (b) p = 11, d(p, q) = 21, p, q = 0

23. (a) (p)S =

'

√5

, 3



2,

√ ( √2 ,

True/False 7.1 (a) False

(b) False

(c) False

(d) False

(e) True

(f) True

(g) True

(h) True


Exercise Set 7.2 (page 416)

1. $\lambda^2 - 5\lambda = 0$; λ = 0: one-dimensional; λ = 5: one-dimensional
3. $\lambda^3 - 3\lambda^2 = 0$; λ = 3: one-dimensional; λ = 0: two-dimensional
5. $\lambda^4 - 8\lambda^3 = 0$; λ = 0: three-dimensional; λ = 8: one-dimensional
√ ⎤ √3



− √27

7. P = ⎣

√2

7

− √1

2



− √1

√1

− √16

√1

6

√1

2

15. (2)

3 5

0

4 5

− 45

0

3 5

0

− √1

− √12



17. (−4) ⎢ ⎣

√1

2



3

9. P = ⎣ 0

1

0⎦; P −1 AP = ⎣ 0

−3

3 5

0

4 5

0⎦

0

0

0 0

0

0

−25

0

0⎥ ⎥

0

25

0

0

0

√1



2



3

⎥ 4⎦

0

4

3

√1

√1

2

√1

&

2



− √13

 = (2)



⎢ ⎥% 0 + (−4) ⎢− √13 ⎥ − √13 ⎣ ⎦ &

2



0





1 2

⎢ 0⎥ ⎦ + (−4) ⎣

0

0

√1

1 3

1 3

1 3

21. Yes

23. (a)

1√ 4−2 2

25

− 21

− 21

1 2

− √13



6

,

1 3

⎤ √ − 2√ −1 ⎣ 4+2 2 ⎦

1 2

1 2





√1

⎢ 1 ⎥% 1 √ ⎥ √ + (2 ) ⎢ ⎣ 6⎦ 6

1 6

1⎤ 3

1 6

1⎥ 3⎦

1 3

2 3

√1

√2

6

&

6

√2

6





2

(b)

1 2

6

 −1  

1√ 4+2 2

2

⎡ &

3

⎡1

1 3

1 + (4 )

√1

⎥ ⎢1 ⎢ − 13 ⎥ ⎦ + (2 ) ⎣ 6

− 13 ⎤ ⎡

2− √1 ⎣ 4−2 2 ⎦



0⎦

1 2

⎤ − 13

3

1 3

− 13 ⎡√

0

⎢ 19. ⎣0

%

−50

0



⎡ −25 ⎥ ⎢ 0 0⎥ ⎢ ⎥ −1 ⎥; P AP = ⎢ 3⎥ ⎣ 0 5⎦ 



0⎦







− 21

1 2

3

0



0



0

+ (4 )

0

0

0

0

3

√1



25

0

⎤ ⎥% 1 ⎥ −√ 2 ⎦



3 5

⎥ −1 ⎥; P AP = ⎢ ⎣0 3⎦

4 5

2



0

0

&



− 45





2

⎢ 1 = (−4) ⎢ ⎣− 2 ⎡

√1

2

√1



10



3

0

 1  − √2 % 2

0





√1

6



0

0

3

√2

0

− 45 ⎢ 3 ⎢ ⎢ 5 13. P = ⎢ ⎢ 0 ⎣

3

7

⎡ 11. P = ⎢ ⎣

⎦; P −1 AP =

7

√ √3



√1

,

2

√1



2

√1

2

True/False 7.2 (a) True

(b) True

(c) False

(d) True

(e) True

(f) True

(g) True

Exercise Set 7.3 (page 427) 3

  0 x1

0

7

 1. (a) [x1

x2 ]

3. 2x 2 + 5y 2 − 6xy

⎡ ⎤ ⎡− 2 x1 3 ⎢ ⎥ ⎢ 2 7. ⎣x2 ⎦ = ⎢ ⎣ 3 1 x3

3

x2

1 3 2 3

(b) [x1

 =

− √12

x2 ]

4

−3 √1

2

 −3 x 1 −9 x2

  y1

⎡ (c) [x1

; Q = 3y12 + y22 √1 √1 x2 y2 2 2 ⎤ − 13 ⎡ ⎤ ⎥ y1 2 2 2 ⎣ ⎦ − 23 ⎥ ⎦ y2 ; Q = y1 + 4y2 + 7y3 y3 2

5. 2 3

  x1



3

x2

9

3

⎢ x3 ] ⎢ ⎣ 3

−1

−4

1 2

⎤ −4 ⎡x1 ⎤ ⎥ 1⎥⎢ ⎥ x 2 ⎦ ⎣ 2⎦ x3 4




9. (a) x

y

  x

  2

1 2

1 2

0

11. (a) Ellipse

y

   x −6 + (2 ) = 0 y



+ 1

(b) Hyperbola

(c) Parabola

15. Hyperbola: 4x 2 − y 2 = 3; θ = sin 17. (a) Positive definite 19. Positive definite

  −1 3 5

n

⎢ ⎢− 1 ⎢ n(n−1) 2 T ⎢ 33. (a) sx = x ⎢ . ⎢ . ⎢ . ⎣ 1 − n(n− 1)

1 − n(n− 1)

···

1

···

n

.. .

..

1 − n(n− 1)

0

0

1

  x y



−8

+ 7

   x

(d) Positive semidefinite

23. Indefinite

27. (a) Indefinite



⎥ 1 ⎥ − n(n− 1) ⎥ ⎥ x .. ⎥ ⎥ . ⎥ ⎦

···

  0

+ (−5) = 0

y

13. Hyperbola: 3y 2 − 2x 2 = 8; θ = sin−1

(c) Indefinite

1 − n(n− 1)

.

y

'

√2

(

5

≈ 63.4◦

≈ 36.9◦

(b) Negative definite

1

(b) x

(d) Circle

21. Positive semidefinite





(e) Negative semidefinite

(b) Negative definite

29. k > 2

35. A must have a positive eigenvalue of multiplicity 2.

1

n

True/False 7.3 (a) True (l) False

(b) False

(c) True

(d) True

(e) False

(f) True

(g) True

(h) True

(i) True

(j) True

(k) True

Exercise Set 7.4 (page 436)

1. Maximum: 5 at (x, y) = (±1, 0); minimum: −1 at (x, y) = (0, ±1)
3. Maximum: 7 at (x, y) = (0, ±1); minimum: 3 at (x, y) = (±1, 0)
5. Maximum: 9 at (x, y, z) = (±1, 0, 0); minimum: 3 at (x, y, z) = (0, 0, ±1)



7. Maximum:

2 at (x, y) =

'√

' √

(

y

9.

' √



(

'√

(

2, −1

y 5x 2 – y 2 = 5 x

(–1, 0)

(

2, 1 and (x, y) = − 2, −1 ; minimum: − 2 at (x, y) = − 2, 1 and (x, y) =

5x 2 – y 2 = –1

(0, 1)

x

(1, 0) (0, –1)

13. Saddle point at (0, 0); relative maximum at (−1, 1) 15. Relative minimum at (0, 0); saddle point at (2, 1); saddle point at (−2, 1)

17. x =

√5

2

,y =

21. q(x) = λ

√1

2

True/False 7.4 (a) False

(b) True

(c) True

(d) False

(e) True

Exercise Set 7.5 (page 443)  1.

−2 i

4

5−i

1+i

3−i

0

 −1

9. A

=

3 5

− 45

− 45 i

− 35 i





2 − 3i

1

i

3. ⎣ −i

−3

1

1

2



2 + 3i



 −1

11. A

=⎣

1 √

2 2 1 √

2 2

'√ '

3−i

(

√ (

1+i 3

⎤ ⎥ ⎦

5. (a) $(A)_{13} \neq (A^*)_{13}$

1 √

'

2 2 1 √

2 2

√ (⎤

'

1−i 3

−i −

√ (⎦ 3

(b) $(A)_{22} \neq (A^*)_{22}$

 −1+i 13. P =



−i  1√ 6

√1

√2

3 3

6

;P

 −1

AP =



3

0

0

6


 −1−i √

+i  1√ 3

√2

√1

6

15. P =

6

⎡ ⎢

19. ⎣



; P −1 AP =

3

0

i

2 − 3i

i

0

1

−1

4i

−2 − 3i

2

0

0

8





0

⎢ 1−i



√2

3

0

0

6



⎥ ⎦

−2

0⎥; P −1 AP = ⎢ ⎣ 0 ⎦

6

√1



1

−√ 1+i

√ 17. P = ⎢ ⎣ 3





0

27. (c) B and C must commute

√1

− √i 2

√i

− √12

2

35.

2



0

0

1

0⎦

0

5





True/False 7.5 (a) False

(b) False

(c) True

(d) False

(e) False

Chapter 7 Supplementary Exercises (page 445) 3 1. (a)

5

− 45

4 5

3 5



− √12



5. P = ⎢ ⎣



−1 =

√1

2

0 √1

2

0 √1

2

7. Positive definite

0

3 5

4 5

− 45

3 5





− 35

0

9 (b) ⎢ ⎣− 25

4 5

⎥ ⎥ − 12 25 ⎦

12 25

3 5

16 25







0

1⎥; P T AP = ⎢ ⎣0 ⎦ 0

0

9. (a) Parabola

13. Two possible solutions: a = 0, b =

#



⎤−1

4 5



4 5

⎢ =⎢ ⎣ 0 − 35

12 ⎤ 25

4 5

3⎥ 5⎦

− 12 25



− 16 25



0

0

2

0⎦

0

1



(b) Parabola 2 , 3

− 259

c = − √13 and a = 0, b = −

#

2 , 3

c=

√1

3

Exercise Set 8.1 (page 456) 1. (a) Nonlinear



a

(b) Linear; kernel consists of all matrices of the form

 (c) Linear; kernel consists of all matrices of the form 3. Nonlinear

(b) 4

17. (a) (1, 0, 1)



−a  0 b

−b

0

5. Linear; kernel consists of all 2 × 2 matrices whose rows are orthogonal to all columns of B

7. (a) Linear; ker(T ) = {0} 13. (a) 2

c

b

(b) Nonlinear

(c) mn − 3 (b) ker(T ) = {0}

(d) 1

9. Linear; ker(T ) = {(0, 0, 0, . . .)}

15. (a)





3

6

−12

9

(c) R(T ) = R 3

11. (a) and (d)

(b) rank(T ) = 4; nullity(T ) = 0

19. T (x1 , x2 ) = (−4x1 + 5x2 , x1 − 3x2 ); T (5, −3) = (−35, 14)

19. T (x1 , x2 , x3 ) = (−x1 + 4x2 − x3 , 5x1 − 5x2 − x3 , x1 + 3x3 ); T (2, 4, −1) = (15, −9, −1) 23. (b) {x, x 2 }

(c) {5, x 2 }

25. (a) ker(D) consists of all constant polynomials (b) ker(J ) consists of all polynomials of the form a1 x 27. (a) T (f (x)) = f (4) (x) (b) T (f (x)) = f (n+1) (x)


29. (a) The origin, a line through the origin, a plane through the origin, or the entire space R 3 (b) The origin, a line through the origin, a plane through the origin, or the entire space R 3 31. (−10, −7, 6) True/False 8.1 (a) True

(b) False

(c) True

(d) False

(e) True

(f) True

(g) False

(h) False

(i) False

Exercise Set 8.2 (page 464) 1. (a) ker(T ) = {0}; T is one-to-one

(b) ker(T ) = {0}; T is one-to-one

3. (a) nullity(A) = 1; not one-to-one

(b) nullity(A) = 1; not one-to-one

5. (a) One-to-one

(c) Not one-to-one



(b) One-to-one



(c) ker(T ) = {span(0, 1, 1)}; T is not one-to-one

7. For example, T(1 − x²) = (0, 0); T is onto 9. No; T is not one-to-one because ker(T) ≠ {0} as T(a) = a × a = 0 13. (T3 ◦ T2 ◦ T1)(x, y) = (3x − 2y, x)

11. (T2 ◦ T1 )(x, y) = (2x − 3y, 2x + 3y) 15. (a) a + d

(b) (T2 ◦ T1 )(A) does not exist because T1 (A) is not a 2 × 2 matrix

17. a0 x + a1 x (x + 1) + a2 x (x + 1)2 (d) T −1 (2, 3) = 2 + x

19. (a) (1, −1) y 2

p(x) = 2 + x x

21. (a) all the ai ’s must be nonzero

(b) T⁻¹(x1, x2, . . . , xn) = ((1/a1)x1, (1/a2)x2, . . . , (1/an)xn)

23. (a) T1⁻¹(p(x)) = (1/x)p(x); T2⁻¹(p(x)) = p(x − 1); (T1⁻¹ ◦ T2⁻¹)(p(x)) = (1/x)p(x − 1) 25. T2(v) = (1/4)v

31. Since ker(J) ≠ {0}, J is not one-to-one.

True/False 8.2 (a) True

(b) False

(c) True

(d) True

(e) False

(f) True

Exercise Set 8.3 (page 471) 1. Isomorphism

⎛⎡

3. Isomorphism

a

b

9. (a) T ⎝⎣b

d

c

e

⎜⎢

⎡ ⎤ a ⎢ ⎥ ⎤⎞ ⎢ b⎥ ⎢ ⎥ c ⎢ ⎥ ⎥⎟ ⎢ c⎥ e⎦⎠ = ⎢ ⎥ ⎢ d⎥ ⎢ ⎥ f ⎢ e⎥ ⎣ ⎦ f

5. Not an isomorphism

 (b) T1

a

b

c

d



7. Isomorphism

⎡ ⎤ a  ⎢ b⎥ a ⎢ ⎥ = ⎢ ⎥ ; T2 ⎣ c⎦ c d

b d



⎡ ⎤ a ⎢ c⎥ ⎢ ⎥ =⎢ ⎥ ⎣ b⎦ d

11. Isomorphism

13. dim (W ) = 3; (−r − s − t, r, s, t) → (r, s, t) is an isomorphism between W and R 3

15. Isomorphism

17. Yes

19. No

True/False 8.3 (a) False

(b) True

(c) False

(d) True

(e) True

(f) True


Exercise Set 8.4 (page 479) ⎡

0

⎢1 ⎢ 1. (a) ⎢ ⎣0 0



0 1

0⎥ ⎥ ⎥ 0⎦

0

1

0

  x1 x2

−1

3. (a) ⎣0

1

⎥ −2⎦

0

0

1

1



18 7

1 7

− 107 7 ⎡ ⎤

24 7

=

(b) T (v1 ) =

5

  x1

1

⎢ ⎥

3



⎥ 1⎥ ⎦

1

=

19 7





− 837 ⎤







1

−1

1

1

7. (a) ⎣0

2

4 ⎦

0

0

4

4 3

; T ( v2 ) =



1





−5 

  (d) T

x2 ⎡

3



0

8 3



3



0

⎢ 1 − 5. (a) ⎢ ⎣ 2

 

; [T (v2 )]B =

−2 

1





1





9. (a) [T (v1 )]B = (c) T



0



(b), (c) 3 + 10x + 16x 2

  −2 29



11. (a) [T (v1 )]B = ⎣2⎦ ; [T (v2 )]B = ⎣ 0⎦; [T (v3 )]B = ⎣ 5⎦

−2

6

4

(b) T(v1) = 16 + 51x + 19x²; T(v2) = −6 − 5x + 5x²; T(v3) = 7 + 40x + 15x²

(c) T(a0 + a1x + a2x²) = (239a0 − 161a1 + 289a2)/24 + ((201a0 − 111a1 + 247a2)/8)x + ((61a0 − 31a1 + 107a2)/12)x²

(d) T(1 + x²) = 22 + 56x + 14x²





0

1

15. (a) [T ]B,B



1

0

1

0

17. (a) ⎣0

0

2⎦

0

0

0





1

⎢1 1⎥ ⎢ ⎥ ⎥; [T ]B,B

= ⎢ ⎣1 1⎦

1



0

0

1

1

2

2⎥ ⎥



(b), (c)

1

1

0

0

19. (a) ⎣0

0

⎥ −1 ⎦

0

1

0

⎡ ⎢



0

0

0

0⎥ ⎥

(b) [T2 ◦ T1 ]B ,B = [T2 ]B ,B

[T1 ]B

,B



3

0⎦

0

0

3

2

5

1

2



2⎦

1

21. (a) [T2 ◦ T1 ]B ,B = [T2 ]B ,B

[T1 ]B

,B



0

0

(b) −6 + 48x

0

0









⎢3 ⎥ ⎢ −3⎦; [T2 ]B ,B

= ⎢ ⎣0

0

0

0

−1

0

⎢ ⎥; [T1 ]B

,B = ⎣0 −9⎦

0

⎢1 ⎢ =⎢ ⎣1

2

0⎥ ⎥

⎢6 ⎢ 13. (a) [T2 ◦ T1 ]B ,B = ⎢ ⎣0 ⎡



0





0

(b) 4 sin x + 3 cos x

(b) [T3 ◦ T2 ◦ T1 ]B ,B = [T3 ]B ,B

[T2 ]B

,B

[T1 ]B

,B

23. The matrix for T relative to B is the matrix whose columns are the transforms of the basis vectors in B in terms of the standard basis. Since B is the standard basis for Rⁿ, this matrix is the standard matrix for T. Also, since B is the standard basis for Rᵐ, the resulting transformation will give vector components relative to the standard basis.

True/False 8.4 (a) False

(b) False

(c) True

(d) False

(e) True

Exercise Set 8.5 (page 486) 1. (a) det(A) = −2 does not equal det(B) = −1



3.

6

 −10



5.

−2

−2

6

5



7. [T ]B =

−3 ⎡ −2 ⎢ 9. [T ]B = ⎣ 1

−1 0

−2 ⎢ ⎥ 1 ⎦ ; [ T ]B = ⎣ 1

0

1

0

2



0



0

−1



(b) tr(A) = 3 does not equal tr(B) = −2

1

−2

0

−1 ⎤

0

0

⎥ 1⎦

1

0





; [ T ]B =



11

20

−6

−11

 11. [T ]B =

√1

2

√1

2

− √12 √1

2

 ; [ T ]B =



√1

2

√1

2

− √12 √1

2




 13. [T ]B =

−1

0

1

1



1 ; [ T ]B =

2

1 2

3 2

− 21



15. (a) −4, 3 (b) A basis for the eigenspace corresponding to λ = −4 is {−2 + (8/3)x + x²}; a basis for the eigenspace corresponding to λ = 3 is {5 − 2x + x²}



19. det(T) = 17; eigenvalues: 5 ± 2√2

21. det(T ) = 1; eigenvalue: 1

True/False 8.5 (a) False

(b) True

(c) True

(d) True

(e) True

(f) False

(g) True

(h) False

Chapter 8 Supplementary Exercises (page 488) 1. No



−1



⎢ 1⎥ ⎢ ⎥ ⎥ ⎣ 0⎦

5. (a) T (e3 ) and any two of T (e1 ), T (e2 ), T (e4 ) form a basis for the range; a basis for ker(T ) is ⎢

1

(b) rank(T ) = 3; nullity(T ) = 1 7. (a) rank(T ) = 2; nullity(T ) = 2

0

⎢ ⎢1 ⎢ ⎢ ⎢0 ⎢ ⎢ ⎢ 25. ⎢0 ⎢ ⎢. ⎢ .. ⎢ ⎢ ⎢0 ⎣ 0

0

···

0

0

0

0

···

0

0⎥ ⎥

1 2

0

···

0

0

1 3

···

0

.. .

.. .

..

.

.. .

···

n

···

0

0

0

1

0



0

0



⎢0 ⎢ 13. ⎢ ⎣0

11. rank(T ) = 3; nullity(T ) = 1



(b) T is not one-to-one 0

0

0



0

1

1

0

0⎥ ⎥ ⎥ 0⎦

0

0

1





−4

0

15. ⎣ 1

0

⎥ −2 ⎦

0

1

1



9



1

−1

17. ⎣0

1

1

0



1

⎤ ⎥

0⎦

19. (b) {1, x}

−1

⎥ ⎥

0⎥ ⎥



0⎥ ⎥

⎥ ..⎥ .⎥ ⎥ ⎥ 0⎥ ⎦

1 1

n+1

Exercise Set 9.1 (page 499) 1. x1 = 2, x2 = 1



3. x1 = 3, x2 = −1 1

0

7. (a) L−1 = ⎣−2

1

1

1





0



⎡1





5. x1 = −1, x2 = 1, x3 = 0 1 8

2

0 ⎦; U −1 = ⎢ ⎣0 1 0

⎤⎡

0

0

0

1

1 2

9. (a) A = LU = ⎣−2

1

0⎦ ⎢ ⎣0

0



2

0

⎥⎢

1

0



1 6

1⎤

−2



1⎥ ⎦

0

⎤⎡



5 ⎥ 24 ⎦

1 4

2



− 487

1

−1



1

0

0

2

1

(c) A = L2 U2 = ⎣−1

1

0⎦ ⎣ 0

0

1⎦

1

0

1

0

0

1



⎥⎢





5 48

− 481

7 (b) A−1 = ⎢ ⎣− 24

11 24

1 6

1 6





− 487

⎤ ⎥

5 ⎥ 24 ⎦ 1 6

⎤⎡

1

0

0

2

0

(b) A = L1 DU1 = ⎣−1

1

0 ⎦ ⎣0

1

1

0

1

0

0



⎥⎢

⎤ ⎡1 ⎥⎢ 0⎦ ⎢ ⎣0 0 1

0

1 2

− 21

⎤ ⎥

0

1⎥ ⎦

0

1


 11. x1 =

21 , 17

x2 = − 14 , x3 = 17 ⎡

1

0

⎢ 15. A = P LU = ⎣0 0

0

⎤⎡

3

0

⎥⎢ 1⎦ ⎣0

1

0

0

3

2 3 n additions 3

17. Approximately

13. A = LDU =

12 17

0

⎤⎡

2 0

1

0

0

2



1

2

0

0

−3



− 13

1

⎥⎢ 0⎦ ⎢ ⎣0

1



1

1

0

1



0



1

1⎥ ; 2⎦

0

1

x1 = − 21 , x2 =

1 2

x3 = 3

and multiplications are required

True/False 9.1 (a) False

(b) False

(c) True

(d) True

(e) True

Exercise Set 9.2 (page 508) 1. (a) λ3 = −8 is the dominant eigenvalue (b) no dominant eigenvalue

3. x1 ≈ (0.98058, −0.19612), λ(1) ≈ 5.15385; x2 ≈ (0.98837, −0.15206), λ(2) ≈ 5.16185; x3 ≈ (0.98679, −0.16201), λ(3) ≈ 5.16226; x4 ≈ (0.98715, −0.15977), λ(4) ≈ 5.16228; dominant eigenvalue: 2 + √10 ≈ 5.16228; corresponding unit eigenvector: (1/√(20 + 6√10))(3 + √10, −1) ≈ (0.98709, −0.16018)

5. x1 = (−1, 1), λ(1) = 6; x2 = (−0.5, 1), λ(2) = 6.6; x3 ≈ (−0.53846, 1), λ(3) ≈ 6.60550; x4 ≈ (−0.53488, 1), λ(4) ≈ 6.60555; dominant eigenvalue: 3 + √13 ≈ 6.60555; corresponding scaled eigenvector: ((2 − √13)/3, 1) ≈ (−0.53518, 1)

7. (a) x1 = (1, −0.5); x2 = (1, −0.8); x3 ≈ (1, −0.929) (b) λ(1) = 2.8; λ(2) ≈ 2.976; λ(3) ≈ 2.997 (c) eigenvector: (1, −1); eigenvalue: 3 (d) 0.1%

9. 2.99993; (0.99180, 1.00000)

13. (a) Starting with x0 = (1, 0, 0) it takes 8 iterations. (b) Starting with x0 = (1, 0, 0, 0) it takes 8 iterations.
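The answers to Exercise Set 9.2 come from the power method with Rayleigh quotients. The following is a minimal sketch of that iteration with maximum-entry scaling, not the text's own code. The matrix `A` and the starting vector are assumptions (they are not stated in this answer key); they were chosen because they reproduce the iterates and eigenvalue estimates listed for Exercise 7. The function name `power_method` is hypothetical.

```python
def power_method(A, x0, steps):
    """Power method with maximum-entry scaling.

    Returns a list of (scaled iterate, Rayleigh quotient) pairs.
    """
    x = list(x0)
    history = []
    for _ in range(steps):
        # Multiply the current iterate by A.
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        # Scale so that the entry of largest absolute value becomes 1.
        pivot = max(y, key=abs)
        x = [yi / pivot for yi in y]
        # Rayleigh quotient (Ax . x) / (x . x) estimates the dominant eigenvalue.
        Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        rayleigh = sum(p * q for p, q in zip(Ax, x)) / sum(q * q for q in x)
        history.append((x, rayleigh))
    return history


# Assumed example data (not given in the answer key itself).
A = [[2.0, -1.0], [-1.0, 2.0]]
for iterate, estimate in power_method(A, [1.0, 0.0], 3):
    print([round(v, 3) for v in iterate], round(estimate, 3))
```

Under these assumptions the estimates approach the eigenvalue 3 with eigenvector (1, −1), in line with parts (b) and (c) of Exercise 7.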

Exercise Set 9.3 (page 513) 1. (a) ≈ 0.067 second

(b) ≈ 66.68 seconds

(c) ≈ 66,668 seconds, or about 18.5 hours

3. (a) ≈ 9.52 seconds

(b) ≈ 0.0014 second

(c) ≈ 9.52 seconds

(d) ≈ 28.57 seconds

5. (a) about 6.67 × 10⁵ seconds for forward phase; about 10 seconds for backward phase (b) 1334 gigaflops per second 7. n² flops

9. 2n³ − n² flops
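The timing answers in Exercise Set 9.3 can be checked with a short calculation. This sketch assumes the usual large-n estimates of about (2/3)n³ flops for the forward phase of Gauss-Jordan elimination and about n² flops for the backward phase. The system sizes and machine speeds used below are assumptions that happen to reproduce the figures quoted in Exercises 1 and 5; they are not stated in this answer key. The function name is hypothetical.

```python
def elimination_seconds(n, flops_per_second):
    """Approximate (forward, backward) run times in seconds for an n x n system.

    Assumes about (2/3) n^3 flops for the forward phase and about n^2 flops
    for the backward phase, which are large-n approximations.
    """
    forward = (2.0 / 3.0) * n**3 / flops_per_second
    backward = n**2 / flops_per_second
    return forward, backward


# Assumed setting: n = 100,000 at 1 gigaflop per second.
forward, backward = elimination_seconds(100_000, 1e9)
print(f"forward: {forward:.3g} s, backward: {backward:.3g} s")

# Assumed setting: n = 10,000 at 10 gigaflops per second, total time.
print(round(sum(elimination_seconds(10_000, 1e10)), 2))
```

The cubic growth of the forward phase is why a tenfold increase in n multiplies the run time by roughly a thousand.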


Exercise Set 9.4 (page 520) 1.



5, 0

3.





⎢ ⎢ 9. A = ⎢ 31 ⎣ − 23 √

2

√1

2



0

√1





⎥ 3 2





⎡ √

0



√1

2



√1 ⎥ − 2 0⎦ 1

√1





0

0

1





2



− √15

√2

5

7. A =

√1

√1 3



√2

5

5

8

0

0

2

⎤ ⎡√ √2 3 6

0

⎢ 1 √ 11. A = ⎢ ⎣ 3 − √13

√1

2

1



2



0

0

2

0

2

0 

0

− √12

2

2



0

− 62  1

 √

√1

2

⎥⎢ −232⎥ ⎣

0

2

2



2 6

− √12

√1

5. A =

5

√1

2 3

19. (b) A =





⎢ − √16 ⎥ ⎦⎣

√1 2 √1 2

√1

√2

− √25

0  √ ⎥ 1 2⎦

0

√1

5

0

0

0



5

5



0

√1 6





1

√1

2

2

True/False 9.4 (a) False

(b) True

(c) False

(d) False

Exercise Set 9.5 (page 524) ⎡

2⎤ 3

⎢ 1⎥% √ &% 1 ⎥ √ 1. A = ⎢ ⎣ 3⎦ 3 2 − 2 − 23 ⎡ 1 ⎤

√1

3

√1

⎡ ⎢



⎥ √ ⎢ ⎥ ⎥ [1 0] + 2 ⎢ √1 ⎥ [0 1] ⎣ 2⎦

√1

√1



⎥ ⎥ 2⎦

3

(g) True



√

3



0

√1

0

1

0

2

0

1





2⎤ 3

⎥%

√ ⎢

5. A = 3 2 ⎢ ⎣

1⎥ 3⎦

− 23

2

9. 70,100 numbers must be stored; A has 100,000 entries

3⎦

− √13

0

− √13



0

(f) False

√1

3

3. A = ⎢ ⎣

2



√ ⎢ 7. A = 3 ⎢ ⎣

&

(e) True

√1 2

True/False 9.5 (a) True

(b) True

(c) False

Chapter 9 Supplementary Exercises (page 524)  1. A =

2

0

−2

1



−3

1

0

2





⎤⎡



2

0

0

1

2

3

3. A = ⎣1

2

0⎦ ⎣0

1

2⎦

1

1

2

0

1



⎥⎢

0

⎥ 

5. (a) dominant eigenvalue: 3, corresponding positive unit eigenvector:

 (b) x5 ≈

0.7100

0.7042



1

(c) x5 ≈







√1



2

√1

2

0.7071

;v≈

0.7071



0.9918

7. The Rayleigh quotients will slowly converge to the dominant eigenvalue λ4 = −8.1.



− √1

2

⎢ 9. A = ⎢ ⎣ 0 − √12

0 1 0

√1

2

⎤⎡

2

⎥ 0 ⎥⎢ ⎦ ⎣0 − √12

0



0 

√1 ⎥ − 2 0⎦ − √12

0

⎡1

− √12 √1

2



2

⎢1 ⎢ ⎢2 11. A = ⎢ ⎢1 ⎣2 1 2

1⎤ 2

⎥ − 21 ⎥ ⎥ 24 ⎥ 1⎥ −2⎦ 0 1 2

2

0

3

− 13

2 3

12

2 3

2 3

− 13



− √12

√1

2

&


Exercise Set 10.1 (page 532) 1. (a) y = 3x − 4

(b) y = −2x + 1

2. (a) x² + y² − 4x − 6y + 4 = 0 or (x − 2)² + (y − 3)² = 9

3. x² + 2xy + y² − 2x + y = 0 (a parabola)



x ⎢x1 5. (a) ⎢ ⎣x2 x3

y y1 y2 y3



z z1 z2 z3

0 1⎥ ⎥=0 1⎦ 1

(b) x² + y² + 2x − 4y − 20 = 0 or (x + 1)² + (y − 2)² = 25

4. (a) x + 2y + z = 0

(b) −x + y − 2z + 1 = 0

(b) x + 2y + z = 0; −x + y − 2z = 0

6. (a) x² + y² + z² − 2x − 4y − 2z = −2 or (x − 1)² + (y − 2)² + (z − 1)² = 4 (b) x² + y² + z² − 2x − 2y = 3 or (x − 1)² + (y − 1)² + z² = 5

 y  y  1 10.   y2   y3

x2 x12 x22 x32



1 

x x1 x2 x3

1 

=0  1

11. The equation of the line through the three collinear points

1

13. The equation of the plane through the four coplanar points

Exercise Set 10.2 (page 539) 1. 700

2. (a) 5

(b) 4

4. (a) Ox, 34/21 units; sheep, 20/21 unit (b) First kind, 9/25 measure; second kind, 7/25 measure; third kind, 4/25 measure

5. (a) x1 = ((a2 + a3 + · · · + an) − a1)/(n − 2), xi = ai − x1, i = 2, 3, . . . , n

(b) Exercise 7(b); gold, 30 1/2 minae; brass, 9 1/2 minae; tin, 14 1/2 minae; iron, 5 1/2 minae

6. (a) 5x + y + z − K = 0
x + 7y + z − K = 0
x + y + 8z − K = 0

x = 21t/131, y = 14t/131, z = 12t/131, K = t, where t is an arbitrary number

(b) Take t = 131, so that x = 21, y = 14, z = 12, K = 131. (c) Take t = 262, so that x = 42, y = 28, z = 24, K = 262.

7. (a) Legitimate son, 577 7/9 staters; illegitimate son, 422 2/9 staters (b) Gold, 30 1/2 minae; brass, 9 1/2 minae; tin, 14 1/2 minae; iron, 5 1/2 minae (c) First person, 45; second person, 37 1/2; third person, 22 1/2
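The closed-form solution in Exercise 5(a) of this set can be tried out in exact arithmetic. This sketch assumes the formula belongs to a system of the form x1 + x2 + · · · + xn = a1 together with x1 + xi = ai for i = 2, . . . , n, which is consistent with the formula but is not stated in this answer key. The right-hand sides 60, 40, 45, 36 are likewise assumed values, chosen because they reproduce the gold, brass, tin, and iron weights listed in 5(b) and 7(b). The function name is hypothetical.

```python
from fractions import Fraction


def solve_shared_unknown(a):
    """Solve x1 + ... + xn = a[0] and x1 + xi = a[i-1] for i = 2..n.

    Uses x1 = ((a2 + ... + an) - a1) / (n - 2) and xi = ai - x1.
    """
    n = len(a)
    if n < 3:
        raise ValueError("the formula needs n >= 3")
    x1 = Fraction(sum(a[1:]) - a[0]) / (n - 2)
    return [x1] + [Fraction(ai) - x1 for ai in a[1:]]


# Assumed right-hand sides (total, then three pairwise sums with x1).
weights = solve_shared_unknown([60, 40, 45, 36])
print([str(w) for w in weights])
```

Using `Fraction` keeps the halves exact, so the result can be compared directly with the mixed numbers in the answer.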

Exercise Set 10.3 (page 549) 2. (a) S(x) = −.12643(x − .4)³ − .20211(x − .4)² + .92158(x − .4) + .38942 (b) S(.5) = .47943; error = 0% 3. (a) The cubic runout spline (b) S(x) = 3x³ − 2x² + 5x + 1

4. S(x) =
−.00000042(x + 10)³ + .000214(x + 10) + .99815, −10 ≤ x ≤ 0
.00000024(x)³ − .0000126(x)² + .000088(x) + .99987, 0 ≤ x ≤ 10
−.00000004(x − 10)³ − .0000054(x − 10)² − .000092(x − 10) + .99973, 10 ≤ x ≤ 20
.00000022(x − 20)³ − .0000066(x − 20)² − .000212(x − 20) + .99823, 20 ≤ x ≤ 30

Maximum at (x, S(x)) = (3.93, 1.00004)

5. S(x) =
.00000009(x + 10)³ − .0000121(x + 10)² + .000282(x + 10) + .99815, −10 ≤ x ≤ 0
.00000009(x)³ − .0000093(x)² + .000070(x) + .99987, 0 ≤ x ≤ 10
.00000004(x − 10)³ − .0000066(x − 10)² − .000087(x − 10) + .99973, 10 ≤ x ≤ 20
.00000004(x − 20)³ − .0000053(x − 20)² − .000207(x − 20) + .99823, 20 ≤ x ≤ 30

Maximum at (x, S(x)) = (4.00, 1.00001)
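The spline answers above can be spot-checked by evaluating the cubic pieces. This is a minimal sketch using the coefficients of Exercises 2(a) and 4 as read from this answer key. The helper names `eval_cubic` and `eval_spline` are hypothetical, and the zero quadratic coefficient on the first piece of Exercise 4 is an assumption based on that term being absent from the listed formula.

```python
def eval_cubic(coeffs, x0, x):
    """Evaluate a(x-x0)^3 + b(x-x0)^2 + c(x-x0) + d by Horner's rule."""
    a, b, c, d = coeffs
    t = x - x0
    return ((a * t + b) * t + c) * t + d


def eval_spline(pieces, x):
    """Evaluate a piecewise cubic given as (left, right, coeffs) triples."""
    for left, right, coeffs in pieces:
        if left <= x <= right:
            return eval_cubic(coeffs, left, x)
    raise ValueError("x lies outside the spline's interval")


# Exercise 2(a): the single piece based at x = .4.
PIECE_2A = (-0.12643, -0.20211, 0.92158, 0.38942)

# Exercise 4: four pieces on [-10, 30].
SPLINE_4 = [
    (-10.0, 0.0, (-0.00000042, 0.0, 0.000214, 0.99815)),
    (0.0, 10.0, (0.00000024, -0.0000126, 0.000088, 0.99987)),
    (10.0, 20.0, (-0.00000004, -0.0000054, -0.000092, 0.99973)),
    (20.0, 30.0, (0.00000022, -0.0000066, -0.000212, 0.99823)),
]

print(round(eval_cubic(PIECE_2A, 0.4, 0.5), 5))
print(round(eval_spline(SPLINE_4, 3.93), 5))
```

Evaluating each piece at its right endpoint is also a quick way to confirm that neighboring pieces agree at the knots, up to the rounding of the printed coefficients.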

12. 0 = 0


6. (a) S(x) =
−4x³ + 3x, 0 ≤ x ≤ 0.5
4x³ − 12x² + 9x − 1, 0.5 ≤ x ≤ 1

(b) S(x) =
2 − 2x, 0.5 ≤ x ≤ 1
2 − 2x, 1 ≤ x ≤ 1.5

(c) The three data points are collinear.



4

⎢1 ⎢ ⎢ ⎢0 ⎢ 7. (b) ⎢ . ⎢ .. ⎢ ⎢ ⎣0

1

0

0

4

1

0

1

4

1

.. .

.. .

.. .

0

0

0

1

0

0

0

2

1

0

0

4

1

0

1

4

1

.. .

.. .

.. .

0

0

0

0

0

0



⎢1 ⎢ ⎢ ⎢0 ⎢ 8. (b) ⎢. ⎢.. ⎢ ⎢ ⎣0 0

··· ··· ···

0

0

0

0

0

0

0

0

0

.. .

.. .

.. .

··· ···

0

1

4

0

0

1

··· ··· ···

0

0

0

0

0

0

0

0

0

.. .

.. .

.. .

0

0

4

0

1

1

... ...

⎤ M1 ⎥⎢ ⎥ 0⎥ ⎢ M2 ⎥ ⎥⎢ ⎥ ⎢ ⎥ 0⎥ ⎥ ⎢ M3 ⎥ .. ⎥ ⎢ .. ⎥ = ⎢ ⎥ .⎥ ⎥⎢ . ⎥ ⎥⎢ ⎥ 1⎦ ⎣Mn−2 ⎦ 4 Mn−1 ⎤⎡ ⎤ M1 1 ⎥⎢ ⎥ 0 ⎥ ⎢ M2 ⎥ ⎥⎢ ⎥ ⎢ ⎥ 0⎥ ⎥ ⎢ M3 ⎥ .. ⎥ ⎢ .. ⎥ = ⎢ ⎥ .⎥ ⎥⎢ . ⎥ ⎥⎢ ⎥ 1⎦ ⎣Mn−1 ⎦ 2 Mn 1

⎤⎡



6

h2

6

h2

⎤ yn−1 − 2y1 + y2 ⎢ y − 2y + y ⎥ ⎢ 1 2 3 ⎥ ⎢ ⎥ ⎢ y2 − 2y3 + y4 ⎥ ⎢ ⎥ ⎢ ⎥ .. ⎢ ⎥ . ⎢ ⎥ ⎢ ⎥ ⎣yn−3 − 2yn−2 + yn−1 ⎦ yn−2 − 2yn−1 + y1 ⎡ ⎤ − hy1 − y 1 + y2 ⎢ y1 − 2y2 + y3 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ y 2 y 2 − 3 + y4 ⎥ ⎢ ⎥ ⎢ ⎥ .. ⎢ ⎥ . ⎢ ⎥ ⎢ ⎥ ⎣ yn−2 − 2yn−1 + yn ⎦ yn−1 − yn + hyn

Exercise Set 10.4 (page 559) 1. (a) x(1) = [.4 .6]T, x(2) = [.46 .54]T, x(3) = [.454 .546]T, x(4) = [.4546 .5454]T, x(5) = [.45454 .54546]T

(b) P is regular since all entries of P are positive; q = [5/11 6/11]T
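The state vectors in Exercise 1 of this set follow the Markov chain iteration x(n+1) = P x(n). This is a minimal sketch of that iteration. The transition matrix `P` and the initial state are assumptions: they are not given in this answer key and were chosen because they reproduce the listed iterates. The function name is hypothetical.

```python
def markov_iterates(P, x0, steps):
    """Return the state vectors x(1), ..., x(steps) of x(n+1) = P x(n)."""
    x = list(x0)
    states = []
    for _ in range(steps):
        x = [sum(p * xj for p, xj in zip(row, x)) for row in P]
        states.append(x)
    return states


# Assumed column-stochastic matrix and initial state (not from the answer key).
P = [[0.4, 0.5],
     [0.6, 0.5]]
for n, state in enumerate(markov_iterates(P, [1.0, 0.0], 5), start=1):
    print(n, [round(v, 5) for v in state])
```

With these assumed data the iterates settle toward 5/11 ≈ .45455 and 6/11 ≈ .54545, the steady-state behavior expected of a regular transition matrix.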

2. (a) x(1) = [.7 .2 .1]T, x(2) = [.23 .52 .25]T, x(3) = [.273 .396 .331]T

3. (a)

9

 26 

17

45

(b)

8 17

19 45

1−

 (b) P → n

' (n 1 2

0

1

1

72

⎢ ⎥

29 ⎥ (b) P is regular, since all entries of P are positive: q = ⎢ ⎣ 72 ⎦ 21 72

19

0

12 19

⎥ ⎦, n = 1, 2, . . . .

Thus, no integer power of P has all positive entries.

1

 



0

⎡ 22 ⎤

⎢4⎥ ⎥ (c) ⎢ ⎣ 19 ⎦ ⎤

1 2

⎢ 4. (a) P = ⎣

6 11

⎡3⎤

⎡ ' (n n

11

n (0)

as n increases, so P x



0 1

for any x(0) as n increases.

  (c) The entries of the limiting vector

⎡1 2

⎢1 6. P = ⎢ ⎣4 2

1 4

1 4

1⎤ 4

1 2

1⎥ 4⎦

1 4

1 2



0 1

are not all positive.

⎡1⎤ 3

⎢1⎥ ⎥ has all positive entries; q = ⎢ ⎣3⎦ 1 3

8. 54 1/6% in region 1, 16 2/3% in region 2, and 29 1/6% in region 3

7.

10 13


Exercise Set 10.5 (page 568) ⎡

0

⎢1 ⎢ 1. (a) ⎢ ⎣1 0

1

0

0

0

1

1

0

1⎥ ⎥ ⎥ 1⎦

0

0

0

2. (a) P1





0 ⎢0 ⎢ ⎢ (b) ⎢1 ⎢ ⎣0 0

1 0 0 0 0

1 0 0 1 1

(b) P4

P2





0 0 1 0 0

0

⎢1 ⎢ ⎢ ⎢0 (c) ⎢ ⎢0 ⎢ ⎢ ⎣0

0 1⎥ ⎥ ⎥ 0⎥ ⎥ 0⎦ 0

0

P1



1

0

1

0

0

0

0

0

0

0⎥

1

0

1

1

0

0

0

0

0

0

0

0

⎥ ⎥ 1⎥ ⎥ 1⎥ ⎥ ⎥ 1⎦

0

1

0

1

0

(c) P1

P2

P3

P6

P5

P4

P5

P3

P3

P4

P2

P1

3. (a)

P1 → P2

(b) 1-step:

P1 → P4 → P2 P1 → P3 → P2

2-step: P4

P3



2-step: 3-step:

P1 → P2 → P1 → P2 P1 → P3 → P4 → P2 P1 → P4 → P3 → P2

3-step: P2

(c) 1-step:

P1 → P4 P1 → P3 → P4 P1 → P2 → P1 → P4 P1 → P4 → P3 → P4



1 0 0 0 0 ⎢0 1 0 0 0⎥ ⎢ ⎥ ⎥ 4. (a) ⎢ ⎢0 0 1 1 0⎥ ⎣0 0 1 2 1⎦ 0 0 0 1 2 (c) The ij th entry is the number of family members who influence both the i th and j th family members. 5. (a) {P1 , P2 , P3 }



0

⎢1 ⎢ 7. ⎢ ⎣0 0

1



(b) {P3 , P4 , P5 }

(c) {P2 , P4 , P6 , P8 } and {P4 , P5 , P6 }

1

0

0

1

0

1⎦ Power of P3 = 4

1

0

0

0⎥ ⎥ Power of P2 = 3



8. First, A; second, B and E (tie); fourth, C ; fifth, D

Power of P4 = 2

Exercise Set 10.6 (page 578) (b) [0 1 0]



(c) [1 0 0 0]

T

2. Let A =

1

1

1

1

  ∗



3. (a) p = [0 1], q =

0 1

 , for example.

  ∗

, v=3



(b) p = [0 1 0], q =

⎡ ⎤

1 0

, v=2

⎡ ⎤

0

1

⎢ ⎥

(c) p∗ = [0 0 1], q∗ = ⎣1⎦, v = 2

⎢ ⎥

(d) p∗ = [0 1 0 0], q∗ = ⎣0⎦, v = −2

0 4. (a) p∗ =

5 8

3 8



,

(b) {P3 , P4 , P6 }

Power of P1 = 5

0

1. (a) −5/8

6. (a) None

0

1 q∗ =

8 7 8

,

v=

27 8

(b) p∗ =

2 3

1 3



, q∗ =

1 6 5 6

, v=

70 3


  ∗

(c) p = [1 0], (e) p∗ =



5. p =

3

 13

7 20

20

q =

10 13

13



1



0



v=3

,

(d) p =

3

2 5

5



3 5



q =

,

2 5

1



13

q∗ =

,

v=

19 5

v = − 29 13

,

12 13

,

 11  20



q =

,

v = − 203

,

9 20

Exercise Set 10.7 (page 586)   1. (a)

⎡ ⎤

⎡ ⎤

⎢ ⎥ (b) ⎣5⎦

⎢ ⎥ (c) ⎣54⎦

6

2

3

78

6

79

2. (a) Use Corollary 10.8.4; all row sums are less than one. (b) Use Corollary 10.8.5; all column sums are less than one.

(c) Use Theorem 10.8.3, with x = [2 1 1]T > Cx = [1.9 .9 .9]T.

3. E² has all positive entries.

4. Price of tomatoes, $120.00; price of corn, $100.00; price of lettuce, $106.67

5. $1256 for the CE, $1448 for the EE, $1556 for the ME

6. (b)

542 503

Exercise Set 10.8 (page 594) 1. The second class; $15,000

2. $223

3. 1 : 1.90 : 3.02 : 4.24 : 5.00

5. s/(g₁⁻¹ + g₂⁻¹ + · · · + gₖ₋₁⁻¹)

6. 1 : 2 : 3 : · · · : n − 1

Exercise Set 10.9 (page 601) ⎡



0

1

1

0

1. (a) ⎢ ⎣0

0

1

1⎥ ⎦

0

0

0

0











–2



1

0

3 2

3 2

0

(b) ⎢ ⎣0

0

1 2

1⎥ 2⎦

0

0

0

0



−2



–2



1

−2 ⎢ (c) ⎣−1

−1 −1

−1 0

0⎦

3

3

3

3

–1

–1

0

0 –1 0

1

2

0 –1



0



(d) ⎣0 0

.866 −.500

1.366

0

0

.366

⎤ .500 ⎥ .866⎦

1

0

0

–2

–1

–1

0

1

2

1

2






1

2. (b) (0, 0, 0), (1, 0, 0), 1 21 , 1, 0 , and

, 1, 0

2



–2

–1

0

1

0

1

2

2

1

(c) (0, 0, 0), (1, .6, 0), (1, 1.6, 0), (0, 1, 0) 0 –1



1

0

3. (a) ⎣0 0

−1



0









0⎦

0

1



–2



1

−1

0

0

(b) ⎣ 0 0

1

0⎦

0

1

–1

0 –1





–2



1

1

0

0

(c) ⎣0 0

1

0⎦



−1

0

–1

0

1

2

1 2

···

1⎤ 2

⎢ 0⎥ ⎦, M3 = ⎣0

cos 20◦

⎥ − sin 20◦ ⎥ ⎦,

0

sin 20◦

cos 20◦

0 –1

⎡1



⎡1



2

0

0

4. (a) M1 = ⎢ ⎣0

2

⎢ 0⎥ ⎦, M2 = ⎣ 0

0

···

0

0

1 3

0

··· ⎤

2







0

cos(−45◦ )

⎢ 0 M4 = ⎣ − sin(−45◦ )

0

sin(−45◦ )

1

0

0

cos(−45◦ )



1





0



−1

0

⎥ ⎢ ⎦, M5 = ⎣1 0

0

0



0

⎤ ⎥

0

0⎦

0

1



(b) P = M5 M4 M3 (M1 P + M2 )











.3

0

0

5. (a) M1 = ⎣ 0

.5

0⎦, M2 = ⎣0

0

0

1





cos 35◦

1

0 sin 45◦

sin 35◦

0



cos 45◦

0



⎥ ⎢ 0 1 0 ⎦, M5 M4 = ⎣ − sin 35◦ 0 cos 35◦ ⎡ ⎡ ⎤ 0 0 ··· 0 2 ⎢ ⎢ ⎥ M6 = ⎣0 0 · · · 0⎦, M7 = ⎣0 0 1 1 ··· 1

cos(−45◦ ) ⎢ = ⎣ sin(−45◦ )

− sin(−45◦ ) cos(−45◦ )

0 0

0

1

0⎦

0

1



6. R1 = ⎣





0













1

cos α

sin α

0

0 ⎦, R4 = ⎣− sin α 0 cos θ

cos α

0⎦, R5 = ⎣ 0

0

1 0

0 ⎦, R2 = ⎣ sin α 0 cos β

0

sin θ

1

− sin β ⎡ cos θ ⎢ 0 R3 = ⎣ − sin θ ⎡ 1

⎢0 ⎢ 7. (a) M = ⎢ ⎣0 0

0



0

0

1

0

0

1

x0 y0 ⎥ ⎥ ⎥ z0 ⎦

0

0

1

cos α



1

⎢0 ⎢ (b) ⎢ ⎣0 0

0



0⎦,



0

sin β



0⎦,



0

0

0



1

1

− sin α cos α

cos β

··· 0 ··· 0 ··· ⎤

1



(b) P = M7 (M5 M4 (M2 M1 P + M3 ) + M6 )



1

⎢ ⎥ − sin 45◦ ⎦, M3 = ⎣0

cos 45◦

0





0

0

−5



0

0

1

0

0

1

⎥ −3 ⎦

0

0

1

9⎥ ⎥



0⎦,









1

cos β sin β

0

− sin β

1

0

0

cos β

⎤ ⎥ ⎦


Exercise Set 10.10 (page 611) ⎡ ⎤ ⎡ t1 0 ⎢ ⎥ ⎢1 ⎢ t2 ⎥ ⎢ ⎢ ⎥ ⎢4 1. (a) ⎢ ⎥ = ⎢ ⎢ t3 ⎥ ⎢ 1 ⎣ ⎦ ⎣4

1 4

1 4

0

0

0

0

t4

1 4

1 4

0

⎡ ⎤ 0

(c) t(1)

⎤⎡ ⎤ ⎡ ⎤ t1 0 ⎥⎢ ⎥ ⎢1⎥ 1⎥⎢ ⎥ ⎢ ⎥ t 4 ⎥ ⎢ 2⎥ ⎢2⎥ +⎢ ⎥ ⎥ ⎢ ⎥ 1⎥⎢ ⎥ ⎢0⎥ t 4 ⎦ ⎣ 3⎦ ⎣ ⎦

⎡1⎤

0

1 2 ⎡3⎤ 16

t4

0

⎡1⎤

4

⎢3⎥ ⎢ ⎥ ⎢4⎥ (b) t = ⎢ ⎥ ⎢1⎥ ⎣4⎦

8

3 4

⎡7⎤

⎢1⎥ ⎢5⎥ ⎢ 11 ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ 2 ⎥ (2) ⎢ 8 ⎥ (3) ⎢ 16 ⎥ = ⎢ ⎥, t = ⎢ ⎥, t = ⎢ ⎥, ⎢0⎥ ⎢1⎥ ⎢3⎥ ⎣ ⎦ ⎣8⎦ ⎣ 16 ⎦ 1 2

5 8

11 16

23 32

(d) for t1 and t3 , −12.9%; for t2 and t4 , 5.2% 2.

1 2

3. t(1) = t(2) =

3 4

 13 16

5 4 9 8

1 2 9 16

t( 4 )

⎡ 15 ⎤ ⎡ 1⎤ − 64 32 64 ⎢ 23 ⎥ ⎢ 47 ⎥ ⎢ 1⎥ ⎢ ⎥ ⎢ ⎥ ⎢− ⎥ ⎢ 32 ⎥ ⎢ 64 ⎥ ⎢ 64 ⎥ = ⎢ ⎥, t(5) = ⎢ ⎥, t(5) − t = ⎢ ⎥ ⎢7⎥ ⎢ 15 ⎥ ⎢− 1 ⎥ ⎣ 32 ⎦ ⎣ 64 ⎦ ⎣ 64 ⎦

5 4 11 8

5 4

1 2

1 13 16

7 16

1 21 16

 3 T 4 1

47 64

− 641

 5 T 8

Exercise Set 10.11 (page 622) 1. (c) x₃* = (31/22, 27/22)

2. (a) x₃⁽¹⁾ = (1.40000, 1.20000)
x₃⁽²⁾ = (1.41000, 1.23000)
x₃⁽³⁾ = (1.40900, 1.22700)
x₃⁽⁴⁾ = (1.40910, 1.22730)
x₃⁽⁵⁾ = (1.40909, 1.22727)
x₃⁽⁶⁾ = (1.40909, 1.22727)

(b) Same as part (a)

(c) x₃⁽¹⁾ = (9.55000, 25.65000)
x₃⁽²⁾ = (.59500, −1.21500)
x₃⁽³⁾ = (1.49050, 1.47150)
x₃⁽⁴⁾ = (1.40095, 1.20285)
x₃⁽⁵⁾ = (1.40991, 1.22972)
x₃⁽⁶⁾ = (1.40901, 1.22703)

4. x∗1 = (1, 1), x∗2 = (2, 0), x∗3 = (1, 1)

7.
x7 + x8 + x9 = 13.00
x4 + x5 + x6 = 15.00
x1 + x2 + x3 = 8.00
.82843(x6 + x8) + .58579x9 = 14.79
1.41421(x3 + x5 + x7) = 14.31
.82843(x2 + x4) + .58579x1 = 3.81
x3 + x6 + x9 = 18.00
x2 + x5 + x8 = 12.00
x1 + x4 + x7 = 6.00
.82843(x2 + x6) + .58579x3 = 10.51
1.41421(x1 + x5 + x9) = 16.13
.82843(x4 + x8) + .58579x7 = 7.04

8.
x7 + x8 + x9 = 13.00
x4 + x5 + x6 = 15.00
x1 + x2 + x3 = 8.00
.04289(x3 + x5 + x7) + .75000(x6 + x8) + .61396x9 = 14.79
.91421(x3 + x5 + x7) + .25000(x2 + x4 + x6 + x8) = 14.31
.04289(x3 + x5 + x7) + .75000(x2 + x4) + .61396x1 = 3.81
x3 + x6 + x9 = 18.00
x2 + x5 + x8 = 12.00
x1 + x4 + x7 = 6.00
.04289(x1 + x5 + x9) + .75000(x2 + x6) + .61396x3 = 10.51
.91421(x1 + x5 + x9) + .25000(x2 + x4 + x6 + x8) = 16.13
.04289(x1 + x5 + x9) + .75000(x4 + x8) + .61396x7 = 7.04

Exercise Set 10.12 (page 637)

1. Ti([x y]T) = (12/25)[1 0; 0 1][x y]T + [ei fi]T, i = 1, 2, 3, 4, where the four values of [ei fi]T are [0 0]T, [13/25 0]T, [0 13/25]T, and [13/25 13/25]T; dH(S) = ln(4)/ln(25/12) = 1.888 . . .


2. s ≈ .47; dH(S) ≈ ln(4)/ln(1/.47) = 1.8 . . . Rotation angles: 0° (upper left); −90° (upper right); 180° (lower left); 180° (lower right)

3. (0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3)

4. (a) (i) s = 1/3; (ii) all rotation angles are 0°; (iii) dH(S) = ln(7)/ln(3) = 1.771 . . . This set is a fractal.
(b) (i) s = 1/2; (ii) all rotation angles are 180°; (iii) dH(S) = ln(3)/ln(2) = 1.584 . . . This set is a fractal.
(c) (i) s = 1/2; (ii) rotation angles: −90° (top); 180° (lower left); 180° (lower right); (iii) dH(S) = ln(3)/ln(2) = 1.584 . . . This set is a fractal.
(d) (i) s = 1/2; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) dH(S) = ln(3)/ln(2) = 1.584 . . . This set is a fractal.

5. s = .8509 . . . , θ = −2.69° . . .

6. (0.766, 0.996) rounded to three decimal places

7. dH(S) = ln(16)/ln(4) = 2

8. ln(4)/ln(4/3) = 4.818 . . .

9. dH(S) = ln(8)/ln(2) = 3; the cube is not a fractal.

10. k = 20; s = 1/3; dH(S) = ln(20)/ln(3) = 2.726 . . . ; the set is a fractal.

11. [Figure: the initial set and its first, second, third, and fourth iterates] dH(S) = ln(2)/ln(3) = 0.6309 . . .

12. Area of S0 = 1; area of S1 = 8/9 = 0.888 . . . ; area of S2 = (8/9)² = 0.790 . . . ; area of S3 = (8/9)³ = 0.702 . . . ; area of S4 = (8/9)⁴ = 0.624 . . .
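The dimension values quoted throughout Exercise Set 10.12 all have the form ln(k)/ln(1/s) for a self-similar set made of k copies scaled by the factor s. This minimal sketch recomputes several of them; the function name `similarity_dimension` is hypothetical, and the (k, s) pairs are read off the answers in this set.

```python
import math


def similarity_dimension(k, s):
    """Return ln(k) / ln(1/s) for k congruent copies with scale factor s."""
    if not (0.0 < s < 1.0):
        raise ValueError("the scale factor must satisfy 0 < s < 1")
    return math.log(k) / math.log(1.0 / s)


# (k, s) pairs taken from the answers in this exercise set.
for k, s in [(7, 1 / 3), (3, 1 / 2), (20, 1 / 3), (4, 3 / 4), (8, 1 / 2)]:
    print(k, round(s, 4), round(similarity_dimension(k, s), 4))
```

A non-integer result such as ln(20)/ln(3) signals a fractal in the sense used in these answers, whereas the cube in Exercise 9 gives the integer 3.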

Exercise Set 10.13 (page 650) 1. (250) = 750, (25) = 50, (125) = 250, (30) = 60, (10) = 30, (50) = 150, (3750) = 7500, (6) = 12, (5) = 10

    * , 0 , 36 , 36 , 0, 36 ; )       * )       * two 4-cycles: 46 , 0 , 46 , 46 , 26 , 0 , 26 , 26 and 0, 26 , 26 , 46 , 0, 46 , 46 , 26 ; ) 1   1 2   3 5   2 1   3 4   1 5   5   5 4   3 1  two 12-cycles: 0, 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 0, 6 , 6 , 6 , 6 , 6 ,  4 5   3 2   5 1 * )              , , , , , and 1 , 0 , 1 , 1 , 26 , 36 , 56 , 26 , 16 , 36 , 46 , 16 , 56 , 0 ,  56 56   46 36   16 46   5 3  6 2 5 *6 6 , , 6 , 6 , 6 , 6 , 6 , 6 , 6 , 6 . (6) = 12 6 6

2. One 1-cycle: {(0, 0)}; one 3-cycle:

) 3 6

3. (a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, . . . (c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), . . . 4. (c) The first five iterates of



1 ,0 101



are



1 , 1 101 101

  ,

2 , 3 101 101

  ,

5 , 8 101 101

  13 ,

101

,

21 101



, and

 34 101

,

55 101



.




3

2





about the center point (0, 1)

1 2

1 2

7



of S .

(1, 1)

(0, 1)

IV

   x 1 (1, 1/2) → III y 1

II

9. (0, 1/2)

and

5

. 1 1 2 3 ◦ (c) The geometric effect ofthis  transformation is to rotate each point in the interior of S clockwise by 90

6. (b) The matrices of Anosov automorphisms are

    a x + b y 2

(1/2, 1) I´

III´

1

II´

IV´

I (0, 0)

(1, 1)

(1, 0)

(0, 0)

(1/2, 0)

        a a 0 0 = = In region I: ; in region II: ; b b 0 −1         −1 −1 a a = = ; in region IV: in region III: b b −1 −2         12. 15 , 35 and 45 , 25 form one 2-cycle, and 25 , 15 and 35 , 45 form another 2-cycle.

(1, 0)

14. Begin with a 101 × 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image, which will scatter the black pixels throughout the image. Then superimpose the letter ‘B’ in black pixels onto this image. Apply the mapping again and then superimpose the letter ‘C’ in black pixels onto the resulting image. Repeat this procedure with the letters ‘D’ and ‘E’. The next application of the mapping will return you to the letter ‘A’ with the pixels for the letters ‘B’ through ‘E’ scattered in the background. Four subsequent applications of T to this image will produce the remaining images.

Exercise Set 10.14 (page 662) 1. (a) GIYUOKEVBH



−1

=

2. (a) A

12

7

23

15

15

12

21

5

 (f) A−1 =



(b) SFANEFZWJH (b) Not invertible

 (c) A

=

 

3. WE LOVE MATH

−1

4. Deciphering matrix =

1

19

23

24

 (d) Not invertible



7

15

6

5

 ; enciphering matrix =

(e) Not invertible



7

5

2

15



5. THEY SPLIT THE ATOM

6. I HAVE COME TO BURY CAESAR

8. A is invertible modulo 29 if and only if det(A) ≠ 0 (mod 29).
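The invertibility condition in Exercise 8 can be illustrated for 2 × 2 matrices. This is a minimal sketch, not the text's procedure verbatim: it uses the adjugate formula with the modular reciprocal of the determinant. The function name and the sample matrices are hypothetical, and the three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
def inverse_mod_2x2(A, m):
    """Return the inverse of a 2x2 integer matrix modulo m, or None if none exists."""
    (a, b), (c, d) = A
    det = (a * d - b * c) % m
    try:
        det_inv = pow(det, -1, m)  # reciprocal of det modulo m
    except ValueError:
        return None  # det has no reciprocal modulo m
    return [[(d * det_inv) % m, (-b * det_inv) % m],
            [(-c * det_inv) % m, (a * det_inv) % m]]


def matmul_mod(A, B, m):
    """Multiply two 2x2 matrices modulo m."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]


A = [[1, 2], [3, 4]]  # hypothetical matrix with det = -2, nonzero mod 29
A_inv = inverse_mod_2x2(A, 29)
print(A_inv, matmul_mod(A, A_inv, 29))
print(inverse_mod_2x2([[1, 2], [2, 4]], 29))  # det = 0, so no inverse
```

Because 29 is prime, every nonzero residue has a reciprocal, which is why the condition reduces to the determinant being nonzero modulo 29.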

Exercise Set 10.15 (page 672) 2.

an =

1 4

bn =

1 2

cn =

1 4

 1 n+1

⎫ (a0 − c0 )⎪ ⎪ ⎪ ⎬

⎫ an → 41 ⎪ ⎪ ⎬ as n → ⬁ n = 1, 2, . . . bn = 21 ⎪ ⎪ ⎪ ⎪ ⎪  1 n+1 1⎭ ⎭ cn → 4 − 2 (a0 − c0 ) +

2

7. (a) 010110001

0 (b) ⎣1 1

1 1 0



1 1⎦ 1




2 1 ⎪ + (2a0 − b0 − 4c0 )⎪ ⎪ ⎪ 3 6(4)n ⎪ ⎬ 1 1 n = 0, 1, 2, . . . = − (2a0 − b0 − 4c0 )⎪ ⎪ 3 6(4)n ⎪ ⎪

a2n+1 = 3.

b2n+1

⎪ ⎭

c2n+1 = 0



5 1 ⎪ + (2a0 − b0 − 4c0 )⎪ ⎪ ⎪ 12 6(4)n ⎪ ⎪ ⎪ ⎬ 1 b2 n = n = 1, 2, . . . ⎪ 2 ⎪ ⎪ ⎪ ⎪ 1 1 ⎪ ⎭ c2n = − (2a0 − b0 − 4c0 ) ⎪ n 12 6(4)

a2n =

 

4. Eigenvalues: λ1 = 1, λ2 =

1 ; 2

1

eigenvectors: e1 =

0

 , e2 =

1



−1

5. 12 generations; .006% ⎡ √ √ n+1 √ √ n+1 ⎤ 1 1 1 + · [(− 3 − 5 )( 1 + 5 ) + (− 3 + 5 )( 1 − 5 ) ]⎥ ⎢ 2 3 4n+2 ⎢ ⎥ ⎢1 ⎥ √ n+1 √ n+1 1 ⎢ ⎥ ⎡1⎤ ⎢ · n+1 [(1 + 5 ) + (1 − 5 ) ] ⎥ 2 ⎢3 4 ⎥ ⎢ ⎢ ⎥ ⎥ ⎥ ⎢1 ⎢ ⎥ ⎢ · 1 [(1 + √5 )n + (1 − √5 )n ] ⎢ 0⎥ ⎥ ⎢ ⎢ 0⎥ ⎥ n+ 1 ⎢3 4 ⎢ ⎥ ⎥ (n) 6. x(n) = ⎢ ⎥; x → ⎢ ⎥ as n → ⬁ √ n √ n ⎢1 ⎢ 0⎥ ⎥ 1 ⎢ · ⎢ ⎥ ⎥ ⎢ 3 4n+1 [(1 + 5 ) + (1 − 5 ) ] ⎢ 0⎥ ⎥ ⎢ ⎥ ⎣ ⎦ ⎢ ⎥ √ √ 1 1 ⎢1 ⎥ n+1 n+1 ⎢ · n+1 [(1 + 5 ) + (1 − 5 ) ] ⎥ 2 ⎢3 4 ⎥ ⎢ ⎥ ⎣1 1 √ √ n+1 √ √ n+1 ⎦ 1 + · n+2 [(−3 − 5 )(1 + 5 ) + (−3 + 5 )(1 − 5 ) ] 2 3 4



1

⎢0 ⎢ 8. ⎢ ⎣0 0

0



0

0

0

0

0

0

0⎦

0

0

1

0⎥ ⎥



Exercise Set 10.16 (page 681)    1. (a) λ1 =

3 , 2

1

x1 =

 (c) x(6) ≈ Lx(5) ≈ 7. 2.375

1 3

857 285

(b) x



(1)

=



100



, x

50



, x(6) ≈ λ1 x(5) ≈

(2 )



=

175 50

 , x

 (3)



250 88

 , x

 (4 )

855 287

8. 1.49611

Exercise Set 10.17 (page 690)⎡

1



⎢ ⎥

1 ⎥ 1. (a) Yield = 33 13 % of population; x1 = ⎢ ⎣3⎦ 1

⎡18⎤ 1

⎢1⎥ ⎥ (b) Yield = 45.8% of population; x1 = ⎢ ⎣ 2 ⎦; harvest 57.9% of youngest age class 1 8



382 125

 , x

 (5)



570 191






1.000

⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ 2. x1 = ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎣





⎢ .845⎥ ⎢ ⎥ ⎢ ⎥ .824⎥ ⎢ ⎢ ⎥ ⎢ .795⎥ ⎢ ⎥ ⎢ .755⎥ ⎢ ⎥ ⎢ ⎥ ⎢ .699⎥ ⎥, Lx1 = ⎢ ⎢ ⎥ .626⎥ ⎢ ⎢ ⎥ ⎢ .532⎥ ⎢ ⎥ ⎢ 0⎥ ⎢ ⎥ ⎢ ⎥ 0⎥ ⎢ ⎢ ⎥ ⎣ 0⎦

2.090



⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ 1.090 + .418 ⎥, = .199 ⎥ 7.584 ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦

.845 .824 .795 .755 .699 .626 .532 .418

0

0

4. hI = (R − 1)/(aI b1 b2 · · · bI−1 + · · · + an b1 b2 · · · bn−1)

5. hI = (a1 + a2 b1 + · · · + (aJ−1 b1 b2 · · · bJ−2) − 1)/(aI b1 b2 · · · bI−1 + · · · + aJ−1 b1 b2 · · · bJ−2)
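The formula in Exercise 4 of this set can be tried numerically. This sketch rests on two assumptions that the answer key does not state: that R is the net reproduction rate a1 + a2 b1 + · · · + an b1 b2 · · · bn−1, and that hI is a common fraction harvested from age classes I through n. The Leslie parameters below are hypothetical sample values, as are the function names.

```python
def survival_products(b):
    """Return [1, b1, b1*b2, ...], the probabilities of surviving to each class."""
    products = [1.0]
    for bi in b:
        products.append(products[-1] * bi)
    return products


def harvest_fraction(a, b, I):
    """h_I = (R - 1) / (a_I b1...b_{I-1} + ... + a_n b1...b_{n-1}), with I 1-based."""
    p = survival_products(b)
    terms = [ai * pi for ai, pi in zip(a, p)]
    R = sum(terms)            # net reproduction rate (assumed meaning of R)
    tail = sum(terms[I - 1:])  # contributions of classes I through n
    return (R - 1.0) / tail


# Hypothetical Leslie parameters: birth rates a_i and survival rates b_i.
a = [0.0, 4.0, 3.0]
b = [0.5, 0.25]
h = harvest_fraction(a, b, 2)
print(round(h, 4))
```

Under the stated reading, scaling the contributions of classes I through n by (1 − hI) brings the net reproduction rate down to exactly 1, which is the condition for a sustainable harvest.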

Exercise Set 10.18 (page 696) 1. π²/3 + 4 cos t + cos 2t + (4/9) cos 3t

2. T²/3 + (T²/π²)[cos(2πt/T) + (1/2²) cos(4πt/T) + (1/3²) cos(6πt/T) + (1/4²) cos(8πt/T)] − (T²/π)[sin(2πt/T) + (1/2) sin(4πt/T) + (1/3) sin(6πt/T) + (1/4) sin(8πt/T)]

3. 1/π + (1/2) sin t − (2/(3π)) cos 2t − (2/(15π)) cos 4t

4. (4/π)[1/2 − (1/(1·3)) cos t − (1/(3·5)) cos 2t − (1/(5·7)) cos 3t − · · · − (1/((2n − 1)(2n + 1))) cos nt]

5. T/4 − (8T/π²)[(1/2²) cos(2πt/T) + (1/6²) cos(6πt/T) + (1/10²) cos(10πt/T) + · · · + (1/(2n)²) cos(2nπt/T)]

Exercise Set 10.19 (page 704) 1. (a) Yes; v = (1/5)v1 + (2/5)v2 + (2/5)v3

(b) No; v = (2/5)v1 + (4/5)v2 − (1/5)v3

(c) Yes; v = (2/5)v1 + (3/5)v2 + 0v3

(d) Yes; v = (4/15)v1 + (6/15)v2 + (5/15)v3

2. m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points = 5; Equation (7) is 7 = 2(7) − 2 − 5.

3. w = Mv + b = M(c1v1 + c2v2 + c3v3) + (c1 + c2 + c3)b = c1(Mv1 + b) + c2(Mv2 + b) + c3(Mv3 + b) = c1w1 + c2w2 + c3w3

4. (a), (b) [Figures: two triangulations of the vertex points v1 through v7]


 5. (a) M =

2

0

1

 (c) M =

1

0

0

1

 



1

(b) M =

2







1

, b= , b=

2

 −1

1

1



 (d) M =

−3

 

3

1

2

0

1





1 2

0

, b=

, b=

1 2



−1

7. (a) Two of the coefficients are zero. (b) At least one of the coefficients is zero. (c) None of the coefficients are zero.



8. (a) (1/3)v1 + (1/3)v2 + (1/3)v3

(b)

8/3 2

Exercise Set 10.20 (page 712) 1. (a) [1/5 2/5 2/5]T

(b) [1/2

3. The matrix M in Equation (8) is



1

⎢ ⎢1 ⎢ ⎢. ⎢. ⎣.

(1 − δ) M = δB + n

1

1/2]T

0



1

···

1

1

···

1⎥

.. .

..

1

···



1

⎥ ⎢ 1 δ⎢ ⎥ ⎢ = ⎢. .. ⎥ ⎥ ⎢ . ⎦ n ⎣ ..

.

1

1



1

···

1

1

···

1⎥

.. .

..

1

···



⎥ ⎥ (1 − δ) + .. ⎥ ⎥ n .⎦

.

1

⎢ ⎢1 ⎢ ⎢. ⎢. ⎣.

1

1



1

···

1

1

···

1⎥

.. .

..

1

···



1

⎥ ⎢ 1 1⎢ ⎥ ⎢ = ⎢. .. ⎥ ⎥ ⎢ . ⎦ n ⎣ ..

.

1

1

which has the normalized eigenvector [1/n 1/n · · · 1/n]T. Thus all pages have page rank 1/n.



1

···

1

1

···

1⎥

.. .

..

1

···

⎥ ⎥ .. ⎥ ⎥ .⎦

.

1

T

5. (k)

x

= Mx

(k−1)

⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨



1

(1 − δ) = δB + ⎪ n ⎪ ⎪ ⎪ ⎪ ⎩

⎢ ⎢1 ⎢ ⎢. ⎢. ⎣. 1

1

···

1

···

.. .

..

1

···

⎤⎫ ⎥⎪ ⎪ 1⎥⎪ ⎬ (1 − δ) ⎥ (k−1) = δB x(k−1) + ⎥ .. ⎥⎪ x n . ⎦⎪ ⎪ ⎪ ⎪ ⎭ 1 ⎪ ⎪

.



1

⎢ ⎢1 ⎢ ⎢. ⎢. ⎣.

1

1



1

···

1

1

···

1⎥

.. .

..

1

···

.

⎥ ⎥ (k−1) x .. ⎥ ⎥ .⎦

1

= δB x(k−1) + ((1 − δ)/n)[1 1 · · · 1]T, where the last equality is true because for each row in the last matrix product we have [1 1 · · · 1]x(k−1) = 1 since the sum of the entries of the state vector x(k−1) is 1.

7. (a) Eigenvalues are 1 and −1. Eigenvector for eigenvalue 1 is [1 1]T. The iterates alternate between [1 0]T and [0 1]T and

           

     k k+1 , , . . . and the , 0 1 1 2 2 3 k k−1 k                   1 k k+1 1 1 1 2 1 2 1 3 1 3 1 1 k 1 , , , , , ,..., , ,... fractional page count is , 2 1 3 1 4 2 5 2 6 3 2k − 1 k − 1 2k k 2k + 1 0 k so do not converge. The total page count is

1

,

1

,

2

,

2

,

3

3

,



,...,

k

which converges to [1/2 1/2]T . (b) Eigenvalues are 1 and −δ . Eigenvector for eigenvalue 1 is [1 1]T for any δ . Thus both pages have the same rank (as is obvious by



1 1−δ symmetry). M = 2 1+δ

1+δ





k 1 1 + (−δ) , M = 2 1 − (−δ)k 1−δ k

1 − (−δ)k 1 + (−δ)k





1 1 , and so M converges to 2 1

Therefore, for any initial vector we have that x(k) = M x(k−1) = M k−1 x(1) →

k



1 1 2 1

1 1



x1(1)

x2(1)





=

(1)

1 1



as k goes to infinity. (1)

1 x1 + x 2 2 x1(1) + x2(1)



 

=

1 1 . 2 1
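The iteration x(k) = δB x(k−1) + ((1 − δ)/n)[1 1 · · · 1]T from Exercise 5, applied to the two-page web of Exercise 7(b), can be sketched as follows. The matrix `B`, the value of δ, the starting vector, and the function name are assumptions made for illustration; the update rule relies on the entries of the state vector summing to 1, as noted in Exercise 5.

```python
def page_rank(B, delta, x0, iterations=200):
    """Iterate x <- delta * B x + (1 - delta)/n * [1, ..., 1]^T.

    B is assumed column-stochastic and x0 is assumed to sum to 1.
    """
    n = len(B)
    x = list(x0)
    for _ in range(iterations):
        x = [delta * sum(B[i][j] * x[j] for j in range(n)) + (1.0 - delta) / n
             for i in range(n)]
    return x


# Assumed two-page web in which each page links only to the other.
B = [[0.0, 1.0],
     [1.0, 0.0]]
print([round(v, 6) for v in page_rank(B, 0.85, [1.0, 0.0])])
```

With δ = 1 this web would oscillate between [1 0]T and [0 1]T, as in part (a); any δ < 1 damps the oscillation by the factor δ each step, so the iterates approach [1/2 1/2]T.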


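9. [Figure: directed graph on pages 1, 2, 3, 4]

$$\text{Transition matrix} = \begin{bmatrix} 0 & 1/2 & 1/3 & 1/2 \\ 1 & 0 & 1/3 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/3 & 0 \end{bmatrix}; \qquad \text{eigenvector} = \begin{bmatrix} 4/13 \\ 5/13 \\ 3/13 \\ 1/13 \end{bmatrix}$$

11. [Figure: directed graph with labeled pages 1, 2, 3, 4, 5]

$$\text{Transition matrix} = \begin{bmatrix} 0 & 1/2 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 1/2 & \cdots & 0 & 0 & 0 \\ 0 & 1/2 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1/2 & 0 \\ 0 & 0 & 0 & \cdots & 1/2 & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 1/2 & 0 \end{bmatrix}; \qquad \text{eigenvector} = \frac{1}{2(n-1)} \begin{bmatrix} 1 \\ 2 \\ 2 \\ \vdots \\ 2 \\ 2 \\ 1 \end{bmatrix}$$

13. [Figure: directed graph on pages 1, 2, 3, 4]

$$\text{Transition matrix} = \begin{bmatrix} 0 & 1/2 & 0 & 1/2 \\ 1 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1/2 & 0 \end{bmatrix}; \qquad \text{normalized eigenvector} = \begin{bmatrix} 2/8 \\ 3/8 \\ 2/8 \\ 1/8 \end{bmatrix}$$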
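The eigenvectors quoted in answers 9, 11, and 13 can be verified with exact rational arithmetic. The sketch below is illustrative only: the 4 × 4 matrices are typed in as read from those answers (the entry placement is an assumption, since the printed layout is hard to recover), the n-page chain for answer 11 is built by a hypothetical helper `path_matrix`, and n = 6 is an arbitrary test size.

```python
from fractions import Fraction as F


def mat_vec(A, v):
    """Exact matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_page_rank_vector(A, v):
    """True if every column of A sums to 1, v sums to 1, and A v = v."""
    n = len(v)
    columns_ok = all(sum(A[i][j] for i in range(n)) == 1 for j in range(n))
    return columns_ok and sum(v) == 1 and mat_vec(A, v) == v


def path_matrix(n):
    """Transition matrix for n pages in a chain 1 <-> 2 <-> ... <-> n (assumed layout)."""
    A = [[F(0)] * n for _ in range(n)]
    for j in range(n):
        neighbors = [i for i in (j - 1, j + 1) if 0 <= i < n]
        for i in neighbors:
            A[i][j] = F(1, len(neighbors))
    return A


h, t = F(1, 2), F(1, 3)

# Answer 9 (entries as assumed above).
B9 = [[0, h, t, h], [1, 0, t, 0], [0, h, 0, h], [0, 0, t, 0]]
v9 = [F(4, 13), F(5, 13), F(3, 13), F(1, 13)]

# Answer 13 (entries as assumed above).
B13 = [[0, h, 0, h], [1, 0, h, 0], [0, h, 0, h], [0, 0, h, 0]]
v13 = [F(2, 8), F(3, 8), F(2, 8), F(1, 8)]

# Answer 11 with n = 6: eigenvector (1/(2(n-1))) [1, 2, ..., 2, 1]^T.
n = 6
B11 = path_matrix(n)
v11 = [F(c, 2 * (n - 1)) for c in [1] + [2] * (n - 2) + [1]]

print(is_page_rank_vector(B9, v9),
      is_page_rank_vector(B11, v11),
      is_page_rank_vector(B13, v13))
```

Using `Fraction` rather than floating point makes the check A v = v an exact equality, so no tolerance has to be chosen.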

INDEX A Absolute value: of complex number, 313, A7 of determinant, 178 Addition: associative law for, 39, 134 by scalars, 184 of vectors in R 2 and R 3 , 132, 134 of vectors in R n , 138 Additivity property, of linear transformation, 448 Adjacency matrix, 705–706 Adjoint, of a matrix, 122–124 Aeronautics, yaw, pitch, and roll, 263 Affine transformations, 633–635 contracting, 633–634 with warps, 696 Age-specific population growth, 671–679 female age distribution of animals, 674 female age distribution of humans, 678–679 Leslie matrix, 673, 675–679 limiting behavior, 674–679 Ahmes Papyrus, 532 Algebraic multiplicity, 309–310 Algebraic operations, using vector components, 138–139 Algebraic properties of matrices, 39–49 Algebraic properties of vectors, dot product, 147–148 Algebraic Reconstruction Techniques (ARTs), 612, 615–618 Alleles, 342 Amps (unit), 86 Angle: in R n , 148–149, 155 between vectors, 146–149, 356–357 Animal population harvesting, 681–687 model for, 682–684 only in youngest age class, 685–687 optimal sustainable yield, 687 uniform, 684–685 Anosov automorphism, 648–649 Anticommutativity, 325 Antihomogeneity property, of complex Euclidean inner product, 316 Antisymmetry property: of complex Euclidean inner product, 316 of dot product, 316 Approximate integration, 93–94 Approximations, best, 379–380 Approximation problems, 394–396 Archimedes, 534–535

Area: of parallelogram, 176 of triangle, 176–177 Argument, of complex number, 314, A8 Arithmetic average, 347 Arithmetic operations: matrices, 27–35, 39–43 vectors in R 2 and R 3 , 132–134 vectors in R n , 137–139 Arnold, Vladimir I., 638 Arnold’s cat map, 638–640, 644–646 Artificial intelligence, 493 ARTs (Algebraic Reconstruction Techniques), 612, 615–618 Associative law for addition, 39, 134 Associative law for matrix multiplication, 39, 40–41 Astronautics, yaw, pitch, and roll, 263 Augmented matrices, 6–7, 11, 12, 18, 25, 34 Autosomal inheritance, 661–665 Autosomal recessive diseases, 665–666 Axes: rotation of, in 2-space, 404–406 rotation of, in 3-space, 406–407 Axis of rotation, 262 B Babylonia, early applications in, 532–533 Back-substitution, 19–20 Backward phase, 15 Bakhshali Manuscript, 536 Balancing (of chemical equation), 89 Barnsley, Michael, 622, 632, 634 Basis, 221–223 change of, 229–234, 482–484 coordinate system for vector space, 214–216 for eigenvectors and eigenspaces, 295–298 finite basis, 214 by inspection, 224–225 linear combinations and, 245 number of vectors in, 222 ordered basis, 217 orthogonal basis, 365 for orthogonal complement, 360 orthonormal basis, 365–367 for row and column spaces, 241 by row reduction, 242–244 for row space of a matrix, 244–245 standard basis, 214–216, 218 transition matrix, 231–234 uniqueness of basis representation, 216

Basis vectors, 214, 450–451 Bateman, Harry, 517 Battery, 86 Beam density, computed tomography, 614 Begin-triangle, warps, 696 Beltrami, Eugenio, 518 Best approximation theorem, 379–380 Block triangular form, 118 Block upper triangular form, 103 Bôcher, Maxime, 7, 196 Books, ISBN number of, 153 Boundary data, temperature distribution, 601–602 Boundary mesh points, 603 Bounded sets, 622–623 Branches (network), 84 Brightness, graphical images, 136 Bunyakovsky, Viktor Yakovlevich, 149 C

C n , 317–320 Calculus of variations, 174 Cancellation law, 42 Cantor set, 637 Carroll, Lewis, 108 Cat map (Arnold’s), 638–640, 644–646 CAT scanner, 612 Cattle Problem, 534–536 Cauchy, Augustin, 122, 149, 184 Cauchy-Schwarz inequality, 148–149, 355–356 Cayley, Arthur, 30, 35, 44 Central conic, 421–422 Central conic in standard position, 421 Central ellipsoid in standard position, 428 Central quadrics in standard position, 422 Change-of-basis problem, 230–231, 482 Change of variable, 419 Chaos, 637–648 Arnold’s cat map, 638–640, 644–646 defined, 646 dynamical systems, 647–648 nonperiodic points, 645–646 periodic points, 640–642 period vs. pixel width, 642–643 repeated mappings, 639–640 tiled planes, 643–644 Characteristic equation, 292, 306 Characteristic polynomial, 293, 306 Chemical equations, balancing with linear systems, 88–91 Chemical formulas, 88 Chessboard moves, 561 China, early applications in, 533–534

Chiu Chang Suan Shu, 533–534 Ciphers, 650–652. See also Cryptography Ciphertext, 650 Ciphertext vector, 651 Circle, through three points, 527 Clamped splines, 548 Cliques, directed graphs, 562–564 Clockwise closed-loop convention, 86 Closed economies, 96 Closed Leontief model, 577–581 Closed sets, 622–623 Closure under addition, 184 Closure under scalar, 184 Coefficients: of linear combination of matrices, 32 of linear combination of vectors, 139, 195 literal, 45 Coefficient matrices, 34, 306, 491 Cofactor, 106–107 Cofactor expansion: of 2 × 2 matrices, 107–108 determinants by, 105–110 elementary row operations and, 116–117 Collinear vectors, 133–134 Columns, cofactor expansion and choice of, 109 Column matrices, 26–27 Column-matrix form of vectors, 237 Column space, 237, 238, 240, 241, 251–252 basis for, 241, 243 equal dimensions of row and column space, 248–249 orthogonal project on a, 383–384 Column vectors, 26, 27, 40 Column-vector form of vectors, 140 Column-wheel, 568 Combustion, linear systems to analyze combustion equation for methane, 88–90 Comma-delimited form of vectors, 139, 217, 237 Common initial point, 134 Commutative law for addition, 39 Commutative law for multiplication, 41, 47 Complete reaction (chemical), 89 Complex conjugates: of complex numbers, 313, A6 of vectors, 315 Complex dot product, 316 Complex eigenvalues, 317–318, 320–322 Complex eigenvectors, 317–318

Complex Euclidean inner product, 316–317 Complex exponential functions, A10–A11 Complex inner products, 354 Complex inner product space, 354 Complex matrices, 315 Complex n-space, 314 Complex n-tuples, 314 Complex numbers, 313–314, A5–A11 division of, A8, A9–A11 multiplication of, A6, A9–A11 polar form of, 314, A9–A11 Complex number system, A5 Complex plane, A6 Complex vector spaces, 184, 313–324 Component form, 156 Components (of a vector): algebraic operations using, 138–139 calculating dot products using, 147–148 complex n-tuples, 314 finding, 135–136 in R 2 and R 3 , 134–135 vector components of u along a, 159–160 Composition: with identity operator, 461 of linear transformations, 460–461, 463–464 matrices of, 477–478 of matrix transformations, 270–273 non-commutative nature of, 271 of one-to-one linear transformations, 463–464 of reflections, 272, 283–284 of rotations, 271–272, 283 of three transformations, 272–273 Compression operator, 265, 283 Computed tomography, 611–620 Algebraic Reconstruction Techniques, 615–620 derivation of equations, 613–615 scanning modes, 612 Computers, LINPACK, 492 Computer graphics, 593–598 morphs, 695, 699–702 rotation, 596–598 scaling, 595 translation, 596 visualization of three-dimensional object, 593–595 warps, 695–699 Computer programs, LU -decomposition and, 492 Conclusion, A1 Condensation, 108 Congruent set, 622

Conic sections (conics), 420–424 classifying, with eigenvalues, 425–426 quadratic forms of, 420–422 through five points, 528–529 Conjugate transpose, 437–438 Consistency, determining by elimination, 65–66 Consistent linear system, 3–4, 238–239 Constrained extremum, 429–432 Constrained extremum theorem, 430 Constraint, 430 Consumption matrix, 97, 582 Consumption vectors, 97, 98 Continuous derivatives, functions with, 194 Contracting affine transformation, 633–634 Contraction, 264, 449 Contraction operators: and fractals, 622, 623, 626–627 for general linear transformations, 449 Contrapositive, A2 Convergence: of power sequences, 501 rate of, 507 Converse, A2 Convex combination, 696 Coordinates, 217 of generalized point, 136 in R 3 , 218–219 relative to standard basis for R n , 218 Coordinate map, 229–230 Coordinate systems, 212–214 “basis vectors” for, 214 units of measurement, 213 Coordinate vectors: computing, 232–233 matrix form of, 217 relative to orthonormal basis, 367 relative to standard bases, 218 Cormack, A. M., 612 Corresponding linear systems, 169 Cramer, Gabriel, 125 Cramer’s rule, 125 Critical points, 432 Cross product, 172–179 calculating, 173–174 determinant form of, 175–176 geometric interpretation of, 176–177 notation, 173 properties of, 174–175 of standard unit vectors, 175–176 Cross product terms, 418, 423–424 Cryptography, 650–659 breaking Hill ciphers, 657–659 ciphers, 650–652

deciphering, 654–656 Hill ciphers, 651–652, 656–659 modular arithmetic, 652–654 CT, See Computed tomography Cubic runout spline, 544–547 Cubic spline, 541–544 Cubic spline interpolation, 538–547 cubic runout spline, 544–547 curve fitting, 538–539 derivation of formula of cubic spline, 541–544 natural spline, 544–545 parabolic runout spline, 544–547 statement of problem, 539–540 Current (electrical), 86 Curve fitting, cubic spline interpolation, 538–539 D Damping factor, 708 Dangling pages, 704 Data compression, singular value decomposition, 521–524 Deciphering matrix, 657 Decomposition: eigenvalue decomposition, 514 Hessenberg decomposition, 514 LDU -decomposition, 498–499 LU -decomposition, 491–498, 513 PLU -decomposition, 499 Schur decomposition, 514 self-similar sets, 623 singular value decomposition, 516–519, 521–524 of square matrices, 514–515 Degenerate conic, 420 Degrees of freedom, 222 Demand vector, 581 DeMoivre’s formula, A10 Dense sets, in chaos theory, 645–646 Dependency equations, 245–246 Determinants, 45, 105–127 by cofactor expansion, 105–110 defined, 105 of elementary matrices, 114–115 equivalence theorem, 126–127 evaluating by row reduction, 113–117 general determinant, 108 geometric interpretation of, 178–179 of linear operator, 485 of lower triangular matrix, 109–110 of matrix product, 120–121 properties of, 116–124 sums of, 120 of 3 × 3 matrices, 110 of 2 × 2 matrices, 110

Devaney, Robert L., 646 Deviation, 395 Diagonal coefficient matrices, 328 Diagonal entries, 516 Diagonalizability: defined, 303 nondiagonalizability of n × n matrix, 414–415 orthogonal diagonalizability, 441 recognizing, 307 of triangular matrices, 307 Diagonalization: matrices, 302–311 orthogonal diagonalization, 409–416 solution of linear system by, 328–330 Diagonal matrices, 67–69, 286 Dickson, Leonard Eugene, 123 Difference: matrices, 28 vectors, 133, 138 Differential equations, 326–330, 454 Differentiation, by matrix multiplication, 468–469 Differentiation transformation, 453 Digital communications, matrix form and, 254 Dilation, 264, 449 Dilation operators, 449, 622 Dimensions: of spans, 222 of vector spaces, 222 Dimension theorem, for linear transformations, 454–455 Dirac matrices, 325 Directed edges, 559 Directed graphs, 559–564 cliques, 562–564 dominance-directed, 564–566 Direct product, 146 Direct sum, 290 Discrete mean-value property, 603 Discrete random walk, 608 Discrete-time chaotic dynamical systems, 647 Discrete-time dynamical systems, 647 Discriminant, 319 Disjoint sets, A4 Displacement, 163 Distance, 346 general inner product spaces, 357 orthogonal projections for, 160–162 between parallel planes, 162 between a point and a plane, 161–162 real inner product spaces, 346 in R n , 144–145 triangle inequality for, 149–150

Distinct eigenvalues, 501 Distributive property: of complex Euclidean inner product, 316 of dot product, 147–148 Dodgson, Charles Lutwidge, 108 Dominance-directed graphs, 564–566 Dominant eigenvalue, 501–503 Dominant eigenvalue, of Leslie matrix, 675 Dominant genes, 661 Dot product, 145–148 algebraic properties of, 147–148 antisymmetry property of, 316 application of, 153 calculating with, 148 complex dot product, 316 cross product and, 173–174 dot product form of linear systems, 168–169 as matrix multiplication, 150–152 relationships involving, 173–174 symmetry property of, 147–148, 316 of vectors, 150–152 Drafting spline, 539 Dynamical system, 332–334, 647–648 E Ear: anatomy of, 689–690 least squares hearing model, 689–694 Echelon forms, 11–12, 21–22 Economics, n-tuples and, 136 Economic modeling, Leontief economic analysis with, 96–100, 577–584 Economic sectors, 96 Egypt, early applications in, 532 Eigenspaces, 295–296, 306, 317 bases for, 295–298 of real symmetric matrix, 439–440 Eigenvalues, 291–298, 306, 317–318 complex eigenvalues, 317–318 conic sections classified by using, 425–426 dominant eigenvalues, 501–503 of general linear transformations, 299 of Hermitian, 439–440 of Hermitian matrices, 442 invertibility and, 298 of Leslie matrix, 675–679 of linear operators, 485 of square matrix, 307 of symmetric matrices, 411 of 3 × 3 matrix, 293–294 of triangular matrices, 294–295 of 2 × 2 matrix, 319–320

Eigenvalue decomposition (EVD), 514 Eigenvectors, 291–298 bases for eigenspaces and, 295–298 complex eigenvectors, 317–318 left/right eigenvectors, 301 of real symmetric matrix, 439–440 of square matrix, 307 of symmetric matrices, 411 of 2 × 2 vector, 292 Einstein, Albert, 135, 136 Eisenstein, Gotthold, 30 Electrical circuits: network analysis with linear systems, 86–88 n-tuples and, 136 Electrical current, 86 Electrical potential, 86 Electrical resistance, 86 Elements (of a set), A3 Elementary matrices, 52 determinants, 114–115 and homogeneous linear systems, 58 invertibility, 54 matrix operators corresponding to, 284 Elementary row operations, 7–8, 53–54, 240 cofactor expansion and, 116–117 determinants and, 113–117 and inverse operations, 54–57 and inverse row operations, 54–57 for inverting matrices, 56–57 matrix multiplication, 53–54 row reduction and determinants, 113–117 Elimination methods, 14–16, 65–66 Ellipse, principal axes of, 423 Elliptic paraboloid, 437 Empty set, A4 Enciphering, 650 End-triangle, warps, 696 Entries, 26, 27 Equality, of complex numbers, A5 Equal matrices, 27–28, 40 Equal sets, A4 Equal vectors, 132, 137–138 Equilibrium temperature distribution, 601–609 boundary data, 601–602 discrete formulation of problem, 603–607 mean-value property, 602–603 Monte Carlo technique for, 608–609 numerical technique for, 607–608 Equivalence theorem, 384 determinants, 126–127

invertibility, 54–56, 298–299 n × n matrix, 253–254, 277 Equivalent statements, A2 Equivalent vectors, 132, 137–138 Errors: approximation problems, 395 least squares error, 379 mean square error, 395 measurements of, 395 percentage error, 507 relative error, 507 roundoff errors, 22 Error vector, 381 Estimated percentage error, 507 Estimated relative error, 507–508 Euclidean inner product, 346–348 complex Euclidean inner product, 316–317 of vectors in R 2 or R 3 , 145 Euclidean norm, 316 Euclidean n-space, 346 Euclidean scaling, power method with, 503–504 Euler phi functions, 661 Euler’s formula, A10 Evaluation inner product, 350–351 Evaluation transformation, 450 EVD (eigenvalue decomposition), 514 Exchange matrix, 579 Expansion operator, 265, 283–284 Expected payoff, matrix games, 570 Exponents, matrix laws, 47 Exponential models, 393 F Factorization, 491, 494 Family influence, 560 Fan-beam mode scanning, computed tomography, 612 Fertile age class, 672 Fibonacci, Leonardo, 52 Fibonacci sequence, 52 Fibonacci shift-register random-number generator, 648 Fingerprint storage, 523 Finite basis, 214 Finite-dimensional inner product space, 360, 373 Finite-dimensional vector space, 214, 224–225, 229–230 First-order linear system, 326–328 Fixed points, 642 Floating-point numbers, 509 Floating-point operation, 509 Flops, 509–512 Flow conservation, in networks, 84

Forest management, 586–592 Forward phase, 15 Forward substitution, 493 4 × 6 matrix, rank and nullity of, 249–250 Fourier, Jean Baptiste, 398 Fourier coefficients, 397 Fourier series, 396–398 Fractals, 622–635 algorithms for generating, 629–632 defined, 626 in Euclidean plane, 622 Hausdorff dimension of self-similar sets, 625–626 Monte Carlo approach for, 632–633 self-similar sets, 622–624 similitudes, 626–629 topological dimension of sets, 624–625 Free variables, 13, 250 Free variable theorem for homogeneous systems, 18–19 Full column rank, 375 Functions: with continuous derivatives, 194 linear dependence of, 207–209 Function spaces, 194–195 Fundamental spaces, 251–253 Fundamental Theorem of Two-Person Zero-Sum Games, 571–572 G Games of strategy: game theory, 568–569 2 × 2 matrix games, 573–576 two-person zero-sum games, 569–573 Game theory, 568–569 Gauss, Carl Friedrich, 15, 29, 106, 533 Gaussian elimination, 11–16, 512, 513 defined, 16 roundoff errors, 22 Gauss-Jordan elimination: of augmented matrix, 318, 513 described, 15 for homogeneous system, 18 polynomial interpolation by, 92–93 roundoff errors, 22 using, 45, 512–513 General determinant, 108 General Electric CT system, 612 Generalized Theorem of Pythagoras, 358–359 General solution, 13, 239, 326 Genes, dominant and recessive, 661 Genetics, 661–670 autosomal inheritance, 662–665 autosomal recessive diseases, 665–666

inheritance traits, 661–662 X-linked inheritance, 666–670 Genetic diseases, 665–666 Genotypes, 342, 661–662 defined, 661 distribution in population, 662–665 Geometric multiplicity, 309–310 Geometric vectors, 131 Geometry: of linear systems, 164–170 quadratic forms in, 420–422 in R n , 149–150 Gibbs, Josiah Willard, 146, 173 Golub, Gene H., 518 Gram, Jorgen Pederson, 371 Gram-Schmidt process, 370–373, 375, 397 Graphic images: images of lines under matrix operators, 280–281 n-tuples and, 136 RGB color model, 140 Graph theory, 559–566 cliques, 562–564 directed graphs, 559–564 dominance-directed graphs, 564–566 relations among members of sets, 559 Grassmann, H.G., 184 Greece, early applications in, 534–536 Growth matrix, forest management model, 588 H Hadamard’s inequality, 129 Harvesting: animal populations, 681–687 forests, 586–592 Harvesting matrix (animals), 682–684 Harvest vector (forests), 588 Hausdorff, Felix, 625 Hausdorff dimension, 625–626 Hearing, least squares model for, 689–694 Hermite, Charles, 438 Hermite polynomials, 220 Hermitian matrices, 437–440 Hesse, Ludwig Otto, 433 Hessenberg decomposition, 514 Hessenberg’s theorem, 415 Hessian matrices, 433–434 Hilbert, David, 371 Hilbert space, 371 Hill, George William, 196 Hill, Lester S., 651 Hill 2-cipher, 652, 656 Hill 3-cipher, 652 Hill ciphers, 651–652, 656–659 Hill n-cipher, 652

Homogeneity property: of complex Euclidean inner product, 316 of dot product, 147–148 of linear transformation, 448 Homogeneous equations, 157–158, 168 Homogeneous linear equations, 2 Homogeneous linear systems, 17–19, 239 constant coefficient first-order, 327 dimensions of solution space, 223–224 and elementary matrices, 58 free variable theorem for, 18–19 solutions of, 198–199 Homogeneous systems, solutions spaces of, 199 Hooke’s law, 390 Houndsfield, G. N., 612 Householder matrix, 409 Householder reflection, 409 Hue, graphical images, 136 Human hearing, least squares model for, 689–694 Hyperplane, 618 Hypothesis, A1 I Idempotency, 51 Identity matrices, 42–43 Identity operators: about, 448 composition with, 461 kernel and range of, 452 matrices of, 476–477 Images: of basis vectors, 450–451 of lines under matrix operators, 280–281 n-tuples and, 136 RGB color model, 140 Image processing, data compression and, 523–524 Imaginary axis, A6 Imaginary numbers, See Complex numbers Imaginary part: of complex numbers, 313, A5 of vectors and matrices, 314–315 Inconsistent linear system, 3 Indefinite quadratic forms, 424 India, early applications in, 536 Infinite-dimensional vector space, 214, 216 Inheritance, 661–665 autosomal, 661–665 X-linked, 661–662, 666–670 Initial age distribution vector, 672 Initial condition, 326

Initial point, 131 Initial-value problem, 326 Inner product: algebraic properties of, 352 calculating, 352 complex inner products, 354 Euclidean inner product, 145, 316–317, 346–348 evaluation inner product, 350–351 examples of, 346–351 linear transformation using, 449 matrix inner products, 348 on Mnn , 349–350 on real vector space, 345 on R n , 346–348 standard inner products, 346, 349–350 Inner product space, 449 complex inner product space, 354 isomorphisms in, 469–470 unit circle, 348 unit sphere, 348 Inputs, in economics, 96 Input-output analysis, 96 Input-output matrix, 579 Instability, 22 Integer coefficients, 294 Integral transformation, 452 Integration, approximate, 93–94 Interior mesh points, 603 Intermediate demand vector, 98 Internet search engines, 704–710 Interpolating curves, 539 Interpolating polynomial, 91 Interpolation, 539 Intersection, A4 Invariant under similarity, 303, 484–485 Inverse: of 2 × 2 matrices, 45–46 of diagonal matrices, 68 of matrix using its adjoint, 124 of a product, 46–47 Inverse linear transformations, 462–463 Inverse matrices, 43–46 Inverse operations, 54–57 Inverse row operations, 54–57 Inverse transformations, 477–478 Inversion, solving linear systems by, 45–46, 61–62 Inversion algorithm, 55 Invertibility: determinant test for, 121–122 eigenvalues and, 298 of elementary matrices, 54 equivalence theorem, 54–56 matrix transformation and, 273–274

test for determinant, 121–122 of transition matrices, 232–233 of triangular matrices, 69 Invertible matrices: algebraic properties of, 43–46 defined, 43 and linear systems, 61–66 modulo m, 654–656 ISBN (books), 153 Isomorphism, 466–470 Isotherms, 602 Iterates (Jacobi iteration), 607–608 Iterations: of Arnold’s cat map, 639 Jacobi, 607–608 J Jacobi iteration, 607–608

Jordan, Camille, 515, 518 Jordan, Wilhelm, 15 Jordan canonical form, 515 Junctions (network), 84, 86 K Kaczmarz, S., 615

Kalman, Dan, 413 Kernel, 200, 452–454, 458 Kirchhoff, Gustav, 88 Kirchhoff’s current law, 87 Kirchhoff’s voltage law, 87

k th principal submatrix, 426 L Lagrange, Joseph Louis, 174

Laguerre polynomials, 220 LDU -decomposition, 498–499 LDU -factorization, 499 Leading 1’s, 11 Leading variables, 13, 250 Least squares: curve fitting, 387–388 mathematical modeling using, 387–392 Least squares approximation, 395–398 defined, 396 in human hearing model, 689–694 Least squares error, 379 Least squares error vector, 379 Least squares fit: of polynomial, 390–391

of quadratic curve to data, 391–392 straight line fit, 388–390 Least squares polynomial fit, 390–391 Least squares solutions, 389–390 infinitely many, 392 of linear systems, 378–379, 385 QR-decomposition and, 385 straight line fit, 388–390 unique, 391 Least squares straight line fit, 388–389 Left distributive law, 39 Left eigenvectors, 301 Legendre polynomials, 372–373 Length, 142, 346, 357 Leontief, Wassily, 96, 577 Leontief economic models, 577–584 closed model, 577–581 economic systems, 577 input-output models, 96–100 open model, 96–100, 581–584 Leontief equation, 98 Leontief matrices, 98 Leslie matrix age-specific population growth, 673, 675–679 animal population harvesting, 682–684 eigenvalues, 675–679 Leslie model, of population growth, 671–679 Level curves, 432 Limit cycle, 616 Lines: image of, 281 line segment from one point to another in R 2 , 168 orthogonal projection on, 159 orthogonal projection on lines through the origin, 266–267 point-normal equations, 156–157 through origin as subspaces, 192–193 through two points, 526–527 through two points in R 2 , 167–168 vector and parametric equations in R 2 and R 3 , 164–166 vector and parametric equations of in R 4 , 166–167 vector form of, 158, 165 vectors orthogonal to, 157–158 Linear algebra, 1. See also Linear equations; Linear systems coordinate systems, 212–214 earliest applications of, 531–536 Linear beam theory, 539–540 Linear combinations: basis and, 245 history of term, 196

of matrices, 32–33 of vectors, 140, 144–145, 195, 197–198 Linear dependence, 196 Linear equations, 2–3, 168. See also Linear systems Linear form, 417–418 Linear independence, 196, 202–210, 226–227 of polynomials, 206 of sets, 202–206 of standard unit vectors in R 3 , 204 of standard unit vectors in R 4 , 205 of standard unit vectors in R n , 203–204 of two functions, 206–207 using the Wronskian, 209–210 Linearly dependent set, 203 Linearly independent set, 203, 205 Linear operators: determinants of, 485 matrices of, 476, 481–482 orthogonal matrices as, 403–404 on P2 , 476–477 Linear systems, 2–3. See also Homogeneous linear systems applications, 84–94 augmented matrices, 6–7, 11, 12, 18, 25, 34 for balancing chemical equations, 88–91 coefficient matrix, 34 with a common coefficient matrix, 62–63 comparison of procedures for solving, 509–513 computer solution, 1 corresponding linear systems, 169 cost estimate for solving, 509–512 dot product form of, 168–169 first-order linear system, 326–328 general solution, 13 geometry of, 164–170 with infinitely many solutions, 5–7 least squares solutions of, 378–379, 385 network analysis with, 84–88 nonhomogeneous, 19 with no solutions, 5 number of solutions, 61 overdetermined/underdetermined, 255–256 polynomial interpolation, 91–94 solution methods, 3, 4–7 solutions, 3, 11 solving by elimination row operations, 7–8 solving by Gaussian elimination, 11–16, 21, 22, 512, 513

solving by matrix inversion, 45–46, 61–62 solving with Cramer’s rule, 126 in three unknowns, 12–13 Linear transformations: composition of, 460–461, 463–464 defined, 447 dimension theorem for, 454–455 eigenvalues of, 299 examples of, 449, 451 inverse linear transformations, 462–463 matrices of, 472–475 one-to-one, 458–460 onto, 458–460 from Pn to Pn+1 , 449 rank and nullity in, 454–455 using inner product, 449 Line segment, from one point to another in R 2 , 168 Links, 704 LINPACK, 492 Literal coefficients, 45 Liu Hui, 533 Logarithmic models, 393 Lower triangular matrices, 69, 295 LU -decompositions, 491–498, 513 constructing, 497 examples of, 494–497 finding, 494 method, 492 LU -factorization, 491, 494 M

Mnn , See n × n matrices Magnitude (norm), 142 Main diagonal, 27, 516 Mandelbrot, Benoit B., 622, 626 Mantissa, 509 Markov, Andrei Andreyevich, 336 Markov chain, 334–340, 549–557 limiting behavior of state vectors, 553–557 steady-state vector of, 339 transition matrix for, 339–340, 550–553 Markov matrix, 550 Mathematical models, 387–388 MATLAB, 492 Matrices. See also matrices of specific size, e.g.: 2 × 2 matrices adjoint of, 122–124 algebraic properties of, 39–49 arithmetic operations with, 27–35 coefficient matrices, 34, 306, 491 column matrices, 26–27 complex matrices, 315 compositions of, 477–478

defined, 1, 6, 26 determinants, 105–127 diagonal coefficient matrices, 328 diagonalization, 302–311 diagonal matrices, 67–69, 286 dimension theorem for matrices, 250 elementary matrices, 52, 54, 58, 114–115, 284–285 entries, 26, 27 equality of, 27–28, 40 examples of, 26–27 fundamental spaces, 251–253 Hermitian matrices, 437–440, 442 Hessian matrices, 433–434 identity matrices, 42–43 of identity operators, 476–477 inner products generated by, 348–349 inverse matrices, 43–46 of inverse transformations, 477–478 invertibility, 54–56, 69, 121–122, 232–233 invertible matrices, 43–46, 61–66 inverting, 56–57 Leontief economic analysis with, 96–100 linear combination, 32–33 of linear operators, 476, 481–482 of linear transformations, 472–475 lower triangular matrices, 109–110 normal matrices, 442 notation and terminology, 25–27, 34 orthogonally diagonalizable matrices, 410 orthogonal matrices, 401–407 partitioned, 30–32 permutation matrices, 499 positive definite matrices, 426 powers of, 46–47, 308–309 with proportional rows or columns, 115 rank of, 250 real and imaginary parts of, 314–315 real matrices, 315, 320–321 redundancy in, 254 reflection matrices, 402 rotation matrices, 262, 402 row equivalents, 52 row matrices, 26 scalar multiples, 28–29 similar matrices, 303 singular/nonsingular matrices, 43, 44 size of, 26, 27, 40 skew-Hermitian matrices, 442 skew-symmetric matrices, 442 square matrices, 27, 35, 43, 67, 69, 113–117, 307, 401, 514–515

standard matrices, 276, 286–287, 383–384 stochastic matrices, 338–339 submatrices, 31, 427 symmetric matrices, 70–71, 320, 411, 433 trace, 36 transition matrices, 231–234, 482 transpose, 34–35 triangular matrices, 69–70, 294–295, 307 unitary matrices, 437–438, 440–442 upper triangular matrices, 69, 294 zero matrices, 41 Matrix factorization, 321–322 Matrix form of coordinate vector, 217 Matrix games: defined, 569 two-person zero-sum, 569–573 Matrix inner products, 348 Matrix multiplication, See Multiplication (matrices) Matrix notation, 25–27, 34, 418 Matrix operators: effect of, on unit square, 266 geometry of invertible, 283–285 graphics images of lines under matrix operators, 280–281 on R 2 , 280–287 Matrix polynomials, 48 Matrix spaces, transformations on, 449 Matrix transformations, 75–81, 448 composition of, 270–273 defined, 447 kernel and range of, 452–453 in R 2 and R 3 , 259–267 zero transformations, 448, 452 Maximization problems, for two-person zero-sum games, 573 Maximum entry scaling, power method with, 504–507 Mean square error, 395 Mean-value property, 602–603 Mechanical systems, n-tuples and, 137 Menger sponge, 636 Mesh points, 603–607 Methane, linear systems to analyze combustion equation, 88–90 Minor, 106–107 Mixed strategies, of players in matrix games, 572 m × n matrices (Mmn ): real vector spaces, 186–187 standard basis for, 215–216 Modular arithmetic, 638, 652–654

Modulus: of complex numbers, 313, A7 defined, 653 Monte Carlo technique: fractal generation, 632–633 temperature distribution determination, 608–609 Morphs, 695, 699–702 Multiplication (matrices), 29–30. See also Product (of matrices) associative law for, 39, 40–41 column-row expansion, 33–34 by columns and by rows, 31–32 differentiation by, 468–469 dot products as, 150–152 elementary row operations, 53–54 by invertible matrix, 285 order and, 41 Multiplication (vectors). See also Cross product; Euclidean inner product; Inner product; Product (of vectors) in R 2 and R 3 , 133 by scalars, 184 Multiplicative inverse: of complex number, A7 of modulo m, 654 N Natural isomorphism, 468 Natural spline, 544–545 n-cycle, 642 n-dimensional vector space, 224 Negative, of vector, 133 Negative definite quadratic forms, 424 Negative pole, 86 Negative semidefinite quadratic forms, 424 Net reproduction rate, 679 Networks, defined, 84 Network analysis, with linear systems, 84–88 n × n matrices (Mnn ): equivalent statements, 254, 277 Hessenberg’s theorem, 415 nondiagonalizability of, 414–415 standard inner products on, 349–350 subspaces of, 193 Nodes (network), 84, 86 Nonharvest vector (forests), 587 Nonhomogeneous linear systems, 19 Nonoverlapping sets, 622, 623 Nonperiodic pixel points, 645–646 Nonsingular matrices, 43 Nontrivial solution, 17 Nonzero vectors, 200

Norm (length), 142, 160, 346 calculating, 143 complex Euclidean inner product and, 316–317 Euclidean norm, 316 real inner product spaces, 346 of vector in C[a, b], 351–352 Normal, 156 Normal equations, 380 Normalization, 144 Normal matrices, 442 Normal system, 380 n-space, 135, 136. See also R n Nullity, 454–455 of 4 × 6 matrix, 249–250 sum of, 251 Null space, 237, 240 Numerical analysis, 11 Numerical coefficients, 45 O Ohms (unit), 86 Ohm’s law, 86 1-Step connection, directed graphs, 561, 564–565 One-to-one linear transformations, 458–460, 463–464 Onto linear transformations, 458–460 Open economies, Leontief analysis of, 96–100 Open Leontief model, 581–584 Open sectors, 96 Operators, 449, 460. See also Linear operators Optimal strategies: 2 × 2 matrix games, 575–576 two-person zero-sum games, 571–573 Optimal sustainable harvesting policy, 687 Optimal sustainable yield: animal harvesting, 687 forest harvesting, 586, 589–592 Optimization, using quadratic forms, 429–435 Orbits, 528–529 Order: of differential equation, 326 matrix multiplication and, 41 of trigonometric polynomial, 396 Ordered basis, 217 Ordered n-tuple, 3, 136 Ordered pair, 3 Ordered sets, A4 Ordered triple, 3 Order n, 396 Orthogonal basis, 365, 367–368, 373 Orthogonal change of variable, 420

Orthogonal complement, 252–253, 359–360 Orthogonal diagonalization, 409–416, 441 Orthogonality: defined, 364 inner product and, 358 of row vectors and solution vectors, 169 Orthogonally diagonalizable matrices, 410 Orthogonal matrices, 401–407 Orthogonal operators, 404 Orthogonal projections, 158–160, 368–370 with Algebraic Reconstruction Technique, 615–618 on a column space, 383–384 geometric interpretation of, 369–370 kernel and range of, 452–453 on lines through the origin, 266–267 on a subspace, 381–382 Orthogonal projection operators, 260 Orthogonal sets, 155, 364 Orthogonal vectors, 155–158, 316 in M22 , 358 in P2 , 358 Orthonormal basis, 365–367, 370, 396–397 change of, 404 coordinate vectors relative to, 367 from orthogonal basis, 367–368 orthonormal sets extended to, 373 Orthonormality, 364 Orthonormal sets, 365 constructing, 364–365 extended to orthonormal bases, 373 Outputs, in economics, 96 Outside demand vector, 97, 98 Overdetermined linear system, 255–256 Overlapping sets, 622, 623 P

Pn , See Polynomials P2 : linear operators on, 476–477 orthogonal vectors in, 358 Theorem of Pythagoras in, 359 Page ranks, 705 Parabolic runout spline, 544–547 Parallel mode scanning, computed tomography, 612 Parallelogram, area of, 176 Parallelogram equation for vectors, 150 Parallelogram rule for vector addition, 132 Parallel planes, distance between, 162 Parallel vectors, 133–134 Parameters, 5, 13, 164

Parametric equations, 6 of lines and planes in R 4 , 166–167 of lines in R 2 and R 3 , 164–166 of planes in R 3 , 164–166 Particular solution, 239 Partitioned matrices, 30–32 Pauli spin matrices, 325 Payoff, matrix games, 569 Payoff matrix, 569, 572 Percentage error, 507 Period, of a pixel map, 642 Periodic splines, 548 Permutation matrices, 499 Perpendicular vectors, 155 Photographs, data compression and image processing, 523–524 Piazzi, Giuseppe, 15 Picture, 640 Picture-density, of begin-triangle, 696 Pine forest growth, 591–592 Pitch (aircraft), 263 Pivot column, 21–22 Pivot position, 21–22 Pixels: data compression and image processing, 523 defined, 640 Pixel maps, 640–643 Pixel points: defined, 641 nonperiodic, 645–646 Plaintext, 650 Plaintext vector, 651 Planes: distance between a point and a plane, 161–162 distance between parallel planes, 162 point-normal equations, 156–157 through origin as subspaces, 193 through three points, 529 tiled, 643–644 vector and parametric equations in R 3 , 164–166 vector and parametric equations of in R 4 , 166–167 vector form of, 158, 165 vectors orthogonal to, 157–158 PLU -decomposition, 499 PLU -factorization, 499 Plus-minus theorem, 223–224 Points: constructing curves and surfaces through, 526–530 distance between a point and a plane, 161–162 Point-normal equations, 156–157

Polar form, of complex numbers, 314, A8–A9 Poles (battery), 86 Polygraphic system, 651 Polynomials (Pn ), 48 characteristic polynomial, 293, 306 cubic, 539–547 least squares fit of, 390–391 Legendre polynomials, 372–373 linear independence of, 206 linearly independent set in, 205 linear transformation, 449 spanning set for, 197 standard basis for, 214 standard inner product on, 350–351 subspaces of, 194 trigonometric polynomial, 396–397 Polynomial interpolation, 91–94 Population growth, age-specific, 671–679 Population waves, 676 Positive definite matrices, 426 Positive definite quadratic forms, 424–425 Positive pole, 86 Positive semidefinite quadratic forms, 424 Positivity property: of complex Euclidean inner product, 317 of dot product, 147–148 Power, of vertex of dominance-directed graph, 566 Power function models, 393 Power method, 501–508 with Euclidean scaling, 503–504 with maximum entry scaling, 504–507 stopping procedures, 508 Powers of a matrix, 46–47, 68, 308–309 Power sequence generated by A, 501 Price vector, 579 Principal argument, A8 Principal axes, 423 Principal axes theorem, 420, 423 Principal submatrices, 427 Probability, 334 Probability (Markov) matrix, 550 Probability transition matrix, 706 Probability vector, 334, 551 Product (of matrices), 28–30 determinants of, 120–121 inverse of, 46–47 as linear combination, 32–33 of lower triangular matrices, 69 of symmetric matrices, 71 transpose of, 49 Product (of vectors): cross product, 172–179 scalar multiple in R 2 and R 3 , 133

Products (in chemical equation), 89 Production vector, 97, 98, 581 Productive consumption matrix, 583–584 Productive open economies, 98–100 Profitable industries, in Leontief model, 584 Profitable sectors, 99–100 Projection operators, 260–261, 275–276 Projection theorem, 158–159, 368 Proofs, A1–A4 Pure imaginary complex numbers, A5 Pure strategies, of players in matrix games, 572 Q QR-decomposition, 374, 385 Quadratic curve, of least squares fit, 391–392 Quadratic forms, 417–422 applications of, 419–420 change of variable, 419 conic sections, 420–422 expressing in matrix notation, 418 indefinite quadratic forms, 424 negative definite quadratic forms, 424 negative semidefinite quadratic forms, 424 optimization using, 429–435 positive definite quadratic forms, 424–425 positive semidefinite quadratic forms, 424 principal axes theorem, 420 Quadratic form associated with A, 418 Quotient, A7 R

Rn : coordinates relative to standard basis for, 218 distance in, 144–145 Euclidean inner product, 346–348 geometry in, 149–150 linear independence of standard unit vectors in, 203–204 norm of a vector, 142–143 span in standard unit vector, 196 spanning in, 196 standard basis for, 214 standard unit vectors in, 144 Theorem of Pythagoras in, 160 transition matrices for, 233–234 two-point vector equations in, 167–168 vector forms of lines and planes in, 166 vectors in, 135–139 as vector space, 185

R2 : Anosov automorphism, 648–649 dot product of vectors in, 145 line segment from one point to another in, 168 lines through origin are subspaces of, 192–193 lines through two points in, 167–168 matrix operators on, 280–287 matrix transformations in, 259–262, 264–267 norm of a vector, 142–143 parametric equations, of lines in, 164–166 self-similar sets in, 622–623 shears in, 265–266 spanning in, 196–197 unit circles in, 348 vector addition in, 132, 134 vectors in, 131–140 R3 : coordinates in, 218–219 dot product of vectors in, 145 linear independence of standard unit vectors in, 204 lines through origin are subspaces of, 192–193 matrix transformations in, 259–265 norm of a vector, 142–143 orthogonal set in, 364 rotations in, 262–263 spanning in, 196–197 standard basis for, 215 vector addition in, 132, 134 vector and parametric equations of lines in, 164–166 vector and parametric equations of planes in, 164–166 vectors in, 131–140 R4 : cosine of angle between two vectors in, 357 linear independence of standard unit vectors in, 205 Theorem of Pythagoras in, 160 vector and parametric equations of lines and planes in, 166–167 Random iteration algorithm, 632 Range, 452–454 Rank, 454–455 of 4 × 6 matrix, 249–250 of an approximation, 523 dimension theorem for matrices, 250 maximum value for, 250 redundancy in a matrix and, 254 sum of, 251

Rate of convergence, 507 Rayleigh, John William Strutt, 506 Rayleigh quotient, 505 Reactants (in chemical equation), 89 Real axis, A6 Real inner product space, 345, 355–356 Real line, 135 Real matrices, 315, 320–321 Real part: of complex numbers, 313, A5 of vectors and matrices, 314–315 Real-valued functions, vector space of, 187 Real vector space, 183, 184, 345 Recessive genes, 661 Reciprocals: of complex number, A7 of modulo m, 654 Rectangular coordinate systems, 212–213 Reduced row echelon forms, 11–12, 21, 318 Reduced singular value decomposition, 521 Reduced singular value expansion, 522 Redundancy, in matrices, 254 Reflections, composition of, 272, 284–285 Reflection matrices, 402 Reflection operators, 259–260, 267 Regression line, 389 Regular Markov chain, 338, 554 Regular stochastic matrices, 338–339 Regular transition matrix, 554 Relative error, 507 Relative maximum, 433, 434 Relative minimum, 432, 434 Repeated mappings, of Arnold’s cat map, 639–640 Replacement matrix, forest management model, 588 Residuals, 389 Residue, of a modulo m, 653–654 Resistance (electrical), 86 Resistor, 86 Resultant, 154 Revection transformation, computer graphics, 599 RGB color cube, 140 RGB color model, 140 RGB space, 140 Rhind Papyrus, 532 Right circular cylinder, 437 Right distributive law, 39 Right eigenvectors, 301 Right-hand rule, 176, 262 Roll (aircraft), 263

Rotations: composition of, 271–272, 283 kernel and range of, 453 in R 3 , 262–263 Rotation equations, 262, 405 Rotation matrices, 262, 402 Rotation of axes: in 2-space, 404–406 in 3-space, 406–407 Rotation operator, 261–263 properties of, 275 on R 3 , 262–263 Rotation transformation: computer graphics, 596–598 self-similar sets, 626 Roundoff errors, 22 Rows, cofactor expansion and choice of row, 109 Row-column method, 31–32 Row echelon form, 11–12, 14–15, 21–22, 241 Row equivalents, 52 Row matrices, 26 Row-matrix form of vectors, 237 Row operations, See Elementary row operations Row reduction: basis by, 242–244 evaluating determinants by, 113–117 Row space, 237, 240, 241, 251–252 basis by row reduction, 242–243 basis for, 241, 244–245 equal dimensions of row and column space, 248–249 Row vectors, 26, 27, 40, 168–169, 237 Row-vector form of vectors, 139 Row-wheel, 568 Runout splines, 544–547 S Saddle points, 433, 434, 572 Sample points, 350 Saturation, graphical images, 136 Scalars, 26, 131, 133 from vector multiples, 172 vector space scalars, 184 Scalar moment, 180 Scalar multiples, 28–29, 184 Scalar multiplication, 133, 184 Scalar triple product, 177 Scaling: Euclidean scaling, 503–504 maximum entry scaling, 504–507 Scaling transformation: computer graphics, 595 self-similar sets, 622, 626–627

Schmidt, Erhardt, 371, 518 Schur, Issai, 414, 415 Schur decomposition, 415, 514 Schur’s theorem, 415 Schwarz, Hermann Amandus, 149 Search engines, Internet, 704–710 Second derivative test, 433, 434 Sectors (economic), 96 Self-similar sets, 622–626 Sensitivity to initial conditions, dynamical systems, 647 Sets, A3–A4 linear independence of, 202–206 relations among members of, 559 self-similar sets, 622–626 Set-builder notation, A3–A4 Shear operators, 265–266, 284–285 Shear transformation, computer graphics, 599 Sheep harvesting, 684–685 Shifting operators, 460 Sierpinski, Waclaw, 624 Sierpinski carpet, 624, 626, 628–631, 633, 636 Sierpinski triangle, 624, 626, 628–629, 631–632 Similarity invariants, 303, 484–485 Similarity transformations, 302 Similar matrices, 303 Similitudes, 626–629 Singular matrices, 43, 44 Singular values, 515–516 Singular value decomposition (SVD), 516–519, 521–524 Skew-Hermitian matrices, 442 Skew product, 173 Skew-symmetric matrices, 442 Solutions: best approximations, 379–380 comparison of procedures for solving linear systems, 509–513 cost of, 509–512 factoring, 491 flops and, 509–512 Gaussian elimination, 11–16, 22, 512, 513 Gauss-Jordan elimination, 15, 18, 21, 22, 45–46, 92–93, 318, 512–513 general solution, 13, 239, 326 of homogeneous linear systems, 198–199 least squares solutions, 378–379, 385 of linear systems, 3, 11 of linear systems by diagonalization, 328–330 of linear systems by factoring, 491

of linear systems with initial conditions, 327–328 particular solution, 239 power method, 501–508 trivial/nontrivial solutions, 17, 327 Solutions spaces, of homogeneous systems, 199 Solution vectors, 168–169 Sound waves, in human ear, 689–694 Spacecraft, yaw, pitch, and roll, 263 Spanning: in R 2 and R 3 , 196–197 in R n , 196 testing for, 198 Spanning sets, 197, 200, 216 Spans, 196, 222 Spectral decomposition of A, 413–414 Sphere, through four points, 529–530 Spline interpolation, cubic, 538–547 Spring constant, 390 Square matrices, 43, 67, 69, 401 decompositions of, 514–515 determinants of, 113–117 eigenvalues of, 307 of order n, 27 trace, 36 transpose, 35 Standard basis: coordinates relative to standard basis for R n , 218 coordinate vectors relative to, 218 for Mmn , 215–216 for polynomials, 214 for R 3 , 215 for R n , 214 Standard inner product: defined, 346 on polynomials, 350 on vector space, 349–350 Standard matrices: for matrix transformation, 286–287 for T −1 , 276 Standard unit vectors, 144, 175–176 linear independence in R 3 , 204 linear independence in R 4 , 205 linear independence in R n , 203–204 in span R n , 196 State of a particle system, 137 State of the variable, 332 State vector, 334 of Markov chains, 551, 553–557 webgraph, 706 Static equilibrium, 155 Steady-state vector, of Markov chain, 339, 555–556 Stochastic matrices, 338–339, 550

Stochastic processes, 334 Stopping procedures, 508 Strategies, of players in matrix games, 570–573 Strictly determined games, 572 String theory, 135, 136 Subdiagonal, 415 Submatrices, 31, 427 Subsets, A4 Subspaces, 191–200, 453 creating, 195–198 defined, 191 examples of, 192–200 of Mnn , 193 orthogonal projections on, 381–382 of polynomials, 194 of polynomials (Pn ), 194 of R 2 and R 3 , 192–193 zero subspace, 192 Substitution ciphers, 650 Subtraction: of vectors in R 2 and R 3 , 133 of vectors in R n , 138 Sum: direct, 290 matrices, 28, 47 of rank and nullity, 251 of vectors in R 2 and R 3 , 132, 134 of vectors in R n , 138 SVD (singular value decomposition), 516–519, 521–524 Sylvester, James, 35, 107, 518 Sylvester’s inequality, 259 Symmetric matrices, 70–71, 320 eigenvalues of, 411 Hessian matrices, 433–434 Symmetry property, of dot product, 147–148, 316 T

T −1 , standard matrix for, 276 Taussky-Todd, Olga, 319 Technology Matrix, 97 Television, market share as dynamical system, 332–334 Temperature distribution, at equilibrium, See Equilibrium temperature Terminal point, 131 Theorem of Pythagoras: generalized Theorem of Pythagoras, 358–359 in R 4 , 160 in R n , 160 3 × 3 matrices: adjoint, 123 determinants, 110

eigenvalues, 293–294 orthogonal matrix, 401–402 QR-decomposition of, 375 Three-dimensional object visualization, 593–595 3-space, 131 cross product, 172–179 scalar triple product, 177 3-Step connection, directed graphs, 561 Three-Step Procedure, 474–475 3-tuples, 135 Tien-Yien Li, 637 Tiled planes, 643–644 Time, as fourth dimension, 135 Time-varying morphs, 699–702 Time-varying warps, 698–699 Topological dimensions, 624–625 Topology, 624–625 Torque, 180 Tournaments, 564 Trace, square matrices, 36 Traffic flow, network analysis with linear systems, 85–86 Transformations. See also Linear transformations; Matrix transformations differentiation transformation, 453 evaluation transformation, 450 integral transformation, 452 inverse transformations, 477–478 on matrix spaces, 449 one-to-one linear transformation, 459 Transition matrices, 231–234, 482 invertibility of, 232–233 Markov chains, 550–553 for R n , 233–234 Transition probability, Markov chains, 549 Translation, 132, 450 Translation transformation, computer graphics, 596 Transpose, 34–35 determinant of, 113 invertibility, 49 of lower triangular matrix, 69 properties, 48–49 vector spaces, 251–252 Triangle: area of, 176–177 Sierpinski, 624, 626, 628–629, 631–632 Triangle inequalities: for distances, 149–150, 357 for vectors, 149–150, 357

Triangle rule for vector addition, 132 Triangular matrices, 69–70 diagonalizability of, 307 eigenvalues of, 294–295 Triangulation, 697–698 Trigonometric polynomial, 396 Trivial solution, 17, 327 Turing, Alan Mathison, 493 2 × 2 matrices: cofactor expansions of, 107–108 determinants, 110 eigenvalues of, 319–321 games, 573–576 inverse of, 45–46 vector space, 186 2 × 2 vector, eigenvectors, 292 Two-person zero-sum games, 569–573 Two-point vector equations, in R n , 167–168 2-Step connection, directed graphs, 561, 564–565 2-space, 131 2-tuples, 135 U Underdetermined linear system, 255 Unified field theory, 136 Union, A4 Unitary diagonalization, of Hermitian matrices, 441–442 Unitary matrices, 437–438, 440–442 Unit circle, 348 Units of measurement, 213 Unit sphere, 348 Unit vectors, 143–145, 316, 346 Unknowns, 2 Unstable algorithms, 22 Upper Hessenberg decomposition, 415 Upper Hessenberg form, 415 Upper triangular matrices, 69, 110, 294 V Vaccine distribution, 575–576 Vectors, 131 angle between, 146–149, 356–357 arithmetic operations, 132–134, 137–138 “basis vectors,” 214 collinear vectors, 133–134 column-matrix form of, 237 column-vector form of, 140 comma-delimited form of, 139, 217, 237 components of, 134–135

in coordinate systems, 134–135 coordinate vectors, 218–219 dot product, 145–148, 150–152 equality of, 132, 137–138 equivalence of, 132, 137–138 geometric vectors, 131 linear combinations of, 140, 144–145, 195, 197–198 linear independence of, 196, 202–210 nonzero vectors, 200 normalizing, 144 norm of, 160 notation for, 131, 139–140 orthogonal vectors, 155–158, 316 parallelogram equation for, 150 parallel vectors, 133–134 perpendicular vectors, 155 probability vector, 334 in R 2 and R 3 , 131–140 real and imaginary parts of, 314–315 in R n , 135–139 row-matrix form of, 237 row vectors, 26, 27, 40, 168–169, 237 row-vector form of, 139 solution vectors, 168–169 standard unit vectors, 144, 175–176, 196, 203–204 state vector, 334 triangle inequality for, 149–150 unit vectors, 143–145, 316, 346 zero vector, 132, 137 Vector addition: matrix games, 572 parallelogram rule for, 132 in R 2 and R 3 , 132, 134 triangle rule for, 132 Vector equations: of lines and planes in R 4 , 166–167 of lines in R 2 and R 3 , 164–166 of planes in R 3 , 164–166 two-point vector equations in R n , 167–168 Vector forms, 165 Vector space, 183 axioms, 183–184 complex vector spaces, 184, 313–324 dimensions of, 222 examples of, 185–189, 216 finite-dimensional vector spaces, 214, 216–217, 224–225 infinite-dimensional vector spaces, 214, 216 of infinite real number sequences, 185

isomorphic, 466 of m × n matrices, 186–187 n-dimensional, 224 of real-valued functions, 187 real vector space, 183, 184 subspaces, 191–200, 453 for transposes of matrices, 251–252 of 2 × 2 matrices, 186 zero vector space, 185, 222 Vector space scalars, 184 Vector subtraction, in R 2 and R 3 , 133 Venn Diagrams, A4 Vertex matrix, 559–561 Vertex points, 697–698 Vertices, graphs, 559–560 Viewing audience maximization, 573 Visualization, of three-dimensional objects, 593–595

Volts (units), 86 Voltage rises/drops, 86, 87 von Neumann, John, 642 W Warps, 695–699 affine transformations with, 696 defined, 696 time-varying, 698–699 Webgraph, 704 Weight, 346 Weighted Euclidean inner products, 346–349 Weyl, Herman Klaus, 518 Wildlife migration, as Markov chain, 336–337 Wilson, Edwin, 173 Work, 163

Wroński, Józef Hoëné de, 208 Wronskian, 209–210 X X-linked inheritance, 661–662, 666–670 X-ray computed tomography, 611–620 Y Yaw, 263 Yorke, James, 637 Z Zero matrices, 41 Zero population growth, 679 Zero subspace, 192 Zero-sum matrix games, two-person, 569–573 Zero transformations, 448, 452 Zero vectors, 132, 137 Zero vector space, 185, 222

APPLICATIONS AND HISTORICAL TOPICS

Aeronautical Engineering
Lifting force 95
Solar powered aircraft 395
Supersonic aircraft flutter 321
Yaw, pitch, and roll 264

Astrophysics
Kepler’s laws 10.1*
Measurement of temperature on Venus 394

Biology and Ecology
Air quality prediction 343
Forest management 10.8*
Genetics 344, 10.15*
Harvesting of animal populations 10.17*
Population dynamics 343, 10.16*
Wildlife migration 338–339

Business and Economics
Game theory 10.6*
Leontief input-output models 96–100, 10.7*
Market share 334–336, 343
Sales and cost analysis 38, 39
Sales projections using least squares 395

Calculus
Approximate integration 93–94
Derivatives of matrices 102
Integral inner products 353
Partial fractions 25

Chemistry
Balancing chemical equations 88–91

Civil Engineering
Equilibrium of rigid bodies Module 5**
Traffic flow 85–86

Computer Science
Color models for digital displays 67, 136, 140
Computer graphics 10.9*
Facial recognition 297
Fractals 10.12*
Google site ranking 10.20*
Warps and morphs 10.19*

Cryptography
Hill ciphers 10.14*

Differential Equations
First-order linear systems 328–332

Electrical Engineering
Circuit analysis 84–85, 86–88
Digitizing signals 185
LRC circuits 333

Geometry in Euclidean Space
Angle between a diagonal of a cube and an edge 147
Direction angles and cosines 154
Generalized theorem of Pythagoras 160, 360
Parallelogram law 150
Reflection about a line 268
Rotation about a line 411
Rotation of coordinate axes 406–409
Vector methods in plane geometry Module 4**

Library Science
ISBN numbers 153

Linear Algebra Historical Figures
Harry Bateman 519
Eugene Beltrami 520
Maxime Bôcher 7
Viktor Bunyakovsky 149
Lewis Carroll 108
Augustin Cauchy 122
Arthur Cayley 35, 44
Gabriel Cramer 125
Leonard Dickson 123
Albert Einstein 136
Gotthold Eisenstein 30
Leonhard Euler A10
Leonardo Fibonacci 52
Jean Fourier 400
Carl Friedrich Gauss 15, 106
Josiah Gibbs 146, 173
Gene Golub 520
Jorgen Pederson Gram 373
Hermann Grassman 18
Jacques Hadamard 129
Charles Hermite 440
Ludwig Hesse 435
Karl Hessenberg 417
George Hill 196
Alton Householder 411
Camille Jordan 520
Wilhelm Jordan 15
Gustav Kirchhoff 88
Joseph Lagrange 174
Wassily Leontief 96
Andrei Markov 338
Abraham de Moivre A10
John Rayleigh 508
Erhardt Schmidt 373
Issai Schur 417
Hermann Schwarz 149
James Sylvester 35, 107
Olga Todd 321
Alan Turing 495
John Venn A4
Herman Weyl 520
Jósef Wroński 206

Mathematical History
Early history of linear algebra 10.2*

Mathematical Modeling
Chaos 10.13*
Cubic splines 10.3*
Curve fitting 10, 24, 91–93, 10.1*
Exponential models 395
Graph theory 10.5*
Least squares 380–385, 387, 392–394, 399–400
Linear, quadratic, cubic models 389–390
Logarithmic models 395
Markov chains 337, 10.4*
Modeling experimental data 389–380, 393–394
Population growth 10.16*
Power function models 395

Mathematics
Cauchy–Schwarz inequality 148–149
Constrained extrema 431–434
Fibonacci sequences 52
Fourier series 398–400
Hermite polynomials 221
Laguerre polynomials 221
Legendre polynomials 374–375
Quadratic forms 419–429
Sylvester’s inequality 259

Medicine and Health
Computed tomography 10.11*
Modeling human hearing 10.18*
Nutrition 10

Numerical Linear Algebra
Cost in flops of algorithms 511–515
Data compression 523–526
FBI fingerprint storage 925
Fitting curves to data 10, 24, 91–93
Householder reflections 411
LU-decomposition 493–501
Polynomial interpolation 92–94
Power method 503–510
Powers of a matrix 310–311
QR-decomposition 376–377, 387
Roundoff error, instability 22
Schur decomposition 417
Singular value decomposition 516–522, 523–524
Spectral decomposition 415–416
Upper Hessenberg decomposition 417

Operations Research
Assignment of resources Module 6**
Linear programming Modules 1–3**
Storage and warehousing 136

Physics
Displacement and work 163
Experimental data 136
Mass-spring systems 201–202
Mechanical systems 137
Motion of falling body using least squares 393–394
Quantum mechanics 327
Resultant of forces 154
Scalar moment of force 180
Spring constant using least squares 392
Static equilibrium 155
Temperature distribution 502
Torque 180

Probability and Statistics
Arithmetic average 349
Sample mean and variance 430

Psychology
Behavior 343

(*Section in the Applications Version) (**Web Module)