A Brief Course in Linear Algebra


Table of Contents

Chapter I. Linear Algebra, Basic Notions
1.1 Introduction
1.2 Matrix Algebra
1.3 Formal Rules
1.4 Linear Systems of Algebraic Equations
1.5 Singularity, Pivots, and Invertible Matrices
1.6 Gauss-Jordan Reduction in the General Case
1.7 Homogeneous Systems and Vector Subspaces
1.8 Linear Independence, Bases, and Dimension
1.9 Calculations in R^n
1.10 Review Problems
Chapter II. Determinants and Eigenvalues
2.1 Introduction
2.2 Definition of the Determinant
2.3 Some Important Properties of Determinants
2.4 Eigenvalues and Eigenvectors
2.5 Diagonalization
2.6 The Exponential of a Matrix
2.7 Review
Chapter III. Applications
3.1 Real Symmetric Matrices
3.2 Repeated Eigenvalues, The Gram–Schmidt Process
3.3 Change of Coordinates
3.4 Classification of Conics and Quadrics
3.5 Conics and the Method of Lagrange Multipliers
3.6 Normal Modes
3.7 Review
Chapter IV. Index



Leonard Evens
Department of Mathematics
Northwestern University
Evanston, Illinois 1997

© Leonard Evens 1994, 1997


PREFACE

This text has been specially created for the course Mathematics B17 at Northwestern University. The subject matter is introductory linear algebra, which is covered in about six weeks in that course. Since time is limited, emphasis is on important basic concepts. Linear algebra is somewhat more theoretical than some of the subjects you studied previously in your calculus courses. What you have to learn is a collection of basic concepts and algorithms. Some of these concepts are a bit subtle, and instead of memorizing formulas you need to learn moderately complex procedures for solving problems. In developing the subject matter, we have tried to keep things concrete by concentrating on illustrative examples. Such examples exhibit the important features of the theory. To describe a concept in complete generality will often require an extensive discussion and listing of many special cases and caveats. However, if you have a good understanding of the basic examples, you will usually be able to figure out what to do if you encounter something similar but not exactly the same as the example. You won’t find as many exercises as you are used to in a calculus text. Most of the exercises take somewhat more time than is usual, so try to glean as much as you can from each rather than relying on repetition to drive a point home. There are fairly complete answers at the end of the book. However, don’t just try to get the right answers. It is more important to understand the methods and concepts. Also, don’t concentrate so much on how to solve particular problems that you lose sight of the ideas these problems are meant to illustrate. There are also more ‘theoretical’ questions than you may be used to. Such problems are intended to get you to come to grips with important concepts in cases where just doing some more examples might not suffice. You need not write out formal proofs as long as you can give convincing explanations. 
The emphasis should be on understanding rather than on mathematical rigor. Unfortunately, there isn’t time in the syllabus to develop many of the beautiful and important applications of linear algebra. A few such applications are mentioned in the exercises, and two important applications are included at the end. However, linear algebra is one of the most essential mathematical tools in science, engineering, statistics, economics, etc., so we ask you to bear with us if the going gets a bit tough. For completeness, we have included some proofs of crucial theorems, but they are not supposed to be a fundamental part of the course. Given that time is limited, it is not unreasonable to postpone the proofs for a more advanced course in linear algebra.

No one has ever written a perfect book. A publisher once told me that people still find typographical errors in the oft reprinted works of Charles Dickens. If you find something that doesn’t seem to make any sense, in either the text or the problems, please mention it to your instructor. More important, if you find some discussion particularly murky, please let me or your instructor know. The exposition will be revised with such contributions in mind, and you may help generations of calculus students yet to come.

I would like to thank Professor Daniel S. Kahn who has helped enormously with the preparation of this text but who doesn’t want to be held responsible, as an author, for my misdeeds. I would also like to thank my teaching assistants for valuable comments. In particular, I incorporated a suggestion by Jason Douma which I hope clarifies the concept of eigenvector. This text was typeset using AMS-TEX.

Leonard Evens, December, 1994

CHAPTER I

LINEAR ALGEBRA, BASIC NOTIONS

1. Introduction

In your previous calculus courses, you studied differentiation and integration for functions of more than one variable. Usually, you couldn’t do much if the number of variables was greater than two or three. However, in many important applications, the number of relevant variables can be quite large. In such situations, even very basic algebra can get quite complicated. Linear algebra is a tool invented in the nineteenth century and further extended in the twentieth century to enable people to handle such algebra in a systematic and understandable manner.

We start off with a couple of simple examples where it is clear that we may have to deal with a lot of variables.

Example 1. Professor Marie Curie has ten students in a chemistry class and gives five exams which are weighted differently in order to obtain a total score for the course. The data as presented in her grade book is as follows.

student/exam    1    2    3    4    5
      1        78   70   74   82   74
      2        81   75   72   85   80
      3        92   90   94   88   94
      4        53   72   65   72   59
      5        81   79   79   82   78
      6        21   92   90   88   95
      7        83   84   76   79   84
      8        62   65   67   73   65
      9        70   72   76   82   73
     10        69   75   70   78   79

The numbers across the top label the exams and the numbers in the left hand column number the students. There are a variety of statistics the teacher might want to calculate from this data. First, she might want to know the average score for each test. For a given test, label the scores x1, x2, x3, x4, x5, x6, x7, x8, x9, x10 so that xi is the score for the ith student. Then the average score is

$$\frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_{10}}{10} = \frac{1}{10}\sum_{i=1}^{10} x_i.$$

For example, for the second test the average is

$$\frac{1}{10}(70 + 75 + 90 + 72 + 79 + 92 + 84 + 65 + 72 + 75) = 77.4.$$
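The same average can be computed in a few lines of Python. This is a minimal sketch: the list `scores` copies the grade table above (one row per student, one column per exam), and the helper name `exam_average` is illustrative rather than anything from the text.

```python
# Grade table from Example 1: row i holds student i's scores on exams 1-5.
scores = [
    [78, 70, 74, 82, 74],
    [81, 75, 72, 85, 80],
    [92, 90, 94, 88, 94],
    [53, 72, 65, 72, 59],
    [81, 79, 79, 82, 78],
    [21, 92, 90, 88, 95],
    [83, 84, 76, 79, 84],
    [62, 65, 67, 73, 65],
    [70, 72, 76, 82, 73],
    [69, 75, 70, 78, 79],
]


def exam_average(table, exam):
    """Return (x_1 + ... + x_n) / n for the given exam (numbered from 1)."""
    column = [row[exam - 1] for row in table]   # the scores x_1, ..., x_n
    return sum(column) / len(column)


print(exam_average(scores, 2))   # 77.4
```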


Suppose she decides to weight the five scores as follows: the first, third, and fifth scores are weighted equally at 20 percent or 0.2, the second score is weighted 10 percent or 0.1, and the fourth score is weighted 30 percent or 0.3. Then if the scores for a typical student are denoted y1, y2, y3, y4, y5, the total weighted score would be

$$0.2\,y_1 + 0.1\,y_2 + 0.2\,y_3 + 0.3\,y_4 + 0.2\,y_5.$$

If we denote the weightings a1 = a3 = a5 = 0.2, a2 = 0.1, a4 = 0.3, then this could also be written

$$a_1 y_1 + a_2 y_2 + a_3 y_3 + a_4 y_4 + a_5 y_5 = \sum_{i=1}^{5} a_i y_i.$$
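The weighted total is just a sum of products ai·yi. A minimal sketch, assuming the weights above and using Python's `fractions.Fraction` so that the decimal weights are represented exactly (the function name `weighted_score` is illustrative). Applied to the third student's scores it reproduces the total of 91.4 that the next paragraph works out by hand.

```python
from fractions import Fraction

# Weights a_1, ..., a_5 from the text (0.2, 0.1, 0.2, 0.3, 0.2) as exact fractions.
weights = [Fraction(2, 10), Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(2, 10)]


def weighted_score(a, y):
    """Return a_1*y_1 + a_2*y_2 + ... + a_n*y_n."""
    if len(a) != len(y):
        raise ValueError("need one weight per score")
    return sum(ai * yi for ai, yi in zip(a, y))


student3 = [92, 90, 94, 88, 94]
total = weighted_score(weights, student3)
print(total, float(total))   # 457/5 91.4
```

Exact fractions avoid the small rounding errors that floating-point weights such as 0.1 would introduce.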

For example, for the third student, the total score would be

$$0.2 \cdot 92 + 0.1 \cdot 90 + 0.2 \cdot 94 + 0.3 \cdot 88 + 0.2 \cdot 94 = 91.4.$$

As you see, in both cases we have a number of variables and we are forming what is called a linear function of those variables, that is, an expression in which each variable appears simply to the first power (with no complicated functions). When we only have two or three variables, the algebra for dealing with such functions is quite simple, but as the number of variables grows, the algebra becomes much more complex. Such data sets and calculations should be familiar to anyone who has played with a spreadsheet.

Example 2. In studying complicated electrical circuits, one uses a collection of rules called Kirchhoff’s laws. One of these rules says that the currents converging at a node in the circuit add up algebraically to zero. (Currents can be positive or negative.) Other rules put other restrictions on the currents. For example, in the circuit below

[Figure: circuit with a 50 volt source and branch currents x1, x2, x3, x4, x5; numerical resistances in ohms.]

Kirchhoff’s laws yield the following equations for the currents x1, x2, x3, x4, x5 in the different branches of the circuit.

$$\begin{aligned}
10x_1 + 10x_2 &= 50\\
20x_3 + 5x_4 &= 50\\
x_1 - x_2 - x_5 &= 0\\
-x_3 + x_4 - x_5 &= 0\\
10x_1 + 5x_4 + 15x_5 &= 50
\end{aligned}$$
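One way to make this concrete is to substitute candidate currents back into the five equations. In the sketch below, the currents were worked out by elimination (the method developed in Section 4 of this chapter) and do not appear in the text; the code only checks, in exact rational arithmetic, that every equation balances. All names are illustrative.

```python
from fractions import Fraction as F

# Coefficients of x1, ..., x5 and right-hand sides of the five circuit equations.
coefficients = [
    [10, 10, 0, 0, 0],
    [0, 0, 20, 5, 0],
    [1, -1, 0, 0, -1],
    [0, 0, -1, 1, -1],
    [10, 0, 0, 5, 15],
]
rhs = [50, 50, 0, 0, 50]

# Candidate currents x1, ..., x5 (found separately by elimination).
currents = [F(45, 16), F(35, 16), F(15, 8), F(5, 2), F(5, 8)]


def residuals(A, b, x):
    """Return (left side - right side) for each equation of the system."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]


print(all(r == 0 for r in residuals(coefficients, rhs, currents)))   # True
```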


Don’t worry if you don’t know anything about electricity. The point is that the circuit is governed by a system of linear equations. In order to understand the circuit, we must have methods to solve such systems. In your high school algebra course, you learned how to solve two equations in two unknowns and perhaps three equations in three unknowns. In this course we shall study how to solve any number of equations in any number of unknowns. Linear algebra was invented in large part to discuss the solutions of such systems in an organized manner. The above example yielded a fairly small system, but electrical engineers must often deal with very large circuits involving many, many currents. Similarly, many other applications in other fields require the solution of systems of very many equations in very many unknowns.

Nowadays, one uses electronic computers to solve such systems. Consider for example the system of 5 equations in 5 unknowns

$$\begin{aligned}
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5 &= 10\\
3x_1 - 3x_2 + 6x_3 + x_4 - x_5 &= 2\\
x_1 + x_2 - 4x_3 + 2x_4 + x_5 &= 5\\
4x_1 - 3x_2 + x_3 + 6x_4 + x_5 &= 4\\
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5 &= 3
\end{aligned}$$

How might you present the data needed to solve this system to a computer? Clearly, the computer won’t care about the names of the unknowns since it doesn’t need such aids to do what we tell it to do. It would need to be given the table of coefficients

$$\begin{matrix}
2 & 3 & -5 & 6 & -1\\
3 & -3 & 6 & 1 & -1\\
1 & 1 & -4 & 2 & 1\\
4 & -3 & 1 & 6 & 1\\
2 & 3 & -5 & 6 & -1
\end{matrix}$$

and the quantities on the right, also called the ‘givens’,

$$\begin{matrix} 10\\ 2\\ 5\\ 4\\ 3 \end{matrix}$$

Each such table is an example of a matrix, and in the next section, we shall discuss the algebra of such matrices.

Exercises for Section 1.

1. A professor taught a class with three students who took two exams each. The results were

student/test     1     2
      1        100    95
      2         60    75
      3        100    95


(a) What were the average scores on each test?
(b) Are there weightings a1, a2 which result in either of the following weighted scores?

student  score          student  score
   1       98              1       98
   2       66      or      2       66
   3       98              3       97

2. Solve each of the following linear systems by any method you know.

(a)  2x + 3y = 3
     x + 3y = 1

(b)  x + y = 3
     y + z = 4
     x + y + z = 5

(c)  x + y = 3
     y + z = 4
     x + 2y + z = 5

(d)  x + y + z = 1
     z = 1

2. Matrix Algebra

In the previous section, we saw examples of rectangular arrays or matrices such as the table of grades

$$\begin{bmatrix}
78 & 70 & 74 & 82 & 74\\
81 & 75 & 72 & 85 & 80\\
92 & 90 & 94 & 88 & 94\\
53 & 72 & 65 & 72 & 59\\
81 & 79 & 79 & 82 & 78\\
21 & 92 & 90 & 88 & 95\\
83 & 84 & 76 & 79 & 84\\
62 & 65 & 67 & 73 & 65\\
70 & 72 & 76 & 82 & 73\\
69 & 75 & 70 & 78 & 79
\end{bmatrix}$$


This is called a 10 × 5 matrix. It has 10 rows and 5 columns. More generally, an m × n matrix is a table or rectangular array of the form

$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$

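It has m rows and n columns. The quantities aij are called the entries of the matrix. They are numbered with subscripts so that the first index i tells you which row the entry is in, and the second index j tells you which column it is in.

Examples.

$$\begin{bmatrix} -2 & 1\\ 1 & -2 \end{bmatrix} \text{ is a } 2 \times 2 \text{ matrix}, \qquad \begin{bmatrix} 1 & 2 & 3 & 4\\ 4 & 3 & 2 & 1 \end{bmatrix} \text{ is a } 2 \times 4 \text{ matrix},$$

$$\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix} \text{ is a } 1 \times 4 \text{ matrix}, \qquad \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix} \text{ is a } 4 \times 1 \text{ matrix}.$$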

Matrices of various sizes and shapes arise in applications. For example, every financial spreadsheet involves a matrix, at least implicitly. Similarly, every system of linear equations has a coefficient matrix. In computer programming, a matrix is called a 2-dimensional array and the entry in row i and column j is usually denoted a[i, j] instead of aij. As in programming, it is useful to think of the entire array as a single entity, so we use a single letter to denote it

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}.$$
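In Python, for instance, a common way to hold an m × n matrix is a list of m rows, each a list of n entries. The sketch below is purely illustrative (the matrix and the helper name `entry` are made up for this example). Note that Python counts from 0, so the text's aij, written a[i, j] in many programming languages, sits at A[i - 1][j - 1] here.

```python
# An illustrative 2 x 3 matrix stored as a list of rows (a 2-dimensional array).
A = [
    [1, 2, 3],
    [4, 5, 6],
]

m = len(A)      # number of rows
n = len(A[0])   # number of columns (every row has the same length)


def entry(A, i, j):
    """Return a_ij using the 1-based row and column numbers of the text.

    Python lists are indexed from 0, so a_ij is stored at A[i - 1][j - 1].
    """
    return A[i - 1][j - 1]


print(m, n)             # 2 3
print(entry(A, 2, 3))   # 6
```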

There are various special arrangements which play important roles. A matrix with the same number of rows as columns is called a square matrix. Matrices of coefficients for systems of linear equations are often square. A 1 × 1 matrix [a] is not logically distinguishable from a number or scalar, so we make no distinction between the two concepts. A matrix with one row

$$\mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$


is called a row vector and a matrix with one column

$$\mathbf{a} = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix}$$

is called a column vector. This terminology requires a bit of explanation. In three dimensional calculus, a vector is completely determined by its set of components [v1 v2 v3]. Much of the analysis you encountered in that subject was simplified by using vector notation v to stand for the vector rather than emphasizing its components. When we wish to generalize to larger numbers of variables, it is also useful to think of a set of components [v1 v2 ... vn] as constituting a higher dimensional vector v. In this way we can use geometric insights which apply in two or three dimensions to help guide us, by analogy, when discussing these more complicated situations. In so doing, there is a formal difference between specifying the components horizontally, as above, or vertically as in

$$\begin{bmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{bmatrix}$$

but logically speaking the same data is specified. In either case, the entity under consideration should be viewed as a higher dimensional analogue of a vector. For technical reasons which will be clear shortly, we shall usually specify such objects as column vectors.

Matrices are denoted in different ways by different authors. Most people use ordinary (non-boldface) capital letters, e.g., A, B, X, Q. However, one sometimes wants to use boldface for row or column vectors, as above, since boldface is commonly used for vectors in two and three dimensions and we want to emphasize that analogy. Since there are no consistent rules about notation, you should make sure you know when a symbol represents a matrix which is not a scalar.

Matrices may be combined in various useful ways. Two matrices of the same size and shape are added by adding corresponding entries. You are not allowed to add matrices with different shapes.

Examples.

  1 1 −1 2 1 +  0 0 1 −1 



  1 2 3 =  2 −2 −1

     x+y −y x  y  +  −y  =  0  . 0 x x

 0 4 −1


The m × n matrix with zero entries is called a zero matrix and is usually just denoted 0. Since zero matrices with different shapes are not the same, it is sometimes necessary to indicate the shape by using subscripts, as in ‘0mn’, but usually the context makes it clear which zero matrix is needed. The zero matrix of a given shape has the property that if you add it to any matrix A of the same shape, you get the same A again.

Example.

$$\begin{bmatrix} 1 & -1 & 0\\ 2 & 3 & -2 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0\\ 2 & 3 & -2 \end{bmatrix}$$

A matrix may also be multiplied by a scalar by multiplying each entry of the matrix by that scalar. More generally, we may multiply several matrices with the same shape by different scalars and add up the result:

$$c_1 A_1 + c_2 A_2 + \cdots + c_k A_k$$

where c1, c2, ..., ck are scalars and A1, A2, ..., Ak are m × n matrices with the same m and n. This process is called linear combination.

Example.

$$2\begin{bmatrix} 1\\ 0\\ 1\\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 0\\ 1\\ 0\\ 1 \end{bmatrix} + 3\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 0\\ 2\\ 0 \end{bmatrix} + \begin{bmatrix} 0\\ -1\\ 0\\ -1 \end{bmatrix} + \begin{bmatrix} 3\\ 3\\ 3\\ 3 \end{bmatrix} = \begin{bmatrix} 5\\ 2\\ 5\\ 2 \end{bmatrix}.$$

Sometimes it is convenient to put the scalar on the other side of the matrix, but the meaning is the same: each entry of the matrix is multiplied by the scalar.

$$cA = Ac.$$

We shall also have occasion to consider matrix valued functions A(t) of a scalar variable t. That means that each entry aij(t) is a function of t. Such functions are differentiated or integrated entry by entry.

Examples.

$$\frac{d}{dt}\begin{bmatrix} e^{2t} & e^{-t}\\ 2e^{2t} & -e^{-t} \end{bmatrix} = \begin{bmatrix} 2e^{2t} & -e^{-t}\\ 4e^{2t} & e^{-t} \end{bmatrix}$$

$$\int_0^1 \begin{bmatrix} t\\ t^2 \end{bmatrix} dt = \begin{bmatrix} 1/2\\ 1/3 \end{bmatrix}$$
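Entrywise addition, scalar multiplication, and linear combinations translate directly into code. The sketch below stores matrices as plain Python lists of rows (the helper names are illustrative) and reproduces the linear combination example above, treating each column vector as a 4 × 1 matrix. It refuses to add matrices of different shapes, mirroring the rule stated earlier.

```python
def shape(M):
    """Return (number of rows, number of columns) of a list-of-rows matrix."""
    return len(M), len(M[0])


def add(A, B):
    """Entrywise sum; only defined when A and B have the same shape."""
    if shape(A) != shape(B):
        raise ValueError("cannot add matrices of different shapes")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


def linear_combination(coeffs, mats):
    """Return c_1 A_1 + c_2 A_2 + ... + c_k A_k."""
    result = scale(coeffs[0], mats[0])
    for c, M in zip(coeffs[1:], mats[1:]):
        result = add(result, scale(c, M))
    return result


v1 = [[1], [0], [1], [0]]
v2 = [[0], [1], [0], [1]]
v3 = [[1], [1], [1], [1]]
print(linear_combination([2, -1, 3], [v1, v2, v3]))   # [[5], [2], [5], [2]]
```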

There are various ways to multiply matrices. For example, one sometimes multiplies matrices of the same shape by multiplying corresponding entries. This is useful only in very special circumstances. Another kind of multiplication generalizes the dot product of vectors. In three dimensions, if a has components [a1 a2 a3] and b has components [b1 b2 b3], then the dot product a · b = a1b1 + a2b2 + a3b3; that is, corresponding components are multiplied and the results are added. If

$$\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}$$

is a row vector of size n, and

$$\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix}$$

is a column vector of the same size n, the row by column product is defined to be the sum of the products of corresponding entries

$$\begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n = \sum_{i=1}^{n} a_i b_i.$$

This product is of course a scalar, and except for the distinction between row and column vectors, it is an obvious generalization of the notion of dot product in two or three dimensions. You should be familiar with its properties.

More generally, let A be an m × n matrix and B an n × p matrix. Then each row of A has the same size as each column of B. The matrix product AB is defined to be the m × p matrix with i, j entry the row by column product of the ith row of A with the jth column of B. Thus, if C = AB, then C has the same number of rows as A, the same number of columns as B, and

$$c_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj}.$$

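Examples.

$$\underbrace{\begin{bmatrix} 2 & 1\\ 1 & 0 \end{bmatrix}}_{2\times 2}\underbrace{\begin{bmatrix} 1 & 0 & 1\\ -1 & 2 & 1 \end{bmatrix}}_{2\times 3} = \begin{bmatrix} 2-1 & 0+2 & 2+1\\ 1-0 & 0+0 & 1+0 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 2 & 3\\ 1 & 0 & 1 \end{bmatrix}}_{2\times 3}$$

$$\underbrace{\begin{bmatrix} 1 & -1\\ 1 & 0\\ 2 & 1 \end{bmatrix}}_{3\times 2}\underbrace{\begin{bmatrix} x\\ y \end{bmatrix}}_{2\times 1} = \underbrace{\begin{bmatrix} x - y\\ x\\ 2x + y \end{bmatrix}}_{3\times 1}$$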
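The defining formula cij = Σ air brj is a loop over the shared index r. A minimal Python sketch (the name `matmul` is illustrative; in practice one would normally use a numerical library) that checks the shapes and reproduces the first example above. The second example is checked with the sample values x = 3, y = 5, chosen only for illustration.

```python
def matmul(A, B):
    """Return the m x p product of an m x n matrix A and an n x p matrix B.

    Entry (i, j) is the row-by-column product sum over r of A[i][r] * B[r][j].
    """
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("rows of A must have the same length as columns of B")
    p = len(B[0])
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(p)]
            for i in range(m)]


print(matmul([[2, 1], [1, 0]], [[1, 0, 1], [-1, 2, 1]]))   # [[1, 2, 3], [1, 0, 1]]

x, y = 3, 5
print(matmul([[1, -1], [1, 0], [2, 1]], [[x], [y]]))       # [[-2], [3], [11]]
```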

The most immediate use for matrix multiplication is a simplification of the notation used to describe a system of linear equations.


Consider the system in the previous section

$$\begin{aligned}
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5 &= 10\\
3x_1 - 3x_2 + 6x_3 + x_4 - x_5 &= 2\\
x_1 + x_2 - 4x_3 + 2x_4 + x_5 &= 5\\
4x_1 - 3x_2 + x_3 + 6x_4 + x_5 &= 4\\
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5 &= 3
\end{aligned}$$

If you look closely, you will notice that the expressions on the left are the entries of a matrix product:

$$\begin{bmatrix}
2 & 3 & -5 & 6 & -1\\
3 & -3 & 6 & 1 & -1\\
1 & 1 & -4 & 2 & 1\\
4 & -3 & 1 & 6 & 1\\
2 & 3 & -5 & 6 & -1
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix}
=
\begin{bmatrix}
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5\\
3x_1 - 3x_2 + 6x_3 + x_4 - x_5\\
x_1 + x_2 - 4x_3 + 2x_4 + x_5\\
4x_1 - 3x_2 + x_3 + 6x_4 + x_5\\
2x_1 + 3x_2 - 5x_3 + 6x_4 - x_5
\end{bmatrix}$$

Note that what appears on the right, although it looks rather complicated, is just a 5 × 1 column vector. Thus, the system of equations can be written as a single matrix equation

$$\begin{bmatrix}
2 & 3 & -5 & 6 & -1\\
3 & -3 & 6 & 1 & -1\\
1 & 1 & -4 & 2 & 1\\
4 & -3 & 1 & 6 & 1\\
2 & 3 & -5 & 6 & -1
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix}
=
\begin{bmatrix} 10\\ 2\\ 5\\ 4\\ 3 \end{bmatrix}.$$

If we use the notation

$$A = \begin{bmatrix}
2 & 3 & -5 & 6 & -1\\
3 & -3 & 6 & 1 & -1\\
1 & 1 & -4 & 2 & 1\\
4 & -3 & 1 & 6 & 1\\
2 & 3 & -5 & 6 & -1
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5 \end{bmatrix}, \quad
b = \begin{bmatrix} 10\\ 2\\ 5\\ 4\\ 3 \end{bmatrix},$$

then the system can be written even more compactly

$$Ax = b.$$

Of course, this notational simplicity hides a lot of real complexity, but it does help us to think about the essentials of the problem. More generally, an arbitrary system of m equations in n unknowns has the form

$$\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}
=
\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix},$$


where

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}$$

is the coefficient matrix and

$$x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}$$

are column vectors of unknowns and ‘givens’ respectively. Later in this chapter, we shall investigate systematic methods for solving systems of linear equations.

Special Operations for Row or Column Vectors. We have already remarked that a column vector

$$v = \begin{bmatrix} v_1\\ v_2\\ \vdots\\ v_n \end{bmatrix}$$

may be viewed as a generalization of a vector in two or three dimensions. We also used a generalization of the dot product of two such vectors in defining the matrix product. In a similar fashion, we may define the length of a row vector or column vector to be the square root of the sum of the squares of its components. For example, for

$$v = \begin{bmatrix} 1\\ 2\\ -3\\ 4 \end{bmatrix} \quad\text{we have}\quad |v| = \sqrt{1^2 + 2^2 + (-3)^2 + 4^2} = \sqrt{30}.$$

Exercises for Section 2.

1. Let

$$x = \begin{bmatrix} 1\\ 2\\ -3 \end{bmatrix}, \quad y = \begin{bmatrix} -2\\ 1\\ 3 \end{bmatrix}, \quad z = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix}.$$

Calculate x + y and 3x − 5y + z.

2.

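Let

$$A = \begin{bmatrix} 2 & 7 & 4 & -3\\ 1 & -2 & -3 & 0\\ 1 & 3 & -2 & 3\\ 0 & 0 & 5 & -5 \end{bmatrix}, \quad x = \begin{bmatrix} 1\\ -2\\ 3\\ 5 \end{bmatrix}, \quad y = \begin{bmatrix} -2\\ 2\\ 0\\ 4 \end{bmatrix}.$$

Compute Ax, Ay, Ax + Ay, and A(x + y).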


3.

Let · 1 −1 A= 0 −2

 ¸ 1 3 ,B =  1 2 −3

 · 2 −1  0 ,C = 0 2

1 2


 ¸ −1 −2 −3 , D =  1 −2 −2 2 1

 0 1. −4

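Calculate each of the following quantities if it is defined: A + 3B, A + C, C + 2D, AB, BA, CD, DC.

4. Suppose A is a 2 × 2 matrix such that

$$A\begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 3\\ 1 \end{bmatrix}, \qquad A\begin{bmatrix} 2\\ 1 \end{bmatrix} = \begin{bmatrix} 6\\ 4 \end{bmatrix}.$$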

Find A.

5. Let ei denote the n × 1 column vector, with all entries zero except the ith which is 1, e.g., for n = 3,

$$e_1 = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}.$$

Let A be an arbitrary m × n matrix. Show that Aei is the ith column of A. You may verify this just in the case n = 3 and A is 3 × 3. That is sufficiently general to understand the general argument.

6. Write each of the following systems in matrix form.

(a)  2x1 − 3x2 = 2
     −4x1 + 2x2 = 3

(b)  2x1 − 3x2 = 4
     −4x1 + 2x2 = 1

(c)  x1 + x2 = 1
     x2 + x3 = 1
     2x1 + 3x2 − x3 = 0

7. (a) Determine the lengths of the following column vectors

$$u = \begin{bmatrix} 1\\ 2\\ -2\\ 1 \end{bmatrix}, \quad v = \begin{bmatrix} 1\\ 0\\ 0\\ -1 \end{bmatrix}, \quad w = \begin{bmatrix} 0\\ 2\\ 2\\ 0 \end{bmatrix}.$$

(b) Are any of these vectors mutually perpendicular?
(c) Find unit vectors proportional to each of these vectors.


8. One kind of magic square is a square array of numbers such that the sum of every row and the sum of every column is the same number.

(a) Which of the following matrices present magic squares?

$$\begin{bmatrix} 1 & 3\\ 4 & 2 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 1\\ 2 & 1 & 1\\ 1 & 1 & 2 \end{bmatrix}$$

(b) Use matrix multiplication to describe the condition that an n × n matrix A presents a magic square.

9. Population is often described by a first order differential equation of the form dp/dt = rp where p represents the population and r is a parameter called the growth rate. However, real populations are more complicated. For example, human populations come in different ages with different fertility. Matrices are used to create more realistic population models. Here is an example of how that might be done. Assume a human population is divided into 10 age groups between 0 and 99. Let xi, i = 1, 2, ..., 10 be the number of women in the ith age group, and consider the vector x with those components. (For the sake of this exercise, we ignore men.) Suppose the following table gives the birth and death rates for each age group in each ten year period.

 i     Age     BR (birth rate)   DR (death rate)
 1     0-9          0                .01
 2    10-19        .01               .01
 3    20-29        .04               .01
 4    30-39        .03               .01
 5    40-49        .01               .02
 6    50-59        .001              .03
 7    60-69         0                .04
 8    70-79         0                .10
 9    80-89         0                .30
10    90-99         0               1.00

For example, the fourth age group is women age 30 to 39. In a ten year period, we expect this group to give birth to .03x4 girls, all of whom will be in the first age group at the beginning of the next ten year period. We also expect .01x4 of them to die, which tells us something about the value of x5 at the beginning of the next ten year period. Construct a 10 × 10 matrix A which incorporates this information about birth and death rates so that Ax gives the population vector after one ten year period has elapsed. Note that $A^n x$ keeps track of the population structure after n ten year periods have elapsed.


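3. Formal Rules

The usual rules of algebra apply to matrices with a few exceptions. Here are some of these rules and warnings about when they apply.

The associative law

$$A(BC) = (AB)C$$

works as long as the shapes of the matrices match. That means that the length of each row of A must be the same as the length of each column of B and the length of each row of B must be the same as the length of each column of C. Otherwise, none of the products in the formula will be defined.

Example 1. Let

$$A = \begin{bmatrix} 1 & 0\\ -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 & 2\\ 1 & -1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 3\\ 2\\ 1 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 1 & 1 & 2\\ 0 & -2 & -2 \end{bmatrix}, \quad BC = \begin{bmatrix} 7\\ 1 \end{bmatrix},$$

while

$$(AB)C = \begin{bmatrix} 1 & 1 & 2\\ 0 & -2 & -2 \end{bmatrix}\begin{bmatrix} 3\\ 2\\ 1 \end{bmatrix} = \begin{bmatrix} 7\\ -6 \end{bmatrix}, \quad A(BC) = \begin{bmatrix} 1 & 0\\ -1 & 1 \end{bmatrix}\begin{bmatrix} 7\\ 1 \end{bmatrix} = \begin{bmatrix} 7\\ -6 \end{bmatrix}.$$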
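The two routes in Example 1 can be compared mechanically. The self-contained sketch below (with an illustrative `matmul` helper over lists of rows) forms AB, BC, (AB)C and A(BC) for the matrices of Example 1. Agreement on one example is of course not a proof; the general argument is outlined in the exercises for this section.

```python
def matmul(A, B):
    """Row-by-column product of list-of-rows matrices with matching shapes."""
    if len(B) != len(A[0]):
        raise ValueError("shapes do not match")
    # zip(*B) runs through the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 0], [-1, 1]]
B = [[1, 1, 2], [1, -1, 0]]
C = [[3], [2], [1]]

AB = matmul(A, B)                      # [[1, 1, 2], [0, -2, -2]]
BC = matmul(B, C)                      # [[7], [1]]
print(matmul(AB, C), matmul(A, BC))    # [[7], [-6]] [[7], [-6]]
```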

Note that this is a bit more complicated than the associative law for ordinary numbers (scalars). For those who are interested, the proof of the general associative law is outlined in the exercises.

For each positive integer n, the n × n matrix

$$I = \begin{bmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1
\end{bmatrix}$$

is called the identity matrix of degree n. As in the case of the zero matrices, we get a different identity matrix for each n, and if we need to note the dependence on n, we shall use the notation In. The identity matrix of degree n has the property IA = A for any matrix A with n rows and the property BI = B for any matrix B with n columns.

Example 2. Let

$$I = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}$$

be the 3 × 3 identity matrix. Then, for example,

$$\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 3\\ 4 & 2\\ -1 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 3\\ 4 & 2\\ -1 & 6 \end{bmatrix}$$

and

$$\begin{bmatrix} 1 & 2 & 3\\ 3 & 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3\\ 3 & 2 & 1 \end{bmatrix}.$$
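The identity matrix is easy to build from its diagonal pattern (1 on the diagonal, 0 elsewhere). The sketch below uses illustrative helper names and the same list-of-rows convention as before, and checks both products of Example 2.

```python
def identity(n):
    """Return the n x n identity matrix I_n as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Row-by-column product of list-of-rows matrices with matching shapes."""
    if len(B) != len(A[0]):
        raise ValueError("shapes do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


M = [[1, 3], [4, 2], [-1, 6]]    # 3 rows, so I_3 M = M
N = [[1, 2, 3], [3, 2, 1]]       # 3 columns, so N I_3 = N
print(matmul(identity(3), M) == M, matmul(N, identity(3)) == N)   # True True
```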

The entries of the identity matrix are usually denoted δij: δij = 1 if i = j (the diagonal entries) and δij = 0 if i ≠ j. The indexed expression δij is often called the Kronecker δ.

The commutative law AB = BA is not generally true for matrix multiplication. First of all, the products won’t be defined unless the shapes match. Even if the shapes match on both sides, the resulting products may have different sizes. Thus, if A is m × n and B is n × m, then AB is m × m and BA is n × n. Finally, even if the shapes match and the products have the same sizes (if both A and B are n × n), it may still be true that the products are different.

Example 3. Suppose

$$A = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0\\ 1 & 0 \end{bmatrix}.$$

Then

$$AB = \begin{bmatrix} 0 & 0\\ 0 & 0 \end{bmatrix} = 0, \qquad BA = \begin{bmatrix} 0 & 0\\ 1 & 0 \end{bmatrix} \neq 0,$$

so AB ≠ BA. Lest you think that this is a specially concocted example, let me assure you that it is the exception rather than the rule for the commutative law to hold for a randomly chosen pair of square matrices.

Another rule of algebra which holds for scalars but does not generally hold for matrices is the cancellation law.

Example 4. Let

$$A = \begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0\\ 1 & 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 & 0\\ 0 & 1 \end{bmatrix}.$$

Then

$$AB = 0 \quad\text{and}\quad AC = 0$$

so we cannot necessarily conclude from AB = AC that B = C.

The distributive laws

$$A(B + C) = AB + AC, \qquad (A + B)C = AC + BC$$

do hold as long as the operations are defined. Note however that since the commutative law does not hold in general, the distributive law must be stated for both possible orders of multiplication. Another useful rule is

$$c(AB) = (cA)B = A(cB)$$

where c is a scalar and A and B are matrices whose shapes match so the products are defined.

The rules of calculus apply in general to matrix valued functions except that you have to be careful about orders whenever products are involved. For example, we have

$$\frac{d}{dt}\bigl(A(t)B(t)\bigr) = \frac{dA(t)}{dt}\,B(t) + A(t)\,\frac{dB(t)}{dt}$$

for matrix valued functions A(t) and B(t) with matching shapes. We have just listed some of the rules of algebra and calculus, and we haven’t discussed any of the proofs. Generally, you can be confident that matrices can be manipulated like scalars if you are careful about matters like commutativity discussed above. However, in any given case, if things don’t seem to be working properly, you should look carefully to see if some operation you are using is valid for matrices.

Exercises for Section 3.

1. (a) Let I be the 3 × 3 identity matrix. What is $I^2$? How about $I^3$, $I^4$, $I^5$, etc.?
(b) Let

$$J = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}.$$

What is $J^2$?

2. Find two 2 × 2 matrices A and B such that neither has any zero entries but such that AB = 0.

3. Let A be an m × n matrix, let x and y be n × 1 column vectors, and let a and b be scalars. Using the rules of algebra discussed in Section 3, prove A(ax + by) = a(Ax) + b(Ay).

4. (Optional) Prove the associative law (AB)C = A(BC). Hint: If D = AB, then $d_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$, and if E = BC then $e_{jr} = \sum_{k=1}^{p} b_{jk} c_{kr}$, where A is m × n, B is n × p, and C is p × q.

5. Verify the following relation

$$\frac{d}{dt}\begin{bmatrix} \cos t & -\sin t\\ \sin t & \cos t \end{bmatrix} = \begin{bmatrix} 0 & -1\\ 1 & 0 \end{bmatrix}\begin{bmatrix} \cos t & -\sin t\\ \sin t & \cos t \end{bmatrix}.$$

4. Linear Systems of Algebraic Equations

We start with a problem you ought to be able to solve from what you learned in high school.

Example 1. Consider the algebraic system

$$\begin{aligned}
x_1 + 2x_2 - x_3 &= 1\\
x_1 - x_2 + x_3 &= 0\\
x_1 + x_2 + 2x_3 &= 1
\end{aligned} \tag{1}$$


which is a system of 3 equations in 3 unknowns x1, x2, x3. This system may also be written more compactly as a matrix equation

$$\begin{bmatrix} 1 & 2 & -1\\ 1 & -1 & 1\\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.$$

The method we shall use to solve (1) is the method of elimination of unknowns. Subtract the first equation from each of the other equations to eliminate x1 from those equations.

$$\begin{aligned}
x_1 + 2x_2 - x_3 &= 1\\
-3x_2 + 2x_3 &= -1\\
-x_2 + 3x_3 &= 0
\end{aligned}$$

Now subtract 3 times the third equation from the second equation.

$$\begin{aligned}
x_1 + 2x_2 - x_3 &= 1\\
-7x_3 &= -1\\
-x_2 + 3x_3 &= 0
\end{aligned}$$

Interchanging the last two equations and multiplying the new last equation by −1, we obtain

$$\begin{aligned}
x_1 + 2x_2 - x_3 &= 1\\
-x_2 + 3x_3 &= 0\\
7x_3 &= 1.
\end{aligned}$$

We may now solve as follows. According to the last equation x3 = 1/7. Putting this in the second equation yields

$$-x_2 + 3/7 = 0 \quad\text{or}\quad x_2 = 3/7.$$

Putting x3 = 1/7 and x2 = 3/7 in the first equation yields

$$x_1 + 2(3/7) - 1/7 = 1 \quad\text{or}\quad x_1 = 1 - 5/7 = 2/7.$$

Hence, we get

$$x_1 = 2/7, \quad x_2 = 3/7, \quad x_3 = 1/7.$$

To check, we calculate

$$\begin{bmatrix} 1 & 2 & -1\\ 1 & -1 & 1\\ 1 & 1 & 2 \end{bmatrix}\begin{bmatrix} 2/7\\ 3/7\\ 1/7 \end{bmatrix} = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.$$
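The same two phases, elimination followed by back-substitution, can be sketched in Python with exact fractions. This is an illustrative implementation for a square system, not the book's own procedure: it eliminates column by column and interchanges equations only when it meets a zero pivot, so its intermediate equations differ from the hand computation above, but it arrives at the same solution. The function name `solve` is our own choice.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system Ax = b by Gaussian reduction and back-substitution.

    Arithmetic is exact (Fraction). Raises ValueError when no non-zero pivot
    can be found, i.e. when the system has no unique solution.
    """
    n = len(A)
    # Augmented rows [a_i1, ..., a_in | b_i] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]

    # Gaussian reduction: eliminate x_k from every equation below equation k.
    for k in range(n):
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            raise ValueError("zero pivot column: no unique solution")
        M[k], M[pivot] = M[pivot], M[k]              # interchange two equations
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            # Subtract a multiple of equation k from equation r.
            M[r] = [a - factor * p for a, p in zip(M[r], M[k])]

    # Back-substitution: solve the last equation first, then work upward.
    x = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        known = sum(M[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (M[k][n] - known) / M[k][k]
    return x


A = [[1, 2, -1], [1, -1, 1], [1, 1, 2]]
b = [1, 0, 1]
print(solve(A, b))   # [Fraction(2, 7), Fraction(3, 7), Fraction(1, 7)]
```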

The above example illustrates the general procedure which may be applied to any system of m equations in n unknowns

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$

or, using matrix notation, Ax = b with

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}, \quad
x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}, \quad
b = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}.$$

As in Example 1, a sequence of elimination steps yields a set of equations each involving at least one fewer unknown than the one above it. This process is called Gaussian reduction after the famous 19th century German mathematician C. F. Gauss. To complete the solution, we start with the last equation and substitute back recursively in each of the previous equations. This process is appropriately called back-substitution. The combined process will generally lead to a complete solution, but, as we shall see later, there can be some difficulties.

Row Operations and Gauss-Jordan reduction. Generally a system of m equations in n unknowns can be written in matrix form Ax = b where A is an m × n matrix of coefficients, x is an n × 1 column vector of unknowns and b is an m × 1 column vector of givens. It turns out to be just about as easy to study more general systems of the form

$$AX = B$$


I. LINEAR ALGEBRA, BASIC NOTIONS

where $A$ is an $m \times n$ matrix, $X$ is an $n \times p$ matrix of unknowns, and $B$ is a known $m \times p$ matrix. Usually, $p$ will be 1, so $X$ and $B$ will be column vectors, but the procedure is basically the same for any $p$. For the moment we emphasize the case in which the coefficient matrix $A$ is square, i.e., $m = n$, but we shall return later to the general case ($m$ and $n$ possibly different).

If you look carefully at Example 1, you will see that we employed three basic types of operations: (1) adding a multiple of one equation to another, or subtracting it from another, (2) multiplying or dividing an equation by a non-zero scalar, (3) interchanging two equations. Translated into matrix notation, these operations correspond to applying the following operations to the matrices on both sides of the equation $AX = B$: (1) adding a multiple of one row of a matrix to another row, or subtracting it from another row, (2) multiplying or dividing one row of a matrix by a non-zero scalar, (3) interchanging two rows of a matrix. (The rows of the matrices correspond to the equations.) These operations are called elementary row operations.

An important principle about row operations that we shall use over and over again is the following: To apply a row operation to a product $AX$, it suffices to apply the row operation to $A$ and then to multiply the result by $X$. It is easy to convince yourself that this rule is valid by looking at examples.

Example. Suppose
$$A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix}.$$

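Apply the operation of adding $-2$ times the first row of $A$ to the second row of $A$.
$$\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \to \begin{bmatrix} 1 & 3 \\ 0 & -2 \end{bmatrix}$$
and multiply by $X$ to get
$$\begin{bmatrix} 1 & 3 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 8 & 6 \\ -6 & -4 & -2 \end{bmatrix}.$$
On the other hand, first compute
$$AX = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 8 & 6 \\ 14 & 12 & 10 \end{bmatrix}$$
and then add $-2$ times its first row to its second row to obtain
$$\begin{bmatrix} 10 & 8 & 6 \\ 14 & 12 & 10 \end{bmatrix} \to \begin{bmatrix} 10 & 8 & 6 \\ -6 & -4 & -2 \end{bmatrix}.$$
Note that the result is the same by either route.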
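The same comparison can be checked mechanically. This is a small sketch in plain Python. The helpers `matmul` and `add_row_multiple` are hypothetical names introduced only for this illustration, and rows are indexed from 0. It computes both routes for the matrices $A$ and $X$ of the example above and confirms that they agree.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_row_multiple(M, c, src, dst):
    """Copy of M with c times row `src` added to row `dst` (0-based indices)."""
    R = [row[:] for row in M]
    R[dst] = [x + c * y for x, y in zip(R[dst], R[src])]
    return R

A = [[1, 3], [2, 4]]
X = [[1, 2, 3], [3, 2, 1]]

# Route 1: apply the row operation to A, then multiply by X.
route1 = matmul(add_row_multiple(A, -2, 0, 1), X)
# Route 2: form AX first, then apply the same row operation to it.
route2 = add_row_multiple(matmul(A, X), -2, 0, 1)

print(route1)            # [[10, 8, 6], [-6, -4, -2]]
print(route1 == route2)  # True
```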


If you want to see a general explanation of why this works, see the appendix to this section.

This suggests a procedure for solving a system of the form $AX = B$. Apply row operations to both sides until we obtain a system which is easy to solve (or for which it is clear that there is no solution). Because of the principle just enunciated, we may apply the row operations on the left just to the matrix $A$ and omit reference to $X$, since that is not changed. For this reason, it is usual to collect $A$ on the left and $B$ on the right in a so-called augmented matrix $[A \mid B]$, where the '|' (or other appropriate divider) separates the two matrices. We illustrate this by redoing Example 1, but this time using matrix notation.

Example 1, redone. The system was
$$\begin{aligned} x_1 + 2x_2 - x_3 &= 1 \\ x_1 - x_2 + x_3 &= 0 \\ x_1 + x_2 + 2x_3 &= 1 \end{aligned}$$
so the augmented matrix is
$$\left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 1 & -1 & 1 & 0 \\ 1 & 1 & 2 & 1 \end{array}\right].$$
We first do the Gaussian part of the reduction using row operations. The row operation used at each step is indicated to its right.
$$\begin{aligned}
\left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 1 & -1 & 1 & 0 \\ 1 & 1 & 2 & 1 \end{array}\right]
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -3 & 2 & -1 \\ 1 & 1 & 2 & 1 \end{array}\right] && -1[\text{row}1] + \text{row}2 \\
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -3 & 2 & -1 \\ 0 & -1 & 3 & 0 \end{array}\right] && -1[\text{row}1] + \text{row}3 \\
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & 0 & -7 & -1 \\ 0 & -1 & 3 & 0 \end{array}\right] && -3[\text{row}3] + \text{row}2 \\
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & -7 & -1 \end{array}\right] && \text{row}3 \leftrightarrow \text{row}2
\end{aligned}$$
Compare this with the previous reduction using equations. We can reconstruct the corresponding system from the augmented matrix, and, as before, we get
$$\begin{aligned} x_1 + 2x_2 - x_3 &= 1 \\ -x_2 + 3x_3 &= 0 \\ -7x_3 &= -1 \end{aligned}$$


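Earlier, we applied back-substitution to find the solution. However, it is better for matrix computation to use an essentially equivalent process. Starting with the last row, use the leading non-zero entry to eliminate the entries above it. (That corresponds to substituting the value of the corresponding unknown in the previous equations.) This process is called Jordan reduction. The combined process is called Gauss–Jordan reduction, or, sometimes, reduction to row reduced echelon form.
$$\begin{aligned}
\left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & -7 & -1 \end{array}\right]
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 1/7 \end{array}\right] && -\tfrac{1}{7}[\text{row}3] \\
&\to \left[\begin{array}{rrr|r} 1 & 2 & -1 & 1 \\ 0 & -1 & 0 & -3/7 \\ 0 & 0 & 1 & 1/7 \end{array}\right] && -3[\text{row}3] + \text{row}2 \\
&\to \left[\begin{array}{rrr|r} 1 & 2 & 0 & 8/7 \\ 0 & -1 & 0 & -3/7 \\ 0 & 0 & 1 & 1/7 \end{array}\right] && [\text{row}3] + \text{row}1 \\
&\to \left[\begin{array}{rrr|r} 1 & 0 & 0 & 2/7 \\ 0 & -1 & 0 & -3/7 \\ 0 & 0 & 1 & 1/7 \end{array}\right] && 2[\text{row}2] + \text{row}1 \\
&\to \left[\begin{array}{rrr|r} 1 & 0 & 0 & 2/7 \\ 0 & 1 & 0 & 3/7 \\ 0 & 0 & 1 & 1/7 \end{array}\right] && -1[\text{row}2]
\end{aligned}$$
This corresponds to the system
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} X = \begin{bmatrix} 2/7 \\ 3/7 \\ 1/7 \end{bmatrix} \quad\text{or}\quad X = IX = \begin{bmatrix} 2/7 \\ 3/7 \\ 1/7 \end{bmatrix}$$
which is the desired solution: $x_1 = 2/7$, $x_2 = 3/7$, $x_3 = 1/7$.

Here is another similar example.

Example 2.
$$\begin{aligned} x_1 + x_2 - x_3 &= 0 \\ 2x_1 + x_3 &= 2 \\ x_1 - x_2 + 3x_3 &= 1 \end{aligned}$$
or
$$\begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & 1 \\ 1 & -1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}.$$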


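Reduce the augmented matrix as follows.
$$\begin{aligned}
\left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 2 & 0 & 1 & 2 \\ 1 & -1 & 3 & 1 \end{array}\right]
&\to \left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 0 & -2 & 3 & 2 \\ 1 & -1 & 3 & 1 \end{array}\right] && -2[r_1] + [r_2] \\
&\to \left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 0 & -2 & 3 & 2 \\ 0 & -2 & 4 & 1 \end{array}\right] && -[r_1] + [r_3] \\
&\to \left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 0 & -2 & 3 & 2 \\ 0 & 0 & 1 & -1 \end{array}\right] && -[r_2] + [r_3]
\end{aligned}$$
This completes the Gaussian reduction. Now continue with the Jordan reduction.
$$\begin{aligned}
\left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 0 & -2 & 3 & 2 \\ 0 & 0 & 1 & -1 \end{array}\right]
&\to \left[\begin{array}{rrr|r} 1 & 1 & -1 & 0 \\ 0 & -2 & 0 & 5 \\ 0 & 0 & 1 & -1 \end{array}\right] && -3[r_3] + [r_2] \\
&\to \left[\begin{array}{rrr|r} 1 & 1 & 0 & -1 \\ 0 & -2 & 0 & 5 \\ 0 & 0 & 1 & -1 \end{array}\right] && [r_3] + [r_1] \\
&\to \left[\begin{array}{rrr|r} 1 & 1 & 0 & -1 \\ 0 & 1 & 0 & -5/2 \\ 0 & 0 & 1 & -1 \end{array}\right] && -\tfrac{1}{2}[r_2] \\
&\to \left[\begin{array}{rrr|r} 1 & 0 & 0 & 3/2 \\ 0 & 1 & 0 & -5/2 \\ 0 & 0 & 1 & -1 \end{array}\right] && -[r_2] + [r_1]
\end{aligned}$$
This corresponds to the system
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} X = \begin{bmatrix} 3/2 \\ -5/2 \\ -1 \end{bmatrix} \quad\text{or}\quad X = IX = \begin{bmatrix} 3/2 \\ -5/2 \\ -1 \end{bmatrix}$$
which is the desired solution: $x_1 = 3/2$, $x_2 = -5/2$, $x_3 = -1$. (Check it by plugging back into the original matrix equation.)

The strategy is clear. Use the sequence of row operations as indicated above to reduce the coefficient matrix $A$ to the identity matrix $I$. If this is possible, the same sequence of row operations will transform the matrix $B$ to a new matrix $B'$, and the corresponding matrix equation will be
$$IX = B' \quad\text{or}\quad X = B'.$$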
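The strategy just described can be sketched as a short routine operating on the augmented matrix $[A \mid B]$. The name `gauss_jordan` is a hypothetical helper. The sketch uses exact `Fraction` arithmetic and picks the first available non-zero entry as each pivot. One small reordering differs from the hand calculations above: in the Jordan part it scales each pivot row to a leading 1 before clearing the column above that pivot, rather than doing all the scaling at the end. The row operations still form a valid sequence, so the resulting $B'$ is the same.

```python
from fractions import Fraction

def gauss_jordan(A, B):
    """Reduce [A | B] with elementary row operations until A becomes I,
    and return the transformed right-hand side B'.

    A is n x n and B is n x p, both lists of rows.  Raises ValueError when
    some column has no available pivot (A cannot be reduced to I)."""
    n = len(A)
    M = [[Fraction(v) for v in a_row] + [Fraction(v) for v in b_row]
         for a_row, b_row in zip(A, B)]
    # Gaussian part: zeroes below each pivot, working left to right.
    for k in range(n):
        candidates = [i for i in range(k, n) if M[i][k] != 0]
        if not candidates:
            raise ValueError("A cannot be reduced to I")
        p = candidates[0]
        M[k], M[p] = M[p], M[k]                      # row interchange if needed
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    # Jordan part: start at the lower right, scale each pivot row to a
    # leading 1, and clear the column above that pivot.
    for k in reversed(range(n)):
        M[k] = [v / M[k][k] for v in M[k]]
        for i in range(k):
            m = M[i][k]
            M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    return [row[n:] for row in M]

# Example 2 of this section, with B a single column.
Bp = gauss_jordan([[1, 1, -1], [2, 0, 1], [1, -1, 3]], [[0], [2], [1]])
print([str(row[0]) for row in Bp])  # ['3/2', '-5/2', '-1']
```

Because `B` may have any number of columns, the same routine handles $AX = B$ with $p > 1$ without change.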

It is natural at this point to conclude that $X = B'$ is the solution of the original system, but there is a subtle problem with that. As you may have learned in high school, the process of solving an equation or a system of equations may introduce extraneous solutions. These are not actually solutions of the original equations, but are introduced by some algebraic manipulation along the way. In particular, it might be true that the original system $AX = B$ has no solutions, and the conclusion $X = B'$ is an artifact of the process. The best we can conclude from the above logic is the following: if there is a solution, and if it is possible to reduce $A$ to $I$ by a sequence of row operations, then the solution is $X = B'$. That is, if a solution exists, it is unique, i.e., there is only one solution. To see that the solution must be unique, argue as follows. Suppose there were two solutions $X$ and $Y$. Then we would have
$$AX = B, \qquad AY = B.$$
Subtraction would then yield
$$A(X - Y) = 0 \quad\text{or}\quad AZ = 0 \quad\text{where } Z = X - Y.$$
However, we could apply our sequence of row operations to the equation $AZ = 0$ to obtain $IZ = 0$, since row operations have no effect on the zero matrix. Thus, we would conclude that $Z = X - Y = 0$, or $X = Y$.

How about the question of whether or not there is a solution in the first place? Of course, in any given case, we can simply check that $X = B'$ is a solution by substituting $B'$ for $X$ in $AX = B$ and seeing that it works. However, this relies on knowing $A$ and $B$ explicitly. So, it would be helpful if we had a general argument which assured us that $X = B'$ is a solution when the reduction is possible. (Among other things, we could skip the checking process if we were sure we did all the arithmetic correctly.)

To understand why $X = B'$ definitely is a solution, we need another argument. First note that every possible row operation is reversible. Thus, to reverse the effect of adding a multiple of one row to another, just subtract the same multiple of the first row from the (modified) second row. To reverse the effect of multiplying a row by a non-zero scalar, just multiply the (modified) row by the reciprocal of that scalar. Finally, to reverse the effect of interchanging two rows, just interchange them back. Hence, the effect of any sequence of row operations on a system of equations is to produce an equivalent system of equations: anything which is a solution of the initial system is necessarily a solution of the transformed system, and vice versa. Thus, the system $AX = B$ is equivalent to the system $X = IX = B'$, which is to say $X = B'$ is a solution of $AX = B$.

Appendix. Elementary Matrices and the Effect of Row Operations on Products. Each of the elementary row operations may be accomplished by multiplying by an appropriate square matrix on the left. Such matrices, of course, should have the proper size for the matrix being multiplied.

To add $c$ times the $j$th row of a matrix to the $i$th row (with $i \ne j$), multiply that matrix on the left by the matrix $E_{ij}(c)$, which has diagonal entries 1, the $i,j$-entry $c$, and all other entries 0. This matrix may also be obtained by applying the specified row operation to the identity matrix. You should try out a few examples to convince yourself that it works.


Example. For $n = 3$,
$$E_{13}(-4) = \begin{bmatrix} 1 & 0 & -4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

To multiply the $i$th row of a matrix by $c \ne 0$, multiply that matrix on the left by the matrix $E_i(c)$, which has diagonal entries 1 except for the $i,i$-entry, which is $c$, and which has all other entries zero. $E_i(c)$ may also be obtained by multiplying the $i$th row of the identity matrix by $c$.

Example. For $n = 3$,
$$E_2(6) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

To interchange the $i$th and $j$th rows of a matrix, with $i \ne j$, multiply the matrix on the left by the matrix $E_{ij}$, which is obtained from the identity matrix by interchanging its $i$th and $j$th rows. The diagonal entries of $E_{ij}$ are 1 except for its $i,i$- and $j,j$-entries, which are zero. Its $i,j$- and $j,i$-entries are both 1, and all other entries are zero.

Examples. For $n = 3$,
$$E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$

Matrices of the above type are called elementary matrices. The fact that row operations may be accomplished by matrix multiplication by elementary matrices has many important consequences. Thus, let $E$ be an elementary matrix corresponding to a certain elementary row operation. The associative law tells us
$$E(AX) = (EA)X$$
as long as the shapes match. However, $E(AX)$ is the result of applying the row operation to the product $AX$, and $(EA)X$ is the result of applying the row operation to $A$ and then multiplying by $X$. This establishes the important principle enunciated earlier in this section and upon which Gauss-Jordan reduction is based: a row operation on a product $AX$ may be accomplished by first applying that row operation to $A$ and then multiplying the result by $X$.

Exercises for Section 4.
1. Solve each of the following systems by Gauss-Jordan elimination if there is a solution.
(a)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 4 \\ 3x_1 + x_2 + 2x_3 &= -1 \\ x_1 + x_3 &= 0 \end{aligned}$$


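(b)
$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= 4 \\ 2x_1 + 3x_2 + 2x_3 &= -1 \\ x_1 + x_2 - x_3 &= 10 \end{aligned}$$
(c)
$$\begin{bmatrix} 1 & 1 & -2 & 3 \\ 2 & 1 & 0 & 1 \\ 1 & -1 & 1 & 0 \\ 3 & 1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 9 \\ -18 \\ -9 \\ 9 \end{bmatrix}$$
2. Use Gaussian elimination to solve
$$\begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix} X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
where $X$ is an unknown $2 \times 2$ matrix.
3. Solve each of the following matrix equations.
(a)
$$\begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix} X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
(b)
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 1 \end{bmatrix}$$
4. What is the effect of multiplying a $2 \times 2$ matrix $A$ on the right by the matrix
$$E = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}?$$
What general rule is this a special case of? ($E$ is a special case of an elementary matrix, as discussed in the Appendix.)
5. (Optional) Review the material in the Appendix on elementary matrices. Then calculate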

1 0 0

0 1 0

 1 1 0   −1 1 0

0 1 0

 0 0 01 0 1

1 0 0

 2 0 00 0 1

0 1 0

 1 0 04 7 1

2 5 8

Hint: Use the row operations suggested by the first four matrices.

 3 6. 9

5. SINGULARITY, PIVOTS, AND INVERTIBLE MATRICES


5. Singularity, Pivots, and Invertible Matrices

Let $A$ be a square coefficient matrix. Gauss-Jordan reduction will work as indicated in the previous section if $A$ can be reduced by a sequence of elementary row operations to the identity matrix $I$. A square matrix with this property is called non-singular or invertible. (The reason for the latter terminology will be clear shortly.) If it cannot be so reduced, it is called singular. Clearly, there are singular matrices. For example, the matrix equation
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
is equivalent to the system of 2 equations in 2 unknowns
$$\begin{aligned} x_1 + x_2 &= 1 \\ x_1 + x_2 &= 0 \end{aligned}$$
which is inconsistent and has no solution. Thus Gauss-Jordan reduction certainly can't work on its coefficient matrix.

To understand how to tell if a square matrix $A$ is non-singular or not, we look more closely at the Gauss-Jordan reduction process. The basic strategy is the following. Start with the first row, and use type (1) row operations to eliminate all entries in the first column below the $1,1$-position. A leading non-zero entry of a row, when used in this way, is called a pivot. There is one problem with this course of action: the leading non-zero entry in the first row may not be in the $1,1$-position. In that case, first interchange the first row with a succeeding row which does have a non-zero entry in the first column. (If you think about it, you may still see a problem. We shall come back to this and related issues later.) After the first reduction, the coefficient matrix will have been transformed to a matrix of the form
$$\begin{bmatrix} p_1 & * & \cdots & * \\ 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \vdots \\ 0 & * & \cdots & * \end{bmatrix}$$
where $p_1$ is the (first) pivot. We now do something mathematicians (and computer scientists) love: repeat the same process for the submatrix consisting of the second and subsequent rows. If we are fortunate, we will be able to transform $A$ ultimately by a sequence of elementary row operations into a matrix of the form
$$\begin{bmatrix} p_1 & * & * & \cdots & * \\ 0 & p_2 & * & \cdots & * \\ 0 & 0 & p_3 & \cdots & * \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & p_n \end{bmatrix}$$
with pivots on the diagonal and non-zero entries in those pivot positions. (Such a matrix is also called an upper triangular matrix because it has zeroes below the diagonal.) If we get this far, we are bound to succeed. Start in the lower right hand corner and apply the Jordan reduction process. In this way each entry above the pivots on the diagonal may be eliminated. We obtain this way a diagonal matrix
$$\begin{bmatrix} p_1 & 0 & 0 & \cdots & 0 \\ 0 & p_2 & 0 & \cdots & 0 \\ 0 & 0 & p_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & p_n \end{bmatrix}$$

with non-zero entries on the diagonal. We may now finish off the process by applying type (2) operations to the rows as needed and finally obtain the identity matrix $I$ as required.

The above analysis makes clear that the placement of the pivots is what is essential to non-singularity. What can go wrong? Let's look at an example.

Example 1.
$$\begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ 1 & 2 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & -1 \end{bmatrix} \;\;(\text{clear 1st column}) \;\;\to \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \;\;(\text{no pivot in } 2,2 \text{ position})$$
Note that the last row consists of zeroes.

The general case is similar. It may happen for a given row that the leading non-zero entry is not in the diagonal position, and there is no way to remedy this by interchanging with a subsequent row. In that case, we just do the best we can. We use a pivot as far to the left as possible (after suitable row interchange with a subsequent row where necessary). In the extreme case, it may turn out that the submatrix we are working with consists only of zeroes, and there are no possible pivots to choose, so we stop. For a singular square matrix, this extreme case must occur, since we will run out of pivot positions before we run out of rows. Thus, the Gaussian reduction will still transform $A$ to an upper triangular matrix $A'$, but some of the diagonal entries will be zero and some of the last rows (perhaps only the last row) will consist of zeroes. That is the singular case.

We showed in the previous section that if the $n \times n$ matrix $A$ is non-singular, then every equation of the form $AX = B$ (where both $X$ and $B$ are $n \times p$ matrices) does have a solution and also that the solution $X = B'$ is unique. On the other hand, if $A$ is singular, an equation of the form $AX = B$ may have a solution, but there will certainly be matrices $B$ for which $AX = B$ has no solutions. This is best illustrated by an example.

Example 2. Consider the system
$$\begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ 1 & 2 & -2 \end{bmatrix} x = b$$
where $x$ and $b$ are $3 \times 1$ column vectors. Without specifying $b$, the reduction of the augmented matrix for this system would follow the scheme
$$\left[\begin{array}{rrr|c} 1 & 2 & -1 & b_1 \\ 1 & 2 & 0 & b_2 \\ 1 & 2 & -2 & b_3 \end{array}\right] \to \left[\begin{array}{rrr|c} 1 & 2 & -1 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & -1 & * \end{array}\right] \to \left[\begin{array}{rrr|c} 1 & 2 & 0 & b'_1 \\ 0 & 0 & 1 & b'_2 \\ 0 & 0 & 0 & b'_3 \end{array}\right].$$
Now simply choose $b'_3 = 1$ (or any other non-zero value), so the reduced system is inconsistent. (Its last equation would be $0 = b'_3 \ne 0$.) Since the row operations may be reversed, we can now work back to a system with the original coefficient matrix which is also inconsistent. (Check in this case that if you choose $b'_1 = 0$, $b'_2 = 1$, $b'_3 = 1$, then reversing the operations yields $b_1 = -1$, $b_2 = 0$, $b_3 = -1$.)

The general case is completely analogous. Suppose $A \to \cdots \to A'$ is a sequence of elementary row operations which transforms $A$ to a matrix $A'$ for which the last row consists of zeroes. Choose any $n \times p$ matrix $B'$ for which the last row does not consist of zeroes. Then the equation $A'X = B'$ cannot be valid, since the last row on the left will necessarily consist of zeroes. Now reverse the row operations in the sequence which transformed $A$ to $A'$. Let $B$ be the effect of this reverse sequence on $B'$.
$$\begin{aligned} A &\leftarrow \cdots \leftarrow A' \\ B &\leftarrow \cdots \leftarrow B' \end{aligned}$$
Then the equation $AX = B$ cannot be consistent, because the equivalent system $A'X = B'$ is not consistent. We shall see later that when $A$ is a singular $n \times n$ matrix, if $AX = B$ has a solution $X$ for a particular $B$, then it has infinitely many solutions.

There is one unpleasant possibility we never mentioned. It is conceivable that the standard sequence of elementary row operations transforms $A$ to the identity matrix, so we decide it is non-singular, but some other bizarre sequence of elementary row operations transforms it to a matrix with some rows consisting of zeroes, in which case we should decide it is singular. Fortunately this can never happen, because singular matrices and non-singular matrices have diametrically opposed properties. For example, if $A$ is non-singular then $AX = B$ has a solution for every $B$, while if $A$ is singular, there are many $B$ for which $AX = B$ has no solution. This fact does not depend on the method we use to find solutions.

Inverses of Non-singular Matrices. Let $A$ be a non-singular $n \times n$ matrix.
According to the above analysis, the equation $AX = I$ (where we take $B$ to be the $n \times n$ identity matrix $I$) has a unique $n \times n$ solution matrix $X = B'$. This $B'$ is called the inverse of $A$, and it is usually denoted $A^{-1}$. That explains why non-singular matrices are also called invertible.
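The pivot criterion described in this section can be sketched as a test for non-singularity. The sketch runs the Gaussian part of the reduction, interchanging with a subsequent row when needed, and reports failure as soon as some column has no non-zero entry at or below the diagonal. The name `is_nonsingular` is a hypothetical helper. Exact `Fraction` arithmetic is used deliberately, because with floating point rounding a "zero" pivot is hard to recognize, as the remarks on numerical considerations in this section point out. The checks use the $2 \times 2$ matrix from the start of this section, the matrix of Examples 1 and 2, and the matrix of Example 3.

```python
from fractions import Fraction

def is_nonsingular(A):
    """Gaussian part of the reduction on a square matrix A.  Returns True if a
    pivot can be placed in every diagonal position (A reduces to I), and
    False as soon as some column has no non-zero entry available."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    for k in range(n):
        candidates = [i for i in range(k, n) if M[i][k] != 0]
        if not candidates:
            return False              # no pivot in the k, k position
        p = candidates[0]
        M[k], M[p] = M[p], M[k]       # interchange with a subsequent row
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    return True

print(is_nonsingular([[1, 1], [1, 1]]))                     # False
print(is_nonsingular([[1, 2, -1], [1, 2, 0], [1, 2, -2]]))  # False
print(is_nonsingular([[1, 0, -1], [1, 1, 0], [1, 2, 0]]))   # True
```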


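Example 3. Consider
$$A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix}.$$
To solve $AX = I$, we reduce the augmented matrix $[A \mid I]$.
$$\begin{aligned}
\left[\begin{array}{rrr|rrr} 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 0 & 1 \end{array}\right]
&\to \left[\begin{array}{rrr|rrr} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 2 & 1 & -1 & 0 & 1 \end{array}\right] \\
&\to \left[\begin{array}{rrr|rrr} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 & -2 & 1 \end{array}\right] \\
&\to \left[\begin{array}{rrr|rrr} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & -1 & 2 & -1 \end{array}\right] \\
&\to \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 0 & 2 & -1 \\ 0 & 1 & 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & -1 & 2 & -1 \end{array}\right].
\end{aligned}$$
(You should make sure you see which row operations were used in each step.) Thus, the solution is
$$X = A^{-1} = \begin{bmatrix} 0 & 2 & -1 \\ 0 & -1 & 1 \\ -1 & 2 & -1 \end{bmatrix}.$$
Check the answer by calculating
$$A^{-1}A = \begin{bmatrix} 0 & 2 & -1 \\ 0 & -1 & 1 \\ -1 & 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \\ 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$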

There is a subtle point about the above calculations. The matrix inverse $X = A^{-1}$ was derived as the unique solution of the equation $AX = I$, but we checked it by calculating $A^{-1}A = I$. The definition of $A^{-1}$ told us only that $AA^{-1} = I$. Since matrix multiplication is not generally commutative, how could we be sure that the product in the other order would also be the identity $I$? The answer is provided by the following tricky argument. Let $Y = A^{-1}A$. Then
$$AY = A(A^{-1}A) = (AA^{-1})A = IA = A$$
so that $Y$ is the unique solution of the equation $AY = A$. However, $Y = I$ is also a solution of that equation, so we may conclude that $A^{-1}A = Y = I$. The upshot is that for a non-singular square matrix $A$, we have both $AA^{-1} = I$ and $A^{-1}A = I$.

The existence of matrix inverses for non-singular square matrices suggests the following scheme for solving matrix equations of the form $AX = B$.


First, find the matrix inverse $A^{-1}$, and then take $X = A^{-1}B$. This is indeed the solution, since
$$AX = A(A^{-1}B) = (AA^{-1})B = IB = B.$$
However, as easy as this looks, one should not be misled by the formal algebra. The only method we have for finding the matrix inverse is to apply Gauss-Jordan reduction to the augmented matrix $[A \mid I]$. If $B$ has fewer than $n$ columns, then applying Gauss-Jordan reduction directly to $[A \mid B]$ would ordinarily involve less computation than finding $A^{-1}$. Hence, it is usually the case that applying Gauss-Jordan reduction to the original system of equations is the best strategy. An exception to this rule is where we have one common coefficient matrix $A$ and many different matrices $B$, or perhaps a $B$ with a very large number of columns. In that case, it seems as though it would make sense to find $A^{-1}$ first. However, there is a variation of Gauss-Jordan reduction, called the LU decomposition, that is more efficient and avoids the necessity for calculating the inverse and multiplying by it. See the appendix to this section for a brief discussion of the LU decomposition.

A Note on Strategy. The methods outlined in this and the previous section call for us first to reduce the coefficient matrix to one with zeroes below the diagonal and pivots on the diagonal. Then, starting in the lower right hand corner, we use each pivot to eliminate the non-zero entries in the column above the pivot. Why is it important to start at the lower right and work backwards? For that matter, why not just clear each column above and below the pivot as we go? There is a very good reason for that. We want to do as little arithmetic as possible. If we clear the column above the rightmost pivot first, then nothing we do subsequently will affect the entries in that column. Doing it in some other order would require lots of unnecessary arithmetic in that column. For a system with two or three unknowns, this makes little difference. However, for large systems, the number of operations saved can be considerable. Issues like this are especially important in designing computer algorithms for solving systems of equations.

Numerical Considerations in Computation. The examples we have chosen to illustrate the principles employ small matrices for which one may do exact arithmetic. The worst that will happen is that some of the fractions may get a bit messy. In real applications, the matrices are often quite large, and it is not practical to do exact arithmetic. The introduction of rounding and similar numerical approximations complicates the situation, and computer programs for solving systems of equations have to deal with problems which arise from this. If one is not careful in designing such a program, one can easily generate answers which are very far off, and even deciding when an answer is sufficiently accurate sometimes involves rather subtle considerations. Typically, one encounters problems for matrices in which the entries differ radically in size. Also, because of rounding, few matrices are ever exactly singular, since one can never be sure that a very small numerical value at a potential pivot would have been zero if the calculations had been done exactly. On the other hand, it is not surprising that matrices which are close to being singular can give computer programs indigestion.

In practical problems on a computer, the organization and storage of data can also be quite important. For example, it is usually not necessary to keep the old entries as the reduction proceeds. It is important, however, to keep track of the row operations. The memory locations which become zero in the reduction process are ideally suited for storing the relevant information to keep track of the row operations. (The LU factorization method is well suited to this type of programming.) If you are interested in such questions, there are many introductory texts which discuss numerical linear algebra. Two such are Introduction to Linear Algebra by Johnson, Riess, and Arnold and Applied Linear Algebra by Noble and Daniel.

Appendix. The LU Decomposition. If one needs to solve many equations of the form $Ax = b$ with the same $A$ but different $b$'s, we noted that one could first calculate $A^{-1}$ by Gauss-Jordan reduction and then calculate $A^{-1}b$. However, it is more efficient to store the row operations which were performed in order to do the Gaussian reduction, and then apply these to the given $b$ by another method which does not require a time-consuming matrix multiplication. This is made precise by a formal decomposition of $A$ as a product in a special form.

First assume that $A$ is non-singular and that the Gaussian reduction of $A$ can be done in the usual systematic manner starting in the upper left hand corner, but without using any row interchanges. We will illustrate the method by an example, and save an explanation for why it works for later. Let
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -2 & 3 & 4 \end{bmatrix}.$$
Proceed with the Gaussian reduction while at the same time storing the inverses of the row operations which were performed. In practice in a computer program, the operation (or actually its inverse) is stored in the memory location containing the entry which is no longer needed, but we shall indicate it more schematically. We start with $A$ on the right and the identity matrix on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -2 & 3 & 4 \end{bmatrix}.$$
Now apply the first row operation to $A$ on the right. Add $-1$ times the first row to the second row.
At the same time put $+1$ in the $2,1$ entry in the matrix on the left. (Note that this is not a row operation; we are just storing the important part of the inverse of the operation just performed, i.e., the multiplier.)
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -2 & 3 & 4 \end{bmatrix}.$$
Next add 2 times the first row to the third row of the matrix on the right and store a $-2$ in the $3,1$ position of the matrix on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 0 & 3 & 6 \end{bmatrix}.$$
Now multiply the second row of the matrix on the right by $\tfrac{1}{2}$, and store a 2 in the $2,2$ position in the matrix on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 0 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 3 & 6 \end{bmatrix}.$$
Next add $-3$ times the second row to the third row of the matrix on the right and store a 3 in the $3,2$ position of the matrix on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 3 & 1 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{bmatrix}.$$
Finally, multiply the third row of the matrix on the right by $\tfrac{1}{6}$ and store 6 in the $3,3$ position of the matrix on the left.
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 3 & 6 \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
The net result is that we have stored the row operations (or rather their inverses) in the matrix
$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 3 & 6 \end{bmatrix}$$
on the left, and we have by Gaussian reduction reduced $A$ to the matrix
$$U = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
on the right. Note that $L$ is a lower triangular matrix and $U$ is an upper triangular matrix with ones on the diagonal. Also,
$$LU = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 3 & 6 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 1 \\ -2 & 3 & 4 \end{bmatrix} = A.$$
$A = LU$ is called the LU decomposition of $A$. We shall see below why this worked, but let's see how we can use it to solve a system of the form $Ax = b$. Using the decomposition, we may rewrite this as $LUx = b$. Put $y = Ux$, and consider the system $Ly = b$. To be explicit, take
$$b = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$$


so the system we need to solve is
$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -2 & 3 & 6 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.$$
But this system is very easy to solve. We may simply use Gaussian reduction (Jordan reduction being unnecessary) or, equivalently, we can use what is called forward substitution, as below:
$$\begin{aligned} y_1 &= 1 \\ y_2 &= \tfrac{1}{2}(1 - y_1) = 0 \\ y_3 &= \tfrac{1}{6}(2 + 2y_1 - 3y_2) = \tfrac{2}{3}. \end{aligned}$$
So the intermediate solution is
$$y = \begin{bmatrix} 1 \\ 0 \\ 2/3 \end{bmatrix}.$$
Now we need only solve $Ux = y$, or
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = y = \begin{bmatrix} 1 \\ 0 \\ 2/3 \end{bmatrix}.$$
To do this, we may use either Jordan reduction or, equivalently, what is usually done, back substitution.
$$\begin{aligned} x_3 &= \tfrac{2}{3} \\ x_2 &= 0 - 0\,x_3 = 0 \\ x_1 &= 1 - 0\,x_2 - 1\,x_3 = \tfrac{1}{3}. \end{aligned}$$
So the solution we obtain finally is
$$x = \begin{bmatrix} 1/3 \\ 0 \\ 2/3 \end{bmatrix}.$$
You should check that this is actually a solution of the original system. Note that all this would have been silly had we been interested just in solving the single system $Ax = b$. In that case, Gauss-Jordan reduction would have sufficed, and it would not have been necessary to store the row operations in the matrix $L$. However, if we had many such equations to solve with the same coefficient matrix $A$, we would save considerable time by having saved the important parts of the row operations in $L$. And unlike the inverse method, forward and back substitution eliminate the need to multiply any matrices.

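The LU procedure of this appendix can be sketched in a few lines. The function names `lu_decompose`, `forward_substitution` and `back_substitution` are hypothetical. Like the worked example, the sketch assumes that no row interchanges are needed. It follows the text's convention: $U$ has ones on its diagonal, and $L$ stores the pivots and the multipliers, which are the inverses of the row operations. Many numerical libraries use the opposite convention and put the ones on the diagonal of $L$ instead. The matrix $A$ and the vector $b$ below are the ones in the example.

```python
from fractions import Fraction

def lu_decompose(A):
    """Factor a non-singular A as A = L U following the appendix: U is upper
    triangular with ones on the diagonal, and L stores the multipliers and
    pivots (the inverses of the row operations).  Assumes, as the text does,
    that no row interchanges are needed."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        pivot = U[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        L[k][k] = pivot                   # inverse of "multiply row k by 1/pivot"
        U[k] = [v / pivot for v in U[k]]
        for i in range(k + 1, n):
            m = U[i][k]
            L[i][k] = m                   # inverse of "add -m times row k to row i"
            U[i] = [a - m * c for a, c in zip(U[i], U[k])]
    return L, U

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L, from the top row down."""
    y = []
    for i, row in enumerate(L):
        y.append((b[i] - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, from the bottom row up."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[1, 0, 1], [1, 2, 1], [-2, 3, 4]]
L, U = lu_decompose(A)
y = forward_substitution(L, [1, 1, 2])    # intermediate solution
x = back_substitution(U, y)
print([str(v) for v in y])  # ['1', '0', '2/3']
print([str(v) for v in x])  # ['1/3', '0', '2/3']
```

Once `L` and `U` are computed, each further right-hand side costs only one forward and one back substitution.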

Why the LU decomposition method works. Assume, as above, that $A$ is non-singular and can be reduced in the standard order without any row interchanges. Recall that each row operation may be accomplished by pre-multiplying by an appropriate elementary matrix. Let $E_{ij}(c)$ be the elementary matrix which adds $c$ times the $i$th row to the $j$th row, and let $E_i(c)$ be the elementary matrix which multiplies the $i$th row by $c$. (Note that the roles of $i$ and $j$ here are the reverse of those in the Appendix to Section 4.) Then in the above example, the Gaussian part of the reduction could be described schematically by
$$\begin{aligned} A &\to E_{12}(-1)A \\ &\to E_{13}(2)E_{12}(-1)A \\ &\to E_2(1/2)E_{13}(2)E_{12}(-1)A \\ &\to E_{23}(-3)E_2(1/2)E_{13}(2)E_{12}(-1)A \\ &\to E_3(1/6)E_{23}(-3)E_2(1/2)E_{13}(2)E_{12}(-1)A = U \end{aligned}$$
where
$$U = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
is the end result of the Gaussian reduction and is upper triangular with ones on the diagonal. To get the LU decomposition of $A$, simply multiply both sides of the last equation on the left by the inverses of the elementary matrices, in reverse order. Remember that the inverse of an elementary matrix is a similar elementary matrix with the scalar replaced by its negative for type one operations or by its reciprocal for type two operations. So
$$A = E_{12}(1)E_{13}(-2)E_2(2)E_{23}(3)E_3(6)U = LU$$
where $L$ is just the product of the elementary matrices to the left of $U$. Because we have been careful of the order in which the operations were performed, all that is necessary to compute this matrix is to place the indicated scalar in the indicated position. Nothing that is done later can affect the placement of the scalars done earlier, so $L$ ends up being the matrix we derived above.

The case in which switching rows is required. In many cases, Gaussian reduction cannot be done without some row interchanges. To see how this affects the procedure, imagine that the row interchanges are not actually done as needed, but the pivots are left in the rows they happen to appear in. This will result in a matrix which is a permuted version of a matrix in Gauss reduced form. We may then straighten it out by applying the row interchanges at the end. Here is how to do this in actual practice. We illustrate it with an example. Let
$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 2 & 0 & 4 \end{bmatrix}.$$

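We apply Gaussian reduction, writing over each step the appropriate elementary matrix which accomplishes the desired row operation.
$$\begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 1 \\ 2 & 0 & 4 \end{bmatrix} \xrightarrow{E_{12}} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 2 & 0 & 4 \end{bmatrix} \xrightarrow{E_{13}(-2)} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & -2 & 2 \end{bmatrix} \xrightarrow{E_{23}} \begin{bmatrix} 1 & 1 & 1 \\ 0 & -2 & 2 \\ 0 & 0 & 1 \end{bmatrix} \xrightarrow{E_2(-1/2)} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.$$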

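Note that two of the steps involved row interchanges: pre-multiplication by $E_{12}$ switches rows one and two, and $E_{23}$ switches rows two and three. Apply these row interchanges to the original matrix:

$$E_{23}E_{12}A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 4 \\ 0 & 0 & 1 \end{bmatrix}.$$

Let $Q = E_{23}E_{12}$, and apply the LU decomposition procedure described above to $QA$. No row interchanges will be necessary, and we get

$$QA = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 4 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = LU,$$

where

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.$$

Here $L$ is lower triangular and $U$ is upper triangular with ones on the diagonal. Now multiply by

$$P = Q^{-1} = E_{12}^{-1}E_{23}^{-1} = E_{12}E_{23} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$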

We obtain

$$A = PLU.$$

Here is a brief description of the process.

- First do the Gaussian reduction, noting the row interchanges required.
- Then apply those interchanges to the original matrix and find the LU decomposition of the result.
- Finally, apply the same row interchanges in the opposite order to the identity matrix to obtain $P$.

Then $A = PLU$.

The matrix $P$ has the property that each row and each column contains exactly one nonzero entry, and that entry is 1. It is obtained by an appropriate permutation of the rows of the identity matrix. Such matrices are called permutation matrices.

Once one has the decomposition $A = PLU$, one may solve systems of the form $Ax = b$ by methods similar to those described above. The only difference is that a permutation is also required. Since $PLUx = b$, one solves $LUx = P^{-1}b$, whose right-hand side is just $b$ with its entries rearranged.

Note. If you are careful, you can recover the constituents of $L$ and $U$ from the original Gaussian elimination, provided you apply the appropriate permutations of indices at the intermediate stages.

Exercises for Section 5.

1. In each of the following cases, find the matrix inverse if one exists. Check your answer by multiplication.
(a) $\begin{bmatrix} 1 & -1 & -2 \\ 2 & 1 & 1 \\ 2 & 2 & 2 \end{bmatrix}$
(b) $\begin{bmatrix} 1 & 4 & 1 \\ 1 & 1 & 2 \\ 1 & 3 & 1 \end{bmatrix}$
(c) $\begin{bmatrix} 1 & 2 & -1 \\ 2 & 3 & 3 \\ 4 & 7 & 1 \end{bmatrix}$
(d) $\begin{bmatrix} 2 & 2 & 1 & 1 \\ -1 & 1 & -1 & 0 \\ 1 & 0 & 1 & 2 \\ 2 & 2 & 1 & 2 \end{bmatrix}$

2. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, and suppose $\det A = ad - bc \neq 0$. Show that
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
Hint: It is not necessary to ‘find’ the solution by applying Gauss–Jordan reduction. You were told what it is. All you have to do is show that it works, i.e., that it satisfies the defining condition for an inverse. Just compute $AA^{-1}$ and see that you get $I$. This formula is probably the fastest way to find the inverse of a $2 \times 2$ matrix. In words: interchange the diagonal entries, change the signs of the off-diagonal entries, and divide by the determinant $ad - bc$. Unfortunately, there is no rule for $n \times n$ matrices, even for $n = 3$, which is quite so simple.

3. Let $A$ and $B$ be invertible $n \times n$ matrices. Show that $(AB)^{-1} = B^{-1}A^{-1}$. Note the reversal of order! Hint: As above, if you are given a candidate for an inverse, you needn’t ‘find’ it; you need only check that it works.

4. In the general discussion of Gauss-Jordan reduction, we assumed for simplicity that there was at least one non-zero entry in the first column of the coefficient matrix $A$. That was done so that we could be sure there would be a non-zero entry in the 1, 1-position (after a suitable row interchange) to use as a pivot. What if the first column consists entirely of zeroes? Does the basic argument (for the singular case) still work?

5. (a) Solve each of the following systems by any method you find convenient.
$$\begin{aligned} x_1 + x_2 &= 2.0000 \\ 1.0001x_1 + x_2 &= 2.0001 \end{aligned} \qquad\qquad \begin{aligned} x_1 + x_2 &= 2.0000 \\ 1.0001x_1 + x_2 &= 2.0002 \end{aligned}$$
(b) You should notice that although these systems are very close together, their solutions are quite different. Can you see some characteristic of the coefficient matrix which might suggest a reason for expecting trouble?

6. Below, do all your arithmetic as though you were a calculator which can only handle four significant digits. Thus, for example, a number like 1.0001 would have to be rounded to 1.000.
(a) Solve
$$\begin{aligned} .0001x_1 + x_2 &= 1 \\ x_1 - x_2 &= 0 \end{aligned}$$
by the standard Gauss-Jordan approach using the given 1, 1 position as pivot. Check your answer by substituting back in the original equations. You should be surprised by the result.
(b) Solve the same system, but first interchange the two rows, i.e., choose the original 2, 1 position as pivot. Check your answer by substituting back in the original equations.

7. Find the LU decomposition of the matrix $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 4 & 1 \\ 2 & 3 & 1 \end{bmatrix}$. Use forward and back substitution to solve the system
$$\begin{bmatrix} 1 & 2 & 1 \\ 1 & 4 & 1 \\ 2 & 3 & 1 \end{bmatrix} x = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
Also solve the system directly by Gauss–Jordan reduction and compare the results in terms of time and effort.
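The three-step recipe for $A = PLU$ given in this section can be sketched in Python. The steps are: reduce while noting the row interchanges, factor the row-interchanged matrix, then apply the interchanges in the opposite order to the identity.

This is a minimal illustration under stated assumptions, not a library routine:

- The function names `row_swaps`, `lu_unit_upper`, `plu` and `solve_plu` are hypothetical.
- Arithmetic is exact, using the standard `fractions` module.
- The pivot rule is an assumption, chosen because it reproduces the worked example: take the first row at or below the current one with a nonzero entry.

As in the text, $U$ has ones on its diagonal and the pivots end up on the diagonal of $L$. The solver uses $P^{-1} = P^T$, which holds for any permutation matrix.

```python
from fractions import Fraction


def row_swaps(A):
    """Gaussian reduction on a copy of A, recording each row interchange (k, p).

    Pivot rule (assumed): use the first row at or below row k whose entry in
    column k is nonzero.
    """
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    swaps = []
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("A is singular")
        if p != k:
            M[k], M[p] = M[p], M[k]
            swaps.append((k, p))
        for i in range(k + 1, n):               # clear the entries below the pivot
            c = M[i][k] / M[k][k]
            M[i] = [a - c * b for a, b in zip(M[i], M[k])]
    return swaps


def lu_unit_upper(B):
    """Factor B = LU with U unit upper triangular, assuming no interchange is needed."""
    M = [[Fraction(x) for x in row] for row in B]
    n = len(M)
    L = [[Fraction(0)] * n for _ in range(n)]
    for k in range(n):
        pivot = M[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange is needed")
        L[k][k] = pivot                         # the pivot is recorded on L's diagonal
        M[k] = [x / pivot for x in M[k]]        # so that U gets a 1 in this position
        for i in range(k + 1, n):
            L[i][k] = M[i][k]                   # the scalar used to clear entry (i, k)
            M[i] = [a - L[i][k] * b for a, b in zip(M[i], M[k])]
    return L, M


def plu(A):
    """Return P, L, U with A = PLU, following the recipe in the text."""
    n = len(A)
    swaps = row_swaps(A)
    QA = [list(row) for row in A]
    for k, p in swaps:                          # apply the interchanges to A
        QA[k], QA[p] = QA[p], QA[k]
    L, U = lu_unit_upper(QA)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k, p in reversed(swaps):                # same interchanges, opposite order, on I
        P[k], P[p] = P[p], P[k]
    return P, L, U


def solve_plu(A, b):
    """Solve Ax = b by solving LUx = P^{-1} b, where P^{-1} is the transpose of P."""
    P, L, U = plu(A)
    n = len(A)
    c = [sum(P[j][i] * b[j] for j in range(n)) for i in range(n)]   # c = P^T b
    y = [Fraction(0)] * n
    for i in range(n):                          # forward substitution: Ly = c
        y[i] = (c[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):                # back substitution: Ux = y (unit diagonal)
        x[i] = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
    return x


P, L, U = plu([[0, 0, 1], [1, 1, 1], [2, 0, 4]])
print([[int(x) for x in row] for row in P])   # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
print([[int(x) for x in row] for row in L])   # [[1, 0, 0], [2, -2, 0], [0, 0, 1]]
```

For the example of this section the sketch reproduces $P$ and $L$, and `U` comes out as the matrix $U$ found above. Numerical software usually chooses a different interchange rule: it takes the entry of largest absolute value in the column (partial pivoting), for the kind of reason Exercise 6 illustrates.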

6. Gauss-Jordan Reduction in the General Case

Gauss-Jordan reduction works just as well if the coefficient matrix $A$ is singular, or even if it is not a square matrix. Consider the system
$$Ax = b,$$
where the coefficient matrix $A$ is an $m \times n$ matrix. The method is to apply elementary row operations to the augmented matrix,
$$[A \mid b] \to \cdots \to [A' \mid b'],$$
making the best of it with the coefficient matrix $A$. We may not be able to transform $A$ to the identity matrix. We can always pick out a set of pivots, one in each non-zero row, and otherwise mimic what we did in the case of a square non-singular $A$. If we are fortunate, the resulting system $A'x = b'$ will have solutions.

Example 1. Consider
$$\begin{bmatrix} 1 & 1 & 2 \\ -1 & -1 & 1 \\ 1 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}.$$
Reduce the augmented matrix as follows:
$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ -1 & -1 & 1 & 5 \\ 1 & 1 & 3 & 3 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 0 & 3 & 6 \\ 0 & 0 & 1 & 2 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 0 & 3 & 6 \\ 0 & 0 & 0 & 0 \end{array}\right].$$
This completes the ‘Gaussian’ part of the reduction, with pivots in the 1, 1 and 2, 3 positions, and the last row of the transformed coefficient matrix consists of zeroes. Let’s now proceed with the ‘Jordan’ part of the reduction. Use the last pivot to clear the column above it:
$$\left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 0 & 3 & 6 \\ 0 & 0 & 0 & 0 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 1 & 0 & -3 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right],$$
and the resulting augmented matrix corresponds to the system
$$\begin{aligned} x_1 + x_2 &= -3 \\ x_3 &= 2 \\ 0 &= 0. \end{aligned}$$

Note that the last equation could just as well have read $0 = 6$ (or some other non-zero quantity), in which case the system would be inconsistent and not have a solution. Fortunately, that is not the case in this example. The second equation tells us $x_3 = 2$, but the first equation only gives a relation $x_1 = -3 - x_2$ between $x_1$ and $x_2$. That means that the solution has the form
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 - x_2 \\ x_2 \\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix},$$
where $x_2$ can have any value whatsoever. We say that $x_2$ is a free variable, and the fact that it is arbitrary means that there are infinitely many solutions. $x_1$ and $x_3$ are called bound variables. Note that the bound variables are in the pivot positions.

It is instructive to reinterpret this geometrically in terms of vectors in space. The original system of equations may be written
$$\begin{aligned} x_1 + x_2 + 2x_3 &= 1 \\ -x_1 - x_2 + x_3 &= 5 \\ x_1 + x_2 + 3x_3 &= 3, \end{aligned}$$
which are equations for 3 planes in space. Here we are using $x_1, x_2, x_3$ to denote the coordinates instead of the more familiar $x, y, z$. Solutions
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
correspond to points lying in the common intersection of those planes. Normally, we would expect three planes to intersect in a single point. That would have been the case had the coefficient matrix been non-singular. However, in this case the planes intersect in a line, and the solution obtained above may be interpreted as the vector equation of that line. Put $x_2 = s$ and rewrite the equation using the vector notation you are familiar with from your course in vector calculus. We obtain
$$x = \langle -3, 0, 2 \rangle + s\langle -1, 1, 0 \rangle.$$
You should recognize this as the line passing through the endpoint of the vector $\langle -3, 0, 2 \rangle$ and parallel to the vector $\langle -1, 1, 0 \rangle$.

Example 1 illustrates many features of the general procedure.

Gauss–Jordan reduction of the coefficient matrix is always possible, but the pivots don’t always end up on the diagonal. In any case, the Jordan part of the reduction will yield a 1 in each pivot position, with zeroes elsewhere in the column containing the pivot. The position of a pivot in a row will be on the diagonal or to its right, and all entries in that row to the left of the pivot will be zero. Some of the entries to the right of the pivot may be non-zero.

If the number of pivots is smaller than the number of rows, then some rows of the reduced coefficient matrix will consist entirely of zeroes. (This will always be the case for a singular square matrix.) If there are non-zero entries in those rows to the right of the divider in the augmented matrix, the system is inconsistent and has no solutions. Otherwise, the system does have solutions.

Such solutions are obtained by writing out the corresponding system and transposing all terms not associated with the pivot positions to the right side of the equations. Each unknown in a pivot position is then expressed in terms of the non-pivot unknowns (if any). The pivot unknowns are said to be bound. The non-pivot unknowns may be assigned any value and are said to be free.
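The general procedure just described can be sketched as a short program:

- Reduce $[A \mid b]$.
- Check the zero rows for an inconsistency.
- Read off a particular solution by setting the free unknowns to 0.
- Obtain one basic vector per free unknown.

The sketch below is an illustration rather than a definitive implementation. The names `gauss_jordan` and `general_solution` are hypothetical, and exact arithmetic uses the standard `fractions` module. One departure from the text: this version clears each pivot column above and below in a single pass instead of doing the ‘Gaussian’ and ‘Jordan’ parts separately. The reduced matrix that results is the same.

```python
from fractions import Fraction


def gauss_jordan(aug):
    """Reduce an augmented matrix [A | b] (b is the last column).

    Returns the reduced matrix and the list of pivot columns, one per
    non-zero row.  Each pivot column is cleared above and below in one pass.
    """
    R = [[Fraction(x) for x in row] for row in aug]
    m, ncols = len(R), len(R[0])
    pivots, row = [], 0
    for col in range(ncols - 1):                # never pivot in the b column
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                            # no pivot: this unknown will be free
        R[row], R[p] = R[p], R[row]
        piv = R[row][col]
        R[row] = [x / piv for x in R[row]]      # make the pivot 1
        for i in range(m):
            if i != row and R[i][col] != 0:     # zero out the rest of the column
                c = R[i][col]
                R[i] = [a - c * r for a, r in zip(R[i], R[row])]
        pivots.append(col)
        row += 1
        if row == m:
            break
    return R, pivots


def general_solution(A, b):
    """Return (x0, basis) so that every solution is x0 + sum of t_j * basis[j].

    Returns None if some zero row of the reduced coefficient matrix has a
    non-zero entry to the right of the divider (inconsistent system).
    """
    n = len(A[0])
    R, pivots = gauss_jordan([list(r) + [bi] for r, bi in zip(A, b)])
    if any(all(x == 0 for x in r[:n]) and r[n] != 0 for r in R):
        return None
    x0 = [Fraction(0)] * n                      # particular solution: free unknowns = 0
    for i, pc in enumerate(pivots):
        x0[pc] = R[i][n]
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                      # this free unknown 1, the others 0
        for i, pc in enumerate(pivots):
            v[pc] = -R[i][f]                    # bound unknown after transposing terms
        basis.append(v)
    return x0, basis


x0, basis = general_solution([[1, 1, 2], [-1, -1, 1], [1, 1, 3]], [1, 5, 3])
print([int(t) for t in x0], [[int(t) for t in v] for v in basis])
# [-3, 0, 2] [[-1, 1, 0]]
```

For Example 1 this recovers the particular solution $\langle -3, 0, 2 \rangle$ and the direction $\langle -1, 1, 0 \rangle$. The number of pivots, `len(pivots)`, is the count of bound unknowns, and `len(basis)` is the count of free ones.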


The vector space $\mathbb{R}^n$. As we saw in Example 1, it is helpful to visualize solutions geometrically. Although there were infinitely many solutions, we could capture all of them by means of a single parameter $s$. Thus, it makes sense to describe the set of all solutions as being ‘one dimensional’, in the same sense that we think of a line as being one dimensional. We would like to be able to use such geometric visualization for general systems. To this end, we have to generalize our notion of ‘space’ and ‘geometry’.

Let $\mathbb{R}^n$ denote the set of all $n \times 1$ column vectors
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
Here the $\mathbb{R}$ indicates that the entries are supposed to be real numbers. (As mentioned earlier, we could just as well have considered the set of all $1 \times n$ row vectors.) Thus, for $n = 1$, $\mathbb{R}^1$ consists of all $1 \times 1$ matrices, or scalars, and as such can be identified with the number line. Similarly, $\mathbb{R}^2$ may be identified with the usual coordinate plane, and $\mathbb{R}^3$ with space. In making this definition, we hope to encourage you to think of $\mathbb{R}^4$, $\mathbb{R}^5$, etc. as higher dimensional analogues of these familiar geometric objects. Of course, we can’t really visualize such things geometrically. However, we can use the same algebra that works for $n = 1$, 2, or 3, and we can proceed by analogy.

For example, as we noted in Section 2, we can define the length $|v|$ of a column vector as the square root of the sum of the squares of its components. We may define the dot product $u \cdot v$ of two such vectors as the sum of the products of corresponding components. The vectors are said to be perpendicular if they are not zero and their dot product is zero. These are straightforward generalizations of the corresponding notions in $\mathbb{R}^2$ and $\mathbb{R}^3$.

As another example, we can generalize the notion of plane as follows. In $\mathbb{R}^3$, the graph of a single linear equation $a_1x_1 + a_2x_2 + a_3x_3 = b$ is a plane. Hence, by analogy, we call the ‘graph’ in $\mathbb{R}^4$ of $a_1x_1 + a_2x_2 + a_3x_3 + a_4x_4 = b$ a hyperplane.

Example 2.
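Consider the system
$$\begin{aligned} x_1 + 2x_2 - x_3 &= 0 \\ x_1 + 2x_2 + x_3 + 3x_4 &= 0 \\ 2x_1 + 4x_2 + 3x_4 &= 0, \end{aligned}$$
which can be rewritten in matrix form
$$\begin{bmatrix} 1 & 2 & -1 & 0 \\ 1 & 2 & 1 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{1}$$
Reducing the augmented matrix yields
$$\left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 0 \\ 1 & 2 & 1 & 3 & 0 \\ 2 & 4 & 0 & 3 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 2 & 3 & 0 \\ 0 & 0 & 2 & 3 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 2 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
$$\to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 0 \\ 0 & 0 & 1 & 3/2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & 0 & 3/2 & 0 \\ 0 & 0 & 1 & 3/2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
(Note that since there are zeroes to the right of the divider, we don’t have to worry about possible inconsistency in this case.) The system corresponding to the reduced augmented matrix is
$$\begin{aligned} x_1 + 2x_2 + (3/2)x_4 &= 0 \\ x_3 + (3/2)x_4 &= 0 \\ 0 &= 0. \end{aligned}$$
Thus,
$$x_1 = -2x_2 - (3/2)x_4, \qquad x_3 = -(3/2)x_4,$$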

with $x_1$ and $x_3$ bound and $x_2$ and $x_4$ free. A general solution has the form
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2x_2 - (3/2)x_4 \\ x_2 \\ -(3/2)x_4 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2x_2 \\ x_2 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -(3/2)x_4 \\ 0 \\ -(3/2)x_4 \\ x_4 \end{bmatrix},$$
$$x = x_2\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} -3/2 \\ 0 \\ -3/2 \\ 1 \end{bmatrix},$$
where the free variables $x_2$ and $x_4$ can assume any value. The bound variables $x_1$ and $x_3$ are then determined.

This solution may also be interpreted geometrically in $\mathbb{R}^4$. The original set of equations may be thought of as determining a ‘graph’ which is the intersection of three hyperplanes (each defined by one of the equations). Note also that each of these hyperplanes passes through the origin, since the zero vector is certainly a solution. Introduce two vectors (using vector calculus notation)
$$v_1 = \langle -2, 1, 0, 0 \rangle, \qquad v_2 = \langle -3/2, 0, -3/2, 1 \rangle$$
in $\mathbb{R}^4$. Note that neither of these vectors is a multiple of the other. Hence, we may think of them as spanning a (2-dimensional) plane in $\mathbb{R}^4$. Putting $s_1 = x_2$ and $s_2 = x_4$, we may express the general solution vector as
$$x = s_1v_1 + s_2v_2,$$
so the solution set of the system (1) may be identified with the plane spanned by $\{v_1, v_2\}$. Of course, we can’t hope to actually draw a picture of this.

Make sure you understand the procedure used in the above examples to express the general solution vector $x$ entirely in terms of the free variables. We shall use it quite generally. Any system of equations with real coefficients may be interpreted as defining a locus in $\mathbb{R}^n$. Studying the structure of such a locus, in particular its dimensionality, will be of paramount concern.

Example 3. Consider
$$\begin{bmatrix} 1 & 2 \\ 1 & 0 \\ -1 & 1 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ -7 \\ 10 \end{bmatrix}.$$
Reducing the augmented matrix yields
$$\left[\begin{array}{cc|c} 1 & 2 & 1 \\ 1 & 0 & 5 \\ -1 & 1 & -7 \\ 2 & 0 & 10 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & -2 & 4 \\ 0 & 3 & -6 \\ 0 & -4 & 8 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & -2 & 4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right],$$
which is equivalent to $x_1 = 5$, $x_2 = -2$. Thus the unique solution vector is
$$x = \begin{bmatrix} 5 \\ -2 \end{bmatrix}.$$
Geometrically, what we have here is four lines in the plane which happen to intersect in the common point with coordinates $(5, -2)$.

Rank and Nullity. These examples and the preceding discussion lead us to certain conclusions about a system of the form $Ax = b$. Here $A$ is an $m \times n$ matrix, $x$ is an $n \times 1$ column vector of unknowns, and $b$ is an $m \times 1$ column vector that is given.

The number $r$ of pivots of $A$ is called the rank of $A$, and clearly it plays a crucial role. It is the same as the number of non-zero rows at the end of the Gauss–Jordan reduction, since there is exactly one pivot in each non-zero row. The rank is certainly not greater than either the number of rows $m$ or the number of columns $n$ of $A$.

If $m = n$, i.e., $A$ is a square matrix, then $A$ is non-singular when its rank is $n$ and singular when its rank is smaller than $n$. More generally, suppose $A$ is not square, i.e., $m \neq n$. In this case, if the rank $r$ is smaller than the number of rows $m$, then there are column vectors $b$ in $\mathbb{R}^m$ for which the system $Ax = b$ does not have any solutions. The argument is basically the same as for the case of a singular square matrix:

- Transform $A$ by a sequence of elementary row operations to a matrix $A'$ with its last row consisting of zeroes.
- Choose $b'$ so that $A'x = b'$ is inconsistent.
- Reverse the operations to find an inconsistent $Ax = b$.

Suppose that for a given $b$ in $\mathbb{R}^m$ the system $Ax = b$ does have solutions. Then the unknowns $x_1, x_2, \ldots, x_n$ may be partitioned into two sets: $r$ bound unknowns and $n - r$ free unknowns. The bound unknowns are expressed in terms of the free unknowns. The number $n - r$ of free unknowns is sometimes called the nullity of the matrix $A$. If the nullity $n - r > 0$, i.e., $n > r$, then (if there are any solutions at all) there are infinitely many solutions.

Systems of the form $Ax = 0$ are called homogeneous. Example 2 is a homogeneous system. A homogeneous system is always consistent, since the column $b'$ obtained from $b = 0$ by row operations is also zero. If $m = n$, i.e., the matrix is square, and $A$ is non-singular, the only solution is $0$. If $A$ is singular, i.e., $r < n$, then there are definitely non-zero solutions, since there are some free unknowns which can be assigned non-zero values. This rank argument works for any $m$ and $n$: if $r < n$, then there are definitely non-zero solutions of the homogeneous system $Ax = 0$. One special case of interest is $m < n$. Since $r \le m$, we must have $r < n$ in that case.
That leads to the following important principle: a homogeneous system of linear algebraic equations for which there are more unknowns than equations always has some non-trivial solutions. Note that the nullity $n - r$ of $A$ measures the ‘number’ of solutions of the homogeneous system $Ax = 0$, in the sense that it tells us the number of free variables in a general solution. (Of course, it plays a similar role for a general system, but only if it is consistent, i.e., it has solutions.) This explains the etymology of the term ‘nullity’. It measures the ease with which multiplication by $A$ can transform a vector $x$ in $\mathbb{R}^n$ to the zero vector in $\mathbb{R}^m$.

Pseudo-inverses. (This subsection is not essential for what follows.) In some applications, one needs to try to find ‘inverses’ of non-square matrices. Thus, if $A$ is an $m \times n$ matrix, one might need to find an $n \times m$ matrix $A'$ such that
$$AA' = I \quad (\text{the } m \times m \text{ identity}).$$
Such an $A'$ would be called a right pseudo-inverse. Similarly, an $n \times m$ matrix $A''$ such that
$$A''A = I \quad (\text{the } n \times n \text{ identity})$$
is called a left pseudo-inverse.

Example. Let $A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$. To find a right pseudo-inverse, we try to solve
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
for the unknown $3 \times 2$ matrix $X$. Apply Gauss–Jordan reduction to the augmented matrix:
$$\left[\begin{array}{ccc|cc} 1 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|cc} 1 & 0 & -1 & 1 & -1 \\ 0 & 1 & 1 & 0 & 1 \end{array}\right].$$
The corresponding system is
$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}.$$
This may be written out explicitly as
$$\begin{aligned} x_{11} - x_{31} &= 1, & x_{12} - x_{32} &= -1, \\ x_{21} + x_{31} &= 0, & x_{22} + x_{32} &= 1. \end{aligned}$$
Here $x_{31}$ and $x_{32}$ play the roles of free variables, and the other variables are bound. If we put both of these equal to zero, we obtain $x_{11} = 1$, $x_{12} = -1$, $x_{21} = 0$, $x_{22} = 1$. Thus, a right pseudo-inverse for $A$ is
$$A' = X = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
You should check that $AA' = I$. Of course, there are infinitely many other solutions, obtained by letting $x_{31}$ and $x_{32}$ assume other values. Note that
$$A'A = \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
which is definitely not the $3 \times 3$ identity matrix. So $A'$ is not a left pseudo-inverse for $A$.

If $m < n$, i.e., $A$ has fewer rows than columns, then no left pseudo-inverse is possible. Similarly, if $m > n$, i.e., $A$ has more rows than columns, then no right pseudo-inverse is possible. We shall prove the second statement. Suppose we could find an $n \times m$ matrix $A'$ such that $AA' = I$ (the $m \times m$ identity matrix). Then for any $m \times 1$ column vector $b$, $x = A'b$ is a solution of $Ax = b$, since
$$Ax = A(A'b) = (AA')b = Ib = b.$$
On the other hand, since $m > n \ge r$, we know we can always find a $b$ such that $Ax = b$ does not have a solution. This contradiction shows that no such $A'$ can exist.
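The claims in the example above are easy to check mechanically. The four displayed equations can be solved for the bound variables in terms of the free ones. With $x_{31} = s$ and $x_{32} = t$, this gives
$$x_{11} = 1 + s, \quad x_{21} = -s, \quad x_{12} = -1 + t, \quad x_{22} = 1 - t.$$

The sketch below is plain Python. The helper names `matmul` and `right_inverse_family` are hypothetical. It encodes that family and multiplies in both orders.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def right_inverse_family(s, t):
    """Right pseudo-inverses of A = [[1, 1, 0], [0, 1, 1]] with x31 = s, x32 = t."""
    return [[1 + s, -1 + t],
            [-s, 1 - t],
            [s, t]]


A = [[1, 1, 0], [0, 1, 1]]
A_prime = right_inverse_family(0, 0)            # the choice made in the text
print(matmul(A, A_prime))                       # [[1, 0], [0, 1]]
print(matmul(A_prime, A))                       # [[1, 0, -1], [0, 1, 1], [0, 0, 0]]
print(matmul(A, right_inverse_family(2, -5)))   # [[1, 0], [0, 1]]
```

Every member of the family is a right pseudo-inverse, but none of them can be a left one, in line with the statement proved above for $m < n$.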

On the other hand, suppose $m < n$ and the rank of $A$ is $m$ (which is as large as it can get in any case). Then it is always possible to find a right pseudo-inverse. To see this, let
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{bmatrix}$$
and consider the matrix equation $AX = I$. It may be viewed as $m$ separate equations of the form
$$Ax = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad Ax = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad Ax = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},$$
one for each column of $I$. Since $r = m$, the reduced coefficient matrix has no rows of zeroes, so no inconsistency can arise, and each of these equations has a solution. (In fact each will generally have infinitely many solutions.)

Exercises for Section 6.

1. In each of the following cases, apply the Gauss-Jordan reduction process to find the complete solution, if one exists. As in the text, the answer should express the solution $x$ as a ‘particular solution’ (possibly zero) plus a linear combination of ‘basic vectors’ with the free unknowns (if any) as coefficients.
(a) $\begin{bmatrix} 1 & -6 & -4 \\ 3 & -8 & -7 \\ -2 & 2 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ -5 \\ 2 \end{bmatrix}$.
(b)     1 2 · ¸ 1 1  x1 3 2  =  .  x2 4 3 3 2 −1 1
(c) $\begin{bmatrix} 1 & -2 & 2 & 1 \\ 1 & -2 & 1 & 2 \\ 3 & -6 & 4 & 5 \\ 1 & -2 & 3 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 14 \\ 8 \end{bmatrix}$.

2. What is wrong with the following reduction and the ensuing logic?
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 1 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{array}\right].$$
The equivalent system is
$$\begin{aligned} x_1 + x_2 + x_3 &= 1 \\ x_2 + x_3 &= 0, \end{aligned}$$
which yields the general solution $x_1 = 1 - x_2 - x_3$, $x_2 = -x_3$.

3. Find a general solution vector of the system $Ax = 0$, where
(a) $A = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 2 & -1 & 1 & 0 \\ -1 & 4 & -1 & -2 \end{bmatrix}$
(b) $A = \begin{bmatrix} 1 & 3 & 4 & 0 & 2 \\ 2 & 7 & 6 & 1 & 1 \\ 4 & 13 & 14 & 1 & 3 \end{bmatrix}$

4. Consider the vectors
$$u = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \qquad v = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}$$
in $\mathbb{R}^4$. If these were vectors in $\mathbb{R}^3$, we could use the formula
$$\cos\theta = \frac{u \cdot v}{|u|\,|v|}$$
to determine the angle $\theta$ between the two vectors. In $\mathbb{R}^4$, we can’t of course talk directly about angles in the geometric sense we are familiar with. We can still use the above formula to define the angle between the two vectors. In this example, find that angle.

5. What is the rank of the coefficient matrix for each of the matrices in the previous problem?

6. What is the rank of each of the following matrices?
$$\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$$

7. Let $A$ be an $m \times n$ matrix with $m < n$, and let $r$ be its rank. Which of the following is always true, sometimes true, or never true?
(a) $r \le m < n$.
(b) $m < r < n$.
(c) $r = m$.
(d) $r = n$.
(e) $r < m$.
(f) $r = 0$.

8. (a) A system $Ax = b$ with $A$ an $m \times n$ matrix of rank $r$ will always have solutions if $m = r$. Explain.
(b) It will not have solutions for some choices of $b$ if $r < m$. Explain.

9. How do you think the rank of a product $AB$ compares to the rank of $A$? Is the former rank always $\le$, $\ge$, or $=$ the latter rank? Try some examples, make a conjecture, and see if you can prove it. Hint: Look at the number of rows of zeroes after you reduce $A$ completely to $A'$. Could further reduction transform $A'B$ to a matrix with more rows of zeroes?

10. (Optional) Find a right pseudo-inverse $A'$ for
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 1 & 1 \end{bmatrix}.$$
Note that there are infinitely many answers to this problem. You need only find one, but if you are ambitious, you can find all of them. Is there a left pseudo-inverse for $A$? If there is, find one; if not, explain why not.


11. (Optional) If $A$ is an $m \times n$ matrix with $m > n$ (more rows than columns), we showed in the text that there can be no right pseudo-inverse $A'$ for $A$. How can we use this fact to conclude that if $m < n$ (fewer rows than columns), there is no left pseudo-inverse for $A$?
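The construction of a right pseudo-inverse given in this section (solve $Ax = e_j$ for each column $e_j$ of $I$) can be sketched in Python under the assumption that $\operatorname{rank} A = m$. All $m$ systems share the coefficient matrix $A$, so the sketch reduces the single augmented matrix $[A \mid I]$. Setting every free unknown to zero then picks one answer out of the infinitely many. The function name `right_pseudo_inverse` is hypothetical; arithmetic is exact via the standard `fractions` module.

```python
from fractions import Fraction


def right_pseudo_inverse(A):
    """Return an n x m matrix X with AX = I (the m x m identity).

    Assumes rank A = m.  The m systems Ax = e_j are solved together by
    reducing [A | I]; every free unknown is set to 0.
    """
    m, n = len(A), len(A[0])
    R = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(m)]
         for i, row in enumerate(A)]
    pivots, row = [], 0
    for col in range(n):                        # pivots only in the A part
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue
        R[row], R[p] = R[p], R[row]
        piv = R[row][col]
        R[row] = [x / piv for x in R[row]]
        for i in range(m):                      # clear the rest of the pivot column
            if i != row and R[i][col] != 0:
                c = R[i][col]
                R[i] = [a - c * r for a, r in zip(R[i], R[row])]
        pivots.append(col)
        row += 1
        if row == m:
            break
    if len(pivots) < m:
        raise ValueError("rank A < m, so some Ax = e_j is inconsistent")
    X = [[Fraction(0)] * m for _ in range(n)]
    for i, pc in enumerate(pivots):             # bound unknowns; free ones stay 0
        X[pc] = R[i][n:]
    return X


X = right_pseudo_inverse([[1, 1, 0], [0, 1, 1]])
print([[int(v) for v in r] for r in X])        # [[1, -1], [0, 1], [0, 0]]
```

On the example from the text this returns the same $A'$ found there. When the rank is smaller than $m$, which is always the case if $m > n$, the sketch raises an error, in agreement with the argument that no right pseudo-inverse exists then.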

7. Homogeneous Systems and Vector Subspaces

As mentioned in the previous section, a system of equations of the form $Ax = 0$ is called homogeneous. (The ‘$b$’ for such a system consists of zeroes.) A system of the form $Ax = b$ where $b \neq 0$ is called inhomogeneous. Every inhomogeneous system has an associated homogeneous system, and the solutions of the two systems are closely related. To see this, review the example from the previous section
$$\begin{bmatrix} 1 & 1 & 2 \\ -1 & -1 & 1 \\ 1 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}.$$
We showed that its general solution has the form
$$x = \begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix} + x_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \tag{1}$$
where $x_2$ is free and may assume any value. On the other hand, it is easy to check that the homogeneous system
$$\begin{bmatrix} 1 & 1 & 2 \\ -1 & -1 & 1 \\ 1 & 1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
has the general solution
$$x = x_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \tag{2}$$
where $x_2$ is also free. (You should go back and verify that for yourself. It should be easy, since the Gauss-Jordan reduction is exactly the same; the only difference is that you have zeroes to the right of the vertical bar.) A close reading of (1) and (2) is informative. First note that if we set $x_2 = 0$ in (1), we obtain the specific solution
$$\begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix},$$
and then the remaining part of the solution is a general solution (2) of the homogeneous equation.

The phenomenon illustrated above is part of a general principle. You can always find a general solution of a consistent inhomogeneous linear system by adding one particular solution to a general solution of the corresponding homogeneous system. The reason for this is fairly clear algebraically. Let $x_0$ denote the particular solution of the inhomogeneous equation and let $x$ denote any other solution. Then we have
$$Ax = b, \qquad Ax_0 = b,$$
which simply asserts that both are solutions. Now subtract:
$$Ax - Ax_0 = b - b = 0.$$
However, since $Ax - Ax_0 = A(x - x_0)$, this yields $A(x - x_0) = 0$. From this we conclude that $z = x - x_0$ is a solution of the homogeneous system. Conversely, $A(x_0 + z) = Ax_0 + Az = b + 0 = b$ for every solution $z$ of the homogeneous system, so every such sum is a solution. Transposition yields
$$x = \underbrace{x_0}_{\text{par. sol.}} + \underbrace{z}_{\text{hom. sol.}}.$$
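The decomposition $x = x_0 + z$ can be checked numerically on the example above. The sketch below uses plain Python with exact fractions; `matvec` and `solution` are hypothetical helper names. It takes $x_0 = \langle -3, 0, 2 \rangle$ from (1) and $z$ a multiple of $\langle -1, 1, 0 \rangle$ from (2). It confirms two things:

- Every sum $x_0 + tz$ solves the inhomogeneous system.
- The difference of two solutions solves the homogeneous one.

A finite check like this illustrates the principle but of course does not prove it.

```python
from fractions import Fraction


def matvec(A, x):
    """Return the product Ax, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 1, 2], [-1, -1, 1], [1, 1, 3]]
b = [1, 5, 3]
x0 = [-3, 0, 2]     # particular solution: x2 = 0 in (1)
z = [-1, 1, 0]      # basic solution of the homogeneous system, as in (2)


def solution(t):
    """x0 + t*z: a particular solution plus a solution of Az = 0."""
    return [p + t * d for p, d in zip(x0, z)]


print(matvec(A, z))                                   # [0, 0, 0]
print(matvec(A, solution(Fraction(7, 3))) == b)       # True
x, y = solution(5), solution(-2)
print(matvec(A, [xi - yi for xi, yi in zip(x, y)]))   # [0, 0, 0]
```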

Vector Subspaces. Because of the above remarks, homogeneous systems play an especially important role, so we want to concentrate on the solution sets of such systems. Let $A$ be an $m \times n$ matrix. The set of all solutions $x$ of the homogeneous system $Ax = 0$ is called the null space of $A$. Notice that the null space of an $m \times n$ matrix is a subset of $\mathbb{R}^n$. Null spaces have an important property which we now discuss.

A non-empty subset $V$ of $\mathbb{R}^n$ is called a vector subspace if it has the property that any linear combination of vectors in $V$ is also in $V$. In symbols, if $u$ and $v$ are vectors in $V$, and $a$ and $b$ are scalars, then $au + bv$ is also a vector in $V$. In two and three dimensions, the subsets which are subspaces are pretty much what you would expect. In $\mathbb{R}^2$, any line through the origin is a subspace, but lines not through the origin are not. The diagram below indicates why.

[Figure: two lines in the plane with the origin marked O, one labeled ‘Line not through origin’ and one labeled ‘Line through origin’.]
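The picture can be backed up by a short computation. A line in $\mathbb{R}^2$ is the set of points with $ax + by = c$. If $u$ and $v$ lie on it, then $a(u_1 + v_1) + b(u_2 + v_2) = 2c$. So the sum $u + v$ lies on the same line only when $c = 0$, i.e., when the line passes through the origin.

The sketch below is plain Python. The helpers `on_line` and `combination` are hypothetical, and the particular lines $x - 2y = 0$ and $x - 2y = 2$ are chosen only for illustration. It tests a few linear combinations on each line.

```python
from fractions import Fraction


def on_line(p, a, b, c):
    """True if the point p = (x, y) satisfies a*x + b*y = c."""
    return a * p[0] + b * p[1] == c


def combination(s, u, t, v):
    """Return the linear combination s*u + t*v of two vectors in R^2."""
    return (s * u[0] + t * v[0], s * u[1] + t * v[1])


u, v = (2, 1), (-4, -2)     # two points on x - 2y = 0 (through the origin)
p, q = (2, 0), (4, 1)       # two points on x - 2y = 2 (not through the origin)

print(on_line(combination(3, u, Fraction(1, 2), v), 1, -2, 0))   # True
print(on_line(combination(1, p, 1, q), 1, -2, 2))                # False: p + q = (6, 1)
print(on_line((0, 0), 1, -2, 2))                                 # False: 0 is missing
```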


Also, curves are not vector subspaces. (See the exercises at the end of the section.) In $\mathbb{R}^3$, any line through the origin is also a subspace, and lines not through the origin are not. Similarly, planes through the origin are vector subspaces, but other planes are not, and of course curves or curved surfaces are not. There is one slightly confusing point about the way we use this terminology. The entire set $\mathbb{R}^n$ is considered a subset of itself, and it certainly has the desired property, so it is considered a vector subspace of itself.

It is not hard to see that the zero vector must be in every vector subspace $W$. Indeed, just pick any two vectors $u$ and $v$ in $W$ ($v$ could even be a multiple of $u$). Then $0 = (0)u + (0)v$, the linear combination with both scalars $a = b = 0$, must also be in $W$. The upshot is that any set which does not contain the zero vector cannot be a vector subspace. The set consisting only of the zero vector $0$ has the desired property, since any linear combination of zero with itself is also zero. Hence, that set is also a vector subspace, called the zero subspace. The term ‘subspace’ is sometimes used more generally to refer to any subset of $\mathbb{R}^n$. Hence the adjective ‘vector’ is crucial. Sometimes people use the term ‘linear subspace’ instead.

There are two ways vector subspaces come about. First of all, as noted above, they arise as null spaces, i.e., as solution sets of homogeneous systems $Ax = 0$. That is the main reason we are interested in them. To see why a null space satisfies the definition, suppose $u$ and $v$ are both solutions of $Ax = 0$. That is, $Au = 0$ and $Av = 0$. Then
$$A(au + bv) = A(au) + A(bv) = aAu + bAv = a\,0 + b\,0 = 0.$$
So any linear combination of solutions is again a solution and is again in the null space of $A$.

There is another related way in which vector subspaces arise, and this will play an important role in analyzing solutions of linear systems. Recall the homogeneous system
$$\begin{bmatrix} 1 & 2 & -1 & 0 \\ 1 & 2 & 1 & 3 \\ 2 & 4 & 0 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$
discussed in the previous section. We saw that its null space consists of all vectors of the form
$$x_2\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} -3/2 \\ 0 \\ -3/2 \\ 1 \end{bmatrix}$$
as the free scalars $x_2$ and $x_4$ range over all possible values. Let
$$v_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -3/2 \\ 0 \\ -3/2 \\ 1 \end{bmatrix}.$$


Then, what we have discovered is that the solution set, or null space, consists of all linear combinations of the set $\{v_1, v_2\}$ of vectors. This is a much more useful way of presenting the answer, since we specify it in terms of a small number of objects (in this case just two). Since the null space itself is infinite, this simplifies things considerably.

In general, suppose $W$ is a vector subspace of $\mathbb{R}^n$ and $\{v_1, v_2, \ldots, v_k\}$ is a finite subset of $W$. We say that $\{v_1, v_2, \ldots, v_k\}$ is a spanning set for $W$ (or more simply that it spans $W$) if each vector $v$ in $W$ can be expressed as a linear combination
$$v = s_1v_1 + s_2v_2 + \cdots + s_kv_k$$
for appropriate scalars $s_1, s_2, \ldots, s_k$. The simplest case of this is when $k = 1$, i.e., the spanning set consists of a single vector $v$. Then the subspace spanned by this vector is just the set of all $sv$ with $s$ an arbitrary scalar. If $v \neq 0$, this set is just the line through the origin containing $v$.

Example. Consider the set of solutions $x$ in $\mathbb{R}^4$ of the single homogeneous equation
$$x_1 - x_2 + x_3 - 2x_4 = 0.$$
This is the null space of the $1 \times 4$ matrix
$$A = \begin{bmatrix} 1 & -1 & 1 & -2 \end{bmatrix}.$$

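The matrix is already reduced, with pivot 1 in the 1, 1-position. The general solution is: $x_2, x_3, x_4$ free, $x_1 = x_2 - x_3 + 2x_4$. The general solution vector is
$$x = \begin{bmatrix} x_2 - x_3 + 2x_4 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = x_2\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + x_4\begin{bmatrix} 2 \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$
It follows that the null space is spanned by
$$\left\{ v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix},\; v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix},\; v_3 = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$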

This is a special case of a more general principle: Gauss-Jordan reduction for a homogeneous system always results in a description of the null space as the vector subspace spanned by a finite set of basic solution vectors. We shall elaborate a bit more on this principle in the next section.
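This principle can be sketched in Python, following the same steps as the example above:

- Reduce $A$.
- Produce one basic solution vector for each free unknown, setting that unknown to 1 and the other free unknowns to 0.

This is a minimal illustration with hypothetical function names (`rref`, `null_space_basis`) and exact `fractions` arithmetic. It clears each pivot column above and below in one pass, which gives the same reduced matrix as doing the Gaussian and Jordan parts separately.

```python
from fractions import Fraction


def rref(A):
    """Gauss-Jordan reduction of A; returns (reduced matrix, pivot columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, row = [], 0
    for col in range(n):
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                            # no pivot in this column: free unknown
        R[row], R[p] = R[p], R[row]
        piv = R[row][col]
        R[row] = [x / piv for x in R[row]]
        for i in range(m):
            if i != row and R[i][col] != 0:
                c = R[i][col]
                R[i] = [a - c * r for a, r in zip(R[i], R[row])]
        pivots.append(col)
        row += 1
        if row == m:
            break
    return R, pivots


def null_space_basis(A):
    """One basic solution of Ax = 0 per free unknown (that unknown 1, other free 0)."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -R[i][f]                    # bound unknown in terms of the free one
        basis.append(v)
    return basis


print([[int(t) for t in v] for v in null_space_basis([[1, -1, 1, -2]])])
# [[1, 1, 0, 0], [-1, 0, 1, 0], [2, 0, 0, 1]]
```

Applied to the $3 \times 4$ homogeneous system recalled earlier in this section, the same function returns the two vectors $v_1 = \langle -2, 1, 0, 0 \rangle$ and $v_2 = \langle -3/2, 0, -3/2, 1 \rangle$.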


Exercises for Section 7.

1. What is the general solution of the equation $x_1 - 2x_2 + x_3 = 4$? Express it as the sum of a particular solution plus the general solution of the equation $x_1 - 2x_2 + x_3 = 0$.

2. Determine if each of the following subsets of $\mathbb{R}^3$ is a vector subspace of $\mathbb{R}^3$. If it is not a subspace, explain what fails.
(a) The set of all $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ such that $2x_1 - x_2 + 4x_3 = 0$.
(b) The set of all $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ such that $2x_1 - x_2 + 4x_3 = 3$.
(c) The set of all $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ such that $x_1^2 + x_2^2 - x_3^2 = 1$.
(d) The set of all $x$ of the form $x = \begin{bmatrix} 1 + 2t \\ -3t \\ 2t \end{bmatrix}$, where $t$ is allowed to assume any real value.
(e) The set of all $x$ of the form $x = \begin{bmatrix} s + 2t \\ 2s - 3t \\ s + 2t \end{bmatrix}$, where $s$ and $t$ are allowed to assume any real values.

3. Let $L_1$ and $L_2$ be two distinct lines through the origin in $\mathbb{R}^2$. Is the set $S$ consisting of all vectors pointing along one or the other of these two lines a vector subspace of $\mathbb{R}^2$?

4. Let
$$u_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}.$$
What is the subspace of $\mathbb{R}^3$ spanned by these two vectors? Describe it another way.

5. (a) What is the subspace of $\mathbb{R}^3$ spanned by the set
$$\left\{ v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix},\; v_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},\; v_3 = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}?$$
(b) What is the subspace of $\mathbb{R}^3$ spanned by
$$\left\{ v_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix},\; v_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix},\; v_3 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}?$$


6. (a) Find a spanning set for the plane in $\mathbb{R}^3$ through the origin defined by the equation $x_1 - 2x_2 + 5x_3 = 0$. Check that each element of your spanning set is perpendicular to the normal vector with components $\langle 1, -2, 5 \rangle$.
(b) Find a spanning set for the line in $\mathbb{R}^2$ through the origin defined by the equation $x_1 + x_2 = 0$.

8. Linear Independence, Bases, and Dimension

Let $V$ be a vector subspace of $\mathbb{R}^n$. If $V$ is not the zero subspace, it will have infinitely many elements. However, it turns out that it is always possible to specify $V$ as the subspace spanned by some finite subset $\{v_1, v_2, \ldots, v_k\}$ of elements of $V$. (If you are curious why, see the appendix to this section, where it is proved.) When doing this, we want to make sure that we don’t have any superfluous vectors in the set $\{v_1, v_2, \ldots, v_k\}$.

Example 1. Let
$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
The subspace of $\mathbb{R}^3$ spanned by these two vectors is a plane through the origin.

Neither of these vectors is superfluous, since if you omit either, what you get is the line through the origin containing the other. You don’t get the entire plane. Consider instead the vectors
$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 2 \\ 4 \\ 8 \end{bmatrix}.$$


I. LINEAR ALGEBRA, BASIC NOTIONS

In this case, the second vector is twice the first vector. Hence, for any linear combination, we have
$$s_1 v_1 + s_2 v_2 = s_1 v_1 + s_2 (2 v_1) = (s_1 + 2 s_2) v_1.$$
If we put $s = s_1 + 2 s_2$, then $s$ also ranges over all possible scalars as $s_1$ and $s_2$ do, so the subspace in fact consists of all $s v_1$; that is, it is a line through the origin. Thus, the vector $v_2$ may be dropped. Similarly, we could have kept $v_2$ and eliminated $v_1$, since $v_1 = (1/2) v_2$. In any case, one of the two vectors is superfluous.

In order to deal with the issue of superfluous vectors in a spanning set, we introduce an important new concept. Let $\{v_1, v_2, \ldots, v_k\}$ be a non-empty set of vectors in $\mathbb{R}^n$, not all of which are zero. Such a set is called linearly independent if no element of the set can be expressed as a linear combination of the other elements in the set. For a set $\{v_1, v_2\}$ with two vectors, this is the same as saying that neither vector is a scalar multiple of the other. For a set $\{v_1, v_2, v_3\}$ with three elements, it means that no relation of any of the following forms is possible:
$$v_1 = a_2 v_2 + a_3 v_3, \qquad v_2 = b_1 v_1 + b_3 v_3, \qquad v_3 = c_1 v_1 + c_2 v_2.$$
The opposite of 'linearly independent' is 'linearly dependent'. Thus, in a linearly dependent set, there is at least one vector which is expressible as a linear combination of the others. It is important to note that linear independence and linear dependence are properties of the entire set, not of the individual vectors in the set.

Example 2. Consider the set consisting of the following four vectors in $\mathbb{R}^4$:
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}.$$
This set is not linearly independent, since
$$v_2 = v_1 - v_3. \tag{1}$$
Thus, any element in the subspace spanned by $\{v_1, v_2, v_3, v_4\}$ can be rewritten
$$\begin{aligned} c_1 v_1 + c_2 v_2 + c_3 v_3 + c_4 v_4 &= c_1 v_1 + c_2 (v_1 - v_3) + c_3 v_3 + c_4 v_4 \\ &= (c_1 + c_2) v_1 + (c_3 - c_2) v_3 + c_4 v_4 = c_1' v_1 + c_3' v_3 + c_4 v_4. \end{aligned}$$


On the other hand, if we delete the element $v_2$, the set consisting of the vectors
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}$$
is linearly independent. To see this, just look carefully at the pattern of zeroes. For example, $v_1$ has first component 1, and the other two have first component 0, so $v_1$ could not be a linear combination of $v_3$ and $v_4$. Similar arguments eliminate the other two possible relations. In the above example, we could just as well have written $v_1 = v_2 + v_3$ and eliminated $v_1$ from the spanning set without loss. In general, there are many possible ways to delete superfluous vectors from a spanning set.

There are a couple of slightly confusing points about the definition of linear independence. First, the set $\{v\}$ consisting of a single nonzero vector $v$ is linearly independent. (For, there are no other vectors in the set of which $v$ could be a linear combination.) The set $\{0\}$ consisting only of the zero vector is not covered by the definition, but, for technical reasons, we specify that it is linearly dependent. Also, for technical reasons, we specify that the empty set, that is, the set with no vectors, is linearly independent.

Bases. Let $V$ be a vector subspace of $\mathbb{R}^n$. A subset $\{v_1, v_2, \ldots, v_k\}$ of $V$ which spans it and is also linearly independent is called a basis for $V$. A simple example of a basis is the set
$$\left\{ e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\ e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},\ e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},$$
which is a basis for $\mathbb{R}^3$. (These vectors are usually called $i$, $j$, and $k$ in three-dimensional vector algebra. They are the unit vectors pointing along the coordinate axes.) To see that this set is linearly independent, notice that the pattern of ones and zeroes precludes any one of them being expressible as a linear combination of the others. Each is one where the others are zero.
To see that the set spans $\mathbb{R}^3$, note that we can write any vector $x$ in $\mathbb{R}^3$ as
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = x_1 e_1 + x_2 e_2 + x_3 e_3.$$
In general, let $e_i$ be the vector in $\mathbb{R}^n$ with all entries zero except the $i$th entry, which is one. (One may also describe this vector as the $i$th column of the $n \times n$ identity matrix.) Then, arguing as above, it is not hard to see that $\{e_1, e_2, \ldots, e_n\}$ is a basis for $\mathbb{R}^n$. It is called the standard basis for $\mathbb{R}^n$.

There are many other bases for $\mathbb{R}^n$. Indeed, it turns out that any linearly independent set in $\mathbb{R}^n$ with $n$ elements is necessarily a basis for $\mathbb{R}^n$. (See the Appendix below for an explanation.) It is more interesting, perhaps, to consider bases for proper subspaces of $\mathbb{R}^n$. Many algorithms for solving linear problems in mathematics and its applications yield bases. Let $A$ be an $m \times n$ matrix, and let $W$ be the null space of $A$, i.e., the solution space in $\mathbb{R}^n$ of the homogeneous system $Ax = 0$. The Gauss-Jordan reduction method always generates a basis for the null space $W$. We illustrate this with an example. (You should also go back and look at Example 2 in Section 6.)

Example 3. Consider
$$\begin{bmatrix} 1 & 1 & 0 & 3 & -1 \\ 1 & 1 & 1 & 2 & 1 \\ 2 & 2 & 1 & 5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = 0.$$
To solve it, apply Gauss-Jordan reduction:
$$\left[\begin{array}{ccccc|c} 1 & 1 & 0 & 3 & -1 & 0 \\ 1 & 1 & 1 & 2 & 1 & 0 \\ 2 & 2 & 1 & 5 & 0 & 0 \end{array}\right] \to \left[\begin{array}{ccccc|c} 1 & 1 & 0 & 3 & -1 & 0 \\ 0 & 0 & 1 & -1 & 2 & 0 \\ 0 & 0 & 1 & -1 & 2 & 0 \end{array}\right] \to \left[\begin{array}{ccccc|c} 1 & 1 & 0 & 3 & -1 & 0 \\ 0 & 0 & 1 & -1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
The last matrix is fully reduced with pivots in the $(1,1)$ and $(2,3)$ positions. The corresponding system is
$$\begin{aligned} x_1 + x_2 + 3x_4 - x_5 &= 0 \\ x_3 - x_4 + 2x_5 &= 0 \end{aligned}$$
with $x_1, x_3$ bound and $x_2$, $x_4$, and $x_5$ free. Expressing the bound variables in terms of the free variables yields
$$\begin{aligned} x_1 &= -x_2 - 3x_4 + x_5 \\ x_3 &= x_4 - 2x_5. \end{aligned}$$
The general solution vector, when expressed in terms of the free variables, is
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -x_2 - 3x_4 + x_5 \\ x_2 \\ x_4 - 2x_5 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_2 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} -3x_4 \\ 0 \\ x_4 \\ x_4 \\ 0 \end{bmatrix} + \begin{bmatrix} x_5 \\ 0 \\ -2x_5 \\ 0 \\ x_5 \end{bmatrix} = x_2 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -3 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} + x_5 \begin{bmatrix} 1 \\ 0 \\ -2 \\ 0 \\ 1 \end{bmatrix}.$$
If we put
$$v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ -2 \\ 0 \\ 1 \end{bmatrix},$$

and $c_1 = x_2$, $c_2 = x_4$, and $c_3 = x_5$, then the general solution takes the form
$$x = c_1 v_1 + c_2 v_2 + c_3 v_3,$$
where the scalars $c_1, c_2, c_3$ (being new names for the free variables) can assume any values. Also, the set $\{v_1, v_2, v_3\}$ is linearly independent. This is clear for the following reason. Each vector is associated with one of the free variables and has a 1 in that position, where the other vectors necessarily have zeroes. Hence, none of the vectors can be a linear combination of the others. It follows that $\{v_1, v_2, v_3\}$ is a basis for the null space.

The above example illustrates all the important aspects of the solution process for a homogeneous system $Ax = 0$. We state the important facts about the solution without going through the general proofs, since they are just the same as what we did in the example but with a lot more confusing notation. The general solution has the form
$$x = s_1 v_1 + s_2 v_2 + \cdots + s_k v_k,$$
where $v_1, v_2, \ldots, v_k$ are basic solutions obtained by successively setting each free variable equal to 1 and the other free variables equal to zero, and $s_1, s_2, \ldots, s_k$ are just new names for the free variables. The set $\{v_1, v_2, \ldots, v_k\}$ is linearly independent because of the pattern of ones and zeroes at the positions of the free variables, and since it spans the null space, it is a basis for the null space of $A$. The dimension of the null space of $A$ is the nullity of $A$, i.e., it is the number of free variables in the solution of the homogeneous system $Ax = 0$.

There are some special cases which are a bit confusing. First, if $k = 1$, the basis consists of a single vector $v_1$, and the set of solutions consists of all multiples of that vector. A much more confusing case is that in which the spanning set is the empty set, i.e., the set with no elements. That would arise if the zero solution were the unique solution of the homogeneous system, so there would be no free variables and no basic solutions. This is dealt with as follows. First, as noted earlier, the empty set is linearly independent by convention. Second, again by convention, every linear combination of no vectors is set to zero. It follows that the empty set spans the zero subspace $\{0\}$ and is a basis for it.

Let $V$ be a vector subspace of $\mathbb{R}^n$. If $V$ has a basis $\{v_1, v_2, \ldots, v_k\}$ with $k$ elements, then we say that $V$ is $k$-dimensional. That is, the dimension of a vector subspace is the number of elements in a basis. Not too surprisingly, for the extreme case $V = \mathbb{R}^n$, the dimension is $n$. For, the standard basis $\{e_1, e_2, \ldots, e_n\}$ has $n$ elements.
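The procedure described above (reduce the matrix, identify the free variables, then set each free variable to 1 in turn and the others to 0) can be sketched in Python as follows. This is a minimal illustration, not part of the original text: the function names `rref` and `null_space_basis` are our own, and exact rational arithmetic with the standard `fractions.Fraction` type is assumed so that no rounding occurs. It is applied to the matrix of Example 3.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction; returns (reduced matrix, list of pivot columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        # look for a nonzero entry in column c at or below row r
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c belongs to a free variable
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]  # make the pivot equal to 1
        for i in range(m):  # clear column c above and below the pivot
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def null_space_basis(A):
    """Basic solutions of Ax = 0: one per free variable, set to 1 while the others are 0."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        # row k of the reduced matrix reads x_p + (terms in free variables) = 0, p = pivots[k]
        for k, p in enumerate(pivots):
            v[p] = -R[k][free]
        basis.append(v)
    return basis

A = [[1, 1, 0, 3, -1],
     [1, 1, 1, 2, 1],
     [2, 2, 1, 5, 0]]
for v in null_space_basis(A):
    print([int(x) for x in v])
# prints:
# [-1, 1, 0, 0, 0]
# [-3, 0, 1, 1, 0]
# [1, 0, -2, 0, 1]
```

Each printed vector can be checked directly against $Ax = 0$, and the entries in the free positions 2, 4, 5 show the pattern of ones and zeroes that makes the set linearly independent.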


In this chapter we have defined the concept of dimension only for vector subspaces of $\mathbb{R}^n$, but the notion is considerably more general. For example, a plane in $\mathbb{R}^3$ should be considered two-dimensional even if it doesn't pass through the origin. Also, a surface in $\mathbb{R}^3$, e.g., a sphere or hyperboloid, should be considered two-dimensional. (People are often confused about curved objects because they seem to extend in extra dimensions. The point is that if you look at a small part of a surface, it normally looks like a piece of a plane, so it has the same dimension. Also, a surface can usually be represented parametrically with only two parameters.) Mathematicians have developed a very general theory of dimension which applies to almost any type of set. In cosmology, one envisions the entire universe as a certain type of four-dimensional object. Certain bizarre sets can even have a fractional dimension, and that concept is useful in what is called 'chaos' theory.

Coordinates. Let $V$ be a subspace of $\mathbb{R}^n$ and suppose $\{v_1, v_2, \ldots, v_k\}$ is a basis for $V$. Suppose $v$ is any vector of $V$. Then
$$v = s_1 v_1 + s_2 v_2 + \cdots + s_k v_k$$
for appropriate coefficients $s_1, s_2, \ldots, s_k$. The coefficients in such a linear combination are unique, and are called the coordinates of the vector $v$ with respect to the basis. We illustrate this with an example which shows how to find coordinates and why they are unique.

Example 4. Consider the plane $V$ in $\mathbb{R}^3$ spanned by the linearly independent pair of vectors
$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix},$$
and consider the vector
$$v = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}.$$
If $v$ is in $V$, then it can be written
$$v = s_1 v_1 + s_2 v_2 = s_1 \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} + s_2 \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} s_1 + \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} s_2.$$
Here, we have rewritten the linear combination with the scalars on the right. The advantage of doing so is that we may re-express it as a matrix product. Namely,
$$\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} s_1 + \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} s_2 = \begin{bmatrix} s_1 + s_2 \\ 2s_1 + s_2 \\ 4s_1 + 3s_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}.$$
Hence, asking if $v = v_1 s_1 + v_2 s_2$ amounts to asking if
$$\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$


has a solution $s_1, s_2$. This is a system of 3 equations in two unknowns (with the 'given' vector on the left instead of, as usual, on the right). It may be solved by Gauss-Jordan reduction as follows:
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 1 & 3 \\ 4 & 3 & 5 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & -1 & 1 \\ 0 & -1 & 1 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{array}\right].$$
Thus, it has the unique solution $s_1 = 2$, $s_2 = -1$. Thus,
$$\begin{bmatrix} 2 \\ -1 \end{bmatrix}$$
is the coordinate vector giving the coordinates of $v$ with respect to this basis, i.e., $v$ can be written uniquely
$$v = 2v_1 - v_2 = v_1 (2) + v_2 (-1) = [\, v_1 \;\; v_2 \,] \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$

There are a couple of points about the above example which merit some discussion. First, had the system not had a solution, that would just have meant that the vector $v$ was not in fact in the subspace spanned by $\{v_1, v_2\}$. Second, the solution was unique because the rank was as large as possible, in this case two, and there were no free variables. If the rank had been smaller than two, then the corresponding homogeneous system
$$v_1 s_1 + v_2 s_2 = [\, v_1 \;\; v_2 \,] \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} = 0$$
would necessarily have had non-zero solutions. However, any such solution with, say, $s_2 \neq 0$ would have allowed us to express
$$v_2 = -v_1 \frac{s_1}{s_2},$$
which would contradict the linear independence of the basis.

The general case is similar. If $\{v_1, v_2, \ldots, v_k\}$ is a basis for the subspace $V$ of $\mathbb{R}^n$, then $v$ is in this subspace if and only if the system
$$v = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_k \end{bmatrix}$$
has a solution. (Note that the 'given' vector $v$ is on the left rather than on the right as usual.) The coefficient matrix $[\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,]$ is an $n \times k$ matrix with columns the elements of the basis. It is necessarily of largest possible rank $k$, and the solution of the system is unique. Otherwise, the corresponding homogeneous system
$$[\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} s_1 \\ \vdots \\ s_k \end{bmatrix} = 0$$


would have non-trivial solutions, and that would contradict the linear independence of the basis.

Given a basis for a vector subspace $V$, one may think of the elements of the basis as unit vectors pointing along coordinate axes for $V$. The coordinates with respect to the basis then are the coordinates relative to these axes. The case $V = \mathbb{R}^n$ is especially enlightening. Implicitly at least, one starts in $\mathbb{R}^n$ with the standard basis consisting of the vectors
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$
However, there are many other possible bases for $\mathbb{R}^n$ which might be useful in some applications. The axes associated with such a basis need not be mutually perpendicular, and the units of length along these axes may differ.

Appendix. Some subtleties. We discuss here some of the subtleties of the theory. This should be of interest to mathematics majors and some others who enjoy theory, but it is not essential for understanding the subject matter.

First, we explain why any linearly independent subset $\{v_1, v_2, \ldots, v_n\}$ of $\mathbb{R}^n$ with exactly $n$ elements is necessarily a basis for $\mathbb{R}^n$. Namely, we saw that the linear independence of the set assures us that the $n \times n$ matrix $[\, v_1 \;\; v_2 \;\; \cdots \;\; v_n \,]$ has rank $n$. Hence, it follows from our general theory that
$$v = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_n \,] \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_n \end{bmatrix}$$
will have a solution $s_1, s_2, \ldots, s_n$ for every $v$ in $\mathbb{R}^n$. That is, every vector in $\mathbb{R}^n$ is expressible as a linear combination of $\{v_1, v_2, \ldots, v_n\}$.

Next, we investigate some subtle points involved in the definition of dimension.

Invariance of dimension. The dimension of $V$ is the number of elements in a basis for $V$, but it is at least conceivable that two different bases have different numbers of elements. If that were the case, $V$ would have two different dimensions, and that does not square with our idea of how such words should be used. In fact, it can never happen that two different bases have different numbers of elements. To see this, we shall prove something slightly different. Suppose $V$ has a basis with $k$ elements. We shall show that any linearly independent subset of $V$ has at most $k$ elements. This would suffice for what we want, because if we had two bases, one with $k$ and the other with $m$ elements, either could play the role of the basis and the other the role of the linearly independent set. (Any basis is also linearly independent!) Hence, on the one hand we would have $k \le m$ and on the other hand $m \le k$, whence it follows that $m = k$.

Here is the proof of the above assertion about linearly independent subsets. Let $\{u_1, u_2, \ldots, u_m\}$ be a linearly independent subset of $V$. Each $u_i$ can be expressed uniquely in terms of the basis $\{v_1, v_2, \ldots, v_k\}$:
$$\begin{aligned} u_1 &= \sum_{j=1}^{k} v_j p_{j1} = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} p_{11} \\ p_{21} \\ \vdots \\ p_{k1} \end{bmatrix}, \\ u_2 &= \sum_{j=1}^{k} v_j p_{j2} = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} p_{12} \\ p_{22} \\ \vdots \\ p_{k2} \end{bmatrix}, \\ &\;\;\vdots \\ u_m &= \sum_{j=1}^{k} v_j p_{jm} = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} p_{1m} \\ p_{2m} \\ \vdots \\ p_{km} \end{bmatrix}. \end{aligned}$$
Each of these equations represents one column of the complete matrix equation
$$[\, u_1 \;\; u_2 \;\; \cdots \;\; u_m \,] = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{km} \end{bmatrix}.$$
Note that the matrix on the right is a $k \times m$ matrix. Consider the homogeneous system
$$\begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{km} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = 0.$$
Assume, contrary to what we hope, that $m > k$. Then we know, by the theory of homogeneous linear systems, that there is a non-trivial solution to this system, i.e., one with at least one $x_i$ not zero. (A homogeneous system with more unknowns than equations always has free variables.) Then
$$[\, u_1 \;\; u_2 \;\; \cdots \;\; u_m \,] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,] \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{km} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} = 0.$$
Thus, 0 has a non-trivial representation
$$0 = u_1 x_1 + u_2 x_2 + \cdots + u_m x_m,$$
which we know can never happen for a linearly independent set. (If, say, $x_i \neq 0$, we could solve this equation for $u_i$ as a linear combination of the other $u_j$.) Thus, the only way out of this contradiction is to believe that $m \le k$, as claimed.

One consequence of this argument is the following fact. If $V$ and $W$ are subspaces of $\mathbb{R}^n$ with $V \subseteq W$, then the dimension of $V$ is less than or equal to the dimension of $W$, i.e., a subspace cannot have a larger dimension than a subspace containing it. The reasoning is that a basis for $V$ is necessarily a linearly independent subset of $W$, and so it cannot have more elements than the dimension of $W$.

It is important to note that two different bases of the same vector space might have no elements whatsoever in common. All we can be sure of is that they have the same size.

Existence of Bases. We assumed implicitly in our discussion of subspaces that every subspace $V$ does in fact have a basis. The following argument shows that this is true. Start by choosing a sequence of vectors $v_1, v_2, v_3, \ldots$ in $V$, but make sure that at each stage the next vector $v_p$ that you choose is not a linear combination of the previous vectors $v_1, v_2, \ldots, v_{p-1}$. It is possible to show that the finite set $\{v_1, v_2, \ldots, v_p\}$ is always linearly independent. (The vector $v_p$ is not a linear combination of the others by construction, but you have to fiddle a bit to show that none of the previous ones are linear combinations involving $v_p$.) The only question then is whether or not this sequence can go on forever. It can't do that, since eventually we would get a linearly independent subset of $\mathbb{R}^n$ with $n + 1$ elements, and since $\mathbb{R}^n$ has dimension $n$, that is impossible. Hence, the sequence stops, since, at some stage, we can't choose any vector in $V$ which is not a linear combination of the set of vectors so far chosen. Thus, that set spans $V$, and since, as just noted, it is linearly independent, it is a basis.

Exercises for Section 8.

1. In each of the following cases, determine if the indicated set is linearly independent or not.

(a) $\left\{ \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} \right\}$.

(b) $\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$.

2. Find a basis for the subspace of $\mathbb{R}^4$ consisting of solutions of the homogeneous system
$$\begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 2 & -1 & 1 \\ 0 & 3 & -2 & 2 \end{bmatrix} x = 0.$$

3. Find the dimension of the nullspace of $A$ in each of the following cases. (See the Exercises for Section 6.)

(a) $A = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 2 & -1 & 1 & 0 \\ -1 & 4 & -1 & -2 \end{bmatrix}$ (b) $A = \begin{bmatrix} 1 & 3 & 4 & 0 & 2 \\ 2 & 7 & 6 & 1 & 1 \\ 4 & 13 & 14 & 1 & 3 \end{bmatrix}$

4. Can the zero vector be an element of a linearly independent set?

5. (Optional) Let $\{v_1, v_2, v_3\}$ be a subset of $\mathbb{R}^n$. Show that the set is linearly independent if and only if the equation
$$0 = c_1 v_1 + c_2 v_2 + c_3 v_3$$
has only the trivial solution, i.e., all the coefficients $c_1 = c_2 = c_3 = 0$. The generalization of this to $n$ vectors is very convenient to use when proving a set is linearly independent. It is often taken as the definition of linear independence in books on linear algebra.

6. Let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
(a) Show that $\{v_1, v_2, v_3\}$ is a linearly independent set. Hint. If one of the three vectors were a linear combination of the other two, what relation would it have to the cross product of those two?
(b) Why can you conclude that it is a basis for $\mathbb{R}^3$?
(c) Find the coordinates of $v = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ with respect to this basis.

7. Show that
$$u_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad\text{and}\quad u_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
form a linearly independent pair in $\mathbb{R}^2$. It follows that they form a basis for $\mathbb{R}^2$. Why? Find the coordinates of $e_1$ and $e_2$ with respect to this new basis. Hint. You need to solve
$$[\, u_1 \;\; u_2 \,] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = e_1 \quad\text{and}\quad [\, u_1 \;\; u_2 \,] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = e_2.$$
You can solve these simultaneously by solving
$$[\, u_1 \;\; u_2 \,] X = I$$
for an appropriate $2 \times 2$ matrix $X$. What does this have to do with inverses?

8. Let
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}.$$
(a) Show that $\{v_1, v_2\}$ is a basis for the subspace $W$ of $\mathbb{R}^3$ that it spans.
(b) Is $v = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}$ in this subspace? If so, find its coordinates with respect to the basis.

9. (Optional) It is possible to consider infinite sequences of the form
$$x = (x_1, x_2, \ldots, x_n, \ldots)$$
to be 'infinite-dimensional' vectors. The set $\mathbb{R}^\infty$ of all of these is a generalization of a vector space, and many of the concepts we developed for $\mathbb{R}^n$ apply to it. Such sequences are added by adding corresponding entries, and a sequence is multiplied by a scalar by multiplying each entry by that scalar. Let $e_i$ be the vector in $\mathbb{R}^\infty$ with $x_i = 1$ and all other entries zero.
(a) Show that the set $\{e_1, e_2, \ldots, e_n\}$ of the first $n$ of these is a linearly independent set for each $n$. Thus, there is no upper bound on the size of a linearly independent subset of $\mathbb{R}^\infty$.
(b) Does the set of all possible $e_i$ span $\mathbb{R}^\infty$? Explain.
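The coordinate computation of Example 4 in Section 8 (reducing the augmented matrix $[\, v_1 \; \cdots \; v_k \mid v \,]$) can be sketched in Python as follows. This is a minimal illustration with a hypothetical helper name, `coordinates`; it assumes the given vectors are meant to be linearly independent and uses exact `fractions.Fraction` arithmetic. It returns `None` when the reduction leaves a row reading $0 = c$ with $c \neq 0$, i.e., when $v$ is not in the subspace, and raises an error if some column has no pivot.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Solve [v1 ... vk] s = v by Gauss-Jordan reduction.

    Returns the list of coordinates s, or None if v is not in the span.
    Raises ValueError if the given vectors are linearly dependent.
    """
    n, k = len(v), len(basis)
    # augmented matrix: row i holds the i-th components of v1, ..., vk, then of v
    M = [[Fraction(basis[j][i]) for j in range(k)] + [Fraction(v[i])] for i in range(n)]
    for c in range(k):
        # with independent columns, the pivot for column c lands in row c
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("the vectors are linearly dependent")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    # the remaining rows now read 0 = M[i][k]; a nonzero entry means no solution
    if any(M[i][k] != 0 for i in range(k, n)):
        return None
    return [M[i][k] for i in range(k)]

v1, v2 = [1, 2, 4], [1, 1, 3]
print([str(s) for s in coordinates([v1, v2], [1, 3, 5])])  # ['2', '-1']
print(coordinates([v1, v2], [1, 0, 0]))                     # None
```

The first call reproduces $v = 2v_1 - v_2$; the second shows a vector outside the plane spanned by $v_1$ and $v_2$.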

9. Calculations in $\mathbb{R}^n$

Let $\{v_1, v_2, \ldots, v_k\}$ be a collection of vectors in $\mathbb{R}^n$. It is a consequence of our discussion of coordinates in the previous section that the set is linearly independent if and only if the $n \times k$ matrix $[\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,]$ has rank $k$. In that case, the set is a basis for the subspace $W$ that it spans.
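The rank test just stated is easy to mechanize. The sketch below is our own illustration, not the author's code: `rank` counts the pivots found by Gaussian elimination in exact rational arithmetic, and `is_linearly_independent` (a hypothetical name) puts the vectors in as the columns of a matrix and compares its rank with $k$.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots produced by Gaussian elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):  # clear the entries below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True iff the n x k matrix with the given vectors as columns has rank k."""
    n = len(vectors[0])
    A = [[v[i] for v in vectors] for i in range(n)]
    return rank(A) == len(vectors)

print(is_linearly_independent([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # True
print(is_linearly_independent([[1, 2, 4], [2, 4, 8]]))             # False
```

The first set is the one treated in Example 1; the second is the dependent pair from Example 1 of Section 8, where $v_2 = 2v_1$.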

9. CALCULATIONS IN Rn

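Example 1. Is
$$\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}$$
a linearly independent set? To test this, find the rank of the matrix with these vectors as columns:
$$\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{bmatrix}.$$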

It is clear without proceeding further that the rank is three, so the set is linearly independent.

More generally, suppose $\{v_1, v_2, \ldots, v_k\}$ is a set of vectors in $\mathbb{R}^n$ which may or may not be linearly independent. It is often useful to have a way to pick out a linearly independent subset of the set which spans the same subspace $W$ as the original set. Then that subset will be a basis for $W$. The basic idea (no pun intended) is to throw away superfluous vectors until that is no longer possible, but there is a systematic way to do this all at once. Since the vectors $v_i$ are elements of $\mathbb{R}^n$, each may be specified as an $n \times 1$ column vector. Put these vectors together to form an $n \times k$ matrix
$$A = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_k \,].$$
To find a basis, apply Gaussian reduction to the matrix $A$, and pick out the columns of $A$ which in the transformed reduced matrix end up with pivots.

Example 2. Let
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 2 \\ 4 \\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ -2 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}.$$
Form the matrix $A$ with these columns and apply Gaussian reduction:
$$\begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 1 & 4 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 0 & 2 & 1 & 1 \\ 0 & -2 & -1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$


This completes the Gaussian reduction, and the pivots are in the first, second, and fourth columns. Hence, the vectors
$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 2 \\ 4 \\ 0 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
form a basis for the subspace spanned by $\{v_1, v_2, v_3, v_4\}$.

Let's look more closely at this example to see why the subset is linearly independent and also spans the same subspace as the original set. The proof that the algorithm works in the general case is more complicated to write down but just elaborates the ideas exhibited in the example. Consider the homogeneous system $Ax = 0$. This may also be written
$$Ax = [\, v_1 \;\; v_2 \;\; v_3 \;\; v_4 \,] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = v_1 x_1 + v_2 x_2 + v_3 x_3 + v_4 x_4 = 0.$$
In the general solution, $x_1$, $x_2$, and $x_4$ will be bound variables (from the pivot positions) and $x_3$ will be free. That means we can set $x_3$ equal to anything, say $x_3 = -1$, and the other variables will be determined. For this choice, the relation becomes
$$v_1 x_1 + v_2 x_2 - v_3 + v_4 x_4 = 0,$$
which may be rewritten
$$v_3 = x_1 v_1 + x_2 v_2 + x_4 v_4.$$
Thus, $v_3$ is superfluous and may be eliminated from the set without changing the subspace spanned by the set. On the other hand, the set $\{v_1, v_2, v_4\}$ is linearly independent, since if we were to apply Gaussian reduction to the matrix
$$A' = [\, v_1 \;\; v_2 \;\; v_4 \,],$$
the reduced matrix would have a pivot in every column, i.e., it would have rank 3. Thus, the system
$$[\, v_1 \;\; v_2 \;\; v_4 \,] \begin{bmatrix} x_1 \\ x_2 \\ x_4 \end{bmatrix} = v_1 x_1 + v_2 x_2 + v_4 x_4 = 0$$
has only the trivial solution. That means that no one of the three vectors can be expressed as a linear combination of the other two. For example, if $v_2 = c_1 v_1 + c_4 v_4$, we would have
$$v_1 c_1 + v_2 (-1) + v_4 c_4 = 0,$$
which is a non-trivial solution of the system above. It follows that the set is linearly independent.
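The pivot-column method of this section can be sketched as follows. This is our own illustration, with hypothetical function names and exact `fractions.Fraction` arithmetic. Besides picking out the pivot columns, it reads off, from each non-pivot column of the fully reduced matrix, the coefficients expressing the corresponding superfluous vector in terms of the basis vectors; this works because row operations do not change the linear relations among the columns.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan reduction; returns (reduced matrix, list of pivot columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def pivot_columns_and_relations(vectors):
    """Pick the pivot columns of A = [v1 ... vk] and express each other column in terms of them."""
    n = len(vectors[0])
    A = [[v[i] for v in vectors] for i in range(n)]  # the vectors become the columns of A
    R, pivots = rref(A)
    # for a non-pivot column c: v_c = sum over k of R[k][c] * v_{pivots[k]}
    relations = {c: [R[k][c] for k in range(len(pivots))]
                 for c in range(len(vectors)) if c not in pivots}
    return pivots, relations

A_cols = [[1, 0, 1, 1], [2, 2, 4, 0], [-1, 1, 0, -2], [0, 1, 1, 0]]  # v1, v2, v3, v4
pivots, relations = pivot_columns_and_relations(A_cols)
print([f"v{c + 1}" for c in pivots])  # ['v1', 'v2', 'v4']
for c, coeffs in relations.items():
    terms = " + ".join(f"({a!s}) v{p + 1}" for a, p in zip(coeffs, pivots))
    print(f"v{c + 1} = {terms}")  # v3 = (-2) v1 + (1/2) v2 + (0) v4
```

For Example 2 this gives the explicit relation $v_3 = -2v_1 + \tfrac{1}{2} v_2$, which can be checked componentwise.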


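Column Space and Row Space. Let $A$ be an $m \times n$ matrix. The columns $v_1, v_2, \ldots, v_n$ of $A$ are vectors in $\mathbb{R}^m$, and $\{v_1, v_2, \ldots, v_n\}$ spans a subspace of $\mathbb{R}^m$ called the column space of $A$. The column space plays a role in the theory of inhomogeneous systems $Ax = b$ in the following way. A vector $b$ is in the column space if and only if it is expressible as a linear combination
$$b = v_1 x_1 + v_2 x_2 + \cdots + v_n x_n = [\, v_1 \;\; v_2 \;\; \cdots \;\; v_n \,] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = Ax.$$
Thus, the column space of $A$ consists of all vectors $b$ in $\mathbb{R}^m$ for which the system $Ax = b$ has a solution.

Example 2, continued. We wish to determine if
$$b = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$
is in the column space of
$$A = \begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 1 & 4 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{bmatrix}.$$
This will be true if and only if
$$\begin{bmatrix} 1 & 2 & -1 & 0 \\ 0 & 2 & 1 & 1 \\ 1 & 4 & 0 & 1 \\ 1 & 0 & -2 & 0 \end{bmatrix} x = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$
has a solution. Reduce the augmented matrix:
$$\left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & 1 & 0 \\ 1 & 4 & 0 & 1 & 1 \\ 1 & 0 & -2 & 0 & 0 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & 1 & 0 \\ 0 & 2 & 1 & 1 & 0 \\ 0 & -2 & -1 & 0 & -1 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].$$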


It is clear at this point that the system will have a solution (no row of the form $[\, 0 \; 0 \; 0 \; 0 \mid c \,]$ with $c \neq 0$ appears), so we need not go any further. We can conclude that $b$ is in the column space of $A$.

Note that the method outlined at the beginning of this section gives a basis for the column space, and the number of elements in this basis is the rank of $A$. (The rank is the number of pivots!) Hence, the rank of an $m \times n$ matrix $A$ is the dimension of its column space.

The column space of a matrix $A$ is often called the range of $A$. That is because it describes all possible vectors in $\mathbb{R}^m$ of the form $Ax$.

There is a similar concept for rows; the row space of an $m \times n$ matrix $A$ is the subspace of $\mathbb{R}^n$ spanned by the rows of $A$. It is not hard to see that the dimension of the row space of $A$ is also the rank of $A$. For, each row of the matrix produced by a row operation is a linear combination of the rows of the old matrix, and since each row operation is reversible, the reverse is also true; so applying a row operation does not change the subspace spanned by the rows. Hence, the row space of the matrix $A'$ obtained by Gauss-Jordan reduction from $A$ is the same as the row space of $A$. However, the set of non-zero rows of the reduced matrix is a basis for this subspace. To see this, note first that it certainly spans (since leaving out zero rows doesn't cost us anything). Moreover, it is also a linearly independent set, because each non-zero row has a 1 in a pivot position where all the other rows are zero. The fact that both the column space and the row space have the same dimension is sometimes expressed by saying "the column rank equals the row rank". Of course, there is no particular reason why the row space and the column space should be identical. For example, unless the matrix is square, the vectors in them won't even have the same number of components.

A Note on the Definition of Rank. The rank of $A$ is defined as the number of pivots in the reduced matrix obtained from $A$ by an appropriate sequence of elementary row operations. Since we can specify a standard procedure for performing such row operations, that means the rank is a well-defined number. On the other hand, it is natural to wonder what might happen if $A$ were reduced by an alternative, perhaps less systematic, sequence of row operations. The above analysis shows that we would still get the same answer for the rank. Namely, the rank is the dimension of the column space of $A$, and that number depends only on the column space itself, not on any particular basis for it. (Or you could make the same argument using the row space.) The rank is also the number of non-zero rows in the reduced matrix, so it follows that this number does not depend on the particular sequence of row operations used to reduce $A$ to Gauss-Jordan reduced form. In fact, the entire matrix obtained at the end (as long as it is in Gauss-Jordan reduced form) depends only on the original matrix $A$ and not on the particular sequence of row operations used to obtain it. The proof of this fact is not so easy, and we omit it here.
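Two facts from this discussion can be checked mechanically: $b$ lies in the column space of $A$ exactly when appending $b$ as an extra column does not increase the rank (so no row $0 = c$ with $c \neq 0$ appears), and since the row space and the column space have the same dimension, $A$ and its transpose have the same rank. The sketch below is our own illustration with hypothetical function names, using exact rational arithmetic and the matrix of Example 2.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots produced by Gaussian elimination (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def in_column_space(A, b):
    """Ax = b is solvable iff the augmented matrix [A | b] has the same rank as A."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

def transpose(A):
    return [list(col) for col in zip(*A)]

A = [[1, 2, -1, 0],
     [0, 2, 1, 1],
     [1, 4, 0, 1],
     [1, 0, -2, 0]]
print(in_column_space(A, [1, 0, 1, 0]))  # True
print(in_column_space(A, [1, 0, 0, 0]))  # False
print(rank(A), rank(transpose(A)))       # 3 3
```

The rank comparison is just a compact restatement of the augmented-matrix reduction carried out in the example above.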

Exercises for Section 9.

1. Find a subset of the following set of vectors which is a basis for the subspace it spans.
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \\ -3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}$$

2. Let $A = \begin{bmatrix} 1 & 0 & 2 & 1 & 1 \\ -1 & 1 & 3 & 0 & 1 \\ 1 & 1 & 7 & 2 & 3 \end{bmatrix}$.
(a) Find a basis for the column space of $A$.
(b) Find a basis for the row space of $A$.

3. Let
$$v_1 = \begin{bmatrix} 1 \\ -2 \\ -1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}.$$
Find a basis for $\mathbb{R}^3$ by finding a third vector $v_3$ such that $\{v_1, v_2, v_3\}$ is linearly independent. Hint. You may find an easier way to do it, but the following method should work. Use the method suggested in Section 9 to pick out a linearly independent subset from $\{v_1, v_2, e_1, e_2, e_3\}$.

4. (Optional) Let $\{v_1, v_2, \ldots, v_k\}$ be a linearly independent subset of $\mathbb{R}^n$. Apply the method in Section 9 to the set $\{v_1, v_2, \ldots, v_k, e_1, e_2, \ldots, e_n\}$. It will necessarily yield a basis for $\mathbb{R}^n$. Why? Show that this basis will include $\{v_1, v_2, \ldots, v_k\}$ as a subset. That is, show that none of the $v_i$ will be eliminated by the process.

5. (a) Find a basis for the column space of
$$A = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 2 & 3 & 4 \\ 3 & 6 & 7 & 10 \end{bmatrix}.$$
(b) Is $b = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$ in the column space of $A$?

6. (a) Suppose $A$ is a $7 \times 12$ matrix and $W$ is its range or column space. If $\dim W = 7$, what can you say about the general solvability of systems of the form $Ax = b$?
(b) Suppose instead that $A$ is $12 \times 7$. What, if anything, can you say about the general solvability of systems of the form $Ax = b$?

10. Review Problems


Exercises for Section 10.

1. Find $A^{-1}$ for $A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 2 & 0 \\ 3 & 1 & -1 \end{bmatrix}$. Check your answer by computing $AA^{-1}$.

2. Let $A = \begin{bmatrix} 2 & -1 & 6 & 1 & 2 \\ 1 & 3 & 4 & 0 & 0 \\ 4 & 5 & 14 & 1 & 2 \end{bmatrix}$.
(a) Find the dimension of the nullspace of $A$.
(b) Find the dimension of the column space of $A$.
(c) How are these two numbers related?

3. What is wrong with the following statement? If $A$ and $B$ are invertible $n \times n$ matrices, then $AB$ is invertible and $(AB)^{-1} = A^{-1} B^{-1}$.

4. Suppose $A$ is a $15 \times 23$ matrix. In which circumstances will each of the following statements be true?
(a) A system $Ax = b$ of 15 equations in 23 unknowns has a solution for every $b$ in $\mathbb{R}^{15}$.
(b) The homogeneous system $Ax = 0$ has infinitely many solutions.

5. Let $A = \begin{bmatrix} 1 & 3 & 4 & 0 & 2 \\ 2 & 7 & 6 & 1 & 1 \\ 4 & 13 & 14 & 1 & 3 \end{bmatrix}$.
(a) Find a basis for the nullspace of $A$.
(b) Find the dimensions of the nullspace and the column space of $A$.
(c) Does $\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ belong to the column space of $A$? Explain.

6. Find the inverse of $A = \begin{bmatrix} 1 & 3 & 3 \\ 2 & 6 & 8 \\ 0 & 2 & 2 \end{bmatrix}$.

7. Let $A = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 2 & 3 & 4 \\ 3 & 6 & 7 & 10 \end{bmatrix}$.
(a) Find a basis for the nullspace of $A$.
(b) Find a basis for the column space of $A$.
(c) Do the columns of $A$ form a linearly independent set? Explain.
(d) Does $\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$ belong to the column space of $A$? Explain your answer.

8. In each case tell if the indicated subset is a vector subspace of $\mathbb{R}^3$ and give a reason for your answer.
(a) The plane defined by the equation $x_1 - 2x_2 + 3x_3 = 0$.
(b) The sphere defined by the equation $x_1^2 + x_2^2 + x_3^2 = 16$.




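9. All the parts of this question refer to the matrix
$$A = \begin{bmatrix} 1 & 2 & 1 & 3 & 1 \\ 2 & 1 & 3 & 3 & 2 \\ 1 & -1 & 2 & 0 & 1 \end{bmatrix}.$$
(a) What is the rank of $A$? (b) What is the dimension of the nullspace of $A$?

10. Consider the subset
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \right\}$$
of $\mathbb{R}^4$. (a) Is this set linearly independent? Explain. (b) Is it a basis for $\mathbb{R}^4$? Explain.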


CHAPTER II

DETERMINANTS AND EIGENVALUES

1. Introduction

Gauss-Jordan reduction is an extremely effective method for solving systems of linear equations, but there are some important cases in which it doesn’t work very well. This is particularly true if some of the matrix entries involve symbolic parameters rather than specific numbers.

Example 1. Solve the general 2 × 2 system
$$\begin{aligned} ax + by &= e \\ cx + dy &= f. \end{aligned}$$
We can’t easily use Gaussian reduction since it would proceed differently if $a$ were zero than it would if $a$ were not zero. In this particular case, however, the usual high school method works quite well. Multiply the first equation by $d$ and the second by $b$ to obtain
$$\begin{aligned} adx + bdy &= ed \\ bcx + bdy &= bf \end{aligned}$$
and subtract to obtain
$$adx - bcx = ed - bf \quad\text{or}\quad (ad - bc)x = ed - bf \quad\text{or}\quad x = \frac{ed - bf}{ad - bc}.$$
Similarly, multiplying the first equation by $c$ and the second by $a$, and subtracting the first result from the second, yields
$$y = \frac{af - ce}{ad - bc}.$$

For this to work, we must assume only that $ad - bc \neq 0$. (See the exercises for a slightly different but equivalent approach which uses $A^{-1}$.) You should recognize the quantity in the denominator as the 2 × 2 determinant
$$\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.$$
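The elimination formulas above can be checked mechanically. The following Python sketch is only an illustration: the function name `solve_2x2` is our own, and it uses exact rational arithmetic from the standard `fractions` module so that no round-off creeps in. It evaluates $x = (ed - bf)/(ad - bc)$ and $y = (af - ce)/(ad - bc)$ and refuses to proceed when $ad - bc = 0$.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, e, f):
    """Solve ax + by = e, cx + dy = f by the elimination formulas.

    Uses exact Fraction arithmetic; raises ValueError when ad - bc = 0,
    which is exactly the case in which the formulas break down.
    """
    det = Fraction(a) * d - Fraction(b) * c          # the 2x2 determinant ad - bc
    if det == 0:
        raise ValueError("ad - bc = 0: the formulas do not apply")
    x = (Fraction(e) * d - Fraction(b) * f) / det    # (ed - bf)/(ad - bc)
    y = (Fraction(a) * f - Fraction(c) * e) / det    # (af - ce)/(ad - bc)
    return x, y

# 2x + y = 5, x + 3y = 10 has the solution x = 1, y = 3.
x, y = solve_2x2(2, 1, 1, 3, 5, 10)
print(x, y)   # 1 3
```

Note that the code never divides by $a$, so, unlike Gaussian reduction, it does not need separate cases for $a = 0$ and $a \neq 0$.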


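2 × 2 determinants arise in a variety of important situations. For example, if
$$u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$
are two vectors in the plane, then
$$\det \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} = u_1 v_2 - v_1 u_2$$
is the signed area of the parallelogram spanned by the vectors.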

[Figure: the parallelogram spanned by u and v]

The sign is determined by the orientation of the two vectors. It is positive if the smaller of the two angles from $u$ to $v$ is in the counterclockwise direction and negative if it is in the clockwise direction. You are probably also familiar with 3 × 3 determinants. For example, if $u$, $v$, and $w$ are vectors in $\mathbb{R}^3$, the triple product $(u \times v) \cdot w = u \cdot (v \times w)$ gives the signed volume of the parallelepiped spanned by the three vectors.

[Figure: the parallelepiped spanned by u, v, and w]
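As a numerical illustration (a sketch, not part of the development in the text), the following Python code computes the signed area $u_1 v_2 - v_1 u_2$, the triple product $(u \times v) \cdot w$, the product $u \cdot (v \times w)$, and the six-term component expansion of the triple product. All function names are hypothetical helpers chosen for this example.

```python
def det2(u, v):
    """Signed area of the parallelogram spanned by plane vectors u and v:
    u1*v2 - v1*u2 (positive when the smaller angle from u to v is counterclockwise)."""
    return u[0] * v[1] - v[0] * u[1]

def cross(u, v):
    """Cross product u x v of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def triple(u, v, w):
    """Signed volume (u x v) . w of the parallelepiped spanned by u, v, w."""
    return dot(cross(u, v), w)

def six_term(u, v, w):
    """The six-term component expansion of the triple product."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    w1, w2, w3 = w
    return (u1 * v2 * w3 + v1 * w2 * u3 + w1 * u2 * v3
            - u1 * w2 * v3 - v1 * u2 * w3 - w1 * v2 * u3)

print(det2((1, 0), (0, 1)), det2((0, 1), (1, 0)))   # 1 -1
print(triple((1, 0, 0), (0, 1, 0), (0, 0, 1)))      # 1 (right-handed triple)
u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
print(triple(u, v, w), dot(u, cross(v, w)), six_term(u, v, w))   # -3 -3 -3
```

Swapping the two plane vectors flips the sign of the area, which is the orientation effect described above.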

The sign is positive if the vectors form a ‘right handed’ triple and negative if they form a ‘left handed’ triple. If you express this triple product in terms of components, you obtain the expression
$$u_1 v_2 w_3 + v_1 w_2 u_3 + w_1 u_2 v_3 - u_1 w_2 v_3 - v_1 u_2 w_3 - w_1 v_2 u_3$$


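and this quantity is called the determinant of the 3 × 3 matrix
$$\begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix}.$$
As you might expect, if you try to solve the general 3 × 3 system
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$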

without having specific numerical values for the entries of the coefficient matrix, then you end up with a bunch of formulas in which the 3 × 3 determinant of the coefficient matrix $A$ plays a significant role. Our program in this chapter will be to generalize these concepts to arbitrary $n \times n$ matrices. This is necessarily a bit complicated because of the complexity of the formulas. For $n > 3$, it is not feasible to try to write out explicit formulas for the quantities which arise, so we need another approach.

Exercises for Section 1.

1. Let $u = \begin{bmatrix} u_1 \\ u_2 \\ 0 \end{bmatrix}$, $v = \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix}$. Verify that $\pm|u \times v| = \det \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix}$. This in effect shows that, except for sign, a 2 × 2 determinant is the area of the parallelogram spanned by its columns in $\mathbb{R}^2$.

2. Let
$$u = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad v = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
(a) Calculate the determinants of the following 2 × 2 matrices: (i) $[\,u \;\; v\,]$, (ii) $[\,v \;\; u\,]$, (iii) $[\,u - 2v \;\; v\,]$. (b) Draw plane diagrams for each of the parallelograms spanned by the columns of these matrices. Explain geometrically what happened to the area.

3. Let $u$, $v$, and $w$ be the columns of the matrix
$$A = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
(a) Find $\det A$ by computing $(u \times v) \cdot w$ and check by computing $u \cdot (v \times w)$. (b) Without doing the computation, find $\det [\,v \;\; u \;\; w\,]$, $\det [\,u \;\; w \;\; v\,]$, and $\det [\,w \;\; v \;\; u\,]$. (c) Explain why the determinant of the above matrix does not change if you replace the first column by the sum of the first two columns. (d) What happens if you multiply one of the columns by $-3$?

4. Solve the system
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} e \\ f \end{bmatrix}$$
by multiplying the right hand side by the inverse of the coefficient matrix. Compare what you get with the solution obtained in the section.

2. Definition of the Determinant

Let $A$ be an $n \times n$ matrix. By definition,
$$\det [\,a\,] = a \quad \text{for } n = 1, \qquad \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11} a_{22} - a_{12} a_{21} \quad \text{for } n = 2.$$
As mentioned in the previous section, we can give an explicit formula to define $\det A$ for $n = 3$, but an explicit formula for larger $n$ is very difficult to describe. Here is a provisional definition. Form a sum of many terms as follows. Choose any entry from the first row of $A$; there are $n$ possible ways to do that. Next, choose any entry from the second row which is not in the same column as the first entry chosen; there are $n - 1$ possible ways to do that. Continue in this way until you have chosen one entry from each row in such a way that no column is repeated; altogether there are $n!$ ways to do that. Now multiply all these entries together to form a typical term. If that were all, it would be complicated enough, but there is one further twist. The products are divided into two classes of equal size according to a rather complicated rule, and then the sum is formed with the terms in one class multiplied by $+1$ and those in the other class multiplied by $-1$. Here is the definition again for $n = 3$, arranged to exhibit the signs:
$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}.$$
The definition for $n = 4$ involves $4! = 24$ terms, and I won’t bother to write it out.

A better way to develop the theory is recursively. That is, we assume that determinants have been defined for all $(n - 1) \times (n - 1)$ matrices, and then use this to define determinants for $n \times n$ matrices. Since we have a definition for 1 × 1 matrices, this allows us in principle to find the determinant of any $n \times n$ matrix by recursively invoking the definition. This is less explicit, but it is easier to work with.

Here is the recursive definition. Let $A$ be an $n \times n$ matrix, and let $D_j(A)$ be the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $j$th row and the first column of $A$. Then define
$$\det A = a_{11} D_1(A) - a_{21} D_2(A) + \cdots + (-1)^{j+1} a_{j1} D_j(A) + \cdots + (-1)^{n+1} a_{n1} D_n(A).$$


In words: take each entry in the first column of $A$, multiply it by the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the first column and that row, and then add up these products, alternating signs as you do.

Examples.
$$\det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 2 & 0 \\ 0 & 3 & 6 \end{bmatrix} = 2 \det \begin{bmatrix} 2 & 0 \\ 3 & 6 \end{bmatrix} - 1 \det \begin{bmatrix} -1 & 3 \\ 3 & 6 \end{bmatrix} + 0 \det \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} = 2(12 - 0) - 1(-6 - 9) + 0(\dots) = 24 + 15 = 39.$$
Note that we didn’t bother evaluating the 2 × 2 determinant with coefficient 0. You should check that the earlier definition gives the same result.
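To make the suggested comparison concrete, here is a Python sketch of both definitions. The provisional definition needs the sign rule, which the text leaves unstated; the sketch assumes the standard rule that a term gets $+1$ or $-1$ according as its sequence of chosen columns has an even or odd number of inversions (pairs listed out of order). That rule reproduces the signs in the $n = 3$ formula above. The function names are our own.

```python
from itertools import permutations

def det_recursive(A):
    """Recursive definition: expand along the first column,
    det A = sum over rows j of (-1)^(j+1) a_{j1} D_j(A)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):                        # j is 0-based here
        if A[j][0] == 0:
            continue                          # skip terms with coefficient 0
        minor = [row[1:] for i, row in enumerate(A) if i != j]   # delete row j, column 1
        sign = 1 if j % 2 == 0 else -1        # (-1)^(j+1) for the 1-based row number
        total += sign * A[j][0] * det_recursive(minor)
    return total

def det_permutation_sum(A):
    """Provisional definition: one entry from each row, no column repeated.
    Assumed sign rule: parity of the number of inversions of the columns."""
    n = len(A)
    total = 0
    for cols in permutations(range(n)):       # cols[i] = column chosen in row i
        inversions = sum(1 for i in range(n) for k in range(i + 1, n)
                         if cols[i] > cols[k])
        term = 1
        for i in range(n):
            term *= A[i][cols[i]]
        total += -term if inversions % 2 else term
    return total

A3 = [[2, -1, 3], [1, 2, 0], [0, 3, 6]]
print(det_recursive(A3), det_permutation_sum(A3))   # 39 39
```

The permutation sum loops over all $n!$ choices, so it is only practical for very small $n$.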

$$\det \begin{bmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 2 & 0 \\ 2 & 0 & 3 & 6 \\ 1 & 1 & 2 & 1 \end{bmatrix} = 1 \det \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 6 \\ 1 & 2 & 1 \end{bmatrix} - 0 \det \begin{bmatrix} 2 & -1 & 3 \\ 0 & 3 & 6 \\ 1 & 2 & 1 \end{bmatrix} + 2 \det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix} - 1 \det \begin{bmatrix} 2 & -1 & 3 \\ 1 & 2 & 0 \\ 0 & 3 & 6 \end{bmatrix}.$$
Each of these 3 × 3 determinants may be evaluated recursively. (In fact we just did the last one in the previous example.) You should work them out for yourself. The answers yield
$$\det \begin{bmatrix} 1 & 2 & -1 & 3 \\ 0 & 1 & 2 & 0 \\ 2 & 0 & 3 & 6 \\ 1 & 1 & 2 & 1 \end{bmatrix} = 1(3) - 0(\dots) + 2(5) - 1(39) = -26.$$
Although this definition allows us to compute the determinant of any $n \times n$ matrix in principle, the number of operations grows very quickly with $n$. In such calculations one usually keeps track only of the multiplications, since they are usually the most time-consuming operations. Here are some values of $N(n)$, the number of multiplications needed for a recursive calculation of the determinant of an $n \times n$ matrix. We also tabulate $n!$ for comparison.

n       2   3   4    5     6    ...
N(n)    2   6   28   145   876  ...
n!      2   6   24   120   720  ...


The recursive method is somewhat more efficient than the formula referred to at the beginning of the section. For, that formula has $n!$ terms, each of which requires multiplying $n$ entries together. Each such product requires $n - 1$ separate multiplications. Hence, there are $(n - 1)n!$ multiplications required altogether. In addition, the rule for determining the sign of the term requires some extensive calculation. However, as the above table indicates, even the number $N(n)$ of multiplications required for the recursive definition grows faster than $n!$, so it gets very large very quickly. Thus, we clearly need a more efficient method to calculate determinants. As is often the case in linear algebra, elementary row operations come to our rescue. Using row operations for calculating determinants is based on the following rules relating such operations to determinants.

Rule (i): If $A'$ is obtained from $A$ by adding a multiple of one row of $A$ to another, then $\det A' = \det A$.

Example 1. The second matrix is obtained from the first by adding $-2$ times the first row to the second row.
$$\det \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 1 & 2 & 1 \end{bmatrix} = 1(1 - 6) - 2(2 - 6) + 1(6 - 3) = 6,$$
$$\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & -3 & -3 \\ 1 & 2 & 1 \end{bmatrix} = 1(-3 + 6) - 0(2 - 6) + 1(-6 + 9) = 6.$$

Rule (ii): If $A'$ is obtained from $A$ by multiplying one row by a scalar $c$, then $\det A' = c \det A$.

Example 2.
$$\det \begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & 2 \\ 0 & 1 & 1 \end{bmatrix} = 1(4 - 2) - 2(2 - 0) + 0(\dots) = -2,$$
$$2 \det \begin{bmatrix} 1 & 2 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix} = 2\bigl(1(2 - 1) - 1(2 - 0) + 0(\dots)\bigr) = 2(-1) = -2.$$

One may also state this rule as follows: any common factor of a row of $A$ may be ‘pulled out’ from its determinant.

Rule (iii): If $A'$ is obtained from $A$ by interchanging two rows, then $\det A' = -\det A$.

Example 3.
$$\det \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = -3, \qquad \det \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = 3.$$

The verification of these rules is a bit involved, so we relegate it to an appendix, which most of you will want to skip. The rules allow us to compute the determinant of any n × n matrix with specific numerical entries.
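The following Python sketch shows how the rules turn into an algorithm. It is an illustration under our own naming, using exact `Fraction` arithmetic. It relies on rules (i) and (iii) together with the fact, established later in this section, that the determinant of an upper triangular matrix is the product of its diagonal entries. It never rescales a row, so rule (ii) is not needed. It reproduces the value $-30$ found in Example 4 below.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Determinant via Gaussian reduction to upper triangular form.

    Rule (i):   adding a multiple of one row to another leaves det unchanged.
    Rule (iii): swapping two rows changes the sign of det.
    Finally det = sign * (product of the diagonal of the triangular matrix).
    """
    M = [[Fraction(x) for x in row] for row in A]   # work on an exact copy
    n = len(M)
    sign = 1
    for k in range(n):
        # look for a row at or below row k with a non-zero entry in column k
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot: a zero appears on the diagonal
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            sign = -sign            # rule (iii)
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            # rule (i): row_i <- row_i - factor * row_k
            M[i] = [a - factor * b for a, b in zip(M[i], M[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= M[k][k]           # product of the diagonal entries
    return result

A = [[1, 2, -1, 1], [0, 2, 1, 2], [3, 0, 1, 1], [-1, 6, 0, 2]]
print(det_by_row_reduction(A))   # -30
```

The work grows roughly like $n^3$ rather than like $n!$, which is why row operations are the practical method for numerical matrices.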


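Example 4. We shall calculate the determinant of a 4 × 4 matrix. You should make sure you keep track of which elementary row operations have been performed at each stage.
$$\det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 3 & 0 & 1 & 1 \\ -1 & 6 & 0 & 2 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & -6 & 4 & -2 \\ 0 & 8 & -1 & 3 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 7 & 4 \\ 0 & 0 & -5 & -5 \end{bmatrix}$$
$$= -5 \det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 7 & 4 \\ 0 & 0 & 1 & 1 \end{bmatrix} = +5 \det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 7 & 4 \end{bmatrix} = +5 \det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -3 \end{bmatrix}.$$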

We may now use the recursive definition to calculate the last determinant. In each case there is only one non-zero entry in the first column.
$$\det \begin{bmatrix} 1 & 2 & -1 & 1 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & -3 \end{bmatrix} = 1 \det \begin{bmatrix} 2 & 1 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & -3 \end{bmatrix} = 1 \cdot 2 \det \begin{bmatrix} 1 & 1 \\ 0 & -3 \end{bmatrix} = 1 \cdot 2 \cdot 1 \det [\,-3\,] = 1 \cdot 2 \cdot 1 \cdot (-3) = -6.$$
Hence, the determinant of the original matrix is $5(-6) = -30$.

The last calculation is a special case of a general fact which is established in much the same way by repeating the recursive definition:
$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\ 0 & a_{22} & a_{23} & \dots & a_{2n} \\ 0 & 0 & a_{33} & \dots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & a_{nn} \end{bmatrix} = a_{11} a_{22} a_{33} \cdots a_{nn}.$$
In words, the determinant of an upper triangular matrix is the product of its diagonal entries.

It is important to be able to tell when the determinant of an $n \times n$ matrix $A$ is zero. Certainly, this will be the case if the first column consists of zeroes, and indeed it turns out that the determinant vanishes if any row or any column consists only of zeroes. More generally, if either the set of rows or the set of columns is a linearly dependent set, then the determinant is zero. (That will be the case if the rank $r < n$, since the rank is the dimension of both the row space and the column space.) This follows from the following important theorem.


Theorem 2.1. Let $A$ be an $n \times n$ matrix. Then $A$ is singular if and only if $\det A = 0$. Equivalently, $A$ is invertible, i.e., has rank $n$, if and only if $\det A \neq 0$.

Proof. If $A$ is invertible, then Gaussian reduction leads to an upper triangular matrix with non-zero entries on its diagonal, and the determinant of such a matrix is the product of its diagonal entries, which is also non-zero. No elementary row operation can change a non-zero determinant into zero, or a zero determinant into a non-zero one. For, type (i) operations don’t change the determinant, type (ii) operations multiply it by non-zero scalars, and type (iii) operations change its sign. Hence, $\det A \neq 0$. If $A$ is singular, then Gaussian reduction also leads to an upper triangular matrix, but one in which at least the last row consists of zeroes. Hence, at least one diagonal entry of the reduced matrix is zero, so its determinant is zero, and by the same reasoning so is $\det A$. □

Example 5.
$$\det \begin{bmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 1 & 0 & 1 \end{bmatrix} = 1(1 - 0) - 2(1 - 0) + 1(3 - 2) = 0,$$
so the matrix must be singular. To confirm this, we reduce
$$\begin{bmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 1 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 \\ 0 & -1 & -1 \\ 0 & -1 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 \\ 0 & -1 & -1 \\ 0 & 0 & 0 \end{bmatrix},$$
which shows that the matrix is singular.

In the previous section, we encountered 2 × 2 matrices with symbolic non-numeric entries. For such a matrix, Gaussian reduction doesn’t work very well because we don’t know whether the non-numeric expressions are zero or not.

Example 6. Suppose we want to know whether or not the matrix
$$\begin{bmatrix} -\lambda & 1 & 1 & 1 \\ 1 & -\lambda & 0 & 0 \\ 1 & 0 & -\lambda & 0 \\ 1 & 0 & 0 & -\lambda \end{bmatrix}$$
is singular. We could try to calculate its rank, but since we don’t know what $\lambda$ is, it is not clear how to proceed. Clearly, the row reduction works differently if $\lambda = 0$ than if $\lambda \neq 0$. However, we can calculate the determinant by the recursive method.
$$\det \begin{bmatrix} -\lambda & 1 & 1 & 1 \\ 1 & -\lambda & 0 & 0 \\ 1 & 0 & -\lambda & 0 \\ 1 & 0 & 0 & -\lambda \end{bmatrix} = (-\lambda) \det \begin{bmatrix} -\lambda & 0 & 0 \\ 0 & -\lambda & 0 \\ 0 & 0 & -\lambda \end{bmatrix} - 1 \det \begin{bmatrix} 1 & 1 & 1 \\ 0 & -\lambda & 0 \\ 0 & 0 & -\lambda \end{bmatrix} + 1 \det \begin{bmatrix} 1 & 1 & 1 \\ -\lambda & 0 & 0 \\ 0 & 0 & -\lambda \end{bmatrix} - 1 \det \begin{bmatrix} 1 & 1 & 1 \\ -\lambda & 0 & 0 \\ 0 & -\lambda & 0 \end{bmatrix}$$
$$= (-\lambda)(-\lambda^3) - (\lambda^2) + (-\lambda^2) - (\lambda^2) = \lambda^4 - 3\lambda^2 = \lambda^2(\lambda - \sqrt{3})(\lambda + \sqrt{3}).$$
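One can double-check the polynomial in Example 6 without symbolic algebra. Each term of the determinant is a product of four entries, each of degree at most 1 in $\lambda$, so the determinant is a polynomial in $\lambda$ of degree at most 4. Agreement with $\lambda^4 - 3\lambda^2$ at five or more distinct values therefore proves the identity. The Python sketch below (function names are our own) checks nine rational values exactly.

```python
from fractions import Fraction

def det(A):
    """Determinant by expansion along the first column (recursive definition).
    Terms whose first-column entry is 0 are skipped."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[j][0] * det([r[1:] for i, r in enumerate(A) if i != j])
               for j in range(len(A)) if A[j][0] != 0)

def example6(lam):
    """The 4 x 4 matrix of Example 6 for a specific value of lambda."""
    return [[-lam, 1, 1, 1],
            [1, -lam, 0, 0],
            [1, 0, -lam, 0],
            [1, 0, 0, -lam]]

# Nine distinct values of lambda: -2, -3/2, ..., 3/2, 2.
samples = [Fraction(k, 2) for k in range(-4, 5)]
print(all(det(example6(t)) == t**4 - 3 * t**2 for t in samples))   # True
```

The roots $\pm\sqrt{3}$ are irrational, so they are not among the exact sample points; they follow from factoring the verified polynomial.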


Hence, this matrix is singular just in the cases $\lambda = 0$, $\lambda = \sqrt{3}$, and $\lambda = -\sqrt{3}$.

Appendix. Some Proofs. We now establish the basic rules relating determinants to elementary row operations. If you are of a skeptical turn of mind, you should study this section, since the relation between the recursive definition and rules (i), (ii), and (iii) is not at all obvious. However, if you have a trusting nature, you might want to skip this section since the proofs are quite technical and not terribly enlightening.

The idea behind the proofs is to assume that the rules (actually, modified forms of the rules) have been established for $(n - 1) \times (n - 1)$ determinants, and then to prove them for $n \times n$ determinants. To start it all off, the rules must be checked explicitly for 2 × 2 determinants. I leave that step for you in the Exercises.

We start with the hardest case, rule (iii). First we consider the special case that $A'$ is obtained from $A$ by switching two adjacent rows, the $i$th row and the $(i + 1)$st row. Consider the recursive definition
$$\det A' = a'_{11} D_1(A') - \cdots + (-1)^{i+1} a'_{i1} D_i(A') + (-1)^{i+2} a'_{i+1,1} D_{i+1}(A') + \cdots + (-1)^{n+1} a'_{n1} D_n(A').$$
Look at the subdeterminants occurring in this sum. For $j \neq i, i + 1$, we have $a'_{j1} = a_{j1}$ and $D_j(A') = -D_j(A)$, since deleting the first column and $j$th row of $A$ and then switching two rows (neither of which was deleted) changes the sign by rule (iii) for $(n - 1) \times (n - 1)$ determinants. The situation for $j = i$ or $j = i + 1$ is different; in fact, we have
$$D_i(A') = D_{i+1}(A) \quad\text{and}\quad D_{i+1}(A') = D_i(A).$$

The first equation follows because switching rows $i$ and $i + 1$ and then deleting row $i$ is the same as deleting row $i + 1$ without touching row $i$. A similar argument establishes the second equation. Using this together with $a'_{i1} = a_{i+1,1}$ and $a'_{i+1,1} = a_{i1}$ yields
$$\begin{aligned} (-1)^{i+1} a'_{i1} D_i(A') &= (-1)^{i+1} a_{i+1,1} D_{i+1}(A) = -(-1)^{i+2} a_{i+1,1} D_{i+1}(A), \\ (-1)^{i+2} a'_{i+1,1} D_{i+1}(A') &= (-1)^{i+2} a_{i1} D_i(A) = -(-1)^{i+1} a_{i1} D_i(A). \end{aligned}$$
In other words, all terms in the recursive definition of $\det A'$ are negatives of the corresponding terms of $\det A$, except those in positions $i$ and $i + 1$, which get reversed with signs changed. Hence, the effect of switching adjacent rows is to change the sign of the sum.

Suppose instead that non-adjacent rows in positions $i$ and $j$ are switched, and suppose for the sake of argument that $i < j$. One way to do this is as follows. First move row $i$ past each of the rows between row $i$ and row $j$. This involves some number of switches of adjacent rows; call that number $k$. ($k = j - i - 1$, but that doesn’t matter in the proof.) Next, move row $j$ past row $i$ and then past the $k$ rows just mentioned, all in their new positions. That requires $k + 1$ switches of adjacent rows. All told, to switch rows $i$ and $j$ in this way requires $2k + 1$ switches of adjacent rows. The net effect is to multiply the determinant by $(-1)^{2k+1} = -1$, as required.

There is one important consequence of rule (iii) which we shall use later in the proof of rule (i).

Rule (iiie): If an $n \times n$ matrix $A$ has two equal rows, then $\det A = 0$.

This is not too hard to see. Interchanging two rows changes the sign of $\det A$, but if the rows are equal, it doesn’t change anything. However, the only number with the property that it isn’t changed by changing its sign is the number 0. Hence, $\det A = 0$.

We next verify rule (ii). Suppose $A'$ is obtained from $A$ by multiplying the $i$th row by $c$. Consider the recursive definition
$$\det A' = a'_{11} D_1(A') - \cdots + (-1)^{i+1} a'_{i1} D_i(A') + \cdots + (-1)^{n+1} a'_{n1} D_n(A'). \tag{1}$$

For any $j \neq i$, $D_j(A') = c D_j(A)$, by rule (ii) for $(n - 1) \times (n - 1)$ determinants, since one of the rows appearing in that determinant is multiplied by $c$. Also, $a'_{j1} = a_{j1}$ for $j \neq i$. On the other hand, $D_i(A') = D_i(A)$ since the $i$th row is deleted in calculating these quantities, and, except for the $i$th row, $A'$ and $A$ agree. In addition, $a'_{i1} = c a_{i1}$, so we pick up the extra factor of $c$ in any case. It follows that every term on the right of (1) has a factor $c$, so $\det A' = c \det A$.

Finally, we attack the proof of rule (i). It turns out to be necessary to verify the following stronger rule.

Rule (ia): Suppose $A$, $A'$, and $A''$ are three $n \times n$ matrices which agree except in the $i$th row. Suppose moreover that the $i$th row of $A$ is the sum of the $i$th row of $A'$ and the $i$th row of $A''$. Then $\det A = \det A' + \det A''$.

Let’s first see why rule (ia) implies rule (i). We can add $c$ times the $j$th row of $A$ to its $i$th row as follows. Let $B' = A$, let $B''$ be the matrix obtained from $A$ by replacing its $i$th row by $c$ times its $j$th row, and let $B$ be the matrix obtained from $A$ by adding $c$ times its $j$th row to its $i$th row. Then according to rule (ia), we have
$$\det B = \det B' + \det B'' = \det A + \det B''.$$
On the other hand, by rule (ii), $\det B'' = c \det A''$, where $A''$ has both $i$th and $j$th rows equal to the $j$th row of $A$. Hence, by rule (iiie), $\det A'' = 0$, and $\det B = \det A$.

Finally, we establish rule (ia). Assume it is known to be true for $(n - 1) \times (n - 1)$ determinants. We have
$$\det A = a_{11} D_1(A) - \cdots + (-1)^{i+1} a_{i1} D_i(A) + \cdots + (-1)^{n+1} a_{n1} D_n(A). \tag{2}$$
For $j \neq i$, the sum rule (ia) may be applied to the determinants $D_j(A)$ because the appropriate submatrix has one row which breaks up as a sum as needed. Hence,
$$D_j(A) = D_j(A') + D_j(A'').$$


Also, for $j \neq i$, we have $a_{j1} = a'_{j1} = a''_{j1}$, since all the matrices agree in any row except the $i$th row. Hence, for $j \neq i$,
$$a_{j1} D_j(A) = a_{j1} D_j(A') + a_{j1} D_j(A'') = a'_{j1} D_j(A') + a''_{j1} D_j(A'').$$
On the other hand, $D_i(A) = D_i(A') = D_i(A'')$ because in each case the $i$th row was deleted. But $a_{i1} = a'_{i1} + a''_{i1}$, so
$$a_{i1} D_i(A) = a'_{i1} D_i(A) + a''_{i1} D_i(A) = a'_{i1} D_i(A') + a''_{i1} D_i(A'').$$
It follows that every term in (2) breaks up into a sum as required, and $\det A = \det A' + \det A''$.

Exercises for Section 2.

1. Find the determinants of each of the following matrices. Use whatever method seems most convenient, but seriously consider the use of elementary row operations.
(a) $\begin{bmatrix} 1 & 1 & 2 \\ 1 & 3 & 5 \\ 6 & 4 & 1 \end{bmatrix}$.
(b) $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \\ 1 & 4 & 2 & 3 \\ 4 & 3 & 2 & 1 \end{bmatrix}$.
(c) $\begin{bmatrix} 0 & 0 & 0 & 0 & 3 \\ 1 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$.
(d) $\begin{bmatrix} 0 & x & y \\ -x & 0 & z \\ -y & -z & 0 \end{bmatrix}$.

2. Verify the following rules for 2 × 2 determinants. (i) If $A'$ is obtained from $A$ by adding a multiple of the first row to the second, then $\det A' = \det A$. (ii) If $A'$ is obtained from $A$ by multiplying its first row by $c$, then $\det A' = c \det A$. (iii) If $A'$ is obtained from $A$ by interchanging its two rows, then $\det A' = -\det A$. Rules (i) and (ii) for the first row, together with rule (iii), allow us to derive rules (i) and (ii) for the second row. Explain.

3. Derive the following generalization of rule (i) for 2 × 2 determinants.
$$\det \begin{bmatrix} a' + a'' & b' + b'' \\ c & d \end{bmatrix} = \det \begin{bmatrix} a' & b' \\ c & d \end{bmatrix} + \det \begin{bmatrix} a'' & b'' \\ c & d \end{bmatrix}.$$
What is the corresponding rule for the second row? Why do you get it for free if you use the results of the previous problem?


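4. Find all values of $z$ such that the matrix $\begin{bmatrix} 1 & z \\ z & 1 \end{bmatrix}$ is singular.

5. Find all values of $\lambda$ such that the matrix
$$A = \begin{bmatrix} -\lambda & 1 & 0 \\ 1 & -\lambda & 1 \\ 0 & 1 & -\lambda \end{bmatrix}$$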

is singular. 6. The determinant of the following matrix is zero. Explain why just using the recursive definition of the determinant.   2 −2 3 0 4 2 −3 0 1 4   3  6 −5 −2 0   1 −3 3 0 6 5 2 12 0 10 7.

If A is n × n, what can you say about det(cA)?

8. Suppose $A$ is a non-singular 6 × 6 matrix. Then $\det(-A) \neq -\det A$. Explain.

9. Find 2 × 2 matrices $A$ and $B$ such that $\det(A + B) \neq \det A + \det B$.

10. (a) Show that the number of multiplications N (7) necessary to compute recursively the determinant of a 7 × 7 matrix is 6139. (b) (Optional) Find a rule relating N (n) to N (n − 1). Use this to write a computer program to calculate N (n) for any n.

3. Some Important Properties of Determinants

Theorem 2.2 (The Product Rule). Let $A$ and $B$ be $n \times n$ matrices. Then $\det(AB) = \det A \det B$.

We relegate the proof of this theorem to an appendix, but let’s check it in an example.

Example 1. Let
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$
Then $\det A = 3$, $\det B = 2$, and
$$AB = \begin{bmatrix} 3 & -1 \\ 3 & 1 \end{bmatrix},$$


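so $\det(AB) = 6$ as expected. This example has a simple geometric interpretation. Let
$$u = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad v = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
Then $\det B$ is just the area of the parallelogram spanned by the two vectors. On the other hand, the columns of the product
$$AB = [\, Au \;\; Av \,], \quad\text{i.e.,}\quad Au = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \quad Av = \begin{bmatrix} -1 \\ 1 \end{bmatrix},$$
also span a parallelogram which is related to the first parallelogram in a simple way. One edge is multiplied by a factor 3 ($Au = 3u$) and the other edge is fixed ($Av = v$). Hence, the area is multiplied by 3.

[Figure: the parallelogram spanned by u and v, and the parallelogram spanned by Au and Av]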
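A quick numerical spot-check of the product rule, written as a Python sketch with our own helper names: it verifies Example 1 and then compares $\det(AB)$ with $\det A \det B$ for a batch of random 4 × 4 integer matrices. Random testing is of course no substitute for the proof in the appendix.

```python
import random

def det(A):
    """Determinant by expansion along the first column (recursive definition)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[j][0] * det([r[1:] for i, r in enumerate(A) if i != j])
               for j in range(len(A)))

def matmul(A, B):
    """Row-by-column product of an m x n and an n x p matrix (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[2, 1], [1, 2]]
B = [[1, -1], [1, 1]]
print(matmul(A, B), det(A), det(B), det(matmul(A, B)))   # [[3, -1], [3, 1]] 3 2 6

# Spot-check det(XY) = det(X) det(Y) on random 4 x 4 integer matrices.
rng = random.Random(0)          # fixed seed so the check is reproducible
for _ in range(100):
    X = [[rng.randint(-5, 5) for _ in range(4)] for _ in range(4)]
    Y = [[rng.randint(-5, 5) for _ in range(4)] for _ in range(4)]
    assert det(matmul(X, Y)) == det(X) * det(Y)
```

Integer entries keep the arithmetic exact, so the equality test is meaningful.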

Thus, in this case, in the formula $\det(AB) = (\det A)(\det B)$ the factor $\det A$ tells us how the area of a parallelogram changes if its edges are transformed by the matrix $A$. This is a special case of a much more general assertion. The product rule tells us how areas, volumes, and their higher dimensional analogues behave when a figure is transformed by a matrix.

Transposes. Let $A$ be an $m \times n$ matrix. The transpose of $A$ is the $n \times m$ matrix for which the columns are the rows of $A$. (Also, its rows are the columns of $A$.) It is usually denoted $A^t$, but other notations are possible.

Examples.
$$A = \begin{bmatrix} 2 & 0 & 1 \\ 2 & 1 & 2 \end{bmatrix}, \qquad A^t = \begin{bmatrix} 2 & 2 \\ 0 & 1 \\ 1 & 2 \end{bmatrix},$$
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{bmatrix}, \qquad A^t = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 2 & 0 \\ 3 & 3 & 3 \end{bmatrix},$$
$$a = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}, \qquad a^t = [\, a_1 \;\; a_2 \;\; a_3 \,].$$


Note that the transpose of a column vector is a row vector and vice versa. The following rule follows almost immediately from the definition.

Theorem 2.3. Assume $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Then $(AB)^t = B^t A^t$.

Note that the order on the right is reversed.

Example 2. Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 2 \\ 1 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 & 0 \\ 1 & 1 \\ 2 & 3 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 12 & 11 \\ 8 & 6 \\ 4 & 0 \end{bmatrix}, \quad\text{so}\quad (AB)^t = \begin{bmatrix} 12 & 8 & 4 \\ 11 & 6 & 0 \end{bmatrix},$$
while
$$B^t = \begin{bmatrix} 4 & 1 & 2 \\ 0 & 1 & 3 \end{bmatrix}, \qquad A^t = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & 0 \\ 3 & 2 & 0 \end{bmatrix}, \quad\text{so}\quad B^t A^t = \begin{bmatrix} 12 & 8 & 4 \\ 11 & 6 & 0 \end{bmatrix}$$
as expected.

Unless the matrices are square, the shapes won’t even match if the order is not reversed. In the above example $A^t B^t$ would be a product of a 3 × 3 matrix with a 2 × 3 matrix, and that doesn’t make sense. The example also helps us to understand why the formula is true. The $i, j$-entry of $AB$ is the row by column product of the $i$th row of $A$ with the $j$th column of $B$. Taking transposes reverses the roles of rows and columns, so this number is the $j, i$-entry of $(AB)^t$. The entry is the same, but now it is the product of the $j$th row of $B^t$ with the $i$th column of $A^t$, which is exactly the $j, i$-entry of $B^t A^t$.

Theorem 2.4. Let $A$ be an $n \times n$ matrix. Then $\det A^t = \det A$.

See the appendix for a proof, but here is an example.

Example 3.



$$\det \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} = 1(1 - 0) - 2(0 - 0) + 0(\dots) = 1,$$
$$\det \begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 1 & 2 & 1 \end{bmatrix} = 1(1 - 0) - 0(\dots) + 1(0 - 0) = 1.$$

The importance of this theorem is that it allows us to go freely from statements about determinants involving rows of the matrix to corresponding statements involving columns and vice-versa. Because of this rule, we may use column operations as well as row operations to calculate determinants. For, performing a column operation is the same as transposing the matrix, performing the corresponding row operation, and then transposing back. The two transpositions don’t affect the determinant.


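Example.
$$\det \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 3 & 1 \\ 3 & 3 & 6 & 2 \\ 4 & 2 & 6 & 4 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 2 & 0 \\ 2 & 1 & 1 & 1 \\ 3 & 3 & 3 & 2 \\ 4 & 2 & 2 & 4 \end{bmatrix} = 0 \qquad \text{(first step: column operation } (-1)c_1 + c_3\text{)}.$$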

The last step follows because the 2nd and 3rd columns are equal, which implies that the rank (dimension of the column space) is less than 4. (You could also subtract the third column from the second and get a column of zeroes, etc.)

Expansion in Minors or Cofactors. There is a generalization of the formula used for the recursive definition. Namely, for any $n \times n$ matrix $A$, let $D_{ij}(A)$ be the determinant of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th row and $j$th column of $A$. Then
$$\det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} D_{ij}(A) = (-1)^{1+j} a_{1j} D_{1j}(A) + \cdots + (-1)^{i+j} a_{ij} D_{ij}(A) + \cdots + (-1)^{n+j} a_{nj} D_{nj}(A). \tag{1}$$
The special case $j = 1$ is the recursive definition given in the previous section. The more general rule is easy to derive from the special case $j = 1$ by means of column interchanges. Namely, form a new matrix $A'$ by moving the $j$th column to the first position by successively interchanging it with columns $j - 1, j - 2, \dots, 2, 1$. There are $j - 1$ interchanges, so the determinant is changed by the factor $(-1)^{j-1}$. Now apply the rule for the first column. The first column of $A'$ is the $j$th column of $A$, and since the other columns keep their original order, deleting it has the same effect as deleting the $j$th column of $A$. Hence, $a'_{i1} = a_{ij}$ and $D_i(A') = D_{ij}(A)$. Thus,
$$\det A = (-1)^{j-1} \det A' = (-1)^{j-1} \sum_{i=1}^{n} (-1)^{1+i} a'_{i1} D_i(A') = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} D_{ij}(A).$$
Similarly, there is a corresponding rule for any row of a matrix:
$$\det A = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} D_{ij}(A) = (-1)^{i+1} a_{i1} D_{i1}(A) + \cdots + (-1)^{i+j} a_{ij} D_{ij}(A) + \cdots + (-1)^{i+n} a_{in} D_{in}(A). \tag{2}$$
This formula is obtained from (1) by transposing, applying the corresponding column rule, and then transposing back.


Example. Expand the following determinant using its second row.
$$\det \begin{bmatrix} 1 & 2 & 3 \\ 0 & 6 & 0 \\ 3 & 2 & 1 \end{bmatrix} = (-1)^{2+1} 0(\dots) + (-1)^{2+2} 6 \det \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} + (-1)^{2+3} 0(\dots) = 6(1 - 9) = -48.$$
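Formulas (1) and (2) can be implemented directly. In this Python sketch (function names are our own), indices are 0-based; that does not affect the signs, since $(-1)^{i+j}$ depends only on the parity of $i + j$. It reproduces the value $-48$ above and confirms that every row and every column gives the same answer.

```python
def minor(A, i, j):
    """Delete row i and column j (0-based) from the square matrix A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Recursive determinant (first-column expansion), used for the minors."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** k * A[k][0] * det(minor(A, k, 0)) for k in range(len(A)))

def expand_row(A, i):
    """Formula (2): det A = sum over j of (-1)^(i+j) a_ij D_ij(A), along row i."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for j in range(len(A)))

def expand_col(A, j):
    """Formula (1): det A = sum over i of (-1)^(i+j) a_ij D_ij(A), along column j."""
    return sum((-1) ** (i + j) * A[i][j] * det(minor(A, i, j))
               for i in range(len(A)))

A = [[1, 2, 3], [0, 6, 0], [3, 2, 1]]
print(expand_row(A, 1))   # -48 (index 1 is the second row)
print({expand_row(A, i) for i in range(3)} | {expand_col(A, j) for j in range(3)})   # {-48}
```

In hand computation the advantage is choosing a row or column with many zeroes, as the example does with the second row.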

There is some terminology which you may see used in connection with these formulas. The determinant $D_{ij}(A)$ of the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th row and $j$th column is called the $i, j$-minor of $A$. The quantity $(-1)^{i+j} D_{ij}(A)$ is called the $i, j$-cofactor. Formula (1) is called expansion in minors (or cofactors) of the $j$th column and formula (2) is called expansion in minors (or cofactors) of the $i$th row. It is not necessary to remember the terminology as long as you remember the formulas and understand how they are used.

Cramer’s Rule. One may use determinants to derive a formula for the solutions of a non-singular system of $n$ equations in $n$ unknowns
$$\begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$$

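The formula is called Cramer’s rule, and here it is. For the $j$th unknown $x_j$, take the determinant of the matrix formed by replacing the $j$th column of the coefficient matrix $A$ by $b$, and divide it by $\det A$. In symbols,
$$x_j = \frac{\det \begin{bmatrix} a_{11} & \dots & b_1 & \dots & a_{1n} \\ a_{21} & \dots & b_2 & \dots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \dots & b_n & \dots & a_{nn} \end{bmatrix}}{\det \begin{bmatrix} a_{11} & \dots & a_{1j} & \dots & a_{1n} \\ a_{21} & \dots & a_{2j} & \dots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \dots & a_{nj} & \dots & a_{nn} \end{bmatrix}}.$$
Example. Consider
$$\begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 2 \\ 2 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}.$$
We have
$$\det \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 2 \\ 2 & 0 & 6 \end{bmatrix} = 2.$$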


(Do you see a quick way to compute that?) Hence,
$$x_1 = \frac{\det \begin{bmatrix} 1 & 0 & 2 \\ 5 & 1 & 2 \\ 3 & 0 & 6 \end{bmatrix}}{2} = \frac{0}{2} = 0, \qquad x_2 = \frac{\det \begin{bmatrix} 1 & 1 & 2 \\ 1 & 5 & 2 \\ 2 & 3 & 6 \end{bmatrix}}{2} = \frac{8}{2} = 4, \qquad x_3 = \frac{\det \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 5 \\ 2 & 0 & 3 \end{bmatrix}}{2} = \frac{1}{2}.$$
You should try to do this by Gauss-Jordan reduction.

Cramer’s rule is not too useful for solving specific numerical systems of equations. The only practical method for calculating the needed determinants for $n$ large is to use row (and possibly column) operations. It is usually easier to use row operations to solve the system without resorting to determinants. However, if the system has non-numeric symbolic coefficients, Cramer’s rule is sometimes useful. Also, it is often valuable as a theoretical tool.

Cramer’s rule is related to expansion in minors. You can find further discussion of it and proofs in Sections 5.4 and 5.5 of Introduction to Linear Algebra by Johnson, Riess, and Arnold. (See also Section 4.5 of Applied Linear Algebra by Noble and Daniel.)

Appendix. Some Proofs. Here are the proofs of two important theorems stated in this section.

The Product Rule. $\det(AB) = (\det A)(\det B)$.

Proof. First assume that $A$ is non-singular. Then there is a sequence of row operations which reduces $A$ to the identity
$$A \to A_1 \to A_2 \to \cdots \to A_k = I.$$
Associated with each of these operations will be a multiplier $c_i$ which will depend on the particular operation, and
$$\det A = c_1 \det A_1 = c_1 c_2 \det A_2 = \cdots = c_1 c_2 \cdots c_k \det A_k = c_1 c_2 \cdots c_k,$$
since $A_k = I$ and $\det I = 1$. Now apply exactly these row operations to the product $AB$. (Each row operation amounts to multiplying on the left by an elementary matrix, so performing it on $A_{i-1} B$ yields $A_i B$.)
$$AB \to A_1 B \to A_2 B \to \cdots \to A_k B = IB = B.$$
The same multipliers contribute factors at each stage, and
$$\det AB = c_1 \det A_1 B = c_1 c_2 \det A_2 B = \cdots = \underbrace{c_1 c_2 \cdots c_k}_{\det A} \det B = \det A \det B.$$


Assume instead that $A$ is singular. Then $AB$ is also singular. (This follows from the fact that the rank of $AB$ is at most the rank of $A$, as mentioned in the Exercises for Chapter 1, Section 6. However, here is a direct proof for the record. Choose a sequence of elementary row operations for $A$, the end result of which is a matrix $A'$ with at least one row of zeroes. Applying the same operations to $AB$ yields $A'B$, which also has to have at least one row of zeroes.) It follows that both $\det AB$ and $\det A \det B$ are zero, so they are equal. □

The Transpose Rule. $\det A^t = \det A$.

Proof. If $A$ is singular, then $A^t$ is also singular and vice versa. For, the rank may be characterized as either the dimension of the row space or the dimension of the column space, and an $n \times n$ matrix is singular if and only if its rank is less than $n$. Hence, in the singular case, $\det A = 0 = \det A^t$.

Suppose then that $A$ is non-singular. Then there is a sequence of elementary row operations
$$A \to A_1 \to A_2 \to \cdots \to A_k = I.$$
Recall from Chapter 1, Section 4 that each elementary row operation may be accomplished by multiplying by an appropriate elementary matrix. Let $C_i$ denote the elementary matrix needed to perform the $i$th row operation. Then
$$A \to A_1 = C_1 A \to A_2 = C_2 C_1 A \to \cdots \to A_k = C_k C_{k-1} \cdots C_2 C_1 A = I.$$
In other words,
$$A = (C_k \cdots C_2 C_1)^{-1} = C_1^{-1} C_2^{-1} \cdots C_k^{-1}.$$
To simplify the notation, let $D_i = C_i^{-1}$. The inverse $D$ of an elementary matrix $C$ is also an elementary matrix; its effect is the row operation which reverses the effect of $C$. Hence, we have shown that any non-singular square matrix $A$ may be expressed as a product of elementary matrices $A = D_1 D_2 \cdots D_k$. Hence, by the product rule,
$$\det A = (\det D_1)(\det D_2) \cdots (\det D_k).$$
On the other hand, we have by the rule for the transpose of a product $A^t = D_k^t \cdots D_2^t D_1^t$, so by the product rule
$$\det A^t = \det(D_k^t) \cdots \det(D_2^t) \det(D_1^t).$$
Suppose we know the rule $\det D^t = \det D$ for any elementary matrix $D$. Then
$$\det A^t = \det(D_k^t) \cdots \det(D_2^t) \det(D_1^t) = \det(D_k) \cdots \det(D_2) \det(D_1) = (\det D_1)(\det D_2) \cdots (\det D_k) = \det A.$$

4. SOME IMPORTANT PROPERTIES OF DETERMINANTS


(We used the fact that the products on the right are products of scalars and so can be rearranged any way we like.) It remains to establish the rule for elementary matrices. If $D = E_{ij}(c)$ is obtained from the identity matrix by adding $c$ times its $j$th row to its $i$th row, then $D^t = E_{ji}(c)$ is a matrix of exactly the same type. Both have determinant 1, so $\det D = \det D^t = 1$. If $D = E_i(c)$ is obtained by multiplying the $i$th row of the identity matrix by $c$, then $D^t$ is exactly the same matrix $E_i(c)$. Finally, if $D = E_{ij}$ is obtained from the identity matrix by interchanging its $i$th and $j$th rows, then $D^t$ is $E_{ji}$, which in fact is just $E_{ij}$ again. Hence, in each case $\det D^t = \det D$ does hold. □

Exercises for Section 3.

1. Check the validity of the product rule for the product
$$\begin{bmatrix} 1 & -2 & 6 \\ 2 & 0 & 3 \\ -3 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 3 & 1 \\ 1 & 2 & 2 \\ 1 & 1 & 0 \end{bmatrix}.$$

2. If $A$ and $B$ are $n \times n$ matrices, both of rank $n$, what can you say about the rank of $AB$?

3. Find
$$\det \begin{bmatrix} 3 & 0 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 1 & 6 & 4 & 0 \\ 1 & 5 & 4 & 3 \end{bmatrix}.$$
Of course, the answer is the product of the diagonal entries. Using the properties discussed in the section, see how many different ways you can come to this conclusion. What can you conclude in general about the determinant of a lower triangular square matrix?

4. (a) Show that if $A$ is an invertible $n \times n$ matrix, then $\det(A^{-1}) = \dfrac{1}{\det A}$. Hint: Let $B = A^{-1}$ and apply the product rule to $AB$.
(b) Using part (a), show that if $A$ is any $n \times n$ matrix and $P$ is an invertible $n \times n$ matrix, then $\det(PAP^{-1}) = \det A$.

5. Why does Cramer's rule fail if the coefficient matrix $A$ is singular?

6. Use Cramer's rule to solve the system
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}.$$
Also, solve it by Gauss-Jordan reduction and compare the amount of work you had to do in each case.
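The product rule and the transpose rule established above are easy to spot-check by machine. The following Python sketch is only an illustration: the helper names `det`, `matmul`, and `transpose` are our own, the determinant is computed by cofactor expansion along the first row (exact for integer entries, but far too slow for large matrices), and the two test matrices were chosen arbitrarily rather than taken from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (exact for integers)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]  # delete row 0 and column j
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Transpose: rows become columns."""
    return [list(col) for col in zip(*m)]

A = [[1, 2, 0], [3, -1, 4], [2, 0, 1]]
B = [[0, 1, 1], [2, 2, -1], [1, 0, 3]]

print(det(A), det(B), det(matmul(A, B)))  # 9 -9 -81  (product rule)
print(det(transpose(A)))                  # 9         (transpose rule)
```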


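4. Eigenvalues and Eigenvectors for n × n Matrices

One way in which to understand a matrix $A$ is to examine its effects on the geometry of vectors in $\mathbb{R}^n$. For example, we saw that $\det A$ measures the relative change in area or volume for figures generated by vectors in two or three dimensions. Also, as we have seen in an exercise in Chapter I, Section 2, the products $Ae_1, Ae_2, \dots, Ae_n$ of $A$ with the standard basis vectors are just the columns of the matrix $A$. More generally, it is often useful to look at the products $Av$ for other vectors $v$.

Example 1. Let
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
Here are some examples of products $Av$.
$$\begin{array}{c|ccccc}
v & \begin{bmatrix} 1 \\ 0 \end{bmatrix} & \begin{bmatrix} 1 \\ 0.5 \end{bmatrix} & \begin{bmatrix} 1 \\ 1 \end{bmatrix} & \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} & \begin{bmatrix} 0 \\ 1 \end{bmatrix} \\[8pt]
Av & \begin{bmatrix} 2 \\ 1 \end{bmatrix} & \begin{bmatrix} 2.5 \\ 2 \end{bmatrix} & \begin{bmatrix} 3 \\ 3 \end{bmatrix} & \begin{bmatrix} 2 \\ 2.5 \end{bmatrix} & \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\end{array}$$
These illustrate a trend for vectors in the first quadrant.

[Figure: left panel "Vectors v", right panel "Transformed vectors Av".]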

Vectors pointing near one or the other of the two axes are directed closer to the diagonal line. A diagonal vector is transformed into another diagonal vector. Let A be any n × n matrix. In general, if v is a vector in Rn , the transformed vector Av will differ from v in both magnitude and direction. However, some vectors v will have the property that Av ends up being parallel to v; i.e., it points in the same direction or the opposite direction. These vectors will specify ‘natural’ axes for any problem involving the matrix A.
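The trend in Example 1 can be reproduced numerically. The short Python sketch below (our own illustration; the function names `mat_vec` and `angle_deg` are hypothetical) computes $Av$ for the five vectors in the table and reports the angle each vector makes with the positive $x$-axis, so you can watch the off-diagonal vectors being turned toward the 45° diagonal while $(1, 1)$ keeps its direction.

```python
import math

A = [[2, 1], [1, 2]]

def mat_vec(a, v):
    """Product of a 2 x 2 matrix a with a vector v in R^2."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

def angle_deg(v):
    """Angle from the positive x-axis to v, in degrees."""
    return math.degrees(math.atan2(v[1], v[0]))

vectors = [[1, 0], [1, 0.5], [1, 1], [0.5, 1], [0, 1]]
for v in vectors:
    w = mat_vec(A, v)
    print(v, "->", w, f"angle {angle_deg(v):.1f} -> {angle_deg(w):.1f} degrees")
```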

4. EIGENVALUES AND EIGENVECTORS FOR n × n MATRICES


[Figure: an eigenvector $v$ and its image $Av$, for a positive eigenvalue (left) and for a negative eigenvalue (right).]

Vectors are parallel when one is a scalar multiple of the other, so we make the following formal definition. A non-zero vector $v$ is called an eigenvector for the square matrix $A$ if
$$Av = \lambda v \tag{1}$$
for an appropriate scalar $\lambda$. The scalar $\lambda$ is called the eigenvalue associated with the eigenvector $v$. In words, this says that $v \neq 0$ is an eigenvector for $A$ if multiplying it by $A$ has the same effect as multiplying it by an appropriate scalar. Thus, we may think of eigenvectors as being vectors for which matrix multiplication by $A$ takes on a particularly simple form. It is important to note that while the eigenvector $v$ must be non-zero, the corresponding eigenvalue $\lambda$ is allowed to be zero.

Example 2. Let
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
We want to see if the system
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \lambda \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$$
has non-trivial solutions $v_1, v_2$. This of course depends on $\lambda$. If we write this system out, it becomes
$$\begin{aligned} 2v_1 + v_2 &= \lambda v_1 \\ v_1 + 2v_2 &= \lambda v_2 \end{aligned}$$
or, collecting terms,
$$\begin{aligned} (2 - \lambda)v_1 + v_2 &= 0 \\ v_1 + (2 - \lambda)v_2 &= 0. \end{aligned}$$
In matrix form, this becomes
$$\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} v = 0. \tag{2}$$


For any specific $\lambda$, this is a homogeneous system of two equations in two unknowns. By the theory developed in the previous sections, we know that it will have non-zero solutions precisely in the case that the rank is smaller than two. A simple criterion for that to be the case is that the determinant of the coefficient matrix should vanish, i.e.,
$$\det \begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} = (2-\lambda)^2 - 1 = 0$$
or
$$4 - 4\lambda + \lambda^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1) = 0.$$
The roots of this equation are $\lambda = 3$ and $\lambda = 1$. Thus, these and only these scalars $\lambda$ can be eigenvalues for appropriate eigenvectors.

First consider $\lambda = 3$. Putting this in (2) yields
$$\begin{bmatrix} 2-3 & 1 \\ 1 & 2-3 \end{bmatrix} v = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} v = 0. \tag{3}$$
Gauss-Jordan reduction yields
$$\begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}.$$
(As is usual for homogeneous systems, we don't need to explicitly write down the augmented matrix, because there are zeroes to the right of the 'bar'.) The corresponding system is $v_1 - v_2 = 0$, and the general solution is $v_1 = v_2$ with $v_2$ free. A general solution vector has the form
$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} v_2 \\ v_2 \end{bmatrix} = v_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Put $v_2 = 1$ to obtain
$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},$$
which will form a basis for the solution space of (3). Any other eigenvector for $\lambda = 3$ will be a non-zero multiple of the basis vector $v_1$.

Consider next the eigenvalue $\lambda = 1$. Put this in (2) to obtain
$$\begin{bmatrix} 2-1 & 1 \\ 1 & 2-1 \end{bmatrix} v = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} v = 0. \tag{4}$$
In this case, Gauss-Jordan reduction (which we omit) yields the general solution $v_1 = -v_2$ with $v_2$ free. The general solution vector is
$$v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = v_2 \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
Putting $v_2 = 1$ yields the basic eigenvector
$$v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
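The procedure just carried out by hand, namely row reducing $A - \lambda I$ and reading off one basis vector for each free variable, can be sketched in Python with exact rational arithmetic from the standard `fractions` module. The function names `rref` and `eigenspace_basis` are our own; the sketch assumes $\lambda$ really is a root of the characteristic equation (if it is not, $A - \lambda I$ has rank $n$ and the sketch simply returns an empty basis).

```python
from fractions import Fraction

def rref(m):
    """Gauss-Jordan reduction with exact fractions.
    Returns the reduced row echelon form and the list of pivot columns."""
    m = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(cols):
        # Look for a non-zero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]                  # interchange rows
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]         # make the pivot equal to 1
        for i in range(rows):                    # clear the rest of column c
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return m, pivots

def eigenspace_basis(a, lam):
    """Basis of the null space of A - lam*I, one vector per free variable."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    reduced, pivots = rref(shifted)
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                       # this free variable = 1, the others = 0
        for row, pc in zip(reduced, pivots):
            v[pc] = -row[f]                      # solve for each pivot variable
        basis.append(v)
    return basis

A = [[2, 1], [1, 2]]
print(eigenspace_basis(A, 3))  # [[Fraction(1, 1), Fraction(1, 1)]]
print(eigenspace_basis(A, 1))  # [[Fraction(-1, 1), Fraction(1, 1)]]
```

For the matrix of Example 3 below, `eigenspace_basis([[1, 4, 3], [4, 1, 0], [3, 0, 1]], 6)` returns the single vector $(5/3, 4/3, 1)$; scaling it by 3 gives the basis vector chosen in the text.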


The general case. We redo the above algebra for an arbitrary $n \times n$ matrix. First, rewrite the eigenvector condition as follows:
$$\begin{aligned} Av &= \lambda v \\ Av - \lambda v &= 0 \\ Av - \lambda I v &= 0 \\ (A - \lambda I)v &= 0. \end{aligned}$$
The last equation is the homogeneous $n \times n$ system with $n \times n$ coefficient matrix
$$A - \lambda I = \begin{bmatrix} a_{11} - \lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} - \lambda & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} - \lambda \end{bmatrix}.$$
It has a non-zero solution vector $v$ if and only if the coefficient matrix has rank less than $n$, i.e., if and only if it is singular. By Theorem 2.1, this will be true if and only if $\lambda$ satisfies the characteristic equation
$$\det(A - \lambda I) = \det \begin{bmatrix} a_{11} - \lambda & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} - \lambda & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} - \lambda \end{bmatrix} = 0. \tag{5}$$
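For small matrices the characteristic polynomial in (5) can be computed exactly by machine. The Python sketch below (our own illustration) expands $\det(A - \lambda I)$ along the first row, treating each entry as a polynomial in $\lambda$ stored as a list of integer coefficients, lowest degree first. This representation and the function names `padd`, `pmul`, `pdet`, and `char_poly` are assumptions of the sketch, and cofactor expansion is only sensible for small $n$.

```python
def padd(p, q):
    """Sum of two polynomials given as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def pmul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def pdet(m):
    """Determinant of a matrix of polynomials, by expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = [0]
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        sign = [(-1) ** j]
        total = padd(total, pmul(sign, pmul(m[0][j], pdet(minor))))
    return total

def char_poly(a):
    """Coefficients [c0, c1, ..., cn] of det(A - lambda*I), lowest degree first."""
    n = len(a)
    # Entry (i, j) of A - lambda*I: the constant a_ij, plus -lambda on the diagonal.
    m = [[[a[i][j], -1] if i == j else [a[i][j]] for j in range(n)] for i in range(n)]
    return pdet(m)

print(char_poly([[2, 1], [1, 2]]))                   # [3, -4, 1]
print(char_poly([[1, 4, 3], [4, 1, 0], [3, 0, 1]]))  # [-24, 22, 3, -1]
```

The first line is $\lambda^2 - 4\lambda + 3$ from Example 2; the second is $-\lambda^3 + 3\lambda^2 + 22\lambda - 24$, which factors as $(1 - \lambda)(\lambda - 6)(\lambda + 4)$ for the matrix of Example 3 below.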

As in the example, the strategy for finding eigenvalues and eigenvectors is as follows. First find the roots of the characteristic equation. These are the eigenvalues. Then for each root $\lambda$, find a general solution for the system
$$(A - \lambda I)v = 0. \tag{6}$$
This gives us all the eigenvectors for that eigenvalue. The solution space of the system (6), i.e., the null space of the matrix $A - \lambda I$, is called the eigenspace corresponding to the eigenvalue $\lambda$.

Example 3. Consider the matrix
$$A = \begin{bmatrix} 1 & 4 & 3 \\ 4 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}.$$
The characteristic equation is
$$\begin{aligned} \det(A - \lambda I) &= \det \begin{bmatrix} 1-\lambda & 4 & 3 \\ 4 & 1-\lambda & 0 \\ 3 & 0 & 1-\lambda \end{bmatrix} \\ &= (1-\lambda)\big((1-\lambda)^2 - 0\big) - 4\big(4(1-\lambda) - 0\big) + 3\big(0 - 3(1-\lambda)\big) \\ &= (1-\lambda)^3 - 25(1-\lambda) = (1-\lambda)\big((1-\lambda)^2 - 25\big) \\ &= (1-\lambda)(\lambda^2 - 2\lambda - 24) = (1-\lambda)(\lambda - 6)(\lambda + 4) = 0. \end{aligned}$$


Hence, the eigenvalues are $\lambda = 1$, $\lambda = 6$, and $\lambda = -4$. We proceed to find the eigenspaces for each of these eigenvalues, starting with the largest.

First, take $\lambda = 6$, and put it in (6) to obtain the system
$$\begin{bmatrix} 1-6 & 4 & 3 \\ 4 & 1-6 & 0 \\ 3 & 0 & 1-6 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0 \quad\text{or}\quad \begin{bmatrix} -5 & 4 & 3 \\ 4 & -5 & 0 \\ 3 & 0 & -5 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0.$$
To solve, use Gauss-Jordan reduction:
$$\begin{aligned} \begin{bmatrix} -5 & 4 & 3 \\ 4 & -5 & 0 \\ 3 & 0 & -5 \end{bmatrix} &\to \begin{bmatrix} -1 & -1 & 3 \\ 4 & -5 & 0 \\ 3 & 0 & -5 \end{bmatrix} \to \begin{bmatrix} -1 & -1 & 3 \\ 0 & -9 & 12 \\ 0 & -3 & 4 \end{bmatrix} \\ &\to \begin{bmatrix} -1 & -1 & 3 \\ 0 & 0 & 0 \\ 0 & -3 & 4 \end{bmatrix} \to \begin{bmatrix} -1 & -1 & 3 \\ 0 & 3 & -4 \\ 0 & 0 & 0 \end{bmatrix} \\ &\to \begin{bmatrix} 1 & 1 & -3 \\ 0 & 1 & -4/3 \\ 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -5/3 \\ 0 & 1 & -4/3 \\ 0 & 0 & 0 \end{bmatrix}. \end{aligned}$$
Note that the matrix is singular, and the rank is smaller than 3. This must be the case because the condition $\det(A - \lambda I) = 0$ guarantees it. If the coefficient matrix were non-singular, you would know that there was a mistake: either the roots of the characteristic equation are wrong or the row reduction was not done correctly. The general solution is
$$v_1 = (5/3)v_3, \qquad v_2 = (4/3)v_3$$
with $v_3$ free. The general solution vector is
$$v = \begin{bmatrix} (5/3)v_3 \\ (4/3)v_3 \\ v_3 \end{bmatrix} = v_3 \begin{bmatrix} 5/3 \\ 4/3 \\ 1 \end{bmatrix}.$$
Hence, the eigenspace is 1-dimensional. A basis may be obtained by setting $v_3 = 1$ as usual, but it is a bit neater to put $v_3 = 3$ so as to avoid fractions. Thus,
$$v_1 = \begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix}$$
constitutes a basis for the eigenspace corresponding to the eigenvalue $\lambda = 6$. Note that we have now found all eigenvectors for this eigenvalue. They are all the non-zero vectors in this 1-dimensional eigenspace, i.e., all non-zero multiples of $v_1$.

Next take $\lambda = 1$ and put it in (6) to obtain the system
$$\begin{bmatrix} 0 & 4 & 3 \\ 4 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0.$$


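Use Gauss-Jordan reduction:
$$\begin{bmatrix} 0 & 4 & 3 \\ 4 & 0 & 0 \\ 3 & 0 & 0 \end{bmatrix} \to \cdots \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3/4 \\ 0 & 0 & 0 \end{bmatrix}.$$
The general solution is
$$v_1 = 0, \qquad v_2 = -(3/4)v_3$$
with $v_3$ free. Thus the general solution vector is
$$v = \begin{bmatrix} 0 \\ -(3/4)v_3 \\ v_3 \end{bmatrix} = v_3 \begin{bmatrix} 0 \\ -3/4 \\ 1 \end{bmatrix}.$$
Put $v_3 = 4$ to obtain a single basis vector
$$v_2 = \begin{bmatrix} 0 \\ -3 \\ 4 \end{bmatrix}$$
for the eigenspace corresponding to the eigenvalue $\lambda = 1$. The set of eigenvectors for this eigenvalue is the set of non-zero multiples of $v_2$.

Finally, take $\lambda = -4$, and put this in (6) to obtain the system
$$\begin{bmatrix} 5 & 4 & 3 \\ 4 & 5 & 0 \\ 3 & 0 & 5 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = 0.$$
Solve this by Gauss-Jordan reduction:
$$\begin{aligned} \begin{bmatrix} 5 & 4 & 3 \\ 4 & 5 & 0 \\ 3 & 0 & 5 \end{bmatrix} &\to \begin{bmatrix} 1 & -1 & 3 \\ 4 & 5 & 0 \\ 3 & 0 & 5 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 3 \\ 0 & 9 & -12 \\ 0 & 3 & -4 \end{bmatrix} \\ &\to \begin{bmatrix} 1 & -1 & 3 \\ 0 & 3 & -4 \\ 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 5/3 \\ 0 & 1 & -4/3 \\ 0 & 0 & 0 \end{bmatrix}. \end{aligned}$$
The general solution is
$$v_1 = -(5/3)v_3, \qquad v_2 = (4/3)v_3$$
with $v_3$ free. The general solution vector is
$$v = \begin{bmatrix} -(5/3)v_3 \\ (4/3)v_3 \\ v_3 \end{bmatrix} = v_3 \begin{bmatrix} -5/3 \\ 4/3 \\ 1 \end{bmatrix}.$$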


Setting $v_3 = 3$ yields the basis vector
$$v_3 = \begin{bmatrix} -5 \\ 4 \\ 3 \end{bmatrix}$$
for the eigenspace corresponding to $\lambda = -4$. The set of eigenvectors for this eigenvalue consists of all non-zero multiples of $v_3$.

The set $\{v_1, v_2, v_3\}$ obtained in the previous example is linearly independent. To see this, apply Gaussian reduction to the matrix with these vectors as columns:
$$\begin{bmatrix} 5 & 0 & -5 \\ 4 & -3 & 4 \\ 3 & 4 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & -3 & 8 \\ 0 & 4 & 6 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -8/3 \\ 0 & 0 & 50/3 \end{bmatrix}.$$
The reduced matrix has rank 3, so the columns of the original matrix form an independent set.

It is no accident that a set so obtained is linearly independent. The following theorem tells us that this will always be the case.

Theorem 2.5. Let $A$ be an $n \times n$ matrix. Let $\lambda_1, \lambda_2, \dots, \lambda_k$ be different eigenvalues of $A$, and let $v_1, v_2, \dots, v_k$ be corresponding eigenvectors. Then $\{v_1, v_2, \dots, v_k\}$ is a linearly independent set.

See the appendix if you are interested in a proof.

Historical Aside. The concepts discussed here were invented by the 19th century English mathematicians Cayley and Sylvester, but they used the terms 'characteristic vector' and 'characteristic value'. These were translated into German as 'Eigenvektor' and 'Eigenwert', and then partially translated back into English, largely by physicists, as 'eigenvector' and 'eigenvalue'. Some English and American mathematicians tried to retain the original English terms, but they were overwhelmed by extensive use of the physicists' language in applications. Nowadays everyone uses the German terms. The one exception is that we still call $\det(A - \lambda I) = 0$ the characteristic equation and not some strange German-English name.

Solving Polynomial Equations. To find the eigenvalues of an $n \times n$ matrix, you have to solve a polynomial equation. You all know how to solve quadratic equations, but you may be stumped by cubic or higher equations, particularly if there are no obvious ways to factor. You should review what you learned in high school about this subject, but here are a few guidelines to help you.
First, it is not generally possible to find a simple solution in closed form for an algebraic equation. For most equations you might encounter in practice, you would have to use some method to approximate a solution. (Many such methods exist. One you may have learned in your calculus course is Newton's Method.) Unfortunately, an approximate solution of the characteristic equation isn't much good for finding the corresponding eigenvectors. After all, the system $(A - \lambda I)v = 0$ must have rank smaller than $n$ for there to be non-zero solutions $v$. If you replace the exact value of $\lambda$ by an approximation, the chances are that the new system will have rank $n$. Hence, the textbook method we have described for finding eigenvectors won't work. There are in fact many alternative methods for finding eigenvalues and eigenvectors approximately when exact solutions are not available. Whole books are devoted to such methods. (See Johnson, Riess, and Arnold or Noble and Daniel for some discussion of these matters.)

Fortunately, textbook exercises and examination questions almost always involve characteristic equations for which exact solutions exist, but it is not always obvious what they are. Here is one fact (a consequence of an important result called Gauss's Lemma) which helps us find such exact solutions when they exist. Consider an equation of the form
$$\lambda^n + a_1\lambda^{n-1} + \dots + a_{n-1}\lambda + a_n = 0$$
where all the coefficients are integers. (The characteristic equation of a matrix always has leading coefficient $1$ or $-1$. In the latter case, just imagine you have multiplied through by $-1$ to apply the method.) Gauss's Lemma tells us that if this equation has any roots which are rational numbers, i.e., quotients of integers, then any such root is actually an integer, and, moreover, it must divide the constant term $a_n$. Hence, the first step in solving such an equation should be checking all possible factors (positive and negative) of the constant term. Once you know a root $r_1$, you can divide through by $\lambda - r_1$ to reduce to a lower degree equation. If you know the method of synthetic division, you will find checking the possible roots and the polynomial long division much simpler.

Example 4. Solve $\lambda^3 - 3\lambda + 2 = 0$. If there are any rational roots, they must be factors of the constant term $2$. Hence, we must try $1, -1, 2, -2$. Substituting $\lambda = 1$ in the equation yields $0$, so it is a root. Dividing $\lambda^3 - 3\lambda + 2$ by $\lambda - 1$ yields
$$\lambda^3 - 3\lambda + 2 = (\lambda - 1)(\lambda^2 + \lambda - 2)$$
and this may be factored further to obtain
$$\lambda^3 - 3\lambda + 2 = (\lambda - 1)(\lambda - 1)(\lambda + 2) = (\lambda - 1)^2(\lambda + 2).$$
Hence, the roots are $\lambda = 1$, which is a double root, and $\lambda = -2$.

Complex Roots. A polynomial equation may end up having complex roots. This can certainly occur for a characteristic equation.


Example 5. Let
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
Its characteristic equation is
$$\det(A - \lambda I) = \det \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1 = 0.$$
As you learned in high school algebra, the roots of this equation are $\pm i$, where $i$ is the imaginary square root of $-1$. In such a case, we won't have much luck in finding eigenvectors in $\mathbb{R}^n$ for such 'eigenvalues', since solving the appropriate linear equations will yield solutions with non-real, complex entries. It is possible to develop a complete theory based on complex scalars and complex entries, and such a theory is very useful in certain areas like electrical engineering. For the moment, however, we shall restrict our attention to the theory in which everything is assumed to be real. In that context, we just ignore non-real, complex roots of the characteristic equation.

Appendix. Proof of the linear independence of sets of eigenvectors for distinct eigenvalues. Assume $\{v_1, v_2, \dots, v_k\}$ is not a linearly independent set, and try to derive a contradiction. In this case, one of the vectors in the set can be expressed as a linear combination of the others. If we number the elements appropriately, we may assume that
$$v_1 = c_2v_2 + \dots + c_rv_r, \tag{7}$$
where $r \le k$. (Before renumbering, leave out any vector $v_i$ on the right if it appears with coefficient $c_i = 0$.) Note that we may also assume that no vector which appears on the right is a linear combination of the others, because otherwise we could express it so and, after combining terms, delete it from the sum. Thus we may assume the vectors which appear on the right form a linearly independent set. Multiply (7) on the left by $A$. We get
$$Av_1 = c_2Av_2 + \dots + c_rAv_r,$$
$$\lambda_1 v_1 = c_2\lambda_2 v_2 + \dots + c_r\lambda_r v_r, \tag{8}$$
where in (8) we used the fact that each $v_i$ is an eigenvector with eigenvalue $\lambda_i$. Now multiply (7) by $\lambda_1$ and subtract from (8). We get
$$0 = c_2(\lambda_2 - \lambda_1)v_2 + \dots + c_r(\lambda_r - \lambda_1)v_r. \tag{9}$$
Not all the coefficients on the right in this equation are zero. For at least one of the $c_i \neq 0$ (since $v_1 \neq 0$), and none of the quantities $\lambda_2 - \lambda_1, \dots, \lambda_r - \lambda_1$ is zero. It follows that (9) may be used to express one of the vectors $v_2, \dots, v_r$ as a linear combination of the others. However, this contradicts the assertion that the set of vectors appearing on the right is linearly independent. Hence, our initial assumption that the set $\{v_1, v_2, \dots, v_k\}$ is dependent must be false, and the theorem is proved. You should try this argument out on a set $\{v_1, v_2, v_3\}$ of three eigenvectors to see if you understand it.
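The search for rational roots described under Solving Polynomial Equations can be mechanized. The Python sketch below (an illustration; the function names are hypothetical) assumes the polynomial is monic with integer coefficients, listed highest degree first. It tries the positive and negative divisors of the constant term, which is all Gauss's Lemma requires us to check, and divides out each root it finds by synthetic division. It is applied to Example 4, to $\lambda^3 - 3\lambda^2 - 22\lambda + 24$ (which is $-\det(A - \lambda I)$ for the matrix of Example 3), and to $\lambda^2 + 1$ from Example 5, which has no rational roots.

```python
def divisors(n):
    """Positive and negative divisors of the non-zero integer n."""
    n = abs(n)
    pos = [d for d in range(1, n + 1) if n % d == 0]
    return pos + [-d for d in pos]

def synthetic_division(coeffs, r):
    """Divide a polynomial (coefficients, highest degree first) by (x - r).
    Returns (quotient coefficients, remainder)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + r * out[-1])
    return out[:-1], out[-1]

def integer_roots(coeffs):
    """Integer roots, with multiplicity, of a monic integer polynomial.
    Returns (roots, leftover factor that has no rational roots)."""
    roots = []
    while len(coeffs) > 1:
        if coeffs[-1] == 0:          # constant term 0, so x = 0 is a root
            roots.append(0)
            coeffs = coeffs[:-1]
            continue
        for r in divisors(coeffs[-1]):
            quotient, remainder = synthetic_division(coeffs, r)
            if remainder == 0:
                roots.append(r)
                coeffs = quotient
                break
        else:
            break                    # no candidate works: no rational roots left
    return roots, coeffs

print(integer_roots([1, 0, -3, 2]))     # ([1, 1, -2], [1])
print(integer_roots([1, -3, -22, 24]))  # ([1, 6, -4], [1])
print(integer_roots([1, 0, 1]))         # ([], [1, 0, 1])
```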


Exercises for Section 4.

1. Find the eigenvalues and eigenvectors for each of the following matrices. Use the method given in the text for solving the characteristic equation if it has degree greater than two.
(a) $\begin{bmatrix} 5 & -3 \\ 2 & 0 \end{bmatrix}$.
(b) $\begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 1 \\ 1 & 0 & -1 \end{bmatrix}$.
(c) $\begin{bmatrix} 2 & -1 & -1 \\ 0 & 0 & -2 \\ 0 & 1 & 3 \end{bmatrix}$.
(d) $\begin{bmatrix} 4 & -1 & -1 \\ 0 & 2 & -1 \\ 1 & 0 & 3 \end{bmatrix}$.

2. You are a mechanical engineer checking for metal fatigue in a vibrating system. Mathematical analysis reduces the problem to finding eigenvectors for the matrix
$$A = \begin{bmatrix} -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix}.$$
A member of your design team tells you that $v = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ is an eigenvector for $A$. What is the quickest way for you to check if this is correct?

3. As in the previous problem, some other member of your design team tells you that $v = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ is a basis for the eigenspace of the same matrix corresponding to one of its eigenvalues. What do you say in return?

4. Under what circumstances can zero be an eigenvalue of the square matrix $A$? Could $A$ be non-singular in this case? Hint: The characteristic equation is $\det(A - \lambda I) = 0$.

5. Let $A$ be a square matrix, and suppose $\lambda$ is an eigenvalue for $A$ with eigenvector $v$. Show that $\lambda^2$ is an eigenvalue for $A^2$ with eigenvector $v$. What about $\lambda^n$ and $A^n$ for $n$ a positive integer?

6. Suppose $A$ is non-singular. Show that $\lambda$ is an eigenvalue of $A$ if and only if $\lambda^{-1}$ is an eigenvalue of $A^{-1}$. Hint: Use the same eigenvector.

7. (a) Show that $\det(A - \lambda I)$ is a quadratic polynomial in $\lambda$ if $A$ is a $2 \times 2$ matrix.
(b) Show that $\det(A - \lambda I)$ is a cubic polynomial in $\lambda$ if $A$ is a $3 \times 3$ matrix.
(c) What would you guess is the coefficient of $\lambda^n$ in $\det(A - \lambda I)$ for $A$ an $n \times n$ matrix?

8. (Optional) Let $A$ be an $n \times n$ matrix with entries not involving $\lambda$. Prove in general that $\det(A - \lambda I)$ is a polynomial in $\lambda$ of degree $n$. Hint: Assume $B(\lambda)$ is an $n \times n$ matrix such that each column has at most one term involving $\lambda$ and that term is of the form $a + b\lambda$. Show by using the recursive definition of the determinant that $\det B(\lambda)$ is a polynomial in $\lambda$ of degree at most $n$. Now use this fact and the recursive definition of the determinant to show that $\det(A - \lambda I)$ is a polynomial of degree exactly $n$.

9. (Project) The purpose of this project is to illustrate one method for approximating an eigenvector and the corresponding eigenvalue in cases where exact calculation is not feasible. We use an example in which one can find exact answers by the usual method, at least if one uses radicals, so we can compare answers to gauge how effective the method is.
Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$. Define an infinite sequence of vectors $v_n$ in $\mathbb{R}^2$ as follows. Let $v_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Having defined $v_n$, define $v_{n+1} = Av_n$. Thus, $v_1 = Av_0$, $v_2 = Av_1 = A^2v_0$, $v_3 = Av_2 = A^3v_0$, etc. Then it turns out in this case that as $n \to \infty$, the directions of the vectors $v_n$ approach the direction of an eigenvector for $A$. Unfortunately, there is one difficulty: the magnitudes $|v_n|$ approach infinity. To get around this problem, proceed as follows. Let $v_n = \begin{bmatrix} a_n \\ b_n \end{bmatrix}$ and put $u_n = \dfrac{1}{b_n}v_n = \begin{bmatrix} a_n/b_n \\ 1 \end{bmatrix}$. Then the second component is always one, and the first component $r_n = a_n/b_n$ approaches a limit $r$, and $u = \begin{bmatrix} r \\ 1 \end{bmatrix}$ is an eigenvector for $A$.
(a) For the above matrix $A$, calculate the sequence of vectors $v_n$ and numbers $r_n$, $n = 1, 2, 3, \dots$. Do the calculations for enough $n$ to see a pattern emerging and so that you can estimate $r$ accurately to 3 decimal places.
(b) Once you know an eigenvector $u$, you can find the corresponding eigenvalue $\lambda$ by calculating $Au$. Use your estimate in part (a) to estimate the corresponding $\lambda$.
(c) Compare this to the roots of the characteristic equation $(1 - \lambda)(2 - \lambda) - 1 = 0$. Note that the method employed here only gives you one of the two eigenvalues. In fact, this method, when it works, usually gives the eigenvalue of largest absolute value.

5. Diagonalization

In many cases, the process outlined in the previous section results in a basis for $\mathbb{R}^n$ which consists of eigenvectors for the matrix $A$. Indeed, the set of eigenvectors so obtained is always linearly independent, so if it is large enough (i.e., has $n$ elements), it will be a basis. When that is the case, the use of that basis to establish a coordinate system for $\mathbb{R}^n$ can simplify calculations involving $A$.

Example 1. Let
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
We found in the previous section that
$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

5. DIAGONALIZATION


are eigenvectors respectively with eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 1$. The set $\{v_1, v_2\}$ is linearly independent, and since it has two elements, it must be a basis for $\mathbb{R}^2$. Suppose $v$ is any vector in $\mathbb{R}^2$. We may express it with respect to this new basis:
$$v = v_1y_1 + v_2y_2 = [\,v_1 \;\; v_2\,] \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},$$
where $(y_1, y_2)$ are the coordinates of $v$ with respect to this new basis. It follows that
$$Av = A(v_1y_1 + v_2y_2) = (Av_1)y_1 + (Av_2)y_2.$$
However, since $v_1$ and $v_2$ are eigenvectors, each is just multiplied by the corresponding eigenvalue, or in symbols
$$Av_1 = v_1(3) \quad\text{and}\quad Av_2 = v_2(1).$$
So
$$A(v_1y_1 + v_2y_2) = v_1(3y_1) + v_2y_2 = [\,v_1 \;\; v_2\,] \begin{bmatrix} 3y_1 \\ y_2 \end{bmatrix}. \tag{1}$$
In other words, with respect to the new coordinates, the effect of multiplication by $A$ on a vector is to multiply the first new coordinate by the first eigenvalue $\lambda_1 = 3$ and the second new coordinate by the second eigenvalue $\lambda_2 = 1$.

Whenever there is a basis for $\mathbb{R}^n$ consisting of eigenvectors for $A$, we say that $A$ is diagonalizable and that the new basis diagonalizes $A$. The reason for this terminology may be explained as follows. In the above example, rewrite the leftmost side of equation (1) as
$$A(v_1y_1 + v_2y_2) = A\,[\,v_1 \;\; v_2\,] \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
and the rightmost side as
$$[\,v_1 \;\; v_2\,] \begin{bmatrix} 3y_1 \\ y_2 \end{bmatrix} = [\,v_1 \;\; v_2\,] \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.$$
Since these two are equal for every choice of $y_1, y_2$, we may drop the common factor $\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$ on the right to get
$$A\,[\,v_1 \;\; v_2\,] = [\,v_1 \;\; v_2\,] \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$
Let
$$P = [\,v_1 \;\; v_2\,] = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},$$
i.e., $P$ is the $2 \times 2$ matrix with the basic eigenvectors $v_1, v_2$ as columns. Then, the above equation can be written
$$AP = P\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{or}\quad P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$


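The reader should check explicitly in this case that
$$\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$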
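This check can also be done by machine with exact fractions. In the Python sketch below (our own illustration; `matmul` and `inverse_2x2` are hypothetical helper names, and the inverse uses the standard formula for a $2 \times 2$ matrix), the columns of `P` are the eigenvectors $v_1$ and $v_2$.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Inverse of a non-singular 2 x 2 matrix [[a, b], [c, d]], computed exactly."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 2]]
P = [[1, -1], [1, 1]]          # columns are the eigenvectors v1 = (1, 1) and v2 = (-1, 1)
D = matmul(inverse_2x2(P), matmul(A, P))
print(D == [[3, 0], [0, 1]])   # True
```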

By means of these steps, the matrix $A$ has been expressed in terms of a diagonal matrix with its eigenvalues on the diagonal. This process is called diagonalization. We shall return in Chapter III to a more extensive discussion of diagonalization.

Example 2. Not every $n \times n$ matrix $A$ is diagonalizable. That is, it is not always possible to find a basis for $\mathbb{R}^n$ consisting of eigenvectors for $A$. For example, let
$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}.$$
The characteristic equation is
$$\det \begin{bmatrix} 3-\lambda & 1 \\ 0 & 3-\lambda \end{bmatrix} = (3-\lambda)^2 = 0.$$
There is only one root, $\lambda = 3$, which is a double root of the equation. To find the corresponding eigenvectors, we solve the homogeneous system $(A - 3I)v = 0$. The coefficient matrix
$$\begin{bmatrix} 3-3 & 1 \\ 0 & 3-3 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$
is already reduced, and the corresponding system has the general solution
$$v_2 = 0, \qquad v_1 \text{ free}.$$
The general solution vector is
$$v = \begin{bmatrix} v_1 \\ 0 \end{bmatrix} = v_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} = v_1e_1.$$
Hence, the eigenspace for $\lambda = 3$ is one dimensional with basis $\{e_1\}$. There are no other eigenvectors except for multiples of $e_1$. Thus, we can't possibly find a basis for $\mathbb{R}^2$ consisting of eigenvectors for $A$.

Note how Example 2 differs from the examples which preceded it; its characteristic equation has a repeated root. In fact, we have the following general principle. If the roots of the characteristic equation of an $n \times n$ matrix are all real and distinct, then there is necessarily a basis for $\mathbb{R}^n$ consisting of eigenvectors, and the matrix is diagonalizable. (By Theorem 2.5, one eigenvector for each of the $n$ distinct eigenvalues gives $n$ linearly independent vectors.) In general, if the characteristic equation has repeated roots, then the matrix need not be diagonalizable. However, we might be lucky, and such a matrix may still be diagonalizable.
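Since an eigenspace is the null space of $A - \lambda I$, its dimension is $n$ minus the rank of $A - \lambda I$, and this is easy to compute. The Python sketch below (our own illustration using exact fractions; `rank` and `eigenspace_dim` are hypothetical names) contrasts Example 2 with the diagonal matrix $3I$, which has the same characteristic equation $(3 - \lambda)^2 = 0$ but a two-dimensional eigenspace.

```python
from fractions import Fraction

def rank(m):
    """Rank of a matrix, by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, rows):      # clear the entries below the pivot
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def eigenspace_dim(a, lam):
    """Dimension of the eigenspace for lam: n minus the rank of A - lam*I."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

print(eigenspace_dim([[3, 1], [0, 3]], 3))  # 1: double root, but only a line of eigenvectors
print(eigenspace_dim([[3, 0], [0, 3]], 3))  # 2: every non-zero vector is an eigenvector
```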


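Example 3. Consider the matrix
$$A = \begin{bmatrix} 1 & 1 & -1 \\ -1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix}.$$
First solve the characteristic equation
$$\begin{aligned} \det \begin{bmatrix} 1-\lambda & 1 & -1 \\ -1 & 3-\lambda & -1 \\ -1 & 1 & 1-\lambda \end{bmatrix} &= (1-\lambda)\big((3-\lambda)(1-\lambda) + 1\big) + (1 - \lambda + 1) - (-1 + 3 - \lambda) \\ &= (1-\lambda)(3 - 4\lambda + \lambda^2 + 1) + 2 - \lambda - 2 + \lambda \\ &= (1-\lambda)(\lambda^2 - 4\lambda + 4) = (1-\lambda)(\lambda - 2)^2 = 0. \end{aligned}$$
Note that 2 is a repeated root. We find the eigenvectors for each of these eigenvalues. For $\lambda = 2$ we need to solve $(A - 2I)v = 0$:
$$\begin{bmatrix} -1 & 1 & -1 \\ -1 & 1 & -1 \\ -1 & 1 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
The general solution of the system is $v_1 = v_2 - v_3$ with $v_2, v_3$ free. The general solution vector for that system is
$$v = \begin{bmatrix} v_2 - v_3 \\ v_2 \\ v_3 \end{bmatrix} = v_2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + v_3 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
The eigenspace is two dimensional. Thus, for the eigenvalue $\lambda = 2$ we obtain two basic eigenvectors
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},$$
and any eigenvector for $\lambda = 2$ is a non-trivial linear combination of these. For $\lambda = 1$, we need to solve $(A - I)v = 0$:
$$\begin{bmatrix} 0 & 1 & -1 \\ -1 & 2 & -1 \\ -1 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.$$
The general solution of the system is $v_1 = v_3$, $v_2 = v_3$ with $v_3$ free. The general solution vector is
$$v = \begin{bmatrix} v_3 \\ v_3 \\ v_3 \end{bmatrix} = v_3 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$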


The eigenspace is one dimensional, and a basic eigenvector for $\lambda = 1$ is
$$v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
It is not hard to check that the set of these basic eigenvectors
$$\left\{ v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\; v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\; v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is linearly independent, so it is a basis for $\mathbb{R}^3$. The matrix is diagonalizable.

In the above example, the reason we ended up with a basis for $\mathbb{R}^3$ consisting of eigenvectors for $A$ was that there were two basic eigenvectors for the double root $\lambda = 2$. In other words, the dimension of the eigenspace was the same as the multiplicity.

Theorem 2.6. Let $A$ be an $n \times n$ matrix. The dimension of the eigenspace corresponding to a given eigenvalue is always less than or equal to the multiplicity of that eigenvalue. In particular, if all the roots of the characteristic polynomial are real, then the matrix will be diagonalizable provided, for every eigenvalue, the dimension of the eigenspace is the same as the multiplicity of the eigenvalue. If this fails for at least one eigenvalue, then the matrix won't be diagonalizable.

Note. If the characteristic polynomial has non-real, complex roots, the matrix also won't be diagonalizable in our sense, since we require all scalars to be real. However, it might still be diagonalizable in the more general theory allowing complex scalars as entries of vectors and matrices.

Exercises for Section 5.

1. (a) Find a basis for $\mathbb{R}^2$ consisting of eigenvectors for
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}.$$
(b) Let $P$ be the matrix with columns the basis vectors you found in part (a). Check that $P^{-1}AP$ is diagonal with the eigenvalues on the diagonal.

2. (a) Find a basis for $\mathbb{R}^3$ consisting of eigenvectors for
$$A = \begin{bmatrix} 1 & 2 & -4 \\ 2 & -2 & -2 \\ -4 & -2 & 1 \end{bmatrix}.$$
(b) Let $P$ be the matrix with columns the basis vectors in part (a). Calculate $P^{-1}AP$ and check that it is diagonal with the diagonal entries the eigenvalues you found.

6. THE EXPONENTIAL OF A MATRIX

3. (a) Find the eigenvalues and eigenvectors for
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$
(b) Is $A$ diagonalizable?

4. (a) Find a basis for $\mathbb{R}^3$ consisting of eigenvectors for
$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$
(b) Find a matrix $P$ such that $P^{-1}AP$ is diagonal. Hint: See Problem 1.

5. Suppose $A$ is a $5 \times 5$ matrix with exactly three (real) eigenvalues $\lambda_1, \lambda_2, \lambda_3$. Suppose these have multiplicities $m_1$, $m_2$, and $m_3$ as roots of the characteristic equation. Let $d_1$, $d_2$, and $d_3$ respectively be the dimensions of the eigenspaces for $\lambda_1$, $\lambda_2$, and $\lambda_3$. In each of the following cases, are the given numbers possible, and if so, is $A$ diagonalizable?
(a) $m_1 = 1, d_1 = 1, m_2 = 2, d_2 = 2, m_3 = 2, d_3 = 2$.
(b) $m_1 = 2, d_1 = 1, m_2 = 1, d_2 = 1, m_3 = 2, d_3 = 2$.
(c) $m_1 = 1, d_1 = 2, m_2 = 2, d_2 = 2, m_3 = 2, d_3 = 2$.
(d) $m_1 = 1, d_1 = 1, m_2 = 1, d_2 = 1, m_3 = 1, d_3 = 1$.

6. Tell if each of the following matrices is diagonalizable or not.
(a) $\begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$, (b) $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, (c) $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

6. The Exponential of a Matrix

Recall the series expansion for the exponential function
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{j=0}^{\infty} \frac{x^j}{j!}.$$
This series is especially well behaved. It converges for all possible $x$.

There are situations in which one would like to make sense of expressions of the form $f(A)$ where $f(x)$ is a well defined function of a scalar variable and $A$ is a square matrix. One way to do this is to try to make a series expansion. We show how to do this for the exponential function. Define
$$e^A = I + A + \frac{1}{2}A^2 + \frac{1}{3!}A^3 + \cdots = \sum_{j=0}^{\infty} \frac{1}{j!}A^j.$$
A little explanation is necessary. Each term on the right is an $n \times n$ matrix. If there were only a finite number of such terms, there would be no problem, and the sum would also be an $n \times n$ matrix. In general, however, there are infinitely many terms, and we have to worry about whether it makes sense to add them up.


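Example 1. Let
$$A = t\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$
Then
$$A^2 = t^2\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad A^3 = t^3\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad A^4 = t^4\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad A^5 = t^5\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad \dots$$
Hence,
$$\begin{aligned} e^A &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + t\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} + \frac{t^2}{2}\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} + \frac{t^3}{3!}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} + \cdots \\ &= \begin{bmatrix} 1 - \frac{t^2}{2} + \frac{t^4}{4!} - \cdots & t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots \\ -t + \frac{t^3}{3!} - \frac{t^5}{5!} + \cdots & 1 - \frac{t^2}{2} + \frac{t^4}{4!} - \cdots \end{bmatrix} \\ &= \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}. \end{aligned}$$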
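Partial sums of the defining series are easy to compute, and for Example 1 they can be compared with the closed form just found. The Python sketch below is our own illustration: it uses plain floating point, the function name `exp_series` is hypothetical, and the choice of 30 terms is an assumption that is ample for a small matrix like this one but is not a general-purpose method. (Library routines such as SciPy's `scipy.linalg.expm` use more robust algorithms than direct summation.)

```python
import math

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(a, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)! of the series for e^A."""
    n = len(a)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0/0! = I
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Build A^k/k! from the previous term: (A^(k-1)/(k-1)!) * A / k.
        term = [[x / k for x in row] for row in matmul(term, a)]
        total = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

t = 1.2
A = [[0.0, t], [-t, 0.0]]          # A = t [[0, 1], [-1, 0]] as in Example 1
E = exp_series(A)
rotation = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
err = max(abs(E[i][j] - rotation[i][j]) for i in range(2) for j in range(2))
print(err)  # should be tiny (round-off level)
```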

As in the example, a series of $n \times n$ matrices yields a separate series for each of the $n^2$ possible entries. We shall say that such a series of matrices converges if the series it yields for each entry converges. With this rule, it is possible to show that the series defining $e^A$ converges for any $n \times n$ matrix $A$, but the proof is a bit involved. Fortunately, it is often the case that we can avoid worrying about convergence by appropriate trickery. In what follows we shall generally ignore such matters and act as if the series were finite sums.

The exponential function for matrices obeys the usual rules you expect an exponential function to have, but sometimes you have to be careful.
(1) If $0$ denotes the $n \times n$ zero matrix, then $e^0 = I$.
(2) The law of exponents holds if the matrices commute, i.e., if $B$ and $C$ are $n \times n$ matrices such that $BC = CB$, then $e^{B+C} = e^Be^C$.
(3) If $A$ is an $n \times n$ constant matrix, then $\dfrac{d}{dt}e^{At} = Ae^{At} = e^{At}A$. (It is worth writing this in both orders because products of matrices don't automatically commute.)

Here are the proofs of these facts.
(1) $e^0 = I + 0 + \frac{1}{2}0^2 + \cdots = I$.


(2) See the Exercises.
(3) Here we act as if the sum were finite (although the argument would work in general if we knew enough about convergence of series of matrices):
$$\begin{aligned} \frac{d}{dt}e^{At} &= \frac{d}{dt}\left(I + tA + \frac{1}{2}t^2A^2 + \frac{1}{3!}t^3A^3 + \cdots + \frac{1}{j!}t^jA^j + \cdots\right) \\ &= 0 + A + \frac{1}{2}(2t)A^2 + \frac{1}{3!}(3t^2)A^3 + \cdots + \frac{1}{j!}(jt^{j-1})A^j + \cdots \\ &= A + tA^2 + \frac{1}{2}t^2A^3 + \cdots + \frac{1}{(j-1)!}t^{j-1}A^j + \cdots \\ &= A\left(I + tA + \frac{1}{2}t^2A^2 + \cdots + \frac{1}{(j-1)!}t^{j-1}A^{j-1} + \cdots\right) \\ &= Ae^{At}. \end{aligned}$$
Note that in the next to last step $A$ could just as well have been factored out on the right, so it doesn't matter which side you put it on.

Exercises for Section 6.

1. (a) Let $A = \begin{bmatrix} \lambda & 0 \\ 0 & \mu \end{bmatrix}$. Show that
$$e^{At} = \begin{bmatrix} e^{\lambda t} & 0 \\ 0 & e^{\mu t} \end{bmatrix}.$$
(b) Let
$$A = \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix}.$$
Such a matrix is called a diagonal matrix. What can you say about $e^{At}$?

2. (a) Let $N = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$. Calculate $e^{Nt}$.
(b) Let $N = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Calculate $e^{Nt}$.
(c) Let $N$ be an $n \times n$ matrix of the form
$$N = \begin{bmatrix} 0 & 0 & \dots & 0 & 0 \\ 1 & 0 & \dots & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}.$$
What is the smallest integer $k$ satisfying $N^k = 0$? What can you say about $e^{Nt}$?

3. (a) Let $A = \begin{bmatrix} \lambda & 0 \\ 1 & \lambda \end{bmatrix}$. Calculate $e^{At}$. Hint: Use $A = \lambda I + (A - \lambda I)$. Note that $\lambda I$ and $N = A - \lambda I$ commute.


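(b) Let $A = \begin{bmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{bmatrix}$. Calculate $e^{At}$.
(c) Let $A$ be an $n \times n$ matrix of the form
$$A = \begin{bmatrix} \lambda & 0 & \cdots & 0 & 0 \\ 1 & \lambda & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & \lambda \end{bmatrix}.$$
What can you say about $e^{At} = e^{\lambda t}e^{(A - \lambda I)t}$?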

4. Let $A$ be an $n \times n$ matrix, and let $P$ be a non-singular $n \times n$ matrix. Show that
$$Pe^{At}P^{-1} = e^{PAP^{-1}t}.$$

5. Let $B$ and $C$ be two $n \times n$ matrices such that $BC = CB$. Prove that $e^{B+C} = e^Be^C$. Hint: You may assume that the binomial theorem applies to commuting matrices, i.e.,
$$(B + C)^n = \sum_{i+j=n} \frac{n!}{i!\,j!}B^iC^j.$$

6. Let
$$B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad C = \begin{bmatrix} 0 & 0 \\ -1 & 0 \end{bmatrix}.$$
(a) Show that $BC \neq CB$.
(b) Show that
$$e^B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad e^C = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}.$$
(c) Show that $e^Be^C \neq e^{B+C}$. Hint: $B + C = J$, where $e^{tJ}$ was calculated in the text.

7. Review

Exercises for Section 7.

1. What is $\det\begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 1 \\ 3 & 2 & 1 \end{bmatrix}$? Find the answer without using the recursive formula or Gaussian reduction.

2. Tell whether each of the following statements is true or false and, if false, explain why.
(a) If $A$ and $B$ are $n \times n$ matrices, then $\det(AB) = \det A \det B$.
(b) If $A$ is an $n \times n$ matrix and $c$ is a scalar, then $\det(cA) = c\det A$.
(c) If $A$ is $m \times n$ and $B$ is $n \times p$, then $(AB)^t = B^tA^t$.
(d) If $A$ is invertible, then so is $A^t$.

3. Let $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. Find the eigenvalues and eigenvectors of $A$.

4. Find $\det\begin{bmatrix} 1 & 3 & 1 & 1 \\ 1 & 2 & 4 & 3 \\ 2 & 4 & 1 & 1 \\ 1 & 1 & 6 & 1 \end{bmatrix}$.

5. Each of the following statements is not generally true. In each case explain briefly why it is false.
(a) An $n \times n$ matrix $A$ is invertible if and only if $\det A = 0$.
(b) If $A$ is an $n \times n$ real matrix, then there is a basis for $\mathbb{R}^n$ consisting of eigenvectors for $A$.
(c) $\det A^t = \det A$. Hint: Are these defined?

6. Let $A = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 1 \\ 1 & 2 & 2 \end{bmatrix}$. Is $v = \begin{bmatrix} 6 \\ 1 \\ 5 \end{bmatrix}$ an eigenvector for $A$? Justify your answer.

7. (a) The characteristic equation of
$$A = \begin{bmatrix} 2 & -4 & 1 \\ 0 & 3 & 0 \\ 1 & -4 & 2 \end{bmatrix}$$
is $-(\lambda - 3)^2(\lambda - 1) = 0$. Is $A$ diagonalizable? Explain.
(b) Is $B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$ diagonalizable? Explain.

8. Let $A$ be an $n \times n$ matrix with the property that the sum of all the entries in each row is always the same number $a$. Without using determinants, show that the common sum $a$ is an eigenvalue. Hint: What is the corresponding eigenvector?


CHAPTER III

APPLICATIONS

1. Real Symmetric Matrices

The most common matrices we meet in applications are symmetric, that is, they are square matrices which are equal to their transposes. In symbols, $A^t = A$.

Examples.
$$\begin{bmatrix} 1 & 2 \\ 2 & 2 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -1 & 0 \\ -1 & 0 & 2 \\ 0 & 2 & 3 \end{bmatrix}$$
are symmetric, but
$$\begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 0 \\ -1 & -2 & -1 \\ 0 & 0 & 1 \end{bmatrix}$$
are not.

Symmetric matrices are in many ways much simpler to deal with than general matrices. First, as we noted previously, the roots of the characteristic equation of a matrix need not be real numbers, even if the matrix has only real entries. However, if $A$ is a symmetric matrix with real entries, then the roots of its characteristic equation are all real.

Example 1. The characteristic equations of
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
are
$$\lambda^2 - 1 = 0 \quad\text{and}\quad \lambda^2 + 1 = 0$$
respectively. Notice the dramatic effect of a simple change of sign.

The reason for the reality of the roots (for a real symmetric matrix) is a bit subtle, and we will come back to it in later sections. The second important property of real symmetric matrices is that they are always diagonalizable, that is, there is always a basis for $\mathbb{R}^n$ consisting of eigenvectors for the matrix.


Example 2. We previously found a basis for $\mathbb{R}^2$ consisting of eigenvectors for the $2 \times 2$ symmetric matrix
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
The eigenvalues are $\lambda_1 = 3$, $\lambda_2 = 1$, and the basis of eigenvectors is
$$\left\{ v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}.$$
If you look carefully, you will note that the vectors $v_1$ and $v_2$ not only form a basis, but they are perpendicular to one another, i.e., $v_1 \cdot v_2 = 1(-1) + 1(1) = 0$.

The perpendicularity of the eigenvectors is no accident. It is always the case for a symmetric matrix by the following reasoning. First, recall that the dot product of two column vectors $u$ and $v$ in $\mathbb{R}^n$ can be written as a row by column product
$$u \cdot v = u^tv = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \sum_{i=1}^{n} u_iv_i.$$

Suppose now that $Au = \lambda u$ and $Av = \mu v$, i.e., $u$ and $v$ are eigenvectors for $A$ with corresponding eigenvalues $\lambda$ and $\mu$. Assume $\lambda \neq \mu$. Then
$$u \cdot (Av) = u^t(Av) = u^t(\mu v) = \mu(u^tv) = \mu(u \cdot v). \tag{1}$$
On the other hand,
$$(Au) \cdot v = (Au)^tv = (\lambda u)^tv = \lambda(u^tv) = \lambda(u \cdot v). \tag{2}$$
However, since the matrix is symmetric, $A^t = A$, and
$$(Au)^tv = (u^tA^t)v = (u^tA)v = u^t(Av).$$
The first of these expressions is what was calculated in (2) and the last was calculated in (1), so the two are equal, i.e.,
$$\mu(u \cdot v) = \lambda(u \cdot v).$$
If $u \cdot v \neq 0$, we can cancel the common factor to conclude that $\mu = \lambda$, which is contrary to our assumption, so it must be true that $u \cdot v = 0$, i.e., $u \perp v$.

We summarize this as follows.

Eigenvectors for a real symmetric matrix which belong to different eigenvalues are necessarily perpendicular.

This fact has important consequences. Assume first that the eigenvalues of $A$ are distinct and that it is real and symmetric. Then not only is there a basis consisting of eigenvectors, but the basis elements are also mutually perpendicular.
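The conclusion can be checked on the data of Example 2. The short Python sketch below (standard library only; the helper names `mat_vec`, `dot` and `is_eigenvector` are illustrative) confirms that $v_1$ and $v_2$ are eigenvectors of $A$ with eigenvalues 3 and 1, and that their dot product, computed as $\sum u_iv_i$, is zero.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def dot(u, v):
    """Dot product u . v = u^t v = sum of u_i v_i."""
    return sum(a * b for a, b in zip(u, v))


def is_eigenvector(M, v, lam):
    """True when M v = lam v (exact comparison, so use integer or Fraction data)."""
    return mat_vec(M, v) == [lam * x for x in v]


A = [[2, 1], [1, 2]]       # symmetric matrix of Example 2
v1, v2 = [1, 1], [-1, 1]   # eigenvectors for lambda = 3 and lambda = 1

print(is_eigenvector(A, v1, 3), is_eigenvector(A, v2, 1))  # True True
print(dot(v1, v2))                                          # 0
```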


This is reminiscent of the familiar situation in $\mathbb{R}^2$ and $\mathbb{R}^3$, where coordinate axes are almost always assumed to be mutually perpendicular. For arbitrary matrices, we may have to face the prospect of using 'skew' axes, but the above remark tells us we can avoid this possibility in the symmetric case.

In two or three dimensions, we usually require our basis vectors to be unit vectors. There is no problem with that here. Namely, if $u$ is not a unit vector, we can always obtain a unit vector by dividing $u$ by its length $|u|$. Moreover, if $u$ is an eigenvector for $A$ with eigenvalue $\lambda$, then any nonzero multiple of $u$ is also such an eigenvector; in particular, the unit vector $\frac{1}{|u|}u$ is.

Example 2, revisited. The eigenvectors $v_1$ and $v_2$ both have length $\sqrt{2}$. So we replace them by the corresponding unit vectors
$$\frac{1}{\sqrt{2}}v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \frac{1}{\sqrt{2}}v_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix},$$
which also constitute a basis for $\mathbb{R}^2$.

There is some special terminology which is commonly used in linear algebra for the familiar concepts discussed above. Two vectors are said to be orthogonal if they are perpendicular. A unit vector is said to be normalized. The idea is that if we started with a non-unit vector, we would produce an equivalent unit vector by dividing it by its length. The latter process is called normalization. Finally, a basis for $\mathbb{R}^n$ consisting of mutually perpendicular unit vectors is called an orthonormal basis.

Exercises for Section 1.

1. (a) Find a basis of eigenvectors for $A = \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix}$.
(b) Check that the basis vectors are orthogonal, and normalize them to yield an orthonormal basis.

2. (a) Find a basis of eigenvectors for $A = \begin{bmatrix} -3 & 2 \\ 8 & 3 \end{bmatrix}$.
(b) Are the basis vectors orthogonal to one another? If not, what might be the problem?

3. (a) Find a basis of eigenvectors for $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$.
(b) Check that the basis vectors are orthogonal, and normalize them to yield an orthonormal basis.

4. Let $A = \begin{bmatrix} 1 & 4 & 3 \\ 4 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$. Find an orthonormal basis of eigenvectors.

5. Let $A$ be a symmetric $n \times n$ matrix, and let $P$ be any $n \times n$ matrix. Show that $P^tAP$ is also symmetric.


2. Repeated Eigenvalues, The Gram–Schmidt Process

We now consider the case in which one or more eigenvalues of a real symmetric matrix $A$ is a repeated root of the characteristic equation. It turns out that we can still find an orthonormal basis of eigenvectors, but it is a bit more complicated.

Example 1. Consider
$$A = \begin{bmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}.$$

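The characteristic equation is
$$\begin{aligned}
\det\begin{bmatrix} -1-\lambda & 1 & 1 \\ 1 & -1-\lambda & 1 \\ 1 & 1 & -1-\lambda \end{bmatrix}
&= -(1+\lambda)\big((1+\lambda)^2 - 1\big) - 1(-1-\lambda-1) + 1(1+1+\lambda) \\
&= -(1+\lambda)(\lambda^2 + 2\lambda) + 2(\lambda + 2) \\
&= -(\lambda^3 + 3\lambda^2 - 4) = 0.
\end{aligned}$$
Using the method suggested in Chapter II, we may find the roots of this equation by trying the factors of the constant term. Indeed, $\lambda^3 + 3\lambda^2 - 4 = (\lambda - 1)(\lambda + 2)^2$, so the roots are $\lambda = 1$, which has multiplicity 1, and $\lambda = -2$, which has multiplicity 2.

For $\lambda = 1$, we need to reduce
$$A - I = \begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & -2 & 1 \\ 0 & -3 & 3 \\ 0 & 3 & -3 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.$$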

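The general solution is $v_1 = v_3$, $v_2 = v_3$ with $v_3$ free. A basic eigenvector is
$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},$$
but we should normalize this by dividing it by $|v_1| = \sqrt{3}$. This gives
$$u_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
For $\lambda = -2$, the situation is more complicated. Reduce
$$A + 2I = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$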


which yields the general solution $v_1 = -v_2 - v_3$ with $v_2, v_3$ free. This gives basic eigenvectors
$$v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
Note that, as the general theory predicts, $v_1$ is perpendicular to both $v_2$ and $v_3$. (The eigenvalues are different.) Unfortunately, $v_2$ and $v_3$ are not perpendicular to each other. However, with a little effort, this is easy to remedy. All we have to do is pick another basis for the subspace spanned by $\{v_2, v_3\}$. The eigenvectors with eigenvalue $-2$ are exactly the non-zero vectors in this subspace, so any basis will do as well. Hence, we arrange to pick a basis consisting of mutually perpendicular vectors.

It is easy to construct the new basis. Indeed we need only replace one of the two vectors. Keep $v_2$, and let $v_3' = v_3 - cv_2$ where $c$ is chosen so that
$$v_2 \cdot v_3' = v_2 \cdot v_3 - c\,v_2 \cdot v_2 = 0,$$
i.e., take $c = \dfrac{v_2 \cdot v_3}{v_2 \cdot v_2}$. (See the diagram to get some idea of the geometry behind this calculation.)

[Figure: the vectors $v_2$, $v_3$, and $v_3'$]

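We have
$$\frac{v_2 \cdot v_3}{v_2 \cdot v_2} = \frac{1}{2},$$
so
$$v_3' = v_3 - \frac{1}{2}v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ 1 \end{bmatrix}.$$
We should also normalize this basis by choosing
$$u_2 = \frac{1}{|v_2|}v_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad u_3 = \frac{1}{|v_3'|}v_3' = \sqrt{\frac{2}{3}}\begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ 1 \end{bmatrix}.$$
Putting this all together, we see that
$$u_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad u_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad u_3 = \sqrt{\frac{2}{3}}\begin{bmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ 1 \end{bmatrix}$$
form an orthonormal basis for $\mathbb{R}^3$ consisting of eigenvectors for $A$.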
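As a numerical sanity check of Example 1, the Python sketch below (floating point, standard library only; the helper names are illustrative) verifies that $u_1, u_2, u_3$ are unit vectors, mutually perpendicular, and eigenvectors of $A$ for the eigenvalues $1, -2, -2$.

```python
import math


def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def close(u, v, tol=1e-12):
    """Entrywise comparison of two float vectors up to rounding error."""
    return all(abs(a - b) < tol for a, b in zip(u, v))


A = [[-1, 1, 1], [1, -1, 1], [1, 1, -1]]
c = math.sqrt(2 / 3)
basis = [
    ([1 / math.sqrt(3)] * 3, 1),                       # u1, eigenvalue 1
    ([-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0], -2),  # u2, eigenvalue -2
    ([-c / 2, -c / 2, c], -2),                         # u3 = sqrt(2/3) * (-1/2, -1/2, 1)
]

for i, (u, lam) in enumerate(basis):
    assert close(mat_vec(A, u), [lam * x for x in u])  # A u = lam u
    for j, (w, _) in enumerate(basis):
        expected = 1.0 if i == j else 0.0              # orthonormality
        assert abs(dot(u, w) - expected) < 1e-12
print("orthonormal eigenbasis confirmed")
```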


The Gram–Schmidt Process. In Example 1, we used a special case of a more general algorithm in order to construct an orthonormal basis of eigenvectors. The algorithm, called the Gram–Schmidt Process, works as follows. Suppose $\{v_1, v_2, \ldots, v_k\}$ is a linearly independent set spanning a certain subspace $W$ of $\mathbb{R}^n$. We construct an orthonormal basis for $W$ as follows. Let
$$\begin{aligned}
v_1' &= v_1 \\
v_2' &= v_2 - \frac{v_2 \cdot v_1'}{v_1' \cdot v_1'}v_1' \\
v_3' &= v_3 - \frac{v_3 \cdot v_1'}{v_1' \cdot v_1'}v_1' - \frac{v_3 \cdot v_2'}{v_2' \cdot v_2'}v_2' \\
&\ \,\vdots \\
v_k' &= v_k - \sum_{j=1}^{k-1} \frac{v_k \cdot v_j'}{v_j' \cdot v_j'}v_j'.
\end{aligned}$$
It is not hard to see that each new $v_k'$ is perpendicular to those constructed before it. For example,
$$v_1' \cdot v_3' = v_1' \cdot v_3 - \frac{v_3 \cdot v_1'}{v_1' \cdot v_1'}\,v_1' \cdot v_1' - \frac{v_3 \cdot v_2'}{v_2' \cdot v_2'}\,v_1' \cdot v_2'.$$
However, we may suppose that we already know that $v_1' \cdot v_2' = 0$ (from the previous stage of the construction), so the above becomes
$$v_1' \cdot v_3' = v_1' \cdot v_3 - v_3 \cdot v_1' = 0.$$
The same argument works at each stage. It is also not hard to see that at each stage, replacing $v_j$ by $v_j'$ in
$$\{v_1', v_2', \ldots, v_{j-1}', v_j\}$$
does not change the subspace spanned by the set. Hence, for $j = k$, we conclude that $\{v_1', v_2', \ldots, v_k'\}$ is a basis for $W$ consisting of mutually perpendicular vectors. Finally, to complete the process simply divide each $v_j'$ by its length:
$$u_j = \frac{1}{|v_j'|}v_j'.$$
Then $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $W$.
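The recursion above translates almost line for line into code. The Python sketch below is one possible implementation, not the text's own: it keeps the unnormalized $v_j'$ as exact `Fraction`s, normalizes to floats only at the end, and assumes the input is linearly independent (otherwise some $v_j'$ is zero and the division fails). The function names are illustrative. Applied to $v_2, v_3$ from Example 1 it reproduces $v_3' = (-\frac12, -\frac12, 1)$.

```python
from fractions import Fraction
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Return v_1', ..., v_k' (mutually perpendicular, not yet normalized).

    Implements v_k' = v_k - sum_{j<k} (v_k . v_j') / (v_j' . v_j') v_j'
    with exact Fraction arithmetic. Assumes the input is linearly
    independent, so no v_j' is zero.
    """
    primes = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for p in primes:
            c = dot(v, p) / dot(p, p)  # (v_k . v_j') / (v_j' . v_j')
            w = [wi - c * pi for wi, pi in zip(w, p)]
        primes.append(w)
    return primes


def normalize(w):
    """u = w / |w|, returned as floats since |w| is usually irrational."""
    length = math.sqrt(dot(w, w))
    return [float(x) / length for x in w]


# The eigenvectors v2, v3 for lambda = -2 in Example 1
v2p, v3p = gram_schmidt([[-1, 1, 0], [-1, 0, 1]])
print(v3p)                                   # the vector (-1/2, -1/2, 1) as Fractions
print([normalize(w) for w in (v2p, v3p)])
```

Computing the coefficients from the original $v_k$, as the formula in the text does, is the classical form of the process. In floating-point work with many vectors, a variant that projects the running vector `w` instead is usually preferred for numerical stability; with exact fractions the two agree.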


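Example 2. Consider the subspace of $\mathbb{R}^4$ spanned by
$$v_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$
Then
$$v_1' = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix},$$
$$v_2' = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} - \frac{2}{3}\begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{3} \\ \frac{1}{3} \\ 1 \\ -\frac{2}{3} \end{bmatrix},$$
$$v_3' = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} - \frac{0}{3}\begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} - \frac{-1}{15/9}\begin{bmatrix} -\frac{1}{3} \\ \frac{1}{3} \\ 1 \\ -\frac{2}{3} \end{bmatrix} = \begin{bmatrix} \frac{4}{5} \\ \frac{1}{5} \\ \frac{3}{5} \\ \frac{3}{5} \end{bmatrix}.$$
Normalizing, we get
$$u_1 = \frac{1}{\sqrt{3}}\begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \qquad u_2 = \frac{3}{\sqrt{15}}\begin{bmatrix} -\frac{1}{3} \\ \frac{1}{3} \\ 1 \\ -\frac{2}{3} \end{bmatrix} = \frac{1}{\sqrt{15}}\begin{bmatrix} -1 \\ 1 \\ 3 \\ -2 \end{bmatrix}, \qquad u_3 = \frac{5}{\sqrt{35}}\begin{bmatrix} \frac{4}{5} \\ \frac{1}{5} \\ \frac{3}{5} \\ \frac{3}{5} \end{bmatrix} = \frac{1}{\sqrt{35}}\begin{bmatrix} 4 \\ 1 \\ 3 \\ 3 \end{bmatrix}.$$

The Principal Axis Theorem. The Principal Axis Theorem asserts that the process outlined above for finding mutually perpendicular eigenvectors always works.

If $A$ is a real symmetric $n \times n$ matrix, there is always an orthonormal basis for $\mathbb{R}^n$ consisting of eigenvectors for $A$.

Here is a summary of the method. If the roots of the characteristic equation are all different, then all we need to do is find an eigenvector for each eigenvalue and, if necessary, normalize it by dividing by its length. If there are repeated roots, then it will usually be necessary to apply the Gram–Schmidt process to the set of basic eigenvectors obtained for each repeated eigenvalue.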


Exercises for Section 2. Apply theGram–Schmidt Process to each of the following    2   1 (a)  0  ,  1    1 0       1 1 0     0 1  2 (b)   ,   ,   . 0 1    2  0 1 −1  1 1 2. Find an orthonormal basis of eigenvectors for A =  1 1 1 1

1.

3.

sets of vectors.

 1 1 . 1

Find an orthonormal basis of eigenvectors for   −1 2 2 A =  2 −1 2. 2 2 −1

Hint: 3 is an eigenvalue. 4. Let {v1 , v2 , v3 } be a linearly independent set. Suppose {v10 , v20 , v30 } is the set obtained (before normalizing) by the Gram-Schmidt Process. (a) Explain why v20 is not zero. (b) Explain why v30 is not zero. The generalization of this to an arbitrary linearly independent set is one reason the Gram-Schmidt Process works. The vectors produced by that process are mutually perpendicular provided they are non-zero, and so they form a linearly independent set. Since they are in the subspace W spanned by the original set of vectors and there are just enough of them, they must form a basis a basis for W .

3. Change of Coordinates

As we have noted previously, it is probably a good idea to use a special basis like an orthonormal basis of eigenvectors. Any problem associated with the matrix $A$ is likely to take a particularly simple form when expressed relative to such a basis. To study this in greater detail, we need to talk a bit more about changes of coordinates. Although the theory is quite general, we shall concentrate on some simple examples.

In $\mathbb{R}^n$, the entries in a column vector $x$ may be thought of as the coordinates $x_1, x_2, \ldots, x_n$ of the vector with respect to the standard basis. To simplify the algebra, let's concentrate on one specific $n$, say $n = 3$. In that case, we may make the usual identifications $e_1 = i$, $e_2 = j$, $e_3 = k$ for the elements of the standard basis. Suppose $\{v_1, v_2, v_3\}$ is another basis. The coordinates of $x$ with respect to the new basis, call them $x_1', x_2', x_3'$, are defined by the relation
$$x = v_1x_1' + v_2x_2' + v_3x_3' = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}\begin{bmatrix} x_1' \\ x_2' \\ x_3' \end{bmatrix} = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}x'. \tag{1}$$


One way to view this relation is as a system of equations in which the old coordinates
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$
are given, and we want to solve for the new coordinates
$$x' = \begin{bmatrix} x_1' \\ x_2' \\ x_3' \end{bmatrix}.$$
The coefficient matrix of this system
$$P = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}$$
is called the change of basis matrix. Its columns are the old coordinates of the new basis vectors. The relation (1) may be rewritten
$$x = Px' \tag{2}$$
and it may also be interpreted as expressing the 'old' coordinates of a vector in terms of its 'new' coordinates. This seems backwards, but it is easy to turn it around. Since the columns of $P$ are linearly independent, $P$ is invertible and we may write instead
$$x' = P^{-1}x \tag{3}$$

where we express the 'new' coordinates in terms of the 'old' coordinates.

Example 1. Suppose in $\mathbb{R}^2$ we pick a new set of coordinate axes by rotating each of the old axes through angle $\theta$ in the counterclockwise direction. Call the old coordinates $(x_1, x_2)$ and the new coordinates $(x_1', x_2')$. According to the above discussion, the columns of the change of basis matrix $P$ come from the old coordinates of the new basis vectors, i.e., of unit vectors along the new axes. From the diagram, these are
$$\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$


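[Figure: the original $x_1$, $x_2$ axes and the rotated $x_1'$, $x_2'$ axes, with the angle $\theta$ marked]

Hence,
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x_1' \\ x_2' \end{bmatrix}.$$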

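The change of basis matrix is easy to invert in this case. (Use the special rule which applies to $2 \times 2$ matrices.)
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^{-1} = \frac{1}{\cos^2\theta + \sin^2\theta}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$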

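(You could also have obtained this by using the matrix for rotation through angle $-\theta$.) Hence, we may express the 'new' coordinates in terms of the 'old' coordinates through the relation
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
For example, suppose $\theta = \pi/6$. The new coordinates of the point with original coordinates $(2, 6)$ are given by
$$\begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix}\begin{bmatrix} 2 \\ 6 \end{bmatrix} = \begin{bmatrix} \sqrt{3} + 3 \\ -1 + 3\sqrt{3} \end{bmatrix}.$$
So with respect to the rotated axes, the coordinates are $(3 + \sqrt{3},\ 3\sqrt{3} - 1)$.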
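The rotation formulas translate directly into code. In the Python sketch below (standard library only), the function names `new_coords` and `old_coords` are illustrative; they apply $P^t = P^{-1}$ and $P$ respectively, and the example repeats the computation for $\theta = \pi/6$ and the point $(2, 6)$.

```python
import math


def new_coords(theta, x1, x2):
    """x' = P^t x for axes rotated counterclockwise through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 + s * x2, -s * x1 + c * x2)


def old_coords(theta, y1, y2):
    """x = P x', the inverse change of coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * y1 - s * y2, s * y1 + c * y2)


x1p, x2p = new_coords(math.pi / 6, 2, 6)
print(x1p, 3 + math.sqrt(3))              # both approximately 4.732
print(x2p, 3 * math.sqrt(3) - 1)          # both approximately 4.196
print(old_coords(math.pi / 6, x1p, x2p))  # back to approximately (2, 6)
```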

Orthogonal Matrices. You may have noticed that the matrix $P$ obtained in Example 1 has the property $P^{-1} = P^t$. This is no accident. It is a consequence of the fact that its columns are mutually perpendicular unit vectors. Indeed,

The columns of an $n \times n$ matrix form an orthonormal basis for $\mathbb{R}^n$ if and only if its inverse is its transpose.

An $n \times n$ real matrix with this property is called orthogonal.
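This criterion gives a mechanical test: form $P^tP$ and compare it with $I$. The Python sketch below (exact `Fraction` arithmetic; the helper names `transpose`, `mat_mul` and `is_orthogonal` are illustrative) applies it to the matrix with columns $(3/5, 4/5)$ and $(-4/5, 3/5)$, and to a matrix whose columns are not perpendicular.

```python
from fractions import Fraction as F


def transpose(M):
    return [list(row) for row in zip(*M)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def is_orthogonal(P):
    """True when P^t P = I, i.e. the columns of P are orthonormal."""
    n = len(P)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return mat_mul(transpose(P), P) == identity


P = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]
print(is_orthogonal(P))                 # True
print(is_orthogonal([[1, 1], [0, 1]]))  # False: columns are not perpendicular
```

Exact fractions make the comparison with $I$ meaningful; with floating-point entries one would compare up to a tolerance instead.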


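Example 2. Let
$$P = \begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}.$$
The columns of $P$ are
$$u_1 = \begin{bmatrix} \frac{3}{5} \\ \frac{4}{5} \end{bmatrix}, \qquad u_2 = \begin{bmatrix} -\frac{4}{5} \\ \frac{3}{5} \end{bmatrix},$$
and it is easy to check that these are mutually perpendicular unit vectors in $\mathbb{R}^2$. To see that $P^{-1} = P^t$, it suffices to show that
$$P^tP = \begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix}\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Of course, it is easy to see that this is true by direct calculation, but it may be more informative to write it out as follows:
$$P^tP = \begin{bmatrix} (u_1)^t \\ (u_2)^t \end{bmatrix}\begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} u_1 \cdot u_1 & u_1 \cdot u_2 \\ u_2 \cdot u_1 & u_2 \cdot u_2 \end{bmatrix}$$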

where the entries in the product are exhibited as row by column dot products. The off diagonal entries are zero because the vectors are perpendicular, and the diagonal entries are ones because the vectors are unit vectors. The argument for $n \times n$ matrices is exactly the same except that there are more entries.

Note. The terminology is very confusing. The definition of an orthogonal matrix requires that the columns be mutually perpendicular and also that they be unit vectors. Unfortunately, the terminology reminds us of the former condition but not of the latter condition. It would have been better if such matrices had been named 'orthonormal' matrices rather than 'orthogonal' matrices, but that is not how it happened, and we don't have the option of changing the terminology at this late date.

The Principal Axis Theorem Again. As we have seen, given a real symmetric $n \times n$ matrix $A$, the Principal Axis Theorem assures us that we can always find an orthonormal basis $\{v_1, v_2, \ldots, v_n\}$ for $\mathbb{R}^n$ consisting of eigenvectors for $A$. Let
$$P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$
be the corresponding change of basis matrix. As in Chapter II, Section 5, we have
$$\begin{aligned}
Av_1 &= v_1\lambda_1 = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}\begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \\
Av_2 &= v_2\lambda_2 = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}\begin{bmatrix} 0 \\ \lambda_2 \\ \vdots \\ 0 \end{bmatrix} \\
&\ \,\vdots \\
Av_n &= v_n\lambda_n = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ \vdots \\ \lambda_n \end{bmatrix}
\end{aligned}$$
where some eigenvalues $\lambda_j$ for different eigenvectors might be repeated. These equations can be written in a single matrix equation
$$A\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
or

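$$AP = PD$$
where $D$ is a diagonal matrix with eigenvalues (possibly repeated) on the diagonal. This may also be written
$$P^{-1}AP = D. \tag{4}$$
Since we have insisted that the basic eigenvectors form an orthonormal basis, the change of basis matrix $P$ is orthogonal, and we have $P^{-1} = P^t$. Hence, (4) can be written in the alternate form
$$P^tAP = D \quad \text{with } P \text{ orthogonal}. \tag{5}$$

Example 3. Let
$$A = \begin{bmatrix} -7 & 24 \\ 24 & 7 \end{bmatrix}.$$
The characteristic equation of $A$ is $(-7 - \lambda)(7 - \lambda) - 24^2 = \lambda^2 - 625 = 0$, so the eigenvalues are $\lambda = \pm 25$. Calculation shows that an orthonormal basis of eigenvectors is formed by
$$u_1 = \begin{bmatrix} \frac{3}{5} \\ \frac{4}{5} \end{bmatrix} \text{ for } \lambda = 25 \quad \text{and} \quad u_2 = \begin{bmatrix} -\frac{4}{5} \\ \frac{3}{5} \end{bmatrix} \text{ for } \lambda = -25.$$
Hence, we may take $P$ to be the orthogonal matrix
$$P = \begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}.$$
The reader should check that in this case
$$P^tAP = \begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix}\begin{bmatrix} -7 & 24 \\ 24 & 7 \end{bmatrix}\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix} = \begin{bmatrix} 25 & 0 \\ 0 & -25 \end{bmatrix}.$$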
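The reader's check can also be done mechanically. The Python sketch below (exact `Fraction` arithmetic; the helper names are illustrative) uses $A = \begin{bmatrix} -7 & 24 \\ 24 & 7 \end{bmatrix}$, whose trace is $0$ and whose determinant is $-49 - 576 = -625$, consistent with the characteristic equation $\lambda^2 - 625 = 0$, and multiplies out $P^tAP$.

```python
from fractions import Fraction as F


def transpose(M):
    return [list(row) for row in zip(*M)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[-7, 24], [24, 7]]
P = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]  # columns u1 (lambda = 25) and u2 (lambda = -25)

D = mat_mul(mat_mul(transpose(P), A), P)
print(D)  # entries 25, 0, 0, -25 (printed as Fractions)
```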


Appendix. A Proof of the Principal Axis Theorem. The following section outlines how the Principal Axis Theorem is proved for the very few of you who may insist on seeing it. It is not necessary for what follows.

In view of the previous discussions, we can establish the Principal Axis Theorem by showing that there is an orthogonal $n \times n$ matrix $P$ such that
$$AP = PD \quad \text{or equivalently} \quad P^tAP = D \tag{6}$$
where $D$ is a diagonal matrix with the eigenvalues of $A$ (possibly repeated) on its diagonal.

The method is to proceed by induction on $n$. If $n = 1$ there really isn't anything to prove. (Take $P = [\,1\,]$.) Suppose the theorem has been proved for $(n-1) \times (n-1)$ matrices. Let $u_1$ be a unit eigenvector for $A$ with eigenvalue $\lambda_1$. Consider the subspace $W$ consisting of all vectors perpendicular to $u_1$. It is not hard to see that $W$ is an $(n-1)$-dimensional subspace. Choose (by the Gram–Schmidt Process) an orthonormal basis $\{w_2, w_3, \ldots, w_n\}$ for $W$. Then $\{u_1, w_2, \ldots, w_n\}$ is an orthonormal basis for $\mathbb{R}^n$, and
$$Au_1 = u_1\lambda_1 = \underbrace{\begin{bmatrix} u_1 & w_2 & \cdots & w_n \end{bmatrix}}_{P_1}\begin{bmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

This gives the first column of $AP_1$, and we want to say something about its remaining columns $Aw_2, Aw_3, \ldots, Aw_n$. To this end, note that if $w$ is any vector in $W$, then $Aw$ is also a vector in $W$. For, we have
$$u_1 \cdot (Aw) = (u_1)^tAw = (u_1)^tA^tw = (Au_1)^tw = \lambda_1(u_1)^tw = \lambda_1(u_1 \cdot w) = 0,$$
which is to say, $Aw$ is perpendicular to $u_1$ if $w$ is perpendicular to $u_1$. It follows that each $Aw_j$ is a linear combination just of $w_2, w_3, \ldots, w_n$, i.e.,
$$Aw_j = \begin{bmatrix} u_1 & w_2 & \cdots & w_n \end{bmatrix}\begin{bmatrix} 0 \\ * \\ \vdots \\ * \end{bmatrix}$$
where '$*$' denotes some unspecified entry. Putting this all together, we see that
$$AP_1 = P_1\underbrace{\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & A' & \\ 0 & & & \end{bmatrix}}_{A_1}$$


where $A'$ is an $(n-1) \times (n-1)$ matrix. $P_1$ is orthogonal (since its columns form an orthonormal basis), so
$$P_1^tAP_1 = A_1,$$
and it is not hard to derive from this the fact that $A_1$ is symmetric. Because of the structure of $A_1$, this implies that $A'$ is symmetric. Hence, by induction we may assume there is an $(n-1) \times (n-1)$ orthogonal matrix $P'$ such that $A'P' = P'D'$ with $D'$ diagonal. It follows that
$$\begin{aligned}
A_1\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & P' \end{bmatrix}}_{P_2}
&= \begin{bmatrix} \lambda_1 & 0 \\ 0 & A' \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & P' \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & A'P' \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 \\ 0 & P'D' \end{bmatrix} \\
&= \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & P' \end{bmatrix}}_{P_2}\underbrace{\begin{bmatrix} \lambda_1 & 0 \\ 0 & D' \end{bmatrix}}_{D} = P_2D,
\end{aligned}$$
where the matrices are written in block form (each $0$ stands for a row or column of zeros of the appropriate size).

Note that $P_2$ is orthogonal and $D$ is diagonal. Thus,
$$A\underbrace{P_1P_2}_{P} = P_1A_1P_2 = \underbrace{P_1P_2}_{P}D$$
or
$$AP = PD.$$

However, a product of orthogonal matrices is orthogonal (see the Exercises), so $P$ is orthogonal as required. This completes the proof.

There is one subtle point involved in the above proof. We have to know that a real symmetric $n \times n$ matrix has at least one real eigenvalue. This follows from the fact, alluded to earlier, that the roots of the characteristic equation for such a matrix are necessarily real. Since the equation does have a root, that root is the desired eigenvalue.

Exercises for Section 3.

1. Find the change of basis matrix for a rotation through (a) 30 degrees in the counterclockwise direction and (b) 30 degrees in the clockwise direction.

2. Let $P(\theta)$ be the matrix for rotation of axes through $\theta$. Show that $P(-\theta) = P(\theta)^t = P(\theta)^{-1}$.


3. An inclined plane makes an angle of 30 degrees with the horizontal. Change to a coordinate system with $x_1'$ axis parallel to the inclined plane and $x_2'$ axis perpendicular to it. Use the change of variables formula derived in the section to find the components of the gravitational acceleration vector $-gj$ in the new coordinate system. Compare this with what you would get by direct geometric reasoning.

4. Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$. Find a $2 \times 2$ orthogonal matrix $P$ such that $P^tAP$ is diagonal. What are the diagonal entries?

5. Let $A = \begin{bmatrix} 1 & 4 & 3 \\ 4 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}$. Find a $3 \times 3$ orthogonal matrix $P$ such that $P^tAP$ is diagonal. What are the diagonal entries?

6. Show that the product of two orthogonal matrices is orthogonal. How about the inverse of an orthogonal matrix?

7. The columns of an orthogonal matrix are mutually perpendicular unit vectors. Is the same thing true of the rows? Explain.

4. Classification of Conics and Quadrics

The Principal Axis Theorem derives its name from its relation to classifying conics, quadric surfaces, and their higher dimensional analogues.

The general quadratic equation
$$ax^2 + bxy + cy^2 + dx + ey = f$$
(with enough of the coefficients non-zero) defines a curve in the plane. Such a curve is generally an ellipse, a hyperbola, or a parabola, all of which are called conics, or two lines, which is considered a degenerate case. (See the Appendix to this section for a review.)

Examples.
$$x^2 + \frac{y^2}{4} = 1$$
$$x^2 - y^2 = 1$$
$$x^2 - 2xy + 2y^2 = 1$$
$$x^2 + 2xy - y^2 + 3x - 5y = 10$$

If the linear terms are not present ($d = e = 0$ and $f \neq 0$), we call the curve a central conic. It turns out to be an ellipse or hyperbola (but its axes of symmetry may not be the coordinate axes) or a pair of lines in the degenerate case. Parabolas can't be obtained this way.


In this section, we shall show how to use linear algebra to classify such central conics and higher dimensional analogues such as quadric surfaces in $\mathbb{R}^3$. Once you understand the central case, it is fairly easy to reduce the general case to that. (You just use completion of squares to get rid of the linear terms in the same way that you identify a circle with center not at the origin from its equation.)

In order to apply the linear algebra we have studied, we adopt a more systematic notation, using subscripted variables $x_1, x_2$ instead of $x, y$. Consider the central conic defined by the equation
$$f(x) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = C.$$
(The reason for the 2 will be clear shortly.) It is more useful to express the function $f$ as follows:
$$\begin{aligned}
f(x) &= (x_1a_{11} + x_2a_{21})x_1 + (x_1a_{12} + x_2a_{22})x_2 \\
&= x_1(a_{11}x_1 + a_{12}x_2) + x_2(a_{21}x_1 + a_{22}x_2),
\end{aligned}$$
where we have introduced $a_{21} = a_{12}$. The above expression may also be written in matrix form
$$f(x) = \sum_{j,k=1}^{2} x_ja_{jk}x_k = x^tAx$$

where $A$ is the symmetric matrix of coefficients.

Note what has happened to the coefficients. The coefficients of the squares appear on the diagonal of $A$, while the coefficient $2a_{12}$ of the cross term $2a_{12}x_1x_2$ is divided into two equal parts. Half of it appears as $a_{12}$ in the $1,2$ position (corresponding to the product $x_1x_2$) while the other half appears as $a_{21} = a_{12}$ in the $2,1$ position (corresponding to the product $x_2x_1$, which of course equals $x_1x_2$). So it is clear why the matrix is symmetric.

This may be generalized to $n > 2$ in a rather obvious manner. Let $A$ be a real symmetric $n \times n$ matrix, and define
$$f(x) = \sum_{j,k=1}^{n} x_ja_{jk}x_k = x^tAx.$$
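The rule for building $A$ from the coefficients can be written as a small routine. In the Python sketch below (standard library only), the function names and the 0-based indexing are illustrative assumptions: `square[i]` holds the coefficient of $x_i^2$, and `cross[(i, j)]` holds the full coefficient of $x_ix_j$, which is split in half between positions $(i, j)$ and $(j, i)$. The example uses $x_1^2 + x_2^2 + x_3^2 - 2x_1x_3$ and compares $x^tAx$ with direct evaluation of the polynomial.

```python
from fractions import Fraction


def symmetric_matrix(n, square, cross):
    """Coefficient matrix A with f(x) = x^t A x (0-based indices).

    square: {i: coefficient of x_i^2}
    cross:  {(i, j): coefficient of x_i x_j} for i != j; it is split in half.
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for i, a in square.items():
        A[i][i] = Fraction(a)
    for (i, j), b in cross.items():
        A[i][j] = A[j][i] = Fraction(b) / 2
    return A


def quad_form(A, x):
    """f(x) = sum over j, k of x_j a_jk x_k."""
    n = len(x)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))


# f(x) = x1^2 + x2^2 + x3^2 - 2 x1 x3
A = symmetric_matrix(3, {0: 1, 1: 1, 2: 1}, {(0, 2): -2})
print(A)  # ones on the diagonal, -1 in the (1,3) and (3,1) positions
x = (2, 3, 5)
print(quad_form(A, x), 2**2 + 3**2 + 5**2 - 2 * 2 * 5)  # both give 18
```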

For $n = 3$ this may be written explicitly
$$\begin{aligned}
f(x) &= (x_1a_{11} + x_2a_{21} + x_3a_{31})x_1 + (x_1a_{12} + x_2a_{22} + x_3a_{32})x_2 + (x_1a_{13} + x_2a_{23} + x_3a_{33})x_3 \\
&= a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 + 2a_{12}x_1x_2 + 2a_{13}x_1x_3 + 2a_{23}x_2x_3.
\end{aligned}$$
The rule for forming the matrix $A$ from the equation for $f(x)$ is the same as in the $2 \times 2$ case. The coefficients of the squares are put on the diagonal. The coefficient of a cross term involving $x_ix_j$ is split in half, with one half put in the $i,j$ position, and the other half put in the $j,i$ position.

The level set defined by $f(x) = C$ is called a central hyperquadric. It should be visualized as an $(n-1)$-dimensional curved object in $\mathbb{R}^n$. For $n = 3$ it will be an ellipsoid or a hyperboloid (of one or two sheets) or perhaps a degenerate 'quadric' like a cone. (As in the case of conics, we must also allow linear terms to encompass paraboloids.)

If the above descriptions are accurate, we should expect the locus of the equation $f(x) = C$ to have certain axes of symmetry which we shall call its principal axes. It turns out that these axes are determined by an orthonormal basis of eigenvectors for the coefficient matrix $A$. To see this, suppose $\{u_1, u_2, \ldots, u_n\}$ is such a basis and $P = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$ is the corresponding orthogonal matrix. By the Principal Axis Theorem, $P^tAP = D$ is diagonal with the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ appearing on the diagonal. Make the change of coordinates $x = Px'$ where $x$ represents the 'old' coordinates and $x'$ represents the 'new' coordinates. Then
$$f(x) = x^tAx = (Px')^tA(Px') = (x')^tP^tAPx' = (x')^tDx'.$$
Since $D$ is diagonal, the quadratic expression on the right has no cross terms, i.e.,
$$(x')^tDx' = \begin{bmatrix} x_1' & x_2' & \cdots & x_n' \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix} = \lambda_1(x_1')^2 + \lambda_2(x_2')^2 + \cdots + \lambda_n(x_n')^2.$$
In the new coordinates, the equation takes the form

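$$\lambda_1(x_1')^2 + \lambda_2(x_2')^2 + \cdots + \lambda_n(x_n')^2 = C$$
and its graph is usually quite easy to describe.

Example 1. We shall investigate the conic
$$f(x, y) = x^2 + 4xy + y^2 = 1.$$
First rewrite the equation
$$\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = 1.$$
(Note how the 4 was split into two symmetrically placed 2s.) Next, find the eigenvalues of the coefficient matrix by solving
$$\det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 1-\lambda \end{bmatrix} = (1-\lambda)^2 - 4 = \lambda^2 - 2\lambda - 3 = 0.$$
This equation is easy to factor, and the roots are $\lambda = 3$, $\lambda = -1$. For $\lambda = 3$, to find the eigenvectors, we need to solve
$$\begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0.$$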


Reduction of the coefficient matrix yields
$$\begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}$$
with the general solution $v_1 = v_2$, $v_2$ free. A basic normalized eigenvector is
$$u_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
For $\lambda = -1$, a similar calculation (which you should make) yields the basic normalized eigenvector
$$u_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
(Note that $u_1 \perp u_2$ as expected.) From this we can form the corresponding orthogonal matrix $P$ and make the change of coordinates
$$\begin{bmatrix} x \\ y \end{bmatrix} = P\begin{bmatrix} x' \\ y' \end{bmatrix},$$
and, according to the above analysis, the equation of the conic in the new coordinate system is
$$3(x')^2 - (y')^2 = 1.$$
It is clear that this is a hyperbola with principal axes pointing along the new axes.

[Figure: the hyperbola relative to the original and the rotated $x$, $y$ axes]
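For $2 \times 2$ symmetric matrices the eigenvalues have a closed form, which makes a quick classification of central conics possible. The Python sketch below is illustrative only: it writes the conic as $ax^2 + 2bxy + cy^2 = C$ (so `b` plays the role of $a_{12}$, half the $xy$ coefficient), assumes $C > 0$, and compares eigenvalues with zero exactly, which is fragile in floating point. The function names are hypothetical.

```python
import math


def eigenvalues_2x2_symmetric(a, b, c):
    """Roots of det([[a - l, b], [b, c - l]]) = l^2 - (a + c) l + (ac - b^2) = 0.

    The discriminant is (a - c)^2 + 4b^2 >= 0, so both roots are real.
    Returned with the larger root first.
    """
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mean + radius, mean - radius


def classify_central_conic(a, b, c, C=1):
    """Classify a x^2 + 2 b x y + c y^2 = C, assuming C > 0."""
    l1, l2 = eigenvalues_2x2_symmetric(a, b, c)
    if l2 > 0:
        return "ellipse"            # both eigenvalues positive
    if l1 > 0 and l2 < 0:
        return "hyperbola"          # eigenvalues of opposite sign
    return "degenerate or empty"    # a zero eigenvalue, or none positive


print(eigenvalues_2x2_symmetric(1, 2, 1))  # (3.0, -1.0), as in Example 1
print(classify_central_conic(1, 2, 1))     # hyperbola
print(classify_central_conic(1, -1, 2))    # x^2 - 2xy + 2y^2 = 1: ellipse
```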

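Example 2. Consider the quadric surface defined by
$$x_1^2 + x_2^2 + x_3^2 - 2x_1x_3 = 1.$$
We take
$$f(x) = x_1^2 + x_2^2 + x_3^2 - 2x_1x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$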


Note how the coefficient in $-2x_1x_3$ was split into two equal parts, a $-1$ in the 1,3-position and a $-1$ in the 3,1-position. The coefficients of the other cross terms were zero. As usual, the coefficients of the squares were put on the diagonal. The characteristic equation of the coefficient matrix is
$$\det\begin{bmatrix} 1-\lambda & 0 & -1 \\ 0 & 1-\lambda & 0 \\ -1 & 0 & 1-\lambda \end{bmatrix} = (1-\lambda)^3 - (1-\lambda) = -(\lambda - 2)(\lambda - 1)\lambda = 0.$$
Thus, the eigenvalues are $\lambda = 2, 1, 0$. For $\lambda = 2$, reduce
$$\begin{bmatrix} -1 & 0 & -1 \\ 0 & -1 & 0 \\ -1 & 0 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

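to obtain $v_1 = -v_3$, $v_2 = 0$ with $v_3$ free. Thus,
$$v_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
is a basic eigenvector for $\lambda = 2$, and
$$u_1 = \frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
is a basic unit eigenvector. Similarly, for $\lambda = 1$, reduce
$$\begin{bmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
which yields $v_1 = v_3 = 0$ with $v_2$ free. Thus a basic unit eigenvector for $\lambda = 1$ is
$$u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$
Finally, for $\lambda = 0$, reduce
$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
This yields $v_1 = v_3$, $v_2 = 0$ with $v_3$ free. Thus, a basic unit eigenvector for $\lambda = 0$ is
$$u_3 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$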


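The corresponding orthogonal change of basis matrix is
$$P = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt 2} & 0 & \frac{1}{\sqrt 2} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt 2} & 0 & \frac{1}{\sqrt 2} \end{bmatrix}.$$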

Moreover, putting $x = Px'$, we can express the equation of the quadric surface in the new coordinate system as
$$2(x'_1)^2 + 1(x'_2)^2 + 0(x'_3)^2 = 2(x'_1)^2 + (x'_2)^2 = 1. \qquad (1)$$

Thus it is easy to see what this quadric surface is: an elliptical cylinder perpendicular to the $x'_1, x'_2$ plane. (This is one of the degenerate cases.) The three 'principal axes' in this case are the two axes of the ellipse in the $x'_1, x'_2$ plane and the $x'_3$ axis, which is the central axis of the cylinder.

[Figure: the tilted cylinder relative to the new axes; the new axes are labelled 1, 2, 3.]

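Representing the graph in the new coordinates makes it easy to understand its geometry. Suppose, for example, that we want to find the points on the graph which are closest to the origin. Since the cross-sectional ellipse $2(x'_1)^2 + (x'_2)^2 = 1$ has its shorter semi-axis, of length $1/\sqrt 2$, along the $x'_1$-axis, these are the points at which the $x'_1$-axis intersects the surface, namely the points with new coordinates $x'_1 = \pm 1/\sqrt 2$, $x'_2 = x'_3 = 0$. If you want the coordinates of these points in the original coordinate system, use the change of coordinates formula $x = Px'$. Thus, the old coordinates of the minimum point with new coordinates $(1/\sqrt 2, 0, 0)$ are given by
$$\begin{bmatrix} -\frac{1}{\sqrt 2} & 0 & \frac{1}{\sqrt 2} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt 2} & 0 & \frac{1}{\sqrt 2} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt 2} \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac12 \\ 0 \\ \frac12 \end{bmatrix}.$$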
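The same kind of check works for Example 2. The sketch below (plain Python; the helper names are ours) verifies that $P^tAP$ is diagonal with entries 2, 1, 0, and that the point with new coordinates $(1/\sqrt 2, 0, 0)$ has old coordinates $(-1/2, 0, 1/2)$ and lies on the surface.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def f(x1, x2, x3):
    """Left side of the quadric equation x1^2 + x2^2 + x3^2 - 2 x1 x3 = 1."""
    return x1**2 + x2**2 + x3**2 - 2 * x1 * x3

A = [[1, 0, -1],
     [0, 1, 0],
     [-1, 0, 1]]
s = 1 / math.sqrt(2)
P = [[-s, 0, s],             # columns u1, u2, u3 (eigenvalues 2, 1, 0)
     [0, 1, 0],
     [s, 0, s]]

D = matmul(transpose(P), matmul(A, P))   # expected: diag(2, 1, 0) up to round-off

x_new = [[s], [0], [0]]                  # new coordinates (1/sqrt(2), 0, 0)
x_old = [row[0] for row in matmul(P, x_new)]
print(x_old)                             # approximately [-0.5, 0.0, 0.5]
print(f(*x_old))                         # approximately 1: the point is on the surface
```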


Appendix. A review of conics and quadrics. You are probably familiar with certain graphs when they arise in standard configurations. In two dimensions, the central conics have equations of the form
$$\pm\frac{x^2}{a^2} \pm \frac{y^2}{b^2} = 1.$$

If both signs are +, the conic is an ellipse. If one sign is + and one is −, then the conic is a hyperbola. The + goes with the axis which crosses the hyperbola. Some examples are sketched below.

[Figures: an ellipse and a hyperbola.]

Two − signs result in an empty graph, i.e., there are no points satisfying the equation. Parabolas arise from equations of the form $y = px^2$ with $p \neq 0$.

[Figure: parabolas.]

For $x = py^2$, the parabola opens along the positive or negative $x$-axis. There are also some degenerate cases. For example, $x^2/a^2 - y^2/b^2 = 0$ defines two lines which intersect at the origin. In three dimensions, the central quadrics have equations of the form
$$\pm\frac{x^2}{a^2} \pm \frac{y^2}{b^2} \pm \frac{z^2}{c^2} = 1.$$


If all three signs are +, the quadric is an ellipsoid. If two of the three signs are + and one is −, the quadric is a hyperboloid of one sheet. If one of the three signs is + and the other two are −, the quadric is a hyperboloid of two sheets. Notice that the number of sheets is the same as the number of − signs. It is not hard to figure out how the quadric is oriented, depending on how the signs are arranged. The 'axis' of a hyperboloid is labeled by the variable whose sign in the equation is in the minority, i.e., the − sign in the one sheet case and the + sign in the two sheet case.
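Once the principal axis method has produced the eigenvalues, these sign rules give a mechanical classification of the nondegenerate central cases. The function `classify_central` below is a hypothetical helper (our name) that encodes exactly the rules stated in this appendix for an equation $c_1x_1^2 + \cdots + c_nx_n^2 = 1$ with $n = 2$ or $3$; an equation with right side $C > 0$ can be divided by $C$ first. Degenerate cases, where some coefficient is zero, are deliberately refused rather than classified.

```python
def classify_central(coeffs):
    """Name the graph of c1*x1^2 + ... + cn*xn^2 = 1 for n = 2 or n = 3.

    Only the signs of the coefficients matter (a positive coefficient c can be
    written as 1/a^2).  Zero coefficients are degenerate cases and are refused.
    """
    if len(coeffs) not in (2, 3):
        raise ValueError("only conics (n = 2) and quadrics (n = 3) are handled")
    if any(c == 0 for c in coeffs):
        raise ValueError("degenerate case: a coefficient is zero")
    minus = sum(1 for c in coeffs if c < 0)      # number of - signs
    if minus == len(coeffs):
        return "empty"                           # all signs -: no points at all
    if len(coeffs) == 2:
        return "ellipse" if minus == 0 else "hyperbola"
    # n = 3: the number of sheets equals the number of - signs
    return {0: "ellipsoid",
            1: "hyperboloid of one sheet",
            2: "hyperboloid of two sheets"}[minus]

print(classify_central([3, -1]))       # hyperbola
print(classify_central([1, -1, -1]))   # hyperboloid of two sheets
```

For instance, the eigenvalues 3 and $-1$ of Example 1 in this section give a hyperbola, while Example 2 has a zero eigenvalue and is refused, matching its status as a degenerate case (a cylinder).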

[Figures: an ellipsoid, a hyperboloid of one sheet, and a hyperboloid of two sheets.]

If all three signs are −, we get an empty graph. Paraboloids arise from equations of the form
$$z = \pm\frac{x^2}{a^2} \pm \frac{y^2}{b^2},$$

or similar equations with $x, y, z$ rearranged. If both signs are + or both are −, then the quadric is an elliptic paraboloid or 'bowl'. The bowl opens up along the axis of the variable appearing on the left of the equation if the signs are +, and it opens along the negative axis of that variable if the signs are −. If one sign is + and the other is −, the surface is a hyperbolic paraboloid or 'saddle'. Equations of the form $z = cxy$, $c \neq 0$, also describe saddles.

[Figures: an elliptic paraboloid and a hyperbolic paraboloid.]

There are many degenerate cases. One example would be
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 0.$$


Its graph is a double cone with elliptical cross sections. Another would be
$$\pm\frac{x^2}{a^2} \pm \frac{y^2}{b^2} = 1$$
with at least one + sign. Its graph is a 'cylinder' perpendicular to the $x, y$-plane. The cross sections are ellipses or hyperbolas, depending on the combination of signs.

[Figures: a cylinder and a cone.]

Exercises for Section 4.

1. Find the principal axes and classify the central conic $x^2 + xy + y^2 = 1$.

2. Identify the conic defined by $x^2 + 4xy + y^2 = 4$. Find its principal axes, and find the points closest to and furthest from the origin (if any).

3. Identify the conic defined by $2x^2 + 72xy + 23y^2 = 50$. Find its principal axes, and find the points closest to and furthest from the origin (if any).

4. Find the principal axes and classify the central quadric defined by $x^2 - y^2 + z^2 - 4xy - 4yz = 1$.

5. (Optional) Classify the surface defined by $x^2 + 2y^2 + z^2 + 2xy + 2yz - z = 0$.

Hint: This is not a central quadric. To classify it, first apply the methods of the section to the quadratic expression $x^2 + 2y^2 + z^2 + 2xy + 2yz$ to find a new coordinate system in which this expression has the form $\lambda_1 x'^2 + \lambda_2 y'^2 + \lambda_3 z'^2$. Use the change of coordinates formula to express $z$ in terms of $x'$, $y'$, and $z'$, and then complete squares to eliminate all linear terms. At this point, it should be clear what the surface is.


5. Conics and the Method of Lagrange Multipliers

There is another approach to finding the principal axes of a conic, quadric, or hyperquadric. Consider, for example, an ellipse in $R^2$ centered at the origin. One of the principal axes intersects the conic in the two points at greatest distance from the origin, and the other intersects it in the two points at least distance from the origin. Similarly, two of the three principal axes of a central ellipsoid in $R^3$ may be obtained in this way. Thus, if we didn't know about eigenvalues and eigenvectors, we might try to find the principal axes by maximizing (or minimizing) the function giving the distance to the origin subject to the quadratic equation defining the conic or quadric. In other words, we need to maximize or minimize a function subject to a constraint among the variables. Such problems are solved by the method of Lagrange multipliers, which you learned in your multidimensional calculus course. Here is a review of the method. Suppose we want to maximize (minimize) the real valued function $f(x) = f(x_1, x_2, \ldots, x_n)$ subject to the constraint $g(x) = g(x_1, x_2, \ldots, x_n) = c$. For $n = 2$, this has a simple geometric interpretation. The locus of the equation $g(x_1, x_2) = c$ is a level curve of the function $g$, and we want to maximize (minimize) the function $f$ on that curve. Similarly, for $n = 3$, the level set $g(x_1, x_2, x_3) = c$ is a surface in $R^3$, and we want to maximize (minimize) $f$ on that surface.

[Figure: the level set $g(x) = c$. For $n = 2$, a level curve in the plane; for $n = 3$, a level surface in space.]

Examples. Maximize $f(x, y) = x^2 + y^2$ on the ellipse $g(x, y) = x^2 + 4y^2 = 3$. (This is easy if you draw the picture.) Minimize $f(x, y, z) = 2x^2 + 3xy + y^2 + xz - 4z^2$ on the sphere $g(x, y, z) = x^2 + y^2 + z^2 = 1$. Minimize $f(x, y, z, t) = x^2 + y^2 + z^2 - t^2$ on the 'hypersphere' $g(x, y, z, t) = x^2 + y^2 + z^2 + t^2 = 1$.

We shall concentrate on the case of $n = 3$ variables, but the reasoning for any $n$ is similar. We want to maximize (or minimize) $f(x)$ on a level surface $g(x) = c$ in $R^3$, where as usual we abbreviate $x = (x_1, x_2, x_3)$. At any point $x$ on the level surface at which such an extreme value is obtained, we must have
$$\nabla f(x) = \lambda\nabla g(x) \qquad (1)$$
for some scalar $\lambda$.

5. CONICS AND THE METHOD OF LAGRANGE MULTIPLIERS

[Figure: at a maximum point, $\nabla f$ is parallel to $\nabla g$; at another point of the level set, it is not.]

(1) is a necessary condition which must hold at the relevant points. (It doesn't by itself guarantee that there is a maximum or a minimum at the point. There could be no extreme value at all at the point.) In deriving this condition, we assume implicitly that the level surface is smooth and has a well defined normal vector $\nabla g \neq 0$, and that the function $f$ is also smooth. If these conditions are violated at some point, that point could also be a candidate for a maximum or minimum. Taking components, we obtain 3 scalar equations for the 4 variables $x_1, x_2, x_3, \lambda$. We would not expect, even in the best of circumstances, to get a unique solution from this, but the defining equation for the level surface $g(x) = c$ provides a 4th equation. We still won't generally get a unique solution, but we will usually get at most a finite number of possible solutions. Each of these can be examined further to see if $f$ attains a maximum (or minimum) at that point in the level set. Notice that the variable $\lambda$ plays an auxiliary role since we really only want the coordinates of the point $x$. (In some applications, $\lambda$ has some significance beyond that.) $\lambda$ is called a Lagrange multiplier. The method of Lagrange multipliers often leads to a set of equations which is difficult to solve. However, in the case of quadratic functions $f$, there is a typical pattern which emerges.

Example 1. Suppose we want to find the maximum and minimum values of the function $f(x, y) = x^2 + 4xy + y^2$ on the circle $x^2 + y^2 = 1$. For this problem $n = 2$, and the level set is a curve. Take $g(x, y) = x^2 + y^2$. Then $\nabla f = \langle 2x + 4y, 4x + 2y\rangle$, $\nabla g = \langle 2x, 2y\rangle$, and $\nabla f = \lambda\nabla g$ yields the equations
$$\begin{aligned} 2x + 4y &= \lambda(2x) \\ 4x + 2y &= \lambda(2y) \end{aligned}$$
to which we add $x^2 + y^2 = 1$.


After canceling a common factor of 2, the first two equations may be written in matrix form
$$\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \lambda\begin{bmatrix} x \\ y \end{bmatrix},$$
which says that $\begin{bmatrix} x \\ y \end{bmatrix}$ is an eigenvector for the eigenvalue $\lambda$, and the equation $x^2 + y^2 = 1$ says it is a unit eigenvector. You should know how to solve such problems, and we leave it to you to make the required calculations. (See also Example 1 in the previous section, where we made these calculations in another context.) The eigenvalues are $\lambda = 3$ and $\lambda = -1$. For $\lambda = 3$, a basic unit eigenvector is
$$u_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix},$$
and every other eigenvector is of the form $cu_1$. The latter will be a unit vector if and only if $|c| = 1$, i.e., $c = \pm 1$. We conclude that $\lambda = 3$ yields two solutions of the Lagrange multiplier problem: $(1/\sqrt 2, 1/\sqrt 2)$ and $(-1/\sqrt 2, -1/\sqrt 2)$. At each of these points $f(x, y) = x^2 + 4xy + y^2 = 3$. For $\lambda = -1$, we obtain the basic unit eigenvector
$$u_2 = \frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 1 \end{bmatrix},$$
and a similar analysis (which you should do) yields the two points $(1/\sqrt 2, -1/\sqrt 2)$ and $(-1/\sqrt 2, 1/\sqrt 2)$. At each of these points $f(x, y) = x^2 + 4xy + y^2 = -1$.

[Figure: the unit circle with the two maximum points and the two minimum points marked.]

Hence, the function attains its maximum value at the first two points and its minimum value at the second two.
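A crude but independent check of Example 1 is to evaluate $f$ at many points of the circle and pick out the largest and smallest values. The sketch below does this in plain Python; it is a sampling check, not the Lagrange multiplier method, and the sample count `N` is an arbitrary choice (divisible by 8 so that the angles $\pi/4$ and $3\pi/4$ are among the samples).

```python
import math

def f(x, y):
    """The quadratic function of Example 1."""
    return x * x + 4 * x * y + y * y

N = 36_000                            # number of sample angles
samples = []
for i in range(N):
    t = 2 * math.pi * i / N
    x, y = math.cos(t), math.sin(t)   # a point on the circle x^2 + y^2 = 1
    samples.append((f(x, y), x, y))

f_max, x_at_max, y_at_max = max(samples)
f_min, x_at_min, y_at_min = min(samples)
print(f_max, (x_at_max, y_at_max))    # close to 3 at (1/sqrt 2, 1/sqrt 2) or its negative
print(f_min, (x_at_min, y_at_min))    # close to -1 at (1/sqrt 2, -1/sqrt 2) or its negative
```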


Example 2. Suppose we want to minimize the function $g(x, y) = x^2 + y^2$ (which is the square of the distance to the origin) on the conic $f(x, y) = x^2 + 4xy + y^2 = 1$. Note that this is basically the same as the previous example except that the roles of the two functions are reversed. The Lagrange multiplier condition $\nabla g = \lambda\nabla f$ is the same as the condition $\nabla f = (1/\lambda)\nabla g$ provided $\lambda \neq 0$. ($\lambda \neq 0$ in this case since otherwise $\nabla g = 0$, which yields $x = y = 0$. However, $(0, 0)$ is not a point on the conic.) We just solved that problem and found eigenvalues $1/\lambda = 3$ or $1/\lambda = -1$. In this case, we don't need unit eigenvectors, so to avoid square roots we choose basic eigenvectors
$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
corresponding respectively to $1/\lambda = 3$ and $1/\lambda = -1$. The endpoint of $v_1$ does not lie on the conic, but any other eigenvector for $1/\lambda = 3$ is of the form $cv_1$, so all we need to do is adjust $c$ so that the point satisfies the equation $f(x, y) = x^2 + 4xy + y^2 = 1$. Substituting $(x, y) = (c, c)$ yields $6c^2 = 1$, or $c = \pm 1/\sqrt 6$. Thus, we obtain the two points $(1/\sqrt 6, 1/\sqrt 6)$ and $(-1/\sqrt 6, -1/\sqrt 6)$. For $1/\lambda = -1$, substituting $(x, y) = (-c, c)$ in the equation yields $-2c^2 = 1$, which has no solutions. Thus, the only candidates for a minimum (or maximum) are the first pair of points: $(1/\sqrt 6, 1/\sqrt 6)$ and $(-1/\sqrt 6, -1/\sqrt 6)$. A simple calculation shows these are both $1/\sqrt 3$ units from the origin, but without further analysis, we can't tell if this is the maximum, the minimum, or neither. However, it is not hard to classify this conic (see the previous section) and discover that it is a hyperbola. Since a hyperbola is unbounded, there is no maximum distance, and hence the two points are minimum points.

The Rayleigh–Ritz Method. Example 1 above is typical of a certain class of Lagrange multiplier problems. Let $A$ be a real symmetric $n \times n$ matrix, and consider the problem of maximizing (minimizing) the quadratic function $f(x) = x^tAx$ subject to the constraint $g(x) = |x|^2 = 1$. This is called the Rayleigh–Ritz problem.
For $n = 2$ or $n = 3$, the level set $|x|^2 = 1$ is a circle or sphere, and for $n > 3$, it is called a hypersphere. Alternatively, we could reverse the roles of the functions $f$ and $g$, i.e., we could try to maximize (minimize) the square of the distance to the origin $g(x) = |x|^2$ on the level set $f(x) = 1$. Because the Lagrange multiplier condition in either case asserts that the two gradients $\nabla f$ and $\nabla g$ are parallel, these two problems are very closely related. The latter problem, finding the points on a conic, quadric, or hyperquadric furthest from (or closest to) the origin, is easier to visualize, but the former problem, maximizing or minimizing the quadratic function $f$ on the hypersphere $|x| = 1$, is easier to compute with.

Let's go about applying the Lagrange multiplier method to the Rayleigh–Ritz problem. The components of $\nabla g$ are easy:
$$\frac{\partial g}{\partial x_i} = 2x_i, \qquad i = 1, 2, \ldots, n.$$

The calculation of $\nabla f$ is harder. First write
$$f(x) = \sum_{j=1}^{n} x_j\Bigl(\sum_{k=1}^{n} a_{jk}x_k\Bigr)$$


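and then carefully apply the product rule together with $a_{jk} = a_{kj}$. The product rule gives $\partial f/\partial x_i = \sum_k a_{ik}x_k + \sum_j x_j a_{ji}$, and since $a_{ji} = a_{ij}$ the two sums are equal. The result is
$$\frac{\partial f}{\partial x_i} = 2\sum_{j=1}^{n} a_{ij}x_j, \qquad i = 1, 2, \ldots, n.$$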

(Work this out explicitly in the cases $n = 2$ and $n = 3$ if you don't believe it.) Thus, the Lagrange multiplier condition $\nabla f = \lambda\nabla g$ yields the equations
$$2\sum_{j=1}^{n} a_{ij}x_j = \lambda(2x_i), \qquad i = 1, 2, \ldots, n,$$
which may be rewritten in matrix form (after canceling the 2's)
$$Ax = \lambda x. \qquad (3)$$
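The gradient formula derived above, $\partial f/\partial x_i = 2\sum_j a_{ij}x_j$, i.e. $\nabla f = 2Ax$, can be spot-checked against finite differences. The sketch below is plain Python; the random symmetric test matrix, the seed, and the step size `h` are arbitrary choices, and the function names are ours. Because $f$ is quadratic, the central difference has no truncation error, so the two results differ only by round-off.

```python
import random

def quad_form(A, x):
    """f(x) = x^t A x = sum_j sum_k x_j a_jk x_k."""
    n = len(x)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))

def grad_formula(A, x):
    """The text's result: df/dx_i = 2 * sum_j a_ij x_j (valid when A is symmetric)."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def grad_numeric(A, x, h=1e-6):
    """Central-difference approximation to each partial derivative of f."""
    g = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        g.append((quad_form(A, x_plus) - quad_form(A, x_minus)) / (2 * h))
    return g

rng = random.Random(1)       # fixed seed so the run is repeatable
n = 4
A = [[0.0] * n for _ in range(n)]
for i in range(n):           # random symmetric matrix: mirror the upper triangle
    for j in range(i, n):
        A[i][j] = A[j][i] = rng.uniform(-2, 2)
x = [rng.uniform(-1, 1) for _ in range(n)]

print(grad_formula(A, x))
print(grad_numeric(A, x))    # agrees with the line above up to round-off
```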

To equation (3) we must add the equation of the level set $g(x) = |x|^2 = 1$. Thus, any potential solution $x$ is a unit eigenvector for the matrix $A$ with eigenvalue $\lambda$. Note also that for such a unit eigenvector, we have
$$f(x) = x^tAx = x^t(\lambda x) = \lambda x^tx = \lambda|x|^2 = \lambda.$$
Thus the eigenvalue is the value of the quadratic function at the point on the (hyper)sphere given by the unit eigenvector; in particular, the maximum and minimum values of $f$ on the (hyper)sphere are the largest and smallest eigenvalues of $A$. The upshot of this discussion is that for a real symmetric matrix $A$, the Rayleigh–Ritz problem is equivalent to the problem of finding an orthonormal basis of eigenvectors for $A$.

The Rayleigh–Ritz method may be used to show that the characteristic equation of a real symmetric matrix has only real roots. This was an issue left unresolved in our earlier discussions. Here is an outline of the argument. The hypersphere $g(x) = |x|^2 = 1$ is a closed bounded set in $R^n$ for any $n$. It follows from a basic theorem in analysis that any continuous function, in particular the quadratic function $f(x)$, must attain both maximum and minimum values on the hypersphere. Hence, the Lagrange multiplier problem always has solutions, which by the above algebra amounts to the assertion that the real symmetric matrix $A$ must have at least one real eigenvalue. This suggests a general procedure for showing that all the eigenvalues are real. First find the largest eigenvalue by maximizing the quadratic function $f(x)$ on the set $|x|^2 = 1$. Let $x = u_1$ be the corresponding eigenvector. Change coordinates by choosing an orthonormal basis starting with $u_1$. Then the additional basis elements will span the subspace perpendicular to $u_1$, and we may obtain a lower dimensional quadratic function by restricting $f$ to that subspace. We can now repeat the process to find the next smaller real eigenvalue. Continuing in this way, we will obtain an orthonormal basis of eigenvectors for $A$, and each of the corresponding eigenvalues will be real.
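The step-by-step procedure just outlined can also be carried out numerically. The sketch below is one way to do it, not the text's method: the text's argument is an existence proof, while here each maximization of $f(x) = x^tAx$ on the unit sphere is approximated by power iteration on the shifted matrix $A + sI$, a standard substitute. The shift `s`, the fixed iteration count, the random starting vectors, and all function names are our assumptions; there is no convergence test, so matrices with close eigenvalues would need more iterations.

```python
import math
import random

def mat_vec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def unit(x):
    """Rescale a nonzero vector x onto the unit (hyper)sphere |x| = 1."""
    length = math.sqrt(dot(x, x))
    return [xi / length for xi in x]

def project_out(x, basis):
    """Remove from x its components along the orthonormal vectors in basis."""
    for u in basis:
        c = dot(x, u)
        x = [xi - c * ui for xi, ui in zip(x, u)]
    return x

def rayleigh_ritz(A, iters=1000, seed=0):
    """Return [(eigenvalue, unit eigenvector), ...] for real symmetric A, largest first.

    Step j maximizes f(x) = x^t A x over unit vectors perpendicular to the
    eigenvectors already found.  The maximization is approximated by power
    iteration on B = A + s I.  Here s exceeds the largest absolute row sum of A,
    which bounds every |eigenvalue|, so B is positive definite and its dominant
    eigenvector on the remaining subspace is the maximizer of f there.
    """
    n = len(A)
    s = 1 + max(sum(abs(a) for a in row) for row in A)
    rng = random.Random(seed)
    basis, pairs = [], []
    for _ in range(n):
        x = unit(project_out([rng.uniform(-1, 1) for _ in range(n)], basis))
        for _ in range(iters):
            Bx = [ax + s * xi for ax, xi in zip(mat_vec(A, x), x)]
            x = unit(project_out(Bx, basis))   # stay perpendicular to earlier u's
        lam = dot(x, mat_vec(A, x))            # f(x) = x^t A x = lambda at a unit eigenvector
        basis.append(x)
        pairs.append((lam, x))
    return pairs

for lam, u in rayleigh_ritz([[1, 0, -1], [0, 1, 0], [-1, 0, 1]]):
    print(round(lam, 9), [round(c, 6) for c in u])
```

Applied to the matrix of Example 2 in Section 4, the eigenvalues come out as 2, 1, 0 in that order, up to round-off, and the vectors agree with $u_1, u_2, u_3$ up to sign.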

6. NORMAL MODES


Exercises for Section 5.

1. Find the maximum and minimum values of the function $f(x, y) = x^2 + y^2$ given the constraint $x^2 + xy + y^2 = 1$.

2. Find the maximum and/or minimum value of $f(x, y, z) = x^2 - y^2 + z^2 - 4xy - 4yz$ subject to $x^2 + y^2 + z^2 = 1$.

3. (Optional) The derivation of the Lagrange multiplier condition $\nabla f = \lambda\nabla g$ assumes that $\nabla g \neq 0$, so there is a well defined tangent 'plane' at the potential maximum or minimum point. However, a maximum or minimum could occur at a point where $\nabla g = 0$, so all such points should also be checked. (Similarly, either $f$ or $g$ might fail to be smooth at a maximum or minimum point.) With these remarks in mind, find where $f(x, y, z) = x^2 + y^2 + z^2$ attains its minimum value subject to the constraint $g(x, y, z) = x^2 + y^2 - z^2 = 0$.

4. Consider, as in Example 1, the problem of maximizing $f(x, y) = x^2 + 4xy + y^2$ given the constraint $x^2 + y^2 = 1$. This is equivalent to maximizing $F(x, y) = xy$ on the circle $x^2 + y^2 = 1$. (Why?) Draw a diagram showing the circle and selected level curves $F(x, y) = c$ of the function $F$. Can you see why $F(x, y)$ attains its maximum at $(1/\sqrt 2, 1/\sqrt 2)$ and $(-1/\sqrt 2, -1/\sqrt 2)$ without using any calculus? Hint: consider how the level curves of $F$ intersect the circle, and decide from that where $F$ is increasing and where it is decreasing on the circle.

6. Normal Modes

Eigenvalues and eigenvectors are an essential tool in solving systems of linear differential equations. We leave an extended treatment of this subject for a course in differential equations, but it is instructive to consider an interesting class of vibration problems that have many important scientific and engineering applications. We start with some elementary physics you may have encountered in a physics class. Imagine an experiment in which a small car is placed on a track and connected to a wall through a stiff spring. With the spring in its rest position, the car will just sit there forever, but if the car is pulled away from the wall a small distance and then released, it will oscillate back and forth about its rest position. If we assume the track is so well greased that we can ignore friction, this oscillation will in principle continue forever.

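[Figure: a car of mass $m$ on a track, attached to a wall by a spring with spring constant $k$; $x$ is the displacement.]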

We want to describe this situation symbolically. Let x denote the displacement of the car from equilibrium, and suppose the car has mass m. Hooke’s Law tells


us that there is a restoring force of the form $F = -kx$, where $k$ is a constant called the spring constant. Newton's second law relating force and acceleration tells us
$$m\frac{d^2x}{dt^2} = -kx. \qquad (1)$$

This is also commonly written $\dfrac{d^2x}{dt^2} + \dfrac{k}{m}x = 0$. You may have learned how to solve this differential equation in a previous course, but in this particular case, it is not really necessary. From the physical characteristics of the solution, we can pretty much guess what it should look like:
$$x = A\cos(\omega t), \qquad (2)$$

where $A$ is the amplitude of the oscillation and $\omega$ is determined by the frequency or rapidity of the oscillation. It is usually called the angular frequency, and it is related to the actual frequency $f$ by the equation $\omega = 2\pi f$. $A$ is determined by the size of the initial displacement; it gives the maximum displacement attained as the car oscillates. $\omega$, however, is determined by the spring constant and the mass. To see how, just substitute (2) in (1). We get
$$m(-\omega^2 A\cos(\omega t)) = -kA\cos(\omega t),$$
which after canceling common factors yields
$$m\omega^2 = k \qquad\text{or}\qquad \omega = \sqrt{\frac{k}{m}}.$$
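The claim that (2) solves (1) when $\omega = \sqrt{k/m}$ can be compared against a direct numerical integration. The sketch below uses the velocity Verlet scheme, a standard integrator for Newton's equations that the text does not discuss; the function name, the step size `dt`, and the sample values of $m$, $k$, and the amplitude are our assumptions. The car starts at rest at $x = A$, which corresponds to zero phase shift.

```python
import math

def simulate_spring(m, k, x0, t_end, dt=1e-4):
    """Integrate m x'' = -k x from x(0) = x0, x'(0) = 0 by velocity Verlet."""
    x, v = x0, 0.0
    a = -k * x / m
    for _ in range(round(t_end / dt)):
        x += v * dt + 0.5 * a * dt * dt     # position update
        a_new = -k * x / m                  # Hooke's law force at the new position
        v += 0.5 * (a + a_new) * dt         # velocity update with averaged acceleration
        a = a_new
    return x

m, k, amplitude, t = 2.0, 8.0, 0.3, 5.0
omega = math.sqrt(k / m)                    # sqrt(k/m) = 2 for these sample values
numeric = simulate_spring(m, k, amplitude, t)
predicted = amplitude * math.cos(omega * t)
print(numeric, predicted)                   # should agree to several decimal places
```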

The above discussion is a bit simplified. We could not only have initially displaced the car from rest, but we could also have given it an initial shove or velocity. In that case, the maximal displacement would be shifted in time. The way to describe this symbolically is x = A cos(ωt + δ) where δ is called the phase shift. This complication does not change the basic character of the problem since it is usually the fundamental vibration of the system that we are interested in, and that turns out to be the same if we include a possible phase shift. We now want to generalize this to more than one mass connected by several springs. This may seem a bit bizarre, but it is just a model for situations commonly met in scientific applications. For example, in chemistry, one often needs to determine the basic vibrations of a complex molecule. The molecule consists of atoms ‘connected’ by interatomic forces. As a first approximation, we may treat the atoms as point masses and the forces between them as linear restoring forces from equilibrium positions. Thus the mass-spring model may tell us something useful about real problems.


Example 1. Consider the configuration of masses and springs indicated below, where $m$ is the common mass of the two particles and $k$ is the common spring constant of the three springs.

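[Figure: two masses $m$ in a line between two walls, joined by three springs of constant $k$; $x_1$ and $x_2$ are the displacements.]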

Look at the first mass. When it is displaced a distance $x_1$ to the right from equilibrium, it will be acted upon by two forces. Extension of the spring on the left will pull it back with force $-kx_1$. At the same time, the spring in the middle will push or pull it depending on whether it is compressed or stretched. If $x_2$ is the displacement of the second mass from equilibrium, the middle spring is stretched by $x_2 - x_1$ (compressed if this is negative), so the force on the first mass will be $k(x_2 - x_1) = -k(x_1 - x_2)$. This yields a total force of $-kx_1 - k(x_1 - x_2) = -2kx_1 + kx_2$. A similar analysis works for the second mass. Thus, we obtain the system of differential equations
$$\begin{aligned} m\frac{d^2x_1}{dt^2} &= -2kx_1 + kx_2 \\ m\frac{d^2x_2}{dt^2} &= kx_1 - 2kx_2. \end{aligned}$$

The system may also be rewritten in matrix form
$$m\frac{d^2x}{dt^2} = \begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}x \quad\text{where}\quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \qquad (3)$$
Note that the matrix on the right is a symmetric matrix. This is always the case in such problems. It is an indirect consequence of Newton's third law, which asserts that the forces exerted by two masses on each other must be equal and opposite. To solve this, we look for solutions of the form
$$x_1 = v_1\cos(\omega t), \qquad x_2 = v_2\cos(\omega t). \qquad (4)$$

In such a solution, the two particles oscillate with the same frequency but with possibly different amplitudes $v_1$ and $v_2$. Such a solution is called a normal mode. General motions of the system can be quite a bit more complicated. First of all, we have to worry about possible phase shifts. More importantly, we also have to


allow for linear combinations of the normal modes in which there is a mixture of different frequencies. In this way the situation is similar to that of a musical instrument, which may produce a complex sound that can be analyzed in terms of basic frequencies or harmonics. We leave such complications for another course. Here we content ourselves with doing the first step, which is to find the fundamental oscillations or normal modes. (4) may be rewritten in matrix form
$$x = v\cos(\omega t), \qquad (5)$$

where $\omega$ and $v \neq 0$ are to be determined. Then
$$\frac{d^2x}{dt^2} = -\omega^2 v\cos(\omega t).$$
Hence, putting (5) in (3) yields
$$m(-\omega^2 v\cos(\omega t)) = \begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}v\cos(\omega t).$$
Now factor out the common scalar factor $\cos(\omega t)$ to obtain
$$-\omega^2 mv = \begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}v.$$
Note that the 'amplitude' $v$ is a vector in this case, so we cannot cancel it as we did in the case of a single particle. The above equation may now be rewritten
$$\begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}v = -\omega^2\frac{m}{k}v.$$
This is a trifle messy, but if we abbreviate $\lambda = -\omega^2 m/k$ for the scalar on the right, we can write it
$$\begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}v = \lambda v.$$
This equation should look familiar. It says that $v$ is an eigenvector for the matrix on the left, and that $\lambda = -\omega^2 m/k$ is the corresponding eigenvalue. But we know how to solve such problems. First we find the eigenvalues by solving the characteristic equation. For each eigenvalue, we can find the corresponding angular frequency from $\omega = \sqrt{-\lambda k/m}$. Next, for each eigenvalue, we can determine basic eigenvectors as before. In this example, the characteristic equation is
$$\det\begin{bmatrix} -2-\lambda & 1 \\ 1 & -2-\lambda \end{bmatrix} = (-2-\lambda)^2 - 1 = \lambda^2 + 4\lambda + 4 - 1 = \lambda^2 + 4\lambda + 3 = (\lambda + 1)(\lambda + 3) = 0.$$
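For a 2 × 2 symmetric matrix, the characteristic equation can be solved in closed form, and each eigenvalue then gives an angular frequency through $\omega = \sqrt{-\lambda k/m}$. The helper below is a hypothetical sketch of those two steps (its name and argument order are ours). It uses the fact that the discriminant of $\lambda^2 - (p + r)\lambda + (pr - q^2)$ equals $(p - r)^2 + 4q^2 \geq 0$, so for a symmetric matrix $\begin{bmatrix} p & q \\ q & r \end{bmatrix}$ the roots are always real.

```python
import math

def normal_mode_frequencies(M, k, m):
    """For d^2x/dt^2 = (k/m) M x with M = [[p, q], [q, r]] symmetric, return
    [(lambda, omega), ...] with the larger eigenvalue first and
    omega = sqrt(-lambda * k / m).  math.sqrt raises ValueError if some
    lambda > 0, since such an eigenvalue does not give an oscillation.
    """
    (p, q), (_, r) = M
    trace = p + r
    disc = math.hypot(p - r, 2 * q)      # sqrt(trace^2 - 4 det), never negative
    lams = [(trace + disc) / 2, (trace - disc) / 2]
    return [(lam, math.sqrt(-lam * k / m)) for lam in lams]

print(normal_mode_frequencies([[-2, 1], [1, -2]], k=1.0, m=1.0))
# [(-1.0, 1.0), (-3.0, 1.7320508075688772)]
```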


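The roots of this equation are $\lambda = -1$ ($\omega = \sqrt{k/m}$) and $\lambda = -3$ ($\omega = \sqrt{3k/m}$). For $\lambda = -1$ ($\omega = \sqrt{k/m}$), finding the eigenvectors results in reducing the matrix
$$\begin{bmatrix} -2+1 & 1 \\ 1 & -2+1 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \to \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}.$$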

Hence, the solution is $v_1 = v_2$ with $v_2$ free. A basic solution vector for the subspace of solutions is
$$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
The corresponding normal mode has the form
$$x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\cos\bigl(\sqrt{k/m}\,t\bigr).$$

Note that $x_1(t) = x_2(t)$ for all $t$, so the two particles move together in tandem with the same angular frequency $\sqrt{k/m}$. This behavior of the particles is a consequence of the fact that the components of the basic vector $v_1$ are equal. Similarly, for $\lambda = -3$ ($\omega = \sqrt{3k/m}$), we have
$$\begin{bmatrix} -2+3 & 1 \\ 1 & -2+3 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}.$$

The solution is $v_1 = -v_2$ with $v_2$ free, and a basic solution vector for the system is
$$v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
The corresponding normal mode is
$$x = \begin{bmatrix} -1 \\ 1 \end{bmatrix}\cos\bigl(\sqrt{3k/m}\,t\bigr).$$
Note that $x_1(t) = -x_2(t)$ for all $t$, so the two masses move opposite to one another with the same amplitude and angular frequency $\sqrt{3k/m}$.

Note that in the above example, we could have determined the two vectors $v_1$ and $v_2$ by inspection. As noted, the first corresponds to motion in which the particles move in tandem and the spring between them experiences no net change in length. The second corresponds to motion in which the particles move back and forth equal amounts in opposite directions but with the same frequency. This would have simplified the problem quite a lot. For, if you know an eigenvector of a matrix, it is fairly simple to find the corresponding eigenvalue, and hence the angular frequency. In fact, it is often true that careful consideration of the physical arrangement of the particles, with particular attention to any symmetries that may be present, may suggest possible normal modes with little or no calculation.
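The remark that a known eigenvector quickly gives its eigenvalue can be tried numerically for this example. In the sketch below, the eigenvalue is recovered from the guessed eigenvector $v_2 = (-1, 1)$ by the Rayleigh quotient $(v \cdot Mv)/(v \cdot v)$, and the system (3) is integrated from rest at the displacement $v_2$ to confirm that it stays in that normal mode. It assumes $k = m = 1$, uses the velocity Verlet integrator (not discussed in the text), and all function names are ours.

```python
import math

def accel(x, k, m):
    """Acceleration from (3): (k/m) [[-2, 1], [1, -2]] x."""
    x1, x2 = x
    return [k / m * (-2 * x1 + x2), k / m * (x1 - 2 * x2)]

def simulate(x0, k, m, t_end, dt=1e-4):
    """Velocity Verlet for the two-mass system, starting at rest at x0."""
    x, v = list(x0), [0.0, 0.0]
    a = accel(x, k, m)
    for _ in range(round(t_end / dt)):
        x = [xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a)]
        a_new = accel(x, k, m)
        v = [vi + 0.5 * (ai + bi) * dt for vi, ai, bi in zip(v, a, a_new)]
        a = a_new
    return x

def rayleigh_quotient(M, v):
    """Eigenvalue belonging to a known eigenvector v: (v . M v) / (v . v)."""
    Mv = [sum(a * b for a, b in zip(row, v)) for row in M]
    return sum(a * b for a, b in zip(v, Mv)) / sum(b * b for b in v)

k, m, t = 1.0, 1.0, 3.0
lam = rayleigh_quotient([[-2, 1], [1, -2]], [-1, 1])   # v2 guessed by symmetry
omega = math.sqrt(-lam * k / m)                        # sqrt(3k/m)
x = simulate([-1.0, 1.0], k, m, t)
print(lam, omega)   # -3.0 1.7320508075688772
print(x, [-math.cos(omega * t), math.cos(omega * t)])  # the two pairs agree closely
```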


Relation to the Principal Axis Theorem. As noted above, normal mode problems typically result in systems of the form
$$\frac{d^2x}{dt^2} = Ax \qquad (7)$$

where $A$ is a real symmetric matrix. (In the case that all the particles have the same mass, $A = \frac{1}{m}K$, where $K$ is a symmetric matrix of 'spring constants'. If the masses are different, the situation is a bit more complicated, but the problem may still be restated in the above form.) If $P$ is a matrix whose columns form a basis of eigenvectors for $A$, then we saw earlier that $AP = PD$, where $D$ is a diagonal matrix with the eigenvalues on the diagonal. Assume we make the change of coordinates $x = Px'$. Since $P$ is constant, it passes through the derivative, and
$$\frac{d^2(Px')}{dt^2} = APx', \qquad P\frac{d^2x'}{dt^2} = APx', \qquad \frac{d^2x'}{dt^2} = P^{-1}APx' = Dx'.$$
Since $D$ is diagonal, this last equation may be written as $n$ scalar equations
$$\frac{d^2x'_j}{dt^2} = \lambda_j x'_j, \qquad j = 1, 2, \ldots, n.$$

In the original coordinates, the motions of the particles are 'coupled' since the motion of each particle may affect the motion of the other particles. In the new coordinate system, these motions are 'decoupled'. The new coordinates are called normal coordinates. Each $x'_j$ may be thought of as the displacement of one of $n$ fictitious particles, each of which oscillates independently of the others in one of $n$ mutually perpendicular directions. The physical significance in terms of the original particles of each normal coordinate is a bit murky, but they presumably represent underlying structure of some importance.

Example 1, revisited.
$$\frac{d^2x}{dt^2} = \frac{k}{m}\begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}x.$$
A basis of eigenvectors for the coefficient matrix is, as before,
$$\left\{ v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}.$$


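If we divide the vectors by their lengths, we obtain the orthonormal basis
$$\left\{ \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix},\ \frac{1}{\sqrt 2}\begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}.$$
This in turn leads to the change of coordinates matrix
$$P = \begin{bmatrix} \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \end{bmatrix}.$$
[Figure: the new axes $x'_1$, $x'_2$, obtained by rotating the $x_1$, $x_2$ axes through $\pi/4$.]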
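To see the decoupling concretely, the sketch below builds $A = \frac{k}{m}\begin{bmatrix} -2 & 1 \\ 1 & -2 \end{bmatrix}$ for sample values of $k$ and $m$ (our choice), takes $P$ to be the rotation through $\pi/4$, which agrees with the matrix $P$ above since $\cos(\pi/4) = \sin(\pi/4) = 1/\sqrt 2$, and checks that $P^tAP$, which equals $P^{-1}AP$ because $P$ is orthogonal, is the diagonal matrix with entries $-k/m$ and $-3k/m$. The helper names are ours.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

k, m = 2.0, 0.5                             # sample values, so k/m = 4
A = [[-2 * k / m, k / m],
     [k / m, -2 * k / m]]
theta = math.pi / 4
P = [[math.cos(theta), -math.sin(theta)],   # rotation of the axes through pi/4
     [math.sin(theta),  math.cos(theta)]]

D = matmul(transpose(P), matmul(A, P))      # P^t = P^{-1} since P is orthogonal
print([[round(d, 9) for d in row] for row in D])   # approximately [[-4, 0], [0, -12]]
```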

If you look carefully, you will see that $P$ represents a rotation of the original $x_1$, $x_2$ axes through an angle $\pi/4$. However, this has nothing to do with the original geometry of the problem. $x_1$ and $x_2$ stand for displacements of two different particles along the same one-dimensional axis. The $x_1, x_2$ plane is a fictitious configuration space in which a single point represents a pair of particles. It is not absolutely clear what a rotation of axes means for this plane, but the new normal coordinates $x'_1, x'_2$ obtained thereby give us a formalism in which the normal modes appear as decoupled oscillations.

Exercises for Section 6.

1. Determine the normal modes, including frequencies and relative motions, for the system
$$\begin{aligned} m\frac{d^2x_1}{dt^2} &= k(x_2 - x_1) = -kx_1 + kx_2 \\ m\frac{d^2x_2}{dt^2} &= k(x_1 - x_2) + k(x_3 - x_2) = kx_1 - 2kx_2 + kx_3 \\ m\frac{d^2x_3}{dt^2} &= k(x_2 - x_3) = kx_2 - kx_3 \end{aligned}$$

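[Figure: three masses $m$ in a line joined by two springs of constant $k$, not attached to any wall; $x_1$, $x_2$, $x_3$ are the displacements.]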

Note that since the masses are not fixed to any wall, one possibility is that they will together move freely at constant velocity without oscillating. This is reflected in the linear algebra by one zero eigenvalue, which does not actually give an oscillatory solution. Ignore that eigenvalue and the corresponding eigenvector.

2. Determine the normal modes, including frequencies and relative motions, for the system
$$\begin{aligned} m\frac{d^2x_1}{dt^2} &= -kx_1 + k(x_2 - x_1) = -2kx_1 + kx_2 \\ m\frac{d^2x_2}{dt^2} &= k(x_1 - x_2) + k(x_3 - x_2) = kx_1 - 2kx_2 + kx_3 \\ m\frac{d^2x_3}{dt^2} &= k(x_2 - x_3) - kx_3 = kx_2 - 2kx_3 \end{aligned}$$

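[Figure: three masses $m$ in a line between two walls, joined by four springs of constant $k$; $x_1$, $x_2$, $x_3$ are the displacements.]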

3. Suppose a normal mode problem involving two particles has one normal mode in which the displacements satisfy $x_1 = 2x_2$ for all time. What relation do the displacements have for the other normal mode?

4. A system of two particles is similar to the example in the text except that one end is free. It is described by the system
$$\frac{d^2x}{dt^2} = \frac{k}{m}\begin{bmatrix} -5 & 2 \\ 2 & -2 \end{bmatrix}x \quad\text{where}\quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Find the normal modes.

5. A system of two particles is as in the example in the text except that one end is free. It is described by the system
$$\frac{d^2x}{dt^2} = \frac{k}{m}\begin{bmatrix} -4 & 2 \\ 2 & -2 \end{bmatrix}x \quad\text{where}\quad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Find the normal modes.

7. REVIEW


6. A certain molecule has three normal modes. One is degenerate and corresponds to the eigenvalue $\lambda = 0$. The eigenvector for this degenerate mode is $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. The relative motions for another normal mode satisfy $x_1 = x_3$, $x_2 = -2x_3$. What relations do the relative motions for the third normal mode satisfy?

7. Review

Exercises for Section 7.

1. The Gram-Schmidt process fails when applied to the set of vectors
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 1 \\ 5 \end{bmatrix}, \begin{bmatrix} 3 \\ 5 \\ 2 \\ 8 \end{bmatrix} \right\}$$
in $R^4$. Explain why.

2. Let $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$.
(a) Find the eigenvalues and eigenvectors of $A$.
(b) Find an orthonormal basis for $R^3$ consisting of eigenvectors for $A$.
(c) Find an orthogonal matrix $P$ such that $P^tAP$ is diagonal. What is $P^tAP$?

3. What is wrong with the following statement? If the columns of an $n \times n$ matrix $P$ are mutually perpendicular, then $P$ is orthogonal.

4. Consider the matrix $A = \begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}$, which has eigenvalues $\lambda = 6, 1$.
(a) Find the eigenvectors of $A$.
(b) Consider the conic section $2x^2 - 4xy + 5y^2 = 24$. Find an orthogonal matrix $P$ such that the coordinate change $\begin{bmatrix} x \\ y \end{bmatrix} = P\begin{bmatrix} u \\ v \end{bmatrix}$ transforms the equation of the conic into the form $\alpha u^2 + \beta v^2 = 24$ (that is, into an equation with zero cross term).
(c) Sketch the conic section of part (b). Include in the same sketch the $xy$ axes and the $uv$ axes.

5.

Use the methods introduced in this course to sketch the graph of the equation $2x^2 + y^2 + z^2 + 4yz = 6$.

6. A system of two particles with displacements $x_1$ and $x_2$ satisfies the system of differential equations
$$\begin{aligned} m\frac{d^2x_1}{dt^2} &= -3kx_1 + 2kx_2 \\ m\frac{d^2x_2}{dt^2} &= 2kx_1 - 3kx_2 \end{aligned}$$


Find the normal modes. Include the 'angular frequencies' $\omega$ and the initial displacements $(u_1, u_2)$ for each normal mode.

7. Determine whether or not each of the following matrices may be diagonalized. In each case, explain your answer. Using general principles may help you avoid difficult computations.
(a) $A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$.
(b) $B = \begin{bmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{bmatrix}$. Note: The characteristic polynomial of $B$ is $-(\lambda - 2)^2(\lambda - 4)$.
(c) $C = \begin{bmatrix} 1 & 2 & 1 & 1 & 1 \\ 2 & 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 4 & 0 \\ 1 & 0 & 4 & 0 & 5 \\ 1 & 1 & 0 & 5 & 0 \end{bmatrix}$.

CHAPTER I

BASIC NOTIONS

1.1. (a) 86.66... and 88.33.... (b) a1 = 0.6, a2 = 0.4 will work in the first case, but no such weighting can produce the second case, since Student 1 and Student 3 have to end up with the same score.

1.2. (a) x = 2, y = −1/3. (b) x = 1, y = 2, z = 2. (c) This system does not have a solution, since adding the first two equations gives x + 2y + z = 7, which contradicts the third equation. (d) Subtracting the second equation from the first yields x + y = 0, or x = −y. This system has infinitely many solutions, since x and y can be arbitrary as long as they satisfy this relation.

2.1. $x + y = \begin{bmatrix}-1\\3\\0\end{bmatrix}$, $\quad 3x - 5y + z = \begin{bmatrix}14\\1\\-25\end{bmatrix}$.

2.2. $Ax = \begin{bmatrix}-15\\-10\\4\\-10\end{bmatrix}$, $\quad Ay = \begin{bmatrix}-2\\-2\\14\\-20\end{bmatrix}$, $\quad Ax + Ay = A(x + y) = \begin{bmatrix}-17\\-12\\18\\-30\end{bmatrix}$.

2.3. A + 3B, C + 2D, and DC are not defined.
$$A + C = \begin{bmatrix}0&0&0\\0&0&0\end{bmatrix},\quad AB = \begin{bmatrix}-9&8\\-8&4\end{bmatrix},\quad BA = \begin{bmatrix}1&-5&7\\1&-1&3\\-2&-6&10\end{bmatrix},\quad CD = \begin{bmatrix}-4&-3&13\\-3&-1&-5\end{bmatrix}.$$

2.4. For the first components of these two products we have
$$a_{11} + 2a_{12} = 3,\qquad 2a_{11} + a_{12} = 6.$$
This is a system of 2 equations in 2 unknowns, and you can solve it by the usual methods of high school algebra to obtain a11 = 3, a12 = 0. A similar argument applied to the second components yields a21 = 7/3, a22 = −2/3. Hence,
$$A = \begin{bmatrix}3&0\\7/3&-2/3\end{bmatrix}.$$


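2.5. For example,
$$\begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix}\begin{bmatrix}1\\0\\0\end{bmatrix} = \begin{bmatrix}a_{11}+0+0\\a_{21}+0+0\\a_{31}+0+0\end{bmatrix} = \begin{bmatrix}a_{11}\\a_{21}\\a_{31}\end{bmatrix}.$$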

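2.6. (a) $\begin{bmatrix}2&-3\\-4&2\end{bmatrix}x = \begin{bmatrix}2\\3\end{bmatrix}$. (b) $\begin{bmatrix}2&-3\\-4&2\end{bmatrix}x = \begin{bmatrix}4\\1\end{bmatrix}$. (c) $\begin{bmatrix}1&1&1\\0&1&1\\2&0&3\end{bmatrix}x = \begin{bmatrix}0\\1\\-1\end{bmatrix}$.

2.7. (a) |u| = √10, |v| = √2, |w| = √8. (b) Each is perpendicular to the other two; just take the dot products. (c) Multiply each vector by the reciprocal of its length:
$$\frac{1}{\sqrt{10}}\,u,\qquad \frac{1}{\sqrt 2}\,v,\qquad \frac{1}{\sqrt 8}\,w.$$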

2.8. (b) Let u be the n × 1 column vector all of whose entries are 1, and let v be the corresponding 1 × n row vector. The conditions are Au = cu and vA = cv for the same c.

2.9. We need to determine the relative number of individuals in each age group after 10 years have elapsed. Notice, however, that the individuals in any given age group become (less those who die) the individuals in the next age group, and that new individuals appear in the 0 to 9 age group.

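$$A = \begin{bmatrix}
0 & .01 & .04 & .03 & .01 & .001 & 0 & 0 & 0 & 0\\
.99 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & .99 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & .99 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & .99 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & .98 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & .97 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & .96 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & .90 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & .70 & 0
\end{bmatrix}$$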

Note that this model is not meant to be realistic.

3.1. (a) Every power of I is just I. (b) $J^2 = I$, the 2 × 2 identity matrix.


3.2. There are lots of answers. Here is one:
$$\begin{bmatrix}1&1\\1&1\end{bmatrix}\begin{bmatrix}1&-1\\-1&1\end{bmatrix} = \begin{bmatrix}0&0\\0&0\end{bmatrix}.$$

3.3. By the distributive law, A(ax + by) = A(ax) + A(by). However, one of the rules says we may move scalars around at will in a matrix product, so the above becomes a(Ax) + b(Ay).

3.4. This is an exercise in the proper use of subscripts. The i, r entry of (AB)C = DC is
$$\sum_{k=1}^{p} d_{ik}c_{kr} = \sum_{k=1}^{p}\sum_{j=1}^{n} a_{ij}b_{jk}c_{kr}.$$
Similarly, the i, r entry of A(BC) = AE is
$$\sum_{j=1}^{n} a_{ij}e_{jr} = \sum_{j=1}^{n}\sum_{k=1}^{p} a_{ij}b_{jk}c_{kr}.$$
These are the same, since a finite double sum may be added up in either order.

4.1. (a) x1 = −3/2, x2 = 1/2, x3 = 3/2. (b) No solutions. (c) x1 = −27, x2 = 9, x3 = 27, x4 = 27. In vector form,
$$x = \begin{bmatrix}-27\\9\\27\\27\end{bmatrix}.$$
4.2. · X= 4.3. (a) Row reduction yields

·

1 0

¸ −1 . 2

2 −3

2 | 0 |

1 1

¸ 0 . −1

Since the last row consists of zeroes to the left of the separator but not to the right, the system is inconsistent and does not have a solution. (b) The solution is
$$X = \begin{bmatrix}3/2&0\\1/2&1\\-1/2&0\end{bmatrix}.$$


4.4. The effect is to add a times the first column to the second column. The general rule is that if you multiply a matrix on the right by the matrix with an a in the i, j-position (i ≠ j) and ones on the diagonal, the effect is to add a times the ith column to the jth column.

4.5.

 13 15 −1 0  8 9

11  −2 7 5.1. 

 0 1 −1/2 (a)  1 −3 5/2  , −1 2 −3/2



−5 −1 (b)  1 0 2 1 

(c) not invertible,

−4 −2 1  2 (d)  6 2 −1 0

−3 1 4 0

 7 −1  −3  5 −2   −7 1

5.2.

5.3. $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$.

5.4. The basic argument does work, except that you should start with the second column instead. If that consists of zeroes, go on to the third column, etc. The matrix obtained at the end of the Gauss–Jordan reduction will begin with as many columns consisting only of zeroes as the original matrix did. For example,
$$\begin{bmatrix}0&1&3&0\\0&1&3&4\end{bmatrix} \to \cdots \to \begin{bmatrix}0&1&3&0\\0&0&0&1\end{bmatrix}.$$

5.5. (b) The coefficient matrix is almost singular. Replacing 1.0001 by 1.0000 would make it singular.

5.6. The answer in part (a) is way off, but the answer in part (b) is pretty good. This exercise shows you some of the numerical problems which can arise if the entries in the coefficient matrix differ greatly in size. One way to avoid such problems is always to use the largest pivot (in absolute value) available in a given column. This is called partial pivoting.

5.7. The LU decomposition of the coefficient matrix is
$$\begin{bmatrix}1&2&1\\1&4&1\\2&3&1\end{bmatrix} = \begin{bmatrix}1&0&0\\1&1&0\\2&-1/2&1\end{bmatrix}\begin{bmatrix}1&2&1\\0&2&0\\0&0&-1\end{bmatrix}.$$
The solution to the system is
$$x = \begin{bmatrix}1/2\\-1/2\\3/2\end{bmatrix}.$$
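The factorization in 5.7 and the partial pivoting of 5.6 can both be tried out with a short standard-library Python sketch. `lu_doolittle` factors without row exchanges, as in 5.7; `solve_partial_pivot` does floating-point elimination that always swaps in the largest available pivot. The function names are hypothetical, and so is the right-hand side b = (1, 0, 1): it is not shown in the solution above, so we use the vector obtained by multiplying the coefficient matrix by the stated solution.

```python
from fractions import Fraction as F


def lu_doolittle(A):
    """Factor A = L U with L unit lower triangular; no row exchanges.
    Assumes every pivot encountered is non-zero."""
    n = len(A)
    U = [[F(x) for x in row] for row in A]
    L = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: this sketch does not swap rows")
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]  # multiplier, stored in L
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U


def solve_lu(L, U, b):
    n = len(L)
    y = []  # forward substitution: L y = b
    for i in range(n):
        y.append(F(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
    x = [F(0)] * n  # back substitution: U x = y
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


def solve_partial_pivot(A, b):
    """Gaussian elimination in floating point with partial pivoting."""
    n = len(A)
    M = [[float(a) for a in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # largest pivot in column k
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


A = [[1, 2, 1], [1, 4, 1], [2, 3, 1]]
L, U = lu_doolittle(A)
b = [1, 0, 1]  # assumption: A times the stated solution (b itself is not shown above)
x = solve_lu(L, U, b)
x_float = solve_partial_pivot(A, b)
```

Exact fractions keep the LU check free of rounding. The pivoting solver is the variant that matters when, as in 5.6, the entries differ greatly in size.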

6.1.    · ¸ 1 −3/5 3/5     , (a) x = 2/5 + x3 −1/2 , (b) x = 1/5 1 0       2 −3 2 1  0  0 (c) x =   + x2   + x4  . 0 1 2 0 1 0 6.2. Only the Gaussian part of the reduction was done. The Jordan part of the reduction was not done. In particular, there is a pivot in the 2, 2 position with a non-zero entry above it. As a result, the separation into bound and free variables is faulty. The correct solution to this problem is x1 = 1, x2 = −x3 with x3 free. 

6.3. (a) $x = x_4\begin{bmatrix}2\\0\\-4\\1\end{bmatrix}$, (b) $x = x_3\begin{bmatrix}-10\\2\\1\\0\\0\end{bmatrix} + x_4\begin{bmatrix}3\\-1\\0\\1\\0\end{bmatrix}$.

6.4. We have u · v = −1, |u| = √2, and |v| = √3. Hence, cos θ = −1/√6, so
$$\theta = \cos^{-1}\left(\frac{-1}{\sqrt 6}\right) \approx 1.99 \text{ radians, or about } 114 \text{ degrees}.$$

6.5. (a) has rank 3 and (b) has rank 3.

6.6. The ranks are 2, 1, and 1.
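Only u · v and the two lengths enter the computation in 6.4, so it is easy to check with Python's `math` module. A minimal sketch (the helper name `angle_between` is ours):

```python
import math


def angle_between(dot, norm_u, norm_v):
    """Angle θ with cos θ = (u · v) / (|u| |v|), in radians."""
    return math.acos(dot / (norm_u * norm_v))


theta = angle_between(-1, math.sqrt(2), math.sqrt(3))
print(round(theta, 2), round(math.degrees(theta)))  # 1.99 114
```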

6.7. (a) is always true because the rank can’t be larger than the number of rows. Similarly, (b) and (d) are never true. (c) and (e) are each sometimes true. (f) is true just for the zero matrix.

6.8. In case (a), after reduction, there won’t be a row of zeroes to the left of the ‘bar’ in the augmented matrix. Hence, it won’t matter what is to the right of the ‘bar’. In case (b), there will be at least one row of zeroes to the left of the ‘bar’, so we can always arrange for a contradictory system by making sure that there is something non-zero in such a row to the right of the ‘bar’.

6.9. The rank of AB is always less than or equal to the rank of A.

6.10. A right pseudo-inverse is
$$\begin{bmatrix}-1&1\\2&-1\\0&0\end{bmatrix}.$$
There are no left pseudo-inverses for A. For if B were a left pseudo-inverse of A, then A would be a right pseudo-inverse of B, and B has 3 rows and 2 columns. According to the text, a matrix with more rows than columns never has a right pseudo-inverse.


6.11. Suppose m < n and A has a left pseudo-inverse A′ such that A′A = I. It would follow that A′ is an n × m matrix with n > m (more rows than columns) and that A′ has a right pseudo-inverse, namely A. But we already know that is impossible.

7.1. The augmented matrix has one row [ 1 −2 1 | 4 ]. It is already in Gauss–Jordan reduced form, with the first entry being the single pivot. The general solution is x1 = 4 + 2x2 − x3 with x2, x3 free. The general solution vector is
$$x = \begin{bmatrix}4\\0\\0\end{bmatrix} + x_2\begin{bmatrix}2\\1\\0\end{bmatrix} + x_3\begin{bmatrix}-1\\0\\1\end{bmatrix}.$$
The second two terms form a general solution of the homogeneous equation.

7.2. (a) is a subspace, since it is a plane in R3 through the origin. (b) is not a subspace, since it is a plane in R3 not through the origin. One can also see that it doesn’t satisfy the defining condition that it be closed under forming linear combinations. Suppose, for example, that u and v are vectors whose components satisfy this equation, and s and t are scalars. Then
$$u_1 - u_2 + 4u_3 = 3,\qquad v_1 - v_2 + 4v_3 = 3.$$
Multiply the first equation by s and the second by t and add. You get
$$(su_1 + tv_1) - (su_2 + tv_2) + 4(su_3 + tv_3) = 3(s + t).$$
This is the equation satisfied by the components of su + tv. Only in the special circumstance that s + t = 1 will this again satisfy the same condition. Hence, most linear combinations will not end up in the same subset. A much shorter but less instructive argument is to notice that the components of the zero vector 0 don’t satisfy the condition. (c) is not a subspace because it is a curved surface in R3. Also, with some effort, you can see that it is not closed under forming linear combinations. Probably the easiest thing to notice is that the components of the zero vector don’t satisfy the condition. (d) is not a subspace because the components give a parametric representation for a line in R3 which doesn’t pass through the origin.
If it did, from the first component you could conclude that t = −1/2, but this would give non-zero values for the second and third components. Here is a longer argument which shows that if you add two such vectors, you get a vector not of the same form:
$$\begin{bmatrix}1+2t_1\\-3t_1\\2t_1\end{bmatrix} + \begin{bmatrix}1+2t_2\\-3t_2\\2t_2\end{bmatrix} = \begin{bmatrix}2+2(t_1+t_2)\\-3(t_1+t_2)\\2(t_1+t_2)\end{bmatrix}.$$

The second and third components have the right form with t = t1 + t2 , but the first component does not have the right form because of the ‘2’.


(e) is a subspace. In fact, it is the plane spanned by
$$v_1 = \begin{bmatrix}1\\2\\1\end{bmatrix},\qquad v_2 = \begin{bmatrix}2\\-3\\2\end{bmatrix}.$$
This is a special case of a subspace spanned by a finite set of vectors. Here is a detailed proof showing that the set satisfies the required condition:
$$\begin{bmatrix}s_1+2t_1\\2s_1-3t_1\\s_1+2t_1\end{bmatrix} + \begin{bmatrix}s_2+2t_2\\2s_2-3t_2\\s_2+2t_2\end{bmatrix} = \begin{bmatrix}s_1+s_2+2(t_1+t_2)\\2(s_1+s_2)-3(t_1+t_2)\\s_1+s_2+2(t_1+t_2)\end{bmatrix},\qquad c\begin{bmatrix}s+2t\\2s-3t\\s+2t\end{bmatrix} = \begin{bmatrix}cs+2(ct)\\2(cs)-3(ct)\\cs+2(ct)\end{bmatrix}.$$

What this shows is that any sum is of the same form and also any scalar multiple is of the same form. Since an arbitrary linear combination can always be obtained by combining addition and scalar multiplication in some order, every linear combination is of the same form as well. Note that in cases (b), (c), (d), the simplest way to see that the set is not a subspace is to notice that the zero vector is not in the set.

7.3. No. Pick a non-zero vector v1 in L1 and a non-zero vector v2 in L2. If s and t are scalars, the only possible way in which sv1 + tv2 can point along one or the other of the lines is if s or t is zero. Hence, it is not true that every linear combination of vectors in the set S is again in the set S.

7.4. It is a plane through the origin. Hence it has an equation of the form a1x1 + a2x2 + a3x3 = 0. The given data show that
$$a_1 + a_2 = 0,\qquad a_2 + 3a_3 = 0.$$
We can treat these as homogeneous equations in the unknowns a1, a2, a3. The general solution is a1 = 3a3, a2 = −3a3 with a3 free. Taking a3 = 1 yields the specific solution a1 = 3, a2 = −3, a3 = 1, or the equation 3x1 − 3x2 + x3 = 0 for the desired plane. Any other non-zero choice of a3 will yield an equation with coefficients proportional to these, hence with the same locus. Another way to find the equation is to use the fact that u1 × u2 is perpendicular to the desired plane. This cross product ends up being the vector with components ⟨3, −3, 1⟩.
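The cross-product route in 7.4 can be checked with a few lines of Python. The vectors u1 = (1, 1, 0) and u2 = (0, 1, 3) are an inference, not quoted from the exercise: they are the vectors whose dot products with (a1, a2, a3) give exactly the two equations a1 + a2 = 0 and a2 + 3a3 = 0 above.

```python
def cross(u, v):
    """Vector (cross) product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Assumed spanning vectors, read off from a1 + a2 = 0 and a2 + 3a3 = 0.
u1 = (1, 1, 0)
u2 = (0, 1, 3)
n = cross(u1, u2)
print(n)                       # (3, -3, 1)
print(dot(n, u1), dot(n, u2))  # 0 0
```

The normal (3, −3, 1) gives the coefficients of the plane equation 3x1 − 3x2 + x3 = 0 directly.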


7.5. (a) The third vector is the sum of the other two. The subspace is the plane through the origin spanned by the first two vectors. In fact, it is the plane through the origin spanned by any two of the three vectors. A normal vector to this plane may be obtained by forming the vector product of any two of the three vectors. (b) This is actually the same plane as in part (a).

7.6. (a) A spanning set is given by
$$\left\{ v_1 = \begin{bmatrix}2\\1\\0\end{bmatrix},\ v_2 = \begin{bmatrix}-5\\0\\1\end{bmatrix} \right\}.$$
Take dot products to check perpendicularity. (b) A spanning set is given by
$$\left\{ \begin{bmatrix}-1\\1\end{bmatrix} \right\}.$$

8.1. (a) No. v1 = v2 + v3. See also Section 9, which provides a more systematic way to answer such questions. (b) Yes. Look at the pattern of ones and zeroes. It is clear that none of these vectors can be expressed as a linear combination of the others.

8.2.
$$\left\{ \begin{bmatrix}-1/3\\2/3\\1\\0\end{bmatrix},\ \begin{bmatrix}1/3\\-2/3\\0\\1\end{bmatrix} \right\}$$

8.3. (a) One. (b) Two.

8.4. No. 0 can always be expressed as a linear combination of other vectors simply by taking the coefficients to be zero. One has to quibble about the set which has only one element, namely 0. Then there aren’t any other vectors for it to be a linear combination of. However, in this case, we have avoided the issue by defining the set to be linearly dependent. (Alternatively, one could ask if the zero vector is a linear combination of the other vectors in the set, i.e., the empty set. However, by convention, any empty sum is defined to be zero, so the criterion also works in this case.)

8.5. Suppose first that the set is linearly independent. If there were such a relation without all the coefficients c1, c2, c3 zero, then one of the coefficients, say c2, would not be zero. Then we could divide by that coefficient and solve for v2 to get
$$v_2 = -\frac{c_1}{c_2}v_1 - \frac{c_3}{c_2}v_3,$$
i.e., v2 would be a linear combination of v1 and v3. A similar argument would apply if c1 or c3 were non-zero. That contradicts the assumption of linear independence.


Suppose conversely that there is no such relation. Suppose we could express v1 in terms of the other vectors, v1 = c2v2 + c3v3. This could be rewritten −v1 + c2v2 + c3v3 = 0, which would be a relation of the form c1v1 + c2v2 + c3v3 = 0 with c1 = −1 ≠ 0. By assumption there are no such relations. A similar argument shows that neither of the other vectors could be expressed as a linear combination of the others. A similar argument works for any number of vectors v1, v2, ..., vn.

8.6. (a)
$$v_1 \times v_2 \cdot v_3 = \begin{bmatrix}1\\-1\\1\end{bmatrix}\cdot\begin{bmatrix}1\\0\\1\end{bmatrix} = 2 \neq 0,$$
so v3 is not perpendicular to v1 × v2. Similarly, calculate v1 × v3 · v2 and v2 × v3 · v1. (b) The subspace spanned by these vectors has dimension 3. Hence, it must be all of R3. (c) Solve the system
$$v_1s_1 + v_2s_2 + v_3s_3 = \begin{bmatrix}1&0&1\\1&1&0\\0&1&1\end{bmatrix}\begin{bmatrix}s_1\\s_2\\s_3\end{bmatrix} = \begin{bmatrix}1\\1\\2\end{bmatrix}$$
for s1, s2, s3. The solution is s1 = 0, s2 = 1, s3 = 1.

8.7. It is clear that the vectors form a linearly independent pair, since neither is a multiple of the other. To find the coordinates of e1 with respect to this new basis, solve
$$\begin{bmatrix}1&1\\-1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}.$$
The solution is x1 = x2 = 1/2. Hence, the coordinates are given by $\begin{bmatrix}1/2\\1/2\end{bmatrix}$. Similarly, solving
$$\begin{bmatrix}1&1\\-1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}$$
yields the coordinates $\begin{bmatrix}-1/2\\1/2\end{bmatrix}$ for e2. One could have found both sets of coordinates simultaneously by solving
$$\begin{bmatrix}1&1\\-1&1\end{bmatrix}X = \begin{bmatrix}1&0\\0&1\end{bmatrix},$$
which amounts to finding the inverse of the matrix [ u1 u2 ].
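The computations in 8.6 and 8.7 reduce to a triple product and a 2 × 2 inverse, so a small exact-arithmetic sketch can confirm them. The vectors v1, v2, v3 are read off as the columns of the coefficient matrix in 8.6(c). The helper names are hypothetical.

```python
from fractions import Fraction as F


def cross(u, v):
    """Vector (cross) product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] as (1/(ad - bc)) [[d, -b], [-c, a]], exactly."""
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# 8.6: v1, v2, v3 are the columns of the coefficient matrix in part (c)
v1, v2, v3 = (1, 1, 0), (0, 1, 1), (1, 0, 1)
triple = dot(cross(v1, v2), v3)  # non-zero, so v1, v2, v3 are independent
s = (0, 1, 1)                    # the solution s1, s2, s3 found in part (c)
combo = tuple(s[0] * a + s[1] * b + s[2] * c for a, b, c in zip(v1, v2, v3))

# 8.7: column k of the inverse holds the coordinates of e_k in the basis {u1, u2}
Pinv = inverse_2x2([[1, 1], [-1, 1]])
```

Reading `Pinv` column by column gives (1/2, 1/2) for e1 and (−1/2, 1/2) for e2, matching the solution.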


8.8. (a) The set is linearly independent, since neither vector is a multiple of the other. Hence, it is a basis for W. (b) We can answer both questions by trying to solve
$$v_1c_1 + v_2c_2 = \begin{bmatrix}1&0\\1&1\\0&1\end{bmatrix}\begin{bmatrix}c_1\\c_2\end{bmatrix} = \begin{bmatrix}1\\-1\\-2\end{bmatrix}$$
for c1, c2. If there is no solution, the vector is not in the subspace spanned by {v1, v2}. If there is a solution, it provides the coordinates. In this case, there is the unique solution c1 = 1, c2 = −2.

8.9. (a) You can see that you can’t have a non-trivial linear relation among these vectors because of the pattern of zeroes and ones: each has a one where the others are zero. (b) This set of vectors does not span R∞. For example, the ‘vector’ (1, 1, 1, ..., 1, ...) with all entries 1 cannot be written as a linear combination of finitely many of the ei. Generally, the only vectors you can get as such finite linear combinations are the ones which have all components zero past a certain point.

9.1. Gauss–Jordan reduction of the matrix with these columns yields
$$\begin{bmatrix}1&0&3/2&-1/2\\0&1&1/2&1/2\\0&0&0&0\\0&0&0&0\end{bmatrix},$$
so the first two vectors in the set form a basis for the subspace spanned by the set.

9.2. (a) Gauss–Jordan reduction yields
$$\begin{bmatrix}1&0&2&1&1\\0&1&5&1&2\\0&0&0&0&0\end{bmatrix},$$
so
$$\left\{ \begin{bmatrix}1\\-1\\1\end{bmatrix},\ \begin{bmatrix}0\\1\\1\end{bmatrix} \right\}$$
is a basis. (b) A basis for the row space is {[ 1 0 2 1 1 ], [ 0 1 5 1 2 ]}. Note that neither of these has any obvious connection to the solution space, which has basis
$$\left\{ \begin{bmatrix}-2\\-5\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}-1\\-1\\0\\1\\0\end{bmatrix},\ \begin{bmatrix}-1\\-2\\0\\0\\1\end{bmatrix} \right\}.$$


9.3. Reduce
$$\begin{bmatrix}1&1&1&0&0\\-2&2&0&1&0\\-1&1&0&0&1\end{bmatrix}$$
to get
$$\begin{bmatrix}1&0&1/2&0&-1/2\\0&1&1/2&0&1/2\\0&0&0&1&-2\end{bmatrix}.$$

Picking out the first, second, and fourth columns shows that {v1, v2, e2} is a basis for R3 containing v1 and v2.

9.5. (a) Gaussian reduction shows that A has rank 2, with pivots in the first and third columns. Hence,
$$\left\{ \begin{bmatrix}1\\1\\3\end{bmatrix},\ \begin{bmatrix}2\\3\\7\end{bmatrix} \right\}$$
is a basis for its column space. (b) Solve the system
$$\begin{bmatrix}1&2&2&3\\1&2&3&4\\3&6&7&10\end{bmatrix}x = \begin{bmatrix}0\\1\\1\end{bmatrix}.$$
It does have solutions, so the vector on the right is in the column space.

9.6. (a) Every such system is solvable. For, the column space of A must be all of R7, since it is a subspace of R7 and has dimension 7. (b) There are definitely such systems which don’t have solutions. For, the dimension of the column space is the rank of A, which is at most 7 in any case. Hence, the column space of A must be a proper subspace of R12.

10.2. (a) The rank of A turns out to be 2, so the dimension of its nullspace is 5 − 2 = 3. (b) The dimension of the column space is the rank, which is 2. (c) These add up to the number of columns of A, which is 5.

10.3. The formula is correct if the order of the terms on the right is reversed. Since matrix multiplication is not generally commutative, we can’t generally conclude that $A^{-1}B^{-1} = B^{-1}A^{-1}$.

10.4. (a) will be true if the rank of A is 15. Otherwise, there will be vectors b in R15 for which there is no solution. (b) is always true, since there are more unknowns than equations. In more detail, the rank is at most 15, and the number of free variables is 23 less the rank, so there are at least 23 − 15 = 8 free variables, which may assume any possible values.
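The Gauss–Jordan reductions used in 9.1 through 10.7 can be reproduced with a short exact-arithmetic routine. Here is a minimal sketch (the name `rref` is ours) that returns the reduced matrix and its pivot columns. It is applied to the matrix of 9.3 and to the augmented matrix of 9.5(b), where a pivot in the last column would signal an inconsistent system.

```python
from fractions import Fraction as F


def rref(rows):
    """Gauss-Jordan reduction to reduced row echelon form, in exact fractions.
    Returns the reduced matrix and the list of pivot column indices."""
    M = [[F(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(n):
        # find a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]  # make the pivot 1
        for i in range(m):              # clear the rest of column c
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots


# 9.3: columns v1, v2, e1, e2, e3
R, piv = rref([[1, 1, 1, 0, 0],
               [-2, 2, 0, 1, 0],
               [-1, 1, 0, 0, 1]])
print(piv)  # [0, 1, 3]  -> columns v1, v2, e2

# 9.5(b): augmented matrix [A | b]
R2, piv2 = rref([[1, 2, 2, 3, 0],
                 [1, 2, 3, 4, 1],
                 [3, 6, 7, 10, 1]])
consistent = 4 not in piv2  # a pivot in the last column would mean no solution
```

Exact fractions avoid the rounding trouble discussed in 5.6, at the cost of speed on large matrices.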

10.5. (a) The Gauss–Jordan reduction is
$$\begin{bmatrix}1&0&10&-3&0\\0&1&-2&1&0\\0&0&0&0&1\end{bmatrix}.$$
The rank is 3, and the free variables are x3 and x4. A basis for the nullspace is
$$\left\{ \begin{bmatrix}-10\\2\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}3\\-1\\0\\1\\0\end{bmatrix} \right\}.$$
Whenever you do a problem of this kind, make sure you go all the way to Jordan reduced form! Also, make sure the number of free variables is the total number of unknowns less the rank. (b) The dimension of the null space is the number of free variables, which is 2. The dimension of the column space of A is the rank of A, which in this case is 3. (c) In this case, the column space is a subspace of R3 with dimension 3, so it is all of R3. Hence, the system Ax = b has a solution for any possible b. If the rank of A had been smaller than the number of rows of A (usually called m), you would have had to try to solve Ax = b for the given b to answer the question.

10.6.
$$A^{-1} = \begin{bmatrix}4&-3/2&0\\1&-1/2&2\\-1&1/2&0\end{bmatrix}.$$
You should be able to check your answer yourself. Just multiply it by A and see if you get I.

10.7. (a) Reduction yields the matrix
$$\begin{bmatrix}1&2&0&1\\0&0&1&1\\0&0&0&0\end{bmatrix}.$$
x2 and x4 are the free variables. A basis for the solution space is
$$\left\{ \begin{bmatrix}-2\\1\\0\\0\end{bmatrix},\ \begin{bmatrix}-1\\0\\-1\\1\end{bmatrix} \right\}.$$
(b) Pick out the columns of the original matrix for which we have pivots in the reduced matrix. A basis is
$$\left\{ \begin{bmatrix}1\\1\\3\end{bmatrix},\ \begin{bmatrix}2\\3\\7\end{bmatrix} \right\}.$$
Of course, any other linearly independent pair of columns would also work. (c) The columns do not form a linearly independent set, since the matrix does not have rank 4. (d) Solve the system
$$\begin{bmatrix}1&2&2&3\\1&2&3&4\\3&6&7&10\end{bmatrix}x = \begin{bmatrix}2\\3\\4\end{bmatrix}.$$
You should discover that it doesn’t have a solution. The last row of the reduced augmented matrix is [ 0 0 0 0 | −3 ]. Hence, the vector is not in the column space.


10.8. (a) is a vector subspace because it is a plane through the origin. (b) is not because it is a curved surface. Also, any vector subspace contains the element 0, but this does not lie on the sphere. 10.9. (a) The rank is 2. (b) The dimension of the solution space is the number of variables less the rank, which in this case is 5 − 2 = 3. 10.10. (a) Yes, the set is linearly independent. The easiest way to see this is as follows. Form the 4 × 4 matrix with these vectors as columns, but in the opposite order to that in which they are given. That matrix is upper triangular with non-zero entries on the diagonal, so its rank is 4. (b) Yes, it is a basis for R4 . The subspace spanned by this set has a basis with 4 elements, so its dimension is 4. The only 4 dimensional subspace of R4 is the whole space itself.


CHAPTER II

DETERMINANTS AND EIGENVALUES

1.1. The first two components of u × v are zero and the third component is the given determinant, which might be negative.

1.2. (a) (i) 1, (ii) −1, (iii) 1. (b) In case (ii), the orientation is reversed, so the sign changes. In case (iii), the two parallelograms can be viewed as having the same base and same height (one of the sides is shifted), so they have the same area.

1.3. (b) (v × u) · w = (−u × v) · w, so the sign changes. A similar argument shows the sign changes if the second and third columns are interchanged. The last determinant can be obtained by the two switches
$$[\,u\ v\ w\,] \to [\,u\ w\ v\,] \to [\,w\ u\ v\,],$$
each of which changes the sign, so the net result is no change. (c) (u + v) × v = u × v + v × v = u × v, so ((u + v) × v) · w = (u × v) · w. (d) The determinant is multiplied by −3.

1.4. We have
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}\begin{bmatrix}e\\f\end{bmatrix} = \frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix}\begin{bmatrix}e\\f\end{bmatrix} = \frac{1}{ad-bc}\begin{bmatrix}de-bf\\-ce+af\end{bmatrix}.$$
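The closed form in 1.4 translates directly into code. Below is a minimal sketch using exact fractions. The function name `solve_2x2` and the sample system 2x − 3y = 2, −4x + 2y = 3 are only for illustration.

```python
from fractions import Fraction as F


def solve_2x2(a, b, c, d, e, f):
    """Solve  a x + b y = e,  c x + d y = f  with the formula from 1.4.

    Inputs are integers (or Fractions); requires ad - bc != 0.
    """
    det = a * d - b * c
    if det == 0:
        raise ValueError("coefficient matrix is singular")
    x = F(d * e - b * f) / det
    y = F(-c * e + a * f) / det
    return x, y


x, y = solve_2x2(2, -3, -4, 2, 2, 3)  # sample system: 2x - 3y = 2, -4x + 2y = 3
print(x, y)  # -13/8 -7/4
```

The two numerators are exactly the determinants Cramer's rule would put over ad − bc.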

2.1. (a) −16. (b) 40. (c) 3. (d) 0.

2.2. This is a lot of algebra, which I leave to you. If you have verified rules (i) and (ii) only for the first row, and you have also verified rule (iii), then you can verify rules (i) and (ii) for the second row as follows. Interchange the two rows. The second row is now the first row, but the sign has changed. Use rules (i) and (ii) on the new first row, then exchange rows again. The sign changes back and the rules are verified for the second row.

2.3. I leave the algebra to you. The corresponding rule for the second row follows by exchanging rows, applying the new rule to both sides of the equation, and then exchanging back.


2.4. The matrix is singular if and only if its determinant is zero:
$$\det\begin{bmatrix}1&z\\z&1\end{bmatrix} = 1 - z^2 = 0$$
yields z = ±1.

2.5. $\det A = -\lambda^3 + 2\lambda = 0$ yields $\lambda = 0, \pm\sqrt 2$.

2.6. The relevant point is that the determinant of any matrix which has a column consisting of zeroes is zero. For example, in the present case, if we write out the formula for the determinant of the above 5 × 5 matrix, each term will involve the determinant of a 4 × 4 matrix with a column of zeroes. Similarly, in the formula for the determinant of such a 4 × 4 matrix, each term will involve the determinant of a 3 × 3 matrix with a column of zeroes. Continuing this way, we eventually get to determinants of 2 × 2 matrices, each with a column of zeroes. However, it is easy to see that the determinant of such a 2 × 2 matrix is zero. Note that we will see later that a formula like that used to define the determinant works for any column or indeed any row. Hence, if a column (or row) consists of zeroes, the coefficients in that formula would all be zero, and the net result would be zero. However, it would be premature to use such a formula at this point.

2.7. $\det(cA) = c^n \det A$. For, multiplying one row of A by c multiplies its determinant by c, and in cA, all n rows are multiplied by c.

2.8. By a previous exercise, we have $\det(-A) = (-1)^6 \det A = \det A$. The only way we could have det A = −det A is if det A = 0, in which case A would be singular.

2.9. Almost anything you come up with, except for a few special cases, should work. For example, suppose det A ≠ 0 and B = −A. Then, since A is 2 × 2, it follows that $\det(-A) = (-1)^2\det A = \det A$. Hence, det A + det B = 2 det A ≠ 0 = det(A + B), since A + B is the zero matrix.

2.10. (a) The recursive formula for n = 7 uses seven 6 × 6 subdeterminants. Each of these requires N(6) = 876 multiplications. Since there are 7 of these, this requires 7 · 876 = 6132 multiplications. However, in addition, once these 7 subdeterminants have been calculated, each must be multiplied by the appropriate entry, and this adds 7 additional multiplications. Hence, the total is N(7) = 6132 + 7 = 6139.
(b) The recursive rule is N(n) = nN(n − 1) + n.

3.1. The first matrix has determinant 31, and the second matrix has determinant 1. The product matrix is
$$\begin{bmatrix}6&5&-3\\7&9&2\\-4&-6&-1\end{bmatrix},$$
which has determinant 31.

3.2. If A and B both have rank n, they are both non-singular. Hence, det A and det B are non-zero. Hence, by the product rule, det(AB) = det A det B ≠ 0. Hence, AB is also non-singular and has rank n.
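A small recursive function is enough to confirm the value 31 in 3.1. This is a sketch of cofactor expansion along the first row, the same recursive scheme whose cost is counted in 2.10, so it is only sensible for small matrices. The name `det` is ours.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 0 and column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total


P = [[6, 5, -3], [7, 9, 2], [-4, -6, -1]]
print(det(P))  # 31
```

The same function also illustrates 3.3: for a triangular matrix it returns the product of the diagonal entries.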


3.3. The determinant of any lower triangular matrix is the product of its diagonal entries. One way to see this is to use the transpose rule.

3.4. (a) If A is invertible, then $AA^{-1} = I$. Hence, $\det(AA^{-1}) = 1$. Using the product rule yields $\det A \det(A^{-1}) = 1$. Hence, det A ≠ 0, and dividing both sides by it yields
$$\det(A^{-1}) = \frac{1}{\det A}.$$
(b) $\det(PAP^{-1}) = \det P \det A \det(P^{-1})$. But $\det(P^{-1}) = 1/\det P$, and det P and 1/det P cancel, so the net result is det A, as claimed.

3.5. Cramer’s rule has det A in the denominator. Hence, the formula is meaningless if A is singular, since in that case det A = 0.

3.6. The determinant of the coefficient matrix is 1. The solution by Cramer’s rule or by Gauss–Jordan reduction is x1 = −2, x2 = 1, x3 = 4, x4 = 2.

4.1. In each case we give the eigenvalues and, for each eigenvalue, a basis consisting of one or more basic eigenvectors for that eigenvalue. (a) λ = 2,

· ¸ 1 1

λ = 3,

(b)

  6 λ = 2,  1  2

(c)

    1 0 λ = 2, v1 =  0  , v2 =  −1  0 1

(d)

4.2. Compute

  2 λ = 1,  1  1

· ¸ 3 . 2 

 0 λ = −1,  −1  . 1 

 −1 λ = 1,  −2  . 1



 0 λ = 3,  −1  . 1 

$$Av = \begin{bmatrix}-2&1&0\\1&-2&1\\0&1&-2\end{bmatrix}\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}-1\\0\\-1\end{bmatrix}.$$
You see that Av is not a scalar multiple of v, so by definition, v is not an eigenvector for A. Note that trying to find the eigenvalues and eigenvectors of A would be much more time consuming. In this particular case, the eigenvalues turn out to be λ = −2, −2 + √2, −2 − √2, and the radicals make it a bit complicated to find the eigenvectors.
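Both claims in 4.2 can be checked numerically with the sketch below. It tests whether Av is a multiple of v, and it evaluates det(A − λI) at the three quoted eigenvalues. The helper names are hypothetical, and the 1e-9 tolerance is an arbitrary choice for floating-point comparison.

```python
import math

A = [[-2, 1, 0], [1, -2, 1], [0, 1, -2]]
v = [1, 1, 1]


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


def is_scalar_multiple(w, v):
    """True if w = c v for some scalar c (v assumed non-zero)."""
    k = next(i for i, vi in enumerate(v) if vi != 0)
    c = w[k] / v[k]
    return all(abs(wi - c * vi) < 1e-12 for wi, vi in zip(w, v))


def char_poly(M, lam):
    """det(M - lam I) for a 3 x 3 matrix, expanded along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = [
        [M[r][s] - (lam if r == s else 0) for s in range(3)] for r in range(3)]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


Av = matvec(A, v)
print(Av, is_scalar_multiple(Av, v))  # [-1, 0, -1] False
eigs = (-2, -2 + math.sqrt(2), -2 - math.sqrt(2))
print([abs(char_poly(A, lam)) < 1e-9 for lam in eigs])  # [True, True, True]
```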


4.3. Since an eigenvector can’t be zero, there had to be a mistake somewhere in the calculation. Either the characteristic equation was not solved correctly to find the eigenvalue λ, or the solution of the system (A − λI)v = 0 was not done properly to find the eigenspace.

4.4. The eigenvalues of A are the roots of the equation det(A − λI) = 0. λ = 0 is a root of this equation if and only if det(A − 0I) = 0, i.e., det A = 0. Hence, A would have to be singular.

4.5. Multiply Av = λv by A. We get $A^2v = A(Av) = A(\lambda v) = \lambda Av = \lambda(\lambda v) = \lambda^2v$. In general, $A^nv = \lambda^nv$.

4.6. Av = λv implies that $v = A^{-1}Av = A^{-1}(\lambda v) = \lambda A^{-1}v$. Since A is non-singular, λ ≠ 0 by a problem above, so we may divide through by λ to obtain $\lambda^{-1}v = A^{-1}v$. This just says $\lambda^{-1}$ is an eigenvalue for $A^{-1}$.

4.7. (a) and (b) are done by expanding the determinants
$$\det\begin{bmatrix}a_{11}-\lambda&a_{12}\\a_{21}&a_{22}-\lambda\end{bmatrix},\qquad \det\begin{bmatrix}a_{11}-\lambda&a_{12}&a_{13}\\a_{21}&a_{22}-\lambda&a_{23}\\a_{31}&a_{32}&a_{33}-\lambda\end{bmatrix}.$$
I leave the details to you. (c) The coefficient of $\lambda^n$ is $(-1)^n$, i.e., it is 1 if n is even and −1 if n is odd.

5.1. (a) λ = 3, $v_1 = \begin{bmatrix}1\\1\end{bmatrix}$ and λ = −1, $v_2 = \begin{bmatrix}-1\\1\end{bmatrix}$. However, other answers are possible, depending on how you did the problem.

5.2. (a)



   −2 1 λ = −3, v1 =  2  , v2 =  0  , 0 1



 −2 λ = 6, v3 =  −1  2

{v1, v2, v3} is a basis, but other answers are possible, depending on how you went about doing the problem.

5.3. (a) λ = 2, v1 = e1 and λ = 1, v2 = e3. (b) For λ = 2, the dimension of the eigenspace is strictly less than the multiplicity. For λ = 1, the number of basic eigenvectors does equal the multiplicity; they are both one. A is not diagonalizable because equality does not hold for at least one of the eigenvalues.


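5.4. (a)
$$\lambda = 1,\ v_1 = \begin{bmatrix}-1\\1\\0\end{bmatrix},\ v_2 = \begin{bmatrix}-1\\0\\1\end{bmatrix};\qquad \lambda = 4,\ v_3 = \begin{bmatrix}1\\1\\1\end{bmatrix}.$$
However, other answers are possible. (b) This depends on your answer for part (a). For example,
$$P = [\,v_1\ v_2\ v_3\,] = \begin{bmatrix}-1&-1&1\\1&0&1\\0&1&1\end{bmatrix}$$
will work.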

5.5. Note that if m1 + m2 + m3 ≠ 5, that means that the characteristic equation has complex roots, which are not considered candidates for real eigenvalues. (a) The dimension of the eigenspace equals the multiplicity for each eigenvalue, and the multiplicities add up to five. Hence, the matrix is diagonalizable. (b) d1 < m1, so the matrix is not diagonalizable. (c) d1 > m1, which is never possible. No such matrix exists. (d) m1 + m2 + m3 = 3 < 5. Hence, there are necessarily some complex roots of the characteristic equation. The matrix is not diagonalizable (in the purely real theory).

5.6. (a) The characteristic equation is $\lambda^2 - 13\lambda + 36 = 0$. Its roots λ = 4, 9 are distinct, so the matrix is diagonalizable. In Chapter III, we will learn a simpler, more direct way to see that a matrix of this type is diagonalizable. (b) The characteristic equation is $(\lambda - 1)^2 = 0$, so the only eigenvalue is λ = 1, and it has multiplicity two. That, in itself, is not enough to conclude the matrix isn’t diagonalizable. However,
$$\begin{bmatrix}1-1&1\\0&1-1\end{bmatrix} = \begin{bmatrix}0&1\\0&0\end{bmatrix},$$
which has rank 1. Hence, the eigenspace has dimension 2 − 1 = 1, which is less than the multiplicity of the eigenvalue. Hence, the matrix is not diagonalizable. (c) The characteristic equation is $\lambda^2 + 1 = 0$. Since its roots are non-real complex numbers, this matrix is not diagonalizable in our sense, since we restrict attention to real scalars.

6.1. (a) For any non-negative integer n, we have
$$A^n = \begin{bmatrix}\lambda^n&0\\0&\mu^n\end{bmatrix},$$
so
$$e^{At} = \sum_{n=0}^{\infty}\frac{t^n}{n!}A^n = \begin{bmatrix}\sum_{n=0}^{\infty}\lambda^nt^n/n!&0\\0&\sum_{n=0}^{\infty}\mu^nt^n/n!\end{bmatrix} = \begin{bmatrix}e^{\lambda t}&0\\0&e^{\mu t}\end{bmatrix}.$$
(b) We have
$$e^{At} = \begin{bmatrix}e^{\lambda_1t}&0&\cdots&0\\0&e^{\lambda_2t}&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&e^{\lambda_nt}\end{bmatrix}.$$


6.2. (a)
$$e^{Nt} = I + t\begin{bmatrix}0&0\\1&0\end{bmatrix} = \begin{bmatrix}1&0\\t&1\end{bmatrix}.$$
This used the fact that in this case $N^k = 0$ for k ≥ 2. (b) In this case $N^k = 0$ for k ≥ 3, and
$$e^{Nt} = \begin{bmatrix}1&0&0\\t&1&0\\t^2/2&t&1\end{bmatrix}.$$
(c) The smallest such k is n. $e^{Nt}$ has the form suggested by the answers in parts (a) and (b). It is an n × n matrix with 1’s on the diagonal, t’s just below the diagonal, $t^2/2$’s just below that, etc. In the lower left hand corner there is a $t^{n-1}/(n-1)!$.

6.3. (a)
$$e^{At} = e^{\lambda t}\left\{ I + t\begin{bmatrix}0&0\\1&0\end{bmatrix} \right\} = e^{\lambda t}\begin{bmatrix}1&0\\t&1\end{bmatrix}.$$
This used the fact that in this case $(A - \lambda I)^k = 0$ for k ≥ 2. (b) In this case $(A - \lambda I)^k = 0$ for k ≥ 3, and
$$e^{At} = e^{\lambda t}\begin{bmatrix}1&0&0\\t&1&0\\t^2/2&t&1\end{bmatrix}.$$
(c) The smallest such k is n. $e^{At}$ has the form suggested by the answers in parts (a) and (b). There is a scalar factor of $e^{\lambda t}$ followed by a lower triangular matrix with 1’s on the diagonal, t’s just below the diagonal, $t^2/2$’s just below that, etc. In the lower left hand corner there is a $t^{n-1}/(n-1)!$.

6.4. In general, $PA^nP^{-1} = (PAP^{-1})^n$. Hence,
$$P\left(\sum_{n=0}^{\infty}\frac{t^n}{n!}A^n\right)P^{-1} = \sum_{n=0}^{\infty}\frac{t^n}{n!}PA^nP^{-1} = \sum_{n=0}^{\infty}\frac{t^n}{n!}(PAP^{-1})^n = e^{PAP^{-1}t}.$$

6.5. The binomial expansion in the second step below requires BC = CB:
$$e^{B+C} = \sum_{n=0}^{\infty}\frac{1}{n!}(B+C)^n = \sum_{n=0}^{\infty}\frac{1}{n!}\sum_{i+j=n}\frac{n!}{i!\,j!}B^iC^j = \sum_{n=0}^{\infty}\sum_{i+j=n}\frac{1}{i!}B^i\frac{1}{j!}C^j = \left(\sum_{i=0}^{\infty}\frac{1}{i!}B^i\right)\left(\sum_{j=0}^{\infty}\frac{1}{j!}C^j\right) = e^Be^C.$$
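Because N is nilpotent, the series for $e^{Nt}$ in 6.2 stops after finitely many terms, so it can be summed exactly. The sketch below does this for a numerical value of t, since a symbolic t would need a computer algebra system. It assumes the input really is nilpotent (otherwise the loop never ends). The names `matmul` and `exp_nilpotent` are hypothetical.

```python
from fractions import Fraction as F
from math import factorial


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def exp_nilpotent(N, t):
    """e^{Nt} = sum over k >= 0 of t^k N^k / k!, a finite sum for nilpotent N.
    Stops as soon as a power of N is the zero matrix."""
    n = len(N)
    result = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # N^0 = I
    k = 0
    while True:
        k += 1
        power = matmul(power, N)
        if all(x == 0 for row in power for x in row):
            return result
        coeff = F(t) ** k / factorial(k)
        result = [[r + coeff * p for r, p in zip(rrow, prow)]
                  for rrow, prow in zip(result, power)]


# the n x n matrix with 1's just below the diagonal, as in 6.2
n = 4
N = [[1 if i == j + 1 else 0 for j in range(n)] for i in range(n)]
E = exp_nilpotent(N, 2)  # t = 2
print(E[n - 1][0])       # t^(n-1)/(n-1)! = 8/6 = 4/3
```

Multiplying the result by $e^{\lambda t}$ gives the matrices of 6.3.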


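6.6. (c) We have
$$B + C = \begin{bmatrix}0&1\\-1&0\end{bmatrix},$$
so by Example 2,
$$e^{B+C} = \begin{bmatrix}\cos 1&\sin 1\\-\sin 1&\cos 1\end{bmatrix}.$$
On the other hand,
$$e^Be^C = \begin{bmatrix}0&1\\-1&1\end{bmatrix},$$
so $e^{B+C} \neq e^Be^C$.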
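A truncated power series makes the comparison in 6.6(c) concrete. The matrices B and C are not shown in the solution above. The choice B = [[0, 1], [0, 0]], C = [[0, 0], [−1, 0]] below is an assumption, picked because it reproduces both results quoted there. The 30-term cutoff and the helper names are also our choices.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm_series(M, terms=30):
    """Truncated power series I + M + M^2/2! + ... for a 2 x 2 matrix."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # now M^k / k!
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


# Assumed B and C: not shown above, but they reproduce both results of 6.6(c).
B = [[0, 1], [0, 0]]
C = [[0, 0], [-1, 0]]
B_plus_C = [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]

lhs = expm_series(B_plus_C)                     # e^{B+C}
rhs = matmul(expm_series(B), expm_series(C))    # e^B e^C
print([[round(x, 4) for x in row] for row in lhs])  # [[0.5403, 0.8415], [-0.8415, 0.5403]]
print(rhs)                                          # [[0.0, 1.0], [-1.0, 1.0]]
```

Here BC ≠ CB, which is exactly why the argument of 6.5 does not apply.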

7.1. This is not an upper or lower triangular matrix. However, after interchanging the first and third rows, it becomes an upper triangular matrix with determinant equal to the product of its diagonal entries. The determinant is −6 because we have to change the sign due to the interchange.

7.2. (a) and (c) are true. (b) is false. The correct rule is $\det(cA) = c^n\det A$. (d) is true. One way to see this is to notice that $\det A^t = \det A \neq 0$.

7.3. The characteristic equation is $-(\lambda - 2)(\lambda + 1)^2 = 0$. The eigenvalues are λ = 2 and λ = −1, which is a double root. For λ = 2,
$$\left\{ \begin{bmatrix}1\\1\\1\end{bmatrix} \right\}$$
is a basis for the eigenspace. For λ = −1,
$$\left\{ \begin{bmatrix}-1\\1\\0\end{bmatrix},\ \begin{bmatrix}-1\\0\\1\end{bmatrix} \right\}$$
is a basis for the eigenspace.

7.4. Use Gaussian reduction to get an upper triangular matrix. You might also speed things up by using selected column operations. The answer is −23.

7.5. (a) This is never true. It is invertible if and only if its determinant is not zero. (b) This condition is the definition of ‘diagonalizable matrix’. There are many non-diagonalizable matrices. This will happen, for example, when the dimension of an eigenspace is less than the multiplicity of the corresponding eigenvalue. (It can also happen if the characteristic equation has non-real complex roots.) (c) The statement is only true for square matrices.

7.6. No. Av is not a scalar multiple of v.

7.7. (a) It is not diagonalizable, since the dimension of the eigenspace for λ = 3 is one and the multiplicity of the eigenvalue is two. (b) There are three distinct eigenvalues, so the matrix is diagonalizable.

7.8. Take v to be the element of Rn with all its entries equal to one. Then the ith component of Av is just the sum of the entries in the ith row of A. Since these are all equal to a, it follows that Av = av, so v is an eigenvector with corresponding eigenvalue a.


CHAPTER III

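APPLICATIONS

1.1. The eigenvalues are $\lambda = 5, -5$. An orthonormal basis of eigenvectors consists of
$$\frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \qquad \frac{1}{\sqrt{5}} \begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$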
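Since the two eigenvectors in 1.1 form a basis, the eigenpairs determine the matrix completely, and with orthonormal eigenvectors it can be rebuilt as $A = 5u_1u_1^t - 5u_2u_2^t$. The sketch below does this exactly with fractions. The exercise's own matrix is not shown in this answer, so treat the result as a reconstruction from the eigen-data; the helper names are hypothetical.

```python
from fractions import Fraction

def projection(v):
    """Matrix v v^t / (v . v): orthogonal projection onto the line through v."""
    vv = sum(x * x for x in v)
    return [[Fraction(a * b, vv) for b in v] for a in v]

def from_eigenpairs(pairs):
    """Sum of lam * projection(v) over (lam, v) pairs with mutually orthogonal v."""
    n = len(pairs[0][1])
    a = [[Fraction(0)] * n for _ in range(n)]
    for lam, v in pairs:
        p = projection(v)
        for i in range(n):
            for j in range(n):
                a[i][j] += lam * p[i][j]
    return a

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

A = from_eigenpairs([(5, [1, 2]), (-5, [-2, 1])])
print(A)                   # entries -3, 4, 4, 3 (printed as Fractions)
print(matvec(A, [1, 2]))   # 5 times (1, 2)
print(matvec(A, [-2, 1]))  # -5 times (-2, 1)
```

Working with the unnormalized vectors and dividing by $v \cdot v$ keeps every entry rational, so no square roots appear.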

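1.2. The eigenvalues are $\lambda = 5, -5$. A basis of eigenvectors consists of
$$\begin{bmatrix} 1 \\ 4 \end{bmatrix}, \qquad \begin{bmatrix} -1 \\ 1 \end{bmatrix},$$
which are not perpendicular. However, the matrix is not symmetric, so there is no special reason to expect the eigenvectors to be perpendicular.

1.3. The eigenvalues are 0, 1, 2. An orthonormal basis of eigenvectors is
$$\left\{ \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}.$$

1.4. The columns of the matrix
$$\begin{bmatrix} -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{4}{5\sqrt{2}} & \frac{3}{5} & \frac{4}{5\sqrt{2}} \\ \frac{3}{5\sqrt{2}} & -\frac{4}{5} & \frac{3}{5\sqrt{2}} \end{bmatrix}$$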

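form an orthonormal basis of eigenvectors corresponding to the eigenvalues −4, 1, 6.

1.5. Since $A$ is symmetric, $A^t = A$, so $(P^t A P)^t = P^t A^t (P^t)^t = P^t A P$.

2.1. (a)
$$\frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \qquad \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
(b)
$$\frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \end{bmatrix}, \qquad \frac{1}{\sqrt{70}} \begin{bmatrix} 4 \\ 5 \\ -2 \\ 5 \end{bmatrix}, \qquad \frac{1}{\sqrt{994}} \begin{bmatrix} -8 \\ 25 \\ 4 \\ -17 \end{bmatrix}.$$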

III. APPLICATIONS

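2.2. The eigenvalues are 0 with multiplicity 2 and 3 with multiplicity 1. A basis for the eigenspace corresponding to the eigenvalue 0 is
$$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Applying Gram–Schmidt to this yields
$$\left\{ \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} \right\}.$$
An eigenvector of length 1 for the eigenvalue 3 is
$$\frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
An orthonormal basis of eigenvectors is
$$\left\{ \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}, \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}.$$

2.3. The characteristic polynomial is $-[(\lambda+1)^3 - 12(\lambda+1) - 16]$. Put $X = \lambda + 1$. Then the characteristic equation becomes $X^3 - 12X - 16 = 0$, which factors as $(X - 4)(X + 2)^2 = 0$, so the roots are $X = 4, -2, -2$. That means the eigenvalues are $\lambda = 3$ with multiplicity 1 and $\lambda = -3$ with multiplicity 2. For $\lambda = 3$, a normalized eigenvector is
$$u_1 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
For $\lambda = -3$,
$$v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad v_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$
form a basis of eigenvectors. Applying Gram–Schmidt yields
$$u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \qquad u_3 = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}.$$
However, it would also make sense to reverse the order and apply Gram–Schmidt to obtain
$$u_2' = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad u_3' = \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}.$$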
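The Gram–Schmidt computations in 2.2 and 2.3 can be checked with a short exact-arithmetic sketch. It skips the final normalization, so it returns multiples of $u_2, u_3$ and of $u_2', u_3'$; `gram_schmidt` and `dot` are hypothetical helper names.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt without the final normalization.

    Each w_i is v_i minus its projections onto w_1, ..., w_{i-1}, so the
    w_i are mutually orthogonal; u_i = w_i / |w_i| gives the orthonormal set.
    """
    ortho = []
    for v in vectors:
        v = [Fraction(x) for x in v]
        w = list(v)
        for q in ortho:
            c = dot(v, q) / dot(q, q)
            w = [a - c * b for a, b in zip(w, q)]
        ortho.append(w)
    return ortho

v2, v3 = [-1, 1, 0], [-1, 0, 1]
print(gram_schmidt([v2, v3]))  # second vector (-1/2, -1/2, 1), a multiple of (-1, -1, 2)
print(gram_schmidt([v3, v2]))  # second vector (-1/2, 1, -1/2), a multiple of (-1, 2, -1)
```

Dividing $(-1/2, -1/2, 1)$ by its length $\sqrt{3/2}$ gives $u_3$, and likewise for $u_3'$.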


2.4. (a) $v_1' = v_1$ and $v_2' = v_2 - c v_1' = v_2 - c v_1$ for an appropriate scalar $c$, so $v_2'$ cannot be zero unless $v_2$ is a multiple of $v_1$. (b) $v_3' = v_3 - y v_1' - z v_2'$ for appropriate scalars $y, z$. If it were zero, that would say that $v_3$ is a linear combination of $v_1'$ and $v_2'$, which we know are linear combinations of $v_1$ and $v_2$. Hence, $v_3$ would be a linear combination of $v_1$ and $v_2$, which by assumption is not the case. (The same argument works in general for any number of vectors, as long as they form a linearly independent set. If any $v_i' = 0$, then that would, after a lot of algebra, give a way to express $v_i$ in terms of $v_1, \dots, v_{i-1}$.)

3.1. (a) $\begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}$. (b) $\begin{bmatrix} \sqrt{3}/2 & 1/2 \\ -1/2 & \sqrt{3}/2 \end{bmatrix}$.

3.2. Replacing $\theta$ by $-\theta$ doesn't change the cosine entries and changes the signs of the sine entries.

3.3. The ‘P’ matrix is
$$P = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}.$$
Its inverse is its transpose, so the components of $-g\mathbf{j}$ in the new coordinate system are given by
$$\begin{bmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} 0 \\ -g \end{bmatrix} = -g \begin{bmatrix} \frac{1}{2} \\ \frac{\sqrt{3}}{2} \end{bmatrix}.$$
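The entries of $P$ in 3.3 are those of a rotation through $30^\circ$ (an inference from $\cos 30^\circ = \sqrt{3}/2$ and $\sin 30^\circ = 1/2$). Here is a small numeric sketch of the change of coordinates, assuming that angle and an illustrative value $g = 9.8$; `basis_matrix` and `new_components` are hypothetical names.

```python
import math

def basis_matrix(theta):
    """P whose columns are the new unit vectors, obtained by rotating i and j by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def new_components(p, x):
    """Components of x in the new basis: P^{-1} x, which equals P^t x since P is orthogonal."""
    return [sum(p[k][i] * x[k] for k in range(2)) for i in range(2)]

g = 9.8                        # illustrative value; the answer above is symbolic in g
P = basis_matrix(math.pi / 6)  # 30 degrees gives the entries sqrt(3)/2 and 1/2 of 3.3
print(new_components(P, [0.0, -g]))  # approximately [-g/2, -sqrt(3)*g/2]
```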

3.4. Such a matrix is
$$\frac{1}{\sqrt{2}} \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix}.$$
The diagonal entries are −1, 3.

3.5. Such a matrix is
$$\begin{bmatrix} -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{4}{5\sqrt{2}} & \frac{3}{5} & \frac{4}{5\sqrt{2}} \\ \frac{3}{5\sqrt{2}} & -\frac{4}{5} & \frac{3}{5\sqrt{2}} \end{bmatrix}.$$
The diagonal entries are −4, 1, 6.

3.6. Let $A$, $B$ be orthogonal, i.e., they are invertible and $A^t = A^{-1}$, $B^t = B^{-1}$. Then $AB$ is invertible and $(AB)^t = B^t A^t = B^{-1} A^{-1} = (AB)^{-1}$, so $AB$ is orthogonal. The inverse of an orthogonal matrix is also orthogonal. For, if $A^t$ is the inverse of $A$, then $A$ is the inverse of $A^t$. But $(A^t)^t = A$, so $A^t$ has the property that its transpose is its inverse.

3.7. The rows are also mutually perpendicular unit vectors. The reason is that another way to characterize an orthogonal matrix $P$ is to say that $P^t$ is the inverse of $P$, i.e., $P^t P = P P^t = I$. It is easy to see from this that $P^t$ is also orthogonal. (Its transpose $(P^t)^t = P$ is also its inverse.) Hence, the columns of $P^t$ are mutually perpendicular unit vectors. But these are the rows of $P$.
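A quick floating-point check of 3.5 through 3.7: for the matrix $P$ of 3.5, both $P^tP$ and $PP^t$ should be the identity, and so should $Q^tQ$ for the product $Q = PP$ of two orthogonal matrices. The helper names are hypothetical and the tolerance $10^{-12}$ is an arbitrary choice.

```python
import math

r2 = math.sqrt(2)
# The matrix of Exercise 3.5; its columns are orthonormal eigenvectors.
P = [[-1 / r2, 0.0, 1 / r2],
     [4 / (5 * r2), 3 / 5, 4 / (5 * r2)],
     [3 / (5 * r2), -4 / 5, 3 / (5 * r2)]]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_identity(a, tol=1e-12):
    """True when a matches the identity matrix entrywise to within tol."""
    n = len(a)
    return all(abs(a[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

print(is_identity(matmul(transpose(P), P)))  # P^t P = I: columns are orthonormal
print(is_identity(matmul(P, transpose(P))))  # P P^t = I: so are the rows (3.7)
Q = matmul(P, P)
print(is_identity(matmul(transpose(Q), Q)))  # a product of orthogonal matrices (3.6)
```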


4.1. The principal axes are given by the basis vectors
$$u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \qquad u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

In the new coordinate system the equation is $(x')^2 + 3(y')^2 = 2$, which is the equation of an ellipse.

4.2. The eigenvalues are −1, 3. The conic is a hyperbola. The principal axes may be specified by the unit vectors
$$u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \qquad u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

The equation in the new coordinates is $-(x')^2 + 3(y')^2 = 4$. The points closest to the origin are at $x' = 0$, $y' = \pm 2/\sqrt{3}$. In the original coordinates, these points are $\pm\left(\sqrt{2/3}, \sqrt{2/3}\right)$.

4.3. The principal axes are those along the unit vectors
$$u_1 = \frac{1}{5} \begin{bmatrix} -4 \\ 3 \end{bmatrix}, \qquad u_2 = \frac{1}{5} \begin{bmatrix} 3 \\ 4 \end{bmatrix}.$$

The equation in the new coordinate system is $-25(x')^2 + 50(y')^2 = 50$. The curve is a hyperbola. The points closest to the origin are given by $x' = 0$, $y' = \pm 1$. In the original coordinates these are the points $\pm\frac{1}{5}(3, 4)$. There is no upper bound on the distance of points to the origin for a hyperbola.

4.4. The principal axes are along the unit vectors given by
$$u_1 = \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \qquad u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad u_3 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.$$
The equation in the new coordinate system is $-3(x')^2 + (y')^2 + 3(z')^2 = 1$. This is an elliptic hyperboloid of one sheet whose axis is the $x'$-axis.
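Once the equation has the form $\lambda_1 (x')^2 + \lambda_2 (y')^2 + \dots = c$, as in 4.1 through 4.4, the type of a central conic or quadric depends only on how many of the ratios $\lambda_i / c$ are positive. Below is a minimal sketch, assuming every $\lambda_i$ and $c$ are nonzero; `classify_central` is a hypothetical helper, and degenerate cases are deliberately left out.

```python
def classify_central(eigenvalues, rhs):
    """Name the graph of sum(lam_i * (x_i')^2) = rhs in two or three variables.

    Hypothetical helper: it assumes all eigenvalues are nonzero and rhs != 0;
    degenerate cases (a zero eigenvalue or rhs == 0) are rejected.
    """
    if rhs == 0 or any(lam == 0 for lam in eigenvalues):
        raise ValueError("degenerate case not handled")
    # After dividing by rhs the right side is 1; only the signs of lam/rhs matter.
    positive = sum(1 for lam in eigenvalues if lam / rhs > 0)
    names = {
        2: {2: "ellipse", 1: "hyperbola", 0: "empty set"},
        3: {3: "ellipsoid", 2: "hyperboloid of one sheet",
            1: "hyperboloid of two sheets", 0: "empty set"},
    }
    return names[len(eigenvalues)][positive]

print(classify_central([1, 3], 2))      # 4.1: ellipse
print(classify_central([-1, 3], 4))     # 4.2: hyperbola
print(classify_central([-3, 1, 3], 1))  # 4.4: hyperboloid of one sheet
```

For a hyperboloid of one sheet, the axis is the coordinate axis whose coefficient has the opposite sign to $c$, which is the $x'$-axis in 4.4.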


5.1. Compare this with Exercise 1 in Section 4. The equations are
$$2x = \lambda(2x + y), \qquad 2y = \lambda(x + 2y), \qquad x^2 + xy + y^2 = 1.$$
This in effect says that $(x, y)$ gives the components of an eigenvector for the matrix
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$
with $2/\lambda$ as eigenvalue. However, since we already classified the conic in the aforementioned exercise, it is easier to use that information here. In the new coordinates the equation is $(x')^2 + 3(y')^2 = 2$. The maximum distance to the origin is at $x' = \pm\sqrt{2}$, $y' = 0$, and the minimum distance to the origin is at $x' = 0$, $y' = \pm\sqrt{2/3}$. Since the change of coordinates is orthogonal, we may still measure distance by $\sqrt{(x')^2 + (y')^2}$. Hence, the maximum square distance to the origin is 2 and the minimum square distance to the origin is 2/3. Note that the problem did not ask for the locations in the original coordinates where the maximum and minimum are attained.

5.2. (Look in the previous section for an analogous problem.) The equations are
$$\begin{bmatrix} 1 & -2 & 0 \\ -2 & -1 & -2 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \qquad x^2 + y^2 + z^2 = 1.$$
Thus we need to find eigenvectors of length 1 for the given matrix. However, we already did this in the aforementioned exercise. The answers are
$$\pm\frac{1}{\sqrt{6}}(1, 2, 1), \qquad \pm\frac{1}{\sqrt{2}}(-1, 0, 1), \qquad \pm\frac{1}{\sqrt{3}}(1, -1, 1).$$
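For 5.2, the sketch below checks that each listed vector is a unit eigenvector of the matrix and evaluates $\mathbf{x}^t A \mathbf{x}$ there. It assumes, as the displayed equations indicate, that the function being optimized is this quadratic form (its gradient $2A\mathbf{x}$ produces exactly those Lagrange equations); at a unit eigenvector the value is just the eigenvalue. The helper names are illustrative.

```python
import math

# The matrix from the Lagrange equations of 5.2.
A = [[1, -2, 0],
     [-2, -1, -2],
     [0, -2, 1]]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def quad_form(a, v):
    """Value of v^t A v."""
    return sum(x * y for x, y in zip(v, matvec(a, v)))

candidates = {
    -3: [c / math.sqrt(6) for c in (1, 2, 1)],
    1: [c / math.sqrt(2) for c in (-1, 0, 1)],
    3: [c / math.sqrt(3) for c in (1, -1, 1)],
}
for lam, u in candidates.items():
    residual = max(abs(p - lam * q) for p, q in zip(matvec(A, u), u))
    print(lam, residual < 1e-12, quad_form(A, u))  # A u = lam u, and u^t A u = lam
```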

The values of the function at these three pairs of points are, respectively, −3, 1, 3. Since a continuous function on a closed bounded set must attain both a maximum and a minimum value, the maximum is 3 at the third pair of points and the minimum is −3 at the first pair.

5.3. The Lagrange multiplier condition yields the equations
$$x = \lambda x, \qquad y = \lambda y, \qquad z = -\lambda z, \qquad x^2 + y^2 = z^2.$$
If $\lambda \neq 1$, then the first two equations show that $x = y = 0$. From this, the last equation shows that $z = 0$. Hence, $(0, 0, 0)$ is one candidate point. If $\lambda = 1$, then the third equation shows that $z = 0$, and then the last equation shows that $x = y = 0$. Hence, that gives the same point. Finally, we have to consider all points where $\nabla g = \langle 2x, 2y, -2z \rangle = 0$. Again, $(0, 0, 0)$ is the only such point. Since $x^2 + y^2 + z^2 \ge 0$ and it does attain the value 0 on the given surface, it follows that 0 is its minimum value. Note that in this example $\nabla g = 0$ didn't give us any other candidates to examine, but in general that might not have been the case.

5.4. On the circle $x^2 + y^2 = 1$, $f(x, y) = 1 + 4xy$, so maximizing $f(x, y)$ is the same as maximizing $xy$. The level sets $xy = c$ are hyperbolas. Some of these intersect the circle $x^2 + y^2 = 1$ and some don't. The boundary between the two classes consists of the level curves $xy = 1/2$ and $xy = -1/2$. The first is tangent to the circle in the first and third quadrants, and the second is tangent in the second and fourth quadrants. As you move on the circle toward one of these points of tangency, you cross level curves with either successively higher values of $c$ or successively lower values of $c$. Hence, the points of tangency are either maximum or minimum points for $xy$. The maximum points occur in the first and third quadrants, where $xy$ attains the value $1/2$. Note also that the points of tangency are exactly where the normal to the circle and the normal to the level curve are parallel.

6.1. The characteristic polynomial of
$$\begin{bmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -1 \end{bmatrix}$$
is
$$\det \begin{bmatrix} -1-\lambda & 1 & 0 \\ 1 & -2-\lambda & 1 \\ 0 & 1 & -1-\lambda \end{bmatrix} = (-1-\lambda)\big((-2-\lambda)(-1-\lambda) - 1\big) - 1(-1-\lambda) = -(\lambda+1)(\lambda^2 + 3\lambda) = -(\lambda+1)(\lambda+3)\lambda.$$
Hence, the eigenvalues are $\lambda = -\omega^2 (m/k) = -1, -3$, and $0$. As indicated in the problem statement, the eigenvalue 0 corresponds to a non-oscillatory solution in which the system moves freely at constant velocity. For $\lambda = -1$, we have $\omega = \sqrt{k/m}$ and
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & -1 & 1 \\ 0 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \text{ yielding basic eigenvector } u = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
This corresponds to both end particles moving with equal displacements in opposite directions and the middle particle staying still. For $\lambda = -3$, we have $\omega = \sqrt{3k/m}$ and
$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}, \text{ yielding basic eigenvector } u = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.$$
This corresponds to both end particles moving together with equal displacements in the same direction and the middle particle moving with twice that displacement in the opposite direction.
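The eigenpairs found in 6.1 can be checked mechanically. In this sketch the eigenvector $(1, 1, 1)$ for $\lambda = 0$ is an addition (it works because each row of the matrix sums to zero), $k/m = 1$ is an illustrative value, and $\omega = \sqrt{-\lambda k/m}$ comes from $\lambda = -\omega^2 (m/k)$.

```python
import math

# The matrix of Exercise 6.1; an eigenvalue lam corresponds to omega = sqrt(-lam * k / m).
A = [[-1, 1, 0],
     [1, -2, 1],
     [0, 1, -1]]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

modes = {0: [1, 1, 1], -1: [-1, 0, 1], -3: [1, -2, 1]}
k_over_m = 1.0  # illustrative value of k/m
omegas = {}
for lam, u in modes.items():
    assert matvec(A, u) == [lam * x for x in u]  # exact check that A u = lam u
    omegas[lam] = math.sqrt(-lam * k_over_m)
print(omegas)  # {0: 0.0, -1: 1.0, -3: 1.7320508075688772}
```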


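6.2. The characteristic polynomial of
$$\begin{bmatrix} -2 & 1 & 0 \\ 1 & -2 & 1 \\ 0 & 1 & -2 \end{bmatrix}$$
is
$$\det \begin{bmatrix} -2-\lambda & 1 & 0 \\ 1 & -2-\lambda & 1 \\ 0 & 1 & -2-\lambda \end{bmatrix} = (-2-\lambda)\big((-2-\lambda)(-2-\lambda) - 1\big) - 1(-2-\lambda) = -(\lambda+2)(\lambda^2 + 4\lambda + 2).$$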

Hence, the eigenvalues are $\lambda = -\omega^2 (m/k) = -2$, $-2 - \sqrt{2}$, and $-2 + \sqrt{2}$. For $\lambda = -2$, we have $\omega = \sqrt{2k/m}$ and
$$\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \text{ yielding basic eigenvector } u = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}.$$
This corresponds to both end particles moving with equal displacements in opposite directions and the middle particle staying still. For $\lambda = -2 - \sqrt{2}$, we have $\omega = \sqrt{(2 + \sqrt{2})k/m}$ and
$$\begin{bmatrix} \sqrt{2} & 1 & 0 \\ 1 & \sqrt{2} & 1 \\ 0 & 1 & \sqrt{2} \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & \sqrt{2} \\ 0 & 0 & 0 \end{bmatrix}, \text{ yielding basic eigenvector } u = \begin{bmatrix} 1 \\ -\sqrt{2} \\ 1 \end{bmatrix}.$$
This corresponds to both end particles moving together with equal displacements in the same direction and the middle particle moving with $\sqrt{2}$ times that displacement in the opposite direction. For $\lambda = -2 + \sqrt{2}$, we have $\omega = \sqrt{(2 - \sqrt{2})k/m}$ and
$$\begin{bmatrix} -\sqrt{2} & 1 & 0 \\ 1 & -\sqrt{2} & 1 \\ 0 & 1 & -\sqrt{2} \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -\sqrt{2} \\ 0 & 0 & 0 \end{bmatrix}, \text{ yielding basic eigenvector } u = \begin{bmatrix} 1 \\ \sqrt{2} \\ 1 \end{bmatrix}.$$
This corresponds to both end particles moving together with equal displacements in the same direction and the middle particle moving with $\sqrt{2}$ times that displacement in the same direction. Notice that the intuitive significance of this last normal mode is not so clear.

6.3. The given information about the first normal mode tells us that a corresponding basic eigenvector is
$$v_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$
Any basic eigenvector for the second normal mode must be perpendicular to $v_1$, so we can take
$$v_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}.$$
Hence, the relation between the displacements for the second normal mode is $x_2 = -2x_1$.


6.4. For one normal mode, $\omega = \sqrt{k/m}$, and the relative motions of the particles satisfy $x_2 = 2x_1$. For the other normal mode, $\omega = \sqrt{6k/m}$, and the relative motions of the particles satisfy $x_1 = 2x_2$.

6.5. For one normal mode, $\lambda = -3 + \sqrt{5}$ and $\omega = \sqrt{(3 - \sqrt{5})k/m}$. The eigenspace is obtained by reducing the matrix
$$\begin{bmatrix} -1 - \sqrt{5} & 2 \\ 2 & 1 - \sqrt{5} \end{bmatrix}.$$
Note that this matrix must be singular, so the first row must be a multiple of the second row. (The multiple is in fact $(-1 - \sqrt{5})/2$. Check it!) Hence, the reduced matrix is
$$\begin{bmatrix} 1 & (1 - \sqrt{5})/2 \\ 0 & 0 \end{bmatrix}.$$
The relative motions of the particles satisfy $x_1 = \frac{\sqrt{5} - 1}{2} x_2$. A similar analysis shows that for the other normal mode, $\omega = \sqrt{(3 + \sqrt{5})k/m}$, and the relative motions of the particles satisfy $x_1 = -\frac{\sqrt{5} + 1}{2} x_2$.

6.6. The information given tells us that two of the eigenvectors are
$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}.$$
Any basic eigenvector for the third normal mode must be perpendicular to both of these. If its components are $v_1, v_2, v_3$, then we must have
$$v_1 + v_2 + v_3 = 0 \quad \text{and} \quad v_1 - 2v_2 + v_3 = 0.$$
By the usual method, we find that a basis for the null space of this system is given by
$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},$$
whence we conclude that the relative motions satisfy $x_1 = -x_3$, $x_2 = 0$.

7.1. The set is not linearly independent.


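7.2. (a) For $\lambda = 2$, a basis for the corresponding eigenspace is
$$\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}.$$
For $\lambda = -1$, a basis for the corresponding eigenspace is
$$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
(b) An orthonormal basis of eigenvectors is
$$\left\{ \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{6}} \begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} \right\}.$$
(c) We have
$$P = \begin{bmatrix} 1/\sqrt{3} & -1/\sqrt{2} & -1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & -1/\sqrt{6} \\ 1/\sqrt{3} & 0 & 2/\sqrt{6} \end{bmatrix}, \qquad P^t A P = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$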

The answers would be different if the eigenvalues or eigenvectors were chosen in some other order.

7.3. The columns of $P$ must also be unit vectors.

7.4. (a) For $\lambda = 6$,
$$\left\{ \begin{bmatrix} -1 \\ 2 \end{bmatrix} \right\}$$
constitutes a basis for the corresponding eigenspace. For $\lambda = 1$,
$$\left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}$$
constitutes a basis for the corresponding eigenspace. The equation is $6u^2 + v^2 = 24$. (b)
$$P = \begin{bmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}.$$
(c) The conic is an ellipse with principal axes along the axes determined by the two basic eigenvectors.

7.5. First find an orthonormal basis of eigenvectors for
$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 1 \end{bmatrix}.$$
The eigenvalues are $\lambda = 3, 2, -1$. The corresponding basis is
$$\left\{ \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}.$$
Picking new coordinates with these as unit vectors on the new axes, the equation is $3(x')^2 + 2(y')^2 - (z')^2 = 6$. This is a hyperboloid of one sheet. Its axis is the $z'$-axis, which points along the third vector in the above list.


If the eigenvalues had been given in another order, you would have listed the basic eigenvectors in that order also. You might also conceivably have picked the negative of one of those eigenvectors as a basic eigenvector. The new axes would be the same except that they would be labeled differently and some might have their directions reversed. The graph would be the same, but its equation would look different because of the labeling of the coordinates. Note that just to identify the graph, all you need are the eigenvalues. The orthonormal basis of eigenvectors is only necessary if you want to sketch the surface relative to the original coordinate axes.

7.6. One normal mode $x = v\cos(\omega t)$ has
$$\lambda = -1, \qquad \omega = \sqrt{\frac{k}{m}}, \qquad v = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
In this mode, $x_1 = x_2$, so the particles move in the same direction with equal displacements. A second normal mode $x = v\cos(\omega t)$ has
$$\lambda = -5, \qquad \omega = \sqrt{\frac{5k}{m}}, \qquad v = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
In this mode, $x_1 = -x_2$, so the particles move in opposite directions with equal displacements.

7.7. (a) Not diagonalizable. 1 is a triple root of the characteristic equation, but $A - I$ has rank 2, which shows that the corresponding eigenspace has dimension 1. (b) If you subtract 2 from each diagonal entry and compute the rank of the resulting matrix, you find it is one. So the eigenspace corresponding to $\lambda = 2$ has dimension $3 - 1 = 2$. The other eigenspace necessarily has dimension one. Hence, the matrix is diagonalizable. (c) The matrix is real and symmetric, so it is diagonalizable.
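The tests in 7.7(a) and (b) amount to computing $n - \operatorname{rank}(A - \lambda I)$, the dimension of the eigenspace for $\lambda$, and comparing it with the multiplicity of $\lambda$. Here is a sketch with exact row reduction. The matrices `J` and `S` are hypothetical examples of the two situations, since the matrices of 7.7 are not reproduced here, and `rank` and `eigenspace_dim` are illustrative helper names.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix, computed by exact Gauss-Jordan reduction."""
    a = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(a[0])):
        pivot = next((i for i in range(r, len(a)) if a[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(len(a)):
            if i != r and a[i][col] != 0:
                f = a[i][col] / a[r][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
        if r == len(a):
            break
    return r

def eigenspace_dim(a, lam):
    """Dimension of the eigenspace for lam, namely n - rank(A - lam I)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

# Hypothetical examples of the two situations in 7.7.
J = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]   # 1 is a triple eigenvalue
S = [[3, 1, 1],
     [1, 3, 1],
     [1, 1, 3]]   # eigenvalues 2, 2, 5
print(eigenspace_dim(J, 1))                        # 1, less than 3: not diagonalizable
print(eigenspace_dim(S, 2), eigenspace_dim(S, 5))  # 2 1: dimensions add up to 3
```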

INDEX

angular frequency 140
associative law for matrices 13

back substitution 17, 32
basis 54
basis, standard 54

central conic or quadric 126
characteristic equation 93
cofactor 85
column space of a matrix 65
commutative law, fails for matrix product 14
conic, central 126
constraint 134
coordinates, normal 144
coordinates with respect to a basis 56
Cramer’s rule 86

determinant, definition of 74
determinant of upper triangular matrix 78
diagonalizable 101
diagonalization 102
dimension 55
dimension, invariance of 58

eigenspace 93
eigenvalue 91
eigenvector 91
elementary matrices 22
elementary row operations 18
exponential of a matrix 105

forward substitution 32
frequency 140

Gaussian reduction 17
Gauss–Jordan reduction 20
Gram–Schmidt Process 116

homogeneous system 42
hyperquadric 127

identity matrix 14
inconsistent system 25
independence, linear 52
inverse of a matrix 27
invertible matrix 27

Jordan reduction 20

Lagrange multiplier 135
linear combination 52
linear independence 52
linear independence in $R^n$ 62
LU factorization 29, 30

matrix 5
matrix, exponential of 105
matrix inverse 27
matrix, invertible 27
matrix, non-singular 25
matrix product 8
matrix, rank of 41
matrix series 106
matrix, singular 25
maxima and minima with constraints 134
minor 85

non-singular matrix 25
normal coordinates 144
normal mode 143, 144
null space 47, 54
nullity 42, 55

permutation matrices 35
pivot 25
polynomial equations 96
product of matrices 8
pseudo-inverse of a matrix 42

quadric, central 126

$R^n$ 39
range 66
rank of a matrix 41
Rayleigh–Ritz problem 137
row operations, elementary 18
row operations, reversibility of 22
row reduced echelon form 20
row space of a matrix 66

series of matrices 106
singular matrix 25
standard basis 54
system, homogeneous 42
system, inconsistent 25

transpose of a matrix 83

upper triangular matrix 25
upper triangular matrix, determinant of 78

vector subspace 47