CAMBRIDGE
LINEAR ALGEBRA
Elizabeth S. Meckes Mark W. Meckes
Linear Algebra
Linear Algebra offers a unified treatment of both matrix-oriented and theoretical approaches to the course, which will be useful for classes with a mix of mathematics, physics, engineering, and computer science students. Major topics include singular value decomposition, the spectral theorem, linear systems of equations, vector spaces, linear maps, matrices, eigenvalues and eigenvectors, linear independence, bases, coordinates, dimension, matrix factorizations, inner products, norms, and determinants.
CAMBRIDGE
MATHEMATICAL TEXTBOOKS
Cambridge Mathematical Textbooks is a program of undergraduate and beginning graduate level textbooks for core courses, new courses, and interdisciplinary courses in pure and applied mathematics. These texts provide motivation with plenty of exercises of varying difficulty, interesting examples, modern applications, and unique approaches to the material.

ADVISORY BOARD
John B. Conway, George Washington University
Gregory F. Lawler, University of Chicago
John M. Lee, University of Washington
John Meier, Lafayette College
Lawrence C. Washington, University of Maryland, College Park
A complete list of books in the series can be found at www.cambridge.org/mathematics
Recent titles include the following:
Chance, Strategy, and Choice: An Introduction to the Mathematics of Games and Elections, S. B. Smith
Set Theory: A First Course, D. W. Cunningham
Chaotic Dynamics: Fractals, Tilings, and Substitutions, G. R. Goodson
Introduction to Experimental Mathematics, S. Eilers & R. Johansen
A Second Course in Linear Algebra, S. R. Garcia & R. A. Horn
Exploring Mathematics: An Engaging Introduction to Proof, J. Meier & D. Smith
A First Course in Analysis, J. B. Conway
Introduction to Probability, D. F. Anderson, T. Seppäläinen & B. Valkó
Linear Algebra, E. S. Meckes & M. W. Meckes
Linear Algebra

ELIZABETH S. MECKES
Case Western Reserve University, Cleveland, OH, USA

MARK W. MECKES
Case Western Reserve University, Cleveland, OH, USA
CAMBRIDGE UNIVERSITY PRESS
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314-321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi - 110025, India
79 Anson Road, #06-04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107177901
DOI: 10.1017/9781316823200
© Elizabeth S. Meckes and Mark W. Meckes 2018
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2018
Printed in the United States of America by Sheridan Books, Inc., June 2018
A catalog record for this publication is available from the British Library.

Library of Congress Cataloging-in-Publication Data
Names: Meckes, Elizabeth S., author. | Meckes, Mark W., author.
Title: Linear algebra / Elizabeth S. Meckes (Case Western Reserve University, Cleveland, OH, USA), Mark W. Meckes (Case Western Reserve University, Cleveland, OH, USA).
Description: Cambridge : Cambridge University Press, [2018] | Includes bibliographical references and index.
Identifiers: LCCN 2017053812 | ISBN 9781107177901 (alk. paper)
Subjects: LCSH: Algebras, Linear--Textbooks.
Classification: LCC QA184.2 .M43 2018 | DDC 512/.5--dc23
LC record available at https://lccn.loc.gov/2017053812
ISBN 978-1-107-17790-1 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Juliette and Peter
Contents

Preface
To the Student

1 Linear Systems and Vector Spaces
1.1 Linear Systems of Equations
Bread, Beer, and Barley
Linear Systems and Solutions
1.2 Gaussian Elimination
The Augmented Matrix of a Linear System
Row Operations
Does it Always Work?
Pivots
Existence and Uniqueness of Solutions
1.3 Vectors and the Geometry of Linear Systems
Vectors and Linear Combinations
The Vector Form of a Linear System
The Geometry of Linear Combinations
The Geometry of Solutions
1.4 Fields
General Fields
Arithmetic in Fields
Linear Systems over a Field
1.5 Vector Spaces
General Vector Spaces
Examples of Vector Spaces
Arithmetic in Vector Spaces

2 Linear Maps and Matrices
2.1 Linear Maps
Recognizing Sameness
Linear Maps in Geometry
Matrices as Linear Maps
Eigenvalues and Eigenvectors
The Matrix-Vector Form of a Linear System
2.2 More on Linear Maps
Isomorphism
Properties of Linear Maps
The Matrix of a Linear Map
Some Linear Maps on Function and Sequence Spaces
2.3 Matrix Multiplication
Definition of Matrix Multiplication
Other Ways of Looking at Matrix Multiplication
The Transpose
Matrix Inverses
2.4 Row Operations and the LU Decomposition
Row Operations and Matrix Multiplication
Inverting Matrices via Row Operations
The LU Decomposition
2.5 Range, Kernel, and Eigenspaces
Range
Kernel
Eigenspaces
Solution Spaces
2.6 Error-correcting Linear Codes
Linear Codes
Error-detecting Codes
Error-correcting Codes
The Hamming Code

3 Linear Independence, Bases, and Coordinates
3.1 Linear (In)dependence
Redundancy
Linear Independence
The Linear Dependence Lemma
Linear Independence of Eigenvectors
3.2 Bases
Bases of Vector Spaces
Properties of Bases
Bases and Linear Maps
3.3 Dimension
The Dimension of a Vector Space
Dimension, Bases, and Subspaces
3.4 Rank and Nullity
The Rank and Nullity of Maps and Matrices
The Rank-Nullity Theorem
Consequences of the Rank-Nullity Theorem
Linear Constraints
3.5 Coordinates
Coordinate Representations of Vectors
Matrix Representations of Linear Maps
Eigenvectors and Diagonalizability
Matrix Multiplication and Coordinates
3.6 Change of Basis
Change of Basis Matrices
Similarity and Diagonalizability
Invariants
3.7 Triangularization
Eigenvalues of Upper Triangular Matrices
Triangularization

4 Inner Products
4.1 Inner Products
The Dot Product in Rⁿ
Inner Product Spaces
Orthogonality
More Examples of Inner Product Spaces
4.2 Orthonormal Bases
Orthonormality
Coordinates in Orthonormal Bases
The Gram-Schmidt Process
4.3 Orthogonal Projections and Optimization
Orthogonal Complements and Direct Sums
Orthogonal Projections
Linear Least Squares
Approximation of Functions
4.4 Normed Spaces
General Norms
The Operator Norm
4.5 Isometries
Preserving Lengths and Angles
Orthogonal and Unitary Matrices
The QR Decomposition

5 Singular Value Decomposition and the Spectral Theorem
5.1 Singular Value Decomposition of Linear Maps
Singular Value Decomposition
Uniqueness of Singular Values
5.2 Singular Value Decomposition of Matrices
Matrix Version of SVD
SVD and Geometry
Low-rank Approximation
5.3 Adjoint Maps
The Adjoint of a Linear Map
Self-adjoint Maps and Matrices
The Four Subspaces
Computing SVD
5.4 The Spectral Theorems
Eigenvectors of Self-adjoint Maps and Matrices
Normal Maps and Matrices
Schur Decomposition

6 Determinants
6.1 Determinants
Multilinear Functions
The Determinant
Existence and Uniqueness of the Determinant
6.2 Computing Determinants
Basic Properties
Determinants and Row Operations
Permutations
6.3 Characteristic Polynomials
The Characteristic Polynomial of a Matrix
Multiplicities of Eigenvalues
The Cayley-Hamilton Theorem
6.4 Applications of Determinants
Volume
Cramer's Rule
Cofactors and Inverses

Appendix
A.1 Sets and Functions
Basic Definitions
Composition and Invertibility
A.2 Complex Numbers
A.3 Proofs
Logical Connectives
Quantifiers
Contrapositives, Counterexamples, and Proof by Contradiction
Proof by Induction

Addendum
Hints and Answers to Selected Exercises
Index
Preface
It takes some chutzpah to write a linear algebra book. With so many choices already available, one must ask (and our friends and colleagues did): what is new here? The most important context for the answer to that question is the intended audience. We wrote the book with our own students in mind; our linear algebra course has a rather mixed audience, including majors in mathematics, applied mathematics, and our joint degree in mathematics and physics, as well as students in computer science, physics, and various fields of engineering. Linear algebra will be fundamental to most if not all of them, but they will meet it in different guises; this course is furthermore the only linear algebra course most of them will take.

Most introductory linear algebra books fall into one of two categories: books written in the style of a freshman calculus text and aimed at teaching students to do computations with matrices and column vectors, or full-fledged "theorem-proof" style rigorous math texts, focusing on abstract vector spaces and linear maps, with little or no matrix computation. This book is different. We offer a unified treatment, building both the basics of computation and the abstract theory from the ground up, emphasizing the connections between the matrix-oriented viewpoint and abstract linear algebraic concepts whenever possible. The result serves students better, whether they are heading into theoretical mathematics or towards applications in science and engineering. Applied math students will learn Gaussian elimination and the matrix form of singular value decomposition (SVD), but they will also learn how abstract inner product space theory can tell them about expanding periodic functions in the Fourier basis. Students in theoretical mathematics will learn foundational results about vector spaces and linear maps, but they will also learn that Gaussian elimination can be a useful and elegant theoretical tool.

Key features of this book include:
• Early introduction of linear maps: Our perspective is that mathematicians invented vector spaces so that they could talk about linear maps; for this reason, we introduce linear maps as early as possible, immediately after the introduction of vector spaces.
• Key concepts referred to early and often: In general, we have introduced topics we see as central (most notably eigenvalues and eigenvectors) as early as we could, coming back to them again and again as we introduce new concepts which connect to these central ideas. At the end of the course, rather than having just learned the definition of an eigenvector a few weeks ago, students will have worked with the concept extensively throughout the term.
• Eases the transition from calculus to rigorous mathematics: Moving beyond the more problem-oriented calculus courses is a challenging transition; the book was written with this transition in mind. It is written in an accessible style, and we have given careful thought to the motivation of new ideas and to parsing difficult definitions and results after stating them formally.
• Builds mathematical maturity: Over the course of the book, the style evolves from extremely approachable and example-oriented to something more akin to the style of texts for real analysis and abstract algebra, paving the way for future courses in which a basic comfort with mathematical language and rigor is expected.
• Fully rigorous, but connects to computation and applications: This book was written for a proof-based linear algebra course, and contains the necessary theoretical foundation of linear algebra. It also connects that theory to matrix computation and geometry as often as possible; for example, SVD is considered abstractly, as the existence of special orthonormal bases for a map; from a geometric point of view, emphasizing rotations, reflections, and distortions; and from a more computational point of view, as a matrix factorization. Orthogonal projection in inner product spaces is similarly discussed in theoretical, computational, and geometric ways, and is connected with applied minimization problems such as linear least squares for curve-fitting and approximation of smooth functions on intervals by polynomials.
• Pedagogical features: There are various special features aimed at helping students learn to read a mathematics text: frequent "Quick Exercises" serve as checkpoints, with answers upside down at the bottom of the page. Each section ends with a list of "Key Ideas," summarizing the main points of the section. Features called "Perspectives" at the end of some chapters collect the various viewpoints on important concepts which have been developed throughout the text.
• Exercises: The large selection of problems is a mix of the computational and the theoretical, the straightforward and the challenging. There are answers or hints to selected problems in the back of the book.

The book begins with linear systems of equations over R, solution by Gaussian elimination, and the introduction of the ideas of pivot variables and free variables. Section 1.3 discusses the geometry of Rⁿ and geometric viewpoints on linear systems. We then move into definitions and examples of abstract fields and vector spaces.
Chapter 2 is on linear maps. They are introduced with many examples: the usual cohort of rotations, reflections, projections, and multiplication by matrices in Rⁿ, and more abstract examples like differential and integral operators on function spaces. Eigenvalues are first introduced in Section 2.1; the representation of arbitrary linear maps on Fⁿ by matrices is proved in Section 2.2. Section 2.3 introduces matrix multiplication as the matrix representation of composition, with an immediate derivation of the usual formula. In Section 2.5, the range, kernel, and eigenspaces of a linear map are introduced. Finally, Section 2.6 introduces the Hamming code as an application of linear algebra over the field of two elements.

Chapter 3 introduces linear dependence and independence, bases, dimension, and the Rank-Nullity Theorem. Section 3.5 introduces coordinates with respect to arbitrary bases and the representation of maps between abstract vector spaces as matrices; Section 3.6 covers change of basis and introduces the idea of diagonalization and its connection to eigenvalues and eigenvectors. Chapter 3 concludes by showing that all matrices over algebraically closed fields can be triangularized.

Chapter 4 introduces general inner product spaces. It covers orthonormal bases and the Gram-Schmidt algorithm, orthogonal projection with applications to least squares and function approximation, normed spaces in general and the operator norm of linear maps and matrices in particular, isometries, and the QR decomposition.

Chapter 5 covers the singular value decomposition and the spectral theorem. We begin by proving the main theorem on the existence of SVD and the uniqueness of singular values for linear maps, then specialize to the matrix factorization. There is a general introduction to adjoint maps and their properties, followed by the Spectral Theorem in the Hermitian and normal cases. Geometric interpretation of SVD and truncations of SVD as low-rank approximation are discussed in Section 5.2. The four fundamental subspaces associated to a linear map, orthogonality, and the connection to the Rank-Nullity Theorem are discussed in Section 5.3.

Finally, Chapter 6 is on determinants. We have taken the viewpoint that the determinant is best characterized as the unique alternating multilinear form on matrices taking value 1 at the identity; we derive many of its properties from that characterization. We introduce the Laplace expansion, give an algorithm for computing determinants via row operations, and prove the sum over permutations formula. The last is presented as a nice example of the power of linear algebra: there is no long digression on combinatorics; instead, permutations are quickly identified with permutation matrices, and concepts like the sign of a permutation arise naturally as familiar linear algebraic constructions. Section 6.3 introduces the characteristic polynomial and the Cayley-Hamilton Theorem, and Section 6.4 concludes the chapter with applications of the determinant to volume and Cramer's rule.

In terms of student prerequisites, one year of calculus is sufficient. While calculus is not needed for any of the main results, we do rely on it for some examples and exercises (which could nevertheless be omitted). We do not expect students to have taken a rigorous mathematics course before. The book is written assuming some basic background on sets, functions, and the concept of a proof; there is an appendix containing what is needed for the student's reference (or crash course).

Finally, some thanks are in order. To write a textbook that works in the classroom, it helps to have a classroom to try it out in. We are grateful to the CWRU Math 307 students from Fall 2014, Spring and Fall 2015, and Spring 2016 for their roles as cheerful guinea pigs. A spectacular feature of the internet age is the ability to get help typesetting a book from someone half-way around the world (where it may in fact be 2 in the morning). We thank the users of tex.stackexchange.com for generously and knowledgeably answering every question we came up with. We began the project of writing this book while on sabbatical at the Institut de Mathématiques de Toulouse at the University of Toulouse, France. We thank the Institut for its warm hospitality and the Simons Foundation for providing sabbatical support. We also thank the National Science Foundation and the Simons Foundation for additional support. And lastly, many thanks to Sarah Jarosz, whose album Build Me Up From Bones provided the soundtrack for the writing of this book.
ELIZABETH MECKES MARK MECKES
Cleveland, Ohio, USA
To the Student
This will be one of the most important classes you ever take. Linear algebra and calculus are the foundations of modern mathematics and its applications; the language and viewpoint of linear algebra is so thoroughly woven into the fabric of mathematical reasoning that experienced mathematicians, scientists, and engineers can forget it is there, in the same way that native speakers of a language seldom think consciously about its formal structure. Achieving this fluency is a big part of that nebulous goal of "mathematical maturity."

In the context of your mathematical education, this book marks an important transition. In it, you will move away from a largely algorithmic, problem-centered viewpoint toward a perspective more consciously grounded in rigorous theoretical mathematics. Making this transition is not easy or immediate, but the rewards of learning to think like a mathematician run deep, no matter what your ultimate career goals are.

With that in mind, we wrote this book to be read - by you, the student. Reading and learning from an advanced mathematics textbook is a skill, and one that we hope this book will help you develop. There are some specific features of this book aimed at helping you get the most out of it. Throughout the book, you will find "Quick Exercises," whose answers are usually found (upside down) at the bottom of the page. These are exercises which you should be able to do fairly easily, but for which you may need to write a few lines on the back of an envelope. They are meant to serve as checkpoints; do them! The end of each section lists "Key Ideas," summarizing (sometimes slightly informally) the big picture of the section. Certain especially important concepts on which there are many important perspectives are summarized in features called "Perspectives" at the end of some chapters. There is an appendix covering the basics of sets, functions, and complex number arithmetic, together with some formal logic and proof techniques. And of course, there are many exercises. Mathematics isn't something to know, it's something to do; it is through the exercises that you really learn how.
Linear Systems and Vector Spaces
1.1 Linear Systems of Equations

Bread, Beer, and Barley

We begin with a very simple example. Suppose you have 20 pounds of raw barley
and you plan to turn some of it into bread and some of it into beer. It takes one pound of barley to make a loaf of bread and a quarter pound of barley to make a pint of beer. You could use up all the barley on 20 loaves of bread, or alternatively, on 80 pints of beer (although that’s probably not advisable). What are your other options? Before rolling up our sleeves and figuring that out, we make the following obvious-seeming observation, without which rolling up our sleeves won't help much.
It's very difficult to talk about something that has no name.
That is, before we can do math we have to have something concrete to do math
to. Therefore: let x be the number of loaves of bread you plan to bake and y be the number of pints of beer you want to wash it down with. Then the information above can be expressed as
x + (1/4)y = 20,    (1.1)
and now we have something real (to a mathematician, anyway) to work with. This object is called a linear equation in two variables. Here are some things to notice about it:
• There are infinitely many solutions to equation (1.1) (assuming you're okay with fractional loaves of bread and fractional pints of beer).
• We only care about the positive solutions, but even so, there are still infinitely many choices. (It's important to notice that our interest in positive solutions is a feature of the real-world situation being modeled, but it's not built into the model itself. We just have to remember what we're doing when interpreting solutions. This caveat may or may not be true of other models.)
• We could specify how much bread we want and solve for how much beer we can have, or vice versa. Or we could specify a fixed ratio of bread to beer (i.e., fix the value of c = x/y) and solve.
• For the graphically inclined, we can draw a picture of all the solutions of this equation in the x-y plane, as follows:

Figure 1.1 Graph of the solutions of equation (1.1).
Each point in the x-y plane corresponds to a quantity of bread and of beer, and
if a point lies on the line above, it means that it is possible to exactly use up all of the barley by making those quantities of bread and beer. We have, of course, drastically oversimplified the situation. For starters, you also need yeast to make both bread and beer. It takes 2 teaspoons yeast to make a loaf of bread and a quarter teaspoon to make a pint of beer. Suppose you have meticulously measured that you have exactly 36 teaspoons of yeast available for your fermentation processes. We now have what's called a linear system of equations, as follows:
x + (1/4)y = 20
2x + (1/4)y = 36.    (1.2)
You could probably come up with a couple of ways to solve this system. Just to give one, if we subtract the first equation from the second, we get

x = 16.

If we then plug x = 16 into the first equation and solve for y, we get

16 + (1/4)y = 20  ⟺  y = 16.
(The symbol ⟺ above is read aloud as "if and only if," and it means that the two equations are equivalent; i.e., that any given value of y makes the equation on the left true if and only if it makes the equation on the right true. Please learn to use this symbol when appropriate; it is not correct to use an equals sign instead.)
We now say that the solution to the linear system (1.2) is x = 16, y = 16. In particular, we've discovered that there's exactly one way to use up all the barley and all the yeast to make bread and beer.
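The elimination above is easy to mirror by machine. The following sketch (ours, not the book's) redoes the subtract-and-back-substitute steps for system (1.2), using Python's exact rational arithmetic so no rounding sneaks in:

```python
from fractions import Fraction

# The barley-and-yeast system (1.2):
#   x  + (1/4) y = 20
#   2x + (1/4) y = 36
a = Fraction(1, 4)

# Subtracting the first equation from the second eliminates y,
# leaving (2 - 1) x = 36 - 20:
x = 36 - 20

# Back-substitute into the first equation: (1/4) y = 20 - x.
y = (20 - x) / a

assert (x, y) == (16, 16)
print(x, y)
```

Nothing here goes beyond the two hand steps in the text; the point is only that the arithmetic checks out exactly.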
Here are some things to notice:

• We can represent the situation modeled by system (1.2) graphically:

Figure 1.2 Graph of the solutions of the equations in system (1.2): blue for the first equation and red for the second.

In Figure 1.2, each line represents all the solutions to one of the equations in system (1.2). The intersection of these two lines (at the point (16, 16)) represents the unique way of using up all of both your ingredients.
• If the amount of yeast had been different, we might have ended up with a solution that did not have both x, y > 0: the system still has a solution, but with either x < 0 or y < 0.
• If you switch to a different bread recipe that only requires 1 teaspoon (tsp) of yeast per loaf, we might instead have infinitely many solutions (as we did before considering the yeast), or none. (How?)
Suppose now that your menu gets more interesting: someone comes along with milk and dried rosemary.* Your bread would taste better with a little of each put into the dough; also, you can use the milk to make a simple cheese, which would also be nice if you flavored it with rosemary. The beer might be good flavored with rosemary, too. Suppose you use 1 cup of milk per loaf of bread, and 8 cups of milk per round of cheese. You put 2 tsps of rosemary in each loaf of bread, 1 tsp in each round of cheese, and 1/4 tsp in each pint of beer. Then we have a new variable z - the number of rounds of cheese - and two new equations, one for the milk and one for the rosemary. Suppose you have 11 gallons (i.e., 176 cups) of milk and 56 tsps of rosemary. Our linear system now becomes:

x + (1/4)y = 20
2x + (1/4)y = 36
x + 8z = 176        (1.3)
2x + (1/4)y + z = 56.

If we go ahead and solve the system, we will find that

x = 16, y = 16, z = 20.

*What great philosopher of the modern era said "A man can live on packaged food from here 'til Judgment Day if he's got enough rosemary."?

QA #1: Anything less than 20 teaspoons of yeast will result in x < 0.
Since we've been given the solution, though, it’s quicker just to check that it’s correct.
Quick Exercise #2. Check that this solution is correct.
In any case, it seems rather lucky that we could solve the system at all; i.e., with four ingredients being divided among three products, we were able to exactly use up everything. A moment's reflection confirms that this was a lucky accident: since it worked out so perfectly, we can see that if you'd had more rosemary but the same amount of everything else, you wouldn't be able to use up everything exactly. This is a first example of a phenomenon we will meet often: redundancy. The system of equations (1.3) is redundant: if we call the equations E₁, E₂, E₃, E₄, then

E₄ = (1/8)E₁ + (7/8)E₂ + (1/8)E₃.

In particular, if (x, y, z) is a solution to all three equations E₁, E₂, and E₃, then it satisfies E₄ automatically. Thus E₄ tells us nothing about the values of (x, y, z) that we couldn't already tell from just the first three equations. The solution x = 16, y = 16, z = 20 is in fact the unique solution to the first three equations, and it satisfies the redundant fourth equation for free.
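Both the solution check of Quick Exercise #2 and the redundancy among the four equations can be verified mechanically. In the sketch below (ours, not the book's), the coefficient rows transcribe system (1.3), and the weights 1/8, 7/8, 1/8 expressing the rosemary equation in terms of the other three are checked coefficient by coefficient:

```python
from fractions import Fraction

F = Fraction
# Rows of system (1.3): coefficients of (x, y, z) and the right-hand side,
# in the order barley, yeast, milk, rosemary.
rows = [([F(1), F(1, 4), F(0)], F(20)),
        ([F(2), F(1, 4), F(0)], F(36)),
        ([F(1), F(0),    F(8)], F(176)),
        ([F(2), F(1, 4), F(1)], F(56))]
sol = (16, 16, 20)

# Quick Exercise #2: x = 16, y = 16, z = 20 satisfies every equation.
for coeffs, rhs in rows:
    assert sum(c * v for c, v in zip(coeffs, sol)) == rhs

# Redundancy: the fourth equation is (1/8)E1 + (7/8)E2 + (1/8)E3.
w = [F(1, 8), F(7, 8), F(1, 8)]
for j in range(3):  # each coefficient of x, y, z
    assert sum(w[i] * rows[i][0][j] for i in range(3)) == rows[3][0][j]
# ... and the right-hand sides combine the same way.
assert sum(w[i] * rows[i][1] for i in range(3)) == rows[3][1]
print("checks pass")
```

The weights themselves come from solving a small linear system by hand; the code only confirms the arithmetic.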
Linear Systems and Solutions

Now we'll start looking at more general systems of equations that resemble the ones that came up above. Later in this book, we will think about various types of numbers, some of which you may never have met. But for now, we will restrict our attention to equations involving real numbers and real variables, i.e., unknown real numbers.

QA #2: If you did anything more complicated than plugging in the numbers 16, 16, and 20, you're making it too hard.
Definition The set of all real numbers is denoted by R. The notation t ∈ R is read "t is in R" or "t is an element of R" or simply "t is a real number."
Definition A linear system of m equations in n variables over R is a set of equations of the form

a₁₁x₁ + ⋯ + a₁ₙxₙ = b₁
⋮                                  (1.4)
aₘ₁x₁ + ⋯ + aₘₙxₙ = bₘ.

Here, aᵢⱼ ∈ R for each pair (i, j) with 1 ≤ i ≤ m and 1 ≤ j ≤ n, and bᵢ ∈ R for each 1 ≤ i ≤ m.
Figure 1.6 Geometrically representing different vectors in R² as linear combinations.

Returning next to the example of v₁ and v₂ from page 28: since v₁ and v₂ do not point in the same direction, the span ⟨v₁, v₂⟩ is a plane in R³. We found algebraically that the vector [1; 1; 1] does not lie in this plane.
With the more geometric viewpoint on vectors in mind, let's return to the 2 × 2 linear system written in vector form in equation (1.14):

x [1; 2] + y [1/4; 1/4] = [20; 36].    (1.14)
Geometrically, asking if the system has a solution is the same as asking whether it is possible to get from the origin in the x-y plane to the point (20, 36) by first going some distance in the direction of [1; 2] (or the opposite direction), and from there going some distance in the direction of [1/4; 1/4] (or the opposite direction). The answer, as we've already seen, is yes:

Figure 1.7 Geometric demonstration that the system (1.14) is consistent.
Moreover, this can be done in one and only one way: it's easy to see in the picture that if we change the value of x, we either overshoot or undershoot, and there's no distance we could then move in the direction of [1/4; 1/4] to hit the point (20, 36).
The situation can change a lot if we make a small change to the system. For example, if you actually had 40 teaspoons of yeast, and you suspect that your beer brewing would work better if you used half a teaspoon of yeast per pint rather than only a quarter, you would have the system

x + (1/4)y = 20
2x + (1/2)y = 40,    (1.17)
or in vector form:

x [1; 2] + y [1/4; 1/2] = [20; 40].    (1.18)
In this case, it is possible to get from the origin in the x-y plane to the point (20, 40) by first going some distance in the direction of [1; 2], and from there going some distance in the direction of [1/4; 1/2], and there are in fact infinitely many ways to do it: the vectors [1; 2] and [1/4; 1/2] both point directly from the origin toward the point (20, 40). What's happening here is that there is redundancy in the geometry; we don't need both of the vectors [1; 2] and [1/4; 1/2] to get to the point (20, 40).

Figure 1.8 Geometric illustration of one solution of the system (1.18).

We can also see the
redundancy in the system from an algebraic point of view: in the system (1.17), the second equation is just the first equation multiplied by 2, so it contains no new information. The situation would again have been entirely different if you had wanted to use more yeast in your brewing process, but you didn't happen to have any extra lying around. Then the new system would have been

x + (1/4)y = 20
2x + (1/2)y = 36,    (1.19)

or in vector form:

x [1; 2] + y [1/4; 1/2] = [20; 36].    (1.20)
The system (1.20) is inconsistent.

Quick Exercise #14. (a) Draw a picture that shows that the system (1.20) is inconsistent. (b) Show algebraically that the system (1.20) is inconsistent.
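Looking ahead a little (the notion of rank only appears later in the book), the trichotomy among systems (1.2), (1.17), and (1.19) can also be checked numerically by comparing the rank of the coefficient matrix with that of the augmented matrix. This NumPy sketch is ours, not the book's:

```python
import numpy as np

# Three versions of the bread-and-beer story, as augmented matrices [A | b]:
#   (1.2): unique solution, (1.17): infinitely many, (1.19): none.
systems = {
    "(1.2)":  np.array([[1.0, 0.25, 20.0], [2.0, 0.25, 36.0]]),
    "(1.17)": np.array([[1.0, 0.25, 20.0], [2.0, 0.50, 40.0]]),
    "(1.19)": np.array([[1.0, 0.25, 20.0], [2.0, 0.50, 36.0]]),
}

for name, aug in systems.items():
    rank_A = np.linalg.matrix_rank(aug[:, :2])   # coefficient matrix only
    rank_aug = np.linalg.matrix_rank(aug)        # with the right-hand side
    if rank_A < rank_aug:
        kind = "inconsistent"
    elif rank_A == aug.shape[1] - 1:             # rank equals number of unknowns
        kind = "unique solution"
    else:
        kind = "infinitely many solutions"
    print(name, kind)
```

The rank comparison is exactly the algebraic redundancy discussed above: in (1.17) the second row adds nothing, while in (1.19) it contradicts the first.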
QA #14: (b) Doubling the first equation gives 2x + (1/2)y = 40, but the second equation of (1.20) says 2x + (1/2)y = 36; since 36 ≠ 40, the system has no solution.

The Geometry of Solutions

So far in this section, the vectors whose geometry we've exploited have been the columns of the augmented matrix of a linear system. The vectors in question came
directly from the coefficients of the system. Another way that geometry can shed light on linear systems is by viewing the solutions of a system as vectors. Recall the system

x + (1/4)y + (1/2)w = 20
2x + (1/4)y = 36            (1.21)
x + 8z + w = 176
describing how much bread, beer, cheese, and muesli can be made from our supply of barley, yeast, and milk. A solution of this system consists of values for x, y, z, w, which we can write as a vector [x; y; z; w] ∈ R⁴. In Section 1.2, we determined that x,
y, and z are the pivot variables, and we found the following formulas for them in terms of the free variable w:

x = 16 + (1/2)w
y = 16 − 4w    (1.22)
z = 20 − (3/16)w.
If we add to these the trivially true formula w = w, then the system (1.22) is equivalent to the vector equation

[x; y; z; w] = [16 + (1/2)w; 16 − 4w; 20 − (3/16)w; w] = [16; 16; 20; 0] + w [1/2; −4; −3/16; 1].
Geometrically, this says that the set of solutions of the system (1.21) is the line in R^4 through the point (16, 16, 20, 0) with direction vector [1/2; −4; −3/16; 1].
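The claim that this line is the solution set can be spot-checked with exact rational arithmetic. The following sketch is illustrative (not part of the text) and assumes the system (1.21) reads x + (1/4)y + (1/2)w = 20, 2x + (1/4)y = 36, x + 8z + w = 176:

```python
from fractions import Fraction as F

# The line of solutions: point + w * direction, in exact rational arithmetic.
point = [F(16), F(16), F(20), F(0)]
direction = [F(1, 2), F(-4), F(-3, 16), F(1)]

def satisfies(v):
    x, y, z, w = v
    return (x + F(1, 4) * y + F(1, 2) * w == 20
            and 2 * x + F(1, 4) * y == 36
            and x + 8 * z + w == 176)

# Check a range of parameter values w; every point on the line should work.
ok = all(satisfies([p + F(w) * d for p, d in zip(point, direction)])
         for w in range(-5, 6))
print(ok)  # True
```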
The same basic approach works with multiple free variables. Consider the 3 × 5 linear system

x1 + 2x2 + 5x4 = −3
−x1 − 2x2 + x3 − 6x4 + x5 = 6    (1.23)
−2x1 − 4x2 − 10x4 + x5 = 8.
Quick Exercise #15. Show that the RREF of the augmented matrix of the system (1.23) is

[1 2 0  5 0 | −3]
[0 0 1 −1 0 |  1]
[0 0 0  0 1 |  2]
This gives us the formulas

x1 = −3 − 2x2 − 5x4
x3 = 1 + x4
x5 = 2

for the pivot variables x1, x3, x5 in terms of the free variables x2 and x4. In vector
form, this becomes

[x1; x2; x3; x4; x5] = [−3 − 2x2 − 5x4; x2; 1 + x4; x4; 2] = [−3; 0; 1; 0; 2] + x2 [−2; 1; 0; 0; 0] + x4 [−5; 0; 1; 1; 0].
In general, we can write the solutions of a consistent linear system with k free variables as a fixed vector plus an arbitrary linear combination of k other fixed vectors.
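The whole procedure — row-reduce, read off the pivot-variable formulas, and package the solutions as a fixed vector plus a combination of k direction vectors — can be mechanized. The sketch below is illustrative Python (not part of the text), assuming the system (1.23) reads x1 + 2x2 + 5x4 = −3, −x1 − 2x2 + x3 − 6x4 + x5 = 6, −2x1 − 4x2 − 10x4 + x5 = 8; it computes the RREF from Quick Exercise #15 exactly:

```python
from fractions import Fraction as F

def rref(M):
    """Row-reduce an augmented matrix (lists of Fractions) to reduced row echelon form."""
    M = [row[:] for row in M]
    pivot_row = 0
    for col in range(len(M[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(M)) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [a / piv for a in M[pivot_row]]  # scale the pivot to 1
        for r in range(len(M)):
            if r != pivot_row and M[r][col] != 0:       # clear the rest of the column
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# Augmented matrix of the 3 x 5 system (1.23).
A = [[F(c) for c in row] for row in [
    [ 1,  2, 0,   5, 0, -3],
    [-1, -2, 1,  -6, 1,  6],
    [-2, -4, 0, -10, 1,  8],
]]
R = rref(A)
print(R)  # rows [1,2,0,5,0 | -3], [0,0,1,-1,0 | 1], [0,0,0,0,1 | 2]
```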
KEY IDEAS

• R^n is the set of column vectors of length n.
• A linear combination of the vectors v1, ..., vn is a vector of the form c1v1 + ... + cnvn for real numbers c1, ..., cn.
• The span of a list of vectors is the set of all linear combinations of those vectors.
EXERCISES
1.3.1 For each pair of vectors below, draw sketches illustrating v, w, v + w, v − w, and 3v − 2w.
1.3.2 For each pair of vectors below, draw sketches illustrating v, w, v + w, v − w, and 3v − 2w.
1.3.3
Describe, in vector form, the set of all solutions of each of the following
linear systems.
(a) 2x − y + z = 3
    −x + y = −2
    x + 2y + 3z = −1
(b) 2x − y + z = 0
    −x + y = 1
    x + 2y + 3z = 5
(c) x + y + z = 1
    x − 2y + 3z = 4
(d) x1 + 2x2 + 3x3 + 4x4 = 5
    6x1 + 7x2 + 8x3 + 9x4 = 10
    11x1 + 12x2 + 13x3 + 14x4 = 15
(e) x − 2y = 0
    −3x + 6y = 0
    2x − 4y = 0
(f) x − 2y = 1
    −3x + 6y = −3
    2x − 4y = 2

1.3.4
Describe, in vector form, the set of all solutions of each of the following linear systems.
(a) x + 2y + 3z = 2
    2x − y + z = 4
    −x + y = −2
(b) x + 2y + 3z = 4
    2x − y + z = 3
    −x + y = −1
(c) 2x − y + 3z = 0
    x + 2y − 3z = 1
(d) x1 − 2x2 + 3x3 − 4x4 = 5
    −6x1 + 7x2 − 8x3 + 9x4 = −10
    11x1 − 12x2 + 13x3 − 14x4 = 15
(e) −x + y = 0
    2x − 2y = 0
    −3x + 3y = 0
(f) −x + y = 3
    2x − 2y = 2
    −3x + 3y = 1
1.3.5 (a) Under what conditions is the vector [b1; b2; b3; b4] in the span of the two given vectors in R^4? Give your answer in terms of an equation or equations satisfied by the entries b1, ..., b4.
(b) Which of the vectors (i)-(iv) below are in the span from part (a)?
1.3.6 (a) Under what conditions is the vector [b1; b2; b3; b4] in the span of the two given vectors in R^4? Give your answer in terms of an equation or equations satisfied by the entries b1, ..., b4.
(b) Which of the vectors (i)-(iv) below are in the span from part (a)?
1.3.7 Describe all possible ways of writing the given vector as a linear combination of the two given vectors.
1.3.8 Describe all possible ways of writing the given vector as a linear combination of the two given vectors.
1.3.9 Describe all possible ways of writing the given vector as a linear combination of the given list of vectors.
1.3.10 Consider a linear system in vector form:

x1v1 + ... + xnvn = b,

where v1, ..., vn, b ∈ R^m. Show that the system is consistent if and only if b ∈ ⟨v1, ..., vn⟩.
1.3.11
Use the Pythagorean Theorem twice to derive the formula on page 29 for the length of a vector in R^3. Include a picture in your solution.
1.3.12
Let v1, v2, v3 ∈ R^3, and suppose that the linear system

xv1 + yv2 + zv3 = 0

has infinitely many solutions. Show that v1, v2, v3 lie in a plane containing the origin in R^3.
1.3.13
Two vectors v, w ∈ R^n are called collinear if v = aw or w = av for some a ∈ R. Show that the span of any two nonzero vectors in R^2 which are not collinear is all of R^2.
1.3.14 Give an example of three vectors u, v, w ∈ R^3, such that no two of them are collinear (see Exercise 1.3.13), but ⟨u, v, w⟩ ≠ R^3.
1.3.15 Give a geometric description of the set of solutions of a consistent linear system over R in three variables if it has k free variables, for k = 0, 1, 2, 3.
1.4 Fields*

General Fields

So far, we've assumed that all the numbers we've worked with - as coefficients or as possible values of variables - have been real numbers. The set R of real numbers is part of a hierarchy of different types of numbers: in one direction, there is the more general class of complex numbers, and in the other direction, there are special classes of real numbers, like integers or rational numbers. We may be interested in solving linear systems with complex coefficients, or we may have a system with integer coefficients and be looking for integer solutions. The first question we want to address is how much of what we've done in the previous sections carries over to other number systems. Our algorithm for solving linear systems was Gaussian elimination, described in the proof of Theorem 1.1, so we begin by examining that algorithm to see what algebraic operations we used.
Quick Exercise #16. Reread the proof of Theorem
1.1, and keep track of what
types of algebraic operation are used.
As you've presumably confirmed, there are only four operations involved: addition, subtraction, multiplication, and division, which suggests the following definition.
Definition A field consists of a set F with operations + and ·, and distinct elements 0, 1 ∈ F, such that all of the following properties hold:

F is closed under addition: For each a, b ∈ F, a + b ∈ F.
F is closed under multiplication: For each a, b ∈ F, ab = a · b ∈ F.
Addition is commutative: For each a, b ∈ F, a + b = b + a.
Addition is associative: For each a, b, c ∈ F, a + (b + c) = (a + b) + c.
0 is an additive identity: For each a ∈ F, a + 0 = 0 + a = a.
Every element has an additive inverse: For each a ∈ F, there is an element b ∈ F such that a + b = b + a = 0.
Multiplication is commutative: For each a, b ∈ F, ab = ba.
Multiplication is associative: For each a, b, c ∈ F, a(bc) = (ab)c.
*Most of the rest of this book works with scalars from an arbitrary field (except Chapters 4 and 5, in which scalars are always real or complex), but this is only essential in Section 2.6. Readers who are interested only in real and complex scalars can skip to Section 1.5, mentally replacing each occurrence of F with R or C.
1 is a multiplicative identity: For each a ∈ F, a · 1 = 1 · a = a.
Every nonzero element has a multiplicative inverse: For each a ∈ F such that a ≠ 0, there is an element c ∈ F such that ac = ca = 1.
Distributive law: For each a, b, c ∈ F, a(b + c) = ab + ac.
For notational convenience, we will denote the additive inverse of a by −a, so that

a + (−a) = (−a) + a = 0,

and the multiplicative inverse (for a ≠ 0) by a^-1, so that

a · a^-1 = a^-1 · a = 1.

We will also call the addition of an additive inverse subtraction and denote it as usual: a − b := a + (−b). Similarly, we call multiplication by a multiplicative inverse division and denote it by

a/b := ab^-1.

Bottom Line: a field consists of a bunch of things that you can add, subtract, multiply, and divide with, and those operations behave basically as you expect.
Examples

1. The real numbers: the set R of real numbers, together with ordinary addition and multiplication, satisfies all the properties above, so R is a field.
2. The complex numbers: the set

C := {a + ib | a, b ∈ R}

of complex numbers, with addition and multiplication defined by

(a + ib) + (c + id) := a + c + i(b + d)

and

(a + ib) · (c + id) := ac − bd + i(ad + bc),

is a field. In particular, recall the following formula for the multiplicative inverse of a complex number.
Quick Exercise #17. Show that if a, b ∈ R and a and b are not both 0, then

(a + ib) (a/(a^2 + b^2) − i b/(a^2 + b^2)) = 1.
Notice that this exercise doesn't just show that (a + ib)^-1 = a/(a^2 + b^2) − i b/(a^2 + b^2); more fundamentally, it shows that a complex number which is a multiplicative inverse of (a + ib) exists at all, which is not a priori* obvious.

3. The integers: Let Z denote the set of integers.† Then Z is not a field: nonzero elements other than 1 and −1 do not have multiplicative inverses in Z.
4. The rational numbers: Let

Q := {m/n | m, n ∈ Z, n ≠ 0}.

Then Q, with the usual addition and multiplication, is a field.‡
5. The field of two elements: Define F2 := {0, 1} with the following addition and multiplication tables:

+ | 0 1        · | 0 1
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

One can simply check that the properties of the field are satisfied. Note in particular that in F2, 1 = −1; this makes F2 a very different place from R, where the only number equal to its own additive inverse is 0. Working with the field F2 is often useful in computer science, since computers internally represent data as strings of 0s and 1s.
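Since F2 has only two elements, the field axioms can be verified exhaustively. A minimal sketch (illustrative Python, not part of the text), modeling addition as XOR and multiplication as AND on the ints 0 and 1:

```python
F2 = (0, 1)
add = lambda a, b: a ^ b  # addition in F2 is XOR
mul = lambda a, b: a & b  # multiplication in F2 is AND

# Exhaustive checks of a few of the field axioms.
assoc_add = all(add(a, add(b, c)) == add(add(a, b), c)
                for a in F2 for b in F2 for c in F2)
distrib = all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
              for a in F2 for b in F2 for c in F2)
add_inverse = all(any(add(a, b) == 0 for b in F2) for a in F2)
one_equals_minus_one = add(1, 1) == 0  # i.e., 1 = -1 in F2

print(assoc_add, distrib, add_inverse, one_equals_minus_one)  # True True True True
```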
Quick Exercise #18. Show that if a, b ∈ F2, then a + b = 1 if and only if a = 1 or b = 1, but not both. This means that addition in F2 models the logical operation called "exclusive or."

Technically, a field consists of the set F together with the operations + and ·, but we will normally just use F to refer to both, with the operations understood. We do need to be careful, however, because the same symbols may mean different
*This is a Latin phrase that mathematicians use a lot. It refers to what you know before doing any deep thinking.
†The letter Z is traditionally used because the German word for integers is "Zahlen."
‡Q stands for "quotient."
things in different contexts: for example, we saw above that 1 + 1 = 0 is true in F2, but not in R.
Arithmetic in Fields The following theorem collects a lot of “obvious” facts about arithmetic in fields. In the field R, these are just the familiar rules of arithmetic, but it’s useful to know that they continue to be true in other fields. When defining a field, we required as few things as we could (so that checking that something is a field is as quick as possible), but all of the following properties follow quickly from the ones in the definition.
Theorem 1.5 All of the following hold for any field F.

1. The additive identity is unique.
2. The multiplicative identity is unique.
3. Additive inverses are unique.
4. Multiplicative inverses are unique.
5. For every a ∈ F, −(−a) = a.
6. For every nonzero a ∈ F, (a^-1)^-1 = a.
7. For every a ∈ F, 0a = 0.
8. For every a ∈ F, (−1)a = −a.
9. If a, b ∈ F and ab = 0, then either a = 0 or b = 0.

In interpreting the final statement of the theorem, recall the following.

In mathematical English, the statement "A or B" always means "A or B or both."
Proof
1. What it means to say that the additive identity is unique is that 0 is the only element of F that functions as an additive identity; i.e., if there is an element a ∈ F with a + b = b for each b ∈ F, then in fact a = 0.
Suppose then that a ∈ F satisfies a + b = b for every b ∈ F. Taking b = 0 implies that a + 0 = 0. By definition, 0 is an additive identity, so a + 0 = a; i.e., 0 = a + 0 = a.
2. Exercise (see Exercise 1.4.16).
3. Exercise (see Exercise 1.4.17).
4. Let a ∈ F be nonzero, and let b ∈ F be such that ab = 1. Then

a^-1 = a^-1 · 1 = a^-1(ab) = (a^-1 a)b = 1 · b = b.
5. For any b ∈ F, the notation −b means the unique additive inverse of b in F (its uniqueness is part 3 of this theorem). That means that we need to show that a acts as the additive inverse of −a, which means a + (−a) = 0. But this is true by the definition of −a.
6. Exercise (see Exercise 1.4.18).
7. Since 0 = 0 + 0,

0a = (0 + 0)a = 0a + 0a,

and so

0 = 0a − 0a = (0a + 0a) − 0a = 0a + (0a − 0a) = 0a + 0 = 0a.
(Note that all we did above was write out, in excruciating detail, "subtract 0a from both sides of 0a = 0a + 0a.")
8. Exercise (see Exercise 1.4.19).
9. Suppose that a ≠ 0. Then by part 7,

b = 1 · b = (a^-1 a)b = a^-1(ab) = a^-1 · 0 = 0.
Remark To prove the statement a = 0 or b = 0 in part 9, we employed the strategy of assuming that a ≠ 0, and showing that in that case we must have b = 0. This is a common strategy when proving that (at least) one of two things must happen: show that if one of them doesn't happen, then the other one has to.
Bottom Line, again: a field consists of a bunch of things that you can add, subtract, multiply, and divide with, and those operations behave basically as you expect.
Example To illustrate that the properties in the definition of a field are all that are needed for lots of familiar algebra, suppose that x is an element of some field F and x^2 = 1 (where, as usual, x^2 := x · x). Then we know

0 = 1 − 1 = x^2 − 1.

By Theorem 1.5 part 8,

0 = x^2 − 1 = x^2 + x − x − 1 = x · x + x · 1 + (−1)x + (−1) · 1,

and then using the distributive law twice,

0 = x(x + 1) + (−1)(x + 1) = (x + (−1))(x + 1) = (x − 1)(x + 1).

By Theorem 1.5 part 9, this means that either x − 1 = 0 or x + 1 = 0. In other words, we've proved that x = 1 and x = −1 are the only solutions of x^2 = 1 in any field.*

*One unexpected thing can happen here, though: in F2, −1 = 1 and so the equation x^2 = 1 actually has only one solution!
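The conclusion — x = 1 and x = −1 are the only solutions of x^2 = 1 in any field — can be checked by brute force in the fields Z/p for p prime. An illustrative sketch, not part of the text:

```python
def square_roots_of_one(p):
    """All x in {0, ..., p-1} with x^2 = 1 modulo p (a field when p is prime)."""
    return [x for x in range(p) if (x * x) % p == 1]

print(square_roots_of_one(2))   # [1]: in F2, 1 = -1, so only one solution
print(square_roots_of_one(5))   # [1, 4]: 1 and -1 = 4
print(square_roots_of_one(11))  # [1, 10]: 1 and -1 = 10
```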
Linear Systems over a Field

Our motivation for the definition of a field was that it provided the context in which we could solve linear systems of equations. For completeness, we repeat the definitions that we gave over R for general fields.

Definition A linear system over F of m equations in n variables is a set of equations of the form

a11x1 + ... + a1nxn = b1,
...    (1.24)
am1x1 + ... + amnxn = bm.

Here, aij ∈ F for each pair (i, j) with 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Show that the set of functions S → F is a vector space over F, equipped with pointwise addition and scalar multiplication.
Let V be a vector space, and suppose that U and W are both subspaces of V. Show that U ∩ W = {v | v ∈ U and v ∈ W} is a subspace of V.
1.5.11 Show that if U1, U2 are subspaces of a vector space V, then U1 + U2 = {u1 + u2 | u1 ∈ U1, u2 ∈ U2} is also a subspace of V.
1.5 Vector Spaces
1.5.12 Show that x^2 + x = 0 for every x ∈ F2. Therefore if the polynomial p(x) = x^2 + x over F2 is interpreted as a function p : F2 → F2, as opposed to a formal expression, it is the same as the function p(x) = 0.
1.5.13 Prove that {v ∈ R^n | vi > 0, i = 1, ..., n} is a real vector space, with "addition" given by component-wise multiplication, "scalar multiplication" given by

c [v1; ...; vn] := [v1^c; ...; vn^c],

and the "zero" vector given by 0 = [1; ...; 1].
1.5.14
Let

V = {(p1, ..., pn) | pi > 0, i = 1, ..., n, and p1 + ... + pn = 1}.

(Geometrically, V is a simplex.) Define an addition operation on V by

(p1, ..., pn) + (q1, ..., qn) = (p1q1, ..., pnqn) / (p1q1 + ... + pnqn),

where the division by p1q1 + ... + pnqn is ordinary division of a vector by a scalar. Define a scalar multiplication operation by

λ(p1, ..., pn) = (p1^λ, ..., pn^λ) / (p1^λ + ... + pn^λ).

Show that V is a real vector space.
1.5.15 Let S be any set. For subsets A, B ⊆ S, the symmetric difference is the set A△B defined by the property that s is in A△B if and only if s is in either A or B, but not in both of them. That is,

A△B = (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A)

in set-theoretic notation. Let V be the collection of all subsets of S. For A, B ∈ V (so A, B ⊆ S), define A + B = A△B. Furthermore, define 0A = ∅ and 1A = A. Prove that V is a vector space over the field F2, with zero vector the empty set ∅.
Hint: Check the conditions in the definition of vector space in order, to be sure you don't miss any. In proving the properties of scalar multiplication, since F2 has so few elements you can check them all directly.
Linear Maps and Matrices

Throughout this chapter (and, in fact, for much of the rest of the book) we will be working with multiple vector spaces at the same time; it will always be understood that all of our vector spaces have the same base field F.
2.1 Linear Maps

Recognizing Sameness

One of the most fundamental problems of mathematics is to be able to determine when two things which appear to be different are actually the same in some meaningful way. We have already come across some examples of this phenomenon, without explicitly noting them as such. For example, consider the following vector spaces over R:

R^n = { [a1; ...; an] | a1, ..., an ∈ R },

and

P_{n−1}(R) = { a0 + a1x + ... + a_{n−1}x^{n−1} | a0, a1, ..., a_{n−1} ∈ R }.

On the face of it, these two sets appear to be completely different; we have vectors in n-dimensional space on the one hand, and polynomials on the other. Even so, the two spaces feel kind of the same: really, they consist of ordered lists of numbers, and the vector space operations work the same way, that is, componentwise. Seeing and being able to formally describe this kind of identical structure in different guises is part of what mathematics is for, and motivates the following definition.
Definition Let V and W be vector spaces. A function T : V → W is called a linear map (or linear transformation, or linear operator) if it has both of the following properties.

Additivity: For each u, v ∈ V, T(u + v) = T(u) + T(v).
Homogeneity: For each v ∈ V and a ∈ F, T(av) = aT(v).

We refer to V as the domain of T and W as the codomain of T. The set of all linear maps from V to W is denoted L(V, W). When V and W are the same space, we write L(V) := L(V, V).
That is, a linear map between vector spaces is exactly one that respects the vector space structure: performing the vector space operations of addition and scalar multiplication works the same way whether you do it before or after applying the linear map. The map thus serves as a kind of translation between the two settings which lets us formalize the idea that the operations work the same way in both spaces. Sometimes there is no need for translation: if V is a vector space, the identity operator* on V is the function I : V → V such that for each v ∈ V, I(v) = v.
For a more substantial example, consider the following map T : R^n → P_{n−1}(R):

T([a1; ...; an]) = a1 + a2x + ... + an x^{n−1}.    (2.1)
Then T is linear:

T([a1; ...; an] + [b1; ...; bn]) = T([a1 + b1; ...; an + bn])
= (a1 + b1) + (a2 + b2)x + ... + (an + bn)x^{n−1}
= (a1 + a2x + ... + an x^{n−1}) + (b1 + b2x + ... + bn x^{n−1})
= T([a1; ...; an]) + T([b1; ...; bn]),

and

T(c[a1; ...; an]) = T([ca1; ...; can])
= ca1 + ca2x + ... + can x^{n−1}
= c(a1 + a2x + ... + an x^{n−1})
= cT([a1; ...; an]),

thus formalizing our observation that the vector space operations on vectors and polynomials are fundamentally the same.

*This is another place where we will embrace ambiguity in our notation: we use the same symbol for the identity operator on every vector space.
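The computation above can be mirrored numerically by modeling T as the map from coefficient lists to polynomial functions. A sketch (illustrative Python, not part of the text; additivity and homogeneity are checked by evaluating at a sample point):

```python
def T(v):
    """Send (a1, ..., an) to the polynomial function a1 + a2*x + ... + an*x**(n-1)."""
    return lambda x: sum(a * x**k for k, a in enumerate(v))

v, w, c = [1, 2, 3], [4, 0, -2], 5
x0 = 7  # sample evaluation point

additive = T([a + b for a, b in zip(v, w)])(x0) == T(v)(x0) + T(w)(x0)
homogeneous = T([c * a for a in v])(x0) == c * T(v)(x0)
print(additive, homogeneous)  # True True
```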
Linear Maps in Geometry

Even though we motivated linear maps as a way of connecting the vector space operations on two different spaces, linear maps play many other fundamental roles within mathematics. For example, it turns out that many familiar examples of geometric and algebraic transformations define linear maps; we will in particular see below that reflections and rotations are linear. The common thread between transforming vectors into polynomials and transforming arrows in space through rotations and reflections is the fact that all of these transformations preserve the vector space structure.

Examples

1. Let T be the map of R^3 given by reflection across the x-y plane:

Figure 2.1 Reflection across the x-y plane.

The following picture demonstrates that T is additive:

Figure 2.2 Additivity of reflection.

We leave the (simpler) corresponding picture demonstrating homogeneity as Exercise 2.1.1.
2. Let Rθ be the map on R^2 given by rotation counterclockwise by θ radians:

Figure 2.3 Rotation by θ.

Again, one can see the linearity of Rθ in pictures:

Figure 2.4 Additivity of rotation.
Matrices as Linear Maps

Let A = [aij] ∈ Mm,n(F) be an m × n matrix with columns a1, ..., an ∈ F^m. For v ∈ F^n, define

Av := v1a1 + ... + vnan.    (2.3)

In this way A defines a map from F^n to F^m,* and this map is linear: for v, w ∈ F^n,

A(v + w) = (v1 + w1)a1 + ... + (vn + wn)an = (v1a1 + ... + vnan) + (w1a1 + ... + wnan) = Av + Aw.

Checking homogeneity is similar. Formula (2.3) makes it easy to see that if ej ∈ F^n is the vector with a 1 in the jth position and zeroes everywhere else, then Aej = aj; that is,

*Watch the order of m and n carefully: an m × n matrix goes from F^n to F^m!
Aej is exactly the jth column of the matrix A.

As in R^n, the vectors e1, ..., en ∈ F^n are called the standard basis vectors in F^n.

Examples

1. The n × n identity matrix In ∈ Mn(F) is the n × n matrix with 1s on the diagonal and zeroes elsewhere. In terms of columns,

In = [e1 ⋯ en].

By (2.3), it follows that

In v = v1e1 + ... + vnen = v

for each v ∈ F^n. That is, the identity matrix In acts as the identity operator I ∈ L(F^n), hence its name.
2. An n × n matrix A is called diagonal if aij = 0 whenever i ≠ j. That is,

A =
[a11       0 ]
[    ⋱       ]
[ 0       ann].

We sometimes denote the diagonal matrix with diagonal entries d1, ..., dn ∈ F as

diag(d1, ..., dn).

Matrix-vector products are particularly simple with diagonal matrices, as the following quick exercise shows.

Quick Exercise #3. Let D = diag(d1, ..., dn), and let x ∈ F^n. Show that

Dx = [d1x1; ...; dnxn].
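Quick Exercise #3 amounts to the observation that in the column formula Av = v1a1 + ... + vnan, the jth column of a diagonal matrix contributes only to the jth entry. A sketch (illustrative Python, not part of the text):

```python
def matvec(A, v):
    """Matrix-vector product, entrywise: (Av)_i = sum_j A[i][j] * v[j]."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def diag(*d):
    """Square diagonal matrix with the given diagonal entries."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

D = diag(2, -1, 3)
x = [5, 7, 11]
print(matvec(D, x))  # [10, -7, 33], i.e., [2*5, -1*7, 3*11]
```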
3. Consider the 3 × 3 matrix

A =
[0 1 0]
[0 0 1]
[1 0 0].

Then for [x; y; z] ∈ R^3,

A [x; y; z] = [y; z; x].

That is, reordering the coordinates of a vector in this way is a linear transformation of R^3. More generally, any reordering of coordinates is a linear map on F^n.
Eigenvalues and Eigenvectors

For some linear maps from a vector space to itself, there are special vectors called eigenvectors, on which the map acts in a very simple way. When they exist, these vectors play an important role in understanding and working with the maps in question.

Definition Let V be a vector space over F and T ∈ L(V). The vector v ∈ V is an eigenvector for T with eigenvalue* λ ∈ F if v ≠ 0 and Tv = λv. If A ∈ Mn(F), the eigenvalues and eigenvectors of A are the eigenvalues and eigenvectors of the map in L(F^n) defined by v ↦ Av.

That is, eigenvectors are nonzero vectors on which T acts by scalar multiplication. Geometrically, one can think of this as saying that if v is an eigenvector of T, then applying T may change its length but not its direction.† More algebraically, if v ∈ V is an eigenvector of a linear map T, then the set ⟨v⟩ of all scalar multiples of v is invariant under T; that is, if w ∈ ⟨v⟩, then T(w) ∈ ⟨v⟩ as well.

*These terms are half-way translated from the German words Eigenvektor and Eigenwert. The German adjective "eigen" can be translated as "own" or "proper," so an eigenvector of T is something like "T's very own vector" or "a right proper vector for T." The fully English phrases "characteristic vector/value" and "proper vector/value" are sometimes used, but they're considered quite old-fashioned these days.
†The exact opposite direction counts as the same.
Quick Exercise #4. Prove the preceding statement: if v ∈ V is an eigenvector of a linear map T and w ∈ ⟨v⟩, then T(w) ∈ ⟨v⟩ as well.
Examples

1. Consider the linear map T : R^2 → R^2 given by

T [x; y] = [x; 2y].

It is easy to visualize what this map does to R^2:

Figure 2.5 Result of multiplication by [1 0; 0 2].

The vector e1 := [1; 0] is an eigenvector with eigenvalue 1: T does not change it at all. The vector e2 := [0; 1] is an eigenvector with eigenvalue 2: T stretches e2 by a factor of 2 but does not change its direction.
2. Recall the map T : R^3 → R^3 given by reflection across the x-y plane:
Figure 2.6 Reflection across the x-y plane.
It's easy to recognize some eigenvalues and eigenvectors of T: any vector that lies in the x-y plane is unchanged by T and is thus an eigenvector of T with eigenvalue 1. On the other hand, if v lies along the z-axis, then its reflection across the x-y plane is exactly −v, and so any such v is an eigenvector of T with eigenvalue −1.

3. Recall the map Rθ : R^2 → R^2 given by rotation by θ:

Figure 2.7 Rotation by θ.

It is not too hard to visualize that (unless θ = 0 or θ = π), Rθ has no eigenvectors.
Quick Exercise #5. What are the eigenvectors (and corresponding eigenvalues) of Rπ? What about R0?

4. Recall the diagonal matrix D = diag(d1, ..., dn).
Notice that the jth column of D = diag(d1, ..., dn) is exactly djej. Thus Dej = djej, and so each ej is an eigenvector of D with eigenvalue dj.
5. Consider the matrix

A =
[ 0  1]
[−1  0].

We want to find all possible x, y, and λ such that x and y are not both 0 and

A [x; y] = λ [x; y].

This reduces to the very simple linear system

y = λx,
−x = λy.    (2.4)

Substituting the second equation into the first, we get that y = −λ^2 y, so either y = 0 or λ^2 = −1. But if y = 0 then the second equation tells us x = 0 as well, which is not allowed. Therefore any eigenvalue λ of A satisfies λ^2 = −1. At this point we should notice that we've been vague about the field F. If we think of A as a matrix in M2(R), then we've found that A has no real eigenvalues. But if we let ourselves work with complex scalars, then λ = ±i are both possibilities. Once we pick one of these values of λ, the linear system (2.4) shows us that every vector [x; ix] for nonzero x ∈ C is an eigenvector of A with eigenvalue i, and every vector [x; −ix] for nonzero x ∈ C is an eigenvector of A with eigenvalue −i.
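The eigenvector claim can be verified with exact complex arithmetic. A sketch (illustrative Python, not part of the text), assuming A = [[0, 1], [-1, 0]] with claimed eigenvectors [x; ix] and [x; −ix]:

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[0, 1], [-1, 0]]
x = 3 + 2j  # any nonzero complex scalar works here

v_plus = [x, 1j * x]    # claimed eigenvector for eigenvalue i
v_minus = [x, -1j * x]  # claimed eigenvector for eigenvalue -i

print(matvec(A, v_plus) == [1j * t for t in v_plus])     # True: A v = i v
print(matvec(A, v_minus) == [-1j * t for t in v_minus])  # True: A v = -i v
```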
6. Here is an example of finding the eigenvalues and eigenvectors of a map without the aid of geometric intuition or unusually simple algebra. Consider the matrix

A =
[3 −1]
[1  1]
∈ M2(R),

acting as a linear map on R^2. We want to find all possible real numbers x, y, and λ such that

A [x; y] = λ [x; y]

and such that [x; y] ≠ [0; 0]. Subtracting* λ [x; y] from both sides shows that the matrix equation above is equivalent to the linear system given by

*Subtraction is a surprisingly good trick!

[3−λ   −1 | 0]
[ 1   1−λ | 0]

→

[ 1   1−λ | 0]
[3−λ   −1 | 0]

→

[ 1   1−λ               | 0]
[ 0   −(1−λ)(3−λ) − 1   | 0].    (2.5)

We can see from the second row that for [x; y]
to be a solution, either y = 0 or

−(1 − λ)(3 − λ) − 1 = −(λ − 2)^2 = 0.

If y = 0, then from the first row we get that x = 0 as well, but [0; 0] doesn't count as an eigenvector. We thus need λ = 2 for the equation to have a solution; in that case, the system in (2.5) becomes
[1  −1 | 0]
[0   0 | 0],

so that any [x; y] with x = y is an eigenvector. That is, we have shown that λ = 2 is the only eigenvalue of the map given by multiplication by A, and the corresponding eigenvectors are those vectors [x; x] with x ≠ 0.
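The conclusion of this example is easy to double-check. A sketch (illustrative Python, not part of the text), assuming A = [[3, -1], [1, 1]] as above:

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[3, -1], [1, 1]]
v = [4, 4]  # any vector with x = y

print(matvec(A, v))  # [8, 8] = 2 * v, so v is an eigenvector with eigenvalue 2

# The quadratic -(1 - lam)(3 - lam) - 1 = -(lam - 2)^2 vanishes only at lam = 2.
char = lambda lam: -(1 - lam) * (3 - lam) - 1
print([lam for lam in range(-10, 11) if char(lam) == 0])  # [2]
```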
The Matrix-Vector Form of a Linear System

The notation of matrix-vector multiplication introduced earlier in this section allows us to connect linear systems of equations and linear maps, as follows. Let

a11x1 + ... + a1nxn = b1
...    (2.6)
am1x1 + ... + amnxn = bm

be a linear system over F. If A := [aij]
2.2 More on Linear Maps
Our map T : R^n → P_{n−1}(R) defined in (2.1) is an isomorphism: if

T([a1; ...; an]) = T([b1; ...; bn]),

then

a1 + a2x + ... + an x^{n−1} = b1 + b2x + ... + bn x^{n−1},

so ai = bi for each i ∈ {1, ..., n}; that is,

[a1; ...; an] = [b1; ...; bn],

and so T is injective. Given a polynomial a0 + a1x + ... + a_{n−1}x^{n−1},

T([a0; a1; ...; a_{n−1}]) = a0 + a1x + ... + a_{n−1}x^{n−1},

and so T is surjective. That is, the sense in which R^n and P_{n−1}(R) are the same is exactly that they are
isomorphic vector spaces over R.

Proposition A.2 in the Appendix says that a function f : X → Y is bijective if and only if it is invertible, that is, if and only if it has an inverse function f^-1 : Y → X. In the context of linear maps, we restate this as follows.

Proposition 2.1 Let T ∈ L(V, W). Then T is an isomorphism if and only if T is invertible.

A priori, a linear map T ∈ L(V, W) could turn out to have an inverse function which is not linear. The following result shows that this cannot actually happen.
Proposition 2.2 If T : V → W is linear and invertible, then T^-1 : W → V is also linear.

Proof To check the additivity of T^-1, first observe that by additivity of T and the fact that T ∘ T^-1 = I ∈ L(W),

T(T^-1(w1) + T^-1(w2)) = T(T^-1(w1)) + T(T^-1(w2)) = w1 + w2.

Applying T^-1 to both sides of the equation above then gives

T^-1(w1) + T^-1(w2) = T^-1(w1 + w2).

Similarly, if c ∈ F and w ∈ W, then

T(cT^-1(w)) = cT(T^-1(w)) = cw,
and so cT^-1(w) = T^-1(cw).
Example Consider the subspace

U = ⟨[1; 2; 1], [−1; −1; 0]⟩ ⊆ R^3.

Geometrically, it is clear that U is a plane, and so it should be isomorphic to R^2. If we define S : R^2 → U by

S [x; y] = x [1; 2; 1] + y [−1; −1; 0],

then it is easy to check that S is linear. It is also clear that S is surjective, since every vector in U can be written as x [1; 2; 1] + y [−1; −1; 0] for some x and y. To see that S is injective, suppose that

x1 [1; 2; 1] + y1 [−1; −1; 0] = x2 [1; 2; 1] + y2 [−1; −1; 0],

or equivalently,

u [1; 2; 1] + v [−1; −1; 0] = [0; 0; 0],

where u = x1 − x2 and v = y1 − y2. Solving this linear system for u and v shows that the only solution is u = v = 0, so x1 = x2 and y1 = y2. That is, S is injective and thus an isomorphism. From this, we can also conclude that there is a well-defined bijective linear map T : U → R^2 with T(S[x; y]) = [x; y]; T is exactly S^-1.
Properties of Linear Maps

Knowing that a map is linear puts quite a lot of restrictions on what the images of sets can look like. As a first result, for linear maps the image of the zero vector is completely determined.

Theorem 2.3 If T ∈ L(V, W), then T(0) = 0.
Proof By linearity,

T(0) = T(0 + 0) = T(0) + T(0);

subtracting T(0) from both sides gives

0 = T(0).

The result above says that if, for example, a function T : V → V is known to be linear, then 0 is necessarily a fixed point for T; that is, T(0) = 0. For further restrictions on what linear maps can do to sets, see Exercises 2.2.10 and 2.2.11.
If S and T are linear maps such that the composition S ∘ T makes sense, we usually write it simply as ST. If T ∈ L(V), then we denote T ∘ T by T^2, and more generally, T^k denotes the k-fold composition of T with itself.
Proposition 2.4 Let U, V, and W be vector spaces. If T : U → V and S : V → W are both linear, then ST is linear.

Proof See Exercise 2.2.19.

Recall that addition and scalar multiplication of functions is defined pointwise, as follows.

Definition Let V and W be vector spaces over F and let S, T : V → W be linear maps. Then

(S + T)(v) := S(v) + T(v),

and for c ∈ F,

(cT)(v) := c(T(v)).

These definitions give the collection of linear maps between two vector spaces its own vector space structure:
Theorem 2.5 Let V and W be vector spaces over F, let S, T : V → W be linear maps, and let c ∈ F. Then S + T and cT are linear maps from V to W. Moreover, L(V, W) is itself a vector space over F with these operations.
Proof See Exercise 2.2.20.
The operations interact nicely with composition of linear maps.
Theorem 2.6 (Distributive Laws for Linear Maps) Let U, V, and W be vector spaces and let T, T1, T2 ∈ L(U, V) and S, S1, S2 ∈ L(V, W). Then

1. S(T1 + T2) = ST1 + ST2,
2. (S1 + S2)T = S1T + S2T.

Proof We will give the proof of part 1; part 2 is trivial.

Let u ∈ U. By the linearity of S,

S(T1 + T2)(u) = S[T1(u) + T2(u)] = S[T1(u)] + S[T2(u)] = (ST1 + ST2)(u).

That is, S(T1 + T2) = ST1 + ST2.
Composition of linear maps satisfies an associative law (part 1 of Lemma A.1) and distributive laws, just like multiplication of real numbers, or elements of a field in general. For this reason, we often think of composition of linear maps as a kind of product. There's one very important difference, though: composition does not satisfy a commutative law. That is, ST (which means "do T first, then do S") is in general different from TS (which means "do S first, then do T"). For this reason, parts 1 and 2 of Theorem 2.6 have to be stated separately, unlike the single corresponding distributive law in fields.
Quick Exercise #8. What are the domains and codomains of ST and TS in each of the following situations?
(a) S ∈ L(V, W) and T ∈ L(W, V).
(b) S ∈ L(U, V) and T ∈ L(V, W).

Theorem 2.7 A function T : V → W is a linear map if and only if, for every list v1, ..., vk ∈ V and a1, ..., ak ∈ F,

T(a1v1 + ... + akvk) = a1T(v1) + ... + akT(vk).
Proof Suppose that T : V → W is linear, and let v1, ..., vk ∈ V and a1, ..., ak ∈ F. Then by repeated applications of the additivity of T, followed by an application of homogeneity to each term,

T(a1v1 + ... + akvk) = T(a1v1 + ... + a_{k-1}v_{k-1}) + T(akvk)
= T(a1v1 + ... + a_{k-2}v_{k-2}) + T(a_{k-1}v_{k-1}) + T(akvk)
= ...
= T(a1v1) + ... + T(akvk)
= a1T(v1) + ... + akT(vk).

Conversely, if T : V → W has the property that for every list v1, ..., vk ∈ V and a1, ..., ak ∈ F,

T(a1v1 + ... + akvk) = a1T(v1) + ... + akT(vk),

then taking k = 2 and a1 = a2 = 1 gives that, for every v1, v2,

T(v1 + v2) = T(v1) + T(v2),

and taking k = 1 gives that for every a ∈ F and v ∈ V,

T(av) = aT(v).
The Matrix of a Linear Map

We saw in Section 2.1 that multiplication of a vector by a matrix A ∈ M_{m,n}(F) gives a linear map Fⁿ → Fᵐ. In fact, it turns out that every linear map from Fⁿ to Fᵐ is of this type.

Theorem 2.8 Let T ∈ L(Fⁿ, Fᵐ). Then there is a unique matrix A ∈ M_{m,n}(F) such that for every v ∈ Fⁿ,

   T(v) = Av.

We call A the matrix of T, or say that T is represented by A.

Proof Recall that we saw when defining matrix–vector multiplication that if eⱼ is the vector with 1 in the jth entry and zeroes elsewhere, then Aeⱼ is exactly the jth column of A. This means that, given T, if there is a matrix A so that T(v) = Av for every v ∈ Fⁿ, then its jth column must be given by T(eⱼ). That is, there's only
Figure 2.8 The effect of R_θ on e₁ (left) and on e₂ (right).
one way this could possibly work,* and if it does work, then the uniqueness of the matrix comes for free. To confirm that it really does work, for each j = 1, ..., n, define aⱼ = T(eⱼ) ∈ Fᵐ, and then define A = [a₁ ⋯ aₙ] to be the matrix with those columns. Then given any vector v = [v₁ ⋯ vₙ]ᵀ ∈ Fⁿ,

   T(v) = T(∑_{j=1}^n vⱼeⱼ) = ∑_{j=1}^n vⱼT(eⱼ) = ∑_{j=1}^n vⱼaⱼ = Av

by the linearity of T and equation (2.3). ▲
It’s worth emphasizing what we saw in the first paragraph of the proof above:
The jth column of the matrix of T is T(eⱼ).
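The boxed recipe can be carried out numerically. The sketch below (not from the book) builds the matrix of a linear map on Rⁿ column by column, using NumPy; the example map `T` is an arbitrary choice for illustration.

```python
import numpy as np

def matrix_of(T, n):
    """Build the matrix of a linear map T : R^n -> R^m.

    Column j is T(e_j), exactly as in the boxed fact above.
    """
    columns = [T(np.eye(n)[:, j]) for j in range(n)]
    return np.column_stack(columns)

# An assumed example map: T(x, y) = (x + 2y, 3y)
T = lambda v: np.array([v[0] + 2 * v[1], 3 * v[1]])
A = matrix_of(T, 2)

v = np.array([5.0, -1.0])
assert np.allclose(A @ v, T(v))  # the matrix reproduces the map
```

Once A is in hand, every evaluation of T reduces to a matrix–vector product.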
Example Let θ ∈ [0, 2π), and recall the linear map R_θ : R² → R² given by counterclockwise rotation by θ. To determine the matrix A_θ of R_θ, we need to compute R_θ(eⱼ) for j = 1, 2. This is trigonometry:

   R_θ(e₁) = [cos(θ), sin(θ)]ᵀ   and   R_θ(e₂) = [−sin(θ), cos(θ)]ᵀ.

The matrix A_θ of R_θ is therefore

   A_θ = [cos(θ)  −sin(θ); sin(θ)  cos(θ)].

From this we can tell that for each [x, y]ᵀ ∈ R²,

   R_θ([x, y]ᵀ) = [cos(θ)  −sin(θ); sin(θ)  cos(θ)][x; y] = [x cos(θ) − y sin(θ); x sin(θ) + y cos(θ)].

Notice what linearity has bought us: doing two very simple trigonometry calculations gives us the answers to infinitely many more complicated trigonometry problems. This fact is extremely useful, for example in computer graphics, where you may want to quickly rotate a large number of different points. ▲

*Which makes this what the mathematician Tim Gowers calls a "just do it" proof.
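A quick numerical check of the rotation example, with the computer-graphics point: one matrix product rotates many points at once. This is an illustrative sketch, not code from the book.

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counterclockwise

# The matrix A_theta from the example above
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Rotating e1 gives e2, as the two trigonometry computations predict
assert np.allclose(A @ np.array([1.0, 0.0]), [0.0, 1.0])

# Linearity in action: one product rotates a whole batch of points.
points = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]]).T  # columns are points
rotated = A @ points
assert np.allclose(rotated[:, 2], [-4.0, 3.0])  # (3, 4) rotates to (-4, 3)
```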
Quick Exercise #9. Let R : R² → R² be the reflection across the line y = x. Find the matrix of R.
Theorem 2.8 gives a function from the space L(Fⁿ, Fᵐ) of linear maps to the space M_{m,n}(F) of matrices. The following theorem shows that this function is actually an isomorphism; L(Fⁿ, Fᵐ) and M_{m,n}(F) are the same space in different guises. In particular, addition and scalar multiplication of linear maps correspond to addition and scalar multiplication of matrices.

Theorem 2.9 Let C : L(Fⁿ, Fᵐ) → M_{m,n}(F) be the map which associates a linear map to its matrix. Then C is an isomorphism of vector spaces.
QA #9: R(e₁) = [0, 1]ᵀ and R(e₂) = [1, 0]ᵀ, so the matrix of R is [0 1; 1 0].
Proof The easiest way to show that C is invertible is to show that the map D : M_{m,n}(F) → L(Fⁿ, Fᵐ) defined by [D(A)](v) = Av is an inverse of C. This is done in Exercise 2.2.18. It remains to show that C is linear.

Let T₁, T₂ ∈ L(Fⁿ, Fᵐ). Then the jth column of the matrix of T₁ + T₂ is given by (T₁ + T₂)(eⱼ) = T₁eⱼ + T₂eⱼ, which is the sum of the jth columns of C(T₁) and C(T₂), so C(T₁ + T₂) = C(T₁) + C(T₂). Similarly, if a ∈ F and T ∈ L(Fⁿ, Fᵐ), then the jth column of aT is given by (aT)(eⱼ) = a(T(eⱼ)), so that C(aT) = aC(T). ▲
Some Linear Maps on Function and Sequence Spaces

We saw in Section 2.1 that many familiar geometric operations define linear maps. It is also the case that familiar operations from calculus are linear operations on function spaces.

Examples

1. Let C¹[a, b] be the set of continuously differentiable* functions f : [a, b] → R. We have already seen (see the examples on page 57) that C¹[a, b] is a vector space. The differentiation operator D : C¹[a, b] → C[a, b] defined by Df(x) = f′(x) is a linear map: if f, g ∈ C¹[a, b], then

   D(f + g)(x) = (f + g)′(x) = f′(x) + g′(x) = Df(x) + Dg(x),

   and if c ∈ R, then

   D(cf)(x) = (cf)′(x) = cf′(x) = cDf(x).

2. Let h ∈ C[a, b] be fixed. Then we can define a multiplication operator M_h : C[a, b] → C[a, b] by M_h f(x) = h(x)f(x). It is straightforward to check that M_h is linear. (Note that for our notation M_h : C[a, b] → C[a, b] to be legitimate, one also has to recall from calculus that if f and h are continuous, then so is hf.)

*That is, having one continuous derivative.
3.
Let k : R² → R be continuous, and define an operator (called an integral operator) T_k : C[a, b] → C[a, b] by

   T_k f(x) = ∫ₐᵇ k(x, y)f(y) dy.   (2.9)

The operator T_k is linear: if f, g ∈ C[a, b], then by the linearity of the integral,

   T_k(f + g)(x) = ∫ₐᵇ k(x, y)[f(y) + g(y)] dy = ∫ₐᵇ k(x, y)f(y) dy + ∫ₐᵇ k(x, y)g(y) dy = T_k f(x) + T_k g(x).

Similarly, if c ∈ R,

   T_k(cf)(x) = ∫ₐᵇ k(x, y)cf(y) dy = c ∫ₐᵇ k(x, y)f(y) dy = cT_k f(x).

The function k is called the (integral) kernel of the operator T_k.
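The kernel-as-matrix analogy described next can be made concrete by discretizing: sampling [a, b] at n points turns T_k into multiplication by an n × n matrix. A sketch (not from the book), with an arbitrarily chosen kernel k(x, y) = xy and f(y) = y²:

```python
import numpy as np

# Discretize [a, b] into n sample points; a Riemann sum turns the
# integral operator T_k into multiplication by an n x n matrix whose
# (i, j) entry is k(x_i, y_j) * dy -- the kernel acts as a "continuous matrix".
a, b, n = 0.0, 1.0, 200
x = np.linspace(a, b, n)
dy = (b - a) / n

k = lambda s, t: s * t               # assumed example kernel k(x, y) = xy
K = k(x[:, None], x[None, :]) * dy   # matrix version of T_k

f = x**2                             # samples of f(y) = y^2
Tf = K @ f                           # approximates x * ∫_0^1 y * y^2 dy = x / 4

assert np.allclose(Tf, x / 4, atol=1e-2)
```

Refining the grid (larger n) makes the matrix approximation of T_k more accurate.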
Although it may appear more exotic, defining a linear map in terms of an integral kernel in this way is exactly analogous to defining a linear map on Rⁿ using a matrix. Begin by thinking of a vector v = [v₁ ⋯ vₙ]ᵀ ∈ Rⁿ as a function v : {1, ..., n} → R given by v(i) = vᵢ, and think of an m × n matrix A = [a_{ij}] as a function a : {(i, j) | 1 ≤ i ≤ m, 1 ≤ j ≤ n} → R.
2.2.4 Find the matrix of the map T : R² → R² which first reflects across the y-axis, then rotates counterclockwise by π/4 radians, then stretches by a factor of 2 in the y-direction.
2.2.8 Show that if 0 is an eigenvalue of T ∈ L(V), then T is not injective, and therefore not invertible.
2.2.9 Suppose that T ∈ L(V) is invertible and v ∈ V is an eigenvector of T with eigenvalue λ ∈ F. Show that v is also an eigenvector of T⁻¹. What is the corresponding eigenvalue?
2.2.10 If x, y ∈ Rⁿ, the line segment between x and y is the set

   L := {(1 − t)x + ty | 0 ≤ t ≤ 1}.

   Show that if T : Rⁿ → Rⁿ is linear, then T(L) is also a line segment.
2.2.11 Show that the image of the unit square {(x, y) | 0 ≤ x, y ≤ 1} in R² under a linear map is a parallelogram.
2.2.12 Suppose that T ∈ L(U, V), S ∈ L(V, W), and c ∈ F. Show that S ∘ (cT) = c(ST).
Consider the linear map T : R? — R? given by the matrix
na|"
2],
0
(a)
2.2.14
Show
that
°]
and
[:]
2
are eigenvectors,
and
determine
the
corresponding eigenvalues, (b) Draw the image of the unit square {(r, y) | 0 < x,y < 1} under T. Consider the linear map T : R? — R? given by the matrix
a-[} i} (a)
Show
that
1
fl
and
1
;
7
are eigenvectors,
A
and determine the
corresponding eigenvalues.
(b) PP pI}
Draw the image of the unit square {(x,y) | 0 < x,y < 1} under T.
2.2.15 Let C^∞(R) be the vector space of all infinitely differentiable functions on R (i.e., functions which can be differentiated infinitely many times), and let D : C^∞(R) → C^∞(R) be the differentiation operator Df = f′. Show that every λ ∈ R is an eigenvalue of D, and give a corresponding eigenvector.
2.2.16 Define T : C[0, ∞) → C[0, ∞) by

   Tf(x) = ∫₀ˣ f(y) dy.

   (Note that, by the Fundamental Theorem of Calculus, Tf is an antiderivative of f with Tf(0) = 0.)
   (a) Show that T is linear.
   (b) Show that T is an integral operator (as in the example beginning on page 87), although with a discontinuous kernel k(x, y).
2.2.17 Suppose that T ∈ L(U, V) and S ∈ L(V, W).
   (a) Show that if ST is injective, then T is injective.
   (b) Show that if ST is surjective, then S is surjective.
2.2.18 Here is an outline of the proof of the invertibility of C in Theorem 2.9.
   (a) Define D : M_{m,n}(F) → L(Fⁿ, Fᵐ) by [D(A)](v) = Av (i.e., D(A) is just the linear map "multiply the vector by A"), and show that D is linear.
   (b) Prove that CD is the identity map on M_{m,n}(F). That is, given A, if we define T by T(v) = Av, then the matrix of T is A.
   (c) Prove that DC is the identity map on L(Fⁿ, Fᵐ). That is, if T is a linear map with matrix A, and if we define S by S(v) = Av, then S = T.
   Fill in the details.
2.2.19 Prove Proposition 2.4.
2.2.20 Prove Theorem 2.5.
2.3 Matrix Multiplication

Definition of Matrix Multiplication

Suppose that we have an m × n matrix A and an n × p matrix B over F. As we saw in Section 2.1, we can think of A as a linear map from Fⁿ to Fᵐ and B as a linear map from Fᵖ to Fⁿ, so we can form the composition of the two:

   v ↦ A(Bv).   (2.10)

By Proposition 2.4, this is a linear map from Fᵖ to Fᵐ, so Theorem 2.8 tells us that it is represented by some m × p matrix. The question then arises: what is the matrix of this map in terms of A and B? The answer is that we define the product of two matrices A and B so that the matrix of the composition in formula (2.10) is the matrix product of A and B.

Definition Let A be an m × n matrix over F and B be an n × p matrix over F. The product AB is the unique m × p matrix over F such that for all v ∈ Fᵖ,

   A(Bv) = (AB)v.
This is a fine definition, since we know that each linear map is associated with a unique matrix, but of course we'd also like a formula for the entries of AB in terms of the entries of A and B. We can get this from the definition and the formula we already have for matrix–vector multiplication:

   [(AB)v]ᵢ = [A(Bv)]ᵢ = ∑_{k=1}^n a_{ik}(Bv)ₖ = ∑_{k=1}^n a_{ik}(∑_{j=1}^p b_{kj}vⱼ) = ∑_{j=1}^p (∑_{k=1}^n a_{ik}b_{kj})vⱼ.

Comparing this to the formula for the ith entry of a matrix–vector product in equation (2.2), we see that for 1 ≤ i ≤ m and 1 ≤ j ≤ p,

   [AB]_{ij} = ∑_{k=1}^n a_{ik}b_{kj},

so that AB ∈ M_{m,p}(F). In terms of columns,

   AB = [Ab₁ ⋯ Abₚ],

where b₁, ..., bₚ are the columns of B; in terms of rows, the ith row of AB is aᵢB, where aᵢ is the ith row of A.

If T_A ∈ L(Fⁿ, Fᵐ) has matrix A and T_B ∈ L(Fᵖ, Fⁿ) has matrix B, then AB is the matrix of T_A ∘ T_B.
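A small NumPy check (not from the book) of the defining property of the matrix product and of the column description above; the integer matrices here are arbitrary test data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(2, 3))  # m x n
B = rng.integers(-3, 4, size=(3, 4))  # n x p
v = rng.integers(-3, 4, size=4)       # v in F^p

# Defining property of the product: (AB)v = A(Bv) for every v.
assert np.array_equal((A @ B) @ v, A @ (B @ v))

# Column description: the j-th column of AB is A times the j-th column of B.
for j in range(B.shape[1]):
    assert np.array_equal((A @ B)[:, j], A @ B[:, j])
```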
Linear Independence, Bases, and Coordinates
3.1 Linear (In)dependence
Redundancy

We've come across the idea of redundancy in a couple of ways now. For example, recall the system of equations (1.3). We observed that the system was redundant, in that we could uniquely solve the first three equations, and then the fourth was automatically satisfied. In fact, we explicitly demonstrated this redundancy by observing that if the first three equations are satisfied, then the fourth equation, being a linear combination of the first three, holds automatically.
Another type of redundancy we've seen is geometric: in Section 1.3, we found that solving a linear system was the same thing as finding a way of writing one vector as a linear combination of two others. We saw that when the two vectors are not collinear, we really need both of them, but when the two pieces point in the same direction, we don't need both; either would do on its own.

Figure 3.1 Forming a vector as a linear combination of two others (left and right). In the combination on the left, both vectors are needed, but on the right, the two pieces are in the same direction.

In both of these redundant situations, something (an equation, a vector) is unnecessary because we can get it as a linear combination of other things we already have.
Definition A list of vectors (v₁, ..., vₙ) in V (with n ≥ 2) is linearly dependent if for some i, vᵢ is a linear combination of {vⱼ | j ≠ i}. A single vector v in V is linearly dependent if and only if v = 0.
Quick Exercise #1.
Show that ((" : Hl : 3)
is linearly dependent.
Notice that by the definition of a linear combination, for n ≥ 2, (v₁, ..., vₙ) is linearly dependent if, for some i, there are scalars aⱼ ∈ F for all j ≠ i such that

   vᵢ = a₁v₁ + ⋯ + aᵢ₋₁vᵢ₋₁ + aᵢ₊₁vᵢ₊₁ + ⋯ + aₙvₙ.

Subtracting vᵢ from both sides, this is the same as

   0 = a₁v₁ + ⋯ + aᵢ₋₁vᵢ₋₁ + (−1)vᵢ + aᵢ₊₁vᵢ₊₁ + ⋯ + aₙvₙ.
[a +[c]= [i] seve
So we've written 0 ∈ V as a linear combination of the vectors v₁, ..., vₙ in a non-trivial way; that is, the coefficients are not all zero.

On the other hand, suppose we are given vectors v₁, ..., vₙ in V (n ≥ 2) and b₁, ..., bₙ ∈ F, not all zero, such that

   ∑_{j=1}^n bⱼvⱼ = 0.

If, say, bᵢ ≠ 0, then we can solve for vᵢ:

   vᵢ = (−b₁/bᵢ)v₁ + ⋯ + (−bₙ/bᵢ)vₙ,

where the term j = i is omitted, and so vᵢ is a linear combination of the vectors v₁, ..., vᵢ₋₁, vᵢ₊₁, ..., vₙ.

This discussion shows that we could just as well have taken the following as our definition of linear dependence.

Definition A list of vectors (v₁, ..., vₙ) in V is linearly dependent if there exist scalars a₁, ..., aₙ ∈ F which are not all 0 such that

   ∑_{j=1}^n aⱼvⱼ = 0.
The two definitions of linear dependence we've just seen are equivalent. That is, a list of vectors is linearly dependent according to the first definition if and only if it is linearly dependent according to the second definition. (This is exactly what we proved in the discussion between the two definitions.) The first definition is what our examples led us to, but the second one is frequently easier to work with.
Quick Exercise #2. Verify that the two definitions of linear dependence are equivalent for n = 1. (We omitted that case in the discussion above.)
When we want to emphasize the base field, we sometimes say "linearly dependent over F" rather than just "linearly dependent." This becomes important in some cases, because we may have multiple possible base fields, and whether a list of vectors is linearly dependent may depend on which field we are working over. (See Exercise 3.1.8 for an example of this phenomenon.)
Linear Independence

Definition A list of vectors (v₁, ..., vₙ) in V is linearly independent if it is not linearly dependent.

QA #2: If v is linearly dependent, then v = 0 = 1·v, so av = 0 with a = 1 ≠ 0. Conversely, if av = 0 and a ≠ 0, then v = a⁻¹(av) = a⁻¹0 = 0.
Thus, (v₁, ..., vₙ) is linearly independent iff the only choice of scalars a₁, ..., aₙ ∈ F for which

   ∑_{j=1}^n aⱼvⱼ = 0   (3.3)

is a₁ = ⋯ = aₙ = 0.

Quick Exercise #3. Show that the list (e₁, ..., eₙ) in Fⁿ is linearly independent. (Recall that eᵢ has ith entry equal to 1 and all other entries 0.)

The next result is a first indication of the central role that linear independence plays in the study of linear systems.

Proposition 3.1 The columns of a matrix A ∈ M_{m,n}(F) are linearly independent iff x = 0 is the only solution of the m × n linear system Ax = 0. That is, the columns of A are linearly independent iff ker A = {0}.
Proof Let v₁, ..., vₙ ∈ Fᵐ be the columns of A. Then the linear system Ax = 0 has the vector form

   ∑_{j=1}^n xⱼvⱼ = 0.

Thus v₁, ..., vₙ are linearly independent if and only if the only solution to the system is x₁ = ⋯ = xₙ = 0. ▲

Corollary 3.2 A linear map T : Fⁿ → Fᵐ is injective iff the columns of its matrix are linearly independent.

Proof See Exercise 3.1.19. ▲
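Proposition 3.1 can be checked numerically: the columns of A are linearly independent exactly when Ax = 0 forces x = 0, i.e. when A has full column rank. A sketch (not from the book), using NumPy's rank computation as a stand-in for checking ker A = {0}; the matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
assert np.linalg.matrix_rank(A) == A.shape[1]   # ker A = {0}: columns independent

B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])                      # second column = 2 * first
assert np.linalg.matrix_rank(B) < B.shape[1]    # nonzero kernel: columns dependent
```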
Corollary 3.3 Suppose that (v₁, ..., vₙ) is a list of vectors in Fᵐ, with n > m. Then (v₁, ..., vₙ) is linearly dependent.

Proof By Theorem 1.4, the m × n linear system [v₁ ⋯ vₙ]x = 0 cannot have a unique solution because m < n. By Proposition 3.1, this means that (v₁, ..., vₙ) is linearly dependent. ▲

QA #3: If a₁e₁ + ⋯ + aₙeₙ = [a₁ ⋯ aₙ]ᵀ = 0, then a₁ = ⋯ = aₙ = 0, so (e₁, ..., eₙ) is linearly independent.
Proposition 3.1 provides us with an algorithm for checking linear (in)dependence of vectors in Fᵐ.

Algorithm 3.4 To check whether a list (v₁, ..., vₙ) in Fᵐ is linearly independent:

• Put the matrix A = [v₁ ⋯ vₙ] in REF.
• (v₁, ..., vₙ) is linearly independent if and only if there is a pivot in every column of the REF.

Proof By Proposition 3.1, (v₁, ..., vₙ) is linearly independent iff the linear system Ax = 0 has the unique solution x = 0. The system is consistent since x = 0 is a solution automatically, so by Theorem 1.3, x = 0 is the unique solution iff the REF of A has a pivot in each column. ▲
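Algorithm 3.4 is mechanical enough to hand to a computer algebra system. A sketch (not from the book) using SymPy, whose `rref()` reports the pivot columns directly:

```python
from sympy import Matrix

def linearly_independent(vectors):
    """Algorithm 3.4: row-reduce [v1 ... vn] and check for a pivot in every column."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    _, pivot_cols = A.rref()   # rref() returns (RREF matrix, pivot column indices)
    return len(pivot_cols) == A.cols

assert linearly_independent([[1, 0, 0], [0, 1, 0]]) is True
assert linearly_independent([[1, 2], [2, 4]]) is False  # collinear vectors
```

SymPy works in exact rational arithmetic, so there is no floating-point ambiguity about whether an entry is a pivot.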
Quick Exercise #4. Use Algorithm 3.4 to show (again) that ((3| ; |
is linearly dependent. Examples 1.
Consider the list of vectors (v₁, ..., vₙ) in Fⁿ given by vⱼ = e₁ + ⋯ + eⱼ for 1 ≤ j ≤ n.

3.1.14 Hint: Consider D² : C^∞(R) → C^∞(R) given by D²f = f″, where C^∞(R) is the space of infinitely differentiable functions on R, and use Theorem 3.8.
3.1.15 Let n ≥ 1 be an integer, and suppose that there are constants c₁, ..., cₙ ∈ R and distinct nonzero constants λ₁, ..., λₙ ∈ R such that

   ∑_{k=1}^n cₖe^{λₖx} = 0

   for every x ∈ R. Prove that c₁ = ⋯ = cₙ = 0. Hint: Use Theorem 3.8.
3.1.16 Let V be a vector space over the field Q of rational numbers. Show that a list (v₁, ..., vₙ) in V is linearly dependent if and only if there are integers a₁, ..., aₙ ∈ Z, not all equal to 0, such that

   a₁v₁ + ⋯ + aₙvₙ = 0.

3.1.17 Give an example of a linear map T ∈ L(V) with two linearly dependent eigenvectors v and w. Hint: Don't overthink this.
3.1.18 Consider the set of real numbers R as a vector space over Q (see Exercise 1.5.8). Show that if p₁, ..., pₙ are distinct prime numbers, then the list (log p₁, ..., log pₙ) in R is linearly independent over Q.
3.1.19 Prove Corollary 3.2.
3.1.20 Use Algorithm 3.4 to give another proof of Corollary 3.3.
3.2 Bases

Bases of Vector Spaces

Recall that the span of v₁, v₂, ..., vₖ ∈ V is the set ⟨v₁, v₂, ..., vₖ⟩ of all linear combinations of v₁, v₂, ..., vₖ. We also say that the list of vectors (v₁, v₂, ..., vₖ) spans a subspace W ⊆ V if ⟨v₁, v₂, ..., vₖ⟩ = W.

Definition A vector space V is finite-dimensional if there is a finite list of vectors (v₁, ..., vₙ) in V such that V ⊆ ⟨v₁, ..., vₙ⟩. A vector space V is infinite-dimensional if it is not finite-dimensional.

Quick Exercise #6. Show that Fⁿ is finite-dimensional.

Of course if v₁, ..., vₙ ∈ V, then ⟨v₁, ..., vₙ⟩ ⊆ V automatically, so it would be equivalent to require V = ⟨v₁, ..., vₙ⟩ in the definition above. We wrote the definition the way we did because it's formally simpler.

Remember that if (v₁, ..., vₙ) spans V, then we can write any vector in V as a linear combination of (v₁, ..., vₙ); i.e., (v₁, ..., vₙ) suffices to describe all of V. On the other hand, we can think of a linearly independent list as a list with no redundancy. We would like to combine the virtues of each. That is, we would like to describe the whole space with no redundancy.

Definition Let V be a finite-dimensional vector space. A list (v₁, ..., vₙ) in V is a basis for V if (v₁, ..., vₙ) is linearly independent and V ⊆ ⟨v₁, ..., vₙ⟩.

Examples

1. Recall that eᵢ ∈ Fⁿ denotes the vector with 1 in the ith position and zeroes elsewhere. The list (e₁, ..., eₙ) in Fⁿ is called the standard basis of Fⁿ. It is indeed a basis: we saw in Quick Exercise #3 of Section 3.1 that it is linearly independent, and in Quick Exercise #6 above that it spans Fⁿ.
QA #6: Given v = [v₁ ⋯ vₙ]ᵀ, v = v₁e₁ + ⋯ + vₙeₙ. Thus Fⁿ = ⟨e₁, ..., eₙ⟩.
2. Let Pₙ(R) := {a₀ + a₁x + ⋯ + aₙxⁿ | a₀, a₁, ..., aₙ ∈ R} be the vector space over R of polynomials in x of degree n or less. Then it is similarly easy to see, with the help of the example on page 146, that (1, x, ..., xⁿ) forms a basis of Pₙ(R).

3. Let v₁ and v₂ be the two vectors in R² considered on pages 26 and 30. The two vectors are not collinear, and so they're linearly independent (see Exercise 3.1.7). We've already seen that ⟨v₁, v₂⟩ = R².

4. Consider the two lists of vectors in Fⁿ, with (v₁, ..., vₙ) given by vⱼ = e₁ + ⋯ + eⱼ for 1 ≤ j ≤ n,
   ∑_{i=1}^n aᵢvᵢ = 0,

then aᵢ = 0 for each i. In other words, B is linearly independent. ▲
Theorem 3.11 Suppose that V is a nonzero vector space and V = ⟨B⟩. Then there is a sublist of B which is a basis of V.

Proof Let (v₁, ..., vₙ) be the list of all nonzero vectors in B, and define B′ to be the list containing, in order, each vⱼ such that vⱼ ∉ ⟨v₁, ..., vⱼ₋₁⟩.

Then B′ is linearly independent by Corollary 3.7. Now, since B spans V, for v ∈ V given, we can write

   v = ∑_{i=1}^n aᵢvᵢ.

Pick the largest k such that vₖ is not in B′. By construction of B′, this means that

   vₖ = ∑_{i=1}^{k-1} bᵢvᵢ

for some b₁, ..., bₖ₋₁ ∈ F. Therefore

   v = ∑_{i=1}^{k-1} aᵢvᵢ + aₖvₖ + ∑_{i=k+1}^n aᵢvᵢ = ∑_{i=1}^{k-1} (aᵢ + aₖbᵢ)vᵢ + ∑_{i=k+1}^n aᵢvᵢ,

so v ∈ ⟨v₁, ..., vₖ₋₁, vₖ₊₁, ..., vₙ⟩. Now iterate this construction: pick the largest ℓ < k such that vₗ is not in B′, and show in the same way that

   v ∈ ⟨v₁, ..., vₗ₋₁, vₗ₊₁, ..., vₖ₋₁, vₖ₊₁, ..., vₙ⟩.

Continue in this way until v has been written as a linear combination of the vectors in B′. Since v was arbitrary, we have shown that B′ spans V and is thus a basis. ▲
Corollary 3.12 If V is a nonzero finite-dimensional vector space, then there is a basis for V.

Proof Since V is finite-dimensional, V = ⟨B⟩ for some finite list B of vectors in V. By Theorem 3.11, there is then a sublist of B which is a basis for V. ▲
The proof of Theorem 3.11 implicitly contains an algorithm for finding a sublist of a spanning list which forms a basis, but it's not terribly practical. For each vector vⱼ in the list, you have to determine whether vⱼ ∈ ⟨v₁, ..., vⱼ₋₁⟩; depending on the space V, this may or may not be feasible. In Fᵐ, though, we can use row operations to give a practical way to find a linearly independent spanning sublist of a list of vectors.

Algorithm 3.13 Let (v₁, ..., vₙ) be a list of vectors in Fᵐ. To find a basis B for ⟨v₁, ..., vₙ⟩:

• Put A = [v₁ ⋯ vₙ] in RREF.
• If the ith column contains a pivot, include vᵢ in B.

Proof We saw in the proof of Theorem 3.11 that we can find a basis of ⟨v₁, ..., vₙ⟩ by first removing any vectors which are 0 and then removing any vectors vⱼ such that vⱼ ∈ ⟨v₁, ..., vⱼ₋₁⟩. Observe that if [v₁ ⋯ vₙ] is already in RREF, then the pivot columns are exactly the ones that get put in the basis. We can thus complete the proof by showing that whether or not a column of a matrix is a linear combination of preceding columns is unchanged by the row operations R1–R3. But in fact, we already know this: vⱼ = ∑_{k=1}^{j-1} cₖvₖ if and only if the vector [c₁ ⋯ cⱼ₋₁]ᵀ is a solution to the linear system with augmented matrix

   [v₁ ⋯ vⱼ₋₁ | vⱼ].

Since the set of solutions to a linear system is not changed by the row operations, we are done. ▲
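Algorithm 3.13 translates directly into code. A sketch (not from the book) using SymPy's exact row reduction; the example vectors are an arbitrary spanning list with one redundant vector:

```python
from sympy import Matrix

def basis_of_span(vectors):
    """Algorithm 3.13: keep exactly the vectors whose columns carry a pivot
    in the RREF of [v1 ... vn]."""
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    _, pivot_cols = A.rref()
    return [vectors[i] for i in pivot_cols]

vs = [[1, 0, 1], [2, 0, 2], [0, 1, 1]]   # v2 = 2 * v1 is redundant
assert basis_of_span(vs) == [[1, 0, 1], [0, 1, 1]]
```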
Quick Exercise #8. Find a basis for the subspace
Bases and Linear Maps

An important fact about linear maps is that they are completely determined by what they do to the elements of a basis.

Theorem 3.14 Suppose that (v₁, ..., vₙ) is a basis for V, and let w₁, ..., wₙ ∈ W be any vectors. Then there is a unique linear map T : V → W such that T(vᵢ) = wᵢ for each i.
Proof We first prove the existence of such a map by explicitly constructing one. Given v ∈ V, by Theorem 3.10, there is a unique representation

   v = ∑_{i=1}^n cᵢvᵢ

with c₁, ..., cₙ ∈ F. We define

   T(v) := ∑_{i=1}^n cᵢwᵢ.

Then clearly T(vᵢ) = wᵢ for each i (write it out carefully for yourself), so we just have to show that T is in fact a linear map. If c ∈ F, then

   T(cv) = T(∑_{i=1}^n ccᵢvᵢ) = ∑_{i=1}^n ccᵢwᵢ = c ∑_{i=1}^n cᵢwᵢ = cT(v),

QA #8: The RREF of the matrix with those columns has pivots in the first, third, and fourth columns, so those columns of the original matrix form a basis for the subspace.
so T is homogeneous. If

   u = ∑_{i=1}^n dᵢvᵢ,

then

   T(u + v) = T(∑_{i=1}^n (cᵢ + dᵢ)vᵢ) = ∑_{i=1}^n (cᵢ + dᵢ)wᵢ = ∑_{i=1}^n cᵢwᵢ + ∑_{i=1}^n dᵢwᵢ = T(v) + T(u).

To show uniqueness, observe that if S ∈ L(V, W) and S(vᵢ) = wᵢ for each i, then by linearity,

   S(∑_{i=1}^n cᵢvᵢ) = ∑_{i=1}^n cᵢS(vᵢ) = ∑_{i=1}^n cᵢwᵢ,

so S is the same as the map T defined above. ▲
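The construction in the proof can be carried out numerically over Rⁿ. A sketch (not from the book): collect the basis as the columns of V and the targets as the columns of W; since every vector is v = Vc for unique coordinates c, the map is T(v) = Wc = WV⁻¹v, so the matrix of T is WV⁻¹. The particular basis and targets below are arbitrary examples.

```python
import numpy as np

V = np.array([[1.0, 1.0],
              [0.0, 1.0]])       # columns: basis v1, v2 of R^2
W = np.array([[2.0, 0.0],
              [0.0, 3.0]])       # columns: prescribed targets w1, w2
T = W @ np.linalg.inv(V)         # matrix of the unique linear map with T(vi) = wi

# T sends each basis vector to its prescribed target:
assert np.allclose(T @ V[:, 0], W[:, 0])
assert np.allclose(T @ V[:, 1], W[:, 1])
```

Compare Exercises 3.2.19 and 3.2.20, which treat this change-of-basis computation in general.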
The construction of T in the proof above is very common, and is referred to as "extending by linearity."

Quick Exercise #9. Let w₁, ..., wₙ ∈ Fᵐ be given. Since (e₁, ..., eₙ) is a basis of Fⁿ, Theorem 3.14 says there is a unique linear map T : Fⁿ → Fᵐ such that Teⱼ = wⱼ for each j. What is the matrix of T?

From a slightly different perspective, Theorem 3.14 points to a potential use of the Rat Poison Principle: since there is one and only one linear map which sends a basis to a specified list of vectors, to show that two maps are the same it suffices to check that they do the same thing to a basis.
Theorem 3.15 Let T ∈ L(V, W) and let (v₁, ..., vₙ) be a basis of V. Then T is an isomorphism iff (T(v₁), ..., T(vₙ)) is a basis of W.

Proof Suppose first that T is an isomorphism. We need to show that (T(v₁), ..., T(vₙ)) is linearly independent and spans W. Suppose that

   ∑_{i=1}^n cᵢT(vᵢ) = 0.

By linearity, this means that

   T(∑_{i=1}^n cᵢvᵢ) = 0.

QA #9: This is almost a trick question — we've known this since Section 2.1: the matrix of T is [w₁ ⋯ wₙ].
Since T is injective, by Theorem 2.37,

   ∑_{i=1}^n cᵢvᵢ = 0,

and since (v₁, ..., vₙ) is linearly independent, this means that c₁ = ⋯ = cₙ = 0. Thus (T(v₁), ..., T(vₙ)) is linearly independent.

Now suppose that w ∈ W. Since T is surjective, there is a v ∈ V such that T(v) = w. Since V = ⟨v₁, ..., vₙ⟩, we can write

   v = ∑_{i=1}^n dᵢvᵢ

for some d₁, ..., dₙ ∈ F. By linearity,

   w = T(v) = ∑_{i=1}^n dᵢT(vᵢ).

Therefore W ⊆ ⟨T(v₁), ..., T(vₙ)⟩, and so (T(v₁), ..., T(vₙ)) is a basis for W.

Now suppose that (T(v₁), ..., T(vₙ)) is a basis for W. We need to show that T is injective and surjective. Suppose that T(v) = 0. Since V = ⟨v₁, ..., vₙ⟩, we can write

   v = ∑_{i=1}^n dᵢvᵢ

for some d₁, ..., dₙ ∈ F. By linearity,

   0 = T(v) = ∑_{i=1}^n dᵢT(vᵢ).

Since (T(v₁), ..., T(vₙ)) is linearly independent, this implies that d₁ = ⋯ = dₙ = 0, and so v = 0. By Theorem 2.37, this means that T is injective.

Now suppose that w ∈ W. Since W = ⟨T(v₁), ..., T(vₙ)⟩, we can write

   w = ∑_{i=1}^n cᵢT(vᵢ)

for some c₁, ..., cₙ ∈ F. By linearity,

   w = T(∑_{i=1}^n cᵢvᵢ) ∈ range T,

and thus T is surjective. ▲
Corollary 3.16 Let A ∈ M_{m,n}(F). The following statements are equivalent.

• A is invertible.
• The columns of A are a basis of Fᵐ.
• The RREF of A is Iₘ.

Proof Let T ∈ L(Fⁿ, Fᵐ) be the linear map represented by A; then A is invertible if and only if T is an isomorphism. By Theorem 3.15, T is an isomorphism if and only if (Te₁, ..., Teₙ) is a basis of Fᵐ. But (Te₁, ..., Teₙ) are exactly the columns of A. By Proposition 3.9, the columns of A are a basis of Fᵐ if and only if the RREF of A is Iₘ. ▲
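The three equivalent conditions of Corollary 3.16 can be verified side by side on a small example. A sketch (not from the book) in SymPy; the matrices are arbitrary illustrations, one invertible and one singular.

```python
from sympy import Matrix, eye

A = Matrix([[1, 1],
            [0, 1]])

rref, pivots = A.rref()
assert rref == eye(2)           # the RREF of A is I_2
assert A.det() != 0             # A is invertible
assert len(pivots) == A.cols    # a pivot in every column: the columns are a basis

B = Matrix([[1, 2],
            [2, 4]])            # singular: second column = 2 * first
assert B.rref()[0] != eye(2) and B.det() == 0
```

All three conditions hold together for A and fail together for B, as the corollary predicts.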
Quick Exercise #10. Use Corollary 3.16 to show that a matrix A ∈ M_{m,n}(F) can only be invertible if m = n.
Example The example on page 99 showed that the two matrices

   A = [1 1 ⋯ 1; 0 1 ⋯ 1; ⋮ ⋱ ⋱ ⋮; 0 ⋯ 0 1]   and   B = [1 −1 0 ⋯ 0; 0 1 −1 ⋱ ⋮; ⋮ ⋱ ⋱ ⋱ 0; ⋮ ⋱ ⋱ −1; 0 ⋯ ⋯ 0 1]

are invertible. Corollary 3.16 thus gives another way to see that the columns of A and the columns of B are two different bases of Fⁿ (this was shown first in Example 4 on page 151). ▲
KEY IDEAS

• A basis of a finite-dimensional vector space is a linearly independent list of vectors which spans the space.
• Finite-dimensional vector spaces always have bases: you can start with any spanning list, and eliminate unnecessary elements to get a basis.
• A list (v₁, ..., vₘ) is a basis of Fⁿ if and only if the RREF of the matrix with those columns is Iₙ (so you need m = n).
• You can define a linear map by defining it on a basis and extending by linearity. If two linear maps do the same thing to a basis, they are the same.
• A map is invertible if and only if it takes a basis to a basis. A matrix is invertible if and only if its columns are a basis.

QA #10: If A is invertible, then the RREF of A is Iₘ; then A must be square.
EXERCISES 3.2.1
Determine whether each of the following lists of vectors is a basis for R".
[2|
3|
Pa
a i
(d)
1]
L1
3.2.2
,
en os
# (EE)
=i)
EN
fe
2
Alea ]2y'}3}"]3 2
[3
2 4
Determine whether each of the following lists of vectors is a basis for R".
ry
°
(
:]
Al
, |
©
)
(b)
[1] (od
3.2.3.
f2]
[3
Z|)
|| 3
]{5|
2
3
1
2
fi]
f2]
fa
(@) |] 2],)3],]
1
3]
—2
il
Find a basis for each of the following spaces.
(a) [4]
[-2
—2],] 1 L2] L-1]
(b)
f2]
3], L-2
fi
1]
0
[4
=2
(d)
=s}
ie
e
ee
L-4]
2
-1
e
0 2 1
1
0
1
0
1
1
1
1
ee
L-1]
Lo
Find a basis for each of the following spaces.
[a 2 21,/3 L 1
(a)
a 14’) Lo
(b)
2 0 3 -1}
; [’} 2]? [-1
1
1
0
-1
2
|’}-2
L-1]
L-2
[2
-1
(c)
Oo
|’}]
L-3} fi
;
0
1 0
(d)
3.2.5
],)
0) 1] |-2 1 —1|’}o}?|-1}’|-2]7] cane 1 0
(c)
3.2.4
0
1 1 , >
; 3
0 0 1 1
1
Lo EE
Find a basis for the kernel of each of the following matrices.
10 (a)
|2
1
2
-3
-1
-3
11-3 (o)
5
>
1
-A@
-1
2
10 (b)
0
}2
1
2
-3
-1
-3
11-3
t
; af
100
20 @}
1
=i
0 1
-10
3
1-2
0
3
0
i
@
2
=i
@
Find a basis for the kernel of each of the following matrices.
|
)
ee
{1 -3 0 4
3
3.2.7
ee ee |
nO)
}Oo
3
1
0
-2
ik
; such that aj;~ = aj for every i. Prove that V is finite-dimensional. Let D be the derivative operator on C™(IR). Show that the eigenspaces of D are all finite-dimensional. Suppose that (v),...,”,) is a basis of V. Prove that
(01 + v2, 02 + 035-525 Un—1 + Yny Yn) 3.2.16
is also a basis of V. Find a sublist of (x + 1,x? — 1,4? + 2x + 1,x? — x) which is a basis for
P₂(R).
Show that if B is a linearly independent list of vectors in V, then B is a basis for the subspace (B) spanned by B. Suppose that A € M,(F) is upper triangular (see Exercise 2.3.12) and that all the diagonal entries are nonzero (i.e., aj; # 0 for each i = 1,...,n). Show that the columns ofA form a basis of F".
3.2.19
Let (vi,...,Vn) be a basis of F”, and let b € F”. Show that b=x1V1
+++
+-4nVn,
where x = V~'b, and V € M,(F) has columns vj,...,Vn-
3.2.20
Suppose that (v},...,p) is a basis of V, and let w; = De
3.2.21
i =1,...,n and scalars aj € IF. Show that if the matrix A € M,(F) with entries aj is invertible, then (w1,..., Wn) is a basis of V. Let (v),...,¥,) be a basis of V, and let a),..., a, € F. Prove that there
3.2.22
is a unique linear map T : V > F such that Tv; = aj for each j. Suppose that (v),..., vn) and (w;,...,Wn) are both bases of V. Show that there is an isomorphism Ui
3.2.23
T € £(V)
3.2.25
wj; for each
boosqub
Show that Theorem 3.14 fails if the vectors (v;,..., vn) are not linearly independent. That is, give an example of a
3.2.24
such that Tv; =
ajv; for
list (vj,...,¥,) which spans
V and a list (w,,...,w,) in W, such that there is no linear map T € L(V, W) with Tv; = w; for eachj. Suppose that (v;,...,v,) is a basis for V, T : V > W is linear, and (Tv;,...,Tvn) is a basis for W. Prove that if (w;,...,un) is another
basis for V, then (Tuj,..., Tun) is another basis for W. Use Proposition 3.9 to give an alternative proof of Theorem 3.10 when V=F".
3.3 Dimension In this section we will begin making finer distinctions among finite-dimensional spaces.
The Dimension of a Vector Space

Lemma 3.17 Suppose that V ⊆ ⟨v₁, ..., vₘ⟩, and that (w₁, ..., wₙ) is a linearly independent list in V. Then m ≥ n.

Proof For each j = 1, ..., n, wⱼ ∈ ⟨v₁, ..., vₘ⟩, and so there are scalars a_{1j}, ..., a_{mj} ∈ F such that

   wⱼ = ∑_{i=1}^m a_{ij}vᵢ.

Now consider the homogeneous m × n linear system Ax = 0, where A = [a_{ij}] ∈ M_{m,n}(F). If x is a solution of this system, then

   ∑_{j=1}^n a_{ij}xⱼ = 0

for each i = 1, ..., m, and so

   ∑_{j=1}^n xⱼwⱼ = ∑_{j=1}^n xⱼ(∑_{i=1}^m a_{ij}vᵢ) = ∑_{i=1}^m (∑_{j=1}^n a_{ij}xⱼ)vᵢ = 0.

Since (w₁, ..., wₙ) is linearly independent, this implies that xⱼ = 0 for each j. Therefore, x = 0 is the only solution of the m × n system Ax = 0, which by Corollary 1.4 means that m ≥ n. ▲
Quick Exercise #7. Use Lemma 3.17 to show that a linearly independent list of polynomials in Pₙ(R) consists of at most n + 1 polynomials.
Intuitively, Lemma 3.17 says that a list of vectors in V with no redundancy is necessarily shorter than (or the same length as) any list that describes all of V. One easy consequence is the following way of recognizing infinite-dimensional spaces.

Theorem 3.18 Suppose that, for every n ≥ 1, V contains a linearly independent list of n vectors. Then V is infinite-dimensional.

Proof We will prove the contrapositive: if V is finite-dimensional, then there is some n such that V does not contain any linearly independent list of n vectors. Indeed, suppose that V ⊆ ⟨v₁, ..., vₘ⟩. By Lemma 3.17, for any n > m, V cannot contain a linearly independent list of n vectors. ▲

QA #7: Pₙ(R) is spanned by the n + 1 polynomials (1, x, ..., xⁿ).
In fact, the converse of Theorem 3.18 also holds (see Exercise 3.3.21).

Example For each n, the space P(R) of all polynomials over R contains the linearly independent list (1, x, ..., xⁿ). Therefore P(R) is infinite-dimensional. ▲

The most important consequence of Lemma 3.17 is the following fact.

Theorem 3.19 Suppose that (v₁, ..., vₘ) and (w₁, ..., wₙ) are both bases of V. Then m = n.

Proof Since V ⊆ ⟨v₁, ..., vₘ⟩ and (w₁, ..., wₙ) is a linearly independent list in V, Lemma 3.17 implies that m ≥ n. On the other hand, since V ⊆ ⟨w₁, ..., wₙ⟩ and (v₁, ..., vₘ) is a linearly independent list in V, Lemma 3.17 implies that n ≥ m. ▲
3.19 tells us that, even
though
a given
(finite-dimensional)
vector
space has many different bases, they all have something important in common their length. So given a finite-dimensional vector space V, “the length of a basis of V” is a single number, which only depends on V itself, and not on which basis you consider. This means that the following definition makes sense. Definition Let V be a finite-dimensional vector space. e
If V ¥ {0}, then the dimension of V, written dim V, is the length of any basis of V. If dim V = n, we say that V is n-dimensional.
e
If V = {0}, then we define the dimension of V
to be 0.
If it were possible for V to have two (or more) bases with different lengths, then
our definition of dimension would have serious problems: what we thought dim V was would depend on which basis we were using. We would say in that case that the dimension of V was not well-defined. In general when making mathematical definitions, one has to be careful about well-definedness whenever the definition
appears to depend on a choice. Here, Theorem 3.19 tells us that even though the definition of dimension appears to depend on a choice of basis, it really doesn’t we'd get the same number by choosing a different basis.
Examples

1. We have seen that (e1,...,en) forms a basis for F^n, so dim F^n = n.
2. We observed in the example on page 146 that (1, x, ..., x^n) is a basis of P_n(R), the space of polynomials over R with degree n or less, so dim P_n(R) = n + 1. ▲
We next record some simple consequences of Lemma 3.17 for the dimension of a vector space.
Proposition 3.20 If V = ⟨v1,...,vn⟩, then dim V ≤ n.

Proposition 3.21 If (v1,...,vn) is a linearly independent list in V, then dim V ≥ n.

Proof By Corollary 3.12, V has a basis B, and by Lemma 3.17, B is at least as long as the linearly independent list (v1,...,vn). ▲
The following lemma gives a good example of the way we can sometimes learn something non-trivial by counting dimensions of spaces.
Lemma 3.22 Suppose that U1 and U2 are subspaces of a finite-dimensional vector space V, and that dim U1 + dim U2 > dim V. Then U1 ∩ U2 ≠ {0}.
Proof We will prove the contrapositive: if U1 ∩ U2 = {0}, then dim U1 + dim U2 ≤ dim V.

Let (u1,...,up) be a basis of U1 and (v1,...,vq) be a basis of U2. Suppose that

Σ_{i=1}^p ai ui + Σ_{j=1}^q bj vj = 0

for scalars a1,...,ap, b1,...,bq. Then

Σ_{i=1}^p ai ui = −Σ_{j=1}^q bj vj,

and this vector must be in U1 ∩ U2, so by assumption it is 0. Since (u1,...,up) is linearly independent, ai = 0 for each i, and since (v1,...,vq) is linearly independent, bj = 0 for each j. Therefore (u1,...,up, v1,...,vq) is linearly independent, which implies that dim V ≥ p + q. ▲

The following result shows that, up to isomorphism, there is only one n-dimensional vector space over F.

Theorem 3.23 Let V and W be finite-dimensional vector spaces. Then dim V = dim W if and only if V is isomorphic to W.
Proof Suppose first that dim V = dim W = n. Pick bases (v1,...,vn) of V and (w1,...,wn) of W. By Theorem 3.14, there is a linear map T ∈ L(V, W) such that T(vi) = wi for each i. So by Theorem 3.15, T is an isomorphism.

Now suppose that V and W are isomorphic. Then there is an isomorphism T ∈ L(V, W). If (v1,...,vn) is any basis of V, then by Theorem 3.15, (T(v1),...,T(vn)) is a basis of W, and so dim W = n = dim V. ▲

Theorem 3.23 gives us the abstract statement that every n-dimensional vector space over F is isomorphic to every other one, but it is helpful to have a concrete example of such a space.

Corollary 3.24 If dim V = n, then V is isomorphic to F^n.

Algorithm 3.25 Let (v1,...,vn) be a list of vectors in F^m. To find the dimension of ⟨v1,...,vn⟩:

• Put A = [v1 ··· vn] in RREF.
• dim⟨v1,...,vn⟩ is the number of pivots.

Proof Algorithm 3.13 shows that the pivot columns of the RREF of A form a basis for ⟨v1,...,vn⟩, so the dimension is the number of pivot columns. ▲
Quick Exercise #14. Find the dimension of the subspace of R^4 spanned by the vectors from Quick Exercise #8 on page 155.

QA #13: Since v1, v2 are noncollinear, (v1, v2) is linearly independent and hence a basis of U. Therefore dim U = 2, so U is isomorphic to F^2.
QA #14: The pivot columns of the matrix with those columns form a basis (see page 158), so the subspace is three-dimensional.
Dimension, Bases, and Subspaces

The next result gives one way that understanding dimension can save you some work. It says that to check whether a list in V is a basis, as long as it has the right length, you only need to check that the list spans V; you get linear independence for free.

Theorem 3.26 Suppose that dim V = n ≠ 0 and that V = ⟨B⟩, where B = (v1,...,vn). Then B is a basis for V.

Proof By Theorem 3.11, some sublist B' of B is a basis for V. Since dim V = n, B' must have length n, so in fact B' = B. Thus B is a basis for V. ▲

Example
Consider the list of vectors (v1,...,vn) in F^n given by

vj = e1 + ··· + ej

for 1 ≤ j ≤ n.

null T = 0
⟺ rank T = dim V (by the Rank-Nullity Theorem)
⟺ rank T = dim W (since dim V = dim W)
⟺ T is surjective (by Proposition 3.31).
It follows that if T is either injective or surjective, then in fact T is both injective and surjective, and thus an isomorphism. Conversely, we already know that if T is an isomorphism then it is both injective and surjective. ▲

This last corollary is an example of what's known as a "dimension-counting" argument. We showed that injectivity and surjectivity of T are equivalent, not directly, but by using the Rank-Nullity Theorem to relate the dimensions of the image and kernel of T.

In addition to possibly being injective but not surjective or vice versa, in general it is possible for linear maps to have only a left-inverse or a right-inverse, i.e., to have S ∈ L(V, W) and T ∈ L(W, V) so that, for example, ST = I but TS ≠ I.
However, it follows from the previous result that this cannot happen in the setting of linear maps if dim V = dim W. Similarly, we have already seen that while some matrices have only one-sided inverses, this is not possible for square matrices; the
result above gives a quick proof of this fact.
Corollary 3.37

1. Suppose that dim V = dim W and that S ∈ L(V, W) and T ∈ L(W, V). If ST = I, then TS = I as well. In particular, S and T are invertible, and T = S^{-1}.
2. Suppose that A, B ∈ M_n(F). If AB = I_n, then BA = I_n as well. In particular, A and B are invertible, and A^{-1} = B.

Proof

1. If ST = I, then for every w ∈ W, w = Iw = S(Tw), which implies that S is surjective. By Corollary 3.36, S is invertible, and so

T = S^{-1}ST = S^{-1}.

2. By Corollary 2.32,

F^n = C(I_n) = C(AB) ⊆ C(A),

so rank A ≥ n. Since A ∈ M_n(F), this means that rank A = n; i.e., the RREF of A contains n pivots. Since A is an n × n square matrix, this means that the RREF of A is I_n, which by Algorithm 2.25 means that A is invertible. Then, exactly as above,

B = A^{-1}AB = A^{-1}. ▲
Quick Exercise #21. Let

A = [1 0 0
     0 1 0]

and B = A^T. Compute AB and BA. Why does this not contradict Corollary 3.37?

Dimension counting lets us make the following interesting observation about the eigenvalues of square matrices.

Corollary 3.38 Let A ∈ M_n(F). Then λ ∈ F is an eigenvalue of A if and only if λ is an eigenvalue of A^T.

QA #21: AB = I_2, but BA ≠ I_3. Corollary 3.37 doesn't apply because A and B aren't square matrices.
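A quick numerical illustration of why squareness matters in Corollary 3.37 (a sketch using NumPy; the particular A shown is an assumption, chosen to match Quick Exercise #21):

```python
import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, 0]])  # 2 x 3, so Corollary 3.37 does not apply
B = A.T                    # 3 x 2

print(A @ B)  # the 2 x 2 identity: B is a right inverse of A
print(B @ A)  # diag(1, 1, 0), which is not the 3 x 3 identity
```

So B is a one-sided inverse of A only; no contradiction, since A and B are not square.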
Proof Recall that λ ∈ F is an eigenvalue of A if and only if ker(A − λI_n) ≠ {0}. By the Rank-Nullity Theorem, this is equivalent to rank(A − λI_n) < n. But by Theorem 3.34,

rank(A − λI_n) = rank((A − λI_n)^T) = rank(A^T − λI_n),

so rank(A − λI_n) < n if and only if rank(A^T − λI_n) < n. ▲

The linear map x ↦ Ax from F^n to F^m is surjective if and only if the m × n linear system

Ax = b    (3.9)

is consistent for every b ∈ F^m. Recall (Theorem 1.2) that (3.9) is consistent if and only if there is no pivot in the last column of the RREF of its augmented matrix. But if m > n (i.e., there are more rows than columns), then the RREF of A cannot have a pivot in every row, since each column can contain at most one pivot. Therefore, there are some rows of all zeroes. This means that for some choices of b, there will be a pivot in the final column of the RREF of the augmented matrix of (3.9), making the system inconsistent.
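Corollary 3.38 can also be checked numerically; here is a sketch (assuming NumPy) comparing the eigenvalues of a random matrix and its transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Sorting makes the two spectra comparable entry by entry.
ev_A = np.sort_complex(np.linalg.eigvals(A))
ev_AT = np.sort_complex(np.linalg.eigvals(A.T))
print(np.allclose(ev_A, ev_AT))  # True
```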
Proposition 3.39 also has important geometric consequences for linear maps between finite-dimensional vector spaces. Firstly, it reproves the fact that two vector spaces of different dimensions cannot be isomorphic, but it refines that observation. On the one hand, a linear map cannot stretch out a vector space to manufacture more dimensions: when mapping V into W, a linear map cannot make the dimension go up, only down. On the other hand, linear maps cannot squeeze vector spaces down into spaces of smaller dimension without collapsing some vectors to zero. One can put these ideas together to justify visualizing a linear map as collapsing part of a space to zero and putting the rest down in the codomain pretty much as it was. This kind of restricted behavior is a very special property of linear maps; non-linear maps between vector spaces can do all kinds of strange things.”
Linear Constraints

In many applications of linear algebra, one encounters sets which are subsets of an ambient vector space, subject to certain linear constraints; i.e., elements are required to satisfy some linear equations. For example, one might encounter a set like

S = { (x1, x2, x3, x4, x5) ∈ R^5 : 3x1 + 2x4 + x5 = 10 and x2 − 2x3 − 5x5 = 7 }.

The set S is not a subspace of R^5, and so we have not defined the notion of dimension for S, but it is not hard to do so: recall from Section 2.5 that S is a so-called "affine subspace" of R^5, meaning that it is a subspace which has been shifted so that it no longer contains the origin. It is thus reasonable to declare the dimension of S to be the dimension of the subspace S − v0 of R^5, where v0 ∈ R^5 is a vector such that S − v0 contains 0.
It is easy to read off from S which subspace S is a translation of, and how to translate it back to containing the origin. Indeed, if v0 is any element of S (assuming that S is nonempty), then trivially S − v0 contains 0. Moreover,

*For example, you might enjoy googling "space-filling curve".
S − v0 = { (x1, x2, x3, x4, x5) ∈ R^5 : 3x1 + 2x4 + x5 = 0 and x2 − 2x3 − 5x5 = 0 }.

(See Exercise 3.4.7.) So if we define T : R^5 → R^2 by

T(x) = [3x1 + 2x4 + x5
        x2 − 2x3 − 5x5],

i.e., the linear map whose matrix is

[3 0 0 2 1
 0 1 −2 0 −5],

then S − v0 is exactly the kernel of T. Since the rank of T is obviously 2, this means that the nullity of T, i.e., the dimension of S, is 5 − 2 = 3.

The same argument works in general: if S is the subset of an n-dimensional vector space defined by k (linearly independent and consistent) constraints, then the dimension of S is n − k.
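The dimension count for S can be reproduced numerically. The sketch below assumes NumPy, and assumes the two constraints displayed above: the dimension of the solution set is n minus the rank of the coefficient matrix.

```python
import numpy as np

# Coefficient matrix of the constraints 3x1 + 2x4 + x5 = 10 and
# x2 - 2x3 - 5x5 = 7; the right-hand sides do not affect the dimension.
C = np.array([[3, 0, 0, 2, 1],
              [0, 1, -2, 0, -5]], dtype=float)

n = C.shape[1]
k = np.linalg.matrix_rank(C)
print(n - k)  # 3, the dimension of S
```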
KEY IDEAS

• The rank of a map is the dimension of its image; the rank of a matrix is the dimension of its column space (which is the number of pivots in its RREF).
• The nullity of a map or matrix is the dimension of its kernel.
• Row rank = column rank.
• The Rank-Nullity Theorem: If T ∈ L(V, W), then rank T + null T = dim V. If A ∈ M_{m,n}(F), then rank A + null A = n.
EXERCISES
3.4.1–3.4.2 Find the rank and nullity of each of the following matrices. [The matrices are illegible in the source.]

3.4.3 Let D be the differentiation operator on the space P_n(R) of polynomials over R of degree at most n. Find the rank and nullity of D.
3.4.4 Let A ∈ M_{m,n}(R) be given by a_{ij} = ij for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Find the rank and nullity of A.

3.4.5–3.4.7 [The statements are illegible in the source.]
Given bases B_V of the n-dimensional space V and B_W of the m-dimensional space W, and a matrix A ∈ M_{m,n}(F), we can define a linear map T_{A,B_V,B_W} : V → W by

[T_{A,B_V,B_W} v]_{B_W} = A[v]_{B_V}.    (3.10)

The rather unwieldy notation T_{A,B_V,B_W} is essential: which linear map a given matrix defines depends completely on the bases we're working in. For example, consider a 2 × 2 matrix acting on R^2. If we let the domain and codomain be R^2 with the standard basis, the map defined by multiplication by this matrix gives the left-most picture in Figure 3.2; if we simply write the standard basis in the non-standard order (e2, e1) in both the domain and the codomain, we get the picture in the middle. If we use the standard basis (in the standard order) in the domain, but we use the non-standard order (e2, e1) in the codomain, we get the right-most picture.
Examples

1. Let I be the identity operator on a vector space V. If we use the same basis B for V in both the domain and codomain of I, then the matrix of I is the (appropriately sized) identity matrix. (Check carefully that this is really true!) However, if B1 and B2 are two different bases, then [I]_{B1,B2} is not the identity matrix; all we can say about it is that it is invertible, but aside from that the entries could be anything.
2. Let P_n(R) denote the polynomials in x over R of degree n or less, and let D : P_n(R) → P_{n−1}(R) be the derivative operator:

Df(x) := f′(x).
Figure 3.2 Multiplication by the 2 × 2 matrix above when the bases in the domain and codomain are: (a) both (e1, e2); (b) both (e2, e1); (c) (e1, e2) in the domain and (e2, e1) in the codomain.
We saw in Section 2.2 that D is linear. The matrix of D with respect to the bases (1, x, x^2, ..., x^n) and (1, x, x^2, ..., x^{n−1}) is the n × (n + 1) matrix

[0 1 0 ··· 0
 0 0 2 ··· 0
 ⋮       ⋱
 0 0 0 ··· n].
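The matrix of D is easy to build and test for small n. The sketch below is not from the text; it assumes NumPy, and represents a polynomial by its coordinate vector in the basis (1, x, ..., x^n).

```python
import numpy as np

def derivative_matrix(n):
    """The n x (n+1) matrix of D : P_n(R) -> P_{n-1}(R) in the monomial bases."""
    D = np.zeros((n, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k  # d/dx x^k = k x^(k-1)
    return D

# p(x) = 2 + 3x + x^3 has coordinates (2, 3, 0, 1); p'(x) = 3 + 3x^2.
print(derivative_matrix(3) @ np.array([2, 3, 0, 1]))  # [3. 0. 3.]
```

The output (3, 0, 3) is exactly the coordinate vector of p′(x) = 3 + 3x² in the basis (1, x, x²).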
3. Let T : R^2 → R^2 be given by reflection across the line y = 2x.

Figure 3.3 Reflection across y = 2x.

The matrix of T with respect to the standard basis is

(1/5)[−3 4
      4 3].
Quick Exercise #24. Convince yourself that this is the matrix of T with respect to the standard basis.
However, things look much simpler if we consider the basis

B := ([1; 2], [−2; 1]).

Since T([1; 2]) = [1; 2] and T([−2; 1]) = −[−2; 1], the matrix of T with respect to B is

[T]_B = [1 0
         0 −1].

The matrix in this basis makes it easy to see that applying T twice gives the identity map, which is geometrically obvious but rather obscured by the form of the matrix in the standard basis. ▲

Our representation of linear maps as matrices with respect to arbitrary bases has the same nice properties that our representation with respect to the standard basis had. In particular, we have the following generalization of Theorem 2.9.
Theorem 3.42 Suppose that dim V = n, dim W = m, and B_V and B_W are bases of V and W, respectively. Then the map

C_{B_V,B_W} : L(V, W) → M_{m,n}(F)

defined by

C_{B_V,B_W}(T) = [T]_{B_V,B_W}

is an isomorphism.

Proof The proof is essentially identical to that of Theorem 2.9. Let B_V = (v1,...,vn) and B_W = (w1,...,wm). Let T1, T2 ∈ L(V, W). Then the jth column of the matrix C_{B_V,B_W}(T1 + T2) is given by

[(T1 + T2)(vj)]_{B_W} = [T1 vj]_{B_W} + [T2 vj]_{B_W},

which is the sum of the jth columns of C_{B_V,B_W}(T1) and C_{B_V,B_W}(T2), so

C_{B_V,B_W}(T1 + T2) = C_{B_V,B_W}(T1) + C_{B_V,B_W}(T2).

Similarly, if a ∈ F and T ∈ L(V, W), then the jth column of C_{B_V,B_W}(aT) is given by [aT(vj)]_{B_W} = a[T vj]_{B_W}, so that

C_{B_V,B_W}(aT) = aC_{B_V,B_W}(T).

The map C_{B_V,B_W} is surjective, since given a matrix A ∈ M_{m,n}(F), one can define a linear map T_{A,B_V,B_W} ∈ L(V, W) as in (3.10), and then C_{B_V,B_W}(T_{A,B_V,B_W}) = A. If C_{B_V,B_W}(T) = 0, then for every vj ∈ B_V, T(vj) = 0, and so T is the zero map. The map C_{B_V,B_W} is therefore injective and we are done. ▲
Theorem 3.42 in particular immediately tells us the dimension of the vector space L(V, W):

Corollary 3.43 Let V and W be finite-dimensional vector spaces. Then dim L(V, W) = (dim V)(dim W).

We have already observed that the matrix of a given linear map looks completely different depending on which bases are used. In particular, it is always possible to choose bases so that the matrix of an operator is extremely simple, as in the following result.

Theorem 3.44 Let V and W be finite-dimensional vector spaces and T ∈ L(V, W). Then there are bases B_V of V and B_W of W such that if A = [T]_{B_V,B_W}, then a_{ii} = 1 for 1 ≤ i ≤ rank T, and all other entries of A are 0.
Examples

1. We saw above that reflection across the line y = 2x is a diagonalizable operator; B = ([1; 2], [−2; 1]) is a basis of eigenvectors (with corresponding eigenvalues 1 and −1).
2. Consider the map T : R^2 → R^2 defined by T([x; y]) = [0; x]. If [x; y] is an eigenvector of T with eigenvalue λ, then

[0; x] = [λx; λy],

so that either x = 0 or λ = 0. If λ ≠ 0, then it must be that x = 0, from which the equality of the second entries gives that y = 0. Since the zero vector is never an eigenvector, this means that the only possible eigenvalue of T is λ = 0.

For [x; y] to be an eigenvector with eigenvalue 0, we need

T[x; y] = [0; x] = [0; 0],

i.e., x = 0. In particular, the zero eigenspace is spanned by [0; 1], and so we cannot have a basis of eigenvectors; T is not diagonalizable. ▲
Matrix Multiplication and Coordinates

We introduced matrix multiplication in Section 2.3 as the operation on matrices that corresponds to composition of the associated linear maps; the following result shows that, even in arbitrary coordinates, matrix multiplication corresponds to composition.

Theorem 3.47 Let U, V, and W be finite-dimensional vector spaces with bases B_U, B_V, and B_W respectively, and suppose that T ∈ L(U, V) and S ∈ L(V, W). Then

[ST]_{B_U,B_W} = [S]_{B_V,B_W}[T]_{B_U,B_V}.

Proof Write B_U = (u1,...,up), B_V = (v1,...,vn), and B_W = (w1,...,wm), and let A = [S]_{B_V,B_W}, B = [T]_{B_U,B_V}, and C = [ST]_{B_U,B_W}. We need to show that C = AB.
By the definition of A, for each k = 1,...,n, the kth column of A is the coordinate representation of Sv_k with respect to B_W. That is,

Sv_k = Σ_{i=1}^m a_{ik} w_i.    (3.11)

Similarly, for each j = 1,...,p,

Tu_j = Σ_{k=1}^n b_{kj} v_k.    (3.12)

By linearity, (3.12) and (3.11) imply that

STu_j = Σ_{k=1}^n b_{kj} Sv_k = Σ_{k=1}^n b_{kj} (Σ_{i=1}^m a_{ik} w_i) = Σ_{i=1}^m (Σ_{k=1}^n a_{ik} b_{kj}) w_i,

which means that

c_{ij} = Σ_{k=1}^n a_{ik} b_{kj}

for each i and j, which is the same as C = AB. ▲
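In coordinates, Theorem 3.47 reduces to the fact that applying the product matrix agrees with applying the factors in turn; a quick numerical sanity check (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.integers(-3, 4, size=(2, 3))  # matrix of S : V -> W in chosen bases
T = rng.integers(-3, 4, size=(3, 4))  # matrix of T : U -> V in the same bases
u = rng.integers(-3, 4, size=4)       # coordinates of a vector in U

# Multiplying by the product matrix equals composing the two maps.
print(np.array_equal((S @ T) @ u, S @ (T @ u)))  # True
```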
Example Consider the map T : R^2 → R^2 given by first reflecting across the line y = 2x, then rotating counterclockwise by an angle of 2θ, where θ is the angle the line y = 2x makes with the y-axis. We have seen (see the example on page 189) that in the basis B = ([1; 2], [−2; 1]) the matrix of the reflection is

[1 0
 0 −1].

Quick Exercise #26. Show that if we use the basis B in the domain and the standard basis E = (e1, e2) in the codomain, the matrix of the rotation by 2θ is

[−1 −2
 2 −1].

It follows that

[T]_{B,E} = [−1 −2; 2 −1][1 0; 0 −1] = [−1 2; 2 1]. ▲

Corollary 3.48 Let V and W be finite-dimensional vector spaces with bases B_V and B_W, respectively. Suppose that T ∈ L(V, W) and let A = [T]_{B_V,B_W}. Then A is invertible if and only if T is invertible, and in that case [T^{-1}]_{B_W,B_V} = A^{-1}.

QA #26: Since θ is exactly the angle the line y = 2x makes with the y-axis, rotating by 2θ carries [1; 2] to [−1; 2] and [−2; 1] to [−2; −1]; these are the columns of the matrix of the rotation with respect to B and E.
Proof Suppose first that T is invertible, and let B = [T^{-1}]_{B_W,B_V}. Since T is invertible, V and W have the same dimension, say n. By Theorem 3.47,

AB = [TT^{-1}]_{B_W,B_W} = [I]_{B_W,B_W} = I_n,

and so by Corollary 3.37, A is invertible and B = A^{-1}.

Now suppose that A is invertible, which implies that A is square and so dim V = dim W. Let S ∈ L(W, V) be the linear map with [S]_{B_W,B_V} = A^{-1}. Then by Theorem 3.47,

[ST]_{B_V,B_V} = A^{-1}A = I_n,

and so ST = I. Thus by Corollary 3.37, T is invertible and S = T^{-1}. ▲
KEY IDEAS

• Saying that the vector v has coordinates [v]_B = [c1; ...; cn] means exactly that v = c1 v1 + ··· + cn vn, where B = (v1,...,vn).
• The matrix of T with respect to bases B_V, B_W is denoted [T]_{B_V,B_W}, and defined so that [T]_{B_V,B_W}[v]_{B_V} = [Tv]_{B_W}. The matrix [T]_{B_V,B_W} has jth column [T vj]_{B_W}.
• A linear map T ∈ L(V) is diagonalizable if there is a basis of V in which the matrix of T is diagonal. In this case, all of the basis vectors are eigenvectors of T.
• [S]_{B_V,B_W}[T]_{B_U,B_V} = [ST]_{B_U,B_W}.

EXERCISES
3.5.1–3.5.8 [These exercises ask for coordinate representations of vectors and of linear maps with respect to various explicit bases of R^2 and R^3, and for computations done both directly and via Lemma 3.41; the specific bases, vectors, and matrices are illegible in the source.]
3.5.9 Let P be the plane

{ [x; y; z] ∈ R^3 : x − 2y + 3z = 0 }.

(a) Find a basis for P.
(b) Determine whether each of the following vectors is in P, and for each one that is, give its coordinate representation in terms of your basis. [The vectors are illegible in the source.]

3.5.10 Let P be the plane

{ [x; y; z] ∈ R^3 : 4x + y − 2z = 0 }.

(a) Find a basis for P.
(b) Determine whether each of the following vectors is in P, and for each one that is, give its coordinate representation in terms of your basis. [The vectors are illegible in the source.]
3.5.11 Let D : P_n(R) → P_{n−1}(R) be the derivative operator, and let T : P_{n−1}(R) → P_n(R) be the linear map given by multiplication by x:

(Tp)(x) = xp(x).

Find the matrices of the maps T, DT, and TD with respect to the bases (1, x, ..., x^n) and (1, x, ..., x^{n−1}).

3.5.12
(a) Show that B = (1, x, 3x^2 − 5) is a basis of P_2(R).
(b) Find the coordinate representation of x^2 with respect to B.
(c) Let D : P_2(R) → P_2(R) be the derivative operator. Find the coordinate representation of D with respect to B (i.e., with the same basis B on both the domain and the codomain).
(d) Use your answers to the last two parts to calculate the coordinate representation of the derivative of x^2.

3.5.13 Suppose that T ∈ L(V) is invertible, and that for some basis B of V, [T]_B = diag(λ1,...,λn). What is [T^{-1}]_B?

3.5.14 Let R ∈ L(R^2) be the counterclockwise rotation of the plane by π/2 radians.
(a) Show that R is not diagonalizable.
(b) Show that R^2 is diagonalizable.

3.5.15 Show that the projection P ∈ L(R^3) onto the x-y plane is diagonalizable.

3.5.16 Let L be a line through the origin in R^2, and let P ∈ L(R^2) be the orthogonal projection onto L. (See Exercise 2.1.2.) Show that P is diagonalizable.

3.5.17 Suppose that B = (v1,...,vn) is a linearly independent list in F^m, so that it is a basis for U = ⟨B⟩. Let A = [v1 ··· vn] ∈ M_{m,n}(F) be the matrix whose jth column is vj. Show that, for any u ∈ U, u = A[u]_B.

3.5.18 Suppose that B = (v1,...,vn) is a basis for F^n, and let A = [v1 ··· vn] ∈ M_n(F) be the matrix whose jth column is vj. Show that, for any x ∈ F^n, [x]_B = A^{-1}x.

3.5.19 Show that if S, T ∈ L(V) are both diagonalized by a basis B, then ST is also diagonalized by B.

3.5.20 Show that if B1 and B2 are bases of V, then [I]_{B1,B2} is an invertible matrix, and find its inverse.
3.6 Change of Basis

Change of Basis Matrices

Say B = (v1,...,vn) and B' = (v1',...,vn') are two different bases of V. In an abstract sense, at least, knowing the coordinate representation [v]_B of a vector v ∈ V is equivalent to knowing the vector v itself, and so it determines the coordinate representation [v]_{B'}. How can we actually express [v]_{B'} in terms of [v]_B?
Theorem 3.49 Let B and B' be two different bases of V. Then for any v ∈ V,

[v]_{B'} = [I]_{B,B'}[v]_B.

Proof Applying Lemma 3.41 to the identity map I : V → V,

[v]_{B'} = [Iv]_{B'} = [I]_{B,B'}[v]_B. ▲

Definition Let B and B' be two different bases of V. The matrix S = [I]_{B,B'} is called the change of basis matrix from B to B'.
Example Let B = ([1; 2], [−2; 1]) and B' = ([1; 1], [1; −1]). Then to find [I]_{B,B'}, we need to find [v]_{B'} for each of the elements v ∈ B; i.e., we need to identify the coefficients needed to express v as a linear combination of the elements of B'. This is equivalent to solving the linear systems with augmented matrices

[1 1 | 1        [1 1 | −2
 1 −1 | 2]  and  1 −1 | 1].

The solution of the first system is [3/2; −1/2]; that is,

[[1; 2]]_{B'} = [3/2; −1/2].

Solving the second system gives that the change of basis matrix is

[I]_{B,B'} = (1/2)[3 −1
                  −1 −3].    (3.13) ▲
It turns out that there is a more systematic way to find change of basis matrices when V = F^n, using the following two lemmas.

Lemma 3.50 Let E be the standard basis of F^n, and let B be any other basis of F^n. Then the columns of the change of basis matrix S = [I]_{B,E} are the column vectors in B, in order.

Proof Let B = (v1,...,vn). The jth column of S is the coordinate representation of Ivj = vj with respect to E. Since E is the standard basis of F^n, the jth column of S is vj itself. ▲

Lemma 3.51 Let B and B' be two bases of V, and let S be the change of basis matrix from B to B'. Then the change of basis matrix from B' to B is S^{-1}.

Proof By definition, S = [I]_{B,B'}, and so by Theorem 3.47,

S[I]_{B',B} = [I]_{B,B'}[I]_{B',B} = [I]_{B',B'} = I_n.

Since S is a square matrix, this proves that [I]_{B',B} = S^{-1} by Corollary 3.37. ▲
Quick Exercise #27. Let B = ([1; 2], [−2; 1]) and B' = ([1; 1], [1; −1]). Find the change of basis matrix from B' to B.

QA #27: Invert the change of basis matrix from (3.13) to get

[I]_{B',B} = (1/5)[3 −1
                  −1 −3].
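Change of basis matrices in F^n can be computed by solving the linear systems above all at once. The sketch below assumes NumPy, and uses the bases from the example.

```python
import numpy as np

B = np.column_stack([[1, 2], [-2, 1]])   # columns: the vectors of B
Bp = np.column_stack([[1, 1], [1, -1]])  # columns: the vectors of B'

# [I]_{B,B'} solves Bp @ X = B: it expresses each vector of B in B'-coordinates.
S = np.linalg.solve(Bp, B)
print(S)                 # [[ 1.5 -0.5], [-0.5 -1.5]], matching (3.13)
print(np.linalg.inv(S))  # the change of basis matrix from B' to B
```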
Theorem 3.47 and Lemma 3.51 now show that if B and C are two bases of F^n, then the change of basis matrix [I]_{B,C} is

[I]_{B,C} = [I]_{E,C}[I]_{B,E} = ([I]_{C,E})^{-1}[I]_{B,E},    (3.14)

and Lemma 3.50 tells us what both the matrices in this last expression are.

Example We saw in Section 3.2 that the list of vectors B = (v1,...,vn) in F^n given by

vj = e1 + ··· + ej

for 1 ≤ j ≤ n.

Suppose that n > 1 and that the theorem is known to be true for all vector spaces over F with dimension smaller than n. By Proposition 3.66, T has an eigenvalue λ and hence an eigenvector v. Write U = range(T − λI). If u ∈ U, then since U is a subspace,

Tu = (T − λI)u + λu ∈ U;

thus T restricts to a linear map U → U. Furthermore, since λ is an eigenvalue of T, T − λI is not injective. But this means that T − λI is also not surjective, and so
U ≠ V; i.e., m = dim U < n.

3.7.12 A matrix A ∈ M_n(F) is called strictly upper triangular if a_{ij} = 0 whenever i ≥ j. That is, A is upper triangular with all diagonal entries equal to 0. Show that if A ∈ M_n(F) is strictly upper triangular, then A^n = 0.

3.7.13 Suppose that F is algebraically closed and that A ∈ M_n(F) has only one eigenvalue λ ∈ F. Show that (A − λI_n)^n = 0.
Hint: Use Exercise 3.7.12.
3.7.14 Show that if V is finite-dimensional and T ∈ L(V), then there is a nonzero polynomial p(x) with coefficients in F such that p(T) = 0.

3.7.15 Show that the field F = {a + bi : a, b ∈ Q} (see Exercise 1.4.2) is not algebraically closed.
3.7.16 Show that the matrix

[1 1
 1 0]

over F_2 does not have an eigenvalue.

3.7.17 Prove that if F is a field with only finitely many elements, then F is not algebraically closed.
Hint: Come up with a polynomial over F which has every element of F as a root, then add 1 to it.
PERSPECTIVES: Bases

The list (v1,...,vn) of vectors in a vector space V over a field F is a basis of V if any of the following hold.

• (v1,...,vn) is linearly independent and spans V.
• Every v ∈ V can be uniquely represented as v = c1 v1 + ··· + cn vn, with ci ∈ F for all i.
• (v1,...,vn) is linearly independent and dim V = n.
• (v1,...,vn) spans V and dim V = n.
• The RREF of the matrix [v1 ··· vn] is the identity I_n, where (v1,...,vn) are the representations of the basis vectors in any coordinate system.
PERSPECTIVES: Eigenvalues

A scalar λ ∈ F is an eigenvalue of T ∈ L(V) if either of the following holds.

• There is a nonzero v ∈ V with Tv = λv.
• The map T − λI is not invertible.
PERSPECTIVES: Isomorphisms

Let V and W be n-dimensional vector spaces over F. A linear map T : V → W is an isomorphism if any of the following hold.

• T is bijective.
• T is invertible.
• T is injective, or equivalently, null T = 0.
• T is surjective, or equivalently, rank T = n.
• If (v1,...,vn) is a basis of V, then (T(v1),...,T(vn)) is a basis of W.
• If A is the matrix of T with respect to bases B_V and B_W, then the columns of A form a basis of F^n.
• If A is the matrix of T with respect to bases B_V and B_W, the RREF of A is the identity I_n.
4 Inner Products
4.1 Inner Products We saw in Section 1.3 that there were various ways in which the geometry of R” could shed light on linear systems of equations. We used a very limited amount of geometry, though; we only made use of general vector space operations. The geometry of R” is much richer than that of an arbitrary vector space because of the concepts of length and angles; it is extensions of these ideas that we will explore in this chapter.
The Dot Product in R^n

Recall the following definition from Euclidean geometry.
Definition Let x, y ∈ R^n. The dot product or inner product of x and y is denoted ⟨x, y⟩ and is defined by

⟨x, y⟩ = Σ_{j=1}^n xj yj,

where x = [x1; ...; xn] and y = [y1; ...; yn].

Quick Exercise #1. Show that ⟨x, y⟩ = y^T x for x, y ∈ R^n.

The dot product is intimately related to the ideas of length and angle: the length ‖x‖ of a vector x ∈ R^n is given by

‖x‖ = √(x1² + ··· + xn²) = √⟨x, x⟩,
and the angle θ_{x,y} between two vectors x and y is given by

θ_{x,y} = cos^{-1}( ⟨x, y⟩ / (‖x‖ ‖y‖) ).

In particular, the dot product gives us a condition for perpendicularity: two vectors x, y ∈ R^n are perpendicular if they meet at a right angle, which by the formula above is equivalent to the condition ⟨x, y⟩ = 0. For example, the standard basis vectors e1,...,en ∈ R^n are perpendicular to each other, since ⟨ei, ej⟩ = 0 for i ≠ j.

Perpendicularity is an extremely useful concept in the context of linear algebra; the following proposition gives a first hint as to why.
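These formulas translate directly into code; a short sketch (assuming NumPy):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([-2.0, 1.0])

print(np.dot(x, y))       # 0.0: x and y are perpendicular
print(np.linalg.norm(x))  # length of x, the square root of <x, x>

# Angle between x and e1, via theta = arccos(<x, y> / (||x|| ||y||)).
e1 = np.array([1.0, 0.0])
theta = np.arccos(np.dot(x, e1) / (np.linalg.norm(x) * np.linalg.norm(e1)))
print(np.degrees(theta))  # about 63.4 degrees
```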
Proposition 4.1 Let (e1,...,en) denote the standard basis of R^n. If v = [v1; ...; vn] ∈ R^n, then for each j, vj = ⟨v, ej⟩.

Proof For v as above,

⟨v, ej⟩ = Σ_k vk (ej)_k = vj,

since ej has a 1 in the jth position and zeroes everywhere else. ▲
We will soon see that, while the computations are particularly easy with the standard basis, the crucial property that makes Proposition 4.1 work is the perpendicularity of the basis elements.
Inner Product Spaces

Motivated by the considerations above, we introduce the following extra kind of structure for vector spaces. Here and for the rest of the chapter, we will only allow the base field to be R or C. Recall that for z = a + ib ∈ C, the complex conjugate z̄ of z is defined by z̄ = a − ib, and the absolute value or modulus |z| is defined by |z| = √(a² + b²).

Definition Let F be either R or C, and let V be a vector space over F. An inner product is an operation on V, written ⟨v, w⟩, such that the following properties hold:

The inner product of two vectors is a scalar: For each v, w ∈ V, ⟨v, w⟩ ∈ F.
Distributive law: For each u, v, w ∈ V, ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.
Homogeneity: For each v, w ∈ V and a ∈ F, ⟨av, w⟩ = a⟨v, w⟩.
Symmetry: For each v, w ∈ V, ⟨v, w⟩ is the complex conjugate of ⟨w, v⟩.
Nonnegativity: For each v ∈ V, ⟨v, v⟩ ≥ 0.
Definiteness: If ⟨v, v⟩ = 0, then v = 0.

A vector space together with an inner product is called an inner product space.
Notice that if 𝔽 = ℝ, the symmetry property just says that ⟨v, w⟩ = ⟨w, v⟩, since by the first requirement, inner products on a real vector space are real numbers. Even if 𝔽 = ℂ, it follows from symmetry that ⟨v, v⟩ ∈ ℝ (we are implicitly using this observation in our nonnegativity requirement).
Examples
1. ℝⁿ with the inner product defined on page 225 is an inner product space.
2. For w, z ∈ ℂⁿ, the standard inner product ⟨w, z⟩ is defined by

⟨w, z⟩ = Σ_{j=1}^n wⱼ z̄ⱼ,

where w = (w₁, …, wₙ) and z = (z₁, …, zₙ).

Quick Exercise #2. Show that if w, z ∈ ℂⁿ, then ⟨w, z⟩ = z*w, where z* = [z̄₁ ⋯ z̄ₙ] ∈ M_{1,n}(ℂ) is the conjugate transpose of z.

3. Let ℓ² denote the set of square-summable sequences over ℝ; i.e., a = (a₁, a₂, …) ∈ ℓ² if Σ_{j=1}^∞ aⱼ² < ∞. Then

⟨a, b⟩ := Σ_{j=1}^∞ aⱼbⱼ    (4.1)

defines an inner product on ℓ². Verifying all but the first property is similarly straightforward to the case of the usual dot product on ℝⁿ. In order to verify that the definition above gives ⟨a, b⟩ ∈ ℝ for any a, b ∈ ℓ², one has to show that the sum in equation (4.1) always converges; we will postpone confirming this until later in the section (Example 4 on page 235).
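The identity ⟨w, z⟩ = z*w from Quick Exercise #2 is easy to check numerically. The NumPy sketch below is illustrative only (the vectors are made up, not from the text); note that np.vdot conjugates its FIRST argument, so the book's ⟨w, z⟩ = Σ wⱼz̄ⱼ corresponds to vdot(z, w).

```python
import numpy as np

# <w, z> = sum_j w_j * conj(z_j), which equals z* w (z* = conjugate transpose of z)
w = np.array([1 + 2j, 3 - 1j])
z = np.array([2j, 1 + 1j])

ip = np.sum(w * np.conj(z))      # the defining sum
ip_vdot = np.vdot(z, w)          # np.vdot conjugates its first argument
ip_matmul = np.conj(z) @ w       # z* w as a matrix product

assert np.isclose(ip, ip_vdot) and np.isclose(ip, ip_matmul)
```

All three expressions compute the same scalar, which is why the conjugate-transpose notation z*w is a convenient shorthand for the standard inner product on ℂⁿ.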
4. Let C([0, 1]) denote the vector space of continuous real-valued functions on [0, 1]. Define an inner product on C([0, 1]) by

⟨f, g⟩ := ∫₀¹ f(x)g(x) dx.

The properties in the definition are easily verified.
5. If C_ℂ([0, 1]) denotes the complex vector space of continuous complex-valued functions on [0, 1], then one can define an inner product on C_ℂ([0, 1]) by

⟨f, g⟩ := ∫₀¹ f(x)\overline{g(x)} dx. ▲
For the rest of this section, V will always be an inner product space.
As in the case of fields and vector spaces, we have assumed as little as possible in the definition of an inner product; the following properties all follow easily from the definition.

Proposition 4.2 Suppose that V is an inner product space. Then the following all hold:
1. For each u, v, w ∈ V, ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
2. For each v, w ∈ V and a ∈ 𝔽, ⟨v, aw⟩ = ā⟨v, w⟩.
3. For each v ∈ V, ⟨0, v⟩ = ⟨v, 0⟩ = 0.
4. If v ∈ V and ⟨v, w⟩ = 0 for every w ∈ V, then v = 0.

Proof
1. By the symmetry and distributive properties,

⟨u, v + w⟩ = \overline{⟨v + w, u⟩} = \overline{⟨v, u⟩ + ⟨w, u⟩} = \overline{⟨v, u⟩} + \overline{⟨w, u⟩} = ⟨u, v⟩ + ⟨u, w⟩.

2. By the symmetry and homogeneity properties,

⟨v, aw⟩ = \overline{⟨aw, v⟩} = \overline{a⟨w, v⟩} = ā \overline{⟨w, v⟩} = ā⟨v, w⟩.

3. By the distributive property,

⟨0, v⟩ = ⟨0 + 0, v⟩ = ⟨0, v⟩ + ⟨0, v⟩.

Subtracting ⟨0, v⟩ from both sides proves that ⟨0, v⟩ = 0. It then follows from symmetry that ⟨v, 0⟩ = 0̄ = 0.
4. Picking w = v, we have ⟨v, v⟩ = 0. By the definiteness property, this implies that v = 0. ▲
The following definition gives an analog to length in ℝⁿ.

Definition Let v ∈ V. The norm of v is

‖v‖ = √⟨v, v⟩.

If ‖v‖ = 1, then v is called a unit vector.

Examples
1. If v = (2, i, −1) in ℂ³ with the standard inner product, then

‖v‖ = √(|2|² + |i|² + |−1|²) = √(4 + 1 + 1) = √6.

2. Consider the function f(x) = 2x in the function space C([0, 1]) with the inner product discussed above. Then

‖f‖ = √(∫₀¹ 4x² dx) = 2/√3.

In this example, the norm does not seem much like the length of an arrow; the object whose norm we are taking is a function. We nevertheless think of the norm as a kind of measure of size. ▲

Note that ⟨v, v⟩ ≥ 0 by the definition of an inner product, so the square root is defined and nonnegative. Note also that, by the definiteness property of the inner product, ‖v‖ = 0 if and only if v = 0. If c ∈ 𝔽, then

‖cv‖ = √⟨cv, cv⟩ = √(|c|² ⟨v, v⟩) = |c| ‖v‖.

We refer to this property as positive homogeneity; it means that if we multiply a vector by a scalar c, its "length" gets multiplied by |c|.
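The norm computation from Example 1 and the positive homogeneity property are easy to reproduce in NumPy; the following is an illustrative sketch, with an arbitrary scalar c (the function-space example would need numerical integration instead).

```python
import numpy as np

v = np.array([2, 1j, -1])
norm_v = np.sqrt(np.vdot(v, v).real)     # ||v|| = sqrt(<v, v>)

assert np.isclose(norm_v, np.sqrt(6))            # matches Example 1
assert np.isclose(norm_v, np.linalg.norm(v))     # NumPy's built-in norm agrees

# positive homogeneity: ||c v|| = |c| ||v||
c = 3 - 4j
assert np.isclose(np.linalg.norm(c * v), abs(c) * norm_v)
```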
Orthogonality
We observed at the beginning of the section that the idea of perpendicularity was often useful; the following is the generalization of the familiar geometric notion to arbitrary inner product spaces.

Definition Two vectors v, w ∈ V are orthogonal or perpendicular if ⟨v, w⟩ = 0. A list of vectors (v₁, …, vₙ) in V is called orthogonal if ⟨vⱼ, vₖ⟩ = 0 whenever j ≠ k.
The following result is one reason for the importance of orthogonality in linear
algebra.
Theorem 4.3 An orthogonal list of nonzero vectors is linearly independent.

Proof Let (v₁, …, vₙ) be an orthogonal list of nonzero vectors, so that ⟨vⱼ, vₖ⟩ = 0 if j ≠ k, and suppose that

a₁v₁ + ⋯ + aₙvₙ = 0

for some scalars a₁, …, aₙ ∈ 𝔽. Then for each k = 1, …, n,

aₖ‖vₖ‖² = aₖ⟨vₖ, vₖ⟩ = Σ_{j=1}^n aⱼ⟨vⱼ, vₖ⟩ = ⟨Σ_{j=1}^n aⱼvⱼ, vₖ⟩ = ⟨0, vₖ⟩ = 0.

Since vₖ ≠ 0, we know that ‖vₖ‖² ≠ 0, and therefore this implies that aₖ = 0. Since this is true for each k, we conclude that (v₁, …, vₙ) is linearly independent. ▲

Quick Exercise #3. Use Theorem 4.3 to verify that the list of vectors (…) in ℝ⁴ is linearly independent.

The following result is sometimes called the Pythagorean Theorem for general inner product spaces.
Theorem 4.4 If (v₁, …, vₙ) is an orthogonal list of vectors, then

‖v₁ + ⋯ + vₙ‖² = ‖v₁‖² + ⋯ + ‖vₙ‖².

Proof By the definition of the norm and the distributive laws for the inner product,

‖Σ_{j=1}^n vⱼ‖² = ⟨Σ_{j=1}^n vⱼ, Σ_{k=1}^n vₖ⟩ = Σ_{j=1}^n Σ_{k=1}^n ⟨vⱼ, vₖ⟩.

QA #3: The list is orthogonal, so it is linearly independent by Theorem 4.3.
Figure 4.1 Decomposing a vector v as v = aw + u, where u is orthogonal to w.
By orthogonality, the only nonzero terms in this sum are those for which j = k, and so the sum is equal to

Σ_{j=1}^n ⟨vⱼ, vⱼ⟩ = Σ_{j=1}^n ‖vⱼ‖². ▲
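The Pythagorean identity is easy to check numerically for a concrete orthogonal pair; the pair below is an illustrative example, not taken from the text.

```python
import numpy as np

v1 = np.array([1.0, -1.0, 0.0])
v2 = np.array([1.0, 1.0, -2.0])
assert np.isclose(v1 @ v2, 0)    # the pair is orthogonal

lhs = np.linalg.norm(v1 + v2) ** 2
rhs = np.linalg.norm(v1) ** 2 + np.linalg.norm(v2) ** 2
assert np.isclose(lhs, rhs)      # ||v1 + v2||^2 = ||v1||^2 + ||v2||^2
```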
The next result shows us how we can make use of orthogonality even when we start off with nonorthogonal vectors.

Lemma 4.5 Given v, w ∈ V, v can be written as

v = aw + u,    (4.2)

where a ∈ 𝔽 and u ∈ V is orthogonal to w. Moreover, if w ≠ 0, we can take

a = ⟨v, w⟩/‖w‖²  and  u = v − (⟨v, w⟩/‖w‖²) w.

Proof Suppose that w ≠ 0. (For the case that w = 0, see Exercise 4.1.21.) It is obvious that equation (4.2) holds for the stated values of a and u, so we could simply verify that u is orthogonal to w, but let's first see where those values come from. The thing to notice is that if there is an a ∈ 𝔽 such that v = aw + u with u orthogonal to w, then by linearity,

⟨v, w⟩ = ⟨aw + u, w⟩ = a⟨w, w⟩ + ⟨u, w⟩ = a‖w‖²,

so the only possibility is a = ⟨v, w⟩/‖w‖². Once a has been determined, the choice of u is determined by equation (4.2) (which says u = v − aw), and it is easy to check that u is in fact orthogonal to w. ▲

The following innocuous-looking inequality is of central importance in the theory of inner product spaces.
Theorem 4.6 (The Cauchy–Schwarz inequality*) For every v, w ∈ V,

|⟨v, w⟩| ≤ ‖v‖ ‖w‖,

with equality if and only if v and w are collinear.

Proof If w = 0, then both sides of the inequality are 0, and also w = 0v, so v and w are collinear. If w ≠ 0, then by Lemma 4.5, we can write

v = aw + u,

where a = ⟨v, w⟩/‖w‖² and u is orthogonal to w. By Theorem 4.4,

‖v‖² = ‖aw‖² + ‖u‖² = |a|² ‖w‖² + ‖u‖² = |⟨v, w⟩|²/‖w‖² + ‖u‖².

Therefore

‖v‖² ≥ |⟨v, w⟩|²/‖w‖²,

which is equivalent to the claimed inequality. We get equality if and only if u = 0. That is true if and only if v = aw for some scalar a: in other words, exactly when v and w are collinear. ▲

The next result is also fundamental and is a consequence of the Cauchy–Schwarz inequality.
Theorem 4.7 (The triangle inequality) For any v, w ∈ V,

‖v + w‖ ≤ ‖v‖ + ‖w‖.

Proof By linearity and the Cauchy–Schwarz inequality,

‖v + w‖² = ⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨v, w⟩ + ⟨w, v⟩ + ⟨w, w⟩
= ‖v‖² + 2 Re⟨v, w⟩ + ‖w‖²
≤ ‖v‖² + 2|⟨v, w⟩| + ‖w‖²
≤ ‖v‖² + 2‖v‖‖w‖ + ‖w‖²
= (‖v‖ + ‖w‖)².

*This result is named after the French mathematician Augustin-Louis Cauchy and the German mathematician Hermann Schwarz, who each proved special cases of it. It is also often called Cauchy's inequality (especially by the French), Schwarz's inequality (especially by Germans), the Cauchy–Bunyakovsky–Schwarz inequality (especially by Russians, who have a point: Bunyakovsky proved Schwarz's special case before Schwarz did), and the Cauchy–Schwartz inequality (by people who don't know how to spell Schwarz properly).
Figure 4.2 The length of the v + w side of the triangle is less than or equal to the sum of the lengths of the other sides.

The result follows by taking square roots. ▲

This last proof is our first illustration of the following words to live by:

‖v‖² is much easier to work with than ‖v‖. See if you can get away with just taking square roots at the end of your proof.
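Both of the preceding inequalities can be stress-tested numerically on random vectors; the sketch below (illustrative only, with an arbitrary seed and dimension) checks Cauchy–Schwarz and the triangle inequality on many random pairs in ℝ⁵.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.standard_normal(5)
    w = rng.standard_normal(5)
    # Cauchy-Schwarz: |<v, w>| <= ||v|| ||w||
    assert abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w) + 1e-12
    # triangle inequality: ||v + w|| <= ||v|| + ||w||
    assert np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w) + 1e-12
```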
The next corollary is just a slightly different way of looking at the triangle inequality, which moves the triangle in question away from the origin.
Corollary 4.8 For any u, v, w ∈ V,

‖u − v‖ ≤ ‖u − w‖ + ‖w − v‖.
Quick Exercise #4. Prove Corollary 4.8.
More Examples of Inner Product Spaces
1. Let c₁, …, cₙ > 0 be fixed positive numbers, and define

⟨x, y⟩ = Σ_{j=1}^n cⱼ xⱼ yⱼ

for x, y ∈ ℝⁿ. This defines an inner product on ℝⁿ which is different from the standard inner product (which corresponds to cⱼ = 1 for each j). The standard basis vectors are still orthogonal, but they are no longer unit vectors. Instead, for each j, (1/√cⱼ) eⱼ is a unit vector. This means that the collection of all unit vectors (the unit sphere for this inner product space) is an ellipsoid (or, in two dimensions, simply an ellipse):

QA #4: Note that u − v = (u − w) + (w − v). Apply the triangle inequality.
Figure 4.3 The unit circle for the inner product ⟨x, y⟩ = c₁x₁y₁ + c₂x₂y₂ on ℝ².
2. The following is the function space version of the previous example. Fix a function h ∈ C([a, b]) such that h(x) > 0 for every x ∈ [a, b]. For f, g ∈ C([a, b]), define

⟨f, g⟩ = ∫ₐᵇ f(x)g(x)h(x) dx.

This defines an inner product.
3. As we've discussed before, we can think of the space M_{m,n}(ℂ) of m×n complex matrices as being ℂ^{mn} written in an unusual way. The standard inner product on ℂ^{mn} then becomes

⟨A, B⟩ = Σ_{j=1}^m Σ_{k=1}^n aⱼₖ b̄ⱼₖ    (4.3)

for A, B ∈ M_{m,n}(ℂ). Recalling that the conjugate transpose of B has (k, j) entry b̄ⱼₖ, we can see that Σ_{k=1}^n aⱼₖ b̄ⱼₖ = [AB*]ⱼⱼ, and so equation (4.3) becomes

⟨A, B⟩ = tr AB*.

Since this involves both matrix multiplication and the trace, this definition of inner product on M_{m,n}(ℂ) seems at least possibly sensible to work with. It is most often called the Frobenius inner product, and the associated norm ‖A‖_F = √(tr AA*) is most often called the Frobenius norm* of A.

Quick Exercise #5. Suppose that A, B ∈ Mₙ(ℝ) are symmetric, meaning that A = Aᵀ and B = Bᵀ. Prove that

(tr AB)² ≤ (tr A²)(tr B²).

*It's also called the Hilbert–Schmidt norm or the Schur norm, among other names.

QA #5: This is just the Cauchy–Schwarz inequality applied to the Frobenius inner product of A and B.
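The identity ⟨A, B⟩ = tr AB* and the inequality from Quick Exercise #5 can both be checked numerically; the matrices below are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# <A, B> = tr(A B*) agrees with the entrywise sum in equation (4.3)
frob = np.trace(A @ B.conj().T)
assert np.isclose(frob, np.sum(A * B.conj()))

# Quick Exercise #5: for symmetric S, T, (tr ST)^2 <= (tr S^2)(tr T^2)
S, T = A + A.T, B + B.T
assert np.trace(S @ T) ** 2 <= np.trace(S @ S) * np.trace(T @ T) + 1e-9
```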
4. Returning to the example of ℓ², of square-summable sequences of real numbers, we still need to verify that the formula

⟨a, b⟩ = Σ_{j=1}^∞ aⱼbⱼ

does indeed define a real number when a, b ∈ ℓ². The issue is whether the infinite sum on the right-hand side converges. The assumption a, b ∈ ℓ² means that

Σ_{j=1}^∞ aⱼ² < ∞  and  Σ_{j=1}^∞ bⱼ² < ∞.

By the Cauchy–Schwarz inequality, for each n ∈ ℕ,

Σ_{j=1}^n |aⱼbⱼ| ≤ √(Σ_{j=1}^n aⱼ²) √(Σ_{j=1}^n bⱼ²) ≤ √(Σ_{j=1}^∞ aⱼ²) √(Σ_{j=1}^∞ bⱼ²).

Taking the limit as n → ∞, we obtain

Σ_{j=1}^∞ |aⱼbⱼ| ≤ √(Σ_{j=1}^∞ aⱼ²) √(Σ_{j=1}^∞ bⱼ²) < ∞.

So the original series Σ_{j=1}^∞ aⱼbⱼ converges absolutely; by a theorem from calculus, it therefore converges to some real number. This completes the demonstration that ℓ² is an inner product space. ▲
KEY IDEAS
• An inner product space is a vector space with an inner product ⟨·,·⟩, which is a homogeneous, conjugate-symmetric, positive-definite scalar-valued function on pairs of vectors.
• Inner products define orthogonality (perpendicularity): v and w are orthogonal if ⟨v, w⟩ = 0.
• Inner products define length: ‖v‖ := √⟨v, v⟩.
• Examples of inner products include the usual dot product on ℝⁿ, a modified version on ℂⁿ, and extensions of the same to sequence spaces and function spaces. The Frobenius inner product on matrices is ⟨A, B⟩ := tr(AB*).
• If v and w are orthogonal, ‖v + w‖ = √(‖v‖² + ‖w‖²).
• The Cauchy–Schwarz inequality: |⟨v, w⟩| ≤ ‖v‖ ‖w‖.
• The triangle inequality: ‖v + w‖ ≤ ‖v‖ + ‖w‖.
EXERCISES

4.1.1 Use Theorem 4.3 to show that each of the following lists is linearly independent.
(a) (…) in ℝ⁴
(b) (…) in ℂ³
(c) (…) in M₂,₃(ℝ)
(d) (x² − 6x + 1, x² − 2x + 1, 10x² − 8x + 1) in C([0, 1])

4.1.2 Suppose that V is a real inner product space, v, w ∈ V, and that

‖v − w‖² = 3  and  ‖v + w‖² = 7.

Compute each of the following:
(a) ⟨v, w⟩  (b) ‖v‖² + ‖w‖²

4.1.3 Suppose that V is a real inner product space, v, w ∈ V, and that

‖v + w‖² = 10  and  ‖v − w‖² = 16.

Compute each of the following:
(a) ⟨v, w⟩  (b) ‖v‖² + ‖w‖²

4.1.4 Let A, B ∈ M_{m,n}(ℂ). Show that

⟨A, B⟩ = Σ_{j=1}^n ⟨aⱼ, bⱼ⟩  and  ‖A‖_F² = Σ_{j=1}^n ‖aⱼ‖²,

where the aⱼ and bⱼ are the columns of A and B, respectively.

4.1.5
For A ∈ Mₙ(ℂ), the Hermitian part of A is the matrix

Re A := ½(A + A*)

and the anti-Hermitian part of A is the matrix

Im A := (1/2i)(A − A*).

(The notation is analogous to the real and imaginary parts of a complex number.)
(a) Show that if A ∈ Mₙ(ℝ), then Re A and Im A are orthogonal (with respect to the Frobenius inner product).
(b) Show that ‖A‖_F² = ‖Re A‖_F² + ‖Im A‖_F².

4.1.6 Suppose that A = [a₁ ⋯ aₙ] ∈ M_{m,n}(ℂ) and B = [b₁ ⋯ bₚ] ∈ M_{m,p}(ℂ). Show that the (j, k) entry of A*B is aⱼ*bₖ = ⟨bₖ, aⱼ⟩.

4.1.7 Equip C([0, 2π]) with the inner product

⟨f, g⟩ = ∫₀^{2π} f(x)g(x) dx.

Let f(x) = sin x and g(x) = cos x. Compute each of the following:
(a) ‖f‖  (b) ‖g‖  (c) ⟨f, g⟩
Then find ‖af + bg‖, where a, b ∈ ℝ are any constants, without computing any more integrals.

4.1.8 Suppose that u, v ∈ V, ‖u‖ = 2 and ‖v‖ = 11. Prove that there is no vector w ∈ V with ‖u − w‖ < 4 and ‖v − w‖ < 4.
4.1.9 Suppose that V is a vector space, W is an inner product space, and T : V → W is injective. For v₁, v₂ ∈ V, define

⟨v₁, v₂⟩_T := ⟨Tv₁, Tv₂⟩,

where the right-hand side involves the given inner product on W. Prove that this defines an inner product on V.

4.1.10 Suppose that V is an inner product space, W is a vector space, and T : V → W is an isomorphism. For w₁, w₂ ∈ W, define

⟨w₁, w₂⟩ := ⟨T⁻¹w₁, T⁻¹w₂⟩,

where the right-hand side involves the given inner product on V. Prove that this defines an inner product on W.

4.1.11 Define

⟨(x₁, x₂), (y₁, y₂)⟩ := x₁y₁ + 4x₂y₂.

(a) Use Exercise 4.1.9 to verify that this is an inner product (different from the standard inner product!) on ℝ².
(b) Give an example of two vectors in ℝ² which are orthogonal with respect to the standard inner product, but not with respect to this one.
(c) Give an example of two vectors in ℝ² which are orthogonal with respect to this inner product, but not with respect to the standard inner product.

4.1.12 (a) Show that v = 0 if and only if ⟨v, w⟩ = 0 for every w ∈ V.
(b) Show that v = w if and only if ⟨v, u⟩ = ⟨w, u⟩ for every u ∈ V.
(c) Let S, T ∈ L(V, W). Show that S = T if and only if ⟨Sv₁, v₂⟩ = ⟨Tv₁, v₂⟩ for every v₁, v₂.

4.1.13 Show that the Frobenius norm is not an invariant on Mₙ(ℂ).
4.1.14 (a) Prove that if V is a real inner product space, then

⟨v, w⟩ = ¼(‖v + w‖² − ‖v − w‖²)

for each v, w ∈ V.
(b) Prove that if V is a complex inner product space, then

⟨v, w⟩ = ¼(‖v + w‖² − ‖v − w‖² + i‖v + iw‖² − i‖v − iw‖²)

for each v, w ∈ V.
These are known as the polarization identities.

4.1.15 Show that if a₁, …, aₙ, b₁, …, bₙ ∈ ℝ, then

(Σ_{j=1}^n aⱼbⱼ)² ≤ (Σ_{j=1}^n aⱼ²)(Σ_{j=1}^n bⱼ²).

This is known as Cauchy's inequality.

4.1.16 Show that if f, g : [a, b] → ℝ are continuous, then

(∫ₐᵇ f(x)g(x) dx)² ≤ (∫ₐᵇ f(x)² dx)(∫ₐᵇ g(x)² dx).

This is known as Schwarz's inequality.

4.1.17 Prove that

(a₁b₁ + a₂b₂ + ⋯ + aₙbₙ)² ≤ (a₁² + 2a₂² + ⋯ + naₙ²)(b₁² + b₂²/2 + ⋯ + bₙ²/n)

for all a₁, …, aₙ, b₁, …, bₙ ∈ ℝ.

4.1.18 Under what circumstances does equality hold in the triangle inequality?

4.1.19 Suppose that V is a complex inner product space. Define

⟨v, w⟩_ℝ := Re⟨v, w⟩,

where the inner product on the right is the original inner product on V. Show that if we think of V as just a real vector space, then ⟨·,·⟩_ℝ is an inner product on V.

4.1.20 Give another proof of part 3 of Proposition 4.2 using homogeneity instead of the distributive property.

4.1.21 Prove Lemma 4.5 in the case that w = 0.

4.1.22 Given v, w ∈ V, define the function

f(t) = ‖v + tw‖²

for t ∈ ℝ. Find the minimum value c of f, and use the fact that c ≥ 0 to give another proof of the Cauchy–Schwarz inequality.
4.2 Orthonormal Bases

Orthonormality
Working in bases within an inner product space is often easier than it is in general vector spaces, because of the existence of bases with the following special property.

Definition A finite or countable set {eⱼ} of vectors in V is called orthonormal if

⟨eⱼ, eₖ⟩ = 0 if j ≠ k, and ⟨eⱼ, eⱼ⟩ = 1 for each j.

If B = (e₁, …, eₙ) is an orthonormal list which is also a basis of V, then we call B an orthonormal basis of V.

In other words, an orthonormal set of vectors is a collection of mutually perpendicular unit vectors.
Examples
1. The standard basis (e₁, …, eₙ) of ℝⁿ or of ℂⁿ is orthonormal.
2. The list ( (1/√2)(1, 1), (1/√2)(1, −1) ) is an orthonormal basis of either ℝ² or ℂ².

Quick Exercise #6. Show that for any fixed θ ∈ ℝ, ( (cos θ, sin θ), (−sin θ, cos θ) ) is an orthonormal basis of ℝ² and of ℂ².

3.
Consider the space C_{2π}(ℝ) of continuous 2π-periodic functions f : ℝ → ℝ, with the inner product

⟨f, g⟩ = ∫₀^{2π} f(θ)g(θ) dθ.

Then {1, sin(nθ), cos(nθ) | n ∈ ℕ} is an orthogonal collection of functions:

∫₀^{2π} sin(nθ) cos(mθ) dθ = 0 for all m and n,

and if n ≠ m,

∫₀^{2π} sin(nθ) sin(mθ) dθ = 0  and  ∫₀^{2π} cos(nθ) cos(mθ) dθ = 0.

On the other hand,

∫₀^{2π} sin²(nθ) dθ = π  and  ∫₀^{2π} cos²(nθ) dθ = π.

It follows that

T := { 1/√(2π), (1/√π) sin(nθ), (1/√π) cos(nθ) | n ∈ ℕ }

is an orthonormal set, sometimes called the trigonometric system. It is not an orthonormal basis of C_{2π}(ℝ) because that space is infinite-dimensional. Nevertheless, any finite subcollection of T is an orthonormal basis of its span.

4. The set {(1/√(2π)) e^{inθ} | n ∈ ℤ} is orthonormal in the complex version of C_{2π}. Indeed, if n = m, then ∫₀^{2π} e^{inθ} e^{−imθ} dθ = 2π trivially, and if n ≠ m, then

∫₀^{2π} e^{inθ} e^{−imθ} dθ = [ e^{i(n−m)θ} / (i(n−m)) ]₀^{2π} = 0.
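The orthonormality computations above can be verified numerically by replacing each integral with a Riemann sum on a fine grid; the particular pair checked below (normalized sin 2θ and cos 3θ) is an arbitrary illustrative choice from the trigonometric system.

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 200001)
dtheta = theta[1] - theta[0]

def ip(f, g):
    # approximate <f, g> = integral of f(t) g(t) over [0, 2*pi] by a Riemann sum
    return np.sum(f(theta) * g(theta)) * dtheta

s2 = lambda t: np.sin(2 * t) / np.sqrt(np.pi)
c3 = lambda t: np.cos(3 * t) / np.sqrt(np.pi)

assert abs(ip(s2, c3)) < 1e-6        # distinct elements are orthogonal
assert abs(ip(s2, s2) - 1) < 1e-3    # each element is a unit vector
assert abs(ip(c3, c3) - 1) < 1e-3
```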
Notice that the real and imaginary parts of these functions recover the trigonometric system described above. ▲

Given 1 ≤ j ≤ n, suppose that (e₁, …, e_{j−1}) is orthonormal and it is already known that ⟨e₁, …, e_{j−1}⟩ = ⟨v₁, …, v_{j−1}⟩. Since (v₁, …, vₙ) is linearly independent,

vⱼ ∉ ⟨v₁, …, v_{j−1}⟩ = ⟨e₁, …, e_{j−1}⟩,

which implies that ẽⱼ ≠ 0, so eⱼ is defined and is a unit vector. For each m = 1, …, j − 1,

⟨ẽⱼ, e_m⟩ = ⟨vⱼ − Σ_{k=1}^{j−1} ⟨vⱼ, eₖ⟩eₖ, e_m⟩ = ⟨vⱼ, e_m⟩ − Σ_{k=1}^{j−1} ⟨vⱼ, eₖ⟩⟨eₖ, e_m⟩ = ⟨vⱼ, e_m⟩ − ⟨vⱼ, e_m⟩ = 0

since (e₁, …, e_{j−1}) is orthonormal, and so ⟨eⱼ, e_m⟩ = 0. Thus (e₁, …, eⱼ) is orthonormal. It follows from the definition that

eⱼ ∈ ⟨e₁, …, e_{j−1}, vⱼ⟩ = ⟨v₁, …, vⱼ⟩,

and thus ⟨e₁, …, eⱼ⟩ ⊆ ⟨v₁, …, vⱼ⟩. Finally, we know that (v₁, …, vⱼ) is linearly independent (by assumption) and that (e₁, …, eⱼ) is linearly independent (since it is orthonormal), and so

dim⟨v₁, …, vⱼ⟩ = dim⟨e₁, …, eⱼ⟩ = j.

Therefore in fact ⟨e₁, …, eⱼ⟩ = ⟨v₁, …, vⱼ⟩. Since the previous argument applies for each j = 1, …, n, it follows in particular that ⟨e₁, …, eₙ⟩ = ⟨v₁, …, vₙ⟩ = V, and so (e₁, …, eₙ) is also a basis of V. ▲
Examples
1. Consider the plane

U = {(x, y, z) | x + y + z = 0}.

Then the list (v₁, v₂) = ((1, −1, 0), (1, 0, −1)) is a non-orthonormal basis for U. We construct an orthonormal basis (f₁, f₂) from it using the Gram–Schmidt process:

f₁ = v₁/‖v₁‖ = (1/√2)(1, −1, 0),

f̃₂ = v₂ − ⟨v₂, f₁⟩f₁ = (1, 0, −1) − ½(1, −1, 0) = (½, ½, −1),

f₂ = f̃₂/‖f̃₂‖ = (1/√6)(1, 1, −2).
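The computation in Example 1 can be reproduced with a short NumPy routine; this is an illustrative sketch of the classical Gram–Schmidt iteration for real column vectors, not code from the text.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of real vectors."""
    es = []
    for v in vectors:
        # subtract the projections onto the vectors found so far
        u = v - sum(((v @ e) * e for e in es), np.zeros_like(v))
        es.append(u / np.linalg.norm(u))
    return es

v1 = np.array([1.0, -1.0, 0.0])
v2 = np.array([1.0, 0.0, -1.0])
f1, f2 = gram_schmidt([v1, v2])

assert np.allclose(f1, np.array([1, -1, 0]) / np.sqrt(2))
assert np.allclose(f2, np.array([1, 1, -2]) / np.sqrt(6))
```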
Quick Exercise #8. How could you extend (f₁, f₂) to an orthonormal basis of all of ℝ³?
2. Suppose we want an orthonormal basis of P₂(ℝ) equipped with the inner product

⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.    (4.6)

We've already seen that (1, x, x²) is not orthonormal, but the Gram–Schmidt process can be used to produce an orthonormal basis of its span:

• e₁(x) = 1/‖1‖ = 1.
• ẽ₂(x) = x − ⟨x, 1⟩1 = x − ∫₀¹ y dy = x − ½.
• ‖ẽ₂‖ = √(∫₀¹ (x − ½)² dx) = √(1/12) = 1/(2√3).
• e₂(x) = ẽ₂(x)/‖ẽ₂‖ = 2√3 x − √3.
• ẽ₃(x) = x² − ⟨x², 1⟩1 − ⟨x², 2√3x − √3⟩(2√3x − √3) = x² − x + 1/6.
• ‖ẽ₃‖ = √(∫₀¹ (x² − x + 1/6)² dx) = 1/(6√5).
• e₃(x) = ẽ₃(x)/‖ẽ₃‖ = 6√5x² − 6√5x + √5.

We therefore have that (1, 2√3x − √3, 6√5x² − 6√5x + √5) is an orthonormal basis of ⟨1, x, x²⟩, with respect to the inner product in equation (4.6). ▲
Quick Exercise #9. What would happen if you tried to apply the Gram–Schmidt process to a linearly dependent list (v₁, …, vₙ)?
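The same process runs in the polynomial example if we represent polynomials by coefficient arrays (constant term first) and integrate products over [0, 1]. The sketch below uses numpy.polynomial helpers and recovers the basis computed above; it is illustrative, not code from the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def ip(p, q):
    # <p, q> = integral over [0, 1] of p(x) q(x) dx, as in equation (4.6)
    antideriv = P.polyint(P.polymul(p, q))
    return P.polyval(1.0, antideriv) - P.polyval(0.0, antideriv)

def gram_schmidt(polys):
    es = []
    for v in polys:
        u = np.asarray(v, dtype=float)
        for e in es:
            u = P.polysub(u, ip(v, e) * e)   # subtract projection onto e
        es.append(u / np.sqrt(ip(u, u)))
    return es

e1, e2, e3 = gram_schmidt([[1.0], [0.0, 1.0], [0.0, 0.0, 1.0]])

assert np.allclose(e1, [1.0])                                   # e1(x) = 1
assert np.allclose(e2, [-np.sqrt(3), 2 * np.sqrt(3)])           # 2*sqrt(3) x - sqrt(3)
assert np.allclose(e3, [np.sqrt(5), -6 * np.sqrt(5), 6 * np.sqrt(5)])
```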
While we usually think of the Gram–Schmidt process as a computational tool, it also has important theoretical consequences, as in the following results.

Corollary 4.12 Suppose that V is a finite-dimensional inner product space. Then there is an orthonormal basis for V.

QA #8: Pick any vector v₃ ∉ ⟨v₁, v₂⟩ and then apply the Gram–Schmidt process to (f₁, f₂, v₃). Alternatively, observe that (1/√3)(1, 1, 1) is orthogonal to every vector in U, so it works as f₃.

QA #9: By the Linear Dependence Lemma, at some point we would have vⱼ ∈ ⟨v₁, …, v_{j−1}⟩, and so ẽⱼ = 0, making it impossible to compute eⱼ.
Proof By Corollary 3.12, there is a basis (v₁, …, vₙ) for V. The Gram–Schmidt process (Algorithm 4.11) can then be used to produce an orthonormal basis for V. ▲

More generally:

Corollary 4.13 Suppose B = (v₁, …, vₖ) is an orthonormal list in a finite-dimensional inner product space V. Then B can be extended to an orthonormal basis B′ of V.

Proof By Theorem 3.27, B can be extended to a basis (v₁, …, vₙ) of V. Applying the Gram–Schmidt process (Algorithm 4.11) to this basis produces an orthonormal basis B′ = (e₁, …, eₙ) for V. Furthermore, since (v₁, …, vₖ) are already orthonormal, eⱼ = vⱼ for j = 1, …, k. Thus B′ extends the original list B. ▲

KEY IDEAS
• An orthonormal basis is a basis consisting of unit vectors which are all perpendicular to each other.
• With respect to an orthonormal basis, coordinates are gotten by taking inner products with the basis vectors.
• Every finite-dimensional inner product space has an orthonormal basis.
• The Gram–Schmidt process lets you start with any basis of a finite-dimensional inner product space and orthonormalize it.
EXERCISES

4.2.1 Verify that each of the following lists of vectors is an orthonormal basis.
(a) (…) in ℝ³
(b) ( (1/√3)(1, 1, 1), (1/√3)(1, e^{2πi/3}, e^{4πi/3}), (1/√3)(1, e^{4πi/3}, e^{2πi/3}) ) in ℂ³

4.2.2 Verify that each of the following lists of vectors is an orthonormal basis.
(a) (…) in {x ∈ ℝ³ | … = 0}
(b) (…)
(c) ( ½(1, 1, 1, 1), ½(1, i, −1, −i), ½(1, −1, 1, −1), ½(1, −i, −1, i) ) in ℂ⁴

4.2.3 Find the coordinate representation of each of the following vectors with respect to the orthonormal basis given in the corresponding part of Exercise 4.2.1.
(a) (…)  (b) (…)

4.2.4 Find the coordinate representation of each of the following vectors with respect to the orthonormal basis given in the corresponding part of Exercise 4.2.2.
(a) (…)  (b) (…)  (c) (…)

4.2.5
Find the matrix representing each of the following linear maps with respect to the orthonormal basis given in the corresponding part of Exercise 4.2.1.
(a) Reflection across the line y = x
(b) Projection onto the x-y plane
(c) T(x, y, z) = (y, z, x)
(d) (…)

4.2.6 Find the matrix representing each of the following linear maps with respect to the orthonormal basis given in the corresponding part of Exercise 4.2.2.
(a) (…)
(b) T(A) = Aᵀ
(c) (…)

4.2.7 (a) Verify that B = (…) is an orthonormal basis of

U = {(x, y, z) | x + y + z = 0}.

(b) Find the coordinate representation with respect to B of the vector (…).
(c) Find the matrix with respect to B of the linear map T : U → U given by (…).

4.2.8 A function of the form

f(θ) = Σ_{k=1}^n aₖ sin(kθ) + b₀ + Σ_{ℓ=1}^m b_ℓ cos(ℓθ)

for a₁, …, aₙ, b₀, …, bₘ ∈ ℝ is called a trigonometric polynomial. If we put the inner product

⟨f, g⟩ = ∫₀^{2π} f(θ)g(θ) dθ

on the space of trigonometric polynomials, express ‖f‖ in terms of the aₖ and b_ℓ without computing any integrals.

4.2.9 Recall the basis B = (1, 2√3x − √3, 6√5x² − 6√5x + √5) of P₂(ℝ) from the example on page 246, and let D ∈ L(P₂(ℝ)) be the derivative operator. Find [D]_B.

4.2.10 Consider the inner product ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx on P₂(ℝ) and the basis (1, x, x²).
(a) Compute ‖3 − 2x + x²‖.
(b) You should have found that ‖3 − 2x + x²‖ ≠ √(3² + 2² + 1²). Why does this not contradict Theorem 4.10?
4.2.11 Carry out the Gram–Schmidt process on each of the following lists of vectors.
(a) (…)  (b) (…)

4.2.12 Carry out the Gram–Schmidt process on each of the following lists of vectors.
(a) (…)  (b) (…)  (c) (…)

4.2.13 Use the idea of Quick Exercise #9 to determine whether each of the following lists is linearly independent.
(a) (…)  (b) (…)  (c) (…)

4.2.14 Use the idea of Quick Exercise #9 to determine whether each of the following lists is linearly independent.
(a) (…)  (b) (…)  (c) (…)

4.2.15 Carry out the Gram–Schmidt process on the list of polynomials (1, x, x², x³) with respect to each of the following inner products.
(a) ⟨p, q⟩ = ∫₀¹ p(x)q(x) dx  (b) ⟨p, q⟩ = (…)  (c) ⟨p, q⟩ = (…)
4.2.16 Find an orthonormal basis (f₁, f₂, f₃) of ℝ³ such that f₁ and f₂ are in the subspace

U = {(x, y, z) | x − 2y + 3z = 0}.

4.2.17 Let

A = [0 1 1; 1 0 1; 1 1 0] ∈ M₃(ℝ),

and define an inner product on ℝ³ by

⟨x, y⟩_A = ⟨Ax, Ay⟩,

where the inner product on the right-hand side is the standard inner product on ℝ³. Find an orthonormal basis of ℝ³ with respect to ⟨·,·⟩_A.

4.2.18 Let A ∈ Mₙ(ℂ) be an invertible matrix, and define a non-standard inner product ⟨·,·⟩_A on ℂⁿ by

⟨x, y⟩_A = ⟨Ax, Ay⟩,

where the inner product on the right-hand side is the standard inner product on ℂⁿ.
(a) Find ⟨eⱼ, eₖ⟩_A for all j and k.
(b) Under what circumstances is the standard basis (e₁, …, eₙ) of ℂⁿ orthonormal with respect to ⟨·,·⟩_A?

4.2.19 Prove that the space C_{2π}(ℝ) of continuous 2π-periodic functions is infinite-dimensional.

4.2.20 Suppose that the matrix of T ∈ L(V) with respect to some basis B is upper triangular. Show that if E is the orthonormal basis obtained by applying the Gram–Schmidt process to B, then [T]_E is also upper triangular.

4.2.21 Prove that if (e₁, …, eₙ) is the orthonormal basis constructed from (v₁, …, vₙ) via the Gram–Schmidt process, then ⟨vⱼ, eⱼ⟩ > 0 for each j.

4.2.22 Suppose that A ∈ Mₙ(ℂ) is upper triangular and invertible. Show that if the Gram–Schmidt process is applied to the list of columns of A, it produces an orthonormal basis of the form (ω₁e₁, …, ωₙeₙ), where (e₁, …, eₙ) is the standard basis and |ωⱼ| = 1 for each j.
4.3 Orthogonal Projections and Optimization

Orthogonal Complements and Direct Sums
The idea of decomposing a vector v into a component in the direction of u and a component orthogonal to u has come up several times now: for example, in the proof of the Cauchy–Schwarz inequality or in the Gram–Schmidt algorithm. We can begin to make more systematic use of this idea through the concept of orthogonal complements.

Definition Let U be a subspace of an inner product space V. The orthogonal complement of U is the subspace

U⊥ = {v ∈ V | ⟨u, v⟩ = 0 for every u ∈ U}.

That is, U⊥ consists of all those vectors v which are orthogonal to every vector in U.

Quick Exercise #10. Verify that U⊥ really is a subspace.
Examples
1. If L is a line through the origin in ℝ², then L⊥ is the perpendicular line through the origin.
2. If P is a plane through the origin in ℝ³, then P⊥ is the line perpendicular to P through the origin. If L is a line through the origin in ℝ³, then L⊥ is the plane through the origin to which L is perpendicular. ▲

Figure 4.4 Orthogonal complements in ℝ³.

QA #10: 0 ∈ U⊥ since ⟨u, 0⟩ = 0; if ⟨u, v₁⟩ = ⟨u, v₂⟩ = 0, then ⟨u, v₁ + v₂⟩ = 0 by additivity, and if ⟨u, v⟩ = 0, then ⟨u, av⟩ = 0 by (conjugate) homogeneity.
The following theorem allows us to uniquely decompose any vector in an inner product space V into a part in a subspace U and a part in the orthogonal complement U⊥.

Theorem 4.14 Let V be an inner product space. If U is a finite-dimensional subspace of V, then every vector v ∈ V can be uniquely written in the form

v = u + w,

where u ∈ U and w ∈ U⊥.

Proof Since U is finite-dimensional, there exists an orthonormal basis (e₁, …, e_m) of U. Given v ∈ V, we define

u = Σ_{j=1}^m ⟨v, eⱼ⟩eⱼ

and let w = v − u. Then v = u + w and u ∈ U trivially. Note that, for each k,

⟨u, eₖ⟩ = ⟨v, eₖ⟩,

so

⟨w, eₖ⟩ = ⟨v, eₖ⟩ − ⟨u, eₖ⟩ = 0.

Now if u′ is any other vector in U, then we can write

u′ = Σ_{j=1}^m aⱼeⱼ,

so that

⟨w, u′⟩ = Σ_{j=1}^m āⱼ⟨w, eⱼ⟩ = 0.

Therefore w ∈ U⊥.

It remains to show that u and w are unique. Suppose that u₁, u₂ ∈ U, w₁, w₂ ∈ U⊥, and that u₁ + w₁ = u₂ + w₂. Consider the vector

x := u₁ − u₂ = w₂ − w₁.

Since U and U⊥ are subspaces, x ∈ U and x ∈ U⊥, so ⟨x, x⟩ = 0, which means that x = 0. Therefore u₁ = u₂ and w₁ = w₂. ▲
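The decomposition in the proof, u = Σⱼ ⟨v, eⱼ⟩eⱼ and w = v − u, is concrete enough to check numerically. The sketch below uses the orthonormal basis of the plane U = {x + y + z = 0} found in Section 4.2, applied to an arbitrary illustrative vector v.

```python
import numpy as np

e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)

v = np.array([3.0, 1.0, 2.0])
u = (v @ e1) * e1 + (v @ e2) * e2    # u = sum_j <v, e_j> e_j, which lies in U
w = v - u                            # w lies in the orthogonal complement of U

assert np.isclose(u.sum(), 0)        # u satisfies x + y + z = 0
assert np.isclose(w @ e1, 0) and np.isclose(w @ e2, 0)   # w is orthogonal to U
assert np.allclose(u + w, v)
```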
When we can decompose every vector in an inner product space as a sum as in Theorem 4.14, we say the space itself is the orthogonal direct sum of the subspaces. More specifically, we have the following definition.

Definition Suppose that U₁, …, U_m are subspaces of an inner product space V. We say that V is the orthogonal direct sum of U₁, …, U_m if:
• every vector v ∈ V can be written as

v = u₁ + ⋯ + u_m

for some u₁ ∈ U₁, …, u_m ∈ U_m, and
• whenever uⱼ ∈ Uⱼ and uₖ ∈ Uₖ for j ≠ k, uⱼ and uₖ are orthogonal.

In that case we* write

V = U₁ ⊕ ⋯ ⊕ U_m.

Examples
1. If L₁ and L₂ are any two perpendicular lines through the origin in ℝ², then ℝ² = L₁ ⊕ L₂.
2. If P is a plane through the origin in ℝ³ and L is the line through the origin perpendicular to P, then ℝ³ = P ⊕ L. ▲
Theorem 4.14 says that V = U ⊕ U⊥ for any finite-dimensional subspace U ⊆ V, generalizing these examples. One consequence of this observation is the following intuitively appealing property of orthogonal complements.

Proposition 4.15 If U is any subspace of a finite-dimensional inner product space V, then (U⊥)⊥ = U.

Proof Suppose that u ∈ U. Then for each v ∈ U⊥, ⟨u, v⟩ = 0. This implies that u ∈ (U⊥)⊥, and so U ⊆ (U⊥)⊥.

Now suppose that w ∈ (U⊥)⊥. By Theorem 4.14, we can write w = u + v for some u ∈ U and v ∈ U⊥; we'd like to show that in fact v = 0. Since v = w − u, and u ∈ U ⊆ (U⊥)⊥, we have v ∈ (U⊥)⊥. So v is in both U⊥ and (U⊥)⊥, which means v is orthogonal to itself, thus v = 0. We therefore have that w = u ∈ U, and so (U⊥)⊥ ⊆ U. ▲

*Warning: Some people use this notation for a more general notion of direct sum, without meaning to imply that the subspaces Uⱼ are orthogonal.

Quick Exercise #11. Convince yourself geometrically that the proposition is true for all subspaces of ℝ² and ℝ³.
Orthogonal Projections

Definition Let U be a subspace of V. The orthogonal projection onto U is the function P_U : V → V defined by P_U(v) = u, where v = u + w for u ∈ U and w ∈ U⊥.

We think of P_U as picking off the part of v which lies in U.

Theorem 4.16 (Algebraic properties of orthogonal projections) Let U be a finite-dimensional subspace of V.
1. P_U is a linear map.
2. If (e₁, …, e_m) is any orthonormal basis of U, then

P_U v = Σ_{j=1}^m ⟨v, eⱼ⟩eⱼ

for each v ∈ V.
3. For each v ∈ V, v − P_U v ∈ U⊥.
4. For each v, w ∈ V,

⟨P_U v, w⟩ = ⟨P_U v, P_U w⟩ = ⟨v, P_U w⟩.

5. Suppose B = (e₁, …, eₙ) is an orthonormal basis of V such that (e₁, …, e_m) is an orthonormal basis of U. Then

[P_U]_B = diag(1, …, 1, 0, …, 0),

with the first m diagonal entries 1, and the remaining diagonal entries 0.
6. range P_U = U, and P_U u = u for each u ∈ U.
7. ker P_U = U⊥.
8. P_U² = P_U.
9. If V is finite-dimensional, then P_{U⊥} = I − P_U.
Proof
1. Suppose that v₁ = u₁ + w₁ and v₂ = u₂ + w₂ for u₁, u₂ ∈ U and w₁, w₂ ∈ U⊥. Then

v₁ + v₂ = (u₁ + u₂) + (w₁ + w₂)

with u₁ + u₂ ∈ U and w₁ + w₂ ∈ U⊥, and so

P_U(v₁ + v₂) = u₁ + u₂ = P_U v₁ + P_U v₂.

Thus P_U is additive. Homogeneity follows similarly.
2. This follows from the proof of Theorem 4.14.
3. Suppose v = u + w with u ∈ U and w ∈ U⊥. Then

v − P_U v = v − u = w ∈ U⊥.

4. By definition, P_U v ∈ U, and by part 3, w − P_U w ∈ U⊥. Therefore

⟨P_U v, w⟩ = ⟨P_U v, P_U w⟩ + ⟨P_U v, w − P_U w⟩ = ⟨P_U v, P_U w⟩.

The other equality is proved similarly.
The remaining parts of the proof are left as exercises (see Exercise 4.3.22). ▲
The following result uses part 2 of Theorem 4.16 to give a formula for the matrix of P_U.

Proposition 4.17 Let U be a subspace of Rⁿ or Cⁿ, with orthonormal basis (f₁, ..., f_m). Then the matrix of P_U with respect to the standard basis E is
   [P_U]_E = Σ_{j=1}^m f_j f_j*.

Proof By Quick Exercise #2 in Section 4.1 and part 2 of Theorem 4.16, for every v,
   (Σ_{j=1}^m f_j f_j*) v = Σ_{j=1}^m f_j (f_j* v) = Σ_{j=1}^m ⟨v, f_j⟩ f_j = P_U v. ∎
Example Recall that in Section 4.2 we found that if
   f₁ = (1/√2) [1, −1, 0]ᵀ  and  f₂ = (1/√6) [1, 1, −2]ᵀ,
then (f₁, f₂) is an orthonormal basis of
   U = { [x, y, z]ᵀ | x + y + z = 0 }
in R³. By Proposition 4.17,
   [P_U]_E = f₁f₁* + f₂f₂*
           = (1/2) [ 1 −1 0 ; −1 1 0 ; 0 0 0 ] + (1/6) [ 1 1 −2 ; 1 1 −2 ; −2 −2 4 ]
           = (1/3) [ 2 −1 −1 ; −1 2 −1 ; −1 −1 2 ]. ▲
Quick Exercise #12. What is the orthogonal projection of the vector [ …, 2, … ]ᵀ onto the subspace U in the example above?
If a subspace U of Rⁿ or Cⁿ is described in terms of a basis which is not orthonormal, one option for finding the orthogonal projection onto U is to perform the Gram–Schmidt process, then apply Proposition 4.17. The following result gives an alternative approach.
Proposition 4.18 Let U be a subspace of Rⁿ or Cⁿ with basis (v₁, ..., v_k). Let A be the n × k matrix with columns v₁, ..., v_k. Then
   [P_U]_E = A(A*A)⁻¹A*.
Note the implicit claim that the matrix A*A is invertible.
Proof First observe that P_U x is necessarily an element of U, and so it is a linear combination of the columns of A. In other words, there is some x̂ such that P_U x = Ax̂. By part 3 of Theorem 4.16,
   x − P_U x = x − Ax̂ ∈ U⊥.
In particular,
   ⟨x − Ax̂, v_j⟩ = v_j*(x − Ax̂) = 0 for each j.
Rewriting this system of k equations in matrix form gives
   A*(x − Ax̂) = 0, i.e., A*Ax̂ = A*x.
Assuming that A*A is indeed invertible, multiplying both sides by A(A*A)⁻¹ completes the proof, since Ax̂ is exactly P_U x.
To see that A*A is invertible, it suffices to show that the null space of A*A is trivial. Suppose then that A*Ax = 0. Then
   0 = ⟨A*Ax, x⟩ = x*(A*Ax) = (Ax)*(Ax) = ‖Ax‖².
But A has rank k, since its columns are assumed to be linearly independent, and so null A = {0}. It follows that x = 0, and so A*A is invertible. ∎
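Proposition 4.18 is straightforward to try out numerically. A small sketch (NumPy; the particular basis vectors here are made-up illustrations, not from the text) builds A(A*A)⁻¹A* for a non-orthonormal basis and checks the characteristic properties of an orthogonal projection:

```python
import numpy as np

# U = span of two non-orthonormal vectors in R^4; A has them as columns.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])

# Proposition 4.18: [P_U] = A (A*A)^{-1} A*
P = A @ np.linalg.inv(A.T @ A) @ A.T

assert np.allclose(P @ P, P)    # idempotent
assert np.allclose(P, P.T)      # symmetric (real case)
assert np.allclose(P @ A, A)    # fixes every vector in U

x = np.array([1.0, -2.0, 0.5, 3.0])
r = x - P @ x                   # residual lies in U-perp:
assert np.allclose(A.T @ r, 0)  # orthogonal to both basis vectors of U
```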
Theorem 4.19 (Geometric properties of orthogonal projections) Let U be a finite-dimensional subspace of V.
1. For each v ∈ V, ‖P_U v‖ ≤ ‖v‖, with equality if and only if v ∈ U.
2. For each v ∈ V and u ∈ U,
   ‖v − P_U v‖ ≤ ‖v − u‖,
   with equality if and only if u = P_U v.
The first part of Theorem 4.19 says that projections are contractions; i.e., they can only make vectors shorter. The second part says that P_U v is the closest point in U to the vector v.

Figure 4.5 ‖P_U v‖ ≤ ‖v‖.

Figure 4.6 The dashed line perpendicular to U is the shortest line between points in U and v.
Proof
1. Since v − P_U v ∈ U⊥, v − P_U v is orthogonal to P_U v, and so
   ‖v‖² = ‖P_U v + (v − P_U v)‖² = ‖P_U v‖² + ‖v − P_U v‖² ≥ ‖P_U v‖².
   Equality holds in the inequality here if and only if ‖v − P_U v‖² = 0, which is true if and only if v = P_U v, and this holds exactly when v ∈ U.
2. Since v − P_U v ∈ U⊥ and P_U v − u ∈ U,
   ‖v − u‖² = ‖(v − P_U v) + (P_U v − u)‖² = ‖v − P_U v‖² + ‖P_U v − u‖² ≥ ‖v − P_U v‖².
Equality holds in the inequality here if and only if ‖P_U v − u‖² = 0, which is true if and only if u = P_U v. ∎
Finding the closest point to a given vector v within a specified subspace U can
be seen as finding the best approximation of v by a point in U. The fact that the best possible approximation is given by orthogonal projection of v onto U (and so finding it is easy!) is crucial in many applications; we illustrate with two of the most important examples.
Linear Least Squares Consider a set of points {(x_i, y_i)}_{i=1}^n ⊆ R²; think of them as data resulting from an experiment. A common problem in applications is to find the line y = mx + b which comes "closest" to containing this set of points. Define the subspace
   U = { [mx₁ + b, mx₂ + b, ..., mxₙ + b]ᵀ | m, b ∈ R } = { mx + b1 | m, b ∈ R } ⊆ Rⁿ,
spanned by the vector x of first coordinates of the data points and the vector 1 with all entries given by 1. The points (x_i, y_i) all lie on some line y = mx + b if and only if the vector y of second coordinates lies in U. Since U is only a two-dimensional subspace of Rⁿ, however, this is not typically the case. By part 2 of Theorem 4.19, the closest point to y in U is P_U y; taking the corresponding values of m and b gives us our best-fitting line. This approach to fitting points to a line is called simple linear regression. It is also called the method of least squares, because minimizing the distance of y to U means finding the values of m and b for which
   Σ_{i=1}^n (mx_i + b − y_i)²
is minimal.
Example Consider the set of five points {(0, 1), (1, 2), (1, 3), (2, 4), (2, 3)}. The subspace U ⊆ R⁵ defined above is spanned by
   x = [0, 1, 1, 2, 2]ᵀ  and  1 = [1, 1, 1, 1, 1]ᵀ;
let A be the matrix with these two columns.

Quick Exercise #13. Show that
   A*A = [ 10 6 ; 6 5 ]  and  (A*A)⁻¹ = (1/14) [ 5 −6 ; −6 10 ].

Then by Proposition 4.18, projecting y = [1, 2, 3, 4, 3]ᵀ onto U amounts to solving A*A [m ; b] = A*y. We thus take m = 17/14 and b = 8/7 to get the best-fitting line to the original points.

Figure 4.7 The best-fitting line to {(0, 1), (1, 2), (1, 3), (2, 4), (2, 3)}. ▲
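The normal-equations computation in this example can be reproduced in a few lines (a NumPy sketch, not part of the text; the values m = 17/14 and b = 8/7 follow from Quick Exercise #13):

```python
import numpy as np

# Data points from the example: (0,1), (1,2), (1,3), (2,4), (2,3).
x = np.array([0.0, 1.0, 1.0, 2.0, 2.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 3.0])

A = np.column_stack([x, np.ones_like(x)])   # columns x and 1 span U
assert np.allclose(A.T @ A, [[10, 6], [6, 5]])

# Solve the normal equations A*A [m, b] = A*y
m, b = np.linalg.inv(A.T @ A) @ A.T @ y
assert np.isclose(m, 17/14) and np.isclose(b, 8/7)
```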
Approximation of Functions Another important kind of approximation problem is to approximate a complicated function by a simpler function. (Just what counts as "complicated" or "simple" depends on the context.) For example, although we know how to do lots of abstract manipulations with exponential functions, actually computing with them is much harder than with polynomials. So we may wish to find a good approximation of an exponential function by, say, a quadratic polynomial. In your calculus classes, you've probably already encountered one way to do this, namely Taylor polynomials. Taylor polynomials, however, are designed to closely approximate a function near one specific point, and may not be the best thing to approximate a function far from that point. On the other hand, the norm
‖f − g‖ = ( ∫_a^b (f(x) − g(x))² dx )^{1/2}
is designed to measure how similar two functions are over an entire interval. Theorem 4.19 tells us exactly how to find the best approximation of a function in terms of such a norm.
Say for example we want to approximate the function f(x) = eˣ by a quadratic polynomial q(x) on the interval [0, 1] so that
   ∫₀¹ (eˣ − q(x))² dx
is as small as possible. By part 2 of Theorem 4.19, the best possible approximation is q = P_{P₂(R)} f, using the inner product
   ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx
on C([0, 1]). In the last section we found an orthonormal basis (e₁, e₂, e₃) of P₂(R) with this inner product:
   e₁(x) = 1,   e₂(x) = 2√3 x − √3,   and   e₃(x) = 6√5 x² − 6√5 x + √5.
From this we can compute
   ⟨f, e₁⟩ = ∫₀¹ eˣ dx = e − 1,
   ⟨f, e₂⟩ = ∫₀¹ eˣ(2√3 x − √3) dx = −√3 e + 3√3,
   ⟨f, e₃⟩ = ∫₀¹ eˣ(6√5 x² − 6√5 x + √5) dx = 7√5 e − 19√5,
and so
   q(x) = (P_{P₂(R)} f)(x) = ⟨f, e₁⟩ e₁(x) + ⟨f, e₂⟩ e₂(x) + ⟨f, e₃⟩ e₃(x)
        = (39e − 105) + (−216e + 588)x + (210e − 570)x².
Here are plots showing f(x) = eˣ, the approximation q(x) computed above, and, for comparison, the Taylor polynomial 1 + x + x²/2 of f about 0:
KEY IDEAS
• The orthogonal complement of a subspace is the set of all vectors perpendicular to the subspace. For example, the orthogonal complement of the x–y plane in R³ is the z-axis.
• If (u₁, ..., u_m, w₁, ..., w_p) is an orthonormal basis of V such that (u₁, ..., u_m) is an orthonormal basis of a subspace U, then any vector v ∈ V can be expanded as
Figure 4.8 Graphs of f(x) = eˣ, the approximation q(x), and the Taylor polynomial of f about 0. The two graphs which are virtually indistinguishable in these plots are f(x) and q(x). The third is the Taylor polynomial, which gets to be a much worse approximation the farther you get from 0.
v = Σ_{j=1}^m ⟨v, u_j⟩ u_j + Σ_{k=1}^p ⟨v, w_k⟩ w_k,
and the orthogonal projection of v onto U is
   P_U v = Σ_{j=1}^m ⟨v, u_j⟩ u_j.
• Orthogonal projection of v onto U gives the closest point in U to v.
• Linear least squares and good approximations of functions over intervals can both be seen as applications of the fact that P_U v is the closest point to v in U.
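The coefficients of the quadratic approximation q of eˣ computed earlier can be verified by numerical integration (a NumPy sketch using Gauss–Legendre quadrature; not part of the text):

```python
import numpy as np

# Orthonormal basis of P_2(R) under <f,g> = integral of f*g over [0,1]:
e1 = lambda x: np.ones_like(x)
e2 = lambda x: 2*np.sqrt(3)*x - np.sqrt(3)
e3 = lambda x: 6*np.sqrt(5)*x**2 - 6*np.sqrt(5)*x + np.sqrt(5)

# Gauss-Legendre nodes and weights, mapped from [-1,1] to [0,1]
t, w = np.polynomial.legendre.leggauss(20)
x = (t + 1) / 2
w = w / 2
inner = lambda f, g: np.sum(w * f(x) * g(x))

f = np.exp
c = [inner(f, e) for e in (e1, e2, e3)]
assert np.isclose(c[0], np.e - 1)
assert np.isclose(c[1], -np.sqrt(3)*np.e + 3*np.sqrt(3))
assert np.isclose(c[2], 7*np.sqrt(5)*np.e - 19*np.sqrt(5))

# q = sum of c_j e_j; its values agree with the expanded form from the text
q = lambda s: c[0]*e1(s) + c[1]*e2(s) + c[2]*e3(s)
s = np.array([0.0, 0.5, 1.0])
expected = (39*np.e - 105) + (-216*np.e + 588)*s + (210*np.e - 570)*s**2
assert np.allclose(q(s), expected)
```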
EXERCISES
4.3 Orthogonal Projections and Optimization
4.3.2 Find the matrix (with respect to the standard basis) of the orthogonal projection onto the span of each of the following lists of vectors.
   (a) … in R³
   (b) … in C²
   (c) …
   (d) …
4.3.3 Find the matrix (with respect to the standard basis) of the orthogonal projection onto each of the following subspaces. (Part 8 of Theorem 4.16 may be useful.)
   (a) ( [1, 2, 3]ᵀ )⊥ ⊆ R³
   (b) { [x, y, z]ᵀ ∈ R³ | 3x − y − 5z = 0 }
*There is a whole family of ℓᵖ norms given by ‖x‖_p = ( Σ_{j=1}^n |x_j|ᵖ )^{1/p} for any p ≥ 1, which we won't discuss any further here.
[QA #14: By positive homogeneity, ‖0‖ = ‖0·0‖ = 0‖0‖ = 0. Make sure you understand which 0s are scalars and which are vectors.]
The norm properties again follow from those of |·|, together with the fact that
   max_{1≤j≤n} (a_j + b_j) ≤ max_{1≤j≤n} a_j + max_{1≤j≤n} b_j.
3. The L¹ norm on the space C([0, 1]) is defined by
   ‖f‖₁ = ∫₀¹ |f(x)| dx.
This is the functional analog of the ℓ¹ norm described above. Confirming that the L¹ norm is in fact a norm on C([0, 1]) mostly follows easily from the fact that |·| is a norm on R. Definiteness takes slightly more work: suppose that f ∈ C([0, 1]) is such that
   ∫₀¹ |f(x)| dx = 0.
If there is an x₀ ∈ [0, 1] such that f(x₀) ≠ 0, then by continuity of f, there is an interval I ⊆ [0, 1] containing x₀ such that |f(y)| > |f(x₀)|/2 for all y ∈ I. If I = [a, b], then
   ∫₀¹ |f(y)| dy ≥ ∫_a^b |f(y)| dy ≥ (b − a)|f(x₀)|/2 > 0,
which is a contradiction.
4. The supremum norm on C([0, 1]) is defined by
   ‖f‖_∞ = max_{x∈[0,1]} |f(x)|.
(Recall from calculus that a continuous function on a closed interval has a maximum, either inside the interval or at one of the endpoints.) The supremum norm is the functional analog of the ℓ∞ norm above; checking that it is in fact a norm is also analogous. ▲

The following result is a fundamental fact about norms that come from inner products. Its name derives from how it relates the lengths of the sides of a parallelogram to the lengths of its diagonals.
Proposition 4.20 (The parallelogram identity) If V is an inner product space, then for any v, w ∈ V,
   ‖v + w‖² + ‖v − w‖² = 2‖v‖² + 2‖w‖².   (4.8)

Proof See Exercise 4.4.20. ∎
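A quick numerical illustration (a NumPy sketch, not part of the text): the ℓ² norm satisfies the parallelogram identity, while the ℓ¹ norm fails it, so the ℓ¹ norm cannot come from an inner product.

```python
import numpy as np

def parallelogram_gap(v, w, norm):
    # Zero for every v, w exactly when the norm satisfies equation (4.8).
    return norm(v + w)**2 + norm(v - w)**2 - 2*norm(v)**2 - 2*norm(w)**2

v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])

l2 = lambda u: np.linalg.norm(u, 2)   # comes from the standard inner product
l1 = lambda u: np.linalg.norm(u, 1)   # does not come from any inner product

assert np.isclose(parallelogram_gap(v, w, l2), 0.0)
assert not np.isclose(parallelogram_gap(v, w, l1), 0.0)   # gap is 4 here
```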
The point of Proposition 4.20 is that, although it is a theorem about inner product spaces, it directly involves only norms, not inner products or even orthogonality. That means that it can be used to check whether or not a given norm comes from an inner product: if you can find any two vectors v and w in a normed space which fail to satisfy equation (4.8), then the norm cannot be the norm associated to any inner product.

Figure 4.9 The sum of the squares of the lengths of the four sides of the parallelogram equals the sum of the squares of the lengths of the diagonals.

An easy consequence of the triangle inequality in a normed space is the following fact, which we will need below.

Proposition 4.21 If V is a normed space, then for any v, w ∈ V,
   | ‖v‖ − ‖w‖ | ≤ ‖v − w‖.
Proof By the triangle inequality,
   ‖v‖ = ‖v − w + w‖ ≤ ‖v − w‖ + ‖w‖,
and so ‖v‖ − ‖w‖ ≤ ‖v − w‖; exchanging the roles of v and w gives ‖w‖ − ‖v‖ ≤ ‖v − w‖, which together imply the claim. ∎

Lemma 4.22 Let V and W be finite-dimensional inner product spaces, and let T ∈ L(V, W). Then there is a constant C > 0 such that for all v₁, v₂ ∈ V,
   | ‖Tv₁‖ − ‖Tv₂‖ | ≤ ‖T(v₁ − v₂)‖ ≤ C‖v₁ − v₂‖.   (4.9)
In particular, the function f : V → R given by f(v) = ‖Tv‖ is continuous.
Proof The first inequality in formula (4.9) follows from Proposition 4.21. For the second, let (e₁, ..., eₙ) be an orthonormal basis of V. Then for any v ∈ V,
   ‖Tv‖ = ‖T( Σ_{j=1}^n ⟨v, e_j⟩ e_j )‖ = ‖ Σ_{j=1}^n ⟨v, e_j⟩ Te_j ‖ ≤ Σ_{j=1}^n |⟨v, e_j⟩| ‖Te_j‖,
where the last inequality follows from the triangle inequality. Now, by the Cauchy–Schwarz inequality, |⟨v, e_j⟩| ≤ ‖v‖ ‖e_j‖ = ‖v‖, so
   ‖Tv‖ ≤ ‖v‖ Σ_{j=1}^n ‖Te_j‖.   (4.10)
The second inequality in formula (4.9) now follows by applying the inequality (4.10) with v = v₁ − v₂, with
   C = Σ_{j=1}^n ‖Te_j‖. ∎
Quick Exercise #15. The proof of Lemma 4.22 doesn't always give the best possible value of C (although this isn't important for how we will use the lemma below). If T = I is the identity map on V, then: (a) What is the best possible value of C? (b) What value of C does the proof produce?
Lemma 4.23 Let V and W be finite-dimensional inner product spaces, and let T ∈ L(V, W). Then there is a vector u ∈ V with ‖u‖ = 1 such that
   ‖Tv‖ ≤ ‖Tu‖ whenever v ∈ V and ‖v‖ = 1.

We summarize the situation in Lemma 4.23 by writing
   ‖Tu‖ = max_{‖v‖=1} ‖Tv‖.
For the proof, we will need a fundamental fact from multivariable calculus. For S ⊆ V, we say that S is bounded if there is a constant C > 0 such that ‖s‖ ≤ C for every s ∈ S. If f : V → R is a continuous function on a finite-dimensional inner product space, and S ⊆ V is a closed, bounded subset of V, then there is a point s₀ ∈ S such that f(s) ≤ f(s₀) for every s ∈ S; i.e.,
   max_{s∈S} f(s) = f(s₀).
This is a generalization of the familiar fact that if f : [a, b] → R is continuous, then f achieves a maximum value at some point in the interval [a, b], either in the interior or at one of the endpoints.
Proof of Lemma 4.23 Consider the set
   S := { v ∈ V | ‖v‖ = 1 }.
S is a closed and bounded subset of V, and by Lemma 4.22, f(v) = ‖Tv‖ is a continuous function f : V → R. Thus there is a vector u ∈ S such that
   max_{v∈S} ‖Tv‖ = ‖Tu‖. ∎
Definition Let V and W be finite-dimensional inner product spaces, and let T ∈ L(V, W). The operator norm* of T is
   ‖T‖_op := max_{‖v‖=1} ‖Tv‖.

An equivalent formulation of the definition (see Exercise 4.4.3) is that ‖T‖_op is the smallest number C such that
   ‖Tv‖ ≤ C‖v‖   (4.11)
for every v ∈ V. Lemma 4.23 says that there is at least one unit vector v for which equality holds in the inequality (4.11) for C = ‖T‖_op.
We have once again given a definition containing an implicit claim, namely that the quantity ‖T‖_op defined above does define a norm on L(V, W). The next theorem verifies this claim.
Theorem 4.24 Let V and W be finite-dimensional inner product spaces. Then the operator norm is a norm on L(V, W).

*Also called the spectral norm (among other things), for reasons which will be explained later (see Section 5.4).
Proof It is obvious that ‖T‖_op ≥ 0 for every T ∈ L(V, W). If ‖T‖_op = 0, then for every v ∈ V, ‖Tv‖ = 0 by the inequality (4.11). Therefore Tv = 0 for every v ∈ V, and thus T = 0, the zero operator.
If a ∈ F and T ∈ L(V, W), then for every v ∈ V,
   ‖(aT)v‖ = ‖a(Tv)‖ = |a| ‖Tv‖,
which implies that
   ‖aT‖_op = max_{‖v‖=1} ‖(aT)v‖ = max_{‖v‖=1} |a| ‖Tv‖ = |a| max_{‖v‖=1} ‖Tv‖ = |a| ‖T‖_op.
Finally, if S, T ∈ L(V, W) and v ∈ V, then
   ‖(S + T)v‖ = ‖Sv + Tv‖ ≤ ‖Sv‖ + ‖Tv‖ ≤ ‖S‖_op ‖v‖ + ‖T‖_op ‖v‖
by the triangle inequality in W and the inequality (4.11). Therefore
   ‖S + T‖_op = max_{‖v‖=1} ‖(S + T)v‖ ≤ max_{‖v‖=1} ( ‖S‖_op + ‖T‖_op ) ‖v‖ = ‖S‖_op + ‖T‖_op. ∎
Quick Exercise #16. What is the operator norm of the identity map I on a finite-dimensional inner product space?

As usual, anything that can be done for linear maps can be done for matrices.
Definition If A ∈ M_{m,n}(F), then the operator norm* ‖A‖_op of A is the norm of the linear map in L(Fⁿ, Fᵐ) whose matrix is A. That is,
   ‖A‖_op = max_{v∈Fⁿ, ‖v‖=1} ‖Av‖.

*Also called (as before) the spectral norm, and also the induced norm. This is often denoted ‖A‖₂, because of its relationship to the ℓ² norms used for vectors in the definition, but the same notation is sometimes also used for other things.
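For matrices, the operator norm can be computed directly; NumPy calls it the matrix 2-norm. A sketch, with a randomly generated matrix standing in for an example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))

op = np.linalg.norm(A, 2)    # operator (spectral) norm
fro = np.linalg.norm(A)      # Frobenius norm, for comparison

# ||Av|| <= ||A||_op ||v|| for every v  (inequality (4.11))
for _ in range(100):
    v = rng.standard_normal(4)
    assert np.linalg.norm(A @ v) <= op * np.linalg.norm(v) + 1e-12

# The maximum of ||Av||/||v|| over many random v never exceeds ||A||_op
best = max(np.linalg.norm(A @ v) / np.linalg.norm(v)
           for v in rng.standard_normal((2000, 4)))
assert best <= op + 1e-12
assert op <= fro   # cf. Exercise 4.4.17
```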
[QA #16: Iv = v, and hence ‖Iv‖ = ‖v‖ for every v, so ‖I‖_op = 1.]
Quick Exercise #17. What is ‖Iₙ‖_op? What is the Frobenius norm ‖Iₙ‖_F?

One of the most important uses of the operator norm is to understand the effect of error on solutions of linear systems.* Say you want to solve an n × n linear system Ax = b over R. If A is invertible, then this can be solved by matrix multiplication:
   x = A⁻¹Ax = A⁻¹b.
Now suppose that you don't know the entries of b exactly; this could be because they come from real-world measurements, or were calculated by a computer and are subject to round-off error, or some other reason. Intuitively, if the error in b is small, then the error in the computed solution x should also be small. But how small?
Suppose that h ∈ Rⁿ is the vector of errors in b. That is, instead of the true vector b, you actually have b̃ = b + h to work with. In that case, instead of the true vector x = A⁻¹b, the solution you compute will be
   A⁻¹(b̃) = A⁻¹(b + h) = A⁻¹b + A⁻¹h = x + A⁻¹h.
Thus the error in your computed solution is A⁻¹h. We can bound the size of this error using the operator norm of A⁻¹:
   ‖A⁻¹h‖ ≤ ‖A⁻¹‖_op ‖h‖,
where the first and third norms here are the standard norm on Rⁿ. That is, the operator norm ‖A⁻¹‖_op tells us how an error of a given size in the vector b propagates to an error in the solution of the system Ax = b.
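This error bound can be illustrated with a nearly singular matrix, for which ‖A⁻¹‖_op is large (a NumPy sketch; the particular matrix is an illustration, not from the text):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])      # nearly singular: errors get amplified
Ainv = np.linalg.inv(A)

b = np.array([2.0, 2.0001])
x = Ainv @ b                        # true solution is (1, 1)

h = np.array([1e-6, -1e-6])         # small error in b
x_err = Ainv @ (b + h) - x          # resulting error in the computed solution

# The error is bounded by ||A^{-1}||_op ||h||
bound = np.linalg.norm(Ainv, 2) * np.linalg.norm(h)
assert np.linalg.norm(x_err) <= bound + 1e-15

# The bound is large because ||A^{-1}||_op is large here:
assert np.linalg.norm(Ainv, 2) > 1e4
```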
KEY IDEAS
• A normed space is a vector space together with a notion of length: a norm is a positively homogeneous, definite, real-valued function which satisfies the triangle inequality.
• Inner products define norms, but not all norms come from inner products. The parallelogram identity lets you check whether a norm is coming from an inner product.
• The vector space of linear maps between normed spaces V and W has a norm called the operator norm: ‖T‖_op = max_{‖v‖=1} ‖Tv‖.
• The operator norm of A⁻¹ tells you how much an error in b propagates to an error in the solution of Ax = b.
“Here we're using the word error in the sense of small changes being made in data, not in the sense of making mistakes in your work. The latter issue is much harder to deal with!
[QA #17: The last quick exercise shows that ‖Iₙ‖_op = 1, since the identity matrix represents the identity map. On the other hand, ‖Iₙ‖_F = √n.]
EXERCISES

4.4.1 (a) Show that the ℓ¹ norm is not the norm associated to any inner product on Rⁿ or Cⁿ.
   (b) Show that the ℓ∞ norm is not the norm associated to any inner product on Rⁿ or Cⁿ.
4.4.2 Show that the operator norm is not the norm associated to any inner product on M_{m,n}(C) when m, n ≥ 2.
4.4.3 Show that if V and W are finite-dimensional inner product spaces and T ∈ L(V, W), then ‖T‖_op is the smallest constant C such that ‖Tv‖ ≤ C‖v‖ for all v ∈ V.
4.4.4 Show that if λ ∈ C is an eigenvalue of A ∈ Mₙ(C), then |λ| ≤ ‖A‖_op.
4.4.5 Suppose you are solving the n × n linear system Ax = b using A⁻¹, but there are errors in b. If you know that at most m of the entries of b have errors, and that each entry has an error of size at most ε > 0, what can you say about the size of the error in the x you compute?
4.4.6 Show that ‖diag(d₁, ..., dₙ)‖_op = max_{1≤j≤n} |d_j|.
4.4.7 Show that if A ∈ M_{m,n}(C), then ‖A‖_op ≥ ‖a_j‖ for each j, where a_j is the jth column of A.
4.4.8 Show that if A ∈ M_{m,n}(C) and B ∈ M_{n,p}(C), then ‖AB‖_F ≤ ‖A‖_op ‖B‖_F.
4.4.9 Suppose that U, V, and W are finite-dimensional inner product spaces, and that S ∈ L(U, V) and T ∈ L(V, W). Show that ‖TS‖_op ≤ ‖T‖_op ‖S‖_op.
… then B is invertible. Hint: First show that if x ∈ ker B and x ≠ 0, then …
… ≤ ‖A‖_F. The inequality follows from Exercise 4.4.17.
4.4.14 Show that the operator norm is not an invariant on Mₙ(C). Hint: Consider the matrices … and …, and use Exercises 4.4.6 and 4.4.7.
4.4.15 Show that if x ∈ Cⁿ, then ‖x‖_∞ ≤ ‖x‖₂ ≤ √n ‖x‖_∞.
4.4.16 Show that if x ∈ Cⁿ, then ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂. Hint: For the first inequality, use the triangle inequality for the ℓ² norm. For the second inequality, use the Cauchy–Schwarz inequality.
4.4.17 Show that if A ∈ M_{m,n}(C), then ‖A‖_op ≤ ‖A‖_F ≤ √n ‖A‖_op.
4.4.18 For f ∈ C([0, 1]), let ‖f‖₂ = ( ∫₀¹ |f(x)|² dx )^{1/2}.
   (a) Show that if f ∈ C([0, 1]), then ‖f‖₁ ≤ ‖f‖₂.
   (b) Show that there is no constant C > 0 such that ‖f‖₂ ≤ C‖f‖₁ for every f ∈ C([0, 1]).
   Hint: Consider the functions f_n(x) = 1 − nx for 0 ≤ x ≤ 1/n, and f_n(x) = 0 otherwise.
Given an isometry T : V → W, for any v₁, v₂ ∈ V (in the case F = R),
   ⟨Tv₁, Tv₂⟩ = ¼( ‖Tv₁ + Tv₂‖² − ‖Tv₁ − Tv₂‖² )
             = ¼( ‖T(v₁ + v₂)‖² − ‖T(v₁ − v₂)‖² )
             = ¼( ‖v₁ + v₂‖² − ‖v₁ − v₂‖² )
             = ⟨v₁, v₂⟩.
The proof in the case F = C is similar (see Exercise 4.5.23). ∎
Geometrically, this means that a linear map between inner product spaces which preserves lengths of vectors must also preserve the angles between vectors. The following result gives a related perspective.
Theorem 4.27 An invertible linear map T ∈ L(V, W) between inner product spaces is an isometry if and only if, for each v ∈ V and w ∈ W,
   ⟨Tv, w⟩ = ⟨v, T⁻¹w⟩.   (4.13)

Proof Suppose first that T is an isometry. Since T is surjective, w = Tu for some u ∈ V, and so
   ⟨Tv, w⟩ = ⟨Tv, Tu⟩ = ⟨v, u⟩ = ⟨v, T⁻¹w⟩.
Now suppose that T is invertible and that equation (4.13) holds for each v ∈ V and w ∈ W. Then T is surjective, and for each v₁, v₂ ∈ V,
   ⟨Tv₁, Tv₂⟩ = ⟨v₁, T⁻¹Tv₂⟩ = ⟨v₁, v₂⟩.
Thus T is an isometry. ∎

Example Let R_θ : R² → R² be the counterclockwise rotation of the plane by an angle of θ radians. Then R_θ⁻¹ = R_{−θ}, and the theorem above says that, for any v, w ∈ R²,
   ⟨R_θ v, w⟩ = ⟨v, R_{−θ} w⟩;
i.e., rotating v by θ and then measuring the angle with w is the same as measuring the angle of v with the rotation of w in the opposite direction. ▲
Theorem 4.28 Suppose V and W are inner product spaces and (e₁, ..., eₙ) is an orthonormal basis of V. Then T ∈ L(V, W) is an isometry if and only if (Te₁, ..., Teₙ) is an orthonormal basis of W.

Proof Suppose first that T is an isometry. Then for each 1 ≤ j, k ≤ n,
   ⟨Te_j, Te_k⟩ = ⟨e_j, e_k⟩.
Examples
1. Let R_θ : R² → R² be the counterclockwise rotation by θ radians. Geometrically, it is obvious that R_θ is an isometry: rotating a vector does not change its length. To verify this explicitly, one can simply note that
   (R_θ e₁, R_θ e₂) = ( [cos θ, sin θ]ᵀ, [−sin θ, cos θ]ᵀ )
is an orthonormal basis of R², so R_θ is an isometry by Theorem 4.28.
2. Let T : R² → R² be given by reflection across the line y = 2x. Again, it is geometrically clear that T preserves lengths. Recall that we found in the example on page 189 that
   (Te₁, Te₂) = ( (1/5)[−3, 4]ᵀ, (1/5)[4, 3]ᵀ ),
which is an orthonormal basis of R².
3. Consider the inner product space C([0, 1]) with
   ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx.
Define the map
   Tf(x) = f(1 − x).
Then T is a linear map, which is its own inverse and is therefore surjective. Since
   ‖Tf‖² = ∫₀¹ f(1 − x)² dx = ∫₀¹ f(u)² du = ‖f‖²,
T is an isometry.

[QA #19: If w ∈ W, then there are c_j ∈ F such that w = T( Σ_j c_j e_j ).]
4. Let V be an inner product space with orthonormal basis (e₁, ..., eₙ). Let π be a permutation of {1, ..., n}; i.e., π : {1, ..., n} → {1, ..., n} is a bijective function. We can define a linear map T_π : V → V by letting T_π(e_j) = e_{π(j)} and extending by linearity. Then T_π is clearly an isometry, since it takes an orthonormal basis to an orthonormal basis. In the case of, say, R² and the standard basis, this is another example along the lines of the previous one, since the map which swaps e₁ and e₂ is exactly reflection in the line y = x. ▲
Recall that two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension; the following result extends this to inner product spaces.

Corollary 4.29 Let V and W be finite-dimensional inner product spaces. Then V and W are isometric if and only if dim V = dim W.

Proof Since an isometry is an isomorphism, Theorem 3.23 implies that if V and W are isometric then dim V = dim W. Now suppose that dim V = dim W = n. By Theorem 4.12, V and W have orthonormal bases (e₁, ..., eₙ) and (f₁, ..., fₙ), respectively. By Theorem 3.14, there is a linear map T : V → W such that Te_j = f_j for each j, which by Theorem 4.28 is an isometry. ∎
Corollary 4.30 Suppose that B_V and B_W are orthonormal bases of V and W, respectively. Then T ∈ L(V, W) is an isometry if and only if the columns of the matrix [T]_{B_V,B_W} form an orthonormal basis of Fⁿ.

Proof The columns of [T]_{B_V,B_W} are the column vectors [Te_k]_{B_W}, where B_V = (e₁, ..., eₙ). According to Theorem 4.28, T is an isometry if and only if the vectors Te_k form an orthonormal basis of W, and by Theorem 4.10, this happens if and only if their coordinate representations [Te_k]_{B_W} form an orthonormal basis of Fⁿ. ∎
Orthogonal and Unitary Matrices The following proposition gives a convenient characterization of orthonormality of the columns of a square matrix.

Proposition 4.31 The columns of a matrix A ∈ Mₙ(F) are orthonormal if and only if A*A = Iₙ.

Proof Write A = [ a₁ | ⋯ | aₙ ]. By Lemma 2.14, the (j, k) entry of A*A is a_j* a_k = ⟨a_k, a_j⟩. Thus (a₁, ..., aₙ) is orthonormal if and only if A*A has diagonal entries equal to 1, and all other entries 0. ∎
Square matrices with orthonormal columns are sufficiently important that they get a name:

Definition A matrix A ∈ Mₙ(C) is called unitary if A*A = Iₙ. A matrix A ∈ Mₙ(R) is called orthogonal if AᵀA = Iₙ.

Note that if A has only real entries then A* = Aᵀ. Therefore an orthogonal matrix is the same thing as a unitary matrix with real entries. Thus Corollary 4.30 says that isometries between inner product spaces are represented (with respect to orthonormal bases) by unitary matrices in the complex case, and by orthogonal matrices in the real case. This is one of the many reasons that inner product spaces are especially nice to work with: it is trivial to compute the inverses of the corresponding structure-preserving maps, since if A is unitary, then A⁻¹ = A*.

Quick Exercise #20. Verify directly that the matrix (1/5) [ −3 4 ; 4 3 ] which represents reflection in the line y = 2x (see page 189) is orthogonal.
The discussion above tells us that certain maps of Euclidean space which are interesting geometrically (those that preserve lengths and angles) can be simply described algebraically. This simple algebraic description actually lets us go back and understand the geometry better. The situation is simplest in R²: consider an orthogonal matrix U ∈ M₂(R). Now, the first column u₁ of U has to be a unit vector, so we can write it as
   u₁ = [cos θ, sin θ]ᵀ
for some θ ∈ [0, 2π). The second column u₂ is also a unit vector which is perpendicular to u₁, so there are only two possibilities:
   u₂ = ± [−sin θ, cos θ]ᵀ.

Figure 4.10

That is, we must have either
   U = [ cos θ −sin θ ; sin θ cos θ ]  or  U = [ cos θ sin θ ; sin θ −cos θ ].
The first case is a counterclockwise rotation by θ. In the second case, we can factor U as
   U = [ cos θ −sin θ ; sin θ cos θ ] [ 1 0 ; 0 −1 ],
and so U corresponds to a reflection across the x-axis, followed by a rotation by θ. This proves the following well-known fact:

All length-preserving linear transformations of R² are combinations of rotations and reflections.

Really, we have shown more, namely that every length-preserving map is either a single rotation or a reflection across the x-axis followed by a rotation. Exercise 4.5.5 asks you to show that these last maps are actually reflections of R² (across a line determined by θ).
The QR Decomposition The following result is known as the QR decomposition and is particularly useful in numerical linear algebra.

Theorem 4.32 If A ∈ Mₙ(F) is invertible, then there exist a matrix Q ∈ Mₙ(F) with orthonormal columns and an upper triangular matrix R ∈ Mₙ(F) such that
   A = QR.

Proof Since A is invertible, its list of columns (a₁, ..., aₙ) forms a basis of Fⁿ. Applying the Gram–Schmidt process to this basis produces an orthonormal basis (q₁, ..., qₙ) of Fⁿ such that
   ⟨a₁, ..., a_j⟩ = ⟨q₁, ..., q_j⟩
for each j. The matrix Q with columns q₁, ..., qₙ is unitary, and so
   R = Q⁻¹A = Q*A
has (j, k) entry
   r_{jk} = q_j* a_k = ⟨a_k, q_j⟩.
Since a_k ∈ ⟨q₁, ..., q_k⟩ and the q_j are orthonormal, if j > k then r_{jk} = 0. That is, R is upper triangular. Finally, A = QR by the definition of R. ∎
Quick Exercise #21. Suppose that the columns of A ∈ Mₙ(C) are orthogonal and nonzero. What is the QR decomposition of A?
The QR decomposition is useful in solving systems of equations: given Ax = b, if A = QR as above, then
   Ax = b  ⟺  Rx = Q*b.
Since Q is known a priori to be unitary, its inverse Q⁻¹ = Q* is trivial to compute and, since R is upper triangular, the system on the right is easy to solve via back-substitution. More significantly for real-life applications, the fact that Q* is an isometry means it doesn't tend to magnify errors in the vector b. Solving an upper triangular system by back-substitution is also less prone to round-off and other sources of error than going through the full Gaussian elimination algorithm.
[QA #21: In this case the Gram–Schmidt process just normalizes the columns a₁, ..., aₙ of A, so the jth column of Q is (1/‖a_j‖) a_j, and R = diag(‖a₁‖, ..., ‖aₙ‖).]
Example To solve
[ 3 5 ; 4 6 ] [ x ; y ] = [ −1 ; 1 ],   (4.14)
first find the QR decomposition of the coefficient matrix via the Gram–Schmidt process:
   q₁ = (1/‖a₁‖) a₁ = (1/5) [3 ; 4],
so
   ã₂ = a₂ − ⟨a₂, q₁⟩ q₁ = [5 ; 6] − (39/25) [3 ; 4] = (2/25) [4 ; −3],
and
   q₂ = (1/‖ã₂‖) ã₂ = (1/5) [4 ; −3].
Now,
   Q⁻¹ = Q* = (1/5) [ 3 4 ; 4 −3 ],
and multiplying the system in equation (4.14) through by Q⁻¹ yields the upper triangular system
   (1/5) [ 25 39 ; 0 2 ] [ x ; y ] = (1/5) [ 1 ; −7 ].
From this we can read off that y = −7/2, and then
   x = (1/25)( 1 − 39(−7/2) ) = 11/2. ▲
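The arithmetic in this example can be checked against a library QR routine (a NumPy sketch; `np.linalg.qr` may choose different signs than Gram–Schmidt, so we first normalize R to have a positive diagonal):

```python
import numpy as np

A = np.array([[3.0, 5.0],
              [4.0, 6.0]])
b = np.array([-1.0, 1.0])

Q, R = np.linalg.qr(A)
# Flip signs so that R has positive diagonal, matching Gram-Schmidt:
S = np.diag(np.sign(np.diag(R)))
Q, R = Q @ S, S @ R
assert np.allclose(Q @ R, A)

assert np.allclose(Q, np.array([[3, 4], [4, -3]]) / 5)
assert np.allclose(R, np.array([[25, 39], [0, 2]]) / 5)

# Solve R x = Q* b by back-substitution
c = Q.T @ b
y = c[1] / R[1, 1]
x = (c[0] - R[0, 1] * y) / R[0, 0]
assert np.isclose(y, -7/2) and np.isclose(x, 11/2)
```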
and singular values σ₁ = √2 and σ₂ = 1. ▲
If this choice of input/output vectors didn't jump out at you, don't worry; we will see how to find singular value decompositions systematically in Section 5.3. The proof of Theorem 5.1 takes some work; we begin with a preliminary lemma.

Lemma 5.2 Let V and W be finite-dimensional inner product spaces, and let T ∈ L(V, W). Let e ∈ V be a unit vector such that
   ‖T‖_op = ‖Te‖.   (5.1)
Then for any vector u ∈ V with ⟨u, e⟩ = 0, we have ⟨Tu, Te⟩ = 0.
[QA #1: The σ-eigenspace is one-dimensional, and Tx never points in the same direction as x.]
Proof To simplify notation, write σ = ‖T‖_op. Observe first that if σ = 0, then T = 0 and the result is trivial.
Next, recall that
   ‖Tv‖ ≤ σ‖v‖
for every v ∈ V. Thus for any a ∈ F and u ∈ V with ⟨u, e⟩ = 0,
   ‖T(e + au)‖² ≤ σ²‖e + au‖² = σ²( ‖e‖² + ‖au‖² ) = σ²( 1 + |a|²‖u‖² ),   (5.2)
where the first equality follows from Theorem 4.4. Expanding the left-hand side of inequality (5.2),
   ‖T(e + au)‖² = ⟨Te + aTu, Te + aTu⟩ = ‖Te‖² + 2 Re( a⟨Tu, Te⟩ ) + ‖aTu‖²
               ≥ σ² + 2 Re( a⟨Tu, Te⟩ ).   (5.3)
Combining inequalities (5.2) and (5.3), we get that
   2 Re( a⟨Tu, Te⟩ ) ≤ σ²‖u‖²|a|²
for every scalar a ∈ F. Letting a = ⟨Te, Tu⟩ε for ε > 0, we get
   2|⟨Tu, Te⟩|² ε ≤ σ²‖u‖² |⟨Tu, Te⟩|² ε²   (5.4)
for every ε > 0. But if ⟨Tu, Te⟩ ≠ 0, then inequality (5.4) can only be true for
   ε ≥ 2 / ( σ²‖u‖² ).
Thus it must be that ⟨Tu, Te⟩ = 0. ∎
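Lemma 5.2 can be observed numerically: take e to be a right singular vector achieving ‖A‖_op; then Au ⊥ Ae for every u ⊥ e. A NumPy sketch with a random matrix standing in for T:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))

# e = right singular vector achieving ||A||_op = ||Ae||
U, s, Vt = np.linalg.svd(A)
e = Vt[0]
assert np.isclose(np.linalg.norm(A @ e), np.linalg.norm(A, 2))

# Any u with <u, e> = 0 satisfies <Au, Ae> = 0 (Lemma 5.2)
u = rng.standard_normal(3)
u = u - (u @ e) * e             # project out the e-component
assert np.isclose(u @ e, 0)
assert np.isclose((A @ u) @ (A @ e), 0)
```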
Proof of Theorem 5.1 Let
   S₁ = { v ∈ V | ‖v‖ = 1 },
and let e₁ ∈ S₁ be a unit vector such that ‖Te₁‖ = ‖T‖_op. (Recall that Lemma 4.23 guarantees that such a vector exists.) Write
   σ₁ = ‖T‖_op = ‖Te₁‖.
If σ₁ = 0, then Tv = 0 for every v ∈ V, so the theorem holds trivially. If σ₁ > 0, then define
   f₁ = (1/σ₁) Te₁ ∈ W,
so that Te₁ = σ₁f₁ and ‖f₁‖ = (1/σ₁)‖Te₁‖ = 1.
Now we repeat the argument in the subspace V₂ := (e₁)⊥. Let T₂ denote the restriction of T to V₂. Let
   S₂ = { v ∈ V₂ | ‖v‖ = 1 },
let e₂ ∈ S₂ be a unit vector such that
   ‖Te₂‖ = ‖T₂‖_op = max_{v∈S₂} ‖Tv‖;
note that ⟨e₂, e₁⟩ = 0 by construction. Define
   σ₂ = ‖T₂‖_op = ‖Te₂‖.
Quick Exercise #2. Show that σ₂ ≤ σ₁.

If σ₂ = 0, then Tv = 0 for every v ∈ V₂. We can thus extend (e₁) and (f₁) to orthonormal bases any way we like and have Te_j = 0 for j ≥ 2. If σ₂ ≠ 0, then define a unit vector
   f₂ = (1/σ₂) Te₂ ∈ W.
By Lemma 5.2, since ⟨e₂, e₁⟩ = 0 and ‖Te₁‖ = ‖T‖_op, we have that ⟨Te₁, Te₂⟩ = 0, and so
   ⟨f₁, f₂⟩ = (1/(σ₁σ₂)) ⟨Te₁, Te₂⟩ = 0.
We continue in this way: after constructing orthonormal vectors e₁, ..., e_k ∈ V such that f₁ = (1/σ₁)Te₁, ..., f_k = (1/σ_k)Te_k are orthonormal, we define
   S_{k+1} = { v ∈ ⟨e₁, ..., e_k⟩⊥ | ‖v‖ = 1 }
and pick e_{k+1} such that ‖Tv‖ is maximized on S_{k+1} by σ_{k+1} = ‖Te_{k+1}‖. If at any point σ_{k+1} = 0, then we are finished; we can fill out the list (e₁, ..., e_k) to an orthonormal basis of V, and since all of the vectors e_j with j ≥ k + 1 lie in S_{k+1}, Te_j = 0 for j ≥ k + 1. Since V is finite-dimensional, at some point the process must terminate, either because we have Te_{k+1} = 0 or because we've constructed a full basis of V. At this point, the list (f₁, ..., f_k) can be extended to an orthonormal basis of W (if necessary), and this completes the construction of the singular values and vectors.
It remains to show that if k is the largest index for which σ_k > 0, then k = r = rank T. We will do this by showing that (f₁, ..., f_k) is an orthonormal basis of range T. Since the f_j were constructed to be orthonormal (in particular, linearly independent), they form a basis for their span. Now, for each j ∈ {1, ..., k}, f_j = (1/σ_j)Te_j, and so (f₁, ..., f_k) ⊆ range T.

[QA #2: S₂ ⊆ S₁, so σ₂ = max_{v∈S₂} ‖Tv‖ ≤ max_{v∈S₁} ‖Tv‖ = σ₁.]
5.1 Singular Value Decomposition of Linear Maps
Conversely, if $w \in \operatorname{range} T$, then $w = Tv$ for some $v \in V$. Expanding $v$ with respect to the orthonormal basis $(e_1,\dots,e_n)$,
$$w = Tv = T\left(\sum_{j=1}^n \langle v, e_j\rangle e_j\right) = \sum_{j=1}^n \langle v, e_j\rangle Te_j = \sum_{j=1}^k \sigma_j\langle v, e_j\rangle f_j,$$
and so $\operatorname{range} T \subseteq \langle f_1,\dots,f_k\rangle$. ∎
**Uniqueness of Singular Values** Singular value decompositions are not unique. As a trivial example, consider the identity map $I : V \to V$. Any orthonormal basis $(e_1,\dots,e_n)$ has the property that $I(e_j) = e_j$ for all $j$. Notice, however, that even though we are free to take any basis we like, the values $\sigma_j$ do not change; in this example, $\sigma_j = 1$ for each $j$. This is true in general: while the singular vectors in Theorem 5.1 are not unique, the singular values are.
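As a quick numerical illustration of this (a NumPy sketch; the library, seed, and matrices here are illustrative assumptions, not part of the text): however the singular vectors are chosen, the singular values come out the same.

```python
import numpy as np

# The identity map: every orthonormal basis gives an SVD, but the
# singular values are always (1, 1, 1).
s_identity = np.linalg.svd(np.eye(3), compute_uv=False)

# More generally, changing the output basis by an orthogonal matrix Q
# changes the singular vectors but not the singular values.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # a random orthogonal Q
s_A = np.linalg.svd(A, compute_uv=False)
s_QA = np.linalg.svd(Q @ A, compute_uv=False)
```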
**Theorem 5.3** Let $V$ and $W$ be finite-dimensional inner product spaces, and let $T \in \mathcal{L}(V,W)$ with $\operatorname{rank} T = r$. Suppose there are orthonormal bases $(e_1,\dots,e_n)$ and $(\tilde e_1,\dots,\tilde e_n)$ of $V$ and $(f_1,\dots,f_m)$ and $(\tilde f_1,\dots,\tilde f_m)$ of $W$, and real scalars $\sigma_1 \ge \cdots \ge \sigma_r > 0$ and $\tilde\sigma_1 \ge \cdots \ge \tilde\sigma_r > 0$ such that
$$Te_j = \begin{cases} \sigma_j f_j & \text{if } 1 \le j \le r, \\ 0 & \text{otherwise,}\end{cases} \qquad T\tilde e_j = \begin{cases} \tilde\sigma_j \tilde f_j & \text{if } 1 \le j \le r, \\ 0 & \text{otherwise.}\end{cases}$$
Then $\sigma_j = \tilde\sigma_j$ for each $j$.

**Proof** Let $U := \{v \in V \mid \|Tv\| = \|T\|_{op}\|v\|\}$. For any $v \in V$,
$$\|Tv\|^2 = \sum_{j=1}^r \sigma_j^2 \big|\langle v, e_j\rangle\big|^2 \le \sigma_1^2 \sum_{j=1}^r \big|\langle v, e_j\rangle\big|^2 \le \sigma_1^2 \sum_{j=1}^n \big|\langle v, e_j\rangle\big|^2 = \sigma_1^2\|v\|^2, \tag{5.5}$$
with equality throughout if $v = e_1$. We thus have that
$$\|T\|_{op} = \max_{\|v\|=1}\|Tv\| = \sigma_1,$$
and in the same way, $\|T\|_{op} = \tilde\sigma_1$. Moreover, the fact that $\sigma_1 = \|T\|_{op}$ means that, for $v \in U$, we must have equality throughout equation (5.5). For equality to hold in the first inequality, it must be that whenever $j \in \{1,\dots,r\}$ with $\langle v, e_j\rangle \ne 0$, then $\sigma_j = \sigma_1$. For the second inequality to be an equality, we must have $v \in \langle e_1,\dots,e_r\rangle$. It follows that
$$U \subseteq \langle e_1,\dots,e_{r_1}\rangle,$$
where $r_1$ is the largest index such that $\sigma_{r_1} = \sigma_1$.

On the other hand, if $v \in \langle e_1,\dots,e_{r_1}\rangle$, then writing
$$v = \sum_{j=1}^{r_1}\langle v, e_j\rangle e_j$$
leads to
$$\|Tv\|^2 = \left\|\sum_{j=1}^{r_1}\sigma_j\langle v, e_j\rangle f_j\right\|^2 = \sigma_1^2\sum_{j=1}^{r_1}\big|\langle v, e_j\rangle\big|^2 = \sigma_1^2\|v\|^2,$$
and so $\langle e_1,\dots,e_{r_1}\rangle \subseteq U$. That is, we have shown that $\langle e_1,\dots,e_{r_1}\rangle = U$; the same argument shows that $\langle\tilde e_1,\dots,\tilde e_{\tilde r_1}\rangle = U$. We thus have that
$$\#\{j \mid \sigma_j = \sigma_1\} = \#\{j \mid \tilde\sigma_j = \tilde\sigma_1\} = \dim U.$$
To continue, we apply the same argument as above to the restriction $T_1 = T|_{U^\perp}$.
If
$$U_1 := \{v \in U^\perp \mid \|Tv\| = \|T_1\|_{op}\|v\|\},$$
then for $k = \dim U$ and $k_1 = \dim U_1$, it follows as above that
$$\sigma_{k+1} = \cdots = \sigma_{k+k_1} = \tilde\sigma_{k+1} = \cdots = \tilde\sigma_{k+k_1} = \|T_1\|_{op},$$
and that both $\sigma_{k+k_1+1} < \sigma_{k+1}$ and $\tilde\sigma_{k+k_1+1} < \tilde\sigma_{k+1}$. Continuing in this fashion, $V$ is decomposed into subspaces depending only on $T$, and the (common) values of the $\sigma_j$ and $\tilde\sigma_j$ are the operator norms of restrictions of $T$ to those subspaces, with the number of times they occur equal to the dimensions of the corresponding subspaces. ∎
Now that we have verified uniqueness, we can legitimately define the singular values of a map.
**Definition** Let $V$ and $W$ be finite-dimensional inner product spaces, and let $T \in \mathcal{L}(V,W)$. The **singular values** of $T$ are the numbers $\sigma_1 \ge \cdots \ge \sigma_p \ge 0$, where $p = \min\{m,n\}$ (with $n = \dim V$ and $m = \dim W$), $\sigma_1,\dots,\sigma_r$ are given in the statement of Theorem 5.1, and $\sigma_j = 0$ for $r+1 \le j \le p$.

KEY IDEAS

• SVD of maps: given $T \in \mathcal{L}(V,W)$, there are orthonormal bases $(e_1,\dots,e_n)$ of $V$ and $(f_1,\dots,f_m)$ of $W$, and numbers $\sigma_1 \ge \cdots \ge \sigma_r > 0$ (with $r = \operatorname{rank} T$), such that
$$Te_j = \begin{cases} \sigma_j f_j & \text{if } 1 \le j \le r, \\ 0 & \text{otherwise.}\end{cases}$$
• The singular values $\sigma_1 \ge \cdots \ge \sigma_p$ are unique, but the bases are not. ($r = \operatorname{rank} T$ and $p = \min\{\dim V, \dim W\} \ge r$.)
• The largest singular value is the operator norm of the map.
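The last key idea—that $\sigma_1$ is the operator norm—can be checked numerically; a NumPy sketch (the random matrix and the sampling scheme are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 5))
sigma = np.linalg.svd(T, compute_uv=False)  # descending singular values

# ord=2 gives the operator norm, which equals the largest singular value.
op_norm = np.linalg.norm(T, 2)

# Sampling ||Tv|| over random unit vectors v never exceeds sigma_1.
vs = rng.standard_normal((5, 1000))
vs /= np.linalg.norm(vs, axis=0)
max_sampled = np.max(np.linalg.norm(T @ vs, axis=0))
```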
EXERCISES
QA #3: Since $\|Iv\| = \|v\|$ for every $v$, the proof of Theorem 5.1 shows that $\sigma_j = 1$ for each $j$.
orthogonally onto the $y$–$z$ plane. Find a singular value decomposition for $T$.

5.1.5 Consider the space $C^\infty([0, 2\pi])$ of infinitely differentiable functions $f : [0, 2\pi] \to \mathbb{R}$ with the inner product
$$\langle f, g\rangle = \int_0^{2\pi} f(x)g(x)\,dx.$$
Fix $n \in \mathbb{N}$, and let $V \subseteq C^\infty([0, 2\pi])$ be the subspace spanned by the functions
$$1, \sin(x), \sin(2x), \dots, \sin(nx), \cos(x), \cos(2x), \dots, \cos(nx).$$
Find a singular value decomposition for the derivative operator $D \in \mathcal{L}(V)$.
5.1.6 Suppose that $P \in \mathcal{L}(V)$ is an orthogonal projection on a finite-dimensional inner product space $V$. Show that: (a) the singular values of $P$ are all 1 or 0; (b) $P$ has a singular value decomposition in which the left singular vectors are the same as the right singular vectors.

5.1.7 Suppose that $T \in \mathcal{L}(V)$ has singular values $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$. Show that if $\lambda$ is an eigenvalue of $T$, then $\sigma_n \le |\lambda| \le \sigma_1$.

5.1.8 Show that $T \in \mathcal{L}(V,W)$ is invertible if and only if $\dim V = \dim W$ and all the singular values of $T$ are nonzero.

5.1.9 Suppose that $T \in \mathcal{L}(V,W)$ is invertible. Given a singular value decomposition of $T$, find a singular value decomposition of $T^{-1}$.

5.1.10 Suppose that $T \in \mathcal{L}(V,W)$ is invertible with singular values $\sigma_1 \ge \cdots \ge \sigma_n$. Show that
$$\sigma_n = \min_{\|v\|=1}\|Tv\| = \frac{1}{\|T^{-1}\|_{op}}.$$

5.1.11 Show that if $V$ is a finite-dimensional inner product space and all the singular values of $T \in \mathcal{L}(V)$ are 1, then $T$ is an isometry.

5.1.12 Let $V$ and $W$ be finite-dimensional inner product spaces and suppose that $T \in \mathcal{L}(V,W)$ has singular values $\sigma_1 \ge \cdots \ge \sigma_p$. Show that, given any $s \in \mathbb{R}$ with $\sigma_p \le s \le \sigma_1$, there is a unit vector $v \in V$ with $\|Tv\| = s$.

5.1.13 Let $V$ and $W$ be finite-dimensional inner product spaces and $T \in \mathcal{L}(V,W)$. Show that there exist:
• subspaces $V_0, V_1, \dots, V_k$ of $V$ such that $V$ is the orthogonal direct sum $V_0 \oplus V_1 \oplus \cdots \oplus V_k$,
• subspaces $W_0, W_1, \dots, W_k$ of $W$ such that $W$ is the orthogonal direct sum $W_0 \oplus W_1 \oplus \cdots \oplus W_k$,
• isometries $T_j \in \mathcal{L}(V_j, W_j)$ for $j = 1,\dots,k$,
• distinct real scalars $\tau_1,\dots,\tau_k > 0$,
such that
$$T = \sum_{j=1}^k \tau_j T_j P_{V_j}.$$

5.1.14 Suppose that $T \in \mathcal{L}(V,W)$ has a singular value decomposition with right singular vectors $(e_1,\dots,e_n)$ and singular values $\sigma_1 \ge \cdots \ge \sigma_p$.
(a) For $j = 1,\dots,p$, let $U_j = \langle e_j,\dots,e_n\rangle$. Show that, for every $v \in U_j$, we have $\|Tv\| \le \sigma_j\|v\|$.
(b) For $j = 1,\dots,p$, let $V_j = \langle e_1,\dots,e_j\rangle$. Show that, for every $v \in V_j$, we have $\|Tv\| \ge \sigma_j\|v\|$.
(c) Show that if $U$ is a subspace of $V$ with $\dim U = n - j + 1$, then there exists a $v \in U$ such that $\|Tv\| \ge \sigma_j\|v\|$. Hint: Use Lemma 3.22.
(d) Use the above results to show that
$$\sigma_j = \min_{\dim U = n-j+1}\ \max_{v \in U,\ \|v\|=1}\|Tv\|.$$
Remark: This gives another proof of Theorem 5.3.

5.1.15 In Theorem 5.3, we assumed the number of nonzero singular values in each singular value decomposition is equal to the rank of $T$. Show that this assumption is unnecessary: Suppose that there are orthonormal bases $(e_1,\dots,e_n)$ of $V$ and $(f_1,\dots,f_m)$ of $W$, and scalars $\sigma_1 \ge \cdots \ge \sigma_k > 0$ such that $Te_j = \sigma_j f_j$ for $j = 1,\dots,k$, and $Te_j = 0$ for $j > k$. Prove that $\operatorname{rank} T = k$.
5.2 Singular Value Decomposition of Matrices

**Matrix Version of SVD** In this section we explore the consequences of Theorem 5.1 for matrices. We begin by re-expressing the theorem in matrix language.

**Theorem 5.4 (Singular value decomposition (SVD))** Let $A \in M_{m,n}(\mathbb{F})$ have rank $r$. Then there exist matrices $U \in M_m(\mathbb{F})$ and $V \in M_n(\mathbb{F})$ with $U$ and $V$ unitary (if $\mathbb{F} = \mathbb{C}$) or orthogonal (if $\mathbb{F} = \mathbb{R}$) and unique real numbers $\sigma_1 \ge \cdots \ge \sigma_r > 0$ such that
$$A = U\Sigma V^*,$$
where $\Sigma \in M_{m,n}(\mathbb{R})$ has $(j,j)$ entry $\sigma_j$ for $1 \le j \le r$ and all other entries 0.

[…]

**Theorem 5.7** Let $A \in M_{m,n}(\mathbb{F})$, and let $u_j$ and $v_j$ denote the columns of $U$ and $V$ in a singular value decomposition $A = U\Sigma V^*$. Then
$$A = \sum_{j=1}^r \sigma_j u_j v_j^*,$$
where $\sigma_1 \ge \cdots \ge \sigma_r > 0$ are the nonzero singular values of $A$.
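Both factorization forms can be verified numerically; a NumPy sketch (the matrix is an arbitrary example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

# numpy returns U (m x m), the singular values, and V* (n x n).
U, s, Vh = np.linalg.svd(A)

# Assemble the rectangular Sigma with sigma_j in the (j, j) entries.
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)

A_from_factors = U @ Sigma @ Vh  # the A = U Sigma V* form
A_from_rank_ones = sum(          # the sum-of-rank-ones form
    s[j] * np.outer(U[:, j], Vh[j, :]) for j in range(3)
)
```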
The proof of Theorem 5.7 in general is left as an exercise (see Exercise 5.2.20), but we will see how it works in a particular case.

QA #6: $12^2 + 6^2 + 2^2 = 144 + 36 + 4 = 184$, so $\|A\|_F = \sqrt{184}$.
**Example** In the example on page 298, we found that for the $3 \times 5$ matrix $A$ considered there, there were particular orthonormal bases $(u_1, u_2, u_3)$ of $\mathbb{R}^3$ and $(v_1, v_2, v_3, v_4, v_5)$ of $\mathbb{R}^5$ such that
$$Av_1 = 12u_1, \quad Av_2 = 6u_2, \quad Av_3 = 2u_3, \quad Av_4 = 0, \quad\text{and}\quad Av_5 = 0.$$
Consider the matrix
$$B = 12u_1v_1^* + 6u_2v_2^* + 2u_3v_3^*.$$
If you're in the mood to waste paper, you can of course just work out all the entries of $B$ in order to show that $A = B$. But since $(v_1, v_2, v_3, v_4, v_5)$ is orthonormal, it is easy to compute that
$$Bv_1 = 12u_1, \quad Bv_2 = 6u_2, \quad Bv_3 = 2u_3, \quad Bv_4 = 0, \quad\text{and}\quad Bv_5 = 0.$$
Since $(v_1, v_2, v_3, v_4, v_5)$ is a basis of $\mathbb{R}^5$, this is enough to show that $A = B$. ▲
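The argument in this example—checking equality of two matrices by their action on a basis—can be mimicked numerically; a NumPy sketch with an arbitrary matrix standing in for the text's $A$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5))
U, s, Vh = np.linalg.svd(A)
v = Vh.T  # columns v_1, ..., v_5: an orthonormal basis of R^5

# B is the sum of rank-one terms sigma_j u_j v_j^*.
B = sum(s[j] * np.outer(U[:, j], v[:, j]) for j in range(3))

# B agrees with A on the basis (v_1, ..., v_5): both send v_j to
# sigma_j u_j for j <= 3 and to 0 for j = 4, 5.
agree_on_basis = np.allclose(A @ v, B @ v)
```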
We now show that the best rank $k$ approximation of $A$ is gotten by truncating the singular value decomposition of $A$, in the form given in Theorem 5.7.

**Theorem 5.8** Let $A \in M_{m,n}(\mathbb{C})$ be fixed with positive singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$. Then for any $B \in M_{m,n}(\mathbb{C})$ with $\operatorname{rank} B \le k < r$,
$$\|A - B\|_{op} \ge \sigma_{k+1},$$
and equality is achieved for
$$B = \sum_{j=1}^k \sigma_j u_j v_j^*,$$
where the notation is as in the statement of Theorem 5.7.
**Example** Returning again to the example from page 298, the best approximation of $A$ by a matrix of rank 1 is
$$B = 12u_1v_1^*,$$
and the best approximation of $A$ by a matrix of rank 2 is
$$B = 12u_1v_1^* + 6u_2v_2^*. \qquad\blacktriangle$$
**Quick Exercise #7.** What is $\|A - B\|_{op}$ for each of the approximations above? (Recall that the singular values of $A$ are 12, 6, and 2.)
**Proof of Theorem 5.8** Let
$$A = \sum_{j=1}^r \sigma_j u_j v_j^*$$
be the singular value decomposition of $A$, and let $B$ have rank $k$. By the Rank–Nullity Theorem, $\dim \ker B = n - k$. So by Lemma 3.22,
$$\langle v_1,\dots,v_{k+1}\rangle \cap \ker B \ne \{0\}.$$
That is, there exists a nonzero vector $x \in \langle v_1,\dots,v_{k+1}\rangle$ such that $Bx = 0$. Since $(v_1,\dots,v_n)$ is orthonormal, $x$ is orthogonal to $v_j$ for $j > k+1$. Therefore
$$Ax = \sum_{j=1}^r \sigma_j u_j v_j^* x = \sum_{j=1}^r \sigma_j\langle x, v_j\rangle u_j = \sum_{j=1}^{k+1} \sigma_j\langle x, v_j\rangle u_j.$$
Then
$$\|(A-B)x\|^2 = \|Ax\|^2 = \left\|\sum_{j=1}^{k+1} \sigma_j\langle x, v_j\rangle u_j\right\|^2 = \sum_{j=1}^{k+1} \sigma_j^2\big|\langle x, v_j\rangle\big|^2,$$
since $(u_1,\dots,u_m)$ is orthonormal, and so
$$\|(A-B)x\|^2 \ge \sigma_{k+1}^2 \sum_{j=1}^{k+1}\big|\langle x, v_j\rangle\big|^2 = \sigma_{k+1}^2\|x\|^2.$$
By inequality (4.11), this implies that $\|A - B\|_{op} \ge \sigma_{k+1}$. For the proof that equality is achieved for the stated matrix $B$, see Exercise 5.2.21. ∎

Using the methods of Section 4.3, we can also prove a version of Theorem 5.8 for the Frobenius norm.

QA #7: For the rank 1 approximation, $\|A - B\|_{op} = 6$; for the rank 2 approximation, $\|A - B\|_{op} = 2$.
**Theorem 5.9** Let $A \in M_{m,n}(\mathbb{C})$ be fixed with positive singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$. Then for any $B \in M_{m,n}(\mathbb{C})$ with $\operatorname{rank} B \le k < r$,
$$\|A - B\|_F \ge \sqrt{\sum_{j=k+1}^r \sigma_j^2},$$
and equality is achieved for
$$B = \sum_{j=1}^k \sigma_j u_j v_j^*,$$
where the notation is as in the statement of Theorem 5.7.
Together, Theorem 5.9 and Theorem 5.8 imply the surprising fact that the best rank k approximation to a given matrix is the same regardless of whether we use the operator norm or Frobenius norm to judge “best approximation.” (Generally, the “best” way of doing something depends a lot on how you define what is “best”;
see Exercise 5.2.17.)
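A numerical check of Theorems 5.8 and 5.9 together (a NumPy sketch; the matrix and the choice $k = 1$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))
U, s, Vh = np.linalg.svd(A)

k = 1
# Truncating the SVD gives the best rank-k approximation in BOTH norms.
B = (U[:, :k] * s[:k]) @ Vh[:k, :]

op_err = np.linalg.norm(A - B, 2)       # should equal sigma_{k+1}
fro_err = np.linalg.norm(A - B, 'fro')  # should equal sqrt(tail sum of sigma^2)
```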
**Quick Exercise #8.** Let $A$ be the $3 \times 5$ matrix from the example on page 298. (a) What are the best rank 1 and rank 2 approximations of $A$ with respect to the Frobenius norm? (b) For each of those approximations $B$, what is $\|A - B\|_F$?
To prove Theorem 5.9, we will need the following lemma.

**Lemma 5.10** Let $W \subseteq \mathbb{F}^m$ be a fixed $k$-dimensional subspace, and define
$$V_W := \{B \in M_{m,n}(\mathbb{C}) \mid C(B) \subseteq W\}.$$
Then $V_W$ is a subspace of $M_{m,n}(\mathbb{C})$. Moreover, if $(w_1,\dots,w_k)$ is an orthonormal basis of $W$, then the matrices
$$W_{j,\ell} = \begin{bmatrix} 0 & \cdots & w_j & \cdots & 0\end{bmatrix}$$
(with $w_j$ in the $\ell$th column and all other columns zero), for $1 \le j \le k$ and $1 \le \ell \le n$, form an orthonormal basis of $V_W$.

**Proof** See Exercise 5.2.22. ∎
QA #8: (a) The same as the two matrices in the example on pages 304–305. (b) $\sqrt{6^2 + 2^2} = \sqrt{40}$ and 2.
Proof of Theorem 5.9
To prove that, for any $B$ of rank $k$,
$$\|A - B\|_F \ge \sqrt{\sum_{j=k+1}^r \sigma_j^2}, \tag{5.6}$$
observe first that if $A = U\Sigma V^*$ with $U$ and $V$ unitary, then for any $B \in M_{m,n}(\mathbb{C})$,
$$\|A - B\|_F = \|U\Sigma V^* - B\|_F = \|U(\Sigma - U^*BV)V^*\|_F = \|\Sigma - U^*BV\|_F.$$
Furthermore, $\operatorname{rank} U^*BV = \operatorname{rank} B$, since $U$ and $V$ are unitary, hence isomorphisms. It thus suffices to prove the lower bound of Theorem 5.9 when
$$A = \Sigma = \begin{bmatrix} \sigma_1 e_1 & \cdots & \sigma_r e_r & 0 & \cdots & 0 \end{bmatrix}.$$
(The final columns of zeroes are not there if $r = n$.)

Now, for each fixed $k$-dimensional subspace $W \subseteq \mathbb{F}^m$, $\|A - B\|_F$ is minimized among $B \in V_W$ by $P_{V_W}A$. Since the $W_{j,\ell}$ are an orthonormal basis of $V_W$,
$$P_{V_W}A = \sum_{j=1}^k\sum_{\ell=1}^n \langle A, W_{j,\ell}\rangle W_{j,\ell} = \sum_{j=1}^k\sum_{\ell=1}^n \langle a_\ell, w_j\rangle W_{j,\ell} = \begin{bmatrix} P_W a_1 & \cdots & P_W a_n\end{bmatrix}.$$
Since $A - P_{V_W}A$ is orthogonal to $P_{V_W}A$ with respect to the Frobenius inner product,
$$\|A - P_{V_W}A\|_F^2 = \|A\|_F^2 - \|P_{V_W}A\|_F^2.$$
Moreover, since $a_\ell = \sigma_\ell e_\ell$ for $1 \le \ell \le r$, the formula above for $P_{V_W}A$ gives that
$$\|P_{V_W}A\|_F^2 = \sum_{\ell=1}^r \|P_W a_\ell\|^2 = \sum_{\ell=1}^r \sigma_\ell^2\|P_W e_\ell\|^2,$$
and so
$$\|A - P_{V_W}A\|_F^2 = \sum_{\ell=1}^r \sigma_\ell^2 - \sum_{\ell=1}^r \sigma_\ell^2\|P_W e_\ell\|^2.$$
Now, for each $\ell$, $\|P_W e_\ell\|^2 \le \|e_\ell\|^2 = 1$, and
$$\sum_{\ell=1}^m \|P_W e_\ell\|^2 = \sum_{\ell=1}^m\sum_{j=1}^k \big|\langle e_\ell, w_j\rangle\big|^2 = \sum_{j=1}^k \|w_j\|^2 = k.$$
Since $\sigma_1 \ge \cdots \ge \sigma_r > 0$, this means that the expression $\sum_{\ell=1}^r \sigma_\ell^2\|P_W e_\ell\|^2$ is largest if $W$ is chosen so that $\|P_W e_\ell\|^2 = 1$ for $1 \le \ell \le k$ and is zero otherwise, that is, if $W = \langle e_1,\dots,e_k\rangle$. It follows that
$$\|A - P_{V_W}A\|_F^2 = \sum_{\ell=1}^r \sigma_\ell^2 - \sum_{\ell=1}^r \sigma_\ell^2\|P_W e_\ell\|^2 \ge \sum_{\ell=1}^r \sigma_\ell^2 - \sum_{\ell=1}^k \sigma_\ell^2 = \sum_{\ell=k+1}^r \sigma_\ell^2.$$
Since every $B$ of rank $k$ is in some $V_W$, this completes the proof of inequality (5.6).

To prove that if $B = \sum_{j=1}^k \sigma_j u_j v_j^*$, we have equality in (5.6), it suffices to compute $\|A - B\|_F$ for this choice of $B$:
$$\left\|A - \sum_{j=1}^k \sigma_j u_j v_j^*\right\|_F^2 = \left\|\sum_{j=k+1}^r \sigma_j u_j v_j^*\right\|_F^2 = \sum_{j=k+1}^r\sum_{\ell=k+1}^r \sigma_j\sigma_\ell \operatorname{tr}(u_j v_j^* v_\ell u_\ell^*) = \sum_{j=k+1}^r\sum_{\ell=k+1}^r \sigma_j\sigma_\ell \langle v_\ell, v_j\rangle\langle u_j, u_\ell\rangle = \sum_{j=k+1}^r \sigma_j^2. \qquad\blacksquare$$
KEY IDEAS

• SVD: given $A \in M_{m,n}(\mathbb{F})$, there are orthogonal/unitary $U$, $V$, and $\Sigma \in M_{m,n}(\mathbb{R})$ with $\sigma_j$ in the $(j,j)$ entry for $1 \le j \le r$ and zeroes otherwise, such that $A = U\Sigma V^*$.

[…]

Let $\sigma_1 \ge \cdots \ge \sigma_n > 0$ be the singular values of $A$. Show that $\kappa(A) = \frac{\sigma_1}{\sigma_n}$.
5.2.14 Let $A \in M_{m,n}(\mathbb{R})$, and suppose that $B \in M_{m,n}(\mathbb{C})$ is the best rank $k$ approximation of $A$. Show that all the entries of $B$ are real numbers.

5.2.15 Let $A \in M_n(\mathbb{F})$ be invertible. Show that the minimum value of $\|A - B\|_{op}$ for $B$ a singular matrix is $\sigma_n$. Show that the same holds if $\|A - B\|_{op}$ is replaced with $\|A - B\|_F$.

5.2.16 If $A \in M_{m,n}(\mathbb{C})$ is nonzero, then the **stable rank** of $A$ is defined as
$$\operatorname{srank} A := \frac{\|A\|_F^2}{\|A\|_{op}^2}.$$
(a) Show that $\operatorname{srank} A \le \operatorname{rank} A$.
(b) Suppose that $k \ge \alpha \operatorname{srank} A$ for some $\alpha > 0$, and $B$ is the best rank $k$ approximation to $A$. Show that
$$\frac{\|A - B\|_{op}}{\|A\|_{op}} \le \frac{1}{\sqrt{\alpha}}.$$
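A small numerical companion to the stable rank exercise (a NumPy sketch; the definition is the one just given, and the test matrix is an arbitrary example):

```python
import numpy as np

def stable_rank(A):
    # srank A = ||A||_F^2 / ||A||_op^2 = (sum of sigma_j^2) / sigma_1^2
    return np.linalg.norm(A, 'fro') ** 2 / np.linalg.norm(A, 2) ** 2

rng = np.random.default_rng(10)
A = rng.standard_normal((5, 4))
sr = stable_rank(A)
rk = np.linalg.matrix_rank(A)
```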
5.2.17 Let $A = U\Sigma V^*$ be a singular value decomposition of $A \in M_{m,n}(\mathbb{C})$. Define the **pseudoinverse** $A^+ \in M_{n,m}(\mathbb{C})$ of $A$ by
$$A^+ = V\Sigma^+ U^*,$$
where $\Sigma^+$ is an $n \times m$ matrix with $\frac{1}{\sigma_j}$ in the $(j,j)$ entry for $1 \le j \le r$ and zeroes otherwise.
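The pseudoinverse just defined can be built directly from an SVD; a NumPy sketch (numpy's own `pinv` uses the same SVD-based construction, so the two should agree):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))  # full column rank (generically)
U, s, Vh = np.linalg.svd(A)

# Sigma+ is n x m with 1/sigma_j in the (j, j) entries.
Sigma_plus = np.zeros((3, 4))
Sigma_plus[:3, :3] = np.diag(1 / s)

A_plus = Vh.conj().T @ Sigma_plus @ U.conj().T  # A+ = V Sigma+ U*
```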
[…]

(d) Why does what you found not contradict Theorem 5.13?

5.3.9 Let $U$ be a subspace of a finite-dimensional inner product space $V$, and consider the orthogonal projection $P_U$ as an element of $\mathcal{L}(V,U)$. Define the inclusion map $J \in \mathcal{L}(U,V)$ by $J(u) = u$ for all $u \in U$. Show that $P_U$ and $J$ are adjoint maps.

5.3.10 Let $V$ be a finite-dimensional inner product space, and let $T \in \mathcal{L}(V)$.
(a) Show that $\operatorname{tr} T^* = \overline{\operatorname{tr} T}$.
(b) Show that if $T$ is self-adjoint, then $\operatorname{tr} T \in \mathbb{R}$.
5.3.11 Let $A \in M_n(\mathbb{C})$. Show that the Hermitian and anti-Hermitian parts of $A$ (defined in Exercise 4.1.5) are both Hermitian.
Remark: This may seem rather strange at first glance, but remember that the imaginary part of a complex number is in fact a real number!

5.3.12 Let $V$ be a finite-dimensional inner product space. Suppose that $T \in \mathcal{L}(V)$ is self-adjoint and $T^2 = T$. Prove that $T$ is the orthogonal projection onto $U = \operatorname{range} T$.

5.3.13 Let $V$ and $W$ be finite-dimensional inner product spaces and let $T \in \mathcal{L}(V,W)$. Show that $T^*T = 0$ if and only if $T = 0$.

5.3.14 Let $V$ and $W$ be finite-dimensional inner product spaces and let $T \in \mathcal{L}(V,W)$. Show that $\|T^*\|_{op} = \|T\|_{op}$.

5.3.15 Let $V$ and $W$ be finite-dimensional inner product spaces and let $T \in \mathcal{L}(V,W)$. Show that $\|T^*T\|_{op} = \|TT^*\|_{op} = \|T\|_{op}^2$.

5.3.16 Let $V$ be a finite-dimensional inner product space. Show that the set $W$ of all self-adjoint linear maps on $V$ is a real vector space.

5.3.17 Let $V$ and $W$ be finite-dimensional inner product spaces. Show that
$$\langle S, T\rangle_F := \operatorname{tr}(ST^*)$$
defines an inner product on $\mathcal{L}(V,W)$.

5.3.18 Suppose that $V$ is a complex inner product space, $T \in \mathcal{L}(V)$, and $T^* = -T$. Prove that every eigenvalue of $T$ is purely imaginary (i.e., of the form $ia$ for some $a \in \mathbb{R}$).

5.3.19 Suppose that $V$ is a complex inner product space, $T \in \mathcal{L}(V)$, and $T^* = -T$. Prove that eigenvectors of $T$ with distinct eigenvalues are orthogonal.

5.3.20 Suppose that $T \in \mathcal{L}(V,W)$ has an SVD with right singular vectors $e_1,\dots,e_n \in V$, left singular vectors $f_1,\dots,f_m \in W$, and singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$ (where $r = \operatorname{rank} T$). Show that:
(a) $(f_1,\dots,f_r)$ is an orthonormal basis of $\operatorname{range} T$.
(b) $(e_{r+1},\dots,e_n)$ is an orthonormal basis of $\ker T$.
(c) $(f_{r+1},\dots,f_m)$ is an orthonormal basis of $\ker T^*$.
(d) $(e_1,\dots,e_r)$ is an orthonormal basis of $\operatorname{range} T^*$.

5.3.21 Prove parts 4 and 5 of Proposition 5.14.

5.3.22 State and prove a version of Algorithm 5.17 that starts with "Find the eigenvalues of $AA^*$."
5.4 The Spectral Theorems

In this section we will definitively answer the question "which linear maps can be diagonalized in an orthonormal basis?" In the context of matrices, we say that such a matrix is **unitarily diagonalizable**.
Eigenvectors of Self-adjoint Maps and Matrices In order to construct an orthonormal basis of eigenvectors, we will for starters need the existence of at least one eigenvector.
**Lemma 5.18** If $A \in M_n(\mathbb{F})$ is Hermitian, then $A$ has an eigenvector in $\mathbb{F}^n$. If $V$ is a nonzero finite-dimensional inner product space and $T \in \mathcal{L}(V)$ is self-adjoint, then $T$ has an eigenvector.
**Proof** We will prove the matrix-theoretic statement; the version about linear maps then follows. Let $A = U\Sigma V^*$ be a singular value decomposition of $A$. Then
$$A^2 = A^*A = V\Sigma^*\Sigma V^* = V\Sigma^2 V^*,$$
since $\Sigma$ is diagonal with real entries when $m = n$, and $V$ is unitary. By Theorem 3.56, each column $v_j$ of $V$ is an eigenvector of $A^2$ with corresponding eigenvalue $\sigma_j^2$. Therefore
$$0 = (A^2 - \sigma_1^2 I_n)v_1 = (A + \sigma_1 I_n)(A - \sigma_1 I_n)v_1.$$
Let $w = (A - \sigma_1 I_n)v_1$. If $w = 0$, then $v_1$ is an eigenvector of $A$ with eigenvalue $\sigma_1$. On the other hand, if $w \ne 0$, then since $(A + \sigma_1 I_n)w = 0$, $w$ is an eigenvector of $A$ with eigenvalue $-\sigma_1$. ∎
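The proof's construction can be carried out numerically; a NumPy sketch with a random symmetric matrix (either branch of the dichotomy in the proof produces an eigenvector):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((3, 3))
A = M + M.T  # real symmetric, hence Hermitian

U, s, Vh = np.linalg.svd(A)
v1 = Vh[0]                         # right singular vector for sigma_1
w = (A - s[0] * np.eye(3)) @ v1    # w = (A - sigma_1 I) v1

# As in the proof: if w = 0, then v1 is an eigenvector with eigenvalue
# sigma_1; otherwise w is an eigenvector with eigenvalue -sigma_1.
if np.allclose(w, 0):
    eigvec, eigval = v1, s[0]
else:
    eigvec, eigval = w, -s[0]
```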
The following result gives an easily checked sufficient condition for unitary diagonalizability; it is a necessary and sufficient condition for unitary diagonalizability with real eigenvalues (see Exercise 5.4.17). The name derives from the fact that the set of eigenvalues of a linear map is sometimes called its spectrum.

**Theorem 5.19 (The Spectral Theorem for self-adjoint maps and Hermitian matrices)** If $V$ is a finite-dimensional inner product space and $T \in \mathcal{L}(V)$ is self-adjoint, then there is an orthonormal basis of $V$ consisting of eigenvectors of $T$. If $A \in M_n(\mathbb{F})$ is Hermitian (if $\mathbb{F} = \mathbb{C}$) or symmetric (if $\mathbb{F} = \mathbb{R}$), then there exist a unitary (if $\mathbb{F} = \mathbb{C}$) or orthogonal (if $\mathbb{F} = \mathbb{R}$) matrix $U \in M_n(\mathbb{F})$ and real numbers $\lambda_1,\dots,\lambda_n \in \mathbb{R}$ such that
$$A = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*.$$

**Proof** In this case we will prove the statement about linear maps; the statement about matrices then follows from that, together with the fact (see Quick Exercise #10 in Section 5.3) that eigenvalues of self-adjoint maps are real.
By Lemma 5.18, $T$ has an eigenvector. By rescaling we can assume this eigenvector is a unit vector; call it $e_1$, and let $\lambda_1$ be the corresponding eigenvalue. Now if $\langle v, e_1\rangle = 0$, then
$$\langle Tv, e_1\rangle = \langle v, Te_1\rangle = \lambda_1\langle v, e_1\rangle = 0,$$
so $T$ maps the subspace $\langle e_1\rangle^\perp$ to itself.

**Quick Exercise #11.** Prove that the restriction of $T$ to $\langle e_1\rangle^\perp$ is self-adjoint.

By Lemma 5.18, $T$ has an eigenvector in $\langle e_1\rangle^\perp$, which we may assume is a unit vector $e_2$. Thus $e_1$ and $e_2$ are orthonormal eigenvectors of $T$. We continue in this way: after constructing orthonormal eigenvectors $e_1,\dots,e_k$, $T$ restricts to a self-adjoint map on $\langle e_1,\dots,e_k\rangle^\perp$. If $k \ne \dim V$, then $\langle e_1,\dots,e_k\rangle^\perp \ne \{0\}$, and so by Lemma 5.18, $T$ has a unit eigenvector $e_{k+1} \in \langle e_1,\dots,e_k\rangle^\perp$. Since $V$ is finite-dimensional, this process will terminate, at which point $(e_1,\dots,e_n)$ will be an orthonormal basis consisting of eigenvectors of $T$. ∎
The factorization $A = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*$ in Theorem 5.19 is called a **spectral decomposition** of $A$. It should be kept in mind that Theorem 5.19 gives a necessary and sufficient condition for unitary (orthogonal) diagonalizability only in the case of all real eigenvalues; there are linear maps which can be diagonalized only by a non-orthonormal basis.
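Numerically, a spectral decomposition of a symmetric matrix is produced by `numpy.linalg.eigh`; a quick sketch (the matrix is an arbitrary example, not from the text):

```python
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2  # real symmetric

# eigh is specialized to Hermitian/symmetric input: it returns real
# eigenvalues and an orthogonal matrix of eigenvectors.
lam, U = np.linalg.eigh(A)
A_rebuilt = U @ np.diag(lam) @ U.T
```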
**Example** Let $A = \begin{bmatrix} 1 & 2 \\ 0 & 3\end{bmatrix}$. Clearly $A$ is not Hermitian, but since $A$ has both 1 and 3 as eigenvalues, there must be (necessarily linearly independent) corresponding eigenvectors, and so $A$ is diagonalizable. Indeed,
$$A\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix} \quad\text{and}\quad A\begin{bmatrix}1\\1\end{bmatrix} = 3\begin{bmatrix}1\\1\end{bmatrix},$$
so $A$ is diagonal in the nonorthogonal basis $\left(\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right)$, but $A$ cannot be diagonalized by any orthonormal basis; see Exercise 5.4.17. ▲
The Spectral Theorem is a powerful tool for working with self-adjoint maps and Hermitian matrices. We present the following result as one important sample application.
QA #11: Given $v, w \in \langle e_1\rangle^\perp$, $\langle Tv, w\rangle = \langle v, Tw\rangle$, so $T^* = T$ whether we think of $T$ as acting on all of $V$ or just on this subspace.
**Theorem 5.20** Let $A \in M_n(\mathbb{C})$ be Hermitian. The following are equivalent:
1. Every eigenvalue of $A$ is positive.
2. For every $0 \ne x \in \mathbb{C}^n$, we have $\langle Ax, x\rangle > 0$.

**Definition** A matrix satisfying the equivalent properties in Theorem 5.20 is called **positive definite**.

**Proof of Theorem 5.20** Suppose first that every eigenvalue of $A$ is positive. By the Spectral Theorem, we can write
$$A = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*,$$
where $U$ is unitary and $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $A$. Therefore
$$\langle Ax, x\rangle = \langle U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*x, x\rangle = \langle\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*x, U^*x\rangle.$$
Let $y = U^*x$; since $U^*$ acts as an isometry, $y \ne 0$ if $x \ne 0$. Then
$$\langle Ax, x\rangle = \langle\operatorname{diag}(\lambda_1,\dots,\lambda_n)y, y\rangle = \sum_{j=1}^n \lambda_j|y_j|^2 > 0,$$
since $\lambda_j > 0$ for each $j$.

Conversely, suppose that $\langle Ax, x\rangle > 0$ for every $x \ne 0$. Let $\lambda$ be an eigenvalue of $A$ with eigenvector $x$. Then
$$0 < \langle Ax, x\rangle = \langle\lambda x, x\rangle = \lambda\|x\|^2,$$
so $\lambda > 0$. ∎
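Both characterizations in Theorem 5.20 are easy to probe numerically; a NumPy sketch (the positive definite matrix is constructed for illustration, and condition 2 is only spot-checked on random vectors):

```python
import numpy as np

rng = np.random.default_rng(8)
M = rng.standard_normal((3, 3))
A = M.T @ M + np.eye(3)  # Hermitian and positive definite by construction

eigenvalues = np.linalg.eigvalsh(A)  # condition 1: all positive

# Condition 2, spot-checked: <Ax, x> > 0 for (random) nonzero x.
xs = rng.standard_normal((3, 200))
quad_forms = np.sum(xs * (A @ xs), axis=0)
```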
There are some computations which are possible in principle to do directly but are greatly facilitated by the Spectral Theorem. For example, suppose that $A$ is a Hermitian matrix, and let $k \in \mathbb{N}$. If $A = UDU^*$ is a spectral decomposition of $A$ with $D = \operatorname{diag}(\lambda_1,\dots,\lambda_n)$, then
$$A^k = (UDU^*)^k = \underbrace{(UDU^*)\cdots(UDU^*)}_{k \text{ times}} = UD^kU^* = U\operatorname{diag}(\lambda_1^k,\dots,\lambda_n^k)U^*.$$
This observation makes it easy to calculate arbitrarily high powers of a Hermitian matrix. It also suggests how to go about defining more general functions of a matrix. Consider, for example, the exponential function $f(x) = e^x$. By Taylor's Theorem,
$$f(x) = \sum_{k=0}^\infty \frac{x^k}{k!},$$
which suggests defining the **matrix exponential** for $A \in M_n(\mathbb{C})$ by
$$e^A = \sum_{k=0}^\infty \frac{1}{k!}A^k.$$
Now, if $A$ is Hermitian and has a spectral decomposition as above, then
$$e^A = \sum_{k=0}^\infty \frac{1}{k!}UD^kU^* = U\left(\sum_{k=0}^\infty \frac{1}{k!}D^k\right)U^* = U\operatorname{diag}(e^{\lambda_1},\dots,e^{\lambda_n})U^*.$$
This leads us to a general definition for a function of a matrix: if $f : \mathbb{R} \to \mathbb{R}$, we define
$$f(A) = U\operatorname{diag}(f(\lambda_1),\dots,f(\lambda_n))U^*.$$
This route to defining $f(A)$ is sometimes called the **functional calculus**. The resulting functions sometimes, but not always, behave as you'd expect.
**Examples**
1. Suppose that $A$ is Hermitian with nonnegative eigenvalues. Then $\sqrt{A}$ can be defined via the functional calculus as described above, and
$$\left(\sqrt{A}\right)^2 = \left(U\operatorname{diag}\big(\sqrt{\lambda_1},\dots,\sqrt{\lambda_n}\big)U^*\right)^2 = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^* = A.$$
2. If $A$ and $B$ are Hermitian and commute, then $e^{A+B} = e^Ae^B$, but if $A$ and $B$ do not commute, this need not be the case (see Exercise 5.4.7). ▲
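The functional calculus is easy to implement from a spectral decomposition; a NumPy sketch (the helper name `func_calc` and the test matrix are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
M = rng.standard_normal((3, 3))
A = M.T @ M  # Hermitian with nonnegative eigenvalues

lam, U = np.linalg.eigh(A)

def func_calc(f):
    # f(A) = U diag(f(lambda_1), ..., f(lambda_n)) U*
    return U @ np.diag(f(lam)) @ U.T

sqrt_A = func_calc(np.sqrt)
exp_A = func_calc(np.exp)
```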
**Normal Maps and Matrices** Besides the potential nonorthogonality of a basis of eigenvectors, another issue with self-adjoint maps and Hermitian matrices is that they necessarily have real eigenvalues. Theorem 5.19 thus has nothing to say about diagonalizability for any maps or matrices with nonreal eigenvalues.

Let's consider under what circumstances we could hope to prove a result like Theorem 5.19 for nonreal eigenvalues. Working in the matrix-theoretic framework, suppose that $A = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*$ for a unitary matrix $U \in M_n(\mathbb{C})$ and complex numbers $\lambda_1,\dots,\lambda_n \in \mathbb{C}$. Then
$$A^* = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)^*U^* = U\operatorname{diag}(\bar\lambda_1,\dots,\bar\lambda_n)U^*,$$
and so
$$A^*A = U\operatorname{diag}(\bar\lambda_1,\dots,\bar\lambda_n)\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^* = U\operatorname{diag}(|\lambda_1|^2,\dots,|\lambda_n|^2)U^* = AA^*.$$
So for a start, we need at least to have $A^*A = AA^*$.
Definition A matrix A € M,(C) is normal if A*A = AA*. A linear map T € £(V) is normal if T*T = TT*.
By Theorem 5.13, a linear map is normal if and only if its matrix with respect to an orthonormal basis is normal.
**Examples**
1. Any self-adjoint map or Hermitian matrix is normal.
2. Any isometry or unitary matrix is normal: if $T$ is an isometry, then $T^* = T^{-1}$, so
$$TT^* = I = T^*T. \qquad\blacktriangle$$
The observations above mean that for $T$ to be diagonalized by an orthonormal basis, $T$ must be normal; it turns out that this condition is also sufficient. That is, a linear map can be diagonalized by an orthonormal basis if and only if it is normal. The following lemma gives an alternative characterization of normality which will be used in the proof.
**Lemma 5.21** Let $V$ be a complex inner product space. Given $T \in \mathcal{L}(V)$, define
$$T_r = \frac{1}{2}(T + T^*) \quad\text{and}\quad T_i = \frac{1}{2i}(T - T^*).$$
Then $T_r$ and $T_i$ are both self-adjoint, and $T$ is normal if and only if $T_rT_i = T_iT_r$.

**Proof** The self-adjointness of $T_r$ follows immediately from Proposition 5.14; for $T_i$,
$$T_i^* = \left(\frac{1}{2i}(T - T^*)\right)^* = -\frac{1}{2i}(T^* - T) = \frac{1}{2i}(T - T^*) = T_i.$$
For the second claim, note that
$$T_rT_i - T_iT_r = \frac{1}{2i}(T^*T - TT^*),$$
and so $T_rT_i - T_iT_r = 0$ if and only if $T^*T - TT^* = 0$, that is, if and only if $T$ is normal. ∎
**Lemma 5.22** Suppose that $S, T \in \mathcal{L}(V)$ and that $ST = TS$. If $\lambda$ is an eigenvalue of $T$ and $v \in \operatorname{Eig}_\lambda(T)$, then $Sv \in \operatorname{Eig}_\lambda(T)$.

**Quick Exercise #12.** Prove Lemma 5.22.
**Theorem 5.23 (The Spectral Theorem for normal maps and normal matrices)** If $V$ is a finite-dimensional inner product space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$ is normal, then there is an orthonormal basis of $V$ consisting of eigenvectors of $T$. If $A \in M_n(\mathbb{C})$ is normal, then there exist a unitary matrix $U \in M_n(\mathbb{C})$ and complex numbers $\lambda_1,\dots,\lambda_n \in \mathbb{C}$ such that
$$A = U\operatorname{diag}(\lambda_1,\dots,\lambda_n)U^*.$$

**Proof** Again we prove the result for linear maps; the version for matrices follows. Define $T_r$ and $T_i$ as in Lemma 5.21. For each eigenvalue $\lambda$ of $T_r$, it follows from Lemma 5.22 that $T_i$ maps $\operatorname{Eig}_\lambda(T_r)$ into itself, and is a self-adjoint map on that subspace (see Quick Exercise #11). So by Theorem 5.19, there is an orthonormal basis of $\operatorname{Eig}_\lambda(T_r)$ consisting of eigenvectors of $T_i$; these vectors are then eigenvectors of both $T_r$ and $T_i$.

Putting all these bases together, we get an orthonormal basis $(e_1,\dots,e_n)$ of $V$, each member of which is an eigenvector of both $T_r$ and $T_i$, though with different eigenvalues in general. So for each $j$, there are constants $\lambda_j, \mu_j \in \mathbb{R}$ such that
$$T_re_j = \lambda_je_j \quad\text{and}\quad T_ie_j = \mu_je_j,$$
and so
$$Te_j = (T_r + iT_i)e_j = (\lambda_j + i\mu_j)e_j.$$
Thus each $e_j$ is an eigenvector of $T$ as well. ∎

**Example** Let $A := \begin{bmatrix} 0 & -1 \\ 1 & 0\end{bmatrix}$. As we've seen, multiplication by $A$ corresponds to a rotation of $\mathbb{R}^2$ counterclockwise by 90°, and in particular, $A$ has no real eigenvalues. However, $A$ is certainly normal (it is an isometry!), and so it follows from Theorem 5.23 that $A$ can be unitarily diagonalized over $\mathbb{C}$. Indeed,
$$A - \lambda I_2 = \begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda\end{bmatrix},$$
and so $\lambda$ is an eigenvalue of $A$ if and only if $1 + \lambda^2 = 0$, that is, if and only if $\lambda = \pm i$.

**Quick Exercise #13.** Check that $(i, 1)$ is an eigenvector for $\lambda = i$ and that $(i, -1)$ is an eigenvector for $\lambda = -i$.

We thus take
$$U = \frac{1}{\sqrt{2}}\begin{bmatrix} i & i \\ 1 & -1\end{bmatrix},$$
which is indeed unitary (check!), and we have that
$$A = U\begin{bmatrix} i & 0 \\ 0 & -i\end{bmatrix}U^*. \qquad\blacktriangle$$

QA #12: If $v \in \operatorname{Eig}_\lambda(T)$, then $T(Sv) = S(Tv) = S(\lambda v) = \lambda(Sv)$.
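The rotation example can be reproduced numerically; a NumPy sketch (for this normal matrix with distinct eigenvalues, `numpy.linalg.eig` returns orthonormal eigenvectors, so the factorization below is a unitary diagonalization):

```python
import numpy as np

# Counterclockwise rotation of R^2 by 90 degrees: no real eigenvalues,
# but normal, hence unitarily diagonalizable over C.
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

is_normal = np.allclose(A.T @ A, A @ A.T)

eigenvalues, V = np.linalg.eig(A)  # eigenvalues are +i and -i
D = np.diag(eigenvalues)
```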
**Schur Decomposition** Theorem 5.23 is mathematically very satisfying, because it gives us necessary and sufficient conditions for finding an orthonormal basis of eigenvectors, that is, for being able to "unitarily diagonalize" a map or a matrix. The downside is that not all matrices and maps can be unitarily diagonalized; not all matrices and maps are normal. However, we do have the next best thing: all matrices and maps over $\mathbb{C}$ can be unitarily triangularized. The fact that this only works for complex inner product spaces and matrices is to be expected; recall that we showed in Section 3.7 that in order to be able to triangularize all matrices, the base field needs to be algebraically closed.

**Corollary 5.24 (The Schur decomposition)** Suppose that $V$ is a finite-dimensional complex inner product space, and $T \in \mathcal{L}(V)$. Then there is an orthonormal basis $\mathcal{B}$ of $V$ such that $[T]_{\mathcal{B}}$ is upper triangular. Equivalently, if $A \in M_n(\mathbb{C})$, then there exist a unitary matrix $U \in M_n(\mathbb{C})$ and an upper triangular matrix $T \in M_n(\mathbb{C})$ such that $A = UTU^*$.

**Proof** We start with the statement for matrices. Recall that every matrix over $\mathbb{C}$ can be triangularized: by Theorem 3.67, there exist an invertible matrix $S \in M_n(\mathbb{C})$ and an upper triangular matrix $B \in M_n(\mathbb{C})$ such that $A = SBS^{-1}$. Let $S = QR$ be the QR decomposition of $S$, so that $Q$ is unitary and $R$ is upper triangular. Then
$$A = QRB(QR)^{-1} = QRBR^{-1}Q^{-1} = Q(RBR^{-1})Q^*.$$
We've seen (see Exercises 2.3.12 and 2.4.19) that inverses and products of upper triangular matrices are upper triangular, so $RBR^{-1}$ is upper triangular. Corollary 5.24 thus follows, with $U = Q$ and $T = RBR^{-1}$.

The statement for maps follows from the statement for matrices: let $T \in \mathcal{L}(V)$ as above, and let $A = [T]_{\mathcal{B}_0}$ for some orthonormal basis $\mathcal{B}_0$ of $V$. Then there is a unitary matrix $U \in M_n(\mathbb{C})$ and an upper triangular matrix $R \in M_n(\mathbb{C})$ such that $A = URU^*$; that is,
$$U^*[T]_{\mathcal{B}_0}U = R.$$
Define a basis $\mathcal{B} = (v_1,\dots,v_n)$ of $V$ so that $[v_j]_{\mathcal{B}_0} = u_j$, the $j$th column of $U$. Then $\mathcal{B}$ is indeed a basis of $V$, because $U$ is invertible, and moreover, since $\mathcal{B}_0$ is orthonormal,
$$\langle v_i, v_j\rangle = \langle u_i, u_j\rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.}\end{cases}$$
By construction, $[I]_{\mathcal{B},\mathcal{B}_0} = U$, and so we have an upper triangular $R$ such that
$$R = U^*[T]_{\mathcal{B}_0}U = [I]_{\mathcal{B}_0,\mathcal{B}}[T]_{\mathcal{B}_0}[I]_{\mathcal{B},\mathcal{B}_0} = [T]_{\mathcal{B}}. \qquad\blacksquare$$

**Quick Exercise #14.** (a) Show that an upper triangular Hermitian matrix is diagonal. (b) Use this fact and the Schur decomposition to prove the Spectral Theorem for complex Hermitian matrices.
**Example** Let $A = \begin{bmatrix} 6 & -2 \\ 2 & 2\end{bmatrix}$. A quick check shows that $A$ is not normal, and so it cannot be unitarily diagonalized. To find a Schur decomposition of $A$, we first look for an eigenvector (which will be the first column of $U$):
$$\det(A - \lambda I_2) = \det\begin{bmatrix} 6-\lambda & -2 \\ 2 & 2-\lambda\end{bmatrix} = \lambda^2 - 8\lambda + 16 = (\lambda - 4)^2,$$
and so $A$ has a single eigenvalue $\lambda = 4$. An eigenvector is an element of the null space of
$$A - 4I_2 = \begin{bmatrix} 2 & -2 \\ 2 & -2\end{bmatrix},$$
so we can choose the first column of $U$ to be $u_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}$. Since we only need a $2 \times 2$ unitary matrix $U$ which triangularizes $A$, we can take $u_2$ to be any unit vector orthogonal to $u_1$; for example, $u_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\1\end{bmatrix}$. We can now solve for $T$:
$$T = U^*AU.$$
QA #14: (a) Obvious. (b) If $A$ is Hermitian and $A = UTU^*$ is its Schur decomposition, then $T = U^*AU = (U^*AU)^* = T^*$, so $T$ is diagonal. Notice that this doesn't prove that $U$ can be taken to be real orthogonal if $A \in M_n(\mathbb{R})$ is symmetric; that takes more work to prove.
This was of course a rather ad hoc way to unitarily triangularize $A$. In practice, the Schur decomposition of a matrix is found by an iterative version of the QR algorithm, but we will not discuss it here. Our main interest is in the purely theoretical fact that it is always possible to unitarily triangularize a matrix over $\mathbb{C}$.
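Such an ad hoc triangularization can be carried out in NumPy; a sketch using a $2 \times 2$ matrix with a single repeated eigenvalue (the specific entries and the choice of $u_2$ are illustrative assumptions):

```python
import numpy as np

A = np.array([[6.0, -2.0],
              [2.0, 2.0]])  # single eigenvalue 4, and not normal

# u1: unit eigenvector for the eigenvalue 4; u2: any unit vector
# orthogonal to u1.
u1 = np.array([1.0, 1.0]) / np.sqrt(2)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2)
U = np.column_stack([u1, u2])

T = U.T @ A @ U  # upper triangular because A u1 = 4 u1
```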
KEY IDEAS

• A map or matrix is **normal** if it commutes with its adjoint.
• The Spectral Theorem for maps: If $T \in \mathcal{L}(V)$ is normal and $V$ is complex, then $V$ has an orthonormal basis of eigenvectors of $T$. If $T$ is self-adjoint, then $V$ has an orthonormal basis of eigenvectors of $T$ and the eigenvalues are real.
• The Spectral Theorem for matrices: If $A \in M_n(\mathbb{C})$ is normal, there is a unitary $U$ such that $A = UDU^*$, where $D$ is diagonal with the eigenvalues of $A$ on the diagonal. If $A$ is Hermitian, then the entries of $D$ are real. If $A$ is real symmetric, then $U$ is orthogonal.
• The Schur decomposition: If $A \in M_n(\mathbb{C})$, there are a unitary $U$ and an upper triangular $T$ such that $A = UTU^*$.
EXERCISES

5.4.1 Find a spectral decomposition of each of the following matrices. […]

5.4.2 Find a spectral decomposition of each of the following matrices. […]

5.4.3 Find a Schur decomposition of each of the following matrices. […]

5.4.4 Find a Schur decomposition of each of the following matrices. […]
5.4.5 Prove that if $A \in M_n(\mathbb{R})$ is symmetric, then there is a matrix $B \in M_n(\mathbb{R})$ such that $B^3 = A$.

5.4.6 Suppose that $A$ is an invertible Hermitian matrix. Show that defining $A^{-1}$ via the functional calculus does indeed produce a matrix inverse to $A$.

5.4.7 Let $A$ and $B$ be Hermitian matrices. Show that if $A$ and $B$ commute, then $e^{A+B} = e^Ae^B$. Show that if $A$ and $B$ do not commute, then this need not be true.
5.4.8 Let $A$ be a Hermitian matrix. Show that $e^A$ is positive definite.

5.4.9 Let $A \in M_n(\mathbb{C})$ be Hermitian. Show that the following are equivalent:
(a) Every eigenvalue of $A$ is nonnegative.
(b) For every $x \in \mathbb{C}^n$, we have $\langle Ax, x\rangle \ge 0$.
A matrix satisfying these properties is called **positive semidefinite**.

5.4.10 Let $A \in M_{m,n}(\mathbb{C})$.
(a) Show that $A^*A$ is positive semidefinite (see Exercise 5.4.9).
(b) Show that if $\operatorname{rank} A = n$, then $A^*A$ is positive definite.

5.4.11 Prove that if $A \in M_n(\mathbb{F})$ is positive definite, then there is an upper triangular matrix $X \in M_n(\mathbb{F})$ such that $A = X^*X$. (This is known as a **Cholesky decomposition** of $A$.)
5.4.12 Let V be a finite-dimensional inner product space, and suppose that S, T ∈ L(V) are both self-adjoint and all their eigenvalues are nonnegative. Show that all the eigenvalues of S + T are nonnegative.
5.4.13 Show that if A ∈ Mₙ(ℂ) is positive definite, then ⟨x, y⟩ = y*Ax defines an inner product on ℂⁿ.

5.4.14 Suppose that V is a finite-dimensional complex inner product space. Prove that if T ∈ L(V) is normal, then there is a map S ∈ L(V) such that S² = T.
5.4.15 A circulant matrix is a matrix of the form

    [ a₁     a₂     a₃   ⋯  aₙ
      aₙ     a₁     a₂   ⋯  aₙ₋₁
      aₙ₋₁   aₙ     a₁   ⋯  aₙ₋₂
      ⋮                      ⋮
      a₂     a₃     a₄   ⋯  a₁ ],

where a₁, …, aₙ ∈ ℂ.
(a) Prove that if A and B are n × n circulant matrices, then AB = BA.
(b) Prove that every circulant matrix is normal.

5.4.16 Suppose that C ∈ Mₙ(ℂ) is a circulant matrix (see Exercise 5.4.15) and that F ∈ Mₙ(ℂ) is the DFT matrix (see Exercise 4.5.3). Show that C = FDF* for some diagonal matrix D.

5.4.17 (a) Let A ∈ Mₙ(ℂ) have only real eigenvalues. Show that if there is an orthonormal basis of ℂⁿ consisting of eigenvectors of A, then A is Hermitian.
(b) Let A ∈ Mₙ(ℝ) have only real eigenvalues. Show that if there is an orthonormal basis of ℝⁿ consisting of eigenvectors of A, then A is symmetric.
331
5.4 The Spectral Theorems
5.4.18
Let U € M,(C)
5.4.19
M,(C) such that V*UV is diagonal, with diagonal entries with absolute value 1. If U is assumed to be real and orthogonal, can V be taken to be orthogonal? Let V be a finite-dimensional inner product space and T € £(V). (a) Suppose that F = R. Show that T is self-adjoint if and only if V is the orthogonal direct sum of the eigenspaces of T. (b) Suppose that F = C. Show that T is normal if and only if V is the orthogonal direct sum of the eigenspaces of T.
(c)
be unitary. Show
that there is a unitary
€
Suppose that F = C. Show that T is self-adjoint if and only if all the eigenvalues of T are real and V of the eigenspaces of T.
5.4.20
matrix V
Prove that if A € M,(F)
is the orthogonal direct sum
is Hermitian, then there are an orthonormal
basis (vj,...,V,) ofF” and numbers 4j,...,4n € R such that
n i= Savy}.
jl
5.4.21
Suppose that V L(V)
is a finite-dimensional inner product space and T €
is self-adjoint. Prove that there are subspaces
Uj,...,Um
of V
and numbers 4j,...,4m € R such that V is the orthogonal direct sum of U;,...,Um and
m T=)
> AjPy,. =
5.4.22 5.4.23
Show that if A € M,C) is normal, then eigenvectors of A with distinct
eigenvalues are orthogonal. Let V be a finite-dimensional inner product space, and suppose that T € L(V) is self-adjoint. By reordering if necessary, let (e1,...,€n) be an orthonormal basis of V consisting of eigenvectors of T, such that the corresponding eigenvalues ),...,4n are nonincreasing (i.e.,
Ay 2+ 2 An). (a)
Forj =1,...,n, let Uj =(¢,..., en). Show that, for every v € Uj, we have (Tv, v) < A, ||v|I?.
(b) Forj = 1,...,n, let Vj = (¢1,...,¢j). Show that, for every v € Vj, we have (Tp, v) > aj lull”. (c)
Show that if U is a subspace of V with dim U = n —j + 1, then there exists a v € U such that (Tv, v) > Aj | v|/?.
(d)
Use the above results to show that A=
min
max
dim U=n—j+1
||=1
ucV
veU
(Tp, v).
332
Singular Value Decomposition and the Spectral Theorem
5.4.24
Remark: This is known as the Courant-Fischer min-max principle. Let A € M,(C) be normal, and let 4 be an eigenvalue ofA. Show that, in any spectral decomposition of A, the number of times that A appears
in the diagonal factor is dim ker(A — 41,).
5.4.25
Suppose that A € M,(C) has distinct eigenvalues 41, ..., An. Prove that
Hint: Prove this first for upper triangular matrices, then use Schur decomposition. 5.4.26
Let A € M,(C)
and let ¢ > 0. Show that there is a B € M,,(C) with n
distinct eigenvalues such that ||A — B||p < ¢.
Hint: First consider the case where A
5.4.27
is upper triangular, then use the
Schur decomposition. (a) Prove that if A ¢ M,(C) is upper triangular and normal, then A is diagonal. (b) Use this fact and the Schur decomposition to give another proof of the Spectral Theorem for normal matrices.
Determinants
After spending the last two chapters working exclusively over the fields IR and C, in this chapter we will once again allow the scalar field F to be any field at all.*
6.1 Determinants Multilinear Functions In this section we will meet a matrix invariant which, among other things, encodes invertibility. Of course we already have good algorithms based on Gaussian elimination
to determine
whether
a matrix
is invertible. But,
for some
theoretical
purposes, it can be useful to have a formula, because a formula is often better than an algorithm as a starting point for future analysis. We begin by noting that in the case of 1 x 1 matrices, there’s a simple invariant which indicates whether the matrix is invertible: if D([a]) = a, then the matrix [a] is invertible if and only if D([a]) 4 0.
Quick Exercise #1. Show that D([a]) = a really is an invariant of 1 x 1 matrices.
More generally, we've seen that an n x n matrix A fails to be invertible if and only if there is a linear dependence among the columns. The most obvious way for this to happen is if two columns are actually identical; we take this as motivation for the following definition.
Definition A function f : M,(F) A has two identical columns.
> F is called isoscopic* if f(A) = 0 whenever
“Although as usual, if you prefer to just stick with R and C, go ahead. tIsoscopic is a nonce word used here for convenience of exposition. q= 1-895 =
"31 !_[s]lalIs] = [2] eu yons 0 # s ‘a 3 s si axayy VAY ‘[9] OF se|NWIS SI [7] JIL VO
334
Determinants Of course, we want our function to detect any linear dependence among the columns, not just this very simple type. One way to do this is to require the function to be isoscopic and also to have the following property.
Definition A function D : M,(F) > F is called multilinear aj,...,a, and bj,...,b, € F” and ce F, for each j = 1,...,n,
Dy
=D)
|
|
ja
aj
|
|
|
Jay
aj +bj
an
|
|
|
| an | | +D]
if, for every
|
|
|
] ai
bj
an
|
|
|
and
| Di
lay
| ++.
|
caj
++
|
ap
=cD}
la;
| ---
aj
| ---
an
That is, D(A) is linear when thought of as a function of one of the columns of A, with the other columns held fixed.
Lemma 6.1 Suppose D : My(F) > F is an isoscopic multilinear function. If AéM,(F) is singular, then D(A) = 0.
Proof
If A is singular, then rank A < n, so some column of A
is a linear combina-
tion of the other columns. Suppose that aj = Desi cpap for some scalars {cp}ezj-
Then by multilinearity,
ptay=p|
fa
at
Deke
ayer
an
|
| |
Aj
|
| |
=o
kaj
oD
al
|
osts
|
|
|
Aj-1
Ak
Ajy1
|
|
|
ves
An
|
Since D is isoscopic, each term of this sum is 0, and so D(A) = 0.
A
6.1 Determinants
335
Examples 1.
The function D : M,(F)
>
F given by D(A) = ay,a22-++@yn
is multilinear,
but not isoscopic. However, if D is restricted to the set of upper triangular matrices, then D is isoscopic: if two columns of an upper triangular matrix are identical, then the one farther to the right has a zero in the diagonal
entry. 2.
More generally, pick any i},...,in € {1,...,}. The function
D : M,(F)
>
given by D(A) = aj, 14j,2 «++ @i,n is multilinear (but not isoscopic).
#2. Suppose that
In
the
context
of multilinear
n(E) > F is multilinear,
functions,
there
F
A
A € M,(F), and
is a different
term
which
is
normally used in place of what we have called isoscopic.
Definition A multilinear function D : M,(F) > F
is called alternating if
whenever there are i # j such that aj = aj.
The reason for this terminology is given by the following lemma.
Lemma 6.2 Suppose D : M,(F) — F Given
A € M,(F) and 1 F is an alternating multilinear function, then D(A) = D(Iy) det A for every
Proof
A € M,(F).
Suppose first that D(I;) 4 0, and definef : M,(F) > F by
Thenf is alternating and multilinear, and f{I,) = ae = 1, so by uniqueness, F(A) = det(A). Multiplying through by D(I,) completes the proof. On the other hand, if D(I,) = 0, then D(A) = 0 for all A. Indeed, by multilinearity,
|
DA)=D)
| iar aine,
|
+++
Liar Ginniy
|
n =
n eV
i=l
i=l
1 = 00-11 = {ap ‘0 = 90-30 = [;
|
| aii es ind
Ci
|
rt
Ci
|
;] pep :Buneuayy Aye|wis 440m ULUN}OD puodas au ul AyWeEUI,
pue Ayauabouoy “ ‘29q + taq— —p ptytv == (29-4+ '9)q-—pl@v+ 10) =|? = le
a Hd t+ lo ap :AUeaUIIRIN EH VO
338
Determinants
If any of the indices i, are the same, then
| Jen |
Dy because D
| ein | | =O |
vr
is alternating. If the iz are all distinct, then by swapping columns as
needed, it follows from Lemma 6.2 that
|
Dy
Jen
«+
|
i
e,]
| =+D0,) =0.
A
|
The following important property of the determinant is our first impressive consequence of the uniqueness stated in Theorem 6.3.
Theorem 6.5 If A,B € M,(IF), then det(AB) = (det A)(det B). Proof
Fora
given A € M,(IF), we define a function Da : M,(F) >
F by
Da(B) = det(AB). We
claim
that Da
is an
alternating
Corollary 6.4 will imply that, for every
multilinear
function.
Once
this is proved,
B € My(F),
det(AB) = Da(B) = Da(I,) det B = (det A)(det B).
| WriteB=
|b;
| ---
|
b,
}, so that
| | Da(B) = det(AB) = det]
Ab;
| --.
Ab,
|
|
The linearity of matrix multiplication and the multilinearity of det imply that Da is multilinear. Moreover,
if bj = bj for i ¢ j, then Abj
=
Ab; and so Da(B)
=
det(AB) = 0 because the determinant itself is alternating. Thus Da is alternating. Since A € M,,(F) was arbitrary, we now know that, for every A,B € M,(F),
det(AB) = Da(B) = (det A)(det B). A
A
first consequence of Theorem 6.5 is that the determinant is a good detector of
invertibility, since it returns zero if and only if a matrix is singular. Corollary 6.6 A matrix
det A~! = (det A)~!.
A € M,(E)
is invertible iff detA # 0. In that case,
6.1 Determinants
339
Proof If A is singular, then det A = 0 by Lemma 6.1. If A is invertible, then by Theorem 6.5,
1 = detI, = det (AA~') = (detA) (detA~'), which implies that det A 4 0 and det A~! = (det A)~!.
A
Theorem 6.5 also implies that the determinant is a matrix invariant.
Corollary 6.7 If A,B € M,(F) are similar, then detA = detB.
Proof If B= SAS~', then by Theorem 6.5, det B = (det $)(det A)(det S~'). By Corollary 6.6, det S~! = (det S)~!, and so detB = detA.
A
In particular, we can define the notion of the determinant of a linear map.
Definition Let V be finite-dimensional. The determinant det T = det [T].3, where B is any basis of V.
of T
€
L(V)
is
Quick Exercise #4. Suppose that V is a finite-dimensional vector space. What is det(cl)?
Existence and Uniqueness of the Determinant We conclude this section with the proof of Theorem piece of notation.
Definition
Let
A 2. For each
6.3. We will need the following
1 < i,j
F such that D,(I,) = 1 as a determinant function on M,(F) (so that Theorem 6.3 says that there exists a unique determinant function on M,(IF)).
= (“P)ep = (Pap os ‘A up
= u aay
“Pp si_A JO siseq Aue 0} Padsas YM
JP JO xUeW aL
+
go
340
Determinants
Lemma 6.8 For each n, there exists a determinant function on M,(E). Proof We will prove this by induction on n. First of all, it is trivial to see that D; defined by D,([a]) = a works. Now suppose that n > 1 and we already have a determinant function D,_; on Mn—
(IF). For any fixed i we define D, : Mn(F) >
F by
(6.2)
Dy(A) = Y(-1)'FagDn—1(Ay). Jel We claim that D, is a determinant function on M,(F), and that D,(I,) = 1. Write A=] a;
---
|
a, |, fix 1
F
by bye
d(B) = Dy,
Dana
aj
:
“
:
:
brat
ee
On-an—1
n-1g
0
vee
oO
Ayj
:
(6.3)
Then dj; is an alternating multilinear function on My—;(F) because Dy is alternating and multilinear. By the same argument as in the proof of Corollary 6.4, it now follows from the induction hypothesis that dj(B) = dj(In—1)Dn—1(B) for every B € My-1(F).
By multilinearity,
dj(n—1) = Dy
nl 4yjDn(In) + > ayjDn| ft
| {er |
s+
lI en-1 lod
Since D,, is alternating, all the terms in the latter sum are zero, so
(6.4)
dj(In—1) = angDn(In) = anj since Dy, is a determinant function on M,(F). This means that
bn
Dn
ees
se
bn-a1 0
ves sat
Dnt Dn-tn-1 0
aj
:
An-ay nj
= 4njDy1(B);
i.e., Dn is uniquely determined for matrices of this special form. Finally, recall (see Lemma 6.1) that if A does not have full rank, then D,(A) = 0, and so D,,(A) is uniquely determined. If A does have full rank, then there is at least
one column, say the jth, with the last entry nonzero. It then follows from the fact that D, is alternating and multilinear that
6.1 Determinants
343
Dy(A) = Dn] | ar
--+
aj
oe
an
| =D,
(a - ata)
| | =—-D)
(a - dla)
| where the sign change in the last equality results from switching the jth and nth
columns as in Lemma 6.2 (and does not take place if j = n). This final expression for D,(A) above is D,(A), where A has zeroes in all but the last entry of the bottom
row
as in (6.3), and
so D,
has already been
shown
to be uniquely
determined.
A
It’s worth noticing that in equation (6.2) in the proof of Lemma 6.8, we built an alternating multilinear function on M,(IF) from one on My;
(FF). Now that we've
finished proving the uniqueness of the determinant, equation (6.2) gives a formula for the determinant of an n x n matrix in terms of determinants of (n— 1) x (n—1) matrices:
a det A = )“(-1)'aj det(Ay)
(6.5)
j=l
for each i.
Examples 1.
Formula (6.5) can be used to derive formula (6.1) for the determinant of a 2 x 2 matrix: using i= 1,
det :
Quick Exercise #5. instead.
2.
= adet | ] - baet [| = ad — be.
Check that you'd
get the same result here using i
Once we know how to compute determinants of 2 x 2 matrices, formula (6.5)
can be used to compute determinants of 3 x 3 matrices:
29g — po = [o] pp + [a] apo— = [? 4 PP ‘s#vd
Determinants
det
3
-1
4
3
°.
:
5
oul
%
-9
Penal
-1
-9
2
fee
-1
5
’
= 3(5 - 5—(—9}(—6)) —FA)((-1)5—(—9)2)+-4((—1)(—6)-5 - 2) = 3 -(—29) —(—1)- 13 + 4(—4)
= —90. Of course there’s no need to stop with 3 x 3 matrices, but we'll wait until the next section to tackle computing determinants of larger matrices. A
Q—
KEY IDEAS
e
A function on matrices is multilinear if it is a linear function of each column (keeping the other columns fixed). A multilinear function is alternating if it gives zero whenever the argument has two identical columns. e The determinant is the unique alternating multilinear function on matrices taking the value 1 at the identity. e det(A) = 0 if and only if A is singular (i-e., not invertible).
e det(AB) = det(A) det(B). e
The determinant is a matrix invariant.
EXERCISES 6.1.1
Compute the determinant of each of the following matrices (using only the techniques of this section).
. {a
(b) [ver 3)
[334 ‘]
. a i J5-i
0 ()
2-1
|] 3 0 aa
4 a
fl a il (@ ]1 2 2 li 29 6.1.2
Compute the determinant of each of the following matrices (using only the techniques of this section).
[3 (a) | 4
2-
[1
oo
-2
}-3 0
2
(@)
6.1.3
(b)
31 i 2+i 1-i —2i
1
(|
Ss4
-5 6 aa
0 4 -1 Suppose that D : M,(F) > F is an alternating multilinear function such that D(I,) = 7. Compute each of the following.
(
=
344
o([} d) (> 2)
6.1 Determinants
Suppose
345
that D
: M3(F)
—> F
is an alternating multilinear function
such that D(I;) = 6. Compute each of the following.
ig (a) DJ |}0 4 0 0 6.1.5
(a) (b)
6.1.6
Suppose What is Suppose What is
o 5 6
3.24 (b) DJ }2 1 2 201
that D : M)(F) D(A+ B)? that D : M3(F) D(A + B)?
is multilinear, and A,B
€ M2(F).
> F
is multilinear, and A,B
€ M3(F).
Show that
201 0
|
-1
I:
2
5.6
6.1.7
> F
and
5 1
= 1|
2 ]
—2
o
0
(=
1
are not similar. Suppose that U is a proper subspace of a finite-dimensional inner product space V, and let Py € £(V) denote the orthogonal projection onto U. What is det Py?
Let R : R?
+ R? denote reflection across the plane x + 2y + 3z = 0.
What is det R? Suppose that dimV
=
n and
T € L(V)
has n
distinct eigenvalues
A1,..-,An. Prove that CETUS 6.1.10
Suppose
that
A €
M,(R)
contro
factors as
A =
BC,
where
B
Ce€M,,,,(R), and m < n. Show that det A = 0.
6.1.11 6.1.12
6.1.13
Show that if A € M,,(C) is Hermitian, then detA € R.
Use formula (6.5) to find a general formula for
a det | a2;
a2 a2
413 a3
431
432
433
Show that the function
n F(A) = I] (> lan — wi) i#j \k=1 is an isoscopic function on M,(R) which is not multilinear.
6.1.14
Show that the function
g(A) = I (= w) j=
\i=1
€ Mnm(R),
346
Determinants
6.2 Computing Determinants Basic Properties We first recall the recursive formula for the determinant established in the previous section.
Proposition 6.9 (Laplace expansion along a row) [f A € M,(F), then for any i =
theses
n det A = )°(-1)'Way det(Ay). j=l
Laplace expansion is particularly useful for computing determinants of matrices that are sparse, i.e., which have lots of entries equal to 0.
Example
det}
—2
0
1
Oo
-1
0
1
0
0
0
-3
iL
0
2-2
4
-2
0
-4
4
2
0
0
100
O | =—(—3)det
1
02-5
4
201
-5
01
4
1
=-3det}
—2 1
1
4
2
—-5
4
0
1
~-o(osaft 4] eso9f-* 1)
6.2 Computing Determinants
347
= —3(4{—13) — 5) = 171.
Quick Exercise #6. example above?
The
Laplace
a
Which rows were used in the Laplace expansions in the
expansion
gives
a simple
formula
for the
determinant
of an
upper triangular matrix, which generalizes the observation we made earlier about diagonal matrices.
Corollary 6.10 If A € M, (I) is upper triangular, then
det A = 411 --- Gun
Proof
We will prove this by induction on n. The statement is trivial if n = 1.
Suppose that n > 2 and the theorem
is known
to be true for (n — 1) x (n — 1)
matrices. Notice that if A is upper triangular, aj, = 0 for i < n—1, upper triangular with diagonal
entries a),...,@n—1,n-1. Then
and Ann is
by Proposition
6.9
with i=n,
n det A = Levan j=l
det Ain = (—1)"*"ayn det Ann = 11... Qn—1,n—14nnA
The following convenient symmetry of the determinant can also be proved via
the Laplace expansion.
Theorem 6.11 If A < M,(F), then detA' = detA. Proof
We will prove by induction on n that the function f(A) = detAT is a deter-
minant function on M,(F), which
by the uniqueness of the determinant implies
that f(A) = detA. Slightly unusually, this inductive proof needs two base steps (the reason for this will become clear in the inductive step of the proof). If A € M, (IF), then AT=A,
so detA = detA'. If A € M)(F), then we can show that detA’ = detA using equation (6.1).
XINEW € X € AY} JO MOd Pull PUR 'XUTEW FX F AU} JO MOI PUODAS "KUEW G x G BY JO MOI PAIL 29H VO
348
Determinants
Quick Exercise #7. Show that det A = detA! for A € M2(F). Now suppose that n > 3, and that we already know that detB' every B € My-1(F). By Proposition 6.9, for any A
= detB for
€ M,(F) and any i=
1,...,n,
(6.6)
FIA) = SY(-1)'aj deta), j=l where
(A);
means the (n — 1) x (n — 1) matrix obtained by removing the ith
row and jth column from A‘. Now, for each k = expansion along the kth row, we see that
| SF]
|
jar
cs)
=
=
I
agtbe
|
s+)
an
|
I
Ev
(aj + by) deta) y
ED
n aj deta gy + 0(- DF bj, det(A) yy
j=l
j=l
[| =
1,...,n, by using the Laplace
|
arp
osss
|
an||+f]
|
far
---
be
| =
an],
since (Dg is unaffected by changing the kth column of A. Thereforef is additive in each column. Homogeneity follows similarly, and so f is a multilinear function. (Note that we haven't yet used the induction hypothesis, that the theorem is true for (n — 1) x (n — 1) matrices.) Now suppose that for some k ¢ @, ay = ag, and choose i ¢ {k, €} (this is why we need n > 3 for this part of the proof). Then by equation (6.6) and the induction hypothesis,
n
n
FIA) = S°(-1)"
aj det(Aji)" = Y(-1)"
j=l
aj det Aj.
j=l
Since i ¢ {k,€}, Aji has two columns which are equal, and so detAjj = 0. Therefore
F(A) = 0 in this situation, and so f is alternating. Finally f(I,)
=
det t
=
detI,
=
1,
and
so f
is indeed
a
determinant
function.
A
Theorem 6.11 implies in particular that the determinant multilinear function of the rows of a matrix. Pq
is an alternating
=a—-ppo=2.90—-pp=\?
2?
.
6.2 Computing Determinants
349
Corollary 6.12 (Laplace expansion along a column) If A € M,([F), then for any
8 detA = )(—1)ay det{Ay). Proof This follows by using Proposition 6.9 to express det A', which by Theorem 6.11 is equal to detA. A
Corollary 6.13 If A € My(C), then detA* = detA. If V is a finite-dimensional inner product space and T € L(V), then det T* = det T.
Proof
Let A denote the matrix with entries Gx, i.e., the entry-wise complex con-
jugate of A. Since A* = RB, by Theorem 6.11 we need to show that detA = det A.
This is trivial for 1 x 1 matrices, and then follows easily by induction on n using Proposition 6.9. The statement for operators now follows from the definition of the determinant of a linear operator and Theorem 5.13. A
Determinants and Row Operations Now that we know that the determinant is an alternating multilinear function of the rows of a matrix, we can perform row operations on an arbitrary matrix to bring it into upper triangular form (in fact, we need only operations R1 and R3 to do this). The effect of those row operations on the determinant of the matrix
is directly determined by the alternating and multilinear properties. Viewing the
row operations as multiplication by elementary matrices is a useful bookkeeping device in this context; recall that the following matrices are those that correspond to operations R1 and R3:
1
Po,ig =
0
(0)
0
;c
, »
9
o
GAD
0
0
1
1
0
Ry
1
Aj)
350
Determinants
Lemma 6.14 Let 1 < i,j < n with i Aj and c€ F, and let P.ij and Rij be the elementary n x n matrices above. Then:
Proof
The matrices P,,jj are upper triangular, so by Theorem 6.10, det P,ij is the
product of the diagonal entries of P,,i;, which are all 1.
The matrix Rj; can be obtained by exchanging the ith and jth columns of I, so by Lemma
6.2, det Rj; = — detI, = —1.
A
This lemma lets us keep track of the effect of row operations on the determinant of a matrix, which leads to the following algorithm for computing determinants. For large matrices, this gives a much more efficient way to compute most determinants than Laplace expansion, or the sum over permutations formula we will see later in this section.
Algorithm 6.15 To compute the determinant of A < M,(F): e
convert A into an upper triangular matrix B via row operations R1 and
R3 only, e let k be the number of times two rows are switched,
© detA = (-1)*bn +++ bin. Proof
If B is the result of performing row operations R1 and R3 starting from A,
then by Theorem
2.21,
B=E,---EyA for some elementary matrices E;,...,Ep of type Pc,ij or Rij. By Theorems 6.10 and
6.5,
det B = by; +++ ban = (detE;)- + - (det E,) detA, and by Lemma
6.14,
(detE,) + -(detE,) = (—1)*.
A
Example
3 det|—2 1
0 1 1 01 -2 1]=-det]-2 -2 1 01 3.01 1 =-det}O
0 -2
1 3
6.2 Computing Determinants
351
1
0
1
=-—det]}0
-2
3
0
0
-2
= —(1)(—2)(—2) = —4.
A
Quick Exercise #8. What were the row operations used in the example above?
Permutations In this section we will develop an alternative formula for the determinant. We first need to introduce the concept of a permutation.
Definition A permutation of {1,...,n} is a bijective function
ot{heyn)> (een Equivalently, (o(1),...,o(n)) is a list of all the numbers in {1,...,} in some order. The set of all permutations of {1,...,”} is denoted S, and is called the symmetric group on n letters. The identity permutation is denoted 1.
The symmetric
group
S, has a natural representation
matrices: to a permutation
o
as a group
€ S, we associate a permutation
of n x n
matrix A,
with
entries
0
otherwise.
.
to
the
permutation
of {1,2,3,4}
which
o-oo
coo
Example The matrix corresponding exchanges 1 with 2 and 3 with 4 is
ifol)=j,
coo +ooo
aj =
1
0} paau },UOp am yng ‘wWYyWOHje UOReUIWIa UeISsNeEd ay} MO}|O} Aj}IU}S },UPIP BM ARON
"MOd PJIYy) BY O} MOL
JS1l} ay) SAWN I— Ppe Mos Puodas ay} 0} MOI JS4y aYI SAWN Z PPE ‘SMO PAIL PU ISI aUI dems :BH WO
Determinants of {1,2,3,4} which
fixes 2 and
cocoon o-oo
oo-0
The matrix corresponding to the permutation cyclicly permutes 1 > 3 > 4—> lis +ooo
352
Quick Exercise #9. Show that
Ac
That is, A, is exactly the matrix of the linear map on F” which permutes the coordinates of a vector according to o. This has the important consequence that
Aojoo, = Ao, Ao:
(6.7)
Every permutation has an associated sign, which is a kind of parity. While the sign can be defined in terms of the permutation itself, it is most easily defined in terms of the representation of the permutation as a matrix, as follows.
Definition
Let o € S,, with permutation matrix A,. The sign of o is sgn(o) := det Ag.
Lemma 6.16 Let o, p € S,. Then
e sgn(o) € {+1}, © sgn(o o p) = sgn(a) sgn(p), e sgn(o) = sgn(o~'). Proof It follows from equation (6.7) that Az! = A,-1, so that Ag is in fact invertible. The RREF of A, is therefore I,; moreover, it is clear that when row-
reducing A,, only row operation R3 is used. It thus follows from Algorithm 6.15 that sgn(o) = det A, = (— 1)*, where k is the number of times R3 is applied. The fact that sgn(o o p) = sgn(o)sgn(p) is immediate from equation (6.7) and the fact that det(AB) = (det A)(det B). In particular, this implies that
sgn(o) sgn(o~!) = sgn(oo~!) = sgn(v) = detI, = 1.
A
6.2 Computing Determinants
353
We are now in a position to state the classical “sum over permutations” formula for the determinant.
Theorem 6.17 For A < M,(F), detA = >> (sgn o)a1,o(1) -- + @no(n) oeSy
= LV bgnoagn1 --- doom oESy
Proof
By multilinearity,
|
det(A) = det | | SF _y aie
n
=F. i=l
++
Enon
Sia: sdindet}
i=l
(6.8)
|
|
fe,
--- ei,
|
|
Now, if any two of the indices i),...,in are equal, then
because the determinant is alternating. So the sum in (6.8) can be taken to be the sum over sets of distinct indices {i),..., in}, i-e., the sum over permutations
> oESy
| Go(iji**Go(njn det | | ota) |
+++
| Cain) | | -
(6.9)
|
Observe that the (i,j) entry of the matrix
| Co(1)
|
| 7"
Ca(n)
|
is 1 if i = o(j) and O otherwise. That is, this matrix is exactly A,-1, and so its determinant is sgn(o~!) = sgn(o). This completes the proof of the second version of the formula. Alternatively, observe that Go(iyi
++ Go(nn = Mo-!(1) °° * Ano
*(n)3
354
Determinants
all of the same factors appear, just in a different order. We can therefore rewrite
expression (6.9) as >
yg -1(1) +
Gao 1(n) SBI").
oESn Making the change of variables p = o~! gives the first version of the formula. A
Q—m
KEY IDEAS
e Laplace expansion:
"
n
det(A) = )0(-1)"ay det(Ay)
= >(-1)'May deta).
j=l
isl
© det(A’) = det(A) and det(A*) = det{A). e To compute A via row operations: row-reduce A to an upper triangular matrix B. Then
detA = (—1)kep!
cebu Dany
where k is the number of times R3 is used and the ¢; are the constants used in e
R2. A permutation
is a bijection from
{1,...,n}
to itself. Permutations
can be
encoded as matrices. e The sum over permutations formula:
det A = SP (sgn o)aio(1) + Aoi) = oESy
EXERCISES
(68M O)A5(1),1 +++ Gotan TES
355
6.2 Computing Determinants
Gr2e2)
Find the determinant of each of the following matrices.
(a)
2-7 1 8 2a 8
=1 (b)pb) | 72
hoon
a
@
[3
dj2 ee 1 4 a ° 3) in 2 21 " C 6.2.3
(a) (b)
4°41 -1 -2
Ml
1 -2 3 0 a Am |e Oo =1 2 3
(ci 0
0
0
0 2 0 001 io 0 1 0-2 0 10
3
~2
-10 0 020 4 0 1 0 0 -1 -12 0
Show that if U € M,,(C) is unitary, then |detU| = 1. Show that if o1,...,0» are the singular values of A € M,(C), then
|detA] = 0) --- op. 6.2.4
Let
A € M,(F).
(a)
Show Uy
(b)
that if
A =
LU
is an LU
decomposition,
then
detA
=
+ + Unn-
Suppose
that PA
= LU is an LUP
the permutation matrix A,
decomposition,
and that P is
for some o € S,. Show
that detA =
(sgno)un --- Un. 6.2.5
Show that if
A =
LDU is an LDU decomposition
of A € M,,(F) (see
Exercise 2.4.17), then det A = dyy-++ dyn. 6.2.6
Show
that if A =
QR
is a QR decomposition
of
A € M,(C),
then
|detAJ = [ri +++ tan. 6.2.7
Show that if A € M,(C), then |detA| < []}_; jal]. (This is called Hadamard’s inequality.) Hint: Use Exercise 4.5.21.
6.2.8
Define A € M,(R)
by aj = min{i,j}. Show that det A = 1.
Hint: Proceed by induction. For the inductive step, use row and column operations. 6.2.9
Suppose that V is a finite-dimensional complex vector space, and T € £(V) has only real eigenvalues. Prove that det T € R.
6.2.10
Suppose that
A € Mm+.,(F) has the form
for some B €
Mm (F),
det B det D.
pd
D € My(F), and C € Mm,.n(F). Show that det A =
356
Determinants
Hint: First show that f(B) = det [:
Cir p| ==
function on M;,(F). Then show that g(D) =
6.2.11
a an alternating multilinear
EB
5] is an alternating
multilinear function of the rows of D. Suppose that A € M,(C) is a normal matrix. Give an expression
for
detA in terms of the eigenvalues of A. (Remember that the eigenvalues
may not be distinct.) 6.2.12
Let V be a real vector space with basis (v;,...,v,), and define
T € £(V)
by setting
k T=
)
in = ¥, +202 + +--+ kop
i=1
for each k = 1,.
,n and extending by linearity.
(a)
Find det T.
(b)
Use this to show that (01,1 + 2v2,...,¥1 + 202 + +--+ vy) isa basis of V.
6.2.13
Fix A € M,,(R) and for t € R let f(t) = det Un + tA).
6.2.14
Show that f’(0) = trA. Hint: Either proceed from the sum over permutations formula, or consider the case that A is upper triangular first, then show that the general case follows. The permanent of a matrix A € M,,([F) is defined by
perA := » ot) *** Ana(n)oESn That is, it is the same as the determinant but without the factors of sgn(o) in Theorem 6.17. (a) Show that the permanent is a multilinear function, but that it is not alternating.
(b)
6.2.15
Give an example of matrices A,B € M2(F) such that per(AB) 4 (per A)(per B). Show that a matrix A € M,(F) is a permutation matrix if and only if: e e e
6.2.16
each row of A contains exactly one entry equal to 1, each column of A contains exactly one entry equal to 1, all the other entries ofA are 0.
Let o € S,, for n > 2. Use the representation of permutations as permu-
tation matrices together with linear algebra to prove the classical fact
6.3 Characteristic Polynomials
357
that there are transpositions (i.e., permutations which move only two
elements) t),...,T% € Sn such that O=T0+:-0TR 6.2.17
Suppose that t),...,t, and t,...,7 € S» are transpositions (permutations which move only two elements), and that T1O0++-OTR=T0---0%.
6.2.18
Prove that k and ¢ are either both even or both odd. Let x,...,4%, € F, and define
Well,
1
x
x
+e.
xt
1
ox; 2
x} 5
---
xf » | €Mai@);
Lote
Ro
a
ie, vj = can Prove that det V =
I]
(qj — xi).
O 2, finding roots of polynomials gets a whole lot harder. For degrees 3 and 4 there are a cubic formula and a quartic formula which are analogous to the familiar quadratic formula, but they're drastically more complicated.” When
n > 5, no such formula exists.*
In fact, some computer algorithms for finding the roots of polynomials actually use Proposition 6.18 the other way around: to find the roots of a polynomial p(x), they first construct a matrix A whose characteristic polynomial is p(x) (one way to do this is given in Exercise 6.3.18) and then compute the eigenvalues of A by other means. Nevertheless, the characteristic polynomial is handy for finding the eigenvalues of 2 x 2 matrices (which come up more in linear algebra homework than in realworld problems), and is an important theoretical tool.
“They're much too messy to include here ~ try googling “cubic formula” and “quartic formula.” t It’s not just that no one has been able to find a formula so far - it was actually proved by the Norwegian mathematician Niels Henrik Abel that there isn’t one. (Vd = ("pr — yyrp = (, SIF — v)spap = (“Ir — ,_Svspap = (2) !-S¥Sd :o1# vd
360
Determinants
Tx+
where T For the remainder of the section, it will be helpful to be working over an algebraically closed field. The following result (which we state without proof) often justifies restricting to that setting.
Proposition 6.21 If F is any field, then there is an algebraically closed field K such that F C K, and such that the operations on F are the same as those on K.
For example, the field R of real numbers is not algebraically closed, but it is contained in the algebraically closed field C of complex numbers. Saying that the operations are the same means, for example, that R sits inside C because + and · mean the same thing in both places, but F₂ does not sit inside C, since + and · mean different things in those two fields. Proposition 6.21 says that there is an algebraically closed field which contains F₂, with the same operations, but it is more complicated to describe.

The relevance of Proposition 6.21 in this section is that if A ∈ Mₙ(F) and F is not algebraically closed, then the characteristic polynomial p_A(x) may not factor into linear factors.
Example The characteristic polynomial of the matrix

A = [ 0  −1
      1   0 ]

is p_A(x) = x² + 1, which does not factor over R. This corresponds to the fact, which we have seen before, that the matrix A has no eigenvalues or eigenvectors over R. On the other hand, if p_A(x) = x² + 1 is viewed as a polynomial over C, then it does factor:

p_A(x) = (x + i)(x − i).

This means that over C, the matrix A does have eigenvalues: i and −i. ▲
Multiplicities of Eigenvalues

Over an algebraically closed field, the characteristic polynomial of every matrix factors completely, which allows us to make the following definition.
6.3 Characteristic Polynomials
Definition Let A ∈ Mₙ(F), where F is an algebraically closed field, and suppose that λ is an eigenvalue of A. Then the characteristic polynomial p_A(x) of A factors as

p_A(x) = (−1)ⁿ(x − λ)ᵏ(x − c₁)^{k₁} ··· (x − c_m)^{k_m},

where the c_i are distinct and different from λ. The power k to which the factor (x − λ) appears in p_A(x) is called the multiplicity of λ as an eigenvalue of A.
If p(x) has degree n and F is algebraically closed, then although p(x) may have fewer than n distinct roots, it has exactly n roots if we count each root with multiplicity. For example, the polynomial

p(x) = x⁶ − 5x⁵ + 6x⁴ + 4x³ − 8x² = x²(x + 1)(x − 2)³

has the six roots −1, 0, 0, 2, 2, and 2, counted with multiplicity.
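The multiplicity bookkeeping in this example can be verified mechanically by expanding the factors back into coefficients. A small sketch (the helper names are ours):

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def from_roots(roots):
    """Expand prod (x - r) over a list of roots counted with multiplicity."""
    p = [1]
    for r in roots:
        p = poly_mul(p, [-r, 1])
    return p

# x^2 (x + 1)(x - 2)^3 has roots -1, 0, 0, 2, 2, 2 counted with multiplicity:
print(from_roots([-1, 0, 0, 2, 2, 2]))  # -> [0, 0, -8, 4, 6, -5, 1]
```

The output lists the coefficients of x⁶ − 5x⁵ + 6x⁴ + 4x³ − 8x² from the constant term up, matching the expansion above.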
The following result summarizes our observations.
Proposition 6.22 If F is algebraically closed, every A ∈ Mₙ(F) has n eigenvalues, counted with multiplicity. If λ₁, ..., λ_m are the distinct eigenvalues of A, with respective multiplicities k₁, ..., k_m, then

p_A(x) = (−1)ⁿ ∏_{j=1}^m (x − λ_j)^{k_j}.

Since the characteristic polynomial of A is an invariant, it follows that not only the eigenvalues of a matrix but also their multiplicities are invariants. In the upper triangular case, the multiplicities of the eigenvalues are easy to see:

Lemma 6.23 Suppose that A ∈ Mₙ(F) is upper triangular and that λ is an eigenvalue of A. Then the multiplicity of λ as an eigenvalue of A is the number of times λ appears on the diagonal of A.

Quick Exercise #12. Prove Lemma 6.23.
Counting eigenvalues of a matrix with multiplicity gives us the following nice expressions for the trace and determinant in terms of the eigenvalues.
QA #12: The matrix A − xIₙ is triangular, so p_A(x) = (a₁₁ − x) ··· (aₙₙ − x) = (−1)ⁿ(x − a₁₁) ··· (x − aₙₙ).
Corollary 6.24 Suppose that F is algebraically closed, and that λ₁, ..., λₙ are the eigenvalues of A ∈ Mₙ(F), counted with multiplicity. Then

tr(A) = λ₁ + ··· + λₙ  and  det(A) = λ₁ ··· λₙ.

Proof This follows immediately from Lemma 6.23 if A is upper triangular. Otherwise, it follows from the fact that A is similar to an upper triangular matrix (see Theorem 3.67) and the fact that the trace, the determinant, and the list of all eigenvalues with multiplicity are invariants. ▲
The following result is a good example of the value of viewing a field F as lying in a larger, algebraically closed field: even though the conclusion is true over any field, the proof proceeds by working in the larger field.
Proposition 6.25 If A ∈ Mₙ(F), then the coefficient of xⁿ⁻¹ in p_A(x) is (−1)ⁿ⁻¹ tr(A) and the constant term in p_A(x) is det A.

Proof First suppose that F is algebraically closed and that λ₁, ..., λₙ are the eigenvalues of A, counted with multiplicity. Then

p_A(x) = (λ₁ − x) ··· (λₙ − x).

Expanding the product gives that the terms of order (n − 1) are

λ₁(−x)ⁿ⁻¹ + ··· + λₙ(−x)ⁿ⁻¹ = (−1)ⁿ⁻¹ tr(A) xⁿ⁻¹,

and that the constant term is λ₁ ··· λₙ = det A.

If F is not algebraically closed, we may view A as a matrix over K, where K is an algebraically closed field containing F. Then the above argument applies to show that, when viewed as a polynomial over K, the coefficient of xⁿ⁻¹ in p_A(x) is (−1)ⁿ⁻¹ tr(A) and the constant term is det A. But in fact it doesn't matter how we view p_A; we have identified the order n − 1 and order 0 terms of p_A(x), and we are done. (As a sanity check, note that even though we are using expressions for tr(A) and det(A) in terms of the eigenvalues of A, which may not lie in F, the trace and determinant themselves are indeed elements of F, by definition.) ▲
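Both coefficient formulas in Proposition 6.25 can be spot-checked numerically by expanding ∏(λᵢ − x) from a list of eigenvalues. A sketch with our own helper name:

```python
def expand(eigs):
    """Coefficients, lowest degree first, of p(x) = prod_i (eigs[i] - x)."""
    p = [1]
    for lam in eigs:
        q = [0] * (len(p) + 1)
        for i, a in enumerate(p):
            q[i] += lam * a     # the (lam) part of the factor
            q[i + 1] -= a       # the (-x) part of the factor
        p = q
    return p

eigs = [1, 2, 3]                # eigenvalues counted with multiplicity, n = 3
coeffs = expand(eigs)           # (1 - x)(2 - x)(3 - x)
n = len(eigs)
assert coeffs[n - 1] == (-1) ** (n - 1) * sum(eigs)  # x^{n-1} coefficient
assert coeffs[0] == 1 * 2 * 3                        # constant term = det A
print(coeffs)  # -> [6, -11, 6, -1]
```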
The Cayley-Hamilton Theorem

We conclude this section with the following famous theorem.

Theorem 6.26 (The Cayley-Hamilton Theorem) If A ∈ Mₙ(F), then p_A(A) = 0.
Before proving Theorem 6.26, let's consider an appealing but wrong way to try to prove it: since p_A(x) = det(A − xIₙ), substituting x = A gives

p_A(A) = det(A − AIₙ) = det 0 = 0.   (6.10)

The typographical conventions used in this book make it a bit easier to see that there's something fishy here: p_A(A) should be a matrix, but det 0 is the scalar 0! In fact the error creeps in already in the first equality in equation (6.10). Since the things on opposite sides of the equals sign are of different types (a matrix on the left, a scalar on the right), they can't possibly be equal.

Proof of Theorem 6.26 First of all, we will assume that F is algebraically closed. If not, then by Proposition 6.21 we could instead work over an algebraically closed field containing F, as in the proof of Proposition 6.25. Now since F is algebraically closed, by Theorem 3.67 there exist an invertible matrix S ∈ Mₙ(F) and an upper triangular matrix T ∈ Mₙ(F) such that A = STS⁻¹. Then for any polynomial p(x), p(A) = S p(T) S⁻¹.
Quick Exercise #13. Prove the preceding statement.
Furthermore, p_A(x) = p_T(x) since the characteristic polynomial is an invariant. Therefore

p_A(A) = S p_A(T) S⁻¹ = S p_T(T) S⁻¹,

so we need to prove that p_T(T) = 0 for an upper triangular matrix T.
Now, by Lemma 6.23,

p_T(x) = (t₁₁ − x) ··· (tₙₙ − x),

so

p_T(T) = (t₁₁Iₙ − T) ··· (tₙₙIₙ − T).

Note that the jth column of t_jjIₙ − T can only have nonzero entries in the first (j − 1) positions; in particular, the first column of t₁₁Iₙ − T is 0. Thus

(t_jjIₙ − T)e_j ∈ ⟨e₁, ..., e_{j−1}⟩

for j ≥ 2, and (t₁₁Iₙ − T)e₁ = 0. This implies that

(t₁₁Iₙ − T) ··· (t_jjIₙ − T)e_j = 0,

and so

p_T(T)e_j = (t_{j+1,j+1}Iₙ − T) ··· (tₙₙIₙ − T)(t₁₁Iₙ − T) ··· (t_jjIₙ − T)e_j = 0.

Therefore each column of p_T(T) is 0. ▲
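For a 2 × 2 matrix, p_A(A) = A² − tr(A)A + det(A)I₂ can be evaluated directly, giving a concrete check of the theorem. A pure-Python sketch with our own helper names:

```python
def mat_mul_2x2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def cayley_hamilton_2x2(A):
    """Evaluate p_A(A) = A^2 - tr(A) A + det(A) I for a 2x2 matrix A;
    the Cayley-Hamilton Theorem says the result is the zero matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mat_mul_2x2(A, A)
    return [[A2[i][j] - tr * A[i][j] + det * (1 if i == j else 0)
             for j in range(2)] for i in range(2)]

print(cayley_hamilton_2x2([[1, 2], [3, 4]]))  # -> [[0, 0], [0, 0]]
```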
QA #13: For each k ≥ 1, Aᵏ = (STS⁻¹) ··· (STS⁻¹) = STᵏS⁻¹; so if p(x) = c₀ + c₁x + ··· + c_kxᵏ, then p(A) = c₀Iₙ + c₁STS⁻¹ + ··· + c_kSTᵏS⁻¹ = S p(T) S⁻¹.
KEY IDEAS

• The characteristic polynomial of A is p_A(λ) = det(A − λIₙ). Its roots are the eigenvalues of A.
• The multiplicity of an eigenvalue of A is its multiplicity as a root of p_A. For an upper triangular matrix, this is the number of times the eigenvalue appears on the diagonal.
• The Cayley-Hamilton Theorem: p_A(A) = 0.
EXERCISES

6.3.1 For each of the following matrices, find the characteristic polynomial and all the eigenvalues and eigenvectors.
(a)-(d) [matrices illegible in the scan]

6.3.2 For each of the following matrices, find the characteristic polynomial and all the eigenvalues and eigenvectors.

(a)-(d) [matrices illegible in the scan]

6.3.3 Let A = [ 1  1
                0  1 ].
Show that A and I₂ have the same trace, determinant, and characteristic polynomial. Show that A and I₂ are not similar.
6.3.4 Prove that the matrices

[ 1  −5,387,621.4      and   [ 1  0
  0   2 ]                      0  2 ]

are similar. Prove that the matrices

[ 1.000000001  0       and   [ 1  0
  0            2 ]             0  2 ]

are not similar.
6.3.5 Suppose that the matrix A ∈ M₃(R) has tr A = −1 and det A = 6. If λ = 2 is an eigenvalue of A, what are the other two eigenvalues?

6.3.6 Suppose that the matrix A ∈ M₃(R) has tr A = −1 and det A = −6. If λ = 2 is an eigenvalue of A, what are the other two eigenvalues?
6.3.7 Let A = [matrix illegible in the scan].
(a) Find the characteristic polynomial p_A(x).
(b) Use the Cayley-Hamilton Theorem to find A² without ever multiplying two matrices together.
6.3.8 Let A = [matrix illegible in the scan].
(a) Find the characteristic polynomial p_A(x).
(b) Use the Cayley-Hamilton Theorem to find A⁴ without ever multiplying two matrices together.

6.3.9 Let A ∈ Mₙ(F). Show that rank A = n if and only if p_A(0) ≠ 0.

6.3.10 Suppose λ is an eigenvalue of A ∈ Mₙ(F). Prove that the geometric multiplicity of λ (see Exercise 3.6.24) is less than or equal to the multiplicity of λ.

6.3.11 Show that if A is a normal matrix, then the geometric multiplicities (see Exercise 3.6.24) of its eigenvalues are equal to their multiplicities.
6.3.12 Show that if A ∈ Mₙ(C) has only real eigenvalues, then the coefficients of p_A(x) are all real.

6.3.13 Suppose that A ∈ M_{m,n}(C) with m ≤ n, and that σ₁ ≥ ··· ≥ σ_m ≥ 0 are the singular values of A. Show that σ₁², ..., σ_m² are the eigenvalues of AA*, counted with multiplicity.

6.3.14 Suppose that F is algebraically closed, A ∈ Mₙ(F) is invertible, and λ is an eigenvalue of A. Show that λ⁻¹ has the same multiplicity as an eigenvalue of A⁻¹ that λ has as an eigenvalue of A.

6.3.15 Let p(x) be any polynomial over F. Prove that p(A) lies in the span of (Iₙ, A, A², ..., Aⁿ⁻¹).

6.3.16
(a) Show that if A ∈ Mₙ(F) is invertible, then A⁻¹ lies in the span of (Iₙ, A, A², ..., Aⁿ⁻¹).
(b) Use this fact and Exercise 2.3.12 to prove that if A is invertible and upper triangular, then A⁻¹ is upper triangular.

6.3.17 Suppose that A ∈ Mₙ(C) has eigenvalues λ₁, ..., λₙ, counted with multiplicity. Prove that

∑_{j=1}^n |λ_j|² ≤ ∑_{j,k=1}^n |a_{jk}|².

Hint: Prove this first for upper triangular matrices, then use the Schur decomposition.
6.3.18 Let p(x) = c₀ + c₁x + ··· + c_{n−1}xⁿ⁻¹ + xⁿ be a given polynomial with coefficients in F. Define

A = [ 0  0  ···  0  −c₀
      1  0  ···  0  −c₁
      0  1  ···  0  −c₂
      ⋮  ⋮       ⋮   ⋮
      0  0  ···  1  −c_{n−1} ].

Show that p_A(x) = (−1)ⁿ p(x).
Remark: The matrix A is called the companion matrix of p(x).

6.3.19 Here's another way to convince yourself that the appealing-but-wrong way to prove the Cayley-Hamilton Theorem is indeed wrong. Given A ∈ Mₙ(F), define the trace polynomial t_A(x) = tr(A − xIₙ). The same kind of argument suggests that t_A(A) = tr(A − AIₙ) = 0. Show that, in fact, if F = R or C then t_A(A) = 0 if and only if A = cIₙ for some c ∈ F.
6.4 Applications of Determinants

Volume

The connection to volume is one of the most important applications of determinants outside of linear algebra. Since much of this topic is necessarily beyond the scope of this book, we will only sketch the details. Given real numbers a_i < b_i for each i = 1, ..., n, the set

R = [a₁, b₁] × ··· × [aₙ, bₙ] = {(x₁, ..., xₙ) | a_i ≤ x_i ≤ b_i for each i} …

… c > 0 and n = 2. Then T(R) is a parallelogram P:
Figure 6.2 Transforming the rectangle R by P_{c,1,2}.
Ignoring line segments (which have zero volume in R²), P consists of a rectangle and two disjoint triangles J₁ and J₂:

Figure 6.3 Dividing up P = T(R).

Translating J₂ as shown:

Figure 6.4 Moving J₂ to turn P back into R.

we see that T(R) has the same area as R. Thus vol(T(R)) = vol(R) (recall that in this case, det T = 1, so this is what we wanted).

When n > 2, the discussion above describes what happens in the cross-sections of T(R) in planes parallel to the x₁-x₂ plane. The equality vol(T(R)) = vol(R) then follows by computing the volume of T(R) by slices.
• A = Q_{c,i}. In this case

T(R) = [a₁, b₁] × ··· × [ca_i, cb_i] × ··· × [aₙ, bₙ]

if c > 0, and

T(R) = [a₁, b₁] × ··· × [cb_i, ca_i] × ··· × [aₙ, bₙ]

if c < 0. In either case,

vol(T(R)) = |c|(b₁ − a₁) ··· (bₙ − aₙ) = |c| vol(R) = |det T| vol(R).
• A = R_{i,j}.

Quick Exercise #14. Prove that in this case, vol(T(R)) = vol(R).
In all three cases, by Lemma 6.14, we have

vol(T(R)) = |det T| vol(R)   (6.12)

when R is a rectangle and the matrix A of T is elementary. We can immediately extend this to simple sets: if A is still elementary and S = ⋃_{i=1}^m R_i is a simple set, where R₁, ..., R_m are essentially disjoint rectangles, then

T(S) = T(⋃_{i=1}^m R_i) = ⋃_{i=1}^m T(R_i),

and the sets T(R₁), ..., T(R_m) are still essentially disjoint. Thus

vol(T(S)) = ∑_{i=1}^m vol(T(R_i)) = ∑_{i=1}^m |det T| vol(R_i) = |det T| vol(S).   (6.13)

If Ω ⊆ Rⁿ is Jordan measurable and is approximated by a sequence (S_j)_{j≥1} of simple sets, and A is still elementary, then

vol(T(Ω)) = lim_{j→∞} vol(T(S_j)) = |det T| lim_{j→∞} vol(S_j) = |det T| vol(Ω)
(6.14)

QA #14: T(R) = [a₁, b₁] × ··· × [a_j, b_j] × ··· × [a_i, b_i] × ··· × [aₙ, bₙ]: the ith and jth sides are interchanged, so the volume is unchanged.
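The volume-scaling identity can also be checked numerically for a 2 × 2 map: a point y lies in T(R) exactly when T⁻¹y lies in R, so counting grid cells of a bounding box estimates vol(T(R)). A rough sketch; the grid size, tolerance, and helper name are our own choices.

```python
def image_area_estimate(T, n=400):
    """Estimate the area of T([0,1]^2) for an invertible 2x2 matrix T by
    counting grid cells of a bounding box whose centers land in the image
    (y is in T([0,1]^2) exactly when T^{-1} y is in [0,1]^2)."""
    (a, b), (c, d) = T
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # images of the square's corners give a bounding box for the image
    xs = [0, a, b, a + b]
    ys = [0, c, d, c + d]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    hits = 0
    for i in range(n):
        for j in range(n):
            x = x0 + (i + 0.5) * (x1 - x0) / n
            y = y0 + (j + 0.5) * (y1 - y0) / n
            u = inv[0][0] * x + inv[0][1] * y
            v = inv[1][0] * x + inv[1][1] * y
            if 0 <= u <= 1 and 0 <= v <= 1:
                hits += 1
    return hits / n ** 2 * (x1 - x0) * (y1 - y0)

T = [[2.0, 1.0], [0.0, 3.0]]   # det T = 6
print(image_area_estimate(T))  # close to |det T| = 6
```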
Finally, let T ∈ L(Rⁿ) be an arbitrary linear map. If rank T < n, then T(Ω) …

Theorem 6.28 (Cramer's rule) Suppose that A ∈ Mₙ(F) is invertible and that b ∈ Fⁿ. For each i = 1, ..., n, let

A_i = [ a₁  ···  a_{i−1}  b  a_{i+1}  ···  aₙ ]

be the matrix obtained from A by replacing the ith column with b. Then the unique solution of the n × n linear system Ax = b is given by

x_i = det A_i / det A

for each i = 1, ..., n.
Proof Since det A ≠ 0, we know that A is invertible, and so there is indeed a unique solution x of the linear system Ax = b. For each i = 1, ..., n, define X_i to be the n × n matrix

X_i = [ e₁  ···  e_{i−1}  x  e_{i+1}  ···  eₙ ]

obtained from the identity matrix by replacing the ith column with x. Then

det X_i = x_i,

because the determinant is multilinear and alternating. Furthermore, AX_i is exactly A_i.

Quick Exercise #15. Verify that AX_i = A_i.

Therefore

det A_i = det AX_i = (det A)(det X_i) = (det A) x_i. ▲
Example Recall our very first linear system, describing how much bread and beer we could make with given quantities of barley and yeast:

x + (1/4)y = 20
2x + (1/4)y = 36.

In this system,

A = [1 1/4; 2 1/4],   A₁ = [20 1/4; 36 1/4],   A₂ = [1 20; 2 36],

and so

det A = −1/4,   det A₁ = −4,   det A₂ = −4.

It follows from Theorem 6.28 that the unique solution to this system is

x = (−4)/(−1/4) = 16,   y = (−4)/(−1/4) = 16. ▲
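Cramer's rule as stated in Theorem 6.28 can be run exactly over Q using Python's `Fraction` type; the determinant here is a naive cofactor expansion, fine for small systems. The helper names are ours.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small n only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires det(A) != 0.
    Using Fraction entries keeps the arithmetic exact."""
    d = det(A)
    xs = []
    for i in range(len(A)):
        Ai = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        xs.append(det(Ai) / d)
    return xs

# The bread-and-beer system from the example above:
A = [[Fraction(1), Fraction(1, 4)],
     [Fraction(2), Fraction(1, 4)]]
b = [Fraction(20), Fraction(36)]
print(cramer(A, b))  # -> [Fraction(16, 1), Fraction(16, 1)]
```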
Cofactors and Inverses

There is a classical formula for the inverse of a matrix in terms of determinants, which can be proved using Theorem 6.28. To state it we first need a definition.

QA #15: The jth column of AX_i is A times the jth column of X_i; if j ≠ i this is Ae_j = a_j, and the ith column is Ax = b.
Definition Let A ∈ Mₙ(F). The cofactor matrix of A is the matrix C ∈ Mₙ(F) with c_ij = (−1)^{i+j} det A_ij. The adjugate of A is the matrix adj A = Cᵀ.

Examples

1. The cofactor matrix of A = [a b; c d] is [d −c; −b a], so

adj A = [ d  −b
          −c   a ].

… σₙ ≥ 0 are the singular values of T.
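For 2 × 2 matrices, the adjugate just defined can be checked directly against the standard identity A · adj A = (det A)I₂, which is the starting point for the inverse formula. A sketch with our own helper names:

```python
def adjugate_2x2(A):
    """Adjugate (transpose of the cofactor matrix) of a 2x2 matrix."""
    (a, b), (c, d) = A
    return [[d, -b], [-c, a]]

def a_times_adj(A):
    """Return A * adj(A); it should equal det(A) * I."""
    adj = adjugate_2x2(A)
    return [[sum(A[i][k] * adj[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[3, 5], [1, 4]]   # det A = 7
print(a_times_adj(A))  # -> [[7, 0], [0, 7]]
```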
6.4.11 Show that the volume of the n-dimensional parallelepiped spanned by a₁, ..., aₙ ∈ Rⁿ is |det [a₁ ··· aₙ]|.

6.4.12 Use Theorem 6.29 to prove that if A ∈ Mₙ(F) is upper triangular and invertible, then A⁻¹ is also upper triangular. What are the diagonal entries of A⁻¹?

6.4.13 Consider an n × n linear system Ax = b. Show that if A and b have only integer entries, and det A = ±1, then the unique solution x also has only integer entries.

6.4.14 Suppose that A ∈ Mₙ(R) has only integer entries. Prove that A⁻¹ has only integer entries if and only if det A = ±1.

6.4.15 In the sketch of the proof of Theorem 6.27, we asserted without proof that a line segment in R² has zero volume (i.e., area). Prove this by showing that for every n, a given line segment can be covered by n rectangles, in such a way that the total area of the rectangles goes to 0 as n → ∞.
PERSPECTIVES: Determinants

• The determinant is the unique alternating multilinear function on Mₙ(F) which takes the value 1 on Iₙ.
• For any i, det A = ∑_{j=1}^n (−1)^{i+j} a_ij det A_ij.
• For any j, det A = ∑_{i=1}^n (−1)^{i+j} a_ij det A_ij.
• det A = ∑_{σ∈Sₙ} (sgn σ) a_{1,σ(1)} ··· a_{n,σ(n)} = ∑_{σ∈Sₙ} (sgn σ) a_{σ(1),1} ··· a_{σ(n),n}.
• det A = (−1)ᵏ (b₁₁ ··· bₙₙ)/(c₁ ··· c_m), where A can be row-reduced to the upper triangular matrix B, and in the process the c_i are the constants used in row operation R2 and rows are swapped k times.
• If A is upper triangular, det A = a₁₁ ··· aₙₙ.
• det A = ∏_{j=1}^n λ_j, where the λ_j are the eigenvalues of A, counted with multiplicity.

We conclude by repeating some earlier perspectives, with new additions involving determinants.
PERSPECTIVES: Eigenvalues

A scalar λ ∈ F is an eigenvalue of T ∈ L(V) if any of the following hold.
• There is a nonzero v ∈ V with Tv = λv.
• The map T − λI is not invertible.
• det(T − λI) = 0.
PERSPECTIVES: Isomorphisms

Let V and W be n-dimensional vector spaces over F. A linear map T : V → W is an isomorphism if any of the following hold.
• T is bijective.
• T is invertible.
• T is injective, or equivalently null(T) = {0}.
• T is surjective, or equivalently rank(T) = n.
• If (v₁, ..., vₙ) is a basis of V, then (T(v₁), ..., T(vₙ)) is a basis of W.
• If A is the matrix of T with respect to any bases on V and W, then the columns of A form a basis of Fⁿ.
• If A is the matrix of T with respect to any bases on V and W, the RREF of A is the identity Iₙ.
• V = W and det T ≠ 0.
Appendix
A.1 Sets and Functions

Basic Definitions

Definition A set S is a collection of objects, called elements. If s is an element of S, we write s ∈ S.

A subset T of S is a set such that every element of T is an element of S. If T is a subset of S, we write T ⊆ S.

The subset T is a proper subset of S if there is at least one element of S which is not in T; in this case, we write T ⊊ S.

The union of sets S₁ and S₂ is the collection of every element which is in either S₁ or S₂; the union of S₁ and S₂ is denoted S₁ ∪ S₂.

The intersection of sets S₁ and S₂ is the collection of every element which is in both S₁ and S₂; the intersection of S₁ and S₂ is denoted S₁ ∩ S₂.

This is a rather unsatisfying definition; if we need to define the word set, defining it as a collection of things seems a bit cheap. Nevertheless, we need to start somewhere. Some sets which come up so often that they have their own special notation are the following.
• R: the set of real numbers
• C: the set of complex numbers
• N: the set of natural numbers 1, 2, 3, ...
• Z: the set of integers (natural numbers, their negatives, and zero)
• Q: the set of rational numbers; i.e., real numbers that can be expressed as fractions of integers
• Rⁿ: the set of n-dimensional (column) vectors with real entries

When we want to describe sets explicitly, we often use so-called "set-builder notation," which has the form*

*Some people use a colon in the middle instead of a vertical bar.
S = {objects | conditions}.

That is, inside the curly brackets to the left of the bar is what kind of thing the set elements are, and to the right of the bar are the requirements that need to be satisfied so that the object on the left is in the set. Every object of the type on the left which satisfies the conditions on the right is in S. For example, the set

S = {t ∈ R | t ≥ 0}

is the set of all nonnegative real numbers, which you may also have seen denoted [0, ∞). Set-builder notation doesn't give a unique way of describing a set. For example,

{ (1 + z, z − 6, z) | z ∈ R } = { (x, y, z) ∈ R³ | x = 1 + z, y = z − 6 }.
As in this example, if there are multiple conditions to the right of the bar, separated by a comma (or a semicolon), it means that they all have to be satisfied.

As with the definition of a set, we will give a slightly evasive definition of our other fundamental mathematical object, a function.

Definition A function is a rule that assigns an output to each element of a set of possible inputs. To be a function, it must always assign the same output to a given input. The set of possible inputs is called the domain of the function, and the set of all outputs is called the range of the function.

Saying that a function is defined on a set X means that X is the set of inputs; saying that a function takes values in a set Y means that all of the possible outputs are elements of Y, but there may be more elements of Y which do not actually occur as outputs. If a function takes values in Y, then Y is called the codomain of the function.

Strictly speaking, if the same rule is thought of as producing outputs in two different sets, then it is defining two different functions (because the codomains are different). For example, the rule which assigns the output x² to the input x (for x ∈ R) could be thought of as having codomain R or codomain [0, ∞).
It is standard to summarize the name of a function, its domain, and codomain as follows: if f is a function defined on X and taking values in Y, then we write

f : X → Y.
If we want to describe what a function does to an individual element, we may use a formula, as in f(x) = x², or we may use the notation x ↦ x², decorating the arrow with the function's name if we want it made explicit.
Definition If f : X → Y is a function and f(x) = y, we call y the image of x. More generally, if A ⊆ X is any subset of X, we define the image of A to be the set

f(A) = {f(a) | a ∈ A}.

Definition Let f : X → Y be any function. We say that f is injective or one-to-one if

f(x₁) = f(x₂)  ⟹  x₁ = x₂.

We say that f is surjective or onto if for every y ∈ Y, there is an x ∈ X so that

f(x) = y.

We say that f is bijective or is a one-to-one correspondence if f is both injective and surjective.
Intuitively, an injective function f : X → Y loses no information: if you can tell apart two things that go into f, you can still tell them apart when they come out. On the other hand, if f : X → Y is surjective, then it can tell you about every element of Y. In this sense, if there is a bijective function f : X → Y, then, using f, elements of Y contain no more and no less information than elements of X.
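On finite sets, injectivity and surjectivity reduce to finite checks. A small sketch, representing a function as a Python dict (our own representation, not the book's):

```python
def is_injective(f):
    """f is a dict {input: output}; injective iff no output is repeated."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """Surjective onto codomain iff every element of it occurs as an output."""
    return set(f.values()) == set(codomain)

f = {1: "a", 2: "b", 3: "c"}
print(is_injective(f), is_surjective(f, {"a", "b", "c"}))  # -> True True
g = {1: "a", 2: "a"}
print(is_injective(g), is_surjective(g, {"a", "b"}))       # -> False False
```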
Composition and Invertibility

Definition
1. Let X, Y, and Z be sets and let f : X → Y and g : Y → Z be functions. The composition g ∘ f of g and f is defined by

g ∘ f(x) = g(f(x)).

2. Let X be a set. The identity function on X is the function ι : X → X such that for each x ∈ X, ι(x) = x.
3. Suppose that f : X → Y is a function. A function g : Y → X is called an inverse function (or simply inverse) of f if g ∘ f is the identity function on X and f ∘ g is the identity function on Y. That is, for each x ∈ X,

g ∘ f(x) = x,
and for each y ∈ Y,

f ∘ g(y) = y.

In that case we write g = f⁻¹. If an inverse to f exists, we say that f is invertible.

It's important to watch out for a funny consequence of the way function composition is written: g ∘ f means do f first, then g.
Lemma A.1
1. Composition of functions is associative. That is, if f : X₁ → X₂, g : X₂ → X₃, and h : X₃ → X₄, then h ∘ (g ∘ f) = (h ∘ g) ∘ f.
2. If f : X → Y is invertible, then its inverse is unique.

Proof Exercise. ▲
Proposition A.2 A function f : X → Y is invertible if and only if it is bijective.

Proof Suppose first that f is bijective. We need to show that f has an inverse g : Y → X. Since f is surjective, for each y ∈ Y, there is an x ∈ X such that f(x) = y. Furthermore, since f is injective, there is only one such x. We can therefore unambiguously define a function g : Y → X by g(y) = x. We then have that

f ∘ g(y) = f(g(y)) = f(x) = y.

On the other hand, given any x ∈ X, let y = f(x). Then g(y) = x by the definition of g, so

g ∘ f(x) = g(f(x)) = g(y) = x.

Conversely, suppose that f : X → Y is invertible, and let g : Y → X be its inverse. If x₁, x₂ ∈ X are such that

f(x₁) = f(x₂),

then

x₁ = g(f(x₁)) = g(f(x₂)) = x₂,
and so f is injective. Next, for each y ∈ Y,

y = f(g(y)),

and so f is surjective. ▲
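The construction in the proof of Proposition A.2 (send each output back to its unique preimage) can be carried out literally for a bijection between finite sets. A sketch, again using dicts as functions:

```python
def inverse(f):
    """Inverse of a bijection given as a dict, following the proof of
    Proposition A.2: send each output y to its unique preimage."""
    assert len(set(f.values())) == len(f), "not injective, hence not invertible"
    return {y: x for x, y in f.items()}

f = {1: "a", 2: "b", 3: "c"}
g = inverse(f)
assert all(g[f[x]] == x for x in f)  # g o f is the identity on X
assert all(f[g[y]] == y for y in g)  # f o g is the identity on Y
print(g)  # -> {'a': 1, 'b': 2, 'c': 3}
```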
A.2 Complex Numbers

The set C of complex numbers is

C := {a + ib | a, b ∈ R},

where i is defined by the fact that i² = −1. Addition and multiplication of elements of C are defined by

(a + ib) + (c + id) = a + c + i(b + d)

and

(a + ib) · (c + id) = ac − bd + i(ad + bc).

For geometric purposes, we sometimes identify C with R²: if z = a + ib ∈ C, we can associate z with the ordered pair (a, b) of real numbers. But it is important to remember that C is not R²: the multiplication defined above is a crucial difference.
Definition For z = a + ib ∈ C, the real part of z is Re z = a and the imaginary part of z is Im z = b. The complex conjugate z̄ of z is defined by z̄ = a − ib. The absolute value or modulus |z| is defined by

|z| = √(a² + b²).

Geometrically, the complex conjugate of z is the reflection of z across the x-axis (sometimes called the real axis in this setting). The modulus of z is its length as a vector in R². The following lemma gives some of the basic properties of complex conjugation.
Lemma A.3 For any w, z ∈ C:
1. (w + z)‾ = w̄ + z̄.
2. (wz)‾ = w̄ z̄.
3. z ∈ R if and only if z̄ = z.
4. z z̄ = |z|².
5. z + z̄ = 2 Re z.
6. z − z̄ = 2i Im z.
Figure A.1 z̄ is the reflection of z across the x-axis, and |z| is the length of z.
Proof Exercise. ▲
The study of complex-valued functions of a complex variable is called complex analysis. We use very little complex analysis in this book, but it is occasionally convenient to have access to the complex exponential function eᶻ. This function can be defined in terms of a power series as in the real case:

eᶻ = ∑_{j=0}^∞ zʲ/j!.

In particular, if Im z = 0, this is the same as the usual exponential function. More generally, the complex exponential function shares with the usual exponential the property that

e^{w+z} = eʷ eᶻ.

The following identity relating the complex exponential to the sine and cosine functions is fundamental.

Theorem A.4 (Euler's formula) For all t ∈ R,
e^{it} = cos(t) + i sin(t).

It follows in particular that e^{πi} = −1 and e^{2πi} = 1.

Recall that in R² there is a system of polar coordinates, in which a vector is identified by its length, together with the angle it makes with the positive x-axis. It follows from Euler's formula that e^{iθ} is the unit vector which makes angle θ with the positive x-axis, and since |z| is the length of the complex number z, thought of as a vector in R², we can write z in polar form as

z = |z| e^{iθ},

where θ is the angle z makes with the positive real axis; θ is called the argument of z.
Figure A.2 The modulus and argument of z.
The following formulas follow immediately from Euler's formula, making use of the trigonometric identities cos(−x) = cos(x) and sin(−x) = −sin(x).
Corollary A.5 For all t ∈ R,

1. (e^{it})‾ = e^{−it}.
2. cos(t) = (e^{it} + e^{−it})/2.
3. sin(t) = (e^{it} − e^{−it})/(2i).
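Euler's formula and the identities of Corollary A.5 can be spot-checked numerically with Python's `cmath` module, up to floating-point error:

```python
import cmath
import math

# Euler's formula: e^{it} = cos(t) + i sin(t), checked at several points
for t in [0.0, 1.0, math.pi / 3, math.pi]:
    assert abs(cmath.exp(1j * t) - complex(math.cos(t), math.sin(t))) < 1e-12

# e^{i pi} = -1, and the identities of Corollary A.5:
assert abs(cmath.exp(1j * math.pi) + 1) < 1e-12
t = 0.7
assert abs((cmath.exp(1j * t) + cmath.exp(-1j * t)) / 2 - math.cos(t)) < 1e-12
assert abs((cmath.exp(1j * t) - cmath.exp(-1j * t)) / 2j - math.sin(t)) < 1e-12
print("all identities hold to machine precision")
```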
A.3 Proofs

Logical Connectives

Formal logic is a system created by mathematicians in order to be able to think and speak clearly about truth and falseness. The following five basic concepts, called logical connectives, are the foundations of this system.

Definition The five basic logical connectives are: not, and, or, if ..., then ..., if and only if.

The first three are fairly self-explanatory: the statement "not A" is true if the statement "A" is false, and vice versa. For example, the statement "x is not a natural number" (abbreviated x ∉ N) is true exactly when the statement "x is a natural number" (x ∈ N) is false.
The statement "A and B" is true if both A and B are true. For example, the statement

x ≥ 0 and x < 0

is never true, since no real number satisfies both conditions at once. The statement "A or B" is true if at least one of A and B is true.* For example,

x ≥ 0 or x < 0

is true for every x ∈ R, and the statement "x ≥ 0 or x² < 0" is true exactly when x ≥ 0 is true, because x² < 0 is always false.

A statement using the connective "if ..., then ..." is called a conditional statement. The connective is often abbreviated with the symbol ⟹; we also use the word "implies,"† as in "A implies B," which means the same thing as "If A, then B," which is also the same as "A ⟹ B." A statement of this form is true if whenever A is true, then B must be true as well. For example,

x > 2 ⟹ x > 0

is a true conditional statement, because whenever x > 2, it is also true that x > 0.
The connective "if and only if" is often abbreviated "iff" or with the symbol ⟺. It is the connective which describes equivalence: two statements are equivalent if they are always either both true or both false. For example, the statements x > 5 and x − 5 > 0 are equivalent, because for every real number x, either both statements are true or both statements are false. So we might write

x > 5 ⟺ x − 5 > 0.

Statements with the connective "if and only if" are sometimes called biconditional statements, because they are the same thing as two conditional statements: "A iff B" is the same thing as "A implies B and B implies A." This is neatly captured by the arrow notation.
Quantifiers

Quantifiers are phrases which are applied to objects in mathematical statements in order to distinguish between statements which are always true and statements which are true in at least one instance.

*The mathematical definition of "or" is inclusive: "A or B" means "A or B or both."
†It is important to note that in mathematical English, "implies" does not mean the same thing as "suggests." "Suggests" still means what it means in regular English, but "A implies B" means that if A is true, B must be true. You will sometimes hear mathematicians say things like "This suggests but does not imply that ..."
Definition The two quantifiers are
• there exists (denoted ∃),
• for all (denoted ∀).

For example, the statement "There exists an x ∈ R with x² = 2" is the assertion that the number 2 has at least one* square root among the real numbers. This statement is true (but not trivial!).

The statement "For all x ∈ R, x² ≥ 0" is the assertion that the square of any real number is nonnegative, that is, that the statement x² ≥ 0 is always true when x is a real number.
Contrapositives, Counterexamples, and Proof by Contradiction

One of the most important logical equivalences is the contrapositive, defined as follows.

Definition The contrapositive of the conditional statement "A ⟹ B" is the conditional statement "(not B) ⟹ (not A)."

The contrapositive is equivalent to the original statement: "A ⟹ B" means that whenever A is true, B must be as well. So if B is false, then A must be false as well.

Whenever we have a statement which claims to always be true, it is false if and only if there is a counterexample: that is, one instance in which the statement is false. For example, the statement "For all x ∈ R, x² > 0" is false because there is a counterexample, namely x = 0. Conditional statements are a kind of statement claiming to be always true: "A ⟹ B" means that whenever A is true, B is always true as well. So again, a conditional statement is false if and only if there is a counterexample. The statement

x² > 4 ⟹ x > 2

is false because there is a counterexample: x = −3 (of course, there are infinitely many, but we only need one). On the other hand, the statement

(x ∈ R and x² < 0) ⟹ 3 = 2

is true, because there is no counterexample; a counterexample would consist of a value of x ∈ R with x² < 0, together with the falseness of the statement 3 = 2.
*For a statement beginning “there exists” to be a true statement, what follows “there exists” only needs to be true once. If it’s true more often than that, that’s fine.
We now turn to proof by contradiction, which used to be known by the charming phrase "proof by reduction to the absurd." The basic idea is to suppose that the statement you're trying to prove is false, and use that assumption to prove something which is known to be false. If every step in between was valid, the only problem must be with the starting point. We illustrate with a famous example, the irrationality of the square root of two.

Proposition A.6 If x ∈ R with x² = 2, then x ∉ Q.
Proof Suppose not; that is, suppose that there is an x ∈ R with x² = 2 such that x ∈ Q. Since x ∈ Q, we may write x = p/q for some integers p, q. We may assume that p and q have no common prime factors, that is, that we have written x as a fraction in lowest terms. By assumption,

p²/q² = x² = 2.

Multiplying through by q² gives that

p² = 2q².

From this equation we can see that p² is divisible by 2. This means that p itself is divisible by 2 as well: if p is odd, then so is p². We may therefore write

p = 2k,

so that p² = 4k². Dividing the equation 4k² = 2q² through by 2, we see that q² = 2k², so q² is also even. But then so is q, and so p and q do have a common factor of 2, even though earlier we assumed that x had been written in lowest terms. We thus have a contradiction, and so in fact x is irrational. ▲

This example illustrates a useful rule of thumb, which is that proof by contradiction is generally most helpful when assuming the negation of what you're trying to prove gives you something concrete to work with. Here, by assuming that x was rational, we got the integers p and q to work with.

Something to take note of in the proof above is the first two words: "suppose not"; this is short for "suppose the statement of the proposition is false," and is typical for a proof by contradiction. It is also typical, and a good idea, to follow the "suppose not" with a statement of exactly what it would mean to say that the statement of the theorem was false. This is called the negation of the statement. In this particular example, the statement to be proved is a conditional statement: if x ∈ R with x² = 2, then x ∉ Q. As was discussed above, a conditional statement is false if and only if there is a counterexample; assuming as we did above that there is an x ∈ R with x² = 2 such that x ∈ Q was exactly assuming that there was a counterexample.

Mathematical statements can of course become quite complicated; they are built from the five basic logical connectives and the quantifiers, but all of those pieces
can be nested and combined in complicated ways. Still, negating a statement comes down to simple rules:
The negation of “A and B” is “(not A) or (not B).”
The negation of “A or B” is “(not A) and (not B).”
The negation of a conditional statement is the existence of a counterexample.
The negation of “For all x, Q(x) is true” is “For some x, Q(x) is false.”
The negation of “There is an x such that Q(x) is true” is “For all x, Q(x) is false.”
Example  The negation of the statement

“For all ε > 0 there is a δ > 0 such that if |x − y| < δ, then |f(x) − f(y)| < ε”

is

“There exists an ε > 0 such that for all δ > 0 there exist x and y with |x − y| < δ and |f(x) − f(y)| ≥ ε.”

Working through to confirm this carefully is an excellent exercise in negation.
Proof by Induction  Induction is a technique used to prove statements about natural numbers, typically something of the form “For all n ∈ ℕ, ...” One first proves a “base case”; usually, this means that you prove the statement in question when n = 1 (although sometimes there are reasons to start at some larger value, in which case you will prove the result for natural numbers with that value or larger). Next comes the “induction”; this is usually the main part of the proof. In this part, one assumes that the result is true up to a certain point, that is, that the result holds for n ≤ N. This is called the induction hypothesis. One then uses the induction hypothesis to prove that the result also holds when n = N + 1. In this way, the result is proved for all n ∈ ℕ: first it was proved for n = 1. Then by the inductive step, knowing it for n = 1 meant it was true for n = 2, which meant it was true for n = 3, and then n = 4, and so on.
Example  We will prove by induction that

1 + 2 + ··· + n = n(n + 1)/2.

First, the base case: if n = 1, the left-hand side is 1 and the right-hand side is 1(2)/2 = 1.

Now suppose that the formula is true for n ≤ N. Then

1 + 2 + ··· + N + (N + 1) = N(N + 1)/2 + (N + 1) = (N + 1)(N/2 + 1) = (N + 1)(N + 2)/2,

where we used the induction hypothesis to get the first equality and the rest is algebra. ▲
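A formula proved by induction can also be spot-checked numerically; the following quick check (a sanity check, not a proof) verifies the formula for the first hundred values:

```python
# Spot-check 1 + 2 + ... + n = n(n + 1)/2 for n = 1, ..., 100.
for n in range(1, 101):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```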
Addendum

Truthfully, we never actually make bread from barley flour.* Here is the recipe for sandwich bread in the Meckes household, adapted from a recipe from King Arthur Flour.

19.1 oz flour
54 oz rolled oats
1.5 oz butter
24 tsp salt
2 tsp instant (i.e., rapid rise) yeast
t c packed brown sugar
15 oz lukewarm milk

Makes a large loaf - we use a long pullman loaf pan.
1. Mix (either by hand or in a stand mixer) all the ingredients except the milk in a large bowl. Stir in the milk and mix until well blended.
2. Knead by hand or with the dough hook of a mixer for around 10 minutes. The dough should be fairly smooth and elastic (although broken up some by the oats) and should no longer be sticky.
3. Let the dough rise in an oiled bowl covered with plastic wrap or a damp towel for around an hour.
4. Form into a loaf and place, seam side down, into the greased loaf pan. Let rise another 45 minutes to an hour - in our pan, it's time to put it in the oven when the top-most part of the dough is just below the edge of the pan. It will continue to rise in the oven.
5. Bake at 350°F for 50 minutes to an hour.
6. Remove from the pan and let cool for at least 45 minutes before slicing.
*We don’t make our own beer or cheese either.
Hints and Answers to Selected Exercises
1 Linear Systems and Vector Spaces

1.1 Linear Systems of Equations

1.1.1  (a) You can make 8 loaves of bread, 76 pints of beer, and 21 rounds of cheese. (b) You'll use 27 pounds of barley.

1.1.3  (a) Consistent with a unique solution. (b) Inconsistent. (c) Consistent with a unique solution. (d) Consistent; solution is not unique.
(e) Consistent with a unique solution. (f) Inconsistent.
1.1.5  (a) The x-y plane. (b) The x-axis. (c) The origin. (d) The origin. (e) No point satisfies all four equations.
1.1.9  (a) a − b + c = 1, c = 0, a + b + c = 2  (b) f(x) = (3/2)x² + (1/2)x.

1.1.11  Remember the Rat Poison Principle - just plug it in. For each i,

a_{i1}(tc_1 + (1 − t)d_1) + ··· + a_{in}(tc_n + (1 − t)d_n) = t(a_{i1}c_1 + ··· + a_{in}c_n) + (1 − t)(a_{i1}d_1 + ··· + a_{in}d_n) = tb_i + (1 − t)b_i = b_i.
1.2 Gaussian Elimination

1.2.1  (a) R1 (b) R3 (c) R1 (d) R1 (e) R2 (f) R2 (g) R1 (h) R1

1.2.7  (a) x = 14 − 13z, y = −5 + 5z, z ∈ ℝ. (b) x = $ − 2w + 4z, y = −t − w + 3z, w, z ∈ ℝ. (c) x = 1, y = 0, z = 1. (d) x = 1, y = 0, z = 1. (e) There are no solutions.

1.2.9  a = −1, b = 0, c = 4, d = 2

1.2.11  No.

1.2.13  (a) x + y + z = 1; x + y + z = 2  (b) Impossible by Corollary 1.4.  (c) x + y = 0  (d) x = 1; x = 2  (e) x + y = 1; 2x + 2y = 2; 3x + 3y = 3
1.2.15  b1 + b2 − 2b3 − b4 = 0
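Many of the hints in this section say to row-reduce; a minimal sketch of Gaussian elimination to reduced row-echelon form (with partial pivoting, and not the book's own code) is:

```python
import numpy as np

def rref(A, tol=1e-12):
    """Reduce A to reduced row-echelon form using the three
    row operations (a teaching sketch, not production code)."""
    A = A.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):
        if row >= m:
            break
        p = row + np.argmax(np.abs(A[row:, col]))  # partial pivoting
        if abs(A[p, col]) < tol:
            continue  # no pivot in this column; move on
        A[[row, p]] = A[[p, row]]  # R3: swap rows
        A[row] /= A[row, col]      # R2: scale the pivot row
        for r in range(m):
            if r != row:
                A[r] -= A[r, col] * A[row]  # R1: clear the rest of the column
        row += 1
    return A
```

Applied to an augmented matrix, the pivot columns identify the pivot variables and the remaining columns the free variables.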
1.3 Vectors and the Geometry of Linear Systems
(c) A31211-10-9 -8 7 6 Sa 5-3
for some ce R.
[x (b)
|y|
-1] =]
0
1 | +c]
1 | forsomeceR.
Px
©
|y}=
)
|Z
dj
(@)
x3
=(-
1
(5)
a4
1.3.5
2 | forsomeceR.
[6
xy XY
—1|+c]
3 -1
1
2
()
—2
—3
oft}
a
4
1{7?lo ie}
(e)
Hl =c :] for some ce R.
(p)
Hl
= (-3)
:] +c ']
(a) b} +b: — 2b; — bg =O
b
for some a,b € R.
dL.
for some c € R.
(b) Check each vector using part (a).
1 2 3
4
=(a— 2)
+(b-1)
+(4—a—b)
+(4-a—-6)
1.3.11
1
5
+b
1
+a
:
==
oreo
0
Hoon
Forany a,beR,
oo
1.3.9
ono
ForanyzeR,
oone
f]--a0}-9el-t
1.3.7.
Write (x, y, z) = (x, y, 0) + (0, 0, z). Apply the Pythagorean Theorem to get that ‖(x, y, 0)‖ = √(x² + y²), and then again to get ‖(x, y, z)‖ = √(‖(x, y, 0)‖² + ‖(0, 0, z)‖²) = √(x² + y² + z²).

1.3.13
The vectors v = (v1, v2) and w = (w1, w2) span ℝ² if and only if the linear system

[ v1  w1 | x ]
[ v2  w2 | y ]

has a solution for all x and y. First assume that both v1 and v2 are nonzero. Row-reduce the system to see what is required for a solution to always exist, and do a little algebra to see that the condition you get is equivalent
to v and w being noncollinear. Consider the cases of v1 = 0 and v2 = 0 separately.

1.3.15  If k = 0, there is a single solution: one point. If k = 1, the set of solutions is a line. If k = 2, the set of solutions is a plane. If k = 3, then every point is a solution.
1.4 Fields
1.4.1  0 = 0 + 0√5. If a, b, c, d ∈ ℚ, then (a + b√5)(c + d√5) = (ac + 5bd) + (ad + bc)√5 ∈ F. If a, b ∈ ℚ, then

1/(a + b√5) = a/(a² − 5b²) − (b/(a² − 5b²))√5 ∈ F.

Note that the denominator has to be nonzero because a and b are rational. The remaining parts are similar.

1.4.3  (a) Doesn't contain 0. (b) Doesn't have all additive inverses. (c) Not closed under multiplication (multiply by i, for example). (d) Doesn't have all multiplicative inverses.
1.4.5
(a) 7a| [F543i ~
us
1
-i
(b) } i}
+e}
1 | 27 w5[? ]
-1
0
6i
1
1.4.7
Use Gaussian elimination in each case.
1.4.9
(a) If it has a solution over ℝ, Gaussian elimination will find it. But if the coefficients are all natural numbers, then Gaussian elimination can't produce a solution which is not rational.
(b) If x is the unique solution in ℚ, then the system has no free variables, whether we consider it over ℚ or ℝ, so x is the only solution.
(c) 2x = 1
14.11
(a)}O
1
101 O}
(b)}O
ool
1.4.13  If the solution to the system when thought of as a system over F is unique, what does that tell you about the RREF of the coefficient matrix?

1.4.15  Mod 4, 2 · 2 = 0, but 2 ≠ 0. This cannot happen in a field (see part 9 of Theorem 1.5).

1.4.17  For a ∈ F, suppose that a + b = a + c = 0. Add c to both sides.

1.4.19  a + (−1)a = (1 − 1)a = 0a = 0, so (−1)a is an additive inverse of a.
1.5 Vector Spaces 1.5.1
(a) Yes. (b) No (not closed under scalar multiplication). (c) No (not closed under addition). (d) No (doesn't contain 0).

1.5.3  The zero matrix has trace zero, and if tr(A) = tr(B) = 0, then tr(cA + B) = Σᵢ(c aᵢᵢ + bᵢᵢ) = c tr(A) + tr(B) = 0.

1.5.5  (a) The properties are all trivial to check. (b) If you multiply a “vector” (element of ℚ) by a “scalar” (element of ℝ), you don't necessarily get another “vector” (element of ℚ): for example, take v = 1 and c = √2.
1.5.7
All the properties of a vector space involving scalars from F are just special cases of those properties (assumed to hold) for scalars from K.
1.5.9  Pointwise addition means (f + g)(a) = f(a) + g(a) (with the second + in F), so f + g is a function from A to F. Commutativity and associativity of vector addition follow from commutativity and associativity of addition in F. The zero vector is the function z : A → F with z(a) = 0 for each a. The rest of the properties are checked similarly.

1.5.11  The subspaces U1 and U2 both contain 0, so we can take u1 = u2 = 0 to see 0 = 0 + 0 ∈ U1 + U2. If u1 + u2 ∈ U1 + U2 and c ∈ F, then c(u1 + u2) = cu1 + cu2 ∈ U1 + U2, since cu1 ∈ U1 and cu2 ∈ U2. Showing U1 + U2 is closed under addition is similar.
1.5.13  Addition and scalar multiplication: (v1, ..., vn) + (w1, ..., wn) = (v1w1, ..., vnwn), which has strictly positive entries, and a(v1, ..., vn) = (v1^a, ..., vn^a), which has strictly positive entries. Commutativity and associativity of vector addition follow from commutativity and associativity of multiplication. If the “zero vector” is (1, ..., 1), then (v1, ..., vn) + (1, ..., 1) = (1·v1, ..., 1·vn) = (v1, ..., vn). The additive inverse of (v1, ..., vn) is (1/v1, ..., 1/vn). Multiplication by 1 and associativity of multiplication: 1(v1, ..., vn) = (v1¹, ..., vn¹) = (v1, ..., vn), and a(b(v1, ..., vn)) = ((v1^b)^a, ..., (vn^b)^a) = (v1^{ab}, ..., vn^{ab}) = (ab)(v1, ..., vn). Distributive laws: a[(v1, ..., vn) + (w1, ..., wn)] = ((v1w1)^a, ..., (vnwn)^a) = (v1^a w1^a, ..., vn^a wn^a) = a(v1, ..., vn) + a(w1, ..., wn), and (a + b)(v1, ..., vn) = (v1^{a+b}, ..., vn^{a+b}) = (v1^a v1^b, ..., vn^a vn^b) = a(v1, ..., vn) + b(v1, ..., vn).

1.5.19
For v ∈ V, suppose w + v = u + v = 0. Add w to both sides. 0v = (0 + 0)v = 0v + 0v. Now subtract (i.e., add the additive inverse of) 0v from both sides.
2 Linear Maps and Matrices

2.1 Linear Maps

2.1.1  Rθ(av) = aRθ(v)
: |.
(a) [;] (b) Not possible (c) io at A) an
8.03 | @ 70 @ 794
7
=.
2.1.5  For continuous functions f and g, and a ∈ ℝ, [T(af + g)](x) = (af + g)(x) cos(x) = [af(x) + g(x)] cos(x) = a f(x) cos(x) + g(x) cos(x) = a[Tf](x) + [Tg](x).

2.1.7
-2
219
(a) Multiples of Hl
have eigenvalue 2; multiples of
']
Tf}(x) +
have eigen-
(b) Multiples of
]
have eigenvalue
—2; multiples of !
3
‘
wis
value 0. ] have
eigenvalue 2. (c) Multiples of °]
have eigenvalue 2 (this is the only eigenvalue).
oO
1
(d) Multiples of | 1 | have eigenvalue —2; multiples of | 0 | have eigen0
—1 1
value —1; and multiples of | 0 | have eigenvalue 1. 1 21,11
Vectors on the line L have eigenvalue 1; vectors on the line perpendicular to L have eigenvalue 0.
2.1.13
b=0
2.1.15
First use the fact that each standard basis vector is an eigenvector to get that A is diagonal. Then use the fact that (1,...,1) is an eigenvector to show that all the diagonal entries are the same.
2.1.17
(b) If λ is an eigenvalue, then there is a nonzero sequence (a1, a2, ...) such that λa1 = 0 and λai = ai−1 for every i ≥ 2.
2.2 More on Linear Maps 2.2.1
(a) Yes. (b) No. (c) No.
2.2.3
The
set
of all solutions
of the
system
is
pve
—1+i
1 i) i)
at
tL
-V2
v2
v2
v2
Write Tv = λv; apply T⁻¹ to both sides and divide both sides by λ.

The line segments making up the sides get mapped to line segments by Exercise 2.2.10, so the image of the unit square is a quadrilateral (with one corner at the origin). Call R the right-hand side and L the left-hand side of the square; then R = {e1 + t e2 : t ∈ [0, 1]} and L = {t e2 : t ∈ [0, 1]}. This means T(R) = {T(e1) + t T(e2) : t ∈ [0, 1]} and T(L) = {t T(e2) : t ∈ [0, 1]}, and both T(R) and T(L) have direction T(e2). The same argument shows that the images of the top and bottom are parallel.
2.2.18
(a) Evaluate Av for each of the given vectors to check; the eigenvalues are −1 and 2.
(b)
0
05
05
10~=«S,
2.2.15
Consider f(x) = e^{ax}.
2.2.17
(a) Prove the contrapositive: if T is not injective, there are u1 ≠ u2 such that Tu1 = Tu2. Now apply S to see that ST can't be injective. (b) If ST is surjective, then every w ∈ W can be written w = STu = S(Tu) for some u. This shows that w = Sv for v = Tu.
2.2.19  S[T(cu1 + u2)] = S(cTu1 + Tu2) = cSTu1 + STu2.

2.3 Matrix Multiplication

2.3.1
8
(a) [*
5
—34+6i
A
2-2:
(b) [2 wie
e (c) Not possible
ml
(d) —12
(e) Not possible (multiplying the first two works, but then you're stuck) -1
-2
23.5
5 E
>
2.3.7
diag(a1b1, ..., anbn)
2.3.9  [a] is invertible if and only if a ≠ 0. If a ≠ 0, [a]⁻¹ = [a⁻¹].
Write b as ABb; this says x = Bb is a solution to Ax = b.
2.3.15
(A⁻¹)⁻¹ is defined to be any matrix B such that BA⁻¹ = A⁻¹B = I. The matrix B = A does the trick.
2.4 Row Operations and the LU Decomposition 2.4.1
(a) Q2,1Q3,2P_1
12
(b) P3,2,1P—1,1,20—2,212
1-20 (d) Paja1P-13,1Pi32{0
2.4.3
(fy a) | 21AC)b)
0
0
000
0+0i
2.4.5
1 o -1
03 141} 0 2
(e) Singular -1
0 |
0
1
2/5
7
—1/5i
evans]
0 (d) Singular
0
-1
2 1
eee
wea
1-2
(f)
i] 2° -3 =Z}-1 5 -1 101
11
oj}
©
°0
poet 1 1-10 2 3 2 -2
port
i
-1
111
1
(b) (x,y, 2) = 35(—31, 43, -13)
(a) (x, 9,2) = £(-7, 6,4)
(c) (9,2) = £272 — 249, 7 — V2 — 83, /2 + 83)
“LI Labs (d) (x, 9,2) = 3(2a —3c,5b—a—c,at+o
2.4.7
(© }-1 2.4.9
0
-1
1;;0
Oo
1
o|fo -1
(a)
3
(d) (9, Z
2.4.11
oO;;2
2
of} 1 O};-1 If
(b) (, y) = (1, 2)
31 4]2
(o) Gy, 2) = (0,3, 2)
2(0, —9,17) + c(5, 1,2) for anyce R
wr=nv=[) ‘fe-[e At 7
(b)L=]
}2 3}.p_]0
1
00
0
1
1
2-1 2
OJ;U=]0
-101
oo
2
100
0 1 -1];P=]0
1
010
2.4.13
(x,y,z) = (3, 3,3)
2.4.15
These are easiest to see by considering the effect of multiplying an arbitrary matrix A by these products. R_{i,j} P_{c,k,ℓ} performs R1 with rows k and ℓ, and then switches rows i and j, so if {i, j} ∩ {k, ℓ} = ∅, it doesn't matter which order you do these operations in: the matrices commute. If i = k but j ≠ ℓ, then R_{i,j} P_{c,k,ℓ} adds c times row ℓ to row k, then switches row k with row j. This is the same as first switching row k with row j, then adding c times row ℓ to row j: R_{i,j} P_{c,k,ℓ} = P_{c,j,ℓ} R_{i,j}. The other two are similar. Alternatively, you can just write everything out in components and confirm the formulas.
2.4.17
The L from the LU decomposition of A already works. The U is upper triangular but may not have 1s on the diagonal. Use row operation R2 on U to get a new U of the desired form; encoding the row operations as matrices will give you D.
2.4.21
Write Q_{c,i} = [q_{jℓ}] and A = [a_{ℓk}]. Then [Q_{c,i} A]_{jk} = Σ_{1≤ℓ≤n} q_{jℓ} a_{ℓk}. Since Q_{c,i} is zero except on the diagonal, with q_{jj} = 1 if j ≠ i and q_{ii} = c, the only surviving term in the sum is q_{jj} a_{jk}, which is just a_{jk} except when j = i, when it is c a_{ik}. This exactly means that the ith row of A has been multiplied by c.
2.5 Range, Kernel, and Eigenspaces
1 (a){|—-2}}){} 1
-6 3 |] © 5
2.5.3
(a) Yes.
(c) No.
(d) No.
(e) Yes.
2.5.5
(a) Yes; Eig,(A)=
‘}}
(b) No.
(c) Yes; Eig_,(A)=|] 1
2.5.1
(b) Yes.
r
(c) Yes; Eig_3(A) =
0
4
0
(a)
3
(0)
(d)
(e)
0
3
fo
0
1
o}
[-1
|:ceR
(b)
1
0
0
2
3
-} r3 2 Olt}
¢}—-1]
-¥ +c]
1
2 +e]
0
a
4
[+4
3
1] illo
-2|+c]
1 (d) (0) (e) | } 1 1
1
—5
-2 [ o
2.5.9
: oO ;
L fi] f-2 (d) Yes; Eig,(A)={|3],| 0 o} | 3 .
2.5.7
-
—2]:ceR
1
[—3 |4al—'|:edeR 0
,
[1
|+c]-1]:ceR
1
; 12
|S R
1
(a) Just plug it in and see that it works.
(b) Tf = f″ + f is a linear map on C²(ℝ); the set of solutions to equation (2.22) is the set of f ∈ C²(ℝ) with Tf = g, where g(t) = 4e⁻ᵗ.
(c) f(t) = 2e⁻ᵗ + k1 sin(t) − cos(t) for some k1 ∈ ℝ.
(d) Such a solution must have the form f(t) = 2e⁻ᵗ + k1 sin(t) + k2 cos(t)
with ki = —2e~7 +b and ky = -2 +a, 2.5.11
(a) If w ∈ range(S + T), there is a v ∈ V such that w = S(v) + T(v) ∈ range S + range T. (b) C(A) = range(A) as an operator on Fⁿ, so this is immediate from the previous part.
2.5.13  The matrix AB = 0 if and only if A(Bv) = 0 for every v, if and only if Bv ∈ ker A for every v. Explain why this is enough.

2.5.15  Let v be an eigenvector with eigenvalue λ, and apply T k times.
2.5.17
No. For example, the set of eigenvectors together with 0 of :
"
is the
x- and y-axes (and nothing else), which is not a subspace of R?. 2.5.19
C(AB) is the set of linear combinations of the columns of AB. Since each
column of AB is itself a linear combination of columns of A, every element of C(AB) is a linear combination of columns of A.
2,521
T(0) = 0, so 0 ∈ Eig_λ(T). If v1, v2 ∈ Eig_λ(T) and c ∈ F, then T(cv1 + v2) = cTv1 + Tv2 = cλv1 + λv2 = λ(cv1 + v2), so cv1 + v2 ∈ Eig_λ(T).
2.6 Error-correcting Linear Codes 2.6.1
2.6.3
(a) (1,0,1, 1,1)
1
(b) (1, 1, 1,0,0,0, 1,1, 1,1,1,1)
1
1
1
1
0
(c)x=
or [‘]
(b) x = A
(a) x= A
(¢) (1,0, 1, 1,0,0, 1)
°
0
(d)x=]1],]O],or]1 0
2.6.5  (a) (1,0,1,1) (b) (0,0,1,1) (c) (1,1,0,0) (d) (0,1,0,1) (e) (1,0,1,0)

2.6.7  If errors occur in bits i1 and i2, then z = y + e_{i1} + e_{i2}, and Bz = By + b_{i1} + b_{i2}. Since b_{i1} + b_{i2} ≠ 0 (explain why not!), we can tell errors occurred but not how many or where. If three or more errors occur, we can't even tell for sure that there were any (because it is possible that b_{i1} + b_{i2} + b_{i3} = 0).

2.6.9  There are 2⁴ vectors in F₂⁴. Each can lead to 8 = 2³ received messages (no errors or an error in one of the seven transmitted bits). Since these vectors are all distinct (why?), there are 2⁷ possible received messages, which is all of F₂⁷.
2.6.11
100 0 0100 0010 11101000 A=]° °° 'V@pef) 1Oteroo 1110 10110010 1101
1011 1111
11110001
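The encoding and syndrome decoding discussed in these hints can be sketched concretely; the following is a hypothetical Hamming(7,4) implementation using the standard parity-check matrix whose columns are the binary digits of 1 through 7 (the bit layout here is conventional and need not match the book's encoding matrix A):

```python
import numpy as np

# Parity-check matrix: column i (1-based) is the binary expansion of i.
H = np.array([[(i >> k) & 1 for i in range(1, 8)] for k in range(3)])

data_pos = [2, 4, 5, 6]   # positions 3, 5, 6, 7 carry data bits
parity_pos = [0, 1, 3]    # positions 1, 2, 4 carry parity bits

def encode(bits):
    """Place 4 data bits, then choose parity bits so H @ codeword = 0 (mod 2)."""
    cw = np.zeros(7, dtype=int)
    cw[data_pos] = bits
    syn = H @ cw % 2
    for k, p in enumerate(parity_pos):
        cw[p] = syn[k]  # each parity position appears in exactly one row of H
    return cw

def correct(received):
    """The syndrome, read as a binary number, is the 1-based error position."""
    syn = H @ received % 2
    pos = int(sum(int(b) << k for k, b in enumerate(syn)))
    if pos:
        received = received.copy()
        received[pos - 1] ^= 1
    return received
```

Flipping any single bit of a codeword produces a nonzero syndrome naming the flipped position, which is exactly the single-error correction property the hints describe.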
3 Linear Independence, Bases, and Coordinates 3.1 Linear (In)dependence 3.1.1
(a) Linearly independent (b) Linearly dependent (c) Linearly independent
(d) Linearly dependent (e) Linearly dependent 3.1.3
We
wanted
to see the last equation
in (3.1) as a linear combination
of
the first three, so we wanted a,b,c so that 2x + $y +z = a(x+ dy) + b(2r+ zy) + c(x + 8z). By equating coefficients of x, y, and z we get a+2b+c=2,
3.1.5
fa+ib=
4, and
8¢ = lL
Take the linear combination of (0, v1, ..., vn) given by 1(0) + 0(v1) + ··· + 0(vn).
(a) The collinear condition is a linear dependence of v and w. If v and w are linearly dependent, then c1v + c2w = 0 for some c1, c2 which are not both zero. Divide by whichever is nonzero and solve for that vector in terms of the other.
(b) (1, 0), (1, 1), (0, 1)
Suppose there are scalars so that 0 = c1Tv1 + ··· + cnTvn = T[c1v1 + ··· + cnvn]. By injectivity, this means c1v1 + ··· + cnvn = 0, and since the vi are linearly independent, this means all the ci are 0.
3.1.11  If the matrix is upper triangular with nonzero entries on the diagonal, it is already in REF, and each diagonal entry is a pivot.

3.1.13  Use Algorithm 3.4.

3.1.15  Consider D : C∞(ℝ) → C∞(ℝ) defined by [Df](x) = f′(x).

3.1.17  Scalar multiples of eigenvectors are eigenvectors (with the same eigenvalue).

3.1.19  T is injective iff ker T = {0}. Now apply Proposition 3.1.
3.2 Bases 3.2.1
(a) Yes. (b) No. (c) Yes. (d) Yes.
1 3.2.3
(a)
0
+121]
1 (d)
4
—2 —4
o({3}
4
1
—3
0
—1
0
boldly
0 |
My}
2 0
, 1
2 1
3
1
0
_oytefofe}—o
(a)
fa —2| 1
2
0
(b)b
1
5 | 1
0
,
3 |-3 0
1
2 1] (c)
°
l
o}
| | o >|
;1
[-1
oofte N
afi }-a[
2
3fi] as
,, —
a
a
(d)
ee Lolnw
|
I
,
2 |-3
0
Ee wl OwH
217 ]+z [4] 3f2],5f4
32.9
Every element of C can be uniquely written as a + ib for some a,b € R.
3.2.11
B
32:13
For 1 Si 1, so rankA < 4.
3.4.13
If rank T = dim V, then by the Rank-Nullity Theorem, null T = 0 and so T is injective. If rank T = dim(range T) = dim W, then range T = W and so T is surjective.
3.4.15
If T has m distinct nonzero eigenvalues λ1, ..., λm, then the corresponding eigenvectors v1, ..., vm are linearly independent. range T contains λ1v1, ..., λmvm, none of which are zero, and so rank T ≥ m.
3.4.17
3.4.19
Row-reducing A is the same whether we do it in F or in K. The rank of A is the number of pivots and the nullity is the number of free variables, so these numbers are the same regardless of whether we view A as an element of M_{m,n}(F) or M_{m,n}(K).

(a) Let aj be the jth column of A; since aj ∈ C(A), there are coefficients b1j, ..., brj so that aj = b1j c1 + ··· + brj cr, which is the same as saying aj = Cbj, where bj = [b1j ··· brj]ᵀ. This means A = CB.
(b) By Corollary 2.32, rank Aᵀ = rank(BᵀCᵀ) ≤ rank(Bᵀ). By Theorem 3.32, rank(Bᵀ) ≤ r since Bᵀ ∈ M_{n,r}(F).
(c) Applying the previous part twice, rank Aᵀ ≤ rank A and rank A = rank((Aᵀ)ᵀ) ≤ rank Aᵀ, and so rank Aᵀ = rank A.

3.4.21  Let A ∈ M_{m,n}(F). If n > m then ker A ≠ {0}, and if m > n then C(A) ≠ Fᵐ.
Elf] Ls aE «Ez
3.5 Coordinates
1
if 353
7
iJ-
>
1
() 5
1
1
i
1
0
0
2
-1
|
3
oi]
(vi)
6
B
1
]
(c) | -1
(d) | 0
-3
1
0
1
1
4]
1
(e)
3
10 |oa 0
(J
wif)
Nien
1)
oo
35.7
(b)
Ve
©)
0 -4
1
°
°
(iv) | 3
2
—2
oe
1 01-1
of
(iii) | 1
xin
(a)|3 0
=u
-3
1-1 a0 02
fi 3.5.5
3
(v) [:
oii
0
i
Nien
[2
wo Al wi 5 | aBo
2)
[3
o|
[1
(a) (| , 0 } 0
0
10
3.5.11
[T]=]9 Lo
(m]
3.5.15
(ii) Not in P (ii) [3]
fe)
100
fe}
0
020
oO
1
9] eMny1n(R); [DT]=]}°
0
1
0
o10
0
=| 0
2
0
diaglay!,...,a; 1) 100 [P]le=]0 1 0 00
9 0
000
0
3.5.13
) @) ]
0} n
e Mn-1n-1(R).
Oo
3
0} EMnn(R); n
3.5.17
C1 [ulg = |
C1 tacenesnssonnal|
toto
Cn
3.5.19
Ch,
If S and T are both diagonalized by B = (v1, ..., vn), then for each i ∈ {1, ..., n} there are λi and μi so that Svi = λi vi and Tvi = μi vi. Then STvi = λi μi vi, and so ST is diagonalized by B.
3.6 Change of Basis 3.6.1
(a) [Nae=
3] 5
ik (a,c = ; [:
) Mes = ; [ [1
(Mse=5]-2
2
0
-6
-1
32
-3 -1],Mlen=|2
Oo
-1
1
-2
1
N
-4
-2
NN
0 [:] fi ] (ii 2] (iv) :] ) [| 3.65
fi (@Mse=|-1
1 0
of] ifs 1 |, Mes=>]1 1
‘]
-1
1
)
1
~ 1 5
wy Gi) 5
1
=
om iii)
=] (iv)
9 3
1
M>
1
3.67
(a) Mae = i
‘} Wes =[ ; zh Mee =
10
|3 2 0
-8
Wee
3
-2
=]12
-4
3
9
-3
2
1
0
11
,
|)
0
1
1
|3
0}
fi) | 0
Oo
-4
-1
ol
1
‘|
)
[:
—1
]
-1
-1
vi
E
1
oO | fii)
Jo
2
0
1
seo ef? dJovnee-[3) fm, SJL tiv) :
3
1-1
Oo
0
0
1
4]
1
1
1
2] 0
3.6.11
me (GE)ob adel: SJeab 0
3.6.13
(se =]0
oo
1
0 2
|, ]e53=}0
1 o
0 3
3.6.15
The traces are different.
3.6.17
A has two distinct eigenvalues (1 and 2).
3.6.19  A has no real eigenvalues (any eigenvalue λ has to satisfy λ² = −1).

3.6.21  tr R = 1

3.6.23  If T has n distinct eigenvalues λ1, ..., λn, then V has a basis B = (v1, ..., vn) of eigenvectors: Tvi = λi vi. Then [T]_B = diag(λ1, ..., λn).
3.6.25  A ~ [1 0; 0 2] = D since A has distinct eigenvalues 1 and 2. But sum A = 4 ≠ 3 = sum D.

3.6.27  (a) Take S = I. (b) We have S and T invertible such that B = SAS⁻¹ and C = TBT⁻¹, so C = TSAS⁻¹T⁻¹ = (TS)A(TS)⁻¹ and thus C ~ A.
A is diagonalizable if and only if it is similar to a diagonal matrix D, which by Theorem 3.54 means that there is a basis B of Fⁿ such that D = [A]_B. By Proposition 3.45, this means the elements of B are eigenvectors of A and the diagonal entries are the corresponding eigenvalues. So A = SDS⁻¹ with S = [I]_{B,E}, whose columns are exactly the elements of B expressed in the standard basis.
3.7 Triangularization 3.7.1
a (s
i A
, ) (-([}]}) )
Pup
1 fol)y, 0.
() | -2,{
1 }o})} 0)
].13.4
1 15 0
@y1y]
far]
2 ay
dy.
0
-1 1 y 0)
Lat
16 [e4 | 25 10
1 0 dG )
(a) Diagonalizable (b) Diagonalizable (c) Not diagonalizable (d) Diagonalizable

Show that a lower triangular matrix is upper triangular in the basis (en, ..., e1).

Use the fact that the eigenvalues of an upper triangular matrix are exactly the diagonal entries. If A and B are upper triangular, what are the diagonal entries of A + B and AB?

The eigenvalues of an upper triangular matrix are exactly the diagonal entries, so A has n distinct eigenvalues, and hence a basis of eigenvectors.
3.7.11  Since F is algebraically closed, T can be represented in some basis by an upper triangular matrix. Prove that if A is upper triangular, then p(A) is upper triangular with diagonal entries p(a11), ..., p(ann). Then apply Theorem 3.64.

3.7.15  Consider the polynomial p(x) = x² − 2.
4 Inner Products 4.1 Inner Products 4.1.3 4.1.5
(a) —} (b) 13 (a) (ReA,ImA) = 5 [|All — |A*Iz — 2ilm (A, A*),]
4.1.7
= 1 [tr(AA*) — tr(A*A) — 21m tr(A’)] = 0 (b) Apply Theorem 4.4. (a) Vz (b) Vz_()0
4.1.9
(v1,02)7
llaf+ bgll = Varlifll? + Wligl? — 2ab (Fg) = Vxr(a + 0?) is
trivially
a
(v) +02,3)p
scalar;
=
by linearity of T; homogeneity
(v1, 03)7 + (v2,03)7
(T(vi+12),Tv3)
=
is similar. Symme-
try and nonnegativity are easy. Definiteness requires the injectivity of T: if (v, v) 7 = 0, then Tv = 0, which means v = 0 since T 4.1.11
(a) Let T(x, x2) =
4.1.13
IfA= [:
is injective.
(4, 272). (b) (1, 1) and (1, —1) (c) (2, 1) and (2, —1)
| and D = [: ‘I then A ~ D (why?), but ||Al]> = V3 and
Dip = V2. 4.1.15
4.1.17
4.1.19
Let x = (a),...,@y) € R” and y = (bj,...,6n) € R". Then the result is immediate from the Cauchy-Schwarz inequality with the standard inner
product on R". Let x = (a), /2a2,...,./iay) and y = (61... 4) and apply the Cauchy-Schwarz inequality. The properties mostly follow easily from the properties of the original inner product over C. For definiteness, observe that (v,v)
4.1.21
is necessarily
real, so if (v,v)p = 0, then (v, v) = 0. If w = 0, then any v € V is already orthogonal to w and so taking u = v
and any choice of a € F works.
4.2 Orthonormal Bases 4.2.1
In each case, the list is the appropriate length, so it’s enough to check that each vector has unit length and each pair is orthogonal.
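The check described here can be automated: a list of real vectors is orthonormal exactly when VᵀV = I, where V has the vectors as columns. A small sketch (for the real case; over ℂ one would use the conjugate transpose):

```python
import numpy as np

def is_orthonormal(vectors, tol=1e-10):
    # V^T V has the inner products <v_i, v_j> as its entries, so it equals
    # the identity iff each vector has unit length and each pair is orthogonal.
    V = np.column_stack(vectors)
    return np.allclose(V.T @ V, np.eye(V.shape[1]), atol=tol)
```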
5
4.2.3
a]
— 29.
#] w}
~Vi5
i
2
Yin
%}o@l/*#=—-|@) = Lee 5
mee ve
by
marr
°% 2)
4)
is mapped
to
7 sin(nx), wed cos(x), i cos(2x), A
is orthonormal and D maps it to
(0 a cos(x), # cos(2x),..., ir cos(nx), — 7 sin(x), — wd sin(2.x),
sin(nx)). The singular values are thus 0) = 0) = n, 03 = 04 =
+36 “i
n—1,...,02n—1
= O2n =
1,02n41
= 0. The original basis gives the left
singular vectors (rearranged to get the singular values in decreasing order) and the orthonormal basis with negative cosines gives the right singular vectors.
5.1.7
If v
is
are
the
a
unit
eigenvector
right
VERA (re)
singular
σ1 ≥ ··· ≥ σn > 0 such that Tej = σj fj for each j. Then T⁻¹fj = σj⁻¹ ej for each j, so (fn, ..., f1) are the right singular vectors of T⁻¹, (en, ..., e1) are the left singular vectors, and the singular values are σn⁻¹ ≥ ··· ≥ σ1⁻¹ > 0.

5.1.11  Use Theorem 4.28.

5.1.13  k is the number of distinct nonzero singular values; V1 is the span of the right singular vectors that get scaled by σ1, V2 is the span of those right singular vectors scaled by the second-largest singular value, and so on. V0 = ker T.

5.1.15  Show that (f1, ..., fk) is an orthonormal basis of range T.
5.2 Singular Value Decomposition of Matrices

5.2.1  (a) 3, 1  (b) 6, 2  (c) 2√2, 2  (d) 2√2, 2, 0  (e) 2, 1, 1

5.2.3  The eigenvalues of A_z are just 1 and 2; ‖A_z‖²_F = σ1² + σ2² = 5 + 2z², so the singular values must depend on z.
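The relationship used in these answers is that the singular values of a real matrix A are the square roots of the eigenvalues of AᵀA; a quick numerical confirmation (with an arbitrary example matrix) is:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # any real matrix works here
sv = np.linalg.svd(A, compute_uv=False)          # descending singular values
ev = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # descending eigenvalues of A^T A
assert np.allclose(sv, np.sqrt(ev))
```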
If A= UXV*, then A* = VE'U*, and for 1 , 385 n, 378 U, 378 3, 386 V, 386 €, 5, 378
{v,w), 226
\lal|, 267 in an inner product space, 229
(v1, 025.245 Up), 53 diag(d),...,dn), 68
A‘, 97 Al, 96 Aq',97
Alle, 234
c, 378
[Allop, 272
&, 378 V,51
T, 311
C, 40, 46
trA, 208 trT, 209 aij, 10
F,39
F°, 56 F", 52, 59 F2, 41, 46 Q, 41, 46 R, 5, 46 R", 25, 35
p(T), pA), 217
C(A), 116 dim V, 164
Eig, (7), Eig, (A), 120
Z,41 Cla,b], 50
kerT, kerA, 118 null T, null A, 175 rankT, rankA, 173
Dia, b], 57
n-dimensional, 164
L(V, W), 64 L(V), 64 Minn), 44
Mm,n(R), 10 Py(F), 57
co, 56 Imz, 382 Rez, 382 |z|, 226, 382
Pojijr 102 Qe,is 102 Rij, 102 (Tp y,By» 187 (Tay,Bys 195 [v]-B, 185, 195 L' norm, 268 €) norm, 267
f(A), 380 f:X— Y,379
€% norm, 267 Spy 351 4,351 sgn(o), 352
fog, 381
det(A), 336
fT}, 381 xr f(x), 380
Ajj, 339
Z, 226,
382
Pata), 358
Index
absolute value (of a complex number), 226,
and inverse matrix, 372
382 addition mod p, 49 adjoint operator, 311, 311-318 adjugate, 372, 373 affine subspace, 123, 181 algebraically closed, 217, 221 alternating, 335, 344 argument (of a complex number), 383 augmented matrix, 11, 9-11, 45
and solution of linear systems, 370-373 and volume, 366, 373
computation by row operations, 349, 354 existence and uniqueness, 339-344 expansion along a column, 349, 354 expansion along a row, 346, 354 product of eigenvalues, 345 sum over permutations, 353, 354
determinant function, 339
diagonal matrix, 68, 191 back-substitution, 14
diagonalizable, 192, 195, 209
base field, 51
map, 192
basis, 150, 150-162
matrix, 204
bijective, 78, 380
binary code, 129 linear, 130, 136
unitarily, 320 differentiation operator, 86, 88 dimension,
164, 162-172
direct sum, see orthogonal direct sum Cauchy-Schwarz inequality, 232, 235
Cayley-Hamilton Theorem, 362, 364, 365 change of basis matrix, 199, 209 characteristic polynomial, 358, 358-364
discrete Fourier transform, 285 division, 40
domain, 64, 379
Cholesky decomposition, 330
eigenspace, 120, 120-122, 125
circulant matrix, 330 closed under addition, 55 closed under scalar multiplication, 55
eigenvalue, 69, 69-73, 75, 122 and determinant, 362 and trace, 362
codomain, 64, 115, 379
geometric multiplicity, 214, 365
coefficient matrix, 10, 116, 120
multiplicity, 361, 360-362, 364
cofactor matrix, 372
of a diagonal matrix, 71, 122 of a self-adjoint map linear map, 314 of a unitary matrix, 286 of an upper triangular matrix, 215, 220, 361
collinear, 38, 148
column space, 116, 125 column vector, 25 companion matrix, 366 complex conjugate, 226, 382 composition, 380
condition number, 274, 287, 310
conjugate transpose, 97, 227, 234 consistent linear system, 7, 7, 18, 21 contrapositive, 386 coordinate representation of a linear map, 187 ofa vector, 185 coordinates, 185, 185-199
in orthonormal bases, 241-244, 247
of similar matrices, 207
of transpose matrix, 179 root of characteristic polynomial, 358 eigenvector, 69, 69-73, 75 linear independence of eigenvectors, 146
of commuting linear maps, 325
orthogonality of, for self-adjoint linear maps, 314 orthogonality of, for normal matrices, 331 element (ofa set), 378 elementary matrices, 103 encoding function, 129 linear, 130
counterexample, 386 Courant-Fischer min-max principle, 332
encoding matrix, 130
Cramer's rule, 370, 373
entry (ofa matrix), 10, 44
error propagation, 273, 273 determinant, 336, 333-377
error-correcting code, 133, 136
error-detecting code, 131, 136 extending by linearity, 156, 159
kernel, 118,
118-120,
125
Laplace expansion, 354
feasible production plan, 74 field, 39, 39-49
finite-dimensional, 150
four fundamental subspaces, 315 free variable, 16, 21
Frobenius inner product, 234, 235 Frobenius norm, 234
along a column, 349 along a row, 346 LDU decomposition, 114 least squares, 259, 259-260, 262 length
in a normed space, 273 in an inner product space, 235
function, 379
linear combination, 26, 35, 53
function space, 57
linear constraints, 181, 181-182
functional calculus, 324 Gaussian elimination, 12, 21
Gram-Schmidt process, 244, 244-247
Hadamard’s inequality, 355 Hamming
code,
134,
134-136
Hermitian matrix, 314, 318
homogeneous linear system, 6, 54
Householder matrix, 285
identity matrix, 68 identity operator, 64 image, 115, 380 imaginary part, 382 inclusion map, 319 inconsistent linear system, 7 infinite-dimensional, 150
injective, 380 linear map, 120, 125 inner product, 226, 225-238
inner product space, 227, 225-238 integral kernel, 87 integral operator, 87, 88 intersection, 378
Linear Dependence Lemma, 145, 145-150 linear map, 64, 63-90
diagonalizable, 192 linear operator, see linear map linear regression, see least squares linear system of equations, 44, 44-49 matrix-vector form, 73, 73-75
over R, 5, 2-7 vector form, 27-28
linear transformation, see linear map linearly dependent, 141, 140-150 linearly independent, 142, 140-150 logical connectives, 384 low-rank approximation, 303, 303-308 lower triangular matrix, 101, 107, 221 LU decomposition,
107, 107-110
LUP decomposition, magnitude,
29
matrix, 21, 44 as a linear map, 67-69
diagonalizable, 204 over R, 10
matrix exponential, 324
matrix decompositions Cholesky, 330
invariant of a matrix, 207, 206-209
LDU, 114
invariant subspace, 69 inverse matrix, 97, 97-107
LU, 107, 107-110
computing via determinants, 372 computing via row operations, 105, 110 invertible, 98 isometry, 276, 276-288 isomorphism, 78, 78-80, 88 isoscopic, 333
110
LUP, 110 QR, 283, 283-284
Schur, 327, 327-329
singular value, see also singular value decomposition, 297
spectral, see also Spectral Theorem, 322 matrix invariant, see invariant of a matrix
matrix multiplication, 91, 90-100, 139 Jordan measurable, 367
in coordinates,
193
matrix of a linear map in £(F",F™), 83, 83-86, 88 with respect to a basis, 187, 195
matrix-vector multiplication, 67
modulus, 226, 382
multilinear, 334, 344
multiplication mod p, 49
multiplication operator, 86, 88
negation, 387
norm, 29, 101, 267
  L¹, 268
  ℓ¹, 267
  ℓ∞, 267
  Frobenius, 234
  in an inner product space, 229
  operator, 271, 272, 269-273
  spectral, 271
  strictly convex, 276
  supremum, 268
normal matrix, 325, 329
normal operator, 325, 329
normed space, 267, 266-274
null space, see kernel
nullity, 175, 175, 182
one-to-one, 380
one-to-one correspondence, 380
onto, 380
operator, see linear map
operator norm, 271, 269-273
  of a matrix, 272
orthogonal, 229, 235
orthogonal complement, 252, 261
orthogonal direct sum, 254
orthogonal matrix, 281, 284
orthogonal projection, 75, 255, 255-262
  algebraic properties, 255
  geometric properties, 258
orthonormal, 239
orthonormal basis, 239, 239-252
overdetermined linear system, 20
parallelogram identity, 268, 273
parity bit code, 131
parity-check matrix, 131, 136
permanent, 356
permutation, 280, 351, 354
permutation matrix, 109, 351
perpendicular, 226, 229, 235
Perspectives
  bases, 223
  determinants, 376
  eigenvalues, 223, 376
  isometries, 288
  isomorphisms, 223, 377
  matrix multiplication, 139
pivot, 15, 18-20
pivot variable, 16, 21
polarization identities, 238, 277
positive definite matrix, 323
positive homogeneity, 229
positive semidefinite matrix, 330
proof by induction, 388
pseudoinverse, 310
QR decomposition, 283, 283-284
quantifier, 386
range, 115, 115-118, 125, 379
rank, 173, 172-175, 182
Rank-Nullity Theorem, 175, 175-182
Rat Poison Principle, 6, 7
real part, 382
recipe, 390
reduced row-echelon form (RREF), 15, 21
row operations, 12, 11-14, 21
row rank, 174, 182
row space, 174
row vector, 94
row-echelon form (REF), 15, 21
scalar, 25, 51
scalar multiplication, 51
  in ℝⁿ, 25
Schur decomposition, 327, 327-329
self-adjoint, 314
set, 378
sign of a permutation, 352
similar matrices, 203, 209
singular matrix, 98
singular value decomposition
  computing, 316, 318
  geometric interpretation, 301-303
  of a map, 289, 289-295
  of a matrix, 297, 297-309
singular values
  computing, 299
  of a map, 289, 295
  of a matrix, 299, 308
  uniqueness, 293-295
singular vectors
  of a map, 289
  of a matrix, 299
solution (of a linear system), 44
  over ℝ, 3, 5
  via determinants, 370-373
solution space, 123-125
span, 26, 35, 53, 150
spectral decomposition, 322
spectral norm, see operator norm
Spectral Theorem, 320-329
  for Hermitian matrices, 321
  for normal maps and matrices, 326
  for self-adjoint maps, 321
spectrum, 321
stable rank, 310
standard basis, 68, 150
standard basis vectors of ℝⁿ, 26
strictly upper triangular matrix, 222
subfield, 60
subset, 378
  proper, 378
subspace, 55, 59
subtraction, 40
supremum norm, 268
surjective, 380
SVD, see singular value decomposition
symmetric group, 351
symmetric matrix, 234, 314, 318
trace, 60, 208, 209
transpose, 96, 100
triangle inequality, 267
  in an inner product space, 232, 235
triangularization, 219, 219-221
underdetermined linear system, 20
union, 378
unique solution, 7, 7, 21
unit vector, 229
unitarily invariant norm, 287
unitary matrix, 281, 284
upper triangular linear system, 19, 48
upper triangular matrix, 101, 107, 215, 215-216
Vandermonde determinant, 357
vector, 51
  over ℝ, 25
vector addition, 51
vector space, 51, 49-62
  complex, 51
  real, 51
vector sum in ℝⁿ, 26
volume, 366