An Introduction to Linear Algebra

"A one-semester linear algebra course. Topics include matrices, vector spaces, inner products and determinants, lin

903 52 1MB

English Pages 136 Year 2015

Report DMCA / Copyright

DOWNLOAD FILE

Polecaj historie

An Introduction to Linear Algebra

  • Commentary
  • Original source: https://www.ams.org/open-math-notes/omn-view-listing?listingId=110759

Table of contents :
To the Student
Vectors and Vector Spaces
Linear Transformations
Matrix Representation
Matrix Multiplication
Composition of Linear Transformations
Geometry of Linear Transformations
Diagonalization
Structural Summary
Organization of the Book
Table of Contents
Matrix Algebra
Matrix Multiplication
Systems of Linear Equations
Criteria for Invertibility
Exercises
Vector Spaces
Vector Space Axioms
Subspaces
Spans, Sums of Subspaces
Bases and Dimension
Exercises
Euclidean Geometry
Inner Products
Orthogonal Projection
The Determinant
Exercises
Linear Transformations
Linear Transformations
The Space of Linear Transformations
The Rank-Nullity Theorem
Exercises
Diagonalization
Linear Operators
Eigenvalues and Eigenvectors
The Characteristic Polynomial
Symmetric Operators
Applications
Exercises
Index

Citation preview

An Introduction to Linear Algebra Andrew D. Hwang February 2015

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

To the Student Linear algebra comprises a variety of topics and viewpoints, including computational machinery (matrices), abstract objects (vector spaces), mappings (linear transformations), and the “fine structure” of linear transformations (diagonalizability). Like all mathematical subjects at an introductory level, linear algebra is driven by examples and comprehended by unfamiliar theory. Each of the preceding topics can be difficult to assimilate until the others are understood. You, the reader, consequently face a chickenand-egg problem: Examples appear unconnected without a theoretical framework, but theory without examples tends to be dry and unmotivated. This preface sketches an overview of the entire book by looking at a family of representative examples. The “universe” is the Cartesian plane R2 , whose coordinates we denote (x1 , x2 ). (The use of indices instead of the more familiar (x, y) will economize our use of letters, particularly when we begin to study functions of arbitrarily many variables. The use of superscripts as indices rather than as exponents highlights important, subtle structure in formulas.) We view ordered pairs as individual entities, and write x = (x1 , x2 ).

Vectors and Vector Spaces We view an ordered pair x as a vector, a type of object that can be added to another vector, or multiplied by a real constant (called a scalar ) to obtain another vector. If x = (x1 , x2 ) and y = (y 1 , y 2 ), and if c is a scalar, we define x + y = (x1 + y 1 , x2 + y 2 ),

cx = (cx1 , cx2 ).

The set R2 equipped with these operations is said to be a vector space. The vector x = (x1 , x2 ) in the plane may be viewed geometrically as the arrow with its tail at the origin 0 = (0, 0) and its tip at the iii

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

iv

LINEAR ALGEBRA

point x. Vector addition corresponds to forming the parallelogram with sides x and y, and taking x + y to be the far corner. Scalar multiplication cx corresponds to “stretching” x by a factor of c if c > 0, or to stretching by a factor of |c| and reversing the direction if c < 0. y = (y 1, y 2) c1 x + y x+y

c2 x −x (0, 0) c1 x

x = (x1 , x2 )

Linear Transformations In linear algebra, most mappings send vector spaces to vector spaces. The special properties dictated by linear algebra may be written ) S(x + y) = S(x) + S(y), for all vectors x, y, all scalars c. S(cx) = cS(x) For technical convenience, these conditions are often expressed as a single condition S(cx + y) = cS(x) + S(y)

for all vectors x, y, all scalars c.

A mapping S satisfying this conditions is called a linear transformation. Geometrically, if x and y are arbitrary vectors and c is a scalar, so that cx + y is the far corner of the parallelogram with sides cx and y, then S(cx+y) = cS(x)+S(y) is the far corner of the parallelogram with sides S(cx) = cS(x) and S(y). A linear transformation S therefore maps an arbitrary parallelogram to a parallelogram in an obvious (and restrictive) sense. y

S(cx + y)

S(y)

S(x + y)

cx + y x+y 0 0

cx x

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

S(cx) S(x)

TO THE STUDENT

v

The special vectors e1 = (1, 0) and e2 = (0, 1) are the standard basis of R2 . Every vector in R2 can be expressed uniquely as a linear combination: x = (x1 , x2 ) = (x1 , 0) + (0, x2 ) = x1 (1, 0) + x2 (0, 1) = x1 e 1 + x2 e 2 . Formally, a linear transformation “distributes” over an arbitrary linear combination. In detail, if S : R2 → R2 is a linear transformation, repeated application of the defining properties gives S(x) = S(x1 e1 + x2 e2 ) = S(x1 e1 ) + S(x2 e2 ) = x1 S(e1 ) + x2 S(e2 ). This innocuous equation expresses a remarkable conclusion: A linear transformation S : R2 → R2 is completely determined by its values S(e1 ), S(e2 ) at two vectors.

Matrix Representation To study vector spaces and linear transformations in greater detail, we will represent vectors and linear transformations as rectangular arrays of numbers, called matrices. The first chapter of the book introduces matrix notation, a central piece of computational machinery in linear algebra. Here we focus on motivation and geometric intuition, using the special case of linear transformations from the plane to the plane. We use the notational convention in which vectors are written as columns:  1 x 1 2 x = (x , x ) = 2 . x

With this notation, a linear transformation S : R2 → R2 is completely and uniquely specified by scalars Aji satisfying  1  1 A2 A1 1 2 1 2 . , S(e2 ) = A2 e1 + A2 e2 = S(e1 ) = A1 e1 + A1 e2 = 2 A22 A1 The (standard ) matrix of S assembles these into a rectangular array A:  1    A1 A12 A = S(e1 ) S(e2 ) = . A21 A22

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

vi

LINEAR ALGEBRA

The real number Aij in the ith row and jth column of A is called the (i, j)-entry, and encodes the dependence of the ith output variable on the jth input variable.

Matrix Multiplication For all x, we have S(x) = x1 S(e1 ) + x2 S(e2 )  1  1 1   1 A1 x + A12 x2 2 A2 1 A1 = . +x =x A21 x1 + A22 x2 A22 A21 The expression on the right may be interpreted as a type of “product” of the matrix of S and the column vector of x:  1  1 1     1 y A1 x + A12 x2 A1 A12 x1 = , = y2 A21 A22 x2 A21 x1 + A22 x2 or simply y = Ax. The second equality defines the product of the “2 × 2 square matrix A” and the “2 × 1 column matrix x”. Particularly when the number of variables is large, sigma (summation) notation comes into its own, both condensing common expressions and highlighting their structure. The relationship between the j inputs the outputs y i of a linear transformation may be written Px and i j i y = j Aj x .∗ Composition of Linear Transformations

If T : R2 → R2 is a linear transformation with matrix   1 B1 B21 , B= B12 B22  the composition T ◦ S : R2 → R2 , defined by (T ◦ S)(x) = T S(x) , is easily shown to be a linear transformation (check this). The associated Physicists sometimes go further, omitting the summation sign and implicitly summing over every index that appears both as a subscript and as a superscript: y i = Aij xj . This book does not use this “Einstein summation convention”, but the possibility of doing so explains our use of superscripts as indices, despite the potential risk of reading superscripts as exponents. Exponents appear so rarely in linear algebra that mentioning them at each occurrence is feasible. ∗

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

TO THE STUDENT

vii

matrix is a “product” of the matrices B and A of the transformations T and S. To determine the entries of this product, note that by definition, T (e1 ) = B11 e1 + B12 e2 ,

S(e1 ) = A11 e1 + A21 e2 ,

T (e2 ) = B21 e1 + B22 e2 ,

S(e2 ) = A12 e1 + A22 e2 .

Consequently, T S(e1 ) = T (A11 e1 + A21 e2 ) = A11 T (e1 ) + A21 T (e2 ) = A11 (B11 e1 + B12 e2 ) + A21 (B21 e1 + B22 e2 ) = (B11 A11 + B21 A21 )e1 + (B12 A11 + B22 A21 )e2 ; similarly (check this), T S(e2 ) = (B11 A12 + B21 A22 )e1 + (B12 A12 + B22 A22 )e2 . Since the coefficients of T (e1 ) give the first column of the matrix of T S and the coefficients of T (e2 ) give the second column of the matrix, we are led to define the matrix product by   1   1 1  B1 B21 A11 A12 B1 A1 + B21 A21 B11 A12 + B21 A22 BA = = . B12 B22 A21 A22 B12 A11 + B22 A21 B12 A12 + B22 A22 This forbidding collection of formulas is clarified by summation notation: X Bki Akj . (BA)ij = k

The preceding equation has precisely the same form for matrices of arbitrary size, and furnishes our general definition of matrix multiplication. (Convince yourself that the entries of BA are given by the preceding summation formula.) When working computationally with specific matrices, the formula is generally less important than the procedure encoded by the formula. First, define the “product” of a “row” and a “column” by       a 0 a0 . 0 ba + b b b = a0

Now, to find the (i, j)-entry of the product BA, multiply the ith row of B by the jth column of A. For example, to find the entry in the first row and second column of BA, multiply the first row of B by the second column of A.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

viii

LINEAR ALGEBRA

Geometry of Linear Transformations Consider the linear transformation S that rotates R2 about the origin by π6 , and T that shears the plane horizontally by one unit: T (e2 )

S(e2 ) S(e1 )

e2 0

0 e1

0

T (e1 )

The matrix of each may be read off the images of the standard basis vectors. Thus, √      3 −1 cos π6 cos 2π 1 3 √ , =2 A = S(e1 ) S(e2 ) = sin π6 sin 2π 3 1 3     1 1 . B = T (e1 ) T (e2 ) = 0 1

The composite transformations T S (rotate, then shear) and ST (shear, then rotate) are linear, and their matrices may be found by matrix multiplication: √   √ √ √ 3+1 3√− 1 3 √3 − 1 1 1 , AB = 2 . BA = 2 3 3+1 1 1

Note carefully that T S 6= ST . Composition of linear transformations, and consequently multiplication of square matrices, is generally a noncommutative operation.

ST (e2 ) T S(e2 ) T S(e1 ) 0

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

ST (e1 ) 0

TO THE STUDENT

ix

Diagonalization The so-called identity matrix 

1 0 0 1



corresponds to the identity mapping I(x) = x for all x in R2 . Generally, if λ1 and λ2 are real numbers, the diagonal matrix   1 λ 0 0 λ2 corresponds to axial scaling, (x1 , x2 ) 7→ (λ1 x1 , λ2 x2 ). Diagonal matrices are among the simplest matrices. In particular, if n is a positive integer, the nth power of a diagonal matrix is trivially calculated: n  1 n  1  λ 0 (λ ) 0 . = 0 λ2 0 (λ2 )n The solution to a variety of mathematical problems rests on our ability to compute arbitrary powers of a matrix. We are naturally led to ask: If S : R2 → R2 is a linear transformation, does there exist a coordinate system in which S acts by axial scaling? This question turns out to reduce to existence of scalars λ1 and λ2 , and of non-zero vectors v1 and v2 , such that S(v1 ) = λ1 v1 ,

S(v2 ) = λ2 v2 .

Each λi is an eigenvalue of S; each vi is an eigenvector of S. A pair of non-proportional eigenvectors in the plane is an eigenbasis for S. A linear transformation may or may not admit an eigenbasis. The rotation S of the preceding example has no real eigenvalues at all. The shear T has one real eigenvalue, and admits an eigenvector, but has no eigenbasis. The compositions T S and ST both turn out to admit eigenbases.

Structural Summary Basic linear algebra has three parallel “levels”. In increasing order of abstraction, they are: (i) “The level of entries”: Column and matrices written out P vectors i i j as arrays of numbers. (y = j Aj x .)

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

x

LINEAR ALGEBRA

(ii) “The level of matrices”: Column vectors and matrices written as single entities. (y = Ax.) (iii) “The abstract level”: Vectors (defined axiomatically) and linear transformations (mappings that distribute over linear combinations). (y = S(x).) Linear algebra is, at heart, the study of linear combinations and mappings that “respect” them. Along your journey through the material, strive to detect the levels’ respective viewpoints and idioms. Among the most universal idioms is this: A linear combination of linear combinations is a linear combination. Matrices are designed expressly to handle the bookkeeping details. In summation notation at level (i), if i

z =

m X

Bki y k

k

and y =

n X

Akj xj ,

j=1

k=1

then substitution of the second into the first gives ! m m n n n X X X X X zi = Akj xj = Bki Akj xj = Bki (BA)ij xj . k=1

j=1

j=1

k=1

j=1

Once we establish properties of matrix operations, the preceding can be distilled down to an extremely simple computation at level (ii): If z = By and y = Ax, then z = B(Ax) = (BA)x.

Organization of the Book The first chapter introduces real matrices as formally and quickly as feasible. The goal is to construct machinery for flexible computation. The book proceeds to introduce vector spaces and their properties, two auxiliary pieces of algebraic machinery (the dot product, and the determinant function on square matrices, each of which has a useful geometric interpretation), linear transformations and their properties, and diagonalization. Of necessity, the motivation for a particular definition may not be immediately apparent. At each stage, we are merely generalizing and systematizing the preceding summary. It may help to review this preface periodically as you proceed through the material.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

Contents 1 Matrix Algebra 1.1 Matrix Multiplication . . . . 1.2 Systems of Linear Equations 1.3 Criteria for Invertibility . . Exercises . . . . . . . . . . . 2 Vector Spaces 2.1 Vector Space Axioms . . . 2.2 Subspaces . . . . . . . . . 2.3 Spans, Sums of Subspaces 2.4 Bases and Dimension . . . Exercises . . . . . . . . . . 3 Euclidean Geometry 3.1 Inner Products . . . . 3.2 Orthogonal Projection 3.3 The Determinant . . . Exercises . . . . . . . .

. . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . .

1 3 9 16 19

. . . . .

23 23 30 35 39 51

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

53 53 57 64 71

4 Linear Transformations 4.1 Linear Transformations . . . . . . . . 4.2 The Space of Linear Transformations 4.3 The Rank-Nullity Theorem . . . . . . Exercises . . . . . . . . . . . . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

73 73 79 86 91

. . . .

95 95 96 100 105

. . . .

. . . .

. . . .

. . . .

5 Diagonalization 5.1 Linear Operators . . . . . . . 5.2 Eigenvalues and Eigenvectors 5.3 The Characteristic Polynomial 5.4 Symmetric Operators . . . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

. . . .

xi

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

. . . .

. . . .

xii

LINEAR ALGEBRA 5.5

Applications . . . . . . . . . . . . . . . . . . . . . . . . . 108 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

Index

119

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

Chapter 1 Matrix Algebra Definition 1.1. Let m and n be positive integers. An m × n matrix is a rectangular array having m rows and n columns:   A11 A12 . . . A1n  A2 A2 . . . A2  n  1 2  .. ..  , .. . .  . . .  . m m A1 A2 . . . Am n

or simply [Aij ] for brevity if the size is implicit (or unimportant). The (i, j) entry Aij is written in the ith row and jth column. If A = [Aij ] and B = [Bji ] are m × n real matrices, their sum is the m × n matrix defined by A + B = [Aij + Bji ]. If c is real, we define scalar multiplication by cA = c[Aij ] = [cAij ].

Remark 1.2. The superscripts are row indices, not exponents. Explicit mention is made in this book when superscripts do stand for exponents. The idiomatic value of this notational convention will gradually become apparent. When we write [Aij ], the indices i and j are dummies, having no meaning outside the brackets. We may use any convenient letters without changing the meaning, e.g., [Aij ] = [Ak` ], so long as the row index runs from 1 to m and the column index runs from 1 to n. What matters is that an unambiguous matrix entry is associated to each pair of indices. Remark 1.3. In this book, the entries of a matrix are usually real numbers. We may speak of real matrices to emphasize this fact. The set of all m × n real matrices is denoted Rm×n . 1

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

2

LINEAR ALGEBRA

However, the entries of a matrix might be other kinds of numbers (leading us to speak, for example, of integer matrices or complex matrices), or even other matrices (in which case we speak of block matrices). The space of all complex m × n matrices is denoted Cm×n . Definition 1.4. A matrix of size n × 1 is called a column. We denote the set of columns by Rn (read “RN”) rather than by Rn×1 . A matrix of size 1 × n is called a row. We denote the set of rows by (Rn )∗ (read “RN star”) rather than by R1×n . If A = [Aij ] is an m × n matrix, Ai denotes the ith row, an element of (Rn )∗ , and Aj denotes the jth column, in Rm . Definition 1.5. If i and j are integers, the expression ( 1 if i = j, δji = 0 if i 6= j, is called the Kronecker delta-symbol. Example 1.6. The following    1 1 0 0 0  0 1 0 , 0 0 0 1

matrices  0 1 , 0

are all defined by Aij = δji :     1 1 0 0  0 . , 0 1 0 0

Definition 1.7. Let n be a positive integer. For each j = 1, . . . , n, we define ej to be the element of Rn whose ith row is δji . That is, the jth row is 1 and all other entries are 0:       0 0 1 0 1 0       e1 =  ..  , e2 =  ..  , . . . , en =  ..  . . . . 1 0 0 The ordered set (ej )nj=1 is called the standard basis of Rn .

Definition 1.8. Let n be a positive integer. For each i = 1, . . . , n, we define ei to be the element of (Rn )∗ whose jth column is δji . That is, the ith column is 1 and all other entries are 0:       e1 = 1 0 . . . 0 , e2 = 0 1 . . . 0 , . . . , en = 0 0 . . . 1 . The ordered set (ei )ni=1 is called the standard dual basis of (Rn )∗ .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

3

Definition 1.9. Let n be a positive integer. A matrix of size n × n is called a square matrix. The main diagonal of a square matrix is the set of entries (Ajj )nj=1 running from the upper left to the lower right. A square matrix is diagonal if every non-diagonal entry is 0. The diagonal matrix In in Rn×n defined by Aij = δji is called the (n × n) identity matrix.     1 0 , Example 1.10. The identity matrices are I1 = 1 , I2 = 0 1     1 0 ... 0   1 0 0 0 1 0 0  0 1 . . . 0  0 1 0 0   , I 3 =  0 1 0 , I4 =  I =  .. .. . . ..  . n  0 0 1 0 . . . . 0 0 1 0 0 0 1 0 0 ... 1

Remark 1.11. The jth column of In is the standard basis vector ej . In other words, In may be written as a (block) row matrix whose jth column is ej , namely, In = [e1 e2 . . . en ]. Similarly, the ith row of In is ei , so In may be written as a (block) column matrix whose ith row is ei . Definition 1.12. If A = [Aij ] ∈ Rm×n , the transpose of A is the

matrix AT in Rn×m whose (i, j) entry is Aji .  1 2 3 T  1 1  a b a a a Example 1.13. 1 2 3 = a2 b2 , and vice versa. b b b a3 b3

Remark 1.14. The transpose of a row matrix is a column, and vice versa. Particularly, the transpose defines a map from Rn (the set of columns) i i T to (Rn )∗ (the set of rows). For example, eT i = e and (e ) = ei . Generally, the ith row of AT is the transpose of the ith column of A, and the jth column of AT is the transpose of the jth row of A. The transpose of AT is A itself: (AT )T = A. Remark 1.15. If A = [Aij ] and B = [Bji ] are in Rm×n , and if c is real, then (cA + B)T = cAT + B T . This is immediate by comparing entries: (cA + B)ji = cAji + Bij .

1.1

Matrix Multiplication

An m × n real matrix is a “data structure” containing mn real numbers. Matrix addition and scalar multiplication merely “parallelize”

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

4

LINEAR ALGEBRA

real addition and real multiplication. A third operation, matrix multiplication, “mixes” entries. The definition is introduced in two stages. Definition 1.16. Let m, p, and n be positive integers. Assume   v1 v 2    ` = [`k ] = [`1 `2 . . . `p ] ∈ (Rp )∗ , v = [v k ] =  ..  ∈ Rp . . vp

We define the product or dual pairing h | iRp : (Rp )∗ × Rp → R by   v1 p v 2  X   1 2 p h` | viRp = [`1 `2 . . . `p ]  ..  = `1 v + `2 v + · · · + `p v = `k v k . . k=1 vp

Generally, if A = [Aik ] ∈ Rm×p and B = [Bjk ] ∈ Rp×n , their (matrix) product is the m × n matrix C = [Cji ] defined by Cji



i

= A | Bj



Rp

=

Ai1 Bj1

+ Ai2 Bj2

+ · · · + Aip Bjp

=

p X

Aik Bjk .

k=1

That is, Cji is the dual pairing of the ith row of A, an element of (Rp )∗ , and the jth column of B, an element of Rp . Example 1.17. Let n be a positive integer, (ei )ni=1 the standard basis of Rn , and (ej )nj=1 the standard dual basis of (Rn )∗ . For 1 ≤ i, j ≤ n, we have ej ei = δij , a 1 × 1 matrix. (In this “inner” product, p = n.)

Example 1.18. Let m and n be positive integers, (ei )m i=1 the standard m j n basis of R , and (e )j=1 the standard dual basis of (Rn )∗ . For each i = 1, . . . , m and j = 1, . . . , n, the “outer” product which p = 1) is the m × n matrix having a 1 in the 0’s everywhere else:    0 0 ... 0 ... ..  ..   .. . .  .     ei ej = 1 0 . . . 1 . . . 0 =  0 . . . 1 . . . . . ..  ..  ..  . 0 ... 0 ... 0

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

eji = ei ej (for (i, j) entry and  0 ..  .  0 = eji . ..  . 0

CHAPTER 1. MATRIX ALGEBRA

5

Matrix multiplication is associative, has an identity element, commutes with scalar multiplication, and distributes over addition. These properties allow us to work algebraically with matrices as individual entities, often avoiding the messiness of working with large collections of entries. To state these properties precisely requires only that we exercise care regarding sizes of operands. Theorem 1.19. Let m, p, q, n be positive integers, c a real number, k

A = [Aik ] ∈ Rm×p , B = [B`k ], B 0 = [B 0 ` ] ∈ Rp×q , C = [Cj` ] ∈ Rq×n . (i) (AB)C = A(BC); (ii) Im A = A and AIp = A; (iii) A(cB) = c(AB); (iv) A(B + B 0 ) = AB + AB 0 and (B + B 0 )C = BC + B 0 C. Sketch of proof. In each part, it suffices to show that two matrices have the same (i, j) entries for all indices i and j. P P (i). It suffices to prove ` (AB)i` Cj` = k Aik (BC)kj . But q p X X `=1

k=1

Aik B`k

!

Cj` =

p X q X

Aik B`k Cj` =

k=1 `=1

p X

Aik

k=1

q X

B`k Cj`

`=1

!

.

(ii). For 1 ≤ i ≤ m and 1 ≤ j ≤ p, we have (AIp )ij

=

p X

Aik δjk = Aij ;

k=1

because of the factor δjk , only the term with k = j is non-zero. The other identity is similar. (iii).

p X

Aik (cBjk )

k=1

(iv).

p X k=1

=

p X

(cAik )Bjk .

k=1

Aik (Bjk

k + B0j )

=

p X

k

(Aik Bjk + Aik B 0 j ).

k=1

The other distributive law is similar.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

6

LINEAR ALGEBRA

Remark 1.20. In the proof of (ii), multiplying by a Kronecker delta and summing selects one term. This computational idiom is important. Despite the “mixing” of entries in the definition of matrix multiplication, each identity comes down to a corresponding property for ordinary real number arithmetic. Multiplication by a standard row or column “selects” part of A, yielding useful criteria for two matrices to be equal. Corollary 1.21. Let A be an m×n matrix, and let i and j be arbitrary indices with 1 ≤ i ≤ m and 1 ≤ j ≤ n. Viewing ei as a row matrix in (Rn )∗ and ej as a column matrix in Rm : (i) The product ei A is Ai , the ith row of A; (ii) The product Aej is Aj , the jth column of A; (iii) The product ei Aej is Aij , the (i, j) entry of A. Particularly, if Aej = 0m for j = 1, . . . , n, then A = 0m×n . Proof. Since ei is the ith row of In and ej is the jth column of Im , (i), (ii), and (iii) follow immediately from Theorem 1.19 (ii). Finally, if Aej is the zero vector for every j, then every entry of A is zero. Remark 1.22. To summarize, if Ax = 0m for all x in Rn (or even if this equation holds merely for the standard basis vectors), then A is the zero matrix. (The converse is obvious.) Analogously, if y T A = (0n )T for all y in Rn (or even if ei A = (0n )T for i = 1, . . . , m), then A = 0m×n . Corollary 1.23. If A and A0 are m × n matrices, and if Aej = A0 ej for j = 1, . . . , n, then A = A0 . Proof. Apply Corollary 1.21 to the difference A − A0 , and use the distributive law to note that (A − A0 )ej = Aej − A0 ej .

Non-Commutativity of Matrix Multiplication Matrix multiplication is not commutative. Even if A and B are both n × n, so that both products AB and BA are defined and have the same size, it need not be the case that (AB)ij =

n X k=1

Aik Bjk

and (BA)ij =

n X k=1

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

Bki Akj

CHAPTER 1. MATRIX ALGEBRA

7

are equal for all i and j. Example 1.24. The matrices       1 0 0 0 0 1 , satisfy AB = and B = A= 0 0 1 0 0 0

 0 0 . BA = 0 1 

Example 1.25. If A is an n×n matrix, then A commutes with matrices including In , A, A2 , and in general every “polynomial” in A. Theorem 1.26. If A = [Aik ] ∈ Rm×p and B = [Bjk ] ∈ Rp×n , then (AB)T = B T AT as elements of Rn×m . P Proof. Since (AB)ij = k Aik Bjk , the (i, j) entry of (AB)T is (AB)ji

=

p X

Ajk Bik

k=1

=

p X

Bik Ajk

k=1

=

p X

(B T )ik (AT )kj ,

k=1

the (i, j) entry of B T AT . Remark 1.27. A “visual” proof of Theorem 1.26 can be given by considering the following diagrammatic depiction of matrix multiplication: Bj Bj1 Bj2 .. . B

Ai Ai1 Ai2 A

· · · Aip

Bjp

(AB)ij AB

Reflecting the paper across the diagonal line has the effect of transposing and reversing the order of the factors, and of transposing the product.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

8

LINEAR ALGEBRA

Invertibility of Square Matrices Definition 1.28. Let n be a positive integer. An n × n matrix A is invertible if there exists an n × n matrix B such that AB = In and BA = In . Such a matrix B is called an inverse of A. Remark 1.29. Part (i) of the next result shows an invertible matrix has a unique inverse. We may therefore speak of the inverse of A, and denote it A−1 . Note that A−1 is invertible, and (A−1 )−1 = A. Theorem 1.30. Let A, B, and B 0 be n × n matrices. (i) If B and B 0 are inverses of A, then B = B 0 . (ii) If AB = In , and if either A or B is invertible, then BA = In , so B = A−1 . (iii) If A and B are invertible, then their product AB is invertible, and (AB)−1 = B −1 A−1 . (iv) If A is invertible, then AT is invertible, and (AT )−1 = (A−1 )T . Proof. Each assertion is a general consequence of associativity: (i). By hypothesis, BA = In and AB 0 = In . Consequently, B = BIn = B(AB 0 ) = (BA)B 0 = In B 0 = B 0 . (ii). If AB = In and A is invertible, multiplying on the left by A−1 and on the right by A gives In = A−1 In A = A−1 (AB)A = (A−1 A)BA = BA. If instead B is invertible, In = BIn B −1 = B(AB)B −1 = BA. (iii). By definition, an inverse of AB is any matrix C such that (AB)C = In and C(AB) = In : It suffices to check (AB)(B −1 A−1 ) = In and (B −1 A−1 )(AB) = In . By associativity of matrix multiplication,  (AB)(B −1 A−1 ) = (AB)B −1 A−1  = A(BB −1 ) A−1 = (AIn )A−1 = AA−1 = In . The other equality is similar.

(iv). By Theorem 1.26, AT (A−1 )T = (A−1 A)T = InT = In and = (AA−1 )T = InT = In .

(A−1 )T AT

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

1.2

9

Systems of Linear Equations

Definition 1.31. Let A = [Aij ] be a real m × n matrix. The equation Ax = 0m , with x = (x1 , . . . , xn ) unknown, is called a (homogeneous) linear system of m equations in n variables. If b = (b1 , . . . , bm ) 6= 0m is a fixed element of Rm , the equation Ax = b is called a non-homogeneous linear system of m equations in n unknowns. If there exists an x0 such that Ax0 = b, the system is said to be consistent. The matrix A is called the coefficient matrix. The m × (n + 1) block matrix [A b] is called the augmented matrix. The zero vector x = 0n always satisfies the homogeneous system. The general goal with a homogeneous system is to describe the set of solutions, and in particular, to determine whether the system has non-trivial solutions (i.e., solutions other than 0n ). A non-homogeneous system may or may not be consistent. The solution set of a consistent non-homogeneous system is closely related to the soluton set of the associated homogeneous system, see Theorem 1.32. An algorithm for solving linear systems is presented below; the output of the algorithm is an equivalent linear system (i.e., having precisely the same set of solutions) that can be solved (or shown to be inconsistent) by inspection. Theorem 1.32. Let A be an m×n real matrix and b an element of Rm . If there exists a solution x0 , i.e., an element of Rn satisfying Ax0 = b, then Ax = b if and only if x = x0 + xh for some xh in Rn satisfying Axh = 0m . Proof. By hypothesis, Ax0 = b. Let xh be an arbitrary element of Rn . If we write x = x0 + xh , then Ax = b if and only if Ax = Ax0 , if and only if 0m = Ax − Ax0 = A(x − x0 ) = Axh .

Row Reduction Fix an m × n matrix A and a column b in Rm . (We allow b = 0m , and treat the homogeneous and non-homogeneous cases together.) Let Ai denote the ith row of A. The matrix equation Ax = b is equivalent,

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

10

LINEAR ALGEBRA

by equating components, to the system of m scalar equations A1 x = b1 ,

A2 x = b 2 ,

...,

Am x = bm .

We begin by giving precise criteria for a coefficient matrix under which a system can be “solved by inspection”. Definition 1.33. Let A be a real m × n matrix with rows Ai . A row Ai has a leading 1 if the row is not zero, and the first (leftmost) non-zero entry is 1. The matrix A is in row-echelon form if: (i) Every non-zero row of A has a leading 1; (ii) Successive leading 1’s occur further to the right. That is, if the leading 1’s occur in positions (1, k1 ), (2, k2 ), . . . , (`, k` ), then the column indices increase: k1 < k2 < · · · < k` ; (iii) Zero rows (if any) are collected at the bottom. The matrix A is in reduced row-echelon form if A is in row-echelon form, and in addition each leading 1 is the only non-zero entry in its column. Remark 1.34. Though an m × n matrix A has infinitely many rowechelon forms, the reduced row-echelon form of A is unique, see Theorem 1.42. Remark 1.35. If a coefficient matrix is in row-echelon form, the number of leading 1’s is at most min(m, n), the smaller of the number of variables and the number of equations. Definition 1.36. If the leading 1’s in the reduced row-echelon matrix of A have positions (1, k1 ), . . . , (`, k` ), the variables xk1 , . . . , xk` of the system Ax = b are said to be basic; the remaining variables are free. Remark 1.37. A non-homogeneous system Ax = b is inconsistent (has no solution) if and only if the augmented matrix (reduced to rowechelon form) has a leading 1 in the last column; such a row corresponds to the inconsistent equation 0 = 1. When the augmented matrix of a consistent system is in reduced row-echelon form, the basic variables are expressed in terms of the free variables, and the free variables may take arbitrary real values. A homogeneous system has a non-trivial solution if and only if there is at least one free variable. Particularly, if m < n (more variables than equations), a homogeneous system Ax = 0 has a non-trivial solution.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

11

Elementary Row Operations Next we describe a set of transformations that do not change the solution set of a linear system, and suffice to bring the augmented matrix to reduced row-echelon form. I. Add a multiple of one equation to another equation. II. Multiply some equation by a non-zero real number. III. Swap two equations. Symbolically, if i and i0 denote distinct indices, the preceding operations correspond to 0

0

I. Replacing Ai x = bi with (Ai + cAi )x = bi + cbi for some real c. II. Replacing Ai x = bi with cAi x = cbi for some c 6= 0. 0

0

III. Exchanging Ai x = bi and Ai x = bi . Proposition 1.38. Each of the preceding operations preserves the set of solutions. Proof. Operations II. and III. obviously do not change the set of solu0 0 tions. To see the same is true for I., note that if Ai x = bi and Ai x = bi , then 0 0 0 0 (Ai + cAi )x = bi + cbi and Ai x = bi ; that is, every solution of the system before I. is a solution of the system after I. Conversely, if 0

(Ai + cAi )x = bi + cbi

0

0

0

and Ai x = bi ;

then adding −c times the second to the first gives Ai x = bi

0

0

and Ai x = bi ;

that is, every solution of the system after I. is a solution of the system before I. Conceptually, each of the three operations is reversible. A system of equations Ax = b can be recovered from its augmented matrix; there is no need to “carry along” the variables x when calculating. Performed on the augmented matrix, the preceding transformations are called elementary row operations of type I. (add a multiple of

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

12

LINEAR ALGEBRA

one row to another row), II. (multiply a row by a non-zero number), or III. (swap two rows). The row-reduction algorithm has two “stages”, each recursive. The first stage puts the augmented matrix into row-echelon form; the second puts the augmented matrix in reduced row echelon form. 1.0. Find the leftmost column of coefficients that is not 0m ; call this the “active” column. 1.1. Pick a non-zero entry in the active column and use operations of type I. or II. to make that entry equal to 1. Then swap this row with the first row, so the active column has 1 as its topmost entry. 1.2. Use operations of type I. to make every entry in the active column (except for the topmost) equal to 0. Specifically, if the ith row of the active column contains αi , add −αi times the first row to the ith row. 1.3 Mentally cross out the first row. If there are rows remaining, start over at Step 0; otherwise the first stage is complete. The algorithm terminates after finitely many steps because in each “round”, the matrix being modified has one fewer rows and at least one fewer non-zero columns. To put a row-echelon matrix into reduced row-echelon form, we need only make the entries “above” each leading 1 equal to 0. 2.1. Find the rightmost leading 1 and use row operations of type I. to cancel the non-zero entries “above” it (similarly to I.2). 2.2. Mentally cross out the last non-zero row. If there are more leading 1’s, start over at Step 2.1; otherwise the algorithm is complete. Example 1.39. Describe (recalling that superscripts are indices, not exponents) the solution set of the non-homogeneous linear system 3x1 + x2 + 3x3 = −11,

2x1

+ 4x3 = −10,



 3 1 3 −11 . 2 0 4 −10

The augmented matrix at right is read off by inspection. The row reduction algorithm operates on this matrix. Round 1 (Step 0) The first column is not 0, so becomes “active”.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

13

(Step 1) Our goal is to produce a 1 in the first column with elementary row operations. There are at least three strategies: Multiply the first row by 1/3; multiply the second row by 1/2; subtract the second row from the first. As a practical matter, avoiding fractions usually means less work. Since the coefficients in the second row are all even, dividing the second row by 2 avoids fractions; do this, then swap the rows:       3 1 3 −11 21 R2 3 1 3 −11 R1 ↔R2 1 0 2 −5 −→ −→ . 2 0 4 −10 1 0 2 −5 3 1 3 −11 (Step 2) Subtract multiples of the first row from the remaining row(s) to “clear” the active column:     2 −5 1 0 2 −5 R2 −3R1 1 0 . −→ 4 3 1 3 −11 0 1 −3 (Step 3) Mentally cross out the first row, look at the remaining system of (m − 1) = 1 equation(s) in (n − 1) = 2 variables, and start over:   0 1 −3 4 . x2 − 3x3 = 4,

Round 2 (Step 0) The (new) first non-zero column becomes active. (Step 1) The “active” coefficient is already 1, and there are no additional rows, so the first stage is complete. In fact, the augmented matrix is already in reduced row echelon form, so nothing needs to be done for the second stage. We have shown the original system is equivalent to (has the same set of solutions as)   x1 + 2x3 = −5, 1 0 2 −5 . 4 0 1 −3 x2 − 3x3 = 4, The basic variables are x1 and x2 , while x3 is free. To describe an arbitrary solution, write each basic variable in terms of the free variable(s):  1       3 x −5 − 2x −5 −2 1 3 x = −5 − 2x , x2  =  4 + 3x3  =  4 + x3  3 . x2 = 4 + 3x3 , 0 1 x3 x3

On the right, the general solution of the original system is in “parametric form”. To each real value of the “parameter” x3 there corresponds a solution x of the original system.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

14

LINEAR ALGEBRA

Remark 1.40. In the preceding example, x0 = (−5, 4, 0) is a “particular” solution, and every scalar multiple of (−2, 3, 1) is a homogeneous solution xh , as you should check, compare Theorem 1.32. Example 1.41. Find the general solution of 3x1 + 12x2 + x3 + x4 + 3x5 = 7, 2x1 + 8x2 + x3 + x4 + x5 = 5, x1 + 4x2 + 3x3 + x4 − x5 = 5,

−x1 − 4x2 + 3x3 + The row reduction  3 12  2 8   1 4 −1 −4 R2 −2R1 ,  R3 −3R1 , 1 R4 +R1  0

− 2x5 = 1.

algorithm applied to the augmented matrix gives:    1 1 3 7 1 4 3 1 −1 5 1 1 1 5 8 1 1 1 5 R1 ↔R3   2   −→  3 12 1 1 3 1 −1 5 3 7 3 0 −2 1 −1 −4 3 0 −2 1

 4 3 1 −1 5 0 −5 −1 3 −5  −→  0 0 −8 −2 6 −8 6 0 0 6 1 −3   R3 −4R2 , 1 4 3 1 −1 5 0 0 1 0 0 1 R4 −6R2   −→  0 0 0 1 −3 0 0 0 0 1 −3 0   1 4 3 0 2 5 0 0 1 0 0 1 R1 −R3   −→  0 0 0 1 −3 0 0 0 0 0 0 0



R2 +R4 , 1 − 21 R3 0

4 0 −→  0 0 0 0  1 4 4 3 0 0 R −R  −→  0 0 0 0  1 4 0 0 R1 −3R2  −→  0 0 0 0

3 1 4 6 3 1 0 0 0 1 0 0

 1 −1 5 0 0 1  1 −3 4 1 −3 6  1 −1 5 0 0 1  (∗) 1 −3 0 0 0 0  0 2 2 0 0 1 . 1 −3 0 0 0 0

The starred matrix is in row-echelon form, with leading 1’s in positions (1, 1), (2, 3), and (3, 4). The basic variables are therefore x1 , x3 , and x4 ; the remaining variables, x2 and x5 , are free. To parametrize the solution space, write x1 = 2 − 4x2 − 2x5 , x3 = 1, x4 = 3x5 , so  1         x 2 − 4x2 − 2x5 −2 −4 2 2  x2          x  3   0  1  0 5 2 x  =   = 1 + x  0 + x  0 . 1  4         5 x    3  0  0 3x 5 5 1 0 0 x x |{z} | {z } x0

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

xh

CHAPTER 1. MATRIX ALGEBRA

15

Here again, x0 = (2, 0, 1, 0, 0) is a particular solution, and the remaining portion is the general homogeneous solution.

Uniqueness of Reduced Row-Echelon Form Theorem 1.42. Let A be an m × n matrix. If A01 and A02 are matrices obtained by putting A into reduced row-echelon form, then A01 = A02 . Proof.∗ If an m × (k + 1) matrix A is in reduced row-echelon form, then the m × k matrix B obtained by removing the (k + 1)th column is in reduced row-echelon form: Removing the last column cannot create a leading 1 or remove a row of 0’s. We proceed by induction on the number of columns. If A has one column, the theorem is obvious. Assume inductively for some k ≥ 1 that every m × k matrix has a unique reduced row-echelon form. Let A be m × (k + 1), and suppose A01 and A02 are reduced row-echelon forms of A. Let B, B10 and B20 be the m × k matrices obtained by removing the (k + 1)th column of A, A01 and A02 , respectively. Performing a sequence of row operations and then removing the (k + 1)th column has the same effect as removing the (k + 1)th column and performing the same sequence of row operations. Consequently, B10 and B20 are reduced row-echelon forms of B, hence are equal by the inductive hypothesis. It suffices to prove A01 and A02 have the same (k + 1)th column. Because A01 and A02 are obtained from A by elementary row operations, the homogeneous systems A01 x = 0, and A02 x = 0 have the same solution sets. Let ` be the number of leading 1’s in B10 , i.e., the number of non-zero rows after deleting the (k + 1)th column of A01 . If A01 has a leading 1 in the (k + 1)th column, then the (` + 1)th row of A01 x = 0 reads xk+1 = 0. Since A02 x = 0 has the same solution set, A02 also has a leading 1 in the (k + 1)th column, necessarily in the (` + 1)th row. Since each leading 1 is the only non-zero entry in its column, A01 = A02 . (This is the only step where the hypothesis of reduced row-echelon form is used.) If instead A01 does not have a leading 1 in the (k + 1)th column, then there exists an x = (x1 , . . . , xk , 1) satisfying A01 x = 0 = A02 x, and therefore (A01 − A02 )x = 0. However, the first k columns of A01 − A02 ∗

Adapted from Thomas Yuster, The Reduced Row Echelon Form of a Matrix is Unique: A Simple Proof, Mathematics Magazine, 57, 2 (1984), 93–94.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

16

LINEAR ALGEBRA

are zero, so 0m = (A01 − A02 )x is the (k + 1)th column of A01 − A02 , and again A01 = A02 . In either case, we have shown that a general m × (k + 1) matrix A has a unique reduced row-echelon form. This establishes the inductive step, and completes the proof.

1.3

Criteria for Invertibility

The basic properties of invertibility “at the level of matrices” are given by Theorem 1.30. In practice, we need algorithmic criteria for detecting invertibility of specific matrices, and methods for calculating inverses of invertible matrices. These algorithms necessarily operate “at the level of entries”. A 1×1 matrix is essentially a real number, A = [A11 ], and is invertible if the single entry is non-zero, in which case A−1 = [1/A11 ]. A general 2 × 2 matrix admits a similar criterion for invertibility and a formula for the inverse: Proposition 1.43. A 2 × 2 matrix A = [Aij ] is invertible if and only if the quantity ∆ = A11 A22 − A21 A12 is non-zero, in which case  A11 A12 , A= A21 A22 

−1

A

  1 A22 −A12 . = A11 ∆ −A21

Proof. The rows are proportional if and only if ∆ = 0, if and only if the row-echelon form has a row of zeros, if and only if A is not invertible. The formula is proven by verifying AA−1 = I2 and A−1 A = I2 . Remark 1.44. This formula for the inverse of a 2 × 2 matrix should be memorized. It may help to write     1 a b d −b −1 , A = A= . a c d ad − bc −c The quantity ∆, the determinant of A, is studied in Chapter 3. There are analogous formulas for larger matrices, but they are prohibitively complicated for practical use. Instead we develop a rowreduction algorithm for inverting an n × n matrix (or detecting that the matrix is not invertible).

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

17

Elementary Matrices Definition 1.45. Let A be an m × n matrix. An m × m matrix E is an elementary matrix if the product EA gives the result of performing an elementary row operation on A. We will show by construction that elementary matrices exist for each type of row operation. The following “principle” is useful for writing down elementary matrices in practice. The proof is immediate from the definition by taking A = Im . Proposition 1.46. An elementary matrix is obtained by performing the corresponding row operation on the identity matrix Im , and is invertible. Example 1.47. The following 3 × 3 matrices (i) add c times the third row to the second (R2 + cR3 ); (ii) multiply the second row by c (cR2 ); and (iii) swap the second and third rows (R2 ↔ R3 ).       1 0 0 1 0 0 1 0 0  0 0 1 . 0 c 0 , 0 1 c  , 0 1 0 0 0 1 0 0 1

To see these matrices have the expected effect, multiply by a general matrix A, expressed as a block matrix whose entries are rows, e.g.,   1  1   1     A A A A1 1 0 0 1 0 0 0 0 1 A2  = A3  . 0 1 c  A2  = A2 + cA3  ; 0 1 0 0 0 1 A3 A2 A3 A3

To construct elementary matrices for each type of row operation, m ` m let (ek )m k=1 and (e )`=1 be the standard basis of R and the standard m ∗ dual basis of (R ) . If i and j are indices between 1 and m, recall that eji = ei ej is the m × m matrix with a 1 in the (i, j) entry and 0’s elsewhere. Remark 1.48. The product ej A is the jth row of A, and eji A = ei ej A is the m × n matrix whose ith row is the jth row of A, and whose other entries are zero. I. If c is real, then E = Im + ceji implements the type I. elementary row operation Ri + cRj : If A is an arbitrary m × n matrix, then (Im + ceji )A = A + ceji A is the result of adding c times the jth row of A to the ith row of A.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

18

LINEAR ALGEBRA The diagonal entries Ekk are all 1, the (i, j) entry is c, and all other entries of E are 0. The inverse operation is to subtract c times the ith row from the jth row: E −1 = (Im + ceji )−1 = Im − ceji . Since (eij )(eij ) = 0m×m , the inverse relationship (Im + ceji )(Im − ceji ) = Im is a difference of squares factorization, compare Exercise 1.4.

II. If c 6= 0 is real, then E = Im + (c − 1)eii implements the type II. elementary row operation cRi : (Im + (c − 1)eii )A is the result of multiplying the ith row of A by c. The matrix E is diagonal, with Eii = c and all other diagonal entries Ekk equal to 1. The inverse operation 1c Ri is to multiply the ith row by 1/c. III. If i 6= j, then E = Im − (ei − ej )(ei − ej ) implements the Type III. elementary row operation Ri ↔ Rj : EA is the result of swapping the ith and jth rows of A. τ (k)

If τ = (i j) is the transposition exchanging i and j, then E = [δk ]: the (k, k) entry is 1 if k 6= i and k 6= j, while the (i, j) and (j, i) entries are 1, and all other entries are 0. This matrix is its own inverse.

An Algorithm for Matrix Inversion A sequence of row operations reducing a matrix A to echelon form A0 may be viewed as a factorization A0 = Er Er−1 . . . E1 A, or after rear−1 Er−1 A0 , of A into a product of elementary ranging, A = E1−1 . . . Er−1 matrices and a single row-echelon matrix. This observation leads to a criterion for invertibility, and to an algorithm for computing A−1 . Theorem 1.49. If A is an m × m matrix, the following are equivalent: (i) A can be row-reduced to the identity matrix Im . (ii) A is invertible. Proof. If A can be row-reduced to the identity matrix, then A is a product of elementary matrices, and hence invertible.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

19

Inversely, if A cannot be reduced to the identity matrix, then the reduced row-echelon form of A has fewer than m leading 1’s, and therefore has a row of 0’s. It follows that the homogeneous system Ax = 0m has a non-trivial solution x0 . If B is an arbitrary m × m matrix, then (BA)x0 = B(Ax0 ) = B0m = 0m 6= x0 , so BA 6= Im . That is, A has no inverse. The algorithm for computing A−1 uses this result. First form the m × (2m) matrix [A Im ] by “augmenting” with the identity matrix. Now row reduce the left-hand block, applying the row operations all the way across. If the left-hand block is row-reduced to Im , the right-hand block is A−1 ; if the left-hand block acquires a row of 0’s, then A is not invertible. This works because a sequence of row operations reducing A to the identity may be viewed as a factorization     Er Er−1 . . . E1 A Im = Im B . Equality of the right-hand block says B = Er Er−1 . . . E1 ; equality of the left-hand block says BA = Im . But B is a product of elementary matrices, and therefore invertible. By Theorem 1.30 (ii), AB = Im , so B = A−1 .

Exercises Exercise 1.1. There are five elementary row operations on 2×2 matrices. List them, and for each, write down the corresponding elementary matrix E and its inverse E −1 , and multiply matrices to check that EE −1 = I2 . (Four of the “E”s will contain an arbitrary constant c.) Exercise 1.2. Write down the 4 × 4 elementary matrices for the indicated operations. (a) Adding c times the first or third row to the second row, or to the fourth row. (Four matrices in all.) (b) Exchanging the first, second, or third row with the fourth row. Exercise 1.3. Use the row-reduction algorithm to derive the formula for the inverse of a 2 × 2 matrix. (For simplicity, you may use the notation of Remark 1.44, and assume a 6= 0.)

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

20

LINEAR ALGEBRA

Exercise 1.4. If i, j, k, and ` are indices between 1 and m, show that ( e`i if j = k, j ` j ` ei ek = δk ei = m×m 0 if j 6= k. Referring to Example 1.24, use this formula to compute AB and BA, noting that A = e12 and B = e21 . Suggestion: Use the results of Examples 1.17 and 1.18. Exercise 1.5. Suppose A = [Aij ] is an m × n matrix, x = [xj ] is an ordered n-tuple of “inputs”, and y = Ax = [y i ] is the corresponding ordered m-tuple of “outputs” upon multiplication by A. Show that the entry Aij is the “coefficient of sensitivity of y i with respect to xj ”, in the sense that if the jth input xj is incremented by an amount ∆xj (and the remaining inputs are held fixed), then the ith output y i changes by ∆y i = Aij ∆xj . Exercise 1.6. Let A be m × p and B be p × n. (a) If the ith row of A is 0, what (if anything) can be deduced about the product AB? What if the jth column of A is 0? (b) If the ith row of B is 0, what (if anything) can be deduced about the product AB? What if the jth column of B is 0? P j j P j j Exercise 1.7. If A = j b ej are n × n diagoj a ej and B = nal matrices, prove AB = BA. (Part of the exercise is to cope with reindexing, but be sure you understand this result conceptually.) Exercise 1.8. An n × n matrix A is symmetric if AT = A. Prove that if A and B are symmetric n × n matrices, then AB is symmetric if and only if BA = AB. Hint: Use Theorem 1.26. Exercise 1.9. If A = [Aij ] and B = [Bji ] are n × n matrices, their commutator is defined to be [A, B] = AB −BA. In particular, A and B commute if and only if [A, B] = 0n×n . (a) Show that the (i, j) entry of [A, B] is [A, B]ij

=

n X k=1

(Aik Bjk − Akj Bki ).

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 1. MATRIX ALGEBRA

21

(b) Show that if A, B1 , and B2 are n × n matrices and c is real, then [A, cB1 + B2 ] = c[A, B1 ] + [A, B2 ]. Suggestion: Work at the level of matrices, not at the level of entries. Exercise 1.10. Let A = [Aij ] be an n × n matrix, and e`k = ek e` the n × n matrix with a 1 in the (k, `) entry and 0’s elsewhere. (a) Show that [A, e`k ] = 0n×n if and only if Aik δj` = A`j δki for all i and j. Hint: The (i, j) entry of e`k is δki δj` .

(b) Show that AB = BA for every B in Rn×n if and only if A is a scalar matrix, i.e., there exists a real number c such that A = cIn . Hint: One direction is easy. For the converse, use part (a) and the fact that A commutes with every e`k . Exercise 1.11. Let A = [Aij ] be an m × n matrix, and let Aj denote the jth column. (a) Write down three types of of “elementary column operation” analogous to elementary row operations. (b) For each type of column operation, find an n × n matrix E such that AE is the result of performing that operation on A. (c) Does there exist an m × m matrix E such that EA implements an elementary column operation? Explain. Exercise 1.12. Let n ≥ 1 be an integer. Show that the set GL(n, R) of invertible, n × n real matrices is a group under matrix multiplication.∗ Suggestion Use Theorem 1.30. Exercise 1.13. Let σ be a bijection of the set {1, 2, . . . , n}, i.e., a permutation. The matrix Pσ =

n X

ei eσ(i) =

i=1

n X

eσ−1 (j) ej ,

j=1

whose (i, j) entry is i δσ(i)

=

σ −1 (j) δj

=

(

1 if j = σ(i), 0 if j 6= σ(i),



The notation GL(n, R) stands for the general linear group of size n, with real entries.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

22

LINEAR ALGEBRA

and which therefore has precisely one 1 in every row and column, is the associated permutation matrix. (a) Write out the two 2 × 2 permutations and the associated matrices, and the six 3 × 3 permutations and associated matrices. P P (b) If x = j xj ej , show that Pσ x = j xσ(j) ej . Verify explicitly (by multiplying matrices) for the six 3 × 3 permutation matrices. (c) Prove that Pσ is invertible, and that (Pσ )−1 = Pσ−1 = PT σ. (d) Prove that if σ and τ are permutations, then Pτ Pσ = Pτ σ . (That is, the mapping σ 7→ Pσ is an injective group homomomorphism from the symmetric group to GL(n, R).) Exercise 1.14. Let A = [Aij ] be 2 × 2, b = (b1 , b2 ), x = (x1 , x2 ), and 0 = 01×2 . Evaluate the product  1   1    x A1 A12 b1 x A b 2 2 2 2 A1 A2 b  x  = 0 1 1 1 0 0 1

in two ways: (i) as an ordinary product, and (ii) as a “product of block matrices”. Show the results are compatible in an obvious sense.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

Chapter 2 Vector Spaces 2.1

Vector Space Axioms

Definition 2.1. A real vector space (V, +, ·) comprises a non-empty set V , a binary operation + : V × V → V , and a scalar multiplication map · : R × V → V satisfying, for all v1 , v2 , and v3 in V and all real numbers c1 , c2 , (i) (Associativity of +) (v1 + v2 ) + v3 = v1 + (v2 + v3 ); (ii) (Commutativity of +) v2 + v1 = v1 + v2 ; (iii) (Identity element) There exists a zero vector 0 in V such that 0+v =v =v+0

for all v in V ;

(iv) (Additive inverses) For every v in V , there exists an element −v such that v + (−v) = 0 = (−v) + v. (v) (“Associativity” of ·) (c1 c2 ) · v = c1 · (c2 · v); (vi) (Left distributivity) (c1 + c2 ) · v = c1 · v + c2 · v; (vii) (Right distributivity) c · (v1 + v2 ) = c · v1 + c · v2 ; (viii) (Normalization) 1 · v = v. Remark 2.2. The pair (V, +) is an Abelian group, on which the real numbers “act” by multiplication. Generally, scalars may (for example) be complex numbers, and one might speak of a complex vector space. In this book, vector spaces are real unless the contrary is stated explicitly. 23

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

24

LINEAR ALGEBRA

Remark 2.3. The operations + and · are part of the definition of a vector space. However, it is often convenient (and not ambiguous) to speak of “a vector space V ”, suppressing explicit mention of the operations. Remark 2.4. For emphasis, or when considering more than one vector space, we may write 0V to signify the zero vector of V . Proposition 2.5. Let (V, +, ·) be a vector space. For every v in V : (i) 0 · v = 0; (ii) (−1) · v = −v. Proof. Since 0 + 0 = 0, the left distributive law gives 0 · v + 0 · v = (0 + 0) · v = 0 · v. Canceling 0 · v gives 0 · v = 0. Now, since 0 = 1 + (−1), the left distributive law gives  0 = 0 · v = 1 + (−1) · v = 1 · v + (−1) · v.

But 1 · v = v by normalization, so 0 = v + (−1) · v. By definition of additive inverses, −v = (−1) · v.

Remark 2.6. The scalar multiplication dot is often omitted in practice. When calculating, we freely use 1v = v, 0v = 0, and (−1)v = −v.

Function Spaces Example 2.7. Let X be a non-empty set. The set F(X, R) of all real-valued functions on X becomes a vector space under “pointwise addition and scalar multiplication”, i.e.,  (f + g)(x) = f (x) + g(x), (cf )(x) = c f (x) . The zero vector is the function whose value at each point is 0; the additive inverse of a function f is the function (−f )(x) = −f (x). The axioms are immediate; a function f with domain X is an “X-tuple” of real numbers, i.e., a collection of real numbers f (x) “indexed by” the points of X, and each vector space axiom corresponds to a property of addition or multiplication of real numbers.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 2. VECTOR SPACES

25

For illustration, here is a proof of the right distributive law. If f1 and f2 are arbitrary functions on X and c is a real number, then the value of c(f1 + f2) at x is

    (c(f1 + f2))(x) = c((f1 + f2)(x))        Defn. of scalar multiplication
                    = c(f1(x) + f2(x))       Defn. of vector addition
                    = cf1(x) + cf2(x)        Distrib. law for real numbers
                    = (cf1)(x) + (cf2)(x)    Defn. of scalar multiplication
                    = ((cf1) + (cf2))(x)     Defn. of vector addition.

Since the functions c(f1 + f2) and (cf1) + (cf2) have the same value at every x in X, they are the same function.
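The pointwise operations are easy to model in a language with first-class functions. Here is a small Python sketch (illustrative only; the sample functions and test points are arbitrary) that represents elements of F(X, R) as ordinary functions and spot-checks the right distributive law.

    # Elements of F(X, R) modeled as Python functions; + and scalar multiplication act pointwise.
    def add(f, g):
        return lambda x: f(x) + g(x)

    def scale(c, f):
        return lambda x: c * f(x)

    f1 = lambda x: x * x
    f2 = lambda x: 3 * x + 1
    c = 2.5

    lhs = scale(c, add(f1, f2))             # c (f1 + f2)
    rhs = add(scale(c, f1), scale(c, f2))   # (c f1) + (c f2)
    print(all(abs(lhs(x) - rhs(x)) < 1e-12 for x in [-2.0, 0.0, 1.5, 7.0]))   # True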

Example 2.8. If N = {0, 1, 2, 3, . . . } denotes the set of natural numbers, we write F(N, R) = Rω . An element of Rω is effectively an infinite ordered list of real numbers (an )∞ n=0 , i.e., a real sequence.

Column Vectors and Matrices

Example 2.9. Let n be a positive integer, X = {1, 2, . . . , n} a set with n elements. Because a function f : X → R is essentially an ordered n-tuple of real numbers f (1), . . . , f (n), we denote F(X, R) by Rn . We write elements of Rn as “column vectors”. The vector space operations are defined componentwise:

    [x1; . . . ; xn] + [y1; . . . ; yn] = [x1 + y1; . . . ; xn + yn],    c[x1; . . . ; xn] = [cx1; . . . ; cxn],

or simply [xj ] + [y j ] = [xj + y j ], c[xj ] = [cxj ] for brevity. The zero vector of Rn is denoted 0n . Remark 2.10. The superscripts denote row indices, not exponents. Remark 2.11. When we denote an element of Rn by [xj ], “j” is a dummy index, having no meaning outside the brackets. We may use any convenient letter without changing the meaning: [xi ] = [xj ] = [xα ], etc. When we speak of a component xj , by contrast, j has a specific (possibly implicit) numerical value between 1 and n.


Example 2.12. If X = {1, 2, . . . , m} and Y = {1, 2, . . . , n}, the Cartesian product X × Y consists of all ordered pairs (i, j) with 1 ≤ i ≤ m and 1 ≤ j ≤ n. A real-valued function f : X × Y → R is essentially a doubly-indexed family of numbers f (i, j) = Aij , namely, a real m × n matrix. That is, F(X × Y, R) = Rm×n . Addition and scalar multiplication of functions correspond to the “entrywise” operations of matrix addition and scalar multiplication, so (Rm×n , +, ·) is a vector space. The zero matrix 0m×n in Rm×n is the identity element for addition. The negative of a matrix A = [Aij ] is −A = [−Aij ].

Polynomials

Example 2.13. Let n be a non-negative integer, t an indeterminate, and let tn denote the nth power of t. If a0 , a1 , . . . , an are real numbers, the expression p(t) = a0 + a1 t + a2 t2 + · · · + an tn is called a polynomial with coefficients a0 , . . . , an . The degree of p is the largest index with non-zero coefficient. If all coefficients are 0, we define the degree to be −∞. We define polynomial addition and scalar multiplication by adding or multiplying coefficients. That is, if q(t) = b0 + b1 t + · · · + bn tn , and if c is real, we define

    (p + q)(t) = (a0 + b0) + (a1 + b1)t + · · · + (an + bn)tn ,
    (cp)(t) = ca0 + ca1 t + ca2 t2 + · · · + can tn .

The set of all polynomials of degree at most n is denoted Pn . It is straightforward (if somewhat tedious) to check all the vector space axioms. When we study “subspaces”, we will see that only two conditions require verification, and these are encoded by the fact that a sum or scalar multiple of polynomials is itself a polynomial. Example 2.14. If a0 , a1 , . . . , an and b1 , . . . , bn are real numbers, the expressions C(t) = a0 + a1 cos t + a2 cos(2t) + . . . + an cos(nt), S(t) = b1 sin t + b2 sin(2t) + . . . + bn sin(nt),


are called the cosine polynomial with coefficients a0 , . . . , an and the sine polynomial with coefficients b1 , . . . , bn . Two cosine polynomials are added, and a cosine polynomial is multiplied by a scalar, as if they were functions on X = R. The set of cosine polynomials turns out to be a vector space under these operations; the proof boils down to the fact that a sum of cosine polynomials is a cosine polynomial, and a scalar multiple of a cosine polynomial is a cosine polynomial. Corresponding remarks for sine polynomials are true.
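On a computer, a polynomial is conveniently stored as its list of coefficients, and the vector space operations act entry by entry. A minimal Python sketch (not part of the text; it assumes the two coefficient lists have already been padded to the same length):

    # p(t) = a0 + a1 t + ... + an t^n is stored as [a0, a1, ..., an].
    def poly_add(p, q):
        return [a + b for a, b in zip(p, q)]

    def poly_scale(c, p):
        return [c * a for a in p]

    p = [1, 0, -2, 5]          # 1 - 2t^2 + 5t^3
    q = [4, 3, 1, 0]           # 4 + 3t + t^2
    print(poly_add(p, q))      # [5, 3, -1, 5], i.e. 5 + 3t - t^2 + 5t^3
    print(poly_scale(2, p))    # [2, 0, -4, 10]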

The Geometry of Vector Operations

The vector space (R2 , +, ·) may be viewed as the Cartesian plane, a plane equipped with a distinguished pair of perpendicular lines (the coordinate axes) meeting at the origin. A point of R2 is an ordered pair x = (x1 , x2 ), and is located by treating x1 as a horizontal (east-west) position and x2 as a vertical (north-south) position. The origin is the zero vector, 0 = (0, 0).

Figure 2.1: Vector addition and scalar multiplication in R2 . As a vector, x may be viewed as an arrow with tail at 0 and tip at x. A vector sum x+y is visualized by the parallelogram law : construct the parallelogram with a vertex at 0 and having sides x and y; the fourth corner is x + y. Scalar multiplication cx is visualized by “scaling” x


by a factor of c.

Example 2.15. In Figure 2.1, x = (2, 1) and y = (−1, 1/2) are plotted on a Cartesian grid of unit squares, along with several sums of scalar multiples. For example, x + y = (1, 3/2) and x + 2y = (0, 2).

Example 2.16. Analogous pictures can be drawn in more abstract vector spaces. For example, let V = F(R, R) be the set of real-valued functions on R, regarded as a vector space with ordinary addition and scalar multiplication of functions. The functions f1 (x) = cos2 x, f2 (x) = sin2 x, g1 (x) = 1, and g2 (x) = cos(2x) may each be viewed as an arrow, and any two of these functions “span a plane”. The trigonometric identities

    cos2 x + sin2 x = 1,    cos2 x − sin2 x = cos(2x)

may be interpreted schematically as vector sums. In particular, all four functions lie in a plane in V , since g1 = f1 + f2 and g2 = f1 − f2 . The identities cos2 x = (1/2)(1 + cos 2x) and sin2 x = (1/2)(1 − cos 2x) may also be interpreted using this diagram. (Check the details yourself.)
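The two displayed relations can be spot-checked numerically; the short Python sketch below (illustrative only, using an arbitrary grid of sample points) verifies g1 = f1 + f2 and g2 = f1 − f2.

    import math

    f1 = lambda x: math.cos(x) ** 2
    f2 = lambda x: math.sin(x) ** 2
    g1 = lambda x: 1.0
    g2 = lambda x: math.cos(2 * x)

    xs = [0.37 * k for k in range(20)]
    print(all(abs(f1(x) + f2(x) - g1(x)) < 1e-12 for x in xs))   # True: g1 = f1 + f2
    print(all(abs(f1(x) - f2(x) - g2(x)) < 1e-12 for x in xs))   # True: g2 = f1 - f2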

Linear Combinations

Definition 2.17. Let (V, +, ·) be a vector space, and v1 , v2 , . . . , vp be distinct elements of V . If x1 , x2 , . . . , xp are real numbers, the expression

    x1 v1 + x2 v2 + · · · + xp vp = Σ_{j=1}^p xj vj ,

representing an element of V , is called the linear combination of the vj with coefficients xj . We say the linear combination is non-trivial if at least one coefficient xj is non-zero. Remark 2.18. If a coefficient is zero, the corresponding summand may as well be omitted. With this convention, a “non-trivial” linear combination may be regarded as a “non-empty” linear combination, i.e., as having at least one summand. An empty linear combination has value 0V .


Example 2.19. If x = [xj] ∈ Rn , and (ej)_{j=1}^n denotes the standard basis of Rn (Definition 1.7), then

    x = Σ_{j=1}^n xj ej ,   i.e.,   [x1; x2; . . . ; xn] = x1 [1; 0; . . . ; 0] + x2 [0; 1; . . . ; 0] + · · · + xn [0; 0; . . . ; 1].

In words, every column x in Rn is assembled as a linear combination from the standard basis using the components of x as coefficients. Multiplying by a standard dual basis element ek selects the kth component. By Theorem 1.19,

    ek x = ek ( Σ_{j=1}^n xj ej ) = Σ_{j=1}^n xj ek ej = Σ_{j=1}^n xj δjk = xk .

(Note the final step; summing xj δjk over j selects the term j = k.)

Example 2.20. Every row ` = [`i] in (Rn)∗ may be written as a linear combination Σ_i `i e^i of the standard dual basis (e^i)_{i=1}^n .

Example 2.21. If A = [A1 . . . An] is an m × n matrix (partitioned into n columns, each an element of Rm) and x = [xj] is an element of Rn , then the matrix product Ax = Σ_j xj Aj , an element of Rm , is precisely the linear combination of the columns of A with coefficients xj .
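A quick numerical illustration of Example 2.21, assuming NumPy is available (the matrix and vector are arbitrary samples):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [3., -1., 4.]])            # columns A_1, A_2, A_3
    x = np.array([2., -1., 0.5])

    combo = x[0] * A[:, 0] + x[1] * A[:, 1] + x[2] * A[:, 2]   # x^1 A_1 + x^2 A_2 + x^3 A_3
    print(np.allclose(A @ x, combo))          # True: Ax is exactly that linear combination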

Example 2.22. Recall that the product eji = ei e^j in Rm×n has a 1 in the (i, j) entry and 0’s elsewhere. If A = [Aij] is an arbitrary element of Rm×n , then

    A = Σ_{i=1}^m Σ_{j=1}^n Aij ei e^j = Σ_{i=1}^m Σ_{j=1}^n Aij eji ;

compare Example 2.19. In particular, In = Σ_{i=1}^n ei e^i = Σ_{i=1}^n eii in Rn×n .

Remark 2.23. A computational proof of the preceding claim illustrates a couple of algebraic idioms. Let A′ denote the sum on the right. Since j is a dummy index in the expression for A′ , we may call it k. Now, if j is an arbitrary index, j = 1, . . . , n, then

    A′ ej = ( Σ_{i=1}^m Σ_{k=1}^n Aik ei e^k ) ej = Σ_{i=1}^m Σ_{k=1}^n Aik ei δjk = Σ_{i=1}^m Aij ei = Aej .

By Corollary 1.23, A′ = A.


2.2 Subspaces

Definition 2.24. Let (V, +, ·) be a vector space, and let W ⊆ V be a non-empty subset. If x + y ∈ W for all x and y in W , then W is closed under addition. If cx ∈ W for all x in W and all real c, then W is closed under scalar multiplication. If (W, +, ·) is a vector space, we say W is a (vector ) subspace of V . Remark 2.25. If a non-empty set W in a vector space (V, +, ·) is closed under scalar multiplication, then 0V ∈ W : By hypothesis there is some x in W , so 0x = 0V ∈ W . Further, W contains the additive inverse of each of its elements: If x ∈ W , then (−1) · x = −x ∈ W . Theorem 2.26. Let (V, +, ·) be a vector space, W ⊆ V non-empty. The following are equivalent: (i) W is closed under addition and under scalar multiplication. (ii) For all x and y in W and all real c, cx + y ∈ W . (iii) W is a subspace of (V, +, ·). Proof. ((i) if and only if (ii)). Assume W is closed under addition and scalar multiplication. If x and y are elements of W and c is real, then cx ∈ W since W is closed under scalar multiplication, so cx + y ∈ W since W is closed under addition. Conversely, if Condition (ii) holds, then taking c = 1 shows W is closed under addition, while taking y = 0 shows W is closed under scalar multiplication. ((i) if and only if (iii)). If W is closed under addition in V , then the addition operator + is a binary operation on W itself, and is, a fortiori, associative and commutative on W . If W is closed under scalar multiplication, W contains the identity element of V as well as the additive inverse of each of its elements, by Remark 2.25. That is, the first four vector space axioms hold for (W, +, ·). The last four axioms hold automatically. Conversely, if W is a subspace, then (W, +, ·) is a vector space, so by definition W is closed under addition and scalar multiplication. Remark 2.27. If (V, +, ·) is a vector space, W ⊆ V is a subspace, and if v1 , . . . , vm are elements of W , then an arbitrary linear combination of the vj is in W , by induction on the number of summands.


Example 2.28. Every vector space has two subspaces: The entire space (which is “not proper”), and the trivial subspace W = {0V }. A subspace other than these is “proper and non-trivial”. Example 2.29. Consider the vector space (R2 , +, ·), and write a general element x as (x1 , x2 ). The set W1 = {x : xi ≥ 0}, the closed first quadrant, is closed under addition, since if x and y are in W1 , then x + y = (x1 + y 1 , x2 + y 2 ) has non-negative components. (A sum of non-negative real numbers is non-negative.) This set is not closed under scalar multiplication. For example, (1, 1) ∈ W1 , but −1(1, 1) = (−1, −1) is not in W1 . The set W2 = {x : x1 x2 = 0}, the union of the coordinate axes, is closed under scalar multiplication. If x ∈ W2 and c is real, then cx = (cx1 , cx2 ) satisfies the membership criterion for W2 , since (cx1 )(cx2 ) = c2 (x1 x2 ) = c2 (0) = 0. This set is not closed under addition: x = (1, 0) and y = (0, 1) are in W2 , but their sum x + y = (1, 1) is not in W2 .


The set W3 = {x : 5x1 − 3x2 = 0} is closed under addition and scalar multiplication. To prove this, let x and y be arbitrary elements of W3 , and let c be real. We wish to show that cx + y satisfies the defining equation of W3 . But 5(cx1 + y1) − 3(cx2 + y2) = c(5x1 − 3x2) + (5y1 − 3y2) = 0 + 0 = 0, so cx + y ∈ W3 .

The set W4 = {x : 5x1 − 3x2 = 1} is not closed under either addition or scalar multiplication. We give two proofs of each assertion. First, it suffices to find counterexamples. If we take x = (1/5, 0) and y = (0, −1/3), then x, y ∈ W4 , but x + y = (1/5, −1/3) does not satisfy 5x1 − 3x2 = 1, so x + y ∉ W4 .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

32

LINEAR ALGEBRA

For scalar multiplication, the zero vector is not in W4 , so as remarked earlier, W4 is not closed under scalar multiplication. Alternatively, if x and y satisfy the defining equation of W4 , we may add these equations, or multiply one of them by c:

    1 = 5x1 − 3x2 ,                   1 = 5x1 − 3x2 ,
    1 = 5y1 − 3y2 ,                   c = c(5x1 − 3x2) = 5(cx1) − 3(cx2).
    2 = 5(x1 + y1) − 3(x2 + y2),

The left-hand side of the sum is 2 ≠ 1, so the pair (x1 + y1, x2 + y2) is not in W4 . The left-hand side of the product is c, so if c ≠ 1, the pair (cx1, cx2) is not in W4 .

Example 2.30. If a1 and a2 are real numbers (possibly 0), the set W = {x : a1 x1 + a2 x2 = 0} ⊆ R2 is a subspace. Conceptually, the defining equation is “preserved by addition and by scalar multiplication”. Generally, if a1 , a2 , . . . , an are real numbers, the set {x : a1 x1 + a2 x2 + · · · + an xn = 0} ⊆ Rn is a subspace. This is easily verified directly, see Theorem 2.37.

Definition 2.31. Let n be a positive integer. A square matrix A in Rn×n is said to be symmetric if AT = A, and is skew-symmetric if AT = −A.

Example 2.32. Let V = Rn×n be the set of square matrices, regarded as a vector space under matrix addition and scalar multiplication. The set Symn of symmetric matrices in V is a subspace: If A and B are symmetric, and if c is real, then (cA + B)T = cAT + B T = cA + B by Remark 1.15. Theorem 2.26 implies Symn is a subspace of V . Similarly, the set Skewn of skew-symmetric matrices in V is a subspace.

Example 2.33. Let V = R2×2 under addition and scalar multiplication. The set C of matrices [Aij] satisfying A11 = A22 and A21 = −A12 has typical element

    [a1 −a2; a2 a1] = a1 [1 0; 0 1] + a2 [0 −1; 1 0] = a1 I + a2 J,


where the final equality is the definition of J. If A = a1 I + a2 J and B = b1 I + b2 J are in C, and if c is real, then cA + B = c(a1 I + a2 J) + (b1 I + b2 J) = (ca1 + b1)I + (ca2 + b2)J is in C. By Theorem 2.26, C is a subspace of V .

The set W of matrices satisfying A11 A22 − A21 A12 = 0 is closed under scalar multiplication, but not under addition. For example, if

    A = [1 0; 0 0],    B = [0 0; 0 1],    A + B = [1 0; 0 1],

then A and B are in W , but A + B is not.

Example 2.34. Let V = F(R, R), viewed as a vector space under ordinary addition and scalar multiplication of functions. The following subsets are vector subspaces, as is easily checked using Theorem 2.26.

Z(0) = {f : f (0) = 0}, the set of functions vanishing at 0; generally, Z(x0) = {f : f (x0) = 0} if x0 is real.

C(R) = {f : f is continuous}, D(R) = {f : f is differentiable}, C1(R) = {f : f is continuously differentiable}. In each case, a theorem from analysis shows that a sum or scalar multiple of functions of the stated type is another function of the same type.

Pn , the set of polynomial functions of degree at most n, and P , the space of all polynomial functions of arbitrary (finite) degree.

The space of all cosine polynomials, or of all sine polynomials.

{f in C1 : f ′ = f }. This is an example of a solution space of a differential equation.

Theorem 2.35. Let V be a vector space, W1 and W2 subspaces of V .

(i) The intersection W1 ∩ W2 is a subspace of V .

(ii) The union W1 ∪ W2 is a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1 .

Proof. (i). Let x and y be elements of W1 ∩ W2 , and let c be real. Since x and y are elements of W1 and W1 is a subspace, cx + y ∈ W1 . An entirely similar argument shows cx + y ∈ W2 . Thus cx + y ∈ W1 ∩ W2 . Since x, y, and c were arbitrary, W1 ∩ W2 is a subspace.

(ii). If W1 ⊆ W2 , then W1 ∪ W2 = W2 is a subspace. If W2 ⊆ W1 , then W1 ∪ W2 = W1 is a subspace.


Figure 2.2: The intersection and union of subspaces. Inversely, suppose neither subspace is contained in the other. Pick x1 in W1 \ W2 , and pick x2 in W2 \ W1 . Since each xi is in W1 ∪ W2 , it suffices to show x1 + x2 is not in W1 ∪ W2 . But if x1 + x2 were an element of W1 , then x2 = (x1 + x2 ) − x1 would be an element of W1 , contrary to the choice of x2 . Similarly, if x1 +x2 were an element of W2 , then x1 would be an element of W2 , contrary to choice. This means x1 + x2 is not in W1 ∪ W2 , so this set is not closed under addition. (The union is closed under scalar multiplication, as you can check.) Remark 2.36. An intersection of an arbitrary family of subspaces of V is a subspace of V , by an obvious modification of the proof of (i). Theorem 2.37. If A = [Aij ] is an m × n matrix, then W = {[xj ] in Rn : Ai1 x1 + · · · + Ain xn = 0 for all i = 1, . . . , m} is a subspace of (Rn , +, ·). We give two proofs. Matrix multiplication. By the definition of matrix multiplication, the set W is the set of x in Rn such that Ax = 0m . If x and y are elements of W , and if c is real, then by properties of matrix multiplication (Theorem 1.19), A(cx + y) = A(cx) + Ay = c(Ax) + Ay = c(0m ) + 0m = 0m . That is, cx + y ∈ W , so W is a subspace by Theorem 2.26. Intersection of subspaces. For each i = 1, . . . , m, consider the set Wi = {[xj ] in Rn : Ai1 x1 + · · · + Ain xn = 0}.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 2. VECTOR SPACES

35

If x = [xj ] and y = [y j ] are in Wi and if c is real, then Ai1 (cx1 + y 1 ) + · · · + Ain (cxn + y n )

= c(Ai1 x1 + · · · + Ain xn ) + (Ai1 y 1 + · · · + Ain y n ) = c · 0 + 0 = 0.

By Theorem 2.26, Wi is a subspace of Rn for i = 1, 2, . . . , m. By Theorem 2.35 (i), the intersection, namely W , is a subspace of Rn .

Example 2.38. If X ⊆ R is an arbitrary non-empty subset, then Z(X) = {f : R → R : f (x) = 0 for all x in X}, the set of functions vanishing identically on X, is a subspace of F(R, R). This can be checked using Theorem 2.26. Alternatively, if x0 is real, the set Z(x0) = {f : f (x0) = 0} is a subspace, so Z(X) is an intersection of subspaces:

    Z(X) = ∩_{x0 ∈ X} Z(x0).
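The content of the first proof of Theorem 2.37 can also be seen numerically: solutions of Ax = 0 remain solutions under scalar multiples and sums. A brief NumPy sketch (the matrix and the two solutions are sample values found by hand):

    import numpy as np

    A = np.array([[1., 1., 1., 1.],
                  [1., 2., 3., 4.]])
    x = np.array([1., -2., 1., 0.])     # A x = 0
    y = np.array([2., -3., 0., 1.])     # A y = 0
    c = -4.0

    print(np.allclose(A @ x, 0), np.allclose(A @ y, 0))   # True True
    print(np.allclose(A @ (c * x + y), 0))                # True: cx + y solves the system as well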

2.3 Spans, Sums of Subspaces

In the preceding section, subspaces of a vector space (V, +, ·) are constructed by “cutting away”, namely by imposing conditions on elements of V . Dually, a subspace can be “built up” by taking sums and scalar multiples of elements. This is the viewpoint explored below.

Definition 2.39. Let (V, +, ·) be a vector space and S ⊆ V a set of vectors. A linear combination from S is a finite sum of the form

    x1 v1 + x2 v2 + · · · + xm vm = Σ_{i=1}^m xi vi ,

in which the vectors vi in S are distinct, and the xi are real numbers. The span of S is the set Span(S) of all linear combinations from S. We say S spans V if V = Span(S). If some finite set S spans V , we say V is finite-dimensional. Lemma 2.40. Let (V, +, ·) be a vector space, S and S 0 subsets of V . (i) Span(S) is a subspace of V . (ii) If S ⊆ S 0 , then Span(S) ⊆ Span(S 0 ).


 (iii) Span Span(S) = Span(S).

Proof. (i). If S is empty, then Span(S) = {0V }, which is a subspace. Otherwise, the fact that Span(S) is a subspace boils down to the fact that “a linear combination of linear combinations is a linear combination”. Precisely, assume x and y are elements of Span(S), and let v1 , . . . , vm enumerate the vectors appearing in either linear combination. By hypothesis, there exist scalars xi and yi , 1 ≤ i ≤ m, such that

    x = Σ_{i=1}^m xi vi ,    y = Σ_{i=1}^m yi vi .

If c is real, then

    cx + y = c ( Σ_{i=1}^m xi vi ) + Σ_{i=1}^m yi vi = Σ_{i=1}^m (cxi + yi) vi ∈ Span(S).

By Theorem 2.26, Span(S) is a subspace.

(ii). This is essentially obvious: Every element of S is an element of S′ , so every linear combination from S is, a fortiori, a linear combination from S′ , i.e., Span(S) ⊆ Span(S′).

(iii). Clearly S ⊆ Span(S), so (ii) gives Span(S) ⊆ Span(Span(S)). Conversely, since Span(S) is a subspace, Span(S) is closed under addition and scalar multiplication. That is, an arbitrary linear combination from Span(S) is an element of Span(S), which proves the reverse inclusion, Span(Span(S)) ⊆ Span(S).

(W ⊆ Span(S)). Since Span(S) is itself a subspace containing S, i.e., is one of the Wα , we have W ⊆ Span(S).

(Span(S) ⊆ W ). Let Wα be an arbitrary subspace of V such that S ⊆ Wα . By Lemma 2.40, Span(S) ⊆ Span(Wα ) = Wα . Since Wα is an arbitrary subspace containing S, Span(S) ⊆ ∩α Wα = W .


Sums and Direct Sums of Subspaces Definition 2.42. Let (V, +, ·) be a vector space. If W1 and W2 are subspaces, their sum is W1 + W2 = {v in V : v = w1 + w2 for some w1 in W1 and w2 in W2 }. Proposition 2.43. The sum W1 + W2 is a subspace of V . Proof. Let x and y be arbitrary elements of W1 + W2 , and let c be real. By hypothesis, there exist elements x1 and y1 in W1 , and x2 and y2 in W2 , such that x = x1 + x2 and y = y1 + y2 . Thus cx + y = c(x1 + x2 ) + (y1 + y2 ) = (cx1 + y1 ) + (cx2 + y2 ). Since W1 is a subspace, cx1 + y1 ∈ W1 . Similarly, cx2 + y2 ∈ W2 . The preceding therefore shows cx + y ∈ W1 + W2 . Theorem 2.44. Let V be a vector space. If S1 and S2 are subsets of V , and if Wi = Span(Si ), then Span(S1 ∪ S2 ) = W1 + W2 . Proof. (W1 + W2 ⊆ Span(S1 ∪ S2 )). Since S1 ⊆ S1 ∪ S2 , Lemma 2.40 implies W1 = Span(S1 ) ⊆ Span(S1 ∪ S2 ). Similarly, W2 ⊆ Span(S1 ∪ S2 ). Since Span(S1 ∪ S2 ) is a subspace, we have W1 + W2 ⊆ Span(S1 ∪ S2 ).

(Span(S1 ∪ S2) ⊆ W1 + W2). If x ∈ Span(S1 ∪ S2), then there exist vectors v1 , . . . , vm in S1 ∪ S2 and scalars xi such that x = Σ_i xi vi . Reindexing if necessary, we may assume there is a k, 0 ≤ k ≤ m, such that v1 , . . . , vk are in S1 and vk+1 , . . . , vm are in S2 . Thus

    x = ( Σ_{i=1}^k xi vi ) + ( Σ_{i=k+1}^m xi vi ) ∈ W1 + W2 .

Definition 2.45. Let (V, +, ·) be a vector space, and let W1 and W2 be subspaces. If W1 ∩ W2 = {0V }, we say W1 + W2 is a direct sum, and write W1 ⊕ W2 . Theorem 2.46. If W = W1 ⊕ W2 is a direct sum of vector spaces, and if w ∈ W , there exist unique vectors w1 in W1 and w2 in W2 such that w = w1 + w2 .


Proof. By the definition of a sum of vector spaces, there exist vectors w1 in W1 and w2 in W2 satisfying w = w1 + w2 . To prove uniqueness, assume w10 in W1 and w20 in W2 provide another “decomposition” of w. Since w1 + w2 = w10 + w20 , we have w1 − w10 = w20 − w2 . However, the left-hand side is in W1 and the right-hand side is in W2 . Since W1 ∩ W2 = {0V }, we have w1 − w10 = 0V and w20 − w2 = 0V , i.e., w1 = w10 and w2 = w20 . Example 2.47. In (R3 , +, ·), the planes W1 = {x : x1 = 0} and W2 = {x : x2 = 0} are subspaces (Theorem 2.37) whose sum is all of R3 . The general element x = (x1 , x2 , x3 ) may be written (0, x2 , x3 ) + (x1 , 0, 0) = (0, x2 , 0) + (x1 , 0, x3 ) ∈ W1 + W2 . The sum is not direct, since the non-zero vector (0, 0, 1) is in W1 ∩ W2 .

Example 2.48. In (R4 , +, ·), the planes W1 = {(x1 , x2 , 0, 0)} and W2 = {(0, 0, x3 , x4 )} are subspaces whose sum is all of R4 , and the sum is clearly direct.

Proposition 2.49. Let n be a positive integer. If Rn×n is the vector space of square matrices under matrix addition and scalar multiplication, Symn is the subspace of symmetric matrices, and Skewn the subspace of skew-symmetric matrices, then Rn×n = Symn ⊕ Skewn . In particular, every square matrix can be written uniquely as the sum of a symmetric matrix and a skew-symmetric matrix.

Proof. (W1 + W2 = Rn×n ). Let A be an arbitrary n × n matrix. The matrices

    Asym = (1/2)(A + AT),    Askew = (1/2)(A − AT)

are symmetric and skew-symmetric, respectively:

    (Asym)T = (1/2)(A + AT)T = (1/2)(AT + (AT)T) = (1/2)(AT + A) = Asym ,
    (Askew)T = (1/2)(A − AT)T = (1/2)(AT − (AT)T) = (1/2)(AT − A) = −Askew .

Moreover, Asym + Askew is obviously A. This shows that an arbitrary n × n matrix can be written as the sum of a symmetric and a skewsymmetric matrix. (W1 ∩ W2 = {0n×n }). If A is both symmetric and skew-symmetric, then A = AT = −A, so A = 0n×n .
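The decomposition in the proof is directly computable. A short NumPy sketch (the matrix A is an arbitrary sample):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [5., 3., -1.],
                  [4., 7., 2.]])
    A_sym = 0.5 * (A + A.T)
    A_skew = 0.5 * (A - A.T)

    print(np.allclose(A_sym, A_sym.T))       # True: symmetric part
    print(np.allclose(A_skew, -A_skew.T))    # True: skew-symmetric part
    print(np.allclose(A_sym + A_skew, A))    # True: the two parts sum back to A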


2.4 Bases and Dimension

The “size” of a real vector space is quantified by the minimum number of spanning elements. For example, (Rn , +, ·) is spanned by the standard basis (ei )ni=1 . It turns out that every set of fewer than n vectors does not span, while every set of more than n vectors is “redundant”, or “linearly dependent”, in that some proper subset also spans. (It is not true that every set of n vectors spans Rn ; for example, one of the vectors could be 0n , or all the vectors could be proportional.)

Linear Dependence and Independence

Definition 2.50. A set S of vectors in V is linearly dependent if there exists a non-trivial linear combination from S equal to the zero vector, i.e., if there exist distinct vectors v1 , . . . , vk in S (k ≥ 1) and scalars xi , not all zero, such that

    Σ_{i=1}^k xi vi = 0V .

If S is not linearly dependent, we say S is linearly independent.

Remark 2.51. The empty set is linearly independent. A non-empty set S is linearly independent if and only if no non-trivial linear combination from S is 0V . For later use we give a formal statement.

Lemma 2.52. Let (V, +, ·) be a vector space. A non-empty set S of vectors in V is linearly independent if and only if: For all (vi)_{i=1}^k in S and all scalars (xi)_{i=1}^k , if Σ_i xi vi = 0V , then xi = 0 for all i.

Example 2.53. In (R2 , +, ·), the set S = {(1, 0), (0, 1)} = (ej)_{j=1}^2 is linearly independent: If (x1 , x2 ) = x1 e1 + x2 e2 = (0, 0), then (by equating components) x1 = x2 = 0. The set S′ = {(1, 1), (1, −1), (4, 2)} = (vj)_{j=1}^3 is linearly dependent. For example, 3v1 + v2 − v3 = (0, 0).

Theorem 2.54. Let (V, +, ·) be a vector space, and let S be a set of vectors in V . The following are equivalent: (i) S is linearly dependent. (ii) There exists a vector v in S such that v ∈ Span(S \ {v}). That is, some element of S can be expressed as a linear combination of the other elements of S.


(iii) There is a proper subset S 0 of S such that Span(S 0 ) = Span(S). Proof. ((i) implies (ii)). If S is linearly dependent, there exist distinct vectors v1 , . . . , vk in S and scalars x1 , . . . , xk , not all zero, such that 0 = x 1 v 1 + · · · + xk v k .

Without loss of generality, we may assume x1 ≠ 0. The preceding equation may be rearranged as

    v1 = −(1/x1)(x2 v2 + · · · + xk vk),

which expresses some element of S as a linear combination of other elements.

((ii) implies (iii)). Let v be as in (ii), and let S′ = S \ {v} ⊆ S. By Lemma 2.40, Span(S′) ⊆ Span(S). Conversely, v ∈ Span(S′), so every linear combination from S is a linear combination from S′ , i.e., Span(S) ⊆ Span(S′).

((iii) implies (i)). Let S 0 be a proper subset of S with Span(S 0 ) = Span(S), and let v1 be an element of S \ S 0 . Since v1 ∈ Span(S 0 ), there exist vectors v2 , . . . , vk in S 0 (necessarily distinct from v1 ) and scalars x2 , . . . , xk , such that v 1 = x2 v 2 + · · · + xk v k ,

i.e.,

− v 1 + x2 v 2 + · · · + xk v k = 0V .

This is a non-trivial linear combination from S whose sum is 0. By definition, S is linearly dependent. Corollary 2.55. Let S be a linearly independent subset of some vector space (V, +, ·). If v ∈ V \ S, then S ∪ {v} is linearly independent if and only if v 6∈ Span(S).

Proof. If v ∈ Span(S), then Span(S ∪ {v}) = Span(S), so S ∪ {v} is linearly dependent by the theorem. Contrapositively, if S ∪ {v} is linearly independent, then v 6∈ Span(S). Suppose v 6∈ Span(S), and assume xv + x1 v1 + · · · + xk vk = 0

for some vectors vi in S and some scalars x, xi . If x 6= 0, the preceding equation could be rearranged to express v as a linear combination from S, contrary to hypothesis; thus x = 0. Because S is linearly independent, xi = 0 for i = 1, . . . , k. That is, the only linear combination from S ∪ {v} equal to 0 is the trivial linear combination. By Lemma 2.52, S ∪ {v} is linearly independent.


Bases and Coordinate Vectors Definition 2.56. An ordered, linearly independent spanning set of V is called a basis of V . Remark 2.57. If (v1 , v2 ) is a basis, then (v2 , v1 ) is a different basis. (Round brackets signify an ordered set, while curly braces signify an unordered set.) Many books do not explicitly specify that a basis is ordered, but ordering is implicit when basis elements are indexed. Theorem 2.58. Let (V, +, ·) be a vector space spanned by some finite set T . If S ⊆ V is a linearly independent set, there exists a basis S 0 with S ⊆ S 0 ⊆ S ∪ T . In particular, some subset of T is a basis of V .

Proof. The proof is inductive. Introduce the “initial” set S0 = S, and write T = (y1 , . . . , yn ). For m = 1, . . . , n, define

    Sm = Sm−1 ∪ {ym}   if ym ∉ Span(Sm−1),
    Sm = Sm−1          if ym ∈ Span(Sm−1).

By construction, Sm−1 ⊆ Sm ⊆ S ∪ T and Span(yj )m j=1 ⊆ Span(Sm ) for each m. In particular, S ⊆ Sm ⊆ S ∪ T for each m. By Corollary 2.55 and induction on m, Sm is linearly independent. The set S 0 = Sn is linearly independent, satisfies S ⊆ S 0 ⊆ S ∪ T , and spans V because V = Span(yj )nj=1 ⊆ Span(S 0 ) ⊆ V .

Theorem 2.59. Let V be a vector space, and assume S = (vi)_{i=1}^n is a basis of V . For every vector x in V , there exists a unique ordered n-tuple of scalars (xi)_{i=1}^n such that

    x = Σ_{i=1}^n xi vi .

Proof. (Existence). Let x be an arbitrary element of V . Existence of scalars xi as in the theorem follows immediately from the definition of a spanning set.

(Uniqueness). Suppose there exist vectors v1 , . . . , vm in S and scalars xi and yi such that x = Σ_i xi vi and x = Σ_i yi vi . Subtracting the second from the first,

    0V = Σ_{i=1}^m xi vi − Σ_{i=1}^m yi vi = Σ_{i=1}^m (xi − yi) vi .

By Lemma 2.52, xi − y i = 0 for all i, i.e., xi = y i for all i.


Definition 2.60. The ordered n-tuple [x]S = [xi] associated to the vector x with respect to the basis S is called the coordinate vector of x (in the basis S).

Example 2.61. Let S = (ei)_{i=1}^n be the standard basis of Rn ,

    e1 = [1; 0; . . . ; 0],    e2 = [0; 1; . . . ; 0],    . . . ,    en = [0; . . . ; 0; 1].

If x = [xi] ∈ Rn , then x = Σ_i xi ei = [x]S ; every vector is its own coordinate vector in the standard basis.

Example 2.62. In R2×2 , the matrices (eji)_{i,j=1}^2 , namely,

    e11 = [1 0; 0 0],    e21 = [0 1; 0 0],    e12 = [0 0; 1 0],    e22 = [0 0; 0 1],

constitute the standard basis of R2×2 . An arbitrary matrix A = [Aij] may be written uniquely as a linear combination

    [A11 A12; A21 A22] = A11 e11 + A12 e21 + A21 e12 + A22 e22 = Σ_{i,j=1}^2 Aij eji .

Example 2.63. For each pair of indices (i, j) with 1 ≤ i ≤ m and 1 ≤ j ≤ n, let eji be the m × n matrix having ei as its jth column and all other columns equal to 0, i.e., having a 1 in the ith row and jth column and 0 elsewhere. The collection S = (eji) of all such matrices forms the standard basis of Rm×n . If A = [Aij] ∈ Rm×n , then

    A = Σ_{i=1}^m Σ_{j=1}^n Aij eji .

That is, every m × n real matrix is its own coordinate vector with respect to the standard basis of Rm×n .

Dimension of a Vector Space Theorem 2.64. Let (V, +, ·) be a vector space spanned by a set S containing m elements. If S 0 ⊆ V is a linearly independent set of n elements, then n ≤ m.


Proof. Write S = (vi)_{i=1}^m and S′ = (v′j)_{j=1}^n . Because S spans V , there exist scalars Aij , 1 ≤ i ≤ m and 1 ≤ j ≤ n, such that

    v′j = Σ_{i=1}^m Aij vi    for all j = 1, . . . , n.

These scalars constitute a matrix A = [Aij] in Rm×n , which defines a homogeneous linear system Ax = 0m of m equations in n variables. If x = [xj] is an arbitrary vector of coefficients, then

    Σ_{j=1}^n xj v′j = Σ_{j=1}^n xj ( Σ_{i=1}^m Aij vi ) = Σ_{i=1}^m ( Σ_{j=1}^n Aij xj ) vi .

In particular, if x is a solution of Ax = 0m , then the linear combination on the left is 0V . If m < n, i.e., the system Ax = 0m has more variables than equations, then there exists a non-trivial solution x by Remark 1.37. This means 0V is a non-trivial linear combination from S 0 , i.e., S 0 is linearly dependent. Contrapositively, if S 0 is linearly independent, then n ≤ m. Corollary 2.65. Let (V, +, ·) be a vector space admitting a basis consisting of m elements. If S 0 = (v1 , . . . , vn ) is a basis of V , then n = m. Proof. Let S be a basis of V containing m elements. Since S spans V and S 0 is linearly independent, the theorem gives n ≤ m. Since S 0 spans V and S is linearly independent, m ≤ n. Definition 2.66. If some (hence every) basis of V contains exactly n elements, we say V is n-dimensional, and write n = dim V . If V has no finite basis, we say V is infinite-dimensional, and write dim V = ∞. Corollary 2.67. If (V, +, ·) is an n-dimensional vector space, and if W is a subspace, then dim W ≤ n, with equality if and only if W = V . Proof. We first show W has a finite basis. Let S0 = ∅. If W = {0V }, then S0 is a basis. Otherwise, define sets Sk (for k = 1, 2, . . . ) inductively as follows: If W 6= Span(Sk−1 ), pick a vector vk in W \ Span(Sk−1 ) and put Sk = Sk−1 ∪ {vk }. By construction, the set Sk is linearly independent and contains exactly k elements.


It must be that W = Span(Sm ) for some integer m with 0 ≤ m ≤ n; if not, then Sn+1 ⊆ V is a linearly independent set of (n + 1) elements, contrary to Theorem 2.64. This proves simultaneously that dim W is finite and no larger than n. If W = V , then obviously dim W = dim V . If W 6= V , there exists a vector v0 in V \ W ; if (vj )m j=1 is a basis m of W , then (vj )j=0 is a linearly independent set in V by Corollary 2.55, so dim W = m < m + 1 ≤ dim V . Contrapositively, if dim W = dim V , then W = V . Corollary 2.68. If (V, +, ·) is an n-dimensional vector space, and if S is a set of n elements, then S is linearly independent if and only if Span(S) = V . Proof. Apply the preceding corollary to W = Span(S). Example 2.69. The standard basis of (Rn , +, ·) contains n elements, so dim Rn = n.  Example 2.70. The standard dual basis of (Rn )∗ , +, · contains n elements, so dim(Rn )∗ = n.

Example 2.71. The standard basis of (Rm×n , +, ·) contains mn elements, so dim Rm×n = mn. Example 2.72. Let n be a non-negative integer. The polynomial space Pn = {a0 + a1 t + a2 t2 + · · · + an tn }

has the basis (1, t, t2 , . . . , tn ), containing (n+1) elements. (Superscripts denote exponents here.) Thus dim Pn = n + 1.

Example 2.73. The vector space of symmetric 2 × 2 real matrices is 3-dimensional, and the space of skew-symmetric 2 × 2 real matrices is 1-dimensional. To show this, compare an arbitrary 2 × 2 matrix A to its transpose AT :

    A = [a b; c d],    AT = [a c; b d].

These are equal if and only if b = c. These are negatives if and only if b = −c and a = d = 0. General symmetric and skew-symmetric 2 × 2 matrices can therefore be written

    [a b; b d] = a[1 0; 0 0] + b[0 1; 1 0] + d[0 0; 0 1] = ae11 + b(e21 + e12) + de22 ,
    [0 −b; b 0] = b[0 −1; 1 0] = b(e12 − e21).

The right-hand sides are linear combinations of specific matrices, and clearly the only way the right-hand side can be the zero matrix is if every coefficient is 0. It follows that {e11 , e21 + e12 , e22 } is a basis for Sym2 , and {e12 − e21 } is a basis of Skew2 . Example 2.74. The space P of all polynomial functions has infinite basis {1, t, t2 , t3 , . . . }, so dim P = ∞. Example 2.75. In the vector space (Rω , +, ·) of real sequences, define e1 = (1, 0, 0, . . . ), e2 = (0, 1, 0, . . . ), . . . , ej = (0, . . . , 0, 1, 0, . . . ), . . . , with ej having a 1 in the jth component and 0s elsewhere. The set ω S = (ej )∞ j=1 is linearly independent, but does not span R : Every linear combination from S is a finite sum, so every element of Span(S) has only finitely many non-zero components, while “most” elements of Rω have infinitely many non-zero components. If we let S n = (ej )nj=1 be the first n elements of S, we may identify Rn with Span(S n ), and with this identification we have inclusions R1 ⊆ R2 ⊆ . . . ⊆ Rn ⊆ . . . . The union of these spaces, denoted R∞ , is Span(S). For every positive integer n, we have proper inclusions Rn ⊆ Rn+1 ⊆ R∞ ⊆ Rω . Theorem 2.76. The solution space of a homogeneous linear system Ax = 0m of m equations in n variables is a vector subspace of Rn whose dimension is equal to the number of free variables. Proof. The solution set of Ax = 0m is a subspace of Rn by Theorem 2.37. When the coefficient matrix is put into reduced row-echelon form, the resulting system A0 x = 0m has precisely the same solution space as the original system. Let ` denote the number of free variables, and say the free variables are xk1 , . . . , xk` ; for simplicity, write xkj = y j . The basic variables are linear combinations of the free variables. For each j = 1, . . . , `, let vj be the solution obtained by setting y j = 1 and all other free variables equal to 0. In particular, the kj component of vj is 1, and every other “free” solution has a 0 in the kj component. The solution space is clearly spanned by (vj )`j=1 , and this set is linearly independent: If aj are scalars satisfying 0n = a1 v1 + · · · + a` v` , then aj = 0, as the kj th component of the linear combination. This shows the solution space has a basis of ` elements (` the number of free variables).
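In exact arithmetic the computation in Theorem 2.76 can be delegated to a computer algebra system; the SymPy sketch below (illustrative only, with an arbitrary sample system) returns one basis vector of the solution space for each free variable.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, -1],
                [0, 0, 1,  3]])     # already in row-echelon form; x2 and x4 are the free variables
    for v in A.nullspace():         # one basis vector per free variable
        print(v.T)                  # expect (-2, 1, 0, 0) and (1, 0, -3, 1), up to SymPy's formatting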


Definition 2.77. If X is a vector space and W ⊆ X is a subspace, a subspace X′ is said to be complementary to W in X if X = W ⊕ X′ .

Lemma 2.78. If X is a finite-dimensional vector space and W is a subspace, there exists a subspace X′ complementary to W in X, and we have dim X = dim W + dim X′ .

Proof. Write k = dim W , k + ` = dim X. Pick a basis (w1 , . . . , wk ) of W . By Theorem 2.58, there exist vectors (x1 , . . . , x` ) such that S = (w1 , . . . , wk , x1 , . . . , x` ) is a basis of X. Consider the subspace X′ = Span(x1 , . . . , x` ). Since S spans X, X = W + X′ . Since S is linearly independent, W ∩ X′ = {0}. Finally, ` = dim X′ , so dim X = k + ` = dim W + dim X′ .

Theorem 2.79 (The Dimension Theorem). If X and Y are finite-dimensional subspaces of some vector space (V, +, ·), then dim(X + Y ) = dim X + dim Y − dim(X ∩ Y ).

Proof. Let W = X ∩ Y , and pick a basis (w1 , . . . , wk ) of W . Now use Lemma 2.78 to pick a subspace X′ ⊆ X with basis (x1 , . . . , x` ) such that X = W ⊕ X′ , and a subspace Y ′ ⊆ Y , with basis (y1 , . . . , ym ) such that Y = W ⊕ Y ′ . We claim that S = (w1 , . . . , wk , x1 , . . . , x` , y1 , . . . , ym ) is a basis of X + Y . Clearly X + Y = Span(S). To prove S is linearly independent, suppose there exist scalars (ai)_{i=1}^k , (bi)_{i=1}^` , and (ci)_{i=1}^m such that

    0V = Σ_{i=1}^k ai wi + Σ_{i=1}^` bi xi + Σ_{i=1}^m ci yi = w + x + y ∈ W + X′ + Y ′ .

Rearranging, −y = w + x; the left side is an element of Y 0 ⊆ Y while the right side is in X. The common value is therefore an element of W = X ∩ Y . But W ∩ Y 0 = {0}, so y = 0 and w + x = 0. Since W ∩ X 0 = {0}, we have w = 0 and x = 0. Each of the sets (wi )ki=1 , (xi )`i=1 , and (yi )m i=1 is linearly independent, and the summands w, x, and y are individually 0, so the scalars (ai )ki=1 , (bi )`i=1 , and (ci )m i=1 are all zero. This shows that no non-trivial linear combination from S is 0. That is, S is linearly independent, hence is a basis for X + Y . The dimension of X + Y is found by counting basis elements: dim(X + Y ) = k + ` + m = (k + `) + (k + m) − k = dim X + dim Y − dim(X ∩ Y ).


Example 2.80. A plane through the origin in (Rn , +, ·) is a two-dimensional subspace. If W1 and W2 are distinct planes through the origin in R3 , then 2 < dim(W1 + W2 ) ≤ dim(R3 ) = 3, so in fact W1 + W2 = R3 by Corollary 2.67. Theorem 2.79 implies dim(W1 ∩ W2 ) = 1. That is, distinct planes through the origin in R3 span R3 , and intersect in a line through the origin. A pair of distinct planes through the origin in R4 can intersect in a line (if the planes span a three-dimensional subspace) or a point (if the planes span R4 ).

Example 2.81. If W1 and W2 are distinct three-dimensional subspaces of (R6 , +, ·), their intersection can be a plane (if dim(W1 + W2 ) = 4); a line (if dim(W1 + W2 ) = 5); or a point (if W1 + W2 = R6 ).

Definition 2.82. An n × n matrix [Aij] is:

(i) Upper triangular if every entry below the main diagonal is zero, i.e., if Aij = 0 whenever i > j.

(ii) Lower triangular if every entry above the main diagonal is zero, i.e., if Aij = 0 whenever i < j.

(iii) Diagonal if every entry off the main diagonal is zero, i.e., if Aij = 0 whenever i ≠ j.

Remark 2.83. The sets of upper triangular and lower triangular n × n matrices span Rn×n , and their intersection is the space of diagonal matrices.

Example 2.84. If eji denotes the standard basis matrix with a 1 in the (i, j) entry and 0’s elsewhere, then eji is upper triangular if and only if i ≤ j; lower triangular if and only if j ≤ i; and diagonal if and only if i = j. When n = 2, the space of diagonal matrices has basis (e11 , e22 ). In addition, e21 is upper triangular, and e12 is lower triangular. If we let D, U , and L denote the spaces of 2 × 2 diagonal, upper triangular, and lower triangular matrices, respectively, then D = U ∩ L, R2×2 = U + L, and dim R2×2 = 4 = 3 + 3 − 2 = dim U + dim L − dim D, in agreement with Theorem 2.79.


Computational Algorithms

Let (V, +, ·) be a vector space, and let S = (vj)_{j=1}^n be an ordered subset. To determine whether S is linearly dependent or independent, express the zero vector 0V as a linear combination with coefficients x1 , . . . , xn , reduce the resulting coefficient matrix to echelon form, and check whether there are free variables (dependent) or none (independent).

Example 2.85. Is the set S = {(1, 1, −1, −1), (2, 2, 2, 2), (0, 0, 1, 1)} linearly dependent? Here we might notice that 2v1 − v2 + 4v3 = 04 by inspection, so S is linearly dependent. We can also proceed systematically. Suppose x1 , x2 , and x3 are scalars such that Σ_j xj vj = 0:

    x1 [1; 1; −1; −1] + x2 [2; 2; 2; 2] + x3 [0; 0; 1; 1] = [x1 + 2x2; x1 + 2x2; −x1 + 2x2 + x3; −x1 + 2x2 + x3] = [0; 0; 0; 0].

This is a homogeneous system, whose coefficient matrix is the 4 × 3 matrix whose columns are the elements of S. (That is, we need not have written out the system at all!) We have

    [1 2 0; 1 2 0; −1 2 1; −1 2 1]  →(R2 − R1, R4 − R3, R2 ↔ R3)  [1 2 0; −1 2 1; 0 0 0; 0 0 0]  →(R2 + R1, (1/4)R2)  [1 2 0; 0 1 1/4; 0 0 0; 0 0 0].

From the row-echelon form, we see that x1 and x2 are basic and x3 is free. This system has a non-trivial solution (i.e., 04 is a non-trivial linear combination from S), so S is linearly dependent.

Example 2.86. For the S of the preceding example, is b = (1, 2, 3, 3) in Span(S)? Here we wish to express b as a linear combination from S, i.e., to find scalars xj such that x1 v1 + x2 v2 + x3 v3 = b, or prove none exist. Letting A denote the coefficient matrix in the preceding example, we wish to solve Ax = b. We form the augmented matrix and row reduce:

    [1 2 0 | 1; 1 2 0 | 2; −1 2 1 | 3; −1 2 1 | 3]  →(R2 − R1, R4 − R3, R2 ↔ R3)  [1 2 0 | 1; −1 2 1 | 3; 0 0 0 | 1; 0 0 0 | 0].


Already we can stop; the third equation is inconsistent; the system Ax = b has no solutions, so b 6∈ Span(S). Example 2.87. In the (four-dimensional) polynomial space P3 , consider the set S 0 = {1 + t − t2 − t3 , 2 + 2t + 2t2 + 2t3 , t2 + t3 }, and let q(t) = 1 + 2t + 3t2 + 3t3 . (Superscripts on t denote exponents.) Is S 0 linearly independent? Is q an element of Span(S 0 )? We use aj to denote unknowns. (Superscripts on a connote indices.) We have two computational alternatives. The first is to set a1 p1 + a2 p2 + a3 p3 = 0 as polynomials (i.e., to expand the left-hand side and equate all coefficients to 0), then solve the resulting system for the aj , looking for a non-trivial solution; or to set a1 p1 + a2 p2 + a3 p3 = q as polynomials and attempt to solve. The second method is to pick any convenient basis of P3 , express each element of S 0 (and the polynomial q) as a coordinate vector (Definition 2.60), and to perform computations in R4 . But {1, t, t2 , t3 } is a basis of P3 , and the coordinate vectors of the elements of S 0 are precisely the elements of S in the preceding examples, while the coordinate vector of q is b = (1, 2, 3, 3). From our work in the two preceding examples, we learn that S 0 is linearly dependent, and q 6∈ Span(S 0 ).
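The row reductions of Examples 2.85 and 2.86 can be reproduced with SymPy (an illustrative sketch; SymPy is assumed to be available): the rank of the coefficient matrix detects linear dependence, and comparing it with the rank of the augmented matrix detects membership in the span.

    from sympy import Matrix

    A = Matrix([[1, 2, 0],
                [1, 2, 0],
                [-1, 2, 1],
                [-1, 2, 1]])        # columns are the vectors of S
    b = Matrix([1, 2, 3, 3])

    print(A.rank())                 # 2, less than the 3 columns, so S is linearly dependent
    print(A.row_join(b).rank())     # 3, larger than rank(A), so Ax = b is inconsistent and b is not in Span(S)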

Example 2.88. Let S = {(0, 1, 1), (1, 0, 1), (1, 1, 0)} = (vj)_{j=1}^3 . Determine whether S is a basis of R3 ; if so, and if b = (b1 , b2 , b3 ) is arbitrary, find the coordinate vector (b)S . Let x1 , x2 , and x3 be coefficients. In theory our task is twofold: Show that if Σ_j xj vj = 03 , then xj = 0 for all j (so that S is linearly independent, hence a basis of R3 ); and solve the system Σ_j xj vj = b. Both operations entail forming the coefficient matrix A whose columns are the elements of S and row reducing. Rather than row reduce the same matrix twice, we set out (“optimistically”) to answer the second question, and along the way answer the first. The augmented matrix is reduced as follows:

    [0 1 1 | b1; 1 0 1 | b2; 1 1 0 | b3]
      →(R1 ↔ R2, R3 − R1, R3 − R2)  [1 0 1 | b2; 0 1 1 | b1; 0 0 −2 | −b1 − b2 + b3]
      →(−(1/2)R3, R2 − R3, R1 − R3)  [1 0 0 | (1/2)(b2 + b3 − b1); 0 1 0 | (1/2)(b3 + b1 − b2); 0 0 1 | (1/2)(b1 + b2 − b3)].


Because there are no free variables, the original set is linearly independent, hence is a basis. As a fringe benefit, we learn that

    (b)S = [ (1/2)(b2 + b3 − b1); (1/2)(b3 + b1 − b2); (1/2)(b1 + b2 − b3) ] = [x1; x2; x3].

That is, x1 v1 + x2 v2 + x3 v3 = b.
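Numerically, the coordinate vector of Example 2.88 is the solution of a square linear system. A NumPy sketch (illustrative only), using the sample right-hand side b = (1, 2, 3):

    import numpy as np

    A = np.array([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])    # columns are v1, v2, v3
    b = np.array([1., 2., 3.])

    x = np.linalg.solve(A, b)       # the coordinate vector (b)^S
    print(x)                        # [2. 1. 0.], matching the formulas above
    print(np.allclose(A @ x, b))    # True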

Example 2.89. Let W ⊆ R5 be spanned by x1 = (1, 1, 1, 1, 1) and x2 = (1, 2, 3, 4, 5). Find a homogeneous linear system Ax = 0 whose solution space is W . The idea is to treat the coefficients of a scalar equation as unknowns; that is, suppose A = [a1 a2 a3 a4 a5] is a set of coefficients for a linear equation vanishing on W . The vectors x1 and x2 satisfy Ax = 0 if and only if

    a1 + a2 + a3 + a4 + a5 = 0,
    a1 + 2a2 + 3a3 + 4a4 + 5a5 = 0.

To solve, form the coefficient matrix and put it in reduced row-echelon form. The coefficient matrix has rows equal to the vectors in a basis of W :

    [1 1 1 1 1; 1 2 3 4 5]  →(R2 − R1)  [1 1 1 1 1; 0 1 2 3 4]  →(R1 − R2)  [1 0 −1 −2 −3; 0 1 2 3 4].

The basic variables are a1 and a2 . Solving for the basic variables, we get a1 = a3 + 2a4 + 3a5 and a2 = −2a3 − 3a4 − 4a5 . The general solution is

    [a1; a2; a3; a4; a5] = [a3 + 2a4 + 3a5; −2a3 − 3a4 − 4a5; a3; a4; a5] = a3 [1; −2; 1; 0; 0] + a4 [2; −3; 0; 1; 0] + a5 [3; −4; 0; 0; 1].

The three columns on the right span the solution space; the corresponding linear equations vanish on W , and all together they “cut out” W . That is, W is the solution set of

    x1 − 2x2 + x3 = 0,
    2x1 − 3x2 + x4 = 0,         with coefficient matrix [1 −2 1 0 0; 2 −3 0 1 0; 3 −4 0 0 1].
    3x1 − 4x2 + x5 = 0,
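Equivalently, the rows of the desired coefficient matrix span the null space of the 2 × 5 matrix whose rows are x1 and x2. A SymPy sketch (illustrative only):

    from sympy import Matrix

    B = Matrix([[1, 1, 1, 1, 1],
                [1, 2, 3, 4, 5]])   # rows span W
    for a in B.nullspace():         # each a gives one linear equation a . x = 0 vanishing on W
        print(a.T)                  # expect (1, -2, 1, 0, 0), (2, -3, 0, 1, 0), (3, -4, 0, 0, 1)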


Exercises

Exercise 2.1. Let V = F(R, R) be the vector space of real-valued functions on R under ordinary addition and scalar multiplication. Determine (with justification) which of the following sets are linearly independent.

(a) {1 − t + t2 , 1 + t + t2 , t + t2 }; {1 − t + t2 , 1 + t + t2 , 1 + t2 }.

(b) {cos2 t, sin2 t}; {1, cos2 t, sin2 t}.

(c) {cos t, sin t}; {1, cos t, sin t}.

Exercise 2.2. Find a basis for the subspace W ⊆ P5 defined by W = {p in P5 : p(−1) = p(0) = p(1) = 0} using the following strategy: Write the general element of P5 in the form p(t) = a0 + a1 t + · · · + a5 t5 , use each of the given conditions to impose a linear equation on the coefficients, and use row reduction to solve the resulting system of equations. Show that q(t) = t − t3 divides each basis element.

Exercise 2.3. Let a < b < c be real numbers, and consider the quadratic Lagrange interpolation polynomials

    e1(t) = (t − b)(t − c) / ((a − b)(a − c)),
    e2(t) = (t − a)(t − c) / ((b − a)(b − c)),
    e3(t) = (t − a)(t − b) / ((c − a)(c − b)).

(a) Evaluate each polynomial at t = a, b, and c.

(b) Show (ei )3i=1 is a basis of P2 . Hint: Use part (a). (c) If p is an arbitrary quadratic polynomial, find a formula (in terms of a, b, c, and p) expressing p as a linear combination from (ei )3i=1 , and write each of p0 (t) = 1, p1 (t) = t, and p2 (t) = t2 as a linear combination from (ei )3i=1 . Hint: Use part (a). Exercise 2.4. Let V = F(R, R) be the vector space of real-valued functions on R under ordinary addition and scalar multiplication. (a) A function φ is even if φ(−t) = φ(t) for all t. Prove that the set E of even functions is a subspace of V .


(b) A function φ is odd if φ(−t) = −φ(t) for all t. Prove that the set O of odd functions is a subspace of V .

(c) Show that V = E + O, and determine whether or not the sum is direct. Hint: If f is an arbitrary function, consider the functions

    feven(t) = (1/2)(f (t) + f (−t)),    fodd(t) = (1/2)(f (t) − f (−t)).

(d) Write f (t) = et as the sum of an even and an odd function.

Exercise 2.5. Let (V, +, ·) be a vector space, x, y and z elements of V . (a) Show {x, y} is linearly independent if and only if {x + y, x − y} is linearly independent. (b) True or false (with proof): {x, y, z} is linearly independent if and only if {x + y, x + z, y + z} is linearly independent. (c) True or false (with proof): {x, y, z} is linearly independent if and only if {x − y, x − z, y − z} is linearly independent. Exercise 2.6. In R6 , give specific examples of three-dimensional subspaces W1 and W2 whose intersection has dimension (a) two; (b) one; (c) zero. Exercise 2.7. If A = [Aij ] is a 2 × 2 matrix, the trace of A is defined to be tr(A) = A11 + A22 , the sum of the diagonal entries. (a) Show that the set W of 2 × 2 matrices of trace 0 is a subspace of R2×2 . (b) Following Example 2.73, find bases for W , for Sym2 ∩W , and for Skew2 ∩W . (c) Is it true that Sym2 +W = R2×2 ? Explain. Exercise 2.8. Determine the dimensions of the spaces of upper triangular, lower triangular, and diagonal n × n matrices, and verify the Dimension Theorem for this example.


Chapter 3

Euclidean Geometry

Vectors allow us to speak of linear combinations, but not of geometric concepts such as length, angle, and volume. In order to do geometry (particularly in Rn ), we introduce additional structure.

3.1 Inner Products

Definition 3.1. Let (V, +, ·) be a real vector space. An inner product on V is a function B : V × V → R satisfying the following conditions: (i) (Bilinearity) For all x1 , x2 , and y in V , and all real numbers c, B(cx1 + x2 , y) = cB(x1 , y) + B(x2 , y), B(y, cx1 + x2 ) = cB(y, x1 ) + B(y, x2 ). (ii) (Symmetry) For all x and y in V , B(y, x) = B(x, y). (iii) (Positive-definiteness) For all x in V , B(x, x) ≥ 0, with equality if and only if x = 0V . Remark 3.2. Bilinearity formalizes a two-sided “distributive law”, allowing inner products of linear combinations to be expanded in terms of inner products of vectors in the combination. To establish bilinearity for a particular function B, it suffices to verify symmetry and the first “bilinearity” condition. Example 3.3. (The Euclidean dot product) On (Rn , +, ·), define hx, yi = xT y = x1 y 1 + · · · + xn y n =

    Σ_{i=1}^n xi yi .


The expression xT y is the 1 × 1 matrix obtained by multiplying the transpose of x (a row) with y (a column), viewed as a real number. Bilinearity amounts to the distributive law: If x1 = [xi1 ], x2 = [xi2 ], and y = [y i ] are arbitrary vectors and c is real, then hcx1 + x2 , yi =

    Σ_{i=1}^n (c(x1)^i + (x2)^i) yi = Σ_{i=1}^n (c(x1)^i yi + (x2)^i yi) = c Σ_{i=1}^n (x1)^i yi + Σ_{i=1}^n (x2)^i yi = c hx1 , yi + hx2 , yi .

Symmetry and positive-definiteness are clear.

Example 3.4. Let a < b be real. On V = C([a, b], R), the vector space of continuous, real-valued functions on the interval [a, b] under function addition and scalar multiplication, the function

    B(f, g) = (1/(b − a)) ∫_a^b f (t)g(t) dt

defines an inner product. Bilinearity amounts to the distributive law plus linearity of the integral; symmetry is clear; positive definiteness depends on the formal definition of continuity, and is omitted.
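This inner product can be approximated numerically by averaging f(t)g(t) over many sample points. A Python sketch (illustrative only; the interval and test functions are arbitrary):

    import math

    def B(f, g, a, b, samples=10000):
        # Midpoint Riemann sum for (1/(b - a)) * integral of f(t) g(t) dt over [a, b].
        h = (b - a) / samples
        return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(samples)) * h / (b - a)

    print(B(math.sin, math.cos, 0.0, math.pi))   # approximately 0
    print(B(math.sin, math.sin, 0.0, math.pi))   # approximately 0.5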

For concreteness, the theorems below are stated only for the dot product on Rn . The proofs, however, use nothing more than the axioms for an inner product, and so they hold (with suitable interpretations) for an arbitrary inner product.

Magnitude

Definition 3.5. Let x = (x1 , . . . , xn ) be a vector in Rn . The magnitude of x is the non-negative real number

    kxk = √(hx, xi) = √((x1)2 + · · · + (xn)2) = ( Σ_{i=1}^n (xi)2 )^{1/2} .

A vector of magnitude 1 is a unit vector.

Proposition 3.6. If x is a vector in R^n and c is real, ‖cx‖ = |c| ‖x‖. If x ≠ 0_n, then x may be written uniquely as a positive scalar multiple of a unit vector:

    x = ‖x‖ (x/‖x‖).


Proof. The definition of magnitude, together with bilinearity, gives

    ‖cx‖ = √⟨cx, cx⟩ = √(c^2 ⟨x, x⟩) = |c| ‖x‖.

If x ≠ 0_n, a scalar multiple of x is a unit vector if and only if ‖cx‖ = 1, if and only if |c| = 1/‖x‖, i.e., c = ±1/‖x‖. The only positive choice is c = 1/‖x‖.

Definition 3.7. If x ≠ 0, the unit vector x/‖x‖ is obtained by normalizing x.

Remark 3.8. A unit vector may be viewed as a "pure direction". The proposition gives precise meaning to the "principle" that a vector is a quantity having both magnitude and direction.

Theorem 3.9 (Cauchy-Schwarz Inequality). If x and y are vectors in R^n, then |⟨x, y⟩| ≤ ‖x‖ ‖y‖, with equality if and only if x and y are proportional.

Proof. If x = 0, then ⟨x, y⟩ = 0 and the conclusion of the theorem is immediate. Otherwise, for each real number t, consider the vector tx + y. Expanding ‖tx + y‖^2 = ⟨tx + y, tx + y⟩ as a function of t,

    0 ≤ ⟨tx + y, tx + y⟩ = t^2 ⟨x, x⟩ + 2t ⟨x, y⟩ + ⟨y, y⟩ = t^2 ‖x‖^2 + 2t ⟨x, y⟩ + ‖y‖^2.

In other words, 0 ≤ At2 + Bt + C, with A = kxk2 , B = 2 hx, yi, and C = kyk2 . A non-negative quadratic function satisfies B 2 − 4AC ≤ 0, or B 2 ≤ 4AC; otherwise the quadratic formula would give two distinct real roots, and the quadratic would be negative in between. Plugging in the definitions of A, B, and C gives (hx, yi)2 ≤ kxk2 kyk2 ,

or | hx, yi | ≤ kxk kyk.

Equality holds if and only if the quadratic has a double root, if and only if there exists a real number t such that tx + y = 0, if and only if x and y are proportional.


Angle

Theorem 3.10. If x and y are vectors in R^n, then the angle θ between x and y satisfies ⟨x, y⟩ = ‖x‖ ‖y‖ cos θ.

Proof. If x = 0 or y = 0, the equality in the theorem is immediate. It remains to handle the case with x and y both non-zero. Let θ be the angle between x and y in the plane containing these two vectors. The triangle with vertices 0, x, and y has sides of magnitude ‖x‖, ‖y‖, and ‖x − y‖. By the Law of Cosines,

    ‖x − y‖^2 = ‖x‖^2 + ‖y‖^2 − 2‖x‖ ‖y‖ cos θ.

But the left-hand side can be expanded as ⟨x − y, x − y⟩ = ‖x‖^2 + ‖y‖^2 − 2⟨x, y⟩. Equating and canceling gives the theorem.

Corollary 3.11. If x and y are non-zero vectors, there is a unique real number θ in the interval [0, π] such that

    cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖).

Remark 3.12. The expression on the right lies in the interval [−1, 1] by the Cauchy-Schwarz inequality. Definition 3.13. Two vectors x, y in Rn are orthogonal if hx, yi = 0. Example 3.14. The vectors x = (1, 1, 1) and y = (1, −2, 1) in R3 are orthogonal.


Example 3.15. If x = (1, 0) and y = (1, 1), then

    ‖x‖^2 = ⟨x, x⟩ = 1,    ‖y‖^2 = ⟨y, y⟩ = 2,    ⟨x, y⟩ = 1.

Consequently, cos θ = 1/√2, so θ = π/4. This agrees with geometric expectation, since x is a side and y a diagonal of a square. We can perform angle computations for vectors we cannot easily visualize. If x = (1, 0, 0, 0) and y = (1, 1, 1, 1), then

    ‖x‖^2 = 1,    ‖y‖^2 = 4,    ⟨x, y⟩ = 1.

Consequently, cos θ = 1/2, so θ = π/3. (Generally, the angle between a standard basis vector and the "diagonal of an n-cube" is not a "nice" multiple of π.)

Theorem 3.16 (The Pythagorean Theorem in R^n). If x and y are vectors in R^n, then

    ‖x + y‖^2 = ‖x‖^2 + ‖y‖^2

if and only if x and y are orthogonal.

Proof. Expanding the left-hand side as a dot product, kx + yk2 = hx + y, x + yi = kxk2 + kyk2 + 2 hx, yi . The theorem follows immediately.
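The angle computations of Example 3.15 can be confirmed numerically. The sketch below is only illustrative; the helper name angle_between is not from the text, and the clip call simply guards against round-off pushing the cosine slightly outside [−1, 1].

    import numpy as np

    def angle_between(x, y):
        # cos(theta) = <x, y> / (||x|| ||y||), with theta in [0, pi] (Corollary 3.11)
        c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(np.clip(c, -1.0, 1.0))

    print(angle_between(np.array([1, 0]), np.array([1, 1])) / np.pi)            # 0.25, i.e. pi/4
    print(angle_between(np.array([1, 0, 0, 0]), np.array([1, 1, 1, 1])) / np.pi)  # about 1/3, i.e. pi/3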

3.2 Orthogonal Projection

Definition 3.17. Let a be a non-zero vector in Rn . If x ∈ Rn , a pair of vectors xa , xa⊥ is called a decomposition of x into parallel and orthogonal components with respect to a if (i) x = xa + xa⊥ ; (ii) xa = ca for some real number c; (iii) hxa⊥ , ai = 0.

Theorem 3.18. Let a be a non-zero vector in R^n. For each x in R^n, there is a unique decomposition of x into parallel and orthogonal components with respect to a, given by the formulas

    proj_a(x) = (⟨x, a⟩ / ⟨a, a⟩) · a,    perp_a(x) = x − proj_a(x).

Proof. Let x be an arbitrary vector in Rn . If c is a real number, the difference x − ca is orthogonal to a if and only if c satisfies the equation 0 = hx − ca, ai = hx, ai − c ha, ai. By elementary algebra, c = hx, ai / ha, ai.


Definition 3.19. If a is a non-zero vector in R^n, then proj_a(x) is called the (orthogonal) projection of x on a.

Remark 3.20. If ‖u‖ = 1, then proj_u(x) = ⟨x, u⟩ · u.


Example 3.21. Let a = (2, 1), and x = (x^1, x^2) an arbitrary vector in R^2. We have ⟨a, a⟩ = 5, ⟨x, a⟩ = 2x^1 + x^2, and

    proj_a(x) = ((2x^1 + x^2)/5) (2, 1) = (1/5)(4x^1 + 2x^2, 2x^1 + x^2),
    perp_a(x) = x − proj_a(x) = (1/5)(x^1 − 2x^2, −2x^1 + 4x^2).

As expected, ⟨perp_a(x), a⟩ = (1/5)[2(x^1 − 2x^2) + (−2x^1 + 4x^2)] = 0.

Example 3.22. If θ is real, a = (cos θ, sin θ), and a^⊥ = (− sin θ, cos θ), then

    proj_a(x) = (x^1 cos θ + x^2 sin θ)(cos θ, sin θ),
    proj_{a^⊥}(x) = (x^2 cos θ − x^1 sin θ)(− sin θ, cos θ) = x − proj_a(x).

Example 3.23. If a = (1, 1, 1, 1), then ⟨a, a⟩ = 4. For an arbitrary vector x = (x^1, x^2, x^3, x^4) in R^4 we have ⟨x, a⟩ = x^1 + x^2 + x^3 + x^4, and

    proj_a(x) = ((x^1 + x^2 + x^3 + x^4)/4) (1, 1, 1, 1),
    perp_a(x) = x − proj_a(x).
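The projection formula of Theorem 3.18 translates directly into a short numerical check of Example 3.21. This is only a sketch; the test vector x is an arbitrary choice.

    import numpy as np

    def proj(a, x):
        # proj_a(x) = (<x, a> / <a, a>) a, for a != 0
        return (np.dot(x, a) / np.dot(a, a)) * a

    a = np.array([2.0, 1.0])
    x = np.array([3.0, -7.0])       # arbitrary test vector
    p = proj(a, x)
    q = x - p                       # perp_a(x)
    print(p, q, np.isclose(np.dot(q, a), 0.0))   # the perpendicular part is orthogonal to a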

Orthonormal Bases and Orthogonal Matrices

Definition 3.24. A set (u_j)_{j=1}^k in R^n is orthonormal if each element is a unit vector and any two elements are orthogonal, i.e., ⟨u_i, u_j⟩ = δ_{ij}.

Theorem 3.25. If S = (u_j)_{j=1}^k is an orthonormal set of vectors in R^n, and if x = Σ_j x^j u_j for some scalars (x^j)_{j=1}^k, then x^j = ⟨x, u_j⟩. In particular, S is linearly independent.

Proof. If x = Σ_j x^j u_j, then for each i,

    ⟨x, u_i⟩ = ⟨Σ_j x^j u_j, u_i⟩ = Σ_{j=1}^k x^j ⟨u_j, u_i⟩ = Σ_{j=1}^k x^j δ_{ji} = x^i.


If x = 0_n is a linear combination from S, then x^j = ⟨x, u_j⟩ = 0 for all j, so S is linearly independent.

Corollary 3.26. If S = (u_j)_{j=1}^n is an orthonormal set in R^n, then S is a basis, and for every x in R^n,

    x = ⟨x, u_1⟩ · u_1 + · · · + ⟨x, u_n⟩ · u_n = Σ_{j=1}^n ⟨x, u_j⟩ · u_j.

Proof. Since S is a linearly independent set of n elements in R^n by Theorem 3.25, S is a basis by Corollary 2.68. It follows that an arbitrary vector x may be written as a linear combination from S. Theorem 3.25 gives the form of the coefficients.

Corollary 3.27. If (u_j)_{j=1}^n is orthonormal in R^n, then Σ_j u_j u_j^T = I_n.

Proof. For every x in R^n, u_j^T x = ⟨x, u_j⟩. By Corollary 3.26,

    (Σ_{j=1}^n u_j u_j^T) x = Σ_{j=1}^n u_j (u_j^T x) = Σ_{j=1}^n ⟨x, u_j⟩ · u_j = x = I_n x

for all x, so Σ_j u_j u_j^T = I_n by Corollary 1.23.

Definition 3.28. An n × n real matrix P is orthogonal if P^{−1} = P^T, i.e., if P^T P = I_n and P P^T = I_n.

Example 3.29. Let θ be real. The following matrices are orthogonal:

    Rot_θ = [ cos θ   − sin θ ]        Ref_{θ/2} = [ cos θ    sin θ ]
            [ sin θ     cos θ ],                    [ sin θ  − cos θ ].

(The notation is explained in Examples 4.17 and 4.19.)

Example 3.30. A permutation matrix (Exercise 1.13) is orthogonal.

Theorem 3.31. The following are equivalent for an n × n real matrix P:

(i) The columns of P form an orthonormal set in R^n.
(ii) The columns of P^T form an orthonormal set in R^n.
(iii) P is orthogonal.


Proof. In general, if A and B are n×n matrices, the (i, j) entry of AT B is the dot product of the ith column of A and the jth column of B. Similarly, the (i, j) entry of AB T is the dot product of the ith column of AT and the jth column of B T . Since In = [δji ], the equation P T P = In holds if and only if the columns of P are orthonormal, and P P T = In holds if and only if the columns of P T are orthonormal. The content of the proof is to show each of these equations implies the other. ((i) if and only if (ii)). If the columns of P are orthonormal, then P T P = In . Further, the columns of P are a basis of Rn by Theorem 3.25. Consequently, the reduced row-echelon form of P T is In , so P T is invertible by Theorem 1.49. By Theorem 1.30 (ii), P P T = In as well, so the columns of P T are orthonormal. The converse implication follows mutatis mutandis by exchanging the roles of P and P T . Since (iii) is equivalent to “P T P = In and P P T = In ”, the proof is complete. Theorem 3.32. The set O(n) of n × n real orthogonal matrices is a group under matrix multiplication. Proof. The identity matrix In is orthogonal, and acts as the multiplicative identity element in O(n). If P and Q are orthogonal, then (P Q)T = QT P T = Q−1 P −1 = (P Q)−1 , so P Q is orthogonal. That is, O(n) is closed under multiplication. By Theorem 3.31, P is orthogonal if and only if P T = P −1 is orthogonal, so O(n) is closed under inversion. Theorem 3.33. The following are equivalent for an n×n real matrix P : (i) hP x, P yi = hx, yi for all x, y in Rn . (ii) P is orthogonal. Proof. Since hx, yi = xT y, we have

hP x, P yi = (P x)T (P y) = (xT P T )(P y) = xT (P T P )y,

and therefore

hP x, P yi − hx, yi = xT (P T P )y − xT (In )y = xT (P T P − In )y

for all x, y. It follows that hP x, P yi = hx, yi for all x, y in Rn if and only if P T P − In = 0n×n , if and only if P is orthogonal.
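Definition 3.28 and Theorem 3.33 are straightforward to verify numerically for a rotation matrix. The following sketch is illustrative only; the angle and the test vectors are arbitrary.

    import numpy as np

    theta = 0.7
    P = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])    # Rot_theta from Example 3.29

    # Definition 3.28: P is orthogonal iff P^T P = I_n
    print(np.allclose(P.T @ P, np.eye(2)))

    # Theorem 3.33: orthogonal matrices preserve the dot product
    x, y = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
    print(np.isclose((P @ x) @ (P @ y), x @ y))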


The Gram-Schmidt Algorithm

Theorem 3.34. If W is a subspace of (R^n, +, ·), then W has an orthonormal basis.

Proof. The idea is to start from an arbitrary basis (v_j)_{j=1}^m of W and "orthonormalize" the vectors inductively, constructing, for k = 1, . . . , m, an orthonormal basis (u_j)_{j=1}^k of the space W_k = Span(v_j)_{j=1}^k, see Figure 3.1. The process is called the Gram-Schmidt algorithm. When k = m, we obtain an orthonormal basis of W.

Since v_1 ≠ 0_n, we may set u_1 = v_1/‖v_1‖. Assume inductively that, for some k ≥ 1, we have constructed an orthonormal basis (u_j)_{j=1}^k of the space W_k = Span(v_j)_{j=1}^k. Define

    u′_{k+1} = v_{k+1} − Σ_{j=1}^k ⟨v_{k+1}, u_j⟩ · u_j.

Note that u′_{k+1} ≠ 0_n since v_{k+1} does not lie in W_k = Span(u_j)_{j=1}^k, and that ⟨u′_{k+1}, u_j⟩ = 0 for j = 1, . . . , k by direct computation. We may therefore put u_{k+1} = u′_{k+1}/‖u′_{k+1}‖.

Since each element of the linearly independent set (u_j)_{j=1}^{k+1} is a linear combination from (v_j)_{j=1}^{k+1}, each set is a basis of W_{k+1}.

Figure 3.1: The Gram-Schmidt algorithm.
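The inductive construction in the proof of Theorem 3.34 is short to code. The following Python sketch is a minimal implementation for illustration; it assumes the input vectors are linearly independent, and the sample vectors happen to be the ones that appear in Example 3.41 below.

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalize a list of linearly independent vectors (Theorem 3.34).
        basis = []
        for v in vectors:
            # subtract the projections onto the u_j already constructed
            w = v - sum(np.dot(v, u) * u for u in basis)
            basis.append(w / np.linalg.norm(w))
        return basis

    vs = [np.array([-1.0, 1.0, 0.0, 0.0]),
          np.array([-1.0, 0.0, 1.0, 0.0]),
          np.array([-1.0, 0.0, 0.0, 1.0])]
    us = gram_schmidt(vs)
    # pairwise inner products form the identity matrix, so the u_j are orthonormal
    print(np.round(np.array([[np.dot(a, b) for b in us] for a in us]), 10))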

The Orthogonal Complement of a Subspace

Definition 3.35. If W is a subspace of (R^n, +, ·) equipped with the Euclidean inner product, the orthogonal complement W^⊥ is

    W^⊥ = {x in R^n : ⟨x, w⟩ = 0 for all w in W}.


Remark 3.36. The fact that W^⊥ is a subspace comes from bilinearity of the dot product: If x, y are in W^⊥ and c is real, then ⟨cx + y, w⟩ = c⟨x, w⟩ + ⟨y, w⟩ = 0 for all w in W.

Theorem 3.37. Let W be a subspace of (R^n, +, ·) and (u_j)_{j=1}^m an orthonormal basis of W. If x is an arbitrary vector in R^n, define

    proj_W(x) = Σ_{j=1}^m ⟨x, u_j⟩ · u_j.

(i) We have projW (x) ∈ W and x − projW (x) ∈ W ⊥ . (ii) For every w in W , we have kx − projW (x)k ≤ kx − wk, with equality if and only if w = projW (x). Proof. (i). Since S = (uj )m j=1 ⊆ W and projW (x) is a linear combination from S, projW (x) ∈ W . The proof of Theorem 3.25 shows that hprojW (x), uj i = hx, uj i for j = 1, . . . , m. It follows that hx − projW (x), uj i = 0 for j = 1, . . . , m. Since each element of W is a linear combination from (uj )m j=1 , ⊥ hx − projW (x), wi = 0 for all w in W , i.e., x − projW (x) ∈ W . (ii). Let w be an arbitrary vector in W . Since projW (x) − w is in W (as a difference of vectors in W ) and therefore orthogonal to x − projW (x) by part (i), the Pythagorean Theorem gives kx − projW (x)k2 ≤ kx − projW (x)k2 + k projW (x) − wk2 = k(x − projW (x)) + (projW (x) − w)k2 = kx − wk2 ,

with equality if and only if w = projW (x). Remark 3.38. Though the definition of projW (x) depended on a choice of orthonormal basis of W , the value does not: Property (ii) shows there is at most one element of W that is closer to x than all other elements of W , and projW (x) satisfies this “best approximation property”. Corollary 3.39. If W ⊆ Rn is a subspace, then Rn = W ⊕ W ⊥ .


Proof. (W ∩ W^⊥ = {0_n}). If x ∈ W ∩ W^⊥, then x is orthogonal to itself. That is, ⟨x, x⟩ = 0. By positive-definiteness, x = 0_n.

(R^n = W + W^⊥). By the Gram-Schmidt algorithm, W has an orthonormal basis (u_j)_{j=1}^m. If x ∈ R^n, then by Theorem 3.37,

    x = proj_W(x) + (x − proj_W(x)) ∈ W + W^⊥.

Corollary 3.40. If W is a k-dimensional subspace of (R^n, +, ·), there exists a homogeneous linear system Ax = 0_{n−k} of (n − k) equations in n variables whose solution set is precisely W.

Proof. A linear equation a_1 x^1 + · · · + a_n x^n = 0 may be interpreted as asserting that the product of the row matrix a^T = [a_j] with the column x = [x^j] is 0, i.e., that the vectors a = [a_j] and x are orthogonal. Let W ⊆ R^n be a subspace. By Corollary 3.39, the orthogonal complement W^⊥ has dimension (n − k). Pick a basis (a_i)_{i=1}^{n−k}, with a_i = [a_i^j], and form the matrix A whose ith row is a_i^T. If x ∈ R^n, then Ax = 0_{n−k} if and only if ⟨a_i, x⟩ = 0 for each i = 1, . . . , n − k, if and only if x ∈ (W^⊥)^⊥, if and only if x ∈ W by Corollary 3.39.

Example 3.41. Let v_0 = (1, 1, 1, 1), and W = Span(v_0)^⊥ ⊆ R^4. Find an orthonormal basis of W.

A vector x = (x^1, x^2, x^3, x^4) is in W if and only if 0 = ⟨x, v_0⟩ = x^1 + x^2 + x^3 + x^4. The coefficient matrix [1 1 1 1] is already in reduced row-echelon form. The variables x^2, x^3, and x^4 are free, and the general solution is

    x = (−(x^2 + x^3 + x^4), x^2, x^3, x^4) = x^2 (−1, 1, 0, 0) + x^3 (−1, 0, 1, 0) + x^4 (−1, 0, 0, 1).

Applying Gram-Schmidt to the columns (v_j)_{j=1}^3 on the right-hand side,

    u_1 = v_1/‖v_1‖ = (1/√2)(−1, 1, 0, 0).


Next, since ⟨v_2, u_1⟩ = 1/√2,

    v′_2 = v_2 − ⟨v_2, u_1⟩ · u_1 = (−1, 0, 1, 0) − (1/2)(−1, 1, 0, 0) = (1/2)(−1, −1, 2, 0);

normalizing gives u_2 = v′_2/‖v′_2‖ = (1/√6)(−1, −1, 2, 0). Finally,

    v′_3 = v_3 − ⟨v_3, u_1⟩ · u_1 − ⟨v_3, u_2⟩ · u_2
         = (−1, 0, 0, 1) − (1/2)(−1, 1, 0, 0) − (1/6)(−1, −1, 2, 0) = (1/3)(−1, −1, −1, 3),

so u_3 = v′_3/‖v′_3‖ = (1/(2√3))(−1, −1, −1, 3).

Remark 3.42. Normalizing a vector is equivalent to normalizing a nonzero scalar multiple; in the preceding example, we drop the fraction multipliers before computing magnitudes.
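A quick numerical check of Example 3.41, included only as an illustration, confirms that the three vectors found above are orthonormal and orthogonal to v_0 (so they do lie in W = Span(v_0)^⊥).

    import numpy as np

    u1 = np.array([-1, 1, 0, 0]) / np.sqrt(2)
    u2 = np.array([-1, -1, 2, 0]) / np.sqrt(6)
    u3 = np.array([-1, -1, -1, 3]) / (2 * np.sqrt(3))
    v0 = np.array([1, 1, 1, 1])

    U = np.column_stack([u1, u2, u3])
    print(np.allclose(U.T @ U, np.eye(3)))      # the columns are orthonormal
    print(np.allclose(U.T @ v0, np.zeros(3)))   # each u_j is orthogonal to v_0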

3.3 The Determinant

Definition 3.43. Let (v_j)_{j=1}^n be a set of vectors in R^n. The box (or parallelepiped) they span is the set of linear combinations

    Σ_{j=1}^n x^j v_j,    0 ≤ x^j ≤ 1 for j = 1, . . . , n.

The vector vj is the jth edge of the box. The unit cube is the box spanned by the standard basis (ej )nj=1 . We wish to define a notion of “oriented (n-dimensional) volume” for a box together with an ordering of its edges, generalizing oriented


area of a plane parallelogram, Figure 3.2. Our approach is to give axioms for a real-valued function on the set (R^n)^n of ordered n-tuples of vectors in R^n in such a way that oriented volume can be computed by assembling the (v_j) into an n × n matrix and using row reduction.

Figure 3.2: Positive and negative oriented area. Along the way, we find a polynomial formula for oriented volume in terms of the components of the (vj ). The catch is, the formula contains n! summands. For matrices of size 2 × 2 (two terms) and 3 × 3 (six terms) the formula is suitable for practical use. Even for 10 × 10 matrices (10! = 3, 628, 800 terms), however, row reduction is the only serious approach for computing oriented volume. Theorem 3.44. There exists a unique function vol : (Rn )n → R satisfying the following conditions for all (vj )nj=1 in Rn : (i) For all x1 , x2 in Rn and all real c, vol(cx1 +x2 , v2 , . . . , vn ) = c vol(x1 , v2 , . . . , vn )+vol(x2 , v2 , . . . , vn ). (ii) Swapping two arguments changes the sign: vol(. . . , vj , . . . , vi , . . . ) = − vol(. . . , vi , . . . , vj , . . . ). In particular, vol(. . . , vi , . . . , vi , . . . ) = 0. (iii) vol(e1 , e2 , . . . , en ) = 1. Remark 3.45. The “distributive law” in (i) extends to arbitrary linear combinations in the first argument by induction on the number of summands. In conjunction with (ii), the property of skew-symmetry, there is a similar distributive law in the jth argument: Swap the first and jth arguments using (ii), apply (i) to break up the linear combination, then re-swap the first and jth arguments in each summand; the overall effect is to break up a linear combination in the jth argument.



Before giving a proof, we look explicitly at the cases n = 2 and 3.

Example 3.46. If v1 = a1 e1 + a2 e2 and v2 = b1 e1 + b2 e2 , then vol(v1 , v2 ) = a1 b2 − a2 b1 . Indeed, the “distributive law” gives vol(v1 , v2 ) = vol(a1 e1 + a2 e2 , b1 e1 + b2 e2 )

= a1 b1 vol(e1 , e1 ) + a1 b2 vol(e1 , e2 ) + a2 b1 vol(e2 , e1 ) + a2 b2 vol(e2 , e2 ). Since vol(e1 , e1 ) = vol(e2 , e2 ) = 0, and vol(e2 , e1 ) = − vol(e1 , e2 ) by (ii), vol(v1 , v2 ) = a1 b2 vol(e1 , e2 ) + a2 b1 vol(e2 , e1 ) = (a1 b2 − a2 b1 ) vol(e1 , e2 )

= a1 b2 − a2 b1 .

Conceptually, the “distributive law” allows vol(v1 , v2 ) to be expressed in terms of vol(ei , ej ) for all 22 = 4 ordered pairs of indices i and j. Terms with repeated index are 0, so only 2! = 2 terms are non-zero, one with positive sign, one with negative sign. Example 3.47. If v1 = (a1 , a2 , a3 ), v2 = (b1 , b2 , b3 ), v3 = (c1 , c2 , c3 ), then vol(v1 , v2 , v3 ) = a1 (b2 c3 − b3 c2 ) + a2 (b3 c1 − b1 c3 ) + a3 (b1 c2 − b2 c1 ).

This can be handled by brute force, but now instead of 22 = 4 terms (only two of which are non-zero), there are 33 = 27 terms, of which only 3! = 6 are non-zero. Examining the structure of the computation saves enough work to be worthwhile. When we use v1 = a1 e1 +a2 e2 +a3 e3 (etc.) to expand vol(v1 , v2 , v3 ), we get summands ai bj ck vol(ei , ej , ek ), with each index i, j, k taking on each value 1, 2, 3. Because of skewsymmetry, any term with a repeated index is zero. Thus, there is one non-zero term for each permutation of the indices 1, 2, 3, and vol(ei , ej , ek ) = ± vol(e1 , e2 , e3 ) = ±1, the sign being determined by the number of “swaps” needed to put i, j, k in increasing order. The end result is vol(v1 , v2 , v3 ) = a1 b2 c3 − a1 b3 c2 + a2 b3 c1 − a2 b1 c3 + a3 b1 c2 − a3 b2 c1 = a1 (b2 c3 − b3 c2 ) + a2 (b3 c1 − b1 c3 ) + a3 (b1 c2 − b2 c1 ).


Proof. (Uniqueness). Because an arbitrary vector v_j can be expressed (uniquely) as a linear combination of standard basis vectors, the "distributive law" shows vol is completely determined by its values on arbitrary n-tuples of standard basis vectors. By (ii), if any vector is repeated as an argument, the corresponding summand is 0. That is, vol is determined by its values on n-tuples of distinct standard basis vectors, i.e., on permutations of the standard basis. But every ordering of the standard basis vectors can be put into "standard" order by swapping arguments. It follows that vol is completely determined by vol(e_1, . . . , e_n), which is equal to 1 by (iii).

To implement the preceding conceptual calculation as a formula, write, for each j = 1, . . . , n,

    v_j = A^1_j e_1 + · · · + A^n_j e_n = Σ_{i=1}^n A^i_j e_i.

Each non-zero term in vol(v_1, . . . , v_n) comes from a permutation σ. If (−1)^σ denotes the sign of σ, i.e., +1 if σ is an even permutation and −1 if σ is odd, then the term corresponding to σ is

    A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)} vol(e_{σ(1)}, . . . , e_{σ(n)}) = (−1)^σ A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)}.

Summing over all permutations in the symmetric group S_n gives

    vol(v_1, . . . , v_n) = Σ_{σ∈S_n} (−1)^σ A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)}.

(Existence). It remains to show the preceding satisfies (i)–(iii). Condition (i) follows from the distributive law for real numbers, because each summand contains precisely one factor coming from v_1. To prove Condition (ii), let τ = (i j) exchange i and j, and note that (−1)^σ = −(−1)^{στ} because τ is odd. The general summand after swapping v_i and v_j is

    (−1)^σ A_1^{στ(1)} A_2^{στ(2)} · · · A_n^{στ(n)} = −(−1)^{στ} A_1^{στ(1)} A_2^{στ(2)} · · · A_n^{στ(n)}.

Summing over σ in S_n amounts to summing over στ. To prove (iii), note that e_j = Σ_i δ^i_j e_i, so

    vol(e_1, . . . , e_n) = Σ_{σ∈S_n} (−1)^σ δ_1^{σ(1)} δ_2^{σ(2)} · · · δ_n^{σ(n)}.

The only summand with a non-zero contribution comes from the identity permutation, and the contribution is 1.


Example 3.48. The box spanned by ((2, −3), (1, −4)) has signed area

    det [  2    1 ]
        [ −3   −4 ]  = (2)(−4) − (1)(−3) = −5.

That is, the box (parallelogram) has area 5, and the small angle from (2, −3) to (1, −4) is oppositely oriented to the small angle from e_1 to e_2 (because the sign is negative).

Definition 3.49. The determinant of an n × n matrix A = [A^i_j] is

    det A = Σ_{σ∈S_n} (−1)^σ A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)}.

Theorem 3.50. If A is an n × n matrix, then det A = det A^T.

Proof. Since (−1)^{σ^{−1}} = (−1)^σ, summing over σ^{−1} in S_n gives

    det A = Σ_{σ^{−1}∈S_n} (−1)^σ A_1^{σ^{−1}(1)} · · · A_n^{σ^{−1}(n)} = Σ_{σ∈S_n} (−1)^σ A^1_{σ(1)} A^2_{σ(2)} · · · A^n_{σ(n)}.

The last sum is det A^T, since the (i, j) entry of A^T is A^j_i.

Theorem 3.51. If A = [A^i_j] is upper triangular or lower triangular, then the determinant is the product of the diagonal entries:

    det A = Π_{i=1}^n A^i_i.

Proof. Assume first that A is upper triangular, namely that A^i_j = 0 for j < i. In the polynomial formula for det A, a summand

    (−1)^σ A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)}

is zero unless σ(j) ≤ j for each j. But the only permutation satisfying these conditions is the identity: If σ is not the identity, there is a smallest index j with σ(j) ≠ j, and since σ is a bijection of the set {1, 2, . . . , n}, σ(j) > j. Consequently, if A is upper triangular, then

    det A = Σ_{σ∈S_n} (−1)^σ A_1^{σ(1)} A_2^{σ(2)} · · · A_n^{σ(n)} = A^1_1 A^2_2 · · · A^n_n.

If A is lower triangular, then AT is upper triangular.
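The permutation-sum formula of Definition 3.49 can be implemented directly, though as noted above it is impractical for large n. The sketch below is illustrative only; the matrix is the one from Example 3.58 below, and the sign of a permutation is computed by counting inversions.

    from itertools import permutations
    import numpy as np

    def det_by_permutations(A):
        # det A = sum over sigma of sign(sigma) * prod_j A[sigma(j), j]  (Definition 3.49)
        n = len(A)
        total = 0.0
        for sigma in permutations(range(n)):
            inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
            sign = -1 if inversions % 2 else 1
            term = 1.0
            for j in range(n):
                term *= A[sigma[j]][j]
            total += sign * term
        return total

    A = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 3.0], [1.0, 4.0, 9.0]])
    print(det_by_permutations(A), np.linalg.det(A))   # both give 2, up to round-off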


Determinants and Row Operations

The determinant of a square matrix is the signed volume of the box spanned by its columns or (because of Theorem 3.50) by its rows. (This is geometrically non-obvious; the boxes themselves do not generally have the same shape.) Properties (i) and (ii) for the determinant, regarded as a function of the rows of an m × m matrix, specify how the determinant changes under an elementary row operation:

Theorem 3.52. If A is an m × m matrix, E an elementary matrix, and A = EA′, then det A = (det E)(det A′).

Proof. We check the claim explicitly for row operations of each type.

(Type I). Adding a multiple of one row to another does not change the determinant:

    vol(. . . , v_i + cv_j, . . . , v_j, . . . ) = vol(. . . , v_i, . . . , v_j, . . . ) + c vol(. . . , v_j, . . . , v_j, . . . ) = vol(. . . , v_i, . . . , v_j, . . . ).

But a Type I elementary matrix is triangular, with 1s on the diagonal, so det E = 1 in this case.

(Type II). Multiplying the ith row by c multiplies the determinant by c. But the Type II elementary matrix for this operation is diagonal, with a single c in the (i, i) entry and 1s elsewhere on the diagonal, so det E = c in this case.

(Type III). Exchanging two rows of A multiplies det A by −1. But the corresponding Type III elementary matrix E is obtained from the identity by swapping two rows, so det E = − det I_m = −1 in this case.

Remark 3.53. As a fringe benefit of the proof, the determinant of an elementary matrix is non-zero. Theorem 3.52 (together with induction on the number of factors) shows that if E is a product of elementary matrices, then (i) det E ≠ 0; (ii) det(EC) = (det E)(det C) for every m × m matrix C.

Corollary 3.54. A matrix A is invertible if and only if det A ≠ 0.


Proof. Put A in reduced row-echelon form, i.e., factor A = EA′ with E a product of elementary matrices and A′ a reduced row-echelon matrix, necessarily upper triangular. By Theorem 1.49, A is invertible if and only if A′ = I_m, if and only if A′ does not contain a row of 0s, if and only if det A′ ≠ 0. The corollary follows since det E ≠ 0 and det A = (det E)(det A′).

Corollary 3.55. If A and B are m × m matrices, then det(AB) = (det A)(det B). In particular, det(AB) = det(BA).

Proof. Factor A = EA′ as in the preceding proof. If A is invertible, then A′ = I_m, and det(AB) = det(EB) = (det E)(det B) = (det A)(det B). If A is not invertible, then det A = 0 by the preceding corollary. Further, the reduced row-echelon matrix A′ has a row of 0s, so A′B has a row of zeros. The reduced row-echelon matrix of A′B consequently has a row of zeros, so det(A′B) = 0, and det(AB) = (det E)(det A′B) = 0 = (det A)(det B).

Corollary 3.56. If A is invertible, then det(A^{−1}) = 1/det A.

Proof. We have I_m = AA^{−1}, so 1 = det I_m = (det A)(det A^{−1}).

Corollary 3.57. If A is an arbitrary m × m matrix and P is invertible, then det(P^{−1}AP) = det A.

Proof. We have det(P^{−1}AP) = (det P^{−1})(det A)(det P) = det A.

Example 3.58. Find the signed volume of the box spanned by the ordered triple ((1, 1, 1), (1, 2, 4), (1, 3, 9)). The task is to find the determinant of the matrix whose columns are the given vectors. Although there is a six-term formula for a 3 × 3 determinant, we use row operations for simplicity (and illustration). We also use vertical bars, det A = |A|, to save space:

    |1 1 1|  R2−R1, R3−R1  |1 1 1|  R3−3R2  |1 1 1|
    |1 2 3|  ----------->  |0 1 2|  ----->  |0 1 2|  = (1)(1)(2) = 2.
    |1 4 9|                |0 3 8|          |0 0 2|

For practice, check the answer with the six-term formula.
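For a machine check of Example 3.58, and of the product rule of Corollary 3.55, one can use NumPy's determinant routine (which itself works by a factorization closely related to row reduction). The second matrix B is an arbitrary choice for illustration.

    import numpy as np

    A = np.column_stack([(1, 1, 1), (1, 2, 4), (1, 3, 9)]).astype(float)
    print(np.linalg.det(A))        # 2.0, as in Example 3.58

    # Corollary 3.55: det(AB) = det(A) det(B)
    B = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]])
    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))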


Example 3.59. If a < b < c are real numbers, then

    |1 a a^2|
    |1 b b^2| = (b − a)(c − a)(c − b).
    |1 c c^2|

Indeed, the row operations R2 − R1 and R3 − R1 give rows (1, a, a^2), (0, b − a, (b − a)(b + a)), (0, c − a, (c − a)(c + a)). Factoring (b − a) from the second row and (c − a) from the third leaves (b − a)(c − a) times the determinant with rows (1, a, a^2), (0, 1, b + a), (0, 1, c + a). Finally, R3 − R2 produces an upper triangular matrix with diagonal entries 1, 1, (c − b), so the determinant is (b − a)(c − a)(c − b).

Exercises

Exercise 3.1. Let a be a non-zero vector in R^n, and let λ be a non-zero real number. (a) Use the orthogonal projection formula to show that proj_{λa}(x) = proj_a(x) for all x. (In words, orthogonal projection to a depends only on the direction of a, not on its magnitude.) (b) If a is a unit vector, how does the formula for proj_a(x) simplify?

Exercise 3.2. Each part refers to

    u_1 = (1/√3)(1, 1, 1),    u_2 = (1/√2)(1, 0, −1),    u_3 = (1/√6)(1, −2, 1).

(a) Verify that {u_1, u_2, u_3} is an orthonormal set. (b) Show by direct calculation that if x = (x^1, x^2, x^3) is arbitrary, then x = ⟨x, u_1⟩ · u_1 + ⟨x, u_2⟩ · u_2 + ⟨x, u_3⟩ · u_3.

Exercise 3.3. In R^4, suppose u_1 = (1/2)(1, 1, 1, 1), u_2 = (1/2)(1, −1, −1, 1), u_3 = (1/2)(1, −1, 1, −1), u_4 = (1/2)(1, 1, −1, −1). (a) Verify that {u_1, u_2, u_3, u_4} is an orthonormal set. (You need to check six inner products, in addition to computing four magnitudes.)


(b) Calculate the four outer products (u_j u_j^T)_{j=1}^4.

(c) Show by direct calculation that if x = (x^1, x^2, x^3, x^4) is arbitrary, x = ⟨x, u_1⟩ · u_1 + ⟨x, u_2⟩ · u_2 + ⟨x, u_3⟩ · u_3 + ⟨x, u_4⟩ · u_4.

Exercise 3.4. Each part refers to the vectors v_1 = (1, 1, 1, 1) and v_2 = (1, 2, 3, 4) in R^4, and the plane W they span. (i) Use Gram-Schmidt to find an orthonormal basis of W. (ii) If x = (x^1, x^2, x^3, x^4) is arbitrary, decompose x in W ⊕ W^⊥. (iii) Find a system of two equations in four variables whose solution set is W. Similarly for W^⊥.

Exercise 3.5. Referring to Example 3.41: (a) Form the matrix P whose columns are (u_j)_{j=0}^3 and verify that P^T P = I_4 and P P^T = I_4. (b) Calculate the projection matrix proj_W = u_1 u_1^T + u_2 u_2^T + u_3 u_3^T directly, and verify the result is consistent with Example 3.23.

Exercise 3.6. Let x and y be vectors in R^n. The parallelogram they span is the plane figure with vertices 0_n, x, y, and x + y. The diagonals are the segments from 0_n to x + y and from x to y. (a) Prove the parallelogram law:

    ‖x + y‖^2 + ‖x − y‖^2 = 2(‖x‖^2 + ‖y‖^2),

and give a geometric interpretation. (b) Prove the polarization identity:

    ‖x + y‖^2 − ‖x − y‖^2 = 4⟨x, y⟩.

Conclude that the diagonals have equal magnitude if and only if the parallelogram is a rectangle. (c) Prove that the diagonals of a parallelogram are orthogonal if and only if the parallelogram is a rhombus.

Exercise 3.7. Let P be an n × n matrix. Prove that P is orthogonal if and only if ‖Px‖ = ‖x‖ for all x in R^n. Suggestion: One direction is immediate from Theorem 3.33. For the other direction, use the polarization identity (preceding exercise).


Chapter 4
Linear Transformations

4.1 Linear Transformations

Definition 4.1. Let V and W be arbitrary vector spaces. A mapping T : V → W is said to be a linear transformation if T satisfies the morphism condition: T(cx + y) = cT(x) + T(y) for all x, y in V and all real c.

Remark 4.2. Geometrically, T(cx + y) = cT(x) + T(y) means T maps the parallelogram in V with sides cx and y to the parallelogram in W with sides cT(x) and T(y), Figure 4.1. The mapping T is a linear transformation if and only if T has this effect on every parallelogram.

Figure 4.1: A linear transformation preserves parallelograms.

Remark 4.3. Induction on the number of summands proves that "a linear transformation distributes over linear combinations". Precisely, if v_1, . . . , v_k are elements of V and x^1, . . . , x^k are real numbers, then

    T(Σ_{i=1}^k x^i v_i) = Σ_{i=1}^k x^i T(v_i).


Definition 4.4. Let V , V 0 , and V 00 be vector spaces, T : V → V 0 and T 0 : V 0 → V 00 linear transformations. The composition T 0 T : V → V 00  is defined by T 0 T (x) = T 0 T (x) for x in V .

Remark 4.5. A composition T 0 T is linear, Exercise 4.8.

A linear transformation is the analog in linear algebra of a group homomorphism. Much of the terminology carries over, as do basic results about homomorphisms. Definition 4.6. Let T : V → W be a linear transformation. The kernel of T is the set ker(T ) = {x in V : T (x) = 0W } ⊆ V. The image of T is the set im(T ) = T (V ) = {y in W : y = T (x) for some x in V } ⊆ W. If, for all x1 and x2 in V , T (x1 ) = T (x2 ) implies x1 = x2 , then T is said to be injective. If T (V ) = W , i.e., if, for every y in W , there exists an x in V such that y = T (x), then T is said to be surjective. If T is bijective, i.e., both injective and surjective, then T is an isomorphism (of vector spaces). Theorem 4.7. Let T : V → W be a linear transformation. (i) The kernel of T is a subspace of V . (ii) The image of T is a subspace of W . (iii) T is injective if and only if ker(T ) = {0V }, i.e., dim ker(T ) = 0. (iv) If T is an isomorphism, then the inverse map T −1 : W → V is an isomorphism. Proof. (i). We first show T (0V ) = 0W , proving the kernel is non-empty: If x is an arbitrary element of V , then T (0V ) = T (0 · x) = 0 · T (x) = 0W . If x1 and x2 are elements of ker(T ) and c is real, then T (cx1 + x2 ) = cT (x1 ) + T (x2 ) = c · 0W + 0W = 0W ,


so cx1 + x2 ∈ ker(T ). By Theorem 2.26, ker(T ) is a subspace of V .

(ii). Suppose y1 and y2 are elements of T (V ) and c is real. By hypothesis, there exist x1 and x2 in V such that T (x1 ) = y1 and T (x2 ) = y2 . Consequently, T (cx1 + x2 ) = cT (x1 ) + T (x2 ) = cy1 + y2 ,

so cy1 + y2 ∈ T (V ). By Theorem 2.26, T (V ) is a subspace of W .

(iii). Suppose T is injective. If x is an arbitrary vector in ker(T ), then T (x) = 0W = T (0V ), so x = 0V . That is, ker(T ) = {0V }. Conversely, suppose ker(T ) = {0V }. If T (x1 ) = T (x2 ), we have W 0 = T (x1 ) − T (x2 ) = T (x1 − x2 ), i.e., x1 − x2 ∈ ker(T ) = {0V }. That is, x1 = x2 . Since x1 and x2 were arbitrary, T is injective. (iv). The mapping T −1 : W → V is defined by T −1 (y) = x if and only if T (x) = y. Bijectivity of T −1 is obvious; it suffices to prove T −1 is linear. Suppose y1 and y2 are elements of W , c is real, and x1 and x2 are the unique vectors in V such that T (x1 ) = y1 and T (x2 ) = y2 . Since cy1 + y2 = cT (x1 ) + T (x2 ) = T (cx1 + x2 ) by linearity of T , the definition of T −1 gives T −1 (cy1 + y2 ) = cx1 + x2 = cT −1 (y1 ) + T −1 (y2 ). Corollary 4.8. If V is finite-dimensional and T : V → W is a linear transformation, then dim T (V ) ≤ dim V , with equality if and only if T is injective. Proof. If (vj )nj=1 is a basis of V , and if we put wj = T (vj ), then the nelement set (wj )nj=1 spans T (V ): Every x in V may be written uniquely P P as j xj vj , and by linearity, T (x) = j xj wj . Theorem 2.64 implies dim T (V ) ≤ dim V . Equality of dimensions holds if and only if (wj )nj=1 is linearly indeP pendent, if and only if no non-trivial linear combination j xj wj is 0W , P if and only if no non-trivial linear combination j xj vj is in ker(T ), if and only if T is injective. Remark 4.9. In words, applying a linear transformation T cannot increase dimension, and strictly reduces dimension unless T is injective.


Examples of Linear Transformations

Example 4.10. Let V and W be arbitrary vector spaces. The zero map 0_V^W : V → W, defined by 0_V^W(x) = 0^W for all x in V, is linear.

Example 4.11. Let V be an arbitrary vector space. The identity map I_V : V → V, defined by I_V(x) = x for all x in V, is linear.

Example 4.12. If A = [A^i_j] is an m × n real matrix, multiplication by A defines a linear transformation µ_A : R^n → R^m via µ_A(x) = Ax. In components, if x = [x^j], and (e_i)_{i=1}^m is the standard basis of R^m, then

    µ_A(x) = Σ_{i=1}^m (Σ_{j=1}^n A^i_j x^j) e_i.

Linearity follows from properties of matrix operations, Theorem 1.19: µ_A(cx + y) = A(cx + y) = c(Ax) + Ay = cµ_A(x) + µ_A(y). Conversely, every linear transformation from R^n to R^m can be written as matrix multiplication in this way, Theorem 4.26.

Theorem 4.13. Let (V, +, ·) be an arbitrary finite-dimensional vector space of dimension n, and let S = (v_j)_{j=1}^n be a basis. The mapping ι_S : V → R^n that sends each x in V to its coordinate vector ι_S(x) = [x]_S in R^n is an isomorphism.

Proof. (ι_S is linear). If x is an arbitrary element of V, then by definition, [x]_S = [x^j] if and only if

    x = x^1 v_1 + · · · + x^n v_n = Σ_{j=1}^n x^j v_j.

If y is an arbitrary element of V with [y]_S = [y^j], and if c is real, then

    cx + y = c Σ_{j=1}^n x^j v_j + Σ_{j=1}^n y^j v_j = Σ_{j=1}^n (cx^j + y^j) v_j.

In terms of coordinate vectors, [cx + y]S = [cxj + y j ] = c[xj ] + [y j ] = c[x]S + [y]S .


(ι_S is bijective). The inverse mapping (ι_S)^{−1} : R^n → V sends an ordered n-tuple [x^j] to the linear combination from S with the corresponding coefficients,

    (ι_S)^{−1}([x^j]) = Σ_{j=1}^n x^j v_j.

Existence of an inverse proves bijectivity.

Figure 4.2: Coordinate vectors in a two-dimensional vector space.

Remark 4.14. Geometrically, a basis S with n elements defines a coordinate grid consisting of linear combinations where at least (n − 1) coefficients are integers, Figure 4.2; the isomorphism ι_S assigns to each location x in V an "address" [x]_S in R^n. Conceptually, the mapping ι_S and its inverse define a "dictionary" between V and R^n in the presence of a basis S of V. This dictionary depends on S. Material below on "change of basis" tells us how to "translate" between the dictionaries associated to two bases.

Example 4.15. Let V = C^∞(R, R) be the vector space of smooth functions under ordinary addition and scalar multiplication. The following define linear transformations from V to V (for (i)–(iii)) or from V to R (for (iv), (v)). Linearity follows from theorems of calculus; a and b denote arbitrary real numbers.

(i) Differentiation, D(f) = f′.
(ii) Integration from a, I_a(f) = ∫_a^x f(t) dt.

(iii) Multiplication by a smooth function φ, µ_φ(f) = φf.
(iv) Evaluation at a, E_a(f) = f(a).
(v) Integration over [a, b], I_a^b(f) = ∫_a^b f(t) dt.


Linear Transformations on the Plane

Axial scaling, rotation, projection, and reflection are linear transformations T : R^2 → R^2. If (e_j)_{j=1}^2 is the standard basis and if (w_j)_{j=1}^2 is an arbitrary pair of vectors (possibly equal), there exists a unique 2 × 2 matrix A satisfying Ae_1 = w_1 and Ae_2 = w_2, namely the matrix whose columns are w_1 and w_2. Figure 4.3 shows the geometric effect on the plane.

Example 4.16 (Axial scaling). Let a_1 and a_2 be real numbers, and define

    T(x^1, x^2) = (a_1 x^1, a_2 x^2) = [ a_1   0  ] [x^1]
                                       [  0   a_2 ] [x^2].

Here, w_i = a_i e_i. The geometric effect is to "scale" each coordinate axis. Axis-aligned rectangles map to axis-aligned rectangles under T, but if |a_1| ≠ |a_2|, "tilted" rectangles do not map to rectangles.

Example 4.17. Let θ be real. Rotating the plane counterclockwise about the origin maps the standard basis vector e_1 to w_1 = (cos θ, sin θ), and maps e_2 to w_2 = (− sin θ, cos θ). Rotation preserves parallelograms, and is therefore a linear transformation Rot_θ, given by

    Rot_θ(x^1, x^2) = [ cos θ  − sin θ ] [x^1]  =  (x^1 cos θ − x^2 sin θ, x^1 sin θ + x^2 cos θ).
                      [ sin θ    cos θ ] [x^2]

Example 4.18. Let θ be real, a = (cos θ, sin θ). Orthogonal projection proj_a maps e_1 to w_1 = cos θ (cos θ, sin θ), and maps e_2 to w_2 = sin θ (cos θ, sin θ). The matrix of this linear transformation is

    [ cos^2 θ       cos θ sin θ ]  =  [ cos θ ] [ cos θ  sin θ ].
    [ cos θ sin θ   sin^2 θ     ]     [ sin θ ]

The factorization on the right is an outer product.

Example 4.19. Let θ be real, a = (cos θ, sin θ), a^⊥ = (− sin θ, cos θ). Reflection across a is defined by Ref_θ(x) = x − 2 proj_{a^⊥}(x) = −x + 2 proj_a(x). The matrix of this linear transformation may be found by calculating Ref_θ(e_1) and Ref_θ(e_2), or by using the preceding example:

    [ cos(2θ)    sin(2θ) ]  =  [ −1   0 ]  +  2 [ cos^2 θ       cos θ sin θ ].
    [ sin(2θ)  − cos(2θ) ]     [  0  −1 ]       [ cos θ sin θ   sin^2 θ     ]
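The matrices of Examples 4.17–4.19 are easy to build and compare numerically. The following sketch (the angle is an arbitrary choice) checks the double-angle form of the reflection matrix and the orthogonality of the rotation matrix noted in Example 3.29.

    import numpy as np

    theta = np.pi / 6
    c, s = np.cos(theta), np.sin(theta)

    Rot  = np.array([[c, -s], [s, c]])          # Example 4.17
    Proj = np.outer([c, s], [c, s])             # Example 4.18, an outer product
    Ref  = -np.eye(2) + 2 * Proj                # Example 4.19: Ref_theta = -I + 2 proj_a

    t2 = 2 * theta
    print(np.allclose(Ref, [[np.cos(t2), np.sin(t2)], [np.sin(t2), -np.cos(t2)]]))
    print(np.allclose(Rot.T @ Rot, np.eye(2)))  # rotations are orthogonal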



Figure 4.3: The identity, axial scaling, rotation, and reflection.

4.2 The Space of Linear Transformations

Proposition 4.20. If T1 and T2 : V → W are linear transformations and α is real, the linear combination αT1 + T2 : V → W defined by (αT1 + T2 )(x) = αT1 (x) + T2 (x)

for all x in V

is a linear transformation. Proof. If x and y are arbitrary elements of V and c is real, then (αT1 + T2 )(cx + y) = αT1 (cx + y) + T2 (cx + y)     = α cT1 (x) + T1 (y) + cT2 (x) + T2 (y)     = c αT1 (x) + T2 (x) + αT1 (y) + T2 (y) = c(αT1 + T2 )(x) + (αT1 + T2 )(y). Remark 4.21. Let V and W be vector spaces. The set L(V, W ) of linear transformations T : V → W becomes a vector space under the operations of addition and scalar multiplication defined in the proposition. If V = Rn and W = Rm , it turns out that L(V, W ) = Rm×n . Proposition 4.22. Let V be a vector space with basis S = (vα ). If T : V → W is a linear transformation satisfying T (vα ) = 0W for all α, then T (x) = 0W for all x in V . The notation (vα ) indicates an arbitrary family of vectors; that is, the proposition holds even for infinite-dimensional spaces. Proof. Let x be an arbitrary element of V , and write x as a linear combination of basis elements. Since a linear combination is finite by definition, we have n X x= xj v j j=1


for some basis vectors v_j in S and some scalars x^j. By linearity,

    T(x) = T(Σ_{j=1}^n x^j v_j) = Σ_{j=1}^n x^j T(v_j) = Σ_{j=1}^n x^j · 0_W = 0_W.

Corollary 4.23. If S is a basis of (V, +, ·), and if T1 and T2 are linear transformations from V to W that agree on S, in that T1 (vα ) = T2 (vα ) for all vα in S, then T1 (x) = T2 (x) for all x in V . Proof. Apply Proposition 4.22 to T1 − T2 . Theorem 4.24. Let V be an n-dimensional vector space, S = (vj )nj=1 a basis, W an arbitrary vector space, and S 0 = (wj )nj=1 an arbitrary ordered set of vectors in W . (i) There exists a unique linear transformation T : V → W satisfying T (vj ) = wj for j = 1, . . . , n. (ii) T is injective if and only if S 0 is linearly independent. (iii) T is surjective if and only if S 0 spans W . Remark 4.25. In words, (i) says a linear transformation is uniquely specified by its values on a basis. If ordered sets S and S 0 of vectors are given as in the theorem, the transformation T is said to be obtained from the conditions T (vj ) = wj via extension by linearity. Proof. (i). Because S is a basis of VP, every vector x in V may be written uniquely as a linear combination j xj vj from S. If T is linear, then T (x) =

n X j=1

xj T (vj ) =

n X

xj wj .

j=1

This formula defines a mapping T : V → W , easily checked to be linear, and to satisfy T (vj ) = wj . Uniqueness is asserted by Corollary 4.23. In the remainder of the proof, the preceding formula, which expresses T (x) as the general linear combination from S 0 , is used freely. P j (ii). A vector x is in ker(T ) if and only if 0W = T (x) = x wj . V But T is injective only if the P ifj and only if ker(T ) = {0 }, if and equation 0W = x wj has only the trivial solution [xj ] = 0n , if and only if S 0 is linearly independent.


CHAPTER 4. LINEAR TRANSFORMATIONS

81

(iii). A vector P y is in T (V ) if and only if there exist scalars xj such that y = T (x) = xj wj . But TPis surjective if and only if T (V ) = W , if and only if the equation y = xj wj has a solution [xj ] for every y in W , if and only if S 0 spans W .

The Matrix of a Transformation

We now have enough machinery to express an arbitrary linear transformation between finite-dimensional vector spaces in terms of matrices of real numbers.

Theorem 4.26. Let V and W be finite-dimensional vector spaces, and assume S = (v_j)_{j=1}^n and S′ = (w_i)_{i=1}^m are bases for V and W, respectively. If T : V → W is a linear transformation, there exist unique scalars A^i_j, for i = 1, . . . , m and j = 1, . . . , n, such that

    T(v_j) = Σ_{i=1}^m A^i_j w_i    for j = 1, . . . , n.

Proof. For each j, the vector T (vj ) can be written uniquely as a linear combination from S 0 . If Aij is the wi -component of T (vj ), then the equation in the theorem holds. 0

Definition 4.27. The m × n matrix [T ]SS = [Aij ] is called the matrix of T with respect to S and S 0 . Remark 4.28. If bases S for V and S 0 for W are fixed, this association 0 of a matrix to a linear transformation defines a linear isomorphism ιSS from the vector space L(V, W ) to the vector space Rm×n of m × n real matrices. Theorem 4.26 is the analog, for linear transformations, of the “coordinatization” in Theorem 4.13 for vectors. Example 4.29. Let V be a vector space with basis S = (vj )nj=1 . The matrix of the identity transformation is   1 0 ... 0 0 1 . . . 0   [IV ]SS =  .. .. . . ..  , . . . . 0 0 ... 1 whose jth column encodes the equation IV (vj ) = vj .


82

LINEAR ALGEBRA

Example 4.30. Let V and W be finite-dimensional vector spaces with respective bases S = (vj )nj=1 and S 0 = (wi )m i=1 . For each pair (i, j)

with i = 1, . . . , m and j = 1, . . . , n, let Eij : V → W be the linear transformation defined by Eij (vj ) = wi and Eij (vk ) = 0W if k 6= j; 0 that is, Eij (vk ) = δkj wi . The matrix [Eij ]SS is eji = ei ej , having a 1 in the (i, j) entry and 0’s elsewhere.
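As an illustrative sketch of Theorem 4.26 (and of Corollary 4.32 below), consider the derivative acting on cubic polynomials, with a polynomial represented by its coordinate vector with respect to S = (1, t, t^2, t^3); this anticipates Example 4.42. The function name D and the test polynomial are assumptions made only for this illustration.

    import numpy as np

    # coordinates (c0, c1, c2, c3) represent c0 + c1 t + c2 t^2 + c3 t^3
    def D(coords):
        c0, c1, c2, c3 = coords
        return np.array([c1, 2 * c2, 3 * c3, 0.0])    # coordinates of the derivative

    # Theorem 4.26: the j-th column of the matrix of D is the coordinate vector of D(v_j).
    basis = np.eye(4)
    D_matrix = np.column_stack([D(v) for v in basis])
    print(D_matrix)

    # Corollary 4.32: applying D and multiplying coordinates by the matrix agree.
    p = np.array([5.0, -1.0, 2.0, 4.0])                # 5 - t + 2 t^2 + 4 t^3
    print(np.allclose(D_matrix @ p, D(p)))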

Example 4.31. If T : Rn → Rm has one-dimensional image, there exist non-zero vectors v in Rn and w in Rm such that T (x) = hv, xi · w = wv T x for all x in Rn . The standard matrix of T is wv T . The kernel and image of T are ker(T ) = Span(v)⊥ and im(T ) = Span(w). Proof. By hypothesis, T (V ) is 1-dimensional. Let w be an arbitrary non-zero vector in the image. If (ej )nj=1 is the standard basis of Rn , then for each is a scalar v j such that T (ej ) = v j w. Put v = [v j ]. P j there If x = j xj ej is an arbitrary vector in Rn , then T (x) =

n X

xj T (ej ) =

j=1

n X j=1

xj v j w = hv, xi · w.

Corollary 4.32. In the notation of the theorem, 0

0

[T (x)]S = [T ]SS [x]S

for all x in V .

P 0 Proof. Write [T ]SS = [Aij ] and [x]S = [xj ], so that x = j xj vj . We want to express T (x) with respect to S 0 . Substituting,  m m X n n n X X X X i i j j j Aj wi = T (x) = Aj x wi . x T (vj ) = x j=1

j=1

i=1

i=1

j=1

The expression in parentheses is precisely the ith row of the matrix 0 product [T ]SS [x]S , and is also the wi -component of T (x), i.e., the 0 ith row of [T (x)]S . Corollary 4.33. If T : V → V 0 and T 0 : V 0 → V 00 are linear transformations between finite-dimensional vector spaces with respective bases S, S 0 , and S 00 , then 0 00 00 [T 0 T ]SS = [T 0 ]SS 0 [T ]SS .


CHAPTER 4. LINEAR TRANSFORMATIONS

83

Proof. For all x in V , Corollary 4.32 gives 00

00

00

0

00

0

[T 0 T ]SS [x]S = [T 0 T (x)]S = [T 0 ]SS 0 [T (x)]SS = [T 0 ]SS 0 [T ]SS [x]S . 0

00

00

By Corollary 1.23, [T 0 T ]SS = [T 0 ]SS 0 [T ]SS . Remark 4.34. In words, evaluation of a linear transformation and composition of transformations correspond to matrix multiplication. These results are a basic computational idiom in linear algebra, and justify the definition of matrix multiplication. The conclusions of Corollaries 4.32 and 4.33 may be expressed, respectively, as commutative diagrams, collections of spaces and mappings in which two compositions with the same domain and target are the same mapping. Specifically: x∈V   Sy ι

T

−−−→ [T ]S

0

T

T (x) ∈ W   S0 yι

V −−−→  S yι

0

[x]S ∈ Rn −−−S→ [T (x)]S ∈ Rm

[T ]S S

Rn

0

−−−→

T0

V 0 −−−→ V 00    S0  S 00 yι yι 00

0 Rn

[T 0 ]S S0

−−−−→ Rn

00

Change of Basis Definition 4.35. Let V be an n-dimensional vector space with bases 0 S and S 0 . The n×n matrix [IV ]SS is called the transition matrix from S to S 0 . 0

0

Remark 4.36. By Corollary 4.32, [x]S = [IV ]SS [x]S for all x in V . That is, the transition matrix “converts” coordinate vectors with respect to S into coordinate vectors with respect to S 0 . It may help to think of an element x in V as an object, and to think of the coordinate vector [x]S as a description of the object in the lan0 guage S. The transition matrix [IV ]SS is a “dictionary” or “translator” from language S to language S 0 . Corollary 4.37. Let V be an n-dimensional vector space with bases S and S 0 . 0

(i) The transition matrix [IV ]SS is invertible, and the inverse is the transition matrix [IV ]SS 0 . 0

0

(ii) If T : V → V is linear, then [T ]SS 0 = [IV ]SS [T ]SS [IV ]SS 0 .


84

LINEAR ALGEBRA

Proof. Example 4.29 notes that the matrix of the identity transformation with respect to a single basis is the n × n identity matrix In . By Corollary 4.33, 0

[IV ]SS 0 [IV ]SS = [IV ]SS = In ,

0

0

[IV ]SS [IV ]SS 0 = [IV ]SS 0 = In .

Assertion (ii) follows immediately from Corollary 4.33. Definition 4.38. Let n be a positive integer, and let A1 and A2 be n × n matrices. We say A2 is similar to A1 if there exists an invertible n × n matrix P such that A2 = P −1 A1 P . Remark 4.39. Corollary 4.37 (ii) says that if T is a linear operator on V (a linear transformation from V to itself), then the matrices of T with respect to arbitrary bases are similar. Proposition 4.40. Similarity is an equivalence relation on the set of n × n matrices. Proof. (Reflexivity). Since the identity matrix In is its own inverse, A = In AIn = In−1 AIn . (Symmetry). Suppose A2 = P −1 A1 P . Multiplying on the left by and on the right by P −1 gives (P −1 )−1 A2 P −1 = A1 . In words, if A2 is similar to A1 , then A1 is similar to A2 . (P −1 )−1

(Transitivity). Suppose A2 = P −1 A1 P and A3 = Q−1 A2 Q. Substituting the first into the second, A3 = Q−1 A2 Q = Q−1 P −1 A1 P Q = (P Q)−1 A1 (P Q), so A3 is similar to A1 . Remark 4.41. Detecting whether two n × n matrices are similar or not requires work. Chapter 5 is devoted to locating, with the similarity class of a matrix A, a “proxy” having a particularly simple structure. Two matrices are similar if and only if they have the same proxy. Example 4.42. Let V = P3 be the four-dimensional vector space of cubic polynomials, S = (1, t, t2 , t3 ) = (vj )4j=1 the “standard” basis, S 0 = (1, 1 + t, 1 + t + t2 , 1 + t + t2 + t3 ) = (vj0 )4j=1 a non-standard basis, and D : P3 → P3 the derivative operator from calculus, D(p) = p0 . We will find the transition matrices between S and S 0 , calculate the matrix


CHAPTER 4. LINEAR TRANSFORMATIONS

85

of D with respect to S, and use Corollary 4.37 to find the matrix of D with respect to S 0 . The jth column of the transition matrix [I]SS 0 is the coordinate vector [vj0 ]S of the jth element of S 0 with respect to S. By inspection,

[I]SS 0



1 0 = 0 0

1 1 0 0

 1 1 . 1 1

1 1 1 0

The inverse can be computed either by calculating the coordinate vec0 tors [vj ]S , or by using row operations on the preceding matrix. Both are shown for illustration. To calculate coordinate vectors, set up the vector equations v1 v2 v3 v4

v10 + v10 + v10 + v10 +

= = = =

v20 + v20 + v20 + v20 +

v30 + v30 + v30 + v30 +

v40 , v40 , v40 , v40 .

In general, each equation becomes a system; here, the system is easy to solve by inspection: 1 = 1 v10 + t = −1 v10 +

t2 =

0 v20 + 1 v20 +

0 v30 + 0 v30 +

0 v40 , 0 v40 ,

0 v10 + −1 v20 +

1 v30 +

0 v40 ,

0 v20 + −1 v30 +

1 v40 .

0 v10 +

t3 =

0

The coefficients in the jth row are the jth column of [I]SS . Alternatively, use the matrix inversion algorithm: 

1 0  0 0

1 1 0 0

1 1 1 0

1 1 1 1

1 0 0 0

0 1 0 0

0 0 1 0

 R −R ,  0 R21 −R32 , 1  0 3 −R4 0  R−→ 0 0 0 1

0 1 0 0

0 0 1 0

0 0 0 1

 1 −1 0 0 0 1 −1 0 . 0 0 1 −1 0 0 0 1

0

The right-hand block is [I]SS . To calculate the standard matrix of D, evaluate D(vj ), then express the result as a coordinate vector with respect to S. Here, D(1) = 0,


86

LINEAR ALGEBRA

D(t) = 1, D(t2 ) = 2t, and D(t3 ) = 3t2 . The standard coordinate vectors may be read off by inspection, giving   0 1 0 0  0 0 2 0  [D]SS =   0 0 0 3 . 0 0 0 0 0

0

0

Finally, [I]SS [D]SS [I]SS = [D]SS 0 , or 

 0 1 −1 0 0  0  1 −1 0 0  0 0 1 −1 0 0 0 0 1 0

1 0 0 0

0 2 0 0

 0 1   0  0 3  0 0 0

1 1 0 0

1 1 1 0

  1 0   1  0 = 1  0 0 1

 1 −1 −1 0 2 −1 . 0 0 3 0 0 0

The third column signifies that D(v30 ) = −v10 + 2v20 , while the fourth column signifies that D(v40 ) = −v10 − v20 + 3v30 .
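The change-of-basis computation of Example 4.42 can be repeated numerically; the sketch below assumes the transition matrix and the standard matrix of D computed above and checks Corollary 4.37 (ii).

    import numpy as np

    # Columns of P are the coordinate vectors of S' = (1, 1+t, 1+t+t^2, 1+t+t^2+t^3)
    # with respect to S = (1, t, t^2, t^3), i.e., the transition matrix from S' to S.
    P = np.array([[1, 1, 1, 1],
                  [0, 1, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 0, 1]], dtype=float)

    D_S = np.array([[0, 1, 0, 0],
                    [0, 0, 2, 0],
                    [0, 0, 0, 3],
                    [0, 0, 0, 0]], dtype=float)    # the matrix of D with respect to S

    D_Sprime = np.linalg.inv(P) @ D_S @ P          # Corollary 4.37 (ii)
    print(np.round(D_Sprime, 10))
    # Expected, as computed in Example 4.42:
    # [[0, 1, -1, -1], [0, 0, 2, -1], [0, 0, 0, 3], [0, 0, 0, 0]]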

4.3 The Rank-Nullity Theorem

Let V be an n-dimensional vector space. The qualitative structure of a linear transformation T : V → W is simple: “Some of the dimensions of V are sent to 0W ; the rest and mapped injectively”. We prove two precise statements of this result, one very concrete, one abstract. A third proof strategy, intermediate in abstraction, may be found in Exercise 4.15. Definition 4.43. Let A = [Aij ] be an m × n matrix, with rows (Ai )m i=1 in (Rn )∗ and columns (Aj )nj=1 in Rm . The nullspace of A, Null(A), is the solution space of the homogeneous system Ax = 0m . The nullity of A is the dimension of the nullspace of A. The column space of A, Col(A), is Span(Aj )nj=1 ⊆ Rm . The column rank of A is the dimension of the column space of A. n ∗ The row space of A, Row(A), is Span(Ai )m i=1 ⊆ (R ) . The row rank of A is the dimension of the row space of A. Remark 4.44. If µA : Rn → Rm is multiplication by A, then the nullspace of A is the kernel of µA , and the column space of A is the image of µA .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 4. LINEAR TRANSFORMATIONS

87

Lemma 4.45. If A is an m × n matrix, E is a product of m × m elementary row matrices, and A0 = EA, then (i) Null(A) = Null(A0 ). (ii) dim Col(A) = dim Col(A0 ). (iii) Row(A) = Row(A0 ). Proof. (i). This restates Proposition 1.38. (Elementary row operations do not change the homogeneous solution space.) (ii). Multiplication by E defines an invertible (hence injective) linear transformation µE : Rm → Rm sending Col(A) to Col(A0 ). Corollary 4.8 implies dim Col(A) = dim Col(A0 ). (iii). Each elementary row operation replaces a set of rows with a set of linear combinations of the rows, so Row(A0 ) ⊆ Row(A). Since E −1 is a product of elementary matrices, reversing the roles of A and A0 gives Row(A) ⊆ Row(A0 ). Theorem 4.46 (Rank-Nullity for Matrices). If A is an m × n matrix, (i) dim Col(A) = dim Row(A). (ii) dim Null(A) + dim Col(A) = n. Proof. (i). Let A0 be the reduced row-echelon form of A, and let r denote the number of non-zero rows of A0 , i.e., the number of leading 1’s. Since the rows of A0 are linearly independent, dim Row(A0 ) = r. Since every column of A0 has at most r non-zero components, we have dim Col(A0 ) ≤ r. But the r columns containing leading 1’s are obviously linearly independent, so dim Col(A0 ) = r. By Lemma 4.45, dim Col(A) = dim Col(A0 ) = r = dim Row(A0 ) = dim Row(A). (ii). By Theorem 2.76, dim Null(A) is the number of free variables, namely (n − r), the number of columns not containing a leading 1, while dim Col(A) = r is the number of basic variables. The sum of these dimensions is n. Definition 4.47. If A is an m×n matrix, the rank of A is the common value dim Col(A) = dim Row(A).

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

88

LINEAR ALGEBRA

Proposition 4.48. Let A be an m × n matrix. If the basic variables of the homogeneous system Ax = 0_m are in columns (k_j)_{j=1}^r, the corresponding columns (A^{k_j})_{j=1}^r are a basis of Col(A).

Remark 4.49. That is, one way to find a basis of Col(A) is to row-reduce A, then select the columns of A itself corresponding to the leading 1's of the row-echelon matrix.

Proof. Let B be the m × r matrix whose jth column is A^{k_j}. (In words, discard all the "free variable" columns of A.) The reduced row-echelon form B′ is I_r followed by m − r rows of 0's, so
dim Col(A) = r = dim Col(B′) = dim Col(B).
Since B has r columns, its columns are linearly independent, hence form a basis of Col(A).
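The recipe of Remark 4.49 is easy to carry out mechanically. The following sketch is illustrative only (it assumes SymPy is available, and the matrix A is an arbitrary example, not one taken from the text): it row-reduces A, reads off the pivot columns, and confirms the Rank-Nullity Theorem for matrices.

```python
import sympy as sp

# An arbitrary 3x4 example matrix.
A = sp.Matrix([[1, 2, 0, 1],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])

rref, pivot_cols = A.rref()      # reduced row-echelon form and the pivot (basic) columns
rank = len(pivot_cols)           # dim Col(A) = dim Row(A)
nullity = A.cols - rank          # number of free variables

# Per Remark 4.49: the pivot columns of A itself form a basis of Col(A).
col_basis = [A.col(j) for j in pivot_cols]

print("rank =", rank, " nullity =", nullity)   # rank 2, nullity 2 for this example
assert rank + nullity == A.cols                # Rank-Nullity for matrices
assert len(A.nullspace()) == nullity           # SymPy's nullspace basis agrees
```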

Quotient Vector Spaces

Our second proof of the Rank-Nullity Theorem is the direct analog of the First Homomorphism Theorem from group theory.

Definition 4.50. The nullity of T is dim ker(T). The rank is dim T(V).

Proposition 4.51. Let (V, +, ·) be a vector space, K a subspace. The binary relation x_1 ≡ x_2 (mod K) if and only if x_2 − x_1 ∈ K is an equivalence relation on V.

Proof. (Reflexivity). If x ∈ V, then x − x = 0_V ∈ K; by definition, x ≡ x (mod K).

(Symmetry). If x_1 ≡ x_2 (mod K), then x_2 − x_1 ∈ K. Since K is closed under inversion, x_1 − x_2 = −(x_2 − x_1) ∈ K, so x_2 ≡ x_1 (mod K).

(Transitivity). If x_1 ≡ x_2 (mod K) and x_2 ≡ x_3 (mod K), then by hypothesis, x_2 − x_1 ∈ K and x_3 − x_2 ∈ K. Since K is closed under addition, x_3 − x_1 = (x_3 − x_2) + (x_2 − x_1) ∈ K, proving that x_1 ≡ x_3 (mod K).

Definition 4.52. Let (V, +, ·) be a vector space, K a subspace. For each x_0 in V, the set
x_0 + K = {x in V : x = x_0 + v for some v in K}
is called a coset of K in V.


Remark 4.53. The equivalence classes of the relation in Proposition 4.51 are precisely the cosets of K in V: If x and x_0 are arbitrary vectors in V, then x_0 ≡ x (mod K) if and only if x − x_0 ∈ K, if and only if x ∈ x_0 + K.

Lemma 4.54. Let (V, +, ·) be a vector space, K a subspace. If x_1 ≡ x_2 (mod K) and y_1 ≡ y_2 (mod K), and if c is real, then
cx_1 + y_1 ≡ cx_2 + y_2 (mod K).

Proof. By hypothesis, there exist elements x_0 and y_0 of K such that x_2 − x_1 = x_0 and y_2 − y_1 = y_0. Thus
(cx_2 + y_2) − (cx_1 + y_1) = c(x_2 − x_1) + (y_2 − y_1) = cx_0 + y_0 ∈ K.

Remark 4.55. Lemma 4.54 guarantees that addition and scalar multiplication of cosets is "well-defined", i.e., depends only on the cosets, and not on the representative elements of V. The coset K = 0_V + K acts as identity element for addition, i.e., as the zero vector. The axioms for a vector space are mere formalities, amounting to appending "+K" to each of the axioms for V.

Definition 4.56. Let (V, +, ·) be a vector space, K a subspace. The set of cosets of K in V is denoted V/K. For x, y in V and for real c, we define
(x + K) ⊞ (y + K) = (x + y) + K,    c ⊡ (x + K) = (cx) + K.
The vector space (V/K, ⊞, ⊡) is called the quotient of V by K.

Theorem 4.57. If (V, +, ·) is a vector space of dimension n, and if K is a subspace of dimension k, the quotient (V/K, ⊞, ⊡) is a vector space of dimension (n − k).

Proof. Pick a basis (v_j)_{j=1}^k for K, and extend to a basis S of V by appending vectors (v_j)_{j=k+1}^n. Write V′ = Span(v_j)_{j=k+1}^n, and note that V = K ⊕ V′. It suffices to show that the cosets (v_j + K)_{j=k+1}^n are a basis of V/K.

(Spanning). If x + K ∈ V/K, then x can be written as a linear combination from S, say
x = Σ_{j=1}^n x^j v_j = Σ_{j=1}^k x^j v_j + Σ_{j=k+1}^n x^j v_j.


Calling the sums x_0 and x′, respectively, we have x_0 ∈ K, so that x ≡ x′ (mod K), and x′ ∈ V′. As cosets (i.e., as elements of V/K), we have
x + K = x′ + K = Σ_{j=k+1}^n x^j (v_j + K).

(Linear independence). Suppose x^{k+1}, . . . , x^n are scalars such that
0_{V/K} = Σ_{j=k+1}^n x^j (v_j + K) = (Σ_{j=k+1}^n x^j v_j) + K.
The expression x′ in parentheses lies in V′ by definition, and lies in K because x′ + K = 0_{V/K}. Since K ∩ V′ = {0_V}, we have x′ = 0_V. Since (v_j)_{j=k+1}^n is linearly independent, x^j = 0 for each j.

Theorem 4.58. Let V and W be vector spaces. If T : V → W is a linear transformation with kernel K, there is a vector space isomorphism T̄ : V/K → T(V) defined by T̄(x + K) = T(x).

Proof. (T̄ is well-defined). If x_1 + K = x_2 + K for some x_1, x_2 in V, then x_2 − x_1 ∈ K = ker(T), so 0_W = T(x_2 − x_1) = T(x_2) − T(x_1), or T(x_1) = T(x_2). That is, T̄(x_1 + K) = T̄(x_2 + K).

(Injectivity). T̄(x_0 + K) = T(x_0) = 0_W if and only if x_0 ∈ K = ker(T), if and only if x_0 + K = 0_{V/K}.

(Surjectivity). Suppose y ∈ T(V). By hypothesis, there exists an x in V such that T(x) = y. By definition, T̄(x + K) = T(x) = y.

Corollary 4.59 (The Rank-Nullity Theorem). Suppose V is a finite-dimensional vector space, W an arbitrary vector space. If T : V → W is a linear transformation, then
dim ker(T) + dim T(V) = dim V.

Proof. Theorem 4.58 says T(V) is isomorphic to V/ker(T). Theorem 4.57 immediately implies the claim.

Corollary 4.60. Let V and W be vector spaces of dimension n. A linear transformation T : V → W is injective if and only if it is surjective.


Proof. By the Rank-Nullity Theorem, dim V = dim ker(T) + dim T(V). By hypothesis, dim V = dim W. Thus, T is injective if and only if dim ker(T) = 0, if and only if dim T(V) = dim V = dim W, if and only if T(V) = W, if and only if T is surjective.

Remark 4.61. The conclusion of the Rank-Nullity Theorem is true if dim V = ∞, in the sense that at least one of the kernel and the image is infinite-dimensional.

Remark 4.62. Let A be a real m × n matrix, μ_A : R^n → R^m the multiplication map. The dimension of R^n is the number of columns of A. The kernel of μ_A is the solution space of the homogeneous system Ax = 0_m; its dimension is the number of free variables after the coefficient matrix is put into row-echelon form. A vector b in R^m is in the image of μ_A if and only if Ax = b is consistent. The solution set is a coset of the homogeneous solution space. This recapitulates our earlier result that an arbitrary non-homogeneous solution x is the sum of any particular solution x_0 and a homogeneous solution x_h. By the Rank-Nullity Theorem, the dimension of the image is the number of columns minus the number of free variables, i.e., the number of basic variables.
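The coset structure of a solution set in Remark 4.62 can be made concrete with a small computation. The sketch below is illustrative only (SymPy assumed; the system is an arbitrary example): it produces a parametric description of the solution set of Ax = b together with a basis of the homogeneous solution space, so every solution is one particular solution plus a homogeneous solution.

```python
import sympy as sp

A = sp.Matrix([[1, 1, -1],
               [2, 2, -2]])       # rank 1, so two free variables
b = sp.Matrix([3, 6])

x1, x2, x3 = sp.symbols('x1 x2 x3')
solutions = sp.linsolve((A, b), x1, x2, x3)   # the full solution set, a coset of Null(A)
homogeneous_basis = A.nullspace()             # basis of ker(mu_A)

print(solutions)            # e.g. {(-x2 + x3 + 3, x2, x3)}: particular solution plus free parameters
print(homogeneous_basis)    # two basis vectors of the homogeneous solution space
```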

Exercises

Exercise 4.1. In the vector space P_3 of cubic polynomials, consider the basis S″ = (1, 1 + t, 1 + t + (1/2)t^2, 1 + t + (1/2)t^2 + (1/6)t^3) = (v″_j)_{j=1}^4. Use the techniques (and results, as appropriate) of Example 4.42 to calculate [D]^{S″}_{S″}.

Exercise 4.2. Let V be the vector space P_2 of quadratic polynomials, and let W = R^3, both with the usual vector space operations. Define T : V → W by T(p) = (p(−1), p(0), p(1)).
(a) Show directly that T is linear.
(b) Find the matrix of T with respect to the standard bases.
(c) Determine (with justification) whether or not T is an isomorphism.
(d) If possible, solve T(p) = (1, 2, 3) for p.


Exercise 4.3. Each part refers to the matrix
A = [ −1  2 −3 ]
    [ −2  4 −6 ]
    [  1 −2  3 ]
and the associated linear transformation μ_A : R^3 → R^3.
(a) Use row reduction to find a basis for ker(μ_A), and for im(μ_A). Suggestion: Use Rank-Nullity to find dim im(μ_A).
(b) Find vectors v and w in R^3 such that A = wv^T, and verify that ker(μ_A) = Span(v)^⊥ and im(μ_A) = Span(w).

Exercise 4.4. Let S = (e^1_1, e^2_1, e^1_2, e^2_2) be the standard basis of R^{2×2}. Calculate the matrices with respect to S of
(a) The transpose operator;
(b) The operator T(A) = A + A^T;
(c) The operator T(A) = A − A^T.

Exercise 4.5. Let V ⊆ F(R, R) be the subspace spanned by the set S = (cos, sin) = (v_j)_{j=1}^2.
(a) Show that differentiation defines an operator D : V → V, and find the matrix J = [D]^S_S.
(b) Calculate J^2, the square of the matrix you found in (a), and interpret the result in terms of second derivatives of cos and sin.

Exercise 4.6. Let n be a positive integer.
(a) Prove that P_n is isomorphic (as a vector space) to R^{n+1}. Suggestion: There is an "obvious" way to map a polynomial of degree at most n to an ordered tuple of coefficients.
(b) Let t_0 be a real number, and define the "evaluation functional" ε_{t_0} : P_n → R by ε_{t_0}(p) = p(t_0). Use the Rank-Nullity Theorem to prove that ker(ε_{t_0}) = {p in P_n : p(t_0) = 0} is n-dimensional.
(c) In the notation of part (b), show that S = ((t − t_0)t^j)_{j=0}^{n−1} is a basis of ker(ε_{t_0}).

Exercise 4.7. Let a, b, and c be real numbers, not all 0,
A = [  0   c  −b ]
    [ −c   0   a ]
    [  b  −a   0 ],


and let μ_A : R^3 → R^3 be multiplication by A.
(a) Show that ⟨Ax, y⟩ = −⟨x, Ay⟩ for all x, y in R^3. Hint: Recall that ⟨x, y⟩ = x^T y.
(b) Find the rank of μ_A. Suggestion: Split into cases (i) a ≠ 0; (ii) a = 0 and b ≠ 0; and (iii) a = b = 0, c ≠ 0.
(c) Find the nullity of μ_A, give a basis for the kernel, and use part (a) to show that R^3 is the orthogonal direct sum ker(μ_A) ⊕ im(μ_A).

Exercise 4.8. Prove from the definition that a composition of linear transformations is linear.

Exercise 4.9. Let T : V → V′ and T′ : V′ → V″ be linear transformations. Prove:
(a) ker(T) ⊆ ker(T′T).
(b) im(T′T) ⊆ im(T′).

Exercise 4.10. Suppose A, B, and P are n × n matrices, and that B = P^{−1}AP. Prove by mathematical induction that if N is a positive integer, then B^N = P^{−1}A^N P. (Note carefully that this is not what you might expect by "distributing" the exponent!)

Exercise 4.11. Let a_1 and a_2 be real numbers, and define
A = [ a_1  a_2 ]
    [ a_2  a_1 ],    v_1 = (1, 1),    v_2 = (1, −1).
(a) Show Av_1 = (a_1 + a_2)v_1. Establish a similar formula for Av_2.
(b) Let S = (e_1, e_2) be the standard basis of R^2, and S′ = (v_1, v_2). Calculate the transition matrices [I]^S_{S′} and [I]^{S′}_S.
(c) Let T : R^2 → R^2 be the linear transformation whose standard matrix [T]^S_S is A. Find the matrix [T]^{S′}_{S′}.
(d) If N is a positive integer, find a formula for the entries of A^N. Suggestion: Use Exercise 4.10.

Exercise 4.12. Let σ be a permutation (i.e., bijection) of the set {1, 2, . . . , n}, and let P_σ be the associated permutation matrix, see Exercise 1.13. If A = diag[λ_1, λ_2, . . . , λ_n] for some numbers λ_i, calculate P_σ A P_σ^{−1}.


Exercise 4.13. In each part, let V be the space of smooth (infinitely-differentiable) real-valued functions on R, let V_0 be the subspace of functions f satisfying f(0) = 0, and let D and S denote the differentiation and integration operators
(Df)(t) = f′(t),    (Sf)(t) = ∫_0^t f(s) ds.
Use theorems from calculus to justify your answer in each part. (Exercise 4.9 may also be helpful.)
(a) Calculate the compositions D ∘ S and S ∘ D. Is either composition the identity map I_V?
(b) Show S is injective, and find the image.
(c) Show D is surjective, and find the kernel.
(d) Does S map V_0 to itself? If so, is S : V_0 → V_0 an isomorphism?
(e) Does D map V_0 to itself? If so, is D : V_0 → V_0 an isomorphism?

Exercise 4.14. Let ℓ be a positive real number. A function f : R → R is ℓ-periodic if f(t + ℓ) = f(t) for all real t. With the notation of the preceding exercise:
(a) Prove that the space of ℓ-periodic functions is a vector subspace of F(R, R). (Consequently, the set W of smooth ℓ-periodic functions is a subspace of V.)
(b) Show that D maps W to W. (That is, the derivative of a smooth, ℓ-periodic function is ℓ-periodic.)
(c) Show that if f ∈ W, then Sf ∈ W if and only if ∫_0^ℓ f(t) dt = 0.

Exercise 4.15. Let V be a finite-dimensional vector space with basis S = (v_j)_{j=1}^n, let T : V → W be a linear transformation, and put w_j = T(v_j) for each j. Prove from the definitions (not using results proven in this chapter) that:
(a) The set T(S) = (w_j)_{j=1}^n spans the image T(V).
(b) If (v_j)_{j=1}^k is a basis for ker(T), then the set (w_j)_{j=k+1}^n is a basis of the image T(V).
(c) Use part (b) to deduce the Rank-Nullity Theorem (Corollary 4.59).


Chapter 5

Diagonalization

5.1 Linear Operators

If (V, +, ·) is a vector space, a linear transformation T : V → V is called a linear operator (on V). If V is finite-dimensional, and if S and S′ are bases, an operator T may be expressed as a matrix with respect to either basis, and the two matrices are similar:
[T]^{S′}_{S′} = [I_V]^{S′}_S [T]^S_S [I_V]^S_{S′},    or    A′ = P^{−1}AP.

Given a linear operator T, we would like to find a basis S′ with respect to which the matrix A′ of T is "as simple as possible".* For example, we might want A′ to be a diagonal matrix or an upper triangular matrix. Can we arrange this, and if so, how can we calculate a basis S′ for a specific operator?

Remark 5.1. If A′ were a scalar matrix, i.e., if A′ = cI_n for some real number c, the condition A′ = P^{−1}AP would imply A = PA′P^{−1} = P(cI_n)P^{−1} = cI_n, see Theorem 1.19 (iii). Consequently, every matrix similar to a scalar matrix is a scalar matrix.

Definition 5.2. Let (V, +, ·) be an n-dimensional real vector space. A linear operator T : V → V is diagonalizable if there exists a basis S′ of V such that A′ = [T]^{S′}_{S′} is diagonal.

* The technical term is a canonical form for T, or for A. A matrix does not have a unique "canonical form"; the notion of "simplest" depends on one's mathematical intent.


The following characterization is immediate from the definition, but stated formally for emphasis.

Proposition 5.3. Let (V, +, ·) be an n-dimensional real vector space, S′ = (v_j)_{j=1}^n a basis of V, and T : V → V a linear operator. The matrix A′ = [T]^{S′}_{S′} is diagonal if and only if there exist scalars (λ_j)_{j=1}^n such that T(v_j) = λ_j v_j for each j.

Remark 5.4. In the notation of the proposition,
[T]^{S′}_{S′} = diag[λ_1, λ_2, . . . , λ_n] = diag[λ_j] = [ λ_1   0  . . .   0  ]
                                                          [  0   λ_2 . . .   0  ]
                                                          [  .    .   . .    .  ]
                                                          [  0    0  . . .  λ_n ].

5.2 Eigenvalues and Eigenvectors

Definition 5.5. Let T : V → V be a linear operator. A real number λ is an eigenvalue of T if the operator (T − λI_V) is not invertible. A non-zero vector v in V is a λ-eigenvector of T if T(v) = λv. If λ is an eigenvalue of T, the subspace E_λ = ker(T − λI_V) = {v in V : T(v) = λv} is the λ-eigenspace of T.

Remark 5.6. If λ is an eigenvalue of T, the λ-eigenspace consists of all λ-eigenvectors together with the zero vector 0_V, which by definition is not an eigenvector but does satisfy T(v) = λv. Eigenspaces of an operator are "invariants", depending only on the operator, not on any choice of basis of V. Every linear operator on a 1-dimensional space is scalar, hence diagonal with respect to an arbitrary basis.

Remark 5.7. Proposition 5.3 says T is diagonalizable if and only if V has a basis consisting entirely of eigenvectors of T. For brevity, such a basis is called a T-eigenbasis of V.

Example 5.8. If A = diag[λ_1, λ_2, . . . , λ_n], then every standard basis vector is an eigenvector: Ae_j = λ_j e_j. If the λ_j are distinct, the eigenspaces of A are precisely the coordinate axes. If there are repetitions among the diagonal entries, the standard basis vectors corresponding to a single eigenvalue λ span the λ-eigenspace.
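Numerically, eigenvalues and eigenvectors of a matrix are usually computed all at once. A brief sketch (illustrative only; NumPy assumed, and the matrix is an arbitrary diagonal example in the spirit of Example 5.8):

```python
import numpy as np

A = np.diag([2.0, 2.0, 5.0])                   # repeated eigenvalue 2, simple eigenvalue 5
eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are eigenvectors

for lam, v in zip(eigenvalues, eigenvectors.T):
    # Each pair satisfies A v = lam v, the defining equation of an eigenvector.
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)    # [2. 2. 5.]
```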


Example 5.9. A linear operator T has eigenvalue 0 if and only if T is not invertible, and E_0 = ker(T).

Example 5.10. Let a and b be real numbers, and let T : R^2 → R^2 have standard matrix
[T]^S_S = A = [ a  b ]
              [ b  a ].
The vector v_1 = (1, 1) is easily checked to be an (a + b)-eigenvector, while v_2 = (1, −1) is an (a − b)-eigenvector. It follows that T is diagonalizable, and if S′ = (v_j)_{j=1}^2, then [T]^{S′}_{S′} = A′ = diag[a + b, a − b]. (Checking this by direct computation is worthwhile, see Exercise 4.11.) If b ≠ 0, the eigenvalues a ± b of T are distinct, and there are two 1-dimensional eigenspaces:
E_{a+b} = Span(v_1),    E_{a−b} = Span(v_2).
If b = 0, there is one eigenvalue, a, and E_a = R^2.

Example 5.11. Let P denote the space of polynomials in one variable, and D the derivative operator. A non-zero constant polynomial is a 0-eigenvector for D. There are no other eigenvectors in P: If p is a polynomial of degree n ≥ 1, then D(p) = p′ is a non-zero polynomial of degree n − 1 ≥ 0, while every scalar multiple of p is either zero or of degree n.
The operator T(p)(t) = tp′(t) on P has each positive integer k as eigenvalue, since T(t^k) = t(kt^{k−1}) = kt^k. Since these eigenvectors span P, it turns out there are no other eigenvalues or eigenspaces.

Remark 5.12. If T : V → V is an operator and c is real, then λ is an eigenvalue of T if and only if (λ − c) is an eigenvalue of T − cI_V.

Theorem 5.13. Let (V, +, ·) be an n-dimensional real vector space, T : V → V a linear operator.
(i) If λ is an eigenvalue of T, then dim E_λ ≥ 1.
(ii) If λ_1 > λ_2 > · · · > λ_r are eigenvalues of T, and if S = (v_j)_{j=1}^r is a set of corresponding eigenvectors, i.e., if T(v_j) = λ_j v_j for each j, then S is linearly independent.
(iii) T is diagonalizable if and only if its eigenspaces span V.


Proof. (i). If λ is an eigenvalue of T, then T − λI_V is not invertible. By Corollary 4.60, T − λI_V is not injective, so dim ker(T − λI_V) ≥ 1. (Concretely, there is a non-zero vector v such that (T − λI_V)(v) = 0_V, i.e., T(v) = λv.)

(ii). The proof is by induction on r. The case r = 1 is obvious: an eigenvector is non-zero by definition, so a set of one eigenvector is linearly independent. Assume inductively for some k ≥ 1 that the set (v_j)_{j=1}^k of eigenvectors is linearly independent, and suppose
(*)    0_V = x^1 v_1 + · · · + x^k v_k + x^{k+1} v_{k+1} = Σ_{j=1}^{k+1} x^j v_j
for some scalars x^j. Applying the operator T − λ_{k+1} I_V to the preceding equation gives
0_V = x^1(λ_1 − λ_{k+1})v_1 + · · · + x^k(λ_k − λ_{k+1})v_k = Σ_{j=1}^k x^j(λ_j − λ_{k+1})v_j.
The right-hand sum is a linear combination from a linearly independent set, so the coefficients x^j(λ_j − λ_{k+1}) are all 0. By hypothesis, the eigenvalues are decreasing with index, so 0 < (λ_j − λ_{k+1}); thus x^j = 0 for 1 ≤ j ≤ k. Substituting into (*), we find x^{k+1} = 0 as well; that is, (v_j)_{j=1}^{k+1} is linearly independent.

(iii). As noted above, T is diagonalizable if and only if there exists a T-eigenbasis of V. Suppose V has a T-eigenbasis S′. Since every eigenvector of T lies in some eigenspace, S′ ⊆ ∪_λ E_λ, and therefore
V = Span(S′) ⊆ Span(∪_λ E_λ) = ⊕_λ E_λ.
(The final step is Theorem 2.44; the sum is direct by (ii) just proven.)
Conversely, assume the eigenspaces of T span V. For each eigenvalue λ of T, pick a basis S_λ = (v^λ_j)_{j=1}^{n_λ} of E_λ. It suffices to prove the union of these sets, S = ∪_λ S_λ, is a basis of V. Since Span(S_λ) = E_λ, Theorem 2.44 gives
Span(S) = ⊕_λ E_λ = V.

99

To prove S is linearly independent, suppose some linear combination from S is equal to 0V . Indexing coefficients (and grouping terms) according to the corresponding eigenvalue,  nλ X X j λ V 0 = xλ v j . λ

j=1

P λ λ This equation has the form 0V = λ v , with v in Eλ . By (ii), λ V λ v P = λ0 ; if any v were non-zero, the non-trivial linear combination λ v would be non-zero. Fix an eigenvalue λ arbitrarily. Since nλ X

xjλ vjλ = v λ = 0V

j=1

λ and Sλ = (vjλ )nj=1 is linearly independent, all the scalars xjλ are zero; only the trivial linear combination from S gives the zero vector.

Corollary 5.14. If dim V = n and T has n distinct real eigenvalues, then T is diagonalizable. Proof. For each eigenvalue λ there exists a λ-eigenvector by (i); these n vectors are linearly independent by (ii), so are a basis of V . Example 5.15. If v and w are arbitrary non-zero elements of Rn , there is a rank-one operator T (x) = hv, xi · w, with standard matrix wv T , see Example 4.31. If hv, xi = 0, then T (x) = 0n ; that is, the 0-eigenspace contains Span(v ⊥ ), which has dimension (n − 1). (The 0-eigenspace is not Rn , because wv T 6= 0n×n , so in fact E0 = Span(v ⊥ ).) If x is an eigenvector of T and hv, xi 6= 0, then hv, xi · w = λx. It follows that x = cw with c 6= 0, in which case λ = hv, wi. If hv, wi = 6 0, then Eλ = Span(w). Consequently, Rn = E0 ⊕ Eλ , and T is diagonalizable. If instead hv, wi = 0, then all n eigenvalues of T are 0 but T 6= 0, so T is not diagonalizable.

Involutions Definition 5.16. Let V be an arbitrary vector space (possibly infinitedimensional). An operator T satisfying T 2 = IV is an involution of V . Proposition 5.17. If T is an involution, the eigenspaces of T span V .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

100

LINEAR ALGEBRA

Proof. If v is an arbitrary element of V , then the vectors v1 =

1 2

 v + T (v) ,

v −1 =

1 2

 v − T (v) ,

are eigenvectors of T with eigenvalue 1 and −1, respectively (compare the proof of Proposition 2.49). Moreover, v = v 1 + v −1 ; that is, the eigenspaces of T span V . Remark 5.18. The eigenvalues of an involution are, a priori, ±1, since if v is an eigenvector, then  v = T 2 (v) = T T (v) = T (λv) = λT (v) = λ2 v. Suppose “optimistically” that every vector decomposes as a sum of eigenvectors: v = v 1 + v −1 . Applying T gives T (v) = v 1 − v −1 . “Solving” for v 1 and v −1 gives the formulas in the preceding proof. Example 5.19. Let V = Rn×n . The transpose operator T (A) = AT is an involution (the transpose of a transpose is the original matrix). Its eigenspaces are E1 = Symn (the symmetric matrices) and E−1 = Skewn (the skew-symmetric matrices), compare Proposition 2.49.

5.3

The Characteristic Polynomial

Throughout this section, (V, +, ·) is a finite-dimensional real vector space of dimension n, and T : V → V is a linear operator. Proposition 5.20. If S and S 0 are bases of V , and if A = [T ]SS and 0 A0 = [T ]SS 0 are the matrices of T in these bases, then det A0 = det A. Proof. By Corollary 4.37 (ii), the matrices A and A0 are similar. By Corollary 3.57, similar matrices have the same determinant. Definition 5.21. The determinant of T is the determinant of the matrix of T with respect to an arbitrary basis of V . The characteristic polynomial of T is χT (λ) = det(T − λIV ) = det T + · · · + (−λ)n , of degree n. The characteristic equation of T is χT (λ) = 0.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

101

Remark 5.22. Each summand in a determinant is a signed product of n matrix entries. Since each entry in the matrix of T − λIV is either a number or a linear polynomial in λ, the determinant has degree n. The only term of degree n comes from the product of the diagonal entries (Ajj − λ), so the coefficient of λn is (−1)n . The constant term is found by setting λ = 0. Proposition 5.23. A real number λ is an eigenvalue of T if and only if χT (λ) = 0. Proof. A real number λ is an eigenvalue of T if and only if T − λIV is not invertible, if and only if χT (λ) = det(T − λIV ) = 0. Remark 5.24. By the Fundamental Theorem of Algebra, the characteristic polynomial of T has precisely n complex roots, counting multiplicity. For terminological convenience, we extend the definition of “eigenvalue” to encompass non-real roots. Definition 5.25. Let (V, +, ·) be a finite-dimensional real vector space, and T : V → V a linear operator. A root of the characteristic polynomial of T is an eigenvalue of T . Example 5.26. Let a and b be real, and consider the matrix   a − λ b a b = (a − λ)2 − b2 , ; χA (λ) = det(A − λI2 ) = A= b a − λ b a

a difference of squares: χA (λ) = (a + b − λ)(a − b − λ). We recover the eigenvalues λ = a ± b. (Computing corresponding eigenvectors by substituting each eigenvalue in turn and solving the resulting homogeneous system is a worthwhile exercise.) Example 5.27. Let a and b be real, and consider the matrix   a − λ −b a −b = (a−λ)2 +b2 . ; χB (λ) = det(B−λI2 ) = B= b a − λ b a

The characteristic equation of B is λ2 − 2aλ + (a2 + b2 ) = 0. The quadratic formula gives p p 2a ± (2a)2 − 4(a2 + b2 ) λ= = a ± −b2 = a ± bi. 2 That is, if b 6= 0, the roots of the characteristic equation are non-real. Because B has no real eigenvalues, B has no real eigenvectors. It follows that B is not diagonalizable over the real numbers.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

102

LINEAR ALGEBRA

Remark 5.28. The matrix B of the preceding example defines a linear operator on the complex vector space C2 , the set of ordered pairs of complex numbers equipped with complex scalar multiplication. In this space, B has two linearly independent eigenvectors: v1 = (1, −i) and v2 = (1, i). For example,        1 a + bi 1 a −b = (a + bi)v1 . = (a + bi) = Bv1 = −i b − ai b a −i Thus, B is diagonalizable over the complex numbers. Example  a C= 0

5.29. Let a and b 6= 0 be real, and consider the matrix  a − λ b b = (a − λ)2 . ; χC (λ) = det(C − λI2 ) = 0 a − λ a

The only real eigenvalue of C is a, of “algebraic multiplicity 2”. (That is, λ = a is a double root of the characteristic equation.) To find the corresponding eigenspace, we solve the homogeneous system with coefficient matrix C − aI2 = be21 . Multiplying the first row by 1/b puts the coefficient matrix into reduced row-echelon form. The variable x2 is basic, and x1 is free. The only eigenvectors are therefore scalar multiples of v1 = (1, 0), i.e., Ea = {(x1 , 0) in R2 }. The eigenspace does not span R2 , so C is not diagonalizable. Because dim Ea = 1, we say the eigenvalue a has “geometric multiplicity 1”. Remark 5.30. The preceding three examples typify the general eigenspace behavior of a real n × n matrix A. In the “best” case, A has all real eigenvalues and a set of eigenvectors that span Rn , so A is diagonalizable. Two things can prevent A from being diagonalizable over the reals. First, some of the eigenvalues may be non-real. (Because the characteristic polynomial of a real matrix has real coefficients, any non-real eigenvalues occur in complex conjugate pairs. A matrix with non-real eigenvalues may be diagonalizable over the complex numbers, i.e., there may exist an invertible complex n × n matrix P such that P −1 AP is diagonal.) Second, A may have only real eigenvalues, but its eigenspaces may fail to span Rn .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

103

Example 5.31. Find the eigenvalues and eigenspaces of the matrix   1 1 −1 1 1 , A = 1 1 −1 3 and determine whether A is diagonalizable. If so, find a matrix P so that A0 = P −1 AP is diagonal with the eigenvalues in non-increasing order. Subtract λI3 , then expand the determinant using the formula from Example 3.47: 1 − λ 1 −1 1−λ 1 det(A − λI3 ) = 1 1 −1 3 − λ   = (1 − λ) (1 − λ)(3 − λ) + 1   − (3 − λ) − 1   + (−1) (1)(−1) − (1 − λ)       = (1 − λ) λ2 − 4λ + 4 + λ − 2 − λ − 2 = −(λ − 1)(λ − 2)2 ,

so the eigenvalues (in non-increasing order) are 2, 2, and 1. To find a basis for E2 , row-reduce A − 2I3 :     −1 1 −1 R3 −R2 , 0 0 0 R1 +R2 1 −1 1 . 1 −→ A − 2I3 =  1 −1 0 0 0 1 −1 1 The system may be written x1 = x2 − x3 , so the general solution is      1  2      x x − x3 −1 1 −1 1  x2  =  x2  = x2  1  + x3  0  ; v 1 =  1  , v 2 =  0  . 1 0 1 0 x3 x3 Theorem 5.13 implies A is diagonalizable: We have shown dim(E2 ) = 2, and dim(E1 ) ≥ 1, so the eigenspaces of A span R3 . To find a 1eigenvector, row-reduce A − I3 :       1 0 1 0 1 −1 R3 −R2 , 0 1 −1 R1 ↔R2 R3 +R1 0 1 −1 . 1 0 1 −→ 0 1 −→ A − I 3 = 1 0 0 0 0 0 0 1 −1 2

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

104

LINEAR ALGEBRA

The system is written x1 = −x3 and x2 = x3 , so    1  3   x −x −1 −1  x2  =  x3  = x3  1  ; v 3 =  1  . 1 1 x3 x3 To catch arithmetic mistakes, it’s a good idea to verify by matrix multiplication that Av1 = 2v1 , Av2 = 2v2 , and Av3 = v3 . The eigenbasis S 0 = (vj )3j=1 diagonalizes. The transition matrix 0

P = [I3 ]SS 0 has jth column [vj ]S . Tocompute the inverse, P −1 = [I3 ]SS , form the augmented matrix P I3 and row-reduce: 

 R1 +R3 , 

 1 −1 −1 1 0 0 R2 −R1 , 1 0 0 1 0 1 R3 −R2 0 0 1 −1 1 1 −1 ; 0 1 0 1 0 −→ 0 1 1 0 0 1 0 1 0 1 −1 2 swapping the second and third rows gives     1 −1 −1 1 0 1 0 1 . 2 , P = 1 P −1 =  1 −1 0 1 1 −1 1 −1 Again to guard against arithmetic errors, it’s a good idea to check that P −1 P = I3 . On theoretical grounds, diag[2, 2, 1] = A0 = P −1 AP , i.e.,       1 −1 −1 1 1 −1 1 0 1 2 0 0 0 2 0 =  1 −1 0 1 . 1 1 1 2 1 0 1 1 1 −1 3 −1 1 −1 0 0 1 For good measure, check this by multiplying matrices. Example 5.32. Let k ≥ 2 be an integer and  λ 1 0 λ k−1  X  j+1 Bk (λ) = λIk + ej e =  ... ...  j=1 0 0 0 0

λ real. The k × k matrix  0 ... 0 0 1 . . . 0 0  .. . . .. ..  . . . .  0 . . . λ 1 0 ... 0 λ

is called the Jordan block of size k with eigenvalue λ. The matrix Nk = Bk (0) is nilpotent: Nkj 6= 0k×k for 1 ≤ j < k, but Nkk = 0k×k .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

105

The eigenvalue of Bk (λ) is λ, with algebraic multiplicity k. The λ-eigenspace is easily checked to be Span(e1 ), so λ has geometric multiplicity 1. In particular, Bk (λ) is not diagonalizable. In more advanced courses, it is proven that if V is finite-dimensional and T is an operator with real eigenvalues, then V splits as a direct sum of subspaces in such a way that T acts as a Jordan block on each subspace.

5.4

Symmetric Operators

Let N be a positive integer. Throughout this section, V is a subspace of (RN , +, ·), and is equipped with the inner product coming from the Euclidean dot product, hx, yi = xT y. Definition 5.33. An operator T : V → V is symmetric if hT (x), yi = hx, T (y)i

for all x and y in V .

Remark 5.34. If (vj )nj=1 is a basis of V , T is symmetric if and only if hT (vi ), vj i = hvi , T (vj )i

for all i, j = 1, . . . , n.

Symmetry implies this condition if this condition P i a fortiori.PConversely, j holds, then writing x = i x vi and y = j y vj and using bilinearity of the dot product shows T is symmetric. Example 5.35. Let V = Rn , and assume T (x) = Ax for all x in V ; that is, A is the standard matrix of T . The operator T is symmetric if and only if xT AT y = (Ax)T y = hAx, yi = hx, Ayi = xT Ay for all x and y in Rn , if and only if AT = A. Proposition 5.36. If W is a subspace of (Rn , +, ·), the orthogonal projection map projW : Rn → Rn is symmetric.

Proof. Every vector x in Rn decomposes uniquely as x = xW + xW ⊥ , and the orthogonal projection operator is defined by projW (x) = xW . If x = xW + xW ⊥ and y = yW + yW ⊥ are arbitrary vectors, then hprojW (x), yi = hxW , yW + yW ⊥ i = hxW , yW i = hxW + xW ⊥ , yW i = hx, projW (y)i .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

106

LINEAR ALGEBRA

Proposition 5.37. If T1 and T2 are symmetric, and if c is real, then cT1 + T2 is symmetric. Proof. Exercise 5.3. Remark 5.38. In particular, the set of symmetric operators is a vector subspace of the set L(V, V ) of all operators. Further, an arbitrary linear combination of orthogonal projections is symmetric; the Spectral Theorem, below, asserts the converse. Proposition 5.39. Let S 0 = (uj )nj=1 be an orthonormal basis of V . If T is an operator on V , then T is symmetric if and only if the matrix 0 [T ]SS 0 is a symmetric matrix. Proof. The matrix A = [Aij ] of T with respect to S 0 is defined by T (uj ) =

n X

Akj uk .

k=1

Since hui , uk i = δik , we have hT (uj ), ui i =

n X

Akj

k=1

huk , ui i =

n X

Akj δik = Aij .

k=1

Exchanging the roles of i and j gives huj , T (ui )i = hT (ui ), uj i = Aji . The operator T is symmetric if and only if Aij = hT (uj ), ui i = huj , T (ui )i = Aji

for all i and j,

0

if and only if [Aij ] = [T ]SS 0 is a symmetric matrix. Remark 5.40. If S 0 = (vj )nj=1 is an arbitrary basis and T is symmetric, the expression n X Akj hvk , vi i k=1

0

is symmetric in i and j, but [Aij ] = [T ]SS 0 may not be symmetric. Theorem 5.41. Let A be a real, symmetric n × n matrix. (i) Every eigenvalue of A is real. (ii) Distinct eigenspaces of A are orthogonal.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

107

Proof. (i). Let λ = α + iβ be an eigenvalue of A. It follows that the complex conjugate λ = α − iβ is also an eigenvalue of A. Consequently, the real matrix A = (A − λIn )(A − λIn ) = A2 − 2αA + (α2 + β 2 )In = (A − αIn )2 + (βIn )2

is not invertible. Since A − αIn is symmetric, if x is a non-zero vector in the nullspace of A, then



0 = hAx, xi = (A − αIn )2 x, x + (βIn )2 x, x = h(A − αIn )x, (A − αIn )xi + hβx, βxi = k(A − αIn )xk2 + kβxk2 .

By positive definiteness, β = 0 and Ax = αx. (ii). If λ1 6= λ2 are eigenvalues of A, with corresponding eigenvectors u1 and u2 , then

1 λ u1 , u2 = hAu1 , u2 i = hu1 , Au2 i = u1 , λ2 u2 ,

or (λ1 − λ2 ) hu1 , u2 i = 0. Since λ1 − λ2 6= 0, we have hu1 , u2 i = 0.

The Spectral Theorem Theorem 5.42. Let V be a subspace of (RN , +, ·). If T : V → V is a symmetric linear operator, there exists an orthonormal T -eigenbasis of V . Proof. The proof proceeds by induction on the dimension of V . If dim V = 1, then T is a scalar operator, and an arbitrary unit vector in V is an orthonormal eigenbasis. Assume inductively for some k ≥ 1 that the conclusion of the theorem holds for every subspace of dimension k, and let V be a subspace of dimension (k + 1). Since T is symmetric, its eigenvalues are real. Pick an eigenvalue λk+1 and a unit λk+1 -eigenvector uk+1 in V , and let V 0 = Span(uk+1 )⊥ be the orthogonal complement of uk+1 . We claim T maps V 0 to V 0 : A vector y is in V 0 if and only if huk+1 , yi = 0. Since T is symmetric and uk+1 is an eigenvector of T , huk+1 , T (y)i = hT (uk+1 ), yi = hλuk+1 , yi = 0;

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

108

LINEAR ALGEBRA

that is, T (y) ∈ V 0 . Since dim V 0 = k, the inductive hypothesis guarantees there exists an orthonormal T -eigenbasis (uj )kj=1 of V 0 . Appending uk+1 gives an orthonormal T -eigenbasis (uj )k+1 j=1 of V . Corollary 5.43. If A is a symmetric n × n real matrix, there exists an orthogonal n × n matrix P such that P T AP is diagonal. Theorem 5.42 has a geometric formulation: Theorem 5.44. Let V be a subspace of (RN , +, ·). If T : V → V is a symmetric operator, λ1 > λ2 > · · · > λr are the eigenvalues of T , and Πj : V → V is the orthogonal projection to the eigenspace Eλj , then: P (i) T = j λj Πj . P (ii) IV = j Πj . Remark 5.45. (ii) recapitulates Theorem 3.25.

Example 5.46. (Symmetric rank-one operators on Rn ) Recall that a rank-one operator T on Rn has standard matrix A = wv T for some non-zero vectors v and w in Rn . The matrix A is symmetric if and only if wv T = A = AT = vwT , if and only if v i wj − v j wi = 0 for all i, j, if and only if v and w are proportional. In this event, the eigenspaces are E0 = Span(v)⊥ , of dimension (n − 1), and Ekvk2 = Span(v), of dimension 1. In agreement with Theorem 5.44, T is kvk2 times orthogonal projection to Span(v).

5.5

Applications

For computing powers of a square matrix, diagonal matrices are formally simpler than general matrices. This section presents a selection of examples. Key properties are gathered here for reference; each is easily established with mathematical induction. Proposition 5.47. Let n ≥ 1 be an integer. If A, B, and P and n × n matrices, with P invertible, then for all m ≥ 1: (i) If A0 = P −1 AP , then (A0 )m = P −1 Am P .

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

109

(ii) If AB = BA, then (AB)m = Am B m . (iii) If (dj )nj=1 are real numbers, and if A0 = diag[d1 , d2 , . . . , dn ], then   (A0 )m = diag (d1 )m , (d2 )m , . . . , (dn )m .

The Fibonacci Numbers The Fibonacci sequence is the real sequence (xm )∞ m=1 defined by the two-term recursion x1 = x2 = 1,

xm+2 = xm + xm+1

for m ≥ 1.

The first fifteen terms are 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610 . . . . An obvious inductive argument proves that each Fibonacci number is an integer. There is a remarkable closed formula, known to De Moivre but usually called “Binet’s formula”. √ √ Theorem 5.48. Let λ+ = 12 (1 + 5) and λ− = 21 (1 − 5). For m ≥ 1, √ √ (1 + 5)m − (1 − 5)m (λ+ )m − (λ− )m m √ √ = . x = 5 2m 5 Proof. For m ≥ 1, define the vector xm = (xm , xm+1 ) in R2 . The Fibonacci recursion relation may be expressed in matrix form as  m+1    m    x x xm+1 0 1 m+1 = Axm , = x = m+2 = m m+1 1 1 xm+1 x x +x with A denoting the indicated 2 × 2 matrix. By induction on m, m−1    m   x 1 0 1 m m−1 1 for all m ≥ 1. =x =A x = 1 1 1 xm+1 To calculate Am−1 in closed form, diagonalize A. The characteristic equation is −λ 1 0 = det(A − λI) = = λ2 − λ − 1. 1 1 − λ √ The quadratic formula gives the eigenvalues of A: λ+ = 21 (1 + 5) and √ λ− = 21 (1 − 5). The subsequent computation is done symbolically, using the identities √ λ+ + λ− = 1, λ+ − λ− = 5, λ− λ+ = −1, (λ± )2 = λ± + 1.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

110

LINEAR ALGEBRA

Corresponding eigenvectors are   1 v1 = + , λ



 1 v2 = − . λ

As guaranteed by Theorem 5.41, the eigenvalues of A are real and the corresponding eigenvectors are orthogonal. Let S = (ej )2j=1 be the standard basis of R2 and S 0 = (vj )2j=1 the A-eigenbasis. The transition matrix P = [I2 ]SS 0 and its inverse are  1 1 P = + − , λ λ 

P

−1

    − 1 1 −λ− 1 −λ 1 . =√ = + λ+ −1 λ+ −1 λ − λ− 5

(Note that P −1 6= P T ; the columns of P are not of unit length.) If T : R2 → R2 is the operator whose standard matrix is A, then   + 0 0 λ 0 0 = [T ]SS 0 = [I2 ]SS [T ]SS [I2 ]SS 0 = P −1 AP. A = − 0 λ Writing A = P A0 P −1 = diag[λ+ , λ− ] and remembering λ+ λ− = −1, Am−1 = P (A0 )m−1 P −1   −    + m−1 1 −λ 1 1 1 (λ ) 0 =√ + − λ+ −1 0 (λ− )m−1 5 λ λ  + m−2  1 (λ ) − (λ− )m−2 (λ+ )m−1 − (λ− )m−1 =√ . + m−1 − (λ− )m−1 (λ+ )m − (λ− )m 5 (λ )

Figure 5.1: The action of T on S, and on an orthonormal eigenbasis.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

111

Consequently,  m   m−1   x 1 0 1 = 1 1 1 xm+1  + m−2   1 (λ ) − (λ− )m−2 (λ+ )m−1 − (λ− )m−1 1 =√ + m−1 − (λ− )m−1 1 (λ+ )m − (λ− )m 5 (λ )    + m−2 1 (λ ) + (λ+ )m−1  − (λ− )m−2 + (λ− )m−1  =√ . (λ+ )m−1 + (λ+ )m − (λ− )m−1 + (λ− )m 5 In particular, since 1 + λ± = (λ± )2 ,   + )m−2 + (λ+ )m−1 − (λ− )m−2 + (λ− )m−1 (λ √ xm = 5 + m − m (λ ) − (λ ) √ = . 5

Remark 5.49. Each power in the numerator may be expanded using the binomial theorem. It’s a good exercise to simplify the numerator in this way, and to prove the formula really does generate only integers.

Matrix Exponentiation Definition 5.50. The trace of an n × n matrix A is the sum of its diagonal entries, tr(A) = A11 + · · · + Ann =

n X

Ajj .

j=1

Proposition 5.51. Let A and B be n × n matrices. (i) tr(BA) = tr(AB). (ii) If P is invertible, then tr(P −1 AP ) = tr(A). (iii) If A is diagonalizable, then tr(A) is the sum of the eigenvalues, counting multiplicity. P Proof. (i). Since (AB)ij = k Aik Bjk , tr(AB) =

n X n X i=1 k=1

Aik Bik =

n X n X

Bik Aik = tr(BA).

k=1 i=1

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

112

LINEAR ALGEBRA   (ii). By (i), tr P −1 (AP ) = tr (AP )P −1 = tr(A).

(iii). Suppose A0 = P −1 AP . By (ii) tr(A) = tr(A0 ). Since A0 is diagonal, the eigenvalues of A0 are its diagonal entries, so tr(A0 ) is the sum of the eigenvalues, say ΛA0 . The matrices A and A0 have the same characteristic polynomial, hence the same eigenvalues. In particular, ΛA0 = ΛA , the sum of the eigenvalues of A. Combining these observations, tr(A) = tr(A0 ) = ΛA0 = ΛA . Definition 5.52. Let A = [Aij ] be an n × n real (or complex) matrix. The exponential series is defined by exp(tA) =

∞ X (tA)k k=0

k!

= I + tA + t2

A2 A3 + t3 + .... 2! 3!

Remark 5.53. Since there are only finitely many entries of A, there is a real number M such that |Aij | ≤ M for all i and j. By induction on m, |(Am )ij | ≤ (M n)k for all k ≥ 0; for example, if m = 1, we have n n X X 2 i i ` |(A )j | = A` Aj ≤ |Ai` A`j | ≤ M 2 n ≤ (M n)2 . `=1

`=1

It follows that each entry in the exponential series is bounded in absolute value by the convergent series ∞ X (tM n)k k=0

= etM n .

k!

Example 5.54. If A = diag[d1 , d2 , . . . , dn ], then Am is the diagonal matrix whose entries are the mth powers of the entries of A, so 1

2

n

exp(tA) = diag[etd , etd , . . . , etd ]. Example 5.55. If N is the 4 × 4 nilpotent block, its is 04×4 , so N , N 2 , N 3 , and exp(tN ) are        1 0 1 0 0 0 0 1 0 0 0 0 1   0 0 1 0  0 0 0 1  0 0 0 0 0        0 0 0 1 ,  0 0 0 0 ,  0 0 0 0 ,  0 0 0 0 0 0 0 0 0 0 0 0 0 0

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

fourth power t

t2 2!

1 0 0

t 1 0

t3 3!  t2  2!  .



t 1

CHAPTER 5. DIAGONALIZATION

113

Example 5.56. If N = Bn (0) is the n×n nilpotent block, then N m has a diagonal of 1’s m rows above the diagonal, and N n = 0n×n . The exponential series is a polynomial, and   2 tn−2 tn−1 1 t t2! . . . (n−2)! (n−1)!   0 1 t . . . tn−3 tn−2   (n−3)! (n−2)!    ..  ..  .. .. .. . . . exp(tN ) =  . . . . . .   2 t  0 0 0 . . . t 2!   0 0 0 . . . 1 t  0 0 0 ... 0 1 Theorem 5.57. Let A and B be n × n matrices. (i) If P is an invertible n × n matrix, then

exp(tP −1 AP ) = P −1 exp(tA)P.  (ii) If BA = AB, then exp t(A + B) = exp(tA) exp(tB).

(iii) If A is diagonalizable, then det exp(tA) = et tr(A) .  T (iv) exp(tAT ) = exp(tA) .

(v) If AT = −A, then exp(tA) is orthogonal and has determinant 1.

Proof. (i). This is a consequence of the identity (P −1 AP )k = P −1 Ak P : exp(tP

−1

AP ) =

∞ X (tP −1 AP )k k=0

k!

=

∞ X k=0

P −1

(tA)k P = P −1 exp(tA)P. k!

(ii). This follows from the binomial formula k X Ai B j (A + B)k X Ai B k−i = = , k! i! (k − i)! i! j! i=0

i+j=k

which holds for commuting matrices, together with the “Cauchy product formula” for absolutely convergent double series: k ∞ ∞ X X  X t(A + B) (tA)i (tB)j exp t(A + B) = = k! i! j! k=0 i+j=k k=0 X X  ∞ ∞ (tA)i (tB)j = = exp(tA) exp(tB). i! j! i=0

j=0

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

114

LINEAR ALGEBRA

(iii). Suppose P −1 AP = A0 = diag[d1 , . . . , dn ] for some invertible matrix P . By (i) and Example 5.54, 1

n

2

P −1 exp(tA)P = exp(tA0 ) = diag[etd , etd , . . . , etd ]. Taking determinants, 0

det exp(tA) = det exp(tA ) =

n Y

k

etd = et

P

k

dk

0

= et tr(A ) = et tr(A) .

k=1

(iv). This is an immediate consequence of (Ak )T = (AT )k . (v). Clearly exp(0n×n ) = In , so by (ii), exp(−tA) = exp(tA)−1 . If AT = −A. then exp(tA)−1 = exp(−tA) = exp(tAT ) = exp(tA)T ,

which proves exp(tA) is orthogonal. Since the diagonal entries of a skew-symmetric matrix are all 0, tr(A) = 0. By (iii), det exp(tA) = et tr(A) = e0 = 1. Remark 5.58. An orthogonal n × n matrix of determinant 1 is, by definition, a Euclidean rotation. Part (v) of the theorem says that a skew-symmetric matrix is an “infinitesimal” rotation. Example 5.59. If   0 −1 , A= 1 0

 cos t − sin t . then exp(tA) = sin t cos t 

To prove this, note that A2 = −I2 , from which it follows immediately that A2k = (−1)k I2 and A2k+1 = (−1)k A. Splitting the exponential series into terms of even and odd degree and recalling the power series for the circular functions, cos t =

∞ X k=0

t2k (−1) , (2k)! k

sin t =

∞ X k=0

t2k+1 , (−1) (2k + 1)! k

we have ∞ X (tA)2k

∞ X (tA)2k+1 (2k)! (2k + 1)! k=0 k=0 X X ∞ ∞ 2k  2k+1  k t k t = I2 + A (−1) (−1) (2k)! (2k + 1)!

exp(tA) =

+

k=0

k=0

= (cos t)I2 + (sin t)A.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

115

Exercises Exercise 5.1. Find the eigenvalues and bases for the eigenspaces of the following matrices, and determine which are diagonalizable:         4 0 0 0 4 0 0 0 4 0 0 0 4 0 0 0  0 4 0 0  0 4 0 0  0 4 0 0  0 4 0 0          0 0 4 0  0 0 2 0  0 0 2 0  0 0 2 0 1 0 0 2 0 0 1 2 0 0 1 0 1 0 0 4 Exercise 5.2. On the vector space V = F(R, R) of all real-valued functions on R under function addition and scalar multiplication, let R be the “domain reflection” operator defined by (Rf )(t) = f (−t). (a) Show R is a linear involution. (b) Identify the eigenspaces of R, and show that every function can be written uniquely as a sum of eigenvectors. (c) Find the decomposition of f (t) = et into eigenvectors. Exercise 5.3. Prove Proposition 5.37. Exercise 5.4. An operator Π : V → V is a projection if Π2 = Π. (a) Prove that if Π is a projection, then IV − Π is a projection, and the only eigenvalues of Π are 0 and 1. (b) Assuming “optimistically” that every vector v in V decomposes as a sum v 0 + v 1 of eigenvectors, find formulas for v 0 and v 1 in terms of v and Π(v). (c) Show that the vectors v 0 and v 1 found in part (b) are eigenvectors of Π. Conclude that Π is diagonalizable. (d) Find a projection on R2 whose standard matrix is not symmetric. Exercise 5.5. If Π : V → V is a projection (preceding exercise), the operator R = Iv − 2Π is a reflection. (a) Prove that a reflection is an involution. (b) How are the eigenvalues and eigenspaces of Π related to the eigenspaces of R?

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

116

LINEAR ALGEBRA

(c) Find a reflection on R2 whose standard matrix is not symmetric. Exercise 5.6. Let T : V → V be an operator satisfying T 3 = T . (a) Find the possible eigenvalues of T . (b) Assuming “optimistically” that every vector v in V decomposes as a sum of eigenvectors, find formulas for the each summand in terms of v, T (v) and T 2 (v). (c) Show that the vectors found in part (b) are eigenvectors of T . Conclude that T is diagonalizable. Exercise 5.7. Let A be a non-zero n × n matrix satisfying Ak = 0n×n for some integer 2 ≤ k ≤ n. (a) Prove that every eigenvalue of A is 0. (b) If A3 = 0n×n , prove (I + A)−1 = I − A + A2 . (c) Find a formula for (I + A)−1 if Ak = 0n×n for some k > 3. (d) Prove I + A is not diagonalizable. Hint: What are the eigenvalues? Exercise 5.8. Let S = (vj )nj=1 be a basis of Rn . Prove S is orthonormal if and only if v1 v1T + · · · + vn vnT = In . Exercise 5.9. Let B be an arbitrary n × n real matrix, and put A = B T B, noting that A is symmetric. Prove that every eigenvalue of A is non-negative, and 0 is an eigenvalue of A if and only if B is not invertible. Hint: First show that hx, Axi = kBxk2 for every x in Rn . Exercise 5.10. Let a, b, and c be real numbers.  a b . (a) Find the eigenvalues and eigenspaces of A = b c 

(b) Prove (by direct computation) that the eigenspace of A are orthogonal. (c) Prove the eigenvalues have opposite sign if and only if ac − b2 < 0. (d) Prove the eigenvalues have the same sign if and only if ac − b2 > 0, and their mutual sign is the sign of a.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

CHAPTER 5. DIAGONALIZATION

117

 0 2 . Find a formula for An , n > 0 an Exercise 5.11. Let A = 2 3 arbitrary integer. 

Exercise 5.12. Show that if A is skew-symmetric and λ is a real eigenvalue, then λ = 0. Hint: Let v be a λ-eigenvector, and compute hAv, vi. Exercise 5.13. Let A and B be n × n matrices. We say A and B are simultaneously diagonalizable if there exists an invertible matrix P such that A0 = P −1 AP and B 0 = P −1 BP are diagonal. Prove that simultaneously diagonalizable matrices commute, i.e., BA = AB. Hint: What can you say about the product of diagonal matrices? Exercise 5.14. Let A and B be commuting n × n matrices. (a) Show that if v is a λ-eigenvector of A, then Bv is in the λeigenspace of A. That is, B preserves the eigenspaces of A. (b) Prove that if A is diagonalizable and B is invertible, then A and B are simultaneously diagonalizable. Hint: Show that A and B −1 commute, then use part (a) to show the eigenspaces of B are precisely the eigenspaces of A. (c) Remove the hypothesis in part (b) that B is invertible. Suggestion: B is diagonalized by P if and only if B + cIn is; show that there exists a real number c so that B + cIn is invertible.

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

118

LINEAR ALGEBRA

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

Index Complex numbers, 32 Coordinate vector, 41–42, 76 Cosets of a subspace, 88 set of, as vector space, 89 vector space operations on, 89

Angle, 56–57 Augmented matrix, 9, 10, 19 Basic variable, 10 number of is rank, 91 Basis, 41 algorithm for testing, 48 counting elements of, 46 dual, 2 existence of, 41 linear transformation determined by action on, 80 orthonormal, 58 standard, 2, 42 Bijective, 74 Bilinear function, 53 Block matrix, 2 Box, 64

Decomposition into parallel and orthogonal components, 57 of identity into orthogonal projections, 59, 108 with respect to a direct sum, 37 with respect to orthogonal subspaces, 62 i δj , see Kronecker delta Determinant, 64–71 effect of row operations on, 69 of an operator, 100 of invertible matrix, 69 of matrix product, 70 of similar matrices, 70, 100 of transpose, 68 of triangular matrix, 68 polynomial formula for, 66–68 using row operations to compute, 70–71 Diagonal matrix, 3, 47, 95 linear transformation

Cauchy-Schwarz inequality, 55 Change of basis, 83–84 Characteristic polynomial of an operator, 100 Column matrix, 2, 4 Column space of a matrix, 86–88 basis of, 88 Column vector, 25, 29 Commutative diagram, 83 Complementary subspace, 46 119

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

120

LINEAR ALGEBRA

associated to, 78 powers of, 108 Diagonalizable operator, 95 criteria for, 97–99 over the complex numbers, 102 typifying examples, 101–102 Differential equation, 33 Dimension, 43 of (Rn )∗ , 44 of Rm×n , 44 of Rn , 44 of Pn , 44 of a subspace, 43 of direct sum, 46 of sum, 46 The Dimension Theorem, 46 Direct sum, 37, 62 decomposition of Rn×n , 38 decomposition of vectors, 37 dimension of, 46 Dual pairing, 4 Dummy index, 1, 25 Edge, see Box ei , 2, 17 Eigenbasis, 96 Eigenspace, 96 Eigenvalue, 96 non-real, 101 Eigenvector, 96 eij , 4, 17, 29, 42, 47, 82 ej , 2, 17 Elementary matrix, 17–19 Elementary row operation, 11–18 Equality of matrices, 6 Exponential series, 112 examples of, 112 Extension by linearity, 80

Fibonacci numbers, 109–111 closed formula for, 109 Finite-dimensional, 35, 43 Free variable, 10 number of, 45 number of is nullity, 91 Functions as vectors, 24 vector space F(X, R), 24, 33 Fundamental Theorem of Algebra, 101 Gaussian elimination, see Row reduction General linear group, 8 GL(n, R), see General linear group Golden ratio, 109 Gram-Schmidt algorithm, 61 example of, 63 Homogeneous linear system, 9 Identity map, 76 matrix with respect to a basis, 81 Identity matrix, 3, 5, 8, 29 Image of a linear transformation, 74 In , see Identity matrix Injective, 74 criterion for, 74 relation with linear independence, 80 Inner product, 4, 53–57 preserved by orthogonal matrix, 60 Inverse matrix, 8 algorithm for, 18–19 determinant of, 70 formula for 2 × 2, 16

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

INDEX of product, 8 Uniqueness of, 8 Involution, 99 Isomorphism, 74 Jordan block matrix, 104 Kernel of a linear transformation, 74 Kronecker delta, 2–3, 6, 29 Lagrange interpolation polynomials, 51 Law of cosines, 56 Leading 1, 10 Linear combination, 28–29 empty, 28, 36 from a set of vectors, 35, 37, 39 non-trivial, 28, 39 Linear operator, 95 diagonalizable, 95 Linear system, 9–15, 84–86 augmented matrix, 9, 10 basic variable of, 10 coefficient matrix, 9 consistent, 9 free variable of, 10 homogeneous, 9, 32, 43, 45 inconsistent, 10 non-homogeneous, 9 non-trivial solution, 9, 10, 43 particular solution of, 14, 91 solution set, 9, 32, 34, 63, 91 dimension of, 45 Linear transformation, 73 associated to a matrix, 76 composition, 74 composition as commutative diagram, 83

121

composition as matrix product, 82 determined by action on a basis, 80 effect on linear combination, 73 evaluation as commutative diagram, 83 evaluation as matrix product, 82 image of, 74 induced isomorphism, 90 kernel of, 74 matrix of, 81 nullity of, 88 rank of, 88 Linear transformations vector space of, 79 Linearly dependent, 39 Linearly independent, 39, 40 algorithm for testing, 48 maximum number of elements, 42 number of elements, 44 empty set, 39 Magnitude of a vector, 54 Matrix, 1 2 × 2 as complex number, 32–33 as linear combination of outer products, 29 block, 2 column, 2, 4 detecting equality, 6, 29 diagonal, 3, 47 inverse of, 8 of linear transformation, 81 orthogonal, 59–60 real, 1

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

122

LINEAR ALGEBRA

row, 2, 4 row reduction, 9–18 row-echelon form of, 10 skew-symmetric, 32 square, 3 symmetric, 32 trace of, 111 triangular, 47 Matrix inverse algorithm for, 18–19 Matrix multiplication, 4–6 associative law, 5, 8 definition of, 4 distributive law, 5, 34 linear transformation induced by, 76 not commutative, 6 properties of, 5, 34 Matrix transpose, 3, 7

Orthogonal complement, 61 Orthogonal group, 60 Orthogonal matrix, 59–60 as exponential of skew-symmetric matrix, 113 columns of, 59 preserves dot product, 60 Orthogonal projection, 57–59, 62, 105 best approximation property, 62 Orthogonal vectors, 56 Orthonormal basis, 58 existence of, 61 matrix of symmetric operator in, 106 Outer product, 4, 29, 42, 78

Nilpotent matrix, 104 Non-homogeneous linear system, 9 Non-trivial solution of a linear system, 9 Nullity of a linear transformation, 88 Nullspace of a matrix, 86

Parallelipiped, see Box Parallelogram definition of, 72 linear transformation preserves, 73 Parallelogram law, 27 Particular solution, 14 Permutation matrix, 21 Pn , see Polynomials Pointwise operations, 24 Polynomials Lagrange interpolation, 51 trigonometric, 26 vector space of, 26, 33 dimension, 44 Projection orthogonal, 62 Projection matrix, 78 The Pythagorean Theorem, 57

Operator characteristic polynomial of, 100 determinant of, 100 diagonalizable over the complex numbers, 102 linear, 95 symmetric, 105–111 trace of, 111 Oriented volume, 64 axioms for, 65

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

INDEX Quadratic formula, 55 Quotient vector space, 88 dimension of, 89 Rank of a linear transformation, 88 Rank of a matrix, 87 Rank-Nullity Theorem, 86–91 statement of, 87, 90 Rank-one matrix, 4 symmetric, 108 Rank-one transformation, 82 diagonalizability of, 99 Reflection matrix, 59, 78 Reverse order law, 8 Rm×n , 1 dimension of, 44 Rn , 2, 17, 29 dimension of, 44 subspace of, as solution of a linear system, 50, 63 n ∗ (R ) , 2, 17, 29 dimension of, 44 Rotation matrix, 59, 78 Row matrix, 2, 4 Row operation, 11–18 effect on determinant, 69 via elementary matrix, 17 Row reduction, 9–18 algorithm for, 12–15 Row space of a matrix, 86 Row vector, 29 Row-echelon form, 10 uniqueness of, 15–16 Scalar matrix, 95 matrix similar to, 95 Sequence finite, 45 real, 25, 45

123

Similar matrices, 84, 95 determinant of, 70, 100 powers of, 108 trace of, 111 Similar matrix example of, 84 Skew-symmetric matrix, 32, 38, 44 exponential of, 113 Smooth functions, 77 Solution set, 9, 34 as a subspace, 32, 34 Span, 35–36 as intersection of all containing subspaces, 36 of empty set, 36 of proper subset, 39 properties of, 35 Spanning algorithm for testing, 48 Square matrix, 3 Standard basis, 2, 29, 42, 44 as edges of unit cube, 64 Standard dual basis, 2, 29, 44 Subspace, 30–38 as solution space of a linear system, 50, 63 closed under linear combinations, 30 complementary, 46 criteria for, 30 dimension of, 43 proper, 31 trivial, 31 Subspaces direct sum of, 37 examples of, 31–33 intersection of, 33, 34, 36 sum of, 37–38 as span of union, 37

AMS Open Math Notes: Works in Progress; Reference # OMN:201801.110759; Last Revised: 2018-01-20 09:27:31

124

LINEAR ALGEBRA

union of, 33 Sum of subspaces, 37–38 dimension of, 46 Superscripts as indices, 25 Surjective, 74 relation with spanning, 80 Symmetric matrix, 32, 38, 44, 106 Symmetric operator, 105–111 diagonalized by orthogonal matrix, 107 has orthogonal eigenspaces, 106 has real eigenvalues, 106 in an orthonormal basis, 106 orthogonal projection is, 105 Trace of a matrix as sum of eigenvalues, 111 Transition matrix, 83 as translator, 83 example of, 84 properties of, 83 Transpose operator, 3 diagonalizability of, 100 eigenspaces of, 100 linearity of, 3

of a product, 7 Triangular matrix, 47, 95 determinant of, 68 Trigonometric polynomials, 26 Unit cube in Rn , 57, 64 Unit vector, 54–55 Vector column, 25 magnitude of, 54 unit, 54 Vector operations geometry of, 27 Vector space axioms for, 23 of all linear transformations, 79 of polynomials, 26 quotient by a subspace, 88 Vector subspace, see Subspace Volume, see Oriented volume Zero map, 76 criterion for, 79 Zero matrix, 6 0V , 24
