English Pages [190] Year 2022
Basic Linear Algebra: An Introduction with an Intuitive Approach
Duc Van Khanh Tran
Table of Contents

About the Author
Preface

Part I: Vectors
Chapter 1: What are Vectors?
  1.1 Introduction to Vectors
  1.2 Graphical Representation of Vectors
  Exercises
Chapter 2: Addition, Subtraction, and Scalar Multiplication
  2.1 Addition and Subtraction
  2.2 Scalar Multiplication
  2.3 Graphical Representation
  Exercises
Chapter 3: Linear Combination and Linear Independence
  3.1 Linear Combination and Span of Vectors
  3.2 Linear Independence and Dependence
  3.3 Vector Spaces and Subspaces
  3.4 Vector Space as Span of Vectors
  Exercises
Chapter 4: Norms of Vectors
  4.1 Definition and Properties of Norms
  4.2 Graphical Meaning of ℓ2 Norm
  4.3 Unit Vectors and Normalization of Vectors
  4.4 Graphical Meaning of ℓ1 Norm
  Exercises
Chapter 5: Dot Product and Orthogonality
  5.1 Transpose of Vectors
  5.2 Dot Product of Vectors
  5.3 Graphical Representation of Dot Product
  5.4 Orthogonal Vectors
  Exercises

Part II: Matrices
Chapter 6: What are Matrices?
  6.1 Introduction to Matrices
  6.2 Main Diagonal, Trace, and Transpose
  6.3 Special Types of Matrices
  Exercises
Chapter 7: Addition, Subtraction, and Multiplication
  7.1 Addition and Subtraction
  7.2 Matrix Multiplication with Scalars and Vectors
  7.3 Multiplication of Two Matrices
  7.4 Distributive and Associative Properties of Matrix Multiplication
  7.5 Transpose of Product of Matrices
  Exercises
Chapter 8: Row Operations on Matrices
  8.1 Three Types of Row Operations
  8.2 Row Operation as Matrix Multiplication
  Exercises
Chapter 9: Reduced Row Echelon Form and Rank of Matrices
  9.1 Reduced Row Echelon Form
  9.2 Rank of Matrices
  Exercises
Chapter 10: The Four Fundamental Subspaces
  10.1 What are Four Fundamental Subspaces?
  10.2 Column Space
  10.3 Row Space
  10.4 Null and Left Null Spaces
  Exercises
Chapter 11: Inverse of Matrices
  11.1 Invertible and Singular Matrices
  11.2 Finding Inverse of an Invertible Matrix
  Exercises
Chapter 12: Determinant of Matrices
  12.1 Properties of Determinant
  12.2 Determinant Formula for 2 x 2 Matrices
  12.3 Determinant Formula for 3 x 3 Matrices
  Exercises
Chapter 13: Systems of Linear Equations
  13.1 Systems of Linear Equations as Matrix Linear Equations
  13.2 Characteristic of Matrix Linear Equations
  13.3 Solving Matrix Linear Equations
  Exercises

Appendices
  Appendix A: Cauchy-Schwarz Inequality
  Appendix B: Circle in Taxicab Geometry
  Appendix C: Beyond Vectors and Matrices: Blades and Tensors
Solutions to Exercise Problems
Reviews
About the Author

Hi! My name is Duc Van Khanh Tran, and I am a Vietnamese undergraduate at the University of Texas at Austin majoring in Pure Mathematics. In elementary school, I studied at Morinosato Elementary School in Kanazawa, Ishikawa Prefecture, Japan, for about two years. For middle school, I studied at Le Quy Don Middle School in Ho Chi Minh City, Vietnam. In 9th grade, I came to the USA and started high school at Brentwood Christian School in Austin, Texas. The classes in my high school were more relaxed and less stressful than the classes in my middle school, so I began to understand and enjoy the things I studied at school. After a while, I realized that I enjoyed math the most out of all the subjects. There was also a math team in my high school, and participating in it made me love mathematics even more. That is how I "fell in love" with mathematics.

When I was in 11th grade, I came across @daily math, a very popular math account on Instagram. Inspired by the @daily math page, I also created a math account on Instagram called @dvkt math, which currently has about 27,000 followers. Other than sharing mathematical knowledge by means of a math page on Instagram, I decided to also write some math books. Before writing this book, I wrote two other books: An Introduction to Calculus: With Hyperbolic Functions, Limits, Derivatives, and More, which was published a little before my high school graduation, and Integrals and Sums Fiesta: An Integral Part of a Math Enthusiast's Life, which was published during my first year in college.
Preface

Linear Algebra is usually taught only in college and is covered insufficiently, or not at all, in high school. However, Linear Algebra is actually not very complicated, and high school students can start learning it as well. This book introduces some important basic concepts of Linear Algebra with high school Algebra and Geometry as the only prerequisites. That said, this book is by no means restricted to high school students. Any learners who want to get started on Linear Algebra in an intuitive manner are welcome to read this book. In college, Linear Algebra is sometimes introduced in a rigorous manner without any intuition at all. For many students who are not ready for that rigor, it helps to have some intuition for the subject as a base from which they can build up the rigor.

This book is divided into two parts: the first part is about vectors, and the second part is about matrices. Part I starts by introducing what vectors are and then goes into some important concepts related to vectors. Similarly, Part II starts by introducing what matrices are and then discusses some important basic concepts about matrices. At the end of the book, there are appendices on some additional topics related to topics discussed in Part I or Part II.

As mentioned before, this book introduces some basic concepts of Linear Algebra in an intuitive manner. So, not all topics that would usually be covered in an introductory college-level Linear Algebra course are covered in this book, and some topics covered in this book are not explored as deeply as they would be in such a course. If you choose to continue learning Linear Algebra, you can learn more about the topics covered in this book as well as many other topics. I hope you enjoy reading this book and learning Linear Algebra!
Duc Van Khanh Tran Texas, USA, 2022
Part I
Vectors
Chapter 1
What are Vectors?
1.1 Introduction to Vectors
Let us start our journey into linear algebra with an introduction to vectors. What are vectors? Basically, a vector is a list of numbers, such as

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}, \tag{1.1}$$

$$\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}, \tag{1.2}$$

etc. The vector in (1.1) has two elements, 1 and 2, and the vector in (1.2) has three elements, 3, 5, and 6. A vector with one element, such as $\begin{bmatrix} 8 \end{bmatrix}$, is simply a number, also called a scalar. There can also be vectors with more than three elements. A vector is usually denoted by a letter with a small arrow above it, such as $\vec{v}$. For example, we can denote the vectors in (1.1) and (1.2) as

$$\vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad \vec{u} = \begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}.$$

1.2 Graphical Representation of Vectors
Graphically, a vector with two elements, $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$, is an arrow that goes $|v_1|$ units in the horizontal direction (the x-direction of the xy-coordinate plane) and $|v_2|$ units in the vertical direction (the y-direction of the xy-coordinate plane). The arrow goes to the right if $v_1$ is positive, and it goes to the left if $v_1$ is negative. In the vertical direction, the arrow goes up if $v_2$ is positive, and it goes down if $v_2$ is negative.
[Figure 1.1: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 > 0$ and $v_2 > 0$, with horizontal length $|v_1|$ and vertical length $|v_2|$]
[Figure 1.2: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 < 0$ and $v_2 < 0$]
[Figure 1.3: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 > 0$ and $v_2 < 0$]
[Figure 1.4: Graphical representation of $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ if $v_1 < 0$ and $v_2 > 0$]
An important point to note here is that direction matters. So, a vector is an object with a direction and a magnitude. The magnitude (or length) of vectors is discussed in chapter 4. If the arrow starts at the origin, i.e., the point (0, 0), then the vector $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ represents the point $(v_1, v_2)$ in the xy-plane (or in the two-dimensional space), i.e., the point with x-coordinate $v_1$ and y-coordinate $v_2$. We will talk about space in more detail later. For example, the vector $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ represents the point (1, 2).
[Figure 1.5: The vector $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, drawn from the origin to the point (1, 2) in the xy-plane]

Similarly, a vector with three elements represents a point in the three-dimensional space. In the three-dimensional space, in addition to the x-coordinate and the y-coordinate, there is a z-coordinate representing the height.

[Figure 1.6: Three-dimensional space, with x-, y-, and z-axes]
A vector $\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ represents the point $(v_1, v_2, v_3)$ in the three-dimensional space, i.e., the point with x-coordinate $v_1$, y-coordinate $v_2$, and z-coordinate $v_3$. Vectors with more than three elements represent points in higher-dimensional spaces. For example, a vector with four elements represents a point in the four-dimensional space. A point in the four-dimensional space has four coordinates corresponding to the four elements of the vector. However, spaces with four or more dimensions are hard to picture in our heads, since we are used to the three-dimensional world that we live in.
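To make the rules of this section concrete, here is a minimal Python sketch. The book itself uses no code, and the helper names `describe_arrow` and `represented_point` are our own, chosen only for illustration. A vector is assumed to be a plain Python list of numbers. The first helper describes which way the arrow of a two-dimensional vector goes, and the second reads off the point a vector represents when its arrow starts at the origin, together with the dimension of the space that point lives in.

```python
def describe_arrow(v1, v2):
    """Describe the arrow of the two-dimensional vector [v1, v2]:
    the horizontal move first, then the vertical move."""
    moves = []
    if v1 > 0:
        moves.append(f"{v1} right")
    elif v1 < 0:
        moves.append(f"{-v1} left")
    if v2 > 0:
        moves.append(f"{v2} up")
    elif v2 < 0:
        moves.append(f"{-v2} down")
    return ", ".join(moves) if moves else "no movement (zero vector)"


def represented_point(vector):
    """Return the point reached by the arrow when it starts at the origin,
    together with the dimension (number of elements) of the vector."""
    point = tuple(vector)
    return point, len(point)


print(describe_arrow(1, 2))          # 1 right, 2 up
print(describe_arrow(-3, -1))        # 3 left, 1 down
print(represented_point([1, 2]))     # ((1, 2), 2)
print(represented_point([3, 5, 6]))  # ((3, 5, 6), 3)
```

The dimension is simply the number of elements, which is why a vector with five elements represents a point in a five-dimensional space even though we cannot draw it.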
Exercises 1. Which of the following are vectors? 5 3 A. 9 B. C. 0.5 D. 1 2
Which of the following are scalars? 2.5 4.1 5.7 E. π F. 5 12 10
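2. Draw the vector $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$. What point does it represent?
3. Draw the vector $\begin{bmatrix} -3 \\ 4 \end{bmatrix}$. What point does it represent?
4. Draw the vector $\begin{bmatrix} -2 \\ -3 \end{bmatrix}$. What point does it represent?
5. What point does the vector $\begin{bmatrix} 1 \\ 3 \\ 6 \end{bmatrix}$ represent? What is its y-coordinate?
6. What point does the vector $\begin{bmatrix} -4 \\ 5 \\ -2 \end{bmatrix}$ represent? What is its x-coordinate?
7. What point does the vector $\begin{bmatrix} 8 \\ 15 \\ -20 \end{bmatrix}$ represent? What is its z-coordinate?
8. The vector $\begin{bmatrix} 3 \\ -2 \\ -7 \\ 5 \\ -11 \end{bmatrix}$ represents a point in a space with how many dimensions?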
Chapter 2
Addition, Subtraction, and Scalar Multiplication
2.1 Addition and Subtraction
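Now that we have learned what vectors are, let us learn how to do basic arithmetic operations on vectors. When we add two vectors together, we add them element-by-element. That is, we add the first element (from top to bottom) of the first vector to the first element of the second vector, the second element of the first vector to the second element of the second vector, and so on. Note that we can only add two vectors with the same number of elements (i.e., in the same dimension). Let us look at a few examples.

Example 2.1: $\begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \end{bmatrix} = ?$

$$\begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{bmatrix} 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 + 0 \\ 4 + 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \end{bmatrix}.$$

Example 2.2: $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ -8 \\ 12 \end{bmatrix} = ?$

$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 5 \\ -8 \\ 12 \end{bmatrix} = \begin{bmatrix} 1 + 5 \\ 2 + (-8) \\ 3 + 12 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \\ 15 \end{bmatrix}.$$

Example 2.3: $\begin{bmatrix} -3 \\ 5 \\ 1 \\ 7 \end{bmatrix} + \begin{bmatrix} 4 \\ -9 \\ 10 \\ -7 \end{bmatrix} = ?$

$$\begin{bmatrix} -3 \\ 5 \\ 1 \\ 7 \end{bmatrix} + \begin{bmatrix} 4 \\ -9 \\ 10 \\ -7 \end{bmatrix} = \begin{bmatrix} (-3) + 4 \\ 5 + (-9) \\ 1 + 10 \\ 7 + (-7) \end{bmatrix} = \begin{bmatrix} 1 \\ -4 \\ 11 \\ 0 \end{bmatrix}.$$

Similarly to addition of numbers, vector addition satisfies the commutative and associative properties. That is, for vectors $\vec{u}$, $\vec{v}$, and $\vec{w}$, we have $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ and $\vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}$. We can prove these properties by doing some simple algebra and using the properties of addition of numbers. Let

$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \text{and} \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}.$$

Then, we have

$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} = \begin{bmatrix} v_1 + u_1 \\ v_2 + u_2 \\ \vdots \\ v_n + u_n \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \vec{v} + \vec{u}$$

and

$$\begin{aligned}
\vec{u} + (\vec{v} + \vec{w}) &= \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \left( \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} \right) = \begin{bmatrix} u_1 + (v_1 + w_1) \\ u_2 + (v_2 + w_2) \\ \vdots \\ u_n + (v_n + w_n) \end{bmatrix} \\
&= \begin{bmatrix} (u_1 + v_1) + w_1 \\ (u_2 + v_2) + w_2 \\ \vdots \\ (u_n + v_n) + w_n \end{bmatrix} = \left( \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) + \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix} = (\vec{u} + \vec{v}) + \vec{w}.
\end{aligned}$$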
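Subtraction of vectors works in the same way as addition. We subtract element-by-element, and we can only subtract vectors in the same dimension.

Example 2.4: $\begin{bmatrix} 9 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 \\ 1 \end{bmatrix} = ?$

$$\begin{bmatrix} 9 \\ 5 \end{bmatrix} - \begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 9 - 4 \\ 5 - 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}.$$

Example 2.5: $\begin{bmatrix} 16 \\ 8 \\ -2 \end{bmatrix} - \begin{bmatrix} -1 \\ 12 \\ 3 \end{bmatrix} = ?$

$$\begin{bmatrix} 16 \\ 8 \\ -2 \end{bmatrix} - \begin{bmatrix} -1 \\ 12 \\ 3 \end{bmatrix} = \begin{bmatrix} 16 - (-1) \\ 8 - 12 \\ (-2) - 3 \end{bmatrix} = \begin{bmatrix} 17 \\ -4 \\ -5 \end{bmatrix}.$$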
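Element-by-element addition and subtraction translate directly into code. The following is a minimal Python sketch, assuming vectors are stored as Python lists of numbers; the function names `add` and `subtract` are our own. It reproduces Examples 2.1 to 2.5 and checks the commutative and associative properties on one sample triple of vectors. A check on sample vectors is only a spot check, not a proof; the algebraic proof above covers all vectors.

```python
def add(u, v):
    """Element-by-element sum; only defined for vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return [x + y for x, y in zip(u, v)]


def subtract(u, v):
    """Element-by-element difference u - v; same dimension required."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return [x - y for x, y in zip(u, v)]


print(add([1, 4], [0, 3]))                  # [1, 7]          (Example 2.1)
print(add([1, 2, 3], [5, -8, 12]))          # [6, -6, 15]     (Example 2.2)
print(add([-3, 5, 1, 7], [4, -9, 10, -7]))  # [1, -4, 11, 0]  (Example 2.3)
print(subtract([9, 5], [4, 1]))             # [5, 4]          (Example 2.4)
print(subtract([16, 8, -2], [-1, 12, 3]))   # [17, -4, -5]    (Example 2.5)

# Spot check of the commutative and associative properties.
u, v, w = [2, -1, 4], [0, 3, -5], [7, 7, 1]
print(add(u, v) == add(v, u))                  # True
print(add(u, add(v, w)) == add(add(u, v), w))  # True
```

Raising an error for vectors of different dimensions mirrors the rule that we can only add or subtract vectors with the same number of elements.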
2.2 Scalar Multiplication
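Recall that a scalar is a number. When we multiply a vector by a scalar, we multiply each element of the vector by the scalar. Let us look at a few examples.

Example 2.6: $3 \cdot \begin{bmatrix} -2 \\ 5 \end{bmatrix} = ?$

$$3 \cdot \begin{bmatrix} -2 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \cdot (-2) \\ 3 \cdot 5 \end{bmatrix} = \begin{bmatrix} -6 \\ 15 \end{bmatrix}.$$

Example 2.7: $-4 \cdot \begin{bmatrix} 1 \\ 3 \\ -2 \\ 5 \\ 0 \end{bmatrix} = ?$

$$-4 \cdot \begin{bmatrix} 1 \\ 3 \\ -2 \\ 5 \\ 0 \end{bmatrix} = \begin{bmatrix} (-4) \cdot 1 \\ (-4) \cdot 3 \\ (-4) \cdot (-2) \\ (-4) \cdot 5 \\ (-4) \cdot 0 \end{bmatrix} = \begin{bmatrix} -4 \\ -12 \\ 8 \\ -20 \\ 0 \end{bmatrix}.$$

Similarly to multiplication of numbers, scalar multiplication of vectors satisfies the distributive property. That is, for a scalar $a$ and vectors $\vec{u}$ and $\vec{v}$, we have $a(\vec{u} + \vec{v}) = a \cdot \vec{u} + a \cdot \vec{v}$. We can prove this by using the distributive property of multiplication of numbers. Let

$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$

Then, we have

$$\begin{aligned}
a(\vec{u} + \vec{v}) &= a \left( \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) = a \cdot \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} = \begin{bmatrix} a(u_1 + v_1) \\ a(u_2 + v_2) \\ \vdots \\ a(u_n + v_n) \end{bmatrix} \\
&= \begin{bmatrix} a \cdot u_1 + a \cdot v_1 \\ a \cdot u_2 + a \cdot v_2 \\ \vdots \\ a \cdot u_n + a \cdot v_n \end{bmatrix} = a \cdot \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + a \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a \cdot \vec{u} + a \cdot \vec{v}.
\end{aligned}$$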
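Scalar multiplication is just as short in code. Below is a minimal Python sketch, again assuming vectors are Python lists; `scale` and `add` are our own helper names. It reproduces Examples 2.6 and 2.7 and spot-checks the distributive property $a(\vec{u} + \vec{v}) = a \cdot \vec{u} + a \cdot \vec{v}$ for one choice of $a$, $\vec{u}$, and $\vec{v}$.

```python
def scale(a, v):
    """Multiply every element of the vector v by the scalar a."""
    return [a * x for x in v]


def add(u, v):
    """Element-by-element sum of two vectors of the same dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return [x + y for x, y in zip(u, v)]


print(scale(3, [-2, 5]))            # [-6, 15]               (Example 2.6)
print(scale(-4, [1, 3, -2, 5, 0]))  # [-4, -12, 8, -20, 0]   (Example 2.7)

# Spot check of the distributive property on one sample.
a, u, v = 7, [1, -2, 3], [4, 0, -6]
print(scale(a, add(u, v)) == add(scale(a, u), scale(a, v)))  # True
```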
2.3 Graphical Representation
Next, let us take a look at the graphical representations of these arithmetic operations on vectors. First, let us take a look at the graphical representation of vector addition $\vec{u} + \vec{v}$. If we put the starting point of $\vec{v}$ at the endpoint of $\vec{u}$, then $\vec{u} + \vec{v}$ is the vector going from the starting point of $\vec{u}$ to the endpoint of $\vec{v}$.
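[Figure 2.1: Graphical representation of vector addition]

Let us look at the two-dimensional case more closely. For vectors

$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$

we have

$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}.$$

Assuming $u_1$, $u_2$, $v_1$, and $v_2$ are all positive, graphing the vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$ would give us the following picture.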
[Figure 2.2: Addition of two-dimensional vectors; the horizontal components $u_1$ and $v_1$ add up to $u_1 + v_1$, and the vertical components $u_2$ and $v_2$ add up to $u_2 + v_2$]
We can see that the components of $\vec{u} + \vec{v}$ are indeed $u_1 + v_1$ and $u_2 + v_2$. From the graphical representation of vector addition, we can derive the graphical representation of vector subtraction $\vec{u} - \vec{v}$. Since $\vec{v} + (\vec{u} - \vec{v}) = \vec{u}$, when $\vec{u}$ and $\vec{v}$ start at the same point, $\vec{u} - \vec{v}$ is the vector going from the endpoint of $\vec{v}$ to the endpoint of $\vec{u}$, as follows.
[Figure 2.3: Graphical representation of vector subtraction]

Now let us look at the graphical representation of scalar multiplication of vectors. When we multiply a vector $\vec{v}$ by a scalar $a > 0$, we stretch or shrink the vector by a factor of $a$. If $a > 1$, we are stretching the vector. If $0 < a < 1$, we are shrinking the vector.
[Figure 2.4: Graphical representation of $a \cdot \vec{v}$ when $a > 1$]
[Figure 2.5: Graphical representation of $a \cdot \vec{v}$ when $0 < a < 1$]

For example, $2 \cdot \vec{v}$ would stretch $\vec{v}$ by a factor of 2, and $\frac{1}{2} \cdot \vec{v}$ would shrink $\vec{v}$ by a factor of $\frac{1}{2}$. Let us look at the two-dimensional case more closely. Let

$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix};$$

then we have

$$a \cdot \vec{v} = \begin{bmatrix} a \cdot v_1 \\ a \cdot v_2 \end{bmatrix}.$$

Assuming $v_1$ and $v_2$ are positive and $a > 1$, graphing the vectors $\vec{v}$ and $a \cdot \vec{v}$ would give us the following picture.
[Figure 2.6: Scalar multiplication of a two-dimensional vector]
Figure 2.6: Scalar multiplication of a two-dimensional vector We can see that 4ABC is similar to 4ADE. So, indeed to stretch a vector, we need to stretch each component of the vector, v1 and v2 in this case. If a < 0, the scalar multiplication a · ~v would flip the vector ~v to the opposite direction and then stretch or shrink it depending on whether |a| > 1 or |a| < 1.
[Figure 2.7: Graphical representation of $a \cdot \vec{v}$ when $a < 0$]
Exercises −4 −1 1. 0 + 5 =? 6 −9 −5 −3 5 4 2. 9 − 10 =? −2 −7
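3. $2 \cdot \begin{bmatrix} 8 \\ 15 \\ -6 \end{bmatrix} = ?$
4. $-4 \cdot \begin{bmatrix} 1 \\ -3 \\ 4 \\ 0 \\ 2 \\ -2 \end{bmatrix} = ?$
5. $3 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 5 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = ?$
6. $6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix} = ?$
7. For any scalars $a$ and $b$ and vector $\vec{v}$, show that $(a + b) \cdot \vec{v} = a\vec{v} + b\vec{v}$.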
Chapter 3
Linear Combination and Linear Independence
3.1 Linear Combination and Span of Vectors
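Let us talk about linear combinations of vectors, applying what we learned in the last chapter. What is a linear combination of vectors? A linear combination of vectors is a sum of scalar multiples of vectors. For example, let us look back at questions 5 and 6 in the exercises of the last chapter:

$$3 \cdot \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 5 \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

is a linear combination of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and

$$6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}$$

is a linear combination of $\begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}$ because

$$6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} - 2 \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix} = 6 \cdot \begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix} + (-2) \cdot \begin{bmatrix} 15 \\ 10 \\ -3 \end{bmatrix}.$$

In general, a linear combination of $n$ vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ is of the form

$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n$$

for some scalars $c_1, c_2, \cdots, c_n$. If there is only one vector $\vec{v}$, then a linear combination of $\vec{v}$ is just a scalar multiple of $\vec{v}$: $c\vec{v}$. The span of vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, denoted as

$$\operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\},$$

is the set of all possible linear combinations of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$. If a vector $\vec{u}$ can be written as a linear combination of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, then we write

$$\vec{u} \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$

The notation "$\in$" means "belongs to." For example, we can write "$\sqrt{2} \in \mathbb{R}$" to mean "$\sqrt{2}$ belongs to the set of real numbers" or "$\sqrt{2}$ is a real number." Let us look at a few more examples.

Example 3.1: Write $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$.

$$\begin{bmatrix} 5 \\ -6 \end{bmatrix} = 2\begin{bmatrix} 4 \\ 1 \end{bmatrix} - \begin{bmatrix} 3 \\ 8 \end{bmatrix}.$$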
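Example 3.2: Write $\begin{bmatrix} 10 \\ 7 \\ -2 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$.

$$\begin{bmatrix} 10 \\ 7 \\ -2 \end{bmatrix} = 3\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} + 8\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.$$

Example 3.3: Verify that $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span}\left\{\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}\right\}$.

$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} = 2\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}.$$

That means $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$ can be written as a linear combination of $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$, so

$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span}\left\{\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}\right\}.$$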
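A linear combination $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n$ can be computed by scaling each vector and adding the results. The minimal Python sketch below (the name `linear_combination` is our own, and vectors are assumed to be Python lists) confirms the coefficients found in Examples 3.1 to 3.3. Note that it only checks given coefficients; finding the coefficients in the first place means solving a system of linear equations, which is the subject of a later chapter.

```python
def linear_combination(coeffs, vectors):
    """Return c1*v1 + c2*v2 + ... + cn*vn for coefficients [c1, ..., cn]
    and vectors [v1, ..., vn], all of the same dimension."""
    if len(coeffs) != len(vectors) or not vectors:
        raise ValueError("need one coefficient per vector, and at least one vector")
    dim = len(vectors[0])
    result = [0] * dim
    for c, v in zip(coeffs, vectors):
        if len(v) != dim:
            raise ValueError("all vectors must have the same number of elements")
        for i in range(dim):
            result[i] += c * v[i]
    return result


print(linear_combination([2, -1], [[4, 1], [3, 8]]))                       # [5, -6]      (Example 3.1)
print(linear_combination([3, -2, 8], [[6, 3, 8], [4, 5, 13], [0, 1, 0]]))  # [10, 7, -2]  (Example 3.2)
print(linear_combination([2, 3], [[3, -4, 0], [2, 1, 3]]))                 # [12, -5, 9]  (Example 3.3)
```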
3.2 Linear Independence and Dependence
Next, let us discuss the concept of linear independence, which is closely related to the concept of linear combination. We say that vectors are linearly independent if none of them can be written as a linear combination of the other vectors. Otherwise, the vectors are said to be linearly dependent. For example, three vectors $\vec{v}_1$, $\vec{v}_2$, and $\vec{v}_3$ are linearly independent if

$$\vec{v}_1 \notin \operatorname{span}\{\vec{v}_2, \vec{v}_3\}, \quad \vec{v}_2 \notin \operatorname{span}\{\vec{v}_1, \vec{v}_3\}, \quad \text{and} \quad \vec{v}_3 \notin \operatorname{span}\{\vec{v}_1, \vec{v}_2\}.$$

Three vectors $\vec{v}_1$, $\vec{v}_2$, and $\vec{v}_3$ are linearly dependent if

$$\vec{v}_1 \in \operatorname{span}\{\vec{v}_2, \vec{v}_3\}, \quad \vec{v}_2 \in \operatorname{span}\{\vec{v}_1, \vec{v}_3\}, \quad \text{or} \quad \vec{v}_3 \in \operatorname{span}\{\vec{v}_1, \vec{v}_2\}.$$

Let us look at a few examples.
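Example 3.4: Are $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ linearly independent or dependent?

We can see that $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is not a scalar multiple of $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and since neither vector is the 0 vector, this means that $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ is not a scalar multiple of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, either. So,

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \notin \operatorname{span}\left\{\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \end{bmatrix} \notin \operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right\}.$$

Therefore, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are linearly independent.

Example 3.5: Are $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$ linearly independent or dependent?

From example 3.1, we know that

$$\begin{bmatrix} 5 \\ -6 \end{bmatrix} \in \operatorname{span}\left\{\begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix}\right\}.$$

Therefore, $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$ are linearly dependent.

Example 3.6: Are $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ linearly independent or dependent?

By trying to write each vector as a linear combination of the other two, we can see that none of them can be written that way. For instance, $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix} = a\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ would require both $4a = 6$ (first elements) and $13a = 8$ (third elements), which is impossible. So,

$$\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix} \notin \operatorname{span}\left\{\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\}, \quad \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix} \notin \operatorname{span}\left\{\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\},$$

and

$$\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \notin \operatorname{span}\left\{\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}\right\}.$$

Therefore, $\begin{bmatrix} 6 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 4 \\ 5 \\ 13 \end{bmatrix}$, and $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ are linearly independent.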
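Checking every vector against the span of the others gets tedious as the number of vectors grows. One systematic alternative, swapped in here and not the method this chapter uses, is Gaussian elimination (row reduction, which the book covers later in Part II): the number of non-zero rows left after elimination is the rank, the largest number of linearly independent vectors in the list, so the vectors are linearly independent exactly when the rank equals the number of vectors. The minimal Python sketch below uses exact fractions to avoid rounding errors; the names `rank` and `linearly_independent` are our own.

```python
from fractions import Fraction


def rank(vectors):
    """Largest number of linearly independent vectors among `vectors`,
    found by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    if not rows:
        return 0
    n_cols = len(rows[0])
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        # Look for a row at position r or below with a non-zero entry here.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Subtract multiples of the pivot row to clear this column below it.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def linearly_independent(vectors):
    """True exactly when the rank equals the number of vectors."""
    return rank(vectors) == len(vectors)


print(linearly_independent([[1, 0], [0, 1]]))                      # True   (Example 3.4)
print(linearly_independent([[5, -6], [4, 1], [3, 8]]))             # False  (Example 3.5)
print(linearly_independent([[6, 3, 8], [4, 5, 13], [0, 1, 0]]))    # True   (Example 3.6)
print(linearly_independent([[12, -5, 9], [3, -4, 0], [2, 1, 3]]))  # False  (vectors of Example 3.3)
```

Row operations never change which vectors can be built from the rows, which is why counting the non-zero rows at the end gives the answer.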
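Example 3.7: Are $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ linearly independent or dependent?

From example 3.3, we know that

$$\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix} \in \operatorname{span}\left\{\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}\right\}.$$

Therefore, $\begin{bmatrix} 12 \\ -5 \\ 9 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}$, and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ are linearly dependent.

Before going to the next section, let us talk about the formal definition of linear independence: if the only solution to

$$c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n = 0$$

is $c_1 = c_2 = \cdots = c_n = 0$, then $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ are linearly independent. How is this definition related to the definition given at the beginning of this section? If the only solution has all of $c_1, c_2, \cdots, c_n$ equal to 0, then none of the vectors can be written as a linear combination of the other vectors: if, say, $\vec{v}_1 = d_2\vec{v}_2 + \cdots + d_n\vec{v}_n$, then $c_1 = 1$, $c_2 = -d_2$, $\cdots$, $c_n = -d_n$ would be another solution. If there is a solution in which at least one of $c_1, c_2, \cdots, c_n$ is non-zero, then we can rearrange the equation to write one of the vectors as a linear combination of the other vectors. For example, assuming that $c_1 \neq 0$, we can rearrange the equation as

$$\vec{v}_1 = -\frac{c_2}{c_1}\vec{v}_2 - \frac{c_3}{c_1}\vec{v}_3 - \cdots - \frac{c_n}{c_1}\vec{v}_n.$$

3.3 Vector Spaces and Subspaces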
We have been talking about two-dimensional and three-dimensional spaces, but we have not discussed the concept of "space" in detail. What is a vector space? For the introductory purpose of this book, we only need to know that a vector space is a set of vectors closed under addition and scalar multiplication. What does "closed" mean here? It means that if $\vec{u}$ and $\vec{v}$ are in a vector space, then $\vec{u} + \vec{v}$ and $c\vec{u}$ are also in the same vector space for any scalar $c$. In other words, addition and scalar multiplication do not take the vectors out of the vector space. There are a few other axioms (or conditions) for a set of vectors to be a vector space, but they are not really important for our introductory purpose now. For example, one of the other axioms is that vector addition must satisfy the commutative property, $\vec{u} + \vec{v} = \vec{v} + \vec{u}$, but we already know from chapter 2 that vector addition satisfies the commutative property. The other axioms are more important when we consider vector spaces in a broader sense. However, in this book we are only considering vector spaces of real geometric vectors with finitely many elements, which are real numbers.

An important point to note is that a vector space must contain the 0 vector. A 0 vector is a vector whose elements are all 0. For example, the three-dimensional 0 vector is

$$\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

When we do scalar multiplication, the scalar $c$ can be 0. So, if $\vec{v}$ is in a vector space, then $c\vec{v} = 0\vec{v} = 0$ must also be in the same vector space.

Now why is the two-dimensional space a vector space? If it were not a vector space, we would not call it "two-dimensional space." Remember from chapter 1 that every point $(x, y)$ can be represented by a vector

$$\begin{bmatrix} x \\ y \end{bmatrix}.$$

When we add two-dimensional vectors in the two-dimensional space together, the result is another two-dimensional vector, also in the two-dimensional space. When we multiply a two-dimensional vector by a scalar, the result is also another two-dimensional vector. So, the two-dimensional space is indeed a vector space closed under addition and scalar multiplication. Similarly, the three-dimensional space is a vector space consisting of three-dimensional vectors

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$

In general, any $n$-dimensional space is a vector space of $n$-dimensional vectors. There is also a notation for the $n$-dimensional space. In algebra, we learned that the set of real numbers is denoted by $\mathbb{R}$. In an $n$-dimensional space, each point is a list of $n$ real numbers, so the $n$-dimensional space is denoted by $\mathbb{R}^n$. For example, we can write

$$\begin{bmatrix} 1 \\ 5 \end{bmatrix} \in \mathbb{R}^2$$

to mean that $\begin{bmatrix} 1 \\ 5 \end{bmatrix}$ is a vector in the two-dimensional space.

Each $n$-dimensional space $\mathbb{R}^n$ has subspaces. A subspace is a vector space contained in another vector space, such as $\mathbb{R}^n$. A subspace of $\mathbb{R}^n$ has $n$ dimensions or fewer. For example, $\mathbb{R}^2$ has a two-dimensional subspace ($\mathbb{R}^2$ itself) and one-dimensional subspaces (lines in $\mathbb{R}^2$), and $\mathbb{R}^3$ has a three-dimensional subspace ($\mathbb{R}^3$ itself), two-dimensional subspaces (planes in $\mathbb{R}^3$), and one-dimensional subspaces (lines in $\mathbb{R}^3$). It should be easy to see that vectors lying on the same line through the origin are closed under addition and scalar multiplication if you think of the graphical representations of vector addition and scalar multiplication. Similarly, for planes in $\mathbb{R}^3$ through the origin, if we add two vectors lying on the same plane together or multiply a vector lying on a plane by a scalar, the result will be another vector lying on that same plane.
[Figure 3.1: A two-dimensional subspace (or a plane) in $\mathbb{R}^3$, passing through the 0 vector at the origin]

Remember that a subspace is also a vector space. So, for the lines and planes to be subspaces, they have to contain the origin, i.e., the 0 vector. The 0 vector by itself is also a subspace, a vector space with zero dimensions, but it is not a very interesting vector space.
3.4 Vector Space as Span of Vectors
The concept of vector space is very closely related to the concepts of linear combination and linear independence. A vector space $V$ can be written as the span of a certain set of linearly independent vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$, i.e.,

$$V = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$

The vectors $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ are called basis vectors of $V$. We will discuss why basis vectors have to be linearly independent later in this section.

How does this align with the definition of a vector space? Why is the span of vectors a set of vectors closed under addition and scalar multiplication? We know that the span of $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n$ is the set of vectors of the form $c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n$. For two vectors $\vec{u}, \vec{v} \in V$, let

$$\vec{u} = c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_n\vec{v}_n \quad \text{and} \quad \vec{v} = c'_1\vec{v}_1 + c'_2\vec{v}_2 + \cdots + c'_n\vec{v}_n.$$

Then, we have

$$\vec{u} + \vec{v} = (c_1 + c'_1)\vec{v}_1 + (c_2 + c'_2)\vec{v}_2 + \cdots + (c_n + c'_n)\vec{v}_n \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}$$

and

$$c\vec{u} = cc_1\vec{v}_1 + cc_2\vec{v}_2 + \cdots + cc_n\vec{v}_n \in \operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}.$$

So, $V = \operatorname{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}$ is indeed a vector space closed under addition and scalar multiplication.

Basis Vectors for n-Dimensional Space

Next, let us discuss how a vector space can be written as the span of a certain set of vectors. Let us consider the two-dimensional space $\mathbb{R}^2$ first. Note that any two-dimensional vector

$$\begin{bmatrix} x \\ y \end{bmatrix}$$

in $\mathbb{R}^2$ can be written as a linear combination of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$:

$$\begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

So,

$$\mathbb{R}^2 = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}.$$
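However,

$$\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}$$

is not the only set of basis vectors for $\mathbb{R}^2$. For example, from example 3.1, we know that $\begin{bmatrix} 5 \\ -6 \end{bmatrix}$ can be written as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$. In fact, any two-dimensional vector can be written as a linear combination of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$:

$$\begin{bmatrix} x \\ y \end{bmatrix} = \frac{8x - 3y}{29}\begin{bmatrix} 4 \\ 1 \end{bmatrix} + \frac{4y - x}{29}\begin{bmatrix} 3 \\ 8 \end{bmatrix}.$$

So, we can also write

$$\mathbb{R}^2 = \operatorname{span}\left\{\begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix}\right\},$$

and

$$\left\{\begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix}\right\}$$

is another possible set of basis vectors for $\mathbb{R}^2$. What is the common characteristic of the two sets

$$\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\} \quad \text{and} \quad \left\{\begin{bmatrix} 4 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 8 \end{bmatrix}\right\}?$$

Both of these sets have two linearly independent vectors. In general, any set of two linearly independent two-dimensional vectors can be a set of basis vectors for the two-dimensional space $\mathbb{R}^2$. Similarly, for higher dimensions, any set of $n$ linearly independent $n$-dimensional vectors can be a set of basis vectors for the $n$-dimensional space $\mathbb{R}^n$. For example, a possible set of basis vectors for $\mathbb{R}^3$ is the set of three linearly independent three-dimensional vectors

$$\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\},$$

and we can write

$$\mathbb{R}^3 = \operatorname{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\}.$$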
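The formula for writing $\begin{bmatrix} x \\ y \end{bmatrix}$ in terms of $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 8 \end{bmatrix}$ can be reproduced for any basis of $\mathbb{R}^2$. The minimal Python sketch below uses Cramer's rule for a system of two equations in two unknowns, a technique swapped in here (the book treats determinants in chapter 12); the function name `coords_in_basis_2d` is our own, and integer entries are assumed so that the results can be exact fractions. The number 29 in the formula above is exactly the quantity `det` computed here for that basis, and a `det` of 0 signals two linearly dependent vectors, which cannot form a basis.

```python
from fractions import Fraction


def coords_in_basis_2d(b1, b2, target):
    """Return scalars (c1, c2) with c1*b1 + c2*b2 == target, where {b1, b2}
    is a basis of R^2 with integer entries. Uses Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent, so they are not a basis")
    x, y = target
    c1 = Fraction(x * b2[1] - b2[0] * y, det)
    c2 = Fraction(b1[0] * y - x * b1[1], det)
    return c1, c2


# With b1 = [4, 1] and b2 = [3, 8]: det = 29, c1 = (8x - 3y)/29, c2 = (4y - x)/29.
c1, c2 = coords_in_basis_2d([4, 1], [3, 8], [5, -6])
print(c1, c2)  # 2 -1, matching example 3.1
```

With the basis $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ the same function simply returns $x$ and $y$, as expected.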
Note that any vector in $\mathbb{R}^3$ can be written as a linear combination of these three vectors:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + z\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

It is important to note that there can be at most $n$ linearly independent $n$-dimensional vectors. Suppose we have $n + 1$ or more $n$-dimensional vectors. If $n$ of them are linearly independent, they form a set of basis vectors for $\mathbb{R}^n$, so every other vector in the list can be written as a linear combination of those $n$ vectors; if no $n$ of them are linearly independent, then some of them are already linearly dependent. Either way, the whole list is linearly dependent. For example, there are at most three linearly independent vectors in $\mathbb{R}^3$; there cannot be 4 or 5 or more linearly independent vectors.

Basis Vectors for Subspaces

What about the basis vectors for subspaces? Let us consider the one-dimensional subspaces in $\mathbb{R}^2$, which are lines through the origin in $\mathbb{R}^2$. Let us think of the graphical representation of scalar multiplication of vectors. When we multiply a vector by a scalar, we stretch or shrink the vector and possibly flip it to the opposite direction. So, all the scalar multiples of a vector lie on the same line as the vector. That means a line through the origin in $\mathbb{R}^2$ is the span of a non-zero two-dimensional vector lying on that line.
[Figure 3.2: A one-dimensional subspace in $\mathbb{R}^2$: the line through the origin spanned by $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$]
For example, span
2 1
is the line y = 21 x in R2 , which is a one-dimensional subspace in R2 . Similarly, a one-dimensional subspace, or a line, in R3 is the span of a threedimensional vector lying on that line. The basis is three-dimensional instead of two-dimensional because the subspace is in R3 . Let us also consider the two-dimensional subspaces, or planes, in R3 . What are the basis vectors of a two-dimensional subspace in R3 ? Let us think back to the basis vectors of R2 for a moment because R2 is basically a plane in R3 at z = 0. A set of basis vectors for R2 is any set of two linearly independent two-dimensional vectors. Similarly, a plane in R3 , not necessarily the same as R2 , is the span of any two linearly independent three-dimensional vectors lying on that plane. Note that the basis vectors are three-dimensional because the subspace is in R3 . For example, 2 3 span −4 , 1 0 3
3 2 is the plane in R3 with the vectors −4 and 1 lying on it, which is a two0 3 dimensional subspace in R3 . In general, a set of basis vectors for a m-dimensional subspace in Rn is any set of m linearly independent n-dimensional vectors in that subspace. Span of Linearly Dependent Vectors An important point about basis vectors is that basis vectors have to be linearly independent. For example, as mentioned before, a plane, or a two-dimensional 29
subspace, in R3 is the span of any two linearly independent three-dimensional vectors lying on that plane. So, why do the basis vectors have to be linearly independent? Let us consider the span of two linearly dependent vectors ~v1 and ~v2 . Since ~v1 and ~v2 are linearly dependent, let ~v2 = c~v1 for some scalar c. If ~u ∈ span {~v1 , ~v2 }, then ~u is a vector of the form ~u = c1~v1 + c2~v2 = c1~v1 + c2 c~v1 = (c1 + c2 c)~v1 ∈ span {~v1 } . Thus, if ~v1 and ~v2 are linearly dependent, then span {~v1 , ~v2 } is the same as span {~v1 }, which is a one-dimensional vector space instead of a two-dimensional vector space. That is why basis vectors have to be linearly independent. Main Points of This Section If this section was too complicated and too wordy for you, the main points of this section are summarized below. Although understanding more detailed explanations is important, you should at least memorize these facts to familiarize yourself with the topic. • Vector spaces (and subspaces) can be written as span of a set of vectors called basis vectors. • Basis vectors have to be linearly independent. • The span of any m n-dimensional basis vectors is a m-dimensional subspace in Rn , where m ≤ n. Remember that n-dimensional subspace in Rn is Rn itself. • There are at most n linearly independent vectors in Rn .
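Exercises
1. Write $\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} -5 \\ 4 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 0 \\ 6 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 4 \\ 0 \\ 2 \end{bmatrix}$.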
2 5 1 1 1 ∈ span , 0 . 2. Verify that −2 −1 0 3 4 −2
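3. Are $\begin{bmatrix} -2 \\ 1 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 5 \\ 3 \\ 4 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 0 \\ 9 \end{bmatrix}$, and $\begin{bmatrix} 6 \\ -3 \\ 7 \end{bmatrix}$ linearly independent or dependent?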
4. Which of the following is NOT a vector space?
A. The three-dimensional space $\mathbb{R}^3$
B. A line in $\mathbb{R}^2$ with $y$-intercept $(0, 3)$
C. A plane in $\mathbb{R}^3$ passing through $(0, 0, 0)$
D. The line $y = 5x$ in $\mathbb{R}^2$
5. What is the maximum number of linearly independent vectors in $\mathbb{R}^5$?
6. Is $\left\{ \begin{bmatrix} -2 \\ 6 \end{bmatrix}, \begin{bmatrix} -3 \\ 9 \end{bmatrix} \right\}$ a set of basis vectors for $\mathbb{R}^2$? Why or why not?
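7. What is the dimension of the vector space
$$\operatorname{span}\left\{ \begin{bmatrix} 4 \\ 3 \\ 8 \end{bmatrix}, \begin{bmatrix} 0 \\ 5 \\ 13 \end{bmatrix}, \begin{bmatrix} 6 \\ 1 \\ 0 \end{bmatrix} \right\}?$$
This vector space is a subspace in $\mathbb{R}^n$; what is $n$?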
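8. What is the dimension of the vector space
$$\operatorname{span}\left\{ \begin{bmatrix} 0 \\ 0 \\ 3 \\ 6 \end{bmatrix}, \begin{bmatrix} -1 \\ 2 \\ 5 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 8 \\ 5 \end{bmatrix} \right\}?$$
This vector space is a subspace in $\mathbb{R}^n$; what is $n$?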
Chapter 4
Norms of Vectors
4.1
Definition and Properties of Norms
There are many vector norms. By definition, for $p \ge 1$, the $\ell_p$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is defined as
$$\|\vec{v}\|_p = \sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p}.$$
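For example, we have
$$\|\vec{v}\|_1 = |v_1| + |v_2| + \cdots + |v_n|,$$
$$\|\vec{v}\|_2 = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2},$$
$$\|\vec{v}\|_3 = \sqrt[3]{|v_1|^3 + |v_2|^3 + \cdots + |v_n|^3},$$
etc. Let us look at a few examples.

Example 4.1: $\vec{u} = \begin{bmatrix} 1 \\ -5 \\ 2 \end{bmatrix}$, $\|\vec{u}\|_1 = ?$
$$\|\vec{u}\|_1 = |1| + |-5| + |2| = 8.$$

Example 4.2: $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, $\|\vec{v}\|_2 = ?$
$$\|\vec{v}\|_2 = \sqrt{|3|^2 + |4|^2} = 5.$$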
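To make the definition concrete, here is a minimal sketch in plain Python (no libraries assumed; the function name `lp_norm` is our own, not from the text) that evaluates the $\ell_p$ formula directly and reproduces Examples 4.1 and 4.2.

```python
def lp_norm(v, p):
    """Return the l_p norm of v (a list of numbers), assuming p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    # Add up |v_i|^p over all entries, then take the p-th root.
    return sum(abs(x) ** p for x in v) ** (1 / p)

# Example 4.1: l_1 norm of (1, -5, 2)
print(lp_norm([1, -5, 2], 1))  # 8.0
# Example 4.2: l_2 norm of (3, 4)
print(lp_norm([3, 4], 2))      # 5.0
```

Changing `p` to 3, 4, and so on gives the other $\ell_p$ norms without any new code.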
The vector norms satisfy the following properties:
1) $\|\vec{v}\| > 0$ for all $\vec{v} \neq \vec{0}$, and $\|\vec{v}\| = 0$ if and only if $\vec{v} = \vec{0}$;
2) $\|k\vec{v}\| = |k| \cdot \|\vec{v}\|$ for any scalar $k$;
3) $\|\vec{u} + \vec{v}\| \le \|\vec{u}\| + \|\vec{v}\|$.
Let us look at the first property. Recall that the zero vector $\vec{0}$ is a vector whose elements are all 0. If at least one element, say $v_i$, is non-zero, then the norm $\|\vec{v}\|_p$ is larger than 0 because $|v_i|^p > 0$ for any non-zero $v_i$, and all the other terms $|v_j|^p$ are non-negative.
Figure 4.1: Graph of the absolute value function $|x|$
If all elements are 0, i.e. $v_1 = v_2 = \cdots = v_n = 0$, then
$$\|\vec{v}\|_p = \sqrt[p]{|0|^p + |0|^p + \cdots + |0|^p} = 0.$$
If $\|\vec{v}\|_p = 0$, then, since every term $|v_i|^p$ is non-negative, the only way for $\sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p} = 0$ is when $|v_1|^p = |v_2|^p = \cdots = |v_n|^p = 0$, which is when $v_1 = v_2 = \cdots = v_n = 0$. Remember that “if and only if” is a two-way implication. “$\|\vec{v}\| = 0$ if and only if $\vec{v} = \vec{0}$” means “$\|\vec{v}\| = 0$ if $\vec{v} = \vec{0}$” and “$\vec{v} = \vec{0}$ if $\|\vec{v}\| = 0$.”

Next, let us look at the second property. Since for any numbers $a$ and $b$ we have $|ab| = |a| \cdot |b|$,
$$\begin{aligned} \|k\vec{v}\|_p &= \sqrt[p]{|kv_1|^p + |kv_2|^p + \cdots + |kv_n|^p} \\ &= \sqrt[p]{|k|^p |v_1|^p + |k|^p |v_2|^p + \cdots + |k|^p |v_n|^p} \\ &= \sqrt[p]{|k|^p} \cdot \sqrt[p]{|v_1|^p + |v_2|^p + \cdots + |v_n|^p} \\ &= |k| \cdot \|\vec{v}\|_p. \end{aligned}$$

The third property is called the triangle inequality, and it is an extension of the triangle inequality for numbers
$$|x + y| \le |x| + |y| \tag{4.1}$$
to vectors. Let us prove the triangle inequality for numbers first. Since $|x|^2 = |x| \cdot |x| = |x^2| = x^2$, we have
$$|x + y|^2 = (x + y)^2 = x^2 + y^2 + 2xy.$$
Squaring the right-hand side of (4.1) gives
$$(|x| + |y|)^2 = |x|^2 + |y|^2 + 2|x||y| = x^2 + y^2 + 2|x||y|.$$
When $x$ and $y$ have the same sign (both positive or both negative) or one of them is 0, $2xy = 2|x||y|$. When $x$ and $y$ have different signs, $2xy < 2|x||y|$. Thus, we have
$$2xy \le 2|x||y| \iff x^2 + y^2 + 2xy \le x^2 + y^2 + 2|x||y| \iff |x + y|^2 \le (|x| + |y|)^2.$$
Figure 4.2: Graph of the function $x^2$

By observing the graph of the function $x^2$, we can see that for two numbers $a, b \ge 0$, if $a^2 \ge b^2$, then $a \ge b$. Since $|x + y| \ge 0$, $|x| + |y| \ge 0$, and $|x + y|^2 \le (|x| + |y|)^2$, we can conclude that $|x + y| \le |x| + |y|$.

The triangle inequality for the $\ell_1$ norm follows directly from the triangle inequality for numbers. Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix},$$
then we have
$$\begin{aligned} \|\vec{u} + \vec{v}\|_1 &= |u_1 + v_1| + |u_2 + v_2| + \cdots + |u_n + v_n| \\ &\le |u_1| + |v_1| + |u_2| + |v_2| + \cdots + |u_n| + |v_n| \\ &= \|\vec{u}\|_1 + \|\vec{v}\|_1. \end{aligned}$$
The triangle inequality for the $\ell_2$ norm can be proven by using the triangle inequality for numbers and an inequality called the Cauchy-Schwarz inequality. Using the triangle inequality for numbers, we get
$$\begin{aligned} \|\vec{u} + \vec{v}\|_2 &= \sqrt{|u_1 + v_1|^2 + |u_2 + v_2|^2 + \cdots + |u_n + v_n|^2} \\ &\le \sqrt{(|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2}. \end{aligned} \tag{4.2}$$
The inequality in (4.2) follows from the fact that $\sqrt{a} \ge \sqrt{b}$ if $a \ge b$ and $a, b \ge 0$, as observed in the graph of $\sqrt{x}$.

Figure 4.3: Graph of the function $\sqrt{x}$
By definition of the $\ell_2$ norm,
$$\|\vec{u}\|_2 + \|\vec{v}\|_2 = \sqrt{|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2} + \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2}.$$
Squaring both sides,
$$\begin{aligned} (\|\vec{u}\|_2 + \|\vec{v}\|_2)^2 &= \left(\sqrt{|u_1|^2 + \cdots + |u_n|^2} + \sqrt{|v_1|^2 + \cdots + |v_n|^2}\right)^2 \\ &= \left(\sqrt{|u_1|^2 + \cdots + |u_n|^2}\right)^2 + \left(\sqrt{|v_1|^2 + \cdots + |v_n|^2}\right)^2 \\ &\quad + 2\sqrt{|u_1|^2 + \cdots + |u_n|^2} \cdot \sqrt{|v_1|^2 + \cdots + |v_n|^2} \\ &= |u_1|^2 + \cdots + |u_n|^2 + |v_1|^2 + \cdots + |v_n|^2 \\ &\quad + 2\sqrt{(|u_1|^2 + \cdots + |u_n|^2)(|v_1|^2 + \cdots + |v_n|^2)}. \end{aligned} \tag{4.3}$$
This is where we use the Cauchy-Schwarz inequality. The Cauchy-Schwarz inequality states that
$$(a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) \ge (a_1 b_1 + a_2 b_2 + \cdots + a_n b_n)^2.$$
The proof of the Cauchy-Schwarz inequality is in an appendix for those who are interested. Using the Cauchy-Schwarz inequality with $a_i = |u_i|$ and $b_i = |v_i|$, we have
$$(|u_1|^2 + \cdots + |u_n|^2)(|v_1|^2 + \cdots + |v_n|^2) \ge (|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|)^2.$$
Taking the square root of both sides,
$$\sqrt{(|u_1|^2 + \cdots + |u_n|^2)(|v_1|^2 + \cdots + |v_n|^2)} \ge |u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|. \tag{4.4}$$
Note that here we have
$$\sqrt{(|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|)^2} = |u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n|$$
because $|u_1||v_1| + |u_2||v_2| + \cdots + |u_n||v_n| \ge 0$. Remember that the square root always takes non-negative values. In general, $\sqrt{a^2} = |a|$, so if a number $a$ is negative, then $\sqrt{a^2} = |a|$ rather than $a$. For example, $\sqrt{(-2)^2} = \sqrt{4} = 2 = |-2|$.

Substituting (4.4) into (4.3),
$$\begin{aligned} (\|\vec{u}\|_2 + \|\vec{v}\|_2)^2 &\ge |u_1|^2 + \cdots + |u_n|^2 + |v_1|^2 + \cdots + |v_n|^2 + 2|u_1||v_1| + 2|u_2||v_2| + \cdots + 2|u_n||v_n| \\ &= (|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2. \end{aligned}$$
Taking the square root of both sides,
$$\|\vec{u}\|_2 + \|\vec{v}\|_2 \ge \sqrt{(|u_1| + |v_1|)^2 + (|u_2| + |v_2|)^2 + \cdots + (|u_n| + |v_n|)^2}. \tag{4.5}$$
Inequality is transitive: if $a \ge b$ and $b \ge c$, then $a \ge c$. Thus, from (4.2) and (4.5), we can conclude that
$$\|\vec{u} + \vec{v}\|_2 \le \|\vec{u}\|_2 + \|\vec{v}\|_2.$$
The proof of the triangle inequality for the $\ell_p$ norm with other values of $p$ is a little more complicated and requires a more general version of the Cauchy-Schwarz inequality called Hölder's inequality. The triangle inequality for $\ell_p$ norms is also known as the Minkowski inequality. We will not prove the triangle inequality for $\ell_p$ norms (the Minkowski inequality) here; as you will see in the next section, our main concern in this book is the $\ell_2$ norm. At the same time, we will also explore the $\ell_1$ norm a little.
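The inequalities above can also be spot-checked numerically. The following is a minimal sketch in plain Python (the helper names, the number of trials, and the random range are our own choices) that draws random vectors and checks the Cauchy-Schwarz inequality and the triangle inequality for $p = 1, 2, 3$. Passing such a check is evidence, not a proof, and the case $p = 3$ relies on the Minkowski inequality that we did not prove.

```python
import random

def lp_norm(v, p):
    # l_p norm of a list of numbers (p >= 1).
    return sum(abs(x) ** p for x in v) ** (1 / p)

def check_inequalities(trials=1000, n=5, seed=0):
    """Spot-check Cauchy-Schwarz and the triangle inequality on random vectors.

    Returns True if no counterexample is found. This is a sanity check,
    not a proof. A tiny relative tolerance absorbs floating-point rounding.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        u = [rng.uniform(-10, 10) for _ in range(n)]
        v = [rng.uniform(-10, 10) for _ in range(n)]
        # Cauchy-Schwarz with a_i = |u_i| and b_i = |v_i|
        lhs = sum(a * a for a in u) * sum(b * b for b in v)
        rhs = sum(abs(a) * abs(b) for a, b in zip(u, v)) ** 2
        if rhs > lhs * (1 + 1e-12):
            return False
        # Triangle inequality for the l_1, l_2, and l_3 norms
        s = [a + b for a, b in zip(u, v)]
        for p in (1, 2, 3):
            if lp_norm(s, p) > (lp_norm(u, p) + lp_norm(v, p)) * (1 + 1e-12):
                return False
    return True

print(check_inequalities())  # True
```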
4.2
Graphical Meaning of $\ell_2$ Norm
As we learned in geometry, the distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is
$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$
So, the distance between a point $(x, y)$ and the origin $(0, 0)$ is
$$\sqrt{(x - 0)^2 + (y - 0)^2} = \sqrt{x^2 + y^2}.$$
Hence, for a two-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
the $\ell_2$ norm
$$\|\vec{v}\|_2 = \sqrt{|v_1|^2 + |v_2|^2} = \sqrt{v_1^2 + v_2^2}$$
is the distance between the point $(v_1, v_2)$ and the origin $(0, 0)$. Recall from chapter 1 that the vector $\vec{v}$ is a vector going from $(0, 0)$ to $(v_1, v_2)$. So, the distance between $(0, 0)$ and $(v_1, v_2)$ is the length of the vector $\vec{v}$, and thus the $\ell_2$ norm of a vector is the length of that vector. This is also a direct consequence of the Pythagorean theorem.
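As a quick check of this interpretation, the sketch below computes the $\ell_2$ norm straight from its definition and compares it with the Euclidean distance from the origin. It assumes Python 3.8 or later, where `math.hypot` accepts several arguments and `math.dist` exists; `l2_norm` is our own helper name and the vectors are arbitrary examples.

```python
import math

def l2_norm(v):
    # Square root of the sum of the squared entries.
    return math.sqrt(sum(x * x for x in v))

v = [3, 4]
print(l2_norm(v))            # 5.0
print(math.hypot(*v))        # 5.0, the hypotenuse with legs 3 and 4
print(math.dist([0, 0], v))  # 5.0, the distance from the origin to (3, 4)

w = [1, 2, 2]  # a three-dimensional example
# Both values equal 3, up to floating-point rounding.
print(l2_norm(w), math.dist([0, 0, 0], w))
```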
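Figure 4.4: Graphical interpretation of the $\ell_2$ norm in $\mathbb{R}^2$: the vector from $(0, 0)$ to $(v_1, v_2)$ has length $\|\vec{v}\|_2 = \sqrt{v_1^2 + v_2^2}$

Similarly, in $\mathbb{R}^3$, the distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is
$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}.$$
So, the $\ell_2$ norm of a three-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
which is
$$\|\vec{v}\|_2 = \sqrt{v_1^2 + v_2^2 + v_3^2},$$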
is the distance between $(0, 0, 0)$ and $(v_1, v_2, v_3)$ and thus the length of $\vec{v}$, since $\vec{v}$ is a vector going from $(0, 0, 0)$ to $(v_1, v_2, v_3)$. In general, the $\ell_2$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is the length of $\vec{v}$ in $\mathbb{R}^n$.

Now let us take a look at the triangle inequality again. In geometry, we learned the triangle inequality, which states that the sum of the lengths of two sides of a triangle is greater than the length of the third side.
Figure 4.5: A triangle with vertices $A$, $B$, and $C$: $AB + BC > AC$

This is exactly the same as the triangle inequality for the $\ell_2$ norm. Recall from chapter 2 that the graphical representation of vector addition is as follows.
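Figure 4.6: Graphical representation of vector addition

From what we learned about the graphical meaning of the $\ell_2$ norm, we know that $\vec{u}$ has length $\|\vec{u}\|_2$, $\vec{v}$ has length $\|\vec{v}\|_2$, and $\vec{u} + \vec{v}$ has length $\|\vec{u} + \vec{v}\|_2$. The triangle inequality for the $\ell_2$ norm states that $\|\vec{u}\|_2 + \|\vec{v}\|_2 \ge \|\vec{u} + \vec{v}\|_2$. However, the equality $\|\vec{u}\|_2 + \|\vec{v}\|_2 = \|\vec{u} + \vec{v}\|_2$ only happens when one vector is a non-negative multiple of the other, for example $\vec{v} = a \cdot \vec{u}$ for some scalar $a \ge 0$. In that case, since $|1 + a| = 1 + a$ and $|a| = a$,
$$\begin{aligned} \|\vec{u} + \vec{v}\|_2 &= \|\vec{u} + a \cdot \vec{u}\|_2 = \|(1 + a) \cdot \vec{u}\|_2 = (1 + a)\|\vec{u}\|_2 \\ &= \|\vec{u}\|_2 + a\|\vec{u}\|_2 = \|\vec{u}\|_2 + \|a \cdot \vec{u}\|_2 = \|\vec{u}\|_2 + \|\vec{v}\|_2. \end{aligned}$$
If $\vec{v} = a \cdot \vec{u}$, then $\vec{u} + \vec{v} = (1 + a) \cdot \vec{u}$. Thus,
$$\vec{u}, \vec{v}, \vec{u} + \vec{v} \in \operatorname{span}\{\vec{u}\},$$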
which means all three vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$ lie on the same line. When they do not lie on the same line and form a triangle like in figure 4.6, the triangle inequality for the $\ell_2$ norm gives us the strict inequality
$$\|\vec{u}\|_2 + \|\vec{v}\|_2 > \|\vec{u} + \vec{v}\|_2,$$
which is exactly the same as the triangle inequality we learned in geometry.
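To see the equality condition numerically, here is a small sketch in plain Python (`triangle_gap` is a hypothetical helper name, and the vectors are arbitrary examples) that measures how far the triangle inequality is from equality. Vectors on the same line pointing the same way give a gap of zero up to floating-point rounding, while vectors pointing in opposite directions, or not on the same line at all, give a positive gap.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def triangle_gap(u, v):
    """Return ||u||_2 + ||v||_2 - ||u + v||_2, which is never negative."""
    s = [a + b for a, b in zip(u, v)]
    return l2_norm(u) + l2_norm(v) - l2_norm(s)

u = [1, 2]
print(triangle_gap(u, [3, 6]))    # v = 3u: gap is 0 up to rounding
print(triangle_gap(u, [-2, -4]))  # v = -2u: same line, opposite direction, gap > 0
print(triangle_gap(u, [2, -1]))   # not on the same line: gap > 0
```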
4.3
Unit Vectors and Normalization of Vectors
In this section, we will discuss unit vectors briefly. It is a simple but important concept in linear algebra. What are unit vectors? Well, as the word “unit” suggests, a unit vector is a vector with length one. A unit vector is usually denoted as a letter with a small hat above it, such as $\hat{v}$. Using what we learned about the $\ell_2$ norm, if $\hat{v}$ is a unit vector, then $\|\hat{v}\|_2 = 1$. Let us look at a few examples of unit vectors, especially the important ones related to bases of vector spaces. In the last chapter, we learned that a possible set of basis vectors for $\mathbb{R}^2$ is
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
There is something special about these vectors: they are unit vectors because
$$\left\| \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\|_2 = 1.$$
These basis vectors also represent the points $(1, 0)$ and $(0, 1)$ on the $x$-axis and the $y$-axis, respectively. These basis vectors are so special that they have their own notation:
$$\hat{\imath} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \hat{\jmath} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
So, any vector in $\mathbb{R}^2$ can be written as
$$\begin{bmatrix} x \\ y \end{bmatrix} = x\,\hat{\imath} + y\,\hat{\jmath}.$$
A set of basis vectors consisting of unit vectors representing points on the coordinate axes like this is called the standard basis. Similarly to $\mathbb{R}^2$, the standard basis of $\mathbb{R}^3$ is
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
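Note that these are unit vectors because
$$\left\| \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\|_2 = \left\| \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\|_2 = 1.$$
These vectors also have special notation:
$$\hat{\imath} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \hat{\jmath} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \hat{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$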
The first two have the same notation as the standard basis vectors of $\mathbb{R}^2$ because they represent the same points on the $x$-axis and the $y$-axis.

Next, let us talk about normalization of vectors. Normalizing a vector means turning that vector into a unit vector. When we normalize a vector, we keep the direction of the vector the same and change only its length, or $\ell_2$ norm. Let $\hat{v}$ denote the normalized version of a non-zero vector $\vec{v}$. To normalize a vector, we divide that vector by its length:
$$\hat{v} = \frac{\vec{v}}{\|\vec{v}\|_2}.$$
We can check that $\hat{v}$ has length one:
$$\|\hat{v}\|_2 = \left\| \frac{\vec{v}}{\|\vec{v}\|_2} \right\|_2 = \frac{\|\vec{v}\|_2}{\|\vec{v}\|_2} = 1.$$
We can factor out $\frac{1}{\|\vec{v}\|_2}$, which is a positive scalar, from the norm by using the second property of norms. Let us look at a few examples.

Example 4.3: Normalize the vector $\vec{u} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

The length of $\vec{u}$ is
$$\|\vec{u}\|_2 = \sqrt{1^2 + (-2)^2} = \sqrt{5}.$$
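The normalized vector $\hat{u}$ is
$$\hat{u} = \frac{\vec{u}}{\|\vec{u}\|_2} = \frac{1}{\sqrt{5}} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}.$$

Example 4.4: Normalize the vector $\vec{v} = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}$.

The length of $\vec{v}$ is
$$\|\vec{v}\|_2 = \sqrt{3^2 + 0^2 + 4^2} = 5.$$
The normalized vector $\hat{v}$ is
$$\hat{v} = \frac{\vec{v}}{\|\vec{v}\|_2} = \frac{1}{5} \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix} = \begin{bmatrix} 3/5 \\ 0 \\ 4/5 \end{bmatrix}.$$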
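The normalization formula translates directly into code. This is a minimal plain-Python sketch (`normalize` is our own helper name) that reproduces Examples 4.3 and 4.4 and rejects the zero vector, which has no direction and cannot be normalized.

```python
import math

def normalize(v):
    """Return v divided by its l_2 norm (v must not be the zero vector)."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / length for x in v]

u_hat = normalize([1, -2])    # Example 4.3: about [0.447, -0.894], i.e. [1/sqrt(5), -2/sqrt(5)]
v_hat = normalize([3, 0, 4])  # Example 4.4
print(v_hat)                  # [0.6, 0.0, 0.8]
```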
4.4
Graphical Meaning of $\ell_1$ Norm
In this section, we will explore the graphical meaning of the $\ell_1$ norm. Similarly to the $\ell_2$ norm, the $\ell_1$ norm also represents a distance, but not the same distance as that of the $\ell_2$ norm. The $\ell_1$ norm represents the distance in Taxicab geometry, one of the non-Euclidean geometries. The distance in Taxicab geometry is called the “L1 distance.” The distance represented by the $\ell_2$ norm, which we discussed in section 4.2, is the distance in Euclidean geometry, which is the type of geometry that we learn in high school. Now, what is the difference between the L1 distance and the Euclidean distance?
Figure 4.7: Euclidean distance between $p(x_1, y_1)$ and $q(x_2, y_2)$

In Euclidean geometry, the Euclidean distance between two points $p(x_1, y_1)$ and $q(x_2, y_2)$ is the length of the line segment connecting those two points, and we have the familiar formula
$$d(p, q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2},$$
where $d(p, q)$ denotes the Euclidean distance between $p$ and $q$. In Euclidean geometry, there is only one shortest path between two points: the straight line segment.
Figure 4.8: L1 distance in Taxicab geometry between $p(x_1, y_1)$ and $q(x_2, y_2)$

In Taxicab geometry, the L1 distance between two points $p(x_1, y_1)$ and $q(x_2, y_2)$ is the shortest distance from $p$ to $q$ (or $q$ to $p$) moving only horizontally and vertically. If you play chess, this is like how the rooks move. Unlike the Euclidean distance, the L1 distance has many possible shortest paths, as shown in figure 4.8.
Figure 4.9: L1 distance formula: the horizontal leg has length $|x_2 - x_1|$ and the vertical leg has length $|y_2 - y_1|$

Since we are moving horizontally and vertically, the formula for the L1 distance between $p(x_1, y_1)$ and $q(x_2, y_2)$ can be obtained by adding the absolute differences of their coordinates:
$$d_1(p, q) = |x_2 - x_1| + |y_2 - y_1|,$$
where $d_1(p, q)$ denotes the L1 distance between $p$ and $q$. Thus, for a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
the $\ell_1$ norm $\|\vec{v}\|_1 = |v_1| + |v_2|$ is the L1 distance between the point $(v_1, v_2)$ and the origin $(0, 0)$. In higher dimensions, the L1 distance is the shortest distance when moving only in directions along the coordinate axes. For example, in three dimensions, the L1 distance is the shortest distance when moving only in the $x$-direction, $y$-direction, and $z$-direction, and we have the formula
$$d_1(p, q) = |x_2 - x_1| + |y_2 - y_1| + |z_2 - z_1|$$
for the L1 distance between $p(x_1, y_1, z_1)$ and $q(x_2, y_2, z_2)$. So, for a three-dimensional vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$
the $\ell_1$ norm $\|\vec{v}\|_1 = |v_1| + |v_2| + |v_3|$ is the L1 distance between the point $(v_1, v_2, v_3)$ and the origin $(0, 0, 0)$. In general, the $\ell_1$ norm of a vector
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
is the L1 distance between the origin and the point $(v_1, v_2, \ldots, v_n)$ in $\mathbb{R}^n$. There is another Taxicab geometry fun fact in an appendix. You should check it out if you find Taxicab geometry interesting.
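The two distance formulas can be compared side by side. Below is a minimal sketch in plain Python; the helper names and the sample points are our own, chosen for illustration. For the same pair of points, the L1 distance is never smaller than the Euclidean distance, because a path that moves only horizontally and vertically can be no shorter than the straight segment.

```python
import math

def l1_distance(p, q):
    """Taxicab (L1) distance: the sum of absolute coordinate differences."""
    return sum(abs(b - a) for a, b in zip(p, q))

def euclidean_distance(p, q):
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(p, q)))

p, q = (1, 2), (4, 6)
print(l1_distance(p, q))         # 7   (3 horizontal + 4 vertical)
print(euclidean_distance(p, q))  # 5.0 (the straight segment)

# The l_1 norm of v is the L1 distance from the origin to the point v.
v = (-3, 5, 2)
print(l1_distance((0, 0, 0), v))  # 10
```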
Exercises
1. $\vec{u} = \begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$. $\|\vec{u}\|_1 = ?$
2. $\vec{v} = \begin{bmatrix} 5 \\ 12 \end{bmatrix}$. $\|\vec{v}\|_2 = ?$
3. Normalize the vector $\vec{w} = \begin{bmatrix} 1 \\ -2 \\ -1 \end{bmatrix}$.
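4. Let $\vec{u} = \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 4 \\ 0 \end{bmatrix}$.
a. Find $\|\vec{u}\|_2$, $\|\vec{v}\|_2$, and $\|\vec{u} + \vec{v}\|_2$.
b. Verify that $\|\vec{u} + \vec{v}\|_2 \le \|\vec{u}\|_2 + \|\vec{v}\|_2$.
5. Express the distance∗ of the point $(12, -9)$ from the origin as a vector norm.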
6. Express the distance of the point (7, −1, 10) from the origin as a vector norm.
7. Imagine a rook moving from the origin to the point $(-15, 9)$ on the $xy$-plane, and assume that each grid square is 1 cm by 1 cm.
a. What is the shortest distance the rook can travel?
b. Express this distance as a vector norm.
∗ The word “distance” in this book is assumed to be Euclidean distance unless implied by context or specified to be otherwise.
Chapter 5
Dot Product and Orthogonality
5.1
Transpose of Vectors
Before discussing the dot product, let us talk about the transpose of vectors briefly. It is a very simple concept. When we take the transpose of a vector, we write its elements in a row instead of a column: a vertical vector becomes a horizontal vector. The transpose of a vector $\vec{v}$ is denoted as $\vec{v}^{\,T}$. Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix},$$
then we have
$$\vec{v}^{\,T} = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}.$$
Let us look at a few examples.

Example 5.1: $\vec{u} = \begin{bmatrix} -2 \\ 7 \end{bmatrix}$, $\vec{u}^{\,T} = ?$
$$\vec{u}^{\,T} = \begin{bmatrix} -2 & 7 \end{bmatrix}.$$
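Example 5.2: $\vec{v} = \begin{bmatrix} 5 \\ 0 \\ -11 \end{bmatrix}$, $\vec{v}^{\,T} = ?$
$$\vec{v}^{\,T} = \begin{bmatrix} 5 & 0 & -11 \end{bmatrix}.$$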
The transpose of a vector has no particular geometric meaning. A vector $\vec{v}$ and its transpose $\vec{v}^{\,T}$ both represent the same point; it is just that $\vec{v}^{\,T}$ is a horizontal vector (or a row vector) while $\vec{v}$ is a vertical vector (or a column vector).
5.2
Dot Product of Vectors
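In this section, we will talk about the definition and properties of the dot product. First, what is a dot product? The dot product of two vectors $\vec{u}$ and $\vec{v}$ with the same number of elements is denoted as $\vec{u} \cdot \vec{v}$. When we take the dot product of two vectors, we multiply the corresponding elements of the two vectors and then add all of the products together:
$$\vec{u} \cdot \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n.$$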
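Let us look at a few examples.

Example 5.3: $\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = ?$
$$\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = 2 \cdot (-1) + 5 \cdot 4 = 18.$$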
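Example 5.4: $\begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = ?$
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 1 \cdot 0 + 0 \cdot 1 = 0.$$

Example 5.5: $\begin{bmatrix} 3 \\ -5 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix} = ?$
$$\begin{bmatrix} 3 \\ -5 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix} = 3 \cdot 2 + (-5) \cdot 4 + 6 \cdot 3 = 4.$$

Example 5.6: $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix} = ?$
$$\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix} = (-1) \cdot (-1) + 2 \cdot (-3) + 1 \cdot 5 = 0.$$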
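The element-by-element definition is short to express in code. Here is a minimal plain-Python sketch (`dot` is our own function name, not a library call) that reproduces Examples 5.3 to 5.6.

```python
def dot(u, v):
    """Dot product: multiply corresponding elements and add the products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of elements")
    return sum(a * b for a, b in zip(u, v))

print(dot([2, 5], [-1, 4]))          # 18 (Example 5.3)
print(dot([1, 0], [0, 1]))           # 0  (Example 5.4)
print(dot([3, -5, 6], [2, 4, 3]))    # 4  (Example 5.5)
print(dot([-1, 2, 1], [-1, -3, 5]))  # 0  (Example 5.6)
```

The length check matters because Python's `zip` silently stops at the shorter list, which would hide a mismatch.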
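The dot product of vectors can also be written as the multiplication of a row vector and a column vector:
$$\vec{u} \cdot \vec{v} = \vec{u}^{\,T} \vec{v}.$$
For example,
$$\begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 & 5 \end{bmatrix} \begin{bmatrix} -1 \\ 4 \end{bmatrix}.$$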
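This fact is important when we learn the multiplication of matrices in part II. Next, let us discuss the properties that the dot product satisfies. Like the multiplication of numbers, the dot product satisfies the commutative property,
$$\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u},$$
and the distributive property,
$$\vec{u} \cdot (\vec{v} + \vec{w}) = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}.$$
Let
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \quad \text{and} \quad \vec{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{bmatrix}.$$
Then, using the commutative property and the distributive property of multiplication of numbers, we have
$$\vec{u} \cdot \vec{v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = v_1 u_1 + v_2 u_2 + \cdots + v_n u_n = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \cdot \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \vec{v} \cdot \vec{u}$$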
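and
$$\begin{aligned} \vec{u} \cdot (\vec{v} + \vec{w}) &= \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \cdot \left( \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} + \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} \right) = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 + w_1 \\ \vdots \\ v_n + w_n \end{bmatrix} \\ &= u_1(v_1 + w_1) + u_2(v_2 + w_2) + \cdots + u_n(v_n + w_n) \\ &= u_1 v_1 + u_1 w_1 + u_2 v_2 + u_2 w_2 + \cdots + u_n v_n + u_n w_n \\ &= (u_1 v_1 + u_2 v_2 + \cdots + u_n v_n) + (u_1 w_1 + u_2 w_2 + \cdots + u_n w_n) \\ &= \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}. \end{aligned}$$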
The dot product also satisfies an associative property with scalars. For any vectors $\vec{u}$ and $\vec{v}$ and any scalar $a$,
$$(a\vec{u}) \cdot \vec{v} = a(\vec{u} \cdot \vec{v}).$$
Using the distributive property of multiplication of numbers,
$$\begin{aligned} (a\vec{u}) \cdot \vec{v} &= \begin{bmatrix} a u_1 \\ a u_2 \\ \vdots \\ a u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a u_1 v_1 + a u_2 v_2 + \cdots + a u_n v_n \\ &= a(u_1 v_1 + u_2 v_2 + \cdots + u_n v_n) = a(\vec{u} \cdot \vec{v}). \end{aligned}$$
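These three properties can be spot-checked on concrete vectors. The sketch below uses plain Python with integer entries, so the comparisons are exact; the helper names and test vectors are our own. A check on one example is not a proof, which is why the general derivations above are needed.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    # Element-by-element vector addition.
    return [a + b for a, b in zip(u, v)]

def scale(k, u):
    # Multiply every element of u by the scalar k.
    return [k * a for a in u]

u, v, w, a = [1, -2, 3], [4, 0, -1], [2, 5, 7], 3  # arbitrary integer test values

print(dot(u, v) == dot(v, u))                      # True: commutative
print(dot(u, add(v, w)) == dot(u, v) + dot(u, w))  # True: distributive
print(dot(scale(a, u), v) == a * dot(u, v))        # True: the scalar factors out
```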
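There is an interesting relation between the $\ell_2$ norm and the dot product:
$$\vec{v} \cdot \vec{v} = (\|\vec{v}\|_2)^2$$
because
$$\vec{v} \cdot \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1^2 + v_2^2 + \cdots + v_n^2 = \left( \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} \right)^2 = (\|\vec{v}\|_2)^2.$$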
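This identity gives a convenient way to compute the $\ell_2$ norm from the dot product, $\|\vec{v}\|_2 = \sqrt{\vec{v} \cdot \vec{v}}$. A minimal plain-Python sketch (the vector is an arbitrary example of ours):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v = [2, -1, 2]
norm = math.sqrt(dot(v, v))  # ||v||_2 = sqrt(v . v)
print(dot(v, v), norm)       # 9 3.0
```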
5.3
Graphical Representation of Dot Product
Now that we have learned what the dot product is, let us take a look at its graphical meaning. The dot product $\vec{u} \cdot \vec{v}$ is the signed length of the projection of $\vec{u}$ onto $\vec{v}$, multiplied by the length of $\vec{v}$; the projected length counts as negative when the angle between the two vectors is greater than $90^\circ$. Also, since the dot product is commutative, the order does not matter: we can also project $\vec{v}$ onto $\vec{u}$ and then multiply by the length of $\vec{u}$. Let us look at the projection part first. Remember that the vector $\vec{u}$ has length $\|\vec{u}\|_2$. By using the trigonometric ratios in right triangles, we get that the length of the projection of $\vec{u}$ onto $\vec{v}$ is $\|\vec{u}\|_2 \cos\theta$, where $\theta$ is the smaller angle between $\vec{u}$ and $\vec{v}$.
Figure 5.1: Projection of $\vec{u}$ onto $\vec{v}$: the projection has length $\|\vec{u}\|_2 \cos\theta$

Then, after the projection, we need to multiply by the length of $\vec{v}$, which is $\|\vec{v}\|_2$. So, we have the formula
$$\vec{u} \cdot \vec{v} = \|\vec{u}\|_2 \|\vec{v}\|_2 \cos\theta.$$
We can also use this formula to find the measure of the smaller angle between two non-zero vectors. Rearranging the formula above, we have
$$\theta = \cos^{-1}\left( \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_2 \|\vec{v}\|_2} \right).$$
Let us look at a few examples.

Example 5.7: Find the measure of the smaller angle between $\begin{bmatrix} -1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$.
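First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{bmatrix} -1 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 4 \end{bmatrix} = (-1) \cdot 2 + 3 \cdot 4 = 10,$$
$$\left\| \begin{bmatrix} -1 \\ 3 \end{bmatrix} \right\|_2 = \sqrt{(-1)^2 + 3^2} = \sqrt{10},$$
and
$$\left\| \begin{bmatrix} 2 \\ 4 \end{bmatrix} \right\|_2 = \sqrt{2^2 + 4^2} = 2\sqrt{5}.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1}\left( \frac{10}{\sqrt{10} \cdot 2\sqrt{5}} \right) = \cos^{-1}\left( \frac{1}{\sqrt{2}} \right) = 45^\circ.$$

Example 5.8: Find the measure of the smaller angle between $\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$.

First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = 1 \cdot 0 + 2 \cdot 1 + (-1) \cdot (-1) = 3,$$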
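$$\left\| \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} \right\|_2 = \sqrt{1^2 + 2^2 + (-1)^2} = \sqrt{6},$$
and
$$\left\| \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \right\|_2 = \sqrt{0^2 + 1^2 + (-1)^2} = \sqrt{2}.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1}\left( \frac{3}{\sqrt{6} \cdot \sqrt{2}} \right) = \cos^{-1}\left( \frac{\sqrt{3}}{2} \right) = 30^\circ.$$

Example 5.9: Find the measure of the smaller angle between $\begin{bmatrix} 0 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} \sqrt{3} \\ 1 \end{bmatrix}$.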
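First, we need to find the dot product of the two vectors and the length of each vector:
$$\begin{bmatrix} 0 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} \sqrt{3} \\ 1 \end{bmatrix} = 0 \cdot \sqrt{3} + 2 \cdot 1 = 2,$$
$$\left\| \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\|_2 = \sqrt{0^2 + 2^2} = 2,$$
and
$$\left\| \begin{bmatrix} \sqrt{3} \\ 1 \end{bmatrix} \right\|_2 = \sqrt{(\sqrt{3})^2 + 1^2} = 2.$$
Therefore, the measure of the smaller angle $\theta$ between those two vectors is
$$\theta = \cos^{-1}\left( \frac{2}{2 \cdot 2} \right) = \cos^{-1}\left( \frac{1}{2} \right) = 60^\circ.$$
Note that when $\vec{u} = \vec{v}$, the angle $\theta$ between them is $0^\circ$, and we have
$$\vec{v} \cdot \vec{v} = \|\vec{v}\|_2 \|\vec{v}\|_2 \cos(0^\circ) = (\|\vec{v}\|_2)^2$$
because $\cos(0^\circ) = 1$.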
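The angle formula can be sketched in plain Python as follows (`angle_degrees` is our own helper name, and it assumes both vectors are non-zero). Because floating-point arithmetic is inexact, the computed cosine is clamped to $[-1, 1]$ before calling `math.acos`, and the printed results are rounded.

```python
import math

def angle_degrees(u, v):
    """Smaller angle between non-zero vectors u and v, in degrees."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = d / (nu * nv)
    # Rounding can push c slightly outside acos's domain [-1, 1].
    c = max(-1.0, min(1.0, c))
    return math.degrees(math.acos(c))

print(round(angle_degrees([-1, 3], [2, 4]), 6))            # 45.0 (Example 5.7)
print(round(angle_degrees([1, 2, -1], [0, 1, -1]), 6))     # 30.0 (Example 5.8)
print(round(angle_degrees([0, 2], [math.sqrt(3), 1]), 6))  # 60.0 (Example 5.9)
```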
5.4
Orthogonal Vectors
Next, let us discuss the orthogonality of vectors and how to determine when two vectors are orthogonal. The word “orthogonal” is just a fancier word for “perpendicular.” However, it is worth noting that the word “orthogonal” is better for vectors in general, and the word “perpendicular” only applies to geometric vectors (arrows) like the ones we are learning about in this book. If you continue to learn more linear algebra, you will learn that there are more types of vectors than just geometric vectors. For non-geometric vectors, it only makes sense to say that they are orthogonal, not perpendicular. Although we only discuss the orthogonality of geometric vectors here, the definition of orthogonality discussed here still applies to non-geometric vectors in general. We can determine whether two non-zero vectors are orthogonal by using the dot product: two non-zero vectors are orthogonal if and only if their dot product is equal to 0. Let us see why this is true. There are two ways to see it. First, we can use the formula obtained in the last section:
$$\vec{u} \cdot \vec{v} = \|\vec{u}\|_2 \|\vec{v}\|_2 \cos\theta.$$
When two vectors $\vec{u}$ and $\vec{v}$ are orthogonal, the angle $\theta$ between them is $90^\circ$. So,
$$\vec{u} \cdot \vec{v} = \|\vec{u}\|_2 \|\vec{v}\|_2 \cos(90^\circ) = 0$$
when $\vec{u}$ and $\vec{v}$ are orthogonal because $\cos(90^\circ) = 0$. Conversely, if $\vec{u} \cdot \vec{v} = 0$ for non-zero vectors $\vec{u}$ and $\vec{v}$, then $\|\vec{u}\|_2 \|\vec{v}\|_2 \neq 0$, so $\cos\theta = 0$, which means $\theta = 90^\circ$.
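Figure 5.2: $\vec{u} + \vec{v}$ when $\vec{u}$ and $\vec{v}$ are orthogonal

Another way is by using the Pythagorean theorem. When two vectors $\vec{u}$ and $\vec{v}$ are orthogonal, we have the graph for the vector addition $\vec{u} + \vec{v}$ as in figure 5.2. Recall that $\vec{u}$ has length $\|\vec{u}\|_2$, $\vec{v}$ has length $\|\vec{v}\|_2$, and $\vec{u} + \vec{v}$ has length $\|\vec{u} + \vec{v}\|_2$. By the Pythagorean theorem, we have
$$(\|\vec{u}\|_2)^2 + (\|\vec{v}\|_2)^2 = (\|\vec{u} + \vec{v}\|_2)^2.$$
Using the fact that the dot product of a vector with itself is the square of the $\ell_2$ norm of that vector, i.e. $\vec{v} \cdot \vec{v} = (\|\vec{v}\|_2)^2$, we have
$$\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} = (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}).$$
Using the distributive property of the dot product,
$$\begin{aligned} (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}) &= \vec{u} \cdot (\vec{u} + \vec{v}) + \vec{v} \cdot (\vec{u} + \vec{v}) \\ &= \vec{u} \cdot \vec{u} + \vec{u} \cdot \vec{v} + \vec{v} \cdot \vec{u} + \vec{v} \cdot \vec{v} \\ &= \vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} + 2\,\vec{u} \cdot \vec{v}. \end{aligned}$$
Thus, we get
$$\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} = \vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v} + 2\,\vec{u} \cdot \vec{v}.$$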
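Subtracting $\vec{u} \cdot \vec{u} + \vec{v} \cdot \vec{v}$ from both sides, we obtain
$$2\,\vec{u} \cdot \vec{v} = 0 \quad \Longrightarrow \quad \vec{u} \cdot \vec{v} = 0$$
when $\vec{u}$ and $\vec{v}$ are orthogonal.

Let us look at a few examples of orthogonal vectors. From example 5.4, we know that
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0,$$
so $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are orthogonal. From example 5.6, we know that
$$\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix} = 0,$$
so $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix}$ are orthogonal.

The concept of orthogonal vectors can be extended to more than two vectors. When there are more than two vectors, the vectors are orthogonal if each vector is orthogonal to every other vector in the set. For example, let us look at the standard basis vectors for $\mathbb{R}^3$:
$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
These vectors are orthogonal because
$$\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = 0.$$
In the last chapter, we learned that the standard basis vectors are unit vectors. Here, we saw that the standard basis vectors are orthogonal as well. Vectors that are both unit vectors and orthogonal, like these, are called orthonormal vectors. If we have a set of orthogonal non-zero vectors, we can always obtain orthonormal vectors by normalizing each vector. For example, we know that $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix}$ are orthogonal, so we can obtain orthonormal vectors by normalizing $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ -3 \\ 5 \end{bmatrix}$.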
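The definitions of orthogonal and orthonormal sets can be checked mechanically by testing every pair of vectors. Below is a minimal plain-Python sketch; the helper names and the tolerance `tol` are our own choices, and the tolerance is needed because normalized vectors contain rounded floating-point numbers.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def is_orthogonal_set(vectors, tol=1e-12):
    """True if every pair of distinct vectors has dot product 0 (within tol)."""
    return all(abs(dot(u, v)) <= tol for u, v in combinations(vectors, 2))

def is_orthonormal_set(vectors, tol=1e-12):
    """Orthogonal, and every vector has l_2 norm 1 (within tol)."""
    return is_orthogonal_set(vectors, tol) and all(
        abs(dot(v, v) - 1) <= tol for v in vectors)

standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_orthonormal_set(standard))  # True

pair = [[-1, 2, 1], [-1, -3, 5]]     # orthogonal (Example 5.6)
print(is_orthogonal_set(pair), is_orthonormal_set(pair))  # True False
unit_pair = [normalize(v) for v in pair]
print(is_orthonormal_set(unit_pair))  # True
```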
Exercises
1. $\vec{v} = \begin{bmatrix} 2 \\ 5 \\ 0 \\ -8 \end{bmatrix}$, $\vec{v}^{\,T} = ?$
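2. For any vectors $\vec{u}$ and $\vec{v}$, show that $(\vec{u} + \vec{v})^T = \vec{u}^{\,T} + \vec{v}^{\,T}$.
3. $\begin{bmatrix} 9 \\ -3 \\ -6 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 7 \\ -4 \end{bmatrix} = ?$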
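4. $\begin{bmatrix} -1 & 8 \end{bmatrix} \begin{bmatrix} 6 \\ 3 \end{bmatrix} = ?$
5. $\begin{bmatrix} 9 & -2 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ -5 \\ -7 \end{bmatrix} = ?$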
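6. Find the measure of the smaller angle between $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 7 \end{bmatrix}$.
7. Find the measure of the smaller angle between $\begin{bmatrix} -2 \\ 1 \\ 5 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ -5 \\ 4 \end{bmatrix}$.
8. Are $\begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ and $\begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ orthogonal? If yes, are they orthonormal?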
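9. Are $\begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix}$, $\begin{bmatrix} -1 \\ -2 \\ 3 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$ orthogonal? If yes, are they orthonormal?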
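10. Are $\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \end{bmatrix}$, and $\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$ orthogonal? If yes, are they orthonormal?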
Part II
Matrices
Chapter 6
What are Matrices?
6.1
Introduction to Matrices
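In chapter 1, we learned that a vector is a list of numbers. A matrix is bigger; it is a block of numbers such as
$$\begin{bmatrix} -1 & 2 \\ 3 & -4 \end{bmatrix}, \tag{6.1}$$
$$\begin{bmatrix} -3 & 0 & 5 \\ 7 & -2 & 6 \end{bmatrix}, \tag{6.2}$$
$$\begin{bmatrix} -9 & 8 & -6 \\ 1 & 0 & 7 \\ -5 & -1 & 10 \end{bmatrix}, \tag{6.3}$$
etc. A matrix is usually denoted as an uppercase letter:
$$A = \begin{bmatrix} -1 & 2 \\ 3 & -4 \end{bmatrix}.$$
We can also think of a matrix as a list of column (vertical) vectors or a list of row (horizontal) vectors. For example,
$$\begin{bmatrix} -3 & 0 & 5 \\ 7 & -2 & 6 \end{bmatrix} = \begin{bmatrix} | & | & | \\ \vec{v}_1 & \vec{v}_2 & \vec{v}_3 \\ | & | & | \end{bmatrix},$$
where
$$\vec{v}_1 = \begin{bmatrix} -3 \\ 7 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 0 \\ -2 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 5 \\ 6 \end{bmatrix},$$
and
$$\begin{bmatrix} -9 & 8 & -6 \\ 1 & 0 & 7 \\ -5 & -1 & 10 \end{bmatrix} = \begin{bmatrix} - & \vec{u}_1^{\,T} & - \\ - & \vec{u}_2^{\,T} & - \\ - & \vec{u}_3^{\,T} & - \end{bmatrix},$$
where
$$\vec{u}_1 = \begin{bmatrix} -9 \\ 8 \\ -6 \end{bmatrix}, \quad \vec{u}_2 = \begin{bmatrix} 1 \\ 0 \\ 7 \end{bmatrix}, \quad \vec{u}_3 = \begin{bmatrix} -5 \\ -1 \\ 10 \end{bmatrix}.$$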
Similarly to vectors, matrices also have dimensions. We describe the dimensions of a matrix like how we describe the dimensions of paper. An $m \times n$ matrix is a matrix with $m$ rows and $n$ columns. For example, the matrix in (6.2) is a $2 \times 3$ matrix. When $m = n$, the $n \times n$ matrix is called a square matrix, because its shape is a square. For example, the matrix in (6.1) is a $2 \times 2$ square matrix, and the matrix in (6.3) is a $3 \times 3$ square matrix. There is also a convenient notation for the elements of matrices. Elements of a matrix are usually denoted as lowercase letters with subscripts containing the row number and the column number. It is like a coordinate system where we identify a point with coordinates. For example, the element on row 2 and column 3 can be denoted as $a_{23}$. A $3 \times 3$ matrix $A$ can be written as
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$
and a $2 \times 4$ matrix $B$ can be written as
$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \end{bmatrix}.$$
Remember that we always put the row number before the column number, whether they are dimensions of matrices or subscripts of matrix elements.
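To connect this notation to code, here is a minimal sketch in plain Python that stores a matrix as a list of rows. The helper names are our own; the only subtlety is that Python indexes from 0 while the book's subscripts start at 1, and the row number still comes before the column number.

```python
# A matrix stored as a list of rows (an illustrative layout, not the only one).
M = [[-3, 0, 5],
     [7, -2, 6]]  # the 2 x 3 matrix in (6.2)

def shape(A):
    """Return (m, n): the number of rows, then the number of columns."""
    return len(A), len(A[0])

def entry(A, i, j):
    """Element a_ij, using the book's 1-based row i and column j."""
    return A[i - 1][j - 1]  # Python lists are 0-based

def column(A, j):
    """Column j (1-based) as a list, i.e. the column vector v_j."""
    return [row[j - 1] for row in A]

print(shape(M))        # (2, 3)
print(entry(M, 2, 3))  # 6, the element a_23
print(column(M, 1))    # [-3, 7], the column vector v_1
```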
6.2
Main Diagonal, Trace, and Transpose
In this section, we will talk about some really basic concepts related to matrices, particularly square matrices. First, let us talk about the main diagonal of a square matrix. The main diagonal of a matrix is the diagonal running from the top left corner to the bottom right corner of the matrix. In other words, the main diagonal of a matrix $A$ contains the elements $a_{ij}$ such that the row number $i$ is the same as the column number $j$ ($i = j$). For example, the main diagonal of a $4 \times 4$ matrix
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$$
contains the elements $a_{11}$, $a_{22}$, $a_{33}$, and $a_{44}$. Now, what is the trace of a matrix? The trace of a square matrix $A$, denoted as $\operatorname{Tr}(A)$, is the sum of all the elements on the main diagonal of $A$. For example,
$$\operatorname{Tr}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11} + a_{22} + a_{33}.$$
In general, for an $n \times n$ square matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix},$$
the main diagonal of $A$ contains the elements $a_{11}, a_{22}, \ldots, a_{nn}$, and
$$\operatorname{Tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$
Next, let us discuss the transpose of a matrix. This concept also applies to non-square matrices. In chapter 5, we learned the transpose of vectors, which are basically matrices with only one column. When we take the transpose of a matrix, we take the transpose of each column of that matrix. So, column 1 of $A$ becomes row 1 of $A^T$, column 2 of $A$ becomes row 2 of $A^T$, and so on. Also, note that as the columns of $A$ become rows of $A^T$, the rows of $A$ also become columns of $A^T$. So, taking the transpose of a matrix is basically swapping the columns and rows of that matrix. For example, if
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \\ a_{41} & a_{42} \end{bmatrix},$$
then
$$A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} & a_{41} \\ a_{12} & a_{22} & a_{32} & a_{42} \end{bmatrix}.$$
If
$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix},$$
then
$$B^T = \begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \\ b_{13} & b_{23} \end{bmatrix}.$$
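So, in general, the transpose of an $m \times n$ matrix is an $n \times m$ matrix. The matrix $A$ above is a $4 \times 2$ matrix, and $A^T$ is a $2 \times 4$ matrix; $B$ is a $2 \times 3$ matrix, and $B^T$ is a $3 \times 2$ matrix. For square matrices, the transpose is easier to picture. For example, if
$$C = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix},$$
then
$$C^T = \begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}.$$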
So, we can see that when we take the transpose of a square matrix, we are basically flipping the elements across the main diagonal. Perhaps it is easier to see with numbers than with symbols: if
$$M = \begin{bmatrix} 1 & -3 & 6 \\ -2 & 5 & 0 \\ 9 & -1 & 7 \end{bmatrix},$$
then
$$M^T = \begin{bmatrix} 1 & -2 & 9 \\ -3 & 5 & -1 \\ 6 & 0 & 7 \end{bmatrix}.$$
Notice how the main diagonal stays the same and the other numbers are reflected across the main diagonal.
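Both operations are short to write down in code. Here is a minimal plain-Python sketch that stores matrices as lists of rows; `trace` and `transpose` are our own helper names, `zip(*A)` is the standard Python idiom for walking over the columns of `A`, and the matrix `B` below is an arbitrary numeric example of ours.

```python
def trace(A):
    """Sum of the main-diagonal elements a_11 + a_22 + ... + a_nn."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the trace is defined only for square matrices")
    return sum(A[i][i] for i in range(n))

def transpose(A):
    """Rows become columns: the element in row i, column j moves to row j, column i."""
    return [list(col) for col in zip(*A)]

M = [[1, -3, 6],
     [-2, 5, 0],
     [9, -1, 7]]
print(trace(M))      # 13
print(transpose(M))  # [[1, -2, 9], [-3, 5, -1], [6, 0, 7]]

B = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3, so its transpose is 3 x 2
print(transpose(B))  # [[1, 4], [2, 5], [3, 6]]
```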
6.3
Special Types of Matrices
There are a few basic special types of matrices that you should be familiar with. In this section, we will learn about those special matrices. Also, note that these types are defined only for square matrices.

Triangular Matrices

There are two types of triangular matrices: lower-triangular matrices and upper-triangular matrices. A lower-triangular matrix is a matrix whose elements above the main diagonal are all 0. So, the elements on the main diagonal and below form a triangle in the lower half of the matrix. For example,
$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} -2 & 0 & 0 & 0 \\ 5 & 8 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 7 & 0 & 9 & -6 \end{bmatrix}$$
are lower-triangular matrices. The elements on the main diagonal and below are not necessarily non-zero.
In general, a lower-triangular matrix is of the form
$$L = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix}.$$
Note that $l_{ij} = 0$ whenever $i < j$.

Upper-triangular matrices are the opposite of lower-triangular matrices. An upper-triangular matrix is a matrix whose elements below the main diagonal are all 0. So, the elements on the main diagonal and above form a triangle in the upper half of the matrix. For example,
$$\begin{bmatrix} 5 & -3 \\ 0 & 2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 10 & -6 & 1 \\ 0 & 7 & 0 \\ 0 & 0 & -9 \end{bmatrix}$$
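are upper-triangular matrices. The elements on the main diagonal and above are not necessarily non-zero. In general, an upper-triangular matrix is of the form
$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}.$$
Note that $u_{ij} = 0$ whenever $i > j$.

Symmetric Matrices

A symmetric matrix is a matrix whose elements above the main diagonal are the same as the corresponding elements below the main diagonal. So, the elements of the matrix are symmetric across the main diagonal. For example,
$$\begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix}, \quad \begin{bmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & 4 & 5 & 6 \\ 2 & 5 & 7 & 8 \\ 3 & 6 & 8 & 9 \end{bmatrix}$$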
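are symmetric matrices. In general, if we denote the elements of a symmetric matrix $S$ as $s_{ij}$, then we have $s_{ij} = s_{ji}$. For example, for
$$S = \begin{bmatrix} s_{11} & s_{12} & s_{13} \\ s_{21} & s_{22} & s_{23} \\ s_{31} & s_{32} & s_{33} \end{bmatrix} = \begin{bmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{bmatrix},$$
we have $s_{12} = s_{21} = 3$, $s_{13} = s_{31} = -2$, and $s_{23} = s_{32} = 7$. Note that when we take the transpose of a symmetric matrix, we end up with the same matrix. In other words, if $S$ is a symmetric matrix, then $S^T = S$. For example,
$$\begin{bmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{bmatrix}^T = \begin{bmatrix} 4 & 3 & -2 \\ 3 & 0 & 7 \\ -2 & 7 & -1 \end{bmatrix}.$$

Skew-symmetric Matrices

A skew-symmetric matrix is a matrix whose elements above the main diagonal are the negatives of the corresponding elements below the main diagonal and whose elements on the main diagonal are all 0. For example,
$$\begin{bmatrix} 0 & 3 \\ -3 & 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{bmatrix}$$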
are skew-symmetric matrices. In general, if we denote the elements of a skew-symmetric matrix $A$ as $a_{ij}$, then we have $a_{ij} = -a_{ji}$. For example, for
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 0 & 3 \\ -3 & 0 \end{bmatrix},$$
we have $a_{12} = -a_{21} = 3$, $a_{11} = -a_{11} = 0$, and $a_{22} = -a_{22} = 0$. Note that when we take the transpose of a skew-symmetric matrix, we end up with the negative of the original matrix. In other words, if $A$ is a skew-symmetric matrix, then $A^T = -A$. We have not talked about multiplying a matrix by a scalar yet, but it is basically the same as multiplying a vector by a scalar: we multiply each element of the matrix by the scalar. For example,
$$\begin{bmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{bmatrix}^T = \begin{bmatrix} 0 & 6 & -8 \\ -6 & 0 & -1 \\ 8 & 1 & 0 \end{bmatrix} = -\begin{bmatrix} 0 & -6 & 8 \\ 6 & 0 & 1 \\ -8 & -1 & 0 \end{bmatrix}.$$

Diagonal Matrices
A diagonal matrix is a matrix whose elements not on the main diagonal are all 0. For example,

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} -5 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 6 \end{bmatrix}
$$

are diagonal matrices. The elements on the main diagonal are not necessarily non-zero. In general, a diagonal matrix is of the form

$$
D = \begin{bmatrix}
d_{11} & 0 & \cdots & 0 \\
0 & d_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & d_{nn}
\end{bmatrix}.
$$

Note that $d_{ij} = 0$ whenever $i \neq j$. It is also important to know that a diagonal matrix is also a lower-triangular matrix, an upper-triangular matrix, and a symmetric matrix. However, not all lower-triangular matrices, upper-triangular matrices, and symmetric matrices are diagonal matrices, just as all squares are rectangles but not all rectangles are squares. A diagonal matrix is a lower-triangular matrix because the elements above the main diagonal are all 0. It is an upper-triangular matrix because the elements below the main diagonal are all 0. It is also a symmetric matrix because the elements above and below the main diagonal are all the same: they are all 0.

Identity Matrix

The identity matrix is a special case of diagonal matrices. It is the diagonal matrix whose elements on the main diagonal are all 1:

$$
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
$$

For example, the $3 \times 3$ identity matrix is

$$
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$
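The identity matrix is the matrix version of the number 1. We will see why when we learn about matrix multiplication in the next chapter.

Orthogonal Matrices

Last but not least, orthogonal matrices are another important type of matrix. An orthogonal matrix is a square matrix whose columns are orthonormal. That means the column vectors must be orthogonal to each other and have unit length. The name "orthogonal" is somewhat misleading because the columns are not just orthogonal; they are orthonormal. For example,

$$
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
$$

is an orthogonal matrix because the column vectors

$$
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
$$

are orthonormal, and

$$
\begin{bmatrix}
1/2 & -1/2 & 1/2 & 1/2 \\
1/2 & -1/2 & -1/2 & -1/2 \\
1/2 & 1/2 & -1/2 & 1/2 \\
1/2 & 1/2 & 1/2 & -1/2
\end{bmatrix}
$$

is an orthogonal matrix because the column vectors

$$
\begin{bmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{bmatrix}, \quad
\begin{bmatrix} -1/2 \\ -1/2 \\ 1/2 \\ 1/2 \end{bmatrix}, \quad
\begin{bmatrix} 1/2 \\ -1/2 \\ -1/2 \\ 1/2 \end{bmatrix}, \quad
\begin{bmatrix} 1/2 \\ -1/2 \\ 1/2 \\ -1/2 \end{bmatrix}
$$

are orthonormal.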
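Each definition in this chapter can be checked mechanically. Below is a minimal Python sketch. It assumes a matrix is stored as a list of rows (a list of lists), and all function names, such as `is_symmetric`, are hypothetical helpers written for this illustration, not a library API. The standard-library `Fraction` keeps the entries 1/2 exact, so the orthonormality test can compare dot products with exactly 0 and 1.

```python
from fractions import Fraction


def transpose(M):
    """Turn the columns of M into rows."""
    return [list(col) for col in zip(*M)]


def is_square(M):
    return all(len(row) == len(M) for row in M)


def is_lower_triangular(M):
    # every element above the main diagonal (i < j) must be 0
    n = len(M)
    return is_square(M) and all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def is_upper_triangular(M):
    # every element below the main diagonal (i > j) must be 0
    n = len(M)
    return is_square(M) and all(M[i][j] == 0 for i in range(n) for j in range(i))


def is_diagonal(M):
    # a diagonal matrix satisfies both triangular conditions at once
    return is_lower_triangular(M) and is_upper_triangular(M)


def is_symmetric(M):
    return is_square(M) and transpose(M) == M


def is_skew_symmetric(M):
    return is_square(M) and transpose(M) == [[-x for x in row] for row in M]


def is_orthogonal(M):
    # columns must be orthonormal: col_i . col_j is 1 when i == j and 0 otherwise
    if not is_square(M):
        return False
    cols = transpose(M)
    n = len(cols)
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == (1 if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


h = Fraction(1, 2)
Q = [[h, -h, h, h], [h, -h, -h, -h], [h, h, -h, h], [h, h, h, -h]]
print(is_symmetric([[4, 3, -2], [3, 0, 7], [-2, 7, -1]]))       # True
print(is_skew_symmetric([[0, 6, -8], [-6, 0, 1], [8, -1, 0]]))  # True
print(is_diagonal([[-5, 0, 0, 0], [0, 8, 0, 0], [0, 0, 0, 0], [0, 0, 0, 6]]))  # True
print(is_orthogonal(Q))                                         # True
```

Running `is_symmetric` on a diagonal matrix also returns `True`, which agrees with the remark above that every diagonal matrix is symmetric.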
Exercises

1. What are the dimensions of the matrix
$$
\begin{bmatrix} 3 & 0 & -6 & 7 \\ 5 & -2 & 11 & 9 \\ 8 & 12 & -1 & 10 \end{bmatrix}?
$$

2. Which of the following is NOT a square matrix? 2 4 6 8 2 1 0 −9 −7 −5 −3 −3 A. C. B. 1 0 1 0 −1 10 6 1 2 3 4 −5 9 10
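3. Let
$$
A = \begin{bmatrix} -7 & 3 & 8 \\ 0 & -1 & 5 \end{bmatrix}
$$
and denote the elements of $A$ as $a_{ij}$. $a_{12} = ?$ $a_{21} = ?$ $a_{23} = ?$

4. Let
$$
B = \begin{bmatrix} 2 & -3 & 6 \\ -1 & 0 & 5 \\ 8 & 12 & -9 \end{bmatrix}.
$$
a. From top to bottom, what are the elements on the main diagonal of $B$?
b. $\operatorname{Tr}(B) = ?$

5. $C = \begin{bmatrix} 3 & 0 & -7 \\ 8 & -5 & 2 \\ 1 & 6 & 9 \end{bmatrix}$. $C^T = ?$

6. For any matrix $A$, $(A^T)^T = ?$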
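7. Categorize each of the following matrices as a lower-triangular matrix, upper-triangular matrix, symmetric matrix, skew-symmetric matrix, and/or diagonal matrix.
$$
\text{A. } \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \qquad
\text{B. } \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad
\text{C. } \begin{bmatrix} -2 & 0 & 0 \\ 7 & 8 & 0 \\ 5 & -3 & 1 \end{bmatrix} \qquad
\text{D. } \begin{bmatrix} 1 & 10 \\ 10 & 3 \end{bmatrix}
$$

8. Given
$$
\begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \quad
\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad \text{and} \quad
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix},
$$
check that these vectors are orthogonal, and use these three column vectors to construct a $3 \times 3$ orthogonal matrix.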
Chapter 7
Addition, Subtraction, and Multiplication
7.1
Addition and Subtraction
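In this chapter, we will learn how to do arithmetic operations with matrices. First, in this section, we will learn how to add and subtract matrices. Addition of matrices is the same as addition of vectors: we add matrices element-by-element. That is, we add the row 1 column 1 element of the first matrix to the row 1 column 1 element of the second matrix, the row 1 column 2 element of the first matrix to the row 1 column 2 element of the second matrix, and so on. Note that we can only add two matrices of the same dimensions. Let us look at a few examples.

Example 7.1: $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} -3 & 0 & 9 \\ 7 & -6 & 1 \end{bmatrix} = ?$

$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} + \begin{bmatrix} -3 & 0 & 9 \\ 7 & -6 & 1 \end{bmatrix}
= \begin{bmatrix} 1 + (-3) & 2 + 0 & 3 + 9 \\ 4 + 7 & 5 + (-6) & 6 + 1 \end{bmatrix}
= \begin{bmatrix} -2 & 2 & 12 \\ 11 & -1 & 7 \end{bmatrix}.
$$

Example 7.2: $\begin{bmatrix} 10 & -3 \\ 6 & 0 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ -6 & -7 \end{bmatrix} = ?$

$$
\begin{bmatrix} 10 & -3 \\ 6 & 0 \end{bmatrix} + \begin{bmatrix} 2 & -1 \\ -6 & -7 \end{bmatrix}
= \begin{bmatrix} 10 + 2 & -3 + (-1) \\ 6 + (-6) & 0 + (-7) \end{bmatrix}
= \begin{bmatrix} 12 & -4 \\ 0 & -7 \end{bmatrix}.
$$

Example 7.3: $\begin{bmatrix} 12 & -11 \\ -3 & 5 \\ 9 & 2 \end{bmatrix} + \begin{bmatrix} -10 & 8 \\ -7 & 6 \\ -15 & 1 \end{bmatrix} = ?$

$$
\begin{bmatrix} 12 & -11 \\ -3 & 5 \\ 9 & 2 \end{bmatrix} + \begin{bmatrix} -10 & 8 \\ -7 & 6 \\ -15 & 1 \end{bmatrix}
= \begin{bmatrix} 12 + (-10) & -11 + 8 \\ -3 + (-7) & 5 + 6 \\ 9 + (-15) & 2 + 1 \end{bmatrix}
= \begin{bmatrix} 2 & -3 \\ -10 & 11 \\ -6 & 3 \end{bmatrix}.
$$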
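Like the addition of numbers and vectors, matrix addition satisfies the commutative and associative properties. That is, for matrices $A$, $B$, and $C$ with the same dimensions,

$$
A + B = B + A \quad \text{and} \quad A + (B + C) = (A + B) + C.
$$

Like addition, subtraction of matrices is done element-by-element. Let us look at a few examples.

Example 7.4: $\begin{bmatrix} -5 & 3 & 2 \\ 0 & 1 & 7 \\ 6 & -8 & 9 \end{bmatrix} - \begin{bmatrix} 2 & 5 & -3 \\ 7 & -6 & 4 \\ 9 & -9 & 10 \end{bmatrix} = ?$

$$
\begin{bmatrix} -5 & 3 & 2 \\ 0 & 1 & 7 \\ 6 & -8 & 9 \end{bmatrix} - \begin{bmatrix} 2 & 5 & -3 \\ 7 & -6 & 4 \\ 9 & -9 & 10 \end{bmatrix}
= \begin{bmatrix} -5 - 2 & 3 - 5 & 2 - (-3) \\ 0 - 7 & 1 - (-6) & 7 - 4 \\ 6 - 9 & -8 - (-9) & 9 - 10 \end{bmatrix}
= \begin{bmatrix} -7 & -2 & 5 \\ -7 & 7 & 3 \\ -3 & 1 & -1 \end{bmatrix}.
$$

Example 7.5: $\begin{bmatrix} 3 & 6 & 9 & 0 \\ 7 & -2 & -8 & 12 \end{bmatrix} - \begin{bmatrix} 5 & 11 & -1 & 0 \\ 10 & 5 & -7 & 8 \end{bmatrix} = ?$

$$
\begin{bmatrix} 3 & 6 & 9 & 0 \\ 7 & -2 & -8 & 12 \end{bmatrix} - \begin{bmatrix} 5 & 11 & -1 & 0 \\ 10 & 5 & -7 & 8 \end{bmatrix}
= \begin{bmatrix} 3 - 5 & 6 - 11 & 9 - (-1) & 0 - 0 \\ 7 - 10 & -2 - 5 & -8 - (-7) & 12 - 8 \end{bmatrix}
= \begin{bmatrix} -2 & -5 & 10 & 0 \\ -3 & -7 & -1 & 4 \end{bmatrix}.
$$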
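The element-by-element rule for addition and subtraction is short enough to write out directly. This is a minimal sketch, assuming matrices are lists of rows. The helper names `add`, `subtract`, and `same_dimensions` are hypothetical and chosen for this illustration. Both operations refuse matrices of different dimensions, which mirrors the restriction stated above.

```python
def same_dimensions(A, B):
    """True when A and B have the same number of rows and of columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))


def add(A, B):
    """Element-by-element sum A + B."""
    if not same_dimensions(A, B):
        raise ValueError("can only add matrices of the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def subtract(A, B):
    """Element-by-element difference A - B."""
    if not same_dimensions(A, B):
        raise ValueError("can only subtract matrices of the same dimensions")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


print(add([[1, 2, 3], [4, 5, 6]], [[-3, 0, 9], [7, -6, 1]]))
# [[-2, 2, 12], [11, -1, 7]]   (Example 7.1)
print(subtract([[3, 6, 9, 0], [7, -2, -8, 12]], [[5, 11, -1, 0], [10, 5, -7, 8]]))
# [[-2, -5, 10, 0], [-3, -7, -1, 4]]   (Example 7.5)
```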
7.2
Matrix Multiplication with Scalars and Vectors
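In this section, we will learn how to multiply a matrix by a scalar or a vector. In the next section, we will use what we learn here to multiply two matrices.

Matrix Multiplication with Scalars

As mentioned in the last chapter, multiplying a matrix by a scalar is the same as multiplying a vector by a scalar: we multiply each element of the matrix by the scalar. Let us look at a few examples.

Example 7.6: $2\begin{bmatrix} -1 & 3 \\ 5 & 2 \end{bmatrix} = ?$

$$
2\begin{bmatrix} -1 & 3 \\ 5 & 2 \end{bmatrix}
= \begin{bmatrix} 2 \cdot (-1) & 2 \cdot 3 \\ 2 \cdot 5 & 2 \cdot 2 \end{bmatrix}
= \begin{bmatrix} -2 & 6 \\ 10 & 4 \end{bmatrix}.
$$

Example 7.7: $-5\begin{bmatrix} 3 & 0 & -2 & 1 \\ -1 & 6 & 4 & -5 \\ 7 & -3 & 0 & 2 \end{bmatrix} = ?$

$$
-5\begin{bmatrix} 3 & 0 & -2 & 1 \\ -1 & 6 & 4 & -5 \\ 7 & -3 & 0 & 2 \end{bmatrix}
= \begin{bmatrix}
-5 \cdot 3 & -5 \cdot 0 & -5 \cdot (-2) & -5 \cdot 1 \\
-5 \cdot (-1) & -5 \cdot 6 & -5 \cdot 4 & -5 \cdot (-5) \\
-5 \cdot 7 & -5 \cdot (-3) & -5 \cdot 0 & -5 \cdot 2
\end{bmatrix}
= \begin{bmatrix} -15 & 0 & 10 & -5 \\ 5 & -30 & -20 & 25 \\ -35 & 15 & 0 & -10 \end{bmatrix}.
$$

Example 7.8: $-\begin{bmatrix} 3 & -5 \\ 0 & 2 \\ -6 & 8 \end{bmatrix} = ?$

$$
-\begin{bmatrix} 3 & -5 \\ 0 & 2 \\ -6 & 8 \end{bmatrix}
= \begin{bmatrix} -3 & -(-5) \\ -0 & -2 \\ -(-6) & -8 \end{bmatrix}
= \begin{bmatrix} -3 & 5 \\ 0 & -2 \\ 6 & -8 \end{bmatrix}.
$$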
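Scalar multiplication touches every element once. In the minimal sketch below, matrices are lists of rows and `scale` is a hypothetical helper name. Example 7.8's negation is just the special case $k = -1$.

```python
def scale(k, A):
    """Multiply every element of the matrix A by the scalar k."""
    return [[k * a for a in row] for row in A]


print(scale(2, [[-1, 3], [5, 2]]))            # [[-2, 6], [10, 4]]     (Example 7.6)
print(scale(-1, [[3, -5], [0, 2], [-6, 8]]))  # [[-3, 5], [0, -2], [6, -8]]  (Example 7.8)
```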
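Matrix Multiplication with Column Vectors

Multiplying a matrix by a vector is a little bit more complicated. First, let us learn how to multiply a matrix by a column vector. We multiply the first element (from top to bottom) of the vector by the first column (from left to right) of the matrix, the second element of the vector by the second column of the matrix, and so on, and then we add them all up:

$$
\begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
= v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n.
$$

It is important to note that we can only multiply a matrix by a column vector whose dimension is the same as the number of columns of the matrix. Let us look at a few examples.

Example 7.9: $\begin{bmatrix} 3 & -5 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} 6 \\ 2 \end{bmatrix} = ?$

$$
\begin{bmatrix} 3 & -5 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} 6 \\ 2 \end{bmatrix}
= 6\begin{bmatrix} 3 \\ -2 \end{bmatrix} + 2\begin{bmatrix} -5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 18 \\ -12 \end{bmatrix} + \begin{bmatrix} -10 \\ 12 \end{bmatrix}
= \begin{bmatrix} 8 \\ 0 \end{bmatrix}.
$$

Example 7.10: $\begin{bmatrix} 1 & 7 & 3 \\ 0 & -2 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ -1 \\ 2 \end{bmatrix} = ?$

$$
\begin{bmatrix} 1 & 7 & 3 \\ 0 & -2 & 4 \end{bmatrix}\begin{bmatrix} 5 \\ -1 \\ 2 \end{bmatrix}
= 5\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (-1)\begin{bmatrix} 7 \\ -2 \end{bmatrix} + 2\begin{bmatrix} 3 \\ 4 \end{bmatrix}
= \begin{bmatrix} 5 \\ 0 \end{bmatrix} + \begin{bmatrix} -7 \\ 2 \end{bmatrix} + \begin{bmatrix} 6 \\ 8 \end{bmatrix}
= \begin{bmatrix} 4 \\ 10 \end{bmatrix}.
$$

Example 7.11: $\begin{bmatrix} -1 & -5 \\ 2 & -3 \\ 8 & 6 \\ 7 & 9 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix} = ?$

$$
\begin{bmatrix} -1 & -5 \\ 2 & -3 \\ 8 & 6 \\ 7 & 9 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \end{bmatrix}
= 3\begin{bmatrix} -1 \\ 2 \\ 8 \\ 7 \end{bmatrix} + (-2)\begin{bmatrix} -5 \\ -3 \\ 6 \\ 9 \end{bmatrix}
= \begin{bmatrix} -3 \\ 6 \\ 24 \\ 21 \end{bmatrix} + \begin{bmatrix} 10 \\ 6 \\ -12 \\ -18 \end{bmatrix}
= \begin{bmatrix} 7 \\ 12 \\ 12 \\ 3 \end{bmatrix}.
$$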
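Examples 7.9 to 7.11 build each product as a weighted sum of the matrix's columns. This minimal sketch follows the same recipe. It assumes the matrix is a list of rows and the column vector is a flat Python list, and `mat_vec` is a hypothetical helper name. The outer loop walks over columns, so the code literally adds $v_j$ times column $j$.

```python
def mat_vec(A, v):
    """Compute A v as v1*(column 1 of A) + ... + vn*(column n of A)."""
    n = len(A[0])
    if len(v) != n:
        raise ValueError("vector dimension must equal the number of columns")
    result = [0] * len(A)
    for j in range(n):                   # take column j of A ...
        for i in range(len(A)):
            result[i] += v[j] * A[i][j]  # ... scale it by v[j] and add it in
    return result


print(mat_vec([[3, -5], [-2, 6]], [6, 2]))           # [8, 0]   (Example 7.9)
print(mat_vec([[1, 7, 3], [0, -2, 4]], [5, -1, 2]))  # [4, 10]  (Example 7.10)
```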
From the examples, we see that the column vector is always written to the right of the matrix. We cannot simply swap the order and put the column vector on the left: matrix multiplication is not commutative, and a column vector on the left of a matrix generally does not even give a defined product. Also, note that when we multiply a matrix by a column vector, we are adding scalar multiples of the columns of the matrix. So, multiplying a matrix by a column vector gives a linear combination of the columns of the matrix.

Matrix Multiplication with Row Vectors
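Next, let us discuss multiplying a row vector by a matrix. It is very similar to multiplying a matrix by a column vector. We multiply each element of the row vector by the corresponding row of the matrix, and then we add them all up:

$$
\begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vdots \\ \vec{a}_n^T \end{bmatrix}
= v_1\vec{a}_1^T + v_2\vec{a}_2^T + \cdots + v_n\vec{a}_n^T.
$$

Addition and scalar multiplication of row vectors work exactly the same way as addition and scalar multiplication of column vectors. It is important to note that we can only multiply a row vector by a matrix whose number of rows is the same as the dimension of the row vector. Let us take a look at some examples.

Example 7.12: $\begin{bmatrix} 3 & -5 \end{bmatrix}\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix} = ?$

$$
\begin{bmatrix} 3 & -5 \end{bmatrix}\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}
= 3\begin{bmatrix} 2 & 4 \end{bmatrix} + (-5)\begin{bmatrix} 1 & 3 \end{bmatrix}
= \begin{bmatrix} 6 & 12 \end{bmatrix} + \begin{bmatrix} -5 & -15 \end{bmatrix}
= \begin{bmatrix} 1 & -3 \end{bmatrix}.
$$

Example 7.13: $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 6 & 0 \\ -3 & 8 \\ 7 & -5 \end{bmatrix} = ?$

$$
\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}\begin{bmatrix} 6 & 0 \\ -3 & 8 \\ 7 & -5 \end{bmatrix}
= 1\begin{bmatrix} 6 & 0 \end{bmatrix} + 2\begin{bmatrix} -3 & 8 \end{bmatrix} + 3\begin{bmatrix} 7 & -5 \end{bmatrix}
= \begin{bmatrix} 6 & 0 \end{bmatrix} + \begin{bmatrix} -6 & 16 \end{bmatrix} + \begin{bmatrix} 21 & -15 \end{bmatrix}
= \begin{bmatrix} 21 & 1 \end{bmatrix}.
$$

Example 7.14: $\begin{bmatrix} 3 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & -3 \end{bmatrix} = ?$

$$
\begin{bmatrix} 3 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 & 5 \\ 0 & 1 & -3 \end{bmatrix}
= 3\begin{bmatrix} 1 & 0 & 5 \end{bmatrix} + 5\begin{bmatrix} 0 & 1 & -3 \end{bmatrix}
= \begin{bmatrix} 3 & 0 & 15 \end{bmatrix} + \begin{bmatrix} 0 & 5 & -15 \end{bmatrix}
= \begin{bmatrix} 3 & 5 & 0 \end{bmatrix}.
$$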
From the examples, we see that the row vectors are always put to the left of the matrices. Also, note that multiplying a matrix with a row vector gives a linear combination of the rows of the matrix.
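The row-vector rule is the mirror image of the column-vector rule: a weighted sum of the matrix's rows. In this minimal sketch, the row vector is a flat Python list, the matrix is a list of rows, and `vec_mat` is a hypothetical helper name.

```python
def vec_mat(v, A):
    """Compute v A as v1*(row 1 of A) + ... + vm*(row m of A)."""
    if len(v) != len(A):
        raise ValueError("vector dimension must equal the number of rows")
    result = [0] * len(A[0])
    for vi, row in zip(v, A):
        # add vi times this row of A to the running total
        result = [r + vi * a for r, a in zip(result, row)]
    return result


print(vec_mat([3, -5], [[2, 4], [1, 3]]))              # [1, -3]  (Example 7.12)
print(vec_mat([1, 2, 3], [[6, 0], [-3, 8], [7, -5]]))  # [21, 1]  (Example 7.13)
```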
7.3
Multiplication of Two Matrices
Before we learn the three methods for matrix multiplication, there are some important points about matrix multiplication that we should pay attention to.

First, we need to know the condition that the dimensions of two matrices $A$ and $B$ must satisfy for the product $AB$ to be defined: the number of columns of $A$ must be the same as the number of rows of $B$. That is, we can only multiply an $m \times n$ matrix by an $n \times p$ matrix. Also, you should notice from the examples below that the product of an $m \times n$ matrix and an $n \times p$ matrix is an $m \times p$ matrix.

Second, you should remember that matrix multiplication is not commutative. That is, in general, $AB \neq BA$. If $A$ is $m \times n$ and $B$ is $n \times p$, then we can compute $AB$. However, if we change the order and $p \neq m$, the product $BA$ cannot be computed, because the number of columns of $B$, $p$, is not the same as the number of rows of $A$, $m$. If $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix, then we can compute both $AB$ and $BA$, but the results are in general different. (When $m \neq n$, they cannot even be equal, because $AB$ is $m \times m$ while $BA$ is $n \times n$.)

Third, as mentioned in the last chapter, the identity matrix is the matrix version of the number 1. In high school algebra, we learned that any number times 1 is the number itself. Similarly, any $m \times n$ matrix multiplied by the $n \times n$ identity matrix is the same $m \times n$ matrix. That is, $AI_n = A$. Also, the $n \times n$ identity matrix multiplied by any $n \times p$ matrix is the same $n \times p$ matrix. That is, $I_nA = A$. You should observe these important points while looking at the examples below.

There are three ways to multiply two matrices. You can choose whichever way you like best, although you should be familiar with all three. All three methods produce the same final result; they are just three different ways to carry out the multiplication.

Method 1: Multiplying Each Column

The first way to do the matrix multiplication $AB = C$
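is to multiply $A$ by each column of $B$. The first column of $C$ is $A$ times the first column of $B$; the second column of $C$ is $A$ times the second column of $B$; and so on:

$$
A\begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}
= \begin{bmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{bmatrix}.
$$

Let us do a few examples.

Example 7.15: $\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix} = ?$

First, we find the first column of the product:
$$
\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 5 \\ -2 \end{bmatrix}
= 5\begin{bmatrix} 2 \\ 0 \end{bmatrix} + (-2)\begin{bmatrix} 3 \\ -1 \end{bmatrix}
= \begin{bmatrix} 4 \\ 2 \end{bmatrix}.
$$
Then, we find the second column of the product:
$$
\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} -4 \\ 1 \end{bmatrix}
= (-4)\begin{bmatrix} 2 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 3 \\ -1 \end{bmatrix}
= \begin{bmatrix} -5 \\ -1 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & -5 \\ 2 & -1 \end{bmatrix}.
$$

Example 7.16: $\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix} = ?$

First, we find the first column of the product:
$$
\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix}
= 2\begin{bmatrix} 5 \\ -2 \end{bmatrix} + 0\begin{bmatrix} -4 \\ 1 \end{bmatrix}
= \begin{bmatrix} 10 \\ -4 \end{bmatrix}.
$$
Then, we find the second column of the product:
$$
\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -1 \end{bmatrix}
= 3\begin{bmatrix} 5 \\ -2 \end{bmatrix} + (-1)\begin{bmatrix} -4 \\ 1 \end{bmatrix}
= \begin{bmatrix} 19 \\ -7 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 5 & -4 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 0 & -1 \end{bmatrix}
= \begin{bmatrix} 10 & 19 \\ -4 & -7 \end{bmatrix}.
$$

Example 7.17: $\begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{bmatrix}\begin{bmatrix} 3 & -5 \\ -2 & 1 \\ 2 & -4 \end{bmatrix} = ?$

First, we find the first column of the product:
$$
\begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{bmatrix}\begin{bmatrix} 3 \\ -2 \\ 2 \end{bmatrix}
= 3\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} + (-2)\begin{bmatrix} 1 \\ 7 \\ 5 \end{bmatrix} + 2\begin{bmatrix} 4 \\ 2 \\ 8 \end{bmatrix}
= \begin{bmatrix} 15 \\ -4 \\ 6 \end{bmatrix}.
$$
Then, we find the second column of the product:
$$
\begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{bmatrix}\begin{bmatrix} -5 \\ 1 \\ -4 \end{bmatrix}
= (-5)\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 7 \\ 5 \end{bmatrix} + (-4)\begin{bmatrix} 4 \\ 2 \\ 8 \end{bmatrix}
= \begin{bmatrix} -30 \\ -11 \\ -27 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 3 & 1 & 4 \\ 2 & 7 & 2 \\ 0 & 5 & 8 \end{bmatrix}\begin{bmatrix} 3 & -5 \\ -2 & 1 \\ 2 & -4 \end{bmatrix}
= \begin{bmatrix} 15 & -30 \\ -4 & -11 \\ 6 & -27 \end{bmatrix}.
$$

Example 7.18: $\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} 1 & -2 & 0 & 2 \\ 3 & 1 & -1 & 0 \end{bmatrix} = ?$

First, we find the first column of the product:
$$
\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \end{bmatrix}
= 1\begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ -5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 7 \\ -18 \\ 17 \end{bmatrix}.
$$
Then, we find the second column of the product:
$$
\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} -2 \\ 1 \end{bmatrix}
= (-2)\begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix} + 1\begin{bmatrix} 2 \\ -5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \\ 8 \end{bmatrix}.
$$
Next, we find the third column of the product:
$$
\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} 0 \\ -1 \end{bmatrix}
= 0\begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix} + (-1)\begin{bmatrix} 2 \\ -5 \\ 6 \end{bmatrix}
= \begin{bmatrix} -2 \\ 5 \\ -6 \end{bmatrix}.
$$
Lastly, we find the fourth column of the product:
$$
\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix}
= 2\begin{bmatrix} 1 \\ -3 \\ -1 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -5 \\ 6 \end{bmatrix}
= \begin{bmatrix} 2 \\ -6 \\ -2 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 1 & 2 \\ -3 & -5 \\ -1 & 6 \end{bmatrix}\begin{bmatrix} 1 & -2 & 0 & 2 \\ 3 & 1 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 7 & 0 & -2 & 2 \\ -18 & 1 & 5 & -6 \\ 17 & 8 & -6 & -2 \end{bmatrix}.
$$

Example 7.19: $\begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = ?$

First, we find the first column of the product:
$$
\begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
= 1\begin{bmatrix} -2 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 3 \\ 5 \end{bmatrix} + 0\begin{bmatrix} 0 \\ -8 \end{bmatrix}
= \begin{bmatrix} -2 \\ 1 \end{bmatrix}.
$$
Then, we find the second column of the product:
$$
\begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
= 0\begin{bmatrix} -2 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 3 \\ 5 \end{bmatrix} + 0\begin{bmatrix} 0 \\ -8 \end{bmatrix}
= \begin{bmatrix} 3 \\ 5 \end{bmatrix}.
$$
Next, we find the third column of the product:
$$
\begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= 0\begin{bmatrix} -2 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 3 \\ 5 \end{bmatrix} + 1\begin{bmatrix} 0 \\ -8 \end{bmatrix}
= \begin{bmatrix} 0 \\ -8 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} -2 & 3 & 0 \\ 1 & 5 & -8 \end{bmatrix}.
$$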
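Method 1 can be sketched by reusing the matrix-times-column-vector rule once per column of $B$ and then reassembling the columns. This is a minimal illustration, not a definitive implementation. It assumes matrices are lists of rows, and `mat_vec` and `matmul_by_columns` are hypothetical names. Because Python stores matrices by rows here, the final step transposes the computed columns back into rows.

```python
def mat_vec(A, v):
    """A v as a linear combination of the columns of A with weights v."""
    result = [0] * len(A)
    for j, vj in enumerate(v):
        for i in range(len(A)):
            result[i] += vj * A[i][j]
    return result


def matmul_by_columns(A, B):
    """Method 1: column j of AB is A times column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = [list(col) for col in zip(*B)]
    columns_of_C = [mat_vec(A, b) for b in columns_of_B]
    # the columns of C are known; turn them back into a list of rows
    return [list(row) for row in zip(*columns_of_C)]


A = [[2, 3], [0, -1]]
B = [[5, -4], [-2, 1]]
print(matmul_by_columns(A, B))  # [[4, -5], [2, -1]]    (Example 7.15)
print(matmul_by_columns(B, A))  # [[10, 19], [-4, -7]]  (Example 7.16)
```

Swapping the order of the two factors changes the result, which is the non-commutativity point made at the start of this section.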
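Method 2: Multiplying Each Row

Another way to do the matrix multiplication $AB = C$ is to multiply each row of $A$ by $B$. The first row of $C$ is the first row of $A$ times $B$; the second row of $C$ is the second row of $A$ times $B$; and so on:

$$
\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vdots \\ \vec{a}_m^T \end{bmatrix} B
= \begin{bmatrix} \vec{a}_1^T B \\ \vec{a}_2^T B \\ \vdots \\ \vec{a}_m^T B \end{bmatrix}.
$$

Let us look at a few examples.

Example 7.20: $\begin{bmatrix} 2 & -1 & 0 \\ 1 & 3 & -2 \\ 0 & 5 & 0 \end{bmatrix}\begin{bmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{bmatrix} = ?$

First, we find the first row of the product:
$$
\begin{bmatrix} 2 & -1 & 0 \end{bmatrix}\begin{bmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{bmatrix}
= 2\begin{bmatrix} 3 & 2 & 1 \end{bmatrix} + (-1)\begin{bmatrix} 0 & -1 & 0 \end{bmatrix} + 0\begin{bmatrix} 0 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} 6 & 5 & 2 \end{bmatrix}.
$$
Then, we find the second row of the product:
$$
\begin{bmatrix} 1 & 3 & -2 \end{bmatrix}\begin{bmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{bmatrix}
= 1\begin{bmatrix} 3 & 2 & 1 \end{bmatrix} + 3\begin{bmatrix} 0 & -1 & 0 \end{bmatrix} + (-2)\begin{bmatrix} 0 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} 3 & -7 & 11 \end{bmatrix}.
$$
Next, we find the third row of the product:
$$
\begin{bmatrix} 0 & 5 & 0 \end{bmatrix}\begin{bmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{bmatrix}
= 0\begin{bmatrix} 3 & 2 & 1 \end{bmatrix} + 5\begin{bmatrix} 0 & -1 & 0 \end{bmatrix} + 0\begin{bmatrix} 0 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} 0 & -5 & 0 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 2 & -1 & 0 \\ 1 & 3 & -2 \\ 0 & 5 & 0 \end{bmatrix}\begin{bmatrix} 3 & 2 & 1 \\ 0 & -1 & 0 \\ 0 & 3 & -5 \end{bmatrix}
= \begin{bmatrix} 6 & 5 & 2 \\ 3 & -7 & 11 \\ 0 & -5 & 0 \end{bmatrix}.
$$

Example 7.21: $\begin{bmatrix} -1 & 6 \\ 2 & -5 \end{bmatrix}\begin{bmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{bmatrix} = ?$

First, we find the first row of the product:
$$
\begin{bmatrix} -1 & 6 \end{bmatrix}\begin{bmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{bmatrix}
= (-1)\begin{bmatrix} 12 & -10 & 0 \end{bmatrix} + 6\begin{bmatrix} 5 & -2 & -3 \end{bmatrix}
= \begin{bmatrix} 18 & -2 & -18 \end{bmatrix}.
$$
Then, we find the second row of the product:
$$
\begin{bmatrix} 2 & -5 \end{bmatrix}\begin{bmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{bmatrix}
= 2\begin{bmatrix} 12 & -10 & 0 \end{bmatrix} + (-5)\begin{bmatrix} 5 & -2 & -3 \end{bmatrix}
= \begin{bmatrix} -1 & -10 & 15 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} -1 & 6 \\ 2 & -5 \end{bmatrix}\begin{bmatrix} 12 & -10 & 0 \\ 5 & -2 & -3 \end{bmatrix}
= \begin{bmatrix} 18 & -2 & -18 \\ -1 & -10 & 15 \end{bmatrix}.
$$

Example 7.22: $\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix} = ?$

First, we find the first row of the product:
$$
\begin{bmatrix} 1 & -1 & 2 \end{bmatrix}\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix}
= 1\begin{bmatrix} -3 & 5 \end{bmatrix} + (-1)\begin{bmatrix} 7 & -2 \end{bmatrix} + 2\begin{bmatrix} 1 & 0 \end{bmatrix}
= \begin{bmatrix} -8 & 7 \end{bmatrix}.
$$
Then, we find the second row of the product:
$$
\begin{bmatrix} 3 & 4 & -2 \end{bmatrix}\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix}
= 3\begin{bmatrix} -3 & 5 \end{bmatrix} + 4\begin{bmatrix} 7 & -2 \end{bmatrix} + (-2)\begin{bmatrix} 1 & 0 \end{bmatrix}
= \begin{bmatrix} 17 & 7 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} -8 & 7 \\ 17 & 7 \end{bmatrix}.
$$

Example 7.23: $\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix} = ?$

First, we find the first row of the product:
$$
\begin{bmatrix} -3 & 5 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}
= (-3)\begin{bmatrix} 1 & -1 & 2 \end{bmatrix} + 5\begin{bmatrix} 3 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 12 & 23 & -16 \end{bmatrix}.
$$
Then, we find the second row of the product:
$$
\begin{bmatrix} 7 & -2 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}
= 7\begin{bmatrix} 1 & -1 & 2 \end{bmatrix} + (-2)\begin{bmatrix} 3 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & -15 & 18 \end{bmatrix}.
$$
Next, we find the third row of the product:
$$
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}
= 1\begin{bmatrix} 1 & -1 & 2 \end{bmatrix} + 0\begin{bmatrix} 3 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & -1 & 2 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} -3 & 5 \\ 7 & -2 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & -1 & 2 \\ 3 & 4 & -2 \end{bmatrix}
= \begin{bmatrix} 12 & 23 & -16 \\ 1 & -15 & 18 \\ 1 & -1 & 2 \end{bmatrix}.
$$

Example 7.24: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix} = ?$

First, we find the first row of the product:
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}
= 1\begin{bmatrix} 1 & 2 \end{bmatrix} + 0\begin{bmatrix} -3 & 4 \end{bmatrix} + 0\begin{bmatrix} 5 & -6 \end{bmatrix} + 0\begin{bmatrix} -7 & -8 \end{bmatrix}
= \begin{bmatrix} 1 & 2 \end{bmatrix}.
$$
Then, we find the second row of the product:
$$
\begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}
= 0\begin{bmatrix} 1 & 2 \end{bmatrix} + 1\begin{bmatrix} -3 & 4 \end{bmatrix} + 0\begin{bmatrix} 5 & -6 \end{bmatrix} + 0\begin{bmatrix} -7 & -8 \end{bmatrix}
= \begin{bmatrix} -3 & 4 \end{bmatrix}.
$$
Next, we find the third row of the product:
$$
\begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}
= 0\begin{bmatrix} 1 & 2 \end{bmatrix} + 0\begin{bmatrix} -3 & 4 \end{bmatrix} + 1\begin{bmatrix} 5 & -6 \end{bmatrix} + 0\begin{bmatrix} -7 & -8 \end{bmatrix}
= \begin{bmatrix} 5 & -6 \end{bmatrix}.
$$
Lastly, we find the fourth row of the product:
$$
\begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}
= 0\begin{bmatrix} 1 & 2 \end{bmatrix} + 0\begin{bmatrix} -3 & 4 \end{bmatrix} + 0\begin{bmatrix} 5 & -6 \end{bmatrix} + 1\begin{bmatrix} -7 & -8 \end{bmatrix}
= \begin{bmatrix} -7 & -8 \end{bmatrix}.
$$
Therefore,
$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}
= \begin{bmatrix} 1 & 2 \\ -3 & 4 \\ 5 & -6 \\ -7 & -8 \end{bmatrix}.
$$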
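Method 2 reuses the row-vector rule once per row of $A$. The minimal sketch below makes that explicit. It assumes matrices are lists of rows, and `vec_mat` and `matmul_by_rows` are hypothetical helper names. Because the rows of the result are computed directly, no reassembly step is needed, unlike the column-by-column version.

```python
def vec_mat(v, A):
    """Row vector v times A: v1*(row 1 of A) + ... + vm*(row m of A)."""
    result = [0] * len(A[0])
    for vi, row in zip(v, A):
        result = [r + vi * a for r, a in zip(result, row)]
    return result


def matmul_by_rows(A, B):
    """Method 2: row i of AB is row i of A times B."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [vec_mat(row, B) for row in A]


print(matmul_by_rows([[-1, 6], [2, -5]], [[12, -10, 0], [5, -2, -3]]))
# [[18, -2, -18], [-1, -10, 15]]   (Example 7.21)
```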
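Method 3: Multiplying Each Row with Each Column

Last but not least, another way to do the matrix multiplication $AB = C$ is to multiply each row of $A$ by each column of $B$ to obtain each element of $C$:

$$
\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vdots \\ \vec{a}_m^T \end{bmatrix}
\begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}
= \begin{bmatrix}
\vec{a}_1^T\vec{b}_1 & \vec{a}_1^T\vec{b}_2 & \cdots & \vec{a}_1^T\vec{b}_p \\
\vec{a}_2^T\vec{b}_1 & \vec{a}_2^T\vec{b}_2 & \cdots & \vec{a}_2^T\vec{b}_p \\
\vdots & \vdots & \ddots & \vdots \\
\vec{a}_m^T\vec{b}_1 & \vec{a}_m^T\vec{b}_2 & \cdots & \vec{a}_m^T\vec{b}_p
\end{bmatrix}.
$$

So, in general, if we denote the elements of $C$ as $c_{ij}$, then we have $c_{ij} = \vec{a}_i^T\vec{b}_j$. In other words, the row $i$ column $j$ element of $C$ is the $i$-th row of $A$ times the $j$-th column of $B$. Also, recall from chapter 5 that $\vec{u}^T\vec{v} = \vec{u} \cdot \vec{v}$. Let us take a look at some examples.

Example 7.25: $\begin{bmatrix} -1 & 2 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = ?$

Let
$$
\begin{bmatrix} -1 & 2 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = C.
$$
For the row 1 column 1 element, we have
$$
c_{11} = \begin{bmatrix} -1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} = (-1) \cdot 1 + 2 \cdot 0 = -1.
$$
For the row 1 column 2 element, we have
$$
c_{12} = \begin{bmatrix} -1 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = (-1) \cdot 0 + 2 \cdot 1 = 2.
$$
For the row 2 column 1 element, we have
$$
c_{21} = \begin{bmatrix} 2 & 5 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 2 \cdot 1 + 5 \cdot 0 = 2.
$$
For the row 2 column 2 element, we have
$$
c_{22} = \begin{bmatrix} 2 & 5 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 2 \cdot 0 + 5 \cdot 1 = 5.
$$
Therefore,
$$
\begin{bmatrix} -1 & 2 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}
= \begin{bmatrix} -1 & 2 \\ 2 & 5 \end{bmatrix}.
$$

Example 7.26: $\begin{bmatrix} -2 & 3 \\ 5 & -4 \end{bmatrix}\begin{bmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{bmatrix} = ?$

Let
$$
\begin{bmatrix} -2 & 3 \\ 5 & -4 \end{bmatrix}\begin{bmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{bmatrix} = C.
$$
For the row 1 column 1 element, we have
$$
c_{11} = \begin{bmatrix} -2 & 3 \end{bmatrix}\begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 6 \\ 3 \end{bmatrix} = (-2) \cdot 6 + 3 \cdot 3 = -3.
$$
For the row 1 column 2 element, we have
$$
c_{12} = \begin{bmatrix} -2 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 1 \end{bmatrix} = (-2) \cdot (-1) + 3 \cdot 1 = 5.
$$
For the row 1 column 3 element, we have
$$
c_{13} = \begin{bmatrix} -2 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ -4 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -4 \end{bmatrix} = (-2) \cdot 2 + 3 \cdot (-4) = -16.
$$
For the row 2 column 1 element, we have
$$
c_{21} = \begin{bmatrix} 5 & -4 \end{bmatrix}\begin{bmatrix} 6 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 6 \\ 3 \end{bmatrix} = 5 \cdot 6 + (-4) \cdot 3 = 18.
$$
For the row 2 column 2 element, we have
$$
c_{22} = \begin{bmatrix} 5 & -4 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 1 \end{bmatrix} = 5 \cdot (-1) + (-4) \cdot 1 = -9.
$$
For the row 2 column 3 element, we have
$$
c_{23} = \begin{bmatrix} 5 & -4 \end{bmatrix}\begin{bmatrix} 2 \\ -4 \end{bmatrix} = \begin{bmatrix} 5 \\ -4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -4 \end{bmatrix} = 5 \cdot 2 + (-4) \cdot (-4) = 26.
$$
Therefore,
$$
\begin{bmatrix} -2 & 3 \\ 5 & -4 \end{bmatrix}\begin{bmatrix} 6 & -1 & 2 \\ 3 & 1 & -4 \end{bmatrix}
= \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \end{bmatrix}
= \begin{bmatrix} -3 & 5 & -16 \\ 18 & -9 & 26 \end{bmatrix}.
$$

7.4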
Distributive and Associative Properties of Matrix Multiplication
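Although matrix multiplication does not satisfy the commutative property, it does satisfy the distributive and associative properties:

$$
A(B + C) = AB + AC \quad \text{and} \quad (AB)C = A(BC).
$$

In this section, we will prove these properties of matrix multiplication.

Distributive Property

Before proving the distributive property of matrix multiplication in general, let us first prove that $A(\vec{u} + \vec{v}) = A\vec{u} + A\vec{v}$ for any $m \times n$ matrix $A$ and $n$-dimensional vectors $\vec{u}$ and $\vec{v}$. Let

$$
A = \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}, \quad
\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad \text{and} \quad
\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.
$$

Then, we have

$$
\begin{aligned}
A(\vec{u} + \vec{v})
&= \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}
\left( \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right) \\
&= \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}
\begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} \\
&= (u_1 + v_1)\vec{a}_1 + (u_2 + v_2)\vec{a}_2 + \cdots + (u_n + v_n)\vec{a}_n \\
&= u_1\vec{a}_1 + v_1\vec{a}_1 + u_2\vec{a}_2 + v_2\vec{a}_2 + \cdots + u_n\vec{a}_n + v_n\vec{a}_n \\
&= (u_1\vec{a}_1 + u_2\vec{a}_2 + \cdots + u_n\vec{a}_n) + (v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n) \\
&= \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
+ \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \\
&= A\vec{u} + A\vec{v}.
\end{aligned}
$$

Next, we can use this result to prove the more general distributive property $A(B + C) = AB + AC$ for any $m \times n$ matrix $A$ and $n \times p$ matrices $B$ and $C$. In this proof, we will use method 1 of matrix multiplication, which is multiplying the first matrix by each column of the second matrix to obtain each column of the product. Let

$$
B = \begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}
\quad \text{and} \quad
C = \begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_p \\ | & | & & | \end{bmatrix}.
$$

Then, we have

$$
\begin{aligned}
A(B + C)
&= A\left( \begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix} + \begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_p \\ | & | & & | \end{bmatrix} \right) \\
&= A\begin{bmatrix} | & | & & | \\ \vec{b}_1 + \vec{c}_1 & \vec{b}_2 + \vec{c}_2 & \cdots & \vec{b}_p + \vec{c}_p \\ | & | & & | \end{bmatrix} \\
&= \begin{bmatrix} | & | & & | \\ A(\vec{b}_1 + \vec{c}_1) & A(\vec{b}_2 + \vec{c}_2) & \cdots & A(\vec{b}_p + \vec{c}_p) \\ | & | & & | \end{bmatrix} \\
&= \begin{bmatrix} | & | & & | \\ A\vec{b}_1 + A\vec{c}_1 & A\vec{b}_2 + A\vec{c}_2 & \cdots & A\vec{b}_p + A\vec{c}_p \\ | & | & & | \end{bmatrix} \\
&= \begin{bmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{bmatrix} + \begin{bmatrix} | & | & & | \\ A\vec{c}_1 & A\vec{c}_2 & \cdots & A\vec{c}_p \\ | & | & & | \end{bmatrix} \\
&= A\begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix} + A\begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_p \\ | & | & & | \end{bmatrix} \\
&= AB + AC.
\end{aligned}
$$

Thus, we have proved the distributive property of matrix multiplication. It could also be proven in a similar manner using method 2 of matrix multiplication (multiplying each row of the first matrix by the second matrix to obtain each row of the product) instead of method 1. Similarly, we also have the property $(A + B)C = AC + BC$.

Associative Property

Now let us prove the associative property of matrix multiplication. Before proving it in general, let us first prove that $(AB)\vec{v} = A(B\vec{v})$ for any $m \times n$ matrix $A$, $n \times p$ matrix $B$, and $p$-dimensional vector $\vec{v}$. Let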
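$$
B = \begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}
\quad \text{and} \quad
\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix}.
$$

Then, we have

$$
\begin{aligned}
(AB)\vec{v}
&= \left( A\begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix} \right)\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix} \\
&= \begin{bmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix} \\
&= v_1 \cdot A\vec{b}_1 + v_2 \cdot A\vec{b}_2 + \cdots + v_p \cdot A\vec{b}_p \\
&= A(v_1\vec{b}_1 + v_2\vec{b}_2 + \cdots + v_p\vec{b}_p) \\
&= A\left( \begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_p \end{bmatrix} \right) \\
&= A(B\vec{v}).
\end{aligned}
$$

Note that the fourth step, where we factored out $A$, follows from the distributive property of matrix multiplication, together with $A(k\vec{u}) = k(A\vec{u})$ for any scalar $k$. (The latter holds because $A(k\vec{u}) = ku_1\vec{a}_1 + \cdots + ku_n\vec{a}_n = k(u_1\vec{a}_1 + \cdots + u_n\vec{a}_n)$.)

Next, we can use this result to prove the more general associative property $(AB)C = A(BC)$ for any $m \times n$ matrix $A$, $n \times p$ matrix $B$, and $p \times r$ matrix $C$. Again, in this proof, we will use method 1 of matrix multiplication, which is multiplying the first matrix by each column of the second matrix to obtain each column of the product. Let

$$
C = \begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_r \\ | & | & & | \end{bmatrix}.
$$

Then, we have

$$
\begin{aligned}
(AB)C
&= (AB)\begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_r \\ | & | & & | \end{bmatrix}
= \begin{bmatrix} | & | & & | \\ (AB)\vec{c}_1 & (AB)\vec{c}_2 & \cdots & (AB)\vec{c}_r \\ | & | & & | \end{bmatrix} \\
&= \begin{bmatrix} | & | & & | \\ A(B\vec{c}_1) & A(B\vec{c}_2) & \cdots & A(B\vec{c}_r) \\ | & | & & | \end{bmatrix}
= A\begin{bmatrix} | & | & & | \\ B\vec{c}_1 & B\vec{c}_2 & \cdots & B\vec{c}_r \\ | & | & & | \end{bmatrix} \\
&= A\left( B\begin{bmatrix} | & | & & | \\ \vec{c}_1 & \vec{c}_2 & \cdots & \vec{c}_r \\ | & | & & | \end{bmatrix} \right)
= A(BC).
\end{aligned}
$$

Thus, we have proved the associative property of matrix multiplication. The associative property could also be proven in a similar manner using method 2 of matrix multiplication instead of method 1.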
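Method 3 gives the most compact code, because each entry is a single dot product $c_{ij} = \vec{a}_i^T\vec{b}_j$. The sketch below uses it to spot-check both properties on concrete matrices. It is only a numerical illustration on one set of inputs, not a substitute for the proofs above. Here $A$ and $B$ come from Example 7.26, while `C` and `D` are hypothetical matrices chosen only to have compatible dimensions. The helper names are also our own.

```python
def matmul(A, B):
    """Method 3: c_ij is the dot product of row i of A with column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B] for row in A]


def add(A, B):
    """Element-by-element sum of two matrices with the same dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


A = [[-2, 3], [5, -4]]          # from Example 7.26
B = [[6, -1, 2], [3, 1, -4]]    # from Example 7.26
C = [[1, 0, 7], [-3, 2, 5]]     # hypothetical 2x3 matrix
D = [[2, 1], [0, -1], [4, 3]]   # hypothetical 3x2 matrix

print(matmul(A, B))                                             # [[-3, 5, -16], [18, -9, 26]]
print(matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)))  # True (distributive)
print(matmul(matmul(A, B), D) == matmul(A, matmul(B, D)))       # True (associative)
```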
7.5
Transpose of Product of Matrices
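In this section, we will prove the interesting formula $(AB)^T = B^TA^T$, where $A$ is any $m \times n$ matrix and $B$ is any $n \times p$ matrix. To prove that formula, we first need to prove that $(A\vec{v})^T = \vec{v}^TA^T$ for any $m \times n$ matrix $A$ and $n$-dimensional vector $\vec{v}$. Let

$$
A = \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}
\quad \text{and} \quad
\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.
$$

Then, we have

$$
(A\vec{v})^T
= \left( \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \right)^T
= (v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n)^T.
$$

Using the result $(\vec{u} + \vec{v})^T = \vec{u}^T + \vec{v}^T$ from question 2 of the exercises in chapter 5, we have

$$
(A\vec{v})^T = v_1\vec{a}_1^T + v_2\vec{a}_2^T + \cdots + v_n\vec{a}_n^T
= \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vdots \\ \vec{a}_n^T \end{bmatrix}.
$$

Note that

$$
\vec{v}^T = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}^T = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
\quad \text{and} \quad
A^T = \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}^T = \begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vdots \\ \vec{a}_n^T \end{bmatrix}
$$

because the transpose turns columns into rows. Therefore, we have proved that $(A\vec{v})^T = \vec{v}^TA^T$.

Now let us prove the more general formula $(AB)^T = B^TA^T$. We will use method 1 and method 2 of matrix multiplication in the proof. Let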
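$$
B = \begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix}.
$$

Then, we have

$$
\begin{aligned}
(AB)^T
&= \left( A\begin{bmatrix} | & | & & | \\ \vec{b}_1 & \vec{b}_2 & \cdots & \vec{b}_p \\ | & | & & | \end{bmatrix} \right)^T \\
&= \begin{bmatrix} | & | & & | \\ A\vec{b}_1 & A\vec{b}_2 & \cdots & A\vec{b}_p \\ | & | & & | \end{bmatrix}^T && (7.1) \\
&= \begin{bmatrix} (A\vec{b}_1)^T \\ (A\vec{b}_2)^T \\ \vdots \\ (A\vec{b}_p)^T \end{bmatrix}
= \begin{bmatrix} \vec{b}_1^TA^T \\ \vec{b}_2^TA^T \\ \vdots \\ \vec{b}_p^TA^T \end{bmatrix} \\
&= \begin{bmatrix} \vec{b}_1^T \\ \vec{b}_2^T \\ \vdots \\ \vec{b}_p^T \end{bmatrix} A^T && (7.2) \\
&= B^TA^T.
\end{aligned}
$$

Note that we used method 1 of matrix multiplication in (7.1) and method 2 of matrix multiplication in (7.2). This formula can be used to prove an interesting result in question 11 in the exercises.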
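The identity $(AB)^T = B^TA^T$ is easy to spot-check numerically. Below is a minimal sketch using the matrices of Example 7.17, with hypothetical helper names `transpose` and `matmul`. It also shows why the order must reverse: the "naive" product $A^TB^T$ is not even defined for these shapes. A passing check illustrates the formula; it does not prove it.

```python
def transpose(M):
    """Turn the columns of M into rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """c_ij is the dot product of row i of A with column j of B."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


A = [[3, 1, 4], [2, 7, 2], [0, 5, 8]]   # Example 7.17
B = [[3, -5], [-2, 1], [2, -4]]

lhs = transpose(matmul(A, B))             # (AB)^T, a 2x3 matrix
rhs = matmul(transpose(B), transpose(A))  # B^T A^T, also 2x3
print(lhs == rhs)  # True
print(lhs)         # [[15, -4, 6], [-30, -11, -27]]

try:
    matmul(transpose(A), transpose(B))    # A^T B^T would be (3x3)(2x3): undefined
except ValueError as err:
    print("A^T B^T:", err)
```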
Exercises 1 1. 3 3
−2 −1 +5 0 2 3 −3 1 − 7 5 8 −7
2 2. 10 3 6
10 3. −5
−9 6
4. −2
4
1 5. −5 2
4 =? 5 −4 0 =? 1
3 7 2 =? 3 1
5 2
3 4
1 =? 6
7 11 6 5 =? 8 −3
−3 0 1
6. Show that any matrix times the a11 a12 a21 a22 .. .. . .
1 7. −3 5
2 2 4 1 −6
2 8. 0
0 −2
3 9. 0 0
2 −1 3
6 −3
0 vector is the 0 vector: · · · a1n 0 0 0 0 · · · a2n .. .. = .. . .. . . . . ···
am1
am2
−6 0
0 −1
−8 =? 7
−1 5
3 =? 0
1 2 0 1 −5 0
−1 3 5
amn
0 −2 =? 0 92
0
0
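10. Show that the product of two diagonal matrices is a diagonal matrix:
$$
\begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix}
\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}
= \begin{bmatrix} c_1d_1 & 0 & \cdots & 0 \\ 0 & c_2d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_nd_n \end{bmatrix}
$$

11. For any matrix $A$, show that $A^TA$ is a symmetric matrix. (Hint: a symmetric matrix $S$ is a matrix such that $S^T = ?$)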
Chapter 8
Row Operations on Matrices
8.1
Three Types of Row Operations
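In this chapter, we will learn about row operations, and in the next chapter we will learn an application of them. Simply speaking, row operations are things we do to the rows of a matrix. There are three types of row operations.

Switching Rows

When doing row operations on a matrix, we can switch two rows with each other. Let us look at some examples.

Example 8.1: Switch row 1 and row 2 of $\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix}$.

$$
\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix} \to \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix}.
$$

Example 8.2: Switch row 1 and row 3 of $\begin{bmatrix} 3 & -1 & 10 \\ 2 & -2 & 8 \\ 1 & 5 & 0 \end{bmatrix}$.

$$
\begin{bmatrix} 3 & -1 & 10 \\ 2 & -2 & 8 \\ 1 & 5 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 0 \\ 2 & -2 & 8 \\ 3 & -1 & 10 \end{bmatrix}.
$$

Example 8.3: Switch row 2 and row 3 of $\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}$.

$$
\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{bmatrix}.
$$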
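Multiplying a Row by a Scalar

The second type of row operation is multiplying a row by a non-zero scalar. When we multiply row $i_1$ by $k$, we replace the original row $i_1$ with a new row vector obtained by multiplying the original row $i_1$ by $k$. Let us do some examples.

Example 8.4: Multiply row 1 by 3 for the matrix $\begin{bmatrix} 1 & 2 \\ -5 & 3 \end{bmatrix}$.

We replace row 1 with 3 times row 1:
$$
3\begin{bmatrix} 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 6 \end{bmatrix}.
$$
So,
$$
\begin{bmatrix} 1 & 2 \\ -5 & 3 \end{bmatrix} \to \begin{bmatrix} 3 & 6 \\ -5 & 3 \end{bmatrix}.
$$

Example 8.5: Multiply row 2 by $-5$ for the matrix $\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix}$.

We replace row 2 with $-5$ times row 2:
$$
(-5)\begin{bmatrix} 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} -5 & 10 & -10 \end{bmatrix}.
$$
So,
$$
\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix} \to \begin{bmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{bmatrix}.
$$

Example 8.6: Multiply row 3 by 2 for the matrix $\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}$.

We replace row 3 with 2 times row 3:
$$
2\begin{bmatrix} 6 & 3 & -2 \end{bmatrix} = \begin{bmatrix} 12 & 6 & -4 \end{bmatrix}.
$$
So,
$$
\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{bmatrix}.
$$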
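Adding a Multiple of Another Row to a Row

The third type of row operation is adding a multiple of another row to a row. When we add $k$ times row $i_1$ to row $i_2$, we replace the original row $i_2$ with a new row vector obtained by adding $k$ times row $i_1$ to row $i_2$. Let us do a few examples.

Example 8.7: Add 2 times row 1 to row 2 for the matrix $\begin{bmatrix} -1 & 2 \\ 3 & -5 \end{bmatrix}$.

Adding 2 times row 1 to row 2 gives
$$
2\begin{bmatrix} -1 & 2 \end{bmatrix} + \begin{bmatrix} 3 & -5 \end{bmatrix} = \begin{bmatrix} 1 & -1 \end{bmatrix}.
$$
So, we replace row 2 with this new row vector:
$$
\begin{bmatrix} -1 & 2 \\ 3 & -5 \end{bmatrix} \to \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}.
$$

Example 8.8: Add $-3$ times row 2 to row 3 for the matrix $\begin{bmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{bmatrix}$.

Adding $-3$ times row 2 to row 3 gives
$$
(-3)\begin{bmatrix} 0 & 5 \end{bmatrix} + \begin{bmatrix} 7 & 10 \end{bmatrix} = \begin{bmatrix} 7 & -5 \end{bmatrix}.
$$
So, we replace row 3 with this new row vector:
$$
\begin{bmatrix} -2 & 3 \\ 0 & 5 \\ 7 & 10 \end{bmatrix} \to \begin{bmatrix} -2 & 3 \\ 0 & 5 \\ 7 & -5 \end{bmatrix}.
$$

Example 8.9: Add row 1 to row 3 for the matrix $\begin{bmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 2 & 10 & 6 \end{bmatrix}$.

Adding row 1 to row 3 gives
$$
\begin{bmatrix} 1 & -1 & 5 \end{bmatrix} + \begin{bmatrix} 2 & 10 & 6 \end{bmatrix} = \begin{bmatrix} 3 & 9 & 11 \end{bmatrix}.
$$
So, we replace row 3 with this new row vector:
$$
\begin{bmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 2 & 10 & 6 \end{bmatrix} \to \begin{bmatrix} 1 & -1 & 5 \\ 3 & 7 & -8 \\ 3 & 9 & 11 \end{bmatrix}.
$$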
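The three row operations can be sketched as small functions that return a modified copy and leave the input untouched. This is a minimal illustration, assuming matrices are lists of rows with rows numbered from 1 as in the text. The function names `switch_rows`, `scale_row`, and `add_row_multiple` are hypothetical. The guards encode the two restrictions stated above: the scalar must be non-zero, and the added row must be a different row.

```python
def switch_rows(M, i1, i2):
    """Return a copy of M with rows i1 and i2 exchanged (rows counted from 1)."""
    R = [row[:] for row in M]
    R[i1 - 1], R[i2 - 1] = R[i2 - 1], R[i1 - 1]
    return R


def scale_row(M, i1, k):
    """Return a copy of M with row i1 multiplied by the non-zero scalar k."""
    if k == 0:
        raise ValueError("a row may only be multiplied by a non-zero scalar")
    R = [row[:] for row in M]
    R[i1 - 1] = [k * x for x in R[i1 - 1]]
    return R


def add_row_multiple(M, i1, i2, k):
    """Return a copy of M with row i2 replaced by k*(row i1) + (row i2)."""
    if i1 == i2:
        raise ValueError("the multiple must come from another row")
    R = [row[:] for row in M]
    R[i2 - 1] = [k * a + b for a, b in zip(R[i1 - 1], R[i2 - 1])]
    return R


print(switch_rows([[1, 5], [2, 6]], 1, 2))            # [[2, 6], [1, 5]]             (Example 8.1)
print(scale_row([[3, 6, -9], [1, -2, 2]], 2, -5))     # [[3, 6, -9], [-5, 10, -10]]  (Example 8.5)
print(add_row_multiple([[-1, 2], [3, -5]], 1, 2, 2))  # [[-1, 2], [1, -1]]           (Example 8.7)
```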
8.2
Row Operation as Matrix Multiplication
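Row operations performed on an $m \times n$ matrix can be expressed as multiplying an $m \times m$ matrix on the left of the $m \times n$ matrix. When expressing row operations as matrix multiplications like this, the second method of matrix multiplication learned in chapter 7 will be helpful. That method multiplies each row of the first matrix by the second matrix to obtain each row of the product.

Switching Rows

Let us look back at an example we did in the last section. In example 8.1, we saw that switching row 1 and row 2 of

$$
\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix} \qquad (8.1)
$$

gives

$$
\begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix}.
$$

Now consider the matrix multiplication

$$
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix}.
$$

The first row of the product is

$$
\begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix} = 0\begin{bmatrix} 1 & 5 \end{bmatrix} + 1\begin{bmatrix} 2 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 6 \end{bmatrix},
$$

so we can see that the original row 2 is now moved to the position of row 1. The second row of the product is

$$
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix} = 1\begin{bmatrix} 1 & 5 \end{bmatrix} + 0\begin{bmatrix} 2 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 5 \end{bmatrix},
$$

so the original row 1 is now moved to the position of row 2. Thus,

$$
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 2 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 6 \\ 1 & 5 \end{bmatrix},
$$

which is the same as the matrix obtained after switching row 1 and row 2 of the original matrix in (8.1). Do you notice anything particular about the matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ that we are multiplying on the left? It is the $2 \times 2$ identity matrix with row 1 and row 2 switched:

$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

Next, let us look at another example we did in the last section. In example 8.3, we saw that switching row 2 and row 3 of

$$
\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix} \qquad (8.2)
$$

gives

$$
\begin{bmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{bmatrix}.
$$

Now consider the matrix multiplication

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}.
$$

The first row of the product is

$$
\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}
= 1\begin{bmatrix} 1 & 2 & 3 & -5 \end{bmatrix} + 0\begin{bmatrix} -2 & 1 & 0 & 6 \end{bmatrix} + 0\begin{bmatrix} 5 & 7 & 8 & -9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 & -5 \end{bmatrix},
$$

so row 1 stays the same. The second row of the product is

$$
\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}
= 0\begin{bmatrix} 1 & 2 & 3 & -5 \end{bmatrix} + 0\begin{bmatrix} -2 & 1 & 0 & 6 \end{bmatrix} + 1\begin{bmatrix} 5 & 7 & 8 & -9 \end{bmatrix}
= \begin{bmatrix} 5 & 7 & 8 & -9 \end{bmatrix},
$$

so the original row 3 is now moved to the position of row 2. The third row of the product is

$$
\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}
= 0\begin{bmatrix} 1 & 2 & 3 & -5 \end{bmatrix} + 1\begin{bmatrix} -2 & 1 & 0 & 6 \end{bmatrix} + 0\begin{bmatrix} 5 & 7 & 8 & -9 \end{bmatrix}
= \begin{bmatrix} -2 & 1 & 0 & 6 \end{bmatrix},
$$

so the original row 2 is now moved to the position of row 3. Thus,

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & -5 \\ -2 & 1 & 0 & 6 \\ 5 & 7 & 8 & -9 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 & -5 \\ 5 & 7 & 8 & -9 \\ -2 & 1 & 0 & 6 \end{bmatrix},
$$

which is the same as the matrix obtained after switching row 2 and row 3 of the original matrix in (8.2). Again, do you notice anything particular about the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ that we are multiplying on the left? It is the $3 \times 3$ identity matrix with row 2 and row 3 switched:

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
$$

So, here is the general rule: switching row $i_1$ and row $i_2$ of an $m \times n$ matrix $A$ is the same as multiplying an $m \times m$ matrix $E_1$ on the left of $A$, where $E_1$ is the $m \times m$ identity matrix with row $i_1$ and row $i_2$ switched. For example, switching row 2 and row 4 of a $4 \times 3$ matrix is the same as multiplying the $4 \times 4$ identity matrix with row 2 and row 4 switched,

$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix},
$$

on the left of the $4 \times 3$ matrix.

Multiplying a Row by a Scalar

Next, let us see how this type of row operation can be expressed as a matrix multiplication. In example 8.5, we saw that multiplying row 2 by $-5$ for the matrix

$$
\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix} \qquad (8.3)
$$

gives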
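$$
\begin{bmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{bmatrix}.
$$

Now consider the matrix multiplication

$$
\begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix}.
$$

The first row of the product is

$$
\begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix} = 1\begin{bmatrix} 3 & 6 & -9 \end{bmatrix} + 0\begin{bmatrix} 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 6 & -9 \end{bmatrix},
$$

so row 1 stays the same. The second row of the product is

$$
\begin{bmatrix} 0 & -5 \end{bmatrix}\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix} = 0\begin{bmatrix} 3 & 6 & -9 \end{bmatrix} + (-5)\begin{bmatrix} 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} -5 & 10 & -10 \end{bmatrix},
$$

so row 2 is multiplied by $-5$. Thus,

$$
\begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}\begin{bmatrix} 3 & 6 & -9 \\ 1 & -2 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 6 & -9 \\ -5 & 10 & -10 \end{bmatrix},
$$

which is the same as the matrix obtained after multiplying row 2 by $-5$ for the original matrix in (8.3). Do you notice anything particular about the matrix $\begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$ that we are multiplying on the left? It is the $2 \times 2$ identity matrix with row 2 multiplied by $-5$:

$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}.
$$

Next, let us look at another example from the last section. In example 8.6, we saw that multiplying row 3 by 2 for the matrix

$$
\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix} \qquad (8.4)
$$

gives

$$
\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{bmatrix}.
$$

Now consider the matrix multiplication

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}.
$$

The first row of the product is

$$
\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}
= 1\begin{bmatrix} 1 & 5 & 12 \end{bmatrix} + 0\begin{bmatrix} -9 & -3 & 10 \end{bmatrix} + 0\begin{bmatrix} 6 & 3 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 5 & 12 \end{bmatrix},
$$

so row 1 stays the same. The second row of the product is

$$
\begin{bmatrix} 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}
= 0\begin{bmatrix} 1 & 5 & 12 \end{bmatrix} + 1\begin{bmatrix} -9 & -3 & 10 \end{bmatrix} + 0\begin{bmatrix} 6 & 3 & -2 \end{bmatrix}
= \begin{bmatrix} -9 & -3 & 10 \end{bmatrix},
$$

so row 2 stays the same. The third row of the product is

$$
\begin{bmatrix} 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}
= 0\begin{bmatrix} 1 & 5 & 12 \end{bmatrix} + 0\begin{bmatrix} -9 & -3 & 10 \end{bmatrix} + 2\begin{bmatrix} 6 & 3 & -2 \end{bmatrix}
= \begin{bmatrix} 12 & 6 & -4 \end{bmatrix},
$$

so row 3 is multiplied by 2. Thus,

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 6 & 3 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 5 & 12 \\ -9 & -3 & 10 \\ 12 & 6 & -4 \end{bmatrix},
$$

which is the same as the matrix obtained after multiplying row 3 by 2 for the original matrix in (8.4). Again, do you notice anything particular about the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$ that we are multiplying on the left? It is the $3 \times 3$ identity matrix with row 3 multiplied by 2:

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
$$

So, we have the following general rule: multiplying row $i_1$ by $k$ for an $m \times n$ matrix $A$ is the same as multiplying an $m \times m$ matrix $E_2$ on the left of $A$, where $E_2$ is the $m \times m$ identity matrix with row $i_1$ multiplied by $k$. For example, multiplying row 2 by $-1$ for a $4 \times 2$ matrix is the same as multiplying the $4 \times 4$ identity matrix with row 2 multiplied by $-1$,

$$
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
$$

on the left of the $4 \times 2$ matrix.

Adding a Multiple of Another Row to a Row

Now let us see how to express this type of row operation as a matrix multiplication. In example 8.7, we saw that adding 2 times row 1 to row 2 for the matrix

$$
\begin{bmatrix} -1 & 2 \\ 3 & -5 \end{bmatrix} \qquad (8.5)
$$

gives

$$
\begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}.
$$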
Row 1 stays the same, so we have 1 −1
2 +0 3
−5 = 1
−1 0 3
2 . −5
Row 2 is replaced with 2 times the original row 1 added to the original row 2, so we have −1 2 2 −1 2 + 1 3 −5 = 2 1 . 3 −5 Thus, we have 1 2
0 −1 1 3
2 −1 = −5 1
2 . −1
So, multiplying 1 2
0 1
to the left of the original matrix in (8.5) is the same as adding 2 times row 1 to row 2 for that matrix. Do you notice anything particular about the matrix which we are multiplying at the left? It is the 2 × 2 identity matrix with 2 times row 1 added to row 2: 1 0 1 0 → . 0 1 2 1 Next, let us look at example 8.8. In that row 2 to row 3 for the matrix −2 0 7
example, we saw that adding −3 times 3 5 (8.6) 10
gives −2 0 7
3 5 . −5
Row 1 stays the same, so we have 1 −2
3 +0 0
5 +0 7
10 = 1
0
−2 0 0 7
3 5 . 10
1
−2 0 0 7
3 5 . 10
Row 2 also stays the same, so we have 0 −2
3 +1 0
5 +0 7
10 = 0
Row 3 is replaced with −3 times the original row 2 added to the original row 3, so we have −2 3 5 . 0 −2 3 + (−3) 0 5 + 1 7 10 = 0 −3 1 0 7 10 102
Thus, we have 1 0 0
0 1 −3
0 −2 0 0 1 7
3 −2 5= 0 10 7
3 5 . −5
So, multiplying
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}$$
to the left of the original matrix in (8.6) is the same as adding −3 times row 2 to row 3 for that matrix. Again, do you notice anything particular about the matrix which we are multiplying at the left? It is the 3 × 3 identity matrix with −3 times row 2 added to row 3:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}.$$
Here is the general rule: adding $k$ times row $i_1$ to row $i_2$ of an $m \times n$ matrix $A$ is the same as multiplying $A$ on the left by an $m \times m$ matrix $E_3$, where $E_3$ is the $m \times m$ identity matrix with $k$ times row $i_1$ added to row $i_2$. For example, adding row 1 to row 3 of a 3 × 5 matrix is the same as multiplying the 3 × 5 matrix on the left by the 3 × 3 identity matrix with row 1 added to row 3,
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$
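To double-check the general rules for $E_2$ and $E_3$, here is a small Python sketch that builds these elementary matrices from the identity matrix and multiplies them on the left. The helper names (`identity`, `matmul`, `scale_row_matrix`, `add_row_matrix`) are our own, and the code counts rows from 0, so row 1 in the text is index 0 in the code.

```python
def identity(m):
    """Return the m x m identity matrix as a list of lists."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def matmul(P, Q):
    """Return the matrix product PQ (P is a x b, Q is b x c)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def scale_row_matrix(m, i, k):
    """E2: the m x m identity with row i multiplied by k."""
    E = identity(m)
    E[i][i] = k
    return E


def add_row_matrix(m, i1, i2, k):
    """E3: the m x m identity with k times row i1 added to row i2 (i1 != i2)."""
    E = identity(m)
    E[i2][i1] += k
    return E


# Example 8.7: adding 2 times row 1 to row 2 of the matrix in (8.5)
A85 = [[-1, 2], [3, -5]]
print(matmul(add_row_matrix(2, 0, 1, 2), A85))    # [[-1, 2], [1, -1]]

# Example 8.8: adding -3 times row 2 to row 3 of the matrix in (8.6)
A86 = [[-2, 3], [0, 5], [7, 10]]
print(matmul(add_row_matrix(3, 1, 2, -3), A86))   # [[-2, 3], [0, 5], [7, -5]]

# The elementary matrices from the two "for example" sentences above
print(scale_row_matrix(4, 1, -1))  # 4 x 4 identity with row 2 multiplied by -1
print(add_row_matrix(3, 0, 2, 1))  # [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
```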
As you have seen, every matrix that we multiply on the left to perform a row operation is simply the identity matrix with that same row operation applied to it. These matrices, which perform row operations on other matrices when multiplied on the left, are called elementary matrices. Well, what do we do when we perform two or more row operations successively on the same matrix? How do we express that as a matrix multiplication? We multiply by the corresponding elementary matrices one after another, placing each new one to the left of the previous ones, so the elementary matrix for the first row operation ends up closest to the matrix. For example, switching row 2 and row 3 and then multiplying row 2 by 5 for a 3 × 2 matrix B is the same as multiplying B on the left first by
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$
which switches row 2 and row 3, and then by
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
which multiplies row 2 by 5:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}B.$$
If we want to express it as a multiplication of two matrices instead of three, we just need to multiply the two elementary matrices together:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 5 \\ 0 & 1 & 0 \end{bmatrix}B.$$
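The combined matrix, and the fact that the order of the elementary matrices matters, can be checked numerically. In this sketch `B` is an arbitrary 3 × 2 matrix chosen only for illustration (the text leaves B unspecified). The helpers are our own; they are redefined here so the block runs on its own, and `swap_rows_matrix` is one more hypothetical helper of ours.

```python
def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def swap_rows_matrix(m, i1, i2):
    """Identity matrix with rows i1 and i2 switched (0-based indices)."""
    E = identity(m)
    E[i1], E[i2] = E[i2], E[i1]
    return E


def scale_row_matrix(m, i, k):
    """Identity matrix with row i multiplied by k (0-based index)."""
    E = identity(m)
    E[i][i] = k
    return E


B = [[1, 2], [3, 4], [5, 6]]      # illustrative 3 x 2 matrix
P = swap_rows_matrix(3, 1, 2)     # first: switch row 2 and row 3
S = scale_row_matrix(3, 1, 5)     # then: multiply row 2 by 5

combined = matmul(S, P)           # the later operation goes on the left
print(combined)                   # [[1, 0, 0], [0, 0, 5], [0, 1, 0]]

# Perform the same two row operations directly on a copy of B
direct = [row[:] for row in B]
direct[1], direct[2] = direct[2], direct[1]
direct[1] = [5 * x for x in direct[1]]

print(matmul(combined, B) == direct)  # True
print(matmul(P, S) == combined)       # False: the order of the factors matters
```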
Exercises −1 3 1. Let A = 7 −2
2 6 . 10 −3
5 0 −9 8
a) What is the resulting matrix after switching row 2 and row 4?
b) If we express this row operation as a matrix multiplication $E_A A$, $E_A = ?$
2. Let $B = \begin{bmatrix} 3 & -2 & 1 & 0 \\ 5 & -3 & 6 & 2 \end{bmatrix}$.
a) What is the resulting matrix after multiplying row 1 by 2?
b) If we express this row operation as a matrix multiplication $E_B B$, $E_B = ?$
3. Let $C = \begin{bmatrix} 6 & 11 & -3 \\ 3 & -5 & 6 \\ 1 & 2 & -1 \end{bmatrix}$.
a) What is the resulting matrix after adding −3 times row 3 to row 1?
b) If we express this row operation as a matrix multiplication $E_C C$, $E_C = ?$
4. Let $D = \begin{bmatrix} -2 & 7 \\ 3 & -1 \\ 6 & -8 \end{bmatrix}$.
a) What is the resulting matrix after switching row 1 and row 3 and then adding row 3 to row 2?
b) If we express these row operations as a matrix multiplication $E_D D$, $E_D = ?$
Chapter 9
Reduced Row Echelon Form and Rank of Matrices
9.1
Reduced Row Echelon Form
Any matrix can be reduced to reduced row echelon form, and the reduced row echelon form of a matrix A is denoted rref(A). In this section, we will learn how to reduce a matrix to reduced row echelon form by applying the row operations we learned in the last chapter. Here is how to reduce a matrix A to reduced row echelon form:
1. Start with the number at the top left corner (row 1, column 1). If that number is 0, switch rows so that the number at the top left corner is non-zero. (If every number in column 1 is 0, move on to column 2 and start there instead.)
2. Then, do row operations on the matrix so that all elements below it in the same column become 0 and the number we started with becomes 1.
3. Go down to the next row and start with the first non-zero number (from left to right) in that row. If all elements in that row are 0, go down one more row to see if there is a non-zero number and start with that number.
4. Then, do row operations on the matrix so that all elements above and below it in the same column become 0 and the new number we started with becomes 1.
5. Repeat steps 3 and 4 until we reach the last row. Then, switch rows so that the rows with all 0's come to the bottom and the leading 1's move further to the right as we go down the rows.
6. The resulting matrix is the reduced row echelon form of A.
The leading 1's created in steps 2 and 4 are called pivots, and the columns that contain a pivot (a 1 with 0's at all other places in that column) are called pivot columns. Now let us look at some examples to understand this better.
Example 9.1: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$, rref(A) = ?
First, we start with the number at the top left corner, which is 2. Then, we need to do row operations so that this 2 becomes 1 and the 6 below it becomes 0. Multiplying row 1 by $\frac{1}{2}$,
$$\begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & \frac{1}{2} \\ 6 & -3 \end{bmatrix}.$$
Adding −6 times row 1 to row 2,
$$\begin{bmatrix} 1 & \frac{1}{2} \\ 6 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & \frac{1}{2} \\ 0 & -6 \end{bmatrix}.$$
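Next, we go down to row 2 and start with the first non-zero number in row 2, which is −6. Then, we need to do row operations so that this −6 becomes 1 and the $\frac{1}{2}$ above it becomes 0. Multiplying row 2 by $-\frac{1}{6}$,
$$\begin{bmatrix} 1 & \frac{1}{2} \\ 0 & -6 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{bmatrix}.$$
Adding $-\frac{1}{2}$ times row 2 to row 1,
$$\begin{bmatrix} 1 & \frac{1}{2} \\ 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Therefore,
$$\text{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$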
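Column 1 and column 2 are pivot columns.
Example 9.2: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$, rref(B) = ?
First, we start with the number at the top left corner, which is −2. Then, we need to do row operations so that this −2 becomes 1 and the 10 and 8 below it become 0. Multiplying row 1 by $-\frac{1}{2}$,
$$\begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}.$$
Adding −10 times row 1 to row 2,
$$\begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 6 & -12 \\ 8 & -6 & 12 \end{bmatrix}.$$
Adding −8 times row 1 to row 3,
$$\begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 6 & -12 \\ 8 & -6 & 12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 6 & -12 \\ 0 & 6 & -12 \end{bmatrix}.$$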
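Next, we go down to row 2 and start with the first non-zero number in row 2, which is 6. Then, we need to do row operations so that this 6 becomes 1 and the $-\frac{3}{2}$ above it and the 6 below it become 0. Multiplying row 2 by $\frac{1}{6}$,
$$\begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 6 & -12 \\ 0 & 6 & -12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 1 & -2 \\ 0 & 6 & -12 \end{bmatrix}.$$
Adding $\frac{3}{2}$ times row 2 to row 1,
$$\begin{bmatrix} 1 & -\frac{3}{2} & 3 \\ 0 & 1 & -2 \\ 0 & 6 & -12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 6 & -12 \end{bmatrix}.$$
Adding −6 times row 2 to row 3,
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 6 & -12 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$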
Next, we go down to row 3. Since row 3, the last row, is all 0's, there is nothing more to do. Therefore,
$$\text{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Column 1 and column 2 are pivot columns.
Example 9.3: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$, rref(C) = ?
First, we start with the number at the top left corner, which is 1. Then, we need to do row operations so that this number becomes 1 and the 3 and −2 below it become 0. (Since the number at the top left corner is already 1, we do not need any row operation for that part.) Adding −3 times row 1 to row 2,
$$\begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 \\ 0 & 8 \\ -2 & 6 \end{bmatrix}.$$
Adding 2 times row 1 to row 3,
$$\begin{bmatrix} 1 & -1 \\ 0 & 8 \\ -2 & 6 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 \\ 0 & 8 \\ 0 & 4 \end{bmatrix}.$$
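Next, we go down to row 2 and start with the first non-zero number in row 2, which is 8. Then, we need to do row operations so that this 8 becomes 1 and the −1 above it and the 4 below it become 0. Multiplying row 2 by $\frac{1}{8}$,
$$\begin{bmatrix} 1 & -1 \\ 0 & 8 \\ 0 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 4 \end{bmatrix}.$$
Adding row 2 to row 1,
$$\begin{bmatrix} 1 & -1 \\ 0 & 1 \\ 0 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 4 \end{bmatrix}.$$
Adding −4 times row 2 to row 3,
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 4 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Next, we go down to row 3. Since row 3, the last row, is all 0's, there is nothing more to do. Therefore,
$$\text{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$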
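Column 1 and column 2 are pivot columns.
Example 9.4: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$, rref(D) = ?
First, we start with the number at the top left corner, which is 3. Then, we need to do row operations so that this 3 becomes 1 and the 5 below it becomes 0. Multiplying row 1 by $\frac{1}{3}$,
$$\begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 3 & 2 \\ 5 & 15 & 7 \end{bmatrix}.$$
Adding −5 times row 1 to row 2,
$$\begin{bmatrix} 1 & 3 & 2 \\ 5 & 15 & 7 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 3 & 2 \\ 0 & 0 & -3 \end{bmatrix}.$$
Next, we go down to row 2 and start with the first non-zero number in row 2, which is −3. Then, we need to do row operations so that this −3 becomes 1 and the 2 above it becomes 0. Multiplying row 2 by $-\frac{1}{3}$,
$$\begin{bmatrix} 1 & 3 & 2 \\ 0 & 0 & -3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 3 & 2 \\ 0 & 0 & 1 \end{bmatrix}.$$
Adding −2 times row 2 to row 1,
$$\begin{bmatrix} 1 & 3 & 2 \\ 0 & 0 & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Therefore,
$$\text{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$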
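Column 1 and column 3 are pivot columns.
Example 9.5: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$, rref(E) = ?
First, we start with the number at the top left corner, which is 1. Then, we need to do row operations so that this number becomes 1 and the 2 and 5 below it become 0. (Since the number at the top left corner is already 1, we do not need any row operation for that part.) Adding −2 times row 1 to row 2,
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 \end{bmatrix}.$$
Adding −5 times row 1 to row 3,
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 \end{bmatrix}.$$
Next, we go down to row 2. Since row 2 is all 0's, we go down to row 3 and start with the first non-zero number in row 3, which is −7. Then, we need to do row operations so that this −7 becomes 1 and the 3 above it becomes 0. Multiplying row 3 by $-\frac{1}{7}$,
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$
Adding −3 times row 3 to row 1,
$$\begin{bmatrix} 1 & 5 & 3 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}.$$
Since row 2 in the middle is a 0 row, we switch row 2 and row 3 so that the 0 row comes to the bottom:
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Therefore,
$$\text{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Column 1 and column 3 are pivot columns.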
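The reduction can be sketched in Python with exact fractions from the standard `fractions` module, so that numbers like $-\frac{3}{2}$ stay exact. This is a sketch of the usual Gauss-Jordan bookkeeping rather than a line-by-line transcription of the six steps above. Instead of starting from the first non-zero number of the current row, it scans column by column for a non-zero number at or below the current row and switches that row up. This leaves the zero rows at the bottom automatically and ends in the same reduced row echelon form. The function name `rref` and its return value (the reduced matrix plus the 0-based pivot column indices) are our own choices.

```python
from fractions import Fraction


def rref(A):
    """Return (R, pivot_cols): the reduced row echelon form of A and the
    0-based indices of its pivot columns, using exact Fraction arithmetic."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivot_cols = []
    r = 0  # row in which the next pivot will be placed
    for c in range(n):
        # Look for a non-zero number in column c, at or below row r
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        R[r], R[p] = R[p], R[r]            # switch rows
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]     # make the pivot 1
        for i in range(m):                 # make the rest of column c zero
            if i != r and R[i][c] != 0:
                k = R[i][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]  # add -k times row r
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    return R, pivot_cols


examples = {
    "A": [[2, 1], [6, -3]],
    "B": [[-2, 3, -6], [10, -9, 18], [8, -6, 12]],
    "C": [[1, -1], [3, 5], [-2, 6]],
    "D": [[3, 9, 6], [5, 15, 7]],
    "E": [[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]],
}
for name, M in examples.items():
    R, pivots = rref(M)
    print(name, [[str(x) for x in row] for row in R],
          "pivot columns:", [c + 1 for c in pivots])  # 1-based, as in the text
```

For each of the five matrices, the printed result should agree with Examples 9.1 to 9.5. For instance, E reduces to rows 1 5 0 −3, 0 0 1 3, 0 0 0 0 with pivot columns 1 and 3.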
9.2
Rank of Matrices
Notice that when we reduce a matrix to reduced row echelon form, each pivot is in a different column and a different row from the other pivots. In other words, each row and each column has at most one pivot. Also, note that the rows that do not have pivots are 0 rows. The number and positions of the pivots in the reduced row echelon form of a matrix play important roles in linear algebra. In this section, we will focus on the meaning of the number of pivots, which is the rank of the matrix. For example, if the reduced row echelon form of a matrix A has two pivots, then we say that the matrix A has rank 2. The rank of a matrix is also the largest number of linearly independent rows we can pick from that matrix. For example, let us say we have a matrix of five rows. If row 1, row 2, and row 4 are linearly independent, and row 3 and row 5 can be written as linear combinations of row 1, row 2, and row 4, then that matrix has three linearly independent rows, and its rank is 3. It should not be too hard to understand why the number of pivots in the reduced row echelon form is equal to the number of linearly independent rows. Let us think of an example. Say we have a matrix of three rows with row 1 and row 2 being linearly independent and row 3 being linearly dependent on row 1 and row 2; specifically, row 3 equals row 1 plus row 2. Recall that we obtain the reduced row echelon form by doing row operations on the matrix. By adding −1 times row 1 to row 3 and then adding −1 times row 2 to row 3, row 3 becomes a 0 row, so only row 1 and row 2 of the reduced row echelon form have pivots. In general, the linearly dependent rows are cancelled out when doing row operations, and the linearly independent rows remain non-zero and contain pivots. There is also a nice fact that the number of linearly independent rows of a matrix is equal to the number of linearly independent columns of that matrix. However, the proof of that is slightly complicated, so we will not prove it here. To summarize, the number of pivots in the reduced row echelon form of a matrix is the rank of the matrix, which is also the number of linearly independent rows and the number of linearly independent columns of the matrix. Now let us look at some examples.
Example 9.6: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. What is the rank of A?
In example 9.1, we found that
$$\text{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Since there are two pivots, the rank of A is 2.
Example 9.7: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. How many linearly independent rows are there in B?
In example 9.2, we found that
$$\text{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since there are two pivots, B has two linearly independent rows.
Example 9.8: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. How many linearly independent columns are there in C?
In example 9.3, we found that
$$\text{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Since there are two pivots, C has two linearly independent columns.
Example 9.9: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. What is the rank of D?
In example 9.4, we found that
$$\text{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Since there are two pivots, the rank of D is 2.
Example 9.10: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. How many linearly independent columns are there in E?
In example 9.5, we found that
$$\text{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since there are two pivots, E has two linearly independent columns.
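Since the rank is just the number of pivots, it can be counted during elimination without normalizing anything. The sketch below uses helper names of our own, `rank` and `transpose`. It first builds the three-row example from the text, where row 3 is row 1 plus row 2, using made-up entries for rows 1 and 2. It then compares the rank of each example matrix with the rank of its transpose, which is one way to see the row/column fact stated above in action.

```python
from fractions import Fraction


def rank(A):
    """Count the pivots produced by Gauss-Jordan elimination (exact arithmetic)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    r = 0  # number of pivots found so far
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        R[r], R[p] = R[p], R[r]
        for i in range(m):  # clear column c in every other row
            if i != r:
                k = R[i][c] / R[r][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return r


def transpose(A):
    return [list(col) for col in zip(*A)]


# Row 3 = row 1 + row 2 (rows 1 and 2 are made-up, linearly independent rows)
row1, row2 = [1, 2, 0], [0, 1, 4]
M = [row1, row2, [a + b for a, b in zip(row1, row2)]]
print(rank(M))  # 2

examples = {
    "A": [[2, 1], [6, -3]],
    "B": [[-2, 3, -6], [10, -9, 18], [8, -6, 12]],
    "C": [[1, -1], [3, 5], [-2, 6]],
    "D": [[3, 9, 6], [5, 15, 7]],
    "E": [[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]],
}
for name, X in examples.items():
    print(name, rank(X), rank(transpose(X)))  # each line shows rank 2 twice
```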
Exercises
1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix}$.
a) rref($A_1$) = ?
b) Which column(s) of rref($A_1$) is/are pivot column(s)?
c) What is the rank of $A_1$?
d) How many linearly independent rows are there in $A_1$?
e) How many linearly independent columns are there in $A_1$?
2. Let $A_2 = \begin{bmatrix} -3 & 2 & 9 \\ 6 & -4 & -18 \end{bmatrix}$.
a) rref($A_2$) = ?
b) Which column(s) of rref($A_2$) is/are pivot column(s)?
c) What is the rank of $A_2$?
d) How many linearly independent rows are there in $A_2$?
e) How many linearly independent columns are there in $A_2$?
3. Let $A_3 = \begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix}$.
a) rref($A_3$) = ?
b) Which column(s) of rref($A_3$) is/are pivot column(s)?
c) What is the rank of $A_3$?
d) How many linearly independent rows are there in $A_3$?
e) How many linearly independent columns are there in $A_3$?
4. Let $A_4 = \begin{bmatrix} -6 & 3 \\ 1 & -2 \\ 5 & 1 \end{bmatrix}$.
a) rref($A_4$) = ?
b) Which column(s) of rref($A_4$) is/are pivot column(s)?
c) What is the rank of $A_4$?
d) How many linearly independent rows are there in $A_4$?
e) How many linearly independent columns are there in $A_4$?
Chapter 10
The Four Fundamental Subspaces
10.1
What are Four Fundamental Subspaces?
The four fundamental subspaces are four vector spaces associated with a matrix: the column space, the row space, the null space, and the left null space. Given an $m \times n$ matrix $A$, each of them is a subspace of $\mathbb{R}^m$ or $\mathbb{R}^n$:
• The column space is the vector space of all linear combinations of columns of $A$ and is a subspace of $\mathbb{R}^m$;
• The row space is the vector space of all linear combinations of rows of $A$ and is a subspace of $\mathbb{R}^n$;
• The null space is the vector space of all $n$-dimensional vectors $\vec{v}$ such that $A\vec{v} = 0$ and is a subspace of $\mathbb{R}^n$;
• The left null space is the vector space of all $m$-dimensional vectors $\vec{u}$ such that $\vec{u}^T A = 0$ and is a subspace of $\mathbb{R}^m$.
Now let us learn about these four spaces in more detail. If you are not yet very familiar with the concept of vector spaces, I recommend that you review chapter 3 before moving on with this chapter.
10.2
Column Space
The column space of A, denoted C(A), is the vector space of all linear combinations of columns of A, so it is the span of the columns of A. However, remember from chapter 3 that a linearly dependent vector adds nothing to a span: it is already a linear combination of the other vectors, so removing it does not change the vector space. So, C(A) is also the span of just the linearly independent columns of A, and these columns serve as basis vectors for C(A). In the last chapter, we learned how to determine how many linearly independent columns there are in a matrix. In this section, we will learn how to determine which columns are linearly independent so that we can find the basis vectors for the column space of that matrix. To determine which columns of A are linearly independent, we use the reduced row echelon form of A. The positions of the pivot columns of rref(A) are the same as the positions of the linearly independent columns of A. For example, if column 1 and column 3 of rref(A) are pivot columns, then column 1 and column 3 of A are the linearly independent columns of A. The full proof of this is slightly complicated, so we will not give it here. Instead, think of it intuitively like this: row operations do not change which linear combinations of the columns add up to the 0 vector (we will see why in section 10.4), so because the pivot columns of rref(A) are linearly independent, the columns of A at the same positions are also linearly independent. It should not be too difficult to see why the pivot columns are linearly independent. Each pivot column has one 1 and 0's everywhere else, and the 1's of different pivot columns are at different places. For example, at most one pivot column has 1 as its first element, at most one pivot column has 1 as its second element, and so on. As an example, let us consider the following vectors, which are some possible pivot columns of the reduced row echelon form of a matrix:
$$\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$
Is there any linear combination of these vectors, with coefficients that are not all 0, which gives the 0 vector? In other words, is it possible to have
$$c_1\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
where $c_1 \neq 0$, $c_2 \neq 0$, or $c_3 \neq 0$? The answer is no. The left side is the vector with entries $c_1, c_2, c_3, 0$, so it is the 0 vector only when $c_1 = c_2 = c_3 = 0$. Thus, those three vectors are linearly independent. Now let us do some examples of finding the column space of a matrix as the span of linearly independent columns.
Example 10.1: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find C(A).
In example 9.1, we found that
$$\text{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
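Since column 1 and column 2 of rref(A) are the pivot columns, column 1 and column 2 of A are linearly independent, and thus
$$C(A) = \text{span}\left\{ \begin{bmatrix} 2 \\ 6 \end{bmatrix}, \begin{bmatrix} 1 \\ -3 \end{bmatrix} \right\}.$$
Because C(A) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.2: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find C(B).
In example 9.2, we found that
$$\text{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$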
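Since column 1 and column 2 of rref(B) are the pivot columns, column 1 and column 2 of B are linearly independent, and thus
$$C(B) = \text{span}\left\{ \begin{bmatrix} -2 \\ 10 \\ 8 \end{bmatrix}, \begin{bmatrix} 3 \\ -9 \\ -6 \end{bmatrix} \right\}.$$
Because C(B) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.3: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find C(C).
In example 9.3, we found that
$$\text{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$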
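Since column 1 and column 2 of rref(C) are the pivot columns, column 1 and column 2 of C are linearly independent, and thus
$$C(C) = \text{span}\left\{ \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}, \begin{bmatrix} -1 \\ 5 \\ 6 \end{bmatrix} \right\}.$$
Because C(C) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.4: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find C(D).
In example 9.4, we found that
$$\text{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$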
Since column 1 and column 3 of rref(D) are the pivot columns, column 1 and column 3 of D are linearly independent, and thus
$$C(D) = \text{span}\left\{ \begin{bmatrix} 3 \\ 5 \end{bmatrix}, \begin{bmatrix} 6 \\ 7 \end{bmatrix} \right\}.$$
Because C(D) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.5: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find C(E).
In example 9.5, we found that
$$\text{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since column 1 and column 3 of rref(E) are the pivot columns, column 1 and column 3 of E are linearly independent, and thus
$$C(E) = \text{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ 8 \end{bmatrix} \right\}.$$
Because C(E) is the span of two linearly independent vectors, it is a two-dimensional vector space.
There are a few more things to note about the column space before we move on to the row space. First, the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. Note that each column of an $m \times n$ matrix has $m$ elements, so it is an $m$-dimensional vector. Remember from chapter 3 that the span of $m$-dimensional linearly independent vectors is a subspace of $\mathbb{R}^m$. So, the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. For example, the column space of a 3 × 2 matrix is a subspace of $\mathbb{R}^3$. Second, the column space of $A$ is the same as the row space of $A^T$. As we will learn in the next section, the row space of a matrix is the vector space of all linear combinations of rows of that matrix. When we take the transpose of a matrix, the columns of $A$ become the rows of $A^T$. So, linear combinations of columns of $A$ are the same as linear combinations of rows of $A^T$.
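In code, finding a basis of C(A) takes two steps. First, locate the pivot columns of rref(A). Then read off the columns of the original matrix A at those positions, not the columns of rref(A). Below is a minimal sketch; the helper names `pivot_columns` and `column_space_basis` are our own.

```python
from fractions import Fraction


def pivot_columns(A):
    """0-based indices of the pivot columns of rref(A), via Gauss-Jordan elimination."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # column c has no pivot
        R[r], R[p] = R[p], R[r]
        for i in range(m):
            if i != r:
                k = R[i][c] / R[r][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return pivots


def column_space_basis(A):
    """Columns of the ORIGINAL matrix A at the pivot positions of rref(A)."""
    return [[row[c] for row in A] for c in pivot_columns(A)]


E = [[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]]
print(column_space_basis(E))  # [[1, 2, 5], [3, 6, 8]]  (Example 10.5)

D = [[3, 9, 6], [5, 15, 7]]
print(column_space_basis(D))  # [[3, 5], [6, 7]]  (Example 10.4)
```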
10.3
Row Space
The row space of A, denoted R(A), is the vector space of all linear combinations of rows of A. Just as C(A) is the span of the linearly independent columns, R(A) is the span of the linearly independent rows, because linearly dependent rows add nothing to the span. However, for R(A), we do not need to use the linearly independent rows of A as the basis vectors, although we can. Instead, we can write R(A) as the span of the non-zero rows of rref(A). Remember that we obtain rref(A) by doing row operations on A. When we do row operations, we switch rows, multiply a row by a non-zero scalar, and/or add a multiple of a row to another row. So, the new rows of rref(A) are linear combinations of rows of A. Also, as explained in section 9.2 of the last chapter, the number of non-zero rows of rref(A) is the same as the number of linearly independent rows of A, and the non-zero rows of rref(A) are linearly independent because they would have been cancelled out to become 0 rows if they were linearly dependent instead. If we have a set of linearly independent vectors and another set containing the same number of linearly independent vectors, each of which can be written as a linear combination of the vectors of the first set, then the span of the second set is the same as the span of the first set. Let us see why this is true with an example of two vectors. Say we have a set of two linearly independent vectors $\vec{v}_1$ and $\vec{v}_2$ and another set of two linearly independent vectors $\vec{u}_1$ and $\vec{u}_2$, where $\vec{u}_1$ and $\vec{u}_2$ can be written as linear combinations of $\vec{v}_1$ and $\vec{v}_2$, i.e. $\vec{u}_1 = c_1\vec{v}_1 + c_2\vec{v}_2$ and $\vec{u}_2 = c_3\vec{v}_1 + c_4\vec{v}_2$ for some scalars $c_1$, $c_2$, $c_3$, and $c_4$. Then, for some scalars $a$ and $b$, a vector $\vec{w}$ in the span of $\vec{u}_1$ and $\vec{u}_2$ is a vector of the form
$$\begin{aligned} \vec{w} &= a\vec{u}_1 + b\vec{u}_2 \\ &= a(c_1\vec{v}_1 + c_2\vec{v}_2) + b(c_3\vec{v}_1 + c_4\vec{v}_2) \\ &= ac_1\vec{v}_1 + ac_2\vec{v}_2 + bc_3\vec{v}_1 + bc_4\vec{v}_2 \\ &= (ac_1 + bc_3)\vec{v}_1 + (ac_2 + bc_4)\vec{v}_2, \end{aligned}$$
which is a linear combination of $\vec{v}_1$ and $\vec{v}_2$, so it is in the span of $\vec{v}_1$ and $\vec{v}_2$ as well. So, the span of $\vec{u}_1$ and $\vec{u}_2$ sits inside the span of $\vec{v}_1$ and $\vec{v}_2$. Both spans are two-dimensional, and a two-dimensional subspace that sits inside another two-dimensional subspace must be all of it, so the two spans are the same. Thus, since the non-zero rows of rref(A) are linearly independent, are linear combinations of the linearly independent rows of A, and are as many as those rows, the span of the non-zero rows of rref(A) is the same as the span of the linearly independent rows of A, and we can write R(A) as the span of the non-zero rows of rref(A). Now let us do some examples of finding the row space of a matrix.
Example 10.6: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find R(A).
In example 9.1, we found that
$$\text{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
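Since row 1 and row 2 of rref(A) are the non-zero rows of rref(A),
$$R(A) = \text{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
Because R(A) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.7: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find R(B).
In example 9.2, we found that
$$\text{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$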
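Since row 1 and row 2 of rref(B) are the non-zero rows of rref(B),
$$R(B) = \text{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix} \right\}.$$
Because R(B) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.8: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find R(C).
In example 9.3, we found that
$$\text{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$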
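Since row 1 and row 2 of rref(C) are the non-zero rows of rref(C),
$$R(C) = \text{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.$$
Because R(C) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.9: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find R(D).
In example 9.4, we found that
$$\text{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$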
Since row 1 and row 2 of rref(D) are the non-zero rows of rref(D),
$$R(D) = \text{span}\left\{ \begin{bmatrix} 1 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Because R(D) is the span of two linearly independent vectors, it is a two-dimensional vector space.
Example 10.10: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find R(E).
In example 9.5, we found that
$$\text{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Since row 1 and row 2 of rref(E) are the non-zero rows of rref(E),
$$R(E) = \text{span}\left\{ \begin{bmatrix} 1 \\ 5 \\ 0 \\ -3 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 3 \end{bmatrix} \right\}.$$
Because R(E) is the span of two linearly independent vectors, it is a two-dimensional vector space.
There are a few more things to note about the row space before we move on to the null and left null spaces. First, the row space of an $m \times n$ matrix is a subspace of $\mathbb{R}^n$. Note that each row of an $m \times n$ matrix has $n$ elements, so it is an $n$-dimensional vector. Remember from chapter 3 that the span of $n$-dimensional linearly independent vectors is a subspace of $\mathbb{R}^n$. So, the row space of an $m \times n$ matrix is a subspace of $\mathbb{R}^n$. For example, the row space of a 4 × 5 matrix is a subspace of $\mathbb{R}^5$. Second, the row space of $A$ is the same as the column space of $A^T$. As we learned in the last section, the column space of a matrix is the vector space of all linear combinations of columns of that matrix. When we take the transpose of a matrix, the rows of $A$ become the columns of $A^T$. So, linear combinations of rows of $A$ are the same as linear combinations of columns of $A^T$.
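The row-space recipe can be sketched as follows: reduce A and keep the non-zero rows of rref(A). As a sanity check of the span argument above, the sketch also confirms that stacking those rows on top of the rows of A leaves the rank unchanged. That is what we expect if both sets of rows span the same space. The helpers `rref`, `row_space_basis`, and `rank` are our own names and are defined here so the block runs on its own.

```python
from fractions import Fraction


def rref(A):
    """Reduced row echelon form of A with exact Fraction arithmetic."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in column c
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(m):
            if i != r:
                k = R[i][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return R


def row_space_basis(A):
    """The non-zero rows of rref(A)."""
    return [row for row in rref(A) if any(x != 0 for x in row)]


def rank(A):
    return len(row_space_basis(A))


E = [[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]]
basis = row_space_basis(E)
print([[str(x) for x in row] for row in basis])  # [['1', '5', '0', '-3'], ['0', '0', '1', '3']]
# Stacking the basis rows under the rows of E does not raise the rank
print(rank(E) == rank(E + basis) == len(basis))  # True
```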
10.4
Null and Left Null Spaces
The null space of an $m \times n$ matrix $A$, denoted $N(A)$, is the vector space of all $n$-dimensional vectors $\vec{v}$ such that $A\vec{v} = 0$. The vectors $\vec{v}$ have to be $n$-dimensional because an $m \times n$ matrix can only be multiplied on the right by an $n$-dimensional vector. Because $N(A)$ is a vector space of $n$-dimensional vectors, it is a subspace of $\mathbb{R}^n$. We know that any matrix times the 0 vector is the 0 vector, so $\vec{v} = 0$ is in the null space of any matrix $A$. The more interesting question is when there are non-zero vectors $\vec{v}$ such that $A\vec{v} = 0$. As we will see next, this happens exactly when not all columns of $A$ are linearly independent, i.e. when $A$ has some linearly dependent column(s).
Let
$$A = \begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.$$
Then,
$$\begin{bmatrix} | & | & & | \\ \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \\ | & | & & | \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n,$$
so the equation $A\vec{v} = 0$ becomes $v_1\vec{a}_1 + v_2\vec{a}_2 + \cdots + v_n\vec{a}_n = 0$. From the definition of linear independence, we know that the only solution to this is $v_1 = v_2 = \cdots = v_n = 0$ if the column vectors $\vec{a}_1, \vec{a}_2, \ldots, \vec{a}_n$ are all linearly independent. So, if all columns of $A$ are linearly independent, then the only solution to $A\vec{v} = 0$ is $\vec{v} = 0$. Otherwise, if not all columns of $A$ are linearly independent, then there are numbers $v_1, v_2, \ldots, v_n$, not all 0, that make this sum 0, so there are non-zero vectors $\vec{v}$ such that $A\vec{v} = 0$. To find the vectors $\vec{v}$ in $N(A)$, we need to solve the equation $A\vec{v} = 0$. However, solving $A\vec{v} = 0$ directly could be complicated, so we can solve the equation $\text{rref}(A)\vec{v} = 0$ instead because these two equations are equivalent. The reduced row echelon form rref(A) is obtained by doing row operations on $A$, and recall from chapter 8 that doing row operations on $A$ is the same as multiplying $A$ on the left by some elementary matrices. In high school algebra, we learned that multiplying both sides of an equation by the same thing keeps the equation true. For the equation $A\vec{v} = 0$, multiplying both sides on the left by the elementary matrices gives $\text{rref}(A)\vec{v} = 0$, because the elementary matrices times $A$ is rref(A), and the elementary matrices times 0 is still 0 since any matrix times the 0 vector is the 0 vector. Every row operation can also be undone by another row operation, so in the same way we can go back from $\text{rref}(A)\vec{v} = 0$ to $A\vec{v} = 0$; the two equations have exactly the same solutions. Now let us do some examples of finding the null space of a matrix.
Example 10.11: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Find $N(A)$.
In example 9.1, we found that
$$\text{rref}(A) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
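Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$
and we need to solve the equation $\text{rref}(A)\vec{v} = 0$:
$$\begin{aligned} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ v_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \end{aligned}$$
So, we have $v_1 = 0$ and $v_2 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Thus, $N(A)$ contains only the 0 vector:
$$N(A) = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}.$$
We can see that $N(A)$ contains only the 0 vector because all columns of $A$ are linearly independent. Since it contains only the 0 vector, it is a 0-dimensional vector space.
Example 10.12: Let $B = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Find $N(B)$.
In example 9.2, we found that
$$\text{rref}(B) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$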
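and we need to solve the equation $\text{rref}(B)\vec{v} = 0$:
$$\begin{aligned} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ -2 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} v_1 \\ v_2 - 2v_3 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \end{aligned}$$
So, we have $v_1 = 0$ and $v_2 = 2v_3$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 2v_3 \\ v_3 \end{bmatrix} = v_3\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}.$$
Thus, $N(B)$ contains exactly the vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix}$:
$$N(B) = \text{span}\left\{ \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \right\}.$$
Because $N(B)$ is the span of one linearly independent vector, it is a one-dimensional vector space.
Example 10.13: Let $C = \begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}$. Find $N(C)$.
In example 9.3, we found that
$$\text{rref}(C) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix},$$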
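and we need to solve the equation $\text{rref}(C)\vec{v} = 0$:
$$\begin{aligned} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} v_1 \\ v_2 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \end{aligned}$$
So, we have $v_1 = 0$ and $v_2 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Thus, $N(C)$ contains only the 0 vector:
$$N(C) = \left\{ \begin{bmatrix} 0 \\ 0 \end{bmatrix} \right\}.$$
We can see that $N(C)$ contains only the 0 vector because all columns of $C$ are linearly independent. Since it contains only the 0 vector, it is a 0-dimensional vector space.
Example 10.14: Let $D = \begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}$. Find $N(D)$.
In example 9.4, we found that
$$\text{rref}(D) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix},$$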
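and we need to solve the equation $\text{rref}(D)\vec{v} = 0$:
$$\begin{aligned} \begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ v_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 3 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ 1 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} v_1 + 3v_2 \\ v_3 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \end{aligned}$$
So, we have $v_1 = -3v_2$ and $v_3 = 0$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} -3v_2 \\ v_2 \\ 0 \end{bmatrix} = v_2\begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}.$$
Thus, $N(D)$ contains exactly the vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}$:
$$N(D) = \text{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix} \right\}.$$
Because $N(D)$ is the span of one linearly independent vector, it is a one-dimensional vector space.
Example 10.15: Let $E = \begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}$. Find $N(E)$.
In example 9.5, we found that
$$\text{rref}(E) = \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
Let
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix},$$
and we need to solve the equation $\text{rref}(E)\vec{v} = 0$:
$$\begin{aligned} \begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ v_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + v_2\begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix} + v_3\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + v_4\begin{bmatrix} -3 \\ 3 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} v_1 + 5v_2 - 3v_4 \\ v_3 + 3v_4 \\ 0 \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \end{aligned}$$
So, we have $v_1 = -5v_2 + 3v_4$ and $v_3 = -3v_4$:
$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} -5v_2 + 3v_4 \\ v_2 \\ -3v_4 \\ v_4 \end{bmatrix} = v_2\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + v_4\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}.$$
Thus, $N(E)$ contains exactly the vectors $\vec{v}$ that are linear combinations of $\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}$:
$$N(E) = \text{span}\left\{ \begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix} \right\}.$$
Because $N(E)$ is the span of two linearly independent vectors, it is a two-dimensional vector space.
Next, let us learn about the left null space. The left null space of an $m \times n$ matrix $A$ is the vector space of all $m$-dimensional vectors $\vec{u}$ such that $\vec{u}^T A = 0$. The vectors $\vec{u}$ have to be $m$-dimensional because an $m \times n$ matrix can only be multiplied on the left by an $m$-dimensional row vector. Because the left null space is a vector space of $m$-dimensional vectors, it is a subspace of $\mathbb{R}^m$.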
The left null space is actually nothing really new. The left null space of $A$ is the null space of $A^T$, so we just need to find the null space of $A^T$ if we want to find the left null space of $A$. Taking the transpose of both sides of the equation $A^T\vec{u} = 0$ gives $(A^T\vec{u})^T = 0$, because the transpose of the 0 column vector is just the 0 row vector. Then, using the formula proven in section 7.5 of chapter 7, $(A^T\vec{u})^T = \vec{u}^T(A^T)^T$. From question 6 in the exercises of chapter 6, we know that $(A^T)^T = A$, so $(A^T\vec{u})^T = \vec{u}^T A$. Thus, the equation $A^T\vec{u} = 0$ is equivalent to the equation $\vec{u}^T A = 0$ (taking the transpose again brings us back), so the left null space of $A$ is the same as the null space of $A^T$.
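The recipe used in Examples 10.11 to 10.15 can be sketched in code. First, reduce A and treat each non-pivot column as a free variable. Next, set one free variable to 1 and the others to 0. Finally, read the pivot variables off rref(A): each one equals minus the entry in its pivot row and the free variable's column. Because the left null space of A is the null space of $A^T$, the same function handles it after a transpose. The function names are our own, and exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction


def rref_with_pivots(A):
    """Reduced row echelon form of A and its 0-based pivot columns."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots = []
    for c in range(n):
        r = len(pivots)  # row for the next pivot
        if r == m:
            break
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(m):
            if i != r:
                k = R[i][c]
                R[i] = [a - k * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
    return R, pivots


def null_space_basis(A):
    """One basis vector of N(A) per free (non-pivot) column of rref(A)."""
    R, pivots = rref_with_pivots(A)
    n = len(A[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                 # this free variable is 1, the others 0
        for row, pc in enumerate(pivots):
            v[pc] = -R[row][f]             # solve the pivot row for its pivot variable
        basis.append(v)
    return basis


def transpose(A):
    return [list(col) for col in zip(*A)]


def left_null_space_basis(A):
    return null_space_basis(transpose(A))


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


E = [[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]]
for v in null_space_basis(E):
    print([str(x) for x in v], matvec(E, v) == [0, 0, 0])
# ['-5', '1', '0', '0'] True
# ['3', '0', '-3', '1'] True

B = [[-2, 3, -6], [10, -9, 18], [8, -6, 12]]
print([[str(x) for x in u] for u in left_null_space_basis(B)])  # [['-1', '-1', '1']]
```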
Exercises
1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 6 & 3 & 0 \\ 12 & -12 & 15 \end{bmatrix}$.
a) Find the column space of $A_1$. How many dimensions does it have?
b) Find the row space of $A_1$. How many dimensions does it have?
c) Find the null space of $A_1$. How many dimensions does it have?
d) Find the left null space of $A_1$. How many dimensions does it have?
2. Let $A_2 = \begin{bmatrix} -3 & 2 & 9 \\ 6 & -4 & -18 \end{bmatrix}$.
a) Find the column space of $A_2$. How many dimensions does it have?
b) Find the row space of $A_2$. How many dimensions does it have?
c) Find the null space of $A_2$. How many dimensions does it have?
d) Find the left null space of $A_2$. How many dimensions does it have?
3. Let $A_3 = \begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix}$.
a) Find the column space of $A_3$. How many dimensions does it have?
b) Find the row space of $A_3$. How many dimensions does it have?
c) Find the null space of $A_3$. How many dimensions does it have?
d) Find the left null space of $A_3$. How many dimensions does it have?
Chapter 11
Inverse of Matrices
11.1
Invertible and Singular Matrices
An $n \times n$ square matrix $A$ can have an inverse $A^{-1}$ such that $AA^{-1} = A^{-1}A = I_n$, where $I_n$ is the $n \times n$ identity matrix. Also, if we have $A\vec{v} = \vec{u}$, then multiplying both sides on the left by $A^{-1}$ gives
$$\begin{aligned} A^{-1}A\vec{v} &= A^{-1}\vec{u} \\ I_n\vec{v} &= A^{-1}\vec{u} \\ \vec{v} &= A^{-1}\vec{u}. \end{aligned}$$
However, not all square matrices have inverses. Matrices that have inverses are called invertible matrices, and matrices that do not have inverses are called singular matrices. So, when is a matrix invertible? When does the inverse of a matrix exist? As an analogy, let us think of the inverse of a function. In high school algebra, we learned that the inverse of a function exists when the graph of that function passes the horizontal line test, which means any horizontal line intersects the graph at most once. Let us take an example where the function does not have an inverse: $f(x) = x^2$. We can see that any horizontal line in the upper half of the xy-plane intersects the graph $y = x^2$ twice. So, the graph does not pass the horizontal line test, and the function does not have an inverse. This happens because for each positive y-value, there are two x-values such that $f(x) = y$. For example, $(-3)^2 = 3^2 = 9$. So, $f$ takes two x-values to the same y-value. The inverse $f^{-1}$ is supposed to take a y-value back to an x-value, but in this case $f^{-1}$ does not know which x-value to take the y-value back to because there are two options. For example, $f^{-1}$ would not know whether to take the y-value 9 back to the x-value 3 or −3. So, $f^{-1}$ does not exist. Remember that a function takes each input value to exactly one output value, never to more than one. We can think of a matrix $A$ and its inverse $A^{-1}$ in a similar way. As explained earlier, if $A\vec{v} = \vec{u}$, then $A^{-1}\vec{u} = \vec{v}$ when $A^{-1}$ exists. We can think of this as $A$ taking a vector $\vec{v}$ to a vector $\vec{u}$, and $A^{-1}$ taking the vector $\vec{u}$ back to the vector $\vec{v}$.
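When not all columns of $A$ are linearly independent, there can be two or more vectors $\vec{v}$ with $A\vec{v} = \vec{u}_1$ for the same vector $\vec{u}_1$. Suppose there is a vector $\vec{v}_1$ such that $A\vec{v}_1 = \vec{u}_1$. As we learned in the last chapter, if not all columns of $A$ are linearly independent, then there is a non-zero vector $\vec{v}_n$ in the null space of $A$. Let $\vec{v}_1' = \vec{v}_1 + \vec{v}_n$. Then
$$A\vec{v}_1' = A(\vec{v}_1 + \vec{v}_n) = A\vec{v}_1 + A\vec{v}_n = \vec{u}_1 + \vec{0} = \vec{u}_1.$$
So, $A$ takes the different vectors $\vec{v}_1$ and $\vec{v}_1'$ to the same vector $\vec{u}_1$. Then $A^{-1}$ would not know whether to take $\vec{u}_1$ back to $\vec{v}_1$ or to $\vec{v}_1'$, so $A^{-1}$ does not exist. Thus, when not all columns of $A$ are linearly independent, $A^{-1}$ does not exist, and $A$ is a singular matrix. Otherwise, if all $n$ columns of $A$ are linearly independent, then the only vector in the null space of $A$ is $\vec{0}$, so $A$ never takes two different vectors to the same vector; moreover, the $n$ columns span all of $\mathbb{R}^n$, so every vector $\vec{u}$ is reached by some $\vec{v}$. In this case $A^{-1}$ exists, and $A$ is an invertible matrix.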
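As a quick numerical illustration of this argument, the sketch below uses plain Python (the helper `mat_vec` and the particular vectors are our own choices, not from the text) and the matrix $B = \begin{bmatrix} 1 & 3 \\ 5 & 15 \end{bmatrix}$ of Example 11.2, whose second column is 3 times its first. The vector $\vec{v}_n = (3, -1)$ lies in its null space, so $\vec{v}_1$ and $\vec{v}_1 + \vec{v}_n$ are sent to the same output.

```python
def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# B from Example 11.2: its second column is 3 times its first column.
B = [[1, 3],
     [5, 15]]

v_n = [3, -1]                                   # a non-zero vector in the null space of B
v_1 = [1, 1]
v_1_prime = [a + b for a, b in zip(v_1, v_n)]   # v_1 + v_n = [4, 0]

print(mat_vec(B, v_n))        # [0, 0]
print(mat_vec(B, v_1))        # [4, 20]
print(mat_vec(B, v_1_prime))  # [4, 20], the same output from a different input
```

Any non-zero multiple of `v_n` works just as well, which is why a singular matrix sends infinitely many inputs to each reachable output.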
11.2
Finding Inverse of an Invertible Matrix
In the last section, we discussed that a square matrix $A$ is invertible only when all of its columns are linearly independent. For an $n \times n$ square matrix $A$, if all $n$ columns are linearly independent, then $\text{rref}(A)$ is an $n \times n$ matrix with $n$ pivots, which is the $n \times n$ identity matrix. For example, the reduced row echelon form of a $3 \times 3$ matrix whose three columns are all linearly independent is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
That means we can do some row operations to $A$ to get the identity matrix. Remember that doing row operations to $A$ is the same as multiplying $A$ on the left by some elementary matrices. Since multiplying $A$ on the left by the product of those elementary matrices gives the identity matrix, the product of those elementary matrices is $A^{-1}$. So, to find $A^{-1}$, we need to know what the product of those elementary matrices is.

How do we find that product? We could write each row operation as an elementary matrix, keep track of all of them, and multiply them together at the end. However, there is a better way. We attach the identity matrix $I$ to the right of $A$ and do the same row operations to both $A$ and $I$ at the same time. This multiplies $I$ by the same elementary matrices, and since any matrix times $I$ is the matrix itself, the right half ends up being exactly the product of those elementary matrices, which is $A^{-1}$. Let us look at some examples.
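The procedure of row reducing $[A \mid I]$ to $[I \mid A^{-1}]$ can be sketched in Python as follows. This is a minimal illustration, assuming the matrix is given as a list of rows with integer or fractional entries; the function name `inverse` is hypothetical. It uses the standard library's `fractions.Fraction` so the results match exact hand computation, and it always picks the first non-zero entry in a column as the pivot, which may order the steps differently from the worked examples while producing the same inverse.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row reducing [A | I] to [I | A^-1].

    Returns None when some column has no pivot, i.e. A is singular.
    """
    n = len(A)
    # Build the augmented matrix [A | I] with exact fractions.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                          # no pivot: A is not invertible
        M[col], M[pivot] = M[pivot], M[col]      # switch rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]         # scale to get a leading 1
        for r in range(n):                       # clear the rest of the column
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The right half of the augmented matrix is now A^-1.
    return [row[n:] for row in M]

A = [[2, 1], [6, -3]]                # Example 11.1
print(inverse(A))
# [[Fraction(1, 4), Fraction(1, 12)], [Fraction(1, 2), Fraction(-1, 6)]]
print(inverse([[1, 3], [5, 15]]))   # None (Example 11.2)
```

Exact fractions are a deliberate choice here: with floating-point numbers, a pivot that should be exactly 0 can come out as a tiny non-zero value, and a singular matrix could be reported as invertible.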
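Example 11.1: Let $A = \begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}$. Is $A$ invertible? If so, find $A^{-1}$.

By reducing $A$ to $\text{rref}(A)$, we can see that all columns are linearly independent, so $A$ is invertible. To find $A^{-1}$, we first attach $I_2$ to the right of $A$:
$$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 6 & -3 & 0 & 1 \end{array}\right].$$
Next, we do some row operations to reduce $A$ to $\text{rref}(A)$. Multiplying row 1 by $\frac{1}{2}$,
$$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 6 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 6 & -3 & 0 & 1 \end{array}\right].$$
Adding $-6$ times row 1 to row 2,
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 6 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & -6 & -3 & 1 \end{array}\right].$$
Multiplying row 2 by $-\frac{1}{6}$,
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & -6 & -3 & 1 \end{array}\right] \to \left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 1 & \frac{1}{2} & -\frac{1}{6} \end{array}\right].$$
Adding $-\frac{1}{2}$ times row 2 to row 1,
$$\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 1 & \frac{1}{2} & -\frac{1}{6} \end{array}\right] \to \left[\begin{array}{cc|cc} 1 & 0 & \frac{1}{4} & \frac{1}{12} \\ 0 & 1 & \frac{1}{2} & -\frac{1}{6} \end{array}\right].$$
Therefore,
$$A^{-1} = \begin{bmatrix} \frac{1}{4} & \frac{1}{12} \\ \frac{1}{2} & -\frac{1}{6} \end{bmatrix}.$$
We can check that
$$\begin{bmatrix} \frac{1}{4} & \frac{1}{12} \\ \frac{1}{2} & -\frac{1}{6} \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Example 11.2: Let $B = \begin{bmatrix} 1 & 3 \\ 5 & 15 \end{bmatrix}$. Is $B$ invertible? If so, find $B^{-1}$.

We can see that the second column is 3 times the first column, so not all columns of $B$ are linearly independent. So, $B$ is not invertible. We could use the reduced row echelon form to check the linear dependence of the columns, but that is not necessary when the dependence is this easy to spot.

Example 11.3: Let $C = \begin{bmatrix} 1 & 6 & -3 \\ -2 & 0 & -6 \\ 3 & 9 & 1 \end{bmatrix}$. Is $C$ invertible? If so, find $C^{-1}$.

By reducing $C$ to $\text{rref}(C)$, we can see that all columns are linearly independent, so $C$ is invertible. To find $C^{-1}$, we first attach $I_3$ to the right of $C$:
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ -2 & 0 & -6 & 0 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right].$$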
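Next, we do some row operations to reduce $C$ to $\text{rref}(C)$. Adding 2 times row 1 to row 2,
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ -2 & 0 & -6 & 0 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right].$$
Adding $-3$ times row 1 to row 3,
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 3 & 9 & 1 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Multiplying row 2 by $\frac{1}{12}$,
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 12 & -12 & 2 & 1 & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Adding $-6$ times row 2 to row 1,
$$\left[\begin{array}{ccc|ccc} 1 & 6 & -3 & 1 & 0 & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right].$$
Adding 9 times row 2 to row 3,
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & -9 & 10 & -3 & 0 & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Adding row 3 to row 2,
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & \frac{1}{6} & \frac{1}{12} & 0 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -\frac{4}{3} & \frac{5}{6} & 1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Adding $-3$ times row 3 to row 1,
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 3 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -\frac{4}{3} & \frac{5}{6} & 1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right] \to \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{9}{2} & -\frac{11}{4} & -3 \\ 0 & 1 & 0 & -\frac{4}{3} & \frac{5}{6} & 1 \\ 0 & 0 & 1 & -\frac{3}{2} & \frac{3}{4} & 1 \end{array}\right].$$
Therefore,
$$C^{-1} = \begin{bmatrix} \frac{9}{2} & -\frac{11}{4} & -3 \\ -\frac{4}{3} & \frac{5}{6} & 1 \\ -\frac{3}{2} & \frac{3}{4} & 1 \end{bmatrix}.$$
We can check that
$$\begin{bmatrix} \frac{9}{2} & -\frac{11}{4} & -3 \\ -\frac{4}{3} & \frac{5}{6} & 1 \\ -\frac{3}{2} & \frac{3}{4} & 1 \end{bmatrix}\begin{bmatrix} 1 & 6 & -3 \\ -2 & 0 & -6 \\ 3 & 9 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$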
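The check at the end of Example 11.3 involves a fair amount of fraction arithmetic, so here is a short sketch that redoes it exactly with Python's `fractions.Fraction` (the helper `mat_mul` is our own, hypothetical name). It multiplies in both orders, since a true inverse must satisfy $C^{-1}C = CC^{-1} = I_3$.

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

C = [[1, 6, -3],
     [-2, 0, -6],
     [3, 9, 1]]

# C^{-1} as found in Example 11.3
C_inv = [[F(9, 2), F(-11, 4), -3],
         [F(-4, 3), F(5, 6), 1],
         [F(-3, 2), F(3, 4), 1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mat_mul(C_inv, C) == I3)  # True
print(mat_mul(C, C_inv) == I3)  # True
```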
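Example 11.4: Let $D = \begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}$. Is $D$ invertible? If so, find $D^{-1}$.

In example 9.2, we found that
$$\text{rref}(D) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}.$$
Since there are only two pivot columns, not all columns of $D$ are linearly independent, so $D$ is not invertible.

Example 11.5: Let $E = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 5 \end{bmatrix}$. Is $E$ invertible? If so, find $E^{-1}$.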
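By reducing $E$ to $\text{rref}(E)$, we can see that all columns are linearly independent, so $E$ is invertible. To find $E^{-1}$, we first attach $I_4$ to the right of $E$:
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Next, we do some row operations to reduce $E$ to $\text{rref}(E)$. Multiplying row 2 by $\frac{1}{2}$,
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Multiplying row 3 by $\frac{1}{3}$,
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right].$$
Multiplying row 4 by $\frac{1}{5}$,
$$\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 5 & 0 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & \frac{1}{5} \end{array}\right].$$
Therefore,
$$E^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2} & 0 & 0 \\ 0 & 0 & \frac{1}{3} & 0 \\ 0 & 0 & 0 & \frac{1}{5} \end{bmatrix}.$$
We can see that finding the inverse of an invertible diagonal matrix is very simple: we just take the reciprocal of each element on the main diagonal.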
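The reciprocal rule for diagonal matrices is short enough to write directly. The sketch below is our own illustration (the name `diagonal_inverse` is hypothetical) and assumes the diagonal entries are integers. It refuses a zero on the diagonal, because such a matrix has a zero column, so its columns are not linearly independent and, as in section 11.1, it is singular.

```python
from fractions import Fraction

def diagonal_inverse(diag):
    """Inverse of a diagonal matrix given by its main-diagonal entries.

    Raises ValueError if some diagonal entry is 0 (the matrix is then singular).
    """
    if any(d == 0 for d in diag):
        raise ValueError("a zero on the main diagonal makes the matrix singular")
    n = len(diag)
    return [[Fraction(1, diag[i]) if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]

E_inv = diagonal_inverse([1, 2, 3, 5])   # E from Example 11.5
for row in E_inv:
    print([str(x) for x in row])
# ['1', '0', '0', '0']
# ['0', '1/2', '0', '0']
# ['0', '0', '1/3', '0']
# ['0', '0', '0', '1/5']
```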
Exercises
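1. Let $A_1 = \begin{bmatrix} 2 & -2 & 6 \\ 3 & 6 & 0 \\ 12 & -12 & 15 \end{bmatrix}$. Is $A_1$ invertible? If so, find $A_1^{-1}$.
2. Let $A_2 = \begin{bmatrix} -5 & 7 \\ 15 & -21 \end{bmatrix}$. Is $A_2$ invertible? If so, find $A_2^{-1}$.
3. Let $A_3 = \begin{bmatrix} 2 & 3 \\ -1 & -2 \end{bmatrix}$. Is $A_3$ invertible? If so, find $A_3^{-1}$.
4. Let $A_4 = \begin{bmatrix} -3 & 2 & 1 \\ 5 & -1 & 3 \\ 6 & -4 & -2 \end{bmatrix}$. Is $A_4$ invertible? If so, find $A_4^{-1}$.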
5. For any invertible $n \times n$ matrices $A$ and $B$, show that $A^{-1}(B - A)B^{-1} = A^{-1} - B^{-1}$.
6. For any invertible $n \times n$ matrices $A$ and $B$, show that $(AB)^{-1} = B^{-1}A^{-1}$.
Chapter 12
Determinant of Matrices
12.1
Properties of Determinant
In this chapter, we will learn about the determinant of a square matrix. The determinant of a square matrix $A$ is denoted $\det(A)$. The determinant is a function that assigns a number to each square matrix, and it is uniquely determined by the first three properties below. First, we will learn those properties in this section.

Property 1: The determinant of the identity matrix is 1. For example,
$$\det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \det\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = 1.$$
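Property 2: The determinant is linear in each row.

First, what does it mean to be linear? A function $f(x)$ is linear if $f(x + x') = f(x) + f(x')$ and $f(nx) = nf(x)$ for any number $n$. In high school algebra, we learned about "linear" functions of the form $f(x) = mx + b$. Such a function satisfies both conditions only when $b = 0$: for $f(x) = mx$ we have $m(x + x') = mx + mx'$ and $m(nx) = n(mx)$, whereas for $b \neq 0$, $f(x + x') = m(x + x') + b$ differs from $f(x) + f(x') = m(x + x') + 2b$. For the determinant of a matrix, this linearity holds in each row separately, while the other rows are kept fixed. For example,
$$\det\begin{bmatrix} a + a' & b + b' \\ c & d \end{bmatrix} = \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} + \det\begin{bmatrix} a' & b' \\ c & d \end{bmatrix}$$
and
$$\det\begin{bmatrix} a & b \\ nc & nd \end{bmatrix} = n\det\begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$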
Property 3: Switching two rows changes the sign of the determinant.

Switching two rows of a matrix once changes the sign of its determinant. So, switching rows an even number of times keeps the determinant the same because $(-1)^n = 1$ when $n$ is even. For example,
$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = -\det\begin{bmatrix} c & d \\ a & b \end{bmatrix}.$$
These three properties define what a determinant is. From them, we can derive further important properties of the determinant.
Property 4: The determinant of a diagonal matrix is the product of all elements on the main diagonal.

Let
$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}.$$
By property 2, the determinant is linear in each row, so we can factor out $d_1$ from the first row, $d_2$ from the second row, ..., and $d_n$ from the $n$-th row:
$$\det(D) = d_1 d_2 \cdots d_n \det\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = d_1 d_2 \cdots d_n \det(I_n).$$
By property 1, we know that $\det(I_n) = 1$, so
$$\det\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} = d_1 d_2 \cdots d_n.$$
Property 5: If a matrix has two or more identical rows, then the determinant is 0.

If a matrix $A$ has two identical rows, then switching those two rows gives the same matrix $A$. By property 3, switching two rows changes the sign of the determinant, so $\det(A) = -\det(A)$, which means $\det(A) = 0$.

Property 6: Adding a multiple of another row to a row does not change the determinant.
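Say we have a matrix $A$ with three rows:
$$A = \begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_3^T \end{bmatrix}.$$
Then, adding $k$ times row 2 to row 3 gives
$$\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ k\vec{a}_2^T + \vec{a}_3^T \end{bmatrix}.$$
Applying linearity in the third row, we have
$$\det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ k\vec{a}_2^T + \vec{a}_3^T \end{bmatrix} = \det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ k\vec{a}_2^T \end{bmatrix} + \det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_3^T \end{bmatrix} = k\det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_2^T \end{bmatrix} + \det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_3^T \end{bmatrix}.$$
By property 5, we know that
$$\det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_2^T \end{bmatrix} = 0$$
because there are two identical rows, so
$$\det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ k\vec{a}_2^T + \vec{a}_3^T \end{bmatrix} = \det\begin{bmatrix} \vec{a}_1^T \\ \vec{a}_2^T \\ \vec{a}_3^T \end{bmatrix} = \det(A).$$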
So, adding a multiple of another row to a row does not change the determinant.

Property 7: If there is at least one zero row in the matrix, then the determinant is 0.

Say we have a $2 \times 2$ matrix with a zero row:
$$\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}.$$
Since $0 = n \cdot 0$ for any number $n$, applying linearity in the first row gives
$$\det\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix} = \det\begin{bmatrix} n \cdot 0 & n \cdot 0 \\ c & d \end{bmatrix} = n\det\begin{bmatrix} 0 & 0 \\ c & d \end{bmatrix}.$$
The only number that equals $n$ times itself for every $n$ (for example, $n = 2$) is 0, so the determinant must be 0.

Property 8: If not all rows are linearly independent, then the determinant is 0.

As explained in section 9.2 of chapter 9, if some rows are linearly dependent, then after some row operations, specifically adding multiples of some rows to other rows, they cancel out and become zero rows. By property 6, adding a multiple of another row to a row does not change the determinant. So, the determinant of a matrix with linearly dependent rows is the same as the determinant of a matrix with a zero row, which is 0 by property 7.

Property 9: For a square matrix $A$, $\det(A^T) = \det(A)$.

We can see why $\det(A^T) = \det(A) = 0$ when not all rows of $A$ are linearly independent. By property 8, $\det(A) = 0$ in this case. A square matrix has the same number of rows and columns, and the number of linearly independent rows is the same as the number of linearly independent columns. So, if not all rows of $A$ are linearly independent, then not all columns of $A$ are linearly independent. Since the columns of $A$ are the rows of $A^T$, not all rows of $A^T$ are linearly independent, so $\det(A^T) = 0$ by property 8.

It is harder to see why $\det(A^T) = \det(A)$ when $\det(A)$ is non-zero. We can easily see that the formula is true when $A$ is a diagonal matrix: a diagonal matrix is also symmetric, so $A^T = A$ and $\det(A^T) = \det(A)$. If all columns (or rows) of $A$ are linearly independent, then $A$ can be reduced to a diagonal matrix by row operations, and each row operation has a corresponding determinant property. That gives an intuitive sense for why $\det(A^T) = \det(A)$ should be true. However, the precise proof of the formula is quite complicated, so we will not prove it here.

Note that by applying the previous properties to the rows of $A^T$, we get similar properties for the columns of $A$, because the rows of $A^T$ are the columns of $A$.

Property 10: The determinant is linear in each column.

Property 11: Switching two columns changes the sign of the determinant.

Property 12: If a matrix has two or more identical columns, then the determinant is 0.

Property 13: Adding a multiple of another column to a column does not change the determinant.
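Property 14: If there is at least one zero column in the matrix, then the determinant is 0.

Property 15: If not all columns are linearly independent, then the determinant is 0.

In section 11.1 of chapter 11, we learned that $A$ is not invertible when not all columns of $A$ are linearly independent. Here, we know that $\det(A) = 0$ when not all columns of $A$ are linearly independent. So, if $A$ is a singular matrix, then $\det(A) = 0$. The converse also holds: if all columns of $A$ are linearly independent, then $A$ can be reduced to the identity matrix by row operations, and each row operation only multiplies the determinant by a non-zero number (switching two rows multiplies it by $-1$, multiplying a row by a non-zero scalar $c$ multiplies it by $c$, and adding a multiple of another row leaves it unchanged). Since $\det(I) = 1 \neq 0$, we get $\det(A) \neq 0$. Hence, $A$ is singular exactly when $\det(A) = 0$.

Property 16: For square matrices $A$ and $B$ of the same size, $\det(AB) = \det(A)\det(B)$.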
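We can check that this is true when $\det(A) = 0$ or $\det(B) = 0$, which is when $\det(A)\det(B) = 0$. Say $\det(A) = 0$. As discussed after property 15, this means $A$ is singular, so not all columns of $A$ are linearly independent. By using method 1 of matrix multiplication, which is multiplying $A$ by each column of $B$ to obtain each column of the product $AB$, we can see that the columns of $AB$ are linear combinations of the columns of $A$, so they all lie in the column space of $A$. If not all $n$ columns of $A$ are linearly independent, this column space has dimension less than $n$, so the $n$ columns of $AB$ cannot all be linearly independent either. By property 15, $\det(AB) = 0$. Similarly, when $\det(B) = 0$, not all columns of $B$ are linearly independent, so there is a non-zero vector $\vec{x}$ with $B\vec{x} = \vec{0}$; then $(AB)\vec{x} = A(B\vec{x}) = \vec{0}$, so not all columns of $AB$ are linearly independent, and $\det(AB) = 0$.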
When $\det(A) \neq 0$ and $\det(B) \neq 0$, we can get an intuitive sense for why the formula $\det(AB) = \det(A)\det(B)$ should be true, just as we did for the formula $\det(A^T) = \det(A)$. First, we can check that the formula $\det(AB) = \det(A)\det(B)$ is true when $A$ is a diagonal matrix. Let us use $3 \times 3$ matrices as an example: let
$$A = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} \vec{b}_1^T \\ \vec{b}_2^T \\ \vec{b}_3^T \end{bmatrix}.$$
By using method 2 of matrix multiplication, which is multiplying each row of $A$ by $B$ to obtain each row of the product $AB$, we get
$$AB = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}\begin{bmatrix} \vec{b}_1^T \\ \vec{b}_2^T \\ \vec{b}_3^T \end{bmatrix} = \begin{bmatrix} d_1\vec{b}_1^T \\ d_2\vec{b}_2^T \\ d_3\vec{b}_3^T \end{bmatrix},$$
so
$$\det(AB) = \det\begin{bmatrix} d_1\vec{b}_1^T \\ d_2\vec{b}_2^T \\ d_3\vec{b}_3^T \end{bmatrix}.$$
Since the determinant is linear in each row, we can factor out $d_1$ from the first row, $d_2$ from the second row, and $d_3$ from the third row:
$$\det\begin{bmatrix} d_1\vec{b}_1^T \\ d_2\vec{b}_2^T \\ d_3\vec{b}_3^T \end{bmatrix} = d_1 d_2 d_3 \det\begin{bmatrix} \vec{b}_1^T \\ \vec{b}_2^T \\ \vec{b}_3^T \end{bmatrix} = d_1 d_2 d_3 \det(B).$$
Since
$$\det(A) = \det\begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} = d_1 d_2 d_3$$
by property 4, we indeed have $\det(AB) = \det(A)\det(B)$. In general, for an $n \times n$ diagonal matrix $A$ with main-diagonal entries $d_1, d_2, \ldots, d_n$ and a matrix $B$ with rows $\vec{b}_1^T, \vec{b}_2^T, \ldots, \vec{b}_n^T$,
$$\det(AB) = \det\begin{bmatrix} d_1\vec{b}_1^T \\ d_2\vec{b}_2^T \\ \vdots \\ d_n\vec{b}_n^T \end{bmatrix} = d_1 d_2 \cdots d_n \det\begin{bmatrix} \vec{b}_1^T \\ \vec{b}_2^T \\ \vdots \\ \vec{b}_n^T \end{bmatrix} = \det(A)\det(B).$$
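Thus, the formula $\det(AB) = \det(A)\det(B)$ is indeed true when $A$ is a diagonal matrix. If all columns of $A$ are linearly independent, then $A$ can be reduced to a diagonal matrix by row operations, and each row operation has a corresponding determinant property. Each row operation on $A$ is a multiplication on the left by an elementary matrix $E$, and since $(EA)B = E(AB)$, the same operation is applied to $AB$; so it changes $\det(A)$ and $\det(AB)$ by the same factor, and the relation survives every step. That gives an intuitive sense for why $\det(AB) = \det(A)\det(B)$ should be true in general.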
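Properties 9 and 16 can be spot-checked numerically. The sketch below computes determinants with the permutation expansion: it sums, over every way of picking one entry from each row and each column, the product of those entries with a sign. This is the general form of the decomposition carried out for $2 \times 2$ and $3 \times 3$ matrices in sections 12.2 and 12.3; we take it as a known fact here, and we compute the sign from the parity of the number of inversions, a standard equivalent of counting row switches. All function names are our own, and a few numerical checks illustrate the properties rather than prove them.

```python
from itertools import permutations

def permutation_sign(perm):
    """+1 or -1: the parity of the number of inversions (pairs out of order)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Sum over all ways of picking one entry from each row and each column."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for row, col in enumerate(perm):
            term *= M[row][col]
        total += term
    return total

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

C = [[1, 6, -3], [-2, 0, -6], [3, 9, 1]]      # Example 11.3
D = [[-2, 3, -6], [10, -9, 18], [8, -6, 12]]  # Example 11.4 (singular)
G = [[2, 1, 0], [0, 3, 4], [5, 0, 1]]         # an arbitrary test matrix

print(det(C), det(transpose(C)))            # 12 12   (property 9)
print(det(mat_mul(C, G)), det(C) * det(G))  # 312 312 (property 16)
print(det(mat_mul(C, D)), det(C) * det(D))  # 0 0
```

The permutation expansion has $n!$ terms, so it is only practical for small matrices; it is used here because it needs nothing beyond the definition.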
12.2
Determinant Formula for 2 x 2 Matrices
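In this section, we will use some properties of the determinant to derive the determinant formula for any $2 \times 2$ matrix
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$
Since
$$\begin{bmatrix} a_{11} & a_{12} \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \end{bmatrix} + \begin{bmatrix} 0 & a_{12} \end{bmatrix},$$
applying linearity in the first row gives
$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} \\ a_{21} & a_{22} \end{bmatrix}. \tag{12.1}$$
Similarly, applying linearity in the second row gives
$$\det\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix} + \det\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix}$$
and
$$\det\begin{bmatrix} 0 & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} 0 & a_{12} \\ a_{21} & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix}.$$
By property 14,
$$\det\begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix} = \det\begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix} = 0$$
because each has a zero column, so
$$\det\begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} \tag{12.2}$$
and
$$\det\begin{bmatrix} 0 & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} 0 & a_{12} \\ a_{21} & 0 \end{bmatrix}. \tag{12.3}$$
Substituting (12.2) and (12.3) into (12.1),
$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \det\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} \\ a_{21} & 0 \end{bmatrix}.$$
By property 4,
$$\det\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} = a_{11}a_{22}.$$
By property 3 and property 4,
$$\det\begin{bmatrix} 0 & a_{12} \\ a_{21} & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{21} & 0 \\ 0 & a_{12} \end{bmatrix} = -a_{21}a_{12}.$$
Therefore,
$$\det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{21}a_{12}.$$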
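As a sanity check on the derivation, the sketch below implements the formula $a_{11}a_{22} - a_{21}a_{12}$ (the name `det2` is our own) and confirms, on specific numbers that we chose arbitrarily, that it behaves as properties 1, 2 and 3 require. Checks on particular numbers illustrate the properties; they are not a proof.

```python
def det2(M):
    """The 2 x 2 formula a11*a22 - a21*a12 derived above."""
    (a11, a12), (a21, a22) = M
    return a11 * a22 - a21 * a12

# Property 1: det(I) = 1
assert det2([[1, 0], [0, 1]]) == 1

# Property 2: linear in the first row (sum) and in the second row (scaling)
a, b, a_, b_, c, d, n = 2, -1, 5, 3, 4, 7, -6
assert det2([[a + a_, b + b_], [c, d]]) == det2([[a, b], [c, d]]) + det2([[a_, b_], [c, d]])
assert det2([[a, b], [n * c, n * d]]) == n * det2([[a, b], [c, d]])

# Property 3: switching the two rows flips the sign
assert det2([[c, d], [a, b]]) == -det2([[a, b], [c, d]])

print(det2([[2, 1], [6, -3]]))   # -12, the matrix A of Example 11.1
print(det2([[1, 3], [5, 15]]))   # 0, the singular matrix B of Example 11.2
```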
Note that when we decompose a determinant into smaller determinants using linearity in each row, the only determinants that can be non-zero are those in which each row and each column contains only one element of the original matrix, such as
$$\det\begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix} \quad\text{and}\quad \det\begin{bmatrix} 0 & a_{12} \\ a_{21} & 0 \end{bmatrix}.$$
When some column or row contains more than one element of the original matrix, such as
$$\det\begin{bmatrix} a_{11} & 0 \\ a_{21} & 0 \end{bmatrix} \quad\text{and}\quad \det\begin{bmatrix} 0 & a_{12} \\ 0 & a_{22} \end{bmatrix},$$
there will be some zero column or row, which makes the determinant 0. In the next section, we will derive the determinant formula for $3 \times 3$ matrices in a similar manner.
12.3
Determinant Formula for 3 x 3 Matrices
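Remember that when we decompose a determinant into smaller determinants using linearity in each row, the only determinants that can be non-zero are those in which each row and each column has only one element of the original matrix. So, we only need to keep those determinants:
$$\begin{aligned} \det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = {} & \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} + \det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & a_{12} & 0 \\ a_{21} & 0 & 0 \\ 0 & 0 & a_{33} \end{bmatrix} \\ & + \det\begin{bmatrix} 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & 0 & a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix}. \end{aligned} \tag{12.4}$$
By property 4,
$$\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = a_{11}a_{22}a_{33}. \tag{12.5}$$
By property 3 and property 4, switching row 2 and row 3 gives
$$\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{32} & 0 \\ 0 & 0 & a_{23} \end{bmatrix} = -a_{11}a_{23}a_{32}. \tag{12.6}$$
By property 3 and property 4, switching row 1 and row 2 gives
$$\det\begin{bmatrix} 0 & a_{12} & 0 \\ a_{21} & 0 & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = -\det\begin{bmatrix} a_{21} & 0 & 0 \\ 0 & a_{12} & 0 \\ 0 & 0 & a_{33} \end{bmatrix} = -a_{12}a_{21}a_{33}. \tag{12.7}$$
By property 3 and property 4, switching row 1 and row 2 and then switching row 1 and row 3 gives
$$\det\begin{bmatrix} 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \\ a_{31} & 0 & 0 \end{bmatrix} = \det\begin{bmatrix} a_{31} & 0 & 0 \\ 0 & a_{12} & 0 \\ 0 & 0 & a_{23} \end{bmatrix} = a_{12}a_{23}a_{31}. \tag{12.8}$$
By property 3 and property 4, switching row 1 and row 3 gives
$$\det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} = -\det\begin{bmatrix} a_{31} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{13} \end{bmatrix} = -a_{13}a_{22}a_{31}. \tag{12.9}$$
By property 3 and property 4, switching row 2 and row 3 and then switching row 1 and row 3 gives
$$\det\begin{bmatrix} 0 & 0 & a_{13} \\ a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \end{bmatrix} = \det\begin{bmatrix} a_{21} & 0 & 0 \\ 0 & a_{32} & 0 \\ 0 & 0 & a_{13} \end{bmatrix} = a_{13}a_{21}a_{32}. \tag{12.10}$$
Substituting (12.5), (12.6), (12.7), (12.8), (12.9), and (12.10) into (12.4),
$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32}.$$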
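To see the $3 \times 3$ formula at work, the sketch below (the name `det3` is our own) applies it to the matrices $C$ and $D$ from chapter 11. The results agree with what we found there: $C$ is invertible, so its determinant is non-zero, while $D$ is singular, so its determinant is 0, as property 15 predicts.

```python
def det3(M):
    """The 3 x 3 formula obtained by substituting (12.5)-(12.10) into (12.4)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = M
    return (a11 * a22 * a33 + a12 * a23 * a31 + a13 * a21 * a32
            - a13 * a22 * a31 - a12 * a21 * a33 - a11 * a23 * a32)

C = [[1, 6, -3], [-2, 0, -6], [3, 9, 1]]      # Example 11.3 (invertible)
D = [[-2, 3, -6], [10, -9, 18], [8, -6, 12]]  # Example 11.4 (singular)
print(det3(C))  # 12
print(det3(D))  # 0
```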
For higher order matrices such as $4 \times 4$ or $5 \times 5$ matrices, we can find the determinant in a similar manner by breaking it down into simpler determinants using linearity in each row, or we can first do some row operations to make the matrix simpler, using the determinant properties related to row operations.
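The row-operation approach can be sketched as follows. This is a minimal illustration (the name `det_by_row_ops` is hypothetical) using Gaussian elimination with exact fractions rather than the term-by-term decomposition: it only switches rows and adds multiples of rows, so it relies on properties 3 and 6, and it stops once the matrix is upper triangular, as explained in the docstring.

```python
from fractions import Fraction

def det_by_row_ops(M):
    """Determinant of a square matrix using only the row-operation properties.

    Switching two rows flips the sign (property 3); adding a multiple of one
    row to another changes nothing (property 6). Elimination leaves an upper
    triangular matrix; if its diagonal has no zeros, further row additions
    (property 6 again) would clear the entries above the diagonal, so its
    determinant is the product of the diagonal entries (property 4).
    """
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            # Columns 0..col now lie in a col-dimensional space, so they are
            # linearly dependent and the determinant is 0 (property 15).
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= A[i][i]
    return result

C = [[1, 6, -3], [-2, 0, -6], [3, 9, 1]]
D = [[-2, 3, -6], [10, -9, 18], [8, -6, 12]]
print(det_by_row_ops(C))                 # 12
print(det_by_row_ops(D))                 # 0
print(det_by_row_ops([[0, 1], [1, 0]]))  # -1 (one row switch)
```

Unlike the term-by-term expansion, which grows like $n!$, this needs roughly $n^3$ arithmetic operations, which is why row operations are the practical choice for larger matrices.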
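Exercises

1. For a square matrix $A$, what is $\det(A^3)$ in terms of $\det(A)$?
2. For an $n \times n$ square matrix $A$, what is $\det(5A)$ in terms of $\det(A)$?
3. For an invertible matrix $A$, what is $\det(A^{-1})$ in terms of $\det(A)$?
4. $\det\begin{bmatrix} 1 & 5 \\ -2 & 7 \end{bmatrix} = ?$
5. $\det\begin{bmatrix} 8 & 3 \\ 6 & 1 \end{bmatrix} = ?$
6. $\det\begin{bmatrix} 2 & -3 & -6 \\ 0 & 1 & -2 \\ -1 & 5 & 0 \end{bmatrix} = ?$
7. $\det\begin{bmatrix} 2 & 0 & 0 & 0 \\ 6 & -5 & 0 & 0 \\ -8 & 1 & 3 & 0 \\ 9 & 20 & -12 & 10 \end{bmatrix} = ?$ (Hint: do row operations)
8. $\det\begin{bmatrix} -1 & -3 & 2 & 8 \\ 0 & 6 & 5 & -3 \\ 0 & 0 & -7 & -2 \\ 0 & 0 & 0 & 3 \end{bmatrix} = ?$ (Hint: do row operations)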
Chapter 13
Systems of Linear Equations
13.1
Systems of Linear Equations as Matrix Linear Equations
In high school algebra, we learned about systems of linear equations such as
$$\begin{aligned} 3x + 5y - 2z &= 8 \\ x - 6y + 2z &= -5 \\ 7x + y + 9z &= 11 \end{aligned} \tag{13.1}$$
The system of linear equations in (13.1) has three equations and three unknown variables. In high school algebra, we mostly learned about systems where the number of equations is the same as the number of unknowns. However, this does not have to be the case. For example, we can have a system of equations like
$$\begin{aligned} 5x - 8y &= 3 \\ x + 2y &= -3 \\ 7x - 5y &= -2 \end{aligned} \tag{13.2}$$
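Any system of linear equations can be written as a matrix linear equation of the form $A\vec{x} = \vec{b}$, where $A$ is the matrix containing all the coefficients on the left-hand side of the system, $\vec{x}$ is the column vector containing all the unknown variables, and $\vec{b}$ is the column vector containing all the numbers on the right-hand side of the system. For example, the system of equations in (13.1) can be written as
$$\begin{bmatrix} 3x + 5y - 2z \\ x - 6y + 2z \\ 7x + y + 9z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$\begin{bmatrix} 3x \\ x \\ 7x \end{bmatrix} + \begin{bmatrix} 5y \\ -6y \\ y \end{bmatrix} + \begin{bmatrix} -2z \\ 2z \\ 9z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$x\begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix} + y\begin{bmatrix} 5 \\ -6 \\ 1 \end{bmatrix} + z\begin{bmatrix} -2 \\ 2 \\ 9 \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix}$$
$$\begin{bmatrix} 3 & 5 & -2 \\ 1 & -6 & 2 \\ 7 & 1 & 9 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 8 \\ -5 \\ 11 \end{bmatrix},$$
and the system of equations in (13.2) can be written as
$$\begin{bmatrix} 5x - 8y \\ x + 2y \\ 7x - 5y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$\begin{bmatrix} 5x \\ x \\ 7x \end{bmatrix} + \begin{bmatrix} -8y \\ 2y \\ -5y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$x\begin{bmatrix} 5 \\ 1 \\ 7 \end{bmatrix} + y\begin{bmatrix} -8 \\ 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}$$
$$\begin{bmatrix} 5 & -8 \\ 1 & 2 \\ 7 & -5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 \\ -3 \\ -2 \end{bmatrix}.$$
In this chapter, we will learn about systems of linear equations more generally as matrix linear equations.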
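The rewriting above reads $A\vec{x}$ in two ways: row by row (each row gives one equation's left-hand side) and as a linear combination of the columns of $A$. The sketch below (our own helper names) computes both for system (13.2) and shows they agree. It also checks that $x = -1$, $y = -1$ satisfies all three equations of (13.2), which we verified by direct substitution.

```python
# Coefficients and right-hand side of system (13.2).
A = [[5, -8],
     [1, 2],
     [7, -5]]
b = [3, -3, -2]

def row_by_row(A, x):
    """Each entry is one equation's left-hand side: a row of A dotted with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def column_combination(A, x):
    """x[0] * (column 0 of A) + x[1] * (column 1 of A) + ..."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            result[i] += xj * A[i][j]
    return result

x = [2, -1]                      # an arbitrary trial vector
print(row_by_row(A, x), column_combination(A, x))   # [18, 0, 19] [18, 0, 19]

x = [-1, -1]                     # this one satisfies all three equations
print(row_by_row(A, x) == b)     # True
```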
13.2
Characteristic of Matrix Linear Equations
When we learned about systems of linear equations, we learned that a system can have no solution, a unique solution (exactly one solution), or infinitely many solutions.

No Solution vs Has Solution(s)

Let us discuss when the equation
$$A\vec{x} = \vec{b} \tag{13.3}$$
has or does not have a solution. Remember from chapter 7 that multiplying a matrix $A$ by a column vector gives a linear combination of the columns of $A$. So, equation (13.3) has at least one solution exactly when $\vec{b}$ is a linear combination of the columns of $A$. Then, remember from chapter 10 that the column space of $A$ is the space of all linear combinations of the columns of $A$. So, equation (13.3) has at least one solution exactly when $\vec{b}$ is in the column space of $A$.

Now what condition would guarantee that $\vec{b}$ is always in the column space of $A$, no matter what $\vec{b}$ is? That is, what condition would guarantee that equation (13.3) definitely has at least one solution? Equation (13.3) definitely has at least one solution when the rank of $A$ is equal to the number of rows of $A$. Say $A$ is an $m \times n$ matrix. Remember from chapter 10 that the column space of an $m \times n$ matrix is a subspace of $\mathbb{R}^m$. If the rank of $A$ is equal to $m$, which is the number of rows, then there are $m$ linearly independent columns. As we learned in chapter 10, the column space of $A$ is the span of the linearly independent columns of $A$, so it is the span of $m$ linearly independent columns, which means it is an $m$-dimensional vector space. Thus, if the rank of $A$ is equal to the number of rows of $A$, then the column space of $A$ is an $m$-dimensional subspace of $\mathbb{R}^m$, which is the whole space $\mathbb{R}^m$ itself. Since the column space of $A$ is all of $\mathbb{R}^m$, every $m$-dimensional vector $\vec{b}$ is in the column space of $A$, and equation (13.3) definitely has at least one solution. On the other hand, if the rank of $A$ is less than the number of rows of $A$, then equation (13.3) may or may not have a solution, depending on whether $\vec{b}$ is in the column space of $A$.

Unique Solution vs Infinite Solutions

When equation (13.3) has a solution, it could have a unique solution or infinitely many solutions. Let us discuss when each case happens. Say there is a vector $\vec{x} = \vec{v}_p$ that satisfies equation (13.3): $A\vec{v}_p = \vec{b}$. Let $\vec{v}_n$ be a non-zero vector in the null space of $A$. Then, $\vec{x} = \vec{v}_p + \vec{v}_n$ is another vector that satisfies equation (13.3) because
$$A(\vec{v}_p + \vec{v}_n) = A\vec{v}_p + A\vec{v}_n = \vec{b} + \vec{0} = \vec{b}.$$
As we saw in chapter 10, if there are some non-zero vectors in the null space of $A$, then their span is also in the null space of $A$. So, there are infinitely many non-zero vectors in the null space of $A$ (for example, all multiples $c\vec{v}_n$ with $c \neq 0$), and each of them gives a different solution of equation (13.3). As we learned in chapter 10, there are non-zero vectors in the null space of $A$ when not all columns of $A$ are linearly independent, that is, when the rank of $A$ is less than the number of columns of $A$. So, if equation (13.3) has a solution and the rank of $A$ is less than the number of columns of $A$, then it has infinitely many solutions. On the other hand, if the rank of $A$ is equal to the number of columns of $A$, then equation (13.3) has at most one solution: if $A\vec{x}_1 = A\vec{x}_2 = \vec{b}$, then $A(\vec{x}_1 - \vec{x}_2) = \vec{0}$, and since the only vector in the null space of $A$ is $\vec{0}$, we get $\vec{x}_1 = \vec{x}_2$.
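These criteria can be checked mechanically. The sketch below is our own illustration (the function names `rank` and `classify` are hypothetical), and it uses one standard test that the text does not spell out: $\vec{b}$ is in the column space of $A$ exactly when attaching $\vec{b}$ as an extra column does not increase the rank. The rank is computed as the number of pivots found by elimination with exact fractions.

```python
from fractions import Fraction

def rank(M):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            factor = A[i][c] / A[r][c]
            A[i] = [a - factor * p for a, p in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def classify(A, b):
    """Classify the solutions of A x = b.

    b is in the column space of A exactly when appending b as an extra
    column does not increase the rank.
    """
    augmented = [row + [bi] for row, bi in zip(A, b)]
    r = rank(A)
    if rank(augmented) > r:
        return "no solution"
    if r == len(A[0]):      # rank equals number of columns: null space is {0}
        return "unique solution"
    return "infinitely many solutions"

print(classify([[2, 1], [6, -3]], [7, 3]))                                # unique solution
print(classify([[-2, 3, -6], [10, -9, 18], [8, -6, 12]], [8, -2, 5]))    # no solution
print(classify([[3, 9, 6], [5, 15, 7]], [9, 12]))                         # infinitely many solutions
```

The three calls use the equations of Examples 13.1, 13.2 and 13.3 in the next section, and the labels agree with the conclusions reached there by hand.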
13.3
Solving Matrix Linear Equations
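Now that we have learned how the rank of the matrix on the left-hand side determines the characteristics of a matrix linear equation, let us learn how to solve one. We will solve a matrix linear equation
$$A\vec{x} = \vec{b}$$
by using $\text{rref}(A)$. We can obtain $\text{rref}(A)$ by doing some row operations to $A$, which is the same as multiplying $A$ on the left by some elementary matrices. To keep the equation true, we need to multiply $\vec{b}$ on the right-hand side on the left by the same elementary matrices, so we need to do the same row operations to $\vec{b}$. (We can think of an $m$-dimensional vector as an $m \times 1$ matrix.) So, to solve a matrix linear equation, we first attach the vector $\vec{b}$ to the right of $A$, and then we do row operations to reduce $A$ to $\text{rref}(A)$. At the same time, the vector $\vec{b}$ becomes some new vector $\vec{b}'$. Then, we solve the equation $\text{rref}(A)\vec{x} = \vec{b}'$, which is much simpler than the original equation.

Also, notice that reducing $A\vec{x} = \vec{b}$ to $\text{rref}(A)\vec{x} = \vec{b}'$ by row operations is just like how we learned to solve systems of linear equations in high school algebra. There, we solved a system by multiplying an equation by a number, which is the same as multiplying a row of a matrix by a scalar, and by adding a multiple of an equation to another equation, which is the same as adding a multiple of a row to another row. Let us do some examples to get familiar with this.

Example 13.1: Solve $\begin{bmatrix} 2 & 1 \\ 6 & -3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \end{bmatrix}$.

First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\left[\begin{array}{cc|c} 2 & 1 & 7 \\ 6 & -3 & 3 \end{array}\right].$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Multiplying row 1 by $\frac{1}{2}$,
$$\left[\begin{array}{cc|c} 2 & 1 & 7 \\ 6 & -3 & 3 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 6 & -3 & 3 \end{array}\right].$$
Adding $-6$ times row 1 to row 2,
$$\left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 6 & -3 & 3 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 0 & -6 & -18 \end{array}\right].$$
Multiplying row 2 by $-\frac{1}{6}$,
$$\left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 0 & -6 & -18 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 0 & 1 & 3 \end{array}\right].$$
Adding $-\frac{1}{2}$ times row 2 to row 1,
$$\left[\begin{array}{cc|c} 1 & \frac{1}{2} & \frac{7}{2} \\ 0 & 1 & 3 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & 3 \end{array}\right].$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
when we do the multiplication on the left-hand side. This equation has a unique solution.

Example 13.2: Solve $\begin{bmatrix} -2 & 3 & -6 \\ 10 & -9 & 18 \\ 8 & -6 & 12 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 8 \\ -2 \\ 5 \end{bmatrix}$.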
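First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\left[\begin{array}{ccc|c} -2 & 3 & -6 & 8 \\ 10 & -9 & 18 & -2 \\ 8 & -6 & 12 & 5 \end{array}\right].$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Multiplying row 1 by $-\frac{1}{2}$,
$$\left[\begin{array}{ccc|c} -2 & 3 & -6 & 8 \\ 10 & -9 & 18 & -2 \\ 8 & -6 & 12 & 5 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 10 & -9 & 18 & -2 \\ 8 & -6 & 12 & 5 \end{array}\right].$$
Adding $-10$ times row 1 to row 2,
$$\left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 10 & -9 & 18 & -2 \\ 8 & -6 & 12 & 5 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 6 & -12 & 38 \\ 8 & -6 & 12 & 5 \end{array}\right].$$
Adding $-8$ times row 1 to row 3,
$$\left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 6 & -12 & 38 \\ 8 & -6 & 12 & 5 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 6 & -12 & 38 \\ 0 & 6 & -12 & 37 \end{array}\right].$$
Multiplying row 2 by $\frac{1}{6}$,
$$\left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 6 & -12 & 38 \\ 0 & 6 & -12 & 37 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 1 & -2 & \frac{19}{3} \\ 0 & 6 & -12 & 37 \end{array}\right].$$
Adding $\frac{3}{2}$ times row 2 to row 1,
$$\left[\begin{array}{ccc|c} 1 & -\frac{3}{2} & 3 & -4 \\ 0 & 1 & -2 & \frac{19}{3} \\ 0 & 6 & -12 & 37 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{11}{2} \\ 0 & 1 & -2 & \frac{19}{3} \\ 0 & 6 & -12 & 37 \end{array}\right].$$
Adding $-6$ times row 2 to row 3,
$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{11}{2} \\ 0 & 1 & -2 & \frac{19}{3} \\ 0 & 6 & -12 & 37 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{11}{2} \\ 0 & 1 & -2 & \frac{19}{3} \\ 0 & 0 & 0 & -1 \end{array}\right].$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \frac{11}{2} \\ \frac{19}{3} \\ -1 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x \\ y - 2z \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{11}{2} \\ \frac{19}{3} \\ -1 \end{bmatrix}$$
when we do the multiplication on the left-hand side. Therefore, this equation has no solution because $0 \neq -1$.

Example 13.3: Solve $\begin{bmatrix} 3 & 9 & 6 \\ 5 & 15 & 7 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 9 \\ 12 \end{bmatrix}$.

First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\left[\begin{array}{ccc|c} 3 & 9 & 6 & 9 \\ 5 & 15 & 7 & 12 \end{array}\right].$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Multiplying row 1 by $\frac{1}{3}$,
$$\left[\begin{array}{ccc|c} 3 & 9 & 6 & 9 \\ 5 & 15 & 7 & 12 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 5 & 15 & 7 & 12 \end{array}\right].$$
Adding $-5$ times row 1 to row 2,
$$\left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 5 & 15 & 7 & 12 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 0 & 0 & -3 & -3 \end{array}\right].$$
Multiplying row 2 by $-\frac{1}{3}$,
$$\left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 0 & 0 & -3 & -3 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 0 & 0 & 1 & 1 \end{array}\right].$$
Adding $-2$ times row 2 to row 1,
$$\left[\begin{array}{ccc|c} 1 & 3 & 2 & 3 \\ 0 & 0 & 1 & 1 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 3 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{array}\right].$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x + 3y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
when we do the multiplication on the left-hand side. So, we have $x + 3y = 1 \Leftrightarrow x = 1 - 3y$ and $z = 1$. Therefore, the solutions are
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 - 3y \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + y\begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix}$$
for any real number $y$. This equation has infinitely many solutions.

Example 13.4: Solve $\begin{bmatrix} 1 & -1 \\ 3 & 5 \\ -2 & 6 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 6 \\ 10 \\ -16 \end{bmatrix}$.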
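First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\left[\begin{array}{cc|c} 1 & -1 & 6 \\ 3 & 5 & 10 \\ -2 & 6 & -16 \end{array}\right].$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Adding $-3$ times row 1 to row 2,
$$\left[\begin{array}{cc|c} 1 & -1 & 6 \\ 3 & 5 & 10 \\ -2 & 6 & -16 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 8 & -8 \\ -2 & 6 & -16 \end{array}\right].$$
Adding 2 times row 1 to row 3,
$$\left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 8 & -8 \\ -2 & 6 & -16 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 8 & -8 \\ 0 & 4 & -4 \end{array}\right].$$
Multiplying row 2 by $\frac{1}{8}$,
$$\left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 8 & -8 \\ 0 & 4 & -4 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{array}\right].$$
Adding row 2 to row 1,
$$\left[\begin{array}{cc|c} 1 & -1 & 6 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{array}\right].$$
Adding $-4$ times row 2 to row 3,
$$\left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 4 & -4 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{array}\right].$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix}$$
when we do the multiplication on the left-hand side. Therefore, the solution is
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}.$$
This equation has a unique solution.

Example 13.5: Solve $\begin{bmatrix} 1 & 5 & 3 & 6 \\ 2 & 10 & 6 & 12 \\ 5 & 25 & 8 & 9 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 6 \\ 12 \\ 23 \end{bmatrix}$.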
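First, we attach the vector on the right-hand side to the matrix on the left-hand side:
$$\left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 2 & 10 & 6 & 12 & 12 \\ 5 & 25 & 8 & 9 & 23 \end{array}\right].$$
Then, we do some row operations to reduce the matrix to its reduced row echelon form. Adding $-2$ times row 1 to row 2,
$$\left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 2 & 10 & 6 & 12 & 12 \\ 5 & 25 & 8 & 9 & 23 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 & 23 \end{array}\right].$$
Adding $-5$ times row 1 to row 3,
$$\left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 5 & 25 & 8 & 9 & 23 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 & -7 \end{array}\right].$$
Multiplying row 3 by $-\frac{1}{7}$,
$$\left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -7 & -21 & -7 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{array}\right].$$
Adding $-3$ times row 3 to row 1,
$$\left[\begin{array}{cccc|c} 1 & 5 & 3 & 6 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{array}\right].$$
Switching row 2 and row 3,
$$\left[\begin{array}{cccc|c} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 3 & 1 \end{array}\right] \to \left[\begin{array}{cccc|c} 1 & 5 & 0 & -3 & 3 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
So, the original equation is equivalent to
$$\begin{bmatrix} 1 & 5 & 0 & -3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix},$$
which gives
$$\begin{bmatrix} x + 5y - 3w \\ z + 3w \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}$$
when we do the multiplication on the left-hand side. So, we have $x + 5y - 3w = 3 \Leftrightarrow x = 3 - 5y + 3w$ and $z + 3w = 1 \Leftrightarrow z = 1 - 3w$. Therefore, the solutions are
$$\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 3 - 5y + 3w \\ y \\ 1 - 3w \\ w \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 1 \\ 0 \end{bmatrix} + y\begin{bmatrix} -5 \\ 1 \\ 0 \\ 0 \end{bmatrix} + w\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix}$$
for any real numbers $y$ and $w$. This equation has infinitely many solutions.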
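The procedure used in Examples 13.1 to 13.5 can be sketched in Python. This is a minimal illustration with hypothetical names (`rref_augmented`, `solve`, `show`), assuming integer or fractional entries. It reduces $[A \mid \vec{b}]$ exactly with `fractions.Fraction`, reports no solution when a row reads $0 = $ (non-zero), and otherwise returns one particular solution (all free variables set to 0) together with one direction vector per free variable, in the same form as the answers to Examples 13.3 and 13.5.

```python
from fractions import Fraction

def rref_augmented(A, b):
    """Row reduce [A | b] to [rref(A) | b'] and return it with A's pivot columns."""
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    rows, cols = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                                  # free column
        M[r], M[p] = M[p], M[r]                       # switch rows
        pv = M[r][c]
        M[r] = [v / pv for v in M[r]]                 # leading 1
        for i in range(rows):                         # clear the rest of the column
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return M, pivots

def solve(A, b):
    """Return None if A x = b has no solution, else (particular, directions).

    Every solution is particular + t1*directions[0] + t2*directions[1] + ...,
    one direction per free variable (an empty list means a unique solution).
    """
    M, pivots = rref_augmented(A, b)
    cols = len(A[0])
    # A row [0 ... 0 | non-zero] means 0 = non-zero: no solution.
    if any(all(v == 0 for v in row[:cols]) and row[cols] != 0 for row in M):
        return None
    free = [c for c in range(cols) if c not in pivots]
    particular = [Fraction(0)] * cols
    for r, c in enumerate(pivots):
        particular[c] = M[r][cols]                    # free variables set to 0
    directions = []
    for f in free:
        d = [Fraction(0)] * cols
        d[f] = Fraction(1)
        for r, c in enumerate(pivots):
            d[c] = -M[r][f]                           # move the free term to the right side
        directions.append(d)
    return particular, directions

def show(result):
    """Readable form of solve's output."""
    if result is None:
        return "no solution"
    particular, directions = result
    return [str(v) for v in particular], [[str(v) for v in d] for d in directions]

print(show(solve([[2, 1], [6, -3]], [7, 3])))            # (['2', '3'], [])
print(show(solve([[3, 9, 6], [5, 15, 7]], [9, 12])))     # (['1', '0', '1'], [['-3', '1', '0']])
print(show(solve([[1, 5, 3, 6], [2, 10, 6, 12], [5, 25, 8, 9]], [6, 12, 23])))
# (['3', '0', '1', '0'], [['-5', '1', '0', '0'], ['3', '0', '-3', '1']])
```

The pivot is always the first non-zero entry at or below the current row, so the sequence of row operations can differ from the hand computations, but the reduced row echelon form, and therefore the answer, is the same.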
Exercises
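1. Solve $\begin{bmatrix} 2 & -2 & 6 \\ 3 & 6 & 0 \\ 12 & -12 & 15 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -8 \\ 12 \\ -27 \end{bmatrix}$.
2. Solve $\begin{bmatrix} -3 & 2 & 1 \\ 6 & -4 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 9 \\ -18 \end{bmatrix}$.
3. Solve $\begin{bmatrix} 5 & -3 & 2 & -1 \\ -2 & 6 & -5 & 1 \\ 7 & -1 & 3 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} -7 \\ 13 \\ 0 \end{bmatrix}$.
4. Solve $\begin{bmatrix} -6 & 3 \\ 1 & -2 \\ 5 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \\ -2 \end{bmatrix}$.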
Appendices
Appendix A
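Cauchy-Schwarz Inequality

In this appendix, we will prove the Cauchy-Schwarz inequality, which we used to prove the triangle inequality for the $\ell_2$ norm in chapter 4. In the proof, we will use the following two basic facts about inequalities.

Fact 1: If $a \geq b$ and $c \geq d$, then $a + c \geq b + d$. This should be intuitive: adding the larger numbers gives a sum at least as large as adding the smaller numbers.

Fact 2: The square of any real number is greater than or equal to 0 (non-negative). This follows from the range of the parabola $y = x^2$: $x^2 \geq 0$ for all real $x$.

Recall that the Cauchy-Schwarz inequality states that
$$\left(a_1^2 + a_2^2 + \cdots + a_n^2\right)\left(b_1^2 + b_2^2 + \cdots + b_n^2\right) \geq \left(a_1b_1 + a_2b_2 + \cdots + a_nb_n\right)^2.$$
Let us prove the case $n = 2$ first, which is
$$\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right) \geq \left(a_1b_1 + a_2b_2\right)^2.$$
If $a_1 = a_2 = 0$ or $b_1 = b_2 = 0$, both sides are 0, so we may assume that neither happens; then the square roots in the denominators below are non-zero. Using fact 2, we have
$$(x - y)^2 \geq 0 \;\Rightarrow\; x^2 + y^2 - 2xy \geq 0 \;\Rightarrow\; x^2 + y^2 \geq 2xy. \tag{A.1}$$
Letting $x_1 = \frac{a_1}{\sqrt{a_1^2 + a_2^2}}$ and $y_1 = \frac{b_1}{\sqrt{b_1^2 + b_2^2}}$ in inequality (A.1), we get
$$\frac{a_1^2}{a_1^2 + a_2^2} + \frac{b_1^2}{b_1^2 + b_2^2} \geq \frac{2a_1b_1}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}. \tag{A.2}$$
Letting $x_2 = \frac{a_2}{\sqrt{a_1^2 + a_2^2}}$ and $y_2 = \frac{b_2}{\sqrt{b_1^2 + b_2^2}}$ in inequality (A.1), we get
$$\frac{a_2^2}{a_1^2 + a_2^2} + \frac{b_2^2}{b_1^2 + b_2^2} \geq \frac{2a_2b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}}. \tag{A.3}$$
Using fact 1, adding (A.2) and (A.3) gives
$$\begin{aligned} \frac{a_1^2 + a_2^2}{a_1^2 + a_2^2} + \frac{b_1^2 + b_2^2}{b_1^2 + b_2^2} &\geq \frac{2a_1b_1 + 2a_2b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}} \\ 2 &\geq \frac{2(a_1b_1 + a_2b_2)}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}} \\ 1 &\geq \frac{a_1b_1 + a_2b_2}{\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)}} \\ \sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)} &\geq a_1b_1 + a_2b_2. \end{aligned}$$
The same argument with $b_1, b_2$ replaced by $-b_1, -b_2$ gives $\sqrt{\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right)} \geq -(a_1b_1 + a_2b_2)$, so the left-hand side is at least $|a_1b_1 + a_2b_2|$. Since both sides are now non-negative, squaring both sides gives the Cauchy-Schwarz inequality for $n = 2$:
$$\left(a_1^2 + a_2^2\right)\left(b_1^2 + b_2^2\right) \geq \left(a_1b_1 + a_2b_2\right)^2.$$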
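Similarly, the general case for any natural number $n$ can be proven by letting
$$x_1 = \frac{a_1}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}, \quad x_2 = \frac{a_2}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}, \quad \ldots, \quad x_n = \frac{a_n}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}}$$
and
$$y_1 = \frac{b_1}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}, \quad y_2 = \frac{b_2}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}, \quad \ldots, \quad y_n = \frac{b_n}{\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}},$$
applying inequality (A.1) to each pair $x_i, y_i$, and adding the $n$ resulting inequalities, just as in the case $n = 2$.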
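The normalization trick in the general case can also be checked numerically. The sketch below is plain Python; the function names `cauchy_schwarz_sides` and `normalized_sums` are hypothetical, chosen for this illustration. It computes both sides of the inequality, and the normalized quantities $x_i$, $y_i$ from the proof, for many random vectors. It is a sanity check on examples, not a proof.

```python
import math
import random

def cauchy_schwarz_sides(a, b):
    """Return (left, right) for (sum a_i^2)(sum b_i^2) >= (sum a_i b_i)^2."""
    left = sum(x * x for x in a) * sum(y * y for y in b)
    right = sum(x * y for x, y in zip(a, b)) ** 2
    return left, right

def normalized_sums(a, b):
    """Follow the proof: x_i = a_i / sqrt(sum a^2), y_i = b_i / sqrt(sum b^2).

    Returns (sum of x_i^2 + y_i^2, sum of 2 x_i y_i). Adding the n copies
    of (A.1) says the first value is >= the second. Assumes neither
    a nor b is all zeros.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    xs = [x / norm_a for x in a]
    ys = [y / norm_b for y in b]
    total = sum(x * x + y * y for x, y in zip(xs, ys))  # 2, up to rounding
    doubled_dot = sum(2 * x * y for x, y in zip(xs, ys))
    return total, doubled_dot

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(-10, 10) for _ in range(n)]
    b = [random.uniform(-10, 10) for _ in range(n)]
    left, right = cauchy_schwarz_sides(a, b)
    assert left >= right - 1e-9 * max(1.0, left)  # allow float rounding
    total, doubled_dot = normalized_sums(a, b)
    assert abs(total - 2) < 1e-9 and total >= doubled_dot - 1e-9

print(cauchy_schwarz_sides([1, 2], [3, 4]))  # (125, 121)
```

Equality holds exactly when one vector is a multiple of the other; for example, `normalized_sums([1, 2], [2, 4])` returns two values that are both 2 up to rounding.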
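There is also a simpler proof using the dot product. In chapter 5, we learned that
$$\vec u \cdot \vec v = \|\vec u\|_2 \, \|\vec v\|_2 \cos\theta,$$
where $\theta$ is the smaller angle between $\vec u$ and $\vec v$. Squaring both sides,
$$(\vec u \cdot \vec v)^2 = \left(\|\vec u\|_2\right)^2 \left(\|\vec v\|_2\right)^2 \cos^2\theta.$$
Because $\cos^2\theta \le 1$,
$$(\vec u \cdot \vec v)^2 \le \left(\|\vec u\|_2\right)^2 \left(\|\vec v\|_2\right)^2.$$
Letting
$$\vec u = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \quad \text{and} \quad \vec v = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix},$$
we obtain the Cauchy-Schwarz inequality
$$\left(a_1 b_1 + a_2 b_2 + \cdots + a_n b_n\right)^2 \le \left(a_1^2 + a_2^2 + \cdots + a_n^2\right)\left(b_1^2 + b_2^2 + \cdots + b_n^2\right).$$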
Appendix B
Circle in Taxicab Geometry

Before discussing the circle in Taxicab geometry, let us talk about the circle in Euclidean geometry first. We know how a circle looks.
Figure B.1: Circle in Euclidean geometry

However, a circle only looks this way in Euclidean geometry. What a circle looks like depends on the type of geometry it is in, but the definition of a circle is always the same.

Definition of circle: A circle is the set of all points equidistant from the center.

In figure B.1, the Euclidean distances between the center and all points on the circle are the same, since the Euclidean distance between two points is defined as the length of the line segment connecting those points.
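Because the definition of a circle refers only to a distance, it can be written once with the distance passed in as a parameter. Below is a minimal Python sketch; the names `on_circle` and `euclidean_distance` are hypothetical, and only the Euclidean distance is plugged in here.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]

def euclidean_distance(p: Point, q: Point) -> float:
    """Length of the line segment connecting p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def on_circle(point: Point, center: Point, r: float,
              distance: Callable[[Point, Point], float],
              tol: float = 1e-9) -> bool:
    """A circle is the set of all points at distance r from the center,
    for whichever distance function the geometry uses."""
    return abs(distance(point, center) - r) <= tol

print(on_circle((3, 4), (0, 0), 5, euclidean_distance))  # True
print(on_circle((4, 4), (0, 0), 5, euclidean_distance))  # False
```

Passing the distance as an argument keeps the definition of a circle fixed; only the notion of distance changes from one geometry to another.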
Figure B.2: Circle in Taxicab geometry
However, in Taxicab geometry the definition of distance changes, so the shape of a circle changes. In Taxicab geometry, a circle is the set of all points having the same L1 distance from the center. That is why a circle in Taxicab geometry has the shape shown in figure B.2.

We can think about this graphically as well. Let us set the center at $(0, 0)$. The Euclidean distance between a point $(x, y)$ and the origin is $\sqrt{x^2 + y^2}$. A circle with radius $r$ in Euclidean geometry is the set of all points $(x, y)$ whose Euclidean distance from the center $(0, 0)$ is $r$. So, to graph the circle with radius $r$ in Euclidean geometry, we graph the curve $\sqrt{x^2 + y^2} = r$, or equivalently $x^2 + y^2 = r^2$.
Figure B.3: Graph of $x^2 + y^2 = r^2$

In Taxicab geometry, the L1 distance between a point $(x, y)$ and the origin is $|x| + |y|$. A circle with radius $r$ in Taxicab geometry is the set of all points $(x, y)$ whose L1 distance from the center $(0, 0)$ is $r$. So, to graph the circle with radius $r$ in Taxicab geometry, we graph the curve $|x| + |y| = r$.
Figure B.4: Graph of $|x| + |y| = r$
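The following Python sketch (function names hypothetical) samples points on the Taxicab circle $|x| + |y| = r$ by walking along the four straight edges between $(r, 0)$, $(0, r)$, $(-r, 0)$, and $(0, -r)$. It checks that every sampled point has L1 distance $r$ from the origin, then computes the Euclidean distances of the same points. Those are not all equal: they range from $r/\sqrt{2}$ at the midpoints of the edges to $r$ at the four corners, which is why the Taxicab circle does not look round.

```python
import math

def taxicab_distance(p, q):
    """L1 distance between points p and q: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def taxicab_circle_points(r, steps_per_side=4):
    """Sample points on |x| + |y| = r by walking the four straight edges
    between the corners (r, 0), (0, r), (-r, 0), (0, -r)."""
    corners = [(r, 0), (0, r), (-r, 0), (0, -r)]
    points = []
    for i in range(4):
        (x0, y0), (x1, y1) = corners[i], corners[(i + 1) % 4]
        for k in range(steps_per_side):
            t = k / steps_per_side  # fraction of the way along this edge
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points

r = 2
points = taxicab_circle_points(r)
# Every sampled point is at L1 distance r from the center (0, 0) ...
assert all(abs(taxicab_distance(p, (0, 0)) - r) < 1e-12 for p in points)
# ... but their Euclidean distances from the center are not all equal.
euclidean = [math.hypot(x, y) for x, y in points]
print(min(euclidean), max(euclidean))
```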
Appendix C
Beyond Vectors and Matrices: Blades and Tensors

In chapter 1, we learned that vectors are arrows with directions. That means vectors are one-dimensional oriented objects. They are one-dimensional because they are arrows; they are oriented because they have directions (or orientations).
Figure C.1: A vector

If we have one-dimensional oriented objects, then do we have two-dimensional oriented objects? The answer is yes!
Figure C.2: A 2-blade or a bivector

We can have an oriented parallelogram as in figure C.2, where the arc with an arrow at its end indicates the orientation of that parallelogram. Such a two-dimensional oriented object is called a 2-blade or a bivector. (A vector is a 1-blade because it is a one-dimensional oriented object.)
The symbol “∧” denotes the wedge product, a type of product used in an advanced field of algebra called exterior algebra, just as in linear algebra we learned a type of product called the dot product (in chapter 5). The wedge product of two vectors gives a 2-blade.
Figure C.3: $\vec v \wedge \vec u$ has the opposite orientation

In figure C.2, the wedge product $\vec u \wedge \vec v$ gives a parallelogram oriented counterclockwise. Since the orientation matters, changing the order of the wedge product reverses it: $\vec v \wedge \vec u$ is oriented clockwise.

For higher dimensions, if we want a higher-dimensional oriented object, we just take the wedge product of more vectors. For example, the wedge product of three vectors gives an oriented parallelepiped. Such a three-dimensional oriented object is called a 3-blade or a trivector. A parallelepiped is a three-dimensional box whose faces are all parallelograms.
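In coordinates, these blades can be described by a single signed number. This goes beyond what this appendix covers, but it is a standard fact: in 2D, $\vec u \wedge \vec v = (u_1 v_2 - u_2 v_1)\,\vec e_1 \wedge \vec e_2$, and in 3D, $\vec u \wedge \vec v \wedge \vec w$ is the determinant of the $3 \times 3$ matrix with rows $\vec u, \vec v, \vec w$ times $\vec e_1 \wedge \vec e_2 \wedge \vec e_3$. The absolute value is the area (or volume), and the sign gives the orientation. The hypothetical helpers below compute only these coefficients, under the usual convention that $\vec e_1 \wedge \vec e_2$ is oriented counterclockwise; swapping two vectors flips the sign, matching figure C.3.

```python
def wedge2(u, v):
    """Coefficient of e1∧e2 in u∧v for 2D vectors u, v.

    |result| is the area of the parallelogram spanned by u and v;
    the sign is positive when turning from u to v is counterclockwise."""
    return u[0] * v[1] - u[1] * v[0]

def wedge3(u, v, w):
    """Coefficient of e1∧e2∧e3 in u∧v∧w for 3D vectors: the determinant
    of the matrix with rows u, v, w (signed parallelepiped volume)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

u, v = (3, 1), (1, 2)
print(wedge2(u, v), wedge2(v, u))  # 5 -5: same area, opposite orientation
print(wedge3((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # 1: the unit cube
```

Two parallel vectors, such as $(2, 4)$ and $(1, 2)$, give a coefficient of 0: the parallelogram they span has no area.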
Figure C.4: A parallelepiped (not oriented)

In general, an n-dimensional oriented object, called an n-blade, is the wedge product of n vectors.

Other than viewing a vector as an arrow, we can also view a vector as a list of numbers such as $\begin{bmatrix} 3 \\ 5 \\ -10 \end{bmatrix}$. Then, in chapter 6, we learned that a matrix is a block of numbers with two dimensions: rows and columns.
So, we can think of vectors as one-dimensional collections of numbers and matrices as two-dimensional collections of numbers. Then, do we have three-dimensional or any n-dimensional collections of numbers? The answer is, again, yes! For example, we can have a box of numbers with three dimensions.
Figure C.5: Tensors (panels: rank-1 tensor (vector), rank-2 tensor (matrix), rank-3 tensor)

Such a box of numbers with three dimensions is called a rank-3 tensor. In this sense, a vector is a rank-1 tensor, and a matrix is a rank-2 tensor. In general, an n-dimensional collection of numbers is a rank-n tensor.

Blades and tensors are advanced concepts in algebra, well beyond the scope of this book. The purpose of this appendix is to give readers a general idea that the concepts of linear algebra we learn in this book can be extended, and that algebra in mathematics is much more than high school algebra.
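Nested lists give a simple way to picture tensors in code: the number of levels of nesting is the number of dimensions, which this appendix calls the rank (some texts call it the order, to avoid confusion with the rank of a matrix). A minimal Python sketch with hypothetical helper names, assuming regular, non-empty nesting:

```python
def tensor_shape(t):
    """Length of each dimension of a nested list, outermost first.
    Assumes regular, non-empty nesting (every sub-list has the same shape)."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]  # step one level deeper
    return tuple(shape)

def tensor_rank(t):
    """Number of dimensions: 0 for a number, 1 for a vector, 2 for a matrix, ..."""
    return len(tensor_shape(t))

vector = [3, 5, -10]                        # rank-1 tensor
matrix = [[1, 2, 3], [4, 5, 6]]             # rank-2 tensor
box = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # rank-3 tensor

for name, t in [("vector", vector), ("matrix", matrix), ("box", box)]:
    print(name, tensor_rank(t), tensor_shape(t))
# vector 1 (3,)
# matrix 2 (2, 3)
# box 3 (2, 2, 2)
```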
Solutions to Exercise Problems

Chapter 1

1. Vectors: A, B, D, F; Scalars: C, E
2. [Graph of the vector in the xy-plane, ending at (2, −1)]
The vector represents the point (2, −1).
3. [Graph of the vector in the xy-plane, ending at (−3, 4)]
The vector represents the point (−3, 4).
4. [Graph of the vector in the xy-plane, ending at (−2, −3)]
The vector represents the point (−2, −3).
5. It represents the point (1, 3, 6). Its y-coordinate is 3.
6. It represents the point (−4, 5, −2). Its x-coordinate is −4.
7. It represents the point (8, 15, −20). Its z-coordinate is −20.
8. 5 dimensions
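Chapter 2

1. $\begin{bmatrix} -5 \\ 5 \\ -3 \end{bmatrix}$
2. $\begin{bmatrix} 2 \\ 1 \\ -1 \\ -5 \end{bmatrix}$
3. $\begin{bmatrix} 16 \\ 30 \\ -12 \end{bmatrix}$
4. $\begin{bmatrix} -4 \\ 12 \\ -16 \\ 0 \\ -8 \\ 8 \end{bmatrix}$
5. $\begin{bmatrix} 3 \\ 11 \end{bmatrix}$
6. $\begin{bmatrix} 0 \\ -8 \\ 6 \end{bmatrix}$
7. Let $\vec v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$. Then,
$$\begin{aligned}
(a + b) \cdot \vec v &= (a + b) \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} (a + b) \cdot v_1 \\ (a + b) \cdot v_2 \\ \vdots \\ (a + b) \cdot v_n \end{bmatrix} = \begin{bmatrix} a \cdot v_1 + b \cdot v_1 \\ a \cdot v_2 + b \cdot v_2 \\ \vdots \\ a \cdot v_n + b \cdot v_n \end{bmatrix} \\
&= \begin{bmatrix} a \cdot v_1 \\ a \cdot v_2 \\ \vdots \\ a \cdot v_n \end{bmatrix} + \begin{bmatrix} b \cdot v_1 \\ b \cdot v_2 \\ \vdots \\ b \cdot v_n \end{bmatrix} = a \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} + b \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = a\vec v + b\vec v.
\end{aligned}$$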
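The proof in exercise 7 works one component at a time. A quick Python spot check with exact fractions (the helpers `scale` and `add` are hypothetical, and the values of $a$, $b$, and $\vec v$ are arbitrary) confirms one instance of $(a + b)\vec v = a\vec v + b\vec v$; it illustrates the identity but does not replace the proof.

```python
from fractions import Fraction

def scale(c, v):
    """Scalar multiplication: multiply every component of v by c."""
    return [c * x for x in v]

def add(u, v):
    """Vector addition: add corresponding components."""
    return [x + y for x, y in zip(u, v)]

a, b = Fraction(2, 3), Fraction(-5, 4)
v = [Fraction(1), Fraction(-7), Fraction(3, 2), Fraction(0)]

left = scale(a + b, v)                 # (a + b) v
right = add(scale(a, v), scale(b, v))  # a v + b v
assert left == right  # exact arithmetic, so no rounding tolerance is needed
print(left == right)  # True
```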
Chapter 3 0 4 1 −5 1. 0 = 3 4 − 2 6 + 3 0 2 1 8 3 1 5 2 1 5 2 1 1 0 1 1 = − 2 , so ∈ span , 0 . 2. 0 −2 −1 0 −2 −1 −2 4 3 −2 4 3
3. There can be at most three linearly independent three-dimensional vectors, so $\begin{bmatrix} 5 \\ 1 \\ 4 \end{bmatrix}$, $\begin{bmatrix} -2 \\ 3 \\ 8 \end{bmatrix}$, $\begin{bmatrix} 6 \\ 0 \\ 9 \end{bmatrix}$, and $\begin{bmatrix} -1 \\ -3 \\ 7 \end{bmatrix}$ are linearly dependent.
4. B. It does not contain the origin (0, 0).
5. 5
6. It is not a set of basis vectors for $\mathbb{R}^2$. Basis vectors have to be linearly independent, but $\frac{3}{2}\begin{bmatrix} -2 \\ 6 \end{bmatrix} = \begin{bmatrix} -3 \\ 9 \end{bmatrix}$.
7. It is the span of three linearly independent vectors, so the dimension of the vector space is three. It is the span of three-dimensional vectors, so it is a subspace of $\mathbb{R}^3$.
8. The first two vectors are linearly independent, and the third vector is a linear combination of the first two. It is the span of two linearly independent vectors, so the dimension of the vector space is two. It is the span of four-dimensional vectors, so it is a subspace of $\mathbb{R}^4$.
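Exercise 3 relies on the fact that at most three three-dimensional vectors can be linearly independent. One way to confirm this for the four vectors is Gaussian elimination, sketched below in Python with exact fractions. The function `rank` is a hypothetical helper, and Gaussian elimination is swapped in here only as a checking tool; the solution itself does not use it. The number of pivots found equals the number of linearly independent vectors.

```python
from fractions import Fraction

def rank(vectors):
    """Number of linearly independent vectors among `vectors`,
    found by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear this column in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

vs = [[5, 1, 4], [-2, 3, 8], [6, 0, 9], [-1, -3, 7]]
print(rank(vs))  # 3: fewer than four, so the four vectors are dependent
```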
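Chapter 4

1. $\|\vec u\|_1 = 10$
2. $\|\vec v\|_2 = 13$
3. $\hat w = \dfrac{\vec w}{\|\vec w\|_2} = \dfrac{1}{\sqrt{6}}\,\vec w = \begin{bmatrix} 1/\sqrt{6} \\ -2/\sqrt{6} \\ -1/\sqrt{6} \end{bmatrix}$
4. a. $\|\vec u\|_2 = \sqrt{11}$
b.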