Linear Algebra [2 ed.]
 9789332522145, 9332522146

Table of contents :
Cover
Copyright
Contents
Preface
Preface to the Second Edition
A Note to Students
List of Symbols
Matrices
1.1INTRODUCTION
1.2BASIC CONCEPTS
1.3 MATRIX OPERATIONS AND THEIR PROPERTIES
1.4 INVERTIBLE MATRICES
1.5 TRANSPOSE OF A MATRIX
1.6 PARTITION OF MATRICES; BLOCK MULTIPLICATION
1.7 GROUPS AND FIELDS
Systems of LinearEquations
2.1 INTRODUCTION
2.2 GAUSSIAN ELIMINATION
2.3 ELEMENTARY ROW OPERATIONS
2.4 ROW REDUCTION
2.5 INVERTIBLE MATRICES AGAIN
2.6 LU FACTORIZATION
2.7 DETERMINANT
Vector Spaces
3.1 INTRODUCTION
3.2 BASIC CONCEPTS
3.3 LINEAR INDEPENDENCE
3.4 BASIS AND DIMENSION
3.5 SUBSPACES AGAIN
3.6 RANK OF A MATRIX
3.7 ORTHOGONALITY IN Rn
3.8 BASES OF SUBSPACES
3.9 QUOTIENT SPACE
Linear Mapsand Matrices
4.1 INTRODUCTION
4.2 BASIC CONCEPTS
4.3 ALGEBRA OF LINEAR MAPS
4.4 ISOMORPHISM
4.5 MATRICES OF LINEAR MAPS
Linear Operators
5.1 INTRODUCTION
5.2 POLYNOMIALS OVER FIELDS
5.3 CHARACTERISTIC POLYNOMIALS AND EIGENVALUES
5.4 MINIMAL POLYNOMIAL
5.5 INVARIANT SUBSPACES
5.6 SOME BASIC RESULTS
5.7 REAL QUADRATIC FORMS
Canonical Forms
6.1 INTRODUCTION
6.2 PRIMARY DECOMPOSITION THEOREM
6.3 JORDAN FORMS
Bilinear Forms
7.1 INTRODUCTION
7.2 BASIC CONCEPTS
7.3 LINEAR FUNCTIONALS AND DUAL SPACE
7.4 SYMMETRIC BILINEAR FORMS
7.5 GROUPS PRESERVING BILINEAR FORMS
Inner Product Spaces
8.1 INTRODUCTION
8.2 HERMITIAN FORMS
8.3 INNER PRODUCT SPACE
8.4 GRAM–SCHMIDT ORTHOGONALIZATION PROCESS
8.5 ADJOINTS
8.6 UNITARY AND ORTHOGONAL OPERATORS
8.7 NORMAL OPERATORS
Bibliography
Index
Blank Page

Citation preview

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Linear Algebra

i

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Linear Algebra Second Edition

Promode Kumar Saikia North-Eastern Hill University

iii

Saikia-Linear Algebra

book1

February 18, 2014

14:0

No part of this eBook may be used or reproduced in any manner whatsoever without the publisher’s prior written consent. Copyright © 2014 Pearson India Education Services Pvt. Ltd This eBook may or may not include all assets that were part of the print version. The publisher reserves the right to remove any material in this eBook at any time. ISBN: 9789332522145 eISBN: 9789332540521 Head Office: 7th Floor, Knowledge Boulevard, A-8(A) Sector 62, Noida 201 309, India. Registered Office: Module G4, Ground Floor, Elnet Software City, TS-140, Block 2 & 9, Rajiv Gandhi Salai, Taramani, Chennai, Tamil Nadu 600113, Fax : 080-30461003, Phone: 080-30461060, www.pearson.co.in, Email id: [email protected]

iv

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Contents Preface

ix

Preface to the Second Edition

xiii

A Note to Students

xv

List of Symbols 1

2

xvii

Matrices

1

1.1 Introduction

1

1.2 Basic Concepts

1

1.3 Matrix Operations and Their Properties

15

1.4 Invertible Matrices

27

1.5 Transpose of a Matrix

32

1.6 Partition of Matrices; Block Multiplication

36

1.7 Groups and Fields

45

Systems of Linear Equations

49

2.1 Introduction

49

2.2 Gaussian Elimination

49

2.3 Elementary Row Operations

55

2.4 Row Reduction

65

2.5 Invertible Matrices Again

77

2.6 LU Factorization

82

2.7 Determinant

96

v

Saikia-Linear Algebra

vi

3

4

5

6

book1

February 18, 2014

14:0

Contents

Vector Spaces

114

3.1 Introduction

114

3.2 Basic Concepts

115

3.3 Linear Independence

127

3.4 Basis and Dimension

135

3.5 Subspaces Again

147

3.6 Rank of a Matrix

153

3.7 Orthogonality in Rn

163

3.8 Bases of Subspaces

178

3.9 Quotient Space

184

Linear Maps and Matrices

191

4.1 Introduction

191

4.2 Basic Concepts

191

4.3 Algebra of Linear Maps

204

4.4 Isomorphism

215

4.5 Matrices of Linear Maps

221

Linear Operators

237

5.1 Introduction

237

5.2 Polynomials Over Fields

238

5.3 Characteristic Polynomials and Eigenvalues

243

5.4 Minimal Polynomial

271

5.5 Invariant Subspaces

283

5.6 Some Basic Results

298

5.7 Real Quadratic Forms

310

Canonical Forms

321

6.1 Introduction

321

6.2 Primary Decomposition Theorem

321

6.3 Jordan Forms

329

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Contents

7

8

Bilinear Forms

vii

346

7.1 Introduction

346

7.2 Basic Concepts

346

7.3 Linear Functionals and Dual Space

355

7.4 Symmetric Bilinear Forms

360

7.5 Groups Preserving Bilinear Forms

374

Inner Product Spaces

380

8.1 Introduction

380

8.2 Hermitian Forms

380

8.3 Inner Product Space

385

8.4 Gram–Schmidt Orthogonalization Process

390

8.5 Adjoints

403

8.6 Unitary and Orthogonal Operators

409

8.7 Normal Operators

416

Bibliography

430

Index

431

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Preface This book is the outcome of a growing realization, shared by my colleagues, that there is a need for a comprehensive textbook in linear algebra whose main emphasis should be on clarity of exposition. There are several excellent textbooks available currently; however, the perception is that each of these has its own area of excellence leaving room for improvement. This perception has guided the approach to some topics of this book. For the contents of the book, I have drawn on my experience of teaching a full semester course in linear algebra over the years for postgraduate classes in the North-Eastern Hill University in Shillong, India. The inputs from some colleagues from undergraduate colleges have also helped. My main concern has always been with simplicity and clarity, and an effort has been made to avoid cumbersome notations. I have opted for informal discussions instead of giving definitions which appear cluttered-up. Overall, our aim has been to help readers acquire a feeling for the subject. Plenty of examples and numerous exercises are also included in this book. Chapter 1 introduces matrices and matrix operations and explores the algebraic structures of sets of matrices while emphasizing the similarities with more familiar structures. The role of unit matrices in the ring structure of matrices is also discussed. Block operations of partitioned matrices are quite useful in later chapters. This chapter discusses such matrices to make readers comfortable with their uses. Chapter 2 comprehensively covers the treatment of solutions of systems of linear equations by row reduction with the help of elementary row/column operations. Elementary matrices appear naturally; their usefulness in analysing matrices, especially invertible matrices, is also examined, and a section on properties of determinants is also included in this chapter. Determinants are defined in terms of expansions by minors along the first row; by doing so, it has become possible to give proofs of properties of determinants of arbitrary orders accessible to even undergraduate students. It should be noted that these properties are well-known and used frequently but hardly proved in classrooms. Chapter 3 begins by introducing the basic concepts related to vector spaces. Ample examples are provided for concepts like linear independence, basis and coordinates to make it easier for an average student. A whole section of this chapter is devoted to the idea of the rank of a matrix in computations as well as in theory. Rank of a matrix is defined through the row space and the column space of the matrix; this approach has the advantage of working with ideas like linear independence to make relevant proofs more transparent. Computations of bases of sums and intersections of subspaces have always been difficult for students and an attempt has been made to remove the difficulties of such computations. The easy-paced treatment of the topics of these three chapters makes this part of the book suitable for both students and teachers of undergraduate courses. Chapters 4 to 8 deal adequately with the essentials in linear algebra for a postgraduate student in mathematics. More practically, the topics cover the requirements of the NET syllabus. A brief look at the contents of these chapters follows. Linear maps between vector spaces are studied in detail in Chapter 4. The interplay between linear maps and matrices is stressed throughout this chapter. Other

ix

Saikia-Linear Algebra

x

book1

February 18, 2014

14:0

Preface

important concepts, such as isomorphism, dimension formula and similarity, are dealt with in this chapter. Projections as well as nilpotent maps and matrices are also introduced so that readers are familiar with them long before their actual applications. Chapter 5 is a long one; the goal is to obtain the diagonalization theorems. However, the main emphasis is to carefully develop the concepts, such as eigenvalues, characteristic polynomials, minimal polynomials and invariant subspaces, which are essential in many branches of higher mathematics. Cyclic subspaces and companion matrices are also treated here. Chapter 6 is devoted to canonical forms of matrices. A shorter and more accessible treatment of Jordan form is provided. Primary decomposition theorem and rational canonical forms are the other two topics in this chapter. Chapter 7 discusses bilinear forms. A method for diagonalizing symmetric matrices as well as quadratic forms is given here. Sylvester’s classical result for real symmetric matrices is also included. Chapter 8 deals with certain essential concepts which can be treated in the framework of inner product spaces and are introduced through hermitian forms. The main objective of this chapter is to obtain the orthogonal diagonalization of hermitian and real symmetric matrices. Standard topics, such as Gram-Schmidt process, adjoints, self-adjoint and normal operators, are thoroughly examined in this chapter leading to the Spectral theorem. Unitary and orthogonal operators are the other key topics of this chapter. The final chapter, Chapter 9, is devoted to a few topics which are must for a student of linear algebra but unfortunately do not find a place in the syllabi of linear algebra in most of the Indian universities. The chapter begins with a discussion of rigid motions and the canonical forms for orthogonal operators. Many applications of linear algebra in diverse disciplines depend on the theory of real quadratic forms and real symmetric matrices; as examples of such applications, this chapter discusses the classifications of conics and quadrics as well as the problems of constrained optimization, and relative extrema of real-valued functions. To facilitate the discussion of these problems, positive definite matrices are also introduced. Singular value decompositions of real or complex matrices reveal important properties of such matrices and lead to amazing applications. The last section of the chapter deals with singular value decompositions; as an application, Moore–Penrose inverses of matrices are briefly discussed. Numerous exercises are provided for almost all the sections of the book. These exercises form an integral part of the text; attempts to solve these will enhance the understanding of the material they deal with. A word about the true/false questions included in this book: We, at the North-Eastern Hill University, have been encouraging the practice of including such true/false questions in examination papers. We hope that the inclusion of such questions in this book will help spread the practice to other mathematics departments of the country. My thoughts about the subject matter of this book have been shaped by various books and articles on algebra and linear algebra by master expositors such as Halmos, Herstein, Artin and others. Their influence on this book is undeniable. I take this opportunity to acknowledge my indebtedness to all of them. 
I have also been greatly benefited by the textbooks listed in the bibliography; I express my gratitude to all the authors of these textbooks. The material about isometry in the last chapter closely follows Kumaresan’s lovely article on isometries which appeared in the Mathematics Newsletter, vol. 14, March 2005. Above all, my colleagues in the Mathematics Department of the North-Eastern Hills University deserve special thanks for helping me in so many ways during the preparation of this book. Professor M.B. Rege and Professor H. Mukherjee were always ready with suggestions for me; their encouragement kept me going. Innumerable discussions with my younger colleagues, Ashish Das, A. Tiken Singh, A. M. Buhphang, S. Dutta, J. Singh and Deepak Subedi, helped me immensely to give the final shape to the manuscript, especially in preparing various exercises. A. Tiken Singh and Ashish Das also

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Preface

xi

helped me to learn the intricacies of AMS Latex. Professor Nirmal Dev read through some portions of the initial draft; I thank him for his valuable suggestions. I must thank the authorities of the North-Eastern Hill University, especially the then ViceChancellor Professor M. Miri for granting me sabbatical leave for a year in 2003 during which the first draft of this book was prepared. Finally, I must admit that without my wife Moinee’s support, it would have been impossible to go through preparing and typing several drafts of the manuscript for this book in the last five years. She deserves my special thanks. In spite of all the care I have taken, mistakes may have remained in this book. I take full responsibility for any such mistake and will appreciate if they are pointed out. —Promode Kumar Saikia

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Saikia-Linear Algebra

book1

February 18, 2014

14:0

Preface to the Second Edition This new edition was initially conceived so that certain topics, as suggested by reviewers, could be incorporated to make the book useful to a wider readership. Apart from these new topics, select portions of the first edition, mainly in Chapters 3, 6 and 7, have been rewritten for greater clarity for the present edition. Moreover, material in this edition is arranged in such a way that due importance can be given to real symmetric matrices in the initial chapters only. As a result, it has been possible to present the important result about the orthogonal diagonalizibility of real symmetric matrices in Chapter 5 (where diagonalizable operators are discussed). This removes a major drawback of the first edition, where one had to wait till Chapter 8 to obtain the same result as a consequence of the theory of self-adjoint operators in general inner product spaces. The following are the new additions in this edition: LU factorization (Section 2.6; permutation matrices needed for this section are introduced in a new subsection in Section 2.3), orthogonality in Rn with respect to the standard dot product(Section 3.7; this section also deals with orthogonal and unitary matrices and Gram Schmidt process), orthogonal diagonalizibility of real symmetric matrices (Section 5.3, in a new subsection), groups preserving bilinear forms such as orthogonal, pseudoorthogonal and symplectic groups (Section 7.5). A new section 1.7 contains definitions and examples of groups, rings and fields for the benefit of readers not familiar with these concepts. Because of these new material, we have to omit the last chapter of the first edition on selected topics in this edition to keep the book to a reasonable length; the omitted chapter, along with a new section on difference equations and recurrence relations, will be uploaded to the website of the book. However, two important topics from the omitted chapter are included in this edition at appropriate places. The first one deals with conic sections as an application of real quadratic forms (Section 5.7) and the other briefly discusses positive definite and positive semi definite matrices (in Section 7.4). The topic of rational canonical forms in Chapter 6 of the first edition is shifted to the website. The other major changes in this edition are as follows: a new proof of the result that the characteristic and the minimal polynomials of a linear operator have the same irreducible factors, which does not use field extensions (Proposition 5.6.6), correction of the proof of existence of SN decomposition (Proposition 6.2.3) and a simpler proof of Sylvester’s law of inertia (Theorem 7.4.8). Some incorrect statements in the first edition are either corrected or removed for this edition. Some new exercises are also added in this edition. While preparing this edition, I have had many fruitful discussions with several of my colleagues, especially with Prof. A. K. Das of North Eastern Hill University and Prof. R. Shukla of Allahabad University; their valuable suggestions helped me in formulating the contents of several new topics. Prof. S. S. Khare went through the drafts of some of the new material and offered useful comments to

xiii

Saikia-Linear Algebra

xiv

book1

February 18, 2014

14:0

Preface to the Second Edition

make the presentation better. Thanks are due also to the teachers using the first edition in class rooms for pointing out mistakes and shortcomings in the first edition. I must acknowledge the role of the anonymous reviewers too whose views prompted this revised edition. Finally, my sincere thanks to all the persons in Pearson Education India, especially Jigyasa Bhatia, Anita Yadav, Rajesh Matthews and Nikhil Rakshit, whose patient cooperation and useful suggestions have made this new edition possible. —Promode Kumar Saikia

Saikia-Linear Algebra

book1

February 18, 2014

14:0

A Note to Students This book is intended to help in understanding concepts and learning procedures for computations in linear algebra. Computations are essential in linear algebra; but to carry out computations effectively and provide justification for the procedures adopted, one needs to thoroughly understand the related concepts. Classroom instructions do play a key role in understanding the material in a course of linear algebra. However, you will have to follow up the classroom lectures with your own effort. This is where this book fits in. I have chosen a conversational style for this book so that it will be easy for you to read and reread any material till you feel you have mastered it. You can test your understanding with the help of exercises provided at the end of all sections of the book. It is essential that you work out these exercises honestly. In doing so, you will find, more often than not, that you have to go back to the text to clear up some points. There are true/false type of questions in most sets of exercises; these questions ask you to determine whether a statement is true or false. Your answer has to be justified; a true statement requires a short proof whereas an example (known as a counter example) is needed to disprove a false one. You will find that quite a few results in this book are left to the reader; some of these depend on routine verifications. It is expected that you complete such proofs. Routine verifications are usually based on straightforward arguments; you should go through such verifications at least once even if they are not too exciting. Finally, I hope that using this book proves as useful and enjoyable for you as it was for me while writing. —Promode Kumar Saikia

xv

Saikia-Linear Algebra

book1

February 18, 2014

14:0

List of Symbols ai j ann(T ! ") Ab A−1 A∗ At Bil(V) C EC[a, b] C( f (x)) δi j ei j End(V) EndF (V) F Fn F[x] GLn (F) Hom(V, W) In Im( f ) Jn (a) ker f Mm×n (F) Mn (F) O Q R R[x] Rn [x]

the (i, j)th entry of the matrix A the annihilator of T an augmented matrix the inverse of A the conjugate of A the transpose of A the set of bilinear forms on V the set of complex numbers the set of all real-valued continuous functions on [a, b] the companion matrix of f (x) the Kroncker delta symbol the unit matrix with 1 at (i, j)th place the set of linear maps of V into itself the set of F-linear maps of V into itself a field the set of all n-tuples over a field F the set of all polynomials over a field F the set of all invertible matrices of order n over a field F the set of all linear maps of V into W the identity matrix of order n the image of a map f the elementary Jordan form of order n with eigenvalue a the kernel of a linear map f the set of all m × n matrices over F the set of all matrices of order n over F the zero matrix the set of rational numbers the set of all real numbers the set of all polynomials over R the set of all polynomials of degree at most n over R

xvii

Saikia-Linear Algebra

xviii

Rθ S ym(V) $S % ! T∗ T r(A) V⊥ V ⊥L V ⊥R V∗ $v, w% 'v' W⊥ XR z Z Z(v, T )

book1

February 18, 2014

14:0

List of Symbols

the rotation of the plane through angle θ the set of all symmetric bilinear forms on V the span of the set S the relation of row equivalence the adjoint of the operator T the trace of the matrix A the radical of V the left radical of V the right radical of V the dual space of V the inner product of v and w the lenght of v the orthogonal complement of W the set of all maps from X into R the zero map the set of integers the T -cyclic subspace generated by v

Saikia-Linear Algebra

1

book1

February 25, 2014

0:8

Matrices

1.1 INTRODUCTION Matrices, these days, are indispensable in the field of mathematics and in countless applications of mathematics. They are extremely useful in arranging, manipulating and transferring data, and so have proved invaluable in efficient analysis of large databases. To cite a few examples, recent innovations such as internet search engine algorithms, effective communications of images and data over the internet and to and from satellites, computer-aided designs used extensively in industries, all these rely crucially on matrix techniques. Linear models are employed in diverse disciplines to study various phenomena; matrices play an important role in such models. On the other hand, keeping pace with the ever-increasing speed and memories of computers, matrix techniques in numerical analysis are undergoing constant refinement leading to vast improvement of our ability to deal with complex systems one comes across in areas such as weather forecasting, economic planning, data management services, etc. There is no doubt that the study of matrix techniques and the theory of matrices will keep growing in importance in coming days. In linear algebra, matrices are used to not only to visualize abstract concepts in concrete terms, but also to gain insights about such concepts using their matrix representations. For example, as we shall see shortly, systems of linear equations, one of the major topics in linear algebra, can be interpreted as matrix equations; this allows an efficient treatment of such systems using powerful matrix methods. We shall also see that solutions of matrix equations lead naturally to concrete examples of vector spaces, another central topic in linear algebra. The concept of linear maps or linear transformations lies at the core of linear algebra; we shall further initiate the reader to the idea of matrices as linear maps or linear transformations between such concrete vector spaces. So, we begin our study of linear algebra with an introduction algebraic structures formed by matrices and illustrating some fundamental concepts of linear algebra with the help of matrices.

1.2 BASIC CONCEPTS An array A of mn numbers arranged in m rows and n columns, such as:   a11  A =  ...  am1

 a12 · · · a1n  .. .  . · · · ..  am2 · · · amn

1

Saikia-Linear Algebra

2

book1

February 25, 2014

0:8

Matrices

is said to be an m × n matrix, or simply a rectangular matrix. The number which appears at the intersection of the ith row and the jth column is usually referred to as the (i, j)th entry of the matrix A. It is natural that the number appearing as the (i, j)th entry be denoted by ai j in general. Then matrix A is described as [ai j ]m×n ; if it is clear from the context or if the size of the array is not important, we drop the subscript m × n. Sometimes, it is more convenient to describe the (i, j)th entry of a matrix A as Ai j . The entries ai j of a matrix A = [ai j ] may be real numbers or complex numbers. If all the entries are real, the matrix A is called a real matrix; otherwise, it is called a complex matrix. For example, if A=

( ' 5 −1 1/2 √ , 0 2 2

then A is a real 2 × 3 matrix, i.e., A is a matrix having 2 rows and 3 columns of real numbers with √ a11 = 5, a12 = 0, a23 = 2, etc. The set of real numbers R as well as the set of complex numbers C are examples of algebraic structures known as fields. Like these familiar numbers, elements of a field can be added and multiplied. Every field has two distinguished elements which behave exactly in the same way with respect to field addition and multiplication as our familiar real numbers 0 and 1; it is customary to denote these field elements by the same symbols 0 and 1. Moreover, subtraction and division by non-zero elements are also allowed in a field. For formal definitions and some examples of fields and other algebraic structures such as groups and rings, see Section 1.7 at the end of this chapter. The elements of an arbitrary but fixed field are called scalars. A field, in general, will be denoted by the symbol F. Matrices over any field can be defined the same way as matrices with real or complex entries. Thus, any rectangular array of m rows and n columns of scalars from a field F is an m × n matrix over F. From now onwards, we will be considering matrices over an arbitrary field unless otherwise specified. However readers, who are unfamiliar with the idea of an arbitrary field, can continue to treat matrices as arrays over any familiar number system without any hindrance to their understanding of the material in the initial two chapters. We go back to introducing new concepts and nomenclatures for matrices over a field F. If a matrix A over a field F is an 1 × n matrix, that is, A has a single row, then we say that A is (an n-dimensional) row vector over F. It is convenient in this case to drop the row index i and write A simply as A = [a1

a2

...

an ].

Similarly, an m × 1 matrix A over F is (an m-dimensional) column vector

where ai ∈ F.

   a1   a   2 A =  .  ,  ..    am

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

3

Observe that an m × n matrix A = [ai j ] over F can be described as comprising m row vectors ρ1 , ρ2 , . . . , ρm , where ρi is given by ρi = (ai1 ,

ai2

,...,

ain ) 1 ≤ i ≤ m,

or as comprising n column vectors γ1 , γ2 , . . . , γn , where    a1 j   a   2j γ j =  .   ..    am j

1≤ j≤n

is an m-dimensional column vector over F. Sometimes, it is convenient to write a matrix A having n columns γ1 , γ2 , . . . , γn as A = [γ1

γ2

, . . . , γn ].

Quite frequently, we have to consider rectangular matrices having the same number of rows and columns. An n × n matrix (having n rows and n columns) with entries from a field F is a square matrix of order n over F. Such square matrices, because they occur frequently in diverse areas, form a very important subfamily of matrices. Special Forms of Matrices A square matrix A = [ai j ] has a distinguished set of entries, namely, its diagonal consisting of the entries aii , that is, a11 , a22 , . . . , ann . A square matrix A = [ai j ] is said to be a diagonal matrix if all its off-diagonal entries are zero, that is, ai j = 0 if i ! j. (Of course, some or all the diagonal entries of a diagonal matrix may be zero). A diagonal matrix is called a scalar matrix if all the diagonal entries are equal; a very special case of a scalar matrix of order n is the one where all the diagonal entries are equal to 1; then, it is called the identity matrix of order n over F. An identity matrix of order n is usually denoted by In . So, the identity matrix In of order n over F looks like   1   1   0   .     . , In =   .     .  0  1   1

where the multiplicative identity 1 of F appears along the diagonal. The two ‘zeros’ appearing offdiagonal indicate that except for those indicated along the diagonal, all other entries of the matrix are zeros.

Saikia-Linear Algebra

4

book1

February 25, 2014

0:8

Matrices

A square matrix over a field F is upper triangular if all its entries below the diagonal are zero. (That does not exclude the possibility of zeros occurring as other entries). Using our ‘zero’ notation, such a matrix of order n can be described as   a11 a12 . . . a1n   a22 . . . a2n    . . . .   , . . .   0   . .    ann the zero below the diagonal indicating that all the entries below the diagonal are zeros. Similarly, a lower triangular matrix looks like  a11 a21   .   .  .  an1

a22 . . . an2

  0   .  .   ..  . . ann

We now come back to the discussion of general rectangular matrices. An m×n matrix over a field F having all its entries zeros is called the zero matrix over F; we will denote it by 0m×n or simply by 0 if its size is not important. Before operations on matrices are introduced, one has to know when two matrices are equal. Definition 1.2.1.

Two matrices A and B over field F are equal, and we write A = B if

• A and B have the same size, that is, they have the same number of rows and columns, and • The corresponding entries of A and B are equal as elements of F. Symbolically, two m × n matrices A = [ai j ] and B = [bi j ] over F are equal if ai j = bi j

for all i, j

such that 1 ≤ i ≤ m, 1 ≤ j ≤ n.

According to the preceding definition, two n-dimensional row vectors a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ) over a field F are equal if and only if ai = bi (for 1 ≤ i ≤ n) as elements of F; this shows that the order in which the components ai appear in the row vector (a1 , a2 , . . . , an ) is important. The order of appearance of elements ai determines the row vector (a1 , a2 , . . . , an ), it is also called an ordered n-tuple. For n = 2 such a symbol is called an ordered pair; thus, as ordered pairs (a, b) is not the same as (b, a) (unless a = b) even though as sets {a, b} and {b, a} are the same. An ordered n-tuple for n = 3 is a ordered triple. Similarly, two n-dimensional column vectors over F are equal if and only if corresponding entries are equal. Thus the order of the entries of an n-dimensional column vector determines it uniquely and so we can also think of it also as an ordered n-tuple. Thus, an n-dimensional row or a column vector over a field F is an ordered n-tuple and conversely any n-tuple with elements from a field F can be considered either an n-dimensional row vector or an n-dimensional column vector.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

5

The set of all ordered n-tuples formed by the elements of a field is usually denoted by Fn . As we shall see later, Fn is a vector space; its elements are called vectors. When a vector in Fn is represented as an n-dimensional column vector or row vector, its entries will be called the components of the vector. More generally, the set of all m × n matrices over a field F is another example of a vector space (see Section 1.3). The requirements that these matrices need to satisfy are certain properties of two operations on such matrices, namely addition and scalar multiplication. We introduce these operations now. From now onwards, the set of all m × n matrices will be denoted by Mm×n (F); Mn (F) will denote the set of all square matrices of order n over the field F. Addition and Multiplication of Matrices We can add two matrices over a field if they have the same size; the sum is again a matrix of the same size and its entries are obtained by adding the corresponding entries of the two given matrices. In other words, if A, B ∈ Mm×n (F) with A = [ai j ] and B = [bi j ], then the sum, denoted by A + B, is given by A + B = [ai j + bi j ]. We can also define the sum by letting A + B = [ci j ], where ci j = a i j + b i j

for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

(1.1)

Note that A + B ∈ Mm×n (F). In particular, if A, B ∈ Mn (F), (that is for m = n) then their sum A + B ∈ Mn (F). Similarly, the sum a + b of two n-dimensional column (respectively, row) vectors in Fn , obtained by adding the corresponding entries of a and b, is an n-dimensional column (respectively, row) vector in Fn as Fn = Mn×1 (F). The following illustrates the addition of two real 2 × 3 matrices: ' 1 4

( ' 2 0 1 + 0 6 0

( ' ( ' 0 3 1+1 2+0 0+3 2 = = 5 0 4+0 0+5 6+0 4

( 2 3 . 5 6

It should be clear that matrices of different sizes cannot be added; we say that such sums are not defined. We can multiply a matrix over a field F by an element of the field or a scalar; such an operation is known as scalar multiplication. Definition 1.2.2. For any A ∈ Mm×n (F) and any scalar c ∈ F the scalar multiple cA is the matrix in Mm×n (F) obtained from A by multiplying each entry of A (which is a scalar too) by c. Thus, if A = [ai j ] then / 0 cA = cai j

for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

It is clear that for A ∈ Mn (F) or x ∈ Fn , the scalar multiples cA is in Mn (F) and cx is in Fn .

(1.2)

Saikia-Linear Algebra

6

book1

February 25, 2014

0:8

Matrices

For example, if A =

' −1 3

0 2

( i is a 2 × 3 matrix over C, then the following scalar multiples 0

' −2 2A = 6

0 4

( 2i 0

' −i and iA = 3i

0 2i

( −1 0

are also 2 × 3 matrices over C. Recall that a scalar matrix is a diagonal matrix (that is, a square matrix whose all off-diagonal entries are zero) whose all diagonal entries are equal. It is clear that a scalar matrix of order n all of whose diagonal entries are equal to, say c, can be written as the scalar multiple cIn of the identity matrix In of order n. Note that if we replace the scalars ai j that appear as entries of a matrix A = [ai j ] ∈ Mm×n (F) by their negatives −ai j in F, the resultant m × n matrix is just the scalar multiple of A by the scalar −1. For convenience, we denote this new matrix in Mm×n (F) by −A. We think of this matrix as the negative (or formally as the additive inverse) of the matrix A. It is clear that when a matrix A ∈ Mm×n (F) is added to the matrix −A, the sum is the zero matrix 0 in Mm×n (F). The notation of the negative of a matrix allows us to introduce subtraction of matrices of the same size. If A = [ai j ] and B = [bi j ] are two matrices in Mm×n (F), then A − B will be the matrix in Mm×n (F) given by A − B = A + (−B) = [ai j − bi j ]. In other words, the (i, j)th entry of A − B is obtained by subtracting the scalar bi j from ai j in F. Thus, for any given A ∈ Mm×n (F), all the entries of A − A will be zeros, that is, A − A = 0, the zero matrix in Mm×n (F). We illustrate some of the ideas discussed so far with an example. If '

2 A= 1/2

( 0 −1

and

'

1 B= 1/4

( 0 −1/2

are matrices over R, then   2 A − 2B =  1/2

   0  1 0  − 2   −1 1/4 −1/2    0 0  2   −  −1 1/2 −1

  2 =  1/2   0 0 . =  0 0

So, A − 2B is the 2 × 2 zero matrix over R. Similarly, the reader can verify that ' ( 4 0 A + 2B = . 1 −2 We also note that A = 2B or B = 1/2A.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

7

Like numbers, matrices can also be multiplied although matrix multiplication, unlike addition or subtraction, is a more complicated operation. The product of two matrices cannot be computed simply by multiplying the corresponding entries of the matrices. We first look at the special case of the multiplication of a row vector A by a column vector B of the same size over a field F. If

A = (a1 , a2 , . . . , an )

and

then the product AB is defined as the scalar (or the number)

  b1    b2  B =  . ,  ..    bn

AB = a1 b1 + a2 b2 + · · · + an bn . This is usually referred to as the dot product or the scalar product of the two vectors. For example, the dot product of /

0 −3

A= 5 0

  2   B = 1   4

and

will be AB = 5.2 + 0.1 + (−3) · 4 = 10 + 0 − 12 = −2. The sum expressing the product AB can be abbreviated by using the convenient notation for summation denoted by the Greek letter sigma (Σ): a 1 b1 + a2 b2 + · · · + an bn =

n 1

ai bi .

(1.3)

i=1

In the general case, we can multiply a matrix A by another matrix B only when the number of columns of A is the same as the number of rows of B, so that we can take the dot product of each row vector of A with each column vector of B. To be more precise, if A is an m × n matrix (having n columns) and B an n × p matrix (having n rows), both over the same field F, then the product AB is an m × p matrix over F; the (i, j)th entry of AB is the scalar obtained by the dot product, given by the Equation (1.3), of the ith row of A and the jth column of B. Definition 1.2.3. For any m × n matrix A = [ai j ], 1 ≤ i ≤ m, 1 ≤ j ≤ n and an n × p matrix B = [bi j ], 1 ≤ i ≤ n, 1 ≤ j ≤ p, the product AB is the m × p matrix AB = [ci j ] , where ci j =

n 1

aik bk j =

k=1

for any fixed i and j with 1 ≤ i ≤ m, 1 ≤ j ≤ p.

1 k

aik bk j ,

(1.4)

Saikia-Linear Algebra

8

book1

February 25, 2014

0:8

Matrices

For example, the product of the following matrices  1 A =  0

2 4

 3  −1

and

 0  B = 2  1

 1  −2  1

is a 2 × 2 matrix as A is a 2 × 3 and B a 3 × 2 matrix. If ci j denotes the (i, j)th entry of AB then c11 = 1.0 + 2.2 + 3.1 = 7 c12 = 1.1 + 2(−2) + 3.1 = 0 c21 = 0.1 + 4.2 + 1(−1) = 7 c22 = 0.1 + 4(−2) + (−1)1 = −9 Hence,  7 AB =  7

 0 . −9

Sometimes, we say that matrices A and B are comparable for multiplication if the product AB is defined. For example, if m ! n then for A, B ∈ Mm×n (F) the product AB is not defined. However, it is clear that if A, B ∈ Mn (F) then product AB is defined and AB ∈ Mn (F). In particular, if A ∈ Mn (F) then A can be multiplied to itself. This product, which is denoted by A2 is in Mn (F). Therefore, it is possible to define integral powers Ak for any positive integer k as a matrix of order n by multiplying A to itself k times. But, we will have to wait till the Section 1.3 to see that there is no ambiguity in defining Ak for k ≥ 3. As an exercise involving matrix multiplication, we prove the following simple but useful result. Proposition 1.2.4. The product of a finite number of lower triangular (respectively, upper triangular) matrices in Mn (F) is a lower triangular (respectively, upper triangular) matrix in Mn (F). Moreover, if the diagonal elements of the matrices in the product are all equal to 1, then the diagonal elements of the product are also equal to 1. Proof. We prove the result for lower triangular matrices, leaving the similar proof for upper triangular matrices to the reader. Also it is clear that it suffices to prove the result for a product of two matrices. So let A = [ai j ] and B = [bi j ] be two lower triangular matrices of order n over a field F. Since in both A and B, the entries above the diagonal are all zero, it follows that ai j = bi j = 0 if i < j. Now, if we denote AB = [ci j ], then by the definition of matrix multiplication, for any fixed i and j with 1 ≤ i, j ≤ n, ci j = =

n 1 k=1 i 1 k=1

aik bk j aik bk j +

n 1

aik bk j .

k=i+1

If i < j then each bk j in the first sum is zero as k ≤ i in this sum whereas each aik in the second sum is zero as k ≥ i + 1 in the second sum. It therefore follows that ci j = 0 if i < j, proving that AB = [ci j ] is lower triangular.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

9

To verify the second assertion, note that by hypothesis, aii = bii = 1 for each i, 1 ≤ i ≤ n. Also any diagonal element of AB is given by cii = =

n 1 k=1 i−1 1

aik bki aik bki +

k=1

n 1

aik bki + aii bii .

k=i+1

Each bki is zero in the first sum and each aik is zero in the second sum so the sum reduces to cii = aii bii = 1 by hypothesis. ! Matrix Notation for Systems of Linear Equations We now discuss as to how the definition of matrix multiplication allows us to describe systems of linear equations in matrix notations. Consider, for example, the following system of two equations in two variables: 2x − 3y = 5 x + 4y = −3. Arranging the coefficients of the variables x and y in an array exactly the way they appear in the two equations, we obtain the following matrix of order 2: A=

' 2 1

( −3 , 4

which is known as the coefficient matrix of the given 2 3 system of equations. Note that as the coefficients x are real numbers, A is a real matrix. If we set x = , a 2-dimensional column vector (a 2 × 1 matrix), y then it is a simple exercise of matrix multiplication to show that the product Ax is a 2 × 1 matrix given by 2 3 2x − 3y Ax = . x + 4y 3 5 , it follows, by the definition of the equality of two column vectors, that the −3 given system of equations can be described by the matrix equation Therefore, if b =

2

Ax = b. The procedure for expressing any system of linear equations over any field as a matrix equation is a straight-forward generalization of the preceding example. The general system of m equations in n

Saikia-Linear Algebra

10

book1

February 25, 2014

0:8

Matrices

variables (sometimes thought of as unknowns) over a field F is usually described as a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. .. .. .. . . . . am1 x1 + am2 x2 + · · · + amn xn = bm . Here, the ai j and the bi stand for given (or known) scalars in F and the xi are the variables or the unknowns. For convenience, this system of linear equations can also be described briefly by using the summation notation: n 1

ai j x j = bi

for i = 1, 2, . . . , m.

(1.5)

j=1

Here, the index i tells us which one of the m equations is being considered, whereas for each fixed i, the index j indicates the position in the ith equation that we are looking at. So, ai j is the coefficient of the jth variable in the ith equation. The system of linear equations given in Equation (1.5) is said to be homogeneous, if bi = 0 for all i. A solution of the system is a list s1 , s2 , ..., sn of scalars in F such that when the scalars s1 , s2 , ..., sn are substituted for the variables x1 , x2 , ..., xn , respectively, each one of the m equations becomes an equality. Clearly the order of the scalars in the list s1 , s2 , ..., sn is important. For example, the system of two equations we had considered earlier has 1, −1 as a solution but not −1, 1. Thus, any solution of the general system of m equations in n variables given by Equation (1.5) is an ordered n-tuple of elements of F; usually such a solution will be considered a column vector in Fn . If each si is zero, the solution is called the zero solution; a nonzero solution has at least one si nonzero. Associated with the system given by Equation (1.5) of linear equations is an m × n matrix, known as the coefficient matrix of the system. If the system of equations is over a field F, the coefficients are scalars from F, and so the coefficient matrix is also over F. As the name suggests, this matrix has for its jth column precisely the coefficients of the jth variable x j appearing the way they do vertically downwards from the first equation. Thus, symbolically, the coefficient matrix of the system described by Equation (1.5) is [ai j ]. For example, the coefficient matrix of the following system x1 + 2x2 − 3x3 = 1 4x1 + 5x3 = 0 −x1 + x2 + x3 = 9    1 2 −3   5. In this matrix, for example, a11 = 1, a12 = 2, a22 = 0 and is the 3 × 3 real matrix A =  4 0   −1 1 1 a32 = 1. Next, we show how the idea of matrix multiplication can be used to express the general system of linear equations given by Equation (1.5) as a single matrix equation. Since the system consists of m

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

11

equations in n variables, the coefficient matrix A = [ai j ] will be an m × n matrix over F. Let x and b, respectively, be the n × 1 and m × 1 column vectors given by      x1  b1   x2  b2  x =  .  and b =  .  .  ..   ..      xn bn

Here, the components of x are the variables or the unknowns and those of the column vector b over F are the scalars appearing in right-hand side of Equation (1.5). Now observe that the product Ax is an m-dimensional column vector whose ith component, by the rule of matrix multiplication, is the dot 4 product of the ith row of A and the column vector x and hence is precisely nj=1 ai j x j , the left hand side of Equation (1.5). Therefore, the system given by Equation (1.5) is equivalent to the single matrix equation Ax = b (1.6)    s1   s   2 by the definition of equality of matrices. Note that, as stated earlier, a column vector s =  .  in Fn is  ..    sn a solution of the matrix equation Ax = b if and only if the components s1 , s2 , . . . , sn , in that order, form a solution of the system of equations (1.5). Column-Row Expansion The product Ax of a matrix and a column vector, which we have expanded using the row-column multiplication, has another equally useful expansion, usually referred to as the column-row expansion. To see how this works, consider an m × n matrix A = [ai j ] over a field F and denote its jth column for    a1 j   a2 j  1 ≤ j ≤ n, by γ j =  . . Now for any scalar x j (for 1 ≤ j ≤ n),  ..    am j    a1 j x j   a2 j x j  x j γ j =  .  .  ..    am j x j On the other hand, as we have noted already, the ith component of the m-dimensional column vector 4 Ax is the sum nj=1 ai j x j and so, by the preceding expression for the m-dimensional column vector 4 x j γ j , also equals the ith component of the sum nj=1 x j γ j of the m-dimensional column vectors. Thus, Ax can be expressed as    x1   x   2 A  .  = x1 γ1 + x2 γ2 + · · · + xn γn , (1.7)  ..  xn a linear combination of the columns of A.

Saikia-Linear Algebra

12

book1

February 25, 2014

0:8

Matrices

For example,   1 2  4 5  7 8

        1 2 3 3   2          6   −3  = 2 4 − 3 5 + 4 6 ,         9 4 7 8 9

which the reader should verify by computing the product on the left hand side by the usual row-column multiplication and adding the column vectors (after performing the scalar multiplication) on the right hand side. The column-row expansion, as given in Equation (1.7), will be useful in several occasions later in this book, primarily because it shows that the matrix equation Ax = b is equivalent to the vector equation x1 γ1 + x2 γ2 + · · · + xn γn = b,

(1.8)

where γ1 , γ2 , . . . , γn are the vectors forming the columns of A and x1 , x2 , . . . , xn are the components of the column vector x. We shall further see (at the end of Section 1.6) that, in general, the product of two matrices can also be computed by column-row expansion. Before we end this introductory section, we consider another important aspect of matrices. Given any m × n matrix A ∈ Mm×n (F) and any column vector a ∈ Fn , that is, an n × 1 matrix over F, the product Aa = b is am m × 1 matrix over F, that is column vector in Fm . In other words, multiplication by A produces a function or a mapping, say f , from Fn to Fm , given by f (a) = Aa. This function has the following basic properties: f (a + a' ) = f (a) + f (a' ) and

f (ca) = c f (a)

for any a, a' ∈ Fn and c ∈ F. Due to these properties, f is said to be a linear map. Linear maps play a most important role in linear algebra and its applications. As we shall see later, any linear map from Fn into Fm can essentially be realized as multiplication by some suitable matrix in Mm×n (F). In the next section, we shall be studying systematically the properties of the operations on matrices that we have introduced so far. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All matrices are over an arbitrary field F unless otherwise specified. (a) For any two matrices A and B of the same size, A + B = B + A. (b) If, for matrices A and B the product AB is defined, then the product BA is also defined. (c) For any two matrices A and B of order 2, AB = BA. (d) If A, B and C are matrices of the same size, then (A + B) + C = A + (B + C). (e) If C is a scalar matrix of order n, then for any matrix A of order n, CA = AC. (f) If A is a real matrix of order n, then A2 − 2A is also a real matrix of order n. (A2 = AA)

(g) For a matrix A of order n such that A ! In , A2 can never be equal to In .

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

13

(h) Every homogeneous system of linear equations with coefficients from a field F determines a matrix equation Ax = 0 where 0 is a zero column vector over F. (i) If the first two columns of a matrix B are equal, then the first two columns of AB are equal whenever AB is defined. (j) If the first two rows of a matrix B are equal, then the first two columns of AB are equal whenever AB is defined. (k) Every field has at least two distinct elements. 2. 3.

4. 5.

(l) If A is an m × n matrix and B is the n × p zero matrix, then AB is the m × p zero matrix. ' ( 0 1 If A = over any field F, then show that A2 is the zero matrix of order 2 over F. 0 0 Compute AB and BA if ' ( ' ( 0 1 1 0 A= and B = . 0 0 0 0 ' ( i 1 If A = , then show that A2 is the zero matrix of order 2 over C. 1 −i If   1 2 3   A = 0 4 5,   0 0 6

then show that the product (A − I3 )(A − 4I3)(A − 6I3) is the zero matrix of order 3, where I3 is the identity matrix of order 3 over R. 6. Show that A3 = I3 if   0 0 1   A = 1 0 0.   0 1 0

7. Show that A3 = I2 if

A=

' ( −1 −1 . 1 0

8. If for a real 2 × 2 matrix A, AB = BA for every real 2 × 2 matrix B, then prove that A must be a scalar matrix. 9. Let ' ( ' ( a 0 p q A= and B = 0 b r s be matrices over a field F. If q and r are non-zero scalars, find the conditions on the entries of A so that AB = BA. 10. If the first two columns of a matrix B over a field F are equal, then show that the first two columns of AB are equal for any matrix A over F such that AB is defined. 11. Show that it is possible to find two 2 × 2 matrices A and B over any field F having entries only 0 and 1 but such that AB ! BA. Generalize to the case of n × n matrices.

Saikia-Linear Algebra

14

book1

February 25, 2014

0:8

Matrices

12. Consider a matrix A over a field F having a zero row. For any matrix B over F such that AB is defined, show that AB must have a zero row. 13. Give an example of two real 2 × 2 matrices A and B such that (A + B)2 ! A2 + 2AB + B2. 14. Let A and B be two 2 × 2 complex matrices. Prove that (AB − BA)2 is a scalar matrix. Can this result be generalized to n × n matrices? 15. For any complex numbers a, b, c and d, let ' ( ' ( a b c d A= and B = . −b a −d c Prove that AB = BA. ' cos θ 16. For any real number θ, let T (θ) = sin θ

( − sin θ . Show that cos θ

T (θ1 )T (θ2 ) = T (θ2 )T (θ1 ) = T (θ1 + θ2 ) for any two real numbers θ1 and θ2 . 17. Let A and B be real matrices given by ( ' cos2 θ cos θ sin θ A= cos θ sin θ sin2 θ

and

'

( cos2 φ cos φ sin φ B= . cos φ sin φ sin2 φ

Prove that AB is the zero matrix of order 2 if θ and φ differ by an odd multiple of π/2. 18. For the Pauli’s matrices, ' ( ' ( ' ( 01 0 −i 1 0 σ1 = , σ2 = , σ3 = , 10 i 0 0 −1 prove the following relations: σ21 = σ22 = σ23 = −iσ1 σ2 σ3 = I2 , and σ1 σ2 = iσ3

σ2 σ1 = −iσ3 .

19. Prove Proposition (1.2.4) for upper triangular matrices. 20. Let A be an m × n, and B be an n × p matrix over R such that the sum of the entries of each row of both the matrices equals to 1. Prove that the sum of the entries of each row of the product AB is also 1. Is the result true if rows are replaced by columns? 21. Determine real matrices A and B such that   3  x1  2 3x1 − x2 + x3    A  x2  = , + 2x3   −x1 x3

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Matrix Operations and Their Properties

15

and       x1   x1 + x2     x2 − 3x3  . B  x2  =      x3 4x1 − x2 + x3

22. Write the following systems of equations with coefficients from R as matrix equations Ax = b: (a) x1 − 2x2 + 3x3 − x4 =0 x2 − x3 + 3x4 − 2x5 = 0. √ (b) 2x1 + 7x2 = 1 −x1 − πx2 = −3. One needs familiarity with finite fields for the following exercises. 23. The field of two elements is usually described as Z2 = {0, 1}, where 1 + 1 = 0. Write down the distinct 2 × 2 matrices over Z2 . 24. Generalize the last exercise to compute the number of m × n matrices over the finite field of p elements.

1.3 MATRIX OPERATIONS AND THEIR PROPERTIES Addition of scalars from any field, say the field of real numbers, follow certain rules. These rules, in turn, are key to verifications of basic properties of matrix addition. The rules governing addition, denoted by +, in a field F are as follows: (i) (Commutativity) For any a, b ∈ F a + b = b + a. (ii) (Associativity) For any a, b, c ∈ F (a + b) + c = a + (b + c). (iii) (Existence of Additive Identity) There is a unique element (the zero) 0 ∈ F such that a+0 = a = 0+a for any a ∈ F. (iv) (Existence of Additive Inverse)

For any a ∈ F there is a unique element −a ∈ F such that a + (−a) = 0 = (−a) + a.

Because of these rules, a field F is said to be an abelian group with respect to its addition. Also note the exact similarity of these rules to the rules of addition of integers which every reader must be familiar with; so the set of Z of integers is another example of an abelian group (for a formal definition of groups and abelian groups, see Section 1.7 at the end of this chapter). We shall shortly prove that addition of matrices in Mm×n (F) satisfies rules similar to that of addition of integers or of field elements. As mentioned earlier, the reader can assume, if necessary, that either F = R, the field of real numbers, or F = C, the field of complex numbers. One identifies the set M1 (F) of all 1 × 1 matrices over F with F itself. According to the definition of addition of matrices (see Equation 1.1), any two matrices in Mm×n (F) (or, in Mn (F)) can be added to obtain another matrix in the same set. This fact is sometimes expressed by saying that Mm×n (F) or Mn (F) is closed with respect to addition, or that matrix addition is a binary operation in Mm×n (F) or in Mn (F). Here are the basic properties of matrix addition.

Saikia-Linear Algebra

16

book1

February 25, 2014

0:8

Matrices

Proposition 1.3.1. Let Mm×n (F) be the set of all m × n over a field F. Then, (a) A + B = B + A for any A, B ∈ Mm×n (F); (b) (A + B) + C = A + (B + C) for any A, B, C ∈ Mm×n (F); (c) If 0 = 0m×n is the zero matrix in Mm×n (F), then, A+0 = 0+A = A for any A ∈ Mm×n (F); (d) Given any A ∈ Mm×n (F), there is a B ∈ Mm×n (F) such that A + B = 0m×n = B + A. In fact, A determines B uniquely and B = −A. We reiterate that these properties hold for matrices in Mn (F) too (for m = n) as well as for ndimensional row vectors or column vectors for any positive integer n. Proof. It is clear that the matrices appearing on both sides of the equality in each of the assertions are in Mm×n (F). Therefore, to verify these assertions, we just have to show that the general (i, j)th entries of matrices of both sides of any equality are the same. Let A = [ai j ], B = [bi j ] and C = [ci j ] be arbitrary matrices in Mm×n (F). Now, for any i, j, the (i, j)th entry of A+ B is the scalar ai j +bi j whereas the (i, j)th entry of B + A is bi j +ai j . Since ai j +bi j = bi j +ai j by commutativity of addition for scalars of F, it follows that A + B = B + A, which proves (a). Next, by associativity of addition in F, (ai j + bi j ) + ci j = ai j + (bi j + ci j ) for any i, j. So the (i, j)th entry of (A + B) +C is the same as the (i, j)th entry of A + (B +C) showing that (A + B) +C = A + (B +C) which is the assertion (b) of the proposition. To prove the property of the zero matrix given in (c), note that every entry of the zero matrix is the scalar 0 of F. Thus, for any A = [ai j ] ∈ Mm×n (F), the (i, j)th entry of A + 0 is ai j + 0 = ai j = 0 + ai j , which is the (i, j)th entry of 0 + A. Thus property (c) holds in Mm×n (F). Finally, for any A = [ai j ] in Mm×n (F), the matrix B = [−ai j ], where −ai j is the additive inverse of ai j in F, is clearly in Mm×n (F). Since ai j + (−ai j ) = 0 = (−ai j ) + ai j for all i, j, it follows that A + B = 0 = B + A. Thus (d) holds. ! As with the addition in fields, matrix addition is commutative and associative because of properties (a) and (b), respectively, of Mm×n (F) given in the proposition. 0 is the additive identity in Mm×n (F) and −A is the additive inverse of any A ∈ Mm×n (F) by properties (c) and (d) respectively. Thus, just like a field with addition, Mm×n (F) is an abelian groups precisely because of these properties of matrix addition (see Definition 1.7.1 in Section 1.7, the last section of this chapter). We record this important fact. Theorem 1.3.2. Mm×n (F) is an abelian group with respect to matrix addition with 0m×n acting as the additive identity. Since Mn (F) as well as Fn are closed with respect to matrix addition, it follows that both Mn (F) and F are also abelian groups with respect to matrix addition. Before we consider properties of scalar multiplication of matrices. we need to look at some properties of multiplication in a field. Recall the entries of a scalar multiple cA of a matrix A = [ai j ] over n

a field F are the products c·a_ij of scalars in F. Consequently, the verification of the various properties of scalar multiplication requires, in turn, the following properties of multiplication in F:
(a) (Associativity) For any a, b, c ∈ F, a(bc) = (ab)c.
(b) (Existence of Multiplicative Identity) There is a unique element 1 ∈ F such that 1a = a = a1 for any a ∈ F.
(c) (Left Distributive Law) For any a, b and c in F, a(b + c) = ab + ac.
(d) (Right Distributive Law) For any a, b and c in F, (a + b)c = ac + bc.
We now present the basic properties of scalar multiplication of matrices.

Proposition 1.3.3. Let Mm×n(F) be the set of all m × n matrices over a field F. Then, for any scalars c, c′ ∈ F,
(a) (cc′)A = c(c′A) for any A ∈ Mm×n(F);
(b) (c + c′)A = cA + c′A for any A ∈ Mm×n(F);
(c) c(A + B) = cA + cB for any A, B ∈ Mm×n(F);
(d) 1A = A for any A ∈ Mm×n(F), where 1 is the multiplicative identity of F.

It is clear that the results of the proposition are valid for matrices in Mn (F) as well as for row or column vectors in Fn . Proof. As in the case of matrix addition, these equalities are established by showing that the general entries of the matrices on both sides of any of these equalities are the same as all the scalar multiples of matrices involved are clearly in Mm×n (F). If A = [ai j ] is an arbitrary matrix in Mm×n (F), then for any c, c' ∈ F, the (i, j)th entry of the scalar multiple (cc' )A, for any i, j, is (cc' )ai j . Since c, c' and ai j are all elements of F, it follows, by the associativity of field multiplication, that (cc' )ai j = c(c' ai j ) which is clearly the (i, j)th element of the scalar multiple c(c' A). This proves assertion (a). On the other hand, the (i, j)th entry of (c + c' )A is the scalar (c + c' )ai j . Now by the right distributive law in F, (c + c' )ai j = cai j + c' ai j , which is the (i, j)th entry of the sum cA+c' A. Therefore assertion (b) holds. Now , if B = [bi j ] is another arbitrary matrix in Mm×n (F), then ai j + bi j being the (i, j)th entry of A + B, the (i, j)th entry of the scalar multiple c(A + B) is clearly c(ai j + bi j ). By the left distributive law in F, c(ai j + bi j ) = cai j + cbi j , the (i, j)th entry of the sum cA + cB. So assertion (c) of the proposition holds. The verification of the last assertion of the proposition is trivial as for the multiplicative identity 1 of F, 1ai j = ai j = ai j 1. ! Because the abelian group Mm×n (F) (respectively, Mn (F)) also satisfies the properties stated in this proposition with respect to scalar multiplication by scalars in F, Mm×n (F) (respectively, Mn (F)) is said to be a vector space over the field F (see Chapter 3 for the formal definition of a vector space). For the same reason, Fn is also a vector space over F.
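To make these rules concrete, here is a small numerical check of properties (a)–(c) of Proposition 1.3.3, a sketch in Python with NumPy over F = R; the particular matrices and scalars are illustrative choices, not taken from the text.

import numpy as np

A = np.array([[1, 2], [3, 4]])    # an arbitrary 2 x 2 real matrix
B = np.array([[0, -1], [5, 2]])
c, d = 3, -2                      # two scalars from F = R

# (a) (cd)A = c(dA): associativity of field multiplication carries over
print(np.array_equal((c * d) * A, c * (d * A)))    # True
# (b) (c + d)A = cA + dA: right distributive law of F carries over
print(np.array_equal((c + d) * A, c * A + d * A))  # True
# (c) c(A + B) = cA + cB: left distributive law of F carries over
print(np.array_equal(c * (A + B), c * A + c * B))  # True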

Let us now consider matrix multiplication. We have seen in the last section that, unlike real numbers or integers, two matrices in Mm×n(F), in general, cannot be multiplied. So we say that Mm×n(F) is not closed under multiplication in general. Even then, as the following proposition shows, under suitable restrictions, matrix multiplication does satisfy nice properties. For example, the first assertion of the proposition shows that matrix multiplication is associative, whereas the second and the third together show that it satisfies the distributive laws. The verifications of these results are nice exercises in matrix multiplication, and so the reader is urged to follow the arguments closely.

Proposition 1.3.4. Let F be a field.
(a) If for matrices A, B and C over F the matrix products AB, BC, (AB)C and A(BC) are defined, then (AB)C = A(BC).
(b) If for matrices A, B and C over F the matrix products AB, AC and A(B + C) are defined, then A(B + C) = AB + AC.
(c) If for matrices A, B and C over F the matrix products AC, BC and (A + B)C are defined, then (A + B)C = AC + BC.
(d) If for matrices A and B over F the matrix product AB is defined, then for any c ∈ F, c(AB) = (cA)B = A(cB).
(e) If A is an m × n matrix over F and Im and In are identity matrices of order m and n, respectively, over F, then Im A = A = AIn.

Proof. For the hypothesis in (a) to be satisfied, if A is an m × n matrix, then B has to be an n × p matrix and C a p × r matrix, for some positive integers m, n, p and r. In that case, while AB is an m × p matrix, BC is an n × r matrix, and so (AB)C and A(BC) are both m × r matrices over F. Thus, to prove that these two matrices are equal, we need to show that their corresponding entries are the same. So let A = [a_ij], B = [b_ij] and C = [c_ij]. Setting AB = [x_ij] and BC = [y_ij], we then see that (see Equation 1.4)

x_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}   for any i, j (1 ≤ i ≤ m, 1 ≤ j ≤ p),

and

y_{ij} = \sum_{k=1}^{p} b_{ik} c_{kj}   for any i, j (1 ≤ i ≤ n, 1 ≤ j ≤ r).

(In both these sums the index k is a dummy index, used only to indicate that the summation takes place over the integral values of k as stated; any other letter or symbol can be used in place of k.)

Now fix positive integers i and j such that 1 ≤ i ≤ m and 1 ≤ j ≤ r. The (i, j)th entry of (AB)C for these fixed i and j can be expressed as

\sum_{k=1}^{p} x_{ik} c_{kj} = \sum_{k=1}^{p} \left( \sum_{l=1}^{n} a_{il} b_{lk} \right) c_{kj}.   (1.9)

Note that in the inner sum representing x_{ik} we are forced to use l as the dummy index, as k, for each x_{ik}, is fixed. Now, by the right distributive law in the field F again, the product of the inner sum on the right-hand side of the preceding equation by each c_{kj} (for k = 1, 2, . . . , p) can be expressed as a sum of products of scalars of the type a_{il} b_{lk} c_{kj} for l = 1, 2, . . . , n. The resultant expression can then be rearranged by collecting the coefficients of a_{il} for each l (1 ≤ l ≤ n). Since the coefficient of a_{il} in the resultant expression, by the left distributive law in F, is clearly \sum_{k=1}^{p} b_{lk} c_{kj}, it follows that the left-hand side of Equation (1.9) can be rewritten, using the formula for y_{lj}, as

\sum_{l=1}^{n} a_{il} \left( \sum_{k=1}^{p} b_{lk} c_{kj} \right) = \sum_{l=1}^{n} a_{il} y_{lj},   (1.10)

which is precisely the (i, j)th entry of the product A(BC). This completes the verification of the assertion in (a).

We sketch the proof of (b), leaving it to the reader to fill in the details. For (b), we may assume that while A is an m × n matrix, B and C (and so B + C also) are n × p matrices. Thus the product A(B + C) and the sum AB + AC are both m × p matrices, so we need only compare their corresponding entries. Now, if we set A = [a_ij], B = [b_ij] and C = [c_ij], then the (i, j)th entry of A(B + C) is given by

\sum_{k=1}^{n} a_{ik} (b_{kj} + c_{kj}),

which, by the left distributive law in F, can be split into two sums

\sum_{k=1}^{n} a_{ik} b_{kj} + \sum_{k=1}^{n} a_{ik} c_{kj}.

Since the first sum is the (i, j)th entry of AB and the second the (i, j)th entry of AC, their sum is the (i, j)th entry of AB + AC. Hence assertion (b). A similar proof establishes assertion (c). Since for any scalar c ∈ F, by the left distributive law in F again,

c \sum_{k=1}^{n} a_{ik} b_{kj} = \sum_{k=1}^{n} (c a_{ik}) b_{kj} = \sum_{k=1}^{n} a_{ik} (c b_{kj}),

the definition of the scalar multiple of a matrix implies the assertion in (d). The easy verification of (e) is left to the reader.  ∎
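A quick numerical illustration of Proposition 1.3.4, a sketch in Python with NumPy over F = R; the particular matrices are arbitrary examples and are not from the text.

import numpy as np

A = np.array([[1, 2, 0], [3, -1, 4]])      # 2 x 3
B = np.array([[2, 1], [0, 1], [1, -2]])    # 3 x 2
B2 = np.array([[0, 2], [1, 1], [-1, 0]])   # 3 x 2, same size as B
C = np.array([[1, 0], [2, 3]])             # 2 x 2

# (a) associativity: (AB)C = A(BC)
print(np.array_equal((A @ B) @ C, A @ (B @ C)))      # True
# (b) left distributive law: A(B + B2) = AB + AB2
print(np.array_equal(A @ (B + B2), A @ B + A @ B2))  # True
# (d) a scalar can be moved across a product
c = 5
print(np.array_equal(c * (A @ B), (c * A) @ B))      # True
print(np.array_equal(c * (A @ B), A @ (c * B)))      # True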

Note that if x and y are n-dimensional column vectors in Fn , that is, n × 1 matrices over F, then so is the sum x + y. Therefore, for any m × n matrix A over F, the products Ax, Ay and A(x + y) are defined. Proposition (1.3.4) then implies the following:

Corollary 1.3.5. Let F be a field and A ∈ Mm×n(F).
(a) For any column vectors x, y ∈ Fn, A(x + y) = Ax + Ay.
(b) For any column vector x ∈ Fn and a scalar c ∈ F, A(cx) = cAx.

Earlier in the last section, we had remarked that the multiplication of n-dimensional column vectors in Fn by an m × n matrix over F is a linear map from Fn into Fm; this corollary provides the justification of that remark (the formal definition of a linear map is given in Chapter 4).

Properties of Multiplication in Mn(F)

Recall that the set Mn(F) of all square matrices of order n is closed with respect to addition and multiplication of matrices, and so the hypotheses in all the assertions of the preceding proposition are valid for matrices in Mn(F). Since we shall be dealing with square matrices most of the time, we next restate the assertions of the proposition specifically for matrices in Mn(F).

Proposition 1.3.6. Let F be a field. Then
(a) for any A, B, C ∈ Mn(F), (AB)C = A(BC);
(b) for any A ∈ Mn(F), In A = AIn = A, where In is the identity matrix in Mn(F);
(c) for any A, B ∈ Mn(F) and c ∈ F, c(AB) = (cA)B = A(cB);
(d) for any A, B, C ∈ Mn(F), A(B + C) = AB + AC and (A + B)C = AC + BC.

In other words, matrix multiplication in Mn(F) is associative and obeys both the left and right distributive laws; moreover, Mn(F) has a multiplicative identity. Since Mn(F) is an abelian group with respect to addition, these properties of matrix multiplication make Mn(F) into a ring with identity. Note that while matrix addition is commutative, AB ≠ BA in general for A, B ∈ Mn(F). What we mean is that though for specific matrices A, B in Mn(F), AB may be equal to BA (for example, if one of them is In), we cannot set this as a rule applicable to any two matrices in Mn(F). (See Exercise 3 in Section 1.2.) Because AB ≠ BA in general for A, B ∈ Mn(F), we say that for n > 1, Mn(F) is a non-commutative ring. (What happens for n = 1?)
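Non-commutativity is easy to see on a machine as well; a minimal sketch in Python with NumPy (the two matrices are arbitrary examples, not from the text):

import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[1, 0], [1, 1]])

print(A @ B)                          # [[2 1]
                                      #  [1 1]]
print(B @ A)                          # [[1 1]
                                      #  [1 2]]
print(np.array_equal(A @ B, B @ A))   # False: these two matrices do not commute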

Another property with respect to which matrix multiplication in Mn(F) differs from addition is the existence of inverses. Any A ∈ Mn(F) has its additive inverse. However, given an A ∈ Mn(F), even if A ≠ 0, there is no certainty that A has a multiplicative inverse, that is, a matrix B such that AB = In = BA. For example, it is easy to verify that there can be no real numbers a, b, c and d such that

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

It must be pointed out, though, that Mn(F), for any positive integer n, contains a large class of invertible matrices, that is, matrices having multiplicative inverses.

The structural similarity between Mn(F) and the set Z of integers also deserves attention. Z, like Mn(F), is a ring with identity (namely, the integer 1). However, unlike Mn(F), Z is a commutative ring, as ab = ba for any two integers a, b. On the other hand, as in Mn(F), not every non-zero integer has a multiplicative inverse; in fact, 1 and −1 are the only invertible integers. There is another important difference between the ring of integers Z and the ring of matrices Mn(F) for n > 1. The product of two non-zero integers is always non-zero (so Z is called an integral domain). But in Mn(F), for n > 1, one can find many non-zero matrices whose product is the zero matrix. For example, in M2(F) for any field F,

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.

We end the discussion of the multiplicative properties of Mn(F) by pointing out an important consequence of the associativity of multiplication in Mn(F), which gives an unambiguous meaning to any positive integral power A^k of a matrix A ∈ Mn(F). We have already noted that A^2 can be defined as the product AA. Now, by associativity, (AA)A = A(AA), so A^3 can be defined as either of the two equal products (A^2)A or A(A^2). In general, we define A^k for any positive integer k ≥ 3 inductively as

A^k = (A^{k−1})A = A(A^{k−1}).

Thus, A^k is the product of k copies of A, where the product can be computed in any order because of the associativity of multiplication. For k = 0, by convention, we let A^0 = In, the identity matrix of order n.

Unit Matrices

It is time to introduce some very specific matrices in Mn(F) (for n > 1), which in some sense are the building blocks of Mn(F). To understand these matrices, we first look at them in M2(F), the ring of 2 × 2 matrices. We define the four unit matrices in M2(F) as follows:

e_11 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},  e_12 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},  e_21 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},  e_22 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

Here, following convention, we are using lower-case letters to denote unit matrices. Any matrix in M2(F) can be expressed in terms of these simple matrices. For example, given the matrix

A = \begin{pmatrix} -1 & 3 \\ 1/2 & 4 \end{pmatrix},

we see that A is the following sum of scalar multiples of the preceding unit matrices:

A = (−1)e_11 + 3e_12 + (1/2)e_21 + 4e_22.

Once we understand how this example works (the (i, j)th entries of A combine with the corresponding e_ij to produce A), it is easy to write down the general formula: given an arbitrary A = [a_ij] ∈ M2(F), A is the following sum of scalar multiples of the unit matrices:

A = a_11 e_11 + a_12 e_12 + a_21 e_21 + a_22 e_22.

Noting that the suffixes in the sum run independently through the values 1 and 2, we can conveniently write A as a double sum

A = \sum_{i,j=1}^{2} a_{ij} e_{ij}.

Technically speaking, we have just expressed A as a linear combination of the unit matrices, that is, as a sum of scalar multiples of the unit matrices.

Keeping the 2 × 2 unit matrices in mind, we now consider the general case of unit matrices in Mn(F). For any positive integer n, there are exactly n^2 unit matrices e_ij, 1 ≤ i, j ≤ n, in Mn(F), where e_ij is the n × n matrix whose entries are all zeros, except for the entry at the (i, j)th place, which is 1. We record the basic properties of these unit matrices now.

Proposition 1.3.7. Let Mn(F) be the set of all square matrices of order n over a field F. Then,
(a) Any matrix in Mn(F) is a linear combination of the e_ij. If A = [a_ij], then
A = a_11 e_11 + a_12 e_12 + · · · + a_nn e_nn = \sum_{i,j=1}^{n} a_{ij} e_{ij}.
(b) If In is the identity matrix in Mn(F), then In = \sum_{i=1}^{n} e_{ii}.
(c) Given two unit matrices e_ij and e_kl in Mn(F),
e_ij e_kl = 0 if j ≠ k,   and   e_ij e_kl = e_il if j = k.

Proof. Only property (c) may present some difficulty for the reader and so we leave the proof of the first two to the reader. As for property (c), once we recall that every entry of the product ei j ekl is a dot product of some row vector of ei j with some column vector of ekl , the verification will be quite simple. Since every row vector of ei j except the ith one and every column vector of ekl except the lth one are the zero vectors, it follows that the only possible non-zero entry in ei j ekl can result from the

dot product of the ith row of e_ij and the lth column of e_kl. In other words, the only possible non-zero entry in the product will be the (i, l)th entry. Next, observe that this (i, l)th entry is zero unless j = k, in which case it is 1. Thus, e_ij e_kl is the zero matrix unless j = k, in which case it must be the unit matrix having 1 at the (i, l)th entry. This finishes the proof of (c).  ∎

There is a useful notation, known as the Kronecker delta symbol, which allows us to express the two relations of property (c) of the preceding proposition as a single one. The Kronecker delta symbol, denoted by δ_ij, is actually a function of two variables, depicted as the subscripts i and j, and its values are given by

δ_ij = 0 if i ≠ j,   and   δ_ij = 1 if i = j.   (1.11)

We have deliberately not declared the range of the variables. They can range over any set of numbers, finite or infinite; the only requirement is that both variables range over the same set. However, in almost all our uses of the Kronecker delta symbol, the variables will range over some subset of the set of positive integers. Note that property (c) of the preceding proposition can be put in the form

e_ij e_kl = δ_jk e_il,

as the right-hand side equals e_il if j = k, and is zero otherwise. In this case, the variables of the Kronecker symbol vary over the set of positive integers from 1 to n.

Unit matrices in Mn(F) are useful in establishing important properties of the ring Mn(F). For example, these unit matrices are zero divisors: they are non-zero, yet the product e_ij e_kl is the zero matrix whenever j ≠ k. See Exercises 20 and 22 for other examples.

It is possible to consider rectangular unit matrices too. An m × n matrix is a unit matrix in Mm×n(F) if all its entries are zero, except one entry which is 1. Thus, in Mm×n(F), which consists of all matrices with m rows and n columns, there are precisely mn unit matrices. The same notation e_ij can be used to describe the unit matrix in Mm×n(F) having 1 at the (i, j)th place and zeros elsewhere. Of course, unless m = n, we cannot talk about multiplication of unit matrices in Mm×n(F). However, as in the case of square matrices, it is easy to see that the unit matrices can be used to build arbitrary matrices in Mm×n(F).

Proposition 1.3.8. Any matrix in Mm×n(F) is a linear combination of the unit matrices e_ij. If A = [a_ij], then

A = a_11 e_11 + a_12 e_12 + · · · + a_mn e_mn = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} e_{ij}.
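The multiplication rule e_ij e_kl = δ_jk e_il is easy to check by machine. The following sketch (Python with NumPy; the helper unit_matrix is ours, not the text's) verifies it exhaustively for n = 3.

import numpy as np

def unit_matrix(i, j, n):
    # the n x n unit matrix e_ij (1-based indices): 1 in position (i, j), zeros elsewhere
    e = np.zeros((n, n), dtype=int)
    e[i - 1, j - 1] = 1
    return e

n = 3
for i in range(1, n + 1):
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            for l in range(1, n + 1):
                product = unit_matrix(i, j, n) @ unit_matrix(k, l, n)
                delta = 1 if j == k else 0
                assert np.array_equal(product, delta * unit_matrix(i, l, n))
print("e_ij e_kl = delta_jk e_il holds for all i, j, k, l when n = 3")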

We sometimes need to multiply unit matrices of different sizes. If they are comparable for multiplication, then their product will either be the zero matrix, or a unit matrix again. We record this fact using the Kronecker delta symbol in the following result; the proof is similar to that of Proposition (1.3.7). Note that we keep using the same letter e to denote unit matrices of different sizes; in practice, the sizes should be clear from the context.

Proposition 1.3.9. Let e_ij and e_kl be two unit matrices of sizes m × n and n × p, respectively, over a field F. Then

e_ij e_kl = δ_jk e_il.

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given matrices are over an arbitrary field F unless otherwise specified. (a) The integers form an abelian group with respect to addition. (b) If, for square matrices A, B and C of the same order, AB = AC, then B = C. (c) Even if both A and B are non-zero matrices of the same order, the product AB can be the zero matrix. (d) For any m × n matrix A, AIm = A whereas Im A is not even defined if m ! n. (e) For any square unit matrix ei j , the product ei j 2 is the zero matrix. (f) No scalar multiple of a non-zero m × n matrix can be the zero matrix.

(g) If, for a 3 × 3 matrix A, A3 is the zero matrix, then A must be the zero matrix. (h) If, for an m × n matrix A, both the products AB and BA are defined for some B, then m = n. (i) For matrices A, B and C of the same size, A − (B − C) = (A − B) − C.

(j) For square matrices A, B and C of the same order, A(B − C) = (−A)(C − B). 2. Prove that for any A, B, C ∈ Mn (F), (A + B)C = AC + BC. 3. For any A, B ∈ Mn (F), and any scalar a ∈ F, show that (aA)B = a(AB) = A(aB). 4. Prove that for any A, B, C ∈ Mn (F), (a) −(−A) = A;

(b) A(B − C) = AB − AC; (c) (A − B)C = AC − BC.

5. For any m × n matrix A, n × p matrix B and p × 1 matrix γ over a field F, show that (AB)γ = A(Bγ).
6. Let C be a matrix in Mn(F) whose column vectors are γ1, γ2, . . . , γn, so C = [γ1 γ2 · · · γn]. Prove that for any B ∈ Mn(F),
BC = [Bγ1 Bγ2 · · · Bγn].

7. Use the preceding exercise to prove that A(BC) = (AB)C for any A, B, C ∈ Mn (F). 8. Prove that the following laws of indices hold for any A ∈ Mn (F) and for any non-negative integers k and l: (a) Ak Al = Ak+l and (b) (Ak )l = Akl . 9. Let A, B ∈ Mn (F) such that AB = BA. Show that for any positive integer k, (a) ABk = Bk A and (b) (AB)k = Ak Bk .

10. Let A, B ∈ Mn(F) such that AB = BA. Show that (A + B)^2 = A^2 + 2AB + B^2, assuming that 2 ≠ 0 in F.
As in the last exercise, one can show that for matrices A, B ∈ Mn(F) such that AB = BA, the Binomial Theorem holds for the expansion of (A + B)^k, provided the base field F has characteristic 0. The field R of real numbers or the field C of complex numbers is of characteristic 0.
11. Evaluate the following for any positive integer k:
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^k,   \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}^k.
12. Prove that the subsets of Mn(F) consisting of the following types of matrices are closed with respect to addition, multiplication and scalar multiplication.
(a) Upper triangular matrices.
(b) Lower triangular matrices.
(c) Diagonal matrices.
(d) Scalar matrices.
Given a matrix A ∈ Mn(F), we define the trace of A to be the sum of the diagonal entries of A. The trace of A, which is a scalar, is denoted by Tr(A). Thus, if A = [a_ij], then Tr(A) = \sum_{i=1}^{n} a_{ii}.

13. Let A, B ∈ Mn(F). Prove the following:
(a) Tr(A + B) = Tr(A) + Tr(B).
(b) Tr(cA) = cTr(A) for any scalar c ∈ F.
(c) Tr(AB) = Tr(BA).
14. Give an example of two matrices A and B, say in M2(R), such that Tr(AB) ≠ Tr(A)Tr(B).
15. Use properties of traces of matrices to show that for any two matrices A, B ∈ Mn(F), AB − BA ≠ In, where In is the identity matrix in Mn(F). (Here, F is a field in which n ≠ 0.)

16. Prove Proposition (1.3.9).
17. Let e_11, e_12, e_21 and e_22 be the unit matrices in M2(F), and let
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_11 e_11 + a_12 e_12 + a_21 e_21 + a_22 e_22
be an arbitrary matrix in M2(F). Use the formula for multiplication of unit matrices given in Proposition (1.3.7) to compute the following matrices: e_11 A, e_12 A, Ae_21 and Ae_22.
18. Let A = \sum_{i,j} a_{ij} e_{ij} be an arbitrary matrix in Mn(F), where the e_ij are the unit matrices in Mn(F). Write down the matrices e_lk A and A e_lk as linear combinations of the unit matrices.
We say that two matrices A and B in Mn(F) commute if AB = BA.
19. Find all matrices B ∈ M2(F) such that B commutes with
A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
20. Prove that a scalar matrix in Mn(F) commutes with every matrix in Mn(F).
21. Consider a matrix A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} in M2(F) such that A commutes with e_12 and e_21. Prove that a_11 = a_22 and a_12 = a_21 = 0.
22. Let A be a matrix in Mn(F) such that A commutes with every matrix in Mn(F). Prove that A must be a scalar matrix.
(Hint: A = \sum_{i,j} a_{ij} e_{ij} commutes with every unit matrix e_lk.)

Given an m × n matrix A = [ai j ] and a p × q matrix B, both over a field F, the Kronecker Product A ⊗ B of A and B is the (mp) × (nq) matrix given by A ⊗ B = [ai j B],

where a_ij B is the scalar multiple of the matrix B by a_ij. Thus, the Kronecker product of any two matrices over a field is always defined. (A small computational sketch of this product is given just after Exercise 25 below.)
23. Let A, B, C and D be matrices over a field F.
(a) If A and B are both m × n matrices, then show that for any C, (A + B) ⊗ C = A ⊗ C + B ⊗ C.
(b) If B and C are both p × q matrices, then show that for any A, A ⊗ (B + C) = A ⊗ B + A ⊗ C.
(c) If the sizes of the matrices A, B, C and D are such that the products AB and CD are defined, then verify that the product of A ⊗ C and B ⊗ D is also defined and (A ⊗ C)(B ⊗ D) = (AB) ⊗ (CD).

24. For any two matrices A and B over a field F, show that T r(A ⊗ B) = T r(A)T r(B). A matrix A ∈ Mn (R) is called a stochastic or a transition matrix if all its entries are nonnegative reals and the sum of the entries in each column is 1. 25. Prove that the product of any two stochastic matrices of order n is a stochastic matrix of order n.
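As promised, here is a small computational sketch of the Kronecker product defined above, in Python with NumPy (which provides np.kron); the matrices are arbitrary 2 × 2 examples, not taken from the text, and the last lines check the mixed-product rule of Exercise 23(c).

import numpy as np

A = np.array([[1, 2], [0, 1]])
B = np.array([[3, 1], [1, 0]])
C = np.array([[2, 0], [1, 1]])
D = np.array([[1, -1], [0, 2]])

# A ⊗ C places a copy of a_ij * C in the (i, j) block of the result
print(np.kron(A, C))

# mixed-product rule: (A ⊗ C)(B ⊗ D) = (AB) ⊗ (CD)
left = np.kron(A, C) @ np.kron(B, D)
right = np.kron(A @ B, C @ D)
print(np.array_equal(left, right))   # True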

1.4 INVERTIBLE MATRICES

Definition 1.4.1. A matrix A ∈ Mn(F), that is, a square matrix A of order n over a field F, is said to be invertible in Mn(F) if there is a matrix B ∈ Mn(F) such that AB = BA = In, where In is the identity matrix of order n over F. The matrix B is said to be the inverse of A and is denoted by A^{-1}.

To take a trivial example, In itself is invertible and In^{-1} = In. On the other hand, the zero matrix surely cannot be invertible. As we have seen in the last section, in general, a non-zero matrix need not be invertible. If a matrix A in Mn(F) is not invertible, then we sometimes say that A is a singular matrix.

EXAMPLE 1

The reader is invited to check that the following matrices are invertible by verifying that AA^{-1} = A^{-1}A = In:
(a) A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix},  A^{-1} = \begin{pmatrix} 1/3 & 0 \\ 0 & 2/3 \end{pmatrix};
(b) A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix},  A^{-1} = \begin{pmatrix} 1 & -2/3 \\ 0 & 1/3 \end{pmatrix};
(c) A = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},  A^{-1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix};
(d) A = aIn (a ≠ 0),  A^{-1} = a^{-1} In.
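Such checks are easy to automate; a sketch for part (b) over F = R, in Python with NumPy (not part of the text):

import numpy as np

A = np.array([[1.0, 2.0], [0.0, 3.0]])
A_inv = np.array([[1.0, -2.0/3.0], [0.0, 1.0/3.0]])

# both one-sided products give the identity, so A_inv is indeed the inverse of A
print(np.allclose(A @ A_inv, np.eye(2)))   # True
print(np.allclose(A_inv @ A, np.eye(2)))   # True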

The definition of invertibility of a matrix is clearly not suitable for checking whether a given matrix A is invertible: not only does one have to guess what A^{-1} may be, but one must also compute AA^{-1} and A^{-1}A. However, we will be able to develop some conditions for invertibility that do not require any knowledge of the inverse. Moreover, there are more efficient ways (see, for example, Algorithm 2.5.6 in Chapter 2) to calculate the inverse of a matrix. Even then, we must point out that for theoretical purposes, Definition (1.4.1) will continue to be useful throughout this book. For now, we will be content with a slight simplification of Definition (1.4.1), whose justification will be given later (see Proposition 2.5.7). We claim that if we can find a matrix B such that either of the conditions

AB = In   or   BA = In   (1.12)

holds, then A is invertible with B as its inverse. A nice way of putting this is to say that a one-sided inverse must be a two-sided inverse. This is clearly a non-trivial fact, as matrix multiplication is not commutative in general, and it therefore requires proof; but we will have to wait till Proposition (2.5.7) in Section 2.5 for the proof.

A careful reader must have noted that we are assuming the uniqueness of the inverse of an invertible matrix. That is why it is possible to name the inverse A^{-1} and call it the inverse of A. It is easy to see why a matrix cannot have two inverses: if possible, suppose that A has two inverses B and C, so that AB = In = BA and AC = In = CA. Then C = CIn = C(AB) = (CA)B = In B = B.

Using the fact that the inverse of an invertible matrix is unique, one can easily deduce from the relation AA^{-1} = In = A^{-1}A the following proposition.

Proposition 1.4.2. If A is invertible in Mn(F), then its inverse A^{-1} is also invertible and (A^{-1})^{-1} = A.

The sum of two invertible matrices need not be an invertible matrix. But we can say something definite about their products.

Proposition 1.4.3. Let A and B be two invertible matrices in Mn(F). Then the product AB is also invertible and

(AB)^{-1} = B^{-1}A^{-1}.

Proof. We show that AB is invertible by verifying that B^{-1}A^{-1} is its inverse. Note that, by one of our preceding remarks, it is sufficient to verify that it is a one-sided inverse:

(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = (AIn)A^{-1} = AA^{-1} = In.  ∎
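The reversal of factors in Proposition 1.4.3 is easy to observe numerically; a sketch in Python with NumPy (the two invertible matrices are arbitrary examples):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # invertible: determinant 1
B = np.array([[1.0, 3.0], [0.0, 1.0]])   # invertible: determinant 1

lhs = np.linalg.inv(A @ B)                 # (AB)^{-1}
rhs = np.linalg.inv(B) @ np.linalg.inv(A)  # B^{-1} A^{-1}, note the reversed order
print(np.allclose(lhs, rhs))               # True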

Proposition (1.4.3) shows that the set of invertible matrices in Mn(F) is closed with respect to matrix multiplication. It is also clear that the identity matrix is invertible. Moreover, by Proposition (1.4.2), the inverse of any invertible matrix is again invertible. Denoting the set of invertible matrices in Mn(F) by GLn(F), we therefore have the following result.

Theorem 1.4.4. The set GLn(F) of invertible matrices in Mn(F) forms a group with respect to matrix multiplication.

We end this section by presenting a sufficient condition for a square matrix of order 2 to be invertible, as well as a formula for the inverse when the condition is satisfied.

Proposition 1.4.5. Let A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} be a matrix over a field F. If δ = ad − bc ≠ 0 in F, then A is invertible and

A^{-1} = (1/δ) \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.

Proof. A direct computation shows that

\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix}.

So if δ ≠ 0, then multiplying the preceding matrix equation by the scalar 1/δ reduces the right-hand side of the equation to I2, which shows that A, the first matrix on the left-hand side of the equation, is invertible. By the uniqueness of the inverse, the scalar multiple of the second matrix by 1/δ has to be the inverse of A, proving the assertion of the proposition (recall that for a scalar c, one has c(XY) = X(cY) for matrices X, Y of the same order).  ∎

It must have been recognized that the scalar ad − bc is the determinant of the matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix}. As we shall see later, every square matrix A over a field can be associated with a scalar det A, known as its determinant, though the formula for det A for a general matrix A of order n ≥ 3 is not as simple as that for a matrix of order 2. However, we shall also see that a matrix A ∈ Mn(F) is invertible if and only if det A is non-zero; thus the preceding result is one half of this general result for the special case n = 2 (see Exercise 7 of this section in this connection).

The general formula for the inverse of an invertible matrix in terms of determinants, even for invertible matrices of order 3, involves too many computations to be of much practical value; as stated earlier, one uses instead the algorithm given in Section 5 of Chapter 2. In contrast, finding the inverse of an invertible matrix A of order 2 by using the preceding proposition is quite simple: interchange the diagonal entries of A, change the signs of the other two entries and then multiply the resulting matrix by the reciprocal of the determinant. For example, as the determinant of the real matrix A = \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix} is 2·4 − 2·2 = 4, the inverse of A is

A^{-1} = (1/4) \begin{pmatrix} 4 & -2 \\ -2 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -1/2 \\ -1/2 & 1/2 \end{pmatrix}.

We will come back to invertible matrices in Section 5 of the next chapter.

EXERCISES
1. Determine whether the following assertions are true or false, giving brief justifications. All given matrices are square and over an arbitrary field F unless otherwise specified.
(a) A matrix with a zero row or a zero column cannot be invertible.
(b) The sum of two invertible matrices of the same order is invertible.
(c) A diagonal matrix with all its diagonal entries non-zero is necessarily invertible.
(d) If matrices A and B commute, then the invertibility of A implies the invertibility of B.
(e) If A is an invertible matrix, then the homogeneous system of equations Ax = 0 has a non-zero solution.
(f) If A is an invertible matrix, then AB = AC implies that B = C.

(g) If for matrices A and B of order n, AB = In , then A and B are both invertible. (h) The sum of two singular matrices of the same order cannot be invertible. (i) The trace of an invertible matrix must be non-zero. (j) The set of non-zero scalar matrices of order n is a subgroup of GLn (F). 2. Let A be an invertible matrix in Mn (F). Prove the following: (a) For any non-zero scalar a ∈ F, the scalar multiple aA is invertible. (b) For any positive integer k, Ak is invertible. 3. Let A ∈ Mn (F). Prove the following:

(a) If aA is invertible for some non-zero scalar a ∈ F, then A is also invertible; (b) If for some positive integer k, Ak is invertible, then A is also invertible.

4. Let A and B be matrices in Mn(F).
(a) If the product AB is invertible, then prove that both A and B are invertible.
(b) If AB is invertible, then prove that BA is also invertible.
5. Let A be an invertible matrix in Mn(F). Show that for no non-zero matrix B ∈ Mn(F) can AB or BA be the zero matrix.
6. Use the formula in Proposition (1.4.5) to compute the inverses of the following matrices:
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},  B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},  C = \begin{pmatrix} cos θ & -sin θ \\ sin θ & cos θ \end{pmatrix},  D = \begin{pmatrix} 1+i & i \\ -i & 1-i \end{pmatrix}.
Note that while A, B, C are real matrices, D is a complex matrix with i^2 = −1.
7. Verify that the condition δ ≠ 0 in Proposition (1.4.5) is necessary for the invertibility of A by showing that if A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, then there is no B ∈ M2(R) such that
AB = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
8. Let A ∈ M2(F) be such that its second row is a scalar multiple of the first, that is, A is of the form
A = \begin{pmatrix} a & b \\ ca & cb \end{pmatrix}.
Prove that A is not invertible by showing that there can be no B ∈ M2(F) such that AB = I2. Generalize to prove that if some row of a matrix A ∈ Mn(F) (n > 1) is a scalar multiple of another row, then A cannot be invertible.
9. Use the preceding exercise to show that if A is an n × 1 and B a 1 × n matrix over a field F, then the n × n matrix AB cannot be invertible (n > 1).

book1

February 25, 2014

0:8

Invertible Matrices

31

10. The following matrices A over R are invertible. Guess their inverses A−1 , and compute AA−1 to confirm your guess:       1 0 0 1 2 −1 0 1 0       3, A = 1 0 0. A = 1 1 0, A = 0 1       1 1 1 0 0 1 0 0 1

11. Find the inverses of the following matrices of order 3 over a field F:   a1 0 0    A =  0 a2 0  ai ! 0,   0 0 a3   0  A =  0  a3

 a  A = 1  0

Generalize to matrices of order n. 12. Consider the real matrices ' 5 A= −2

( −2 2

0 a2 0

 a1   0  ai ! 0,  0

 0  0 a ! 0.  a

0 a 1

and

P=

' 1 2

( −2 . 1

(a) Compute P−1 . (b) Verify that '

1 A=P 0

( 0 −1 P . 6

(c) Prove that, for any positive integer n, ;' 1 A = 1/5 2 n

' a 13. Let A = c

' ( 4 2 n +6 −2 4

(< −2 . 1

( b ∈ M2 (R), where a, b, c and d are non-negative real numbers such that a + c = d ' ( b 1 b + d = 1 and A ! I2 . Let P = . c −1 (a) Prove that P is invertible and that ( ' 1 0 . P−1 AP = 0 a+d−1

(b) Compute An for any positive integer n. The next exercise shows how to use matrix methods to solve systems of recurrence relations.


14. Consider the following system of recurrence relations:
x_{n+1} = 5x_n − 2y_n,
y_{n+1} = −2x_n + 2y_n,
where the initial values are given by x_0 = 0 and y_0 = 1. Let X_n = \begin{pmatrix} x_n \\ y_n \end{pmatrix} for any non-negative integer n.
(a) Writing the system of recurrence relations in matrix form, show that X_n = A^n X_0 for any positive integer n, where A is the matrix of Exercise 12.
(b) Use Exercise 12 to show that x_n = (1/5)(2 − 2·6^n) and y_n = (1/5)(4 + 6^n) for any positive integer n.
15. List the invertible matrices in Mn(F) if F = Z2, for n = 2 and 3.
16. Let A and B be invertible matrices over a field F. Prove that the Kronecker product A ⊗ B is also invertible and (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.
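The matrix-power approach of Exercise 14 is easy to try numerically. A sketch in Python with NumPy, assuming (as in the exercise) the matrix A = [[5, -2], [-2, 2]] from Exercise 12, and comparing A^n X_0 with the stated closed form:

import numpy as np

A = np.array([[5, -2], [-2, 2]])
X0 = np.array([0, 1])

for n in range(1, 6):
    Xn = np.linalg.matrix_power(A, n) @ X0
    closed = np.array([(2 - 2 * 6**n) / 5, (4 + 6**n) / 5])
    assert np.allclose(Xn, closed)
print("A^n X_0 matches the closed form for n = 1, ..., 5")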

1.5 TRANSPOSE OF A MATRIX

Quite often, for various reasons, one has to consider a given matrix A with its rows and columns interchanged. The new matrix so obtained by such an interchange is called the transpose of A, and is denoted by A^t. To be precise, if A = [a_ij] is an m × n matrix over a field F, then its transpose A^t = [b_ij] is an n × m matrix over F such that

b_ij = a_ji   for all i, j such that 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Thus, for A ∈ Mm×n(F), we have A^t ∈ Mn×m(F). In particular, the transpose of an n-dimensional row (respectively, column) vector is an n-dimensional column (respectively, row) vector. Note also that if A ∈ Mn(F), then A^t is also in Mn(F). We need to know how transposing a matrix interacts with the various matrix operations. The following result explains.

Proposition 1.5.1. For any matrices A and B over a field F and a ∈ F, the following hold:
(a) (A^t)^t = A.
(b) (aA)^t = aA^t.

(c) If A and B can be added, then (A + B)^t = A^t + B^t.
(d) If A and B can be multiplied, then (AB)^t = B^t A^t.
(e) If A is invertible, then so is A^t, and (A^t)^{-1} = (A^{-1})^t.

Proof. The proofs of the first three are straightforward and we leave them to the reader. For assertion (d), we may assume that A is an m × n and B is an n × p matrix over F, so that the product AB is an m × p and the transpose (AB)^t a p × m matrix. Similarly, the product B^t A^t of the transposes is a p × m matrix, and thus both sides of (d) are matrices of the same size. To check the equality of the entries of these matrices, we let A = [a_ij], B = [b_ij] and AB = [c_ij]. Then a typical (i, j)th entry of (AB)^t will be c_ji, which, by the definition of AB, is given by

c_{ji} = \sum_{k=1}^{n} a_{jk} b_{ki}.

On the other hand, writing A^t = [d_ij] and B^t = [f_ij], we see that the (i, j)th entry of the product B^t A^t is given by

\sum_{k=1}^{n} f_{ik} d_{kj} = \sum_{k=1}^{n} b_{ki} a_{jk},

which is clearly c_ji. The equality in (d) follows.

Finally, if A ∈ Mn(F) is invertible, then AA^{-1} = In = A^{-1}A. Taking transposes, and using the product rule (d) for transposes, we obtain

(A^{-1})^t A^t = In^t = A^t (A^{-1})^t.

Since In^t = In, it follows from the displayed equation that A^t is invertible and its inverse is (A^{-1})^t. The proof of assertion (e) is complete.  ∎

Matrices known as symmetric matrices form an important class of matrices in Mn(F), and they are defined in terms of transposes.

Definition 1.5.2.

A matrix A ∈ Mn (F) is symmetric if At = A, and skew-symmetric if At = −A.
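As a quick illustration, the following Python/NumPy sketch over F = R (not part of the text) tests these conditions directly, and also splits an arbitrary square matrix into symmetric and skew-symmetric parts (compare Exercise 4 of this section):

import numpy as np

A = np.array([[2, 2], [2, 4]])
print(np.array_equal(A, A.T))        # True: A is symmetric

B = np.array([[1, 5], [2, 3]])       # an arbitrary square matrix
S = (B + B.T) / 2                    # symmetric part
K = (B - B.T) / 2                    # skew-symmetric part
print(np.array_equal(S, S.T), np.array_equal(K, -K.T))   # True True
print(np.array_equal(B, S + K))      # True: B is their sum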

Thus, a symmetric matrix is a square matrix in which each off-diagonal entry equals its mirror image or its reflection about the diagonal on the other side; however, there is no restriction on the diagonal entries. So the identity matrix, the zero matrix, any scalar matrix and any diagonal matrix in Mn (F) are

trivially symmetric. Clearly both of the following matrices

A = \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix},   A^{-1} = (1/4) \begin{pmatrix} 4 & -2 \\ -2 & 2 \end{pmatrix}

are symmetric. There are several important facts about symmetric matrices one should know. We record three here; others can be found in the exercises.

Proposition 1.5.3. Let F be a field.
(a) If A and B are symmetric matrices in Mn(F), then so are A + B and A − B.
(b) If A is a symmetric matrix in Mn(F), then so is cA for any scalar c ∈ F.
(c) If a symmetric matrix A ∈ Mn(F) is invertible, then A^{-1} is also symmetric.

The proofs are routine applications of the properties of transposes and are left to the reader.

Before looking for examples of skew-symmetric matrices, the reader should note that all the diagonal entries of a skew-symmetric matrix are zeros.

For complex matrices, there is a related concept that merits attention. First, recall that for a complex number z = a + ib ∈ C, its conjugate, denoted by z̄, is given by z̄ = a − ib. Given a matrix A = [a_ij] ∈ Mm×n(C), its conjugate transpose, denoted by A^*, is the matrix obtained from the transpose A^t of A by replacing each entry of A^t by its conjugate. Thus, A^* ∈ Mn×m(C), and if A^* = [b_ij], then b_ij = \bar{a}_ji. Thus, for example, if

A = \begin{pmatrix} i & 1 \\ 3i & 1+i \\ 0 & -i \end{pmatrix},   then   A^* = \begin{pmatrix} -i & -3i & 0 \\ 1 & 1-i & i \end{pmatrix}.

Note that if A is a matrix with real entries, then its conjugate transpose A^* coincides with A^t. There are results for conjugate transposes analogous to those for transposes given in the preceding Proposition (1.5.1). They are listed in the next proposition for future reference; the proofs are similar and are left to the reader as easy exercises.

Proposition 1.5.4. Let A and B be matrices over F, where F is either C or R. Then,
(a) (A^*)^* = A;
(b) (cA)^* = c̄A^* for any c ∈ F;
(c) if A and B can be added, then (A + B)^* = A^* + B^*;
(d) if A and B can be multiplied, then (AB)^* = B^* A^*.
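A numerical spot-check of (b) and (d), a sketch in Python with NumPy (the conjugate transpose is written as .conj().T; the 2 × 2 matrix B and the scalar c are arbitrary illustrations, while A reuses the 3 × 2 example above):

import numpy as np

A = np.array([[1j, 1], [3j, 1 + 1j], [0, -1j]])   # the 3 x 2 complex example
B = np.array([[2, 1j], [1, 0]])                   # an arbitrary 2 x 2 complex matrix
c = 2 - 1j

conj_T = lambda M: M.conj().T                     # conjugate transpose

print(np.allclose(conj_T(c * A), np.conj(c) * conj_T(A)))   # (cA)* = c-bar A*
print(np.allclose(conj_T(A @ B), conj_T(B) @ conj_T(A)))    # (AB)* = B* A*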

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given matrices are over an arbitrary field F unless otherwise specified. (a) If a matrix A is symmetric, so is At . (b) If a matrix A is skew-symmetric, so is At . (c) A symmetric matrix is invertible. (d) If an invertible matrix A is skew-symmetric, so is A−1 . (e) If A and B are symmetric matrices of the same order, then the product AB is symmetric. (f) If A and B are skew-symmetric matrices of the same order, then the product AB is symmetric. (g) A square matrix cannot be both symmetric and skew-symmetric. (h) For a square matrix A, (Ak )t = (At )k for any positive integer k. (i) For any m × n matrix A, the product AAt is symmetric. (j) If A and B are invertible matrices of the same order, then ((AB)−1)t = (At )−1 (Bt )−1 . ' ( 2 3 1 3 1 2. Let A = and u = be matrices over R. Compute the matrices (Au)t , ut At as well as 3 −1 2 ut u and uut . 3. For any m×n matrix A over a field F, show that the products AAt and At A are symmetric matrices in Mm (F) and Mn (F), respectively. 4. For any field F, prove the following: (a) For any A ∈ Mn (F), the matrices A + At and A − At are symmetric and skew-symmetric, respectively. (b) For any symmetric or skew-symmetric matrix A ∈ Mn (F), and for any scalar c ∈ F, the matrix cA is symmetric or skew-symmetric, respectively. (c) Every matrix in Mn (F) is a sum of a symmetric and a skew-symmetric matrix (provided, division by 2 is allowed in the field F). 5. For any m × n matrix over R, show that each diagonal entry of the matrix At A is non-negative. 6. For a non-zero matrix A in Mn (F), where F = C or R, prove that AAt or At A cannot be the zero matrix. Give a counter example over the field F of two elements. 7. Let A be either a symmetric or a skew-symmetric matrix in Mn (F). Show that A2 is a symmetric matrix. 8. Let A, B ∈ Mn (F) be both either symmetric or skew-symmetric. Prove that AB is symmetric if and only if A and B commute. 9. Let A, B ∈ Mn (F) be symmetric matrices. Prove that AB + BA is symmetric and AB − BA is skew-symmetric. 10. Given any symmetric matrix A ∈ Mn (F), show that for any m × n matrix C over F, the matrix CAC t is symmetric. 11. Prove the properties of conjugate transposes given in Proposition (1.5.4). 12. If A is invertible in Mn (C), show that A∗ is also invertible in Mn (C). 13. Let A be an upper triangular matrix in Mn (R) such that A commutes with At . Prove that A is diagonal. 14. Give an example of matrices A, B ∈ Mn (C) such that (a) At = A but A∗ ! A; (b) Bt = −B but B∗ ! −B.

1.6 PARTITION OF MATRICES; BLOCK MULTIPLICATION

Though matrices are introduced as arrays of numbers displayed in rows and columns, other ways of looking at them can be quite useful. For example, subdividing a matrix by vertical as well as horizontal lines to produce what is known as a partitioned matrix turns out to be very convenient, especially when calculations with large matrices are required. Many contemporary applications of linear algebra appear more natural and are much easier to handle if we resort to partitioned matrices. However, we will not give a formal definition of a partitioned matrix, as it requires cumbersome notation. We can avoid such a formal definition, for the idea of a partitioned matrix is so simple and natural that examples alone will be sufficient for its understanding. To keep our presentation informal, we will assume throughout this chapter that matrices under discussion at any given point are over a fixed field, and we will not mention the underlying field at all except in the last proposition. We begin our discussion by considering the following example:

A = \left(\begin{array}{ccc|cc|c} 1 & 2 & 3 & 4 & 6 & 0 \\ 7 & 8 & 0 & -3 & 1 & 6 \\ \hline 0 & 2 & 9 & 1 & 2 & 4 \end{array}\right).

As indicated by the vertical and horizontal lines, A is an example of a 2 × 3 partitioned, or 2 × 3 block, matrix, which can be visualized as consisting of submatrices or blocks:

A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix}.

It is essential to regard the blocks or the submatrices Ai j of A virtually as the entries of A, and manipulate the blocks as if they are scalars; because of this viewpoint, A can be thought of as having 2 horizontal blocks and 3 vertical blocks. Note that in this example, the blocks or submatrices that form the partition of A are of different sizes. This is typical, for there is no formal restriction on sizes of blocks. We partition a matrix the way we deem fit, depending on our requirements. Sometimes, the sizes of the blocks will be determined naturally. For example, when modelling a physical system such as a communication or an electric network, or a large transportation system, or the Indian economy by a matrix, it is essential to consider the matrix as a partitioned one whose blocks will be naturally determined by mutual interactions of the different components of the system. Another advantage of partitioned matrices lies in the fact that very large matrices, which appear in many applications with the advent of high-speed computers, can be handled with relative ease if we partition them properly. Such large matrices are partitioned into much smaller submatrices so that computers can work with several submatrices at one time. Thus, the idea of partitioned matrix has turned fruitful in tackling highly complex processes of today’s technical world. Let us briefly discuss how the usual matrix operations work with partitioned matrices. Two matrices of the same size can be added block wise if they are partitioned exactly in the same way; the sum will be obviously a matrix with the same partition. To be more specific, given matrices A and B with similar partitions, each block of the sum A + B is clearly the sum of the corresponding blocks of A and B. Similarly, the scalar multiple of a partitioned matrix is obtained by multiplying each block of the matrix by the scalar.
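As a small computational aside, a Python/NumPy sketch (the partition sizes chosen here are just an illustration): slicing gives exactly this block view, and block-wise addition agrees with ordinary addition.

import numpy as np

A = np.arange(12).reshape(3, 4)          # a 3 x 4 matrix
B = np.ones((3, 4), dtype=int)

# partition both matrices the same way: rows {0,1 | 2}, columns {0,1 | 2,3}
blocks = lambda M: [[M[:2, :2], M[:2, 2:]],
                    [M[2:, :2], M[2:, 2:]]]

A_blk, B_blk = blocks(A), blocks(B)
# add block by block, then reassemble the blocks into one matrix
C = np.block([[A_blk[i][j] + B_blk[i][j] for j in range(2)] for i in range(2)])
print(np.array_equal(C, A + B))          # True: block-wise addition is ordinary addition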

We now consider the following partitioned matrices of the same size to see how the operations work in practice:  1  A = 7  0

2 8 2

3 0 9

4 −3 1

6 1 2

 6  C = 7  0

  0 0  6, B = 1   4 4 5 8 2

4 0 9

3 −1 1

2 −2 −4

−1 4 6

2 2 2

 1  0.  4

0 −2 1

−3 2 0

 5  −5,  −4

Now, A + B can be computed block-wise, as A and B are partitioned in the same way, with corresponding blocks or submatrices having the same sizes. However, even though A and C are matrices of the same size (so A + C can be computed by adding entries), A + C cannot be computed block-wise, as the blocks of A and C are not comparable. But such problems do not occur in practice, as we partition matrices beforehand in such a way that block-wise operations are possible. Coming back to our example, if we write

A = \begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \end{pmatrix},   B = \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \end{pmatrix}

in terms of their blocks, then A + B, being the sum of corresponding blocks of A and B, can be visualized in the following partitioned form:

A + B = \begin{pmatrix} A_{11} + B_{11} & A_{12} + B_{12} & A_{13} + B_{13} \\ A_{21} + B_{21} & A_{22} + B_{22} & A_{23} + B_{23} \end{pmatrix}.

This presentation of A + B validates the point we made earlier: while combining matrices with comparable blocks, we treat their blocks as if they were actual matrix entries. If the scalar entries of the sum A + B are needed, one simply adds up the blocks as matrices. In A + B, for example, A_{12} + B_{12} is the submatrix

\begin{pmatrix} 4 & 6 \\ -3 & 1 \end{pmatrix} + \begin{pmatrix} 0 & -3 \\ -2 & 2 \end{pmatrix} = \begin{pmatrix} 4 & 3 \\ -5 & 3 \end{pmatrix}.

We adopt the same point of view (that is, of treating blocks as entries) to see how partitioned matrices can be multiplied. In fact, we multiply partitioned matrices A and B by our old row–column method for entries, now applied to blocks, provided the column partition of A matches the row partition of B. By this we mean two things:
• The number of horizontal blocks of A equals the number of vertical blocks of B.
• The number of columns in each block of A equals the number of rows in the corresponding block of B.
Of course, the basic requirement that the number of columns of A equals the number of rows of B must be satisfied for the product AB to make sense. We work through a few examples to understand block multiplication.

Let A be a 3 × 5 and B be a 5 × 2 matrix (so that the product AB makes sense and is a 3 × 2 matrix), partitioned into blocks by vertical and horizontal lines as shown below:

A = \left(\begin{array}{ccc|cc} 1 & 2 & 3 & 4 & 0 \\ 0 & -1 & 2 & 1 & 1 \\ \hline 2 & 0 & 1 & 0 & 1 \end{array}\right),   B = \left(\begin{array}{cc} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \hline 0 & 1 \\ -1 & 2 \end{array}\right).

As indicated, A is partitioned into two vertical blocks having 3 and 2 columns, respectively, whereas B has two horizontal blocks with 3 and 2 rows, respectively; thus, the column partition of A matches the row partition of B. So block multiplication can be performed. If we represent A and B by their submatrices

A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},   B = \begin{pmatrix} B_{11} \\ B_{21} \end{pmatrix},

Thus, AB turns out to be

/ = 7 / = 6

 0 1 0 1 3  5 0 / 10 + −1 0 12 .  22  6  6

 32 11.  12

 2 /  4 + 0  6 0 2

0' 0 1 −1

1 2

(

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Partition of Matrices; Block Multiplication

39

It is clear that once the partitions of A and B match in this manner, all the products of the submatrices making up the AB are automatically well-defined. Also, note that compared to the calculations required in direct computation of product of matrices from definition, those needed in block multiplication are much shorter, especially if the blocks are chosen carefully. (see Exercise 5 of this section.) Therefore, block multiplication turns out to be more efficient especially when working with large matrices. For most of the applications required in this book though, partitioning matrices into four blocks will suffice. We discuss this important special case thoroughly. So, let A be an m × n and B be an n × p matrix (so that AB is an m × p matrix) partitioned as follows: ' ( ( ' A11 A12 B11 B12 A= , , B= B21 B22 A21 A22 where the number of columns of A11 equals the numbers of rows of B11. Observe that this single condition ensures that the column partition of A matches the row partition of B, for there are just two horizontal blocks of A and two vertical blocks of B. Now, we multiply these two matrices as if they are both 2 × 2 matrices to obtain AB as follows: ' ( A B + A12 B21 A11 B12 + A12 B22 AB = 11 11 . A21 B11 + A22 B21 A21 B12 + A22 B22 For example,   1  −1 0

2 0 2

 3  0   1 −1  1 1

1 0 2

−1 3 0

4 0 1

  2  1   −1 =  1   0 −1

7 1 2

5 1 6

7 −3 1

 0  −2.  −2

One should verify the details of the last example so as to gain some experience in dealing with block matrices. We work out the upper-most right submatrix in the product AB: ' (' ( ' (/ 0 1 2 4 2 3 1 0 + A11 B12 + A12 B22 = −1 0 0 −1 1 ' ( ' ( 4 0 3 0 = + −4 −2 1 0 ' ( 7 0 = . −3 −2 Let us discuss a couple of examples to illustrate the advantages of partitioning a matrix into four blocks; these examples will be useful later. EXAMPLE 2

Let us compute the square A2 of the following 5 × 5 matrix by block multiplication:   0 0 3 7 0  1 0 0 0 0   2. A = 0 0 1 −1  0 0 0 1 0  0 0 1 3 −2

Saikia-Linear Algebra

40

book1

February 25, 2014

0:8

Matrices

Since A has a number of zeros, we try to partition it in such a way that zero blocks do result; for, without actually doing any calculations we know that a zero block multiplied to any comparable block will produce a zero block only. Also, since we have to multiply A to itself, we require that after partitioning A, its upper left-hand block, that is, A11 , must have the same number of rows as columns. In other words, it must be a square submatrix. Thus, the following seems to be the most convenient partitioning of A:   3 −1  A =  0   0 0

7 0 0 0 0

0 0 1 0 1

0 0 −1 1 3

 0  0  2.  0 −2

Now, because we treat the four blocks as entries of a matrix, it is clear that the zero submatrices in A will produce zero submatrices in the same positions in A2 . We have the complete calculation as follows:   3 −1  2 A =  0   0 0   2 −3  =  0   0 0

7 0 0 0 0 21 −7 0 0 0

0 0 1 0 1

0 0 −1 1 3 0 0 3 0 −1

 0  3  0 −1   2  0  0  0  −2 0  0 0  0 0  4 −2.  1 0  −4 6

7 0 0 0 0

0 0 1 0 1

0 0 −1 1 3

 0  0  2  0  −2

(1.13)

The special kind of matrix we had just considered is known as a block triangular matrix. A square matrix A of order n is a block upper triangular matrix, if A can be put in the form ( ' A11 A12 , A= O A22 where A11 is also a square matrix of order, say r where r < n. It is clear that A22 is a square matrix of order (n − r), and that O is the zero matrix of size (n − r) × r. Given such a matrix A, block multiplication of A with itself is permissible as A11 is a square submatrix. We leave it to the reader to verify that ( ' 2 A1 A , A2 = 11 O A222 where A1 is the r × (n − r) submatrix given by A11 A12 + A12 A22 . Since A2 is again a block upper triangular matrix with its upper left corner submatrix A211 a square matrix, it follows that we can continue the process of block multiplication of A with

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Partition of Matrices; Block Multiplication

41

its powers to obtain, for any positive integer k, the power Ak in the following form:   k A11 Ak  k A =  . O Ak22 EXAMPLE 3

Consider the block upper triangular matrix ' A11 A= O

A12 A22

(

of order n such that A is invertible. We seek to express A−1 also in a block triangular form. We assume that A11 has order r and A22 has order s, so that r + s = n. For the time being, let us denote A−1 as B. Partition B into four blocks as follows: ' ( B11 B12 B= , B21 B22 where B11 is a square block of order r. Then, the relation AB = In implies that ' (' ( ' ( A11 A12 B11 B12 Ir O AB = = O A22 B21 B22 O Is so that by the rules of block multiplication, we obtain the following four matrix equations after equating submatrices in the preceding equality: A11 B11 + A12 B21 = Ir , A11 B12 + A12 B22 = O, A22 B21 = O, A22 B22 = I s . Note that the last of the preceding equality implies, according to condition (1.12), that A22 is invertible and A−1 22 = B22 . Now, multiplying the third of the equalities by A−1 22 , we then see that B21 = O. It follows from the first of the equalities that A11 is −1 −1 invertible and A−1 11 = B11 . Finally, multiplying the last equality by A11 and B22 shows −1 −1 that B12 = −A11 A12 A22 . Hence, we can conclude that  −1 A A−1 = B =  11 O

−1  −A−1 11 A12 A22  . A−1 22

For future reference, we record a condition sufficient for block multiplication of two comparable matrices with two horizontal and two vertical blocks each. Proposition 1.6.1.

Let A=

'

A11 A21

( ' A12 B and B = 11 A22 B21

B12 B22

(

Saikia-Linear Algebra

42

book1

February 25, 2014

0:8

Matrices

be two m × n and n × p partitioned matrices, respectively. If the number of columns in the submatrix A11 is the same as the number of rows of B11, then block multiplication of A and B is possible, and '

A B + A12 B21 AB = 11 11 A21 B11 + A22 B21 Corollary 1.6.2.

( A11 B12 + A12 B22 . A21 B12 + A22 B22

Let '

A A = 11 A21

A12 A22

(

be a partitioned matrix. If A11 is a square submatrix, then

A2 =

'

A11 2 + A12 A21 A21 A11 + A22 A21

( A11 A12 + A12 A22 . A21 A12 + A22 2

We end this section by presenting an alternative, but quite useful, way of looking at matrix multiplication, which we had anticipated in Section 2. For an m × n matrix A and n × p matrix B over any field F, the entries of the product AB are usually obtained by the dot products of the rows of A with the columns of B. However, by partitioning A into its columns, and B into its rows, and block multiplying the partitioned matrices thus obtained, we can produce another description of the product AB. This way of partitioning results in n blocks in A in a row, each block an m-dimensional column vector; and similarly in n blocks in B in a column, each block a p-dimensional row-vector; multiplying the blocks of A with the blocks of B by the row-column method (treating the blocks as if they are scalars), we then obtain the required expression for AB as given in the following proposition. Proposition 1.6.3. Let A be an m × n matrix, and B be an n × p matrix over a field F. If γ1 , γ2 , . . . , γn are the columns of A (so that each γ j is an m × 1 matrix), and ρ1 , ρ2 , . . . , ρn are the rows of B (so that each ρ j is an 1 × p matrix), then AB = γ1 ρ1 + γ2 ρ2 + · · · + γn ρn . Note that each product γ j ρ j is an m × p matrix and so their sum is also m × p matrix. This way of multiplying two comparable matrices is known as the column–row multiplication of matrices. In particular, we see that if A is an m × n matrix with γ1 , γ2 , . . . , γn as its columns and x = (x1 , x2 , . . . xn ) an n × 1 column vector, then as the jth row of x is x j , Ax = x1 γ1 + x2 γ2 + · · · + xn γn , a result which we have verified directly in Section 2 (see the discussion preceding Equation 1.7).

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Partition of Matrices; Block Multiplication

43

EXERCISES 1. In each of the following products of block matrices, assume that the blocks are such that block multiplication is possible. Express each of the product as a single block matrix. I and O stand for the identity matrix and zero matrix of suitable sizes, respectively. ' ' (' ( (' ( A O C D A B I E , , O B E F O D O I ' O I

(' I A D C

( B , O

'

A O

(' B C I O

D O

( I . F

2. In each of the following products of two matrices, one of the matrices is a partitioned one. Partition the other matrix suitably so that block multiplication is possible. Express each product as a single block matrix after performing block multiplication.  0 2  1   1 −1   0  0

1 0 2

−1 3 3

2 5 0 0

0 0 3 2

2 −1 0

   1 0  0  1  3   −2  1 −1

 0  2  0  0  1 −1  −3 2

 2  −2  1,  1 −2

−1 1 0 2 2

−1 3 1 −1

1 −1 −2 1

 0  1 . 0  1

3. Compute A2 by block multiplication after suitably partitioning the following matrices A:

4. Consider matrices

 2  A = 1  0

−1 3 0

 0  0,  4

 2 1 A =  0 0

−1 3 0 0

0 0 4 3

 0  0 . −1 1

  3 3  4 9 0 0 0   1 0 0 and B = 6 8. 5 9 0 1 0   7 0 0 / Compute AB by block multiplication after partitioning A as I3 0 x and partitioning B in such a way that block multiplication is possible. 5. Verify Proposition (1.6.1). 6. Verify Corollary (1.6.2). 7. Verify Proposition (1.6.3).  1  A = 0  0

 2  3  4

Saikia-Linear Algebra

44

book1

February 25, 2014

0:8

Matrices

8. Suppose a matrix A ∈ Mn (F) can be partitioned as ' ( I O A= , B I where the symbols I stand for identity matrices of possibly different orders. Prove that A is invertible, and express A−1 as a block matrix. Hint: Determine matrices C, D, E and F such that ' C A E

( ' D I = F O

( O . I

9. Let A be an invertible matrix in Mn (F) in block upper triangular form: ' ( A A12 A = 11 , O A22 where A11 is a square matrix. Show that A−1 is also a block upper triangular matrix. Express −1 the blocks ' of A ( in terms of blocks of A. B O 10. Let A = , where B and C are square matrices over a field F of possibly different orders. O C Prove that A is invertible if and only if B and C are invertible. 11. Consider a matrix A over a field F which can be partitioned as follows: ' ( A11 A12 A= , A21 A22 where A11 is invertible. Find matrices X and Y over F such that (' ( ' (' I O A11 O I Y , A= O S O I X I where S = A22 − A21 A11 −1 A12 . 12. For any two m × n matrices A and B over a field F, show that ' (' ( ' ( I m A Im B Im A + B = , O In O In O In where Im and In are identity matrices of order m and n over F. Hence, show that for any m × n matrix A over F, ' (−1 ' ( Im A Im −A . = O In O In 13. A lower triangular matrix in Mn (F) can be represented as ' ( a 0 , b C

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Groups and Fields

45

where a ∈ F, b is an (n − 1) × 1 column vector over F, 0 is the 1 × (n − 1) zero row vector and C is a lower triangular matrix in M(n−1) (F). Prove that the product of two lower triangular matrices in Mn (F) is lower triangular by using block multiplication. Hint: Use induction on n.

1.7 GROUPS AND FIELDS Earlier in this chapter, while discussing properties of addition and multiplication of matrices, certain terms, such as fields and groups, were used to refer to specific algebraic structures; we also used properties of fields for proving properties of matrix operations. In this brief section, we discuss these structures and give some examples, mostly of those which appear frequently in linear algebra. One of the basic concepts required for these definitions is that of a binary operation on a set. As the term suggests (‘bi’ means two), a binary operation on a non-empty set S is a rule by which any pair of elements of S is associated to a unique element of S ; one also says that S is closed with respect to the binary operation. To describe the result of the application of the rule to a pair, various symbols such as + (usually called addition) or · (called multiplication) or even ∗ are used. Thus when one says that + (or ·) is a binary operation on a set S , one means that every pair x, y of elements of S is associated to the unique element x + y ∈ S (or to x · y; most of the times the juxtapose xy is used to denote x · y). Thus, the usual addition and multiplication of integers are binary operations in the set Z of integers; for any pair of integers x, y, x + y and xy are well-defined integers. Similarly, the familiar operations of addition and multiplication are binary operations in the set R of real numbers as well as in the set C of complex numbers. The definition of addition of m × n matrices shows that it is too a binary operation on the set of such matrices. Now for the definition of a group refer to next definition. Definition 1.7.1. A non-empty set G with a binary operation · is said to be a group (with respect to ·) if the following axioms (rules) are satisfied. (a) (Associativity) (x · y) · z = x · (y · z) for any x, y, z ∈ G. (b) (Existence of Identity) There is an element e ∈ G such that x · e = x = e · x for any

x ∈ G.

(c) (Existence of Inverse) For every x ∈ G, there is an element y ∈ G such that x · y = e = y · x. A group G with a binary operation · is called an commutative group if (d) (Commutativity) x · y = y · x for any x, y ∈ G. In case a group G with addition +, satisfies the condition for commutativity, that is, x + y = y + x for all x, y ∈ G, it is customary to call it an abelian group It can be easily shown that the identity of a group is unique and so is the inverse of any element of a group.

Saikia-Linear Algebra

46

book1

February 25, 2014

0:8

Matrices

Now for the examples. With respect to usual addition of numbers, each of the sets Z (integers), Q (rational numbers), R (real numbers) or C (complex numbers) is an abelian group. For, (x + y) + z = x + (y + z) and x + y = y + x for any numbers x, y and z in any of these four sets; the number 0, which is common to all these four sets, clearly acts as the (additive) identity: x + 0 = x = 0 + x. Moreover, the negative −x of a number x is its (additive) inverse. Thus, each of the four sets is an abelian group. Regarding multiplication, in Z it is associative and 1 acts as the multiplicative identity. However, as we have remarked earlier, even the non-zero integers do not form a group with respect to multiplication as, except 1 and −1, no integer has an inverse. In contrast, each of the three sets Q∗ (non-zero rationals), R∗ (non-zero reals) and C∗ (non-zero complex numbers) is a commutative group with respect to usual multiplication. The number 1, common to these three sets, is clearly the identity. If r is a non-zero rational or a real number, then the reciprocal 1/r is the inverse; for a non-zero complex number a + bi a − bi (so at least one of a and b must be a non-zero reals), the complex number 2 2 is the inverse. a +b There are numerous other examples of groups which can be found in any standard textbook of algebra such as the classic Topics in Algebra by I.M. Herstein [3]. Some groups appear naturally in linear algebra and will be considered as we go along. At this point, we want to introduce a family of finite groups. A finite group has finitely many elements. Note that in all the examples we have given so far, the groups have infinitely many elements; they are infinite groups. The simplest finite group has two elements and is usually described as Z2 = {0, 1}. We can introduce a binary operation + in Z2 by the following relations: 0 + 0 = 0,

0 + 1 = 1,

1 + 0 = 1,

1 + 1 = 0.

Thus, every possible pair of elements in Z2 is associated to a unique element in Z2 and so + is indeed a binary operation in Z2 , called addition in Z2 . It is a trivial verification that Z2 satisfies all the axioms for an abelian group with respect to addition; the element 0 is the identity (zero) and 1 is the inverse of itself. Similarly, multiplication in Z2 can be defined by 0 · 0 = 0,

0 · 1 = 0,

1 · 0 = 0,

1·1 = 1

Note: It is clear that Z2 cannot be group with respect to multiplication as the element 0 does not have an inverse in Z2 . On the other hand, note that the singleton {1}, consisting of the only non-zero element of Z2 , forms a commutative group with respect to multiplication as all the required axioms are satisfied trivially because of the relation 1 · 1 = 1. In general, for every prime number p, we can consider the set Z p = {0, 1, . . . , p − 1} of p elements. As in the case of Z2 , addition and multiplication are defined in Z p as follows. For any a, b ∈ Z p , the sum a + b and the product a · b in Z p are the least non-negative remainders of the sum a + b (as integers) and the product a · b (as integers), respectively, when divided by p in Z; the operations are known as addition and multiplication modulo p. For example, as 3 + 2 = 5 and 3 · 2 = 6 in Z, in Z5 , 3 + 2 = 0 and 3 · 2 = 1. Since the least non-negative remainder of any integer when divided by p is an integer between 0 and p − 1, it follows that addition and multiplication in Z p , as defined in the preceding para, are binary operations on Z p . It can, further, be verified that (a) Z p with its addition is an abelian group with the zero 0 ∈ Z p acting as the additive identity; (b) Z p ∗ , the set of non-zero elements of Z p , is also a commutative group with respect to multiplication with 1 ∈ Z p acting as the multiplicative identity.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Groups and Fields

47

In all the examples of abelian groups (with respect to whatever operation known as addition) that we have discussed so far, each of the groups has another binary operation called multiplication such that, with the exception of the additive group Z of integers, the non-zero elements of the additive group form a commutative group with respect to multiplication. These examples, thus, lead naturally to the definition of a field. In the following definition, as in all the examples, the identity of a group, with respect to a binary operation called addition, will be designated as the zero of the group and will be denoted by the symbol 0. Definition 1.7.2. Let F be a non-empty set with two binary operations, one called addition and the other multiplication. F is a field if the following conditions hold. (a) F is an abelian group with respect to addition. (b) F∗ , the set of the non-zero elements of F, is a commutative group with respect to multiplication. (c) The elements of F satisfy the distributive laws: (x + y : z = x : z + y : z for all

x, y, z ∈ F,

x : (y + z) = x : y + x : z for all

x, y, z ∈ F.

and

In general, the unique additive identity of a field F is called the zero of F and is denoted by 0, whereas the unique multiplicative identity is denoted by 1; by convention, 0 ! 1 so any field has at least two elements. The additive inverse of an element x in a field F is denoted by −x; the multiplicative inverse of a non-zero element x ∈ F is denoted either by 1/x or by x−1 . We have already noted that the number systems Q, R, C and Z p satisfy the first two axioms of the definition of a field. Since the distributive laws also hold for them, we have the following list of fields: (a) Q, R and C with respect to usual addition and multiplication of numbers. (b) Z p with respect to addition and multiplication modulo p for any prime p (including p = 2). There are many more important examples of fields. However, the fields relevant for our purpose are the ones mentioned in the preceding example and their subfields. Before introducing the concept of a subfield, we digress a bit with some remarks about our discussion so far in this section. It must have been noticed that while presenting examples of fields of various numbers, we have glossed over the verifications of the properties of addition and multiplication in these number systems. So it must be pointed out that rigorous proofs of these properties can be given once these numbers and their addition and multiplication are defined properly (see, for example, Chapter 13 of Basic Abstract Algebra by Bhattacharya, Jain and Nagpaul). We come back to subfields now. As the name suggests, a subfield K of a field F is a non-empty subset of F such that K itself is a field with respect to the field operations of F. Thus, for elements x, y of K, the sum x + y and the product xy in F, must be in K. It is also clear that the identities 0 and 1 of F must belong to K. To be precise, the following conditions on a non-empty subset K of a field F are sufficient for K to be a subfield: (a) (b) (c) (d)

For any x, y ∈ K, x + y ∈ K. For any x ∈ K, −x ∈ K. For any non-zero x ∈ K, the inverse x−1 ∈ K. 1 ∈ K.

Saikia-Linear Algebra

48

book1

February 25, 2014

0:8

Matrices

If K is a subfield of F, we also say that F is an extension field of K. The following are some examples of subfields: (a) (b) (c) (d)

The field R of real numbers is a subfield of the field C = {a + bi : a, b ∈ R} of complex numbers. The field Q of rational numbers is a subfield of R and also of C. √ The subset {a + b √2 : a, b ∈ Q} is a subfield of R. The subset {a + b −2 : a, b ∈ R} is a subfield of C.

We finally consider rings. Like a field, a ring also has two binary operations, called addition and multiplication. Definition 1.7.3. A non-empty set R, with two binary operations called addition (+) and multiplication (:), is a ring if the following hold. (a) (b) (c) (d) (e)

R with respect to addition is an abelian group. Multiplication in R is associative. The distributive laws (as in a field) hold in R. A ring R is a commutative ring if the multiplication in R is commutative. A ring R is a ring with identity if R has multiplicative identity 1.

Note: A field is a commutative ring with identity in which every non-zero element has a multiplicative inverse. As we have noted earlier in Section 1.3, the set Mm×n (F) of all m × n matrices over a field F is an abelian group with respect to matrix addition (see Theorem 1.3.2). Moreover, the set Mn (F) of all square matrices of order n over a field F, with respect to matrix addition and multiplication, is a non-commutative ring with identity. The other important example of commutative rings is, as we have already noted in Section 1.3, is the set of Z of integers with respect to usual addition and multiplication. These sets of matrices are also examples of vector spaces over a field. We shall be considering vector spaces in detail in Chapter 3.

Saikia-Linear Algebra

2

book1

February 25, 2014

0:8

Systems of Linear Equations

2.1 INTRODUCTION This chapter is a leisurely but exhaustive treatment of systems of linear equations and their solutions. The chapter begins by looking at the well-known procedure of row reduction of systems of linear equations for obtaining their solutions and thereby develops the important theoretical machinery of row and column operations. We also initiate the study of linear equations by using matrix operations in this chapter. The application of matrix methods yields useful insights into other entities such as invertible matrices. This chapter also introduces the important idea of determinants and develops its properties.    x1   x   2 Note: from now onwards, we shall be writing a column vector x =  .  as the transpose  ..    xn (x1 , x2 , . . . , xn )t of the corresponding row vector as far as practicable.

2.2 GAUSSIAN ELIMINATION We begin by recalling the elementary procedure for solving two linear equations in two variables. The idea is to use one equation to eliminate one variable from the other equation; the resultant equation, having only the remaining variable, readily yields the value of this variable. Using it in any of the original equations, we obtain an equation which determines the value of the other variable. These values constitute the required solution. The same idea, with slight modification, works in the more general case of a system of, say, m equations in n variables x1 , x2 , . . . , xn over any field. In this general case, we use the x1 term in the first equation, or in any other equation, to eliminate the x1 term in all the other (m − 1) equations. The resultant system of m equations will thus have exactly one equation with x1 term. Next, we use the x2 term in any equation, which does not have the x1 term, to eliminate the x2 term in the other (m − 1) equations. After the second round of eliminations, the x1 and the x2 term will be presented only in a single equation each in the new system. We now proceed with the x3 term the same way, and so on with the other variables one by one, until we have a very simple equivalent system of equations from which the values of the variables, or the solution set of the system can obtained easily. Not only that, the shape of the final system can also tell us when the system does not have a solution.

49

Saikia-Linear Algebra

50

book1

February 25, 2014

0:8

Systems of Linear Equations

This method of solving a system of linear equations, by systematic elimination of as many variables as possible, to arrive at simpler equivalent system is known as the method of Gaussian elimination. In what follows, we refine this procedure by using matrices to turn it into a powerful tool for examining systems of linear equations. Before we proceed any further, we recall some of the terms associated with a system of linear equations over a field F. A reader not familiar with the notion of a field can assume F to be either R, the field of real numbers or C, the field of complex numbers. As seen in Chapter 1, the general system of m equations in n variables (unknowns) over a field F is usually described as a11 x1 + a12 x2 + · · · + a1n xn = b1 a21 x1 + a22 x2 + · · · + a2n xn = b2 .. .. .. .. . . . . am1 x1 + am2 x2 + · · · + amn xn = bm , where the coefficients ai j and the bi are given scalars in the field F. Such a system can also be described briefly by using the summation sign: n 1

ai j x j = bi

for i = 1, 2, . . . , m.

(2.1)

j=1

/ 0 We have also seen that if A = ai j denotes the m × n matrix of the coefficients of the system (2.1), then the system can also be described as a single matrix equation: Ax = b,

(2.2)

where x is the column vector (x1 , x2 , . . . , xn )t of variables and b = (b1 , b2 , . . . , bm )t is the column vector over F consisting of the scalars appearing on the right-hand side of the system of equations (2.1). Recall that A is called the coefficient matrix of the given system. A solution of the system is an n × 1 column vector s = (s1 , s2 , . . . , sn )T over F such that As = b becomes a true identity. In other words, when the scalars s j replace the variables x j , each of the equations of the system (2.1) becomes a true statement. The collection of all the solutions of a given system of linear equations over any field is the solution set of the system. It must be pointed out that a system of linear equations need not have a solution; in that case, the solution set is empty. Two systems of linear equations over the same field are said to be equivalent if they have the same solution set. The point of subjecting a system of equations to Gaussian elimination is to reduce the given system to a simpler equivalent system so that the solutions of the given system are obtained by considering the equivalent system. It is clear that if any system equivalent to a given system of equations fails to have a solution, then the original system too can have no solution. Definition 2.2.1. A system of linear equations over a field F is said to be consistent if it has some solution over F; it is inconsistent if it has no solution. The preceding remark can now be restated: a system of equations is inconsistent if and only if any equivalent system is inconsistent. We now present a few simple examples to show various possibilities of solutions of a system of linear equations.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Gaussian Elimination

EXAMPLE 1

51

Consider the system 2x1 − 3x2 = −3 x1 + x2 = 1 of equations over the reals. One verifies easily (for example, by eliminating x1 from the first equation to begin with) that x1 = 0 x2 = 1 is an equivalent system. Thus, we may conclude that s =

EXAMPLE 2

The system

2 3 0 is the only solution. 1

x 1 + x2 = 3 2x1 + 2x2 = 4 is equivalent to x1 + x 2 = 3 0 = −2 which is clearly inconsistent. No matter which real numbers replace x1 and x2 in the system, one of the equations can never be a true statement. Thus, the given system is also inconsistent. EXAMPLE 3

However, a system equivalent to

is

x1 − 2x2 = 3 2x1 − 4x2 = 6 x1 − 2x2 = 3

which is obtained by subtracting 2 3 the first equation from the second. It follows that 2a + 3 for any real number a, is a solution of both the systems. Thus, the solution a set of the given system of equations is infinite. In these examples, we have tacitly assumed that certain operations, which constitute the process of Gaussian elimination, on a system of linear equations produce an equivalent system. To justify our assumption, we need to study systematically the effects of Gaussian elimination on a system of linear equations. The procedure of Gaussian elimination applied to the equations of a system of linear equations involves just three types of operations with the equations. 1. Multiplication of all the terms of an equation by a non-zero scalar (for simplicity, we call the resultant a scalar multiple, or just a multiple of the equation.) 2. Interchanging two equations. 3. Replacing an equation by its sum with a multiple of another equation.

Saikia-Linear Algebra

52

book1

February 25, 2014

0:8

Systems of Linear Equations

We will illustrate these operations shortly in an example. Moreover, for later use, we will also examine how these operations change certain matrices associated with the systems of linear equations at the same time. We have already introduced the idea of a coefficient matrix of a system of linear equations. The next definition introduces another matrix known as the augmented matrix of a system of equations. Definition 2.2.2. Let A be the coefficient matrix of the following system of m linear equations in n variables: n 1 ai j x j = bi for i = 1, 2, . . . , m. (2.3) j=1

Then, the augmented matrix of the system (2.3) is the m × (n + 1) matrix given in the block form as / 0 A b,

where b is the column vector consisting of the scalars bi in the right-hand side of Equation (2.3).

In the following example, we shall write the augmented matrix of each system of equations alongside it so that the changes that occur in these matrices as we go on eliminating variables in the equations are visible. EXAMPLE 4

We consider the following system of equations with the augmented matrix alongside it:   x1 − x2 + 2x3 = 3 3 2 1 −1   3 3x1 + 2x2 − x3 = 1 2 −1 1.   0 1 4 −1 x2 + 4x3 = −1

To start the procedure of Gaussian elimination, we use the first equation to eliminate x1 from the other two equations. So, the first step will be to multiply the first equation by −3 and add it to the second equation (or multiply it by 3 and subtract it from the second) to eliminate x1 from the second equation. If the third equation had a x1 term, we would have done a similar operation to eliminate it from the equation. The new system and the corresponding augmented matrix will look like this   x1 − x2 + 2x3 = 3 3 2 1 −1   0 5x2 − 7x3 = −8 5 −7 −8.   0 1 4 −1 x2 + 4x3 = −1

Now, we have two equations without the x1 term, and theoretically we can use either one to eliminate the x2 term. However, as the third one has 1 as the coefficient of x2 , we choose to work with it. For reasons which will be clear later, we interchange the two equations so as to have the x2 term with coefficient 1 in the second row; note that in the new augmented matrix, the first non-zero term in the first two rows is 1:   x1 − x2 − 2x3 = 3 3 2 1 −1   0 1 4 −1. x2 + 4x3 = −1   0 5 −7 −8 − 5x2 − 7x3 = −8

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Gaussian Elimination

53

Now that we have only x2 in the second equation, we add suitable multiples of the second row to the first as well as the third row to get rid of the x2 terms therein. This will result in   x1 + 2x3 = 3 2 6 1 0  0 1 x2 + 4x3 = −1 4 −1.   0 0 −27 −3 − 27x3 = −3

There is only x3 term present in the third equation, and multiplying it by −1/27 makes its coefficient 1. Now, suitable multiples of this new third equation can be used to eliminate the x3 terms from the other two equations. Carrying out the necessary computations, we finally arrive at   x1 = 12/9 12/9 1 0 0  0 1 0 −13/9. x2 = −13/9   0 0 1 1/9 x3 = 1/9

We conclude that the given system has a unique solution, namely, (12/9, −13/9, 1/9)t .

We note the following points about the preceding procedure: 1. There is no single way of carrying out the elimination process. For example, while carrying the process, the second step could have been that of multiplying the second equation by 1/5 instead of interchanging the second and the third rows. 2. Whatever steps one may take to eliminate the variables, one tries to manipulate the equations in such a way that the first surviving variable (with coefficient 1) in each equation appears from the left to the right as we come down the equations of the system. 3. Finally, about the augmented matrices. Consider the augmented matrices of the systems of equations we obtained in each stage. It is clear that we could have obtained these matrices by performing on the rows of the original matrix the same operations as the ones that were performed on the equations. For example, the second of these matrices could have been obtained simply by adding −3 times the first row to the second row of the first augmented matrix. Similarly, the third of the augmented matrices could have been obtained by simply interchanging the second and the third rows of the second augmented matrix. Thus, we see that solving a system of equations can also be accomplished by performing a series of row operations (which we will describe a little later) on the augmented matrix to bring the matrix to the simplest possible form and then writing down the equations corresponding to the new matrix. Then, one can read off the solutions from the new simpler system of equations. Before we try the alternative method of tackling systems of equations, we need to explain the two terms, namely ‘row operations’ and ‘simplest possible form of matrices’. We discuss these two ideas in the next section. EXERCISES 1. Determine whether the following statements are true or false giving brief justifications. Any given matrix is over an arbitrary field. (a) Any system of linear equations, given by the matrix equation Ax = 0, is consistent. (b) Any system of linear equations, given by the matrix equation Ax = b for a fixed but arbitrary column vector b, is consistent.

Saikia-Linear Algebra

54

book1

February 25, 2014

0:8

Systems of Linear Equations

(c) Two systems of linear equations described by matrix equations, Ax = b and Bx = c, are equivalent, if Bs = c for any solution s of Ax = b. (d) Two equivalent systems of linear equations must have the same number of equations. (e) The augmented matrix of a system of linear equations can never be a square matrix. (f) A system of linear equations consisting of a single linear equation involving n variables is always consistent if n > 1. (g) If s1 and s2 are solutions of the system of linear equations Ax = 0, so is s1 + s2 . (h) If s1 and s2 are solutions of the system of linear equations Ax = b for some non-zero column vector b, then so are s1 + s2 . (i) For a square matrix A, any system of linear equations Ax = b has a unique solution if A is invertible. 2. Show that the the following equation over R has infinitely many solutions; further show that every solution can be expressed as a linear combination, that is, as a sum of scalar multiples of three fixed 3 × 1 column vectors over R: x1 − x2 − 2x3 = 1. 3. Solve the following system of equations over R by the method of Gaussian elimination: (a)

x1 + x2 = 3 2x1 − x2 = 1,

(b)

x1 − 2x2 − x3 = 2 2x1 + x2 − 2x3 = 9,

(c)

3x1 + 6x2 + x3 = 16 2x1 + 4x2 + 3x3 = 13 x1 + 3x2 + 2x3 = 9.

4. Is the following system of equations over the field C of complex numbers consistent? x1 + ix2 = 1 + i ix1 − x2 = 1 − i. 5. Find the values of a for which the following system of equations over R is consistent. Determine the solutions for each such a. x1 + 3x2 − x3 = 1 2x1 + 7x2 + ax3 = 3 x1 + ax2 − 7x3 = 0. 6. Find the values of the real scalars a, b and c such the the following system of equations over the

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Elementary Row Operations

55

real numbers has a unique solution, no solution and infinitely many solutions. 2x1 + 2x2 + 2x3 = a x1 + 2x2 − 2x3 = b x1 + x2 + x3 = c.

2.3 ELEMENTARY ROW OPERATIONS Though the term ‘row operation’ has come up in the context of augmented matrix, it should be clear that the following definition is valid for any matrix. In fact, we will see many applications of row operations in areas other than that of systems of equations. Definition 2.3.1. Let A be an m × n matrix over a field F, whose rows are considered n-dimensional row vectors. An elementary row operation on A is any one of the following three types of operations that can be performed on the rows of A. 1. Row scaling: multiplying any row of A by a non-zero scalar; 2. Row exchange: interchanging any two rows of A; 3. Row replacement: adding a scalar multiple of a row of A to another row. These row operations are considered as scalar multiplication and addition of vectors in Fn . Note that row operations do not change the size of a matrix. If an m × n matrix A is changed to a matrix B by a sequence of row operations (or even by a single row operation), we say that B is row equivalent to A. So, row equivalence is a relation in Mm×n (F), which we will show, a little later, to be an equivalence relation. We sometimes use the symbol ∼ to describe row equivalence; if B is row equivalent to A, then we write B ∼ A. We can also define elementary column operations of three types on a matrix in a similar manner. It suffices to say that all one has to do is to replace the word ‘row’ by ‘column’ in Definition (2.3.1). It is also clear as is what is meant by saying that B is column equivalent to A. Elementary Matrices The study of row equivalence (respectively, column equivalence) is made easier by the fact that the effect of any elementary row operation (respectively, column operation) on a matrix can also be realized by left multiplying (respectively, right multiplying) by certain matrices known as elementary matrices. As the following definition shows, an elementary matrix corresponding to an elementary row operation can be obtained by applying precisely the same row operation to an identity matrix of suitable size; thus, there are three types of elementary matrices. Note that any elementary matrix has to a square one. Definition 2.3.2. An elementary matrix of order m over a field F is a matrix obtained by applying an elementary row operation to the identity matrix Im ∈ Mm (F).

Saikia-Linear Algebra

56

book1

February 25, 2014

0:8

Systems of Linear Equations

Thus, corresponding to the three types of elementary row operations given in Definition (2.3.1), there are three types of elementary matrices: (a) An elementary matrix of order m, corresponding to a row scaling, is obtained by multiplying a row of Im by a non-zero scalar a and therefore is of the form:  1           

1

       .     1

0

. a . .

0

1

(b) An elementary matrix of order m, corresponding to a row exchange, is obtained by interchanging two rows of Im and therefore is of the form:  1             

0

1 . 0 .. .

···

1 .. .

1

···

0 .

0

1

        .       1

(c) An elementary matrix of order m, corresponding to a row replacement, is obtained by adding to a row of Im a scalar multiple of another of its rows and therefore is of the form  1       

1

a

. . .

0

1 1

     .   

It can be easily verified that each of these elementary matrices can be obtained from the identity matrix by either an elementary row operation or the corresponding column operation. For example, to get an elementary matrix of the row replacement type, either we add to the ith row a multiple of the jth row of the identity matrix, or we add to the jth column the same multiple of the ith column. Interchanging the ith and the jth rows in the identity matrix has the same effect as interchanging the corresponding columns. Finally, multiplying the ith row of the identity matrix by a non-zero scalar a is the same as doing the same to the ith column.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Elementary Row Operations

57

It is easy to write down the elementary matrices over any field F in practice. For example, the following is a complete list of elementary matrices of order 2: ' ( ' ( a 0 1 0 Row scaling: or 0 1 0 a Row interchange: Row replacement:

' 0 1

' 1 0

1 0 a 1

(

(

or

' 1 a

0 1

(

Here, a stands for an arbitrary non-zero scalar form F. To take another example, let us list the possible elementary matrices of order 3 of row interchange type. Since the corresponding row operations involve the interchange of two rows, the required matrices can be obtained by simply interchanging two rows of the 3 × 3 identity matrix I3 . Thus, they are       0 1 0 0 0 1 1 0 0       1 0 0, 0 1 0, 0 0 1. 0 0 1 1 0 0 0 1 0 As we have noted earlier, these can also be obtained by interchanging the columns of the identity matrix I3 . Then we examine what happens when we left-multiply an arbitrary 2 × 3 matrix over a field by the elementary matrices of order 2 over the same field in the following example: ' (' ( ' ( 1 a a11 a12 a13 a + aa21 a12 + aa22 a13 + aa23 = 11 ; 0 1 a21 a22 a23 a21 a22 a23 ' 1 a

0 1

(' a11 a21

a12 a22

' 0 1 ' a 0

1 0 0 1

( ' a13 a11 = a23 a21 + aa11

(' a11 a21

(' a11 a21

a12 a22 a12 a22

a12 a22 + aa12

( a13 ; a23 + aa13

( ' a13 a = 21 a23 a11

a22 a12

( a23 ; a13

( ' a13 aa11 = a23 a21

aa12 a22

( aa13 . a23

These calculations show that the effect of applying an elementary row operation to a 2 × 3 (or for that matter, to any matrix having two rows with any number of columns) matrix is the same as that of left-multiplying the matrix by the corresponding elementary matrix of order 2. Stated differently, left-multiplication of a 2 × n matrix by an elementary matrix of order 2 results in the corresponding row operation in that matrix. Interestingly, the column operations can similarly be carried out with the help of elementary matrices. However, we need to right-multiply by elementary matrices to affect the column operations. Leaving the verification in some small cases to the reader (see Exercises 9 and 10), we straightaway record the general result.

Saikia-Linear Algebra

58

book1

February 25, 2014

0:8

Systems of Linear Equations

Proposition 2.3.3. Let A ∈ Mm×n (F). Suppose that e(A) and e' (A) are the matrices obtained from A by some elementary row operation e and column operation e' , respectively. Let E and E ' be the corresponding elementary matrices of order m and n, respectively, over F. Then, e(A) = EA and e' (A) = AE ' . Before beginning the proof, let us note that the sizes of the elementary matrices in the proposition depend on whether we have to left-or right-multiply A by them. Also, we shall deal with row operations e only; the proof with respect to column operations e' can be carried out in a similar manner. Proof. This is one of those results where several cases have to be verified one by one. We choose to verify only one case, leaving the rest to the reader. Consider the row operation e which interchanges the ith and the jth rows of the m × n matrix A. Here, i and j are fixed but arbitrary integers between 1 and m. We need to show that, for any k with 1 ≤ k ≤ m, the kth rows of e(A) and EA are the same. We first assume that k ! i, j. As A and e(A) differ only in the ith and the jth rows, for our choice of k, the kth row of A is the same as the kth row of e(A). 4 On the other hand, a typical entry of the kth row, say, the (k, l)th entry, of EA is given by nr=1 ckr arl if E = [ckl ] and A = [akl ]. Since E is obtained from Im by interchanging its ith and jth rows, it follows that ckk = 1 and ckr = 0 if k ! r so the (k, l)th entry of EA is just akl . Thus, the kth rows of A and EA, and hence the kth rows of e(A) and EA coincide. Next, we assume that k = i or k = j. Suppose that k = i (the case of k = j can be settled the same way). Now, the ith row of e(A) is the jth row of A. On the other hand, as E = [ckl ] is obtained by interchanging the ith and the jth rows of Im , ci j = 1 and cir = 0 if r ! j. Therefore, a typical entry of the 4 ith row EA, say, the (i, l)th one, is nr=1 cir arl = a jl . Thus, the ith row of EA is precisely the jth row of A which, as we have already seen, is the ith row of e(A). This completes the proof of the case we had chosen. ! Since row or column equivalent matrices are obtained by sequences of elementary row operations or column operations, the preceding proposition implies the following. Proposition 2.3.4. Let A ∈ Mm×n (F), and let B ∈ Mm×n (F) be row equivalent to A. Then, there are elementary matrices E1 , E2 , . . . , Er of order m over F, such that B = Er . . . E2 E1 A. Similarly, if C is column equivalent to A in Mm×n (F), then there are elementary matrices F1 , F2 , . . . , F s of order n over F, such that C = AF1 F2 . . . F s . To continue our discussion on row and column operations, we now note that every elementary row or column operation has a reverse operation. To be precise, if A is changed to e(A) or e' (A), then there is the reverse operation e1 or e'1 so that e1 (e(A)) = A and e'1 (e' (A)) = A. For example, if e(A) is obtained from A by interchanging the ith and the jth rows, then interchanging the same rows of e(A), we get back A from e(A). Similarly, corresponding to the operation of adding a times the jth row to the ith row of A, the operation of adding of (−a) times the jth row of e(A) to its ith row will be the reverse operation on e(A). These remarks form the basis of the proof of the following result about elementary matrices.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Elementary Row Operations

59

Proposition 2.3.5. Any elementary matrix over a field F is invertible. Moreover, the inverse of an elementary matrix over F is again an elementary matrix of the same order over F. Before formally proving this proposition, we consider some examples. To find, for example, the inverse of the elementary matrix ' ( 0 1 , 1 0 we note that the corresponding elementary row operation interchanges the two rows of a 2 × n matrix. Since to undo this change, we have to interchange the two rows of the resultant matrix, it follows that the reverse of the row operation must be the same. Thus, the inverse of the given elementary matrix is ' (−1 ' ( 0 1 0 1 = . 1 0 1 0 Similarly, as the reverse of the process of adding 3 times the 3rd row to the 2nd row of a matrix is to add (−3) times the 3rd row to the 2nd row, we see that  1  0 0

0 1 0

−1  1 0   3 = 0   1 0

0 1 0

 0  −3.  1

With these examples in mind, we now proceed with the proof.

Proof. Consider an elementary matrix E of order m over a field F. If E is obtained by multiplying the ith row of the identity matrix Im by a non-zero scalar d, then it is clear that the elementary matrix obtained by multiplying the same row of Im by d−1 is the inverse of E. (Just multiply the two elementary matrices which are diagonal matrices.) In case E is obtained by adding d times the ith row of the identity matrix Im to its jth row, then E = Im + dei j where ei j is the unit matrix of order m over F having 1 as the (i, j)th entry and zeros elsewhere. Now, using the formula for multiplication of unit matrices given in Proposition (1.3.7), we see that, as i ! j, (Im + dei j )(Im − dei j ) = Im + dei j − dei j + d2 ei j ei j = Im .

Therefore, the elementary matrix Im − dei j is the inverse of E. Finally, observe that if E is obtained by interchanging the ith and the jth row of Im , then E = Im − eii − e j j + ei j + e ji . We leave it to the reader (see Exercise 8 of this section) to verify, by using again the formula for multiplication of unit matrices, that E 2 = Im . Thus, E is its own inverse. This completes the proof. ! We now record the implications of this result for row equivalent matrices. Proposition 2.3.6. Let A, B ∈ Mm×n (F) such that B is row equivalent to A. Then, there is an invertible matrix E of order m over F such that B = EA.

Saikia-Linear Algebra

60

book1

February 25, 2014

0:8

Systems of Linear Equations

Proof. If B is row equivalent to A, then by Proposition (2.3.3) there are elementary matrices E1 , E2 , . . . , Er , each of order m, such that B = Er Er−1 . . . E1 A. By the preceding proposition, each Ek is invertible. However, a product of invertible matrices is invertible (see Proposition 1.4.3) and Hence E = Er Er−1 . . . E1 is invertible. ! The following corollary is now clear. Corollary 2.3.7. to B.

Let A, B ∈ Mm×n (F) such that B is row equivalent to A. Then A is row equivalent

With these results, it is easy to prove that row equivalence is an equivalence relation in Mm×n (F); we leave the proof to the reader. (See Exercise 16 of this section.) It should be clear that we have analogous results for column equivalent matrices in Mm×n (F). The following observation will be helpful later. Proposition 2.3.8. Let A, B ∈ Mm×n (F) be row equivalent. Let A' be the matrix obtained from A by removing certain columns, and let B' be the matrix obtained from B by removing the corresponding columns. Then, A' and B' are row equivalent. Since elementary row operations affect rows of a matrix column wise, the proposition is clear. Permutation Matrices Let us end this section with a brief discussion of certain class of matrices known as permutation matrices. A permutation matrix of order n (over a field F) is any matrix that can be produced by rearranging, or permuting the rows of the identity matrix In of order n (over F). In other words, a matrix in Mn (F) is a permutation matrix if and only if each row and each column has a single non-zero entry equal to 1. So any elementary matrix of order n, which is obtained by a single exchange of two rows of In , is clearly a permutation matrix; we shall refer to such matrix as an elementary permutation matrix. While, in general, a permutation matrix need not be a symmetric matrix, it is clear that an elementary permutation matrix is symmetric. A more precise description of a permutation matrix depends on a mathematical interpretation of the idea of a permutation. A permutation of n symbols can be thought of as an one-to-one map σ from the set Xn = {1, 2, . . . , n} onto itself; if σ(i) = ki , for 1 ≤ i ≤ n, then one may write σ = {k1 , k2 , . . . , kn }. The corresponding permutation of the rows of In produces a permutation matrix P; it is clear that the ith row of P is the ki th row of In . If we denote the n-dimensional row vector which is the ith row of In as ei (so ei has 1 at the ith place and zeros elsewhere), we can also describe P as the matrix of order n, whose ith row is eki . For example, corresponding to the the permutation {2, 3, 1} of X3 = {1, 2, 3}, we have the permutation matrix     e2   0 1 0      P = e3  =  0 0 1 .     e1 1 0 0

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Elementary Row Operations

61

The identity matrix In can be considered a permutation matrix of order n. It corresponds to the identity permutation, that is, the identity map on Xn . With this convention, it is an easy exercise in counting to show that there are n! permutation matrices of order n over any field. The simplest permutation matrix, other than the identity matrix, is the one that can be obtained from the identity matrix by interchanging two rows in it. It corresponds to a permutation σ of Xn , which interchanges two symbols, say i and j, in Xn while fixing all the other symbols in Xn ; any such permutation is known as a transposition and is usually denoted by the symbol (i, j). So while the corresponding permutation matrix P has e j at the ith and ei at the jth row, all the other rows of P are the same as the corresponding rows of In . As we have noted earlier, this particular permutation matrix, which sometimes will be denoted by Pi j , is precisely the elementary matrix (of order n) corresponding to the elementary row operation of interchanging the ith and the jth rows. Note: Pi j = P ji . such a matrix, which we have called an elementary permutation matrix, is symmetric. Thus, any permutation matrix in Mn (F), which is generated by a transposition, is symmetric. Before proceeding any further, we consider an example. As we have already seen while discussing elementary row operations, the elementary permutation matrices of order 3 are the following:        0 1 0   0 0 1   1 0 0        P12 =  1 0 0 , P13 =  0 1 0 , P23 =  0 0 1 .       0 0 1 1 0 0 0 1 0 Apart from these three and the identity matrix, there are two more permutation matrices of order 3:      0 1 0   0 0 1      P =  0 0 1 , Q =  1 0 0 .     1 0 0 0 1 0

It is clear that P can be obtained from I3 by first interchanging 1st and the 2nd rows, and then interchanging 2nd and 3rd rows in the resultant matrix. The elementary matrices corresponding to these row interchanges are P12 and P23 . Now, it turns out that     1 0 0   0 1 0      P23 P12 =  0 0 1   1 0 0  = P.    0 1 0 0 0 1 On the other hand, if we first interchange the 2nd and the 3rd row of I3 and then interchange the 1st and the 2nd rows of the resultant matrix, we end up with the other permutation matrix Q. We leave it to the reader to verify that P12 P23 = Q. So, at least for permutation matrices of order 3 (over any field), we have verified that any such matrix is a product of elementary permutation matrices; this product may include only one factor or even no factors in order to cover the cases of elementary permutation matrices and of the identity matrix, respectively. In general, consider elementary permutation matrices P1 , P2 , . . . , Pk of order n over any field, and set P = P1 P2 . . . Pk , the product of these given matrices. Now, each Pi corresponds to an elementary row operation, namely a single interchange of rows. Hence, the product P = PIn , by Proposition (2.3.4), can be obtained by applying the sequence of row interchanges, corresponding to the matrices Pi , to

Saikia-Linear Algebra

62

book1

February 25, 2014

0:8

Systems of Linear Equations

In (note that this means P2 P1 In can be obtained by performing the row interchange given by P2 in P1 In ). It follows that the rows of P have to be the rows of In permuted in some order determined by the sequence of row interchanges and so P is a permutation matrix of order n. Thus, we have shown that the product of a finite number of elementary permutation matrices is a permutation matrix. Conversely, any rearrangement of the rows of In can be produced by subjecting In to a sequence of suitable row interchanges. For example, as we have seen a short while ago, the rearrangement of the rows of I3 , given by the permutation {2, 3, 1} of X3 = {1, 2, 3}, can also be produced by applying the row interchanges corresponding to transpositions (12) and (23), in that order, to I3 . The general assertion is a restatement of a basic fact in the theory of permutations: any permutation on n symbols is a product of transpositions. (Since proving this needs a somewhat lengthy digression, we omit the proof which can be found in any basic text book dealing with elementary group theory). Now, applying a sequence of row interchanges to In , by Proposition (2.3.4), is equivalent to left-multiplying In by the corresponding elementary matrices. But any such elementary matrix is a permutation matrix corresponding to some transposition. It follows that if a permutation matrix P of order n, is obtained from In by subjecting In to a sequence of row interchanges corresponding to elementary permutation matrices, say, P1 , P2 , . . . , Pk , then P = Pk . . . P2 P1 In , showing that P = Pk . . . P2 P1 . Thus, we have proved the first assertion of the following proposition. Proposition 2.3.9. Let P be a permutation matrix of order n (over any field). (a) P is a product of elementary matrices which are themselves permutation matrices of order n. (b) If Q is another permutation matrix of order n, then PQ is a permutation matrix of order n. (c) P is invertible and P−1 = Pt . Proof. Coming to the proof of the second assertion, we note that, by the first assertion, PQ is a product of elementary permutation matrices. Now, as we have shown earlier, any product of elementary permutation matrices is a permutation matrix. It follows that PQ is a permutation matrix. To prove the final assertion, we express P as a product of elementary permutation matrices, say P = Pk Pk−1 . . . P2 P1 . Recall that any elementary matrix is invertible; in fact, the inverse of any elementary permutation matrix is itself (see Proposition 2.3.5). It follows that P is invertible and −1 −1 P−1 = P−1 1 P2 . . . Pk = P1 P2 . . . Pk .

Finally, observe that any elementary permutation matrix is symmetric, that is, Pi = Pti . Therefore, by using the formula for the transpose of a product of matrices, one obtains the following: P−1 = Pt1 Pt2 . . . Ptk

= (Pk Pk−1 . . . P1 )t = Pt

as required.

!

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Elementary Row Operations

63

EXERCISES 1. Determine whether the following statements are true or false giving brief justifications. All given matrices are over an arbitrary field. (a) Two row equivalent matrices have to be of the same size. (b) Two row equivalent matrices are column equivalent. (c) For any two elementary matrices E1 and E2 of order n, E1 E2 = E2 E1 . (d) Every elementary matrix is invertible. (e) The transpose of an elementary matrix is an elementary matrix. (f) The product of two elementary matrices of the same order is an elementary matrix. (g) The sum of two elementary matrices of the same order is an elementary matrix. (h) Every non-zero square matrix of order n is row equivalent to In . (i) The identity matrix In is an elementary matrix. (j) The set of all permutation matrices of order n is a group with respect to matrix multiplication. 2. Determine whether the following matrices are elementary matrices; for each of the elementary matrices, find a single elementary row operation that produces it from a suitable identity matrix. ' ( ' ( 1 1 1 0 , , 1 1 0 0  0  1 0

1 0 1  1 0  0  0

 0  0,  1 0 0 1 0

 0  0 1 0 1 0 0

0 1 0  0  0 , 0 1

 1  0,  0

 1 0  0  0

  1   0 −3 0 1 0 0

4 0 0 0

0 1 0  0  0 . 0 1

 0  0,  1

3. For each of the elementary matrices of Exercise 2, determine its inverse. 4. Let      3 −1 2 1 0 −1     1 1 A = −2 −1 −1, B = 2     1 0 −1 3 −1 2 and

 1  C = 1  4

0 1 −1

 −1  0  3

be matrices over R. Find elementary row operations that changes A to B, and then elementary column operations that changes B to C. Hence, determine invertible matrices E and F in M3 (R)

Saikia-Linear Algebra

64

book1

February 25, 2014

0:8

Systems of Linear Equations

such that EA = B and BF = C. 5. Let

and

 1  A = 2  1

0 1 −2

 −1  2,  1

 1  C = 1  1

6. 7. 8.

9.

 1  B = 1  1 0 3 −2

0 3 −2

 −1  1  1

 0  2  2

be matrices over R. Find an elementary row operation that changes A to B and an elementary column operation that changes B to C. Finally, show that a sequence of elementary row operations will transform C to I3 . Find invertible matrices E and F in M3 (R) such that EAF = I3 where A is the matrix of Exercise 5. Complete the proof of Proposition (2.3.3). Let i ! j and m be positive integers such that 1 ≤ i, j ≤ m. Express the following products of unit matrices of order m over a field F as single matrices: eii 2 , e j j 2 , ei j e ji , e ji ei j , ei j ei j and e ji e ji . Hence, show that if' E = I(m − eii − e j j + ei j + e ji , then E 2 = Im . 1 2 Given the matrix A = over R, and matrices 3 4 '

1 E1 = −1

( ' 0 0 , E2 = 1 1

( ' 1 −2 and E3 = 0 0

( 0 , 1

compute AE1 , AE2 and AE3 . Verify that there are elementary column operations which transform A to ' AE1 , AE2 and (AE3 . a12 a13 a be an arbitrary 2 × 3 matrix over a field F. Verify that: 10. Let A = 11 a21 a22 a23 (i) the interchange of the 2nd and the 3rd column of A can be produced by right-multiplying it by   1 0 0 0 0 1,   0 1 0

(ii) the addition of 2 times the 1st column of A to the second column can be produced by rightmultiplying it by   1 2 0 0 1 0.   0 0 1

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Row Reduction

65

11. Consider the matrix  a b A =   c d

0 a b c

0 0 a b

 0  0 , 0 a

where the non-zero scalars a, b, c and d are from a field F. Let E be the elementary matrix obtained by adding −b times row 2 of I4 to its row 4. Find the matrix EA by subjecting A to suitable row operation. 12. Let A ∈ Mn (F). Show that if a finite sequence of elementary row operations reduces A to the identity matrix In , then A is invertible. In fact, the converse is also true. But we have to wait till Section 2.5 to see it. 13. Let      1 3 −1 1     2 and b = 1 A =  0 1     −1 0 8 3

be real matrices. Perform elementary row operations to the 3 ×4 block matrix [A | b] to transform it to the form [I3 | b' ]. Hence, deduce the solutions of the following system of equations: x1 + 3x2 − x3 = 1 x2 + 2x3 = 1 −x1 + 8x3 = 3.

14. Express the matrix  0 1 A =  0 0

1 0 0 0

0 0 0 1

 0  0  1 0

as a product of elementary matrices of order 4. Find the inverse of A. 15. Prove Corollary (2.3.7). 16. Show that row equivalence is an equivalence relation in Mm×n (F) for any field F. 17. State and prove results analogous to Proposition (2.3.6) and Corollary (2.3.7) about column equivalent matrices in Mm×n (F). 18. Prove that there are n! permutation matrices of order n over any field F. 19. Let P be a permutation matrix and D a diagonal matrix in Mn (F). Prove that PDP−1 is diagonal.

2.4 ROW REDUCTION We now come back to the systems of linear equations and their analysis by means of elementary row operations. We have seen in Section 2.2 that by applying suitable elementary row operations to the augmented matrix of a system of linear equations, we can arrive at a simpler system from which the solutions can be read off easily. It was tacitly assumed that any solution of this row-equivalent system is also a solution of the original system. Recall that two systems of linear equations are equivalent if they have the same solution set. In other words, we assumed that row-equivalent systems of linear

Saikia-Linear Algebra

66

book1

February 25, 2014

0:8

Systems of Linear Equations

We begin this section by proving this assumption. Refer to Section 2.2, especially the notations and terminologies introduced beginning with Equations (2.1) and (2.2), and Definition (2.2.2).

Proposition 2.4.1. Consider the system of linear equations over a field F given by Ax = b, where $A \in M_{m\times n}(F)$ is the coefficient matrix and b is an m × 1 column vector of scalars from F. Suppose that a series of elementary row operations on the augmented matrix [A | b] of this system reduces it to $[A' \mid b']$. Then the solutions of Ax = b and A′x = b′ are the same.

Proof. Let C = [A | b] and C′ = [A′ | b′] be the augmented matrices of the two systems. Since C′ is obtained from C by a series of row operations, it follows that
$$C' = E_r E_{r-1} \cdots E_2 E_1 C = EC,$$
where $E = E_r E_{r-1} \cdots E_2 E_1$ is the product of the elementary matrices $E_1, E_2, \ldots, E_r$ of order m corresponding to the successive elementary row operations applied to C. As was seen in the last section, the matrix E, being a product of elementary matrices, is invertible (see Proposition 2.3.5). Also, note that by virtue of the properties of block multiplication, or by a straightforward calculation with the entries of E and C, we have EC = [EA | Eb], so that
$$[EA \mid Eb] = C' = [A' \mid b'].$$
Now, suppose that an n × 1 column vector u over F is a solution of the system Ax = b, that is, the matrix equation Au = b holds. It follows that
$$A'u = (EA)u = E(Au) = Eb = b',$$
which shows that u is also a solution of the system A′x = b′. Since E is invertible, a similar argument shows that any solution of A′x = b′ is a solution of Ax = b. □

Observe that this proposition also implies that Ax = b is inconsistent, that is, has no solution, if and only if A′x = b′ is inconsistent.

Row Echelon Matrices

We next seek a standard form to which the augmented matrix of a system of equations can be row reduced so that the corresponding equivalent system of equations is simple enough. The row echelon form of the augmented matrix is one such form. Such forms give rise to simpler systems, as row echelon matrices have rows in a step-like fashion (echelon) and, in general, look something like
$$\begin{bmatrix} 1 & * & * & * & * & * & \cdots \\ 0 & 1 & * & * & * & * & \cdots \\ 0 & 0 & 0 & 1 & * & * & \cdots \\ 0 & 0 & 0 & 0 & 0 & 1 & \cdots \end{bmatrix}.$$


We introduce some terminology before we give the formal definition. By a non-zero row (respectively, column) of a matrix we mean a row (respectively, column) in which at least one entry is non-zero; by a zero row (respectively, column) we mean a row (respectively, column) having all zero entries. A leading entry of a row means the first non-zero entry from the left of a non-zero row, that is, the left-most non-zero entry of the row.

Definition 2.4.2. A matrix in $M_{m\times n}(F)$ is said to be a row echelon matrix if the following conditions are satisfied:
(a) All non-zero rows are above any zero row.
(b) Each leading entry in a non-zero row is in a column to the right of the leading entry in any row above it.
(c) All the entries in the column below a leading entry are zero.
A row echelon matrix is, further, said to be a reduced row echelon matrix provided it satisfies two more conditions:
(d) The leading entry in each non-zero row is 1.
(e) The leading entry is the only non-zero entry in the column containing the leading entry.

The leading entry of a non-zero row of a matrix in an echelon or a reduced echelon form is called a pivot, the location of a pivot a pivot position, and a column containing a pivot a pivot column. Quite often, a column of a matrix is also referred to as a pivot column if the column corresponds to a pivot column of any of its row echelon forms.

The difference between a row echelon matrix and a reduced row echelon matrix is that in a reduced row echelon matrix the leading entry 1 is the only non-zero entry in its column, whereas there may be non-zero entries above a leading entry in a row echelon matrix. Moreover, the leading entry in a row echelon matrix may be any non-zero scalar, and not necessarily 1 as in the case of a reduced row echelon matrix. Thus,
$$\begin{bmatrix} 2 & 3 & 0 & 4 & -1 \\ 0 & 1 & 0 & 4 & 6 \\ 0 & 0 & 5 & -1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 3 & 2 & 0 & 0 & -1 & 0 \\ 0 & 6 & 2 & 0 & 1 & -1 \\ 0 & 0 & 0 & 4 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
are examples of row echelon matrices, whereas
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 & 0 & 3 & 0 & 4 \\ 0 & 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
are examples of reduced row echelon matrices.

One of the most crucial facts about matrices is that any matrix over a field can be row reduced to a row echelon, or even to a reduced row echelon matrix, by appropriate elementary row operations. This procedure for obtaining a row echelon or a reduced row echelon matrix row equivalent to a given matrix is known as the row reduction of that matrix, and such row reduced matrices are known as row echelon forms or the reduced row echelon form of the given matrix. It is clear that any echelon form of a zero matrix is the zero matrix itself. We now give the algorithm for row reduction of any non-zero matrix.


Algorithm for Row Reduction 2.4.1

Consider any rectangular non-zero matrix over a field.

Step 1: Pick the first non-zero column, that is, the left-most non-zero column. If a non-zero entry occurs in the first row of this column, go to Step 2; otherwise bring a non-zero entry of this column to the top row by interchanging the first row and the row containing that non-zero entry.

Step 2: Add or subtract suitable multiples of the first row (of the new matrix, if some interchange has taken place in Step 1) to the other rows to make all the entries of the first non-zero column, except the first one, zero.

Step 3: Cross out the first row and the first non-zero column (which now has a single non-zero entry at the first place), as well as all the zero columns, if any, to the left of the next non-zero column in the matrix obtained after the previous steps. Repeat Steps 1 and 2 on the new submatrix thus obtained. Note that the first column of this new matrix is a non-zero column.

It is clear that continuing this process will reduce the original matrix to a row echelon matrix. Note that the first execution of Steps 1 and 2 produces the first pivot column of the matrix, and applying the same steps to the subsequent submatrices yields the other pivot columns one by one. To produce the reduced row echelon form of the matrix, we need to carry out the next steps.

Step 4: Multiply each row containing a pivot by a suitable scalar to make each pivot equal to 1. Note that, by the repeated applications of Step 2, all the entries below each pivot have already been made zero. The next step ensures that all the entries above each pivot are made zero as well.

Step 5: Beginning with the right-most pivot, make all the entries above each pivot zero by adding or subtracting suitable multiples of the row containing the pivot to the rows above it.

Since any non-zero entry of a matrix over a field has a multiplicative inverse, the preceding algorithm applied to a non-zero matrix will produce its row echelon form as well as its reduced row echelon form. Two important remarks about row echelon and reduced row echelon forms of a matrix are in order:

• Row echelon forms of a matrix are not unique. Different sequences of row operations on a given matrix may produce different echelon matrices. In other words, a given matrix may be row reduced to a number of different row echelon matrices. For example, both
$$\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \quad\text{are row echelon forms of}\quad \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix}.$$

• Fortunately, the reduced row echelon form of a given matrix is unique. That is, the reduced row echelon form of a given matrix is independent of the sequence of row operations employed. Taking the matrix of the preceding remark, we see that $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is its only reduced row echelon form.

We will give a proof of the uniqueness of the reduced row echelon form of a matrix shortly.
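Steps 1 to 5 translate directly into code. The following Python sketch is an illustrative implementation of the algorithm and is not taken from the text; the function and variable names are my own, and exact arithmetic is done with the standard `fractions` module so that the result is the true reduced row echelon form rather than a floating-point approximation.

```python
from fractions import Fraction

def rref(A):
    """Row reduce a matrix (a list of lists of numbers) to its reduced row
    echelon form, following Steps 1-5: locate a pivot, swap it up, clear the
    entries below it, then scale each pivot to 1 and clear the entries above."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    pivot_cols = []
    r = 0                                   # next row in which to place a pivot
    for c in range(n):                      # scan the columns from left to right
        # Step 1: find a non-zero entry in column c at or below row r
        pivot_row = next((i for i in range(r, m) if M[i][c] != 0), None)
        if pivot_row is None:
            continue                        # nothing to do in this column
        M[r], M[pivot_row] = M[pivot_row], M[r]   # bring the pivot to row r
        # Step 2: eliminate the entries below the pivot
        for i in range(r + 1, m):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
        if r == m:
            break
    # Steps 4 and 5: scale each pivot to 1 and clear the entries above it
    for r, c in reversed(list(enumerate(pivot_cols))):
        pivot = M[r][c]
        M[r] = [a / pivot for a in M[r]]
        for i in range(r):
            factor = M[i][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
    return M, pivot_cols

# The 2 x 2 matrix of the first remark above.
R, pivots = rref([[1, 2], [1, 4]])
print(pivots)                      # [0, 1]: both columns are pivot columns
print(R == [[1, 0], [0, 1]])       # True: the unique reduced form is the identity
```

Running the sketch on the matrix of the remarks reproduces the unique reduced row echelon form, while the echelon matrix reached before Steps 4 and 5 depends on the choices made along the way.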


Solutions of Systems of Linear Equations

Our first task is to see how a row echelon or the reduced row echelon form, say $[A' \mid b']$, of the augmented matrix [A | b] of a system of linear equations Ax = b helps us to find the solutions of the system. We look at some examples to begin with. For example, assume that the reduced row echelon form of an augmented matrix looks like
$$[A' \mid b'] = \left[\begin{array}{cccc|c} 1 & -1 & 0 & 3 & 2 \\ 0 & 0 & 1 & 5 & 4 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right].$$

The corresponding system of equations (which is equivalent to some given system) is clearly the following one:
$$x_1 - x_2 + 3x_4 = 2,\qquad x_3 + 5x_4 = 4,\qquad 0 = 1.$$
Because the last equation is an absurdity, this system, and therefore the original system it is equivalent to, has no solution. Thus, we see that if the last column of the reduced matrix is a pivot column, then the original system of equations is inconsistent. Note that the last column is the reduced column vector b′ of the system Ax = b.

Next, consider the reduced echelon matrix

$$[A' \mid b'] = \left[\begin{array}{ccccc|c} 1 & 0 & 3 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 & 2 & 3 \\ 0 & 0 & 0 & 1 & 1 & 4 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right].$$
The corresponding system of equations will be
$$x_1 + 3x_3 - x_5 = 0,\qquad x_2 - x_3 + 2x_5 = 3,\qquad x_4 + x_5 = 4.$$
Note that the variables in the pivot columns appear exactly once in the whole system; these are called the basic variables of the original system of equations. The other variables are called the free variables. Thus, in the last example, x1, x2 and x4 are the basic variables, whereas x3 and x5 are the free variables. We obtain the solutions of the system by assigning arbitrary values to the free variables, if any, and then expressing the basic variables in terms of these arbitrary constants according to the equations involving the basic variables.


So, for arbitrary scalars a and b, we have
$$x_5 = a,\quad x_3 = b,\quad x_4 = 4 - a,\quad x_2 = 3 - 2a + b,\quad x_1 = a - 3b,$$

which constitute the complete solution of the given system. Note that because of the presence of the free variables, which can be assigned arbitrary values, there can be no unique solution of the system of equations. For every choice of scalars a and b for the variables x5 and x3, respectively, we have one solution of the system. The arbitrary choice of the free variables works because the system of equations does not put any constraints on the free variables. Observe that any solution whose components were described above can also be expressed as a sum of scalar multiples of fixed column vectors. For example,
$$\begin{bmatrix} a - 3b \\ 3 - 2a + b \\ b \\ 4 - a \\ a \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \\ 0 \\ 4 \\ 0 \end{bmatrix} + a\begin{bmatrix} 1 \\ -2 \\ 0 \\ -1 \\ 1 \end{bmatrix} + b\begin{bmatrix} -3 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}.$$
This is known as the general solution of the given system in parametric form. The scalars a and b are the parameters.
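The bookkeeping in this example, namely separating basic from free variables and reading off the constant vector and the generators, can be automated once the augmented matrix is in reduced row echelon form. The sketch below is illustrative code rather than anything from the text (the function name and the return convention are my own choices), and it assumes its input is already a reduced row echelon matrix.

```python
from fractions import Fraction

def parametric_solution(R):
    """Given an augmented matrix [A' | b'] already in reduced row echelon form,
    return (particular, generators, free_cols): a particular solution obtained
    by setting every free variable to 0, and one generator per free variable
    (that free variable set to 1, the others to 0).  Returns None if the
    system is inconsistent."""
    m, n = len(R), len(R[0]) - 1            # n = number of variables
    R = [[Fraction(x) for x in row] for row in R]
    pivots = {}                             # row index -> pivot column
    for i, row in enumerate(R):
        for j, entry in enumerate(row):
            if entry != 0:
                if j == n:                  # pivot in the last column
                    return None             # 0 = non-zero: inconsistent
                pivots[i] = j
                break
    free_cols = [j for j in range(n) if j not in pivots.values()]
    # particular solution: free variables = 0, basic variables = entries of b'
    particular = [Fraction(0)] * n
    for i, j in pivots.items():
        particular[j] = R[i][n]
    # one homogeneous generator per free variable
    generators = []
    for f in free_cols:
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, j in pivots.items():
            v[j] = -R[i][f]
        generators.append(v)
    return particular, generators, free_cols

# The reduced augmented matrix of the five-variable example above.
R = [[1, 0, 3, 0, -1, 0],
     [0, 1, -1, 0, 2, 3],
     [0, 0, 0, 1, 1, 4],
     [0, 0, 0, 0, 0, 0]]
p, gens, free = parametric_solution(R)
print(free)                                  # [2, 4]: x3 and x5 are free
print([int(x) for x in p])                   # [0, 3, 0, 4, 0]
print([[int(x) for x in g] for g in gens])   # [[-3, 1, 1, 0, 0], [1, -2, 0, -1, 1]]
```

Applied to the reduced matrix above, it reports x3 and x5 (columns 2 and 4, counting from 0) as free and returns the same particular solution and generating vectors that appear in the parametric form.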

We now summarize the basic facts about the method of finding solutions of a system of equations using the reduced row echelon form of the augmented matrix of the system.

Theorem 2.4.3. Let $[A' \mid b']$ be the reduced row echelon form of the augmented matrix [A | b] of a system Ax = b of linear equations over a field F.
(a) If the last column of $[A' \mid b']$ is a pivot column, then the system Ax = b is inconsistent.
(b) If the last column of $[A' \mid b']$ is not a pivot column, then the system does have a solution. In this case, two situations may arise:
• If every column of A′ is a pivot column, then the system Ax = b has a unique solution.

• Otherwise, the solutions of Ax = b are obtained by expressing the basic variables, that is, the variables in the pivot columns of A′, in terms of the other variables, known as the free variables. The free variables can be assigned arbitrary values.

The proof of the theorem is clear and left to the reader. Recall that it is customary to call a column of an arbitrary matrix a pivot column as well if it corresponds to a pivot column of the reduced row echelon form of that matrix. Therefore, it is clear that the preceding theorem can also be stated in terms of pivot columns of [A | b]. One should note that if one is interested in deciding only the consistency of a given system of equations, it is sufficient to find a row echelon form of the augmented matrix, and not necessarily the reduced form.


For, if the last column of any row echelon form is a pivot column, then we can already conclude that the system is inconsistent. The following corollary is clear.

Corollary 2.4.4. A homogeneous system of linear equations Ax = 0 is always consistent. The number of free variables appearing in the general solution of the system is the number of non-pivot columns of A. Furthermore, this system will have only the zero solution if and only if every column of A is a pivot column.

Note that while finding solutions of a homogeneous system Ax = 0 of linear equations, there is no need to form the augmented matrix of the system; one row reduces A directly.

We apply the algorithm of row reduction to the augmented matrix of the following system of equations over R to find its solutions, if any.

EXAMPLE 5

Consider the system of equations:
$$3x_2 + 9x_3 + 6x_5 = -3,\qquad -2x_1 + 4x_2 + 6x_3 - 4x_4 = -2,\qquad 4x_1 - 11x_2 - 18x_3 + 2x_4 - 3x_5 = 10.$$
The augmented matrix to which the algorithm will be applied is
$$\left[\begin{array}{ccccc|c} 0 & 3 & 9 & 0 & 6 & -3 \\ -2 & 4 & 6 & -4 & 0 & -2 \\ 4 & -11 & -18 & 2 & -3 & 10 \end{array}\right].$$
The very first column being non-zero, we interchange rows 1 and 2 as the first step so that the top left-most position of the matrix is non-zero. (We could instead have interchanged rows 1 and 3; that would lead to an echelon form probably different from the one we will be getting.) The row-equivalent matrix after this first row operation is
$$\left[\begin{array}{ccccc|c} -2 & 4 & 6 & -4 & 0 & -2 \\ 0 & 3 & 9 & 0 & 6 & -3 \\ 4 & -11 & -18 & 2 & -3 & 10 \end{array}\right] \qquad R_1 \leftrightarrow R_2.$$
Note that, for convenience, we indicate the operations briefly to the right of the matrix. As the second step, we add twice the 1st row to row 3 so that all the entries in the first column, except the first one, are zeros. Clearly, while dealing with a bigger matrix, this step may have to be repeated several times to produce the same shape. The new equivalent matrix is
$$\left[\begin{array}{ccccc|c} -2 & 4 & 6 & -4 & 0 & -2 \\ 0 & 3 & 9 & 0 & 6 & -3 \\ 0 & -3 & -6 & -6 & -3 & 6 \end{array}\right] \qquad R_3 \to R_3 + 2R_1.$$
For the third step, we cover up the first row and the first column, and repeat Steps 1 and 2 on the smaller submatrix. Since the submatrix already has a non-zero entry,


namely 3, in the top left-most position, we can go directly to Step 2. Adding the second row to the third row, we get
$$\left[\begin{array}{ccccc|c} -2 & 4 & 6 & -4 & 0 & -2 \\ 0 & 3 & 9 & 0 & 6 & -3 \\ 0 & 0 & 3 & -6 & 3 & 3 \end{array}\right] \qquad R_3 \to R_3 + R_2.$$
We have already reached a row echelon form, and as the last column is not a pivot column, the system does have solutions. To produce the actual solutions, we have to apply Steps 4 and 5 to the last matrix. Step 4 easily gives us
$$\left[\begin{array}{ccccc|c} 1 & -2 & -3 & 2 & 0 & 1 \\ 0 & 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & -2 & 1 & 1 \end{array}\right] \qquad R_1 \to -\tfrac{1}{2}R_1,\; R_2 \to \tfrac{1}{3}R_2,\; R_3 \to \tfrac{1}{3}R_3.$$
Step 5 is about making all the entries above the pivot entries zero. We begin with the right-most pivot, which is the 1 in the 3rd column. Adding 3 times the third row to the first row, and subtracting 3 times the third row from the second row, we make all the entries above this pivot zero:
$$\left[\begin{array}{ccccc|c} 1 & -2 & 0 & -4 & 3 & 4 \\ 0 & 1 & 0 & 6 & -1 & -4 \\ 0 & 0 & 1 & -2 & 1 & 1 \end{array}\right] \qquad R_1 \to R_1 + 3R_3,\; R_2 \to R_2 - 3R_3.$$
Finally, by adding a suitable multiple of the second row to the first row, the following reduced row echelon form is produced:
$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & 8 & 1 & -4 \\ 0 & 1 & 0 & 6 & -1 & -4 \\ 0 & 0 & 1 & -2 & 1 & 1 \end{array}\right] \qquad R_1 \to R_1 + 2R_2.$$
Observe that, by Theorem (2.4.3), x4 and x5 are the free variables, as the fourth and fifth columns are not pivot columns. Also, note that the last column is not considered when deciding the free variables; it is the column of the scalars. Now, the system of equations corresponding to the reduced row echelon matrix is clearly
$$x_1 + 8x_4 + x_5 = -4,\qquad x_2 + 6x_4 - x_5 = -4,\qquad x_3 - 2x_4 + x_5 = 1,$$
which is equivalent to the original system of equations. To write down the solutions, we follow the procedure outlined in Theorem (2.4.3). We first assign arbitrary constants to the free variables, and then express the basic variables in terms of these constants according to the last set of equations. Thus, the components of any solution will be given by
$$x_5 = a,\quad x_4 = b,\quad x_3 = 1 - a + 2b,\quad x_2 = -4 + a - 6b,\quad x_1 = -4 - a - 8b,$$
where a and b are arbitrary reals. Thus, we see that for every choice of reals a and b, the list $(-4 - a - 8b,\; -4 + a - 6b,\; 1 - a + 2b,\; b,\; a)$ is a solution of the given system of equations.


We leave it to the reader to show that the general solution of the given system of equations can be put in the parametric form
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -4 \\ -4 \\ 1 \\ 0 \\ 0 \end{bmatrix} + a\begin{bmatrix} -1 \\ 1 \\ -1 \\ 0 \\ 1 \end{bmatrix} + b\begin{bmatrix} -8 \\ -6 \\ 2 \\ 1 \\ 0 \end{bmatrix}.$$
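Since the general solution was obtained by hand, a quick numerical spot-check is reassuring. The snippet below is purely illustrative (it is not part of the text and assumes the NumPy library is available); the parameter written as b in the text is called c here so that it does not clash with the right-hand side vector b.

```python
import numpy as np

# Coefficient matrix and right-hand side of the system of Example 5.
A = np.array([[ 0,   3,   9,  0,  6],
              [-2,   4,   6, -4,  0],
              [ 4, -11, -18,  2, -3]], dtype=float)
b = np.array([-3, -2, 10], dtype=float)

def solution(a, c):
    """General solution read off from the reduced row echelon form,
    with parameters a (= x5) and c (playing the role of b = x4)."""
    return np.array([-4 - a - 8*c, -4 + a - 6*c, 1 - a + 2*c, c, a])

# A x should equal b for every choice of the parameters.
rng = np.random.default_rng(0)
for a, c in rng.uniform(-5, 5, size=(3, 2)):
    assert np.allclose(A @ solution(a, c), b)
print("parametric solution verified for three random parameter choices")
```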

In the next example, we will have the opportunity to consider inconsistent systems.

EXAMPLE 6

We find the values of a for which the following system of equations over R is inconsistent:
$$x_1 + 5x_2 - 3x_3 = -4,\qquad -x_1 - 4x_2 + x_3 = 3,\qquad -2x_1 - 7x_2 = a.$$
The augmented matrix is
$$\left[\begin{array}{ccc|c} 1 & 5 & -3 & -4 \\ -1 & -4 & 1 & 3 \\ -2 & -7 & 0 & a \end{array}\right].$$
Adding the first row to the second row, and 2 times the first row to the third row, we make the entries below the pivot in the 1st column zeros. The row-equivalent matrix now looks like
$$\left[\begin{array}{ccc|c} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 3 & -6 & a - 8 \end{array}\right].$$
Next, we subtract 3 times the second row from the third row to make the entry below the pivot in the second column zero. This produces
$$\left[\begin{array}{ccc|c} 1 & 5 & -3 & -4 \\ 0 & 1 & -2 & -1 \\ 0 & 0 & 0 & a - 5 \end{array}\right],$$
which is a row echelon form of the augmented matrix. Its last column is a pivot column unless a = 5. In other words, if the real number a ≠ 5, then the given system is inconsistent.

We now prove the uniqueness of the reduced row echelon form of a matrix; the simple proof we give is due to Thomas Yuster.

Theorem 2.4.5. The reduced row echelon form of any matrix over a field F is unique.

Proof. Let A be an m × n matrix over a field F. We can assume that A is a non-zero matrix. The proof is by induction on n. If n = 1, there is nothing to prove, as $(1, 0, 0, \ldots, 0)^t$ is the only possible reduced row echelon form of A. So, we can assume that A has n columns, where n ≥ 2. Let A′ be the matrix obtained from A by deleting the last column. As elementary row operations on a matrix act on it column by column, any sequence of elementary row operations that reduces A to its reduced row echelon form also transforms A′ to its reduced row echelon form.


However, by the induction hypothesis, the reduced row echelon form of A′ is unique. It follows that if B and C are two reduced row echelon forms of A, then they can differ only in the nth column. Thus, if B ≠ C, then there is some j (1 ≤ j ≤ m) such that the (j, n)th entries of B and C are not equal, say $b_{jn} \neq c_{jn}$. Now, let $v = (v_1, v_2, \ldots, v_n)^t$ be an n × 1 column vector over F such that Bv = 0. Then, as B and C are row equivalent, Cv = 0, so (B − C)v = 0. But the first n − 1 columns of B − C are zero columns, so by considering the jth component of (B − C)v, we conclude that $(b_{jn} - c_{jn})v_n = 0$, and hence that $v_n = 0$ as $b_{jn} \neq c_{jn}$. Thus, we have shown that if B ≠ C, then any solution $x = (x_1, x_2, \ldots, x_n)^t$ of the system of equations Bx = 0 or Cx = 0 must have $x_n = 0$. It follows that $x_n$ cannot be a free variable for these systems of equations, or, equivalently, that the nth columns of both B and C are pivot columns. But the first n − 1 columns of B and C are identical to the corresponding columns of the reduced row echelon form of A′. Therefore, the rows containing the pivots in the nth columns of B and C cannot be different, as these must correspond to the first zero row of the reduced row echelon form of A′. Finally, note that all the entries, except for the pivots, in the nth columns of B and C are zeros. These two observations imply that B = C, contradicting our assumption about B and C. This establishes the theorem. □

We finally consider a square matrix A of order n whose reduced row echelon form R has no zero row. Then every row of R has a pivot. To accommodate these n pivots, each of the n columns of R must have a pivot. This forces R = $I_n$, which proves the following useful result.

Proposition 2.4.6. If the reduced row echelon form R of a matrix $A \in M_n(F)$ does not have a zero row, then R = $I_n$.

EXERCISES

1. Determine whether the following statements are true or false, giving brief justification. All given matrices are over an arbitrary field F.
(a) Every matrix has only one row echelon form.
(b) The reduced row echelon form of a non-zero matrix cannot be the zero matrix.
(c) The reduced row echelon form of a non-zero square matrix is the identity matrix.
(d) Any row echelon form of an invertible matrix has to be invertible.
(e) If the reduced row echelon form of the augmented matrix of a system of linear equations has no zero row, then the system has a unique solution.
(f) If the reduced row echelon form of the augmented matrix of a system of linear equations has no zero column, then the system has no solution.
(g) If a square matrix A of order 3 has two pivot columns, then the system Ax = 0 has non-zero solutions.
(h) If $[A' \mid b']$ can be obtained from $[A \mid b]$ by a finite number of elementary column operations, then Ax = b and A′x = b′ are equivalent.
(i) If the reduced row echelon form R of a square matrix does not have a zero column, then R must be an identity matrix.
(j) In any row echelon form of a matrix, the pivot in any non-zero row is the only non-zero entry in that pivot column.


(k) If every column of a row echelon form of a matrix A is a pivot column, then the system of equations Ax = b has a solution for any b.
(l) A basic variable in a system of equations Ax = b is a variable that corresponds to a pivot column of A.
(m) The last column of the augmented matrix of a system of equations Ax = b can never be a pivot column.
(n) If the number of variables in a system of equations Ax = b exceeds the number of equations, then the system has to be consistent.

2. Determine whether the following matrices over R are in row echelon form; which ones are in reduced row echelon form?
$$\begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 2 & 0 & 1 & -1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

 0  1.  1

over R to reduced row echelon forms by elementary row 3 4 5  0  2 0

 4  5,  6

0 4 1

 1  2 0

2 0 −1

2 3 2 1 6 3

0 1 0  −1  8.  −2

0 0 1

 4  0,  1

What will be the reduced row echelon forms if the matrices are considered over the field Q of rational numbers, or over the field C?

4. Describe, in each case, all solutions of Ax = 0 over R in parametric form if A is row-equivalent to
$$\begin{bmatrix} 2 & -4 & 6 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 & 0 & 3 & -1 \\ 0 & 2 & -2 & 4 & 6 & -2 \\ 0 & 0 & 0 & 2 & 0 & -2 \end{bmatrix}.$$

5. Find the parametric solutions of the following homogeneous systems of equations:
$$x_1 + 2x_2 + x_3 = 0,\quad x_1 - x_2 + 3x_3 = 0;\qquad\qquad 2x_1 - x_3 = 0,\quad 2x_2 - x_3 = 0.$$


6. Determine whether the systems of linear equations over R whose augmented matrices are given below are consistent; if consistent, find the solutions.
$$\begin{bmatrix} 1 & 1 & 2 \\ 2 & 2 & 4 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 1 & 2 \\ 2 & 0 & 2 & 3 \end{bmatrix},\quad \begin{bmatrix} 3 & -1 & 1 & 1 \\ 2 & 1 & -3 & 0 \\ 1 & 2 & 0 & 2 \end{bmatrix},\quad \begin{bmatrix} 1 & -2 & -1 & 3 & 0 \\ -2 & 4 & 5 & -5 & 3 \\ 3 & -6 & -6 & 8 & 2 \end{bmatrix}.$$

7. Determine whether the following systems of linear equations over R are consistent by row reducing the augmented matrices; if consistent, find the solutions:
(a) $x_1 + 4x_2 + 2x_3 - 3x_5 = -1$, $\;2x_1 + 9x_2 + 5x_3 + 2x_4 + x_5 = 2$, $\;x_1 + 3x_2 + x_3 - 2x_4 - 4x_5 = 1$.
(b) $2x_1 - x_2 + 7x_3 + 3x_4 = 6$, $\;x_1 + 2x_2 + x_3 - x_4 = -2$, $\;-3x_1 - x_3 + 5x_4 = 10$.
(c) $x_1 + x_2 = 1$, $\;x_1 + x_2 + x_3 = 2$, $\;x_2 + x_3 + x_4 = 2$, $\;x_3 + x_4 = 1$.
(d) $x_1 + x_2 = 1$, $\;x_1 + x_2 + x_3 = 4$, $\;x_2 + x_3 + x_4 = -3$, $\;x_3 + x_4 + x_5 = 2$, $\;x_4 + x_5 = -1$.

8. In the following system of linear equations, determine the values of the real number a for which the system is consistent, and then determine the solutions for such a:
$$x_1 + x_2 + ax_3 = 3,\qquad x_1 + ax_2 + x_3 = 0,\qquad ax_1 + x_2 + x_3 = 0.$$
9. Prove Corollary (2.4.4).
10. Let $A \in M_{m\times n}(F)$. Prove that Ax = 0 has only the zero solution if and only if for any m × 1 matrix b over F, Ax = b has at most one solution.
11. Consider a system of linear equations over a field F in which there are more variables than equations. Suppose that F is infinite (e.g., F = R or C). Show that if the system is consistent, then there are infinitely many solutions.


12. Give an example of a consistent system of linear equations in which there are more equations than variables.
13. Consider a consistent system of linear equations Ax = b, where $A \in M_{m\times n}(F)$ and b is an m × 1 column vector over F. Prove the following:
(a) The homogeneous system Ax = 0 is consistent.
(b) If s is a fixed but arbitrary solution of Ax = b, then any solution of Ax = b can be written as $s + s_0$ for some solution $s_0$ of Ax = 0.

2.5 INVERTIBLE MATRICES AGAIN

As we have mentioned earlier, row operations have useful applications in areas other than that of solving systems of linear equations. In this section, we use elementary row operations to analyse invertible matrices and compute their inverses.

Recall that a homogeneous system Ax = 0 of m linear equations in n variables over a field F (which means that $A \in M_{m\times n}(F)$) always has the zero solution; in fact, it is the only solution in case every column of A is a pivot column. We now seek a condition on the size of A under which the system Ax = 0 has a non-zero solution. To do that, assume that A has r pivot columns. It is clear that r ≤ min{m, n}. So, if m < n, then n − r, the number of non-pivot columns of A, is strictly positive. Since non-pivot columns give rise to free variables, there is at least one free variable in the solution of Ax = 0 (see Corollary 2.4.4). Thus, Ax = 0 has non-zero solutions, as free variables can be assigned any non-zero scalar. We thus have the following result, which will have important consequences later.

Proposition 2.5.1. Let Ax = 0 be a homogeneous system of m equations in n variables over a field F. If m < n, then the system has non-zero solutions.

Let us now consider the system of equations Ax = 0, where the coefficient matrix A is a square matrix over F. In that case, there is a connection between the invertibility of A and the solvability of the corresponding system of equations.

Proposition 2.5.2. Let $A \in M_n(F)$. Then A is invertible if and only if the homogeneous system Ax = 0 of linear equations over F has x = 0 as the only solution.

Proof. Suppose that A is invertible. If an n × 1 column vector v over F satisfies the system of equations Ax = 0, then multiplying the matrix equation Av = 0 by the inverse $A^{-1}$ of A, we see that v = 0, which means that Ax = 0 cannot have a non-zero solution.

Conversely, assume that Ax = 0 does not have any non-zero solution (it always has the zero solution) over F. Then, if R is the reduced row echelon form of A, the equivalent system Rx = 0 also does not have a non-zero solution.


Now, according to Proposition (2.4.6), R is either $I_n$ or it has at least one zero row. In the second case, R would have a column without a pivot, forcing free variables in the system of equations and hence some non-zero solution. Thus, our hypothesis implies that R = $I_n$. Since R is the reduced row echelon form of A, it follows by Proposition (2.3.6) that there is an invertible matrix E such that R = EA. Hence, $A = E^{-1}I_n = E^{-1}$, showing that A itself is invertible. □

We can restate the last result as follows.

Corollary 2.5.3. A matrix in $M_n(F)$ is invertible if and only if its reduced row echelon form is the identity matrix.

The following is an easy consequence whose proof is left to the reader.

Corollary 2.5.4. A matrix in $M_n(F)$ cannot be invertible if it has a zero row or a zero column.

We apply Corollary (2.5.3) to triangular matrices now. Observe that an upper triangular matrix A, which by definition is a square matrix, is already in row echelon form. Therefore, if even a single diagonal element of A is zero, then the reduced row echelon form of A cannot be the identity matrix, whereas if every diagonal element is non-zero, the diagonal entries are the pivots and the row reduction algorithm reduces A to the identity matrix. Thus A is invertible if and only if each of its diagonal elements is non-zero. To prove a similar assertion for a lower triangular matrix, we first note that the transpose of such a matrix is upper triangular with the same diagonal. Since a matrix is invertible if and only if its transpose is, the following result holds for lower triangular matrices too.

Corollary 2.5.5. A triangular matrix in $M_n(F)$ is invertible if and only if each of its diagonal elements is non-zero.

Finally, observe that the proof of Proposition (2.5.2) also contains the following useful fact.

Proposition 2.5.6. If a matrix in $M_n(F)$ is invertible, then it can be expressed as a product of a finite number of elementary matrices in $M_n(F)$.

A convenient procedure to compute the inverse of an invertible matrix can now be formulated. Since the reduced row echelon form of an invertible matrix A in $M_n(F)$ is $I_n$, there are elementary matrices $E_1, E_2, \ldots, E_r$ in $M_n(F)$ such that
$$E_r \cdots E_2 E_1 A = I_n.$$
It then follows that
$$A^{-1} = I_n A^{-1} = E_r \cdots E_2 E_1 I_n.$$
As each elementary matrix corresponds to an elementary row operation, we can interpret the preceding remark as follows: the elementary row operations which change A to $I_n$, when applied in the same sequence to $I_n$, will produce $A^{-1}$. In practice, we use the following procedure:


Algorithm for Finding the Inverse 2.5.1

Let $A \in M_n(F)$. Apply elementary row operations to the block matrix $[A \mid I_n]$ so as to reduce A to its reduced row echelon form. If A is row-equivalent to $I_n$, then A is invertible, and $[A \mid I_n]$ is row-equivalent to $[I_n \mid A^{-1}]$. Otherwise A is not invertible.

EXAMPLE 7

We find the inverse of $A = \begin{bmatrix} 2 & -2 \\ 4 & 7 \end{bmatrix}$, if it exists. We apply elementary row operations to $[A \mid I_2]$ so as to bring A to its reduced row echelon form. The computation is as follows:
$$[A \mid I_2] = \left[\begin{array}{cc|cc} 2 & -2 & 1 & 0 \\ 4 & 7 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 2 & -2 & 1 & 0 \\ 0 & 11 & -2 & 1 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & -1 & 1/2 & 0 \\ 0 & 1 & -2/11 & 1/11 \end{array}\right] \sim \left[\begin{array}{cc|cc} 1 & 0 & 7/22 & 1/11 \\ 0 & 1 & -2/11 & 1/11 \end{array}\right].$$
Since A has been reduced to $I_2$, according to Algorithm (2.5.1), $A^{-1}$ exists and equals the second block in the last matrix.
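The procedure of Algorithm 2.5.1 is also easy to program. The sketch below is an illustrative Gauss-Jordan style implementation rather than the text's exact sequence of steps (the function name is my own); it augments A with the identity matrix, clears each pivot column completely as soon as the pivot is found, and uses exact rational arithmetic.

```python
from fractions import Fraction

def inverse_by_row_reduction(A):
    """Row reduce the block matrix [A | I] to [I | A^{-1}].
    Returns the inverse as a list of lists of Fractions, or None if A is singular."""
    n = len(A)
    # augment A with the identity matrix of order n
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(1) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    for c in range(n):
        # find a row at or below c with a non-zero entry in column c
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return None                     # no pivot in this column: singular
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]  # scale the pivot to 1
        for r in range(n):                  # clear the rest of the column
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]           # right-hand block is the inverse

# The matrix of Example 7.
inv = inverse_by_row_reduction([[2, -2], [4, 7]])
print(inv)  # [[Fraction(7, 22), Fraction(1, 11)], [Fraction(-2, 11), Fraction(1, 11)]]
```

For the matrix of Example 7 it returns the same inverse, with entries 7/22, 1/11, -2/11 and 1/11, that the hand computation produced.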

The process of row reduction is useful not only in the practical matter of finding inverses, but also as a theoretical tool. As an example, we prove that a one-sided inverse of a matrix is necessarily its inverse, a result we had quoted in Chapter 1 (see Equation 1.12).

Proposition 2.5.7. Let $A \in M_n(F)$. If A has a right inverse (respectively, left inverse) $B \in M_n(F)$, that is, if $AB = I_n$ (respectively, $BA = I_n$), then A is invertible with B as its inverse.

Proof. Suppose that $AB = I_n$. Let R be the reduced row echelon form of A, and let E be the invertible matrix in $M_n(F)$ such that
$$EA = R. \tag{2.4}$$
It therefore follows, by left-multiplying the relation $AB = I_n$ by E, that
$$RB = E. \tag{2.5}$$
Now, by Proposition (2.4.6), R is either the identity matrix or has at least one zero row. The second possibility cannot occur, for in that case the matrix E on the right-hand side of Equation (2.5) would also have a zero row, and then E could not be invertible by Corollary (2.5.4). Thus, R has to be the identity matrix. Therefore, Equation (2.5) reduces to B = E, which implies that B itself is an invertible matrix. Equation (2.4) now, in turn, reduces to $BA = I_n$, showing that B is the inverse of A. If, on the other hand, we assume that $BA = I_n$, the same argument with B in place of A shows that B is invertible with A as its inverse. But that also implies that A is invertible with B as its inverse. □

We end this section with a result that will be needed later.


Proposition 2.5.8. The inverse of an invertible lower triangular matrix in $M_n(F)$ is lower triangular.

Proof. According to Proposition (2.5.6), the inverse $A^{-1}$ of an invertible matrix $A \in M_n(F)$ is given by the product
$$A^{-1} = E_k E_{k-1} \cdots E_2 E_1,$$
where $E_1, E_2, \ldots, E_k$ are the elementary matrices in $M_n(F)$ corresponding (in order) to the elementary row operations which, when applied successively to A, bring it to its reduced row echelon form, the identity matrix. We claim that each $E_i$ is lower triangular. To prove our claim, we first observe that, as A is lower triangular, no elementary operation of the second type, that is, no row interchange, is required for the row reduction of A. An elementary matrix corresponding to an elementary row operation of the first type, that is, row scaling by dividing by a non-zero pivot, is diagonal and so lower triangular. Again, as A is lower triangular, any row replacement, that is, any elementary row operation of the third type, used for the row reduction of A is the addition of a suitable multiple of some row to a row below it. Therefore, the elementary matrix corresponding to such a row operation is lower triangular. So our claim follows. Since the product of a finite number of lower triangular matrices is lower triangular (see Proposition 1.2.4), our claim implies that $A^{-1}$ is lower triangular. □

The analogous result for an invertible upper triangular matrix is left as an exercise.

EXERCISES

1. Determine whether the following statements are true or false, giving brief justification. All given matrices are square matrices over an arbitrary field F.
(a) If, for a matrix A of order n, Ax = 0 has a solution, then A is row equivalent to $I_n$.
(b) If A is invertible, then Ax = b has a solution for any column vector b.
(c) If Ax = b has a unique solution for a column vector b, then Ax = c has a solution for any column vector c.
(d) If Ax = b is inconsistent for some column vector b, then the reduced row echelon form of A must have at least one zero row.
(e) If, for matrices A and B of order n, ABx = 0 has only the zero solution, then A is invertible.
(f) Two invertible matrices of the same order are row equivalent.
(g) If A is invertible, then $A^t x = 0$ has only the zero solution.
(h) If all the diagonal entries of an upper triangular matrix A are non-zero, then A is invertible.
(i) Suppose that the scalars a, b, c, d are all non-zero. Then the matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is row equivalent to $I_2$.
(j) If all the entries of a square matrix are non-zero scalars, then the matrix must be invertible.

2. Prove Corollaries (2.5.3) and (2.5.4).

3. Determine whether the following matrices over R are invertible; find the inverses of the invertible ones using Algorithm (2.5.1). Also, express each invertible matrix as a product of elementary matrices.
$$\begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix},\quad \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 2 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 2 & 1 \end{bmatrix},$$

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invertible Matrices Again

 1 2  2  2

2 2 4 4

 2  4 , 0 2

3 6 6 6

  2 −1   1  0

2 2 4 0

 0  0 , 0 2

2 1 1 0

4. Find the inverses of the following matrices over R:  1  0 0

2 1 0

 3  2,  1

 1  2 3

 0  0,  1

0 1 2

 1  2 3

 1 2  4  2

0 3 5 3

 3  2  1

2 1 2

0 0 6 6

  1  1/2 1/3

81

 0  0 . 0 1

1/2 1/3 1/4

 1/3  1/4.  1/5

5. Let a, b be real numbers such that a is non-zero and a ! b. Determine whether the following matrix is invertible over R and if so, find its inverse:  a  a a

 b  b.  a

b a a

6. Determine the inverse of the following matrices in case they are invertible:

7. Find the inverse of

  4  −1   −1  −1

−1 4 −1 −1

−1 −1 4 −1

    A =   

1 0 0 0 0

   , 

−1 −1 −1 4

−1 1 0 0 0

  2  −1   0  0

1 −1 1 0 0

0 1 0 −1

−1 0 1 0

−1 1 −1 1 0

1 −1 1 −1 1

    .  

Hence find the inverse of the following without any row reduction:   1  −1  B =  1  −1  1

0 1 −1 1 −1

0 0 1 −1 1

0 0 0 1 −1

0 0 0 0 1

    .  

0 0 −1 1

   .  

Saikia-Linear Algebra

82

book1

February 25, 2014

0:8

Systems of Linear Equations

8. Let A ∈ Mn (F). Suppose that the column vectors of A are γ1 , γ2 , . . . , γn . Prove that A is invertible if and only if the only scalars c1 , c2 , . . . , cn for which the relation n 1

c i γi = 0

i=1

holds are c1 = c2 = · · · = cn = 0. 9. Let A and B ∈ Mn (F) such that AB is invertible. Prove that both A and B are invertible. 10. Prove that the inverse of an invertible upper triangular matrix in Mn (F) is upper triangular. 11. Let A ∈ Mn (F). Prove that A is invertible if and only if the matrix equation Ax = b has at least one solution for each n × 1 column vector b over F. 12. The coefficient matrices of the following systems of linear equations over R are invertible. Find the solutions of the systems of equations by computing the inverses of the coefficient matrices by elementary row operations. x1 + 2x2 = 3 , 2x1 + 3x2 = 1

x1 x1 x1 2x1

+ x2 + x2 + 2x2 + 3x2

+ x3 + 2x3 + 3x3 + 4x3

+ + + +

2x4 3x4 4x4 5x4

= = = =

1 2 . 3 4

2.6 LU FACTORIZATION As we have seen in the section on Gaussian elimination, this procedure to solve a system of linear equations, such as Ax = b, proceeds in two distinct stages. The first, which can be termed as/ for0 ward elimination, consists of a sequence of elementary row operations to the augmented matrix A b , to eliminate (that is, to make zero) as many entries of A as possible. In the second stage, which is sometimes known as backward substitution, one uses the reduced row echelon form of the augmented matrix, produced by operations of the first stage, to obtain the solutions of the original system of equations. In practice, especially in many computer programmes dealing with solutions of systems of equations, one replaces A with a factorization of A into a product of two or more simpler matrices. For example, in case no row exchange (type two row operation) is needed for the row reduction of A, one can express A as a product A = LU, where L is a lower triangular matrix with all its diagonal entries equal to 1, and U a row echelon form of A obtained without any row scaling. In fact, L too can be obtained as a by-product of of the row reduction of A. Such a factorization A = LU is known as an LU factorization or an LU decomposition of A. If A is an m × n matrix over a field F, then L is of order m whereas U is an m × n matrix, both over F. EXAMPLE 8

The following is an example of an LU factorization of a square matrix A of order 3:       1 −1 −2   1 0 0   1 −1 −2       1 0 −1  =  1 1 0   0 1 1 .      2 3 2 2 5 1 0 0 1

Once an LU factorization A = LU is obtained, the task of solving the system of equations Ax = b reduces to that of solving a pair of simpler systems: first Ly = b for

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

83

y and then U x = y for x. Since Ax = L(U x), the solution x of U x = y is the required solution of Ax = b. The matrix equations Ly = b and U x = y are easier to solve as L is lower triangular with 1 on the diagonal and U is almost upper triangular. For a square matrix A, U is, in fact, upper triangular. EXAMPLE 9

As an example, we solve the system of equations Ax = b over R given explicitly as       1 −1 −2   x1   2       1 0 −1   x2  =  −1  ,      x3 2 3 2 1

using the LU factorization of the coefficient matrix A Ly = b, using the triangular factor L, that is, solve      1 0 0   y1   2       1 1 0   y2  =  −1 y3 1 2 5 1

which is equivalent to the system

given earlier. We first solve     ,

= 2 y1 y1 + y2 = −1 . 2y1 + 5y2 + y3 = 1 It is an easy matter to solve this system by forward substitution as L is a lower triangular matrix with 1 along the diagonal:    2    y =  −3  .   12

We next solve U x = y, that is ,       1 −1 −2   x1   2        0 1 1   x2  =  −3  , 0 0 1 x3 12

which gives the system of equations

x1 − x2 − 2x3 = 2 x2 + x3 = −3 . x3 = 12 Backward substitution then yields the solution x = (x1 , x2 , x3 )t of the original system of equations: x3 = 12 x2 = −15 . x1 = 11

Saikia-Linear Algebra

84

book1

February 25, 2014

0:8

Systems of Linear Equations

We remark that in this example, U is a square matrix of order 3 as A is so; moreover, it turns out that U is upper triangular with non-zero entries along the diagonal as A is invertible. Existence of LU Factorizations We now verify that an LU factorizations of any m × n matrix A over a field F can be obtained provided no row interchange is required for row reducing A into some row echelon form. If no row scaling is performed (that is, no division of a row by the pivot contained in that row) in the row reduction of A, then only the elementary row operations of the third type (row replacements) are needed to bring A to some row echelon form. Now observe that as these row replacements are always additions of suitable multiples of a row to rows below it, the corresponding elementary matrices are necessarily lower triangular. Thus, the process of row reduction of A can be described as follows: there are lower triangular matrices E1 , E2 , . . . , Ek , all of order n over F, such that Ek Ek−1 . . . E2 E1 = U,

(2.6)

where U is an m × n matrix over F in row echelon form. Note: as no elementary operation of type one is used, that is, no row scaling is resorted to in order to make the pivots of U equal to 1, each Ei has all its diagonal entries equal to 1. Thus, each Ei is invertible by Corollary (2.5.5). Setting L = E1 −1 E2 −1 . . . Ek −1 = (Ek Ek−1 . . . E1 )−1

(2.7)

we see, because of Equation (2.6), that A = LU. Moreover, By Proposition (2.5.8) again, each Ei −1 is lower triangular with all its diagonal elements equal to 1; hence, by Proposition (1.2.4), L, being a product of such Ei −1 , is itself a lower triangular matrix with all its diagonal elements equal to 1. This completes the verification of the following result. Proposition 2.6.1. Let F be a field and A ∈ Mm×n (F). Assume that A can be row reduced to a matrix U in row echelon form without using any row interchange, that is, without using any elementary row operation of the third type. Then there is a lower triangular matrix L of order m over F such that A = LU. Moreover, if no row scaling is done to make the pivots in U equal to 1, then all the diagonal entries of L are equal to 1. Why are LU factorizations important in applications? First of all, because of increasingly powerful computing powers available these days, solving very large systems of linear equations and similar large scale computations are routinely performed. Several computer programmes designed for such computations rely on LU factorizations of matrices. It is because such factorizations are relatively more efficient as well as reliable (for example, in minimizing round-off errors which are inherent in computer programmes). Secondly, in many technical and business applications, series of systems of linear equations Ax = bi with the same coefficient matrix A but with different bi are needed to be solved. The advantage of an LU factorization in such situations is clear: one has to compute A = LU only once. Forward substitutions with the same L for the different bi and subsequently backward substitution with the same U then solve the systems one by one. Finally, another advantage of the LU factorization is the ease with which L can be constructed; the very process of row reduction of A to the echelon form U provides all the input for L.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

85

Construction of L We now explain the precise manner in which entries of L are to be determined by analysing the row reduction of an arbitrary m × n matrix A (over a field F); the only assumption we are making is that at no stage of the reduction, any row interchange is required. The row reduction begins by making all the entries of the first column of A, below the first entry, zeros one by one, by adding suitable multiples of the first row to the rows below it. Now from our discussions of elementary matrices in Section 2.2, we know that if one has to subtract l j1 times the first row of A from the jth row (or, equivalently, adding −l j1 times the first row to the jth row)to eliminate the ( j, 1)th entry of A, then the corresponding elementary matrix of order m, say E j1 , is given by E j1 = Im − l j1 e j1 ; here Im is the identity matrix of order m and e j1 denotes the unit matrix of order m having 1 at the ( j, 1)th position and zeros elsewhere. We set, for convenience, E1 = Em1 Em−1,1 . . . E21 = (Im − lm1 em1 )(Im − lm−1,1 em−1,1 ) . . . (Im − l21 e21 ) It is clear that A reduces to the matrix E1 A after all the elementary row operations dealing with the first column of A are carried out. Now, from the proof of Proposition (2.3.5), we know that E −1 j1 = Im +l j1 e j1 . It follows that −1 −1 −1 E1−1 = E21 E . . . Em1 ? 31 = (Im + l j1 e j1 ) j>1

−1 E −1 = (I + l e )(I + l e ) = I + To get a more explicit description of E1 −1 , first note that E21 m 21 21 m 31 31 m 31 l21 e21 + l31 e31 by the rules governing the multiplications of unit matrices (see Proposition 1.3.7 of Chapter 1). Continuing the multiplications of the factors in E1−1 in this way, we finally arrive at 1 l j1 e j1 , (2.8) E1−1 = Im + j>1

which, when written out explicitly, yields the following:   1 0 0 .  l  21 1 0 .  l31 0 1 .  . . . . E1−1 =   . . .   . . .  lm1 0 0 .

. . .

. .

. .

. .

0 0 0 . . . 1

      .    

Note: E1−1 is a lower triangular matrix with 1 along the diagonal; most interestingly, the ( j, 1)th entry of this matrix, for j ≥ 2, is precisely the multiplier l j1 used to make the ( j, 1)th entry of A zero. Also all the entries of E1−1 , other than those along the diagonal and in the first column, are zeros. The row reduction of A next moves to the next pivot column of E1 A; this pivot, whatever column it may be in, has to be in the second row. This stage of the row reduction consists of subtracting multiples of the second row by suitable multipliers, from each of the rows below it, one by one, so as to make

Saikia-Linear Algebra

86

book1

February 25, 2014

0:8

Systems of Linear Equations

all the entries below the second pivot zeros. As in the preceding case, if we denote the multiplier, used to make the entry in the jth row below the pivot zero, as l j2 then the elementary matrix corresponding to the replacement of the jth row will be given by E j2 = I − l j2 e j2 . Therefore, the row replacements associated with the second pivot reduces E1 A to E2 E1 A, where E2 = Em2 Em−1,2 . . . E32 = (Im − lm2 em2 )(Im − lm−1,2 em−1,2 ) . . . (Im − l32 e32 ) As with the case for E1 ,we then have −1 −1 . . . Em2 E2−1 = E32 ? = (Im + l j2 e j2 ) j>2

= Im +

1

l j2 e j2 .

j>2

Explicitly,

Observe that

       E2−1 =     

1 0 0 0 . . . 0

0 1

0 0 1 0 . . . 0

l32 l42 . . . lm2

E1−1 E2−1 = (Im +

1

. . . .

. . . . .

1

.

.

l j1 e j1 +

as e j1 ek2 is the zero matrix if k > 2. This shows that  0 0  1  l21 1 0   l31 l32 1  l l42 0 E1−1 E2−1 =  41 . . .   . . .   . . .  lm1 lm2 0

. .

.

l j1 e j1 )(Im +

j>1

0 0 0 0 . . . 1

. . . .

.

j>1

= Im +

. . . .

1

1

       .     

l j2 e j2 )

j>2

l j2 e j2

j>2

. . . .

. . . . .

. . . .

. . . .

. .

.

.

. .

0 0 0 0 . . . 1

       .     

Continuing in this manner, we see that if A has t pivots (t ≤ m), then the complete process of row reduction of A can be effected by left-multiplying A by E t Et−1 . . . E1 with Ek given by Ek = (Im − lmk emk )(Im − lm−1,k em−1,k ) . . . (Im − lk+1,k ek+1,k ),

(2.9)

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

87

where l jk (for j > k) is the multiplier of the kth row used for row replacement of the jth row for making the entry in the jth row below the kth pivot zero. If U is the row echelon matrix thus obtained from A, it is clear that Et Et−1 . . . E1 A = U. Each Ek , being a product of elementary matrices, is invertible. Hence, the factorization A = LU holds with L = E1−1 E2−1 . . . Et−1 . On the other hand, from Equation (2.9), it follows that Ek−1 = (Im + lk+1,k ek+1,k )(Im + lk+2,k ek+2,k ) . . . (Im + lm,k em,k ). Therefore, L = (Im +

1

l j1 e j1 )(Im +

j>1

= Im +

1 j>1

l j1 e j1 +

1 j>2

1

l j2 e j2 ) . . . (Im +

j>2

l j2 e j2 + · · · +

1

1

l jt e jt )

j>t

l jt e jt

j>t

as e jk elq is the zero matrix if k < l. From this expression of L, we can give an explicit description of L. L is clearly a lower triangular matrix of order m with 1 along the diagonal; moreover, the entries in its kth column (1 ≤ k ≤ t), below the diagonal entry are the multipliers lk+1,k , lk+2,k , . . . , lmk . In case the number of pivots t < m then the last m − t columns of L will be the same as the corresponding columns of the identity matrix of order m. However, in case t = m, the last pivot of A occurs in the last row of A. Since there is no need for row replacement in this case, we may take Et = Em = I, the identity matrix of order m. It follows that the lower triangular matrix L of order m has all entries in the last column equal to zero except for the bottommost one, which is the diagonal element 1. This description of L allows us to write down L at the same time when row replacements are being used to bring A to its row echelon form U. All we have to do is to keep track of the multipliers. We start with the identity matrix of order m (if A is m × n) and place the multipliers associated with the row replacements used to make the entries below the kth pivot in the kth column below the diagonal entry 1 in the order indicated in the preceding description of L. If the number t of pivots is less than m, the process will create entries below the diagonal in the first t columns of the identity matrix, resulting in L. If t = m, all the positions below the diagonal will be filled up with the multipliers while the last column of the identity matrix is left intact. We present a couple of examples to illustrate the construction of L according to the method we have just outlined. For convenience, as practised earlier, a row replacement in which we subtract l times the ith row from the jth row, will be denoted by R j → R j − lRi . EXAMPLE 10 We go back to the matrices of Example 1 to show how row reduction of   1 −1 −2   0 −1 A = 1   2 3 2

Saikia-Linear Algebra

88

book1

February 25, 2014

0:8

Systems of Linear Equations

helps us to construct L as given in the example. The first stage of row reduction of A are the row replacements given by R2 → R2 − R1 and R3 → R3 − 2R1, which reduce A to   1 −1 −2   1 1. A1 = 0   0 5 6

Thus, the multipliers used to deal with the first column of A are respectively, 1 and 2. These will then fill the first column of L below the diagonal. Moving on to the second pivot in A1 , which is the first non-zero entry in the second row, we see that the row replacement R3 → R3 − 5R1 makes the entry below the second pivot zero. This operation changes A1 to   1 −1 −2 1 1. A2 = 0   0 0 1

So the second column of L has the multiplier 5 below the diagonal entry 1. Since A2 is already in a row echelon form so we take U = A2 . The corresponding L is therefore given by   1 0 0   L = 1 1 0.   2 5 1 Note: U is an upper triangular matrix. It is left to the reader to check that A = LU.

EXAMPLE 11 We determine an LU factorization of the following matrix   1 −1 0 −2  2   4 2 1 1 0 . A =  4 1 8 −2 −1 6 3 −3 0 −10

The row replacements R2 → R2 − 2R1 , R3 → R3 + R1 and R4 → R4 − 3R1 give us zeros below the first pivot 2 of A, and reduce A to   2 1 −1 0 −2 0 0 3 1 4 . A1 =  3 1 6 0 0 0 0 0 0 −4

Thus, the multipliers which will fill the first column of L below the diagonal entry 1 are respectively 2, −1 and 3 (note that as the second row operation was adding R1 to R3 , the corresponding multiplier is −1). The next pivot is 3 appearing in the second row of A1 but in the third column. We need a single row replacement, namely R3 → R3 − R2 , to make the entries below this pivot zeros. This reduces A1 to   2 1 −1 0 −2 0 0 3 1 4 . A2 =  0 0 2 0 0 0 0 0 0 −4

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

89

So the multipliers which will fill the second column of L are respectively 1 and 0. Finally, R4 → R4 + 2R3 reduces A2 to a row echelon form of A:   2 1 −1 0 −2 0 0 3 1 4 . U =  0 0 2 0 0 0 0 0 0 0

So the multiplier to fill in the third column of L is −2. Note: as L has to be matrix of order 4 and there are only three pivots of A, the last column of L is the fourth column of I4 . Hence, A = LU, where   0 0  1 0   2 1 0 0 . L =  1 0 −1 1 3 0 −2 1

Continuing with the example, it can be easily shown that U can be further factored as U = DV, where D is a diagonal matrix of order 4 and V obtained from U by replacing each pivot of U by 1. To verify that such a factorization is possible, we first observe that V can be obtained from U by performing the elementary row operations of dividing out each row of U containing a pivot by the pivot itself. If we denote a diagonal matrix having a1 , a2 , . . . , an as diagonal elements by diag[a1, a2 , . . . , an ], then it is clear that the three elementary matrices corresponding to these row operations on U which makes the pivots equal to 1 are E1 = diag[1/2, 1, 1, 1], E2 = [1, 1/3, 1, 1] and E3 = diag[1, 1, 1/2, 1]. Thus we see that E3 E2 E1 U = V, which implies that U = E1−1 E2−1 E3−1 V. But as we have seen in our discussion about inverses of elementary matrices in Section 2.3, E1−1 = diag[2, 1, 1, 1], E2−1 = diag[1, 3, 1, 1] and E3−1 = diag[1, 1, 2, 1] so their product equals the diagonal matrix D = diag[2, 3, 2, 1]. Thus, we obtain the factorization U = DV  2 0 =  0 0

0 3 0 0

0 0 2 0

 0 1  0 0  0 0 1 0

1/2 0 0 0

−1/2 1 0 0

0 1/3 0 0

 −1   4/3 . 1  0

This example illustrate the following: suppose that an m × n matrix A has an LU factorization A = LU. Construct a diagonal matrix D of order m by placing the pivots of U along the diagonal in the same order in which they appear in U (from left to right) and, if necessary, filling the rest of the diagonal by 1’s. Then we have another factorization A = LDV, where V is obtained from U by replacing each of its pivots by 1. The easy verification is left to the reader as an exercise. Any factorization of a matrix A ∈ Mn (F) in the form A = LDV, where L is a lower triangular matrix of order m with 1 along the diagonal, D a diagonal matrix of order m and V an m × n matrix whose all pivots are equal to 1, is called an LDV factorization or an LDV decomposition of A.

Saikia-Linear Algebra

90

book1

February 25, 2014

0:8

Systems of Linear Equations

Computing Inverses by LU Factorization One important application of an LU factorization provides an efficient method to work out the inverse of an invertible matrix A ∈ Mn (F). In fact, because of the efficiency, software programmes, such as MATLAB, use this method to compute inverses. The procedure is simple. If A = LU, then A−1 = U −1 L−1 , where U −1 and L−1 are still upper and lower triangular matrices respectively (see Proposition 2.5.8) and so computation of A−1 as the product U −1 L−1 is relatively simple. Further simplification is possible. Since A is invertible, it must have n pivots, that is, the diagonal entries of the echelon form U, being the pivots, are non-zero. Therefore, the corresponding LDV factorization A = LDV is such that D is a diagonal matrix of order n with the pivots along the diagonal, and L and V are, respectively, lower and upper triangular matrices of order n with their diagonal entries all equal to 1. It follows that the computation of V −1 and L−1 and hence of A−1 = U −1 D−1 L−1 , by row reduction algorithm, is even simpler. EXAMPLE 12 An easy calculation shows that for the matrix   2  A =  6  −4

 −1  6  5

−3 0 6

one has an LU factorization given by   2   6 −4

  −1  1   6 =  3   5 −2

−3 0 6

0 1 0

Hence an LDV factorization of A is   1   3 −2

0 1 0

 0 2  0 0  1 0

0 9 0

 0 2  0 0  1 0

 0 1  0 0  3 0

−3 9 0

 −1  9,  3

 −1/2  1 .  1

−3/2 1 0

The row reduction method of finding inverses readily gives us, in fact, in two steps in both cases, that   1   3 −2

and  1  0 0

−3/2 1 0

0 1 0

−1   1 0   0 = −3   2 1 −1  1 −1/2   1  = 0   0 1

0 1 0

3/2 1 0

 0  0  1  −1  −1.  1

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

We conclude that  1 3/2  1 A−1 = 0  0 0  −2/3  =  −1  2/3

 −1 1/2  −1  0  1 0

1/6 1/9 0

0 1/9 0

 −1/3  −1/3.  1/3

 0   1  0  −3  1/3 2

0 1 0

91

 0  0  1

One interesting aspect of an LDV factorization of an invertible matrix A = LDV is that such a factorization is unique, a fact which we shall shortly prove. However, for a singular matrix, we cannot hope for uniqueness as the following example shows. EXAMPLE 13 The LU factorization  2  A = 4  2

1 4 −1

  −1 1   −1 = 2   −2 1

0 1 −1

 0 2  0 0  1 0

of the real matrix A shows that for any real number x,    1 0 0 2 0 0 1 1/2    1 0 0 2 0 0 1 A = 2    1 −1 1 0 0 x 0 0

Thus, A has infinitely many LDV factorizations.

1 2 0

 −1  1  0  −1/2  1/2 .  0

Proposition 2.6.2. Let A ∈ Mn (F) be an invertible matrix such that no row interchange is required for its row reduction. Then the LDV factorization of A is unique. Proof. If possible, let A = L1 D1 V1 = L2 D2 V2 be two factorizations, where L1 , L2 and V1 , V2 are, respectively, lower and upper triangular matrices of order n, having all their diagonal entries equal to 1, and D1 , D2 diagonal matrices of order n. By Proposition (2.5.5), each of L1 , L2 , V1 and V2 is invertible and so (as A is invertible) D1 and D2 are also invertible. Now, left-multiplying L1 D1 V1 = L2 D2 V2 by −1 = V −1 D−1 , one obtains L−1 2 and right-multiplying by (D1 V1 ) 1 1 −1 −1 L−1 2 L1 = D2 V2 V1 D1 .

(2.10)

Note: Recall that (see Proposition 2.5.8) the inverse of a lower triangular (respectively, an upper triangular) matrix is lower triangular (respectively, upper triangular). Moreover, by Proposition (1.2.4), a product of lower triangular (respectively, upper triangular) matrices is lower triangular (respectively, upper triangular). It follows that the left hand side of Equation(2.10) represents a lower triangular matrix of order n whereas the right hand side is an upper triangular matrix of order n (considering D1 , D2 as upper triangular). Therefore, Equation (2.10) can hold only if both sides of the equation represent the same diagonal matrix, say D, of order n. Since both L−1 2 and L1 are lower triangular matrices of order n, each having all diagonal entries equal to 1, by Proposition (1.2.4) again, the diagonal matrix D = L−1 2 L1 must have all its diagonal entries equal to 1, and so D = In . Thus, we may conclude that L = In and D2 V2 V1−1 D−1 L−1 1 2 1 = In . The first implies that L1 = L2 while the second that D2 V2 = D1 V1 ,

Saikia-Linear Algebra

92

book1

February 25, 2014

0:8

Systems of Linear Equations

which can be rewritten as V2 V1−1 = D−1 2 D1 . As the left hand side of the last equation is an upper triangular matrix of order n with all the diagonal entries equal to 1, whereas the right hand side a diagonal matrix of order n, we deduce that V2 V1−1 = In = D−1 2 D1 . Hence, V1 = V2 and D1 = D2 completing the proof of the proposition. ! As a consequence of this uniqueness, the factors L and V of the LDV factorization of a symmetric invertible matrix reflect the symmetry in a beautiful way. Proposition 2.6.3. Let A ∈ Mn (F) be an invertible symmetric matrix such that its row reduction requires no row interchange. If A = LDV is the unique LDV factorization of A, then V = Lt . Proof. Since (XY)t = Y t X t for any X, Y ∈ Mn (F), and At = A as A is symmetric, it follows, from the LDV factorization, that A = At = (LDV)t = V t DLt , as D is diagonal. Observe that V t is a lower triangular matrix with all diagonal entries equal to 1 (taking transpose does not alter the diagonal) and Lt an upper triangular with 1 along the diagonal. So, comparing this factorization of A with A = LDV, we conclude, by the uniqueness of the LDV factorization, that L = V t and, equivalently, V = Lt . ! EXAMPLE 14 The elementary row operations R2 → R2 − 2R1 , R3 → R3 − 3R3 and R3 → R3 − R2 , applied successively, to the real symmetric matrix

produces an echelon form

which tells us that

 1  A = 2  3  1  0 0

 3  8  10

2 6 8

2 2 0

 1  V = 0  0

 3  2,  −1 2 1 0

 3  1  1

and that the pivots, from the left to the right, are 1, 2 and −1. Since A is symmetric as well as invertible (what are the pivots?), L = V t and we obtain the LDV factorization of A as     1 0 0 1 0 0 1 2 3     0 0 1 1. A = 2 1 0 0 2     3 1 1 0 0 −1 0 0 1

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

93

Permuted LU Factorization We finally consider modifications in the procedure for obtaining a LU factorization of a matrix which needs row interchange for row reduction. In practice, one cannot avoid such matrices; moreover, for minimizing round-off errors in computations with matrices, row interchanges are necessary. In the following example, we illustrate a permuted LU factorization of a matrix, whose row reduction without row interchange does not yield an LU factorization. EXAMPLE 15 The elementary row operations R2 → R2 + 2R1 and R3 → R3 − 3R1 reduces    1 2 3   A = −2 −4 −2   3 9 4 to   1 2 3  0 0 4,   0 3 −5

which is not in the form required for an LU factorization. However, it is clear that the row interchange R2 ↔ R3 brings the preceding matrix to the required form. Now if, instead of A, we start with the matrix obtained from A by applying the same row interchange to A, that is, with    1 2 3   9 4, P23 A =  3   −2 −4 −2

where P23 is the elementary permutation matrix (see subsection of Section 2.3 for a discussion of permutation matrices) corresponding to the row interchange R2 ↔ R3 , then one easily verifies that P23 A has an LU factorization given by     1 0 0 1 2 3     1 0 0 3 −5. P23 A =  3    −2 −1 0 0 0 4

In general, suppose that A ∈ Mm×n (F) needs row interchanges, at various stages of its row reduction, to arrive at an echelon form. To produce a permuted LU factorization of A, one first applies all the required row interchanges to A before any other elementary row operation. Now recall that the elementary matrices corresponding to row interchanges are also permutation matrices. Thus, the matrix that A changes to because of all the row exchanges, can also be obtained by leftmultiplying A by a sequence of permutation matrices of order m. By Proposition (2.3.9), the product of these permutation matrices of order n is a permutation matrix, say P, of order m. It is clear that the matrix PA does not need any row interchange for its row reduction and so PA has an LU factorization. How does one use a permuted LU factorization of A to solve a system of equations Ax = b? If PA = LU is such a factorization, then one first solves Ly = Pb for y by forward substitution followed by solving U x = y for x by backward substitution; since P is a matrix of order m, the product Pb is still an m-dimensional column vector. Observe that we are actually working with the augmented matrix [PA|Pb]. This augmented matrix corresponds to a system of equations obtained by permuting the equations in Ax = b according to the permutation described by P. So the systems Ax = b and PAx = Pb are equivalent in the sense that they have the same solution sets.

Saikia-Linear Algebra

94

book1

February 25, 2014

0:8

Systems of Linear Equations

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All matrices are over an arbitrary field F unless otherwise specified. (a) Any matrix which requires no row interchange for its row reduction has a unique LU factorization. (b) If Pi j and P jk (k ! i) are elementary permutation matrix of the same order, then Pi j P jk = P jk Pi j . (c) If Pi j and Pkl are elementary permutation matrices of the same order such that none of i and j equals any of k and l, then Pi j Pkl = Pkl Pi j . (d) Any invertible matrix has an LU factorization. (e) If the system of equations Ax = b has a solution, then A has an LU factorization. (f) If an invertible matrix A has an LU factorization, then so does A−1 . (g) If a square matrix A has no LU factorization, then neither can AB has for any non-zero matrix B of the same order. (h) If A = LU is an LU factorization, then L is invertible. (i) If A = LU is an LU factorization of a square matrix A, then U is invertible. (j) A real symmetric matrix has an LDV factorization. 2. Find the LU factorizations of the following real matrices:  2  4 6

−1 0 −5

 2  7,  7

  5  −10 15

 1  3  −3

2 −2 2

3. Determine whether the following real matrix has an LU factorization:  0  0 0

2 −1 −1

−6 3 3

 −4  2.  10

−2 3 7

4. Find LU and LDV factorizations of

 2  2 0

2 5 3

 0  3.  6

5. Find LU factorizations of the following matrices and determine their inverses if they exist:   1  −1 1

2 1 0

 1  2,  1

 1  1 1

2 0 1

 1  1  2

Saikia-Linear Algebra

book1

February 25, 2014

0:8

LU Factorization

95

6. Find an LU factorization  1 1 A =  1 1

−1 2 2 2

−1 −2 3 3

 −1  −2 . −3 4

Does the LU factorization of A indicate that A is invertible? If so, find the inverse of A. 7. Find an LU factorization of  2 1 A =  1 1

1 2 1 1

1 1 2 1

 1  1 . 1 2

Use the factorization to solve the systems of equations Ax = b for    1   −1  b =   ,  −1  1

   −1   1  b =   ,  −1  1

8. Use an LU factorization of the tridiagonal matrix   1 −1  A =  0  0  0

0 −1 2 −1 0

−1 2 −1 0 0

   0   1  b =   .  −1  1 0 0 −1 2 −1

to solve the system of equations Ax = b for    1   −1    b =  1  .  −1    1

 0  0  0  −1 2

Find A−1 after finding the LDV factorization of A. 9. Let a, b, c and d are real numbers such that a ! 0, b ! a, c ! b and d ! c. Find the LDV factorization of

and hence find A−1 .

 a a A =  a a

a b b b

a b c c

 a  b , c d

Saikia-Linear Algebra

96

book1

February 25, 2014

0:8

Systems of Linear Equations

10. Is it possible to have an LU factorization of  1  A = 2  1

2 4 3

 −1  1?  1

If not, find a permutation matrix P such that PA has an LU factorization.

2.7 DETERMINANT With every square matrix over a field, we can associate a unique scalar called its determinant. Originally, the determinant of the coefficient matrix of a system of linear equations having the same number of equations and variables was used to determine the existence of solutions. Since then, the concept of determinant has evolved into a very useful theoretical tool in diverse areas of mathematics. In this section, we present a treatment of the concept of determinants which makes the proofs of their basic properties accessible at this level. Because of its very nature, as the reader will find out a little later, the definition of determinants, and verifications of their properties, tend to be tedious and cumbersome. One can develop the theory in a very elegant manner, but such a treatment requires mathematical ideas way beyond the scope of this book. We have decided to keep the definition simple, but then we have to sacrifice certain amount of simplicity in our proofs. Our definition of determinant of a matrix will be a recursive one. We first write out the determinant of a 1 × 1 and a 2 × 2 matrix explicitly. Then, for n ≥ 2, we will define the determinant of a n × n matrix inductively in terms of determinants of certain (n − 1) × (n − 1) matrices. This recursive definition allows us to express the original determinant in terms of determinants of matrices of successively lower orders step by step. Thus, theoretically at least, it is possible to evaluate the determinant of a n × n matrix explicitly as it can be expressed in terms of 2 × 2 determinants which are already defined. Note that we have already used an expression like n × n determinant to mean the determinant of an n × n matrix. We let this abuse of language stand to make things easier for us. Recursive Definition of Determinants Now, for the definitions. Let det A denote the determinant of any matrix A ∈ Mn (F) 'where F is(a field. a a12 For any 1 × 1 matrix A = [a], which is just a scalar in F, we let det A = a, and for A = 11 where a21 a22 ai j ∈ F, we let det A = a11 a22 − a12 a21 ,

(2.11)

which is a well-defined scalar in F. In general, given an n × n matrix A (n ≥ 2 ) over F, let Ai j be the (n − 1) × (n − 1) submatrix obtained from A by deleting the ith row and the jth column of A for any i, j, 1 ≤ i, j ≤ n; such Ai j are called the minors of A. The determinant of A is then defined in terms of the determinants of the minors from the

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

97

first row of A as follows: det A = a11 det A11 − a12 det A12 · · · ± a1n det A1n =

n 1

(−1)1+ ja1 j det A1 j .

(2.12)

j=1

This is known as the expansion of det A by minors along the first row. Note that for n = 2, the definition given by Equation (2.12) is the same as the one given by Equation (2.11). Also, note that the determinant of any matrix in Mn (F), is a scalar in F. EXAMPLE 16 In the following example, we compute the 3 × 3 determinant first by expressing it in terms of 2 × 2 determinants according to Equation (2.12) before using Equation (2.11):   ' ( ' ( ' (  2 0 −3 1 0 −1 0 −1 1   1 0 = 2. det det −1 − 0. det + (−3). det −1 1 2 1 2 −1   2 −1 1 = 2.1 − 0.(−1) − 3.(−1) = 2−0+3 = 5.

Determinant of a Lower Triangular Matrix EXAMPLE 17 We next compute the determinant of a lower triangular matrix in a similar manner:   ' (  2 0 0 4 0   det −1 4 0 = 2. det 5 6   3 5 6 = 2.4. det [6] = 24.

Clearly, one can generalize this example. Proposition 2.7.1. The determinant of a lower triangular matrix over any field F, and in particular, of a diagonal matrix, is the product of its diagonal entries. Proof. Suppose, A = [ai j ] is a lower triangular matrix in Mn (F). The proof is by induction on n, the order of A. There is nothing to prove if n = 1, so we can begin induction with n ≥ 2. Since all the entries of the first row of A, except possibly a11 , are zeros, it follows from Equation (2.12) that det A = a11 det A11 . But as A is a lower triangular matrix, it is clear that A11 itself is a lower triangular matrix of order (n −1) with diagonal entries a22 , a33 , . . . , ann . Therefore, the induction hypothesis implies that det A11 is the product a22 a33 , . . . , ann . The preceding equality then proves the proposition. !

Saikia-Linear Algebra

98

book1

February 25, 2014

0:8

Systems of Linear Equations

Corollary 2.7.2.

If In is the identity matrix of order n (n ≥ 1) over any field F, then det In = 1.

Row Operations and Determinants One of our main concerns will be to see how elementary row operations on a square matrix changes its determinant. Before presenting the relevant results, we introduce a bit of notation which brings out dependence of the determinant of a matrix on its rows. Definition 2.7.3.

If A ∈ Mn (F) has the row vectors ρ1 , ρ2 , . . . , ρn as its rows, then we will write det A = Dn (ρ1 , ρ2 , . . . , ρn ). j

For any j (1 ≤ n), let ρi be the 1 × (n − 1) row vector obtained by deleting the jth entry of the row vector ρi . Then, for the minor A1 j , which is obtained from A by deleting the 1st row and the jth column of A, we have, in our new notation, j

j

det A1 j = Dn−1 (ρ2 , . . . , ρn ). Therefore, the definition of the determinant of A yields the following: det A =

n 1 j j (−1)1+ j a1 j Dn−1 (ρ2 , . . . , ρn ).

(2.13)

j=1

Also, note that as we view the rows of a matrix as row vectors, any addition of rows or scalar multiplication of a row are done componentwise. Now, we are ready to present the first of the basic properties of determinants. Proposition 2.7.4. Let ρ j (1 ≤ j ≤ n) be n 1 × n row vectors over a field F. Then, the following hold: (a) For a fixed i, (1 ≤ i ≤ n), and any 1 × n row vector σ over F, Dn (ρ1 , ρ2 , . . . , ρi + σ, . . . , ρn ) = Dn (ρ1 , ρ2 , . . . , ρi , . . . , ρn ) + Dn (ρ1 , ρ2 , . . . , σ, . . . , ρn ), where in the second Dn , on the right-hand side of the equation, σ appears in the ith place; (b) For any scalar c ∈ F and a fixed i, (1 ≤ i ≤ n), Dn (ρ1 , ρ2 , . . . , cρi , . . . , ρn ) = cDn (ρ1 , ρ2 , . . . , ρi , . . . , ρn ). A terse way of stating the proposition is to say that a determinant, viewed as a function on row vectors, is linear or the determinant is linear on the rows of matrices. A word of caution, however; it does not say, for example, that det (A + B) = det A + det B. To  clarify what  it actually means, we give 1 2 3   a couple of examples to illustrate the proposition. Consider 3 2 0. Since the second row of the   0 1 0

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

99

matrix can be thought of as (3, 2, 0) + (0, 0, 0), the first relation of the proposition implies that        1 1 2 3 1 2 3 2 3        det 3 + 0 2 + 0 0 + 0 = det 3 2 0 + det 0 0 0.       0 1 0 0 1 0 0 1 0 Similarly, the second relation of the proposition allows us to work out the following: ' ( ' ( 1 2 1 2 det = det 2 4 2.1 2.2 ( ' 1 2 . = 2. det 1 2 We now come to the proof. Proof. The proof of both the assertions will be carried out by induction on n, the number of rows (and columns) of the matrices involved. Consider assertion (a) first. If n = 1, then i = 1 and the matrices involved are just scalars. The corresponding determinants are the same scalars and so, there is nothing to prove. So, assume that n ≥ 2 and the assertion holds for determinants of any (n −1) ×(n −1) matrices. Let A be the n × n matrix whose rows are ρ1 , . . . , ρi + σ, . . . , ρn . Further, for any k, (1 ≤ k ≤ n), let ρk = (ak1 , ak2 , . . . , akn ) and σ = (b1 , b2 , . . . , bn ) for scalars akl and bl . Consider the case when i = 1. Then, the jth entry of first row of A is (a1 j + b j ), and so according to Equation (2.13), we see that 1 j j Dn (ρ1 + σ, . . . , ρ2 , . . . , ρn ) = (−1)1+ j(a1 j + b j )Dn−1 (ρ2 , . . . , ρn ). j

j

j

Since Dn−1 (ρ2 , . . . , ρn ) is also a scalar like a1 j and b1 , the sum on the right-hand side of the preceding equality can be split into two sums as follows: 1 1 j j j j (−1)1+ ja1 j Dn−1 (ρ2 , . . . , ρn ) + (−1)1+ j b j Dn−1 (ρ2 , . . . , ρn ). j

j

Invoking Equation (2.13) again, and noting that ρ1 = (a11 , a12 , . . . , a1n ) and σ = (b1 , b2 , . . . , bn ), we see that the preceding two sums can be expressed as Dn (ρ1 , ρ2 , . . . , ρn ) + Dn(σ, ρ2 , . . . , ρn ) which proves the assertion in this case. Note that this straightforward case did not need the induction hypothesis. So, consider the case when 2 ≤ i ≤ n. So, the sum of the row vectors occurs in A in a row other than the first one. In this case, any minor A1 j , where 1 ≤ j ≤ n, obtained by removing the 1st row and j j j the jth column of A, is an (n − 1) × (n − 1) having the row vectors ρ2 , . . . , ρi + σ, . . . , ρn as its rows. j (see Definition (2.6.1) for the definition of ρk ) The obvious modifications needed for i = 2 or i = n are clear. Thus, by Equation (2.13), we have 1 j j j Dn (ρ1 , . . . ρi + σ, . . . , ρn ) = (−1)1+ ja1 j Dn−1 (ρ2 , . . . , ρi + σ j , . . . , ρn ). (2.14) j

Saikia-Linear Algebra

100

book1

February 25, 2014

0:8

Systems of Linear Equations

However, by the induction hypothesis j

j

j

Dn−1 (ρ2 , . . . , ρi + σ j , . . . , ρn ) j

j

j

j

j

= Dn−1 (ρ2 , . . . , ρi , . . . , ρn ) + Dn−1(ρ2 , . . . , σ j , . . . , ρn ). j

j

j

By substituting the preceding expression for Dn−1 (ρ2 , . . . , ρi + σ j , . . . , ρn ) in Equation (2.14), we obtain the desired formula for Dn (ρ1 , . . . ρi + σ, . . . , ρn ) applying Equation (2.13). The reader can provide the proof of assertion (b) along the same lines using induction. !

Since the zero row vector can be thought as a scalar multiple of any row vector, say, of (1, 1, . . . , 1), by the scalar zero, assertion (b) of the preceding proposition immediately implies the following corollary; note that the determinant of any matrix is a scalar, too. Corollary 2.7.5.

If a row of a matrix A ∈ Mn (F) is a zero row, then det A = 0.

An analogous result in the case of a zero column is also true. However, our way of treating determinant means that we cannot give an analogous proof. But induction readily proves the result, and we leave the proof of the following to the reader. Proposition 2.7.6. If a column of a matrix A ∈ Mn (F) is a zero column, then det A = 0. The next result is crucial in calculations with determinants. Proposition 2.7.7. If two adjacent rows of a matrix A ∈ Mn (F) are equal, then det A = 0. In terms of our alternative notation, Dn (ρ1 , . . . , ρn ) = 0 if ρi = ρi+1 for some i (1 ≤ i ≤ n − 1) Proof. There are two cases to be considered. The first is when the first two rows of A are equal, and the second is when the adjacent equal rows occur after the first row. We take up the first case to begin with. For the hypothesis to make sense in the first case, n ≥ 2. If n = 2, then A has the form ' a11 a11

( a12 . a12

Then, the formula for the determinant of a 2 × 2 matrix directly shows that det A = 0. So we may assume that n ≥ 3. Since the first two rows of A are equal, for any j (1 ≤ j ≤ n), the first row of the minor A1 j is the first row of A with the jth entry removed. Thus, if (a11 , a12 , . . . , a1n ) is the first row of A, then the first row of A1 j is (a11 , a12 , . . . , a1, j−1 , a1, j+1 , . . . , a1n ). Thus, according to Equation (2.12), we can express, for each fixed j, det A1 j as an alternate sum of terms of the type a1k det B jk where B jk can be thought of as the (n − 2) × (n − 2) submatrix obtained from A by deleting the 1st and the 2nd row as well as the jth and the kth columns of A; observe that k ! j.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

101

Before continuing with our proof, we work out the details in case of a 4 × 4 matrix A with its first two rows equal, to make our proposed line of reasoning clear. We have the formula for det A explicitly as det A = a11 det A11 − a12 det A12 + a13 det A13 − a14 det A14 , where  a12  A11 = a32  a42

Therefore,

 a11  A13 = a31  a41

a13 a33 a43 a12 a32 a42

 a14   a34 ,  a44

 a14   a34 ,  a44

 a11  A12 = a31  a41

 a11  A14 = a31  a41

a13 a33 a43 a12 a32 a42

 a14   a34   a44

 a13   a33 .  a43

det A = a11 a12 det B12 − a11 a13 det B13 + a11 a14 det B14 − a12a11 det B21 + a12a13 det B23 − a12 a14 det B24 +· · · ,

where ' a B12 = 33 a43 ' a B13 = 32 a42

( a34 = B21 a44 (

a34 = B31 etc. a44

(2.15) (2.16) (2.17)

The point of this example is to note that B jk and Bk j are the same matrix so that det A, being the alternate sum of the B jk , is automatically zero. Let us come back to the proof. We have already noted, as in the example, that det A is the sum of all the terms of type a1 j a1k det B jk (ignoring the signs of the terms), where both j and k run through 1 to n with the stipulation that k ! j. To deal with this sum, we pair off, for each pair ( j, k) with j < k, the term a1 j a1k det B jk with the term a1k a1 j det Bk j . As we insist that j < k, this pairing will exhaust all the terms in the sum for det A. On the other hand, B jk and Bk j are obtained from A by removing the same rows and columns, but in different sequence. Therefore, it is clear that B jk and Bk j are equal matrices. (That is why in the example of the 4 × 4 matrix, B12 is equal to B21, etc.). Consequently, to prove that det A is zero as the proposition claims, it suffices to show that the terms we have paired off in the sum for det A, for example, a1 j a1k det B jk and a1k a1 j det Bk j , have different signs. Assume that a1 j a1k det B jk has ‘−’ sign. This negative sign may arise in two different ways in the expansion of det A, and we treat them as separate cases. Suppose that the ‘−’ sign was picked up by a1 j during the row expansion of det A, and hence a1k has the ‘+’ sign during the expansion of det A1 j . In that case, a careful look at Equation (2.12) for the occurrence of signs, we conclude that both j and k are even (note as jth column was removed to get A1 j and j < k, a1k in A1 j is at an odd-numbered place if k is even.) But then, a1k in the expansion of

Saikia-Linear Algebra

102

book1

February 25, 2014

0:8

Systems of Linear Equations

det A, and a1 j in the expansion of det A1k will both pick up the ‘−’sign. For, as j < k, the removal of of the kth column does not affect the sign of a1 j in the expansion of det A1k , as the term remains at an even-numbered place in A1k . Thus, a1 j a1k det B jk and a1k a1 j det Bk j do have different signs in this case. In case a1 j picks up the ‘+’ sign in the expansion of det A, then in the expansion of det A1 j , a1k has to have the ‘−’ sign. So, j has to be odd and since a1k has to be in an even-numbered place in the first row of A1 j , k must also be odd. Consequently, the term containing a1k in the expansion of det A has the ‘+’ sign. However, as j < k, a1 j will still be in an odd-numbered place in the first row of a1k , so a1k a1 j Bk j has the ‘+’ sign in this case. This finishes the argument when a1 j a1k det B jk has ‘−’ sign. When a1 j a1k det B jk has + sign, the desired conclusion can be arrived at exactly in the same manner. We now take up the case when the equal adjacent rows appear after the first row in A which is possible only when n ≥ 3. We will use induction to settle this case. If n = 3, then the second and the third row of A are equal. Therefore, the three minors A11 , A12 and A13 are all 2 × 2 matrices having equal rows so their determinants are zeros by the first case. So, assume that n ≥ 4 and the proposition holds for the determinant of any (n − 1) × (n − 1) matrix satisfying the hypothesis of the proposition. Consider the expansion of det A by the minors of the first row. Now, each of these minors is an (n − 1) × (n − 1) matrix whose two adjacent rows are equal by our assumption and hence their determinants are zeros by the induction hypothesis. It follows that det A is also zero, proving the proposition in this case. ! There are some more results about how determinants are affected by certain conditions on adjacent rows of a matrix. These are presented now. Proposition 2.7.8. Let A ∈ Mn (F). (a) If matrix B is obtained by adding a scalar multiple of a row of A to an adjacent row, then det B = det A. (b) If matrix B is obtained by interchanging two adjacent rows of A, then det B = − det A. Proof. Consider assertion (a) first. Assume that a scalar multiple of a row of A is added to a row above it. To prove assertion (a) in this case, we have to show that if the rows of the matrix A are the row vectors ρ1 , . . . , ρi , . . . , ρn , then for any i, (1 ≤ i ≤ n − 1) and any scalar a, det A = Dn (ρ1 , . . . , ρi + aρi+1 , ρi+1 , . . . , ρn ). Now, as determinant is linear on rows according to Proposition (2.7.4), we have Dn (ρ1 , . . . , ρi + aρi+1 , ρi+1 , . . . , ρn ) = Dn (ρ1 , . . . , ρi , ρi+1 , . . . , ρn ) + a.Dn (ρ1 , . . . , ρi+1 , ρi+1 , . . . , ρn ). The first term on the right-hand side of the equality is the determinant of A whereas the second term is the determinant of a matrix with identical adjacent rows. So, by the preceding proposition, the second term is zero and the assertion follows in this case. The case when a scalar multiple of a row of A is added to a lower row can be settled the same way. For proving assertion (b), we begin by noting, as particular cases of the first assertion, that we can add or subtract a row from an adjacent row of a matrix without changing its determinant. We use this

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

103

observation repeatedly in the following calculations to prove (b): det A = Dn (ρ1 , . . . , ρi , ρi+1 , . . . , ρn ) = Dn (ρ1 , . . . , ρi − ρi+1 , ρi+1 , . . . ρn ) = Dn (ρ1 , . . . , ρi − ρi+1 , ρi , . . . , ρn ) = Dn (ρ1 , . . . , −ρi+1 , ρi , . . . , ρn ) = −Dn (ρ1 , . . . , ρi+1 , ρi , . . . , ρn ) = − det B,

where the last but one equality follows from Proposition (2.7.5).

!

We can now quickly establish the analogous results for arbitrary rows. Theorem 2.7.9.

Let A ∈ Mn (F).

(a) If two rows of A are identical, then det A = 0. (b) If the matrix B is obtained by adding a scalar multiple of a row of A to another row, then det B = det A. (c) If the matrix B is obtained by interchanging any two rows of A, then det B = − det A. Proof. If two rows of A are equal, a few interchanges of adjacent rows will result in these rows becoming adjacents in the matrix C to which A changes with these row operations. By the second assertion of Proposition (2.7.8), det C = ± det A, and by the first, det C = 0, showing that det A = 0. This proves assertion (a). Consider assertion (b). Suppose that the matrix B is obtained by adding a scalar multiple of the jth row of A to its ith row. The idea is to verify that the effect of this single row operation on A is equivalent to the combined effect of certain row operations on adjacent rows of A and matrices obtained from A by these operations. It is clear that by interchanging the jth row and the rows adjacent to it but in between it and the ith row of A (to be precise, in A and the matrices obtained in succession by each of these interchanges), we obtain a matrix C which has the ith row and the jth row of A as adjacent rows. Observe that if k interchanges are needed to produce C from A, then det C = (−1)k det A by assertion (b) of the preceding proposition. Next, we obtain matrix C ' by adding the required scalar multiple of the row of C which is the jth row of A to its ith row (which is still the ith row of A). Since these two rows are adjacent in C, by assertion (a) of the preceding proposition, det C ' = det C. Finally, in C ' , we perform the same interchanges of adjacent rows that were done to rows of A but in reverse order. It is clear that these interchanges will produce the matrix B from C ' , and det B = (−1)k det C ' . It follows that det B = (−1)2k det A = det A. Hence assertion (b). A similar argument will prove assertion (c), and we leave the proof to the reader. ! Determinants and Elementary Matrices Our next task is to translate the foregoing results in terms of determinants of elementary matrices. Recall that there are three types of elementary matrices (see Definitions 2.3.2), and applying an elementary row operation to a matrix is equivalent to left-multiplying it by the corresponding elementary matrix (Proposition 2.3.3). The next proposition is, therefore, just a restatement of the last theorem.

Saikia-Linear Algebra

104

book1

February 25, 2014

0:8

Systems of Linear Equations

Proposition 2.7.10.

Let A ∈ Mn (F).

(a) If E is an elementary matrix of order n over F, corresponding to a row scaling by a scalar a (a ! 0) then det (EA) = a det A. (b) If E is an elementary matrix order n over F,, corresponding to a row exchange, then det (EA) = − det A. (c) If E is an elementary matrix order n over F, corresponding to a row replacement then det (EA) = det A. Taking A in the proposition to be the identity matrix In , whose determinant we have seen in Proposition (2.7.2) to be 1, we derive the following corollary: Corollary 2.7.11.

Let E be an elementary matrix of order n over a field F.

(a) If E corresponds to a row scaling by a (a ! 0) then det E = a. (b) If E corresponds to a row exchange then det E = −1. (c) If E corresponds to a row replacement then det E = 1. Substituting these values of the determinants of the elementary matrices in the preceding proposition, we get the following important result: Corollary 2.7.12. have

Let A ∈ Mn (F). Then, for any elementary matrix E of order n over a field F, we det (EA) = det E. det A.

Thus, if E1 , E2 , . . . , Er are elementary matrices of order n over F, then det (E1 E2 , · · · , Er ) = det E1 det E2 , . . . , det Er . Note that Corollary (2.7.11) also demonstrates that the determinant of any elementary matrix is nonzero. Since any invertible matrix can be written as a product of elementary matrices by Proposition (2.5.6), and since according to the preceding corollary, the determinant of a product of elementary matrices is the product of their determinants, it then follows that the determinant of an invertible matrix cannot be zero. Thus, we have proved one-half of the following important characterization of invertible matrices. Theorem 2.7.13. Let A ∈ Mn (F). A is invertible if and only if det A ! 0. Proof. For the other half, assume that det A ! 0. Now, let R be the reduced row echelon form of A, and let E1 , E2 , . . . , Er be the elementary matrices such that R = E1 E2 , . . . , Er A. Then, repeated application of Corollary (2.7.12) shows that det R = det E1 det E2 , . . . , det Er det A. It follows by hypothesis that det R ! 0. But R, being the reduced row echelon form of a square matrix A, either equals the identity, or has a zero row. (see Proposition 2.4.6). The second possibility cannot occur as det R is non-zero. So, R is the identity matrix, which is possible only if A is invertible by Corollary (2.5.3). ! The corollary preceding this important result is, in fact, a special case of the so-called multiplicative property of the determinant function:

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

Theorem 2.7.14.

105

For any A, B ∈ Mn (F), det (AB) = det A det B.

Proof. We first consider the case when A is invertible. Therefore, we can assume that A is a product of elementary matrices, say, of E1 , E2 , . . . , Er over F. The last corollary then yields that det A = det E1 det E2 , . . . , det Er . Therefore, by repeated applications of Corollary (2.7.12), we obtain det (AB) = det (E1 , E2 , . . . , Er B) = det (E1 (E2 , . . . , Er B)) = det E1 det (E2 , . . . , Er B) = det E1 . det E2 , . . . , det Er . det B = det A. det B. Next assume that A is not invertible, or equivalently that det A = 0. Since A is not invertible, we know that its reduced row echelon form, say, R must have a zero row. But then the product RB also has at least one zero row, forcing det (RB) = 0. Note that R is obtained by left-multiplying A by elementary matrices. Thus, by Corollary (2.7.12), det (RB) is the product of determinants of elementary matrices and det (AB). As elementary matrices have non-zero determinants, we conclude that det (AB) = 0. Since det A = 0, it follows that det A det B and det (AB) are equal, as both are equal to zero. ! This multiplicative property, combined with the fact that the determinant of the identity matrix is 1, gives us a formula of the determinant of the inverse of an invertible matrix. Proposition 2.7.15.

Let A be an invertible matrix in Mn (F). Then det A−1 =

1 . det A

Proof. Note that the formula makes sense, as A is invertible and so det A is a non-zero scalar. For the derivation of the formula, we take the determinant of both sides of the matrix equation AA−1 = In , and then use the multiplicative property of the det function to obtain det A det A−1 = 1. Since det A is non-zero, the proposition follows. ! Another consequence of the multiplicative property of determinants is the following corollary which we record for future reference: Corollary 2.7.16.

For A, B ∈ Mn (F) with B invertible, det (B−1 AB) = det A.

Proof. We leave the proof to the reader.

!

In this connection, we would like to remark that for matrices X and Y, det X det Y = det Y det X as the determinants are scalars, even though the matrix product XY is not equal to YX.

Saikia-Linear Algebra

106

book1

February 25, 2014

0:8

Systems of Linear Equations

It is now time to examine whether our original definition of determinant can be modified. Recall that the Definition (2.12) of determinant of a matrix A was in terms of expansion by minors along the first row. We ask: can we use some other row in place of the first row, say the ith row? To answer the question, given a matrix A with row vectors ρ1 , ρ2 , . . . , ρi , . . . , ρn as its successive rows from the top, we consider the matrix B with row vectors ρi , ρ1 , . . . , ρi−1 , ρi+1 , . . . , ρn as its rows. Thus, whereas the first row of B is the ith row ρi of A, the rest of the rows of B are the rows of A occurring in the same order as in A starting with ρ1 but with the ith row missing. Observe that the minor obtained from A by deleting the ith row and the jth column is that same as the one obtained from B by removing the 1st row and the jth column. Therefore, if Ai j denotes the minor obtained from A by deleting the ith row and the jth column of A, then according to Definition (2.12), we have 1 det B = (−1) j+1 ai j det Ai j (2.18) j

provided ρi = (ai1 , ai2 , . . . , ain ). Next, we find out how det A is related to det B. First note that we can obtain B from A by (i − 1) successive interchanges of rows, in fact, of adjacent rows. Begin by interchanging ρi and ρi−1 in A so that in the new matrix ρi is the (i − 1)th row and ρi−1 is the ith row; next interchange ρi and ρi−2 in the new matrix to make ρi the (i − 2)nd row of the newer matrix, and continue in this manner. Since i − 1 interchanges of adjacent rows produce B from A, and each interchange changes the determinant of the matrix in which this row operation is taking place by a factor of (−1), we see that det B = (−1)(i−1) det A. But note that (−1)2k = 1 for any positive integer k. So, the same relation can be put as det A = (−1)(i−1) det B. Using the value of det B from Equation (2.18), we thus obtain the formula 1 (−1)i+ j ai j det Ai j . (2.19) det A = j

The preceding formula, known as Laplace’s Expansion by minors along the ith row, enables one to expand the determinant of a matrix A by minors Ai j along any fixed ith row. Observe that for i = 1, this gives our original formula (2.12). As we have shown the validity of expansion of a determinant by any row, naturally a question arises: Can the determinant of a matrix be found by expanding along a column? The answer is yes. Starting by defining the determinant of a matrix in terms of the minors of its first column, we can develop all the properties of the determinant function analogous to all the ones we have developed so far but in terms of columns. But a simpler way would be to show that the determinants of a matrix and its transpose are equal. Since the rows of the transpose of a matrix are precisely its columns, it will then be a straightforward task to obtain the properties of the determinant function in terms of columns. We begin by showing that the determinants of an elementary matrix and its transpose are the same. Lemma 2.7.17.

Let E be an elementary matrix in Mn (F). Then det E = det E t .

Proof. If E is an elementary matrix, corresponding to either a row scaling or a row exchange then it is clear that E t = E (see Definition 2.3.2) so there is nothing to prove. So assume that E is the matrix obtained by adding c times the jth row of In to its ith row, then E = In + cei j , where ei j is the unit matrix of order n having 1 at its (i, j)th place and zeros elsewhere. It is then clear that E t = In + ce ji , which corresponds to the row operation of adding c times the ith row to the jth row. Therefore, by the third part of Corollary (2.7.11), det E = det E t . !

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

107

Recall that if E is an elementary matrix corresponding to some elementary column operation, then E can also be produced from In by some elementary row operation (see discussion after Definition 2.3.1 of elementary matrices), the lemma is also valid such an E. The lemma is crucial for the proof of the next result. Proposition 2.7.18.

Let A ∈ Mn (F). Then,

det A = det (At ),

where At is the transpose of A. Proof. Let R be the reduced row echelon form of A with r pivots, where 1 ≤ r ≤ n (if r = 0 then R and so A must be the zero matrix and so there is nothing to prove). Thus, there are elementary matrices E1 , E2 , . . . , El , corresponding to elementary row operations which reduce A to R such that R = El , . . . , E2 E1 A. We claim that suitable column operations will further reduce R to ( ' Ir 0 . S= 0 0 If r = 1, R itself has the form of S and so we may assume that r ≥ 2. It is clear that by adding suitable multiples of pivot columns of R to non-pivot columns, we can make all the non-zero entries of R, except the pivots, zeros. Therefore, the only non-zero entries of the matrix obtained after these column operations are the pivots of R and so suitable column exchanges will reduce it to S . Hence our claim. By our claim there are elementary matrices F 1 , F2 , . . . , Fm , corresponding to elementary column operations, which reduced R to S , such that S = El · · · E2 E1 AF1 F2 · · · Fm .

(2.20)

Since S T = S , taking transposes of both sides of the preceding relation, one obtains S = F m t · · · F 1 t At E 1 t · · · E l t .

(2.21)

Therefore, taking determinants of the two matrix products expressing the same matrix S , given in Equations (2.20) and (2.21) results in the following equality (det El ) · · · (det E2 )(det E1 )(det A)(det F1 ) · · · (det Fm ) = (det Fm t ) · · · (det F1 t )(det At )(det E1 t ) · · · (det El t ).

(2.22)

as determinant is multiplicative. By Corollary (2.7.11), the determinant of an elementary matrix is non-zero. Also we have just seen that the determinants of an elementary matrix and its transpose are equal. Thus, equation (2.22) (note that all the determinants in the equality are scalars in F) implies that det A = det At . ! Let us examine why the last result allows us to compute a determinant of a matrix A = [ai j ] by expanding it by minors along any column. Denote the transpose At of A by B = bi j so bi j = a ji for all i and j. Observe that the minor B ji obtained from B by crossing out the jth row and the ith column of B is the transpose of the minor Ai j obtained by removing the ith row and the jth column of A. It follows from the preceding proposition that det B ji = det Ai j . Now, according to Equation (2.19), we can find

Saikia-Linear Algebra

108

book1

February 25, 2014

0:8

Systems of Linear Equations

the determinant of B by expanding along, say, the jth row: 1 det B = (−1) j+i b ji det B ji . i

However, det B = det A, b ji = ai j and det B ji = det Ai j , so that the preceding equation can be re-written as 1 det A = (−1)i+ j ai j det Ai j . (2.23) i

Note that the last sum is over dummy index i, so that the sum runs over the entries of A along the jth column as i varies. In other words, this gives us the expansion of the determinant of A in minors along any fixed column of A. Apart from helping to derive the explicit expansion of the determinant of a matrix along any of its columns, the preceding proposition also allows us to translate all the properties of the determinant function determined by conditions on rows of matrices (which have been derived so far) to properties determined by analogous conditions on columns. We gather all such properties in the following theorem whose easy proof is left to the reader. Theorem 2.7.19. Let A ∈ Mn (F). (a) (b) (c) (d) (e)

The determinant function is linear on columns of A. If any column of A is a zero column, then det A = 0. If two columns of A are identical, then det A = 0. If a multiple of a column of A is added to another column, the determinant is unchanged. If two columns of A are interchanged, the determinant changes by a factor of −1.

A final remark about evaluation of determinants of specific matrices. More often than not, elementary row or column operations are performed on a given matrix to bring it to a form where it is possible to use some result to compute its determinant directly without using the expansion by minors. For example, if it is possible to reduce the matrix to an upper triangular matrix, then its determinant can be computed simply by multiplying its diagonal elements. In case expansion by minors along a row or a column has to be employed, one makes sure that row or column has plenty of zeros by appropriate row or column operations. However, if elementary row or column operations are used on a given matrix to simplify the computation of its determinant, then one must keep track of the changes of the determinant of the original matrix due to these operations. Cramer’s Rule To end this section, we apply the idea of determinants to give explicit solution of a system of equations Ax = b in case it has a unique solution. This explicit formula is known as Cramer’s Rule. It should be mentioned that this rule has little practical use as it involves prohibitive number of calculations. However, its theoretical importance requires that we are familiar with the rule. We need a bit of notation first. Given a system of n equations in n variables Ax = b over a field F, we let A( j) for any j, (1 ≤ j ≤ n), to be the n × n matrix obtained from the coefficient matrix A by replacing its jth column by the column vector b. In other words, if γ1 , γ2 , . . . , γn are the n column vectors of a matrix A, and if we describe A as / 0 A = γ1 . . . γ j . . . γn ,

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

then for any fixed j,1 ≤ j ≤ n,

/ A( j) = γ1

Thus, for

we have, for example,

  3  A =  2  −1  4  A(1) = 1  0

1 0 3

...  −4  6  5

1 0 3

 −4  6  5

and

b

...

and

109

0 γn .

  4   b = 1 ,   0

  3  A(3) =  2  −1

1 0 3

 4  1 , etc.  0

Now, we can state the rule that explicitly gives the solution of a system of equations, provided the determinant of the coefficient matrix is non-zero. Theorem 2.7.20. (The Cramer’s Rule) Let Ax = b, be a system of n equations in n variables over a field F. If det A ! 0, then a unique solution of the system of equations exist whose components are given by xj =

det A( j) det A

for any j, 1 ≤ j ≤ n. Note that the standard convention we are following is that the components of the n×1 column vector x are the variables x1 , x2 , . . . , xn . However, in the formula for the Cramer’s Rule, the x j actually stands for the components of the solution vector. We have already seen that the condition det A ! 0 is a necessary and sufficient for the system Ax = b to have a solution, and by an abuse of language, we will refer to the components of the solution (they are now actually scalars) by x j . This is more in keeping up with the conventional ways of stating Cramer’s Rule. Proof. Since, by Theorem (2.7.19), the determinant function is linear on columns, we get, for any fixed j, 0 / x j γ j . . . γn , x j det A = det γ1 . . .

where γi is the column vector consisting of the entries of the ith column of A. Also, by the same result (2.7.19), adding a multiple of a column to a fixed column does not change a determinant. Therefore, we may add to the jth column γ j of the preceding determinant the multiples x1 γ1 , x2 γ2 , . . . , xn γn of all the other column vectors without changing the determinant. But in that case the jth column vector 4 of x j det A will be the sum i xi γi where the sum runs from 1 to n. On the other hand, as we are assuming that the xi are the components of the solution of Ax = b, interpreting this matrix equations in terms of the column vectors using the column-row rule for multiplication of matrices, we obtain n 1 i=1

xi γi = b.

Saikia-Linear Algebra

110

book1

February 25, 2014

0:8

Systems of Linear Equations

Consequently, the jth column of the determinant x j det A can be replaced by the column vector b whence 0 / b . . . γn . x j det A = det γ1 . . . Thus, x j det A is precisely the determinant of the matrix A( j) introduced at the beginning of this section. Since det A ! 0, the formula in the theorem follows. !

The Cramer’s Rule can also be derived from a certain explicit formula for the inverse of an invertible matrix, although it must be admitted that like the Cramer’s Rule, this formula too is not very practical. We need a definition before stating the formula. Definition 2.7.21. Let A ∈ Mn (F). For any i and j, 1 ≤ i, j ≤ n, let Ai j be the minor produced by deleting the ith row and the jth column of A, and let ∆i j = (−1)i+ j det Ai j . The (classical) adjoint of A, denoted by ad j(A), is defined to be the n × n matrix, whose (i, j)th entry is ∆ ji . In other words, ad j(A) = [ ∆i j ]t , where the superscript t denotes the transpose. ' ( 1 2 For example, for A = , we have 3 4 ∆11 = 4, ∆12 = −3 ∆21 = −2, ∆22 = 1 so that ad j(A) =

'

4 −3

( −2 . 1

Observe that A · ad j(A) =

' −2 0

( 0 = (−2)I2. −2

It is no accident that −2 is the determinant of the original matrix A. It follows that A−1 = ad jA/ det A. This relation actually holds for any arbitrary invertible matrix and is a consequence of the next theorem. Theorem 2.7.22. Let A ∈ Mn (F). Then ad j(A) · A = det A · In = A · ad j(A) so that if A is invertible, A−1 =

ad j(A) . det A

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

111

Proof. Let [ci j ] be the product (ad j(A))A. Then, according to Definition (2.6.2), a typical entry of the product is given by ci j = =

n 1

k=1 n 1 k=1

∆ki ak j (−1)k+i ak j · det Aki .

It is clear from the form of the formula that ci j is the expansion by minors along the ith column of the determinant of some matrix. Since the minors in the sum are Aki , where i is fixed and k varies, we see that the matrix must be the one obtained from A by replacing the ith column by the jth column of A itself. This new matrix, thus, has two identical columns if i ! j. If i = j, then the matrix is A itself, and so cii = det A. Stated differently, the product (ad j(A))A is a diagonal matrix as off-diagonal entries ci j are zeros whereas with each diagonal entry cii equal to det A. It follows that (ad j(A))A is the scalar matrix det A · In . A similar calculation shows that A · ad j(A) is the same scalar matrix. !

EXERCISES 1. Determine whether the following statements are true or false giving brief justifications. All given matrices are square matrices over an arbitrary field F. (a) If two rows, or two columns, of a matrix A are identical, then det A = 0. (b) If det A = 0, then at least two rows or two columns of A are identical. (c) For any elementary matrix E, det E = − det E t .

(d) If B is row equivalent to A, then det A = det B. (e) If A is an upper triangular matrix, then det A is the product of the diagonal entries of A. (f) For any matrix A, det (−A) = − det A.

(g) The determinant a matrix in reduced row echelon form is either 1 or 0. (h) If det A of a matrix A in Mn (F) is non-zero, then the reduced row echelon form of A must be In . (i) For an invertible matrix A, det A = det (A−1 ). (j) For any two matrices A and B of the same order, det (AB) = det (BA). (k) For any two matrices A and B of the same order, det (A + B) = det B + det A. (l) If A ∈ Mn (R) has integer entries, then det A is an integer. (m) If, for A ∈ Mn (R), det A is an integer, then A has integer entries. (n) If A ∈ Mn (R) has positive entries, then det A is also positive. (o) If A = −At , then det A is zero.

(p) If det A = 0, then the system of equations Ax = 0 has more than one solution.

2. Prove assertion (b) of Proposition (2.7.4). 3. Prove, by induction, that if any column of a matrix A ∈ Mn (F) is a zero column, then det A = 0. 4. Prove assertion (c) of Theorem (2.7.9).

Saikia-Linear Algebra

112

book1

February 25, 2014

0:8

Systems of Linear Equations

5. Prove Corollary (2.7.16). 6. Prove Theorem (2.7.19). 7. For an elementary matrix E over a field F, show directly that det E = det (E t ), that is, without invoking Proposition (2.7.18). 8. Show that for any A ∈ Mn (F) and any scalar c ∈ F, det (cA) = cn det A. A matrix A ∈ Mn (F) is called a nilpotent matrix if Ak is the zero matrix for some positive integer k. 9. For any nilpotent A ∈ Mn (F), show that det A = 0. 10. Let A ∈ Mn (F). Assume that 2 ! 0 in F. (a) If A = −At , then show that A cannot be invertible if n is odd. (b) Assume that F = R. If AAt = In , then show that det A = ±1.

11. Let F be a field. Show that det : Mn (F) → F is an onto function which is not one–one. Prove, further, that det is a homomorphism from the multiplicative group GLn (F) of invertible matrices in Mn (F) onto the multiplicative group of non-zero elements of F. 12. Evaluate the determinants of the following matrices over R.       3 −1 2 3 1 a a2  0 −1      0 2 3,  3 4 0, 1 b b2        2 1 −4 0 −2 1 0 1 c c 13. Find the determinant of the following real matrix:  1 a  0  0 0  0

1 1 0 0 0 0

0 0 a 1 0 0

0 0 0 a 0 0

0 0 0 0 a 1

 0  0 0  0  1  1

. 14. Let A ∈ Mn (R) having a ∈ R in the diagonal, 1 along the subdiagonal as well as along the superdiagonal and zeros everywhere else. Evaluate det A. 15. Use the Cramer’s Rule to find solutions of the following system of equations over any field F: ax1 + x2 = b1 x1 + ax2 + x3 = b2 x2 + ax3 = b3 ,

5 √ 6 a! 2

where a is a non-zero and b1 , b2 and b3 are arbitrary elements of F. Hence or otherwise, find the inverse of the matrix   a 1 0 1 a 1.   0 1 a

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Determinant

16. Solve the following system of equations over R by using the Cramer’s Rule:       2 −1 −1  x1  b1       2 −1  x2  = b2 , −1     b3 −1 −1 2 x3

113

where b1 , b2 and b3 are arbitrary real numbers. 17. Use the Cramer’s Rule to find solutions of the following system of equations over any field F:      0 a b  x1  0 a b a  x2  = a,       b a 0 x3 b where a and b are non-zero elements of F.

Saikia-Linear Algebra

3

book1

February 25, 2014

0:8

Vector Spaces

3.1 INTRODUCTION The set of m × n matrices, say with real entries, is an example of a mathematical structure with two operations, similar to addition and scalar multiplication of matrices, which satisfy the same basic properties as the matrix operations do. Such structures are known as vector spaces. Examples of vector spaces and their applications are found in every conceivable area of today’s theoretical and technical disciplines. Because these structures occur in such diverse areas, it makes sense to study vector spaces in abstract without referring to specific entities so that a single theory is available to deal with all of them. Before stating the formal definition of a vector space, we have a brief look at a few examples of such spaces. One of the most important example is a special case of the matrices, namely, the set Rn of all ordered n-tuples of real numbers. As we have seen in the very first chapter, elements in Rn can be viewed either as n-dimensional row vectors or as n-dimensional column vectors depending on our convenience. Vectors of Rn can be added and multiplied by a real number component-wise; the resultant vectors will still be in Rn :       a1  b1  a1 + b1 a + b  a2  b2    +   =  2 2 . .  .   .   ...   .   .    an + bn bn an     a1  ca1  a  ca   2   2 c  .  =  .  . .  .   ..      can an

Rn is a vector space as Rn with respect to these operations, called addition and scalar multiplication, satisfy the properties given in Propositions (1.3.1) and (1.3.3). If, instead, we consider Rn as the collection of n-dimensional row vectors, the vector space operations will be similarly performed


component-wise as follows: (a1 , a2 , . . . , an ) + (b1, b2 , . . . , bn ) = (a1 + b1 , . . . , an + bn ) c(a1 , a2 , . . . , an ) = (ca1 , ca2 , . . . , can ). For the next example of vector spaces, we have chosen the collection of all directed line segments in the plane (or in the three-dimensional space). These are probably the first vectors the reader had encountered in college mathematics. As explained in geometry courses • Two line segments are equal, if they have the same length and direction. • Two line segments are added by the Parallelogram law. • The scalar multiple of a line segment u by a real number a is the line segment whose length is |a| times the length of the original one, and whose direction is the same as that of u if a > 0, and opposite to that of u otherwise. The reader may recall that the basic properties of the two operations of addition and scalar multiplication of line segments are exactly the same as those of matrix operations given in Propositions (1.3.1) and (1.3.3), and can be verified by geometrical arguments. For example, a line segment having length zero, that is, a point is clearly the additive identity, playing the role of the zero matrix here. Again, given a line segment u, the line segment v having the same length as u, but being in the opposite direction acts as the additive inverse of u, for u + v is clearly the segment of zero length, that is, the zero segment. The set of directed line segments, thus, is a real vector space, as the scalars are real numbers. As our third example, we consider the set of all real-valued functions on some closed interval [a, b]. As we will see later, such functions can be added and their scalar multiples can be defined in a very natural manner. For example, x2 + sin x and 3 exp x need no explanations. What is important is that the properties satisfied by these operations on functions are the same as the ones satisfied by matrices, or by ordered n-tuples of real numbers. In view of these examples, it will be convenient to have the concept of a vector space in abstract without specifying the nature of the elements in it. To be more specific, we will define a vector space to be a set satisfying certain rules (axioms) with respect to certain operations and derive properties satisfied by the space solely on the basis of these axioms.

3.2 BASIC CONCEPTS
As in the earlier chapters, a reader who is not familiar with the concept of a field may take it as the set of either the real numbers or the complex numbers with the usual addition and multiplication.

Definition 3.2.1. Let F be a field and V a non-empty set with two operations:

(a) addition, which is a rule by which each pair of elements u, v in V is associated with a unique element u + v ∈ V, called the sum, (b) scalar multiplication, which is a rule by which each u ∈ V for any a ∈ F is associated to a unique element au ∈ V, called the scalar multiple.

We say that V is a vector space over the field F if the following axioms hold: (i) For all u, v and w in V,


(a) u + v = v + u;
(b) (u + v) + w = u + (v + w);
(c) There is an element 0 ∈ V such that u + 0 = u = 0 + u;
(d) For each u ∈ V, there is an element −u ∈ V such that u + (−u) = 0 = (−u) + u;
and
(ii) for all u, v in V and all scalars a, b in F,
(a) a(u + v) = au + av;
(b) (a + b)u = au + bu;
(c) (ab)u = a(bu);
(d) 1u = u, where 1 is the multiplicative identity in F.
The four axioms in (i) show that a vector space V is an abelian group under addition. In other words, a vector space V over a field F is an abelian group with respect to addition satisfying, additionally, the axioms in (ii) for multiplication by scalars in F. The elements of a vector space are usually called vectors; the vector 0, whose existence is assured in the third axiom for addition, is called the zero vector. The zero vector in a vector space is also known as the additive identity, and the vector −u as the additive inverse of u. Note our convention that vectors are denoted by bold lower case letters and scalars by plain lower case letters. If F = R, we say that V is a real vector space and if F = C, V is a complex vector space. In general, if V is a vector space over a field F, then the elements of F are called scalars.
It is useful to have the notion of subtraction in a vector space V; for any u, v ∈ V, we let

u − v = u + (−v).    (3.1)

Thus v − v = 0. It follows that, by subtracting v from both sides of the relation u + v = w + v, we obtain u = w.
The following properties are easy consequences of the definition of a vector space.
Proposition 3.2.2. For a vector space V over a field F, the following assertions hold:
(a) The zero vector 0 of V is unique.
(b) The negative or the additive inverse −u in V of any vector u ∈ V is unique.
(c) a0 = 0 for any scalar a ∈ F.
(d) 0u = 0 for any vector u ∈ V.
(e) (−a)u = −(au) for any u ∈ V and any a ∈ F.
(f) For a non-zero vector u ∈ V, au = 0 for a scalar a implies that a = 0.

We should point out that the bold-faced zero denotes the zero vector, and the plain zero denotes the scalar zero.


Proof. To prove (a), assume that there is another vector 0′ ∈ V such that 0′ + v = v = v + 0′ for all v ∈ V. In particular, taking v = 0, we get 0′ + 0 = 0. On the other hand, since 0 is an additive identity, 0′ + 0 = 0′. It follows that 0′ = 0, proving the uniqueness of 0.
Similarly, to prove (b), assume that for a v ∈ V, there is some u ∈ V such that v + u = 0 = u + v. Now as (−v) + v = 0, one has the following:
u = 0 + u = (−v + v) + u = −v + (v + u) = −v + 0 = −v.
This proves the uniqueness of the additive inverse of any v ∈ V.
Next observe that by the distributive law for scalar multiplication, for any u ∈ V and a ∈ F, one has au = a(u + 0) = au + a0. So by subtracting the vector au from both sides, we conclude that a0 = 0, which is (c). On the other hand, by subtracting au from both sides of the relation au = (a + 0)u = au + 0u, we obtain 0u = 0, which is (d).
Now for any a ∈ F, a + (−a) = a − a = 0 in F. Therefore, au + (−a)u = (a + (−a))u = 0u = 0 by (d). By the uniqueness of the additive inverse, it then follows that −(au) = (−a)u, which proves (e).
For the final assertion, recall that any non-zero scalar a ∈ F has a multiplicative inverse a−1 in F such that aa−1 = 1. Therefore, if au = 0 for a non-zero scalar a, then multiplying the equality by a−1 and using the last two axioms for scalar multiplication, we see that u = 0. Thus, if au = 0 holds for a non-zero vector u, then a = 0 in F. This completes the proof.
Note that (c) can also be seen as follows: for any a ∈ F and v ∈ V, −(av) is the additive inverse of av, so by the distributive law for scalar multiplication, a(v − v) = av − av = 0. It follows that a0 = a(v − v) = 0.

Subspaces Before discussing examples of vector spaces, let us introduce the important idea of a subspace. A subspace W of a vector space V is a non-empty subset of V which is a vector space on its own with respect to the operations of V. In other words, a subspace is actually a vector space within a larger vector space with its laws of compositions the same as the ones in the larger one but restricted to it. It is easy to see that a subset of a given vector space will form a vector space on its own if it satisfies only the three conditions stated in the next result. Proposition 3.2.3. A non-empty subset W of a vector space V is a subspace (with respect to addition and scalar multiplication of V restricted to it) if (a) W is closed with respect to addition, that is, for any u and v ∈ W, the sum u + v ∈ W; (b) W is closed with respect to scalar multiplication, that is, for any u ∈ W, and a ∈ F, the scalar multiple au ∈ W; (c) the zero vector of V is in W. Proof. Conditions (a) and (b) imply that the operations of V are also valid as operations within W. So W, whose elements are vectors in V too, does satisfy all the axioms of a vector space except possibly the ones that stipulate the existence of additive identity (zero) and additive inverse. However, condition (c) ensures that W has the zero vector. Thus, the only point we have to check is that for any u ∈ W, its inverse −u in V actually lies in W. But by condition (b), (−1)u ∈ W and (−1)u = −u as u is a vector in a known vector space V.


Actually, the third condition is superfluous as W is non-empty. For any given vector in W, the second condition ensures that its inverse is in W so that the first condition implies that the zero vector is in W. ! Let us discuss some examples of vector spaces. Note that the subspaces of any vector space also provide us with examples of vector spaces. EXAMPLE 1

For any vector space V, there are always two subspaces, namely, the whole space V itself, and the singleton {0} consisting of the zero vector of V, known as the zero subspace of V. The three conditions defining a subspace in both the cases are trivially satisfied. These two subspaces are known as the trivial or the improper subspaces of V. Any subspace of V, other than these two, is known as a proper subspace of V.

EXAMPLE 2

Let K be a subfield of a field F (see Section 1.7 for the definition of subfields). F, being a field, is an abelian group with respect its addition. Multiplication of elements of F by elements of K can be considered as scalar multiplication on F, which, by the properties of multiplication in F, trivially satisfies all the axioms for scalar multiplication. Thus, F is a vector space over any of its subfields. In particular, F is a vector space over itself. Since the field R of real numbers is a subfield of the field C of complex numbers, C is a vector space over R. Similarly both R and C are vector spaces over the field Q of rational numbers.

EXAMPLE 3

The basic model for a vector space for us, as we have mentioned earlier, is the real vector space Rn , whose elements are n-dimensional column (or n-dimensional row) vectors with real entries. As we have seen in the introduction to this chapter, the addition as well as the scalar multiplication in Rn are performed (and equality defined) component-wise. That Rn is an abelian group under addition, and that Rn is actually is a vector space over R are consequences of the corresponding results in Propositions (1.3.1) and (1.3.3) for matrices. Nevertheless, the reader should verify the vector space axioms for Rn directly. Two special cases of Rn , namely, the plane R2 and the three-dimensional space 3 R , are the easiest spaces to visualize and so will play an important role in the rest of this book.

EXAMPLE 4

The subspaces of Rn , in particular of R2 and R3 , are important geometrical objects. In fact, the subspaces of R2 and R3 are best described geometrically. Note that R2 = {(x1 , x2 ) | xi ∈ R} so that we can think of R2 as the plane with a fixed coordinate system whose points are identified through their coordinates with the ordered pairs in R2 . Thus, the zero subspace of R2 is the singleton formed by the origin (0, 0). Now, any line passing through the origin has the equation of the form y = mx, and so the subset {(x, mx) |


x ∈ R} of R2 does describe the line. It is easy to verify that {(x, mx) | x ∈ R} for a fixed real number m is indeed a subspace of R2 . We will show later that any proper subspace of R2 must be such a straight line in the plane through the origin. Similarly, the proper subspaces of R3 are either the straight lines through the origin or the planes containing the origin. Sometimes for geometrical considerations, instead of identifying a row vector x = (x1 , x2 ) (or the column vector x = (x1 , x2 )t ) in R2 with the point (x1 , x2 ) in the plane, we think of x as representing the directed line segment from the origin (0, 0) to the point (x1 , x2 ). This representation works effectively as vector addition corresponds precisely to the addition of line segments by the Parallelogram law: if L1 is the line segment from (0, 0) to (x1 , x2 ) and L2 the segment from (0, 0) to (y1 , y2 ), then L1 + L2 is the line segment from (0, 0) to the point (x1 + y1 , x2 + y2 ). Such identification of vectors with line segments enables geometrical problems to be discussed in terms of vectors. EXAMPLE 5

For any field F, we can consider the set Fn of all n-tuples of scalars from F. The components being field elements, we can define equality and addition as well as scalar multiplication component-wise in Fn in terms of addition and multiplication in F exactly the way they were defined for Rn . With respect to these operations, Fn becomes a vector space over F (see details of these operations in Section 3 of Chapter 1). An important example is the complex vector space Cn of all n-tuples of complex numbers with respect to operations defined component-wise. The elements of Fn (so of Cn too) can be considered either n-dimensional column or n-dimensional row vectors. Usually a column vector shall be written as the transpose (a1 , a2 , . . . , an )t of a row vector.

EXAMPLE 6

Every homogeneous system of equations Ax = 0, where A is a fixed matrix in Mm×n (F) and x is an n-dimensional column vector of variables, provides us with a subspace of Fn (here, we consider Fn as the space of column vectors over F). We claim that the n-dimensional column vectors in Fn , which are solutions of Ax = 0, form a subspace of Fn , called the solution space of the given system of equations. For, by the rules of matrix multiplication, A(y + z) = Ay + Az and Acy = cAy, so that the sum of any two solutions and any scalar multiple of a solution are again solutions of the system of equations Ax = 0. Since the zero vector is always a solution of Ax = 0, our claim follows from Proposition (3.2.3). Note that if A is an invertible matrix, then the solution space of Ax = 0 is none other than the zero subspace of Fn .
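As a numerical companion to this example, and not part of the text, the sketch below takes a sample matrix A of our own choosing, computes a basis of its solution space from the singular value decomposition, and spot-checks that sums and scalar multiples of solutions are again solutions; this is an illustration on one example, not a proof.

```python
import numpy as np

# A sample 2 x 4 real matrix; its solution space is a subspace of R^4.
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  3.0]])

# Right-singular vectors beyond rank(A) span {x : Ax = 0}.
_, s, vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_basis = vh[rank:]

y, z = null_basis[0], null_basis[1]      # two particular solutions
c = 2.5
print(np.allclose(A @ (y + z), 0))       # sum of two solutions is again a solution
print(np.allclose(A @ (c * y), 0))       # scalar multiple of a solution is a solution
```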

EXAMPLE 7

It is easy to check that Propositions (1.3.1) and (1.3.3) of Chapter 1 imply that the set of matrices Mm×n (F) or Mn (F) are both vector spaces over the field F. There are numerous subspaces of these two vector spaces. Some interesting ones of the vector space of square matrices Mn (F) are listed below. (a) The lower triangular matrices; (b) The upper triangular matrices;


(c) The diagonal matrices; (d) The scalar matrices; (e) The set of all matrices whose some specified entries are zeros. For example, if all off-diagonal entries are zero, then the corresponding set is the subspace of diagonal matrices. Also note, as another example, that the subset of all matrices in Mn (F) whose first rows are zero is a subspace of Mm×n (F). EXAMPLE 8

Let X be a non-empty subset of either R or the set of natural numbers N, and let X R be the set of all functions from X into R. Thus, X R = { f | f : X → R}. First, we define equality in X R . For any f, g ∈ X R , we let f = g if and only if f (x) = g(x) for all x ∈ X. We define the sum f + g of elements f, g ∈ X R and the scalar multiple a f for any a ∈ R point-wise, that is, by giving their values at any x ∈ X as follows: ( f + g)(x) = f (x) + g(x) (a f )(x) =a f (x), where the sum f (x) + g(x) and the product a f (x) are computed as real numbers. One easily verifies that X R with these laws of compositions is a real vector space. Note that verification of the vector space axioms is straightforward because of the properties of addition and multiplication of real numbers. The zero vector here is the function z defined to satisfy z(x) = 0 for every x ∈ X. Consequently, the inverse of any f ∈ X R is the function − f given by (− f )(x) = − f (x). Consider the special case when X = [a, b], a closed interval in R. The set of all continuous functions as well as the set of all differentiable functions from [a, b] into R form two interesting subspaces of X R . We will denote the real vector space of all real-valued continuous functions on a closed interval [a, b] as C[a, b].

EXAMPLE 9

For any field F, a sequence in F can also be considered as a function f from the natural numbers N into F. It is a convention to describe such a sequence f such that f (n) = an for n = 1, 2, . . . by the symbol {an }. We can define operations on the collection V of all sequences in F term-wise: {an } + {bn } = {an + bn } and c{an } = {can }. Since the terms of the sequences are field elements, these term-wise operations make V into a vector space over F. Note that the zero of this vector space has to be the constant sequence whose every term is the zero of the field.

EXAMPLE 10 Let R[x] be the collection of all polynomials with real coefficients. Take the laws of compositions in R[x] to be the usual addition of polynomials and the multiplication of a polynomial by a real number. With these operations, R[x] is a real vector space.


Observe that for any non-negative integer n, the set Rn [x] of all polynomials of degree at most n is a subspace of R[x]. We point out that polynomials can be considered over any field; the coefficients of such a polynomial are scalars from the field. It is clear that operations defined in the same way as in R[x] will make the set F[x] of such polynomials a vector space over F. Sum of Subspaces For the following result, which produces new subspaces from known ones, we need a new notation: if A and B are non-empty subsets of a vector space V, then the sum A + B = {a + b | a ∈ A, b ∈ B} is the collection of all possible sums of the vectors of A and B. Proposition 3.2.4. Let W1 and W2 be two subspaces of a vector space V over a field F. Then the intersection W1 ∩ W2 and the sum W1 + W2 are subspaces of V. Proof. Leaving the case of the intersection to the reader, we consider that of the sum. Since the zero vector of V belongs to both the subspaces, it is in W1 + W2 too. Next, let u, v ∈ W1 + W2 . By definition, there are elements wi , w'i ∈ Wi such that u = w1 + w2 and v = w'1 + w'2 . So, u + v = (w1 + w2 ) + (w'1 + w'2 ) = (w1 + w'1 ) + (w2 + w'2 ), which is clearly in W1 + W2 . Now, if a ∈ F, then au = a(w1 + w2 ) = aw1 + aw2 is in W1 + W2 as W1 and W2 are closed with respect to scalar multiplication. Thus, by Proposition (3.2.3) W1 + W2 is a subspace of V. ! We must point out that the union W1 ∪ W2 need not be a subspace. See Exercise 9 of this section. There is another natural way of constructing subspaces of a given vector space. We begin by the simplest of such construction. EXAMPLE 11 Fix any arbitrary vector v in a vector space V over a field F. Then, by Proposition (3.2.3), one sees easily that the set {av | a ∈ F} of all scalar multiples of v is a subspace of V. It is the subspace of V generated or spanned by v. If W1 and W2 are subspaces of V generated by vectors v1 and v2 , respectively, then it is clear that the sum W1 + W2 is the collection of all sums of the type a1 v1 + a2 v2 . Since such sums of scalar multiples of vectors, known as linear combinations of vectors, are essential in examining vector space structures, we discuss them now. Let V be a vector space over a field F. Given a set of vectors v1 , v2 , . . . vm in V, a linear combination of the given vectors is a sum of the type 1 a i vi = a 1 v1 + a 2 v2 + · · · + a m vm . i

where a1 , a2 , . . . , am are arbitrary scalars from F. Choosing all the ai to be zeros, we see that the zero vector can be realized as a linear combination of any set of vectors. A single vector v j can also be thought of as a linear combination of v1 , v2 , . . . , vm by choosing a j = 1 and all the other ai to be zeros.
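Since deciding whether a given vector is a linear combination of others amounts to solving a linear system, here is a brief numerical sketch (ours, with sample vectors of our own choosing) that tests membership in a span by least squares.

```python
import numpy as np

def in_span(vectors, v, tol=1e-10):
    """Return True if v is a linear combination of the given vectors (all in R^m)."""
    A = np.column_stack(vectors)                  # columns are the spanning vectors
    coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
    return np.allclose(A @ coeffs, v, atol=tol)

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0])
print(in_span([v1, v2], np.array([2.0, 3.0, 3.0])))   # True:  2*v1 + 3*v2
print(in_span([v1, v2], np.array([0.0, 1.0, 0.0])))   # False: not in the plane spanned by v1, v2
```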


We have already come across linear combinations of vectors in preceding chapters. For example, in Propositions (1.3.7) and (1.3.8), we had expressed arbitrary matrices as linear combinations of unit matrices.
Note: If W is a subspace of a vector space, any linear combination of a finite number of vectors of W is a vector in W.
Linear combinations of vectors from a set of vectors give rise to a subspace known as the linear span of the vectors.
Definition 3.2.5. Given a set S = {v1 , v2 , . . . , vm } of finitely many vectors of a vector space V over a field F, the linear span ⟨S⟩ of S (over F) is the set of all possible linear combinations of vectors of S , that is,

⟨S⟩ = {a1 v1 + a2 v2 + · · · + am vm | ai ∈ F} = { ∑i ai vi | ai ∈ F }.

If S has infinitely many vectors, then ⟨S⟩ is defined to be the collection of all the linear combinations of all possible finite subsets of S . We may rephrase the definitions for the two cases as a single assertion as follows: ⟨S⟩ is the set of all finite linear combinations of vectors of S . If S = {v}, then it is clear that ⟨S⟩ is the set of all scalar multiples of v which includes v as well as the zero vector, and as we had seen in the preceding example, is a subspace. Let us see some more examples.
EXAMPLE 12 If S is a singleton {u} in R2 , then its linear span is the set {au} where a ranges over R. Geometrically, the span is the straight line through the origin and the point u in the plane R2 . For example, if we choose u = (1, 0), then its span is {(a, 0)} which is the x-axis. Similarly, the linear span of the unit vectors {(1, 0), (0, 1)} is the set {a(1, 0) + b(0, 1)} = {(a, b)} for arbitrary reals a and b, and so must be the whole plane R2 .
EXAMPLE 13 In the same manner, we can see that if u and v are two vectors in R3 , then their linear span will be the plane through the origin containing the two vectors. Thus, if u = (1, 0, 0) and v = (0, 1, 0), then their span is the xy plane.
EXAMPLE 14 Consider the set {1, x, x2 } of vectors in the real vector space R[x]. The linear span of this set is clearly {a + bx + cx2 | a, b, c ∈ R}, and thus is the subspace of all polynomials of degree at most 2. Consider next the infinite set S = {1, x, x2 , . . . , xn , . . . } consisting of all the non-negative integral powers of x in R[x]. Any linear combination of finitely many powers of x will be in ⟨S⟩. Moreover, we can choose these powers from S arbitrarily. Therefore, any polynomial, irrespective of its degree and number of terms, can be


thought of as a linear combination of finitely many suitable powers from S , and so belongs to ⟨S⟩. It follows that ⟨S⟩ = R[x].
In each of these examples, the linear span of a set of vectors turns out to be a subspace. This is true in general. This and other basic facts about linear spans of vectors are collected in the next proposition.
Proposition 3.2.6.

For a non-empty subset S of a vector space V, the following hold:

(a) The linear span ⟨S⟩ of S is a subspace of V. In fact, ⟨S⟩ is the smallest subspace of V containing S .
(b) ⟨⟨S⟩⟩ = ⟨S⟩.
(c) If T is another non-empty subset of V such that T ⊂ S , then ⟨T⟩ ⊂ ⟨S⟩.
(d) ⟨S ∪ T⟩ = ⟨S⟩ + ⟨T⟩, where the symbol + denotes the sum of subspaces.
Proof. We begin by noting that as any vector in ⟨S⟩ is a linear combination of a finite number of vectors in S , the sum of two such vectors is again a linear combination of only a finite number of vectors of S . The zero vector can be realized as a linear combination of any set of vectors and so it is in ⟨S⟩. Next, it is clear that any scalar multiple of a linear combination in ⟨S⟩ is again in ⟨S⟩. Thus, by Proposition (3.2.3), ⟨S⟩ is a subspace of V. For the other assertion in (a), observe that any finite linear combination of vectors of a subspace must be in the subspace as a subspace is closed with respect to addition and scalar multiplication. Thus, if W is a subspace of V containing S , then ⟨S⟩, by definition, will be a subset of W. The proof of (a) is complete. In particular, it is shown that if W is a subspace of V, then ⟨W⟩, being the smallest subspace of V containing W, is W itself. (b) therefore follows as ⟨S⟩ itself is a subspace. (c) follows from the definition of linear span. For the last assertion, it suffices to verify that the sum on the right-hand side is the smallest subspace containing the union S ∪ T , and we leave the verification to the reader.
We sometimes say that ⟨S⟩ is the subspace generated by S , or spanned by S , and refer to S as a generating set, or a spanning set, of ⟨S⟩. Note that there may be more than one generating set for a subspace. In other words, there may be two different subsets S and T of a vector space such that ⟨S⟩ = ⟨T⟩. For example, it can be shown easily that R2 = ⟨(1, 0), (0, −1)⟩ even though we had seen that R2 is the span of {(1, 0), (0, 1)}. Another point to note is that we can add certain vectors to a generating set S without altering the linear span of S . For example, R2 is also the span of {(1, 0), (0, 1), (1, 1)}, for (1, 1) itself is a linear combination of the other two vectors. Thus, there is a need to identify minimal generating sets, if any, of a linear span of vectors. We examine this and other aspects of generating sets of a linear span in the following section.
EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications:
(a) Any non-zero vector space over R has infinitely many distinct vectors.
(b) Any non-zero vector space over an infinite field has infinitely many distinct subspaces.
(c) The field Q of rationals is a real vector space.
(d) The field R is a vector space over Q.
(e) The set C[x] of all polynomials with complex coefficients is a real vector space.


(f) The set R[x] of all real polynomials is a complex vector space. (g) The subset {a + bi | a, b ∈ Q} of C is a subspace of C over Q.

(h) The sum of the subspaces {(x, 0) ∈ R2 } and {(x, x) ∈ R2 } is R2 . (i) W1 + W2 = W2 + W1 for any two subspaces W1 and W2 of a vector space V. (j) If, for subspaces W1 , W2 and W3 of a vector space V, W1 + W2 = W1 + W3 , then W2 = W3 . (k) If, for two subspaces W1 and W2 of a vector space V, W1 + W2 = V, then W1 ∩ W2 is the zero subspace. (l) Any two sets of vectors spanning a subspace of a vector space must have the same number of vectors. (m) If W is the linear span of a set of vectors S of a vector space, then no proper subset of S can span W. (n) The sum of two subspaces of a vector space contains each of the two subspaces. (o) The set of all invertible matrices in Mn (F) is a subspace. (p) The set of all non-invertible matrices in Mn (F) is a subspace. (q) The empty set is a subspace of every vector space. (r) For vectors v1 , v2 of a vector space, /v1 , v2 0 = /v1 + v2 0. (s) R2 is a subspace of R3 . (t) The set {(a, a) | a ∈ R} is a subspace of R2 . (u) If U is a subspace of W and W is a subspace of V, then U is a subspace of V. 2. Prove the basic properties of vector space operations as stated in Proposition (3.2.2). 3. Let v be a fixed vector in a vector space V over a field F. Verify directly that the set {av | a ∈ F} of all scalar multiples of v is a subspace of V. 4. Determine whether the following subsets S of the real vector spaces V form subspaces: (a) S = {(x, mx) | x ∈ R} for any fixed real m; V = R2 ; (b) S = {(x, y) | y = sin x , x ∈ R}; V = R2 ; (c) S = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0, xi ∈ R}; V = R3 ;

(d) S = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 1, xi ∈ R}; V = R3 ; (e) S = {(x1 , x2 , x3 ) | x1 = x2 , x3 = 2x1 , xi ∈ R}; V = R3 ;

(f) S = {(x1 , x2 , x3 ) | x1 ≥ 0, xi ∈ R}; V = R3 ; (g) S = {(x1 , x2 , x3 ) | x1 2 + x2 2 + x3 2 = 0, xi ∈ R}; V = R3 .

5. Verify that the subsets of the vector space Mn (F) consisting of the given matrices in each of the following are subspaces: (a) the symmetric matrices; (b) the lower triangular matrices; (c) the upper triangular matrices; (d) the diagonal matrices;


(e) the scalar matrices; (f) matrices whose all diagonal entries are zeros; (g) matrices which commute with a fixed matrix A ∈ Mn (F); (h) matrices whose first row has all zero entries. 6. In each of the following cases, determine whether the subset of the real vector space R[x] consisting of the given polynomials forms a subspace: (a) polynomials of degree at most n for any positive integer n; (b) constant polynomials or polynomials of degree 0; (c) polynomials whose constant term is zero; (d) polynomials whose constant term is 1; (e) polynomials whose derivatives vanish at x = 1. Can this exercise be generalized to polynomials with complex coefficients? 7. Verify that the set I R of all real-valued functions defined on a closed real interval I = [a, b] is a real vector space with respect to pointwise addition and pointwise scalar multiplication as defined in Example 8. Show further that the subsets of all continuous and differentiable functions on I, respectively, are subspaces. 8. Let V be the set of all real sequences {an }, that is, sequences in R. Verify that V is a real vector space with respect to coordinate-wise addition and scalar multiplication as defined in Example 9. Prove further that, in each of the following, the subset of V consisting of the given sequences forms a subspace: (a) convergent sequences; (b) sequences which converge to 0; 4 (c) sequences {an } such that i an 2 is finite.

9. Consider subspaces W1 = {(x, x) | x ∈ R} and W2 = {(x, 2x) | x ∈ R} of R2 . Verify that W1 ∪ W2 is not a subspace of R2 .
10. Given subspaces W1 and W2 of a vector space V, show that W1 ∪ W2 is a subspace of V if and only if W1 ⊂ W2 or W2 ⊂ W1 .
11. Prove the following variant of Proposition (3.2.3): A non-empty subset W of a vector space V over a field F is a subspace if and only if aw1 + w2 ∈ W whenever a ∈ F and w1 , w2 ∈ W.
12. Let W1 and W2 be subspaces of a vector space V. Prove that any subspace of V that contains both W1 and W2 also contains the subspace W1 + W2 .
13. Let S and T be non-empty subsets of a vector space V. Prove that the linear span of S ∪ T is the sum ⟨S⟩ + ⟨T⟩ of the subspaces spanned by S and T .
14. Let S be a non-empty set of vectors in a vector space V, and let v ∈ S be such that v is a linear combination of finitely many vectors of S , none of which is v. If S′ is the set obtained from S by removing v from S , then show that ⟨S⟩ = ⟨S′⟩.
15. Prove that R2 is spanned by the vectors (1, 0) and (1, −1).
16. Let W be the subset of R3 given by

W = { (x1 , x2 , x3 )t | x1 − 2x2 − 3x3 = 0 }.


Verify that W is a subspace of R3 . Show that every vector of W can be expressed as

(2a + 3b, a, b)t ,

for some real numbers a and b. Hence, find vectors v1 and v2 in R3 such that their span ⟨v1 , v2 ⟩ equals W.
17. Let W be the set of all vectors in R4 of the form

(a − b, a − 2b, 2a − b, 3b)t

for arbitrary reals a, b. Show that W is a subspace of R4 by finding vectors in R4 whose span is W.
18. Let V = R2 be the set of all ordered pairs of real numbers. In V, define addition component-wise as usual, but define scalar multiplication in the following way: a(x1 , x2 ) = (ax2 , ax1 ) for any a ∈ R. Is V, with these operations, a vector space over R?
19. Let V = Rn be the real vector space of all ordered n-tuples of real numbers with respect to usual component-wise addition and scalar multiplication. Is V a vector space over the field C of complex numbers with similar operations? Is V a vector space over the field Q of rational numbers with similar operations?
20. Prove that a vector space over an infinite field cannot be the union of finitely many proper subspaces.
The following three exercises are taken from the article 'Generating Exotic-looking Vector Spaces' by M.A. Carchidi, which appeared in the College Mathematics Journal, Vol. 29, No. 4 (Sept., 1998).
21. Consider the set of real numbers in the open interval (−1, 1). For any x, y ∈ (−1, 1) and for any α ∈ R, define

x ⊕ y = (x + y) / (1 + xy),
α ⊙ x = ((1 + x)^α − (1 − x)^α) / ((1 + x)^α + (1 − x)^α).

Verify that for such x, y and α, x ⊕ y and α ⊙ x are indeed real numbers in (−1, 1). Then show that (−1, 1) is a vector space over R with respect to addition ⊕ and scalar multiplication ⊙. (The reader is expected to verify each of the vector space axioms.)
22. Let a be a fixed real number. For any real numbers x, y and α, define

x ⊕ y = x + y − a,
α ⊙ x = αx + a(1 − α).

Show that R is a vector space over itself with respect to ⊕ and ⊙.
The preceding two exercises are special cases of the following general result.


23. Let f : R → V be a one–one map from the set R of real numbers onto a non-empty set V. Define addition ⊕ and scalar multiplication ⊙ in V, for any x, y ∈ V and any α ∈ R, by

x ⊕ y = f ( f −1 (x) + f −1 (y)),
α ⊙ x = f (α · f −1 (x)),

where f −1 : V → R is the inverse of f , and + and · are the usual addition and multiplication in R. Verify that V is a real vector space with respect to ⊕ and ⊙.
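As a quick illustration of Exercise 23 (our own sketch, not part of the text): taking f = tanh, which maps R one-to-one onto (−1, 1), the transported addition f(f⁻¹(x) + f⁻¹(y)) is exactly the operation x ⊕ y = (x + y)/(1 + xy) of Exercise 21, and the transported scalar multiplication matches the closed form there as well.

```python
import math

def oplus(x, y):
    """Transported addition from Exercise 23 with f = tanh."""
    return math.tanh(math.atanh(x) + math.atanh(y))

def smul(alpha, x):
    """Transported scalar multiplication with f = tanh."""
    return math.tanh(alpha * math.atanh(x))

x, y, alpha = 0.3, -0.5, 2.0
# Compare with the closed-form operations of Exercise 21:
print(math.isclose(oplus(x, y), (x + y) / (1 + x * y)))
print(math.isclose(smul(alpha, x),
                   ((1 + x)**alpha - (1 - x)**alpha) / ((1 + x)**alpha + (1 - x)**alpha)))
```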

3.3 LINEAR INDEPENDENCE We have seen in the preceding section that a given set of vectors generating a subspace of a vector space may not be the ideal one. This section deals with the important question of finding suitable generating sets for vector spaces. It is reasonable to expect that an ideal generating set for a vector space should be minimal in the sense that no proper subset of it can span the vector space. We will see that such minimal generating sets for a large class of vector spaces will allow us to describe vectors in these spaces in terms of coordinates. The use of coordinates, in turn, will permit matrix methods for analysis of questions even in abstract vector spaces. The search for minimal generating sets depends on another important idea, that of linear independence of vectors. Definition 3.3.1. A finite set S of vectors {v1 , v2 , . . . , vm } of a vector space V over a field F is said to be linearly independent over F if the relation a1 v1 + a2 v2 + · · · + am vm = 0,

ai ∈ F,

holds only when all the scalars ai = 0 in F. We also say that S is linearly dependent over F if S is not linearly independent. An arbitrary set S of vectors in V is linearly independent over F if every finite subset of S is linearly independent. Otherwise, S is linearly dependent over F. Note that {v1 , v2 , . . . , vm } is linearly dependent over F if and only if a relation a1 v1 + a2 v2 + · · · + am vm = 0,

ai ∈ F

(3.2)

holds where not all scalar coefficients ai are zeros. Such a relation among a set of linearly dependent vectors is sometimes referred to as a relation of linear dependence. A linear combination, such as the preceding one, where at least one scalar coefficient is non-zero, may be termed as non-trivial as opposed to a trivial one in which all the coefficients are zeros. Using these terms, we may rephrase our definition of linear independence as follows: a finite set of vectors is linearly independent if and only if no non-trivial linear combination of these vectors results in the zero vector. Similarly, the set is linearly dependent if and only if some non-trivial linear combination equals the zero vector. A remark about the definition of linear independence is in order; the concept of linear independence depends closely on the scalar field associated with the vector space. See Example 21 discussed later. Even then, if there is no confusion about the underlying field, we may drop the words ‘ over F’ while talking about linear independence. The observations in the following result will prove quite useful.


Proposition 3.3.2. Let S be a non-empty subset of a vector space V. (a) If S contains the zero vector of V, then S is linearly dependent. Also, if S has two equal vectors, then S is linearly dependent. (b) If S is linearly independent, then so is every subset of S . (c) If S is linearly dependent, then so is every subset of V containing it. (d) A single non-zero vector in V is linearly independent. (e) Two vectors of V are linearly dependent if and only if one is a scalar multiple of the other. The verifications of these assertions are straightforward, and left to the reader. To make the idea of linear independence clear, let us look at some examples. EXAMPLE 15 The vectors (1, 0) and (0, 1) of R2 are linearly independent over R. For, if x1 (1, 0) + x2 (0, 1) = (0, 0) for real numbers x1 and x2 , then it follows that (x1 , x2 ) = (0, 0) or, equivalently, that x1 = x2 = 0. A similar verification shows that the m vectors e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . en = (0, 0, . . . , 0, 1) in Rm (or even in Fm for any field F), where e j has 1 as the jth place and zeros elsewhere, are linearly independent over R (over F). Note that the relation of linear dependence for vectors of a vector space V has the zero vector of V on the right-hand side, not the zero scalar. EXAMPLE 16 Let us examine whether the vectors (a, b, c), (1, 1, 0) and (0, 1, 0) of R3 are linearly independent over R, where none of the real numbers a, b and c is zero. The relation of linear dependence now reads x1 (a, b, c) + x2 (1, 1, 0) + x3(0, 1, 0) = (0, 0, 0), which is equivalent to three equations x1 a + x2 = 0 x1 b + x2 + x3 = 0. x1 c = 0 Solving these for the scalars x1 , x2 and x3 , we see that x1 = 0 as c ! 0, whence the first will force x2 = 0 and then the second implies that x3 = 0. Thus, the given vectors are linearly independent over R. This example illustrates an important technique for determining the linear independence of vectors in R3 , or in general, of vectors in the vector space Fm over F. Recall that those three equations, because of column-row multiplication (see Equation 1.7 in Section 1.2) amount to a single vector equation         a 1 0 0         x1 b + x2 1 + x3 1 = 0         c 0 0 0 as in Proposition (3.3.3). However, we go a step further and express the vector equation as the follow-


ing equivalent matrix equation:

[a 1 0] [x1]   [0]
[b 1 1] [x2] = [0].
[c 0 0] [x3]   [0]

It is, therefore, clear that the theory of matrix equations developed in previous chapters will be useful in dealing with questions about linear independence. We will discuss this point in detail after the following examples. EXAMPLE 17 It is easy to see that the set {1, x, x2 } of vectors in R[x] is linearly independent, whereas {1, x, x2 , 5 +2x−3x2 } is linearly dependent. For, a polynomial, in particular, a polynomial in 1, x and x2 is the zero polynomial (the zero vector of R[x]) only when all its coefficients are zeros. For the asserted dependence of 1, x, x2 and 5 + 2x − 3x2 , note that for scalars a1 = −5, a2 = −2, a3 = 3 and a4 = 1, we do have a relation of linear dependence. EXAMPLE 18 In fact, the infinite set {1, x, x2 , . . . , xn , . . . } of all the non-negative powers of x as vectors of R[x] is linearly independent, as no non-trivial linear combination of any finite set of powers of x can give us the zero polynomial. EXAMPLE 19 Consider the real vector space Mm×n (R) of all m × n matrices with real entries. The subset {ei j | 1 ≤ i ≤ m, 1 ≤ j ≤ n} of the unit matrices is linearly independent over R. 4 For, if the sum i, j ai j ei j equals the zero vector (which in this case is the zero matrix), then the matrix [ai j ] is the zero matrix which implies that each scalar ai j = 0. EXAMPLE 20 Consider the subset {sin t, cos t} of the real vector space C[−π/2, π /2] of all continuous real-valued functions on the closed interval [−π/2, π /2]. These two functions are linearly independent, for otherwise according to one of our observations in Proposition (3.3.2), one must be a scalar multiple of the other, say, sin t of cos t. Since these two are functions, we conclude that sin t = a cos t

for all t ∈ [−π/2, π /2]

for some fixed scalar a. This is absurd since the same a cannot work for all the t ∈ [−π/2, π /2]. (Compare the graphs). Our assertion follows. EXAMPLE 21 The set {1, i} of C is linearly independent over R if we consider C as a vector space over R, whereas as vectors of the complex vector space they are dependent as the following relation of linear dependence shows: (i)1 + (−1)i = 0. Note that in this relation, the scalars a1 = i and a2 = −1 are from the base field C.


Testing Linear Independence by Using Matrices Now, we discuss a systematic procedure of checking the linear independence or dependence of vectors belonging to Fm by using matrices and their echelon forms. The key to the procedure is the following observation: the scalars a1 , a2 , . . . , am in the relation for linear dependence of the vectors v1 , v2 , . . . , vm form a non-zero solution (as an element of Fm ) of the vector equation: x1 v1 + x2 v2 + · · · + xm vm = 0. Hence, we have the following alternative definition of linear independence. Definition 3.3.3. The vectors v1 , v2 , . . . , vm of a vector space V over a field F are linearly independent over F if and only if the vector equation x1 v1 + x2 v2 + · · · + x m vn = 0 has a unique solution in Fm , namely, the zero solution given by x1 = x2 = · · · = xn = 0. Consider a set of n vectors v1 , v2 , . . . , vn in Fm . For our purpose, it will be convenient to think of these vectors as m-dimensional column vectors. Let A be the m × n matrix over F, whose jth column is the column vector v j : A = [v1

v2 · · · vn ].

Also, for a set of n scalars a1 , a2 , . . . , an in F, let a be the n-dimensional column vector a = (a1 , a2 , . . . , an )t . With this notation in place, the linear combination a1 v1 + a2 v2 + · · · + an vn can be expressed, by column-row multiplication (see Equation 1.7), as the following matrix product:

Aa = a1 v1 + a2 v2 + · · · + an vn .    (3.3)

Note that Aa ∈ Fm . To derive a workable definition of linear independence of vectors in Fm in terms of solutions of matrix equations, we next introduce x, the n-dimensional column vector of variables: x = (x1 , x2 , . . . , xn )t . We are now ready with the characterization of linear independence of vectors in Fm in terms of matrix equations.


Proposition 3.3.4. Let v1 , v2 , . . . , vn be vectors in Fm and let A ∈ Mm×n (F) be the matrix whose jth column is the m-dimensional column vector v j . The vectors v1 , v2 , . . . , vn are linearly independent over F if and only if the matrix equation Ax = 0 has only the trivial solution x1 = x2 = · · · = xn = 0.
Note that 0 in the matrix equation is in Fm .
Proof. By Equation (3.3), Ax is the linear combination x1 v1 + x2 v2 + · · · + xn vn , so the proposition follows from Definition (3.3.3) of linear independence of vectors.
This proposition, coupled with Corollary (2.4.4) from Chapter 2, provides us with the following test for linear independence.
Corollary 3.3.5. Notation as in the last Proposition. Let R be either a row echelon form or the reduced row echelon form of A. If every column of R is a pivot column, then the vectors are linearly independent. Thus, if even a single column of R fails to be a pivot column, then the vectors are linearly dependent.
As an application of the corollary, let us determine whether the vectors (1, 1, 0, 2), (0, −1, 1, 0) and (−1, 0, −1, −2) in R4 are linearly independent over R. The matrix formed by these vectors as columns is given by

    [1  0 −1]
A = [1 −1  0]
    [0  1 −1]
    [2  0 −2]

One easily sees that the reduced row echelon form of A is

    [1 0 −1]
R = [0 1 −1]
    [0 0  0]
    [0 0  0]

Since R has only two pivot columns, we conclude, by the preceding corollary, that the given vectors are linearly dependent over R.
Some Important Results
Two consequences, which are of great theoretical importance, must be singled out.
Corollary 3.3.6. A set of m vectors in Fm is linearly independent if and only if the m × m matrix over F, formed by the vectors as its columns, is invertible.
Corollary 3.3.7. Any set of n vectors in Fm is linearly dependent if n > m.


The first corollary follows from Corollary (2.5.3), which states that a square matrix is invertible if and only if its reduced row echelon form is the identity matrix, in which case clearly all the columns of the reduced row echelon form are pivot columns. For the second corollary, we need to note that if the number of columns in a matrix is more than the number of rows, then there will be columns in the echelon form of the matrix which cannot be pivot columns.
The last corollary can be rephrased as follows:
Corollary 3.3.8. The number of vectors in any linearly independent set of vectors in Fm cannot exceed m.
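In computational terms, Corollary 3.3.5 says that n vectors in Fm are linearly independent exactly when the m × n matrix they form has rank n (the rank being the number of pivot columns). The small sketch below is ours, not the book's; it uses numpy's rank routine on the worked example above and on a random instance of Corollary 3.3.7.

```python
import numpy as np

def independent(vectors):
    """True iff the given vectors (in R^m) are linearly independent: rank equals their number."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

# The worked example above: rank 2 < 3, so the vectors are dependent.
print(independent([np.array([1.0, 1.0, 0.0, 2.0]),
                   np.array([0.0, -1.0, 1.0, 0.0]),
                   np.array([-1.0, 0.0, -1.0, -2.0])]))   # False

# Corollary 3.3.7: any 4 vectors in R^3 are dependent.
rng = np.random.default_rng(0)
print(independent(list(rng.standard_normal((4, 3)))))      # False
```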

Thus, in Fm we have this concept of maximal linearly independent sets of vectors; these are the sets of linearly independent vectors in Fm which, when extended by adding even a single vector, no longer remain linearly independent. This is a concept as important as the one of minimal generating set. We now examine how to arrive at a maximal linearly independent set of vectors in an arbitrary vector space V.
Let us begin by extending the smallest possible linearly independent set of vectors in V. Recall that any non-zero vector in a vector space is linearly independent. So, starting with a non-zero vector v1 , how do we choose v2 such that {v1 , v2 } is still linearly independent? Now, by the last observation in Proposition (3.3.2), {v1 , v2 } is linearly independent if and only if v2 is not a scalar multiple of v1 , or in other words, if and only if v2 ∉ ⟨v1 ⟩, as the span of v1 is the set of all its scalar multiples. Fix a v2 such that {v1 , v2 } is linearly independent. Now, if for another vector v3 ,
a1 v1 + a2 v2 + a3 v3 = 0
for scalars ai , not all of which are zeros, it follows that a3 ≠ 0. For, otherwise the linear independence of {v1 , v2 } will force both a1 and a2 to be zeros. Therefore, we can divide the relation by a3 to express v3 as a linear combination of v1 and v2 , which is another way of saying that v3 is in the span of {v1 , v2 }. Thus, we can add v3 to the linearly independent set {v1 , v2 } to obtain a larger linearly independent set if and only if v3 is not in the span of {v1 , v2 }. Continuing in the same vein, we can prove the general case as given in the next proposition.
Proposition 3.3.9. Let S be a finite set of linearly independent vectors of a vector space V over a field F. Then, for any v ∈ V, the extended set S ∪ {v} is linearly independent over F if and only if v ∉ ⟨S⟩, where ⟨S⟩ is the linear span of S .
The connection between linear dependence and linear span that came out in the discussion preceding the last proposition gives rise to the following explicit characterization of linearly dependent sets of vectors.
Proposition 3.3.10. A set {v1 , v2 , . . . , vm } of non-zero vectors of a vector space V is linearly dependent over the base field F if and only if for some j > 1, the vector v j is a linear combination of the preceding vectors v1 , v2 , . . . , v j−1 in the list.
Note that the proposition makes sense only when m ≥ 2.


Proof. Note that if, for some j > 1, v j is the linear combination a1 v1 + a2v2 + · · · + a j−1 v j−1 , then not all the scalars ai are zeros (for, otherwise v j = 0). Therefore, we have the following relation of linear dependence: a1 v1 + · · · + a j−1 v j−1 + (−1)v j + 0v j+1 + · · · + 0vm = 0. Conversely, suppose that the given vectors are linearly dependent. In any relation of linear dependence 4 for these vectors, such as i ai vi = 0, choose the largest subscript j for which a j ! 0. Thus, all ak = 0 for k > j so the relation actually does not contain the terms corresponding to the vectors vk for all k > j. Dividing the relation by the non-zero scalar a j then allows us to express v j as a linear combination of v1 , v2 , . . . , v j−1 . ! Note that if v j is a linear combination of v1 , v2 , . . . , v j−1 , then the span of v1 , v2 , . . . , v j−1 is the same as the span of v1 , v2 , . . . , v j−1 , v j . (See Exercise 14 of the last Section). We exploit this simple fact for the next corollary. Corollary 3.3.11. Let the subspace W of V be spanned by S = {v1 , v2 , . . . , . . . , vm }. If for some k, 1 ≤ k ≤ m, the set {v1 , v2 , . . . , vk } is linearly independent, then it can be extended to a subset S ' of S consisting of only linearly independent vectors such that W = /S ' 0. Proof. If S is linearly independent, then there is nothing to prove as we may choose S ' = S . Otherwise, it follows from the preceding proposition that some v j in S is a linear combination of v1 , v2 , . . . , v j−1 where j − 1 ≥ k. If the set formed by deleting v j from S is linearly independent, it can be chosen as S ' ; otherwise continue the process of weeding out any vector which is a linear combination of the preceding vectors in the new set. This process must stop after a finite number of steps to yield S ' , as k vectors of S are already linearly independent. The statement about the span of S ' follows from the remark preceding this corollary. ! This corollary shows that it is possible to extract a linearly independent subset from among the vectors of a finite generating set of a vector space V such that the linearly independent subset itself will span V. This observation leads to the important concepts of bases and dimensions of vector spaces. We deal with these two concepts in detail in the next section. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications; all the given vectors are in an arbitrary field unless otherwise mentioned. (a) If v1 , v2 , . . . , vm are linearly dependent vectors, then vm is a linear combination of the other m − 1 vectors. (b) Every non-empty subset of a linearly independent set of vectors is linearly independent. (c) Every set of vectors containing a linearly independent set is linearly independent. (d) Every non-empty subset of a linearly dependent set of vectors is linearly dependent. (e) Every set of vectors containing a linearly dependent set is linearly dependent.


(f) If v is in the span of a non-empty set S of vectors, then any u ∈ S is in the span of the set obtained from S by replacing u by v. √ (g) The vectors 1, 2 are linearly independent in R considered a vector space over the field Q. √ √ (h) The vectors 1, 2, 8 are linearly independent in R considered a vector space over the field Q. (i) A subset of a vector space is linearly independent if none of its vectors is a linear combination of the others. (j) In R3 , there cannot be a set of four vectors such that any three of them are linearly independent. (k) The real polynomials 1 − 2x + 3x2 − 4x3 − x20 and 2 − 4x + 6x2 − 8x3 + x20 are linearly dependent over R. (l) The functions 1, cos t and sin t, as vectors of C[−π/2, π /2], are linearly dependent over R.

2. Prove the assertions in Proposition (3.3.2). 3. Prove Corollary (3.3.8). 4. Are the following vectors in R3 in the span of (1, 1, 1) and (1, 2, 3)? (a) (1, 0, 2) (b) (−1, −2, 3) (c) (3, 4, 5)

5. Do the vectors (1, 3, 2, 4), (−2, 4, 7, −1), (0, 2, 7, −1) and (−2, 1, 0, −3) span R4 ? 6. Give a spanning set of five vectors of the vector space R3 . Also, find a linearly independent subset of the spanning set. 7. Verify that S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} span the vector space C3 of ordered triples of complex numbers over the field C of complex numbers. If C3 is considered a vector space over the field of real numbers, then show that S does not span C3 . Find a spanning set of C3 if it is regarded a real vector space. 8. Given that v1 , v2 and v3 are linearly independent in a vector space V over a field F, determine whether the following sets are linearly independent: (a) v1 , v1 + v2 , v1 + v2 + v3 (b) v1 − v2 , v2 − v3 , v3 − v1

9. Find three vectors in R3 which are linearly dependent but such that any two of them are linearly independent. 10. In each of the following, determine whether the given vectors are linearly independent in the indicated vector space. All the vector spaces are over R. (a) (1, −2, 2), (2, −4, 2) in R3

(b) (1, −2, 2), (2, −4, 2), (4, −8, 2) in R3

(c) (0, −1, 2, 3), (1, 2, −1, 0), (−2, −8, 6, 6) in R4

(d) x2 − x, 2 + 3x − x2 , −4 − 8x in R3 [x]
(e) the matrices
[1 −2]   [2 −4]   [3 −4]
[0  1],  [1  2],  [2 −4]
in M2 (R)


 1  (f) 0  0

 1  0 ,  −1

 1  0 0

 0  1 ,  1

  0   1 −1

 1  1 ,  0

  1   1 −1

135

 0  0 in M2 (R)  1

(g) sin 2t, cos 2t, sin t cos t in C[0, 1]
Recall that we are following the convention that vectors in Fn can be described as either row vectors or column vectors.
11. Find the values of a for which the vector (3, 3, a)t in R3 is in the span of (1, −1, 1)t and (1, 2, −3)t .
12. Let A ∈ Mn (F). Show that A is invertible if and only if the rows of A, regarded as vectors in Fn , are linearly independent.

3.4 BASIS AND DIMENSION We had observed, at the end of the preceding section, that any set, spanning a vector space, always contains a linearly independent subset which too spans the vector space. On the other hand, Proposition (3.3.9) implies that the process of extending linearly independent sets can be carried on till the extended linearly independent set spans the vector space. The following definition, therefore, is a natural one. Definition 3.4.1. A linearly independent subset of a vector space is a basis of the vector space if it spans the vector space. Thus, it is clear that a basis of a vector space is a maximal linearly independent set of vectors in that space. Because both the concepts of span and independence depend on the scalar field F, a basis too depends on F. However, as has been our practice in similar situations, we do not emphasize this dependence if there is no occasion for confusion. A fundamental fact is that any vector space does possess a basis. For a large class of vector spaces, it is quite easy to see this. The following definition introduces these spaces. Definition 3.4.2. A finite-dimensional vector space is one which has a finite generating set. A vector space which cannot be generated by any finite subset is an infinite-dimensional vector space. According to Corollary (3.3.11), any finite generating set of a vector space contains a linearly independent subset which spans the vector space. Thus, a finite-dimensional vector space has a finite basis. Even if a vector space is not finite-dimensional, it has a basis. But any proof of this fact depends on deep concepts of set theory, such as the axiom of choice, which are beyond the scope of this book. Let us consider some finite-dimensional vector spaces and their bases. EXAMPLE 22 Rm is a finite-dimensional vector space. The vectors e1 , e2 , . . . , em , where j

e j = (. . . , 0, . . . , 1, . . . , 0, . . . )


is the vector having 1 at the jth place and zeros everywhere else, span Rm . Indeed, any vector (x1 , . . . , x j , . . . , xm ) ∈ Rm is the linear combination ∑k xk ek of these e j . Moreover, these vectors e j are linearly independent over R as the relation

∑k xk ek = 0

implies that (x1 , . . . , x j , . . . , xm ) is the zero vector, whence one concludes that each x j = 0. Thus, {e1 , e2 , . . . , em } is a basis of Rm . This basis is usually referred to as the standard basis of Rm . More generally, if 1 and 0 denote the identities of a field F, then one can verify that the vectors e j of Fm , similarly defined, form the standard basis of Fm over the field F.

EXAMPLE 23 An argument based on degrees of polynomials shows that R[x] cannot be finite-dimensional. For, suppose that polynomials f1 (x), f2 (x), . . . , fk (x) span R[x]. Note: for any two polynomials f (x) and g(x) and scalars a, b, the degree of the non-zero polynomial a f (x) + bg(x) is at most the larger of the two degrees of f (x) and g(x). It follows that if n is the largest of the degrees of the given polynomials f1 (x), f2 (x), . . . , fk (x), then the degree of any non-zero linear combination a1 f1 (x) + a2 f2 (x) + · · · + ak fk (x), and hence of any non-zero polynomial in the span of the given polynomials, cannot exceed n. Since R[x] has polynomials of arbitrarily large degrees, our supposition is absurd.

EXAMPLE 24 However, the real vector space Rn [x] is finite-dimensional. It is easy to see that the n + 1 vectors 1, x, x2 , . . . , xn not only span Rn [x], but also are linearly independent over R, so they form a basis of Rn [x]. We will call this basis the standard basis of Rn [x]. Similarly, the space of all polynomials, with coefficients from a field F, whose degrees do not exceed a fixed positive integer, is finite-dimensional considered as a vector space over F.

EXAMPLE 25 Consider the vector space Mm×n (F) of m × n matrices with entries from a field F. As we have seen in Proposition (1.3.8) of Chapter 1, the mn unit matrices ei j span this vector space. They form a basis of Mm×n (F) for, if for some scalars xi j

    ∑_{i, j} xi j ei j = 0,

where the matrix 0 is the zero (vector) of Mm×n (F), then the m × n matrix [xi j ] itself is the zero matrix 0 showing that each scalar xi j is zero. Note that this basis of Mm×n (F) is similar to the standard basis of Rm . EXAMPLE 26 The space Mn (F) of square matrices has a basis consisting of n2 unit matrices ei j , 1 ≤ i, j ≤ n.


We leave it to the reader to verify that the subspace of Mn (F) consisting of symmetric matrices is spanned by e11 , e22 , . . . , enn and the sums ei j + e ji for i ≠ j. Thus, this subspace has n(n + 1)/2 matrices forming a basis.

EXAMPLE 27 Let us come back to R2 which has the standard basis {e1 , e2 }, where e1 = (1, 0) and e2 = (0, 1). It is quite simple to show that the vectors (1, 0) and (1, 1) form another basis of R2 . In fact, we can find infinitely many pairs of vectors in R2 which form bases of R2 . Refer to Corollary (3.3.6) for an easy method to check whether a given pair forms a basis or not.

Dimension

In fact, the last example is typical. In any vector space, there will be many choices for a basis. In later chapters, a lot of effort will be spent in choosing a basis appropriate for any given application. So the following result is fundamental.

Theorem 3.4.3. Any two bases of a finite-dimensional vector space have the same number of vectors.

The proof is an easy consequence of the following lemma which implies that the number of vectors in any basis cannot exceed the same for any other basis. Note that the lemma is a generalization of Corollary (3.3.7).

Lemma 3.4.4. Let {v1 , v2 , . . . , vm } be a basis of a vector space V over a field F. If vectors u1 , u2 , . . . , un are linearly independent in V, then n ≤ m.

Proof. It is sufficient to show that any set of n vectors in V is linearly dependent if n > m. In other words, it is sufficient to show that given any arbitrary set of n vectors u1 , u2 , . . . , un where n > m, it is possible to find scalars x1 , x2 , . . . , xn in F, not all zero, such that

    x1 u1 + x2 u2 + · · · + xn un = 0.

To find such scalars, we first express each u j as a linear combination of the given basis vectors, which is possible as the basis vectors span V. So, for each fixed j (1 ≤ j ≤ n), let a1 j , a2 j , . . . , am j be m scalars such that

    u j = a1 j v1 + a2 j v2 + · · · + am j vm = ∑_{i=1}^{m} ai j vi .

Let A be the m × n matrix [ai j ] over F formed by these mn scalars in such a way that the coefficients for u j form the jth column of A. Consider now the homogeneous system of m equations in n variables given by the matrix equation Ax = 0. Since by hypothesis n > m, Proposition (2.5.1) about such systems implies that this system has a non-zero solution in Fn . In other words, it is possible to choose x1 , x2 , . . . , xn in F, not all zero (constituting the non-zero solution), such that

    ∑_{j=1}^{n} ai j x j = 0   for i = 1, 2, . . . , m.

It follows that

    x1 u1 + x2 u2 + · · · + xn un = ∑_{j=1}^{n} x j u j
                                 = ∑_{j=1}^{n} x j ( ∑_{i=1}^{m} ai j vi )
                                 = ∑_{i=1}^{m} ( ∑_{j=1}^{n} ai j x j ) vi
                                 = 0

as the coefficient of each vi is zero by our choice of x j . This completes the proof of the lemma.

□
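The content of Lemma (3.4.4) is easy to test numerically. The following sketch in Python with NumPy (not part of the text; the three vectors are arbitrary sample data) finds a non-trivial relation among n = 3 vectors in R2, exactly as in the proof: the columns of A are the coordinate vectors of the u j , and a non-zero solution of Ax = 0 supplies the scalars.

```python
# Sketch: three vectors in R^2 (n = 3 > m = 2) must be linearly dependent.
import numpy as np

u1, u2, u3 = np.array([1.0, 2.0]), np.array([0.0, 4.0]), np.array([3.0, -1.0])
A = np.column_stack([u1, u2, u3])     # 2 x 3 matrix; Ax = 0 has a non-zero solution

# For a 2 x 3 matrix, the third right singular vector returned by the SVD is
# orthogonal to the row space of A, hence lies in the null space of A.
_, _, Vt = np.linalg.svd(A)
x = Vt[-1]                            # the scalars x1, x2, x3 of the lemma

print("x =", x)
print("x1*u1 + x2*u2 + x3*u3 =", x[0]*u1 + x[1]*u2 + x[2]*u3)   # ~ (0, 0)
```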

Since the number of vectors in any linearly independent set in a vector space of dimension m cannot exceed m, a basis is sometimes referred to as a maximal linearly independent set. But the main use of the theorem is in enabling one to assign a number to a finite-dimensional vector space which is intrinsic to the space and independent of the choice of a basis.

Definition 3.4.5. The dimension of a finite-dimensional vector space V, denoted by dim V, is the number of vectors in any of its bases. The dimension of the zero vector space {0} is defined as zero.

The convention about the dimension of the zero space is necessary as it has no basis. Another way around this difficulty is to declare that the zero space is spanned by the empty set of linearly independent vectors, and thus its dimension is zero. From our examples of bases of vector spaces, we immediately conclude that:
(a) for any field F, dim Fm = m, as the standard basis of Fm has m vectors;
(b) dim Mm×n (F) = mn as the unit matrices form a basis. In particular, the dimension of the vector space Mn (F) is n2 ;
(c) the real vector space Rn [x] has dimension (n + 1) as 1, x, x2 , . . . , xn form a basis;
(d) the subspace R2 [x] of Rn [x] for n > 2 has dimension 3 as 1, x, x2 form a basis of it.

Basis of a Subspace

In the last example, we could find a finite generating set spanning a subspace of a finite-dimensional vector space. However, it is not clear that an arbitrary subspace of a finite-dimensional vector space must have a finite generating set. The next proposition uses a dimension argument to make it clear.

Proposition 3.4.6. A subspace W of a finite-dimensional vector space V is necessarily finite-dimensional. In fact, dim W ≤ dim V.


Proof. If W is the zero space {0}, the result is trivial by our convention about the dimension of the zero space. So, assume that W is non-zero. Then, W certainly has linearly independent vectors. If any finite set of such linearly independent vectors in W does not span W, by Proposition (3.3.9), it can be expanded to a larger, but still a finite, set of independent vectors. However, vectors independent in W are also independent in V and so their number cannot exceed dim V. That shows that the expansion of a finite set of linearly independent vectors in W to larger sets cannot be continued indefinitely, and so after a finite number of steps must end resulting in a spanning set of W, that is, in a basis of W. Since the basis vectors of W are linearly independent in V too, it follows that dim W ≤ dim V. □

We want to record the last point noted in the proof as a corollary.

Corollary 3.4.7. Any linearly independent set of vectors in a non-zero subspace W of a finite-dimensional vector space V can be expanded to a basis of W.

Note that W can be chosen as V also in the corollary. In that case, the corollary yields the following useful result.

Corollary 3.4.8. Any set of m linearly independent vectors in an m-dimensional vector space forms a basis.

The last proposition comes handy in describing subspaces of a vector space in terms of dimensions. We classify the subspaces of R2 according to their dimensions as an example. Since the dimension of R2 is two, any of its subspaces must have dimension less than two. EXAMPLE 28 The following are the subspaces of R2 . Zero-dimensional subspace: Only the zero space {0}. One-dimensional subspaces: All the straight lines passing through the origin. Any such line will be spanned by any non-zero vector (a point in this case) lying in the line. Two-dimensional subspaces: Only the whole space R2 . Any two linearly independent vectors will form a basis. If we start with any non-zero vector (x1 , x2 ) (think of this as a point in the plane), it forms the basis of the unique straight line passing through it and the origin. Now, if (y1 , y2 ) is not in the span of the first vector, i.e. if (y1 , y2 ) does not lie on that line, then the set {(x1 , x2 ), (y1 , y2 )} is automatically linearly independent, so forms a basis of R2 . Tests for a Basis For various calculations, we will frequently need to test sets of vectors to see whether they form a basis. For vectors in Fm , the following restatement of Corollary (3.3.6) provides such a test. Lemma 3.4.9. Let v1 , v2 , . . . , vm be m vectors in the m-dimensional vector space Fm . Let P be the square matrix of order m whose jth column consists of the components of the vector v j . Then, these m vectors form a basis of Fm if and only if P is invertible.
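Lemma (3.4.9) also gives a test that is easy to carry out on a computer. The short Python/NumPy sketch below (illustrative only, not part of the text) checks the pair (1, 0), (1, 1) used in the example that follows, together with a pair that fails the test.

```python
# Test of Lemma (3.4.9): m vectors in F^m form a basis iff the matrix P having
# them as columns is invertible (equivalently, det P is non-zero).
import numpy as np

P = np.column_stack([[1.0, 0.0], [1.0, 1.0]])    # columns (1,0) and (1,1)
print("det P =", np.linalg.det(P))               # 1.0, so a basis of R^2

Q = np.column_stack([[1.0, 2.0], [2.0, 4.0]])    # columns (1,2) and (2,4)
print("det Q =", np.linalg.det(Q))               # 0.0, so not a basis
```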


Thus, we know that (1, 0) and (1, 1) form a basis of R2 as

    P = [ 1  1 ]
        [ 0  1 ]

and det P ≠ 0. For vectors in arbitrary finite-dimensional vector spaces, analogous tests can be formulated once coordinates of vectors are available. The idea of coordinates, which makes numerical calculations possible in arbitrary vector spaces, follows from the following fundamental result.

Proposition 3.4.10. Let {v1 , v2 , . . . , vm } be a basis of a finite-dimensional vector space V. Then every vector of V can be expressed uniquely as a linear combination of these basis vectors.

That every vector is a linear combination of the basis vectors is part of the definition of a basis. The point of this result is that for a fixed basis, there is only one way of choosing the scalar coefficients in the expression of a vector as a linear combination of the basis vectors.

Proof. It is sufficient to prove the assertion about the uniqueness. Assume that it is possible to express a vector v ∈ V as ∑_i xi vi as well as ∑_i yi vi for scalars xi and yi . These two linear combinations being equal, we can rewrite the equality as

    ∑_i (xi − yi )vi = 0.

However, the vi are basis vectors and so are linearly independent. We may, therefore, conclude from the last vector equation that for each i, xi − yi = 0, or xi = yi . This proves the required uniqueness. □

Thus, every vector in an m-dimensional vector space determines a unique set of m scalars with respect to a given basis. Let us formalize this association of vectors with unique sets of scalars first.

Coordinates and Coordinate Vectors

So let V be a finite-dimensional vector space over a field F, and let B = {v1 , v2 , . . . , vm } be a fixed basis of V. Then, given a vector v ∈ V, there is one and only one way of writing v as a linear combination of the basis vectors:

    v = x1 v1 + x2 v2 + · · · + xm vm ,   xi ∈ F.

The m scalars xi , uniquely determined by the vector relative to the given basis, are called the coordinates of v relative to the basis B, and the column vector

    (x1 , x2 , . . . , xm )t    (3.4)

in Fm consisting of these scalars is called the coordinate vector of v with respect to the given basis of V. If we want to indicate the basis with respect to which the coordinates are taken, we will refer to the coordinate vector of v as [v]B .


Note: The association of a vector of an m-dimensional vector space with an ordered m-tuple in Fm depends on the order of the basis vectors in the given basis. Thus, to remove any ambiguity in this association, we need to fix the order of the basis vectors. Hence, we adopt the following convention: a basis of V is an ordered set of linearly independent vectors which span V. Thus, for us a basis will always mean an ordered basis. However, we will continue to use curly brackets {} to describe sets of vectors forming a basis. (Usually, ordered sets are enclosed in round brackets.) Thus, for us, the basis {v1 , v2 , . . . , vm } is not the same as the basis {v2 , v1 , . . . , vm } even though as sets they are the same.

Let us consider some examples.

EXAMPLE 29 Consider the m-dimensional vector space Fm over the field F equipped with the standard basis E = {e1 , e2 , . . . , em }, where e j = (0, . . . , 0, 1, 0, . . . , 0) is the vector with 1 at the jth place. Given any vector v = (x1 , x2 , . . . , xm ) in Fm , since

    (x1 , x2 , . . . , xm ) = ∑_k xk ek ,

it follows that the components of v themselves are the coordinates of v with respect to the standard basis E. For example, in R2 , [(1, 2)]E = (1, 2)t .

EXAMPLE 30 The vector space M2 (R) of 2 × 2 real matrices has the four unit matrices e11 , e12 , e21 and e22 forming the standard basis. The matrix

    [ a11  a12 ]
    [ a21  a22 ]

as a vector of M2 (R), has a11 , a12 , a21 and a22 as the coordinates relative to this standard basis. Thus, for example, the coordinate vector of the matrix

    [ 1  −3 ]
    [ 0  −2 ]

with respect to the standard basis is (1, −3, 0, −2)t .

EXAMPLE 31 If V is the vector space of all real polynomials of degree at most 3, then as we have seen earlier, {1, x, x2 , x3 } is the standard basis of V over R. The polynomial a0 + a1 x + a2 x2 + a3 x3 clearly has a0 , a1 , a2 and a3 as its coordinates with respect to this basis.

EXAMPLE 32 Consider the vector space R2 with basis (1, 2), (0, 4) (note one is not a scalar multiple of the other, so they are linearly independent, and hence form a basis, as R2 has dimension 2). To determine the coordinates of, say, (5, −3) with respect to this basis, we have to solve the following vector equation

    (5, −3) = x1 (1, 2) + x2 (0, 4)


for real numbers x1 and x2 . We express this vector equation as the matrix equation

    [ 1  0 ] [ x1 ]   [  5 ]
    [ 2  4 ] [ x2 ] = [ −3 ].

The 2 × 2 matrix here has the basis vectors as its columns, so is invertible by Lemma (3.4.9). Multiplying the matrix equation from the left by the inverse

    1/4 [  4  0 ]
        [ −2  1 ],

we obtain the coordinate vector of (5, −3) as

    [ x1 ]         [  4  0 ] [  5 ]   [     5 ]
    [ x2 ] = 1/4   [ −2  1 ] [ −3 ] = [ −13/4 ].

Note that we have solved the matrix equation through multiplication by the inverse. This method makes sense only when the matrices are small in size. In fact, it is easier to solve matrix equations by the method of row reduction of the augmented matrix, which was discussed in the last chapter.

EXAMPLE 33 In general, let B = {v1 , v2 , . . . , vm } be an arbitrary basis of Fm . Given a vector v ∈ Fm , we wish to determine the coordinates x1 , x2 , . . . , xm of this column vector v with respect to the basis B, determined by

    v = x1 v1 + x2 v2 + · · · + xm vm .

Now, as in Equation (3.3), the sum on the right-hand side is precisely the matrix product Px, where P is the square matrix whose jth column is the vector v j , and x is the column vector formed by the scalars x1 , x2 , . . . , xm we are seeking. Therefore, the preceding vector equation can be put in the form v = Px. Observe that P, according to Lemma (3.4.9), is invertible as basis vectors form the columns of P. Therefore, the required coordinates of v are given by x = P−1 v. Using the notation for coordinate vectors introduced earlier, we may write the last equation as

    [v]B = P−1 v    (3.5)

for expressing the coordinate vector of v with respect to the basis B. Note that we can interpret the column vector v as the coordinate vector [v]E of itself with respect to the standard basis E. To obtain a similar result about the way coordinates change due to a change of bases in an arbitrary vector space, we need the general form of Lemma (3.4.9), which makes it easier to check whether a set of vectors forms a basis or not.
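Equation (3.5) is also how coordinates are computed in practice. A minimal Python/NumPy sketch (not from the text) for the data of Example 32 is given below; np.linalg.solve is used to solve Px = v directly rather than forming P−1 explicitly, which is the numerically preferable route.

```python
# Coordinates of v = (5, -3) with respect to the basis B = {(1, 2), (0, 4)}.
import numpy as np

P = np.column_stack([[1.0, 2.0], [0.0, 4.0]])   # basis vectors as columns
v = np.array([5.0, -3.0])

coords = np.linalg.solve(P, v)    # solves P x = v, i.e. x = P^{-1} v
print(coords)                     # [ 5.   -3.25]  = (5, -13/4)^t
print(P @ coords)                 # [ 5.  -3.], recovering v
```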


Proposition 3.4.11. Let V be an m-dimensional vector space with basis B = {v1 , v2 , . . . , vm }. Consider another set of m vectors u1 , u2 , . . . , um in V. Let p j be the coordinate vector of u j with respect to the basis B, and let P be the m × m matrix whose jth column is the column vector p j . Then, the m vectors u1 , u2 , . . . , um form a basis of V if and only if P is invertible.

Proof. Since dim V = m, it suffices to show that the m vectors u1 , u2 , . . . , um are linearly independent if and only if P is invertible. But this is precisely what we had shown in the proof of Lemma (3.4.4) (with A in place of P). □

This proposition provides us with a very efficient method to produce bases out of a given basis of a finite-dimensional vector space.

Corollary 3.4.12. Let v1 , v2 , . . . , vm be a basis of a vector space V over a field F, and let P = [pi j ] be an invertible matrix in Mm (F). Then, the m vectors u1 , u2 , . . . , um defined by the equations

    u j = ∑_{i=1}^{m} pi j vi   for j = 1, 2, . . . , m,

form a basis of V. We now give a couple of examples to illustrate the uses of the preceding results.

EXAMPLE 34 Consider the set B = {1, 1 + x, 1 + x2 } of vectors in R2 [x], the real vector space of all real polynomials of degree at most 2. Since E = {1, x, x2 } is the standard basis of V, we may express the vectors of B in terms of the vectors of E to construct the matrix P as in the preceding lemma. It is clear that

    P = [ 1  1  1 ]
        [ 0  1  0 ]
        [ 0  0  1 ]

is invertible, as det P = 1. Therefore, B is another basis of R2 [x].

EXAMPLE 35 Consider R2 with the usual standard basis. For any real θ, consider the vectors

    v1 = [ cos θ ]   and   v2 = [ − sin θ ]
         [ sin θ ]              [   cos θ ]

in R2 . Expressing these vectors in terms of the standard basis, we see that the matrix P in this case is

    P = [ cos θ   − sin θ ]
        [ sin θ     cos θ ],

from which it is clear that det P = 1. Thus, P is invertible and so v1 , v2 form a basis of R2 .
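A quick numerical check of Examples 34 and 35 (a Python/NumPy sketch, not part of the text; the sample angles are arbitrary): in both cases the matrix P has non-zero determinant, so by Proposition (3.4.11) the new vectors form a basis.

```python
import numpy as np

# Example 34: P expresses {1, 1 + x, 1 + x^2} in terms of {1, x, x^2}.
P34 = np.array([[1.0, 1.0, 1.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print("det P (Example 34) =", np.linalg.det(P34))        # 1.0

# Example 35: rotation matrix, tested for a few sample angles.
for theta in (0.0, 0.3, np.pi / 4, 2.0):
    P35 = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    print(f"theta = {theta:5.3f}, det P = {np.linalg.det(P35):.3f}")   # 1.000
```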


Change of Basis Matrix

It is time now to discuss the change of coordinates relative to a change of basis. First, a definition.

Definition 3.4.13. Let B = {v1 , v2 , . . . , vm } and B' = {u1 , u2 , . . . , um } be two ordered bases of an m-dimensional vector space V. Consider the unique m2 scalars pi j obtained by expressing the u j as linear combinations of the vectors of the basis B:

    u j = ∑_{i=1}^{m} pi j vi   for j = 1, 2, . . . , m.

The m × m matrix P = [pi j ] whose jth column is formed by the coefficients in the expression for u j (in the same order) is called the transition matrix, or the change of basis matrix, from the basis B' to the basis B.

Note that such a transition matrix from one basis to another is invertible by Proposition (3.4.11). For example, the matrices P in the preceding Examples 34 and 35 are the transition matrices from the new bases to the usual standard bases of R2 [x] and R2 , respectively. We now generalize Example 33 to give a formula for relating the coordinates of a vector with respect to two different bases of a vector space:

Theorem 3.4.14. Let V be a finite-dimensional vector space with bases B and B' . For any v ∈ V, let x and x' be the coordinate vectors of v with respect to bases B and B' , respectively. Then, x' = P−1 x, where P is the transition matrix from the basis B' to the basis B.

Proof. Let B = {v1 , v2 , . . . , vm } and B' = {v'1 , v'2 , . . . , v'm } be two bases of the vector space V. Given any v ∈ V, let

    v = ∑_{i=1}^{m} xi vi   and   v = ∑_{j=1}^{m} x'j v'j    (3.6)

be the expressions of v as a linear combination of the basis vectors, so that the scalars xi and x'j form the coordinate vectors x and x' of v with respect to the two bases, respectively. Let P = [pi j ] be the matrix of transition from the basis B' to B. Then, by Definition (3.4.13),

    v'j = ∑_{i=1}^{m} pi j vi   for j = 1, 2, . . . , m.

Substituting the expression for the v'j in the second of the Equations (3.6), we then see that

    v = ∑_{j=1}^{m} x'j ( ∑_{i=1}^{m} pi j vi )
      = ∑_{i=1}^{m} ( ∑_{j=1}^{m} pi j x'j ) vi .


Since the vectors vi are independent, comparing the preceding expression for v with the first of the Equations (3.6), we conclude that

    xi = ∑_{j=1}^{m} pi j x'j   for i = 1, 2, . . . , m.

But these m equations are equivalent to a single matrix equation x = Px' . The proof is complete as P is invertible.

□
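Theorem (3.4.14) can be verified numerically for any concrete choice of bases. Below is a small Python/NumPy sketch (not from the text; the two bases of R2 and the vector v are arbitrary sample data): the transition matrix P is built column by column as in Definition (3.4.13), and the identity x' = P−1 x is checked.

```python
import numpy as np

B  = np.column_stack([[1.0, 1.0], [1.0, -1.0]])   # columns: the basis B
Bp = np.column_stack([[2.0, 0.0], [3.0, 1.0]])    # columns: the basis B'

# jth column of P = B-coordinates of the jth vector of B'  (Definition 3.4.13)
P = np.linalg.solve(B, Bp)

v  = np.array([4.0, -2.0])
x  = np.linalg.solve(B, v)      # coordinate vector of v with respect to B
xp = np.linalg.solve(Bp, v)     # coordinate vector of v with respect to B'

print(np.allclose(xp, np.linalg.solve(P, x)))     # True:  x' = P^{-1} x
```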

EXERCISES 1. Determine whether the following assertions are true or false- giving brief justifications. (a) Every vector space is finite-dimensional. (b) If a finite subset S of a finite-dimensional vector space V spans V, then S must be a basis of V. (c) The zero vector space can have no basis. (d) A finite-dimensional vector space over R can have only finitely many distinct bases. (e) If B is a basis of a vector space V, then for any subspace W of V, there is a subset of B which is a basis of W. (f) In an m-dimensional vector space V, any set of m vectors is linearly independent if it spans V. (g) In an m-dimensional vector space V, any set of m vectors spans V if it is linearly independent. (h) If a subspace W of an m-dimensional vector space V has m linearly independent vectors, then W = V. (i) Any transition matrix from one basis of a finite-dimensional vector space to another basis is an invertible matrix. (j) Any invertible matrix of order n over a field F is the transition matrix of some basis of Fn to its standard basis. (k) Any non-zero subspace of an infinite-dimensional vector space has to be infinitedimensional. (l) Any proper subspace of an infinite-dimensional vector space has to be finite-dimensional. (m) There is only one subspace of dimension m in an m-dimensional vector space. (n) There are infinitely many distinct one-dimensional subspaces of R2 . (o) If a vector space is spanned by an infinite set, then the vector space must be infinitedimensional. (p) Any vector space over a finite field is necessarily finite-dimensional. (q) Given an m-dimensional real vector space V and a basis B of V, any element of Rm is the coordinate vector of some vector of V with respect to B 2. Prove that there are infinitely many bases of R2 one of whose members is (1, 0).


3. Find three different bases of the real vector space R3 [x] of all polynomials with real coefficients of degree at most 3. 4. Prove that there is a one–one correspondence between the collection of distinct bases of Fm and GLm (F), the collection of all invertible matrices in Mn (F). 5. Let W be the subset of the complex vector space C4 consisting of all those vectors whose third and the fourth components are the same. Verify that W is a subspace of C4 , and determine its dimension. 6. Determine the dimension of the subspace W of Mn (F) consisting of the symmetric matrices by actually exhibiting a basis of W. Do the same for the subspace consisting of those matrices A ∈ Mn (F) such that At = −A. 7. Classify the subspaces of R3 in terms of their dimensions. 8. Let W be the subset of M2 (C) consisting of all matrices A = [ai j ] such that a11 + a22 = 0. (a) Show that W is a subspace of M2 (C), and find a basis of W. (b) Determine the dimension of W considering it as a vector space over the field of real numbers. (c) Determine the dimension of the subspace U of the real vector space W consisting of all those matrices A = [ai j ] in W such that a12 = −a22 . 9. Show that the field of real numbers, considered a vector space over the field of rational numbers, cannot be finite-dimensional. 10. Show that the space C[a, b] of all continuous real-valued functions on a closed interval [a, b] of the real line is infinite-dimensional. 11. Let B = {v1 , v2 , . . . , vn } and C = {u1 , u2 , . . . , un } be bases of a vector space V over a field F. Let P be the n × n matrix over F whose jth column is the coordinate vector of v j with respect to basis C for 1 ≤ j ≤ n. Let [v]B and [v]C be the coordinate vectors of an arbitrary vector v with respect to bases B and C, respectively. Which of the following two equations is satisfied by every vector v ∈ V? [v]B = P[v]C

or [v]C = P[v]B .

12. Let A = {v1 , v2 , v3 } and B = {u1 , u2 , u3 } be bases of a vector space V over R. Suppose that v1 = u1 − u2 , v2 = −u1 + 2u2 + u3 and v3 = −2u1 + u2 + 4u3 . Determine the change of basis matrix from B to A, and the coordinates of v = −3v1 + v2 − 2v3 with respect to the basis B.
13. Let A = {v1 , v2 , . . . , vn } and B = {u1 , u2 , . . . , un } be bases of Fn , and P be the change of basis matrix from A to B. Consider the augmented matrix

    A = [ u1 · · · un   v1 · · · vn ].

Prove that A is row-equivalent to [In P], where In is the identity matrix of order n over F. 14. Determine the coordinates of (−3, 2, −1) ∈ R3 with respect to the basis {(1, 1, 1), (1, 0, 1), (1, 1, 2)} of R3 . 15. Determine the coordinate vector of 1 − 3x + 2x2 ∈ R2 [x] with respect to the basis {x2 − x, x2 + 1, x − 1} of R2 [x]. 16. Find a basis of the subspace W = {(x1 , x2 , x3 , x4 ) | x1 − 3x2 + x3 = 0} of R4 . Calculate the coordinates of (1, 1, 2, −1) ∈ W with respect to this basis.


17. Consider the matrix

    A = [ 0  1  0  0  0 ]
        [ 0  0  0  0  1 ]
        [ 0  0  1  0  0 ]
        [ 1  0  0  0  0 ]
        [ 0  0  0  1  0 ]

over any field F. Prove that A is invertible by showing that A is a transition matrix from a certain basis of F5 to the standard basis. Generalize. 18. Prove that a vector space V over an infinite field cannot be the union of finite number of proper subspaces. 19. Prove that the polynomials 1, 2x, −2 + 4x2 and −12x + 8x3 form a basis of R3 [x]. Find the coordinate vector of −5 + 4x + x2 − 7x3 with respect to this basis. These polynomials are the first four Hermite polynomials. 20. Prove that the polynomials 1, 1 − x, 2 − 4x + x2 and 6 − 18x + 9x2 − x3 form a basis of R3 [x]. Find the coordinate vector of −5 + 4x + x2 − 7x3 with respect to this basis. These polynomials are the first four Laguerre polynomials.

3.5 SUBSPACES AGAIN

In this section, we focus on some aspects of sums of subspaces of a vector space. Here is a summary of the facts we have learnt about subspaces so far.
• A subspace W of a finite-dimensional vector space V is finite-dimensional; in fact, dim W ≤ dim V.
• Any generating set for a subspace W contains a basis of W.
• Any basis of a subspace W can be extended to a basis of V.

We can strengthen the first result by a simple observation whose proof is left as an exercise to the reader. Lemma 3.5.1. Let U ⊂ W be subspaces of a finite-dimensional vector space V. Then, U = W if and only if dim U = dim W. Note that any result about a vector space is applicable to a subspace, as a subspace is a vector space on its own. Thus, for example, in the lemma we can take W = V. Similarly, any result proved for general subspaces is valid for the whole space unless it is specifically for proper subspaces only. Quite frequently, we need to discuss sums of subspaces. As we had seen in Section 3, given subspaces W1 and W2 of a vector space V, their sum W1 + W2 , defined as W1 + W2 = {w1 + w2 | wi ∈ Wi }, is again a subspace of V. The following result gives a useful formula for the dimension of the sum of two subspaces. Proposition 3.5.2.

Let W1 and W2 be finite-dimensional subspaces of a vector space V. Then, dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).


Thus W1 ∩ W2 is the zero subspace if and only if dim(W1 + W2 ) = dim W1 + dim W2 .

Proof. Consider the case when the subspace W1 ∩ W2 is non-zero to begin with, so we can suppose that it has a basis. Let {v1 , v2 , . . . , vr } be a basis of W1 ∩ W2 so that r = dim(W1 ∩ W2 ). Now, W1 ∩ W2 being a subspace of both W1 and W2 , its basis can be extended to bases of these bigger subspaces. So, we may assume that B1 = {v1 , v2 , . . . , vr , u1 , . . . , ut } and B2 = {v1 , v2 , . . . , vr , w1 , . . . , ws } are the extended bases of W1 and W2 respectively. Note that dim W1 = r + t, and dim W2 = r + s.

We claim that B = {v1 , v2 , . . . , vr , u1 , . . . , ut , w1 , . . . , ws } is a basis of W1 + W2 . It is clear from the definition of sum of subspaces that B spans W1 + W2 . So to prove the claim, we need only to prove that B is a linearly independent set. If not, some vector in the list for B must be a linear combination of the vectors preceding it. Since the first (r + t) vectors in the list for B are linearly independent, this vector has to be one of the last s vectors of the list. In other words, for some k (1 ≤ k ≤ s), the vector wk is the following linear combination:

    wk = a1 v1 + · · · + ar vr + b1 u1 + · · · + bt ut + c1 w1 + · · · + ck−1 wk−1 ,

where at least one c j is non-zero. This relation can be rewritten as

    wk − c1 w1 − · · · − ck−1 wk−1 = a1 v1 + · · · + ar vr + b1 u1 + · · · + bt ut .

Observe that the expression on the left-hand side is in W2 , whereas the one on the right-hand side is in W1 , so that the vector represented by these two equal expressions must be in W1 ∩ W2 . Thus, wk − c1 w1 − · · · − ck−1 wk−1 is a linear combination of the basis vectors v1 , v2 , . . . , vr of W1 ∩ W2 . This, however, contradicts the fact that the vectors of B2 are linearly independent. Our claim is thus established, showing that dim(W1 + W2 ) = r + s + t. The desired equality of dimensions follows as dim W1 = r + t and dim W2 = r + s.

In case the intersection of W1 and W2 is the zero subspace, a similar argument shows that the union of any two bases chosen for W1 and W2 is a basis of the sum W1 + W2 . Hence, the formula for the dimension of W1 + W2 holds in this case, too. □

Note that the proposition is applicable even if V is infinite-dimensional. We give some applications of the result in the following examples.

EXAMPLE 36 Let V = R2 , and let W1 and W2 be two distinct lines passing through the origin. Thus, they are one-dimensional subspaces of R2 such that their intersection is the zero space {0} which is the origin of R2 . Thus, dim(W1 ∩ W2 ) = 0. But then the formula in Proposition (3.5.2) implies that dim(W1 + W2 ) must be 2. Since R2 itself has dimension 2, it follows from the first result quoted at the beginning of this section that W1 + W2 = R2 . Our conclusion also proves that any two non-zero vectors, one each from these distinct lines, will form a basis of R2 , for any non-zero vector in a one-dimensional space forms a basis.

EXAMPLE 37 Consider subspaces in R3 next. Let W1 and W2 be two distinct planes in R3 passing through the origin. Thus, they are two-dimensional subspaces of R3 . Since dim(W1 + W2 ) cannot exceed 3, the dimension of R3 , it follows from Proposition (3.5.2) that dim(W1 ∩ W2 ) ≠ 0. However, (W1 ∩ W2 ) ⊂ W1 so that its dimension cannot exceed


dim W1 = 2. If it equals 2, then by Lemma (3.5.1), (W1 ∩ W2 ) = W1 , a contradiction as W1 and W2 are distinct planes. Thus, the only possibility is that dim(W1 ∩ W2 ) = 1. We conclude that the given planes intersect at a straight line passing through the origin.

It should be noted that in the last example, an argument based only on dimensions has helped us in getting a geometrical insight. We now discuss the general case of sums of finitely many subspaces of a vector space.

Definition 3.5.3. Let W1 , W2 , . . . , Wk be subspaces of a vector space V. The sum of these subspaces, denoted by W1 + W2 + · · · + Wk , is defined as follows:

    W1 + W2 + · · · + Wk = {w1 + w2 + · · · + wk | wi ∈ Wi }.

The subspaces Wi are known as summands of the sum.

It is clear that the sum, thus defined, is a subspace of V. Also, note that the definition is valid for subspaces of an infinite-dimensional vector space, too.

Direct Sums of Subspaces

The assertions of the following proposition give the equivalent conditions, which when imposed on a sum of subspaces, make such sums one of the most useful concepts of linear algebra.

Proposition 3.5.4. Let W1 , W2 , . . . , Wk be subspaces of a vector space V, and W = W1 + W2 + · · · + Wk be their sum. Then, the following are equivalent.
(a) Every vector in the sum W can be expressed uniquely as a sum of vectors from the subspaces W1 , W2 , . . . , Wk .
(b) For vectors wi ∈ Wi , the relation w1 + w2 + · · · + wk = 0 implies that each wi = 0 in Wi .
(c) For each i, 1 ≤ i ≤ k,

    Wi ∩ (W1 + W2 + · · · + Ŵi + · · · + Wk ) = {0}

is the zero subspace. (Here, the hat over Wi means that the term is missing in the sum.)

Proof. As every subspace contains the zero vector, the zero vector of W can be expressed as the sum v1 + v2 + · · · + vk , where for each i, vi is the zero vector of the subspace Wi . So if w1 + w2 + · · · + wk = 0 with wi ∈ Wi , the uniqueness assumed in condition (a) forces each wi to be this zero vector vi . This shows that (a) implies (b). Conversely, if for a vector v ∈ W,

    v = w1 + w2 + · · · + wk = u1 + u2 + · · · + uk ,

is the zero subspace. (Here, the hat over Wi means that the term is missing in the sum.) Proof. As every subspace contains the zero vector, the zero vector of W can be expressed as the sum v1 + v2 + · · · + vk , where for each i, vi is the zero vector of the subspace Wi . Therefore, the uniqueness assumed in condition (a) shows that (a) implies (b). Conversely, if for a vector v ∈ W, v = w1 + w2 · · · + wk = u1 + u2 · · · + uk ,

where for each i, wi , ui ∈ Wi , then we have (w1 − u1 ) + (w2 − u2 ) + · · · + (wk − uk ) = 0.

Saikia-Linear Algebra

150

book1

February 25, 2014

0:8

Vector Spaces

Since Wi is a subspace, for each i, (wi − ui ) ∈ Wi . So, if (b) holds, wi = ui for each i. Thus, (b) implies (a). A relation of the type w1 + w2 + · · · + wk = 0 with vectors wi ∈ Wi implies that for each fixed i, wi is the sum of vectors −w j for all j ! i, and therefore, is in the intersection of Wi with the sum of the other subspaces W j with j ! i. Thus, conditions (b) and (c) are equivalent. ! Definition 3.5.5. Let W1 , W2 , . . . , Wk be subspaces of a vector space V. The subspace W = W1 + W2 + · · · + Wk is called the internal direct sum, or simply the direct sum, of the subspaces if any one and hence all of the three conditions of Proposition (3.5.4) are satisfied. In that case, we write W = W1 ⊕ W2 ⊕ · · · ⊕ Wk . The subspaces Wi are known as direct summands of the subspace W. In case the subspaces are finite-dimensional, there is a useful characterization of their direct sum in terms of dimensions. Proposition 3.5.6. Let W1 , W2 , . . . , Wk be finite-dimensional subspaces of a vector space V, and W = W1 + W2 + · · · + Wk be their sum. Then, the following are equivalent. (a) W = W1 ⊕ W2 ⊕ + · · · ⊕Wk . (b) The union of any bases B1 , B2 , . . . , Bk of the subspaces W1 , W2 , . . . , Wk , respectively, is a basis of W. (c) dim W = dim W1 + dim W2 + · · · + dim Wk . Proof. We first note that the vectors of the union of any bases of the subspaces W1 , W2 , . . . , Wk span W, whether the sum W is direct or not. To prove that condition (a) implies (b), note that as the union of the bases of the subspaces Wi clearly spans their sum W, it suffices to show that vectors in this union are linearly independent. Now, given any linear combination of the vectors in the union of the bases which equals the zero vector, we can group together, for each i, the scalar multiples of the members of Bi in that combination, and label the sum of these multiples as, say, ai . Each ai being a linear combination of vectors of the basis Bi is in Wi . Therefore, the relation we started with can be rewritten as a1 + a2 + · · · + ak = 0,

where ai ∈ Wi .

Since we are assuming that W is a direct sum, it follows, by virtue of the preceding proposition, that each ai is the zero vector. On the other hand, the vectors in each Bi are linearly independent so the scalars in the linear combination of the vectors of Bi that resulted in ai , must all be zeros. The argument can be repeated for each i to show that all the scalar coefficients in the original relation involving the basis vectors in the union are zeros. Thus, the vectors in the union are linearly independent proving that condition (a) implies condition (b). A similar argument will prove that condition (b) implies (a). That (b) implies (c) is trivial. So, we assume condition (c), and consider the union of given bases B1 , B2 , . . . , Bk of the subspaces W1 , W2 , . . . , Wk , respectively. It is clear that the union is a spanning set of the sum W. Thus, if the union is not a basis of W, then there is a proper subset of the union which will be a basis of W. In that case, dim W will be strictly less than the sum of the dimensions of W1 , W2 , . . . , Wk contradicting our assumption. This completes the proof of the proposition. !

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Subspaces Again

151

A couple of remarks are in order. (i) Recall that we have insisted that a basis of a vector space is an ordered one. Thus, the union of the bases in the condition (c) must be interpreted as the sequence of vectors obtained by stringing together the vectors in the ordered bases of the subspaces one after the other in the same order. (ii) The definition of a direct sum of two subspaces is particularly simple. We can rephrase it as follows: the sum W = W1 + W2 is a direct sum if and only if W1 ∩ W2 = {0}, the zero subspace. (iii) For finite-dimensional subspaces W1 and W2 of a vector space V, the sum W = W1 ⊕ W2 is direct if and only if dim W = dim W1 + dim W2 . EXAMPLE 38 Going back to Proposition (3.5.2), we see, for example, that the sum of any two distinct one-dimensional subspaces of R2 is direct; in fact, R2 = W1 ⊕ W2 for any two distinct lines W1 and W2 passing through the origin. On the other hand, we had seen after Proposition (3.5.2) that the sum of two distinct planes in R3 passing through the origin cannot be a direct sum. EXAMPLE 39 Recall that a matrix A ∈ Mn (F) is called a symmetric matrix if At = A, and a skewsymmetric matrix if At = −A. We had also seen that the symmetric and the skewsymmetric matrices form subspaces of Mn (F), say W1 and W2 , respectively. Observe that for any matrix A ∈ Mn (F), the matrix 1/2(A + At ) is symmetric, whereas 1/2(A − At ) is anti-symmetric. (This assumes that the field F is such that division by 2 is possible in F). It follows that Mn (F) = W1 + W2 . Next, note that a matrix in Mn (F) is symmetric as well as skew-symmetric if and only if it is the zero matrix. In other words, W1 ∩ W2 = {0}. We can, therefore, conclude that Mn (F) = W1 ⊕ W2 . Given a subspace W of a vector space V, quite often we need to know whether W is a direct summand of V, or equivalently, whether there is another subspace W1 such that V = W ⊕ W1 . Such a subspace is sometimes referred to as a direct complement of W. If W is a subspace of a finitedimensional vector space V, it is easy to see that complementary subspaces for W exist. Choose a basis of W, expand it to a basis of V and let W1 be the subspace spanned by those basis vectors which are not in W. Then, it is clear that V = W ⊕ W1 . We record this observation now. Proposition 3.5.7.

Any subspace of a finite-dimensional vector space has direct complements.

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given vector spaces are finite-dimensional over arbitrary fields. (a) Any basis of a proper subspace of a vector space V can be extended to a basis of V. (b) Any linearly independent subset of a proper subspace of a vector space V can be extended to a basis of V.

Saikia-Linear Algebra

152

book1

February 25, 2014

0:8

Vector Spaces

(c) For any two subspaces W1 and W2 of a vector space, dim(W1 + W2 ) is larger than the minimum of the two dimensions of W1 and W2 . (d) The direct complement of any proper subspace of a vector space is unique. (e) If, for subspaces W1 , W2 , . . . , Wk in a vector space, Wi ∩ W j = (0) for all i ! j, then the sum W1 + W2 + · · · + Wk is direct. (f) If for two subspaces W1 and W2 of a vector space V, dim W1 + dim W2 = dim V, then V = W1 ⊕ W2 .

(g) If dim V > 1, then there are always two subspaces W1 and W2 of vector space V such that V = W1 ⊕ W2 .

(h) For subspaces W1 , W2 and W3 in a vector space, the dimension of the sum (W1 + W2 + W3 ) equals dim W1 + dim W2 + dim W3 − dim(W1 ∩ W2 ∩ W3 ). (i) In an n-dimensional vector space V, there are n distinct subspaces whose direct sum is V. (j) An n-dimensional vector space cannot be a direct sum with more than n direct summands. 2. 3. 4. 5.

Prove Lemma (3.5.1). Prove Proposition (3.5.2) in case W1 ∩ W2 = {0}. Prove that condition (c) implies (a) in Proposition (3.5.4). Let F be a field, and let e1 , e2 , . . . , en be the standard basis of the vector space Fn . Prove that Fn = Fe1 ⊕ Fe2 ⊕ · · · ⊕ Fen .

6. Give an example of a vector space V and subspaces W1 , W2 and W3 such that W2 ! W3 and V = W1 ⊕ W2 = W1 ⊕ W3 . 7. Let W1 , W2 and W3 be subspaces of a vector space. If W2 ⊂ W1 , then prove the Modular Law: W1 ∩ (W2 + W3 ) = W2 + W1 ∩ W3 . 8. Let W1 , W2 , W3 and V1 be subspaces of a vector space V such that V1 = W2 ⊕ W3 . If V = W1 ⊕ V1 , then show that V = W1 ⊕ W2 ⊕ W3 . 4 9. Let W1 , W2 , . . . , Wk be subspaces of a vector space. Prove that the sum i Wi of these subspaces is direct if and only if every set of non-zero vectors w1 , w2 , . . . , wk , where w j is chosen from W j for j = 1, 2, . . . , k, is linearly independent. 10. Let W1 , W2 and W3 be the following subsets of vectors in R3 : W1 = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0}, W2 = {(x1 , x2 , x3 ) | x1 = x2 }, W3 = {(x1 , x2 , x3 ) | x1 = x2 = 0}.

Verify that W1 , W2 and W3 are subspaces of R3 such that R3 = W1 + W2 = W1 + W3 . Which of these two sums are direct?

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Rank of a Matrix

153

11. Let W1 and W2 be subspaces of M2 (R) given by ;' ( < a b | a, b, c ∈ R , W1 = c 0 ;' ( < 0 b W2 = | b, c, d ∈ R . c d Show that M2 (R) = W1 + W2 but that the sum is not direct. Find a subspace W3 of W2 such that M2 (R) = W1 ⊕ W3 . 12. Let V be the real vector space of all mappings from R into R. (See Example 8 of Section 3.2 for the definition of space of mappings). Let W1 and W2 be the subsets of V consisting of even and odd functions in V, respectively, given by W1 = { f ∈ V | f (−x) = f (x) for all x ∈ R}, W2 = { f ∈ V | f (−x) = − f (x) for all x ∈ R}. Verify that W1 and W2 be subspaces of V, and show that V = W1 ⊕ W2 . 13. Let W1 and W2 be the subspaces of Mn (F) consisting of symmetric and skew-symmetric matrices of Mn (F), respectively. Argue in terms of the dimensions of these subspaces to show that Mn (F) = W1 ⊕ W2 .

Recall that a scalar matrix is a diagonal matrix in Mn (F) such that all its diagonal entries are equal. 14. Let W be the subspace of all scalar matrices in M2 (R). Find a basis of W in terms of the unit matrices e11 , e12 , e21 and e22 . Hence, determine three distinct subspaces W1 , W2 and W3 of M2 (R) by giving their bases such that M2 (R) = W ⊕ W1 = W ⊕ W2 = W ⊕ W3 .

3.6 RANK OF A MATRIX The ideas such as subspace and linear independence, developed in the context of vector spaces, also help us in gaining useful insights about individual matrices. In this section, we use the ideas to examine the important concept of the rank of a matrix. Consider any A ∈ Mm×n (F). Each of the m rows of A, considered an n-dimensional row vector, is a vector in the vector space Fn . Similarly, the m-dimensional column vectors of A can be considered as vectors in Fm . Definition 3.6.1. Given A ∈ Mm×n (F), the subspace of Fn spanned by the row vectors of A is called the row space of A and the dimension of the row space of A is called the row rank of A. Similarly, the subspace of Fm spanned by the column vectors of A is the column space of A, and the dimension of the column space is the column rank of A. We denote the row space and the column space of a matrix A by row(A) and col(A), respectively. It is clear that the identity matrix In of order n over any field F has n as its row rank and column rank as its row vectors as well as the column vectors form the standard basis of Fn . On the other hand, it is clear that the zero matrix in Mm×n (F) has both its ranks zero. To avoid trivialities, we will deal with only non-zero matrices in this section.

Saikia-Linear Algebra

154

book1

February 25, 2014

0:8

Vector Spaces

We illustrate the definitions with the following example: EXAMPLE 40 Consider real matrices  1 0 0 1 A =  0 0 0 0  0 0 C =  0 0

0 0 0 0

0 0 1 0

0 0 0 0

1 0 0 0

0 1 0 0

 0  0 , 0 1

 0  0 , 1 0

 0 0 B =  0 0

 0 0 D =  0 0

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 0

1 0 0 0

0 0 0 0

 0  0 , 0 1

 0  1 . 0 0

For each of these four matrices, the rows are vectors in R5 whereas the columns are vectors in R4 . Further, in each non-zero row, there is only one non-zero entry, and more importantly, this is the only non-zero entry of the column containing it. It is thus clear that the non-zero rows are linearly independent as vectors of R5 so that the row rank of each of these matrices is the number of non-zero rows in the matrix. Similarly, the non-zero columns are also linearly independent and thus their numbers are the column ranks. Observe that it is easy to find the row and the column ranks of these matrices precisely because they are in reduced row echelon form in which every non-zero column is a pivot column. Thus, it makes sense to relate the ranks of an arbitrary matrix to the ranks of its echelon forms. It is easy to relate the row rank of a matrix to the row rank of any matrix row equivalent to it. Recall that (see Definition (2.3.1)) there are three types of elementary row operations. Lemma 3.6.2. Two row equivalent matrices in Mm×n (F) have the same row rank. Proof. We show that two row equivalent matrices have the same row spaces. Therefore, it suffices to show that if B is obtained from A ∈ Mm×n (F) by a single elementary row operation, then row(B) = row(A). Considering the three types of elementary row operations, we see that any row vector of B is either a scalar multiple of the corresponding row vector of A, or a linear combination of two row vectors of A. It follows that the span row(B) of the row vectors of B is contained in row(A). Since A can also be obtained from B by some suitable row operations, a similar argument shows that row(A) ⊂ row(B). The lemma follows. ! We may thus conclude that the row rank of any matrix is the same as the row rank of its reduced row echelon form or any of its echelon forms. Before we take up the general result about the row ranks of such matrices, we note that the non-zero rows of the matrix   0 0 1 0 0 0 1 0 −1 0   0 0, R = 0 0 1  0 0 0 0 1  0 0 0 0 0

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Rank of a Matrix

155

which is the reduced row echelon form of some matrix of order 5, are linearly independent as row vectors in R5 . To take another example, consider the following matrix in row echelon form:  1 0 S =  0 0

2 1 0 0

3 7 0 0

4 9 0 0

6 8 2 0

 5  −1 . 3 0

In this matrix too, it is clear that the non-zero rows are linearly independent as vectors of R6 , as the entries below any pivot are all zeros. The reader should verify the asserted linear independence by working out the details, as it will make the argument of the following result clear. Lemma 3.6.3. Let R ∈ Mm×n (F) be a row echelon or a reduced row echelon matrix. Assume that R is non-zero. Then, the non-zero row vectors of R are linearly independent as vectors in Fn . Proof. Consider the non-zero rows of R as row vectors in Fn , and number them as ρ1 , ρ2 , . . . , ρr starting from the bottom-most non-zero row so that the top-most row of R is ρr . Now in the list ρ1 , ρ2 , . . . , ρr , the rows preceding any row ρ j are actually the non-zero rows below it inside the echelon matrix R. Thus, the components in ρ1 , ρ2 , . . . , ρ j−1 corresponding to the pivot in ρ j are all zeros. It follows that ρ j cannot be a linear combination of the preceding rows ρ1 , ρ2 , . . . , ρ j−1 . Hence, by the characterization of linearly dependent vectors given in Proposition (3.3.10), the lemma follows. ! Corollary 3.6.4. The row rank of a row echelon, or a reduced row echelon matrix, is the number of its non-zero rows. This corollary, along with Lemma (3.6.2), implies the following easy characterization of the row rank of a matrix. Proposition 3.6.5. The row rank of a matrix A in Mm×n (F) is the number of the non-zero rows of the reduced row echelon form of A. We now begin our discussion of column rank. The first task is to relate the column rank of a reduced row echelon matrix R to its pivot columns. Let γ be any non-zero non-pivot column of R. Consider the pivot columns preceding γ as we go along the matrix R from the left to the right; we number them as γ1 , γ2 , . . . , γ s , where γk is the column in which the pivot appears in the kth row. It should be clear that γk need not be the kth column of R as we are numbering only the pivot columns of R. As γ s is the pivot column just preceding γ (there may be non-pivot columns between γs and γ), all the entries in the column γ below the sth entry are zeros. Therefore, the column vector γ looks like   a1   ..   .    a  γ =  s ,  0   .   ..    0

Saikia-Linear Algebra

156

book1

February 25, 2014

0:8

Vector Spaces

where some of the scalars ak may also be zeros. Observe that each of the pivot columns γ1 , γ2 , . . . , γ s has pivot 1 so the column γ can be expressed as the linear combination a 1 γ1 + a 2 γ2 + · · · + a s γ s . We have thus shown that any non-zero non-pivot column of R is a linear combination of some of its pivot columns. Therefore, the pivot columns of R are sufficient to span the column space. Note next that the entries preceding a pivot in the row containing the pivot are all zeros. Therefore, no pivot column of R can be a linear combination of the pivot columns preceding it. This proves that the pivot columns of R are linearly independent and so form a basis of the column space of R. We record this fact as the following result. Lemma 3.6.6. Let R ∈ Mm×n (F) be a reduced row echelon matrix. Then, the pivot columns of R form a basis of col(R), so that the column rank of R is the number of pivot columns of R. To link up this lemma with a result about the column rank of an arbitrary matrix, we need to prove the following. Lemma 3.6.7. The column ranks of two row equivalent matrices in Mm×n (F) are equal. Proof. Let A and B be row equivalent matrices in Mm×n (F). Since the column rank of a matrix is the dimension of the space spanned by its column vectors, it suffices to prove that the numbers of linearly independent columns of A and B are the same. Equivalently, it is sufficient to show that any relation of linear dependence among the columns of A gives rise to an exactly similar relation among the corresponding columns of B and vice versa. To prove the assertion, let γ1 , γ2 , . . . , γn be the column vectors A, and σ1 , σ2 , . . . , σn be the column vectors of B. Consider the following relation of linear dependence c1 γ1 + c2 γ2 + · · · + cn γn = 0,

ci ∈ F.

(3.7)

It implies that (c1 , c2 , . . . , cn )t is a solution of the vector equation x1 γ1 + x2 γ2 + · · · + xn γn = 0, which can also be expressed as the matrix equation Ax = 0 (see Equation 3.3), where x is the column vector (x1 , x2 , . . . , xn )t . It follows that (c1 , c2 , . . . , cn )t is a solution of the matrix equation Ax = 0. However, A and B being row equivalent, we know that the systems Ax = 0 and Bx = 0 have the same set of solutions. Working backwards with the columns of B now, we see that the relation (3.7) implies the following relation among the columns of B: c1 σ1 + c2 σ2 + · · · + cn σn = 0,

ci ∈ F

Since row equivalence is symmetric, a similar assertion, obtained by interchanging A and B, holds. This completes the proof. ! The preceding two lemmas together yield the following result about column ranks.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Rank of a Matrix

157

Proposition 3.6.8. Let R be the reduced row echelon form of a matrix A ∈ Mm×n (F). Then, the linearly independent columns of A are precisely those corresponding to the pivot columns of R. Thus, the column rank of A is the number of pivot columns of R. We can now state the main result of this section. Theorem 3.6.9.

The row rank and the column rank of a matrix in Mm×n (F) are the same.

Proof. Let R be the reduced row echelon form of an arbitrary matrix A ∈ Mm×n (F). Since row equivalent matrices have the same row rank as well as the same column rank, it suffices to show that the row rank and the column rank of R are the same. We can assume that R is non-zero. By Lemma (3.6.3), the row rank of R is the number of non-zero rows of R. However, each non-zero row of R contains a pivot, and each pivot of R belongs to exactly one pivot column of R. It follows that the number of non-zero rows of R is precisely the number of pivot columns of R. Thus, the row rank of R, according to Lemma (3.6.6), is the column rank of R. ! Definition 3.6.10. Let A ∈ Mm×n (F). The rank of A is the common value of the row and the column rank of A. We denote this common value as rank(A). As we have already noted, for a matrix A ∈ Mm×n (F), the product Ax describes all possible linear combinations of the column vectors of A as x ranges over Fn . It is also clear, from the properties of matrix multiplication, that the set {Ax | x ∈ Fn } is a subspace of Fm . Thus, {Ax | x ∈ Fn } is the explicit description of the column space of A. So, we have another characterization of the rank of a matrix, which is useful for discussing ranks of products of matrices. Proposition 3.6.11.

Let A ∈ Mm×n (F). Then rank(A) is the dimension of the subspace col(A) = {Ax | x ∈ Fn }.

The next proposition gathers some assorted results about ranks of matrices. All the matrices in the proposition are over the same field F. Proposition 3.6.12. (a) (b) (c) (d)

Let A ∈ Mm×n (F).

If At is the transpose of A, then rank(A) = rank(At ). For any n × p matrix B, rank(AB) ≤ rank(A). For any p × m matrix B, rank(BA) ≤ rank(A). For any invertible matrices P and Q of suitable orders, rank(PA) = rank(AQ) = rank(A).

Proof. (a) is immediate from the definition of rank. For the rest, the preceding description of column space helps us in deriving quick proofs. For example, for any y ∈ F p and n × p matrix B, let x = By. Then, x ∈ Fn . Note that Ax = (AB)y. This shows that col(AB) ⊂ col(A), proving (b). We leave the proofs of the other two to the reader. !

Saikia-Linear Algebra

158

book1

February 25, 2014

0:8

Vector Spaces

Sometimes, we need to find bases for the row and the column spaces of a matrix explicitly. We summarize now our findings about such bases. • The basis of the column space col(A) of a matrix A is precisely those columns which correspond to the pivot columns of the reduced row echelon form of A. Note that the pivot columns of the reduced form of A need not form a basis of the column space of A in general, as row operations invariably change the column space. This is in sharp contrast to the situation obtained for the row space of a matrix. • The basis of the row space row(A) of a matrix A is precisely the non-zero rows of the reduced row echelon form of A. The following examples illustrate the results we have developed in this section. EXAMPLE 41 Consider the matrix

Now, it can be verified that

 1 4  A = 0 8  0  1 0  R = 0 0  0

0 1 3 2 0

2 0 1 0 1

0 1 0 0 0

0 0 1 0 0

0 −1 0 −2 0

0 −1 0 0 0

 −3  0  −2.  0 −3  0  0  0  1 0

is the reduced row echelon form of A. Since R has 4 non-zero rows, it follows that the row rank of R as well as of A is 4. Note that 4 must be the column rank of A and R too. According to the assertions preceding this example, we see that the row space row(A) as well as row(R) is spanned by (1, 0, 0, 0, 0), (0, 1, 0, −1, 0), (0, 0, 1, 0, 0) and (0, 0, 0, 0, 1). Considering the columns of A corresponding to the pivot columns of R, we see that the column space col(A) is spanned by

Null Space and Nullity

  1 4   0, 8   0

  2 1   3, 2   0

  0 0   1 0   1

and

   −3   0     2 .    0  −3

There is yet another subspace associated with an m × n matrix A, though unlike the row space and the column space of A, this one has no direct relations with the entries of the matrix. Recall (see Example 6 in the examples of subspaces in Section 3.1) that the solutions of the matrix equation Ax = 0 form a subspace of the vector space Fn .

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Rank of a Matrix

159

Definition 3.6.13. Let A ∈ Mm×n (F). The null space of A, denoted by null(A), is the subspace of Fn consisting of the solutions of the matrix equation Ax = 0. The dimension of the null space of A is called the nullity of A and denoted by nullity(A). For example, the nullity of an invertible matrix A is zero as the invertibility of A implies that the zero vector is the only solution of Ax = 0 so the null space of such a matrix is the zero subspace. We now establish a relation between the rank and the nullity of an arbitrary matrix. This relation is one of the most useful results in linear algebra. Theorem 3.6.14.

Let A ∈ Mm×n (F). Then, rank(A) + nullity(A) = n.

Proof. Let s = nullity(A). If s = 0, then the system of equations Ax = 0 has only the zero solution and so every column of A is a pivot column (see the discussion preceding Proposition 2.5.1). It follows that the rank of A is n, proving the theorem in this case. So, we assume that s ≥ 1.
Let x1, x2, ..., xs be column vectors in Fn forming a basis of null(A); extend it to a basis x1, x2, ..., xs, xs+1, ..., xn of Fn. We claim that the column vectors Axs+1, ..., Axn of Fm form a basis of col(A). Recall that col(A) is the subspace consisting of all the vectors Ax as x ranges over Fn. So, it is clear that the vectors Axj, for s + 1 ≤ j ≤ n, are in col(A). On the other hand, for any x ∈ Fn, we can express it as Σ_{j=1}^{n} bj xj for some scalars bj. Therefore,

Ax = Σ_{j=1}^{n} bj Axj = Σ_{j=s+1}^{n} bj Axj

as Axj = 0 for 1 ≤ j ≤ s. Thus the vectors Axs+1, ..., Axn span col(A). Next, we show that these vectors are linearly independent. Suppose that for scalars cj, Σ_{j=s+1}^{n} cj Axj = 0. Then A(Σ_{j=s+1}^{n} cj xj) = 0, showing that Σ_{j=s+1}^{n} cj xj ∈ null(A). It follows that

Σ_{j=s+1}^{n} cj xj = Σ_{i=1}^{s} bi xi

for some scalars b1, b2, ..., bs. As the vectors x1, x2, ..., xs, xs+1, ..., xn form a basis of Fn, it follows that all the coefficients in the preceding equality and, in particular, all the cj are zero. The claim is thus established. By our claim, the dimension of col(A), which is the rank of A, is precisely n − s. The theorem follows. ∎
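For readers who like to check such statements numerically, here is a minimal Python sketch (not part of the original text; it assumes the NumPy library and the matrix A as reconstructed in Example 41 above). The rank is computed directly and a basis of the null space is read off from the singular value decomposition.

    import numpy as np

    # The 5 x 5 matrix A of Example 41; it has rank 4 and nullity 1
    A = np.array([[1, 2, 0, -2, -3],
                  [4, 1, 0, -1,  0],
                  [0, 3, 1, -3,  2],
                  [8, 2, 0, -2,  0],
                  [0, 0, 1,  0, -3]], dtype=float)

    n = A.shape[1]
    rank = np.linalg.matrix_rank(A)
    _, s, vt = np.linalg.svd(A)
    null_basis = vt[rank:]                   # these rows span null(A)
    print(rank + null_basis.shape[0] == n)   # True: rank(A) + nullity(A) = n
    print(np.allclose(A @ null_basis.T, 0))  # True: the rows really lie in null(A)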

For the explicit determination of a basis of the null space of a matrix A, one can use the reduced row echelon form R of A. Since the solution space of Ax = 0 is the solution space of Rx = 0, such a determination is easier working with R. The following example explains the method of finding a basis of the null space of a matrix; since such a basis is a linearly independent set of solutions of Ax = 0, or equivalently of Rx = 0, the reader is advised to review the material of Chapter 2, especially of Section 2.4.


EXAMPLE 42 Consider the 4 × 5 real matrix:

A = [  2   0    1    0   4
      −8   3    5   −6  −1
       0   2    6   −4   0
       0  −4  −12    8   2 ].

The null space of A, or of its reduced row echelon form R, is a subspace of R5. The reader can verify that

R = [ 1  0  1/2   0  0
      0  1   3   −2  0
      0  0   0    0  1
      0  0   0    0  0 ]

is the reduced row echelon form of A. R has three pivot columns, namely, the 1st, 2nd and 5th columns, so the nullity of A is 2. So, if x = (x1, x2, x3, x4, x5)t is the column vector of variables, then x1, x2 and x5 are the basic variables, and the others are the free variables. Now the matrix equation Rx = 0, i.e.

[ 1  0  1/2   0  0
  0  1   3   −2  0
  0  0   0    0  1
  0  0   0    0  0 ] (x1, x2, x3, x4, x5)t = 0,

can be written out explicitly as

x1 + (1/2)x3 = 0
x2 + 3x3 − 2x4 = 0
x5 = 0.

Thus, the non-trivial linear combinations of the free variables are given by

x1 = −(1/2)x3
x2 = −3x3 + 2x4.

We think of the remaining basic variable x5 as the trivial or the zero linear combination of the free variables. Thus, the general solution of Rx = 0, and equivalently of Ax = 0, is given by

(x1, x2, x3, x4, x5)t = (−(1/2)x3, −3x3 + 2x4, x3, x4, 0)t.

This general solution was obtained by following the steps outlined in the summary preceding this example.


Now comes the crucial step of expressing the general solution as a linear combination of suitable column vectors, with the coefficients in the combination being the free variables. In our example, this is done as follows:

(x1, x2, x3, x4, x5)t = x3 (−1/2, −3, 1, 0, 0)t + x4 (0, 2, 0, 1, 0)t.

It is now clear that

(−1/2, −3, 1, 0, 0)t  and  (0, 2, 0, 1, 0)t

span the null space of A (as well as of R). They are also linearly independent, as can be seen by looking at their components corresponding to the free variables.
Corollary 3.6.15. For any A ∈ Mm×n (F), the nullity of A, or the dimension of the solution space of the homogeneous system of linear equations Ax = 0, is the number of non-pivot columns of the reduced row echelon form of A, and hence is the number of free variables of the system.
We now present a result which relates the invertibility of a square matrix to its rank. Recall that a square matrix is invertible if and only if its row reduced echelon form is the identity matrix.
Proposition 3.6.16. The following are equivalent for a matrix A ∈ Mn (F).
(a) A is invertible.
(b) The rows of A form a basis of Fn.
(c) The rank of A is n.
(d) The columns of A form a basis of Fn.
(e) The nullity of A is 0.

The proof is left to the reader. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given matrices are over an arbitrary field. (a) The rank of an m × n matrix cannot exceed either m or n. (b) The rank of a 2 × n matrix, where n > 2, must be 2 if the second row is not a multiple of the first row. (c) The nullity of an m × n matrix cannot be zero if m ! n. (d) The row space of a square matrix is the same as its column space. (e) The row space of a matrix A is the same as the column space of its transpose At .


(f) If A is an m × 1 matrix and B a 1 × n matrix, then the product AB has rank 1.
(g) The sum of the dimensions of the null space and the row space of a matrix cannot exceed the number of rows of the matrix.
(h) If R is any row echelon form of a matrix A, and R has four non-zero rows, then the first four rows of A form a basis of the row space of A.
(i) If R is any row echelon form of a matrix A, then the pivot columns of R form a basis of the column space of A.
(j) The intersection of the null space and the column space of any square matrix is the zero subspace.
(k) The row space and the column space of any non-invertible square matrix cannot be the same subspace.
(l) The nullity of a matrix A is the same as the nullity of its transpose At.
(m) The m × n zero matrix is the only matrix of rank zero in Mm×n (F).
(n) An n × n matrix having n linearly independent columns is invertible.
(o) If the rank of an m × n matrix is r, then the nullity of At is (m − r).
(p) If, for an m × n matrix A over F, the equation Ax = b is consistent for all b ∈ Fm, then the column space of A is all of Fm.
(q) If the rank of an m × n matrix A is 2, then A is the sum of 2 m × n matrices each of whose rank is 1.
2. Find the rank and the nullity of the following matrices over R:

[ 1  2  1        [ 1  1  0  0        [  1  −1   0  4
  1  0  1 ,        0  0  1  1 ,        −1   2  −4  7 ,
  1  1  2 ]        1  1  1  0 ]         5  −6   4  9 ]

[  0   3  −1  −2   6        [  4   −7   3    7   −5
  −2   1   2   1  −3           6   −8   5   12   −8
   2  −3   0   4   1 ,        −7   10  −8   −9   14
   1  −1   2  −2   3 ]         3   −5   4    2   −6
                              −5    6  −6   −7    3 ].

 −5  −8  14.  −6 3

3. Find bases for the row space, the column space and the null space of each of the matrices in Exercise 2. 4. Prove assertions (c) and (d) of Proposition (3.6.12). 5. Prove that the statements of Proposition (3.6.16) are equivalent. 6. For any A ∈ Mm×n (F) and any non-zero scalar c ∈ F, prove that rank(cA) = rank(A). 7. Let A be an m × n, and B be an n × p matrix over a field F. Show that product AB can be written as a sum of n matrices each of which has rank at most 1. Hint: Use column-row expansion for the product as given in Proposition (1.6.3). 8. Let A ∈ Mm×n (F) be of rank 1. Show that A can be written as a product BC where B is an m × 1 and C is an 1 × n matrix over F. 9. Let A ∈ Mm×n (F) be of rank r. Show that A can be written as the sum of r rank 1 matrices in Mm×n (F).


10. How does the rank of a matrix in Mm×n (F) change if a single entry of the matrix is replaced by another scalar?
11. Determine whether v = (1, 2, −1)t is in the column space of

A = [ −4  −1  3
       5   1  2
       3   0  1 ].

Does v belong to the null space of A?
12. Let A ∈ Mm×n (F) with nullity k. If m + k = n, then show that for every column vector b ∈ Fm, the system of equations Ax = b has a solution.
13. Let A ∈ Mm×n (F). Show that the system of equations Ax = b for any b ∈ Fm has a solution if and only if the system of equations At x = 0 has only the trivial solution.
14. Let A be an m × n and B an n × m matrix over any field F. If m > n, then prove that AB cannot be invertible.
15. Let A and B be matrices in Mn (F) such that A2 = A, B2 = B and In − A − B is invertible. Prove that A and B have the same rank by computing A(In − A − B) and B(In − A − B).

3.7 ORTHOGONALITY IN Rn
There is some interesting interplay between the row space, the column space and the null space of a real matrix which can only be understood in terms of the natural geometry of Rn. Geometrical considerations in a real or a complex vector space can be introduced through the concept of inner products; we shall be discussing such inner products in detail in Chapter 8. Here, our aim being limited, we deal with only the standard inner product on the vector space Fm, where F is either R or C; this product is usually known as the dot product. Even if one is dealing solely with real matrices, sometimes the dot product on Cm cannot be avoided; for example, we shall be needing it to prove an important result about real symmetric matrices in Section 5.3. We shall also be introducing orthogonal matrices and QR-factorizations of real matrices in this section.
Throughout this section, F is either C or R. Also, we shall be treating elements of Fm as column vectors; we shall denote a column vector as (x1, x2, ..., xm)t.
First the notation. For any x = a + ib with a, b ∈ R, its conjugate is given by x̄ = a − ib; x̄ = x if and only if x is real. Any real number can obviously be treated as a complex number with zero imaginary part: x = a + i0. Also recall from Section 1.5 that for any x = (x1, x2, ..., xm)t ∈ Fm, its conjugate transpose is given by x∗ = (x̄1, x̄2, ..., x̄m); in case F = R, x∗ = xt = (x1, x2, ..., xm). We also set a∗ = ā for a scalar a ∈ F, considering a as a 1 × 1 matrix.
Definition 3.7.1. For any x = (x1, x2, ..., xm)t and y = (y1, y2, ..., ym)t in Fm, the dot product or the standard inner product ⟨x, y⟩ is given by

⟨x, y⟩ = y∗ x = x1 ȳ1 + x2 ȳ2 + · · · + xm ȳm.

Note that ⟨x, y⟩ is a scalar in F.
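As a quick side illustration of the conjugation convention (not part of the text; it assumes the NumPy library), note that the conjugate falls on the second argument in ⟨x, y⟩ = y∗x, so the order matters when a library routine conjugates its first argument instead:

    import numpy as np

    x = np.array([1 + 2j, 3j])
    y = np.array([2 - 1j, 1 + 1j])

    # <x, y> = y* x = sum of x_i times the conjugate of y_i
    inner = np.sum(x * np.conj(y))

    # np.vdot conjugates its *first* argument, so the arguments are swapped here
    print(np.isclose(inner, np.vdot(y, x)))   # True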


In case F = R, the inner product is the usual dot product in Rm, which is a real number:

⟨x, y⟩ = yt x = x1 y1 + x2 y2 + · · · + xm ym.

It is easy to establish the basic properties of the dot product.
Proposition 3.7.2. For any x, y, z ∈ Fm and a ∈ F, the following hold:
(a) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
(b) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
(c) ⟨ax, y⟩ = a ⟨x, y⟩.
(d) ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩.
(e) ⟨x, ay⟩ = ā ⟨x, y⟩.

Proof. We treat vectors in Fm as m × 1 and their conjugates as 1 × m matrices. Then by properties of matrix multiplication, one has z∗(x + y) = z∗x + z∗y and y∗(ax) = a(y∗x); also, as the conjugate of yi + zi is ȳi + z̄i, (y + z)∗x = (y∗ + z∗)x = y∗x + z∗x. These relations are the first three assertions of the proposition. Next, applying the rule for the conjugate transpose of a product of two matrices given in Section 1.5, we obtain (x∗y)∗ = y∗(x∗)∗ = y∗x, which implies the fourth assertion. It is clear that the third and the fourth together give us the last assertion. ∎
The proposition clearly implies, for scalars c1, c2, ..., cr and vectors x1, x2, ..., xr in Fm, that

⟨ Σ_{i=1}^{r} ci xi , y ⟩ = Σ_{i=1}^{r} ci ⟨xi, y⟩     (3.8)

for any y ∈ Fm, a result whose verification is left to the reader.
For a complex number x = a + ib, x x̄ = a² + b² is a non-negative real number; it is clear that x x̄ = 0 if and only if x = 0. The modulus or the absolute value |x| is then defined as the non-negative square root of x x̄; thus |x| = 0 if and only if x = 0. In case x is real, |x| = x if x is non-negative and |x| = −x if x is negative. It follows that for any x = (x1, x2, ..., xm)t ∈ Fm,

⟨x, x⟩ = |x1|² + |x2|² + · · · + |xm|²,

a non-negative real number with ⟨x, x⟩ = 0 if and only if x is the zero vector.
Definition 3.7.3. The length ‖x‖ of a vector x ∈ Fm is defined as the non-negative square root of ⟨x, x⟩. Thus,

‖x‖² = |x1|² + |x2|² + · · · + |xm|².

A vector x ∈ Fm is a unit vector if ‖x‖ = 1. The standard basis vectors e1, e2, ..., em of Fm are clearly unit vectors. Also note that for any non-zero vector x, (1/‖x‖) x is a unit vector.


Orthogonal Vectors
We now introduce the important idea of orthogonality in Fm.
Definition 3.7.4. A vector x ∈ Fm is orthogonal to y ∈ Fm, and we write x ⊥ y, if ⟨x, y⟩ = 0.
Note: ⟨x, y⟩ = 0 is equivalent to ⟨y, x⟩ = 0. So x ⊥ y if and only if y ⊥ x. Thus we usually use the term orthogonal vectors.
The idea of orthogonal vectors in Fm generalizes that of perpendicular straight lines in R2 and R3. A vector (x1, x2)t in R2 represents a (directed) line segment in the plane from the origin to the point with coordinates (x1, x2); the length of the segment is the length of the vector as we have defined it. Now consider vectors x and y in R2; they represent two sides of a triangle, which meet at the origin. By the Parallelogram law, the sum x + y represents a line segment which is parallel to and equal in length to the third side of the triangle. We calculate the length of x + y by using the properties of the dot product in R2:

‖x + y‖² = ⟨x + y, x + y⟩
         = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
         = ‖x‖² + ‖y‖² + ⟨x, y⟩ + ⟨y, x⟩
         = ‖x‖² + ‖y‖² + 2 ⟨x, y⟩,

as ⟨y, x⟩, being the complex conjugate of ⟨x, y⟩, equals ⟨x, y⟩ for the dot product over R. Now, if the two sides of the triangle represented by the vectors x and y are perpendicular, then by Pythagoras' theorem

‖x + y‖² = ‖x‖² + ‖y‖²,

which holds, according to our calculation, if and only if ⟨x, y⟩ = ⟨y, x⟩ = 0. In other words, the condition for the perpendicularity of two vectors in R2 is that their dot product in R2 is zero. Thus, the definition of the orthogonality of vectors in Fm is indeed a generalization of the idea of perpendicularity in R2.
We now record a few useful facts about orthogonality in Fm.
Proposition 3.7.5.

Let F be either R or C.

(a) Any x ∈ Fm is orthogonal to the zero vector in Fm.
(b) The standard basis vectors e1, e2, ..., em form a mutually orthogonal set of unit vectors in Fm.
(c) If x1, x2, ..., xm are the coordinates of x ∈ Fm with respect to the standard basis {e1, e2, ..., em}, then xi = ⟨x, ei⟩ for each i.
(d) The only vector in Fm which is orthogonal to itself is the zero vector.
(e) If x ∈ Fm is orthogonal to every vector in Fm, then x is the zero vector.
Proof. The first two assertions follow directly from the definition of orthogonality. The hypothesis of the next assertion implies that x = Σ_{j=1}^{m} xj ej. Taking the dot product of both sides of the relation by ei, we then obtain the required result as ⟨ei, ei⟩ = 1 and ⟨ej, ei⟩ = 0 for j ≠ i by (b). For the fourth assertion, note that for complex numbers x1, x2, ..., xm, the relation |x1|² + |x2|² + · · · + |xm|² = 0 holds if and only if each xi = 0. Thus, for x = (x1, x2, ..., xm)t ∈ Fm, the dot product ⟨x, x⟩ = 0 if and only if each component of x is zero. Finally, the hypothesis in (e) implies, in particular, that x is orthogonal to each of the standard basis vectors ei, for 1 ≤ i ≤ m. Since ⟨x, ei⟩ is the ith component of x, the assertion in (e) too follows. ∎


One of the goals of this section is to show that every vector in the row space of a matrix A ∈ Mm×n (R) is orthogonal to any vector in its null space. In fact, a lot more is true. We need one more concept to be able to describe the situation completely.
Consider the set of all vectors in Fm which are orthogonal to each vector of a given subspace W of Fm; denote it by W⊥. So

W⊥ = {x ∈ Fm | ⟨x, y⟩ = 0 for all y ∈ W}.

W⊥ is clearly non-empty as the zero vector belongs to it. Now the relation ⟨ax + by, w⟩ = a ⟨x, w⟩ + b ⟨y, w⟩ implies that if x, y ∈ W⊥, that is, ⟨x, w⟩ = 0 and ⟨y, w⟩ = 0 for any w ∈ W, then ax + by ∈ W⊥. This shows that W⊥ is a subspace of Fm.
Definition 3.7.6. For any subspace W of Fm, the subspace W⊥ is called the orthogonal complement of W.

We now proceed to prove the result about the relationship between the row space row(A) and the null space null(A) of an m × n matrix A over R. For relevant definitions and results, see the discussion on row and column spaces in Section 3.6.
Proposition 3.7.7. Let A ∈ Mm×n (R). Then row(A)⊥ = null(A).
Proof. Since row(A) = col(At), any vector in row(A) can be expressed as At y for some y ∈ Rm. Now, for any x ∈ null(A), the following calculation, using the definition of the dot product and some properties of transposes of matrices, shows that x ∈ row(A)⊥:

⟨x, At y⟩ = (At y)t x = yt Ax = ⟨Ax, y⟩ = ⟨0, y⟩ = 0.

Thus, null(A) ⊂ row(A)⊥. Note that though we started with the dot product in Rn, the scalar yt Ax in the middle of the calculation was expressed as a dot product in Rm.
To prove the reverse inclusion row(A)⊥ ⊂ null(A), consider any v ∈ row(A)⊥. Then v is orthogonal to every vector in row(A) = col(At), and so for any arbitrary vector y ∈ Rm, ⟨v, At y⟩ = 0, where the dot product is in Rn. A calculation, similar to the preceding one, then shows that ⟨Av, y⟩ = 0, where the dot product takes place in Rm. Since y is an arbitrary vector in Rm, it follows from part (e) of Proposition (3.7.5) that Av = 0 in Rm. Thus v ∈ null(A), which completes the proof of the equality of sets in the proposition. ∎
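A minimal numerical illustration of this proposition (not part of the original text; it assumes the NumPy library and uses a randomly generated matrix) checks that every row of a matrix is orthogonal to every vector in its null space:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(4, 6)).astype(float)

    # A basis of null(A): rows of vt beyond the rank of A
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    N = vt[rank:]                      # each row lies in null(A)

    # The rows of A span row(A); each is orthogonal to each null space vector
    print(np.allclose(A @ N.T, 0))     # True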


Orthogonal and Orthonormal Basis
Definition 3.7.8. A set of vectors in Fm is said to be an orthogonal set of vectors if any two distinct vectors in the set are orthogonal. An orthogonal set of vectors is said to be an orthonormal set if every vector in the set is a unit vector.
Thus, the vectors x1, x2, ..., xr in Fm form an orthogonal set if ⟨xi, xj⟩ = 0 for i ≠ j. In addition, if ⟨xi, xi⟩ = 1 for each i, 1 ≤ i ≤ r, then the vectors form an orthonormal set.
Note: Any set of non-zero orthogonal vectors can be transformed into an orthonormal set by dividing each vector by its length, which is a scalar.
The standard basis vectors e1, e2, ..., em form an orthonormal set of vectors in Fm; it is an example of an orthonormal basis. The following result shows that any set of m non-zero orthogonal vectors forms a basis of Fm.
Proposition 3.7.9.

Any set of non-zero orthogonal vectors in Fm is linearly independent.

Proof. Given any set x1, x2, ..., xr of non-zero orthogonal vectors in Fm, consider any relation

c1 x1 + c2 x2 + · · · + cr xr = 0,     (3.9)

where ci ∈ F. Taking the dot product of both sides of the preceding relation by xj, for a fixed j where 1 ≤ j ≤ r, and then using Equation (3.8) to express the dot product of a sum of vectors as a sum of dot products, we obtain

cj ⟨xj, xj⟩ = 0

as the given vectors are orthogonal. Since the given vectors are non-zero, the scalar ⟨xj, xj⟩ is non-zero and so we conclude that cj = 0. Thus, Equation (3.9) can hold only if all ci are zeros. This proves the linear independence of the given set of vectors. ∎
Corollary 3.7.10.

Any set of m orthonormal vectors in Fm forms an orthonormal basis.

Orthogonal and Unitary Matrices
Matrices whose columns are orthonormal have nice properties; moreover, they appear frequently in applications.
Definition 3.7.11. A matrix Q ∈ Mn (R) is said to be orthogonal if its columns are orthonormal in Rn. A matrix U ∈ Mn (C) is said to be unitary if its columns are orthonormal in Cn.
EXAMPLE 43 Recall that the columns of a permutation matrix P of order n are a rearrangement of the columns of the identity matrix In. Since the columns of In are orthonormal, it follows that the columns of any permutation matrix are orthonormal. Thus any permutation matrix is an orthogonal matrix.


EXAMPLE 44 We leave to the reader the verification that the following complex matrix

A = (1/√2) [ 1   1
             i  −i ]

is unitary.
Let Q ∈ Mn (R) be an orthogonal matrix with orthonormal columns γ1, γ2, ..., γn, each column an n × 1 matrix. The columns being linearly independent, Q is invertible (see Proposition 3.6.16). We claim that its inverse is the transpose Qt. Now Qt, a matrix of order n, has rows γ1t, γ2t, ..., γnt, each row a 1 × n matrix. It is clear that the (i, j)th entry of Qt Q is the product γit γj, a scalar. Since {γ1, γ2, ..., γn} is an orthonormal set, it follows, from the definition of the dot product in Rn, that

γit γj = ⟨γj, γi⟩ = 0 if i ≠ j,  and  = 1 if i = j.

Thus, Qt Q = In, showing that Qt is a left inverse of Q. But we have already seen (see Proposition (2.5.7) in Section 2.5) that a one-sided inverse of a square matrix is necessarily the unique inverse of the matrix. Hence, Qt is the inverse of Q. Note that, by a similar argument, the condition At A = In, for any A ∈ Mn (R), implies that the columns of A are orthonormal. Thus, we have the following result, which is also sometimes taken as the definition of an orthogonal matrix.
Proposition 3.7.12.

A matrix Q ∈ Mn (R) is orthogonal if and only if Qt Q = In = QQt .

It must be emphasized that we have not used any special property of the columns of Q to derive the condition QQt = In from Qt Q = In; rather, it reflects a property of invertible matrices. So it is surprising that a consequence of the condition QQt = In is the following, whose proof is left to the reader.
Corollary 3.7.13. If the columns of A ∈ Mn (R) are orthonormal, then the rows of A are also orthonormal.
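The corollary is easy to observe in a short computation. The following Python sketch is illustrative only (it assumes the NumPy library; the orthogonal matrix is simply one produced by NumPy's own QR routine rather than any particular matrix from the text):

    import numpy as np

    rng = np.random.default_rng(1)
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q has orthonormal columns

    I = np.eye(4)
    print(np.allclose(Q.T @ Q, I))   # True: columns are orthonormal
    print(np.allclose(Q @ Q.T, I))   # True: hence the rows are orthonormal as well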

We verify next some more properties of orthogonal matrices. Proposition 3.7.14.

Let Q be an orthogonal matrix in Mn (R).

(a) For any column vectors x and y in Rn, ⟨Qx, Qy⟩ = ⟨x, y⟩. Thus, multiplication by Q preserves the dot product.
(b) For any column vector x in Rn, ‖Qx‖ = ‖x‖. Thus, multiplication by Q preserves lengths of vectors.
(c) det Q = ±1.


Proof. The definition of the dot product proves the first assertion immediately:

⟨Qx, Qy⟩ = (Qy)t Qx = (yt Qt)(Qx) = yt (Qt Q) x = yt x = ⟨x, y⟩,

as Qt Q = In and (XY)t = Yt Xt for any two matrices X and Y which can be multiplied. The next assertion follows from the first as ‖x‖² = ⟨x, x⟩. Finally, taking determinants of both sides of the relation Qt Q = In, one obtains (det Q)² = 1 as the determinants of a matrix and its transpose are the same. The final assertion follows. ∎
A detailed discussion of orthogonal matrices in general inner product spaces can be found in Section 8.6. Still, a point must be noted here: orthogonal matrices preserve the dot product and lengths of vectors, so they can be used to represent geometric transformations in Rn, such as rotations and reflections, which do not change lengths of line segments or angles between them. See the relevant material on our website for a thorough discussion of such transformations. Orthogonal matrices, in such contexts, can be identified by their determinants.
Definition 3.7.15. An orthogonal matrix Q is a rotation matrix if det Q = 1 and a reflection matrix if det Q = −1.
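Before the worked examples, here is a small illustrative check (not part of the original text; it assumes the NumPy library) that a plane rotation has determinant 1 and the interchange matrix, a reflection, has determinant −1:

    import numpy as np

    theta = np.pi / 6
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # rotation through theta
    S = np.array([[0.0, 1.0],
                  [1.0, 0.0]])                        # reflection about y = x

    print(np.allclose(R.T @ R, np.eye(2)), round(np.linalg.det(R), 6))  # True 1.0
    print(np.allclose(S.T @ S, np.eye(2)), round(np.linalg.det(S), 6))  # True -1.0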

EXAMPLE 45 Consider the orthogonal matrices

R = [ cos θ  −sin θ          S = [ 0  1
      sin θ   cos θ ],             1  0 ].

R represents a rotation of the plane R2 through an angle θ, whereas S a reflection of the plane about the straight line y = x, as can be verified by computing R (x1, x2)t and S (x1, x2)t. Note: det R = 1 and det S = −1.
More examples of rotation and reflection matrices can be found in the exercises that follow this section.
EXAMPLE 46 The rotation matrix

R = [ cos θ  −sin θ
      sin θ   cos θ ]

has the inverse

R−1 = Rt = [  cos θ   sin θ
             −sin θ   cos θ ].


It can be easily verified that the columns of the following matrix

H = [ 1   1   1   1
      1  −1   1  −1
      1   1  −1  −1
      1  −1  −1   1 ]

form an orthogonal basis of R4. Converting the columns into orthonormal columns, one concludes that H−1 = (1/4) H.
Another advantage of orthonormal sets in calculations involving vectors in Fm is evident from the following, whose simple verification is left to the reader.
Proposition 3.7.16. Let {v1, v2, ..., vr} be an orthonormal basis of a subspace W of Fm. Then for any v ∈ W,

v = Σ_{j=1}^{r} ⟨v, vj⟩ vj.

Thus, computing the coordinates of a vector with respect to an orthonormal basis is quite simple. One does not have to solve systems of equations or use the change of basis matrix as we have done in Section 3.4.
Gram–Schmidt Process
Because of the many advantages of orthonormal bases, the need frequently arises to replace a set of linearly independent vectors (such as a basis of a subspace) in Fm by an orthonormal set of vectors. This can be done by following a procedure known as the Gram–Schmidt orthogonalization process, which is applicable even in general inner product spaces. (A detailed discussion of the process is given in Section 8.6.) Here, by considering vectors in R2, we try to give an intuitive feeling for the key idea which makes the process work: finding a formula for the projection of a vector onto another. The reader has, in fact, worked with this idea while computing the component of a vector along another vector in coordinate geometry or basic vector algebra.
As before, we identify a vector x = (x1, x2)t in R2 with the directed line segment from the origin (0, 0) of the plane R2 to the point (x1, x2). Consider two vectors v = (x1, x2)t and u = (y1, y2)t in R2; we assume that they are linearly independent so that they are not collinear. Let (z1, z2) be the foot of the perpendicular from the point (y1, y2) to the line L joining the origin (0, 0) to (x1, x2); then the line segment from (0, 0) to (z1, z2), that is, the vector (z1, z2)t, is the component (projection) of u = (y1, y2)t along v. As the line containing v is one-dimensional, the component (z1, z2)t is av for some real number a. Similarly, if L1 is the straight line through the origin perpendicular to L, then dropping the perpendicular from (y1, y2) onto L1, we can determine the component, say w, of u along L1. Then by the Parallelogram law for addition of line segments again, u = av + w. Taking the dot product of both sides of this vector relation with v, one obtains ⟨u, v⟩ = a ⟨v, v⟩ as v is orthogonal to w.


Therefore, we may conclude that

u − (⟨u, v⟩ / ⟨v, v⟩) v

is orthogonal to v.
To put our discussion in proper perspective as far as the Gram–Schmidt process is concerned, we note that a direct calculation (without referring to coordinates) shows that, for any v, u ∈ Fm,

⟨ v, u − (⟨u, v⟩ / ‖v‖²) v ⟩ = 0.

We shall refer to the vector (⟨u, v⟩ / ‖v‖²) v as the component (projection) of u along v.
In general, given a set of mutually orthogonal vectors v1, v2, ..., vk−1 and another vector uk in Fm, subtracting from uk each of its components along v1, v2, ..., vk−1, respectively, produces a vector vk orthogonal to each of vj for 1 ≤ j ≤ (k − 1). This procedure is the key to the Gram–Schmidt process outlined in the following proposition; for a proof see the discussion of the process in general inner product spaces in Section 8.6.
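The projection formula itself is a one-line computation. The sketch below is illustrative only (real vectors, NumPy assumed; the helper name proj is ours):

    import numpy as np

    def proj(u, v):
        # Component of u along a non-zero vector v: (<u, v> / ||v||^2) v
        return (np.dot(u, v) / np.dot(v, v)) * v

    u = np.array([3.0, 1.0])
    v = np.array([2.0, 2.0])
    w = u - proj(u, v)
    print(np.isclose(np.dot(w, v), 0.0))   # True: u - proj(u, v) is orthogonal to v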

Proposition 3.7.17. Let u1, u2, ..., ur be a set of linearly independent vectors in Fm. We determine vectors v1, v2, ..., vr recursively as follows: we set v1 = u1, and after finding v1, v2, ..., vk−1 for 2 ≤ k ≤ r, we set

vk = uk − (⟨uk, v1⟩/‖v1‖²) v1 − (⟨uk, v2⟩/‖v2‖²) v2 − · · · − (⟨uk, vk−1⟩/‖vk−1‖²) vk−1.     (3.10)

Then v1, v2, ..., vr form a mutually orthogonal set of linearly independent vectors such that their span is the same subspace spanned by u1, u2, ..., ur. Finally, if we set, for 1 ≤ k ≤ r,

qk = (1/‖vk‖) vk,

then {q1, q2, ..., qr} is an orthonormal set such that its span is the same as the span of u1, u2, ..., ur.
We shall need the following observation later:

⟨qk, uk⟩ ≠ 0  for any k, 1 ≤ k ≤ r.     (3.11)

Since vk is a scalar multiple of qk, it is sufficient to verify that ⟨vk, uk⟩ ≠ 0, which is clear for k = 1. For k ≥ 2, by Equation (3.10), vk − uk is a linear combination of v1, v2, ..., vk−1, each of which is orthogonal to vk. Thus, if ⟨vk, uk⟩ = 0, then ⟨vk, vk⟩ = 0 (verify), which implies that vk is the zero vector. This contradiction completes the verification of our observation.
Since any subspace W of Fm is finite-dimensional, W has a basis consisting of linearly independent vectors. The Gram–Schmidt process then converts this basis into an orthonormal basis. Furthermore, we know that any linearly independent set in Fm can be extended to a basis of Fm (see Section 3.4). It follows that any orthonormal basis of W can be extended to an orthonormal basis of Fm.
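For readers who want to experiment, the recursion (3.10) translates into a few lines of code. The following is a minimal sketch for real vectors only (NumPy assumed; the function name gram_schmidt is ours, and complex inputs would need np.vdot in place of np.dot to handle conjugation):

    import numpy as np

    def gram_schmidt(vectors):
        """Return an orthonormal list spanning the same subspace as `vectors`."""
        qs = []
        for u in vectors:
            v = np.asarray(u, dtype=float).copy()
            for q in qs:
                v -= np.dot(v, q) * q          # remove the component of v along q
            qs.append(v / np.linalg.norm(v))   # normalize (assumes linear independence)
        return qs

    # The data of Example 47 below: u1 = (1, 1, 0)t, u2 = (1, 0, 2)t
    q1, q2 = gram_schmidt([[1, 1, 0], [1, 0, 2]])
    print(np.isclose(np.dot(q1, q2), 0.0))     # True

Subtracting the components along the already normalized q's, as done here, gives the same vectors as Equation (3.10) up to scaling, since each qk is a scalar multiple of vk.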


Proposition 3.7.18. Any subspace of Fm has an orthonormal basis, which can be extended to an orthonormal basis of Fm.
Given a subspace W of Fm, choose an orthonormal basis v1, v2, ..., vr of W and then extend it to an orthonormal basis v1, v2, ..., vr, vr+1, ..., vm of Fm. If U is the subspace of Fm spanned by vr+1, ..., vm, then it is clear that Fm = W ⊕ U. Also, as each basis vector of U is orthogonal to every basis vector of W, U ⊂ W⊥. On the other hand, given any v ∈ W⊥, we may write, by Proposition (3.7.16),

v = Σ_{i=1}^{m} ⟨v, vi⟩ vi,

which implies that v is a linear combination of vr+1, ..., vm (as ⟨v, vi⟩ = 0 for i = 1, 2, ..., r). This proves the first part of the following; the proof of the second part is left to the reader.
Proposition 3.7.19.

Let W be a subspace of Fm . Then Fm = W ⊕ W ⊥ .

Furthermore, (W ⊥ )⊥ = W. The Proposition (3.7.7), about the row and null space of a matrix, now implies the following. Corollary 3.7.20.

Let A ∈ Mm×n (R).

(a) row(A) = null(A)⊥.
(b) Rn = row(A) ⊕ null(A).
Now for some numerical computations.
EXAMPLE 47 We apply the Gram–Schmidt process to the vectors u1 = (1, 1, 0)t and u2 = (1, 0, 2)t in R3. We set v1 = u1 = (1, 1, 0)t. Then

‖v1‖² = ⟨v1, v1⟩ = v1t v1 = 1 + 1 + 0 = 2.

So the next vector will be given by

v2 = u2 − (⟨u2, v1⟩/‖v1‖²) v1 = (1, 0, 2)t − (1/2)(1, 1, 0)t = (1/2, −1/2, 2)t.


Since ‖v2‖² = 1/4 + 1/4 + 4 = 9/2, the length of v2 is 3/√2. Thus the required orthonormal vectors are

(1/√2, 1/√2, 0)t  and  (1/(3√2), −1/(3√2), 2√2/3)t.

EXAMPLE 48 In this example, we determine an orthonormal basis of the column space of the matrix

A = [  1   0   2
       1   1   1
      −1  −1   0 ].

Here we start with the vectors u1 = (1, 1, −1)t, u2 = (0, 1, −1)t and u3 = (2, 1, 0)t. So v1 = (1, 1, −1)t, whose length squared is 1 + 1 + 1 = 3. Thus,

v2 = u2 − (⟨u2, v1⟩/‖v1‖²) v1 = (0, 1, −1)t − ((0 + 1 + 1)/3)(1, 1, −1)t = (−2/3, 1/3, −1/3)t,

and so ‖v2‖² = 4/9 + 1/9 + 1/9 = 6/9 = 2/3. It follows that

⟨u3, v1⟩/‖v1‖² = (2 + 1 + 0)/3 = 1,
⟨u3, v2⟩/‖v2‖² = (−4/3 + 1/3 + 0)/(2/3) = −3/2.

So, finally

v3 = u3 − (⟨u3, v1⟩/‖v1‖²) v1 − (⟨u3, v2⟩/‖v2‖²) v2
   = (2, 1, 0)t − (1, 1, −1)t + (3/2)(−2/3, 1/3, −1/3)t
   = (0, 1/2, 1/2)t,

whose length is √(1/4 + 1/4) = 1/√2. Normalizing the vectors v1, v2 and v3, that is, dividing them by their respective lengths, we obtain the following orthonormal basis of the column space of A:

(1/√3, 1/√3, −1/√3)t,   (−√2/√3, 1/√6, −1/√6)t,   (0, 1/√2, 1/√2)t.
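A quick numerical cross-check of this example can be run as follows (not part of the original text; it assumes the NumPy library and takes the three vectors just obtained as the columns of a matrix Q):

    import numpy as np

    Q = np.array([[ 1/np.sqrt(3), -np.sqrt(2)/np.sqrt(3),  0.0        ],
                  [ 1/np.sqrt(3),  1/np.sqrt(6),           1/np.sqrt(2)],
                  [-1/np.sqrt(3), -1/np.sqrt(6),           1/np.sqrt(2)]])
    A = np.array([[1, 0, 2], [1, 1, 1], [-1, -1, 0]], dtype=float)

    print(np.allclose(Q.T @ Q, np.eye(3)))     # True: the basis is orthonormal
    # u2 = second column of A lies in the span of the first two basis vectors
    u2 = A[:, 1]
    print(np.allclose(Q[:, :2] @ (Q[:, :2].T @ u2), u2))   # True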


We reiterate that the Gram–Schmidt process guarantees that the final orthonormal vectors span the same subspace whose basis we started with. Note that if we start with a linearly dependent set of vectors, then the Gram–Schmidt process breaks down (see Exercise 13).
EXAMPLE 49 We now consider an example involving complex vectors. Let us apply the Gram–Schmidt process to the vectors

u1 = (1, i)t  and  u2 = (1 + i, −i)t.

One has to be a little careful because dot products of complex vectors are expressed in terms of conjugate transposes, and so the order of the vectors in dot products does matter. To begin with, v1 = u1 and so ‖v1‖² = (1, −i)(1, i)t = 2. To find v2, we first compute

⟨u2, v1⟩ = v1∗ u2 = (1, −i)(1 + i, −i)t = 1 + i + i² = i.

Therefore, the next vector, given by the Gram–Schmidt process, is

v2 = u2 − (⟨u2, v1⟩/‖v1‖²) v1 = (1 + i, −i)t − (i/2)(1, i)t = (1 + i/2, 1/2 − i)t.

The reader can now verify that ‖v2‖² = 5/2 and that the required orthonormal vectors are

(1/√2)(1, i)t  and  (1/√10)(2 + i, 1 − 2i)t.

QR Factorization
The Gram–Schmidt process gives rise to what is known as the QR factorization of real matrices. Consider a real m × n matrix A with n linearly independent columns u1, u2, ..., un in Rm. Note that, as Rm is an m-dimensional vector space, n ≤ m. It is also clear that the rank of A is n. Suppose that the Gram–Schmidt process, applied to the column vectors of A, produces an orthonormal set q1, q2, ..., qn. Let Q be the m × n matrix over R having the vectors of this orthonormal set as its columns. We claim that there is an invertible upper triangular matrix R of order n such that A = QR.
To prove our claim, we first recall that the Gram–Schmidt process is such that for any k, 1 ≤ k ≤ n, span{u1, u2, ..., uk} = span{q1, q2, ..., qk}.

book1

February 25, 2014

0:8

Orthogonality in Rn

175

combination of the orthonormal vectors q1 , q2 , . . . , q j . It then follows from Proposition (3.7.16) that G G G F F F u j = u j , q1 q1 + u j , q2 q2 + u j , q j q j . (3.12)

G F For a fixed j, set ri j = u j , qi for 1 ≤ i ≤ j and define an n-dimensional column vector r j as   r1 j  r   2 j   ..   .    r j = r j j .    0   ..   .    0

Thus, if R is the matrix of order n having the column vectors r1 , r2 , . . . , rn as its n columns, then R is clearly upper triangular. We have already noted (see end of Section 1.6 about column-row multiplication as well as the description of column space in Section 3.6) that if M is an m×n matrix with columns γ1 , γ2 , . . . , γn and x an n-dimensional column vector (x1 , x2 , . . . , xn )t , then Mx = x1 γ1 + x2 γ2 + · · · + xn γn . Applying this formula to the matrix Q having columns v1 , v2 , . . . , vn and the column vector r j , we see that Qr j = r1 j q1 + r2 j q2 + · · · + r j j q j = u j,

by Equation(3.12). In other words, Q multiplied to the jth column of R produces u j , the jth column of A. Therefore, QR = A. To complete the proof of our claim, we need to show F thatG R is invertible. Now R is an upper triangular matrix such that each of its diagonal entry r j j = u j , q j is non- zero by Equation (3.11). Therefore, R is invertible. Thus, we have proved the following. Proposition 3.7.21. Let A be a real m × n matrix such that the rank of A is n. Then there is an m × n matrix Q with orthonormal columns and an invertible upper triangular matrix R of order n such that A = QR. EXAMPLE 50 We determine the QR factorization of the following matrix   1  2 A =   −1 −2

−1 3 2 1

   .  

Clearly the column vectors u1 = (1, 2, −1, −2)t and u2 = (−1, 3, 2, −1)t are linearly independent. We leave it to the reader to show that the Gram–Schmidt process, ap-

Saikia-Linear Algebra

176

book1

February 25, 2014

0:8

Vector Spaces

plied to u1 , u2 gives us the orthonormal set q1 , q2 , where √  √     −1/ √10   −3/5 2  √      2/ √10   4/5 √2  q1 =  , , q2 =     5/5 2   −1/ √10    0 −2/ 10

and that

I √ H u1 , q1 = 10 I √ H u1 , q2 = 10/2 √ I H u2 , q2 = 5/ 2

It follows that the QR factorization of A is given by √ √     1 −1   −1/ √10 −3/5√ 2  2 3   2/ √10 4/5 √2  = 2   −1/ 10  −1 5/5 2 √  −2 1 0 −2/ 10

  ' √  10   0 

( 10/2 √ . 5/ 2



Note: The matrix equation A = QR implies that Qt A = R as Qt Q is the identity matrix of order 2.

EXERCISES All matrices are over R unless otherwise specified. The field F is either R or C. 1. Determine whether the following statements are true or false giving brief justification. (a) Any linearly independent set in Fm is an orthogonal set. (b) A square matrix with orthogonal columns is an orthogonal matrix. (c) If the subspace W of Fm is spanned by u1 , u2 and y ∈ Fm is orthogonal to both u1 and u2 , then y ∈ W ⊥ . (d) If 4x − y42 = 4x42 + 4y42 for x, y ∈ Fm , then x ⊥ y. (e) If an m × n matrix A has orthonormal columns, then AAt = Im . (f) If ut v = 0 for vectors u, v ∈ Cm , then v ⊥ u.

(g) An orthogonal matrix has linearly independent rows. (h) if a vector u ∈ Rm has zero component along a vector v ∈ Rm , then u and v are orthogonal. (i) For any subspace W of Fm , W ∩ w⊥ = {0}.

(j) For an orthogonal matrix Q of order n, Qx ⊥ Qy if and only if x ⊥ y. (k) Every non-zero m × n matrix over R has a QR factorization.

(l) If A = QR is a QR factorization, then the columns of Q is an orthonormal basis of the column space of A.

(m) If A = QR is a QR factorization, then the diagonal entries of R are positive.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Orthogonality in Rn

177

(n) For any m × n real matrix A, the null space of At is the orthogonal complement of the column space of A. (o) The null space of any orthogonal matrix is {0}. (p) If A is a real symmetric matrix of order n such that A2 = In , then A is orthogonal. (q) For a real matrix A, the equation Ax = b has a solution if and only if b is orthogonal to every solution of At x = 0. 2. For x, y ∈ Rm , show that x + y is orthogonal to x − y if and 2only3if 4x4 = 4y4. 1 3. Find two vectors x1 and x2 in R2 which are orthogonal to . Can they be linearly indepen−1 dent?    1    4. Find two vectors x1 and x2 in R3 which are orthogonal to  1 . Is it possible to choose x1 and   −1 x2 such that they are linearly independent or orthogonal? 5. Show that      −2  1     u1 = 0 and u2 =  1      2 1 are orthogonal in R3 . Extend {u1 , u2 } to an orthogonal basis {u1 , u2 , u3 } of R3 . 6. Let W be the subspace of R3 represented by the plane x − 2y + z = 0. Determine W ⊥ . 7. Let   1 1 1   A = 2 3 3 .   344

8. 9. 10. 11. 12. 13.

Find vectors u, v ∈ R3 such that u is orthogonal to the row space of A and v orthogonal to the column space of A. Find an orthonormal basis of the column space of A, the matrix of the preceding Exercise. Find an orthonormal basis of W ⊥ , where W is the subspace of R4 spanned by (1, 0, 0, 1)t . Let A be real matrix of order 3, one of whose rows is (1, 1, 1). Is it possible that the vector (−1, 0, 2)t is a solution of the system of equation Ax = 0? Let A be a real m × n matrix. Prove that the orthogonal complement of null(At ) is the column space of A. Let A be a real m × n matrix. Prove that A and At A have the same null space (in Rn ). Show further that if A has non-zero orthogonal columns, then At A is invertible. Let u1 = (1, 1)t and u2 = (2, 2)t . Verify that u2 −

/u2 , u1 0 4u1 42

u1 = 0.

14. Let L be the line segment from the origin (0, 0) to the point (x1 , x2 ) and let x = (x1 , x2 )t be the vector representing L. If L makes an angle α with the positive direction of the x-axis, then show that x2 x1 sin α = and cos α = . 4x4 4x4

Saikia-Linear Algebra

178

book1

February 25, 2014

0:8

Vector Spaces

Further, assume that the line segment L1 representing y = (y1 , y2 )t makes an angle β with the positive direction of the x-axis and that θ = β − α. Prove that cos θ = 15. 16. 17. 18. 19.

/y, x0 . 4y44x4

Prove that the orthogonal matrices of order n form a group with respect to matrix multiplication. Let u be a unit vector in Rm . Prove that Q = Im − 2uut is an orthogonal matrix. Prove that the rows of an orthogonal matrix Q in Mm (R) form an orthonormal set. Let Q ∈ Mm (R) such that /Qx, Qy0 = /x, y0 for all x, y ∈ Rm . Is Q an orthogonal matrix? Verify that the columns of the following matrices     1 1 1  1 1 1 1   1    1 −1 1 i i2 i3  1 −1    , F =  H =  1 i2 i4 i6  1 −1 −1   1   1 −1 −1 1 1 i3 i6 i9

are orthogonal in R4 and C4 , respectively. Hence determine their inverses. 20. Verify that the following matrices    0  2 −1 −2   1 √ 0 1      3/2 −1/2 2 −2  ,  0 A = √  −1 √  10  −2 −2 −1  3/2 0 1/2

are orthogonal. Are they rotation or reflection matrices? 21. Let a, b and c be real numbers such that a2 + b2 + c2 = 1. Prove that   −2ab −2ac   1 − 2a2   −2bc  Q =  −2ab 1 − 2b2   2 −2ac −2bc 1 − 2c

   

is orthogonal and determine whether Q is a rotation or a reflection matrix. 22. Find QR factorizations of the following matrices;   0  1   −1  1

1 1 0 1

    ,  

 1  2 1

1 3 0

 2  1 .  1

23. Let A be a real symmetric matrix of order n. If A = QR is a QR-factorization, then prove that A2 = Rt R is an LU factorization of A2 .

3.8 BASES OF SUBSPACES We need to compute the dimensions as well as bases of specific subspaces of vector spaces quite often. In this section, we discuss, with the help of a few examples, as to how these computations can be performed efficiently with matrices and their echelon forms. The techniques are the same as the

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Bases of Subspaces

179

ones adopted to find row and column ranks of matrices in the last section. Thus, familiarity with the material of the last section will be useful in understanding the working of the examples here. As vectors in an arbitrary finite-dimensional vector space can be visualized as row or column vectors by introducing coordinates through a fixed basis, our examples will be in Fn for a field F, or more specifically, in Rn . EXAMPLE 51 Let W be the subspace of R5 spanned by the vectors (1, 1, 2, 1, −2), (2, 3, 8, 1, −1) and (−1, 1, 6, −3, 8). We discuss the following questions. (a) How to find a basis of W? (b) What may be the general form of a vector in W? (c) What are the coordinates of an arbitrary vector of W with respect to a basis of W? In this example, we try to answer these questions by forming the 3 × 5 matrix A listing the given vectors as the rows of A. Thus,   1  A =  2  −1

1 3 1

2 8 6

1 1 −3

 −2  −1.  8

The crucial point is that W is the same as the row space of the matrix A. Therefore, any basis of the row space of A is a basis of W. Now recall [see the remarks after the proof of Proposition (3.6.9) in the last section] that a basis of the row space of A is formed by the non-zero rows of the reduced row echelon form R of A. Thus, we need to perform row operations on A to reduce it to R. Recall that the symbol ∼ denotes row equivalence.    1 1 2 1 −2   1 −1 A =  2 3 8   −1 1 6 −3 8   1 1 2 1 −2   3 ∼ 0 1 4 −1   0 2 8 −2 6

We see that

 1  ∼ 0  0  1  ∼ 0  0

 1  R = 0  0

1 1 0

2 4 0

1 −1 0

0 1 0

−2 4 0

2 −1 0

0 1 0

−2 4 0

2 −1 0

 −2  3  0

 −5  3.  0  −5  3.  0


It follows that the non-zero row vectors v1 = (1, 0, −2, 2, −5) and v2 = (0, 1, 4, −1, 3) of R form a basis of W. This answers the first question. For the second question, note that any vector in W is of the form x1 v1 + x2 v2 for some scalars x1 and x2 . But such a general description does not really throw any light on the nature of vectors in W. What we would like to have is some specific relations among the components of the vectors in W which distinguishes them from the rest of vectors in R5 . To discover such relations, let y = (y1 , y2 , y3 , y4 , y5 ) = x1 v1 + x2 v2 be an arbitrary vector in W. Note that (x1 , x2 )t is the coordinate vector of y with respect to the basis {v1 , v2 } of W. Now, equating the components of both sides of the equality y = x1 v1 + x2 v2 = x1 (1, 0, −2, 2, −5) + x2(0, 1, 4, −1, 3), we can easily express yi in terms of the scalars x j : y1 = x1 , y2 = x2 , y3 = −2x1 + 4x2, y4 = 2x1 − x2 , y5 = −5x1 + 3x2. Eliminating the x j , we conclude that (y1 , y2 , y3 , y4 , y5 ) ∈ W if and only if 2y1 − 4y2 + y3 = 0 −2y1 + y2 + y4 = 0

5y1 − 3y2 + y5 = 0.

In other words, W is precisely the solution space in R5 of the following homogeneous system of linear equations over R:

2x1 − 4x2 + x3 = 0
2x1 − x2 − x4 = 0
5x1 − 3x2 + x5 = 0.

Observe that the number of free variables of this system of equations is the dimension of W by Corollary (3.6.15). The fact that a subspace of Rn spanned by a certain set of vectors can be identified as the solution space of some homogeneous system of linear equations over R will be useful in tackling some other problem as we will see shortly.
Coming back to our example, note that we have also found the coordinates x1 and x2 of y = (y1, y2, y3, y4, y5) ∈ W with respect to the basis {v1, v2} of W: x1 = y1 and x2 = y2.

For example, it is readily verified that the coordinates of the vectors spanning W we started with, will be given as follows (1, 1, 2, 1, −2) = 1 · v1 + 1 · v2 (2, 3, 8, 1, −1) = 2 · v1 + 3 · v2

(−1, 1, 6, −3, 8) = −1 · v1 + 1 · v2 .
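The row reduction carried out above can also be reproduced by a computer algebra system. The following is a small illustrative sketch (not part of the original text; it assumes the SymPy library), applied to the matrix A of Example 51:

    from sympy import Matrix

    # Rows are the vectors spanning W
    A = Matrix([[ 1, 1, 2,  1, -2],
                [ 2, 3, 8,  1, -1],
                [-1, 1, 6, -3,  8]])

    R, pivots = A.rref()
    print(R)        # its non-zero rows are the basis v1, v2 of W found above
    print(pivots)   # (0, 1): the pivot columns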


It is now quite straightforward to obtain the coordinates of a vector in W relative to an arbitrary basis of W by using Theorem (3.4.14). For example, the rank of the matrix A being 2 (as the rank equals the rank of R, the echelon form of A), the first two rows of A, say u1 = (1, 1, 2, 1, −2) and u2 = (2, 3, 8, 1, −1), also form a basis of W. Expressing these vectors in terms of the basis vectors v1, v2 of W, we see that

P = [ 1  2
      1  3 ]

is the transition matrix from the basis {u1, u2} to the basis {v1, v2} of W. Since

P−1 = [  3  −2
        −1   1 ],

it follows that the coordinate vector of any y = (y1, y2, y3, y4, y5) in W with respect to the basis {u1, u2} will be given by

P−1 (y1, y2)t = (3y1 − 2y2, −y1 + y2)t.

We next illustrate the method of finding a basis of the intersection of two subspaces of Fn by computing a basis of W ∩ U, where W is the same subspace of R5 we have been considering and U is the subspace of R5 spanned by (2, 1, 0, 3, −7), (1, 0, −1, 2, −6) and (4, 2, 3, 5, −9). We first find a basis of U the same way as in the case of W, by finding an echelon form of the matrix whose rows are the vectors spanning U. It is an easy matter to see that the required echelon form is

[ 1  0  −1   2  −6
  0  1   2  −1   5
  0  0   0   0   0 ],

so that (1, 0, −1, 2, −6) and (0, 1, 2, −1, 5) form a basis of U. Thus, if x1 , x2 are the coordinates of an arbitrary vector y = (y1 , y2 , y3 , y4 , y5 ) in U, then (y1 , y2 , y3 , y4 , y5 ) = (x1 , x2 , −x1 + 2x2 , 2x1 − x2 , −6x1 + 5x2 ). Eliminating the xi , we see that the components of y satisfy the relations y1 − 2y2 + y3 = 0 2y1 − y2 − y4 = 0.

6y1 − 5y2 + y5 = 0

It follows that U is the solution space of the following homogeneous system of linear equations over R: x1 − 2x2 + x3 = 0 2x1 − x2 − x4 = 0. 6x1 − 5x2 + x5 = 0 The key to finding a basis of W ∩ U is the simple observation that the intersection W ∩ U is the common solution space of the preceding two homogeneous systems of equations whose solution spaces are W and U, respectively.


So, following the procedure outlined in Section 2.4 for finding solutions of systems of linear equations, we form the coefficient matrix

C = [ 2  −4  1   0  0
      2  −1  0  −1  0
      5  −3  0   0  1
      1  −2  1   0  0
      2  −1  0  −1  0
      6  −5  0   0  1 ]

of the six equations which define the subspaces W and U, and row reduce it. We leave it to the reader to verify that the reduced row echelon form S of C is

S = [ 1  −2  0  0  0
      0   1  0  2  1
      0   0  1  0  0
      0   0  0  7  3
      0   0  0  0  0
      0   0  0  0  0 ].

Since S has just a single non-pivot column, the solutions satisfying the two systems of equations simultaneously, i.e. the subspace W ∩ U, form a one-dimensional vector space. The matrix S implies that the components of the vectors (x1, x2, x3, x4, x5) in W ∩ U satisfy the following relations:

x1 − 2x2 = 0
x2 + 2x4 + x5 = 0
x3 = 0
7x4 + 3x5 = 0.

A basis of W ∩ U, which consists of a single vector, can be determined by assigning any arbitrary value to x5, the free variable, and then expressing the others in terms of x5 according to these relations. Thus, for example, if we choose x5 = 7, then we see that x4 = −3, x2 = −1 and x1 = −2. We can, therefore, conclude that (−2, −1, 0, −3, 7) forms a basis of W ∩ U.
In the next example, we find a basis of the sum of two subspaces whose spanning sets are given.
EXAMPLE 52 Let W be the subspace of R4 spanned by the vectors (1, −2, −1, 1) and (−2, 1, 3, −3), and U be the subspace spanned by (1, −8, 1, −1) and (3, 2, 1, −1). We note that the sum W + U is spanned by the four given vectors, and therefore a basis of W + U can be determined from the reduced row echelon form of the 4 × 4 matrix A whose rows are the vectors spanning W and U, respectively.


It can be easily verified that the reduced row echelon form of

A = [  1  −2  −1   1
      −2   1   3  −3
       1  −8   1  −1
       3   2   1  −1 ]

is

R = [ 1  0  0   0
      0  1  0   0
      0  0  1  −1
      0  0  0   0 ].

So, a basis of W + U is formed by v1 = (1, 0, 0, 0), v2 = (0, 1, 0, 0) and v3 = (0, 0, 1, −1). As in the preceding example, we then see that if y = (y1 , y2 , y3 , y4 ) is an arbitrary vector of W + U, then its coordinates with respect to the basis {v1 , v2 , v3 } of W + U are just y1 , y2 and y3 . It can also be deduced easily that W + U is the solution space of a single linear equation x3 + x4 = 0. EXERCISES 1. Let W be the subspace of R4 spanned by the vectors (1, −2, 1, 4), (−2, 1, 3, −1), (0, 6, −10, −14) and (−5, 1, 10, 1). Find a basis of W, and a homogeneous system of linear equations over R whose solution space is W. Determine the coordinates of an arbitrary vector in W with respect to the basis of W. 2. Let W be the subspace of R5 generated by the vectors (−2, 1, 3, 0, 5), (1, −3, 0, 2, −2), (1, 2, 3, 0, 5) and (−1, −2, −1, 0, 6). Find a basis of W, and a homogeneous system of linear equations over R whose solution space is W. Determine the coordinates of an arbitrary vector in W with respect to the basis of W. 3. Let W be the subspace of R5 consisting of the vectors (x1 , x2 , x3 , x4 , x5 ) such that x1 − 2x2 + 3x3 − x4 = 0 . x2 − x3 + 3x4 − 2x5 = 0 Find a basis of W and extend it to a basis of R5 . What are the coordinates of a vector in W with respect to the basis just found? 4. Find a basis of the subspace W of the space R3 [x] of all real polynomials of degree at most 3 spanned by 2 − 3x2 + 4x3 , x + 2x2 − x3 , x + x2 − x3 and 4 + x − 5x2 + 2x3 . Hint: Use the coordinates of the polynomials spanning W with respect to the standard basis of R3 [x] to reduce the problem to finding a basis of a subspace of R4 . The trace T r(A) of a matrix A ∈ Mn (F) is the sum of its diagonal entries. 5. Let W1 and W2 be the following subsets of M3 (F): (a) W1 = {A | T r(A) = 0}.


(b) W2 = {A = [aij] | Σ_{j=1}^{3} a1j = 0}.

Prove that W1 and W2 are subspaces of M3 (F). Further, determine the dimensions W1 , W2 , W1 ∩ W2 and W1 + W2 . 6. Let W1 = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 − x2 − x3 = 0} and W2 = {(x1 , x2 , x3 , x4 , x5 ) ∈ R5 | x1 = x2 } be subspaces of R5 . Determine the dimensions of the subspaces W1 , W2 , W1 + W2 and W1 ∩ W2 . 7. Find bases of W1 + W2 and W1 ∩ W2 , where W1 and W2 are the subspaces of R4 generated by (1, 1, 1, 1), (1, 1, −1, −1), (1, −1, 1, −1) and (1, 1, 0, 0), (1, −1, 1, 0), (2, −1, 1, −1), respectively. 8. Let W1 and W2 be the subspaces spanned by (1, −2, 3, 4, 0), (1, 2, −3, 4, 1), (3, 2, −3, 12, 2) and (1, 1, 1, −1, −2), (0, 1, −4, 5, 3), (3, 5, −5, 7, 0), respectively. Find two systems of homogeneous linear equations over R whose solution spaces are W1 and W2 , respectively. Hence, find a basis of W1 ∩ W2 . What are the coordinates of an arbitrary vector in W1 ∩ W2 with respect to this basis? 9. Given the subspace W1 of R4 spanned by (1, 2, −1, 1), (2, 2, −1, 1) and (1, 1, 2, 3), find bases for two subspaces W2 and W3 of R4 such that R4 = W1 ⊕ W2 = W1 ⊕ W3 .

3.9 QUOTIENT SPACE We have seen that the geometrical object of a straight line passing through the origin in R2 or R3 is a one-dimensional subspace in the respective vector space. A question therefore arises: Is there any interpretation of the straight lines not passing through the origin? Such an interpretation needs the idea of a quotient space.


To introduce the idea of such a space, let us consider a straight line, say, y = x + 1 in R2 . The points on this line form a subset L = {(x1 , x2 ) ∈ R2 | x2 = x1 + 1} of R2 . Note that this is not a subspace of R2 . However, it is closely related to the subspace of R2 formed by the straight line passing through the origin and parallel to the line y = x + 1. Let the subspace be denoted by W so that W = {(x1 , x2 ) ∈ R2 | x2 = x1 }. To relate L to W, we need the following notation which is similar to that if sum of two subspaces. For a vector v and a subset U of a vector space V, we let v + U denote the subset of V formed by the sums v + u as u ranges over U, i.e., v + U = {v + u | u ∈ U}. Note that if v ∈ U, then v + U = U as subsets of V. The notation just introduced can be extended to the addition of two arbitrary subsets of a vector space which generalizes sum of two subspaces. Definition 3.9.1. Let A and B be any two non-empty subsets of a vector space V. Then, the sum A + B is the subset of V defined as A + B = {a + b | a ∈ A, b ∈ B}. Going back to the example with which we started, the notation of a sum of a vector and a subset allows us to relate L and W as subsets of R2 as follows L = (0, 1) + W. For, if (a1 , a2 ) ∈ L, then a2 = a1 + 1, so that (a1 , a2 ) = (0, 1) + (a1, a1 ) which is clearly in (0, 1) + W. Similarly, every vector of (0, 1) + W is in L. Hence, the equality between these two subsets of R2 . Because of this equality, we say that L is a coset of the subspace W in R2 , and (0, 1) is a coset representative of the coset L. We list some of the obvious points related to this idea of a coset of a subspace of the vector space R2 for they will have relevance in the general case also. • There is no unique representative of the coset L. A moment’s thought will make it clear that any vector in the subset L will be as good a representative as (0, 1) is. • However, as the subspace W contains the zero vector, any coset representative has to be a vector in the set L. • There is nothing special about the line L which makes it a coset of W. In fact, any line parallel to the line W can be thought of as a coset of the subspace W. For any line L' parallel to W, we may choose a vector l in L' , and a routine calculation then will show that L' is the coset l + W. • W is a coset of itself, for we may choose any vector in W, for example, the zero vector, as a representative of the coset W. • Consider the cosets or the straight lines L = (0, 1) + W and L' = (0, 2) + W. Since L = {(x1 , x2 ) ∈ R2 | x2 = x1 + 1}, and L' = {(x1 , x2 ) ∈ R2 | x2 = x1 + 2}, it follows that the sum of L and L' as subsets of R2 is the set L'' = {(x1 , x2 ) ∈ R2 | x2 = x1 + 3}. See Definition (3.8.1). On the other hand, the sum of the coset representatives of L and L' gives the vector (0, 3) which determines a third coset (0, 3) + W. But note that this coset is precisely L'' , the sum of the first two cosets. The point is that cosets can be added.


• In a similar manner, it can be verified that the scalar multiples of the vectors in a coset of W by a fixed scalar again form a coset. For example, multiplying the vectors in L by the scalar 2, we obtain the vectors in L', so that it makes sense to say that 2L = L'.

That cosets can be added and multiplied by scalars makes the idea of viewing straight lines in R2 as single objects a rewarding one, for then these straight lines themselves may be organized as a vector space. In fact, even in the most general case, the cosets of any subspace of a vector space can, in a very natural manner, be made into a vector space. Such spaces, whose vectors are cosets, are known as quotient spaces. Let us look at the general case now.

Cosets of a Subspace

Let W be a subspace of a vector space V over a field F. (V may be infinite-dimensional also.) For any v ∈ V, the coset v + W of W represented by v is the subset of V defined as

v + W = {v + w | w ∈ W}.    (3.13)

It is easy to verify that two cosets v + W and v' + W are equal as sets if and only if v − v' ∈ W. Thus, a coset v + W can have any other representative v' from V as long as v − v' ∈ W. The collection of all distinct cosets of W in V, denoted by the symbol V/W, is called the quotient space of V by W. Thus, V/W = {v + W | v ∈ V}. There is a convenient notation for cosets in case there is no ambiguity about the underlying subspace W (for example, in the present discussion there is only one subspace). If the role of W is understood, we may let v̄ = v + W. In this notation,

V/W = {v̄ | v ∈ V}    (3.14)

and

v̄1 = v̄2 if and only if v1 − v2 ∈ W.    (3.15)

Addition and Scalar Multiplication of Cosets

In V/W, as we have mentioned earlier, addition and scalar multiplication of cosets are defined in terms of their representatives as follows:

$\bar{v}_1 + \bar{v}_2 = \overline{v_1 + v_2}, \qquad a\bar{v} = \overline{av},$

where a is a scalar from the field F. Note that these formulae, in our old notation for cosets, are

(v1 + W) + (v2 + W) = (v1 + v2) + W    (3.16)


and a(v1 + W) = av1 + W. Also, note that the operation of addition + and the scalar multiplication implied by juxtaposition on the left-hand sides of these formulae are defined for V/W in terms of the corresponding operations of V on the right-hand sides.

Before we proceed any further, let us discuss a difficulty with these definitions. The difficulty arises because the operations are defined in terms of the representatives of cosets, and we know that there is no unique representative of a coset. Thus, for these definitions to be valid, we have to make sure that they do not depend on the choices of vectors to represent cosets. The verification that the equations in (3.16) are independent of the choice of coset representatives is described as establishing the well-definedness of those operations. Let us take the first of these two definitions, and choose different representatives for the cosets v̄1 and v̄2. So, let u1 and u2 be vectors in V such that v̄1 = ū1 and v̄2 = ū2. We have to show that v̄1 + v̄2 = ū1 + ū2 or, equivalently, that (v1 + v2) + W = (u1 + u2) + W. Now, v̄1 = ū1 and v̄2 = ū2 imply that v1 − u1 and v2 − u2 are in the subspace W. So, the sum of these two vectors is again in W. Since the sum can be put in the form (v1 + v2) − (u1 + u2), the cosets of W determined by (v1 + v2) and (u1 + u2) are the same, as desired. Hence, the sum of cosets does not depend on our choice of representatives of the cosets. We invite the reader to verify the well-definedness of the scalar multiplication of cosets in the same way. It is routine now to prove the following theorem.

Theorem 3.9.2. Let W be a subspace of a vector space V over a field F. Then, the set V/W of all distinct cosets of W in V is a vector space over the same field F with respect to addition and scalar multiplication of cosets as defined by Equation (3.16).

Proof. The verification of the vector space axioms for V/W is straightforward, as the operations on the cosets are defined in terms of the operations on vectors of V, and in V those axioms are already satisfied. For example, the zero vector of V/W is 0̄ = 0 + W, the coset corresponding to the zero vector 0 of V. Similarly, as −v is the additive inverse of v in V, the inverse of the coset v̄ in V/W is the coset (−v) + W. With these comments, we leave the detailed verification to the reader. ∎

EXAMPLE 53

For any vector space V, if we choose W = V, then the quotient space V/V clearly has only one distinct coset, for v + V = 0 + V for any vector v in V. Thus, the quotient space V/V is a one-element vector space, namely, the zero vector space.


On the other extreme, if we take W = {0}, the zero subspace of V, then V/W is almost the same as V, for this time distinct vectors in V determine distinct cosets.

EXAMPLE 54

We take up the example at the beginning of the section. Here, V = R2 and W = {(x1, x2) | x1 = x2}. As we showed there, the quotient space R2/W is an example of a new vector space, whose vectors are all the distinct straight lines in R2 parallel to the line represented by W. It is clear that, in general, if W is taken to be the one-dimensional subspace of R2 given by any straight line L passing through the origin, then R2/W is the vector space of all lines in R2 parallel to L. We will see in the next chapter that the quotient space R2/W in all such cases is essentially the same as the vector space R.

EXAMPLE 55

Consider the infinite-dimensional real vector space R[x] of all real polynomials, and consider the set W of polynomials in R[x] which are multiples in R[x] of the fixed polynomial x2 + 1. Thus, W = {(x2 + 1)g(x) | g(x) ∈ R[x]}. Note that W is not just the set of scalar multiples of x2 + 1, i.e., it is not the subspace generated by x2 + 1 in R[x]. Nevertheless, one verifies that W is still a subspace of R[x], and so we may talk about the quotient space R[x]/W. An arbitrary coset in this quotient space is an object like p(x) + W, where p(x) is a polynomial in R[x]. To get a manageable description of such a coset, we divide the polynomial p(x) by x2 + 1. It is clear that either p(x) is a multiple of x2 + 1 or the remainder is a polynomial of degree less than that of x2 + 1. In other words, if q(x) is the quotient, then p(x) = (x2 + 1)q(x) + r(x), where r(x) is either the zero polynomial or a polynomial of degree at most one. Now, the last relation implies that p(x) − r(x), being a multiple of x2 + 1, belongs to the subspace W. By the definition of equality of cosets, one then finds that the cosets p(x) + W and r(x) + W are the same. For example, in case p(x) is a multiple of x2 + 1, the coset p(x) + W coincides with the coset W, the zero of R[x]/W. Every other coset in R[x]/W has a representative polynomial of degree at most one. Thus, we may describe R[x]/W as R[x]/W = {a0 + a1x + W | a0, a1 ∈ R}. This quotient space is essentially a copy of R2.

Dimensions of Quotient Spaces

One must have been struck by our comment about the quotient spaces being almost the same as some known ones. But we must wait till the next chapter, which deals with mappings between spaces, before


these connections can be made precise. One of the results we need for that purpose is about dimensions of quotient spaces.

Proposition 3.9.3. Let W be a subspace of a finite-dimensional vector space V over a field F. Then, the quotient space V/W is also finite-dimensional and dim V/W = dim V − dim W.

Proof. Let dim V = n and dim W = m. Note that as V is finite-dimensional, W has to be finite-dimensional, and dim W ≤ dim V by Proposition (3.4.6). It is sufficient to exhibit a basis of V/W consisting of n − m vectors. Start with a basis w1, w2, . . . , wm for W over F. Now, these vectors are linearly independent in W, hence automatically in V also. By Corollary (3.4.7), this linearly independent set can be extended to a basis w1, w2, . . . , wm, wm+1, . . . , wn of V. We claim that the cosets w̄m+1, . . . , w̄n form a basis of the quotient space V/W. Consider first the linear independence of these cosets over the field F. Note that the zero of the vector space V/W is the coset W = 0̄. Assume that for scalars am+1, am+2, . . . , an, we have a relation of linear dependence for these cosets in V/W as follows:

am+1 w̄m+1 + am+2 w̄m+2 + · · · + an w̄n = 0̄.

Applying the definition of scalar multiplication of a coset in each of the terms of the sum, and then combining all the resultant cosets into a single coset by the rule of coset addition, we can rewrite the last relation as an equality of two cosets:

(am+1 wm+1 + am+2 wm+2 + · · · + an wn) + W = 0 + W,

which immediately places the vector am+1 wm+1 + am+2 wm+2 + · · · + an wn in W. Since W is spanned by w1, w2, . . . , wm, the preceding vector can be written as a linear combination of these basis vectors of W. Thus, we can find scalars, which we name a1, a2, . . . , am, so that

am+1 wm+1 + am+2 wm+2 + · · · + an wn = a1 w1 + a2 w2 + · · · + am wm.

But the vectors w1, w2, . . . , wm, wm+1, . . . , wn form a basis of V. Hence, the last relation forces all the scalars on both sides of the relation, and in particular am+1, am+2, . . . , an, to be zeros. This establishes the linear independence of the cosets w̄m+1, . . . , w̄n. To complete the proof, we have to show that these cosets span V/W. So, let v + W be any coset in V/W. If we express v, a vector of V, as a linear combination of the basis vectors w1, w2, . . . , wm, wm+1, . . . , wn of V, then the coset v + W will be the corresponding linear combination of the cosets w1 + W, w2 + W, . . . , wm + W, wm+1 + W, . . . , wn + W. But as w1, w2, . . . , wm are in W, the corresponding cosets are the zero coset in V/W. Therefore, v + W is actually a linear combination of the remaining cosets wm+1 + W, . . . , wn + W. ∎

This theorem, for example, shows that if W is a straight line passing through the origin of R2, then the dimension of the quotient space R2/W is 1. Thus, any non-zero coset can be a basis. Put differently, this says that every coset is a scalar multiple of a fixed non-zero coset.
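A quick numerical illustration of Proposition 3.9.3 (a minimal Python/NumPy sketch, not part of the formal development; the spanning vectors are chosen arbitrarily for the example): for V = R4 and a subspace W spanned by two independent vectors, dim V/W should be 4 − 2 = 2, and two vectors determine the same coset exactly when their difference lies in W.

import numpy as np

# W is the subspace of R^4 spanned by the rows of S.
S = np.array([[1., -1., 0., 1.],
              [2.,  0., -1., 2.]])

dim_V = 4
dim_W = np.linalg.matrix_rank(S)      # dimension of the span of the rows
dim_quotient = dim_V - dim_W          # Proposition 3.9.3: dim V/W = dim V - dim W
print(dim_W, dim_quotient)            # expected: 2 2

# v1 and v2 represent the same coset v + W exactly when v1 - v2 lies in W,
# i.e. appending v1 - v2 to the spanning rows does not increase the rank.
v1 = np.array([3., 1., -1., 3.])
v2 = np.array([0., 2., 0., 0.])
same_coset = np.linalg.matrix_rank(np.vstack([S, v1 - v2])) == dim_W
print(same_coset)                     # True, since v1 - v2 = S[0] + S[1]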


EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. All vector spaces are finite-dimensional over an arbitrary field unless otherwise specified.
(a) For any subspace W of an infinite-dimensional vector space V, the quotient space V/W is infinite-dimensional.
(b) The set of all planes in R3 parallel to a plane containing the origin of R3 is a quotient space of R3.
(c) If W1 is a subspace of another subspace W2 of V, then V/W1 is a subspace of V/W2.
(d) If W1 is a subspace of another subspace W2 of V, then W2/W1 is a subspace of V/W1.
(e) Two cosets v1 + W and v2 + W are equal if and only if v1 = v2.
(f) If v1, v2, . . . , vn span a vector space V, then v1 + W, v2 + W, . . . , vn + W span V/W for any subspace W of V.
(g) If the quotient space V/W has dimension m, then for any basis v1, v2, . . . , vn of V, v1 + W, . . . , vm + W is a basis of V/W.
(h) The dimension of a quotient space V/W is strictly less than the dimension of V if and only if W is non-zero.

(i) If W is the subspace of all those matrices in Mn(F) having trace zero, then any matrix in a non-zero coset of W must have non-zero trace.

2. Let W be the subspace of R[x] consisting of all polynomial multiples of p(x) = x3 + 1. Describe the elements of the quotient space R[x]/W. Determine a basis of R[x]/W, if it is finite-dimensional.
3. If W1 ⊂ W2 are subspaces of a vector space V, then show that W2/W1 is a subspace of V/W1. Is every subspace of V/W1 of the form W/W1, where W is some subspace of V containing W1? Justify your answer.
4. Let W be the subspace of R3 spanned by (1, 1, 0) and (1, 0, 1). Find a basis of the quotient space R3/W.
5. Let W be the subspace of R4 spanned by (1, −1, 0, 1) and (2, 0, −1, 2). Find a basis of R4/W.
6. Let W be the subspace of V = M2(R) consisting of the diagonal matrices in M2(R). Find a basis of V/W.

4  Linear Maps and Matrices

4.1 INTRODUCTION

In Chapter 1, we have noted that multiplication by an m × n matrix over a field F transforms column vectors of Fn to column vectors in Fm and so can be thought of as a function or a mapping from Fn to Fm. To be precise, if we let T(x) = Ax for any x ∈ Fn, then T(x) ∈ Fm and so T is a mapping from Fn to Fm. The most important property of this mapping, given by the multiplication by A, is that it preserves the vector space operations:

T(x + y) = T(x) + T(y),
T(ax) = aT(x),

since by the properties of matrix multiplication A(x + y) = Ax + Ay and A(ax) = aAx. The first of the preceding equalities states that the vector in Fm produced by applying T to the sum x + y in Fn can also be obtained by adding the images T(x) and T(y) in Fm; the second states that the image of the scalar multiple ax in Fn under T can be obtained by scalar multiplying the image T(x) by a in Fm. In other words, it is immaterial whether vector space operations are carried out before applying T or after applying T. That is the reason T is said to preserve vector space operations. Functions or mappings between vector spaces that preserve vector space operations are called linear maps. They are also known as linear transformations. Linear maps, like matrices, are indispensable in diverse applications of linear algebra, mainly because these maps are well-suited to describe physical phenomena and changes in physical objects; linear maps are also a basic tool in exploring relations between vector spaces. A mapping f from a set X to another set Y is usually denoted by f : X → Y; the element y = f(x) ∈ Y for any x ∈ X is the image of x under f, while x is called the pre-image of y.
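The two preservation properties above are easy to probe numerically. The following Python/NumPy sketch (an illustration only, with an arbitrarily chosen random matrix) checks them for T(x) = Ax on randomly chosen vectors.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))             # a fixed 3 x 4 real matrix: T maps R^4 to R^3
T = lambda x: A @ x

x, y = rng.standard_normal(4), rng.standard_normal(4)
a = 2.5

print(np.allclose(T(x + y), T(x) + T(y)))   # T preserves addition of vectors
print(np.allclose(T(a * x), a * T(x)))      # T preserves scalar multiplication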

4.2 BASIC CONCEPTS

Linear maps or linear transformations can be defined between any pair of vector spaces, finite or infinite-dimensional, as long as their underlying scalar field is the same.

Definition 4.2.1. Let V and W be vector spaces over the same field F. A map T : V → W is a linear map over F if

T(v1 + v2) = T v1 + T v2 for any v1, v2 ∈ V,
T(av) = aT v for any v ∈ V and a ∈ F.

For such a linear map T : V → W, V is said to be the domain and W the range of T. It should be clear that the vector addition and the scalar multiplication on the left-hand side of the relations defining T are the operations of the domain V, whereas the operations on the right-hand side are those of W, the range of T. The equality in both the conditions is in W. Linear maps or linear transformations between vector spaces over a field F are also known as F-homomorphisms or simply as vector space homomorphisms. Linear maps from a vector space V into itself occur frequently. These are usually known as linear operators on V. Some prefer to call them endomorphisms of V. We will normally use the term linear transformation to describe linear maps between Rm and Rn for various m and n. Two linear maps T and S are equal if they are equal as functions, that is, T = S if they have the same domain and range and T v = S v for all v ∈ V. It is left to the reader to verify that the two conditions defining a linear map T can be combined to give a single equivalent one:

T(av1 + v2) = aT v1 + T v2 for v1, v2 ∈ V and a ∈ F.

The last result can be put in a more general form to show that linear combinations of vectors are preserved by linear maps:

$T\left(\sum_{j=1}^{m} a_j v_j\right) = \sum_{j=1}^{m} a_j T v_j.$    (4.1)

Two simple but useful consequences of the definition of linear maps are as follows.

Proposition 4.2.2. Let T : V → W be a linear map. Then,
(a) T(0) = 0;
(b) T(−v) = −T v for any v ∈ V.

In the first identity, the zero in T(0) is clearly the zero vector of V whereas the zero on the right-hand side is the zero vector of W. Since the same symbol may be used to denote different entities, their usages have to be understood from the context.

Proof. For the first assertion, note that T(0) = T(0 + 0) = T(0) + T(0) by the linearity of T. Now, subtracting T(0) from both sides yields the result. The linearity of T again shows that T v + T(−v) = T(v − v) = T(0) = 0, which implies that T(−v) must be the additive inverse of T v, giving us the second assertion. ∎


The first result says that the zero vector of the domain is always a pre-image of the zero vector of the range of any linear map. In fact, the set of the pre-images of the zero vector (the additive identity) of the range of a linear map is a very important subspace of the domain, called the kernel of the map. As we will see later, closely linked to this kernel is the image of the map. For now, we introduce these subspaces.

Kernel and Image of a Linear Map

Proposition 4.2.3. Let T : V → W be a linear map.
(a) The kernel of T, denoted by ker T and defined as ker T = {v ∈ V | T v = 0}, is a subspace of V.
(b) The image of T, denoted by Im(T) and defined as Im(T) = {w ∈ W | w = T v for some v ∈ V}, is a subspace of W.

Proof. Note that ker T is non-empty as the zero vector of V is in it. So, to prove that it is a subspace, it is sufficient to show that any linear combination of two vectors in ker T is again in ker T. So let v1, v2 ∈ ker T, and let a1, a2 be scalars. Then, by the linearity of T, we have T(a1v1 + a2v2) = a1T v1 + a2T v2. But both T v1 and T v2 are zero in W, so the last relation shows that a1v1 + a2v2 is in ker T. A similar application of the linearity of T shows that Im(T) is a subspace of W. ∎

We now look at some examples of linear maps. EXAMPLE 1

For any vector spaces V and W over the same field, the map T : V → W given by T v = 0 for any v ∈ V is trivially a linear map. It is customary to call this map the zero map from V into W. We will denote the zero map by z. The kernel of the zero map is the whole of V, whereas the image is the zero subspace of W. By the zero operator of V, we will mean the linear operator z on V which maps every vector of V to its zero vector.

EXAMPLE 2

The identity map of an arbitrary vector space V, that is, the map T : V → V such that T v = v for all v ∈ V, is clearly a linear operator on V. We denote this identity operator as IV , or simply as I.


In fact, if V is a vector space over the field F, then for any fixed scalar a ∈ F, the map T : V → V given by T v = av can easily be shown to be linear on V. We denote this map as aIV or simply as aI.

EXAMPLE 3

Let P1 and P2 be defined from R2 to itself by P1 (x1 , x2 ) = (x1 , 0) P2 (x1 , x2 ) = (0, x2 ). We leave it to the reader to verify that these are linear maps. These linear operators of R2 are known as projections of R2 onto the x- and y-axis, respectively. We can also think of these projections as linear maps from R2 into R in an obvious manner. It is clear that we can similarly define projections from, say, R3 to R2 or to R, or for that matter from Fn to Fm for any field F provided n ≥ m.

EXAMPLE 4

On the other hand, if n ≥ m, the inclusion map from Fm to Fn given by (x1, x2, . . . , xm) ↦ (x1, x2, . . . , xm, 0, 0, . . . , 0)

is trivially a linear map. In general, any direct sum decomposition of a vector space V gives rise to various projections of V. For example, if V = W ⊕ W1 , then there is a linear map P of V onto W defined as follows: for any v ∈ V, if v = w + w1 be the unique expression of v in terms of vectors of the summands W and W1 , then we let Pv = w. P is clearly linear and onto W, and has precisely W1 as its kernel. P is called a projection of V onto W. Note that different complements of W in V will give rise to different projections of V onto W. However, they can be identified by their kernels. EXAMPLE 5

Consider the subspace W = {(x, 0) | x ∈ R} (the x-axis) of R2 . If W1 = {(0, y) | y ∈ R} is the y-axis, then R2 = W ⊕ W1 and the projection P of R2 onto W determined by this decomposition is the linear map we have already seen in Example 3: P(x1 , x2 ) = (x1 , 0). The kernel of this projection is clearly the complement of W in this case. Note that W2 = {(x, x) | x ∈ R} is another complement of the x-axis W in R2 . This time, however, (x1 , x2 ) = (x1 − x2 , 0) + (x2 , x2 ) ∈ W ⊕ W2 so the projection P onto W, determined by the decomposition R2 = W ⊕ W2 , will be the linear map given by P(x1 , x2 ) = (x1 − x2 , 0). We leave it to the reader to verify directly that the image of this P is W, and the kernel is W2 .

EXAMPLE 6

Let V = Rn [x] be the real vector space of all real polynomials of degree at most n, and let D : V → V be the differential map given by D( f (x)) = f ' (x),


where f'(x) is the formal derivative of f(x). In other words, D(a0 + a1 x + a2 x2 + · · · + am xm) = a1 + 2a2 x + · · · + mam xm−1. The familiar properties of differentiation show that D is indeed a linear operator on V. The kernel of D is clearly the set of all scalar polynomials; we can think of this set as the subspace of V generated by any non-zero scalar. How about the image of D? Since indefinite integrals of polynomials can be considered, it follows that the subspace of all polynomials of degree at most n − 1 is the image of D. Note that D can also be defined on the infinite-dimensional vector space R[x], and it can be shown in the same way that even on R[x], D is linear.
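Since D is determined by what it does to the basis 1, x, x2, . . . , xn, it can be represented concretely by a matrix. The following Python/NumPy sketch (an illustration for n = 3; coefficient vectors list a polynomial from the constant term upward) confirms that the image has dimension n and the kernel (the constants) has dimension 1.

import numpy as np

# Matrix of D on R_3[x] with respect to the basis 1, x, x^2, x^3:
# the polynomial a0 + a1 x + a2 x^2 + a3 x^3 is the coordinate vector (a0, a1, a2, a3).
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.],
              [0., 0., 0., 0.]])

p = np.array([5., 1., 4., 2.])         # 5 + x + 4x^2 + 2x^3
print(D @ p)                           # derivative 1 + 8x + 6x^2 -> (1, 8, 6, 0)

rank = np.linalg.matrix_rank(D)
print(rank, 4 - rank)                  # image has dimension 3, kernel has dimension 1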

EXAMPLE 7

Let a, b, c and d be arbitrary real numbers, and let T : R2 → R2 be defined by T(x1, x2) = (ax1 + bx2, cx1 + dx2). Then, T is a linear map on R2. The verification that ker T = {(0, 0)} if and only if ad − bc ≠ 0 is left to the reader.

EXAMPLE 8

Consider the map Rθ : R2 → R2 defined by Rθ(x1, x2) = (x1 cos θ − x2 sin θ, x1 sin θ + x2 cos θ). This map, which is the anticlockwise rotation of the plane R2 through an angle θ, is a linear operator on R2. The verification that Rθ is a linear map can be made easier by observing that its effect can be realized by matrix multiplication. For this point of view, we need to think of the vectors of R2 as column vectors. Then, we let

$M_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$

so that

$M_\theta \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1\cos\theta - x_2\sin\theta \\ x_1\sin\theta + x_2\cos\theta \end{pmatrix}.$

It is now clear by properties of matrix multiplication that Rθ is a linear operator on R2. From geometric considerations (as Rθ is an anticlockwise rotation of the plane through an angle θ), we see that ker Rθ = {(0, 0)} and Im Rθ = R2. We invite the reader to supply a direct proof of these conclusions.
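A small numerical check of this example (a Python/NumPy sketch with an arbitrarily chosen angle): the matrix Mθ rotates the standard basis vectors as expected, and its determinant is 1, so Mθx = 0 forces x = 0, in line with the claim that ker Rθ is trivial.

import numpy as np

theta = np.pi / 3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(M @ np.array([1.0, 0.0]))             # (cos t, sin t): e1 rotated by theta
print(M @ np.array([0.0, 1.0]))             # (-sin t, cos t): e2 rotated by theta
print(np.isclose(np.linalg.det(M), 1.0))    # det = 1, so only the zero vector is sent to 0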

EXAMPLE 9

In a similar manner, we can describe the operation that reflects every vector of R2 about a straight line that makes an angle θ with the positive x-axis in terms of matrix multiplication. We will show later that if

$H_\theta = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix},$

then for any column vector v ∈ R2, Hθv is the vector obtained by reflecting v about the line making an angle θ with the positive x-axis. As in the last example, the corresponding map is a linear operator on R2 by properties of matrix multiplication. Three important cases can be singled out as given below.
(a) θ = 0: reflection about the x-axis. This can be described as the linear operator T such that T(x, y) = (x, −y).
(b) θ = π/2: reflection about the y-axis. This can be described as the linear operator T such that T(x, y) = (−x, y).
(c) θ = π/4: reflection about the line y = x. This can be described as the linear operator T such that T(x, y) = (y, x).

EXAMPLE 10

In fact, matrix multiplication, in some sense, is the most important example of linear maps, especially in the case of finite-dimensional spaces. We discuss the general case. Consider, for any field F, the vector spaces Fn and Fm, and let A be an arbitrary but fixed m × n matrix over F. Write vectors in Fn and Fm as column vectors. Now, the map TA : Fn → Fm given by TA(x) = Ax for any x ∈ Fn is a linear map by the rules of matrix multiplication. It is interesting to examine the kernel and the image of the map TA. The kernel of TA is precisely the solution space in Fn of the homogeneous system of equations Ax = 0, so ker TA is the null space of A. On the other hand, the image of TA is the subspace of Fm consisting of all those vectors b for which the system of equations Ax = b has a solution in Fn, so Im(TA) is the column space of A. (See Section 3.6 for the definitions of the null space and the column space of a matrix.)
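The identifications in Example 10 can be seen concretely. The following Python/NumPy sketch (an illustration with an arbitrarily chosen real matrix) extracts a basis of ker TA from the singular value decomposition of A and reads off dim Im(TA) as the rank of A.

import numpy as np

A = np.array([[1., 2., 0., -1.],
              [2., 4., 1.,  0.],
              [3., 6., 1., -1.]])      # a 3 x 4 matrix: T_A maps R^4 to R^3

rank = np.linalg.matrix_rank(A)        # dim Im(T_A), the dimension of the column space

# Null space of A (the kernel of T_A): right singular vectors belonging to
# the (numerically) zero singular values.
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:].T               # columns form a basis of ker T_A

print(rank, null_basis.shape[1])       # rank 2, nullity 2
print(np.allclose(A @ null_basis, 0))  # every basis vector of the kernel is sent to 0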

Linear Maps Defined on Basis Vectors

We now consider a useful way of describing linear maps whose domains are finite-dimensional. Observe that if T : V → W is a linear map, and v1, v2, . . . , vn is a basis of V, then T is completely determined by its actions on the basis vectors vi. For, if v = a1v1 + a2v2 + · · · + anvn, then by Equation (4.1), T v = a1T v1 + a2T v2 + · · · + anT vn. Therefore, once the vectors T vi in W are known, T v can be determined for any v ∈ V. This observation leads to the following useful result.


Proposition 4.2.4. Let V and W be vector spaces over the same field F. Assume that V is finite-dimensional. Let {v1, v2, . . . , vn} be a basis of V, and w1, w2, . . . , wn a set of n arbitrary vectors in W. Then there is a unique linear map T : V → W such that T vi = wi for all i.

Proof. Clearly, the map T(a1v1 + a2v2 + · · · + anvn) = a1w1 + a2w2 + · · · + anwn is the required one. ∎

The fact that a linear map on a finite-dimensional vector space is completely determined by its action on the vectors of any basis also implies that two linear maps on a finite-dimensional vector space are equal if and only if they agree on the vectors of any basis of the domain.

Lemma 4.2.5. Let V and W be vector spaces over the same field. Assume that V is finite-dimensional. Let {v1, v2, . . . , vn} be any basis of V. Then for any two linear maps T and S from V into W,

T = S if and only if T vj = S vj for all j.

Proof. T = S if T v = S v for any v ∈ V. So if T vj = S vj for 1 ≤ j ≤ n, then by the remarks preceding the proposition, we easily infer that T v = S v for any v ∈ V. The converse is trivial. ∎

The following remarks will be useful in applications of the preceding proposition.
(a) The range W of T in the proposition need not be finite-dimensional.
(b) The vectors wi in W can be chosen really arbitrarily. We may even choose them to be all equal. For example, even for the choice w1 = w2 = · · · = wn = 0 in W, the proposition guarantees a unique linear map from V to W (which in this case is the zero map).
(c) In practice, once the wi are given, it is customary to define the linear map guaranteed by this proposition in the following manner: Define T by T vi = wi. Extend T linearly to all of V to get a unique linear map from V to W.

We show how to use the proposition by deriving the differential map for the polynomials as an example. Choose the standard basis 1, x, x2, . . . , xn of the real vector space Rn[x], and let D(xm) = mxm−1 for m = 1, 2, . . . , n. Then, it is easy to see that once we extend D linearly to all of Rn[x], we recover the differentiation map from Rn[x] to itself. To take another example, let us define a map T from R2 to itself by letting it act on the standard basis vectors of R2 as follows:

$T e_1 = T\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}\cos 2\theta\\ \sin 2\theta\end{pmatrix} \quad\text{and}\quad T e_2 = T\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}\sin 2\theta\\ -\cos 2\theta\end{pmatrix}.$


We can extend T to a linear operator on R2 by Proposition (4.2.4). Then, for any vector (x1, x2)t in R2,

$T\begin{pmatrix}x_1\\x_2\end{pmatrix} = T(x_1 e_1 + x_2 e_2) = x_1 T e_1 + x_2 T e_2 = x_1\begin{pmatrix}\cos 2\theta\\ \sin 2\theta\end{pmatrix} + x_2\begin{pmatrix}\sin 2\theta\\ -\cos 2\theta\end{pmatrix} = \begin{pmatrix}x_1\cos 2\theta + x_2\sin 2\theta\\ x_1\sin 2\theta - x_2\cos 2\theta\end{pmatrix} = H_\theta\begin{pmatrix}x_1\\x_2\end{pmatrix},$

where Hθ is the matrix we had introduced in Example 9. We leave it to the reader now to verify, by geometrical considerations, that T(e1) and T(e2) are indeed the vectors obtained by reflecting the standard basis vectors about the line making an angle θ with the positive x-axis.

We now come back to our general discussion about linear maps. Recall that any map T from V into W is onto or surjective if the image of T is the whole of the range W, and is one–one or injective if distinct elements of V are mapped by T to distinct elements of W. A map which is both one–one and onto is a bijection. As the next result shows, the kernel is a very convenient tool in determining whether a linear map is one–one or not.

Proposition 4.2.6. Let T : V → W be a linear map. T is one–one if and only if ker T = {0}.

Proof. The linearity of T shows that the equality T v1 = T v2 is equivalent to T(v1 − v2) = 0. By the definition of the kernel, this is equivalent to the inclusion (v1 − v2) ∈ ker T. Hence the result. ∎

Dimension Formula

But the linearity of a map gives rise to a much more fundamental relation between the kernel and the image, in case the domain of the map is finite-dimensional. More precisely, we have the following theorem.

Theorem 4.2.7 (Dimension Formula). Let V and W be vector spaces over the same field F. Assume that V is finite-dimensional. For a linear map T : V → W, the following holds:

dim V = dim ker T + dim Im(T).

Proof. Let dim V = n and dim ker T = k. If k ≠ 0, let v1, v2, . . . , vk be a basis of the subspace ker T of V. Then, by Corollary (3.4.7), we can find vectors vk+1, vk+2, . . . , vn in V such that v1, v2, . . . , vk, vk+1, . . . , vn is a basis of V. In case k = 0, ker T = {0} and we choose any basis v1, v2, . . . , vn of V to begin with. We claim that in both the cases, whether k = 0 or not, the vectors T vk+1, T vk+2, . . . , T vn form a basis of the subspace Im(T) of W.


We sketch a proof of the claim in case k ≠ 0. Since Im(T) consists of vectors T v as v ranges over V, it follows that for suitable scalars a1, a2, . . . , an in F, any vector of Im(T) can be expressed as

$T\left(\sum_{j=1}^{n} a_j v_j\right) = \sum_{j=1}^{n} a_j T v_j.$

However, as v1, v2, . . . , vk are in ker T, we see that the above sum is actually a linear combination of T vk+1, T vk+2, . . . , T vn only. In other words, the vectors T vk+1, T vk+2, . . . , T vn span Im(T). Next, we need to verify that the vectors T vk+1, T vk+2, . . . , T vn are linearly independent. Now, if

bk+1 T vk+1 + bk+2 T vk+2 + · · · + bn T vn = 0

for some scalars bj, then using the linearity of T, we can rewrite the relation as T(bk+1 vk+1 + bk+2 vk+2 + · · · + bn vn) = 0. This implies that the vector bk+1 vk+1 + bk+2 vk+2 + · · · + bn vn of V is actually in ker T and so is some linear combination of the basis vectors v1, v2, . . . , vk of ker T. It follows that we have a linear combination of all the vectors v1, . . . , vk, . . . , vn which equals the zero vector of V. But these vectors are linearly independent. Therefore, all the coefficients in that linear combination and, in particular, the scalars bk+1, bk+2, . . . , bn must be zeros. This completes the proof of the claim in case k ≠ 0. The slight modification needed in the proof for the case k = 0 is left to the reader. ∎

for some scalars b j , then using the linearity of T , we can rewrite the relation as T (bk+1 vk+1 + bk+2 vk+2 + · · · + bnvn ) = 0. This implies that the vector bk+1 vk+1 + bk+2 vk+2 + · · · + bn vn of V is actually in ker T and so is some linear combination of the basis vectors v1 , v2 , . . . , vk of ker T . It follows that we have a linear combination of all the vectors v1 , . . . , vk , . . . , vn which equals the zero vector of V. But these vectors are linearly independent. Therefore, all the coefficients in that linear combination and, in particular, the scalars bk+1 , bk+2 , . . . , bn must be zeros. This completes the proof of the claim in case k ! 0. The slight modification needed in the proof for the case k = 0 is left to the reader. ! Note that the proof of the preceding theorem is exactly the same as that of Theorem (3.6.14) of Chapter 3. To see why that is so, consider an arbitrary matrix A ∈ Mm×n (F) and the corresponding linear map T A from Fn to Fm . As we have noted in Example 10, ker T A is the null space of A whereas the image of T A is the column space of A. Since the dimensions of the null space and the column space of A are the nullity and the rank of A, respectively, the preceding theorem implies that rank(A) +nullity(A) = n, which is Theorem (3.6.14). In view of this discussion and anticipating later development, we make the following definitions: Definition 4.2.8. Let V and W be finite-dimensional vector spaces over the same field and T a linear map from V into W. (a) The nullity of T is the dimension of the kernel of T and denoted by nullity(T ). (b) The rank of T is the dimension of the image of T and denoted by rank(T ). We can then restate the dimension formula as follows. Corollary 4.2.9. Let T be a linear map of V into W, where V and W are finite-dimensional vector spaces over the same field. Then, dim V = nullity(T ) + rank(T ). The following is a useful result. Corollary 4.2.10. Let T : V → W be a linear map. Assume that dim V = dim W. Then, T is one–one if and only T is onto.


Proof. Let T be one–one. So we may assume, by Proposition (4.2.6), that dim ker T = 0. In that case, the dimension formula of the preceding theorem implies that dim V = dim Im(T). Since by hypothesis dim V = dim W, we conclude that dim Im(T) = dim W. But Im(T) is a subspace of W, so the equality of the dimensions means that Im(T) = W, which implies that T is onto W. The converse can be proved in exactly the same way. ∎

Thus, a linear operator on a finite-dimensional vector space is one–one if and only if it is onto. The preceding corollary as well as the dimension formula are immensely useful. We give some instances of their uses. When we considered the map Rθ from R2 into itself, we remarked that ker(Rθ) is the zero subspace (after all, which point in R2 can be rotated through an angle θ to end up at the origin?). The preceding corollary then shows immediately that Rθ is onto. Similarly, the linear operator T(x, y) = (ax + by, cx + dy) on R2 is one–one if and only if ad − bc ≠ 0, hence must be onto under the same condition. Finally, consider the differential map D defined on the real vector space of all polynomials of degree at most n. As soon as we find that its kernel is one-dimensional, consisting of the scalars only, we can deduce from the dimension formula that its image must be of dimension n, as V has dimension n + 1.
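The claims in the last paragraph are easy to probe numerically. The following Python/NumPy sketch (an illustration with an arbitrarily chosen 2 × 2 example) checks that when ad − bc ≠ 0 the operator has trivial kernel and full-rank image, with the dimension formula accounting for all of dim V.

import numpy as np

A = np.array([[2., 1.],
              [5., 3.]])                 # ad - bc = 1, so T should be one-one and onto

n = A.shape[1]
rank = np.linalg.matrix_rank(A)          # dim Im(T)
nullity = n - rank                       # dim ker T, by the dimension formula

print(not np.isclose(np.linalg.det(A), 0.0), rank == n, nullity == 0)   # all True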

Projections and Direct Sums

We now consider a general class of linear maps, some special cases of which we have already encountered in earlier examples. In the following discussion, we anticipate products or composites of linear maps, which will be considered in detail later. For a linear operator T on a vector space V, we let T² be the map on V given by T²v = T(T v) for any v ∈ V; it will be shown later that T² is also a linear operator on V. However, it will be a good exercise for the reader to verify the linearity of T² now.

Definition 4.2.11. A projection P on a vector space V is a linear operator on V such that P² = P.
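For a concrete instance of this definition, take the projection of R2 onto the x-axis along the complement W2 = {(x, x)} from Example 5, P(x1, x2) = (x1 − x2, 0). The following Python/NumPy sketch (an illustration only) checks that its matrix is idempotent and behaves as a projection should.

import numpy as np

# Matrix of P(x1, x2) = (x1 - x2, 0), the projection onto the x-axis along W2 = {(x, x)}.
P = np.array([[1., -1.],
              [0.,  0.]])

print(np.allclose(P @ P, P))          # P is idempotent: P^2 = P
print(P @ np.array([1., 1.]))         # a vector of the kernel W2 is sent to 0
print(P @ np.array([4., 0.]))         # a vector already on the x-axis is fixed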

Note that the projections determined by a direct sum decomposition of a vector space V, as discussed in some of the preceding examples, are projections in the sense of this definition too. For, given a direct sum V = W1 ⊕ W2, every v in V can be expressed uniquely as a sum w1 + w2 with wi ∈ Wi. In that case, the map P on V given by Pv = w1 does satisfy the condition that P² = P, as P²v = Pw1 = w1 = Pv. It is easy to find properties that characterize arbitrary projections.

Proposition 4.2.12. Let P be a projection on a vector space V with W as its image and K as its kernel.
(a) w ∈ W if and only if Pw = w.
(b) Any v ∈ V can be uniquely expressed as a sum Pv + (v − Pv) of vectors in W and K.
(c) V = W ⊕ K.

Proof. If Pv = w is in the range W of P, then Pw = P²v = Pv = w. Assertion (a) follows. Since for any v ∈ V, P(v − Pv) = Pv − P²v = 0, it follows that (v − Pv) is in K, the kernel of P. So by (a), Pv + (v − Pv) is indeed an expression of v as a sum of vectors from W and K. If v = w1 + w2 is another such sum, then Pv = Pw1 = w1, as w1 is in the range of P. Uniqueness in (b) follows. Finally, for assertion (c),


note that if w ∈ W ∩ K, then 0 = Pw. On the other hand, w being in W, by (a), equals Pw. Thus, W ∩ K is the zero subspace, so that assertion (b) implies (c). ∎

In general, any direct sum decomposition of a vector space V into k summands can be described completely by k projections. If V = W1 ⊕ · · · ⊕ Wk, then any v can be uniquely written as a sum v = v1 + · · · + vk, where vj ∈ Wj. We define, for each j, 1 ≤ j ≤ k, a map Pj : V → V by Pjv = vj. It is easy to see that each Pj is a projection of V onto Wj with kernel the subspace $\bigoplus_{i \neq j} W_i$. These projections, further, enjoy certain properties that are closely connected to the decomposition of V. The following proposition spells out these properties.

Proposition 4.2.13. If a vector space V can be decomposed as a direct sum V = W1 ⊕ · · · ⊕ Wk of k subspaces, then there are k projections P1, P2, . . . , Pk on V satisfying the following conditions.
(a) The image of Pj is Wj for any j, 1 ≤ j ≤ k.
(b) PjPi is the zero map on V if j ≠ i.
(c) P1 + P2 + · · · + Pk = I, the identity map on V.

Conversely, if P1 , P2 , . . . , Pk are projections on a vector space V satisfying conditions (b) and (c), then V is the direct sum of the images of these projections. The proof is a routine verification, and left as an exercise. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. In the following, V and W are vector spaces over the same field. (a) If for a linear map T from V to W, T v = 0 only for v = 0, then T is one–one. (b) Any linear map from V to W carries a linearly independent subset of V to a linearly independent subset of W. (c) Any linear map from V to W carries a linearly dependent subset of V to a linearly dependent subset of W. (d) If T : R → R is defined by T (a) = a + 2, then T is linear. (e) There is no linear map T from R3 into R2 such that T (1, 1, 1) = (1, 0) and T (1, 2, 1) = (0, 1). (f) There is no linear map from R2 to R3 such that T (1, −1) = (1, 1, 1) and T (−1, 1) = (1, 2, 1).

(g) If W is a subspace of a finite-dimensional vector space V, then there is a linear operator on V whose kernel is W. (h) If W is a subspace of a finite-dimensional vector space V, then there is a linear operator on V whose image is W. (i) Given vectors v ∈ V and w ∈ W with v non-zero, there is a unique linear map T : V → W such that T v = w.


(j) There can be no linear map from R2 onto R.
(k) The column space of an m × n matrix over a field F is the image of some linear map from Fn into Fm.
(l) There is no linear operator on R2 such that the kernel of T coincides with the image of T.
(m) There is no linear operator on R3 such that the kernel of T coincides with the image of T.
(n) There is a linear operator on R4 having its kernel equal to its image.

2. Determine whether the following maps between the indicated real vector spaces are linear. For each linear map, determine the kernel and the image and deduce whether the map is one–one or onto.
(a) T : R3 → R2;  T(x1, x2, x3) = (x1 + x2, −x3).
(b) T : R2 → R3;  T(x1, x2) = (0, x1 − 2x2, x1 + 3x2).
(c) T : R2 → M2(R);  $T(x_1, x_2) = \begin{pmatrix} x_1 & -x_2 \\ x_2 & x_1 + x_2 \end{pmatrix}$.
(d) T : M2(R) → M2×3(R);  $T\begin{pmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{pmatrix} = \begin{pmatrix} x_{11} & 0 & -x_{21} \\ -x_{12} & 0 & x_{22} \end{pmatrix}$.
(e) T : R3[x] → R3[x];  T(p(x)) = p'(x) + p(x).
(f) T : Rn[x] → R;  T(p(x)) = p(0).
(g) T : Mn(F) → Mn(F);  T(A) = A + At.

3. Explain why the following maps from R2 to R2 are not linear.
(a) T(x1, x2) = (x1, x2²).
(b) T(x1, x2) = (x1 + x2 + 1, x1 − x2).
(c) T(x1, x2) = (0, sin x2).
(d) T(x1, x2) = (1, x2).

4. Let V and W be vector spaces over a field F. Show that a map T : V → W is linear if and only if T (av1 + v2 ) = aT v1 + T v2 for any v1 , v2 ∈ V and a ∈ F. 5. Let T : R3 → R3 be defined as T (x1 , x2 , x3 ) = (x1 + x2 − 2x3, −x1 + 2x2 − x3 , 4x2 − 5x3). Verify that T is a linear operator on R3 . Find all (a, b, c) ∈ R3 such that (a, b, c) ∈ ker T . Also, find the vectors (a, b, c) in the range of T . 6. Let T be the translation operator on the vector space R[x] of all real polynomials given by T (p(x)) = p(x + 1) for any p(x) ∈ R[x]. Prove that T is a linear operator on R[x].


7. Let R[x] be the vector space of all real polynomials. Prove that the differential operator D on R[x] is onto but not one–one. Let T : R[x] → R[x] be defined by $T(p(x)) = \int_0^x p(t)\,dt$. Prove that T is a linear operator on R[x] which is one–one but not onto.
8. Let B be a fixed matrix in Mn(F), where F is an arbitrary field. Define T : Mn(F) → Mn(F) by T(A) = AB − BA. Prove that T is a linear operator on Mn(F). Describe the kernel of T if B is a scalar matrix. Find the range of T in case B is a diagonal matrix.
9. Consider the set of complex numbers C as a vector space over R. Is the map T : R2 → C given by T(a, b) = a + ib a linear map?
10. Let T be a linear operator on the vector space Fm. If the kernel of T coincides with the image of T, show that m has to be even.
11. Give an example of a linear operator T on R4 whose kernel and image are the same subspace of R4. (Hint: Use Proposition (4.2.4) to define T in terms of the standard basis.)
12. Give an example of two distinct linear operators on the same vector space which have the same kernel and image.
13. Let T be a linear map from a vector space V to a vector space W, both over the same field F. If V1 is a subspace of V, then show that T(V1) = {T v | v ∈ V1} is a subspace of W. Similarly, show that if W1 is a subspace of W, then T−1(W1) = {v ∈ V | T v ∈ W1} is a subspace of V.
14. Let V and W be vector spaces of the same dimension over a field F. For a linear map T : V → W, show that T is onto if and only if for every basis v1, v2, . . . , vm of V, the vectors T v1, T v2, . . . , T vm form a basis of W.
15. Let a, b, c and d be fixed real numbers and let T : R2 → R2 be the map T(x1, x2) = (ax1 + bx2, cx1 + dx2).

Prove that T is linear. Show further that T is one–one if and only if ad − bc ≠ 0.
16. Prove directly that the rotation Rθ of the plane R2 through an angle θ is a one–one, onto linear operator on R2.
17. Let T be the linear operator on R2 such that T e1 = (1, −1) and T e2 = (2, 3), where e1 and e2 form the standard basis of R2. Find an expression for T(x1, x2) in terms of x1 and x2 for any (x1, x2) ∈ R2. Is T one–one? Is it possible to find (x1, x2) ∈ R2 such that T(x1, x2) = (a1, a2) for any (a1, a2) ∈ R2?
18. Find bases for the kernel and images of the following linear transformations and hence determine their rank and nullity.
(a) T : R3 → R3;  T(x1, x2, x3) = (2x1 − x2 + 3x3, −x1 + 2x3, x1 + 2x2 − x3).

(b) T : R3 → R2 ; T (x1 , x2 , x3 ) = (x1 − x2 + x3 , x1 + 2x2 − 3x3 ). (Hint: If F is linear on V, then the image of F is spanned by F(v1 ), F(v2 ), . . . , F(vm ) for any basis v1 , v2 , . . . , vm of V.)


19. Find a linear operator T on R3 whose image is spanned by (1, −1, 1), (1, 2, 3) and (0, 0, −1). (The expected answer should be a formula for T (x1 , x2 , x3 ) in terms of x1 , x2 and x3 .) 20. Find a linear transformation from R2 to R3 whose image is spanned by (1, 1, 1). 21. Find a linear transformation from R3 to R2 whose kernel is spanned by (1, 1, 1).

4.3 ALGEBRA OF LINEAR MAPS

The collection of all linear maps between two vector spaces over a field can be given a vector space structure. More interestingly, the vector space of linear operators of a vector space carries the additional structure of a ring. We discuss these structures in detail in this section. We first fix some notation.

Definition 4.3.1. Let V and W be vector spaces over a field F. The collection of all linear maps (that is, F-homomorphisms) from V into W is denoted by HomF(V, W) or simply by Hom(V, W). The collection of all linear operators (that is, F-endomorphisms) on a vector space V over a field F is denoted by EndF(V) or simply by End(V).

We have already noted in examples of Section 4.2 that Hom(V, W) always contains the zero map (the map which takes every vector of V to the zero vector of W), and End(V) the zero operator as well as the identity operator of V. Though it is not clear now, there are innumerable elements in Hom(V, W) if V and W are non-zero spaces. In fact, we will see shortly that for finite-dimensional spaces V and W, once a pair of bases is fixed, any matrix of a suitable size gives rise to a map in Hom(V, W). Our immediate task is to put in place a vector space structure on Hom(V, W). Since the elements of Hom(V, W) are maps or functions, the usual definitions of sums and scalar multiples of maps give us the required operations for Hom(V, W). We have already seen that with such operations, real-valued functions on a closed interval or real polynomials form vector spaces. (See Examples 8 and 10 of Section 3.2.)

Definition 4.3.2. For linear maps T and S in Hom(V, W), we define the sum T + S to be the map from V to W whose action on a vector of V is as follows: for any v ∈ V, (T + S)v = T v + S v. Similarly, the scalar multiple aT of T ∈ Hom(V, W) for a scalar a ∈ F is defined by what it does to a vector of V: for any v ∈ V, (aT)v = aT v.

The crucial fact is that both T + S and aT are again linear maps from V into W.

Proposition 4.3.3. Let V and W be vector spaces over the same field F.
(a) For any T, S ∈ Hom(V, W), the sum T + S ∈ Hom(V, W).
(b) For any T ∈ Hom(V, W) and a ∈ F, the scalar multiple aT ∈ Hom(V, W).
In particular, if T and S are linear operators on V, then T + S and aT for any a ∈ F are again linear operators on V.


Proof. For any v1 , v2 ∈ V, we have (T + S )(v1 + v2 ) = T (v1 + v2 ) + S (v1 + v2 ) = T v1 + T v2 + S v1 + S v2 , where the first equality follows from the definition of the sum T + S , and the second from the linearity of T and S . Rearranging the terms in the last expression and using the definition of the sum T + S once again, we obtain (T + S )(v1 + v2 ) = (T + S )v1 + (T + S )v2 . Thus, T + S preserves addition of vectors. Similar calculation shows that T + S preserves scalar multiplication: (T + S )(av) = T (av) + S (av) = aT v + aS v = a(T v + S v) = a(T + S )v.

(4.2)

Thus, we have shown that T + S is a linear map from V to W and so is in Hom(V, W). For the second assertion, one needs to show that, for any a ∈ F and any linear map T from V to W, the scalar multiple aT is also a linear map from V to W or, to be precise, to verify that

(aT)(v1 + v2) = (aT)v1 + (aT)v2,
(aT)(bv) = b(aT)v,

for any v1, v2 and v in V and b ∈ F. This routine verification is left to the reader. The last assertion of the proposition can be obtained by taking W = V. ∎

Vector Space of Linear Maps

The preceding proposition confirms that Hom(V, W) is closed with respect to addition and scalar multiplication of linear maps. Another round of routine verifications, this time of the vector space axioms, establishes that Hom(V, W) is itself a vector space.

Theorem 4.3.4. Let V and W be vector spaces over a field F. The collection Hom(V, W) of linear maps of V into W forms a vector space over F with respect to addition and scalar multiplication of maps. The zero map z from V to W is the zero vector (additive identity) of Hom(V, W).

Observe that verifying the vector space axioms for Hom(V, W) amounts to checking equalities of maps. For example, to verify that addition of elements in Hom(V, W) is commutative, one has to show that the maps T + S and S + T are equal, which, in turn, is equivalent to showing that

(T + S)v = (S + T)v


for all v ∈ V. Similarly, the trivial fact that T v + 0 = T v implies, according to the definition of equality of functions, that T + z = T, so that one may conclude that z acts as the zero vector in Hom(V, W). With these remarks, we leave the verification of the vector space axioms for Hom(V, W) to the reader. A special case of the preceding theorem, obtained by letting W = V, must be singled out.

Corollary 4.3.5. For any vector space V over a field F, the collection End(V) = EndF(V) of all linear operators on V is a vector space over F with respect to addition and scalar multiplication of linear operators. The zero operator on V acts as the additive identity in End(V).

The following fact is a useful one: a non-zero element of Hom(V, W) is a linear map T from V into W such that there is at least one v ∈ V with T v ≠ 0, the zero vector of W. Equivalently, the kernel of T cannot be all of V.
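For maps given by matrices, the operations of Definition 4.3.2 are just the familiar matrix operations. The following Python/NumPy sketch (an illustration with arbitrarily chosen random matrices) checks that (T + S)v and (aT)v agree with (A + B)v and (aA)v, anticipating the connection between Hom(V, W) and matrices developed later.

import numpy as np

rng = np.random.default_rng(1)
A, B = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))
T = lambda v: A @ v                            # T and S are elements of Hom(R^3, R^2)
S = lambda v: B @ v

v = rng.standard_normal(3)
a = -1.5

print(np.allclose(T(v) + S(v), (A + B) @ v))   # (T + S)v is given by the matrix A + B
print(np.allclose(a * T(v), (a * A) @ v))      # (aT)v is given by the matrix aA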

Dimensions of Spaces of Linear Maps

Till now, we have been looking at linear maps between arbitrary vector spaces. However, for finite-dimensional vector spaces, there is a striking formula, given in the following theorem, relating the dimension of Hom(V, W) with those of V and W.

Theorem 4.3.6. If V and W are finite-dimensional vector spaces over a field F with dim V = n and dim W = m, then Hom(V, W) is also finite-dimensional over F and dim Hom(V, W) = nm. In particular, for an n-dimensional vector space V, End(V) has dimension n².

The usual proof of this result is a constructive one, in the sense that once two bases of V and W are fixed, then by using the given basis vectors one can actually write down nm specific linear maps from V to W which form a basis of Hom(V, W). However, at this point it is difficult to see the motivation for constructing these specific maps. So we postpone the proof till the next section, where we will be exploring the connection between Hom(V, W) and Mm×n(F). Recall that Mm×n(F) is an mn-dimensional vector space; thus, our theorem will be a by-product of the main result of that section.

Another special case of Hom(V, W) is introduced in the following definition. Recall that a field F can be considered a vector space over itself.

Definition 4.3.7. For any vector space V over a field F, the vector space Hom(V, F) is known as the dual space of V. It is denoted by V̂. The elements of V̂, which are linear maps from V to F, are known as linear functionals on V.

The preceding theorem then implies the following corollary.

Corollary 4.3.8. For an n-dimensional vector space V over a field F, dim V̂ = n.


Composition of Linear Maps

We now introduce the important concept of the composition of linear maps. The reader must have come across instances of functions produced by combining two functions (for example, the function sin x² is the composite of the functions f(x) = sin x and g(x) = x²); the composite of two linear maps is obtained in the same manner.

Definition 4.3.9. Let T : V → W and S : W → U be linear maps, where V, W and U are vector spaces over the same field F. The composite S T of T and S is the map S T : V → U defined by (S T)v = S(T v) for any v ∈ V.

Now, for any v1, v2 ∈ V and a ∈ F, using the linearity of T and S separately, we see that

S T(av1 + v2) = S(T(av1 + v2)) = S(aT v1 + T v2) = S(aT v1) + S(T v2) = aS(T v1) + S(T v2) = a(S T)(v1) + (S T)(v2).

Thus, S T is a linear map. Note that the definition of the composite is valid even when two or all of the vector spaces V, W and U are the same. Thus, if T and S are linear operators on a vector space V, then their composite S T, which we shall also call their product, is again a linear operator on V. In particular, given a linear operator T on V, the composite T² = T T, defined by T²v = T(T v) for any v ∈ V, is a linear operator. Letting T⁰ = IV, the identity operator on V, we can then inductively define, for any positive integer n, the linear operator Tⁿ by Tⁿv = T(Tⁿ⁻¹v) for any v ∈ V.

EXAMPLE 11

Let T and S be the reflections in R2 about the x-axis and y-axis, respectively. Thus, T(x, y) = (x, −y) and S(x, y) = (−x, y) for any (x, y) ∈ R2. Then S(T(x, y)) = S(x, −y) = (−x, −y), showing that the product S T is the reflection in R2 about the origin. Note that S T = T S. However, as we shall see a little later, composition of linear operators is not commutative in general.

EXAMPLE 12

Geometrically, it is obvious that if the rotation through an angle θ in R2 is followed by another rotation through the same angle, then the combined result is the rotation through angle 2θ. In other words, if T is the linear operator on R2 representing a rotation through angle θ, then the operator T² is the rotation through an angle 2θ. We verify this assertion now. Recall from an earlier example that T can be described, using a matrix, as

$T\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x'\\y'\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix},$


where we are writing elements of R2 as column vectors, as matrix multiplication is used for describing T. Then

$T^2\begin{pmatrix}x\\y\end{pmatrix} = T\begin{pmatrix}x'\\y'\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x'\\y'\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}.$

However, because of the familiar trigonometrical identities cos²θ − sin²θ = cos 2θ and sin 2θ = 2 cos θ sin θ, the matrix product in the preceding equation can be simplified as follows:

$\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix}\cos^2\theta - \sin^2\theta & -2\cos\theta\sin\theta\\ 2\cos\theta\sin\theta & \cos^2\theta - \sin^2\theta\end{pmatrix} = \begin{pmatrix}\cos 2\theta & -\sin 2\theta\\ \sin 2\theta & \cos 2\theta\end{pmatrix}.$

Therefore,

$T^2\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}\cos 2\theta & -\sin 2\theta\\ \sin 2\theta & \cos 2\theta\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix},$

which shows that T² is indeed a rotation through an angle 2θ.
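Example 12's conclusion can be confirmed numerically. The following Python/NumPy sketch (an illustration with an arbitrarily chosen angle) checks that the product of the rotation matrix for θ with itself is the rotation matrix for 2θ.

import numpy as np

def rotation(t):
    # Matrix of the anticlockwise rotation of R^2 through the angle t.
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

theta = 0.7
print(np.allclose(rotation(theta) @ rotation(theta), rotation(2 * theta)))   # True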

Ring of Linear Operators

We have already seen that the composite or the product of two linear operators on a vector space V over a field F is again a linear operator on V. Thus, the collection EndF(V) of all linear operators on V, apart from addition of operators, has composition of two operators as another binary operation. What is of importance to us is that with respect to these two binary operations EndF(V) is a ring. A reader not familiar with the concept of a ring should go through Section 1.7 in the first chapter for the relevant definitions.

Theorem 4.3.10. Let V be a vector space over a field F. Then, the vector space End(V) = EndF(V) formed by the linear operators on V is a ring with respect to addition and composition of maps. The identity operator on V acts as the (multiplicative) identity of the ring End(V).

Proof. It has been noted that End(V) is a vector space and so, in particular, it is an abelian group with respect to addition of operators on V. To complete the proof that End(V) is a ring, we therefore need to verify the following identities involving arbitrary operators on V:

T1(T2T3) = (T1T2)T3,    (4.3)
T1(T2 + T3) = T1T2 + T1T3,    (4.4)
(T1 + T2)T3 = T1T3 + T2T3,    (4.5)


for any T1, T2 and T3 in End(V). Since End(V) is closed with respect to addition and composition of linear operators on V, it follows that both sides of all the three identities we are trying to prove are operators on V and so have the same domain and the same range. Thus, to prove the equalities, we just have to show that both sides in each equality agree on arbitrary vectors of V. We take the second relation first. Now, for any vector v ∈ V, by the definitions of composition and addition of operators, we see that

T1(T2 + T3)v = T1((T2 + T3)v) = T1(T2v + T3v),

which, by the linearity of T1, equals

T1(T2v) + T1(T3v) = (T1T2)v + (T1T3)v = (T1T2 + T1T3)v.

By looking at the beginning and the end of this chain of equalities, we infer, as the relation holds for an arbitrary vector v in V, that T1(T2 + T3) = T1T2 + T1T3 as required (recall the definition of equality of two operators). Note the use of the definitions of composition and addition of operators, respectively, in deriving the last two equalities. We leave the verifications of the other two equalities, which can be done in a similar manner, to the reader. It is also easy to see that the identity operator I on V acts as the multiplicative identity in End(V), that is, T I = IT = T for any linear operator T on V. For, given any v ∈ V, (T I)v = T(Iv) = T v and (IT)v = I(T v) = T v. ∎

Because of Equation (4.3), the composition of linear operators is said to be associative; the two Equations (4.4) and (4.5) show that it is also distributive. The distributive laws tell us how composition of linear operators combine with addition. Similarly, there is a useful property that specifies how composition of linear operators on a vector space V over a field F combines with respect to scalar multiplication: a(S T ) = (aS )T = S (aT ),

(4.6)

for any S, T in EndF(V) and any a ∈ F. The verification is standard and left to the reader. The ring End(V), like its counterpart Mn(F), is a very interesting one. To understand some of the features that make it interesting, consider the special case of V = R2. Fix any basis v1, v2 of R2. Recall from Proposition (4.2.4) that we can get unique linear operators of R2, that is, elements of End(R2), by simply specifying their values at the basis vectors v1, v2 and then extending them linearly to all of R2. Let us define four elements T11, T12, T21, T22 of End(R2) in this manner:

T11 v1 = v1,    T11 v2 = 0,
T12 v1 = 0,     T12 v2 = v1,
T21 v1 = v2,    T21 v2 = 0,
T22 v1 = 0,     T22 v2 = v2.

= 0, = v1 , = 0, = v2 .

Therefore, (T 11 + T 22)v1 = v1 (T 11 + T 22)v2 = v2 which show that T 11 + T 22 agrees with the identity map I on R2 on each basis vector. It follows that T 11 + T 22 = I on R2 . It is equally instructive to interpret the product T 11 T 22 . Since (T 11T 22 )v1 = T 11 (T 22 v1 ) = T 11 0 = 0 and (T 11 T 22 )v2 = T 11 (T 22 v2 ) = T 11 v2 = 0, it follows that T 11 T 22 is the zero map z of End(R2 ). Thus, we have shown that the product of two non-zero elements of End(R2 ) is the zero element. Similar calculations show that T 11 T 21 = z whereas T 21 T 11 ! z proving that the product T 11 T 21 is not the same map as the product T 21 T 11 . In other words, End(R2 ) is not a commutative ring. With this example in mind, the reader should have no difficulty in proving the following general result. Proposition 4.3.11.

Let V be a finite-dimensional vector space over a field F such that dim V > 1.

(a) The ring End(V) is not commutative.
(b) End(V) has non-zero zero divisors, that is, End(V) has non-zero operators whose product is the zero operator.

Compare this with the similar properties of Mn (F), which we discussed after Proposition (1.3.6). The similarity of these two results is yet another pointer to the connection between End(V) and Mn (F). This connection will be made precise in the next section.

Invertible Linear Operators

Analogous to the invertible matrices in Mn (F), there are invertible linear operators in End(V).

Definition 4.3.12. Let V be a vector space over a field F. An operator T ∈ EndF (V) is invertible in EndF (V) if there is an operator S in End(V) such that S T = T S = IV , where IV is the identity operator on V. If such an operator S exists, one says that S is the inverse of T ; it is usually denoted by T −1 .
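The T 11 , T 22 calculation above can be checked with the corresponding 2 × 2 matrix units (the matrices of T 11 , T 21 , T 22 with respect to the basis v1 , v2 ). The NumPy snippet below is an illustrative addition, not part of the original text.

import numpy as np

# Matrix units representing T11, T21, T22 relative to the basis v1, v2
E11 = np.array([[1, 0], [0, 0]])
E21 = np.array([[0, 0], [1, 0]])
E22 = np.array([[0, 0], [0, 1]])

# Two non-zero elements whose product is zero: zero divisors
print(E11 @ E22)                               # the zero matrix

# Non-commutativity: E11 E21 = 0 but E21 E11 is not zero
print(E11 @ E21)                               # the zero matrix
print(E21 @ E11)                               # [[0, 0], [1, 0]]
print(np.array_equal(E11 @ E21, E21 @ E11))    # False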


An alert reader must have noticed our tacit assumption that if an inverse of an operator exists, then it must be unique. Uniqueness of inverses is a general fact, and one can verify it by a standard argument. Suppose that for a linear operator T on V, there are two operators S 1 and S 2 on V such that S 1 T = T S 1 = I and S 2 T = T S 2 = I, where I stands for the identity operator on V. Now, as S 1 = S 1 I = S 1 (T S 2 ), by associativity of composition of maps, S 1 = (S 1 T )S 2 = IS 2 = S 2 , proving our contention that if an inverse of T exists it must be unique.

In case a linear operator T on V has a set-theoretic inverse, it can be shown that this inverse is automatically a linear operator and so has to be the inverse of T in the ring End(V). Recall that a function f : X → Y has a set-theoretic inverse if and only if f is one–one and onto. In that case, for any y ∈ Y, there is a unique x ∈ X such that f (x) = y; the set-theoretic inverse of f , denoted by f −1 , is then the function from Y to X given by f −1 (y) = x. One easily verifies that the composite f −1 f = IX , the identity map on X, and f f −1 = IY , the identity map on Y; if X = Y, then of course f −1 f = f f −1 = IX . We are now ready to prove the following.

Proposition 4.3.13. Let T be a linear operator on a vector space V over a field F. Then T is invertible in EndF (V) if and only if T is one–one and onto.

Proof. Assume first that T : V → V is one–one and onto. Then the inverse function T −1 : V → V exists and T T −1 = T −1 T = IV , the identity map on V. So it suffices to show that T −1 is a linear map. Let v, w be arbitrary vectors in V and set v1 = T −1 v and w1 = T −1 w. Then, by definition, T v1 = v and T w1 = w. Since T is linear, for any a ∈ F, T (av1 + w1 ) = aT v1 + T w1 = av + w, and so applying the definition of T −1 again, we obtain T −1 (av + w) = av1 + w1 = aT −1 v + T −1 w, which proves that T −1 is linear.

Conversely, we assume that T is invertible and S is its inverse. If v ∈ ker T , then applying S to both sides of T v = 0, we see that S (T v) = S 0 = 0 as S is linear. As S T is the identity map on V, it follows that v = 0 and so ker T = {0}, showing that T is one–one. On the other hand, for any w ∈ V, if v = S w, then applying T to both sides, we obtain T v = T (S w) = (T S )w = IV w = w. This implies that T is onto. ∎

The proof also shows that the inverse of an invertible operator is a one–one and onto map. Recall that a linear operator on a finite-dimensional vector space is one–one if and only if it is onto. So we have the following useful corollary.

Corollary 4.3.14. A linear operator on a finite-dimensional vector space is invertible if and only if it is either one–one or onto.

EXAMPLE 13 Consider the linear operator T on R2 given by T (x, y) = (−x, −y). Since for any x, y ∈ R, −(−x) = x and −(−y) = y, it follows that T 2 (x, y) = T (T (x, y)) = T (−x, −y) = (x, y) for all (x, y) ∈ R2 . Thus T 2 acts as the identity operator on R2 . By definition, then, T −1 = T and so T is invertible. Geometrically, T reflects any point in R2 about the origin, so it is clear that T 2 (which is T followed by itself) brings any point back to itself.
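Example 13 can be mirrored with matrices: the reflection about the origin is represented, in the standard basis, by −I2 , and its square is the identity, so the operator is its own inverse. The following NumPy check is a small illustration added here; it is not part of the original text.

import numpy as np

T = -np.eye(2)                           # matrix of T(x, y) = (-x, -y) in the standard basis
print(T @ T)                             # the identity matrix
print(np.allclose(T @ T, np.eye(2)))     # True, so T is invertible with inverse T itself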


In general, any linear operator P on a vector space satisfying P2 = I (for instance, a reflection) is invertible, with P itself as its inverse.

EXAMPLE 14 The operator representing an anticlockwise rotation of R2 through an angle θ is clearly invertible, as the rotation through −θ (that is, the clockwise rotation through θ) is its inverse.

Nilpotent Operators

We next introduce nilpotent elements in the ring End(V) for any vector space V of dimension at least 2.

Definition 4.3.15. A linear operator T ∈ End(V) is said to be nilpotent if T k = z for some positive integer k. For a nilpotent operator T , the positive integer k is said to be the index of nilpotency if T k = z but T k−1 ≠ z.

Let V be a vector space of dimension n over a field F, where n > 1. Choose any basis v1 , v2 , . . . , vn of V. Consider the linear operator T on V defined by

T v j = v j+1   for j = 1, 2, . . . , n − 1,      T vn = 0.        (4.7)

It is clear that for the product or composite T 2 , we have

T 2 v j = v j+2   for j = 1, 2, . . . , n − 2,      T 2 v j = 0   for j = n − 1, n.

Similar calculations for higher powers of T show that whereas T k , for k ≤ n − 1, cannot be the zero map (as T k v1 ≠ 0), T n carries every basis vector to the zero vector, hence must be the zero map. Thus, we have found a nilpotent linear operator of index n. By modifying this example, nilpotent operators of index less than n can be produced easily.

EXERCISES

1. Determine whether the following assertions are true or false giving brief justifications. In the following, V is a vector space over an arbitrary field.
(a) The sum and the product of two linear operators on V are again linear operators on V.
(b) The vector space of all linear transformations of R3 to R2 is of dimension 5.
(c) If, for a linear operator T on Rn , T 2 is the zero map, then T itself is the zero map.
(d) If for a linear operator T on V, T 2 = I, then either T = I or T = −I.
(e) If T 1 , T 2 and T 3 are linear operators on V such that T 1 + T 2 = T 1 + T 3 , then T 2 = T 3 .
(f) If T 1 , T 2 and T 3 are linear operators on V such that T 1 T 2 = T 1 T 3 , then T 2 = T 3 .
(g) If T 1 , T 2 and T 3 are linear operators on V, then T 1 (T 2 + T 3 ) = T 1 T 2 + T 1 T 3 .
(h) For non-zero operators T 1 and T 2 on V, the product T 1 T 2 cannot be the zero operator.
(i) For a linear operator T on V, T 2 is the zero operator on V if and only if Im(T ) ⊂ ker T .

(j) If, for a non-zero linear operator T on V, T 2 = T , then T must be the identity operator on V.


(k) The dual of a finite-dimensional vector space V is isomorphic to V.
(l) The trace map on Mn (F) is a linear functional.
(m) An invertible operator on V cannot be nilpotent.
2. Complete the verification of the vector space axioms for HomF (V, W) in Theorem (4.3.4).
3. Prove Theorem (4.3.10).
4. Determine, in each of the following, whether the given map is a linear functional on the indicated vector space:
(a) T on Mm×n (F), T ([ai j ]) = a11 .
(b) T on Mn (F), T (A) = det A.
(c) T on Rn [x], T (g(x)) = g(0).
(d) T on C[a, b], T (g(x)) = ∫_a^b g(t) dt.

(e) T on R2 , T ((x1 , x2 )) = x1 + x2 . 5. Let T and S be invertible operators on a vector space V. (a) Prove that T −1 is invertible and (T −1 )−1 = T . (b) Prove that S T is invertible and (S T )−1 = T −1 S −1 . 6. Let V and W be vector spaces over a field F and T : V → W be a linear map which is one–one and onto. Prove that the set-theoretic inverse T −1 is an one–one, onto linear map from W to V. 7. Let T : R3 → R2 and S : R2 → R3 be the linear transformations given by T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ) and S (x1 , x2 ) = (x1 + x2 , x1 − x2 , x2 ). Give similar formulae for S T and T S . 8. Let Rθ and Rφ be the linear operators on R2 representing anticlockwise rotations of R2 through angles θ and φ, respectively. Prove, using matrices, that their composite or product is the rotation through the angle θ + φ. 9. Let R be the linear operator on R2 representing the rotation of R2 through the angle π/4. Prove that the linear operator R7 is the multiplicative inverse of R. 10. Let T : R3 → R2 and S : R2 → R3 be the maps given by formulas T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ) and S (x1 , x2 ) = (x1 − x2 , x1 + x2 , x1 ) respectively. Prove that T and S are linear maps. Describe the composites T S and S T by in terms of similar formulas. 11. Let T be the linear operator on R3 such that T e1 = e1 − e2 , T e2 = e1 + e2 + e3 and T e3 = e3 − e1 where e1 , e2 and e3 form the standard basis of R3 . Compute T (x1 , x2 , x3 ) for any (x1 , x2 , x3 ) in R3 . Also, find all the vectors (x1 , x2 , x3 ) ∈ R3 such that T (x1 , x2 , x3 ) = (0, 0, 0). Is T invertible? 12. Let T be the linear operator on the complex vector space C3 such that T e1 = (i, 0, −1), T e2 = (0, 1, 1) and T e3 = (1, 1 + i, 2)


where e1 , e2 and e3 form the standard basis of C3 . Compute T (x1 , x2 , x3 ) for any (x1 , x2 , x3 ) ∈ C3 . Also find all the vectors (x1 , x2 , x3 ) ∈ C3 such that T (x1 , x2 , x3 ) = (0, 0, 0). Is T invertible?
13. Let T 1 and T 2 be linear operators on R3 given by T 1 (x1 , x2 , x3 ) = (0, x1 , x2 ) and T 2 (x1 , x2 , x3 ) = (x3 , x2 , x1 ).
(a) Give similar formulae for the operators T 1 + T 2 , T 1 T 2 , T 2 T 1 , T 1 2 and T 2 2 .
(b) Determine which of the two operators is invertible, and give a formula for the inverse.
(c) Find non-zero linear operators S 1 and S 2 such that S 1 T 1 and T 1 S 2 are zero operators.
14. Let T be the linear operator on M2 (R) given by

T [ a  b ]   =   [ 0  a ]
  [ c  d ]       [ b  c ].

Compute the powers T 2 , T 3 and T 4 by giving their actions on an arbitrary matrix of M2 (R). Also, find a linear operator S on M2 (R) such that S T ≠ T S . (For the second part, you should be able to produce a specific A ∈ M2 (R) such that (S T )(A) ≠ (T S )(A) for whatever operator S you have found.)
15. Let T and S be linear operators on R2 [x] given by T (a0 + a1 x + a2 x2 ) = a0 + a1 (x + 1) + a2 (x + 1)2 and S (a0 + a1 x + a2 x2 ) = a1 + 2a2 x.

Give similar formulae for the operators T 2 , S 2 , S T and T S . Which of the two operators T and S is nilpotent?
16. Let T and S be linear operators on a vector space V. Prove that T S is nilpotent if and only if S T is nilpotent.
In Exercises 13 through 16, the required linear maps should be described in terms of their actions on some chosen bases of their domains.
17. Let V be an m-dimensional vector space over a field F. Prove that for any positive integer k, 1 ≤ k ≤ m, there is a nilpotent operator on V whose index of nilpotency is k.
18. Let V be an m-dimensional vector space over a field F. Prove that if m > 1, then there are non-zero linear operators T 1 and T 2 on V such that T 1 T 2 ≠ T 2 T 1 .
19. Determine linear operators T and S on R3 such that S T is the zero operator on R3 whereas T S is not.
20. Find two non-zero operators T 1 and T 2 on R2 such that T 1 T 2 is the zero operator on R2 . Can your example be generalized to an arbitrary vector space of dimension larger than 1?
21. Let T be the translation operator on the real vector space R[x] of all real polynomials defined by T (p(x)) = p(x + 1) for any p(x) ∈ R[x].

Is T invertible?
22. Let T be the linear operator on the vector space R[x] of all real polynomials defined by T ( f (x)) = x f (x) for any f (x) ∈ R[x].


Is T one–one? Is T onto? Also, if D is the usual differential operator on R[x], then prove that DT − T D is the identity operator on R[x]. 23. Let T be a linear operator on a finite-dimensional vector space V such that rank(T 2 ) = rank(T ). Prove that Im(T ) ∩ ker T = {0}. 24. Let V, W and U be vector spaces over a field F. If T ∈ HomF (V, W) and S ∈ HomF (W, U), then show that the composite S T is in HomF (V, U). 25. Let T ∈ HomF (V, W) and S ∈ HomF (W, U) be linear maps, where V, W and U are finitedimensional vector spaces over a field F. (a) Show that T (ker S T ) is a subspace of ker S . (b) Hence prove that dim ker(S T ) ≤ dim ker S + dim ker T. 26. Let T and S be linear operators on a finite-dimensional vector space V over a field F such that T 2 = S 2 is the zero operator and T S + S T is the identity operator on V. Prove that ker T = T ker S , ker S = S ker T and V = ker T ⊕ ker S .

4.4 ISOMORPHISM We have, quite often, come across instances of two vector spaces which we claimed are essentially the same. We can now make this vague idea precise by introducing the idea of isomorphism of vector spaces. Definition 4.4.1. Let V and W be vector spaces over the same field F. A linear map T : V → W is an isomorphism of V onto W if T is both one–one and onto. In that case, we say V and W are isomorphic as vector spaces. If V is isomorphic to W, then sometimes the notation V # W is used. The existence of an isomorphism between V and W means that every vector of W is associated with a unique vector of V in such a way that this association respects the vector space operations. In other words, the vectors of isomorphic spaces differ in names only; the isomorphism allows one to rename the vectors of one vector space as vectors of the other space in a way compatible with the respective operations of the two spaces. In that sense, two isomorphic vector spaces are the same. If T is an isomorphism of V onto W, then T has a set theoretic inverse T −1 (why?) from W to V. In an exercise in the last section, the reader was asked to show that T −1 is also one–one, onto linear map. So T −1 is an isomorphism of W onto V. Similarly, it is a routine exercise to show that if T : V → W and S : W → U are isomorphisms, then the composite S T is an isomorphism of V onto U. The last two assertions are the main ingredients of the proof of the following proposition which we leave to the reader.


Proposition 4.4.2. over a fixed field.

Isomorphism is an equivalence relation in the collection of all vector spaces

Here are some examples of isomorphic spaces. EXAMPLE 15 The set of complex numbers C, as a vector space over R, is isomorphic to R2 . The isomorphism is clearly the map given by T (a + ib) = (a, b). EXAMPLE 16 When we say that R2 is isomorphic to itself, we probably think of the identity map (the one that maps v to itself) as the isomorphism. However, there are infinitely many ways in which R2 can be conceived of as an isomorphic copy of itself. For example, as we have seen in the last section, every choice of reals a, b, c and d such that ad − bc ! 0, the map (x1 , x2 ) 6→ (ax1 + bx2 , cx1 + dx2 ) sets up an isomorphism of R2 with itself. Another one will be the rotation Rθ where θ ! 2nπ. This abundance of isomorphisms is not limited to R2 only. We will show presently that we have many choices for isomorphism for any arbitrary vector space. EXAMPLE 17 The vector space M2 (R) is isomorphic to R4 . The map which sends ' ( a b 6→ (a, b, c, d) c d is an isomorphism. It is equally easy to show that Mm×n (F) # Fmn for any field F. EXAMPLE 18 The map a0 + a1 x + a2 x2 6→ (a0 , a1 , a2 ) establishes an isomorphism from the vector space R2 [x] of all real polynomials of degree at most two with R3 . The general case of isomorphism between the real vector space Rn [x] of polynomials of degree at most n, and Rn+1 is left to the reader, as an exercise, to formulate as well as to prove. EXAMPLE 19 However, as the most important example, we show that any n-dimensional vector space V over a field F is isomorphic to Fn . To see this, choose a basis v1 , v2 , . . . , vn of V. If, for an arbitrary vector v ∈ V, v = a 1 v1 + a 2 v2 + · · · + a n vn


for scalars ai , then by setting

T v = (a1 , a2 , . . . , an )t ,

we get a well-defined map T : V → Fn (see discussion following Equation 3.4). Basic properties of addition and scalar multiplication in vector spaces show that T is a linear map. As the basis vectors are linearly independent, T is one–one. Therefore, T is onto as the dimensions of V and Fn are the same. Though this example shows that there is only one vector space of dimension n over F up to isomorphism, that does not mean that we should restrict ourselves to studying Fn only. For, the identification of V with Fn depends on our choice of the basis of V; every choice of a basis determines a rule for identifying vectors of V with n-tuples of Fn . Thus, there is no natural way of associating the vectors of a general n-dimensional space V with vectors of Fn . Thus, the isomorphism outlined in the example has a very limited use. However, for future reference, we state the following proposition. Proposition 4.4.3. Let V be an n-dimensional vector space over a field F. Fix a basis of V, so that every vector is assigned a unique coordinate vector in Fn . This assignment is a vector space isomorphism of V onto Fn . Note that in order to set up the isomorphism in the last example, we could have started by letting T vi = ei , where ei form the standard basis of Fn . Then, according to Proposition (4.2.4), T could have been extended to a linear map on V. In that case the following result would have directly shown that f is an isomorphism. Proposition 4.4.4. Let V be a finite-dimensional vector space, and let v1 , v2 , . . . , vn be any basis of V. Suppose that W is another vector space over the same field F. Then a linear map T : V → W is an isomorphism if and only if the images T v1 , T v2 , . . . , T vn form a basis of W. Proof. The proof depends on the following familiar fact: if v = a1 v1 + a2 v2 + · · · + an vn , then T v = a1 T v1 + a2 T v2 + · · · + an T vn . Therefore, if the images T vi span W, then T is onto; if they are linearly independent, then ker T must be zero, that is, T is one–one. Thus, the proposition is proved in one direction. A simple modification yields the proof in the other direction. ! The preceding proposition implies that the dimension of a vector space is a crucial number as far as isomorphism is concerned. Corollary 4.4.5. Two finite-dimensional vector spaces over a field are isomorphic as vector spaces if and only if they have the same dimension. Proof. If V and W are isomorphic then any basis of V is mapped by the isomorphism to a basis of W. So they have the same dimension. Conversely, assume that both V and W have dimension n, and choose bases v1 , v2 , . . . , vn and w1 , w2 , . . . , wn of V and W, respectively. Now, Proposition (4.2.4) ensures that there is a linear map T : V → W which maps vi to wi for each i. But then the preceding proposition implies that T is an isomorphism. !
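The coordinate isomorphism of Proposition 4.4.3 is easy to realize concretely; for instance, Example 17's identification of M2 (R) with R4 is just "flattening" a matrix into its list of entries. The following NumPy sketch, added here for illustration and not part of the original text, checks that this flattening is linear and invertible.

import numpy as np

# Coordinate map M2(R) -> R^4 relative to the basis of unit matrices
def coords(A):
    return A.reshape(-1)        # [[a, b], [c, d]] goes to (a, b, c, d)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Linearity: coords(2A + B) = 2 coords(A) + coords(B)
print(np.allclose(coords(2 * A + B), 2 * coords(A) + coords(B)))   # True

# Invertibility: reshaping the coordinate vector recovers the matrix
print(np.array_equal(coords(A).reshape(2, 2), A))                  # True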


The rest of this section is devoted to a brief discussions about results known as Isomorphism theorems. This portion can be left out by the reader at the first reading. We begin by looking at a very natural mapping from a vector space to any of its quotient spaces. Let V be a vector space over a field F, and W any subspace of V. Define η : V → V/W as follows ηv = v + W. The rules for addition and scalar multiplication of cosets show that η(v1 + v2 ) = ηv1 + ηv2 η(av) = aηv for all vectors v1 , v2 , v ∈ V and scalars a ∈ F. Hence, η is a linear map. It is clear that η is onto V/W. We claim that the kernel of η is W. To establish the claim, first recall that the zero vector of the quotient space V/W is the coset W itself. Hence, if v ∈ ker η, then ηv = W and so v + W = W by the definition of η. The equality of cosets then implies that v ∈ W, which, in turn, shows that ker η ⊂ W. To complete the proof of our claim, we have to verify the inclusion in the other direction which can be easily done in a similar manner. The map η is customarily described as the canonical homomorphism of V onto V/W. Thus, we have shown that the canonical homomorphism η : V → V/W is an onto linear map whose kernel is precisely W. The idea of a quotient space is crucial in establishing isomorphisms between vector spaces. The next theorem is an example of how it can be done. Theorem 4.4.6. Then,

Let V and W be vector spaces over a field F, and let T : V → W be a linear map. V/ker T # Im(T ).

In particular, if T : V → W is an onto linear map, then V/ker T # W. Proof. Put K = ker T , and define S : V/K → W as follows: S (v + K) = T v. Note that as S is defined in terms of a representatives of a coset, we have to be careful as to whether the definition of S is independent of the choice of representative of cosets. The process of checking this independence is known as verifying that S is well defined; in practice, we take two arbitrary representatives, say, v1 and v2 , of the same coset of K and show that T v1 = T v2 . But our choice of v1 and v2 implies that v1 − v2 ∈ K = ker T . Therefore, T (v1 − v2 ) = 0 in W so the linearity of T gives us the required equality. To complete the proof of the theorem, we need to show that (a) S is linear, (b) S is one–one and (c) S is onto Im(T ). But these depend on routine and by now familiar verifications and therefore left to the reader.

!
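For V = Fn and T given by a matrix A, the isomorphism V/ker T ≅ Im(T ) of Theorem 4.4.6, combined with the dimension of a quotient space, reduces to the familiar count dim ker T + dim Im(T ) = n. The small NumPy check below is an illustrative addition (not from the text) for one sample matrix.

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])        # second row is twice the first, so rank is 2

n = A.shape[1]
rank = np.linalg.matrix_rank(A)        # dim Im(T)
nullity = n - rank                     # dim ker(T)

# dim(V / ker T) = n - nullity should equal dim Im(T)
print(rank, nullity, n - nullity == rank)    # 2 1 True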


Observe that, in case of a finite-dimensional vector space V, the isomorphism of the theorem allows us to deduce the following formula: dim(V/ ker T ) = dim Im(T ). It follows, from the expression of the dimension of a quotient space (see Proposition 3.9.3), that dim V = dim ker T + dim Im(T ), which is the dimension formula of Theorem (4.2.7). The proofs of the next two Isomorphism theorems are omitted; however, we indicate the necessary steps needed for the proofs in Exercises 6 and 8. Theorem 4.4.7. Let T : V → W be an onto linear map with kernel K.Then there is a one–one correspondence between the subspaces of W and the subspaces of V containing K. Corollary 4.4.8. Let U be a subspace of V. Then every subspace of the quotient space V/U is of the form L/U for some subspace L of V containing U. Theorem 4.4.9.

Let W1 and W2 be subspaces of a vector space V. Then (W1 + W2 )/W1 # W2 /W1 ∩ W2 .

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. In the following, given vector spaces are over an arbitrary field unless otherwise mentioned. (a) Rm is not isomorphic to Rn if m ! n. (b) Two proper, distinct subspaces of a finite-dimensional vector space can never be isomorphic. (c) Every vector space is isomorphic to itself. (d) An infinite-dimensional vector space V cannot be isomorphic to a proper subspace of V. (e) Rn is isomorphic to a unique subspace of Rm if m > n. (f) There is a one–one correspondence between the subspaces of two isomorphic vector spaces. (g) Any quotient space of a vector space V is isomorphic to a subspace of V. (h) Every subspace of a vector space V is isomorphic to a quotient space of V. (i) Every pair of bases of two vector spaces over a field having the same dimension determines an isomorphism between them. (j) If, for a finite-dimensional vector space V, V = W1 ⊕ W2 = W1 ⊕ W3 for subspaces W1 , W2 and W3 , then W2 # W3 . (k) For any field F, Mm×n (F) # Fm+n . (l) The null space of a matrix A ∈ Mm×n (F) cannot be isomorphic to its column space.


2. Let V, W and U be vector spaces over a field F. (a) If T is a linear map of V onto W such that the inverse map T −1 : W → V exists, then show that T −1 is also linear. (b) If T : V → W and S : W → U are linear maps, then show that the composite map S ◦ T is a linear map from V to U. (c) Complete the proof of Proposition (4.4.2). 3. Give a detailed proof of Proposition (4.4.3). 4. Complete the proof of Proposition (4.4.4). 5. Carry out the verifications needed to complete the proof of Theorem (4.4.6). The following exercise provides a proof of Theorem (4.4.7). 6. Let T be a linear map from a vector space V onto another space W, both over the same field. For any subspace W1 of W, let V1 = {v ∈ V | T v ∈ W1 }.

Show that V1 is a subspace of V containing K, the kernel of T . Show further that W1 6→ V1 is an one–one map from the set of all subspaces of W onto the set of all subspaces of V containing K. 7. Prove Corollary (4.4.8). 8. Prove Theorem (4.4.9) by using Theorem (4.4.6) after carrying out the following steps: For any subspaces W1 and W2 , define T from the sum W1 + W2 to the quotient space W2 /(W1 ∩ W2 ) by T (w1 + w2 ) = w2 + (W1 ∩ W2 ).

Show that (a) T is well-defined, i.e., if w1 + w2 = w' 1 + w' 2 ∈ W1 + W2 , then T (w1 + w2 ) = T (w' 1 + w' 2 ).

(b) T is linear. (c) T is onto W2 /(W1 ∩ W2 ).

(d) Kernel of T is precisely W1 .

9. Let F be any field, and A a fixed matrix in Mn (F). Prove that the map T A : Fn → Fn defined by T A (x) = Ax is a vector space isomorphism if and only if A is invertible. 10. Let F be any field, and B a fixed invertible matrix in Mn (F). Prove that the map φ : Mn (F) → Mn (F) defined by φ(A) = B−1 AB is a vector space isomorphism. 11. Consider M2 (R) with the standard basis consisting of the four unit matrices. Use Exercise 10 and the invertible matrix ' ( 1 0 B= 0 −1 to produce another basis of M2 (R). 12. Is the map T : Mn (F) → Mn (F) given by T (A) = At an isomorphism?


13. Give examples of vector spaces over a field and linear maps T and S between them such that (i) T is one–one but not onto and (ii) S is onto but not one–one.

4.5 MATRICES OF LINEAR MAPS One of the reasons for the utility of linear maps between finite-dimensional vector spaces is that there is a simple way of representing them as matrices. This representation of linear maps as matrices is a very effective one, for any algebraic manipulation of linear maps corresponds to exactly the same manipulation of the matrices representing the maps. That explains, as the reader must have guessed by now, the similarity between the algebraic structures of Hom(V, W) and Mm×n (F). We now discuss how this representation works. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m, and let T : V → W be a linear map. Fix a pair of ordered bases B = {v1 , v2 , . . . , vn } and C = {w1 , w2 , . . . , wm } of V and W, respectively. The image T v j , being a vector in W, is a linear combination of the basis vectors wi . Thus, for each fixed j, 1 ≤ j ≤ n, we can find m unique scalars a1 j , a2 j , . . . , am j in F such that Tvj =

a1 j w1 + a2 j w2 + · · · + am j wm .        (4.8)

Once T v j is expressed in this manner for all j, we have a set of mn scalars ai j for 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Definition 4.5.1. The matrix of T , with respect to the bases B and C of V and W, respectively, is the matrix

m(T ) = [ai j ]

in Mm×n (F), whose jth column consists of precisely the scalars determined by T v j in Equation (4.8). Thus, this construction allows us, after fixing bases, to associate with any linear map T : V → W a unique matrix m(T ) in Mm×n (F). Some of the important features of the definition are pointed out in the following remarks: (a) It should be clear that different pair of bases will yield different matrix representations of the same linear map. So, it is reasonable to expect that this dependence on bases should be reflected in the notation m(T ) for the matrix of T . However, that will make the notation really clumsy. We prefer to keep our notation as simple as m(T ), but that means we have to keep in mind the bases used for a given matrix representation. (b) Even when we consider maps from V into itself, that is, with W = V in the definition, it is not necessary to take B = C. (See examples after these remarks.) (c) However, if for the linear operator T : V → V, we use the same basis B for both the domain V and the range V, then we will refer to the matrix m(T ) of T as the matrix with respect to the basis B. This will be the case in most of our important examples. We now present some examples.


EXAMPLE 20 Consider the identity map I = IV on V, where V is an n-dimensional vector space over a field F. Pick any basis B of V. Since I fixes any vector of V, in particular, the basis vectors of B, it follows that the entries of the jth column of m(I) will be all zeros, except the entry at the jth place, which is 1. Thus, the matrix of I with respect to any basis B will be the identity matrix In of order n in Mn (F). However, it is not hard to see that the matrix of I with respect to two distinct bases B and C of V need not be the identity matrix, for the jth column of the matrix now will consist of the scalars which are the coefficients in the linear combination expressing the jth vector of B in terms of the vectors in the basis C. EXAMPLE 21 Consider the zero map z from V to W, both vector spaces over a field F. Assume dim V = n and dim W = m. Since z takes every vector of V, and in particular the basis vectors of any basis of V, to the zero vector in W, it follows that all the entries of any of the n columns of m(z) must be zero. Thus, no matter which bases are chosen, m(z) will be the zero matrix in Mm×n (F). EXAMPLE 22 Let P1 : R2 → R2 be the linear operator given by P1 (x1 , x2 ) = (x1 , 0). Take the standard basis {e1 , e2 } of R2 where e1 = (1, 0),

e2 = (0, 1).

We express the images of the basis vectors under P1 as combination of the same basis vectors: P1 e1 = P1 (1, 0) = (1, 0) = 1.e1 + 0.e2 , P1 e2 = P1 (0, 1) = (0, 0) = 0.e1 + 0.e2 . Therefore, the matrix of P1 with respect to the standard basis of R2 is ' ( 1 0 . 0 0

To appreciate the importance of ordered basis in matrix representation, consider the matrix of P1 with respect to the basis {e2 , e1 } of R2 . We repeat that this is not the standard basis of R2 , though as a set it is the same as the standard basis. The reader should have no difficulty in showing that the matrix of P1 with respect to the new basis is ( ' 0 0 0 1 which is not the matrix we had found for P1 with the first ordered basis. EXAMPLE 23 We find the matrix of P1 with respect to another basis B of R2 , consisting of vectors v1 = (1, 1), v2 = (2, −1). To do that we have to determine scalars a, b, c and d such that P1 v1 = (1, 0) = av1 + bv2 = a(1, 1) + b(2, −1) = (a + 2b, a − b), P1 v2 = (2, 0) = cv1 + dv2 = c(1, 1) + d(2, −1) = (c + 2d, c − d).


The two equations we have to solve for a and b are a + 2b = 1 and a − b = 0, which give a = b = 1/3. Similarly, equating the components of both sides of the second equation, and solving them for c and d, we see that c = d = 2/3. Thus, the required matrix of P1 relative to basis B is given by ' ( 1/3 2/3 . 1/3 2/3 EXAMPLE 24 Let T : R2 → R3 be defined by T (x1 , x2 ) = (x1 + x2 , x2 , x2 ). To find the matrix of T with respect to the standard bases of R2 and R3 , respectively, note that T (1, 0) = (1, 0, 0) = 1.e1 + 0.e2 + 0.e3 T (0, 1) = (1, 1, 1) = 1.e1 + 1.e2 + 1.e3 . Note that to avoid confusion, we have used the symbols ei to mean the standard basis vectors of R3 whereas no such symbols are used for the basis vectors in R2 . Thus, the preceding relations show that the matrix of T with respect to the standard bases of R2 and R3 is a 3 × 2 one, given by   1 1 0 1.   0 1
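The scalars in Example 23 are the solutions of two small linear systems: the jth column of the matrix of P1 relative to B solves [v1 v2 ] x = P1 v j . So the computation can be automated; the NumPy sketch below is an illustrative addition (not part of the original text) that reproduces the matrix found above.

import numpy as np

P1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # matrix of P1(x1, x2) = (x1, 0) in the standard basis
B = np.array([[1.0, 2.0], [1.0, -1.0]])   # columns are v1 = (1, 1) and v2 = (2, -1)

# Column j of the new matrix solves B @ column = P1(v_j)
M = np.linalg.solve(B, P1 @ B)
print(M)                                  # [[1/3, 2/3], [1/3, 2/3]], as in Example 23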

EXAMPLE 25 Let T : R3 → R be the linear map given by T (x1 , x2 , x3 ) = x1 + x2 + x3 . The matrix of T , with respect to the standard basis {e1 , e2 , e3 } of R3 and the basis {1} of R, is clearly the following 1 × 3 matrix / 0 1 1 1. Note that any non-zero real a can form a basis of the vector space R over itself. Keeping the same basis for R3 but changing the basis of R to {a}, we see that the matrix of T with respect to the new pair of bases is [a−1

a−1

a−1 ].

EXAMPLE 26 Consider the differential map D on R3 [x], the real vector space of all real polynomials of degree at most 3. Thus, D(a0 + a1 x + a2 x2 + a3 x3 ) = a1 + 2a2 x + 3a3 x2 . Take the standard basis {1, x, x2 , x3 } of R3 [x]. To obtain the matrix of D with respect to this basis, we have to express D of each of these basis vectors as linear combinations the same basis vectors. The coefficients in these combinations will form the


columns of the required matrix. Since D(1) = 0, D(x) = 1, D(x2 ) = 2x, D(x3 ) = 3x2 , it follows that the required matrix is

[ 0  1  0  0 ]
[ 0  0  2  0 ]
[ 0  0  0  3 ]
[ 0  0  0  0 ]

However, with respect to the basis {1, 1 + x, 1 + x2 , 1 + x3 }, the matrix of the same differential map D will be the following one:

[ 0  1  −2  −3 ]
[ 0  0   2   0 ]
[ 0  0   0   3 ]
[ 0  0   0   0 ]

Going back to the general discussion now, we seek to relate the vectors appearing as images of a linear map T : V → W between two finite-dimensional vector spaces to its matrix representation. Choose B = {v1 , . . . , vn } and C = {w1 , . . . , wm } as bases of V and W, respectively, and let A = [ai j ] be the matrix in Mm×n (F) representing T with respect to these bases. Recall the idea of coordinates of vectors in finite-dimensional vector spaces. For any v ∈ V, if v = b1 v1 + · · · + bn vn is the expression of v in terms of the vectors of the basis B, then the coordinate vector of v with respect to basis B is the n × 1 column matrix (b1 , . . . , bn )t . Similarly, if T v = w, then writing w in terms of the basis vectors of C we get the coordinate vector (c1 , . . . , cm )t of w, where T v = w = c1 w1 + · · · + cm wm . There is a very natural and useful relation between the coordinate vector of v and the coordinate vector of T v = w through the matrix A of T . To find this relation, recall that as A = [ai j ] is the matrix of T with respect to the bases B and C, we have, from Equation (4.8),

T v j = a1 j w1 + a2 j w2 + · · · + am j wm .

It follows that

w = T v = T (b1 v1 + · · · + bn vn )
        = b1 T v1 + · · · + bn T vn                                              (as T is linear)
        = b1 (a11 w1 + · · · + am1 wm ) + · · · + bn (a1n w1 + · · · + amn wm )    (by the formula for T v j )
        = (a11 b1 + · · · + a1n bn )w1 + · · · + (am1 b1 + · · · + amn bn )wm      (by interchanging the sums).


Observe that changing the order of the two summations and then rearranging the terms to obtain the last equality are allowed as the sums are finite. Comparing this expression for w with the earlier one w = c1 w1 + · · · + cm wm , we conclude that

ci = ai1 b1 + ai2 b2 + · · · + ain bn    for i = 1, 2, . . . , m,

by the uniqueness of linear combinations of basis vectors. However, these m equations are equivalent to a single matrix equation

(c1 , c2 , . . . , cm )t = A (b1 , b2 , . . . , bn )t .

Thus, we have shown that the vector equation

T v = w

is equivalent to the matrix equation

Ax = y        (4.9)

where x and y are the coordinate vectors of v and w with respect to the given bases of V and W, respectively. Observe that this nice formula works only when the matrix representation of T and the coordinate vectors of v and w are computed with respect to the same bases of V and W. Ranks and Nullities of Matrices Representing Linear Maps It is time now to examine whether the rank and nullity of a linear map, as introduced in Definition (4.2.8) in Section 4.2, are related to more familiar numbers known as the rank and nullity of any matrix that represents it. By definition, the rank of a linear map T : V → W is the dimension of the subspace Im(T ) of W whereas the nullity of T is the dimension of ker T . Let A be the matrix of T relative to some fixed bases of V and W. Assume that the dimensions of V and W are n and m, respectively. Observe that choosing a basis for the n-dimensional vector space V over a field F means setting up an isomorphism of V onto Fn under which a vector v in V corresponds to its coordinate vector x in Fn (see Proposition 4.4.3). Similarly, W is isomorphic to Fm . Since T v = 0 if and only if Ax = 0, it follows that ker T and the nullspace of A are isomorphic under the same correspondence between vectors of V and their coordinate vectors in Fn . Thus, the dimensions of these two subspaces are equal showing that the nullity of the linear map T and the nullity of the matrix A are the same. Under the same correspondence, the image of T , that is, the subspace {w | T v = w for some v ∈ V} of W is isomorphic to the subspace {y | Ax = y for some x ∈ Fn } of Fm . But this subspace of Fm is the column space of A whose dimension is the rank of the matrix A. Thus, the ranks of T and A are the same. We record our observations as the following proposition. Proposition 4.5.2. Let V and W be finite-dimensional vector spaces over a field F. Let T : V → W be a linear map and let A be its matrix with respect to some fixed bases of V and W. Then, rank(T ) = rank(A) and nullity(T ) = nullity(A).
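Both Equation (4.9) and Proposition 4.5.2 are easy to confirm numerically once a matrix representation is written down. The sketch below, an illustrative addition using only standard NumPy, takes the sample map T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 ) from R3 to R2 with respect to the standard bases.

import numpy as np

# Columns are the coordinate vectors of T(e1), T(e2), T(e3)
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])

rank = np.linalg.matrix_rank(A)      # rank(T) = rank(A)
nullity = A.shape[1] - rank          # nullity(T) = nullity(A)
print(rank, nullity)                 # 2 1

# Equation (4.9): the coordinate vector of T v is A times the coordinate vector of v
v = np.array([1.0, 2.0, 3.0])
print(A @ v)                         # (3, 5), which is T(1, 2, 3)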


The relations proved in Proposition (4.5.2) allow us to settle certain questions about linear maps by looking at corresponding matrices. For example, let us try to see the implication of T being one–one. We keep the same notation. Now, T is one–one if and only if T v = 0 implies v = 0. By the equivalence given in Equation (4.9), this condition holds if and only if Ax = 0 implies

x = 0,

which is another way of stating that the matrix equation Ax = 0 has only the zero solution. We now specialize to the case when dim V = dim W = n so that the matrix A is now a square matrix of order n. Now, a square matrix A is invertible if and only if the equation Ax = 0 has only the zero solution. Our discussion in this case then implies that T : V → W is one–one if and only if the corresponding matrix A of T is invertible. But the fact that A is invertible also means that the equation Ax = y has a solution for any n-dimensional column vector y (W has dimension n). In other words, the equation T v = w for any w ∈ W has a solution v ∈ V, that is, T is onto. Note that the conclusion that T is onto under the hypothesis that T is one–one was arrived at earlier as Corollary (4.2.9) of the dimension formula. But we have reached the same conclusion by using matrix equation as the equivalence (4.9) is now available. Singular maps and Matrices Linear operators which are not invertible are also useful. We take this opportunity to have a very brief discussion about such maps. First a definition. Definition 4.5.3. Let T be a linear operator on a vector space V. We say T is singular if T is not invertible. Similarly, a square matrix A is said to be singular if A is not invertible. Now, let T be a linear operator on an n-dimensional vector space V over a field F, and A the matrix of T with respect to any fixed but arbitrary basis of V. We leave the proof of the following to the reader. Proposition 4.5.4. The following are equivalent: (a) (b) (c) (d) (e)

T is singular. ker T is non-zero. The matrix equation Ax = 0 has a non-zero solution. det A = 0. A is singular.

Linear Maps representing Matrices We now show that every matrix determines a linear map in exactly the same manner a linear map determines a matrix, so that the association of linear maps and matrices works in both ways. Lemma 4.5.5. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m. Let B ∈ Mm×n (F) be an arbitrary matrix. Fix any bases B = {v1 , . . . , vn } and C = {w1 , . . . , wm } for V and W, respectively. Then there is a unique linear map S : V → W such that the matrix of S with respect to these bases is precisely B.


Proof. If B = [bi j ], according to Proposition (4.2.4), there is a unique linear map S from V into W such that m 1 S vj = bi j wi for j = 1, 2, . . . , n. i=1

It is clear, by Definition (4.5.1), that the matrix of the linear map S with respect to the bases B and C of V and W, respectively, is precisely B. !

Let us discuss a couple of examples to illustrate the preceding lemma. ( ' 0 1 , and let V = W = R2 . Fix the standard basis {e1 , e2 } of R2 . Following EXAMPLE 27 Let A = 1 0 the procedure outlined in the lemma, we see that A determines a unique linear operator T : R2 → R2 (with respect to the chosen basis) such that T e1 = 0.e1 + 1.e2 = e2 and T e2 = 1.e1 + 0.e2 = e2 . Thus, for a typical vector (x1 , x2 ) in R2 , T (x1 , x2 ) = T (x1 e1 + x2 e2 ) = x1 e2 + x2 e1 = (x2 , x1 ). We can also use Equation (4.9) to determine the coordinate vector of T (x1 , x2 ), which is given by (2 3 2 3 2 3 ' 0 1 x1 x x = 2 . A 1 = 1 0 x2 x2 x1

Observe that x1 and x2 are the coordinates of the vector (x1 , x2 ) in R2 only with respect to the standard basis. It should be clear that in calculations like the preceding matrix calculation, the coordinate vectors must be interpreted in terms of the bases chosen. EXAMPLE 28 Consider the example of the same matrix A. This time keep the standard basis {e1 , e2 } for the domain R2 , but let {v1 , v2 } be the basis of the range space R2 , where v1 = (2, 0) and v2 = (1, 1). Let S be the linear operator on R2 determined by the same matrix A with respect to the new bases, so that S e1 = v2

and S e2 = v1 .

Note that if (x1 , x2 )t is the coordinate vector of v, then the coordinate vector of S v in W will still be the column vector (x2 , x1 )t , but the coordinates this time must be interpreted in terms of the new basis of W. Hence, the components of the coordinate vector (x2 , x1 )t in W = R2 are given by x2 v1 + x1 v2 = x2 (2, 0) + x1(1, 1) = (x1 + 2x2 , x1 ). We leave it the reader to verify directly that S (x1 , x2 ) is indeed (x1 + 2x2 , x1 ), by computing S (x1 e1 + x2 e2 ). ' ( 2 0 −1 EXAMPLE 29 Consider now the matrix A = . Being a 2 × 3 real matrix, A determines −1 1 1 a unique linear map from any three-dimensional real vector space V into a two-


dimensional real vector space W, once bases for V and W are fixed. Let us choose V = R2 [x] with standard basis {1, x, x2 } and W = R1 [x] with standard basis {1, x}. Recall that the coefficients of a polynomial in Rn [x] themselves are the coordinates of that polynomial with respect to the standard basis {1, x, · · · , xn }. Now     3 ' ( a0  2 a0  2 0 −1   2a0 − a2   a1  = . A a1  = −a0 + a1 + a2 −1 1 1     a2 a2

Since we are considering usual standard bases for both the polynomial spaces, it follows that the linear map determined by A is T : R2 [x] → R1 [x], where T (a0 + a1 x + a2 x2 ) = (2a0 − a2 ) + (−a0 + a1 + a2 )x.

Observe that the same matrix determines the linear map S : R3 → R2 with respect to the usual standard bases, where S is given by S (x1 , x2 , x3 ) = (2x1 − x3 , −x1 + x2 + x3 ). HomF (V, W) is Isomorphic to Mm×n (F) The preceding examples will help in getting a feeling for the correspondence described in the next theorem. Recall that a one–one correspondence between two sets is a one–one and onto map. We shall also require the fact (for the proof of the theorem) that a map is one–one and onto if and only if the inverse of the map exists. Theorem 4.5.6. Let V and W be finite-dimensional vector spaces over a field F with dimensions n and m, respectively. Choose any bases B={v1 , v2 , . . . , vn } and C={w1 , w2 , . . . , wm } of V and W, respectively. For any linear map T : V → W, let m(T ) be the matrix of T with respect to the bases B and C. Then, the map T 6→ m(T ) from HomF (V, W) to Mm×n (F) is a one–one correspondence which establishes a vector space isomorphism from HomF (V, W) onto Mm×n (F). Proof. By our remark preceding the statement of the theorem, to prove that the map T 6→ m(T ) is a one–one correspondence, we need to exhibit an inverse of this map. This inverse is provided by the map in Lemma (4.5.5) which assigns every matrix in Mm×n (F) a unique linear map from V into W with respect to a pair of fixed bases. Denote this map by m∗ . Now, if B ∈ Mm×n (F) determines S ∈ HomF (V, W), it was shown in that lemma that m(S ) is precisely B. In other words, m(m∗ (B)) = m(S ) = B. Moreover, the uniqueness of m∗ of a matrix as given by the same lemma shows that m∗ (m(T )) = T for any T ∈ HomF (V, W). Thus, m∗ is indeed the inverse of m. To complete the proof, thus, we need to show further that m preserves the vector space operations, that is, to show that m(T 1 + T 2) = m(T 1 ) + m(T 2) for any T 1 , T 2 ∈ Hom(V, W) and a ∈ F.

and m(aT 1 ) = am(T 1)


To verify the first equality, assume that m(T 1 ) = [ai j ] and m(T 2 ) = [bi j ] be the m × n matrices of T 1 and T 2 with respect to the bases B and C, respectively. According to Equation (4.8), one then has T1v j =

m 1

ai j wi

and T 2 v j =

i=1

It follows that

m 1

bi j wi .

i=1

(T 1 + T 2 )v j =

m 1

(ai j + bi j )wi ,

i=1

showing that the jth column of the matrix of T 1 + T 2 with respect to the given bases is the sum of the jth columns of m(T 1 ) and m(T 2 ). Since j is arbitrary, it follows that m(T 1 +T 2 ) is indeed m(T 1 ) +m(T 2). We leave a similar verification of the second equality to the reader. ! A consequence of the preceding theorem is that the vector space End(V) of all linear operators of an n-dimensional vector space over a field F is isomorphic to Mn (F) as vector spaces. More interestingly, the same isomorphism also preserves the product in these spaces, which we verify in the following corollary. That means, for example, that invertible matrices correspond to one–one, onto linear operators. Some other consequences are listed in the exercises. Corollary 4.5.7. Let V be an n-dimensional vector space over a field F. Let B = {v1 , v2 , . . . , vn } be a fixed basis of V. For any T ∈ EndF (V), let m(T ) be the matrix of T with respect to the basis B. Then, the map m : EndF (V) → Mn (F) is a vector space isomorphism which preserves the product also. This corollary is also stated as follows: Mn (F) and EndF (V) are isomorphic as F-algebras. Proof. That m : EndF (V) → Mn (F) is a vector space isomorphism follows from the preceding theorem. Thus, the only verification left is to check that m(T 1 T 2 ) = m(T 1 )m(T 2 ) for any T 1 , T 2 ∈ EndF (V). As in the proof of the theorem, if we let m(T 1 ) = A = [ai j ] and m(T 2 ) = B = [bi j ] be the matrices, of order n, of T 1 and T 2 , respectively, with respect to the basis B, then for each j, (1 ≤ j ≤ m), T1v j =

n 1

ai j vi

and T 2 v j =

i=1

Therefore,

n 1

b i j vi .

i=1

(T 1 T 2 )v j = T 1 (T 2 v j ) by definition of product,  n  1   = T 1  bk j vk  by the formula for T 2 , k=1

=

n 1

b k j T 1 vk

k=1 n 1

as T 1 is linear

 n  1   = bk j  aik vi  k=1  n i=1  n 1 1    = aik bk j  vi i=1

k=1

by the formula forT 1 ,


It follows that the (i, j)th entry of the matrix m(T 1 T 2 ), which is the coefficient of vi in the sum 4 of the last equality, is nk=1 aik bk j . But this sum is also the (i, j)th entry of the matrix product AB. We, therefore, conclude that the matrix of the product T 1 T 2 is the product AB of the matrices of T 1 and T 2 . ! We give a couple of applications of these results. Since isomorphic vector spaces have the same dimension, and since the unit matrices ei j for 1 ≤ i ≤ m, 1 ≤ j ≤ n form a basis of Mm×n (F), we have the following corollary. Corollary 4.5.8.

Let dim V = n and dim W = m. Then, dim HomF (V, W) = nm.

In particular, dim EndF (V) = n2 . Note that this corollary is Theorem (4.3.6) which we stated without proof in the preceding section. As another application of the isomorphism between matrices and linear maps, we show how to produce basis vectors of Hom(V, W), or End(V). Observe that basis vectors correspond to basis vectors under any isomorphism between vector spaces. Therefore, if we write down the linear maps corresponding to the unit matrices ei j , then Theorem (4.5.6) guarantees that these maps form a basis of Hom(V, W). So let us fix any two bases, say B = {v1 , v2 , . . . , vn } and C = {w1 , w2 , . . . , wm } of V and W, respectively. The linear maps we are looking for, say fi j , must be such that their matrices with respect to the bases B and C are precisely the ei j . Thus, we must define these maps in such a way that the kth column of the matrix of fi j will be the kth column of ei j .(Here, we are assuming that i, j are fixed but arbitrary, and k is a positive integer between 1 and n.) This gives us the clue as to how fi j acts on the kth basis vector of B. As the jth column of ei j consists of all zeros except for the entry at the ith row which is 1, and as the kth column, for k ! j, is the zero column, we must have fi j (vk ) = wi =0

if k = j if k ! j.

These mn linear maps fi j from V to W then form a basis of Hom(V, W). We leave to the reader the slight modification needed to obtain a basis of End(V). Next, we present two examples where linear maps will be used to deduce specific as well as general facts about matrices. The first example is about the existence of nilpotent matrices. Recall that a matrix A ∈ Mn (F) is said to be nilpotent if for some positive integer k, Ak is the zero matrix. The smallest k for which Ak is the zero matrix, but Ak−1 is not, is said to be the index of nilpotency of A. We ask the question: Can we find nilpotent matrices in Mn (F) of any index k ≤ n? (We will see later that the index cannot be greater than n.) It turns out that the nilpotent operators, constructed after Definition (4.4.3), can be used to derive nilpotent matrices. So, as in that example, we fix a basis of Fn , say, the standard basis {e1 , e2 , . . . , en }. Then, the linear operator T : Fn → Fn determined by the formulae T e j = e j+1 T en = 0

for j = 1, 2, . . . , n − 1


has the property that T n is the zero map whereas T n−1 is not. Therefore, by the isomorphism between Mn (F) and End(Fn ) as F-algebras, the matrix of T with respect to the standard basis must have the same property. In other words, it must be a nilpotent matrix of index n. This n × n matrix has a very special but simple form, and we denote it by Jn (0). Considering the action of T on the basis vectors, we see that

Jn (0) =
[ 0  0  0  · · ·  0  0 ]
[ 1  0  0  · · ·  0  0 ]
[ 0  1  0  · · ·  0  0 ]
[ ·  ·  ·          ·  · ]
[ 0  0  0  · · ·  0  0 ]
[ 0  0  0  · · ·  1  0 ]

Definition 4.5.9. Jn (0) is called the elementary Jordan block over F of order n with zero diagonal.
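A direct computation confirms that Jn (0), with 1s on the subdiagonal and zeros elsewhere, has index of nilpotency exactly n. The following NumPy sketch is an illustrative addition (not part of the original text) for the case n = 4, which is written out by hand below.

import numpy as np

def jordan_block_zero(n):
    """n x n matrix with 1s on the subdiagonal and zeros elsewhere."""
    return np.eye(n, k=-1)

J = jordan_block_zero(4)
powers = [np.linalg.matrix_power(J, k) for k in range(1, 5)]
print(np.any(powers[2]))    # True:  J^3 is not the zero matrix
print(np.any(powers[3]))    # False: J^4 is the zero matrix, so the index is 4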

In general, by the matrix Jn (a) ∈ Mn (F) for a scalar a ∈ F, we will mean an n × n matrix over F having all the diagonal entries equal to a, all the subdiagonal entries equal to 1 and having zeros elsewhere. Observe that, unless a = 0, the matrix Jn (a) is not nilpotent. To understand better the ideas presented just now, we write down the matrix Jn (0) and its powers in the case of n = 4:

J4 (0) =
[ 0  0  0  0 ]
[ 1  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  1  0 ]

J4 (0)^2 =
[ 0  0  0  0 ]
[ 0  0  0  0 ]
[ 1  0  0  0 ]
[ 0  1  0  0 ]

J4 (0)^3 =
[ 0  0  0  0 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]
[ 1  0  0  0 ]

J4 (0)^4 =
[ 0  0  0  0 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]

This example clearly suggests the way nilpotent matrices of different indices can be formed. The following result shows how difficult questions about matrices may be settled with ease by examining the corresponding linear maps. Lemma 4.5.10. Let A, B ∈ Mn (F), and In the identity matrix in Mn (F). The matrix In − AB is invertible if and only if In − BA is invertible. Proof. Consider the n-dimensional vector space Fn over F, and fix any basis, say the standard basis, of Fn . Once the basis is fixed, we have an isomorphism between Mn (F) and End(Fn ). The point to note is that invertible matrices in Mn (F) correspond to invertible operators in End(Fn) and vice-versa under this isomorphism. Let T and S be the linear operators on Fn corresponding to the matrices A and B. If I denotes the identity map on Fn , it follows that I − T S and I − S T will correspond to matrices I − AB and I − BA, respectively. Assume that I − AB is invertible, but I − BA is not. This assumption implies that the operator I − T S is invertible, but I − S T is not. Therefore, ker(I − T S ) is the zero subspace, whereas


there is some non-zero vector v ∈ Fn such that (I − S T )v = 0. It follows that S (T v) = Iv = v. Put w = T v, so we have S w = v. Observe that this relation implies that w ! 0 for, otherwise v = 0. Applying T to both sides of the relation we finally obtain T (S w) = T v = w which shows that w ∈ ker(I − T S ). This contradicts our assumption that I − T S is invertible as w is non-zero. This completes the proof. !
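Lemma 4.5.10 is also easy to test numerically: in fact det(In − AB) = det(In − BA) (Sylvester's determinant identity), which is stronger than the statement of the lemma, so the two matrices are invertible or singular together. The NumPy check below is an illustrative addition, not part of the original text.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
I = np.eye(3)

d1 = np.linalg.det(I - A @ B)
d2 = np.linalg.det(I - B @ A)
print(np.isclose(d1, d2))    # True: I - AB is invertible exactly when I - BA is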

We have already noted that different choices of bases produce different matrices of the same linear map. We now discuss the precise relation between two different matrix representations of a single linear map with respect to different choices of bases. For simplicity, we take up the special case of a linear operator on a vector space. This is the case one usually encounters in practice. For the general case of a linear map between two vector spaces, see the result at the end of this section. So let V be a finite-dimensional vector space over a field F, and T is a linear operator on V. We choose two bases B and B' of V. Let A and B be the matrices of T with respect to bases B and B' , respectively. Now, by Equation (4.9), the vector equation T v = w can be written in terms of A and B as follows: Ax = y

and

Bx' = y'

where x, x' are the coordinate vectors of v, and y, y' of w with respect to bases B and B' , respectively. Let P be the change of basis matrix from the basis B' to B. Then, as we have shown in the discussion about change of coordinates (see Theorem 3.4.14 in the last chapter) Px' = x

and

Py' = y.

Therefore, (P−1 AP)x' = (P−1 A)x = P−1 y = y' . We can compare this with Bx' = y' . Since x' can be chosen arbitrarily, this comparison yields the equality P−1 AP = B. Thus we have proved the following important result. Proposition 4.5.11. If A and B are matrices of a linear operator on a finite-dimensional vector space V with respect to bases B and B' , respectively, then B = P−1 AP, where P is the change of basis matrix from B' to B. This proposition gives rise to the important concept of similar matrices. Definition 4.5.12. Given two matrices A, B ∈ Mn (F), we say B is similar (or conjugate) to A over F, if there is an invertible matrix P ∈ Mn (F) such that B = P−1 AP. We leave it to the reader to verify that similarity is an equivalence relation in the set Mn (F). The similarity relation partitions Mn (F) into equivalence classes, called similarity classes of similar matri-


ces. Note that similarity depends on the base field. For example, two matrices of order n over R may not be similar, even though considered as matrices in Mn (C), they may be similar. The discussion preceding Definition (4.5.11) proves the following proposition. Proposition 4.5.13. Let V be an n-dimensional vector space over a field F, and let T be a linear operator on V. If A and B are matrices in Mn (F) representing T with respect to two different bases of V, then A and B are similar in Mn (F). Going the other way, we may ask the following question: Suppose, A and B are similar matrices in Mn (F). Given any n-dimensional vector space V over F, is it possible to find a linear operator T on V, and a pair of bases of V such that A and B are matrices of T with respect to these bases? We claim that it is indeed, possible and we now sketch a proof of our claim. Consider similar matrices A = [ai j ] and B = [bi j ] in Mn (F) as well as an invertible matrix P = [pi j ] such that B = P−1 AP. Choose any basis B = {v1 , v2 , . . . , vn } of V, and let T be the unique linear operator on V determined by the matrix A with respect to this basis. Thus, Tvj =

n 1

a i j vi

for each j = 1, 2, . . . , n.

i=1

We seek the other basis of V with the help of the invertible matrix P. Let S : V → V be the unique linear operator determined this time by P the usual way: S vj =

n 1

p i j vi

for each j = 1, 2, . . . , n.

i=1

Let S v j = u j . Since P is invertible, S is an isomorphism, and consequently the vectors {u1 , u2 , . . . , un }, being the images of basis vectors under S , will form a basis of V. We leave it to the reader to show that B is precisely the matrix of T with respect to this new basis of V, which establishes our claim. We record our claim as the following proposition. Proposition 4.5.14. Given two similar matrices in Mn (F) and any n-dimensional vector space V over F, there is a linear operator T on V and a pair of bases of V such that the given matrices are the matrix representations of T with respect to these bases. It is sometimes useful to consider similar linear operators, which can be defined analogously to similar matrices. Definition 4.5.15. Two linear operators T, S ∈ EndF (V) are said to be similar, if there is an invertible linear operator R ∈ EndF (V) such that S = R−1 T R. It should be clear that the matrices representing two similar operators on a finite-dimensional vector space with respect to any basis must be similar. As in Mn (F), similarity is an equivalence relation in EndF (V). For a simple criterion for the similarity of two linear operators, see Exercise 19 of this section. The relation between the matrices of a linear map between two vector spaces with respect to two pairs of bases is described in the following result. The proof, which is similar to the proof of Proposition (4.5.13), is also left to the reader.

Saikia-Linear Algebra

234

book1

February 25, 2014

0:8

Linear Maps and Matrices

Proposition 4.5.16. Let V and W be vector spaces over a field F, having dimensions n and m, respectively. Let T ∈ HomF (V, W), and let A ∈ Mm×n (F) be the matrix of T with respect to bases B and C of V and W, respectively. Given another pair of bases B' and C' of V and W, respectively, if A' is the matrix of T with respect to the bases B' and C' , then there exist invertible matrices P ∈ Mn (F) and Q ∈ Mm (F) such that A' = Q−1 AP. In fact, P and Q are the matrices of change of bases from B' to B and C' to C, respectively. We conclude this section by presenting an example of two matrices which can be shown to be similar by actually representing them as matrices of a single linear operator with respect to two different bases. EXAMPLE 30 Consider matrices  0  A = 1  0

0 0 1

 0  1 and  1

 1  B = 1  0

1 0 0

 1  −1  0

in M3 (R). Our aim is to find an operator on the three-dimensional vector space R3 such that the given matrices are its representations with respect to some suitable bases. Usually, we take one of the bases to be the standard basis e1 , e2 , e3 , and the operator T to be the unique one determined by one of the matrices, say A, with respect to this basis. Thus, we have T e1 = e2 , T e2 = e3 and T e3 = e2 + e3 . Next, we try to define another basis, whose vectors are suitable linear combinations of the vectors of the first basis, such that the matrix of T with respect to this new basis is B. The entries of B suggest that we try the vectors given by u1 = e2 + e3 u2 = e3 u3 = e1 . We leave to the reader the verification that the matrix of T is indeed, B with respect to the basis formed by u1 , u2 and u3 . This then confirms that A and B are similar. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given vector spaces are finite-dimensional and over an arbitrary field. (a) Any m × n matrix over a field F determines a unique linear map from Fm to Fn with respect to their standard bases. (b) If A and B are, respectively, the matrices of linear maps T and S of a vector space V to another space W (both over the same field) with respect to some fixed bases of V and W, then A + B is the matrix of T + S with respect to the same bases.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Matrices of Linear Maps

235

(c) HomF (V, W) # HomF (W, V) for any vector spaces V and W over the field F. (d) If T is an invertible linear operator on a vector space V, then its matrix with respect to any basis of V is an invertible matrix. (e) If a matrix in Mn (F) determines two linear operators T and S on Fn with respect to two bases, then T = S . (f) If an invertible matrix A is similar to another matrix B in Mn (F), then B is invertible. (g) If a matrix A is similar to B in Mn (F), then A2 is similar to B2 in Mn (F). (h) For any positive integer n > 1, there are similar matrices in Mn (F) having different ranks. (i) If A is the matrix of an invertible linear operator on Fn with respect to any fixed basis, then the columns of A form a basis of Fn . (j) There is some basis of the real vector space C of complex numbers with respect to which the matrix of the operator T given by T (z) = z is singular. (k) The nullity of the elementary Jordan block Jn (0) for n ≥ 1 is 1.

(l) The nullity of any non-zero nilpotent matrix in Mn (F) is 1. 2. Let T be the linear transformation from R3 → R2 given by

T (x1 , x2 , x3 ) = (x1 − 2x2, x2 − 2x3). Find the matrix of T with respect to the standard bases of R3 and R2 . What will be the matrix of T if the basis of R2 is changed to {v1 , v2 }, where v1 = (1, −2) and v2 = (0, 1)? 3. Compute the matrix of the linear operator T on R4 given by T (x1 , x2 , x3 , x4 ) = (x1 , 2x1 + x2 , 3x1 + 2x2 + x3 , 4x1 + 3x2 + 2x3 + x4 ) with respect to the standard basis of R4 . Determine the invertibility of T by considering the matrix of T . 4. Let T be the linear map from R2 [x] to R3 [x] given by T (p(x)) = (x + 1)p(x) for any

p(x) ∈ R2 [x].

Find the matrix of T with respect to the bases {1, x, x2 } and {1, x, x2 , x3 } of R2 [x] and R3 [x], respectively. 5. Consider bases B = {1, x, x2 } and B' = {1, 1 + x, 1 + x2 } of R2 [x].

(a) Find the matrix of the translation operator T on R2 [x] given by T (p(x)) = p(x + 1) for any p(x) ∈ R2 [x] with respect to the bases B and B' .

(b) Let B'' = {1} be a basis of the real vector space R. Find the matrix of the linear map S : R2 [x] → R given by S (p(x)) = p(0) with respect to the bases B and B'' . 6. Let T be the linear operator on M2 (R) given by T (A) = At for any A ∈ M2 (R). Find the matrix of T with respect to the basis of M2 (R) consisting of the unit matrices e11 , e12 , e21 and e22 , and deduce that T is an invertible operator. 7. Compute the ranks and the nullities of the following linear operators by considering their matrices with respect to the standard basis: (a) T on R3 ; T (x1 , x2 , x3 ) = (x1 − x2 + 2x3 , 3x2 − x3 , 3x1 + 5x3 ). (b) T on R4 ; T (x1 , x2 , x3 , x4 ) = (x2 + x3 − x4 , 2x1 − x3 + x4 , x1 + x2 − 2x4, x1 − 2x2 − 3x3 ).

Saikia-Linear Algebra

236

book1

February 25, 2014

0:8

Linear Maps and Matrices

8. Prove Proposition (4.5.4). 9. Complete the proof of Theorem (4.5.6). The following establishes the dimension of HomF (V, W) directly. 10. Let {v1 , v2 , . . . , vn } and {w1 , w2 , . . . , wm } be two bases of vector spaces V and W over a field F respectively. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, let fi j be the linear map from V to W given by fi j (vk ) = δ jk wi

for k = 1, 2, . . . , n,

where δ jk is the Kronecker delta. (a) Show that { fi j } are mn linearly independent elements of HomF (V, W). (b) Show that { fi j } span HomF (V, W).

(Hint: For (b), given any f ∈ HomF (V, W), first write f (v j ) as a linear combination of w1 , w2 , . . . , wm .)

11. Complete the proof of Proposition (4.5.14). 12. Prove Proposition (4.5.16). 13. Show that the following matrices are similar over C: ' ( ' iθ cos θ sin θ e A= and B = − sin θ cos θ 0

0 e−iθ

(

.

14. Let V be an n-dimensional vector space over a field F where n ≥ 2, and let T be a nilpotent operator of index of nilpotency n. Exhibit a basis of V with respect to which the matrix of T is precisely Jn (0), the elementary Jordan block of order n over F. 15. Let F be a field, and let A and B be nilpotent matrices in Mn (F) such that both have index of nilpotency n. Use the preceding exercise to show that A and B are similar over F. 16. Let     1 1 1 3 0 0     A = 1 1 1 and B = 0 0 0     1 1 1 0 0 0 be matrices in M3 (R). Prove that they are similar over R by showing that if T is the linear operator on R3 represented by A with respect to the standard basis of R3 , then there is a basis of R3 relative to which the matrix of T is B. 17. Prove that over any field F, the elementary Jordan block Jn (0) is similar to its transpose Jn (0)t for any positive integer n. 18. Let A and B be similar matrices in Mn (F). Determine whether A and B have (a) The same rank, (b) The same nullity, (c) The same trace, (d) The same determinant. 19. Let T and S be linear operators on a finite-dimensional vector space V. If the matrix of T with respect to some basis of V is the same as the matrix of S with respect to another basis of V, then show that T and S are similar. [Hint: If {v1 , v2 , . . . , vn } and {u1 , u2 , . . . , un } are the bases, consider the linear operator R on V defined by Rv j = u j .]

Saikia-Linear Algebra

5

book1

February 25, 2014

0:8

Linear Operators

5.1 INTRODUCTION The advantage of representing a linear operator on a finite-dimensional vector space by a matrix lies in the freedom to choose suitable bases of the vector space. An appropriate basis will result in a relatively simple matrix of the linear operator which will enable us to understand the operator better. Ideally, one would like such a matrix to be as simple as a diagonal one, such as:  λ1  0   0  . diag[λ1 , λ2 , . . . , λn ] =   .  .   0  0

0 λ2 0 . . . 0 0

0 0 λ3 . . . 0 0

... ... ... ... ... ··· ··· ···

0 0 0 . . . λn−1 0

 0   0   0  .  . .   .   0  λn

If a linear operator T on an n-dimensional vector space V can be represented by such a diagonal matrix, then just by counting the number of non-zero entries along the diagonal, one would know the rank as well as the nullity of T ; in fact, determining the bases for the image and the kernel of T will be equally easy. Also note that if D is a diagonal matrix, then solving the system of equations Dx = 0 or the system Dx = b is trivial. Now, for T to be represented by a diagonal matrix like the preceding one, there must be a basis v1 , v2 , . . . , vn of V, such that T v j = λ jv j

for

j = 1, 2, . . . , n.

Non-zero vectors, such as v j , which T changes into scalar multiples of themselves are crucial in understanding T , and so are given a special name: they are eigenvectors of T corresponding to the eigenvalue λ j . Thus, the ideal situation will be the one in which T has enough eigenvectors to form a basis of V; we will then say that T is diagonalizable. So resolving the diagonalization problem for a linear operator depends on finding its eigenvectors. However, it turns out that it is far easier to find the eigenvalues as there is a systematic procedure for determining them. Once eigenvalues are found, simple matrix equations lead to the corresponding eigenvectors.

237

Saikia-Linear Algebra

238

book1

February 25, 2014

0:8

Linear Operators

The ideas of eigenvalues and eigenvectors of a linear operator are intimately related to certain polynomials determined by the operator. A study of these polynomials helps us not only in developing alternate ways of looking at diagonalizable operators but also in analysing non-diagonalizable operators later. This chapter thus explores several key concepts of linear algebra. However, the focus will be on diagonalizable operators throughout. We begin though with a brief discussion of polynomials with coefficients from a field, such as the field of real numbers or complex numbers, as polynomials will play a crucial role in the material that follows.

5.2 POLYNOMIALS OVER FIELDS This section is a brief review of the nomenclature, notation and results about polynomials that we will be needing in this as well as in later chapters. Most of the results are without proofs. Readers looking for more comprehensive treatment of the material should look up standard textbooks in algebra such as Topics in Algebra [3] by I. N. Herstein. A polynomial f (x) with coefficients from a field F is an expression of the form f (x) = a0 + a1 x + · · · + an xn , where n is a non-negative integer, and the field elements a j are the coefficients of f (x). By a nonzero polynomial, we mean a polynomial having at least one non-zero coefficient. The degree of a non-zero polynomial f (x) is the largest exponent of x with corresponding coefficient non-zero; the zero polynomial is assigned the degree −1 as a convention. We denote the zero polynomial by 0; it is sometimes convenient to think of the zero polynomial as one of indeterminate degree, having zeros for all of its coefficients. Polynomials of degree zero are called constants or scalars. The leading coefficient of a non-zero f (x) of degree n is the coefficient an ; a monic polynomial is a non-zero polynomial with leading coefficient 1. The set of all polynomials with coefficients from a field F is denoted by F[x]. Thus, R[x] is the set of all real polynomials which we have already treated as an example of a vector space over R. Two polynomials in F[x] are equal if they have the same degree and their corresponding coefficients are equal. Polynomials f (x) and g(x) in F[x] of degree m and n, respectively, can be added naturally by adding the corresponding coefficients to produce the sum polynomial f (x) + g(x); any coefficient missing in one of the polynomials corresponding to a non-zero coefficient of the other polynomial is assumed to be zero for this purpose. So the degree of the sum f (x) + g(x) is max(m, n). Scalar multiplication of a polynomial in F[x] by an element of F is straightforward: if f (x) = a0 + a1 x + · · · + am xm , then for any c ∈ F, the scalar multiple c f (x) is the polynomial of same degree obtained by multiplying each coefficient of f (x) by c. So, c f (x) = ca0 + ca1 x + · · · + cam xm . EXAMPLE 1

If f (x) = 1 + 2x and g(x) = −3x + 4x2 + x3 are two real polynomials, then f (x) + g(x) = 1 − x + 4x2 + x3 is a polynomial of degree 3 in R[x]. Similarly, for f (x) = 2 + ix + (1 + i)x2 and g(x) =

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Polynomials Over Fields

239

−i+ x2 +3x4 in C[x], the sum f (x)+g(x), a polynomial of degree 4 in C[x], is given by f (x) + g(x) = (2 − i) + ix + (2 + i)x2 + 3x4 . EXAMPLE 2

The scalar multiple of f (x) = 2 − 4x + 6x3 in R[x] by c = 1/2 is the polynomial 1/2 f (x) = 1 − 2x + 3x3. For f (x) = 1 − ix + (1 + i)x2 in C[x], the scalar multiple i f (x) is i + x + (−1 + i)x2.

EXAMPLE 3

Given polynomials f (x) = a0 + a1 x + · · · + am xm and g(x) = b0 + b1 x + · · · + bn xn with m ≥ n, the linear combination c f (x) + dg(x) equals the zero polynomial for some scalars c and d if and only if the following equalities hold in F: ca j + db j = 0 for j = 1, 2, . . . , n and ca j = 0 for j ≥ n. It follows that given the polynomials f1 (x) = 1, f2 (x) = x, f3 (x) = x2 , . . . , fn+1 (x) = xn , the linear combination c1 f1 (x)+c2 f2 (x)+· · ·+cn+1 fn+1 (x) = 0 if and only if ci = 0 for all i.

Since the addition and scalar multiplication of polynomials in F[x] are basically in terms of the corresponding operations in the field F, it is a routine matter to verify that the set of polynomials F[x] is not only an additive group with respect to addition of polynomials but also a vector space over F. Proposition 5.2.1. The set F[x] of polynomials with coefficients from a field F is an additive group with respect to addition of polynomials with the zero polynomial acting as the additive identity. If scalar multiplication of polynomials by scalars from F is also taken into account, the F[x] becomes a vector space over F. F[x] is an infinite-dimensional vector space. The subset Fn [x] of all polynomials over F of degree at most n is an (n + 1)-dimensional subspace of F[x] with {1, x, x2 , . . . , xn } as its standard basis. Now, we want to focus on the multiplicative structure of F[x]. Unlike addition or scalar multiplication, which are defined component-wise, multiplication of polynomials is performed in the following manner. If f (x) = a0 + a1 x + · · · + am xm and g(x) = b0 + b1 x + · · · + bn xn are two polynomials in F[x], then their product f (x)g(x) is defined to be the polynomial c0 + c1 x + · · · + cm+n xm+n , where 1 ai b j for all k = 0, 1, . . . , m + n ck = i+ j=k

the sum being taken over all 0 ≤ i ≤ m and 0 ≤ j ≤ n. Note that f (x)g(x) = g(x) f (x). It is also clear that if the degrees of f (x) and g(x) are m and n, respectively, then f (x)g(x) is of degree m + n. It is again a routine verification that the additive group F[x] with this multiplication is a commutative ring with the constant polynomial 1 as the identity of the ring. In fact, if scalar multiplication is taken into account, then F[x], like the matrix algebra Mn (F), turns out to be an F-algebra with identity. Divisibility Properties of Polynomials over a Field However, it is the divisibility properties of F[x], akin to those of the ring Z of integers, that will play a crucial role for us. As usual, given polynomials f (x), g(x) ∈ F[x] with g(x) non-zero, we say that g(x) divides f (x), if there is some polynomial h(x) ∈ F[x] such that f (x) = g(x)h(x). In that case, g(x) is a

Saikia-Linear Algebra

240

book1

February 25, 2014

0:8

Linear Operators

divisor of f (x), or f (x) is a multiple of g(x). All the familiar properties of the division process in the integers carry over to F[x] as detailed in the following proposition. Proposition 5.2.2. The following hold for polynomials in F[x]. (a) (b) (c) (d) (e) (f)

f (x) divides itself. If f (x) divides g(x), and g(x) divides k(x), then f (x) divides k(x). If f (x) divides g(x) and h(x), then f (x) divides g(x) + h(x). If f (x) divides g(x), then f (x) divides any multiple g(x)h(x). Every non-zero constant divides any polynomial in F[x]. The non-zero constants are the only invertible elements in F[x].

As in Z, F[x] has a division algorithm which essentially says that if a non-zero polynomial g(x) is not a divisor of f (x), then we can employ division by g(x) to obtain a remainder of degree less than that of g(x). Proposition 5.2.3. Given polynomials f (x), g(x) ∈ F[x] such that g(x) is non-zero, there are polynomials q(x), r(x) such that f (x) = g(x)q(x) + r(x), where either r(x) = 0

or

deg r(x) < deg g(x).

Recall now that an ideal I of a commutative ring R is an additive subgroup of R which is closed with respect to multiplication by elements of R. For example, the multiples mZ of an integer m are ideal of the ring Z. The division algorithm in F[x] implies that any polynomial in a non-zero ideal of the ring F[x] is a multiple of a fixed polynomial called a generator of that ideal; in fact, the generator can be chosen to be a monic polynomial. Proposition 5.2.4. Every ideal of F[x] has a generator. In case of a non-zero ideal I of F[x], there is a monic polynomial of positive degree which generates I. Thus, for any non-zero ideal I of F[x], we can find a monic polynomial d(x) such that every polynomial in I can be expressed as a product d(x)q(x) for some polynomial q(x); so, I = {d(x)q(x) | q(x) ∈ F[x]}. Also, by an argument using degrees of polynomials, one can easily see that the product of two non-zero polynomials in F[x] is non-zero. In other words, F[x] is an integral domain. As every ideal of this integral domain F[x] is generated by a single element, it is called a principal ideal domain, or a PID. PIDs possess important divisibility properties such as existence of greatest common divisors and unique factorizations. We briefly consider the implications of these concepts for F[x] now. We begin by noting that given polynomials f1 (x), f2 (x), . . . , fn (x) in F[x], not all of which are zero polynomials, the collection I of all possible linear combinations: f1 (x)h1 (x) + f2 (x)h2(x) + · · · + fn (x)hn(x),

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Polynomials Over Fields

241

where h1 (x), h2 (x), . . . , hn (x) are arbitrary polynomials in F[x], is an ideal of F[x]; it is called the ideal generated by the polynomials f1 (x), f2 (x), . . . , fn (x). It is clear that each fi (x) ∈ I. Since F[x] is a PID, it follows that I is generated by a single monic polynomial, say d(x), in I. In other words, every polynomial in I is a multiple of d(x). One can therefore draw the following conclusions • d(x) = f1 (x)q1 (x) + f2 (x)q2 (x) + · · · + fn (x)qn (x) for some polynomials q1 (x), q2 (x), . . . , qn (x) in F[x]. • d(x) divides every fi (x). • If f (x) divides every fi (x), that is, if f (x) is a common divisor of the fi (x), then f (x) must divide d(x). We have just shown that the monic polynomial d(x) is the greatest common divisor of f1 (x), f2 (x), . . . , fn (x). It is commonplace to use the term gcd for a greatest common divisor. It is clear that any non-zero scalar multiple of d(x) will have the same divisibility properties with respect to polynomials f1 (x), f2 (x), . . . , fn (x); however, d(x) is the unique monic polynomial with these properties. For future reference, we record our observations in the following. Proposition 5.2.5. Let f1 (x), f2 (x), . . . , fn (x) be polynomials in F[x], not all of which are the zero polynomials. Then, their greatest common divisor exists; it is the unique monic polynomial which also generates the ideal generated by the polynomials f1 (x), f2 (x), . . . , fn (x). Thus, d(x) = f1 (x)q1 (x) + f2(x)q2 (x) + · · · + fn (x)qn (x) for some polynomials q1 (x), q2 (x), . . . , qn (x) in F[x]. An important case occurs when non-zero polynomials f1 (x), f2 (x), . . . , fn (x) have no common divisors other than the constants. In that case we say that the polynomials are relatively prime. It is clear that the gcd of relatively prime polynomials is the constant 1. Corollary 5.2.6. If non-zero polynomials f1 (x), f2 (x), . . . , fn (x) in F[x] are relatively prime, then there exist polynomials q1 (x), q2 (x), . . . , qn (x) in F[x] such that f1 (x)q1 (x) + f2(x)q2 (x) + · · · + fn (x)qn (x) = 1. Irreducible Polynomials Finite sets of what are known as irreducible polynomials over F provide us with examples of relatively prime polynomials. An irreducible polynomial over F is a non-zero, non-constant polynomial in F[x] whose only divisors are the non-zero constants or the scalar multiples of itself. So except for non-zero constants, an irreducible polynomial can have no divisor of degree less than its own degree. A polynomial which is not irreducible is a reducible polynomial. EXAMPLE 4

Any linear polynomial ax + b where a, b ∈ F with a ! 0 is an irreducible polynomial over F.

EXAMPLE 5

The polynomial x2 + 1 is reducible over C as it can be factored as (x + i)(x − i) over C which shows that it has divisors of degree 1.

Saikia-Linear Algebra

242

book1

February 25, 2014

0:8

Linear Operators

Quite often, one uses the factor theorem to identify divisors of degree 1 of given polynomials. Before stating this result, we note that a polynomial f (x) over a field F can be considered as a mapping from F into F. For this interpretation, the value of the polynomial f (x) = a0 + a1 x + · · · + am xm at a scalar c ∈ F is defined naturally as f (c) = a0 + a1 c + · · · + am cm , obtained by substituting the indeterminate x by c in the expression for f (x). Note that one can draw the graph of a real polynomial as it can be considered as a map from R to R. We say that if a scalar c is a root of f (x) in F if the scalar f (c) = 0. Thus, c is a root of f (x) if c is a solution of the functional equation f (x) = 0. Now, we can state the factor theorem which is an easy consequence of the division algorithm. Corollary 5.2.7. For a polynomial f (x) over F, a scalar c ∈ F is a root of f (x) in F if and only if (x − c) is a divisor of f (x) in F[x]. It follows that if f (x) has a root in F and the degree of f (x) > 1, then f (x) is reducible over F. We also say that c ∈ F is a root of f (x) of multiplicity r, if (x − c)r divides f (x) but (x − c)r+1 does not. By induction, we can also establish that a polynomial of degree n over a field F can have at most n roots in F even if the roots are counted according to their multiplicities. EXAMPLE 6

The polynomial x2 + 1 is irreducible over R. If not, it will have a root, say c ∈ R, by the factor theorem. But that is absurd as, for a real c, c2 ! −1. Similarly, x2 + x + 1 is irreducible over R, as by the quadratic formula, the two roots of x2 + x + 1 are non real complex numbers.

EXAMPLE 7

Any polynomial f (x) of odd degree over R must have a root in R so such a polynomial of degree > 1 cannot irreducible over R. The existence of a real root of such a polynomial can be verified by considering the graph of y = f (x).

The case for polynomials over C is rather simple because of the celebrated Fundamental Theorem of Algebra. Theorem 5.2.8. Every non-constant polynomial over C has a root in C. This implies the following characterization of irreducible polynomials over C. Corollary 5.2.9.

The only irreducible polynomials over C are the linear ones.

In general, a field F is said to be algebraically closed if every non-constant polynomial over F has a root in F, or equivalently, the irreducible polynomials over F are precisely the linear polynomials. A basic result (which we quote without proof) states that any field can be considered a subfield of an algebraically closed field. For example, the field R of real numbers is a subfield of the algebraically closed field C.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

243

To put the preceding theorem in perspective, we now state the Unique Factorization Theorem for the PID F[x] which can be proved exactly the way it is proved that any positive integer greater than 1 can be expressed as a product of primes. Theorem 5.2.10. Let F be a field. Any non-constant (monic) polynomial in F[x] can be factored as a product of (monic) irreducible polynomials over F. Such a factorization is unique except for the order of the factors. Thus, given a non-constant polynomial f (x) over F, we can express it as a product f (x) = p1 (x)r1 p2 (x)r2 · · · pt (x)rt of distinct irreducible polynomials pi (x) over F. Note that the irreducible polynomials pi (x) are relatively prime in pairs. In particular, a non-constant monic polynomial f (x) over an algebraically closed field F such as C, can be uniquely expressed as a product of linear factors: f (x) = (x − a1)r1 (x − a2)r2 · · · (x − at )rt , where ai ∈ F. Observe that each ai in F is a root of f (x) occurring ri times; ri is called the multiplicity of the root ai . It follows that a polynomial of degree n over an algebraically closed field F has n roots in F if the roots are counted according to their multiplicities. Note the following implication: since a polynomial f (x) over R can be considered a polynomial over C, even if f (x) cannot be expressed as a product of linear factors over R, it can be expressed so over C. Therefore, a polynomial of degree n over R has precisely n roots in C if the roots are counted according to their multiplicities. We end the section by pointing out another interesting property of real polynomials. If a real polynomial f (x) has a non-real complex root a (as f (x) can be considered a polynomial over C, too), conjugating the relation f (a) = 0, we see that the conjugate a must also be a root of f (x). It is then clear that the real polynomial x2 − (a + a)x + |a|2 is a divisor of f (x). One can thus conclude that, apart from the linear ones, the only other irreducible polynomials over R must be of degree 2.

5.3 CHARACTERISTIC POLYNOMIALS AND EIGENVALUES We begin by introducing eigenvalues and eigenvectors of a linear operator formally. Definition 5.3.1. Let T be a linear operator on a vector space V over a field F. A scalar λ ∈ F is an eigenvalue of T if there is a non-zero vector v in V such that T v = λv. Such a non-zero vector v ∈ V is called a eigenvector of T belonging to the eigenvalue λ. Note that v = 0 always satisfies T v = λv for any λ ∈ F. So the point of this definition is the existence of a non-zero vector v satisfying the given condition. One way of looking at an eigenvalue of T is to examine the set of vectors {v ∈ V | T v = λv}. It is easy to verify that for any λ ∈ F, this set is actually a subspace of V. Thus λ is an eigenvalue of T if and only if the subspace {v ∈ V | T v = λv} is non-zero. This non-zero subspace is called the eigenspace of T corresponding to the eigenvalue λ.

Saikia-Linear Algebra

244

book1

February 25, 2014

0:8

Linear Operators

Observe that any non-zero vector of the eigenspace corresponding to an eigenvalue is an eigenvector belonging to the eigenvalue. There is yet another useful interpretation of the idea of an eigenvalue which can be seen as follows: T v = λv if and only if (T − λIV )v = 0. Therefore, a necessary and sufficient condition for λ to be an eigenvalue of T is that the kernel of the linear operator (T − λIV ) is non-zero. But by Proposition (4.5.4), the kernel of an operator is non-zero if and only if the operator is singular. Recall that singular means ‘ not invertible’. The following proposition lists some equivalent conditions for a scalar to be an eigenvalue. Proposition 5.3.2. Let T be a linear operator on a vector space V over a field F. For a scalar λ ∈ F, the following are equivalent. (a) λ is an eigenvalue of T . (b) The kernel of (T − λIV ) ∈ End(V) is non-zero. (c) The map (T − λIV ) ∈ End(V) is singular. Let us look at some examples to understand the concepts introduced. Note that eigenvalues and eigenvectors of a linear operator on the zero space are not defined; so whenever we discuss eigenvalues or eigenvectors of a linear operator, we will tacitly assume that the underlying vector space is non-zero. EXAMPLE 8

The scalar 0 is the only eigenvalue of z, the zero operator, on any vector space V.

EXAMPLE 9

For the identity map IV of any vector space V, the scalar 1 is the only eigenvalue with every non-zero vector of V an eigenvector for the eigenvalue.

EXAMPLE 10 Consider the projection P1 on R2 . Since P1 (x, y) = (x, 0), 1 is an eigenvalue of P1 with any (x, 0) with x ! 0 as an eigenvector. Similarly, the scalar 0 is another eigenvalue of P1 with (0, y) as an eigenvector for any non-zero y. EXAMPLE 11 In general, if P is an arbitrary projection on a vector space V with range W and kernel K, then by the properties of such projections as given in Proposition (4.2.12), a vector w in the image W if and only if Pw = w. Thus, any non-zero w in W is an eigenvector of P belonging to the eigenvalue 1. Note that if W ! V, then 0 is another eigenvalue of P with every non-zero vector in K being an eigenvector for this eigenvalue. We will see shortly that 1 and 0 are the only eigenvalues of a projection. EXAMPLE 12 Let Rθ be the linear operator on R2 which is the rotation of the plane counterclockwise through an angle θ. Assume that θ is not an integral multiple of π (that means θ ! 0 too). Note that any scalar multiple of a non-zero vector (x1 , x2 ) in R2 must lie on the straight line passing through the origin and the point (x1 , x2 ). Since Rθ moves any non-zero point along a circular arc through an angle θ, it follows that no non-zero vector can possibly be an eigenvector. So, Rθ has no eigenvalue in R.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

245

EXAMPLE 13 Let λ ∈ F be an eigenvalue of some linear operator T on a vector space V over a field F. Let v be a non-zero vector in V such that T v = λv. But then, T 2 v = T (λv) = λT v = λ2 v, showing that the map T 2 has λ2 as an eigenvalue. An easy induction shows that T k has eigenvalue λk for any positive integer k. For any polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am xm in F[x], let the symbol f (T ) denote the operator a0 IV + a1 T + a2 T 2 + · · · + am T m on V. The preceding discussion shows that the operator f (T ) in End(V) has the scalar f (λ) ∈ F as an eigenvalue. EXAMPLE 14 Let T be a nilpotent operator on a vector space V over a field F. If the index of nilpotency of T is r, then, T r = z but T k ! z for any positive integer k if k < r. Here, z is the zero operator on V. Now, if λ ∈ F is an eigenvalue of T , then as in the last example, λr is an eigenvalue of T r . Since T r is the zero map on V, it follows that λr = 0 which implies that λ = 0. On the other hand, as T r−1 ! z, there is some nonzero v ∈ V such that T r−1 v ! 0. If we set w = T r−1 v, then it is a non-zero vector such that T w = T (T r−1 v) = 0 as T r is the zero map on V. Thus, 0 ∈ F is the only eigenvalue of the nilpotent operator T with any non-zero T r−1 v as an eigenvector. If we take the more specific case of the differential map D on Rn [x], a nilpotent operator of index (n + 1), we can easily see that no non-zero real number can be an eigenvalue of D. For, D applied to any non-zero polynomial lowers its degree by 1 whereas the degree of a scalar multiple of a polynomial remains the same. Method for Finding Eigenvalues Observe that in all these examples, we had to utilize specific properties of individual linear operators to obtain some information about their eigenvalues. We now discuss a procedure (which was hinted at in the introduction to this chapter) which enables us, at least in principle, to determine the eigenvalues of any linear operator on a finite-dimensional vector space. Let T be a linear operator on an n-dimensional vector space V over a field F. Assume that n ! 0. Fix any basis of V and let A ∈ Mn (F) be the matrix representation of T with respect to the chosen basis. Observe that for any scalar λ in F, the matrix (A − λIn ) represents the linear operator (T − λIV ) with respect to the same basis. Now, according to the alternative characterization given in Proposition (5.3.2), λ ∈ F is an eigenvalue of T if and only if the map (T − λIV ) is singular. However, invertibility is preserved by the isomorphism between EndF (V) and Mn (F) (see Corollary 4.5.7). It follows that λ is an eigenvalue of T if and only if the matrix (A − λIn) is singular, which is equivalent to the condition that det(λIn − A) = 0. We thus have the following matrix analogue of Proposition (5.3.2). Proposition 5.3.3. Let T be a linear operator on an n-dimensional vector space V, and let A ∈ Mn (F) be the matrix of T with respect to any arbitrary basis of V. Then, λ ∈ F is an eigenvalue of T if and only if det(λIn − A) = 0.

Saikia-Linear Algebra

246

book1

February 25, 2014

0:8

Linear Operators

Observe that the equality det(λIn − A) = 0 is equivalent to the statement that the function det(xIn − A) of x vanishes at x = λ. This equivalence turns out to be a useful once it is realized that det(xIn − A) is actually a monic polynomial in x over F of degree n. For example, if n = 3 and A = [ai j ], then the matrix xI3 − A looks like    x − a11 −a12 −a13    x − a22 −a23   −a21  −a31 −a32 x − a33

so that by expanding the corresponding determinant, we see that det(xI3 − A) is the polynomial x3 − x2 (a11 + a22 + a33 ) + xs1 − det A, where s1 is a certain sum of products of the ai j ’s taken two at a time. Note that the constant term of the polynomial is det A. Thus, the constant term as well as the other coefficients of the polynomial are certain sums of products of the entries of A, hence are scalars in F. It is not difficult to see, in general too, that if A is a matrix over a field F of order n, then det(xIn − A) is a monic polynomial of degree n with coefficients from F. Summarizing, we see that λ ∈ F is an eigenvalue of T if and only if λ is root of the monic polynomial det(xIn − A) for any matrix representation A of T . Characteristic Polynomial Definition 5.3.4. Let A ∈ Mn (F). The monic polynomial det(xIn − A) of degree n over F is called the characteristic polynomial of A. We sometimes denote it by ch(A). The usefulness of the idea of characteristic polynomial is largely due to the following fact. Proposition 5.3.5. Similar matrices in Mn (F) have the same characteristic polynomial. Proof. Recall that if A and B in Mn (F) are similar, then there is an invertible matrix P ∈ Mn (F) such that B = P−1 AP. Now det(xIn − B) = det(xIn − P−1 AP)

= det(P−1 xIn P − P−1 AP) = det(P−1 (xIn − A)P)

= det P−1 (det(xIn − A)) det P = det(xIn − A).

!

Since any two matrices representing a linear operator relative to two bases are similar (see Proposition 4.5.13), it follows that the last proposition enables us to define the characteristic polynomial of a linear operator without any ambiguity. Definition 5.3.6. Let T be a linear operator on a finite-dimensional vector space. The characteristic polynomial of T is defined as the characteristic polynomial of any matrix representing T . We can rephrase the conclusion of the earlier discussion about eigenvalues of T in terms of its characteristic polynomial as follows.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

247

Proposition 5.3.7. Let T be a linear operator on an n-dimensional vector space V over a field F. Let A be the matrix of T with respect to any fixed but arbitrary basis of V. Then, (a) λ ∈ F is an eigenvalue of T if and only if λ is a root in F of the characteristic polynomial of A. (b) The eigenvalues of T are precisely the roots in F of its characteristic polynomial. We can introduce eigenvalues and eigenvectors of any matrix A ∈ Mn (F) in an obvious manner: a scalar λ ∈ F is an eigenvalue of A if we can find a non-zero column vector x in Fn such that Ax = λx; the vector x is an eigenvector of A belonging to the eigenvalue λ. We leave it to the reader to verify that λ is an eigenvalue of A if and only if λ is a root of the characteristic polynomial of A. As with linear operators, the eigenspace of a matrix belonging to an eigenvalue is the subspace consisting of non-zero eigenvectors along with the zero vector. For future reference, we make this idea precise. Proposition 5.3.8. The eigenspace for an eigenvalue λ of a matrix A ∈ Mn (F) is the solution space in Fn of the matrix equation (A − λIn)x = 0. Proof. A vector x ∈ Fn is a non-zero solution of (A − λIn )x = 0 if and only if x is an eigenvector for A belonging to the eigenvalue λ. The proposition is thus clear. ! It is clear that linear operators on finite-dimensional vector spaces and their matrices have the same eigenvalues. In fact, by Proposition (4.5.4) and the discussion preceding it, we can have the following precise formulation of the relationship between the eigenvalues and corresponding eigenvectors of a linear operator and those of any of its matrix representations. Proposition 5.3.9. Let T be a linear operator on an n-dimensional vector space V over a field F, and A ∈ Mn (F) be its matrix with respect to a fixed but arbitrary basis B of V. For any vector v ∈ V, let x ∈ Fn be its coordinate vector with respect to B. Then, v is an eigenvector of T belonging to an eigenvalue λ ∈ F if and only if x is an eigenvector of A belonging to the same eigenvalue λ ∈ F of A. Thus, the vectors comprising the eigenspace of T belonging to the eigenvalue λ correspond precisely to the column vectors in Fn comprising the solution space of the system of equations given by (A − λIn)x = 0. The preceding proposition coupled with Proposition (5.3.8) then yields the following. Corollary 5.3.10. Notations same as above. Then the eigenspaces belonging to the same eigenvalue of T and its matrix A are isomorphic. We need to introduce one more terminology related to the concept of an eigenvalue. Definition 5.3.11. If an eigenvalue occurs r times as a root of the characteristic polynomial, we then say that the eigenvalue has algebraic multiplicity r. We now list the characteristic polynomials and eigenvalues of some special linear operators and matrices. The characteristic polynomial of a matrix A is usually worked out directly by evaluating the determinant det(xIn − A); note that, by definition, the characteristic polynomial of a linear operator is the characteristic polynomial of any matrix representing it.

Saikia-Linear Algebra

248

book1

February 25, 2014

0:8

Linear Operators

EXAMPLE 15 The zero matrix 0n of order n over any field F clearly has xn as the characteristic polynomial. The zero map z on an n-dimensional vector space V over F thus has xn as its characteristic polynomial as the matrix of z with respect to any basis of V is the zero matrix of order n. So 0 is the only eigenvalue, with algebraic multiplicity n, of such a zero operator or of the zero matrix of order n. EXAMPLE 16 The scalar matrix aIn of order n over a field F, with a ∈ F, has (x − a)n as its characteristic polynomial. For, the matrix xIn − aIn , whose determinant gives us the characteristic polynomial, is also a scalar matrix of order n having (x − a) as the diagonal entry. So, aIn has a single distinct eigenvalue a, again with algebraic multiplicity n. Thus, for any n-dimensional vector space V over a field F, the linear operator aIV has (x − a)n as its characteristic polynomial, as aIn is the matrix of aIV with respect to any basis of V. In particular, the identity operator on an n-dimensional vector space over any field has (x − 1)n as its characteristic polynomial. EXAMPLE 17 The characteristic polynomial of any diagonal matrix diag[λ1 , λ2 , . . . , λn ] is clearly (x − λ1 )(x − λ2 ) . . . (x − λn ). Similarly, the characteristic polynomial of a lower triangular (or of an upper triangular) matrix, with scalars λ1 , λ2 , . . . , λn along the diagonal, is (x − λ1 )(x − λ2 ) . . . (x − λn ). In both these cases, the eigenvalues are the distinct entries appearing on the diagonal; the algebraic multiplicity of each eigenvalue is the number of times it appears on the diagonal. The fact that the characteristic polynomial of a linear operator on an n-dimensional vector space, or equivalently of an n × n matrix, is a polynomial of degree n and eigenvalues roots of this polynomial implies that the number of distinct eigenvalues can be at most n. See Section 5.2 for a discussion of roots of polynomials. Of course, if the underlying field is C or any other algebraically closed field, then the number of eigenvalues (not necessarily distinct) will be the same as the degree of the characteristic polynomial. On the other hand, if the characteristic polynomial turns out to be an irreducible polynomial of degree ≥ 2 over the underlying field, then there can be no eigenvalue. We present a few examples to illustrate the various possibilities. EXAMPLE 18 Consider the operator Rθ on R2 , whose matrix with respect to the standard basis is ' cos θ A= sin θ

( −sin θ . cos θ

A direct calculation shows that det(xI2 − A) equals x2 − 2(cos θ)x + 1, so this polynomial is the characteristic polynomial of A as well as of Rθ . Note that the discriminant of this polynomial is (−4sin2 θ), a negative real number unless sin θ = 0. It follows, from the formula for the solutions of a quadratic equation, that if θ is not a multiple of π, then there can be no real root of the characteristic polynomial. Hence, for such values of θ, A or Rθ has no real eigenvalues.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

249

However, as the polynomial can be factored into two distinct linear factors over C, A considered a matrix over C, has two distinct eigenvalues. EXAMPLE 19 Consider the linear operator T on the real space R3 whose matrix A with respect to the standard basis of R3 is the permutation matrix   0 0 1   A = 1 0 0.   0 1 0

Evaluating det(xI3 − A), we see that A and T have x3 − 1 as the characteristic polynomial. Since x3 − 1 = (x − 1)(x2 + x + 1) and since x2 + x + 1 is irreducible over R, it follows that 1 is the only eigenvalue of T and A in R. If A is considered a matrix over C (for example, as the matrix of a linear operator on C3 with respect to the standard basis of C3 ), then A has three distinct eigenvalues 1, ω and ω2 , where ω is a non-real cube root of unity.

Computing Eigenvectors EXAMPLE 20 Consider the real matrix A=

'

−8 −15

( 6 . 11

The characteristic polynomial of A, which the reader can easily work out, is x2 − 3x + 2 whose roots in R are 1 and 2. Thus, A has two distinct eigenvalues. We work out the eigenvectors of A in R2 corresponding to these eigenvalues. Recall that a column vector x is an eigenvector for A corresponding to the eigenvalue λ if it is a non-zero solution of the matrix equation Ax = λx, or equivalently, of (A − λI2 )x = 0. Thus, to find the eigenvectors of A for the eigenvalue λ = 1, we have to solve (2 3 2 3 2 3 ' −9 6 x1 0 x (A − I2) 1 = = , −15 10 x2 0 x2 which is easily seen to be equivalent to ' (2 3 2 3 −3 2 x1 0 = . 0 0 x2 0 It follows that any solution (x1 , x2 )t is given by the condition 3x1 = 2x2 . We may choose any vector (x1 , x2 )t satisfying this condition to be an eigenvector; for example, (2, 3)t is an eigenvector for the eigenvalue λ = 1. A similar argument shows that the general solution of the equation (A − 2I2)x = 0 is given by 5x1 = 3x2 , so we may take (3, 5)t as an eigenvector of A for the eigenvalue λ = 2. These two vectors are clearly linearly independent and hence form a basis of R2 . Note that the choices of the eigenvectors are arbitrary; we could have taken any vectors as eigenvectors as long as they satisfied the matrix equations or the equivalent

Saikia-Linear Algebra

250

book1

February 25, 2014

0:8

Linear Operators

conditions giving the general solution. As we will prove a little later, any arbitrary choice would have still resulted in a basis, for eigenvectors belonging to distinct eigenvalues are automatically linearly independent. We continue with the matrix of the previous example to see the effect of the existence of a basis consisting of eigenvectors. EXAMPLE 21 Let T be the linear operator on R2 whose matrix with respect to the standard basis {e1 , e2 } of R2 be A. Thus, by the definition of matrix representation (see Section 4.5), 2 3 2 3 2 3 2 3 1 −8 0 6 T = and T = . 0 −15 1 11 Note that not only T and A have the same eigenvalues, they also have the same eigenvectors by Proposition (5.3.9) as T acts on column vectors of R2 . Therefore, the basis {v1 = (2, 3)t , v2 = (3, 5)t } of R2 consists of eigenvectors of T corresponding, respectively, to eigenvalues 1, 2 of T . Therefore T v1 = 1v1 and T v2 = 2v2 and so the matrix D of T with respect to the basis {v1 = (2, 3)t , v2 = (3, 5)t } is the diagonal matrix ( ' 1 0 , D= 0 2 where the diagonal entries are the eigenvalues of T . Since the matrices A and D represents the same linear operator T , they are similar matrices. In fact, as we have seen in Section 4.5 during the discussion of similar matrices, if P is the transition matrix from the new basis of R2 of to the ' eigenvectors ( 2 3 −1 original standard basis, then P AP = D. In this example, P = and it can be 3 5 easily verified that ' (−1 ' (' ( ' ( 2 3 −8 6 2 3 1 0 = . 3 5 −15 11 3 5 0 2 EXAMPLE 22 As another exercise in calculating eigenvectors, let us find the eigenvectors of the permutation matrix   0 0 1   A = 1 0 0   0 1 0

considering it a matrix over C. Thus, A can be considered the matrix of a linear operator T on C3 , the three-dimensional vector space of ordered triples of complex numbers, with respect to the standard basis of C3 . We have already seen that 1, ω and ω2 are the three eigenvectors of A over C. The matrices A − λI3 for these three eigenvalues are       0 1 −1 0 1 −ω 0 1 −ω2       0,  1 −ω 0,  1 −ω2 0,  1 −1      0 1 −1 0 1 −ω 0 1 −ω2

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

respectively. Since they are row equivalent to     1 0 −1 ω 0 −1     0 1 −1 ,  0 1 −ω , 0 0 0 0 0 0

 2 ω   0 0

x1 − x3 = 0 , x2 − x3 = 0

ω2 x1 − x3 x2 − ω3x

0 1 0

251

 −1  −ω2 ,  0

it follows that the solutions x = (x1 , x2 , x3 )t of the matrix equation (A − λI3 )x = 0 for λ = 1, ω and ω2 , respectively, will be given by ωx1 − x3 = 0 , x2 − ωx3 = 0

=0 . =0

As in the previous example, we can pick any column vector (x1 , x2 , x3 )t satisfying these three sets of equations to obtain the required eigenvectors. Thus, we may choose (1, 1, 1)t , (1, ω2 , ω)t and (1, ω, ω2 )t as eigenvectors corresponding to eigenvalues 1, ω and ω2 , respectively. We leave to the reader to verify that these are linearly independent over C, and hence form a basis of C3 over C. It is clear that relative to this new basis, the matrix of T is the diagonal matrix:   1 0 0   0 ω 0 .   2 0 0 ω Our computation also shows that   0 0 1 1 0 0   0 1 0 are similar over C.

and

 1  0 0

0 ω 0

 0   0   2 ω

With these examples in mind, we introduce one of the most important concepts of linear algebra. Definition 5.3.12. Let T be a linear operator on a finite-dimensional vector space V over a field F. T is said to diagonalizable (over F) if there is a basis of V consisting of eigenvectors of T . Note that relative to a basis of eigenvectors, the matrix of a diagonalizable operator will be a diagonal one. An analogous definition can be given for matrices. Definition 5.3.13. A matrix A ∈ Mn (F) is said to be diagonalizable (over F) if there is a basis of Fn consisting of eigenvectors of A. We will see shortly that A is diagonalizable if and only if A is similar to a diagonal matrix over F. It is clear that a linear operator on a finite-dimensional vector space is diagonalizable if and only if any matrix representing the operator is diagonalizable. Given a diagonalizable matrix A in Mn (F), consider the unique linear operator T on Fn determined by A with respect to the standard basis of Fn . Since T and A have the same eigenvalues, it follows that with respect to the basis of Fn consisting of eigenvectors of A, the matrix of T is a diagonal matrix

Saikia-Linear Algebra

252

book1

February 25, 2014

0:8

Linear Operators

D, whose diagonal entries are the eigenvalues of A. Therefore, if P is the transition matrix from the basis consisting of the eigenvectors of A to the standard basis of Fn , then, by Proposition (4.5.11), D = P−1 AP. Note that by the definition of transition matrix, the columns of P consist of the eigenvectors of A forming the basis of Fn . Thus, we have the following useful result about diagonalizable matrices. Proposition 5.3.14. Let A ∈ Mn (F) be diagonalizable with eigenvalues λ1 , λ2 , . . . , λn , not necessarily distinct. Let v1 , v2 , . . . , vn be eigenvectors of A forming a basis of Fn such that v j , for each j, is an eigenvector belonging to the eigenvalue λ j . If / P = v1

v2

···

vn

0

denotes the matrix in Mn (F) whose jth column is formed by the components of v j , then P is invertible and P−1 AP = diag[λ1, λ2 , . . . , λn ]. We say that P diagonalizes A. Note that as P depends on our choice of the eigenvectors belonging to the eigenvalues of A, the matrix diagonalizing A is not unique. Corollary 5.3.15. A matrix A in Mn (F) is diagonalizable if and only if A is similar to a diagonal matrix D over F; in that case, the diagonal entries are precisely the eigenvalues of A. We want to draw the reader’s attention to another important point contained in the preceding discussion: any matrix in Mn (F) determines a unique linear operator on the n-dimensional vector space Fn over F, say, with respect to the standard basis. They are equivalent in the sense that they share the same characteristic polynomial, the same eigenvalues as well as the same eigenvectors. Thus, all the results about linear operators can be translated to results about their corresponding matrices, too. We will do that kind of translation without any comments from now onwards. Let us now consider some linear operators and matrices and check whether they are diagonalizable. EXAMPLE 23 The projection P1 : R2 → R2 given by P1 (x1 , x2 ) = (x1 , 0) is diagonalizable over R. For, it has two eigenvalues 1 and 0, and it is clear that the standard basis vectors e1 and e2 are the eigenvectors of P1 corresponding to those eigenvalues, respectively. EXAMPLE 24 In fact, any projection P, that is, a linear operator P such that P2 = P, on a finitedimensional vector space V is trivially diagonalizable. For, if W and K are the image and the kernel of P respectively, then by Proposition (4.2.12), we have (a) Pw = w for any w ∈ W. (b) V = W ⊕ K. Thus, non-zero vectors of W and K are eigenvectors of P belonging to the eigenvalues 1 and 0, respectively. Also, by (b), the union of any bases of W and K yields a basis of V. It is clear that with respect to such a basis of V, the matrix of P is a diagonal one with 1 and 0 as the diagonal entries.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

EXAMPLE 25 Consider the matrix A=

' 0 1

1 0

253

(

over R. The eigenvalues of A are 1 and −1. The eigenvectors of A are column vectors in R2 . Just by looking at A, we can conclude that e1 + e2 = (1, 1)t is an eigenvector of A for the eigenvalue 1. we leave to the reader to verify that e1 − e2 = (1, −1)t is an eigenvector belonging to the eigenvalue −1 and ' (−1 ' ( ' ( 1 1 1 1 1 0 A = . 1 −1 1 −1 0 −1

EXAMPLE 26 The rotation Rθ of R2 is not a diagonalizable operator of R2 over R unless θ is a multiple of π. For, as we had seen in Example 14, Rθ has no real eigenvalue for other values of θ. The matrix of Example 15 is not diagonalizable over R but is so over C.

Quite often, one may be interested only in the dimension of the eigenspace, that is, the number of linearly independent eigenvectors (but not actual eigenvectors) for a given eigenvalue. Recall that the dimension of the solution space of the matrix equation Bx = 0 is precisely the nullity of the matrix B (see Definition 3.6.13). Therefore, by appealing to Proposition (5.3.8), we can derive the following convenient way of finding dimensions of eigenspaces. Lemma 5.3.16. For any eigenvalue λ of a matrix A of order n, the number of linearly independent eigenvectors belonging to λ is the nullity of the matrix A − λIn. Recall that the nullity of a matrix B of order n is n − r, where r is the rank of B. EXAMPLE 27 Consider the nilpotent matrix

 0 1 J4 (0) =  0 0

0 0 1 0

0 0 0 1

 0  0  0 0

over any field F. (Any field, even a finite one, must have the multiplicative and the additive identity, which are usually denoted by 1 and 0 respectively.) It is clear that J4 (0), being a lower triangular matrix having zeros along the diagonal, has x4 as its characteristic polynomial so that 0 is the only eigenvalue. J4 (0) is thus diagonalizable only if it has four linearly independent eigenvectors in F4 belonging to that single eigenvalue 0. However, the nullity of J4 (0) is 1 as its rank is clearly 3. Thus, there cannot be more than one linearly independent eigenvector for J4 (0), showing that J4 (0) is not diagonalizable over any field. As some of these examples (including the last one) show, linear operators may fail to be diagonalizable if they do not have enough eigenvalues. On the other hand, the following result implies that a linear operator on an n-dimensional vector space or a matrix of order n having n distinct eigenvalues is necessarily diagonalizable.

Saikia-Linear Algebra

254

book1

February 25, 2014

0:8

Linear Operators

Proposition 5.3.17.

Eigenvectors belonging to distinct eigenvalues are linearly independent.

Proof. We have left the statement of the theorem deliberately vague, so that it can be used for operators as well as matrices. We prove it for linear operators. The obvious modifications needed for matrices are left to the reader. So let T be a linear operator on a finite-dimensional vector space V over a field F. Let v1 , v2 , . . . , vk be eigenvectors of T in V corresponding to distinct eigenvalues λ1 , λ2 , . . . , λk . We can assume that k ≥ 2. Now, if these eigenvectors are not linearly independent, some vector in the list is a linear combination of the preceding ones of the list (see Proposition 3.3.10). Let v j ( j ≥ 2) be the first such vector in the list. Thus, v1 , v2 , . . . , v j−1 are linearly independent. Also, we have scalars c1 , c2 , . . . , c j−1 such that v j = c1 v1 + c2 v2 + · · · + c j−1 v j−1 .

(5.1)

Applying T to both sides of Equation (5.1) and noting that T vi = λi vi , we obtain λ j v j = c1 λ1 v1 + c2 λ2 v2 + · · · + c j−1 λ j−1 v j−1 .

(5.2)

On the other hand, multiplying Equation (5.1) by λ j yields another relation: λ j v j = c1 λ j v1 + c2 λ j v2 + · · · + c j−1 λ j v j−1 .

(5.3)

Finally, we subtract Equation (5.3) from Equation (5.2) to arrive at c1 (λ1 − λ j )v1 + c2 (λ2 − λ j )v2 + · · · + c j−1 (λ j−1 − λ j )v j−1 = 0. However, the vectors v1 , v2 , . . . , v j−1 being linearly independent, the coefficients ci (λi − λ j ), for 1 ≤ i ≤ j − 1, in the preceding relation are all zeros. It follows that c1 = c2 = · · · = c j−1 = 0 since the eigenvalues λi , for 1 ≤ i ≤ j, are distinct. Equation (5.1) then shows that the eigenvector v j is the zero vector, an absurdity. As the assumption that the eigenvectors are dependent led us to this absurdity, the proposition follows. ! In view of the proposition, if a linear operator on an n-dimensional vector space, or a matrix of order n over a field F, has n distinct eigenvalues, then by choosing an arbitrary eigenvector for each of the eigenvalues, we can produce a set of n linearly eigenvectors which will form a basis of the vector space (or of Fn in case of the matrix A). The following useful corollary results. Corollary 5.3.18. A linear operator on an n-dimensional vector space or a matrix of order n over a field F, having n distinct eigenvalues is diagonalizable over F. It should be clear that even if an operator on an n-dimensional space or a matrix of order n does not have n distinct eigenvalues, it may still be diagonalizable as the next example shows. EXAMPLE 28 The reader can check that the real matrix   10 0  A = −3 1  −1 0

 72  −24  −7

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

255

has two distinct eigenvalues 1 and 2, with 1 having multiplicity 2. A will be diagonalizable if it has two linearly independent eigenvectors corresponding to one of these eigenvalues. Begin with the eigenvalue λ = 1. Since in the matrix    9 0 72   (A − 1I3)x = −3 0 −24,   −1 0 −8

the first as well as the second row is a multiple of the third row, it follows that the rank of the matrix is 1 and its nullity is 2. Thus, it is possible to find two linearly independent eigenvectors belonging to the eigenvalue λ = 1. Now, there will be at least one eigenvector of A for the eigenvalue λ = 2, and this together with the two eigenvectors chosen already, form a basis of R3 by Proposition (5.3.17). Thus A is diagonalizable over R. We leave the task of actually producing the basis to the reader. In contrast to this diagonalizable 3 × 3 matrix with two distinct eigenvalues, the 3 × 3 matrix of the next example is not diagonalizable, even though it too has two distinct eigenvalues. EXAMPLE 29 Consider the real matrix  15  A =  4  9

−12 −2 −8

 −16  −5.  −9

It is easy to see that there are two distinct eigenvalues of A, namely, 1 and 2, with 1 having multiplicity 2. To find the number of linearly independent eigenvectors of A belonging to the eigenvalue 1, we need to find the nullity of the matrix   14 −12 −16   −3 −5. A − I3 =  4   9 −8 −10

To do that we seek the reduced form of the matrix.     14 −12 −16 1 −1 −1     4 −3 −5 ∼ 4 −3 −5     9 −8 −10 9 −8 −10   1 −1 −1   1 −1 ∼ 0   0 1 −1   1 0 −2   ∼ 0 1 −1.   0 0 0

The last matrix, which is the row reduced form of (A − I3 ), has two pivot columns so its nullity is 1. Thus, the nullity of (A − I3 ), or equivalently, the dimension of the eigenspace of A belonging to eigenvalue 1 is 1. Similar calculation for the eigenvalue λ = 2 shows that the corresponding eigenspace also has dimension 1. Thus, we see

Saikia-Linear Algebra

256

book1

February 25, 2014

0:8

Linear Operators

that for each of the eigenvalues, there cannot be more than one linearly independent eigenvector. Consequently, there cannot be a basis of of R3 consisting of eigenvectors of A, showing that A is not diagonalizable. Note that A is not diagonalizable even if we consider it as a matrix over the larger field C. Eigenspaces The last two examples suggest that in case the number of eigenvalues is not equal to the order of a matrix, a simpler criterion for deciding diagonalizability is needed. For deriving any such criterion, the idea of an eigenspace needs deeper examination. Without any loss of generality, we consider eigenspaces of operators. We record some general observations about eigenspaces in the following remarks. (a) If λ is an eigenvalue of the operator T , then the eigenspace ker(T − λIV ) is non-zero and so has dimension at least one. (b) Every non-zero vector in the eigenspace ker(T − λIV ) is an eigenvector of T belonging to the eigenvalue λ. Consequently, any non-zero linear combination of finitely many eigenvectors of T belonging to the eigenvalue λ is again an eigenvector for the same λ. (c) Let W1 , W2 , . . . , Wk be the eigenspaces of T corresponding to distinct eigenvalues, and let W = W1 + W2 + · · · + Wk be the sum of these subspaces. We claim that this sum of subspaces is direct (see Section 3.5 for a discussion of direct sums). To prove the claim assume that v1 + v2 + · · · + vk = 0 in V, where vi ∈ Wi . We need to show that each vi is the zero vector. Now, if some vi are non-zero, clearly some other v j ( j ! i) must also be non-zero. Thus the given sum is a relation of linear dependence for the non-zero vectors among v1 , v2 , . . . , vk . However, any non-zero vector in an eigenspace is an eigenvector. So we have a relation of linear dependence for eigenvectors belonging to distinct eigenvalues, contradicting Proposition (5.3.17). Thus each vi must be the zero vector and our claim follows. We may therefore write W = W1 ⊕ W2 ⊕ · · · ⊕ Wk . As opposed to the algebraic multiplicity of an eigenvalue, which is the multiplicity of the eigenvalue as a root of the characteristic polynomial, we introduce now its geometrical multiplicity. Definition 5.3.19. The geometrical multiplicity of an eigenvalue is the dimension of the corresponding eigenspace. So the last remark implies that the sum of the geometrical multiplicities of the distinct eigenvalues of an operator is the dimension of the subspace spanned by the eigenvectors belonging to all possible eigenvalues. We now look at the case of a diagonalizable operator closely. Let T be a diagonalizable linear operator on an n-dimensional vector space V over a field F. Therefore, there is a basis of eigenvectors of V with respect to which the matrix of T is a diagonal one, say D = diag[λ1 , λ2 , . . . , λn ],

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

257

where the λi are eigenvalues of T . In general, the scalars λ1 , λ2 , . . . , λn need not be distinct. Suppose that exactly k (1 ≤ k ≤ n) distinct values appear on the diagonal of D. Without loss of generality, we may further assume (if necessary by reordering the basis of eigenvectors) that the first n1 of the scalars appearing on the diagonal are equal to λ1 , the next n2 equal to λ2 and so on, and finally the last nk are equal to λk . In other words, there are k distinct eigenvalues λ1 , λ2 , . . . , λk of T which appear as entries of D. Note that for any j corresponding to the eigenvalue λ j , there are n j linearly independent eigenvectors belonging to this particular eigenvalue in the chosen basis of V. Therefore, with respect to this basis, the matrix D can be expressed in blocks as follows:   0 . . . 0  λ1 In1   0 λ2 I n 2 . . . 0    D =  . (5.4) ..  , ..  ..  . .   0 0 . . . λk Ink with λ j appearing exactly n j times in the diagonal of D. The characteristic polynomial of D is easily seen to be (x − λ1)n1 (x − λ2 )n2 . . . (x − λk )nk ,

(5.5)

where n1 + n2 + · · · + nk = n, the dimension of V. This must also be the characteristic polynomial of T since the characteristic polynomial of a linear operator is the same as that of any matrix representing it. Examining Equations (5.4) and (5.5), we can arrive at the following conclusions. • The characteristic polynomial of T factors completely into a product of linear factors, some of which may be repeated, over F. • The distinct diagonal entries λ1 , λ2 , . . . , λk of D are all the possible eigenvalues of T as these are the only roots of the characteristic polynomial as given by Equation (5.5). • n j , the number of times λ j appears on the diagonal of D for any j, is precisely the algebraic multiplicity of the eigenvalue λ j . We are now ready to state one of the main theorems of this section. Theorem 5.3.20. Let T be a linear operator on a finite-dimensional vector space V over a field F. Let λ1 , λ2 , . . . , λk be the distinct eigenvalues of T , and let W1 , W2 , . . . , Wk be the corresponding eigenspaces of T . Then, the following are equivalent. (a) T is diagonalizable over F. (b) The characteristic polynomial of T factors completely into linear factors over F as (x − λ1 )d1 (x − λ2)d2 . . . (x − λk )dk , where d j = dim W j for j = 1, 2, . . . , k. (c) The sum of the geometric multiplicities of the distinct eigenvalues of T equals dim V. (d) If W1 , W2 , . . . , Wk are the distinct eigenspaces of T , then dim V = dim W1 + dim W2 + · · · + dim Wk . (e) If W1 , W2 , . . . , Wk are the distinct eigenspaces of T , then V = W1 ⊕ W2 ⊕ · · · ⊕ Wk . The matrix version is obvious and we leave it to the reader to formulate and prove it.

Saikia-Linear Algebra

258

book1

February 25, 2014

0:8

Linear Operators

Proof. If T is diagonalizable with distinct eigenvalues λ1 , λ2 , . . . , λk , then we have seen that with respect to a suitably ordered basis of eigenvectors, the matrix of T can be expressed as a block matrix  λ1 In1  0  D =  .  ..  0

0 λ2 I n 2 .. .

. .

. .

. .

0 0 .. .

0

.

.

.

λk I n k

     ,  

where for each j, n j is the algebraic multiplicity of the eigenvalue λ j , which is its multiplicity as a root of the characteristic polynomial of T . Therefore, to prove that (a) implies (b), it suffices to show that for each j, n j equals dim W j = d j , the geometric multiplicity of a j . Now, there are exactly n j zero rows in the diagonal matrix (D − λ j In ) as λi ! λ j for i ! j. Hence, the nullity of (D − λ j In ), and the nullity of the equivalent linear operator T − λ j IV is n j . That is another way of saying that n j is the dimension of the eigenspace W j . That (b) implies (c) is trivial, as the degree of the characteristic polynomial of T is dim V. (d) is just a restatement of (c). For the next implication, note that in the last of the preceding remarks, we have seen that the sum W of the eigenspaces of T is a direct sum. Since the dimension of a direct sum is the sum of the dimensions of the summands, it follows from the hypothesis in (d) that dim W = dim V forcing W = V. Thus, (e) follows. Finally, (e) implies that the eigenvectors of T span V. Therefore, we can choose a basis of V from among these eigenvectors. In other words, T is diagonalizable proving (a). ! One of the advantages of this theorem is that, in some cases, without actually finding eigenvectors, we can decide whether an operator or a matrix is diagonalizable. For example, as soon as we found that the characteristic polynomial of the permutation matrix   0 0 1   A = 1 0 0   0 1 0

was (x−1)(x2 + x+1), we could have concluded that A is not diagonalizable over R, as x2 + x+1 cannot be factored into linear factors over R. However, a word of caution: even if the characteristic polynomial of a matrix or an operator factors into linear factors, it does not necessarily mean diagonalizability. To cite an instance, the characteristic polynomial of the matrix in Example 23 is easily seen to be (x − 1)2(x − 2), so it does factor into linear factors over R (or even over C). However, as we had seen in that example, the matrix is not diagonalizable even over C. There is another characterization of diagonalizable operator which is simpler to verify, but it needs the concept of the minimal polynomial of an operator. We devote the next section for a detailed study of this concept for it is as important for a linear operator as its characteristic polynomial. Power of a Diagonalizable Matrix

In many practical problems, one needs to compute powers of a given matrix. For a diagonalizable matrix, there is a simple but useful method for calculating such powers. Let A ∈ Mn (F) be a diagonalizable matrix. Suppose P ∈ Mn (F) is an invertible matrix such that P−1 AP = D,

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

259

where D = diag[λ1, λ2 , . . . , λn ] whose diagonal entries λ1 , λ2 , . . . , λn are the eigenvalues of A, not necessarily distinct. It is an easy exercise to show that for any positive integer k, Ak = PDk P−1 ,

(5.6)

where clearly, Dk = diag[λ1k , λ2 k , . . . , λn k ]. Since P and P−1 are already known, the formula in Equation (5.6) is quite efficient. For a large class of diagonalizable matrices that appear in applications, P can be so chosen that P−1 = Pt , the transpose. In that case, this method for computing powers becomes even more efficient. Real symmetric matrices form such a class of matrices and we devote the rest of this section to a discussion of diagonalizability of these matrices. Eigenvalues and Eigenvectors of Real Symmetric Matrices We have seen earlier, in quite a few examples, real matrices which cannot be diagonalized over R because either they lack eigenvalues in R or there are not enough eigenvectors. Therefore, it is truly remarkable that any real symmetric matrix can be diagonalized over R. The proof of this important result, which we shall be presenting shortly, is simple but makes essential use of concepts related to the usual dot product in Rn and in Cn (see Section 3.7 for details about dot products). We need a few basic properties of dot product for proving this, which we recall now. Definition 5.3.21. Given any two column vectors x = (x1 , x2 , . . . , xn )t and y = (y1 , y2 , . . . , yn )t in Cn , the usual dot product /x, y0 is defined by /x, y0 = x1 y1 + x2 y2 + · · · + xn yn , where yi denotes the conjugate of the complex number yi . Recall that y ∈ C is real if and only if y = y. Thus for x and y in Rn , the dot product reduces to /x, y0 = x1 y1 + x2 y2 + · · · + xn yn . Letting the row vector y∗ = (y1 , y2 , . . . , yn ) denote the conjugate transpose of the column vector y = (y1 , y2 , . . . , yn )t (in the real case y∗ is simply the transpose yt ), the dot product of x and y can be expressed as the following matrix product /x, y0 = y∗ x, or as /x, y0 = yt x, in case x and y are in Rn . The following observation will be useful later. Lemma 5.3.22. Let X and Y be real matrices of order n.If ρ1 , ρ2 , . . . , ρn are the row vectors of X Fand γ1 G, γ2 , . . . , γn the column vectors of Y, then the (i, j)th entry of the product XY is the dot product ρti , γ j in Rn .

Saikia-Linear Algebra

260

book1

February 25, 2014

0:8

Linear Operators

Proof. Note that if ρi = (ai1 , ai2 , . . . , ain ) and γ j = (b1 j , b2 j , . . . , bn j )t , then the (i, j)th entry of the product XY, by the usual rules of matrix multiplication, is given by the sum ai1 b1 j +ai2 b2 j +· · ·+ain bn j which is precisely the dot product of the column vectors ρti and γ j by the definition of dot product. ! Two basic properties that we shall also need are /λx, y0 = λ /x, y0 and /x, λy0 = λ /x, y0 for any λ ∈ C; these properties were verified in Section 3.7. We are now ready to prove the following remarkable result. Proposition 5.3.23. Let A be a real symmetric matrix of order n. Then A has n real eigenvalues, counting multiplicities. Moreover, for each such real eigenvalue, A has real eigenvectors. Proof. Consider A a matrix over C. Then its characteristic polynomial over C has n complex roots in C as C is algebraically closed (see Section 5.2 for roots of polynomials over C). Thus, A has n complex eigenvalues. Let λ ∈ C be any such eigenvalue of A and x ∈ Cn an eigenvector belonging to λ, so Ax = λx. Then by properties of the dot product: λ /x, x0 = /λx, x0 = /Ax, x0 = x∗ (Ax).

(5.7)

On the other hand, according to properties of conjugate transposes (see Section 1.5 for these properties), x∗ (Ax) = (x∗ A)x = (A∗ x)∗ x = (Ax)∗ x as A∗ = At = A, A being a real symmetric matrix. Since (Ax)∗ x = /x, Ax0, it then follows from Equation (5.7) that λ /x, x0 = /x, Ax0 = /x, λx0

= λ /x, x0.

These equalities imply that (λ − λ) /x, x0 = 0.

(5.8)

However, for any x = (x1 , x2 , . . . , xn )t ∈ Cn , /x, x0 = |x1 |2 + |x2 |2 + · · · + |xn |2 = 0 if and only if each xi = 0, that is, if and only if x = 0. The vector x in Equation (5.8), being an eigenvector, is non-zero and so the preceding equation can hold only if λ = λ. This proves that λ is a real number. Since λ is an arbitrary eigenvalue of A, it follows that every eigenvalue of A is real.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

261

Now, expressing each component x j of x (for 1 ≤ j ≤ n) as x j = a j + ib j for real numbers a j and b j , we may write x = a + ib, where both a = (a1 , a2 , . . . , an )t and b = (b1 , b2 , . . . , bn )t are column vectors in Rn . Now for the real number λ, λx = λ(a + ib) = λa + iAb and by properties of matrix multiplication Ax = A(a + ib) = Aa + iAb. Therefore, the relation Ax = λx can be restated as Aa + iAb = λa + iλb.

(5.9)

Since A is a real matrix, a, b are vectors with real components and λ a real number, comparing both sides (which are n-dimensional column vectors) of Equation (5.9), we conclude that Aa = λa and Ab = λb. Thus both a and b are real eigenvector of A for the eigenvalue λ. The proof is complete. ! The same proof, with minor modification, also shows that if H ∈ Mn (C) is a hermitian matrix, that is, H ∗ = H, then H has n real eigenvalues. We cannot avoid using some other concepts about orthogonality in Rn to prove our main result. We recall the relevant definitions and results; for details, the reader can go through the material in Section 3.7. Two vectors x and y in Rn (or in Cn ) are orthogonal if /x, y0 = 0. A set of mutually orthogonal vectors in Rn (or in Cn ) is a orthogonal set; so any two vectors in the set are orthogonal. Any orthogonal set of non-zero vectors is a linearly independent set. For x = (x1 , x2 , . . . , xn )t in Rn , /x, x0 = x21 + x22 + · · · + x2n is clearly non-negative; it is 0 if and only if x = 0. The length 4x4 of a non-zero x is defined as the positive square root of /x, x0; the length of the zero vector is 0. Thus 4x42 = /x, x0 . A vector in Rn is a unit vector if its length is 1; note that any non-zero vector can be made into a unit vector by dividing it by its length. An orthonormal set of vectors in Rn is a set of mutually orthogonal vectors such that each vector is a unit vector. Thus vectors {v1 , v2 , . . . , vr } in Rn form an orthonormal set in Rn , if G F vi , v j = v j t v i = δ i j ,

where the Kronecker delta symbol δi j = 0 if i ! j and δii = 1 for all i. Observe that if Q is an orthogonal matrix of order n, that is, a matrix whose columns form an orthonormal set, then by the preceding formula for dot product of orthonormal vectors, Qt Q = In , which shows that such a Q is invertible and that Q−1 = Qt . A well-known procedure, called the Gram–Schmidt process, converts any linearly independent set of vectors in Rn into an orthonormal set. In particular, any orthonormal set of vectors in Rn , being linearly independent, can be extended to an orthonormal basis of Rn . We can now prove the following remarkable result which states that a real symmetric matrix is similar to a diagonal matrix Over R.

Proposition 5.3.24. For any real symmetric matrix A of order n, there is an orthogonal matrix Q of order n such that Q−1 AQ = Qt AQ is diagonal. Proof. The proof is by induction on n. If n = 1, then there is nothing to prove. So assume that n > 1 and that (by induction hypothesis) the result holds for any real symmetric matrix of order n − 1. Let A be a real symmetric matrix of order n and λ any eigenvalue of A. By Proposition (5.3.23), λ is real and there is a eigenvector, say v1 ∈ Rn , of A belonging to the eigenvalue λ. Since a scalar multiple of an eigenvector is still an eigenvector for the same eigenvalue, we can assume that v1 is a unit vector. Therefore, by the Gram–Schmidt process, {v1 } can be extended to an orthonormal basis

Saikia-Linear Algebra

262

book1

February 25, 2014

0:8

Linear Operators

{v1 , v2 , . . . , vn } of Rn . Let P be the real matrix whose columns are these basis vectors; as we have remarked a while ago, that makes P an orthogonal matrix and so P−1 = Pt , the transpose of P. Note that the jth row of Pt is the transpose vtj of the jth column v j of P. We also observe, by the row-column multiplication rule (see Exercise 8 of Section 1.3), that the jth0 column of the product AP is the column / vector Av j and so we may express AP = Av1 Av2 , . . . , Avn . Therefore one has  t v1  vt  / 0  2 −1 t P AP = P AP =  .  Av1 Av2 , . . . , Avn .  ..   t vn

G F From this expression, one concludes, by Lemma (5.3.22), that the (i, j)th entry of P−1 AP is vi , Av j as (vti )t = vi ; this enables us to compute the first column of P−1 AP by taking j = 1. Since {v1 , v2 , . . . , vn } is an orthonormal set and v1 is an eigenvector for A for the real eigenvalue λ, one obtains /vi , Av1 0 = /vi , λv1 0 = λ /vi , v1 0    λ if i = 1 =  0 if i ! 1

Thus the first column of Pt AP is (λ, 0, . . . , 0)t . Now, observe that as A is symmetric, Pt AP is also symmetric: (Pt AP)t = Pt At (Pt )t = Pt AP. We, therefore conclude that, there is a real symmetric matrix B of order n − 1 such that ( λ 0 P AP = , 0B t

'

where the two symbols 0 denote the (n − 1)-dimensional zero row vector and zero column vector, respectively. As B is a real symmetric matrix of order n − 1, by the induction hypothesis, there is an orthogonal matrix U of order n − 1 such that U −1 BU = U t BU = D1 , where D1 is a diagonal matrix of order (n − 1). Set ' ( 1 0 Q=P . 0U Then, as Pt P = In and U t U = In−1 , ' 1 QQ= 0 ' 1 = 0 t

= In ,

( ' ( 0 1 0 t PP Ut 0U (' ( 0 1 0 Ut 0 U

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

263

which shows that Q is an orthogonal matrix. Finally, ( ( ' ' 1 0 1 0 t t P AP Q AQ = 0U 0 Ut ' (' (' ( 1 0 λ 0 1 0 = 0 Ut 0 B 0 U ' ( λ 0 = , 0 D1 where D1 = U t BU is a real diagonal matrix. As Q is orthogonal, this calculation shows that Q−1 AQ = Qt AQ is a diagonal matrix. The proof is complete. ! Observe that if Q is an orthogonal matrix and A a real symmetric matrix, both of order n, such that Qt AQ is a diagonal matrix D, then the diagonal entries of D are necessarily the eigenvalues of A. It is interesting that the converse of the preceding result holds: an orthogonally diagonalizable matrix must be a symmetric one. Proposition 5.3.25. Let A be a real matrix of order n. If there is an orthogonal matrix P of order n such that P−1 AP is diagonal, then A must be symmetric. The proof is straightforward and left to the reader. For diagonalizing specific real symmetric matrices the following result will be needed. Proposition 5.3.26. Let λ1 and λ2 be two distinct eigenvalues of a real symmetric matrix A. If v1 and v2 are eigenvectors belonging to λ1 and λ2 , respectively, then v1 and v2 are orthogonal. Proof. By properties of dot products for real vectors λ1 /v1 , v2 0 = /λ1 v1 , v2 0 = /Av1 , v2 0 = v2 t Av1 = (At v2 )t v1

= (Av2 )t v1 , as A is symmetric. Again (Av2 )t v1 = /v1 , Av2 0 = /v1 , λ2 v2 0 = λ2 /v1 , v2 0 as λ2 is real. These computations show that (λ1 − λ2 ) /v1 , v2 0 = 0. Since λ1 and λ2 are distinct, it follows that /v1 , v2 0 = 0. !

Now we can outline a procedure for diagonalizing a real symmetric matrix A; this procedure will also produce the orthogonal matrix Q such that Qt AQ is diagonal. The first step will be to find the eigenvalues of A and determine the basis vectors of each eigenspace by the general methods used earlier. The next step is to apply Gram–Schmidt process, as discussed in Section 3.7, to the basis vectors of each eigenspace to find an orthonormal basis of each one of them separately. Since eigenvectors belonging to distinct eigenvalues are orthogonal by Proposition (5.3.26), the union of the orthonormal

Saikia-Linear Algebra

264

book1

February 25, 2014

0:8

Linear Operators

bases of the eigenspaces yield a basis of Rn . Finally, we form the matrix whose columns are the members of the orthonormal bases of the eigenspaces. This is the orthogonal matrix Q such that Qt AQ is diagonal, the diagonal entries of which are the eigenvalues of A appearing in the same order as their corresponding eigenvectors in Q. We illustrate the procedure in the following example. EXAMPLE 30 Consider the 3 × 3 symmetric matrix

 2  A = 1  1

1 2 1

 1  1.  2

It is easy to verify that λ = 1, 4 are the eigenvalues of A with the eigenvalue 1 having multiplicity 2. Now, for the eigenvalue λ = 1, the matrix   1 1 1   A − λI3 = 1 1 1   1 1 1

clearly has nullity 2 so that corresponding eigenspace W1 has dimension 2. We may choose u1 = (1, −1, 0)t

and

u2 = (1, 0, −1)t

as a basis of this eigenspace. For the eigenvalue λ = 4, the matrix     −2 1 1 1 0 −1     A − λI3 =  1 −2 1 ∼ 0 1 −1     1 1 −2 0 0 0

and so the dimension of the eigenspace W2 is 1. We choose u3 = (1, 1, 1)t

as a basis for this eigenspace. The Gram–Schmidt process applied to the basis {u1 , u2 } yields the following orthonormal basis √ √ √ √ √ √ v2 = (1/ 6, 1/ 6, − 2/ 3)t v1 = (1/ 2, −1/ 2, 0)t of W1 . Finally, normalizing u3 , we obtain √ √ √ v3 = (1/ 3, 1/ 3, 1/ 3)t . Thus, the required orthogonal matrix Q will then be given by √ √ √    1/ 2 1/ √6 1/ √3 √  . Q = −1/ 2 √1/ √6 1/ √3  0 − 2/ 3 1/ 3

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

265

We leave it to the reader to verify directly that  2 t  Q 1  1

1 2 1

  1 1   1 Q = 0   2 0

0 1 0

0 0 4

   .

While working out an orthonormal basis of eigenvectors of a real symmetric matrix, the reader should remember that if the multiplicity of an eigenvalue of such a matrix is r, then it is guaranteed that there will be r linearly independent eigenvectors for that eigenvalue.

EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. (a) No non-zero scalar can be an eigenvalue of the zero operator on a vector space. (b) Every linear operator on a finite-dimensional vector space has eigenvectors. (c) Every linear operator on a complex vector space has eigenvalues. (d) Any two eigenvectors of a linear operator on a vector space are linearly independent. (e) The sum of two eigenvalues of a linear operator is again its eigenvalue. (f) If A ∈ Mn (F) is diagonalizable, so is Ak for any positive integer k. (g) If Ak is diagonalizable for some integer k ≥ 2, then A is diagonalizable.

(h) Any projection on a finite-dimensional vector space is diagonalizable. (i) A linear operator on a finite-dimensional vector space whose characteristic polynomial factors into linear factors must be diagonalizable. (j) A diagonalizable linear operator on an n-dimensional vector space has n distinct eigenvalues. (k) Two similar matrices have the same eigenspace for a common eigenvalue. (l) A matrix in Mn (F) is similar to a diagonal matrix only if Fn has a basis of eigenvectors of the matrix.

(m) The number of linearly independent eigenvectors of a matrix A ∈ Mn (F) belonging to an eigenvalue λ equals the number zero rows of the row reduced form of (A − λIn). (n) A linear operator on R3 must have a real eigenvalue. (o) If A ∈ Mn (F) is diagonalizable, so is A + aIn for any a ∈ F.

(p) For matrices A, B ∈ Mn (F), every eigenvector of AB is an eigenvector of BA. (q) If A ∈ Mn (F) is diagonalizable, then the rank of A is the number of non-zero eigenvalues, counted according to their multiplicities. (r) A non-zero real symmetric matrix is invertible. (s) An orthogonal matrix cannot be a symmetric one.

(t) Any permutation matrix is an orthogonal matrix. 2. In each of the following cases, let T be the linear operator on R2 represented by the given matrix A with respect to the standard basis of R2 . Find the characteristic polynomial, eigenvalues and

Saikia-Linear Algebra

266

book1

February 25, 2014

0:8

Linear Operators

eigenvectors spanning the eigenspaces for each eigenvalue of T :

A= A=

'

' 0 0

5 −7

( 1 , 0

( 1 , −3

A= A=

' 1 1

'

1 0

( 1 , 1

( −1 . 1

3. For each of the following matrices A over the field F, find the eigenvalues and the eigenvectors spanning the eigenspace for each of the eigenvalue. Also, determine whether any of the matrices A is diagonalizable; if so, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D. ' ( i 1 (a) A = for F = C. 2 −i ' ( 0 1 (b) A = for F = R. 1 0   0 0 −2   1 for F = R. (c) A = 1 2   1 0 3 ' ( −3 1 (d) A = for F = C. −7 3   1 2 1   0 −2 for F = R. (e) A = 2   1 −2 3   −13 −60 −60   42 40 for F = R. (f) A =  10   −5 −20 −18   3 4 1 2  2 4 6 8  for F = R. (g) A =  9 12 3 6 4 8 12 16 4. For each of the following real matrices A, find the eigenvalues and the eigenvectors spanning the eigenspace for each of the eigenvalue. Also, determine whether any of the matrices A is diagonalizable; if so, find an invertible matrix P and a diagonal matrix D such that P−1 AP = D.   1 0 1   (a) A = 0 2 0 .   1 0 1    1 0 1   (b) A = −1 2 −1 .   1 0 1

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

267

   2 0 1   (c) A = −1 1 −1 .   1 0 2 5. Prove Proposition (5.3.26). Give an example of a diagonalizable real matrix of order 3 which cannot have a full set of 3 orthonormal eigenvectors. 6. For each of the following 3×3 real matrices A, determine orthogonal matrices P such that P−1 AP is diagonal; use the diagonal form of A to compute Ak for any positive integer k:   1 1 0   (a) A = 1 0 1 .   0 1 1   1 1 3   (b) A = 1 3 1 .   3 1 1   1/2 1/2 1/4   (c) A = 1/4 1/4 1/2 .   1/4 1/4 1/2 7. For each of the following permutation matrices, compute the eigenvalues in C:  1  0 0

0 0 1

 1 0  0  0

 0  1 ,  0

0 0 0 1

0 0 1 0

8. For real numbers a, b and c, let C be the circulant matrix  a  C =  c  b

 0  1 . 0 0

 c  b .  a

b a c

Show that if f (x) = a + bx + cx2 and P the permutation matrix  0  P = 0  1

 0  1 ,  0

1 0 0

then C = f (P). Find the eigenvalues of P and hence the eigenvalues of C. 9. Compute the eigenvalues of the permutation matrix  0 0 P =  0 1

1 0 0 0

0 1 0 0

 0  0 . 1 0

Saikia-Linear Algebra

268

book1

February 25, 2014

0:8

Linear Operators

Hence find the eigenvalues of the circulant matrix  a d C =   c b

b a d c

c b a d

 d   c  , b  a

where a, b, c and d are real numbers. Note that C = f (P), where f (x) = a + bx + cx2 + dx3 . 10. Let A = [ai j ] be the permutation matrix of order n over R such that ai j = 1 if i + j = 2 or/ n + 2,0 and ai j = 0 otherwise. Show that the eigenvalues of A are 1 and −1 with multiplicities n2 + 1 / 0 and n−1 2 , respectively. Here, [x] denotes the largest integer ≤ x.

[Hint: As in Exercise 16 of Section 4.5, show that if T is the linear operator on Rn determined by A with respect to the standard basis of Rn , then there is another basis of Rn relative to which the matrix of T is a diagonal one with 1 and −1 as the only diagonal entries.] 11. Determine the trace and the determinant of the matrix in the preceding exercise for any positive integer n. 12. Show that ±1 and ±i are the possible eigenvalues of the following matrix A over C by considering the eigenvalues of A2 :  1 1  A = √ 1 3 1

1 ω ω2

 1   ω2  .  4 ω

Here, ω is a non-real cube root of unity. 13. Generalize the preceding exercise as follows: For any positive integer n ≥ 2, let ω = e2πi/n and A the Fourier matrix of order n over C given by 1 / 0 A = √ a jk , n

where a jk = ω( j−1)(k−1) .

Show that the possible eigenvalues of A are ±1 and ±i.

[Hint: Use the identity 1 + ω + ω2 + · · · + ωn−1 = 0.] 14. Find the eigenvalues of the linear operator T on R2 which takes the circle {(x1 , x2 ) | x1 2 + x2 2 = 1} to the ellipse {(x1 , x2 ) | x1 2 /a2 + x2 2 /b2 = 1}. 15. Find the eigenvalues and the corresponding eigenvectors of the linear operator T , on the vector space R2 [x] of all real polynomials of degree at most 2, given by T (a0 + a1 x + a2 x2 ) = (3a0 − 2a1) − (2a0 − 3a1)x + 4a2 x2 . 16. Find the eigenvalues and the corresponding eigenvectors of the differentiation operator D on R3 [x], the real vector space of all real polynomials of degree at most 3. 17. Find the eigenvalues and the corresponding eigenvectors of the following linear operators on M2 [R], the vector space of the 2 × 2 real matrices: ' ( ' ( ' ( ' ( a b 2c a+c a b c d T = , S = . c d b − 2c d c d a b

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Characteristic Polynomials and Eigenvalues

269

18. Let A be a diagonalizable matrix of order n over a field F having 0 and 1 as its only eigenvalues. If the null space of A has dimension m, which of the following are correct assertions? (a) The characteristic polynomial of A is xm (x − 1)n−m. (b) Ar = Ar+1 for any positive integer r. (c) The trace and determinant of A are n − m and 0, respectively. (d) The rank of A is n − m.

19. For any matrix A ∈ Mn (F) for a field F, show that A and its transpose At have the same set of eigenvalues. 20. Let T be a linear operator on a vector space V over a field F. Show that T is invertible if and only if zero is not an eigenvalue of T . Show further that a ∈ F is an eigenvalue of an invertible operator T if and only if a−1 is an eigenvalue of T −1 . 21. Show that any matrix in Mn (R), with n odd, has at least one real eigenvalue. 22. Let A ∈ Mn (F) be a matrix such that the sum of the entries in each row (column) is a scalar a. Show that a is an eigenvalue of A. 23. Let A, B ∈ Mn (F). Prove that AB and BA have the same eigenvalues. (Hint: Consider two separate cases for non-zero eigenvalues and zero as an eigenvalue.) 24. Let x and y be two n × 1 column vectors over a field F. Use Exercise 20 to find the eigenvalues of xyt . (Hint: Find two suitable matrices A and B of order n over F such that AB = xyt .) 25. Let ai and bi , for i = 1, 2, 3, be arbitrary elements of a field F. Find the eigenvalues of the matrix  a1 b1  a2 b1  a3 b1

 a1 b3   a2 b3  .  a3 b3

a1 b2 a2 b2 a3 b2

26. Let A and B be matrices in Mn (F) such that a non-zero a ∈ F is an common eigenvalue of AB and BA. Prove that corresponding eigenspaces of AB and BA have the same dimension. 27. Let A and B be matrices in Mn (F). If any one of A or B is invertible, then prove that AB and BA have the same characteristic polynomial by showing that they are similar over F. 28. Let A and B be matrices in Mn (F) such that none of them is invertible. Prove that AB and BA have the same characteristic polynomials by carrying out the following computations: (a) If A is of rank r, then show that there are P and Q in Mn (F) such that '

Ir PAQ = 0

( 0 , 0

where Ir is the r × r identity matrix and 0 denote zero matrices of appropriate sizes. Let Q−1 BP−1 be partitioned as follows: Q−1 BP−1 =

' C E

( D , F

Saikia-Linear Algebra

270

book1

February 25, 2014

0:8

Linear Operators

where C is an r × r matrix. Show that PABP−1 =

'

C 0

D 0

(

Q−1 BAQ =

'

C E

0 0

(

and .

(b) Noting that AB and BA are similar to PABP−1 and Q−1 BAQ, respectively, deduce from the last two matrix equations in (a) that the chracteristic polynomials of both AB and BA are given by xn−r det(xIr − C). 29. Give an example of two square matrices A and B over any field such that AB and BA are not similar. 30. Let A be a fixed matrix in Mn (F). If T be the linear operator on Mn (F) given by T (B) = AB

for any B ∈ Mn (F),

then show that T and the matrix A have the same eigenvalues. 31. Let A, B ∈ Mn (F) such that each has n distinct eigenvalues. Prove that AB = BA if and only if they have the same eigenvectors. 32. Let T be the linear operator on Mn (R) given by T (A) = At , the transpose of T . Find the eigenvalues of T . Show further that there is a basis of Mn (R) with respect to which the matrix of T is diagonal. 33. If a is an eigenvalue of a linear operator T on a vector space V over a field F, then show that for any polynomial f (x) over F, the scalar f (a) is an eigenvalue of the operator f (T ). [If f (x) = a0 + a1 x + · · · + am xm , then f (T ) is the linear operator given by f (T ) = a0 I + a1 T + · · · + am T m , where I is the identity map on V] 34. Use the preceding exercise to prove that if f (x) is the characteristic polynomial of a diagonalizable operator T on a finite-dimensional vector space V, then f (T ) = z, the zero operator on V. 35. For any diagonalizable operator T on a finite-dimensional vector space V, prove that V = Im(T )⊕ Ker(T ). 36. Let T be a linear operator on a finite-dimensional vector space V over a field F such that all the roots of the characteristic polynomial of T are in F. Prove that T is diagonalizable if and only if for every eigenvalue a of T Ker(T − aI) = Ker(T − aI)k for every integer k ≥ 2, where I is the identity operator on V. 37. Let A be an invertible matrix in M2 (R) such that A is similar to A2 . Prove that the characteristic polynomial of A is either x2 + x + 1 or x2 − 2x + 1. (Hint: Consider A as a matrix over C and then relate the coefficients of the characteristic polynomial of A to the sums and the products of the eigenvalues of A and A2 .)

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

271

38. Let V be the vector space of all the differentiable, real-valued functions on R, and let D be the function on V given by D( f (x)) = f ' (x), the derivative of f (x). Prove that D is a linear operator on V and any a ∈ R is an eigenvalue of D. (Hint: Consider the exponential function on R.) 39. Let A ∈ Mn (F) be diagonalizable with characteristic polynomial a 0 + a 1 x + a 2 x2 + · · · + xn . Show that the rank of A is the largest integer i such that an−i ! 0.

5.4 MINIMAL POLYNOMIAL We have seen that the vector space End(V) = EndF (V) of all the linear operators on a finite-dimensional vector space V over a field F is also a ring. However, to be able to utilize the multiplicative structure of End(V) in analysing a linear operator T , we need to find some ways to relate different powers of T . The idea of polynomials in T provides a convenient way to do that. Consider a vector space V (no restriction on dimension) over a field F and let T be a linear operator on V. Given a polynomial f (x) over F, say f (x) = a0 + a1 x + a2 x2 + · · · + am xm , where the coefficients a j are scalars from F, we define the symbol f (T ) as follows: f (T ) = a0 IV + a1 T + a2 T 2 + · · · + am T m ,

(5.10)

where IV stands for the identity operator on V. Note that being a linear combination of powers of T , f (T ) is actually a linear operator on V, that is, f (T ) ∈ End(V). It is clear from Equation (5.10) that if f (x) = g(x) as polynomials in F[x], then f (T ) = g(T ) as maps on V for any T ∈ EndF (V). Similarly, if h(x) = f (x) + g(x) or h(x) = f (x)g(x), then h(T ) = f (T ) + g(T ) or h(T ) = f (T )g(T ), respectively. Definition 5.4.1. For a linear operator T on a vector space V over a field F, one says that T satisfies polynomial f (x) ∈ F[x] if f (T ) is the zero operator z of V, that is, if f (T )v = a0 v + a1 T v + a2 T 2 v + · · · + am T m v = 0

(5.11)

for every v ∈ V (0 denotes the zero vector of V). Recall that the only linear operator on the zero space over any field F is the zero operator which clearly satisfies any polynomial over F. Since this will cause technical difficulties, our discussion about linear operators satisfying polynomials has to be restricted to operators on non-zero vector spaces. So we assume, for rest of this section, that all vector spaces are non-zero. We present some examples of polynomials satisfied by various linear operators. EXAMPLE 31 Let z be the zero operator on a vector space V over a field F and let f (x) = x. Now for any v ∈ V, f (z)(v) = z(v) = 0 by the definition of the zero operator. Thus z satisfies f (x) = x.

Saikia-Linear Algebra

272

book1

February 25, 2014

0:8

Linear Operators

EXAMPLE 32 The identity operator IV on any vector space V over a field F satisfies a simple relation: IV v = v for any v ∈ V. Taking f (x) = x − 1 and T = IV in Equation (5.11), we see that IV satisfies the polynomial x − 1 over F. EXAMPLE 33 Any projection P on a vector space V over a field F is a linear operator on V such that P2 = P. Thus, by definition, a projection P satisfies the polynomial x2 − x = x(x − 1) over the field F. In particular, the special projections P1 and P2 on R2 do satisfy the polynomial 2 x − x over R.

EXAMPLE 34 Let T : Rn → Rn be the linear transformation defined by its action on the standard basis of Rn as follows: T e j = e j+1

for 1 ≤ j ≤ n − 1 and T en = 0.

As we have seen earlier that T n = z, the zero map on Rn , whereas T k is not z for any k < n. We can, therefore, say that T satisfies the polynomial xn , but not the polynomials xk for any k < n. EXAMPLE 35 If T be the linear operator on R3 represented by the diagonal matrix A = diag[c1, c2 , c3 ] with respect to the standard basis, then the matrix A −c j I3 , for 1 ≤ j ≤ 3 has its jth row the zero row. Therefore, it follows that (A − c1 I3 )(A − c2 I3 )(A − c3 I3 ) is the zero matrix. In terms of the operator T , we thus see that T satisfies the polynomial (x − c1 )(x − c2)(x − c3) over R. As the last example suggests, we can also talk about matrices satisfying polynomials in complete analogy to linear operators. Given A ∈ Mn (F) and any polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am xm over F, we define f (A) = a0 In + a1 A + a2 A2 + · · · + am Am ,

(5.12)

where In is the identity matrix of order n. It is clear that f (A) is a matrix in Mn (F). f We say that A satisfies the polynomial f (x) over F if f (A) is the zero matrix in Mn (F). For example, if A is the diagonal matrix of Example 27, then we have ' seen ( that A does satisfy the 0 0 the polynomial (x −c1 )(x −c2 )(x −c3) over R. Similarly, the matrix A = satisfies the polynomial 1 0 x2 over any field F, but not x. Annihilators To facilitate our discussion about polynomials satisfied by an operator, we introduce a new notation. Definition 5.4.2. Given any linear operator T on a vector space V over a field F, the annihilator ann(T ) of T is the collection of all polynomials over F satisfied by T . Thus ann(T ) = { f (x) ∈ F[x] | f (T ) = z}, where z is the zero operator on V.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

273

Thus, f (x) ∈ ann(T ) if and only if the operator f (T ) is such that f (T )v = 0 for all v ∈ V. The basic properties of ann(T ) are listed in the following proposition. Proposition 5.4.3. For any linear operator T on a vector space V over a field F, the following hold. All polynomials are over F. (a) (b) (c) (d) (e)

The zero polynomial is in ann(T ). The polynomial x ∈ ann(T ) if and only if T is the zero operator on V. If f (x), g(x) ∈ ann(T ), then f (x) ± g(x) ∈ ann(T ). If f (x) ∈ ann(T ), then f (x)g(x) ∈ ann(T ) for any g(x). If ann(T ) has a non-zero polynomial f (x), then ann(T ) has a monic polynomial of degree same as that of f (x).

We leave the verifications of these properties to the reader as only straightforward applications of the definition are involved. Property (d) additionally requires the fact that polynomials in T commute. This follows from the obvious property of multiplication of polynomials over a field: f (x)g(x) = g(x) f (x). Proposition 5.4.4. For any linear operator T on a vector space V over a field F and polynomials f (x) and g(x) over F, f (T )g(T ) = g(T ) f (T ). In an exactly analogous manner we can define the annihilator ann(A) of a matrix A ∈ Mn (F): ann(A) = { f (x) ∈ F[x] | F(A) = 0},

(5.13)

where 0 is the zero matrix in Mn (F). It is obvious that all the assertions in the preceding proposition are valid if T is replaced by A. Now we restrict ourselves to finite-dimensional vector spaces. So let T be a linear operator on a vector space V of dimension n over F and A ∈ Mn (F) be the matrix of T with respect to a fixed but arbitrary basis of V. Then, the isomorphism between EndF (V) and Mn (F) induced by the fixed basis implies that T satisfies a polynomial f (x) over F if and only if A satisfies f (x). Similarly, if A ∈ Mn (F) satisfies a polynomial over F, and A determines the linear operator T on V with respect to any basis of V, then T also satisfies the same polynomial. See Section 4.4 for details of the isomorphism which enables such correspondence between operators and matrices. It is clear, therefore, that any result about polynomials satisfied by linear operators on a finitedimensional vector space has a counterpart for matrices. It is not clear at this moment whether, given an arbitrary non-zero linear operator T on a vector space, ann(T ) has any non-zero polynomial. Thus the following result is theoretically important as it ensures the existence of such polynomials. Theorem 5.4.5. Let V be a finite-dimensional non-zero vector space over a field F and let T be a linear operator on V. Then T satisfies at least one polynomial of positive degree over F. Proof. Assume that dim V = n ! 0. Now the vector space EndF (V) of all linear operators has dimension n2 over F (see Theorem 4.3.6). It follows that any set of (n2 + 1) vectors in EndF (V), and in 2 particular, the vectors T 0 = I, T, T 2 , . . . , T n are linearly dependent over the field F. Therefore we

Saikia-Linear Algebra

274

book1

February 25, 2014

0:8

Linear Operators

can find scalars a0 , a1 , . . . , an2 in F, not all zero, such that 2

a0 I + a1 T + a2 T 2 + · · · + an2 T n = z, where z is the zero operator in EndF (V). Note that it cannot be the case that only a0 is non-zero and the rest of the scalars are zeros, for then a0 I = z, which is not possible. Thus, T satisfies the polynomial 2 a0 + a1 x + a2 x2 + · · · + an2 xn of degree at most n2 over F, where at least one ai , for i ≥ 1, is non-zero. The last condition implies that the polynomial has positive degree. ! Thus for any linear operator T on a finite-dimensional vector space, ann(T ) has polynomials of positive degree. In fact, a reader familiar with the concept of ideals in a ring will notice that because of the third and fourth assertions of Proposition (5.4.3), the following result holds. Proposition 5.4.6. For any linear operator T on a finite-dimensional vector space over a field F, ann(T ) is a non-zero ideal in the ring F[x] of polynomials over F. Note that a non-zero constant polynomial cannot belong to ann(T ). The fundamental fact about ann(T ) is that it has a special polynomial m(x) with the property that every other polynomial in ann(T ) is a multiple of m(x); such a polynomial is called a generator of the ideal ann(T ). The existence of such a polynomial can be inferred from the general theory of ideals in the ring of polynomials over a field; see, for example, Proposition (5.2.4) in our brief discussion on polynomials, which states that every non-zero ideal in F[x] has a monic polynomial as a generator. However, as we shall see presently, the existence of such generators can be proved easily without invoking ideal theory; our aim in mentioning such theories is to make the reader aware of the general theoretical framework. Proposition 5.4.7. Let V be a finite-dimensional vector space over a field F and T a linear operator on V. Let m(x) be a monic polynomial of the least positive degree in ann(T ). (a) If f (x) ∈ ann(T ), then m(x) divides f (x) in F[x]. (b) m(x) is unique. Proof. For any f (x) ∈ ann(T ), by the division algorithm in F[x] or by usual long division, we can find polynomials q(x) and r(x) in F[x] such that f (x) = m(x)q(x) + r(x), where either r(x) = 0 or deg r(x) < deg m(x). Since m(x) and f (x) are in ann(T ), it follows, by substituting T for x in the last polynomial equation and working inside the ring End(V), that r(T ) = z. This shows that r(x) ∈ ann(T ). Therefore, if r(x) is non-zero, a suitable scalar multiple of r(x) yields a monic polynomial in ann(T ) whose degree is strictly smaller than that of m(x). This contradicts our choice of m(x). This contradiction forces r(x) to be the zero polynomial. Hence m(x) divides f (x). For the uniqueness part, assume that m1 (x) is another monic polynomial of the least positive degree in ann(T ). Then, assertion (a) implies that m(x) and m1 (x) are such that each one is a multiple of the other in F[x]. Since they are both monic polynomials of the same degree, it follows that they are equal. ! This result leads us to the following important definition.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

275

Definition 5.4.8. Let T be a linear operator on a non-zero vector space V over a field F. The unique monic polynomial of the least positive degree in ann(T ) is called the minimal polynomial of T over F. The definition is applicable to infinite-dimensional vector spaces also. However, note that minimal polynomials are defined for linear operators on non-zero vector spaces only. One of the reasons for excluding the zero vector space (over any field) is that the zero operator, which is the only linear operator on such a space, satisfies every polynomial over the field. Thus there can be no unique monic polynomial satisfied by the operator on the zero space; even if the field is the finite field of two elements, x and x − 1 are two distinct monic polynomials of least positive degree. We restate the proposition preceding the definition in terms of minimal polynomials for future reference. Proposition 5.4.9. Given a linear operator T on a vector space over a field F, the minimal polynomial m(x) of T is the unique monic polynomial in F[x] such that m(T ) = z. Further, for any polynomial g(x) such that g(T ) = z, m(x) divides g(x) in F[x]. A similar analysis shows that for any matrix A ∈ Mn F, there is a unique monic polynomial over F, say m(x) such that (a) m(A) is the zero matrix in Mn F; (b) if f (A) is also the zero matrix for some f (x) in F[x], then m(x) divides f (x) in F[x]. We say that m(x) is the minimal polynomial of A. It is clear that linear operators and their corresponding matrices have the same minimal polynomials. We consider some examples next. EXAMPLE 36 The zero linear operator on a vector space V, and similarly the zero matrix in Mn (F), has the minimal polynomial x. EXAMPLE 37 The minimal polynomial of the identity operator on any vector space, and similarly of the identity matrix in Mn (F), is x − 1. EXAMPLE 38 Consider the projection P1 : R2 → R2 given by P1 (x1 , x2 ) = (x1 , 0). As we have seen earlier, P1 satisfies the polynomial x2 − x = x(x − 1) over R. The minimal polynomial for P1 therefore, is a monic divisor of x(x − 1), and has to be one of three: x, x − 1 or x(x − 1). If it is x, then P1 has to be the zero map; if it is x − 1, then P1 has to be the identity map on R2 . It follows that x(x − 1) is 'the minimal polynomial of P1 . ( 1 0 is x(x − 1). Similarly, the minimal polynomial of A = 0 0 It is clear that for a general projection P on vector space V, the minimal polynomial has to be x(x − 1) unless P is the identity map (projection onto all of V) or the zero map (projection onto the zero subspace).

Saikia-Linear Algebra

276

book1

February 25, 2014

0:8

Linear Operators

EXAMPLE 39 Recall that, for any field F, the linear operator T on Fn , given by the following action on the standard basis T e j = e j+1

for 1 ≤ j ≤ n − 1 and T en = 0,

satisfies the polynomial xn . So the minimal polynomial of T must be a divisor of xn . Since we also know that T n−1 cannot be the zero operator, we can conclude that the minimal polynomial of T must be xn . It follows that the special nilpotent matrix Jn (0), introduced in Definition (4.5.3) as the matrix of the nilpotent map T , has minimal polynomial xn . Thus the minimal polynomials of the matrices  0 1 J4 (0) =  0 0

0 0 1 0

are x4 and x3 , respectively.

0 0 0 1

 0  0  0 0

and

 0 0 2 J4 (0) =  1 0

0 0 0 1

0 0 0 0

 0  0  0 0

EXAMPLE 40 Let A = diag[c1, c2 , c3 ] be a diagonal matrix over any field F. As we had seen in Example 17, A satisfies the polynomial (x − c1 )(x − c2 )(x − c3 ) and so the minimal polynomial is a divisor of this polynomial. Note that if c1 , c2 and c3 are distinct real numbers, each of the following three matrices:     0 0  0 0  c1 − c2 0     0 0  0  ,  0 0 c2 − c1    0 0 c3 − c 2 0 0 c3 − c1 and

 c1 − c3 0   0 c2 − c3 0 0

 0  0  0

has exactly one row zero. Thus for these diagonal matrices A −c1 I, A −c2 and A −c3 I, the product of any two cannot be the zero matrix, whereas the product of all the three is. We, therefore, can conclude that (x − c1 )(x − c2 )(x − c3 ) is the minimal polynomial of A. If c1 = c2 ! c3 , the minimal polynomial for A is (x − c1 )(x − c3 ), and if c1 = c2 = c3 , it is just (x − c1). EXAMPLE 41 Consider the real matrix  0  A = 1  0

0 0 1

 −1  0.  0

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

277

The minimal polynomial of A is the minimal polynomial of any linear operator A determines on, say, R3 . Fix a basis v1 , v2 , v3 of R3 . Then, A determines an operator T on R3 given by T v1 = v2 ,

T v2 = v3 ,

T v3 = −v1 .

To get some idea as to how the powers of T behave, we compute the images of v1 under successive powers of T : T v1 = v2 ,

T 2 v1 = v3 ,

T 3 v1 = −v1 .

Thus, (T 3 + I)v1 = 0. Computing in the same manner, we see that (T 3 + I)v2 = 0 = (T 3 + I)v3 . Thus, T 3 + I is the zero map on R3 , showing that T satisfies x3 + 1 over R. Since over R, (x + 1)(x2 − x + 1) is a factorization of x3 + 1 as a product of irreducible polynomials, apart from x3 + 1 the possible candidates for the minimal polynomial of T are x + 1 and x2 − x + 1. But it is clear that none of T + I or T 2 − T + I is the zero operator; for example, T v1 + Iv1 = v2 + v1 ! 0 and T 2 v1 − T v1 + Iv1 = v3 − v2 + v1 ! 0 as v1 , v2 and v3 are linearly independent. It follows that the minimal polynomial of T , and therefore of A is x3 + 1. We may treat A as a complex matrix, so the linear operator T determined by A will be acting on a three-dimensional vector space C3 over C. As in the example with the real matrix, we can show that A and T does not satisfy x + 1 or x2 − x + 1. Therefore even though over C, the polynomial x2 − x + 1 can be factored into linear factors, none of these linear factors can be the minimal polynomial of T or A. Thus, even over C, x3 + 1 is the minimal polynomial of both T and A. Applications of Minimal Polynomials The minimal polynomial of an operator contains important information about the operator. It is, therefore, not surprising to come across non-trivial results about linear operators which are proved using the idea of minimal polynomial. The rest of the section is devoted to some such results. Proposition 5.4.10. Let T be a linear operator on a finite-dimensional vector space V over a field F. Then, T is invertible in EndF (V) if and only if the constant term of its minimal polynomial is non-zero. Proof. Let m(x) = a0 + a1 x + a2 x2 + · · · + xk be the minimal polynomial of T . Assume that the constant term a0 ! 0. Now, we may rewrite the equality m(T ) = z as T (a1 I + a2T + · · · + T k−1 ) = −a0 I, where I stands for the identity map in EndF (V). Since a0 ! 0, the expression (−a1 I − a2 T − · · · − T k−1 )a−1 0 gives us a well-defined operator in EndF (V). Call it S . In that case the last equation can be simply put as TS = S T = I showing that T is invertible with S ∈ EndF (V) as its inverse.

Saikia-Linear Algebra

278

book1

February 25, 2014

0:8

Linear Operators

Conversely, assume that T is invertible, and if possible, let a0 = 0. Multiplying the relation m(T ) = z by T −1 , we then obtain a1 I + a2 T + · · · + ak−1 T k−2 + T k−1 = z as T T −1 = T −1 T = I. This, however, shows that T satisfies a polynomial of degree less than that of m(x), the minimal polynomial of T . This contradiction proves that a0 is non-zero. ! Recall that T is singular if it is not invertible. (see Definition (4.5.2).) Corollary 5.4.11. A linear operator T on a finite-dimensional vector space V is singular if and only if there is non-zero S ∈ EndF (V) such that T S = S T = z. Proof. For an invertible T , the relation T S = S T = z, after multiplication by T −1 , implies that S is the zero map. Thus if the relation holds for some non-zero S ∈ End(V), T cannot be invertible. Conversely, assume that T is singular. If m(x) is the minimal polynomial of T , by the preceding proposition, m(x) has no constant term. Therefore, the equality m(T ) = z in EndF (V) has the form a1 T + a2 T 2 + · · · + ak T k = z for some scalars a1 , a2 , . . . , ak . Note that k ≥ 1 as the degree of a minimal polynomial is positive. Let S = a1 I + a2 T + · · · + ak T k−1 . Then, S is a linear operator on V such that S T = T S = z. S is non-zero, for otherwise T will satisfy a polynomial of degree less than k, contradicting the fact that m(x) has degree k. The proof is complete. ! Recall that a T ∈ EndF (V) is invertible if and only if ker T ! {0}. Thus, the only way T can fail to be invertible is if there is some non-zero vector v ∈ V such that T v = 0. The last corollary tells us more about such v; it can be chosen to be S w for any w not in the kernel of S . There is also an unexpected implication of the same corollary. Corollary 5.4.12.

If T ∈ EndF (V) is right-invertible, then T is invertible.

Proof. Let T ' be the right inverse of T in EndF (V) so that T T ' = I, where I is the identity map in EndF (V). If T is not invertible, then Corollary (5.4.11) provides a non-zero S in EndF (V) such that S T = z. But then, z = (S T )T ' = S (T T ' ) = S I = S , a contradiction.

!

One can similarly show that left-invertibility implies invertibility. It is clear that the preceding proposition and its corollaries imply analogous results about matrices in Mn (F), one of which we have already proved in Chapter 2. Proposition 5.4.13.

Let A ∈ Mn (F).

(a) A is invertible if and only if the constant term of its minimal polynomial is non-zero.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

279

(b) A is singular if and only if there is a non-zero matrix B such that AB = BA = 0n . (c) If A is either right-invertible or left-invertible, then A is invertible. We conclude this section by showing that the minimal polynomial is an invariant of a similarity class of operators or matrices; in other words, the objects within a similarity class have the same minimal polynomial. Proposition 5.4.14.

Let V be a finite-dimensional vector space over a field F.

(a) If T and S are similar in EndF (V), then they have the same minimal polynomial. (b) If A and B are similar matrices in Mn (F), then they have the same minimal polynomial. Proof. We prove the result for matrices. Let B = P−1 AP for some invertible matrix P ∈ Mn (F). We 2 claim that B satisfies any polynomial that A satisfies. First observe that (P−1 AP) = (P−1 AP)(P−1 AP) = P−1 A(PP−1 )AP = P−1 A2 P. An easy induction then shows that k

P−1 Ak P = (P−1 AP) = Bk for any positive integer k. Now, if A satisfies a polynomial f (x) = a0 + a1 x + · · · + am xm , after leftmultiplying both sides of the equation f (A) = 0n by P−1 , and right-multiplying by P, we can rewrite the equation as a0 In + a1 P−1 AP + a2 P−1 A2 P + · · · + am P−1 Am P = 0n . Since the products of matrices in each term of the sum can be replaced by appropriate powers of B according to our preceding observation, it follows that B does satisfy the polynomial f (x). Hence, our claim. In particular, B will satisfy the monic polynomial of A. Hence, according to the matrix analogue of Proposition (5.4.9), the minimal polynomial of B divides the minimal polynomial of A. Similarly, as A = Q−1 BQ with Q = P−1 , the minimal polynomial of A divides the minimal polynomial of B. Since minimal polynomials are monic, each dividing the other one implies that they are the same. It is clear that the result for linear operators can be proved exactly the same way. ! Recall that given any square matrix of order n or a linear operator on an n-dimensional vector space (n > 0), the degree of its minimal polynomial does not exceed n2 . However, even for a 2 × 2 matrix, it may be difficult to find, or guess a polynomial of degree not exceeding 4 which may be a candidate for its minimal polynomial. For example, consider the linear operator Rθ of rotation of R2 through an angle θ = π/6. We work with the matrix A of Rθ with respect to the standard basis of R2 . The matrix A and its powers are listed below: √   '√ (  1/2 − 3/2 3/2 √ −1/2 2   A= A =  √ 3/2 1/2 3/2 1/2 √      −1/2 − 3/2 0 −1 3 4  A =  A =  √ . 1 0 3/2 −1/2

Saikia-Linear Algebra

280

book1

February 25, 2014

0:8

Linear Operators

The reader will agree that it is difficult even to guess a relation among these powers. Fortunately, certain results we will derive in Section 5.6 will help us in finding the minimal polynomial of an operator by considering the factors of its characteristic polynomial. One such result states that the minimal polynomial of an operator or a matrix divides its characteristic polynomial. We have cited this result, without any proof, so that the reader can use it for the exercises at the end of the section. The derivation of these results depends on the idea of invariant subspaces which we take up in the next section. EXERCISES 1. Determine whether the following statements are true or false giving brief justifications. All given operators are on non-zero finite-dimensional vector spaces. (a) The minimal polynomial of no non-zero operator can be x. (b) For every positive integer k, 1 ≤ k ≤ n, there is a matrix A ∈ Mn (F) such that the minimal polynomial of A is xk . (c) Every linear operator has a unique minimal polynomial. (d) If two matrices in Mn (F) have the same minimal polynomial, then they must be similar over F. (e) If f (x) and g(x) are the minimal polynomials of linear operators T and S , respectively, then f (x)g(x) is the minimal polynomial of the composite T S . (f) If f (x) and g(x) are, respectively, the minimal polynomials of matrices A and B in Mn (F), then f (x) + g(x) is the minimal polynomial of the sum A + B. (g) For a linear operator T , the minimal polynomial of T 2 divides the minimal polynomial of T . (h) For a linear operator T , the minimal polynomial of T divides the minimal polynomial of T 2 . (i) The characteristic polynomial and the minimal polynomial of a diagonalizable operator are the same. (j) The minimal polynomials of a matrix in Mn (F) and its transpose are the same. (k) If a linear operator on an n-dimensional vector space has n distinct eigenvalues, then its minimal polynomial has degree n. (l) The minimal polynomial of the differential operator on Rn [x] is xn+1 . (m) If f (x) is the minimal polynomial of a linear operator T on a vector space V, then f (x − 1) is the minimal polynomial of T − IV .

(n) For A, B ∈ Mn (F) the matrices AB and BA have the same minimal polynomial. 2. In each of the following cases, find the minimal polynomial of the linear operator T on the indicated vector space V: (a) T (x1 , x2 ) = (x1 , x1 + x2 ) on V = R2 . (b) T (x1 , x2 , x3 ) = (−x3 , x1 − ix3 , x2 + ix3 ) on V = C3 . (c) T ( f (x)) = f ' (x) + f (x) on V = R3 [x].

(d) T ( f (x)) = x f ' (x) + f (x) on V = R4 [x]. (e) T (A) = At on V = Mn (F). (f) T ( f (x)) = f (x + 1) on V = R3 [x].

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Minimal Polynomial

281

3. Find the characteristic polynomials and therefore, the minimal polynomials of the following matrices over the indicated fieldF: ' ( 3 1 (i) A = for F = R. 1 3 ' ( 0 1 (ii) A = for F = R. −1 0 ' ( 0 1 (iii) A = for F = C. −1 0 '√ ( 3/2 √ −1/2 (iv) A = for F = R. 3/2 1/2   0 0 1   (v) A = 0 1 0 for F = R.   1 0 0    1 1 1   (vi) A = −1 −1 −1 for F = C.   1 1 1

4. Can there be a 3 × 3 real non-diagonal matrix whose minimal polynomial is x2 − 5x + 4? 5. Let T be a nilpotent operator of index n on an n-dimensional vector space V (that is, T n is the zero operator whereas T n−1 is not). Show that, if n > 1, then there is no linear operator S on V such that S 2 = T . 6. Show that the minimal polynomial of the real matrix ' ( 0 −1 1 −1 is x2 + x + 1. Also, find a real matrix of order 4 whose minimal polynomial is x2 + x + 1. 7. Compute the minimal polynomial of the matrix    A =  

0 1 −1

0 0 1

1 1 0

   

considered a matrix over C. Verify that A is diagonalizable over C but not so if it is considered a real matrix. 8. Let A = [ai j ] be an n × n matrix over R such that a11 is non-zero. Suppose that the kth row of A is k times the first row of A for 2 ≤ k ≤ n. Compute the characteristic polynomial and the minimal polynomial of A. (Hint: Use suitable properties of determinant to simplify the computation of the characteristic polynomial.) 9. Give examples of diagonalizable operators on R3 (a) whose characteristic and minimal polynomials are equal; (b) whose characteristic and minimal polynomials are not equal.

Saikia-Linear Algebra

282

book1

February 25, 2014

0:8

Linear Operators

10. Let A and B be two matrices in Mn (F) having the same trace and the same minimal polynomial of degree n − 1. Prove that the characteristic polynomials of A and B are the same.

11. For each of the following permutation matrices, compute the minimal polynomial over C:  1  0 0

0 0 1

 0  1,  0

 1 0  0  0

0 0 0 1

 0  1,  0

 0 0  0  1

1 0 0 0

0 0 1 0

 0  1 . 0 0

Are the minimal polynomials over R the same as those over C? 12. Compute the minimal polynomials of the following the permutation matrices over C and over R:  0  0 1

1 0 0

0 1 0 0

 0  0 . 1 0

/ 0 13. Find the minimal polynomial of the real permutation matrix A = ai j , where ai j = 1 if i + j = 2 or n + 2, and ai j = 0 otherwise. 14. Give an example of two matrices A and B over R, such that the products AB and BA have different minimal polynomials. 15. Let A ∈ Mn (F) be an upper triangular matrix with diagonal entries a1 , a2 , . . . an as its diagonal entries. Find the conditions on the ai which forces the minimal polynomial of A to have degree n and degree 1, respectively. 16. Let A ∈ M3 (F) be the matrix  0  A = 1  0

0 0 1

 −a0   −a1 ,  −a2

where ai ∈ F. Prove that the characteristic polynomial as well as the minimal polynomial of A over F is x3 + a2 x2 + a1 x + a0. 17. For a fixed matrix A ∈ Mn (F), let T be the linear operator on Mn (F) given by T (B) = AB for any B ∈ Mn (F). Show that the minimal polynomial of T 18. Prove that the matrices  0 0  A = 1 0  0 1

is the same as the minimal polynomial of the matrix A.  1  0  0

and

 0  B = 0  1

1 0 0

 0  1  0

have the same minimal polynomial over C by showing that are similar over C. Are they similar over R?

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

283

19. Show that  1  A = 1  1

 1  1  1

1 1 1

 3  B = 0  0

and

 0  0  0

0 0 0

are similar over R by exhibiting two bases of R3 with respect to which A and B are the matrices of a linear operator on R3 . Hence find the minimal polynomial of A. (Hint: Consider the linear operator on R3 which is represented by A with respect to the standard basis of R3 .) 20. Let A = [ai j ] be an n × n real matrix such that ai j = 1 for all i, j. Find the characteristic and the minimal polynomial of A by showing that A is similar to B = [bi j ], where b11 = n and bi j = 0 if either i or j is different from 1. 21. Compute the minimal polynomial of the permutation matrix   0 0 1   A = 1 0 0   0 1 0

over R. Further, if C = A2 + A + I3, then find the minimal polynomial of C, too. 22. Show that the following matrices over any field F have the same characteristic and minimal polynomial:  0 1 A =  0 0

0 0 0 0

0 0 0 1

 0  0  and 0  0

 0 1 B =  0 0

0 0 0 0

0 0 0 0

 0  0 . 0  0

Also, verify that A and B are not similar over F by showing that it is not possible to find an invertible matrix P over F such that AP = PB. ( P 1 P2 .) P3 P4 23. Let D be the differential operator on the polynomial ring R[x]. Show that D can have no minimal polynomial over R by proving that there is no polynomial f (x) over R such that f (D) is the zero operator on R[x]. (Hint: For easier computation, write P as a block matrix

'

5.5 INVARIANT SUBSPACES A deeper analysis of linear operators on a vector space depends on subspaces on which they act as linear operators again. The eigenspaces of a linear operator are examples of such spaces. If W is an eigenspace of a linear operator T belonging to an eigenvalue λ, then for any w ∈ W, the image T w is again a vector in W, for T (T w) = T (λw) = λT w. Thus, T maps W into W, and we express this property of W with respect to T by saying that W is invariant under T .

Saikia-Linear Algebra

284

book1

February 25, 2014

0:8

Linear Operators

Definition 5.5.1. Let T be a linear operator on a vector space V. A subspace W of V is said to be T -invariant if for any w ∈ W, T w ∈ W. In other words, W is T -invariant if T (W) ⊂ W. This definition is valid for infinite-dimensional vector spaces too. It is clear that if W is a T -invariant subspace, then T can be considered a linear operator on W which is itself a vector space on its own. First, a few standard examples. EXAMPLE 42 Any vector space V and its zero subspace {0} are T -invariant for any operator T on V. EXAMPLE 43 For the identity operator IV on any vector space V, every subspace of V is invariant. EXAMPLE 44 For any operator T on a vector space V, consider v ∈ ker T . Then T (T v) = T (0) = 0 showing that ker T is T -invariant. Similarly, the subspace Im(T ) is T -invariant. EXAMPLE 45 Suppose that for an operator T on a vector space V over a field F, there is a vector v ∈ V such that T n v = 0 for some positive integer n ≥ 2. Let w be any vector in W, the 4 j subspace spanned by the vectors v, T v, T 2 v, . . . , T n−1 v. Since w = n−1 j=0 a j T v for 4n−1 4 j+1 v as T n v = 0. Thus T w ∈ W a j ∈ F, it follows that T w = j=0 a j T j+1 v = n−2 j=0 a j T proving that W is T -invariant. The following proposition yields a large number of T -invariant subspaces for a linear operator T . Proposition 5.5.2. Let T be a linear operator on a vector space V. If S is a linear operator on V such that T and S commute, that is, T S = S T , then the subspaces ker S and Im(S ) are T -invariant. Proof. If v ∈ ker S , then S (T v) = (S T )v = (T S )v = T (S v) = T (0) = 0, which implies that T v ∈ ker S . It follows that ker S is T -invariant. On the other hand, for any v ∈ V, T (S v) = S (T v) is clearly in Im(S ). So Im(S ) is T -invariant too. ! Recall that if f (x) = a0 + a1 x + · · · + an xn is a polynomial over a field F, then for any linear operator T on a vector space V over the field F, the symbol f (T ) = a0 IV + a1 T + · · · + an T n is again a linear operator on V. Since T commutes with any power of T , it follows that T commutes with f (T ) for any polynomial f (x). Therefore, the following corollary results. Corollary 5.5.3. Let V be a vector space over a field F, and T is a linear operator on V. For any polynomial f (x) ∈ F[x], the subspaces ker f (T ) and Im( f (T )) are T -invariant. In particular, for any eigenvalue λ of T , the eigenspace W = ker(T − λIV ) is T -invariant. For deriving the second assertion, take f (x) = x − λ ∈ F(x). As we have remarked at the outset, a linear operator T can be considered a map on any T -invariant subspace. We now discuss this idea in detail. Let W be a T -invariant subspace for a linear operator T on a vector space V over a field F and so T (W) ⊂ W. It is convenient to think of the linear map T from

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

285

W to W as different from T ; it is, in fact, different in the sense that its domain and range are restricted to W. Definition 5.5.4. as

If W is a T -invariant subspace of V, then the restriction T W of T to W is defined T W (w) = T w for any w ∈ W.

It is clear that T W is a linear operator on W; note that T W (v) is not defined for v " W. We now specialize to the case when W is a T -invariant subspace of a finite-dimensional vector space V. We wish to obtain a matrix representation of T which will reflect the fact that W is a T invariant subspace of V. To do so, we first fix a basis, say v1 , v2 , . . . , vm of W, and then extend it to a basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V. First consider the matrix of T W on W: if, for 1 ≤ j ≤ m, T W (v j ) =

m 1

b i j vi

i=1

for scalars bi j ∈ F, then the matrix B of T W with respect to the basis of W is in Mm (F) given by B = [bi j ]. Let A be the n × n matrix of T with respect to the extended basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V. Since T v j = T W (v j ) for 1 ≤ j ≤ m, it follows that Tvj =

m 1

b i j vi

i=1

for 1 ≤ j ≤ m,

for the same scalars bi j which appeared in the matrix B for T W . Therefore, the relations that will yield the entries of the matrix A relative to the basis v1 , v2 , . . . , vm , vm+1 , . . . , vn of V will look like Tvj =

m 1 i=1

bi j vi + 0.vm+1 + · · · + 0.vn

for 1 ≤ j ≤ m

and Tvj =

n 1 i=1

ci j vi

for m + 1 ≤ j ≤ n,

(5.14)

for some scalars ci j . It follows that the first m columns of the matrix A have all zero entries below the mth row and that if we ignore these zero entries, the first m columns of A are exactly the same as the columns of B. In other words, A can be written in terms of blocks of submatrices as ' ( B D A= , (5.15) O C where B is the m×m matrix representing T W , D and C are, respectively, m×(n −m) and (n −m) ×(n −m) matrices with entries determined by Equation (5.14), and finally, O is the (n − m) × m zero matrix. We illustrate this construction by the following example. EXAMPLE 46 Suppose that the linear operator T : R4 → R4 has an eigenvalue λ = 2 such that the eigenspace W = ker(T − 2I) belonging to the eigenvalue 2 has dimension 2. Choose

Saikia-Linear Algebra

286

book1

February 25, 2014

0:8

Linear Operators

a basis v1 , v2 of W; they are necessarily eigenvectors of T with eigenvalue 2. Since W is an eigenspace, it is T -invariant and so the restriction T W is defined and T W (v1 ) = T v1 = 2v1 T W (v2 ) = T v2 = 2v2 . Thus, the matrix of T W with respect to the given basis of W of eigenvectors is the 2 × 2 matrix ' ( 2 0 B= . 0 2 Next, we extend the basis v1 , v2 of W to a basis v1 , v2 , v3 , v4 of R4 . Now, even if we do not know anything specific about v3 and v4 , it is clear that the shape of the matrix A of T will be determined by the following relations: T v1 = 2v1 + 0.v2 + 0.v3 + 0.v4 T v2 = 0.v1 + 2.v2 + 0.v3 + 0.v4 T v3 = c13 v1 + c23 v2 + c33 v3 + c43 v4 T v4 = c14 v1 + c24 v2 + c34 v3 + c44 v4 for some scalars ci j in R. Using the notations of Equation (5.15), we have D, C and O as 2 × 2 matrices with ' ( ' ( ' ( c13 c14 c33 c34 0 0 D= , C= and O = . 0 0 c23 c24 c43 c44 Thus, the matrix of T relative to the basis of V will be the block matrix ' ( B D A= . O C Note that if we would have known that v3 and v4 are both eigenvectors of T for another eigenvalue λ, then D would have been the zero matrix, and C would have been the diagonal matrix diag[λ,λ ], so that A would have been the diagonal matrix diag[2, 2,λ ,λ ]. In that case, A will look like ' ( B O A= , O C where both B and C are diagonal matrices. Direct Sums of Matrices Keeping in mind the preceding discussion, we now consider the general case of such matrix descriptions of operators when the vector space on which the operator is acting can be expressed as a direct sum of invariant subspaces. So assume that for a linear operator T on a finite-dimensional vector space V, we have V = W1 ⊕ W2 ⊕ · · · ⊕ Wk

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

287

where W1 , W2 , . . . , Wk are T -invariant subspaces. Denote the restriction of T to Wi as T i . Assume further that we have chosen basis Bi for each Wi , and the matrix of the restriction T i with respect to this basis of Wi is Ai . Now, by the properties of direct sum (see Proposition 3.5.4), the union B of the bases Bi is a basis of V. Recall that for us any basis is an ordered basis. Thus the ordered basis B is more then just a union of the ordered bases Bi ; we require that in B, the vectors of Bi will precede those of Bi+1 , and will appear exactly in the same order in B as in Bi . It is, as if we are stringing together the bases Bi to obtain B. Let the matrix of T relative to the basis B be A. We want to relate A to the various Ai . Note that, by the definition of the matrix A, non-zero entries in the columns of A determined by the image of any vector in Bi under T can occur only in the rows corresponding to the vectors in Bi , as these vectors span the T -invariant subspace Wi . Since the action of T on the basis vectors of Bi is the same as the action of T i , it follows that the entries in the rows and columns in A corresponding to the vectors of Bi form precisely the submatrix Ai . Therefore, we can represent A as the following block diagonal matrix:  A1  0   . A =   .  .  0

0 A2 . . . 0

. .

. .

. .

.

.

.

 0   0  .  , .   .   Ak

where the presence of the symbols 0 reflects the fact that the entries in the rows and columns corresponding to each Ai other than its own entries are all zeros. It will be convenient to call such a block diagonal matrix a direct sum of the matrices A1 , A2 , . . . , Ak . We will sometimes write such a direct sum as A1 ⊕ A2 ⊕ · · · ⊕ Ak .

(5.16)

We have proved the following proposition which will be quite useful in determining the simplest matrix form of linear operators. Proposition 5.5.5. Suppose that a finite-dimensional vector space V can be decomposed as a direct sum of T -invariant subspaces W1 , W2 , . . . , Wk . Let A1 , A2 , . . . , Ak be the matrices of the restrictions T i of T to Wi with respect to some bases of Wi . If we string together the chosen bases of the Wi to get a basis of V, then with respect to that basis of V, the matrix of T is the direct sum of the matrices A1 , A2 , . . . , Ak . We note that this proposition generalizes the situation obtained for diagonalizable operators. For, a linear operator T on a finite-dimensional vector space V is diagonalizable if and only if V is a direct sum of distinct eigenspaces. Each of these eigenspace is T -invariant, and the matrix of the restriction of T on such an eigenspace for eigenvalue λ j with respect to a basis of eigenvectors is clearly the scalar matrix λ j Id j , where d j is the dimension of the eigenspace. The matrix of T , with respect to the basis of V formed by stringing together the bases of the distinct eigenspaces is thus the diagonal matrix, which is the direct sum of the scalar matrices λ1 Id1 , λ2 Id2 , . . . , λk Idk . Two remarks about direct sum of matrices are in order.

Saikia-Linear Algebra

288

book1

February 25, 2014

0:8

Linear Operators

(i) The converse of Proposition (5.5.5) holds. If the matrix A of a linear operator T on a finitedimensional vector space V can be expressed as a direct sum of submatrices, then V can be decomposed as a direct sum of T -invariant subspaces in such a way that the submatrices are the matrices of the restrictions of T to these subspaces. (ii) If   0 . . . 0  A1   0 A . . . 0  2   . . .  A =   . .   .   . . .    0 0 . . . Ak then, by block multiplication one can see easily that  m 0 . . A1  0 A2 m . .   . . Am =  .  .  . .  0 0 . .

. .

.

for any positive integer m.

 0   0  .   .   .   Ak m

(5.17)

Using the notation we had introduced in Equation (5.16), we can give a brief description of the preceding expression for the power of a direct sum of matrices as follows: if A = A1 ⊕ A2 ⊕ · · · ⊕ Ak , then for any positive integer m, Am = A 1 m ⊕ A2 m ⊕ · · · ⊕ A k m . Let us now go back to the general discussion about T -invariant subspaces. The first task is to relate the minimal and the characteristic polynomials of a linear operator to those of its restriction to an invariant subspace. Proposition 5.5.6. Let T be a linear operator on a finite-dimensional vector space V over a field F and W a T -invariant subspace of V. If T W is the restriction of T to W, then (a) the minimal polynomial of T W divides the minimal polynomial of T ; (b) the characteristic polynomial of T W divides the characteristic polynomial of T . Proof. Let m(x) and mW (x) denote the minimal polynomials of T and T W , respectively. Consider any polynomial f (x) ∈ F[x] such that f (T ) = z, the zero map on V. This implies that f (T )v = 0 for any v ∈ V. Thus f (T ) takes every vector of W also to the zero vector of W (the zero vector of W is the same as that of V), which means that f (T W ) acts as the zero map on W. In other words, T W satisfies any polynomial that T satisfies. In particular, T W satisfies m(x). By properties (see Proposition 5.4.9) of minimal polynomials, it follows that mW (x) divides m(x) proving assertion (a).

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

289

We deal with assertion (b) now. Let dim V = n and dim W = m. Fix a basis of W, and let B be the m × m matrix of T W with respect to the fixed basis of W. Extend that basis to a basis of V, and let A be the matrix of T relative to that basis of V. Then, as in Equation (5.15), we have ' ( B D A= , O C where C is some (n − m) × (n − m) square matrix. It then follows that the characteristic polynomial of A is the product of the characteristic polynomials of matrices B and C (see Exercise 13). Since the characteristic polynomials of matrices A and B are the characteristic polynomials of operators T and T W , respectively, the second assertion follows. ! One can say more about the connection between the characteristic polynomial of T and that of T W . For that we need to introduce the linear operator induced by T on the quotient space V/W for a T -invariant subspace W of V. We have discussed quotient spaces in detail in Section 3.9; here we recall the main points briefly. For any subspace W of V over a field F, the quotient space V = V/W is the collection of all cosets v = v + W for v ∈ V; v = u if and only if v − u ∈ W. V/W can be made into a vector space over F be defining addition and scalar multiplication of cosets as follows: for any v, u ∈ V and a ∈ F, v + u = (v + W) + (u + W) = (v + u) + W = v + u, and av = a(v + W) = (av) + W = av. Clearly 0 = 0 + W = w + W = w (for any w ∈ W) is the additive identity in V/W. Now let T be a linear operator on V and W be a T -invariant subspace of V. We define T : V → V by T (v) = T (v + W) = T v + W = T v. Since any coset can be represented by different vectors and the definition of T depends on the vectors representing cosets, we need to verify that T is well-defined, that is, the image of a coset under T is independent of the choice of the vector representing the coset. So suppose for vectors v1 and v2 in V, the cosets v1 = v2 . Thus v1 − v2 ∈ W and so T v1 − T v2 = T (v1 −v2 ) ∈ W as W is T -invariant. Therefore, by the definition of equality in V, we infer that T v1 +W = T v2 + W which is another way of saying that T (v1 ) = T (v2 ). This proves that T is well-defined. It is an easy exercise to show, using the linearity of T , that T is a linear operator on the quotient space V; one says that T is the operator induced by T on V. Now assume that V is finite-dimensional. In this case, we are interested in relating a matrix representation of T to that of T . As we have seen in Proposition (3.9.3), if we extend a basis {w1 , w2 , . . . , wm } of W to a basis {w1 , w2 , . . . , wm , wm+1 , . . . , wn } of V, then the cosets wm+1 , . . . , wn form a basis of V. Suppose that the matrix of T with respect this basis of V is A = [ai j ] whereas the matrix of the restriction T W with respect to the basis {w1 , w2 , . . . , wm } of W is B. Since W is T -invariant, T w j , for 1 ≤ j ≤ m, is a linear combination of only the basis vectors of W. It follows, as in the proof of the preceding proposition, that ' ( B D A= , (5.18) O C

Saikia-Linear Algebra

290

book1

February 25, 2014

0:8

Linear Operators

where O represents all the zeros below the entries of B. Next, we observe that, for m + 1 ≤ j ≤ n, the relation T w j = a1 j w1 + a2 j w2 + · · · + an j wn

implies that

T (w j ) = am+1, j wm+1 + am+2, j wm+2 + · · · + an j wn . Thus the matrix of T with respect to the basis {wm+1 , wm+2 , . . . , wn } of V is precisely C, the submatrix of A in Equation (5.18). Since the same equation shows that the characteristic polynomial of A is the product of the characteristic polynomials of B and C, we have just proved the first assertion of the following proposition. Proposition 5.5.7. Let T be a linear operator on a finite-dimensional vector space V over a field F and W a T -invariant subspace of V. Let T W be the restriction of T to W and T the linear operator on the quotient space V = V/W induced by T . Then the following hold. (a) The characteristic polynomial of T is the product of the characteristic polynomial of T W and that of T . (b) The minimal polynomial of T divides the minimal polynomial of T . Proof. To prove the second assertion, it suffices to show that if T satisfies a polynomial f (x) ∈ F[x], then T also satisfies f (x). So suppose that T satisfies f (x) = a0 + a1 x + · · · + an xn over F. Thus a0 v + a1 T v + · · · + an T n v = 0,

(5.19)

for any v ∈ V. Now note that for any v1 , v2 and v in V and scalars a, b ∈ F, by properties of operations in the quotient space V, one has av1 + bv2 = av1 + bv2

k

and T (v) = T k v.

Using these properties, one can deduce from Equation (5.19) that, for any v ∈ V, n

a0 v + a1 T (v) + · · · + an T (v) = 0. 2

n

Thus a0 + a1 T + a2 T + · · · + an T acts as the zero operator on V, which shows that T too satisfies the polynomial f (x). In particular, T satisfies the minimal polynomial of T and so by the definition of the minimal polynomial of T , it divides the minimal polynomial of T . ! Cyclic Subspaces We now introduce a special invariant subspace that will come up time and again. Definition 5.5.8. Let T be a linear operator on a vector space V over a field F, and let v be a vector in V. The subspace of V spanned by the sequence of vectors v, T v, T 2 v, . . . , T k v, . . . is known as the T -cyclic subspace generated by v, and denoted by Z(v, T ).

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

291

If Z(v, T ) = V, then sometimes it is said that v is a T -cyclic vector for V, or simply that T has a cyclic vector. We now observe that for any polynomial f (x) ∈ F[x], the operator f (T ) on V can be thought of as a linear combination (with coefficients from F) of some finitely many of the powers T k for k ≥ 0. Thus we obtain the following description of Z(v, T ). Z(v, T ) = { f (T )v | f (x) ∈ F[x] }.

(5.20)

The following gives yet another description of a T -cyclic subspace. Proposition 5.5.9. Let T be a linear operator on a vector space V over a field F. For any v ∈ V, the T -cyclic subspace Z(v, T ) is the smallest T -invariant subspace of V containing v. Proof. For any polynomial f (x) ∈ F[x], it is trivial that T f (T ) is again a polynomial in T . Therefore, for any f (T )v ∈ Z(v, T ), where f (x) is some polynomial over F, T ( f (T )v) = (T f (T ))v is clearly in Z(v, T ) which shows that Z(v, T ) is T -invariant. On the other hand, if W is a T -invariant subspace of V containing v, then T v is in W and so T 2 v = T (T v) is also in W. Continuing in this manner, it can be shown that T k v ∈ W for any integer k ≥ 0. Since such vectors span Z(v, T ), one concludes that Z(v, T ) is contained in W. The proof of the proposition is complete. ! Now consider Z(v, T ) for a linear operator T on a finite-dimensional vector space V and for a nonzero v ∈ V. So the vectors T k v spanning Z(v, T ) cannot all be linearly independent. On the other hand, as v is non-zero, the singleton {v} is linearly independent. Thus, it is possible to find the largest positive integer m such that S = {v, T v, . . . , T m−1 v} is linearly independent. In that case, S ∪ {T m v} is linearly dependent and so T m v is in the span of the vectors in S. Suppose that T m v = c0 v + c1 T v + c2 T 2 v + · · · + cm−1 T m−1 v,

(5.21)

for some scalars c0 , c1 , . . . , cm−1 . Then applying T to both sides of the preceding relation, we see that T m+1 v = c0 T v + c1 T 2 v + · · · + cm−2 T m−1 v + cm−1 T m v,

(5.22)

the right hand side of which can expressed again as a linear combination of vectors in S by replacing T m v by its expression in Equation (5.21). It is clear that continuing in the same manner, one can show that T k v for every k ≥ (m + 1) is in the span of vectors in S. This proves the following. Proposition 5.5.10. Let T be a linear operator on a finite-dimensional vector space V, and let Z(v, T ) be the T -cyclic subspace generated by a non-zero vector v in V. Suppose that m is the largest positive integer such that S = {v, T v, . . . , T m−1 v} is linearly independent. Then dim Z(v, T ) = m.

Saikia-Linear Algebra

292

book1

February 25, 2014

0:8

Linear Operators

Note that m is also the least positive integer such that T m v is a linear combination of the vectors in the sequence v, T v, T 2 v, . . . , T k v, . . . preceding it. Thus, a T -cyclic subspace of dimension m has a special basis which can be expressed in terms of T as well as the generating vector v. It is convenient to name this basis. Definition 5.5.11. Let T be an operator on a finite-dimensional vector space V, and let v be any non-zero vector in V. Assume that the dimension of the T -cyclic subspace Z(v, T ) generated by v is m. Then the basis of Z(v, T ) given by v, T v, T 2 v, . . . , T m−1 v is called a T -cyclic basis of Z(v, T ). In case T m v = 0, we will sometimes refer to the preceding basis of Z(v, T ) as a T -nilcyclic basis of Z(v, T ), and Z(v, T ) as a T -nilcyclic subspace. The scalars appearing in Equation (5.21) are important too, and we incorporate them in a polynomial that plays a crucial role in what follows. Definition 5.5.12. For a linear operator T on a finite-dimensional vector space V and a non-zero vector v ∈ V, suppose that the dimension of dim Z(v, T ) is m. Suppose further that T m v = −am−1 T m−1 v − · · · − a1T v − a0v for some scalars ai . (note that ai = −ci of Equation (5.21) for notational convenience). Then the polynomial fv (x) = a0 + a1 x + · · · + am−1 xm−1 + xm is called the T -annihilator of v. Sometimes, it is also called the T -annihilator of the subspace Z(v, T ). It is clear that the T -annihilator fv (x) is the unique monic polynomial of least degree such that fv (T )v = 0. Note that • Every non-zero vector in a finite-dimensional space V has a unique T -annihilator for any linear operator T on V. • The degree of the T -annihilator of v is m if and only if the dimension of the T -cyclic subspace Z(v, T ) is m. Let us look at some examples. EXAMPLE 47 If v is an eigenvector of T belonging to an eigenvalue λ, then T v = λv, T 2 v = λ2 v, . . ., so it clear that Z(v, T ) is one-dimensional with {v} as a basis. Note that the T annihilator of the eigenvector v is x − λ, which is the minimal polynomial of T restricted to the eigenspace corresponding to λ.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

293

EXAMPLE 48 If dim V > 1, then the identity map I on V can have no cyclic vector as I k v = v for all non-negative integers k. EXAMPLE 49 If T : R3 → R3 is represented by

 0  1 0

0 0 1

 0  0  0

with respect to the standard basis, then T has a cyclic vector. In fact, e1 = (1, 0, 0) is a T -cyclic vector, that is, Z(e1 , T ) = R3 as e1 , T e1 = e2 , T 2 e1 = e3 form a basis of R3 ; also T 3 e1 = 0. It is also easy to verify that the T -annihilator of e1 is x3 , the minimal polynomial of T . These examples suggest an interesting connection between the T -annihilator of a vector v and the minimal polynomial of the restriction of T to the invariant subspace Z(v, T ). Proposition 5.5.13. Let T be a linear operator on a finite-dimensional vector space V, and v be any non-zero vector in V. Let T v be the restriction of T to the T -invariant, T -cyclic subspace Z(v, T ) generated by v. Then the minimal as well as the characteristic polynomial of T v is precisely the T annihilator fv (x) of v. Note that the degrees of these two polynomials equal the dimension of Z(v, T ). Proof. Let fv (x) = a0 + a1 x + · · · + am−1 xm−1 + xm be the T -annihilator of v. Therefore, the vectors v, T v, . . . , T m−1 v form a T -cyclic basis of Z(v, T ). Since by the definition of T -annihilator, fv (T )v = 0, and since fv (T ) commutes with every power of T , it follows that fv (T ) takes every vector in the cyclic basis, and hence every vector of Z(v, T ), to the zero vector. But the action of T on Z(v, T ) is the same as the action of T v . Thus, we may conclude that fv (T v ) is the zero map on Z(v, T ). In other words, T v satisfies the polynomial fv (x) on Z(v, T ). On the other hand, if T v satisfies any polynomial of degree k < m, then it is easy to see that T k v is a linear combination of the vectors v, T v, . . . , T k−1 v. As k < m, this contradicts the choice of fv (x) as the T -annihilator of v. It follows that fv (x) has to be the minimal polynomial of T v . To establish the assertion about the characteristic polynomial of T v , we first work out the matrix of T v with respect to the T -cyclic basis of Z(v, T ). Note that T and therefore T v applied to any of these basis vectors in that list v, T v, . . . , T m−1 v except the last one, produces the next vector in the list. On the other hand, T applied to the last vector in the list produces T m v which can be expressed by using the T -annihilator fv (x) as the linear combination T m v = −a0 v − a1 T v − · · · − am−1 T m−1 v

Saikia-Linear Algebra

294

book1

February 25, 2014

0:8

Linear Operators

of the basis vectors. Thus the matrix of matrix:  0 0 1 0  0 1  . . C =   . .  . .  0 0  0 0

T v with respect to the T -cyclic basis is the following m × m 0 0 0 . . . 0 0

. . .

0 0

. . .

0 0

. . .

0 0

0 0 0 . . . 1 0

0 0 0 . . . 0 1

 −a0   −a1   −a2  . . .  .  −am−2  −am−1

(5.23)

Expanding the determinant det(xIm − C) by the first column, and then applying induction, one can easily show that the characteristic polynomial of C and therefore of T v is a0 + a1 x + · · · + am−1 xm−1 + xm , which is the T -annihilator of v.

!

The special type of matrix we have just considered is quite useful and so deserves a name. Definition 5.5.14. The m × m matrix defined in Equation (5.23) is called the companion matrix of the monic polynomial a0 + a1 x + a2 x2 + · · · + am−1 xm−1 + xm . If we denote the polynomial as f (x), then its companion matrix is denoted by C( f (x)). Note that if A is the companion matrix of the polynomial f (x) = a0 + a1 x + a2 x2 + · · · + am−1 xm−1 + then

xm , • • • • •

the order of A equals the degree of f (x); all the subdiagonal entries of A are equal to 1; the negatives of the coefficients (except the leading one) of f (x) appear on the last column of A; every entry off the subdiagonal or the last column is zero; the characteristic as well as the minimal polynomial of A are f (x) itself.

Thus, for example, if a matrix is to be produced whose minimal polynomial is a given monic polynomial, one needs only to consider its companion matrix. Cyclic Subspaces for Nilpotent Operator Nilpotent operators provide nice examples of T -cyclic, T -invariant subspaces. Recall that an operator T on V is nilpotent of index r, if r is the smallest positive integer such that T r is the zero map on V. This means that T r−1 is not zero and so we may choose a vector v in V such that T r−1 v is non-zero. It is now easy to verify that the vectors v, T v, T 2 v, . . . , T r−1 v

(5.24)

are linearly independent. Since the zero vector T r v is a trivial linear combination of the vectors in the list (5.24), according to Definition (5.5.11), these vectors do form a nilcyclic basis of the T -invariant T -cyclic subspace W = Z(v, T ) of dimension r.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

295

As T r v is the zero vector, it follows from Equation (5.23) that the matrix of T W , the restriction of T to W, is the r × r matrix having 1 on the subdiagonal and zeros everywhere else, which is Jr (0), the elementary Jordan block of order r with eigenvalue 0. Proposition 5.5.15. Let T be a nilpotent operator on a finite-dimensional vector space V. If the index of nilpotency of T is r, then there is a T -cyclic, T -invariant subspace W of dimension r such that the matrix of T W with respect to the corresponding T -nilcyclic basis of W is the elementary Jordan block Jr (0). We end this section by pointing out an interesting relationship between a linear operator T on a vector space V and the projection on V onto a T -invariant subspace. Recall that (see Proposition (4.2.12)) given a projection P, V can be expressed as W ⊕ K where W is the image of P and K the kernel. Lemma 5.5.16. Let T be a linear operator on a vector space V, and let P be a projection on V with range W and kernel K. Then, W and K are T -invariant subspaces of V if and only if T and P commute. Proof. The assertion in one direction follows from a general result about commuting operators. If T commutes with P, then by Proposition (5.5.2), the range and kernel of P are T -invariant. Thus the special properties of projections are needed only when we prove the lemma in the other direction. So assume that W and K are T -invariant. Now, as V = W ⊕ K, any v ∈ V can be written as v = v1 + v2 = Pv1 +v2 . Here, we have used the fact that w ∈ W if and only if Pw = w as W is the range of the projection P. But K is T -invariant so P(T v2 ) = 0. It follows that P(T v) = P(T (Pv1 )) = T (Pv1 ) as T (Pv1 ) is in W. Using the fact that Pv2 is the zero vector, we then see that (PT )v = T (Pv1 ) = T (Pv1 ) + T (Pv2 ) = T P(v1 + v2 ) which is (T P)v. This completes the proof.

!

This result can be generalized to the situation when a vector space has a direct sum decomposition of finitely many subspaces. We follow the notation of Proposition (4.2.13) which asserts the following: if V = W1 ⊕ · · · ⊕ Wk , and P1 , . . . , Pk are the associated projections with the image of P j as W j , then Pi P j for i ! j is the zero map and P1 + · · · + Pk is the identity map on V. Proposition 5.5.17.

Let T be a linear operator on a vector space V, and let V = W1 ⊕ W2 ⊕ · · · ⊕ Wk

be a direct sum decomposition of V with associated projections P1 , P2 , . . . , Pk . Then, W j , for each j, is T -invariant if and only if T commutes with P j . The proof needs a slight modification of the proof of the preceding lemma and left to the reader. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. All given vector spaces are finite-dimensional. (a) If every subspace of a vector space is invariant under a linear operator T on V, then T = aI for some scalar a, where I is the identity operator on V.

Saikia-Linear Algebra

296

book1

February 25, 2014

0:8

Linear Operators

(b) If V = W1 ⊕ W2 , where W1 is T -invariant for some linear operator T on V, then W2 is also T -invariant. (c) Every linear operator T on R2 has a one-dimensional T -invariant subspace. (d) If the T -cyclic subspace generated by some non-zero v ∈ V is one-dimensional, then v is an eigenvector of T for some eigenvalue. (e) If D is the differential operator on R3 [x], then R3 [x] is D-cyclic. (f) For a linear operator T on a finite-dimensional vector space V, the T -cyclic subspace generated by any v ∈ V is the same as the T -cyclic subspace generated by T v. (g) If P is a projection of a finite-dimensional vector space V, then V = Im(P) ⊕ ker P is a direct sum of T -invariant subspaces. (h) If T is a nilpotent operator of index of nilpotency r on a vector space V where r < dim V, then T has no cyclic vector. (i) If a linear operator T on a finite-dimensional vector space has a cyclic vector, then so does T 2 . (j) If for a linear operator T on a finite-dimensional vector space, T 2 has a cyclic vector, then so does T . (k) If (x + 1) is a divisor of the minimal polynomial of a linear operator T on a vector space V, then there is a vector v ∈ V whose T -annihilator is precisely (x + 1).

(l) If the minimal polynomial of the restriction T W of a linear operator T to a T -invariant subspace W is x − λ, then λ is an eigenvalue of T . (m) If the characteristic polynomial of the restriction T W of a linear operator T to a T -invariant subspace W is (x − λ)2 , then λ cannot be an eigenvalue of T . 2. Find all T -invariant subspaces of R2 for the linear operator T whose matrix with respect to the standard basis of R2 is ( ' −1 2 A= . −1 2 3. In each of the following, determine whether the given subspace W of the vector space V is T -invariant for the given linear operator: (a) V = R[x], T ( f (x)) = x f ' (x); W = R2 [x]; (b) V = R3 , T (x1 , x2 , x3 ) = (x1 + x2 , x2 + x3 , x3 + x1 );

W = {(a, a, a) | a ∈ R};

(c) V = R3 , T (x1 , x2 , x3 ) = (−x3 , x1 − x3 , x2 − x3 ); W = {(x1 , x2 , x3 ) | x1 + x2 + x3 = 0}; (d) V = Mn (R), T (A) = At ; W = {A | A = At }; ' ( 0 1 (e) V = M2 (R), T (A) = A; W = {A | A = At }. 1 0 4. Let T be the linear operator on the vector space V of all real-valued continuous functions on the interval [0, 1] given by J x (T f )(x) = f (t)dt for 0 ≤ x ≤ 1. 0

Which of the following subspaces of V are T -invariant?

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Invariant Subspaces

297

(a) The subspace of all differentiable functions on [0, 1]; (b) The subspace of all f ∈ V such that f (0) = 0; (c) The subspace of all polynomials of degree at most n; (d) The subspace spanned by sin x and cos x;

(e) The subspace spanned by {1, sin x, cos x}. 5. Let T be a linear operator on a finite-dimensional vector space V. For any v ∈ V, show that the T -annihilator of v is a divisor of the minimal polynomial of T . 6. Let T be a linear operator on a finite-dimensional vector space V. Prove that T has a cyclic vector if and only if there is some basis of V relative to which T is represented by the companion matrix of the minimal polynomial of T . 7. Let T be a diagonalizable linear operator on an n-dimensional vector space V. (a) If T has a cyclic vector, show that T has n distinct eigenvalues. (b) If T has n distinct eigenvalues, and if eigenvectors v1 , v2 , . . . , vn of T form a basis of V, then show that v = v1 + v2 + · · · + vn is a cyclic vector of T .

8. Let T and S be linear operators on a vector space of dimension n. If S commutes with T , then show that every eigenspace of T is S -invariant. Hence, prove that if T is diagonalizable with n distinct eigenvalues and S commutes with T then S itself is diagonalizable. 9. Let T be the linear operator on R3 such that its matrix with respect to the standard basis is  1  0 0

0 1 0

 0  0.  −1

(a) Show that T has no cyclic vector. (b) Determine the T -cyclic subspaces W1 and W2 generated by v1 = (1, 1, 0)t and v2 = (1, 1, −1)t , respectively.

(c) Determine the T -annihilators of v1 and v2 . 10. Let T be the linear operator on C3 such that its matrix with respect to the standard basis is  0  1 0

i −1 1

 0  −i.  1

Find the T -cyclic subspaces generated by e1 = (1, 0, 0)t and v = (1, 0, i)t , respectively. What are the T -annihilators of these two vectors? 11. Prove Proposition (5.5.17). 12. Prove in detail that if a matrix A ∈ Mn (F) can be expressed as a block matrix A=

'

B O

D C

(

then the characteristic polynomial of A is the product of the characteristic polynomials of B and C.

Saikia-Linear Algebra

298

book1

February 25, 2014

0:8

Linear Operators

13. Let T be a linear operator on a finite-dimensional vector space V. Suppose that V can be decomposed as a direct sum V = W1 ⊕ W2 ⊕ · · · ⊕ Wk of T -invariant subspaces. Let T i be the restriction of T to Wi . Prove that the characteristic polynomial of T is the product of the characteristic polynomials of all the restrictions T 1 , T 2 , . . . , T k . 14. Let T be the linear operator on the real vector space R3 [x] of all real polynomials of degree at most 3 given by T ( f (x)) = f '' (x), the second derivative of f (x). Let W1 and W2 be the T -cyclic subspaces generated by v1 = x3 and v2 = x2 , respectively. If T 1 and T 2 are the restrictions of T to W1 and W2 , respectively, compute the characteristic polynomials of T 1 and T 2 . Hence, find the characteristic polynomial of T . 15. Let T be the linear operator on R4 whose matrix with respect to the standard basis is  1 0  1  1

1 1 0 0

0 −1 1 0

 0  0 . 0 1

Find the characteristic and minimal polynomials of the restriction T 1 of T to the T -cyclic subspace generated by e1 = (1, 0, 0, 0)t . Determine the characteristic and the minimal polynomials of T without evaluating any determinant. 16. Show that any linear operator on R3 has invariant subspaces of all possible dimensions. 17. For any linear operator T on Rn (n ≥ 2), show that there is a two-dimensional T -invariant subspace of Rn . 18. Let T and S be diagonalizable linear operators on a finite-dimensional vector space V such that T and S commute. (a) If W is the eigenspace of S for some eigenvalue, then show that W is T -invariant. (b) Prove that the restriction T W of T to W is diagonalizable and W has a basis consisting common eigenvectors of T and S . (c) Hence prove that there is a basis of V with respect to which matrices of both T and S are diagonal. 19. Let A and B be diagonalizable matrices in Mn (F) such that A and B commute. Prove that there is an invertible matrix U ∈ Mn (F) such that both U −1 AU and U −1 BU are diagonal. The preceding two exercises show that two commuting diagonalizable linear operators (or two commuting diagonalizable matrices) are simultaneously diagonalizable. 20. Let A and B be diagonalizable matrices in Mn (F) such that A and B commute. Prove that A + B and AB are also diagonalizable.

5.6 SOME BASIC RESULTS In this section, we derive some major theoretical results about linear operators by using appropriate subspaces invariant under the operators. We also explore the relationship between characteristic and minimal polynomials of linear operators.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Some Basic Results

299

We begin by proving the classical result we had referred to earlier, which states that the minimal polynomial of a linear operator divides its characteristic polynomial. We have chosen, from among various proofs of the result, one which uses the idea of invariant subspaces; it is remarkable how the power of the concepts related to invariant subspaces makes the proof truly simple. Theorem 5.6.1. (Cayley–Hamilton Theorem) Let T be a linear operator on a finite-dimensional vector space V over a field F. Then T satisfies its characteristic polynomial, that is, if ch(x) is the characteristic polynomial of T over F, then ch(T ) = z, the zero map on V. Proof. It needs to be shown that ch(T )v = 0 for any v ∈ V. We can assume that v is non-zero. Let W = Z(v, T ) be the T -cyclic subspace generated by v, and let fv (x) ∈ F[x] be the T -annihilator of v. Then, by Proposition (5.5.13), fv (x) is also the characteristic polynomial of the restriction T W of T to the T -invariant subspace W. Therefore, fv (x) divides ch(x), the characteristic polynomial of T (see Proposition 5.5.6). Suppose that ch(x) = q(x) fv (x) for some q(x) ∈ F[x]. Then, ch(T )v = q(T ) fv (T )v = q(T )( fv (T )v) = q(T )(0) = 0, as the T -annihilator of v takes it to the zero vector. The theorem follows.

!

Since the minimal polynomial of T divides any polynomial satisfied by T , we have the following important result. Corollary 5.6.2. Let T be a linear operator on a finite-dimensional vector space over a field F. Then, the minimal polynomial of T divides its characteristic polynomial in F[x]. We reiterate the matrix version of the Cayley–Hamilton theorem next. Theorem 5.6.3. A square matrix over a field F satisfies its characteristic polynomial. Thus, the minimal polynomial of a square matrix divides its characteristic polynomial in F[x]. Since the characteristic polynomial of a linear operator on an n-dimensional vector space or of an n × n matrix has degree precisely n, the following corollary results: Corollary 5.6.4. The degree of the minimal polynomial of a linear operator on an n-dimensional vector space, or a matrix of order n, cannot exceed n. Eigenvalues and Minimal Polynomials Recall that a polynomial f (x) in F[x] has a linear factor x − λ (λ ∈ F) if and only if λ is a root of f (x) in F. Therefore, it follows from the preceding results that any root in F of the minimal polynomial of a linear operator T on a finite-dimensional vector space V over a field F must be a root of the characteristic polynomial of T in F, that is, an eigenvalue of T . Conversely, assume that λ ∈ F is an eigenvalue of T . Thus, for some non-zero vector v ∈ V, T v = λv. Then, it is easy to see that for any scalar c ∈ F and any positive integer k, (cT k )v = cλk v. It follows that given any polynomial f (x) ∈ F[x], f (T )v = f (λ)v.

Saikia-Linear Algebra

300

book1

February 25, 2014

0:8

Linear Operators

This shows that f (λ) is an eigenvalue of the operator f (T ) on V with v as an eigenvector. In particular, if m(x) is the minimal polynomial of T , then 0 = m(T )v = m(λ)v as m(T ) is the zero operator on V. Since v is non-zero, we conclude from the preceding equality that m(λ) = 0. Thus, we have proved the following proposition. Proposition 5.6.5. For a linear operator T on a finite-dimensional vector space over a field F, the eigenvalues of T are precisely the roots of its minimal polynomial in F. Thus, the roots of the characteristic polynomial and the minimal polynomial of T in the underlying field F are the same, apart from their multiplicities. The matrix version of the proposition is clear and we leave it to the reader to formulate and verify such a version. As nonconstant polynomials over F, the characteristic and the minimal polynomial of T can be factorized uniquely as products of irreducible polynomials over F. Since linear polynomials, that is polynomials of degree 1, are irreducible, it follows from the preceding discussion that the characteristic and the minimal polynomial share the same irreducible factors of degree 1 (the number of times such factors appear in the respective factorizations of the two polynomials may be different; see Section 5.2 for relevant results) We now prove the remarkable fact that the characteristic polynomial and the minimal polynomial share even the irreducible factors of degree greater than 1; again, in general, the number of times each such irreducible factor appear in the factorizations of these two polynomials need not be the same. We shall require the following observation, which is a consequence of the uniqueness of factorizations of polynomials into irreducible factors: if p(x) an irreducible polynomial over a field F such that p(x) divides the product f (x)g(x) of polynomials f (x) and g(x) over F, then p(x) divides either f (x) or g(x). Proposition 5.6.6. Let V be a finite-dimensional vector space over a field F and T a linear operator on V. Let ch(x) and m(x) be the characteristic and the minimal polynomial of T over F, respectively. Then the irreducible factors of ch(x) and m(x) in F[x] are the same. Proof. The proof is by induction on dim V. If dim V = 1, then the characteristic polynomial of T is necessarily a linear one such as x − λ for some λ ∈ F. Since the minimal polynomial of T is a divisor of the characteristic polynomial and has be a monic polynomial of positive degree, it must also be x − λ. So the result holds in this case. Let us assume then that dim V > 1. By the induction hypothesis, we can also assume that the proposition holds for any linear operator on a vector space whose dimension is less than that of V. We choose any non-zero vector v ∈ V and let W = Z(v, T ), the T cyclic subspace of V generated by v; also let V be the quotient space V/W. Note that as dim W ≥ 1, by Proposition (3.9.3), dim V = dim V − dim W ≤ n − 1 < dim V. Let T W denote the restriction of T to W and T the operator induced by T on the quotient space V. Further, let ch1 (x) and ch2 (x) be the characteristic polynomials of T W and of T respectively; then ch(x) = ch1 (x)ch2(x) by the first assertion of Proposition (5.5.7). Coming back to the proof proper, let p(x) be an irreducible factor of ch(x), the characteristic polynomial of T . Since ch(x) = ch1 (x)ch2(x), it follows from our remark about an irreducible factor of a product of two polynomials, that either p(x) divides ch1 (x) or it divides ch2 (x). In the first case, as W


is a T-cyclic subspace, ch1(x) is the same as the minimal polynomial m1(x) of T_W (see Proposition 5.5.13), and so p(x) divides m1(x). On the other hand, m1(x) divides m(x), the minimal polynomial of T, by Proposition (5.5.6). Thus we conclude that in this case p(x) divides m(x). Suppose now that the second case holds. Then, as T̄ is a linear operator on a vector space of dimension less than dim V, by the induction hypothesis, p(x) divides the minimal polynomial of T̄. Since Proposition (5.5.7) also asserts that the minimal polynomial of T̄ divides m(x), we are done in this case too. The proof is complete by induction. □

The preceding proof is a modified version of a proof due to Prof. M. Leeuwen.

Recall from Theorem (5.3.20) that the characteristic polynomial of a diagonalizable operator T on a finite-dimensional vector space V over a field F factors completely into linear factors. Proposition (5.6.6) then implies the following result for diagonalizable operators, which will be sharpened shortly.

Corollary 5.6.7. Let T be a diagonalizable linear operator on a finite-dimensional vector space V over a field F. Then its minimal polynomial factors completely into linear factors over F.

We now discuss a couple of examples which illustrate some of the results of this section. We point out that all the results about the relationships between minimal polynomials, characteristic polynomials and eigenvalues for operators have obvious counterparts for matrices, and we will use them without any comment.

EXAMPLE 50 If a linear operator on a finite-dimensional vector space or a square matrix has (x − λ1)^{n1}(x − λ2)^{n2} · · · (x − λk)^{nk} as its characteristic polynomial for distinct λi, then its minimal polynomial has to be (x − λ1)^{r1}(x − λ2)^{r2} · · · (x − λk)^{rk}, where 1 ≤ ri ≤ ni for each i = 1, 2, . . . , k. This follows as any linear factor is an irreducible polynomial over any field.

EXAMPLE 51 Let T be a linear operator on R^4 having the two eigenvalues 1 and 2. Then we cannot conclude that the characteristic polynomial of T is a product of factors (x − 1) and (x − 2); there may be an irreducible factor (over R) of degree 2. But if we are given that T is diagonalizable, then both the characteristic as well as the minimal polynomial of T must be products of only these factors. To determine the multiplicities of these factors, we need more information about T. However, if T is an arbitrary operator on C^4 having 1 and 2 as its only eigenvalues, then we can conclude that the characteristic polynomial has to be a product of the factors (x − 1) and (x − 2): the characteristic polynomial factors completely into a product of linear factors over C, and every such factor (x − λ) gives rise to an eigenvalue λ. As in the preceding case, though, we cannot say how many times each factor repeats without knowing more about T. Note that the minimal polynomial of T is also a product of the same linear factors, whose multiplicities cannot exceed the corresponding ones for the characteristic polynomial.
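The following small computation is an illustrative sketch, not part of the text; it assumes the Python library SymPy is available. It exhibits Proposition (5.6.6) concretely: for a block diagonal real matrix built from two copies of a 2 × 2 block with characteristic polynomial x^2 + 1, the characteristic polynomial is (x^2 + 1)^2 while the minimal polynomial is x^2 + 1, so the two polynomials share the irreducible factor x^2 + 1, though with different multiplicities.

    # Illustration of Proposition 5.6.6 (a sketch; assumes sympy is installed).
    import sympy as sp

    x = sp.symbols('x')
    B = sp.Matrix([[0, -1], [1, 0]])       # characteristic polynomial x^2 + 1
    A = sp.diag(B, B)                      # 4 x 4 block diagonal matrix

    print(sp.factor(A.charpoly(x).as_expr()))    # (x**2 + 1)**2
    # A satisfies x^2 + 1, so its minimal polynomial divides x^2 + 1; since A is
    # not a scalar matrix, the minimal polynomial is exactly x^2 + 1.
    print(A**2 + sp.eye(4) == sp.zeros(4, 4))    # True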


EXAMPLE 52 Consider the linear operator T on R^4 whose matrix relative to the standard basis is

A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.

Note that A is a companion matrix, so the characteristic as well as the minimal polynomial of A, and hence of T, is x^4 − 1. Now the irreducible factors of the minimal as well as the characteristic polynomial of T over R are (x − 1), (x + 1) and (x^2 + 1). Therefore, T has two eigenvalues, 1 and −1. It is also clear that T is not diagonalizable over R, as the characteristic polynomial is not a product of linear factors over R. However, if T is considered as an operator on C^4 with the same matrix with respect to the standard basis, then it has four distinct eigenvalues, namely 1, −1, i and −i. In that case T is diagonalizable over C, and its minimal and characteristic polynomials have the same factorization (x − 1)(x + 1)(x − i)(x + i) over C.

Our next goal is to derive a simple but useful necessary and sufficient condition for an operator to be diagonalizable in terms of its minimal polynomial. Recall from Corollary (5.6.7) that the minimal polynomial of a diagonalizable operator factors completely into linear factors over the base field. However, as the following theorem shows, there can be no repetition of factors; in fact, even the converse holds.

Theorem 5.6.8. Let T be a linear operator on a finite-dimensional vector space V over a field F. T is diagonalizable if and only if the minimal polynomial of T is a product (x − λ1)(x − λ2) · · · (x − λk) of distinct linear factors over F, that is, λi ≠ λj for i ≠ j.

Proof. We tackle the easier half first. Assume that T is diagonalizable with distinct eigenvalues λ1, λ2, . . . , λk. Thus, the characteristic polynomial of T must be ∏_{j=1}^{k} (x − λj)^{nj} for some positive integers nj. By hypothesis, V has a basis consisting of eigenvectors of T. But any such basis vector must be in the kernel of one of the operators T − λ1 I, T − λ2 I, . . . , T − λk I (I denotes the identity map I_V). Note that these operators are polynomials in T and therefore commute. It follows that (T − λ1 I)(T − λ2 I) · · · (T − λk I)(v) = 0 for each basis vector v, and so this product of linear operators takes every vector in V to the zero vector. In other words, (T − λ1 I)(T − λ2 I) · · · (T − λk I) = z on V, and therefore T satisfies the polynomial (x − λ1)(x − λ2) · · · (x − λk). Consequently, the minimal polynomial of T must divide (x − λ1)(x − λ2) · · · (x − λk). On the other hand, by Proposition (5.6.6), each of these linear factors, being an irreducible factor of the characteristic polynomial of T, must be a factor of the minimal polynomial of T too. We conclude that the minimal polynomial must be (x − λ1)(x − λ2) · · · (x − λk), proving one part of the theorem.

To prove the converse, assume that the minimal polynomial of T is (x − λ1)(x − λ2) · · · (x − λk), a product of distinct linear factors. Thus, the composite (T − λ1 I)(T − λ2 I) · · · (T − λk I) is the zero map


on V. Therefore,

V = ker((T − λ1 I)(T − λ2 I) · · · (T − λk I)).     (5.25)

Now, by Exercise 21 of Section 4.4, for the composite SR of two linear operators S and R on V, we have dim ker(SR) ≤ dim ker S + dim ker R. So, extending this inequality to the composite of the k operators T − λ1 I, T − λ2 I, . . . , T − λk I on V (by induction, for example), we see that

dim ker((T − λ1 I)(T − λ2 I) · · · (T − λk I)) ≤ dim ker(T − λ1 I) + dim ker(T − λ2 I) + · · · + dim ker(T − λk I)
                                            = dim(ker(T − λ1 I) ⊕ ker(T − λ2 I) ⊕ · · · ⊕ ker(T − λk I)),

where the last equality follows as the sum of distinct eigenspaces is a direct sum (see the remarks following Example 23). Equation (5.25) then shows that

dim V ≤ dim(ker(T − λ1 I) ⊕ ker(T − λ2 I) ⊕ · · · ⊕ ker(T − λk I)).

Since the direct sum of the eigenspaces is a subspace of V, it follows that the preceding inequality is an equality, and so, by properties of direct sums of subspaces (see Proposition 3.5.6), V is the direct sum of the distinct eigenspaces of T. This completes the proof. □

It is easy to formulate and prove the following matrix version of the theorem.

Theorem 5.6.9. Let A ∈ Mn(F). A is similar to a diagonal matrix over F if and only if the minimal polynomial of A is a product of distinct linear factors over F.

Theorems (5.6.8) and (5.6.9) are extremely useful because of their simplicity. We give a few examples illustrating their uses. The reader should compare these with similar examples that we worked out previously about diagonalizability using Theorem (5.3.20).

EXAMPLE 53 Consider the 4 × 4 matrix

A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.

We have seen, in Section 5.5, that the minimal polynomial of A is x^4 (it is its characteristic polynomial too). The linear factor x = (x − 0) repeats four times in the minimal polynomial, and so, by Theorem (5.6.9), A cannot be similar to a diagonal matrix over any field. Similarly, the nilpotent matrix of order n represented by the Jordan block Jn(0) has minimal polynomial x^n, and so cannot be similar to a diagonal matrix over any field for n > 1.
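A quick computational check of Examples 52 and 53 follows; it is an illustrative sketch, not part of the text, and assumes the Python library SymPy is available. The first matrix is diagonalizable over C because it has four distinct eigenvalues, while the second has minimal polynomial x^4, a repeated linear factor, and so is diagonalizable over no field.

    # Checking Examples 52 and 53 (a sketch; assumes sympy is installed).
    import sympy as sp

    x = sp.symbols('x')

    # Example 52: the companion matrix of x^4 - 1.
    A52 = sp.Matrix([[0, 0, 0, 1],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0]])
    print(sp.factor(A52.charpoly(x).as_expr()))   # (x - 1)*(x + 1)*(x**2 + 1)
    print(A52.eigenvals())                        # {1: 1, -1: 1, I: 1, -I: 1}
    print(A52.is_diagonalizable())                # True (over C, SymPy's default)

    # Example 53: minimal polynomial x^4, a repeated linear factor.
    A53 = sp.Matrix([[0, 0, 0, 0],
                     [1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0]])
    print(A53**3 == sp.zeros(4, 4), A53**4 == sp.zeros(4, 4))   # False True
    print(A53.is_diagonalizable())                              # False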


EXAMPLE 54 Consider the real matrix of Example 23 of Section 5.3 given by

A = \begin{pmatrix} 15 & −12 & −16 \\ 4 & −2 & −5 \\ 9 & −8 & −9 \end{pmatrix}.

Since A was seen to have the two eigenvalues 1 and 2, the results of this section imply that A will be similar to a diagonal matrix only when its minimal polynomial is (x − 1)(x − 2). Now,

(A − I3)(A − 2I3) = \begin{pmatrix} 14 & −12 & −16 \\ 4 & −3 & −5 \\ 9 & −8 & −10 \end{pmatrix} \begin{pmatrix} 13 & −12 & −16 \\ 4 & −4 & −5 \\ 9 & −8 & −11 \end{pmatrix}

is clearly a non-zero matrix, so that A does not satisfy (x − 1)(x − 2). It follows that A cannot be diagonalizable.

EXAMPLE 55 Let A be the following 4 × 4 real matrix

A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

Since this is a lower triangular matrix, the entries along the diagonal are the eigenvalues. So 1 and 2 are the eigenvalues, each repeated. The shape of the matrix itself suggests that A cannot satisfy the polynomial (x − 1)(x − 2). We leave it to the reader to verify that the product (A − I4)(A − 2I4) is not the zero matrix, and so (x − 1)(x − 2) cannot be the minimal polynomial of A. Thus, A is not diagonalizable.
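The non-vanishing of the products claimed in Examples 54 and 55 can be verified directly; the following is a minimal sketch, not part of the text, assuming the Python library SymPy is available.

    # Verifying Examples 54 and 55 (a sketch; assumes sympy is installed):
    # in each case (A - I)(A - 2I) is not the zero matrix, so (x - 1)(x - 2)
    # cannot be the minimal polynomial and A is not diagonalizable.
    import sympy as sp

    A54 = sp.Matrix([[15, -12, -16],
                     [ 4,  -2,  -5],
                     [ 9,  -8,  -9]])
    A55 = sp.Matrix([[1, 0, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 2, 0],
                     [0, 0, 0, 2]])

    print((A54 - sp.eye(3)) * (A54 - 2*sp.eye(3)) == sp.zeros(3, 3))   # False
    print((A55 - sp.eye(4)) * (A55 - 2*sp.eye(4)) == sp.zeros(4, 4))   # False
    print(A54.is_diagonalizable(), A55.is_diagonalizable())            # False False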

Upper Triangular Matrix Representations

Linear operators which cannot be diagonalized, but whose minimal polynomials are products of linear factors, can still have fairly useful matrix representations. A celebrated result due to Schur states that for such a linear operator T on a finite-dimensional vector space V, one can find a basis with respect to which the matrix of T is upper triangular. We shall need the following lemma about eigenvalues of an induced operator for the proof of Schur's theorem.

Lemma 5.6.10. Let T be a linear operator on a finite-dimensional vector space V such that its minimal polynomial is a product of linear factors. Let W be a proper T-invariant subspace of V and T̄ the operator on the quotient space V̄ = V/W induced by T. Then some eigenvalue of T̄ is also an eigenvalue of T.

Proof. Since W is a proper subspace of V, the quotient space V̄ = V/W is a non-zero finite-dimensional vector space, and so the minimal polynomial m1(x) of the induced operator T̄ on V̄ has degree at least 1. On the other hand, by Proposition (5.5.7), the minimal polynomial m1(x) of T̄ divides the minimal polynomial m(x) of T. As m(x) is a product of only linear factors, at least one such linear factor, say x − λ, is also a factor of m1(x). Then λ, being a root of both m1(x) and m(x), is clearly an eigenvalue of T̄ as well as of T. □


Note that if λ is the eigenvalue of T̄ postulated in the lemma, then there is a non-zero v̄ ∈ V̄ such that (T̄ − λĪ)v̄ = 0̄, where Ī is the identity operator of V̄ induced by the identity operator I of V. Since v̄ is non-zero in V̄ = V/W, it follows that v ∉ W whereas (T − λI)v ∈ W. This conclusion, which follows from the lemma, will be key in the proof of the following result.

Proposition 5.6.11. Let T be a linear operator on a finite-dimensional vector space V over a field F. If the minimal polynomial of T is a product of linear factors over F, then there is a basis of V with respect to which the matrix of T is upper triangular.

Proof. We show how to construct the required basis {v1, v2, . . . , vn} of V. Since the matrix of T has to be upper triangular, the vectors of the basis have to be chosen so as to satisfy the condition that for any k ≥ 1, T vk is in the span of v1, v2, . . . , vk. We begin our construction by choosing v1 to be any eigenvector of an eigenvalue, say λ1, of T (note that T has at least one eigenvalue as its minimal polynomial is a product of linear factors); since T v1 = λ1 v1, the required condition is satisfied. Let W1 be the subspace spanned by v1; W1 is trivially T-invariant. If W1 = V, we are clearly done. Otherwise W1 is a proper subspace of V, and so we can apply the lemma to find a vector v2 ∉ W1 such that for some eigenvalue λ2 of T, (T − λ2 I)v2 ∈ W1. The choice of v2 implies that {v1, v2} is linearly independent and that

T v2 = a12 v1 + λ2 v2,     (5.26)

for some a12 ∈ F. If W2 is the subspace spanned by v1, v2, then Equation (5.26) shows that W2 is T-invariant. Now if W2 = V, we are done, as {v1, v2} is the required basis. Otherwise we continue in a similar manner. To be precise, suppose that we have been able to find linearly independent vectors v1, v2, . . . , vk−1 such that their span Wk−1 is T-invariant and T vk−1 ∈ Wk−1. If Wk−1 is a proper subspace of V, then by the preceding lemma, we can find a vector vk ∉ Wk−1 and an eigenvalue λk of T such that T vk − λk vk is in Wk−1. It is clear then that v1, v2, . . . , vk−1, vk are linearly independent and that, if Wk is the span of these vectors, then T vk is in Wk. Since V is finite-dimensional, this process must stop after finitely many steps, producing a basis of V with the required property. □

Definition 5.6.12. A linear operator T on a finite-dimensional vector space V over a field F is called triangulable if there is a basis of V with respect to which the matrix of T is upper triangular. Similarly, a matrix A ∈ Mn(F) is said to be triangulable if A is similar to an upper triangular matrix in Mn(F).

It can be easily shown that if U is an upper triangular matrix of order n whose diagonal elements are a11, a22, . . . , ann (not necessarily distinct), then the characteristic polynomial of U is ch(U) = (x − a11)(x − a22) · · · (x − ann). Since the minimal polynomial of U divides ch(U), it follows that the minimal polynomial of U, and of any matrix similar to U, has to be a product of linear factors. This proves that the converse of the preceding proposition holds. We can now present Schur's theorem in its complete form.

Theorem 5.6.13. (Schur's Theorem) A linear operator T on a finite-dimensional vector space V over a field F is triangulable if and only if the characteristic polynomial (or the minimal polynomial) of T factors completely into a product of linear factors over F.


For algebraically closed fields such as the field C of complex numbers, Schur's theorem takes a particularly simple form, as any non-constant polynomial (and so the characteristic polynomial of any operator) factors into a product of linear factors.

Corollary 5.6.14. Any linear operator on a finite-dimensional vector space over an algebraically closed field is triangulable.

The matrix versions are straightforward.

Corollary 5.6.15. (Schur's Theorem) Let A ∈ Mn(F). A is similar to an upper triangular matrix in Mn(F) if and only if the characteristic polynomial (or the minimal polynomial) of A factors completely into a product of linear factors over F.

Corollary 5.6.16. If F is an algebraically closed field, then any A ∈ Mn(F) is triangulable.

For some applications, see Exercises 23, 27 and 28.
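Numerically, a triangular form of the kind guaranteed by Corollary 5.6.16 can be produced with standard software. The sketch below is not part of the text; it assumes the Python libraries NumPy and SciPy, and it uses the complex Schur decomposition, which is in fact a stronger, unitary version of triangularization: A = Z T Z* with Z unitary and T upper triangular.

    # A numerical illustration of triangulability over C (a sketch; assumes
    # numpy and scipy are installed). scipy.linalg.schur returns a unitary Z
    # and an upper triangular T with A = Z T Z^*.
    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0., 0., 0., 1.],
                  [1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])          # the matrix of Example 52

    T, Z = schur(A, output='complex')
    print(np.round(T, 6))                     # upper triangular; 1, -1, i, -i on the diagonal (in some order)
    print(np.allclose(Z @ T @ Z.conj().T, A)) # True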

EXERCISES

1. Determine whether the following assertions are true or false, giving brief justifications. All given vector spaces are finite-dimensional.
   (a) If a triangular matrix A is similar to a diagonal matrix over a field F, then A is already diagonal.
   (b) Any matrix A in Mn(F) such that A^2 = A is diagonalizable over F.
   (c) Any square matrix over C is diagonalizable.
   (d) The restriction of a non-diagonalizable operator T on a finite-dimensional vector space to a T-invariant subspace can never be diagonalizable.
   (e) If every one-dimensional subspace of a finite-dimensional vector space is invariant under a linear operator T, then T is diagonalizable.
   (f) The only nilpotent diagonalizable operator on a non-zero finite-dimensional vector space is the zero operator.
   (g) If zero is the only eigenvalue of a linear operator on a finite-dimensional vector space, then it must be nilpotent.
   (h) If the characteristic polynomial of a linear operator is a product of distinct linear factors, then it coincides with the minimal polynomial.
   (i) The roots in F of the minimal polynomial of a matrix in Mn(F) are precisely its eigenvalues.
   (j) Any matrix in Mn(C) is similar to a lower triangular matrix.
   (k) If T̄ is the operator on V/W induced by a linear operator T on V, then any eigenvalue of T̄ is an eigenvalue of T.
2. Give an example of each of the following:
   (a) A non-zero matrix in M2(R) which is diagonalizable but not invertible.
   (b) A matrix in M2(R) which is invertible but not diagonalizable.


   (c) Diagonalizable matrices A and B in M2(R) such that A + B is not diagonalizable.
   (d) Diagonalizable matrices A and B in M2(R) such that AB is not diagonalizable.
3. Find the minimal polynomial of

A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.

   Is A diagonalizable over R? Over C?
4. Let A be a real 6 × 6 matrix which has x^4(x − 1)^2 as its characteristic polynomial and x(x − 1) as its minimal polynomial. What are the dimensions of the eigenspaces of A?
5. Can x^2 + x + 1 be the minimal polynomial of a real 3 × 3 diagonalizable matrix, or of a complex 3 × 3 diagonalizable matrix?
6. If A is a diagonalizable matrix in Mn(R) such that A^k = In for some positive integer k ≥ 1, then show that A^2 = In.
7. Let A be a matrix in M3(R). If A is not similar to a lower triangular matrix over R, then show that A is similar to a diagonal matrix over C.
8. If zero is the only eigenvalue of a linear operator T on a finite-dimensional complex vector space, then show that T is nilpotent.
9. Let T be a linear operator on an n-dimensional vector space V such that T^k = z, the zero operator on V, for some positive integer k. Prove that T^n = z.
10. Let T be a linear operator on a finite-dimensional vector space V over a field F with minimal polynomial m(x). For any polynomial f(x) over F, let r(x) be the gcd of f(x) and m(x). Prove that ker f(T) = ker r(T).
11. Let T be a linear operator on a finite-dimensional vector space V over a field F with minimal polynomial m(x). Prove that for any irreducible polynomial r(x) over F, r(x) and m(x) are relatively prime if and only if the operator r(T) is invertible on V.
12. Let T be a diagonalizable operator on a finite-dimensional real vector space V. Prove that there is no non-zero v ∈ V such that (T^2 + T + I)(v) = 0, the zero vector of V. (I is the identity map on V.)
13. Let T be a diagonalizable operator on a finite-dimensional vector space V over a field F. Show that for any polynomial f(x) over F such that f(a) is non-zero for every eigenvalue a of T, the operator f(T) on V is not only diagonalizable, but also invertible.
14. Let T be a nilpotent operator on a finite-dimensional vector space V over a field F. Show that for any polynomial f(x) over F such that the constant term of f(x) is non-zero, the operator f(T) is invertible, and hence cannot be nilpotent.
15. Let T be a linear operator on a finite-dimensional vector space V. Suppose that V can be decomposed as a direct sum V = W1 ⊕ W2 ⊕ · · · ⊕ Wk of T-invariant subspaces. Let Ti be the restriction of T to the T-invariant subspace Wi. Prove that T is diagonalizable if and only if each Ti is diagonalizable.
16. Let A be a lower triangular matrix in Mn(F) having distinct eigenvalues a1, a2, . . . , ak. Suppose that the algebraic multiplicity of the eigenvalue ai is di. Verify directly that A satisfies its


characteristic polynomial (x − a1)^{d1}(x − a2)^{d2} · · · (x − ak)^{dk}.
17. Use the preceding exercise to give another proof of the Cayley–Hamilton theorem for matrices in Mn(C).
18. Prove that any permutation matrix in Mn(C) is diagonalizable.
19. Give an example of a 4 × 4 real permutation matrix P ≠ I4 which is diagonalizable over R.
20. Give an example of a 4 × 4 real permutation matrix which is not diagonalizable over R.
21. Let T be a linear operator on a vector space V, and W a T-invariant subspace of V. Define T̄ on the quotient space V̄ = V/W by T̄(v̄) = T(v) + W for any v ∈ V. Verify that T̄ is well-defined, and that it is a linear operator on the quotient space V/W.

We sketch an alternative proof of Proposition (5.6.6) in the following exercise.

22. Let A ∈ Mn(F), where F is an arbitrary field. If the minimal polynomial m(x) of A is m(x) = a0 + a1 x + a2 x^2 + · · · + a_{r−1} x^{r−1} + x^r, where ai ∈ F, define matrices Bj for j = 0, 1, 2, . . . , r − 1, as follows:

B0 = I,
Bj = A^j + a_{r−1} A^{j−1} + · · · + a_{r−j+1} A + a_{r−j} I,

where I is the n × n identity matrix over F.
   (a) Show that

Bj − A B_{j−1} = a_{r−j} I     for j = 1, 2, . . . , r − 1,

and −A B_{r−1} = a0 I.
   (b) If B(x) is the polynomial with matrix coefficients given by B(x) = B_{r−1} + B_{r−2} x + · · · + B1 x^{r−2} + B0 x^{r−1}, then prove that (xI − A)B(x) = m(x)I.
   (c) Hence prove that the characteristic polynomial of A divides m(x)^n.
   (d) Finally, use properties of irreducible polynomials over fields to prove that any irreducible factor of the characteristic polynomial of A divides m(x).
23. Let A ∈ Mn(F) be a matrix whose characteristic polynomial ch(x) factors over F as follows:

ch(x) = (x − λ1)^{d1}(x − λ2)^{d2} · · · (x − λk)^{dk}.

Prove that

Tr(A) = d1 λ1 + d2 λ2 + · · · + dk λk


and det(A) = λ1^{d1} λ2^{d2} · · · λk^{dk}.
24. Let A be the following matrix in M3(C), where ω denotes a primitive cube root of unity:

A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & ω & 0 \\ 0 & 0 & ω^2 \end{pmatrix}.

   Prove that Tr(A) = Tr(A^2) = 0, but that A is not nilpotent.
25. Give an example of a non-nilpotent matrix A ∈ Mn(C) such that Tr(A) = Tr(A^2) = · · · = Tr(A^{n−1}) = 0. Can you find a non-nilpotent matrix in Mn(R) with similar properties?
26. Let A ∈ Mn(C) be an invertible matrix such that A^k is diagonalizable for some positive integer k. Prove that A itself is diagonalizable.

We need Newton's identities for the following exercises. These identities relate the power sums

p_k(x1, x2, . . . , xn) = x1^k + x2^k + · · · + xn^k

of n variables x1, x2, . . . , xn to the elementary symmetric polynomials e_k(x1, x2, . . . , xn) given by

e_0(x1, x2, . . . , xn) = 1,   e_1(x1, x2, . . . , xn) = ∑_{i=1}^{n} x_i,   e_2(x1, x2, . . . , xn) = ∑_{i<j} x_i x_j,   etc.

In general, e_j(x1, x2, . . . , xn) for 1 ≤ j ≤ n is the sum of the products of x1, x2, . . . , xn taken j at a time. Thus, for example, e_n(x1, x2, . . . , xn) = x1 x2 · · · xn. Newton's identities can then be stated as

k e_k(x1, x2, . . . , xn) = ∑_{i=1}^{k} (−1)^{i−1} e_{k−i}(x1, x2, . . . , xn) p_i(x1, x2, . . . , xn),

valid for any positive integer k, 1 ≤ k ≤ n. (A small symbolic check of these identities appears after the exercises below.)
27. Let A ∈ Mn(F). If Tr(A) = Tr(A^2) = · · · = Tr(A^n) = 0, then show that A is nilpotent. (Hint: Consider A as a matrix over an extension of F where the characteristic polynomial of A factors completely into a product of linear factors, and then use Schur's theorem to reduce to the case of an upper triangular matrix.)


28. Let A, B ∈ Mn(F). If Tr(A^k) = Tr(B^k) for each positive integer k, 1 ≤ k ≤ n, then show that A and B have the same characteristic polynomial. Hence show that if A, B ∈ Mn(C) with Tr(A^k) = Tr(B^k) for each positive integer k, 1 ≤ k ≤ n, then A and B have the same set of n eigenvalues.
29. Let T be a linear operator on a finite-dimensional vector space V such that its minimal polynomial is a product of linear factors. If a proper subspace W of V is T-invariant, then show that there is some v ∉ W and an eigenvalue λ of T such that (T − λI)v ∈ W. (Hint: Use Lemma (5.6.10).)
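For readers who wish to check Newton's identities as stated before Exercise 27, here is a small symbolic verification for n = 3; it is an illustrative sketch, not part of the text, and assumes the Python library SymPy is available.

    # A symbolic spot-check of Newton's identities for n = 3 (a sketch; assumes
    # sympy is installed):  k*e_k = sum_{i=1}^{k} (-1)^(i-1) * e_{k-i} * p_i.
    import sympy as sp
    from itertools import combinations

    xs = sp.symbols('x1 x2 x3')
    n = len(xs)

    def e(k):                      # elementary symmetric polynomial e_k
        if k == 0:
            return sp.Integer(1)
        return sum(sp.Mul(*c) for c in combinations(xs, k))

    def p(k):                      # power sum p_k
        return sum(v**k for v in xs)

    for k in range(1, n + 1):
        lhs = k * e(k)
        rhs = sum((-1)**(i - 1) * e(k - i) * p(i) for i in range(1, k + 1))
        print(k, sp.expand(lhs - rhs) == 0)    # True for every k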

5.7 REAL QUADRATIC FORMS

Real quadratic forms appear naturally in various branches of mathematics (such as the coordinate geometry of R^2 and R^3 and applications of linear algebra) as well as in physics, statistics and economics. Special types of real quadratic forms play an important role in number theory. Thus it is a topic that every student of mathematics should be familiar with. We are now in a good position to introduce the reader to real quadratic forms, as such forms are closely related to real symmetric matrices, which we have already studied in detail; our aim is to develop enough theory to be able to classify conics in R^2 and R^3.

We give a couple of examples to show how real symmetric matrices give rise to real quadratic forms. Consider first the real symmetric matrix

A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.

If x denotes the column vector (x1, x2)^t, then

x^t A x = (x1, x2) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \end{pmatrix} = (x1 + 2x2, 2x1 + x2) \begin{pmatrix} x1 \\ x2 \end{pmatrix} = x1^2 + 4x1x2 + x2^2.

Thus x^t A x is an expression in which the degree of each term is two (the degree of 2x1x2 is the sum of the degrees of x1 and x2), and so it is a homogeneous polynomial of degree 2 in the variables x1 and x2. Note that, written in the variables x and y, the expression takes the form x^2 + 4xy + y^2, the kind of expression that appears in the general second-degree equation of a conic in R^2. Similarly, for the symmetric matrix

A = \begin{pmatrix} 4 & 1 & −1/2 \\ 1 & 2 & 1/2 \\ −1/2 & 1/2 & 1 \end{pmatrix}


and x = (x1, x2, x3)^t, we see that

x^t A x = (x1, x2, x3) \begin{pmatrix} 4 & 1 & −1/2 \\ 1 & 2 & 1/2 \\ −1/2 & 1/2 & 1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = 4x1^2 + 2x2^2 + x3^2 + 2x1x2 − x1x3 + x2x3,

which, as in the previous example, is a homogeneous polynomial of degree 2 in the variables x1, x2 and x3. Such homogeneous polynomials of degree 2 in a number of variables are also known as real quadratic forms. Keeping these examples in mind, we present the general definition of a real quadratic form.

Definition 5.7.1. A real quadratic form q(x1, x2, . . . , xn) in n variables x1, x2, . . . , xn is a homogeneous polynomial of the type

q(x1, x2, . . . , xn) = ∑_{i=1}^{n} c_{ii} x_i^2 + ∑_{i<j} c_{ij} x_i x_j,     (5.27)

where the coefficients c_{ij} are real numbers.

If one thinks of x1, x2, . . . , xn as the components of the column vector x ∈ R^n, then the quadratic form q can be considered a function on the real vector space R^n; in that case we denote the quadratic form as q(x). Given a quadratic form q, as in Equation (5.27), we can associate a real symmetric matrix A = [a_{ij}] with q in the following manner: for 1 ≤ i ≤ j ≤ n, we let

a_{ij} = c_{ii} if i = j,   and   a_{ij} = (1/2) c_{ij} if i ≠ j,

and set a_{ji} = a_{ij}. Then A = [a_{ij}] is a real symmetric matrix of order n; we shall call A the matrix of the quadratic form q. Now, if x is the column vector (x1, x2, . . . , xn)^t, then the product Ax is also a column vector, whose i-th component is ∑_{j=1}^{n} a_{ij} x_j. Thus x^t A x can be expressed as the double sum

∑_{i=1}^{n} ∑_{j=1}^{n} a_{ij} x_i x_j,     (5.28)

which can be rearranged by grouping the terms for which i = j first. For the rest of the terms, for which i ≠ j, note that x_i x_j = x_j x_i and a_{ij} = a_{ji}. Therefore the double sum in Equation (5.28) can be expressed as

∑_{i=1}^{n} a_{ii} x_i^2 + 2 ∑_{i<j} a_{ij} x_i x_j.


We have thus shown that q(x) = x^t A x. Conversely, the same argument shows that for any real symmetric matrix A = [a_{ij}], the product x^t A x is a real quadratic form in n variables, or equivalently on R^n. The following result then gives a working definition of a real quadratic form.

Proposition 5.7.2. A real quadratic form in n variables x1, x2, . . . , xn is x^t A x for some real symmetric matrix A of order n and the column vector x = (x1, x2, . . . , xn)^t ∈ R^n.

Principal Axes Theorem

Consider a quadratic form q with associated symmetric matrix A ∈ Mn(R). Then we know, by Proposition (5.3.23), that A has n real eigenvalues, say λ1, λ2, . . . , λn, not necessarily distinct. Moreover, by Proposition (5.3.24), there is an orthogonal matrix Q ∈ Mn(R) such that Q^{−1}AQ = Q^t AQ is the diagonal matrix D = diag[λ1, λ2, . . . , λn]. Consider the change of coordinates given by x = Qy, where y = (y1, y2, . . . , yn)^t. Then

x^t A x = y^t Q^t A Q y = y^t D y,

showing that the quadratic form q can be expressed as λ1 y1^2 + λ2 y2^2 + · · · + λn yn^2. One says that the orthogonal change of coordinates has removed the cross-product terms of q; cross-product terms refer to those involving any product x_i x_j for i ≠ j in the expression for q. Note that the columns of the orthogonal matrix Q form an orthonormal basis of R^n consisting of orthonormal eigenvectors of A; the i-th column of Q is the unit eigenvector belonging to the eigenvalue λi. Thus, we have the following important result about real quadratic forms.

Theorem 5.7.3. (Principal Axes Theorem) A real quadratic form

q(x) = q(x1, x2, . . . , xn) = ∑_{i=1}^{n} a_{ii} x_i^2 + 2 ∑_{i<j} a_{ij} x_i x_j

on R^n can be reduced to its diagonal form

q(y) = q(y1, y2, . . . , yn) = ∑_{i=1}^{n} λ_i y_i^2

by an orthogonal change of coordinates x = Qy, where λ1, λ2, . . . , λn are the eigenvalues of the real symmetric matrix A = [a_{ij}] associated with the form q(x1, x2, . . . , xn), and Q is an orthogonal matrix whose columns form an orthonormal basis of eigenvectors of A.

It must be pointed out that the orthogonal change of coordinates x = Qy means simply this: if a vector in R^n has x = (x1, x2, . . . , xn)^t as its coordinate vector with respect to the standard basis of R^n, then y = (y1, y2, . . . , yn)^t is its coordinate vector with respect to the orthonormal basis of


R^n constituted by the columns of the orthogonal matrix Q. The important point about an orthogonal change of coordinates is that such a change does not alter the distances between points of R^n (see part (b) of Proposition 3.7.14 about properties of orthogonal matrices).

To apply the Principal Axes Theorem to a specific real quadratic form, in practice we adopt the same procedure used in diagonalizing a real symmetric matrix.

EXAMPLE 56 Consider the real quadratic form

q(x1, x2, x3) = 2x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x1x3 + 2x2x3.     (5.29)

It is clear that the matrix associated to q is

A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix},

the real symmetric matrix which we diagonalized in Example 30 of Section 5.3. We had shown that A has the two eigenvalues λ = 1, 4 and that an orthonormal basis for R^3 can be chosen consisting of the vectors

\begin{pmatrix} 1/√2 \\ −1/√2 \\ 0 \end{pmatrix},   \begin{pmatrix} 1/√6 \\ 1/√6 \\ −√2/√3 \end{pmatrix},   \begin{pmatrix} 1/√3 \\ 1/√3 \\ 1/√3 \end{pmatrix},

where the first two are eigenvectors for λ = 1 and the third is an eigenvector for λ = 4. Therefore, if Q is the orthogonal matrix whose columns are these basis vectors, then

Q^t \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix} Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 4 \end{pmatrix}.

It then follows from the proof of the Principal Axes Theorem that the change of coordinates

\begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 1/√2 & 1/√6 & 1/√3 \\ −1/√2 & 1/√6 & 1/√3 \\ 0 & −√2/√3 & 1/√3 \end{pmatrix} \begin{pmatrix} y1 \\ y2 \\ y3 \end{pmatrix}

transforms the given quadratic form to

q(y1, y2, y3) = y1^2 + y2^2 + 4y3^2.     (5.30)
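The orthogonality of Q and the diagonalization used in Example 56 can be verified symbolically; a minimal sketch follows (not part of the text; it assumes the Python library SymPy is available).

    # Verifying the computation of Example 56 (a sketch; assumes sympy is installed).
    import sympy as sp

    A = sp.Matrix([[2, 1, 1],
                   [1, 2, 1],
                   [1, 1, 2]])

    s2, s3, s6 = sp.sqrt(2), sp.sqrt(3), sp.sqrt(6)
    Q = sp.Matrix([[ 1/s2,   1/s6, 1/s3],
                   [-1/s2,   1/s6, 1/s3],
                   [    0, -s2/s3, 1/s3]])

    print(sp.simplify(Q.T * Q))        # the identity matrix: the columns are orthonormal
    print(sp.simplify(Q.T * A * Q))    # diag(1, 1, 4), so q(y) = y1^2 + y2^2 + 4*y3^2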


Observe that the matrix equation x = Qy above gives the following explicit relations between the original and the new coordinates:

x1 = (1/√2) y1 + (1/√6) y2 + (1/√3) y3,
x2 = −(1/√2) y1 + (1/√6) y2 + (1/√3) y3,
x3 = −(√2/√3) y2 + (1/√3) y3.

It should be clear to the reader that the preceding transformation of the quadratic form by the matrix method amounts, in practical terms, to showing that substituting each xi, by its formula in terms of the yi's, in the expression for q in Equation (5.29) yields Equation (5.30). One cannot but marvel at the way the correct substitution was found which removes the cross-product terms in q.

As our main application of real quadratic forms, we now classify conics in R^2 and R^3.

Conic Sections

From analytic geometry, we know that a quadratic equation of the form

ax^2 + 2bxy + cy^2 + dx + ey + f = 0,     (5.31)

where the coefficients are all real and in which a, b and c are not all zero, represents a conic section. It means that for suitable choices of the coefficients in Equation (5.31), the resultant equation represents a circle, an ellipse, a hyperbola or a parabola in general; however, Equation (5.31) also includes degenerate cases as well as cases with empty solution sets, such as x^2 + y^2 + 1 = 0. We now present a brief discussion of the classification of the conics represented by Equation (5.31).

The nature of the conic represented by Equation (5.31) is determined largely by the quadratic form ax^2 + 2bxy + cy^2 associated with the expression in Equation (5.31), and so by the corresponding real symmetric matrix

A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.

Equation (5.31) can then be represented as

X^t A X + B^t X + f = 0,     where X = (x, y)^t ∈ R^2 and B = (d, e)^t.

Note that, as in analytic geometry, we are denoting the components of a vector as x and y and not as x1 and x2. As A is a real symmetric matrix, it has two real eigenvalues, say λ1 and λ2. Now, by the Principal Axes Theorem (5.7.3), there is an orthogonal matrix Q of order 2 such that the change of coordinates X = QX′, where X′ = (x′, y′)^t, diagonalizes the quadratic form X^t A X to the form λ1 x′^2 + λ2 y′^2. Thus the expression in Equation (5.31) reduces in the x′y′-plane to

λ1 x′^2 + λ2 y′^2 + d1 x′ + e1 y′ + f = 0,     (5.32)


where B^t Q X′ = d1 x′ + e1 y′. Next, observe that as Q is an orthogonal matrix, QQ^t = I2 and so det Q = ±1. If det Q = −1, consider the matrix P obtained from Q by interchanging its columns. It is easy to see that P is an orthogonal matrix with det P = 1 and that the change of coordinates X = PX′ reduces the quadratic form X^t A X to the diagonal form X′^t diag[λ2, λ1] X′. In other words, without any loss of generality, we can assume that the orthogonal matrix Q, for the change of coordinates X = QX′, has determinant 1, that is, Q is a rotation in R^2 (see Definition 3.7.15). We now proceed to classify the curves given by Equation (5.31) using the equivalent Equation (5.32).

Case 1: Both the eigenvalues λ1 and λ2 of A are non-zero. In case the eigenvalues have the same sign, then, completing the squares in Equation (5.32), we can put it in the form

λ1 (x′ + d2)^2 + λ2 (y′ + e2)^2 = f2,     (5.33)

for some real numbers d2, e2 and f2. If f2 is non-zero and has the same sign as λ1 and λ2, then the translation x′′ = x′ + d2 and y′′ = y′ + e2 reduces Equation (5.33) (after replacing x′′ and y′′ by x and y) to the standard form of an ellipse

x^2/α^2 + y^2/β^2 = 1,

where α = √(f2/λ1) and β = √(f2/λ2) are non-zero positive reals. Here 2α and 2β are the lengths of the axes of the ellipse along the new x-axis and y-axis, respectively; the larger of these two is the length of the major axis, whereas the other one is the length of the minor axis. Note that all we have done is to rotate the coordinate axes of the original xy-plane (through the rotation Q) and then shift the origin (by a translation) so that the axes of the new xy-plane are the axes of the ellipse. The degenerate subcases occur if the non-zero f2 has the sign opposite to that of the eigenvalues, in which case Equation (5.33) has no graph (an imaginary ellipse), or if f2 = 0, in which case the only real solution of Equation (5.33) is the single point x′ = −d2, y′ = −e2 (a pair of imaginary lines meeting in a real point).

If the non-zero eigenvalues have opposite signs, then, as in the preceding case, after completing the squares we can put Equation (5.32) in either of the two forms

x^2/α^2 − y^2/β^2 = ±1,

which are the standard forms of hyperbolas, or, in the degenerate case f2 = 0, in the form λ1 (x′ + d2)^2 + λ2 (y′ + e2)^2 = 0, which, as λ1 and λ2 have opposite signs, represents a pair of straight lines.

Case 2: One of the eigenvalues is zero. Without any loss of generality, we assume that λ1 = 0. In this case, Equation (5.32) can be rewritten as

λ2 (y′ + e2)^2 = d2 x′ + f2,     (5.34)

where e2, d2 and f2 are real numbers. The degenerate cases arise for d2 = 0; if f2 = 0 too, then


Equation (5.34) is a pair of coincident lines; otherwise it will represent a pair of parallel lines (in case λ2 and f2 have the same sign) or will have no solution (in case λ2 and f2 have opposite signs). On the other hand, if d2 is non-zero, then Equation (5.34) can be rewritten, after performing a suitable translation, as the standard equation of a parabola: y^2 = 4ax.

Case 3: Both the eigenvalues are zero. This case occurs only when the matrix A is the zero matrix. So Equation (5.31) represents the straight line dx + ey + f = 0.

We now illustrate the classification provided by the preceding discussion in a couple of examples.

EXAMPLE 57 The symmetric matrix

A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}

has eigenvalues 3 and 1. Therefore, the equation 2x^2 + 2xy + 2y^2 − 4 = 0 represents an ellipse, as the eigenvalues of the associated matrix A are both positive and f2 = 4. In fact, following the procedure of orthogonal diagonalization of real symmetric matrices, we see that the change of coordinates X = QX′ effected by the orthogonal matrix

Q = \begin{pmatrix} 1/√2 & 1/√2 \\ 1/√2 & −1/√2 \end{pmatrix}

transforms the given equation to

3x′^2 + y′^2 = 4,

which is the equation of an ellipse with axes of lengths 4/√3 and 4. To determine the required matrix, we note that Q is the orthogonal matrix which diagonalizes the real symmetric matrix A. So the columns of Q are the unit eigenvectors of A forming an orthonormal basis of R^2 and can therefore be computed by the usual methods; one has also to make sure that det Q = 1, if necessary by interchanging the columns of Q.

It is also clear that if instead we consider the equation 2x^2 + 2xy + 2y^2 + 4 = 0, then it cannot represent any graph, as f2 now has the sign opposite to that of the eigenvalues. This absence of a graph can be explained by the fact that there are no real x and y which can satisfy the equation.

Now, consider the equation

2x^2 + 2xy + 2y^2 + 2x − 2y − 4 = 0.     (5.35)


The associated symmetric matrix is the same matrix A, so the same orthogonal matrix diagonalizes it. However, we have to take into account the effect of Q on the x and y terms of the equation. Note that if X = (x, y)^t and X′ = (x′, y′)^t, then writing out the transformation X = QX′ explicitly, we obtain

x = (1/√2) x′ + (1/√2) y′,
y = (1/√2) x′ − (1/√2) y′.

It follows that Equation (5.35) is transformed by Q into

3x′^2 + y′^2 + (4/√2) y′ − 4 = 0,

which, after completing the square, can be rewritten as

3x′^2 + (y′ + √2)^2 = 6,

or as

x^2/2 + y^2/6 = 1

after effecting the translation x = x′ and y = y′ + √2. Thus Equation (5.35) represents yet another ellipse, whose major axis still lies along the new y-axis.

EXAMPLE 58 Consider the equation 2x^2 − 4xy − y^2 + 4 = 0. In the notation we have adopted, this can be expressed as X^t A X + 4 = 0, where

A = \begin{pmatrix} 2 & −2 \\ −2 & −1 \end{pmatrix}.

As the eigenvalues of A are 3 and −2, our discussion of the case of eigenvalues of different signs shows that the given equation represents a hyperbola. The standard equation of this hyperbola is easily seen to be

x′^2/2 − y′^2/(4/3) = 1.

Similarly, the equation 2x^2 − 4xy − y^2 − 4 = 0 represents the hyperbola with standard equation

y′^2/(4/3) − x′^2/2 = 1.
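A short symbolic check of Examples 57 and 58 follows; it is an illustrative sketch, not part of the text, and assumes the Python library SymPy is available. The substitution below reproduces the transformed equation of Example 57, and the eigenvalues of the matrix in Example 58 have opposite signs, which is what makes that conic a hyperbola.

    # Checking Examples 57 and 58 (a sketch; assumes sympy is installed).
    import sympy as sp

    x, y, xp, yp = sp.symbols("x y x' y'")

    # Example 57: apply the change of coordinates given by Q to Equation (5.35).
    s2 = sp.sqrt(2)
    expr = 2*x**2 + 2*x*y + 2*y**2 + 2*x - 2*y - 4
    rotated = expr.subs({x: (xp + yp)/s2, y: (xp - yp)/s2})
    # equals 3*x'**2 + y'**2 + 2*sqrt(2)*y' - 4  (note that 4/sqrt(2) = 2*sqrt(2))
    print(sp.expand(rotated))

    # Example 58: the matrix of the quadratic part of 2x^2 - 4xy - y^2 + 4 = 0.
    A = sp.Matrix([[2, -2],
                   [-2, -1]])
    print(A.eigenvals())          # {3: 1, -2: 1}: opposite signs, hence a hyperbola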


Classification of Quadrics

The treatment of conic sections in R^2 can be extended to what are known as quadrics or quadratic surfaces in R^3. We present a brief review of the classification of these surfaces now. The general equation of a quadric is

ax^2 + by^2 + cz^2 + 2dxy + 2exz + 2fyz + 2px + 2qy + 2rz + s = 0.     (5.36)

The relevant quadratic form is ax^2 + by^2 + cz^2 + 2dxy + 2exz + 2fyz, with the associated real symmetric matrix

A = \begin{pmatrix} a & d & e \\ d & b & f \\ e & f & c \end{pmatrix}.

As in the case of conic sections in R^2, one can find, by the Principal Axes Theorem (5.7.3), an orthogonal change of coordinates X = QX′ which reduces the quadratic form to λ1 x′^2 + λ2 y′^2 + λ3 z′^2, where λ1, λ2 and λ3 are the eigenvalues of A (verify). Observe that Q can be chosen such that det Q = 1, so Q can be assumed to be a rotation. Now, suppose that Equation (5.36) of the quadric changes to

λ1 x′^2 + λ2 y′^2 + λ3 z′^2 + δ1 x′ + δ2 y′ + δ3 z′ + δ = 0.     (5.37)

As in the case of the conics, the nature of the surface represented by Equation (5.37) depends on the signs of the eigenvalues of A and on the constant term. If, for example, all three eigenvalues λ1, λ2 and λ3 are non-zero, then an obvious translation of the type x′ = x′′ + µ1, y′ = y′′ + µ2, z′ = z′′ + µ3 can be used to reduce Equation (5.37) to the following form:

λ1 x′′^2 + λ2 y′′^2 + λ3 z′′^2 = µ,     (5.38)

which can then be put in a standard form as follows. Consider the case when all the eigenvalues have the same sign. In case µ is non-zero, Equation (5.38) can be rewritten (after dropping the primes) as

x^2/α^2 + y^2/β^2 + z^2/γ^2 = 1   or as   x^2/α^2 + y^2/β^2 + z^2/γ^2 = −1,

depending on whether µ has the same sign as, or the sign opposite to, that of the eigenvalues. The first equation represents an ellipsoid and the second an imaginary ellipsoid. When µ = 0, we have the


equation of the zero ellipsoid:

x^2/α^2 + y^2/β^2 + z^2/γ^2 = 0.

Similarly, if the eigenvalues are not all of the same sign, then we can put Equation (5.38), interchanging x, y and z if necessary, in one of the following three forms:

x^2/α^2 + y^2/β^2 − z^2/γ^2 = 0, 1 or −1,

which represent an elliptic cone, a hyperboloid of one sheet and a hyperboloid of two sheets, respectively. Note that the preceding surfaces are possible only when the rank of the matrix A is 3, as the diagonal form of A, consisting of its eigenvalues as diagonal entries, has three non-zero diagonal entries.

If the rank of A is 2, then we can assume, without any loss of generality, that λ3 = 0. In that case, depending on whether δ3 (the coefficient of the z′-term) is non-zero or zero in Equation (5.37), one can perform suitable translations to eliminate the x′- and y′-terms, and the z′-term if possible, so as to represent the equation in either of the forms

x^2/α^2 ± y^2/β^2 = ±z,

which represents an elliptic or a hyperbolic paraboloid, or in one of the forms

x^2/α^2 ± y^2/β^2 = 0, 1 or −1,

which represents a pair of planes, an elliptic cylinder or a hyperbolic cylinder, respectively.

Finally, we consider the case when the rank of A is 1. So, we may assume that λ2 = λ3 = 0. As in the preceding cases, suitable translations then allow us to reduce Equation (5.37) to either x^2 + α = 0, which represents a pair of parallel planes, real or imaginary, or x^2 + 2αy = 0, a parabolic cylinder.

Finally, it should be pointed out that in both R^2 and R^3, in general, the reduction of the general equation to the standard form is accomplished by a rotation followed by a translation, a combination which is known as an isometry (see the website for a discussion of isometries).

EXERCISES

1. For each of the following quadratic forms, determine an orthogonal matrix Q which diagonalizes the quadratic form; give the diagonalized form in each case:
   a. 2x1x3 + x2x3.
   b. 3x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x1x3 + 2x2x3.


   c. −9x1^2 − 7x2^2 − 11x3^2 + 8x1x2 − 8x1x3.
   d. 4x1^2 + x2^2 − 8x3^2 + 4x1x2 − 4x1x3 + 8x2x3.
2. Classify the conics in R^2 represented by the following equations by reducing the equations to their standard forms; in each case, specify the rotation Q and any translation, if necessary, required to reduce the equation to the standard form (make sure that det Q = 1):
   a. 7x^2 + 4xy + 4y^2 − 6 = 0.
   b. 2x^2 + 10xy + 2y^2 + 21 = 0.
   c. 5x^2 − 4xy + 5y^2 − 14x − 28y + 10 = 0.
   d. 8x^2 − 8xy + 2y^2 − 4√5 x + 12√5 y + 6 = 0.
3. Classify the quadrics in R^3 represented by the following equations by reducing the equations to their standard forms; in each case, specify the rotation P and any translation, if necessary, required to reduce the equation to the standard form (make sure that det P = 1):
   a. x^2 + yz = 0.
   b. x^2 + 3y^2 + z^2 + 2xy + 2xz + 6yz − 10 = 0.
   c. x^2 + 3y^2 + z^2 + 2xy + 2xz + 6yz + 18x − 12z − 10 = 0.
   d. x^2 − z^2 − 4xy + 4yz + 6 = 0.
   e. x^2 − z^2 − 4xy + 4yz + 6x − 3y − 12z − 6 = 0.
   f. 9x^2 − 4xy + 6y^2 + 3z^2 + 10x − 20y + 12z + 32 = 0.
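As a computational aid for the classification of quadrics discussed above (and for checking answers to Exercise 3), the following sketch, which is not part of the text and assumes the Python library SymPy, extracts the symmetric matrix of the quadratic part of a given second-degree equation and prints its eigenvalues and rank; by the preceding discussion, these determine the type of surface.

    # Building the symmetric matrix of the quadratic part of a quadric
    # (a sketch; assumes sympy is installed).
    import sympy as sp

    x, y, z = sp.symbols('x y z')

    def quadratic_part_matrix(expr):
        """Symmetric A with (x, y, z) A (x, y, z)^t equal to the degree-2 part of expr."""
        variables = (x, y, z)
        poly = sp.Poly(expr, *variables)
        A = sp.zeros(3, 3)
        for i, u in enumerate(variables):
            for j, v in enumerate(variables):
                if i == j:
                    A[i, j] = poly.coeff_monomial(u**2)
                else:
                    A[i, j] = sp.Rational(1, 2) * poly.coeff_monomial(u*v)
        return A

    # Exercise 3(d): x^2 - z^2 - 4xy + 4yz + 6 = 0
    A = quadratic_part_matrix(x**2 - z**2 - 4*x*y + 4*y*z + 6)
    print(A)                  # Matrix([[1, -2, 0], [-2, 0, 2], [0, 2, -1]])
    print(A.eigenvals())      # {3: 1, -3: 1, 0: 1} (up to ordering)
    print(A.rank())           # 2: one eigenvalue is zero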

6

Canonical Forms

6.1 INTRODUCTION

In Chapter 5, we saw that a linear operator on a finite-dimensional vector space can be diagonalized only under some strict conditions on its minimal or characteristic polynomial. So we seek other simple forms of matrix representations of linear operators. An upper or a lower triangular matrix is an example of such simple forms, and we have already seen that over C, any operator can be represented by a triangular matrix. But there are other matrix representations that reflect intrinsic properties of linear operators. This chapter deals with some such representations, usually known as canonical forms.

To motivate our approach, recall (see the discussion after Proposition 5.5.5) that a linear operator T on a finite-dimensional vector space is diagonalizable precisely because the vector space can be decomposed as a direct sum of T-invariant subspaces such that the restriction of T to each of these subspaces acts as a scalar times the identity operator of that subspace. Our search for canonical forms for a general linear operator T generalizes this approach. The underlying vector space is decomposed into a direct sum of suitable T-invariant subspaces on which the restrictions of T act in ways that reflect important features of T. Then, Proposition (5.5.5) allows us to have a matrix representation of T as a block diagonal matrix. The decomposition of the vector space into a direct sum of suitable invariant subspaces is accomplished by a powerful result known as the primary decomposition theorem. We begin by discussing this theorem in the next section.

6.2 PRIMARY DECOMPOSITION THEOREM

The proof of the following theorem relies heavily on properties of polynomials over fields; for the relevant details, see Section 5.2.

Theorem 6.2.1. Let T be a linear operator on a finite-dimensional vector space V over a field F, whose minimal polynomial m(x) can be expressed as the product

m(x) = (p1(x))^{r1} (p2(x))^{r2} · · · (pt(x))^{rt}

of irreducible polynomials pj(x) over F, for 1 ≤ j ≤ t, where the rj are positive integers. Define, for each j, 1 ≤ j ≤ t, the following subspaces of V:

Wj = ker(pj(T))^{rj}.


Then the following assertions hold:

(a) Each Wj is T-invariant.
(b) If Tj = T|Wj is the restriction of T to Wj, then the minimal polynomial of Tj is (pj(x))^{rj}.
(c) V can be expressed as the direct sum V = W1 ⊕ W2 ⊕ · · · ⊕ Wt of the T-invariant subspaces Wj.

Proof. According to Proposition (5.5.2), the kernel of any operator on V which commutes with T has to be T-invariant. Since every polynomial in T, and in particular pj(T)^{rj}, commutes with T, assertion (a) of the theorem follows.

We take up assertion (c) next. The direct sum decomposition in (c) will be proved by showing that each Wj is the image of some projection on V (for conditions on projections inducing direct sum decompositions, see Proposition 4.2.13). The required projections Pj will be defined as certain polynomials in T. To begin with, we introduce polynomials f1(x), f2(x), . . . , ft(x) using the factorization of m(x) as follows: for each j, 1 ≤ j ≤ t, let

fj(x) = m(x)/(pj(x))^{rj} = ∏_{k≠j} (pk(x))^{rk}.

From the uniqueness of factorization into irreducible factors, it is clear that these polynomials fj(x) can have no common divisor other than the scalar polynomials. Thus, they are relatively prime. It follows (see Corollary 5.2.6) that there are polynomials q1(x), q2(x), . . . , qt(x) over F such that

f1(x)q1(x) + f2(x)q2(x) + · · · + ft(x)qt(x) = 1.     (6.1)

We also note that the product fj(x)fk(x), for j ≠ k, is divisible by m(x), so that by properties of the minimal polynomial,

fj(T)fk(T) = z,     (6.2)

the zero operator on V. Next, we define certain operators P1, P2, . . . , Pt on V as polynomials in T as follows. For each j, 1 ≤ j ≤ t, let Pj = fj(T)qj(T). Then by Equations (6.1) and (6.2),

P1 + P2 + · · · + Pt = I,     (6.3)

the identity operator on V, and for j ≠ k,

Pj Pk = z,     (6.4)

the zero operator, as polynomials in T commute. Note that multiplying Equation (6.3) by a fixed Pj yields Pj^2 = Pj because of Equation (6.4). Thus each Pj is a projection on V. As the projections P1, P2, . . . , Pt on V satisfy the conditions given in Equations (6.3) and (6.4), it follows from Proposition (4.2.13) that V is the direct sum of the ranges of these projections. So assertion (c) of the theorem will be established once we show that Im(Pj) = Wj for each j. Now, for any v ∈ Im(Pj), as Pj is a projection, v = Pj v, and so

(pj(T))^{rj} v = (pj(T))^{rj} (Pj v) = (pj(T))^{rj} (fj(T)qj(T)v),


by definition of Pj. Since by construction fj(x)(pj(x))^{rj} = m(x), it follows that fj(T)(pj(T))^{rj} is the zero operator on V. The preceding equation then shows that (pj(T))^{rj} v = 0, placing v ∈ ker(pj(T))^{rj}. One concludes that Im(Pj) ⊂ Wj. To prove the reverse inclusion, assume that v ∈ Wj. Thus (pj(T))^{rj} v = 0, which implies that Pi v = 0 for any i ≠ j, as (pj(x))^{rj} is a divisor of fi(x) by definition. It then follows by Equation (6.3) that

v = Iv = (P1 + P2 + · · · + Pt)v = Pj v,

which shows, Pj being a projection, that v ∈ Im(Pj). This completes the verification that Im(Pj) = Wj, and so (c) is proved.

We finally prove assertion (b), that is, we show that the restriction Tj of T to Wj has minimal polynomial (pj(x))^{rj}. Since Wj is the kernel of (pj(T))^{rj}, it follows that (pj(Tj))^{rj} w = 0 for any w ∈ Wj. In other words, Tj satisfies the polynomial (pj(x))^{rj}, which implies that the minimal polynomial h(x) of Tj divides (pj(x))^{rj}. On the other hand, h(x) being the minimal polynomial of Tj, h(Tj), and hence h(T), is the zero operator on Wj. Since fj(T) acts as the zero operator on Wi for any i ≠ j and V is the direct sum of the Wi, it follows that h(T)fj(T) is the zero operator on V, as h(T) and fj(T) commute. Therefore, the minimal polynomial m(x) = fj(x)(pj(x))^{rj} of T divides h(x)fj(x), and so (pj(x))^{rj} is a divisor of h(x). Together with the conclusion of the preceding paragraph, this implies that h(x) equals (pj(x))^{rj}. This proves assertion (b) of the theorem. □

We must point out that the proof of the primary decomposition theorem does not involve at any stage the finite-dimensionality of the underlying vector space. Thus the theorem holds even for operators on infinite-dimensional vector spaces, provided their minimal polynomials can be expressed as products of finitely many irreducible polynomials.

The primary decomposition theorem is a very useful theoretical tool. However, in general, explicit computations based on the theorem (for example, to find suitable bases for the summands Wj, which may determine nice matrix forms for the operator) are not practicable. One of the cases where the primary decomposition theorem does yield significant results is when the irreducible factors of the minimal polynomial of an operator are all linear. This, for example, is the case for any operator on a complex vector space, as the only irreducible polynomials over the field C of complex numbers are the linear ones. In such cases, one has the following useful implication of the primary decomposition theorem.

Corollary 6.2.2. Suppose that the minimal polynomial m(x) of a linear operator T on a finite-dimensional vector space V over a field F is given by m(x) = (x − λ1)^{r1}(x − λ2)^{r2} · · · (x − λt)^{rt}. If Tj is the restriction of T to the T-invariant subspace Wj = ker(T − λj I)^{rj}, then Sj = Tj − λj Ij, where Ij is the identity operator on Wj, is a nilpotent operator on Wj of index rj.

Proof. The proof is clear as, by the primary decomposition theorem, the minimal polynomial of Tj is (x − λj)^{rj}, and so that of Sj is x^{rj}. □

Observe that for a diagonalizable operator T on a finite-dimensional vector space V with distinct eigenvalues λ1, λ2, . . . , λt, its minimal polynomial m(x) = ∏_{j=1}^{t} (x − λj) is a product of distinct linear factors. The T-invariant subspaces Wj of the primary decomposition theorem, in this case, are the


eigenspaces of T, for, by definition, Wj = ker(T − λj I). Thus the decomposition of V as the direct sum of the Wj, as in the theorem, is precisely the direct sum of the distinct eigenspaces of T, a result we proved so laboriously earlier.

The projections associated with the decomposition also provide a nice characterization of the diagonalizable operator T. To derive it, we consider the operator D = λ1 P1 + λ2 P2 + · · · + λt Pt on V, where P1, P2, . . . , Pt are the projections associated with the decomposition of V as the direct sum of the eigenspaces Wj, with Wj = Im(Pj). Recall that for any j, Pj acts as the identity on Wj, whereas it acts as the zero operator on Wi for i ≠ j. Thus, for v = ∑_j wj in V, where wj ∈ Wj, an easy computation shows that Dv = D ∑_j wj = ∑_j λj wj. On the other hand, T wj = λj wj for any wj ∈ Wj. Thus, for v = ∑_j wj, one has T v = ∑_j λj wj. As v is arbitrary in V, T and D are equal, implying that

T = λ1 P1 + λ2 P2 + · · · + λt Pt.     (6.5)
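The projections P1, P2, . . . , Pt appearing in Equation (6.5) can be written down explicitly as polynomials in T: for distinct eigenvalues, Pj = ∏_{k≠j} (T − λk I)/(λj − λk). The sketch below is not part of the text; it assumes the Python library SymPy, and the matrix chosen is only a hypothetical illustration. It verifies the defining properties of these projections for a small diagonalizable matrix.

    # Spectral projections of a diagonalizable matrix (a sketch; assumes sympy).
    # For distinct eigenvalues l_1, ..., l_t the projection onto the eigenspace
    # of l_j is P_j = prod_{k != j} (A - l_k I) / (l_j - l_k), a polynomial in A.
    import sympy as sp

    A = sp.Matrix([[5, -2],
                   [6, -2]])                  # eigenvalues 1 and 2 (distinct)
    n = A.shape[0]
    eigs = sorted(A.eigenvals().keys())       # [1, 2]

    projections = []
    for lj in eigs:
        P = sp.eye(n)
        for lk in eigs:
            if lk != lj:
                P = P * (A - lk*sp.eye(n)) / (lj - lk)
        projections.append(P)

    S = sp.zeros(n, n)      # will hold P1 + P2
    D = sp.zeros(n, n)      # will hold l1*P1 + l2*P2
    for l, P in zip(eigs, projections):
        S += P
        D += l*P

    print(S == sp.eye(n))                                      # True: P1 + P2 = I
    print(projections[0]*projections[1] == sp.zeros(n, n))     # True: P1 P2 = 0
    print(D == A)                                              # True: A = l1*P1 + l2*P2, as in (6.5)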

Conversely, we claim that if the Pj are non-zero projections on a finite-dimensional vector space V over a field F such that

(i) Pj Pk, for j ≠ k, is the zero operator on V, and
(ii) P1 + P2 + · · · + Pt = I, the identity operator on V,

then a linear operator T on V satisfying Equation (6.5) is diagonalizable with the scalars λ1, λ2, . . . , λt as its eigenvalues.

We first show that each λj is an eigenvalue of T. Multiplying both sides of Equation (6.5) by Pj, one obtains T Pj = λj Pj by Condition (i). Thus, if Wj = Im Pj (which means that wj ∈ Wj if and only if Pj wj = wj), then every non-zero vector of Wj is an eigenvector of T for the eigenvalue λj. But Wj contains a non-zero vector as Pj is non-zero, and so it is clear that λj is an eigenvalue of T. Next note that for any scalar λ, Equation (6.5) coupled with Condition (ii) implies that

T − λI = (λ1 − λ)P1 + (λ2 − λ)P2 + · · · + (λt − λ)Pt.

Therefore, if (T − λI)v = 0 holds for some non-zero v ∈ V, then (λk − λ)Pk v = 0 for any k, as Pj Pk is the zero operator for k ≠ j. However, v is non-zero if and only if Pj v is non-zero for some j. Thus, for any such j, the preceding equality implies that λj − λ = 0. Thus, T can have no eigenvalue other than λ1, λ2, . . . , λt. Finally, we note that for any v ∈ V, by Condition (ii), v = ∑_j Pj v. Since every non-zero vector in Im(Pj) is an eigenvector of T, it then follows that V is spanned by eigenvectors of T. Thus, T is diagonalizable as claimed.

Now we consider the general case of a linear operator whose minimal polynomial is a product of linear factors, not necessarily distinct. In that case, the primary decomposition theorem (with a little help from Equation (6.5)) provides a nice description of such operators, known as the Jordan–Chevalley or SN decomposition.

Proposition 6.2.3. Let T be a linear operator on a finite-dimensional vector space V, whose minimal polynomial m(x) is a product of linear factors. Then there is a diagonalizable operator S and a nilpotent operator N, both on V, such that

T = S + N   and   SN = NS.     (6.6)

Moreover, T determines the diagonalizable operator S and the nilpotent operator N uniquely.


The letter S stands for semi-simple, for diagonalizable operators are sometimes also known as semi-simple operators.

Proof. Let m(x) = ∏_{j=1}^{t} (x − λj)^{rj} be the minimal polynomial of T. Then, according to the primary decomposition theorem (see Equations (6.3) and (6.4)), there are non-zero projections P1, P2, . . . , Pt on V, each a polynomial in T, such that (i) Pk Pj, for k ≠ j, is the zero operator on V, and (ii) P1 + P2 + · · · + Pt = I, the identity operator on V. Therefore, by the discussion following Equation (6.5), the operator

S = λ1 P1 + λ2 P2 + · · · + λt Pt     (6.7)

is a diagonalizable operator on V. We next set

N = T − S.     (6.8)

We claim that the operator N on V is a nilpotent one. We first note that, by multiplying the relation I = P1 + P2 + · · · + Pt by T, one obtains T = T P1 + T P2 + · · · + T Pt. This relation, combined with the definition (6.7) of S, implies the following formula for N = T − S:

N = ∑_{j=1}^{t} (T − λj I)Pj.     (6.9)

In fact, we shall show that, for any positive integer r, the relation

N^r = ∑_{j=1}^{t} (T − λj I)^r Pj     (6.10)

holds, by induction on r. Equation (6.9) starts the induction for r = 1. So assume that (6.10) holds for some r ≥ 1. Now, multiplying the expression for N^r by N = ∑_k (T − λk I)Pk, we can simplify the product by noting that any Pk, being a polynomial in T, commutes with the polynomial (T − λk I). Since Pj Pk, for j ≠ k, is the zero operator and Pk^2 = Pk, our simplification yields the relation N^{r+1} = ∑_{k=1}^{t} (T − λk I)^{r+1} Pk. This shows that, by induction, the relation (6.10) holds for any positive integer r.

Recall that in the proof of the primary decomposition theorem it was shown that Im(Pj) = ker(T − λj I)^{rj} for any j. Therefore, if we choose a positive integer r such that r > max{rj}, then for any v ∈ V, (T − λj I)^r Pj v = 0 for any j. It then follows, from Equation (6.10), that for such an r, N^r v = ∑_{j=1}^{t} (T − λj I)^r Pj v = 0. Thus N^r is the zero operator, and so our claim that N is nilpotent is established. Since, by the definition of N, T = S + N, where S and N, being polynomials in T, commute, the first part of the proposition is proved.

Thus, to complete the proof, we need to prove the uniqueness part. So let T = S′ + N′ be another decomposition, where S′ is diagonalizable and N′ nilpotent, such that they commute. It is then trivial that each of S′ and N′ commutes with T and hence with any polynomial in T. In particular, each commutes with S and N, as these were constructed as polynomials in T. Now, the commuting diagonalizable operators S and S′ are simultaneously diagonalizable (see Exercise 19 in Section 5.5), and so S − S′ is a diagonalizable operator on V. On the other hand, as the nilpotent operators N and N′ commute, the operator N′ − N is a nilpotent operator on V. Since

Saikia-Linear Algebra

326

book1

February 25, 2014

0:8

Canonical Forms

T = S + N = S ' + N ' , we have just shown that the operator S − S ' = N ' − N on V is diagonalizable as well as nilpotent, which implies that S − S ' has to be the zero operator. Thus, we may conclude that S = S ' and N ' = N, which proves the required uniqueness. ! It is customary to call S the diagonalizable part and N the nilpotent part of T . The matrix version of Proposition (6.2.3) is clear, and we leave it to the reader to formulate them. However, for a proper appreciation of the matrix version, we have to wait till we develop the theory of Jordan forms later in the chapter. To illustrate the difficulties of actually computing the diagonalizable and nilpotent parts of a matrix, we present a simple example. Consider the real matrix A given by  0 1 A =  0 0

0 0 1 0

0 0 0 1

 −1  0 , 2 0

whose minimal polynomial (as well as the characteristic polynomial) is clearly x4 − 2x2 + 1 (what’s so clear about it?). Note that x4 − 2x2 + 1 = (x − 1)2(x + 1)2 , where x − 1 and x + 1 are irreducible over R. Therefore, if T is the linear operator on R4 , represented by A, say with respect to the standard basis of R4 , then the primary decomposition theorem implies that R4 = W1 ⊕ W2 , where the T -invariant subspaces Wi are given by W1 = ker(T − I)2

and W2 = ker(T + I)2 ,

(6.11)

where I is the identity operator on R4 . Let T j and I j denote the restriction of T to W j and the identity of W j , respectively. Then it is clear that N1 = (T 1 − I1 ) and N2 = (T 2 − I2 ) are nilpotent of index 2 on W1 and W2 , respectively. Our first goal is to find suitable bases for the subspaces W j using the nilpotent operators N j . Fortunately for us, each of the subspaces W j has dimension 2 (by checking the characteristic polynomial) whereas the nilpotent operators are also of index 2; so the nilcyclic bases determined by N1 and N2 are ideal for our purpose (see, for example, Proposition 5.5.15). We take the subspace W1 first. The basis we are looking for is {v1 , N1 v1 }, where v1 is so chosen in W1 such that N1 v1 is non-zero. Since W1 = ker(T − I)2 , we can determine the vectors in W1 by computing the solution space of (A − I4)2 x = 0 in R4 . We leave it to the reader to verify that the row reduced form of

is

  1 −2 (A − I4)2 =   1 0  1 0  0  0

0 1 0 0

0 1 −2 1

−1 −2 0 0

−1 0 3 −2  2  3 , 0 0

 2  −1  −4 3

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Primary Decomposition Theorem

327

which shows that (x1 , x2 , x3 , x4 )t ∈ W1 if and only if the coordinates satisfy the system of equatios x1

− x3 + 2x4 = 0 . x2 − 2x3 + 3x4 = 0

Now it is easy to check that v1 = (1, 2, 1, 0)t ∈ W1 but (A − I4)(1, 2, 1, 0)t is non-zero. This implies that for this choice of v1 , {v1 , N1 v1 } is the basis of W1 with respect to which the matrix of N1 is the elementary Jordan block J2 (0). So the matrix B1 of T 1 with respect to the same basis is given by ' ( 1 0 . B1 = 1 1 Similarly, one can find a basis {v2 , N2 v2 } of W2 with respect to which the matrix of T 2 is ' ( −1 0 . B2 = 1 −1 Since R4 is the direct sum of W1 and W2 , the union of the bases we have found for W1 and W2 provides a basis for R4 . It also follows that the matrix B of T with respect to this basis is the direct sum of matrices B1 and B2:   1 0 0 0  1 1 0 0   B =  0 0 -1 0  0 0 1 -1

We can rewrite B as B = D + J, where D is the diagonal matrix D = [1, 1, −1, −1] and J is the nilpotent matrix J = J2 (0) ⊕ J2 (0). Thus, we have shown that the original matrix A is similar to the sum of a diagonal and a nilpotent matrix. It must be pointed out that the example worked precisely because the dimensions of W1 and W2 are equal to the indices of the corresponding nilpotent operators N1 and N2 . In general, the canonical forms of nilpotent operators (that is, the simplest possible matrix forms for such operators) will be required. We take up the derivation of such canonical forms in the next section. EXERCISES 1. Determine whether the following statements are true or false giving brief justifications: (a) The primary decomposition theorem is not valid for a linear operator whose minimal polynomial is a power of a single irreducible polynomial. (b) The primary decomposition theorem is not valid for a linear operator on an infinitedimensional vector space. (c) Every matrix over C is similar to a sum of a diagonal and a nilpotent matrix. (d) Every matrix over C is a sum of a diagonalizable and a nilpotent matrix. (e) The SN decomposition of A is given by    1 2 3 1    A = 0 1 4 = 0    0 0 1 0

0 1 0

  0 0   0 + 0   1 0

2 0 0

 3  4 .  0

Saikia-Linear Algebra

328

book1

February 25, 2014

0:8

Canonical Forms

(f) If the minimal polynomial of a linear operator is a product of distinct linear factors, then the image of each projection given by the primary decomposition theorem has dimension 1. (g) If the minimal polynomial of a linear operator is a product of distinct linear factors, then the nilpotent operator in the SN decomposition of the operator is the zero operator. (h) If the minimal polynomial of a linear operator is xr , then the diagonalizable part of the operator is the zero operator. (i) If the minimal polynomial of an operator is a product of linear factors, then the operator is diagonalizable if and only if the nilpotent part of the operator is zero. 2. For each of the following matrices A over the indicated fields F, find the SN decomposition if it exists: ' ( 1 1 (a) A = over F = R. 1 1   0 0 1 (b) A = 0 0 0 over F = R.   1 0 0

3. Let T be the linear operator on R3 whose matrix with respect to the standard basis is   0 0 −1   0. A = 1 0   0 1 0

Express the minimal polynomial of T as a product p1 (x)p2 (x) of monic irreducible polynomials over R. Find bases of the kernels of p1 (T ) and p2 (T ), and compute the matrix of T with respect to the basis of R3 thus found. 4. Let T be the linear operator on R4 whose matrix with respect to the standard basis is  0 1 A =  1 1

0 0 −1 −1

1 0 −5 −8

 −1  0 . 4 6

Express the minimal polynomial of T as a product p1 (x)p2 (x) of monic irreducible polynomials over R. Find bases of the kernels of p1 (T ) and p2 (T ), and compute the matrix of T with respect to the basis of R4 thus found. 5. Let T be the linear operator on R3 whose matrix with respect to the standard basis is   −10 −7 23    6 . A =  −2 −3   −5 −4 12

Show that there exist diagonalizable operator S and nilpotent operator N on R3 such that T = S + N. What are the matrices of S and N with respect to the standard basis of R3 ? 6. Let T be a linear operator on a finite-dimensional vector space V over a field F with characteristic polynomial ch(x) = (x − a1)d1 (x − a2)d2 · · · (x − at )dt

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

329

and minimal polynomial m(x) = (x − a1)r1 (x − a2)r2 · · · (x − at )rt over F. Let Wi = ker(T − ai )ri . Prove that dim Wi = di . 7. Prove the following variant of the primary decomposition theorem: Let T be a linear operator on a vector space V over a field F such that its minimal polynomial m(x) factorizes as p1 (x)p2 (x) · · · pt (x), where the pi are pairwise relatively prime monic polynomials over F. Let Wi = ker pi (T ). Prove the following assertions. (a) The subspaces Wi are T -invariant. (b) V = W1 ⊕ W2 ⊕ · · · ⊕ Wt .

(c) The minimal polynomial of T restricted to Wi is pi (x). 8. For a linear operator T on a vector space V whose minimal polynomial factors into a product of powers of distinct irreducible polynomials, let V = W1 ⊕ W2 ⊕ · · · ⊕ Wt be the primary decomposition of V. Prove that for any T -invariant subspace W of V W = (W ∩ W1 ) ⊕ (W ∩ W2 ) ⊕ · · · ⊕ (W ∩ Wt ). 9. Deduce from the primary decomposition theorem the result that if the minimal polynomial of a linear operator is a product of distinct linear factors, then it must be diagonalizable. 10. Let T be a linear operator on a finite-dimensional vector space over C, and let S be the diagonalizable part of T . Show that for any polynomial f (x) over C, f (S ) is the diagonalizable part of f (T ). 11. Let the minimal polynomial of a linear operator T on a vector space V be m(x). Prove that there is a vector in V whose T -annihilator is precisely m(x). 12. Let T be a linear operator on a finite-dimensional vector space V over a field F. Prove that the characteristic polynomial of T is irreducible over F if and only if T has no T -invariant subspace other than V and the zero subspace. 13. Let T be a linear operator on a finite-dimensional vector space over C with minimal polynomial m(x), and let f (x) be any polynomial over F. Prove that ker( f (T )n ) = ker f (T ) for all n ≥ 1 if and only if m(x) has no repeated factors.

6.3 JORDAN FORMS We begin this section by first deriving the canonical form or the Jordan form of a nilpotent operator on a finite-dimensional vector space and then go on to derive the Jordan form of a general operator whose minimal polynomial splits into a product of linear factors. We have already taken the first step in determining the canonical form of a nilpotent operator in the last chapter. In Proposition (5.5.15), we had seen that if T is a nilpotent operator of index of nilpotency r on a vector space V, then there is a T -invariant subspace W1 of dimension r and a T nilcyclic basis of W1 , relative to which the matrix of the restriction T 1 is the elementary Jordan block Jr (0) of eigenvalue 0. In fact, if v ∈ V is a vector such that T r−1 v ! 0, then v, T v, . . . , T r−1 v is a nilcyclic basis with respect to which the matrix of the restriction T 1 is clearly Jr (0). The idea behind the derivation of the canonical form of T is to decompose V as a direct sum V = W1 ⊕ W2 ⊕ · · · ⊕ Wk of T -invariant, T -cyclic subspaces such that the restriction T i of T to the subspace Wi for i ≥ 2 is nilpotent on Wi but whose index of nilpotency does not exceed that of T i−1 . The existence of nilcyclic bases for

Saikia-Linear Algebra

330

book1

February 25, 2014

0:8

Canonical Forms

these subspaces then ensures that the matrix of T is a direct sum of elementary Jordan matrices whose sizes are non-increasing. The crucial step is to determine the nilcyclic bases for the T -invariant subspaces which will yield the necessary Jordan forms. We consider a hypothetical canonical form of a nilpotent operator which is a direct sum of elementary Jordan blocks of non-increasing sizes, to clarify this crucial step. Suppose that T is nilpotent of index 6 on a vector space V of dimension 22 having its Jordan form as J6 (0) ⊕ J6(0) ⊕ J5(0) ⊕ J3(0) ⊕ J2(0). This corresponds to nilcyclic bases of the T -invariant subspaces Wi which we arrange in columns as follows: W1 v1 T v1 T 2 v1 T 3 v1 T 4 v1 T 5 v1

W2 v2 T v2 T 2 v2 T 3 v2 T 4 v2 T 5 v2

W3

W4

v3 T v3 T 2 v3 T 3 v3 T 4 v3

v4 T v4 T 2 v4

W5

. v5 T v5

However, such nilcyclic bases will be constructed (for the proof of their existence) by determining the vectors in each row, starting with the first row and then determining the vectors in each succeeding row, by a well-defined procedure. Before proving that such a procedure works, let us make some preliminary remarks and fix the necessary notation. Suppose that T is nilpotent of index r on a finite-dimensional vector space V. For any positive integer j, let K j = ker T j . Note that K1 , the kernel of T , is the usual eigenspace of T belonging to its sole eigenvalue 0. The rest of the subspaces K j are sometimes called generalized eigenspaces. Since T r is the zero operator whereas T r−1 is not, it follows that Kr = V

but

Kr−1 ! V.

Hence, there exists some v ∈ V such that T r v = 0, but T r−1 v ! 0. Therefore, T r− j v ∈ K j , but T r− j v " K j−1 . Thus, there is the following strict inclusion of generalized eigenspaces: K1 ⊂ K2 ⊂ · · · ⊂ Kr−1 ⊂ Kr = V.

Put K0 = {0}. Next, we define integers qi = qi (T ) for i = 1, 2, . . . , r by

qi = dim(Ki /Ki−1 ) = dim Ki − dim Ki−1 .

(6.12)

Note that q1 = dim K1 = nullity(T ) and

dim Ki = q1 + q2 + · · · + qi for any i = 1, 2, . . . , r. These, in turn, determine another set of integers si = si (T ) as follows sr = qr

and si = qi − qi+1

for i = 1, 2, . . . , r − 1.

(6.13)

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

331

The following result implies that the integers si are non-negative. Lemma 6.3.1.

For i = 1, 2, . . . , r − 1, dim(Ki /Ki−1 ) ≥ dim(Ki+1 /Ki ).

Proof. Let u1 + Ki , u2 + Ki , . . . , ul + Ki be linearly independent in the quotient space Ki+1 /Ki . It is clear that the vectors T u1 , T u2 , . . . , T ul are in Ki . To prove the lemma, it suffices to show that the cosets T u1 + Ki−1 , T u2 + Ki−1 , . . . , T ul + Ki−1 are linearly independent in Ki /Ki−1 . Now, if a1 (T u1 + Ki−1 ) + a2(T u2 + Ki−1 ) + · · · + al (T ul + Ki−1 ) = Ki−1 , the zero vector in Ki /Ki−1 , then a1 T u1 + a2 T u2 + · · · + al T ul ∈ Ki−1 forcing a1 u1 + a2 u2 + · · · + al ul ∈ Ki . This, however, implies that a1 (u1 + Ki ) + a2 (u2 + Ki ) + · · · + al (ul + Ki ) = Ki , the zero of Ki+1 /Ki . The linear independence of these cosets then implies that a1 = a2 = · · · = al = 0 which, in turn, proves the linear independence of the cosets T u1 + Ki−1 , T u2 + Ki−1 , . . . , T ul + Ki−1 in Ki /Ki−1 . ! We also note that s1 (T ) + s2 (T ) + · · · + sr (T ) = q1 (T ) = dim ker T.

(6.14)

We remark that even though the integers qi = qi (T ) and si = si (T ) are determined by the generalized eigenspaces of the nilpotent operator T of index r, we sometimes suppress the symbol T for convenience. We can now state the main result about nilpotent operator. Proposition 6.3.2. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional vector space V with K j = ker T j for 0 ≤ j ≤ r. Let si = si (T ) be the integers defined by Equation (6.13). Also set m = m(T ) = nullity(T ). Then, there exists m vectors v1 , v2 , . . . , vm in V such that (a) exactly s j of these vectors are in the difference K j − K j−1 for j = 1, 2, . . . , r; (b) non-zero vectors of the form T k vi , for k ≥ 0 and 1 ≤ i ≤ m, form a basis of Kr = V.

Furthermore, any set of vectors u1 , u2 , . . . , ul in Kr such that the cosets u1 + Kr−1 , u2 + Kr−1 , . . . , ul + Kr−1 are linearly independent in the quotient space Kr /Kr−1 , can be included among v1 , v 2 , . . . , v m . Proof. The proof is by induction on r. If r = 1, then T is the zero operator on V and vectors from any basis of V will be the required ones trivially. So, we may assume that r > 1. Choose a basis v1 + Kr−1 , v2 + Kr−1 , . . . , v sr + Kr−1 of the quotient space Kr /Kr−1 . (Note that sr = qr is the dimension of Kr /Kr−1 ). The vectors T v1 , T v2 , . . . , T v sr are in Kr−1 , and the preceding Lemma (6.3.1) then shows that the cosets T v1 + Kr−2 , T v2 + Kr−2 , . . . , T v sr + Kr−2 are linearly independent in Kr−1 /Kr−2 . Now T 1 , the restriction of T to Kr−1 , is clearly a nilpotent operator of index r − 1, whose sequence of generalized eigenspaces can be obtained from that of T by just excluding the last one, namely Kr . It follows that sr−1 (T 1 ) = qr−1 = qr−1 (T )

Saikia-Linear Algebra

332

book1

February 25, 2014

0:8

Canonical Forms

and si (T 1 ) = si = si (T )

for i = 1, 2, . . . , r − 2.

Note that sr (T 1 ) is undefined. Observe also that m(T 1 ) = nullity(T 1 ) = m, as the kernel of T as well as of T 1 is the same, namely K1 . Therefore, by the induction hypothesis, there is a set S1 of m vectors u1 , u2 , . . . , um in Kr−1 , which includes T v1 , T v2 , . . . , T v sr by our claim in the preceding paragraph, and is such that (a) exactly s j (T 1 ) of the vectors are from K j − K j−1 for j = 1, 2, . . . , r − 1; (b) non-zero vectors of the form T 1 k ui for i = 1, 2, . . . , m and k ≥ 0 form a basis of Kr−1 . To obtain the required vectors in V, we replace the sr vectors T v1 , T v2 , . . . , T v sr in S1 by v1 , v2 , . . . , v sr and rename the rest of the vectors in S1 as v sr +1 , . . . , vm to obtain a set S of m vectors in Kr . Note that S1 had exactly sr−1 (T 1 ) vectors from Kr−1 . Since sr−1 (T 1 ) = qr−1 (T ) and sr−1 (T ) = qr−1 (T ) − qr (T ) = qr−1 (T ) − sr , it follows by the construction of S that it has exactly sr vectors from Kr − Kr−1 and sr−1 = sr−1 (T ) vectors from Kr−1 − Kr−2 . Furthermore, we have already noted that si (T 1 ) = si (T ) = si for i = 1, 2, . . . , r − 2, so S has exactly si vectors from Ki . Finally, T and T 1 coincide on Kr−1 . Therefore, the properties of S1 , noted earlier, prove that S is the required set in Kr = V. ! This technical result provides us with all the information about canonical forms of nilpotent operators. Proposition 6.3.3. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional vector space V. Then, there is a basis of V with respect to which the matrix of T is a direct sum of copies of elementary Jordan matrices Jr (0), Jr−1 (0), . . . , J1 (0) with Jl (0) appearing in the direct sum exactly sl times. The total number of elementary Jordan matrices in the sum is nullity(T ). The sizes and the number of elementary Jordan matrices are independent of the choice of the basis and uniquely determined by T . Proof. Given the nilpotent operator T of index r on V, consider the set of vectors v1 , v2 , . . . , vm of the preceding proposition, where m = nullity(T ). For each i, 1 ≤ i ≤ m, let Vi be the T -nilcyclic subspace of V spanned by the vectors T k vi for k ≥ 0; then dim Vi = l if and only if vi ∈ Kl − Kl−1 . In that case, vi , T vi , . . . , T l−1 vi form a basis of Vi , with respect to which the matrix of the restriction of T to Vi is precisely Jl (0), the l × l elementary Jordan matrix of eigenvalue 0. The proposition also implies that the number of such T -cyclic subspaces is sl for any 1 ≤ l ≤ r. It is also clear that dim V = dim V1 + dim V2 + · · · + dim Vm . Therefore if B is the union of the nilcyclic bases of all the Vi , then B is a basis of V and the matrix of T with respect to B is clearly the required direct sum of elementary Jordan blocks.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

333

Note that the number sl of l × l Jordan blocks appearing in the direct sum is given by sr = qr

and

sl = ql − ql+1

for l < r,

where ql = dim(ker T l / ker T l−1 ) for any l. Thus, sl is determined uniquely by T and so independent of the basis chosen for V. Similarly, the total number of Jordan blocks appearing in the direct sum is uniquely determined by T , as it equals nullity(T ). ! We restate this result about the existence of a special form of matrix representation of a nilpotent operator in a more convenient form by changing the notation slightly. Theorem 6.3.4. Let T be a nilpotent operator of index of nilpotency r on a finite-dimensional vector space V. Then, T determines a unique set of positive integers n1 ≥ n2 ≥ · · · ≥ nm with n1 = r, and n1 + n2 + · · · + nm = dim V, such that there is a basis of V relative to which the matrix of T is a direct sum of the following elementary Jordan blocks of eigenvalue 0, namely Jn1 (0), Jn2 (0), . . . , Jnm (0). Equivalently, V = Wn1 ⊕ Wn2 ⊕ · · · ⊕ Wnm ,

(6.15)

where the subspace Wni is a T -nilcyclic subspace of dimension ni . As far as the change of notation is concerned, note that, for example, our new integers n1 = n2 = · · · = n sr are all equal to r, and if sr−1 ! 0, then the corresponding integers are n sr +1 = · · · = n sr +sr−1 = r − 1, and so on. It is clear from Proposition (6.3.3) that T determines these integers ni uniquely, subject to the following two conditions: (i) ni are non-increasing, and (ii) the sum of the ni equals dim V These integers are called the invariants of the nilpotent operator T . Note that there has to be at least one invariant of T , namely n1 = r. As usual, we have the matrix analogue of the results proved just now, and we frame them leaving the derivation to the reader. Theorem 6.3.5. Let A ∈ Mn (F) be nilpotent of index r. Then, there is a set of positive integers n1 ≥ n2 ≥ · · · ≥ nm with n1 = r, 1 ≤ ni ≤ r, and n1 + n2 + · · · + nm = n, such that A is similar to a matrix which is the direct sum of elementary Jordan blocks Jn1 (0), Jn2 (0), . . . , Jnm (0). The number m of Jordan blocks equals nullity(A). As in the case of operators, these integers are called the invariants of the nilpotent matrix A. Also, the matrix which is the direct sum of the elementary Jordan blocks determined by A is called the Jordan form or the canonical form of the nilpotent matrix A. Note that the number sl (A) of elementary Jordan blocks of size l for any 1 ≤ l ≤ r, in the Jordan form of a matrix A is given by sr (A) = qr (A) and

sl (A) = ql (A) − ql+1(A) for l < r,

(6.16)

where ql (A) = nullity(Al ) − nullity(Al−1). We remark here that the invariants of a nilpotent operator T and any matrix representation A of T are exactly the same, for, as the discussion preceding Theorem (6.3.4) shows, the invariants are determined completely by the dimensions of ker T l for various l’s and hence by the nullities of Al .

Saikia-Linear Algebra

334

book1

February 25, 2014

0:8

Canonical Forms

The uniqueness of the invariants of a nilpotent operator implies the following useful result. Proposition 6.3.6. Let A, B ∈ Mn (F) be nilpotent matrices having invariants n1 ≥ n2 ≥ · · · ≥ nm and t1 ≥ t2 ≥ · · · ≥ tq , respectively. Then, A and B are similar if and only if m = q and ni = ti for all i. Proof. If A and B have the same set of invariants, then they will be similar to the same Jordan form, and so must themselves be similar. If A and B are similar, then they are the representation of some nilpotent operator T on Fn with respect to two bases of Fn . Since a nilpotent operator and any of its matrix representation share the same invariants, it follows that A and B have the same invariants. ! Similarly, the following result about similar nilpotent operators holds. Proposition 6.3.7. Two nilpotent operators on a finite-dimensional vector space are similar if and only if they have the same invariants. We consider some examples now. Recall that a nilpotent operator on an n-dimensional vector space, or an n × n matrix with index of nilpotency r, has minimal polynomial xr and characteristic polynomial xn . EXAMPLE 1

EXAMPLE 2

Suppose that the invariants of a nilpotent operator T (or of a nilpotent matrix A) are 3, 3, 2, 1, 1. Then, we have the following information about T : T is nilpotent of index 3, acting on a vector space of dimension 3 + 3 + 2 + 1 + 1 = 10, or A is a 10 × 10 matrix of index of nilpotency 3. The Jordan form of T or A is the direct sum of the elementary Jordan blocks J3 (0), J3 (0), J2 (0), J1 (0), J1 (0). Explicitly, the Jordan form will be    0 0 0 0 0 0 0 0 0 0   1 0 0 0 0 0 0 0 0 0     0 1 0 0 0 0 0 0 0 0     0 0 0 0 0 0 0 0 0 0   0 0 0 1 0 0 0 0 0 0   .  0 0 0 0 1 0 0 0 0 0   0 0 0 0 0 0 0 0 0 0     0 0 0 0 0 0 1 0 0 0     0 0 0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0

Assume that T is a nilpotent operator with minimal polynomial x3 and characteristic polynomial x7 . This information is not enough to determine the Jordan form of T or the similarity class of T . All that we can do is to specify the possible Jordan forms of T . Note that T is acting on a vector space of dimension 7 (as the characteristic polynomial has degree 7) and has index of nilpotency 3. The Jordan form of T can be the direct sum of the elementary Jordan blocks listed in any one of the following:

• • • •

J3 (0), J3 (0), J3 (0), J3 (0),

J3 (0), J2 (0), J2 (0), J1 (0),

J1 (0) J2 (0) J1 (0), J1 (0) J1 (0), J1 (0), J1 (0).

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

335

These possibilities are determined by finding positive integers n1 ≥ n2 ≥ · · · ≥ nt , where n1 = 3 and n1 + n2 + · · · + nt = 7. EXAMPLE 3

Consider two 3 × 3 nilpotent matrices over any field F having the same minimal polynomial. Note that if the minimal polynomial is x3 , then the Jordan form of both will be J3 (0), and so they are similar. For the other possibility, note that if a 3 × 3 nilpotent matrix has minimal polynomial x2 , its invariants have to be 2 and 1. Thus, by Proposition (6.3.6), any two nilpotent matrices having minimal polynomial x2 will also be similar. It is left as an exercise to the reader to find two 4 × 4 nilpotent matrices having the same minimal polynomial but which are not similar over any given field.

EXAMPLE 4

Let T be a linear operator on R4 such that it is represented with respect to the standard basis by the matrix:  0 1 A =  2 4

0 0 3 5

 0  0 . 0 0

0 0 0 6

Computing higher powers of A, we see that   0  0 A2 =   3 17

0 0 0 18

0 0 0 0

 0  0  0 0

and

  0  0 A3 =   0 18

0 0 0 0

0 0 0 0

 0  0  0 0

and A4 the zero matrix. Thus, A and T are nilpotent of index 4, and it is possible to find a T -nilcyclic basis of R4 . We seek first a vector v in R4 such that T 3 v is nonzero. Equivalently, we seek v = (x1 , x2 , x3 , x4 )t ∈ R4 such that A3 (x1 , x2 , x3 , x4 )t ! (0, 0, 0, 0)t ; from the explicit description of A3 , it is clear that any (x1 , x2 , x3 , x4 )t satisfies the condition if x1 ! 0. Thus, we may choose the required v in many ways. For simplicity, let us choose v = (1, 0, 0, 0)t = e1 . Then simple matrix multiplications show that Av = (0, 1, 2, 4)t , A2 v = (0, 0, 2, 17)t and A3 v = (0, 0, 0, 18)t . It follows, therefore, by Proposition (5.5.15) that v = (1, 0, 0, 0)t , T v = (0, 1, 2, 4)t , T 2 v = (0, 0, 3, 17)t and T 3 v = (0, 0, 0, 18)t form a T -nilcyclic basis of R4 , relative to which the matrix of T is the elementary Jordan block:  0 1 J4 (0) =  0 0

0 0 1 0

0 0 0 1

 0  0 . 0 0

We leave it to the reader to determine the matrix P such that P−1 AP = J4 (0). We now discuss the Jordan forms for general linear operators. To make the ideas clear, we first look at a special case of a linear operator T on an n-dimensional vector space V whose characteristic

Saikia-Linear Algebra

336

book1

February 25, 2014

0:8

Canonical Forms

polynomial is of the form, say (x − λ)n for some scalar λ, so that T has a single eigenvalue. Thus the minimal polynomial of T is (x − λ)r for some r ≤ n. To analyse T , we consider instead the operator S on V given by S = T − λI, where I is the identity map on V. Then, by Corollary (6.2.2), S is nilpotent of index r (as the minimal polynomial of S has to be xr ). Let r = n1 ≥ n2 ≥ · · · ≥ nm be the invariants of S . Thus there is a basis {vi } of V such that the matrix J of S relative to this basis is a direct sum of the Jordan blocks Jn1 (0), Jn2 (0), . . . , Jnm (0). Since T vi = S vi + λvi , it follows that the matrix of T with respect to the same basis will be the sum of J, the matrix of S , and diag[λ,λ , . . . ,λ ], the matrix of λI. In other words, the matrix of T will differ from J only in the diagonal; instead of 0 all along the diagonal as in J, the matrix of T will have λ along the diagonal. Thus, the matrix of T will be the direct sum of elementary Jordan blocks with eigenvalue λ. Recall that an elementary Jordan block Jl (λ), of eigenvalue λ, is a matrix of order l having λ along the diagonal, 1 along the subdiagonal and zeros everywhere else. Because of the properties of the Jordan form of the nilpotent operator S (determined earlier in the section), we can therefore conclude that if T ∈ EndF (V) has characteristic polynomial (x − λ)n and minimal polynomial (x − λ)r , then there is a unique set of positive integers n1 ≥ n2 ≥ · · · ≥ nt with 4 n1 = r and tj=1 n j = n such that there is a basis of V, relative to which the matrix A of T is the direct sum of Jordan blocks Jn1 (λ), Jn2 (λ), . . . , Jnm (λ), that is, A = Jn1 (λ) ⊕ Jn2 (λ) ⊕ · · · ⊕ Jnm (λ). We say, in general, that a matrix is in Jordan form if it can be expressed as a direct sum of elementary Jordan blocks of possibly different eigenvalues. Thus, we have shown that if the characteristic polynomial of a linear operator on a finite-dimensional vector space of the form (x − λ)d , then it can be represented by a matrix in Jordan form. But given such a T , how does one determine the unique integers ni ? Since T has been shown to be a sum of a nilpotent operator and a scalar multiple of the identity operator, it follows that these integers for T are the same as the ones for the nilpotent part, which, by our discussion of invariantsof a nilpotent operator, can be described as follows: (a) n1 = r is the degree of the minimal polynomial of T . (b) The number of Jordan blocks of size r, that is, the number of of ni such that ni = n1 = r is precisely sr = dim ker(T − λI)r . (c) For l = r − 1, r − 2, . . . , 2, 1, there will be exactly sl = (ql − ql+1 ) number of Jordan blocks of size l where, for l > 1, ql = dim ker(T − λI)l − dim ker(T − λI)l−1 , which implies that ql = rank(T − λI)l−1 − rank(T − λI)l . Moreover, one also has q1 = dim ker(T − λI) = n − rank(T − λI). Note the use of the dimension formula of Theorem (4.2.7) in simplifying the expressions for q’s. (d) The total number of Jordan blocks is (q1 − q2 ) + (q2 − q3 ) + · · · + (q s−1 − qr ) + qr = q1 = nullity(T − λI). In practice, these integers are best determined by forming the matrix (T − λI) with respect to any convenient basis of V, and computing the ranks of successive powers of that matrix. We consider a simple example to illustrate these remarks.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

EXAMPLE 5

337

Consider an operator T whose matrix with respect to the standard basis of F4 is  λ  0 A =   0 1

1 λ−1 0 0

0 0 λ 0

 −1   1  , 0  λ+1

where λ is an arbitrary but fixed scalar from the field F. It is easy to see that the characteristic polynomial of A, and therefore, of T , is (x − λ)4 . Thus, T − λI, where I is the identity operator on F4 . We compute the powers of A − λI4 , the matrix of T − λI with respect to the standard basis:  1 0 0 −1 A − λI4 =  0 0 1 0  −1 −1  1 1 2 (A − λI4) =  0  0 1 1

0 0 0 0 0 0 0 0

 −1  1 ; 0 1  0  0 ; 0 0

rank(A − λI4) = 2;

rank(A − λI4)2 = 1;

Finally, (A − λI4 )3 is the zero matrix, showing that x3 is the minimal polynomial of A − λI4 and of T − λI. It follows from the formulae enumerated in the preceding remarks that s3 = q3 = nullity(A − λI4)3 − nullity(A − λI4)2 = 4 − 3 = 1. Similarly, as q2 = nullity(A − λI4)2 − nullity(A − λI4)3 = 4 − 3 = 1, one obtains s2 = q2 − q3 = 0. Finally, it is clear that s1 = 1. So the Jordan form of the nilpotent operator T − λI will have 2 elementary blocks and that the order of the first block must be 3. We can, therefore, conclude that the Jordan form of T has to be the following direct sum of J3 (λ) and J1 (λ):  λ 1  0  0

0 λ 1 0

0 0 λ 0

 0  0 . 0  λ

We can now put all the pieces together (one of the pieces is the primary decomposition Theorem (6.2.1) to state the definitive result about the existence and uniqueness of the Jordan form of a linear operator whose characteristic polynomial and therefore the minimal polynomial factors completely into a product of linear factors over the base field. We first make precise the idea of a matrix in Jordan form. Definition 6.3.8. A matrix A in Mn (F), whose characteristic polynomial factors into linear factors over F and which has m distinct eigenvalues λ1 , λ2 , . . . , λm , is said to be a matrix in Jordan form if

Saikia-Linear Algebra

338

book1

February 25, 2014

0:8

Canonical Forms

(a) A is a direct sum of m submatrices, say, A1 , A2 , . . . , Am ; (b) for each j, 1 ≤ j ≤ m, the submatrix A j is a direct sum of elementary Jordan blocks of nonincreasing orders, each with eigenvalue λ j ; the number of such blocks inside A is nullity(A j − λ j I j ), where I j is the identity matrix of order equal to that of A j . Note that if m = 1, then A = A1 . Also, note that if A is in Jordan form then A is a lower triangular matrix. Theorem 6.3.9. Let T be a linear operator on a finite-dimensional vector space over a field F. Suppose that the characteristic polynomial of T factors into a product of linear factors over F and T has m distinct eigenvalues λ1 , λ2 , . . . , λm . Then, there is a basis of V with respect to which the matrix A of T is in Jordan form with A = A1 ⊕ A2 ⊕ · · · ⊕ Am , where, for each j, (1 ≤ j ≤ m), A j itself is a direct sum of elementary Jordan blocks of the type Jl (λ j ). Proof. We sketch the proof as the ideas involved had been encountered already. By hypothesis, we can M assume that the characteristic polynomial of T factors over F as mj=1 (x − λ j )d j , where d j are positive integers. Therefore, there are positive integers r j , where r j ≤ d j for each j, such that the minimal polynomial of T is of the form m ? (x − λ j )r j . j=1

For each j, let W j = ker(T − λ j I)r j . Then the primary decomposition theorem implies that V = W1 ⊕ W2 ⊕ · · · ⊕ Wm , where each W j is T -invariant, and the restriction T j of T to W j has minimal polynomial (x − λ j )r j . Therefore, if I j denotes the identity map on the subspace W j , then S j = T j − λ j I j is nilpotent on W j of index r j . Observe that on W j , T j acts like S j + λ j I j . The discussion preceding the example is applicable to each T j , and therefore, we may choose a basis B j of W j with respect to which the matrix A j of T j is the direct sum of Jordan blocks: Jn1, j (λ j ), Jn2, j (λ j ), . . . , Jnm j , j (λ j ), where the positive integers n1, j ≥ n2, j ≥ · · · ≥ nm j , j are the invariants of the nilpotent operator S j on W j . Note that the sum of these integers equals dim W j . 4 As V = ⊕W j , stringing together the bases B1 , B2 , . . . , Bm , we get a basis of V, with respect to which the matrix J of T has the required form. !

Like the canonical form of a nilpotent matrix, the Jordan form of an operator or a matrix, if it exists, is essentially unique as shown in the next proposition. Proposition 6.3.10. Let T be a linear operator on a finite-dimensional vector space V over a field F such that the characteristic polynomial of T factors into a product of linear factors over F. Then T determines its Jordan form A uniquely up to the order in which the eigenvalues of T appear inside A.

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

339

Proof. Let A = A1 ⊕ A2 ⊕ · · · ⊕ Am be a matrix in Jordan form representing T , corresponding to the distinct eigenvalues λ1 , λ2 , . . . , λm of T . Thus, each A j itself is a direct sum of certain number of elementary Jordan blocks with eigenvalue λ j and so is a lower triangular matrix with λ j appearing along the diagonal. That T determines A uniquely (up to the order in which the matrices A j appear inside A) is clear from the following observations. (a) The number of submatrices A j is the number of distinct eigenvalues of T . (b) For each fixed j, the submatrix A j alone, among the submatrices A1 , A2 , . . . , Am , has λ j along its diagonal. Therefore, the number of times λ j appears along the diagonal of the lower triangular matrix A j is clearly the multiplicity of the eigenvalue λ j as a root of the characteristic polynomial of T . It follows that the order of A j is determined by the characteristic polynomial of T . (c) Finally, we claim that the number, say q j , of elementary Jordan blocks inside A j is also determined by T . It is clear that q j is the nullity of the matrix A j − λ j I j , where I j is the identity matrix of order the same as that of A j . We prove that Nullity(A j − λ j I j ) = dim ker(T − λ j I),

(6.17)

where I is the identity map on V; once proven, this relation establishes our claim. To prove (6.17), we begin by considering the decomposition V = V1 ⊕ V2 ⊕ · · · ⊕ Vm of V as a direct sum of T -invariant subspaces corresponding to the matrix representation A = A1 ⊕ A2 ⊕ · · · ⊕ Am of T . Let B j be the basis of V j for 1 ≤ j ≤ m such that the matrix of T with respect to the basis B of V, obtained by taking the union of the B j , is A. Then, the matrix of T j , the restriction of T to the T -invariant subspace V j , is A j with respect to the basis B j . Observe that, for k ! j, the matrix of T k − λ j Ik (where we let Ik also to denote the identity map of Vk ) with respect to the basis Bk is lower triangular with non-zero diagonal entries, each equal to λk − λ j . Therefore, the operator T k − λ j Ik on Vk is invertible and so ker(T k − λ j Ik ) is the zero subspace of 4 Vk . Now, expressing any v = m k=1 vk as the unique sum of vectors from the T -invariant subspaces Vk , we see that (T − λ j I)v =

m 1 k=1

(T k − λ j Ik )vk .

(6.18)

4 Now recall that by properties of direct sum decomposition (see Proposition 3.5.4) m k=1 wk = 0 for wk ∈ Vk if and only if each wk = 0. It then follows from Equation (6.18) (as a consequence of our observation preceding the equation), that v ∈ ker(T − λ j I) if and only if v j ∈ ker(T j − λ j I j ). Since dim ker(T j − λ j I j ) is the nullity of A j − λ j I j , Equation (6.17) follows. The proof of the proposition is complete.

!

We must point out, even at the risk of being repetitive, the basic features of the Jordan form of a general linear operator. (a) Each A j is a d j × d j matrix with a single eigenvalue λ j . (b) Each A j , being the Jordan form of T j , is itself a direct sum of elementary Jordan blocks with eigenvalue λ j . The first of these Jordan blocks will be Jr j (λ j ), r j being the multiplicity of λ j as

Saikia-Linear Algebra

340

book1

February 25, 2014

0:8

Canonical Forms

a root of the minimal polynomial of T j . The sizes of these Jordan blocks within A j from left to right are non-increasing. The type and frequency of these Jordan blocks within A j are determined by the procedure outlined just before the Theorem (6.3.9). (c) The number of elementary Jordan blocks in A j equals dim ker(T − λ j I). The sum of the sizes of the blocks in A j must be d j . (d) Thus, A j is a lower triangular matrix having the eigenvalue λ j along the diagonal (d j times), and having either 1 or 0 along the subdiagonal. As an illustration, we derive the Jordan form of a diagonalizable operator T on a finite-dimensional M vector space over a field F. Recall that the characteristic polynomial of T is a product mj=1 (x − λ j )d j of linear factors over F, where the m scalars λ j are the distinct eigenvalues of T . Thus, T does have a Jordan form J which is the direct sum of m matrices A j . We examine these A j now. As T is diagonalizable the minimal polynomial of T is the product of distinct linear factors and so the minimal M polynomial of T is mj=1 (x − λ j ). Thus, the first, and therefore each of the elementary Jordan blocks with eigenvalue λ j that comprises A j must be of order 1, that is, each block is the scalar λ j . We conclude that each A j is a d j × d j diagonal matrix having the eigenvalue λ j on the diagonal. In other words, the Jordan form of T is the diagonal matrix having the eigenvalues appearing along the diagonal as many times as their multiplicities as roots of the characteristic polynomial. So, the Jordan form of the diagonalizable operator is nothing but the diagonal matrix of T relative to a basis of eigenvectors of T that we derived earlier. The matrix analogue of Theorem (6.3.9) is clear. Proposition 6.3.11. Let A be a matrix in Mn (F) such that its characteristic polynomial factors into linear factors over F. Then, A is similar to a matrix in Jordan form in Mn (F). Further, the Jordan form of A is unique up to the order of appearance of its eigenvalues. The uniqueness of the Jordan form of a matrix allows us to settle questions regarding similarity of matrices having Jordan forms. Corollary 6.3.12. Let A and B be two matrices in Mn (F) having Jordan forms. Then they are similar if and only if their Jordan forms (up to a rearrangement of eigenvalues) are the same. Let us discuss a few examples of Jordan forms. Recall that in Examples 4 and 5, we had already discussed the Jordan form of an operator by relating it to some nilpotent one. As will be shown in the next set of examples, in simple cases we can do away with this intermediate step of referring to nilpotent operators, and compute the required Jordan forms by determining the relevant invariants by examining the restrictions on them. EXAMPLE 6

We find the possible Jordan forms of a linear operator T on an n-dimensional vector space having the minimal polynomial (x − 1)2 for n = 3, 4 or 5. Since T has a single eigenvector 1, the required Jordan form for any n is the direct sum of elementary Jordan blocks with eigenvalue 1 alone. (In our notation, J has a single A j .) Recalling the basic properties of A j , we therefore see that our task is to determine integers n1 ≥ n2 ≥ n3 · · · such that n1 = 2

and

n1 + n2 + n3 · · · = n

(6.19)

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

341

for, the characteristic polynomial of T has to be (x − 1)n . (In terms of our notation, d1 = n.) For n = 3: The only possible choice we have in this case is n1 = 2 and n2 = 1 so that T has a unique Jordan form J which is the direct sum of J2 (1) and J1 (1):   1 0 0   J = 1 1 0.   0 0 1

For n = 4: This time there are two choices for the set of integers satisfying Equation (6.19): n1 = 2, n2 = 2 or n1 = 2, n2 = 1, n3 = 1. Correspondingly, there are two possibilities for the Jordan form of T :  1 1 J =  0 0

0 1 0 0

0 0 1 1

 0  0  0 1

or

 1 1 J =  0 0

0 1 0 0

 1 1  J = 0  0 0

0 1 0 0 0

0 0 1 0

 0  0 . 0  1

For n = 5, It is easy to check that this time too, there are two sets of integers satisfying Equation (6.19). The required Jordan form is thus, either the direct sum of J2 (1), J2 (1) and J1 (1) or of J2 (1) and three copies of J1 (1):

EXAMPLE 7

 1 1  J = 0  0 0

0 1 0 0 0

0 0 1 1 0

0 0 0 1 0

 0  0  0  0  1

or

0 0 1 0 0

0 0 0 1 0

 0  0  0.  0  1

Let us find the possible Jordan forms of a linear operator T on a eight-dimensional vector space whose minimal polynomial is (x−1)3(x+1)4. Any of these Jordan forms will be the direct sum of two matrices A1 and A2 , with eigenvalues 1 and −1, respectively. The compositions of these two matrices will be determined by the algebraic multiplicities of the corresponding eigenvalues, and therefore by the characteristic polynomial of T . Note that the characteristic polynomial of T is of degree 8. Moreover, as the minimal polynomial and the characteristic polynomial have the same linear factors, and as the minimal polynomial divides the characteristic polynomial, we can determine possible characteristic polynomials of T easily. Accordingly, we have the following cases to deal with. Case 1: Characteristic polynomial (x − 1)3 (x + 1)5 . The point to note is that once the multiplicities of an eigenvalue in the characteristic and the minimal polynomial are known, the determination of the corresponding A j is completely independent of the other eigenvalues; one just follows the procedure of Example 6 for each eigenvalue separately. Thus, in this case, A1 is the Jordan form of an operator having (x − 1)3 as the minimal as well as the characteristic polynomial, whereas A2 is the Jordan form of one with minimal polynomial (x + 1)4 and characteristic polynomial (x + 1)5 . Recall that the degree of the minimal polynomial is the size of the first block, and the sizes of the subsequent blocks are

Saikia-Linear Algebra

342

book1

February 25, 2014

0:8

Canonical Forms

non-increasing. Thus, A1 has to be J3 (1), and A2 has to be the direct sum of J4 (−1) and J1 (−1):  1  A1 = 1  0

0 1 1

 0  0  1

 −1  1  A2 =  0  0  0

and

0 −1 1 0 0

0 0 −1 1 0

0 0 0 −1 0

 0  0  0.  0 −1

Case 2: Characteristic polynomial (x − 1)4(x + 1)4. This time we can assume that A1 is the Jordan form of an operator having (x − 1)3 as the minimal polynomial but (x − 1)4 as its characteristic polynomial, and A2 is the Jordan form of one having (x + 1)4 as the minimal as well as the characteristic polynomial. As in the first case, there can only be one choice for both A1 and A2 .

EXAMPLE 8

 1 1 A1 =  0 0

0 1 1 0

0 0 1 0

 0  0  0 1

and

 −1  1 A2 =   0 0

0 −1 1 0

0 0 −1 1

 0  0 . 0 −1

Let A be a 3 × 3 matrix over a field F having a single eigenvalue λ ∈ F. If the characteristic polynomial of A factors completely into linear factors over F (in which case, it must be (x − λ)3), then A is similar to a matrix in Jordan form. It is now easy to see that depending on the minimal polynomial of A, it will be similar to one of the following three matrices:       λ 0 0 λ 0 0 λ 0 0 0 λ 0, 1 λ 0, 1 λ 0.       0 0 λ 0 0 λ 0 1 λ For example, the matrix  3 1  0 3 0 0

 1  0  3

is similar to

 3  1 0

0 3 0

 0  0  3

which is in Jordan form, as they have the same minimal polynomial (x − 3)2. Lest the reader think that if two matrices having the same characteristic polynomial and the same minimal polynomial are similar to the same Jordan form as this example suggests, let us remind her/him once more that assertion is not true in most of the cases. Though the assertion is true for n × n matrices for n ≤ 3, it fails for matrices for even n = 4. The following two different Jordan forms:  3 1  0  0

0 3 0 0

0 0 3 1

 0  0  0 3

and

 3 1  0  0

0 3 0 0

0 0 3 0

 0  0  0 3

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

343

provide us with a counterexample, as these two have (x − 3)4 and (x − 3)2 as the characteristic and the minimal polynomial, respectively. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. Assume that the characteristic polynomials of the given linear operators or matrices are product of linear products. (a) Any two 3 × 3 nilpotent matrices are similar if and only if they have the same minimal polynomial. (b) Any two 3 × 3 matrices are similar if they have the same minimal polynomial. (c) If a nilpotent operator has invariants 5, 4, 3 and 2, then its nullity is 2. (d) If a nilpotent operator has invariants 5, 4, 3 and 2, then its rank is 5. (e) If a nilpotent operator has minimal polynomial x5 , then its Jordan form has at least one elementary Jordan block of order 5. (f) Two matrices having the same Jordan form are similar. (g) Any two linear operators (or matrices) having the same characteristic polynomial and the same minimal polynomial are similar. (h) Two matrices having the same characteristic polynomial, the same minimal polynomial and the same rank are similar. (i) Two matrices having the same characteristic polynomial, the same minimal polynomial, the same rank and the same trace are similar. (j) The Jordan form of a diagonal matrix is itself. (k) The Jordan form of an upper triangular matrix is a diagonal one. (l) The basis with respect to which the matrix of a linear operator is its Jordan form is unique. (m) For a linear operator T , if rank(T k ) = rank(T k+1 ), then nullity(T k ) = nullity(T k+1 ). (n) For a linear operator T on an n-dimensional vector space with an eigenvalue λ, then ker(T − λI)n = ker(T − λI)k for any positive integer k. (o) The Jordan forms of any two 4 × 4 matrices of rank 2 are the same.

(p) If the minimal polynomial of a linear operator having characteristic polynomial x3 (x − 1)2 is x(x − 1), then its nullity is 5.

2. Find a matrix over C whose characteristic polynomial is x2 (x + 1)4 , the minimal polynomial x2 (x + 1)2 and whose rank is 4. 3. Is there any complex matrix A of order 3, such that   0 0 1    A3 = 0 0 0?   0 0 0

Saikia-Linear Algebra

344

book1

February 25, 2014

0:8

Canonical Forms

4. Suppose that a matrix A ∈ M14 (C) has 0 as its only eigenvalue. If rank(A) = 9, rank(A2 ) = 5, rank(A3 ) = 3, rank(A4 ) = 1 and rank(A5 ) = 0, determine the Jordan form of A. 5. Find all possible Jordan forms of matrices in M7 (C) having 1 as their only eigenvalue. 6. Find the Jordan form of a matrix A ∈ M7 (R) having characteristic polynomial (x − 1)3 (x − 2)4 with nullity(A − I7) = 2 and nullity(A − 2I7) = 3. 7. Suppose that A ∈ M13 (R) has the Jordan form J5 (a) ⊕ J4(a) ⊕ J3(a) ⊕ J1(a)

8. 9. 10. 11.

for some real number a. If I is the identity matrix in M13 (R), compute nullity(A − aI)k for k = 1, 2, 3, . . . . Find all possible Jordan forms of 6 × 6 nilpotent matrices over a field F. Let A and B be 6 × 6 nilpotent matrices over a field F having the same minimal polynomial and the same nullity. Prove that A and B are similar. Give an example to show that the same is not true for 7 × 7 matrices. Let A and B be matrices over a field F having the same characteristic and the same minimal polynomial (x − a1 )d1 (x − a2 )d2 · · · (x − am )dm . If di ≤ 3 for every i, then prove that A and B are similar. Let T be a linear operator on R5 whose matrix with respect to the standard basis is  0 0  A = 0 0  0

1 0 0 0 0

0 0 0 0 0

0 0 1 0 0

 a  b  c   d 0

for some real a, b, c and d. Find a basis of R5 , with respect to which the matrix of T is its Jordan form. 12. Find the Jordan forms of the following matrices over R:   1  −1  1

1 1 0

  1  −1  1

1 1

2

3

5 0

6 8

0

0

 1  0  0  0

0

 1  1  2  1  1  2

 4  7  9  10

  1  −1  1

 1  2  3  4

  1  −2   2  −2

 1  −1  0

1 −1 1 2 4

3 6

6 8

9 12

1

0

0 0

1 0

−1

−1

 4  8  12  16

 0  0 . 0  −1

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Jordan Forms

345

13. Let T be the linear operator on C3 [x], the complex vector space of all polynomials with complex coefficients of degree at most 3, given by T ( f (x)) = f (x − 1). Find the Jordan form, the trace and the determinant of T . 14. Let A be a complex matrix with characteristic polynomial ch(x) and minimal polynomial m(x). If ch(x) = m(x)(x + i) and m2 (x) = ch(x)(x2 + 1), find the possible Jordan forms of A. 15. Prove that two projections on a finite-dimensional vector space having the same rank have the same Jordan form. 16. Let A be a nilpotent matrix of order n over a field F of index of nilpotency n. Prove that there is no matrix B over F such that B2 = A. 17. Let A be a nilpotent matrix of order n over a field F. If B = A − In , then compute the determinant of B. 18. Let A be an r × r nilpotent matrix of index of nilpotency r over a field F. Prove that A is similar to its transpose At over F. 19. Prove that any matrix over C is similar to its transpose over C. 20. Let A be a matrix in Mn (C) with trace zero. Prove that A is similar to a matrix with all zeros along the diagonal. 21. Let T be a linear operator on a finite-dimensional vector space V over a field F such that its minimal polynomial is a product of linear factors over F. Prove that T is diagonalizable if and only if ker(T −λI)k = ker(T −λI) for any positive integer k and any eigenvalue λ. (I is the identity operator on V.) 22. Let A, B ∈ Mn (C) such that AB − BA = A. Show that Ak B − BAk = kAk for every positive integer k. Deduce that A is nilpotent.

Saikia-Linear Algebra

7

book1

February 25, 2014

0:8

Bilinear Forms

7.1 INTRODUCTION As we have seen in Chapter 5, the idea of orthogonality in Rn or Cn , introduced through dot products, was the key to establish an important result about real symmetric matrices (that they can always be diagonalized). The concept of bilinear forms generalizes dot products and provides a way for introducing the idea of orthogonality in arbitrary vector spaces. As such, these forms are basic in various areas of advanced mathematics as well as in many applications. Along with bilinear forms, we also introduce alternating bilinear forms in this chapter; such forms and associated symplectic groups are also important in diverse areas of mathematics and physical sciences.

7.2 BASIC CONCEPTS Recall that the dot product f (x, y) = /x, y0 in Rn (or in Cn ) associates a scalar with any two vectors. The usefulness of the dot product is mainly due to the nice properties it has; for example, in Rn , the dot product f (x, y) = /x, y0 is linear in both the variables. In generalizing the dot product to arbitrary vector spaces, this bilinearity is the condition that is carried over. Definition 7.2.1. Let V be a vector space over a field F. A map f : V × V → F is a bilinear form on V if f satisfies the following conditions: f (v + v' , w) = f (v, w) + f (v' , w) f (v, w + w' ) = f (v, w) + f (v, w' ) f (av, w) = a f (v, w) = f (v, aw) for all v, v' , w, w' ∈ V and for all a ∈ F. Thus, if f is bilinear on V, then f (v, w) is a scalar in F for each pair (v, w) of vectors. The reader should be aware of the use of the same symbols for denoting operations in V and F and therefore there should be no confusion in this regard. Note also that a bilinear form is linear in each of the variables so if we fix one variable, the resultant function is just a linear map from V into F. The next proposition states this property more precisely and lists some other simple ones.

346

Saikia-Linear Algebra

book1

February 25, 2014

0:8

Basic Concepts

Proposition 7.2.2. (a) (b) (c) (d)

347

Let f be a bilinear form on a vector space V over a field F.

For a fixed v ∈ V, the map Lv , given by Lv (w) = f (v, w) for any w ∈ V, is linear on V. For a fixed w ∈ V, the map Rw , given by Rw (v) = f (v, w) for all v ∈ V, is linear on V. For all v ∈ V, f (0, v) = f (v, 0) = 0. If g(v, w) = f (w, v) for v, w ∈ V, then g is a bilinear form on V.

Proof. We leave the easy verification to the reader as an exercise.

!

Before we consider examples of bilinear forms, we introduce some important classes of such forms. Definition 7.2.3. A bilinear form f on a vector space V over a field F is symmetric if f (v, w) = f (w, v) and skew-symmetric if f (v, w) = − f (w, v) for all v, w ∈ V. f is said to be alternating if f (v, v) = 0 for all v ∈ V. Any alternating form f on a vector space V is skew-symmetric as can be seen by expanding f (v + w, v + w) and using the defining relations f (v, v) = 0 = f (w, w). Conversely, a skew-symmetric form f on V is alternating if in the underlying field F, division by 2 is allowed (that is, chF ! 2). EXAMPLE 1

In any vector space V over a field F, there is a trivial symmetric bilinear form f given by f (v, w) = 0, where 0 is the zero of the field F. We will refer to this form as the zero bilinear form on V, or as the bilinear form on V which is identically zero.

EXAMPLE 2

If V = Fn, the vector space of column vectors or n × 1 matrices over F, then we can get a symmetric bilinear form f on V just like the dot product on Rn by declaring that f (x, y) = xt y for any x, y ∈ V. Note that the matrix product xt y is a scalar in F. By properties of matrix multiplication, one verifies easily that f is a symmetric bilinear form on Fn. It must be pointed out that what is usually known as the dot product (see Definition 5.3.21 in Section 3.7) in Cn differs in a significant way from the bilinear form given in the preceding example; if g denotes the usual dot product in Cn, then g(x, ay) = ā g(x, y), where ā is the complex conjugate of a, so g is not linear in the second variable. Even then, a far-reaching generalization of the usual dot product in Cn, known as a hermitian form on a complex vector space, is equally important and has many applications. We study hermitian forms in the next chapter.

EXAMPLE 3

The preceding example is a special case of this one. For the vector space V = Fn , any fixed matrix A ∈ Mn (F) gives rise to a bilinear form fA on V if we let fA (x, y) = xt Ay for any x, y ∈ V.
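Since fA(x, y) = xt Ay is just a matrix product, forms of this kind are easy to experiment with numerically. The following sketch is not part of the text; the matrix and the vectors are arbitrary sample values, and it simply evaluates fA and spot-checks linearity in the first variable.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])            # sample symmetric matrix, as in Example 4 below

def f_A(x, y):
    # evaluate the bilinear form x^t A y
    return float(x @ A @ y)

x, xp, y = np.array([1.0, -1.0]), np.array([0.0, 4.0]), np.array([2.0, 5.0])
a = 3.0
# linearity in the first variable: f(a*x + xp, y) = a*f(x, y) + f(xp, y)
assert np.isclose(f_A(a * x + xp, y), a * f_A(x, y) + f_A(xp, y))
print(f_A(x, y))                      # prints -7.0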

EXAMPLE 4

We calculate the bilinear form fA on R2 if
A = [ 1  2 ]
    [ 2  3 ].


Given arbitrary vectors x = (x1 , x2 )t and y = (y1 , y2 )t in R2 , the product xt A = (x1 + 2x2 , 2x1 + 3x2 ) and so fA (x, y) = xt Ay = x1 y1 + 2x2 y1 + 2x1 y2 + 3x2 y2 . This formula, like the one for the dot product on R2 , completely describes the form fA in terms of the components of vectors of R2 . Note that in this example, fA (y, x) is the same scalar as fA (x, y) so fA is a symmetric form. As we will see in the following general case, this is no accident as A was a symmetric matrix to begin with. EXAMPLE 5

For a symmetric matrix A = At in Mn (F), one has (xt Ay)t = yt At (xt )t = yt Ax = fA (y, x), by properties of transposes of matrices. On the other hand, xt Ay being a scalar, (xt Ay)t = xt Ay = fA (x, y). It follows that fA (x, y) = fA (y, x) for all x, y in Fn . Thus, the symmetric matrix A forces the bilinear form fA on Fn to be a symmetric form. Conversely, assume that fA (x, y) = xt Ay is a symmetric bilinear form on Fn . Then for the standard basis (considered as column vectors) e1 , e2 , . . . , en of Fn , we have fA (ei , e j ) = fA (e j , ei ) for all i, j. But, by the definition of fA , if A = [ai j ], then it is clear that fA (ei , e j ) = ei t [ai j ]e j = ai j . A similar calculation shows that fA (e j , ei ) = a ji . As fA is assumed to be symmetric, we conclude that the matrix A is symmetric. In a similar manner, the bilinear form fA is an alternating one if A = [ai j ] is an alternating matrix, that is, At = −A and a j j = 0. These examples show the close relationship between bilinear forms and matrices, which we will discuss shortly. For the next example, we need the idea of the trace of a matrix in Mn (F). Recall that for any A = [ai j ] ∈ Mn (F), the trace of A, denoted by T r(A), is the sum of the diagonal elements of A. T r, as a function from Mn (F) to F, is a linear one: T r(A + B) = T r(A) + T r(B) and T r(cA) = cT r(A) for any A, B ∈ Mn (F) and any c ∈ F.

EXAMPLE 6

For a field F, let V = Mm×n (F) be the vector space of all m × n matrices over F. Let A ∈ Mm (F) be a fixed square matrix of order m over F and fA the function defined on V × V as follows: fA (X, Y) = T r(X t AY)

for X, Y ∈ V.

Since for any X, Y and Z in V and c ∈ F, (cX + Y)t = cX t + Y t by properties of transposes, the linearity of the trace function shows that T r((cX + Y)t AZ) = cT r(X t AZ) + T r(Y t AZ) Thus fA is linear in the first variable. Similarly, linearity of T r alone shows that fA is linear in the second variable and so we have verified that fA is a bilinear form on Mm×n (F). In particular, by choosing A = Im , the identity matrix in Mm (F), we see that T r(X t Y) defines a bilinear form f on Mm (F). Note that T r(Y t X) = T r((X t Y)t ) = T r(X t Y) as traces of a matrix and its transpose are the same. Thus f is a symmetric bilinear form.
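As a quick numerical illustration of this example (not from the text; the sizes and sample matrices below are arbitrary choices), one can check the bilinearity of fA(X, Y) = Tr(Xt AY) and, for A = Im, its symmetry.

import numpy as np

rng = np.random.default_rng(0)
m, n = 2, 3
A = np.eye(m)                                   # with A = I_m the form is Tr(X^t Y)
X, Y, Z = (rng.integers(-3, 4, size=(m, n)).astype(float) for _ in range(3))

def f(X, Y):
    return float(np.trace(X.T @ A @ Y))

c = 2.0
assert np.isclose(f(c * X + Z, Y), c * f(X, Y) + f(Z, Y))   # linear in the first variable
assert np.isclose(f(X, c * Y + Z), c * f(X, Y) + f(X, Z))   # linear in the second variable
assert np.isclose(f(X, Y), f(Y, X))                         # symmetric, since A = I_m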

EXAMPLE 7

Let V = Rn[x] be the real vector space of all real polynomials of degree at most n for any fixed positive integer n. Define
F( f (x), g(x)) = ∫_0^1 f (x)g(x) dx

for any two polynomials f (x) and g(x) in V, where the integral is the usual Riemann integral. The familiar properties of integrals show that F is a symmetric bilinear form on V. In fact, if f (x) and g(x) are continuous real-valued functions in the real vector space C[0, 1] of continuous real-valued functions on the closed interval [0, 1], the same formula for F( f (x), g(x)) provides us with a symmetric bilinear form on C[0, 1]. Note that, in the same way, ∫_a^b f (x)g(x) dx defines symmetric bilinear forms on Rn[x] and C[a, b].
The study of bilinear forms on finite-dimensional vector spaces can be facilitated by associating bilinear forms with matrices. This association, however, depends on bases chosen for the vector space, similar to the way matrix representations of linear operators depend on bases.

Definition 7.2.4. Let V be an arbitrary n-dimensional vector space over a field F. Given a bilinear form f on V and a basis B = {v1, v2, . . . , vn} of V, the matrix of f with respect to the basis B is the matrix A = [aij] in Mn(F), where aij = f (vi, vj) for any i, j such that 1 ≤ i, j ≤ n.
On the other hand, any A = [aij] ∈ Mn(F) determines a bilinear form fA on V, with respect to B, as follows: for vectors v = Σ_j xj vj and w = Σ_j yj vj, let
fA(v, w) = Σ_{i,j} aij xi yj.

Calculations, similar to the ones in earlier examples, show that if x = (x1 , x2 , . . . , xn )t and y = (y1 , y2 , . . . , yn )t denote the coordinate vectors, respectively of v and w with respect B, then fA (v, w) = xt Ay. Thus, fA is a bilinear form on V (see Example 5). Also, note that fA (vi , v j ) = ai j for any basis vectors vi and v j . Therefore by Definition (7.2.4), the matrix of fA , with respect to the basis of B, is A itself. EXAMPLE 8

As an example, we calculate the matrix of the bilinear form fA of Example 4 with respect to two different bases of R2. Since fA was given by fA((x1, x2)t, (y1, y2)t) = x1 y1 + 2x2 y1 + 2x1 y2 + 3x2 y2 in that example and the standard basis of R2 is given by e1 = (1, 0)t, e2 = (0, 1)t, one has fA(e1, e1) = 1, fA(e1, e2) = 2, fA(e2, e1) = 2 and fA(e2, e2) = 3. It follows that the matrix of fA with respect to the standard basis is A itself. Next, consider the basis B = {(1, 1)t, (1, −1)t} of R2. Substituting the values of the coordinates of the basis vectors in the expression for fA, one easily shows that the matrix of fA with respect to the new basis is given by
[ 8  −2 ]
[ −2  0 ].
Not unexpectedly, this matrix is also symmetric.
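The computation of Example 8 can be checked mechanically: the (i, j)th entry of the matrix of fA with respect to a basis is fA(vi, vj), and the same matrix can be obtained as Pt AP, where P has the basis vectors as its columns (this is the congruence relation derived a little later in this section). The following sketch is not from the text and only verifies the numbers for the data of Examples 4 and 8.

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 3.0]])
v1, v2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])     # the basis B of Example 8
P = np.column_stack([v1, v2])

gram = np.array([[vi @ A @ vj for vj in (v1, v2)] for vi in (v1, v2)])
print(gram)                          # [[ 8. -2.], [-2.  0.]]
assert np.allclose(gram, P.T @ A @ P)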


EXAMPLE 9

We consider the matrix of the bilinear form f (X, Y) = Tr(Xt Y) on V = M2(F) with respect to the standard basis of V consisting of the four unit matrices e11, e12, e21 and e22; eij is the matrix of order 2 whose (i, j)th entry is 1 and all of whose other entries are zero. Since dim V = 4, the matrix of f with respect to this basis is a matrix of order 4, with 16 entries, over F. So we compute only a few of the entries as an example. For convenience, we rename the matrices in the standard basis: v1 = e11, v2 = e12, v3 = e21 and v4 = e22. The (i, j)th entry aij of the required matrix, by Definition (7.2.4), is then given by aij = f (vi, vj) = Tr(vi t vj). Using the formulas for multiplication of unit matrices given in Equation (1.11) in Chapter 1, we find, for example, a11 = Tr(e11 t e11) = Tr(e11 e11) = Tr(e11) = 1, whereas a23 = Tr(e12 t e21) = Tr(e21 e21) = 0, as e21 e21 is the zero matrix.
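A short computation, not in the text, confirms the pattern suggested by these entries: over R, the 4 × 4 matrix of f (X, Y) = Tr(Xt Y) with respect to the ordered basis (e11, e12, e21, e22) is the identity matrix, since Tr(eij t ekl) equals 1 when (i, j) = (k, l) and 0 otherwise.

import numpy as np

def unit(i, j):
    E = np.zeros((2, 2))
    E[i, j] = 1.0
    return E

basis = [unit(0, 0), unit(0, 1), unit(1, 0), unit(1, 1)]     # e11, e12, e21, e22
gram = np.array([[np.trace(X.T @ Y) for Y in basis] for X in basis])
assert np.allclose(gram, np.eye(4))
print(gram)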

For any vector space V, let Bil(V) denote the collection of all bilinear forms on V. One can impose an algebraic structure on Bil(V) by defining addition and scalar multiplication of bilinear forms on V. Definition 7.2.5. Let V be a vector space, not necessarily finite-dimensional, over a field F. The sum f + g, for any f, g ∈ Bil(V), is the map from V × V → F given, for any v, w ∈ V, by ( f + g)(v, w) = f (v, w) + g(v, w). The scalar multiple c f , for any f ∈ Bil(V) and c ∈ F, is the map from V × V → F given , for any v and w in V, by (c f )(v, w) = c f (v, w). Routine verifications show that the sum of two bilinear forms and a scalar multiple of a bilinear form are again bilinear forms. Thus, Bil(V) is closed with respect to the operations defined just now. In fact, Bil(V) is a vector space with the zero form as the zero vector. Proposition 7.2.6. Let V be a vector space over a field F. Then, the set Bil(V) of all bilinear forms on V is itself a vector space over F. Moreover, S ym(V), the subset of all symmetric bilinear forms on V, and Alt(V), the subset of all alternating forms on V, are subspaces of Bil(V). The routine verifications needed for the proof of the proposition are left to the reader. In case V is n-dimensional over a field F, the following result shows that, as vector spaces over F, Bil(V) and Mn (F) are isomorphic. Proposition 7.2.7. Let V be an n-dimensional vector space over a field F. Fix a basis B of V. For f ∈ Bil(V), let T ( f ) ∈ Mn (F) be the matrix of f with respect to B as in Definition (7.2.4). Then T is a vector space isomorphism from Bil(V) onto Mn (F). Moreover, T carries the subspaces S ym(V) and Alt(V) of Bil(V) onto the subspaces of symmetric matrices and of alternating matrices, respectively, in Mn (F).


Proof. Let B = {v1 , v2 , . . . , vn } be the fixed basis of V. For arbitrary bilinear forms f, g ∈ Bil(V), let A = [ai j ] and B = [bi j ] be the matrices in Mn (F) representing f and g, respectively, with respect to B. Thus, for fixed i, j with 1 ≤ i, j ≤ n, one has f (vi , v j ) = ai j

and

g(vi , v j ) = bi j .

Since, for the sum f + g and the scalar multiple c f , we have by definition ( f + g)(vi , v j ) = f (vi , v j ) + g(vi , v j ) = ai j + bi j and (c f )(vi , v j ) = c f (vi , v j ) = cai j , it follows that the (i, j)th entries of the matrices T ( f + g) and T (c f ) are the (i, j)th entries of the matrices A + B and cA, respectively. In other words, T ( f + g) = A + B = T ( f ) + T (g) T (a f ) = aA = aT ( f ), which show that T is linear. Next, note that the discussion following Definition (7.2.4) implies that the linear map T on Bil(V) is one–one and onto Mn (F). Thus, T is an isomorphism as required. The other assertions of the theorem are clear. ! Since the vector space Mn (F) has dimension n2 over F, the following corollary follows. Corollary 7.2.8. The vector space Bil(V), of bilinear forms on an n-dimensional vector space over a field F, is of dimension n2 over F. Summarizing, a bilinear form f on an n-dimensional vector space V over a field F, and the unique matrix A ∈ Mn (F) f determines with respect to a basis B of V are associated through the equation f (v, w) = xt Ay,

(7.1)

where x and y are the coordinate vectors of v and w in V, respectively, with respect to the basis B. The preceding equation also helps us establish the relationship between the matrices representing the same bilinear form with respect to different bases. So suppose that B ∈ Mn(F) is the matrix of f with respect to another basis B′ of V, and let x′ and y′ be the coordinate vectors of the same vectors v and w in V. Now, if P is the change of basis matrix from B′ to B, then x = Px′ and y = Py′ (see Proposition 4.5.11 in Section 4.5). Thus, by Equation (7.1), f (v, w) = xt Ay = (Px′)t A(Py′) = x′t (Pt AP)y′. Since B is the unique matrix f determines with respect to B′, it follows from Equation (7.1) (this time for the basis B′) that B = Pt AP.

(7.2)

This calls for a definition. Recall that any change of basis matrix, and in particular P, is an invertible matrix.


Definition 7.2.9. For matrices A, B ∈ Mn(F), B is said to be congruent to A (over F) if there is an invertible matrix P ∈ Mn(F) such that B = Pt AP.
It is an easy verification, using properties of transposes of matrices, that being congruent to is an equivalence relation in Mn(F). The conclusion of Equation (7.2) can now be stated as follows: any two matrices representing a bilinear form on a finite-dimensional vector space with respect to two bases are congruent.
EXAMPLE 10   Consider the matrix
A = [ 0  1 ]
    [ −1 0 ]
in M2(R). We compute the bilinear form f on R2 determined by A relative to the standard basis E of R2. If v = (x1, x2)t and w = (y1, y2)t are two arbitrary vectors in R2, then, as E is the standard basis, (x1, x2)t and (y1, y2)t themselves are the coordinate vectors of v and w, respectively. Therefore, by Equation (7.1), we obtain
f (v, w) = (x1, x2) [ 0  1 ] [ y1 ]
                    [ −1 0 ] [ y2 ]
         = (−x2, x1) [ y1 ]
                     [ y2 ]
         = x1 y2 − x2 y1.

It is clear that A itself is the matrix of f with respect to the standard basis of R2. To compute the matrix B of f with respect to the basis B = {(1, −1)t, (1, 2)t}, we note that the change of basis matrix is
P = [ 1  1 ]
    [ −1 2 ]
(see Proposition 3.4.14). Therefore, by Equation (7.2),
B = [ 1 −1 ] [ 0  1 ] [ 1  1 ]  =  [ 0  3 ]
    [ 1  2 ] [ −1 0 ] [ −1 2 ]     [ −3 0 ].
The concept of congruence will help us in finding simpler matrix representations of a bilinear form, exactly in the manner the concept of similarity helps in finding simpler matrix representations of a linear operator. The ideal situation will be when it is possible to have a diagonal matrix representing a given bilinear form. As was the case with linear operators, therefore, it is important to identify diagonalizable bilinear forms.
Definition 7.2.10. A bilinear form f on a finite-dimensional vector space V is said to be diagonalizable if there is a basis of V with respect to which the matrix of f is a diagonal one.
We examine the problem of diagonalizing a bilinear form in the next two sections.
EXERCISES
1. Determine whether the following assertions are true or false giving brief justifications. Assume that all underlying vector spaces are finite-dimensional and all given matrices are square.
(a) Given a vector space V of dimension n over a field F, any matrix of order n over F is the matrix of some bilinear form on V with respect to any given basis of V.


(b) If dim V = n, then the dimension of the space of all symmetric bilinear forms on V is 2n. (c) Two distinct bilinear forms on a vector space V cannot have the same matrix representation with respect to a fixed basis of V. (d) Two matrices representing a bilinear form on Fn have the same characteristic polynomial. (e) The difference of two symmetric bilinear forms on a vector space need not be a symmetric bilinear form. (f) If for a bilinear form f on a vector space V, f (vi , vi ) ! 0 for any basis vector vi in a given basis of V, then f (v, v) ! 0 for any non-zero v ∈ V. (g) Two similar matrices in Mn (F) need not be congruent in Mn (F). (h) Two congruent matrices in Mn (F) have the same eigenvalues. (i) Two congruent matrices in Mn (F) have the same determinant. (j) Two congruent matrices in Mn (F) have the same rank. (k) The bilinear form f on M2 (F) given by f (A, B) = tr(AB) is symmetric. (l) If charF = 2, then any alternating form on a vector space V over F is a symmetric one. 2. Verify the elementary properties of bilinear forms given in Proposition (7.2.2). 3. Let V be a vector space over a field F. Verify that the sum of two bilinear forms on V and a scalar multiple of a bilinear form on V are bilinear forms on V. Furthermore, carry out the verifications needed to complete the proof of Theorem (7.2.6). 4. Verify that the following functions given in the examples of this section are bilinear forms. F stands for an arbitrary field. (a) fA on Fn if fA (x, y) = xt Ay for any x, y ∈ Fn , where A ∈ Mn (F). L1 (b) F on V = Rn [x], or on C[0, 1] if F( f (x), g(x)) = 0 f (t)g(t)dt for f (x), g(x) ∈ V. 5. Determine which of the following functions from R2 × R2 → R are bilinear forms. Assume v = (x1 , x2 )t and w = (y1 , y2 )t are arbitrary vectors in R2 . (a) f (v, w) = 1. (b) f (v, w) = x1 x2 + y1 y2 . (c) f (v, w) = x1 y2 − x2 y1 .

(d) f (v, w) = (x1 + y1 )2 − x2 y2 . (e) f (v, w) = −x1 y2 − 2x2 y1 − 3x1y2 − 4x2y2 .

6. Which of the following are bilinear forms? Justify your answer. (a) f : R × R → R given by f (x1 , x2 ) = ax1 + bx2 , where a and b are fixed non-zero real numbers. (b) For a vector space V over a field F, f : V × V → F given by f (v, w) = (F(v, w))2 for a fixed bilinear form F on V. (c) f : V × V → F given by f (A, B) = T r(A)T r(B), where V = Mn (F).

(d) f : C × C → R given by f (z1 , z2 ) = |z1 − z2 |, where C is the vector space of complex numbers over the field R, and |z| denotes the modulus, or absolute value of the complex number z.

7. Let V be a vector space over a field F and L and R linear maps from V into F (F considered a vector space over itself). Is the map f : V × V → F given by f (v, w) = L(v)R(w) for any v, w ∈ V, a bilinear form on V? Justify your answer.


8. Let f be the bilinear form on the real vector space R2 given by f ((x1, x2)t, (y1, y2)t) = x1 y1 − 2x1 y2 − 2x2 y1 + 3x2 y2.

Find the matrices A and B of f with respect to (i) the standard basis E = {(1, 0)t, (0, 1)t} and (ii) the basis B = {(1, 1)t, (−1, 1)t}, respectively.
9. Consider the real vector space V = M2(R), and let f be the bilinear form on V defined by f (X, Y) = Tr(Xt AY), where
A = [ 1  −1 ]
    [ −1  0 ].

Compute the diagonal entries of the matrix of f with respect to the standard basis {e11, e12, e21, e22}, where eij is the unit matrix of order 2 whose only non-zero entry is a 1 at the (i, j)th place.
10. Consider the bilinear forms on the real vector space R2 determined by the matrix
[ 1  −1 ]
[ −1  2 ]
with respect to (i) the standard basis E and (ii) the basis B = {(1, 1)t, (1, 2)t}, respectively. Compute the formulae for these bilinear forms in terms of components of arbitrary vectors of R2.
11. Let f be the bilinear form on the real vector space R3 whose matrix with respect to the standard basis of R3 is given by
[ 1  −1   2 ]
[ −1  0   3 ]
[ 2  −1  −1 ].
Find the matrix of f with respect to the basis {(1, 0, 1)t, (1, 1, 0)t, (1, 1, −1)t} of R3.
12. Verify that the bilinear form f on R2 determined by the matrix
A = [ 0  1 ]
    [ −1 0 ],
with respect to the standard basis of R2, is alternating by computing xt Ax for any x ∈ R2.
13. For any positive integer n, let p and q be non-negative integers such that p + q = n and
A = [ Ip   0  ]
    [ 0  −Iq ]

a real matrix of order n (here I p and Iq are the identity matrices of orders p and q, respectively, and 0 denote zero matrices of suitable sizes). If f p,q is the bilinear form on Rn given by f p,q (x, y) = xt Ay, find a formula for the form in terms of coordinates of x and y and verify that it is a symmetric one. 14. Prove that congruence of matrices is an equivalence relation in Mn (F) for any field F. Exercises 12 and 13 are valid if the underlying field R is replaced by any field F whose characteristic is ! 2, that is, a field in which division by 2 is allowed. 15. Let f be a bilinear form on a vector space V over R. Prove that f can be expressed uniquely as a sum f1 + f2 where f1 is a symmetric bilinear form on V and f2 an alternating bilinear form on V. 16. Show that, if a bilinear form f on a vector space over R is both symmetric and alternating, then f is the zero form. 17. Let f be a symmetric and g an alternating bilinear form on a complex vector space V. If f + g is the zero form on V, then show that both f and g are zero on V.


18. Let V be a vector space over a field F. Assume that chF ! 2, so that division by 2 in F is allowed. On the vector space Bil(V) of all bilinear forms on V, define a function T such that for any f ∈ Bil(V), T ( f ) is given by T ( f )(v, w) = (1/2) f (v, w) − (1/2) f (w, v) for any v, w ∈ V. Show that (i) T ( f ) ∈ Bil(V) and (ii) T is a linear operator on Bil(V). 19. Assume that dim V = n. Determine the dimensions of the subspaces consisting of the symmetric and the skew-symmetric bilinear forms in Bil(V). 20. Let V be a vector space over a field F such that V = V1 ⊕ V2 for subspaces V1 and V2 of V. If f1 and f2 are bilinear forms on V1 and V2 , respectively, prove that there is a unique bilinear form f on V whose restrictions to V1 and V2 are f1 and f2 , respectively. 4 21. Define f on Cn by f ((z1 , z2 , . . . , zn ), (w1 , w2 , . . . , wn )) = ni=1 zi wi . Show that f is a bilinear form on Cn if it is considered a vector space over R but not bilinear if considered over C.

7.3 LINEAR FUNCTIONALS AND DUAL SPACE This section is devoted to developing tools that are needed to resolve the diagonalization problem for bilinear forms in general. The notion of linear functional will be quite useful in this development, and so we give a brief review of the theory of linear functionals. Definition 7.3.1. Let V be a vector space over a field F. A linear map f from V to F, where F is considered a vector space over itself, is called a linear functional on V. The set of all linear functionals on V forms a vector space HomF (V, F) as a special case of Theorem (4.3.4). Definition 7.3.2. For a vector space V over a field F, its dual space V ∗ is the vector space of all linear functionals on V. Observe that for an n-dimensional vector space V, dim V ∗ = n by Corollary (4.3.8), and so V ∗ is isomorphic to V as vector spaces over F (see Corollary 4.4.5 in Section 4.4). We now explore the connection between bilinear forms and linear functionals. Let f be a bilinear form on a vector space V over a field F. Recall that (see Proposition (7.2.2)) for a fixed v ∈ V, the map Lv : V → F given by Lv (w) = f (v, w) is a linear map, which means that Lv is a linear functional on V. This allows us to define a map L : V → V ∗ by the formula L(v) = Lv .

(7.3)

Since f (v1 + v2 , w) = f (v1 , w) + f (v2 , w), it follows that L(v1 + v2 ) = L(v1 ) + L(v2 ). On the other hand, for a scalar a, f (av, w) = a f (v, w), so one can conclude that L(av) = aL(v). Thus, L is a linear map of V into V ∗ . Similarly, there is a linear map R from V into V ∗ such that R(w) = Rw ,

(7.4)

where Rw , for a fixed w ∈ V, is the linear map on V given by Rw (v) = f (v, w). We use these maps to define certain subspaces of V important to the studies of a bilinear form f .


Definition 7.3.3. For a vector space V and a bilinear form f on V, let L and R be the linear maps defined in Equations (7.3) and (7.4). We let
V⊥L = ker L = {v ∈ V | Lv : V → F is the zero map},
V⊥R = ker R = {u ∈ V | Ru : V → F is the zero map}.

Being kernels of linear maps, V⊥L and V⊥R are subspaces of V. There are simpler descriptions of these subspaces.
Proposition 7.3.4. For a bilinear form f on a vector space V,
V⊥L = {v ∈ V | f (v, w) = 0 ∈ F for all w ∈ V};
V⊥R = {w ∈ V | f (v, w) = 0 ∈ F for all v ∈ V}.
Proof. Left as an exercise to the reader. □
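For the concrete forms fA(x, y) = xt Ay on Fn, Proposition 7.3.4 translates directly into matrix language: v ∈ V⊥R exactly when f (x, v) = xt Av = 0 for every x, that is, when Av = 0; similarly v ∈ V⊥L exactly when At v = 0. The following small sketch is not from the text; the matrix is an arbitrary degenerate example, and the code only exhibits a non-zero vector in the radical of a symmetric form.

import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])        # symmetric but degenerate: rank 2

v = np.array([1.0, -1.0, 0.0])         # A v = 0, so v is a non-zero vector in the radical
assert np.allclose(A @ v, 0.0)
for x in np.eye(3):                    # f_A(x, v) = x^t A v vanishes for every x
    assert np.isclose(x @ A @ v, 0.0)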

One may like to think of the condition f (v, w) = 0 generalizing the concept of perpendicularity in terms of the dot product in R2 or R3 . However, unless f is symmetric, one does not have the right generalization. Note that if f is symmetric, then V ⊥L = V ⊥R . Definition 7.3.5.

Let f be a bilinear form on a vector space V. If V ⊥L = V ⊥R , then we let V ⊥ = V ⊥ L = V ⊥R ,

and call V⊥ the radical or the null space of f .
For the notion of perpendicularity to be meaningful, we need to exclude pathological cases where a non-zero vector can be perpendicular to itself. It is clear that such exclusion can be ensured by insisting on the condition that V⊥L = V⊥R = {0}. The next result will help in finding a more practical interpretation of this condition.
Proposition 7.3.6. Let f be a bilinear form on a finite-dimensional vector space V over a field F. Then, the dimensions of the subspaces V⊥R and V⊥L are the same, and equal to the nullity of the matrix of f with respect to any basis of V.
Recall that (see Definition 3.6.13) the nullity of any n × n matrix C over F is n − rank(C); it is also the dimension of the solution space in Fn of the matrix equation Cx = 0.
Proof. Fix a basis B = {v1, v2, . . . , vn} of V and let A = [aij] be the matrix in Mn(F) of the bilinear form f with respect to B so that aij = f (vi, vj) for all i, j, 1 ≤ i, j ≤ n. It follows from the definition of V⊥L that a vector v = Σ_j cj vj ∈ V⊥L if and only if f (v, vi) = 0 for i = 1, 2, . . . , n. Expanding f (v, vi) = f (Σ_j cj vj, vi), we see that v = Σ_j cj vj ∈ V⊥L if and only if
Σ_{j=1}^{n} cj aji = 0   for i = 1, 2, . . . , n.
Equivalently, v = Σ_j cj vj ∈ V⊥L if and only if the components of the coordinate vector c =


(c1, c2, . . . , cn)t of v with respect to B satisfy the system of equations
Σ_{j=1}^{n} bij xj = 0   for i = 1, 2, . . . , n,

where bi j = a ji for all 1 ≤ i, j ≤ n. We may thus conclude that the dimension of V ⊥L is precisely the dimension of the solution space in Fn of the matrix equation At x = 0, where At is the transpose of A = [ai j ] and x = (x1 , x2 , . . . , xn )t . Since ranks of a matrix and its transpose are the same, it follows that the dimension of V ⊥L is the nullity of A as claimed. A similar analysis shows that the dimension of V ⊥R also is the nullity of A. ! It is now easy to derive the required interpretation of the condition V ⊥L = {0} = V ⊥R . Corollary 7.3.7. Let f be a bilinear form on a finite-dimensional vector space V over a field. Then, the following are equivalent. (a) V ⊥L = {0}. (b) V ⊥R = {0}. (c) The matrix of f with respect to any basis of V is invertible. Proof. An n × n matrix over a field is invertible if and only if its rank is n, or equivalently, its nullity is zero. ! Another consequence of the last proposition is that the rank and nullity of the matrix of a bilinear form on a finite-dimensional vector space with respect to any basis is independent of the choice of the basis. Note that this independence could have been inferred from the fact that matrices representing a bilinear form with respect to different bases are congruent and congruent matrices have the same rank. Definition 7.3.8. The rank of a bilinear form on a finite-dimensional vector space is the rank of the matrix of the form with respect to any basis of the space. We now introduce an important class of bilinear forms. Definition 7.3.9. A bilinear form f on a vector space V is said to be non-degenerate if V ⊥L = {0} or equivalently, V ⊥R = {0}. Thus, for a non-degenerate bilinear form, its radical V ⊥ is defined and is the zero subspace. The definition implies that for a symmetric or a skew-symmetric non-degenerate bilinear form f on V, for any non-zero vector u ∈ V, there is a vector v ∈ V such that f (u, v) ! 0. It is easy to give examples of non-degenerate bilinear forms thanks to Corollary (7.3.7). For any invertible matrix A ∈ Mn (F), the bilinear form fA on Fn defined by fA (x, y) = xt Ay has A as its matrix with respect to the standard basis and so, by the said corollary, is a non-degenerate form. One of the most important implication of the non-degeneracy of a bilinear form f is that every linear functional on V is determined by a unique vector in V through such a bilinear form. Let f be a non-degenerate bilinear form on a finite-dimensional vector space V over a field F. So, by definition, V ⊥L = {0}. It follows, by Definition (7.3.3), that the kernel of the linear map L : V → V ∗ is the zero subspace and so the linear map L is one–one. However, dim V = dim V ∗ is finite, so L


must be onto V ∗ (see Corollary 4.2.10). But then the ontoness of L implies that any linear functional in V ∗ must be Lv for some v ∈ V. Conversely, if L is onto, that is, if L(V) = V ∗ , then by tracing the argument backwards, we see that f must be non-degenerate. In a similar manner, we can show that f is non-degenerate if and only if R(V) = V ∗ , where R is the map from V to V ∗ given by Equation (7.4). These conclusions are put in a slightly different format in the following proposition. Proposition 7.3.10. For a non-degenerate bilinear form f on a finite-dimensional vector space V over a field F, the following hold. (a) For any linear functional φ ∈ V ∗ , there is a unique vector v ∈ V such that φ(w) = f (v, w) for all w in V. (b) For any linear functional φ ∈ V ∗ , there is a unique vector w ∈ V such that φ(v) = f (v, w) for all v in V. Proof. Use the definitions of Lv and Ru in conjunction with the conclusions of the preceding discussion. ! The importance of the preceding proposition is due the fact that the theoretical basis of the material that will be considered in the next section is provided by it. EXERCISES 1. Determine whether the following assertions are true or false giving brief justifications. Assume that all underlying vector spaces are finite-dimensional over an arbitrary field F. (a) For any vector space V, the dual space V ∗ is isomorphic to V. (b) For a linear functional f on an n-dimensional vector space, the rank of f is n − 1.

(c) For any invertible matrix A in Mn (F), the bilinear form f on Fn given by f (x, y) = xt Ay is a non-degenerate form.

(d) If f is a non-degenerate bilinear form on a vector space V, then every element of the dual V ∗ is of the form Rv for some v ∈ V.

(e) For any non-zero vector v in a vector space V, there is some f ∈ V ∗ such that f (v) is non-zero. (f) On an n-dimensional vector space V, there is a bilinear form on V of rank m for any integer m, 0 ≤ m ≤ n.

(g) On an n-dimensional vector space V, there is a non-degenerate bilinear form on V of rank m for any integer m, 0 ≤ m ≤ n.

(h) Given a non-degenerate bilinear form f on a vector space V, the restriction of f to any non-zero subspace V is non-degenerate.

(i) The sum of two non-degenerate bilinear forms on a vector space is always non-degenerate.
2. Give examples of bilinear forms f on R3 having rank 1, 2 and 3, respectively, by giving formulae for f ((x1, x2, x3)t, (y1, y2, y3)t) in each case.
3. Give examples of non-degenerate bilinear forms, one each, on R2 and R3.
4. Give an example of a symmetric bilinear form f on R3 such that f is degenerate, that is, not non-degenerate.
5. Prove Proposition (7.3.4).
6. Complete the proof of Proposition (7.3.6) by showing that the dimension of V⊥R is the nullity of the matrix A.


7. Consider the bilinear form f on R4 given by f (x, y) = xt Ay, where A is the matrix
[ 1  0  0   0 ]
[ 0  1  0   0 ]
[ 0  0  1   0 ]
[ 0  0  0  −1 ].

Find a vector v ∈ R4 such that f (v, v) = 0. 8. Let f be a bilinear form on a finite-dimensional vector space V and W a subspace of V such that the restriction fW of f to W is a non-degenerate bilinear form on W. Show that the dimension of W cannot exceed the rank of the bilinear form f . 9. Let V be a vector space over a field F, and f1 and f2 linear functionals on V. Show that the map f : V × V → F given by f (v, w) = f1 (v) f2 (w) for v, w in V, defines a bilinear form on V. Is f symmetric? For a finite-dimensional vector space V over a field F with basis {v1 , v2 , . . . , vn }, the dual basis of the dual space V ∗ is the collection { f1 , f2 , . . . , fn } of linear functionals on V such that fi (v j ) = δi j , where δi j is the Kronecker delta for all i, j. We had seen that these fi form a basis of the dual space V ∗ .

10. Let V be a finite-dimensional vector space over a field F with basis B = {v1 , . . . , vn } and dual basis B∗ = { f1 , . . . , fn }. Prove that the bilinear forms T i j given by T i j (v, w) = fi (v) f j (w) for any v, w in V, for 1 ≤ i, j ≤ n, form a basis of the vector space Bil(V) over F. 11. Let V be a finite-dimensional vector space over a field F and f a bilinear form on V. Prove that f can be expressed as the product of two linear functionals f1 and f2 on V in the sense that f (v, w) = f1 (v) f2 (w) for v, w in V if and only if f has rank 1. 12. Let V be a vector space over a field F and V ∗ be its dual space. Form the vector space W = V × V ∗ whose vector space operations are performed component-wise in terms of the vector space operations of V and V ∗ , respectively. Define ω : W × W → F by ω((v1 , f1 ), (v2 , f2 )) = f2 (v1 ) − f1 (v2 ) for all vi ∈ V and fi ∈ V ∗ . Prove that ω is a skew-symmetric, non-degenerate bilinear form on W.

For any linear operator T on a vector space V with a bilinear form f , a linear map T ∗ on V is said to be the adjoint of T relative to the form f if f (T v, w) = f (v, T ∗ w) for all v, w in V. The next exercise proves the existence of the adjoint of any linear operator on a finite-dimensional vector space with a non-degenerate bilinear form. 13. Let f be a non-degenerate bilinear form on a finite-dimensional vector space V over a field F and T a linear operator on V. (a) For a fixed vector w ∈ V, show that the map φw defined by φw (v) = f (T v, w)

for any v ∈ V,

is a linear functional on V. (b) Use Proposition (7.3.10) to show that any fixed w ∈ V determines a unique vector w' ∈ V such that f (T v, w) = f (v, w' ) for all v ∈ V.


(c) Define a map T ∗ : V → V by the rule that for any w ∈ V, T ∗ (w) = w' , where w' is the vector uniquely determined by w as in (b), so that f (T v, w) = f (v, T ∗ w). (d) For any w1 , w2 ∈ V and a ∈ F, let T ∗ (w1 + w2 ) = u1,2 and T ∗ (aw1 ) = u1 , so that by part (c), f (T v, w1 + w2 ) = f (v, u1,2 ) and f (T v, aw1 ) = f (v, u1 ). Use the bilinearity of f in these relations to deduce that f (v, T ∗ w1 + T ∗ w2 ) = f (v, u1,2 ) f (v, aT ∗ w1 ) = f (v, u1 ) for all v ∈ V.

(e) Finally use the non-degeneracy of f to conclude, from the relations proved in part (d), that T ∗ is linear on V. 14. Let T and S be linear operators on a finite-dimensional vector space V over a field F, and let a ∈ F. Assume that V has a non-degenerate bilinear form f . Prove that (i); (T + S )∗ = T ∗ + S ∗ , (ii) (T S )∗ = S ∗ T ∗ , (iii) (aT )∗ = aT ∗ and (iv) (T ∗ )∗ = T , where the adjoints are taken with respect to f .

7.4 SYMMETRIC BILINEAR FORMS In this section, we study symmetric bilinear forms in some detail; the main result in the section deals with diagonalization of such forms so as to determine their canonical matrix forms. The implications for congruence classes of real symmetric matrices are discussed, which naturally leads to positive definite and positive semi-definite matrices. The basic notion that we need for the main result is that of orthogonality which we introduce now. Definition 7.4.1. Let f be a symmetric bilinear form on a vector space V over any field. Given two vectors v, w ∈ V, we say v is orthogonal to w (with respect to f ) if the scalar f (v, w) = 0. Sometimes the notation v ⊥ w is used to mean that v is orthogonal to w. A few remarks are in order. (i) It is clear that the notion of orthogonality depends completely on the bilinear form. Two vectors may be orthogonal with respect to one symmetric bilinear form on a vector space, but need not be so with respect to another. (ii) Orthogonality is a symmetric relation. If v ⊥ w, then by the symmetry of the underlying bilinear form, w ⊥ v. (iii) As with the dot product on Rn , any vector is orthogonal to the zero vector with respect to any given symmetric bilinear form f ; f (v, 0) = f (v, v − v) = f (v, v) − f (v, v) = 0. (iv) However, in contrast to dot product, there may be self-orthogonal non-zero vectors with respect to a non-trivial symmetric bilinear form. For an example, see Exercise 7 of the last section. (v) On the other hand, with respect to a non-degenerate symmetric bilinear form, by Definition (7.3.9), the only vector which is orthogonal to every vector space has to be the zero vector. (vi) The notion of orthogonality, as given in the preceding definition, makes sense in a vector space


equipped with an alternating form f as for such a form f (v, w) = 0 if and only if f (w, v) = 0. But here in this section, we focus on symmetric bilinear forms as these forms, unlike alternating ones, can be diagonalized. For canonical matrix forms of alternating forms, see the last section of this chapter. We introduce two related notions; these are generalizations of similar ones for the dot products on Rn , which have been used extensively earlier in discussing real symmetric matrices. Definition 7.4.2. Let f be a symmetric bilinear form on a vector space V. For a subspace W of V, the orthogonal complement of W, denoted by W ⊥ , is defined as W ⊥ = {v ∈ V | f (v, w) = 0 for all w ∈ W}. Thus W ⊥ is the set of all vectors in V which are orthogonal to every vector in W. By properties of bilinear forms, W ⊥ itself is a subspace of V. Also, note that, if W = V, then the preceding definition coincides with our earlier definition of V ⊥ . Definition 7.4.3. A basis B = {v1 , v2, , . . . , vn } of a vector space V with a symmetric bilinear form f is called an orthogonal basis of V with respect to f if f (vi , v j ) = 0 for i ! j (1 ≤ i, j ≤ n). It is clear that the matrix of a symmetric bilinear form f , with respect to an orthogonal basis, is a diagonal one. Before coming to the main result, we make an observation: if f is a bilinear form on a vector space V over a field F and W is a subspace of V, then f can be restricted to W, that is, f can be considered a map from W × W into F. It is clear that the restriction of f to W is a bilinear form on W, which is denoted by fW . If f is symmetric, then fW will be symmetric on W. However, fW can be non-degenerate even if f is not so on V (and conversely). Proposition 7.4.4. Let V be a finite-dimensional vector space with a symmetric bilinear form f . If W be a subspace of V such that the restriction fW is non-degenerate, then V = W ⊕ W ⊥. Proof. We first claim that W ∩ W ⊥ = {0}. To prove the claim, note that the non-degeneracy of fW implies that the only vector in W, which is orthogonal to every vector in W, must be the zero vector. Now, if w ∈ W ∩ W ⊥ , then by the definition of W ⊥ , w is orthogonal to all of W, and so must be the zero vector. Hence the claim. Thus, it remains to show that V = W + W ⊥ , or equivalently, dim V = dim W + dim W ⊥ as W ∩ W ⊥ = {0} (see Proposition 3.5.6). Define h : V → W ∗ , the space of linear functionals on W, by letting h(v) = φv

for v ∈ V,

(7.5)

where φv is the restriction of Lv to W (see Equation 7.3 for the definition of Lv ) and thus is a linear functional on W. Since φv (w) = Lv (w) = f (v, w) for any w ∈ W, it follows that, for any v1 , v2 ∈ V and


a ∈ F, φav1 +v2 (w) = f (av1 + v2 , w) = a f (v1 , w) + f (v2 , w) = aφv1 (w) + φv2 (w) = (aφv1 + φv2 )(w). Thus, as maps on W, φav1 +v2 and aφv1 + φv2 are equal. By the definition of h, this equality implies that h is linear on V. We determine the kernel of h now. Note that, by the definition of h, v ∈ ker h if and only if φv (w) = 0 for any w ∈ W, which is equivalent to the relation f (v, w) = 0 holding for any w ∈ W. This shows that v ∈ ker h if and only if v ∈ W ⊥ . Thus ker h = W ⊥ . On the other hand, the non-degeneracy of fW implies, according to Proposition (7.3.10), that any φ ∈ W ∗ is uniquely determined by some w' ∈ W in the sense that φ(w) = f (w, w' ) = f (w' , w) for all w ∈ W. Comparing φ with the definitions of the maps h and φw' , we then see that φ = φw' = h(w' ). Thus we have shown that h is onto W ∗ , that is, Im(h) = W ∗ . As dim W = dim W ∗ , it follows that dim Im(h) = dim W. Therefore the dimension formula of Theorem (4.2.7) yields dim V = dim Im(h) + dim ker h = dim W + dim W ⊥ , as desired.

□

The idea of the characteristic of a field will now be required. By the characteristic ch F of a field F, we mean the smallest positive integer n such that n.1 = 0, where 1 is the multiplicative identity of F; if there is no such integer, we say that F has characteristic zero. It can be easily shown that if the characteristic of a field is non-zero, then it is necessarily a prime. Our familiar fields R, Q or C have all characteristic zero whereas the finite field of p elements has characteristic p. One of the difficulties with a field of characteristic p is that the field element p = p.1 behaves as the zero of the field and so is not invertible; thus division by p is not allowed. Thus if a field, finite or infinite, contains Z2 , the field of two elements, then it has characteristic 2 and so division by 2 is not allowed in such a field. Lemma 7.4.5. Let f be a symmetric bilinear form on a vector space V over a field F such that ch F ! 2. If f is not identically zero, then there is some v ∈ V such that f (v, v) ! 0. Proof. Since f is not identically zero, there is a pair of non-zero vectors v, w ∈ V such that f (v, w) ! 0. If, either f (v, v) ! 0 or f (w, w) ! 0, we are done. Otherwise, we let u = v + w. Then, as both f (v, v) and f (w, w) are zeros, an easy calculation shows that f (u, u) = f (v + w, v + w) = 2 f (v, w) ! 0 and so u is the desired vector in V. Note the use of the hypothesis about the characteristic of the scalar field in deducing the conclusion. ! Now we can prove the main theorem. Theorem 7.4.6. Let f be a symmetric bilinear form on a finite-dimensional vector space V over a field F, where ch F ! 2. Then there is an orthogonal basis of V relative to the symmetric form f .


Proof. If f is identically zero, then any basis of V is trivially an orthogonal basis. So we may assume that f is not identically zero. In that case, the preceding lemma provides a non-zero v1 ∈ V such that f (v1 , v1 ) ! 0. We take v1 as the first vector of the orthogonal basis we are seeking, and construct the rest of the basis vectors inductively as follows: suppose we have chosen a set of linearly independent vectors v1 , . . . , vk for k ≥ 1 such that f (vi , vi ) ! 0 but f (vi , v j ) = 0 if i ! j. We now find the next vector of the required basis. Let W be the subspace of V spanned by the vectors v1 , . . . , vk . The matrix of the restriction fW with respect to this basis of W is clearly diag[ f (v1 , v1 ), f (v2 , v2 ), . . . , f (vk , vk )]. By our choice of the vectors vi , these diagonal entries are all non-zero so the matrix of fW is invertible. It follows from Corollary (7.3.7) that fW is non-degenerate. We can then conclude that V = W ⊕ W⊥ by Proposition (7.4.4). Now, if f restricted to W ⊥ is identically zero, we choose any basis of W ⊥ and label the vectors in the basis as vk+1 , . . . , vn . Since f is identically zero, any two vectors in this list are orthogonal. On the other hand, by the definition of W ⊥ , every vi for 1 ≤ i ≤ k is orthogonal to any v j for k + 1 ≤ j ≤ n. Finally, as V is the direct sum of W and W ⊥ , the union v1 , . . . , vk , vk+1 , . . . , vn is a basis of V which, by the observations we have made, is orthogonal. Consider next the case in which f restricted to W ⊥ is not identically zero. Then, according to Lemma (7.4.5), we can find a non-zero vector vk+1 in W ⊥ such that f (vk+1 , vk+1 ) ! 0. Since every vector in W is orthogonal to each vector in W ⊥ , the vectors in the list v1 , . . . , vk , vk+1 satisfy the following properties: (a) f (vi , v j ) = 0 if i ! j for 1 ≤ i, j ≤ k + 1. (b) f (vi , vi ) ! 0 for 1 ≤ i ≤ k + 1. We claim that v1 , . . . , vk+1 are linearly independent. Suppose that a1 v1 + · · · ak vk + ak+1 vk+1 = 0

4 for scalars ai ∈ F. Then f ( k+1 j=1 a j v j , vk+1 ) = f (0, vk+1 ) = 0 which implies, upon expanding the sum, 4k+1 that j=1 a j f (v j , vk+1 ) = 0. Since distinct vectors of the list v1 , . . . , vk , vk+1 are orthogonal, it follows that the sum reduces to ak+1 f (vk+1 , vk+1 ) = 0 forcing ak+1 = 0. But then the relation of linear dependence becomes a1 v1 + · · · + ak vk = 0, whence we conclude that the rest of the scalars a j too are zero as v1 , . . . , vk are linearly independent. So the claim is established. Thus we have been able to extend the linearly independent set v1 , . . . , vk to a larger linearly independent set v1 , . . . , vk , vk+1 such that the distinct vectors are orthogonal, and f (v j , v j ) ! 0 for each v j in the set. Hence, by induction, we can produce an orthogonal basis of the finite-dimensional vector space V. ! For a converse to this theorem, which holds for any scalar field, see Exercise 7 of this section. Since symmetric matrices correspond to symmetric bilinear forms by Theorem (7.2.7), the diagonalizability of symmetric bilinear forms implies that any symmetric matrix is congruent to some diagonal matrix.


Corollary 7.4.7. Let F be a field such that ch F ! 2. Given any symmetric matrix A ∈ Mn (F), there is an invertible matrix P such that Pt AP is diagonal. For real symmetric matrices, we have already obtained a stronger result in section 5.3: a real symmetric matrix A is similar to a diagonal matrix, whose diagonal entries are the eigenvalues of A. We have seen that two matrices in a similarity class share the same eigenvalues; so in some sense the eigenvalues are the invariants of the similarity class. Moreover, the similarity class of a diagonalizable matrix A contains a unique diagonal matrix (upto the order of the diagonal entries), whose diagonal entries are precisely the eigenvalues of A; this unique diagonal matrix is sometimes called the canonical form of A in its similarity class. On the other hand, two congruent matrices, even over R, need not have the same eigenvalues, traces or determinants ( but do have the same rank). Thus it is reasonable to ask whether the congruence class of a diagonalizable matrix has a distinguished diagonal matrix (the canonical form in the congruence class) containing information about some invariants of the congruence class. For a real symmetric matrix, the answer is provided by the following classical result due to Sylvester. Theorem 7.4.8. (Sylvester’s Law of Inertia) Let f be a symmetric bilinear form on a real ndimensional vector space V. In any diagonal matrix representing f with respect to some orthogonal basis of V, the numbers, respectively, of positive, negative and zero diagonal entries are independent of the chosen basis and are uniquely determined by f . Proof. It suffices to show that if D and D' are diagonal matrices in Rn representing f with respect to two orthogonal bases, say B and B' , of V, then the numbers of positive and negative diagonal entries of D are, respectively, the same as the numbers of positive and negative diagonal entries of D' . We first note that the rank of a diagonal matrix is the number of non-zero diagonal entries. Since, being congruent matrices, D and D' have the same rank, say r, it follows that both D and D' have r non-zero diagonal entries. Suppose now that the numbers of positive entries of D and D' , respectively, are p and s. Note that the ith entry of D is f (ui , ui ), where ui is the ith vector of the ordered basis B (similarly for the ith entry of D' ). Therefore permuting the basis vectors if necessary, we may assume that the first p diagonal entries of D and the first s diagonal entries of D' are positive. Thus we may assume that orthogonal bases B = {v1 , . . . , vr , w1 , . . . , wn−r } and B' = {v'1 , . . . , v'r , w'1 , . . . , w'n−r } can be chosen such that the matrices of f with respect to B and B' are, respectively, D = diag[d1, d2 , . . . , dr , 0, . . . , 0] and D' = diag[d1' , d2' , . . . , dr' , 0, . . . , 0] such that di and d 'j are non-zero; moreover and

f (vi, vi) = di > 0   for 1 ≤ i ≤ p,      (7.6)
f (v′j, v′j) = d′j < 0   for s + 1 ≤ j ≤ r.      (7.7)

Let W1 be the subspace of V spanned by v1 , v2 , . . . , v p . Then for any non-zero v ∈ V such that v = 4p 2 4p i=1 xi vi , one obtains, by Equation (7.6), f (v, v) = i=1 xi di > 0 as xi are reals. Similarly, for any v ∈ W2 , where W2 is the subspace of V spanned by v's+1 , . . . , v'r , w'1 , . . . , w'n−r , one has f (v, v) ≤ 0 by Equation (7.7) as f (w'j , w'j ) = 0. It follows from the preceding two equations for f (v, v) that W1 ∩ W2 contains only the zero vector of V. We can then conclude from the formula dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ) (see Proposition 3.5.2) that p + (n − s) ≤ dim V = n, that is p ≥ s. A similar argument shows that s ≤ p.


Thus the number of positive diagonal entries of D and D' is the same. Since the number of non-zero diagonal entries of D and D' is the same, the other two assertions of the proposition are now clear. !

The matrix version of the Law of inertia. Corollary 7.4.9. Let A be real symmetric matrix. Then the numbers of positive, negative and zero diagonal entries in any diagonal matrix congruent to A are independent of the choice of the diagonal matrix in the congruence class of A and is uniquely determined by A. Thus, the numbers of positive and negative diagonal entries of any diagonal matrix congruent to a real symmetric matrix A, apart from its rank, are the invariants of the congruent class of A. Definition 7.4.10. Let p and q be the numbers of positive and negative diagonal entries, respectively, of any diagonal matrix representing a symmetric bilinear form f on a real vector space. The number p is called the index, S = p − q the signature and r = p + q the rank of f . These three numbers are invariants of the form f as they are independent of the choice of the diagonal matrix representing the bilinear form. Note that the rank of f is actually the matrix rank of any diagonal matrix representing f . Also, it is clear that any two of these three invariants determine the third. We can define analogous invariants for any real symmetric matrix because of the matrix version of Sylvester’s Law of Inertia. Definition 7.4.11. Let A be a real symmetric matrix and D be a diagonal matrix congruent to A. Let p and q be the numbers of positive and negative diagonal entries, respectively, of D. Then, p is the index, and S = p − q is the signature of the real symmetric matrix A. The index, the signature and the rank of A are invariants of A as they are independent of the choice of the diagonal matrix D congruent to A. Recall that for a real symmetric matrix A, there is a real orthogonal matrix Q such that Q−1 AQ is a diagonal matrix D whose diagonal entries are the eigenvalues of A, all of which are real. Since Q is orthogonal, Q−1 = Qt and so it follows that D is congruent to A. Using the invariants of A, we then obtain the following significant result. Corollary 7.4.12. If p is the index and S the signature of a real symmetric matrix A, then the number of positive and negative eigenvalues of A are, respectively, p and q = p − S . The invariants of a real symmetric matrix A determine A uniquely upto congruence as the next proposition shows. Proposition 7.4.13. the same invariants.

Two real symmetric matrices in Mn (R) are congruent if and only if they have

Proof. It is easy to see that if two symmetric matrices in Mn (R) are congruent, then they must be congruent to the same diagonal matrix. So they have the same invariants.


To prove the converse, we first show that if A ∈ Mn(R) is a real symmetric matrix with index p and rank r, then A is congruent to the following diagonal matrix:
D_{p,r} = [ Ip     0        0 ]
          [ 0   −I_{r−p}    0 ]
          [ 0      0        0 ],

where Ik denotes the identity matrix of order k, and the symbols 0 denote zero matrices of suitable orders. By hypothesis, the symmetric matrix A is congruent to a diagonal matrix D ∈ Mn(R), whose first p diagonal entries are positive, the next (r − p) diagonal entries negative and the rest of the diagonal entries zeros. Let di be the ith non-zero diagonal entry of D; note we have not defined di for (r + 1) ≤ i ≤ n. Let P = diag[a1, a2, . . . , an] be the diagonal matrix in Mn(R) defined by
ai = 1/√di        if 1 ≤ i ≤ p,
ai = 1/√(−di)     if (p + 1) ≤ i ≤ r,
ai = 1            if (r + 1) ≤ i ≤ n.
Since the transpose Pt is the same matrix as P, a simple calculation shows that Pt DP is precisely D_{p,r}. In other words, D is congruent to D_{p,r} and so, by the transitivity of the congruence relation, A is congruent to D_{p,r} as claimed. Therefore, if A and B are two real symmetric matrices having the same invariants, then each of them is congruent to the same diagonal matrix D_{p,r} for some p and r. It follows that A and B are themselves congruent. □

Consider the case when a symmetric bilinear form f, on a real n-dimensional vector space V, is non-degenerate. Then V⊥ is the zero subspace and so the rank of f is n. The proof of the preceding proposition then implies the following.
Corollary 7.4.14. Let f be a symmetric non-degenerate bilinear form on a real n-dimensional vector space V. If the index of f is p and q = n − p, then there is an orthogonal basis of V with respect to which the matrix of f is the following diagonal one:
I_{p,q} = [ Ip    0  ]
          [ 0   −Iq ],
where Ip and Iq are identity matrices of orders p and q, respectively, and the matrices 0 are zero matrices of suitable sizes.
The following matrix version of the two preceding results answers the question we raised, before introducing Sylvester's Law of Inertia, about the existence of canonical forms in congruence classes of real symmetric matrices.
Corollary 7.4.15. Let A be a real symmetric matrix of order n with rank r and index p. Then A is congruent to D_{p,r} of order n. In case A is invertible, then A is congruent to the matrix I_{p,q} of order n, where q = n − p.
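In practice, the invariants of a real symmetric matrix are most easily read off from the signs of its eigenvalues, as Corollary (7.4.12) permits. The following short sketch is not from the text; the sample matrix is the one diagonalized in Example 11 below, and the tolerance is an arbitrary choice.

import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [-2.0, 5.0, 4.0],
              [3.0, 4.0, -1.0]])

eig = np.linalg.eigvalsh(A)            # eigenvalues of a real symmetric matrix are real
p = int(np.sum(eig > 1e-12))           # index: number of positive eigenvalues
q = int(np.sum(eig < -1e-12))          # number of negative eigenvalues
print(p, q, p - q, p + q)              # index, q, signature, rank: 2 1 1 3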


For details about such canonical forms for symmetric matrices over fields other than the real field, the reader can look up Chapter 6 of Jacobson’s Basic Algebra, vol. I [5]. There is a simple method for computing the canonical form (or the invariants) of a real symmetric matrix. The following lemma describes the method. Lemma 7.4.16. Let A ∈ Mn (F) be a symmetric matrix (ch F ! 2). Then A can be diagonalized by performing a sequence of pairs of operations on A, each pair consisting of an elementary column operation and the corresponding row operation. Proof. We begin by recalling the way multiplications of A by elementary matrices produce the effects of corresponding elementary row (column) operations on A (see Section 2.2 for the relevant concepts). If E ∈ Mn (F) is an elementary matrix, then the product AE is the matrix produced by the column operation corresponding to E on A. The crucial point is that the corresponding row operation on AE will result precisely in the product E t AE. Thus, E t AE can be thought of as the result of first performing a certain column operation on A, and then performing the corresponding row operation on AE. This explains the use of the word pairs in the statement of the lemma. Now suppose that for the symmetric matrix A ∈ Mn (F), one can find an invertible matrix P ∈ Mn (F) such that Pt AP is diagonal. By Theorem (2.5.6), the invertible matrix P can be expressed as a product of certain elementary matrices, say, P = E1 E2 , . . . , Em . Taking transposes, we obtain t t Pt = E m Em−1 , . . . , E1t . It follows that the diagonal form D of A is given by t t Em−1 , . . . , E1t AE1 E2 , . . . , Em . D = Pt AP = Em

The associativity of matrix products and the observations of the preceding paragraph then establish the lemma. □
In view of this lemma, it is clear that the procedure for diagonalizing a symmetric matrix A consists of applying a suitable column operation to A and then the corresponding row operation to AE to begin with, and then repeating with suitable pairs of such column and row operations to the resultant matrix at each stage. The column and the row operations, at each stage, are chosen to make each pair of symmetrically placed off-diagonal entries of the matrices obtained at that stage zeros. Note that the chosen column operations, if applied to the identity matrix of suitable size in proper sequence, will yield the matrix P also.
EXAMPLE 11   We diagonalize the following symmetric matrix in M3(R):
A = [ 1   −2   3 ]
    [ −2   5   4 ]
    [ 3    4  −1 ].

We begin with the block matrix [A|I3], and subject A to pairs of elementary column and corresponding row operations till it reduces to a diagonal matrix D. Note that only the column operations will be applied to I3, so that finally I3 will change to P for which P^tAP = D. We outline the procedure, leaving the details to the reader. For convenience, we denote the column (row) operation of adding a times the column Cj (row Rj) to column Ci (row Ri) as Ci → Ci + aCj (Ri → Ri + aRj). Choosing the first pair of operations to be C2 → C2 + 2C1 and R2 → R2 + 2R1, one sees that [A|I3]


changes to

    [ 1   0    3 | 1  2  0 ]
    [ 0   1   10 | 0  1  0 ]
    [ 3  10   −1 | 0  0  1 ].

Applying C3 → C3 − 3C1 and R3 → R3 − 3R1 to the preceding block matrix, one then obtains

    [ 1   0    0 | 1  2  −3 ]
    [ 0   1   10 | 0  1   0 ]
    [ 0  10  −10 | 0  0   1 ].

Finally, the pair C3 → C3 − 10C2 and R3 → R3 − 10R2 reduces the preceding block matrix to [D|P]:

    [ 1  0     0 | 1  2  −23 ]
    [ 0  1     0 | 0  1  −10 ]
    [ 0  0  −110 | 0  0    1 ].

Thus, we have shown that for

    P = [ 1  2  −23 ]
        [ 0  1  −10 ]
        [ 0  0    1 ],

one has

    [ 1  0     0 ]          [  1  −2   3 ]
    [ 0  1     0 ]  =  P^t  [ −2   5   4 ]  P.
    [ 0  0  −110 ]          [  3   4  −1 ]

By Corollary (7.4.15), A is also congruent to

    I_{2,1} = [ 1  0   0 ]
              [ 0  1   0 ]
              [ 0  0  −1 ].

We can therefore conclude that A has rank 3 and index 2. More importantly, we now know that any diagonal matrix congruent to A has the same index and rank. In particular, if D is the diagonal matrix Q^tAQ, where Q is an orthogonal matrix, then, as the diagonal entries of D are the eigenvalues of A, we can conclude that A has two positive eigenvalues and one negative eigenvalue. It must be pointed out that there is no unique way of reducing a symmetric matrix A to a diagonal one by such pairs of operations; one can choose any pair of operations which renders a pair of non-zero symmetrically placed entries of A, and of the symmetric matrices obtained at each stage, zero.

There is a very useful classification of real symmetric matrices in terms of the signs of their eigenvalues; equivalently, real quadratic forms can be classified in the same manner.
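For readers who want to check such hand computations mechanically, the following short sketch verifies the congruence obtained in Example 11 and the resulting count of eigenvalue signs. (Python with NumPy is assumed here purely for illustration; it is not part of the text.)

    import numpy as np

    A = np.array([[1, -2, 3],
                  [-2, 5, 4],
                  [3, 4, -1]], dtype=float)
    P = np.array([[1, 2, -23],
                  [0, 1, -10],
                  [0, 0, 1]], dtype=float)

    D = P.T @ A @ P
    print(np.round(D))                # diag(1, 1, -110), as computed above
    print(np.linalg.eigvalsh(A))      # two positive eigenvalues and one negative one

Note that the eigenvalues of A are not the diagonal entries of D; only the numbers of positive and negative entries agree, as Sylvester's law of inertia guarantees.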


Recall that a real symmetric matrix A of order n is associated to a real quadratic form q on Rn given by q(x) = x^tAx, and conversely.

Definition 7.4.17. Let A be a real symmetric matrix and q the associated quadratic form. A and q are said to be positive definite (positive semi-definite) if all the eigenvalues of A are positive (non-negative). They are called indefinite if A has both positive and negative eigenvalues, and negative definite if all the eigenvalues are negative.

It is usual to call positive definite and positive semi-definite matrices simply pd and psd matrices, respectively. For practical purposes, real quadratic forms are also classified in terms of the nature of their ranges. The following discussion depends on the results about diagonalization of real symmetric matrices; see Section 5.3.

Proposition 7.4.18. Let q be a quadratic form on Rn given by q(x) = x^tAx, where A is the associated real symmetric matrix of order n. Then the following hold.

(a) q is positive definite if and only if x^tAx > 0 for all non-zero x ∈ Rn.
(b) q is positive semi-definite if and only if x^tAx ≥ 0 for all non-zero x ∈ Rn.
(c) q is negative definite if and only if x^tAx < 0 for all non-zero x ∈ Rn.
(d) q is indefinite if and only if x^tAx assumes both positive and negative real values.

Proof. Let Q be the orthogonal matrix of order n which diagonalizes A. Thus Q^tAQ = D, where D is the diagonal matrix having the n eigenvalues λ1, λ2, . . . , λn of A as its diagonal entries. As we have seen earlier, since Q^{-1} = Q^t, the change of coordinates x = Qy allows us to compute x^tAx as follows:

    x^tAx = x^t(QDQ^t)x = (Q^tx)^t D (Q^tx) = y^tDy = λ1|y1|^2 + λ2|y2|^2 + · · · + λn|yn|^2,        (7.8)

where y = (y1, y2, . . . , yn)^t. Observe that y = Q^tx = 0 if and only if x = 0. So, if x is non-zero, then y is non-zero, so that |yi|^2 ≥ 0 for all i with at least one |yi|^2 > 0. Thus, an examination of the right-hand side of Equation (7.8) yields all the assertions of the proposition. ∎

It is clear that the assertions of the proposition hold if we replace q by A. The concepts of pd (positive definite) and psd (positive semi-definite) matrices are important as they appear in numerous applications. We can give here only a very brief introduction to the theory of these matrices. The following example shows the power of the ideas we have developed so far.

EXAMPLE 12  Let A and B be real symmetric matrices both having all their eigenvalues non-negative (positive), and so psd (pd) matrices. Thus, by the preceding proposition (as applied to symmetric matrices), for any non-zero x ∈ Rn, both the real numbers x^tAx and x^tBx are non-negative (positive). It follows that x^t(A + B)x = x^tAx + x^tBx too is non-negative (positive). By the same proposition then, A + B too is psd (pd). We conclude that the eigenvalues of the symmetric matrix A + B are all non-negative (positive).
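The observation of Example 12 is easy to test numerically; the sketch below (again assuming NumPy, with randomly generated matrices that are not from the text) builds two psd matrices and checks that their sum has non-negative eigenvalues.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 3))
    Y = rng.standard_normal((3, 3))
    A = X @ X.T            # psd, since x^t (X X^t) x = ||X^t x||^2 >= 0
    B = Y @ Y.T            # psd for the same reason
    print(np.linalg.eigvalsh(A + B))   # all eigenvalues are non-negative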


EXAMPLE 13  Consider the quadratic form q(x) = 3x1^2 + 4x1x2 on R2. Though it is clear that q assumes positive values, we do not see it assuming negative values unless we make a proper choice of x1 and x2. However, the associated real symmetric matrix is

    [ 3  2 ]
    [ 2  0 ],

which has eigenvalues 4 and −1. So Proposition (7.4.18) confirms that q is an indefinite quadratic form, which means that q does assume negative values, too.

EXAMPLE 14  The eigenvalues of

    A = [  3  −1   0 ]
        [ −1   2  −1 ]
        [  0  −1   3 ]

are 1, 3 and 4 (verify). Thus, A is a positive definite matrix, so the corresponding quadratic form 3x1^2 + 2x2^2 + 3x3^2 − 2x1x2 − 2x2x3 on R3 assumes only positive values, a fact which is not apparent from its formula.

One of the interesting aspects of psd (pd) matrices is that they behave like non-negative (positive) real numbers. For example, the following result shows that a psd matrix has a unique square root.

Proposition 7.4.19. Let A be a real symmetric matrix. Then A is psd (pd) if and only if there is a unique psd (pd) matrix B such that A = B^2.

Proof. Suppose A = B^2 for some psd matrix B. Since the eigenvalues of B are non-negative and the eigenvalues of B^2 are the squares of the eigenvalues of B, it follows that A is psd.

To prove the converse, let Q be the orthogonal matrix which diagonalizes the psd matrix A. Thus A = QDQ^t, where D is the diagonal matrix whose diagonal entries λi are the eigenvalues of A, all of which are non-negative. Let E be the diagonal matrix whose ith diagonal entry is the non-negative square root √λi. Then D = E^2. Since Q^{-1} = Q^t, it follows that A = QDQ^t = QE^2Q^t = (QEQ^t)(QEQ^t) = B^2, where B = QEQ^t. It is clear that B is psd as it has the same eigenvalues √λi as E does.

It remains to prove the uniqueness of B. So let C be another psd matrix such that A = C^2. Let γ1, γ2, . . . , γn be the orthonormal columns of the orthogonal matrix Q which diagonalizes A; thus the ith column vector γi is a unit eigenvector for the eigenvalue λi of A, that is, (A − λiIn)γi = 0 for each i (assuming A is of order n). One has then, for any i, 1 ≤ i ≤ n,

    0 = (C^2 − λiIn)γi = (C − √λi In)(C + √λi In)γi.

If λi is positive for some i, then it is clear that C + √λi In is positive definite (see Example 12) and so invertible. It then follows from the preceding relation that (C − √λi In)γi = 0 and so √λi is an eigenvalue of C with γi as an eigenvector. In case λi = 0, Aγi = 0 and so, by properties of the usual dot


product in Rn, one obtains

    ‖Cγi‖^2 = ⟨Cγi, Cγi⟩ = ⟨C^tCγi, γi⟩ = ⟨C^2γi, γi⟩ = ⟨Aγi, γi⟩ = 0.

Thus Cγi = 0, which shows that for the eigenvalue λi = 0, the vector γi is again an eigenvector of C (for the eigenvalue √λi = 0).

It follows that C is diagonalizable and Q^tCQ = E, the diagonal matrix having √λ1, √λ2, . . . , √λn as its diagonal entries. However, by definition B = QEQ^t and so we conclude that C = B. This proves the uniqueness part. ∎

The following is clear now.

Corollary 7.4.20. If, for psd (pd) matrices B and C, B^2 = C^2, then B = C.
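The square root constructed in the proof of Proposition (7.4.19) is completely explicit: diagonalize A orthogonally and take square roots of the eigenvalues. Here is a minimal numerical sketch, assuming NumPy and a small sample matrix chosen only for illustration:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                  # a sample pd matrix (eigenvalues 1 and 3)
    lam, Q = np.linalg.eigh(A)                  # orthogonal diagonalization A = Q diag(lam) Q^t
    B = Q @ np.diag(np.sqrt(lam)) @ Q.T         # the psd square root B = Q E Q^t of the proof
    print(np.allclose(B @ B, A))                # True: B^2 = A
    print(np.all(np.linalg.eigvalsh(B) >= 0))   # True: B is psd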

Suppose now that for a real symmetric matrix A of order n, there is an invertible matrix C ∈ Mn(R) such that C^tC = A. Then for any x ∈ Rn,

    x^tAx = x^t(C^tC)x = (Cx)^t(Cx).        (7.9)

Now, by properties of the dot product in Rn, (Cx)^t(Cx) = ⟨Cx, Cx⟩ = ‖Cx‖^2. Therefore Equation (7.9) implies that x^tAx = 0 if and only if Cx = 0. However, as C is invertible, Cx = 0 if and only if x = 0. We conclude that x^tAx = ‖Cx‖^2 > 0 for any non-zero x in Rn, that is, A is positive definite. This discussion, coupled with Proposition (7.4.19), implies the following characterization of positive definite matrices.

Proposition 7.4.21. For a real symmetric matrix A of order n, the following are equivalent.

(a) A is positive definite.
(b) There is a positive definite matrix B such that A = B^2.
(c) There is an invertible matrix C of order n such that A = C^tC.

EXERCISES

1. Determine whether the following assertions are true or false, giving brief justifications. All given vector spaces are finite-dimensional over fields F whose characteristic is not 2.

(a) For any symmetric bilinear form f on V, there is a basis of V which is orthogonal relative to f.
(b) If a symmetric matrix A ∈ Mn(F) is congruent to a diagonal matrix D, then the diagonal entries of D are the eigenvalues of A.


(c) Any symmetric bilinear form on R2 of rank 2 can be represented as

    [ 1  0 ]
    [ 0  1 ]

with respect to some suitable basis of R2.
(d) Any matrix in Mn(F) which is congruent to a symmetric matrix is itself symmetric.
(e) A symmetric matrix in Mn(Q) (Q is the field of rational numbers) is congruent to a diagonal matrix in Mn(Q) whose non-zero entries are ±1.
(f) A symmetric matrix in Mn(C) is congruent to a diagonal matrix in Mn(C) all of whose non-zero diagonal entries are 1.

(g) There is no symmetric matrix in Mn(R) whose index equals its signature.
(h) There is no symmetric matrix in Mn(R) whose rank equals its signature.
(i) Two symmetric matrices in Mn(F) are congruent if and only if they represent the same symmetric bilinear form on the vector space Fn.
(j) The determinants of two matrices in Mn(F) representing the same symmetric bilinear form on an n-dimensional vector space over F differ by a multiple of a square of an element of F.
(k) For any subspace W of a vector space V with a symmetric bilinear form f, (W⊥)⊥ = W.
(l) Any two congruent matrices have the same determinant.
(m) Any two congruent matrices have the same trace.
(n) A real symmetric matrix with negative entries is negative definite.
(o) A positive definite matrix is invertible.
(p) A negative definite matrix is not invertible.
(q) The diagonal entries of a positive definite matrix are positive reals.
(r) If the diagonal entries of a real symmetric matrix are positive, then the matrix is positive definite.

2. Verify that 1 and −1 are not eigenvalues of the matrix in Example 11.

3. For each of the following real symmetric matrices A, determine an invertible matrix P and a diagonal matrix D such that P^tAP = D, and hence determine the invariants, and the number of positive and negative eigenvalues, of A:

    (a) [ 1   1 ]
        [ 1  −1 ],

    (b) [  0   1  −1 ]
        [  1   0   0 ]
        [ −1   0   1 ],

    (c) [  1  −2   3 ]
        [ −2   3   2 ]
        [  3   2   4 ].

4. Determine an orthogonal basis of R3 relative to each of the following symmetric bilinear forms f on R3 and then compute the invariants of f:

(a) f((x1, x2, x3)^t) = x1^2 + x2^2 − x3^2 − 2x1x2 + 4x2x3;
(b) f((x1, x2, x3)^t) = −x1^2 + x3^2 + 2x1x2 + 2x1x3.

5. Determine an orthogonal basis of C3 relative to the symmetric bilinear form f on C3 given by f((x1, x2, x3)^t) = (1 + i)x1^2 + x3^2 + 4x1x2 + 2ix1x3 + 2(1 − i)x2x3.


6. Prove the matrix version of Sylvester's Law of Inertia given in Corollary (7.4.9).

7. Let f be a bilinear form on a finite-dimensional vector space V. If there is a basis of V with respect to which the matrix of f is diagonal, that is, if f is diagonalizable, then show that f must be symmetric.

8. Let f be a symmetric bilinear form on V, and let W be a subspace of V. Show that

    f(v + W, u + W) = f(v, u)

defines a symmetric bilinear form on the quotient space V/W. Show further that this induced form is non-degenerate if and only if W = V⊥, where V⊥ is the orthogonal complement of V with respect to f.

9. Let D be a diagonal matrix in Mn(F) for a field F, and let D' be the matrix in Mn(F) obtained by some arbitrary permutation of the diagonal entries of D. Prove that D' is congruent to D in Mn(F).

10. Prove that the number of distinct congruence classes of symmetric matrices in Mn(R) is (n + 1)(n + 2)/2.

11. Let f be a non-degenerate symmetric bilinear form on a vector space V over a field F, and let T be a linear operator on V. Show that the map g : V × V → F given by g(v, w) = f(v, Tw) is a bilinear form on V. Show further that g is symmetric if and only if T = T*, where T* is the adjoint of T with respect to f. (Hint: See Exercise 12 of the preceding section for the definition of the adjoint of a linear operator.)

12. Let V be a finite-dimensional vector space with a symmetric bilinear form f. If W1 and W2 are subspaces of V such that dim W1 > dim W2, then prove that W1 contains a non-zero vector which is orthogonal to every vector of W2.

The next few exercises are based on the material in Prof. R. Bhatia's article Min Matrices and Mean Matrices, which appeared in the Mathematical Intelligencer, Vol. 33, No. 2, 2011.

13. If A1 and A2 are psd (respectively, pd) matrices, then show that the direct sum A1 ⊕ A2 is psd (respectively, pd).

14. Consider the real matrix Fn (flat matrix) of order n all of whose entries are equal to 1. Compute x^tFnx for x = (x1, x2, . . . , xn)^t ∈ Rn and verify that while Fn is positive semi-definite, it cannot be positive definite.

15. Find a positive semi-definite matrix R of order 4 such that the flat matrix F4 = R^tR. Can any choice of R be invertible? Generalise the construction of R to the case of Fn.

16. Let

    M = [ 1  1  1  1 ]
        [ 1  2  2  2 ]
        [ 1  2  3  3 ]
        [ 1  2  3  4 ].

a. Find a positive definite matrix R of order 4 such that M = R^tR.
b. Compute x^tMx for x = (x1, x2, x3, x4)^t ∈ R4 and verify that M is positive definite.


17. Let M = [mij] be the min matrix of order n, where mij = min{i, j}:

    M = [ 1  1  1  · · ·  1 ]
        [ 1  2  2  · · ·  2 ]
        [ 1  2  3  · · ·  3 ]
        [ ·  ·  ·         · ]
        [ 1  2  3  · · ·  n ].

Prove that M is positive definite by finding an invertible lower triangular matrix R (through the LU factorization of M) such that M = R^tR.

The next exercise shows that the restriction on the characteristic of the scalar field is necessary in Theorem (7.4.6) for diagonalization of a symmetric bilinear form. Knowledge of the finite field of two elements is necessary for this exercise.

18. Let F be the field Z2 = {0, 1} of two elements, and let V = F2 be the vector space of 2-dimensional column vectors over F. Consider the map f on V × V given by

    f((x1, x2)^t, (y1, y2)^t) = x1y2 + x2y1.

Show that f is a symmetric bilinear form on V, compute the matrix A of f relative to the standard basis of V and hence show that rank(f) = 2. Next prove that if f is diagonalizable, then there is a basis of V with respect to which the matrix of f is

    D = [ 1  0 ]
        [ 0  1 ].

Finally, if

    P = [ a  b ]
        [ c  d ]

is an invertible matrix in M2(F) such that P^tAP = D, then show, by comparing the diagonal entries of both sides of this matrix equation, that 1 = 0 in F, a contradiction. (Hint: One needs to use the fact that 2x = 0 for any x ∈ F.)

7.5 GROUPS PRESERVING BILINEAR FORMS

Recall that if Q is a real orthogonal matrix of order n, then Q^tQ = In = QQ^t. Therefore, for any x, y ∈ Rn, one has (Qx)^tQy = x^tQ^tQy = x^ty. In other words, f(Qx, Qy) = f(x, y), where f is the symmetric bilinear form representing the usual dot product on Rn. One says that Q preserves the dot product. In general, one can introduce linear operators (or matrices) which preserve bilinear forms. Operators or matrices preserving non-degenerate forms furnish interesting examples of groups, such as general orthogonal and symplectic groups, having important properties.

If f is a bilinear form on a vector space V, then a linear operator T on V is said to preserve f if f(Tv, Tw) = f(v, w) for all v, w ∈ V. It is clear that (i) the identity operator on V preserves any bilinear form f, and (ii) if operators T and S on V preserve f, then so does the composite ST. In case f is a non-degenerate form on a finite-dimensional vector space V, any linear operator T preserving f is necessarily invertible; for, if v ∈ ker T, then the condition f(v, w) = f(Tv, Tw) = f(0, Tw) = 0 for all w ∈ V implies, as f is non-degenerate, that v = 0. Thus T is one–one and so invertible as V is finite-dimensional. Moreover, f(T^{-1}v, T^{-1}w) = f(T(T^{-1}v), T(T^{-1}w)) = f(v, w), which proves that T^{-1} also preserves f. Thus we have verified the following proposition.


Proposition 7.5.1. Let f be a non-degenerate bilinear form on a finite-dimensional vector space V over an arbitrary field F. Then the set O(f) of linear operators on V which preserve f is a group.

Definition 7.5.2. The group O(f) is called the orthogonal group or the group of isometries of the non-degenerate bilinear form f.

One can identify the group O(f), for a non-degenerate bilinear form f on an n-dimensional vector space V over a field F, with a group of invertible matrices of order n over the field F by fixing a basis of V. Let A be the matrix of f with respect to a fixed but arbitrary basis {v1, v2, . . . , vn} of V; as f is non-degenerate, A is invertible. Recall that if x and y are the coordinate vectors in Fn of vectors v and w in V, respectively, then f(v, w) = x^tAy. Therefore, if a linear operator T on V, which preserves f, is represented by the matrix M of order n with respect to the same basis, then M is invertible (as T is) and f(Tv, Tw) = f(Mx, My) = x^tM^tAMy. It follows that T preserves f if and only if

    M^tAM = A.        (7.10)

One also says that the matrix M preserves the form f if M^tAM = A, where A is the matrix representing f with respect to some basis of V. The following result, whose easy verification is left to the reader, can be termed the matrix version of Proposition (7.5.1). Recall that GLn(F) is the group of all invertible matrices of order n over a field F.

Proposition 7.5.3. For any A ∈ GLn(F), the set O(A) of matrices M ∈ GLn(F) such that M^tAM = A is a group.

Moreover, using Equation (7.10), one can easily show that the group of operators and the group of matrices preserving a non-degenerate form are essentially the same.

Proposition 7.5.4. Let f be a non-degenerate bilinear form on an n-dimensional vector space V over a field F and let A ∈ GLn(F) be the matrix of f with respect to some basis B of V. If for any T ∈ O(f), θ(T) is the matrix of T with respect to B, then θ is an isomorphism of the group O(f) onto O(A).

For example, the group of matrices M ∈ GLn(F) preserving the dot product f(x, y) = x^ty on Fn consists of precisely those matrices M such that M^tM = In, since with respect to the standard basis of Fn, the matrix of the dot product is the identity matrix In. This group is the general orthogonal group On(F); the matrices in this group are the orthogonal matrices of order n over the field F. The real orthogonal group On(R) is thus just one example of a group of matrices preserving a bilinear form. Over F = R, one also has the pseudo-orthogonal groups On(p, q) for non-negative integers p and q such that p + q = n; On(p, q) is the group of all real matrices M of order n such that M^tI_{p,q}M = I_{p,q},


where

    I_{p,q} = [ I_p    0  ]
              [  0   −I_q ].

Note that On(0, n) = On(n, 0) = On(R). Because of Corollary (7.4.14), it is clear that On(p, q) is the set of real matrices M of order n which preserve any non-degenerate symmetric bilinear form, of rank n and index p, on an n-dimensional real vector space.

In case F = C (or some other algebraically closed field), Theorem (7.4.6) implies that any non-degenerate symmetric bilinear form on an n-dimensional vector space determines an orthogonal basis with respect to which the matrix of the form is the identity matrix, and so the group of matrices preserving any such form is the full orthogonal group On(C).

In all these cases, the clear descriptions of the groups of matrices (or of linear operators) preserving non-degenerate symmetric bilinear forms on finite-dimensional vector spaces over a field F of characteristic different from 2 depended on the existence of nice canonical forms of diagonal matrices representing the form. In general, however, for arbitrary fields F (except for finite fields), it may not be possible to determine such canonical forms; for details, one may refer to Basic Algebra, vol. I by Jacobson [5]. It is therefore surprising that an alternating non-degenerate bilinear form on a finite-dimensional vector space over any field admits a nice canonical matrix representation.

Recall that a bilinear form f on a vector space V over a field F is called alternating if f(v, v) = 0 for all v ∈ V. Observe that an alternating form is skew-symmetric: f(v, w) = −f(w, v) for any v, w ∈ V. If char F ≠ 2, then a skew-symmetric form f is also alternating; in case char F = 2, skew-symmetric forms are the same as the symmetric ones, as a = −a for any a ∈ F. The following is the fundamental result about alternating forms.

Proposition 7.5.5. Let f be an alternating bilinear form on a finite-dimensional vector space V over a field F. Then there is a basis of V with respect to which the matrix of f is a direct sum of a zero matrix and k copies of the matrix

    J1 = [  0  1 ]
         [ −1  0 ].

Furthermore, the rank r of f is even and r = 2k.

Proof. If f is the zero form, there is nothing to prove. So we may assume that f is not the zero form. Then there are vectors u1 and v1 in V such that f(u1, v1) ≠ 0. Replacing v1 by a scalar multiple, if necessary, we can assume that f(u1, v1) = 1. Using properties of the alternating form f, one can easily see that u1 and v1 are linearly independent (see Exercise 9). Let W1 be the subspace of V spanned by u1 and v1; the key to finding the required basis of V is to show that V is a direct sum of such two-dimensional subspaces (and possibly a subspace on which f is the zero form). To do that, we first prove that V = W1 ⊕ W1⊥, where W1⊥ is the subspace (prove it) consisting of the vectors w ∈ V such that f(u1, w) = 0 and f(v1, w) = 0.

Now, for any v ∈ V, let v' = f(v, v1)u1 − f(v, u1)v1 and set w' = v − v'. It is clear that v' is in W1. Moreover, as f(u1, u1) = 0 and f(u1, v1) = 1, we see that

    f(u1, w') = f(u1, v) − f(u1, v') = f(u1, v) + f(v, u1)f(u1, v1) = 0.


A similar calculation shows that f(v1, w') = 0. So w' ∈ W1⊥. Since v = v' + w', it follows that v is in W1 + W1⊥. Next we verify that W1 ∩ W1⊥ = {0}. So let v = au1 + bv1 be in W1⊥, which implies that f(u1, au1 + bv1) = 0 = f(v1, au1 + bv1). Properties of the alternating form f and the fact that f(u1, v1) = 1 then show that b = 0 = −a. Thus v = 0, completing our verification. We therefore can conclude that V = W1 ⊕ W1⊥.

Now the restriction of f to W1⊥ is clearly alternating. So if f is the zero form on W1⊥, then we are done. Otherwise we can find two vectors u2 and v2 in W1⊥ such that f(u2, v2) = 1. As in the case of the vectors u1 and v1, one can show that u2 and v2 are linearly independent. Letting W2 be the subspace spanned by u2 and v2, and W2⊥ the subspace of all vectors w ∈ W1⊥ such that f(u2, w) = 0 = f(v2, w), one can show, as in the preceding case, that W1⊥ = W2 ⊕ W2⊥. Continuing in this manner, as V is finite-dimensional, we obtain vectors u1, v1, . . . , uk, vk such that

    f(uj, vj) = 1 for 1 ≤ j ≤ k;
    f(ui, vj) = f(ui, uj) = f(vi, vj) = 0 for i ≠ j.

Moreover, if Wj is the subspace spanned by the linearly independent vectors uj and vj, then V = W1 ⊕ W2 ⊕ · · · ⊕ Wk ⊕ W0, where W0 is the subspace (possibly the zero subspace) such that the restriction of f to W0 is the zero form. Now, choosing any basis w1, w2, . . . , wn−2k of W0, we see that if dim V = n, then u1, v1, . . . , uk, vk, w1, w2, . . . , wn−2k is a basis of V and that the matrix of f with respect to this basis is the required one. The assertion about the rank of f is now clear. ∎

The pair uj, vj is sometimes called a hyperbolic pair, and the subspace Wj spanned by these two linearly independent vectors a hyperbolic plane. Recall that for a non-degenerate bilinear form f on a vector space V, for every non-zero u ∈ V there is some v ∈ V such that f(u, v) ≠ 0. Thus, for such a form, the subspace W0 of the preceding proposition has to be the zero subspace and so can be omitted from the direct sum decomposition of V. In that case, reordering the basis of V as u1, u2, . . . , un, v1, v2, . . . , vn, we deduce the following result.

Corollary 7.5.6. Let f be a non-degenerate alternating form on a finite-dimensional vector space V over a field F. Then the dimension of V is necessarily even. Further, if dim V = 2n, then there is a basis of V with respect to which the matrix of f is

    Jn = [  0n  In ]
         [ −In  0n ],

where 0n and In are, respectively, the zero matrix and the identity matrix, both of order n, over F.

The matrix Jn gives rise to a group which has deep group-theoretic properties.

Definition 7.5.7. Let F be a field of characteristic different from 2. The symplectic group Sp2n(F) is the group of matrices

    {S ∈ GL2n(F) | S^tJnS = Jn}.        (7.11)


Proposition (7.5.3) shows that Sp2n(F) is indeed a group. Elements of Sp2n(F) are called symplectic matrices. An easy calculation shows that Jn itself is a symplectic matrix of order 2n. As in the case of non-degenerate symmetric bilinear forms, which we discussed earlier, the linear operators on a vector space V which preserve a non-degenerate alternating bilinear form f on V do form a group, say Sp(f). If V is finite-dimensional of dimension 2n, then by choosing a basis of V consisting of n hyperbolic pairs, one can set up a group isomorphism of Sp(f) onto Sp2n(F). Thus, the symplectic matrices in Sp2n(F) are also said to preserve the form f.

To obtain some examples of symplectic matrices S satisfying the relation S^tJnS = Jn, we express S as a partitioned matrix

    S = [ A  B ]
        [ C  D ],

where the blocks A, B, C and D are all matrices over F of order n, and then express the product S^tJnS in blocks by block multiplication. Equating these four blocks with the corresponding blocks of Jn, we find the following conditions on the blocks of S: (i) A^tC and B^tD are symmetric, and (ii) A^tD − C^tB = In.

Thus, for example, for any A ∈ GLn(F) of order n and any symmetric matrix B of order n (over F), the blocks of the matrices

    [ A^{-1}   0n ]          [ In  B  ]
    [  0n     A^t ]    and   [ 0n  In ],

where 0n is the zero matrix of order n, satisfy the preceding conditions and so the matrices are symplectic.
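The two families of matrices just described, as well as Jn itself, can be checked against the defining relation S^tJnS = Jn directly. The following sketch assumes NumPy and uses randomly generated blocks A and B purely for illustration:

    import numpy as np

    n = 3
    rng = np.random.default_rng(1)
    I, Z = np.eye(n), np.zeros((n, n))
    Jn = np.block([[Z, I], [-I, Z]])                  # the matrix Jn of Corollary 7.5.6

    A = rng.standard_normal((n, n)) + n * np.eye(n)   # invertible for generic random entries
    B = rng.standard_normal((n, n))
    B = B + B.T                                       # symmetric block

    S1 = np.block([[np.linalg.inv(A), Z], [Z, A.T]])
    S2 = np.block([[I, B], [Z, I]])

    for S in (Jn, S1, S2):
        print(np.allclose(S.T @ Jn @ S, Jn))          # True in each case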

EXERCISES

1. Determine whether the following assertions are true or false, giving brief justification. All vector spaces are finite-dimensional.

a. A matrix A ∈ Mn(F) is skew-symmetric if and only if x^tAx = 0 for all x ∈ Fn.
b. A skew-symmetric matrix A ∈ Mn(F) cannot be invertible if n is odd.
c. If A and B are matrices of a non-degenerate bilinear form on a vector space, then the groups O(A) and O(B) are isomorphic.
d. Any matrix preserving a quadratic form is invertible.
e. The group O(1, 1) preserves the form x^2 − y^2.
f. If M ∈ Mn(C) preserves the dot product on Cn, then the columns of M form an orthonormal set.
g. If A is a matrix of a non-degenerate bilinear form on a complex vector space of dimension n, then A is congruent to In.
h. The matrix Jn ∈ Sp2n(F).
i. If S ∈ Sp2n(F), then S^t ∈ Sp2n(F).
j. The determinant of the symplectic matrix Jn is 1.
k. Any S ∈ Sp2n(F) is invertible.
l. Every real invertible matrix of order 2n is in Sp2n(F).


2. If A is a real skew-symmetric matrix of order n, then show that In + A is invertible and that (In − A)(In + A)^{-1} is orthogonal.

3. Prove that the eigenvalues of a real skew-symmetric matrix are purely imaginary.

4. Verify that the real matrix

    [ cosh t  sinh t ]
    [ sinh t  cosh t ]

is in O(1, 1).

5. Verify that

    A = [ cos z  −sin z ]
        [ sin z   cos z ]

is in O2(C) but is not unitary.

6. Let A and B be real matrices of order p and q, respectively. Find conditions on A and B such that

    M = [ A  0 ]
        [ 0  B ]

is in the group O(3, 3).

7. Find a matrix M ∈ O3(C) whose first column is (1, 0, i)^t.

8. Let n be a positive integer and p and q non-negative integers such that p + q = n. Prove that the group O(p, q) is isomorphic to O(q, p).

9. Let f be an alternating form on a vector space V over a field F, and let u1, v1 ∈ V be such that f(u1, v1) = 1. If w = a1u1 + b1v1 for a1, b1 ∈ F, then prove that a1 = f(w, v1) and b1 = −f(w, u1). Hence, prove that u1 and v1 are linearly independent over F.

10. Verify that for any A ∈ GLn(F),

    [ A^{-1}   0  ]
    [  0      A^t ]

is in Sp2n(F).

11. Verify that for any symmetric matrix B of order n over a field F,

    [ In  B  ]
    [ 0   In ]  ∈ Sp2n(F),

where In denotes the identity matrix and 0 the zero matrix, both of order n, over F.

12. If S ∈ Sp2n(F), then prove that S^{-1} ∈ Sp2n(F).

13. Let S ∈ Sp2n(C). Use the fact that S is similar to S^t to prove that S is similar to S^{-1}.

14. Let A be a real matrix of order 2n such that the product JnA is symmetric. Prove that the trace of A is zero and that JnA^t is also symmetric.

8  Inner Product Spaces

8.1 INTRODUCTION

Bilinear forms, as we saw in the last chapter, are a natural generalization of the dot products of R2 and R3; the important geometric concept of perpendicularity can also be introduced for symmetric bilinear forms. For a symmetric bilinear form f, even the notion of the length of a vector can be introduced in case f(v, v) is a positive real for any non-zero v; such a form is known as positive definite. However, the theory of positive definite symmetric bilinear forms will be developed in the framework of hermitian forms in this chapter. Hermitian forms, in some sense, generalize real symmetric forms to the complex case. Most of the chapter will be devoted to positive definite hermitian forms, otherwise known as inner products. One of the main goals of this chapter is to develop the ideas needed to classify completely the operators which are diagonalizable with respect to such inner products. Throughout this chapter, the field F is either R or C.

8.2 HERMITIAN FORMS

Definition 8.2.1. Let V be a vector space over a field F. A map H : V × V → F is called a hermitian form on V if, for all v, v', w ∈ V and a ∈ F,

(i) H(v + v', w) = H(v, w) + H(v', w);
(ii) H(av, w) = aH(v, w);
(iii) H(v, w) = \overline{H(w, v)}.

Here, the bar denotes the complex conjugate. Thus, a hermitian form is linear in the first variable but not exactly linear in the second variable. In fact, by using the last condition to interchange the variables, we can deduce from these conditions that:

(iv) H(v, w + w') = H(v, w) + H(v, w');
(v) H(v, aw) = āH(v, w);
(vi) H(v, v) is real.


As in the case of bilinear forms, one easily verifies that H(v, 0) = 0 for any v ∈ V. We may describe the condition on the second variable as conjugate linearity. Therefore, sometimes a hermitian form is referred to as one and a half linear (sesqui-linear) form. Condition (iii) is referred to as hermitian symmetry; observe that if F = R, then a hermitian form is nothing but a symmetric bilinear form. The reader should be cautioned that hermitian forms can also be defined by insisting on linearity in the second variable and conjugate linearity in the first variable. Thus, it is necessary to find out the definition adopted for hermitian forms while checking other sources. We give some examples of hermitian forms now. EXAMPLE 1

Let V = Fn, the vector space of n-dimensional column vectors over F. For x, y ∈ Fn, we let H(x, y) = y*x, where for the column vector y = (y1, y2, . . . , yn)^t, y* denotes the row vector (ȳ1, ȳ2, . . . , ȳn). Writing out the product, we then see that H(x, y) = x1ȳ1 + x2ȳ2 + · · · + xnȳn. It is easy to verify that H is a hermitian form on Fn. It is the standard hermitian product or the standard inner product on Fn. If F = R, then this standard product is clearly the standard dot product on Rn, as in this case H(x, y) = y^tx = x1y1 + x2y2 + · · · + xnyn.

Note that the standard dot product x^ty on Cn is not a hermitian form. It is customary to denote this standard hermitian form H(x, y) by ⟨x, y⟩. Thus, in C3, for example, if x = (1, 1 + i, 1 − i)^t and y = (√2, 2 + 2i, −i)^t, then

    ⟨x, y⟩ = 1(√2) + (1 + i)(2 − 2i) + (1 − i)(i) = (5 + √2) + i.

To get more examples of hermitian forms on Fn, we need to establish an association of such hermitian forms with certain matrices that will reflect the special properties of hermitian forms. We introduce these matrices now. Recall that for an m × n matrix A over F, its adjoint or conjugate transpose A* is the n × m matrix [bij], where bij = āji; we have already used the conjugate transpose y* of a column vector y in Example 1.

Definition 8.2.2.

A matrix A ∈ Mn (F) is a hermitian, or a self-adjoint matrix if A = A∗ .

Thus a real hermitian matrix is simply a symmetric matrix. Note that the diagonal entries of a hermitian matrix are all real. The following properties of adjoints will be useful (see Section 1.5): for A, B ∈ Mn(F), one has

(i) (A + B)* = A* + B*;
(ii) (AB)* = B*A*;
(iii) (aA)* = āA* for any a ∈ F.

Thus, for an invertible matrix A ∈ Mn(F), (A*)^{-1} = (A^{-1})*.


Let H be a hermitian form on an n-dimensional vector space V over F. Fix a basis B = {v1 , v2 , . . . , vn } of V. The matrix A of the form H with respect to the basis B is defined as A = [ai j ],

where ai j = H(vi , v j )

for all i, j.

Since H(vi, vj) = \overline{H(vj, vi)}, it follows that aij = āji. Thus, the matrix A of the hermitian form H relative to any basis of V is a hermitian or self-adjoint matrix in Mn(F). Conversely, a hermitian matrix of order n determines a hermitian form on any n-dimensional vector space over F. To verify this assertion, let A be a hermitian matrix in Mn(F). Fix a basis B of an n-dimensional vector space V over a field F. If x and y are the coordinate vectors of any two arbitrary vectors v and w of V with respect to the basis B, then we define

    H(v, w) = y*Ax.        (8.1)

The properties of matrix multiplication and of adjoints of matrices that have been just listed, readily imply that H is a hermitian form on V. For example, (y + y' )∗ Ax = (y∗ + y' ∗ )Ax = y∗ Ax + y' ∗ Ax. Similarly, (ay)∗ Ax = ay∗ Ax. These show that that H is conjugate linear in the second variable. To verify the other properties of H, we need to find the matrix of H with respect to the chosen basis B. Observe that if B = {v1 , v2 , . . . , vn }, then the coordinate vector of vi for any i is the n × 1 column vector ei , having 1 at the ith place and zeros everywhere else. Therefore, H(vi , v j ) = e j ∗ Aei = (a j1 , a j2 , . . . , a jn )ei = a ji for all i, j. Since a ji = ai j , it follows that H(vi , v j ) = H(v j , vi ) for all basis vectors vi and v j , and so H does have hermitian symmetry for all vectors in V. We have just shown that hermitian forms on an n-dimensional vector space V over a field F are in one–one correspondence with hermitian matrices of order n over F. Also note that hermitian matrices A ∈ Mn (F) supply us with examples of hermitian forms on Fn through the formula y∗ Ax given in Equation (8.1); the standard inner product can be obtained by taking A to the identity matrix. EXAMPLE 2

Consider the hermitian matrix

    A = [ 1  −i ]
        [ i   2 ]

over C. For x = (x1, x2)^t and y = (y1, y2)^t in C2, the form given by

    H(x, y) = y*Ax = x1ȳ1 − ix2ȳ1 + ix1ȳ2 + 2x2ȳ2

is then a hermitian form H on C2 with respect to the standard basis of C2. Note that

    H(x, x) = x1x̄1 − ix2x̄1 + ix1x̄2 + 2x2x̄2 = |x1|^2 + 2 Im(x2x̄1) + 2|x2|^2

is a real number.
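Since hermitian symmetry and the realness of H(x, x) are exactly the properties one wants to see, here is a quick numerical check of Example 2 (assuming NumPy; the random test vectors are not from the text):

    import numpy as np

    A = np.array([[1, -1j],
                  [1j, 2]])                       # the hermitian matrix of Example 2
    H = lambda u, v: v.conj() @ A @ u             # H(u, v) = v* A u

    rng = np.random.default_rng(2)
    x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    y = rng.standard_normal(2) + 1j * rng.standard_normal(2)

    print(np.isclose(H(x, x).imag, 0.0))          # H(x, x) is real
    print(np.isclose(H(x, y), np.conj(H(y, x))))  # hermitian symmetry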

EXAMPLE 3   If, for any A, B ∈ Mn(F), we define H(A, B) = Tr(AB*),


where B* is the conjugate transpose of B, then one can verify that H is a hermitian form on Mn(F) by appealing to the properties of the trace function and matrix multiplication. There is also another way of verifying the same. If A = [aij] and B = [bij], then the general diagonal entry cii of the product AB* is given by cii = Σ_k aik b̄ik, so by the definition of the trace of a matrix we have

    H(A, B) = Σ_i Σ_k aik b̄ik,

which is nothing but the standard product of Example 1 on F^{n^2}. Note that Mn(F) as a vector space is naturally isomorphic to F^{n^2}. It is also clear, in the light of the discussion preceding Example 2, that for any hermitian matrix P ∈ Mn(F), the formula H(A, B) = Tr(APB*) will define a hermitian form on Mn(F).
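A short numerical check of the trace form of Example 3 (assuming NumPy; the random matrices are only for illustration) confirms both hermitian symmetry and the fact that H(A, A) is a positive real:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    H = lambda X, Y: np.trace(X @ Y.conj().T)        # H(X, Y) = Tr(X Y*)

    print(np.isclose(H(A, B), np.conj(H(B, A))))     # hermitian symmetry
    print(np.isclose(H(A, A).imag, 0.0), H(A, A).real > 0)   # H(A, A) is a positive real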

For an example of a different type, we introduce a hermitian form on an infinite-dimensional vector space.

EXAMPLE 4   Let V be the complex vector space of all continuous complex-valued functions on the interval [0, 2π]. Then

    H(f, g) = (1/2π) ∫_0^{2π} f(t) ḡ(t) dt

defines a hermitian form on V. We leave the verification, which will depend on the properties of the Riemann integral, to the reader. We point out that any h(t) ∈ V can be written as h(t) = a(t) + ib(t), where a(t) = Re h(t) and b(t) = Im h(t) are real-valued functions on [0, 2π], so that h̄(t) = a(t) − ib(t). Therefore, the integral of h̄(t) can be expressed in terms of real integrals as

    ∫_0^{2π} h̄(t) dt = ∫_0^{2π} a(t) dt − i ∫_0^{2π} b(t) dt.

EXAMPLE 5

Let V be the complex vector space of all polynomials with complex coefficients of degree at most 3. As in the previous example,

    H(f(x), g(x)) = ∫_0^1 f(t) ḡ(t) dt

is a hermitian form on V. We compute the 4 × 4 matrix A = [aij] of H with respect to the basis B = {v1 = 1, v2 = x, v3 = x^2, v4 = x^3} of V. By definition,

    aij = H(vi, vj) = ∫_0^1 t^{i+j−2} dt.

Evaluating the integral, we see that aij = 1/(i + j − 1).


Thus, the matrix A will be the following real hermitian, that is, real symmetric matrix:

    A = [  1    1/2  1/3  1/4 ]
        [ 1/2   1/3  1/4  1/5 ]
        [ 1/3   1/4  1/5  1/6 ]
        [ 1/4   1/5  1/6  1/7 ].

The concept of perpendicularity, which is commonly known as orthogonality, can be introduced in a vector space V equipped with a hermitian form H exactly in the same manner as it was done in the spaces with symmetric bilinear forms (see Definition (7.4.1)). Two vectors v, u ∈ V are orthogonal (with respect to H), if H(v, u) = 0. By hermitian symmetry, the relation of being orthogonal is symmetric. Our discussion about orthogonality with respect to a symmetric bilinear form in Section 4 of the last chapter can be repeated without much change in the case of a space with hermitian form. For example, the orthogonal complement of a subspace with respect to a hermitian form can be introduced and a result analogous to Theorem (7.4.4) can be proved. This theorem as well as the fact that orthogonal bases exist for hermitian forms hold under the condition that the hermitian form is positive definite. In fact, a lot can be said about the structures of spaces with positive definite forms and we make a detailed study of such forms in the following section. EXERCISES 1. Determine whether the following assertions are true are false giving brief justifications. All underlying vector spaces are finite-dimensional over the field of complex numbers or the field of real numbers. (a) If a hermitian form on a complex vector space is symmetric, then it must be identically zero. (b) A symmetric bilinear form on a real vector space is hermitian. (c) A non-zero hermitian form on a real vector space is positive definite. (d) The determinant of any matrix representation of a positive definite hermitian form is a positive real number. (e) There can be no positive definite hermitian form on an infinite-dimensional complex vector space. (f) If f is a hermitian form on a real vector space is not positive definite, then f cannot be non-degenerate. (g) The restriction of a positive definite hermitian form to a non-zero subspace need not be positive definite. (h) The sum of two hermitian matrices of the same order is again a hermitian matrix. (i) The diagonal elements of any hermitian matrix are real. 2. Show that H(x, y) = xt y for all x, y ∈ Fn defines a hermitian form on Fn . Is H positive definite? 3. Show that H(A, B) = T r(AB∗ ) for A, B ∈ Mn (F) defines a hermitian form on Mn (F) directly by using properties of trace and adjoints of matrices. Is H positive definite?


4. Let P be a fixed hermitian matrix in Mn(F). Does H(A, B) = Tr(APB*) for A, B ∈ Mn(F) define a hermitian form on Mn(F)?

5. Let V be the complex vector space of all continuous complex-valued functions on the interval [0, 2π]. Prove that the formula H(f, g) = (1/2π) ∫_0^{2π} f(t) ḡ(t) dt defines a hermitian form H on V.

6. Let V be the complex vector space of all polynomials with complex coefficients of degree at most n. Verify that H(f(x), g(x)) = ∫_0^1 f(t) ḡ(t) dt defines a hermitian form on V.

7. Verify all the properties of adjoints listed in Proposition (8.5.6).

8. Prove or disprove: for A ∈ Mn(C), det(A*) = \overline{det(A)}.

8.3 INNER PRODUCT SPACE

In this section, we consider a positive definite hermitian form on a vector space over a field F, where F is either C or R. Such a form is called an inner product, and a space with such a form is known as an inner product space. One usually denotes an inner product by ⟨ , ⟩. Thus, by the properties of a hermitian form (see Definition (8.2.1)), an inner product ⟨v, w⟩ of two vectors in an inner product space V over a field F is a scalar satisfying the following conditions:

(i) ⟨v, w + w'⟩ = ⟨v, w⟩ + ⟨v, w'⟩;
(ii) ⟨v + v', w⟩ = ⟨v, w⟩ + ⟨v', w⟩;
(iii) ⟨av, w⟩ = a⟨v, w⟩;
(iv) ⟨v, aw⟩ = ā⟨v, w⟩;
(v) ⟨v, w⟩ = \overline{⟨w, v⟩};

for all v, v', w, w' ∈ V and a ∈ F. It also satisfies that all-important property of positive definiteness:

(vi) ⟨v, v⟩ is a positive real for all non-zero v ∈ V.

Note that our definition of inner product makes it linear in the first variable but conjugate linear in the second variable. If F = R, then the conjugates are superfluous, and so an inner product over R is linear in both the variables and is simply a positive definite real symmetric bilinear form. We can expand the inner product of sums of vectors by using the properties satisfied by an inner product repeatedly:

    ⟨Σ_i ai vi, Σ_j bj wj⟩ = Σ_{i,j} ai b̄j ⟨vi, wj⟩.        (8.2)

For example, ⟨v1 + av2, bw1⟩ = b̄⟨v1, w1⟩ + ab̄⟨v2, w1⟩. A few observations that follow easily from the definition of an inner product are listed in the following proposition. They will be needed frequently later.


Proposition 8.3.1. For any vectors v, w and u in V,

(i) ⟨v, 0⟩ = ⟨0, v⟩ = 0;
(ii) ⟨v, v⟩ = 0 if and only if v = 0;
(iii) ⟨v, w⟩ = ⟨v, u⟩ for all v ∈ V implies that w = u;
(iv) ⟨v, w⟩ + ⟨w, v⟩ = 2 Re⟨v, w⟩. Here, Re a denotes the real part of a complex number a.

Most of the examples of hermitian forms of the previous section are also inner products as can be verified easily. We recall these examples for the sake of completeness. EXAMPLE 6

For x, y ∈ Cn, the form given by ⟨x, y⟩ = y*x is the standard inner product on Cn or on Rn. For x = (x1, x2, . . . , xn)^t, we see that ⟨x, x⟩ = |x1|^2 + |x2|^2 + · · · + |xn|^2, and so ⟨x, x⟩ is a positive real number if x is non-zero. The standard inner product on Rn is similarly given by ⟨x, y⟩ = y^tx, the usual dot product. Thus ⟨x, x⟩ = x1^2 + x2^2 + · · · + xn^2 for x = (x1, x2, . . . , xn)^t, and so ⟨x, x⟩ is a positive real number for a non-zero x.

EXAMPLE 7

However, the hermitian form H on Fn defined by a hermitian matrix A (as H(x, y) = y*Ax) is not an inner product in general. For example, if

    A = [ −1  0 ]
        [  0  1 ],

then x^tAx clearly cannot be positive for all non-zero vectors x in R2.

EXAMPLE 8   Since for A = [aij] ∈ Mn(F), Tr(AA*) = Σ_{i,j} |aij|^2, the corresponding hermitian form H on Mn(F) of Example 3, given by H(A, B) = Tr(AB*), is an inner product.

EXAMPLE 9

For functions f and g in the space V of all continuous complex-valued functions on the closed interval [0, 2π],

    ⟨f, g⟩ = ∫_0^{2π} f(t) ḡ(t) dt

is an inner product on V, as the given integral defines a hermitian form on V (see Example 4 of the preceding section) and as ∫_0^{2π} f(t) f̄(t) dt = ∫_0^{2π} |f(t)|^2 dt is a positive real for any non-zero function f in V.

EXAMPLE 10   Similarly, the vector space V of all polynomials with complex coefficients is an inner product space with the inner product defined as ⟨f(x), g(x)⟩ = ∫_0^1 f(t) ḡ(t) dt.

Observe that any subspace of an inner product space is trivially an inner product space. This observation provides various examples of inner product spaces. For instance, the space R[x] of all polynomials with real coefficients or the space Rn (F) of all polynomials with real coefficients of degree at most n, being subspaces of the inner product space ofL the last example, are themselves inner product spaces 1 with the inherited inner product / f (x), g(x)0 = 0 f (t)g(t)dt. Note that as we are considering polynomials with real coefficients, we can dispense with the conjugate in the definition of the inner product. Notions of perpendicularity and lengths of vectors can be naturally introduced in an inner product space V. Two vectors v and w in an inner product space V are orthogonal if /v, w0 = 0. Note that symmetry is in-built in the definition because of the hermitian symmetry property of an inner product. Properties


of inner product show that (i) every v is orthogonal to the zero vector and (ii) the only self-orthogonal vector in an inner product space is the zero vector. The fact that the inner product of a vector with itself is a non-negative real number enables us to define the length of a vector.

Definition 8.3.2. The length ‖v‖ of a non-zero vector v in an inner product space is defined as the positive square root of ⟨v, v⟩; if v = 0, then ⟨v, v⟩ = 0 and we set ‖0‖ = 0. A unit vector is one whose length is one.

In practice, it is easier to deal with squares of lengths of vectors. One of the reasons why inner product spaces turn out to be useful is that the notion of length of vectors in such a space satisfies the basic properties of the usual length of vectors in R2 or R3.

Proposition 8.3.3. Let V be an inner product space over a field F. The following hold for all v, w ∈ V and a ∈ F.

(i) ‖v‖ ≥ 0; ‖v‖ = 0 if and only if v is the zero vector.
(ii) ‖av‖ = |a|‖v‖.
(iii) For any non-zero vector v, the scalar multiple v/‖v‖ is a unit vector.
(iv) If v ⊥ w, then ‖v + w‖^2 = ‖v‖^2 + ‖w‖^2.
(v) |⟨v, w⟩| ≤ ‖v‖‖w‖.
(vi) ‖v + w‖ ≤ ‖v‖ + ‖w‖.

Whereas (iv) and (v) are known respectively as Pythagoras' identity and the Cauchy–Schwarz inequality, the last one is the well-known triangle inequality.

Proof. The first three are immediate from the definition of length, and their verifications are left to the reader as exercises. The verification of (iv) is also straightforward:

    ‖v + w‖^2 = ⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨w, w⟩ = ‖v‖^2 + ‖w‖^2,

as orthogonality of v and w implies that ⟨v, w⟩ = ⟨w, v⟩ = 0.

For (v), note that the result is trivial if w = 0. So we may assume that w is non-zero. Let e be the unit vector defined by e = w/‖w‖. For any v ∈ V, consider the scalar a = ⟨v, e⟩ ∈ F. Then

    ⟨v − ae, e⟩ = ⟨v, e⟩ − ⟨ae, e⟩ = ⟨v, e⟩ − a⟨e, e⟩ = a − a = 0,

so the vectors v − ae and e are orthogonal. Applying Pythagoras' identity to the orthogonal vectors v − ae and ae, we obtain

    ‖v‖^2 = ‖v − ae‖^2 + ‖ae‖^2 = ‖v − ae‖^2 + |a|^2.

The preceding equality implies that |a|^2 ≤ ‖v‖^2, or equivalently, |a| ≤ ‖v‖, as the length of a vector is non-negative. Since |a| = |⟨v, e⟩| = |⟨v, w/‖w‖⟩| = (1/‖w‖)|⟨v, w⟩|, it follows from the inequality |a| ≤ ‖v‖


that (1/‖w‖)|⟨v, w⟩| ≤ ‖v‖, which, when multiplied by the positive real ‖w‖, yields the required inequality (v).

Before we present the proof of the triangle inequality, we recall the following elementary fact about complex numbers: if x is a complex number with x̄ as its conjugate, then x + x̄ = 2 Re(x) ≤ 2|x|, where Re(x) denotes the real part of x and |x| the absolute value of x. In particular, for vectors v, w ∈ V,

    ⟨v, w⟩ + ⟨w, v⟩ = ⟨v, w⟩ + \overline{⟨v, w⟩} ≤ 2|⟨v, w⟩|.

Therefore,

    ‖v + w‖^2 = ⟨v, v⟩ + ⟨v, w⟩ + ⟨w, v⟩ + ⟨w, w⟩
              ≤ ⟨v, v⟩ + 2|⟨v, w⟩| + ⟨w, w⟩
              = ‖v‖^2 + 2|⟨v, w⟩| + ‖w‖^2.

Using the Cauchy–Schwarz inequality, we then obtain ‖v + w‖^2 ≤ ‖v‖^2 + 2‖v‖‖w‖ + ‖w‖^2. Taking positive square roots of both sides of this inequality of non-negative reals yields the required triangle inequality. ∎

The Cauchy–Schwarz inequality and the triangle inequality, when applied to Fn with the standard inner product, give us some unexpected inequalities of numbers; given 2n numbers x1, . . . , xn, y1, . . . , yn, complex or real, form the column vectors x = (x1, . . . , xn)^t and y = (y1, . . . , yn)^t in Fn with these numbers. With the standard inner product in Fn, we then have |⟨x, y⟩| = |Σ_i xi ȳi|, ‖x‖^2 = Σ_i |xi|^2 and ‖y‖^2 = Σ_i |yi|^2. Therefore the Cauchy–Schwarz inequality and the triangle inequality can be interpreted as the following inequalities for the arbitrary 2n numbers:

    |Σ_i xi ȳi| ≤ √(Σ_i |xi|^2) √(Σ_i |yi|^2)

and

    √(Σ_i |xi + yi|^2) ≤ √(Σ_i |xi|^2) + √(Σ_i |yi|^2),

respectively. EXERCISES 1. Determine whether the following statements are true or false giving brief justifications. (a) An inner product is linear in both the variables. (b) An inner product can be defined only on a vector space over R or C. (c) There is a unique inner product on R2 .


(d) The sum or the difference of two inner products on a vector space V is an inner product on V. (e) If for a linear operator T on an inner product space V, /T w, v0 = 0 for all w, v ∈ V, then T is the zero operator. (f) If a vector v is orthogonal to every vector of a basis of an inner product space, then v is the zero vector. (g) There is no inner product in R2 such that /e1 , e2 0 = −1, where e1 and e2 are the standard basis vectors of R2 . (h) In an inner product space V, |/v, w0| = 4v44w4 if and only if vectors v and w are linearly independent in V. (i) For a complex inner product space V, there can be no v ∈ V such that 4v4 = −i. (j) The restriction of an inner product to a non-zero subspace need not be an inner product.

2. Let x and y be the coordinate vectors, respectively, of v and u with respect to a fixed basis of an n-dimensional vector space V over a field F, and let A be an arbitrary hermitian matrix in Mn (F). Verify that H(v, u) = y∗ Ax defines a hermitian form H on V. 3. Show that the hermitian form H on Mn (F), given by H(A, B) = T r(AB∗ ) for A, B ∈ Mn (F), is an inner product on Mn (F). 4. In each of the following cases, determine whether the given formula provides an inner product for the given vector space: (a) V = R2 ; /(x1 , x2 ), (y1 , y2 )0 = x1 y1 − x2 y2 . (b) V = R2 ; /(x1 , x2 ), (y1 , y2 )0 = x1 y1 − x2 y1 − x1 y2 + 4x2 y2 . (c) V = C2 ; /x, y0 = xAy∗ , where

A=

'

1 −i

( i 0

A=

' 1 3

2 4

(d) V = M2 (R); /A, B0 = T r(AB). (e) V = R2 ; /x, y0 = yt Ax, where

(

L1 (f) V = R[x]; / f (x), g(x)0 = 0 f ' (t)g(t)dt where f ' (x) is the formal derivative of f (x). 5. Find the conditions on a symmetric 2 × 2 matrix A such that /x, y0 = yt Ax defines an inner product on R2 . 6. Compute the lengths of the given vectors in each of the following cases, and verify both Cauchy– Schwarz inequality and the triangle inequality: (a) v = (1, i), u = (−2, 1 + i) in C2 with the standard inner product. (b) f (x) = e x , g(x) = x2 in the space V of all real-valued continuous functions on the interval [0, 1] with the inner product given by / f, g0 =

J

1 0

f (t)g(t)dt.


(c) A =

'

1 −i

( ' i 1+i , B= 2 i

( 0 in M2 (C) with inner product /A, B0 = Tr(AB∗ ). −i

7. Let T be the linear operator on R2 given by T (x1 , x2 ) = (−x2 , x1 ). Prove that /v, T v0 = 0 for all v ∈ R2 if /, 0 is the standard inner product on R2 . 8. Prove that there can be no linear operator T on C2 with the standard inner product such that /v, T v0 = 0 for all v ∈ C2 . 9. For any A, B ∈ Mn (F), show that (T r(A∗ B))2 ≤ T r(A∗ A)T r(B∗ B). 10. Let V be the set of all real sequences f = ( f (1), f (2), . . .) such that only finitely many terms of f are non-zero. (a) Show that V is a real vector space with addition and scalar multiplication defined componentwise. 4 (b) If for f, g ∈ V, / f, g0 = ∞ n=1 f (n)g(n), then show that /, 0 is an inner product on V. Note that the sum is actually a finite sum.

8.4 GRAM–SCHMIDT ORTHOGONALIZATION PROCESS

The standard basis of Fn plays a very important role even when Fn is considered an inner product space with the standard inner product. The reason for this importance stems from the fact that the vectors in the standard basis are unit vectors which are mutually orthogonal with respect to the standard inner product. Such bases of inner product spaces, known as orthonormal bases, make computations with vectors and linear operators extremely simple. In fact, in Section 5.3, we have already used orthonormal bases with respect to the dot product in Rn for proving a significant result about real symmetric matrices; moreover, the reader will recall that one can use a procedure (the Gram–Schmidt process) to convert linearly independent vectors in Rn to orthonormal sets. In this section, we discuss orthonormal sets as well as the Gram–Schmidt process in arbitrary inner product spaces and explore the key idea of orthogonal projection. First we give the relevant definitions.

Definition 8.4.1. A non-empty set S of vectors in an inner product space V is called orthogonal if any two distinct vectors in S are orthogonal. An orthogonal set S of vectors is called orthonormal if each vector in S is a unit vector. A basis B of an inner product space is an orthonormal basis if B is an orthonormal set.

We point out that these definitions hold even if the set S is infinite. At the other extreme, a single non-zero vector in an inner product space forms an orthogonal set, and a single unit vector an orthonormal set. Briefly, a set {vi} of vectors in an inner product space, indexed by i in an indexing set Λ, is orthonormal if and only if, for i, j ∈ Λ, ⟨vi, vj⟩ = δij, where δij is the Kronecker delta symbol. Note that as scalar multiples of two orthogonal vectors remain orthogonal, any orthogonal set of non-zero vectors can be transformed into a set of orthonormal vectors by multiplying each vector by the reciprocal of its length. This process is described as normalizing the vectors.


EXAMPLE 11  The standard basis E = {e1, e2, . . . , en} for Cn, or for Rn, is an orthonormal basis with respect to the standard inner product given by ⟨x, y⟩ = Σ_{j=1}^n xj ȳj.

EXAMPLE 12  The vectors (1, 1)^t and (1, −1)^t form an orthogonal set in R2 with respect to the standard inner product. We can normalize this orthogonal set to an orthonormal set by dividing each vector by its length. Thus, the vectors (1/√2, 1/√2)^t and (1/√2, −1/√2)^t form an orthonormal set of vectors in R2, which is clearly an orthonormal basis.
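A small numerical illustration of this orthonormal basis (assuming NumPy, and anticipating the coefficient formula of Proposition 8.4.2 below): the coefficients of any vector of R2 with respect to v1 and v2 are just its inner products with v1 and v2.

    import numpy as np

    v1 = np.array([1.0, 1.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)
    print(np.isclose(v1 @ v2, 0.0), np.isclose(v1 @ v1, 1.0), np.isclose(v2 @ v2, 1.0))

    x = np.array([3.0, -2.0])                               # an arbitrary vector of R^2
    print(np.allclose((x @ v1) * v1 + (x @ v2) * v2, x))    # x is recovered from <x, v1>, <x, v2>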

inx

/e , e

imx

J 2π 1 eint eimt dt 0= 2π 0 J 2π 1 = eint e−imt dt. 2π 0 J 2π 1 = ei(n−m)t dt 2π 0

If n ! m, then an easy calculation shows that the value of the last integral is zero, whereas for n = m it is trivially 2π. Thus, while two distinct functions of the set S are orthogonal, the length of each is one (because of the factor of 1/2π) showing that S is an orthonormal set in V. If we denote the function eint for any integer n by fn , then the preceding calculation has shown that / fn , fm 0 = δmn , where δmn is the Kronecker delta symbol.
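The orthonormality of the family in Example 14 can also be checked numerically. The sketch below (our own illustration; the helper name inner is hypothetical) approximates the inner product (1/2π) ∫₀^{2π} f(t)·conj(g(t)) dt by a Riemann sum and confirms ⟨fₙ, fₘ⟩ ≈ δₙₘ.

    import numpy as np

    def inner(f, g, samples=4000):
        # Riemann-sum approximation of (1/2π) ∫_0^{2π} f(t) conj(g(t)) dt.
        t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
        dt = 2.0 * np.pi / samples
        return np.sum(f(t) * np.conj(g(t))) * dt / (2.0 * np.pi)

    def f(n):
        # The function f_n(x) = e^{inx} of Example 14.
        return lambda t: np.exp(1j * n * t)

    print(abs(inner(f(3), f(3))))   # ≈ 1.0  (n = m)
    print(abs(inner(f(3), f(7))))   # ≈ 0.0  (n ≠ m)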

EXAMPLE 15 The basis {1, x, x²} of the space of all real polynomials of degree at most 2, equipped with the inner product ⟨f(x), g(x)⟩ = ∫₀¹ f(x)g(x) dx, is not an orthogonal set. For example, ⟨1, x⟩ = ∫₀¹ 1·x dx = 1/2, showing that 1 and x are not orthogonal.

One of the advantages of an orthonormal basis is that it simplifies the expression of vectors as linear combinations of basis vectors, as shown in the next proposition; such a linear combination is sometimes known as a Fourier expansion.


Proposition 8.4.2. Let S = {v₁, v₂, . . . , vᵣ} be an orthogonal set of non-zero vectors in an inner product space V with inner product ⟨ , ⟩. If v is in the span of S, then

    v = ∑ᵢ₌₁ʳ (⟨v, vᵢ⟩/‖vᵢ‖²) vᵢ.

In case S is an orthonormal set, one has

    v = ∑ᵢ₌₁ʳ ⟨v, vᵢ⟩ vᵢ.                                             (8.3)

Proof. Write v = ∑ⱼ aⱼvⱼ as a linear combination of the vectors in S. For each i, 1 ≤ i ≤ r, taking the inner product of v with vᵢ, we get

    ⟨v, vᵢ⟩ = ⟨∑ⱼ aⱼvⱼ, vᵢ⟩ = ∑ⱼ aⱼ⟨vⱼ, vᵢ⟩ = aᵢ⟨vᵢ, vᵢ⟩,

as distinct vectors in S are orthogonal. It follows that aᵢ = ⟨v, vᵢ⟩/‖vᵢ‖² for each i.  □

Thus, coefficients in linear combinations in inner product spaces can be determined by straightforward computations of inner products, instead of by solving systems of linear equations as done earlier.

EXAMPLE 16 In an earlier example, we had seen that the vectors v₁ = (1/√2)(1, 1)ᵗ and v₂ = (1/√2)(1, −1)ᵗ are orthonormal in R² with respect to the standard inner product. They clearly form a basis of R² (as one is not a scalar multiple of the other). To express an arbitrary vector (a, b)ᵗ in terms of these basis vectors, we need not write out the equations for x and y from the relation (a, b)ᵗ = xv₁ + yv₂ and then solve them for x and y as we had done earlier. For, the preceding proposition implies that x = ⟨(a, b)ᵗ, v₁⟩ = ⟨(a, b)ᵗ, (1/√2)(1, 1)ᵗ⟩ = (1/√2)(a + b) and y = ⟨(a, b)ᵗ, v₂⟩ = ⟨(a, b)ᵗ, (1/√2)(1, −1)ᵗ⟩ = (1/√2)(a − b).
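The computation in Example 16 is easily automated. The following Python/NumPy sketch (our illustration, not the book's; the name fourier_coefficients is ours) evaluates the coefficients ⟨v, vᵢ⟩/‖vᵢ‖² of Proposition 8.4.2 for the standard inner product on Rⁿ.

    import numpy as np

    def fourier_coefficients(v, orthogonal_basis):
        # Coefficients a_i = <v, v_i> / ||v_i||^2 from Proposition 8.4.2.
        return [np.dot(v, b) / np.dot(b, b) for b in orthogonal_basis]

    v1 = np.array([1.0, 1.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)
    v = np.array([2.0, 3.0])
    coeffs = fourier_coefficients(v, [v1, v2])
    print(coeffs)                              # [5/√2, -1/√2]
    print(coeffs[0] * v1 + coeffs[1] * v2)     # recovers [2. 3.]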

This technique of evaluating coefficients by computing inner products leads us to the following important result.

Proposition 8.4.3. Let S be an orthogonal (or orthonormal) set of non-zero vectors in an inner product space V over a field F. Then S is linearly independent over F.

Proof. If S is infinite, then we need to show that every finite subset, say {v₁, v₂, . . . , vᵣ}, of S is linearly independent. If S is finite, we choose this subset to be S itself. Consider any linear combination a₁v₁ + a₂v₂ + · · · + aᵣvᵣ = 0 for aᵢ ∈ F. As, for each fixed i, ⟨vᵢ, vⱼ⟩ = 0 for j ≠ i, taking the inner product of both sides of the relation with vᵢ yields aᵢ⟨vᵢ, vᵢ⟩ = 0. Since ⟨vᵢ, vᵢ⟩ is a non-zero scalar for any i, it follows from the preceding relation that aᵢ = 0. Thus, the required condition for linear independence is established.  □

Corollary 8.4.4. The number of non-zero vectors in an orthogonal set in a finite-dimensional inner product space cannot exceed the dimension of the space.


EXAMPLE 17 It is easy to verify that the vectors v₁ = (1, 1, 0)ᵗ, v₂ = (1, −1, 0)ᵗ and v₃ = (0, 0, 2)ᵗ are orthogonal in R³ with respect to the standard inner product. Thus, our last proposition (8.4.3) implies that these vectors form an orthogonal basis of R³. We wish to express a vector v ∈ R³, for example v = (1, 2, 3)ᵗ, in terms of these basis vectors. We have ⟨v, v₁⟩ = ⟨(1, 2, 3)ᵗ, (1, 1, 0)ᵗ⟩ = 3, ⟨v, v₂⟩ = ⟨(1, 2, 3)ᵗ, (1, −1, 0)ᵗ⟩ = −1 and ⟨v, v₃⟩ = ⟨(1, 2, 3)ᵗ, (0, 0, 2)ᵗ⟩ = 6. Similar calculations show that ‖v₁‖² = 2, ‖v₂‖² = 2 and ‖v₃‖² = 4. Hence, by the formula in Proposition (8.4.2), one obtains

    (1, 2, 3)ᵗ = (3/2)(1, 1, 0)ᵗ − (1/2)(1, −1, 0)ᵗ + (3/2)(0, 0, 2)ᵗ.

Note that normalizing the given orthogonal basis vectors produces an orthonormal basis of R³ consisting of u₁ = (1/√2)(1, 1, 0)ᵗ, u₂ = (1/√2)(1, −1, 0)ᵗ and u₃ = (1/2)(0, 0, 2)ᵗ. In that case it is clear that the vector v = (1, 2, 3)ᵗ can be expressed as (3/√2)u₁ − (1/√2)u₂ + 3u₃.

EXAMPLE 18 Let V be the vector space of all complex-valued continuous functions on the interval [0, 2π] with the inner product ⟨f(x), g(x)⟩ = (1/2π) ∫₀^{2π} f(t)ḡ(t) dt. It was shown in Example 14 that any two members of the infinite family {fₙ(x) = e^{inx} | n any integer} are orthogonal with respect to the given inner product. It follows that V cannot be finite-dimensional, for, by Proposition (8.4.3), V has an infinite set of linearly independent vectors.

Now that we have seen the advantages of an orthogonal or an orthonormal basis, it is time to discuss the existence of such bases, at least for finite-dimensional inner product spaces. As in the case of Rⁿ with the dot product (see Section 3.7), in such an inner product space the Gram–Schmidt orthogonalization process is available, which transforms any given basis of an inner product space into an orthogonal basis. At the heart of this procedure is the construction of orthogonal projections of vectors. The motivation for such a construction was highlighted in our discussion of the Gram–Schmidt process in Rⁿ in Section 3.7, in terms of the decomposition of a vector in R² into two perpendicular directions; it may be worthwhile for the reader to look up that discussion for a clue as to how the following definition is arrived at. Such a decomposition of a vector in R² is essentially given by the projection of the vector onto a line (a one-dimensional subspace), which supplies one of the two directions. In the general case, we need to project a vector orthogonally onto subspaces having finite bases of orthogonal or orthonormal vectors; the following definition spells out the mechanism for doing so.

Definition 8.4.5. Let V be an inner product space and W a subspace having an orthogonal basis {v₁, v₂, . . . , vₘ}. For any v ∈ V, the orthogonal projection of v on W, denoted by P_W(v), is the following vector of W:

    P_W(v) = (⟨v, v₁⟩/‖v₁‖²) v₁ + · · · + (⟨v, vₘ⟩/‖vₘ‖²) vₘ = ∑ᵢ₌₁ᵐ (⟨v, vᵢ⟩/‖vᵢ‖²) vᵢ.

In case the basis {v₁, v₂, . . . , vₘ} is orthonormal, the expression for P_W(v) simplifies, each of the lengths ‖vₖ‖² being replaced by 1. Since the coefficients ⟨v, vᵢ⟩/‖vᵢ‖² of vᵢ in the expression for P_W(v) are scalars and ⟨vᵢ, vⱼ⟩ = 0 for i ≠ j, it follows, by the linearity of the inner product in the first variable, that ⟨P_W(v), vⱼ⟩ = ⟨v, vⱼ⟩ for any j, a result which will be useful later.
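As a computational companion to Definition 8.4.5 (an illustration of ours, not from the text; the function name project is hypothetical), the sketch below computes P_W(v) from an orthogonal basis of W under the standard inner product, and checks that v − P_W(v) is orthogonal to W.

    import numpy as np

    def project(v, orthogonal_basis):
        # P_W(v) = sum_i <v, v_i>/||v_i||^2 * v_i  (Definition 8.4.5).
        v = np.asarray(v, dtype=float)
        p = np.zeros_like(v)
        for b in orthogonal_basis:
            b = np.asarray(b, dtype=float)
            p += (np.dot(v, b) / np.dot(b, b)) * b
        return p

    W_basis = [np.array([1.0, 1.0, 0.0]), np.array([1.0, -1.0, 0.0])]
    v = np.array([1.0, 2.0, 3.0])
    p = project(v, W_basis)
    print(p)                                    # [1. 2. 0.]
    print([np.dot(v - p, b) for b in W_basis])  # [0.0, 0.0]: v - P_W(v) is orthogonal to W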


As in the case of symmetric bilinear forms (see Section 7.4), vectors orthogonal to every vector of a given subspace, or even of a non-empty subset, of an inner product space need to be considered.

Definition 8.4.6. Let S be any non-empty set of vectors in an inner product space V. The orthogonal complement of S, denoted by S⊥, is the set of all vectors in V which are orthogonal to every vector in S. Thus

    S⊥ = {v ∈ V | ⟨v, w⟩ = 0 for every w ∈ S}.

Standard arguments show that (i) for any non-empty subset S of V, S⊥ is a subspace of V, and (ii) V⊥ = {0} and {0}⊥ = V. One can say more about orthogonal complements of subspaces.

Proposition 8.4.7. Let V be an inner product space. For any subspace W of V, W ∩ W⊥ = {0} and (W⊥)⊥ = W. Moreover, if W is finite-dimensional, then any vector v which is orthogonal to each vector of a basis of W is in W⊥.

The proof is straightforward and so is left to the reader. The following lemma explains the name orthogonal projection.

Lemma 8.4.8. Let V be an inner product space and W a subspace which has some finite orthogonal basis. If P_W(v) is the orthogonal projection of a vector v ∈ V on the subspace W, then v − P_W(v) ∈ W⊥.

Proof. Let v₁, v₂, . . . , vₖ be an orthogonal basis of W. By the preceding proposition, it suffices to show that v − P_W(v) is orthogonal to any basis vector vⱼ of W. Now, ⟨v − P_W(v), vⱼ⟩ = ⟨v, vⱼ⟩ − ⟨P_W(v), vⱼ⟩ by linearity of the inner product. However, as the given basis of W is orthogonal, by the remark following the definition of orthogonal projection, we have ⟨P_W(v), vⱼ⟩ = ⟨v, vⱼ⟩. Thus ⟨v − P_W(v), vⱼ⟩ = 0, as required.  □

We are now ready to introduce the Gram–Schmidt orthogonalization process in an arbitrary inner product space. This process consists of repeated applications of orthogonal projections to arrive at orthogonal bases for finite-dimensional subspaces of an inner product space.

Theorem 8.4.9. Let V be an inner product space, and let S = {u₁, u₂, . . . , uₙ} be a linearly independent set of vectors of V. Then there is an orthogonal set S′ = {v₁, v₂, . . . , vₙ} of vectors in V such that, for each k = 1, 2, . . . , n, the set {v₁, . . . , vₖ} is a basis for the subspace Wₖ spanned by {u₁, . . . , uₖ}.

Proof. We begin by letting v₁ = u₁ ≠ 0. The desired vectors v₂, . . . , vₙ are then defined inductively as follows. Suppose that, for some m with 1 ≤ m < n, vectors v₁, v₂, . . . , vₘ have been chosen such that for each k, 1 ≤ k ≤ m, the vectors v₁, . . . , vₖ form an orthogonal basis of the subspace Wₖ spanned by the linearly independent vectors u₁, . . . , uₖ. Then the next required vector, vₘ₊₁, is constructed by projecting uₘ₊₁ orthogonally on the subspace Wₘ, now considered as spanned by v₁, . . . , vₘ. More precisely, we define

    vₘ₊₁ = uₘ₊₁ − P_{Wₘ}(uₘ₊₁) = uₘ₊₁ − ∑ᵢ₌₁ᵐ (⟨uₘ₊₁, vᵢ⟩/⟨vᵢ, vᵢ⟩) vᵢ.


We need to show that v₁, . . . , vₘ, vₘ₊₁ form an orthogonal basis of Wₘ₊₁. First, we observe that vₘ₊₁ is non-zero, for, if vₘ₊₁ = 0, then uₘ₊₁ would be in the span of v₁, . . . , vₘ, which, by our inductive construction, is Wₘ. It would then follow that uₘ₊₁ is a linear combination of u₁, . . . , uₘ (as these span Wₘ), which contradicts the linear independence of the set S. Next, we claim that the non-zero vectors v₁, . . . , vₘ, vₘ₊₁ are mutually orthogonal. Since, by the induction hypothesis, the vectors v₁, . . . , vₘ form an orthogonal set, it suffices to show that vₘ₊₁ is orthogonal to vⱼ for j = 1, 2, . . . , m. However, by Lemma (8.4.8), vₘ₊₁ = uₘ₊₁ − P_{Wₘ}(uₘ₊₁) is in Wₘ⊥, and so is orthogonal to each of the vectors v₁, v₂, . . . , vₘ that span Wₘ. Hence our claim. Thus, v₁, v₂, . . . , vₘ₊₁ is an orthogonal set of non-zero vectors in the subspace Wₘ₊₁, which is spanned by the m + 1 linearly independent vectors u₁, u₂, . . . , uₘ₊₁. On the other hand, by Proposition (8.4.3), the vectors v₁, v₂, . . . , vₘ₊₁ are themselves linearly independent, and so they form a basis of Wₘ₊₁. By induction, the proof is complete.  □

Corollary 8.4.10. Every finite-dimensional inner product space has an orthonormal basis.

Proof. Any basis {u₁, . . . , uₙ} of a finite-dimensional inner product space can be replaced by an orthogonal basis {v₁, . . . , vₙ} by the Gram–Schmidt orthogonalization process. Further, an orthonormal basis can then be obtained by replacing each vⱼ by the normalized vector vⱼ/‖vⱼ‖.  □

Corollary 8.4.11. (Orthogonal Decomposition) Let W be a finite-dimensional subspace of an inner product space V. Then V = W ⊕ W⊥.

Proof. Since, by Proposition (8.4.7), W ∩ W⊥ = {0}, we need only show that V is the sum of W and W⊥. To do so, we first choose an orthonormal basis of W, so that for any v ∈ V the orthogonal projection P_W(v), as defined in Definition (8.4.5), is a vector in W. Now observe that v = P_W(v) + (v − P_W(v)), where v − P_W(v) ∈ W⊥ by Lemma (8.4.8). The corollary follows.  □

Recall from Proposition (3.4.7) that any set of linearly independent vectors in a finite-dimensional vector space can be extended to a basis. A similar result holds for orthonormal sets of vectors.

Corollary 8.4.12. Let S be an orthonormal set of vectors of a finite-dimensional inner product space V. Then S can be extended to an orthonormal basis of V.

Proof. Let W be the subspace of V spanned by S and let W⊥ be its orthogonal complement. Now, W⊥ is also finite-dimensional, so by Corollary (8.4.10) it has an orthonormal basis, say S′. It is clear that S ∪ S′ is an orthonormal basis of V, as V = W ⊕ W⊥.  □

Quite frequently, we start with a single unit vector e of a finite-dimensional inner product space V. Since e forms an orthonormal set, e along with any orthonormal basis of its orthogonal complement will form an orthonormal basis of V.

See Section 3.7 for some detailed numerical examples illustrating the Gram–Schmidt orthogonalization process. Nonetheless, we give a couple of examples here; in the first one, the reader is expected to work out the details.

EXAMPLE 19 Consider the subspace W of R⁴ spanned by the vectors u₁ = (1, 0, 2, 0)ᵗ, u₂ = (1, 1, 7, 0)ᵗ and u₃ = (2, 6, 4, 1)ᵗ. Assume that R⁴ has the standard inner product. We proceed to find an orthonormal basis of W by the Gram–Schmidt orthogonalization process. So, we begin by letting v₁ = u₁ = (1, 0, 2, 0)ᵗ.


Now ⟨u₂, v₁⟩ = ⟨(1, 1, 7, 0)ᵗ, (1, 0, 2, 0)ᵗ⟩ = 15 and ‖v₁‖² = ⟨(1, 0, 2, 0)ᵗ, (1, 0, 2, 0)ᵗ⟩ = 5. Thus, the second required vector is given by

    v₂ = (1, 1, 7, 0)ᵗ − (15/5)(1, 0, 2, 0)ᵗ = (−2, 1, 1, 0)ᵗ.

A similar calculation shows that ⟨u₃, v₁⟩ = 10, ⟨u₃, v₂⟩ = 6 and ‖v₂‖² = 6. Thus, the third vector turns out to be v₃ = (2, 5, −1, 1)ᵗ. Normalizing these three vⱼ, we arrive at an orthonormal basis of W consisting of the vectors (1/√5)(1, 0, 2, 0)ᵗ, (1/√6)(−2, 1, 1, 0)ᵗ and (1/√31)(2, 5, −1, 1)ᵗ.
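The inductive construction in the proof of Theorem 8.4.9 translates directly into a short program. The following Python/NumPy sketch (our own illustration; the function name gram_schmidt is hypothetical) reproduces the computation of Example 19 for the standard inner product on Rⁿ.

    import numpy as np

    def gram_schmidt(vectors):
        # Classical Gram-Schmidt:
        # v_{m+1} = u_{m+1} - sum_i <u_{m+1}, v_i>/<v_i, v_i> * v_i.
        orthogonal = []
        for u in vectors:
            u = np.asarray(u, dtype=float)
            v = u.copy()
            for w in orthogonal:
                v -= (np.dot(u, w) / np.dot(w, w)) * w
            orthogonal.append(v)
        return orthogonal

    u1, u2, u3 = [1, 0, 2, 0], [1, 1, 7, 0], [2, 6, 4, 1]
    vs = gram_schmidt([u1, u2, u3])
    print(vs)                                   # (1,0,2,0), (-2,1,1,0), (2,5,-1,1)
    print([v / np.linalg.norm(v) for v in vs])  # the orthonormal basis of Example 19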

EXAMPLE 20 Let us obtain an orthonormal basis of R₂[x], the space of all real polynomials of degree at most 2, equipped with the inner product ⟨f(x), g(x)⟩ = ∫₀¹ f(t)g(t) dt. Consider the basis {u₁ = 1, u₂ = x, u₃ = x²} of R₂[x], and let v₁ = u₁ = 1. Then the next required vector is given by

    v₂ = u₂ − (⟨u₂, v₁⟩/‖v₁‖²) v₁ = x − (∫₀¹ t dt)·1 = x − 1/2.

Now ‖v₂‖² = ‖x − 1/2‖² = ∫₀¹ (t − 1/2)² dt = 1/12. Therefore,

    v₃ = u₃ − (⟨u₃, v₁⟩/‖v₁‖²) v₁ − (⟨u₃, v₂⟩/‖v₂‖²) v₂
       = x² − (∫₀¹ t² dt)·1 − 12 (∫₀¹ t²(t − 1/2) dt)·(x − 1/2)
       = x² − 1/3 − 12(1/4 − 1/6)(x − 1/2)
       = x² − x + 1/6.

L1 Note that 4v3 42 = 0 (t2 − t + 1/6)2dt = 1/180. Thus the polynomials 1, x − 1/2, x2 − x + 1/6 form an orthogonal these vectors, we obtain an √ basis of R2 [x]. Normalizing √ orthonormal basis {1, (1/ 12)(x − 1/2), (1/6 5)(x2 − x + 1/6)}. Another advantage of working with an orthonormal basis of an inner product space lies in the ease with which we can compute the matrix of a linear operator on that space. The precise formulation is given in the following result. Proposition 8.4.13. Let T be a linear operator on a finite-dimensional inner product space V with an orthonormal basis B = {v1 , v2 , . . . , vn }. If A = [ai j ] be the matrix of T with respect to B, then ai j = /T v j , vi 0.

Proof. Since B is an orthonormal basis of V, any v ∈ V can be expressed, by Equation (8.3), as v = ∑ᵢ₌₁ⁿ ⟨v, vᵢ⟩vᵢ. In particular, T vⱼ = ∑ᵢ₌₁ⁿ ⟨T vⱼ, vᵢ⟩vᵢ for j = 1, 2, . . . , n. On the other hand, by the definition of the matrix A of T, the vector T vⱼ determines the jth column of A uniquely: T vⱼ = ∑ᵢ₌₁ⁿ aᵢⱼvᵢ for each j. Comparing these two expressions (and using the linear independence of the vectors vᵢ), we arrive at the formula for aᵢⱼ.  □

The point of this proposition is that instead of solving systems of equations to determine the entries of the matrix of an operator on an inner product space, one has the easier option of computing these entries as inner products.
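To illustrate this remark (the sketch is ours, not the book's; the name matrix_of_operator is hypothetical), the entries aᵢⱼ = ⟨T vⱼ, vᵢ⟩ can be computed mechanically once an orthonormal basis is fixed; the worked example that follows carries out the same calculation by hand.

    import numpy as np

    def matrix_of_operator(T, orthonormal_basis):
        # Matrix [a_ij] with a_ij = <T v_j, v_i>, per Proposition 8.4.13
        # (standard inner product on R^n assumed).
        B = [np.asarray(v, dtype=float) for v in orthonormal_basis]
        n = len(B)
        A = np.zeros((n, n))
        for j, vj in enumerate(B):
            Tvj = T(vj)
            for i, vi in enumerate(B):
                A[i, j] = np.dot(Tvj, vi)
        return A

    T = lambda x: np.array([x[0], x[0] + x[1]])    # T(x1, x2)^t = (x1, x1 + x2)^t
    v1 = np.array([1.0, 1.0]) / np.sqrt(2)
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)
    print(matrix_of_operator(T, [v1, v2]))
    # [[ 1.5  0.5]
    #  [-0.5  0.5]]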


EXAMPLE 21 Consider R² with the standard inner product, and let T : R² → R² be the linear operator given by T(x₁, x₂)ᵗ = (x₁, x₁ + x₂)ᵗ. Suppose we want to find the matrix A = [aᵢⱼ] of T with respect to the basis {(1/√2)(1, 1)ᵗ, (1/√2)(1, −1)ᵗ} of R². Once we note that this basis is orthonormal relative to the given inner product, computing the entries of A is straightforward: for example, ⟨T v₁, v₁⟩ = ⟨T(1/√2, 1/√2)ᵗ, (1/√2, 1/√2)ᵗ⟩ = ⟨(1/√2, 2/√2)ᵗ, (1/√2, 1/√2)ᵗ⟩, and so by the formula in the proposition a₁₁ = 1/2 + 2/2 = 3/2. Similar calculations show that a₁₂ = ⟨T v₂, v₁⟩ = 1/2, a₂₁ = −1/2 and a₂₂ = 1/2.

We now examine another aspect of the concept of orthogonal projection. Though the notation P_W(v), for any v ∈ V, gives the impression that P_W is a map on V, the expression for P_W(v), being given in terms of a chosen orthonormal or orthogonal basis of W, does not by itself justify treating P_W as a well-defined map on V. Fortunately, there is an alternative description of an orthogonal projection onto a subspace, which will not only allow us to treat such a projection as a map from V onto W but also extend the idea to infinite-dimensional cases. The key is to think of the orthogonal projection of a vector on a subspace as that vector of the subspace which is, in some sense, closest to the given vector. Indeed, in R², the point Q on a line W (a one-dimensional subspace) which is closest to a given point P in R² corresponds to the orthogonal projection on W of the vector v that represents P. We interpret this situation in terms of an inequality involving distances: ‖v − P_W(v)‖ is the least among all possible distances ‖v − w‖ as w ranges over W. This inequality suggests that the idea of orthogonal projection can be extended in the following manner.

Definition 8.4.14. Let W be a subspace of an inner product space V. For any v ∈ V, a best approximation to v by vectors in W is a vector u ∈ W such that

    ‖v − u‖ ≤ ‖v − w‖ for all w ∈ W.

Because of our familiarity with R² and R³, we intuitively feel that such an approximation should be unique and should be such that v − u is orthogonal to all of W. This is indeed the case even in an arbitrary inner product space, as shown in the following proposition.

Proposition 8.4.15. Let W be a subspace of an inner product space V. For any v ∈ V, a vector u ∈ W is a best approximation to v by vectors in W if and only if v − u ∈ W⊥.

Proof. Suppose that, for some vector u ∈ W, v − u ∈ W⊥. Then, for any w ∈ W, u − w is orthogonal to v − u. Hence, by Pythagoras' identity (see Proposition 8.3.3), we see that

    ‖v − w‖² = ‖(v − u) + (u − w)‖² = ‖v − u‖² + ‖u − w‖² ≥ ‖v − u‖²

for any w ∈ W, where equality holds only when u = w. Thus, u satisfies the condition in Definition (8.4.14) for being a best approximation to v.


To prove the proposition in the other direction, assume that u ∈ W is such that ‖v − w‖ ≥ ‖v − u‖ for any w ∈ W. Note that, in general,

    ‖v − w‖² = ‖(v − u) + (u − w)‖² = ‖v − u‖² + 2Re⟨v − u, u − w⟩ + ‖u − w‖².

Our assumption then implies that

    2Re⟨v − u, u − w⟩ + ‖u − w‖² ≥ 0                                  (8.4)

for any w ∈ W. Next, observe that, as u ∈ W is fixed, every vector in W can be expressed as u − w by choosing an appropriate w, so that Inequality (8.4) holds with u − w replaced by an arbitrary vector of W. In particular, for any w ∈ W such that w ≠ u, we may replace u − w in Inequality (8.4) by a(u − w), where a is the scalar given by

    a = −⟨v − u, u − w⟩/‖u − w‖².

A short calculation involving basic properties of the inner product then shows that Inequality (8.4) reduces to

    −2 |⟨v − u, u − w⟩|²/‖u − w‖² + |⟨v − u, u − w⟩|²/‖u − w‖² ≥ 0,

which clearly holds if and only if ⟨v − u, u − w⟩ = 0. This implies, because of our earlier observation about vectors in W, that every non-zero vector of W is orthogonal to v − u. Hence the converse follows.  □

Corollary 8.4.16. If, for a vector v in an inner product space V, a best approximation to v by vectors of a subspace W exists, then it is unique.

Proof. If u₁ and u₂ in W are two best approximations to v by vectors of W, then by the preceding proposition ⟨v − u₁, w⟩ = ⟨v − u₂, w⟩ = 0 for every w ∈ W. It follows that ⟨u₂ − u₁, w⟩ = ⟨(v − u₁) − (v − u₂), w⟩ = 0, showing that the vector u₂ − u₁ of W is actually in W⊥. But the only vector of a subspace which is also in its orthogonal complement is the zero vector (see Proposition 8.4.7). The corollary follows.  □

Recall from Lemma (8.4.8) that if W is a finite-dimensional subspace of an inner product space V, then for any v ∈ V, the vector v − P_W(v) is in W⊥. Therefore, the preceding proposition also implies that best approximations always exist in the case of finite-dimensional subspaces.

Corollary 8.4.17. If W is a finite-dimensional subspace of an inner product space V, then the vector P_W(v) ∈ W, for any v ∈ V, is the best approximation to v by vectors of W.

Taking a cue from the idea of best approximations, we now introduce orthogonal projections as operators on arbitrary inner product spaces.


Definition 8.4.18. Let W be a subspace of an inner product space V such that every vector of V has a best approximation by vectors in W. The orthogonal projection of V on W is the map P_W : V → W such that P_W(v) = vₓ, where vₓ is the best approximation to v by vectors in W.

We reiterate that if W is a finite-dimensional subspace of an inner product space V, then the orthogonal projection does exist as a map on V; in fact, if {v₁, v₂, . . . , vₘ} is an orthonormal basis of W, then for any v ∈ V,

    P_W(v) = ∑ᵢ₌₁ᵐ ⟨v, vᵢ⟩vᵢ.                                         (8.5)

We consider an example to clarify the ideas involved in orthogonal projections. We assume for the moment that P_W is a linear map, a fact we will prove shortly.

EXAMPLE 22 Consider the subspace W of R³, with the standard inner product, spanned by (−1, 0, 1)ᵗ. For any v = (x₁, x₂, x₃)ᵗ in R³, we have

    P_W(v) = (⟨(x₁, x₂, x₃)ᵗ, (−1, 0, 1)ᵗ⟩ / ‖(−1, 0, 1)ᵗ‖²) (−1, 0, 1)ᵗ
           = ((−x₁ + x₃)/2) (−1, 0, 1)ᵗ
           = (1/2)(x₁ − x₃, 0, −x₁ + x₃)ᵗ.

It follows that ker P_W = {(y₁, y₂, y₃)ᵗ ∈ R³ | y₁ = y₃}. On the other hand,

    v − P_W(v) = (x₁, x₂, x₃)ᵗ − (1/2)(x₁ − x₃, 0, −x₁ + x₃)ᵗ = (1/2)(x₁ + x₃, 2x₂, x₁ + x₃)ᵗ.

It is, therefore, clear that W⊥ coincides with ker P_W. To continue with the example, we consider the orthonormal basis of R³ formed by

    v₁ = (1/√2)(−1, 0, 1)ᵗ,   v₂ = (1/√2)(1, 0, 1)ᵗ   and   v₃ = (0, 1, 0)ᵗ.

The choice of this basis will become clear a little later. Let A = [aᵢⱼ] be the matrix of P_W with respect to this basis; it is an easy exercise to compute the entries of A, by the formula given in Proposition (8.4.13), and to show that

    A = [ 1  0  0 ]
        [ 0  0  0 ]
        [ 0  0  0 ].                                                  (8.6)

That almost all the entries are zeros is not unexpected, as two of the basis vectors are in ker P_W. Also note that A² = A, so we can conclude that P_W is a projection map, as P_W² = P_W.

Our next result shows, as anticipated in the preceding example, that the map we have just introduced as an orthogonal projection is indeed a projection in our earlier sense (see 4.2.12 for the relevant properties).


Proposition 8.4.19. Let W be a finite-dimensional subspace of an inner product space V, and let P_W be the orthogonal projection of V on W. Then P_W is a linear operator on V such that P_W² = P_W. Moreover, Im(P_W) = W and ker P_W = W⊥.

Proof. Since W is finite-dimensional, we may assume that there is an orthonormal basis {v₁, v₂, . . . , vₘ} of W. In that case, for any v ∈ V, we have

    P_W(v) = ∑ᵢ₌₁ᵐ ⟨v, vᵢ⟩vᵢ

by Equation (8.5). This formula shows that P_W is a linear map, as an inner product is linear in the first variable; that the range of P_W is contained in W is also clear from the formula. We can also interpret P_W(v) as the best approximation to v by vectors of W. Thus, for w ∈ W, P_W(w) has to be w itself. But for any v ∈ V, P_W(v) ∈ W, so that P_W(P_W(v)) = P_W(v), which proves that P_W is onto W and P_W² = P_W. We have already seen that, for any v ∈ V, the vector P_W(v) is the unique vector in W such that v − P_W(v) ∈ W⊥. It follows that P_W(v) = 0 if and only if v ∈ W⊥. In other words, ker P_W = W⊥. This completes the proof.  □

This result now explains our choice of the orthonormal basis for R³ in Example 22 with reference to the projection P_W. We took v₁ as an orthonormal basis of the subspace W, and chose v₂ and v₃ as an orthonormal basis of ker P_W. Since ker P_W = W⊥ and R³ = W ⊕ W⊥, it follows that v₁, v₂ and v₃ form an orthonormal basis of R³. Now P_W acts as the identity on W, and is the zero map on W⊥. Therefore, the chosen basis reflects the essential character of the projection P_W, as shown by its matrix A relative to that basis.

We end this section by deriving an important inequality which is valid in any inner product space.

Corollary 8.4.20. (Bessel's inequality) Let v₁, v₂, . . . , vₙ be an orthogonal set of non-zero vectors in an inner product space V. Then, for any v ∈ V,

    ∑ᵢ₌₁ⁿ |⟨v, vᵢ⟩|²/‖vᵢ‖² ≤ ‖v‖².

Proof. Let W be the subspace of V spanned by the given vectors v₁, v₂, . . . , vₙ. Observe that, by Proposition (8.4.3), these non-zero vectors are linearly independent and so form an orthogonal basis of W. On the other hand, by Proposition (8.4.19), any vector v ∈ V can be decomposed as v = P_W(v) + u, where P_W is the orthogonal projection of V onto W, so that P_W(v) ∈ W whereas u = v − P_W(v) ∈ W⊥. Since P_W(v) and u are orthogonal, it follows, by Pythagoras' identity, that ‖v‖² = ‖P_W(v)‖² + ‖u‖², which implies the inequality

    ‖P_W(v)‖² ≤ ‖v‖².                                                 (8.7)

Moreover, as v₁, v₂, . . . , vₙ are orthogonal, the expression P_W(v) = ∑ᵢ₌₁ⁿ (⟨v, vᵢ⟩/‖vᵢ‖²) vᵢ (see Definition 8.4.5) readily yields the formula

    ‖P_W(v)‖² = ⟨P_W(v), P_W(v)⟩ = ∑ᵢ₌₁ⁿ |⟨v, vᵢ⟩|²/‖vᵢ‖².

Bessel's inequality then follows from Inequality (8.7).  □


Two points should be noted. Firstly, if v₁, v₂, . . . , vₙ is an orthonormal set, then Bessel's inequality reads

    ∑ᵢ₌₁ⁿ |⟨v, vᵢ⟩|² ≤ ‖v‖².

Secondly, going through the proof of Bessel's inequality, we see that equality holds in Bessel's inequality if and only if v = P_W(v), that is, if and only if v is in the span of the given vectors.

Coming back to the Gram–Schmidt process, it can be used to derive a factorization of an m × n real matrix with linearly independent columns into the product of an m × n matrix having orthonormal columns and an invertible upper triangular matrix of order n. Such a factorization is known as a QR factorization. QR factorizations of real matrices have practical applications. For details, the reader is referred to Section 3.7, where we have discussed the properties of Rⁿ with the usual dot product; the reader will also find examples and exercises related to the Gram–Schmidt process in Rⁿ in that section.
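As a brief illustration of this connection (a sketch of ours, not the book's; the name qr_via_gram_schmidt is hypothetical), normalizing the vectors produced by Gram–Schmidt gives the columns of Q, while the inner products computed along the way fill in the upper triangular factor R.

    import numpy as np

    def qr_via_gram_schmidt(A):
        # QR factorization of a real m x n matrix with linearly independent
        # columns: Q has orthonormal columns, R is upper triangular, A = Q R.
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for j in range(n):
            v = A[:, j].copy()
            for i in range(j):
                R[i, j] = np.dot(Q[:, i], A[:, j])   # <u_j, q_i>
                v -= R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(v)
            Q[:, j] = v / R[j, j]
        return Q, R

    A = np.array([[1.0, 1.0, 2.0],
                  [0.0, 1.0, 6.0],
                  [2.0, 7.0, 4.0],
                  [0.0, 0.0, 1.0]])     # columns are u1, u2, u3 of Example 19
    Q, R = qr_via_gram_schmidt(A)
    print(np.allclose(Q @ R, A))            # True
    print(np.allclose(Q.T @ Q, np.eye(3)))  # True: orthonormal columns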

EXERCISES

1. Determine whether the following statements are true or false, giving brief justifications. Assume that V is a finite-dimensional inner product space.
(a) Every orthogonal set in V is linearly independent.
(b) Every linearly independent set in V is orthogonal.
(c) Every finite set of vectors in V containing the zero vector is orthogonal.
(d) A unit vector in V forms an orthonormal set in V.
(e) The Gram–Schmidt process changes an arbitrary set of vectors in V into an orthonormal set.
(f) If, for two subspaces W₁ and W₂ of V, W₁⊥ = W₂⊥, then W₁ = W₂.
(g) If, for two non-empty subsets S₁ and S₂ of V, S₁⊥ = S₂⊥, then S₁ = S₂.
(h) If a subset {v₁, v₂, . . . , vₖ} of a basis {v₁, v₂, . . . , vₙ} of V spans a subspace W, then {vₖ₊₁, vₖ₊₂, . . . , vₙ} spans W⊥.
(i) If the dimension of a subspace W of V is m, then any orthogonal set of m vectors in W is an orthogonal basis of W.
(j) An orthogonal projection of V onto a subspace W is completely determined by its image of W.
(k) Any projection of V onto a subspace is an orthogonal projection.
(l) If a vector v ∈ V is orthogonal to each basis vector of a subspace W, then P_W(v) is the zero vector, where P_W is the orthogonal projection of V onto W.
(m) For any v ∈ V, the best approximation to v by vectors of a subspace W is precisely v − P_W(v).


(n) For a linear operator T on V, the matrix [aₖⱼ] of T with respect to an orthonormal basis {v₁, v₂, . . . , vₙ} is given by aₖⱼ = ⟨T vₖ, vⱼ⟩.
(o) Every finite-dimensional inner product space has an orthonormal basis.
(p) There is an inner product on the vector space Mₙ(F) with respect to which the n² unit matrices eᵢⱼ form an orthonormal basis.

2. In each part of this exercise, apply the Gram–Schmidt process to the given set S in the indicated inner product space V to obtain an orthonormal basis of the subspace spanned by S; also find the coordinates of the given vector with respect to the orthonormal basis found.
(a) V = R² with the standard inner product, S = {(1, 1)ᵗ, (−1, 2)ᵗ} and v = (2, 3)ᵗ.
(b) V = R³ with the standard inner product, S = {(1, 1, 0)ᵗ, (−1, 2, 1)ᵗ} and v = (1, 4, 1)ᵗ.
(c) V = C³ with the standard inner product, S = {(1, i, 0)ᵗ, (1 + i, −i, −1)ᵗ} and v = (1 − i, 2 − i, −1)ᵗ.
(d) V = ⟨S⟩ with the inner product ⟨f, g⟩ = ∫₀^π f(t)g(t) dt, where S = {sin t, cos t, 1, t}, and h(t) = t + 1.
(e) V = M₂(R) with the inner product ⟨A, B⟩ = Tr(AB*), where S consists of the matrices

    [ 1   1 ]          [ −1   2 ]
    [ −1  1 ]   and    [  0   3 ].