1,599 172 11MB
English Pages 199 Year 1961
INTERSCIENCE TRACTS IN PURE AND APPLIED MATHEMATICS Editors: L. BERS
o R. COURANT
o
J. J. STOKER
Number 9
LECTURES ON LINEAR ALGEBRA By I. M. GeI’fanJ
INTERSCIENCE PUBLISHERS, INC., NEW YORK INTERSCIENCE PUBLISHERS LTD.. LONDON
INTERSCIENCE TRACTS IN PURE AND APPLIED MATHEMATICS Editors: L. BERS

R. COURANT

J. J. STOKER
l. D. Montgomery and L. Zippin
TOPOLOGICAL TRANSFORMATION GROUPS 2. Fritz John PLANE WAVES AND SPHERICAL MEANS Applied to Partial Differential Equations 3. E. Artin GEOMETRIC ALGEBRA
4. R. D. Richtmyer
DIFFERENCE METHODS FOR INITIALVALUE PROBLEMS 5. Serge Lang
INTRODUCTION TO ALGEBRAIC GEOMETRY 6. Herbert Busemann
CONVEX SURFACES 7. Serge Lang
ABELIAN VARIETIES 8. S. M. Ulam A COLLECTION OF MATHEMATICAL PROBLEMS
9. I. M. Gel’fand LECTURES 0N LINEAR ALGEBRA
Additional volumes in pnpamtion
LECTURES ON LINEAR ALGEBRA I. M. GEL’FAND Academy of Sciences, M030022), U.S.S.R. Translated from the Revised Second Russian Edition
by A. SHENITZER Adelphi College, Garden City, New York
INTERSCIENCE PUBLISHERS, INC., NEW YORK INTERSCIENCE PUBLISHERS LTD., LONDON
COPYRIGHT © 1961 BY INTERSCIENCE PUBLISHERS, INC. ALL RIGHTS RESERVED LIBRARY OF CONGRESS CATALOG CARD NUMBER 618630
INTERSCIENCE PUBLISHERS, INC.
250 Fifth Avenue, New York 1, N. Y.
For Great Britain and Northevn Ireland: INIERSCIENCE PUBLISHERS LTD. 88/90 Chancery Lane, London, W. C. 2, England
PRINTED IN THE UNITED STATES OF AMERICA
PREFACE TO THE SECOND EDITION
The second edition differs from the first in two ways. Some of the
material was substantially revised and new material was added. The major additions include two appendices at the end of the book dealing with computational methods in linear algebra and the theory of perturbations, a section on extremal properties of eigenvalues, and a section on polynomial matrices (§§ 17 and 21). As for major revisions, the chapter dealing with the Jordan canonical form of a linear transformation was entirely rewritten and Chapter IV was reworked. Minor changes and additions were also made. The new text was written in collaboration with Z. Ia. Shapiro. I wish to thank A. G. Kurosh for making available his lecture notes on tensor algebra. I am grateful to S. V. Fomin for a number of valuable comments. Finally, my thanks go to M. L. Tzeitlin for assistance in the preparation of the manuscript and for a. number of suggestions.
September 1950
I. GEL’FAND
Translator’s note: Professor Gel’fand asked that the two appendices
be left out of the English translation.
PREFACE TO THE FIRST EDITION
This book is based on a course in linear algebra taught by the author in the department of mechanics and mathematics of the Moscow State University and at the Byelorussian State University. S. V. Fomin participated to a considerable extent in the writing of this book. Without his help this book could not have been written. The author wishes to thank Assistant Professor A. E. Turetski of the
Byelorussian State University, who made available to him notes of the lectures given by the author in 1945, and to D. A. Raikov, who carefully read the manuscript and made a number of valuable comments. The material in fine print is not utilized in the main part of the text and may be omitted in a first perfunctory reading. January 1948
I. GEL’FAND
vii
TABLE OF CONTENTS Page Preface to the second edition. vii
Preface to the first edition. I. nDimensional Spaces. Linear and Bilinear Forms .
§ 1. §2.
nDimensional vector spaces Euclidean space. . . . . . . . . . . .
§ 3.
Orthogonal basis. Isomorphism of Euclidean spaces . . .
14 21
§ 4.
Bilinear and quadratic forms . .
§ 5.
Reduction of a quadratic form to a sum of squares . .
§ 6.
Reduction of a quadratic form by means of a triangular transformation .
§ 7.
The law of inertia . .
34 42
Complex ndimensional space . . 11. Linear Transformations . . § 9.
. .
.
.
. . . .
. .
Linear transformations. Operations on linear transformations .
46 55 6O '70 70
§ 10. Invariant subspaces. Eigenvalues and eigenvectors of a linear
transformation. .
. . .
§ 11. The adjoint of a linear transformation. .
.
. . . .
.
81 90
§ 12. Selfadjoint (Hermitian) transformations. Simultaneous reduc
§ 14. Commutative linear transformations. Normal transformations .
97 103 107
§ 15. Decomposition of a linear transformation into a product of a unitary and selfadjoint transformation § 16. Linear transformations on a real Euclidean space .
114
§ 17. Extremal properties of eigenvalues
126
tion of a pair of quadratic forms to a. sum of squares .
§ 13. Unitary transformations . . . . . . . . . . . . . .
III. The Canonical Form of an Arbitrary Linear Transformation . § 18. The canonical form of a linear transformation . § 19. Reduction to canonical form . . . . §20. Elementary divisors .
.
. .
. .
.....
......
.
132 132 137 142 149
§ 21. Polynomial matrices . . IV. Introduction to Tensors.
.....
111
. .
. .
.
.
§ 22. The dual space . . . . §23.Tensors..............
. .
164 164 171
CHAPTER I
n—Dimensional Spaces. Linear and Bilinear Forms § I . nDimensional vector spaces 1. Definition of a vector space. We frequently come across objects which are added and multiplied by numbers. Thus 1. In geometry objects of this nature are vectors in three dimensional space, i.e., directed segments. Two directed segments are said to define the same vector if and only if it is possible to translate one of them into the other. It is therefore convenient to measure off all such directed segments beginning with one common point which we shall call the origin. As is well known the sum of two vectors x and y is, by definition, the diagonal of the parallelogram with sides x and y. The definition of multiplication by (real) numbers is equally well known. 2. In algebra we come across systems of n numbers x = (51, £2,   , En) (e.g., rows of a matrix, the set of coefficients of a linear form, etc.). Addition and multiplication of ntuples by numbers are usually defined as follows: by the sum of the ntuples
x = (51, £2, '  , 5n) and y = (”1,172,   31;”) we mean the ntuple x + y =(E1 + 171, 52 + 172,   , 5,, + 1;”). By the product of the number A and the ntuple x = (51, £2, .  , 5") we mean the ntuple 2.x = (151,152,  ° , 15"). 3. In analysis we define the operations of addition of functions and multiplication of functions by numbers. In the sequel we shall consider all continuous functions defined on some interval [(1, b]. In the examples just given the operations of addition and multiplication by numbers are applied to entirely dissimilar objects. To investigate all examples of this nature from a unified point of View we introduce the concept of a vector space. DEFINITION 1. A set R of elements x, y, z,    is said to be a
vector space over a field F 1f: [1]
2
LECTURES ON LINEAR ALGEBRA
(a) With every two elements x and y in R there is associated an element 2 in R which is called the sum of the elements x and y. The sum of the elements x and y is denoted by x + y. (b) With every element x in R and every number A belonging to the field F there is associated an element 11X in R. ﬁx is referred to as the product of x by Z. The above operations must satisfy the following requirements (axioms): I. l. X + y = y + x
2. (x + y) + z = x + (y —— z)
(commutativity)
(associativity)
3. R contains an element 0 such that x + 0 = x for all x in R. 0 is referred to as the zero element. 4. For every x in R there exists (in R) an element denoted by — x with the property x + (— x) = 0. II. 1. 1  x = x 2. ot(,3X) = afﬁx). III. 1. (a + ﬁ)x = ax + ﬁx 2.
«(x + y) = «X + ay.
It is not an oversight on our part that we have not specified how elements of R are to be added and multiplied by numbers. Any definitions of these operations are acceptable as long as the axioms listed above are satisfied. Whenever this is the case we are dealing with an instance of a vector space. We leave it to the reader to verify that the examples 1, 2, 3 above are indeed examples of vector spaces. Let us give a few more examples of vector spaces. 4. The set of all polynomials of degree not exceeding some natural number n constitutes a vector space if addition of polynomials and multiplication of polynomials by numbers are defined in the usual manner. We observe that under the usual operations of addition and multiplication by numbers the set of polynomials of degree n does not form a vector space since the sum of two polynomials of degree n may turn out to be a polynomial of degree smaller than n. Thus
=2t. 5. We take as the elements of R matrices of order n. As the sum
nDIMENSIONAL SPACES
3
of the matrices Ham“ and ]b,.k we take the matrix Haik + bikll. As the product of the number A and the matrix Hamil we take the matrix lilaikll. It is easy to see that the above set R is now a vector space.
It is natural to call the elements of a vector space vectors. The fact that this term was used in Example 1 should not confuse the reader. The geometric considerations associated with this word will help us clarify and even predict a number of results. If the numbers 1, pl,    involved in the definition of a vector
space are real, then the space is referred to as a real vector space. If the numbers A, ,u,    are taken from the field of complex numbers, then the space is referred to as a complex vector space. More generally it may be assumed that Z, ,a,   .’ are elements of an arbitrary field K. Then R is called a vector space over the field K. Many
concepts and theorems dealt with in the sequel and, in particular, the contents of this section apply to vector spaces over arbitrary fields. However, in chapter I we shall ordinarily assume that R is a real vector space.
2. The dimensionality of a vector space. We now define the notions of linear dependence and independence of vectors which are of fundamental importance in all that follows. DEFINITION 2. Let R be a vector space. We shall say that the vectors x, y, z,   , v are linearly dependent if there exist numbers
a, [3, y,    0, not all equal to zero such that
(1)
aX+ﬂy+VZ++BV=0.
Vectors which are not linearly dependent are said to be linearly independent. In other words, a set of vectors x, y, z,   , v is said to be linearly independent if
the equality
aX+ﬂy+rZ++GV=0 implies that a=ﬂ=y==0=O. Let the vectors x, y, z,   , v be linearly dependent, i.e., let x, y, z,   , v be connected by a relation of the form (1) with at least one of the coefficients, a, say, unequal to zero. Then
ax=—ﬂy—yz——0v. Dividing by a and putting
4
LECTURES 0N LINEAR ALGEBRA
—(I3/°¢) = 1. —(r/°t) = M. ' ' '. —(9/°c) = C, we have
(2)
X=ly+vZ++Cv.
Whenever a vector x is expressible through vectors y, z,   , v
in the form (2) we say that x is a linear combination of the vectors
y, z, . . .’ v. Thus, if the vectors x, y, z,   , v are linearly dependent then at
least one of them is a linear combination of the others. We leave it to the reader to prove that the converse is also true, i.e., that if one of a set of vectors is a linear combination of the remaining vectors then the vectors of the set are linearly dependent. EXERCISES. 1. Show that if one of the vectors x, y, z,   , v is the zero
vector then these vectors are linearly dependent. 2. Show that if the vectors x, y, z,    are linearly dependent and u, v,   
are arbitrary vectors then the vectors x, y, z,   , u, v,    are linearly dependent.
We now introduce the concept of dimension of a vector space. Any two vectors on a line are proportional,i.e.,linear1y depend— ent. In the plane we can find two linearly independent vectors but any three vectors are linearly dependent. If R is the set of vectors in threedimensional space, then it is possible to find three linearly independent vectors but any four vectors are linearly dependent. As we see the maximal number of linearly independent vectors on a straight line, in the plane, and in threedimensional space coincides with what is called in geometry the dimensionality of the line, plane, and space, respectively. It is therefore natural to make the following general definition. DEFINITION 3. A vector space R is said to be n—dimensional if it contains n linearly independent vectors and if any n + 1 vectors in R are linearly dependent. If R is a vector space which contains an arbitrarily large number of linearly independent vectors, then R is said to be infinite~ dimensional. Infinite—dimensional spaces will not be studied in this book. We shall now compute the dimensionality of each of the vector spaces considered in the Examples 1, 2, 3, 4, 5.
n—DIMENSIONAL SPACES
5
1. As we have already indicated, the space R of Example 1 contains three linearly independent vectors and any four vectors in it are linearly dependent. Consequently R is threedimensional. 2. Let R denote the space whose elements are ntuples of real numbers. This space contains n linearly independent vectors. For instance, the vectors
are easily seen to be linearly independent. On the other hand, any m vectors in R, m > n, are linearly dependent.
Indeed, let
Y1 = (7711; 7712» ° ' ': 7711:), Y2 = (7721: 7722; ' ' ': 7221;), Ym = (77ml! "m2, I ' ', 71mm)
be m vectors and let m > n. The number of linearly independent rows in the matrix 7211: 7712: ""711: 7721’ ’722: " ' 7721» 77ml: "m2! ' ' ' nmn
cannot exceed n (the number of columns). Since m > n, our m rows are linearly dependent. But this implies the linear dependence of the vectors y1, y2,   , ym.
Thus the dimension of R is n. 3. Let R be the space of continuous functions. Let N be any natural number. Then the functions: f1(t) E 1, f2(t) = t,   , fN(t) = tN—l form a set of linearly independent vectors (the proof of this statement is left to the reader). It follows that our space contains an arbitrarily large number of linearly independent functions or, briefly, R is infinitedimensional. 4. Let R be the space of polynomials of degree g n — 1. In this space the n polynomials l, t,   , t"—1 are linearly independent. It can be shown that any m elements of R, m > n, are linearly dependent. Hence R is ndimensional.
6
LECTURES ON LINEAR ALGEBRA
5. We leave it to the reader to prove that the space of n X n matrices Haw,” is nZdimensional. 3. Basis and coordinates in ndirnensional space DEFINITION 4. Any set of n linearly independent vectors e1, e2,   , en of an ndirnensional vector space Ris called a basis of R. Thus, for instance, in the case of the space considered in Example
1 any'three vectors which are not coplanar form a basis. By definition of the term “n—dimensional vector space” such a space contains n linearly independent vectors, i.e., it contains a basis. THEOREM 1. Every vector x belonging to an n—dirnensional vector space R can be uniquely represented as a linear combination of basis vectors.
Proof: Let e1, e2,   , en be a basis in R. Let x be an arbitrary vector in R. The set x, e1, e2,   , en contains n + 1 vectors. It
follows from the definition of an ndimensional vector space that these vectors are linearly dependent, i.e., that there exist n + 1
numbers a0, a1,   , can not all zero such that (3)
ocox + oclel +    + anen = 0.
Obviously «0 7k 0. Otherwise (3) would imply the linear dependence of the vectors e1, e2,   , en. Using (3) we have
This proves that every X e R is indeed a linear combination of the vectors e1, e2,   , en.
To prove uniqueness of the representation of x in terms of the basis vectors we assume that X=§1e1+§292+"'+§nen
and x = 5,191 + E’2e2 + ' ' ' “l' E'nenSubtracting one equation from the other we obtain 0 = ('51— 5,1)91 + (£2 — Elzle2 + ' ' ° +051: “ E’n)en'
nDIMENSIONAL SPACES
7
Since el, e2,   , en are linearly independent, it follows that §1_§’1=£2—£’2=".=£n_£’n=0:
i.e., 51:31;
52:52,
"3
En=£,n'
This proves uniqueness of the representation. DEFINITION 5. If e1, e2,   , e" form a basis in an ndimensional space and X = glel + 5262 + ' . ' + Enen’
(4)
then the numbers £1, £2,   , E" are called the coordinates of the vector x relative to the basis e1, e2,   ~, en.
Theorem 1 states that given a basis e1, e2,   , en of a vector space R every vector x e R has a unique set of coordinates. If the coordinates of x relative to the basis e1, e2,   , en are 51, £2,   , En and the coordinates of y relative to the same basis
are 171,172,    17”, i.e., if X=§1e1+52e2+"'+5nen y = ’71e1 + 77292 + ' ' ' + men, then x + y = (51 + ’hleI + (£2 + Uzlez + ' ' ' + (5n + Vlnlem
i.e., the coordinates of x + y are 51 + 171, 52 + 172,   , 5,, + 1;". Similarly the vector 1x has as coordinates the numbers 151, 152’ . . ., 15w Thus the coordinates of the sum of two vectors are the sums of the appropriate coordinates of the summands, and the coordinates of the product of a vector by a scalar are the products of the coordinates of that vector by the scalar in question. It is clear that the zero vector is the only vector all of whose coordinates are zero. EXAMPLES. 1. In the case of threedimensional space our definition of the coordinates of a vector coincides with the definition of the coordinates of a vector in a (not necessarily Cartesian) coordinate system. 2. Let R be the space of ntuples of numbers. Let us choose as basis the vectors
8
LECTURES ON LINEAR ALGEBRA
and then compute the coordinates 171, 172,   , 17,, of the vector x = (.51, £2,   , 5") relative to the basis e1, e2,   , en. By definition X=77191+772ez+ " ' +7inen§
+ 77,.(0, 0, ' ' ', 1) =(171,171+?72,"'M71+?72+"'+nn)
The numbers (171, 112,   , 17”) must satisfy the relations ’71 = 51: ’71 ‘l‘ 772 = ‘52» 171+772+".+77n='§n'
Consequently, ”1:51:
”2:52—51:
"'»
71n=§n—5n—1
Let us now consider a basis for R in which the connection between the coordinates of a vector x = (£1, £2,   , 5,) and the
numbers 51,52,   , 5,1 which define the vector is particularly simple. Thus, let
Then X=(1,£2,'
E=(1,,0
2:571)
,0)+52(0,1,,0)+
+.§n(0,0,,1)
=£e1+£2eg+'+Enen. It follows that m the space R of n tuples (£1, £2,   , 5“) the numbers
n—DIMENSIONAL SPACES
9
51,52,   , 5,, may be viewed as the coordinates of the vector
x = (£1, £2,   , 5,.) relative to the basis
en=(0,0,,1).
e1=(1,0,~,0), e2=(0,1,,0), EXERCISE. Show that in an arbitrary basis e1 = (“11: “in: ' ' '1 am). ex = (“21: “in: ' ' ' “2»). en = (any “as: ' ' '» am)
the coordinates 171, ’72:   , 17,, of a vector x = (51. £2,   , E") are linear combinations of the numbers 51, 5,,   , E”.
3. Let R be the vector space of polynomials of degree g n — 1. A very simple basis in this space is the basis whose elements are the vectors e1 = 1, e2 = t,  . , en = tnl. It is easy to see that the coordinates of the polynomial P(t) = aot”—1 + alt"—2 +    +a,,_1 in this basis are the coefficients an_1, an_2,   , a0. Let us now select another basis for R:
e’1 = 1,
e’2 = t — a,
e’3 = (t — a)2,
,
e’n = (t — a)"—1.
Expanding P(t) in powers of (t — a) we find that
P(t) = P(d) + P’(d)(t—a) + ' ' ' + [P‘"“’(a)/("—1)1](¢—¢)”‘1~ Thus the coordinates of P(t) in this basis are
P(d), P'(d), ' ' a [P‘"‘1’(a)/(% — 1)ll4. Isomorphism of rLdtmehsional vector spaces. In the examples considered above some of the spaces are identical with others when it comes to the properties we have investigated so far. One instance of this type is supplied by the ordinary threedimensional space R considered in Example 1 and the space R’ whose elements are triples of real numbers. Indeed, once a basis has been selected in
R we can associate with a vector in R its coordinates relative to that basis; i.e., we can associate with a vector in R a vector in R’.
When vectors are added their coordinates are added. When a vector is multiplied by a scalar all of its coordinates are multiplied by that scalar. This implies a parallelism between the geometric properties of R and appropriate properties of R’. We shall now formulate precisely the notion of “sameness” or of “isomorphism” of vector spaces.
10
LECTURES ON LINEAR ALGEBRA
DEFINITION 6. Two vector spaces R and R1, are said to be iso
morphic if it is possible to establish a one—toone correspondence x x’ between the elements x e R and x’ e R’ such that if x4—> x’ and y y’, then 1. the vector which this correspondence associates with x + y is
X’ + y’, 2. the vector which this correspondence associates with 1X is Zx’. There arises the question as to which vector spaces are isomorphic and which are not. Two vector spaces of different dimensions are certainly not isomorphic. Indeed, let us assume that R and R’ are isomorphic. If x, y,   are vectors in R and x’, y’,    are their counterparts in R’ then
— in View of conditions 1 and 2 of the definition of isomorphism — the equation 1x + by —  '  = 0 is equivalent to the equation Ax’ + ,uy’ +    = 0. Hence the counterparts in R’ of linearly independent vectors in R are also linearly independent and conversely. Therefore the maximal number of linearly independent vectors in R is the same as the maximal number of linearly independent vectors in R’. This is the same as saying that the dimensions of R and R’ are the same. It follows that two spaces of different dimensions cannot be isomorphic. THEOREM 2. All vector spaces of dimension n are isomorphic. Proof: Let R and R’ be two ndimensional vector spaces. Let e1, e2,  ~ , en be a basis in R and let e’1,e’2,, e’n be a basis in
R’. We shall associate with the vector (5)
X=§1el+§2e2+"'+£nen
the vector X, = 519,1 + Ezelz + ' ' ' + Enelm
i.e., a linear combination of the vectors e’i with the same coeffi
cients as in (5). This correspondence is onetoone. Indeed, every vector x e R has a unique representation of the form (5). This means that the 5,. are uniquely determined by the vector x. But then x’ is likewise uniquely determined by x. By the same token every x’ eR’ determines one and only one vector x e R.
n—DIMENSIONAL SPACES
11
It should now be obvious that if xx’ and ye) y’, then x + y x’ + y’ and Ax Zx’. This completes the proof of the isomorphism of the spaces R and R’. In § 3 we shall have another opportunity to explore the concept of isomorphism. 5. Subspaces of a vector space DEFINITION 7. A subset R’, of a vector space R is called a subspace of R if it forms a vector space under the operations of addition and scalar multiplication introduced in R. In other words, a set R’ of vectors x, y,    in R is called a subspace of R if x e R’, y e R’ implies x + y e R’, A x e R’.
EXAMPLES. 1. The zero or null element of R forms a subspace of R. 2. The whole space R forms a subspace of R. The null Space and the whole space are usually referred to as improper subspaces. We now give a few examples of non—trivial subspaces. 3. Let R be the ordinary threedimensional space. Consider any plane in R going through the origin. The totality R’ of vectors in that plane form a subspace of R. 4. In the vector space of ntuples of numbers all vectors x = (51, £2,   , E") for which 51 = 0 form a subspace. More generally, all vectors x = (61, £2,   , 4:") such that 4151 + “252 + ' ' ' + “7.51. = 0)
where a1, a2,   , an are arbitrary but fixed numbers, form a subspace. 5. The totality of polynomials of degree g n form a subspace of the vector space of all continuous functions. It is clear that every subspace R’ of a vector space R must con— tain the zero element of R. Since a subspace of a vector Space is a vector space in its own right we can Speak of a basis of a subspace as well as of its dimensionality. It is clear that the dimension of an arbitrary subspace of a vector space does not exceed the dimension of that vector space. EXERCISE. Show that if the dimension of a subspace R’ of a vector space R is the same as the dimension of R, then R’ coincides with R.
12
LECTURES ON LINEAR ALGEBRA
A general method for constructing subspaces of a vector space R is implied by the observation that if e, f, g, are a (finite or infinite) set of vectors belonging to R, then the set R’ of all (finite) linear combinations of the vectors e, f, g,    forms a subspace R’ of R. The subspace R’ is referred to as the subspace generated by the vectors e, f, g,   . This subspace is the smallest subspace of R containing the vectors e, f, g,   . The subspace R’ generated by the linearly independent vectors e1, e2,   , ek is k—dimensional and the vectors e1, e2,   , ek form a
basis of R’ Indeed, R’ contains k linearly independent vectors (i.e., the vectors e1, e2,    eh).
On the other hand, let x1,
x2,   , X; be l vectors in R’ and let l > k. If x1 = 51191 ‘l‘ 51292 + ' ' ' + 517cc)“ x2 = £21e1 + 52292 + ' ' ' + §2kekl Xl = 51191 + 51292 “I” ' ' ' + Elkekl then the l rows in the matrix 511: 512: ' ' , Elk 521’ 522: ' ' 3 r52k
Eu: ‘Em: ' ' ': 5m
must be linearly dependent. But this implies (cf. Example 2, page 5) the linear dependence of the vectors x1, x2,   , x,. Thus the maximal number of linearly independent vectors in R’, i.e., the dimension of R’, is k and the vectors e1, e2,   , e,c form a basis
in R’. EXERCISE. Show that every ndimensional vector space contains subspaces of dimension l, l: 1, 2,   , n.
If we ignore null spaces, then the simplest vector spaces are one— dimensional vector spaces. A basis of such a space is a single vector e1 :ﬁ 0. Thus a onedimensional vector space consists of all vectors ocel, where a is an arbitrary scalar. Consider the set of vectors of the form x = x0 + ocel, where x0
and e1 75 0 are fixed vectors and ac ranges over all scalars. It is natural to call this set of vectors ~— by analogy with threedimensional space — a line in the vector space R.
n~DIMENSIONAL SPACES
13
Similarly, all vectors of the form ace1 + ﬂez, where e1 and e2 are fixed linearly independent vectors and on and B are arbitrary numbers form a twodimensional vector space. The set of vectors of the form x=xo+ oce1+13e2, where x0 is a fixed vector, is called a (two—dimensional) plane. EXERCISES. 1. Show that in the vector space of ntuples (£1, 5,,   , .5”)
of real numbers the set of vectors satisfying the relation a151+a2£2+”'+an£n=0
(a1, a2,  ~ ~, an are ﬁxed numbers not all of which are zero) form a subspace of dimension n — l. 2. Show that if two subspaces R1 and R2 of a vector space R have only the null vector in common then the sum of their dimensions does not exceed the dimension of R. 3. Show that the dimension of the subspace generated by the vectors e, f, g,    is equal to the maximal number of linearly independent vectors among the vectors e, f, g,   .
6. Transformation of coordinates under change of basis.
Let
e1, e2,   , en and e’l, e’z,   , e’,1 be two bases of an ndimensional
vector space. Further, let the connection between them be given by the equations (6)
e 1 = “1191 + “2192 + ' ' ' + “mien: e 2 = “1291 + ‘12292 + ' ' ' + “nzem e n = “Inel + a2ne2 + ' ' ' + annen'
The determinant of the matrix .2! in (6) is different from zero (otherwise the vectors e'l, e’2,   , e’n would be linearly depend— ent). Let ii be the coordinates of a vector x in the first basis and é’i its coordinates in the second basis. Then x = 5191 + 5292 + ' ' ' + 5719" = 5'19'1 + 5'29’2 + ' ’ ' + Elneln' Replacing the e’i with the appropriate expressions from (6) we get
x = Elel + Ezez + ' ' ' "i" Enen = 5,1(“1191 + “2192+ ‘ ' ‘ +“n1en)
+ 53011291 + “22% + ' ' ' +“nzen)
+ ........................... + £In(alne1+a2ne2+ . . ' +annen)'
14
LECTURES 0N LINEAR ALGEBRA
Since the e, are linearly independent, the coefficients of the e, on both sides of the above equation must be the same. Hence
(7)
‘51 = “1151 + “125,2 + ' ' ' + “lug/1n ,", 52 = “215,1 + “2252 + ' ' ‘ + “27.5 51» = “mg/1 + “2225'2 + ' ' ' + “Mign
Thus the coordinates E, of the vector x in the first basis are expressed through its coordinates in the second basis by means of the matrix 41’ which is the transpose of .521. To rephrase our result we solve the system (7) for 5’1, 5’2,  ~ , 5 n. Then 5’1 = bug1 + 51252 + ' ' ' + 51155“: 5'2 = b21§1 + b2252 + ' ' ' + b21557“ .n..................... .....
where the bi,c are the elements of the inverse of the matrix 421’. Thus, the coordinates of a vector are transformed by means of a matrix .93 which is the inverse of the transpose of the matrix .2! in (6) which determines the change of basis. § 2. Euclidean space 1. Definition of Euclidean space. In the preceding section a vector space was defined as a collection of elements (vectors) for which there are defined the operations of addition and multiplication by scalars. By means of these operations it is possible to define in a vector space the concepts of line, plane, dimension, parallelism of lines,
etc. However, many concepts of so—called Euclidean geometry cannot be formulated in terms of addition and multiplication by scalars. Instances of such concepts are: length of a vector, angles between vectors, the inner product of vectors. The simplest way of introducing these concepts is the following. We take as our fundamental concept the concept of an inner product of vectors. We define this concept axiomatically. Using the inner product operation in addition to the operations of addi
nDIMENSIONAL SPACES
15
tion and multiplication by scalars we shall find it possible to devel— op all of Euclidean geometry. DEFINITION 1. If with every pair of vectors x, y in a real vector space R there is associated a real number (x, y) such that
1 (x, Y) = (y. X), 2.
(1x, y) = Mx, y),
(A real) 3 (xi + X2, Y) = (X1: Y] + (x2: Y], 4. (x, x) g 0 and (x, x) = 0 if and only ifx = 0, then we say that an inner product is defined in R. A vector space in which an inner product satisfying conditions 1 through 4 has been defined is referred to as a Euclidean space. EXAMPLES. 1. Let us consider the (threedimensional) space R of vectors studied in elementary solid geometry (cf. Example 1, § 1). Let us define the inner product of two vectors in this space as the product of their lengths by the cosine of the angle between them. We leave it to the reader to verify the fact that the operation just defined satisfies conditions 1 through 4 above. 2. Consider the space R of n—tu‘ples of real numbers. Let x = (51, £2,   , E”) and y = (111, 172,   , n”) be in R. In addition
to the definitions of addition X+y=(51+771:52+’72:"':5n+77n)
and multiplication by scalars AX =
(151’ 1'52) . I '1 157,)
with which we are already familiar from Example 2, § 1, we define the inner product of x and y as (x: Y) = 51771 + r52712 + ' ' ' + $1.17,.
It is again easy to check that properties 1 through 4 are satisfied by (x, y) as defined. 3. Without changing the definitions of addition and multiplication by scalars in Example 2 above we shall define the inner product of two vectors in the space of Example 2 in a different and more general manner. Thus let a,,, be a real n X n matrix. Let us put
16
(1)
LECTURES ON LINEAR ALGEBRA
(X: Y) = “115101 + “1251712 + ' ' ' + “Inglnn + “2152711 + “2252772 ‘l‘ ' ' ' ‘l‘ “27152711. + anlénnl + an2§nn2 + . . . + annénnn'
We can verify directly the fact that this definition satisfies Axioms 2 and 3 for an inner product regardless of the nature of the real matrix Ham“. For Axiom 1 to hold, that is, for (x, y) to be symmetric relative to x and y, it is necessary and sufficient that (2)
“He =alci:
i.e., that ”am” be symmetric. Axiom 4 requires that the expression (3)
(x, x) = 2 “17:51:51: i,k=1
be nonnegative fore very choice of the n numbers £1, £2,   , 5,, and that it vanish only if .51 = 52 =    = 5,, = 0. The homogeneous polynomial or, as it is frequently called, quadratic form in (3) is said to be positive definite if it takes on non—negative values only and if it vanishes only when all the E,are zero. Thus for Axiom 4 to hold the quadratic form (3) must be positive definite. In summary, for (1) to define an inner product the matrix Ham“ must be symmetric and the quadratic form associated with Haikll must be positive definite. If we take as the matrix “an,” the unit matrix, i.e., if we put a“ = 1 and a“, = 0 (i 72 k), then the inner product (x, y) defined by (1) takes the form (X, Y) = 12151472' and the result is the Euclidean space of Example 2. EXERCISE.
Show that the matrix
0 1
1 0
cannot be used to define an inner product (the corresponding quadratic form is not positive definite), and that the matrix
(1 i)
can be used to define an inner product satisfying the axioms 1 through 4.
n—DIMENSIONAL SPACES
17
In the sequel (§ 6) we shall give simple criteria for a quadratic form to be positive definite. 4. Let the elements of a vector space be all the continuous functions on an interval [a, b]. We define the inner product of two such functions as the integral of their product
(f. g) = f f(t)g(t) alt. It is easy to check that the Axioms 1 through 4 are satisfied. 5. Let R be the space of polynomials of degree g n — 1. We define the inner product of two polynomials as in Example 4
(P. Q) = f P(t)Q(t)dt2. Length of a vector. Angle between two vectors. We shall now make use of the concept of an inner product to define the length of a vector and the angle between two vectors. DEFINITION 2. By the length of a vector x in Euclidean space we mean the number
(4)
x/ (X, X)
We shall denote the length of a vector x by the symbol [X]. It is quite natural to require that the definitions of length of a vector, of the angle between two vectors and of the inner product of two vectors imply the usual relation which connects these quantities. In other words, it is natural to require that the inner product of two vectors be equal to the product of the lengths of these vectors times the cosine of the angle between them. This dictates the following definition of the concept of angle between two vectors.
DEFINITION 3. By the angle between two vectors X and y we mean the number
q) = arc cos (x, Y) ,
lxl IYI ’
i.e., we put
(5)
_ (x. Y)
°°S "’ _ l IyI '
18
LECTURES ON LINEAR ALGEBRA
The vectors x and y are said to be orthogonal if (x, y) = 0. The angle between two nonzero orthogonal vectors is clearly 75/2. The concepts just introduced permit us to extend a number of theorems of elementary geometry to Euclidean spaces. 1 The following is an example of such extension. If X and y are orthogonal vectors, then it is natural to regard x + y as the diagonal of a rectangle with sides x and y. We shall show that
[X + 3’12 = IX]2 + IYIz, i.e., that the square of the length of the diagonal of a rectangle is equal to the sum of the squares of the lengths of its two nonparallel sides (the theorem of Pythagoras).
Proof: By definition of length of a vector
IX+y2= (X+y,X+y)In view of the distributivity property of inner products (Axiom 3),
(X+y,X+y) = (X.X) + (x,y) + (y,X) + (say)Since x and y are supposed orthogonal,
(x, y) = (y, X) = 0Thus
lx + VI2 = (x, x) + (y, y) = IXI2 + IYIZ. which is what we set out to prove. This theorem can be easily generalized to read: if x, y, z,   
are pairwise orthogonal, then
Ix+y+z+12= l2+ IYI2+IZIZ+~° 3. The Schwarz inequality. In para. 2. we defined the angle (p between two vectors x and y by means of the relation cosgo =
(x, y) IXI IYI
If (p is to be always computable from this relation we must show that 1 We could have axiomatized the notions of length of a vector and angle between two vectors rather than the notion of inner product. However,
this course would have resulted in a more complicated system of axioms than that associated with the notion of an inner product.
19
_1 0, A2 > 0,   , A" > 0. Then there exists a
basis e1, e2,   , en in which A (X; x) takes the form A(X5 X) = 11512 + A2522 ‘i‘ ' ' ' + 17.5112» where all the hi are positive. Hence A(x; x) g 0 for all X and
A(x;x> = z w: 0 i=1
is equivalent to £1=EZ="'=§"=0_
In other words,
If A1 > 0, A2 > 0,   , An > 0, then the quadratic form A(x; x) is positive definite.
52
LECTURES ON LINEAR ALGEBRA
Conversely, let A (x; x) be a positive definite quadratic form. We shall show that then Ak>0 (k= 1,2,,n).
We first disprove the possibility that
A(f1;f1) Am; f2)  Am; ft) Air.
= Adz; f1)
A(f2; f2)
' ' '
A(flc;f1) A(fk;f2)
A (f2; fk)
A(fk;flc)
If A ,c = 0, then one of the rows in the above determinant would be a linear combination of the remaining rows, i.e., it would be possible to find numbers M1, ,uz,   , ,uk not all zero such that lulA(f1;fi) + #21462; ft) + ' ' ' + MAW ft) = 0» i= 1, 2,   , k.
But then
A(H1f1 + Hzfz + ’ ' ° + :ukfk; ft) = 0
(Z = 1: 2» ' ° " k):
so that Alﬂlfl + (“21.2 + ' ' ' + Iukfk; M1f1 + sz + ' ' ' + Mafia) = 0
In view of the fact that ,alfl + ,uzfz +    + :ukfk 7+_ 0, the latter equality is incompatible with the assumed positive definite nature of our form. The fact that Ah 7E 0 (k = 1,   , n) combined with Theorem 1
permits us to conclude that it is possible to express A (x; x) in the form A(x; X) = 11512 + 12522 + . . . + 1251}, 110 = Ala—1
Ah ' Since for a positive definite quadratic form all ilk > 0, it follows that all Ak > 0 (we recall that A0 = 1). We have thus proved
THEOREM 3. Let A (x; y) be a symmetric bilinear form and f1, f2,   , f" , a basis of the ndirnensional space R. For the quadratic form A (x; x) to be positive definite it is necessary and sufficient that A1>0,A2>0,,An>0.
This theorem is known as the Sylvester criterion for a quadratic form to be positive definite.
rtDIMENSIONAL SPACES
53
It is clear that we could use an arbitrary basis of R to express the conditions for the positive definiteness of the form A (x; x). In particular if we used as another basis the vectors f1, f2,   , f" in changed order, then the new 41,, A2,   , A” would be different principal minors of the matrix
Ha,,. This implies the following interesting COROLLARY. If the principal minors A1, A2,   , A” of a matrix ”an,” of a quadratic form A (x; x) relative to some basis are positive, then all principal minors of that matrix are positive.
Indeed, if A, , A2,   , A" are all positive, then A (x; x) is positive definite.
Now let A be a principal minor of ”am” and let p1, p2,   , pk be the numbers of the rows and columns of ”an,“ in A. If we permute the original basis vectors so that the pith vector occupies the ith position (i = 1,   , k) and express the conditions for positive definiteness of A (x; x) relative to the new
basis, we see that A > 0. 3.
The Gramm determinant. The results of this section are valid
for quadratic forms A(x; x) derivable from inner products, i.e.,
for quadratic forms A(x; x) such that A(x; x) E ‘(x, x). If A (x; y) is a symmetric bilinear form on a vector space R and A (x; x) is positive definite, then A (x; y) can be taken as an inner product in R, i.e., we may put (x, y) E A (x; y). Conversely, if (x, y) is an inner product on R, then A (x; y) E (X, y) is a bilinear symmetric form on R such that A (x; x) is positive definite. Thus every positive definite quadratic form on R may be identified with an inner product on R considered for pairs of equal vectors only,
A (x; x) E (x, x). One consequence of this correspondence is that every theorem concerning positive definite quadratic forms is at the same time a theorem about vectors in Euclidean space. Let e1, ez,   , ek be k vectors in some Euclidean space. The determinant (e1: e1)
(e1: e2)
' ' '
(91» etc)
(62, e1)
(82, e2)
' ' '
(82, etc)
(etc: e1)
(etc: 92)
" '
(etc! etc)
is known as the Gramm determinant of these vectors.
THEOREM 4.
The Gramm determinant of a system of vectors
e1, e2, ‘ ~ , ek is always 2 0. This determinant is zero if and only if the vectors 61, e2,   , e7, are linearly dependent.
54
LECTURES ON LINEAR ALGEBRA
Proof: Assume that e1, e2,   , ek are linearly independent. Consider the bilinear form A (x; y) E (x, y), where (x, y) is the inner product of x and y. Then the Gramm determinant of e1, e2,   , ek coincides with the determinant Ak discussed in this section (cf. (7)). Since A (x; y) is a symmetric bilinear form such
that A (X; x) is positive definite it follows from Theorem 3 that Ah > 0. We shall show that the Gramm determinant of a system of linearly dependent vectors e1, e2,   , ek is zero. Indeed, in that
case one of the vectors, say ek , is a linear combination of the others, ek = 1191 + 1292 + ' ' ' + lk—1ek—1It follows that the last row in the Gramm determinant of the vectors e1, e2,   , ek is a linear combination of the others and the
determinant must vanish. This completes the proof. As an example consider the Gramm determinant of two vectors x and y
A 2 _‘ my)‘ _ (y, X) (y, y)
The assertion that A2 > 0 is synonymous with the Schwarz inequality. EXAMPLES. 1. In Euclidean threespace (or in the plane) the determinant A, has the following geometric sense: A, is the square of the area of the parallelogram with sides x and y. Indeed,
(x, Y) = (y, x) = lxl ' IYI COS (P: where «p is the angle between x and y. Therefore,
An = lz IV!” — IXI” IYI“ cos” «p = 1s IYI’ (1 — W 92) = la IYI”sin”¢. i.e., A2 has indeed the asserted geometric meaning. 2. In three—dimensional Euclidean space the volume of a parallelepiped on the vectors x, y, z is equal to the absolute value of the determinant 131
x2
x3
1) = 3/1
3/2
3/3 
3'1
3:
zs
where x“ y“ z, are the Cartesian coordinates of x, y, 2. Now, 9’12 + x22 + was
”a = 3/1131 + 312372 + yams 21161 + 22%,, + zaxa
w191 + 33292 + $3318
$121 + $22, + $333
ylz + (‘122 + I‘laa
3/121 + 3/23: + gal, =
213/1 + 323/2 + 23%:
212 + 222 + 332
nDIMENSIONAL SPACES
55
(x: x)
(x. Y)
(x. z)
(z, x)
(z, Y)
(Z: 2)
= (y. X)
(y, Y)
(y, 2)
Thus the Gramm determinant of three vectors x, y, z is the square of the volume of the parallelepiped on these vectors.
Similarly, it is possible to Show that the Gramm determinant of k vectors x, y,   , w in a kdimenional space R is the square of the determinant
$1
_= f(X) +f(y),
kind it
2. f(}.x) = hf(x). Using the method of § 4 one can prove that every linear function of the first kind can be written in the form f(X) = “151 + “252 + ' . I + anErn
where 5,. are the coordinates of the vector x relative to the basis
e1, e2,   , e1, and al are constants, a, = f(e,), and that every linear function of the second kind can be written in the form
f(X) = 5151 + 5252 + ' ' ' + baguDEFINITION 1. We shall say that A(x; y) is a bilinear form (function) of the vectors x and y if: 1. for anyfixed y, A (x; y) is a linearfunction of thefirst kind of x, 2. for any fixed x, A (x; y) is a linear function of the second kind of y. In other words,
1. A(x1 + x2; y) = A(x1; y) + A(x2; y),
A (11x; 3’) = 1/1 (X; y).
64
LECTURES ON LINEAR ALGEBRA
2 A (X3 Y1 + YE) = A (X; Y1) + A (X; 3’2); A(x; 1y) = AA(X; y).
One example of a bilinear form is the inner product in a unitary
space
A (X; y) = (x, 3’) considered as a function of the vectors X and y. Another example is the expression A(X; y) = Z “mfiﬁk i,k=1
viewed as a function of the vectors x
=
£161
+
§2e2
+
I
.
I
+
Sine":
y = ’71e1 + 77292 + ‘ ' ' + men
Let e1, e2,   , en be a basis of an ndimensional complex space.
Let A (x; y) be a bilinear form. If x and y have the representations X=§1el+£2e2+ "'+§nen»y=771e1+772e2+""l'nneW
then A(x; Y) = 1405191 + 5292 + ' ' ' Enen; 77191 + 772% + ' ' ' + me") = 2 EiﬁkA (er; 91:)i,k=1
The matrix Haw” with at}: = A (ei; ek) is called the matrix of the bilinear form A (X; y) relative to the basis e1, e2,   , en. If we put y = x in a bilinear form A (x; y) we obtain a function A (X; x) called a quadratic form (in complex space). The connection between bilinear and quadratic forms in complex space is summed up in the following theorem: Every bilinear form is uniquely determined by its quadratic form. 6 6 We recall that in the case of real vector spaces an analogous statement holds only for symmetric bilinear forms (cf. § 4).
nDIMENSIONAL SPACES
65
Proof: Let A (x; x) be a quadratic form and let x and y be two arbitrary vectors. The four identities 7:
(I) A (X+y; X+Y) = A (X; x) + A (y; X) + A (X: Y) + A (y; V) (II) A(X+iy; X+iy)=A(X; X)+iA(y; X)—iA(X; y)+A(y; y), (111) A (x—y; X—y) = A (X; x)—A (y; X) —A (X; y)+A (y; 3'), (IV) A(x——z'y; x—iy)=A(x; x)—z'A (y; x)+iA(x; y)+A(y; y),
enable us to compute A (x; y). Namely, if we multiply the equations (1), (II), (III), (IV) by 1, i, — 1, — ’5, respectively, and add the results it follows easily that
(l) A(X;y) = i{A(X + y;x + y) + iA(X + iy;X+ W) — A(X — y;x — .V) — iA(X —iy;x — M}
Since the right side of (1) involves only the values of the quadratic form associated with the bilinear form under consideration our assertion is proved. If we multiply equations (I), (II), (III), (IV) by 1, — i, — 1, i, respectivly, we obtain similarly,
(2) A(y; X) = i{A(X + y;x + y) — iA(X + iy; x + 1'37) —A(x —— y;x — y) +15A(x — iy;x — iy)}. DEFINITION 2. A bilinearform is called Hermitian if
A (X; y) = A (y; X)This concept is the analog of a symmetric bilinear form in a real Euclidean vector space. For a form to be Hermitian it is necessary and sufficient that its matrix a,,, relative to some basis satisfy the condition “He = dki'
Indeed, if the form A(x; y) is Hermitian, then “m = A (er; ek) = A (eh; er) = dki'
Conversely, if a“, = dki, then A (X; Y) = 2 aiksiﬁk = 2 “kinkgi = A(y; x)NOTE: If the matrix of a bilinear form satisfies the condition
" Note that A(x; 1y) = IA (x; y), so that, in particular, A(x;11y)
= —iA(X; y)
66
LECTURES ON LINEAR ALGEBRA
a“, = a“, then the same must be true for the matrix of this form relative to any other basis. Indeed, a“ = a)“. relative to some basis implies that A (x; y) is a Hermitian bilinear form; but then aik=ziu relative to any other basis. If a bilinear form is Hermitian, then the associated quadratic form is also called Hermitian. The following result holds:
For a bilinear form A(x; y) to be Hermitian it is necessary and sufficient that A(x; x) be real for every vector x. Proof: Let the form A(x; y) be Hermitian; i.e., let A(x;y) = A(y; x). Then A(x; x) = A (x; x), so that the number A(x; x) is real. Conversely, if A (x; x) is real for al x, then, in particular, A (x + y; x + y), A (x + iy; x + iy),A (x — y; x—y),
A (x — iy; x — iy) are all real and it is easy to see from formulas
(1) and (2) that A(x; y) = A(Y;X). COROLLARY. A quadratic form is Hermitian if and only ifit is real valued. The proof is a direct consequence of the fact just proved that for a bilinear form to be Hermitian it is necessary and sufficient that A (x; x) be real for all x. One example of a Hermitian quadratic form is the form
A(X; X) = (x, X), where (x, x) denotes the inner product of x with itself. In fact, axioms 1 through 3 for the inner product in a complex Euclidean space say in effect that (x, y) is a Hermitian bilinear form so that (x, x) is a Hermitian quadratic form. If, as in § 4, we call a quadratic formA (x; X) positive definite when A(x;x)>0
for X750,
then a complex Euclidean space can be defined as a complex vector space with a positive definite Hermitian quadratic form. If .27 is the matrix of a bilinear form A (x; y) relative to the basis e1, e2,   ~, en and g the matrix of A (x; y) relative to the
basis f1, f2,   , f” and if f) = 2 one. (j = 1,   , n), then i=1
ﬂ = (KHz/g.
n—DIMENSIONAL SPACES
67
Here g = ”cull and %* = c*i,.[[ is the conjugate transpose of % i.e., c*,.,. = 5w The proof is the same as the proof of the analogous fact in a real
space. 5. Reduction of a quadratic form to a sum of squares THEOREM 1. Let A(x; x) be a Hermitian quadratic form in a complex vector space R. Then there is a basis e1, e2,   , en of R relative to which the form in question is given by A(X§ X) = 3115151 + 125252 + ' ' ' + 11:51:51“
where all the A’s are real. One can prove the above by imitating the proof in §5 of the analogous theorem in a real space. We choose to give a version of the proof which emphasizes the geometry of the situation. The idea is to select in succession the vectors of the desired basis. We choose e1 so that A (e1; e1) 75 0. This can be done for otherwise A (X; x) = 0 for all x and, in view of formula (1), A (x; y) E 0. Now we select a vector e2 in the (n — 1)—dimensional space R”) consisting of all vectors x for which A (e1; x) = 0 so that A(e2, e2) 72 0, etc. This process is continued until we reach the space R") in which A(X; y) E 0 (RM may consist of the zero vector only). If R”) sé 0, then we choose in it some basis e,+1, e,+2,   , en. These vectors and the vectors e1, e2,   , e, form a basis of R. Our construction implies A(e,.; eh) = 0
for i < k.
On the other hand, the Hermitian nature of the form A(x; y) implies A(e,.; ek) = 0
for i > k.
It follows that if X=§1ei+§2ez+'°'+§nen is an arbitrary vector, then
A(x; x) = 515114031; el) + £2§2A(e2; e2) + ‘ ' ' + EngnA(en5 en)! where the numbers A(e,.; e.) are real in View of the Hermitian
68
LECTURES ON LINEAR ALGEBRA
nature of the quadratic form. If we denote A (e1. ; e1) by hi, then A(x;x) = “15151 + 12525.2 + ' ' ' + 11.51157. = “iléilz + 12ml:
+    + 2.15.12.
6. Reduction of a Hermitian quadratic form to a sum of squares by means of a triangular transformation. Let A (X; x) be a Hermitian quadratic form in a complex vector space and e1, e2,   , en a basis. We assume that the determinants
A 1 =a11!
“11 “12 A 2 = “21 “2?
J
Ar _ b
“11 “12 ' ' ' “1n 21 22 211
a
a
o
a
l
a
l
............
where a“, = A (ei; ek), are all different from zero. Then just as in § 6, we can write down formulas for finding a basis relative to which the quadratic form is represented by a sum of squares. These formulas are identical with (3) and (6) of § 6. Relative to such a basis the quadratic form is given by A
A A A A2 rel 2 + . . . + A""—1 lei 2, A1 I511 2 + _l (x, x) = _0
where A0 = 1. This implies, among others, that the determinants A1, A2,   , A” are real. To see this we recall that if a Hermitian quadratic form is reduced to the canonical form (3), then the coefficients are equal to A (ed; e1.) and are thus real. EXERCISE. Prove directly that if the quadratic form A (x; x) is Hermitian, then the determinants A0, A1,   , A" are real.
Just as in § 6 we find that for a Hermitian quadratic form to be positive definite it is necessary and sufficient that the determinants A1, A2,   , An be positive. The number of negative multipliers of the squares in the canonical form of a Hermitian quadratic form equals the number of changes of sign in the sequence
1, A1, A2,   , An. 7. The law of inertia THEOREM 2. If a Hermitian quadratic form has canonical form
nDIMENSIONAL SPACES
69
relative to two bases, then the number of positive, negative and zero coefficients is the same in both cases. The proof of this theorem is the same as the proof of the corresponding theorem in § 7. The concept of rank of a quadratic form introduced in § 7 for real spaces can be extended without change to complex spaces.
CHAPTER II Linear Transformations § 9. Linear transformations. Operations on linear transformations
1. Fundamental definitions. In the preceding chapter we studied functions which associate numbers with points in an ndimensional vector space. In many cases, however, it is necessary
to consider functions which associate points of a vector space with points of that same vector space. The simplest functions of this type are linear transformations. DEFINITION 1. If with every vector x of a vector space R there is associated a (unique) vector y in R, then the mapping y = A(x) is called a transformation of the space R. This transformation is said to be linear if the following two conditions hold: 1 A(x1 + X2) = A(x1) + A(x2), 2. A(}.x) = 1A(x). Whenever there is no danger of confusion the symbol A(x) is replaced by the symbol Ax. EXAMPLES. 1. Consider a rotation of threedimensional Euclidean space R about an axis through the origin. If x is any vector in R, then Ax stands for the vector into which X is taken by this rotation. It is easy to see that conditions 1 and 2 hold for this mapping. Let us check condition 1, say. The left side of 1 is the result of first adding x1 and x2 and then rotating the sum. The right side of 1 is the result of first rotating X1 and x2 and then adding the results. Clearly, both procedures yield the same vector. 2. Let R’ be a plane in the space R (of Example 1) passing through the origin. We associate with x in R its projection x’ = Ax on the plane R’. It is again easy to see that conditions 1 and 2 hold. 70
LINEAR TRANSFORMATIONS
71
3. Consider the vector space of ntuples of real numbers. Let ”an,“ be a (square) matrix. With the vector x:
(51,52)...) 57;)
we associate the vector y = AX = (771: 772’ ' ' I: 177:):
where 772' = 2 aikék' k=l
This mapping is another instance of a linear transformation. 4. Consider the ndimensional vector space of polynomials of degree g n — 1. If we put AP(t) = P'(t), where P’ (t) is the derivative of P(t), then A is a linear transformation. Indeed
1 [P1(t) + Pz(t)]' = P10) + P20), 2. [AP(t)]’ = lP’(t). 5. Consider the space of continuous functions f(t) defined on the interval [0, 1]. If we put
Af(t) = font) «in then Af(t) is a continuous function and A is linear. Indeed,
I. Am +22) = f: [13(1) +f2(7)] dz = f 13(1) ah + f 13(1) d: = A1: + Afz; 2. AW) = I: Mr) d1: = if: fa) d1 = w. Among linear transformations the following simple transforma— tions play a special role. The identity mapping E defined by the equation Ex = X for all x.
LECTURES ON LINEAR ALGEBRA
72
The null transformation 0 defined by the equation Ox = 0 for all x. 2. Connection between matrices and linear transformations. Let e1, e2,   , en be a basis of an ndimensional vector space R and
let A denote a linear transformation on R. We shall show that Given n arbitrary vectors g1, g2,   , g" there exists a unique linear transformation A such that Ae:l = g,
Ae2 = g2,
,
Aen = g”.
We first prove that the vectors Ael, Aez,   , Aen determine A
uniquely. In fact, if (1)
x = 5191 + 52% + ' ' ' + Enen
is an arbitrary vector in R, then
(2)
AX = A(£1e1 + §2e2 + ' ' ' + Enen) = £1Ae1 + .5e2
+    +5.,,Ae,.,
so that A is indeed uniquely determined by the Aei. It remains to prove the existence of A with the desired properties. To this end we consider the mapping A which associates with x = 5191 + Ege2 +    + Enen the vector Ax = flgl + szgz +    + Eng”. This mapping is well defined, since x has a unique representation relative to the basis e1, 62,   , en. It is easily seen that the mapping A is linear. Now let the coordinates of gk relative to the basis e1, e2,   , e,l be am, a2,“   , an,“ i.e., 7
(3)
glc = Aek = 21 “acet
The numbers am (i, k = 1,2,   , n) form a matrix
.52! = Hatill which we shall call the matrix of the linear transformation A relative to the basis e1, e2,   , en. We have thus shown that relative to a given basis e1, e2,   , e,I every linear transformation A determines a unique matrix Haikll and, conversely, every matrix determines a unique linear transformation given by means of the formulas (3), (1), (2).
LINEAR TRANSFORMATIONS
73
Linear transformations can thus be described by means of matrices and matrices are the analytical tools for the study of linear transformations on vector spaces. EXAMPLES. 1. Let R be the threedimensional Euclidean space and A the linear transformation which projects every vector on the XY—plane. We choose as basis vectors of R unit vectors e1, e2, e3 directed along the coordinate axes. Then Ae1 = e1,
Ae2 = e2,
Ae3 = 0,
i.e., relative to this basis the mapping A is represented by the matrix 1 O 0
0 1 0
0 0 . 0
EXERCISE. Find the matrix of the above transformation relative to the basis e’l, e’z, e’s, where ell = er:
e’, = ea: 9,3 = e1 + e2 + ea
2. Let E be the identity mapping and e1, e2,   , en any basis in R. Then Ae,.=e,.(i=1,2,~,n), i.e., the matrix which represents E relative to any basis is 1
0
0
O
1
0.
0
O
1
It is easy to see that the null transformation is always represented by the matrix all of whose entries are zero. 3. Let R be the space of polynomials of degree g n — 1. Let A be the differentiation transformation, i.e.,
AP(t) = P’(t). We choose the following basis in R: t2
e1=1,
e2=t,
e3=?,
tn—l
,
74
LECTURES ON LINEAR ALGEBRA
Then Ae1=1’=0,
A ..)
752 ’
Ae2=t’=1=e1, tn—l en:
Ae3=(§)=t=e,,
I
(—(n_ 1)!)
tn—B =
———(n—2)! =en_1.
Hence relative to our basis, A is represented by the matrix 0
1
0
  
0
.............
000°0 Let A be a linear transformation, e1, e2,   , e,I a basis in R and
”am“ the matrix which represents A relative to this basis. Let (4)
x=£1e1+§2e2+”'+sneny
(4’)
AX = 17161 + 77262 +  °  + men
We wish to express the coordinates 77¢ of Ax by means of the coordinates .51 of x. Now
AX = A(Elel + 52% + ' ' ’ + Enen) = 51(“1191 + “2192 + ' ' ' + anlen) + 52(“1291 + “2292 + ' ' ' + “7.292) + En(alne1 + a2ne2 + ' ' ' + annen)
= (“1151 + “1252 + ‘ ' ' + “1n§n)ei + (“2151 + “2252 + ' ' ‘ + “27.57332 + (617.151 + “1.252 + ° ' ' + annEn)en'
Hence, in View of (4’), 771 = “1151 + “1252 + ' ' ' + “InEnJ 772 = “1251 + “2252 + ' ' ' + “Znén! 17'" = “11151 + an2§2 + ' ' I + arms”!
or, brieﬂy, (5)
’7: = 2 aikEk k=1
LINEAR TRANSFORMATIONS
75
Thus, if Ham” represents a linear transformation A relative to some basis e1, e2,  ~ , en, then transformation of the basis vectors involves the columns of Ham” [formula (3)] and transformation of the coordinates of an arbitrary vector x involves the rows of Haik [formula (5)]. 3. Addition and multiplication of linear transformations. We shall now define addition and multiplication for linear transforma— tions. DEFINITION 2.
By the product of two linear transformations
A and B we mean the transformation C defined by the equation Cx = A(Bx) for all x. If C is the product of A and B, we write C = AB. The product of linear transformations is itself linear, i.e., it satisfies conditions 1 and 2 of Definition 1. Indeed,
C(Xl + x2) = A[B(x1 + X2)] = A(BX1 + 133(2) = ABx1 + ABx2 = Cx1 + s. The first equality follows from the definition of multiplication of transformations, the second from property 1 for B, the third from property 1 for A and the fourth from the definition of multiplication of transformations. That C(Ax) = ACX is proved just as easily. If E is the identity transformation and A is an arbitrary transformation, then it is easy to verify the relations AE = EA = A. Next we define powers of a transformation A: A2=AA,
A3=A2A,
etc.,
and, by analogy with numbers, we define A0 = E.
Clearly,
A'”+” = A’"  A". EXAMPLE. Let R be the space of polynomials of degree g n — 1. Let D be the differentiation operator, DP(t) = P’(t).
Then D2P(t) = D(DP(t)) = (P’(t))’ = P”(t). Likewise, D3P(t) = P”’(t). Clearly, in this case D" = O. EXERCISE. Select in R of the above example a basis as in Example 3 of para. 3 of this section and find the matrices of D, D2, D3,    relative to this basis.
76
LECTURES ON LINEAR ALGEBRA
We know that given a basis e1, ez,   , en every linear transformation determines a matrix. If the transformation A determines the matrix Ham“ and B the matrix “bile”: what is the matrix ”can determined by the product C of A and B. To answer this question we note that by definition of ”Call (6)
Ce,c = 2 cikei.
Further (7)
ABek = A321 bjkej) = gbmAe) = 2 bmaﬁei. = m'
Comparison of (7) and (6) yields (8)
cik = Z aijbik‘ (I
We see that the element cm of the matrix g is the sum of the products of the elements of the ith row of the matrix .2! and the corresponding elements of the kth column of the matrix g. The matrix (g with entries defined by (8) is called the product of the matrices .2! and Q in this order. Thus, if the (linear) transforma
tion A is represented by the matrix Ham” and the (linear) transformation B by the matrix “m I, then their product is represented by the matrix cik which is the product of the matrices Ham“ and ”bile“
DEFINITION 3. By the sum of two linear transformations A and B we mean the transformation C defined by the equation Cx = Ax + Bx for all x. If C is the sum of A and B we write C = A + B. It is easy to see that C is linear. Let C be the sum of the transformations A and B. If Ham” and ”ball represent A and B respectively (relative 'to some basis e1, e2,  ~ , en) and Howl] represents the sum C of A and B (relative to the same basis), then, on the one .hand,
Aek = 2 awe), i
Bek = 2 bike),
Cek = 2 Gwen
13
i
and, on the other hand,
cek = Aelc + Bela = 2 (“Us +' bik)ei’ I'
LINEAR TRANSFORMATIONS
77
so that cik = “m + bar The matrix Ham + 5m” is called the sum of the matrices Hatkl] and bikl l. Thus the matrix of the sum of two linear transformations is the sum of the matrices associated with the summaha’s. Addition and multiplication of linear transformations have some of the properties usually associated with these operations. Thus
1.A+B=B+m 2. (A+B)+C=A+(B+C); 3. A(BC)=(AB)C; 4 {(A+B)C=AC+BC, ' C(A+B)=CA+CB. We could easily prove these equalities directly but this is unnecessary. We recall that we have established the existence of a oneto—one correspondence between linear transformations and matrices which preserves sums and products. Since properties 1 through 4 are proved for matrices in a course in algebra, the isomorphism between matrices and linear transformations just mentioned allows us to claim the validity of 1 through 4 for linear transformations. We now define the product of a number I. and a linear transformation A. Thus by AA we mean the transformation which associates with every vector x the vector MAX). It is clear that if A is represented by the matrix Ham”, then 2A is represented by the matrix ”ham“. If P(t) = aot’" + alt”‘—1 +    + am is an arbitrary polynomial and A is a transformation, we define the symbol P(A) by the equation
P(A) = aoAm + alAmI + . ~  + amE. EXAMPLE. Consider the space R of functions defined and infinitely differentiable on an interval (a, 6). Let D be the linear mapping defined on R by the equation
Df(t) =f’(t)
LECTURES ON LINEAR ALGEBRA
78
If P(t) is the polynomial P(t) = aot’” + a1 tm—1+    + a,“ then P(D) is the linear mapping which takes f (t) in R into
P(D)f(t) = aof""’(13) + d1f(’"'1’(t) + ' ' ' + (MN)Analogously, with P(t) as above and .2! a matrix we define P(Jzi), a polynomial in a matrix, by means of the equation
13021) = “0.51” + alarm1 +    + does". EXAMPLE.
Let a! be a diagonal matrix, i.e., a matrix of the form 11 0 0 0 0 A. O 0 6.0.0 .......A."
“7—
We wish to find P(d). Since 112 d2:
0
0
11m 0
0 l,“ 0 6....0......A":
’
_._’
.211»:
0
0 M 0 0....0....lnm
’
it follows that
P(M) =
'P(2.1) 0 0 P(Az)
0 0
EXERCISE. Find P(d) for ‘0 1 0 0 O 0 0 1 0 0 ﬂ=’0 0 0 1 0 6.0.0.0 ....... 1. _0
O
O
0
0
It is possible to give reasonable definitions not only for a polynomial in a matrix .52! but also for any function of a matrix .2! such as exp 4%, sin 4%, etc.
As was already mentioned in § 1, Example 5, all matrices of order n with the usual definitions of addition and multiplication by a scalar form a vector space of dimension n2. Hence any n2 + 1 matrices are linearly dependent. Now consider the following set of powers of some matrix M: 6", .51, .212,   , 421”“.
LINEAR TRANSFORMATIONS
79
Since the number of matrices is n2 + 1, they must be linearly
dependent, that is, there exist numbers a0, a1, a2,   , an. (not all zero) such that aoé” + (£1.21 + a2d2 +    + anneal"a = (9.
It follows that for every matrix of order n there exists a polyno— mial P of degree at most n2 such that P(l) = 0. This simple proof of the existence of a polynomial P(t) for which P(., 0, then P’ (t) is a polynomial of degree k — 1. Hence P’(I) = APO) implies l = 0 and P0) = constant, as asserted. It follows
that regardless of the choice of basis the matrix of A is not diagonal.
We shall prove in chapter III that if 1 is a root of multiplicity m of the characteristic polynomial of a transformation then the maximal number of linearly independent eigenvectors correspond— ‘ ing to A is m. In the sequel (§§ 12 and 13) we discuss a few classes of diagonable linear transformations (i.e., linear transformations which in some bases can be represented by diagonal matrices). The problem of the “Simplest” matrix representation of an arbitrary linear trans— formation is discussed in chapter III. 4. Characteristicpolynomial. In para. 2 we definedthe characteristic polynomial of the matrixazl of alinear transformation A as the determinant of the matrix .2! — M” and mentioned the fact that this polynomial is determined by the linear transformation A alone, i.e., it is independent of the choice of basis. In fact, if .2! and
.93 represent A relative to two bases then ﬂ = (6‘1 41% for some %. But [g—lgg — M9 = [(6—1] Ls! — M”! 1%] = 1.2! — 1.61. This proves our contention. Hence we can speak of the characteristic polynomial of a linear transformation (rather than the characteristic polynomial of the matrix of a linear transformation). EXERCISES.
1.
Find the characteristic polynomial of the matrix 1., 1 0
0 lo 1
0 0 lo
0 0 0
0 O O
0
0
0
1
lo
2. Find the characteristic polynomial of the matrix “1
a2
“a
a'n—l
a”
1 O
0 l
0 0
0 0
0 0
0
0
0
1
0
Solution: (—1)"(A" — div1 — asp3 —    — an).
We shall now find an explicit expression for the characteristic polynomial in terms of the entries in some representation 52/ of A.
LECTURES ON LINEAR ALGEBRA
88
We begin by computing a more general polynomial, namely, QM) = let — WI, where ea? and .9 are two arbitrary matrices.
0(1) =
“11 " M11 “21 " M321
“12 — ”’12 “22 " M722
anl _ lbnl
“112 _a2
‘ ' ‘ ' ' ' ' ' .
“1n — AI’m “2n — M27; arm _ Abnn
and can (by the addition theorem on determinants) be written as the sum of determinants. The free term of (2(1) is
(4)
90 =
“11 “21
“12 “22
' ' ' ' ' '
“1n “2n
anl
“712
I I .
arm

The coefficient of (— M" in the expression for Q“) is the sum of determinants obtained by replacing in (4) any k columns of the matrix Ham” by the corresponding columns of the matrix “but”. In the case at hand 9 = (5" and the determinants which add up to the coefficient of (—h") are the principal minors of order n —— k of the matrix Haikll. Thus, the characteristic polynomial P(/1) of the matrix M has the form
PM) = ( WW — 2511"—1 + 117M"—2 — ° ' ' :l: 1151.), where p1 is the sum of the diagonal entries of 42/, p2 the sum of the principal minors of order two, etc. Finally, p” is the determinant of42!. We wish to emphasize the fact that the coefficients p1, p2,   ,
p" are independent of the particular representation .52! of the transformation A. This is another way of saying that the characteristic polynomial is independent of the particular representation M of A. The coefficients p" and p1 are of particular importance. pn is the determinant of the matrix 42! and p1 is the sum of the diagonal elements of J21. The sum of the diagonal elements of sad is called its trace. It is clear that the trace of a matrix is the sum of all the roots of its characteristic polynomial each taken with its proper multiplicity. To compute the eigenvectors of a linear transformation we must know its eigenvalues and this necessitates the solution of a polynomial equation of degree n. In one important case the roots of
LINEAR TRANSFORMATIONS
89
the characteristic polynomial can be read off from the matrix representing the transformation; namely, If the matrix of a transformation A is triangular, i.e., if it has the
form
(5)
“11 0
“12 “22
“1a “23
' ' ' ' ' '
“1n “2n
0
0
0
a,,,,
then the eigenvalues of A are the numbers an, (122,   , am. The proof is obvious since the characteristic polynomial of the
matrix (5) is P“) = (“11 — A)(“22 — A) ' ' ' (arm — Al and its roots are an, azg,   , am. EXERCISE. Find the eigenvectors corresponding to the eigenvalues an, an, aas of the matrix (5). We conclude with a discussion of an interesting property of the characteristic polynomial. As was pointed out in para. 3 of § 9, for every matrix .2! there exists a polynomial P(t) such that P(m’) is the zero matrix. We now show that the characteristic polynomial is just such a polynomial. First we prove the following LEMMA 1.
Let the polynomial P(i.) = aol'" + alhm‘l +    + a,"
and the matrix a! be connected by the relation
(6)
P(A)é’ = (d—Zé’)%(h)
where (6(1) is a polynomial in h with matrix coefficients, i.e.,
?(i.) = «ohm—1 + 9111“" +    + 9%,. Then P(M) = 0. (We note that this lemma is an extension of the theorem of Bezout to polynomials with matrix coefficients.) Proof: We have
(7)
(.21 — 16’)?(A) = are”. + (”gm—s — {emu + (”gm—a _' gin2);: + ' ' ' _ Vol”
Now (6) and (7) yield the equations sigma = a," 6’,
d?,,,_, — ﬁnd = a,,,_1 6’,
(8)
dgm—a —‘ «In—I = awn—n 6:
90
LECTURES ON LINEAR ALGEBRA
If we multiply the first of these equations on the left by J, the second by J,
the third by .212, ~  , the last by .9!” and add the resulting equations, we get 0 on the left, and PM!) = amé’ + am_1.n( +  ~  + aa 1”" on the right. Thus P(.si) = 0 and our lemma is proved 3. THEOREM 3. If PO.) is the characteristic polynomial of .21, then P(.sz/) = 0. Proof: Consider the inverse of the matrix .2! — 16’. We have (.11 — M”) (.2! — he)‘1 = e’. As is well known, the inverse matrix can be written in the form 1
(a! _ M) —1 —_ _ PM) so), where (6(1) is the matrix of the cofactors of the elements of a! — A! and PU.) the determinant of .11 — 16’, i.e., the characteristic polynomial of 4!. Hence (.2! — h€)%(}.) = P(}.)6’. Since the elements of (6(1) are polynomials of degree g n — 1 in A, we
conclude on the basis of our lemma that PM!) = (9. This completes the proof. We note that if the characteristic polynomial of the matrix M has no multiple roots, then there exists no polynomial QU.) of degree less than n such that QM!) = 0 (cf. the exercise below). EXERCISE. Let .2! be a diagonal matrix
A. o o a: 0 1,0' 0
o
A
where all the 1‘ are distinct. Find a polynomial P(t) of lowest degree for which P(.a’) = 0 (cf. para. 3, § 9).
§ 1 I . The adjoint of a linear transformation 1. Connection between transformations and bilinear forms in
Euclidean space. We have considered under separate headings linear transformations and bih'near forms on vector spaces. In 3 In algebra the theorem of Bezout is proved by direct substitution of A
in (6). Here this is not an admissible procedure since A is a number and a! is a, matrix. However, we are doing essentially the same thing. In fact, the kth equation in (8) is obtained by equating the coefficients of 1" in (6).
Subsequent multiplication by 41" and addition of the resulting equations is tantamount to the substitution of M in place of A.
LINEAR TRANSFORMATIONS
9l
the case of Euclidean spaces there exists a close connection between bilinear forms and linear transformations". Let R be a complex Euclidean space and let A (x; y) be a bilinear form on R. Let e1, e2,   , en be an orthonormal basis in R. If
x = é:191 + 5292 + ‘ ' ' + Ewen andy = 77191 + 77292 + ' ' °+77nem then A(X; y) can be written in the form (1)
A(X§ Y) = “1151’71 + “1251772 ‘l‘ ' ' ' + “11.51771; + “2152771 ‘l‘ “2252572 + ' ' ' + “azﬁn + anlénﬁl + “112515772 + ' ' ' + anngnﬁn'
We shall now try to represent the above expression as an inner product. To this end we rewrite it as follows:
+ ........................... + (“17:51 + a2n£2 + I ° ' + ann£n)ﬁn'
Now we introduce the vector z with coordinates C1 = “1151 + “2152 ‘l‘ ' ' ' + “1:15”;
52 = “1251 ‘l‘ “2252 + ' ' ' + “1.257“ Cu = “11:51 + a2n§2 + ' ' ' + annEn'
It is clear that z is obtained by applying to x a linear transformation whose matrix is the transpose of the matrix Ham“ 0f the bilinear form A (x; y). We shall denote this linear transformation ‘ Relative to a given basis both linear transformations and bilinear forms are given by matrices. One could therefore try to associate with a given linear transformation the bilinear form determined by the same matrix as the transformation in question. However, such correspondence would be without significance. In fact, if a linear transformation and a bilinear form are represented relative to some basis by a matrix .21, then, upon change of basis, the linear transformation is represented by ‘K—1 41% (cf. § 9) and the bilinear form is represented by Wat? (cf. § 4). Here ‘6’ is the transpose of ‘6. The careful reader will notice that the correspondence between bilinear forms and linear transformations in Euclidean space considered below associates bilinear forms and linear transformations whose matrices relative to an orthonormal basis are transposes of one another. This correspondence is shown to be independent of the choice of basis.
92
LECTURES ON LINEAR ALGEBRA
by the letter A, i.e., we shall put z = AX. Then A(X; Y) = 41771 + 42772 + ' I ' + Cnﬁn = (Z, Y) = (AX: Y)'
Thus, a bilinearform A (x; y) on Euclidean vector space determines a linear transformation A such that
MK: 3’) E (AX. Y)The converse of this proposition is also true, namely: A linear transformation A on a Euclidean vector space determines a bilinear form A(x; y) defined by the relation
A(X; y) E (Ax, y)The bilinearity of A(x; y) E (Ax, y) is easily proved: 1.
(A (x1 + X2) Y) = (Ax1 + AX2» Y) = (Axl’ Y) + (AX2» Y). (Ahx, y)= (hAx, y) = MAX, y).
2
(X A(Y1 ‘1' Y2” = (X, AY1 + AY2) = (X: AY1) + (X; AY2):
(X AMY): (X MAY) = (1(X:AY)We now show that the bilinear form A(x; y) determines the
transformation A uniquely. Thus, let
A (x; y) = (Ax, Y) and
A(X; y) = (Bx, y)Then (Ax, y) —= (Bx, y), i.e., (Ax — Bx, y) = 0
for all y. But this means that Ax — Bx = 0 for all x. Hence Ax = Bx for all X, which is the same as saying that A = B. This proves the uniqueness assertion. We can now sum up our results in the following THEOREM 1. The equation
(2)
A(X; y) = (Ax, y)
establishes a onetoone correspondence between bilinear forms and linear transformations on a Euclidean vector space.
LINEAR TRANSFORMATIONS
93
The oneoneness of the correspondence established by eq. (2) implies its independence from choice of basis. There is another way of establishing a connection between bilinear forms and linear transformations. Namely, every bilinear form can be represented as A(X;y) = (x, A*y).
This representation is obtained by rewriting formula (1) above in the following manner: A(X; Y) = 51(“11771 + “12772 + ' ' ' ‘l' “1117771) + 52(“21’71 + “22772 + ' ' ' + “zn’lnl —— .......................... +5 n(anl771 + an2ﬁ2 + I I I + annﬁn)
= 51(‘i11’71 + d12’72 + ' ' ' + dlnnn) + 52(521’71 ‘l‘ d22’72 ‘l' ‘ ‘ ' + dZnnn)
+ £n(an1771 + 4.112172 + ' I ' + dnnnn) = (X, A*y)'
Relative to an orthogonalbasis the matrix a*i,c, of A* and the matrix ”am” of A are connected by the relation “*tk = dki'
For a nonorthogonal basis the connection between the two matrices is more complicated.
2. Transition from A to its adjoint (the operation *) DEFINITION 1. Let A be a linear transformation on a complex Euclidean space. The transformation A* defined by
(Ax, y) = (x, A*y) is called the adjoint of A. THEOREM 2. In a Euclidean space there is a one—toone correspondence between linear transformations and their adjoints. Proof: According to Theorem 1 of this section every linear transformation determines a unique bilinear form A (x; y) = (Ax, y). On the other hand, by the result stated in the conclusion of para. 1, every bilinear form can be uniquely represented as (x, A* y). Hence
94
LECTURES ON LINEAR ALGEBRA
(Ax, Y) = A (X; y) = (x, A*Y)~ The connection between the matrices of A and A* relative to an
orthogonal matrix was discussed above. Some of the basic properties of the operation * are
1 2. 3. 4. 5.
(AB)* = B*A*. (A*)* = A. (A + B)* = A* + 13*. (AA)* = ZA*. E* = E.
We give proofs of properties 1 and 2.
1. (ABx, y) = (Bx, A*y) = (x, B*A*y). On the other hand, the deﬁnition of (AB)* implies
(ABx, Y) = (x, (AB)*Y)If we compare the right sides of the last two equations and recall that a linear transformation is uniquely determined by the corresponding bilinear form we conclude that (AB)* = B* A*. 2. By the definition of A*, (Ax, y) = (x, A*y).
Denote A* by C. Then (Ax, y) = (x, Cy), whence (y, Ax) = (Cy, x).
Interchange of X and y gives
(Cx, y) = (X. Ay)But this means that C* = A, i.e., (A*)* = A. EXERCISES. 1. Prove properties 3 through 5 of the operation *. 2. Prove properties 1 through 5 of the operation * by making use of the connection between the matrices of A and A“ relative to an orthogonal basis.
3. Selfadjoint, unitary and normal linear transformations. The operation * is to some extent the analog of the operation of
LINEAR TRANSFORMATIONS
95
conjugation which takes a complex number at into the complex number a. This analogy is not accidental. Indeed, it is clear that for matrices of order one over the field of complex numbers, i.e., for complex numbers, the two operations are the same.
The real numbers are those complex numbers for which 6: = a. The class of linear transformations which are the analogs of the real numbers is of great importance. This class is introduced by DEFINITION 2. A linear transformation is called self—adjoint (Hermitian) if A* = A. We now show that for a linear transformation A to be selfadjoint
it is necessary and sufficient that the bilinear form (Ax, y) be Hermitian. Indeed, to say that the form (Ax, y) is Hermitian is to say that (a)
(AX, Y) = (AY) X)’
Again, to say that A is self—adjoint is to say that
(AX. Y) = (X. AV).
(b)
Clearly, equations (a) and (b) are equivalent. Every complex number C is representable in the form C = ac + if}, cc, ,3 real.
Similarly,
Every linear transformation A can be written as a sum A = A1 + iAzi
(3)
where A1 and A2 are selfadjoint transformations. In fact, let A1 = (A + A*)/2 and A2 = (A — A*)/2i. Then A = A1 + iAz and c.
*
(AZAY =%()=A+A**
A1*
==%(A*+A) =
A—A**
=
1
_
‘4
()A*+A**
A1: **—__
*
1 = — —
21 (A
* —
A)
= A2)
i.e., A1 and A2 are selfadjoint. This brings out the analogy between real numbers and self— adj oint transformations.
96
LECTURES ON LINEAR ALGEBRA
EXERCISES. 1. Prove the uniqueness of the representation (3) of A. 2. Prove that a linear combination with real coefficients of selfadjoint
transformations is again selfadjoint. 3. Prove that if A is an arbitrary linear transformation then AA* and _ A*A are selfadjoint. NOTE: In contradistinction to complex numbers AA* is, in general, different from A* A.
The product of two self—adjoint transformations is, in general, not self—adjoint. However: THEOREM 3. For the product AB of two selfadjoint transformations A and B to be selfadjoin?! it is necessary and sufficient that A and B commute. Proof: We know that A*=A
and
B*=B.
We wish to find a condition which is necessary and sufficient for (4)
(AB)* = AB.
Now,
(AB)* = B*A* = BA. Hence (4) is equivalent to the equation AB = BA. This proves the theorem. EXERCISE. Show that if A and B are selfadjoint, then AB + BA and i(AB — BA) are also selfadjoint.
The analog of compleX'numbers of absolute value one are unitary transformations. DEFINITION 3. A linear transformation U is called unitary if UU* = U*U = E. 5 In other words for a unitary transformations U* = U—l. In § 13 we shall become familiar with a very simple geometric interpretation of unitary transformations. EXERCISES. 1. Show that the product of two unitary transformations is a unitary transformation. 2. Show that if U is unitary and A selfadjoint, then U1AU is again selfadjoint.
5 In ndimensional spaces UU* = E and U*U = E are equivalent statements. This is not the case in infinite dimensional spaces.
LINEAR TRANSFORMATIONS
97
In the sequel (§ 15) we shall prove that every linear transformation can be written as the product of a selfadjoint transformation and a unitary transformation. This result can be regarded as a generalization of the result on the trigonometric form of a complex number.
DEFINITION 4. A linear transformation A is called normal if AA* = A*A. There is no need to introduce an analogous concept in the field of complex numbers since multiplication of complex numbers is commutative. It is easy to see that unitary transformations and selfadjoint transformations are normal. The subsequent sections of this chapter are devoted to a more detailed study of the various classes of linear transformations just introduced. In the course of this study we shall become familiar with very simple geometric characterizations of these classes of transformations. § 12. Selfadjoint (Hermitian) transformations. Simultaneous reduction of a pair of quadratic forms to a sum of squares
1. Self—adjoint transformations. This section is devoted to a more detailed study of self—adjoint transformations on ndimensional Euclidean space. These transformations are frequently encountered in different applications. (Selfadjoint transformations on infinite dimensional space play an important role in quantum mechanics.)
LEMMA 1. The eigenvalues of a selfadjoint transformation are real. Proof: Let X be an eigenvector of a self—adjoint transformation A and let A be the eigenvalue corresponding to x, i.e., Ax = Ax;
x gé 0.
Since A* = A,
(Ax, x) = (x, Ax), that is,
(ix, x) = (x, 11x),
98
LECTURES ON LINEAR ALGEBRA
or,
11(X, x) = 1(X, x). Since (x, x) 7s 0, it follows that A = X, which proves that Z is real. LEMMA 2. Let A be a selfadjoint transformation on an n—dimensional Euclidean vector space R and let e be an eigenvector of A. The totality R1 of vectors x orthogonal to e form an (n — 1)dimensional subspace invariant under A. Proof: The totality R1 of vectors x orthogonal to e form an (n —— 1)dimensional subspace of R. We show that R1 is invariant under A. Let x 6 R1. This means that (x, e) .= 0. We have to show that Ax 6 R1, that is, (Ax, e) = 0.
Indeed,
(Ax, e) = (x, A*e) = (x, Ae) = (x, 2e) = 1(x, e) = 0. THEOREM 1. Let A be a selfadjoint transformation on an ndimensional Euclidean space. Then there exist n pairwise orthogonal eigenvectors of A. The corresponding eigenvalues of A are all real. Proof: According to Theorem 1, § 10, there exists at least one
eigenvector e1 of A. By Lemma 2, the totality of vectors orthogonal to e1 form an (n — 1)dimensional invariant subspace R1. We now consider our transformation A on R1 only. In R1 there exists a vector e2 which is an eigenvector of A (cf. note to Theorem 1, § 10). The totality of vectors of R1 orthogonal to e2 form an (n — 2)dimensional invariant subspace R2. In R2 there exists an eigenvector e3 of A, etc. In this manner we obtain n pairwise orthogonal eigenvectors e1, e2,   , e". By Lemma 1, the corresponding eigenvalues are real. This proves Theorem 1. Since the product of an eigenvector by any nonzero number is again an eigenvector, we can select the vectors ei so that each of them is of length one. THEOREM 2. Let A be a linear transformation on an ndimensional Euclidean space R. For A to be selfadjoint it is necessary and sufficient that there exists an orthogonal basis relative to which the matrix of A is diagonal and real. Necessity: Let A be self—adjoint. Select in R a basis consisting of
LINEAR TRANSFORMATIONS
99
the n pairwise orthogonal eigenvectors e1, e2,   , en of A constructed in the proof of Theorem 1. Since
Ae1 = Alel, A62 = 11262,
Aen
= 2'71, en,
it follows that relative to this basis the matrix of the transformation A is of the form
(1)
11 o 0 22
0 0
0
1,,
o
where the 1,, are real. Sufficiency: Assume now that the matrix of the transformation A has relative to an orthogonal basis the form (1). The matrix of the adjoint transformation A* relative to an orthonormal basis is obtained by replacing all entries in the transpose of the matrix of A by their conjugates (cf. § 11). In our case this operation has no effect on the matrix in question. Hence the transformations A and A* have the same matrix, i.e., A = A*. This concludes the proof
of Theorem 2. We note the following property of the eigenvectors of a selfadjoint transformation: the eigenvectors corresponding to different eigenvalues are orthogonal. Indeed, let Ael 2' 1161,
A62 = 3.282,
11 ¢ 1.2.
Then (A31: e2) = (91: A* ea) = (e1, A92),
that is 11(e1, ea) = 12(31’ ea): 01'
(’11 — '12)(ei: e2) = 0Since 11 7E 12, it follows that (e1: 92) = 0
100
LECTURES ON LINEAR ALGEBRA
NOTE: Theorem 2 suggests the following geometric interpretation of a. selfadjoint transformation: We select in our space n pairwise orthogonal directions (the directions determined by the eigenvectors) and associate with each a real number A) (eigenvalue). Along each one of these directions we perform a stretching by [M and, in addition, if A; happens to be negative, a reﬂection in the plane orthogonal to the corresponding direction.
Along with the notion of a selfadjoint transformation we introduce the notion of a Hermitian matrix. The matrix am[ is said to be Hermitian if an = d”. Clearly, a necessary and sufficient condition for a linear transformation A to be self—adjoint is that its matrix relative to some orthogonal basis be Hermitian. EXERCISE.
Raise the matrix
(v3 V?) to the 28th power. Hint: Bring the matrix to its diagonal form, raise itto the proper power, and then revert to the original basis.
2. Reduction to principal axes. Simultaneous reduction of a pair of quadratic forms to a sum of squares. We now apply the results obtained in para. 1 to quadratic forms. We know that we can associate with each Hermitian bilinear form a selfadjoint transformation. Theorem 2 permits us now to state the important THEOREM 3. Let A (x; y) be a Hermitian bilinear form defined on an ndimensional Euclidean space R. Then there exists an orthonormal basis in R relative to which the corresponding quadratic form can be written as a sum of squares, A (X; X) = Z Ziléila where the ii are real, and the 4} are the coordinates of the vector
x.6 Proof: Let A(x; y) be a Hermitian bilinear form, i.e.,
A(X; Y) = A(y; X), ° We have shown in § 8 that in any vector space a Hermitian quadratic form can be written in an appropriate basis as a sum of squares. In the case of a Euclidean space we can state a stronger result, namely, we can assert the existence of an orthonormal basis relative to which a given Hermitian quadratic form can be reduced to a sum of squares.
LINEAR TRANSFORMATIONS
101
then there exists (cf. § 11) a selfadjoint linear transformation A such that
A (X; y) E (Ax, y)As our orthonormal basis vectors we select the pairwise orthogonal eigenvectors e1, e2,   , en of the self—adjoint transformation A (cf. Theorem 1). Then
Ae1 = Alel,
Ae2 = Azez,
  ,
Ae” = 111171' e
Let X=§1ei+§2ez+"‘+§new
y=7liei+7lzez+"’+”7nen
Since
(e e)_:1 for i=k " "_
0
for
igék,
we get
A(X; Y) E (AX, Y) = (51Ae1 + §2Ae2 + ' ' ' + EnAen’ 77191 + 77292 + ' ° ° + ﬁnen) = (115191 + 125292 + ' ' ' + Angnem 77191 + 77292 + ' ° ' + mien) = A14:1711 + 1252772 + ' ' ' + lﬂgnﬁ’n'
In particular,
A(x; x) = (Ax. x) = I11612+ 22152? +    + mm This proves the theorem.
The process of finding an orthonormal basis in a Euclidean space relative to which a given quadratic form can be represented as a sum of squares is called reduction to principal axes.
THEOREM 4. Let A (x; x) and B(x; x) be two Hermitiau quadratic forms on an ndimeusioual vector space R and assume B(x; x) to be
positive definite. Then there exists a basis in R relative to which each form cart be written as a sum of squares. Proof: We introduce in R an inner product by putting (x, y) E B(x; y), where B(x; y) is the bilinear form corresponding to B(x; x). This can be done since the axioms for an inner product
state that (x, y) is a Hermitian bilinear form corresponding to a positive definite quadratic form (§ 8). With the introduction of an inner product our space R becomes a Euclidean vector space. By
100
LECTURES ON LINEAR ALGEBRA
NOTE: Theorem 2 suggests the following geometric interpretation of a. selfadjoint transformation: We select in our space n pairwise orthogonal directions (the directions determined by the eigenvectors) and associate with each a real number A, (eigenvalue). Along each one of these directions we
perform a stretching by [hi] and, in addition, if A, happens to be negative, a reflection in the plane orthogonal to the corresponding direction.
Along with the notion of a selfadjoint transformation we introduce the notion of a Hermitian matrix. The matrix llamll is said to be Hermitian if an = Li“. Clearly, a necessary and sufficient condition for a linear transformation A to be self—adjoint is that its matrix relative to some orthogonal basis be Hermitian. EXERCISE.
Raise the matrix
ov2 v2 1 to the 28th power. Hint: Bring the matrix to its diagonal form, raise itto the proper power, and then revert to the original basis.
2. Reduction to principal axes. Simultaneous reduction of a pair of quadratic forms to a sum of squares. We now apply the results obtained in para. 1 to quadratic forms. We know that we can associate with each Hermitian bilinear form a selfadjoint transformation. Theorem 2 permits us now to state the important THEOREM 3. Let A (x; y) be a Hermitian bilinear form defined on an ndimensional Euclidean space R. Then there exists an orthonormal basis in R relative to which the corresponding quadratic form can be written as a sum of squares, A (X; X) = Z lil‘silZ: where the it are real, and the E, are the coordinates of the vector
x.‘3 Proof: Let A(x; y) be a Hermitian bilinear form, i.e.,
A(X; y) = A(y; X), 0 We have shown in § 8 that in any vector space a Hermitian quadratic form can be written in an appropriate basis as a sum of squares. In the case of a Euclidean space we can state a stronger result, namely, we can assert the existence of an orthonormal basis relative to which a given Hermitian quadratic form can be reduced to a sum of squares.
LINEAR TRANSFORMATIONS
101
then there exists (cf. § 11) a selfadjoint linear transformation A such that
A (X; y) E (A191!)As our orthonormal basis vectors we select the pairwise orthogonal eigenvectors e1, e2,   , en of the selfadjoint transformation A (cf. Theorem 1). Then
Ae1 = Alel,
Ae2 = 1292,
  ,
Aen = linen.
Let X=§1e1+5292+”'+§uem
y=771e1+77292+"'+’7nen
Since (e
e)—{1
" k _
0
for
i=k
for
igék,
we get
A(X; Y) E (AX, Y) = (glAel + §2Ae2 + ' ' ' + EnAen: 77191 + 77292 + ' ' ' + When) = (Alglel + 12§2e2 + ' ' ' + Augnen’ 771e1 + 77a + ' ' ' + mien) = 1151771 + 1252772 + ' ' ' + AnE’nﬁn'
In particular,
A (X: X) = (AX, X) = lrlErlz + lalézlz + ' ' ' + inlEnlzThis proves the theorem. The process of finding an orthonormal basis in a Euclidean space relative to which a given quadratic form can be represented as a sum of squares is called reduction to principal axes. THEOREM 4. Let A (x; x) and B(x; x) be two Hermitiau quadratic forms on an u—dimeusioual vector space R and assume B(x; x) to be positive definite. Then there exists a basis in R relative to which each form can be written as a sum of squares. Proof: We introduce in R an inner product by putting (x, y) E B(x; y), where B(X; y) is the bilinear form corresponding to B(x; x). This can be done since the axioms for an inner product state that (x, y) is a Hermitian bilinear form corresponding to a positive definite quadratic form (§ 8). With the introduction of an inner product our space R becomes a Euclidean vector space. By
102
LECTURES ON LINEAR ALGEBRA
theorem 3 R contains an orthonormal 7 basis e1, e2,   , e,I
relative to which the form A(x; x) can be written as a sum of
squares,
A(X; X) = 11§12 + Ilzléizl2 + ' ' ' + inlEnla
(2)
Now, with respect to an orthonormal basis an inner product takes the form
(X, X) = [‘EIIZ + lﬁzl2 + ' ' ' + IEnlzSince B(x; x) E (x, x), it follows that
B(X; X) = [Erlz + lézl2 + ' ' ' + [Enlz
(3)
We have thus found a basis relative to which both quadratic forms A(x; x) and B(x; x) are expressible as sums of squares. We now show how to find the numbers 11, 12,   , in which
appear in (2) above. The matrices of the quadratic forms A and B have the following canonical form:
21 0 M = 0 112
0 0 ,
1 0 ﬂ= 0 1
l). . .0....... ,1”
0 0
0. .0 ...... 1.
Consequently,
(4)
Det (a! 433) = (Al—z) 0. Hence V1,. > 0 and H is nonsingular.
We now prove Theorem 1. Let A be a nonsingular linear transformation. Let
H = V (AA*). In view of Lemmas l and 3, H is a nonsingular positive definite transformation. If
(2)
U = H‘lA,
then U is unitary. Indeed.
UU* = H—1A(H'1A)* = H"1AA"‘H—1 = H—IHZH—1 = E. Making use of eq. (2) we get A = HU. This completes the proof of Theorem 1. The operation of extracting the square root of a transformation can be used to prove the following theorem: THEOREM. Let A be a nonsingular positive definite transformation and let B be a self—adjoint transformation. Then the eigenvalues of the transformation AB are real. Proof: We know that the transformations X=AB
and
C—lXC
have the same characteristic polynomials and therefore the same eigenvalues. If we can choose C so that C‘IXC is selfadjoint, then C—1 XC and X = AB will both have real eigenvalues. A suitable choice for C is C = A1}. Then
cIXC = AtABAt = AtBAt, which is easily seen to be selfadjoint. Indeed, (AiBAiyk = (A%)*B*(A*)* = AtBAi. This completes the proof. EXERCISE. Prove that if A and B are positive definite transformations, at least one of which is nonsingular, then the transformation AB has nonnegative eigenvalues.
§ 16. Linear transformations on a real Euclidean space This section will be devoted to a discussion of linear transformations defined on a real space. For the purpose of this discussion
LINEAR TRANSFORMATIONS
1 15
the reader need only be familiar with the material of §§ 9 through 1 l of this chapter. 1. The concepts of invariant subspace, eigenvector, and eigenvalue introduced in § 10 were defined for a vector space over an arbitrary field and are therefore relevant in the case of a real vector space. In § 10 we proved that in a complex vector space every linear transformation has at least one eigenvector (onedimensional invariant subspace). This result which played a fundamental role in the development of the theory of complex vector spaces does not apply in the case of real spaces. Thus, a rotation of the plane about the origin by an angle different from kn is a linear transformation which does not have any onedimensional invariant subspace. However, we can state the following THEOREM 1. Every linear transformation in a real vector space R has a one—dimensional or two—dimensional invariant subspace. Proof: Let e1, e2,   , en be a basis in R and let Ham” be the matrix of A relative to this basis. Consider the system of equations
“1151 ‘i‘ “1252 + ' ' ' + “17.5,. = 151, “214:1 + “2252 + ' ' ' + 6123,, = 152,
(1)
.............................
“11151 + “11252 + ' I '+ annsn = “'51:
The system (1) has a nontrivial solution if and only if “11 — l “12 “22 — A “21 “rd
“n2
‘ ' ' “in ' ‘ ' “2n
= 0
arm _ A
This equation is an nth order polynomial equation in l with real coefficients. Let 10 be one of its roots. There arise two possibilities: a. 10 is a real root. Then we can find numbers 51°, 52°,   , £710 not all zero which are a solution of (1). These numbers are the coordinates of some vector x relative to the basis e1, e2,   , e". We can thus rewrite (1) in the form Ax = 10x,
i.e., the vector x spans a onedimensional invariant subspace.
116
LECTURES ON LINEAR ALGEBRA
b. lo=a+i13,ﬁ;é0. Let £1 +1.771’E2 + “72’ I I I) En +1177!
be a solution of (1). Replacing 51,52,   , 5” in (1) by these numbers and separating the real and imaginary parts we get “1151 + “1252 + ' ' ' + alnEn = 0‘51 " 16771: (2)
“2151 + “2252 + ' ' ‘ + “21.51; = 0‘52 _‘ 1302: ““151 + “71252 + ' I ' + anngn = “En _ [9’71“
and (2),
“11’71 + “12712 + ' ° ' + “m7!" = “’71 + I351: “21711 + “22712 + ' ' ' + ‘12a = 0“72 + 1352: “711771 + “2737}2 + ' ' ' + annnn = “771: + [357:
The numbers £1, £2,   , En
nates of some vector x
(n1, n2,   , 17”)
are the coordi
(y) in R. Thus the relations (2) and (2’)
can be rewritten as follows
(3)
AX = «X — ﬂy;
Ay = «Y + 13X
Equations (3) imply that the two dimensional subspace spanned by the vectors x and y is invariant under A. In the sequel we shall make use of the fact that in a twodimensional invariant subspace associated with the root A = o: + iﬂ the transformation has form (3). EXERCISE. Show that in an odddimensional space (in particular, threedimensional) every transformation has a one—dimensional invariant sub
space.
2. Self—adjoint transformations DEFINITION 1. A linear transformation A defined on a real Euclidean space R is said to be self—adjoint if
(4)
(Ax, y) = (x, Ay)
for any vectors x and y. Let e1, e2,   , en be an orthonormal basis in R and let
x = §1e1 + 5292 + ' ' ' + Enen’
y = ’71e1 + 772% + ' ' ' +7lnen'
Furthermore, let C, be the coordinates of the vector 2 = Ax, i.e.,
LINEAR TRANSFORMATIONS
117
Ct = 2 “ﬂask! Ic=1
where Hatk” is the matrix of A relative to the basis e1, e2,   , en. It follows that (AX: Y) = (Z: Y) = 2 Ci’li = Z aikgkni' i=1
i,k=1
Similarly, ’ﬂ
(5)
(X: AV) = Z aikéink‘ i,lc=1
Thus, condition (4) is equivalent to “tic = aki'
To sum up, for a linear transformation to be selfadjoint it is necessary and sufficient that its matrix relative to an orthonormal basis be symmetric. Relative to an arbitrary basis every symmetric bilinear form A(x; y) is represented by n
(6)
A (X; Y) =';1aik§ink
where a” = am. Comparing (5) and (6) we obtain the following result: Given a symmetric bilinear form A (X; y) there exists aselfadjoint transformation A such that
A (X; y) = (Ax, Y)We shall make use of this result in the proof of Theorem 3 of this section. We shall now show that given a self—adjoint transformation there exists an orthogonal basis relative to which the matrix of the transformation is diagonal. The proof of this statement will be based on the material of para. 1. A different proof which does not depend on the results of para. 1 and is thus independent of the theorem asserting the existence of the root of an algebraic equation
is given in § 17. We first prove two lemmas.
118
LECTURES ON LINEAR ALGEBRA
LEMMA 1. Every selfadjoint transformation has a onedimensional invariant subspace. Proof: According to Theorem 1 of this section, to every real root 1 of the characteristic equation there corresponds a one« dimensional invariant subspace and to every complex root 1., a twodimensional invariant subspace. Thus, to prove Lemma 1 we need only show that all the roots of a selfadjoint transformation are real. Suppose that 2L = a + iﬂ, 5 ;E 0. In the proof of Theorem 1 we constructed two vectors x and y such that
Ax = ocx — ﬂy, Ay = ﬁx + any. But then
(Ax, Y) = «(x Y) — My, Y) (x, Ay) = ﬁ(x, X) + «(X Y)Subtracting the first equation from the second we get [note that
(AX, Y) = (X, AY)] 0 = 2ﬂ[(X. X) + (y, y)]. Since (x, x) + (y, y) ale 0, it follows that it = 0. Contradiction.
LEMMA 2. Let A be a selfadjoint transformation and e1 an eigenvector of A. Then the totality R’ of vectors orthogonal to e1 forms an (n — 1)dimensional invariant subspace. Proof: It is clear that the totality R’ of vectors x, xeR, orthogonal to e1 forms an (n — 1)dimensional subspace. We show that R’ is invariant under A. Thus, let x eR’, i.e., (X, el) = 0. Then (Ax, el) = (x, Ael) = (x, Zel) = 1(x, el) = 0, i.e., Ax e R’.
THEOREM 2. There exists an orthonormal basis relative to which the matrix of a self—adjoint transformation A is diagonal. Proof: By Lemma 1, the transformation A has at least one eigenvector e1. Denote by R’ the subspace consisting of vectors orthogonal to e1. Since R’ is invariant under A, it contains (again, by Lemma 1)
LINEAR TRANSFORMATIONS
119
an eigenvector e2 of A, etc. In this manner we obtain n pairwise orthogonal eigenvectors e1, e2,   , en. Since Aei = lief
(1: 1; 2: ‘ ' 3 ”)1
the matrix of A relative to the ei is of the form A].
O
' ' '
0
o 22
0
o 0
11,.
3. Reduction of a quadratic form to a sum of squares relative to an orthogonal basis (reduction to principal axes). Let A(x; y) be a symmetric bilinear form on an ndimensional Euclidean space. We showed earlier that to each symmetric bilinear form A (x; y) there corresponds a linear selfadjoint transformation A such that A (x; y) = (Ax, y). According to Theorem 2 of this section there exists an orthonormal basis e1, e2,   , en consisting of the
eigenvectors of the transformation A (i.e., of vectors such that Aei = lei). With respect to such a basis
A(X; y) = (Ax, y) =(A(§191 + 5292 ‘l‘ ‘ ' ' ‘l‘ guenlﬂhei + ’7292 ‘l' ' ' ' ‘l‘ men) = (115191 + 125262 + ' ' ' + Ansnen’nlel ‘l' 71292 + ' ' ' +77nen) = A151’71 ‘l‘ A2’52’72 "l' ' ' ' ‘l‘ lnénm
Putting y = x we obtain the following THEOREM 3. Let A (x; x) be a quadratic form on an ndimensional Euclidean space. Then there exists an orthonormal basis relative to which the quadratic form can be represented as A(X; x) = 2 LE}. Here the 1,. are the eigenvalues of the transformation A or, equivalently, the roots of the characteristic equation of the matrix Hatkll' For n = 3 the above theorem is a. theorem of solid analytic geometry. Indeed, in this case the equation A(x;x) =1 is the equation of a central conic of order two. The orthonormal basis
120
LECTURES ON LINEAR ALGEBRA
discussed in Theorem 3 defines in this case the coordinate system relative to which the surface is in canonical form. The basis vectors e1, e2, e,, are
directed along the principal axes of the surface.
4. Simultaneous reduction of a pair of quadratic forms to a sum of squares THEOREM 4. Let A (x; x) and B(x; X) be two quadratic forms on an ndimensional space R, and let B (X; X) be positive definite. Then there exists a basis in R relative to which each form is expressed as a sum of squares. Proof: Let B(x; y) be the bilinear form corresponding to the quadratic form B(x; x). We define in R an inner product by means of the formula
mw=3ww By Theorem 3 of this section there exists an orthonormal basis e1, e2,   , en relative to which the form A (x; x) is expressed as a sum of squares, i.e.,
(7)
A (x: x) = i=1 2 Ate2.
Relative to an orthonormal basis an inner product takes the form
(8)
Z 53(X; X) = B(X; X) = i=1
Thus, relative to the basis e1, e2,   , en each quadratic form can be expressed as a sum of squares. 5. Orthogonal transformations
DEFINITION. A linear transformation A defined on a real n—dimen— sional Euclidean space is said to be orthogonal if it preserves inner products, i.e., if
(9)
(Ax, Ay) = (x, y)
for all x, y e R. Putting x = y in (9) we get
(10)
[Ax]2 = IXIZ,
that is, an orthogonal transformation is length preserving. EXERCISE. Prove that condition (10) is sufficient for a transformation to be orthogonal.
LINEAR TRANSFORMATIONS
121
Since
(x, y)
=m
and since neither the numerator nor the denominator in the expression above is changed under an orthogonal transformation, it follows that an orthogonal transformation preserves the angle between two vectors.
Let e1, e2,   , en be an orthonormal basis. Since an orthogonal transformation A preserves the angles between vectors and the length of vectors, it follows that the vectors Ae1,Ae2,  ' , Aen likewise form an orthonormal basis, i.e.,
1 for i = k (11)
(Aet: Aek) — {0 for 1’ ¢ [3
Now let Haw” be the matrix of A relative to the basis e1, e2,  ~ , en. Since the columns of this matrix are the coordinates of the vectors Aei, conditions (11) can be rewritten as follows:
(12)
"
1
for
i = k
2 “Ma” = {0 for igék.
a=1
EXERCISE. Show that conditions (11) and, consequently, conditions (12) are sufficient for a transformation to be orthogonal.
Conditions (12) can be written in matrix form.
Indeed,
it
2 adiaak are the elements of the product of the transpose of the (i=1
matrix of A by the matrix of A. Conditions (12) imply that this product is the unit matrix. Since the determinant of the product of two matrices is equal to the product of the determinants, it follows that the square of the determinant of a matrix of an orthogonal transformation is equal to one, i.e., the determinant of a matrix of an orthogonal transformation is equal to j: 1. An orthogonal transformation whose determinant is equal to + 1 is called a proper orthogonal transformation, whereas an orthogonal transformation whose determinant is equal to — 1 is called improper. EXERCISE. Show that the product of two proper or two improper orthogonal transformations is a proper orthogonal transformation and the product of a proper by an improper orthogonal transformation is an improper orthogonal transformation.
122
LECTURES ON LINEAR ALGEBRA
NOTE: What motivates the division of orthogonal transformations into proper and improper transformations is the fact that any orthogonal transformation which can be obtained by continuous deformation from the identity transformation is necessarily proper. Indeed, let At be an orthogonal transformation which depends continuously on the parameter if (this means that the elements of the matrix of the transformation relative to some basis are continuous functions of t) and let A0 = E. Then the determinant of this transformation is also a continuous function of 23. Since a continuous function which assumes the values j: 1 only is a constant and since for t = 0 the determinant of A0 is equal to 1, it follows that for £72 0 the determinant of the transformation is equal to 1. Making use of Theorem 5 of this section one can also prove the converse, namely, that every proper orthogonal transformation can be obtained by continuous deformation of the identity transformation.
We now turn to a discussion of orthogonal transformations in onedimensional and twodimensional vector spaces. In the sequel we shall show that the study of orthogonal transformations in a space of arbitrary dimension can be reduced to the study of these two simpler cases. Let e be a vector generating a onedimensional space and A an orthogonal transformation defined on that space. Then Ae = 1e and since (Ae, Ae) = (e, e), we have 12(e, e) = (e, e),i.e., )1 =;;l.
Thus we see that in a onedimensional vector space there exist two orthogonal transformations only: the transformation Ax E x and the transformation Ax E — x. The first is a proper and the second an improper transformation. Now, consider an orthogonal transformation A on a twodimensional vector space R. Let e1, e2 be an orthonormal basis in R and let
03>
[Z ’2]
be the matrix of A relative to that basis. We first study the case when A is a proper orthogonal transformation, i.e., we assume that «6 — ﬂy = 1. The orthogonality condition implies that the product of the matrix (13) by its transpose is equal to the unit matrix, i.e., that
l: H :1
LINEAR TRANSFORMATIONS
123
Since the determinant of the matrix (13) is equal to one, we have
t: 2H: t]It follows from (14) and (15) that in this case the matrix of the transformation is
a
_
ﬂ
0! ’
where a2 + [32 = 1. Putting at = cos (p, [3 = sin (7) we find that the matrix of a proper orthogonal transformation on a two dimensional space relative to an orthogonal basis is of the form cos go [sin ()0
— sin go cos go]
(a rotation of the plane by an angle (p). Assume now that A is an improper orthogonal transformation, that is, that 0:6 — ﬂy = — 1. In this case the characteristic equation of the matrix (13) is 12 — (ac + 6)). — 1 = O and, thus, has real roots. This means that the transformation A has an eigenvector e, Ae = he. Since A is orthogonal it follows that Ae = ie. Furthermore, an orthogonal transformation preserves
the angles between vectors and their length. Therefore any vector e1 orthogonal to e is transformed by A into a vector orthogonal to Ae = ie, i.e., Ae1 = iel. Hence the matrix of A relative to the
basis e, e1 has the form i1 0
0 i1 '
Since the determinant of an improper transformation is equal to — 1, the canonical form of the matrix of an improper orthogonal transformation in two—dimensional space is
+ 1
0
— 1
0
[0—1]°r[0+1] (a reflection in one of the axes).
We now find the simplest form of the matrix of an orthogonal transformation defined on a space of arbitrary dimension.
124
LECTURES ON LINEAR ALGEBRA
THEOREM 5. Let A be an orthogonal transformation defined on an ndimensional Euclidean space R. Then there exists an orthonormal basis e1 , e2 ,   , en of R relative to which the matrix of the transforma
tion is 1
—1
cos (pl — sin (p1 sin (p1 cos (p1
cos (pk — sin (pk sin (pk cos (pk_ where the unspecified entries have value zero. Proof: According to Theorem 1 of this section R contains a oneor twodimensional invariant subspace R”). If there exists a onedimensional invariant subspace Rm we denote by el a vector of length one in that space. Otherwise Rm is two dimensional and we choose in it an orthonormal basis e1, e2. Consider A on R“). In the case when R”) is onedimensional, A takes the form AX
= ix. If Rm is two dimensional A is a proper orthogonal trans— formation (otherwise Ru) would contain a one—dimensional invariant subspace) and the matrix of A in R”) is of the form cos (p [sin 1,?)
— sin
m1+M2+~+m«  We now compute Dn_1 (/1). Since Dn_1(h) is a factor of Dn().), it must be a product of the factors A — 11, l — 12,   . The problem now is to compute the degrees of these factors. Specifically, we compute the degree of h — 11 in Dn_1(h). We observe that any non—zero minor of .92 = .2! — 16’ is of the form 1 2 k An_1 ___ Ai1)Ai,)° . 'Aik)’
where t] + t2 +    + tk = n — 1 and A}? denotes the tith order minors of the matrix 331.. Since the sum of the orders of the minors 7 Of course, a nonzero kth order minor of Q may have the form A (1’, i.e., it may be entirely made up of elements of $1. In this case we shall write it formally as A,c = Ako‘z’, where A0”) = 1.
CANONICAL FORM or LINEAR TRANSFORMATION
147
Ali), Alf), ' ' , Ali? is n — 1, exactly one of these minors is of order one lower than the order of the corresponding matrix 91., i.e., it is obtained by crossing out a row and a column in a box of the matrix Q. As we saw (cf. page 145) crossing out an appropriate row and column in a box may yield a minor equal to one. Therefore it is possible to select An_1 so that some Al? is one and the remaining minors are equal to the determinants of the appropriate boxes. It follows that in order to obtain a minor of lowest possible degree in ,1 — 11 it suffices to cross out a suitable row and column in the box of maximal order corresponding to 21. This is the box of order n1. Thus the greatest common divisor Dn_1(2.) of minors of order n — 1 contains 1 — Al raised to the power n2 + n3 + .   + np. Likewise, to obtain a minor An_2 of order n — 2 with lowest possible power of A — 1.1 it suffices to cross out an appropriate row and column in the boxes of order n1 and n2 corresponding to 1.1. Thus Dn_2(l) contains 1 — 11 to the power n3 —— n4 +    + n”, etc. The polynomials Dn_,(1), Dn_p_1(l),   , D10.) do not contain h — 11 at all. Similar arguments apply in the determination of the degrees of h — 12, )1 — 13,” in k). We have thus proved the following result. If the jordan canonical form of the matrix of a linear transforma— tion A contains? boxes of order n1, n2,   , n1,(n1 ; n2 2    an?) corresponding to the eigenvalue 11,? q boxes of order m1, m2,   , ma (rn1 ; rn2 g   ' 2 ma) corresponding to the eigenvalue 12, etc, then _ ’12)ml+m2+ma+...+ma _ . . Dug) = (A __ ll)n1+n2+’na+..+ny (h.
Az)m”+m3+”'+"‘a _  . Dn_1(2.) = (l _ 11)n2+na+...+np (A _
Dir—2w = (a — 1W)
a _ AW)'"+"'« . . .
Beginning with Dn_p(h) the factor (A — 11) Beginning with Dn_q(h) the factor (A — 22)
is replaced by one. is replaced by one,
etc.
In the important special case when there is exactly one box of order n1 corresponding to the eigenvalue 11, exactly one box of order rn1 corresponding to the eigenvalue 1.2, exactly one box of order 131 corresponding to the eigenvalue 13, etc., the Di(l) have the following form:
148
LECTURES ON LINEAR ALGEBRA
DM) = (l — M’Wl — 12W“ — Ila)"1 ' ' '
The expressions for the Dkm show that in place of the Dk(l) it is more convenient to consider their ratios
Dk 2
EN) = D._((i) ' The Ek(}.) are called elementary divisors. Thus if the jordan canonical form of a matrix .2! contains p boxes of order n1, n2,   , n1, (n1 2 n2 2    g n,) corresponding to the eigenvalue 11, q boxes of order m1, m2,   , ma (m1 2 m2 2    2 ma) corresponding to the eigenvalue 22, etc., then the elementary divisors Ek(}.) are
EN) = (1 — Ill)"l ([1 — 12)”  ' ', Ell—1(1) = (I1 — 111W (,1 — I12)”2 ' ' a Pin—2(1) = (A — "11)”3 (A _ A2)".a ' ' U Prescribing the elementary divisors En(l), En_1(l),   , deter
mines the Jordan canonical form of the matrix a! uniquely. The eigenvalues it are the roots of the equation En(}.). The orders n1, n2,   , np of the boxes corresponding to the eigenvalue 1.1 coincide with the powers of (A — 11) in En(}.), En_1(/‘L),  We can now state necessary and sufficient conditions for the existence of a basis in which the matrix of a linear transformation is diagonal. A necessary and sufficient condition for the existence of a basis in which the matrix of a transformation is diagonal is that the elementary divisors have simple roots only. Indeed, we saw that the multiplicities of the roots 21,12,   ,
of the elementary divisors determine the order of the boxes in the Jordan canonical form. Thus the simplicity of the roots of the elementary divisors signifies that all the boxes are of order one, i.e., that the Jordan canonical form of the matrix is diagonal. THEOREM 2. For two matrices to be similar it is necessary and sufficient that they have the same elementary divisors.
CANONICAL FORM OF LINEAR TRANSFORMATION
149
Proof: We showed (Lemma 2) that similar matrices have the same polynomials DEM) and therefore the same elementary divisors Ek(}.) (since the latter are quotients of the Dk(l)). Conversely, let two matrices .521 and Q have the same elementary divisors. .2! and .93 are similar to Jordan canonical matrices. Since the elementary divisors of .52! and g are the same, their Jordan canonical forms must also be the same. This means that .52! and .9 are similar to the same matrix. But this means that .2! and .93 are similar matrices. THEOREM 3. The jordan canonicalform of a linear transformation is uniquely determined by the linear transformation.
Proof: The matrices of A relative to different bases are similar. Since similar matrices have the same elementary divisors and these determine uniquely the Jordan canonical form of a matrix,
our theorem follows. We are now in a position to find the Jordan canonical form of a matrix of a linear transformation. For this it suffices to find the elementary divisors of the matrix of the transformation relative to some basis. When these are represented as products of the form (I —— 2.1)"(11 — 112)“    we have the eigenvalues as well as the order of the boxes corresponding to each eigenvalue. § 21. Polynomial matrices 1. By a. polynomial matrix we mean a matrix Whose entries are polynomials in some letter A. By the degree of a polynomial matrix we mean the maximal degree of its entries. It is clear that a polynomial matrix of degree n can be written in the form
Aoin + AIM1 +    + A0, where the Ak are constant matrices. 3 The matrices A — 1E which we considered on a number of occasions are of this type. The results to be derived in this section contain as special cases many of the results obtained in the preceding sections for matrices of the form A — 1E. 3 In this section matrices are denoted by printed Latin capitals.
LECTURES 0N LINEAR ALGEBRA
150
Polynomial matrices occur in many areas of mathematics. Thus, for example, in solving a system of first order homogeneous linear differential equations with constant coefficients
(1)
d ‘n = 2 amyk $ x
. (’L = 1, 2,   an)
k=1
we seek solutions of the form (2)
1/1: = one“:
(2)
where i. and ck are constants. To determine these constants we substitute the functions in (2) in the equations (1) and divide by e“. We are thus led to the following system of linear equations: ’ﬂ
Ac; = Z amok.
k=1 The matrix of this system of equations is A — 1E, with A the matrix of coefficients in the system (1). Thus the study of the system of differential equations (1) is closely linked to polynomial matrices of degree one, namely, those of the form A — 1E.
Similarly, the study of higher order systems of differential equations leads to polynomial matrices of degree higher than one. Thus the study of the system
n
dzyk
n
dyk
n
+ 2 bik—+ Z Emil/1c = 0
k=1
9‘
k=1
is synonymous with the study of the polynomial matrix Al.” + B}. + C,
where A = “an“, B = b.,,, C = c,k.
We now consider the problem of the canonical form of polynomial matrices with respect to socalled elementary transformations. The term “elementary” applies to the following classes of transformations. 1. Permutation of two rows or columns. 2. Addition to some row of another row multiplied by some polynomial E.,(A)    an).
Since E20.) is divisible by E1 (1), E30.) is divisible by E2(l), etc., it follows that the greatest common divisor D10) of all minors of order one is £10.). Since all the polynomials E,c(}.) are divisible by E10») and all polynomials other than E10.) are divisible by E2(}.), the product E,(}.)E,(l)(z' < j) is always divisible by the minor E1(A)E2(Z). Hence D20.) = E1().)E2(}.). Since all Ek(}.) other than E10.) and E20.) are divisible by E3(}.), the product Ei(l)E,(Z)Ek(ﬁ.) (i