Essential Student Algebra
VOLUME FOUR
Linear Algebra
T.S. BLYTH & E.F. ROBERTSON
English, 132 pages, 1986
Preface

If, as is often said, mathematics is the queen of science then algebra is surely the jewel in her crown. In the course of its vast development over the last half-century, algebra has emerged as the subject in which one can observe pure mathematical reasoning at its best. Its elegance is matched only by the ever-increasing number of its applications to an extraordinarily wide range of topics in areas other than 'pure' mathematics. Here our objective is to present, in the form of a series of five concise volumes, the fundamentals of the subject. Broadly speaking, we have covered in all the now traditional syllabus that is found in first and second year university courses, as well as some third year material. Further study would be at the level of 'honours options'. The reasoning that lies behind this modular presentation is simple, namely to allow the student (be he a mathematician or not) to read the subject in a way that is more appropriate to the length, content, and extent of the various courses he has to take. Although we have taken great pains to include a wide selection of illustrative examples, we have not included any exercises. For a suitable companion collection of worked examples, we would refer the reader to our series Algebra through Practice (Cambridge University Press), the first five books of which are appropriate to the material covered here.

T.S.B., E.F.R.
CHAPTER ONE
The minimum polynomial
In Volume Two we introduced the notions of eigenvalue and eigenvector of a linear mapping or matrix. There we concentrated our attention on showing the importance of these notions in solving particular problems. Here we begin by taking a closer algebraic look.

Definition  Let F be a field. By an algebra over F we shall mean a vector space V over F on which there is defined a multiplication in such a way that (V, +, ·) is a ring with identity and

    (∀x, y ∈ V)(∀λ ∈ F)    (λx)y = λ(xy) = x(λy).

Example  With respect to multiplication of matrices, the vector space Mat_{n×n}(F) becomes an algebra.

Example  With respect to composition of mappings, the vector space Lin(V, V) becomes an algebra. If V is a vector space of dimension n over F then we have an algebra isomorphism

    Lin(V, V) ≃ Mat_{n×n}(F)

(i.e. a bijection that is both a ring and a vector space isomorphism) that is obtained by associating with each linear mapping f : V → V its matrix relative to some fixed ordered basis. This is well known; see, for example, Theorems 7.2 and 7.3 of Volume Two. In practice, we work in both of these algebras, choosing whichever suits our purposes at the time. Observe, for example,
that since Mat_{n×n}(F) is of dimension n^2 over F, for every n×n matrix A over F the n^2 + 1 powers

    I_n, A, A^2, ..., A^{n^2}

are linearly dependent and so there is a non-zero polynomial

    p = a_0 + a_1X + a_2X^2 + ... + a_{n^2}X^{n^2} ∈ F[X]

such that p(A) = 0. The same of course is true for any f ∈ Lin(V, V). But we can do better than this: there is, in fact, a polynomial p of degree at most n such that p(A) = 0. This is the celebrated Cayley-Hamilton Theorem which we shall now establish. Since we shall be working in Mat_{n×n}(F), the proof we shall give will be 'elementary'.
There are other, more elegant, proofs which use Lin(V, V).

Definition  If A ∈ Mat_{n×n}(F) then the characteristic polynomial of A is χ_A = det(XI_n - A). Note that χ_A is of degree n in the indeterminate X.
1.1 Theorem  [Cayley-Hamilton]  χ_A(A) = 0.

Proof  Let B = XI_n - A and

    χ_A = det B = b_0 + b_1X + ... + b_nX^n.

Consider the matrix adj B. By definition, this is an n×n matrix whose entries are polynomials in X of degree at most n - 1 and so we have

    adj B = B_0 + B_1X + ... + B_{n-1}X^{n-1}

for some n×n matrices B_0, ..., B_{n-1}. Recalling that B adj B = (det B)I_n (see, for example, Theorem 8.11 of Volume Two), we have

    (det B)I_n = B adj B = (XI_n - A) adj B = X adj B - A adj B,

i.e. we have the polynomial identity

    b_0I_n + b_1I_nX + ... + b_nI_nX^n = B_0X + ... + B_{n-1}X^n - AB_0 - ... - AB_{n-1}X^{n-1}.
Equating coefficients of like powers of X, we obtain

    b_0I_n = -AB_0
    b_1I_n = B_0 - AB_1
    ...
    b_{n-1}I_n = B_{n-2} - AB_{n-1}
    b_nI_n = B_{n-1}.

Multiplying the first equation on the left by I_n, the second by A, the third by A^2, and so on, we obtain

    b_0I_n = -AB_0
    b_1A = AB_0 - A^2B_1
    ...
    b_nA^n = A^nB_{n-1}.

Adding these equations together, we obtain χ_A(A) = 0. ♦

The Cayley-Hamilton Theorem is really quite remarkable, it being far from obvious that an n×n matrix over a field F should satisfy a polynomial equation of degree n. This result leads us to consider the following notion.

Definition  If A ∈ Mat_{n×n}(F) then the minimum polynomial of A is the monic polynomial m_A of least degree such that m_A(A) = 0.
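The Cayley-Hamilton Theorem can be checked numerically on a small example. The following sketch (not from the text) uses the 2×2 case, where χ_A = X^2 - (tr A)X + (det A); the matrix chosen is arbitrary.

```python
# Verifying the Cayley-Hamilton theorem for a 2x2 integer matrix, where
# chi_A = X^2 - (trace A)X + (det A). A sketch, not from the text.

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cayley_hamilton_2x2(A):
    """Return chi_A(A) = A^2 - (tr A)A + (det A)I for a 2x2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A2 = mat_mul(A, A)
    return [[A2[i][j] - tr * A[i][j] + det * (1 if i == j else 0)
             for j in range(2)] for i in range(2)]

A = [[3, 1], [2, 5]]
print(cayley_hamilton_2x2(A))  # [[0, 0], [0, 0]], as the theorem predicts
```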
1.2 Theorem  If p is a polynomial such that p(A) = 0 then the minimum polynomial m_A divides p.

Proof  By euclidean division there are polynomials q, r such that p = m_Aq + r with r = 0 or deg r < deg m_A. Now by hypothesis p(A) = 0, and by definition m_A(A) = 0. Consequently we have r(A) = 0. By the definition of m_A we cannot then have deg r < deg m_A, and so we must have r = 0. It follows that p = m_Aq and so m_A divides p. ♦

1.3 Corollary  m_A divides χ_A. ♦

It is immediate from 1.3 that every zero of m_A is a zero of χ_A. The converse is also true:
1.4 Theorem  m_A and χ_A have the same zeros.

Proof  Observe that if λ is a zero of χ_A then det(λI_n - A) = 0 and so λI_n - A is not invertible. There is therefore a dependence relation between the columns of λI_n - A and so there is a non-zero x ∈ Mat_{n×1}(F) such that Ax = λx. Given any polynomial

    h = a_0 + a_1X + ... + a_kX^k

we then have

    h(A)x = a_0x + a_1Ax + ... + a_kA^kx
          = a_0x + a_1λx + ... + a_kλ^kx
          = h(λ)x

so that h(λ)I_n - h(A) is not invertible. Put another way, we have det[h(λ)I_n - h(A)] = 0. Thus we see that h(λ) is a zero of χ_{h(A)}. Now choose h = m_A. Then for every zero λ of χ_A we have that m_A(λ) is a zero of χ_{m_A(A)} = χ_0 = det(XI_n) = X^n. Since the only zero of this is 0, we have m_A(λ) = 0 and so λ is a zero of m_A. ♦
Example  The characteristic polynomial of

    A = [ 2 1 0
          0 2 1
          0 0 2 ]

is χ_A = (X - 2)^3. Now it is readily seen that A - 2I_3 ≠ 0 and (A - 2I_3)^2 ≠ 0, so we also have m_A = (X - 2)^3.

Example
For the matrix

    A = [  5 -6 -6
          -1  4  2
           3 -6 -4 ]

we have χ_A = (X - 1)(X - 2)^2. By 1.4, the minimum polynomial is therefore either (X - 1)(X - 2)^2 or (X - 1)(X - 2). Since, as is readily seen, (A - I_3)(A - 2I_3) = 0, it follows that m_A = (X - 1)(X - 2).
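The divisibility argument above suggests a direct computation: since m_A must be (X - 1)(X - 2)^2 or (X - 1)(X - 2), one simply tests whether (A - I)(A - 2I) already vanishes. A sketch (not from the text):

```python
# Finding the minimum polynomial of the example matrix by testing whether
# (A - I)(A - 2I) is zero. A sketch, not from the text.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(A, c):
    """Return A - c*I."""
    n = len(A)
    return [[A[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]
P = mat_mul(shift(A, 1), shift(A, 2))
print(P)  # the zero matrix, so m_A = (X - 1)(X - 2)
```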
The notion of characteristic polynomial can be defined for a linear mapping as follows. Given a vector space V of dimension n over F and a linear mapping f : V → V, let A be the matrix of f relative to some fixed ordered basis of V. Then the matrix of f relative to any other ordered basis is of the form P^{-1}AP where P is the transition matrix from the new basis to the old basis (see, for example, Volume Two, Theorem 7.6). Now the characteristic polynomial of P^{-1}AP is

    det(XI_n - P^{-1}AP) = det[P^{-1}(XI_n - A)P]
                         = det P^{-1} det(XI_n - A) det P
                         = det(XI_n - A),

i.e. χ_{P^{-1}AP} = χ_A. Thus the characteristic polynomial is independent of the choice of basis, so we can define the characteristic polynomial χ_f of f to be the characteristic polynomial of any matrix that represents f. Likewise, the minimum polynomial m_f of f is defined to be the minimum polynomial of any matrix that represents f; for if A, B represent f then for any polynomial p we have p(B) = p(P^{-1}AP) = P^{-1}p(A)P and so p(B) = 0 if and only if p(A) = 0.

As we have seen, the characteristic polynomial and the minimum polynomial have the same zeros. These are called the eigenvalues (of f or of A). Thus λ is an eigenvalue of A if and only if det(λI_n - A) = 0, and the corresponding statement for f is that λ is an eigenvalue of f if and only if λ id_V - f is not invertible. In the former case there exists a non-zero x ∈ Mat_{n×1}(F) such that Ax = λx, and in the latter there exists a non-zero z ∈ Ker(λ id_V - f), so that f(z) = λz. Such a column matrix x and vector z are called corresponding eigenvectors (of A and of f).
1.5 Theorem  Let V be a vector space of dimension n ≥ 1 over ℂ. Then every f ∈ Lin(V, V) and every A ∈ Mat_{n×n}(ℂ) has at least one eigenvalue in ℂ.

Proof  χ_f factorises over ℂ, say as

    χ_f = (X - λ_1)^{e_1}(X - λ_2)^{e_2} ··· (X - λ_k)^{e_k}.

Substituting f for X and using the Cayley-Hamilton Theorem, we obtain

    0 = (f - λ_1 id_V)^{e_1}(f - λ_2 id_V)^{e_2} ··· (f - λ_k id_V)^{e_k}.
It follows that not all the factors (f - λ_i id_V)^{e_i} are invertible and so there is at least one eigenvalue. ♦

Example  Note that 1.5 is not true when ℂ is replaced by ℝ. Indeed, consider the rotation matrix

    R_θ = [ cos θ  -sin θ
            sin θ   cos θ ].

The characteristic polynomial of R_θ is X^2 - 2cos θ X + 1, the zeros of which are cos θ ± i sin θ. Thus, when θ is not an integral multiple of π, R_θ has no real eigenvalues.

1.6 Theorem  A linear mapping f (or a square matrix A) is invertible if and only if the constant term of the characteristic polynomial is not zero.

Proof  To say that f is invertible is equivalent to saying that 0 is not an eigenvalue of f, i.e. to saying that 0 is not a zero of the characteristic polynomial. Clearly, this is equivalent to the constant term being non-zero. ♦

Example
The matrix

    A = [ 1 1 0
          0 1 1
          0 0 1 ]

is such that χ_A = (X - 1)^3. Thus

    0 = (A - I_3)^3 = A^3 - 3A^2 + 3A - I_3

and consequently we see that

    A^{-1} = A^2 - 3A + 3I_3.
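The identity A^{-1} = A^2 - 3A + 3I holds for any matrix with χ_A = (X - 1)^3, so it is easy to check numerically. A hedged sketch (the displayed matrix above is a reconstruction; any such A works):

```python
# Computing A^{-1} = A^2 - 3A + 3I from the Cayley-Hamilton relation
# A^3 - 3A^2 + 3A - I = 0. A sketch, not from the text.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
A2 = mat_mul(A, A)
I = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
A_inv = [[A2[i][j] - 3 * A[i][j] + 3 * I[i][j] for j in range(3)]
         for i in range(3)]
print(mat_mul(A, A_inv))  # the identity matrix
```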
CHAPTER TWO
Direct sums of subspaces
If A and B are non-empty subsets of a vector space V over a field F then the subspace spanned by A ∪ B, i.e. the smallest subspace that contains both A and B, is the set of linear combinations of elements of A ∪ B. In other words, it is the set of elements of the form

    Σ_{i=1}^m λ_i x_i + Σ_{j=1}^p μ_j y_j

where each x_i ∈ A, each y_j ∈ B, and λ_i, μ_j ∈ F. In the case where A, B are subspaces of V, this set can be described as

    A + B = {a + b ; a ∈ A, b ∈ B}

which we call the sum of the subspaces A, B. More generally, if A_1, ..., A_n are subspaces of V then we define their sum to be the subspace, denoted by Σ_{i=1}^n A_i, that is spanned by ∪_{i=1}^n A_i. Clearly, we have

    Σ_{i=1}^n A_i = {a_1 + ... + a_n ; a_i ∈ A_i}.
Example  Let X, Y, D be the subspaces of ℝ^2 given by

    X = {(x, 0) ; x ∈ ℝ},  Y = {(0, y) ; y ∈ ℝ},  D = {(x, x) ; x ∈ ℝ}.

Then ℝ^2 = X + Y = X + D = Y + D, for every (x, y) ∈ ℝ^2 can be written in each of the three ways

    (x, 0) + (0, y),  (x - y, 0) + (y, y),  (0, y - x) + (x, x).
Definition  A sum Σ_{i=1}^n A_i of subspaces A_1, ..., A_n is said to be direct if every x ∈ Σ_{i=1}^n A_i can be written in a unique way as a sum x = a_1 + ... + a_n with a_i ∈ A_i for each i. We shall use the notation ⊕_{i=1}^n A_i to denote the fact that the sum Σ_{i=1}^n A_i is direct, and call this the direct sum of the subspaces A_i.

Example
In the previous Example we have ℝ^2 = X ⊕ Y = X ⊕ D = Y ⊕ D.

Example  Let A, B be the subspaces of ℝ^3 given by

    A = {(x, y, z) ; x + y + z = 0},  B = {(x, x, z) ; x, z ∈ ℝ}.

Then ℝ^3 = A + B since, for example,

    (x, y, z) = (½(x - y), -½(x - y), 0) + (½(x + y), ½(x + y), z).

This sum is not direct, however, for we can also write (x, y, z) as

    (½(x - y + 1), -½(x - y - 1), -1) + (½(x + y - 1), ½(x + y - 1), z + 1).

2.1 Theorem  If A_1, ..., A_n are subspaces of a vector space V then the following statements are equivalent:

(1) the sum Σ_{i=1}^n A_i is direct;
(2) if Σ_{i=1}^n a_i = 0_V with a_i ∈ A_i for every i, then every a_i = 0_V;
(3) for every i, A_i ∩ Σ_{j≠i} A_j = {0_V}.
Proof  (1) ⇒ (2): By the definition of direct sum, if (1) holds then 0_V can be written in only one way as a sum Σ_{i=1}^n a_i with a_i ∈ A_i for every i, namely that in which every a_i = 0_V.

(2) ⇒ (3): Let x ∈ A_i ∩ Σ_{j≠i} A_j, say x = a_i = Σ_{j≠i} a_j. We can write this as a_i - Σ_{j≠i} a_j = 0_V. By (2) we deduce that a_i = 0_V, whence x = 0_V.
(3) ⇒ (1): Suppose that (3) holds and that Σ_{i=1}^n a_i = Σ_{i=1}^n b_i where a_i, b_i ∈ A_i for each i. Then

    a_i - b_i = Σ_{j≠i} (b_j - a_j)

where the left hand side belongs to A_i and the right hand side belongs to Σ_{j≠i} A_j. By (3) we deduce that a_i - b_i = 0_V. Since this holds for every i, (1) follows. ♦

2.2 Corollary  If A, B are subspaces of V then V = A ⊕ B if and only if V = A + B and A ∩ B = {0_V}. ♦
Example  A mapping f : ℝ → ℝ is said to be even if f(-x) = f(x) for every x ∈ ℝ, and odd if f(-x) = -f(x) for every x ∈ ℝ. The sets A, B of even, odd functions are subspaces of the vector space V = Map(ℝ, ℝ). Moreover, V = A ⊕ B. To see this, given any f : ℝ → ℝ let f⁺ : ℝ → ℝ and f⁻ : ℝ → ℝ be given by

    f⁺(x) = ½[f(x) + f(-x)]  and  f⁻(x) = ½[f(x) - f(-x)].

Then f⁺ is even and f⁻ is odd. Since f = f⁺ + f⁻ we have V = A + B. Since clearly A ∩ B consists only of the zero function, it follows by 2.2 that V = A ⊕ B.
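The even/odd decomposition is concrete enough to compute. A small sketch (not from the text): for f = exp, the even and odd parts are cosh and sinh.

```python
import math

# The decomposition f = f+ + f- into even and odd parts.
# A sketch, not from the text.

def even_part(f):
    return lambda x: 0.5 * (f(x) + f(-x))

def odd_part(f):
    return lambda x: 0.5 * (f(x) - f(-x))

fp, fm = even_part(math.exp), odd_part(math.exp)
x = 1.3
print(abs(fp(x) + fm(x) - math.exp(x)) < 1e-12)  # True: f = f+ + f-
print(abs(fp(x) - math.cosh(x)) < 1e-12)         # True: f+ is cosh
print(abs(fm(-x) + fm(x)) < 1e-12)               # True: f- is odd
```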
Example  Let V be the vector space Mat_{n×n}(ℝ). If A, B are the subspaces of V consisting of the symmetric, skew-symmetric matrices then V = A ⊕ B. In fact, every X ∈ V can be written uniquely in the form X = Y + Z where Y ∈ A and Z ∈ B; we have Y = ½(X + Xᵗ) and Z = ½(X - Xᵗ).

In a direct sum, bases can be pasted together:

2.3 Theorem  Let V be a finite-dimensional vector space and let V_1, ..., V_n be non-zero subspaces of V such that V = ⊕_{i=1}^n V_i. If B_i is a basis of V_i for each i then ∪_{i=1}^n B_i is a basis of V.
Proof  Let dim V_i = d_i and let B_i = {e_{i,1}, ..., e_{i,d_i}}. Since V = ⊕_{i=1}^n V_i we have V_i ∩ Σ_{j≠i} V_j = {0_V} by 2.1 and hence V_i ∩ V_j = {0_V} for i ≠ j. Consequently B_i ∩ B_j = ∅ for i ≠ j. Now a typical
element of the subspace spanned by ∪_{i=1}^n B_i is of the form

(1)    Σ_{i=1}^n Σ_{j=1}^{d_i} λ_{i,j} e_{i,j},

i.e. of the form

(2)    z_1 + ... + z_n  where  z_i = Σ_{j=1}^{d_i} λ_{i,j} e_{i,j}.

Since V = Σ_{i=1}^n V_i and since B_i is a basis of V_i it is clear that every x ∈ V can be expressed in the form (1) and so V is spanned by ∪_{i=1}^n B_i. If now in (2) we have z_1 + ... + z_n = 0_V then by 2.1 we deduce that each z_i = 0_V and consequently each λ_{i,j} = 0. Thus ∪_{i=1}^n B_i is a basis of V. ♦
2.4 Corollary  dim ⊕_{i=1}^n V_i = Σ_{i=1}^n dim V_i. ♦
We shall now determine precisely when a vector space is a direct sum of finitely many non-zero subspaces. As we shall see, this is closely related to the following types of linear mapping.

Definition  Let A, B be subspaces of a vector space V such that V = A ⊕ B, so that every x ∈ V can be expressed uniquely in the form x = a + b where a ∈ A and b ∈ B. By the projection on A parallel to B we mean the linear mapping p : V → V given by p(x) = a.

Example  We know that ℝ^2 = X ⊕ D where X = {(x, 0) ; x ∈ ℝ} and D = {(x, x) ; x ∈ ℝ}. The projection on X parallel to D is given by p(x, y) = (x - y, 0). Thus the image of the point (x, y) is the point of intersection with X of the line through (x, y) parallel to the line D. The terminology used is thus suggested by the geometry.

Definition  A linear mapping f : V → V is said to be a projection if there are subspaces A, B such that V = A ⊕ B and f is the projection on A parallel to B. A linear mapping f : V → V is said to be idempotent if f ∘ f = f.
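The projection on X parallel to D can be written down directly, and its idempotency checked. A sketch (not from the text):

```python
# The projection p(x, y) = (x - y, 0) of R^2 on X = {(x, 0)} parallel to
# D = {(x, x)}, together with a check that p is idempotent.
# A sketch, not from the text.

def p(v):
    x, y = v
    return (x - y, 0)

v = (3, 1)
print(p(v))             # (2, 0): the component of v in X
print(p(p(v)) == p(v))  # True: p o p = p
```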
2.5 Theorem  If V = A ⊕ B and if p is the projection on A parallel to B then

(1) A = Im p = {x ∈ V ; x = p(x)};
(2) B = Ker p;
(3) p is idempotent.

Proof  (1) It is clear that A = Im p ⊇ {x ∈ V ; x = p(x)}. If now a ∈ A then its unique representation as the sum of an element in A and an element in B is a = a + 0_V. Consequently p(a) = a and the inclusion becomes equality.

(2) Let x ∈ V have the unique representation x = a + b where a ∈ A and b ∈ B. Then since p(x) = a we have

    p(x) = 0_V ⟺ a = 0_V ⟺ x = b ∈ B.

In other words, Ker p = B.

(3) For every x ∈ V we have p(x) ∈ A and so, by (1), p(x) = p[p(x)]. Thus p = p ∘ p. ♦

2.6 Theorem  A linear mapping f : V → V is a projection if and only if it is idempotent, in which case V = Im f ⊕ Ker f and f is the projection on Im f parallel to Ker f.

Proof  Suppose that f is a projection. Then there exist subspaces A, B with V = A ⊕ B and f is the projection on A parallel to B. By 2.5, f is idempotent.

Conversely, suppose that f : V → V is idempotent. If z ∈ Im f ∩ Ker f then we have z = f(y) for some y, and f(z) = 0_V. Consequently, z = f(y) = f[f(y)] = f(z) = 0_V and hence Im f ∩ Ker f = {0_V}. Now for every x ∈ V we observe that

    f[x - f(x)] = f(x) - f[f(x)] = f(x) - f(x) = 0_V

and so x - f(x) ∈ Ker f. The identity x = f(x) + x - f(x) now shows that V = Im f + Ker f. It follows by 2.2 that V = Im f ⊕ Ker f.

Suppose now that x = a + b where a ∈ Im f and b ∈ Ker f. Then a = f(y) for some y, and f(b) = 0_V. Consequently,

    f(x) = f(a + b) = f(a) + 0_V = f[f(y)] = f(y) = a.

In other words, f is the projection on Im f parallel to Ker f. ♦
2.7 Corollary  If f : V → V is a projection then so is id_V - f. Moreover, in this case, Im f = Ker(id_V - f).

Proof  Writing f ∘ f = f^2 we deduce from f^2 = f that

    (id_V - f)^2 = id_V - f - f + f^2 = id_V - f.

Also, by 2.5, we have

    x ∈ Im f ⟺ x = f(x) ⟺ (id_V - f)(x) = 0_V

and so Im f = Ker(id_V - f). ♦

We shall now show how the decomposition of a vector space into a direct sum of finitely many non-zero subspaces may be expressed in terms of projections.

2.8 Theorem
If V is a vector space then there are non-zero subspaces V_1, ..., V_n of V such that V = ⊕_{i=1}^n V_i if and only if there are non-zero linear mappings p_1, ..., p_n : V → V such that

(1) Σ_{i=1}^n p_i = id_V;
(2) p_i ∘ p_j = 0 whenever i ≠ j.

Moreover, such mappings p_i are necessarily projections, and V_i = Im p_i for each i.

Proof  Suppose first that V = ⊕_{i=1}^n V_i. Then for i = 1, ..., n we have V = V_i ⊕ Σ_{j≠i} V_j. Let p_i be the projection on V_i parallel to Σ_{j≠i} V_j. Then for every x ∈ V and every j ≠ i we have p_j(x) ∈ V_j ⊆ Σ_{k≠i} V_k = Ker p_i by 2.5, and therefore

    p_i[p_j(x)] = 0_V
and so p_i ∘ p_j = 0. Also, since every x ∈ V can be written uniquely in the form x = Σ_{i=1}^n x_i where x_i ∈ V_i for each i, and since p_i(x) = x_i for each i, we have

    x = Σ_{i=1}^n x_i = Σ_{i=1}^n p_i(x) = (Σ_{i=1}^n p_i)(x),

whence Σ_{i=1}^n p_i = id_V.
Conversely, suppose that p_1, ..., p_n satisfy (1) and (2). Then we note that

    p_i = p_i ∘ id_V = p_i ∘ Σ_{j=1}^n p_j = Σ_{j=1}^n (p_i ∘ p_j) = p_i ∘ p_i,

so each p_i is idempotent and therefore, by 2.6, is a projection. Now for every x ∈ V we have

    x = id_V(x) = (Σ_{i=1}^n p_i)(x) = Σ_{i=1}^n p_i(x) ∈ Σ_{i=1}^n Im p_i

which shows that V = Σ_{i=1}^n Im p_i. If now z ∈ Im p_i ∩ Σ_{j≠i} Im p_j then, by 2.5, z = p_i(z) and z = Σ_{j≠i} x_j where p_j(x_j) = x_j for every j ≠ i. Consequently,

    z = p_i(z) = p_i(Σ_{j≠i} x_j) = Σ_{j≠i} p_i[p_j(x_j)] = 0_V

and it follows that V = ⊕_{i=1}^n Im p_i. ♦
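Conditions (1) and (2) of 2.8 can be seen concretely in ℝ^2, using the projection on X = {(x, 0)} parallel to D = {(x, x)} and its complementary projection on D parallel to X. A sketch (not from the text):

```python
# Two complementary projections of R^2 satisfying p1 + p2 = id and
# p1 p2 = p2 p1 = 0, as in conditions (1) and (2) of 2.8.
# A sketch, not from the text.

def p1(v):
    """Projection on X = {(x, 0)} parallel to D = {(x, x)}."""
    x, y = v
    return (x - y, 0)

def p2(v):
    """Projection on D parallel to X (this is id - p1)."""
    x, y = v
    return (y, y)

v = (3, 5)
print(tuple(a + b for a, b in zip(p1(v), p2(v))))  # (3, 5): p1 + p2 = id
print(p1(p2(v)), p2(p1(v)))                        # (0, 0) (0, 0): p_i p_j = 0
```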
The description in 2.8 opens the door to a deep study of linear mappings and their representation by matrices. In order to embark on this, we require the following notion.

Definition  If V is a vector space over a field F and if f : V → V is linear then a subspace W of V is said to be f-invariant (or f-stable) if it satisfies the property

    x ∈ W ⟹ f(x) ∈ W.

Example  If f : V → V is linear then Im f and Ker f are f-invariant.
Example  Let D : ℝ[X] → ℝ[X] be the differentiation map on the vector space of all real polynomials. Then the subspace ℝ_n[X] of polynomials of degree at most n is D-invariant.

Example  If f : V → V is linear and z ∈ V with z ≠ 0_V then the subspace spanned by {z} is f-invariant if and only if z is an eigenvector of f. In fact, the subspace spanned by {z} is Fz = {λz ; λ ∈ F} and this is f-invariant if and only if for every λ ∈ F there exists μ ∈ F such that f(λz) = μz. Taking λ = 1_F we see that z is an eigenvector of f. Conversely, if z is an eigenvector of f then f(z) = μz for some μ ∈ F and so, for every λ ∈ F, we have f(λz) = λf(z) = λμz ∈ Fz.

A useful result concerning invariant subspaces is the following.

2.9 Theorem  If f : V → V is linear then for every polynomial p over F the subspaces Im p(f) and Ker p(f) are f-invariant.

Proof  Observe that for every polynomial p we have

    f ∘ p(f) = p(f) ∘ f.

It follows from this that if x = p(f)(y) then f(x) = p(f)[f(y)], so Im p(f) is f-invariant; and if p(f)(x) = 0_V then p(f)[f(x)] = f[p(f)(x)] = 0_V, so Ker p(f) is f-invariant. ♦

In what follows we shall often have occasion to deal with
expressions of the form p(f) where p is a polynomial and f is a linear mapping, and in so doing we shall find it convenient to denote composites by simple juxtaposition. Thus, for example, we shall write fp(f) for f ∘ p(f), fg for f ∘ g, and f^2 for f ∘ f.

Suppose now that V is of finite dimension n and that the subspace W of V is f-invariant. Choose a basis {w_1, ..., w_r} of W and extend it to a basis

    B = {w_1, ..., w_r, v_{r+1}, ..., v_n}

of V. Then, since W is f-invariant, it is readily seen that the matrix of f relative to B is of the form

    [ A B
      0 C ]
where A is an r×r matrix that represents the mapping induced on W by f. Suppose now that V = W_1 ⊕ W_2 where W_1 and W_2 are each f-invariant. If B_1 is a basis of W_1 and B_2 is a basis of W_2 then by 2.3 we have that B = B_1 ∪ B_2 is a basis of V, and it is readily seen that the matrix of f relative to B is of the form

    [ A_1  0
       0  A_2 ]

where A_1, A_2 represent the mappings induced on W_1, W_2 by f. More generally, if V = ⊕_{i=1}^n W_i where each W_i is f-invariant then, relative to a basis obtained by pasting together bases of the W_i, the matrix of f is block diagonal, the i-th block representing the mapping induced on W_i by f. The fundamental result in this direction is the following.

2.10 Theorem  [Primary Decomposition]  Let V be a non-zero finite-dimensional vector space over a field F and let f : V → V be linear, with minimum polynomial and characteristic polynomial

    m_f = p_1^{e_1} p_2^{e_2} ··· p_k^{e_k},  χ_f = p_1^{d_1} p_2^{d_2} ··· p_k^{d_k}

respectively, where p_1, ..., p_k are distinct irreducibles in F[X]. Then each of the subspaces V_i = Ker p_i(f)^{e_i} is f-invariant and V = ⊕_{i=1}^k V_i. Moreover, if f_i : V_i → V_i is the linear mapping induced on V_i by f then the minimum polynomial of f_i is p_i^{e_i} and the characteristic polynomial of f_i is p_i^{d_i}.
Proof  If k = 1 the result is trivial, so suppose that k ≥ 2. For i = 1, ..., k let

    q_i = m_f / p_i^{e_i} = ∏_{j≠i} p_j^{e_j}.

Then there is no irreducible factor that is common to each of q_1, ..., q_k and so there exist a_1, ..., a_k ∈ F[X] such that

    q_1a_1 + q_2a_2 + ... + q_ka_k = 1.

Writing t_i = q_ia_i for each i and substituting f in this polynomial identity, we obtain

(1)    t_1(f) + t_2(f) + ... + t_k(f) = id_V.

Now by the definition of q_i we have that if i ≠ j then m_f divides q_iq_j. Consequently q_i(f)q_j(f) = 0 for i ≠ j and then

(2)    (i ≠ j)  t_i(f)t_j(f) = 0.
By (1), (2) and 2.8 we see that each t_i(f) is a projection and

    V = ⊕_{i=1}^k Im t_i(f).

Moreover, by 2.9, each of the subspaces Im t_i(f) is f-invariant. We now show that Im t_i(f) = Ker p_i(f)^{e_i}. Since p_i^{e_i} q_i = m_f we have p_i(f)^{e_i} q_i(f) = m_f(f) = 0, from which it follows that p_i(f)^{e_i} t_i(f) = 0 and hence Im t_i(f) ⊆ Ker p_i(f)^{e_i}. To establish the reverse inclusion, observe that, for every j ≠ i,

    t_j(f) = q_j(f)a_j(f) = ∏_{l≠j} p_l(f)^{e_l} · a_j(f)

and hence

    Ker p_i(f)^{e_i} ⊆ ∩_{j≠i} Ker t_j(f)
                    ⊆ Ker Σ_{j≠i} t_j(f)
                    = Ker(id_V - t_i(f))   by (1)
                    = Im t_i(f)            by 2.7.
As for the induced mapping f_i : V_i → V_i, let m_{f_i} be its minimum polynomial. Since p_i(f)^{e_i} is the zero map on V_i, so is p_i(f_i)^{e_i}. Consequently we have that m_{f_i} divides p_i^{e_i}. Thus m_{f_i} divides m_f and the m_{f_i} are relatively prime. Suppose now that g ∈ F[X] is a multiple of m_{f_i} for each i. Then g(f_i) is the zero map on V_i. Since V = ⊕_{i=1}^k V_i, if x = Σ_{i=1}^k x_i ∈ V with x_i ∈ V_i then

    g(f)(x) = Σ_{i=1}^k g(f)(x_i) = Σ_{i=1}^k g(f_i)(x_i) = 0_V

and so g(f) = 0, and consequently m_f divides g. Thus we see that m_f is the least common multiple of m_{f_1}, ..., m_{f_k}. Since these k polynomials are relatively prime, we then have m_f = ∏_{i=1}^k m_{f_i}. But we know that m_f = ∏_{i=1}^k p_i^{e_i}, and that m_{f_i} divides p_i^{e_i}. Since all the polynomials in question are monic it follows that m_{f_i} = p_i^{e_i} for each i.

Finally, we can paste together bases of the subspaces V_i to form a basis of V with respect to which the matrix M of f is of the block diagonal form

    [ A_1
          A_2
              ⋱
                  A_k ].

Since, by the theory of determinants,

    det(XI - M) = ∏_{i=1}^k det(XI - A_i),

we see that χ_f = ∏_{i=1}^k χ_{f_i}. Now we know that m_{f_i} = p_i^{e_i} and so, by 1.4, we must have χ_{f_i} = p_i^{r_i} for some r_i ≥ e_i. Thus

    ∏_{i=1}^k p_i^{r_i} = χ_f = ∏_{i=1}^k p_i^{d_i}

from which it follows that r_i = d_i for i = 1, ..., k. ♦
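The projections t_i(f) of the proof can be exhibited concretely for the earlier example with m_A = (X - 1)(X - 2): there q_1 = X - 2, q_2 = X - 1 and -q_1 + q_2 = 1, so t_1(A) = -(A - 2I) and t_2(A) = A - I. A sketch (not from the text):

```python
# The primary-decomposition projections t_1(A) = -(A - 2I) and
# t_2(A) = A - I for a matrix with m_A = (X - 1)(X - 2), checked against
# conditions (1) and (2) of 2.8. A sketch, not from the text.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scaled_shift(A, c, sign):
    """Return sign*(A - c*I)."""
    n = len(A)
    return [[sign * (A[i][j] - (c if i == j else 0)) for j in range(n)]
            for i in range(n)]

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]
t1 = scaled_shift(A, 2, -1)  # projection onto Ker(A - I)
t2 = scaled_shift(A, 1, 1)   # projection onto Ker(A - 2I)

print([[t1[i][j] + t2[i][j] for j in range(3)] for i in range(3)])  # identity
print(mat_mul(t1, t2))                                              # zero
```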
2.11 Corollary  For i = 1, ..., k, dim V_i = d_i deg p_i.

Proof  dim V_i is the degree of χ_{f_i}. ♦

2.12 Corollary  Let V be a non-zero finite-dimensional vector space over a field F. If f : V → V is linear and all the eigenvalues of f lie in F, so that

    χ_f = (X - λ_1)^{d_1}(X - λ_2)^{d_2} ··· (X - λ_k)^{d_k},
    m_f = (X - λ_1)^{e_1}(X - λ_2)^{e_2} ··· (X - λ_k)^{e_k},

then each of the subspaces V_i = Ker(f - λ_i id_V)^{e_i} is f-invariant, of dimension d_i, and V = ⊕_{i=1}^k V_i. ♦
Example  Consider the linear mapping f : ℝ^3 → ℝ^3 given by

    f(x, y, z) = (-z, x + z, y + z).

Relative to the standard ordered basis, the matrix of f is

    A = [ 0 0 -1
          1 0  1
          0 1  1 ].

It is readily seen that χ_A = m_A = (X + 1)(X - 1)^2. By 2.12,

    ℝ^3 = Ker(f + id_V) ⊕ Ker(f - id_V)^2

with Ker(f + id_V) of dimension 1 and Ker(f - id_V)^2 of dimension 2. Now

    (f + id_V)(x, y, z) = (x - z, x + y + z, y + 2z)

so a basis for Ker(f + id_V) is {(1, -2, 1)}. Also,

    (f - id_V)^2(x, y, z) = (x - y + z, -2x + 2y - 2z, x - y + z)

so a basis for Ker(f - id_V)^2 is {(0, 1, 1), (1, 1, 0)}. Thus a basis for ℝ^3 with respect to which the matrix of f is in block diagonal form is

    B = {(1, -2, 1), (0, 1, 1), (1, 1, 0)}.

The transition matrix from B to the standard basis is

    P = [  1 0 1
          -2 1 1
           1 1 0 ]

and the block diagonal form of A is then

    P^{-1}AP = [ -1  0 0
                  0  2 1
                  0 -1 0 ].
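The claims of the example above can be checked by applying f directly to the basis vectors of B; note that the mapping f(x, y, z) = (-z, x + z, y + z) is reconstructed from the surrounding computations, so this is a sketch rather than a quotation of the text.

```python
# Checking the block-diagonal decomposition of the example: v1 spans
# Ker(f + id), and v2, v3 span Ker(f - id)^2 with induced block
# [[2, 1], [-1, 0]]. A sketch, not from the text.

def f(v):
    x, y, z = v
    return (-z, x + z, y + z)

v1, v2, v3 = (1, -2, 1), (0, 1, 1), (1, 1, 0)

print(f(v1))  # (-1, 2, -1) = -v1, so f(v1) = -v1
print(f(v2))  # (-1, 1, 2) = 2*v2 - v3
print(f(v3))  # (0, 1, 1) = v2
```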
Example  Consider the differential equation

    (D^n + a_{n-1}D^{n-1} + ... + a_1D + a_0)f = 0

with constant (complex) coefficients. Let V be the solution space, i.e. the set of all infinitely differentiable functions satisfying the equation. If

    m = X^n + a_{n-1}X^{n-1} + ... + a_1X + a_0

then over ℂ we have

    m = (X - α_1)^{e_1}(X - α_2)^{e_2} ··· (X - α_k)^{e_k}.

Then D : V → V is linear and its minimum polynomial is m. By 2.12, V is the direct sum of the solution spaces V_i of the differential equations

    (D - α_i id)^{e_i} f = 0.

Now the solutions of (D - α id)^n f = 0 can be determined using the fact that, by a simple inductive argument,

    (D - α id)^n f = e^{αx} D^n(e^{-αx} f).

Thus f is a solution if and only if D^n(e^{-αx} f) = 0, which is the case if and only if e^{-αx} f is a polynomial of degree at most n - 1. A basis for the solution space of (D - α id)^n f = 0 is then

    {e^{αx}, xe^{αx}, ..., x^{n-1}e^{αx}}.

It is natural to consider the particular case of the Primary Decomposition Theorem in which the irreducible factors p_i of m_f are all linear and each e_i = 1. This gives the following important result.
2.13 Theorem  Let V be a non-zero finite-dimensional vector space over a field F. Then a linear mapping f : V → V is diagonalizable if and only if its minimum polynomial m_f is a product of distinct linear factors.

Proof  Suppose that

    m_f = (X - λ_1)(X - λ_2) ··· (X - λ_k)

where λ_1, ..., λ_k ∈ F are distinct. By 2.12, V is the direct sum of the f-invariant subspaces V_i = Ker(f - λ_i id_V). For every x ∈ V_i we have (f - λ_i id_V)(x) = 0_V, so that f(x) = λ_ix. Thus every non-zero element of V_i is an eigenvector associated with the eigenvalue λ_i. By 2.3, we can paste together bases for V_1, ..., V_k to form a basis for V. Thus V has a basis consisting of eigenvectors of f, so f is diagonalizable.

Conversely, suppose that V has a basis consisting of eigenvectors of f. Let λ_1, ..., λ_k be the distinct eigenvalues of f and consider the polynomial

    p = (X - λ_1)(X - λ_2) ··· (X - λ_k).

Clearly, p(f) maps every basis vector to 0_V and consequently p(f) = 0. The minimum polynomial m_f therefore divides p, and must coincide with p since every eigenvalue is a zero of m_f. ♦

Example
Consider the linear mapping f : ℝ^3 → ℝ^3 given by

    f(x, y, z) = (7x - y - 2z, -x + 7y + 2z, -2x + 2y + 10z).

Relative to the standard ordered basis of ℝ^3, the matrix of f is

    A = [  7 -1 -2
          -1  7  2
          -2  2 10 ].

It is readily seen that χ_A = (X - 6)^2(X - 12) and that m_A = (X - 6)(X - 12) so, by 2.13, f is diagonalizable.

An interesting result concerning diagonalizable mappings that will be useful later is the following.
2.14 Theorem  Let V be a non-zero finite-dimensional vector space over a field F and let f, g : V → V be diagonalizable linear mappings. Then f and g are simultaneously diagonalizable (in the sense that there is a basis of V consisting of eigenvectors of both f and g) if and only if f ∘ g = g ∘ f.

Proof  ⇒: Suppose that there is a basis {v_1, ..., v_n} of V such that each v_i is an eigenvector of both f and g. If f(v_i) = λ_iv_i and g(v_i) = μ_iv_i then

    f[g(v_i)] = λ_iμ_iv_i = μ_iλ_iv_i = g[f(v_i)].

Since f ∘ g and g ∘ f thus agree on a basis, it follows that they are equal.
⇐: Suppose now that f ∘ g = g ∘ f. Since f is diagonalizable, by 2.13 we have V = ⊕_{i=1}^k V_i where V_i = Ker(f - λ_i id_V) and λ_1, ..., λ_k are the distinct eigenvalues of f. Each V_i is g-invariant, for if f(x) = λ_ix then f[g(x)] = g[f(x)] = λ_ig(x). Let g_i : V_i → V_i be the linear mapping thus induced by g. Since g is diagonalizable so is each g_i, for the minimum polynomial of g_i divides that of g. We can therefore find a basis B_i of V_i consisting of eigenvectors of g_i. Since every eigenvector of g_i is an eigenvector of g and since every element of V_i is an eigenvector of f, it follows that ∪_{i=1}^k B_i is a basis of V consisting of eigenvectors of both f and g. ♦

2.15 Corollary  Let A, B be n×n matrices over a field F. If A and B are diagonalizable then they are simultaneously diagonalizable (i.e. there is an invertible matrix P such that P^{-1}AP and P^{-1}BP are diagonal) if and only if AB = BA. ♦
CHAPTER THREE
Reduction to triangular form
Despite the fact that, in general, f : V → V does not have a diagonal matrix representation, it is possible to 'simplify' the matrix representation of f in several ways. In this Chapter we shall describe the 'easiest' of these. We shall be concerned with those linear mappings f whose minimum polynomial (and hence also whose characteristic polynomial) factorises completely as a product of (not necessarily distinct) linear factors. Of course, this always happens when the ground field is ℂ, so the results we shall prove will be valid for all linear mappings on a finite-dimensional complex vector space. Specifically, we shall show that for such a mapping f there is an ordered basis of V with respect to which the matrix of f is triangular.

In order to see how to proceed, we observe first that by 2.12 we can write V as a direct sum of the f-invariant subspaces V_i = Ker(f - λ_i id_V)^{e_i}. Let f_i : V_i → V_i be the linear mapping induced on the 'primary component' V_i by f, and consider the mapping f_i - λ_i id_{V_i} : V_i → V_i. We have that (f_i - λ_i id_{V_i})^{e_i} is the zero map on V_i, so f_i - λ_i id_{V_i} is nilpotent, in the following sense.

Definition  A linear mapping f : V → V is said to be nilpotent if f^m = 0 for some positive integer m.

Example  f : ℝ^3 → ℝ^3 given by f(x, y, z) = (0, x, y) is nilpotent. In fact, f^2(x, y, z) = (0, 0, x) and f^3 = 0.

Example  If f : ℂ^n → ℂ^n is such that all the eigenvalues of f are 0 then χ_f = X^n and so, by Cayley-Hamilton, f^n = χ_f(f) = 0. Thus f is nilpotent.
Example  The differentiation map D : ℝ_n[X] → ℝ_n[X] is nilpotent.

We now produce a particularly useful basis for V in the presence of a nilpotent linear map.

3.1 Theorem  Let V be a non-zero finite-dimensional vector space over a field F and let f : V → V be a nilpotent linear mapping. Then there is a basis {v_1, ..., v_n} of V such that

    f(v_1) = 0_V;
    f(v_2) ∈ span{v_1};
    f(v_3) ∈ span{v_1, v_2};
    ...
    f(v_n) ∈ span{v_1, ..., v_{n-1}}.
Proof  Since f is nilpotent there is a positive integer m such that f^m = 0. If f = 0 then every basis of V satisfies the stated conditions, so let f ≠ 0. Now let k be the smallest positive integer such that f^k = 0. Then f^t ≠ 0 for 1 ≤ t ≤ k - 1.
Now let fi: =
Aba
53
sis ¢ sony}
€ T,}. Then by 4.3 the set
and write f(T.) = {f(z) ;
By-2U f~ (Tk) is a linearly independent subset of W,-1. Extend this to a basis
By-2U f(Th) U{y1,--- ye} of Wi_1.
Now let
Th-1 = f(Tk) U {yi,---s ye}. Then by 4.3 the set
Br-3 U f~ (Tk-1) is a linearly independent subset of W,_2. Extend this to a basis Beg U Fo tipi) U1 {255
of W,-2, and so on.
Writing T_k as {x_1, …, x_α}, we thus see that we can form the following basis of V :

x_1, …, x_α,
f(x_1), …, f(x_α), y_1, …, y_β,
f²(x_1), …, f²(x_α), f(y_1), …, f(y_β), z_1, …, z_γ,
⋮
f^{k−1}(x_1), …, f^{k−1}(x_α), f^{k−2}(y_1), …, f^{k−2}(y_β), f^{k−3}(z_1), …, f^{k−3}(z_γ), …, w_1, …, w_ω.
Note that in this table the elements in the i-th row from the bottom are in W_i. Also, every element in the table is mapped by f to the element lying immediately below it, the elements in the bottom row being mapped to 0_V. Now order this basis by taking the first column starting at the bottom, then the second column starting at the bottom, and so on. Then it is readily seen that the ordered basis B that we obtain in this way is such that the matrix of f relative to B is a Jordan block associated with the eigenvalue 0. ◊
REDUCTION TO JORDAN FORM
Example  To illustrate the above argument, consider the mapping f : ℝ⁴ → ℝ⁴ given by f(a, b, c, d) = (0, a, d, 0). We have f² = 0 so f is nilpotent of index 2. Now
V_1 = Ker f = {(0, b, c, 0) ; b, c ∈ ℝ},
V_2 = Ker f² = ℝ⁴.
A basis for V_1 is B_1 = {(0,1,0,0), (0,0,1,0)}, which we extend to a basis
B_2 = {(0,1,0,0), (0,0,1,0), (1,0,0,0), (0,0,0,1)}
of ℝ⁴. Now consider T_2 = {(1,0,0,0), (0,0,0,1)}. We have
f(T_2) = {(0,1,0,0), (0,0,1,0)}
and B_0 ∪ f(T_2) = B_1. We then form the basis (1,0,0,0), (0,0,0,1), (0,1,0,0), (0,0,1,0) of ℝ⁴ and order it as follows :
B = {(0,1,0,0), (1,0,0,0), (0,0,1,0), (0,0,0,1)}.
The transition matrix from B to the standard basis is
P = | 0 1 0 0 |
    | 1 0 0 0 |
    | 0 0 1 0 |
    | 0 0 0 1 |.
Now P⁻¹ = P and the matrix of f relative to the standard basis is
A = | 0 0 0 0 |
    | 1 0 0 0 |
    | 0 0 0 1 |
    | 0 0 0 0 |.
So the Jordan block is given by
P⁻¹AP = | 0 1 0 0 |
        | 0 0 0 0 |
        | 0 0 0 1 |
        | 0 0 0 0 |.
In practice, we rarely have to carry out the above computation. To discover why, let us look more closely at the proof of 4.4. Observe that
|T_k| = α = n_k − n_{k−1};
|T_{k−1}| = α + β = n_{k−1} − n_{k−2};
|T_{k−2}| = α + β + γ = n_{k−2} − n_{k−3};
and consequently
α + β + γ + ⋯ + ω = n_1 = dim Ker f,
where n_i = dim W_i = dim Ker f^i, as can be seen by referring to the basis displayed above. The number of elements in the bottom row of this display is dim Ker f. Now from this basis it is clear that there are α ≥ 1 elementary Jordan matrices of size k × k involved, then β ≥ 0 of size (k − 1) × (k − 1), and so on. So we conclude from the above observation that the number of elementary Jordan matrices appearing is dim Ker f. Returning to our Example, we see that Ker f has dimension 2, so there are precisely two elementary Jordan matrices involved. Since at least one has to be of size k × k = 2 × 2, the only possibility for the Jordan block is
| 0 1 0 0 |
| 0 0 0 0 |
| 0 0 0 1 |
| 0 0 0 0 |.
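The block count dim Ker f for the Example can itself be found mechanically; a small sketch (helpers ours) computes the rank of f from the images of the standard basis vectors.

```python
# Count elementary Jordan matrices for f(a, b, c, d) = (0, a, d, 0):
# the count equals dim Ker f = 4 - rank f.

def f(v):
    a, b, c, d = v
    return (0, a, d, 0)

basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
images = [f(e) for e in basis]
# For this particular f the distinct non-zero images are standard basis
# vectors, hence independent, so counting them gives the rank.
rank = len({im for im in images if any(im)})
print(4 - rank)  # 2 = dim Ker f, the number of elementary Jordan matrices
```

In general one would row-reduce the image vectors rather than rely on them being obviously independent; the shortcut here is specific to this example.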
Let us now apply 4.4 to the mappings f_i − λ_i id_{V_i} of 2.12. Note that by 2.10 the minimum polynomial of f_i is m_{f_i} = (X − λ_i)^{e_i}. Consequently we have that the mapping f_i − λ_i id_{V_i} is nilpotent of index e_i on the d_i-dimensional subspace V_i.
4.5 Theorem  Let V be a non-zero finite-dimensional vector space over a field F. If f : V → V is linear and if all the eigenvalues of f lie in F then there is an ordered basis of V with respect to which the matrix of f is a block diagonal matrix
| A_1          |
|     A_2      |
|         ⋱    |
|          A_k |
in which every A_i is a Jordan block.

Proof  With the usual notation, if we apply 4.4 to the nilpotent mapping f_i − λ_i id_{V_i} then we see that there is a basis of V_i = Ker(f − λ_i id_V)^{e_i} with respect to which the matrix of f_i − λ_i id_{V_i} is a Jordan block with 0 down the diagonal (since the only eigenvalue of a nilpotent mapping is 0). It follows that the matrix of f_i is a Jordan block with λ_i down the diagonal.
◊

Definition  A matrix of the form described in 4.5 is called a Jordan canonical matrix of f.

Of course a Jordan canonical matrix is, strictly speaking, not unique, since the order in which the Jordan blocks A_i appear down the diagonal is not specified. However, the number of such blocks, the size of each block, and the number of elementary Jordan matrices that appear in each block, are uniquely determined by f. So, provided we ignore the order of the blocks, we can choose to talk of 'the' Jordan matrix that represents f. This is often also called the Jordan normal form. If the characteristic and minimum polynomials of f are
χ_f = ∏_{i=1}^{k} (X − λ_i)^{d_i},    m_f = ∏_{i=1}^{k} (X − λ_i)^{e_i},
then from the previous discussion we have that, in the Jordan form, the eigenvalue λ_i appears d_i times in the diagonal, and the number of elementary Jordan matrices associated with λ_i is dim Ker(f_i − λ_i id_{V_i}), which is the geometric multiplicity of the eigenvalue λ_i. Moreover, at least one of these elementary Jordan matrices is of size e_i × e_i.
Example  Let f : ℝ⁷ → ℝ⁷ be linear with characteristic and minimum polynomials
χ_f = (X − 1)³(X − 2)⁴,    m_f = (X − 1)²(X − 2)³.
In any Jordan matrix that represents f the eigenvalue 1 appears three times in the diagonal, with at least one associated elementary Jordan matrix being of size 2 × 2; and the eigenvalue 2 appears four times in the diagonal, with at least one associated elementary Jordan matrix of size 3 × 3. Up to the order of the blocks, the only possibility is therefore
| 1 1           |
| 0 1           |
|     1         |
|       2 1 0   |
|       0 2 1   |
|       0 0 2   |
|             2 |.
Example  Let us modify the previous Example slightly. Suppose that χ_f is as before and that now
m_f = (X − 1)²(X − 2)².
In this case the eigenvalue 2 appears four times in the diagonal with at least one associated elementary Jordan matrix of size 2 × 2. The possibilities for the Jordan form are then
| 1 1           |
| 0 1           |
|     1         |
|       2 1     |
|       0 2     |
|           2 1 |
|           0 2 |
and
| 1 1           |
| 0 1           |
|     1         |
|       2 1     |
|       0 2     |
|           2   |
|             2 |.
Example  If f : V → V has characteristic polynomial
χ_f = (X − 2)²(X − 3)³
then the possible Jordan forms, obtained by considering all six possible minimum polynomials, are
| A     |
|     B |
where A is one of
| 2 1 |      | 2   |
| 0 2 |,     |   2 |
and B is one of
| 3 1 0 |      | 3 1   |      | 3     |
| 0 3 1 |,     | 0 3   |,     |   3   |
| 0 0 3 |      |     3 |      |     3 |.
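The case analysis in these Examples amounts to listing, for each eigenvalue, the partitions of its algebraic multiplicity d whose largest part is its exponent e in the minimum polynomial. A small illustrative sketch (the function names `partitions` and `block_sizes` are ours, not the book's):

```python
def partitions(n, largest):
    """All partitions of n into parts of size at most `largest`, largest part first."""
    if n == 0:
        return [[]]
    return [[k] + rest
            for k in range(min(n, largest), 0, -1)
            for rest in partitions(n - k, k)]

def block_sizes(d, e):
    """Possible elementary Jordan block sizes: partitions of d with largest part exactly e."""
    return [p for p in partitions(d, e) if p and p[0] == e]

# The Example above: chi_f = (X - 2)^2 (X - 3)^3.
print(block_sizes(2, 2), block_sizes(2, 1))  # [[2]] [[1, 1]]  -- the two choices for A
print(block_sizes(3, 3), block_sizes(3, 2), block_sizes(3, 1))  # [[3]] [[2, 1]] [[1, 1, 1]] -- the three choices for B
```

Applied to the earlier ℝ⁷ Example, `block_sizes(4, 3)` returns only `[[3, 1]]`, confirming the uniqueness claimed there, while `block_sizes(4, 2)` returns the two possibilities `[[2, 2]]` and `[[2, 1, 1]]` of the modified Example.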
We now consider the problem of finding a Jordan basis for f, i.e. a basis of V with respect to which the matrix of f is a Jordan canonical matrix J. This is, of course, equivalent to the problem of finding an invertible matrix P such that P⁻¹AP = J where A is the matrix that represents f relative to some fixed ordered basis. To see how to proceed, it suffices to consider the very special case where the Jordan matrix of f is the t × t matrix
| λ 1         |
|   λ 1       |
|     ⋱  ⋱    |
|        λ 1  |
|          λ  |.
A corresponding basis {v_1, …, v_t} will be such that
f(v_1) = λv_1,
f(v_2) = λv_2 + v_1,
f(v_3) = λv_3 + v_2,
⋮
f(v_{t−1}) = λv_{t−1} + v_{t−2},
f(v_t) = λv_t + v_{t−1}.
Thus, for every t × t elementary Jordan matrix associated with λ we require v_1, …, v_t to be linearly independent with
v_1 ∈ Im(f − λ id) ∩ Ker(f − λ id);    (f − λ id)(v_i) = v_{i−1}    (i = 2, …, t).

Example  Let f : ℝ³ → ℝ³ be given by
f(x, y, z) = (x + y, −x + 3y, −x + y + 2z).
Relative to the standard ordered basis, the matrix of f is
A = |  1 1 0 |
    | −1 3 0 |
    | −1 1 2 |.
We have χ_f = (X − 2)³ and m_f = (X − 2)². The Jordan form is then
J = | 2 1 0 |
    | 0 2 0 |
    | 0 0 2 |.
Now we have
(f − 2 id)(x, y, z) = (−x + y, −x + y, −x + y)
and we have first to choose v_1 ∈ Im(f − 2 id) ∩ Ker(f − 2 id). Clearly, v_1 = (1, 1, 1) will do. Next we have to find v_2, independent of v_1, such that (f − 2 id)(v_2) = v_1. Clearly, v_2 = (1, 2, 1) will do. Finally, we have to choose v_3 ∈ Ker(f − 2 id) with {v_1, v_2, v_3} independent. Clearly, v_3 = (1, 1, 0) will do. Thus a Jordan basis is
B = {(1, 1, 1), (1, 2, 1), (1, 1, 0)}.
The transition matrix from B to the standard basis is
P = | 1 1 1 |
    | 1 2 1 |
    | 1 1 0 |.
We invite the reader to verify that P⁻¹AP = J. An interesting consequence of the Jordan form is the following result.
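Taking up the invitation above, a minimal sketch (the `matmul` helper is ours) checks the equivalent identity AP = PJ, which avoids inverting P:

```python
# Check that P^{-1}AP = J by verifying AP = PJ for the Example's mapping.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1, 0], [-1, 3, 0], [-1, 1, 2]]   # f in the standard basis
P = [[1, 1, 1], [1, 2, 1], [1, 1, 0]]     # columns: the Jordan basis B
J = [[2, 1, 0], [0, 2, 0], [0, 0, 2]]     # the expected Jordan form

print(matmul(A, P) == matmul(P, J))  # True
```

Since P is invertible, AP = PJ is equivalent to P⁻¹AP = J.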
4.6 Theorem  Every square matrix A over ℂ is similar to its transpose.
Proof  Because of the form of the Jordan canonical matrix, it clearly suffices to establish the result when A is an elementary Jordan matrix of the form
| λ 1        |
|   λ ⋱      |
|     ⋱  1   |
|        λ   |.
Now if B = {v_1, …, v_k} is an associated Jordan basis, define
w_i = v_{k−i+1}    (i = 1, …, k)
and consider the ordered basis
B* = {w_1, …, w_k} = {v_k, …, v_1}.
Now it is readily seen that the matrix relative to this basis is A^t. Consequently we have that A is similar to A^t. ◊

We shall now illustrate the usefulness of the Jordan form in solving systems of linear differential equations. It is not our intention to become heavily involved with the theory. A little by way of explanation together with some illustrative examples is all we have in mind. By a system of linear differential equations with constant coefficients we mean a system of equations of the form
x_1′ = a_11 x_1 + a_12 x_2 + ⋯ + a_1n x_n
x_2′ = a_21 x_1 + a_22 x_2 + ⋯ + a_2n x_n
⋮
x_n′ = a_n1 x_1 + a_n2 x_2 + ⋯ + a_nn x_n
where x_1, …, x_n are real functions, x_i′ denotes the derivative of x_i, and a_ij ∈ ℝ for all i, j. These equations can be written in the matrix form
(1)    X′ = AX
where X = [x_1 … x_n]^t ∈ Mat_{n×1}(ℝ) and A = [a_ij]_{n×n}. Suppose that A can be reduced to Jordan normal form J_A, and let P be an invertible matrix such that P⁻¹AP = J_A. Writing Y = P⁻¹X, we have
(2)    X = PY
and so
(3)    Y′ = P⁻¹X′ = P⁻¹AX = P⁻¹APY = J_A Y.
Now the form of J_A means that (3) is a system that is considerably easier to solve for Y; then, by (2), PY is a solution of (1).
Example  Consider the system
x_1′ =  x_1 +  x_2
x_2′ = −x_1 + 3x_2
x_3′ = −x_1 + 4x_2 − x_3,
i.e. X′ = AX where
X = | x_1 |        A = |  1  1  0 |
    | x_2 |,           | −1  3  0 |
    | x_3 |            | −1  4 −1 |.
We have χ_A = (X + 1)(X − 2)² = m_A and so the Jordan form of A is
J_A = | −1  0  0 |
      |  0  2  1 |
      |  0  0  2 |.
We now determine an invertible matrix P such that P⁻¹AP = J_A. For this, we determine a Jordan basis. Let us do so with
matrices rather than mappings, for a change. Clearly, we have to find independent column vectors p_1, p_2, p_3 such that
(A + I_3)p_1 = 0,    (A − 2I_3)p_2 = 0,    (A − 2I_3)p_3 = p_2.
Suitable vectors are, for example,
p_1 = | 0 |      p_2 = | 1 |      p_3 = | −1 |
      | 0 |,           | 1 |,           |  0 |
      | 1 |            | 1 |            |  0 |.
Thus we can take
P = | 0  1 −1 |
    | 0  1  0 |
    | 1  1  0 |.
(Check that P⁻¹AP = J_A or, equivalently, that AP = PJ_A.) With Y = P⁻¹X we now solve Y′ = J_A Y, i.e.
y_1′ = −y_1,
y_2′ = 2y_2 + y_3,
y_3′ = 2y_3.
The first and third of these equations give y_1 = a_1 e^{−t} and y_3 = a_3 e^{2t}, and the second equation becomes
y_2′ = 2y_2 + a_3 e^{2t},
so that y_2 = a_3 t e^{2t} + a_2 e^{2t}. Consequently we see that
Y = | a_1 e^{−t}                |
    | a_2 e^{2t} + a_3 t e^{2t} |
    | a_3 e^{2t}                |.
A solution of the original system of equations is then given by
X = PY = | a_2 e^{2t} + a_3 (t − 1) e^{2t}          |
         | a_2 e^{2t} + a_3 t e^{2t}                |
         | a_1 e^{−t} + a_2 e^{2t} + a_3 t e^{2t}   |.
CHAPTER
FIVE
The rational and classical forms
Although in general the minimum polynomial of a linear mapping f : V → V can be expressed as a product of powers of irreducible polynomials over the ground field F of V, say
m_f = p_1^{e_1} p_2^{e_2} ⋯ p_k^{e_k},
the irreducible polynomials p_i need not be linear. Put another way, the eigenvalues of f need not in general all lie in the ground field F. It is natural, therefore, to seek a canonical matrix representation for f in the general case, one which will reduce to the Jordan representation when all the eigenvalues of f do belong to F. In order to develop the machinery to deal with this, we first consider the following notion. Suppose that W is a subspace of the vector space V. Then in particular W is a (normal) subgroup of the additive group of V and so we can form the quotient group V/W. The elements of this are the cosets
x + W = {x + w ; w ∈ W},
and the group operation is given by
(x + W) + (y + W) = (x + y) + W.
Now we can define a multiplication by scalars on V/W by setting
λ(x + W) = λx + W.
With respect to this, it is readily seen that V/W becomes a vector space over F. We call this the quotient space of V by W
and denote it also by V/W.
5.1 Theorem  If V is a finite-dimensional vector space and W is a subspace of V then the quotient space V/W is also finite-dimensional. Moreover, if {v_1, …, v_m} is a basis of W and {x_1 + W, …, x_k + W} is a basis of V/W then
B = {v_1, …, v_m, x_1, …, x_k}
is a basis of V.
Proof  The natural mapping ♮ : V → V/W is given by ♮(x) = x + W and is linear. In fact,
♮(x + y) = x + y + W = (x + W) + (y + W) = ♮(x) + ♮(y);
♮(λx) = λx + W = λ(x + W) = λ♮(x).
Suppose now that {z_1 + W, …, z_k + W} is any linearly independent subset of V/W. Then the set {z_1, …, z_k} of coset representatives is a linearly independent subset of V. For, suppose that Σ_{i=1}^{k} λ_i z_i = 0_V. Then, using the linearity of ♮, we have
0_{V/W} = ♮(0_V) = ♮(Σ_{i=1}^{k} λ_i z_i) = Σ_{i=1}^{k} λ_i ♮(z_i) = Σ_{i=1}^{k} λ_i (z_i + W)
and so each λ_i = 0. Consequently k ≤ dim V and V/W is of finite dimension. Consider now the set B. Applying ♮ to any linear combination of elements of B we see as above that B is linearly independent. Now for every z ∈ V we have ♮(z) ∈ V/W so there exist scalars λ_i such that
z + W = Σ_{i=1}^{k} λ_i (x_i + W) = (Σ_{i=1}^{k} λ_i x_i) + W
and hence z − Σ_{i=1}^{k} λ_i x_i ∈ W, so that
z − Σ_{i=1}^{k} λ_i x_i = Σ_{j=1}^{m} μ_j v_j.
Thus B also spans V and hence is a basis. ◊
5.2 Corollary  dim V = dim W + dim V/W. ◊

5.3 Corollary  If V = W ⊕ Z then Z ≅ V/W.

Proof  We have dim V = dim W + dim Z so, by 5.2, dim Z = dim V/W, and it follows that Z ≅ V/W. ◊
We shall be particularly interested in the quotient space V/W when W is a subspace that is f-invariant. In this situation we have the following result.
5.4 Theorem  Let V be a finite-dimensional vector space and let f : V → V be linear. If W is an f-invariant subspace of V then the prescription
f′(x + W) = f(x) + W
defines a linear mapping f′ : V/W → V/W, the minimum polynomial of which divides the minimum polynomial of f.

Proof  Observe that if x + W = y + W then x − y ∈ W and so, since W is f-invariant,
f(x) − f(y) = f(x − y) ∈ W,
which gives f(x) + W = f(y) + W. Thus f′ indeed defines a mapping from V/W to itself. To see that f′ is linear we observe that
f′[(x + W) + (y + W)] = f(x + y) + W = f(x) + f(y) + W = [f(x) + W] + [f(y) + W];
f′[λ(x + W)] = f(λx) + W = λf(x) + W = λ[f(x) + W].
Now for all positive integers n we have (fⁿ)′ = (f′)ⁿ. This is readily seen by induction. For the anchor point n = 1 the result is trivial; and for the inductive step we have
(f^{n+1})′(x + W) = f^{n+1}(x) + W = f[fⁿ(x)] + W = f′[fⁿ(x) + W] = f′[(f′)ⁿ(x + W)] = (f′)^{n+1}(x + W).
Thus, for every polynomial p = Σ a_i Xⁱ we have [p(f)]′ = p(f′). Consequently, taking p = m_f, we see that 0 = m_f(f′) and hence that m_{f′} | m_f. ◊
Definition  We call f′ : V/W → V/W the linear mapping induced by f on the quotient space V/W.

We shall now consider a particular type of f-invariant subspace. Let x be a non-zero element of V and consider the set Z_x of all elements of V of the form p(f)(x) where p ranges over all polynomials in F[X]. It is clear that Z_x is a subspace of V, and that it is f-invariant.

Example  Let f : ℝ³ → ℝ³ be given by
f(x, y, z) = (−y + 2z, x + z, 2z).
Consider the element (1, 0, 0). We have f(1, 0, 0) = (0, 1, 0) and f²(1, 0, 0) = (−1, 0, 0) = −(1, 0, 0), from which it follows that
Z_{(1,0,0)} = {(x, y, 0) ; x, y ∈ ℝ}.
Our immediate objective is to discover a basis for the subspace Z_x. For this purpose, consider the sequence
x, f(x), f²(x), …, fⁿ(x), …
of elements of Z_x. Clearly, there exists a least positive integer k such that f^k(x) is a linear combination of the elements that precede it in this list, say
f^k(x) = λ_0 x + λ_1 f(x) + ⋯ + λ_{k−1} f^{k−1}(x),
and {x, f(x), …, f^{k−1}(x)} is then a linearly independent subset of Z_x. Writing a_i = −λ_i for i = 0, …, k − 1, we deduce that the polynomial
m_x = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k
is the monic polynomial of least degree such that m_x(f) 'annihilates' x, in the sense that m_x(f)(x) = 0_V.

Definition  We call m_x the f-annihilator of x.

Example  Referring to the previous Example, let x = (1, 0, 0). Then we have f²(x) = −x. It follows that the f-annihilator of x is m_x = X² + 1.

With the above notation, we have the following result.
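The computation in this Example can be sketched directly (the function `f` mirrors the Example's mapping; the rest of the names are ours):

```python
# For f(x, y, z) = (-y + 2z, x + z, 2z) and x = (1, 0, 0), check that
# f^2(x) = -x, so the f-annihilator of x is m_x = X^2 + 1.

def f(v):
    x, y, z = v
    return (-y + 2 * z, x + z, 2 * z)

x0 = (1, 0, 0)
fx = f(x0)
ffx = f(fx)
print(fx)   # (0, 1, 0)
print(ffx)  # (-1, 0, 0), i.e. f^2(x) + x = 0
```

In general one would row-reduce the vectors x, f(x), f²(x), … until the first linear dependence appears; here the dependence f²(x) = −x is visible at the second step.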
5.5 Theorem  If x ∈ V has f-annihilator
m_x = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k
then the set B_x = {x, f(x), …, f^{k−1}(x)} is a basis of Z_x, so that dim Z_x = deg m_x. Moreover, if f_x : Z_x → Z_x is the induced linear mapping on the f-invariant subspace Z_x then the matrix of f_x relative to the basis B_x is
C_{m_x} = | 0 0 ⋯ 0 −a_0     |
          | 1 0 ⋯ 0 −a_1     |
          | 0 1 ⋯ 0 −a_2     |
          | ⋮ ⋮    ⋮  ⋮      |
          | 0 0 ⋯ 1 −a_{k−1} |.
Finally, the minimum polynomial of f_x is m_x.
Proof  Clearly, B_x is linearly independent and f^k(x) ∈ ⟨B_x⟩. We prove by induction that fⁿ(x) ∈ ⟨B_x⟩ for every n. This is clear for n = 1, …, k. Suppose then that n > k and that f^{n−1}(x) ∈ ⟨B_x⟩. Then f^{n−1}(x) is a linear combination of x, f(x), …, f^{k−1}(x) and so fⁿ(x) = f[f^{n−1}(x)] is a linear combination of f(x), f²(x), …, f^k(x). Since f^k(x) ∈ ⟨B_x⟩ it follows that fⁿ(x) ∈ ⟨B_x⟩. It is immediate from this observation that p(f)(x) ∈ ⟨B_x⟩ for every polynomial p. Thus Z_x ⊆ ⟨B_x⟩, whence we have equality, the reverse inclusion being obvious. It now follows that B_x is a basis of Z_x. Since
f_x(x) = f(x),
f_x[f(x)] = f²(x),
⋮
f_x[f^{k−2}(x)] = f^{k−1}(x),
f_x[f^{k−1}(x)] = f^k(x) = −a_0 x − a_1 f(x) − ⋯ − a_{k−1} f^{k−1}(x),
it is clear that the matrix of f_x relative to the basis B_x is the above matrix C_{m_x}. Finally, suppose that the minimum polynomial of f_x is
b_0 + b_1 X + ⋯ + b_{r−1} X^{r−1} + X^r.
Then we have
0_V = m_{f_x}(f_x)(x) = m_{f_x}(f)(x) = b_0 x + ⋯ + b_{r−1} f^{r−1}(x) + f^r(x),
from which f^r(x) is a linear combination of x, f(x), …, f^{r−1}(x) and therefore k ≤ r. But m_x(f) is the zero map on Z_x, whence so is m_x(f_x). Consequently we have m_{f_x} | m_x and so r ≤ k. Thus r = k and it follows that m_{f_x} = m_x. ◊

Definition  We shall call Z_x the f-cyclic subspace spanned by {x}, and C_{m_x} the companion matrix of the f-annihilator m_x. Any basis of the form B_x will be called a cyclic basis, and x will be called a cyclic vector. A subspace that has a cyclic basis will be called a cyclic subspace.

Our first main objective can now be revealed. It is to prove that if f : V → V has minimum polynomial of the form p^e where p is irreducible then V can be expressed as a direct sum of f-cyclic subspaces, the main consequence of this being that f then has a block diagonal representation by companion matrices. Before establishing these facts, we require the following observation.

5.6 Theorem  Let W be an f-invariant subspace of V. Then both the f-annihilator of x and the f′-annihilator of x + W divide the minimum polynomial of f.

Proof  By 5.5, the f-annihilator of x is the minimum polynomial of f_x, the mapping induced on Z_x by f, which clearly divides the minimum polynomial of f. As for the f′-annihilator of x + W, this likewise divides the minimum polynomial of f′ which, by 5.4, divides that of f. ◊

5.7 Theorem [Cyclic Decomposition]  Let V be a non-zero vector space of finite dimension and let f : V → V be linear with minimum polynomial m_f = p^e where p is irreducible. Then there are cyclic vectors x_1, …, x_k and positive integers n_1, …, n_k, each n_i ≤ e, such that V = Z_{x_1} ⊕ ⋯ ⊕ Z_{x_k}.

Proof  The proof is by induction on the dimension of V. Suppose that the result holds for all spaces of dimension less than n (n > 1) and let V be of dimension n.
As m_f = p^e, there is a non-zero x_1 ∈ V with p^{e−1}(f)(x_1) ≠ 0_V. The f-annihilator of x_1 is then m_{x_1} = p^e. Let W = Z_{x_1} and let f′ : V/W → V/W be the induced mapping. By 5.4, the minimum polynomial of f′ divides m_f = p^e and so the inductive hypothesis applies to f′ and V/W. Thus there exist f′-cyclic subspaces Z_{y_2+W}, …, Z_{y_k+W} of V/W such that
V/W = ⊕_{i=2}^{k} Z_{y_i+W}
and, for 2 ≤ i ≤ k, the f′-annihilator of y_i + W is a power of p.

5.9 Corollary  dim V = (n_1 + ⋯ + n_k) deg p. ◊

Without loss of generality, we can assume that the cyclic vectors x_1, …, x_k of 5.7 are arranged such that the corresponding integers n_i satisfy
e = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1.
With this convention, we have :
5.10 Theorem  n_1, …, n_k are uniquely determined by f.

Proof  From the above we have, for every i,
dim Z_{x_i} = deg m_{x_i} = deg p^{n_i} = d n_i,
where d = deg p. Observe that for every integer j the image of Z_{x_i} under p(f)^j is the f-cyclic subspace Z_{p(f)^j(x_i)}. Since the f-annihilator of x_i is p^{n_i}, of degree d n_i, we see that the dimension of Z_{p(f)^j(x_i)} is 0 if j ≥ n_i, and is d(n_i − j) if j < n_i. Now every x ∈ V can be written uniquely in the form
x = u_1 + ⋯ + u_k    (u_i ∈ Z_{x_i})
and so every element of Im p(f)^j can be written uniquely in the form
p(f)^j(x) = p(f)^j(u_1) + ⋯ + p(f)^j(u_k).
Thus, if r is the integer such that n_1, …, n_r ≥ j and n_{r+1} < j, then
Im p(f)^j = Z_{p(f)^j(x_1)} ⊕ ⋯ ⊕ Z_{p(f)^j(x_r)}.
It follows from this that
dim Im p(f)^{j−1} − dim Im p(f)^j = d ( Σ_{n_i ≥ j−1} (n_i − j + 1) − Σ_{n_i ≥ j} (n_i − j) )
                                  = d Σ_{n_i ≥ j} (n_i − j + 1 − n_i + j)
                                  = d Σ_{n_i ≥ j} 1
                                  = d × the number of n_i ≥ j.
Now the dimensions on the left are determined by f, so the above expression gives, for each j, the number of n_i that are greater than or equal to j. This determines the sequence
e = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1
completely. ◊
Definition  If the minimum polynomial of f is of the form p^e where p is irreducible then, relative to the uniquely determined chain of integers e = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1, the polynomials p^e = p^{n_1}, p^{n_2}, …, p^{n_k} are called the elementary divisors of f. It should be noted that the first elementary divisor in the sequence is the minimum polynomial of f.

We can now apply the above results to the general situation where the characteristic and minimum polynomials of a linear mapping f : V → V are
χ_f = p_1^{d_1} p_2^{d_2} ⋯ p_k^{d_k},    m_f = p_1^{e_1} p_2^{e_2} ⋯ p_k^{e_k},
where p_1, …, p_k are distinct irreducible polynomials. We know by the Primary Decomposition Theorem that there is a basis of V with respect to which the matrix of f is a block diagonal matrix
| A_1          |
|     A_2      |
|         ⋱    |
|          A_k |
in which A_i is the matrix (of size d_i deg p_i × d_i deg p_i) that represents the induced mapping f_i on V_i = Ker p_i(f)^{e_i}. Now the minimum polynomial of f_i is p_i^{e_i} and so, by the Cyclic Decomposition Theorem, there is a basis of V_i with respect to which A_i is the block diagonal matrix
| C_{i1}            |
|       C_{i2}      |
|             ⋱     |
|               C_{ir} |
in which the C_{ij} are the companion matrices associated with the elementary divisors of f_i. By the previous discussion, this block diagonal form, in which each block A_i is itself a block diagonal of companion matrices, is unique (to within the order of the A_i). It is called the rational canonical matrix of f. It is important to note that in the sequence of elementary divisors there can be repetitions, for some of the n_j can be equal. The result of this is that some companion matrices can appear more than once in the rational form.
Example  Suppose that f : ℝ⁴ → ℝ⁴ has minimum polynomial
m_f = X² + 1.
Then χ_f = (X² + 1)². By 5.9 we have 4 = (n_1 + ⋯ + n_k)·2. Since the first elementary divisor is the minimum polynomial, we must have n_1 = 1. Since we must also have each n_i ≥ 1, it follows that the only possibility is k = 2 with n_1 = n_2 = 1. The rational canonical matrix of f is therefore
C_{X²+1} ⊕ C_{X²+1},    where C_{X²+1} = | 0 −1 |
                                         | 1  0 |.

Example  Suppose now that f : ℝ⁶ → ℝ⁶ has minimum polynomial
m_f = (X² + 1)(X − 2)².
The characteristic polynomial of f is then one of
χ_1 = (X² + 1)²(X − 2)²,    χ_2 = (X² + 1)(X − 2)⁴.
Suppose first that χ_f = χ_1. Then, arguing exactly as in the previous Example, we see that the rational canonical matrix is
C_{X²+1} ⊕ C_{X²+1} ⊕ C_{(X−2)²}.
Suppose now that χ_f = χ_2. In this case we know that ℝ⁶ = V_1 ⊕ V_2 with dim V_1 = 2 and dim V_2 = 4. Also, the induced mapping f_2 on V_2 has minimum polynomial (X − 2)². By 5.9 applied to f_2 : V_2 → V_2 we have 4 = n_1 + ⋯ + n_k with n_1 = 2. There are therefore two possibilities, namely
k = 2 with n_1 = n_2 = 2;
k = 3 with n_1 = 2, n_2 = n_3 = 1.
The rational canonical matrix of f is therefore of one of the forms
C_{X²+1} ⊕ C_{(X−2)²} ⊕ C_{(X−2)²},
C_{X²+1} ⊕ C_{(X−2)²} ⊕ C_{X−2} ⊕ C_{X−2}.
Note from the above Example that a knowledge of both the characteristic and the minimum polynomials is not in general enough to determine completely the rational form.
Note also that the rational form is quite different from the Jordan form. To see this, let us take a matrix in Jordan form and find its rational form.

Example  Consider the matrix
A = | 2 1 0 |
    | 0 2 1 |
    | 0 0 2 |.
We have χ_A = (X − 2)³ = m_A and, by 5.9, 3 = n_1 + ⋯ + n_k with n_1 = 3. Thus k = 1 and the rational form is
C_{(X−2)³} = | 0 0   8 |
             | 1 0 −12 |
             | 0 1   6 |.
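As a consistency check, the companion matrix just found must satisfy its defining polynomial (X − 2)³ = X³ − 6X² + 12X − 8. A short sketch (the `matmul` helper is ours):

```python
# Check that C = C_{(X-2)^3} satisfies X^3 - 6X^2 + 12X - 8.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

C = [[0, 0, 8], [1, 0, -12], [0, 1, 6]]
C2 = matmul(C, C)
C3 = matmul(C, C2)
Z = [[C3[i][j] - 6 * C2[i][j] + 12 * C[i][j] - 8 * (i == j)
      for j in range(3)] for i in range(3)]
print(Z)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

This is exactly Cayley–Hamilton for a companion matrix: the characteristic (and here minimum) polynomial of C_p is p itself.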
The fact that the rational form is quite different from the Jordan form suggests that we are not yet finished, for what we want is a general canonical form that will reduce to the Jordan form when the eigenvalues lie in the ground field. We shall now obtain such a form by modifying the cyclic bases used to obtain the rational form. In so doing, we shall obtain a matrix representation constructed from the companion matrix of p_i rather than those of p_i^{n_j}.

5.11 Theorem  Let x be a cyclic vector of V and let f : V → V have minimum polynomial pⁿ where
p = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k.
Then there is a basis of V with respect to which the matrix of f is the kn × kn matrix
| C_p M          |
|     C_p M      |
|         ⋱   M  |
|           C_p  |
in which C_p is the companion matrix of p, and M is the k × k matrix whose only non-zero entry is a 1 in its top right-hand corner.

Proof
Consider the kn elements
p(f)^{n−1}(x), p(f)^{n−1}[f(x)], …, p(f)^{n−1}[f^{k−1}(x)],
⋮
p(f)(x), p(f)[f(x)], …, p(f)[f^{k−1}(x)],
x, f(x), …, f^{k−1}(x)
of V. To show that this set is a basis of V it suffices to show that it is linearly independent. Suppose that it were not so. Then some non-trivial linear combination of these elements would be 0_V and so there would exist a polynomial h such that h(f)(x) = 0_V with deg h < kn = deg pⁿ. Since x is cyclic, this contradicts the assumption that pⁿ is the minimum polynomial of f. We order this basis in a row-by-row manner, as we normally read. Now f maps each element in the above array to its successor in the same row, except those at the end of a row. For these elements we have, for example,
f[f^{k−1}(x)] = f^k(x) = −a_0 x − a_1 f(x) − ⋯ − a_{k−1} f^{k−1}(x) + p(f)(x).
It is now an easy matter to verify that the matrix of f relative to the above basis is of the form described. ◊

Definition  A block matrix of the form described in 5.11 will be called a classical p-matrix associated with the companion matrix C_p.

Applying 5.11 to the cyclic subspaces appearing in the Cyclic Decomposition Theorem, we see that in the rational canonical matrix of f we can replace each diagonal block of companion matrices associated with the elementary divisors p_i^{n_j} by a classical p_i-matrix associated with the companion matrix of p_i. This gives another canonical matrix which we call the classical canonical matrix of f.
Example  Let f : ℝ⁴ → ℝ⁴ be such that
χ_f = m_f = (X² − X + 1)².
Then the rational canonical matrix of f is
C_{(X²−X+1)²} = | 0 0 0 −1 |
                | 1 0 0  2 |
                | 0 1 0 −3 |
                | 0 0 1  2 |
and the classical canonical matrix is
| 0 −1  0  1 |
| 1  1  0  0 |
| 0  0  0 −1 |
| 0  0  1  1 |.
Finally, let us note that if in 5.11 we have p = X − a (so that k = 1 and f − a id_V is nilpotent of index n) then C_p is the 1 × 1 matrix [a] and the classical p-matrix associated with C_p reduces to the n × n elementary Jordan matrix associated with the eigenvalue a. Thus the classical form reduces to the Jordan form when the eigenvalues belong to the ground field.
CHAPTER
SIX
Dual spaces
If V and W are vector spaces over a field F then the set Lin(V, W) of linear mappings from V to W is also a vector space over F : if f, g ∈ Lin(V, W) define f + g and λf by
(f + g)(x) = f(x) + g(x),    (λf)(x) = λf(x),
and observe that f + g, λf belong to Lin(V, W). A particular case of this is of especial importance, namely that in which for W we take the ground field F (regarded as a vector space over itself). It is on this vector space Lin(V, F) that we shall now focus our attention.

Definition  By the dual space of V we shall mean the vector space Lin(V, F), which we shall denote by V^d. The elements of V^d, i.e. the linear mappings f : V → F, will be called linear functionals (or linear forms) on V.

Example  The i-th projection p_i given by p_i(x_1, …, x_n) = x_i is a linear functional on ℝⁿ so is an element of (ℝⁿ)^d.

Example  If V = Mat_{n×n}(ℂ) then the trace mapping T : V → ℂ given by T(A) = Σ_{i=1}^{n} a_ii is a linear functional on V so is an element of V^d.

Example  The mapping I : ℝ[X] → ℝ given by I(p) = ∫₀¹ p(x) dx is a linear functional on ℝ[X] so is an element of ℝ[X]^d.

In what follows, we shall denote a typical element of V^d by x^d. Thus the notation x^d will be used to denote a linear mapping from V to the ground field F. We begin by showing that if V is of finite dimension then so is the dual space V^d. This we do by constructing a basis of V^d from a basis of V.
6.1 Theorem  Let {v_1, …, v_n} be a basis of V and for i = 1, …, n let v_i^d : V → F be the linear mapping such that
v_i^d(v_j) = δ_ij.
Then {v_1^d, …, v_n^d} is a basis of V^d.

Proof  It is clear that v_i^d ∈ V^d. Suppose that Σ_{i=1}^{n} λ_i v_i^d = 0 in V^d. Then for j = 1, …, n we have
0 = (Σ_{i=1}^{n} λ_i v_i^d)(v_j) = Σ_{i=1}^{n} λ_i v_i^d(v_j) = Σ_{i=1}^{n} λ_i δ_ij = λ_j,
and so {v_1^d, …, v_n^d} is linearly independent. If
x = Σ_{j=1}^{n} x_j v_j ∈ V
then we have
(∗)    v_i^d(x) = Σ_{j=1}^{n} x_j v_i^d(v_j) = Σ_{j=1}^{n} x_j δ_ij = x_i,
and hence, for every f ∈ V^d,
(Σ_{i=1}^{n} f(v_i) v_i^d)(x) = Σ_{i=1}^{n} f(v_i) v_i^d(x) = Σ_{i=1}^{n} f(v_i) x_i = f(Σ_{i=1}^{n} x_i v_i) = f(x).
Thus we see that
(∗∗)    (∀f ∈ V^d)    f = Σ_{i=1}^{n} f(v_i) v_i^d,
which shows that {v_1^d, …, v_n^d} also spans V^d, whence it is a basis. ◊

6.2 Corollary  If dim V is finite then dim V^d = dim V. ◊

Note from (∗) and (∗∗) in the above proof that
(∀x ∈ V)    x = Σ_{i=1}^{n} v_i^d(x) v_i;
(∀x^d ∈ V^d)    x^d = Σ_{i=1}^{n} x^d(v_i) v_i^d.
Definition  If {v_1, …, v_n} is a basis of V then we shall say that the basis {v_1^d, …, v_n^d} of V^d described in 6.1 is the corresponding dual basis. Because of (∗) above, the mappings v_1^d, …, v_n^d are often called the coordinate forms associated with v_1, …, v_n.
Example  Consider the basis {v_1, v_2} of ℝ² where v_1 = (1, 2) and v_2 = (2, 3). Let {v_1^d, v_2^d} be the dual basis. Then we have
1 = v_1^d(v_1) = v_1^d(1, 2) = v_1^d(1, 0) + 2v_1^d(0, 1);
0 = v_1^d(v_2) = v_1^d(2, 3) = 2v_1^d(1, 0) + 3v_1^d(0, 1).
These equations give v_1^d(1, 0) = −3 and v_1^d(0, 1) = 2, and hence v_1^d is given by
v_1^d(x, y) = −3x + 2y.
Similarly, we have
v_2^d(x, y) = 2x − y.
v3(z,y) = 22 —y. Example Consider the standard basis {e1,...,én} of IR". By definition, we have e¢(e;) = 6;; and so nm
e?(21,. tiging)
(>
nm
2;¢;) sat ig a;e¢(e;) =f;
g=k
g=1
whence the dual basis is the set of projections {p1,..., pn}. pae( Myync Mu. J
Example  Let t_1, …, t_{n+1} be n + 1 distinct real numbers and for each i let ε_i : ℝ_n[X] → ℝ be the substitution mapping given by ε_i(p) = p(t_i). Then
B = {ε_1, …, ε_{n+1}}
is a basis for ℝ_n[X]^d. In fact, since ℝ_n[X]^d has the same dimension as ℝ_n[X], namely n + 1, it suffices to prove that B is linearly independent. But if Σ_{i=1}^{n+1} λ_i ε_i = 0 then
0 = (Σ_{i=1}^{n+1} λ_i ε_i)(1) = λ_1 + λ_2 + ⋯ + λ_{n+1},
0 = (Σ_{i=1}^{n+1} λ_i ε_i)(X) = λ_1 t_1 + λ_2 t_2 + ⋯ + λ_{n+1} t_{n+1},
⋮
0 = (Σ_{i=1}^{n+1} λ_i ε_i)(Xⁿ) = λ_1 t_1ⁿ + λ_2 t_2ⁿ + ⋯ + λ_{n+1} t_{n+1}ⁿ.
The coefficient matrix of this system of equations is the Vandermonde matrix
M = | 1        1        ⋯  1          |
    | t_1      t_2      ⋯  t_{n+1}    |
    | ⋮        ⋮            ⋮         |
    | t_1ⁿ     t_2ⁿ     ⋯  t_{n+1}ⁿ   |.
By induction, it can be shown that det M = ∏_{j<i} (t_i − t_j). Since the t_i are distinct this is non-zero, whence each λ_i = 0.

6.7 Corollary  If f : V → W is an isomorphism then so is f^t : W^d → V^d; moreover, we have (f^t)⁻¹ = (f⁻¹)^t.
Proof  This follows from (1) and (3) on taking g = f⁻¹. ◊

Of course, when V and W are finite-dimensional, 6.6 and 6.7 follow immediately from 6.5 and the corresponding properties of transposition for matrices. We can also consider the transpose of f^t. We denote this by f^{tt} and call it the bitranspose of f. The connection between bitransposes and biduals is the following.

6.8 Theorem  For every linear mapping f : V → W the diagram
          f
     V ───────> W
     │          │
  α_V│          │α_W
     ↓          ↓
  V^{dd} ─────> W^{dd}
         f^{tt}

is commutative, in the sense that f^{tt} ∘ α_V = α_W ∘ f.

Proof  We have to show that f^{tt}(α_V(x)) = α_W(f(x)) for every x ∈ V. Now for all y^d ∈ W^d we have
[f^{tt}(α_V(x))](y^d) = (α_V(x) ∘ f^t)(y^d) = α_V(x)[f^t(y^d)] = ⟨x, f^t(y^d)⟩ = ⟨f(x), y^d⟩ = [α_W(f(x))](y^d),
from which the result follows. ◊
An immediate consequence of 6.8 is that when V and W are of finite dimensions (in which case we agree to identify V with V^{dd} and W with W^{dd}, and therefore also α_V with id_V and α_W with id_W) we have f^{tt} = f. This then matches the matrix situation, where Aᵗᵗ = A.
Definition  If x ∈ V and y^d ∈ V^d are such that ⟨x, y^d⟩ = 0 then we say that x is annihilated by y^d.
Since ⟨x, y^d⟩ = y^d(x), we see that the set of elements of V that are annihilated by y^d is Ker y^d. Now it is immediate from the identities (†), (‡) preceding 6.4 that, for every non-empty subset E of V, the set of elements of V^d that annihilate every element of E is a subspace of V^d. We denote this subspace by E°. Thus
E° = {y^d ∈ V^d ; (∀x ∈ E) ⟨x, y^d⟩ = 0}.
We call E° the annihilator of E. It is clear that {0_V}° = V^d and that V° = {0_{V^d}}. If a^d = Σ_{i=1}^{n} λ_i a_i^d ∈ W° then for j = 1, …, m we have
0 = ⟨a_j, a^d⟩ = Σ_{i=1}^{n} λ_i ⟨a_j, a_i^d⟩ = λ_j.
It follows that {a_{m+1}^d, …, a_n^d} is a basis of W° and consequently
dim W° = n − m = dim V − dim W.
As for the second statement, consider the subspace W°° = (W°)° of V^{dd} = V. By definition, every element of W is annihilated by every element of W°, and so we have W ⊆ W°°. On the other hand, by what we have just proved,
dim W°° = n − dim W° = n − (n − m) = m = dim W.
It follows, therefore, that W = W°°. ◊
Annihilators and transposes are connected :

6.10 Theorem  If V, W are finite-dimensional and f : V → W is linear then
(1) (Im f)° = Ker f^t;
(2) (Ker f)° = Im f^t;
(3) dim Im f^t = dim Im f;
(4) dim Ker f^t = dim Ker f.

Proof  (1) We have y^d ∈ (Im f)° if and only if, for every x ∈ V,
0 = ⟨f(x), y^d⟩ = ⟨x, f^t(y^d)⟩,
which is the case if and only if f^t(y^d) ∈ V° = {0_{V^d}}, i.e. if and only if y^d ∈ Ker f^t.
(2) Replacing f by f^t in (1) and using the fact that f^{tt} = f, we obtain (Im f^t)° = Ker f. Then, by 6.9, (Ker f)° = (Im f^t)°° = Im f^t.
(3), (4) follow from (1), (2), and 6.9. ◊

6.11 Corollary  The row rank and the column rank of a matrix A over a field F are the same.

Proof  If A represents a linear mapping f then Aᵗ represents f^t. The result follows from the fact that the row rank of A is dim Im f and the column rank of A is the row rank of Aᵗ, which is dim Im f^t. ◊
CHAPTER
SEVEN
Inner product spaces
In some aspects of our discussion of vector spaces the ground field F has played no significant role. In this Chapter we shall restrict F to be ℝ or ℂ, the results we obtain depending heavily on the properties of these fields.
Definition Let V be a vector space over ℂ. By an inner product on V we shall mean a mapping f : V × V → ℂ, described by (x, y) ↦ (x | y), such that for all x, x', y ∈ V and all α ∈ ℂ the following identities hold :
(1) (x + x' | y) = (x | y) + (x' | y);
(2) (αx | y) = α(x | y);
(3) (x | y) = \overline{(y | x)}, so that in particular (x | x) ∈ ℝ;
(4) (x | x) ≥ 0, with equality if and only if x = 0_V.
By a complex inner product space we mean a vector space V over ℂ together with an inner product on V. By a real inner product space we mean a vector space V over ℝ together with an inner product on V (this being defined as in the above but with the bar denoting complex conjugation omitted). By an inner product space we shall mean either a complex inner product space or a real inner product space.
There are certain other identities that follow immediately from (1) to (4) above, namely :
(5) (x | y + y') = (x | y) + (x | y'). In fact, by (1) and (3) we have
(x | y + y') = \overline{(y + y' | x)} = \overline{(y | x)} + \overline{(y' | x)} = (x | y) + (x | y').
(6) (x | αy) = \overline{α}(x | y). This follows from (2) and (3) since
(x | αy) = \overline{(αy | x)} = \overline{α(y | x)} = \overline{α}\overline{(y | x)} = \overline{α}(x | y).
(7) (x | 0) = 0 = (0 | x). This is immediate from (1), (2), (5) and (6) on taking x' = −x, y' = −y and α = −1.
Example ℂ^n is a complex inner product space under the mapping described by (z, w) ↦ (z | w) where
(z_1,...,z_n | w_1,...,w_n) = Σ_{j=1}^n z_j \overline{w_j}.
This inner product is called the standard inner product on ℂ^n.
Example ℝ^n is a real inner product space under the corresponding standard inner product given by
(x_1,...,x_n | y_1,...,y_n) = Σ_{i=1}^n x_i y_i.
In the cases where n = 2, 3 this inner product is often called the dot product or scalar product. This terminology is popular when dealing with the geometric application of vectors. Indeed, several of the results that we shall establish will generalise familiar results in euclidean geometry of two and three dimensions.
Example Let a, b ∈ ℝ and let V be the real vector space of continuous functions f : [a, b] → ℝ. Define a mapping from V × V to ℝ by
(f, g) ↦ (f | g) = ∫_a^b fg.
Then this defines an inner product on V.
Example Let ℝ_n[X] be the real vector space of polynomials of degree less than n. Then
(p | q) = ∫_0^1 pq
defines an inner product on ℝ_n[X].
Example
For an n × n matrix A = [a_ij] let tr A = Σ_{i=1}^n a_ii. Then the vector space Mat_{n×n}(ℝ) can be made into a real inner product space by defining
(A | B) = tr(B^t A).
Likewise, Mat_{n×n}(ℂ) can be made into a complex inner product space by defining
(A | B) = tr(B* A),
where B* = \overline{B}^t is the complex conjugate of the transpose of B.
Definition Let V be an inner product space. For every x ∈ V we define the norm of x to be the non-negative real number
||x|| = √(x | x).
Given x, y ∈ V we define the distance between x and y to be d(x, y) = ||x − y||.
Example
In the real inner product space ℝ² under the standard inner product, if x = (x_1, x_2) then ||x||² = x_1² + x_2², so ||x|| is the distance from x to the origin. Likewise, if y = (y_1, y_2) then we have
||x − y||² = (x_1 − y_1)² + (x_2 − y_2)²,
which shows the connection between the general concept of distance and the theorem of Pythagoras.
It is clear from (4) above that ||x|| = 0 if and only if x = 0_V.
7.1 Theorem Let V be an inner product space. Then, for all x, y ∈ V and every scalar λ,
(1) ||λx|| = |λ| ||x||;
(2) [Cauchy–Schwarz inequality] |(x | y)| ≤ ||x|| ||y||;
(3) [Triangle inequality] ||x + y|| ≤ ||x|| + ||y||.
Proof (1) ||λx||² = (λx | λx) = λ\overline{λ}(x | x) = |λ|² ||x||².
(2) The result is trivial if x = 0_V. Suppose then that x ≠ 0_V, so that ||x|| ≠ 0. Let
z = y − ((y | x)/||x||²) x.
Then, noting that (z | x) = 0, we have
0 ≤ ||z||² = (y − ((y | x)/||x||²)x | z) = (y | z) = (y | y) − \overline{((y | x)/||x||²)}(y | x) = ||y||² − |(x | y)|²/||x||²,
from which (2) follows.
(3) This follows from the observation that
||x + y||² = (x + y | x + y) = (x | x) + (x | y) + (y | x) + (y | y)
= ||x||² + (x | y) + \overline{(x | y)} + ||y||²
= ||x||² + 2 Re(x | y) + ||y||²
≤ ||x||² + 2|(x | y)| + ||y||²
≤ ||x||² + 2||x|| ||y|| + ||y||²   by (2)
= (||x|| + ||y||)². ◻
Example
Let V be the set of infinite sequences (a_i)_{i≥1} of real numbers that are square summable in the sense that Σ_{i≥1} a_i² exists. Defining an addition and a multiplication by real scalars in the obvious component-wise manner, we see that V becomes a real vector space. Let (a_i)_{i≥1} and (b_i)_{i≥1} be elements of V. By the Cauchy–Schwarz inequality applied to the inner product space ℝ^k with the standard inner product, we have
Σ_{i=1}^k |a_i b_i| ≤ √(Σ_{i=1}^k a_i²) √(Σ_{i=1}^k b_i²) ≤ √(Σ_{i≥1} a_i²) √(Σ_{i≥1} b_i²),
so the sequence with k-th term Σ_{i=1}^k a_i b_i is absolutely summable and hence is summable. Thus Σ_{i≥1} a_i b_i exists and we can define
((a_i)_{i≥1} | (b_i)_{i≥1}) = Σ_{i≥1} a_i b_i.
In this way, V becomes a real inner product space that is often called ℓ²-space or Hilbert space.
Definition
If V is an inner product space then x, y ∈ V are said to be orthogonal if (x | y) = 0. A non-empty subset S of V is said to be an orthogonal subset of V if every pair of distinct elements of S is orthogonal. An orthonormal subset of V is an orthogonal subset S such that ||x|| = 1 for every x ∈ S, i.e. a set of mutually orthogonal vectors of length 1.
Example Relative to the standard inner products, the standard bases of ℝ^n and of ℂ^n are orthonormal subsets.
Example In ℝ² the elements x = (x_1, x_2) and y = (y_1, y_2) are orthogonal if and only if x_1y_1 + x_2y_2 = 0. Geometrically, this is equivalent to saying that the lines joining x and y to the origin are mutually perpendicular.
Example In the vector space V of real continuous functions on the interval [−π, π] with inner product (f | g) = ∫_{−π}^{π} fg, the set
S = {x ↦ 1, x ↦ sin kx, x ↦ cos kx ; k = 1, 2, 3, ...}
is an orthogonal subset.
It is clear that an orthonormal subset of V can always be obtained from an orthogonal subset S by normalising each element x of S, i.e. by replacing x by x* = x/||x||.
An important property of orthogonal (and hence of orthonormal) sets is the following.
7.2 Theorem Orthogonal sets are linearly independent.
Proof Let S be an orthogonal subset of V and let x_1,...,x_n ∈ S be such that Σ_{i=1}^n λ_i x_i = 0_V. Then for every i we have
λ_i(x_i | x_i) = Σ_{k=1}^n λ_k(x_k | x_i) = (Σ_{k=1}^n λ_k x_k | x_i) = (0_V | x_i) = 0,
from which it follows by (4) above that λ_i = 0. ◻
We now describe properties of the subspace spanned by an orthonormal subset.
7.3 Theorem Let {e_1,...,e_n} be an orthonormal subset of the inner product space V. Then
[Bessel's inequality] (∀x ∈ V) Σ_{k=1}^n |(x | e_k)|² ≤ ||x||².
Moreover, if W is the subspace spanned by {e_1,...,e_n} then the following statements are equivalent :
(1) x ∈ W;
(2) Σ_{k=1}^n |(x | e_k)|² = ||x||²;
(3) x = Σ_{k=1}^n (x | e_k)e_k;
(4) [Parseval's identity] (∀y ∈ V) (x | y) = Σ_{k=1}^n (x | e_k)(e_k | y).
Proof Let z = x − Σ_{k=1}^n (x | e_k)e_k. Then a simple computation gives
0 ≤ (z | z) = (x | x) − Σ_{k=1}^n (x | e_k)\overline{(x | e_k)} = ||x||² − Σ_{k=1}^n |(x | e_k)|²,
which establishes Bessel's inequality.
(2) ⇒ (3) is now immediate since (2) implies that z = 0_V.
(3) ⇒ (4) : If x = Σ_{k=1}^n (x | e_k)e_k then, for all y ∈ V,
(x | y) = (Σ_{k=1}^n (x | e_k)e_k | y) = Σ_{k=1}^n (x | e_k)(e_k | y).
(4) ⇒ (2) follows by taking y = x in (4). (3) ⇒ (1) is clear.
(1) ⇒ (3) : If x = Σ_{k=1}^n λ_k e_k then for i = 1,...,n we have
λ_i = Σ_{k=1}^n λ_k(e_k | e_i) = (Σ_{k=1}^n λ_k e_k | e_i) = (x | e_i). ◻
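Bessel's inequality, and the equality case for elements of W, can be checked on a small example (our own sketch; the orthonormal pair here spans the xy-plane of ℝ³):

```python
import numpy as np

# Orthonormal subset {e1, e2} of R^3 spanning the xy-plane W.
e = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]

x = np.array([3.0, 4.0, 12.0])        # not in W
coeffs = [np.dot(x, ek) for ek in e]  # the Fourier coefficients (x | e_k)

# Bessel: sum |(x|e_k)|^2 <= ||x||^2, strict here since x is not in W.
assert sum(c**2 for c in coeffs) < np.dot(x, x)

w = np.array([3.0, 4.0, 0.0])         # in W: equality (statement (2)) holds
wc = [np.dot(w, ek) for ek in e]
assert np.isclose(sum(c**2 for c in wc), np.dot(w, w))
```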
Definition By an orthonormal basis of an inner product space we mean an orthonormal subset that is a basis.
Example The standard bases of ℝ^n and ℂ^n are orthonormal.
Example In Mat_{n×n}(ℂ) with (A | B) = tr(B*A), an orthonormal basis is {E_pq ; p, q = 1,...,n} where E_pq has a 1 in the (p, q)-th position and 0 elsewhere.
We shall now show that every finite-dimensional inner product space has an orthonormal basis. In so doing, we give a practical method of constructing such a basis.
7.4 Theorem [Gram–Schmidt orthonormalisation process] Let V be an inner product space and for every non-zero x ∈ V let x* = x/||x||. If {x_1,...,x_k} is a linearly independent subset of V, define recursively
y_1 = x_1*;
y_2 = (x_2 − (x_2 | y_1)y_1)*;
y_3 = (x_3 − (x_3 | y_2)y_2 − (x_3 | y_1)y_1)*;
...
y_k = (x_k − Σ_{i=1}^{k−1} (x_k | y_i)y_i)*.
Then {y_1,...,y_k} is orthonormal and spans the same subspace as {x_1,...,x_k}.
is a linear combination of yi,...,y;. Thus {z1,...,2,} and {yi,---,Y¥e} span the same subspace. It now suffices to prove that {y1,...,yx} is an orthogonal subset; and this we do inductively. For k = 1 the result is trivial. Suppose that {y1,..., 4-1} is orthogonal where t > 1. Then, writing
jee— (eel vss t=!
=
Mt,
i=
we see that
tA tye = Tt — bas (xt | vs) ys 11
and so, for 7 < t,
ne
eee Hy(cs hoe)(ue|9) = (2+ |ys) — (2+ |ys) =0.
Since a; # O we deduce that (y%|y;) = 0 for 7 < t. Thus {y1, Sale sutt is orthogonal. >
7.5 Corollary If V is a finite-dimensional inner product space then V has an orthonormal basis.
Proof Apply the Gram–Schmidt process to a basis of V. ◻
Example Consider the basis {x_1, x_2, x_3} of ℝ³ where x_1 = (0,1,1), x_2 = (1,0,1), x_3 = (1,1,0). In order to apply the Gram–Schmidt process to this basis using the standard inner product, we first let y_1 = x_1/||x_1|| = (1/√2)(0,1,1). Now
x_2 − (x_2 | y_1)y_1 = (1,0,1) − ((1,0,1) | (1/√2)(0,1,1))(1/√2)(0,1,1) = (1,0,1) − ½(0,1,1) = ½(2,−1,1)
so, normalising this, we take y_2 = (1/√6)(2,−1,1). Similarly,
x_3 − (x_3 | y_2)y_2 − (x_3 | y_1)y_1 = (1,1,0) − ⅙(2,−1,1) − ½(0,1,1) = ⅔(1,1,−1),
and normalising this gives y_3 = (1/√3)(1,1,−1).
7.8 Theorem Let V and W be finite-dimensional inner product spaces over the same field, let {e_1,...,e_n} be an orthonormal basis of V, and let f : V → W be linear. Then f is an inner product isomorphism if and only if {f(e_1),...,f(e_n)} is an orthonormal basis of W.
Proof ⇒ : If f : V → W is an inner product isomorphism then clearly {f(e_1),...,f(e_n)} is a basis of W. It is also orthonormal since
(f(e_i) | f(e_j)) = (e_i | e_j) = 1 if i = j, and 0 if i ≠ j.
⇐ : Suppose now that {f(e_1),...,f(e_n)} is an orthonormal basis of W. Then f carries a basis to a basis and so is a vector space isomorphism. Now for all x ∈ V we have, using the Fourier expansion of x relative to {e_1,...,e_n},
(f(x) | f(e_j)) = (f(Σ_{i=1}^n (x | e_i)e_i) | f(e_j)) = Σ_{i=1}^n (x | e_i)(f(e_i) | f(e_j)) = (x | e_j),
and similarly (f(e_j) | f(y)) = (e_j | y). It now follows by Parseval's identity applied to both V and W that
(f(x) | f(y)) = Σ_{j=1}^n (f(x) | f(e_j))(f(e_j) | f(y)) = Σ_{j=1}^n (x | e_j)(e_j | y) = (x | y),
and consequently f is an inner product isomorphism. ◻
We now pass to the consideration of the dual of an inner product space. For this purpose, we require the following notions.
Definition Let V, W be vector spaces over a field F where F is either ℝ or ℂ. A mapping f : V → W is called a conjugate transformation if
(∀x, y ∈ V)(∀λ ∈ F)  f(x + y) = f(x) + f(y),  f(λx) = \overline{λ}f(x).
If, furthermore, f is a bijection then we say that it is a conjugate isomorphism.
Note that when F = ℝ conjugate transformations are simply linear mappings.
We now observe that for every y ∈ V the mapping from V to F described by x ↦ (x | y) is linear, and hence is an element of V^d. We shall write this element as y^d, so that we have the following useful amalgamated notation :
(x | y) = y^d(x) = ⟨x, y^d⟩.
7.9 Theorem If V is a finite-dimensional inner product space then there is a conjugate isomorphism ϑ_V : V → V^d, namely that given by ϑ_V(x) = x^d where (∀y ∈ V) x^d(y) = (y | x).
Proof Consider the mapping ϑ_V : V → V^d given by ϑ_V(x) = x^d. Since
⟨x, (y + z)^d⟩ = (x | y + z) = (x | y) + (x | z) = ⟨x, y^d⟩ + ⟨x, z^d⟩ = ⟨x, y^d + z^d⟩
we see that (y + z)^d = y^d + z^d, so ϑ_V(y + z) = ϑ_V(y) + ϑ_V(z). Likewise,
⟨x, (λy)^d⟩ = (x | λy) = \overline{λ}(x | y) = \overline{λ}⟨x, y^d⟩ = ⟨x, \overline{λ}y^d⟩
and so (λy)^d = \overline{λ}y^d, whence ϑ_V(λy) = \overline{λ}ϑ_V(y). Thus ϑ_V is a conjugate transformation.
That ϑ_V is injective follows from the fact that if x ∈ Ker ϑ_V then x^d = 0 and so (x | x) = ⟨x, x^d⟩ = 0, whence x = 0_V. To show that ϑ_V is also surjective, let f ∈ V^d. If {e_1,...,e_n} is an orthonormal basis of V, let z = Σ_{i=1}^n \overline{f(e_i)} e_i. Then for j = 1,...,n we have
z^d(e_j) = (e_j | z) = (e_j | Σ_{i=1}^n \overline{f(e_i)}e_i) = Σ_{i=1}^n f(e_i)(e_j | e_i) = f(e_j).
Thus z^d and f coincide on the basis {e_1,...,e_n}. We deduce that f = z^d = ϑ_V(z) and so ϑ_V is also surjective. Thus ϑ_V is a conjugate isomorphism. ◻
We note from the above that we have the identity
(∀x, y ∈ V)  (x | y) = ⟨x, ϑ_V(y)⟩.
Since ϑ_V is a bijection, we also have the following identity (obtained by writing ϑ_V^{−1}(y) instead of y) :
(∀x ∈ V)(∀y ∈ V^d)  (x | ϑ_V^{−1}(y)) = ⟨x, y⟩.
We can now establish the following important result.
7.10 Theorem Let V and W be finite-dimensional inner product spaces over the same field. Then for every linear mapping f : V → W there is a unique linear mapping f* : W → V such that
(∀x ∈ V)(∀y ∈ W)  (f(x) | y) = (x | f*(y)).
Proof With the above notation, we have the identity
(f(x) | y) = ⟨f(x), y^d⟩ = ⟨x, f^t(y^d)⟩ = (x | ϑ_V^{−1}[f^t(ϑ_W(y))]) = (x | (ϑ_V^{−1} ∘ f^t ∘ ϑ_W)(y)),
from which it follows immediately that f* = ϑ_V^{−1} ∘ f^t ∘ ϑ_W is the only linear mapping with the stated property. ◻
7.11 Corollary f* : W → V is the unique linear mapping such that the diagram

        f*
    W ------> V
    |         |
  ϑ_W       ϑ_V
    |         |
    v         v
   W^d -----> V^d
        f^t

is commutative, in the sense that ϑ_V ∘ f* = f^t ∘ ϑ_W. ◻
Definition The unique linear mapping f* of 7.10 will be called the adjoint of f.
Immediate properties of the assignment f ↦ f* are the following.
7.12 Theorem Let V, W, X be finite-dimensional inner product spaces over the same field. Let f, g : V → W and h : W → X be linear mappings. Then
(1) (f + g)* = f* + g*;
(2) (λf)* = \overline{λ}f*;
(3) (h ∘ f)* = f* ∘ h*;
(4) (f*)* = f.
Proof (1) is immediate from f* = ϑ_V^{−1} ∘ f^t ∘ ϑ_W and the fact that (f + g)^t = f^t + g^t.
(2) ((λf)(x) | y) = λ(f(x) | y) = λ(x | f*(y)) = (x | \overline{λ}f*(y)) and so, by the uniqueness of adjoints, (λf)* = \overline{λ}f*.
(3) (h[f(x)] | y) = (f(x) | h*(y)) = (x | f*[h*(y)]) and so, by the uniqueness of adjoints, (h ∘ f)* = f* ∘ h*.
(4) Taking complex conjugates in 7.10 we obtain the identity (f*(y) | x) = (y | f(x)), from which it follows by the uniqueness of adjoints that (f*)* = f. ◻
7.13 Theorem Let V and W be finite-dimensional inner product spaces over the same field with dim V = dim W. If f : V → W is linear then the following statements are equivalent :
(1) f is an inner product isomorphism;
(2) f is a vector space isomorphism and f^{−1} = f*;
(3) f ∘ f* = id_W;
(4) f* ∘ f = id_V.
Proof (1) ⇒ (2) : If (1) holds then f^{−1} exists and we have the identity
(f(x) | y) = (f(x) | f[f^{−1}(y)]) = (x | f^{−1}(y)),
from which it follows by the uniqueness of adjoints that f^{−1} = f*. It is clear that (2) ⇒ (3) and (2) ⇒ (4).
(4) ⇒ (1) : If (4) holds then f is injective, hence bijective, and f^{−1} = f*. Consequently,
(∀x, y ∈ V)  (f(x) | f(y)) = (x | f*[f(y)]) = (x | y)
and so f is an inner product isomorphism.
The proof of (3) ⇒ (1) is similar. ◻
We have seen in 6.10 how the transpose f^t of f is such that Ker f^t and Im f^t are the annihilators of Im f and Ker f respectively. In view of the connection between transposes and adjoints, it will come as no surprise that Ker f* and Im f* are also related to the subspaces Im f and Ker f. This connection is via the following notion.
Definition Let V be an inner product space. For every non-empty subset E of V we define the orthogonal complement of E in V to be the set
E^⊥ = {y ∈ V ; (∀x ∈ E) (x | y) = 0}.
It is clear that E^⊥ is a subspace of V. The terminology is suggested by the following result.
7.14 Theorem Let V be an inner product space and let W be a finite-dimensional subspace of V. Then
V = W ⊕ W^⊥.
Proof Let {e_1,...,e_n} be an orthonormal basis of W, noting that this exists since W is of finite dimension. Given x ∈ V, let x' = Σ_{i=1}^n (x | e_i)e_i and let x'' = x − x'. Then x' ∈ W and for j = 1,...,n we have
(x'' | e_j) = (x | e_j) − (x' | e_j) = (x | e_j) − Σ_{i=1}^n (x | e_i)(e_i | e_j) = (x | e_j) − (x | e_j) = 0.
It follows that x'' ∈ W^⊥ and hence that x = x' + x'' ∈ W + W^⊥. Thus V = W + W^⊥. Now if x ∈ W ∩ W^⊥ then we have (x | x) = 0, whence ||x|| = 0 and so x = 0_V. Thus we conclude that V = W ⊕ W^⊥. ◻
Example The above result has a basic application to the theory of Fourier series. Suppose that W is a finite-dimensional subspace of the inner product space V. Given x ∈ V let x = a + b where a ∈ W and b ∈ W^⊥. Then, by orthogonality,
||x||² = (a + b | a + b) = ||a||² + ||b||².
For any y ∈ W we deduce that
||x − y||² = ||a − y + b||² = ||a − y||² + ||b||² = ||a − y||² + ||x − a||² ≥ ||x − a||².
Thus we see that the element of W that is 'nearest' the element x of V is the component a of x in W.
Now let {e_1,...,e_n} be an orthonormal basis for W. Let the element of W that is nearest a given x ∈ V be the element a = Σ_{i=1}^n λ_i e_i. By 7.3 we have λ_i = (a | e_i), and by orthogonality (x | e_i) = (a + b | e_i) = (a | e_i). Thus the element of W that is nearest x is Σ_{i=1}^n (x | e_i)e_i, the scalars (x | e_i) being the Fourier coefficients.
Now apply these observations to the inner product space V of continuous functions f : [−π, π] → ℝ under the inner product
(f | g) = ∫_{−π}^{π} fg.
An orthonormal subset of V is
S = {x ↦ 1/√(2π), x ↦ (1/√π) sin kx, x ↦ (1/√π) cos kx ; k = 1, 2, 3, ...}.
Let W_n be the (2n + 1)-dimensional subspace spanned by
B_n = {x ↦ 1/√(2π), x ↦ (1/√π) sin kx, x ↦ (1/√π) cos kx ; k = 1,...,n}.
Then the element f_n of W_n that is nearest a given f ∈ V is of the form
f_n = ½a_0 + Σ_{k=1}^n (a_k cos kx + b_k sin kx)
where
a_0 = (1/π) ∫_{−π}^{π} f(x) dx,
a_k = (1/π) ∫_{−π}^{π} f(x) cos kx dx,
b_k = (1/π) ∫_{−π}^{π} f(x) sin kx dx.
If f is infinitely differentiable then it can be shown that the sequence (f_n)_{n≥1} is a Cauchy sequence having f as its limit. Thus we can write
f = ½a_0 + Σ_{k≥1} (a_k cos kx + b_k sin kx),
which is the Fourier series representation of f.
7.15 Theorem If V is a finite-dimensional inner product space and W is a subspace of V then W = W^⊥⊥ and
dim W^⊥ = dim V − dim W.
Proof By 7.14 we clearly have dim V = dim W + dim W^⊥. Now it is clear from the definition of W^⊥ that we have W ⊆ W^⊥⊥. Also,
dim W^⊥⊥ = dim V − dim W^⊥ = dim W.
It follows that W = W^⊥⊥. ◻
7.16 Theorem If V is a finite-dimensional inner product space and A, B are subspaces of V then
(1) A ⊆ B ⇒ B^⊥ ⊆ A^⊥;
(2) (A ∩ B)^⊥ = A^⊥ + B^⊥;
(3) (A + B)^⊥ = A^⊥ ∩ B^⊥.
Proof (1) If A ⊆ B then clearly every element that is orthogonal to B is orthogonal to A, so B^⊥ ⊆ A^⊥.
(2) Since A, B ⊆ A + B we have, by (1), (A + B)^⊥ ⊆ A^⊥ ∩ B^⊥; and since A ∩ B ⊆ A, B we have A^⊥, B^⊥ ⊆ (A ∩ B)^⊥, whence A^⊥ + B^⊥ ⊆ (A ∩ B)^⊥. Since then
A ∩ B = (A ∩ B)^⊥⊥ ⊆ (A^⊥ + B^⊥)^⊥ ⊆ A^⊥⊥ ∩ B^⊥⊥ = A ∩ B
we deduce that A ∩ B = (A^⊥ + B^⊥)^⊥, whence (A ∩ B)^⊥ = A^⊥ + B^⊥.
(3) This follows from (2) on replacing A, B by A^⊥, B^⊥. ◻
(3) This follows from (2) on replacing A, B by A+, B+. > 7.17 Theorem and tf f:V —V
IfV 1s a finite-dimensional inner product space 1s linear then
Im f* = (Ker f)+
and
Ker f* = (Imf)t.
Proof Let x ∈ Im f*, say x = f*(y). Then for every z ∈ Ker f we have
(z | x) = (z | f*(y)) = (f(z) | y) = (0_V | y) = 0
and consequently x ∈ (Ker f)^⊥. Thus Im f* ⊆ (Ker f)^⊥. Now let y ∈ Ker f*. Then for z = f(x) ∈ Im f we have
(z | y) = (f(x) | y) = (x | f*(y)) = (x | 0_V) = 0
and consequently y ∈ (Im f)^⊥. Thus Ker f* ⊆ (Im f)^⊥. Using 7.15 we then have
dim Im f = dim V − dim(Im f)^⊥ ≤ dim V − dim Ker f* = dim Im f* ≤ dim(Ker f)^⊥ = dim V − dim Ker f = dim Im f.
The resulting equality gives both dim Im f* = dim(Ker f)^⊥ and dim(Im f)^⊥ = dim Ker f*, from which the results follow. ◻
We now investigate how matrices that represent f and f* are related.
Definition
If A = [a_ij]_{m×n} ∈ Mat_{m×n}(ℂ) then by the adjoint (or conjugate transpose) of A we mean the n × m matrix A* such that [A*]_ij = \overline{a_ji}.
The following result justifies the above terminology.
7.18 Theorem Let V, W be finite-dimensional inner product spaces over the same field. If, relative to ordered orthonormal bases (v_i)_n, (w_i)_m, a linear mapping f : V → W is represented by the matrix A, then the mapping f* is represented, relative to the bases (w_i)_m and (v_i)_n, by the matrix A*.
Proof For j = 1,...,n we have f(v_j) = Σ_i (f(v_j) | w_i)w_i by 7.6, so if A = [a_ij] we have a_ij = (f(v_j) | w_i). Likewise, we have f*(w_j) = Σ_i (f*(w_j) | v_i)v_i. Then since
\overline{a_ij} = \overline{(f(v_j) | w_i)} = (w_i | f(v_j)) = (f*(w_i) | v_j)
it follows that the matrix that represents f* is A*. ◻
It is clear from 7.18 and 7.13 that a square matrix A represents an inner product space isomorphism if and only if A^{−1} exists and is A*. Such a matrix is said to be unitary. It is readily seen, by extending the corresponding results for ordinary vector spaces to inner product spaces, that if A, B are n × n matrices over the ground field of V then A, B represent the same linear mapping with respect to possibly different ordered orthonormal bases of V if and only if there is a unitary matrix U such that B = U*AU = U^{−1}AU. We describe this situation by saying that B is unitarily similar to A. When the ground field is ℝ, the word orthogonal is often used instead of unitary. In this case A is orthogonal if and only if A^{−1} exists and is A^t. When there exists an orthogonal matrix U such that B = U^tAU = U^{−1}AU then we say that B is orthogonally similar to A.
It is clear that the relation of being unitarily (or orthogonally) similar is an equivalence relation on the set of n × n matrices over ℂ (or ℝ). Just as with ordinary similarity, the problem of locating particularly simple representatives, or canonical forms, in certain equivalence classes is important from both the theoretical and practical points of view. We shall consider this problem later.
CHAPTER
EIGHT
Orthogonal direct sums
In 7.14 we obtained, in an inner product space V, a direct sum decomposition of the form V = W ⊕ W^⊥. This leads us to consider the following notion.
Definition Let V_1,...,V_n be non-zero subspaces of an inner product space V. Then V is said to be the orthogonal direct sum of V_1,...,V_n if
(1) V = ⊕_{i=1}^n V_i;
(2) (i = 1,...,n) V_i^⊥ = Σ_{j≠i} V_j.
In order to study orthogonal direct sum decompositions in an inner product space V let us begin by considering a projection p : V → V and the associated decomposition V = Im p ⊕ Ker p established in 2.6. In order that this be an orthogonal direct sum, it is clear that p has to be an ortho-projection in the sense that Ker p = (Im p)^⊥ or, equivalently, Im p = (Ker p)^⊥. To discover when this happens, we require the following result.
8.1 Theorem If W, X are subspaces of a finite-dimensional inner product space V such that V = W ⊕ X then V = W^⊥ ⊕ X^⊥.
Proof By 7.16 we have {0_V} = V^⊥ = (W + X)^⊥ = W^⊥ ∩ X^⊥ and V = {0_V}^⊥ = (W ∩ X)^⊥ = W^⊥ + X^⊥, and hence V = W^⊥ ⊕ X^⊥. ◻
8.2 Corollary If p is the projection on W parallel to X then p* is the projection on X^⊥ parallel to W^⊥.
Proof
By 7.12 and since p is idempotent, we have p* ∘ p* = (p ∘ p)* = p*. Thus p* is idempotent and so is the projection on Im p* parallel to Ker p*. By 2.5, Im p = W and Ker p = X so, by 7.17, W^⊥ = (Im p)^⊥ = Ker p* and X^⊥ = (Ker p)^⊥ = Im p*. ◻
Definition If V is an inner product space then f : V → V is said to be self-adjoint if f = f*.
8.3 Theorem Let V be an inner product space of finite dimension. If p is a projection on V then p is an ortho-projection if and only if p is self-adjoint.
Proof By 8.2, p* is the projection on Im p* = (Ker p)^⊥ parallel to Ker p* = (Im p)^⊥. If then p is an ortho-projection we have Im p* = Im p. It follows by 2.5 that for every x ∈ V we have p(x) = p*[p(x)]. Consequently p = p* ∘ p and hence
p* = (p* ∘ p)* = p* ∘ p** = p* ∘ p = p,
so that p is self-adjoint.
Conversely, if p = p* then Im p = Im p* = (Ker p)^⊥ shows that p is an ortho-projection. ◻
It is clear from the above results that if V is an inner product space of finite dimension and if V_1,...,V_n are non-zero subspaces of V such that V = ⊕_{i=1}^n V_i, then this sum is an orthogonal direct sum if and only if, for every i, the projection p_i of V onto V_i parallel to Σ_{j≠i} V_j is self-adjoint.
It is also clear that if V = ⊕_{i=1}^n V_i then this direct sum is an orthogonal direct sum if and only if, for each i, every element of V_i is orthogonal to every element of V_j when j ≠ i. In fact in this case we have Σ_{j≠i} V_j ⊆ V_i^⊥, whence we have equality since
dim Σ_{j≠i} V_j = dim V − dim V_i = dim V_i^⊥.
Suppose that V is a finite-dimensional inner product space and that f : V → V is linear. We shall now consider under what conditions f is ortho-diagonalizable in the sense that there is an
orthonormal basis of V consisting of eigenvectors of f; equivalently, under what conditions there is an ordered orthonormal basis of V with respect to which the matrix of f is diagonal. In purely matrix terms this problem is that of determining when a given square matrix (over ℝ or ℂ) is unitarily similar to a diagonal matrix.
8.4 Theorem Let V be a non-zero finite-dimensional inner product space over a field F and let f : V → V be linear. Then f is ortho-diagonalizable if and only if there are non-zero self-adjoint projections p_1,...,p_k : V → V and distinct scalars λ_1,...,λ_k ∈ F such that
(1) f = Σ_{i=1}^k λ_i p_i;
(2) Σ_{i=1}^k p_i = id_V;
(3) (i ≠ j) p_i ∘ p_j = 0.
Proof ⇒ : Since f is diagonalizable, we have V = ⊕_{i=1}^k V_{λ_i} where λ_1,...,λ_k are the distinct eigenvalues of f and the subspace V_{λ_i} = Ker(f − λ_i id_V) is the eigenspace associated with λ_i. If p_i : V → V is the projection on V_{λ_i} parallel to Σ_{j≠i} V_{λ_j}, then (2), (3) follow from 2.8. Now for every x ∈ V we have
f(x) = f(Σ_i p_i(x)) = Σ_i f(p_i(x)) = Σ_i λ_i p_i(x) = (Σ_i λ_i p_i)(x)
and this gives (1). The fact that ⊕ V_{λ_i} is an orthogonal direct sum means that each projection p_i is an ortho-projection and so, by 8.3, is self-adjoint.
⇐ : If the conditions hold then by 2.8 we have V = ⊕_{i=1}^k Im p_i. Now the λ_i appearing in (1) are precisely the distinct eigenvalues of f. To see this, observe that
f ∘ p_j = (Σ_i λ_i p_i) ∘ p_j = Σ_i λ_i (p_i ∘ p_j) = λ_j p_j
so (f − λ_j id_V) ∘ p_j = 0 and hence {0_V} ≠ Im p_j ⊆ Ker(f − λ_j id_V). Thus each λ_j is an eigenvalue of f. On the other hand, for every λ ∈ F we have
f − λ id_V = Σ_{i=1}^k λ_i p_i − Σ_{i=1}^k λ p_i = Σ_{i=1}^k (λ_i − λ)p_i
so that, if x is an eigenvector of f corresponding to the eigenvalue λ, then Σ_{i=1}^k (λ_i − λ)p_i(x) = 0_V and hence, since V = ⊕ Im p_i, we have (λ_i − λ)p_i(x) = 0_V for i = 1,...,k. If λ ≠ λ_i for every i then p_i(x) = 0_V for every i and we have the contradiction x = Σ_i p_i(x) = 0_V. Thus λ = λ_i for some i and consequently λ_1,...,λ_k are the distinct eigenvalues of f.
We now observe that Im p_j = Ker(f − λ_j id_V). For, suppose that f(x) = λ_j x. Then 0_V = Σ_i (λ_i − λ_j)p_i(x) and therefore (λ_i − λ_j)p_i(x) = 0_V for all i, whence p_i(x) = 0_V for all i ≠ j. Then x = Σ_i p_i(x) = p_j(x) ∈ Im p_j and so Ker(f − λ_j id_V) ⊆ Im p_j. The reverse inclusion was established above.
Since now V = ⊕_{i=1}^k Im p_i = ⊕_{i=1}^k Ker(f − λ_i id_V) it follows that V has a basis consisting of eigenvectors of f and so f is diagonalizable. Now by hypothesis the projections p_i are self-adjoint so, for j ≠ i,
(p_i(x) | p_j(x)) = (p_j[p_i(x)] | x) = (0_V | x) = 0.
It follows that the above eigenvector basis is orthogonal. By normalising each vector in this basis we obtain an orthonormal basis of eigenvectors. Hence f is ortho-diagonalizable. ◻
Definition For an ortho-diagonalizable mapping f the equality f = Σ_{i=1}^k λ_i p_i of 8.4 is called the spectral resolution of f.
Suppose now that f : V → V is ortho-diagonalizable. Applying the results of 7.12 to the conditions in 8.4 we obtain, with an obvious notation,
(1*) f* = Σ_{i=1}^k \overline{λ_i} p_i;  (2*) = (2);  (3*) = (3).
We deduce by 8.4 that f* is also ortho-diagonalizable and that (1*) gives its spectral resolution (so that \overline{λ_1},...,\overline{λ_k} are the distinct eigenvalues of f*). A simple calculation now reveals that
f ∘ f* = Σ_{i=1}^k |λ_i|² p_i = f* ∘ f,
from which we deduce that ortho-diagonalizable mappings commute with their adjoints. This observation leads to the following notion.
Definition If V is a finite-dimensional inner product space and f : V → V is linear then we say that f is normal if it commutes with its adjoint. Similarly, a square matrix A over the ground field of V is said to be normal if AA* = A*A.
Example
It is readily seen that, for instance, the complex matrix
[ 1  i ]
[ i  1 ]
is normal: both of its products with its adjoint equal 2I_2.
We have just seen that a necessary condition for a linear mapping f to be ortho-diagonalizable is that it be normal. It is quite remarkable that, when the ground field is ℂ, this condition is also sufficient. In order to establish this, we require the following properties of normal linear mappings.
8.5 Theorem Let V be a non-zero finite-dimensional inner product space and let f : V → V be normal. Then
(1) (∀x ∈ V) ||f(x)|| = ||f*(x)||;
(2) if p is a polynomial with coefficients in the ground field of V then p(f) : V → V is also normal;
(3) Im f ∩ Ker f = {0_V}.
Proof (1) Since f ∘ f* = f* ∘ f we have, for all x ∈ V,
(f(x) | f(x)) = (x | f*[f(x)]) = (x | f[f*(x)]) = (f*(x) | f*(x)),
from which (1) follows.
(2) If p = a_0 + a_1X + ··· + a_nX^n then p(f) = a_0 id_V + a_1f + ··· + a_nf^n and, by 7.12, [p(f)]* = \overline{a_0} id_V + \overline{a_1}f* + ··· + \overline{a_n}(f*)^n. Since f and f* commute, it follows that so do p(f) and [p(f)]*. Thus p(f) is normal.
(3) If x ∈ Im f ∩ Ker f then there exists y ∈ V such that x = f(y) and f(x) = 0_V. By (1) we have f*(x) = 0_V and so
0 = (f*(x) | y) = (x | f(y)) = (x | x),
whence x = 0_V. ◻
8.6 Theorem Let V be a non-zero finite-dimensional inner product space. If p is a projection on V then p is normal if and only if it is self-adjoint.
Proof Clearly, if p is self-adjoint then p is normal. Suppose, conversely, that p is normal. By 8.5 we have ||p(x)|| = ||p*(x)|| and so p(x) = 0_V if and only if p*(x) = 0_V. Given x ∈ V, let y = x − p(x). We have p(y) = p(x) − p(x) = 0_V and so
0_V = p*(y) = p*(x) − p*[p(x)].
Thus p* = p* ∘ p and so
p = p** = (p* ∘ p)* = p* ∘ p** = p* ∘ p = p*,
i.e. p is self-adjoint. ◻
We can now solve the ortho-diagonalization problem for complex inner product spaces.
8.7 Theorem Let V be a non-zero finite-dimensional complex inner product space. If f : V → V is linear then f is ortho-diagonalizable if and only if f is normal.
Proof We have already seen that the condition is necessary. As for sufficiency, suppose that f is normal. To show that f is diagonalizable, it suffices to show that its minimum polynomial is a product of distinct linear factors. For this, we make use of the fact that ℂ is algebraically closed, in the sense that every polynomial over ℂ of degree at least 1 can be expressed as a
product of linear polynomials. Thus m_f is certainly a product of linear polynomials. Suppose, by way of obtaining a contradiction, that α ∈ ℂ is a multiple zero of m_f, so that we have
m_f = (X − α)²g
for some polynomial g. Then for every x ∈ V we have
0_V = [m_f(f)](x) = [(f − α id_V)² ∘ g(f)](x)
and consequently [(f − α id_V) ∘ g(f)](x) belongs to both the image and the kernel of f − α id_V. Since, by 8.5(2), f − α id_V is normal, we deduce from 8.5(3) that
(∀x ∈ V)  [(f − α id_V) ∘ g(f)](x) = 0_V.
Consequently (f − α id_V) ∘ g(f) is the zero mapping on V, and this contradicts the fact that (X − α)²g is the minimum polynomial of f. Thus we see that f is diagonalizable.
To show that f is ortho-diagonalizable, it suffices to show that the corresponding projections p_i of 8.4 are ortho-projections, and by 8.3 it is enough to show that they are self-adjoint. Now since f is diagonalizable it is clear from the proof of 2.10 that for each i there is a polynomial t_i such that t_i(f) = p_i. By 8.5(2), each p_i is therefore normal and so, by 8.6, is self-adjoint. ◻
8.8 Corollary If A is a square matrix over ℂ then A is unitarily similar to a diagonal matrix if and only if A is normal. ◻
It should be noted that in the proof of 8.7 we made use of the fact that the field ℂ is algebraically closed. This is not true of ℝ, and so we might expect that the corresponding result fails in general for real inner product spaces (and real square matrices). This is indeed the case : there exist normal linear mappings on a real inner product space that are not diagonalizable. One way in which this can happen is when all the eigenvalues of the mapping in question are complex. For example, the rotation matrix
[ −1/2  −√3/2 ]
[ √3/2  −1/2 ]
is normal and its minimum polynomial is X² + X + 1, which has no zeros in ℝ. So, in order to obtain an analogue of 8.7 in the case where the ground field is ℝ, we are led to consider normal linear mappings whose eigenvalues are all real. These can be characterised as follows.
8.9 Theorem Let V be a non-zero finite-dimensitonal complez inner product space. If f : V + V 1s linear then the following conditions are equivalent : (1) f ts normal and all its ergenvalues are real;
(2) f ts self-adjoint. Proof (1) > (2) : By 8.7, f is ortho-diagonalizable. k >> Asp; be its spectral resolution.
Let f =
We know that f* is also
t=1
hiss
normal with spectral resolution. f* = )> A;p;. Since each A; is a!
real by hypothesis, it follows that f* = f.
(2) ⇒ (1): If f* = f then clearly f is normal. If f = ∑ᵢ₌₁ᵏ λᵢpᵢ and f* = ∑ᵢ₌₁ᵏ λ̄ᵢpᵢ are the spectral resolutions then we have ∑ᵢ₌₁ᵏ (λᵢ − λ̄ᵢ)pᵢ = 0 and so ∑ᵢ₌₁ᵏ (λᵢ − λ̄ᵢ)pᵢ(x) = 0_V for every x ∈ V, whence (λᵢ − λ̄ᵢ)pᵢ = 0 for every i since V = ⊕ᵢ₌₁ᵏ Im pᵢ. Since each pᵢ ≠ 0, it follows that λᵢ = λ̄ᵢ, so every eigenvalue of f is real. >
8.10 Corollary All the eigenvalues of a self-adjoint matrix are real. >
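A quick numerical illustration of 8.10 (the random Hermitian matrix and NumPy are our assumptions, not part of the text): building a self-adjoint complex matrix and checking that its eigenvalues are real.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                  # A is self-adjoint (Hermitian): A* = A

eigvals = np.linalg.eigvals(A)
# By 8.10 every eigenvalue is real (up to floating-point noise).
assert np.allclose(eigvals.imag, 0)
```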
The analogue of 8.7 is now the following.
8.11 Theorem Let V be a non-zero finite-dimensional real inner product space. If f : V → V is linear then f is ortho-diagonalizable if and only if f is self-adjoint.
Proof ⇒: If f is ortho-diagonalizable let f = ∑ᵢ₌₁ᵏ λᵢpᵢ be its spectral resolution. Since the ground field is ℝ, each λᵢ is real and so, taking adjoints and using 8.3, we obtain f* = f.

n > dim(Z + X). It follows that the sum Z + X is not direct (otherwise we would have equality) and so Z ∩ X ≠ {0_V}. Let z be a non-zero element of Z ∩ X. Then from (1) we see that (f(z)|z) is negative, whereas from (2) we see that (f(z)|z) is non-negative. This contradiction shows that we cannot have r′ < r. Similarly we cannot have r < r′ and so we conclude that r = r′, whence also s = s′.
The above result gives immediately the following theorem, which describes canonical quadratic forms.
9.5 Theorem [Sylvester] Let V be a vector space of dimension n over ℝ and let Q : V → ℝ be a quadratic form on V. Then there is an ordered basis (vᵢ)ₙ of V such that if x = ∑ᵢ₌₁ⁿ xᵢvᵢ then
Q(x) = x₁² + ⋯ + xᵣ² − xᵣ₊₁² − ⋯ − xᵣ₊ₛ².
Moreover, the integers r and s are independent of such a basis. >
The integer r + s in 9.5 is often called the rank of the quadratic form Q, and r − s the signature of Q.
Example
Consider the quadratic form Q : ℝ³ → ℝ given by
Q(x, y, z) = x² − 2xy + 4yz − 2y² + 4z².
By the process of 'completing the squares' it is readily seen that
Q(x, y, z) = (x − y)² − 4y² + (y + 2z)²,
which is in canonical form, of rank 3 and signature 1. Alternatively, we can use matrices. The matrix of Q is
A = (  1  −1  0
      −1  −2  2
       0   2  4 ).
Let P be an orthogonal matrix such that PᵗAP is the diagonal matrix D. If y = Pᵗx (so that x = Py) then
xᵗAx = (Py)ᵗA(Py) = yᵗPᵗAPy = yᵗDy,
where the right hand side is a diagonal form with two positive coefficients and one negative, so that Q is again seen to have rank 3 and signature 1.
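By Sylvester's theorem the rank and signature can be read off from the signs of the eigenvalues of the symmetric matrix of the form. A small sketch (the helper `rank_and_signature` is our own, not from the text):

```python
import numpy as np

def rank_and_signature(A, tol=1e-10):
    """Return (rank, signature) of the quadratic form x -> x^T A x."""
    eigvals = np.linalg.eigvalsh(A)      # A symmetric => real eigenvalues
    r = int(np.sum(eigvals > tol))       # number of positive squares
    s = int(np.sum(eigvals < -tol))      # number of negative squares
    return r + s, r - s

# Matrix of Q(x,y,z) = x^2 - 2xy - 2y^2 + 4yz + 4z^2 from the example above.
A = np.array([[1, -1, 0],
              [-1, -2, 2],
              [0, 2, 4]])
print(rank_and_signature(A))             # (3, 1): rank 3, signature 1
```

This agrees with the completing-the-squares computation: two positive squares and one negative square give rank 3 and signature 1.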
Example The quadratic form given by Q(x, y, z) = 2xy + 2yz can be reduced to canonical form either by the method of completing squares or by a matrix reduction. The former method is not so easy in this case, but can be achieved as follows. Define
√2 x = X + Y,  √2 y = X − Y,  √2 z = Z.
Then the form becomes
X² − Y² + (X − Y)Z = (X + ½Z)² − (Y + ½Z)²
 = ½(x + y + z)² − ½(x − y + z)²,
which is of rank 2 and signature 0.
Definition
A quadratic form Q is said to be positive definite if Q(x) > 0 for all non-zero x. By taking the inner product space V to be Mat_{n×1}(ℝ) under (x|y) = xᵗy, we see that a quadratic form Q on V with matrix A is positive definite if and only if, for all non-zero x ∈ V,
0 < Q(x) = xᵗAx = (Ax|x),
which is the case if and only if A is positive definite. It is clear that this situation obtains when there are no negative terms in the canonical form, i.e. when the rank and the signature are the same.
Example Let f : ℝ × ℝ → ℝ be a function whose partial derivatives f_x, f_y are zero at (x₀, y₀). Then the Taylor series at
(x₀ + h, y₀ + k) is
f(x₀, y₀) + ½[h²f_xx + 2hk f_xy + k²f_yy](x₀, y₀) + ⋯ .
For small values of h, k the significant term is this quadratic form in h, k. If it has rank 2 then its normal form is ±H² ± K². If both signs are positive (i.e. the form is positive definite) then f has a relative minimum at (x₀, y₀), and if both signs are negative then f has a relative maximum at (x₀, y₀). If one sign is positive and the other is negative then f has a saddle-point at (x₀, y₀). Thus the geometry is distinguished by the signature of the quadratic form.
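This classification by signature can be sketched in code (the helper `classify` and the sample function are our own illustrations, not from the text): the Hessian matrix of second partials at the critical point is symmetric, so its eigenvalue signs give the signature.

```python
import numpy as np

def classify(fxx, fxy, fyy, tol=1e-10):
    """Classify a critical point from the second partials evaluated there."""
    H = np.array([[fxx, fxy], [fxy, fyy]])   # Hessian quadratic form
    eigvals = np.linalg.eigvalsh(H)          # ascending order
    if np.all(eigvals > tol):
        return "relative minimum"            # positive definite: +H^2 + K^2
    if np.all(eigvals < -tol):
        return "relative maximum"            # negative definite: -H^2 - K^2
    if eigvals[0] < -tol and eigvals[1] > tol:
        return "saddle point"                # rank 2, signature 0
    return "degenerate"                      # rank < 2: the test is inconclusive

# f(x, y) = x^2 - y^2 has fxx = 2, fxy = 0, fyy = -2 at the origin.
print(classify(2, 0, -2))                    # saddle point
```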
BILINEAR AND QUADRATIC FORMS
Example Consider the quadratic form
4x² + 4y² + 4z² − 2xy − 2yz + 2xz.
Its matrix is
A = (  4  −1   1
      −1   4  −1
       1  −1   4 ).
The eigenvalues of A are 3 (of algebraic multiplicity 2) and 6. If P is an orthogonal matrix such that PᵗAP is diagonal then, changing coordinates by X = Pᵗx, we transform the quadratic form to
3X² + 3Y² + 6Z²
which is positive definite.
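A numerical check of this example (our own illustration; NumPy is not part of the text): the eigenvalues of A are indeed 3, 3, 6, and positive definiteness can be confirmed by a Cholesky factorisation, which succeeds exactly for positive definite matrices.

```python
import numpy as np

A = np.array([[4, -1, 1],
              [-1, 4, -1],
              [1, -1, 4]])

eigvals = np.linalg.eigvalsh(A)              # ascending order
assert np.allclose(eigvals, [3, 3, 6])

# Cholesky raises LinAlgError unless A is positive definite.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
```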
CHAPTER
TEN
Real normality
We have seen in 8.7 that the ortho-diagonalizable linear mappings on a complex inner product space are precisely those that are normal; and in 8.11 that the ortho-diagonalizable linear mappings on a real inner product space are precisely those that are self-adjoint. It is therefore natural to ask what can be said about normal linear mappings on a real inner product space; equivalently, to ask about real square matrices that commute with their transposes. Our objective now will be to obtain a canonical form for such a matrix under orthogonal similarity. For this purpose, we consider the following notion.
Definition Let V be a finite-dimensional real inner product space and let f : V → V be linear. We say that f is skew-adjoint if f* = −f. The corresponding terminology for real square matrices is skew-symmetric.
10.1 Theorem If V is a non-zero finite-dimensional real inner product space and f : V → V is linear then there is a unique self-adjoint g : V → V and a unique skew-adjoint h : V → V such that f = g + h. Moreover, f is normal if and only if g, h commute.
Proof We have f = ½(f + f*) + ½(f − f*) where ½(f + f*) is self-adjoint and ½(f − f*) is skew-adjoint. Also, if f = g + h where g is self-adjoint and h is skew-adjoint then f* = g* + h* = g − h and consequently we see that g = ½(f + f*) and h = ½(f − f*). Now f∘f* = f*∘f gives (g + h)∘(g − h) = (g − h)∘(g + h), which reduces to g∘h = h∘g. Conversely, if g, h commute then it is readily seen that f∘f* = g² − h² = f*∘f. >
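For matrices, the decomposition of 10.1 is the familiar splitting into symmetric and skew-symmetric parts. A minimal sketch (the random matrix and NumPy are our assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 3))

G = (F + F.T) / 2                  # self-adjoint (symmetric) part
H = (F - F.T) / 2                  # skew-adjoint (skew-symmetric) part

assert np.allclose(G, G.T)         # G* = G
assert np.allclose(H, -H.T)        # H* = -H
assert np.allclose(F, G + H)       # F = G + H

# By 10.1, F is normal if and only if G and H commute;
# for a generic random F both conditions fail together.
is_normal = np.allclose(F @ F.T, F.T @ F)
parts_commute = np.allclose(G @ H, H @ G)
assert is_normal == parts_commute
```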
We now obtain a useful characterisation of skew-adjoint mappings.
10.2 Theorem If V is a non-zero finite-dimensional real inner product space then f : V → V is skew-adjoint if and only if
(∀x ∈ V)  (f(x)|x) = 0.
Proof ⇒: If f is skew-adjoint then for every x ∈ V we have
(f(x)|x) = (x | −f(x)) = −(x | f(x)) = −(f(x)|x)
and so (f(x)|x) = 0.

with i ≠ j. Then the minimum polynomials of fᵢ, fⱼ are X² − 2aᵢX + aᵢ² + bᵢ² and X² − 2aⱼX + aⱼ² + bⱼ², where either aᵢ ≠ aⱼ or bᵢ² ≠ bⱼ². By 10.11, we have
m_gᵢ = X − aᵢ, m_gⱼ = X − aⱼ and m_hᵢ = X² + bᵢ², m_hⱼ = X² + bⱼ². Given xᵢ ∈ Vᵢ and xⱼ ∈ Vⱼ we therefore have
0 = ((hᵢ² + bᵢ² id_Vᵢ)(xᵢ) | xⱼ)
  = (h²(xᵢ) | xⱼ) + bᵢ²(xᵢ | xⱼ)
  = (xᵢ | h²(xⱼ)) + bᵢ²(xᵢ | xⱼ)
  = (xᵢ | hⱼ²(xⱼ)) + bᵢ²(xᵢ | xⱼ)
  = −bⱼ²(xᵢ | xⱼ) + bᵢ²(xᵢ | xⱼ)
  = (bᵢ² − bⱼ²)(xᵢ | xⱼ),
so that in the case where bᵢ² ≠ bⱼ² we have (xᵢ | xⱼ) = 0. Likewise,
0 = ((gᵢ − aᵢ id_Vᵢ)(xᵢ) | xⱼ)
  = (g(xᵢ) | xⱼ) − aᵢ(xᵢ | xⱼ)
  = (xᵢ | g(xⱼ)) − aᵢ(xᵢ | xⱼ)
  = (xᵢ | gⱼ(xⱼ)) − aᵢ(xᵢ | xⱼ)
  = aⱼ(xᵢ | xⱼ) − aᵢ(xᵢ | xⱼ)
  = (aⱼ − aᵢ)(xᵢ | xⱼ)
so that in the case where aᵢ ≠ aⱼ we have (xᵢ | xⱼ) = 0. We thus see that V₁, …, V_k are pairwise orthogonal. That V₀ is also orthogonal to each Vᵢ for i ≥ 1 follows from the above strings of equalities on taking j = 0 and using the fact that f₀ = a₀ id_V₀ is self-adjoint, and consequently g₀ = f₀ and h₀ = 0. >
We can now establish the main result.
10.13 Theorem If V is a non-zero finite-dimensional real inner product space and if f : V → V is a normal linear mapping then there is an ordered orthonormal basis of V relative to which the matrix of f is of the block diagonal form
( A₁
      A₂
          ⋱
              A_k )
where each Aᵢ is either a 1 × 1 matrix or a 2 × 2 matrix of the form
( α  −β
  β   α )
in which β ≠ 0.
Proof
With the same notation as above, let
m_f = (X − a₀) ∏ᵢ₌₁ᵏ (X² − 2aᵢX + aᵢ² + bᵢ²)
and let the primary components of f be Vᵢ for i = 0, …, k. Then m_fᵢ = X − a₀ if i = 0 and m_fᵢ = X² − 2aᵢX + aᵢ² + bᵢ² otherwise. Given any Vᵢ with i ≠ 0 we have fᵢ = gᵢ + hᵢ where the self-adjoint part gᵢ has minimum polynomial X − aᵢ and the skew-adjoint part hᵢ has minimum polynomial X² + bᵢ². Now hᵢ is skew-adjoint and so, by 10.8, there is an ordered orthonormal
basis Bᵢ of Vᵢ with respect to which the matrix of hᵢ is the block diagonal matrix
M(bᵢ) = (  0  −bᵢ
          bᵢ   0
                  ⋱ ).
Since the minimum polynomial of gᵢ is X − aᵢ we have gᵢ(x) = aᵢx for every x ∈ Bᵢ and so the matrix of gᵢ relative to Bᵢ is the diagonal matrix all of whose diagonal entries are aᵢ. It now follows that the matrix of fᵢ = gᵢ + hᵢ relative to Bᵢ is
M(aᵢ, bᵢ) = ( aᵢ  −bᵢ
             bᵢ   aᵢ
                      ⋱ ).
In the case where i = 0, we have f₀ = a₀ id_V₀ so f₀ is self-adjoint. By 8.11, there is an ordered orthonormal basis of V₀ with respect to which the matrix of f₀ is diagonal. Now by 10.12 the primary components Vᵢ are pairwise orthogonal. Pasting together the ordered orthonormal bases in question, we then obtain an ordered orthonormal basis of V relative to which the matrix of f is of the form stated. >
10.14 Corollary A real square matrix is normal if and only if it is orthogonally similar to a matrix of the form described in 10.13. >
Our labours produce a bonus: an orthogonal linear mapping f is such that f⁻¹ exists and equals f*, and so is in particular normal. We can therefore deduce from the above a canonical form for orthogonal mappings and matrices.
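The observation that an orthogonal mapping is normal is easy to check numerically (a sketch of our own; the QR construction of a random orthogonal matrix is an assumption, not from the text): since Q⁻¹ = Qᵗ we get QQᵗ = I = QᵗQ.

```python
import numpy as np

rng = np.random.default_rng(2)
# Build an orthogonal matrix as the Q factor of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

assert np.allclose(Q.T, np.linalg.inv(Q))   # Q^{-1} = Q^T (orthogonal)
assert np.allclose(Q @ Q.T, np.eye(3))
assert np.allclose(Q.T @ Q, np.eye(3))      # hence Q Q^T = Q^T Q: Q is normal
```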
VOLUME
\
118
4: LINEAR
ALGEBRA
10.15 Theorem If V is a non-zero finite-dimensional real inner product space and f : V → V is an orthogonal linear mapping then there is an ordered orthonormal basis of V with respect to which the matrix of f is of the block diagonal form
( I_m
       −I_l
             P₁
                 ⋱
                     P_t )
in which each Pᵢ is a 2 × 2 matrix of the form
( α  −β
  β   α )
where β ≠ 0 and α² + β² = 1.
Proof
With the same notation as in 10.13, we have that the matrix M(aᵢ, bᵢ), which represents fᵢ relative to the ordered basis Bᵢ, is an orthogonal matrix (since fᵢ is orthogonal). Multiplying this matrix by its transpose, we obtain an identity matrix and, equating entries, we see that aᵢ² + bᵢ² = 1. As for the primary component V₀, the matrix of f₀ is diagonal. Since the square of this diagonal matrix is an identity matrix, its entries must be ±1. We can now rearrange the basis to see that the matrix of f has the form described. >
Example If f : ℝ³ → ℝ³ is orthogonal then f is called a rotation if det A = 1 for any matrix A that represents f. If f is a rotation then there is an ordered orthonormal basis of ℝ³ with respect to which the matrix of f is
( 1    0       0
  0  cos θ  −sin θ
  0  sin θ   cos θ )
for some real number θ.
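A numerical illustration of this canonical form (our own sketch; the choice of angle is arbitrary): such a matrix has determinant 1, is orthogonal, and the angle can be recovered from the trace, since tr = 1 + 2 cos θ.

```python
import numpy as np

theta = 0.7                               # an arbitrary angle (assumption)
c, s = np.cos(theta), np.sin(theta)
R = np.array([[1, 0, 0],
              [0, c, -s],
              [0, s, c]])

assert np.isclose(np.linalg.det(R), 1)    # R is a rotation
assert np.allclose(R @ R.T, np.eye(3))    # R is orthogonal

# Recover the angle from the trace: tr R = 1 + 2 cos(theta).
recovered = np.arccos((np.trace(R) - 1) / 2)
assert np.isclose(recovered, theta)
```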
Index
algebra, 1 annihilator, 47,67 adjoint, 80,85
elementary divisor, 53 elementary Jordan matrix, 33
eigenvalue, 5 eigenvector, 5
Bessel’s inequality, 73 bidual, 63 bilinear form, 99 bitranspose, 66 block diagonal form, 15 Cayley-Hamilton theorem, 2 Cauchy-Schwarz inequality, 71 characteristic polynomial, 2 classical canonical matrix, 56
classical p-matrix, 56 companion matrix, 49 complex inner product space, 69 conjugate isomorphism, 78 conjugate transformation, 78 coordinate form, 60 cyclic basis, 49 cyclic decomposition, 49 cyclic subspace, 49 diagonalizable, 20 direct sum, 8 distance, 71
Fourier coefficients, 77 Gram matrix, 98 Gram-Schmidt process, 75 Hilbert space, 72 idempotent, 10 index, 31 inner product space, 69 invariant subspace, 13 Jordan basis, 39 Jordan block, 33 Jordan canonical matrix, 37 Jordan decomposition, 28
Lagrange polynomial linear form, 58 linear functional, 58
minimum polynomial, 3
dot product, 70 dual space, 58
nilpotent, 22 normalising, 73
ortho-diagonalizable, 88 orthogonal, 72 orthogonal complement, 82 orthogonal direct sum, 87 orthogonally similar, 86 orthonormal, 72 orthonormal basis, 74 ortho-projection, 87 Parseval’s identity, 77 positive, 96 positive definite, 96,106 primary decomposition, 15 projection, 10 quadratic form, 101 quotient space, 44
rational canonical matrix, 53 real inner product space, 69 scalar product, 70 signature, 105 simultaneously diagonalizable, 21 skew-adjoint, 108 spectral resolution, 90 square summable, 72 sum of subspaces, 7 Sylvester’s theorem, 105 symmetric bilinear form, 101 triangular form, 24 triangle inequality, 71 unitarily similar, 86 unitary, 86
Essential Student Algebra
T.S. Blyth and E.F. Robertson
Abstract algebra is the cornerstone of mathematics. The study of algebra begins with the concepts of sets and mappings (functions), which underlie all of mathematics and are logical tools used throughout science. For students who are starting on a course of study in mathematics, science, engineering or technology, algebra will form a basis for their syllabus. Essential Student Algebra is for them. Essential Student Algebra is a set of five modular texts, covering all the important topics of abstract algebra at first and second year level. Written in a straightforward, readable style, each volume stands on its own as a concise text on one aspect of algebra. Taken as a set, the five volumes make up a comprehensive library of student algebra. Written by two highly regarded authors of algebra books for students, Essential Student Algebra includes both the theoretical side of algebra and a wealth of illustrative examples. The five volumes comprise a complete modular course up to third year level. Essential Student Algebra will be an invaluable text in colleges, universities and polytechnics as well as in senior classes at high schools everywhere.
Titles in this series
Volume 1: Sets and Mappings (0 412 27880 4)
Volume 2: Matrices and Vector Spaces (0 412 27870 7)
Volume 3: Abstract Algebra (0 412 27860 X)
Volume 4: Linear Algebra (0 412 27850 2)
Volume 5: Groups (0 412 27840 5)
ISBN 0-412-27850-2
9 780412 278501