Essential Student Algebra

VOLUME FOUR

Linear Algebra

T.S. BLYTH & E.F. ROBERTSON

ISBN 0412278502


Preface

If, as it is often said, mathematics is the queen of science then algebra is surely the jewel in her crown. In the course of its vast development over the last half-century, algebra has emerged as the subject in which one can observe pure mathematical reasoning at its best. Its elegance is matched only by the ever-increasing number of its applications to an extraordinarily wide range of topics in areas other than ‘pure’ mathematics. Here our objective is to present, in the form of a series of five concise volumes, the fundamentals of the subject. Broadly speaking, we have covered in all the now traditional syllabus that is found in first and second year university courses, as well as some third year material. Further study would be at the level of ‘honours options’. The reasoning that lies behind this modular presentation is simple, namely to allow the student (be he a mathematician or not) to read the subject in a way that is more appropriate to the length, content, and extent, of the various courses he has to take. Although we have taken great pains to include a wide selection of illustrative examples, we have not included any exercises. For a suitable companion collection of worked examples, we would refer the reader to our series Algebra through practice

(Cambridge University Press), the first five books of which are appropriate to the material covered here.

T.S.B., E.F.R.


CHAPTER ONE

The minimum polynomial

In Volume Two we introduced the notions of eigenvalue and eigenvector of a linear mapping or matrix. There we concentrated our attention on showing the importance of these notions in solving particular problems. Here we begin by taking a closer algebraic look.

Definition  Let F be a field. By an algebra over F we shall mean a vector space V over F on which there is defined a multiplication in such a way that (V, +, ·) is a ring with identity and

(∀x, y ∈ V)(∀λ ∈ F)   (λx)y = λ(xy) = x(λy).

Example  With respect to multiplication of matrices, the vector space Mat_{n×n}(F) becomes an algebra.

Example  With respect to composition of mappings, the vector space Lin(V, V) becomes an algebra. If V is a vector space of dimension n over F then we have an algebra isomorphism

Lin(V, V) ≃ Mat_{n×n}(F)

(i.e. a bijection that is both a ring and a vector space isomorphism) that is obtained by associating with each linear mapping f : V → V its matrix relative to some fixed ordered basis. This is well known; see, for example, Theorems 7.2 and 7.3 in Volume Two. In practice, we work in both of these algebras, choosing whichever suits our purposes at the time. Observe, for example,

that since Mat_{n×n}(F) is of dimension n² over F, for every n×n matrix A over F the n² + 1 powers

I_n, A, A², ..., A^{n²}

are linearly dependent and so there is a non-zero polynomial

p = a₀ + a₁X + a₂X² + ··· + a_{n²}X^{n²} ∈ F[X]

such that p(A) = 0. The same of course is true for any f ∈ Lin(V, V). But we can do better than this: there is, in fact, a polynomial p of degree at most n such that p(A) = 0. This is the celebrated Cayley-Hamilton Theorem which we shall now establish. Since we shall be working in Mat_{n×n}(F), the proof we shall give will be 'elementary'. There are other, more elegant, proofs which use Lin(V, V).

Definition  If A ∈ Mat_{n×n}(F) then the characteristic polynomial of A is χ_A = det(XI_n − A). Note that χ_A is of degree n in the indeterminate X.

1.1 Theorem  [Cayley-Hamilton]  χ_A(A) = 0.

Proof  Let B = XI_n − A and

χ_A = det B = b₀ + b₁X + ··· + b_nX^n.

Consider the matrix adj B. By definition, this is an n×n matrix whose entries are polynomials in X of degree at most n − 1 and so we have

adj B = B₀ + B₁X + ··· + B_{n−1}X^{n−1}

for some n×n matrices B₀, ..., B_{n−1}. Recalling that B adj B = (det B)I_n (see, for example, Theorem 8.11 of Volume Two), we have

(det B)I_n = B adj B = (XI_n − A) adj B = X adj B − A adj B,

i.e. we have the polynomial identity

b₀I_n + b₁I_nX + ··· + b_nI_nX^n = B₀X + ··· + B_{n−1}X^n − AB₀ − ··· − AB_{n−1}X^{n−1}.

Equating coefficients of like powers, we obtain

b₀I_n = −AB₀
b₁I_n = B₀ − AB₁
  ⋮
b_{n−1}I_n = B_{n−2} − AB_{n−1}
b_nI_n = B_{n−1}.

Multiplying the first equation on the left by I_n, the second by A, the third by A², and so on, we obtain

b₀I_n = −AB₀
b₁A = AB₀ − A²B₁
  ⋮
b_{n−1}A^{n−1} = A^{n−1}B_{n−2} − A^nB_{n−1}
b_nA^n = A^nB_{n−1}.

Adding these equations together, we obtain χ_A(A) = 0. ◊

The Cayley-Hamilton Theorem is really quite remarkable, it being far from obvious that an n×n matrix over a field F should satisfy a polynomial equation of degree n. This result leads us to consider the following notion.

Definition  If A ∈ Mat_{n×n}(F) then the minimum polynomial of A is the monic polynomial m_A of least degree such that m_A(A) = 0.
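For readers who like to check such results mechanically, here is a small Python sketch (not from the text; the function name is ours) verifying the Cayley-Hamilton Theorem in the 2×2 case, where χ_A = X² − (a + d)X + (ad − bc):

```python
def cayley_hamilton_2x2(a, b, c, d):
    """Evaluate chi_A(A) = A^2 - (tr A)A + (det A)I for A = [[a, b], [c, d]]."""
    tr, det = a + d, a * d - b * c
    A = [[a, b], [c, d]]
    # entries of A^2
    A2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    I = [[1, 0], [0, 1]]
    return [[A2[i][j] - tr * A[i][j] + det * I[i][j]
             for j in range(2)] for i in range(2)]

print(cayley_hamilton_2x2(1, 2, 3, 4))  # [[0, 0], [0, 0]]
```

The result is the zero matrix for every choice of entries, as the theorem asserts.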

1.2 Theorem  If p is a polynomial such that p(A) = 0 then the minimum polynomial m_A divides p.

Proof  By euclidean division there are polynomials q, r such that p = m_Aq + r with r = 0 or deg r < deg m_A. Now by hypothesis p(A) = 0, and by definition m_A(A) = 0. Consequently we have r(A) = 0. By the definition of m_A we cannot then have deg r < deg m_A, and so we must have r = 0. It follows that p = m_Aq and so m_A divides p. ◊

1.3 Corollary  m_A divides χ_A. ◊

It is immediate from 1.3 that every zero of m_A is a zero of χ_A. The converse is also true:

1.4 Theorem  m_A and χ_A have the same zeros.

Proof  Observe that if λ is a zero of χ_A then det(λI_n − A) = 0 and so λI_n − A is not invertible. There is therefore a dependence relation between the columns of λI_n − A and so there is a non-zero x ∈ Mat_{n×1}(F) such that Ax = λx. Given any polynomial

h = a₀ + a₁X + ··· + a_kX^k

we then have

h(A)x = a₀x + a₁Ax + ··· + a_kA^kx
      = a₀x + a₁λx + ··· + a_kλ^kx
      = h(λ)x

so that h(λ)I_n − h(A) is not invertible. Put another way, we have det[h(λ)I_n − h(A)] = 0. Thus we see that h(λ) is a zero of χ_{h(A)}. Now choose h = m_A. Then for every zero λ of χ_A we have that m_A(λ) is a zero of χ_{m_A(A)} = χ_0 = det XI_n = X^n. Since the only zero of this is 0, we have m_A(λ) = 0 and so λ is a zero of m_A. ◊

Example  The characteristic polynomial of the matrix

A = [ 2  1  0 ]
    [ 0  2  1 ]
    [ 0  0  2 ]

is χ_A = (X − 2)³. Now it is readily seen that A − 2I₃ ≠ 0 and (A − 2I₃)² ≠ 0, so we also have m_A = (X − 2)³.

Example  For the matrix

A = [  5  −6  −6 ]
    [ −1   4   2 ]
    [  3  −6  −4 ]

we have χ_A = (X − 1)(X − 2)². By 1.4, the minimum polynomial is therefore either (X − 1)(X − 2)² or (X − 1)(X − 2). Since, as is readily seen, (A − I₃)(A − 2I₃) = 0, it follows that m_A = (X − 1)(X − 2).
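As a quick computational check of the last example, the following Python sketch (the helper functions `matmul` and `shift` are ad hoc, not from the text) verifies that (A − I)(A − 2I) = 0 while neither factor alone vanishes:

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(X, c):
    """Return X - c*I."""
    return [[X[i][j] - (c if i == j else 0) for j in range(len(X))]
            for i in range(len(X))]

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]
Z = [[0] * 3 for _ in range(3)]

print(matmul(shift(A, 1), shift(A, 2)) == Z)  # True: (A - I)(A - 2I) = 0
print(shift(A, 1) == Z or shift(A, 2) == Z)   # False: neither factor is zero
```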


The notion of characteristic polynomial can be defined for a linear mapping as follows. Given a vector space V of dimension n over F and a linear mapping f : V → V, let A be the matrix of f relative to some fixed ordered basis of V. Then the matrix of f relative to any other ordered basis is of the form P⁻¹AP where P is the transition matrix from the new basis to the old basis (see, for example, Volume Two, Theorem 7.6). Now the characteristic polynomial of P⁻¹AP is

det(XI_n − P⁻¹AP) = det[P⁻¹(XI_n − A)P] = det P⁻¹ det(XI_n − A) det P = det(XI_n − A),

i.e. χ_{P⁻¹AP} = χ_A. Thus the characteristic polynomial is independent of the choice of basis, so we can define the characteristic polynomial χ_f of f to be the characteristic polynomial of any matrix that represents f. Likewise, the minimum polynomial m_f of f is defined to be the minimum polynomial of any matrix that represents f; for if A, B represent f then B = P⁻¹AP for some invertible P, and for any polynomial p we have p(B) = p(P⁻¹AP) = P⁻¹p(A)P, so p(B) = 0 if and only if p(A) = 0.

As we have seen, the characteristic polynomial and the minimum polynomial have the same zeros. These are called the eigenvalues (of f or of A). Thus λ is an eigenvalue of A if and only if det(λI_n − A) = 0, and the corresponding statement for f is that λ is an eigenvalue of f if and only if λ id_V − f is not invertible. In the former case there exists a non-zero x ∈ Mat_{n×1}(F) such that Ax = λx, and in the latter there exists a non-zero z ∈ Ker(λ id_V − f), so that f(z) = λz. Such a column matrix x and vector z are called corresponding eigenvectors (of A and of f).

1.5 Theorem  Let V be a vector space of dimension n ≥ 1 over ℂ. Then every f ∈ Lin(V, V) and every A ∈ Mat_{n×n}(ℂ) has at least one eigenvalue in ℂ.

Proof  χ_f factorises over ℂ, say as

χ_f = (X − λ₁)^{d₁}(X − λ₂)^{d₂} ··· (X − λ_k)^{d_k}.

Substituting f for X and using the Cayley-Hamilton Theorem, we obtain

0 = (f − λ₁ id_V)^{d₁}(f − λ₂ id_V)^{d₂} ··· (f − λ_k id_V)^{d_k}.

It follows that not all the factors (f − λᵢ id_V)^{dᵢ} are invertible and so there is at least one eigenvalue. ◊

Example  Note that 1.5 is not true when ℂ is replaced by ℝ. Indeed, consider the rotation matrix

R_θ = [ cos θ  −sin θ ]
      [ sin θ   cos θ ].

The characteristic polynomial of R_θ is X² − 2 cos θ X + 1, the zeros of which are cos θ ± i sin θ. Thus, when θ is not an integral multiple of π, R_θ has no real eigenvalues.

1.6 Theorem  A linear mapping f (or a square matrix A) is invertible if and only if the constant term in the characteristic polynomial is not zero.

Proof  To say that f is invertible is equivalent to saying that 0 is not an eigenvalue of f, i.e. to saying that 0 is not a zero of the characteristic polynomial. Clearly, this is equivalent to the constant term being non-zero. ◊

Example  Let A be a 3×3 matrix whose characteristic polynomial is χ_A = (X − 1)³. Then

0 = (A − I₃)³ = A³ − 3A² + 3A − I₃

and consequently we see that

A⁻¹ = A² − 3A + 3I₃.
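The last example can be tried out concretely. In the sketch below (ours, not from the text; the original matrix of the example is illegible in this copy) we take an illustrative unitriangular matrix, which automatically has χ_A = (X − 1)³, and recover its inverse from the Cayley-Hamilton relation:

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# illustrative choice: any A with chi_A = (X - 1)^3 behaves the same way
A = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A2 = matmul(A, A)
# A^{-1} = A^2 - 3A + 3I, read off from (A - I)^3 = 0
A_inv = [[A2[i][j] - 3 * A[i][j] + 3 * I[i][j] for j in range(3)]
         for i in range(3)]
print(matmul(A, A_inv) == I)  # True
```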

CHAPTER TWO

Direct sums of subspaces

If A and B are non-empty subsets of a vector space V over a field F then the subspace spanned by A ∪ B, i.e. the smallest subspace that contains both A and B, is the set of linear combinations of elements of A ∪ B. In other words, it is the set of elements of the form

∑ᵢ λᵢxᵢ + ∑ⱼ μⱼyⱼ

where each xᵢ ∈ A, each yⱼ ∈ B, and λᵢ, μⱼ ∈ F. In the case where A, B are subspaces of V, this set can be described as

A + B = {a + b ; a ∈ A, b ∈ B}

which we call the sum of the subspaces A, B. More generally, if A₁, ..., A_n are subspaces of V then we define their sum to be the subspace, denoted by ∑_{i=1}^n Aᵢ, that is spanned by ⋃_{i=1}^n Aᵢ.

Clearly, we have

∑_{i=1}^n Aᵢ = {a₁ + ··· + a_n ; aᵢ ∈ Aᵢ}.

Example  Let X, Y, D be the subspaces of ℝ² given by

X = {(x, 0) ; x ∈ ℝ},   Y = {(0, y) ; y ∈ ℝ},   D = {(x, x) ; x ∈ ℝ}.

Then ℝ² = X + Y = X + D = Y + D, for every (x, y) ∈ ℝ² can be written in each of the three ways

(x, 0) + (0, y),   (x − y, 0) + (y, y),   (0, y − x) + (x, x).

Definition  A sum ∑_{i=1}^n Aᵢ of subspaces A₁, ..., A_n is said to be direct if every z ∈ ∑_{i=1}^n Aᵢ can be written in a unique way as a sum z = a₁ + ··· + a_n with aᵢ ∈ Aᵢ for each i.

We shall use the notation ⊕_{i=1}^n Aᵢ to denote the fact that the sum ∑_{i=1}^n Aᵢ is direct, and call this the direct sum of the subspaces Aᵢ.

Example  In the previous Example we have ℝ² = X ⊕ Y = X ⊕ D = Y ⊕ D.

Example  Let A, B be the subspaces of ℝ³ given by

A = {(x, −x, z) ; x, z ∈ ℝ},   B = {(x, x, z) ; x, z ∈ ℝ}.

Then ℝ³ = A + B since, for example,

(x, y, z) = (½(x − y), −½(x − y), 0) + (½(x + y), ½(x + y), z).

This sum is not direct, however, for we can also write (x, y, z) as

(½(x − y), −½(x − y), −1) + (½(x + y), ½(x + y), z + 1).

2.1 Theorem  If A₁, ..., A_n are subspaces of a vector space V then the following statements are equivalent:

(1) the sum ∑_{i=1}^n Aᵢ is direct;
(2) if ∑_{i=1}^n aᵢ = 0_V with aᵢ ∈ Aᵢ for every i, then every aᵢ = 0_V;
(3) for every i, Aᵢ ∩ ∑_{j≠i} Aⱼ = {0_V}.

Proof  (1) ⇒ (2): By the definition of direct sum, if (1) holds then 0_V can be written in only one way as a sum ∑_{i=1}^n aᵢ with aᵢ ∈ Aᵢ for every i, namely that in which every aᵢ = 0_V.

(2) ⇒ (3): Let x ∈ Aᵢ ∩ ∑_{j≠i} Aⱼ, say x = aᵢ = ∑_{j≠i} aⱼ. We can write this as aᵢ − ∑_{j≠i} aⱼ = 0_V. By (2) we deduce that aᵢ = 0_V, whence x = 0_V.

(3) ⇒ (1): Suppose that (3) holds and that ∑_{i=1}^n aᵢ = ∑_{i=1}^n bᵢ where aᵢ, bᵢ ∈ Aᵢ for each i. Then

aᵢ − bᵢ = ∑_{j≠i} (bⱼ − aⱼ)

where the left hand side belongs to Aᵢ and the right hand side belongs to ∑_{j≠i} Aⱼ. By (3) we deduce that aᵢ − bᵢ = 0_V. Since this holds for every i, (1) follows. ◊

2.2 Corollary  If A, B are subspaces of V then V = A ⊕ B if and only if V = A + B and A ∩ B = {0_V}. ◊

Example  A mapping f : ℝ → ℝ is said to be even if f(−x) = f(x) for every x ∈ ℝ, and odd if f(−x) = −f(x) for every x ∈ ℝ. The sets A, B of even, odd functions are subspaces of the vector space V = Map(ℝ, ℝ). Moreover, V = A ⊕ B. To see this, given any f : ℝ → ℝ let f⁺ : ℝ → ℝ and f⁻ : ℝ → ℝ be given by

f⁺(x) = ½[f(x) + f(−x)]   and   f⁻(x) = ½[f(x) − f(−x)].

Then f⁺ is even and f⁻ is odd. Since f = f⁺ + f⁻ we have V = A + B. Since clearly A ∩ B consists only of the zero function, it follows by 2.2 that V = A ⊕ B.
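The even/odd decomposition above is easy to experiment with. A minimal Python sketch (the helper names are ours):

```python
def even_part(f):
    """f+(x) = [f(x) + f(-x)]/2, the even component of f."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """f-(x) = [f(x) - f(-x)]/2, the odd component of f."""
    return lambda x: (f(x) - f(-x)) / 2

f = lambda x: x**3 + x**2 + 1
fe, fo = even_part(f), odd_part(f)

xs = [-2.0, -0.5, 0.0, 1.5]
print(all(fe(x) + fo(x) == f(x) for x in xs))                  # True: f = f+ + f-
print(all(fe(-x) == fe(x) and fo(-x) == -fo(x) for x in xs))   # True
```

(The sample points are dyadic rationals, so the floating-point comparisons here are exact.)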

Example  Let V be the vector space Mat_{n×n}(ℝ). If A, B are the subspaces of V consisting of the symmetric, skew-symmetric matrices then V = A ⊕ B. In fact, every X ∈ V can be written uniquely in the form X = Y + Z where Y ∈ A and Z ∈ B; we have Y = ½(X + Xᵗ) and Z = ½(X − Xᵗ).

In a direct sum, bases can be pasted together:

2.3 Theorem  Let V be a finite-dimensional vector space and let V₁, ..., V_n be non-zero subspaces of V such that V = ⊕_{i=1}^n Vᵢ. If Bᵢ is a basis of Vᵢ for each i then ⋃_{i=1}^n Bᵢ is a basis of V.

Proof  Let dim Vᵢ = dᵢ and let Bᵢ = {e_{i,1}, ..., e_{i,dᵢ}}. Since V = ⊕_{i=1}^n Vᵢ we have Vᵢ ∩ ∑_{j≠i} Vⱼ = {0_V} by 2.1 and hence Vᵢ ∩ Vⱼ = {0_V} for i ≠ j. Consequently Bᵢ ∩ Bⱼ = ∅ for i ≠ j. Now a typical element of the subspace spanned by ⋃_{i=1}^n Bᵢ is of the form

(1)  ∑_{i=1}^n ∑_{j=1}^{dᵢ} λ_{i,j} e_{i,j},

i.e. of the form

(2)  z₁ + ··· + z_n   where   zᵢ = ∑_{j=1}^{dᵢ} λ_{i,j} e_{i,j}.

Since V = ∑_{i=1}^n Vᵢ and since Bᵢ is a basis of Vᵢ it is clear that every x ∈ V can be expressed in the form (1) and so V is spanned by ⋃_{i=1}^n Bᵢ. If now in (2) we have z₁ + ··· + z_n = 0_V then by 2.1 we deduce that each zᵢ = 0_V and consequently each λ_{i,j} = 0. Thus ⋃_{i=1}^n Bᵢ is a basis of V. ◊

2.4 Corollary  dim ⊕_{i=1}^n Vᵢ = ∑_{i=1}^n dim Vᵢ. ◊

We shall now determine precisely when a vector space is a direct sum of finitely many non-zero subspaces. As we shall see, this is closely related to the following types of linear mapping.

Definition  Let A, B be subspaces of a vector space V such that V = A ⊕ B, so that every x ∈ V can be expressed uniquely in the form x = a + b where a ∈ A and b ∈ B. By the projection on A parallel to B we mean the linear mapping p : V → V given by p(x) = a.

Example  We know that ℝ² = X ⊕ D where X = {(x, 0) ; x ∈ ℝ} and D = {(x, x) ; x ∈ ℝ}. The projection on X parallel to D is given by p(x, y) = (x − y, 0). Thus the image of the point (x, y) is the point of intersection with X of the line through (x, y) parallel to the line D. The terminology used is thus suggested by the geometry.

Definition  A linear mapping f : V → V is said to be a projection if there are subspaces A, B such that V = A ⊕ B and f is the projection on A parallel to B. A linear mapping f : V → V is said to be idempotent if f ∘ f = f.

2.5 Theorem  If V = A ⊕ B and if p is the projection on A parallel to B then

(1) A = Im p = {x ∈ V ; x = p(x)};
(2) B = Ker p;
(3) p is idempotent.

Proof  (1) It is clear that A = Im p ⊇ {x ∈ V ; x = p(x)}. If now a ∈ A then its unique representation as the sum of an element in A and an element in B is a = a + 0_V. Consequently p(a) = a and the inclusion becomes equality.

(2) Let x ∈ V have the unique representation x = a + b where a ∈ A and b ∈ B. Then since p(x) = a we have

p(x) = 0_V ⟺ a = 0_V ⟺ x = b ∈ B.

In other words, Ker p = B.

(3) For every x ∈ V we have p(x) ∈ A and so, by (1), p(x) = p[p(x)]. Thus p = p ∘ p. ◊

2.6 Theorem  A linear mapping f : V → V is a projection if and only if it is idempotent, in which case V = Im f ⊕ Ker f and f is the projection on Im f parallel to Ker f.

Proof  Suppose that f is a projection. Then there exist subspaces A, B with V = A ⊕ B and f is the projection on A parallel to B. By 2.5, f is idempotent.

Conversely, suppose that f : V → V is idempotent. If z ∈ Im f ∩ Ker f then we have z = f(y) for some y, and f(z) = 0_V. Consequently, z = f(y) = f[f(y)] = f(z) = 0_V and hence Im f ∩ Ker f = {0_V}. Now for every x ∈ V we observe that

f[x − f(x)] = f(x) − f[f(x)] = f(x) − f(x) = 0_V

and so x − f(x) ∈ Ker f. The identity x = f(x) + x − f(x) now shows that V = Im f + Ker f. It follows by 2.2 that V = Im f ⊕ Ker f.

Suppose now that x = a + b where a ∈ Im f and b ∈ Ker f. Then a = f(y) for some y, and f(b) = 0_V. Consequently,

f(x) = f(a + b) = f(a) + f(b) = f[f(y)] + 0_V = f(y) = a.

In other words, f is the projection on Im f parallel to Ker f. ◊
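The projection of the earlier ℝ² example makes the idempotence in 2.6 concrete. A quick Python sketch (ours, with the projection written out as a plain function):

```python
def p(v):
    """Projection of R^2 on X parallel to D, i.e. p(x, y) = (x - y, 0)."""
    x, y = v
    return (x - y, 0.0)

pts = [(1.0, 2.0), (-3.0, 0.5), (4.0, -4.0)]
print(all(p(p(v)) == p(v) for v in pts))  # True: p is idempotent
print(p((2.0, 2.0)))                      # (0.0, 0.0): the line D is Ker p
```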

2.7 Corollary  If f : V → V is a projection then so is id_V − f. Moreover, in this case,

Im f = Ker(id_V − f).

Proof  Writing f ∘ f = f², we deduce from f² = f that

(id_V − f)² = id_V − f − f + f² = id_V − f.

Also, by 2.5, we have

x ∈ Im f ⟺ x = f(x) ⟺ (id_V − f)(x) = 0_V

and so Im f = Ker(id_V − f). ◊

We shall now show how the decomposition of a vector space into a direct sum of finitely many non-zero subspaces may be expressed in terms of projections.

2.8 Theorem  If V is a vector space then there are non-zero subspaces V₁, ..., V_n of V such that V = ⊕_{i=1}^n Vᵢ if and only if there are non-zero linear mappings p₁, ..., p_n : V → V such that

(1) ∑_{i=1}^n pᵢ = id_V;
(2) i ≠ j ⟹ pᵢ ∘ pⱼ = 0.

Moreover, such mappings pᵢ are necessarily projections, and Vᵢ = Im pᵢ for i = 1, ..., n.

Proof  Suppose first that V = ⊕_{i=1}^n Vᵢ. Then for i = 1, ..., n we have V = Vᵢ ⊕ ∑_{j≠i} Vⱼ. Let pᵢ be the projection on Vᵢ parallel to ∑_{j≠i} Vⱼ. Then Im pᵢ = Vᵢ and, by 2.5, Ker pᵢ = ∑_{j≠i} Vⱼ. For j ≠ i we have Im pⱼ = Vⱼ ⊆ Ker pᵢ and so pᵢ ∘ pⱼ = 0. Also, since every x ∈ V can be written uniquely in the form x = ∑_{i=1}^n xᵢ where xᵢ ∈ Vᵢ for each i, and since pᵢ(x) = xᵢ for each i, we have

x = ∑_{i=1}^n xᵢ = ∑_{i=1}^n pᵢ(x) = (∑_{i=1}^n pᵢ)(x),

whence ∑_{i=1}^n pᵢ = id_V.

Conversely, suppose that p₁, ..., p_n satisfy (1) and (2). Then we note that

pᵢ = pᵢ ∘ id_V = pᵢ ∘ ∑_{j=1}^n pⱼ = ∑_{j=1}^n (pᵢ ∘ pⱼ) = pᵢ ∘ pᵢ,

so each pᵢ is idempotent and therefore, by 2.6, is a projection. Now for every x ∈ V we have

x = id_V(x) = (∑_{i=1}^n pᵢ)(x) = ∑_{i=1}^n pᵢ(x) ∈ ∑_{i=1}^n Im pᵢ,

which shows that V = ∑_{i=1}^n Im pᵢ. If now x ∈ Im pᵢ ∩ ∑_{j≠i} Im pⱼ then, by 2.5, x = pᵢ(x) and x = ∑_{j≠i} xⱼ where pⱼ(xⱼ) = xⱼ for every j ≠ i. Consequently,

x = pᵢ(x) = pᵢ(∑_{j≠i} xⱼ) = ∑_{j≠i} pᵢ[pⱼ(xⱼ)] = 0_V

and it follows by 2.1 that V = ⊕_{i=1}^n Im pᵢ. ◊

The description in 2.8 opens the door to a deep study of linear mappings and their representation by matrices. In order to embark on this, we require the following notion.

Definition  If V is a vector space over a field F and if f : V → V is linear then a subspace W of V is said to be f-invariant (or f-stable) if it satisfies the property

x ∈ W ⟹ f(x) ∈ W.

Example  If f : V → V is linear then Im f and Ker f are f-invariant.

Example  Let D : ℝ[X] → ℝ[X] be the differentiation map on the vector space of all real polynomials. Then the subspace ℝ_n[X] of polynomials of degree at most n is D-invariant.

Example  If f : V → V is linear and z ∈ V with z ≠ 0_V then the subspace spanned by {z} is f-invariant if and only if z is an eigenvector of f. In fact, the subspace spanned by {z} is Fz = {λz ; λ ∈ F}, and this is f-invariant if and only if for every λ ∈ F there exists μ ∈ F such that f(λz) = μz. Taking λ = 1_F we see that z is an eigenvector of f. Conversely, if z is an eigenvector of f then f(z) = μz for some μ ∈ F and so, for every λ ∈ F, we have f(λz) = λf(z) = λμz ∈ Fz.

A useful result concerning invariant subspaces is the following.

2.9 Theorem  If f : V → V is linear then for every polynomial p over F the subspaces Im p(f) and Ker p(f) are f-invariant.

Proof  Observe that for every polynomial p we have

f ∘ p(f) = p(f) ∘ f.

It follows from this that if x = p(f)(y) then f(x) = p(f)[f(y)], so Im p(f) is f-invariant; and if p(f)(x) = 0_V then p(f)[f(x)] = f[p(f)(x)] = 0_V, so Ker p(f) is f-invariant. ◊

In what follows we shall often have occasion to deal with expressions of the form p(f) where p is a polynomial and f is a linear mapping, and in so doing we shall find it convenient to denote composites by simple juxtaposition. Thus, for example, we shall write fp(f) for f ∘ p(f), fg for f ∘ g, and f² for f ∘ f.

Suppose now that V is of finite dimension n and that the subspace W of V is f-invariant. Choose a basis {w₁, ..., w_r} of W and extend it to a basis

we shall write fp(f) for f op(f), fg for fog, f? for fof. Suppose now that V is of finite dimension n and that the subspace W of V is f-invariant. Choose a basis {w1,..., w,} of W and extend it to a basis

Rise { wi, oy wp, tig, tae Vall of V. Then, since W is f-invariant, it is readily seen that the matrix of f relative to B is of the form

A B 2K

DIRECT

SUMS

OF SUBSPACES

15

where A is an r X r matrix that represents on W by f. Suppose now that V = W, @ W2 where f-invariant. If B, is a basis of W, and Bz by 2.3 we have that B = B, UBz is a basis seen that the matrix of f relative to B is A, O

the mapping induced W, and Wy, are each is a basis of W2 then of V, and it is readily of the form

0 Ag

where Aj, Az represent the mappings induced on Wj, Wp by f. n

More generally, if V =

@ W; where each W, is f-invariant

p,"

respectively, where pi,...,Pk are distinct irreducibles in F[X]. Then jie of the subspaces V; = Kerp,(f)* 13 f-invariant and -6 V;. aa

Moreover,

if f; :Vi —

on V; by f then the minimum

V; 1s the linear mapping died ao Of fa38 pe

and the characteristic polynomual of f; 13 pi ;

Proof  If k = 1 the result is trivial, so suppose that k ≥ 2. For i = 1, ..., k let qᵢ = m_f / pᵢ^{eᵢ} = ∏_{j≠i} pⱼ^{eⱼ}. Then there is no irreducible factor that is common to each of q₁, ..., q_k and so there exist a₁, ..., a_k ∈ F[X] such that

q₁a₁ + q₂a₂ + ··· + q_ka_k = 1.

Writing tᵢ = qᵢaᵢ for each i and substituting f in this polynomial identity, we obtain

(1)  t₁(f) + t₂(f) + ··· + t_k(f) = id_V.

Now by the definition of qᵢ we have that if i ≠ j then m_f divides qᵢqⱼ. Consequently qᵢ(f)qⱼ(f) = 0 for i ≠ j and then

(2)  (i ≠ j)   tᵢ(f)tⱼ(f) = 0.

By (1), (2) and 2.8 we see that each tᵢ(f) is a projection and

V = ⊕_{i=1}^k Im tᵢ(f).

Moreover, by 2.9, each of the subspaces Im tᵢ(f) is f-invariant. We now show that Im tᵢ(f) = Ker pᵢ(f)^{eᵢ}. Since pᵢ^{eᵢ}qᵢ = m_f we have pᵢ(f)^{eᵢ}qᵢ(f) = m_f(f) = 0, from which it follows that pᵢ(f)^{eᵢ}tᵢ(f) = 0 and hence Im tᵢ(f) ⊆ Ker pᵢ(f)^{eᵢ}. To establish the reverse inclusion, observe that, for every j ≠ i,

tⱼ(f) = aⱼ(f)qⱼ(f) = ∏_{l≠j} pₗ(f)^{eₗ} · aⱼ(f),

which contains the factor pᵢ(f)^{eᵢ}, and hence

Ker pᵢ(f)^{eᵢ} ⊆ ⋂_{j≠i} Ker tⱼ(f) ⊆ Ker ∑_{j≠i} tⱼ(f) = Ker(id_V − tᵢ(f))   by (1)
             = Im tᵢ(f)   by 2.7.

As for the induced mapping fᵢ : Vᵢ → Vᵢ, let m_{fᵢ} be its minimum polynomial. Since pᵢ(f)^{eᵢ} is the zero map on Vᵢ, so is pᵢ(fᵢ)^{eᵢ}. Consequently we have that m_{fᵢ} divides pᵢ^{eᵢ}. Thus m_{fᵢ} divides m_f and the m_{fᵢ} are relatively prime. Suppose now that g ∈ F[X] is a multiple of m_{fᵢ} for each i. Then g(fᵢ) is the zero map on Vᵢ. Since V = ⊕_{i=1}^k Vᵢ, every x ∈ V can be written x = ∑_{i=1}^k xᵢ with xᵢ ∈ Vᵢ, and then

g(f)(x) = ∑_{i=1}^k g(f)(xᵢ) = ∑_{i=1}^k g(fᵢ)(xᵢ) = 0_V

and so g(f) = 0, whence m_f divides g. Thus we see that m_f is the least common multiple of m_{f₁}, ..., m_{f_k}. Since these k polynomials are relatively prime, we then have m_f = ∏_{i=1}^k m_{fᵢ}. But we know that m_f = ∏_{i=1}^k pᵢ^{eᵢ}, and that m_{fᵢ} divides pᵢ^{eᵢ}. Since all the polynomials in question are monic it follows that m_{fᵢ} = pᵢ^{eᵢ} for i = 1, ..., k.

Finally, we can paste together bases of the subspaces Vᵢ to form a basis of V with respect to which the matrix of f is of the block diagonal form

[ A₁            ]
[    A₂         ]
[       ⋱       ]
[          A_k ]

where Aᵢ represents fᵢ. Since, by the theory of determinants,

det(XI − M) = ∏_{i=1}^k det(XI − Aᵢ),

we see that χ_f = ∏_{i=1}^k χ_{fᵢ}. Now we know that m_{fᵢ} = pᵢ^{eᵢ} and so, by 1.4, we must have χ_{fᵢ} = pᵢ^{rᵢ} for some rᵢ ≥ eᵢ. Thus

∏_{i=1}^k pᵢ^{rᵢ} = χ_f = ∏_{i=1}^k pᵢ^{dᵢ},

from which it follows that rᵢ = dᵢ for i = 1, ..., k. ◊
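The projections tᵢ(f) of the proof can be computed explicitly for the matrix of the earlier example with m_A = (X − 1)(X − 2). Here a Bezout identity is 1 = −(X − 2)·1 + (X − 1)·1, giving t₁ = 2I − A and t₂ = A − I. A Python sketch (helper names ours):

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X))] for i in range(len(X))]

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]   # m_A = (X - 1)(X - 2)
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# Bezout: 1 = -(X - 2) + (X - 1), so t1(A) = 2I - A and t2(A) = A - I
t1 = [[2 * I[i][j] - A[i][j] for j in range(3)] for i in range(3)]
t2 = [[A[i][j] - I[i][j] for j in range(3)] for i in range(3)]

print(add(t1, t2) == I)              # True: t1 + t2 = id, as in (1)
print(matmul(t1, t2) == [[0]*3]*3)   # True: t1 t2 = 0, as in (2)
print(matmul(t1, t1) == t1)          # True: t1 is a projection
```

Here t₁ projects onto Ker(A − I) and t₂ onto Ker(A − 2I), the two primary components.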

2.11 Corollary  dim Vᵢ = dᵢ deg pᵢ for i = 1, ..., k.

Proof  dim Vᵢ is the degree of χ_{fᵢ}. ◊

2.12 Corollary  Let V be a non-zero finite-dimensional vector space over a field F. If f : V → V is linear and all the eigenvalues of f lie in F, so that

χ_f = (X − λ₁)^{d₁}(X − λ₂)^{d₂} ··· (X − λ_k)^{d_k},
m_f = (X − λ₁)^{e₁}(X − λ₂)^{e₂} ··· (X − λ_k)^{e_k},

then each of the subspaces Vᵢ = Ker(f − λᵢ id_V)^{eᵢ} is f-invariant, of dimension dᵢ, and V = ⊕_{i=1}^k Vᵢ. ◊

Example  Consider the linear mapping f : ℝ³ → ℝ³ given by

f(x, y, z) = (−z, x + z, y + z).

Relative to the standard ordered basis, the matrix of f is

A = [ 0  0 −1 ]
    [ 1  0  1 ]
    [ 0  1  1 ].

It is readily seen that χ_A = m_A = (X + 1)(X − 1)². By 2.12,

ℝ³ = Ker(f + id_V) ⊕ Ker(f − id_V)²

with Ker(f + id_V) of dimension 1 and Ker(f − id_V)² of dimension 2. Now

(f + id_V)(x, y, z) = (x − z, x + y + z, y + 2z)

so a basis for Ker(f + id_V) is {(1, −2, 1)}. Also,

(f − id_V)²(x, y, z) = (x − y + z, −2x + 2y − 2z, x − y + z)

so a basis for Ker(f − id_V)² is {(0, 1, 1), (1, 1, 0)}. Thus a basis for ℝ³ with respect to which the matrix of f is in block diagonal form is

B = {(1, −2, 1), (0, 1, 1), (1, 1, 0)}.

The transition matrix from B to the standard basis is

P = [  1  0  1 ]
    [ −2  1  1 ]
    [  1  1  0 ]

and the block diagonal form of A is then

P⁻¹AP = [ −1  0  0 ]
        [  0  2  1 ]
        [  0 −1  0 ].

Example  Consider the differential equation

(Dⁿ + a_{n−1}Dⁿ⁻¹ + ··· + a₁D + a₀)f = 0

with constant (complex) coefficients. Let V be the solution space, i.e. the set of all infinitely differentiable functions satisfying the equation. If

m = Xⁿ + a_{n−1}Xⁿ⁻¹ + ··· + a₁X + a₀

then over ℂ we have

m = (X − α₁)^{n₁}(X − α₂)^{n₂} ··· (X − α_k)^{n_k}.

Then D : V → V is linear and its minimum polynomial is m. By 2.12, V is the direct sum of the solution spaces Vᵢ of the differential equations

(D − αᵢ id)^{nᵢ} f = 0.

Now the solutions of (D − α id)ⁿ f = 0 can be determined using the fact that, by a simple inductive argument,

(D − α id)ⁿ f = e^{αx} Dⁿ(e^{−αx} f).

Thus f is a solution if and only if Dⁿ(e^{−αx} f) = 0, which is the case if and only if e^{−αx} f is a polynomial of degree at most n − 1.
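The inductive argument mentioned above can be made explicit. The base case is just the product rule, and the step from n − 1 to n applies the base case to the function D^{n−1}(e^{−αx}f):

```latex
% base case (product rule): D(e^{-\alpha x}f) = e^{-\alpha x}(Df - \alpha f), i.e.
(D - \alpha\,\mathrm{id})f \;=\; e^{\alpha x}\,D\!\left(e^{-\alpha x}f\right).

% inductive step: assuming the identity for n-1,
(D - \alpha\,\mathrm{id})^{n}f
  \;=\; (D - \alpha\,\mathrm{id})\!\left[e^{\alpha x}\,D^{n-1}\!\left(e^{-\alpha x}f\right)\right]
  \;=\; e^{\alpha x}\,D\!\left(D^{n-1}\!\left(e^{-\alpha x}f\right)\right)
  \;=\; e^{\alpha x}\,D^{n}\!\left(e^{-\alpha x}f\right).
```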

A basis for the solution space of (D − α id)ⁿ f = 0 is then

{e^{αx}, xe^{αx}, ..., x^{n−1}e^{αx}}.

It is natural to consider the particular case of the Primary Decomposition Theorem in which the irreducible factors pᵢ of m_f are all linear and each eᵢ = 1. This gives the following important result.


2.13 Theorem  Let V be a non-zero finite-dimensional vector space over a field F. Then a linear mapping f : V → V is diagonalizable if and only if its minimum polynomial m_f is a product of distinct linear factors.

Proof  Suppose that

m_f = (X − λ₁)(X − λ₂) ··· (X − λ_k)

where λ₁, ..., λ_k ∈ F are distinct. By 2.12, V is the direct sum of the f-invariant subspaces Vᵢ = Ker(f − λᵢ id_V). For every x ∈ Vᵢ we have (f − λᵢ id_V)(x) = 0_V, so that f(x) = λᵢx. Thus every non-zero element of Vᵢ is an eigenvector associated with the eigenvalue λᵢ. By 2.3, we can paste together bases for V₁, ..., V_k to form a basis for V. Thus V has a basis consisting of eigenvectors of f, so f is diagonalizable.

Conversely, suppose that V has a basis consisting of eigenvectors of f. Let λ₁, ..., λ_k be the distinct eigenvalues of f and consider the polynomial

p = (X − λ₁)(X − λ₂) ··· (X − λ_k).

Clearly, p(f) maps every basis vector to 0_V and consequently p(f) = 0. The minimum polynomial m_f therefore divides p, and must coincide with p since every eigenvalue is a zero of m_f. ◊

Consider the linear mapping f : IR° — IR® given by

f(z, y,2) = (7x — y— 22,-2 + Ty + 22, —22 + 2y + 102). Relative to the standard ordered basis of IR®, the matrix of f is Tt A=j|-1 i

=—1° 7 2

=2 2 10

It is readily seen that x4 = (X — 6)?(X — 12) and that my = (X — 6)(X — 12) so, by 2.13, f is diagonalizable. An interesting result concerning diagonalizable mappings that will be useful later is the following.
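The key computation behind this example, that the product of the distinct linear factors of χ_A already annihilates A, can be checked directly (helper functions ours):

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(X, c):
    """Return X - c*I."""
    return [[X[i][j] - (c if i == j else 0) for j in range(len(X))]
            for i in range(len(X))]

A = [[7, -1, -2], [-1, 7, 2], [-2, 2, 10]]
print(matmul(shift(A, 6), shift(A, 12)) == [[0, 0, 0]] * 3)  # True
```

Since (A − 6I)(A − 12I) = 0, we get m_A = (X − 6)(X − 12), a product of distinct linear factors, and 2.13 applies.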

2.14 Theorem  Let V be a non-zero finite-dimensional vector space over a field F and let f, g : V → V be diagonalizable linear mappings. Then f and g are simultaneously diagonalizable (in the sense that there is a basis of V consisting of eigenvectors of both f and g) if and only if f ∘ g = g ∘ f.

Proof  ⇒ : Suppose that there is a basis {v₁, ..., v_n} of V such that each vᵢ is an eigenvector of both f and g. If f(vᵢ) = λᵢvᵢ and g(vᵢ) = μᵢvᵢ then

f[g(vᵢ)] = λᵢμᵢvᵢ = μᵢλᵢvᵢ = g[f(vᵢ)].

Since f ∘ g and g ∘ f thus agree on a basis, it follows that they are equal.

⇐ : Suppose now that f ∘ g = g ∘ f. Since f is diagonalizable, V = ⊕_{i=1}^k Vᵢ where V₁, ..., V_k are the eigenspaces of f. If x ∈ Vᵢ, say f(x) = λᵢx, then f[g(x)] = g[f(x)] = λᵢg(x), so g(x) ∈ Vᵢ; thus each Vᵢ is g-invariant. Let gᵢ : Vᵢ → Vᵢ be the linear mapping thus induced by g. Since g is diagonalizable so is each gᵢ, for the minimum polynomial of gᵢ divides that of g. We can therefore find a basis Bᵢ of Vᵢ consisting of eigenvectors of gᵢ. Since every eigenvector of gᵢ is an eigenvector of g and since every element of Vᵢ is an eigenvector of f, it follows that ⋃_{i=1}^k Bᵢ is a basis of V consisting of eigenvectors of both f and g. ◊

2.15 Corollary  Let A, B be n×n matrices over a field F. If A and B are diagonalizable then they are simultaneously diagonalizable (i.e. there is an invertible matrix P such that P⁻¹AP and P⁻¹BP are diagonal) if and only if AB = BA. ◊

CHAPTER THREE

Reduction to triangular form

Despite the fact that, in general, f : V → V does not have a diagonal matrix representation, it is possible to 'simplify' the matrix representation of f in several ways. In this Chapter we shall describe the 'easiest' of these. We shall be concerned with those linear mappings f whose minimum polynomial (and hence also whose characteristic polynomial) factorises completely as a product of (not necessarily distinct) linear factors. Of course, this always happens when the ground field is ℂ, so the results we shall prove will be valid for all linear mappings on a finite-dimensional complex vector space. Specifically, we shall show that for such a mapping f there is an ordered basis of V with respect to which the matrix of f is triangular.

In order to see how to proceed, we observe first that by 2.12 we can write V as a direct sum of the f-invariant subspaces Vᵢ = Ker(f − λᵢ id_V)^{eᵢ}. Let fᵢ : Vᵢ → Vᵢ be the linear mapping induced on the 'primary component' Vᵢ by f, and consider the mapping fᵢ − λᵢ id_{Vᵢ} : Vᵢ → Vᵢ. We have that (fᵢ − λᵢ id_{Vᵢ})^{eᵢ} is the zero map on Vᵢ, so fᵢ − λᵢ id_{Vᵢ} is nilpotent, in the following sense.

Definition  A linear mapping f : V → V is said to be nilpotent if f^m = 0 for some positive integer m.

Example  f : ℝ³ → ℝ³ given by f(x, y, z) = (0, x, y) is nilpotent. In fact, f²(x, y, z) = (0, 0, x) and f³ = 0.

Example  If f : ℂⁿ → ℂⁿ is such that all the eigenvalues of f are 0 then χ_f = Xⁿ and so, by Cayley-Hamilton, fⁿ = χ_f(f) = 0. Thus f is nilpotent.
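The first example above can be watched in action with a few lines of Python (ours):

```python
def f(v):
    """The nilpotent map f(x, y, z) = (0, x, y) from the example."""
    x, y, z = v
    return (0, x, y)

v = (3, 1, 4)
print(f(f(v)))      # (0, 0, 3): f^2(x, y, z) = (0, 0, x)
print(f(f(f(v))))   # (0, 0, 0): f^3 = 0
```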

Example  The differentiation map D : ℝ_n[X] → ℝ_n[X] is nilpotent.

We now produce a particularly useful basis for V in the presence of a nilpotent linear map.

3.1 Theorem  Let V be a non-zero finite-dimensional vector space over a field F and let f : V → V be a nilpotent linear mapping. Then there is a basis {v₁, ..., v_n} of V such that

f(v₁) = 0_V;
f(v₂) ∈ ⟨v₁⟩;
f(v₃) ∈ ⟨v₁, v₂⟩;
  ⋮
f(v_n) ∈ ⟨v₁, ..., v_{n−1}⟩.

Proof  Since f is nilpotent there is a positive integer m such that f^m = 0. If f = 0 then every basis of V satisfies the stated conditions, so let f ≠ 0. Now let k be the smallest positive integer such that f^k = 0. Then f^i ≠ 0 for 1 ≤ i ≤ k − 1.


Now let T_k = B_k \ B_{k−1} and write f(T_k) = {f(x) ; x ∈ T_k}. Then by 4.3 the set

    B_{k−2} ∪ f(T_k)

is a linearly independent subset of W_{k−1}. Extend this to a basis

    B_{k−2} ∪ f(T_k) ∪ {y_1, ..., y_β}

of W_{k−1}. Now let T_{k−1} = f(T_k) ∪ {y_1, ..., y_β}. Then by 4.3 the set

    B_{k−3} ∪ f(T_{k−1})

is a linearly independent subset of W_{k−2}. Extend this to a basis

    B_{k−3} ∪ f(T_{k−1}) ∪ {z_1, ..., z_γ}

of W_{k−2}, and so on. Writing T_k as {x_1, ..., x_α}, we thus see that we can form the following basis of V :

    x_1, ..., x_α
    f(x_1), ..., f(x_α), y_1, ..., y_β
    f²(x_1), ..., f²(x_α), f(y_1), ..., f(y_β), z_1, ..., z_γ
    ......................................................
    f^{k−1}(x_1), ..., f^{k−1}(x_α), f^{k−2}(y_1), ..., f^{k−2}(y_β), ..., t_1, ..., t_ω

Note that in this table the elements in the i-th row from the bottom are in W_i. Also, every element in the table is mapped by f to the element lying immediately below it, the elements in the bottom row being mapped to 0_V. Now order this basis by taking the first column starting at the bottom, then the second column starting at the bottom, and so on. Then it is readily seen that the ordered basis B that we obtain in this way is such that the matrix of f relative to B is a Jordan block associated with the eigenvalue 0. ◇

REDUCTION TO JORDAN FORM

Example  To illustrate the above argument, consider the mapping f : ℝ⁴ → ℝ⁴ given by

    f(a, b, c, d) = (0, a, d, 0).

We have f² = 0 so f is nilpotent of index 2. Now

    V_1 = Ker f = {(0, b, c, 0) ; b, c ∈ ℝ},
    V_2 = Ker f² = ℝ⁴.

A basis for V_1 is B_1 = {(0,1,0,0), (0,0,1,0)} which we extend to a basis

    B_2 = {(0,1,0,0), (0,0,1,0), (1,0,0,0), (0,0,0,1)}

of ℝ⁴. Now consider T_2 = {(1,0,0,0), (0,0,0,1)}. We have

    f(T_2) = {(0,1,0,0), (0,0,1,0)}

and B_0 ∪ f(T_2) = B_1. We then form the basis

    (1,0,0,0), (0,0,0,1), (0,1,0,0), (0,0,1,0)

of ℝ⁴ and order it as follows :

    B = {(0,1,0,0), (1,0,0,0), (0,0,1,0), (0,0,0,1)}.

The transition matrix from B to the standard basis is

    P = [0 1 0 0]
        [1 0 0 0]
        [0 0 1 0]
        [0 0 0 1]

Now P⁻¹ = P and the matrix of f relative to the standard basis is

    A = [0 0 0 0]
        [1 0 0 0]
        [0 0 0 1]
        [0 0 0 0]

So the Jordan block is given by

    P⁻¹AP = [0 1 0 0]
            [0 0 0 0]
            [0 0 0 1]
            [0 0 0 0]
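As an aside (a numpy illustration added here, not part of the original text), the change of basis above can be verified directly:

```python
import numpy as np

# Matrix of f(a,b,c,d) = (0,a,d,0) relative to the standard basis.
A = np.array([[0., 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])
# Columns of P are the vectors of the ordered Jordan basis B.
P = np.array([[0., 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
J = np.linalg.inv(P) @ A @ P
print(np.round(J).astype(int))
```

The printed matrix is the Jordan block displayed above.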

In practice, we rarely have to carry out the above computation. To discover why, let us look more closely at the proof of 4.4. Observe that

    |T_k| = α = n_k − n_{k−1},
    |T_{k−1}| = α + β = n_{k−1} − n_{k−2},
    |T_{k−2}| = α + β + γ = n_{k−2} − n_{k−3},

and consequently

    α + β + γ + ⋯ + ω = dim Ker f,

as can be seen by referring to the basis displayed above. The number of elements in the bottom row of this display is dim Ker f. Now from this basis it is clear that there are α ≥ 1 elementary Jordan matrices of size k × k involved, then β ≥ 0 of size (k − 1) × (k − 1), and so on. So we conclude from the above observation that the number of elementary Jordan matrices appearing is dim Ker f.

Returning to our Example, we see that Ker f has dimension 2, so there are precisely two elementary Jordan matrices involved. Since one at least has to be of size k × k = 2 × 2, the only possibility for the Jordan block is

    [0 1 0 0]
    [0 0 0 0]
    [0 0 0 1]
    [0 0 0 0]
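This counting argument can be checked mechanically: the number of elementary Jordan blocks of a nilpotent matrix is the dimension of its kernel, and the differences of ranks of successive powers recover the block sizes. A small numpy illustration (not from the text), using the 4 × 4 example above:

```python
import numpy as np

A = np.array([[0., 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])
blocks = 4 - np.linalg.matrix_rank(A)    # dim Ker f = number of blocks
print(blocks)                            # 2
# rank A^(j-1) - rank A^j counts the blocks of size >= j:
r = [np.linalg.matrix_rank(np.linalg.matrix_power(A, j)) for j in range(3)]
print([r[j - 1] - r[j] for j in (1, 2)])  # [2, 2]: both blocks have size 2
```

So without constructing any basis we already know the Jordan form consists of two 2 × 2 blocks.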

Let us now apply 4.4 to the mappings f_i − λ_i id_{V_i} of 2.12. Note that by 2.10 the minimum polynomial of f_i is m_{f_i} = (X − λ_i)^{e_i}. Consequently we have that the mapping f_i − λ_i id_{V_i} is nilpotent of index e_i on the d_i-dimensional subspace V_i.


4.5 Theorem  Let V be a non-zero finite-dimensional vector space over a field F. If f : V → V is linear and if all the eigenvalues of f lie in F then there is an ordered basis of V with respect to which the matrix of f is a block diagonal matrix

    [A_1           ]
    [    A_2       ]
    [        ⋱     ]
    [           A_k]

in which every A_i is a Jordan block.

Proof  With the usual notation, if we apply 4.4 to the nilpotent mapping f_i − λ_i id_{V_i} then we see that there is a basis of V_i = Ker(f − λ_i id_V)^{e_i} with respect to which the matrix of f_i − λ_i id_{V_i} is a Jordan block with 0 down the diagonal (since the only eigenvalue of a nilpotent mapping is 0). It follows that the matrix of f_i is a Jordan block with λ_i down the diagonal. ◇

Definition  A matrix of the form described in 4.5 is called a Jordan canonical matrix of f.

Of course a Jordan canonical matrix is, strictly speaking, not unique since the order in which the Jordan blocks A_i appear down the diagonal is not specified. However, the number of such blocks, the size of each block, and the number of elementary Jordan matrices that appear in each block, are uniquely determined by f. So, provided we ignore the order of the blocks, we can choose to talk of 'the' Jordan matrix that represents f. This is often also called the Jordan normal form. If the characteristic and minimum polynomials of f are

    χ_f = ∏_{i=1}^{k} (X − λ_i)^{d_i},    m_f = ∏_{i=1}^{k} (X − λ_i)^{e_i}

then from the previous discussion we have that, in the Jordan form, the eigenvalue λ_i appears d_i times in the diagonal, and the number of elementary Jordan matrices associated with λ_i is dim Ker(f_i − λ_i id_{V_i}), which is the geometric multiplicity of the eigenvalue λ_i. Moreover, at least one of these elementary Jordan matrices is of size e_i × e_i.


Example  Let f : ℝ⁷ → ℝ⁷ be linear with characteristic and minimum polynomials

    χ_f = (X − 1)³(X − 2)⁴,    m_f = (X − 1)²(X − 2)³.

In any Jordan matrix that represents f the eigenvalue 1 appears three times in the diagonal, with at least one associated elementary Jordan matrix being of size 2 × 2; and the eigenvalue 2 appears four times in the diagonal, with at least one associated elementary Jordan matrix of size 3 × 3. Up to the order of the blocks, the only possibility is therefore

    [1 1 0 0 0 0 0]
    [0 1 0 0 0 0 0]
    [0 0 1 0 0 0 0]
    [0 0 0 2 1 0 0]
    [0 0 0 0 2 1 0]
    [0 0 0 0 0 2 0]
    [0 0 0 0 0 0 2]
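As a sanity check of this reasoning, one can build the 7 × 7 matrix above with numpy (an illustration added here, not from the text) and verify that (X − 1)²(X − 2)³ annihilates it while no proper divisor does:

```python
import numpy as np

I = np.eye(7)
# Blocks J_2(1), J_1(1), J_3(2), J_1(2) down the diagonal.
J = np.diag([1., 1, 1, 2, 2, 2, 2]) + np.diag([1., 0, 0, 1, 1, 0], k=1)
m = np.linalg.matrix_power(J - I, 2) @ np.linalg.matrix_power(J - 2*I, 3)
print(np.allclose(m, 0))   # True: (X-1)^2 (X-2)^3 annihilates J
# Lowering either exponent by one no longer gives the zero matrix:
print(np.allclose((J - I) @ np.linalg.matrix_power(J - 2*I, 3), 0))
print(np.allclose(np.linalg.matrix_power(J - I, 2) @ np.linalg.matrix_power(J - 2*I, 2), 0))
```

The largest block for each eigenvalue fixes the exponent in the minimum polynomial, exactly as claimed.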

Example  Let us modify the previous Example slightly. Suppose that χ_f is as before and that now

    m_f = (X − 1)²(X − 2)².

In this case the eigenvalue 2 appears four times in the diagonal with at least one associated elementary Jordan matrix of size 2 × 2. The possibilities for the Jordan form are then, up to the order of the blocks,

    [1 1 0 0 0 0 0]         [1 1 0 0 0 0 0]
    [0 1 0 0 0 0 0]         [0 1 0 0 0 0 0]
    [0 0 1 0 0 0 0]         [0 0 1 0 0 0 0]
    [0 0 0 2 1 0 0]   and   [0 0 0 2 1 0 0]
    [0 0 0 0 2 0 0]         [0 0 0 0 2 0 0]
    [0 0 0 0 0 2 1]         [0 0 0 0 0 2 0]
    [0 0 0 0 0 0 2]         [0 0 0 0 0 0 2]

Example  If f : V → V has characteristic polynomial

    χ_f = (X − 2)²(X − 3)³

then the possible Jordan forms, obtained by considering all six possible minimum polynomials, are

    [A  ]
    [  B]

where A is one of

    [2 1]    [2 0]
    [0 2],   [0 2]

and B is one of

    [3 1 0]    [3 1 0]    [3 0 0]
    [0 3 1],   [0 3 0],   [0 3 0]
    [0 0 3]    [0 0 3]    [0 0 3]

We now consider the problem of finding a Jordan basis for f, i.e. a basis of V with respect to which the matrix of f is a Jordan canonical matrix J. This is, of course, equivalent to the problem of finding an invertible matrix P such that P⁻¹AP = J where A is the matrix that represents f relative to some fixed ordered basis. To see how to proceed, it suffices to consider the very special case where the Jordan matrix of f is the t × t matrix

    [λ 1        ]
    [  λ 1      ]
    [    ⋱  ⋱  ]
    [       λ 1 ]
    [         λ ]

A corresponding basis {v_1, ..., v_t} will be such that

    f(v_1) = λv_1,
    f(v_2) = λv_2 + v_1,
    f(v_3) = λv_3 + v_2,
    ...
    f(v_{t−1}) = λv_{t−1} + v_{t−2},
    f(v_t) = λv_t + v_{t−1}.

Thus, for every t × t elementary Jordan matrix associated with λ we require v_1, ..., v_t to be linearly independent with

    v_1 ∈ Im(f − λ id) ∩ Ker(f − λ id);    (f − λ id)(v_i) = v_{i−1}    (i = 2, ..., t).

Example  Let f : ℝ³ → ℝ³ be given by

    f(x, y, z) = (x + y, −x + 3y, −x + y + 2z).

Relative to the standard ordered basis, the matrix of f is

    A = [ 1 1 0]
        [−1 3 0]
        [−1 1 2]

We have χ_A = (X − 2)³ and m_A = (X − 2)². The Jordan form is then

    J = [2 1 0]
        [0 2 0]
        [0 0 2]

Now we have

    (f − 2 id)(x, y, z) = (−x + y, −x + y, −x + y)

and we have first to choose v_1 ∈ Im(f − 2 id) ∩ Ker(f − 2 id). Clearly, v_1 = (1, 1, 1) will do. Next we have to find v_2, independent of v_1, such that (f − 2 id)(v_2) = v_1. Clearly, v_2 = (1, 2, 1) will do. Finally, we have to choose v_3 ∈ Ker(f − 2 id) with {v_1, v_2, v_3} independent. Clearly, v_3 = (1, 1, 0) will do. Thus a Jordan basis is

    B = {(1, 1, 1), (1, 2, 1), (1, 1, 0)}.

The transition matrix from B to the standard basis is

    P = [1 1 1]
        [1 2 1]
        [1 1 0]

We invite the reader to verify that P⁻¹AP = J. An interesting consequence of the Jordan form is the following result.
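Taking up the invitation, a short numpy check (added here for illustration, not part of the text) confirms P⁻¹AP = J:

```python
import numpy as np

A = np.array([[1., 1, 0],
              [-1, 3, 0],
              [-1, 1, 2]])
P = np.array([[1., 1, 1],
              [1, 2, 1],
              [1, 1, 0]])  # columns are the Jordan basis vectors
J = np.linalg.inv(P) @ A @ P
print(np.round(J).astype(int))
```

The printed matrix is the Jordan form J displayed above.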

4.6 Theorem  Every square matrix A over ℂ is similar to its transpose.

Proof  Because of the form of the Jordan canonical matrix, it clearly suffices to establish the result when A is an elementary Jordan matrix of the form

    [λ 1      ]
    [  λ ⋱   ]
    [    ⋱ 1 ]
    [      λ  ]

Now if B = {v_1, ..., v_k} is an associated Jordan basis, define

    w_i = v_{k−i+1}    (i = 1, ..., k)

and consider the ordered basis

    B* = {w_1, ..., w_k} = {v_k, ..., v_1}.

Now it is readily seen that the matrix relative to this basis is Aᵗ. Consequently we have that A is similar to Aᵗ. ◇

We shall now illustrate the usefulness of the Jordan form in solving systems of linear differential equations. It is not our intention to become heavily involved with the theory. A little by way of explanation together with some illustrative examples is all we have in mind. By a system of linear differential equations with constant coefficients we mean a system of equations of the form

    x_1′ = a_11 x_1 + a_12 x_2 + ⋯ + a_1n x_n

    x_2′ = a_21 x_1 + a_22 x_2 + ⋯ + a_2n x_n
    ..........................................
    x_n′ = a_n1 x_1 + a_n2 x_2 + ⋯ + a_nn x_n


where x_1, ..., x_n are real functions, x_i′ denotes the derivative of x_i, and a_ij ∈ ℝ for all i, j. These equations can be written in the matrix form

(1)    X′ = AX

where X = [x_1 ... x_n]ᵗ ∈ Mat_{n×1}(ℝ) and A = [a_ij]_{n×n}. Suppose that A can be reduced to Jordan normal form J_A, and let P be an invertible matrix such that P⁻¹AP = J_A. Writing Y = P⁻¹X, we have

(2)    X = PY

and so

(3)    Y′ = P⁻¹X′ = P⁻¹AX = P⁻¹APY = J_A Y.

Now the form of J_A means that (3) is a system that is considerably easier to solve for Y; then, by (2), PY is a solution of (1).

Example  Consider the system

    x_1′ =  x_1 +  x_2
    x_2′ = −x_1 + 3x_2
    x_3′ = −x_1 + 4x_2 − x_3

i.e. X′ = AX where

    X = [x_1]        A = [ 1 1  0]
        [x_2],           [−1 3  0]
        [x_3]            [−1 4 −1]

We have χ_A = (X + 1)(X − 2)² = m_A and so the Jordan form of A is

    J_A = [−1 0 0]
          [ 0 2 1]
          [ 0 0 2]

We now determine an invertible matrix P such that P⁻¹AP = J_A. For this, we determine a Jordan basis. Let us do so with


matrices rather than mappings, for a change. Clearly, we have to find independent column vectors p_1, p_2, p_3 such that

    (A + I_3)p_1 = 0,    (A − 2I_3)p_2 = 0,    (A − 2I_3)p_3 = p_2.

Suitable vectors are, for example,

    p_1 = [0]    p_2 = [1]    p_3 = [−1]
          [0],         [1],         [ 0]
          [1]          [1]          [ 0]

Thus we can take

    P = [0 1 −1]
        [0 1  0]
        [1 1  0]

(Check that P⁻¹AP = J_A or, equivalently, that AP = PJ_A.) With Y = P⁻¹X we now solve Y′ = J_A Y, i.e.

    y_1′ = −y_1,
    y_2′ = 2y_2 + y_3,

    y_3′ = 2y_3.

The first and third of these equations give y_1 = a_1 e^{−t} and y_3 = a_3 e^{2t}, and the second equation becomes

    y_2′ = 2y_2 + a_3 e^{2t}

so that y_2 = a_3 t e^{2t} + a_2 e^{2t}. Consequently we see that

    Y = [a_1 e^{−t}              ]
        [a_2 e^{2t} + a_3 t e^{2t}]
        [a_3 e^{2t}              ]

A solution of the original system of equations is then given by

    X = PY = [a_2 e^{2t} + a_3 (t − 1) e^{2t}      ]
             [a_2 e^{2t} + a_3 t e^{2t}            ]
             [a_1 e^{−t} + a_2 e^{2t} + a_3 t e^{2t}]
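The general solution can be verified symbolically (a sympy check added for illustration; a1, a2, a3 stand for the arbitrary constants above):

```python
import sympy as sp

t, a1, a2, a3 = sp.symbols('t a1 a2 a3')
A = sp.Matrix([[1, 1, 0], [-1, 3, 0], [-1, 4, -1]])
# The claimed general solution X = PY.
X = sp.Matrix([a2*sp.exp(2*t) + a3*(t - 1)*sp.exp(2*t),
               a2*sp.exp(2*t) + a3*t*sp.exp(2*t),
               a1*sp.exp(-t) + a2*sp.exp(2*t) + a3*t*sp.exp(2*t)])
residual = sp.simplify(X.diff(t) - A*X)
print(residual.T)  # the zero vector, so X' = AX holds identically
```

Since the residual vanishes for all values of the constants, X is indeed a solution of the system.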

CHAPTER

FIVE

The rational and classical forms

Although in general the minimum polynomial of a linear mapping f : V → V can be expressed as a product of powers of irreducible polynomials over the ground field F of V, say

    m_f = p_1^{e_1} p_2^{e_2} ⋯ p_k^{e_k},

the irreducible polynomials p_i need not be linear. Put another way, the eigenvalues of f need not in general all lie in the ground field F. It is natural, therefore, to seek a canonical matrix representation for f in the general case, which will reduce to the Jordan representation when all the eigenvalues of f do belong to F. In order to develop the machinery to deal with this, we first consider the following notion.

Suppose that W is a subspace of the vector space V. Then in particular W is a (normal) subgroup of the additive group of V and so we can form the quotient group V/W. The elements of this are the cosets

    x + W = {x + w ; w ∈ W},

and the group operation is given by

    (x + W) + (y + W) = (x + y) + W.

Now we can define a multiplication by scalars on V/W by setting

    λ(x + W) = λx + W.

With respect to this, it is readily seen that V/W becomes a vector space over F. We call this the quotient space of V by W and denote it also by V/W.


5.1 Theorem  If V is a finite-dimensional vector space and W is a subspace of V then the quotient space V/W is also finite-dimensional. Moreover, if {v_1, ..., v_m} is a basis of W and {x_1 + W, ..., x_k + W} is a basis of V/W then

    B = {v_1, ..., v_m, x_1, ..., x_k}

is a basis of V.

Proof  The natural mapping ♮ : V → V/W is given by ♮(x) = x + W and is linear. In fact,

    ♮(x + y) = x + y + W = (x + W) + (y + W) = ♮(x) + ♮(y);
    ♮(λx) = λx + W = λ(x + W) = λ♮(x).

Suppose now that {z_1 + W, ..., z_k + W} is any linearly independent subset of V/W. Then the set {z_1, ..., z_k} of coset representatives is a linearly independent subset of V. For, suppose that Σ_{i=1}^{k} λ_i z_i = 0_V. Then, using the linearity of ♮, we have

    0_{V/W} = ♮(0_V) = ♮(Σ_{i=1}^{k} λ_i z_i) = Σ_{i=1}^{k} λ_i ♮(z_i) = Σ_{i=1}^{k} λ_i (z_i + W)

and so each λ_i = 0. Consequently k ≤ dim V and V/W is of finite dimension.

Consider now the set B. Applying ♮ to any linear combination of elements of B we see as above that B is linearly independent. Now for every z ∈ V we have ♮(z) ∈ V/W so there exist scalars λ_i such that

    z + W = Σ_{i=1}^{k} λ_i (x_i + W) = (Σ_{i=1}^{k} λ_i x_i) + W

and hence z − Σ_{i=1}^{k} λ_i x_i ∈ W, so that

    z − Σ_{i=1}^{k} λ_i x_i = Σ_{j=1}^{m} μ_j v_j.

Thus B also spans V and hence is a basis. ◇

5.2 Corollary  dim V = dim W + dim V/W. ◇

5.3 Corollary  If V = W ⊕ Z then Z ≅ V/W.

Proof  We have dim V = dim W + dim Z so, by 5.2, dim Z = dim V/W and it follows that Z ≅ V/W. ◇

We shall be particularly interested in the quotient space V/W when W is a subspace that is f-invariant. In this situation we have the following result.


5.4 Theorem  Let V be a finite-dimensional vector space and let f : V → V be linear. If W is an f-invariant subspace of V then the prescription

    f′(x + W) = f(x) + W

defines a linear mapping f′ : V/W → V/W, the minimum polynomial of which divides the minimum polynomial of f.

Proof  Observe that if x + W = y + W then x − y ∈ W and so, since W is f-invariant,

    f(x) − f(y) = f(x − y) ∈ W

which gives f(x) + W = f(y) + W. Thus f′ indeed defines a mapping from V/W to itself. To see that f′ is linear we observe that

    f′[(x + W) + (y + W)] = f(x + y) + W = f(x) + f(y) + W = [f(x) + W] + [f(y) + W];
    f′[λ(x + W)] = f(λx) + W = λf(x) + W = λ[f(x) + W].

Now for all positive integers n we have (f^n)′ = (f′)^n. This is readily seen by induction. For the anchor point n = 1 the result is trivial; and for the inductive step we have

    (f^{n+1})′(x + W) = f^{n+1}(x) + W
                      = f[f^n(x)] + W
                      = f′[f^n(x) + W]
                      = f′[(f′)^n(x + W)]
                      = (f′)^{n+1}(x + W).

Thus, for every polynomial p = Σ a_i X^i we have [p(f)]′ = p(f′). Consequently, taking p = m_f we see that 0 = m_f(f′) and hence that m_{f′} | m_f. ◇


Definition  We call f′ : V/W → V/W the linear mapping induced by f on the quotient space V/W.

We shall now consider a particular type of f-invariant subspace. Let x be a non-zero element of V and consider the set Z_x of all elements of V of the form p(f)(x) where p ranges over all polynomials in F[X]. It is clear that Z_x is a subspace of V, and that it is f-invariant.

Example  Let f : ℝ³ → ℝ³ be given by

    f(x, y, z) = (−y + 2z, x + z, 2z).

Consider the element (1, 0, 0). We have f(1, 0, 0) = (0, 1, 0) and f²(1, 0, 0) = f(0, 1, 0) = −(1, 0, 0), from which it follows that

    Z_{(1,0,0)} = {(x, y, 0) ; x, y ∈ ℝ}.

Our immediate objective is to discover a basis for the subspace Z_x. For this purpose, consider the sequence

    x, f(x), f²(x), ..., f^n(x), ...

of elements of Z_x. Clearly, there exists a least positive integer k such that f^k(x) is a linear combination of the elements that precede it in this list, say

    f^k(x) = λ_0 x + λ_1 f(x) + ⋯ + λ_{k−1} f^{k−1}(x),

and {x, f(x), ..., f^{k−1}(x)} is then a linearly independent subset of Z_x. Writing a_i = −λ_i for i = 0, ..., k − 1 we deduce that the polynomial

    m_x = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k

is the monic polynomial of least degree such that m_x(f) 'annihilates' x, in the sense that m_x(f)(x) = 0_V.

Definition  We call m_x the f-annihilator of x.

Example  Referring to the previous Example, let x = (1, 0, 0). Then we have f²(x) = −x. It follows that the f-annihilator of x is m_x = X² + 1.

With the above notation, we have the following result.

5.5 Theorem  If x ∈ V has f-annihilator

    m_x = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k

then the set B_x = {x, f(x), ..., f^{k−1}(x)} is a basis of Z_x, so that dim Z_x = deg m_x. Moreover, if f_x : Z_x → Z_x is the induced linear mapping on the f-invariant subspace Z_x then the matrix of f_x relative to the basis B_x is

    C_{m_x} = [0 0 ... 0 −a_0    ]
              [1 0 ... 0 −a_1    ]
              [0 1 ... 0 −a_2    ]
              [. .     . .       ]
              [0 0 ... 1 −a_{k−1}]

Finally, the minimum polynomial of f_x is m_x.

Proof  Clearly, B_x is linearly independent and f^k(x) ∈ ⟨B_x⟩. We prove by induction that f^n(x) ∈ ⟨B_x⟩ for every n. This is clear for n = 1, ..., k. Suppose then that n > k and that f^{n−1}(x) ∈ ⟨B_x⟩. Then f^{n−1}(x) is a linear combination of x, f(x), ..., f^{k−1}(x) and so f^n(x) = f[f^{n−1}(x)] is a linear combination of f(x), f²(x), ..., f^k(x). Since f^k(x) ∈ ⟨B_x⟩ it follows that f^n(x) ∈ ⟨B_x⟩. It is immediate from this observation that p(f)(x) ∈ ⟨B_x⟩ for every polynomial p. Thus Z_x ⊆ ⟨B_x⟩ whence we have equality, the reverse inclusion being obvious. It now follows that B_x is a basis of Z_x. Since

    f_x(x) = f(x)
    f_x[f(x)] = f²(x)
    ..................
    f_x[f^{k−2}(x)] = f^{k−1}(x)
    f_x[f^{k−1}(x)] = f^k(x) = −a_0 x − a_1 f(x) − ⋯ − a_{k−1} f^{k−1}(x)

it is clear that the matrix of f_x relative to the basis B_x is the above matrix C_{m_x}. Finally, suppose that the minimum polynomial of f_x is

    m_{f_x} = b_0 + b_1 X + ⋯ + b_{r−1} X^{r−1} + X^r.

Then we have

    0_V = m_{f_x}(f_x)(x) = m_{f_x}(f)(x) = b_0 x + ⋯ + b_{r−1} f^{r−1}(x) + f^r(x)

from which f^r(x) is a linear combination of x, f(x), ..., f^{r−1}(x) and therefore k ≤ r. But m_x(f) is the zero map on Z_x, whence so is m_x(f_x). Consequently we have m_{f_x} | m_x and so r ≤ k. Thus r = k and it follows that m_{f_x} = m_x. ◇

Definition  We shall call Z_x the f-cyclic subspace spanned by {x}, and C_{m_x} the companion matrix of the f-annihilator m_x. Any basis of the form B_x will be called a cyclic basis, and x will be called a cyclic vector. A subspace that has a cyclic basis will be called a cyclic subspace.

Our first main objective can now be revealed. It is to prove that if f : V → V has minimum polynomial of the form p^t where p is irreducible then V can be expressed as a direct sum of f-cyclic subspaces, the main consequence of this being that f then has a block diagonal representation by companion matrices. Before establishing these facts, we require the following observation.

5.6 Theorem  Let W be an f-invariant subspace of V. Then both the f-annihilator of x and the f′-annihilator of x + W divide the minimum polynomial of f.

Proof  By 5.5, the f-annihilator of x is the minimum polynomial of f_x, the mapping induced on Z_x by f, which clearly divides the minimum polynomial of f. As for the f′-annihilator of x + W, this likewise divides the minimum polynomial of f′ which, by 5.4, divides that of f. ◇

5.7 Theorem  [Cyclic Decomposition]  Let V be a non-zero vector space of finite dimension and let f : V → V be linear with minimum polynomial m_f = p^t where p is irreducible. Then there are cyclic vectors x_1, ..., x_k and positive integers n_1, ..., n_k with each n_i ≤ t such that V = Z_{x_1} ⊕ ⋯ ⊕ Z_{x_k}, where x_i has f-annihilator p^{n_i}.

Proof  We proceed by induction on dim V; let V be of dimension n. As m_f = p^t, there is a non-zero x_1 ∈ V with p^{t−1}(f)(x_1) ≠ 0_V. The f-annihilator of x_1 is then m_{x_1} = p^t. Let W = Z_{x_1} and let f′ : V/W → V/W be the induced mapping. By 5.4, the minimum polynomial of f′ divides m_f = p^t and so the inductive hypothesis applies to f′ and V/W. Thus there exist f′-cyclic subspaces Z_{y_2+W}, ..., Z_{y_k+W} of V/W such that

    V/W = ⊕_{i=2}^{k} Z_{y_i+W}

and, for 2 ≤ i ≤ k, the f′-annihilator of y_i + W is p^{n_i}.

5.9 Corollary  dim V = (n_1 + ⋯ + n_k) deg p. ◇

Without loss of generality, we can assume that the cyclic vectors x_1, ..., x_k of 5.7 are arranged such that the corresponding integers n_i satisfy

    t = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1.

With this convention, we have :

5.10 Theorem  n_1, ..., n_k are uniquely determined by f.

Proof  From the above we have, for every i,

    dim Z_{x_i} = deg m_{x_i} = deg p^{n_i} = d n_i

where d = deg p. Observe that for every integer j the image of Z_{x_i} under p(f)^j is the f-cyclic subspace Z_{p(f)^j(x_i)}. Since the f-annihilator of x_i is p^{n_i}, of degree d n_i, we see that the dimension of Z_{p(f)^j(x_i)} is 0 if j ≥ n_i, and is d(n_i − j) if j < n_i. Now every x ∈ V can be written uniquely in the form

    x = u_1 + ⋯ + u_k    (u_i ∈ Z_{x_i})

and so every element of Im p(f)^j can be written uniquely in the form

    p(f)^j(x) = p(f)^j(u_1) + ⋯ + p(f)^j(u_k).

Thus, if r is the integer such that n_1, ..., n_r ≥ j and n_{r+1} < j then

    Im p(f)^j = Z_{p(f)^j(x_1)} ⊕ ⋯ ⊕ Z_{p(f)^j(x_r)}.

It follows from this that

    dim Im p(f)^{j−1} − dim Im p(f)^j
        = d( Σ_{n_i ≥ j−1} (n_i − j + 1) − Σ_{n_i ≥ j} (n_i − j) )
        = d Σ_{n_i ≥ j} (n_i − j + 1 − n_i + j)
        = d Σ_{n_i ≥ j} 1
        = d × (number of n_i ≥ j).

Now the dimensions on the left are determined by f so the above expression gives, for each j, the number of n_i that are greater than or equal to j. This determines the sequence

    t = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1

completely. ◇


Definition  If the minimum polynomial of f is of the form p^t where p is irreducible then, relative to the uniquely determined chain of integers t = n_1 ≥ n_2 ≥ ⋯ ≥ n_k ≥ 1, the polynomials p^t = p^{n_1}, p^{n_2}, ..., p^{n_k} are called the elementary divisors of f. It should be noted that the first elementary divisor in the sequence is the minimum polynomial of f.

We can now apply the above results to the general situation where the characteristic and minimum polynomials of a linear mapping f : V → V are

    χ_f = p_1^{d_1} p_2^{d_2} ⋯ p_k^{d_k},    m_f = p_1^{e_1} p_2^{e_2} ⋯ p_k^{e_k}

where p_1, ..., p_k are distinct irreducible polynomials. We know by the Primary Decomposition Theorem that there is a basis of V with respect to which the matrix of f is a block diagonal matrix

    [A_1           ]
    [    A_2       ]
    [        ⋱     ]
    [           A_k]

in which A_i is the matrix (of size d_i deg p_i × d_i deg p_i) that represents the induced mapping f_i on V_i = Ker p_i(f)^{e_i}. Now the minimum polynomial of f_i is p_i^{e_i} and so, by the Cyclic Decomposition Theorem, there is a basis of V_i with respect to which A_i is a block diagonal matrix

    [C_{i1}            ]
    [      C_{i2}      ]
    [            ⋱     ]

in which the C_{ij} are the companion matrices associated with the elementary divisors of f_i. By the previous discussion, this block diagonal form, in which each block A_i is itself a block diagonal of companion matrices, is unique (to within the order of the A_i). It is called the rational canonical matrix of f.

It is important to note that in the sequence of elementary divisors there can be repetitions, for some of the n_i can be equal. The result of this is that some companion matrices can appear more than once in the rational form.

Example  Suppose that f : ℝ⁴ → ℝ⁴ has minimum polynomial

    m_f = X² + 1.

Then χ_f = (X² + 1)². By 5.9 we have 4 = (n_1 + ⋯ + n_k)·2. Since the first elementary divisor is the minimum polynomial, we must have n_1 = 1. Since we must also have each n_i ≥ 1, it follows that the only possibility is k = 2 with n_1 = n_2 = 1. The rational canonical matrix of f is therefore

    C_{X²+1} ⊕ C_{X²+1} = [0 −1      ]
                          [1  0      ]
                          [      0 −1]
                          [      1  0]

De a 1 0

Suppose now that f : IR° > IR° has minimum poly-

my = (X?+1)(X -2)?.

The characteristic polynomial of f is then one of

x1 = (X? + 1)?(X —2)?,

x2 = (X? +:1)(X -2)*.

Suppose first that yy = x1. Then, arguing exactly as in the previous Example, we see that the rational canonical matrix is

Cx241 ® Cx241 8 C(x_2)2. Suppose now that xy = x2.

In this case we know that IR° =

V, ® V2 with dimV; = 2 and dimV2 = 4. Also, the induced mapping f2 on V2 has minimum polynomial (X — 2)?. By 5.9 applied to fz : V2 + V2 we have 4=n,+ ---+n, with n; = 2. There are therefore two possibilities, namely k =2 withnj = no =2;

k=3

with n, =2,no = ng = 1.

The rational canonical matrix of f is therefore of one of the forms

Cx241 ® C(x~-2)2 B C(x_2)2,

Cx241 8 C(x-2)2

@ Cx-2 ® Cx-2.

Note from the above Example that a knowledge of both the characteristic and the minimum polynomials is not in general enough to determine completely the rational form.


Note also that the rational form is quite different from the Jordan form. To see this, let us take a matrix in Jordan form and find its rational form.

Example  Consider the matrix

    A = [2 1 0]
        [0 2 1]
        [0 0 2]

We have χ_A = (X − 2)³ = m_A and, by 5.9, 3 = n_1 + ⋯ + n_k with n_1 = 3. Thus k = 1 and the rational form is

    C_{(X−2)³} = [0 0   8]
                 [1 0 −12]
                 [0 1   6]
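One way to see this similarity concretely (an illustration of the cyclic-basis mechanism, added here and not part of the text): take the cyclic vector x = (0, 0, 1), form P with columns x, Ax, A²x, and conjugate:

```python
import numpy as np

A = np.array([[2., 1, 0],
              [0, 2, 1],
              [0, 0, 2]])
x = np.array([0., 0, 1])
P = np.column_stack([x, A @ x, A @ A @ x])   # a cyclic basis for R^3
C = np.linalg.inv(P) @ A @ P
print(np.round(C).astype(int))
```

The printed matrix is the companion matrix of (X − 2)³ = X³ − 6X² + 12X − 8 displayed above.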

The fact that the rational form is quite different from the Jordan form suggests that we are not yet finished, for what we want is a general canonical form that will reduce to the Jordan form when the eigenvalues lie in the ground field. We shall now obtain such a form by modifying the cyclic bases used to obtain the rational form. In so doing, we shall obtain a matrix representation constructed from the companion matrix of p_i rather than those of p_i^{n_i}.

5.11 Theorem  Let x be a cyclic vector of V and let f : V → V have minimum polynomial p^n where

    p = a_0 + a_1 X + ⋯ + a_{k−1} X^{k−1} + X^k.

Then there is a basis of V with respect to which the matrix of f is the kn × kn matrix

    [C_p M          ]
    [    C_p M      ]
    [        ⋱  ⋱  ]
    [          C_p M]
    [            C_p]

in which C_p is the companion matrix of p, and M is the k × k matrix whose only non-zero entry is a 1 in the top right-hand corner.

Proof  Consider the kn elements

    f^{k−1}(x), ..., f(x), x,
    p(f)[f^{k−1}(x)], ..., p(f)[f(x)], p(f)(x),
    .......................................
    p(f)^{n−1}[f^{k−1}(x)], ..., p(f)^{n−1}[f(x)], p(f)^{n−1}(x).

To show that this set is a basis of V it suffices to show that it is linearly independent. Suppose that it were not so. Then some non-trivial linear combination of these elements would be 0_V and so there would exist a polynomial h such that h(f)(x) = 0_V with deg h < kn = deg p^n. Since x is cyclic, this contradicts the assumption that p^n is the minimum polynomial of f. We order this basis in a row-by-row manner, as we normally read. Now f maps each element in the above array to its predecessor in the same row, except those at the beginning of a row. For these elements we have, for example,

    f[f^{k−1}(x)] = f^k(x) = −a_{k−1} f^{k−1}(x) − ⋯ − a_1 f(x) − a_0 x + p(f)(x).

It is now an easy matter to verify that the matrix of f relative to the above basis is of the form described. ◇

Definition  A block matrix of the form described in 5.11 will be called a classical p-matrix associated with the companion matrix C_p.

Applying 5.11 to the cyclic subspaces appearing in the Cyclic Decomposition Theorem, we see that in the rational canonical matrix of f we can replace each diagonal block of companion matrices associated with the elementary divisors p_i^{n_i} by a classical p_i-matrix associated with the companion matrix of p_i. This gives another canonical matrix which we call the classical canonical matrix of f.

Example  Let f : ℝ⁴ → ℝ⁴ be such that

    χ_f = m_f = (X² − X + 1)².

Then the rational canonical matrix of f is

    C_{(X²−X+1)²} = [0 0 0 −1]
                    [1 0 0  2]
                    [0 1 0 −3]
                    [0 0 1  2]

and the classical canonical matrix is

    [0 −1 0  1]
    [1  1 0  0]
    [0  0 0 −1]
    [0  0 1  1]

Finally, let us note that if in 5.11 we have p = X − a (so that k = 1 and f − a id_V is nilpotent of index n) then C_p is the 1 × 1 matrix [a] and the classical p-matrix associated with C_p reduces to the n × n elementary Jordan matrix associated with the eigenvalue a. Thus the classical form reduces to the Jordan form when the eigenvalues belong to the ground field.
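To see this reduction concretely, here is a small sketch (my own illustration, using the M-in-the-top-right-corner convention stated in 5.11; the helper names are not from the text) that assembles a classical p-matrix and checks that p = X − 2 with n = 3 yields the 3 × 3 elementary Jordan matrix:

```python
import numpy as np

def companion(coeffs):
    # Companion matrix of a0 + a1 X + ... + a_{k-1} X^{k-1} + X^k.
    k = len(coeffs)
    C = np.zeros((k, k))
    C[1:, :-1] = np.eye(k - 1)
    C[:, -1] = -np.asarray(coeffs, dtype=float)
    return C

def classical_matrix(coeffs, n):
    # n copies of C_p down the diagonal, and a 1 in the top right-hand
    # corner of each k x k block immediately above the diagonal.
    k = len(coeffs)
    H = np.kron(np.eye(n), companion(coeffs))
    for i in range(n - 1):
        H[i * k, (i + 2) * k - 1] = 1.0
    return H

# p = X - 2 has coefficient list [-2]; n = 3 gives the Jordan block.
print(classical_matrix([-2], 3))
```

For k = 1 the M-blocks are just the 1s on the superdiagonal, recovering the elementary Jordan matrix exactly as the text observes.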

CHAPTER SIX

Dual spaces

If V and W are vector spaces over a field F then the set Lin(V, W) of linear mappings from V to W is also a vector space over F : if f, g ∈ Lin(V, W) define f + g and λf by

    (f + g)(x) = f(x) + g(x),    (λf)(x) = λf(x),

and observe that f + g, λf belong to Lin(V, W). A particular case of this is of especial importance, namely that in which for W we take the ground field F (regarded as a vector space over itself). It is on this vector space Lin(V, F) that we shall now focus our attention.

Definition  By the dual space of V we shall mean the vector space Lin(V, F), which we shall denote by V^d. The elements of V^d, i.e. the linear mappings f : V → F, will be called linear functionals (or linear forms) on V.

Example  The i-th projection p_i given by p_i(x_1, ..., x_n) = x_i is a linear functional on ℝ^n so is an element of (ℝ^n)^d.

Example  If V = Mat_{n×n}(ℂ) then T : V → ℂ given by T(A) = Σ_{i=1}^{n} a_ii is a linear functional on V so is an element of V^d.

Example  The mapping I : ℝ[X] → ℝ given by I(p) = ∫₀¹ p(x) dx is a linear functional on ℝ[X] so is an element of ℝ[X]^d.

In what follows, we shall denote a typical element of V^d by x^d. Thus the notation x^d will be used to denote a linear mapping from V to the ground field F. We begin by showing that if V is of finite dimension then so is the dual space V^d. This we do by constructing a basis of V^d from a basis for V.


6.1 Theorem  Let {v_1, ..., v_n} be a basis of V and for i = 1, ..., n let v_i^d : V → F be the linear mapping such that

    v_i^d(v_j) = δ_ij = { 1 if i = j,
                        { 0 if i ≠ j.

Then {v_1^d, ..., v_n^d} is a basis of V^d.

Proof  It is clear that v_i^d ∈ V^d. Suppose that Σ_{i=1}^{n} λ_i v_i^d = 0 in V^d. Then for j = 1, ..., n we have

    0_F = (Σ_{i=1}^{n} λ_i v_i^d)(v_j) = Σ_{i=1}^{n} λ_i v_i^d(v_j) = Σ_{i=1}^{n} λ_i δ_ij = λ_j

and so {v_1^d, ..., v_n^d} is linearly independent. If

    x = Σ_{j=1}^{n} x_j v_j ∈ V

then we have

(*)    v_i^d(x) = Σ_{j=1}^{n} x_j v_i^d(v_j) = Σ_{j=1}^{n} x_j δ_ij = x_i

and hence, for every f ∈ V^d,

    (Σ_{i=1}^{n} f(v_i) v_i^d)(x) = Σ_{i=1}^{n} f(v_i) v_i^d(x) = Σ_{i=1}^{n} f(v_i) x_i = f(Σ_{i=1}^{n} x_i v_i) = f(x).

Thus we see that

(**)    (∀f ∈ V^d)    f = Σ_{i=1}^{n} f(v_i) v_i^d,

which shows that {v_1^d, ..., v_n^d} also spans V^d, whence it is a basis. ◇

6.2 Corollary  If dim V is finite then dim V^d = dim V. ◇

Note from (*) and (**) in the above proof that

    (∀x ∈ V)    x = Σ_{i=1}^{n} v_i^d(x) v_i;

    (∀x^d ∈ V^d)    x^d = Σ_{i=1}^{n} x^d(v_i) v_i^d.

Definition  If {v_1, ..., v_n} is a basis of V then we shall say that the basis {v_1^d, ..., v_n^d} of V^d described in 6.1 is the corresponding dual basis. Because of (*) above, the mappings v_1^d, ..., v_n^d are often called the coordinate forms associated with v_1, ..., v_n.

Example  Consider the basis {v_1, v_2} of ℝ² where v_1 = (1, 2) and v_2 = (2, 3). Let {v_1^d, v_2^d} be the dual basis. Then we have

    1 = v_1^d(v_1) = v_1^d(1, 2) = v_1^d(1, 0) + 2v_1^d(0, 1);
    0 = v_1^d(v_2) = v_1^d(2, 3) = 2v_1^d(1, 0) + 3v_1^d(0, 1).

These equations give v_1^d(1, 0) = −3 and v_1^d(0, 1) = 2 and hence v_1^d is given by

    v_1^d(x, y) = −3x + 2y.

Similarly, we have

v?(z,y) = —3z + 2y. Similarly, we have

v3(z,y) = 22 —y. Example Consider the standard basis {e1,...,én} of IR". By definition, we have e¢(e;) = 6;; and so nm

e?(21,. tiging)

(>

nm

2;¢;) sat ig a;e¢(e;) =f;

g=k

g=1

whence the dual basis is the set of projections {p1,..., pn}. pae( Myync Mu. J
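For the basis {(1, 2), (2, 3)} of ℝ² treated above, the coordinate forms can also be read off numerically as the rows of the inverse of the matrix whose columns are the basis vectors (a numpy illustration, not from the text):

```python
import numpy as np

M = np.array([[1., 2],    # columns are v1 = (1, 2) and v2 = (2, 3)
              [2., 3]])
D = np.linalg.inv(M)      # row i gives the coefficients of v_i^d
print(D)
```

The first row is (−3, 2), i.e. v_1^d(x, y) = −3x + 2y, and the second row is (2, −1), i.e. v_2^d(x, y) = 2x − y, agreeing with the computation above.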

Example  Let t_1, ..., t_{n+1} be n + 1 distinct real numbers and for each i let ε_i : ℝ_n[X] → ℝ be the substitution mapping given by ε_i(p) = p(t_i). Then

    B = {ε_1, ..., ε_{n+1}}

is a basis for ℝ_n[X]^d. In fact, since ℝ_n[X]^d has the same dimension as ℝ_n[X], namely n + 1, it suffices to prove that B is linearly independent. But if Σ_{i=1}^{n+1} λ_i ε_i = 0 then

    0 = (Σ_{i=1}^{n+1} λ_i ε_i)(1) = λ_1 + λ_2 + ⋯ + λ_{n+1},
    0 = (Σ_{i=1}^{n+1} λ_i ε_i)(X) = λ_1 t_1 + λ_2 t_2 + ⋯ + λ_{n+1} t_{n+1},
    .........................................................
    0 = (Σ_{i=1}^{n+1} λ_i ε_i)(X^n) = λ_1 t_1^n + λ_2 t_2^n + ⋯ + λ_{n+1} t_{n+1}^n.

The coefficient matrix of this system of equations is the Vandermonde matrix

    M = [1 t_1     t_1²     ...  t_1^n    ]
        [1 t_2     t_2²     ...  t_2^n    ]
        [. .       .             .        ]
        [1 t_{n+1} t_{n+1}² ...  t_{n+1}^n]

By induction, it can be shown that det M = ∏_{j<i} (t_i − t_j). Since the t_i are distinct we have det M ≠ 0, so every λ_i = 0 and B is linearly independent.

6.7 Corollary  If f : V → W is an isomorphism then so is f^t : W^d → V^d; moreover, we have (f^t)^{−1} = (f^{−1})^t.

Proof  This follows from (1) and (3) on taking g = f^{−1}. ◇

Of course, when V and W are finite-dimensional 6.6 and 6.7 follow immediately from 6.5 and the corresponding properties of transposition for matrices. We can also consider the transpose of f^t. We denote this by f^{tt} and call it the bitranspose of f. The connection between bitransposes and biduals is the following.

6.8 Theorem  For every linear mapping f : V → W the diagram

    V ———— f ————→ W
    |               |
    α_V             α_W
    ↓               ↓
    V^{dd} — f^{tt} → W^{dd}

is commutative, in the sense that f^{tt} ∘ α_V = α_W ∘ f.

Proof  We have to show that f^{tt}[α_V(x)] = α_W[f(x)] for every x ∈ V. Now for all y^d ∈ W^d we have

    [f^{tt}(α_V(x))](y^d) = (α_V(x) ∘ f^t)(y^d) = α_V(x)[f^t(y^d)] = ⟨x, f^t(y^d)⟩ = ⟨f(x), y^d⟩ = [α_W(f(x))](y^d),

from which the result follows. ◇

An immediate consequence of 6.8 is that when V and W are of finite dimensions (in which case we agree to identify V^{dd}, V and W^{dd}, W and therefore also α_V, id_V and α_W, id_W) we have f^{tt} = f. This then matches the matrix situation, where A^{tt} = A.

DUAL

SPACES

Definition

67

If z € V and y? € V@ are such that (z,y?) = 0

then we say that z is annthilated by y*.

Since ⟨x, y^d⟩ = y^d(x), we see that the set of elements of V that are annihilated by y^d is Ker y^d. Now it is immediate from the identities preceding 6.4 that, for every non-empty subset E of V, the set of elements of V^d that annihilate every element of E is a subspace of V^d. We denote this subspace by E°. Thus

E° = {y^d ∈ V^d ; (∀x ∈ E) ⟨x, y^d⟩ = 0}.

We call E° the annihilator of E. It is clear that {0_V}° = V^d and that V° = {0_{V^d}}.

6.9 Theorem  Let V be a vector space of finite dimension n and let W be a subspace of V, of dimension m. Then dim W° = dim V − dim W; moreover, W°° = W.

Proof  Let {a_1, ..., a_m} be a basis of W and extend this to a basis {a_1, ..., a_n} of V. Let {a_1^d, ..., a_n^d} be the corresponding dual basis of V^d. If x^d = Σ_{i=1}^n λ_i a_i^d ∈ W° then for j = 1, ..., m we have

0 = ⟨a_j, x^d⟩ = Σ_{i=1}^n λ_i ⟨a_j, a_i^d⟩ = λ_j.

It follows that {a^d_{m+1}, ..., a^d_n} is a basis of W° and consequently

dim W° = n − m = dim V − dim W.

As for the second statement, consider the subspace W°° = (W°)° of V^{dd} = V. By definition, every element of W is annihilated by every element of W°, and so we have W ⊆ W°°. On the other hand, by what we have just proved,

dim W°° = n − dim W° = n − (n − m) = m = dim W.

It follows, therefore, that W = W°°. ♦

Annihilators and transposes are connected:

6.10 Theorem  If V, W are finite-dimensional and f : V → W is linear then

(1) (Im f)° = Ker f^t;
(2) (Ker f)° = Im f^t;
(3) dim Im f^t = dim Im f;
(4) dim Ker f^t = dim Ker f.

Proof  (1) We have y^d ∈ (Im f)° if and only if, for every x ∈ V,

0 = ⟨f(x), y^d⟩ = ⟨x, f^t(y^d)⟩,

which is the case if and only if f^t(y^d) ∈ V° = {0_{V^d}}, i.e. if and only if y^d ∈ Ker f^t.

(2) Replacing f by f^t in (1) and using the fact that f^{tt} = f, we obtain (Im f^t)° = Ker f. Then, by 6.9, (Ker f)° = (Im f^t)°° = Im f^t.

(3), (4) follow from (1), (2), and 6.9. ♦

6.11 Corollary  The row rank and the column rank of a matrix A over a field F are the same.

Proof  If A represents a linear mapping f then A^t represents f^t. The result follows from the fact that the row rank of A is dim Im f, and the column rank of A is the row rank of A^t, which is dim Im f^t. ♦

CHAPTER

SEVEN

Inner product spaces

In some aspects of our discussion of vector spaces the ground field F has played no significant rôle. In this Chapter we shall restrict F to be ℝ or ℂ, the results we obtain depending heavily on the properties of these fields.

Definition  Let V be a vector space over ℂ. By an inner product on V we shall mean a mapping f : V × V → ℂ, described by (x, y) ↦ (x|y), such that for all x, x′, y ∈ V and all α ∈ ℂ the following identities hold:

(1) (x + x′|y) = (x|y) + (x′|y);
(2) (αx|y) = α(x|y);
(3) (x|y) = \overline{(y|x)}, so that in particular (x|x) ∈ ℝ;
(4) (x|x) ≥ 0, with equality if and only if x = 0_V.

By a complex inner product space we mean a vector space V over ℂ together with an inner product on V. By a real inner product space we mean a vector space V over ℝ together with an inner product on V (this being defined as in the above but with the bar denoting complex conjugate omitted). By an inner product space we shall mean either a complex inner product space or a real inner product space.

There are certain other identities that follow immediately from (1) to (4) above, namely:

(5) (x|y + y′) = (x|y) + (x|y′).

In fact, by (1) and (3) we have

(x|y + y′) = \overline{(y + y′|x)} = \overline{(y|x)} + \overline{(y′|x)} = (x|y) + (x|y′).

(6) (x|αy) = ᾱ(x|y). This follows from (2) and (3) since

(x|αy) = \overline{(αy|x)} = \overline{α(y|x)} = ᾱ · \overline{(y|x)} = ᾱ(x|y).

(7) (x|0) = 0 = (0|x). This is immediate from (1), (2), (3) on taking x′ = −x, y′ = −y, and α = −1.

Example

ℂⁿ is a complex inner product space under the mapping described by (z, w) ↦ (z|w) where

((z_1, ..., z_n)|(w_1, ..., w_n)) = Σ_{i=1}^n z_i \overline{w_i}.

This inner product is called the standard inner product on ℂⁿ.

Example  ℝⁿ is a real inner product space under the corresponding standard inner product given by

((x_1, ..., x_n)|(y_1, ..., y_n)) = Σ_{i=1}^n x_i y_i.

In the cases where n = 2, 3 this inner product is often called the dot product or scalar product. This terminology is popular when dealing with the geometric application of vectors. Indeed, several of the results that we shall establish will generalise familiar results in euclidean geometry of two and three dimensions.

Example  Let a, b ∈ ℝ and let V be the real vector space of continuous functions f : [a, b] → ℝ. Define a mapping from V × V to ℝ by

(f, g) ↦ (f|g) = ∫_a^b fg.

Then this defines an inner product on V.

Example  Let ℝ_n[X] be the real vector space of polynomials of degree at most n. Then

(p|q) = ∫_0^1 pq

defines an inner product on ℝ_n[X].

Example  For an n × n matrix A = [a_{ij}] let tr A = Σ_{i=1}^n a_{ii}. Then the vector space Mat_{n×n}(ℝ) can be made into a real inner product space by defining

(A|B) = tr(B^t A).

Likewise, Mat_{n×n}(ℂ) can be made into a complex inner product space by defining

(A|B) = tr(B* A)

where B* = \overline{B}^t is the complex conjugate of the transpose of B.

Definition  Let V be an inner product space. For every x ∈ V we define the norm of x to be the non-negative real number

||x|| = √(x|x).

Given x, y ∈ V we define the distance between x and y to be d(x, y) = ||x − y||.

Example  In the real inner product space ℝ² under the standard inner product, if x = (x_1, x_2) then ||x||² = x_1² + x_2², so ||x|| is the distance from x to the origin. Likewise, if y = (y_1, y_2) then we have ||x − y||² = (x_1 − y_1)² + (x_2 − y_2)², which shows the connection between the general concept of distance and the theorem of Pythagoras.

It is clear from (4) above that ||x|| = 0 if and only if x = 0_V.

7.1 Theorem  Let V be an inner product space. Then, for all x, y ∈ V and every scalar λ,

(1) ||λx|| = |λ| ||x||;
(2) [Cauchy-Schwarz inequality]  |(x|y)| ≤ ||x|| ||y||;
(3) [Triangle inequality]  ||x + y|| ≤ ||x|| + ||y||.

Proof  (1) ||λx||² = (λx|λx) = λλ̄(x|x) = |λ|² ||x||².

(2) The result is trivial if x = 0_V. Suppose then that x ≠ 0_V, so that ||x|| ≠ 0. Let z = y − ((y|x)/||x||²)x. Then, noting that (z|x) = 0, we have

0 ≤ ||z||² = (y − ((y|x)/||x||²)x | z) = (y|z) = (y|y) − (\overline{(y|x)}/||x||²)(y|x) = ||y||² − |(x|y)|²/||x||²,

from which (2) follows.

(3) This follows from the observation that

||x + y||² = (x + y|x + y) = (x|x) + (x|y) + (y|x) + (y|y)
           = ||x||² + (x|y) + \overline{(x|y)} + ||y||²
           = ||x||² + 2 Re(x|y) + ||y||²
           ≤ ||x||² + 2|(x|y)| + ||y||²
           ≤ ||x||² + 2||x|| ||y|| + ||y||²   by (2)
           = (||x|| + ||y||)². ♦
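Both inequalities of 7.1 are easy to probe empirically. The sketch below (a sample check over ℝ⁵ with random vectors; function names are our own) verifies Cauchy-Schwarz and the triangle inequality for the standard inner product.

```python
import math
import random

def inner(x, y):
    # standard inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # Cauchy-Schwarz: |(x|y)| <= ||x|| ||y||  (small slack for rounding)
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
    # triangle inequality: ||x + y|| <= ||x|| + ||y||
    s = [a + b for a, b in zip(x, y)]
    assert norm(s) <= norm(x) + norm(y) + 1e-12
```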

Example  Let V be the set of infinite sequences (a_i)_{i≥1} of real numbers that are square summable, in the sense that Σ_{i≥1} a_i² exists. Defining an addition and a multiplication by real scalars in the obvious component-wise manner, we see that V becomes a real vector space. Let (a_i)_{i≥1} and (b_i)_{i≥1} be elements of V. By the Cauchy-Schwarz inequality applied to the inner product space ℝᵏ with the standard inner product, we have

(Σ_{i=1}^k |a_i b_i|)² ≤ Σ_{i=1}^k a_i² · Σ_{i=1}^k b_i²,

so the sequence with k-th term Σ_{i=1}^k a_i b_i is absolutely summable and hence is summable. Thus Σ_{i≥1} a_i b_i exists and we can define

((a_i)_{i≥1} | (b_i)_{i≥1}) = Σ_{i≥1} a_i b_i.

In this way, V becomes a real inner product space that is often called ℓ²-space or Hilbert space.

Definition  If V is an inner product space then x, y ∈ V are said to be orthogonal if (x|y) = 0. A non-empty subset S of V is said to be an orthogonal subset of V if every pair of distinct elements of S is orthogonal. An orthonormal subset of V is an orthogonal subset S such that ||x|| = 1 for every x ∈ S, i.e. a set of mutually orthogonal vectors of length 1.


Example  Relative to the standard inner products, the standard bases of ℝⁿ and of ℂⁿ are orthonormal subsets.

Example  In ℝ² the elements x = (x_1, x_2) and y = (y_1, y_2) are orthogonal if and only if x_1y_1 + x_2y_2 = 0. Geometrically, this is equivalent to saying that the lines joining x and y to the origin are mutually perpendicular.

Example  In the vector space V of real continuous functions on the interval [−π, π] with inner product (f|g) = ∫_{−π}^{π} fg, the set

S = {x ↦ 1, x ↦ sin kx, x ↦ cos kx ; k = 1, 2, 3, ...}

is an orthogonal subset.

It is clear that an orthonormal subset of V can always be obtained from an orthogonal subset S by normalising each element x of S, i.e. by replacing x by x* = x/||x||. An important property of orthogonal (and hence of orthonormal) sets is the following.

7.2 Theorem  Orthogonal sets are linearly independent.

Proof  Let S be an orthogonal subset of V and let x_1, ..., x_n ∈ S be such that Σ_{i=1}^n λ_i x_i = 0_V. Then for every i we have

λ_i (x_i|x_i) = Σ_{k=1}^n λ_k (x_k|x_i) = (Σ_{k=1}^n λ_k x_k | x_i) = (0_V|x_i) = 0,

from which it follows by (4) that λ_i = 0. ♦

We now describe properties of the subspace spanned by an orthonormal subset.

7.3 Theorem  Let {e_1, ..., e_n} be an orthonormal subset of the inner product space V. Then

[Bessel's inequality]  (∀x ∈ V)  Σ_{k=1}^n |(x|e_k)|² ≤ ||x||².

Moreover, if W is the subspace spanned by {e_1, ..., e_n} then the following statements are equivalent:

(1) x ∈ W;
(2) Σ_{k=1}^n |(x|e_k)|² = ||x||²;
(3) x = Σ_{k=1}^n (x|e_k)e_k;
(4) (∀y ∈ V)  (x|y) = Σ_{k=1}^n (x|e_k)(e_k|y).

Proof  Let z = x − Σ_{k=1}^n (x|e_k)e_k. Then a simple computation gives

0 ≤ (z|z) = (x|x) − Σ_{k=1}^n (x|e_k)\overline{(x|e_k)} = ||x||² − Σ_{k=1}^n |(x|e_k)|²,

which establishes Bessel's inequality.

(2) ⇒ (3) is now immediate, since (2) implies that z = 0_V.

(3) ⇒ (4) : If x = Σ_{k=1}^n (x|e_k)e_k then, for all y ∈ V,

(x|y) = (Σ_{k=1}^n (x|e_k)e_k | y) = Σ_{k=1}^n (x|e_k)(e_k|y).

(4) ⇒ (2) follows by taking y = x in (4). (3) ⇒ (1) is clear.

(1) ⇒ (3) : If x = Σ_{k=1}^n λ_k e_k then for j = 1, ..., n we have

λ_j = Σ_{k=1}^n λ_k (e_k|e_j) = (Σ_{k=1}^n λ_k e_k | e_j) = (x|e_j). ♦

The expansion in (3) is called the Fourier expansion of x relative to {e_1, ..., e_n}, the scalars (x|e_k) being the Fourier coefficients; the identity in (4) is known as Parseval's identity.

Definition By an orthonormal basis of an inner product space we mean an orthonormal subset that is a basis.

Example  The standard bases of ℝⁿ and ℂⁿ are orthonormal.

Example  In Mat_{n×n}(ℂ) with (A|B) = tr(B*A), an orthonormal basis is {E_{pq} ; p, q = 1, ..., n} where E_{pq} has a 1 in the (p, q)-th position and 0 elsewhere.

We shall now show that every finite-dimensional inner product space has an orthonormal basis. In so doing, we give a practical method of constructing such a basis.

7.4 Theorem  [Gram-Schmidt orthonormalisation process]  Let V be an inner product space and for every non-zero x ∈ V let x* = x/||x||. If {x_1, ..., x_k} is a linearly independent subset of V, define recursively

y_1 = x_1*;
y_2 = (x_2 − (x_2|y_1)y_1)*;
y_3 = (x_3 − (x_3|y_2)y_2 − (x_3|y_1)y_1)*;
  ⋮
y_k = (x_k − Σ_{i=1}^{k−1} (x_k|y_i)y_i)*.

Then {y_1, ..., y_k} is orthonormal and spans the same subspace as {x_1, ..., x_k}.

Proof  It is readily seen that y_i ≠ 0_V for every i and that y_i is a linear combination of x_1, ..., x_i. It is also clear that x_i is a linear combination of y_1, ..., y_i. Thus {x_1, ..., x_k} and {y_1, ..., y_k} span the same subspace. It now suffices to prove that {y_1, ..., y_k} is an orthogonal subset; and this we do inductively. For k = 1 the result is trivial. Suppose that {y_1, ..., y_{t−1}} is orthogonal where t > 1. Then, writing

α_t = ||x_t − Σ_{i=1}^{t−1} (x_t|y_i)y_i||,

we see that

α_t y_t = x_t − Σ_{i=1}^{t−1} (x_t|y_i)y_i

and so, for j < t,

α_t (y_t|y_j) = (x_t|y_j) − Σ_{i=1}^{t−1} (x_t|y_i)(y_i|y_j) = (x_t|y_j) − (x_t|y_j) = 0.

Since α_t ≠ 0 we deduce that (y_t|y_j) = 0 for j < t. Thus {y_1, ..., y_t} is orthogonal. ♦

7.5 Corollary  If V is a finite-dimensional inner product space then V has an orthonormal basis.

Proof  Apply the Gram-Schmidt process to a basis of V. ♦

Example  Consider the basis {x_1, x_2, x_3} of ℝ³ where x_1 = (0, 1, 1), x_2 = (1, 0, 1), x_3 = (1, 1, 0). In order to apply the Gram-Schmidt process to this basis using the standard inner product, we first let y_1 = x_1/||x_1|| = (1/√2)(0, 1, 1). Now

x_2 − (x_2|y_1)y_1 = (1, 0, 1) − ((1, 0, 1)|(1/√2)(0, 1, 1))·(1/√2)(0, 1, 1) = (1, 0, 1) − ½(0, 1, 1) = ½(2, −1, 1)

so, normalising this, we take y_2 = (1/√6)(2, −1, 1). Note that, by 7.1, we have

|(x|y)| ≤ ||x|| ||y||.
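The Gram-Schmidt process of 7.4 translates directly into code. The sketch below (function names are our own) reruns the ℝ³ example above and checks that the result is orthonormal.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def normalise(x):
    n = math.sqrt(inner(x, x))
    return tuple(a / n for a in x)

def gram_schmidt(xs):
    """Orthonormalise a linearly independent list of vectors as in 7.4."""
    ys = []
    for x in xs:
        # subtract the component of x along each y_i already constructed
        for y in ys:
            c = inner(x, y)
            x = tuple(a - c * b for a, b in zip(x, y))
        ys.append(normalise(x))
    return ys

y1, y2, y3 = gram_schmidt([(0, 1, 1), (1, 0, 1), (1, 1, 0)])
# y1 = (0,1,1)/sqrt(2) and y2 = (2,-1,1)/sqrt(6), as in the worked example
```

Running the process to completion also produces y3 = (1, 1, −1)/√3, the third vector that the worked example would yield.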

⇒ : If f : V → W is an inner product isomorphism then clearly {f(e_1), ..., f(e_n)} is a basis of W. It is also orthonormal since

(f(e_i)|f(e_j)) = (e_i|e_j) = 1 if i = j, and 0 if i ≠ j.

⇐ : Suppose now that {f(e_1), ..., f(e_n)} is an orthonormal basis of W. Then f carries a basis to a basis and so is a vector space isomorphism. Now for all x ∈ V we have, using the Fourier expansion of x relative to {e_1, ..., e_n},

(f(x)|f(e_j)) = (f(Σ_{i=1}^n (x|e_i)e_i) | f(e_j)) = Σ_{i=1}^n (x|e_i)(f(e_i)|f(e_j)) = (x|e_j),

and similarly (f(e_j)|f(y)) = (e_j|y). It now follows by Parseval's identity applied to both V and W that

(f(x)|f(y)) = Σ_{j=1}^n (f(x)|f(e_j))(f(e_j)|f(y)) = Σ_{j=1}^n (x|e_j)(e_j|y) = (x|y)

and consequently f is an inner product isomorphism. ♦

We now pass to the consideration of the dual of an inner product space. For this purpose, we require the following notions.

Definition  Let V, W be vector spaces over a field F where F is either ℝ or ℂ. A mapping f : V → W is called a conjugate transformation if

(∀x, y ∈ V)(∀λ ∈ F)  f(x + y) = f(x) + f(y),  f(λx) = λ̄ f(x).

If, furthermore, f is a bijection then we say that it is a conjugate isomorphism.

Note that when F = ℝ conjugate transformations are simply linear mappings.

We now observe that for every y ∈ V the mapping from V to F described by x ↦ (x|y) is linear, and hence is an element of V^d. We shall write this element as y^d, so that we have the following useful amalgamated notation:

(x|y) = y^d(x) = ⟨x, y^d⟩.

7.9 Theorem  If V is a finite-dimensional inner product space then there is a conjugate isomorphism ϑ_V : V → V^d, namely that given by ϑ_V(x) = x^d where

(∀y ∈ V)  x^d(y) = (y|x).

Proof  Consider the mapping ϑ_V : V → V^d given by ϑ_V(x) = x^d. Since

⟨x, (y + z)^d⟩ = (x|y + z) = (x|y) + (x|z) = ⟨x, y^d⟩ + ⟨x, z^d⟩ = ⟨x, y^d + z^d⟩

we see that (y + z)^d = y^d + z^d, so ϑ_V(y + z) = ϑ_V(y) + ϑ_V(z). Likewise,

⟨x, (λy)^d⟩ = (x|λy) = λ̄(x|y) = λ̄⟨x, y^d⟩ = ⟨x, λ̄y^d⟩

and so (λy)^d = λ̄y^d, whence ϑ_V(λy) = λ̄ϑ_V(y). Thus ϑ_V is a conjugate transformation.

That ϑ_V is injective follows from the fact that if x ∈ Ker ϑ_V then x^d = 0 and so (x|x) = ⟨x, x^d⟩ = 0, whence x = 0_V. To show that ϑ_V is also surjective, let f ∈ V^d. If {e_1, ..., e_n} is an orthonormal basis of V, let

x = Σ_{i=1}^n \overline{f(e_i)} e_i.

Then for j = 1, ..., n we have

x^d(e_j) = (e_j|x) = (e_j | Σ_{i=1}^n \overline{f(e_i)}e_i) = Σ_{i=1}^n f(e_i)(e_j|e_i) = f(e_j).

Thus x^d and f coincide on the basis {e_1, ..., e_n}. We deduce that f = x^d = ϑ_V(x), and so ϑ_V is also surjective. Thus ϑ_V is a conjugate isomorphism. ♦

We note from the above that we have the identity

(∀x, y ∈ V)  (x|y) = ⟨x, ϑ_V(y)⟩.

Since ϑ_V is a bijection, we also have the following identity (obtained by writing ϑ_V^{-1}(y) in place of y):

(∀x ∈ V)(∀y ∈ V^d)  (x | ϑ_V^{-1}(y)) = ⟨x, y⟩.

We can now establish the following important result.

7.10 Theorem  Let V and W be finite-dimensional inner product spaces over the same field. Then for every linear mapping f : V → W there is a unique linear mapping f* : W → V such that

(∀x ∈ V)(∀y ∈ W)  (f(x)|y) = (x|f*(y)).

Proof  With the above notation, we have the identity

(f(x)|y) = ⟨f(x), y^d⟩ = ⟨x, f^t(y^d)⟩ = (x | ϑ_V^{-1}[f^t(y^d)]) = (x | (ϑ_V^{-1} ∘ f^t ∘ ϑ_W)(y)),

from which it follows immediately that f* = ϑ_V^{-1} ∘ f^t ∘ ϑ_W is the only linear mapping with the stated property. ♦

7.11 Corollary  f* : W → V is the unique linear mapping such that the diagram

         ϑ_W
    W ————————→ W^d
    │              │
 f* │              │ f^t
    ↓              ↓
    V ————————→ V^d
         ϑ_V

is commutative, in the sense that ϑ_V ∘ f* = f^t ∘ ϑ_W. ♦

Definition  The unique linear mapping f* of 7.10 will be called the adjoint of f.

Immediate properties of the assignment f ↦ f* are the following.

7.12 Theorem  Let V, W, X be finite-dimensional inner product spaces over the same field, and let f, g : V → W and h : W → X be linear mappings. Then

(1) (f + g)* = f* + g*;
(2) (λf)* = λ̄f*;
(3) (h ∘ f)* = f* ∘ h*;
(4) (f*)* = f.

Proof  (1) is immediate from f* = ϑ_V^{-1} ∘ f^t ∘ ϑ_W and the fact that (f + g)^t = f^t + g^t.

(2) ((λf)(x)|y) = λ(f(x)|y) = λ(x|f*(y)) = (x|λ̄f*(y)), and so, by the uniqueness of adjoints, (λf)* = λ̄f*.

(3) (h[f(x)]|y) = (f(x)|h*(y)) = (x|f*[h*(y)]), and so, by the uniqueness of adjoints, (h ∘ f)* = f* ∘ h*.

(4) Taking complex conjugates in 7.10 we obtain the identity (f*(y)|x) = (y|f(x)), from which it follows by the uniqueness of adjoints that (f*)* = f. ♦

7.13 Theorem  Let V and W be finite-dimensional inner product spaces over the same field with dim V = dim W. If f : V → W is linear then the following statements are equivalent:

(1) f is an inner product isomorphism;
(2) f is a vector space isomorphism and f^{-1} = f*;
(3) f ∘ f* = id_W;
(4) f* ∘ f = id_V.

Proof  (1) ⇒ (2) : If (1) holds then f^{-1} exists and we have the identity

(f(x)|y) = (f(x) | f[f^{-1}(y)]) = (x|f^{-1}(y)),

from which it follows by the uniqueness of adjoints that f^{-1} = f*. It is clear that (2) ⇒ (3) and (2) ⇒ (4).

(4) ⇒ (1) : If (4) holds then f is injective, hence bijective, and f^{-1} = f*. Consequently,

(∀x, y ∈ V)  (f(x)|f(y)) = (x|f*[f(y)]) = (x|y)

and so f is an inner product isomorphism.

The proof of (3) ⇒ (1) is similar. ♦

We have seen in 6.10 how the transpose f^t of f is such that Ker f^t and Im f^t are the annihilators of Im f and Ker f respectively. In view of the connection between transposes and adjoints, it will come as no surprise that Ker f* and Im f* are also related to the subspaces Im f and Ker f. This connection is via the following notion.

Definition  Let V be an inner product space. For every non-empty subset E of V we define the orthogonal complement of E in V to be the set

E^⊥ = {y ∈ V ; (∀x ∈ E) (x|y) = 0}.

It is clear that E^⊥ is a subspace of V. The terminology is suggested by the following result.

7.14 Theorem  Let V be an inner product space and let W be a finite-dimensional subspace of V. Then

V = W ⊕ W^⊥.

Proof  Let {e_1, ..., e_n} be an orthonormal basis of W, noting that this exists since W is of finite dimension. Given x ∈ V, let x′ = Σ_{i=1}^n (x|e_i)e_i and let x″ = x − x′. Then x′ ∈ W and for j = 1, ..., n we have

(x″|e_j) = (x|e_j) − (x′|e_j) = (x|e_j) − Σ_{i=1}^n (x|e_i)(e_i|e_j) = (x|e_j) − (x|e_j) = 0.

It follows that x″ ∈ W^⊥ and hence that x = x′ + x″ ∈ W + W^⊥. Thus V = W + W^⊥. Now if x ∈ W ∩ W^⊥ we have (x|x) = 0, whence ||x|| = 0 and so x = 0_V. Thus we conclude that V = W ⊕ W^⊥. ♦

Example  The above result has a basic application to the theory of Fourier series. Suppose that W is a finite-dimensional subspace of the inner product space V. Given x ∈ V, let x = a + b where a ∈ W and b ∈ W^⊥. Then, by orthogonality,

||x||² = (a + b|a + b) = ||a||² + ||b||².

For any y ∈ W we deduce that

||x − y||² = ||a − y + b||² = ||a − y||² + ||b||² = ||a − y||² + ||x − a||² ≥ ||x − a||².

Thus we see that the element of W that is 'nearest' the element x of V is the component a of x in W.

Now let {e_1, ..., e_n} be an orthonormal basis for W. Let the element of W that is nearest a given x ∈ V be the element a = Σ_{i=1}^n λ_i e_i. By 7.3 we have λ_i = (a|e_i), and by orthogonality

(x|e_i) = (a + b|e_i) = (a|e_i).

Thus the element of W that is nearest x is Σ_{i=1}^n (x|e_i)e_i, the scalars being the Fourier coefficients.

Now apply these observations to the inner product space V of continuous functions f : [−π, π] → ℝ under the inner product (f|g) = ∫_{−π}^{π} fg. An orthonormal subset of V is

S = {x ↦ 1/√(2π), x ↦ (1/√π) sin kx, x ↦ (1/√π) cos kx ; k = 1, 2, 3, ...}.

Let W_n be the (2n + 1)-dimensional subspace spanned by

B_n = {x ↦ 1/√(2π), x ↦ (1/√π) sin kx, x ↦ (1/√π) cos kx ; k = 1, ..., n}.

Then the element f_n of W_n that is nearest a given f ∈ V is of the form

f_n = ½a_0 + Σ_{k=1}^n (a_k cos kx + b_k sin kx)

where

a_0 = (1/π) ∫_{−π}^{π} f(x) dx,
a_k = (1/π) ∫_{−π}^{π} f(x) cos kx dx,
b_k = (1/π) ∫_{−π}^{π} f(x) sin kx dx.

If f is infinitely differentiable then it can be shown that the sequence (f_n)_{n≥1} is a Cauchy sequence having f as its limit. Thus we can write

f = ½a_0 + Σ_{k≥1} (a_k cos kx + b_k sin kx),

which is the Fourier series representation of f.

7.15 Theorem  If V is a finite-dimensional inner product space and W is a subspace of V then W = W^⊥⊥ and

dim W^⊥ = dim V − dim W.

Proof  By 7.14 we clearly have dim V = dim W + dim W^⊥. Now it is clear from the definition of W^⊥ that we have W ⊆ W^⊥⊥. Also,

dim W^⊥⊥ = dim V − dim W^⊥ = dim V − (dim V − dim W) = dim W.

It follows that W = W^⊥⊥. ♦
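The coefficient formulas in the Fourier-series example above can be checked numerically. The sketch below (using a simple trapezium-rule integrator, a helper of our own) computes a_1, b_1, b_2 for the sample function f(x) = x, whose Fourier series is the classical Σ_{k≥1} 2(−1)^{k+1} sin(kx)/k.

```python
import math

def integrate(g, a, b, n=50000):
    # composite trapezium rule; ample accuracy for these smooth integrands
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

f = lambda x: x                      # sample function on [-pi, pi]

def a_coeff(k):
    return integrate(lambda x: f(x) * math.cos(k * x), -math.pi, math.pi) / math.pi

def b_coeff(k):
    return integrate(lambda x: f(x) * math.sin(k * x), -math.pi, math.pi) / math.pi

# for f(x) = x: every a_k vanishes (odd function) and b_k = 2(-1)^(k+1)/k
assert abs(a_coeff(1)) < 1e-5
assert abs(b_coeff(1) - 2.0) < 1e-5
assert abs(b_coeff(2) + 1.0) < 1e-5
```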

7.16 Theorem  If V is a finite-dimensional inner product space and A, B are subspaces of V then

(1) A ⊆ B ⇒ B^⊥ ⊆ A^⊥;
(2) (A ∩ B)^⊥ = A^⊥ + B^⊥;
(3) (A + B)^⊥ = A^⊥ ∩ B^⊥.

Proof  (1) If A ⊆ B then clearly every element that is orthogonal to B is orthogonal to A, so B^⊥ ⊆ A^⊥.

(2) Since A, B ⊆ A + B we have, by (1), (A + B)^⊥ ⊆ A^⊥ ∩ B^⊥; and since A ∩ B ⊆ A, B we have A^⊥, B^⊥ ⊆ (A ∩ B)^⊥, whence A^⊥ + B^⊥ ⊆ (A ∩ B)^⊥. Since then

A ∩ B = (A ∩ B)^⊥⊥ ⊆ (A^⊥ + B^⊥)^⊥ ⊆ A^⊥⊥ ∩ B^⊥⊥ = A ∩ B,

we deduce that A ∩ B = (A^⊥ + B^⊥)^⊥, whence (A ∩ B)^⊥ = (A^⊥ + B^⊥)^⊥⊥ = A^⊥ + B^⊥.

(3) This follows from (2) on replacing A, B by A^⊥, B^⊥. ♦

7.17 Theorem  If V is a finite-dimensional inner product space and if f : V → V is linear then

Im f* = (Ker f)^⊥  and  Ker f* = (Im f)^⊥.


Proof  Let x ∈ Im f*, say x = f*(y). Then for every z ∈ Ker f we have

(z|x) = (z|f*(y)) = (f(z)|y) = (0_V|y) = 0

and consequently x ∈ (Ker f)^⊥. Thus Im f* ⊆ (Ker f)^⊥.

Now let y ∈ Ker f*. Then for x = f(z) ∈ Im f we have

(x|y) = (f(z)|y) = (z|f*(y)) = (z|0_V) = 0

and consequently y ∈ (Im f)^⊥. Thus Ker f* ⊆ (Im f)^⊥.

Using 7.15 we then have

dim Im f = dim V − dim(Im f)^⊥ ≤ dim V − dim Ker f* = dim Im f*
         ≤ dim(Ker f)^⊥ = dim V − dim Ker f = dim Im f.

The resulting equality gives both dim Im f* = dim(Ker f)^⊥ and dim(Im f)^⊥ = dim Ker f*, from which the results follow. ♦

We now investigate how matrices that represent f and f* are related.

Definition

If A = [a_{ij}]_{m×n} ∈ Mat_{m×n}(ℂ) then by the adjoint (or conjugate transpose) of A we mean the n × m matrix A* such that [A*]_{ij} = \overline{a_{ji}}.

The following result justifies the above terminology.

7.18 Theorem  Let V, W be finite-dimensional inner product spaces over the same field. If, relative to ordered orthonormal bases (v_i)_n, (w_i)_m, a linear mapping f : V → W is represented by the matrix A, then the mapping f* is represented, relative to the bases (w_i)_m and (v_i)_n, by the matrix A*.

Proof  For j = 1, ..., n we have f(v_j) = Σ_{i=1}^m (f(v_j)|w_i)w_i by 7.6, so if A = [a_{ij}] we have a_{ij} = (f(v_j)|w_i). Likewise, we have f*(w_j) = Σ_{i=1}^n (f*(w_j)|v_i)v_i. Then since

(f*(w_j)|v_i) = \overline{(v_i|f*(w_j))} = \overline{(f(v_i)|w_j)} = \overline{a_{ji}},

it follows that the matrix that represents f* is A*. ♦

It is clear from 7.18 and 7.13 that a square matrix A represents an inner product space isomorphism if and only if A^{-1} exists and is A*. Such a matrix is said to be unitary. It is readily seen, by extending the corresponding results for ordinary vector spaces to inner product spaces, that if A, B are n × n matrices over the ground field of V then A, B represent the same linear mapping with respect to possibly different ordered orthonormal bases of V if and only if there is a unitary matrix U such that B = U*AU = U^{-1}AU. We describe this situation by saying that B is unitarily similar to A.

When the ground field is ℝ, the word orthogonal is often used instead of unitary. In this case A is orthogonal if and only if A^{-1} exists and is A^t. When there exists an orthogonal matrix U such that B = U^tAU = U^{-1}AU then we say that B is orthogonally similar to A.
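Whether a given matrix is unitary is a finite check: U U* must be the identity. The sketch below (the matrix is a sample of our own choosing) verifies this for a 2 × 2 complex matrix.

```python
import math

def mat_mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def conj_transpose(A):
    # the adjoint A*: transpose and take complex conjugates entrywise
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

s = 1 / math.sqrt(2)
U = [[s, 1j * s],
     [1j * s, s]]          # a sample unitary matrix

P = mat_mul(U, conj_transpose(U))
# U U* is the identity, so U* = U^{-1} and U is unitary
assert all(abs(P[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))
```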

It is clear that the relation of being unitarily (or orthogonally) similar is an equivalence relation on the set of n × n matrices over ℂ (or ℝ). Just as with ordinary similarity, the problem of locating particularly simple representatives, or canonical forms, in certain equivalence classes is important from both the theoretical and practical points of view. We shall consider this problem later.

CHAPTER

EIGHT

Orthogonal direct sums

In 7.14 we obtained, in an inner product space V, a direct sum decomposition of the form V = W ⊕ W^⊥. This leads us to consider the following notion.

Definition  Let V_1, ..., V_n be non-zero subspaces of an inner product space V. Then V is said to be the orthogonal direct sum of V_1, ..., V_n if

(1) V = ⊕_{i=1}^n V_i;
(2) (i = 1, ..., n)  V_i^⊥ = Σ_{j≠i} V_j.

In order to study orthogonal direct sum decompositions in an inner product space V, let us begin by considering a projection p : V → V and the associated decomposition V = Im p ⊕ Ker p established in 2.6. In order that this be an orthogonal direct sum, it is clear that p has to be an ortho-projection in the sense that Ker p = (Im p)^⊥ or, equivalently, Im p = (Ker p)^⊥. To discover when this happens, we require the following result.

8.1 Theorem  If W, X are subspaces of a finite-dimensional inner product space V such that V = W ⊕ X then V = W^⊥ ⊕ X^⊥.

Proof  By 7.16 we have {0_V} = V^⊥ = (W + X)^⊥ = W^⊥ ∩ X^⊥ and V = {0_V}^⊥ = (W ∩ X)^⊥ = W^⊥ + X^⊥, and hence V = W^⊥ ⊕ X^⊥. ♦

8.2 Corollary  If p is the projection on W parallel to X then p* is the projection on X^⊥ parallel to W^⊥.

Proof  By 7.12, and since p is idempotent, we have p* ∘ p* = (p ∘ p)* = p*. Thus p* is idempotent and so is the projection on Im p* parallel to Ker p*. By 2.5, Im p = W and Ker p = X so, by 7.17, W^⊥ = (Im p)^⊥ = Ker p* and X^⊥ = (Ker p)^⊥ = Im p*. ♦

Definition  If V is an inner product space then f : V → V is said to be self-adjoint if f = f*.

8.3 Theorem  Let V be an inner product space of finite dimension. If p is a projection on V then p is an ortho-projection if and only if p is self-adjoint.

Proof  By 8.2, p* is the projection on Im p* = (Ker p)^⊥ parallel to Ker p* = (Im p)^⊥. If then p is an ortho-projection we have Im p* = Im p. It follows by 2.5 that for every x ∈ V we have p(x) = p*[p(x)]. Consequently p = p* ∘ p and hence

p* = (p* ∘ p)* = p* ∘ p** = p* ∘ p = p,

so that p is self-adjoint.

Conversely, if p = p* then Im p = Im p* = (Ker p)^⊥ shows that p is an ortho-projection. ♦

It is clear from the above results that if V is an inner product space of finite dimension and if V_1, ..., V_n are non-zero subspaces of V such that V = ⊕_{i=1}^n V_i, then this sum is an orthogonal direct sum if and only if, for every i, the projection p_i of V onto V_i parallel to Σ_{j≠i} V_j is self-adjoint.

It is also clear that if V = ⊕_{i=1}^n V_i then this direct sum is an orthogonal direct sum if and only if, for each i, every element of V_i is orthogonal to every element of V_j when j ≠ i. In fact, in this case we have Σ_{j≠i} V_j ⊆ V_i^⊥, whence we have equality since

dim Σ_{j≠i} V_j = dim V − dim V_i = dim V_i^⊥.

Suppose that V is a finite-dimensional inner product space and that f : V → V is linear. We shall now consider under what conditions f is ortho-diagonalizable, in the sense that there is an orthonormal basis of V consisting of eigenvectors of f; equivalently, under what conditions there is an ordered orthonormal basis of V with respect to which the matrix of f is diagonal. In purely matrix terms this problem is that of determining when a given square matrix (over ℝ or ℂ) is unitarily similar to a diagonal matrix.

8.4 Theorem  Let V be a non-zero finite-dimensional inner product space over a field F and let f : V → V be linear. Then f is ortho-diagonalizable if and only if there are non-zero self-adjoint projections p_1, ..., p_k : V → V and distinct scalars λ_1, ..., λ_k ∈ F such that

(1) f = Σ_{i=1}^k λ_i p_i;
(2) Σ_{i=1}^k p_i = id_V;
(3) (i ≠ j)  p_i ∘ p_j = 0.

Proof  ⇒ : Since f is diagonalizable, we have V = ⊕_{i=1}^k V_{λ_i}, where λ_1, ..., λ_k are the distinct eigenvalues of f and the subspace V_{λ_i} = Ker(f − λ_i id_V) is the eigenspace associated with λ_i. If p_i : V → V is the projection on V_{λ_i} parallel to Σ_{j≠i} V_{λ_j}, then (2), (3) follow from 2.8. Now for every x ∈ V we have

f(x) = f(Σ_{i=1}^k p_i(x)) = Σ_{i=1}^k f(p_i(x)) = Σ_{i=1}^k λ_i p_i(x) = (Σ_{i=1}^k λ_i p_i)(x),

and this gives (1). The fact that ⊕_{i=1}^k V_{λ_i} is an orthogonal direct sum means that each projection p_i is an ortho-projection and so, by 8.3, is self-adjoint.

⇐ : If the conditions hold then by 2.8 we have V = ⊕_{i=1}^k Im p_i. Now the λ_i appearing in (1) are precisely the distinct eigenvalues of f. To see this, observe that

f ∘ p_j = (Σ_{i=1}^k λ_i p_i) ∘ p_j = Σ_{i=1}^k λ_i (p_i ∘ p_j) = λ_j p_j,

so (f − λ_j id_V) ∘ p_j = 0 and hence {0_V} ≠ Im p_j ⊆ Ker(f − λ_j id_V). Thus each λ_j is an eigenvalue of f. On the other hand, for every λ ∈ F we have

f − λ id_V = Σ_{i=1}^k λ_i p_i − Σ_{i=1}^k λ p_i = Σ_{i=1}^k (λ_i − λ)p_i,

so that, if x is an eigenvector of f corresponding to an eigenvalue λ, then Σ_{i=1}^k (λ_i − λ)p_i(x) = 0_V and hence, since V = ⊕_{i=1}^k Im p_i, we have (λ_i − λ)p_i(x) = 0_V for i = 1, ..., k. If λ ≠ λ_i for every i, then p_i(x) = 0_V for every i and we have the contradiction x = Σ_{i=1}^k p_i(x) = 0_V. Thus λ = λ_i for some i, and consequently λ_1, ..., λ_k are the distinct eigenvalues of f.

We now observe that Im p_j = Ker(f − λ_j id_V). For, suppose that f(x) = λ_j x. Then 0_V = Σ_{i=1}^k (λ_i − λ_j)p_i(x), and therefore (λ_i − λ_j)p_i(x) = 0_V for all i, whence p_i(x) = 0_V for all i ≠ j. Then x = Σ_{i=1}^k p_i(x) = p_j(x) ∈ Im p_j, and so Ker(f − λ_j id_V) ⊆ Im p_j. The reverse inclusion was established above.

Since now V = ⊕_{i=1}^k Im p_i = ⊕_{i=1}^k Ker(f − λ_i id_V), it follows that V has a basis consisting of eigenvectors of f and so f is diagonalizable. Now by hypothesis the projections p_i are self-adjoint so, for j ≠ i,

(p_i(x)|p_j(y)) = (p_j[p_i(x)]|y) = (0_V|y) = 0.

It follows that the above eigenvector basis is orthogonal. By normalising each vector in this basis we obtain an orthonormal basis of eigenvectors. Hence f is ortho-diagonalizable. ♦

Definition

For an ortho-diagonalizable mapping f the equality f = Σ_{i=1}^k λ_i p_i of 8.4 is called the spectral resolution of f.
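Theorem 8.4 can be made concrete with a small matrix computation. In the sketch below (the matrix and projections are sample choices of our own) the real symmetric matrix A has eigenvalues 1 and 3, and p_1, p_2 are the ortho-projections onto the corresponding eigenspaces; the three conditions of 8.4 are checked directly.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

A  = [[2.0, 1.0], [1.0, 2.0]]       # symmetric; eigenvalues 1 and 3
p1 = [[0.5, -0.5], [-0.5, 0.5]]     # ortho-projection onto span{(1, -1)}
p2 = [[0.5, 0.5], [0.5, 0.5]]       # ortho-projection onto span{(1, 1)}

# (1) spectral resolution: A = 1*p1 + 3*p2
assert [[1 * p1[i][j] + 3 * p2[i][j] for j in range(2)] for i in range(2)] == A
# (2) p1 + p2 is the identity
assert [[p1[i][j] + p2[i][j] for j in range(2)] for i in range(2)] == [[1.0, 0.0], [0.0, 1.0]]
# (3) p1 p2 = 0, and each projection is idempotent and self-adjoint (symmetric)
assert mat_mul(p1, p2) == [[0.0, 0.0], [0.0, 0.0]]
assert mat_mul(p1, p1) == p1
assert p1 == [[p1[j][i] for j in range(2)] for i in range(2)]
```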

Suppose now that f : V → V is ortho-diagonalizable. Applying the results of 7.12 to the conditions in 8.4 we obtain, with an obvious notation,

(1*) f* = Σ_{i=1}^k λ̄_i p_i;  (2*) = (2);  (3*) = (3).

We deduce by 8.4 that f* is also ortho-diagonalizable and that (1*) gives its spectral resolution (so that λ̄_1, ..., λ̄_k are the distinct eigenvalues of f*). A simple calculation now reveals that

f ∘ f* = Σ_{i=1}^k |λ_i|² p_i = f* ∘ f,

from which we deduce that ortho-diagonalizable mappings commute with their adjoints. This observation leads to the following notion.

Definition  If V is a finite-dimensional inner product space and f : V → V is linear then we say that f is normal if it commutes with its adjoint. Similarly, a square matrix A over the ground field of V is said to be normal if AA* = A*A.

Example

It is readily seen that

[ 2+i    1  ]
[ −1    2+i ]

is normal.
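Normality of a matrix is a finite check: compare AA* with A*A. The sketch below tests a sample matrix of the form aI + N with N real skew-symmetric (any such matrix is normal; the particular entries are our own choice).

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def adjoint(A):
    # conjugate transpose
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A))]

# sample matrix (2+i)I + N, where N = [[0, 1], [-1, 0]] is skew-symmetric
A = [[2 + 1j, 1], [-1, 2 + 1j]]

# A A* = A* A, so A is normal
assert mat_mul(A, adjoint(A)) == mat_mul(adjoint(A), A)
```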

We have just seen that a necessary condition for a linear mapping f to be ortho-diagonalizable is that it be normal. It is quite remarkable that, when the ground field is ℂ, this condition is also sufficient. In order to establish this, we require the following properties of normal linear mappings.

(1) (VzeEV)

be a non-zero finite-dimenstonal inner f: V + V be normal. Then

|f(2)l = IF" (2)|l;

(2) afp 1s a@ polynomial with coefficients in the ground field of V then p(f):V — V 1s also normal;

(3) Im f NKer f = {Oy}.

92

VOLUME

4: LINEAR

ALGEBRA

\

Proof (1) Since f o f* = f* o f we have, for all

ze V,

(f(z) |f(z) = (21 F*1F(@))) = (21 FIP (@))) = CF" (2) 1F@) from which (1) follows. (2) If p=ap+a,X+ ---+a,X" then p(f) = ao idy tai f+ -+--+anf” and, by 7.12, [p(f)|* = ao idy +01 f*+ ---+an(f*)”.

Since f and f* commute, it follows that so do p(f) and [p(f)]*. Thus p(f) is normal. (3) If zc € Imf mM Kerf then there exists y € V such that z = f(y) and f(z) = Oy. By (1)-we have f*(z) = Oy and so

0 = (f*(z)|y) = (z| f(y)) = (z|2) whence z = Oy. 8.6 Theorem Let V be a non-zero fintte-dimensional inner product space. If p is a projection on V then p 1s normal if and only tf tt 1s self-adjoint. Proof

Clearly, if p is self-adjoint then p is normal.

Suppose,

conversely, that p is normal. By 8.5 we have ||p(z)|| = ||p* (z)|| and so p(z) = Oy if and only if p*(z) = Oy. Given z EV, let y = x — p(x). We have p(y) = p(z) — p(z) = Oy and so

Ov = p*(y) = p*(z) — p*[p(z)]. Thus p* = p* op and so p=p=(prop)’

—p

Op

—f

opp

i.e. p is self-adjoint. > We can now solve the ortho-diagonalization problem for complez inner product spaces. 8.7 Theorem Let V be a non-zero finite-dimensional complex inner product space. If f : V — V is linear then f 1s orthodiagonalizable af and only if f 1s normal. Proof We have already seen that the condition is necessary. As for sufficiency, suppose that f is normal. To show that f is diagonalizable, it suffices to show that its minimum polynomial is a product of distinct linear factors. For this, we make use of the fact that € is algebraically closed, in the sense that every polynomial over € of degree at least 1 can be expressed as a

ORTHOGONAL

DIRECT

SUMS

93

product of linear polynomials. Thus my, is certainly a product of linear polynomials. Suppose, by way of obtaining a contradiction, that a € C is a multiple zero of my, 30 that we have

my = (X — a)?Q for some polynomial g. Then for every z € V we have

Ov = [my(F)](z) = [(f — aidy)?© g(f)](z) and consequently [(f — aidy) o g(f)|(z) belongs to both the image and the kernel of f — aidy.

Since, by 8.5(2), f − a id_V is normal, we deduce from 8.5(3) that
(∀x ∈ V)  [(f − a id_V) ∘ g(f)](x) = 0_V.
Consequently (f − a id_V) ∘ g(f) is the zero mapping on V, so f satisfies the polynomial (X − a)g of smaller degree, and this contradicts the fact that (X − a)² g is the minimum polynomial of f. Thus we see that f is diagonalizable.
To show that f is ortho-diagonalizable, it suffices to show that the corresponding projections p_i of 8.4 are ortho-projections, and by 8.3 it is enough to show that they are self-adjoint. Now since f is diagonalizable it is clear from the proof of 2.10 that for each i there is a polynomial t_i such that t_i(f) = p_i. By 8.5(2), each p_i is therefore normal and so, by 8.6, is self-adjoint. ◊

8.8 Corollary
If A is a square matrix over ℂ then A is unitarily similar to a diagonal matrix if and only if A is normal. ◊

It should be noted that in the proof of 8.7 we made use of the fact that the field ℂ is algebraically closed. This is not true of ℝ, and so we might expect the corresponding result to fail in general for real inner product spaces (and real square matrices). This is indeed the case: there exist normal linear mappings on a real inner product space that are not diagonalizable. One way in which this can happen is when all the eigenvalues of the mapping in question are non-real. For example, the rotation matrix

[ −1/2   −√3/2 ]
[  √3/2  −1/2  ]
is normal and its minimum polynomial is X² + X + 1, which has no zeros in ℝ. So, in order to obtain an analogue of 8.7 in the case where the ground field is ℝ, we are led to consider normal linear mappings whose eigenvalues are all real. These can be characterised as follows.


8.9 Theorem
Let V be a non-zero finite-dimensional complex inner product space. If f : V → V is linear then the following conditions are equivalent:
(1) f is normal and all its eigenvalues are real;
(2) f is self-adjoint.

Proof
(1) ⇒ (2): By 8.7, f is ortho-diagonalizable. Let f = Σ_{i=1}^k λ_i p_i be its spectral resolution. We know that f* is also normal, with spectral resolution f* = Σ_{i=1}^k λ̄_i p_i. Since each λ_i is real by hypothesis, it follows that f* = f.
(2) ⇒ (1): If f* = f then clearly f is normal. If f = Σ_{i=1}^k λ_i p_i and f* = Σ_{i=1}^k λ̄_i p_i are the spectral resolutions then we have Σ_{i=1}^k (λ_i − λ̄_i) p_i = 0 and so Σ_{i=1}^k (λ_i − λ̄_i) p_i(x) = 0_V for every x ∈ V, whence (λ_i − λ̄_i) p_i = 0 for every i since V = ⊕_{j=1}^k Im p_j. Since no p_i is the zero mapping, every λ_i is real. ◊

8.10 Corollary
All the eigenvalues of a self-adjoint matrix are real. ◊
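The book itself contains no code, but the content of 8.9 and 8.10 can be checked numerically with NumPy; the matrices below are illustrative choices of my own (the second is the rotation through 2π/3 discussed after 8.8):

```python
# Illustrative check of 8.9/8.10; the matrices are my own examples.
import numpy as np

# A self-adjoint (Hermitian) matrix: its eigenvalues must be real.
H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
hermitian_eigs_real = bool(np.allclose(np.linalg.eigvals(H).imag, 0))

# The rotation through 2*pi/3: normal, but its eigenvalues are the
# primitive cube roots of unity, so it cannot be self-adjoint.
s = np.sqrt(3) / 2
R = np.array([[-0.5, -s],
              [s, -0.5]])
R_normal = bool(np.allclose(R @ R.T, R.T @ R))
R_min_poly_ok = bool(np.allclose(R @ R + R + np.eye(2), 0))  # X^2 + X + 1
R_self_adjoint = bool(np.allclose(R, R.T))
R_has_real_eigenvalue = bool(np.any(np.abs(np.linalg.eigvals(R).imag) < 1e-9))
```

So a normal matrix with a non-real eigenvalue, such as this rotation, is never self-adjoint, exactly as 8.9 predicts.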

The analogue of 8.7 is now the following.

8.11 Theorem
Let V be a non-zero finite-dimensional real inner product space. If f : V → V is linear then f is ortho-diagonalizable if and only if f is self-adjoint.

Proof
⇒: If f is ortho-diagonalizable let f = Σ_{i=1}^k λ_i p_i be its spectral resolution. Since the ground field is ℝ, each λ_i is real and so, taking adjoints and using 8.3, we obtain f* = f.

n > dim(Z + X). It follows that the sum Z + X is not direct (otherwise we would have equality) and so Z ∩ X ≠ {0_V}. Let z be a non-zero element of Z ∩ X. Then from (1) we see that (f(z) | z) is negative, whereas from (2) we see that (f(z) | z) is non-negative. This contradiction shows that we cannot have r′ < r. Similarly we cannot have r < r′, and so we conclude that r = r′, whence also s = s′. ◊

The above result gives immediately the following theorem, which describes canonical quadratic forms.

9.5 Theorem [Sylvester]
Let V be a vector space of dimension n over ℝ and let Q : V → ℝ be a quadratic form on V. Then there is an ordered basis (v_i)_n of V such that if x = Σ_{i=1}^n x_i v_i then
Q(x) = x_1² + ... + x_r² − x_{r+1}² − ... − x_{r+s}².
Moreover, the integers r and s are independent of such a basis. ◊

The integer r + s in 9.5 is often called the rank of the quadratic form Q, and r − s the signature of Q.

Example

Consider the quadratic form Q : ℝ³ → ℝ given by
Q(x, y, z) = x² − 2xy + 4yz − 2y² + 4z².
By the process of 'completing the squares' it is readily seen that
Q(x, y, z) = (x − y)² − 4y² + (y + 2z)²,
which is in canonical form, of rank 3 and signature 1. Alternatively, we can use matrices. The matrix of Q is
A = [  1  −1   0 ]
    [ −1  −2   2 ]
    [  0   2   4 ]
Let P be an invertible matrix such that PᵗAP is the diagonal matrix D. If y = P⁻¹x (so that x = Py) then
xᵗAx = (Py)ᵗA(Py) = yᵗPᵗAPy = yᵗDy,
where the right-hand side is of the form X² − 4Y² + Z².
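As a numerical aside (not part of the book), the completing-the-squares substitution X = x − y, Y = y, Z = y + 2z can be packaged as a congruence PᵗAP and checked with NumPy:

```python
# Congruence reduction of Q(x, y, z) = x^2 - 2xy + 4yz - 2y^2 + 4z^2,
# using the substitution from the worked example above.
import numpy as np

A = np.array([[1.0, -1.0, 0.0],
              [-1.0, -2.0, 2.0],
              [0.0, 2.0, 4.0]])

# New coordinates (X, Y, Z) = S (x, y, z): X = x - y, Y = y, Z = y + 2z.
S = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 1.0, 2.0]])
P = np.linalg.inv(S)      # then x = P (X, Y, Z)^t and Q becomes (X,Y,Z) P^t A P (X,Y,Z)^t

D = P.T @ A @ P           # should be diag(1, -4, 1), i.e. X^2 - 4Y^2 + Z^2
congruence_ok = bool(np.allclose(D, np.diag([1.0, -4.0, 1.0])))

# By Sylvester's law of inertia the eigenvalue signs of A give the same r and s.
eigs = np.linalg.eigvalsh(A)
r = int(np.sum(eigs > 0))
s = int(np.sum(eigs < 0))
```

Note that r = 2 and s = 1 here, agreeing with rank 3 and signature 1.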


Example
The quadratic form given by Q(x, y, z) = 2xy + 2yz can be reduced to canonical form either by the method of completing squares or by a matrix reduction. The former method is not so easy in this case, but can be achieved as follows. Define
√2 x = X + Y,  √2 y = X − Y,  √2 z = Z.
Then the form becomes
X² − Y² + (X − Y)Z = (X + ½Z)² − (Y + ½Z)²
                   = ½(x + y + z)² − ½(x − y + z)²,
which is of rank 2 and signature 0.
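A numerical cross-check of my own (not from the book): the symmetric matrix of 2xy + 2yz has eigenvalues √2, 0 and −√2, confirming rank 2 and signature 0:

```python
# Eigenvalue check of the rank and signature of Q(x, y, z) = 2xy + 2yz.
import numpy as np

B = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])   # symmetric matrix of 2xy + 2yz

eigs = np.linalg.eigvalsh(B)      # sqrt(2), 0 and -sqrt(2)
r = int(np.sum(eigs > 1e-12))     # positive squares in the canonical form
s = int(np.sum(eigs < -1e-12))    # negative squares
rank = r + s
signature = r - s
```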

Definition
A quadratic form Q is said to be positive definite if Q(x) > 0 for all non-zero x. By taking the inner product space V to be Mat_{n×1}(ℝ) under (x | y) = xᵗy, we see that a quadratic form Q on V with matrix A is positive definite if and only if, for all non-zero x ∈ V,
0 < Q(x) = xᵗAx = (Ax | x),
which is the case if and only if A is positive definite. It is clear that this situation obtains when there are no negative terms in the canonical form, i.e. when the rank and the signature are the same.

Example
Let f : ℝ × ℝ → ℝ be a function whose partial derivatives f_x, f_y are zero at (x_0, y_0). Then the Taylor series at (x_0 + h, y_0 + k) is
f(x_0, y_0) + ½[h² f_xx + 2hk f_xy + k² f_yy](x_0, y_0) + ...

If both signs are positive (ie.

the form is positive definite)

then f has a relative minimum at (zo, yo), and if both signs are

negative then f has a relative maximum at (zo, yo). If one sign is positive and the other is negative then f has a saddle-point at (zo, yo). Thus the geometry is distinguished by the signature of the quadratic form.
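As an illustrative sketch (the Hessians below come from my own examples x² + y², −(x² + y²) and x² − y², not from the book), the second-derivative test is just a signature computation on the Hessian matrix:

```python
# Second-derivative test as a signature computation on the Hessian.
import numpy as np

def classify(hessian):
    """Classify a non-degenerate critical point from the Hessian's signature."""
    eigs = np.linalg.eigvalsh(hessian)
    if np.all(eigs > 0):
        return "minimum"   # positive definite: signature = rank
    if np.all(eigs < 0):
        return "maximum"   # negative definite
    return "saddle"        # one sign of each: signature 0 for rank 2

min_kind = classify(np.array([[2.0, 0.0], [0.0, 2.0]]))      # f = x^2 + y^2
max_kind = classify(np.array([[-2.0, 0.0], [0.0, -2.0]]))    # f = -(x^2 + y^2)
saddle_kind = classify(np.array([[2.0, 0.0], [0.0, -2.0]]))  # f = x^2 - y^2
```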

Example

Consider the quadratic form

4x² + 4y² + 4z² − 2xy − 2yz + 2xz.
Its matrix is
A = [  4  −1   1 ]
    [ −1   4  −1 ]
    [  1  −1   4 ]

The eigenvalues of A are 3 (of algebraic multiplicity 2) and 6. If P is an orthogonal matrix such that PᵗAP is diagonal then, changing coordinates by X = Pᵗx, we transform the quadratic form to
3X² + 3Y² + 6Z²

which is positive definite.
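A numerical aside (not in the book) confirming the stated eigenvalues and positive definiteness:

```python
# Check that A has eigenvalues 3, 3, 6 and is therefore positive definite.
import numpy as np

A = np.array([[4.0, -1.0, 1.0],
              [-1.0, 4.0, -1.0],
              [1.0, -1.0, 4.0]])

eigs = np.sort(np.linalg.eigvalsh(A))
eigs_ok = bool(np.allclose(eigs, [3.0, 3.0, 6.0]))
positive_definite = bool(np.all(eigs > 0))   # all eigenvalues positive
```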

CHAPTER

TEN

Real normality

We have seen in 8.7 that the ortho-diagonalizable linear mappings on a complex inner product space are precisely those that are normal, and in 8.11 that the ortho-diagonalizable linear mappings on a real inner product space are precisely those that are self-adjoint. It is therefore natural to ask what can be said about normal linear mappings on a real inner product space; equivalently, to ask about real square matrices that commute with their transposes. Our objective now will be to obtain a canonical form for such a matrix under orthogonal similarity. For this purpose, we consider the following notion.

Definition
Let V be a finite-dimensional real inner product space and let f : V → V be linear. We say that f is skew-adjoint if f* = −f. The corresponding terminology for real square matrices is skew-symmetric.

10.1 Theorem
If V is a non-zero finite-dimensional real inner product space and f : V → V is linear then there is a unique self-adjoint g : V → V and a unique skew-adjoint h : V → V such that f = g + h. Moreover, f is normal if and only if g, h commute.

Proof
We have f = ½(f + f*) + ½(f − f*), where ½(f + f*) is self-adjoint and ½(f − f*) is skew-adjoint. Also, if f = g + h where g is self-adjoint and h is skew-adjoint, then f* = g* + h* = g − h, and consequently we see that g = ½(f + f*) and h = ½(f − f*). Now f ∘ f* = f* ∘ f gives (g + h) ∘ (g − h) = (g − h) ∘ (g + h), which reduces to g ∘ h = h ∘ g. Conversely, if g, h commute then it is readily seen that f ∘ f* = g² − h² = f* ∘ f. ◊
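As a numerical aside (the matrices are my own illustrative choices, not from the book), for real square matrices the decomposition of 10.1 reads F = (F + Fᵗ)/2 + (F − Fᵗ)/2:

```python
# Symmetric + skew-symmetric decomposition of a real matrix (Theorem 10.1).
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4))    # a generic matrix: not normal

g = (F + F.T) / 2                  # self-adjoint (symmetric) part
h = (F - F.T) / 2                  # skew-adjoint (skew-symmetric) part
decomposition_ok = bool(np.allclose(F, g + h))
sym_ok = bool(np.allclose(g, g.T))
skew_ok = bool(np.allclose(h, -h.T))

# "F is normal" and "g, h commute" should agree (here both are false).
f_normal = bool(np.allclose(F @ F.T, F.T @ F))
parts_commute = bool(np.allclose(g @ h, h @ g))

# A normal example: its symmetric part is a scalar matrix, which
# commutes with everything.
N = np.array([[1.0, -2.0],
              [2.0, 1.0]])
n_normal = bool(np.allclose(N @ N.T, N.T @ N))
```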


We now obtain a useful characterisation of skew-adjoint mappings.

10.2 Theorem
If V is a non-zero finite-dimensional real inner product space then f : V → V is skew-adjoint if and only if
(∀x ∈ V)  (f(x) | x) = 0.

Proof
⇒: If f is skew-adjoint then for every x ∈ V we have
(f(x) | x) = (x | −f(x)) = −(x | f(x)) = −(f(x) | x),
and so (f(x) | x) = 0.

with i ≠ j. Then the minimum polynomials of f_i, f_j are X² − 2a_iX + a_i² + b_i² and X² − 2a_jX + a_j² + b_j², where either a_i ≠ a_j or b_i² ≠ b_j². By 10.11, we have

m_{g_i} = X − a_i, m_{g_j} = X − a_j and m_{h_i} = X² + b_i², m_{h_j} = X² + b_j².
Given x_i ∈ V_i and x_j ∈ V_j we therefore have
0 = ((h_i² + b_i² id_{V_i})(x_i) | x_j)
  = (h²(x_i) | x_j) + b_i²(x_i | x_j)
  = (x_i | h²(x_j)) + b_i²(x_i | x_j)    [h is skew-adjoint, so h² is self-adjoint]
  = (x_i | h_j²(x_j)) + b_i²(x_i | x_j)
  = −b_j²(x_i | x_j) + b_i²(x_i | x_j)
  = (b_i² − b_j²)(x_i | x_j),
so that in the case where b_i² ≠ b_j² we have (x_i | x_j) = 0. Likewise,
0 = ((g_i − a_i id_{V_i})(x_i) | x_j)
  = (g(x_i) | x_j) − a_i(x_i | x_j)
  = (x_i | g(x_j)) − a_i(x_i | x_j)
  = (x_i | g_j(x_j)) − a_i(x_i | x_j)
  = a_j(x_i | x_j) − a_i(x_i | x_j)
  = (a_j − a_i)(x_i | x_j),


so that in the case where a_i ≠ a_j we have (x_i | x_j) = 0. We thus see that V_1, ..., V_k are pairwise orthogonal. That V_0 is also orthogonal to each V_i for i ≥ 1 follows from the above strings of equalities on taking j = 0 and using the fact that f_0 = a_0 id_{V_0} is self-adjoint, and consequently g_0 = f_0 and h_0 = 0. ◊

We can now establish the main result.

10.13 Theorem
If V is a non-zero finite-dimensional real inner product space and if f : V → V is a normal linear mapping then there is an ordered orthonormal basis of V relative to which the matrix of f is of the block diagonal form
[ A_1              ]
[      A_2         ]
[           ...    ]
[              A_k ]
where each A_i is either a 1 × 1 matrix or a 2 × 2 matrix of the form
[ α  −β ]
[ β   α ]
in which β ≠ 0.

Proof
With the same notation as above, let

m_f = (X − a_0) Π_{i=1}^k (X² − 2a_iX + a_i² + b_i²)
and let the primary components of f be V_i for i = 0, ..., k. Then m_{f_i} = X − a_0 if i = 0 and m_{f_i} = X² − 2a_iX + a_i² + b_i² otherwise. Given any V_i with i ≠ 0 we have f_i = g_i + h_i, where the self-adjoint part g_i has minimum polynomial X − a_i and the skew-adjoint part h_i has minimum polynomial X² + b_i². Now h_i is skew-adjoint and so, by 10.8, there is an ordered orthonormal


basis B_i of V_i with respect to which the matrix of h_i is the block diagonal matrix M(b_i) whose diagonal blocks are each the 2 × 2 matrix
[  0   −b_i ]
[ b_i    0  ]
Since the minimum polynomial of g_i is X − a_i we have g_i(x) = a_i x for every x ∈ B_i, and so the matrix of g_i relative to B_i is the diagonal matrix all of whose diagonal entries are a_i. It now follows that the matrix of f_i = g_i + h_i relative to B_i is the block diagonal matrix M(a_i, b_i) whose diagonal blocks are each
[ a_i  −b_i ]
[ b_i   a_i ]

In the case where i = 0, we have f_0 = a_0 id_{V_0}, so f_0 is self-adjoint. By 8.11, there is an ordered orthonormal basis of V_0 with respect to which the matrix of f_0 is diagonal. Now by 10.12 the primary components V_i are pairwise orthogonal. Pasting together the ordered orthonormal bases in question, we then obtain an ordered orthonormal basis of V relative to which the matrix of f is of the form stated. ◊

10.14 Corollary
A real square matrix is normal if and only if it is orthogonally similar to a matrix of the form described in 10.13. ◊
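A numerical sketch (the block sizes and entries below are my own choices, not from the book) of 10.13/10.14: a block diagonal matrix built from 1 × 1 blocks and 2 × 2 blocks of the stated shape is normal, and normality is preserved under orthogonal similarity:

```python
# The canonical block form of 10.13 is normal, and stays normal under Q^t A Q.
import numpy as np

def rot_block(a, b):
    """A 2x2 block of the canonical form [[a, -b], [b, a]]."""
    return np.array([[a, -b], [b, a]])

A = np.zeros((5, 5))
A[0, 0] = 2.0                        # a 1x1 block
A[1:3, 1:3] = rot_block(1.0, 3.0)    # 2x2 blocks with b != 0
A[3:5, 3:5] = rot_block(-1.0, 0.5)
canonical_normal = bool(np.allclose(A @ A.T, A.T @ A))

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
orthogonal_ok = bool(np.allclose(Q.T @ Q, np.eye(5)))
B = Q.T @ A @ Q
similar_normal = bool(np.allclose(B @ B.T, B.T @ B))
```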

Our labours produce a bonus: an orthogonal linear mapping f is such that f⁻¹ exists and equals f*, and so f is in particular normal. We can therefore deduce from the above a canonical form for orthogonal mappings and matrices.


10.15 Theorem
If V is a non-zero finite-dimensional real inner product space and f : V → V is an orthogonal linear mapping then there is an ordered orthonormal basis of V with respect to which the matrix of f is of the block diagonal form
[ I_m                     ]
[      −I_p               ]
[            P_1          ]
[                 ...     ]
[                    P_k  ]
in which each P_i is a 2 × 2 matrix of the form
[ α  −β ]
[ β   α ]
where β ≠ 0 and α² + β² = 1.

Proof
With the same notation as in 10.13, we have that the matrix M(a_i, b_i), which represents f_i relative to the ordered basis B_i, is an orthogonal matrix (since f_i is orthogonal). Multiplying this matrix by its transpose, we obtain an identity matrix and, equating entries, we see that a_i² + b_i² = 1. As for the primary component V_0, the matrix of f_0 is diagonal. Since the square of this diagonal matrix is an identity matrix, its entries must be ±1. We can now rearrange the basis to see that the matrix of f has the form described. ◊

Example
If f : ℝ³ → ℝ³ is orthogonal then f is called a rotation if det A = 1 for any matrix A that represents f. If f is a rotation then there is an ordered orthonormal basis of ℝ³ with respect to which the matrix of f is

[ 1      0        0     ]
[ 0   cos θ   −sin θ ]
[ 0   sin θ    cos θ ]
for some real number θ.
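As a final numerical aside (the angle is an arbitrary choice of mine), the canonical rotation matrix above is orthogonal with determinant 1:

```python
# The canonical 3x3 rotation matrix: orthogonal, determinant 1.
import numpy as np

theta = 0.7
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(theta), -np.sin(theta)],
              [0.0, np.sin(theta), np.cos(theta)]])

orthogonal = bool(np.allclose(R.T @ R, np.eye(3)))     # R^t R = I
is_rotation = bool(np.isclose(np.linalg.det(R), 1.0))  # det R = 1
```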

Index

algebra, 1 annihilator, 47,67 adjoint, 80,85

elementary divisor, 53 elementary Jordan matrix, 33

eigenvalue, 5 eigenvector, 5

Bessel's inequality, 73 bidual, 63 bilinear form, 99 bitranspose, 66 block diagonal form, 15 Cayley-Hamilton theorem, 2 Cauchy-Schwarz inequality, 71 characteristic polynomial, 2 classical canonical matrix, 56

classical p-matrix, 56 companion matrix, 49 complex inner product space, 69 conjugate isomorphism, 78 conjugate transformation, 78 coordinate form, 60 cyclic basis, 49 cyclic decomposition, 49 cyclic subspace, 49 diagonalizable, 20 direct sum, 8 distance, 71

Fourier coefficients, 77 Gram matrix, 98 Gram-Schmidt process, 75 Hilbert space, 72 idempotent, 10 index, 31 inner product space, 69 invariant subspace, 13 Jordan basis, 39 Jordan block, 33 Jordan canonical matrix, 37 Jordan decomposition, 28

Lagrange polynomial linear form, 58 linear functional, 58

minimum polynomial, 3

dot product, 70 dual space, 58

nilpotent, 22 normalising, 73


ortho-diagonalizable, 88 orthogonal, 72 orthogonal complement, 82 orthogonal direct sum, 87 orthogonally similar, 86 orthonormal, 72 orthonormal basis, 74 ortho-projection, 87 Parseval’s identity, 77 positive, 96 positive definite, 96,106 primary decomposition, 15 projection, 10 quadratic form, 101 quotient space, 44

rational canonical matrix, 53 real inner product space, 69 scalar product, 70 signature, 105 simultaneously diagonalizable, 21 skew-adjoint, 108 spectral resolution, 90 square summable, 72 sum of subspaces, 7 Sylvester’s theorem, 105 symmetric bilinear form, 101 triangular form, 24 triangle inequality, 71 unitarily similar, 86 unitary, 86

Essential Student Algebra T.S. Blyth and E.F. Robertson Abstract algebra is the cornerstone of mathematics. The study of algebra begins with the concepts of sets and mappings (functions), which underlie all of mathematics and are logical tools used throughout science. For students who are starting on a course of study in mathematics, science, engineering or technology, algebra will form a basis for their syllabus. Essential Student Algebra is for them. Essential Student Algebra is a set of five modular texts, covering all the important topics of abstract algebra at first and second year level. Written in a straight-forward, readable style, each volume stands on its own as a concise text on one aspect of algebra. Taken as a set, the five volumes make up a comprehensive library of student algebra. Written by two highly regarded authors of algebra books for students, Essential Student Algebra includes both the theoretical side of algebra and a wealth of illustrative examples. The five volumes comprise a complete modular course up to third year level. Essential Student Algebra will be an invaluable text in colleges, universities and polytechnics as well as in senior classes at high schools everywhere. Titles in this series Volume Volume Volume Volume Volume

1: 2: 3: 4: 5:

Sets and Mappings (0 412 27880 4) Matrices and Vector Spaces (0 412 27870 7) Abstract Algebra (0 412 27860 X) Linear Algebra (0 412 27850 2) Groups (0 412 27840 5)

ISBN 0-412-27850-2

9 780412 278501