
Linear Algebra and Analysis

Kuttler
[email protected]

November 29, 2020

Contents

1 Some Prerequisite Topics
  1.1 Sets and Set Notation
  1.2 The Schroder Bernstein Theorem
  1.3 Equivalence Relations
  1.4 Well Ordering and Induction
  1.5 The Complex Numbers and Fields
  1.6 Polar Form of Complex Numbers
  1.7 Roots of Complex Numbers
  1.8 The Quadratic Formula
  1.9 The Complex Exponential
  1.10 The Fundamental Theorem of Algebra
  1.11 Ordered Fields
  1.12 Integers
    1.12.1 Division of Numbers
  1.13 Polynomials
  1.14 Examples of Finite Fields
    1.14.1 The Field Z_p
  1.15 Some Topics From Analysis
    1.15.1 lim sup and lim inf
  1.16 Exercises

I Linear Algebra For Its Own Sake

2 Systems of Linear Equations
  2.1 Elementary Operations
  2.2 Gauss Elimination
  2.3 Exercises

3 Vector Spaces
  3.1 Linear Combinations of Vectors, Independence
  3.2 Subspaces
  3.3 Exercises
  3.4 Polynomials and Fields
    3.4.1 The Algebraic Numbers and Minimum Polynomial
    3.4.2 The Lindemann Weierstrass Theorem and Vector Spaces
  3.5 Exercises

4 Matrices
  4.1 Properties of Matrix Multiplication
  4.2 Finding the Inverse of a Matrix
  4.3 Linear Relations and Row Operations
  4.4 Block Multiplication of Matrices
  4.5 Elementary Matrices
  4.6 Exercises

5 Linear Transformations
  5.1 L(V, W) as a Vector Space
  5.2 The Matrix of a Linear Transformation
  5.3 Rotations About a Given Vector*
  5.4 Exercises

6 Direct Sums and Block Diagonal Matrices
  6.1 A Theorem of Sylvester, Direct Sums
  6.2 Finding the Minimum Polynomial
  6.3 Eigenvalues and Eigenvectors of Linear Transformations
  6.4 Diagonalizability
  6.5 A Formal Derivative and Diagonalizability
  6.6 Exercises

7 Canonical Forms
  7.1 Cyclic Sets
  7.2 The Rational Canonical Form
  7.3 Nilpotent Transformations and Jordan Canonical Form
  7.4 Exercises
  7.5 Companion Matrices and Uniqueness
  7.6 Exercises

8 Determinants
  8.1 The Function sgn
  8.2 The Definition of the Determinant
  8.3 A Symmetric Definition
  8.4 Basic Properties of the Determinant
  8.5 Expansion Using Cofactors
  8.6 A Formula for the Inverse
    8.6.1 Cramer's Rule
    8.6.2 An Identity of Cauchy
  8.7 Rank of a Matrix
  8.8 Summary of Determinants
  8.9 The Cayley Hamilton Theorem
  8.10 Exercises

9 Modules and Rings*
  9.1 Integral Domains and the Ring of Polynomials
  9.2 Modules and Decomposition Into Cyclic Sub-Modules
  9.3 A Direct Sum Decomposition
  9.4 Quotients
  9.5 Cyclic Decomposition
  9.6 Uniqueness
  9.7 Canonical Forms
  9.8 Exercises

10 Some Items which Resemble Linear Algebra
  10.1 The Symmetric Polynomial Theorem
  10.2 Transcendental Numbers
  10.3 The Fundamental Theorem of Algebra
  10.4 More on Algebraic Field Extensions
    10.4.1 The Galois Group
    10.4.2 Normal Field Extensions
    10.4.3 Normal Subgroups and Quotient Groups
    10.4.4 Separable Polynomials
    10.4.5 Intermediate Fields and Normal Subgroups
    10.4.6 Permutations
    10.4.7 Solvable Groups
    10.4.8 Solvability by Radicals
  10.5 A Few Generalizations
    10.5.1 The Normal Closure of a Field Extension
    10.5.2 Conditions for Separability

II Analysis and Geometry in Linear Algebra

11 Normed Linear Spaces
  11.1 Metric Spaces
    11.1.1 Open and Closed Sets, Sequences, Limit Points, Completeness
    11.1.2 Cauchy Sequences, Completeness
    11.1.3 Closure of a Set
    11.1.4 Continuous Functions
    11.1.5 Separable Metric Spaces
    11.1.6 Compact Sets in Metric Space
    11.1.7 Lipschitz Continuity And Contraction Maps
    11.1.8 Convergence Of Functions
  11.2 Connected Sets
  11.3 Subspaces Spans And Bases
  11.4 Inner Product And Normed Linear Spaces
    11.4.1 The Inner Product In F^n
    11.4.2 General Inner Product Spaces
    11.4.3 Normed Vector Spaces
    11.4.4 The p Norms
    11.4.5 Orthonormal Bases
  11.5 Equivalence Of Norms
  11.6 Norms On L(X, Y)
  11.7 Limits Of A Function
  11.8 Exercises

12 Limits Of Vectors And Matrices
  12.1 Regular Markov Matrices
  12.2 Migration Matrices
  12.3 Absorbing States
  12.4 Positive Matrices
  12.5 Functions Of Matrices
  12.6 Exercises

13 Inner Product Spaces and Least Squares
  13.1 Orthogonal Projections
  13.2 Formula for Distance to a Subspace
  13.3 Riesz Representation Theorem, Adjoint Map
  13.4 Least Squares
  13.5 Fredholm Alternative
  13.6 The Determinant and Volume
  13.7 Finding an Orthogonal Basis
  13.8 Exercises

14 Matrices And The Inner Product
  14.1 Schur's Theorem, Hermitian Matrices
  14.2 Quadratic Forms
  14.3 The Estimation Of Eigenvalues
  14.4 Advanced Theorems
  14.5 Exercises
  14.6 Cauchy's Interlacing Theorem for Eigenvalues
  14.7 The Right Polar Factorization*
  14.8 The Square Root
  14.9 An Application To Statistics
  14.10 Simultaneous Diagonalization
  14.11 Fractional Powers
  14.12 Spectral Theory Of Self Adjoint Operators
  14.13 Positive And Negative Linear Transformations
  14.14 The Singular Value Decomposition
  14.15 Approximation In The Frobenius Norm
  14.16 Least Squares And Singular Value Decomposition
  14.17 The Moore Penrose Inverse
  14.18 The Spectral Norm And The Operator Norm
  14.19 The Positive Part Of A Hermitian Matrix
  14.20 Exercises

15 Analysis Of Linear Transformations
  15.1 The Condition Number
  15.2 The Spectral Radius
  15.3 Series And Sequences Of Linear Operators
  15.4 Iterative Methods For Linear Systems
  15.5 Exercises

16 Numerical Methods, Eigenvalues
  16.1 The Power Method For Eigenvalues
    16.1.1 The Shifted Inverse Power Method
    16.1.2 The Explicit Description Of The Method
  16.2 Automation With Matlab
    16.2.1 Complex Eigenvalues
    16.2.2 Rayleigh Quotients And Estimates for Eigenvalues
  16.3 The QR Algorithm
    16.3.1 Basic Properties And Definition
    16.3.2 The Case Of Real Eigenvalues
    16.3.3 The QR Algorithm In The General Case
    16.3.4 Upper Hessenberg Matrices
  16.4 Exercises

III Analysis Which Involves Linear Algebra

17 Approximation of Functions and the Integral
  17.1 Weierstrass Approximation Theorem
  17.2 Functions of Many Variables
  17.3 A Generalization with Tietze Extension Theorem
  17.4 An Approach to the Integral
  17.5 The Müntz Theorems
  17.6 Exercises

18 The Derivative, a Linear Transformation
  18.1 Basic Definitions
  18.2 The Chain Rule
  18.3 The Matrix Of The Derivative
  18.4 A Mean Value Inequality
  18.5 Existence Of The Derivative, C^1 Functions
  18.6 Higher Order Derivatives
  18.7 Some Standard Notation
  18.8 The Derivative And The Cartesian Product
  18.9 Mixed Partial Derivatives
  18.10 Newton's Method
  18.11 Exercises

19 Implicit Function Theorem
  19.1 Statement And Proof Of The Theorem
  19.2 More Derivatives
  19.3 The Case Of R^n
  19.4 Exercises
  19.5 The Method Of Lagrange Multipliers
  19.6 The Taylor Formula
  19.7 Second Derivative Test
  19.8 The Rank Theorem
  19.9 The Local Structure Of C^1 Mappings
  19.10 Brouwer Fixed Point Theorem R^n
    19.10.1 Simplices and Triangulations
    19.10.2 Labeling Vertices
  19.11 The Brouwer Fixed Point Theorem
  19.12 Invariance Of Domain*
  19.13 Tensor Products
    19.13.1 The Norm In Tensor Product Space
    19.13.2 The Taylor Formula And Tensors
  19.14 Exercises

20 Abstract Measures And Measurable Functions
  20.1 Simple Functions And Measurable Functions
  20.2 Measures And Their Properties
  20.3 Dynkin's Lemma
  20.4 Measures And Regularity
  20.5 When Is A Measure A Borel Measure?
  20.6 Measures And Outer Measures
  20.7 Exercises
  20.8 An Outer Measure On P(R)
  20.9 Measures From Outer Measures
  20.10 One Dimensional Lebesgue Stieltjes Measure
  20.11 Exercises

21 The Abstract Lebesgue Integral
  21.1 Definition For Nonnegative Measurable Functions
    21.1.1 Riemann Integrals For Decreasing Functions
    21.1.2 The Lebesgue Integral For Nonnegative Functions
  21.2 The Lebesgue Integral For Nonnegative Simple Functions
  21.3 The Monotone Convergence Theorem
  21.4 Other Definitions
  21.5 Fatou's Lemma
  21.6 The Integral's Righteous Algebraic Desires
  21.7 The Lebesgue Integral, L^1
  21.8 The Dominated Convergence Theorem
  21.9 Exercises

22 Measures From Positive Linear Functionals
  22.1 Lebesgue Measure On R^n, Fubini's Theorem
  22.2 The Besicovitch Covering Theorem
  22.3 Change Of Variables, Linear Map
  22.4 Vitali Coverings
  22.5 Change Of Variables
  22.6 Exercises

23 The L^p Spaces
  23.1 Basic Inequalities And Properties
  23.2 Density Considerations
  23.3 Separability
  23.4 Continuity Of Translation
  23.5 Mollifiers And Density Of Smooth Functions
  23.6 Fundamental Theorem Of Calculus For Radon Measures
  23.7 Exercises

24 Representation Theorems
  24.1 Basic Theory
  24.2 Radon Nikodym Theorem
  24.3 Improved Change Of Variables Formula
  24.4 Vector Measures
  24.5 Representation Theorems For The Dual Space Of L^p
  24.6 The Dual Space Of C_0(X)
  24.7 Exercises

IV Appendix

A The Cross Product
  A.1 The Box Product
  A.2 The Distributive Law For Cross Product

B The Hausdorff Maximal Theorem
  B.1 The Hamel Basis
  B.2 Exercises

Copyright © 2018. You are welcome to use this, including copying it for use in classes or referring to it on line, but not to publish it for money.


Preface

This book is about linear algebra and its interaction with analysis. It emphasizes the main ideas, both algebraic and geometric, and attempts to present these ideas as quickly as possible without being overly terse. The emphasis will be on arbitrary fields in the first part; later, geometric ideas will be included in the context of the usual fields R and C.

The first part is on linear algebra as a part of modern algebra. It avoids cluttering the presentation with geometric and analytic ideas which are really not essential to understanding these theorems. The second part is on the role of analysis in linear algebra. It is like baby functional analysis. Some analysis ideas do in fact creep into the first part, but they are generally fairly rudimentary, occur as examples, and will have been seen in calculus. It may be that increased understanding is obtained by this kind of presentation in which that which is purely algebraic is presented first. This also involves emphasizing the minimum polynomial more than the characteristic polynomial and postponing the determinant. In each part, I have included a few related topics which are similar to ideas found in linear algebra or which have linear algebra as a fundamental part. The third part of the book involves significant ideas from analysis which depend on linear algebra.

The book is a rewritten version of an earlier book. It also includes several topics not in this other book, including a chapter which is an introduction to modules and rings and much more material on analysis. However, I am not including many topics from functional analysis. Instead, I am limiting the topics to the standard analysis involving derivatives and integrals. In fact, if everything which uses linear algebra were presented, the book would be much longer. The book is limited to topics that I especially like and emphasizes finite dimensional situations.


Chapter 1

Some Prerequisite Topics

The reader should be familiar with most of the topics in this chapter. However, it is often the case that set notation is not familiar and so a short discussion of this is included first. Complex numbers are then considered in somewhat more detail. Many of the applications of linear algebra require the use of complex numbers, so this is the reason for this introduction. Then polynomials and finite fields are discussed briefly to emphasize that linear algebra works for any field of scalars, not just the field of real and complex numbers.

1.1 Sets and Set Notation

A set is just a collection of things called elements. Often these are also referred to as points in calculus. For example {1, 2, 3, 8} would be a set consisting of the elements 1, 2, 3, and 8. To indicate that 3 is an element of {1, 2, 3, 8}, it is customary to write 3 ∈ {1, 2, 3, 8}. 9 ∉ {1, 2, 3, 8} means 9 is not an element of {1, 2, 3, 8}. Sometimes a rule specifies a set. For example you could specify a set as all integers larger than 2. This would be written as S = {x ∈ Z : x > 2}. This notation says: the set of all integers x such that x > 2.

If A and B are sets with the property that every element of A is an element of B, then A is a subset of B. For example, {1, 2, 3, 8} is a subset of {1, 2, 3, 4, 5, 8}, in symbols, {1, 2, 3, 8} ⊆ {1, 2, 3, 4, 5, 8}. It is sometimes said that "A is contained in B" or even "B contains A". The same statement about the two sets may also be written as {1, 2, 3, 4, 5, 8} ⊇ {1, 2, 3, 8}.

The union of two sets is the set consisting of everything which is an element of at least one of the sets A or B. As an example of the union of two sets, {1, 2, 3, 8} ∪ {3, 4, 7, 8} = {1, 2, 3, 4, 7, 8} because these numbers are those which are in at least one of the two sets. In general, A ∪ B ≡ {x : x ∈ A or x ∈ B}. Be sure you understand that something which is in both A and B is in the union. It is not an exclusive or.

The intersection of two sets A and B consists of everything which is in both of the sets. Thus {1, 2, 3, 8} ∩ {3, 4, 7, 8} = {3, 8} because 3 and 8 are those elements the two sets have in common. In general, A ∩ B ≡ {x : x ∈ A and x ∈ B}.

The symbol [a, b], where a and b are real numbers, denotes the set of real numbers x such that a ≤ x ≤ b, and [a, b) denotes the set of real numbers such that a ≤ x < b. (a, b) consists of the set of real numbers x such that a < x < b and (a, b] indicates the set of numbers x such that a < x ≤ b. [a, ∞) means the set of all numbers x such that x ≥ a and (−∞, a] means the set of all real numbers which are less than or equal to a. These sorts of sets of real numbers are called intervals. The two points a and b are called endpoints of the interval. Other intervals such as (−∞, b) are defined by analogy to what was just explained. In general, the curved parenthesis indicates the endpoint it sits next to is not included while the square parenthesis indicates this endpoint is included. The reason that there will always be a curved parenthesis next to ∞ or −∞ is that these are not real numbers. Therefore, they cannot be included in any set of real numbers.

A special set which needs to be given a name is the empty set, also called the null set, denoted by ∅. Thus ∅ is defined as the set which has no elements in it. Mathematicians like to say the empty set is a subset of every set. The reason they say this is that if it were not so, there would have to exist a set A such that ∅ has something in it which is not in A. However, ∅ has nothing in it and so the least intellectual discomfort is achieved by saying ∅ ⊆ A.

If A and B are two sets, A \ B denotes the set of things which are in A but not in B. Thus A \ B ≡ {x ∈ A : x ∉ B}. Set notation is used whenever convenient.

To illustrate the use of this notation relative to intervals, consider three examples of inequalities. Their solutions will be written in the notation just described.

Example 1.1.1 Solve the inequality 2x + 4 ≤ x − 8.

x ≤ −12 is the answer. This is written in terms of an interval as (−∞, −12].

Example 1.1.2 Solve the inequality (x + 1)(2x − 3) ≥ 0.

The solution is x ≤ −1 or x ≥ 3/2. In terms of set notation this is denoted by (−∞, −1] ∪ [3/2, ∞).

Example 1.1.3 Solve the inequality x(x + 2) ≥ −4.

This is true for any value of x. It is written as R or (−∞, ∞).

Something is in the Cartesian product of a set whose elements are sets if it consists of a single thing taken from each set in the family. Thus (1, 2, 3) ∈ {1, 4, .2} × {1, 2, 7} × {4, 3, 7, 9} because it consists of exactly one element from each of the sets which are separated by ×. Also, this is the notation for the Cartesian product of finitely many sets. If S is a set whose elements are sets, ∏_{A∈S} A signifies the Cartesian product. The Cartesian product is the set of choice functions, a choice function being a function which selects exactly one element of each set of S. You may think the axiom of choice, stating that the Cartesian product of a nonempty family of nonempty sets is nonempty, is innocuous, but there was a time when many mathematicians were ready to throw it out because it implies things which are very hard to believe, things which never happen without the axiom of choice.
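The set operations above can be experimented with directly. Here is a minimal Python sketch (the particular sets are just the ones from the examples above) illustrating union, intersection, difference, a set-builder style construction, and a small Cartesian product.

```python
from itertools import product

A = {1, 2, 3, 8}
B = {3, 4, 7, 8}

print(A | B)   # union: {1, 2, 3, 4, 7, 8}
print(A & B)   # intersection: {3, 8}
print(A - B)   # A \ B: {1, 2}

# set-builder notation {x in Z : x > 2}, restricted to a finite range for illustration
S = {x for x in range(-10, 11) if x > 2}
print(S)       # {3, 4, 5, ..., 10} (printed order may vary)

# Cartesian product {1, 4, .2} x {1, 2, 7} x {4, 3, 7, 9}
for triple in product({1, 4, .2}, {1, 2, 7}, {4, 3, 7, 9}):
    if triple == (1, 2, 3):
        print("(1, 2, 3) is in the Cartesian product")
```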

1.2 The Schroder Bernstein Theorem

It is very important to be able to compare the size of sets in a rational way. The most useful theorem in this context is the Schroder Bernstein theorem, which is the main result to be presented in this section. The Cartesian product is discussed above. The next definition reviews this and defines the concept of a function.

Definition 1.2.1 Let X and Y be sets and let
$$X \times Y \equiv \{(x, y) : x \in X \text{ and } y \in Y\}.$$
A relation is defined to be a subset of X × Y. A function f, also called a mapping, is a relation which has the property that if (x, y) and (x, y₁) are both elements of f, then y = y₁. The domain of f is defined as D(f) ≡ {x : (x, y) ∈ f}, written as f : D(f) → Y. Another notation which is used is the following:
$$f^{-1}(y) \equiv \{x \in D(f) : f(x) = y\}.$$
This is called the inverse image.


It is probably safe to say that most people do not think of functions as a type of relation which is a subset of the Cartesian product of two sets. A function is like a machine which takes inputs x and makes them into a unique output f(x). Of course, that is what the above definition says with more precision. An ordered pair (x, y) which is an element of the function or mapping has an input x and a unique output y, denoted as f(x), while the name of the function is f. "Mapping" is often a noun meaning function. However, it is also a verb as in "f is mapping A to B". That which a function is thought of as doing is also referred to using the word "maps" as in: f maps X to Y. However, a set of functions may be called a set of maps, so this word might also be used as the plural of a noun. There is no help for it. You just have to suffer with this nonsense.

The following theorem, which is interesting for its own sake, will be used to prove the Schroder Bernstein theorem. It was proved by Dedekind in 1887. The proof given here is like the version in Hewitt and Stromberg [20].

Theorem 1.2.2 Let f : X → Y and g : Y → X be two functions. Then there exist sets A, B, C, D such that
$$A \cup B = X,\quad C \cup D = Y,\quad A \cap B = \emptyset,\quad C \cap D = \emptyset,\quad f(A) = C,\quad g(D) = B.$$

The conclusion is illustrated by the following picture: X is split into the disjoint pieces A and B = g(D), Y is split into the disjoint pieces C = f(A) and D, with f carrying A onto C and g carrying D onto B.

Proof: Consider the empty set, ∅ ⊆ X. If y ∈ Y \ f(∅), then g(y) ∉ ∅ because ∅ has no elements. Also, if A, B, C, and D are as described above, A also would have this same property that the empty set has. However, A is probably larger. Therefore, say A₀ ⊆ X satisfies P if whenever y ∈ Y \ f(A₀), g(y) ∉ A₀. Let 𝒜 ≡ {A₀ ⊆ X : A₀ satisfies P} and let A = ∪𝒜. If y ∈ Y \ f(A), then for each A₀ ∈ 𝒜, y ∈ Y \ f(A₀) and so g(y) ∉ A₀. Since g(y) ∉ A₀ for all A₀ ∈ 𝒜, it follows g(y) ∉ A. Hence A satisfies P and is the largest subset of X which does so. Now define C ≡ f(A), D ≡ Y \ C, B ≡ X \ A. It only remains to verify that g(D) = B. It was just shown that g(D) ⊆ B (if y ∈ D = Y \ f(A), then since A satisfies P, g(y) ∉ A, so g(y) ∈ B). Suppose x ∈ B = X \ A. Then A ∪ {x} does not satisfy P and so there exists y ∈ Y \ f(A ∪ {x}) ⊆ D such that g(y) ∈ A ∪ {x}. But y ∉ f(A) and so, since A satisfies P, it follows g(y) ∉ A. Hence g(y) = x and so x ∈ g(D). Hence g(D) = B. ∎

Theorem 1.2.3 (Schroder Bernstein) If f : X → Y and g : Y → X are one to one, then there exists h : X → Y which is one to one and onto.

Proof: Let A, B, C, D be the sets of Theorem 1.2.2 and define
$$h(x) \equiv \begin{cases} f(x) & \text{if } x \in A \\ g^{-1}(x) & \text{if } x \in B \end{cases}$$
Then h is the desired one to one and onto mapping. ∎

Recall that the Cartesian product may be considered as the collection of choice functions.


Definition 1.2.4 Let I be a set and let Xᵢ be a set for each i ∈ I. f is a choice function, written as f ∈ ∏_{i∈I} Xᵢ, if f(i) ∈ Xᵢ for each i ∈ I.

The axiom of choice says that if Xᵢ ≠ ∅ for each i ∈ I, for I a set, then ∏_{i∈I} Xᵢ ≠ ∅.

Sometimes the two functions f and g are onto but not one to one. It turns out that with the axiom of choice, a similar conclusion to the above may be obtained.

Corollary 1.2.5 If f : X → Y is onto and g : Y → X is onto, then there exists h : X → Y which is one to one and onto.

Proof: For each y ∈ Y, f⁻¹(y) ≡ {x ∈ X : f(x) = y} ≠ ∅. Therefore, by the axiom of choice, there exists f₀⁻¹ ∈ ∏_{y∈Y} f⁻¹(y), which is the same as saying that for each y ∈ Y, f₀⁻¹(y) ∈ f⁻¹(y). Similarly, there exists g₀⁻¹(x) ∈ g⁻¹(x) for all x ∈ X. Then f₀⁻¹ is one to one because if f₀⁻¹(y₁) = f₀⁻¹(y₂), then y₁ = f(f₀⁻¹(y₁)) = f(f₀⁻¹(y₂)) = y₂. Similarly g₀⁻¹ is one to one. Therefore, by the Schroder Bernstein theorem, there exists h : X → Y which is one to one and onto. ∎

Definition 1.2.6 A set S is finite if there exists a natural number n and a map θ which maps {1, · · · , n} one to one and onto S. S is infinite if it is not finite. A set S is called countable if there exists a map θ mapping N one to one and onto S. (When θ maps a set A to a set B, this will be written as θ : A → B in the future.) Here N ≡ {1, 2, · · · }, the natural numbers. S is at most countable if there exists a map θ : N → S which is onto.

The property of being at most countable is often referred to as being countable because the question of interest is normally whether one can list all elements of the set, designating a first, second, third etc. in such a way as to give each element of the set a natural number. The possibility that a single element of the set may be counted more than once is often not important.

Theorem 1.2.7 If X and Y are both at most countable, then X × Y is also at most countable. If either X or Y is countable, then X × Y is also countable.

Proof: It is given that there exists a mapping η : N → X which is onto. Define η(i) ≡ xᵢ and consider X as the set {x₁, x₂, x₃, · · · }. Similarly, consider Y as the set {y₁, y₂, y₃, · · · }. It follows the elements of X × Y are included in the following rectangular array.

(x₁, y₁) (x₁, y₂) (x₁, y₃) · · ·   ← those which have x₁ in the first slot
(x₂, y₁) (x₂, y₂) (x₂, y₃) · · ·   ← those which have x₂ in the first slot
(x₃, y₁) (x₃, y₂) (x₃, y₃) · · ·   ← those which have x₃ in the first slot
   ⋮        ⋮        ⋮

Follow a path through this array which sweeps out the successive anti-diagonals: (x₁, y₁), then (x₁, y₂), (x₂, y₁), then (x₃, y₁), (x₂, y₂), (x₁, y₃), and so on. Thus the first element of X × Y is (x₁, y₁), the second element of X × Y is (x₁, y₂), the third element of X × Y is (x₂, y₁) etc. This assigns a number from N to each element of X × Y. Thus X × Y is at most countable.

It remains to show the last claim. Suppose without loss of generality that X is countable. Then there exists α : N → X which is one to one and onto. Let β : X × Y → N be defined by β((x, y)) ≡ α⁻¹(x). Thus β is onto N. By the first part there exists a function from N onto X × Y. Therefore, by Corollary 1.2.5, there exists a one to one and onto mapping from X × Y to N. ∎

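The diagonal path used in the proof of Theorem 1.2.7 can be made into an explicit enumeration. Here is a minimal Python sketch (the function name `diagonal_enumeration` and the truncation to finitely many terms are just for illustration) which lists the index pairs (i, j) in exactly the order described above.

```python
def diagonal_enumeration(n_terms):
    """Enumerate index pairs (i, j) along successive anti-diagonals,
    mirroring the path through the rectangular array in Theorem 1.2.7."""
    pairs = []
    d = 2  # i + j is constant along each anti-diagonal
    while len(pairs) < n_terms:
        # alternate the sweep direction so the path matches the picture:
        # (1,1), (1,2), (2,1), (3,1), (2,2), (1,3), ...
        indices = range(1, d)
        for i in (indices if d % 2 == 1 else reversed(indices)):
            pairs.append((i, d - i))
            if len(pairs) == n_terms:
                break
        d += 1
    return pairs

print(diagonal_enumeration(6))  # [(1, 1), (1, 2), (2, 1), (3, 1), (2, 2), (1, 3)]
```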

Theorem 1.2.8 If X and Y are at most countable, then X ∪ Y is at most countable. If either X or Y is countable, then X ∪ Y is countable.

Proof: As in the preceding theorem, X = {x₁, x₂, x₃, · · · } and Y = {y₁, y₂, y₃, · · · }. Consider the following array consisting of X ∪ Y and a path through it.

x₁ → x₂   x₃ → · · ·
   ↙    ↗
y₁ → y₂

Thus the first element of X ∪ Y is x₁, the second is x₂, the third is y₁, the fourth is y₂ etc. This assigns a natural number to each element of X ∪ Y, so X ∪ Y is at most countable.

Consider the second claim. By the first part, there is a map from N onto X ∪ Y. Suppose without loss of generality that X is countable and α : N → X is one to one and onto. Then define β(y) ≡ 1 for all y ∈ Y, and β(x) ≡ α⁻¹(x). Thus β maps X ∪ Y onto N and this shows there exist two onto maps, one mapping X ∪ Y onto N and the other mapping N onto X ∪ Y. Then Corollary 1.2.5 yields the conclusion. ∎

Note that by induction this shows that if you have any finite set whose elements are countable sets, then the union of these is countable.

1.3 Equivalence Relations

There are many ways to compare elements of a set other than to say two elements are equal or the same. For example, in the set of people let two people be equivalent if they have the same weight. This would not be saying they were the same person, just that they weigh the same. Often such relations involve considering one characteristic of the elements of a set and then saying the two elements are equivalent if they are the same as far as the given characteristic is concerned.

Definition 1.3.1 Let S be a set. ∼ is an equivalence relation on S if it satisfies the following axioms.

1. x ∼ x for all x ∈ S. (Reflexive)
2. If x ∼ y then y ∼ x. (Symmetric)
3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive)

Definition 1.3.2 [x] denotes the set of all elements of S which are equivalent to x and [x] is called the equivalence class determined by x or just the equivalence class of x.

With the above definition one can prove the following simple theorem.

Theorem 1.3.3 Let ∼ be an equivalence relation defined on a set S and let H denote the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅.

Proof: If x ∼ y, then if z ∈ [y], you have x ∼ y and y ∼ z so x ∼ z which shows that [y] ⊆ [x]. Similarly, [x] ⊆ [y]. If it is not the case that x ∼ y, then there can be no intersection of [x] and [y] because if z were in this intersection, then x ∼ z, z ∼ y so x ∼ y. ∎
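As a concrete illustration of Theorem 1.3.3, the following Python sketch (the relation chosen here, congruence modulo 3 on a small finite set, is just an example, and the helper name is arbitrary) groups a set into equivalence classes and checks that the classes partition the set.

```python
def equivalence_classes(S, related):
    """Partition the finite set S into equivalence classes of the relation `related`."""
    classes = []
    for x in S:
        for cls in classes:
            if related(x, next(iter(cls))):  # x is equivalent to a representative
                cls.add(x)
                break
        else:
            classes.append({x})              # x starts a new class
    return classes

S = {0, 1, 2, 3, 4, 5, 6, 7, 8}
classes = equivalence_classes(S, lambda a, b: a % 3 == b % 3)
print(classes)                               # e.g. [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}]
# the classes are pairwise disjoint and their union is S, as Theorem 1.3.3 asserts
assert set().union(*classes) == S
```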

1.4 Well Ordering and Induction

Mathematical induction and well ordering are two extremely important principles in math. They are often used to prove significant things which would be hard to prove otherwise.

Definition 1.4.1 A set is well ordered if every nonempty subset S contains a smallest element z having the property that z ≤ x for all x ∈ S.


Axiom 1.4.2 Any set of integers larger than a given number is well ordered. In particular, the natural numbers defined as N ≡ {1, 2, · · · } is well ordered.

The above axiom implies the principle of mathematical induction. The symbol Z denotes the set of all integers. Note that if a is an integer, then there are no integers between a and a + 1.

Theorem 1.4.3 (Mathematical induction) A set S ⊆ Z, having the property that a ∈ S and n + 1 ∈ S whenever n ∈ S, contains all integers x ∈ Z such that x ≥ a.

Proof: Let T consist of all integers larger than or equal to a which are not in S. The theorem will be proved if T = ∅. If T ≠ ∅ then by the well ordering principle, there would have to exist a smallest element of T, denoted as b. It must be the case that b > a since by definition, a ∉ T. Thus b ≥ a + 1, and so b − 1 ≥ a and b − 1 ∉ S because if b − 1 ∈ S, then b − 1 + 1 = b ∈ S by the assumed property of S. Therefore, b − 1 ∈ T which contradicts the choice of b as the smallest element of T. (b − 1 is smaller.) Since a contradiction is obtained by assuming T ≠ ∅, it must be the case that T = ∅ and this says that every integer at least as large as a is also in S. ∎

Mathematical induction is a very useful device for proving theorems about the integers.

Example 1.4.4 Prove by induction that
$$\sum_{k=1}^{n} k^2 = \frac{n(n+1)(2n+1)}{6}.$$

By inspection, if n = 1 then the formula is true. The sum yields 1 and so does the formula on the right. Suppose this formula is valid for some n ≥ 1 where n is an integer. Then
$$\sum_{k=1}^{n+1} k^2 = \sum_{k=1}^{n} k^2 + (n+1)^2 = \frac{n(n+1)(2n+1)}{6} + (n+1)^2.$$
The second equality is based on the assumption that the formula is true for n. This is called the induction hypothesis. Now simplify the right hand side. It equals
$$(n+1)\left(\frac{n(2n+1)}{6} + (n+1)\right)$$
and
$$\frac{n(2n+1)}{6} + (n+1) = \frac{2n^2 + n + 6(n+1)}{6} = \frac{(n+2)(2n+3)}{6}.$$
Therefore,
$$\sum_{k=1}^{n+1} k^2 = \frac{(n+1)(n+2)(2n+3)}{6} = \frac{(n+1)((n+1)+1)(2(n+1)+1)}{6},$$
showing the formula holds for n + 1 whenever it holds for n. This proves the formula by mathematical induction.

Example 1.4.5 Show that for all n ∈ N,
$$\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n-1}{2n} < \frac{1}{\sqrt{2n+1}}.$$

If n = 1 this reduces to the statement that 1/2 < 1/√3, which is obviously true. Suppose then that the inequality holds for n. Multiplying by the next factor and using the induction hypothesis,
$$\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n-1}{2n} \cdot \frac{2n+1}{2n+2} < \frac{1}{\sqrt{2n+1}} \cdot \frac{2n+1}{2n+2} = \frac{\sqrt{2n+1}}{2n+2}.$$
The inequality will be proved for n + 1 if this last expression is less than 1/√(2n+3). This happens if and only if
$$\frac{1}{2n+3} > \frac{2n+1}{(2n+2)^2},$$
which occurs if and only if (2n+2)² > (2n+3)(2n+1) and this is clearly true which may be seen from expanding both sides. This proves the inequality.

Let's review the process just used. If S is the set of integers at least as large as 1 for which the formula holds, the first step was to show 1 ∈ S and then that whenever n ∈ S, it follows n + 1 ∈ S. Therefore, by the principle of mathematical induction, S contains [1, ∞) ∩ Z, all positive integers. In doing an inductive proof of this sort, the set S is normally not mentioned. One just verifies the steps above. First show the thing is true for some a ∈ Z and then verify that whenever it is true for m it follows it is also true for m + 1. When this has been done, the theorem has been proved for all m ≥ a.
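Induction proves these identities for every n, but it is easy to spot-check them numerically. Here is a small Python sketch (the range of n tested is arbitrary) verifying both Example 1.4.4 and Example 1.4.5; exact rational arithmetic is used so that the strict inequality is not blurred by rounding.

```python
from fractions import Fraction

for n in range(1, 50):
    # Example 1.4.4: sum of the first n squares
    assert sum(k * k for k in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6

    # Example 1.4.5: (1/2)(3/4)...((2n-1)/(2n)) < 1/sqrt(2n+1)
    prod = Fraction(1)
    for k in range(1, n + 1):
        prod *= Fraction(2 * k - 1, 2 * k)
    # compare squares to avoid irrational square roots: prod^2 < 1/(2n+1)
    assert prod ** 2 < Fraction(1, 2 * n + 1)

print("both formulas check out for n = 1, ..., 49")
```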

1.5 The Complex Numbers and Fields

Recall that a real number is a point on the real number line. Just as a real number should be considered as a point on the line, a complex number is considered a point in the plane which can be identified in the usual way using the Cartesian coordinates of the point. Thus (a, b) identifies a point whose x coordinate is a and whose y coordinate is b. In dealing with complex numbers, such a point is written as a + ib. For example, the point 3 + 2i corresponds to the point in the plane whose coordinates are (3, 2).

Multiplication and addition are defined in the most obvious way subject to the convention that i² = −1. Thus,
$$(a + ib) + (c + id) = (a + c) + i(b + d)$$
and
$$(a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(bc + ad).$$
Every non zero complex number a + ib, with a² + b² ≠ 0, has a unique multiplicative inverse.
$$\frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} = \frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2}.$$
You should prove the following theorem, assuming R is a field.

Theorem 1.5.1 The complex numbers with multiplication and addition defined as above form a field satisfying all the field axioms. These are the following list of properties. In this list, F is the symbol for a field.

1. x + y = y + x, (commutative law for addition)
2. There exists 0 such that x + 0 = x for all x, (additive identity).
3. For each x ∈ F, there exists −x ∈ F such that x + (−x) = 0, (existence of additive inverse).
4. (x + y) + z = x + (y + z), (associative law for addition).
5. xy = yx, (commutative law for multiplication). You could write this as x × y = y × x.
6. (xy)z = x(yz), (associative law for multiplication).
7. There exists 1 such that 1x = x for all x, (multiplicative identity).
8. For each x ≠ 0, there exists x⁻¹ such that xx⁻¹ = 1, (existence of multiplicative inverse).
9. x(y + z) = xy + xz, (distributive law).

The symbol x − y means x + (−y). We call this subtraction of y from x. The symbol x/y for y ≠ 0 means x(y⁻¹). This is called division. When you have a field F some things follow right away from the above axioms.

Theorem 1.5.2 Let F be a field. This means it satisfies the axioms of the above theorem. Then the following hold.

1. 0 is unique.
2. −x is unique.
3. 0x = 0.
4. (−1)x = −x.
5. x⁻¹ is unique.

Proof: Consider the first claim. Suppose 0̂ is another additive identity. Then 0̂ = 0̂ + 0 = 0 and so sure enough, there is only one such additive identity.

Consider uniqueness of −x next. Suppose y is also an additive inverse. Then −x = −x + 0 = −x + (x + y) = (−x + x) + y = 0 + y = y so the additive inverse is unique also.

Next, 0x = (0 + 0)x = 0x + 0x. Now add −0x to both sides to conclude that 0 = 0x.

Next, 0 = (1 + −1)x = x + (−1)x and by uniqueness of −x, this implies (−1)x = −x as claimed.

Finally, if x ≠ 0 and y is a multiplicative inverse, x⁻¹ = 1x⁻¹ = (yx)x⁻¹ = y(xx⁻¹) = y1 = y so y = x⁻¹. ∎

Something which satisfies these axioms is called a field. Linear algebra is all about fields, although in this book, the field of most interest will be the field of complex numbers or the field of real numbers. You have seen in earlier courses that the real numbers also satisfy the above axioms. For a proof of this well accepted fact and construction of the real numbers, see Hobson [21] or my single variable advanced calculus book. Other books which do this are Hewitt and Stromberg [20] or Rudin [32]. There are two ways to show this, one due to Cantor and the other by Dedekind. Both are in Hobson; my book follows Cantor and so does the one by Hewitt and Stromberg. Rudin presents the other method.

An important construction regarding complex numbers is the complex conjugate, denoted by a horizontal line above the number. It is defined as follows.
$$\overline{a + ib} \equiv a - ib.$$
What it does is reflect a given complex number across the x axis. Algebraically, the following formula is easy to obtain.
$$\left(\overline{a + ib}\right)(a + ib) = (a - ib)(a + ib) = a^2 + b^2 - i(ab - ab) = a^2 + b^2.$$


Definition 1.5.3 Define the absolute value of a complex number as follows.
$$|a + ib| \equiv \sqrt{a^2 + b^2}.$$
Thus, denoting by z the complex number z = a + ib, $|z| = (z\overline{z})^{1/2}$.

Also from the definition, if z = x + iy and w = u + iv are two complex numbers, then |zw| = |z||w|. You should verify this.

Notation 1.5.4 Recall the following notation.
$$\sum_{j=1}^{n} a_j \equiv a_1 + \cdots + a_n$$
There is also a notation which is used to denote a product.
$$\prod_{j=1}^{n} a_j \equiv a_1 a_2 \cdots a_n$$

The triangle inequality holds for the absolute value for complex numbers just as it does for the ordinary absolute value.

Proposition 1.5.5 Let z, w be complex numbers. Then the triangle inequality holds.
$$|z + w| \leq |z| + |w|, \qquad ||z| - |w|| \leq |z - w|.$$

Proof: Let z = x + iy and w = u + iv. First note that
$$z\overline{w} = (x + iy)(u - iv) = xu + yv + i(yu - xv)$$
and so $|xu + yv| \leq |z\overline{w}| = |z||w|$. Then
$$|z + w|^2 = (x + u + i(y + v))(x + u - i(y + v)) = (x + u)^2 + (y + v)^2 = x^2 + u^2 + 2xu + 2yv + y^2 + v^2$$
$$\leq |z|^2 + |w|^2 + 2|z||w| = (|z| + |w|)^2,$$
so this shows the first version of the triangle inequality. To get the second, z = z − w + w, w = w − z + z and so by the first form of the inequality
$$|z| \leq |z - w| + |w|, \qquad |w| \leq |z - w| + |z|$$
and so both |z| − |w| and |w| − |z| are no larger than |z − w| and this proves the second version because ||z| − |w|| is one of |z| − |w| or |w| − |z|. ∎

With this definition, it is important to note the following. Be sure to verify this. It is not too hard but you need to do it.

Remark 1.5.6 Let z = a + ib and w = c + id. Then $|z - w| = \sqrt{(a-c)^2 + (b-d)^2}$. Thus the distance between the point in the plane determined by the ordered pair (a, b) and the ordered pair (c, d) equals |z − w| where z and w are as just described.

For example, consider the distance between (2, 5) and (1, 8). From the distance formula this distance equals $\sqrt{(2-1)^2 + (5-8)^2} = \sqrt{10}$. On the other hand, letting z = 2 + i5 and w = 1 + i8, z − w = 1 − i3 and so $(z - w)\overline{(z - w)} = (1 - i3)(1 + i3) = 10$, so $|z - w| = \sqrt{10}$, the same thing obtained with the distance formula.
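Python's built-in complex type implements exactly this arithmetic, so the statements above about conjugates, absolute values, and distances can be checked directly. A minimal sketch (the particular numbers are the ones from Remark 1.5.6):

```python
import math

z = 2 + 5j
w = 1 + 8j

# conjugate and the identity  z * conj(z) = a^2 + b^2
print(z.conjugate())                    # (2-5j)
print((z * z.conjugate()).real)         # 29.0 = 2^2 + 5^2

# |zw| = |z||w| and the triangle inequality |z + w| <= |z| + |w|
assert math.isclose(abs(z * w), abs(z) * abs(w))
assert abs(z + w) <= abs(z) + abs(w)

# |z - w| agrees with the distance between (2, 5) and (1, 8)
assert math.isclose(abs(z - w), math.dist((2, 5), (1, 8)))  # both equal sqrt(10)
```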


1.6 Polar Form of Complex Numbers

Complex numbers are often written in the so called polar form which is described next. Suppose z = x + iy is a complex number. Then
$$x + iy = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\,\frac{y}{\sqrt{x^2 + y^2}}\right).$$
Now note that
$$\left(\frac{x}{\sqrt{x^2 + y^2}}\right)^2 + \left(\frac{y}{\sqrt{x^2 + y^2}}\right)^2 = 1$$
and so
$$\left(\frac{x}{\sqrt{x^2 + y^2}},\ \frac{y}{\sqrt{x^2 + y^2}}\right)$$
is a point on the unit circle. Therefore, there exists a unique angle θ ∈ [0, 2π) such that
$$\cos\theta = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\theta = \frac{y}{\sqrt{x^2 + y^2}}.$$
The polar form of the complex number is then r(cos θ + i sin θ) where θ is this angle just described and $r = \sqrt{x^2 + y^2} \equiv |z|$. In the plane, r is the length of the arrow from 0 to the point x + iy = r(cos θ + i sin θ) and θ is the angle this arrow makes with the positive x axis.
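The conversion between rectangular and polar form is available in the standard library. Here is a minimal sketch using `cmath` (the sample point 3 + 2i is arbitrary). Note that `cmath.phase` returns an angle in (−π, π] rather than [0, 2π), so a small adjustment is made to match the convention above.

```python
import cmath
import math

z = 3 + 2j
r, theta = cmath.polar(z)          # r = |z|, theta = phase in (-pi, pi]
if theta < 0:
    theta += 2 * math.pi           # shift into [0, 2*pi) to match the text's convention

print(r, theta)                    # about 3.6056 and 0.5880
# reconstruct z from its polar form r(cos(theta) + i sin(theta))
assert cmath.isclose(cmath.rect(r, theta), z)
```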

1.7 Roots of Complex Numbers

A fundamental identity is the formula of De Moivre which follows.

Theorem 1.7.1 Let r > 0 be given. Then if n is a positive integer,
$$[r(\cos t + i \sin t)]^n = r^n(\cos nt + i \sin nt).$$

Proof: It is clear the formula holds if n = 1. Suppose it is true for n. Then
$$[r(\cos t + i \sin t)]^{n+1} = [r(\cos t + i \sin t)]^n\,[r(\cos t + i \sin t)]$$
which by induction equals
$$= r^{n+1}(\cos nt + i \sin nt)(\cos t + i \sin t)$$
$$= r^{n+1}\big((\cos nt \cos t - \sin nt \sin t) + i(\sin nt \cos t + \cos nt \sin t)\big)$$
$$= r^{n+1}(\cos(n+1)t + i \sin(n+1)t)$$
by the formulas for the cosine and sine of the sum of two angles. ∎

Corollary 1.7.2 Let z be a non zero complex number. Then there are always exactly k kth roots of z in C.


Proof: Let z = x + iy and let z = |z|(cos t + i sin t) be the polar form of the complex number. By De Moivre's theorem, a complex number r(cos α + i sin α) is a kth root of z if and only if
$$r^k(\cos k\alpha + i \sin k\alpha) = |z|(\cos t + i \sin t).$$
This requires r^k = |z| and so r = |z|^{1/k}, and also both cos(kα) = cos t and sin(kα) = sin t. This can only happen if kα = t + 2lπ for l an integer. Thus α = (t + 2lπ)/k, l ∈ Z, and so the kth roots of z are of the form
$$|z|^{1/k}\left(\cos\left(\frac{t + 2l\pi}{k}\right) + i \sin\left(\frac{t + 2l\pi}{k}\right)\right), \quad l \in \mathbb{Z}.$$
Since the cosine and sine are periodic of period 2π, there are exactly k distinct numbers which result from this formula. ∎

Example 1.7.3 Find the three cube roots of i.

First note that i = 1(cos(π/2) + i sin(π/2)). Using the formula in the proof of the above corollary, the cube roots of i are
$$1\left(\cos\left(\frac{(\pi/2) + 2l\pi}{3}\right) + i \sin\left(\frac{(\pi/2) + 2l\pi}{3}\right)\right)$$
where l = 0, 1, 2. Therefore, the roots are
$$\cos\left(\frac{\pi}{6}\right) + i \sin\left(\frac{\pi}{6}\right),\quad \cos\left(\frac{5}{6}\pi\right) + i \sin\left(\frac{5}{6}\pi\right),\quad \cos\left(\frac{3}{2}\pi\right) + i \sin\left(\frac{3}{2}\pi\right).$$
Thus the cube roots of i are
$$\frac{\sqrt{3}}{2} + i\left(\frac{1}{2}\right),\quad \frac{-\sqrt{3}}{2} + i\left(\frac{1}{2}\right),\quad \text{and } -i.$$
The ability to find kth roots can also be used to factor some polynomials.

Example 1.7.4 Factor the polynomial x³ − 27.

First find the cube roots of 27. By the above procedure using De Moivre's theorem, these cube roots are
$$3,\qquad 3\left(\frac{-1}{2} + i\,\frac{\sqrt{3}}{2}\right),\qquad 3\left(\frac{-1}{2} - i\,\frac{\sqrt{3}}{2}\right).$$
Therefore,
$$x^3 - 27 = (x - 3)\left(x - 3\left(\frac{-1}{2} + i\,\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(\frac{-1}{2} - i\,\frac{\sqrt{3}}{2}\right)\right).$$
Note also that
$$\left(x - 3\left(\frac{-1}{2} + i\,\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(\frac{-1}{2} - i\,\frac{\sqrt{3}}{2}\right)\right) = x^2 + 3x + 9$$
and so
$$x^3 - 27 = (x - 3)\left(x^2 + 3x + 9\right)$$
where the quadratic polynomial x² + 3x + 9 cannot be factored without using complex numbers. Note that even though the polynomial x³ − 27 has all real coefficients, it has some complex zeros, 3(−1/2 + i√3/2) and 3(−1/2 − i√3/2). These zeros are complex conjugates of each other. It is always this way. You should show this is the case. To see how to do this, see Problems 17 and 18 below.

Another fact for your information is the fundamental theorem of algebra. This theorem says that any polynomial of degree at least 1 having any complex coefficients always has a root in C. This is sometimes referred to by saying C is algebraically complete. Gauss is usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first completely correct proof was due to Argand in 1806. For more on this theorem, you can google fundamental theorem of algebra and look at the interesting Wikipedia article on it. Proofs of this theorem usually involve the use of techniques from calculus even though it is really a result in algebra. A proof and plausibility explanation is given later.
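The formula from Corollary 1.7.2 is easy to evaluate numerically. A minimal Python sketch (the helper name `kth_roots` is just for illustration) computing the three cube roots of i and checking them against the values found above:

```python
import cmath

def kth_roots(z, k):
    """All k-th roots of z via De Moivre: |z|^(1/k) * (cos((t+2*l*pi)/k) + i*sin((t+2*l*pi)/k))."""
    r, t = cmath.polar(z)
    return [cmath.rect(r ** (1 / k), (t + 2 * l * cmath.pi) / k) for l in range(k)]

roots = kth_roots(1j, 3)
print(roots)                         # approximately [0.866+0.5j, -0.866+0.5j, -1j]
for w in roots:
    assert cmath.isclose(w ** 3, 1j)  # each root really cubes back to i
```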


1.8 The Quadratic Formula

The quadratic formula

x = (−b ± √(b^2 − 4ac)) / (2a)

gives the solutions x to ax^2 + bx + c = 0 where a, b, c are real numbers. It holds even if b^2 − 4ac < 0. This is easy to show from the above. There are exactly two square roots of the number b^2 − 4ac obtained from the above methods using De Moivre's theorem. These roots are of the form

√(4ac − b^2) (cos (π/2) + i sin (π/2)) = i √(4ac − b^2)

and

√(4ac − b^2) (cos (3π/2) + i sin (3π/2)) = −i √(4ac − b^2).

Thus the solutions, according to the quadratic formula, are still given correctly by the above formula. Do these solutions predicted by the quadratic formula continue to solve the quadratic equation? Yes, they do. You only need to observe that when you square a square root of a complex number z, you recover z. Thus

a ((−b + √(b^2 − 4ac)) / (2a))^2 + b ((−b + √(b^2 − 4ac)) / (2a)) + c
= a ( (1/(2a^2)) b^2 − (1/a) c − (1/(2a^2)) b √(b^2 − 4ac) ) + (1/(2a)) ( −b^2 + b √(b^2 − 4ac) ) + c
= − (1/(2a)) ( b √(b^2 − 4ac) + 2ac − b^2 ) + (1/(2a)) ( b √(b^2 − 4ac) − b^2 ) + c = 0.

Similar reasoning shows directly that (−b − √(b^2 − 4ac)) / (2a) also solves the quadratic equation.

What if the coefficients of the quadratic equation are actually complex numbers? Does the formula hold even in this case? The answer is yes. This is a hint on how to do Problem 28 below, a special case of the fundamental theorem of algebra, and an ingredient in the proof of some versions of this theorem.

Example 1.8.1 Find the solutions to x^2 − 2ix − 5 = 0.

Formally, from the quadratic formula, these solutions are

x = (2i ± √(−4 + 20)) / 2 = (2i ± 4) / 2 = i ± 2.

Now you can check that these really do solve the equation. In general, this will be the case. See Problem 28 below.
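Here is a brief Python sketch (not from the text) that applies the quadratic formula with complex coefficients, using cmath.sqrt for the complex square root; quadratic_roots is an illustrative name.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c = 0 for complex a, b, c with a != 0."""
    d = cmath.sqrt(b * b - 4 * a * c)      # one of the two square roots of b^2 - 4ac
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# Example 1.8.1: x^2 - 2i x - 5 = 0 has roots 2 + i and -2 + i.
r1, r2 = quadratic_roots(1, -2j, -5)
print(r1, r2)
print(r1 ** 2 - 2j * r1 - 5, r2 ** 2 - 2j * r2 - 5)   # both are 0 up to rounding
```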

1.9 The Complex Exponential

It was shown above that every complex number can be written in the form r (cos θ + i sin θ) where r ≥ 0. Laying aside the zero complex number, this shows that every nonzero complex number is of the form e^α (cos β + i sin β). We write this in the form e^{α+iβ}. Having done so, does it follow that the expression preserves the most important property of the function t → e^{(α+iβ)t} for t real, that

( e^{(α+iβ)t} )′ = (α + iβ) e^{(α+iβ)t} ?

By the definition just given, which does not contradict the usual definition in case β = 0, and the usual rules of differentiation in calculus,

( e^{(α+iβ)t} )′ = ( e^{αt} (cos (βt) + i sin (βt)) )′ = e^{αt} [ α (cos (βt) + i sin (βt)) + (−β sin (βt) + iβ cos (βt)) ].

Now consider the other side. From the definition it equals

(α + iβ) ( e^{αt} (cos (βt) + i sin (βt)) ) = e^{αt} [ (α + iβ) (cos (βt) + i sin (βt)) ] = e^{αt} [ α (cos (βt) + i sin (βt)) + (−β sin (βt) + iβ cos (βt)) ]

which is the same thing. This is of fundamental importance in differential equations. It shows that there is no change in going from real to complex numbers for ω in the consideration of the problem y′ = ωy, y (0) = 1. The solution is always e^{ωt}. The formula just discussed, that e^α (cos β + i sin β) = e^{α+iβ}, is Euler's formula.
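As a quick numerical illustration (a sketch, not part of the text), Python's cmath.exp already implements the complex exponential, so Euler's formula can be checked directly for sample values of α and β.

```python
import cmath, math

alpha, beta = 0.7, 2.3                                     # arbitrary test values
lhs = cmath.exp(complex(alpha, beta))                      # e^(alpha + i beta)
rhs = math.exp(alpha) * complex(math.cos(beta), math.sin(beta))
print(lhs, rhs, abs(lhs - rhs))                            # difference is rounding error
```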

1.10 The Fundamental Theorem of Algebra

The fundamental theorem of algebra states that every nonconstant polynomial having coefficients in C has a zero in C. If C is replaced by R, this is not true because of the example x^2 + 1 = 0. This theorem is a very remarkable result and notwithstanding its title, all the most straightforward proofs depend on either analysis or topology. It was first mostly proved by Gauss in 1797. The first complete proof was given by Argand in 1806. The proof given here follows Rudin [32]. See also Hardy [18] for a similar proof, more discussion and references. The shortest proof is found in the theory of complex analysis. First I will give an informal explanation of this theorem which shows why it is reasonable to believe in the fundamental theorem of algebra.

Theorem 1.10.1 Let p (z) = a_n z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0 where each a_k is a complex number and a_n ≠ 0, n ≥ 1. Then there exists w ∈ C such that p (w) = 0.

To begin with, here is the informal explanation. Dividing by the leading coefficient a_n, there is no loss of generality in assuming that the polynomial is of the form

p (z) = z^n + a_{n−1} z^{n−1} + · · · + a_1 z + a_0.

If a_0 = 0, there is nothing to prove because p (0) = 0. Therefore, assume a_0 ≠ 0. From the polar form of a complex number z, it can be written as |z| (cos θ + i sin θ). Thus, by De Moivre's theorem,

z^n = |z|^n (cos (nθ) + i sin (nθ)).

It follows that z^n is some point on the circle of radius |z|^n. Denote by C_r the circle of radius r in the complex plane which is centered at 0. Then if r is sufficiently large and |z| = r, the term z^n is far larger than the rest of the polynomial. It is on the circle of radius |z|^n while the other terms are on circles of fixed multiples of |z|^k for k ≤ n − 1. Thus, for r large enough, A_r = {p (z) : z ∈ C_r} describes a closed curve which misses the inside of some circle having 0 as its center. It won't be as simple as suggested in the following picture, but it will be a closed curve thanks to De Moivre's theorem and the observation that the cosine and sine are periodic. Now shrink r. Eventually, for r small enough, the nonconstant terms are negligible and so A_r is a curve which is contained in some circle centered at a_0 which has 0 on the outside.

[Figure: for large r the closed curve A_r is a large loop which keeps 0 in its interior; for small r it is a small loop near a_0 which has 0 on its outside.]

Thus it is reasonable to believe that for some r during this shrinking process, the set A_r must hit 0. It follows that p (z) = 0 for some z.

For example, consider the polynomial x^3 + x + 1 + i. It has no real zeros. However, you could let z = r (cos t + i sin t) and insert this into the polynomial. Thus you would want to find a point where

(r (cos t + i sin t))^3 + r (cos t + i sin t) + 1 + i = 0 + 0i.

Expanding this expression on the left to write it in terms of real and imaginary parts, you get on the left

( r^3 cos^3 t − 3r^3 cos t sin^2 t + r cos t + 1 ) + i ( 3r^3 cos^2 t sin t − r^3 sin^3 t + r sin t + 1 ).

Thus you need to have both the real and imaginary parts equal to 0. In other words, you need to have

(0, 0) = ( r^3 cos^3 t − 3r^3 cos t sin^2 t + r cos t + 1, 3r^3 cos^2 t sin t − r^3 sin^3 t + r sin t + 1 )

for some value of r and t. First here is a graph of this parametric function of t for t ∈ [0, 2π] on the left, when r = 4. Note how the graph misses the origin 0 + i0. In fact, the closed curve is in the exterior of a circle which has the point 0 + i0 on its inside.

[Figure: three plots of the curve A_r, titled "r too big", "r too small", and "r just right", with the real part on the x axis and the imaginary part on the y axis.]

Next is the graph when r = .5. Note how the closed curve is included in a circle which has 0 + i0 on its outside. As you shrink r you get closed curves. At first, these closed curves enclose 0 + i0 and later, they exclude 0 + i0. Thus one of them should pass through this point. In fact, consider the curve which results when r = 1.386, which is the graph on the right. Note how for this value of r the curve passes through the point 0 + i0. Thus for some t, 1.386 (cos t + i sin t) is a solution of the equation p (z) = 0 or very close to one.

Now here is a short rigorous proof for those who have studied analysis. The needed analysis will be presented later in the book. You need the extreme value theorem for example.

Proof: Suppose the nonconstant polynomial p (z) = a_0 + a_1 z + · · · + a_n z^n, a_n ≠ 0, has no zero in C. Since lim_{|z|→∞} |p (z)| = ∞, there is a z_0 with |p (z_0)| = min_{z∈C} |p (z)| > 0. Then let q (z) = p (z + z_0) / p (z_0). This is also a polynomial which has no zeros and the minimum of |q (z)| is 1 and occurs at z = 0. Since q (0) = 1, it follows q (z) = 1 + a_k z^k + r (z) where r (z) consists of higher order terms. Here a_k is the first coefficient which is nonzero. Choose a sequence z_n → 0 such that a_k z_n^k < 0. For example, let −a_k z_n^k = 1/n. Then

|q (z_n)| = |1 + a_k z_n^k + r (z_n)| ≤ 1 − 1/n + |r (z_n)| = 1 + a_k z_n^k + |r (z_n)| < 1

for all n large enough, because |r (z_n)| is small compared with a_k z_n^k since it involves higher order terms. This is a contradiction. ∎
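The shrinking-circle argument can be explored numerically. The following Python sketch (not from the text; min_abs_on_circle is an illustrative name) samples the curve A_r for p (z) = z^3 + z + 1 + i and records how close it comes to 0 for several radii.

```python
import cmath

def min_abs_on_circle(p, r, samples=2000):
    """Smallest |p(z)| over sample points z on the circle of radius r centered at 0."""
    return min(abs(p(r * cmath.exp(2j * cmath.pi * k / samples)))
               for k in range(samples))

p = lambda z: z ** 3 + z + 1 + 1j
for r in (0.25, 0.5, 1.0, 1.386, 2.0, 4.0):
    print(r, min_abs_on_circle(p, r))
# The curve A_r stays far from 0 when r is very small or very large, and it comes
# essentially to 0 near r = 1.386, matching the graphs described above.
```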

1.11 Ordered Fields

To do linear algebra, you need a field, which is something satisfying the axioms listed in Theorem 1.5.1. This is generally all that is needed to do linear algebra, but for the sake of completeness, the concept of an ordered field is considered here. The real numbers also have an order defined on them. This order may be defined by reference to the positive real numbers, those to the right of 0 on the number line, denoted by R+. More generally, for a field, one could consider an order if there is such a "positive cone" called the positive numbers such that the following axioms hold.

Axiom 1.11.1 The sum of two positive real numbers is positive.

Axiom 1.11.2 The product of two positive real numbers is positive.

Axiom 1.11.3 For a given real number x one and only one of the following alternatives holds. Either x is positive, x = 0, or −x is positive.

An example of this is the field of rational numbers.

Definition 1.11.4 x < y exactly when y + (−x) ≡ y − x ∈ R+. In the usual way, x < y is the same as y > x and x ≤ y means either x < y or x = y. The symbol ≥ is defined similarly.

Theorem 1.11.5 The following hold for the order defined as above.

1. If x < y and y < z then x < z (transitive law).
2. If x < y then x + z < y + z (addition to an inequality).
3. If x ≤ 0 and y ≤ 0, then xy ≥ 0.
4. If x > 0 then x^{−1} > 0.
5. If x < 0 then x^{−1} < 0.
6. If x < y then xz < yz if z > 0 (multiplication of an inequality).
7. If x < y and z < 0, then xz > zy (multiplication of an inequality).
8. Each of the above holds with > and < replaced by ≥ and ≤ respectively except for 4 and 5 in which we must also stipulate that x ≠ 0.
9. For any x and y, exactly one of the following must hold. Either x = y, x < y, or x > y (trichotomy).

Proof: First consider 1, the transitive law. Suppose x < y and y < z. Why is x < z? In other words, why is z − x ∈ R+? It is because z − x = (z − y) + (y − x) and both z − y, y − x ∈ R+. Thus by 1.11.1 above, z − x ∈ R+ and so z > x.

Next consider 2, addition to an inequality. If x < y, why is x + z < y + z? It is because

(y + z) + (−(x + z)) = (y + z) + (−1)(x + z) = y + (−1)x + z + (−1)z = y − x ∈ R+.

Next consider 3. If x ≤ 0 and y ≤ 0, why is xy ≥ 0? First note there is nothing to show if either x or y equals 0, so assume this is not the case. By 1.11.3, −x > 0 and −y > 0. Therefore, by 1.11.2 and what was proved about −x = (−1)x, (−x)(−y) = (−1)^2 xy ∈ R+. Is (−1)^2 = 1? If so the claim is proved. But −(−1) = (−1)^2 and −(−1) = 1 because −1 + 1 = 0.

Next consider 4. If x > 0, why is x^{−1} > 0? By 1.11.3, either x^{−1} = 0 or −x^{−1} ∈ R+. It can't happen that x^{−1} = 0 because then you would have to have 1 = 0x and, as was shown earlier, 0x = 0. Therefore, consider the possibility that −x^{−1} ∈ R+. This can't work either because then you would have (−x^{−1}) x = (−1)(x^{−1} x) = (−1)(1) = −1 and it would follow from 1.11.2 that −1 ∈ R+. But this is impossible because if x ∈ R+, then (−1) x = −x ∈ R+, and this contradicts 1.11.3 which states that either −x or x is in R+ but not both.


Next consider 5. If x < 0, why is x^{−1} < 0? As before, x^{−1} ≠ 0. If x^{−1} > 0, then as before, (−x) x^{−1} = −1 ∈ R+, which was just shown not to occur.

Next consider 6. If x < y, why is xz < yz if z > 0? This follows because yz − xz = z (y − x) ∈ R+ since both z and y − x ∈ R+.

Next consider 7. If x < y and z < 0, why is xz > zy? This follows because zx − zy = z (x − y) ∈ R+ by what was proved in 3.

The last two claims are obvious and left for you. ∎

1.12 Integers

This is a review of fundamental arithmetic concepts pertaining to the real numbers and emphasizing the integers.

1.12.1 Division of Numbers

First of all, recall the Archimedean property of the real numbers which says that if x is any real number, and if a > 0, then there exists a positive integer n such that na > x. Geometrically, it is essentially the following: For any a > 0, the succession of disjoint intervals [0, a), [a, 2a), [2a, 3a), · · · includes all nonnegative real numbers. Here is a picture of some of these intervals.

[Picture: a number line marked at 0, a, 2a, 3a, 4a.]

Then the version of the Euclidean algorithm presented here says that, for an arbitrary nonnegative real number b, it is in exactly one interval [pa, (p + 1) a) where p is some nonnegative integer. This seems obvious from the picture, but here is a proof.

Theorem 1.12.1 Suppose 0 < a and let b ≥ 0. Then there exists a unique integer p and real number r such that 0 ≤ r < a and b = pa + r.

Proof: Let S ≡ {n ∈ N : an > b}. By the Archimedean property this set is nonempty. Let p + 1 be the smallest element of S. Then pa ≤ b because p + 1 is the smallest in S. Therefore, r ≡ b − pa ≥ 0. If r ≥ a then b − pa ≥ a and so b ≥ (p + 1) a, contradicting p + 1 ∈ S. Therefore, r < a as desired.

To verify uniqueness of p and r, suppose p_i and r_i, i = 1, 2, both work and r_2 > r_1. Then a little algebra shows p_1 − p_2 = (r_2 − r_1)/a ∈ (0, 1). Thus p_1 − p_2 is an integer between 0 and 1, and there are no such integers. The case that r_1 > r_2 cannot occur either by similar reasoning. Thus r_1 = r_2 and it follows that p_1 = p_2. ∎

Note that if a, b are integers, then so is r.

Corollary 1.12.2 The same conclusion is reached if b < 0.

Proof: In this case, S ≡ {n ∈ N : an > −b}. Let p + 1 be the smallest element of S. Then pa ≤ −b < (p + 1) a and so (−p) a ≥ b > −(p + 1) a. Let r ≡ b + (p + 1) a. Then b = −(p + 1) a + r and a > r ≥ 0. As to uniqueness, say r_i works and r_1 > r_2. Then you would have b = p_1 a + r_1, b = p_2 a + r_2 and p_2 − p_1 = (r_1 − r_2)/a ∈ (0, 1), which is impossible because p_2 − p_1 is an integer. Hence r_1 = r_2 and so also p_1 = p_2. ∎
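In code, Theorem 1.12.1 is exactly what division with remainder computes. The sketch below (not from the text) uses Python's floor function; div_with_remainder is an illustrative name, and it assumes a > 0 and b ≥ 0 as in the theorem.

```python
import math

def div_with_remainder(b, a):
    """Return (p, r) with b = p*a + r and 0 <= r < a, for a > 0 and b >= 0."""
    p = math.floor(b / a)
    return p, b - p * a

print(div_with_remainder(47, 7))       # (6, 5), since 47 = 6*7 + 5
print(div_with_remainder(10.3, 2.5))   # also works for real a and b
print(divmod(47, 7))                   # for integers, Python's divmod does the same
```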

Corollary 1.12.3 Suppose a, b ̸= 0, then there exists r such that |r| < |a| and for some p an integer, b = ap + r.


Proof: This is done in the above except for the case where a < 0. So suppose this is the case. Then b = p (−a) + r where r is positive and 0 ≤ r < −a = |a|. Thus b = (−p) a + r with 0 ≤ |r| < |a|. ∎

This theorem is called the Euclidean algorithm when a and b are integers, and this is the case of most interest here. Note that if a, b are integers, then so is r. Note that 7 = 2 × 3 + 1 and 7 = 3 × 3 − 2 with |1| < 3 and |−2| < 3, so in this last corollary, the p and r are not unique. The following definition describes what is meant by a prime number and also what is meant by the word "divides".

Definition 1.12.4 The number a divides the number b if in Theorem 1.12.1, r = 0. That is, there is zero remainder. The notation for this is a|b, read a divides b, and a is called a factor of b. A prime number is a number at least 2 which has the property that the only numbers which divide it are itself and 1. The greatest common divisor of two positive integers m, n is that number p which has the property that p divides both m and n and also, if q divides both m and n, then q divides p. Two integers are relatively prime if their greatest common divisor is one. The greatest common divisor of m and n is denoted as (m, n).

There is a phenomenal and amazing theorem which relates the greatest common divisor to the smallest number in a certain set. Suppose m, n are two positive integers. Then if x, y are integers, so is xm + yn. Consider all integers which are of this form. Some are positive, such as 1m + 1n, and some are not. The set S in the following theorem consists of exactly those integers of this form which are positive. Then the greatest common divisor of m and n will be the smallest number in S. This is what the following theorem says.

Theorem 1.12.5 Let m, n be two positive integers and define S ≡ {xm + yn ∈ N : x, y ∈ Z}. Then the smallest number in S is the greatest common divisor, denoted by (m, n).

Proof: First note that both m and n are in S, so it is a nonempty set of positive integers. By well ordering, there is a smallest element of S, called p = x_0 m + y_0 n. Either p divides m or it does not. If p does not divide m, then by Theorem 1.12.1, m = pq + r where 0 < r < p. Thus m = (x_0 m + y_0 n) q + r and so, solving for r, r = m (1 − x_0 q) + (−y_0 q) n ∈ S. However, this is a contradiction because p was the smallest element of S. Thus p|m. Similarly p|n. Now suppose q divides both m and n. Then m = qx and n = qy for integers x and y. Therefore, p = mx_0 + ny_0 = x_0 qx + y_0 qy = q (x_0 x + y_0 y), showing q|p. Therefore, p = (m, n). ∎

This amazing theorem will now be used to prove a fundamental property of prime numbers which leads to the fundamental theorem of arithmetic, the major theorem which says every integer can be factored as a product of primes.

Theorem 1.12.6 If p is a prime and p|ab, then either p|a or p|b.

Proof: Suppose p does not divide a. Then since p is prime, the only factors of p are 1 and p, so it follows (p, a) = 1 and therefore there exist integers x and y such that 1 = ax + yp. Multiplying this equation by b yields b = abx + ybp. Since p|ab, ab = pz for some integer z. Therefore, b = abx + ybp = pzx + ybp = p (xz + yb) and this shows p divides b. ∎

The following is the fundamental theorem of arithmetic.
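The representation (m, n) = x_0 m + y_0 n in Theorem 1.12.5 can be computed with the extended Euclidean algorithm. Here is a small Python sketch of it (not from the text; extended_gcd is an illustrative name).

```python
def extended_gcd(m, n):
    """Return (g, x, y) with g = gcd(m, n) and g = x*m + y*n."""
    if n == 0:
        return m, 1, 0
    g, x, y = extended_gcd(n, m % n)     # gcd(m, n) = gcd(n, m mod n)
    return g, y, x - (m // n) * y

g, x, y = extended_gcd(1914, 899)
print(g, x, y, x * 1914 + y * 899)       # the last value equals g, as the theorem asserts
```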


Theorem 1.12.7 (Fundamental theorem of arithmetic) Let a ∈ N \ {1}. Then a = ∏_{i=1}^n p_i where the p_i are all prime numbers. Furthermore, this prime factorization is unique except for the order of the factors.

Proof: If a equals a prime number, the prime factorization clearly exists. In particular the prime factorization exists for the prime number 2. Assume this theorem is true for all a ≤ n − 1. If n is a prime, then it has a prime factorization. On the other hand, if n is not a prime, then there exist two integers k and m such that n = km where each of k and m are less than n. Therefore, each of these is no larger than n − 1 and consequently, each has a prime factorization. Thus so does n.

It remains to argue that the prime factorization is unique except for order of the factors. Suppose ∏_{i=1}^n p_i = ∏_{j=1}^m q_j where the p_i and q_j are all prime, there is no way to reorder the q_k such that m = n and p_i = q_i for all i, and n + m is the smallest positive integer such that this happens. Then by Theorem 1.12.6, p_1 | q_j for some j. Since these are prime numbers, this requires p_1 = q_j. Reordering if necessary, it can be assumed that q_j = q_1. Then dividing both sides by p_1 = q_1, ∏_{i=1}^{n−1} p_{i+1} = ∏_{j=1}^{m−1} q_{j+1}. Since n + m was as small as possible for the theorem to fail, it follows that n − 1 = m − 1 and the prime numbers q_2, · · · , q_m can be reordered in such a way that p_k = q_k for all k = 2, · · · , n. Hence p_i = q_i for all i because it was already argued that p_1 = q_1, and this results in a contradiction. ∎
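A prime factorization as in Theorem 1.12.7 can be produced by repeated trial division. The following Python sketch (not from the text; prime_factorization is an illustrative name) returns the list of prime factors with repeats.

```python
def prime_factorization(a):
    """Prime factors of an integer a >= 2, with repeats, by trial division."""
    factors, d = [], 2
    while d * d <= a:
        while a % d == 0:        # divide out each prime factor as often as possible
            factors.append(d)
            a //= d
        d += 1
    if a > 1:                    # whatever remains is itself prime
        factors.append(a)
    return factors

print(prime_factorization(360))  # [2, 2, 2, 3, 3, 5]
```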

1.13 Polynomials

It will be very important to be able to work with polynomials in certain parts of linear algebra to be presented later. Polynomials are a lot like integers. The notion of division is important for polynomials in the same way that it is for integers.

Definition 1.13.1 A polynomial is an expression of the form a_n λ^n + a_{n−1} λ^{n−1} + · · · + a_1 λ + a_0, a_n ≠ 0, where the a_i come from a field of scalars. Two polynomials are equal means that the coefficients match for each power of λ. The degree of a polynomial is the largest power of λ. Thus the degree of the above polynomial is n. Addition of polynomials is defined in the usual way as is multiplication of two polynomials. The leading term in the above polynomial is a_n λ^n. The coefficient of the leading term is called the leading coefficient. It is called a monic polynomial if a_n = 1. Note that the degree of the zero polynomial is not defined in the above.

Lemma 1.13.2 Let f (λ) and g (λ) ≠ 0 be polynomials. Then there exist polynomials q (λ) and r (λ) such that

f (λ) = q (λ) g (λ) + r (λ)

where the degree of r (λ) is less than the degree of g (λ) or r (λ) = 0. These polynomials q (λ) and r (λ) are unique.

Proof: Suppose that f (λ) − q (λ) g (λ) is never equal to 0 for any q (λ). If it is, then the conclusion follows. Now suppose r (λ) = f (λ) − q (λ) g (λ) and the degree of r (λ) is m ≥ n where n is the degree of g (λ). Say the leading term of r (λ) is bλ^m while the leading term of g (λ) is b̂λ^n. Then letting a = b/b̂, aλ^{m−n} g (λ) has the same leading term as r (λ). Thus the degree of r_1 (λ) ≡ r (λ) − aλ^{m−n} g (λ) is no more than m − 1. Then

r_1 (λ) = f (λ) − ( q (λ) g (λ) + aλ^{m−n} g (λ) ) = f (λ) − ( q (λ) + aλ^{m−n} ) g (λ),

where q (λ) + aλ^{m−n} plays the role of a new quotient q_1 (λ). Denote by S the set of polynomials f (λ) − g (λ) l (λ). Out of all these polynomials, there exists one which has smallest degree r (λ). Let this take place when l (λ) = q (λ).


Then by the above argument, the degree of r (λ) is less than the degree of g (λ). Otherwise, there is one which has smaller degree. Thus f (λ) = g (λ) q (λ) + r (λ).

As to uniqueness, if you have r (λ), r̂ (λ), q (λ), q̂ (λ) which work, then you would have

( q̂ (λ) − q (λ) ) g (λ) = r (λ) − r̂ (λ).

Now if the polynomial on the right is not zero, then neither is the one on the left. Hence this would involve two polynomials which are equal although their degrees are different. This is impossible. Hence r (λ) = r̂ (λ) and so, matching coefficients implies that q̂ (λ) = q (λ). ∎

Definition 1.13.3 A polynomial f is said to divide a polynomial g if g (λ) = f (λ) r (λ) for some polynomial r (λ). Let {ϕ_i (λ)} be a finite set of polynomials. The greatest common divisor will be the monic polynomial q (λ) such that q (λ) divides each ϕ_i (λ) and if p (λ) divides each ϕ_i (λ), then p (λ) divides q (λ). The finite set of polynomials {ϕ_i} is said to be relatively prime if their greatest common divisor is 1. A polynomial f (λ) is irreducible if there is no polynomial with coefficients in F which divides it except nonzero scalar multiples of f (λ) and constants. In other words, it is not possible to write f (λ) = a (λ) b (λ) where each of a (λ), b (λ) have degree less than the degree of f (λ) unless one of a (λ), b (λ) is a constant.

Proposition 1.13.4 The greatest common divisor is unique.

Proof: Suppose both q (λ) and q′ (λ) work. Then q (λ) divides q′ (λ) and the other way around and so

q′ (λ) = q (λ) l (λ), q (λ) = l′ (λ) q′ (λ).

Therefore, the two must have the same degree. Hence l′ (λ), l (λ) are both constants. However, this constant must be 1 because both q (λ) and q′ (λ) are monic. ∎

Theorem 1.13.5 Let {ϕ_i (λ)} be polynomials, not all of which are zero polynomials. Then there exists a greatest common divisor and it equals the monic polynomial ψ (λ) of smallest degree such that there exist polynomials r_i (λ) satisfying

ψ (λ) = ∑_{i=1}^p r_i (λ) ϕ_i (λ).

Proof: Let S denote the set of monic polynomials which are of the form ∑_{i=1}^p r_i (λ) ϕ_i (λ) where each r_i (λ) is a polynomial. Then S ≠ ∅ because some ϕ_i (λ) ≠ 0. Then let the r_i be chosen such that the degree of the expression ∑_{i=1}^p r_i (λ) ϕ_i (λ) is as small as possible. Letting ψ (λ) equal this sum, it remains to verify it is the greatest common divisor.

First, does it divide each ϕ_i (λ)? Suppose it fails to divide ϕ_1 (λ). Then by Lemma 1.13.2, ϕ_1 (λ) = ψ (λ) l (λ) + r (λ) where the degree of r (λ) is less than that of ψ (λ). Then dividing r (λ) by the leading coefficient if necessary and denoting the result by ψ_1 (λ), it follows the degree of ψ_1 (λ) is less than the degree of ψ (λ) and ψ_1 (λ) equals, for some a ∈ F,

ψ_1 (λ) = ( ϕ_1 (λ) − ψ (λ) l (λ) ) a = ( ϕ_1 (λ) − ( ∑_{i=1}^p r_i (λ) ϕ_i (λ) ) l (λ) ) a = ( (1 − r_1 (λ) l (λ)) ϕ_1 (λ) + ∑_{i=2}^p (−r_i (λ) l (λ)) ϕ_i (λ) ) a.


This is one of the polynomials in S. Therefore, ψ (λ) does not have the smallest degree after all because the degree of ψ_1 (λ) is smaller. This is a contradiction. Therefore, ψ (λ) divides ϕ_1 (λ). Similarly it divides all the other ϕ_i (λ).

If p (λ) divides all the ϕ_i (λ), then it divides ψ (λ) because of the formula for ψ (λ), which equals ∑_{i=1}^p r_i (λ) ϕ_i (λ). Thus ψ (λ) satisfies the condition to be the greatest common divisor. This shows the greatest common divisor exists and equals the above description of it. ∎

Lemma 1.13.6 Suppose ϕ (λ) and ψ (λ) are monic polynomials which are irreducible and not equal. Then they are relatively prime.

Proof: Suppose η (λ) is a nonconstant polynomial. If η (λ) divides ϕ (λ), then since ϕ (λ) is irreducible, ϕ (λ) = η (λ) ã for some constant ã. Thus η (λ) equals aϕ (λ) for some a ∈ F. If η (λ) divides ψ (λ), then it must be of the form bψ (λ) for some b ∈ F and so it follows

η (λ) = aϕ (λ) = bψ (λ), so ψ (λ) = (a/b) ϕ (λ),

but both ψ (λ) and ϕ (λ) are monic polynomials, which implies a = b and so ψ (λ) = ϕ (λ). This is assumed not to happen. It follows the only polynomials which divide both ψ (λ) and ϕ (λ) are constants and so the two polynomials are relatively prime. Thus a polynomial which divides them both must be a constant, and if it is monic, then it must be 1. Thus 1 is the greatest common divisor. ∎
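Division with remainder for polynomials (Lemma 1.13.2) is easy to carry out in code. Below is a small Python sketch (not from the text; poly_divmod is an illustrative name) using lists of coefficients with the constant term first, over the rationals or reals.

```python
def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, for coefficient lists
    (constant term first) over a field; g must be nonzero."""
    q = [0.0] * max(len(f) - len(g) + 1, 1)
    r = list(map(float, f))
    while len(r) >= len(g) and any(r):
        shift = len(r) - len(g)
        a = r[-1] / g[-1]                     # match the leading terms
        q[shift] = a
        for i, c in enumerate(g):
            r[i + shift] -= a * c
        while r and abs(r[-1]) < 1e-12:       # drop the cancelled leading term
            r.pop()
    return q, r

# x^3 - 27 divided by x^2 + 3x + 9: quotient x - 3, remainder 0.
print(poly_divmod([-27, 0, 0, 1], [9, 3, 1]))
```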

Lemma 1.13.7 Let ψ (λ) be an irreducible monic polynomial not equal to 1 which divides

∏_{i=1}^p ϕ_i (λ)^{k_i}, k_i a positive integer,

where each ϕ_i (λ) is an irreducible monic polynomial not equal to 1. Then ψ (λ) equals some ϕ_i (λ).

Proof: Say ψ (λ) l (λ) = ∏_{i=1}^p ϕ_i (λ)^{k_i}. Suppose ψ (λ) ≠ ϕ_i (λ) for all i. Then by Lemma 1.13.6, there exist polynomials m_i (λ), n_i (λ) such that

1 = ψ (λ) m_i (λ) + ϕ_i (λ) n_i (λ), so ϕ_i (λ) n_i (λ) = 1 − ψ (λ) m_i (λ).

Hence ( ϕ_i (λ) n_i (λ) )^{k_i} = ( 1 − ψ (λ) m_i (λ) )^{k_i} and so, letting n (λ) = ∏_{i=1}^p n_i (λ)^{k_i},

n (λ) l (λ) ψ (λ) = ∏_{i=1}^p ( n_i (λ) ϕ_i (λ) )^{k_i} = ∏_{i=1}^p ( 1 − ψ (λ) m_i (λ) )^{k_i} = 1 + g (λ) ψ (λ)

for a suitable polynomial g (λ). You just separate out the term 1^{k_i} = 1 in that product and then all terms that are left have a ψ (λ) as a factor. Hence

( n (λ) l (λ) − g (λ) ) ψ (λ) = 1

which is impossible because ψ (λ) is not equal to 1. ∎

Of course, since coefficients are in a field, you can drop the stipulation that the polynomials are monic and replace the conclusion with: ψ (λ) is a multiple of some ϕ_i (λ).

Now here is a simple lemma about canceling monic polynomials.


Lemma 1.13.8 Suppose p (λ) is a monic polynomial and q (λ) is a polynomial such that p (λ) q (λ) = 0. Then q (λ) = 0. Also if p (λ) q_1 (λ) = p (λ) q_2 (λ), then q_1 (λ) = q_2 (λ).

Proof: Let

p (λ) = ∑_{j=1}^k p_j λ^j, q (λ) = ∑_{i=1}^n q_i λ^i, p_k = 1.

Then the product equals

∑_{j=1}^k ∑_{i=1}^n p_j q_i λ^{i+j}.

If not all q_i = 0, let q_m be the last coefficient which is nonzero. Then the above is of the form

∑_{j=1}^k ∑_{i=1}^m p_j q_i λ^{i+j} = 0.

Consider the λ^{m+k} term. There is only one and it is p_k q_m λ^{m+k}. Since p_k = 1, q_m = 0 after all. The second part follows from p (λ) ( q_1 (λ) − q_2 (λ) ) = 0. ∎

The following is the analog of the fundamental theorem of arithmetic for polynomials.

Theorem 1.13.9 Let f (λ) be a nonconstant polynomial with coefficients in F. Then there is some a ∈ F such that f (λ) = a ∏_{i=1}^n ϕ_i (λ) where each ϕ_i (λ) is an irreducible nonconstant monic polynomial and repeats are allowed. Furthermore, this factorization is unique in the sense that any two of these factorizations have the same nonconstant factors in the product, possibly in different order, and the same constant a.

Proof: That such a factorization exists is obvious. If f (λ) is irreducible, you are done; factor out the leading coefficient. If not, then f (λ) = a ϕ_1 (λ) ϕ_2 (λ) where these are monic polynomials. Continue doing this with the ϕ_i and eventually arrive at a factorization of the desired form.

It remains to argue the factorization is unique except for order of the factors. Suppose

a ∏_{i=1}^n ϕ_i (λ) = b ∏_{i=1}^m ψ_i (λ)

where the ϕ_i (λ) and the ψ_i (λ) are all irreducible monic nonconstant polynomials and a, b ∈ F. If n > m, then by Lemma 1.13.7, each ψ_i (λ) equals one of the ϕ_j (λ). By the above cancellation lemma, Lemma 1.13.8, you can cancel all these ψ_i (λ) with appropriate ϕ_j (λ) and obtain a contradiction because the resulting polynomials on either side would have different degrees. Similarly, it cannot happen that n < m. It follows n = m and the two products consist of the same polynomials. Then it follows a = b. ∎

The following corollary will be well used. This corollary seems rather believable but does require a proof.

Corollary 1.13.10 Let q (λ) = ∏_{i=1}^p ϕ_i (λ)^{k_i} where the k_i are positive integers and the ϕ_i (λ) are irreducible distinct monic polynomials. Suppose also that p (λ) is a monic polynomial which divides q (λ). Then

p (λ) = ∏_{i=1}^p ϕ_i (λ)^{r_i}

where r_i is a nonnegative integer no larger than k_i.


Proof: Using Theorem 1.13.9, let p (λ) = b ∏_{i=1}^s ψ_i (λ)^{r_i} where the ψ_i (λ) are each irreducible and monic and b ∈ F. Since p (λ) is monic, b = 1. Then there exists a polynomial g (λ) such that

p (λ) g (λ) = g (λ) ∏_{i=1}^s ψ_i (λ)^{r_i} = ∏_{i=1}^p ϕ_i (λ)^{k_i}.

Hence g (λ) must be monic. Therefore, writing g (λ) = ∏_{j=1}^l η_j (λ) with the η_j monic and irreducible,

p (λ) g (λ) = ( ∏_{i=1}^s ψ_i (λ)^{r_i} ) ∏_{j=1}^l η_j (λ) = ∏_{i=1}^p ϕ_i (λ)^{k_i}.

By uniqueness, each ψ_i equals one of the ϕ_j (λ) and the same holds true of the η_j (λ). Therefore, p (λ) is of the desired form because you can cancel the η_j (λ) from both sides. ∎

1.14 Examples of Finite Fields

The emphasis of the first part of this book will be on what can be done on the basis of algebra alone. Linear algebra only needs a field of scalars along with some axioms involving an Abelian group of vectors and there are infinitely many examples of fields, including some which are finite. Since it is good to have examples in mind, I will present the finite fields of residue classes modulo a prime number in this little section. Then, when linear algebra is developed in the first part of the book and reference is made to a field of scalars, you should think that it is possible that the field might be this field of residue classes.

1.14.1 The Field Zp

Here is the construction of the finite fields Zp for p a prime.

Definition 1.14.1 Let Z+ denote the set of nonnegative integers. Thus Z+ = {0, 1, 2, 3, · · · }. Also let p be a prime number. We will say that two integers a, b are equivalent and write a ∼ b if a − b is divisible by p. Thus they are equivalent if a − b = px for some integer x.

Proposition 1.14.2 The relation ∼ is an equivalence relation. Denoting by n̄ the equivalence class determined by n ∈ N, the following are well defined operations:

n̄ + m̄ ≡ the class of n + m,  n̄ m̄ ≡ the class of nm,

which make the set Zp, consisting of the classes of 0, 1, · · · , p − 1, into a field.

Proof: First note that for n ∈ Z+ there always exists r ∈ {0, 1, · · · , p − 1} such that n̄ = r̄. This is clearly true because if n ∈ Z+, then n = mp + r for r < p, this by the Euclidean algorithm. Thus r̄ = n̄. Now suppose that n̄_1 = n̄ and m̄_1 = m̄. Is the class of n_1 + m_1 the same as the class of n + m? That is, is (n + m) − (n_1 + m_1) a multiple of p? Of course it is, since n_1 − n and m_1 − m are both multiples of p. Similarly, is the class of n_1 m_1 the same as the class of nm? Is nm − n_1 m_1 a multiple of p? Of course this is so because

nm − n_1 m_1 = nm − n_1 m + n_1 m − n_1 m_1 = m (n − n_1) + n_1 (m − m_1),

which is a multiple of p. Thus the operations are well defined. It follows that all of the field axioms hold except possibly the existence of a multiplicative inverse and an additive inverse.


First consider the question of an additive inverse. A typical element of Zp is of the form r̄ where 0 ≤ r ≤ p − 1. Then consider p − r. By definition, r̄ plus the class of p − r equals p̄ = 0̄, and so the additive inverse exists.

Now consider the existence of a multiplicative inverse. This is where p being prime is used. Say n̄ ≠ 0̄. That is, n is not a multiple of p, 0 ≤ n < p. Then since p is prime, n, p are relatively prime and so there are integers x, y such that 1 = xn + yp. Choose m ≥ 0 such that pm + x > 0, pm + y > 0. Then

1 + pmn + p^2 m = (pm + x) n + (pm + y) p.

Passing to residue classes, the left side is 1̄ and the right side is the class of (pm + x) n, so 1̄ = (the class of pm + x) n̄, and therefore the class of pm + x is the multiplicative inverse of n̄. ∎

Thus Zp is a finite field, known as the field of residue classes modulo p. Something else which is often considered is a commutative ring with unity.

Definition 1.14.3 A commutative ring with unity is just a field except it lacks the property that nonzero elements have a multiplicative inverse. It has all other properties. In this book, this will be referred to simply as a commutative ring. I will assume that commutative rings always have 1. Thus the axioms of a commutative ring with unity are as follows:

Axiom 1.14.4 Here are the axioms for a commutative ring with unity.

1. x + y = y + x (commutative law for addition).
2. There exists 0 such that x + 0 = x for all x (additive identity).
3. For each x, there exists −x such that x + (−x) = 0 (existence of additive inverse).
4. (x + y) + z = x + (y + z) (associative law for addition).
5. xy = yx (commutative law for multiplication). You could write this as x × y = y × x.
6. (xy) z = x (yz) (associative law for multiplication).
7. There exists 1 such that 1x = x for all x (multiplicative identity).
8. x (y + z) = xy + xz (distributive law).

An example of such a thing is Zm where m is not prime, also the ordinary integers. However, the integers are also an integral domain.

Definition 1.14.5 A commutative ring with unity is called an integral domain if, in addition to the above, whenever ab = 0, it follows that either a = 0 or b = 0.
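The arithmetic of Zp is easy to experiment with in Python. The sketch below (not from the text) checks that every nonzero residue class modulo 5 has a multiplicative inverse; it computes the inverse as n^(p−2) mod p, which relies on Fermat's little theorem (Problem 33 below) rather than on the argument just given.

```python
p = 5  # any prime

def inv_mod(n, p):
    """Multiplicative inverse of a nonzero residue n modulo the prime p."""
    return pow(n, p - 2, p)          # Fermat: n^(p-1) = 1, so n^(p-2) is the inverse

for n in range(1, p):
    print(n, inv_mod(n, p), (n * inv_mod(n, p)) % p)   # the last column is always 1
```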

1.15 Some Topics From Analysis

Recall from calculus that if A is a nonempty set, sup_{a∈A} f (a) denotes the least upper bound of f (A), or, if this set is not bounded above, it equals ∞. Also inf_{a∈A} f (a) denotes the greatest lower bound of f (A) if this set is bounded below, and it equals −∞ if f (A) is not bounded below. Thus to say sup_{a∈A} f (a) = ∞ is just a way to say that f (A) is not bounded above.

Definition 1.15.1 Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets, which means that f (a, b) is either a number, ∞, or −∞. The symbol +∞ is interpreted as a point out at the end of the number line which is larger than every real number. Of course there is no such number. That is why it is called ∞. The symbol −∞ is interpreted similarly. Then sup_{a∈A} f (a, b) means sup (S_b) where S_b ≡ {f (a, b) : a ∈ A}.

Note that if {a_n} is an increasing sequence of real numbers, sup_n {a_n} = lim_{n→∞} a_n if sup_n {a_n} < ∞, and also if we define lim_{n→∞} a_n ≡ ∞ if sup_n {a_n} = ∞. Unlike limits, you can take the sup in different orders.


Lemma 1.15.2 Let f (a, b) ∈ [−∞, ∞] for a ∈ A and b ∈ B where A, B are sets. Then

sup_{a∈A} sup_{b∈B} f (a, b) = sup_{b∈B} sup_{a∈A} f (a, b).

Proof: Note that for all a, b,

f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b)

and therefore, for all a, sup_{b∈B} f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b). Therefore,

sup_{a∈A} sup_{b∈B} f (a, b) ≤ sup_{b∈B} sup_{a∈A} f (a, b).

Repeat the same argument interchanging a and b to get the conclusion of the lemma. ∎

1.15.1 lim sup and lim inf

The nice thing about lim sup and lim inf is that they always exist, unlike the limit of a sequence. Recall how in calculus, there is no limit of (−1)^n. First here is a simple lemma and definition.

Definition 1.15.3 Denote by [−∞, ∞] the real line along with symbols ∞ and −∞. It is understood that ∞ is larger than every real number and −∞ is smaller than every real number. Then if {A_n} is an increasing sequence of points of [−∞, ∞], lim_{n→∞} A_n is defined to equal ∞ if the only upper bound of the set {A_n} is ∞. If {A_n} is bounded above by a real number, then lim_{n→∞} A_n is defined in the usual way and equals the least upper bound of {A_n}. If {A_n} is a decreasing sequence of points of [−∞, ∞], lim_{n→∞} A_n equals −∞ if the only lower bound of the sequence {A_n} is −∞. If {A_n} is bounded below by a real number, then lim_{n→∞} A_n is defined in the usual way and equals the greatest lower bound of {A_n}. More simply, if {A_n} is increasing,

lim_{n→∞} A_n ≡ sup {A_n},

and if {A_n} is decreasing then

lim_{n→∞} A_n ≡ inf {A_n}.

Before discussing lim sup and lim inf, here is a very useful observation about double sums.

Theorem 1.15.4 Let a_ij ≥ 0. Then

∑_{i=1}^∞ ∑_{j=1}^∞ a_ij = ∑_{j=1}^∞ ∑_{i=1}^∞ a_ij.

Proof: First note there is no trouble in defining these sums because the a_ij are all nonnegative. If a sum diverges, it only diverges to ∞ and so ∞ is the value of the sum. Next note that

∑_{j=r}^∞ ∑_{i=r}^∞ a_ij ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n a_ij

because ∑_{j=r}^∞ ∑_{i=r}^∞ a_ij ≥ ∑_{j=r}^∞ ∑_{i=r}^n a_ij for each n. Therefore,

∑_{j=r}^∞ ∑_{i=r}^∞ a_ij ≥ sup_n ∑_{j=r}^∞ ∑_{i=r}^n a_ij = sup_n lim_{m→∞} ∑_{j=r}^m ∑_{i=r}^n a_ij
= sup_n lim_{m→∞} ∑_{i=r}^n ∑_{j=r}^m a_ij = sup_n ∑_{i=r}^n lim_{m→∞} ∑_{j=r}^m a_ij
= sup_n ∑_{i=r}^n ∑_{j=r}^∞ a_ij = lim_{n→∞} ∑_{i=r}^n ∑_{j=r}^∞ a_ij = ∑_{i=r}^∞ ∑_{j=r}^∞ a_ij.

Interchanging the i and j in the above argument proves the theorem. ∎
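Only finite truncations can be checked on a computer, but they illustrate the statement. In the Python sketch below (not from the text), the partial sums of a_ij = 2^(−i) 3^(−j) taken in the two orders agree, and both approach (∑_i 2^(−i)) (∑_j 3^(−j)) = 1 · (1/2).

```python
N = 30   # truncation level; the tails beyond N are negligible here
row_first = sum(sum(2.0 ** -i * 3.0 ** -j for j in range(1, N)) for i in range(1, N))
col_first = sum(sum(2.0 ** -i * 3.0 ** -j for i in range(1, N)) for j in range(1, N))
print(row_first, col_first)   # both are 0.5 up to rounding and truncation
```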


Lemma 1.15.5 Let {a_n} be a sequence of real numbers and let U_n ≡ sup {a_k : k ≥ n}. Then {U_n} is a decreasing sequence. Also if L_n ≡ inf {a_k : k ≥ n}, then {L_n} is an increasing sequence. Therefore, lim_{n→∞} L_n and lim_{n→∞} U_n both exist.

Proof: Let W_n be an upper bound for {a_k : k ≥ n}. Then since these sets are getting smaller, it follows that for m < n, W_m is an upper bound for {a_k : k ≥ n}. In particular if W_m = U_m, then U_m is an upper bound for {a_k : k ≥ n} and so U_m is at least as large as U_n, the least upper bound for {a_k : k ≥ n}. The claim that {L_n} is increasing is similar. ∎

From the lemma, the following definition makes sense.

Definition 1.15.6 Let {a_n} be any sequence of points of [−∞, ∞].

lim sup_{n→∞} a_n ≡ lim_{n→∞} sup {a_k : k ≥ n}

lim inf_{n→∞} a_n ≡ lim_{n→∞} inf {a_k : k ≥ n}.
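Numerically one can only look at finite tails, but they show the monotone behavior described in Lemma 1.15.5. The Python sketch below (not from the text; tail_sup_inf is an illustrative name) uses a long finite tail as a stand-in for {a_k : k ≥ n}.

```python
def tail_sup_inf(a, n, length=5000):
    """Approximate U_n = sup{a_k : k >= n} and L_n = inf{a_k : k >= n}."""
    tail = [a(k) for k in range(n, n + length)]
    return max(tail), min(tail)

a = lambda n: (-1) ** n * (1 + 1 / n)     # has no limit; lim sup = 1, lim inf = -1
for n in (1, 10, 100, 1000):
    print(n, tail_sup_inf(a, n))          # U_n decreases toward 1, L_n increases toward -1
```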

Now the following shows the relation of lim inf and lim sup to the limit. Theorem 1.15.7 Suppose {an } is a sequence of real numbers and that lim sup an n→∞

and lim inf an n→∞

are both real numbers. Then limn→∞ an exists if and only if lim inf an = lim sup an n→∞

n→∞

and in this case, lim an = lim inf an = lim sup an .

n→∞

n→∞

n→∞

Proof: First note that sup {ak : k ≥ n} ≥ inf {ak : k ≥ n} and so, lim sup an ≡ lim sup {ak : k ≥ n} ≥ lim inf {ak : k ≥ n} ≡ lim inf an . n→∞

n→∞

n→∞

n→∞

Suppose first that lim_{n→∞} a_n exists and is a real number a. Then from the definition of a limit, there exists N corresponding to ε/6 in the definition. Hence, if m, n ≥ N, then |a_n − a_m| ≤ |a_n − a| + |a − a_m| < ε/6 + ε/6 = ε/3. From the definition of sup {a_k : k ≥ N}, there exists n_1 ≥ N such that sup {a_k : k ≥ N} ≤ a_{n_1} + ε/3. Similarly, there exists n_2 ≥ N such that inf {a_k : k ≥ N} ≥ a_{n_2} − ε/3. It follows that sup {a_k : k ≥ N} − inf {a_k : k ≥ N} ≤ |a_{n_1} − a_{n_2}| + 2ε/3 < ε. Since inf {a_k : k ≥ N} ≤ a ≤ sup {a_k : k ≥ N} and ε is arbitrary, it follows that lim sup_{n→∞} a_n = lim inf_{n→∞} a_n = a. Next suppose lim inf_{n→∞} a_n = lim sup_{n→∞} a_n = a, a real number. Then whenever ε >
0, there exists N such that sup {ak : k ≥ N } − inf {ak : k ≥ N } < ε, and for every N, inf {ak : k ≥ N } ≤ a ≤ sup {ak : k ≥ N } Thus if n ≥ N,

|a − an | < ε

which implies that limn→∞ an = a. In case a = ∞ = lim sup {ak : k ≥ N } = lim inf {ak : k ≥ N } N →∞

N →∞

then if r ∈ R is given, there exists N such that inf {ak : k ≥ N } > r which is to say that limn→∞ an = ∞. The case where a = −∞ is similar except you use sup {ak : k ≥ N }.  The significance of lim sup and lim inf, in addition to what was just discussed, is contained in the following theorem which follows quickly from the definition. Theorem 1.15.8 Suppose {an } is a sequence of points of [−∞, ∞] . Let λ = lim sup an . n→∞

Then if b > λ, it follows there exists N such that whenever n ≥ N, an ≤ b. If c < λ, then an > c for infinitely many values of n. Let γ = lim inf an . n→∞

Then if d < γ, it follows there exists N such that whenever n ≥ N, an ≥ d. If e > γ, it follows an < e for infinitely many values of n. The proof of this theorem is left as an exercise for you. It follows directly from the definition and it is the sort of thing you must do yourself. Here is one other simple proposition. Proposition 1.15.9 Let limn→∞ an = a > 0. Then lim sup an bn = a lim sup bn . n→∞

n→∞

Proof: This follows from the definition. Let λn = sup {ak bk : k ≥ n} . For all n large enough, an > a − ε where ε is small enough that a − ε > 0. Therefore, λn ≥ sup {bk : k ≥ n} (a − ε) for all n large enough. Then lim sup an bn

=

n→∞



lim λn ≡ lim sup an bn

n→∞

n→∞

lim (sup {bk : k ≥ n} (a − ε))

n→∞

= (a − ε) lim sup bn n→∞

Similar reasoning shows lim sup an bn ≤ (a + ε) lim sup bn n→∞

Now since ε > 0 is arbitrary, the conclusion follows. 

n→∞

1.16. EXERCISES

1.16

27

Exercises ∑n

1 4 1 3 1 2 n + n + n . 4 2 4 √ ∑n 1 2. Prove by induction that whenever n ≥ 2, k=1 √ > n. k ∑n 3. Prove by induction that 1 + i=1 i (i!) = (n + 1)!. ∑n ( ) n 4. The binomial theorem states (x + y) = k=0 nk xn−k y k where ( ) ( ) ( ) ( ) ( ) n+1 n n n n = + if k ∈ [1, n] , ≡1≡ k k k−1 0 n 1. Prove by induction that

k=1

k3 =

Prove the binomial theorem by induction. Next show that ( ) n n! = , 0! ≡ 1 (n − k)!k! k I 5. Let z = 5 + i9. Find z −1 . 6. Let z = 2 + i7 and let w = 3 − i8. Find zw, z + w, z 2 , and w/z. 7. Give the complete solution to x4 + 16 = 0. 8. Graph the complex cube roots of 8 in the complex plane. Do the same for the four fourth roots of 16. I 9. If z is a complex number, show there exists ω a complex number with |ω| = 1 and ωz = |z| . n

10. De Moivre’s theorem says [r (cos t + i sin t)] = rn (cos nt + i sin nt) for n a positive integer. Does this formula continue to hold for all integers n, even negative integers? Explain. I 11. You already know formulas for cos (x + y) and sin (x + y) and these were used to prove De Moivre’s theorem. Now using De Moivre’s theorem, derive a formula for sin (5x) and one for cos (5x). I 12. If z and w are two complex numbers and the polar form of z involves the angle θ while the polar form of w involves the angle ϕ, show that in the polar form for zw the angle involved is θ + ϕ. Also, show that in the polar form of a complex number z, r = |z| . 13. Factor x3 + 8 as a product of linear factors. ( ) 14. Write x3 + 27 in the form (x + 3) x2 + ax + b where x2 + ax + b cannot be factored any more using only real numbers. 15. Completely factor x4 + 16 as a product of linear factors. 16. Factor x4 + 16 as the product of two quadratic polynomials each of which cannot be factored further without using complex numbers. ∏n 17. If z, w are complex numbers prove zw = zw and then show by induction that j=1 zj = ∑m ∏n ∑m k=1 zk . In words this says the conjugate of a prodj=1 zj . Also verify that k=1 zk = uct equals the product of the conjugates and the conjugate of a sum equals the sum of the conjugates. 18. Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 where all the ak are real numbers. Suppose also that p (z) = 0 for some z ∈ C. Show it follows that p (z) = 0 also.

28

CHAPTER 1. SOME PREREQUISITE TOPICS

19. Show that 1 + i, 2 + i are the only two zeros to p (x) = x2 − (3 + 2i) x + (1 + 3i) so the zeros do not necessarily come in conjugate pairs if the coefficients are not real. 20. I claim that 1 = −1. Here is why. −1 = i2 =



√ √ √ 2 −1 −1 = (−1) = 1 = 1.

This is clearly a remarkable result but is there something wrong with it? If so, what is wrong? 21. De Moivre’s theorem is really a grand thing. I plan to use it now for rational exponents, not just integers. 1 = 1(1/4) = (cos 2π + i sin 2π)

1/4

= cos (π/2) + i sin (π/2) = i.

Therefore, squaring both sides it follows 1 = −1 as in the previous problem. What does this tell you about De Moivre’s theorem? Is there a profound difference between raising numbers to integer powers and raising numbers to non integer powers? 22. Review Problem 10 at this point. Now here is another question: If n is an integer, is it always n true that (cos θ − i sin θ) = cos (nθ) − i sin (nθ)? Explain. 23. Suppose any polynomial in cos θ and sin θ. By this I mean an expression of the ∑myou∑have n form α=0 β=0 aαβ cosα θ sinβ θ where aαβ ∈ C. Can this always be written in the form ∑m+n ∑n+m γ=−(n+m) bγ cos γθ + τ =−(n+m) cτ sin τ θ? Explain. 24. Show that C cannot be considered an ordered field. Hint: Consider i2 = −1. 25. Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 is a polynomial and it has n zeros, z1 , z 2 , · · · , z n listed according to multiplicity. (z is a root of multiplicity m if the polynomial f (x) = (x − z) divides p (x) but (x − z) f (x) does not.) Show that

m

p (x) = an (x − z1 ) (x − z2 ) · · · (x − zn ) . 26. Give the solutions to the following quadratic equations having real coefficients. (a) x2 − 2x + 2 = 0 (b) 3x2 + x + 3 = 0 (c) x2 − 6x + 13 = 0 (d) x2 + 4x + 9 = 0 (e) 4x2 + 4x + 5 = 0 27. Give the solutions to the following quadratic equations having complex coefficients. Note how the solutions do not come in conjugate pairs as they do when the equation has real coefficients. (a) x2 + 2x + 1 + i = 0 (b) 4x2 + 4ix − 5 = 0 (c) 4x2 + (4 + 4i) x + 1 + 2i = 0 (d) x2 − 4ix − 5 = 0 (e) 3x2 + (1 − i) x + 3i = 0

1.16. EXERCISES

29

28. Prove the fundamental theorem of algebra for quadratic polynomials having coefficients in C. That is, show that an equation of the form ax2 + bx + c = 0 where a, b, c are complex numbers, a ̸= 0 has a complex solution. Hint: Consider the fact, noted earlier that the expressions given from the quadratic formula do in fact serve as solutions. 29. Prove the Euclidean algorithm: If m, n are positive integers, then there exist integers q, r ≥ 0 such that r < m and n = qm + r Hint: You might try considering S ≡ {n − km : k ∈ N and n − km < 0} and picking the smallest integer in S or something like this. It was done in the chapter, but go through it yourself. 30. Recall that two polynomials are equal means that the coefficients of corresponding powers of λ are equal. Thus a polynomial equals 0 if and only if all coefficients equal 0. In calculus we usually think of a polynomial as 0 if it sends every value of x to 0. Suppose you have the following polynomial ¯1x2 + ¯1x where it is understood to be a polynomial in Z2 . Thus it is not the zero polynomial. Show, however, that this equals zero for all x ∈ Z2 so we would be tempted to say it is zero if we use the conventions of calculus. 31. Prove Wilson’s theorem. This theorem states that if p is a prime, then (p − 1)! + 1 is divisible by p. Wilson’s theorem was first proved by Lagrange in the 1770’s. Hint: Check directly for −1 p = 2, 3. Show that p − 1 = −1 and that if a ∈ {2, · · · , p − 2} , then (a) ̸= a. Thus a residue class a and its multiplicative inverse for a ∈ {2, · · · , p − 2} occur in pairs. Show that this implies that the residue class of (p − 1)! must be −1. From this, draw the conclusion. p

p

p

32. Show that in the arithmetic of Zp , (x + y) = (x) + (y) , a well known formula among students. 33. Consider (a) ∈ Zp for p a prime, and suppose (a) ̸= 1, 0. Fermat’s little theorem says that p−1 p−1 (a) = 1. In other words (a) − 1 is divisible by p. Prove this. Hint: Show that there r 2 must exist r ≥ 1, r ≤ p − 1 such that (a) = 1. To do so, consider 1, (a) { , (a) , · · · . Then}these { } p−1 all have values in 1, 2, · · · , p − 1 , and so there must be a repeat in 1, (a) , · · · , (a) , say l

k

l−k

. Then tell why (a)} − 1 = 0. Let r be the first positive integer p − 1 ≥ l > k and (a) = (a) { r r−1 such that (a) = 1. Let G = 1, (a) , · · · , (a) . Show that every residue class in G has its k

r−k

multiplicative inverse in G. In fact, (a){ (a) = 1. Also verify}that the entries in G must be { } k distinct. Now consider the sets bG ≡ b (a) : k = 0, · · · , r − 1 where b ∈ 1, 2, · · · , p − 1 . Show that two of these sets are either the same or disjoint and that they all consist of r elements. Explain why it follows that p − 1 = lr for some positive integer l equal to the p−1 lr number of these distinct sets. Then explain why (a) = (a) = 1. 34. Let p (x) and q (x) be polynomials. Then by the division algorithm, there exist polynomials l (x) , r (x) equal to 0 or having degree smaller than p (x) such that q (x) = p (x) l (x) + r (x) If k (x) is the greatest common divisor of p (x) and q (x) , explain why k (x) must divide r (x). Then argue that k (x) is also the greatest common divisor of p (x) and r (x). Now repeat the process for the polynomials p (x) and r (x). This time, the remainder term will have degree smaller than r (x). Keep doing this and eventually the remainder must be 0. Describe an algorithm based on this which will determine the greatest common divisor of two polynomials.

30

CHAPTER 1. SOME PREREQUISITE TOPICS

35. Consider Zm where m is not a prime. Show that although this will not be a field, it is a commutative ring with unity. 36. This and the next few problems are to illustrate the utility of the lim sup. A sequence of numbers {xn } in C is called a Cauchy sequence if for every ε > 0 there exists m such that if k, l ≥ m, then |xk − xl | < ε. The complex numbers are said to be complete because any Cauchy sequence of the completeness axiom. Using this axiom, ∑∞ converges. This ∑n is one form 1 whenever r ∈ C and |r| < 1. Hint: You need show that k=0 rk ≡ limn→∞ k=0 rk = 1−r to do a computation with the sum and show that the partial sums form a Cauchy sequence. ∑∞ ∑n ∑∞ 37. Show that if j=1 |cj | converges, meaning that limn→∞ j=1 |cj | exists, then j=1 cj also ∑n converges, meaning limn→∞ j=1 cj exists, this for cj ∈ C. Recall from calculus, this says that absolute convergence implies convergence. ∑n ∑∞ 38. Show that if j=1 cj converges, meaning limn→∞ j=1 cj exists, then it must be the case that limn→∞ cn = 0. ∑∞ 1/k 1/n 39. Show that if lim supk→∞ |ak | < 1, then k=1 |ak | converges, while if lim supn→∞ |an | > 1, then the series diverges spectacularly because limn→∞ |cn | fails to equal 0 and in fact has 1/n a subsequence which converges to ∞. Also show that if lim supn→∞ |an | = 1, the test fails because there are examples where the series can converge and examples where the series diverges. This is an improved version of the root test from ∑ It is improved because ∑ calculus. lim sup always exists. Hint: For the last part, consider n n1 and n n12 . Review calculus to see why the first diverges and the second converges. ∑∞ 40. Consider a power series n=0 an xn . Derive a condition for the radius of convergence using 1/n lim supn→∞ |an | . Recall that the radius of convergence R is such that if |x| < R, then the series converges and if |x| > R, the series diverges and if |x| = R is it not known whether the series converges. In this problem, assume only that x ∈ C. 41. Show that if an is a sequence of real numbers, then lim inf n→∞ (−an ) = − lim supn→∞ an .

Part I

Linear Algebra For Its Own Sake

31

Chapter 2

Systems of Linear Equations

This part of the book is about linear algebra itself, as a part of algebra. Some geometric and analytic concepts do creep in, but it is primarily about algebra. It involves general fields and has very little to do with limits and completeness, although some geometry is included, but not much. Numbers are elements of a field.

2.1

Elementary Operations

In this chapter, the main interest is in fields of scalars consisting of R or C, but everything is applied to arbitrary fields. Consider the following example. Example 2.1.1 Find x and y such that x + y = 7 and 2x − y = 8.

(2.1)

The set of ordered pairs, (x, y) which solve both equations is called the solution set. You can verify that (x, y) = (5, 2) is a solution to the above system. The interesting question is this: If you were not given this information to verify, how could you determine the solution? You can do this by using the following basic operations on the equations, none of which change the set of solutions of the system of equations. Definition 2.1.2 Elementary operations are those operations consisting of the following. 1. Interchange the order in which the equations are listed. 2. Multiply any equation by a nonzero number. 3. Replace any equation with itself added to a multiple of another equation. Example 2.1.3 To illustrate the third of these operations on this particular system, consider the following. x+y =7 2x − y = 8 The system has the same solution set as the system x+y =7 . −3y = −6 To obtain the second system, take the second equation of the first system and add −2 times the first equation to obtain −3y = −6. 33
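As a small illustration (not part of the text), the third elementary operation can be carried out on the augmented matrix of this system in a few lines of Python, using exact rational arithmetic.

```python
from fractions import Fraction

# Augmented matrix of x + y = 7, 2x - y = 8.
A = [[Fraction(1), Fraction(1), Fraction(7)],
     [Fraction(2), Fraction(-1), Fraction(8)]]

# Replace row 1 with itself plus (-2) times row 0: this gives -3y = -6.
A[1] = [a + (-2) * b for a, b in zip(A[1], A[0])]
print(A[1])                      # the row is now 0, -3, -6, i.e. -3y = -6

y = A[1][2] / A[1][1]            # back substitute
x = A[0][2] - A[0][1] * y
print(x, y)                      # 5 and 2, the solution noted above
```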

34

CHAPTER 2. SYSTEMS OF LINEAR EQUATIONS

Now, this clearly shows that y = 2 and so it follows from the other equation that x + 2 = 7 and so x = 5. Of course a linear system may involve many equations and many variables. The solution set is still the collection of solutions to the equations. In every case, the above operations of Definition 2.1.2 do not change the set of solutions to the system of linear equations. Theorem 2.1.4 Suppose you have two equations, involving the variables, (x1 , · · · , xn ). E 1 = f1 , E 2 = f2

(2.2)

where E1 and E2 are expressions E1

=

a1 x1 + · · · + an xn

E2

=

b1 x1 + · · · + bn xn

involving the variables and f1 and f2 are constants where the ai , bi , f1 , f2 are in a field F. (In the above example there are only two variables, x and y and E1 = x + y while E2 = 2x − y.) Then the system E1 = f1 , E2 = f2 has the same solution set as E1 = f1 , E2 + aE1 = f2 + af1 .

(2.3)

Also the system E1 = f1 , E2 = f2 has the same solutions as the system, E2 = f2 , E1 = f1 . The system E1 = f1 , E2 = f2 has the same solution as the system E1 = f1 , aE2 = af2 provided a ̸= 0. Proof: If (x1 , · · · , xn ) solves E1 = f1 , E2 = f2 then it solves the first equation in E1 = f1 , E2 + aE1 = f2 + af1 . Also, it satisfies aE1 = af1 and so, since it also solves E2 = f2 it must solve E2 +aE1 = f2 +af1 . Therefore, if (x1 , · · · , xn ) solves E1 = f1 , E2 = f2 it must also solve E2 +aE1 = f2 +af1 . On the other hand, if it solves the system E1 = f1 and E2 +aE1 = f2 +af1 , then aE1 = af1 and so you can subtract these equal quantities from both sides of E2 + aE1 = f2 + af1 to obtain E2 = f2 showing that it satisfies E1 = f1 , E2 = f2 . The second assertion of the theorem which says that the system E1 = f1 , E2 = f2 has the same solution as the system, E2 = f2 , E1 = f1 is seen to be true because it involves nothing more than listing the two equations in a different order. They are the same equations. The third assertion of the theorem which says E1 = f1 , E2 = f2 has the same solution as the system E1 = f1 , aE2 = af2 provided a ̸= 0 is verified as follows: If (x1 , · · · , xn ) is a solution of E1 = f1 , E2 = f2 , then it is a solution to E1 = f1 , aE2 = af2 because the second system only involves multiplying the equation, E2 = f2 by a. If (x1 , · · · , xn ) is a solution of E1 = f1 , aE2 = af2 , then upon multiplying aE2 = af2 by the number 1/a, you find that E2 = f2 .  Stated simply, the above theorem shows that the elementary operations do not change the solution set of a system of equations.

2.2

Gauss Elimination

A less cumbersome way to represent a linear system is to write it as an augmented matrix. For example the suppose you want to find the solution for x, y, z in Z5 to the system x + ¯3y + z ¯2y + z

= =

To simplify, write the coefficients without the  1   2 0

¯0, ¯2x + y + ¯3z = ¯3, ¯4 bar but do the arithmetic in Z5 .  3 1 0  1 3 3 . 2 1 4

2.2. GAUSS ELIMINATION

35

It has exactly the same information as the original system but here the columns correspond to the variables and the rows correspond to the equations in the system. To solve the system, we can use Gauss elimination in the usual way. The solution set is not changed by using the row operations. Take 3 = −2 times the top equation and add to the second.   1 3 1 0    0 0 1 3  0 2 1 4 

Now switch the bottom two rows.

1   0 0

3 2 0

Then take 4 times the bottom row and add to  1   0 0 Next multiply the second row by 3



1   0 0 Now take 2 times the second row and add to  1   0 0

 0  4  3

1 1 1

the top two.  3 0 2  2 0 1  0 1 3 3 1 0

 2  3  3

0 0 1

the top. 0 1 0

0 0 1

 3  3  3

Therefore, the solution is x = y = z = 3. How do you know when to stop? You certainly should stop doing row operations if you have gotten a matrix in row reduced echelon form described next. The leading entry of a row is the first nonzero row encountered when starting at the left entry and moving from left to right along the row. Definition 2.2.1 An augmented matrix is in row reduced echelon form if 1. All nonzero rows are above any rows of zeros. 2. Each leading entry of a row is in a column to the right of the leading entries of any rows above it. 3. All entries in a column above and below a leading entry are zero. 4. Each leading entry is a 1, the only nonzero entry in its column. Echelon form means that the leading entries of successive rows fall from upper left to lower right. Example 2.2.2 Here are some matrices which are in row reduced echelon form. 

1 0  0 0    0 0 0 0

0 1 0 0

5 2 0 0

8 7 0 0

0 0 1 0





     ,    

1 0 0 0 0

0 1 0 0 0

0 0 1 0 0

0 0 0 1 0

    .   

36

CHAPTER 2. SYSTEMS OF LINEAR EQUATIONS

Example 2.2.3 Here are matrices in echelon form which which are in echelon form.    1 1 0 6 5 8 2  0  0 0 2 2 7 3      0  ,  0 0 0 0 0 1     0 0 0 0 0 0 0 0

are not in row reduced echelon form but

3 2 0 0 0

5 0 3 0 0

4 7 0 1 0

       

Example 2.2.4 Here are some matrices which are not in echelon form.

0 0 0 0        0 2 3 3
1 2 3 3        1 5 0 2        1 2  3
0 1 0 2   ,    7 5 0 1   ,    2 4 −6
0 0 0 1        0 0 1 0        4 0  7
0 0 0 0

The following is the algorithm for obtaining a matrix which is in row reduced echelon form.

Algorithm 2.2.5 This algorithm tells how to start with a matrix and do row operations on it in such a way as to end up with a matrix in row reduced echelon form.

1. Find the first nonzero column from the left. This is the first pivot column. The position at the top of the first pivot column is the first pivot position. Switch rows if necessary to place a nonzero number in the first pivot position.
2. Use row operations to zero out the entries below the first pivot position.
3. Ignore the row containing the most recent pivot position identified and the rows above it. Repeat steps 1 and 2 on the remaining sub-matrix, the rectangular array of numbers obtained from the original matrix by deleting the rows you just ignored. Repeat the process until there are no more rows to modify. The matrix will then be in echelon form.
4. Moving from right to left, use the nonzero elements in the pivot positions to zero out the elements in the pivot columns which are above the pivots.
5. Divide each nonzero row by the value of the leading entry. The result will be a matrix in row reduced echelon form.

Sometimes there is no solution to a system of equations. When this happens, the system is said to be inconsistent. Here is another example based on the use of row operations.

Example 2.2.6 Give the complete solution to the system of equations, 3x − y − 5z = 9, y − 10z = 0, and −2x + y = −6. The augmented matrix of this system is

 3 −1  −5   9
 0  1 −10   0
−2  1   0  −6


After doing row operations to obtain row reduced echelon form,

1 0  −5  3
0 1 −10  0
0 0   0  0

The equations corresponding to this reduced echelon form are y = 10z and x = 3 + 5z. Apparently z can equal any number. Let's call this number t; in this context t is called a parameter. Therefore, the solution set of this system is x = 3 + 5t, y = 10t, and z = t where t is completely arbitrary. The system has an infinite set of solutions which are given in the above simple way. This is what it is all about, finding the solutions to the system. In summary,

Definition 2.2.7 A system of linear equations is a list of equations,

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm

where the aij and the bi are numbers. The above is a system of m equations in the n variables x1 , x2 , · · · , xn . Nothing is said about the relative size of m and n. Written more simply in terms of summation notation, the above can be written in the form

 n
 ∑ aij xj = bi ,  i = 1, 2, 3, · · · , m
j=1

It is desired to find (x1 , · · · , xn ) solving each of the equations listed. As illustrated above, such a system of linear equations may have a unique solution, no solution, or infinitely many solutions and these are the only three cases which can occur for any linear system. Furthermore, you do exactly the same things to solve any linear system. You write the augmented matrix and do row operations until you get a simpler system in which it is possible to see the solution, usually obtaining a matrix in echelon or reduced echelon form. All is based on the observation that the row operations do not change the solution set. You can have more equations than variables, fewer equations than variables, etc. It doesn’t matter. You always set up the augmented matrix and go to work on it. Definition 2.2.8 A system of linear equations is called consistent if there exists a solution. It is called inconsistent if there is no solution. These are reasonable words to describe the situations of having or not having a solution. If you think of each equation as a condition which must be satisfied by the variables, consistent would mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean there is no choice of the variables which can satisfy each of the conditions.
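To make the trichotomy just described concrete, here is a small Python sketch of mine (not code from the text) which row reduces an augmented matrix using exact rational arithmetic and then reports which of the three cases occurs. Run on the system of Example 2.2.6 it reproduces the row reduced echelon form found above and reports infinitely many solutions with the third variable free.

from fractions import Fraction

def rref(aug):
    """Row reduce an augmented matrix (a list of rows) with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in aug]
    m, n = len(A), len(A[0])
    pivots, r = [], 0
    for c in range(n - 1):                 # the last column is the right hand side
        pivot = next((i for i in range(r, m) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        A[r] = [x / A[r][c] for x in A[r]]           # make the leading entry 1
        for i in range(m):
            if i != r and A[i][c] != 0:              # zero out the rest of the column
                A[i] = [x - A[i][c] * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots

A, pivots = rref([[3, -1, -5, 9],
                  [0, 1, -10, 0],
                  [-2, 1, 0, -6]])
inconsistent = any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in A)
if inconsistent:
    print("inconsistent")
elif len(pivots) == len(A[0]) - 1:
    print("unique solution")
else:
    print("infinitely many solutions; free columns:",
          [c for c in range(len(A[0]) - 1) if c not in pivots])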

2.3 Exercises

1. Here is an augmented matrix in which ∗ denotes an arbitrary number and  denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is the solution unique?

 ∗ ∗ ∗ ∗ | ∗
0  ∗ ∗ 0 | ∗
0 0  ∗ ∗ | ∗
0 0 0 0  | 

2. Here is an augmented matrix in which ∗ denotes an arbitrary number and  denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is the solution unique?

 ∗ ∗ | ∗
0  ∗ | ∗
0 0  | ∗

3. Here is an augmented matrix in which ∗ denotes an arbitrary number and  denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is the solution unique?

 ∗ ∗ ∗ ∗ | ∗
0  0 ∗ 0 | ∗
0 0 0  ∗ | ∗
0 0 0 0  | 

4. Here is an augmented matrix in which ∗ denotes an arbitrary number and  denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is the solution unique?

 ∗ ∗ ∗ ∗ | ∗
0  ∗ ∗ 0 | ∗
0 0 0 0  | 0
0 0 0 0  | 

5. Suppose a system of equations has fewer equations than variables. Must such a system be consistent? If so, explain why and if not, give an example which is not consistent.

6. If a system of equations has more equations than variables, can it have a solution? If so, give an example and if not, tell why not.

7. Find h such that

2 h | 4
3 6 | 7

is the augmented matrix of an inconsistent system.

8. Find h such that

1 h | 3
2 4 | 6

is the augmented matrix of a consistent system.

9. Find h such that

1 1 | 4
3 h | 12

is the augmented matrix of a consistent system.

10. Choose h and k such that the augmented matrix shown has one solution. Then choose h and k such that the system has no solutions. Finally, choose h and k such that the system has infinitely many solutions.

1 h | 2
2 4 | k


11. Choose h and k such that the augmented matrix shown has one solution. Then choose h and k such that the system has no solutions. Finally, choose h and k such that the system has infinitely many solutions.

1 2 | 2
2 h | k

12. Find the solution in Z5 to the following system of equations.

x + 2y + z − w = 2
x − y + z + w = 1
2x + y − z = 1
4x + 2y + z = 0

13. Find the solution to the following system in Z5 .

x + 2y + z − w = 2
x − y + z + w = 0
2x + y − z = 1
4x + 2y + z = 3

14. Find the general solution of the system whose augmented matrix is

1 2 0 2
1 3 4 2
1 0 2 1

Find solutions in Z7 .

15. Find the general solution of the system whose augmented matrix is

1 2 0 2
2 0 1 1
3 2 1 3

in Z7 .

16. Find the general solution in Z3 of the system whose augmented matrix is

2 1 0 1
1 0 1 2

17. Solve the system whose augmented matrix is

1 0 2 1 1 2
0 1 0 1 2 1
1 2 0 0 1 0
1 0 1 0 2 2

in Z3 .

18. Find the general solution of the system whose augmented matrix is

1  0 2 1 1 2
0  1 0 1 2 1
0  2 0 0 1 3
1 −1 2 2 2 0

Find the solutions to this one in Z5 .


19. Give the complete solution to the system of equations, 7x + 14y + 15z = 22, 2x + 4y + 3z = 5, and 3x + 6y + 10z = 13. 20. Give the complete solution to the system of equations, 3x − y + 4z = 6, y + 8z = 0, and −2x + y = −4. 21. Give the complete solution to the system of equations, 9x−2y+4z = −17, 13x−3y+6z = −25, and −2x − z = 3. 22. Give the complete solution to the system of equations, 65x+84y+16z = 546, 81x+105y+20z = 682, and 84x + 110y + 21z = 713. 23. Give the complete solution to the system of equations, 8x + 2y + 3z = −3, 8x + 3y + 3z = −1, and 4x + y + 3z = −9. 24. Give the complete solution to the system of equations, −8x + 2y + 5z = 18, −8x + 3y + 5z = 13, and −4x + y + 5z = 19. 25. Give the complete solution to the system of equations, 3x − y − 2z = 3, y − 4z = 0, and −2x + y = −2. 26. Give the complete solution to the system of equations, −9x + 15y = 66, −11x + 18y = 79 ,−x + y = 4, and z = 3. 27. Give the complete solution to the system of equations, −19x+8y = −108, −71x+30y = −404, −2x + y = −12, 4x + z = 14. 28. Consider the system −5x + 2y − z = 0 and −5x − 2y − z = 0. Both equations equal zero and so −5x + 2y − z = −5x − 2y − z which is equivalent to y = 0. Thus x and z can equal anything. But when x = 1, z = −4, and y = 0 are plugged in to the equations, it doesn’t work. Why? 29. Four times the weight of Gaston is 150 pounds more than the weight of Ichabod. Four times the weight of Ichabod is 660 pounds less than seventeen times the weight of Gaston. Four times the weight of Gaston plus the weight of Siegfried equals 290 pounds. Brunhilde would balance all three of the others. Find the weights of the four sisters. 30. The steady state temperature, u in a plate solves Laplace’s equation, ∆u = 0. One way to approximate the solution which is often used is to divide the plate into a square mesh and require the temperature at each node to equal the average of the temperature at the four adjacent nodes. This procedure is justified by the mean value property of harmonic functions. In the following picture, the numbers represent the observed temperature at the indicated nodes. Your task is to find the temperature at the interior nodes, indicated by x, y, z, and w. One of the equations is z = 14 (10 + 0 + w + x). 30

[Figure: a square mesh over the plate with interior nodes y, w, x, z; the observed boundary temperatures shown at the surrounding nodes are 30, 30, 20, 20, 0, 0, 10, 10.]

31. Consider the following diagram of four circuits.

[Circuit diagram: four loops carrying currents I1 , I2 , I3 , I4 ; voltage sources of 20 volts, 5 volts, and 10 volts; resistors of 3Ω, 1Ω, 5Ω, 2Ω, 1Ω, 6Ω, 1Ω, 4Ω, 3Ω, and 2Ω.]

Those jagged places denote resistors and the numbers next to them give their resistance in ohms, written as Ω. The breaks in the lines having one short line and one long line denote a voltage source which causes the current to flow in the direction which goes from the longer of the two lines toward the shorter along the unbroken part of the circuit. The current in amps in the four circuits is denoted by I1 , I2 , I3 , I4 and it is understood that the motion is in the counter clockwise direction; if Ik ends up being negative, then it just means it moves in the clockwise direction. Then Kirchhoff's law states that

The sum of the resistance times the amps in the counter clockwise direction around a loop equals the sum of the voltage sources in the same direction around the loop.

Find I1 , I2 , I3 , I4 .

32. Consider the following diagram of four circuits.

[Circuit diagram: loops carrying currents I1 , I2 , I3 ; voltage sources of 12 volts and 10 volts; resistors of 3Ω, 7Ω, 5Ω, 2Ω, 3Ω, 1Ω, 2Ω, 4Ω, and 4Ω.]

Those jagged places denote resistors and the numbers next to them give their resistance in ohms, written as Ω. The breaks in the lines having one short line and one long line denote a voltage source which causes the current to flow in the direction which goes from the longer of the two lines toward the shorter along the unbroken part of the circuit. The current in amps in the four circuits is denoted by I1 , I2 , I3 and it is understood that the motion is in the counter clockwise direction. If Ik ends up being negative, then it just means the current flows in the clockwise direction. Then Kirchhoff’s law states that The sum of the resistance times the amps in the counter clockwise direction around a loop equals the sum of the voltage sources in the same direction around the loop. Find I1 , I2 , I3 .


Chapter 3

Vector Spaces

It is time to consider the idea of an abstract vector space, which is something which has two operations satisfying the following vector space axioms.

Definition 3.0.1 A vector space is an Abelian group of “vectors” satisfying the axioms of an Abelian group,

v + w = w + v, the commutative law of addition,
(v + w) + z = v + (w + z) , the associative law for addition,
v + 0 = v, the existence of an additive identity,
v + (−v) = 0, the existence of an additive inverse,

along with a field of “scalars” F which are allowed to multiply the vectors according to the following rules. (The Greek letters denote scalars.)

α (v + w) = αv + αw,    (3.1)
(α + β) v = αv + βv,    (3.2)
α (βv) = (αβ) v,    (3.3)
1v = v.    (3.4)

For example, any field is a vector space having field of scalars equal to the field itself. The field of scalars is often R or C and the vector space will be called real or complex depending on whether the field is R or C. However, other fields are also possible. For example, one could use the field of rational numbers or even the field of the integers mod p for p a prime. A vector space is also called a linear space. These axioms do not tell us anything about what is being considered. Nevertheless, one can prove some fundamental properties just based on these vector space axioms.

Proposition 3.0.2 In any vector space, 0 is unique, −x is unique, 0x = 0, and (−1) x = −x.

Proof: Suppose 0′ is also an additive identity. Then for 0 the additive identity in the axioms, 0′ = 0′ + 0 = 0. Next suppose x + y = 0. Then add −x to both sides.

−x = −x + (x + y) = (−x + x) + y = 0 + y = y


Thus if y acts like the additive inverse, it is the additive inverse.

0x = (0 + 0) x = 0x + 0x

Now add −0x to both sides. This gives 0 = 0x. Finally,

(−1) x + x = (−1) x + 1x = (−1 + 1) x = 0x = 0

By the uniqueness of the additive inverse shown earlier, (−1) x = −x. 

If you are interested in considering other fields, you should have some examples other than C, R, Q. Some of these are discussed in the following exercises. If you are happy with only considering R and C, skip these exercises. Here is an important example which gives the typical vector space.

Example 3.0.3 Let Ω be a nonempty set and define V to be the set of functions defined on Ω. Letting a, b, c be scalars coming from a field F and f, g, h functions, the vector operations are defined as

(f + g) (x) ≡ f (x) + g (x) ,  (af ) (x) ≡ a (f (x))

Then this is an example of a vector space. Note that the set where the functions have their values can be any vector space having field of scalars F. To verify this, check the axioms. (f + g) (x) = f (x) + g (x) = g (x) + f (x) = (g + f ) (x) Since x is arbitrary, f + g = g + f . ((f + g) + h) (x) ≡ (f + g) (x) + h (x) = (f (x) + g (x)) + h (x) = f (x) + (g (x) + h (x)) = (f (x) + (g + h) (x)) = (f + (g + h)) (x) and so (f + g) + h = f + (g + h) . Let 0 denote the function which is given by 0 (x) = 0. Then this is an additive identity because (f + 0) (x) = f (x) + 0 (x) = f (x) and so f + 0 = f . Let −f be the function which satisfies (−f ) (x) ≡ −f (x) . Then (f + (−f )) (x) ≡ f (x) + (−f ) (x) ≡ f (x) + −f (x) = 0 Hence f + (−f ) = 0. ((a + b) f ) (x) ≡ (a + b) f (x) = af (x) + bf (x) ≡ (af + bf ) (x) and so (a + b) f = af + bf . (a (f + g)) (x) ≡ a (f + g) (x) ≡ a (f (x) + g (x)) = af (x) + bg (x) ≡ (af + bg) (x) and so a (f + g) = af + bg. ((ab) f ) (x) ≡ (ab) f (x) = a (bf (x)) ≡ (a (bf )) (x) so (abf ) = a (bf ). Finally (1f ) (x) ≡ 1f (x) = f (x) so 1f = f . As above, F will be a field. It illustrates the important example of Fn , a vector space with field of scalars F. It is a case of the above general consideration involving functions. Indeed, you simply let Ω = {1, 2, · · · , n}. We write such a function f : {1, 2, · · · , n} → F in as an ordered list of numbers (f (1) , · · · , f (n)). The definition, incorporating the usual notation is as follows.


Definition 3.0.4 Define Fn ≡ {(x1 , · · · , xn ) : xj ∈ F for j = 1, · · · , n} . (x1 , · · · , xn ) = (y1 , · · · , yn ) if and only if for all j = 1, · · · , n, xj = yj . When (x1 , · · · , xn ) ∈ Fn , it is conventional to denote (x1 , · · · , xn ) by the single bold face letter x. The numbers, xj are called the coordinates. Elements in Fn are called vectors. The set {(0, · · · , 0, t, 0, · · · , 0) : t ∈ R} for t in the ith slot is called the ith coordinate axis. The point 0 ≡ (0, · · · , 0) is called the origin. Note that this can be considered as the set of F valued functions defined on (1, 2, · · · , n) . When the ordered list (x1 , · · · , xn ) is considered, it is just a way to say that f (1) = x1 , f (2) = x2 and so forth. Thus it is a case of the typical example of a vector space mentioned above.
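The identification of Fn with F valued functions on {1, 2, · · · , n} can be made very literal on a computer. The sketch below is my own illustration, under the assumption F = Q; a vector is stored as the function i → xi and the operations of Example 3.0.3 are applied pointwise.

from fractions import Fraction

n = 3
omega = range(1, n + 1)          # the index set {1, ..., n}

def as_function(tup):
    """Turn an ordered list (x1, ..., xn) into the function i -> xi."""
    return {i: Fraction(tup[i - 1]) for i in omega}

def add(f, g):
    return {i: f[i] + g[i] for i in omega}          # (f + g)(i) = f(i) + g(i)

def scale(a, f):
    return {i: Fraction(a) * f[i] for i in omega}   # (af)(i) = a f(i)

x = as_function([1, 2, 3])
y = as_function([4, 5, 6])
print(add(x, y))     # {1: 5, 2: 7, 3: 9}, i.e. the vector (5, 7, 9)
print(scale(2, x))   # the vector (2, 4, 6)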

3.1 Linear Combinations of Vectors, Independence

The fundamental idea in linear algebra is the following notion of a linear combination. Definition 3.1.1 Let x1 , · · · , xn be vectors∑in a vector space. A finite linear combination of these n vectors is a vector which is of the form j=1 aj xj where the aj are scalars. In short, it is a sum of scalars times vectors. span (x1 , · · · , xn ) denotes the set of all linear combinations of the vectors x1 , · · · , xn . More generally, if S is any set of vectors, span (S) consists of all finite linear combinations of vectors from S. Definition 3.1.2 Let (V, F) be a vector space and its field of scalars. Then S ⊆ V is said to be linearly independent if whenever ∑n{v1 , · · · , vm } ⊆ V with the vi distinct, then there is only one way to have a linear combination i=1 ci vi = 0 and this is to have each ci = 0. More succinctly, if ∑n c v = 0 then each c = 0. A set S ⊆ V is linearly dependent if it is not linearly i i=1 i i ∑nindependent. That is, there is some subset of S {v1 , · · · , vn } and scalars ci not all zero such that i=1 ci vi = 0. The following is a useful equivalent description of what it means to be independent. Proposition 3.1.3 A set of vectors S is independent if and only if no vector is a linear combination of the others. ∑ Proof: ⇒ Suppose S is linearly independent. Could you have for some {u1 , · · · , ur } ui = ∑j̸=i cj uj ? No. This is not possible because if the above holds, then you would have 0 = (−1) ui + j̸=i cj uj in contradiction to the assumption that {u1 , · · · , ur } is linearly independent. ∑n ⇐ Suppose now that no vector in S is a linear combination of the others. Suppose i=1 ci ui = 0 where each ui ∈ S. It is desired to show that whenever this any of ∑n happens, each ci = 0. Could ∑ the ci be non zero? No. If ck ̸= 0, then you would have i=1 ccki ui = 0 and so uk = i̸=k − ccki ui showing that one can obtain uk as a linear combination of the other vectors after all. It follows that all ci = 0 and so {u1 , · · · , ur } is linearly independent.  Example 3.1.4 Determine whether the real valued functions defined on R given by the polynomials x2 + 2x + 1, x2 + 2x, x2 + x + 1 are independent with field of scalars R. ( ) ( ) ( ) Suppose a x2 + 2x + 1 + b x2 + 2x + c x2 + x + 1 = 0 then differentiate both sides to obtain a (2x + 2) + b (2x + 2) + c (2x + 1) = 0. Now differentiate again. This yields 2a + 2b + 2c = 0. In the second equation, let x = −1. Then −c = 0 so c = 0. Thus ( ) ( ) a x2 + 2x + 1 + b x2 + 2x = 0 a+b

= 0


Now let x = 0 in the top equation to find that a = 0. Then from the bottom equation, it follows that b = 0 also. Thus the three functions are linearly independent. The main theorem is the following, called the replacement or exchange theorem. It uses the argument of the second half of the above proposition repeatedly. Theorem 3.1.5 Let {u1 , · · · , ur } , {v1 , · · · , vs } be subsets of a vector space V with field of scalars F and suppose {u1 , · · · , ur } is linearly independent and each ui ∈ span (v1 , · · · , vs ) . Then r ≤ s. In words, linearly independent sets are no longer than spanning sets. ∑ Proof: Say r > s. By assumption, u1 = i bi vi . Not all of the bi can equal 0 because if this were so, you would have u1 = 0 which would violate the assumption that {u1 , · · · , ur } is linearly independent. You could write 1u1 + 0u2 + · · · + 0ur = 0 since u1 = 0. Thus some vi say vi1 is a linear combination of the vector u1 along with the vj for j ̸= i. It follows that the span of {u1 , v1 , · · · , vˆi1 , · · · , vn } includes each of the ui where the hat indicates that vi1 has been omitted from the list of vectors. Now suppose each ui is in span (u1 · · · , uk , v1 , · · · , vˆi1 , · · · , vˆik · · · , vs ) where the vectors vˆi1 , · · · , vˆik have been omitted for k ≤ s. Then there are scalars ci and di such that k ∑ ∑ uk+1 = ci ui + dj v j i=1

j ∈{i / 1 ,··· ,ik }

By the assumption that {u1 , · · · , ur } is linearly independent, not all of the dj can equal 0. Why? Therefore, there exists ik+1 ∈ / {i1 , · · · , ik } such that dik ̸= 0. Hence one can solve for vik+1 as a linear combination of {u1 , · · · , ur } and the vj for j ∈ / {i1 , · · · , ik , ik+1 }. Thus we can replace this vik+1 by a linear combination of these vectors, and so the uj are in ( ) span u1 , · · · , uk , uk+1 , v1 , · · · , vˆi1 , · · · , vˆik , vˆik+1 , · · · , vs Continuing this replacement process, it follows that since r > s, one can eliminate all of the vectors {v1 , · · · , vs } and obtain that the ui are contained in span (u1 , · · · , us ) . But this is impossible because then you would have us+1 ∈ span (u1 , · · · , us ) which is impossible since these vectors {u1 , · · · , ur } are linearly independent. It follows that r ≤ s.  Next is the definition of dimension and basis of a vector space. Definition 3.1.6 Let V be a vector space with field of scalars F. A subset S of V is a basis for V means that 1. span (S) = V 2. S is linearly independent. The plural of basis is bases. It is this way to avoid hissing when referring to it. The dimension of a vector space is the number of vectors in a basis. A vector space is finite dimensional if it equals the span of some finite set of vectors. Lemma 3.1.7 Let S be a linearly independent set of vectors in a vector space V . Suppose v ∈ / span (S) . Then {S, v} is also a linearly independent set of vectors. ∑n Proof: Suppose {u1 , · · · , un , v} is a finite subset of S and av + i=1 bi ui = 0 where a, b1 , · · · , bn are scalars. Does it follow that each of the bi equals zero and that a = 0? If so, then this shows that ∑ {S, v} is indeed linearly independent. First note that a = 0 since if not, you could write n / span (S). Hence you have a = 0 and also v∑= i=1 − bai ui contrary to the assumption that v ∈ b u = 0. But S is linearly independent and so by assumption each bi = 0.  i i i
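Returning for a moment to Example 3.1.4, independence of those three polynomials can also be checked by writing each one in terms of 1, x, x2 and computing the rank of the coefficient matrix: the polynomials are independent exactly when the rank equals the number of polynomials. Here is a minimal Python sketch of mine illustrating this; the helper name is made up.

from fractions import Fraction

# coefficient vectors (constant, x, x^2) of the polynomials in Example 3.1.4
polys = [[1, 2, 1],   # x^2 + 2x + 1
         [0, 2, 1],   # x^2 + 2x
         [1, 1, 1]]   # x^2 + x + 1

def rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    A = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(A[0])):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            factor = A[i][c] / A[r][c]
            A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
    return r

print(rank(polys) == len(polys))   # True: the three polynomials are linearly independent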


Proposition 3.1.8 Let V be a finite dimensional nonzero vector space with field of scalars F. Then it has a basis and also any two bases have the same number of vectors so the above definition of a basis is well defined. Proof: Pick u1 ̸= 0. If span (u1 ) = V, then this is a basis. If not, there exists u2 ∈ / span (u1 ). Then by Lemma 3.1.7, {u1 , u2 } is linearly independent. If span (u1 , u2 ) = V, stop. You have a basis. Otherwise, there exists u3 ∈ / span (u1 , u2 ) . Then by Lemma 3.1.7, {u1 , u2 , u3 } is linearly independent. Continue this way. Eventually the process yields {u1 , · · · , un } which is linearly independent and span (u1 , · · · , un ) = V. Otherwise there would exist a linearly independent set of k vectors for all k. However, by assumption, there is a finite set of vectors {v1 , · · · , vs } such that span (v1 , · · · , vs ) = V . Therefore, k ≤ s. Thus there is a basis for V . If {v1 , · · · , vs } , {u1 , · · · , ur } are two bases, then since they both span V and are both linearly independent, it follows from Theorem 3.1.5 that r ≤ s and s ≤ r.  As a specific example, consider Fn as the vector space. As mentioned above, these are the mappings from (1, · · · , n) to the field F. It was shown in Example 3.0.3 that this is indeed a vector space with field of scalars F. We usually think of this Fn as the set of ordered n tuples {(x1 , · · · , xn ) : xi ∈ F} with addition and scalar mutiplication defined as (x1 , · · · , xn ) + (ˆ x1 , · · · , x ˆn ) = (x1 + x ˆ 1 , · · · , xn + x ˆn ) α (x1 , · · · , xn ) = (αx1 , · · · , αxn ) Also, when referring to vectors in Fn , it is customary to denote them as bold faced letters. It is more convenient to write these vectors in Fn as columns of numbers rather than as rows as done earlier. Thus   x1 )T  .  ( ..  ≡ x1 · · · xn x=   xn Observation 3.1.9 Fn has dimension n. To see this, note that a basis is e1 , · · · , en where ei ≡

( 0 · · · 1 · · · 0 )T

the vector in Fn which has a 1 in the ith position and a zero everywhere else. To see this, note that

( x1 x2 · · · xn )T = x1 e1 + x2 e2 + · · · + xn en

and that if 0 = x1 e1 + x2 e2 + · · · + xn en then

( x1 x2 · · · xn )T = ( 0 0 · · · 0 )T

so each xi is zero. Thus this set of vectors is a spanning set and is linearly independent so it is a basis. There are n of these vectors and so the dimension of Fn is indeed n. There is a fundamental observation about linear combinations of vectors in Fn which is stated next.


Theorem 3.1.10 Let a1 , · · · , an be vectors in Fm where m < n. Then there exist scalars x1 , · · · , xn not all equal to zero such that x1 a1 + · · · + xn an = 0. Proof: If the conclusion were not so, then by definition, {a1 , · · · , an } would be independent. However, there is a spanning set with only m vectors, namely {e1 , · · · , em } contrary to Theorem 3.1.5. Since these vectors cannot be independent, they must be dependent which is the conclusion of the theorem. 
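As a concrete instance of Theorem 3.1.10, any four vectors in R3 admit a nontrivial linear combination equal to zero. The sketch below is my own illustration with made up vectors, and it assumes the SymPy library is available; nullspace() returns a basis for the solutions of Ax = 0, and any member supplies the required scalars.

# Theorem 3.1.10 in action: four vectors in R^3 must be dependent.
from sympy import Matrix

a1, a2, a3, a4 = [1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 3, 1]
A = Matrix.hstack(*(Matrix(v) for v in (a1, a2, a3, a4)))  # columns are the vectors

x = A.nullspace()[0]     # a nonzero solution of x1 a1 + x2 a2 + x3 a3 + x4 a4 = 0
print(x.T, (A * x).T)    # the scalars, and the zero vector as a check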

3.2 Subspaces

The notion of a subspace is of great importance in applications. Here is what is meant by a subspace. Definition 3.2.1 Let V be a vector space with field of scalars F. Then let W ⊆ V, W ̸= ∅. That is, W is a non-empty subset of V . Then W is a subspace of V if whenever α, β are scalars and u, v are vectors in W, it follows that αu + βv ∈ W. In words, W is closed with respect to linear combinations. The fundamental result about subspaces is that they are themselves vector spaces. Theorem 3.2.2 Let W be a non-zero subset of V a vector space with field of scalars F. Then it is a subspace if and only if it is itself a vector space with field of scalars F. Proof: Suppose W is a subspace. Why is it a vector space? To be a vector space, the operations of addition and scalar multiplication must satisfy the axioms for a vector space. However, all of these are obvious because it is a subset of V . The only thing which is not obvious is whether 0 is in W and whether −u ∈ W whenever u is. But these follow right away from Proposition 3.0.2 because if u ∈ W, (−1) u = −u ∈ W by the fact that W is closed with respect to linear combinations, in particular multiplication by the scalar −1. Similarly, take u ∈ W. Then 0 = 0u ∈ W. As to + being an operation on W, this also follows because for u, v ∈ W, u + v ∈ W . Thus if it is a subspace, it is indeed a vector space. Conversely, suppose it is a vector space. Then by definition, it is closed with respect to linear combinations and so it is a subspace.  This leads to the following simple result. Proposition 3.2.3 Let W be a nonzero subspace of a finite dimensional vector space V with field of scalars F. Then W is also a finite dimensional vector space. Proof: Suppose span (v1 , · · · , vn ) = V . Using the same construction of Proposition 3.1.8, the same process must stop after k ≤ n steps since otherwise one could obtain a linearly independent set of vectors with more vectors in it than a spanning set. Thus it has a basis with no more than n vectors.  { } Example 3.2.4 Show that W = (x, y, z) ∈ R3 : x − 2y − z = 0 is a subspace of R3 . Find a basis for it. You have from the equation that x = 2y + z  2y + z  y  z

and so any vector in this set is of the form

( 2y + z, y, z )T ,  y, z ∈ R

Conversely, any vector which is of the above form satisfies the condition to be in W . Therefore, W is of the form

y ( 2, 1, 0 )T + z ( 1, 0, 1 )T


where y, z are scalars. Hence it equals the span of the two vectors in R3 in the above. Are the two vectors linearly independent? If so, they will be a basis. Suppose then that

y ( 2, 1, 0 )T + z ( 1, 0, 1 )T = ( 0, 0, 0 )T

Then from the second position, y = 0. It follows then that z = 0 also and so the two vectors form a linearly independent set. Hence a basis for W is

{ ( 2, 1, 0 )T , ( 1, 0, 1 )T }

The dimension of this subspace is also 2.

Example 3.2.5 Show that

( 1, 1, 1 )T , ( 1, 3, 3 )T , ( 0, 1, 4 )T

is a basis for R3 . There are two things to show, that the set of vectors is independent and that it spans R3 . Thus we need to verify that there is exactly one solution to the system of equations

x ( 1, 1, 1 )T + y ( 1, 3, 3 )T + z ( 0, 1, 4 )T = ( a, b, c )T

for any choice of the right side. Recall how to do this. You set up the augmented matrix and then row reduce it.

1 1 0 a
1 3 1 b
1 3 4 c

After some row operations, this yields

1 0 0  (3/2)a − (2/3)b + (1/6)c
0 1 0  −(1/2)a + (2/3)b − (1/6)c
0 0 1  −(1/3)b + (1/3)c

Thus there is a unique solution to the system of equations. This shows that the set of vectors is a basis because one solution when the right side of the system equals the zero vector is x = y = z = 0. Therefore, from what was just done, it is the only solution and so the vectors are linearly independent. As to the span of the vectors equalling R3 , this was just shown also. Example 3.2.6 Show that

is not a basis for R3 .

     1 1 1        1 , 1 , 1  −4 3 1 


You can do it the same way. It is really a question about whether there exists a unique solution to the system

x ( 1, 1, 1 )T + y ( 1, 1, 3 )T + z ( 1, 1, −4 )T = ( a, b, c )T

for any choice of the right side. The augmented matrix is

1 1  1  a
1 1  1  b
1 3 −4  c

After row reduction, this yields

1 1  1  a
0 2 −5  c − a
0 0  0  b − a

Thus there is no solution to the equation unless b = a. It follows the span of the given vectors is not all of R3 and so this cannot be a basis. Example 3.2.7 Show that



   1 1      1 , 1  1 3

is not a basis for R3 . If the span of these vectors were all of R3 , this would contradict Theorem 3.1.5 because it would be a spanning set which is shorter than a linearly independent set {e1 , e2 , e3 }. Example 3.2.8 Show that

       1 1 1 1          1 , 1  , 0 , 1  1 0 3 1 

is not a basis for R3 . If it were a basis, then it would need to be linearly independent but this cannot happen because it would contradict Theorem 3.1.5 by being an independent set of vectors which is longer than a spanning set. Theorem 3.2.9 If V is an n dimensional vector space and if {u1 , · · · , un } is a linearly independent set, then it is a basis. If m > n then {v1 , · · · , vm } is a dependent set. If V = span (w1 , · · · , wm ) , then m ≥ n and there is a subset {u1 , · · · , un } ⊆ {w1 , · · · , wm } such that {u1 , · · · , un } is a basis. If {u1 , · · · , uk } is linearly independent, then there exists {u1 , · · · , uk , · · · , un } which is a basis. Proof: Say {u1 , · · · , un } is linearly independent. Is span (u1 , · · · , un ) = V ? If not, there would be w∈ / span (u1 , · · · , un ) and then by Lemma 3.1.7 {u1 , · · · , un , w} would be linearly independent which contradicts Theorem 3.1.5. As to the second claim, {v1 , · · · , vm } cannot be linearly independent because this would contradict Theorem 3.1.5 and so it is dependent. Now say V = span (w1 , · · · , wm ). By Theorem 3.1.5 again, you must have m ≥ n since spanning sets are at least as long as linearly independent sets, one of which is a basis having n vectors. If w1 is in the span of the other vectors, delete it. Then consider w2 . If it is in the span of the other vectors, delete it. Continue this way till a shorter list is obtained with the property that no vector is a linear combination of the others, but its span is still V . By Proposition 3.1.3, the resulting list of vectors is linearly independent and is therefore, a basis since it spans V . Now suppose for k < n, {u1 , · · · , uk } is linearly independent. Follow the process of Proposition 3.1.8, adding in vectors not in the span and obtaining successively larger linearly independent sets till the process ends. The resulting list must be a basis. 

3.3 Exercises

1. Show that the following are subspaces of the set of all functions defined on [a, b] . (a) polynomials of degree ≤ n (b) polynomials (c) continuous functions (d) differentiable functions 2. Show that every subspace of a finite dimensional vector space V is the span of some vectors. It was done above but go over it in your own words. 3. In R2 define a funny addition by (x, y) + (ˆ x, yˆ) ≡ (3x + 3ˆ x, y + yˆ) and let scalar multiplication be the usual thing. Would this be a vector space with these operations? 4. Determine which of the following are subspaces of Rm for some m. a, b are just given numbers in what follows. { } (a) (x, y) ∈ R2 : ax + by = 0 { } (b) (x, y) ∈ R2 : ax + by ≥ y { } (c) (x, y) ∈ R2 : ax + by = 1 { } (d) (x, y) ∈ R2 : xy = 0 { } (e) (x, y) ∈ R2 : y ≥ 0 { } (f) (x, y) ∈ R2 : x > 0 or y > 0 (g) For those who recall the cross product, { } x ∈ R3 : a × x = 0 (h) For those who recall the dot product, {x ∈ Rm : x · a = 0} (i) {x ∈ Rn : x · a ≥ 0} (j) {x ∈ Rm : x · s = 0 for all s ∈ S, S ̸= ∅, S ⊆ Rm } . This is known as S ⊥ . { } 5. Show that (x, y, z) ∈ R3 : x + y − z = 0 is a subspace and find a basis for it. { } 6. In the subspace of polynomials on [0, 1] , show that the vectors 1, x, x2 , x3 are linearly independent. Show these vectors are a basis for the vector space of polynomials of degree no more than 3. 7. Determine whether the real valued functions defined on R { 2 } x + 1, x3 + 2x2 + x, x3 + 2x2 − 1, x3 + x2 + x are linearly independent. Is this a basis for the subspace of polynomials of degree no more than 3? Explain why or why not. 8. Determine whether the real valued functions defined on R { 2 } x + 1, x3 + 2x2 + x, x3 + 2x2 + x, x3 + x2 + x are linearly independent. Is this a basis for the subspace of polynomials of degree no more than 3? Explain why or why not.


CHAPTER 3. VECTOR SPACES 9. Show that the following are each a basis for R3 .       −1 2 3       (a)  2  ,  2  ,  −1  1 −1 −1       −2 3 4       (b)  0  ,  1  ,  1  2 −2 −2       −3 5 6       (c)  0  ,  1  ,  1  1 −1 −1       1 2 −1       (d)  2  ,  2  ,  −1  −1 −1 1

10. Show that each of the following is not a basis for R3 . Explain why they fail to be a basis.       3 0 1       (a)  1  ,  1  ,  5  5 1 1       0 3 1       (b)  −1  ,  1  ,  −1  1 5 1       1 0 3       (c)  2  ,  1  ,  0  1 1 5     1 1     (d)  0  ,  1  1 0         1 2 −1 1         (e)  2  ,  2  ,  −1  ,  0  −1 −1 1 0 11. Suppose B is a subset of the set of complex valued functions, none equal to 0 and defined on Ω and it has the property that if f, g are different, then f g = 0. Show that B must be linearly independent. 12. Suppose you have continuous real valued functions defined on [0, 1] , {f1 , f2 , · · · , fn } and these satisfy { ∫ 1 1 if i = j fi (x) fj (x) dx = δ ij ≡ 0 if i ̸= j 0 Show that these functions must be linearly independent. 13. Show that the real valued functions cos (2x) , 1, cos2 (x) are linearly dependent. 14. Show that the real valued functions ex sin (2x) , ex cos (2x) are linearly independent.

3.3. EXERCISES

53

15. Let the √ field of scalars be Q and let the vector space be all vectors (real numbers) of the form a + b 2 for a, b ∈ Q. Show that this really is a vector space and find a basis for it. ( ) ( ) 2 1 16. Consider the two vectors , in R2 . Show that these are linearly independent. 1 2 ( ) ( ) 2 1 Now consider , in Z23 where the numbers are interpreted as residue classes. Are 1 2 these vectors linearly independent? If not, give a nontrivial linear combination which is 0. 17. Is C a vector space with field of scalars R? If so, what is the dimension of this vector space? Give a basis. 18. Is C a vector space with field of scalars C? If so, what is the dimension? Give a basis. 19. The space of real valued continuous functions on [0, 1] usually denoted as C ([0, 1]) is a vector space with field of scalars R. Explain why it is not a finite dimensional vector space. 20. Suppose two vector spaces V, W have the same field of scalars F. Show that V ∩ W is a subspace of both V and W . 21. If V, W are two sub spaces of a vector space U , define V + W ≡ {v + w : v ∈ V, w ∈ W } . Show that this is a subspace of U . 22. If V, W are two sub spaces of a vector space U , consider V ∪ W, the vectors which are in either V or W. Will this be a subspace of U ? If so, prove it is the case and if not, give an example which shows that it is not necessarily true. 23. Let V, W be vector spaces. A function T : V → W is called a linear transformation if whenever α, β are scalars and u, v are vectors in V , it follows that T (αu + βv) = αT u + βT v. Then ker (T ) ≡ {u ∈ V : T u = 0} , Im (T ) ≡ {T u : u ∈ V } . Show the first of these is a subspace of V and the second is a subspace of W . 24. ↑In the situation of the above problem, where T is a linear transformation, suppose S is a linearly independent subset of W . Define T −1 (S) ≡ {u ∈ V : T u ∈ S} . Show that T −1 (S) is linearly independent. 25. ↑In the situation of the above problems, rank (T ) is defined as the dimension of Im (T ). Also the nullity of T , denoted as null (T ) is defined as the dimension of ker (T ). In this problem, you will show that if the dimension of V is n, then rank (T ) + null (T ) = n. (a) Let a basis for ker (T ) be {z1 , · · · , zr } . Let a basis for Im (T ) be {T v1 , · · · , T vs } . You need to show that r +s ∑ = n. Begin with u ∈ V and consider T u. It is a linear combination s of {T v1 , · · · , T vs } say i=1 ai T vi . Why? ∑s (b) Next∑explain why∑ T (u − i=1 ai vi ) = 0. Then explain why there are scalars bj such that s r u − i=1 ai vi = j=1 bj zj . (c) Observe that V = span (z1 , · · · , zr , v1 , · · · , vs ) . Why? (d) Finally show that {z1 , · · · , zr , v1 , · · · , vs } is linearly independent. Thus n = r + s.

54

CHAPTER 3. VECTOR SPACES

3.4

Polynomials and Fields

As an application of the theory of vector spaces, this section considers the problem of field extensions. When you have a polynomial like x2 − 3 which has no rational roots, it turns out you can enlarge the field of rational numbers to obtain a larger field such that this polynomial does have roots in this larger field. I am going to discuss a systematic way to do this. It will turn out that for any polynomial with coefficients in any field, there always exists a possibly larger field such that the polynomial has roots in this larger field. This book mainly features the field of real or complex numbers but this procedure will show how to obtain many other fields. The ideas used in this development are the same as those used later in the material on linear transformations but slightly easier. Here is an important idea concerning equivalence relations which I hope is familiar. If not, see Section 1.3 on page 5. Definition 3.4.1 Let S be a set. The symbol, ∼ is called an equivalence relation on S if it satisfies the following axioms. 1. x ∼ x

for all x ∈ S. (Reflexive)

2. If x ∼ y then y ∼ x. (Symmetric) 3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive) Definition 3.4.2 [x] denotes the set of all elements of S which are equivalent to x and [x] is called the equivalence class determined by x or just the equivalence class of x. Also recall the notion of equivalence classes. Theorem 3.4.3 Let ∼ be an equivalence class defined on a set, S and let H denote the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅. Definition 3.4.4 Let F be a field, for example the rational numbers, and denote by F (x) the polynomials having coefficients in F. Suppose p (x) is a polynomial. Let a (x) ∼ b (x) (a (x) is similar to b (x)) when a (x) − b (x) = k (x) p (x) for some polynomial k (x) . Denote by (p (x)) all polynomials of the form p (x) k (x) where k (x) is some polynomial. Proposition 3.4.5 In the above definition, ∼ is an equivalence relation. Proof: First of all, note that a (x) ∼ a (x) because their difference equals 0p (x) . If a (x) ∼ b (x) , then a (x)−b (x) = k (x) p (x) for some k (x) . But then b (x)−a (x) = −k (x) p (x) and so b (x) ∼ a (x). Next suppose a (x) ∼ b (x) and b (x) ∼ c (x) . Then a (x) − b (x) = k (x) p (x) for some polynomial k (x) and also b (x) − c (x) = l (x) p (x) for some polynomial l (x) . Then a (x) − c (x) = a (x) − b (x) + b (x) − c (x) = k (x) p (x) + l (x) p (x) = (l (x) + k (x)) p (x) and so a (x) ∼ c (x) and this shows the transitive law.  Definition 3.4.6 Let F be a field and let p (x) ∈ F (x) be a nonzero monic polynomial. This means that the coefficient of the highest power is 1. Also let p (x) have degree at least 1. For the similarity relation of Definition 3.4.4, define the following operations on the equivalence classes. [a (x)] is an equivalence class means that it is the set of all polynomials which are similar to a (x). [a (x)] + [b (x)] ≡ [a (x) + b (x)] [a (x)] [b (x)] ≡ [a (x) b (x)] This collection of equivalence classes is sometimes denoted by F (x) / (p (x)). This is called a quotient space.

3.4. POLYNOMIALS AND FIELDS

55

The set of equivalence classes just described is a commutative ring. This is like a field except it may fail to have multiplicative inverses. The reason for considering only polynomials of degree at least 1 is that F (x) / (1) isn’t very interesting because f (x) ∼ g (x) if and only if their difference is a multiple of 1. Thus every two polynomials are similar so there is only one similarity class. In particular, [1] ∼ [0] . It is shown below that this is well defined. Axiom 3.4.7 Here are the axioms for a commutative ring. 1. x + y = y + x, (commutative law for addition) 2. There exists 0 such that x + 0 = x for all x, (additive identity). 3. For each x ∈ F, there exists −x ∈ F such that x + (−x) = 0, (existence of additive inverse). 4. (x + y) + z = x + (y + z) ,(associative law for addition). 5. xy = yx, (commutative law for multiplication). You could write this as x × y = y × x. 6. (xy) z = x (yz) ,(associative law for multiplication). 7. There exists 1 such that 1x = x for all x,(multiplicative identity). 8. x (y + z) = xy + xz.(distributive law). Recall that p (x) is irreducible, means the only monic polynomials which divide it are 1 and itself. Lemma 3.4.8 With the equivalence classes defined in Definition 3.4.6 where p (x) is a monic polynomial of degree at least 1, 1. The operations are well defined. 2. F (x) / (p (x)) is a commutative ring 3. If a, b ∈ F and [a] = [b] , then a = b. Thus F can be considered a subset of F (x) / (p (x)) . 4. Also [q (x)] = 0 if and only if q (x) = p (x) l (x) for some polynomial l (x). 5. F (x) / (p (x)) is a field if and only if p (x) is also irreducible. Proof: 1.) To show the operations are well defined, suppose [a (x)] = [a′ (x)] , [b (x)] = [b′ (x)] It is necessary to show

[a (x) + b (x)] = [a′ (x) + b′ (x)] [a (x) b (x)] = [a′ (x) b′ (x)]

Consider the second of the two. a′ (x) b′ (x) − a (x) b (x) = a′ (x) b′ (x) − a (x) b′ (x) + a (x) b′ (x) − a (x) b (x) = b′ (x) (a′ (x) − a (x)) + a (x) (b′ (x) − b (x)) Now by assumption (a′ (x) − a (x)) is a multiple of p (x) as is (b′ (x) − b (x)) , so the above is a multiple of p (x) and by definition this shows [a (x) b (x)] = [a′ (x) b′ (x)]. The case for addition is similar. 2.) The various algebraic properties related to these operations are obvious and come directly from the definitions. 3.) Now suppose [a] = [b] . This means that a − b = k (x) p (x) for some polynomial k (x) . Then k (x) must equal 0 since otherwise the two polynomials a − b and k (x) p (x) could not be equal

56

CHAPTER 3. VECTOR SPACES

because they would have different degree. This is where it is important to have the degree of p (x) at least 1. 4.) [q (x)] = [0] means q (x) ∼ 0 which means q (x) = p (x) l (x) for some l (x). 5.) Suppose p (x) is irreducible. Let [q (x)] ∈ F (x) / (p (x)) where [q (x)] ̸= [0] . Then q (x) is not a multiple of p (x) and so q (x) , p (x) are relatively prime. This is because if ψ (x) is a monic polynomial which divides both q (x) and p (x) , then since p (x) is irreducible, ψ (x) equals either a multiple of p (x) which is given not to happen since [q (x)] ̸= 0 or ψ (x) = 1. Thus there exist n (x) , m (x) such that 1 = n (x) q (x) + m (x) p (x) Hence [1] = [n (x) q (x)] = [n (x)] [q (x)] −1

which shows that [q (x)] = [n (x)] . Now suppose p (x) is not irreducible. Then p (x) = l (x) k (x) where l (x) , k (x) have smaller degree than p (x). Then [0] = [l (x)] [k (x)] . Neither [l (x)] nor [k (x)] equals 0 because neither is a multiple of p (x) and this cannot happen in a field. Thus if p (x) is not irreducible, then F (x) / (p (x)) is not a field.  The following proposition is mostly a summary of the above lemma. Recall irreducible means the only monic polynomials which divide p (x) are itself and nonzero scalars. Proposition 3.4.9 In the situation of Definition 3.4.6 where p (x) is a nonzero monic, irreducible polynomial of degree at least 1, the following are valid. 1. The definitions of addition and multiplication are well defined. 2. If a, b ∈ F and [a] = [b] , then a = b. Thus F can be considered a subset of F (x) / (p (x)) . 3. F (x) / (p (x)) is a field in which the polynomial p (x) has a root. 4. F (x) / (p (x)) is a vector space with field of scalars F and its dimension is m where m is the degree of the irreducible polynomial p (x). Proof: 1.) This is shown in Lemma 3.4.8 as is 2.) and 3.) except for the part of 3.) which says p (x) has a root. The polynomial p (x) has a root in this field because if p (x) = xm + am−1 xm−1 + · · · + a1 x + a0 , m

m−1

[0] = [p (x)] = [x] + [am−1 ] [x]

+ · · · + [a1 ] [x] + [a0 ]

Thus [x] is a root of this polynomial in the field F (x) / (p (x)). Consider the last claim. It is clear that F (x) / (p (x)) is a vector space with field of scalars F. Indeed, the operations are defined such that for α, β ∈ F, α [r (x)] + β [b (x)] ≡ [αr (x) + βb (x)] It remains to consider the dimension of this vector space. Let f (x) ∈ F (x) / (p (x)) . Thus [f (x)] is a typical thing in F (x) / (p (x)). Then from the division algorithm, f (x) = p (x) q (x) + r (x) where r (x) is either 0 or has degree less than the degree of p (x) . Thus [r (x)] = [f (x) − p (x) q (x)] = [f (x)] ) ( ) m−1 m−1 but clearly [r (x)] ∈ span [1] , [x] , · · · , [x] . Thus span [1] , · · · , [x] = F (x) / (p (x)). Then { } m−1 [1] , [x] , · · · , [x] is a basis if these vectors are linearly independent. Suppose then that (

m−1 ∑ i=0

i

ci [x] =

[m−1 ∑ i=0

] ci x

i

=0

3.4. POLYNOMIALS AND FIELDS

57

∑m−1 Then you would need to have p (x) / i=0 ci xi which is impossible unless each ci = 0 because p (x) has degree m.  This shows how to enlarge a field to get a new one in which the polynomial has a root. By using a succession of such enlargements, called field extensions, there will exist a field in which the given polynomial can be factored into a product of polynomials having degree one. The field you obtain in this process of enlarging in which the given polynomial factors in terms of linear factors is called a splitting field. Remark 3.4.10 The polynomials consisting of all polynomial multiples of p (x) , denoted by (p (x)) is called an ideal. An ideal I is a subset of the commutative ring (Here the ring is F (x) .) with unity consisting of all polynomials which is itself a ring and which has the property that whenever f (x) ∈ F (x) , and g (x) ∈ I, f (x) g (x) ∈ I. In this case, you could argue that (p (x)) is an ideal and that the only ideal containing it is itself or the entire ring F (x). This is called a maximal ideal. Example 3.4.11 The polynomial x2 −2 is irreducible in Q (x) . This is because if x2 −2 = p (x) q (x) where p (x) , q (x) both have degree less than 2, then they both have degree 1. Hence you would have that a+b = 0 so(this factorization is of the form (x − a) (x + a) x2 −2 = (x + a) (x + b) which requires √ ) and now you need to have a = 2 ∈ / Q. Now Q (x) / x2 − 2 is of the form a + b [x] where a, b ∈ Q √ √ ( ) 2 and [x] − 2 = 0. Thus one can regard [x] as 2. Q (x) / x2 − 2 is of the form a + b 2. [ ] In the above example, x2 + x is not zero because it is not a multiple of x2 − 2. What is [ 2 ]−1 x + x ? You know that the two polynomials are relatively prime and so there exists n (x) , m (x) such that ( ) ( ) 1 = n (x) x2 − 2 + m (x) x2 + x [ ]−1 Thus [m (x)] = x2 + x . How could you find these polynomials? First of all, it suffices to consider only n (x) and m (x) having degree less than 2. Otherwise, reiterating the above, m (x) = p (x) l (x) + r (x) where r (x) has degree smaller than the degree of p (x) and you could simply use r (x) in place of m (x). ( ) ( ) 1 = (ax + b) x2 − 2 + (cx + d) x2 + x 1 = ax3 − 2b + bx2 + cx2 + cx3 + dx2 − 2ax + dx Now you solve the resulting system of equations. a=

1 1 1 ,b = − ,c = − ,d = 1 2 2 2

[ ] Then the desired inverse is − 12 x + 1 . To check, ( ) ( ) ( ) 1 1 − x + 1 x2 + x − 1 = − (x − 1) x2 − 2 2 2 [ ][ ] Thus − 12 x + 1 x2 + x − [1] = [0]. The above is an example of something general described in the following definition. Definition 3.4.12 Let F ⊆ K be two fields. Then clearly K is also a vector space over F. Then also, K is called a finite field extension of F if the dimension of this vector space, denoted by [K : F ] is finite. There are some easy things to observe about this. Proposition 3.4.13 Let F ⊆ K ⊆ L be fields. Then [L : F ] = [L : K] [K : F ]. n

m

Proof: Let {li }i=1 be a basis for L over K and let {kj }j=1 be a basis of K over F . Then if l ∈ L, ∑n there ∑ exist unique scalars xi in K such that∑ l= ∑ i=1 xi li . Now xi ∈ K so there exist fji such that m n m xi = j=1 fji kj . Then it follows that l = i=1 j=1 fji kj li . It follows that {kj li } is a spanning

58

CHAPTER 3. VECTOR SPACES

∑n ∑m ∑m set. If i=1 j=1 fji kj li = 0. Then, since the li are independent, it follows that j=1 fji kj = 0 and since {kj } is independent, each fji = 0 for each j for a given arbitrary i. Therefore, {kj li } is a basis.  You will see almost exactly the same argument in exhibiting a basis for L (V, W ) the linear transformations mapping V to W . Note that if p (x) were of degree n and not irreducible, then there still exists an extension G containing a root of p (x) such that [G : F] ≤ n. You could do this by working with an irreducible factor of p (x). Theorem 3.4.14 Let p (x) = xn + an−1 xn−1 + · · · + a1 x + a0 be a polynomial with coefficients in a field of scalars F. There exists a larger field G and {z1 , · · · , zn } contained in G, listed according to multiplicity, such that n ∏ p (x) = (x − zi ) i=1

This larger field is called a splitting field. Furthermore, [G : F] ≤ n! Proof: From Proposition 3.4.9, there exists a field F1 such that p (x) has a root, z1 (= [x]) Then by the Euclidean algorithm p (x) = (x − z1 ) q1 (x) + r where r ∈ F1 . Since p (z1 ) = 0, this requires r = 0. Now do the same for q1 (x) that was done for p (x) , enlarging the field to F2 if necessary, such that in this new field q1 (x) = (x − z2 ) q2 (x) and so p (x) = (x − z1 ) (x − z2 ) q2 (x) . After no more than n such extensions, you will have obtained the necessary field G. Finally consider the claim about dimension. By Proposition 3.4.9, there is a larger field G1 such that p (x) has a root a1 in G1 and [G1 : F] ≤ n. Then p (x) = (x − a1 ) q (x) . Continue this way until the polynomial equals the product of linear factors. Then by Proposition 3.4.13 applied multiple times, [G : F] ≤ n!.  Example 3.4.15 The polynomial x2 + 1 is irreducible in R (x) , polynomials having real coefficients. To see this is the case, suppose ψ (x) divides x2 + 1. Then x2 + 1 = ψ (x) q (x) . If the degree of ψ (x) is less than 2, then it must be either a constant or of the form ax + b. In the latter case, −b/a must be a zero of the right side, hence of the left but x2 + 1 has no real zeros. Therefore, the degree of ψ (x) must be two and q (x) must be a constant. Thus the only polynomial which divides x2 + 1 are 2 2 constants and [ 2 ] multiples of x + 1. Therefore, this shows ( 2x + )1 is irreducible. Find the inverse of x + x + 1 in the space of equivalence classes, R (x) / x + 1 . You can solve this with partial fractions. 1 x x+1 =− 2 + (x2 + 1) (x2 + x + 1) x + 1 x2 + x + 1 ( ) ( ) ( ) and so 1 = (−x) x2 + x + 1 + (x + 1) x2 + 1 which implies 1 ∼ (−x) x2 + x + 1 and so the inverse is [−x] . The following proposition is interesting. It was essentially proved above but to emphasize it, here it is again. Proposition 3.4.16 Suppose p (x) ∈ F (x) is irreducible and has degree n. Then every element of G = F (x) / (p (x)) is of the form [0] or [r (x)] where the degree of r (x) is less than n. Proof: This follows right away from the Euclidean algorithm for polynomials. If k (x) has degree larger than n − 1, then k (x) = q (x) p (x) + r (x) where r (x) is either equal to 0 or has degree less than n. Hence [k (x)] = [r (x)] .  Example 3.4.17 In the situation of the above example where the polynomial is x2 + 1 irreducible −1 in R (x), find [ax + b] assuming a2 + b2 ̸= 0. Note this includes all cases of interest thanks to the above proposition.

3.4. POLYNOMIALS AND FIELDS

59

You can do it with partial fractions as above. 1 b − ax a2 = + (x2 + 1) (ax + b) (a2 + b2 ) (x2 + 1) (a2 + b2 ) (ax + b) and so 1= Thus

1 a2 +b2

a2

( 2 ) 1 a2 (b − ax) (ax + b) + 2 x +1 2 2 +b (a + b )

(b − ax) (ax + b) ∼ 1and so −1

[ax + b]

=

[(b − ax)] b − a [x] = 2 2 2 a +b a + b2 −1

You might find it interesting to recall that (ai + b) = ab−ai 2 +b2 . Didn’t this just produce the complex numbers algebraically? If, instead of R you used Q this would have just produced a field Q + iQ.

3.4.1 The Algebraic Numbers and Minimum Polynomial

Each polynomial having coefficients in a field F has a splitting field. Consider the case of all polynomials p (x) having coefficients in a field F ⊆ G and consider all roots which are also in G. The theory of vector spaces is very useful in the study of these algebraic numbers. Here is a definition. Definition 3.4.18 Let F and G be two fields, F ⊆ G. The algebraic numbers A are those numbers which are in G and also roots of some polynomial p (x) having coefficients in F. The minimum polynomial1 of a ∈ A is defined to be the monic polynomial p (x) having smallest degree such that p (a) = 0. It is also often called the minimum polynomial. The next theorem is on the uniqueness of the minimum polynomial. Theorem 3.4.19 Let a ∈ A. Then there exists a unique monic irreducible polynomial p (x) having coefficients in F such that p (a) = 0. This polynomial is the minimum polynomial. Proof: Let p (x) be a monic polynomial having smallest degree such that p (a) = 0. Then p (x) is irreducible because if not, there would exist a polynomial having smaller degree which has a as a root. Now suppose q (x) is monic with smallest degree such that q (a) = 0. Then q (x) = p (x) l (x) + r (x) where if r (x) ̸= 0, then it has smaller degree than p (x). But in this case, the equation implies r (a) = 0 which contradicts the choice of p (x). Hence r (x) = 0 and so, since q (x) has smallest degree, l (x) = 1 showing that p (x) = q (x).  Definition 3.4.20 For a an algebraic number, let deg (a) denote the degree of the minimum polynomial of a. Also, here is another definition. Definition 3.4.21 Let a1 , · · · , am be in A. A polynomial in {a1 , · · · , am } will be an expression of the form ∑ ak1 ···kn ak11 · · · aknn k1 ···kn

where the ak1 ···kn are in F, each kj is a nonnegative integer, and all but finitely many of the ak1 ···kn equal zero. The collection of such polynomials will be denoted by F (a1 , · · · , am ) . The splitting field of g (x) ∈ F [x] is F (a1 , · · · , am ) where the {a1 , · · · , am } are the roots of g (x) in A. 1 I grew up calling this and similar things the minimal polynomial, but I think it is better to call it the minimum polynomial because it is unique. If you see minimal polynomial, this is what it is.

60

CHAPTER 3. VECTOR SPACES

Now notice that for a an algebraic number, F (a) is a finite dimensional vector space with field of scalars F. Similarly, for {a1 , · · · , am } algebraic numbers, F (a1 , · · · , am ) is a finite dimensional vector space with field of scalars F. The following fundamental proposition demonstrates this observation. This is a remarkable result.

Proposition 3.4.22 Let {a1 , · · · , am } be algebraic numbers. Then

dim F (a1 , · · · , am ) ≤ ∏_{j=1}^{m} deg (aj )

and for an algebraic number a, dim F (a) = deg (a). Every element of F (a1 , · · · , am ) is in A and F (a1 , · · · , am ) is a field.

Proof: Let the minimum polynomial of a be p (x) = x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0 . If q (a) ∈ F (a) , then q (x) = p (x) l (x) + r (x) where r (x) has degree less than the degree of p (x) if it is not zero. Hence q (a) = r (a). Thus F (a) is spanned by

{1, a, a^2 , · · · , a^{n−1}}

Since p (x) has smallest degree of all polynomials which have a as a root, the above set is also linearly independent. This proves the second claim.

Now consider the first claim. By definition, F (a1 , · · · , am ) is obtained from all linear combinations of products of

{a1^{k1} , a2^{k2} , · · · , am^{km}}

where the ki are nonnegative integers. From the first part, it suffices to consider only kj < deg (aj ). This is because any power a1^{k} with k ≥ deg (a1 ) can be written as a linear combination of a1^{j} for j < deg (a1 ). Therefore, there exists a spanning set for F (a1 , · · · , am ) which has ∏_{i=1}^{m} deg (ai ) entries. By Theorem 3.2.9 a basis has no more vectors than ∏_{i=1}^{m} deg (ai ) . This proves the first claim.

Consider the last claim. Let g (a1 , · · · , am ) be a polynomial in {a1 , · · · , am } in F (a1 , · · · , am ). Since

dim F (a1 , · · · , am ) ≡ p ≤ ∏_{j=1}^{m} deg (aj ) < ∞,

it follows

1, g (a1 , · · · , am ) , g (a1 , · · · , am )^2 , · · · , g (a1 , · · · , am )^p

are dependent. It follows g (a1 , · · · , am ) is the root of some polynomial having coefficients in F. Thus everything in F (a1 , · · · , am ) is algebraic. Why is F (a1 , · · · , am ) a field? Let g (a1 , · · · , am ) ̸= 0 be in F (a1 , · · · , am ). Then it has a minimum polynomial,

p (x) = x^q + a_{q−1} x^{q−1} + · · · + a1 x + a0

where the ai ∈ F. Then a0 ̸= 0 or else the polynomial would not be minimum. Indeed, if a0 = 0, you would have

g (a1 , · · · , am ) ( g (a1 , · · · , am )^{q−1} + a_{q−1} g (a1 , · · · , am )^{q−2} + · · · + a1 ) = 0

and so g (a1 , · · · , am )^{q−1} + a_{q−1} g (a1 , · · · , am )^{q−2} + · · · + a1 = 0, contradicting the fact that p (x) has smallest degree. Therefore, since a0 ̸= 0,

g (a1 , · · · , am ) ( g (a1 , · · · , am )^{q−1} + a_{q−1} g (a1 , · · · , am )^{q−2} + · · · + a1 ) = −a0

and so the multiplicative inverse for g (a1 , · · · , am ) is

( g (a1 , · · · , am )^{q−1} + a_{q−1} g (a1 , · · · , am )^{q−2} + · · · + a1 ) / (−a0 ) ∈ F (a1 , · · · , am ) .

The other axioms of a field are obvious. 

Now from this proposition, it is easy to obtain the following interesting result about the algebraic numbers. Like the above result, it is amazing.

Theorem 3.4.23 The algebraic numbers A, those roots of polynomials in F [x] which are in G, are a field.

Proof: By definition, each a ∈ A has a minimum polynomial. Let a ̸= 0 be an algebraic number and let p (x) be its minimum polynomial. Then p (x) is of the form x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0 where a0 ̸= 0. Otherwise p (x) would not have minimum degree. Then plugging in a yields

a · ( a^{n−1} + a_{n−1} a^{n−2} + · · · + a1 ) (−1) / a0 = 1

and so a^{−1} = ( a^{n−1} + a_{n−1} a^{n−2} + · · · + a1 ) (−1) / a0 ∈ F (a). By Proposition 3.4.22, every element of F (a) is in A and this shows that for every nonzero element of A, its inverse is also in A. What about products and sums of things in A? Are they still in A? Yes. If a, b ∈ A, then both a + b and ab ∈ F (a, b) and from the proposition, each element of F (a, b) is in A. 

A typical example of what is of interest here is when the field F of scalars is Q, the rational numbers, and the field G is R. However, you can certainly conceive of many other examples by considering the integers mod a prime, for example (see Proposition 1.14.2 on Page 22), or any of the fields which occur as field extensions in the above.

There is a very interesting thing about F (a1 , · · · , an ) in the case where F is infinite which says that there exists a single algebraic γ such that F (a1 , · · · , an ) = F (γ). In other words, every field extension of this sort is a simple field extension. I found this fact in an early version of [12].

Proposition 3.4.24 Let F be infinite. Then there exists γ such that F (a1 , · · · , an ) = F (γ). Here each ai is algebraic. The γ will be of the form γ = a1 + λ1 a2 + · · · + λn−1 an where the λk are in a splitting field. If the field F includes Q so that all polynomials have roots in C, the λi can be chosen to be positive integers.

Proof: To begin with, consider F (α, β). Let γ = α + λβ. Then by Proposition 3.4.22, γ is an algebraic number and it is also clear that F (γ) ⊆ F (α, β). I need to show the other inclusion. This will be done for a suitable choice of λ. To do this, it suffices to verify that both α and β are in F (γ).

Let the minimum polynomials of α and β be f (x) and g (x) respectively. Let the distinct roots of f (x) and g (x) be {α1 , α2 , · · · , αn } and {β1 , β2 , · · · , βm } respectively. These roots are in a field which contains splitting fields of both f (x) and g (x). Let α = α1 and β = β1 . Now define

h (x) ≡ f (α + λβ − λx) ≡ f (γ − λx)

so that h (β) = f (α) = 0. It follows (x − β) divides both h (x) and g (x) since it is given that g (β) = 0. If (x − η) is a different linear factor of both g (x) and h (x), then it must be (x − βj ) for some βj with j > 1 because these are the only factors of g (x) . Therefore, this would require

0 = h (βj ) = f (α1 + λβ1 − λβj )


and so it would be the case that α1 + λβ1 − λβj = αk for some k. Hence

λ = (αk − α1 ) / (β1 − βj )

Now there are finitely many quotients of the above form and if λ is chosen to not be any of them, then the above cannot happen and so in this case, the only linear factor of both g (x) and h (x) will be (x − β). Choose such a λ. If all roots are in C and F contains Q, then you could pick λ a positive integer. Let ϕ (x) be the minimum polynomial of β with respect to the field F (γ). Then this minimum polynomial must divide both h (x) and g (x) because h (β) = g (β) = 0. However, the only factor these two have in common is x−β and so ϕ (x) = x−β which requires β ∈ F (γ) . Now also α = γ −λβ and so α ∈ F (γ) also. Therefore, both α, β ∈ F (γ) which forces F (α, β) ⊆ F (γ) . This proves the proposition in the case that n = 2. The general result follows right away by observing that F (a1 , · · · , an ) = F (a1 , · · · , an−1 ) (an ) and using induction. F (a1 , · · · , an−1 ) (an ) = F (γ 1 ) (an ) = F (γ 1 , an ) = F (γ). Note that γ = α + λβ for two of these. Then F (a1 , a2 , a3 ) = F (a1 , a2 ) (a3 ) = F (a1 + λ1 a2 ) (a3 ) = F (a1 + λ1 a2 , a3 ) = F (a1 + λ1 a2 + λ2 a3 ) continuing this way shows that there are λi in a suitable splitting field such that for γ = a1 + λ1 a2 + · · · + λn−1 an , F (a1 , · · · , an ) = F (γ). If all the numbers are in C, and your field F includes Q, you could choose all the λi to be positive integers. 
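For F = Q this can be made concrete. Assuming the SymPy library, the sketch below checks that γ = √2 + √3 (so λ = 1) is a primitive element for Q(√2, √3); the particular identities used for √2 and √3 are chosen only for this example.

# Sketch assuming SymPy: gamma = sqrt(2) + sqrt(3) generates Q(sqrt(2), sqrt(3)).
from sympy import sqrt, simplify, minimal_polynomial, Symbol

x = Symbol('x')
gamma = sqrt(2) + sqrt(3)                            # lambda = 1 works here
print(minimal_polynomial(gamma, x))                  # x**4 - 10*x**2 + 1, degree 4 = 2*2
print(simplify((gamma**3 - 9*gamma)/2 - sqrt(2)))    # 0, so sqrt(2) is in Q(gamma)
print(simplify((11*gamma - gamma**3)/2 - sqrt(3)))   # 0, so sqrt(3) is in Q(gamma)

Thus both generators are polynomials in γ, so Q(√2, √3) = Q(γ), and dim Q(γ) = 4 agrees with the bound in Proposition 3.4.22.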

3.4.2 The Lindemann Weierstrass Theorem and Vector Spaces

As another application of the abstract concept of vector spaces, there is an amazing theorem due to Weierstrass and Lindemann.

Theorem 3.4.25 Suppose a1 , · · · , an are nonzero algebraic numbers, roots of a polynomial with rational coefficients, and suppose α1 , · · · , αn are distinct algebraic numbers. Then

∑_{i=1}^{n} ai e^{αi} ̸= 0

In other words, the {e^{α1} , · · · , e^{αn} } are independent as vectors with field of scalars equal to the algebraic numbers.

There is a proof of this later. It is long and hard but only depends on elementary considerations other than some algebra involving symmetric polynomials. See Theorem 10.2.8. It is presented here to illustrate how the language of linear algebra is useful in describing something which is very exotic, apparently very far removed from something like R^p .

A number is transcendental, as opposed to algebraic, if it is not a root of a polynomial which has integer (rational) coefficients. Most numbers are this way but it is hard to verify that specific numbers are transcendental. That π is transcendental follows from e^0 + e^{iπ} = 0. By the above theorem, this could not happen if π were algebraic because then iπ would also be algebraic. Recall these algebraic numbers form a field and i is clearly algebraic, being a root of x^2 + 1. This fact about π was first proved by Lindemann in 1882 and then the general theorem above was proved by Weierstrass in 1885. The fact that π is transcendental solved an old problem called squaring the circle which was to construct a square with the same area as a circle using a straight edge and compass. Numbers which can be constructed in this way are all algebraic. Thus the fact π is transcendental implies this problem is impossible.2

2 Gilbert, the librettist of the Savoy operas, may have heard about this great achievement. In Princess Ida which opened in 1884 he has the following lines. “As for fashion they forswear it, so they say - so they say; and the circle they will square it some fine day some fine day.” Of course it had been proved impossible to do this a couple of years before.

3.5 Exercises

1. Let p (x) ∈ F (x) and suppose that p (x) is the minimum polynomial for a ∈ F. Consider a field extension of F called G. Thus a ∈ G also. Show that the minimum polynomial of a with coefficients in G must divide p (x).

2. Here is a polynomial in Q (x)

x^2 + x + 3

Show it is irreducible in Q (x). Now consider x^2 − x + 1. Show that in Q (x) / (x^2 + x + 3) it follows that [x^2 − x + 1] ̸= 0. Find its inverse in Q (x) / (x^2 + x + 3).

3. Here is a polynomial in Q (x)

x^2 − x + 2

Show it is irreducible in Q (x). Now consider x + 2. Show that in Q (x) / (x^2 − x + 2) it follows that [x + 2] ̸= 0. Find its inverse in Q (x) / (x^2 − x + 2).

4. Here is a polynomial in Z3 (x)

x^2 + x + ¯2

Show it is irreducible in Z3 (x). Show [x + ¯2] is not zero in Z3 (x) / (x^2 + x + ¯2). Now find its inverse in Z3 (x) / (x^2 + x + ¯2).

5. Suppose the degree of p (x) is r where p (x) is an irreducible monic polynomial with coefficients in a field F. It was shown that the dimension of F (x) / (p (x)) is r and that a basis is {1, [x] , [x^2] , · · · , [x^{r−1}]}. Now let A be an r × r matrix and let qi (x) = ∑_{j=1}^{r} Aij x^{j−1} . Show that {[q1 (x)] , · · · , [qr (x)]} is a basis for F (x) / (p (x)) if and only if the matrix A is invertible.

6. Suppose you have W a subspace of a finite dimensional vector space V . Suppose also that dim (W ) = dim (V ) . Tell why W = V.

7. Suppose V is a vector space with field of scalars F. Let T ∈ L (V, W ) , the space of linear transformations mapping V to W where W is another vector space (See Problem 23 on Page 53.). Define an equivalence relation on V as follows. v ∼ w means v − w ∈ ker (T ) . Recall that ker (T ) ≡ {v : T v = 0}. Show this is an equivalence relation. Now for [v] an equivalence class define T ′ [v] ≡ T v. Show this is well defined. Also show that with the operations [v] + [w] ≡ [v + w] , α [v] ≡ [αv] , this set of equivalence classes, denoted by V / ker (T ) , is a vector space. Show next that T ′ : V / ker (T ) → W is one to one. This new vector space, V / ker (T ) is called a quotient space. Show its dimension equals the difference between the dimension of V and the dimension of ker (T ).

8. ↑Suppose now that W = T (V ) . Then show that T ′ in the above is one to one and onto. Explain why dim (V / ker (T )) = dim (T (V )) . Now see Problem 25 on Page 53. Show that rank (T ) + null (T ) = dim (V ) .

9. Let V be an n dimensional vector space and let W be a subspace. Generalize Problem 7 to define and give properties of V /W . What is its dimension? What is a basis?

10. A number is transcendental if it is not the root of any nonzero polynomial with rational coefficients. As mentioned, there are many transcendental numbers. Suppose α is a known transcendental real number. Show that {1, α, α^2 , · · · } is a linearly independent set of real numbers if the field of scalars is the rational numbers.


11. Suppose F is a countable field and let A be the algebraic numbers, those numbers in G which are roots of a polynomial in F (x). Show A is also countable.

12. This problem is on partial fractions. Suppose you have

R (x) = p (x) / (q1 (x) · · · qm (x)) ,  degree of p (x) < degree of denominator,

where the polynomials qi (x) are relatively prime and all the polynomials p (x) and qi (x) have coefficients in a field of scalars F. Thus there exist polynomials ai (x) having coefficients in F such that

1 = ∑_{i=1}^{m} ai (x) qi (x)

Explain why

R (x) = p (x) ∑_{i=1}^{m} ai (x) qi (x) / (q1 (x) · · · qm (x)) = ∑_{i=1}^{m} ai (x) p (x) / ∏_{j̸=i} qj (x)

Now continue doing this on each term in the above sum till finally you obtain an expression of the form

∑_{i=1}^{m} bi (x) / qi (x)

Using the Euclidean algorithm for polynomials, explain why the above is of the form

M (x) + ∑_{i=1}^{m} ri (x) / qi (x)

where the degree of each ri (x) is less than the degree of qi (x) and M (x) is a polynomial. Now argue that M (x) = 0. From this explain why the usual partial fractions expansion of calculus must be true. You can use the fact that every polynomial having real coefficients factors into a product of irreducible quadratic polynomials and linear polynomials having real coefficients. This follows from the fundamental theorem of algebra.

13. It was shown in the chapter that A is a field. Here A are the numbers in R which are roots of a rational polynomial. Then it was shown in Problem 11 that it was actually countable. Show that A + iA is also an example of a countable field.

Chapter 4

Matrices

You have now solved systems of equations by writing them in terms of an augmented matrix and then doing row operations on this augmented matrix. It turns out that such rectangular arrays of numbers are important from many other different points of view. Numbers are also called scalars. In general, scalars are just elements of some field.

A matrix is a rectangular array of numbers from a field F. For example, here is a matrix.

( 1   2   3   4 )
( 5   2   8   7 )
( 6  −9   1   2 )

This matrix is a 3 × 4 matrix because there are three rows and four columns. The columns stand upright and are listed in order from left to right. The rows are horizontal and listed in order from top to bottom. The convention in dealing with matrices is to always list the rows first and then the columns. Also, you can remember the columns are like columns in a Greek temple. They stand upright while the rows just lie there like rows made by a tractor in a plowed field. Elements of the matrix are identified according to position in the matrix. For example, 8 is in position 2, 3 because it is in the second row and the third column. You might remember that you always list the rows before the columns by using the phrase Rowman Catholic. The symbol, (aij ) refers to a matrix in which the i denotes the row and the j denotes the column. Using this notation on the above matrix, a23 = 8, a32 = −9, a12 = 2, etc.

There are various operations which are done on matrices. They can sometimes be added, multiplied by a scalar and sometimes multiplied.

Definition 4.0.1 Let A = (aij ) and B = (bij ) be two m × n matrices. Then A + B = C where C = (cij ) for cij = aij + bij . Also if x is a scalar, xA = C where the ij th entry of C is cij = xaij , aij being the ij th entry of A. The number Aij will also typically refer to the ij th entry of the matrix A. The zero matrix, denoted by 0, will be the matrix consisting of all zeros.

Do not be upset by the use of the subscripts, ij. The expression cij = aij + bij is just saying that you add corresponding entries to get the result of summing two matrices as discussed above. Note that there are 2 × 3 zero matrices, 3 × 4 zero matrices, etc. In fact for every size there is a zero matrix.

With this definition, the following properties are all obvious but you should verify all of these properties are valid for A, B, and C, m × n matrices and 0 an m × n zero matrix.

A + B = B + A,

(4.1)

the commutative law of addition,

(A + B) + C = A + (B + C) ,  (4.2)

the associative law for addition,

A + 0 = A,  (4.3)

the existence of an additive identity,

A + (−A) = 0,  (4.4)

the existence of an additive inverse. Also, for α, β scalars, the following also hold.

α (A + B) = αA + αB,  (4.5)

(α + β) A = αA + βA,  (4.6)

α (βA) = αβ (A) ,  (4.7)

1A = A.  (4.8)
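These identities are easy to spot-check numerically for particular matrices. The sketch below is only such a spot check, assuming the NumPy library; the two matrices and the scalars are arbitrary choices.

# Sketch assuming NumPy: check 4.1 and 4.3 - 4.6 for two particular 2 x 3 matrices.
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[0, -1, 2], [7, 1, 0]])
Z = np.zeros((2, 3), dtype=int)                               # the 2 x 3 zero matrix
alpha, beta = 2, -3

print(np.array_equal(A + B, B + A))                           # 4.1
print(np.array_equal(A + Z, A), np.array_equal(A + (-A), Z))  # 4.3, 4.4
print(np.array_equal(alpha*(A + B), alpha*A + alpha*B))       # 4.5
print(np.array_equal((alpha + beta)*A, alpha*A + beta*A))     # 4.6

Of course such a check is not a proof; the properties follow from the corresponding field axioms applied entry by entry.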

These properties are just the vector space axioms discussed earlier and the fact that the m × n matrices satisfy these axioms is what is meant by saying this set of matrices with addition and scalar multiplication as defined above forms a vector space.

Definition 4.0.2 Matrices which are n × 1 or 1 × n are especially called vectors and are often denoted by a bold letter. Thus

x = ( x1 · · · xn )^T

is an n × 1 matrix also called a column vector while a 1 × n matrix of the form ( x1 · · · xn ) is referred to as a row vector.

All the above is fine, but the real reason for considering matrices is that they can be multiplied. This is where things quit being banal. The following is the definition of multiplying an m × n matrix times a n × 1 vector. Then after this, the product of two matrices is considered.

Definition 4.0.3 First of all, define the product of a 1 × n matrix and a n × 1 matrix.

( x1 · · · xn ) ( y1 · · · yn )^T = ∑_i xi yi

If you have A an m × n matrix and B is an n × p matrix, then AB will be an m × p matrix whose ij th entry is the product of the ith row of A on the left with the j th column of B on the right. Thus

(AB)_{ij} ≡ ∑_{k=1}^{n} A_{ik} B_{kj} .

and if B = ( b1 · · · bp ) , then AB = ( Ab1 · · · Abp ) . You can do (m × n) × (n × p) but in order to multiply, you must have the number of columns of the matrix on the left equal to the number of rows of the matrix on the right or else the rule just given makes no sense. To see the last claim, note that the j th column of AB involves bj and is of the form

( ∑_{k=1}^{n} A_{1k} B_{kj} , · · · , ∑_{k=1}^{n} A_{mk} B_{kj} )^T = Abj

Here is an example.

Example 4.0.4 Compute the following product in Z5 . That is, all the numbers are interpreted as residue classes.

 1   0 2

2 2 1

1 1 4





1 2 4 1

3   3   1

2 3 1 1

   . 

Doing the arithmetic in Z5 , you get 

1 2   0 2 2 1






3   3   1

1 1 4

1 2 4 1

2 3 1 1



  2 2     = 1 0   1 2

4.1 Properties of Matrix Multiplication

It is sometimes possible to multiply matrices in one order but not in the other order. For example, a 2 × 2 matrix can be multiplied on the right by a 2 × 3 matrix, but the product of those two matrices in the opposite order is not defined. What if it makes sense to multiply them in either order? Will they be equal then?

Example 4.1.1 Compare

( 1 2 ) ( 0 1 )        ( 0 1 ) ( 1 2 )
( 3 4 ) ( 1 0 )  and   ( 1 0 ) ( 3 4 ) .

The first product is

( 1 2 ) ( 0 1 )   ( 2 1 )
( 3 4 ) ( 1 0 ) = ( 4 3 ) ,

the second product is

( 0 1 ) ( 1 2 )   ( 3 4 )
( 1 0 ) ( 3 4 ) = ( 1 2 ) ,

and you see these are not equal. Therefore, you cannot conclude that AB = BA for matrix multiplication. However, there are some properties which do hold.

Proposition 4.1.2 If all multiplications and additions make sense, the following hold for matrices, A, B, C and a, b scalars.

A (aB + bC) = a (AB) + b (AC)  (4.9)

(B + C) A = BA + CA  (4.10)

A (BC) = (AB) C  (4.11)

Proof: Using the above definition of matrix multiplication,

(A (aB + bC))_{ij} = ∑_k A_{ik} (aB + bC)_{kj} = ∑_k A_{ik} (aB_{kj} + bC_{kj})

= a ∑_k A_{ik} B_{kj} + b ∑_k A_{ik} C_{kj} = a (AB)_{ij} + b (AC)_{ij} = (a (AB) + b (AC))_{ij}

showing that A (aB + bC) = a (AB) + b (AC) as claimed. Formula 4.10 is entirely similar.


Consider 4.11, the associative law of multiplication. Before reading this, review the definition of matrix multiplication in terms of entries of the matrices.

(A (BC))_{ij} = ∑_k A_{ik} (BC)_{kj} = ∑_k A_{ik} ∑_l B_{kl} C_{lj} = ∑_l (AB)_{il} C_{lj} = ((AB) C)_{ij} . 
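These identities, and the failure of commutativity in Example 4.1.1, can also be checked numerically. The following sketch assumes NumPy; the particular matrix C and the scalars are arbitrary choices made only for the check.

# Sketch assuming NumPy: spot-check 4.9 - 4.11 and compare AB with BA.
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 1]])
a, b = 3, -2

print(np.array_equal(A @ (a*B + b*C), a*(A @ B) + b*(A @ C)))   # 4.9
print(np.array_equal((B + C) @ A, B @ A + C @ A))               # 4.10
print(np.array_equal(A @ (B @ C), (A @ B) @ C))                 # 4.11
print(A @ B)   # [[2 1] [4 3]]
print(B @ A)   # [[3 4] [1 2]], not the same as A @ B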

Another important operation on matrices is that of taking the transpose. The following example shows what is meant by this operation, denoted by placing a T as an exponent on the matrix. 

( 1   1 + 2i )T
( 3   1      )   =  ( 1        3   2 )
( 2   6      )      ( 1 + 2i   1   6 )

What happened? The first column became the first row and the second column became the second row. Thus the 3 × 2 matrix became a 2 × 3 matrix. The number 3 was in the second row and the first column and it ended up in the first row and second column. This motivates the following definition of the transpose of a matrix.

Definition 4.1.3 Let A be an m × n matrix. Then A^T denotes the n × m matrix which is defined as follows.

(A^T)_{ij} = A_{ji}

The transpose of a matrix has the following important property.

Lemma 4.1.4 Let A be an m × n matrix and let B be a n × p matrix. Then

(AB)^T = B^T A^T  (4.12)

and if α and β are scalars,

(αA + βB)^T = αA^T + βB^T  (4.13)

Proof: From the definition,

((AB)^T)_{ij} = (AB)_{ji} = ∑_k A_{jk} B_{ki} = ∑_k (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}

4.13 is left as an exercise.  Definition 4.1.5 An n × n matrix A is said to be symmetric if A = AT . It is said to be skew symmetric if AT = −A. Example 4.1.6 Let



2  A= 1 3

1 5 −3

 3  −3  . 7

Then A is symmetric. Example 4.1.7 Let



0  A =  −1 −3 Then A is skew symmetric.

 1 3  0 2  −2 0


There is a special matrix called I and defined by Iij = δ ij where δ ij is the Kronecker symbol defined by { 1 if i = j δ ij = 0 if i ̸= j It is called the identity matrix because it is a multiplicative identity in the following sense. Lemma 4.1.8 Suppose A is an m × n matrix and In is the n × n identity matrix. Then AIn = A. If Im is the m × m identity matrix, it also follows that Im A = A. ∑ Proof: (AIn )ij = k Aik δ kj = Aij and so AIn = A. The other case is left as an exercise for you. Definition 4.1.9 An n×n matrix A has an inverse A−1 if and only if there exists a matrix, denoted as A−1 such that AA−1 = A−1 A = I where I = (δ ij ) for { 1 if i = j δ ij ≡ 0 if i ̸= j Such a matrix is called invertible. If it acts like an inverse, then it is the inverse. This is the message of the following proposition. Proposition 4.1.10 Suppose AB = BA = I. Then B = A−1 . Proof: From the definition, B is an inverse for A. Could there be another one B ′ ? B ′ = B ′ I = B ′ (AB) = (B ′ A) B = IB = B. Thus, the inverse, if it exists, is unique. 
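The transpose law of Lemma 4.1.4 and the defining property of the inverse are also easy to spot-check numerically. The sketch below assumes NumPy; the matrices are arbitrary small examples.

# Sketch assuming NumPy: (AB)^T = B^T A^T, and A A^{-1} = A^{-1} A = I.
import numpy as np

A = np.array([[2., 1.], [1., 3.]])
B = np.array([[0., 1., 2.], [1., 0., 1.]])

print(np.array_equal((A @ B).T, B.T @ A.T))       # True, Lemma 4.1.4
Ainv = np.linalg.inv(A)
print(np.allclose(A @ Ainv, np.eye(2)),
      np.allclose(Ainv @ A, np.eye(2)))           # True True, Definition 4.1.9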

4.2 Finding the Inverse of a Matrix

Later a formula is given for the inverse of a matrix. However, it is not a good way to find the inverse for a matrix. There is a much easier way and it is this which is presented here. It is also important to note that not all matrices have inverses.

Example 4.2.1 Let

A = ( 1 1 )
    ( 1 1 )

Does A have an inverse?

One might think A would have an inverse because it does not equal zero. However,

( 1 1 ) ( −1 )   ( 0 )
( 1 1 ) (  1 ) = ( 0 )

and if A^{−1} existed, this could not happen because you could multiply on the left by the inverse A^{−1} and conclude the vector (−1, 1)^T = (0, 0)^T . Thus the answer is that A does not have an inverse.

Suppose you want to find B such that AB = I. Let B = ( b1 · · · bn ). Also the ith column of I is

ei = ( 0 · · · 0 1 0 · · · 0 )^T


Thus, if AB = I, bi , the ith column of B, must satisfy the equation Abi = ei . The augmented matrix for finding bi is (A|ei ) . Thus, by doing row operations till A becomes I, you end up with (I|bi ) where bi is the solution to Abi = ei . Now the same sequence of row operations works regardless of the right side of the augmented matrix (A|ei ) and so you can save trouble by simply doing the following.

(A|I) → (I|B)  (row operations)

and the ith column of B is bi , the solution to Abi = ei . Thus AB = I. This is the reason for the following simple procedure for finding the inverse of a matrix. This procedure is called the Gauss Jordan procedure. It produces the inverse if the matrix has one. Actually, it produces the right inverse.

Procedure 4.2.2 Suppose A is an n × n matrix. To find A−1 if it exists, form the augmented n × 2n matrix, (A|I) and then do row operations until you obtain an n × 2n matrix of the form

(I|B)

(4.14)

if possible. When this has been done, B = A−1 . The matrix A has an inverse exactly when it is possible to do row operations and end up with one like 4.14. Here is a fundamental theorem which describes when a matrix has an inverse. Theorem 4.2.3 Let A be an n × n matrix. Then A−1 exists if and only if the columns of A are a linearly independent set. Also, if A has a right inverse, then it has an inverse which equals the right inverse. Proof: ⇒ If A−1 exists, then A−1 A = I and so Ax = 0 if and only if x = 0. Why? But this says that the columns of A are linearly independent. ⇐ Say the columns are linearly independent. Then they form a basis for Fn . Thus there exists bi ∈ Fn such that Abi = ei where ei is the column vector with 1 in the ith position and zeros elsewhere. Then from the way we multiply matrices, ( ) ( ) A b1 · · · bn = e1 · · · en = I ( ) Thus A has a right inverse. Now letting B ≡ b1 · · · bn , it follows that Bx = 0 if and only if x = 0. However, this is nothing but a statement that the columns of B are linearly independent. Hence, by what was just shown, B has a right inverse C, BC = I. Then from AB = I, it follows that A = A (BC) = (AB) C = IC = C and so AB = BC = BA = I. Thus the inverse exists. Finally, if AB = I, then Bx = 0 if and only if x = 0 and so the columns of B are a linearly independent set in Fn . Therefore, it has a right inverse C which by a repeat of the above argument is A. Thus AB = BA = I.  Similarly, if A has a left inverse then it has an inverse which is the same as the left inverse. The theorem gives a condition for the existence of the inverse and the above procedure gives a method for finding it.   1 0 1   Example 4.2.4 Let A =  1 2 1 . Find A−1 in arithmetic of Z3 . 1 1 2

Form the augmented matrix



1   1 1

0 2 1

1 1 2

Now do row operations in Z3 until the n × n yields after some computations,  1 0   0 1 0 0

71

1 0 0

0 1 0

 0  0 . 1

matrix on the left becomes the identity matrix. This 0 0 1

0 1 1

2 2 1

 2  0  1

and so the inverse of A is the matrix on the right,   0 2 2    1 2 0 . 1 1 1 Checking the answer is easy. Just multiply the   1 0 1 0    1 2 1  1 1 1 2 1 All arithmetic is done in Z3 . Always check usually have made a mistake.  6 −1 2  Example 4.2.5 Let A =  −1 2 −1 2 −1 1 Set up the augmented matrix (A|I)  6   −1 2 Now find row reduced echelon form 

1   0 0



1  Example 4.2.6 Let A =  1 2

2 0 2

  . Find A−1 in Q.

 2 1 0 0  −1 0 1 0  1 0 0 1  −3  4  11

0 0 1

1 −1 −3

−1 2 4

1   −1 −3

−1 2 4

 −3  4  11

0 1 0

see if it works.  1 0 0  0 1 0  0 0 1

your answer because if you are like some of us, you will

−1 2 −1



Thus the inverse is

matrices and   2 2   2 0 = 1 1

 2  2 . Find A−1 in Q. 4


This time there is no inverse because the columns are not linearly independent. This can be seen by solving the equation      0 x 1 2 2       1 0 2  y  =  0  0 2 2 4 z and finding that there is a nonzero solution which is equivalent to the columns being a dependent set. Thus, by Theorem 4.2.3, there is no inverse. Example 4.2.7 Consider the matrix



 0  0  5

1 1   0 1 0 0

Find its inverse in arithmetic of Q and then find its inverse in Z5 . It has an inverse in Q.



1 1   0 1 0 0

−1  0 1   0  = 0 5 0

−1 1 0

 0  0  1 5

However, in Z5 it has no inverse because 5 = 0 in Z5 and so in Z35 , the columns are dependent. Example 4.2.8 Here is a matrix.

(

2 1 1 2

)

Find its inverse in the arithmetic of Q and then in Z3 . It has an inverse in the arithmetic of Q (

2 1

1 2

)−1

( =

2 3 − 13

− 13

)

2 3

However, there is no inverse in the arithmetic of Z3 . Indeed, the row reduced echelon form of ( ) 2 1 0 1 2 0 ) ) ( ) ( 1 2 0 1 2 1 which shows that the columns are ∈ ker computed in Z3 is and so 0 0 0 1 1 2 not independent so there is no inverse in Z23 . The field of residue classes is not of major importance in this book, but it is included to emphasize that these considerations are completely algebraic in nature, depending only on field axioms. There is no geometry or analysis involved here. (

4.3 Linear Relations and Row Operations

Suppose you have the following system of equations. x − 5w − 3z = 1 2w + x + y + z = 2 2w + x + y + z = 3


You could write it in terms of matrix multiplication as follows.       x 1 1 0 −3 −5  y       2  = 2   1 1 1  z  3 1 1 1 2 w You could also write it in terms of vector addition as follows.           1 0 −3 −5 1           x 1  + y 1  + z 1  + w 2  =  2  1 1 1 2 3 When you find a solution to the system of equations, you are really finding the scalars x, y, z, w such that the vector on the right is the above linear combination of the columns. We considered writing this system as an augmented matrix   1 0 −3 −5 1   2 2   1 1 1 1 1 1 2 3 and then row reducing it to get a matrix in row reduced echelon form from which it was easy to see the solution, finding the last column as a linear combination of the preceding columns. However, this process of row reduction also shows the fourth column as a linear combination of the first three and the third as a linear combination of the first two, so when you reduce to row reduced echelon form, you are really solving many systems of equations at the same time. The important thing was the observation that the row operations did not change the solution set of the system. However, this could be said differently. The row operations did not change the set of scalars which yield the last column as a linear combination of the first four. Similarly, the row operations did not change the scalars to obtain the fourth column as a linear combination of the first three, and so forth. In other words, if a column is a linear combination of the preceding columns, then after doing row operations, that column will still be the same linear combination of the preceding columns. By permuting the columns, placing a chosen column on the right, the same argument shows that any column after the row operation is the same linear combination of the other columns as it was before the row operation. Such a relation between a column and other columns will be called a linear relation. Thus we have the following significant observation which is stated here as a theorem. Theorem 4.3.1 Row operations preserve all linear relations between columns. Now here is a slightly different description of the row reduced echelon form. Definition 4.3.2 Let ei denote the column vector which has all zero entries except for the ith slot which is one. An m × n matrix is said to be in row reduced echelon form if, in viewing successive columns from left to right, the first nonzero column encountered is e1 and if you have encountered e1 , e2 , · · · , ek , the next column is either ek+1 or is a linear combination of the vectors, e1 , e2 , · · · , ek . Earlier an algorithm was presented which will produce a matrix in row reduced echelon form. A natural question is whether there is only one row reduced echelon form. In fact, there is only one and this follows easily from the above definition. Suppose you had two B, C in row reduced echelon form and these came from the same matrix A through row operations. Then they have zero columns in the same positions because row operations preserve all zero columns. Also B, C have e1 in the same position because its position is that of the first column of A which is not zero. Similarly e2 , e3 and so forth must be in the same positions because of the above definition where these positions are defined in terms of a column being the first


in A when viewed from the left to the right which is not a linear combination of the columns before it. As to a column after ek and before ek+1 if there is such, these columns are an ordered list top to bottom of the scalars which give this column in A as a linear combination of the columns to its left because all linear relations between columns are preserved by doing row operations. Thus B, C must be exactly the same. This is why there is only one row reduced echelon form for a given matrix and it justifies the use of the definite article when referring to the row reduced echelon form. This proves the following theorem. Theorem 4.3.3 The row reduced echelon form is unique. Now from this theorem, we can obtain the following. Theorem 4.3.4 Let A be an n × n matrix. Then it is invertible if and only if there is a sequence of row operations which produces I. Proof: ⇒ Since A is invertible, it follows from Theorem 4.2.3 that the columns of A must be independent. Hence, in the row reduced echelon form for A, the columns must be e1 , e2 , · · · , en in order from left to right. In other words, there is a sequence of row operations which produces I. ⇐ Now suppose such a sequence of row operations produces I. Then since row operations preserve linear combinations between columns, it follows that no column is a linear combination of the others and consequently the columns are linearly independent. By Theorem 4.2.3 again, A is invertible.  It would be possible to define things like rank in terms of the row reduced echelon form and this is often done. However, in this book, these things will be defined in terms of vector space language and the row reduced echelon form will be a useful tool to determine the rank. Definition 4.3.5 Let A be an m × n matrix, the entries being in F a field. Then rank (A) is defined as the dimension of Im (A) ≡ A (Fn ). Note that, from the way we multiply matrices times a vector, this is just the same as the dimension of span (columns of A), sometimes called the column space. Now here is a very useful result. Proposition 4.3.6 Let A be an m×n matrix. Then rank (A) equals the number of pivot columns in the row reduced echelon form of A. These are the columns of A which are not a linear combination of those columns on the left. Proof: This is obvious if the matrix is already in row reduced echelon form. In this case, the pivot columns consist of e1 , e2 , · · · , er and every other column is a linear combination of these. Thus the rank of this matrix is r because these vectors are obviously linearly independent. However, the linear relations between a column and its preceeding columns are preserved by row operations and so the columns in A corresponding to the first occurance of e1 , first occurance of e2 and so forth, in the row reduced echelon form, the pivot columns, are also a basis for the span of the columns of A and so there are r of these.  Note that from the description of the row reduced echelon form, the rank is also equal to the number of nonzero rows in the row reduced echelon form.
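Proposition 4.3.6 is easy to see in action with a computer algebra system. The sketch below assumes SymPy, and the matrix in it is an arbitrary rank 2 example.

# Sketch assuming SymPy: rank equals the number of pivot columns in the row reduced echelon form.
from sympy import Matrix

A = Matrix([[1, 2, 2], [1, 0, 2], [2, 2, 4]])
R, pivots = A.rref()
print(R)            # the row reduced echelon form of A
print(pivots)       # (0, 1): the first two columns are the pivot columns
print(A.rank())     # 2, which equals the number of pivot columns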

4.4 Block Multiplication of Matrices

Consider the following problem

( A B ) ( E F )
( C D ) ( G H )

You know how to do this. You get

( AE + BG   AF + BH )
( CE + DG   CF + DH ) .
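As a preview of the theorem proved below, the same formula still works when the eight entries are themselves matrices of compatible sizes. The following sketch assumes NumPy; the random blocks are only an illustration.

# Sketch assuming NumPy: the 2 x 2 block formula agrees with ordinary multiplication.
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.integers(0, 5, (2, 2)) for _ in range(4))
E, F, G, H = (rng.integers(0, 5, (2, 2)) for _ in range(4))

left  = np.block([[A, B], [C, D]])
right = np.block([[E, F], [G, H]])
blockwise = np.block([[A @ E + B @ G, A @ F + B @ H],
                      [C @ E + D @ G, C @ F + D @ H]])
print(np.array_equal(left @ right, blockwise))   # True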


Now what if instead of numbers, the entries, A, B, C, D, E, F, G are matrices of a size such that the multiplications and additions needed in the above formula all make sense. Would the formula be true in this case? I will show below that this is true. Suppose A is a matrix of the form   A11 · · · A1m  . ..  ..  . A= (4.15) . .   . Ar1 · · · Arm where Aij is a si × pj matrix where si is constant for j = 1, · · · , m for each i = 1, · · · , r. Such a matrix is called a block matrix, also a partitioned matrix. How do you get the block Aij ? Here is how for A an m × n matrix: si ×m

}|

z(

0 Isi ×si

z 

n×pj

}| { { 0 )   0 A Ipj ×pj . 0

(4.16)

In the block column matrix on the right, you need to have cj −1 rows of zeros above the small pj ×pj identity matrix where the columns of A involved in Aij are cj , · · · , cj + pj − 1 and in the block row matrix on the left, you need to have ri − 1 columns of zeros to the left of the si × si identity matrix where the rows of A involved in Aij are ri , · · · , ri + si . An important observation to make is that the matrix on the right specifies columns to use in the block and the one on the left specifies the rows used. Thus the block Aij in this case is a matrix of size si × pj . There is no overlap between the blocks of A. Thus the identity n × n identity matrix corresponding to multiplication on the right of A is of the form   Ip1 ×p1 0   ..   .   0 Ipm ×pm these little identity matrices don’t overlap. A similar conclusion follows from consideration of the matrices Isi ×si . Next consider the question of multiplication of two block matrices. Let B be a block matrix of the form   B11 · · · B1p  . ..  ..  .  (4.17) . .   . Br1 · · · Brp and A is a block matrix of the form 

A11  .  .  . Ap1

··· .. . ···

 A1m ..   .  Apm

(4.18)

and that for all i, j, it makes sense to multiply Bis Asj for all s ∈ {1, · · · , p}. (That is the two matrices, Bis and Asj are conformable.) ∑ and that for fixed ij, it follows Bis Asj is the same size for each s so that it makes sense to write s Bis Asj . The following theorem says essentially that when you take the product of two matrices, you can do it two ways. One way is to simply multiply them forming BA. The other way is to partition both matrices, formally multiply the blocks to get another block matrix and this one will be BA partitioned. Before presenting this theorem, here is a simple lemma which is really a special case of the theorem.


Lemma 4.4.1 Consider the following product.   0 (    I  0 I 0

) 0

where the first is n × r and the second is r × n. The small identity matrix I is an r × r matrix and there are l zero rows above I and l zero columns to the left of I in the right matrix. Then the product of these matrices is a block matrix of the form   0 0 0    0 I 0  0 0 0 Proof: From the definition of the way you multiply matrices, the product is              0 0 0 0 0 0               I  0 · · ·  I  0  I  e1 · · ·  I  er  I  0 · · ·  I  0 0 0 0 0 0 0 which yields the claimed result. In the formula ej refers to the column vector of length r which has a 1 in the j th position.  Theorem 4.4.2 Let B be a q × p block matrix as in 4.17 and let A be a p × n block matrix as in 4.18 such that Bis is conformable with Asj and each product, Bis Asj for s = 1, · · · , p is of the same size so they can be added. Then BA can be obtained as a block matrix such that the ij th block is of the form ∑ Bis Asj . (4.19) s

Proof: From 4.16 ( Bis Asj =

) 0 Iri ×ri

0





0

  B  Ips ×ps  0

(

) 0 Ips ×ps

0





0

  A  Iqj ×qj  0

where here it is assumed Bis is ri × ps and Asj is ps × qj . The product involves the sth block in the ith row of blocks for B and the sth block in the j th column of A. Thus there are the same number of rows above the Ips ×ps as there are columns to the left of Ips ×ps in those two inside matrices. Then from Lemma 4.4.1     0 0 0 0 ( )      Ips ×ps  0 Ips ×ps 0 =  0 Ips ×ps 0  0 0 0 0 Since the blocks of small identity matrices do not overlap,    Ip1 ×p1 0 0 0 ∑   ..   0 Ips ×ps 0  =  . s 0 0 0 0 and so

∑ s



0

 =I 

Ipp ×pp

Bis Asj = ∑( s

) 0 Iri ×ri

0



0



  B  Ips ×ps  0

(

) 0 Ips ×ps

0



0



  A  Iqj ×qj  0

4.5. ELEMENTARY MATRICES ( =

( =



)

0 Iri ×ri

0 Iri ×ri

77 

0

(

)



0



∑     Ips ×ps  0 Ips ×ps 0 A  Iqj ×qj  s 0 0     0 0 ( ) )     0 BIA  Iqj ×qj  = 0 Iri ×ri 0 BA  Iqj ×qj  0 0 0

B

which equals the ij th block of BA.∑Hence the ij th block of BA equals the formal multiplication according to matrix multiplication, s Bis Asj .  Example 4.4.3 Let an n × n matrix have the form ( ) a b A= c P where P is n − 1 × n − 1. Multiply it by

( B=

p r

q Q

)

where B is also an n × n matrix and Q is n − 1 × n − 1. You use block multiplication ( )( a b p c P r

q Q

)

( =

ap + br aq + bQ pc + P r cq + P Q

)

Note that this all makes sense. For example, b = 1 × n − 1 and r = n − 1 × 1 so br is a 1 × 1. Similar considerations apply to the other blocks. Here is a very significant application. A matrix is called block diagonal if it has all zeros except for square blocks down the diagonal. That is, it is of the form   A1 0   A2     A= ..  .   0 Am where Aj is a rj × rj matrix whose main diagonal lies on the main diagonal of A. Then by block multiplication, if p ∈ N the positive integers,  p  A1 0   Ap2   p   (4.20) A = ..  .   0 Apm Also, A−1 exists if and only if each block is invertible and in fact, A−1 is given by the above when p = −1.

4.5 Elementary Matrices

The elementary matrices result from doing a row operation to the identity matrix. Recall the following definition.


Definition 4.5.1 The row operations consist of the following 1. Switch two rows. 2. Multiply a row by a nonzero number. 3. Replace a row by a multiple of another row added to it. The elementary matrices are given in the following definition. Definition 4.5.2 The elementary matrices consist of those matrices which result by applying a row operation to an identity matrix. Those which involve switching rows of the identity are called permutation matrices1 . The importance of elementary matrices is that when you multiply on the left by one, it does the row operation which was used to produce the elementary matrix. Now consider what these elementary matrices look like. First consider the one which involves switching row i and row j where i < j. This matrix is of the form   .. .     0 1     ..   .       1 0   .. . Note how the ith and j th rows are switched in the identity matrix and there are thus all ones on the main diagonal except for those two positions indicated. The two exceptional rows are shown. The ith row was the j th and the j th row was the ith in the identity matrix. Now consider what this does to a column vector.      .. .. .. . . .       x   x   0 1  i   j    .   .   ..  .  =  .   .  .   .          xj   xi   1 0      .. .. .. . . . Now denote by P ij the elementary matrix which comes from the identity from switching rows i and j. From what was just explained and Proposition ??,     .. .. .. .. .. .. . . . . . .      a     i1 ai2 · · · aip   aj1 aj2 · · · ajp      . . . . . .  .. ..  .. ..  P ij   ..  =  ..       aj1 aj2 · · · ajp   ai1 ai2 · · · aip      .. .. .. .. . .. . . . . . . . This has established the following lemma. Lemma 4.5.3 Let P ij denote the elementary matrix which involves switching the ith and the j th rows. Then P ij A = B where B is obtained from A by switching the ith and the j th rows. 1 More generally, a permutation matrix is a matrix which comes by permuting the rows of the identity matrix, which means possibly more than two rows are switched.
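Multiplication on the left by an elementary matrix really does carry out the corresponding row operation, and this is easy to watch happen numerically. The sketch below assumes NumPy; the matrix A and the particular row operations are arbitrary choices.

# Sketch assuming NumPy: elementary matrices perform row operations by left multiplication.
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

P = np.eye(3); P[[0, 2]] = P[[2, 0]]     # switch rows 1 and 3 of the identity
E = np.eye(3); E[1, 1] = 5.0             # multiply row 2 of the identity by 5
F = np.eye(3); F[2, 0] = 2.0             # add 2 times row 1 to row 3 of the identity

print(P @ A)    # A with rows 1 and 3 switched
print(E @ A)    # A with its second row multiplied by 5
print(F @ A)    # A with 2 times its first row added to its third row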


Example 4.5.4 Consider the following.      g d a b 0 1 0       1 0 0  g d  =  a b  e f e f 0 0 1 Next consider the row operation which involves multiplying the ith row by a nonzero constant, c. The elementary matrix which results from applying this operation to the ith row of the identity matrix is of the form   .. . 0     1     c       1   .. . 0 Now consider what this does to a column vector.   . .. . . 0   .    vi−1 1     c    vi      vi+1 1   .. .. . 0 .







.. .

    vi−1      =  cvi     vi+1   .. .

       

Denote by E (c, i) this elementary matrix which multiplies the ith row of the identity by the nonzero constant, c. Then from what was just discussed and Proposition ??, 

.. .

  a  (i−1)1  E (c, i)  ai1   a(i+1)1  .. .

.. .

.. . ··· ··· ···

a(i−1)2 ai2 a(i+1)2 .. .

a(i−1)p aip a(i+1)p .. .





.. .

    a   (i−1)1    =  cai1     a(i+1)1   .. .

.. . a(i−1)2 cai2 a(i+1)2 .. .

.. . ··· ··· ···

a(i−1)p caip a(i+1)p .. .

        

This proves the following lemma. Lemma 4.5.5 Let E (c, i) denote the elementary matrix corresponding to the row operation in which the ith row is multiplied by the nonzero constant, c. Thus E (c, i) involves multiplying the ith row of the identity matrix by c. Then E (c, i) A = B where B is obtained from A by multiplying the ith row of A by c. Example 4.5.6 Consider this. 

1   0 0

0 5 0

    a b a b 0     0   c d  =  5c 5d  e f e f 1

Finally consider the third of these row operations. Denote by E (c × i + j) the elementary matrix which replaces the j th row with the j th row added to c times the ith row. In case i < j this will be

80

CHAPTER 4. MATRICES

of the form

         

..

. 1 .. c

. 1

0

 0          .. .

Now consider what this does to a column vector.     .. .. .. . 0 . .          1 vi    vi     .   .. ..      . .    ..  =         vj   cvi + vj c 1     .. .. .. . 0 . .

         

Now from this and Proposition ??,      E (c × i + j)           =     

.. . ai1 .. . cai1 + aj1 .. .

.. . ai1 .. . aj1 .. .

.. . ai2 .. . aj2 .. .

.. . ai2 .. . cai2 + aj2 .. .

··· ···

··· ···

.. . aip .. . ajp .. .

         

.. . aip .. . caip + ajp .. .

         

The case where i > j is handled similarly. This proves the following lemma. Lemma 4.5.7 Let E (c × i + j) denote the elementary matrix obtained from I by replacing the j th row with c times the ith row added to it. Then E (c × i + j) A = B where B is obtained from A by replacing the j th row of A with itself added to c times the ith row of A. Example 4.5.8 Consider the  1   0 2

third row operation.     a b a b 0 0     c d 1 0  c d  =   2a + e 2b + f e f 0 1

The next theorem is the main result. Theorem 4.5.9 To perform any of the three row operations on a matrix A, it suffices to do the row operation on the identity matrix obtaining an elementary matrix E and then take the product, EA. ˆ such that Furthermore, if E is an elementary matrix, then there is another elementary matrix E ˆ ˆ E E = EE = I.


Proof: The first part of this theorem has been proved in Lemmas 4.5.3 - 4.5.7. It only remains ˆ Consider first the elementary matrices corresponding to row to verify the claim about the matrix E. operation of type three. E (−c × i + j) E (c × i + j) = I. This follows because the first matrix takes c times row i in the identity and adds it to row j. When multiplied on the left by E (−c × i + j) it follows from the first part of this theorem that you take the ith row of E (c × i + j) which coincides with the ith row of I since that row was not changed, multiply it by −c and add to the j th row of E (c × i + j) which was the j th row of I added to c times the ith row of I. Thus E (−c × i + j) multiplied on the left, undoes the row operation which resulted in E (c × i + j). The same argument applied to the product E (c × i + j) E (−c × i + j) replacing c with −c in the argument yields that this product is also equal to I. Therefore, there is an elementary matrix of the same sort which when multiplied by E on either side gives the identity. Similar reasoning shows that for E (c, i) the elementary matrix which comes from multiplying ˆ = E ((1/c) , i). the ith row by the nonzero constant c, you can take E ij Finally, consider P which involves switching the ith and the j th rows P ij P ij = I because by the first part of this theorem, multiplying on the left by P ij switches the ith and j th rows of P ij which was obtained from switching the ith and j th rows of the identity. First you switch them to get P ij and then you multiply on the left by P ij which switches these rows again and restores the identity matrix.  Using Theorem 4.3.4, this shows the following result. Theorem 4.5.10 Let A be an n × n matrix. Then if R is its row reduced echelon form, there is a sequence of elementary matrices Ei such that E1 E2 · · · Em A = R In particular, A is invertible if and only if there is a sequence of elementary matrices as above such −1 · · · E2−1 E1−1 a product of elementary matrices. that E1 E2 · · · Em A = I. Inverting these,A = Em
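Theorem 4.5.10 can be watched in action for a small matrix. The sketch below assumes NumPy; the matrix and the chosen sequence of row operations are just one example, recorded by hand as elementary matrices.

# Sketch assuming NumPy: elementary matrices reduce A to I, so their product is A^{-1}.
import numpy as np

A  = np.array([[2., 1.],
               [1., 3.]])
E1 = np.array([[0.5, 0.], [0., 1.]])     # multiply row 1 by 1/2
E2 = np.array([[1., 0.], [-1., 1.]])     # add -1 times row 1 to row 2
E3 = np.array([[1., 0.], [0., 0.4]])     # multiply row 2 by 2/5
E4 = np.array([[1., -0.5], [0., 1.]])    # add -1/2 times row 2 to row 1

print(E4 @ E3 @ E2 @ E1 @ A)             # the identity matrix
print(E4 @ E3 @ E2 @ E1)                 # the product of the elementary matrices
print(np.linalg.inv(A))                  # the same matrix, up to rounding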

4.6 Exercises

1. In 4.1 - 4.8 describe −A and 0. 2. Let A be an n × n matrix. Show A equals the sum of a symmetric and a skew symmetric matrix. 3. Show every skew symmetric matrix has all zeros down the main diagonal. The main diagonal consists of every entry of the matrix which is of the form aii . It runs from the upper left down to the lower right. 4. We used the fact that the columns of a matrix A are independent if and only if Ax = 0 has only the zero solution for x. Why is this so? 5. If A is m × n where n > m, explain why there exists x ∈ Fn such that Ax = 0 but x ̸= 0. 6. Using only the properties 4.1 - 4.8 show −A is unique. 7. Using only the properties 4.1 - 4.8 show 0 is unique. 8. Using only the properties 4.1 - 4.8 show 0A = 0. Here the 0 on the left is the scalar 0 and the 0 on the right is the zero for m × n matrices. 9. Using only the properties 4.1 - 4.8 and previous problems show (−1) A = −A. 10. Prove that Im A = A where A is an m × n matrix.


11. Let A be a real m × n matrix and let x ∈ Rn and y ∈ Rm . Show (Ax, y)Rm = (x, AT y)Rn where (·, ·)Rk denotes the dot product in Rk . You need to know about the dot product. It will be discussed later but hopefully it has been seen in physics or calculus.

12. Use the result of Problem 11 to verify directly that (AB) = B T AT without making any reference to subscripts. However, note that the treatment in the chapter did not depend on a dot product. 13. Let x = (−1, −1, 1) and y = (0, 1, 2) . Find xT y and xyT if possible. 14. Give an example of matrices, A, B, C such that B ̸= C, A ̸= 0, and yet AB = AC.     ( ) 1 1 1 1 −3 1 −1 −2     15. Let A =  −2 −1 , B = , and C =  −1 2 0  . Find if possible 2 1 −2 1 2 −3 −1 0 the following products. AB, BA, AC, CA, CB, BC. −1

16. Show (AB)

= B −1 A−1 .

( )−1 ( −1 )T = A . 17. Show that if A is an invertible n × n matrix, then so is AT and AT 18. Show that if A is an n × n invertible matrix and x is a n × 1 matrix such that Ax = b for b an n × 1 matrix, then x = A−1 b. 19. Give an example of a matrix A such that A2 = I and yet A ̸= I and A ̸= −I. 20. Give an example of matrices, A, B such that neither A nor B equals zero and yet AB = 0. 21. Give another example other than the one given in this section of two square matrices, A and B such that AB ̸= BA. 22. Suppose A and B are square matrices of the same size. Which of the following are correct? 2

(a) (A − B) = A2 − 2AB + B 2 2

(b) (AB) = A2 B 2 2

(c) (A + B) = A2 + 2AB + B 2 2

(d) (A + B) = A2 + AB + BA + B 2 (e) A2 B 2 = A (AB) B 3

(f) (A + B) = A3 + 3A2 B + 3AB 2 + B 3 (g) (A + B) (A − B) = A2 − B 2 (h) None of the above. They are all wrong. (i) All of the above. They are all right. ( ) −1 −1 23. Let A = . Find all 2 × 2 matrices, B such that AB = 0. 3 3 24. Prove that if A−1 exists and Ax = 0 then x = 0. 25. Let



1 2  A= 2 1 1 0

 3  4 . 2

Find A−1 if possible. If A−1 does not exist, determine why.


26. Let

1  A= 2 1

 3  4 . 2

0 3 0

Find A−1 if possible. If A−1 does not exist, determine why. 27. Let



1 2  A= 2 1 4 5

 3  4 . 10

Find A−1 if possible. If A−1 does not exist, determine why. 28. Let

   A= 

1 1 2 1

2 1 1 2

 0 2 2 0    −3 2  1 2

Find A−1 if possible. If A−1 does not exist, determine why. (

29. Let A=

2 1

1 3

)

Find A−1 if possible. If A−1 does not exist, determine why. Do this in Q2 and in Z25 . (

30. Let A=

2 1

1 2

)

Find A−1 if possible. If A−1 does not exist, determine why. Do this in Q2 and in Z23 . 31. If you have any system of equations Ax = b, let ker (A) ≡ {x : Ax = 0} . Show that all solutions of the system Ax = b are in ker (A) + yp where Ayp = b. This means that every solution of this last equation is of the form yp + z where Az = 0. 32. Write the solution set of the following system as the span solution space of the following system.     x 1 −1 2      1 −2 1   y  =  z 3 −4 5 33. Using Problem 32 find the general  1   1 3

of vectors and find a basis for the  0  0 . 0

solution to the following linear system.     1 x −1 2     −2 1   y  =  2  . 4 z −4 5

34. Write the solution set of the following system as the span solution space of the following system.     x 0 −1 2      1 −2 1   y  =  z 1 −4 5

of vectors and find a basis for the  0  0 . 0


35. Using Problem 34 find the general solution to the following linear system.      1 x 0 −1 2       1 −2 1   y  =  −1  . 1 z 1 −4 5 36. Write the solution set of the following system as the span solution space of the following system.     1 −1 2 x      1 −2 0   y  =  3 −4 4 z 37. Using Problem 36 find the general  1   1 3

of vectors and find a basis for the  0  0 . 0

solution to the following linear system.     −1 2 x 1     −2 0   y  =  2  . −4 4 z 4

38. Show that 4.20 is valid for p = −1 if and only if each block has an inverse and that this condition holds if and only if A is invertible. 39. Let A be an n × n matrix and let P ij be the permutation matrix which switches the ith and j th rows of the identity. Show that P ij AP ij produces a matrix which is similar to A which switches the ith and j th entries on the main diagonal. 40. You could define column operations by analogy to row operations. That is, you switch two columns, multiply a column by a nonzero scalar, or add a scalar multiple of a column to another column. Let E be one of these column operations applied to the identity matrix. Show that AE produces the column operation on A which was used to define E. 41. Consider the symmetric 3 × 3 matrices, those for which A = AT . Show that with respect to the usual notions of addition and scalar multiplication this is a vector space of dimension 6. What is the dimension of the set of skew symmetric matrices? 42. You have an m × n matrix of rank r. Explain why if you delete a column, the resulting matrix has rank r or rank r − 1. 43. Using the fact that multiplication on the left by an elementary matrix accomplishes a row operation, show easily that row operations produce no change in linear relations between columns.

Chapter 5

Linear Transformations

This chapter is on functions which map a vector space to another one which are also linear. The description of these is in the following definition. Linear algebra is all about understanding these kinds of mappings.

Definition 5.0.1 Let V and W be two finite dimensional vector spaces. A function, L which maps V to W is called a linear transformation and written L ∈ L (V, W ) if for all scalars α and β, and vectors v, w,

L (αv + βw) = αL (v) + βL (w) .

Example 5.0.2 Let V = R^3 , W = R, and let a ∈ R^3 be a given vector in V. Define T : V → W by T v ≡ ∑_{i=1}^{3} ai vi

It is left as an exercise to verify that this is indeed linear. Here is an interesting observation. Proposition 5.0.3 Let L : Fn → Fm be linear. Then there exists a unique m × n matrix A such that Lx = Ax for all x. Also, matrix multiplication yields a linear transformation. ∑n Proof: Note that x = i=1 xi ei and so   ( n ) x 1 n ( ) ∑ ∑ ..   Lx = L xi e i = xi Lei = Le1 · · · Len   .  i=1 i=1 xn ( ) = Le1 · · · Len x The matrix is A. The last claim follows from the properties of matrix multiplication.  I will abuse terminology slightly and say that a m × n matrix is one to one if the linear transformation it determines is one to one, similarly for the term onto.
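Proposition 5.0.3 also tells you how to write down the matrix: its j th column is L applied to the j th standard basis vector. The sketch below assumes NumPy; the linear map used is an arbitrary example.

# Sketch assuming NumPy: the matrix of L : F^n -> F^m has Le_j as its j-th column.
import numpy as np

def L(x):                                  # an example linear map from R^3 to R^2
    return np.array([x[0] + 2*x[1], 3*x[2] - x[0]])

A = np.column_stack([L(e) for e in np.eye(3)])   # columns are L(e1), L(e2), L(e3)
x = np.array([1.0, -2.0, 4.0])
print(A)              # [[ 1.  2.  0.] [-1.  0.  3.]]
print(A @ x, L(x))    # both give [-3. 11.]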

5.1 L (V, W ) as a Vector Space

The linear transformations can be considered as a vector space as described next. Definition 5.1.1 Given L, M ∈ L (V, W ) define a new element of L (V, W ) , denoted by L + M according to the rule1 (L + M ) v ≡ Lv + M v. For α a scalar and L ∈ L (V, W ) , define αL ∈ L (V, W ) by αL (v) ≡ α (Lv) . 1 Note

that this is the standard way of defining the sum of two functions.


You should verify that all the axioms of a vector space hold for L (V, W ) with the above definitions of vector addition and scalar multiplication. In fact, is just a subspace of the set of functions mapping V to W which is a vector space thanks to Example 3.0.3. What about the dimension of L (V, W )? What about a basis for L (V, W )? Before answering this question, here is a useful lemma. It gives a way to define linear transformations and a way to tell when two of them are equal. Lemma 5.1.2 Let V and W be vector spaces and suppose {v1 , · · · , vn } is a basis for V. Then if L : V → W is given by Lvk = wk ∈ W and ( n ) n n ∑ ∑ ∑ L a k vk ≡ ak Lvk = ak wk k=1

k=1

k=1

then L is well defined and is in L (V, W ) . Also, if L, M are two linear transformations such that Lvk = M vk for all k, then M = L. Proof: L is well defined on V because, since {v1 , · · · , vn } is a basis, there is exactly one way to write a given vector of V as a linear combination. ∑ Next, observe that L is obviously linear from the n definition. If L, M are equal on the basis, then if k=1 ak vk is an arbitrary vector of V, ( n ) ( n ) n n ∑ ∑ ∑ ∑ L ak vk = ak Lvk = ak M vk = M ak vk k=1

k=1

k=1

k=1

and so L = M because they give the same result for every vector in V .  The message is that when you define a linear transformation, it suffices to tell what it does to a basis. Example 5.1.3 A basis for R2 is

(

1 1

) ( ,

1 0

)

Suppose T is a linear transformation which satisfies ( ) ( ) ( ) ( ) 1 2 1 −1 T = , T = 1 1 0 1 ( ) 3 Find T . 2 ( T

3 2

)

( ( =

T

2 (

1 1 )

)

( +

1 0

))

( ) 1 1 = 2T +T 1 0 ( ) ( ) ( ) 2 −1 3 = 2 + = 1 1 3

Theorem 5.1.4 Let V and W be finite dimensional linear spaces of dimension n and m respectively. Then dim (L (V, W )) = mn. Proof: Let two sets of bases be {v1 , · · · , vn } and {w1 , · · · , wm } for V and W respectively. Using Lemma 5.1.2, let wi vj ∈ L (V, W ) be the linear transformation defined on the basis, {v1 , · · · , vn }, by wi vk (vj ) ≡ wi δ jk


where δ ik = 1 if i = k and 0 if i ̸= k. I will show that L ∈ L (V, W ) is a linear combination of these special linear transformations called dyadics, also rank one transformations. Then let L ∈ L (V, W ). Since {w1 , · · · , wm } is a basis, there exist constants, djk such that Lvr =

m ∑

djr wj

j=1

∑m ∑n

Now consider the following sum of dyadics. m ∑ n ∑

dji wj vi (vr ) =

j=1 i=1

j=1 m ∑ n ∑

i=1

dji wj vi . Apply this to vr . This yields

dji wj δ ir =

j=1 i=1

m ∑

djr wj = Lvr

(5.1)

j=1

∑m ∑n Therefore, L = j=1 i=1 dji wj vi showing the span of the dyadics is all of L (V, W ) . Now consider whether these special linear transformations are a linearly independent set. Suppose ∑ dik wi vk = 0. i,k

Are all the scalars dik equal to 0? 0=



dik wi vk (vl ) =

m ∑

dil wi

i=1

i,k

and so, since {w1 , · · · , wm } is a basis, dil = 0 for each i = 1, · · · , m. Since l is arbitrary, this shows dil = 0 for all i and l. Thus these linear transformations form a basis and this shows that the dimension of L (V, W ) is mn as claimed because there are m choices for the wi and n choices for the vj .  Note that from 5.1, these coefficients which obtain L as a linear combination of the diadics are given by the equation m ∑ djr wj = Lvr (5.2) j=1

Thus Lvr is in the span of the wj and the scalars in the linear combination are d1r , d2r , · · · , dmr .

5.2

The Matrix of a Linear Transformation

In order to do computations based on a linear transformation, we usually work with its matrix. This is what is described here. Theorem 5.1.4 says that the rank one transformations defined there in terms of two bases, one for V and the other for W are a basis for L (V, W ) . Thus if A ∈ L (V, W ) , there are scalars Aij such that n ∑ m ∑ A= Aij wi vj i=1 j=1

Here we have 1 ≤ i ≤ n and 1 ≤ j ≤ follows.  A11   A21  .  .  . Am1

m. We can arrange these scalars in a rectangular shape as  A12 · · · A1(n−1) A1n  A22 · · · A2(n−1) A2n  .. .. ..   . . .  Am2 · · · Am(n−1) Amn

88

CHAPTER 5.

LINEAR TRANSFORMATIONS

Here this is an m × n matrix because it has m rows and n columns. It is called the matrix of the linear transformation A with ∑ respect to the two bases {v1 , · · · , vn } for V and {w1 , · · · , wm } for W . n Now, as noted earlier, if v = r=1 xr vr , ) ( n m ∑ n ∑ ∑ Av = Aij wi vj x r vr r=1

i=1 j=1

=

=

n ∑

xr

m ∑ n ∑

r=1

i=1 j=1

n ∑

m ∑

r=1

xr

Aij wi vj (vr ) =

Air wi =

i=1

( n m ∑ ∑ i=1

n ∑

xr

r=1

Air xr

m ∑ n ∑

Aij wi δ jr

i=1 j=1

)

wi

r=1

What does this show? It shows that if the component vector of v is   x1  .  .  x =  .  xn meaning that v =

∑ i

xi vi , then the component vector of w has ith component equal to n ∑

Air xr = (Ax)i

r=1

The idea is that acting on a vector v with a linear transformation T yields a new vector w whose component vector is obtained as the matrix of the linear transformation times the component vector of v. It is helpful for some of us to think of this in terms of diagrams. On the other hand, some people hate such diagrams. Use them if it helps. Otherwise ignore them and go right to the algebraic definition 5.2. Let β = {v1 , · · · , vn } be a basis for V and let {w1 , · · · , wm } = γ be a basis for W . Then let qβ : Fn → V, qγ : Fm → W be defined as qβ x ≡

n ∑

xi v i , q γ y ≡

i=1

m ∑

yj wj

j=1

Thus these mappings are linear and take the component vector to the vector determined by the component vector. Then the diagram which describes the matrix of the linear transformation L is in the following picture. L β = {v1 , · · · , vn } V → W {w1 , · · · , wm } = γ (5.3) qβ ↑ ◦ ↑ qγ Fn → Fm [L]γβ In terms of this diagram, the matrix [L]γβ is the matrix chosen to make the diagram “commute”. It is the matrix of the linear transformation because it takes the component vector of v to the component vector for Lv. As implied by the diagram and as shown above, for A = [L]γβ , Lvi =

m ∑ j=1

Aji wj

5.2. THE MATRIX OF A LINEAR TRANSFORMATION

89

Gimmick for finding matrix of a linear transformation It may be useful to write this in the form ( ) ( Lv1 · · · Lvn = w1

···

) wm

A, A is m × n

(5.4)

and multiply formally as if the Lvi , wj were numbers. Example 5.2.1 Let L ∈ L (Fn , Fm ) and let the two bases be

{ e1

···

en

} { , e1

···

} em

, ei

th

denoting the column vector of zeros except for a 1 in the i position. Then from the above, you need to have m ∑ Lei = Aji ej j=1

which says that

( Le1

···

) Len

( = m×n

e1

···

) em

m×m

Am×n

and so Lei equals the ith column of A. In other words, ( ) A = Le1 · · · Len . ( Then for x =

x1

···

)T xn Ax

= A

( n ∑

) xi e i

i=1

=

n ∑

xi Lei = L

=

n ∑

i=1 ( n ∑

i=1

xi Aei )

xi ei

= Lx

i=1

Thus, doing L to a vector x is the same as multiplying on the left by the matrix A. Example 5.2.2 Let V ≡ { polynomials of degree 3 or less}, W ≡ { polynomials of degree 2 or less},

{ } and L ≡ D where D is the differentiation operator. A basis for V is β = 1, x, x2 , x3 and a basis for W is γ = {1, x, x2 }. What is the matrix of this linear transformation with respect to this basis? Using 5.4, ( ) ( ) 0 1 2x 3x2 = 1 x x2 [D]γβ . It follows from this that the first column of [D]γβ is  0    0  0 

The next three columns of [D]γβ are      0 0 1        0 , 2 , 0  3 0 0 

90

CHAPTER 5. 

and so [D]γβ

0  = 0 0

1 0 0

0 2 0

LINEAR TRANSFORMATIONS

 0  0 . 3

Say you have a + bx + cx2 + dx3 . Then doing D to it gives b + 2cx + 3dx2 . The component vector of the function is ( )T a b c d and after doing D to the function, you get for the component vector (

)T b 2c 3d

This is the same result you get when you multiply by [D] . 

0 1   0 0 0 0

0 2 0





0   0   3

a b c d



  b      =  2c   3d

Of course, this is what it means to be the matrix of the transformation. Now consider the important case where V = Fn , W = Fm , and the basis chosen is the standard basis of vectors ei described above. β = {e1 , · · · , en } , γ = {e1 , · · · , em } Let L be a linear transformation from Fn to Fm and let A be the matrix of the transformation with respect to these bases. In this case the coordinate maps qβ and qγ are simply the identity maps on Fn and Fm respectively, and can be accomplished by simply multiplying by the appropriate sized identity matrix. The requirement that A is the matrix of the transformation amounts to Lb = Ab What about the situation where different pairs of bases are chosen for V and W ? How are the two matrices with respect to these choices related? Consider the following diagram which illustrates the situation. Fn A2 Fm − → qβ 2 ↓ ◦ qγ 2 ↓ V − L W → qβ 1 ↑ Fn

In this diagram qβ i and qγ i

◦ qγ 1 ↑ A1 Fm − → are coordinate maps as described above. From the diagram, qγ−1 qγ 2 A2 qβ−1 qβ 1 = A1 , 1 2

where qβ−1 qβ 1 and qγ−1 qγ 2 are one to one, onto, and linear maps which may be accomplished by 1 2 multiplication by a square matrix. Thus there exist matrices P, Q such that P : Fn → Fn and Q : Fm → Fm are invertible and P A 2 Q = A1 . Example 5.2.3 Let β ≡ {v1 , · · · , vn } and γ ≡ {w1 , · · · , wn } be two bases for V . Let L be the linear transformation which maps vi to wi . Find [L]γβ .

5.2. THE MATRIX OF A LINEAR TRANSFORMATION

91

Letting δ ij be the symbol which equals 1 if i = j and 0 if i ̸= j, it follows that L = and so [L]γβ = I the identity matrix.

∑ i,j

δ ij wi vj

Definition 5.2.4 In the special case where V = W and only one basis is used for V = W, this becomes qβ−1 qβ 2 A2 qβ−1 q β 1 = A1 . 1 2 Letting S be the matrix of the linear transformation qβ−1 qβ 1 with respect to the standard basis vectors 2 in Fn , S −1 A2 S = A1 . (5.5) When this occurs, A1 is said to be similar to A2 and A → S −1 AS is called a similarity transformation. Recall the following. Definition 5.2.5 Let S be a set. The symbol ∼ is called an equivalence relation on S if it satisfies the following axioms. 1. x ∼ x

for all x ∈ S. (Reflexive)

2. If x ∼ y then y ∼ x. (Symmetric) 3. If x ∼ y and y ∼ z, then x ∼ z. (Transitive) Definition 5.2.6 [x] denotes the set of all elements of S which are equivalent to x and [x] is called the equivalence class determined by x or just the equivalence class of x. Also recall the notion of equivalence classes. Theorem 5.2.7 Let ∼ be an equivalence class defined on a set S and let H denote the set of equivalence classes. Then if [x] and [y] are two of these equivalence classes, either x ∼ y and [x] = [y] or it is not true that x ∼ y and [x] ∩ [y] = ∅. Theorem 5.2.8 In the vector space of n × n matrices, define A∼B if there exists an invertible matrix S such that A = S −1 BS. Then ∼ is an equivalence relation and A ∼ B if and only if whenever V is an n dimensional vector space, there exists L ∈ L (V, V ) and bases {v1 , · · · , vn } and {w1 , · · · , wn } such that A is the matrix of L with respect to {v1 , · · · , vn } and B is the matrix of L with respect to {w1 , · · · , wn }. Proof: A ∼ A because S = I works in the definition. If A ∼ B , then B ∼ A, because A = S −1 BS implies B = SAS −1 . If A ∼ B and B ∼ C, then A = S −1 BS, B = T −1 CT and so

−1

A = S −1 T −1 CT S = (T S)

CT S

which implies A ∼ C. This verifies the first part of the conclusion.

92

CHAPTER 5.

LINEAR TRANSFORMATIONS

Now let V be an n dimensional vector space, A ∼ B so A = S −1 BS and pick a basis for V, β ≡ {v1 , · · · , vn }. Define L ∈ L (V, V ) by Lvi ≡



aji vj

j

where A = (aij ) . Thus A is the matrix of the linear transformation L. Consider the diagram Fn qγ ↓ V qβ ↑ Fn

Fn

B − → ◦ L − →

qγ ↓ V

◦ qβ ↑ A Fn − →

where qγ is chosen to make the diagram commute. Thus we need S = qγ−1 qβ which requires qγ = qβ S −1 Then it follows that B is the matrix of L with respect to the basis {qγ e1 , · · · , qγ en } ≡ {w1 , · · · , wn }. That is, A and B are matrices of the same linear transformation L. Conversely, suppose whenever V is an n dimensional vector space, there exists L ∈ L (V, V ) and bases {v1 , · · · , vn } and {w1 , · · · , wn } such that A is the matrix of L with respect to {v1 , · · · , vn } and B is the matrix of L with respect to {w1 , · · · , wn }. Then it was shown above that A ∼ B.  What if the linear transformation consists of multiplication by a matrix A and you want to find the matrix of this linear transformation with respect to another basis? Is there an easy way to do it? The next proposition considers this. Proposition 5.2.9 Let A be an m × n matrix and consider it as a linear transformation by multiplication on the left by A. Then the matrix M of this linear transformation with respect to the bases β = {u1 , · · · , un } for Fn and γ = {w1 , · · · , wm } for Fm is given by ( M= ( where

w1

···

) wm

w1

(

)−1

···

A

wm

)

···

u1

un

is the m × m matrix which has wj as its j th column. Note that also ( w1

···

)

( M

wm

u1

···

)−1 un

=A

Proof: Consider the following diagram.

F

n

qβ ↑ Fn

A → ◦ → M

Fm ↑ qγ Fm

Here the coordinate maps are defined in the usual way. Thus qβ (x) ≡

n ∑ i=1

( xi ui =

u1

···

) un

x

5.2. THE MATRIX OF A LINEAR TRANSFORMATION

93

Therefore, qβ can)be considered the same as multiplication of a vector in Fn on the left by the matrix ( u1 · · · un . Similar considerations apply to qγ . Thus it is desired to have the following for an arbitrary x ∈ Fn . ) ) ( ( A u1 · · · un x = w1 · · · wn M x Therefore, the conclusion of the proposition follows.  The second formula in the above is pretty useful. You might know the matrix M of a linear transformation with respect to a funny basis and this formula gives the matrix of the linear transformation in terms of the usual basis which is really what you want. Definition 5.2.10 Let A ∈ L (X, Y ) where X and Y are finite dimensional vector spaces. Define rank (A) to equal the dimension of A (X) . Lemma 5.2.11 Let M be an m × n matrix. Then M can be considered as a linear transformation as follows. M (x) ≡ M x That is, you multiply on the left by M . Proof: This follows from the properties of matrix multiplication. In particular, M (ax + by) = aM x + bM y  Note also that, as explained earlier, the image of this transformation is just the span of the columns, known as the column space. The following theorem explains how the rank of A is related to the rank of the matrix of A. Theorem 5.2.12 Let A ∈ L (X, Y ). Then rank (A) = rank (M ) where M is the matrix of A taken with respect to a pair of bases for the vector spaces X, and Y. Here M is considered as a linear transformation by matrix multiplication. Proof: Recall the diagram which describes what is meant by the matrix of A. Here the two bases are as indicated. β = {v1 , · · · , vn } X A Y {w1 , · · · , wm } = γ − → qβ ↑ Fn

◦ ↑ qγ m M − → F

Let {Ax1 , · · · , Axr } be a basis for AX. Thus { } qγ M qβ−1 x1 , · · · , qγ M qβ−1 xr is a basis for AX. It follows that

{ } −1 −1 M qX x1 , · · · , M qX xr

is linearly independent and so rank (A) ≤ rank (M ) . However, one could interchange the roles of M and A in the above argument and thereby turn the inequality around.  The following result is a summary of many concepts. Theorem 5.2.13 Let L ∈ L (V, V ) where V is a finite dimensional vector space. Then the following are equivalent. 1. L is one to one. 2. L maps a basis to a basis. 3. L is onto. 4. If Lv = 0 then v = 0.

94

CHAPTER 5.

LINEAR TRANSFORMATIONS

∑n n Proof:∑ Suppose first L is one to one and let β = {vi }i=1 be a basis. Then if i=1 ci Lvi = 0 it n follows ( i=1 ci vi ) = 0 which means that since L (0) = 0, and L is one to one, it must be the case ∑L n that i=1 ci vi = 0. Since {vi } is a basis, each ci = 0 which shows {Lvi } is a linearly independent set. Since there are n of these, it must be that this is a basis. Now suppose 2.). Then letting ∑n {vi } be a basis, ∑nand y ∈ V, it follows from part 2.) that there are constants, {ci } such that y = i=1 ci Lvi = L ( i=1 ci vi ) . Thus L is onto. It has been shown that 2.) implies 3.). Now suppose 3.). Then L (V ) = V . If {v1 , · · · , vn } is a basis of V, then V = span (Lv1 , · · · , Lvn ) . It follows that {Lv1 , · · · , Lvn } must be linearly independent because if not, one of the vectors could be deleted and you would then have a spanning set with fewer vectors than dim (V ). If Lv = 0, ∑ v= x i vi i

then doing L to both sides, 0=



xi Lvi

i

which imiplies each xi = 0 and consequently v = 0. Thus 4. follows. Now suppose 4.) and suppose Lv = Lw. Then L (v − w) = 0 and so by 4.), v − w = 0 showing that L is one to one.  Also it is important to note that composition of linear transformations corresponds to multiplication of the matrices. Consider the following diagram in which [A]γβ denotes the matrix of A relative to the bases γ on Y and β on X, [B]δγ defined similarly. X qβ ↑ Fn

A Y − → ◦ ↑ qγ [A]γβ Fm −−−→

B Z − → ◦ ↑ qδ [B]δγ Fp −−−→

where A and B are two linear transformations, A ∈ L (X, Y ) and B ∈ L (Y, Z) . Then B ◦ A ∈ L (X, Z) and so it has a matrix with respect to bases given on X and Z, the coordinate maps for these bases being qβ and qδ respectively. Then B ◦ A = qδ [B]δγ qγ−1 qγ [A]γβ qβ−1 = qδ [B]δγ [A]γβ qβ−1 . But this shows that [B]δγ [A]γβ plays the role of [B ◦ A]δβ , the matrix of B ◦ A. Hence the matrix of B ◦ A equals the product of the two matrices [A]γβ and [B]δγ . Of course it is interesting to note that although [B ◦ A]δβ must be unique, the matrices, [A]γβ and [B]δγ are not unique because they depend on γ, the basis chosen for Y . Theorem 5.2.14 The matrix of the composition of linear transformations equals the product of the matrices of these linear transformations.

5.3

Rotations About a Given Vector∗

As an application, consider the problem of rotating counter clockwise about a given unit vector which is possibly not one of the unit vectors in coordinate directions. First consider a pair of perpendicular unit vectors, u1 and u2 and the problem of rotating in the counterclockwise direction about u3 where u3 = u1 × u2 so that u1 , u2 , u3 forms a right handed orthogonal coordinate system. See the appendix on the cross product if this is not familiar. Thus the vector u3 is coming out of the page.

5.3. ROTATIONS ABOUT A GIVEN VECTOR∗

95 -

θ θ u1 u2R

? Let T denote the desired rotation. Then

T (au1 + bu2 + cu3 ) = aT u1 + bT u2 + cT u3 = (a cos θ − b sin θ) u1 + (a sin θ + b cos θ) u2 + cu3 . Thus in terms of the basis γ ≡ {u1 , u2 , u3 } , the matrix of  cos θ − sin θ  [T ]γ ≡  sin θ cos θ 0 0

this transformation is  0  0 . 1

This is not desirable because it involves a funny basis. I want to obtain the matrix of the transformation in terms of the usual basis β ≡ {e1 , e2 , e3 } because it is in terms of this basis that we usually deal with vectors in R3 . From Proposition 5.2.9, if [T ]β is this matrix, 

 cos θ − sin θ 0   cos θ 0   sin θ 0 0 1 ( ( )−1 [T ]β u1 = u1 u2 u3

) u2

u3

and so you can solve for [T ]β if you know the ui . Recall why this is so. R3 R3 [T ]γ −−→ qγ ↓ ◦ qγ ↓ R3 −− T R3 → I↑ R3

◦ [T ]β −−→

I↑ R3 (

The map qγ is accomplished by a multiplication on the left by [T ]β = qγ [T ]γ qγ−1 =

( u2

u3

u2

u3

(

) u1

) u1

[T ]γ

)−1 u1

u2

u3

. Thus .

Suppose the unit vector u3 about which the counterclockwise rotation takes place is (a, b, c). Then I obtain vectors, u1 and u2 such that {u1 , u2 , u3 } is a right handed orthonormal system with u3 = (a, b, c) and then use the above result. It is of course somewhat arbitrary how this is accomplished. I will assume however, that |c| ̸= 1 since otherwise you are looking at either clockwise or counter clockwise rotation about the positive z axis and this is a problem which is fairly easy. Indeed, the matrix of such a rotation in terms of the usual basis is just   cos θ − sin θ 0   (5.6) cos θ 0   sin θ 0 0 1

96

CHAPTER 5.

LINEAR TRANSFORMATIONS

Then let u3 = (a, b, c) and u2 ≡ √a21+b2 (b, −a, 0) . This one is perpendicular to u3 . If {u1 , u2 , u3 } is to be a right hand system it is necessary to have ( ) 1 u1 = u2 × u3 = √ −ac, −bc, a2 + b2 2 2 2 2 2 (a + b ) (a + b + c ) Now recall that u3 is a unit vector and so the above equals √

(

1 (a2

+

b2 )

−ac, −bc, a2 + b2

)

Then from the above, A is given by    

√ −ac 2

(a +b2 ) √ −bc 2 2 √ (a +b ) 2 a + b2

√ b a2 +b2 √ −a a2 +b2

0

a



cos θ  b    sin θ 0 c

− sin θ cos θ 0

  √ −ac 0 (a2 +b2 )   √ −bc 0   (a2 +b2 ) √ 1 a 2 + b2

√ b a2 +b2 √ −a a2 +b2

a

0

c

−1

  b 

It is easy to take the inverse of this matrix on the left. You can check right away that its inverse is nothing but its transpose. Then doing the computation and then some simplification yields   ( ) ab (1 − cos θ) − c sin θ ac (1 − cos θ) + b sin θ a2 + 1 − a2 cos θ ( )   (5.7) =  ab (1 − cos θ) + c sin θ b2 + 1 − b2 cos θ bc (1 − cos θ) − a sin θ  . ( ) 2 2 ac (1 − cos θ) − b sin θ bc (1 − cos θ) + a sin θ c + 1 − c cos θ With this, it is clear how to rotate clockwise about the unit vector, (a, b, c) . Just rotate counter clockwise through an angle of −θ. Thus the matrix for this clockwise rotation is just   ( ) a2 + 1 − a2 cos θ ab (1 − cos θ) + c sin θ ac (1 − cos θ) − b sin θ ( )   =  ab (1 − cos θ) − c sin θ b2 + 1 − b2 cos θ bc (1 − cos θ) + a sin θ  . ( ) ac (1 − cos θ) + b sin θ bc (1 − cos θ) − a sin θ c2 + 1 − c2 cos θ In deriving 5.7 it was assumed that c ̸= ±1 but even in this case, it gives the correct answer. Suppose for example that c = 1 so you are rotating in the counter clockwise direction about the positive z axis. Then a, b are both equal to zero and 5.7 reduces to 5.6.

5.4

Exercises

1. If A, B, and C are each n × n matrices and ABC is invertible, why are each of A, B, and C invertible? 2. Give an example of a 3 × 2 matrix with the property that the linear transformation determined by this matrix is one to one but not onto. 3. Explain why Ax = 0 always has a solution whenever A is a linear transformation. 4. Recall that a line in Rn is of the form x + tv where t ∈ R. Recall that v is a “direction vector”. Show that if T : Rn → Rm is linear, then the image of T is either a line or a point. 5. In the following examples, a linear transformation, T is given by specifying its action on a basis β. Find its matrix with respect to this basis. ( ) ( ) ( ) ( ) ( ) 1 1 −1 −1 −1 (a) T =2 +1 ,T = 2 2 1 1 1 ( ) ( ) ( ) ( ) ( ) 0 0 −1 −1 0 (b) T =2 +1 ,T = 1 1 1 1 1

5.4. EXERCISES ( (c) T

1 0

97

)

( =2

1 2

)

( +1

1 0

)

( ,T

1 2

)

( =1

1 0

)

( −

1 2

)

6. ↑In each example above, find a matrix A such that for every x ∈ R2 , T x = Ax. 7. Consider the linear transformation Tθ which rotates every vector in R2 through the angle of θ. Find the matrix Aθ such that Tθ x = Aθ x. Hint: You need to have the columns of Aθ be T e1 and T e2 . Review why this is before using this. Then simply find these vectors from trigonometry. 8. ↑If you did the above problem right, you got ( cos θ Aθ = sin θ

− sin θ cos θ

)

Derive the famous trig. identities for the sum of two angles by using the fact that Aθ+ϕ = Aθ Aϕ and the above description. 9. Let β = {u1 , · · · , un } be a basis for Fn and let T : Fn → Fn be defined as follows. ( n ) n ∑ ∑ T ak uk = ak bk uk k=1

k=1

First show that T is a linear transformation. Next show that the matrix of T with respect to this basis is [T ]β =   b1   ..   .   bn Show that the above definition is equivalent to simply specifying T on the basis vectors of β by T (uk ) = bk uk . 10. Let T be given by specifying its action on the vectors of a basis β = {u1 , · · · , un } as follows. T uk =

n ∑

ajk uj .

j=1

Letting A = (aij ) , verify that [T ]β = A. It is done in the chapter, but go over it yourself. Show that [T ]γ = ( ) ( )−1 (5.8) u1 · · · un [T ]β u1 · · · un 11. Let a be a fixed vector. The function Ta defined by Ta v = a + v has the effect of translating all vectors by adding a. Show this is not a linear transformation. Explain why it is not possible to realize Ta in R3 by multiplying by a 3 × 3 matrix. 12. ↑In spite of Problem 11 we can represent both translations and rotations by matrix multiplication at the expense of using higher dimensions. This is done by the homogeneous coordinates. T I will illustrate in R3 where most interest in this is found. For each vector v = (v1 , v2 , v3 ) , T consider the vector in R4 (v1 , v2 , v3 , 1) . What happens when you do    v1 1 0 0 a1  0 1 0 a  v   2  2  ?    0 0 1 a3   v3  1 0 0 0 1

98

CHAPTER 5.

LINEAR TRANSFORMATIONS

Describe how to consider both rotations and translations all at once by forming appropriate 4 × 4 matrices. ( ) 13. You want to add 1, 2, 3 to every point in R3 and then rotate about the z axis counter ( ) clockwise through an angle of 30◦ . Find what happens to the point 1, 1, 1 . 14. Let P3 denote the set of real polynomials of degree no more than 3, defined on an interval [a, b]. Show that P3 is a subspace of the { } vector space of all functions defined on this interval. Show that a basis for P3 is 1, x, x2 , x3 . Now let D denote the differentiation operator which sends a function to its derivative. Show D is a linear transformation which sends P3 to P3 . Find the matrix of this linear transformation with respect to the given basis. 15. Generalize the above problem to Pn , the space of polynomials of degree no more than n with basis {1, x, · · · , xn } . ( )−1 ( −1 )T = A . 16. If A is an n × n invertible matrix, show that AT is also and that in fact, AT 17. Suppose you have an invertible n × n matrix A. Consider    p1 (x) 1    ..   = A  .. .    . pn (x) xn−1

the polynomials    

Show that these polynomials p1 (x) , · · · , pn (x) are a linearly independent set of functions. 18. Let the linear transformation be T = D2 + 1, defined {as T f = f ′′}+ f. Find the matrix of this linear transformation with respect to the given basis 1, x, x2 , x3 . 19. Let L be the linear transformation taking polynomials of degree at most three to polynomials of degree at most three given by D2 + 2D + 1 where D is the operator. Find the matrix of this linear transformation relative { differentiation } to the basis 1, x, x2 , x3 . Find the matrix directly and then find the matrix with respect to the differential operator D + 1 and multiply this matrix by itself. You should get the same thing. Why? 20. Let L be the linear transformation taking polynomials of degree at most three to polynomials of degree at most three given by D2 + 5D + 4 where D is the{differentiation } operator. Find the matrix of this linear transformation relative to the bases 1, x, x2 , x3 . Find the matrix directly and then find the matrices with respect to the differential operators D + 1, D + 4 and multiply these two matrices. You should get the same thing. Why? 21. Suppose A ∈ L (V, W ) where dim (V ) > dim (W ) . Show ker (A) ̸= {0}. That is, show there exist nonzero vectors v ∈ V such that Av = 0. 22. A vector v is in the convex hull of a nonempty set if there are finitely many vectors of S, {v1 , · · · , vm } and nonnegative scalars {t1 , · · · , tm } such that v=

m ∑

tk vk ,

k=1

m ∑

tk = 1.

k=1

Such a linear combination is called a convex ∑m combination. Suppose now that S ⊆ V, a vector space of dimension n. Show that if v = k=1 tk vk is a vector in the convex hull for m > n + 1, then there exist other scalars {t′k } such that v=

m−1 ∑ k=1

t′k vk .

5.4. EXERCISES

99

Thus every vector in the convex hull of S can be obtained as a convex combination of at most n + 1 points of S. This incredible result is in Rudin [33]. Hint: Consider L : Rm → V × R defined by ) (m m ∑ ∑ ak ak vk , L (a) ≡ k=1

k=1

Explain why ker (L) ̸= {0} . Next, letting a ∈ ker (L) \ {0} and λ ∈ R, note that λa ∈ ker (L) . Thus for all λ ∈ R, m ∑ (tk + λak ) vk . v= k=1

Now vary λ till some tk + λak = 0 for some ak ̸= 0. 23. For those who know about compactness, use Problem 22 to show that if S ⊆ Rn and S is compact, then so is its convex hull. 24. Show that if L ∈ L (V, W ) (linear transformation) where V and W are vector spaces, then if Lyp = f for some yp ∈ V, then the general solution of Ly = f is of the form ker (L) + yp . 25. Suppose Ax = b has a solution. Explain why the solution is unique precisely when Ax = 0 has only the trivial (zero) solution. 26. Let L : Rn → R be linear. Show that there exists a vector a ∈ Rn such that Ly = aT y. 27. Let the linear transformation T be determined  1 0  Tx = 0 1 1 1

by −5 −3 −8

 −7  −9  x −16

Find the rank of this transformation. ( ) 28. Let T f = D2 + 5D + 4 f for f in the vector space of polynomials of degree no more than 3 where we consider T to map into the same vector space. Find the rank of T . You might want to use Proposition 4.3.6. ∑ 29. (Extra important) Let A be an n × n matrix. The trace of A, trace (A) is defined ( T )as i Aii . It is just the sum of the entries on the main diagonal. Show trace (A) = trace A . Suppose A is m × n and B is n × m. Show that trace (AB) = trace (BA) . Now show that if A and B are similar n × n matrices, then trace (A) = trace (B). Recall that A is similar to B means A = S −1 BS for some matrix S. 30. Suppose you have a monic polynomial ϕ (λ) which is irreducible over F the field of scalars. Remember that this means that no polynomial divides it except scalar multiples of ϕ (λ) and scalars. Say ϕ (λ) = a0 + a1 λ + · · · + ad−1 λd−1 + λd Now consider A ∈ L (V, V ) where V is a vector space. Consider ker { } (ϕ (A)) and suppose this is not 0. For x ∈ ker (ϕ (A)) , x ̸= 0, let β x = x, Ax, · · · , Ad−1 x . Show that β x is an independent set of vectors if x ̸= 0. 31. ↑Let V be a finite dimensional vector space and let A ∈ L (V, V ) . Also let W be a subspace of V such that A (W ) ⊆ W. We call such a subspace an A invariant subspace. Say {w1 , · · · , ws } is a basis for W . Also let x ∈ U W where U is an A invariant subspace which is contained in ker (ϕ (A)). Then you know that {w1 , · · · , ws , x} is linearly independent. Show that in fact {w1 , · · · , ws , β x } is linearly independent where β x is given in the above problem. Hint: Suppose you have s d ∑ ∑ ak wk + bj Aj−1 x = 0. (*) k=1

j=1

100

CHAPTER 5.

LINEAR TRANSFORMATIONS

You need to verify that the second sum is 0. From this it will follow that each bj is 0 and ∑d then each ak = 0. Let S = j=1 bj Aj−1 x. Observe that β S ⊆ β x and if S ̸= 0, then β S is independent from the above problem and both β x and β S have the same dimension. You will argue that span (β S ) ⊆ W ∩ span (β x ) ⊆ span (β x ) and then use Problem 6 on Page 63.. 32. ↑In the situation of the {above problem, show that } there exist finitely many vectors in U, {x1 , · · · , xm } such that w1 , · · · , ws , β x1 , · · · , β xm is a basis for U + W . This last vector space is defined as the set of all y + w where y ∈ U and w ∈ W . 33. ↑ In the situation of the above where ϕ (λ) is irreducible. Let U be defined as m

U = ϕ (A) (ker (ϕ (A) )) ( ) m−1 Explain why U ⊆ ker ϕ (A) . Suppose you have a linearly independent set in U which is { } β x1 , · · · , β xr . Here the notation means { } β x ≡ x, Ax, · · · , Am−1 x where these vectors are independent but Am x is in the span of these. Such exists { any time you } have x ∈ ker (g (A)) for g (λ) a polynomial. Letting ϕ (A) yi = xi , explain why β y1 , · · · , β yr is also linearly independent. This is like the theorem presented earlier that the inverse image of a linearly independent set is linearly independent but it is more complicated here because instead of single vectors, we are considering sets β x .

Chapter 6

Direct Sums and Block Diagonal Matrices This is a convenient place to put a very interesting result about direct sums and block diagonal matrices. First is the notion of a direct sum. In all of this, V will be a finite dimensional vector space of dimension n and field of scalars F. r

Definition 6.0.1 Let {Vi }i=1 be subspaces of V. Then r ∑

Vi ≡ V1 + · · · + Vr

i=1

denotes all sums of the form

∑r i=1

vi where vi ∈ Vi . If whenever r ∑

vi = 0, vi ∈ Vi ,

(6.1)

i=1

it follows that vi = 0 for each i, then a special notation is used to denote

∑r i=1

Vi . This notation is

V1 ⊕ · · · ⊕ Vr , ⊕r or sometimes to save space Vi and it is called a direct sum of subspaces. A subspace W of V i=1 is called A invariant for A ∈ L (V, V ) if AW ⊆ W . The important idea is that you seek to understand A by looking at what it does on each Vi . It is a lot like knowing A by knowing what it does to a basis, an idea used earlier. { } i Lemma 6.0.2 If V = V1 ⊕ · · · ⊕ Vr and if β i = v1i , · · · , vm is a basis for Vi , then a basis for V i is {β 1 , · · · , β r }. Thus r r ∑ ∑ dim (V ) = dim (Vi ) = |β i | i=1

i=1

where |β i | denotes the number of vectors in β i . Conversely, if β i linearly independent and if a basis for V is {β 1 , · · · , β r } , then V = span (β 1 ) ⊕ · · · ⊕ span (β r ) ∑mi ∑r ∑mi cij vji = cij vji = 0. Since it is a direct sum, it follows for each i, j=1 Proof: Suppose i=1 j=1 { i } i 0 and now, since v1 , · · · , vmi is a basis, each cij = 0 for each j, this for each i. Suppose now that each β i is independent and a basis is {β 1 , · · · , β r } . Then clearly V = span (β 1 ) + · · · + span (β r ) 101

102

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

∑r ∑mi i Suppose then that 0 = inside sum being something in span (β i ). Since i=1 j=1 cij vj , the ∑mi {β 1 , · · · , β r } is a basis, each cij = 0. Thus each j=1 cij vji = 0 and so V = span (β 1 )⊕· · ·⊕span (β r ).  Thus, from this lemma, we can produce a basis for V of the form {β 1 , · · · , β r } , so what is the matrix of a linear transformation A such that each Vi is A invariant? Theorem 6.0.3 Suppose V is a vector space with field of scalars F and A ∈ L (V, V ). Suppose also V = V1 ⊕ · · · ⊕ Vq where each Vk is A invariant. (AVk ⊆ Vk ) Also let β k be an ordered basis for Vk and let Ak denote the restriction of A to Vk . Letting M k denote the matrix of Ak with respect to { } this basis, it follows the matrix of A with respect to the basis β 1 , · · · , β q is    

M1



0 ..

  

. Mq

0

{ } Proof: Let β denote the ordered basis β 1 , · · · , β q , |β k | being the number of vectors in β k . Let qk : F|β k | → Vk be the usual map such that the following diagram commutes.

Vk qk ↑ F|β k |

Ak → ◦ → Mk

Vk ↑ qk F|β k |

Thus Ak qk = qk M k . Then if q is the map from Fn to V corresponding to the ordered basis β just described, ( )T q 0 ··· x ··· 0 = qk x, ∑k−1 ∑k where x occupies the positions between i=1 |β i | + 1 and i=1 |β i |. Then M will be the matrix of A with respect to β if and only if a similar diagram to the above commutes. Thus it is required that Aq = qM . However, from the description of q just made, and the invariance of each Vk ,      Aq    

0 .. . x .. . 0





        k  = Ak qk x = qk M x = q       

M1

0 ..

. Mk ..

. Mq

0

        

0 .. . x .. . 0

        

It follows that the above block diagonal matrix is the matrix of A with respect to the given ordered basis.  The matrix of A with respect to the ordered basis β which is described above is called a block diagonal matrix. Sometimes the blocks consist of a single number. Example 6.0.4 Consider the following matrix.  1 0  A≡ 1 0 −2 2

 0  −1  3

103 

  1    Let V1 ≡ span  1  ,  0 Vi is A invariant. Find the

First note that

   1 0    0  , V2 ≡ span  −1  . Show that R3 = V1 ⊕ V2 and and that 1 2 matrix of A with respect to the ordered basis         1 0  1        (*)  1  ,  0  ,  −1      0 1 2 

1 0   1 0 −2 2  1 0   1 0 −2 2

 0 1  −1   1 3 0  0 1  −1   0 3 1





 1    = 1  0    1    = 0  1

Therefore, A (V1 ) ⊆ V1 . Similarly,        0 0 1 0 0 0         1 0 −1   −1  =  −2  = 2  −1  2 4 2 −2 2 3 and so A (V2 ) ⊆ V2 . The vectors in ∗ clearly are a basis for R3 . You can verify this by observing that there is a unique solution x, y, z to the system of equations         a 0 1 1         x  1  + y  0  + z  −1  =  b  c 2 1 0 for any choice of the right side. Therefore, by Lemma 6.0.2, R3 = V1 ⊕ V2 . If you look at the restriction of A to V1 , what is the matrix of this restriction? It satisfies          ) ( 1 1 1 1       a b    A  1  , A  0  =  1  ,  0  c d 1 0 1 0 Thus, from what was observed above,    1 1     1  ,  0 1 0

you need the matrix     1      =  1  ,  0

( and so the matix on the right is just

1 0

0 1

on the right to satisfy  ) ( 1  a b 0  c d 1

) . As to the matrix of A restricted to V2 , we need

     0 0 0       A  −1  = 2  −1  = a  −1  2 2 2 

104

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

where a is a 1 × 1 matrix. Thus a = 2 and so given above is  1   0 0

the matrix of A with respect to the ordered basis 0 1 0

 0  0  2

What if you changed the order of the vectors in the basis? Suppose you had them ordered as        1  0   1        1  ,  −1  ,  0      1 0 2 Then you would have three invariant subspaces whose direct sum is R3 ,       1 0 1       span  1  , span  −1  , and span  0  0 2 1 Then the matrix of A with respect to this ordered  1 0   0 2 0 0

basis is  0  0  1

Example 6.0.5 Consider the following matrix.  3  A =  −1 −1

 1 0  1 0  −1 1



     0 1 1       V1 ≡ span  0  , V2 ≡ span  0  ,  −1  1 −1 0

Let

Show that these are A invariant subspaces and      0     0 ,   1 First note that





3   −1 −1 

  1 0 1   1 0  − 2 0 −1 1 0

find the matrix of A with respect to the ordered basis     1 1     −1  ,  0    0 −1

0 1 0

     0 1 1      0   0  =  −1  1 −1 0

1   and so A  0  is in the span of −1       1 1       ,  −1   0  .     −1 0

105 

Also

    1 2 1 0     1 0   −1  =  −2  −1 1 0 0     1 1     ∈ span  −1  ,  0  0 −1

3   −1 −1

Thus V2 is A invariant. What is the matrix of A restricted to V2 ? We need          ( ) 1 1 1 1          a b A  −1  , A  0  =  −1  ,  0  c d 0 −1 0 −1 Now it was shown above that



     1 1 1       A  0  = 2  0  +  −1  −1 −1 0

and so the matrix is of the form

(

a 1 c 2

)

   1 1     Then it was also shown that A  −1  = 2  −1  and so the matrix is of the form 0 0 ( ) 2 1 0 2 

  0 0    As to V1 , A  0  =  0 1 1 of the number 1. Thus the 

  and the matrix of A restricted to V1 is just the 1 × 1 matrix consisting matrix of A with respect to this basis is   1 0 0    0 2 1  0 0 2

How can you find V as a direct sum of invariant subspaces? In the next section, I will give a systematic way based on a profound theorem of Sylvester. However, there is also a very easy way to come up with an( invariant subspace. Let v ∈ V an n dimensional vector space and let A ∈ L (V, V ) . ) Let W ≡ span v, Av, A2 v, · · · . It is left as an exercise to verify that W is a finite dimensional subspace of V . Recall that the span is the set of all finite linear combinations. Of course W might be all of V or it might be a proper subset of V . The method of Sylvester will end up typically giving proper invariant subspaces whose direct sum is the whole space. An outline of the following presentation is as follows. ∏m ∑m 1. Sylvester’s theorem dim (ker ( i=1 Li )) ≤ i=1 dim (ker (Li )) 2. If Li Lj = Lj Li , Li one to one on ker (Li ) then ker (

∏m

i=1 Li ) =

m ⊕ i=1

ker (Li )

106

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

∏m r 3. L ∈ L (V, V ) having minimum polyinomial i=1 ϕi (λ) i . Then (m ) m ∏ ⊕ ri V ≡ ker ϕi (L) = ker (ϕi (L)) i=1

6.1

i=1

A Theorem of Sylvester, Direct Sums

The notation is defined as follows. First recall the definition of ker in Problem 23 on Page 53. Definition 6.1.1 Let L ∈ L (V, W ) . Then ker (L) ≡ {v ∈ V : Lv = 0} . Lemma 6.1.2 Whenever L ∈ L (V, W ) , ker (L) is a subspace. Also, if V is an n dimensional vector space and W is a subspace of V , then W = V if and only if dim (W ) = n. Proof: If a, b are scalars and v,w are in ker (L) , then L (av + bw) = aL (v) + bL (w) = 0 + 0 = 0 As to the last claim, it is clear that dim (W ) ≤ n. If dim (W ) = n, then, letting {w1 , · · · , wn } be a basis for W, there can be no v ∈ V \ W because then v ∈ / span (w1 , · · · , wn ) and so by Lemma 3.1.7 {w1 , · · · , wn , v} would be independent which is impossible by Theorem 3.1.5. You have an independent set which is longer than a spanning set.  Suppose now that A ∈ L (V, W ) and B ∈ L (W, U ) where V, W, U are all finite dimensional vector spaces. Then it is interesting to consider ker (BA). The following theorem of Sylvester is a very useful and important result. Theorem 6.1.3 Let A ∈ L (V, W ) and B ∈ L (W, U ) where V, W, U are all vector spaces over a field F. Suppose also that ker (A) and A (ker (BA)) are finite dimensional subspaces. Then dim (ker (BA)) ≤ dim (ker (B)) + dim (ker (A)) . Equality holds if and only if A (ker (BA)) = ker (B). Proof: If x ∈ ker (BA) , then Ax ∈ ker (B) and so A (ker (BA)) ⊆ ker (B) . The following picture may help. ker(BA) ker(A)

ker(B) A

A (ker (BA)) ker (A) ker (B)

-

A(ker(BA))

Basis {Ay1 , · · · , Aym } {x1 , · · · , xn } {Ay1 , · · · , Aym , w1 , · · · , ws }

Now let {x1 , · · · , xn } be a basis of ker (A) and let {Ay , · · · , Aym } be a basis for A (ker (BA)) , ∑1m each yi ∈ ker (BA) . Take any z ∈ ker (BA) . Then Az = i=1 ai Ayi and so ( ) m ∑ A z− ai yi = 0 which means z −

∑m i=1

i=1

ai yi ∈ ker (A) and so there are scalars bi such that z−

m ∑ i=1

ai yi =

n ∑ j=1

bi x i .

6.1. A THEOREM OF SYLVESTER, DIRECT SUMS

107

It follows span (x1 , · · · , xn , y1 , · · · , ym ) = ker (BA) and so by the first part, (See the picture.) dim (ker (BA)) ≤ ≤

n + m ≤ dim (ker (A)) + dim (A (ker (BA))) dim (ker (A)) + dim (ker (B))

Now {x1 , · · · , xn , y1 , · · · , ym } is linearly independent because if ∑ ∑ ai xi + bj yj = 0 i

j

∑ then you could do A to both sides and conclude that j b∑ j Ayj = 0 which requires that each bj = 0. Then it follows that each ai = 0 also because it implies i ai xi = 0. Thus the first inequality in the above list is an equal sign and {x1 , · · · , xn , y1 , · · · , ym } is a basis for ker (BA). Each vector is in ker (BA), they are linearly independent, and their span is ker (BA) . Then by Lemma 6.1.2, A (ker (BA)) = ker (B) if and only if m = dim (ker (B)) if and only if dim (ker (BA)) = m + n = dim (ker (B)) + dim (ker (A)) .  Of course this result holds for any finite product of linear transformations by induction. One ∏l way this is quite useful is in the case where you have a finite product of linear transformations i=1 Li ) ∑ ( ∏l l all in L (V, V ) . Then dim ker i=1 Li ≤ i=1 dim (ker Li ) . Now here is a useful lemma which is likely already understood. Lemma 6.1.4 Let L ∈ L (V, W ) where V, W are n dimensional vector spaces. Then L is one to one, if and only if L is also onto. In fact, if {v1 , · · · , vn } is a basis, then so is {Lv1 , · · · , Lvn }. Proof: Let {v1 , · · · , vn } be a basis for V . Then I claim that {Lv1 , ·∑ · · , Lvn } is a basis for n W .∑First of all, I show {Lv1 , · · · , Lvn } is linearly independent. Suppose k=1 ck Lvk = 0. Then ∑n n L ( k=1 ck vk ) = 0 and since L is one to one, it follows k=1 ck vk = 0 which implies each ck = 0. Therefore, {Lv1 , · · · , Lvn } is linearly independent. If there exists w not in the span of these vectors, then by Lemma 3.1.7, {Lv1 , · · · , Lvn , w} would be independent and this contradicts the exchange theorem, Theorem 3.1.5 because it would be a linearly independent set having more vectors than the spanning set {v1 , · · · , vn } . Conversely, suppose L is onto. Then there exists a basis for W which is of the form {Lv1 , · · · , Lvn } . It follows that {v1 , · · · , vn } is linearly independent. Hence it is a basis for V∑by similar reasoning to the above.∑ Then if Lx = 0, it follows that there are scalars ci such that x = i ci vi and consequently 0 = Lx = i ci Lvi . Therefore, each ci = 0 and so x = 0 also. Thus L is one to one.  Here is a fundamental lemma which gives a typical situation where a vector space is the direct sum of subspaces. Lemma 6.1.5 Let Li be in L (V, V ) and suppose for i ̸= j, Li Lj = Lj Li and also Li is one to one on ker (Lj ) whenever i ̸= j. Then ( p ) ∏ ker Li = ker (L1 ) ⊕ + · · · + ⊕ ker (Lp ) i=1

Here

∏p i=1

Li is the product of all the linear transformations. It signifies Lp ◦ Lp−1 ◦ · · · ◦ L1

or the product in any other order since the transformations commute. Proof : Note that since the operators commute, Lj : ker (Li ) → ker (Li ). Here is why. If Li y = 0 so that y ∈ ker (Li ) , then Li Lj y = Lj Li y = Lj 0 = 0 ∑ and so Lj : ker (Li ) 7→ ∏pker (Li ). Next observe p that it is obvious that, since the operators commute, i=1 ker (Lp ) ⊆ ker ( i=1 Li ) .

108

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

∑ ∑p Next, why is i ker ∏(Lp ) = ker (L1 ) ⊕ · · · ⊕ ker (Lp )? Suppose i=1 vi = 0, vi ∈ ker (Li ) , but some vi ̸= 0. Then do j̸=i Lj to both sides. Since the linear transformations commute, this results in   ∏  Lj  (vi ) = 0 j̸=i

which contradicts the assumption that∑ these Lj are one to one on ker (Li ) and the observation that they map ker (Li ) to ker (Li ). Thus if i vi = 0, vi ∈ ker (Li ) then each vi = 0. It follows that ( p ) ∏ ker (L1 ) ⊕ + · · · + ⊕ ker (Lp ) ⊆ ker Li (*) i=1

From Sylvester’s theorem and the observation about direct sums in Lemma 6.0.2, p ∑ i=1

dim (ker (Li )) =

dim (ker (L1 ) ⊕ + · · · + ⊕ ker (Lp )) (

≤ dim ker

(

p ∏

i=1

)) Li



p ∑

dim (ker (Li ))

i=1

which implies all these are equal. Now in general, if W is a subspace of V, a finite dimensional vector space and the two have the same dimension, then W = V, Lemma 6.1.2. It follows from * that ( p ) ∏ ker (L1 ) ⊕ + · · · + ⊕ ker (Lp ) = ker Li  i=1

So how does the above situation occur? First recall the following theorem and corollary about polynomials. It was Theorem 6.1.6 and Corollary 6.1.7 proved earlier. Theorem 6.1.6 Let f (λ)∏be a nonconstant polynomial with coefficients in F. Then there is some n a ∈ F such that f (λ) = a i=1 ϕi (λ) where ϕi (λ) is an irreducible nonconstant monic polynomial and repeats are allowed. Furthermore, this factorization is unique in the sense that any two of these factorizations have the same nonconstant factors in the product, possibly in different order and the same constant a. ∏p k Corollary 6.1.7 Let q (λ) = i=1 ϕi (λ) i where the ki are positive integers and the ϕi (λ) are irreducible monic Suppose also that p (λ) is a monic polynomial which divides q (λ) . ∏p polynomials. r Then p (λ) = i=1 ϕi (λ) i where ri is a nonnegative integer no larger than ki . Now I will show how to use these basic theorems about polynomials to produce Li such that the above major result follows. This is going to have a striking similarity to the notion of a minimum polynomial in the context of algebraic numbers. Definition 6.1.8 Let V be an n dimensional vector space, n ≥ 1, and let L ∈ L (V, V ) which is a vector space of dimension n2 by Theorem 5.1.4. Then p (λ) will be the non constant monic polynomial such that p (L) = 0 and out of all polynomials q (λ) such that q (L) = 0, the degree of p (λ) is the smallest. This is called the minimum polynomial. It is always understood that L ̸= 0. It is not interesting to fuss with this case of the zero linear transformation. In the following, we always define L0 ≡ I. Theorem 6.1.9 The above definition is well defined. Also, if q (L) = 0, then p (λ) divides q (λ).

6.1. A THEOREM OF SYLVESTER, DIRECT SUMS

109 2

Proof: The dimension of L (V, V ) is n2 . Therefore, I, L, · · · , Ln are linearly dependent and so there is some polynomial q (λ) such that q (L) = 0. Let m be the smallest degree of any polynomial with this property. Such a smallest number exists by well ordering of N. To obtain a monic polynomial p (λ) with degree m, divide such a polynomial with degree m, having the property that p (L) = 0 by the leading coefficient. Now suppose q (λ) is any polynomial such that q (L) = 0. Then by the Euclidean algorithm, there is r (λ) either zero or having degree less than the degree of p (λ) such that q (λ) = p (λ) k (λ) + r (λ) for some polynomial k (λ). But then 0 = q (L) = k (L) p (L) + r (L) = r (L) If r (λ) ̸= 0, then this is a contradiction to p (λ) having the smallest degree. Therefore, p (λ) divides q (λ). Now suppose pˆ (λ) and p (λ) are two monic polynomials of degree m. Then from what was just shown pˆ (λ) divides p (λ) and p (λ) divides pˆ (λ) . Since they are both monic polynomials, they must be equal. Thus the minimum polynomial is unique and this shows the above definition is well defined.  Now here is the major result which comes from Sylvester’s theorem given above. Theorem 6.1.10 Let L ∈ L (V, V ) where V is an n dimensional vector space with field of scalars F. Letting p (λ) be the minimum polynomial for L, p (λ) =

p ∏

ki

ϕi (λ)

i=1

where the ki are positive integers and the ϕi (λ) are distinct irreducible monic polynomials. Also the ( ) ki ki kj linear maps ϕi (L) commute and ϕi (L) is one to one on ker ϕj (L) for all j ̸= i as is ϕi (L). Also ( ) ( ) k k V = ker ϕ1 (L) 1 ⊕ · · · ⊕ ker ϕp (L) p ( ) k and each ker ϕi (L) i is invariant with respect to L. Letting Lj be the restriction of L to ( ) k ker ϕj (L) j , k

it follows that the minimum polynomial of Lj equals ϕj (λ) j . Also p ≤ n. ∏p k Proof: By Theorem 6.1.6, the minimum polynomial p (λ) is of the form a i=1 ϕi (λ) i where ϕi (λ) is monic and irreducible with ϕi (λ) ̸= ϕj (λ) if i ̸= j. Since p (λ) is monic, it follows that k a = 1. Since L commutes with itself, all of these ϕi (L) i commute. Also ( ) ( ) k k ϕi (L) : ker ϕj (L) j → ker ϕj (L) j because all of these operators commute. ( ) k Now consider ϕi (L) . Is it one to one on ker ϕj (L) j ? Suppose not. Suppose that for some ) ( k k j ̸= i, ϕi (L) is not one to one on ker ϕj (L) j . We know that ϕi (λ) , ϕj (λ) j are relatively prime meaning the monic polynomial of greatest degree which divides them both is 1. Why is this? If some polynomial divided both, then it would need to be ϕi (λ) or 1 because ϕi (λ) is irreducible. But k ϕi (λ) cannot divide ϕj (λ) j unless it equals ϕj (λ) , this by Corollary 6.1.7 and they are assumed k unequal. Hence there are polynomials l (λ) , m (λ) such that 1 = l (λ) ϕi (λ) + m (λ) ϕj (λ) j . By what we mean by equality of polynomials, that coefficients of equal powers of λ are equal, it follows that for I the identity transformation, kj

I = l (L) ϕi (L) + m (L) ϕj (L)

110

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

( ) k Say v ∈ ker ϕj (L) j and v ̸= 0 while ϕi (L) v = 0. Then from the above equation, kj

v = l (L) ϕi (L) v + m (L) ϕj (L)

v =0+0=0

( ) k k a contradiction. Thus ϕi (L) and hence ϕi (L) i is one to one on ker ϕj (L) j . (Recall that, ( ) ( ) ( ) k k k since these commute, ϕi (L) maps ker ϕi (L) i to ker ϕi (L) i .) On Vj ≡ ker ϕj (L) j , ϕi (L) actually has an inverse. In fact, the above equation says that for v ∈ Vj , v = l (L) ϕi (L) v. hence an m m inverse for ϕi (L) is l (L) . Thus, from Lemma 6.1.5, ( p ) ( ) ( ) ∏ ki k k V = ker ϕi (L) = ker ϕ1 (L) 1 ⊕ · · · ⊕ ker ϕp (L) p i=1

Next consider the claim about the minimum polynomial ( of Lj .)Denote this minimum polynomial kj

as pj (λ). Then since ϕj (L)

kj

= ϕj (Lj )

kj

= 0 on ker ϕj (L)

kj

, it must be the case that pj (λ)

must divide ϕj (λ) and so by Corollary 6.1.7 this means pj (λ) = ϕj (λ) consider the polynomial p ∏ k r ϕi (λ) i ϕj (λ) j ≡ r (λ)

rj

where rj ≤ kj . If rj < kj ,

i=1,i̸=j ki

commute with each other, r (L) = 0 because r (L) v = 0 for every ( r ) and also r (L) v = 0 for v ∈ ker ϕj (L) j . However, this violates the definition

Then since ( these )operators ϕi (L) v ∈ ker ϕi (L)

ki

of the minimum polynomial for L, p (λ) because here is a polynomial r (λ) such that r (L) = 0 but r (λ) has smaller degree than p (λ). Thus rj = kj . ( ) k Consider the claim that p ≤ n the dimension of V . Let vi ∈ ker ϕi (L) i , vi ̸= 0. Then it must ( ) ( ) k k be the case that {v1 , · · · , vp } is a linearly independent set because ker ϕ1 (L) 1 ⊕· · ·⊕ker ϕp (L) p is a direct sum. Hence p ≤ n because a linearly independent set is never longer than a spanning set one of which has n elements.  ( ) ( ) k Letting β i be an ordered basis for ker ϕi (L) i and letting β ≡ β 1 , β 2 , · · · , β p , it follows ( ) k from Theorem 6.0.3, that if Mj is the matrix for Lj , the restriction of L to ker ϕj (L) j , then the matrix of L with respect to the basis β is a block diagonal matrix of the form   M1 0   ..   .   0 Mp The study of cannonical forms has to do with choosing the bases β i in an auspicious manner. This topic will be discussed more later.

6.2

Finding the Minimum Polynomial

All of this depends on the minimum polynomial. It was shown above that this polynomial exists, but how can you find it? In fact, it is not all that hard to find. Recall that if L ∈ L (V, V ) where 2 the dimension of V is n, then I, L2 , · · · , Ln is linearly independent. Thus some linear combination equals zero. The minimum polynomial was the polynomial p (λ) of smallest degree which is monic and which has p (L) = 0. At this point, we only know that this degree is no more than n2 . However, it will be shown later in the proof of the Cayley Hamilton theorem that there exists a polynomial q (λ) of degree n such that q (L) = 0. Then from Theorem 6.1.9 it follows that p (λ) divides q (λ) and so the degree of p (λ) will always be no more than n.

6.2. FINDING THE MINIMUM POLYNOMIAL

111

Another observation to make is that it suffices to find the minimum polynomial for the matrix of the linear transformation taken with respect to any basis. Recall the relation of this matrix and L. L V → V q↑ ◦ ↑q Fn → Fn A where q is a one to one and onto linear map from Fn to V . Thus if p (L) is a polynomial in L, ( ) p (L) = p q −1 Aq A typical term on the right is of the form   k times z }| { ( ) ( ) ( ) ( )   ck  q −1 Aq q −1 Aq q −1 Aq · · · q −1 Aq  = q −1 ck Ak q Thus, applying this to each term and factoring out q −1 and q, p (L) = q −1 p (A) q. Recall the convention that A0 = I the identity matrix and L0 = I, the identity linear transformation. Thus p (L) = 0 if and only if p (A) = 0 and so the minimum polynomial for A is exactly the same as the minimum polynomial for L. However, in case of A, the multiplication is just matrix multiplication so we can compute with it easily. This shows that it suffices to learn how to find the minimum polynomial for an n × n matrix. I will show how to do this with some examples. The process can be made much more systematic, but I will try to keep it pretty short because it is often the case that it is easy to find it without going through a long computation. Example 6.2.1 Find the minimum polynomial  −1   1 −1

of 0 1 0

 6  −3  4

Go right to the definition and use the fact that you only need to have three powers of this matrix in order to get things to work, which will be shown later. Thus the minimum polynomial involves finding a, b, c, d scalars such that      2  3 1 0 0 −1 0 6 −1 0 6 −1 0 6         a  0 1 0  + b  1 1 −3  + c  1 1 −3  + d  1 1 −3  = 0 0 0 1 −1 0 4 −1 0 4 −1 0 4 You could include all nine powers if you want, but there is no point in doing so from what will be presented later. You will be able to find a polynomial of degree no larger than 3 which will work. There is such a solution from the above theory and it is only a matter of finding it. Thus you need to find scalars such that         −13 0 42 −5 0 18 −1 0 6 1 0 0         a  0 1 0  + b  1 1 −3  + c  3 1 −9  + d  7 1 −21  = 0 −7 0 22 −3 0 10 −1 0 4 0 0 1 Lets try the diagonal entries first and then lets pick the bottom left corner. a − b − 5c − 13d = 0 a+b+c+d=0 a + 4b + 10c + 22d = 0 −b + −3c + −7d = 0

112

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES

Thus we row reduce the matrix

    

−1 1 4 −1

1 1 1 0

which yields after some computations     

1 0 0 0

0 1 0 0

−5 1 10 −3

−13 1 22 −7

0 0 0 0

    

 −2 −6 0 3 7 0    0 0 0  0 0 0

We can take d = 0 and c = 1 and find that a = 2, b = −3. A candidate for minimum polynomial is λ2 − 3λ + 2. Could you have a smaller degree polynomial? No you could not because if you took both c and d equal to 0, then you would be forced to have a, b both be zero as well. Hence this must be the minimum polynomial provided the matrix satisfies this equation. You verify this by plugging the matrix in to the polynomial and checking to see if you get 0. If it didn’t work, you would simply include another equation in the above computation for a, b, c, d. 2        −1 0 6 1 0 0 0 0 0 −1 0 6          1 1 −3  − 3  1 1 −3  + 2  0 1 0  =  0 0 0  −1 0 4 0 0 1 0 0 0 −1 0 4 It is a little tedious, but completely routine to find this minimum polynomial. To be more systematic, you would take the powers of the matrix and string each of them out into a long n2 × 1 vector and make these the columns of a matrix which would then be row reduced. However, as shown above, you can get away with less as in the above example, but you need to be sure to check that the matrix satisfies the equation you come up with. Now here is an example where F = Z5 and the arithmetic is in F so A is the matrix of a linear transformation which maps F3 to F3 . Example 6.2.2 The matrix is



1  A= 0 4

2 3 1

 3  1  1

Find the minimum polynomial. Powers of the matrix  1   0 0

are 0 1 0

  1 0   , 0   0 4 1

2 3 1

  3 1 3   , 1   4 0 3 2 1

  0 2 3   , 4   0 2 4 1 4

 3  1  0

If we pick the top left corners, the middle entry, the bottom right corner and the entries in the middle of the bottom row, an appropriate augmented matrix is   1 1 3 0 0  1 3 0 2 0       1 1 4 0 0  0 1 2 1 0

6.3. EIGENVALUES AND EIGENVECTORS OF LINEAR TRANSFORMATIONS Then row reduced echelon form in Z5 is     

1 0 0 0

0 1 0 0

0 0 1 0

4 1 0 0

0 0 0 0

113

    

so it would seem a possible minimum polynomial is obtained by a = 1, b = −1 = 4, c = 0, d = 1. Thus it has degree 3. There cannot be any polynomial of smaller degree because of the first three columns so it would seem that this should be the minimum polynomial, 1 + 4λ + λ3 Does it send the matrix to 0? This just involves checking whether it does and in fact, this is the case using the arithmetic in the residue class. In summary, it is not all that hard to find the minimum polynomial.

6.3

Eigenvalues and Eigenvectors of Linear Transformations

We begin with the following fundamental definition. Definition 6.3.1 Let L ∈ L (V, V ) where V is a vector space of dimension n with field of scalars F. An eigen-pair consists of a scalar λ ∈ F called an eigenvalue and a NON-ZERO v ∈ V such that (λI − L) v = 0 Do eigen-pairs exist? Recall ∏pthat fromk Theorem 6.1.10 the minimum polynomial can be factored in a unique way as p (λ) = i=1 ϕi (λ) i where each ϕi (λ) is irreducible and monic. Then the following theorem is obtained. Theorem 6.3.2 Let L ∈ L (V, V ) and let its minimum polynomial p (λ) have a root µ in the field of scalars. Then µ is an eigenvalue of L. Proof: Since p (λ) has a root, we know p (λ) = (λ − µ) q (λ) where the degree of q (λ) is less than the degree of p (λ). Therefore, there is a vector u such that q (L) u ≡ v ̸= 0. Otherwise, p (λ) is not really the minimum polynomial because q (λ) would work better. Then (L − µI) q (L) u = (L − µI) v = 0 and so µ is indeed an eigenvalue.  Theorem 6.3.3 Suppose the minimum polynomial p (λ) of L ∈ L (V, V ) factors completely into linear factors (splits) so that p ∏ k p (λ) = (λ − µi ) i i=1

Then the µi are distinct eigenvalues and corresponding to each of these eigenvalues, there is an eigenvector wi ̸= 0 such that Lwi = µi wi . Also, there are no other eigenvalues than these µi . Also
\[ V = \ker\!\left( (L - \mu_1 I)^{k_1} \right) \oplus \cdots \oplus \ker\!\left( (L - \mu_p I)^{k_p} \right) \]
and if Li is the restriction of L to ker((L − µi I)^{k_i}), then Li has exactly one eigenvalue and it is µi .
Proof: By Theorem 6.3.2, each µi is an eigenvalue and we can let wi be a corresponding eigenvector. By Theorem 6.1.10,
\[ V = \ker\!\left( (L - \mu_1 I)^{k_1} \right) \oplus \cdots \oplus \ker\!\left( (L - \mu_p I)^{k_p} \right) \]


Also by this theorem, the minimum polynomial of Li is (λ − µi)^{k_i} and so it has an eigenvalue µi . Could Li have any other eigenvalue ν ̸= µi ? To save notation, denote by m the exponent ki and by µ the eigenvalue µi . Also let w denote an eigenvector of Li with respect to ν. Then since the minimum polynomial for Li is (λ − µ)^m and (L − νI) w = 0, only one term of the binomial expansion survives, so
\[ 0 = (L - \mu I)^{m} w = \left( (L - \nu I) + (\nu - \mu) I \right)^{m} w = \sum_{k=0}^{m} \binom{m}{k} (L - \nu I)^{m-k} (\nu - \mu)^{k} w = (\nu - \mu)^{m} w \]
which is impossible because w ̸= 0. Thus there can be no other eigenvalue for Li .
Consider the claim about L having no other eigenvalues than the µi . Say µ is another eigenvalue with eigenvector w. Then let w = Σi zi , zi ∈ ker((L − µi I)^{k_i}). Then not every zi = 0 and
\[ 0 = (L - \mu I) \sum_i z_i = \sum_i (L z_i - \mu z_i) \]
Since this is a direct sum and each ker((L − µi I)^{k_i}) is invariant with respect to L, we must have each Lzi − µzi = 0. This is impossible unless µ equals some µi because not every zi is 0.
Example 6.3.4 The minimum polynomial for the matrix
\[ A = \begin{pmatrix} 4 & 0 & -6 \\ -1 & 2 & 3 \\ 1 & 0 & -1 \end{pmatrix} \]

is λ^2 − 3λ + 2. This factors as (λ − 2)(λ − 1) and so the eigenvalues are 1, 2. Find the eigen-pairs. Then determine the matrix with respect to a basis of these eigenvectors if possible.
First consider the eigenvalue 2. There exists a nonzero vector v such that (A − 2I) v = 0. This follows from the above theory. However, it is best to just find it directly rather than try to get it by using the proof of the above theorem. The augmented matrix to consider is then
\[ \begin{pmatrix} 4-2 & 0 & -6 & 0 \\ -1 & 2-2 & 3 & 0 \\ 1 & 0 & -1-2 & 0 \end{pmatrix} \]
Row reducing this yields
\[ \begin{pmatrix} 1 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
Thus the solution is any vector of the form
\[ \begin{pmatrix} 3z \\ y \\ z \end{pmatrix} = z \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad z, y \text{ not both } 0 \]
Now consider the eigenvalue 1. This time you row reduce
\[ \begin{pmatrix} 4-1 & 0 & -6 & 0 \\ -1 & 2-1 & 3 & 0 \\ 1 & 0 & -1-1 & 0 \end{pmatrix} \]
which yields for the row reduced echelon form
\[ \begin{pmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
Thus an eigenvector is of the form
\[ \begin{pmatrix} 2z \\ -z \\ z \end{pmatrix}, \quad z \neq 0 \]
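Here is a small numerical sketch, not part of the text, which double checks the eigen-pairs just computed with numpy. The matrix S below, whose columns are the eigenvectors found above in that order, is an assumption of this sketch only in the sense that any ordering of the eigenvectors would do equally well.

import numpy as np

A = np.array([[4.0, 0.0, -6.0],
              [-1.0, 2.0, 3.0],
              [1.0, 0.0, -1.0]])

# eigenvectors found above: (3,0,1) and (0,1,0) for eigenvalue 2, (2,-1,1) for eigenvalue 1
for lam, v in [(2, [3.0, 0.0, 1.0]), (2, [0.0, 1.0, 0.0]), (1, [2.0, -1.0, 1.0])]:
    print(np.allclose(A @ np.array(v), lam * np.array(v)))   # True each time

S = np.array([[3.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 1.0]])
print(np.round(np.linalg.inv(S) @ A @ S, 10))                # diag(2, 2, 1)
print(np.allclose(A @ A - 3 * A + 2 * np.eye(3), 0))         # minimum polynomial annihilates A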

Consider a basis for R^3 of the form
\[ \left\{ \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} \right\} \]
You might want to consider Problem 9 on Page 97 at this point. This problem shows that the matrix with respect to this basis is diagonal. When the matrix of a linear transformation can be chosen to be a diagonal matrix, the transformation is said to be nondefective. Also, note that the term applies to the matrix of a linear transformation, and so I will specialize to the consideration of matrices in what follows. As shown above, this is equivalent to saying that any matrix of the linear transformation is similar to one which is diagonal. That is, the matrix of a linear transformation, or more generally just a square matrix A, has the property that there exists S such that S^{-1}AS = D where D is a diagonal matrix. Here is a definition which also introduces one of the most horrible adjectives in all of mathematics.

6.4

Diagonalizability

Diagonalizability is a term intended to be descriptive of whether a given matrix is similar to a diagonal matrix. More precisely one has the following definition.
Definition 6.4.1 Let A be an n × n matrix. Then A is diagonalizable if there exists an invertible matrix S such that S^{-1}AS = D where D is a diagonal matrix. This means D has a zero as every entry except for the main diagonal. More precisely, Dij = 0 unless i = j. Such matrices look like the following:
\[ \begin{pmatrix} * & & 0 \\ & \ddots & \\ 0 & & * \end{pmatrix} \]
where ∗ might not be zero.
The most important theorem about diagonalizability¹ is the following major result. First here is a simple observation.
Observation 6.4.2 Let S = (s1 · · · sn) where S is n × n. Then here is the result of multiplying on the right by a diagonal matrix:
\[ \begin{pmatrix} s_1 & \cdots & s_n \end{pmatrix} \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} = \begin{pmatrix} \lambda_1 s_1 & \cdots & \lambda_n s_n \end{pmatrix} \]
¹ This word has 9 syllables! Such words belong in Iceland. Eyjafjallajökull actually only has seven syllables.


This follows from the way we multiply matrices. The diagonal matrix has ij th entry equal to δij λj and the ij th entry of the matrix on the left is sij where si = (s1i s2i · · · sni)^T is the ith column. Thus the ij th entry of the product on the left is Σk sik δkj λj = sij λj . It follows that the j th column is (s1j λj s2j λj · · · snj λj)^T = λj sj .
Theorem 6.4.3 An n × n matrix A is diagonalizable if and only if F^n has a basis of eigenvectors of A. Furthermore, you can take the matrix S described above to be given as S = (s1 s2 · · · sn) where here the sk are the eigenvectors in the basis for F^n . If A is diagonalizable, the eigenvalues of A are the diagonal entries of the diagonal matrix.
Proof: To say that A is diagonalizable is to say that for some S,
\[ S^{-1} A S = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \]
the λi being elements of F. This is to say that for S = (s1 · · · sn), sk being the k th column,
\[ A \begin{pmatrix} s_1 & \cdots & s_n \end{pmatrix} = \begin{pmatrix} s_1 & \cdots & s_n \end{pmatrix} \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} \]
which is equivalent, from the way we multiply matrices and the above observation, to
\[ \begin{pmatrix} A s_1 & \cdots & A s_n \end{pmatrix} = \begin{pmatrix} \lambda_1 s_1 & \cdots & \lambda_n s_n \end{pmatrix} \]
which is equivalent to saying that the columns of S are eigenvectors and the diagonal matrix has the eigenvalues down the main diagonal. Since S is invertible, these eigenvectors are a basis. Similarly, if there is a basis of eigenvectors, one can take them as the columns of S and reverse the above steps, finally concluding that A is diagonalizable.
Corollary 6.4.4 Let A be an n × n matrix with minimum polynomial
\[ p(\lambda) = \prod_{i=1}^{p} (\lambda - \mu_i)^{k_i}, \]
the µi being distinct.

Then A is diagonalizable if and only if each ki = 1.
Proof: Suppose first that it is diagonalizable and that a basis of eigenvectors is {v1 , · · · , vn } with Avi = µi vi . Since n ≥ p, there may be some repeats here, a µi going with more than one vi . Say ki > 1. Now consider
\[ \hat{p}(\lambda) \equiv (\lambda - \mu_i) \prod_{j=1,\, j \neq i}^{p} (\lambda - \mu_j)^{k_j}. \]
Thus this is a monic polynomial which has smaller degree than p (λ). If you have v ∈ F^n , since this is a basis, there are scalars cj such that v = Σj cj vj . Then p̂ (A) v = 0. Since v is arbitrary, this shows that p̂ (A) = 0, contrary to the definition of the minimum polynomial being p (λ). Thus each ki must be 1.
Conversely, if each ki = 1, then
\[ F^{n} = \ker (A - \mu_1 I) \oplus \cdots \oplus \ker (A - \mu_p I) \]
and you simply let βi be a basis for ker (A − µi I), which consists entirely of eigenvectors by definition of what you mean by ker (A − µi I). Then a basis of eigenvectors consists of {β1 , β2 , · · · , βp } and so the matrix A is diagonalizable.


Example 6.4.5 The minimum polynomial for the matrix
\[ A = \begin{pmatrix} 10 & 12 & -6 \\ -4 & -4 & 3 \\ 3 & 4 & -1 \end{pmatrix} \]
is λ^3 − 5λ^2 + 8λ − 4. This factors as (λ − 2)^2 (λ − 1) and so the eigenvalues are 1, 2. Find the eigen-pairs. Then determine the matrix with respect to a basis of these eigenvectors if possible. If it is not possible to find a basis of eigenvectors, find a block diagonal matrix similar to the matrix.
Note that from the above theorem, it is not possible to diagonalize this matrix. First find the eigenvectors for 2. You need to row reduce
\[ \begin{pmatrix} 10-2 & 12 & -6 & 0 \\ -4 & -4-2 & 3 & 0 \\ 3 & 4 & -1-2 & 0 \end{pmatrix} \]
This yields



\[ \begin{pmatrix} 1 & 0 & -3 & 0 \\ 0 & 1 & \tfrac{3}{2} & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
Thus the eigenvectors which go with 2 are
\[ \begin{pmatrix} 6z & -3z & 2z \end{pmatrix}^{T}, \quad z \in \mathbb{R},\; z \neq 0 \]
The eigenvectors which go with 1 are
\[ \begin{pmatrix} 2z & -z & z \end{pmatrix}^{T}, \quad z \in \mathbb{R},\; z \neq 0 \]

By Theorem 6.3.3, there are no other eigenvectors than those which correspond to eigenvalues 1, 2. Thus there is no basis of eigenvectors because the span of the eigenvectors has dimension two. However, we can consider
\[ \mathbb{R}^{3} = \ker\!\left( (A - 2I)^{2} \right) \oplus \ker (A - I) \]
The second of these is just span((2, −1, 1)^T). What is the first? We find it by row reducing the following matrix, which is the square of A − 2I augmented with a column of zeros.
\[ \begin{pmatrix} -2 & 0 & 6 & 0 \\ 1 & 0 & -3 & 0 \\ -1 & 0 & 3 & 0 \end{pmatrix} \]
Row reducing this yields
\[ \begin{pmatrix} 1 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
which says that solutions are of the form
\[ \begin{pmatrix} 3z \\ y \\ z \end{pmatrix}, \quad y, z \in \mathbb{R} \text{ not both } 0 \]
These are the nonzero vectors of
\[ \operatorname{span}\!\left( \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right) \]

Note these are not eigenvectors. They are called generalized eigenvectors because they pertain to ker((A − 2I)^2) rather than ker(A − 2I). What is the matrix of A with respect to the ordered basis
\[ \left\{ \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} \right\}, \]
the first two vectors being a basis of this subspace? Then
\[ A \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 10 & 12 & -6 \\ -4 & -4 & 3 \\ 3 & 4 & -1 \end{pmatrix} \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 24 \\ -9 \\ 8 \end{pmatrix}, \qquad
A \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 10 & 12 & -6 \\ -4 & -4 & 3 \\ 3 & 4 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 12 \\ -4 \\ 4 \end{pmatrix} \]
so
\[ \begin{pmatrix} 24 & 12 \\ -9 & -4 \\ 8 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} M \tag{6.2} \]
and so some computations yield
\[ M = \begin{pmatrix} 8 & 4 \\ -9 & -4 \end{pmatrix} \]
Indeed this works:
\[ \begin{pmatrix} 3 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 8 & 4 \\ -9 & -4 \end{pmatrix} = \begin{pmatrix} 24 & 12 \\ -9 & -4 \\ 8 & 4 \end{pmatrix} \]
Then the matrix associated with the other eigenvector is just 1. Hence the matrix with respect to the above ordered basis is
\[ \begin{pmatrix} 8 & 4 & 0 \\ -9 & -4 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
So what are some convenient computations which will allow you to find M easily? Take the transpose of both sides of 6.2. Then you would have
\[ \begin{pmatrix} 24 & -9 & 8 \\ 12 & -4 & 4 \end{pmatrix} = M^{T} \begin{pmatrix} 3 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
Thus
\[ M^{T} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 8 \\ 4 \end{pmatrix}, \qquad M^{T} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -9 \\ -4 \end{pmatrix} \]
and so
\[ M^{T} = \begin{pmatrix} 8 & -9 \\ 4 & -4 \end{pmatrix}, \quad \text{so} \quad M = \begin{pmatrix} 8 & 4 \\ -9 & -4 \end{pmatrix}. \]
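As a quick numerical sketch, not part of the text, the claims of this example can be double checked with numpy: the two vectors found above really are generalized eigenvectors, killed by (A − 2I)^2 but not by A − 2I, and the ordered basis above really produces the block diagonal matrix just computed. The variable names are of course arbitrary.

import numpy as np

A = np.array([[10.0, 12.0, -6.0],
              [-4.0, -4.0, 3.0],
              [3.0, 4.0, -1.0]])
N = A - 2 * np.eye(3)

for v in (np.array([3.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])):
    print(N @ v, (N @ N) @ v)       # first vector nonzero, second vector zero

S = np.array([[3.0, 0.0, 2.0],      # columns: the ordered basis above
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 1.0]])
print(np.round(np.linalg.inv(S) @ A @ S, 10))
# expect [[ 8.  4.  0.], [-9. -4.  0.], [ 0.  0.  1.]]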


The eigenvalue problem is one of the hardest problems in algebra because of our inability to exactly solve polynomial equations. Therefore, estimating the eigenvalues becomes very significant. In the case of the complex field of scalars, there is a very elementary result due to Gerschgorin. It can at least give an upper bound for the size of the eigenvalues.
Theorem 6.4.6 Let A be an n × n matrix. Consider the n Gerschgorin discs defined as
\[ D_i \equiv \left\{ \lambda \in \mathbb{C} : |\lambda - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| \right\}. \]
Then every eigenvalue is contained in some Gerschgorin disc.
This theorem says to add up the absolute values of the entries of the ith row which are off the main diagonal and form the disc centered at aii having this radius. The union of these discs contains σ (A).
Proof: Suppose Ax = λx where x ̸= 0. Then for A = (aij), let |xk| ≥ |xj| for all xj. Thus |xk| ̸= 0 and
\[ \sum_{j \neq k} a_{kj} x_j = (\lambda - a_{kk}) x_k. \]
Then
\[ \sum_{j \neq k} |a_{kj}|\,|x_k| \geq \sum_{j \neq k} |a_{kj}|\,|x_j| \geq \Big| \sum_{j \neq k} a_{kj} x_j \Big| = |\lambda - a_{kk}|\,|x_k|. \]





Now dividing by |xk |, it follows λ is contained in the k th Gerschgorin disc.  In these examples given above, it was possible to factor the minimum polynomial and explicitly determine eigenvalues and eigenvectors and obtain information about whether the matrix was diagonalizable by explicit computations. Well, what if you can’t factor the minimum polynomial? What then? This is the typical situation, not what was presented in the above examples. Just write down a 3 × 3 matrix and see if you can find the eigenvalues explicitly using algebra. Is there a way to determine whether a given matrix is diagonalizable in the case that the minimum polynomial factors although you might have trouble finding the factors? Amazingly, the answer is yes. One can answer this question completely using only methods from algebra.
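Here is a small sketch, not part of the text, illustrating Theorem 6.4.6 numerically: it computes the Gerschgorin discs of a matrix and checks that every eigenvalue lies in at least one of them. The 3 × 3 matrix used below is an arbitrary choice made only for this illustration.

import numpy as np

A = np.array([[4.0, 1.0, 0.5],
              [0.2, -3.0, 1.0],
              [1.0, 0.3, 2.0]])

centers = np.diag(A)
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # off-diagonal row sums

for lam in np.linalg.eigvals(A):
    in_some_disc = np.any(np.abs(lam - centers) <= radii)
    print(lam, in_some_disc)   # expect True for every eigenvalue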

6.5

A Formal Derivative and Diagonalizability

For p (λ) = an λ^n + a_{n−1} λ^{n−1} + · · · + a1 λ + a0 where n is a positive integer, define
\[ p'(\lambda) \equiv n a_n \lambda^{n-1} + (n-1) a_{n-1} \lambda^{n-2} + \cdots + a_1 \]
In other words, you use the usual rules of differentiation in calculus to write down this formal derivative. It has absolutely no physical significance in this context because the coefficients are just elements of some field, possibly Zp. It is a purely algebraic manipulation. A term like ka where k ∈ N and a ∈ F means to add a to itself k times. There are no limits or anything else. However, this has certain properties. In particular, the “derivative” of a sum equals the sum of the derivatives. This is fairly clear from the above definition. You just need to always be considering polynomials. Also
\[ \left( b\lambda^{m} \left( a_n \lambda^{n} + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0 \right) \right)' = \left( a_n b \lambda^{n+m} + b a_{n-1} \lambda^{m+(n-1)} + \cdots + b a_1 \lambda^{1+m} + a_0 b \lambda^{m} \right)' \]
\[ \equiv a_n b (n+m) \lambda^{n+m-1} + b a_{n-1} (m+n-1) \lambda^{m+n-2} + \cdots + b a_1 (m+1) \lambda^{m} + a_0 b m \lambda^{m-1} \]


Will the product rule give the same thing? Is it true that the above equals
\[ \left( b\lambda^{m} \right)' \left( a_n \lambda^{n} + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 \right) + b\lambda^{m} \left( a_n \lambda^{n} + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0 \right)' ? \]
A short computation shows that this is indeed the case. Then by induction one can conclude that
\[ \left( \prod_{i=1}^{p} p_i(\lambda) \right)' = \sum_{j=1}^{p} p_j'(\lambda) \prod_{i \neq j} p_i(\lambda) \]
In particular, if
\[ p(\lambda) = \prod_{i=1}^{p} (\lambda - \mu_i)^{k_i} \]
then
\[ p'(\lambda) = \sum_{j=1}^{p} k_j \left( \lambda - \mu_j \right)^{k_j - 1} \prod_{i \neq j} (\lambda - \mu_i)^{k_i} \]
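As a minimal sketch, not from the text, the formal derivative is easy to compute when a polynomial over Zp is stored as a list of coefficients; the function name below is just made up for this illustration. The point is that "k a" means adding a to itself k times, which is (k·a) mod p.

def formal_derivative(coeffs, p):
    # coeffs = [a0, a1, ..., an] are the coefficients of 1, lambda, lambda^2, ...
    return [(k * coeffs[k]) % p for k in range(1, len(coeffs))]

# example over Z5: the polynomial 1 + 4*lambda + lambda^3 from Example 6.2.2
print(formal_derivative([1, 4, 0, 1], 5))   # [4, 0, 3], that is 4 + 3*lambda^2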

I want to emphasize that this is an arbitrary field of scalars, but if one is only interested in the real or complex numbers, then all of this follows from standard calculus theorems.
Proposition 6.5.1 Suppose the minimum polynomial p (λ) of an n × n matrix A completely factors into linear factors. Then A is diagonalizable if and only if p (λ), p′ (λ) are relatively prime.
Proof: Suppose p (λ), p′ (λ) are relatively prime. Say
\[ p(\lambda) = \prod_{i=1}^{p} (\lambda - \mu_i)^{k_i}, \quad \mu_i \text{ distinct.} \]
From the above discussion,
\[ p'(\lambda) = \sum_{j=1}^{p} k_j \left( \lambda - \mu_j \right)^{k_j - 1} \prod_{i \neq j} (\lambda - \mu_i)^{k_i} \]

and p′ (λ) , p (λ) are relatively prime if and only if each ki = 1. Then by Corollary 6.4.4 this is true if and only if A is diagonalizable.  Example 6.5.2 Find whether the matrix



\[ A = \begin{pmatrix} 1 & -1 & 2 \\ 0 & 1 & 2 \\ 1 & -1 & 1 \end{pmatrix} \]

is diagonalizable. Assume the field of scalars is C because in this field, the minimum polynomial will factor thanks to the fundamental theorem of algebra. Successive powers of the matrix are
\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & -1 & 2 \\ 0 & 1 & 2 \\ 1 & -1 & 1 \end{pmatrix},\quad
\begin{pmatrix} 3 & -4 & 2 \\ 2 & -1 & 4 \\ 2 & -3 & 1 \end{pmatrix},\quad
\begin{pmatrix} 5 & -9 & 0 \\ 6 & -7 & 6 \\ 3 & -6 & -1 \end{pmatrix} \]
Then we need to have for a linear combination involving a, b, c, d as scalars
\[ \begin{aligned} a + b + 3c + 5d &= 0 \\ 2c + 6d &= 0 \\ b + 2c + 3d &= 0 \\ -b - 3c - 6d &= 0 \end{aligned} \]


Then letting d = 1, this gives only one solution, a = 1, b = 3, c = −3, and so the candidate for the minimum polynomial is λ^3 − 3λ^2 + 3λ + 1. In fact, this does work, as is seen by substituting A for λ. So is this polynomial and its derivative relatively prime?
\[ \lambda^{3} - 3\lambda^{2} + 3\lambda + 1 = \frac{1}{3} (\lambda - 1) \left( 3\lambda^{2} - 6\lambda + 3 \right) + 2 \]
and clearly 3λ^2 − 6λ + 3 and 2 are relatively prime. Hence this matrix is diagonalizable. Of course, finding its diagonalization is another matter. For an algorithm for determining whether two polynomials are relatively prime, see Problem 34 on Page 29. Of course this was an easy example thanks to Problem 12 on Page 126, because there are three distinct eigenvalues, one real and two complex which must be complex conjugates. This problem says that eigenvectors corresponding to distinct eigenvalues are an independent set. Be sure to do this problem.
Consider the following example in which the eigenvalues are not distinct, consisting of a, a.
Example 6.5.3 Find whether the matrix
\[ A = \begin{pmatrix} a+1 & 1 \\ -1 & a-1 \end{pmatrix} \]

is diagonalizable. Listing the powers of the matrix,
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} a+1 & 1 \\ -1 & a-1 \end{pmatrix},\quad
\begin{pmatrix} a^{2}+2a & 2a \\ -2a & a^{2}-2a \end{pmatrix} \]
Then we need to have for a linear combination involving scalars x, y, z
\[ \begin{aligned} x + (a+1) y + \left( a^{2} + 2a \right) z &= 0 \\ y + 2az &= 0 \\ x + (a-1) y + \left( a^{2} - 2a \right) z &= 0 \end{aligned} \]
Then some routine row operations yield x = a^2 z, y = −2az and z is arbitrary. For the minimum polynomial, we take z = 1 because this is a monic polynomial. Thus the minimum polynomial is
\[ a^{2} - 2a\lambda + \lambda^{2} = (\lambda - a)^{2} \]
and clearly this and its derivative are not relatively prime. Thus this matrix is not diagonalizable for any choice of a.
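The relative primality test of Proposition 6.5.1 is easy to carry out by machine. Here is a sketch, not part of the text, using sympy's polynomial gcd on the minimum polynomials found in Examples 6.5.2 and 6.5.3; the symbol names are arbitrary choices for this illustration.

import sympy as sp

lam, a = sp.symbols('lambda a')

p1 = lam**3 - 3*lam**2 + 3*lam + 1           # Example 6.5.2
print(sp.gcd(p1, sp.diff(p1, lam)))           # a constant, so diagonalizable

p2 = (lam - a)**2                             # Example 6.5.3
print(sp.gcd(p2, sp.diff(p2, lam)))           # lambda - a up to sign, not constant, so not diagonalizable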

6.6

Exercises

1. For the linear transformation determined by multiplication by the following matrices, find the minimum polynomial.
(a) \( \begin{pmatrix} 3 & 1 \\ -4 & -1 \end{pmatrix} \)
(b) \( \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix} \)
(c) \( \begin{pmatrix} 2 & 1 & 0 \\ -1 & 0 & 0 \\ 2 & 5 & 2 \end{pmatrix} \)
(d) \( \begin{pmatrix} 2 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 9 & 4 \end{pmatrix} \)
(e) \( \begin{pmatrix} 1 & 0 & 0 \\ -2 & -1 & 0 \\ 3 & 6 & 2 \end{pmatrix} \)
(f) \( \begin{pmatrix} 2 & 1 & 0 \\ -2 & 2 & 1 \\ 5 & -1 & -1 \end{pmatrix} \)
(g) \( \begin{pmatrix} 5 & 0 & 12 \\ -2 & 1 & -6 \\ -2 & 0 & -5 \end{pmatrix} \)
2. Here is a matrix:
\[ \begin{pmatrix} -1 & -4 & -2 \\ 2 & 4 & 2 \\ -1 & -1 & 0 \end{pmatrix} \]
Its minimum polynomial is λ^3 − 3λ^2 + 4λ − 2 = (λ − 1)(λ^2 − 2λ + 2). Obtain a block diagonal matrix similar to this one.

3. Suppose A ∈ L (V, V ) where V is a finite dimensional vector space and suppose p (λ) is the minimum polynomial. Say p (λ) = λ^m + a_{m−1} λ^{m−1} + · · · + a1 λ + a0 . If A is one to one, show that it is onto and also that A^{-1} ∈ L (V, V ). In this case, explain why a0 ̸= 0 and give a formula for A^{-1} as a polynomial in A.
4. Let A = \( \begin{pmatrix} 0 & -2 \\ 1 & 3 \end{pmatrix} \). Its minimum polynomial is λ^2 − 3λ + 2. Find A^{10} exactly. Hint: You can do long division and get λ^{10} = l (λ) (λ^2 − 3λ + 2) + 1023λ − 1022.
5. Suppose A ∈ L (V, V ) and it has minimum polynomial p (λ) which has degree m. It is desired to compute A^n for n large. Show that it is possible to obtain A^n in terms of a polynomial in A of degree less than m.
6. Determine whether the following matrices are diagonalizable. Assume the field of scalars is C.
(a) \( \begin{pmatrix} 1 & 1 & 1 \\ -1 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix} \)
(b) \( \begin{pmatrix} \sqrt{2}+1 & 1 \\ -1 & \sqrt{2}-1 \end{pmatrix} \)
(c) \( \begin{pmatrix} a+1 & 1 \\ -1 & a-1 \end{pmatrix} \) where a is a real number.
(d) \( \begin{pmatrix} 1 & 1 & -1 \\ 2 & 1 & -1 \\ 0 & 1 & 2 \end{pmatrix} \)
(e) \( \begin{pmatrix} 2 & 1 & 0 \\ -1 & 0 & 0 \\ 2 & 2 & 1 \end{pmatrix} \)


7. The situation for diagonalizability was presented for the situation in which the minimum polynomial factors completely as a product of linear factors since this is certainly the case of most interest, including C. What if the minimum polynomial does not split? Is there a theorem available that will allow one to conclude that the matrix is diagonalizable in a splitting field, possibly larger than the given field? It is a reasonable question because the assumption that p (λ) , p′ (λ) are relatively prime may be determined without factoring the polynomials and involves only computations involving the given field F. If you enlarge the field, what happens to the minimum polynomial? Does it stay the same or does it change? Remember, the matrix has entries all in the smaller field F while a splitting field is G larger than F, but you can 2 determine the minimum polynomial using row operations on vectors in Fn . 8. Suppose V is a finite dimensional vector space and suppose N ∈ L (V, V ) satisfies N m = 0 for some m ≥ 1. Show that the only eigenvalue is 0. 9. Suppose V is an n dimensional vector space and suppose β is a basis for V. Consider the map µI : V → V given by µIv = µv. What is the matrix of this map with respect to the basis β? Hint: You should find that it is µ times the identity matrix whose ij th entry is δ ij which is 1 if i = j and 0 if i ̸= j. Thus the ij th entry of this matrix will be µδ ij . 10. In the case that the minimum polynomial factors, which was discussed above, we had ( )k p k V = ker (L − µ1 I) 1 ⊕ · · · ⊕ ker L − µp I k

k

If Vi = ker (L − µi I) i , then by definition, (Li − µi I) i = 0 where here Li is the restriction of L to Vi . If N = Li − µi I, then N : Vi → Vi and N ki = 0. This is the definition of a nilpotent transformation, one which has a high enough power equal to 0. Suppose then that N : V → V where V is an m dimensional vector space. We will show that there is a basis for V such that with respect to this basis, the matrix of N is block diagonal and of the form   N1 0   ..   .   0 Ns where Ni is an ri × ri matrix of the form  0 1   0     0

0 ..

.

..

.



     1  0

That is, there are ones down the superdiagonal and zeros everywhere else. Now consider the case where Ni = Li − µi I on one of the Vi as just described. Use the preceding problem and the special basis β i just described for Ni to show that the matrix of Li with respect to this basis is of the form   0 J1 (µi )   ..  J (µi ) ≡  .   0 Js (µi ) where Jr (µi ) is of the form

      

µi

1 µi

0

0

..

.

..

.



     1  µi

124

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES ( ) This is called a Jordan block. Now let β = β 1 , · · · , β p . Explain why the matrix of L with respect to this basis is of the form   J (µ1 ) 0   ..   .   ( ) 0 J µp This special matrix is called the Jordan canonical form. This problem shows that it reduces to the study of the matrix of a nilpotent matrix. You see that it is a block diagonal matrix such that each block is a block diagonal matrix which is also an upper triangular matrix having the eigenvalues down the main diagonal and strings of ones on the super diagonal.

11. Now in this problem, the method for finding the special basis for a nilpotent transformation is given. Let V be a vector space and let N ∈ L (V, V ) be nilpotent. First note the only eigenvalue of N is 0. Why? (See Problem 8.) Let v1 be an eigenvector. Then {v1 , v2 , · · · , vr } is called a chain based on v1 if N vk+1 = vk for all k = 1, 2, · · · , r and v1 is an eigenvector so N v1 = 0. It will be called a maximal chain if there is no solution v, to the equation, N v = vr . Now there will be a sequence of steps leading to the desired basis. (a) Show that the vectors in any chain are linearly independent and for {v1 , v2 , · · · , vr } a chain based on v1 , N : span (v1 , v2 , · · · , vr ) 7→ span (v1 , v2 , · · · , vr ) . (6.3) ∑r Also if {v1 , v2 , · · · , vr } is a chain, then r ≤ n. Hint: If 0 = i=1 ci vi , and the last nonzero scalar occurs at l, do N l−1 to the sum and see what happens to cl . (b) Consider the set of all chains based on eigenvectors. Since all have { total length } no larger than n it follows there exists one which has maximal length, v11 , · · · , vr11 ≡ B1 . If span (B1 ) contains all eigenvectors of N, then stop. { Otherwise,} consider all chains based on eigenvectors not in span (B1 ) and pick one, B2 ≡ v12 , · · · , vr22 which is as long as possible. Thus r2 ≤ r1 . If span (B1 , B2 ) contains all eigenvectors of N, stop. Otherwise, consider } { all chains based on eigenvectors not in span (B1 , B2 ) and pick one, B3 ≡ v13 , · · · , vr33 such that r3 is as large as possible. Continue this way. Thus rk ≥ rk+1 . Then show that the above process terminates with a finite list of chains {B1 , · · · , Bs } because for any k, {B1 , · · · , Bk } is linearly independent. Hint: From part a. you know this is true if k = 1. Suppose true for k − 1 and letting L (Bi ) denote a linear combination of vectors of Bi , suppose k ∑ L (Bi ) = 0 i=1

Then we can assume L (Bk ) ̸= 0 by induction. Let vik be the last term in L (Bk ) which has nonzero scalar. Now act on the whole thing with N i−1 to find v1k as a linear combination of vectors in {B1 , · · · , Bk−1 } , a contradiction to the construction. You fill in the details. (c) Suppose N w = 0. (w is an eigenvector). Show that there exist scalars, ci such that w=

s ∑

ci v1i .

i=1

Recall that v1i is the eigenvector in the ith chain on which this chain is based. You know that w is a linear combination of the vectors in {B1 , · · · , Bs } . This says that in

6.6. EXERCISES

125

fact it∑is a linear combination of the bottom vectors in the Bi . Hint: You know that s w = i=1 L (Bi ) . Let vis be the last in L (Bs ) which has nonzero scalar. Suppose that i > 1. Now do N i−1 to both sides and obtain that v1s is in the span of {B1 , · · · , Bs−1 } which is a contradiction. Hence i = 1 and so the only term of L (Bs ) is one involving an eigenvector. Now do something similar to L (Bs−1 ) , L (Bs−2 ) etc. You fill in details. (d) If N w = 0, then w ∈ span (B1 , · · · , Bs ) . This was what was just shown. In fact, it was a particular linear combination involving the bases of the chains. What if N k w = 0? Does it still follow that w ∈ span (B1 , · · · , Bs )? Show that if N k w = 0, then w ∈ span (B1 , · · · , Bs ) . Hint: Say k is as small as possible such that N k w = 0. Then you have N k−1 w is an eigenvector and so N k−1 w =

s ∑

ci v1i

i=1 k−1

If N w is the base of some chain Bi , then there is nothing to show. Otherwise, consider the chain N k−1 w, N k−2 w, · · · , w. It cannot be any longer than any of the chains B1 , B2 , · · · , Bs why? Therefore, v1i = N k−1 vki . Why is vki ∈ Bi ? This is where you use that this is no longer than any of the Bi . Thus ( ) s ∑ k−1 i N w− c i vk = 0 ∑s

i=1

By induction, (details) w − ∈ span (B1 , · · · , Bs ) . m (e) Since N is nilpotent, ker (N ) = V for some m and so all of V is in span (B1 , · · · , Bs ). (f) Now explain why the matrix with respect to the ordered basis (B1 , · · · , Bs ) is the kind of thing desired and described in the above problem. Also explain why the size of the blocks decreases from upper left to lower right. To see why the matrix is like the above, consider ( ) ( ) 0 v1i · · · vri i −1 = v1i v2i · · · vri i Mi i i=1 ci vk

where Mi is the ith block and ri is the length of the ith chain. If you have gotten through this, then along with the previous problem, you have proved the existence of the Jordan canonical form, one of the greatest results in linear algebra. It will be considered a different way later. Specifically, you have shown that if the minimum polynomial splits, then the linear transformation has a matrix of the following form:   J (µ1 ) 0   ..   .   ( ) 0 J µp where without loss of generality, you can arrange these blocks to be decreasing in size from the upper left to the lower right and J (µi ) is of the form   Jr1 (µi ) 0   ..   .   0 Jrs (µi ) Where Jr (µi ) is the r × r matrix which is of the following form   µi 1 0   .   µi . .   Jr (µi ) =   ..   . 1   0 µi

126

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES and the blocks Jr (µi ) can also be arranged to have their size decreasing from the upper left to lower right.

12. (Extra important) The following theorem gives an easy condition for which the Jordan canonical form will be a diagonal matrix. Theorem 6.6.1 Let A ∈ L (V, V ) and suppose (ui , λi ) , i = 1, 2, · · · , m are eigen-pairs such that if i ̸= j, then λi ̸= λj . Then {u1 , · · · , um } is linearly independent. In words, eigenvectors from distinct eigenvalues are linearly independent. ∑k Hint: Suppose i=1 ci ui = 0 where k is as small as possible such that not all of the ci = 0. Then ck ̸= 0. Explain why k > 1 and k ∑

ci λk ui =

i=1

Now

k ∑

k ∑

ci λi ui

i=1

ci (λk − λi ) ui = 0

i=1

Obtain a contradiction of some sort at this point. Thus if the n × n matrix has n distinct eigenvalues, then the corresponding eigenvectors will be a linearly independent set and so the matrix will be diagonal and all the Jordan blocks will be single numbers. 13. This and the next few problems will give another presentation of the Jordan canonical form. Let A ∈ L (V, V ) be a nonzero linear transformation where V has finite dimensions. Consider { } x, Ax, A2 x, · · · , Am−1 x where for k ≤ m − 1, Ak x is not in, ( ) Ak x ∈ / span x, Ax, A2 x, · · · , Ak−1 x { } show that then x, Ax, A2 x, · · · , Am−1 x must be linearly independent. Hint: Let η (λ) be the minimum polynomial for A. Then let ϕ (λ) be the monic polynomial of smallest degree such that ϕ{(A) x = 0. Explain why} ϕ (λ) divides η (λ). Then show that if the degree of ϕ (λ) is d, then x, Ax, A2 x, · · · , Ad−1 x is linearly independent and if k is as described above, then k ≤ d. Note: linear dependence implies the existence of a polynomial ψ (λ) such that 2 m−1 ψ (A)(x = 0. An ordered set x where Am x ∈ ) of vectors of the form x, Ax, A x, · · · , A 2 m−1 span x, Ax, A x, · · · , A x with m as small as possible is called a cyclic set. 14. ↑Suppose now that N ∈ L (V, V ) for V a finite dimensional vector space and the minimum polynomial for N is{λp . In other words, N is} nilpotent, N p = 0 and p as small as possible. For x ̸= 0, let β x = x, N x, N 2 x, · · · , (N m−1 x where we keep the ) order of these vectors in β x and here m is such that N m x ∈ span x, N x, N 2 x, · · · , N m−1 x with m as small as possible. (a) Show that N m x = 0. Hint: You know from the assumption that ( ) N m x ∈ span x, N x, N 2 x, · · · , N m−1 x that there is a monic polynomial η (λ) of degree m such that η (N ) x = 0. Explain why η (λ) divides the minimum polynomial λp . Then η (λ) = λm . Thus N m x = 0. ( ) (b) For each x ̸= 0, there is such a β x and let V1 ≡ span β x1 . Explain why N : V1 → V1 .

6.6. EXERCISES

127

(c) Let N{1 be the restriction of N } to V1 . Find the matrix of N1 with respect to the ordered basis N m−1 x1 , · · · , N x1 , x1 . Note that we reverse the order of these vectors. This is just the traditional way of doing it. Show this matrix is of the form   0 1 0   .   0 ..   B≡ (6.4)  ..   . 1   0 0 15. ↑In the context of the above problems where N p = 0, N ∈ L (V, V )

( ) and β x is defined as above, show that for each k ≤ p, if W is a subspace of ker N k which is invariant with respect to N meaning N (W ) ⊆ W, then there are finitely many yi ∈ W such that ( ) W = span β y1 , β y2 , · · · , β ys , some s { } and β y1 , β y2 , · · · , β ys is linearly independent. This is called a cyclic basis. Hint: If W ⊆ ker (N ) , this is obviously true because in this case, β x = x( for x )∈ ker (N ). Now suppose the assertion is true for k < p and consider invariant W ⊆ ker N k+1 . Argue as follows: ( k) (a) Explain ( why N (W ) is )an invariant subspace of ker N . Thus, by induction, N (W ) = span β x1 , β x2 , · · · , β xs where that in (·) is a basis. ∑s ∑ri −1 (b) Let z ∈ W so N z = i=1 j=0 aij N j xj . Let yj ∈ W such that N yj = xj . Explain why   s r∑ i −1 ∑ N z − aij N j yi  = 0 i=1 j=0

where the length of β xi is ri . Explain why there is an eigenvector y0 such that z=

s r∑ i −1 ∑

aij N j yi + y0

i=1 j=0

(c) Note that β y0 = y0 . Explain why ( ) span β y0 , β y1 , · · · , β ys ⊇ W { } Then explain why β y0 , β y1 , · · · , β ys is linearly independent. Hint: If s r∑ i −1 ∑

aij N j yi + by0 = 0,

i=1 j=0

Do N to both sides and use induction to conclude all aij = 0. 16. ↑Now in the above situation show that there is a basis for V = ker (N p ) such that with respect to this basis, the matrix of N is block diagonal of the form   B1   B2     (6.5) ..   .   Br

128

CHAPTER 6. DIRECT SUMS AND BLOCK DIAGONAL MATRICES where the size of the blocks is decreasing from upper left to lower right and each block is of the form given in 6.4. Hint: Repeat the argument leading to this equation for each β yi where the ordered basis for ker (N p ) is of the form { } β y1 , β y2 , · · · , β yr arranged so that the length of β yi is at least as long as the length of β yi+1 .

17. ↑Now suppose the minimum polynomial for A ∈ L (V, V ) is p (λ) =

r ∏

mi

(λ − µi )

i=1

Thus from what was shown above, V =

r ⊕

ker ((A − µi I)

mi

i=1

)≡

r ⊕

ker (Nimi )

i=1 m

where Ni is the restriction of (A − µi I) to Vi ≡ ker ((A − µi I) i ). Explain why there are ordered bases β 1 , · · · , β r , β j being a basis for Vj such that with respect to this basis, the matrix of Ni has the form   B1   B2     ..   .   Bsi each Bk having ones down the super diagonal and zeros elsewhere. Now explain why each Vi is A invariant and the basis just described yields a matrix for A which is of the form   J1   ..   .   Jr where

  Jk =  



J1 (µk ) ..

  

. Jsk (µk )

with the size of the diagonal blocks decreasing and Jm (µk ) having ones down the super diagonal and µk down the diagonal. Hint: Explain why, for I the identity on Vk the matrix of µk I with respect to any basis is just the diagonal matrix having µk down the diagonal. Thus the matrix of A restricted to Vk relative to the basis β k will be of the desired form. Note that on Vk , A = Nk + µk I. This yields the Jordan canonical form. A more elaborate argument similar to the above will be used in the following chapter to obtain the rational canonical form as well as the Jordan form.

Chapter 7

Canonical Forms Linear algebra is really all about linear transformations and the fundamental question is whether a matrix comes from some linear transformation with respect to some basis. In other words, are two matrices really from the same linear transformation? As proved above, this happens if and only if the two are similar. Canonical forms allow one to answer this question. There are two main kinds of canonical form, the Jordan canonical form for the case where the minimum polynomial splits and the rational canonical form in the other case. Of the two, the Jordan canonical form is the one which is used the most in applied math. However, the other one is also pretty interesting.

7.1

Cyclic Sets

It was shown above that for A ∈ L (V, V ) for V a finite dimensional vector space over the field of scalars F, there exists a direct sum decomposition V = V1 ⊕ · · · ⊕ Vq

(7.1)

where mk

Vk = ker (ϕk (A)

)

and ϕk (λ) is an irreducible monic polynomial. Here the minimum polynomial of A was p (λ) ≡

q ∏

mk

ϕk (λ)

k=1

Next I will consider the problem of finding a basis for Vk such that the matrix of A restricted to Vk assumes various forms. { } Definition 7.1.1 Letting x ̸= 0 and A ∈ L (V, V( ) , denote by β)x the vectors x, Ax, A2 x, · · · , Am−1 x where m is the smallest such that Am x ∈ span x, · · · , Am−1 x . This is called an A cyclic set. For such a sequence of vectors, |β x | ≡ m, the number of vectors in β x . Note that for such an A cyclic set, there exists a unique monic polynomial η (λ) of degree |β x | with η (A) x = 0 such that if ϕ (A) x = 0 for any other polynomial, then η (λ) must divide ϕ (λ). Indeed, ϕ (λ) = η (λ) q (λ) + r (λ) , where r (λ) = 0 or it has degree less than that of η (λ). Thus 0 = ϕ (A) x = q (A) η (A) x + r (A) x = r (A) x which is impossible unless r (λ) = 0 because it would yield a shorter length for |β x |. Such cyclic sets have some very useful properties. The next lemma is a more complete description of what was just observed. 129

130

CHAPTER 7. CANONICAL FORMS

{ } 2 d−1 Lemma 7.1.2 Let β = x, Ax, A x, · · · , A x ,}x ̸= 0 where d is the smallest such that Ad x ∈ x ( ) { d−1 2 d−1 span x, · · · , A x . Then x, Ax, A x, · · · , A x is linearly independent. If x ∈ ker (ϕ (A)) , x ̸= 0, where ϕ (λ) is irreducible and has degree d, then |β x | = d. If x ∈ ker (ϕ (A)) where ϕ (λ) is just some polynomial, there is a unique monic polynomial of minimum degree η (λ) such that η (A) x = 0. Also span (β x ) is always A invariant. Thus An z ∈ span (β x ) whenever z ∈ span (β x ). Proof: Suppose that there are scalars ak , not all zero such that d−1 ∑

ak Ak x = 0

k=0

Then letting ar be the last nonzero scalar in the sum, you can divide by ar and solve for Ar x as a linear combination of the Aj x for j < r ≤ d − 1 contrary to the definition of d. For the second claim, if |β x | < d, then Am x is a linear combination of Ak x for k < m for some m < d. That is, there is a polynomial η (λ) of degree less than d such that η (A) x = 0 and we can let its degree be as small as possible. But then ϕ (λ) = η (λ) q (λ) + r (λ) where the degree of r (λ) is less than the degree of η (λ) or else is 0. It can’t be zero because ϕ (λ) is irreducible. Hence r (A) x = 0 which is a contradiction. Thus |β x | = d because ϕ (A) x = 0. For the third claim, let η (λ) be a monic polynomial of smallest degree such that η (A) x = 0. Then the same argument just given shows that if ηˆ (λ) has the same properties then η (λ) /ˆ η (λ) and ηˆ (λ) /η (λ) so the two are the same. Indeed, η (λ) = ˆl (λ) ηˆ (λ) + rˆ (λ) , ηˆ (λ) = ˆl (λ) η (λ) + r (λ) and both rˆ (λ) , r (λ) are zero. Thus the polynomial is unique. For the last claim, observe that λn = η (λ) q (λ) + r (λ) , deg r (λ) < deg η (λ) where η (λ) is the polynomial of degree equal to |β x | such that η (A) x = 0 which is associated with β x . Then An x = q (A) η (A) x + r (A) x = r (A) x ∈ span (β x ) and so one can replace An x with r (A) x ∈ span (β x ). For z ∈ span (β x ) , the terms of An z are scalar multiples of Am x for various m so An z ∈ span (β x ) if z ∈ span (β x ) .  As an example of these sequences, let x ∈ ker (ϕ (A)) where ϕ (λ) is a polynomial so by definition, ϕ (A) x = 0. Out of all polynomials η (λ) such that η (A) x = 0, take the{one with smallest degree. } Say η (λ) = am λm +ar−1 λm−1 +· · ·+a1 λ+a0 . Then Am x is in the span of x, Ax, A2 x, · · · , Am−1 x . Could this happen for Ak x for some k < m? If so, there would be a polynomial ψ (λ) having degree smaller than m such that ψ (A) x = 0 which doesn’t happen. Thus you get such a sequence whenever x ∈ ker (ϕ (A)) for some polynomial ϕ (λ). Now here is a nice lemma which will be used in what follows. Lemma 7.1.3 Suppose W is a subspace of V where V is a vector space and L ∈ L (V, V ) and suppose LW = LV. Then V = W + ker (L). Proof: Let v ∈ V . Then there exists wv ∈ W such that Lv = Lwv by assumption. Then L (v − wv ) = 0. Letting zv = v − wv , it follows that zv ∈ ker (L) and v = wv + zv showing that V = W + ker (L).  For more on the next lemma and the following theorem, see Hofman and Kunze [22]. I am following the presentation in Friedberg Insel and Spence [16]. See also Herstein [19] or Problem 11 on Page 124 for a different approach to canonical forms. To help organize the ideas in the lemma, here is a diagram.

7.1. CYCLIC SETS

131

V U ⊆ ker(ϕ(A))

W v1 , ..., vs

β x1 , β x2 , ..., β xp { } Also recall that β x denotes a cyclic set x, Ax, A2 x, · · · , Am−1 x where ( ) Am x ∈ span x, Ax, A2 x, · · · , Am−1 x with m as small as possible. Lemma 7.1.4 Let V be a vector space, A ∈ L (V, V ) , and W an A invariant (AW ⊆ W ) subspace of V. Also let m be a positive integer and ϕ (λ) an irreducible monic polynomial of degree d. Let U be an A invariant subspace of ker (ϕ (A)) and assume ker (ϕ (A)) is finite dimensional. If {v1 , · · · , vs } is a basis for W then if x ∈ U \ W, {v1 , · · · , vs , β x } is linearly independent. (In other words, we know that {v1 , · · · , vs , x} is linearly independent by earlier theorems, but this says that you can include, not just x but the entire A cyclic set beginning with x.) There exist vectors x1 , · · · , xp each in U such that } { v1 , · · · , vs , β x1 , · · · , β xp is a basis for U + W. β x = d and p is uniquely determined by U . Also, if x ∈ ker (ϕ (A)m ) , |β x | = kd where k ≤ m. i Here |β x | is the length of β x , the degree of the monic polynomial η (λ) satisfying η (A) x = 0 with η (λ) having smallest possible degree. Proof: By Lemma 7.1.2, if x ∈ ker ϕ (A) , and |β x | denotes the length of β x , then |β x | = d the degree of the irreducible polynomial ϕ(λ) and so { } ( ) β x = x, Ax, A2 x, · · · , Ad−1 x , Ad x ∈ span x, · · · , Ad−1 x also span (β x ) is A invariant, A (span (β x )) ⊆ span (β x ). Suppose now x ∈ U \ W where U ⊆ ker (ϕ (A)). Consider {v1 , · · · , vs , β x } . Is this set of vectors independent? Suppose s ∑ i=1

ai v i +

d ∑

dj Aj−1 x = 0.

j=1

( ) ∑d ∑s If z ≡ j=1 dj Aj−1 x, then z ∈ W ∩ span x, Ax, · · · , Ad−1 x because z = − i=1 ai vi . Note also that z ∈ ker (ϕ (A)) because x is. Then by Lemma 7.1.2, the intersection just considered is A invariant and so, in particular, for each m ≤ d − 1, ( ) Am z ∈ W ∩ span x, Ax, · · · , Ad−1 x

132

CHAPTER 7. CANONICAL FORMS

Therefore, ( ) span z, Az, · · · , Ad−1 z ⊆

( ) W ∩ span x, Ax, · · · , Ad−1 x ( ) ⊆ span x, Ax, · · · , Ad−1 x (7.2) { } Suppose z = ̸ 0. Then from the Lemma 7.1.2 above, z, Az, · · · , Ad−1 z must be linearly independent. Therefore, ( ( )) ( ( )) d = dim span z, Az, · · · , Ad−1 z ≤ dim W ∩ span x, Ax, · · · , Ad−1 x ( ( )) ≤ dim span x, Ax, · · · , Ad−1 x = d Thus

( ) ( ) W ∩ span x, Ax, · · · , Ad−1 x = span x, Ax, · · · , Ad−1 x ( ) which would require W ⊇ span x, Ax, · · · , Ad−1 x so x ∈ W but this is assumed not to take place. Hence z = 0 and {so the linear independence of the {v1 , · · · , vs } implies each ai = 0. Then the linear } independence of x, Ax, · · · , Ad−1 x , which follows from Lemma 7.1.2, shows each dj = 0. Thus { } v1 , · · · , vs , x, Ax, · · · , Ad−1 x is linearly independent as claimed. Let x ∈ U \W ⊆ ker (ϕ (A)) . Then it was just shown that {v1 , · · · , vs , β x } is linearly independent. Let W1 be given by y ∈ span (v1 , · · · , vs , β x ) ≡ W1 Then W1 is A invariant. If W1 equals U + W, then you are done. If not, let W1 play the role of W and pick x1 ∈ U \ W1 and repeat the argument. Continue till ( ) span v1 , · · · , vs , β x1 , · · · , β xn = U + W m

The process stops because ker (ϕ (A) ) is finite dimensional. m Finally, letting x ∈ ker (ϕ (A) ) , there is a monic polynomial η (λ) such that η (A) x = 0 and η (λ) is of smallest possible degree, which degree equals |β x | . Then m

ϕ (λ)

= η (λ) l (λ) + r (λ)

If deg (r (λ)) < deg (η (λ)) , then r (A) x = 0 and η (λ) was incorrectly chosen. Hence r (λ) = 0 m k and so η (λ) must divide ϕ (λ) . Hence by Corollary 1.13.10 η (λ) = ϕ (λ) where k ≤ m. Thus |β x | = kd = deg (η (λ)).  Definition 7.1.5 space. It has a cyclic basis if there are vectors xi ∈ V such that { Let V be a vector } a basis for V is β x1 , · · · , β xp where β x denotes a cyclic set of vectors as described above. Here is the main result. Theorem 7.1.6 Suppose A ∈ L (V, V ) , V ̸= {0} is a finite dimensional vector space. Suppose ker (ϕ (A)) ̸= {0} where ϕ (λ) is a monic irreducible polynomial. Then for m ∈ N, and{Vˆ an invariant } m subspace of ker (ϕ (A) ) , there exists a cyclic basis for Vˆ which is of the form β = β x , · · · , β x . 1

p

Proof: ( ) It is proved by induction that a cyclic basis exists for an A invariant subspace of k ker ϕ (A) for k = 1, 2, · · · . When this is done, the result follows from letting k = m. First suppose k = 1. Then in Lemma 7.1.4 you can let W = {0} and U be{ an A invariant } subspace of ker (ϕ (A)). Then by this lemma, there exist v1 , v2 , · · · , vs such that β v1 , · · · , β vs is a basis for U. ( In particular, you can let U = ker (ϕ (A)). Suppose then that } for any A invariant subspace U of ) { l

ker ϕ (A)

there is a cyclic basis of the form

β x1 , · · · , β xp

for any l ≤ k, k ≥ 1.

7.1. CYCLIC SETS

133 k+1

k

Consider Vˆ , an A invariant subspace of ker ϕ (A)

ˆ ˆ . Now consider ϕ (A) { V . Then ϕ (A) } ϕ (A) V = 0. It follows from induction that there is a cyclic basis for ϕ (A) Vˆ called β x1 , · · · , β xp , the xi in ϕ (A) Vˆ . ( ) { } Let yj ∈ Vˆ be such that ϕ (A) yj = xj ∈ ϕ (A) Vˆ . Consider β y1 , · · · , β yp , yi ∈ Vˆ . Are these vectors independent? Suppose 0=

β yi | p |∑ ∑

aij Aj−1 yi ≡

i=1 j=1

p ∑

fi (A) yi

(7.3)

i=1

{ } If the sum involved xi in place of yi , then something could be said because β x1 , · · · , β xp is a basis. Do ϕ (A) to both sides to obtain 0=

β yi | p |∑ ∑

aij Aj−1 xi ≡

i=1 j=1

p ∑

fi (A) xi

i=1

{ } ( ) Now fi (A) xi = 0 for each i since fi (A) xi ∈ span β xi and as just mentioned, β x1 , · · · , β xp is a basis. Let η i (λ) be the monic polynomial of smallest degree such that η i (A) xi = 0. Then fi (λ) = η i (λ) l (λ) + r (λ) where r (λ) = 0 or else it has smaller degree than η i (λ) . However, the equation then shows that r (A) xi = 0 which would contradict the choice of η i (λ). Thus r (λ) = 0 and η i (λ) divides fi (λ). k k k Also, ϕ (A) xi = ϕ (A) ϕ (A) yi = 0 and so η i (λ) must divide ϕ (λ) . From Corollary 1.13.10, it r follows that, since ϕ (λ) is irreducible, η i (λ) = ϕ (λ) for some r ≤ k. Thus ϕ (λ) divides η i (λ) which divides fi (λ). Hence fi (λ) = ϕ (λ) gi (λ)! Now 0=

p ∑

fi (A) yi =

i=1

p ∑

gi (A) ϕ (A) yi =

i=1

p ∑

gi (A) xi .

i=1

( ) By the same reasoning just given, since gi (A) xi ∈ span β xi , it follows that each gi (A) xi = 0. Therefore, fi (A) yi = gi (A) ϕ (A) yi = gi (A) xi = 0. Therefore,

β yj

fi (A) yi =



aij Aj−1 yi = 0

j=1

and by independence of the vectors in β yi , this implies aij = 0 for each j for each i. ( ) ( ) Next, it follows from the definition that ϕ (A) span β yk ⊇ span β xk and consequently, letting ( ) W = span β y1 , · · · , β yp , ( ) ( ) span β x1 , · · · , β xp ⊆ ϕ (A) span β y1 , · · · , β yp Now W ⊆ Vˆ because each yi ∈ Vˆ . ) ) ( ( ) ( ϕ (A) Vˆ = span β x1 , · · · , β xp ⊆ ϕ (A) span β y1 , · · · , β yp ≡ ϕ (A) (W ) ⊆ ϕ (A) Vˆ { } It follows from Lemma 7.1.3 that Vˆ = W + x ∈ Vˆ : ϕ (A) x = 0 . From Lemma 7.1.4 W + { } { } x ∈ Vˆ : ϕ (A) x = 0 has a basis of the form β y1 , · · · , β yp , β z1 , · · · , β zs .  It is with respect to cyclic bases as described in this theorem that one constructs the rational canonical form which will depend only on the ϕ (λ) and the number of cycles of a certain length.

134

CHAPTER 7. CANONICAL FORMS

7.2

The Rational Canonical Form

Let V be a finite dimensional vector space and let A ∈ L (V, V ). Let the minimum polynomial be q ∏

mi

ϕi (λ)

i=1

where ϕi (λ) is irreducible. Thus we can consider mi

Vi = ker (ϕi (A)

)

( ) Now consider what happens to the restriction of A to span β xj where β xj = kdi . What is its ( ) matrix with respect to the basis for span β xj ? Using the usual gimmick for finding this matrix, ( Axj

A2 x j

···

Akdi −1 xj

Akdi xj

)

( =

xj

Axj

···

Akdi −2 xj

Akdi −1 xj

) M

( ) k Now if ϕi (λ) = a0 + a1 λ + · · · + akdi −1 λkdi −1 + λkdi , then ( ) Akdi xj = − a0 + a1 A + · · · + akdi −1 Akdi −1 xj therefore, we can determine the matrix M . It is  0  1 0     0 1 0   .. .. .  . 0

of the form −a0 −a1 .. . ..

.

1

−akdi −2 −akdi −1

        

(7.4)

{ }q This is called a companion matrix. Thus, with respect to the bases β xi1 , · · · , β xip , the matrix i i=1 of A is block diagonal with each block itself being block diagonal, the smaller blocks being of the form in 7.4. { }q Conversely, if you have such a matrix, it determines a collection of cyclic bases β xi1 , · · · , β xip . i i=1 Corresponding to one of the blocks above, you would have ( ) ( ) Az1 Az2 · · · Azkd−2 Azkd−1 = z1 z2 · · · zkd−2 zkd−1 M and if M is of the form given above, you would need z2 = Az1 , z3 = Az2 etc. This yields just such a cyclic basis with the last entry on the left Akd z1 which will then be equal to the appropriate linear combination of lower powers times z1 . m If the blocks corresponding to ker (ϕi (A) i ) are ordered to decrease in size from upper left to lower right, the matrix obtained is uniquely determined. This is shown later. This matrix is called the rational canonical form. Theorem 7.2.1 Let A ∈ L (V, V ) where V is a finite dimensional vector space and let its minimum polynomial be q ∏ m ϕi (λ) i i=1

Then there is a block diagonal matrix

   

M1

0 ..

0

. Mq

   

7.3. NILPOTENT TRANSFORMATIONS AND JORDAN CANONICAL FORM mi

where{Mi is the matrix } of A with respect to ker (ϕi (λ) form β x1 , · · · , β xp and it is of the form

135

) taken with respect to a cyclic basis of the

i

   

C1

0 ..

0

.

   

Cmi

where each Cj is a companion matrix as described in 7.4. If we arrange these Cj to be descending in size from upper left to lower right, then the matrix just described is uniquely determined. (This proved later.) Also, the largest block corresponding to Ck is of size mk d × mk d where d is the degree of ϕk (λ). Proof: The proof of uniqueness is given later. As to the last claim, let the cycles corresponding m m to ker (ϕ (A) k ) be β y1 , · · · , β yp . If none of these cycles has length dmk , then in fact, ker (ϕ (A) k ) ( ) l equals ker ϕ (A) for some l < mk which would contradict the fact shown earlier that the minimum m

m

polynomial of A on ker (ϕ (A) k ) is ϕ (λ) k .  Note that there are exactly two things which determine the rational canonical form, the factorization of the minimum polynomial into irreducible factors and the numbers consisting of |β x | for β x a cycle in a cyclic basis of V . Thus, if you can find these two things, you can obtain the rational canonical form. The important thing about this canonical form is that it does not depend on being able to factor the minimum polynomial into linear factors. However, in most cases of interest when the field of scalars is the complex numbers, a factorization of the minimum polynomial exists. When this happens, there is a much more commonly used canonical form called the Jordan canonical form.
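Here is a short sketch, not part of the text, of how a companion matrix of the form 7.4 can be built and checked by machine: ones go down the subdiagonal and the negatives of the coefficients fill the last column, and sympy's characteristic polynomial then recovers the original monic polynomial. The helper name and the sample polynomial are arbitrary choices for this illustration.

import sympy as sp

def companion(coeffs):
    # coeffs = [a0, a1, ..., a_{m-1}] of a monic polynomial; the leading 1 is implicit
    m = len(coeffs)
    C = sp.zeros(m, m)
    for i in range(1, m):
        C[i, i - 1] = 1              # ones down the subdiagonal
    for i in range(m):
        C[i, m - 1] = -coeffs[i]     # last column holds -a0, ..., -a_{m-1}
    return C

lam = sp.symbols('lambda')
C = companion([2, -8, 5])            # lambda^3 + 5*lambda^2 - 8*lambda + 2
print(C.charpoly(lam).as_expr())     # expect lambda**3 + 5*lambda**2 - 8*lambda + 2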

7.3

Nilpotent Transformations and Jordan Canonical Form

Definition 7.3.1 Let V be a vector space over the field of scalars F. Then N ∈ L (V, V ) is called nilpotent if for some m, it follows that N m = 0. ∏p k In general, when you have A ∈ L (V, V ) and the minimum polynomial is i=1 ϕi (λ) i , you can decompose V into a direct sum as V =

p ⊕

( ) k ker ϕi (A) i

i=1

(

k

)

Then on ker ϕi (A) i , it follows by definition that ϕi (A) is nilpotent. The following lemma contains some significant observations about nilpotent transformations. { } Lemma 7.3.2 Suppose N k x ̸= 0 if and only if x, N x, · · · , N k x is linearly independent. Also, the minimum polynomial of N is λm where m is the first such that N m = 0. ∑k Proof: Suppose i=0 ci N i x = 0 where not all ci = 0. There exists l such that k ≤ l < m and N l+1 x = 0 but N l x ̸= 0. Then multiply both sides by N l to conclude that c0 = 0. Next multiply both sides by N l−1 to conclude that c1 = 0 and continue this way to obtain that all the ci = 0. Next consider the claim that λm is the minimum polynomial. If p (λ) is the minimum polynomial, then by the division algorithm, λm = p (λ) l (λ) + r (λ) where the degree of r (λ) is less than that of p (λ) or else r (λ) = 0. The above implies 0 = 0 + r (N ) contrary to p (λ) being minimum. Hence r (λ) = 0 and so p (λ) divides λm . Hence p (λ) = λk for k ≤ m. But if k < m, this would contradict the definition of m as being the smallest such that N m = 0. 

136

CHAPTER 7. CANONICAL FORMS

Note how this lemma implies that if N k x is a linear combination of the preceeding vectors for k as small as possible, then (N k x = )0. k Now suppose V = ker ϕ (A) where ϕ (λ) is irreducible and the minimum polynomial for A on k

V ϕ (A)}, we can consider a cyclic basis for V of the form { is ϕ (λ) as}in the above. {Letting B = m−1 β x1 , · · · , β xs where β x ≡ x, Bx, · · · , B x . From Lemma 7.3.2, B m x = 0. So what is the matrix of B with respect to this basis? It is block diagonal, the blocks coming from the individual β xi , the size of the blocks being β xi × β xi , i = 1, · · · , s.   C1   ..   .   Cp Using the useful gimmick and ordering the basis using decreasing powers,1 one of these blocks is of the form Ck where Ck is the matrix on the right ( ) 0 B m−1 x · · · Bx   0 1 0    ( )  0 0 ...   = B m−1 x B m−2 x · · · x  .  ..  .  . 1   . 0 0 ··· 0 That is, Ck is the |β x | × |β x | matrix which has ones down the super diagonal and zeros elsewhere. Note that the size of the blocks is determined by |β x | just as in the above rational canonical form. In fact, if you ordered the basis in the opposite way, one of these blocks would just be the companion matrix for B just as in the rational canonical form. This is because of Lemma 7.3.2 which requires B m x = 0 and so the linear combination of B m x in terms of B k x for k < m has all zero coefficients. You would have   0 0 ··· 0    1 0 . . . ...      . ..   . .. . . 0   . 0 0 1 0 because the right column in 7.4 is all zero. The following convenient notation will be used. Definition 7.3.3 Jk (α) is a Jordan block if it is a k × k matrix of the form   α 1 ··· 0    0 . . . . . . ...    Jk (α) =  .  ..  . ..  . . 1   . 0 ··· 0 α In words, there is an unbroken string of ones down the super diagonal and the number α filling every space on the main diagonal with zeros everywhere else. Then with this definition and the above discussion, the following proposition has been proved. 1I

don’t know why we use the reverse order but it seems to be traditional to do it this way. I conjecture that it has something to do with the procedure used in ODE for finding solutions to first order systems when you look for generalized eigenvectors. Yes, you are finding a basis for the Jordan form when you do this. Herstein does not reverse the order in his book.

7.3. NILPOTENT TRANSFORMATIONS AND JORDAN CANONICAL FORM

137

Proposition 7.3.4 Let A ∈ L (V, V ) and let the minimal polylnomial of A be p (λ) ≡

p ∏

k

ϕi (λ) i , V =

i=1

p ⊕

( ) k ker ϕi (A) i

i=1

where the ϕi (λ) are irreducible. Also let Bi ≡ ϕi (A). Then Bi is nilpotent and ) ( k Biki = 0 on Vi = ker ϕi (A) i Letting the dimension of Vi be di , there exists a basis for Vi such that the matrix of Bi with respect to this basis is of the form   Jr1 (0) 0   Jr2 (0)    J = (7.5) ..   .   0 Jrs (0) ∑s where r1 ≥ r2 ≥ · · · ≥ rs ≥ 1 and i=1 ri = di . In the above, the Jrj (0) is called a Jordan block of size rj × rj with 0 down the main diagonal. r

r−1

Observation 7.3.5 Observe that Jr (0) = 0 but Jr (0)

̸= 0.

In fact, the matrix of the above proposition is unique. This is a general fact for a nilpotent matrix N . Corollary 7.3.6 Let J, J ′ both be matrices of the nilpotent linear transformation N ∈ L (W, W ) which are of the form described in Proposition 7.3.4. Then J = J ′ . In fact, if the rank of J k equals the rank of J ′k for all nonnegative integers k, then J = J ′ . Proof: Since J and J ′ are similar, it follows that for each k an integer, J k and J ′k are similar. Hence, for each k, these matrices have the same rank. Now suppose J ̸= J ′ . Note first that r

r−1

Jr (0) = 0, Jr (0)

̸= 0.

Denote the blocks of J as Jrk (0) and the blocks of J ′ as Jrk′ (0). Let k be the first such that Jrk (0) ̸= Jrk′ (0). Suppose that rk > rk′ . By block multiplication and the above observation, it follows that the two matrices J rk −1 and J ′rk −1 are respectively of the forms     Mr1′ 0 M r1 0     .. ..     . .             Mrk′ Mr k   ,      0 ∗         .. ..     . .     0 0 0 ∗ where Mrj = Mrj′ for j ≤ k − 1 but Mrk′ is a zero rk′ × rk′ matrix while Mrk is a larger matrix which is not equal to 0. For example, Mrk could look like   0 ··· 1  .  .. M rk =  . ..    0 0 Thus this contradicts the requirement that J k and J ′k have the same rank. 

138

CHAPTER 7. CANONICAL FORMS

The Jordan canonical form is available when the minimum polynomial can be factored in the field of scalars. Thus r ∏ m p (λ) = (λ − µk ) k k=1

and V =

r ⊕

ker (A − µi I)

mi



i=1

r ⊕

Vi

i=1

Now here is a useful observation. Observation 7.3.7 If W is a vector space and L ∈ L (W, W ) is given by Lw = µw, then for any basis for W, the matrix of L with respect to this basis is   µ 0   ..   .   0 µ To see this, note that  ( µv1

···

) µvn

( =

···

v1

vn

)  

µ

0 ..

0

.

   

µ

∏r m Definition 7.3.8 When the minimum polynomial for A is i=1 (λ − µi ) i , the Jordan canonical form of A is the block diagonal matrix of A obtained from using the cyclic basis for (A − µi I) on m Vi ≡ ker (A − µi I) i . Where r ⊕ Vi V = i=1

It is unique up to order of blocks and is a block diagonal matrix with each block being block diagonal. So what is the matrix description of the Jordan canonical form? From Proposition 7.3.4, the matrix of A − µi I on Vi having dimension di is of the form   Jr1 (0) 0   Jr2 (0)     ..   .   0 Jrs (0) where r1 + · · · + rs = di and we can assume the size of the Jordan blocks decreases from upper left to lower right. It follows then from the above observation that the matrix of A = (A − µi Idi ) + µi Idi is   Jr1 (µi ) 0   Jr2 (µi )    J (µi ) ≡  ..   .   0 Jrs (µi ) Then the Jordan form of the matrix A is  J (µ1 )    0

0 ..

. J (µr )

   

7.3. NILPOTENT TRANSFORMATIONS AND JORDAN CANONICAL FORM

139 k

Each J(µ_i) has a size which is uniquely determined by the dimension of ker((A − µ_i I)^{m_i}), which comes from the minimum polynomial. Furthermore, each J(µ_i) is a block diagonal matrix in which the blocks have the above specified form. Note that if any of the β_k consists of eigenvectors, then the corresponding Jordan block will consist of a diagonal matrix having λ_k down the main diagonal. This corresponds to m_k = 1. The vectors which are in ker((A − λ_k I)^{m_k}) but not in ker(A − λ_k I) are called generalized eigenvectors.

The following is the main result on the Jordan canonical form.

Theorem 7.3.9 Let V be an n dimensional vector space with field of scalars C, or some other field such that the minimum polynomial of A ∈ L(V, V) completely factors into powers of linear factors. Then there exists a unique Jordan canonical form for A, where uniqueness is in the sense that any two have the same number and size of Jordan blocks.

Proof: Suppose there are two, J and J′. Then these are matrices of A with respect to possibly different bases and so they are similar. Therefore, they have the same minimum polynomials and the generalized eigenspaces ker((A − µ_i I)^{m_i}) have the same dimension. Thus the sizes of the matrices J(λ_k) and J′(λ_k), defined by the dimension of these generalized eigenspaces, also corresponding to the algebraic multiplicity of λ_k, must be the same. Therefore, they comprise the same set of positive integers. Thus, listing the eigenvalues in the same order, corresponding blocks J(λ_k), J′(λ_k) are the same size.

It remains to show that J(λ_k) and J′(λ_k) are not just the same size but also are the same up to order of the Jordan blocks running down their respective diagonals. It is only necessary to worry about the number and size of the Jordan blocks making up J(λ_k) and J′(λ_k). Since J, J′ are similar, so are J − λ_k I and J′ − λ_k I. Thus the following two matrices are similar:

A ≡ diag( J(λ_1) − λ_k I, …, J(λ_k) − λ_k I, …, J(λ_r) − λ_k I )

B ≡ diag( J′(λ_1) − λ_k I, …, J′(λ_k) − λ_k I, …, J′(λ_r) − λ_k I )

and consequently, rank(A^m) = rank(B^m) for all m ∈ N. Also, both J(λ_j) − λ_k I and J′(λ_j) − λ_k I are one to one for every λ_j ≠ λ_k. Since all the blocks in both of these matrices are one to one except the blocks J(λ_k) − λ_k I and J′(λ_k) − λ_k I, it follows that the two sequences of numbers {rank((J(λ_k) − λ_k I)^m)}_{m=1}^{∞} and {rank((J′(λ_k) − λ_k I)^m)}_{m=1}^{∞} must be the same. Then

J(λ_k) − λ_k I ≡ diag( J_{k_1}(0), J_{k_2}(0), …, J_{k_r}(0) )


and a similar formula holds for J′(λ_k):

J′(λ_k) − λ_k I ≡ diag( J_{l_1}(0), J_{l_2}(0), …, J_{l_p}(0) )

and it is required to verify that p = r and that the same blocks occur in both. Without loss of generality, let the blocks be arranged according to size, with the largest in the upper left corner falling to the smallest in the lower right. Now the desired conclusion follows from Corollary 7.3.6. ∎

Note that if any of the generalized eigenspaces ker((A − µ_k I)^{m_k}) has a basis of eigenvectors, then it would be possible to use this basis and obtain a diagonal matrix in the block corresponding to µ_k. By uniqueness, this is the block corresponding to the eigenvalue µ_k. Thus when this happens, the block in the Jordan canonical form corresponding to µ_k is just the diagonal matrix having µ_k down the diagonal, and there are no generalized eigenvectors as fussed over in ordinary differential equations. Recall that these were vectors in ker((A − µ_k I)^{m_k}) but not in ker(A − µ_k I).

The Jordan canonical form is very significant when you try to understand powers of a matrix. There exists an n × n matrix S (the S here is written as S^{−1} in the corollary) such that A = S^{−1} J S. Therefore, A^2 = S^{−1} J S S^{−1} J S = S^{−1} J^2 S and, continuing this way, it follows that A^k = S^{−1} J^k S, where J is given in the above corollary. Consider J^k. By block multiplication,

J^k = diag( J_1^k, …, J_r^k )

The matrix J_s is an m_s × m_s matrix which is of the form

J_s = D + N    (7.6)

for D a multiple of the identity and N an upper triangular matrix with zeros down the main diagonal. Thus N^{m_s} = 0. Now since D is just a multiple of the identity, it follows that DN = ND. Therefore, the usual binomial theorem may be applied, and this yields the following equations for k ≥ m_s:

J_s^k = (D + N)^k = ∑_{j=0}^{k} \binom{k}{j} D^{k−j} N^j = ∑_{j=0}^{m_s} \binom{k}{j} D^{k−j} N^j,

the last equality holding because N^{m_s} = 0. Thus J_s^k is of the form

J_s^k =
[ α^k  ⋯   ∗  ]
[  ⋮   ⋱   ⋮  ]
[  0   ⋯  α^k ]    (7.7)

an upper triangular matrix with α^k in each diagonal position, where α is the diagonal entry of J_s.


Lemma 7.3.10 Suppose J is of the form J_s described above in 7.6, where the constant α on the main diagonal satisfies |α| < 1. Then

lim_{k→∞} (J^k)_{ij} = 0.

Proof: From 7.7, it follows that for large k and j ≤ m_s,

\binom{k}{j} ≤ k(k − 1)⋯(k − m_s + 1) / m_s!

Therefore, letting C be the largest value of |(N^j)_{pq}| for 0 ≤ j ≤ m_s,

|(J^k)_{pq}| ≤ m_s C ( k(k − 1)⋯(k − m_s + 1) / m_s! ) |α|^{k−m_s}

which converges to zero as k → ∞. This is most easily seen by applying the ratio test to the series

∑_{k=m_s}^{∞} ( k(k − 1)⋯(k − m_s + 1) / m_s! ) |α|^{k−m_s}

and then noting that if a series converges, then its k-th term converges to zero. ∎
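A quick numerical illustration of Lemma 7.3.10 (a sketch, not part of the text): take a single Jordan block with α = 0.9 on the diagonal and watch the largest entry of its powers go to zero, in line with the bound m_s C (k(k−1)⋯(k−m_s+1)/m_s!) |α|^{k−m_s} above.

```python
# Sketch (not from the text): powers of a Jordan block J_s = D + N with |alpha| < 1
# have entries tending to zero, as asserted in Lemma 7.3.10.
import numpy as np

alpha = 0.9
Js = np.array([[alpha, 1.0, 0.0],
               [0.0, alpha, 1.0],
               [0.0, 0.0, alpha]])

for k in [10, 50, 100, 200]:
    Pk = np.linalg.matrix_power(Js, k)
    print(k, np.abs(Pk).max())
# For this 3 x 3 block the largest entry behaves roughly like (k^2 / 2) * alpha**(k - 2),
# which converges to zero as k grows.
```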

7.4 Exercises

1. In the discussion of nilpotent transformations, it was asserted that if two n × n matrices A, B are similar, then A^k is also similar to B^k. Why is this so? If two matrices are similar, why must they have the same rank?

2. If A, B are both invertible, then they are both row equivalent to the identity matrix. Are they necessarily similar? Explain.

3. Suppose you have two nilpotent matrices A, B and A^k and B^k both have the same rank for all k ≥ 1. Does it follow that A, B are similar? What if it is not known that A, B are nilpotent? Does it follow then?

4. (Review problem.) When we say a polynomial equals zero, we mean that all the coefficients equal 0. If we assign a different meaning to it which says that a polynomial p(λ) equals zero when it is the zero function (p(λ) = 0 for every λ ∈ F), does this amount to the same thing? Is there any difference in the two definitions for ordinary fields like Q? Hint: Consider for the field of scalars Z_2, the integers mod 2, and consider p(λ) = λ^2 + λ.

5. Let A ∈ L(V, V) where V is a finite dimensional vector space with field of scalars F. Let p(λ) be the minimum polynomial and suppose ϕ(λ) is any nonzero polynomial such that ϕ(A) is not one to one and ϕ(λ) has smallest possible degree such that ϕ(A) is nonzero and not one to one. Show ϕ(λ) must divide p(λ).

6. Let A ∈ L(V, V) where V is a finite dimensional vector space with field of scalars F. Let p(λ) be the minimum polynomial and suppose ϕ(λ) is an irreducible polynomial with the property that ϕ(A)x = 0 for some specific x ≠ 0. Show that ϕ(λ) must divide p(λ). Hint: First write p(λ) = ϕ(λ)g(λ) + r(λ) where r(λ) is either 0 or has degree smaller than the degree of ϕ(λ). If r(λ) = 0 you are done. Suppose it is not 0. Let η(λ) be the monic polynomial of smallest degree with the property that η(A)x = 0. Now use the Euclidean algorithm to divide ϕ(λ) by η(λ). Contradict the irreducibility of ϕ(λ).


7. Let

A =
[ 1  0   0 ]
[ 0  0  −1 ]
[ 0  1   0 ]

Find the minimum polynomial for A.

8. Suppose A is an n × n matrix and let v be a vector. Consider the A cyclic set of vectors {v, Av, ⋯, A^{m−1}v} where this is an independent set of vectors but A^m v is a linear combination of the preceding vectors in the list. Show how to obtain a monic polynomial ϕ_v(λ) of smallest degree m such that ϕ_v(A)v = 0. Now let {w_1, ⋯, w_n} be a basis and let ϕ(λ) be the least common multiple of the ϕ_{w_k}(λ). Explain why this must be the minimum polynomial of A. Give a reasonably easy algorithm for computing ϕ_v(λ).

9. Here is a matrix.

[ −7   −1   −1 ]
[ −21  −3   −3 ]
[ 70   10   10 ]

Using the process of Problem 8, find the minimum polynomial of this matrix. Determine whether it can be diagonalized from its minimum polynomial.

10. Let A be an n × n matrix with field of scalars C or, more generally, such that the minimum polynomial splits. Letting λ be an eigenvalue, show the dimension of the eigenspace equals the number of Jordan blocks in the Jordan canonical form which are associated with λ. Recall the eigenspace is ker(λI − A).

11. For any n × n matrix, why is the dimension of the eigenspace always less than or equal to the algebraic multiplicity of the eigenvalue as a root of the characteristic equation? Hint: Note the algebraic multiplicity is the size of the appropriate block in the Jordan form.

12. Give an example of two nilpotent matrices which are not similar but have the same minimum polynomial, if possible.

13. Here is a matrix. Find its Jordan canonical form by directly finding the eigenvectors and generalized eigenvectors based on these to find a basis which will yield the Jordan form. The eigenvalues are 1 and 2.

[ −3  −2  5  3 ]
[ −1   0  1  2 ]
[ −4  −3  6  4 ]
[ −1  −1  1  3 ]

Why is it typically impossible to find the Jordan canonical form?

14. Let A be an n × n matrix and let J be its Jordan canonical form. Here F = R or C. Recall J is a block diagonal matrix having blocks J_k(λ) down the diagonal. Each of these blocks is of the form

J_k(λ) =
[ λ  1        0 ]
[    λ  ⋱       ]
[        ⋱   1  ]
[ 0          λ  ]


Now for ε > 0 given, let the diagonal matrix D_ε be given by

D_ε = diag( 1, ε, …, ε^{k−1} )

Show that D_ε^{−1} J_k(λ) D_ε has the same form as J_k(λ) but instead of ones down the super diagonal, there is ε down the super diagonal. That is, J_k(λ) is replaced with

[ λ  ε        0 ]
[    λ  ⋱       ]
[        ⋱   ε  ]
[ 0          λ  ]

Now show that for A an n × n matrix, it is similar to one which is just like the Jordan canonical form except that instead of the blocks having 1 down the super diagonal, they have ε.

15. Let A be in L(V, V) and suppose that A^p x ≠ 0 for some x ≠ 0. Show that A^p e_k ≠ 0 for some e_k ∈ {e_1, ⋯, e_n}, a basis for V. If you have a matrix which is nilpotent (A^m = 0 for some m), will it always be possible to find its Jordan form? Describe how to do it if this is the case. Hint: First explain why all the eigenvalues are 0. Then consider the way the Jordan form for nilpotent transformations was constructed in the above.

16. Show that if two n × n matrices A, B are similar, then they have the same minimum polynomial, and also that if this minimum polynomial is of the form p(λ) = ∏_{i=1}^{s} ϕ_i(λ)^{r_i} where the ϕ_i(λ) are irreducible and monic, then ker(ϕ_i(A)^{r_i}) and ker(ϕ_i(B)^{r_i}) have the same dimension. Why is this so? This was what was responsible for the blocks corresponding to an eigenvalue being of the same size.

17. In Theorem 7.1.6 show that each cyclic set β_x is associated with a monic polynomial η_x(λ) such that η_x(A)(x) = 0 and this polynomial has smallest possible degree such that this happens. Show that the cyclic sets β_{x_i} can be arranged such that η_{x_{i+1}}(λ) / η_{x_i}(λ).

18. Show that if A is a complex n × n matrix, then A and A^T are similar. Hint: Consider a Jordan block. Note that

[ 0 0 1 ] [ λ 1 0 ] [ 0 0 1 ]   [ λ 0 0 ]
[ 0 1 0 ] [ 0 λ 1 ] [ 0 1 0 ] = [ 1 λ 0 ]
[ 1 0 0 ] [ 0 0 λ ] [ 1 0 0 ]   [ 0 1 λ ]

19. (Extra important) Let A be an n × n matrix. The trace of A, trace(A), is defined as ∑_i A_{ii}. It is just the sum of the entries on the main diagonal. Show trace(A) = trace(A^T). Suppose A is m × n and B is n × m. Show that trace(AB) = trace(BA). Now show that if A and B are similar n × n matrices, then trace(A) = trace(B). Recall that A is similar to B means A = S^{−1}BS for some matrix S.

20. (Extra important) If A is an n × n matrix and the minimum polynomial splits in F, the field of scalars, show that trace(A) equals the sum of the eigenvalues listed according to the number of times they occur in the Jordan form. Next, show that this is true even if the minimum polynomial does not split.

21. Let A be a linear transformation defined on a finite dimensional vector space V. Let the minimum polynomial be ∏_{i=1}^{q} ϕ_i(λ)^{m_i} and let β^i_{v^i_1}, ⋯, β^i_{v^i_{r_i}} be the cyclic sets such that {β^i_{v^i_1}, ⋯, β^i_{v^i_{r_i}}} is a basis for ker(ϕ_i(A)^{m_i}). Let v = ∑_i ∑_j v^i_j. Now let q(λ) be any polynomial and suppose that q(A)v = 0. Show that it follows that q(A) = 0. Hint: First consider the special case where a basis for V is {x, Ax, ⋯, A^{n−1}x} and q(A)x = 0.

7.5 Companion Matrices and Uniqueness

Recall the concept of a companion matrix.

Definition 7.5.1 Let q(λ) = a_0 + a_1 λ + ⋯ + a_{n−1} λ^{n−1} + λ^n be a monic polynomial. The companion matrix of q(λ), denoted as C(q(λ)), is the matrix

[ 0        ⋯   0   −a_0     ]
[ 1   ⋱             −a_1    ]
[     ⋱   ⋱          ⋮      ]
[ 0        1        −a_{n−1} ]

Proposition 7.5.2 Let q(λ) be a monic polynomial and let C(q(λ)) be its companion matrix. Then q(C(q(λ))) = 0. In fact, q(λ) is the minimum polynomial for C(q(λ)).

Proof: Write C instead of C(q(λ)) for short. Note that

Ce_1 = e_2, Ce_2 = e_3, ⋯, Ce_{n−1} = e_n    (7.8)

Thus e_k = C^{k−1} e_1, k = 1, ⋯, n, and so it follows that

{ e_1, Ce_1, C^2 e_1, ⋯, C^{n−1} e_1 }    (7.9)

are linearly independent. Hence these form a basis for F^n. Now note that Ce_n is given by

Ce_n = −a_0 e_1 − a_1 e_2 − ⋯ − a_{n−1} e_n

and from 7.8 this implies

C^n e_1 = −a_0 e_1 − a_1 Ce_1 − ⋯ − a_{n−1} C^{n−1} e_1

and so q(C)e_1 = 0. Indeed,

q(C)e_1 = ( C^n + a_{n−1} C^{n−1} + ⋯ + a_1 C + a_0 I ) e_1
        = −a_0 e_1 − a_1 Ce_1 − ⋯ − a_{n−1} C^{n−1} e_1 + a_{n−1} C^{n−1} e_1 + ⋯ + a_1 Ce_1 + a_0 e_1 = 0

Now since 7.9 is a basis, every vector of F^n is of the form k(C)e_1 for some polynomial k(λ). Therefore, if v ∈ F^n,

q(C)v = q(C)k(C)e_1 = k(C)q(C)e_1 = 0

which shows q(C) = 0. It remains to show that q(λ) is the minimum polynomial for C. Suppose then that k(C) = 0 and

k(λ) = b_0 + b_1 λ + ⋯ + b_{p−1} λ^{p−1} + λ^p


Then if p < n,

0 = k(C)e_1 = b_0 e_1 + b_1 Ce_1 + ⋯ + b_{p−1} C^{p−1} e_1 + C^p e_1 = b_0 e_1 + b_1 e_2 + ⋯ + b_{p−1} e_p + e_{p+1}

which is impossible if p < n. Therefore, if p(λ) is the minimum polynomial, it has degree at least as large as n. However, p(λ) divides q(λ) because q(C) = 0. Write

p(λ) = q(λ)l(λ) + r(λ)

where r(λ) = 0 or it has degree less than n = deg(q(λ)). The above shows that r(C) = 0, and this would contradict p(λ) being the minimum polynomial unless r(λ) = 0. Thus q(λ) divides p(λ). Also, q(λ) = p(λ)l(λ) + r̂(λ) where r̂(λ) has smaller degree than p(λ) or else is 0. Again, if it is nonzero, you would have a contradiction to p(λ) being the minimum polynomial. Thus r̂(λ) = 0 and so p(λ) divides q(λ). These are both monic polynomials and so they must be the same. ∎

Now recall the rational canonical form. You have A ∈ L(V, V) where V is a finite dimensional vector space. The minimum polynomial is

p(λ) = ∏_{i=1}^{r} ϕ_i(λ)^{k_i}

where ϕ_i(λ) is a monic irreducible polynomial of degree d. Then for V_i ≡ ker(ϕ_i(A)^{k_i}), you have

V = ⊕_{i=1}^{r} V_i

where each V_i is an A invariant subspace of V and the minimum polynomial of A restricted to V_i is ϕ_i(λ)^{k_i}. Then to get the rational canonical form, you consider a cyclic basis for each V_i and obtain the matrix as a block diagonal matrix of block diagonal matrices. The large blocks pertain to the individual V_i, and the question of uniqueness reduces to consideration of the restriction of A to V_i. Thus the problem reduces to A ∈ L(V, V) whose minimum polynomial is ϕ(λ)^k, where ϕ(λ) is monic and irreducible. Do any two rational canonical forms for this situation have the same blocks?

Recall that the blocks are obtained from cycles of length md which are of the form

( x  Ax  ⋯  A^{md−1}x )

where A^{md}x = −( a_0 x + a_1 Ax + ⋯ + a_{md−1} A^{md−1}x ) and ϕ(λ)^m = a_0 + a_1 λ + ⋯ + a_{md−1} λ^{md−1} + λ^{md}. Then the block obtained is C(ϕ(λ)^m), obtained as follows:

( Ax  A^2x  ⋯  A^{md}x ) = ( x  Ax  ⋯  A^{md−1}x ) C(ϕ(λ)^m)

where

C(ϕ(λ)^m) =
[ 0        ⋯   0   −a_0       ]
[ 1   ⋱              ⋮        ]
[     ⋱   ⋱        −a_{md−2}  ]
[ 0        1       −a_{md−1}  ]


Suppose then that you have two rational canonical forms coming from the same A:

M ≡ diag( C(ϕ(λ)^{m_1}), C(ϕ(λ)^{m_2}), …, C(ϕ(λ)^{m_r}) ),

N ≡ diag( C(ϕ(λ)^{n_1}), C(ϕ(λ)^{n_2}), …, C(ϕ(λ)^{n_s}) )

where in each case the sizes of the blocks are decreasing from upper left to lower right. Why is r = s and m_i = n_i? Since these come from the same linear transformation with possibly different bases, this is the same as saying that they are similar. Furthermore, if f(λ) is a polynomial, then f(M) must also be similar to f(N). By block multiplication this would say that

f(M) ≡ diag( f(C(ϕ(λ)^{m_1})), f(C(ϕ(λ)^{m_2})), …, f(C(ϕ(λ)^{m_r})) ),

f(N) ≡ diag( f(C(ϕ(λ)^{n_1})), f(C(ϕ(λ)^{n_2})), …, f(C(ϕ(λ)^{n_s})) )

are similar. Now we pick f auspiciously. Suppose the blocks are the same size for the first k − 1 blocks starting at the top and going toward the bottom, and that there is a discrepancy in the k-th position. Then the first k − 1 blocks must be identical before the discrepancy. Say C(ϕ(λ)^{n_k}) is smaller than C(ϕ(λ)^{m_k}). Then pick f(λ) ≡ ϕ(λ)^{n_k}. By Proposition 7.5.2, f(C(ϕ(λ)^{n_k})) = 0. However, it cannot be the case that f(C(ϕ(λ)^{m_k})) = 0, because by the above proposition the minimum polynomial of C(ϕ(λ)^{m_k}) is ϕ(λ)^{m_k}, which has larger degree. Also, for n_i < n_k, f(C(ϕ(λ)^{n_i})) = 0 because ϕ(λ)^{n_i} divides ϕ(λ)^{n_k}. Thus

f(N) = diag( B_1, …, B_{k−1}, 0, …, 0 ),   f(M) = diag( B_1, …, B_{k−1}, B_k, ∗, …, ∗ )

These cannot possibly be similar since they do not even have the same rank, and so there can be no discrepancies in the sizes of the blocks. Hence they are all identical. This proves the following theorem.

Theorem 7.5.3 Let A ∈ L(V, V). Then its rational canonical form is uniquely determined up to the order of the blocks. In particular, the numbers |β_x| occurring in the cyclic basis are determined.

In the case where two n × n matrices M, N are similar, recall this is equivalent to the two being matrices of the same linear transformation taken with respect to two different bases. Hence each is similar to the same rational canonical form.
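As a concrete check of Proposition 7.5.2, the sketch below (not from the text) builds the companion matrix of a monic polynomial with the convention of Definition 7.5.1 and verifies that the polynomial annihilates it; the polynomial λ^3 − λ^2 + λ − 1 appearing in Exercise 5 of Section 7.6 is used as a test case.

```python
# Sketch (not from the text): companion matrix of a monic polynomial and the check q(C) = 0.
from sympy import zeros, eye, symbols, Poly

lam = symbols('lam')

def companion(q):
    """Companion matrix of a monic polynomial q(lam), with the convention of
    Definition 7.5.1: ones on the subdiagonal, last column -a_0, ..., -a_{n-1}."""
    coeffs = Poly(q, lam).all_coeffs()    # [1, a_{n-1}, ..., a_1, a_0]
    n = len(coeffs) - 1
    C = zeros(n, n)
    for i in range(1, n):
        C[i, i - 1] = 1                   # subdiagonal of ones
    for i in range(n):
        C[i, n - 1] = -coeffs[n - i]      # last column holds -a_0, -a_1, ..., -a_{n-1}
    return C

q = lam**3 - lam**2 + lam - 1             # polynomial from Exercise 5 of Section 7.6
C = companion(q)
print(C)                                  # Matrix([[0, 0, 1], [1, 0, -1], [0, 1, 1]])
print(C**3 - C**2 + C - eye(3))           # the zero matrix, so q(C) = 0
```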


Example 7.5.4 Here is a matrix.

A =
[ 5  −2   1 ]
[ 2  10  −2 ]
[ 9   0   9 ]

Find a similarity transformation which will produce the rational canonical form for A.

The minimum polynomial is λ^3 − 24λ^2 + 180λ − 432. This factors as

(λ − 6)^2 (λ − 12)

Thus Q^3 is the direct sum of ker((A − 6I)^2) and ker(A − 12I). Consider the first of these. You see easily that it is

y (1, 1, 0)^T + z (−1, 0, 1)^T,  y, z ∈ Q.

What about the length of A cyclic sets? It turns out it doesn't matter much. You can start with either of these and get a cycle of length 2. Let's pick the second one. This leads to the cycle

(−1, 0, 1)^T,  (−4, −4, 0)^T = A(−1, 0, 1)^T,  (−12, −48, −36)^T = A^2(−1, 0, 1)^T

where the last of the three is a linear combination of the first two. Take the first two as the first two columns of S. To get the third, you need a cycle of length 1 corresponding to ker(A − 12I). This yields the eigenvector (1, −2, 3)^T. Thus

S =
[ −1  −4   1 ]
[  0  −4  −2 ]
[  1   0   3 ]

Now using Proposition 5.2.9, the rational canonical form for A should be

S^{−1} A S =
[ 0  −36   0 ]
[ 1   12   0 ]
[ 0    0  12 ]

Note that (λ − 6)^2 = λ^2 − 12λ + 36, and so the top left block is indeed the companion matrix of this polynomial. Of course in this case, we could have obtained the Jordan canonical form.

Example 7.5.5 Here is a matrix.

A =
[ 12  −3  −19  −14   8 ]
[ −4   1    1    6  −4 ]
[  4   5    5   −2   4 ]
[  0  −5   −5    2   0 ]
[ −4   3   11    6   0 ]

Find a basis such that if S is the matrix which has these vectors as columns, then S^{−1}AS is in rational canonical form, assuming the field of scalars is Q.

The minimum polynomial is

λ^3 − 12λ^2 + 64λ − 128

This polynomial factors as

(λ − 4)(λ^2 − 8λ + 32) ≡ ϕ_1(λ) ϕ_2(λ)

where the second factor is irreducible over Q. Consider ϕ_2(λ) first. Messy computations yield

ker(ϕ_2(A)) = a (−1, 1, 0, 0, 0)^T + b (−1, 0, 1, 0, 0)^T + c (−1, 0, 0, 1, 0)^T + d (−2, 0, 0, 0, 1)^T.

Now start with one of these basis vectors and look for an A cycle. Picking the first one, you obtain the cycle

(−1, 1, 0, 0, 0)^T,  (−15, 5, 1, −5, 7)^T

because the next vector, involving A^2, yields a vector which is in the span of the above two. You check this by making the vectors the columns of a matrix and finding the row reduced echelon form. Clearly this cycle does not span ker(ϕ_2(A)), so look for another cycle. Begin with a vector which is not in the span of these two. The last one works well. Thus another A cycle is

(−2, 0, 0, 0, 1)^T,  (−16, 4, −4, 0, 8)^T

It follows that a basis for ker(ϕ_2(A)) consists of these two cycles strung together:

(−2, 0, 0, 0, 1)^T,  (−16, 4, −4, 0, 8)^T,  (−1, 1, 0, 0, 0)^T,  (−15, 5, 1, −5, 7)^T

Finally consider a cycle coming from ker(ϕ_1(A)). This amounts to nothing more than finding an eigenvector for A corresponding to the eigenvalue 4. An eigenvector is (−1, 0, 0, 0, 1)^T. Now the desired matrix for the similarity transformation is

S ≡
[ −2  −16  −1  −15  −1 ]
[  0    4   1    5   0 ]
[  0   −4   0    1   0 ]
[  0    0   0   −5   0 ]
[  1    8   0    7   1 ]


Then doing the computations, you get

S^{−1} A S =
[ 0  −32  0    0  0 ]
[ 1    8  0    0  0 ]
[ 0    0  0  −32  0 ]
[ 0    0  1    8  0 ]
[ 0    0  0    0  4 ]

and you see this is in rational canonical form, the two 2 × 2 blocks being companion matrices for the polynomial λ2 − 8λ + 32 and the 1 × 1 block being a companion matrix for λ − 4. Note that you could have written this without finding a similarity transformation to produce it. This follows from the above theory which gave the existence of the rational canonical form. Obviously there is a lot more which could be considered about rational canonical forms. Just begin with a strange field and start investigating what can be said. One can also derive more systematic methods for finding the rational canonical form. The advantage of this is you don’t need to find the eigenvalues in order to compute the rational canonical form and it can often be computed for this reason, unlike the Jordan form. The uniqueness of this rational canonical form can be used to determine whether two matrices consisting of entries in some field are similar, provided you are able to factor the minimum polynomial into the product of irreducible factors.
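The remark that the rational canonical form can be computed without ever finding the eigenvalues can be made concrete. The sketch below (not from the text) implements the idea of Problem 8 of Section 7.4 over Q: scan I, A, A^2, ⋯ for the first linear dependence using exact rational arithmetic and read off the minimum polynomial; it is checked on the matrix of Example 7.5.4.

```python
# Sketch (not from the text): minimum polynomial over Q by finding the first linear
# dependence among I, A, A^2, ..., with no eigenvalue computation.
from sympy import Matrix, symbols, expand, factor

lam = symbols('lam')

def minimum_polynomial(A):
    """Monic minimum polynomial of a square matrix A, using exact arithmetic."""
    n = A.shape[0]
    powers = [Matrix.eye(n)]
    for k in range(1, n + 1):
        powers.append(powers[-1] * A)
        # Stack the vectorized powers I, A, ..., A^k as columns and look for a dependence.
        M = Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        null = M.nullspace()
        if null:
            c = null[0] / null[0][-1]            # normalize so the top power is monic
            return expand(sum(c[j] * lam**j for j in range(len(c))))

A = Matrix([[5, -2, 1], [2, 10, -2], [9, 0, 9]])  # the matrix of Example 7.5.4
p = minimum_polynomial(A)
print(p)            # lam**3 - 24*lam**2 + 180*lam - 432
print(factor(p))    # (lam - 6)**2 * (lam - 12), up to ordering of the factors
```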

7.6 Exercises

1. Find the minimum polynomial for

A =
[  1  2  3 ]
[  2  1  4 ]
[ −3  2  1 ]

assuming the field of scalars is the rational numbers.

2. Show, using the rational root theorem, that the minimum polynomial for A in the above problem is irreducible with respect to Q. Letting the field of scalars be Q, find the rational canonical form and a similarity transformation which will produce it.

3. Letting the field of scalars be Q, find the rational canonical form for the matrix

[ 1  2  1  −1 ]
[ 2  3  0   2 ]
[ 1  3  2   4 ]
[ 1  2  1   2 ]

4. Let A : Q^3 → Q^3 be linear. Suppose the minimum polynomial is (λ − 2)(λ^2 + 2λ + 7). Find the rational canonical form. Can you give generalizations of this rather simple problem to other situations?

5. Find the rational canonical form with respect to the field of scalars equal to Q for the matrix

A =
[ 0  0   1 ]
[ 1  0  −1 ]
[ 0  1   1 ]

Observe that this particular matrix is already a companion matrix of λ^3 − λ^2 + λ − 1. Then find the rational canonical form if the field of scalars equals C or Q + iQ.


6. Suppose you have two n × n matrices A, B whose entries are in a field F and suppose G is an extension of F. For example, you could have F = Q and G = C. Suppose A and B are similar with respect to the field G. Can it be concluded that they are similar with respect to the field F? Hint: First show that the two have the same minimum polynomial over F. Let this be ∏_{i=1}^{q} ϕ_i(λ)^{p_i}. Say β_{v_j} is associated with the polynomial ϕ(λ)^{p_j}. Thus, as described above, |β_{v_j}| equals p_j d. Consider the following table, which comes from the A cyclic set {v_j, Av_j, ⋯, A^{d−1}v_j, ⋯, A^{p_j d−1}v_j}:

α_{j0}               α_{j1}                α_{j2}                 ⋯   α_{j,d−1}
v_j                  Av_j                  A^2 v_j                ⋯   A^{d−1} v_j
ϕ(A)v_j              ϕ(A)Av_j              ϕ(A)A^2 v_j            ⋯   ϕ(A)A^{d−1} v_j
⋮                    ⋮                     ⋮                          ⋮
ϕ(A)^{p_j−1} v_j     ϕ(A)^{p_j−1} Av_j     ϕ(A)^{p_j−1} A^2 v_j   ⋯   ϕ(A)^{p_j−1} A^{d−1} v_j

In the above, α_{jk} signifies the vectors below it in the k-th column. None of the vectors below the top row are equal to 0, because the degree of ϕ(λ)^{p_j−1} λ^{d−1} is dp_j − 1, which is less than p_j d, and the smallest degree of a nonzero polynomial sending v_j to 0 is p_j d. Thus the vector on the bottom of each column is an eigenvector for ϕ(A). Also, each of these vectors is in the span of β_{v_j} and there are dp_j of them, just as there are dp_j vectors in β_{v_j}. Now show that {α_{j0}, ⋯, α_{j,d−1}} is independent. If you string these together in opposite order, you get a basis which yields a block composed of sub-blocks for the Jordan canonical form for ϕ(A). Now show that ϕ(A), ϕ(B) are similar with respect to G. Therefore, they have exactly the same Jordan canonical form with respect to G. The lengths of the cycles are determined according to the size of these Jordan blocks, and so you must have the same lengths of cycles for A as you do for B, which shows they must have the same rational canonical form. Therefore, they are similar with respect to F. Thus all you have to do is to verify that the set is independent and that ϕ(A), ϕ(B) are similar with respect to G. To show why the set is independent, suppose

∑_{i=0}^{d−1} ∑_{k=0}^{p_j−1} c_{ik} ϕ(A)^k A^i v_j = 0

Then multiply both sides by ϕ(A)^{p_j−1}. Explain why

∑_{i=0}^{d−1} c_{i0} ϕ(A)^{p_j−1} A^i v_j = 0

Now if any of the c_{i0} is nonzero, this would imply there exists a polynomial having degree smaller than p_j d which sends v_j to 0. This zeros out the c_{i0} scalars. Next multiply by ϕ(A)^{p_j−2} and give a similar argument. This kind of argument is used by Friedberg, Insel and Spence [16] to verify uniqueness of the rational canonical form. Note that this is a very interesting result, being even better than the theorem which gives uniqueness of the rational canonical form with respect to a single field.

Chapter 8

Determinants

The determinant is a number which comes from an n × n matrix of elements of a field F. It is easiest to give a definition of the determinant which is clearly well defined and then prove the one which involves Laplace expansion, which the reader might have seen already. Let (i_1, ⋯, i_n) be an ordered list of numbers from {1, ⋯, n}. This means the order is important, so (1, 2, 3) and (2, 1, 3) are different. Two books which give a good introduction to determinants are Apostol [1] and Rudin [32]. Some recent books which also have a good introduction are Baker [4], and Baker and Kuttler [6]. The approach here is less elegant than in these other books, but it amounts to the same thing. I have just tried to avoid the language of permutations in the presentation. The function sgn presented in what follows is really the sign of a permutation, however.

8.1 The Function sgn

The following lemma will be essential in the definition of the determinant.

Lemma 8.1.1 There exists a function sgn_n which maps each ordered list of numbers from {1, ⋯, n} to one of the three numbers 0, 1, or −1 and which also has the following properties:

sgn_n(1, ⋯, n) = 1    (8.1)

sgn_n(i_1, ⋯, p, ⋯, q, ⋯, i_n) = −sgn_n(i_1, ⋯, q, ⋯, p, ⋯, i_n)    (8.2)

In words, the second property states that if two of the numbers are switched, the value of the function is multiplied by −1. Also, in the case where n > 1 and {i_1, ⋯, i_n} = {1, ⋯, n}, so that every number from {1, ⋯, n} appears in the ordered list (i_1, ⋯, i_n),

sgn_n(i_1, ⋯, i_{θ−1}, n, i_{θ+1}, ⋯, i_n) ≡ (−1)^{n−θ} sgn_{n−1}(i_1, ⋯, i_{θ−1}, i_{θ+1}, ⋯, i_n)    (8.3)

where n = i_θ in the ordered list (i_1, ⋯, i_n).

Proof: Define sign(x) = 1 if x > 0, −1 if x < 0, and 0 if x = 0. If n = 1, there is only one list and it is just the number 1. Thus one can define sgn_1(1) ≡ 1. For the general case where n > 1, simply define

sgn_n(i_1, ⋯, i_n) ≡ sign( ∏_{r<s} (i_s − i_r) )

… larger than k, and this must be the same number. In other words, for each k there are the same number of m_j larger than k as there are l_j larger than k. Thus the l_j coincide with the m_j and there are the same number of them. Hence s = t and m_j = l_j. This last assertion deserves a little more explanation. The smallest l_j and m_j can be is 1. Otherwise, you would have ann(w_1) or ann(v_1) = (p^0) = (1) = D. Thus for every α ∈ D, αw_1 = 0. In particular, this would hold if α = 1 and so w_1 = 0, contrary to the assumption that none of the w_i, v_i equal 0. (Obviously you can't get uniqueness if you allow some v_j to equal 0, because you can string together as many D0 as you like.) Therefore, you could consider M/pM such that k = 0, and there are the same number of v_i and w_j since each l_i, m_j is larger than 0. Thus s = t. Then a contradiction is also obtained if some m_i ≠ l_i. You just consider the first such i and let k be the smaller of the two.
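The defining formula for sgn_n in this proof translates directly into a few lines of code; the following sketch (not from the text) simply evaluates sign(∏_{r<s}(i_s − i_r)) and illustrates properties 8.1 and 8.2.

```python
# Sketch (not from the text): sgn_n(i_1, ..., i_n) = sign( prod_{r < s} (i_s - i_r) ).
def sgn(seq):
    prod = 1
    for r in range(len(seq)):
        for s in range(r + 1, len(seq)):
            prod *= (seq[s] - seq[r])
    return (prod > 0) - (prod < 0)   # 1, -1, or 0 (0 exactly when an entry repeats)

print(sgn((1, 2, 3)))   # 1, illustrating property 8.1
print(sgn((2, 1, 3)))   # -1, one switch multiplies the value by -1 (property 8.2)
print(sgn((1, 1, 3)))   # 0, since an entry repeats
```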


Theorem 9.6.1 Suppose M is a finitely generated torsion module for a p.i.d. and ann(M) = (p^q) where p is a prime. Then

M = Dv_1 ⊕ ⋯ ⊕ Dv_s,  no v_j = 0

Letting ann(v_j) = (ν), it follows that ν = p^{l_j} for some l_j ≤ q. If the direct summands are listed in the order that the l_i are increasing (or decreasing), then s is independent of the choice of the v_j, and any other such cyclic direct sum for M will have the same sequence of l_j.

These considerations about uniqueness yield the following more refined version of Theorem 9.5.9.

Theorem 9.6.2 Let M be a nonzero torsion module for a p.i.d. D and suppose that M = Dz_1 + ⋯ + Dz_p, so that it is finitely generated. Then there exist primes {p_i}_{i=1}^{n}, distinct in the sense that none is an invertible multiple of another, such that

M = M_{p_1} ⊕ ⋯ ⊕ M_{p_n},  no M_{p_i} equal to 0

For (β) = ann(M), β = ∏_{i=1}^{n} p_i^{k_i}, a product of non-invertible primes, it follows that

M̂_{p_i} ≡ { m ∈ M : p_i^{k_i} m = 0 } = { m ∈ M : p_i^{k} m = 0 for some k } = M_{p_i}

which is not dependent on the spanning set. Also,

M_{p_j} = Da_{j1} ⊕ ⋯ ⊕ Da_{jm_j},  m_j ≤ k_j

where ann(a_{jr}) = (p_j^{l_r}) with l_r ≤ k_j, and we can have the order arranged such that l_1 ≤ l_2 ≤ ⋯ ≤ l_{m_j} = k_j. The numbers l_1 ≤ l_2 ≤ ⋯ ≤ l_{m_j} = k_j are the same for each cyclic decomposition of M_{p_j}. That is, they do not depend on the particular decomposition chosen. Of course we would also have uniqueness if we adopted the convention that the l_i should be arranged in decreasing order. All of the above is included in [24] and [7] where, in addition to the above, there are some different presentations along with much more on modules and rings.
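In the simplest case D = Z, Theorem 9.6.2 is the primary decomposition of a finite abelian group: Z/nZ splits into the components Z/p^k Z for the prime powers dividing n. The following small sketch (not from the text) computes those components with sympy's integer factorization.

```python
# Sketch (not from the text): the primary components of the torsion Z-module Z/nZ.
from sympy import factorint

def primary_components(n):
    """Return the prime-power orders of the primary components of Z/nZ."""
    return [p**k for p, k in factorint(n).items()]

print(primary_components(360))   # [8, 9, 5]: Z/360Z is Z/8Z + Z/9Z + Z/5Z (direct sum)
```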

9.7 Canonical Forms

Now let D = F[x] and let L ∈ L(V, V), where V is a finite dimensional vector space over F. Define the multiplication of something in D with something in V as follows:

g(x)v ≡ g(L)v,  L^0 ≡ I

As mentioned above, V is a finitely generated torsion module and F[x] is a p.i.d. The non-invertible primes are polynomials irreducible over F. Letting {z_1, ⋯, z_l} be a basis for V, it follows that V = Dz_1 + ⋯ + Dz_l, and so there exist irreducible polynomials over F, {ϕ_i(x)}_{i=1}^{n}, and corresponding positive integers k_i such that

V = ker(ϕ_1(L)^{k_1}) ⊕ ⋯ ⊕ ker(ϕ_n(L)^{k_n})


This is just Theorem 6.1.10 obtained as a special case. Recall that the entire theory of canonical forms is based on this result. This follows because

M_{ϕ_i} = { v ∈ V : ϕ_i(x)^{k_i} v ≡ ϕ_i(L)^{k_i} v = 0 }.

Now continue letting D = F[x], L ∈ L(V, V) for V a finite dimensional vector space, and g(x)v ≡ g(L)v as above. Consider M_{ϕ_i} ≡ ker(ϕ_i(L)^{k_i}), which is a submodule of V. Then by Theorem 9.5.9, this is the direct sum of cyclic submodules,

M_{ϕ_i} = Dv_1 ⊕ ⋯ ⊕ Dv_s    (*)

where s = rank(M_{ϕ_i}). At this point, note that ann(M_{ϕ_i}) = (ϕ_i(x)^{k_i}) and so ann(Dv_j) = (ϕ_i(x)^{l_j}) where l_j ≤ k_i. If d_i is the degree of ϕ_i(x), this implies that for v_j being one of the v in ∗,

v_j, Lv_j, L^2 v_j, ⋯, L^{l_j d_i − 1} v_j

must be a linearly independent set, since otherwise ann(Dv_j) ≠ (ϕ_i(x)^{l_j}), because a polynomial of smaller degree than the degree of ϕ_i(x)^{l_j} would annihilate v_j. Also,

Dv_j ⊆ span( v_j, Lv_j, L^2 v_j, ⋯, L^{l_j d_i − 1} v_j )

by division. Vectors in Dv_j are of the form g(x)v_j = (ϕ_i(x)^{l_j} k(x) + ρ(x))v_j = ρ(x)v_j, where the degree of ρ(x) is less than l_j d_i. Thus a basis for Dv_j is {v_j, Lv_j, L^2 v_j, ⋯, L^{l_j d_i − 1} v_j} and the span of these vectors equals Dv_j. Now you see why the term "cyclic" is appropriate for the submodule Dv. This shows the following theorem.

Theorem 9.7.1 Let V be a finite dimensional vector space over a field of scalars F. Also suppose the minimum polynomial is ∏_{i=1}^{n} ϕ_i(x)^{k_i}, where k_i is a positive integer and the degree of ϕ_i(x) is d_i. Then

V = ker(ϕ_1(L)^{k_1}) ⊕ ⋯ ⊕ ker(ϕ_n(L)^{k_n}) ≡ M_{ϕ_1(x)} ⊕ ⋯ ⊕ M_{ϕ_n(x)}

Furthermore, for each i, in ker(ϕ_i(L)^{k_i}) there are vectors v_1, ⋯, v_{s_i} and positive integers l_1, ⋯, l_{s_i}, each no larger than k_i, such that a basis for ker(ϕ_i(L)^{k_i}) is given by

{ β_{v_1}^{l_1 d_i − 1}, ⋯, β_{v_{s_i}}^{l_{s_i} d_i − 1} }

where the symbol β_{v_j}^{l_j d_i − 1} signifies the ordered basis

( v_j, Lv_j, L^2 v_j, ⋯, L^{l_j d_i − 2} v_j, L^{l_j d_i − 1} v_j )

Its length is the degree of ϕ_i(x)^{l_j} and is therefore determined completely by the l_j. Thus the lengths of the β_{v_j}^{l_j d_i − 1} are uniquely determined if they are listed in order of increasing or decreasing length.

The last claim of this theorem will mean that the various canonical forms are uniquely determined. It is clear that the span of β_{v_j}^{l_j d_i − 1} is invariant with respect to L because, as discussed above, this span is Dv_j where D = F[x]. Also recall that ann(Dv_j) = (ϕ_i(x)^{l_j}) where l_j ≤ k_i. Let

ϕ_i(x)^{l_j} = x^{l_j d_i} + a_{n−1} x^{l_j d_i − 1} + ⋯ + a_1 x + a_0


Recall that the minimum polynomial has leading coefficient equal to 1. Of course this makes no difference in the above presentation, because a_n is invertible and so the ideals and the above direct sum are the same regardless of whether this leading coefficient equals 1, but it is convenient to let this happen, since otherwise the blocks for the rational canonical form will not be standard. Then what is the matrix of L restricted to Dv_j?

( Lv_j  ⋯  L^{l_j d_i − 1} v_j  L^{l_j d_i} v_j ) = ( v_j  ⋯  L^{l_j d_i − 2} v_j  L^{l_j d_i − 1} v_j ) M

where M is the desired matrix. Now ann(Dv_j) = (ϕ_i(x)^{l_j}) and so

L^{l_j d_i} v_j = (−1)( a_{n−1} L^{l_j d_i − 1} v_j + ⋯ + a_1 Lv_j + a_0 v_j )

Thus the matrix M must be of the form

[ 0        ⋯       −a_0     ]
[ 1   ⋱            −a_1     ]
[     ⋱   ⋱          ⋮      ]
[ 0        1       −a_{n−1} ]    (9.2)

It follows that the matrix of L with respect to the basis obtained as above will be block diagonal with blocks like the above. This is the rational canonical form.

Of course, those blocks corresponding to ker(ϕ_i(L)^{k_i}) can be arranged in any order by just listing the β_{v_1}^{l_1 d_i − 1}, ⋯, β_{v_{s_i}}^{l_{s_i} d_i − 1} in various orders. If we want the blocks to be larger in the top left and get smaller towards the lower right, we just re-number so that the l_i form a decreasing sequence. Note that this is the same as saying that ann(Dv_1) ⊆ ann(Dv_2) ⊆ ⋯ ⊆ ann(Dv_{s_i}). If we want the blocks to be increasing in size from the upper left corner to the lower right, this corresponds to re-numbering such that ann(Dv_1) ⊇ ann(Dv_2) ⊇ ⋯ ⊇ ann(Dv_{s_i}). This second one involves letting l_1 ≤ l_2 ≤ ⋯ ≤ l_{s_i}.

What about uniqueness of the rational canonical form, given an order of the spaces ker(ϕ_i(L)^{k_i}) and under the convention that the blocks associated with ker(ϕ_i(L)^{k_i}) should be increasing or decreasing in size from upper left toward lower right? In other words, suppose you have

ker(ϕ_i(L)^{k_i}) = Dv_1 ⊕ ⋯ ⊕ Dv_s = Dw_1 ⊕ ⋯ ⊕ Dw_t

such that ann(Dv_1) ⊆ ann(Dv_2) ⊆ ⋯ ⊆ ann(Dv_s) and ann(Dw_1) ⊆ ann(Dw_2) ⊆ ⋯ ⊆ ann(Dw_t). Will it happen that s = t and that the blocks associated with corresponding Dv_i and Dw_i are the same size? In other words, if ann(Dv_j) = (ϕ_i(x)^{l_j}) and ann(Dw_j) = (ϕ_i(x)^{m_j}), is m_j = l_j? If this is so, then this proves uniqueness of the rational canonical form up to order of the blocks. However, this was proved above in the discussion on uniqueness, Theorem 9.6.1.

In the case that the minimum polynomial splits, the following is also obtained.

Corollary 9.7.2 Let V be a finite dimensional vector space over a field of scalars F. Also let the minimal polynomial be ∏_{i=1}^{n} (x − µ_i)^{k_i}, where k_i is a positive integer. Then

V = ker((L − µ_1 I)^{k_1}) ⊕ ⋯ ⊕ ker((L − µ_n I)^{k_n})


Furthermore, for each i, in ker((L − µ_i I)^{k_i}) there are vectors v_1, ⋯, v_{s_i} and positive integers l_1, ⋯, l_{s_i}, each no larger than k_i, such that a basis for ker((L − µ_i I)^{k_i}) is given by

{ β_{v_1}^{l_1 − 1}, ⋯, β_{v_{s_i}}^{l_{s_i} − 1} }

where the symbol β_{v_j}^{l_j − 1} signifies the ordered basis

( (L − µ_i I)^{l_j − 1} v_j, (L − µ_i I)^{l_j − 2} v_j, ⋯, (L − µ_i I)^2 v_j, (L − µ_i I) v_j, v_j )

(Note how this is the reverse order to the above. This is to follow the usual convention for the Jordan form, in which the string of ones is on the super diagonal.)

Proof: The proof is essentially the same.

ker((L − µ_i I)^{k_i}) = Dv_1 ⊕ ⋯ ⊕ Dv_{s_i}

ann(Dv_j) for v_j ∈ ker((L − µ_i I)^{k_i}) is a principal ideal (ν(x)) where ν(x) divides (x − µ_i)^{k_i}, and so it is of the form ((x − µ_i)^{l_j}) where 0 ≤ l_j ≤ k_i. Then as before,

v_j, (L − µ_i I) v_j, (L − µ_i I)^2 v_j, ⋯, (L − µ_i I)^{l_j − 1} v_j    (*)

must be independent because, if not, there is a polynomial g(x) of degree less than l_j such that g(x)(Dv_j) = 0, and so ((x − µ_i)^{l_j}) cannot really be ann(Dv_j). It is also true that the above list of vectors must span Dv_j, because if f(x) ∈ D, then f(x) = (x − µ_i)^{l_j} m(x) + r(x), where r(x) has degree less than l_j. Thus f(x)v_j = r(x)v_j, and clearly r(x) can be written in the form r̂(x − µ_i) with the same degree. (Just take the Taylor expansion of r formally.) Thus ∗ is indeed a basis of Dv_j for v_j ∈ ker((L − µ_i I)^{k_i}).

This gives the Jordan form right away. In this case, we know that (L − µ_i I)^{l_j} v_j = 0, and so the matrix of the transformation L − µ_i I with respect to this basis on Dv_j is obtained in the usual way:

( 0  (L − µ_i I)^{l_j − 1} v_j  ⋯  (L − µ_i I) v_j ) = ( (L − µ_i I)^{l_j − 1} v_j  (L − µ_i I)^{l_j − 2} v_j  ⋯  v_j ) ·
[ 0  1        0 ]
[    0  ⋱       ]
[        ⋱   1  ]
[ 0          0  ]

a Jordan block for the nilpotent matrix (L − µ_i I). Thus, with respect to this basis, the block associated with L and β_{v_j}^{l_j − 1} is just

[ µ_i  1         0  ]
[      µ_i  ⋱       ]
[           ⋱    1  ]
[ 0             µ_i ]

This has proved the existence of the Jordan form. You have a string of blocks like the above for ker((L − µ_i I)^{k_i}). Of course, these can be arranged so that the sizes of the blocks are decreasing from upper left to lower right. As with the rational canonical form, once it is decided to have the blocks be decreasing (increasing) in size from upper left to lower right, the Jordan form is unique.
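When the eigenvalues can be found exactly, the construction above can be checked by machine. The following sketch (not from the text) uses sympy's jordan_form routine, assuming that routine's convention A = PJP^{−1}, on the matrix of Exercise 23 in the exercises below.

```python
# Sketch (not from the text): computing a Jordan form with exact eigenvalues via sympy.
from sympy import Matrix

A = Matrix([[1, 2, 0, 0],
            [0, 1, 0, 0],
            [-1, 0, 0, -2],
            [1, -1, 1, 3]])        # matrix of Exercise 23 below; eigenvalues 1 and 2

P, J = A.jordan_form()             # sympy returns P and J with A = P*J*P**(-1)
print(J)                           # Jordan blocks for the eigenvalues 1 and 2
assert (P * J * P.inv() - A).expand() == Matrix.zeros(4, 4)
```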

9.8 Exercises

1. Explain why any finite Abelian group is a module over the integers. Explain why every finite Abelian group is the direct sum of cyclic subgroups. 2. Let R be a commutative ring which has 1. Show it is a field if and only if the only ideals are (0) and (1). 3. Let R be a commutative ring and let I be an ideal of R. Let N (I) ≡ {x ∈ R : xn ∈ I for some n ∈ N} . Show that N (I) is an ideal containing I. Also show that N (N (I)) = N (I). This is called n the radical of I. Hint: If xn ∈ I for some n, what of yx? Is it true that (yx) ∈ I? For the n second part, it is clear that N (N (I)) ⊇ N (I). If x ∈ N (N (I)) , then x ∈ N (I) for some n. k Is it true that (xn ) = xkn ∈ I for some k? 4. Let F be a field and p (x) ∈ F [x]. Consider R ≡ F [x] / (p (x)) . Let N ≡ {q ∈ R : q n = 0 for some n ∈ N} Show that N is an ideal and that it equals (0) if and only if p (x) is not divisible by the square n of any polynomial. Hint: Say N = (0) . Then by assumption, if q (x) ∈ (p (x)) , we must ∏m ki have q (x) a multiple of p (x) . Say p (x) = i=1 pi (x) . Argue that a contradiction results if any ki > 1 by replacing ki in the product with li , 1 ≤ li ∏ < ki . This is q (x). Explain n m why q (x) ∈ (p (x)) but q (x) ∈ / (p (x)). Therefore, p (x) = i=1 pi (x) , the pi (x) distinct 2 noninvertible primes. Explain why this precludes having p (x) divisible by q (x) . Conversely, ∏m 2 ki if p (x) is divisible by q (x) for some polynomial, then p (x) = i=1 pi (x) where some ki > 1 and so N ̸= (0). Explain. 5. Recall that if p (x) is irreducible in F [x] , then F [x] / (p (x)) is a field. Show that if p (x) is not irreducible, then this quotient ring is not even an integral domain. 6. Find a polynomial p (x) of degree 3 which is irreducible over Z3 . Thus is a prime in Z3 [x]. Now consider Z3 [x] / (p (x)) . Recall that this is a field. How many elements are in this field? Hint: Show that all elements of this field are of the form a0 + a1 x + a2 x2 + (p (x)) where there are three choices for each ai . 7. Let F be a field and consider F [x] / (p (x)) where p (x) is prime in F [x] (irreducible). Then this was shown to be a field. Let α be a root of p (x) in a field G which contains F. Consider F [α] which means all polynomials in α having coefficients in F. Show that this is isomorphic to F [x] / (p (x)) and is itself a field. Hint: Let θ (k (x) + (p (x))) ≡ k (α) . Show that since p (x) is irreducible, it has the smallest possible degree out of all polynomials q (x) for which q (α) = 0. You might use this to show that θ is one to one. Show that θ is actually an isomorphism. Thus F [α] must be a field. 8. Letting Z be the integers, show that Z [x] is an integral domain. Is this a principal ideal domain? Show that in Z [x] , if you have f (x) , g (x) given, then there exists an integer b such that bf (x) = g (x) Q (x) + R (x) where the degree of R (x) is less than the degree of g (x). Note how in a field, you don’t need to multiply by some integer b. Hint: Concerning the question about whether this is a p.i.d.,suppose you have an ideal in Z [x] called I. Say l (x) ∈ I and l (x) has smallest degree out of all polynomials in I. Let k (x) ∈ I. Then in Q [x] , you have r (x) , q (x) ∈ Q [x] such that k (x)

= l(x)q(x) + r(x),  r(x) = 0 or degree of r(x) < degree of l(x)


Now multiply by denominators to get an equation in which everything is in Z [x] but all the degrees are the same. Let the result on the left be kˆ (x). Then it is in I. Obtain a contradiction if r (x) ̸= 0. 9. Now consider the polynomials x3 + x and 2x2 + 1. Show that you cannot write ( ) x3 + x = q (x) 2x2 + 1 + r (x) where the degree of r (x) is less than the degree of 2x2 + 1 and both r (x) , q (x) are in Z [x]. Thus, even though Z [x] is a p.i.d. the degree will not serve to make Z [x] into a Euclidean domain. 10. The symbol F [x1 , · · · , xn ] denotes the polynomials in the indeterminates x1 , · · · , xn which have coefficients in the field F. Thus a typical element of F [x1 , · · · , xn ] is of the form ∑ ai1 ,··· ,in xi11 · · · xinn i1 ,··· ,in

where this sum is taken over all lists of nonnegative integers i1 , · · · , in and only finitely many ai1 ,··· ,in ∈ F are nonzero. Explain why this is a commutative ring which has 1. Also explain why this cannot be a principal ideal ring in the case that n > 1. Hint: Consider F [x, y] and show that (x, y) , denoting all polynomials of the form ax + by where a, b ∈ F cannot be obtained as a principal ideal (p (x, y)). 11. Suppose you have a commutative ring R and an ideal I. Suppose you have a morphism ˆ where R ˆ is another ring and that I ⊆ ker (h). Also let h:R→R f : R → R/I. be defined by f (r) ≡ r + I Here R/I is as described earlier in the chapter. The entries are of the form r + I where r ∈ R. ≡

r + I + rˆ + I

(r + I) (ˆ r + I) ≡

r + rˆ + I rˆ r+I

show this is a ring and that the operations are well defined. Show that f is a morphism and ˆ such that h = θ ◦ f is also onto. Then show that there exists θ : R/I → R R ↓f

↘h

R/I

99K

θ

ˆ R

Hint: It is clear that f is well defined and onto and is a morphism. Define θ (r + I) ≡ h (r) . Now show it is well defined. 12. The Gaussian integers Z [i] are complex numbers of the form m + in where m, n are integers. Show that this is also the same as polynomials with integer coefficients in powers of i which explains the notation. Show this is an integral domain. Reviewing the Definition 9.1.1 show that the Gaussian integers are also a Euclidean domain if δ (a) ≡ a¯ a. Hint: For the last part, if you have a, b ∈ Z [i] , then as in [24] a = µ + iλ b where µ, ν ∈ Q the rational numbers. There are integers u, l such that |µ − u| ≤ 1/2, |λ − l| ≤ 1/2. Then letting ε ≡ µ − u, η ≡ λ − l, it follows that a = b (u + ε) + i (l + η)

198

CHAPTER 9. MODULES AND RINGS Thus



r

z }| { a = b (u + il) + b (ε + iη)

It follows that r ∈ Z [i]. Why? Now consider δ (r) ≡ b (ε + iη) b (ε + iη) = b (ε + iη) ¯b (ε − iη) Verify that δ (r) < δ (b). Explain why r is maybe not unique. 13. This, and the next several problems give a more abstract treatment of the cyclic decomposition theorem found in Birkhoff and McClain. [7]. Suppose B is a submodule of A a torsion module over a p.i.d.D. Show that A/B is also a module over D, and that ann (A/B) equals (α) for some α ∈ D. Now suppose that A/B is cyclic A/B = Da0 + B Say (β) = ann (a0 ). Now define I ≡ {λ : λa0 ∈ B}. Explain why this is an ideal which is at least as large as (β). Let it equal (α). Explain why α/β. In particular, α ̸= 0. Then let θ : D → B ⊕ D be given by θ (λ) ≡ (αλa0 , −λα) . Here we write B ⊕ D to indicate ( B )× D( but with a )summation and multiplication by a scalar defined as follows: (b, α) + ˆb, α ˆ ≡ b + ˆb, α + α ˆ , β (a, λ) ≡ (βa, βλ). Show that this mapping is one to one and a morphism meaning that it preserves the operation of addition, θ (δ + λ) = θδ + θλ. Why is αλa0 ∈ B? 14. ↑In the above problem, define η : B ⊕ D → A as η (b, α) ≡ b + αa0 . Show that η maps onto A, that η ◦ θ = 0, and that θ (D) = ker (η). People write the result of these two problems as η

θ

0→D →B⊕D →A→0 and it is called a “short exact sequence”. It is exact because θ (D) = ker (η). 15. ↑Let C be a cyclic module with ann (C) = (µ) and µA = 0. Also, as above, let A be a module with B a submodule such that A/B is cyclic, A/B ≡ D (a0 + B) , ann (A/B) = (ν), C = Dc0 . From the above problem, ν ̸= 0. Suppose there is a morphism τ : B → C. Then it is desired to extend this to σ : A → C as illustrated in the following picture where in the picture, i is the inclusion map. i B → A ↓τ ↙σ C You need to define σ. First explain why µ = νλ for some λ. Next explain why νa0 ∈ B and then why this implies λτ (νa0 ) = 0. Let τ (νa0 ) = βc0 . The β exists because of the assumption that τ maps to C = Dc0 . Then since λτ (νa0 ) = 0, it follows that λβc0 = 0. Explain why λβ = µδ = νλδ. Then explain why β = νδ Thus τ (νa0 ) = τ (κνa0 )

=

νδc0 ≡ νc′ κνc′

From the above problem, θ

η

0→D →B⊕D →A→0


where θ (κ) ≡ (κνa0 , −κν) , η (b, κ) ≡ b + κa0 and the above is a short exact sequence. So now extend τ to τ ′ : B ⊕ D → C as follows. τ ′ (b, κ) ≡ τ (b) + κc′ The significance of this definition is as follows. τ ′ (θ (κ))

≡ τ ′ ((κνa0 , −κν)) ≡ τ (κνa0 ) − κνc′ =

κνc′ − κνc′ = 0

This is clearly a morphism which agrees with τ on B ⊕ 0 provided we identify this with B. The above shows that θ (D) ⊆ ker (τ ′ ). But also, θ (D) = ker (η) from the above problem. Define τˆ′ : (B ⊕ D) /θ (D) = (B ⊕ D) / ker (η) → C as

τˆ′ ((b, κ) + θ (D)) ≡ τ ′ (b, κ) ≡ τ (b) + κc′ ( ) This is well defined because if ˆb, κ ˆ − (b, κ) ∈ θ (D) , then this difference is in ker (τ ′ ) and so ( ) ˆ = τ ′ (b, κ). Now consider the following diagram. τ ′ ˆb, κ η ˆ

(B ⊕ D) /θ (D) → ↓ τˆ′ i



C

A ↓σ C

where ηˆ : (B ⊕ D) /θ (D) = (B ⊕ D) / ker (η) → A be given by ηˆ ((b, κ) + ker (η)) ≡ η ((b, κ)) ≡ b + κa0 . Then choose σ to make the diagram commute. That is, σ = i ◦ τˆ′ ◦ ηˆ−1 . Verify this gives the desired extension. It is well defined. You just need to check that it agrees with τ on B. If b ∈ B, ηˆ−1 (b) ≡ (b, 0) + θ (D) Now recall that it was shown above that τ ′ extends τ . This proves the existence of the desired extension σ. This proves the following lemma: Let C = Dc0 be a cyclic module over D and let B be a submodule of A, also over D. Suppose A = span (B, a0 ) so that A/B is cyclic and suppose µA = 0 where (µ) = ann (C) and let (ν) be ann (B) . Suppose that τ : B → C is a morphism. Then there is an extension σ : A → C such that σ is also a morphism. 16. ↑Let A be a Noetherian torsion module, let B be a sub module, and let τ : B → C be a morphism. Here C is a cyclic module C = Dc0 where ann (c0 ) = (µ) and µ (A) = 0. Then there exists a morphism σ : A → C which extends τ . Note that here you are not given that A/B is cyclic but A is a Noetherian torsion module. Hint: Explain why there are finitely many {a1 , a2 , · · · , an } such that A = span (B, a1 , a2 , · · · , an ) . Now explain why if Ak = span (B, · · · , ak ) then each of these is a module and Ak /Ak−1 is cyclic. Now apply the above result of the previous problem to get a succession of extensions. 17. ↑Suppose C is a cyclic sub module of A, a Noetherian torsion module and (µ) = ann (C) . Also suppose that µA = 0. Then there exists a morphism σ : A → C such that A = C ⊕ ker σ To so this, use the above propositon to get the following diagram to commute. C ↓i C

i

→A ↙σ


Then consider(a = σa)+ (a − σa) which is something in C added to something else. Explain ∈C why σ 2 a = σ iσ (a) = σ (a) . Thus the second term is in ker (σ). Next suppose you have b ∈ ker (σ) and c ∈ C. Then say c + b = 0. Thus σc = c = 0 when σ is done to both sides. This gives a condition under which C is a direct summand. 18. ↑ Let M be a non zero torsion module for a p.i.d. D and suppose that M = Dz1 + · · · + Dzp so that it is finitely generated. Show that it is the direct sum of cyclic submodules. To do this, pick a1 ̸= 0 and consider Da1 a cyclic submodule. Then you use the above problem to write M = Da1 ⊕ ker σ for a morphism σ. Let M2 ≡ ker σ. It is also a torsion module for D. If it is not zero, do the same thing for it that was just done. The process must end because it was shown above in the chapter that M is Noetherian, Proposition 9.5.4. 19. The companion matrix of the polynomial q (x) = xn + an−1 xn−1 + · · · + a1 x + a0 is   0 0 −a0    1 ... −a1    C=  .. .. ..   . . .   0 1 −an−1 Show that the characteristic polynomial, a1 x + a0 . Hint: You need to take  x   −1 det    0

det (xI − C) of C is equal to xn + an−1 xn−1 + · · · + 0 x .. .

x −1

a0 a1 .. . x + an−1

     

To do this, use induction and expand along the top row. 20. Letting C be the n × n matrix as in the above where q (x) is also given there, explain why Ce1 = e2 , Ce2 = e3 , · · · , Cen−1 = en . Thus ek = C k−1 e1 . Also explain why Cen

= C n e1 = (−1) (a0 e1 + a1 e2 + · · · + an−1 en ) ( ) = (−1) a0 e1 + a1 Ce1 + · · · + an−1 C n−1 e1

Explain why q (C) e1 = 0. Now each ek is a multiple of C times e1 . Explain why q (C) ek = 0. Why does this imply that q (C) = 0? 21. Suppose you have a block diagonal matrix  M1   A=

 ..

  

. Mm

Show that det (xI − A) =

m ∏ j=1

det (xIj − Mj ) , Ij the appropriate size.


22. Now let L ∈ L (V, V ) . The rational canonical form says that there is a basis for V such that with respect to this basis, the matrix of L is of the form   M1 0   ..  B≡ .   0 Mm and each Mj is also a block diagonal matrix,  Cj,1  Mj =   0



0 ..

  

. Cj,sj

where Cj,a is a companion matrix for a polynomial qj,a (x). From the above problems, det (xI − Cj,a ) = qj,a (x) , and qj,a (Cj,a ) = 0 Explain why the characteristic polynomial for L is g (x) = det (xI − B) =

sj m ∏ ∏

det (xI − Cj,a ) =

j=1 a=1

sj m ∏ ∏

qj,a (x)

j=1 a=1

Now, if you have a polynomial g (x) and a block diagonal matrix   C1 0   ..  D= .   0 Cq Explain why g (D) =

   

g (C1 )

0 ..

.

0

   

g (Cq )

Now explain why L satisfies its characteristic equation. This gives a general proof of the Cayley Hamilton theorem. 23. If you can find the eigenvalues exactly, then the Jordan form can be computed. Otherwise, you can forget about it. Here is a matrix.   1 2 0 0  0 1 0 0       −1 0 0 −2  1

−1 1

3

It has two eigenvalues, λ = 1, and 2. Find its Jordan form. 24. Let the field of scalars be Z5 and consider the matrix   2 1 4   A= 0 2 2  0 0 0 where the entries are considered as residue classes in Z5 . Show that its eigenvalues are indeed 0, 2, 2. Now find its Jordan canonical form J, and S such that A = SJS −1 .


25. Recall that an integral domain is a commutative ring which has 1 such that if ab = 0 then one of a or b equals 0. A Euclidean domain D is one which has the property that it has a function δ, defined on the nonzero elements of D which has values in {0, 1, 2, · · · } such that if a, b ̸= 0 then there exist q, r such that a = bq + r, where δ (r) < δ (b) or r = 0 Show that such a Euclidean domain is a principal ideal domain. Also show that an example of such a thing is the integers and also the ring of polynomials F [x] for F a field. Hint: Start with an ideal I. Pick b ∈ I which has δ (b) as small as possible. Now suppose you have a ∈ I. Use the above Euclidean algorithm to write a = bq + r where δ (r) < δ (b) or else r = 0. This was done in the chapter, but do it yourself. 26. Suppose you have an n × n matrix A in which the entries come from a commutative ring which has a multiplicative identity 1. Explain why there exists a matrix B also having entries in the ring such that AB = BA = I whenever det (A) = 1. Explain why examples of such invertible matrices are the elementary matrices in which a multiple of a row (column) is added to another, and those which involve switching two rows (columns). Hint: You might look at determinants and the formula for the inverse of a matrix using the cofactor matrix. 27. Consider the m × n matrices which have entries in a commutative ring R. Then A ∼ B means there are invertible matrices P, Q of the right size having entries in R, such that A = P BQ Show that this is an equivalence relation. Note that the P, Q are different and may even be of different size. Thus this is not the usual notion of similar square matrices. 28. Suppose a ̸= 0. Show that there exists an invertible 2 × 2 matrix R such that ( ) ( ) a b d 0 R= ∗ ∗ ∗ ∗ where d is a greatest common divisor of a and b. 29. Let D be a principal ideal domain and let p be a noninvertible prime. Consider the principal ideal (p) . Show that if U is any ideal such that (p) ⊆ U ⊆ D, then either U = D or U = (p). Such an ideal is called a maximal ideal. It is an ideal, not equal to the whole ring such that no ideal can be placed between it and the whole ring. Now show that if you have a maximal ideal I in a principal ideal domain, then it must be the case that I = (p) for some prime p. 30. Let R be the ring of continuous functions defined on [0, 1] . Here it is understood that f = g means the usual thing, that f (x) = g (x). Multiplication and addition are defined in the usual way. Pick x0 ∈ [0, 1] and let Ix0 ≡ {f ∈ R : f (x0 ) = 0} . Show that this is a maximal ideal of R. Then show that there are no other maximal ideals. Hint: For the second part, let I be a maximal ideal. Show using a compactness argument and continuity of the functions that unless there exists some x0 for which all f ∈ I are zero, then there exists a function in I which is never 0. Then since this is an ideal, you can show that it contains 1. Explain why this ring cannot be an integral domain. 31. Suppose you have a commutative ring which has 1 called R and suppose that I is a maximal ideal. Show that R/I is a field. Hint: Consider the ideal Rr + I where r + I ̸= 0. Then explain why this ideal is strictly larger than I and hence equals R. Thus 1 ∈ Rr + I. 32. In the above problem about the ring of continuous functions with field of scalars R, there were maximal ideals as described there. Thus R/Ix0 is a field. Describe the field.


33. Say you have a ring R and a ∈ R \ {1} , a ̸= 0 which is not invertible. Explain why (a) is an ideal which does not contain 1. Show there exists a maximal ideal. Hint: You could let F denote all ideals which do not contain 1. It is nonempty by assumption. Now partially order this by set inclusion. Consider a maximal chain. This uses the Hausdorff maximal theorem in the appendix. 34. It is always assumed that the rings used here are commutative rings and that they have a multiplicative identity 1. However, sometimes people have considered things which they have called rings R which have all the same axioms except that there is no multiplicative identity 1. However, all such things can be considered to be in a sense embedded in a real ring which has a multiplicative identity. You consider Z × R and define addition in the obvious way ( ) ( ) ˆ rˆ ≡ k + k, ˆ r + rˆ (k, r) + k, and multiplication as follows. ( ) ( ) ˆ rˆ ≡ k k, ˆ kˆ ˆ + rˆ (k, r) k, r + kr r Then the multiplicative identity is just (1, 0) . You have (1, 0) (k, r) ≡ (k, r). You just have to verify the other axioms like the distributive laws and that multiplication is associative.


Chapter 10

Some Items which Resemble Linear Algebra

This chapter is on some topics which don't usually appear in linear algebra texts but which seem to be related to linear algebra in the sense that the ideas are similar.

10.1 The Symmetric Polynomial Theorem

First here is a definition of polynomials in many variables which have coefficients in a commutative ring. A commutative ring would be a field except you don’t know that every nonzero element has a multiplicative inverse. A good example of a commutative ring is the integers. In particular, every field is a commutative ring. Thus, a commutative ring satisfies the following axioms. They are just the field axioms with one omission mentioned above. You don’t have x−1 if x ̸= 0. We will assume that the ring has 1, the multiplicative identity. Axiom 10.1.1 Here are the axioms for a commutative ring. 1. x + y = y + x, (commutative law for addition) 2. There exists 0 such that x + 0 = x for all x, (additive identity). 3. For each x ∈ F, there exists −x ∈ F such that x + (−x) = 0, (existence of additive inverse). 4. (x + y) + z = x + (y + z) ,(associative law for addition). 5. xy = yx, (commutative law for multiplication). You could write this as x × y = y × x. 6. (xy) z = x (yz) ,(associative law for multiplication). 7. There exists 1 such that 1x = x for all x,(multiplicative identity). 8. x (y + z) = xy + xz.(distributive law). The example of most interest here is where the commutative ring is the integers Z. Next is a definition of what is meant by a polynomial. Definition 10.1.2 Let k ≡ (k1 , k2 , · · · , kn ) where each ki is a nonnegative integer. Let ∑ |k| ≡ ki i

Polynomials of degree p in the variables x1, x2, · · · , xn are expressions of the form
g (x1, x2, · · · , xn) = ∑_{|k|≤p} ak x1^{k1} · · · xn^{kn}


where each ak is in a commutative ring. If all ak = 0, the polynomial has no degree. Such a polynomial is said to be symmetric if whenever σ is a permutation of {1, 2, · · · , n},
g (xσ(1), xσ(2), · · · , xσ(n)) = g (x1, x2, · · · , xn)
An example of a symmetric polynomial is
s1 (x1, x2, · · · , xn) ≡ ∑_{i=1}^{n} xi

Another one is
sn (x1, x2, · · · , xn) ≡ x1 x2 · · · xn

Definition 10.1.3 The elementary symmetric polynomial sk (x1, x2, · · · , xn), k = 1, · · · , n, is the coefficient of (−1)^k x^{n−k} in the following polynomial:
(x − x1)(x − x2) · · · (x − xn) = x^n − s1 x^{n−1} + s2 x^{n−2} − · · · ± sn
Thus
s1 = x1 + x2 + · · · + xn, s2 = ∑_{i<j} xi xj, s3 = ∑_{i<j<k} xi xj xk, . . . , sn = x1 x2 · · · xn
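As a quick sanity check of Definition 10.1.3 for n = 3, the following sympy snippet (an illustrative sketch, not part of the text) expands (x − x1)(x − x2)(x − x3) and compares it with x^3 − s1 x^2 + s2 x − s3.

```python
from sympy import symbols, expand

x, x1, x2, x3 = symbols('x x1 x2 x3')
s1 = x1 + x2 + x3                    # elementary symmetric polynomial of degree 1
s2 = x1*x2 + x1*x3 + x2*x3           # degree 2
s3 = x1*x2*x3                        # degree 3
lhs = expand((x - x1)*(x - x2)*(x - x3))
rhs = expand(x**3 - s1*x**2 + s2*x - s3)
print(expand(lhs - rhs))             # prints 0, confirming the expansion
```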

… p > max (K, m, U) you would have −KmU p + M p an integer, so it cannot equal the left side, which will be small if p is large. Therefore, ∗ follows. ∎

Next is an even more interesting lemma which follows from the above corollary.

Lemma 10.2.7 If b0, b1, · · · , bn are nonzero integers, and γ0, γ1, · · · , γn are distinct algebraic numbers, then
b0 e^{γ0} + b1 e^{γ1} + · · · + bn e^{γn} ≠ 0

Proof: Assume
b0 e^{γ0} + b1 e^{γ1} + · · · + bn e^{γn} = 0    (10.12)
Divide by e^{γ0} and, letting K = b0,
K + b1 e^{α(1)} + · · · + bn e^{α(n)} = 0    (10.13)
where α (k) = γk − γ0. These are still distinct algebraic numbers. Therefore, α (k) is a root of a polynomial
Qk (x) = vk x^{mk} + · · · + uk    (10.14)


having integer coefficients, vk, uk ≠ 0. Recall algebraic numbers were defined as roots of polynomial equations having rational coefficients. Just multiply by the denominators to get one with integer coefficients. Let the roots of this polynomial equation be
{α (k)_1, · · · , α (k)_{mk}}
and suppose they are listed in such a way that α (k)_1 = α (k). Thus, by Theorem 10.1.6 every symmetric polynomial in these roots is rational. Letting ik be an integer in {1, · · · , mk}, it follows from the assumption 10.12 that
∏_{(i1,··· ,in), ik ∈ {1,··· ,mk}} ( K + b1 e^{α(1)_{i1}} + b2 e^{α(2)_{i2}} + · · · + bn e^{α(n)_{in}} ) = 0    (10.15)

This is because one of the factors is the one occurring in 10.13 when ik = 1 for every k. The product is taken over all distinct ordered lists (i1, · · · , in) where ik is as indicated. Expand this possibly huge product. This will yield something like the following:
K′ + c1 (e^{β(1)_1} + · · · + e^{β(1)_{µ(1)}}) + c2 (e^{β(2)_1} + · · · + e^{β(2)_{µ(2)}}) + · · · + cN (e^{β(N)_1} + · · · + e^{β(N)_{µ(N)}}) = 0    (10.16)

These integers cj come from products of the bi and K. You group these exponentials according to which ci they multiply. The β (i)_j are the distinct exponents which result, each being a sum of some of the α (r)_{ir}. Since the product included all roots for each Qk (x), interchanging their order does not change the distinct exponents β (i)_j which result. They might occur in a different order, however, but you would still have the same distinct exponents associated with each cs as shown in the sum. Thus any symmetric polynomial in the β (s)_1, β (s)_2, · · · , β (s)_{µ(s)} is also a symmetric polynomial in the roots of Qk (x), α (k)_1, α (k)_2, · · · , α (k)_{mk} for each k.

Doesn’t this contradict Corollary 10.2.6? This is not yet clear, because we don’t know that the β (i)_1, ..., β (i)_{µ(i)} are roots of a polynomial having rational coefficients. For a given r, β (r)_1, · · · , β (r)_{µ(r)} are roots of the polynomial
(x − β (r)_1)(x − β (r)_2) · · · (x − β (r)_{µ(r)})    (10.17)
the coefficients of which are elementary symmetric polynomials in the β (r)_i, i ≤ µ (r). Thus the coefficients are symmetric polynomials in the α (k)_1, α (k)_2, · · · , α (k)_{mk} for each k. Say the polynomial is of the form
∑_{l=0}^{µ(r)} x^{µ(r)−l} B_l (A (1), · · · , A (n))
where A (k) signifies the roots of Qk (x), {α (k)_1, · · · , α (k)_{mk}}. Thus, by the symmetric polynomial theorem applied to the commutative ring Q [A (1), · · · , A (n − 1)], the above polynomial is of the form
∑_{l=0}^{µ(r)} x^{µ(r)−l} ∑_{k^l} B_{k^l} (A (1), · · · , A (n − 1)) s_1^{k_1^l} · · · s_{m_n}^{k_{m_n}^l}
where each s_k is one of the elementary symmetric polynomials in {α (n)_1, · · · , α (n)_{m_n}} and B_{k^l} is symmetric in α (k)_1, α (k)_2, · · · , α (k)_{mk} for each k ≤ n − 1 and B_{k^l} ∈ Q [A (1), · · · , A (n − 1)]. Now do to B_{k^l} what was just done to B_l, featuring A (n − 1) this time, and continue till eventually you obtain for the coefficient of x^{µ(r)−l} a large sum of rational numbers times a product of symmetric polynomials in A (1), A (2), etc. By Theorem 10.1.6 applied repeatedly, beginning with A (1) and then to A (2) and so forth, one finds that the coefficient of x^{µ(r)−l} is a rational number and so the β (r)_j for j ≤ µ (r) are algebraic numbers, because they are roots of a polynomial which has rational


coefficients, namely the one in 10.17, hence roots of a polynomial with integer coefficients. Now 10.16 contradicts Corollary 10.2.6. ∎

Note this lemma is sufficient to prove Lindemann’s theorem that π is transcendental. Here is why. If π is algebraic, then so is iπ and so, from this lemma, e^0 + e^{iπ} ≠ 0, but this is not the case because e^{iπ} = −1.

The next theorem is the main result, the Lindemann Weierstrass theorem. It replaces the integers bi in the above lemma with algebraic numbers.

Theorem 10.2.8 Suppose a (1), · · · , a (n) are nonzero algebraic numbers and suppose α (1), · · · , α (n) are distinct algebraic numbers. Then
a (1) e^{α(1)} + a (2) e^{α(2)} + · · · + a (n) e^{α(n)} ≠ 0

Proof: Suppose a (j) ≡ a (j)_1 is a root of the polynomial
vj x^{mj} + · · · + uj
where vj, uj ≠ 0. Let the roots of this polynomial be a (j)_1, · · · , a (j)_{mj}. Suppose to the contrary that
a (1)_1 e^{α(1)} + a (2)_1 e^{α(2)} + · · · + a (n)_1 e^{α(n)} = 0
Then consider the big product
∏_{(i1,··· ,in), ik ∈ {1,··· ,mk}} ( a (1)_{i1} e^{α(1)} + a (2)_{i2} e^{α(2)} + · · · + a (n)_{in} e^{α(n)} )    (10.18)

the product taken over all ordered lists (i1, · · · , in). Since one of the factors in this product equals 0, this product equals
0 = b1 e^{β(1)} + b2 e^{β(2)} + · · · + bN e^{β(N)}    (10.19)
where the β (j) are the distinct exponents which result and the bk result from combining terms corresponding to a single β (k). The β (i) are clearly algebraic because they are sums of the α (i). I want to show that the bk are actually rational numbers. Since the product in 10.18 is taken for all ordered lists as described above, it follows that for a given k, if a (k)_i is switched with a (k)_j, that is, two of the roots of vk x^{mk} + · · · + uk are switched, then the product is unchanged and so 10.19 is also unchanged. Thus each bl is a symmetric polynomial in the a (k)_j, j = 1, · · · , mk, for each k. Consider then a particular bk. It follows
bk = ∑_{(j1,··· ,jmn)} A_{j1,··· ,jmn} a (n)_1^{j1} · · · a (n)_{mn}^{jmn}
and this is symmetric in the a (n)_1, · · · , a (n)_{mn} (note n is distinguished), the coefficients A_{j1,··· ,jmn} being in the commutative ring Q [A (1), · · · , A (n − 1)], where A (p) denotes a (p)_1, · · · , a (p)_{mp}, and so from Theorem 10.1.4,
bk = ∑_{(j1,··· ,jmn)} B_{j1,··· ,jmn} p_1 (a (n)_1, · · · , a (n)_{mn})^{j1} · · · p_{mn} (a (n)_1, · · · , a (n)_{mn})^{jmn}

where the B_{j1,··· ,jmn} are symmetric in {a (k)_j}_{j=1}^{mk} for each k ≤ n − 1, and the p_l are elementary symmetric polynomials. Now doing to B_{j1,··· ,jmn} what was just done to bk and continuing this way,


it follows bk is a finite sum of rational numbers times powers of elementary symmetric polynomials in the various {a (k)_j}_{j=1}^{mk} for k ≤ n. By Theorem 10.1.6 this is a rational number. Thus bk is a rational number

as desired. Multiplying by the product of all the denominators, it follows there exist integers ci such that
0 = c1 e^{β(1)} + c2 e^{β(2)} + · · · + cN e^{β(N)}
which contradicts Lemma 10.2.7. ∎

This theorem is sufficient to show e is transcendental. If it were algebraic, then e e^{−1} + (−1) e^0 ≠ 0, but this is not the case. If a ≠ 1 is algebraic, then ln (a) is transcendental. To see this, note that
1 e^{ln(a)} + (−1) a e^0 = 0
which cannot happen if ln (a) is algebraic, according to the above theorem. If a is algebraic and sin (a) ≠ 0, then sin (a) is transcendental because
(1/(2i)) e^{ia} − (1/(2i)) e^{−ia} + (−1) sin (a) e^0 = 0
which cannot occur if sin (a) is algebraic. There are doubtless other examples of numbers which are transcendental by this amazing theorem. For example, π is also transcendental. This is because 1 + e^{iπ} = 0. This couldn’t happen if π were algebraic because then so would be iπ.
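The exponential identities invoked above are easy to check numerically. The following short Python snippet (purely an illustrative sanity check of mine, not part of the argument) verifies them in floating point, using a = 0.7 and a = 2 as sample values.

```python
import cmath, math

a = 0.7  # any a with sin(a) != 0 illustrates the point
print(abs(cmath.exp(1j * math.pi) + 1))                                    # e^{i pi} = -1
print(abs(math.e * math.exp(-1) - 1))                                      # e * e^{-1} = 1
print(abs(math.exp(math.log(2.0)) - 2.0))                                  # e^{ln 2} = 2
print(abs((cmath.exp(1j * a) - cmath.exp(-1j * a)) / (2j) - math.sin(a)))  # sine identity
```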

10.3 The Fundamental Theorem of Algebra

This section is devoted to a mostly algebraic proof of the fundamental theorem of algebra. It depends on the interesting results about symmetric polynomials which are presented above. I found it in the Wikipedia article on the fundamental theorem of algebra, which gives several other proofs in addition to this one. According to this article, the first completely correct proof of this major theorem is due to Argand in 1806. Gauss and others did it earlier, but their arguments had gaps in them.

You can’t completely escape analysis when you prove this theorem. The necessary analysis is in the following lemma.

Lemma 10.3.1 Suppose p (x) = x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 where n is odd and the coefficients are real. Then p (x) has a real root.

Proof: This follows from the intermediate value theorem from calculus. ∎

Next is an algebraic consideration. First recall some notation:
∏_{i=1}^{m} a_i ≡ a_1 a_2 · · · a_m
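Lemma 10.3.1 can also be seen numerically: an odd-degree real polynomial changes sign for large |x|, so the intermediate value theorem gives a real root, which bisection will locate. A minimal Python sketch (the particular polynomial is just an example of mine):

```python
def p(t):
    # an odd-degree polynomial with real coefficients
    return t**5 - 3*t + 1

# p(-2) = -25 < 0 < 27 = p(2), so the intermediate value theorem gives a root in [-2, 2]
lo, hi = -2.0, 2.0
for _ in range(60):                 # bisection
    mid = (lo + hi) / 2
    if p(lo) * p(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(lo, p(lo))                    # a real root, with p(root) essentially 0
```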

Recall that a polynomial in {z1, · · · , zn} is symmetric only if it can be written as a sum of elementary symmetric polynomials raised to various powers multiplied by constants. The following is the main part of the theorem. In fact, this is one version of the fundamental theorem of algebra which people studied earlier, in the 1700’s.

Lemma 10.3.2 Let p (x) = x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 be a polynomial with real coefficients. Then it has a complex root.

Proof: It is possible to write n = 2^k m


where m is odd. If n is odd, k = 0. If n is even, keep dividing by 2 until you are left with an odd number. If k = 0, so that n is odd, it follows from Lemma 10.3.1 that p (x) has a real, hence complex root. The proof will be by induction on k, the case k = 0 being done. Suppose then that it works for n = 2^l m where m is odd and l ≤ k − 1, and let n = 2^k m where m is odd. Let {z1, · · · , zn} be the roots of the polynomial in a splitting field, the existence of this field being given by the above proposition. Then
p (x) = ∏_{j=1}^{n} (x − zj) = ∑_{k=0}^{n} (−1)^k pk (z1, · · · , zn) x^{n−k}    (10.20)
where pk (z1, · · · , zn) is the k-th elementary symmetric polynomial. Note this shows
a_{n−k} = pk (z1, · · · , zn) (−1)^k, a real number.    (10.21)

There is another polynomial which has coefficients which are sums of real numbers times the pk raised to various powers, and it is
q_t (x) ≡ ∏_{1≤i<j≤n} (x − (zi + zj + t zi zj)), t ∈ R

… [K : F] > 1. Then p (x) has a factor q (x) irreducible over F which has degree larger than 1. If not, you could factor p (x) into linear factors and so all the roots would be in F, so the dimension [K : F] would equal 1. Without loss of generality, let the roots of q (x) in K be {r1, · · · , rm}. Thus
q (x) = ∏_{i=1}^{m} (x − ri), p (x) = ∏_{i=1}^{n} (x − ri)

Now q̄ (x) ≡ η (q (x)), defined analogously to p̄ (x), also has degree at least 2. Furthermore, it divides p̄ (x), all of whose roots are in K̄. This is obvious because η is an isomorphism. You have l (x) q (x) = p (x), so l̄ (x) q̄ (x) = p̄ (x). Denote the roots of q̄ (x) in K̄ as {r̄1, · · · , r̄m} where they are counted according to multiplicity. Recall why [F (r1) : F] = m. It is because q (x) is irreducible and monic, so by Lemma 10.4.7 it is the minimum polynomial for each of the ri. Since q (x) is irreducible, 1, r1, r1^2, ..., r1^{m−1} must


be independent, so the dimension is at least m. However, it is not more than m because q (x) is of degree m. Thus, using the division algorithm, everything in F (r1) is expressible as a polynomial in r1 of degree less than m.

Then from Corollary 10.4.6, using q (x) and q̄ (x) in place of the p (x) and p̄ (x) in this corollary, there exist k ≤ m one to one homomorphisms (monomorphisms) ζ_i mapping F (r1) to K̄ ≡ F̄ (r̄1, · · · , r̄n), one for each distinct root of q̄ (x) in K̄. These are {ζ_1, ..., ζ_k} where k ≤ m. If the roots of p̄ (x) are distinct, then this is sufficient to imply that the roots of q̄ (x) are also distinct, and k = m = [F (r1) : F]. Otherwise, maybe k < m. (It is conceivable that q̄ (x) might have repeated roots in K̄.) Then by Proposition 3.4.13,
[K : F] = [K : F (r1)] [F (r1) : F], where [F (r1) : F] > 1,
and so [K : F (r1)] < [K : F]. Therefore, by induction, two things happen:

1.) Each of these one to one homomorphisms mapping F (r1) to K̄, called ζ_i for i ≤ k ≤ m = [F (r1) : F], extends to an isomorphism from K to K̄.
2.) For each of these ζ_i, there are no more than [K : F (r1)] extensions of these isomorphisms, exactly [K : F (r1)] in case the roots of p̄ (x) are distinct.

Therefore, if the roots of p̄ (x) are distinct, this has shown that there are
[K : F (r1)] m = [K : F (r1)] [F (r1) : F] = [K : F]
isomorphisms of K to K̄ which agree with η on F. If the roots of p̄ (x) are not distinct, then maybe there are fewer than [K : F] extensions of η.

Is this all of the isomorphisms? Suppose ζ is such an isomorphism of K and K̄. Then consider its restriction to F (r1). By Corollary 10.4.6, this restriction must coincide with one of the ζ_i chosen earlier. Then by induction, ζ is one of the extensions of the ζ_i just mentioned. Thus, in particular, K and K̄ are isomorphic. ∎

10.4.1 The Galois Group

First, here is the definition of a group.

Definition 10.4.10 A group G is a nonempty set with an operation, denoted here as ·, such that the following axioms hold. (Often the operation is composition.)

1. For α, β, γ ∈ G, (α · β) · γ = α · (β · γ). We usually don’t bother to write the ·.
2. There exists ι ∈ G such that αι = ια = α.
3. For every α ∈ G, there exists α^{−1} ∈ G such that αα^{−1} = α^{−1}α = ι.

In Theorem 10.4.9, consider the case where F = F̄ and the isomorphism of F with itself is just the identity.

Definition 10.4.11 When K is a finite extension of L, denote by G (K, L) the automorphisms of K which leave L fixed. For a finite set S, denote by |S| the number of elements of S.

Most of the following theorem was shown earlier in Theorem 10.4.9.

Theorem 10.4.12 Let K be the splitting field of p (x) over the field F. Thus K consists of F [a1, ..., an] where {a1, ..., an} are the roots of p (x). Then
|G (K, F)| ≤ [K : F]    (10.22)
When the roots of p̄ (x) = p (x) are distinct, equality holds in the above. If the roots are listed according to multiplicity, the automorphisms are determined by the permutations of the roots. When the roots are distinct, |G (K, F)| = n!. Also, G (K, F) is a group for the operation being composition.


Proof: So how large is |G (K, F)| in case p (x) is a polynomial of degree n which has n distinct roots? Let p (x) be a monic polynomial with roots in K, {r1, · · · , rn}, and suppose that none of the ri is in F. Thus
p (x) = x^n + a1 x^{n−1} + a2 x^{n−2} + · · · + an = ∏_{k=1}^{n} (x − rk), ai ∈ F

Thus K = F [r1, · · · , rn]. Let σ be a mapping from {r1, · · · , rn} to {r1, · · · , rn}, say rj → rij. In other words, σ produces a permutation of these roots. Consider the following way of obtaining something in G (K, F) from σ. If you have a typical thing in K, you can obtain another thing in K by replacing each rj with rij in an element of F [r1, · · · , rn], a polynomial which has coefficients in F. Furthermore, if you do this, then the resulting map from K to K is an automorphism, preserving the operations of multiplication and addition. Does it keep F fixed? Of course it does, because you don’t change the coefficients of the polynomials, which are always in F. Thus every permutation of the roots determines an automorphism of K.

Now suppose σ is an automorphism of K and the roots of p (x) are distinct. Does σ determine a permutation of the roots? If ri is a root, what of σ (ri)? Is it also a root, simply due to σ being an automorphism? Note that σ (0) = 0 and so 0 = σ (0) = σ (p (ri)) = p (σ (ri)), the last from the assumption that σ is an automorphism. Thus σ maps roots to roots. Since it is one to one and the roots are distinct, it must be a permutation. It follows that |G (K, F)| equals the number of permutations of {r1, · · · , rn}, which is n!, and that there is a one to one correspondence between the permutations of the roots and G (K, F). It is always the case that an automorphism takes roots to roots, but if the roots are repeated, then there may be fewer than n! of these automorphisms.

Now consider the claim about G (K, F) being a group. The associative law (α · β) · γ = α · (β · γ) is obvious. This is just the way composition acts. The identity ι is just the identity map, clearly an automorphism which fixes F. Each automorphism is, by definition, one to one and onto. Therefore, the inverse must also be an automorphism. Indeed, if σ (x), σ (y) are two generic things in K, then
σ^{−1} (σ (x) σ (y)) = σ^{−1} (σ (xy)) = xy = σ^{−1} (σ (x)) σ^{−1} (σ (y))
That σ^{−1} is an automorphism with respect to addition goes the same way. The estimate on the size is from 10.22. Does the inverse fix F? Consider α, α^2, · · · . Because of the estimate on the size of G (K, F), you must have α^m = α^n for some m < n. Hence multiply on the left by (α^{−1})^m to get ι = α^{n−m}. Thus α^{−1} = α^{(n−m)−1}, which is α raised to a nonnegative power. The right side leaves F fixed and so the left side does also. ∎

In the above, there is a field which is a finite extension of a smaller field, and the group of automorphisms which leave the given smaller field fixed was discussed. Next is a more general notion in which there is given a group of automorphisms. This group will determine a smaller field called a fixed field.

Definition 10.4.13 Let G be a group of automorphisms of a field K. Then denote by KG the fixed field of G. Thus
KG ≡ {x ∈ K : σ (x) = x for all σ ∈ G}

Lemma 10.4.14 Let G be a group of automorphisms of a field K. Then KG is a field.

Proof: It suffices to show that KG is closed with respect to the operations of the field K. Suppose then x, y ∈ KG. Is x + y ∈ KG? Is xy ∈ KG? This is obviously so because the things in G are automorphisms. Thus if θ ∈ G, θ (x + y) = θx + θy = x + y. It is similar with multiplication. ∎

There is another fundamental estimate, due to Artin, which is certainly not obvious. I also found this in [24]. There is more there about some of these things than what I am including. Above it was shown that |G (K, F)| ≤ [K : F].
This fundamental estimate goes the other direction when F is a fixed field.


Theorem 10.4.15 Let K be a field and let G be a finite group of automorphisms of K. Then
[K : KG] ≤ |G|    (10.23)

Proof: Let G = {σ1, · · · , σn}, σ1 = ι the identity map, and suppose {u1, · · · , um} is a linearly independent set in K with respect to the field KG. These σi are the automorphisms of K. Suppose m > n. Then consider the system of equations
σ1 (u1) x1 + σ1 (u2) x2 + · · · + σ1 (um) xm = 0
σ2 (u1) x1 + σ2 (u2) x2 + · · · + σ2 (um) xm = 0
⋮
σn (u1) x1 + σn (u2) x2 + · · · + σn (um) xm = 0    (10.24)
which is of the form M x = 0 for x ∈ K^m. Since M has more columns than rows, there exists a nonzero solution x ∈ K^m to the above system. Let the solution x be one which has the least possible number of nonzero entries. Without loss of generality, xk = 1 for some k. If σr (xk) = xk for all k and for each r, then the xk are each in KG and so the first equation would say
u1 x1 + u2 x2 + · · · + um xm = 0
with not all xi = 0, and this contradicts the linear independence of the ui. Therefore, there exists l ≠ k and σr such that σr (xl) ≠ xl. For purposes of illustration, say l > k. Now do σr to both sides of all the above equations. This yields, after reordering the resulting equations, a list of equations of the form
σ1 (u1) σr (x1) + · · · + σ1 (uk) 1 + · · · + σ1 (ul) σr (xl) + · · · + σ1 (um) σr (xm) = 0
σ2 (u1) σr (x1) + · · · + σ2 (uk) 1 + · · · + σ2 (ul) σr (xl) + · · · + σ2 (um) σr (xm) = 0
⋮
σn (u1) σr (x1) + · · · + σn (uk) 1 + · · · + σn (ul) σr (xl) + · · · + σn (um) σr (xm) = 0
This is because σ (1) = 1 if σ is an automorphism. It is of the form M σr (x) = 0. The original system in 10.24 is of the form
σ1 (u1) x1 + · · · + σ1 (uk) 1 + · · · + σ1 (ul) xl + · · · + σ1 (um) xm = 0
σ2 (u1) x1 + · · · + σ2 (uk) 1 + · · · + σ2 (ul) xl + · · · + σ2 (um) xm = 0
⋮
σn (u1) x1 + · · · + σn (uk) 1 + · · · + σn (ul) xl + · · · + σn (um) xm = 0
which will be denoted as M x = 0. Thus M (σr (x) − x) = 0, where y ≡ σr (x) − x ≠ 0. If any xj is 0, then σr (xj) = 0. Thus all zero entries of x remain 0 in y, and yk = 0 whereas xk ≠ 0, so y has fewer nonzero entries than x, contradicting the choice of x as the one with fewest nonzero entries such that M x = 0. ∎

With the above estimate, here is another relation between the fixed fields and subgroups of automorphisms.

Proposition 10.4.16 Let H be a finite group of automorphisms defined on a field K. Then for KH the fixed field,
G (K, KH) = H

Proof: If σ ∈ H, then by definition of KH, σ ∈ G (K, KH), so H ⊆ G (K, KH). Then by Theorem 10.4.15 and Theorem 10.4.12,
|H| ≥ [K : KH] ≥ |G (K, KH)| ≥ |H|


and so H = G (K, KH). ∎

For H a group of automorphisms, such as G (K, F), let Hx be all hx for h ∈ H. Thus Hx = x means hx = x for all h ∈ H, and KH = {x ∈ K : Hx = x}. Note how this proposition shows G (K, F) = G(K, K_{G(K,F)}). Thus |G (K, F)| = |G(K, K_{G(K,F)})| = [K : K_{G(K,F)}]. Is K_{G(K,F)} = F? If x ∈ F, G (K, F) x = x, so by definition x ∈ K_{G(K,F)} and so F ⊆ K_{G(K,F)}. However, if x ∈ K_{G(K,F)}, so that G (K, F) fixes x, it is not at all clear that x ∈ F. Maybe G (K, F) fixes more things than F. Later a situation is given in which K_{G(K,F)} = F.

Summary 10.4.17 The following are now available. (A small numerical illustration follows the list.)

1. Let K be the splitting field of p (x). Then |G (K, F)| ≤ [K : F]. If the roots of p (x) are distinct, then these are equal.
2. |G (K, F)| ≤ n! and, when the roots of p (x) are distinct, |G (K, F)| = n!.
3. If H is a finite group of automorphisms on an arbitrary field K, then G (K, KH) = H where KH is the fixed field of H.
4. F ⊆ K_{G(K,F)} but it is not clear that these are equal.
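As a concrete instance of items 3 and 4, take K = Q(√2) and H = {ι, σ} where σ (a + b√2) = a − b√2. The following Python sketch (my own illustration, with elements of K stored as pairs of rationals) checks that σ preserves addition and multiplication and that its fixed field is exactly Q, so [K : KH] = 2 = |H|.

```python
from fractions import Fraction as Q

# elements of K = Q(sqrt(2)) stored as pairs (a, b), meaning a + b*sqrt(2)
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, since r^2 = 2
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])

def sigma(x):
    # the non-identity automorphism: a + b*sqrt(2) -> a - b*sqrt(2)
    return (x[0], -x[1])

samples = [(Q(1), Q(2)), (Q(-3, 2), Q(5)), (Q(0), Q(7, 3)), (Q(4), Q(0))]
for x in samples:
    for y in samples:
        assert sigma(add(x, y)) == add(sigma(x), sigma(y))
        assert sigma(mul(x, y)) == mul(sigma(x), sigma(y))
# an element is fixed by sigma exactly when b = 0, i.e. when it lies in Q,
# so the fixed field is Q and [K : Q] = 2 = |H|, consistent with the estimates above
print("sigma is a field automorphism whose fixed field is Q")
```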

10.4.2 Normal Field Extensions

The following is the definition of a normal field extension. Definition 10.4.18 Let K be a finite dimensional extension of a field F such that every element of K is algebraic over F, that is, each element of K is a root of some polynomial in F [x]. Then K is called a normal extension if for every k ∈ K all roots of the minimum polynomial of k are contained in K. So what are some ways to tell that a field is a normal extension? It turns out that if K is a splitting field of f (x) ∈ F [x] , then K is a normal extension. I found this in [24]. This is an amazing result. Proposition 10.4.19 The following are valid 1. Let K be a splitting field of f (x) ∈ F [x]. Then K is a normal extension. 2. If L is an intermediate field between F and K where K is a normal field extension of F, then L is also a normal extension of F. Proof: 1.) Let r ∈ K ≡ F (a1 , ..., aq ) where {a1 , ..., aq } are the roots of f (x) and let g (x) be the minimum polynomial of r with coefficients in F. Thus, g (x) is an irreducible monic polynomial in F [x] having r as a root. It is required to show that every other root of g (x) is in K. Let the roots of g (x) in a splitting field be {r1 = r, r2 , · · · , rm }. Now g (x) is the minimum polynomial of rj over F because g (x) is irreducible by Lemma 10.4.7. By Theorem 10.4.5, there exists an isomorphism η of F (r1 ) and F (rj ) which fixes F and maps r1 to rj . Thus η is an extension of the identity on F. Now K (r1 ) and K (rj ) are splitting fields of f (x) over F (r1 ) and F (rj ) respectively. By Theorem 10.4.9, the two fields K (r1 ) and K (rj ) are isomorphic, the isomorphism, ζ extending η. Hence [K (r1 ) : K] = [K (rj ) : K] But r1 ∈ K and so K (r1 ) = K. Therefore, [K (rj ) : K] = 1 and so K = K (rj ) and so rj is also in K. Thus all the roots of g (x) are in K. 2.) Consider the last assertion. Suppose r = r1 ∈ L where the minimum polynomial for r is denoted by q (x). Then since K is a normal extension, all the roots of q (x) are in K. Let them be

228

CHAPTER 10. SOME ITEMS WHICH RESEMBLE LINEAR ALGEBRA

{r1, · · · , rm}. By Theorem 10.4.5 applied to the identity map on L, there exists an isomorphism θ : L (r1) → L (rj) which fixes L and takes r1 to rj. But this implies that
1 = [L (r1) : L] = [L (rj) : L]
Hence rj ∈ L also; if rj ∉ L, then {1, rj} would be independent and so the dimension would be at least 2. Since r was an arbitrary element of L, this shows that L is normal. ∎

10.4.3 Normal Subgroups and Quotient Groups

When you look at groups, one of the first things to consider is the notion of a normal subgroup. The word “normal” is greatly overused in math. Its meaning in this context is given next.

Definition 10.4.20 Let G be a group. A subset N of a group G is called a subgroup if it contains ι, the identity, and is closed with respect to the operation on G. That is, if α, β ∈ N, then αβ ∈ N. Then a subgroup N is said to be a normal subgroup if whenever α ∈ G,
α^{−1} N α ⊆ N
The important thing about normal subgroups is that you can define the quotient group G/N.

Definition 10.4.21 Let N be a subgroup of G. Define an equivalence relation ∼ as follows:
α ∼ β means α^{−1} β ∈ N
Why is this an equivalence relation? It is clear that α ∼ α because α^{−1} α = ι ∈ N since N is a subgroup. If α ∼ β, then α^{−1} β ∈ N and so, since N is a subgroup, (α^{−1} β)^{−1} = β^{−1} α ∈ N, which shows that β ∼ α. Now suppose α ∼ β and β ∼ γ. Then α^{−1} β ∈ N and β^{−1} γ ∈ N. Then since N is a subgroup, α^{−1} β β^{−1} γ = α^{−1} γ ∈ N and so α ∼ γ, which shows that it is an equivalence relation as claimed. Denote by [α] the equivalence class determined by α. Now in the case of N a normal subgroup, you can consider the quotient group.

Definition 10.4.22 Let N be a normal subgroup of a group G and define G/N as the set of all equivalence classes with respect to the above equivalence relation. Also define
[α] [β] ≡ [αβ]

Proposition 10.4.23 The above definition is well defined and it also makes G/N into a group.

Proof: First consider the claim that the definition is well defined. Suppose then that α ∼ ᾱ and β ∼ β̄. It is required to show that [αβ] = [ᾱβ̄]. Is (αβ)^{−1} ᾱβ̄ ∈ N? Is β^{−1} α^{−1} ᾱβ̄ ∈ N?
(αβ)^{−1} ᾱβ̄ = β^{−1} α^{−1} ᾱβ̄ = (β^{−1} (α^{−1} ᾱ) β)(β^{−1} β̄) = n1 n2 ∈ N
since α^{−1} ᾱ ∈ N and N is normal, so β^{−1} (α^{−1} ᾱ) β ∈ N, while β^{−1} β̄ ∈ N. Thus the operation is well defined. Clearly the identity is [ι] where ι is the identity in G, and the inverse is [α^{−1}] where α^{−1} is the inverse for α in G. The associative law is also obvious. ∎

Note that it was important to have the subgroup be normal in order to have the operation defined on the quotient group consisting of the set of equivalence classes.
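As a small concrete example (my own, not from the text), the following Python snippet forms the cosets of the normal subgroup A3 in S3 and checks that the product [g][h] ≡ [gh] does not depend on the representatives chosen.

```python
from itertools import permutations

def compose(s, t):
    # (s ∘ t)(i) = s(t(i)); a permutation is a tuple p with p[i] the image of i
    return tuple(s[t[i]] for i in range(len(s)))

def sign(p):
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

S3 = list(permutations(range(3)))
A3 = [p for p in S3 if sign(p) == 1]            # the normal subgroup N

def coset(g):
    # the equivalence class [g] = g N
    return frozenset(compose(g, n) for n in A3)

cosets = {coset(g) for g in S3}
print(len(cosets))                               # 2: S3/A3 has two elements
# [g][h] = [gh] is independent of the representatives g, h chosen
for C in cosets:
    for D in cosets:
        assert len({coset(compose(g, h)) for g in C for h in D}) == 1
print("coset multiplication is well defined")
```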


10.4.4 Separable Polynomials

This is a good time to make a very important observation about irreducible polynomials.

Lemma 10.4.24 Suppose q (x) ≠ p (x) are both irreducible polynomials over a field F. Then there is no root common to both p (x) and q (x).

Proof: If l (x) is a monic polynomial which divides them both, then l (x) must equal 1. Otherwise, it would equal p (x) and q (x), which would require these two to be equal. Thus p (x) and q (x) are relatively prime and there exist polynomials a (x), b (x) having coefficients in F such that
a (x) p (x) + b (x) q (x) = 1
Now if p (x) and q (x) share a root r, then (x − r) divides both sides of the above in K [x], where K is a field which contains all roots of both polynomials. But this is impossible. ∎

Now here is an important definition of a class of polynomials which yield equality in the inequality of Theorem 10.4.12. We know that if p (x) of this theorem has distinct roots, then equality holds. However, there is a more general kind of polynomial which also gives equality.

Definition 10.4.25 Let p (x) be a polynomial having coefficients in a field F. Also let K be a splitting field. Then p (x) is separable if it is of the form
p (x) = ∏_{i=1}^{m} qi (x)^{ki}

where each qi (x) is irreducible over F and each qi (x) has distinct roots in K. From the above lemma, no two qi (x) share a root. Thus
p1 (x) ≡ ∏_{i=1}^{m} qi (x)
has distinct roots in K.

Example 10.4.26 For example, consider the case where F = Q and the polynomial is of the form
(x^2 + 1)^2 (x^2 − 2)^2 = x^8 − 2x^6 − 3x^4 + 4x^2 + 4
Then let K be the splitting field over Q, Q [i, √2]. The polynomials x^2 + 1 and x^2 − 2 are irreducible over Q and each has distinct roots in K.

Then the following corollary is the reason why separable polynomials are so important. Also, one can show that if F contains a field which is isomorphic to Q, then every polynomial with coefficients in F is separable. This will be done later, after presenting the big results. This is equivalent to saying that the field has characteristic zero. In addition, the property of being separable holds in other situations.

Corollary 10.4.27 Let K be a splitting field of p (x) over the field F. Assume p (x) is separable. Then
|G (K, F)| = [K : F]

Proof: Just note that K is also the splitting field of p1 (x), the product of the distinct irreducible factors, and that from Lemma 10.4.24, p1 (x) has distinct roots. Thus the conclusion follows from Theorem 10.4.9 or 10.4.12. ∎

What if L is an intermediate field between F and K? Then p1 (x) still has coefficients in L and distinct roots in K, and so it also follows that
|G (K, L)| = [K : L]
Now the following says that you can start with L, go to the group G (K, L), and then to the fixed field of this group and end up back where you started. More precisely,


Proposition 10.4.28 If K is a splitting field of p (x) over the field F for separable p (x), and if L is a field between K and F, then K is also a splitting field of p (x) over L and also
L = K_{G(K,L)}
In every case, even if p (x) is not separable, L ⊆ K_{G(K,L)}.

Proof: First of all, I claim that L ⊆ K_{G(K,L)} in any case. This is because of the definition. If l ∈ L, then it is in the fixed field of G (K, L) since, by definition, G (K, L) fixes everything in L. Now suppose p (x) is separable. By the above Lemma 10.4.14 and Corollary 10.4.27,
|G (K, L)| = [K : L] = [K : K_{G(K,L)}] [K_{G(K,L)} : L] = |G(K, K_{G(K,L)})| [K_{G(K,L)} : L] = |G (K, L)| [K_{G(K,L)} : L]
which shows that [K_{G(K,L)} : L] = 1 and so it follows that L = K_{G(K,L)}. It is obvious that K is a splitting field of p (x) over L because L ⊇ F, so the coefficients of p (x) are in L. ∎

This has shown that, in the context of K being a splitting field of a separable polynomial over F and L being an intermediate field, L is a fixed field of a subgroup of G (K, F), namely G (K, L):
F ⊆ L = K_{G(K,L)} ⊆ K

In the above context, it is clear that G (K, L) ⊆ G (K, F), because if it fixes everything in L, then it fixes everything in the smaller field F. Then an obvious question is whether every subgroup of G (K, F) is obtained in the form G (K, L) for some intermediate field L. This leads to the following interesting correspondence in the case where K is a splitting field of a separable polynomial over a field F:

β : fixed fields → subgroups of G (K, F),   β (L) ≡ G (K, L)
α : subgroups of G (K, F) → fixed fields,   α (H) ≡ KH    (10.25)

Then αβL = L and βαH = H. Thus there exists a one to one correspondence between the fixed fields and the subgroups of G (K, F). The following theorem summarizes the above result.

Theorem 10.4.29 Let K be a splitting field of a separable polynomial p (x) over a field F. Then there exists a one to one correspondence between the fixed fields KH for H a subgroup of G (K, F) and the intermediate fields as described in the above. H1 ⊆ H2 if and only if KH1 ⊇ KH2. Also
|H| = [K : KH]

Proof: The one to one correspondence is established above in Proposition 10.4.16, because G (K, KH) = H whenever H is a subgroup of G (K, F). Thus each subgroup H determines an intermediate field KH. Going the other direction, if L is an intermediate field, it comes from a subgroup because G(K, K_{G(K,L)}) = G (K, L), so L = K_{G(K,L)} as mentioned earlier. The claim about the fixed fields is obvious because if the group is larger, then the fixed field must get smaller, since it is more difficult to fix everything using more automorphisms than with fewer automorphisms. Consider the estimate. From Theorem 10.4.15, |H| ≥ [K : KH]. But also, H = G (K, KH) from Proposition 10.4.16, and from Theorem 10.4.12 and what was just shown,
|H| = |G (K, KH)| ≤ [K : KH] ≤ |H|. ∎

Note that from the above discussion, when K is a splitting field of p (x) ∈ F [x], this implies that if L is an intermediate field, then it is also a fixed field of a subgroup of G (K, F). In fact, from the above, L = K_{G(K,L)}. If H is a subgroup, then it is also the Galois group H = G (K, KH). By Proposition 10.4.19, each of these intermediate fields L is also a normal extension of F. Here is a summary of the principal items obtained up till now.


Summary 10.4.30 When K is the splitting field of a separable polynomial with coefficients in F, the following are obtained. 1. There is a one to one correspondence between the fixed fields KH and the subgroups H of G (K, F). This is given by θ (H) ≡ KH . θ−1 (L) = G (K, L). that is H = G (K, KH ) whenever H is a subgroup of G (K, F). 2. All the intermediate fields are normal field extensions of F and are fixed fields L = KG(K,L) . 3. For H a subgroup of G (K, F), |H| = [K : KH ] , H = G (K, KH ) . Are the Galois groups G (L, F) for L an intermediate field between F and K for K the splitting field of a separable polynomial normal subgroups of G (K, F)? It might seem like a normal expectation to have. One would hope this is the case.

10.4.5 Intermediate Fields and Normal Subgroups

When K is a splitting field of a separable polynomial having coefficients in F, the intermediate fields are each normal extensions, from the above Proposition 10.4.19, which says that splitting fields are normal extensions. If L is one of these intermediate fields, what about G (L, F)? Is this a normal subgroup of G (K, F)? More generally, consider the following diagram, which has now been established in the case that K is a splitting field of a separable polynomial in F [x]:
F ≡ L0 ⊆ L1 ⊆ L2 ⊆ · · · ⊆ Lk−1 ⊆ Lk ≡ K
G (F, F) = {ι} ⊆ G (L1, F) ⊆ G (L2, F) ⊆ · · · ⊆ G (Lk−1, F) ⊆ G (K, F)    (10.26)

The intermediate fields Li are each normal extensions of F each element of Li being algebraic. As implied in the diagram, there is a one to one correspondence between the intermediate fields and the Galois groups displayed. Is G (Lj−1 , F) a normal subgroup of G (Lj , F)? Lemma 10.4.31 G (K, Lj ) is a normal subgroup of G (K, F). Here K is a splitting field for some polynomial having coefficients in F or more generally a normal extension of F. Proof: Let η ∈ G (K, F) and let σ ∈ G (K, Lj ) . Is η −1 ση ∈ G (K, Lj )? First I need to verify it is a automorphism on K. After this, I need to show that it fixes Lj . η −1 ση is obviously an automorphism on K because each in the product is. Does η −1 ση fix Lj ? Let r ∈ Lj with minimum polynomial f (x) having roots ri and coefficients in F. Then 0 = ηf (r) = f (η (r)) and so η (r) is one of the roots of f (x) . It follows that η (r) ∈ Lj because K is a normal extension and Lj is an intermediate field so is also a normal extension. See Proposition 10.4.19. Therefore, σ fixes η (r) and so η −1 ση (r) = η −1 η (r) = r.  Because of this lemma, it makes sense to consider the quotient group G (K, F) /G (K, Lj ). This leads to the following fundamental theorem of Galois theory. Theorem 10.4.32 Let K be a splitting field of a separable polynomial p (x) having coefficients in k a field F. Let {Li }i=0 be the increasing sequence of intermediate fields between F and K as shown above in 10.26. Then each of these is a normal extension of F (Proposition 10.4.19) and the Galois group G (K, Lj ) is a normal subgroup of G (K, F) and G (Lj , F) ≃ G (K, F) /G (K, Lj ) where the symbol ≃ indicates the two groups are isomorphic. Proof: All that remains is to check that the above isomorphism is valid. Let θ : G (K, F) /G (K, Lj ) → G (Lj , F) , θ [σ] ≡ σ|Lj In other words, this is just the restriction of σ to Lj . Thus the quotient group is well defined by Proposition 10.4.23. Is θ well defined? First of all, does it have values in G (Lj , F)? In other words,


if σ ∈ G (K, F), does its restriction to Lj send Lj to Lj? If r ∈ Lj, it has a minimum polynomial q (x) with coefficients in F. σ (r) is one of the other roots of q (x) (Theorem 10.4.12) so, since K is a normal extension, being a splitting field of a separable polynomial, σ (r) ∈ K. But these subfields are all normal extensions, so σ (r) ∈ Lj. Thus θ has values in G (Lj, F).

Is θ well defined? If [σ1] = [σ2], then by definition σ1^{−1} σ2 ∈ G (K, Lj), so σ1^{−1} σ2 fixes everything in Lj. Thus if r ∈ Lj, σ1^{−1} σ2 r = r and so σ2 r = σ1 r. It follows that the restrictions of σ1 and σ2 to Lj are equal. Therefore, θ is well defined. It is obvious that θ is a homomorphism.

Why is θ onto? This follows right away from Theorem 10.4.9. Note that K is the splitting field of p (x) over Lj since Lj ⊇ F. Also, if σ ∈ G (Lj, F), so it is an automorphism of Lj, then, since it fixes F, p (x) = p̄ (x) in that theorem. Thus σ extends to ζ, an automorphism of K. Thus θζ = σ.

Why is θ one to one? If θ [σ] = θ [α], this means σ = α on Lj. Thus σα^{−1} is the identity on Lj. Hence σα^{−1} ∈ G (K, Lj), which is what it means for [σ] = [α]. ∎

The following picture is a summary of what has just been shown.

Lk ≡ K = K_{G(K,F)}    G (K, F)           ≃ G (K, F) /G (K, K)
  ⋮                     ⋮                   ⋮
Lj = K_{G(Lj,F)}        G (Lj, F)          ≃ G (K, F) /G (K, Lj)
  ⋮                     ⋮                   ⋮
L1 = K_{G(L1,F)}        G (L1, F)          ≃ G (K, F) /G (K, L1)
F ≡ L0                  G (L0, F) = {ι}    ≃ G (K, F) /G (K, F)

10.4.6 Permutations

As explained above, the automorphisms of a splitting field K of p (x) ∈ F [x] are determined by the permutations of the roots of p (x). Thus it makes sense to consider permutations. Let {a1, · · · , an} be a set of distinct elements. Then a permutation of these elements is usually thought of as a list in a particular order. Thus there are exactly n! permutations of a set having n distinct elements. With this definition, here is a simple lemma.

Lemma 10.4.33 Every permutation can be obtained from every other permutation by a finite number of switches.

Proof: This is obvious if n = 1 or 2. Suppose then that it is true for sets of n − 1 elements. Take two permutations of {a1, · · · , an}, P1, P2. To get from P1 to P2 using switches, first make a switch to obtain the last element in the list coinciding with the last element of P2. By induction, there are switches which will arrange the first n − 1 to the right order. ∎

It is customary to consider permutations in terms of the set In ≡ {1, · · · , n} to be more specific. Then one can think of a given permutation as a mapping σ from this set In to itself which is one to one and onto. In fact, σ (i) ≡ j where j is in the i-th position. Often people write such a σ in the following form:
\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}    (10.27)
meaning 1 → i1, 2 → i2, ..., where {i1, i2, ..., in} = {1, 2, ..., n}. An easy way to understand the above permutation is through the use of matrix multiplication by permutation matrices. The vector (i1, · · · , in)^T is obtained by
( e_{i1} e_{i2} · · · e_{in} ) (1, 2, · · · , n)^T    (10.28)
This can be seen right away from looking at a simple example or by using the definition of matrix multiplication directly.
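The idea in 10.28 is easy to experiment with. The following Python sketch (an illustration of mine using numpy) builds a permutation matrix for a sample permutation, here with e_{i_k} as its rows so that it sends (1, · · · , n)^T to (i1, · · · , in)^T, and also computes its determinant, which is the sign of the permutation defined next.

```python
import numpy as np

perm = [3, 1, 4, 2]                 # sample permutation: 1 -> 3, 2 -> 1, 3 -> 4, 4 -> 2
n = len(perm)
P = np.zeros((n, n))
for k, ik in enumerate(perm):
    P[k, ik - 1] = 1                # row k of P is e_{i_k}
v = np.arange(1, n + 1)             # the vector (1, 2, ..., n)^T
print(P @ v)                        # [3. 1. 4. 2.] = (i1, ..., in)^T
print(round(np.linalg.det(P)))      # -1: the sign of this permutation
```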


Definition 10.4.34 The sign of the permutation 10.27 is defined as the determinant of the above matrix in 10.28. In other words, the sign of the permutation
\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}
equals sgn (i1, · · · , in), defined earlier in Lemma 8.1.1.

Note that, from the fact that the determinant is well defined and from its properties, the sign of a permutation is 1 if and only if the permutation is produced by an even number of switches, and the number of switches used to produce a given permutation must be either always even or always odd. Of course a switch is a permutation itself, and this is called a transposition. Note also that all these matrices are orthogonal matrices, so to take the inverse it suffices to take a transpose, the inverse also being a permutation matrix. The resulting group consisting of the permutations of In is called Sn.

An important idea is the notion of a cycle. Let σ be a permutation, a one to one and onto function defined on In. A cycle is of the form
(k, σ (k), σ^2 (k), σ^3 (k), · · · , σ^{m−1} (k)), where σ^m (k) = k.
The last condition must hold for some m because In is finite. Then a cycle can be considered as a permutation as follows. Let (i1, i2, · · · , im) be a cycle. Then define σ by σ (i1) = i2, σ (i2) = i3, · · · , σ (im) = i1, and if k ∉ {i1, i2, · · · , im}, then σ (k) = k.

Note that if you have two cycles (i1, i2, · · · , im), (j1, j2, · · · , jm) which are disjoint in the sense that {i1, i2, · · · , im} ∩ {j1, j2, · · · , jm} = ∅, then they commute. It is then clear that every permutation can be represented in a unique way by disjoint cycles. Start with 1 and form the cycle determined by 1. Then start with the smallest k ∈ In which was not included and begin a cycle starting with this. Continue this way. Use the convention that (k) is just the identity, sending k to k and all other indices to themselves. This representation is unique up to order of the cycles, which does not matter because they commute. Note that a transposition can be written as (a, b), a → b and b → a. A cycle can be written as a product of non-disjoint transpositions:
(i1, i2, · · · , im) = (im−1, im) · · · (i3, im) (i2, im) (i1, im)
Thus if m is odd, the permutation has sign 1, and if m is even, the permutation has sign −1. Also, it is clear that the inverse of the above permutation is (i1, i2, · · · , im)^{−1} = (im, · · · , i2, i1). For example, (1, 2, 3) = (2, 3) (1, 3).

Definition 10.4.35 An is the subgroup of Sn such that for σ ∈ An, σ is the product of an even number of transpositions. It is called the alternating group. Since each transposition switches a pair of columns in the above permutation matrix, the sign of the determinant, which is the sign of the permutation, is always 1 for permutations in An. This is another way to describe An: those permutations with sign 1. If n = 1, there is only one permutation and it is the identity, so A1 = identity. If n = 2, you would have two permutations, the identity and the transposition (1, 2). Thus A2 = identity. It might be useful to think of the identity map as having zero transpositions. The following important result is useful in describing An.

Proposition 10.4.36 Let n ≥ 3. Then every permutation in An is the product of 3 cycles and the identity.


Proof: In case n = 3, you can list all of the permutations in An:
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}

In terms of cycles, these are the identity, (1, 2, 3), (1, 3, 2). You can easily check that the last two are inverses of each other. Now suppose n ≥ 4. The permutations in An are defined as the product of an even number of transpositions. There are two cases. The first case is where you have two transpositions which share a number,
(a, c) (c, b) = (a, c, b)
Thus when they share a number, the product is just a 3 cycle. Next suppose you have the product of two transpositions which are disjoint. This can happen because n ≥ 4. First note that
(a, b) = (c, b) (b, a, c) = (c, b, a) (c, a)
Therefore,
(a, b) (c, d) = (c, b, a) (c, a) (a, d) (d, c, a) = (c, b, a) (c, a, d) (d, c, a)
and so every product of disjoint transpositions is the product of 3 cycles. ∎

Lemma 10.4.37 If n ≥ 5, then if B is a normal subgroup of An, and B is not the identity, then B must contain a 3 cycle.

Proof: Let α be the permutation in B which is “closest” to the identity without being the identity. That is, out of all permutations which are not the identity, this is one which has the most fixed points or, equivalently, moves the fewest numbers. Then α is the product of disjoint cycles. Suppose that the longest cycle is the first one and it has at least four numbers. Thus
α = (i1, i2, i3, i4, · · · , m) γ1 · · · γp
Since B is normal,
α1 ≡ (i3, i2, i1) (i1, i2, i3, i4, · · · , m) (i1, i2, i3) γ1 · · · γp ∈ B
Then since the various cycles are disjoint,
α1 α^{−1} = (i3, i2, i1) (i1, i2, i3, i4, · · · , m) (i1, i2, i3) γ1 · · · γp (m, · · · , i4, i3, i2, i1) γp^{−1} · · · γ1^{−1}
= (i3, i2, i1) (i1, i2, i3, i4, · · · , m) (i1, i2, i3) (m, · · · , i4, i3, i2, i1) γ1 · · · γp γp^{−1} · · · γ1^{−1}
= (i3, i2, i1) (i1, i2, i3, i4, · · · , m) (i1, i2, i3) (m, · · · , i4, i3, i2, i1)
Then for this permutation, i1 → i3, i2 → i2, i3 → i4, i4 → i1. The other numbers not in {i1, i2, i3, i4} are fixed, and in addition i2 is fixed, which did not happen with α. Therefore, this new permutation moves only 3 numbers. Since it is assumed that m ≥ 4, this is a contradiction to α fixing the most points. It follows that
α = (i1, i2, i3) γ1 · · · γp    (10.29)
or else
α = (i1, i2) γ1 · · · γp    (10.30)
In the first case 10.29, say γ1 = (i4, i5, · · ·). Multiply as follows:
α1 = (i4, i2, i1) (i1, i2, i3) (i4, i5, · · ·) γ2 · · · γp (i1, i2, i4) ∈ B


Then form α1 α^{−1} ∈ B, given by
(i4, i2, i1) (i1, i2, i3) (i4, i5, · · ·) γ2 · · · γp (i1, i2, i4) γp^{−1} · · · γ1^{−1} (i3, i2, i1)
= (i4, i2, i1) (i1, i2, i3) (i4, i5, · · ·) (i1, i2, i4) (· · · , i5, i4) (i3, i2, i1)
Then i1 → i4, i2 → i3, i3 → i5, i4 → i2, i5 → i1 and other numbers are fixed. Thus α1 α^{−1} moves 5 points. However, α moves more than 5 if γi is not the identity for any i ≥ 2. It follows that α = (i1, i2, i3) γ1 and γ1 can only be a transposition. However, this cannot happen because then the above α would not even be in An. Therefore, γ1 = ι and so
α = (i1, i2, i3)
Thus in this case, B contains a 3 cycle.

Now consider case 10.30. None of the γi can be a cycle of length more than 4, since the above argument would eliminate this possibility. If any has length 3, then the above argument implies that α equals this 3 cycle. It follows that each γi must be a 2 cycle. Say
α = (i1, i2) (i3, i4) γ2 · · · γp
Thus it moves at least four numbers, more than four if any of the γi for i ≥ 2 is not the identity. As before,
α1 ≡ (i4, i2, i1) (i1, i2) (i3, i4) γ2 · · · γp (i1, i2, i4) = (i4, i2, i1) (i1, i2) (i3, i4) (i1, i2, i4) γ2 · · · γp ∈ B
Then
α1 α^{−1} = (i4, i2, i1) (i1, i2) (i3, i4) (i1, i2, i4) γ2 · · · γp γp^{−1} · · · γ2^{−1} (i3, i4) (i1, i2)
= (i4, i2, i1) (i1, i2) (i3, i4) (i1, i2, i4) (i3, i4) (i1, i2) ∈ B
Then i1 → i3, i2 → i4, i3 → i1, i4 → i2, so this moves exactly four numbers. Therefore, none of the γi is different from the identity for i ≥ 2. It follows that
α = (i1, i2) (i3, i4)    (10.31)
and α moves exactly four numbers. Then since B is normal,
α1 ≡ (i5, i4, i3) (i1, i2) (i3, i4) (i3, i4, i5) ∈ B
Then
α1 α^{−1} = (i5, i4, i3) (i1, i2) (i3, i4) (i3, i4, i5) (i3, i4) (i1, i2) ∈ B
Then i1 → i1, i2 → i2, i3 → i4, i4 → i5, i5 → i3. Thus this permutation moves only three numbers, and so α cannot be of the form given in 10.31. It follows that case 10.30 does not occur. ∎

Definition 10.4.38 A group G is said to be simple if its only normal subgroups are itself and the identity. The following major result is due to Galois [24].

Proposition 10.4.39 Let n ≥ 5. Then An is simple.


Proof: From Lemma 10.4.37, if B is a normal subgroup of An, B ≠ {ι}, then it contains a 3 cycle α = (i1, i2, i3),
\begin{pmatrix} i_1 & i_2 & i_3 \\ i_2 & i_3 & i_1 \end{pmatrix}
Now let (j1, j2, j3) be another 3 cycle,
\begin{pmatrix} j_1 & j_2 & j_3 \\ j_2 & j_3 & j_1 \end{pmatrix}
Let σ be a permutation which satisfies σ (ik) = jk. Then
σασ^{−1} (j1) = σα (i1) = σ (i2) = j2
σασ^{−1} (j2) = σα (i2) = σ (i3) = j3
σασ^{−1} (j3) = σα (i3) = σ (i1) = j1
while σασ^{−1} leaves all other numbers fixed. Thus σασ^{−1} is the given 3 cycle. It follows that B contains every 3 cycle, not just a particular one. By Proposition 10.4.36, this implies B = An. The only problem is that it is not known whether σ is in An, a product of an even number of transpositions. This is where n ≥ 5 is used. If necessary, you can modify σ on two numbers not equal to any of the {i1, i2, i3} by multiplying by a transposition, so that the possibly modified σ is expressed as an even number of transpositions. ∎
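The relabeling step σασ^{−1} in this proof is easy to watch numerically. The following Python snippet (an illustration of mine, with permutations stored as tuples acting on {0, ..., n−1}) conjugates the 3 cycle (0, 1, 2) by a permutation σ and recovers the 3 cycle (σ(0), σ(1), σ(2)).

```python
def compose(s, t):
    # (s ∘ t)(i) = s(t(i)); a permutation is a tuple p with p[i] the image of i
    return tuple(s[t[i]] for i in range(len(s)))

def inverse(s):
    r = [0] * len(s)
    for i, v in enumerate(s):
        r[v] = i
    return tuple(r)

def cycle3(a, b, c, n):
    # the 3 cycle a -> b -> c -> a as a tuple on {0, ..., n-1}
    p = list(range(n))
    p[a], p[b], p[c] = b, c, a
    return tuple(p)

n = 6
alpha = cycle3(0, 1, 2, n)
sigma = (3, 5, 0, 1, 2, 4)                               # an arbitrary permutation of {0, ..., 5}
conj = compose(compose(sigma, alpha), inverse(sigma))
print(conj == cycle3(sigma[0], sigma[1], sigma[2], n))   # True
```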

10.4.7 Solvable Groups

Recall the fundamental theorem of Galois theory, which established a correspondence between the normal subgroups of G (K, F) and normal field extensions whenever K is the splitting field of a separable polynomial p (x). Also recall that if H is one of these normal subgroups, then there was an isomorphism between G (KH, F) and the quotient group G (K, F) /H. The general idea of a solvable group is given next.

Definition 10.4.40 A group G is solvable if there exists a decreasing sequence of subgroups {Hi}_{i=0}^{m} such that Hi is a normal subgroup of Hi−1,
G = H0 ⊇ H1 ⊇ · · · ⊇ Hm = {ι},
and each quotient group Hi−1/Hi is Abelian. That is, for [a], [b] ∈ Hi−1/Hi,
[ab] = [a] [b] = [b] [a] = [ba]

Note that if G is an Abelian group, then it is automatically solvable. In fact, you can just consider H0 = G, H1 = {ι}. In this case H0/H1 is just the group G, which is Abelian. Also, the definition requires Hm−1 to be Abelian.

There is another idea which helps in understanding whether a group is solvable. It involves the commutator subgroup. This is a very good idea.

Definition 10.4.41 Let a, b ∈ G, a group. Then the commutator is
a b a^{−1} b^{−1}
The commutator subgroup, denoted by G′, is the smallest subgroup which contains all the commutators.


The nice thing about the commutator subgroup is that it is a normal subgroup. There are also many other amazing properties.

Theorem 10.4.42 Let G be a group and let G′ be the commutator subgroup. Then G′ is a normal subgroup. Also the quotient group G/G′ is Abelian. If H is any normal subgroup of G such that G/H is Abelian, then H ⊇ G′. If G′ = {ι}, then G must be Abelian.

Proof: The elements of G′ are just finite products of things like a b a^{−1} b^{−1}. Note that the inverse of something like this is also one of these:
(a b a^{−1} b^{−1})^{−1} = b a b^{−1} a^{−1}.
Thus the collection of finite products is indeed a subgroup. Now consider h ∈ G. Then
h a b a^{−1} b^{−1} h^{−1} = h a h^{−1} h b h^{−1} h a^{−1} h^{−1} h b^{−1} h^{−1} = (h a h^{−1})(h b h^{−1})(h a h^{−1})^{−1}(h b h^{−1})^{−1}
which is another one of those commutators. Thus for c a commutator and h ∈ G, h c h^{−1} = c1, another commutator. If you have a product of commutators c1 c2 · · · cm, then
h c1 c2 · · · cm h^{−1} = ∏_{i=1}^{m} h ci h^{−1} = ∏_{i=1}^{m} di ∈ G′
where the di are each commutators. Hence G′ is a normal subgroup.

Consider now the quotient group. Is [g] [h] = [h] [g]? In other words, is [gh] = [hg]? In other words, is gh (hg)^{−1} = g h g^{−1} h^{−1} ∈ G′? Of course. This is a commutator, and G′ consists of products of these things. Thus the quotient group is Abelian.

Now let H be a normal subgroup of G such that G/H is Abelian. Then if g, h ∈ G,
[gh] = [hg], gh (hg)^{−1} = g h g^{−1} h^{−1} ∈ H
Thus every commutator is in H, and so H ⊇ G′. The last assertion is obvious because G/{ι} is isomorphic to G. Also, to say that G′ = {ι} is to say that a b a^{−1} b^{−1} = ι, which implies that ab = ba. ∎

Let G be a group and let G′ be its commutator subgroup. Then the commutator subgroup of G′ is G′′ and so forth. To save on notation, denote by G^{(k)} the k-th commutator subgroup. Thus you have the sequence
G ≡ G^{(0)} ⊇ G^{(1)} ⊇ G^{(2)} ⊇ G^{(3)} ⊇ · · ·
each G^{(i)} being a normal subgroup of G^{(i−1)}, although this does not say that G^{(i)} is a normal subgroup of G. Then there is a useful criterion for a group to be solvable.

Theorem 10.4.43 If G is a solvable group and Ĝ is a subgroup of G, then Ĝ is also solvable.

238

CHAPTER 10. SOME ITEMS WHICH RESEMBLE LINEAR ALGEBRA

Theorem 10.4.44 Let G be a group. It is solvable if and only if G′ is solvable, equivalently G^{(k)} = {ι} for some k.

Proof: If G^{(k)} = {ι}, then G is clearly solvable because G^{(k−1)}/G^{(k)} is Abelian by Theorem 10.4.42. The sequence of commutator subgroups provides the necessary sequence of subgroups. Next suppose that you have
G = H0 ⊇ H1 ⊇ · · · ⊇ Hm = {ι}
where each is normal in the preceding one and the quotient groups are Abelian. Then from Theorem 10.4.42, G^{(1)} ⊆ H1. Thus H1′ ⊇ G^{(2)}. But also, from Theorem 10.4.42, since H1/H2 is Abelian, H2 ⊇ H1′ ⊇ G^{(2)}. Continuing this way, G^{(k)} = {ι} for some k ≤ m.
Alternatively, you could let Ĝ = G′. This is solvable by Theorem 10.4.43 since it is a subgroup of G. ∎

Theorem 10.4.45 If G is a solvable group and if H is a homomorphic image of G, then H is also solvable.

Proof: By the above theorem, it suffices to show that H^{(k)} = {ι} for some k. Let f be the homomorphism. Then H′ = f (G′). To see this, consider a commutator of H, f (a) f (b) f (a)^{−1} f (b)^{−1} = f (a b a^{−1} b^{−1}). It follows that H^{(1)} = f (G^{(1)}). Now continue this way, letting G^{(1)} play the role of G and H^{(1)} the role of H. Thus, since G is solvable, some G^{(k)} = {ι} and so H^{(k)} = {ι} also. ∎

Now, as an important example of a group which is not solvable, here is a theorem.

Theorem 10.4.46 For n ≥ 5, Sn is not solvable.

Proof: It is clear that An is a normal subgroup of Sn because if σ is a permutation, then it has the same sign as σ^{−1}. Thus σασ^{−1} ∈ An if α ∈ An, because both α and σασ^{−1} are a product of an even number of transpositions. If H is a normal subgroup of Sn for which Sn/H is Abelian, then H contains the commutator subgroup Sn′. However, ασα^{−1}σ^{−1} ∈ An obviously, so An ⊇ Sn′. By Proposition 10.4.39 (An is simple), this forces Sn′ = An. So what is Sn′′? If it is An, then Sn^{(k)} ≠ {ι} for any k, and it follows that Sn is not solvable. If Sn′′ = {ι}, the only other possibility, then An/{ι} is Abelian and so An is Abelian, but this is obviously false because the cycles (1, 2, 3), (2, 1, 4) are both in An. However, (1, 2, 3) (2, 1, 4) is
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 2 & 1 & 3 \end{pmatrix}
while (2, 1, 4) (1, 2, 3) is

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 4 & 2 \end{pmatrix}

Alternatively, by Theorem 10.4.43, if Sn is solvable, then so is An . However, An is simple so there is no normal subgroup other than An and ι. Now An / {ι} = An is not commutative for n ≥ 4.  Note that the above shows that An is not Abelian for n = 4 also.
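The derived series referred to in these arguments can be computed by brute force for small n. The following Python sketch (my own illustration; permutations are tuples and subgroups are obtained by closing a set of commutators under composition) shows the series for S4 terminating at the identity, while for S5 it stabilizes at a subgroup of order 60 (namely A5), so S5 is not solvable.

```python
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(s)))

def inverse(s):
    r = [0] * len(s)
    for i, v in enumerate(s):
        r[v] = i
    return tuple(r)

def generated(gens, n):
    # subgroup generated by gens: close under composition (finite, so inverses appear automatically)
    group = {tuple(range(n))} | set(gens)
    changed = True
    while changed:
        changed = False
        for g in list(group):
            for h in list(group):
                p = compose(g, h)
                if p not in group:
                    group.add(p)
                    changed = True
    return group

def derived_subgroup(G, n):
    comms = {compose(compose(a, b), compose(inverse(a), inverse(b))) for a in G for b in G}
    return generated(comms, n)

def derived_series_sizes(n):
    G = set(permutations(range(n)))
    sizes = [len(G)]
    while True:
        D = derived_subgroup(G, n)
        if len(D) == len(G):
            break                      # series has stabilized without reaching the identity
        G = D
        sizes.append(len(G))
        if len(G) == 1:
            break
    return sizes

print(derived_series_sizes(4))   # [24, 12, 4, 1]  -> S4 is solvable
print(derived_series_sizes(5))   # [120, 60]       -> stabilizes at A5, so S5 is not solvable
```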

10.4.8 Solvability by Radicals

The idea here is to begin with a big field F and show that there is no way to solve the polynomial in terms of radicals of things in the big field. It will then follow that there is no way to get a solution in terms of radicals of things in a smaller field like the rational numbers. The most interesting conclusion is what will be presented here, that you can’t do it. This amazing conclusion is due to


Abel and Galois and dates from the 1820’s. It put a stop to the search for formulas which would solve polynomial equations. First of all, in the case where all fields are contained in C, there exists a field which has all the nth roots of 1. You could simply define it to be the smallest sub field of C such that it contains these roots. You could also enlarge it by including some other numbers. For example, you could include Q. Observe that if ξ ≡ ei2π/n , then ξ n = 1 but ξ k ̸= 1 if k < n and that if k < l < n, ξ k ̸= ξ l . The following is from Herstein [19]. This is the kind of field considered here. Lemma 10.4.47 Suppose a field F has all the nth roots of 1 for a particular n and suppose there exists ξ such that the nth roots of 1 are of the form ξ k for k = 1, · · · , n, the ξ k being distinct, as is the case when all fields are in C. Let a ∈ F be nonzero. Let K denote the splitting field of xn − a over F, thus K is a normal extension of F. Then K = F (u) where u is any root of xn − a. The Galois group G (K, F) is Abelian. Proof: Let u be a root of xn − a and let K equal F (u) . Then let ξ be the nth root of unity mentioned. Then ( )n k ξ k u = (ξ n ) un = a { } and so each ξ k u is a root of xn − a and these are distinct. It follows that u, ξu, · · · , ξ n−1 u are the roots of xn − a and all are in F (u) . Thus F (u) = K. Let σ ∈ G (K, F) and observe that since σ fixes F, (( )n ) ( ( ))n −a 0 = σ ξk u − a = σ ξk u It follows that σ maps roots of xn − a to roots of xn − a. Therefore, if σ, α are two elements of G (K, F) , there exist i, j each no larger than n − 1 such that σ (u) = ξ i u, α (u) = ξ j u A typical thing in F (u) is p (u) where p (x) ∈ F [x]. Then ( ) ( ) σα (p (u)) = p ξ j ξ i u = p ξ i+j u ( ) ( ) ασ (p (u)) = p ξ i ξ j u = p ξ i+j u Therefore, G (K, F) is Abelian.  Thus this one is clearly solvable as noted above. To say a polynomial is solvable by radicals is expressed precisely in the following definition. Definition 10.4.48 For F a field, a polynomial p (x) ∈ F [x] is solvable by radicals over F ≡ F0 if there are algebraic numbers ai , i = 1, 2, ..., k, and a sequence of fields F1 = F (a1 ) , F2 = F1 (a2 ) , · · · , Fk = Fk−1 (ak ) such that for each i ≥ 1, aki i ∈ Fi−1 and Fk contains a splitting field K for p (x) over F. Actually, the only case of interest here is included in the following lemma. Lemma 10.4.49 In Definition 10.4.48 when the roots of unity are of the form ξ k as described in Lemma 10.4.47, Fk is a splitting field provided you assume F contains all the nth roots of 1 for all k n ≤ max {ki }i=1 . ({ } { }kk −1 ) k1 −1 j and so Fk is , ..., aj1 Proof: by Lemma 10.4.47, Fk = F (a1 , a2 , · · · , ak ) = F a1 j=1 j=1 ) ( ∏k the splitting field of i=1 xki − aki i . Each ai is a single root of xki − aki i where aki i ∈ F.  At this point, it is a good idea to recall the big fundamental theorem mentioned above which gives the correspondence between normal subgroups and normal field extensions since it is about to be used again. F ≡ F0 ⊆ F1 ⊆ F2 ··· G (F, F) = {ι} ⊆ G (F1 , F) ⊆ G (F2 , F) · · ·

⊆ Fk−1 ⊆ Fk ≡ K
⊆ G (Fk−1 , F) ⊆ G (Fk , F)    (10.32)


Theorem 10.4.50 Let K be a splitting field for a separable polynomial p (x) ∈ F [x]. Let {Fi }i=0 be the increasing sequence of intermediate fields between F and K. Then each of these is a normal extension of F and the Galois group G (Fj−1 , F) is a normal subgroup of G (Fj , F). In addition to this, G (Fj , F) ≃ G (K, F) /G (K, Fj ) where the symbol ≃ indicates the two spaces are isomorphic. Theorem 10.4.51 Let f (x) be a separable polynomial in F [x] where F contains all nth roots of unity for each n ∈ N or for all n ≤ m and the roots of unity are of the form ξ k as described in Lemma 10.4.47. Let K be a splitting field of f (x) . If f (x) is solvable by radicals over F, or solvable by radicals over F with the ki ≤ m in Definition 10.4.48, then the Galois group G (K, F) is a solvable group. Proof: Using the definition given above for f (x) to be solvable by radicals, there is a sequence of fields F0 = F ⊆ F1 ⊆ · · · ⊆ Fk , K ⊆ Fk , where Fi = Fi−1 (ai ), aiki ∈ Fi−1 , and each field extension is a normal extension of the preceding one. By Lemma 10.4.49, Fk is the splitting field of a polynomial having coefficients in Fj−1 . This follows from the Lemma 10.4.49 above. Then it follows from Theorem 10.4.50, letting Fj−1 play the role of F, that G (Fj , Fj−1 ) ≃ G (Fk , Fj−1 ) /G (Fk , Fj ) By Lemma 10.4.47, the Galois group G (Fj , Fj−1 ) is Abelian and so this and the above isomorphism requires that G (Fk , F) is a solvable group since the quotient groups are Abelian. By Theorem 10.4.43, it follows that, since G (K, F) is a subgroup of G (Fk , F) , it must also be solvable.  Now consider the equation p (x) = xn − a1 xn−1 + a2 xn−2 + · · · ± an , p (x) ∈ F [x] , n ≥ 5 and suppose that p (x) has distinct roots, none of them in F. Let K be a splitting field for p (x) over F so that n ∏ (x − ri ) p (x) = k=1

Then it follows that ai = si (r1 , · · · , rn ) where the si are the elementary symmetric functions defined in Definition 10.1.3. For σ ∈ G (K, F) you can define σ ¯ ∈ Sn by the rule σ ¯ (k) ≡ j where σ (rk ) = rj . Recall that the automorphisms of G (K, F) take roots of p (x) to roots of p (x). This mapping σ → σ ¯ is onto, a homomorphism, and one to one and onto because the symmetric functions si are unchanged when the roots are permuted. Thus a rational function in s1 , s2 , · · · , sn is unaffected when the roots rk are permuted. It follows that G (K, F) cannot be solvable if n ≥ 5 because Sn is not solvable. 1 3 For example, consider 3x5 − 25x3 + 45x + 1 or equivalently x5 − 25 3 x + 15x + 3 ∈ Q (x) . It clearly has no rational roots and a graph will show it has 5 real roots. Let F = Q (ω) where ω denotes all k th roots of unity for k ≤ 5. Then some computations show that none of these roots of the polynomial are in F and they are all distinct. Thus the polynomial cannot be solved by radicals involving k th roots for k ≤ 5 of numbers in Q. In fact, it can’t be solved by radicals involving k th roots for k ≤ 5 of(√ numbers in Q (ω) . √ ) Recall that Q 2 can be written as a+b 2 where a, b are rational. However, algebraic numbers are roots of polynomials having rational coefficients. Can each of these be written in this way in terms of radicals. It was just shown that, surprisingly, this is not the case. It is a little like the fact


from real analysis that it is extremely difficult to give an explicit description of a generic Borel set, except that the present situation seems even worse because in the case of Borel sets, you can sort of do it provided you use enough hard set theory. Thus you must use the definition of algebraic numbers described above. It is also pointless to search for the equivalent of the quadratic formula for polynomials of degree 5 or more.
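For the example 3x^5 − 25x^3 + 45x + 1 mentioned above, the claims that it has five distinct real roots and no rational roots are easy to check numerically. Here is a small sanity check in Python using numpy; the library choice and the tolerance are my own, not part of the text.

import numpy as np

coeffs = [3, 0, -25, 0, 45, 1]                     # 3x^5 - 25x^3 + 45x + 1
roots = np.roots(coeffs)
real_roots = sorted(r.real for r in roots if abs(r.imag) < 1e-9)
print(len(real_roots), real_roots)                 # 5 distinct real roots

def p(x):
    return 3 * x**5 - 25 * x**3 + 45 * x + 1

for c in (1, -1, 1/3, -1/3):                       # the candidates allowed by the rational root test
    print(c, p(c))                                 # none of these values is 0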

10.5

A Few Generalizations

This section considers some more general situations. In particular, it is worthwhile to identify conditions under which all polynomials are separable, in order to generalize Theorem 10.4.51.

10.5.1

The Normal Closure of a Field Extension

An algebraic extension F (a1 , a2 , · · · , am ) is contained in a field which is a normal extension of F. To begin with, recall the following definition. Definition 10.5.1 When you have F (a1 , · · · , am ) with each ai algebraic so that F (a1 , · · · , am ) is a field, you could consider m ∏ f (x) ≡ fi (x) i=1

where fi (x) is the minimum polynomial of ai . Then if K is a splitting field for f (x) , this K is called the normal closure. It is at least as large as F (a1 , · · · , am ) and it has the advantage of being a normal extension. { } Let G (K, F) = η 1 , η 2 , · · · , η q . The conjugate fields are defined as the fields η j (F (a1 , · · · , am )) Thus each of these fields is isomorphic to any other and they are all contained in K. Let K′ denote the smallest field contained in K which contains all of these conjugate fields. Note that if k ∈ F (a1 , · · · , am ) so that η i (k) is in one of these conjugate fields, then η j η i (k) is also in a conjugate field because η j η i is one of the automorphisms of G (K, F). Let { } S = k ∈ K′ : η j (k) ∈ K′ each j . Then from what was just shown, each conjugate field is in S. Suppose k ∈ S. What about k −1 ? ( ) ( ) η j (k) η j k −1 = η j kk −1 = η j (1) = 1 ( )−1 ( ) ( )−1 ( ) and so η j (k) = η j k −1 . Now η j (k) ∈ K′ because K′ is a field. Therefore, η j k −1 ∈ K′ . Thus S is closed with respect to taking inverses. It is also closed with respect to products. Thus it is clear that S is a field which contains each conjugate field. However, K′ was defined as the smallest field which contains the conjugate fields. Therefore, S = K′ and so this shows that each η j maps K′ to itself while fixing F. Thus G (K, F) ⊆ G (K′ , F) because each of the η i is in G (K′ , F). This is what was just shown. However, since K′ ⊆ K, it follows that also G (K′ , F) ⊆ G (K, F) . Therefore, G (K′ , F) = G (K, F) and by the one to one correspondence between the intermediate fields and the Galois groups, it follows that K′ = K. If K′ is a proper subset of K then you would need to have G (K′ , F) a proper subgroup of G (K, F) but these are equal. This proves the following lemma. Lemma 10.5.2 Let K denote the normal extension of F (a1 , · · · , am ) with each ai algebraic so that F (a1 , · · · , am ) is a field. Thus K is the splitting field of the product of the minimum polynomials of the ai . Then K is also the smallest field containing the conjugate fields η j (F (a1 , · · · , am )) for { } η 1 , η 2 , · · · , η q = G (K, F). Lemma 10.5.3 In Definition 10.4.48, you can assume that Fk is a normal extension of F.


Proof: First note that Fk = F [a1 , a2 , · · · , ak ]. Let G be the normal extension of Fk . By Lemma 10.5.2, G is the smallest field which contains the conjugate fields ( ) η j (F (a1 , a2 , · · · , ak )) = F η j a1 , η j a2 , · · · , η j ak ( ) ( )ki for {η 1 , η 2 , · · · , η m } = G (Fk , F). Also, η j ai = η j aki i ∈ η j Fi−1 , η j F = F. Then G = F (η 1 (a1 ) , η 1 (a2 ) , · · · , η 1 (ak ) , η 2 (a1 ) , η 2 (a2 ) , · · · , η 2 (ak ) · · · ) and this is a splitting field so is a normal extension. Thus G could be the new Fk with respect to a longer sequence of ai but would now be a splitting field. 

10.5.2

Conditions for Separability

So when is it that a polynomial having coefficients in a field F is separable? It turns out that this is always the case for fields which are enough like the rational numbers. It involves considering the derivative of a polynomial. In doing this, there will be no analysis used, just the rule for differentiation which we all learned in calculus. Thus the derivative is defined as follows. ( )′ an xn + an−1 xn−1 + · · · + a1 x + a0 ≡

nan xn−1 + an−1 (n − 1) xn−2 + · · · + a1

This kind of formal manipulation is what most students do anyway, never thinking about where it comes from. Here nan means to add an to itself n times. With this definition, it is clear that the usual rules such as the product rule hold. This discussion follows [24]. Definition 10.5.4 A field has characteristic 0 if na ̸= 0 for all n ∈ N and a ̸= 0. Otherwise a field F has characteristic p if p · 1 = 0 for p · 1 defined as 1 added to itself p times and p is the smallest positive integer for which this takes place. Note that with this definition, some of the terms of the derivative of a polynomial could vanish in the case that the field has characteristic p. I will go ahead and write them anyway. For example, if the field has characteristic p, then ′ (xp − a) = 0 because formally it equals p · 1xp−1 = 0xp−1 , the 1 being the 1 in the field. Note that the field Zp does not have characteristic 0 because p · 1 = 0. Thus not all fields have characteristic 0. How can you tell if a polynomial has no repeated roots? This is the content of the next theorem. Theorem 10.5.5 Let p (x) be a monic polynomial having coefficients in a field F, and let K be a field in which p (x) factors n ∏ (x − ri ) , ri ∈ K. p (x) = i=1

Then the ri are distinct if and only if p (x) and p′ (x) are relatively prime over F. Proof: Suppose first that p′ (x) and p (x) are relatively prime over F. Since they are not both zero, there exists polynomials a (x) , b (x) having coefficients in F such that a (x) p (x) + b (x) p′ (x) = 1 Now suppose p (x) has a repeated root r. Then in K [x] , 2

p (x) = (x − r) g (x)


and so p′ (x) = 2 (x − r) g (x) + (x − r) g ′ (x). Then in K [x] , ( ) 2 2 a (x) (x − r) g (x) + b (x) 2 (x − r) g (x) + (x − r) g ′ (x) = 1 2

Then letting x = r, it follows that 0 = 1. Hence p (x) has no repeated roots. Next suppose there are no repeated roots of p (x). Then p′ (x) =

p′ (x) = ∑_{i=1}^{n} ∏_{j≠i} (x − rj )

p′ (x) cannot be zero in this case because

p′ (rn ) = ∏_{j=1}^{n−1} (rn − rj ) ≠ 0

because it is the product of nonzero elements of K. Similarly no term in the sum for p′ (x) can equal zero because ∏_{j≠i} (ri − rj ) ≠ 0.

Then if q (x) is a monic polynomial of degree larger than 1 which divides p (x), then the roots of q (x) in K are a subset of {r1 , · · · , rn }. Without loss of generality, suppose these roots of q (x) are {r1 , · · · , rk } , k ≤ n − 1, since q (x) divides p′ (x) which has degree at most n − 1. Then ∏k q (x) = i=1 (x − ri ) but this fails to divide p′ (x) as polynomials in K [x] and so q (x) fails to divide p′ (x) as polynomials in F [x] either. Therefore, q (x) = 1 and so the two are relatively prime.  The following lemma says that the usual calculus result holds in case you are looking at polynomials with coefficients in a field of characteristic 0. Lemma 10.5.6 Suppose that F has characteristic 0. Then if f ′ (x) = 0, it follows that f (x) is a constant. Proof: Suppose f (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 Then 0xn + 0xn−1 + · · · + 0x + 0 = nan xn−1 + an−1 (n − 1) xn−2 + · · · + a1 Therefore, each coefficient on the right is 0. Since the field has characteristic 0 it follows that each ak = 0 for k ≥ 1. Thus f (x) = a0 ∈ F.  If F has characteristic p as in Zp for p prime, this is not true. Indeed, xp − 1 is not constant but has derivative equal to 0. Now here is a major result which applies to fields of characteristic 0. Theorem 10.5.7 If F is a field of characteristic 0, then every polynomial p (x) , having coefficients in F is separable. Proof: It is required to show that the irreducible factors of p (x) have distinct roots in K a splitting field for p (x). So let q (x) be an irreducible, non constant, monic polynomial. Thus q ′ (x) ̸= 0 because the field has characteristic 0. If l (x) is a monic polynomial of positive degree which divides both q (x) and q ′ (x) , then since q (x) is irreducible, it must be the case that l (x) = q (x) or l (x) = 1. If l (x) = q (x) , then this forces q (x) to divide q ′ (x) , a nonzero polynomial having smaller degree than q (x) . This is impossible. Hence l (x) = 1 and so q ′ (x) and q (x) are relatively prime which implies that q (x) has distinct roots.  It follows that the above theory all holds for any field of characteristic 0. For example, if the field is Q then everything holds.
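The criterion of Theorem 10.5.5 is easy to experiment with over Q using a computer algebra system. The following sketch uses sympy (my choice of tool, not the text's): a polynomial has a repeated root exactly when it fails to be relatively prime to its derivative.

from sympy import symbols, gcd, Poly

x = symbols('x')
p_repeated = Poly((x - 1)**2 * (x + 2), x)            # repeated root at x = 1
p_separable = Poly((x - 1) * (x + 2) * (x - 3), x)    # three distinct roots

for q in (p_repeated, p_separable):
    g = gcd(q, q.diff(x))
    print(q.as_expr(), '  gcd with derivative:', g.as_expr())
# The first gcd is x - 1 (not relatively prime), the second is 1 (relatively prime).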


Proposition 10.5.8 If a field F has characteristic p, then p is a prime. Proof: First note that if n · 1 = 0, if and only if for all a ̸= 0, n · a = 0 also. This just follows from the distributive law and the definition of what is meant by n · 1, meaning that you add 1 to itself n times. Suppose then that there are positive integers, each larger than 1 n, m such that nm · 1 = 0. Then grouping the terms in the sum associated with nm · 1, it follows that n (m · 1) = 0. If the characteristic of the field is nm, this is a contradiction because then m · 1 ̸= 0 but n times it is, implying that n < nm but n · a = 0 for a nonzero a. Hence n · 1 = 0 showing that mn is not the characteristic of the field after all.  Definition 10.5.9 A field F is called perfect if every polynomial p (x) having coefficients in F is separable. The above shows that fields of characteristic 0 are perfect. The above theory about Galois groups and fixed fields all works for perfect fields. What about fields of characteristic p where p is a prime? The following interesting lemma has to do with a nonzero a ∈ F having a pth root in F. Lemma 10.5.10 Let F be a field of characteristic p. Let a ̸= 0 where a ∈ F. Then either xp − a is p irreducible or there exists b ∈ F such that xp − a = (x − b) . Proof: Suppose that xp − a is not irreducible. Then xp − a = g (x) f (x) where the degree of g (x) , k is less than p and at least as large as 1. Then let b be a root of g (x). Then bp − a = 0. Therefore, p xp − a = xp − bp = (x − b) . p

That is right: x^p − b^p = (x − b)^p , just like many beginning calculus students believe. It happens because of the binomial theorem and the fact that the other terms have a factor of p. Hence

x^p − a = (x − b)^p = g (x) f (x)

and so g (x) divides (x − b) which requires that g (x) = (x − b) since g (x) has degree k. It follows, since g (x) is given to have coefficients in F, that bk ∈ F. Also bp ∈ F. Since k, p are relatively prime, due to the fact that k < p with p prime, there are integers m, n such that 1 = mk + np Then from what you mean by raising b to an integer power and the usual rules of exponents for integer powers, ( )m n b = bk (bp ) ∈ F.  So when is a field of characteristic p perfect? As observed above, for a field of characteristic p, p

(a + b)^p = a^p + b^p . Also, (ab)^p = a^p b^p . It follows that a → a^p is a homomorphism. This homomorphism is also one to one because, as mentioned above, (a − b)^p = a^p − b^p , so if a^p = b^p , then a = b. Let Fp be the collection of a^p for a ∈ F. Then clearly Fp is a subfield of F because it is the image of a one to one homomorphism. What follows is the condition for a field of characteristic p to be perfect. Theorem 10.5.11 Let F be a field of characteristic p. Then F is perfect if and only if F = Fp .


Proof: Suppose F = Fp first. Let f (x) be an irreducible polynomial over F. By Theorem 10.5.5, if f ′ (x) and f (x) are relatively prime over F then f (x) has no repeated roots. Suppose then that the two polynomials are not relatively prime. If d (x) divides both f (x) and f ′ (x) with degree of d (x) ≥ 1. Then, since f (x) is irreducible, it follows that d (x) is a multiple of f (x) and so f (x) divides f ′ (x) which is impossible unless f ′ (x) = 0. But if f ′ (x) = 0, then f (x) must be of the form a0 + a1 xp + a2 x2p + · · · + an xnp since if it had some other nonzero term with exponent not a multiple of p then f ′ (x) could not equal zero since you would have something surviving in the expression for the derivative after taking out multiples of p which is like kaxk−1 where a ̸= 0 and k < p. Thus ka ̸= 0. Hence the form of f (x) is as indicated above. If ak = bpk for some bk ∈ F, then the expression for f (x) is bp + bp1 xp + bp2 x2p + · · · + bpn xnp (0 )p = b0 + b1 x + bx x2 + · · · + bn xn because of the fact noted earlier that a → ap is a homomorphism. However, this says that f (x) is not irreducible after all. It follows that there exists ak such that ak ∈ / Fp contrary to the assumption p ′ that F = F . Hence the greatest common divisor of f (x) and f (x) must be 1. p Next consider the other direction. Suppose F ̸= Fp . Then there exists a ∈ F \ F . Consider the polynomial xp − a. As noted above, its derivative equals 0. Therefore, xp − a and its derivative cannot be relatively prime. In fact, xp − a would divide both.  Now suppose F is a finite field. If n·1 is never equal to 0 then, since the field is finite, k·1 = m·1, for some k < m. m > k, and (m − k) · 1 = 0 which is a contradiction. Hence F is a field of characteristic p for some prime p, by Proposition 10.5.8. The mapping a → ap was shown to be a homomorphism which is also one to one. Therefore, Fp is a subfield of F. It follows that it has characteristic q for some q a prime. However, this requires q = p and so Fp = F. Then the following corollary is obtained from the above theorem. With this information, here is a convenient version of the fundamental theorem of Galois theory. Theorem 10.5.12 Let K be a splitting field of any polynomial p (x) ∈ F [x] where F is either k of characteristic 0 or of characteristic p with Fp = F. Let {Li }i=0 be the increasing sequence of intermediate fields between F and K. Then each of these is a normal extension of F and the Galois group G (Lj−1 , F) is a normal subgroup of G (Lj , F). In addition to this, G (Lj , F) ≃ G (K, F) /G (K, Lj ) where the symbol ≃ indicates the two spaces are isomorphic.
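For the finite fields Zp discussed above, the condition of Theorem 10.5.11 can be seen directly: by Fermat's little theorem the Frobenius map a → a^p is the identity on Zp , so Zp = (Zp)^p and Zp is perfect. A two-line check of this for p = 7, as my own illustration:

p = 7
print(all(pow(a, p, p) == a for a in range(p)))                      # True: a^p = a in Z_7
print(sorted({pow(a, p, p) for a in range(p)}) == list(range(p)))    # True: every element is a p-th power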


Part II

Analysis and Geometry in Linear Algebra


Chapter 11

Normed Linear Spaces

In addition to the algebraic aspects of linear algebra presented earlier, there are many analytical and geometrical concepts which are usually included. This material involves the special fields R and C instead of general fields. It is these things which are typically generalized in functional analysis. The main new idea is that the notion of distance is included. This allows one to consider continuity, compactness, and many other topics from calculus. First is a general treatment of the notion of distance which has nothing to do with linear algebra but is a useful part of the vocabulary leading most efficiently to the inclusion of analytical topics.

11.1

Metric Spaces

This section is here to provide definitions and main theorems about fundamental analytical ideas and terminology. The first part is on metric spaces which really have absolutely nothing to do with linear algebra but they provide a convenient framework for discussion of the analytical aspects of linear algebra.

11.1.1

Open and Closed Sets, Sequences, Limit Points, Completeness

It is most efficient to discuss things in terms of abstract metric spaces to begin with.
Definition 11.1.1 A nonempty set X is called a metric space if there is a function d : X × X → [0, ∞) which satisfies the following axioms.
1. d (x, y) = d (y, x)
2. d (x, y) ≥ 0 and equals 0 if and only if x = y
3. d (x, y) + d (y, z) ≥ d (x, z)
This function d is called the metric. We often refer to it as the distance.
Definition 11.1.2 An open ball, denoted as B (x, r) , is defined as follows. B (x, r) ≡ {y : d (x, y) < r} A set U is said to be open if whenever x ∈ U, it follows that there is r > 0 such that B (x, r) ⊆ U . More generally, a point x is said to be an interior point of U if there exists such a ball. In words, an open set is one for which every point is an interior point. For example, you could have X be a subset of R and d (x, y) = |x − y|. Then the first thing to show is the following.
Proposition 11.1.3 An open ball is an open set.


Proof: Suppose y ∈ B (x, r) . We need to verify that y is an interior point of B (x, r). Let δ = r − d (x, y) . Then if z ∈ B (y, δ) , it follows that d (z, x) ≤ d (z, y) + d (y, x) < δ + d (y, x) = r − d (x, y) + d (y, x) = r Thus y ∈ B (y, δ) ⊆ B (x, r).  Definition 11.1.4 Let S be a nonempty subset of a metric space. Then p is a limit point (accumulation point) of S if for every r > 0 there exists a point different than p in B (p, r) ∩ S. Sometimes people denote the set of limit points as S ′ . A related idea is the notion of the limit of a sequence. Recall that a sequence is really just a ∞ mapping from N to X. We write them as {xn } or {xn }n=1 if we want to emphasize the values of n. Then the following definition is what it means for a sequence to converge. Definition 11.1.5 We say that x = limn→∞ xn when for every ε > 0 there exists N such that if n ≥ N, then d (x, xn ) < ε Often we write xn → x for short. This is equivalent to saying lim d (x, xn ) = 0.

n→∞
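As a concrete anchor for these abstract definitions, here is a small sketch (my own, in Python) for the familiar case X ⊆ R with d (x, y) = |x − y|: it spot-checks the three metric axioms on a handful of points and then checks the sequence xn = 1/n against the ε–N definition of a limit. A finite check like this is of course not a proof; it is only meant to make the definitions concrete.

def d(x, y):
    return abs(x - y)

pts = [0.0, 1.0, -2.5, 3.75]
assert all(d(x, y) == d(y, x) for x in pts for y in pts)                            # axiom 1
assert all(d(x, y) >= 0 and (d(x, y) == 0) == (x == y) for x in pts for y in pts)   # axiom 2
assert all(d(x, y) + d(y, z) >= d(x, z) for x in pts for y in pts for z in pts)     # axiom 3

def first_N(seq, x, eps, limit=10**6):
    # Smallest N (up to `limit`) with d(x, seq(N)) < eps; Definition 11.1.5 then asks
    # that the inequality keep holding for all larger n, which is checked below.
    return next(n for n in range(1, limit) if d(x, seq(n)) < eps)

N = first_N(lambda n: 1.0 / n, 0.0, 1e-3)
print(N, all(d(0.0, 1.0 / n) < 1e-3 for n in range(N, N + 1000)))                   # 1001 True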

Proposition 11.1.6 The limit is well defined. That is, if x, x′ are both limits of a sequence, then x = x′ . Proof: From the definition, there exist N, N ′ such that if n ≥ N, then d (x, xn ) < ε/2 and if n ≥ N ′ , then d (x, xn ) < ε/2. Then let M ≥ max (N, N ′ ) . Let n > M. Then d (x, x′ ) ≤ d (x, xn ) + d (xn , x′ )
< ε/2 + ε/2 = ε. Since ε > 0 is arbitrary, d (x, x′ ) = 0 and so x = x′ . 
Theorem 11.1.7 Let S be a nonempty subset of a metric space. Then p is a limit point of S if and only if there exists a sequence {xn } of distinct points of S, none of them equal to p, such that limn→∞ xn = p. Proof: =⇒ Suppose p is a limit point. Pick x1 ∈ B (p, 1) ∩ S with x1 ̸= p. Having chosen distinct points x1 , · · · , xn of S, none equal to p, pick xn+1 ∈ B (p, rn+1 ) ∩ S with xn+1 ̸= p, where rn+1 < min (1/ (n + 1) , d (x1 , p) , · · · , d (xn , p)) . Then the xn are distinct points of S, none equal to p, and xn → p. ⇐= Conversely, if such a sequence exists, then for every r > 0, B (p, r) contains xn ∈ S for all n large enough. Hence, p is a limit point because none of these xn are equal to p. 

Definition 11.1.8 A set H is closed means H C is open. Note that this says that the complement of an open set is closed. If V is open, then the ( )C complement of its complement is itself. Thus V C = V an open set. Hence V C is closed. Then the following theorem gives the relation between closed sets and limit points. Theorem 11.1.9 A set H is closed if and only if it contains all of its limit points. Proof: =⇒ Let H be closed and let p be a limit point. We need to verify that p ∈ H. If it is not, then since H is closed, its complement is open and so there exists δ > 0 such that B (p, δ) ∩ H = ∅. However, this prevents p from being a limit point. ⇐= Next suppose H has all of its limit points. Why is H C open? If p ∈ H C then it is not a limit point and so there exists δ > 0 such that B (p, δ) has no points of H. In other words, H C is open. Hence H is closed. 


Corollary 11.1.10 A set H is closed if and only if whenever {hn } is a sequence of points of H which converges to a point x, it follows that x ∈ H. Proof: =⇒ Suppose H is closed and hn → x. If x ∈ H there is nothing left to show. If x ∈ / H, then from the definition of limit, it is a limit point of H. Hence x ∈ H after all. ⇐= Suppose the limit condition holds, why is H closed? Let x ∈ H ′ the set of limit points of H. By Theorem 11.1.7 there exists a sequence of points of H, {hn } such that hn → x. Then by assumption, x ∈ H. Thus H contains all of its limit points and so it is closed by Theorem 11.1.9.  Next is the important concept of a subsequence. ∞

Definition 11.1.11 Let {xn }n=1 be a sequence. Then if n1 < n2 < · · · is a strictly increasing ∞ ∞ sequence of indices, we say {xnk }k=1 is a subsequence of {xn }n=1 . The really important thing about subsequences is that they preserve convergence. Theorem 11.1.12 Let {xnk } be a subsequence of a convergent sequence {xn } where xn → x. Then lim xnk = x

k→∞

also. Proof: Let ε > 0 be given. Then there exists N such that d (xn , x) < ε if n ≥ N. It follows that if k ≥ N, then nk ≥ N and so d (xnk , x) < ε if k ≥ N. This is what it means to say limk→∞ xnk = x.  Another useful idea is the distance to a set. Definition 11.1.13 Let (X, d) be a metric space and let S be a nonempty set in X. Then dist (x, S) ≡ inf {d (x, y) : y ∈ S} . The following lemma is the fundamental result. Lemma 11.1.14 The function, x → dist (x, S) is continuous and in fact satisfies |dist (x, S) − dist (y, S)| ≤ d (x, y) . Proof: Suppose dist (x, S) is as least as large as dist (y, S). Then pick z ∈ S such that d (y, z) ≤ dist (y, S) + ε. Then |dist (x, S) − dist (y, S)| =

dist (x, S) − dist (y, S) ≤ d (x, z) − (d (y, z) − ε) = d (x, z) − d (y, z) + ε ≤ d (x, y) + d (y, z) − d (y, z) + ε = d (x, y) + ε.

Since ε > 0 is arbitrary, this proves the lemma. The argument is the same if dist (x, S) ≤ dist (y, S). Just switch the roles of x and y. 
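When S is a finite subset of R, the infimum in Definition 11.1.13 is just a minimum, so the Lipschitz estimate of Lemma 11.1.14 can be observed directly. A small illustration of my own, not from the text:

def d(x, y):
    return abs(x - y)

def dist_to_set(x, S):
    return min(d(x, s) for s in S)

S = [0.0, 2.0, 5.5]
for x, y in [(1.0, 1.5), (-3.0, 4.0), (2.25, 6.0)]:
    gap = abs(dist_to_set(x, S) - dist_to_set(y, S))
    print(x, y, gap, d(x, y), gap <= d(x, y))   # the last entry is always True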


11.1.2

Cauchy Sequences, Completeness

Of course it does not go the other way. For example, you could let xn = (−1) and it has a convergent subsequence but fails to converge. Here d (x, y) = |x − y| and the metric space is just R. However, there is a kind of sequence for which it does go the other way. This is called a Cauchy sequence. Definition 11.1.15 {xn } is called a Cauchy sequence if for every ε > 0 there exists N such that if m, n ≥ N, then d (xn , xm ) < ε Now the major theorem about this is the following. Theorem 11.1.16 Let {xn } be a Cauchy sequence. Then it converges if and only if any subsequence converges. Proof: =⇒ This was just done above. ⇐= Suppose now that {xn } is a Cauchy sequence and limk→∞ xnk = x. Then there exists N1 such that if k > N1 , then d (xnk , x) < ε/2. From the definition of what it means to be Cauchy, there exists N2 such that if m, n ≥ N2 , then d (xm , xn ) < ε/2. Let N ≥ max (N1 , N2 ). Then if k ≥ N, then nk ≥ N and so ε ε d (x, xk ) ≤ d (x, xnk ) + d (xnk , xk ) < + = ε (11.1) 2 2 It follows from the definition that limk→∞ xk = x.  Definition 11.1.17 A metric space is said to be complete if every Cauchy sequence converges. Another nice thing to note is this. Proposition 11.1.18 If {xn } is a sequence and if p is a limit point of the set S = ∪∞ n=1 {xn } then there is a subsequence {xnk } such that limk→∞ xnk = x. Proof: By Theorem 11.1.7, there exists a sequence of distinct points of S denoted as {yk } such that none of them equal p and limk→∞ yk = p. Thus B (p, r) contains infinitely many different points of the set D, this for every r. Let xn1 ∈ B (p, 1) where n1 is the first index such that xn1 ∈ B (p, 1). Suppose xn1 , · · · , xnk have been chosen, the ni increasing and let 1 > δ 1 > δ 2 > · · · > δ k where xni ∈ B (p, δ i ) . Then let { } ( ) 1 , d p, x δ k+1 ≤ min , δ , j = 1, 2 · · · , k n j j 2k+1 Let xnk+1 ∈ B (p, δ k+1 ) where nk+1 is the first index such that xnk+1 is contained B (p, δ k+1 ). Then lim xnk = p. 

k→∞

Another useful result is the following. Lemma 11.1.19 Suppose xn → x and yn → y. Then d (xn , yn ) → d (x, y). Proof: Consider the following. d (x, y) ≤ d (x, xn ) + d (xn , y) ≤ d (x, xn ) + d (xn , yn ) + d (yn , y) so d (x, y) − d (xn , yn ) ≤ d (x, xn ) + d (yn , y) Similarly d (xn , yn ) − d (x, y) ≤ d (x, xn ) + d (yn , y)


and so |d (xn , yn ) − d (x, y)| ≤ d (x, xn ) + d (yn , y) and the right side converges to 0 as n → ∞.  First are some simple lemmas featuring one dimensional considerations. In these, the metric space is R and the distance is given by d (x, y) ≡ |x − y| First recall the nested interval lemma. You should have seen something like it in calculus, but this is often not the case because there is much more interest in trivialities like integration techniques. Lemma 11.1.20 Let [ak , bk ] ⊇ [ak+1 , bk+1 ] for all k = 1, 2, 3, · · · . Then there exists a point p in ∩∞ k=1 [ak , bk ]. Proof: We note that for any k, l, ak ≤ bl . Here is why. If k ≤ l, then ak ≤ al ≤ bl If k > l, then bl ≥ bk ≥ ak It follows that for each l, sup ak ≤ bl k

Hence supk ak is a lower bound to the set of all bl and so it is no larger than the greatest lower bound. It follows that supk ak ≤ inf l bl .

Pick x ∈ [supk ak , inf l bl ]. Then for every k, ak ≤ x ≤ bk . Hence x ∈ ∩∞ k=1 [ak , bk ] .  Lemma 11.1.21 The closed interval [a, b] is compact. This means that if there is a collection of open intervals of the form (a, b) whose union includes all of [a, b] , then in fact [a, b] is contained in the union of finitely many of these open intervals. Proof: Let C be a set of open intervals the union of which includes all of [a, b] and suppose [a, b] fails to admit a finite subcover. That is, no finite [subset ]of C has [ a+bunion ] which contains [a, b]. Then this must be the case for one of the two intervals a, a+b and , b . Let I1 be the one for which 2 2 this is so. Then split it into two equal pieces like what was just done and let I2 be a half for which there is no finite subcover of sets of C. Continue this way. This yields a nested sequence of closed intervals I1 ⊇ I2 ⊇ · · · and by the above lemma, there exists a point x in all of these intervals. There exists U ∈ C such that x ∈ U. Thus x ∈ (a, b) ∈ C However, for all n large enough, the length of In is less than min (|x − a| , |x − b|). Hence In is actually contained in (a, b) ∈ C contrary to the construction. Hence [a, b] is compact after all.  As a useful corollary, this shows that R is complete. Corollary 11.1.22 The real line R is complete. ∞

Proof: Suppose {xk } is a Cauchy sequence in R. Then there exists M such that {xk }k=1 ⊆ [−M, M ] . Why? If there is no convergent subsequence, then for each x ∈ [−M, M ] , there is an open set (x − δ x , x + δ x ) which contains xk for only finitely many values of k. Since [−M, M ] is compact, there are finitely many of these open sets whose union includes [−M, M ]. This is a contradiction because [−M, M ] contains xk for all k ∈ N so at least one of the open sets must contain xk for infinitely many k. Thus there is a convergent subsequence. By Theorem 11.1.16 the original Cauchy sequence converges to some x ∈ [−M, M ]. 
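The bisection idea used in the proof of Lemma 11.1.21 is also a practical algorithm. The sketch below (my own example) halves an interval fifty times, keeping a half on which x² − 2 changes sign; the nested intervals shrink down to the single point √2 which Lemma 11.1.20 guarantees lies in the intersection.

def bisect(f, a, b, steps):
    # Repeatedly keep a half of [a, b] on which f changes sign, producing nested intervals.
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m
        else:
            a = m
    return a, b

a, b = bisect(lambda x: x * x - 2, 1.0, 2.0, 50)
print(a, b, b - a)     # both endpoints agree with sqrt(2) to roughly machine precision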


Example 11.1.23 Let n ∈ N. Cn with distance given by d (x, y) ≡

max j∈{1,··· ,n} {|xj − yj |}

√ is a complete space. Recall that |a + jb| ≡ a2 + b2 . Then Cn is complete. Similarly Rn is complete. { }∞ { }∞ To see that this is complete, let xk k=1 be a Cauchy sequence. Observe that for each j, xkj k=1 . That is, each component is a Cauchy sequence in C. Next, k+p k+p Re xkj − Re xj ≤ xkj − xj { }∞ }∞ { Therefore, Re xkj k=1 is a Cauchy sequence. Similarly Im xkj k=1 is a Cauchy sequence. It follows from completeness of R shown above, that these converge. Thus there exists aj , bj such that lim Re xkj + i Im xkj = aj + ibj ≡ x

k→∞

and so xk → x showing that Cn is complete. The same argument shows that Rn is complete. It is easier because you don’t need to fuss with real and imaginary parts.

11.1.3

Closure of a Set

Next is the topic of the closure of a set. Definition 11.1.24 Let A be a nonempty subset of (X, d) a metric space. Then A is defined to be the intersection of all closed sets which contain A. Note the whole space, X is one such closed set which contains A. The whole space X is closed because its complement is open, its complement being ∅. It is certainly true that every point of the empty set is an interior point because there are no points of ∅. Lemma 11.1.25 Let A be a nonempty set in (X, d) . Then A is a closed set and A = A ∪ A′ where A′ denotes the set of limit points of A. Proof: First of all, denote by C the set of closed sets which contain A. Then A = ∩C and this will be closed if its complement is open. However, { } C A = ∪ HC : H ∈ C . Each H C is open and so the union of all these open sets must also be open. This is because if x is in this union, then it is in at least one of them. Hence it is an interior point of that one. But this implies it is an interior point of the union of them all which is an even larger set. Thus A is closed. The interesting part is the next claim. First note that from the definition, A ⊆ A so if x ∈ A, C / A. If y ∈ / A, a closed set, then there exists B (y, r) ⊆ A . then x ∈ A. Now consider y ∈ A′ but y ∈ Thus y cannot be a limit point of A, a contradiction. Therefore, A ∪ A′ ⊆ A Next suppose x ∈ A and suppose x ∈ / A. Then if B (x, r) contains no points of A different than x, since x itself is not in A, it would follow that B (x,r) ∩ A = ∅ and so recalling that open balls C are open, B (x, r) is a closed set containing A so from the definition, it also contains A which is / A, then x ∈ A′ and so contrary to the assertion that x ∈ A. Hence if x ∈ A ∪ A′ ⊇ A 


11.1.4


Continuous Functions

The following is a fairly general definition of what it means for a function to be continuous. It includes everything seen in typical calculus classes as a special case. Definition 11.1.26 Let f : X → Y be a function where (X, d) and (Y, ρ) are metric spaces. Then f is continuous at x ∈ X if and only if the following condition holds. For every ε > 0, there exists δ > 0 such that if d (ˆ x, x) < δ, then ρ (f (ˆ x) , f (x)) < ε. If f is continuous at every x ∈ X we say that f is continuous on X. For example, you could have a real valued function f (x) defined on an interval [0, 1] . In this case you would have X = [0, 1] and Y = R with the distance given by d (x, y) = |x − y|. Then the following theorem is the main result. Theorem 11.1.27 Let f : X → Y where (X, d) and (Y, ρ) are metric spaces. Then the following are equivalent. a f is continuous at x. b Whenever xn → x, it follows that f (xn ) → f (x) . Also, the following are equivalent. c f is continuous on X. d Whenever V is open in Y, it follows that f −1 (V ) ≡ {x : f (x) ∈ V } is open in X. e Whenever H is closed in Y, it follows that f −1 (H) is closed in X. Proof: a =⇒ b: Let f be continuous at x and suppose xn → x. Then let ε > 0 be given. By continuity, there exists δ > 0 such that if d (ˆ x, x) < δ, then ρ (f (ˆ x) , f (x)) < ε. Since xn → x, it follows that there exists N such that if n ≥ N, then d (xn , x) < δ and so, if n ≥ N, it follows that ρ (f (xn ) , f (x)) < ε. Since ε > 0 is arbitrary, it follows that f (xn ) → f (x). b =⇒ a: Suppose b holds but f fails to be continuous at x. Then there exists ε > 0 such that for all δ > 0, there exists x ˆ such that d (ˆ x, x) < δ but ρ (f (ˆ x) , f (x)) ≥ ε. Letting δ = 1/n, there exists xn such that d (xn , x) < 1/n but ρ (f (xn ) , f (x)) ≥ ε. Now this is a contradiction because by assumption, the fact that xn → x implies that f (xn ) → f (x). In particular, for large enough n, ρ (f (xn ) , f (x)) < ε contrary to the construction. c =⇒ d: Let V be open in Y . Let x ∈ f −1 (V ) so that f (x) ∈ V. Since V is open, there exists ε > 0 such that B (f (x) , ε) ⊆ V . Since f is continuous at x, it follows that there exists δ > 0 such that if x ˆ ∈ B (x, δ) , then f (ˆ x) ∈ B (f (x) , ε) ⊆ V. (f (B (x, δ)) ⊆ B (f (x) , ε)) In other words, B (x, δ) ⊆ f −1 (B (f (x) , ε)) ⊆ f −1 (V ) which shows that, since x was an arbitrary point of f −1 (V ) , every point of f −1 (V ) is an interior point which implies f −1 (V ) is open. ( ) C d =⇒ e: Let H be closed in Y . Then f −1 (H) = f −1 H C which is open by assumption. Hence f −1 (H) is closed because its complement is open. ( ) C e =⇒ d: Let V be open in Y. Then f −1 (V ) = f −1 V C which is assumed to be closed. This is because the complement of an open set is a closed set. d =⇒ c: Let x ∈ X be arbitrary. Is it the case that f is continuous at x? Let ε > 0 be given. Then B (f (x) , ε) is an open set in V and so x ∈ f −1 (B (f (x) , ε)) which is given to be open. Hence there exists δ > 0 such that x ∈ B (x, δ) ⊆ f −1 (B (f (x) , ε)) . Thus, f (B (x, δ)) ⊆ B (f (x) , ε) so ρ (f (ˆ x) , f (x)) < ε. Thus f is continuous at x for every x. 
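Definition 11.1.26 can be probed numerically in the familiar case X = Y = R with d (x, y) = |x − y|. The sketch below is my own and is only a finite sanity check rather than a proof: it searches a short list of candidate δ's for one that keeps f (x̂) within ε of f (x) for sampled points x̂ within δ of x.

def find_delta(f, x, eps, candidates=(1.0, 0.5, 0.1, 0.05, 0.01, 0.001)):
    for delta in candidates:
        samples = [x + delta * t / 100.0 for t in range(-100, 101)]
        if all(abs(f(s) - f(x)) < eps for s in samples):
            return delta
    return None

print(find_delta(lambda t: t * t, 3.0, 0.1))   # 0.01 works here, since |f(s) - f(3)| is about 6|s - 3| near 3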

11.1.5

Separable Metric Spaces

Definition 11.1.28 A metric space is called separable if there exists a countable dense subset D. This means two things. First, D is countable, and second that if x is any point and r > 0, then B (x, r) ∩ D ̸= ∅. A metric space is called completely separable if there exists a countable collection of nonempty open sets B such that every open set is the union of some subset of B. This collection of open sets is called a countable basis.


For those who like to fuss about empty sets, the empty set is open and it is indeed the union of a subset of B namely the empty subset. Theorem 11.1.29 A metric space is separable if and only if it is completely separable. Proof: ⇐= Let B be the special countable collection of open sets and for each B ∈ B, let pB be a point of B. Then let P ≡ {pB : B ∈ B}. If B (x, r) is any ball, then it is the union of sets of B and so there is a point of P in it. Since B is countable, so is P. =⇒ Let D be the countable dense set and let B ≡ {B (d, r) : d ∈ D, r ∈ Q ∩ [0, ∞)}. Then B is countable because the Cartesian product of countable sets is countable. It suffices to show that every ball is (the union of these sets. Let B (x, R) be a ball. Let y ∈ B (y, δ) ⊆ B (x, R) . Then there ) δ δ . Let ε ∈ Q and 10 < ε < 5δ . Then y ∈ B (d, ε) ∈ B. Is B (d, ε) ⊆ B (x, R)? If so, exists d ∈ B y, 10 then the desired result follows because this would show that every y ∈ B (x, R) is contained in one of these sets of B( which ) is contained in B (x, R) showing that B (x, R) is the union of sets of B. Let z ∈ B (d, ε) ⊆ B d, 5δ . Then d (y, z) ≤ d (y, d) + d (d, z)
0, there is an ε net. Thus the metric space is totally bounded. Let Nε denote an ε net. Let D = ∪∞ k=1 N1/2k . Then this is a countable dense set. It is countable because it is the countable union of finite sets and it is dense because given a point, there is a point of D within 1/2k of it.  Also recall that a complete metric space is one for which every Cauchy sequence converges to a point in the metric space. The following is the main theorem which relates these concepts. Theorem 11.1.38 For (X, d) a metric space, the following are equivalent. 1. (X, d) is compact. 2. (X, d) is sequentially compact. 3. (X, d) is complete and totally bounded. Proof: 1.=⇒ 2. Let {xn } be a sequence. Suppose it fails to have a convergent subsequence. Then it follows right away that no value of the sequence is repeated infinitely often. If ∪∞ n=1 {xn } has a limit point in X, then it follows from Proposition 11.1.18 there would be a convergent subsequence converging to this limit point. Therefore, assume ∪∞ k=1 {xn } has no limit point. This is equivalent to saying that ∪∞ k=m {xk } has no limit point for each m. Thus these are closed sets by Theorem 11.1.9 because they contain all of their limit points due to the fact that they have none. Hence the open sets C (∪∞ k=m {xn }) yield an open cover. This is an increasing sequence of open sets and none of them contain all the values of the sequence because no value is repeated for infinitely many indices. Thus this is an open cover which has no finite subcover contrary to 1. 2.=⇒ 3. If (X, d) is sequentially compact, then by Lemma 11.1.37, it is totally bounded. If {xn } is a Cauchy sequence, then there is a subsequence which converges to x ∈ X by assumption. However, from Theorem 11.1.16 this requires the original Cauchy sequence to converge. 3.=⇒ 1. Since (X, d) is totally bounded, there must be a countable dense subset of X. Just take the union of 1/2k nets for each k ∈ N. Thus (X, d) is completely separable by Theorem 11.1.32 has ∞ the Lindeloff property. Hence, if X is not compact, there is a countable set of open sets {Ui }i=1 which covers X but no finite subset does. Consider the nonempty closed sets Fn and pick xn ∈ Fn where C X \ ∪ni=1 Ui ≡ X ∩ (∪ni=1 Ui ) ≡ Fn ( ) { }Mk be a 1/2k net for X. We have for some m, B xkmk , 1/2k contains xn for infinitely Let xkm m=1 many values of n because many { } there are only ( finitely many ) balls and infinitely ( ) indices. Then out of the finitely many xk+1 for which B xk+1 , 1/2k+1 intersects B xkmk , 1/2k , pick one xk+1 m m mk+1 such ( ) { k }∞ k+1 k+1 that B xmk+1 , 1/2 contains xn for infinitely many n. Then obviously xmk k=1 is a Cauchy sequence because ( ) 1 1 1 d xkmk , xk+1 mk+1 ≤ k + k+1 ≤ k−1 2 2 2


Hence for p < q,

q−1 ( ∞ ( ) ∑ ) ∑ d xpmp , xqmq ≤ d xkmk , xk+1 mk+1
0 and a subsequence nk such that d (f nk (x0 ) , x) ≥ ε Now nk = pk n + rk where rk is one of the numbers {0, 1, 2, · · · , n − 1}. It follows that there exists one of these numbers which is repeated infinitely often. Call it r and let the further subsequence continue to be denoted as nk . Thus ( ) d f pk n+r (x0 ) , x ≥ ε In other words, d (f pk n (f r (x0 )) , x) ≥ ε However, from Theorem 11.1.41, as k → ∞, f pk n (f r (x0 )) → x which contradicts the above inequality. Hence the sequence of iterates converges to x, as it did for f a contraction map.  Now with the above material on analysis, it is time to begin using the ideas from linear algebra in this special case where the field of scalars is R or C.
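The convergence of the iterates f^n (x0) to the fixed point, discussed above, is easy to watch numerically. A standard illustration (my own choice of example, not the text's) is f (x) = cos x on [0, 1], which maps [0, 1] into itself and is a contraction there since |f′ (x)| = |sin x| ≤ sin 1 < 1; the iterates converge to the unique solution of cos x = x.

import math

def iterate(f, x0, n):
    x = x0
    for _ in range(n):
        x = f(x)
    return x

x = iterate(math.cos, 0.5, 100)
print(x, abs(math.cos(x) - x))   # about 0.739085, with residual near machine precision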

11.1.8

Convergence Of Functions

Next is to consider the meaning of convergence of sequences of functions. There are two main ways of convergence of interest here, pointwise and uniform convergence. Definition 11.1.44 Let fn : X → Y where (X, d) , (Y, ρ) are two metric spaces. Then {fn } is said to converge pointwise to a function f : X → Y if for every x ∈ X, lim fn (x) = f (x)

n→∞

{fn } is said to converge uniformly if for all ε > 0, there exists N such that if n ≥ N, then sup ρ (fn (x) , f (x)) < ε x∈X


Here is a well known example illustrating the difference between pointwise and uniform convergence. Example 11.1.45 Let fn (x) = xn on the metric space [0, 1] . Then this function converges pointwise to { 0 on [0, 1) f (x) = 1 at 1 but it does not converge uniformly on this interval to f . Note how the target function f in the above example is not continuous even though each function in the sequence is. The nice thing about uniform convergence is that it takes continuity of the functions in the sequence and imparts it to the target function. It does this for both continuity at a single point and uniform continuity. Thus uniform convergence is a very superior thing. Theorem 11.1.46 Let fn : X → Y where (X, d) , (Y, ρ) are two metric spaces and suppose each fn is continuous at x ∈ X and also that fn converges uniformly to f on X. Then f is also continuous at x. In addition to this, if each fn is uniformly continuous on X, then the same is true for f . Proof: Let ε > 0 be given. Then ρ (f (x) , f (ˆ x)) ≤ ρ (f (x) , fn (x)) + ρ (fn (x) , fn (ˆ x)) + ρ (fn (ˆ x) , f (ˆ x)) By uniform convergence, there exists N such that both ρ (f (x) , fn (x)) and ρ (fn (ˆ x) , f (ˆ x)) are less than ε/3 provided n ≥ N. Thus picking such an n, ρ (f (x) , f (ˆ x)) ≤

2ε/3 + ρ (fn (x) , fn (x̂))

Now from the continuity of fn , there exists δ > 0 such that if d (x, x ˆ) < δ, then ρ (fn (x) , fn (ˆ x)) < ε/3. Hence, if d (x, x ˆ) < δ, then ρ (f (x) , f (ˆ x)) ≤

2ε/3 + ρ (fn (x) , fn (x̂)) < 2ε/3 + ε/3 = ε

Hence, f is continuous at x. Next consider uniform continuity. It follows from the uniform convergence that if x, x ˆ are any two points of X, then if n ≥ N, then, picking such an n, ρ (f (x) , f (ˆ x)) ≤

2ε/3 + ρ (fn (x) , fn (x̂))

By uniform continuity of fn there exists δ such that if d (x, x ˆ) < δ, then the term on the right in the above is less than ε/3. Hence if d (x, x ˆ) < δ, then ρ (f (x) , f (ˆ x)) < ε and so f is uniformly continuous as claimed. 
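Example 11.1.45 can also be seen numerically. In the sketch below (my own), the value of fn at the fixed point x = 0.9 tends to 0, but the largest value of |fn (x) − f (x)| over a fine grid in [0, 1] stays close to 1 for every n, so the convergence is pointwise and not uniform.

def f_limit(x):
    return 1.0 if x == 1.0 else 0.0

def sup_gap(n, samples=10_000):
    xs = [k / samples for k in range(samples + 1)]
    return max(abs(x**n - f_limit(x)) for x in xs)

for n in (1, 5, 25, 125):
    print(n, 0.9 ** n, sup_gap(n))
# 0.9**n -> 0, while sup_gap(n) stays near 1 (the true supremum over [0, 1) is 1 for every n).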

11.2

Connected Sets

This has absolutely nothing to do with linear algebra but is here to provide convenient results to be used later when linear algebra will occur as part of some topics in analysis. Stated informally, connected sets are those which are in one piece. In order to define what is meant by this, I will first consider what it means for a set to not be in one piece. This is called separated. Connected sets are defined in terms of not being separated. This is why theorems about connected sets sometimes seem a little tricky. Definition 11.2.1 A set, S in a metric space, is separated if there exist sets A, B such that S = A ∪ B, A, B ̸= ∅, and A ∩ B = B ∩ A = ∅. In this case, the sets A and B are said to separate S. A set is connected if it is not separated. Remember A denotes the closure of the set A.


Note that the concept of connected sets is defined in terms of what it is not. This makes it somewhat difficult to understand. One of the most important theorems about connected sets is the following. Theorem 11.2.2 Suppose U is a set of connected sets and that there exists a point p which is in all of these connected sets. Then K ≡ ∪U is connected. Proof: Suppose K =A∪B ¯ ¯ where A ∩ B = B ∩ A = ∅, A ̸= ∅, B ̸= ∅. Let U ∈ U . Then U = (U ∩ A) ∪ (U ∩ B) and this would separate U if both sets in the union are nonempty since the limit points of U ∩ B are contained in the limit points of B. It follows that every set of U is contained in one of A or B. Suppose then that some U ⊆ A. Then all U ∈ U must be contained in A because if one is contained in B, this would violate the assumption that they all have a point p in common. Thus K is connected after all because this requires B = ∅. Alternatively, p is in one of these sets. Say p ∈ A. Then by the above argument every U must be in A because if not, the above would be a separation of U . Thus B = ∅.  The intersection of connected sets is not necessarily connected as is shown by the following picture. U V

Theorem 11.2.3 Let f : X → Y be continuous where Y is a metric space and X is connected. Then f (X) is also connected. Proof: To do this you show f (X) is not separated. Suppose to the contrary that f (X) = A ∪ B where A and B separate f (X) . Then consider the sets f −1 (A) and f −1 (B) . If z ∈ f −1 (B) , then f (z) ∈ B and so f (z) is not a limit point of A. Therefore, there exists an open set, U containing f (z) such that U ∩ A = ∅. But then, the continuity of f and Theorem 11.1.27 implies that f −1 (U ) is an open set containing z such that f −1 (U ) ∩ f −1 (A) = ∅. Therefore, f −1 (B) contains no limit points of f −1 (A) . Similar reasoning implies f −1 (A) contains no limit points of f −1 (B). It follows that X is separated by f −1 (A) and f −1 (B) , contradicting the assumption that X was connected.  An arbitrary set can be written as a union of maximal connected sets called connected components. This is the concept of the next definition. Definition 11.2.4 Let S be a set and let p ∈ S. Denote by Cp the union of all connected subsets of S which contain p. This is called the connected component determined by p. Theorem 11.2.5 Let Cp be a connected component of a set S in a metric space. Then Cp is a connected set and if Cp ∩ Cq ̸= ∅, then Cp = Cq . Proof: Let C denote the connected subsets of S which contain p. By Theorem 11.2.2, ∪C = Cp is connected. If x ∈ Cp ∩ Cq , then from Theorem 11.2.2, Cp ⊇ Cp ∪ Cq and so Cp ⊇ Cq . The inclusion goes the other way by the same reason.  This shows the connected components of a set are equivalence classes and partition the set. A set, I is an interval in R if and only if whenever x, y ∈ I then (x, y) ⊆ I. The following theorem is about the connected sets in R.


Theorem 11.2.6 A set C in R is connected if and only if C is an interval. Proof: Let C be connected. If C consists of a single point, p, there is nothing to prove. The interval is just [p, p] . Suppose p < q and p, q ∈ C. You need to show (p, q) ⊆ C. If x ∈ (p, q) \ C let C ∩ (−∞, x) ≡ A, and C ∩ (x, ∞) ≡ B. Then C = A ∪ B and the sets A and B separate C contrary to the assumption that C is connected. Conversely, let I be an interval. Suppose I is separated by A and B. Pick x ∈ A and y ∈ B. Suppose without loss of generality that x < y. Now define the set, S ≡ {t ∈ [x, y] : [x, t] ⊆ A} and let l be the least upper bound of S. Then l ∈ A so l ∈ / B which implies l ∈ A. But if l ∈ / B, then for some δ > 0, (l, l + δ) ∩ B = ∅ contradicting the definition of l as an upper bound for S. Therefore, l ∈ B which implies l ∈ / A after all, a contradiction. It follows I must be connected.  This yields a generalization of the intermediate value theorem from one variable calculus. Corollary 11.2.7 Let E be a connected set in a metric space and suppose f : E → R and that y ∈ (f (e1 ) , f (e2 )) where ei ∈ E. Then there exists e ∈ E such that f (e) = y. Proof: From Theorem 11.2.3, f (E) is a connected subset of R. By Theorem 11.2.6 f (E) must be an interval. In particular, it must contain y. This proves the corollary.  The following theorem is a very useful description of the open sets in R. Theorem 11.2.8 Let U be an open set in R. Then there exist countably many disjoint open sets ∞ {(ai , bi )}i=1 such that U = ∪∞ i=1 (ai , bi ) . Proof: Let p ∈ U and let z ∈ Cp , the connected component determined by p. Since U is open, there exists, δ > 0 such that (z − δ, z + δ) ⊆ U. It follows from Theorem 11.2.2 that (z − δ, z + δ) ⊆ Cp . This shows Cp is open. By Theorem 11.2.6, this shows Cp is an open interval, (a, b) where a, b ∈ [−∞, ∞] . There are therefore at most countably many of these connected components because each ∞ must contain a rational number and the rational numbers are countable. Denote by {(ai , bi )}i=1 the set of these connected components.  Definition 11.2.9 A set E in a metric space is arcwise connected if for any two points, p, q ∈ E, there exists a closed interval, [a, b] and a continuous function, γ : [a, b] → E such that γ (a) = p and γ (b) = q. An example of an arcwise connected metric space would be any subset of Rn which is the continuous image of an interval. Arcwise connected is not the same as connected. A well known example is the following. ) } {( 1 : x ∈ (0, 1] ∪ {(0, y) : y ∈ [−1, 1]} (11.2) x, sin x You can verify that this set of points in the normed vector space R2 is not arcwise connected but is connected.


11.3


Subspaces Spans And Bases

As shown earlier, Fn is an example of a vector space with field of scalars F. Here is a short review of the major exchange theorem. Here and elsewhere, when it is desired to emphasize that certain things are vectors, bold face will be used. However, sometimes the context makes this sufficiently clear and bold face is not used. Theorem 11.3.1 If span (u1 , · · · , ur ) ⊆ span (v1 , · · · , vs ) ≡ V and {u1 , · · · , ur } are linearly independent, then r ≤ s. Proof: Suppose r > s. Let Ep denote a finite list of vectors of {v1 , · · · , vs } and let |Ep | denote the number of vectors in the list. Let Fp denote the first p vectors in {u1 , · · · , ur }. In case p = 0, Fp will denote the empty set. For 0 ≤ p ≤ s, let Ep have the property span (Fp , Ep ) = V and |Ep | is as small as possible for this to happen. I claim |Ep | ≤ s − p if Ep is nonempty. Here is why. For p = 0, it is obvious. Suppose true for some p < s. Then up+1 ∈ span (Fp , Ep ) and so there are constants, c1 , · · · , cp and d1 , · · · , dm where m ≤ s − p such that up+1 =

∑_{i=1}^{p} ci ui + ∑_{j=1}^{m} dj zj

for {z1 , · · · , zm } ⊆ {v1 , · · · , vs } . Then not all the di can equal zero because this would violate the linear independence of the {u1 , · · · , ur } . Therefore, you can solve for one of the zk as a linear combination of {u1 , · · · , up+1 } and the other zj . Thus you can change Fp to Fp+1 and include one fewer vector in Ep . Thus |Ep+1 | ≤ m − 1 ≤ s − p − 1. This proves the claim. Therefore, Es is empty and span (u1 , · · · , us ) = V. However, this gives a contradiction because it would require us+1 ∈ span (u1 , · · · , us ) which violates the linear independence of these vectors.  Also recall the following. Definition 11.3.2 A finite set of vectors, {x1 , · · · , xr } is a basis for a vector space V if span (x1 , · · · , xr ) = V and {x1 , · · · , x∑ r } is linearly independent. Thus if v ∈ V there exist unique scalars, v1 , · · · , vr r such that v = i=1 vi xi . These scalars are called the components of v with respect to the basis {x1 , · · · , xr }. Corollary 11.3.3 Let {x1 , · · · , xr } and {y1 , · · · , ys } be two bases1 of Fn . Then r = s = n. Lemma 11.3.4 Let {v1 , · · · , vr } be a set of vectors. Then V ≡ span (v1 , · · · , vr ) is a subspace. Definition 11.3.5 Let V be a vector space. Then dim (V ) read as the dimension of V is the number of vectors in a basis. 1 This is the plural form of basis. We could say basiss but it would involve an inordinate amount of hissing as in “The sixth shiek’s sixth sheep is sick”. This is the reason that bases is used instead of basiss.


Of course you should wonder right now whether an arbitrary subspace of a finite dimensional vector space even has a basis. In fact it does and this is in the next theorem. First, here is an interesting lemma which was also presented earlier. Lemma 11.3.6 Suppose v ∈ / span (u1 , · · · , uk ) and {u1 , · · · , uk } is linearly independent. Then {u1 , · · · , uk , v} is also linearly independent. Recall that this implies the following theorems also presented earlier. Theorem 11.3.7 Let V be a nonzero subspace of Y a finite dimensional vector space having dimension n. Then V has a basis. In words the following corollary states that any linearly independent set of vectors can be enlarged to form a basis. Corollary 11.3.8 Let V be a subspace of Y, a finite dimensional vector space of dimension n and let {v1 , · · · , vr } be a linearly independent set of vectors in V . Then either it is a basis for V or there exist vectors, vr+1 , · · · , vs such that {v1 , · · · , vr , vr+1 , · · · , vs } is a basis for V. Theorem 11.3.9 Let V be a subspace of Y, a finite dimensional vector space of dimension n and suppose span (u1 · · · , up ) = V where the ui are nonzero vectors. Then there exist vectors, {v1 · · · , vr } such that {v1 · · · , vr } ⊆ {u1 · · · , up } and {v1 · · · , vr } is a basis for V .
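Over the field R these facts about bases can be explored numerically: the dimension of span (v1 , · · · , vr ) in R^n is the rank of the matrix whose rows are the vi , and any two bases of that span have this same number of elements. A short check using numpy, my own choice of tool:

import numpy as np

vectors = np.array([[1.0, 0.0, 2.0],
                    [0.0, 1.0, 1.0],
                    [1.0, 1.0, 3.0]])   # third row = first + second, so the rows are dependent
print(np.linalg.matrix_rank(vectors))   # 2: any basis of the span has exactly two vectors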

11.4

Inner Product And Normed Linear Spaces

11.4.1

The Inner Product In Fn

To do calculus, you must understand what you mean by distance. For functions of one variable, the distance was provided by the absolute value of the difference of two numbers. This must be generalized to Fn and to more general situations. This is the most familiar setting for elementary courses. We call it the dot product in calculus and physics but it is a case of something which also works in Cn . Definition 11.4.1 Let x, y ∈ Fn . Thus x = (x1 , · · · , xn ) where each xk ∈ F and a similar formula holding for y. Then the inner product of these two vectors is defined to be ∑ x · y ≡ (x, y) ≡ xj yj ≡ x1 y1 + · · · + xn yn . j

This is also often denoted by (x, y) or as ⟨x, y⟩ and is called an inner product. I will use either notation. Notice how you put the conjugate on the entries of the vector, y. It makes no difference if the vectors happen to be real vectors but with complex vectors you must do it this way2 . The reason for this is that when you take the inner product of a vector with itself, you want to get the square of the length of the vector, a positive number. Placing the conjugate on the components of y in the above definition assures this will take place. Thus ∑ ∑ 2 |xj | ≥ 0. (x, x) = xj xj = j

j

2 Sometimes people put the conjugate on the components of the first entry. It doesn’t matter a lot, but it is good to be consistent. I have chosen to place the conjugate on the components of the second entry.

266

CHAPTER 11. NORMED LINEAR SPACES

If you didn’t place a conjugate as in the above definition, things wouldn’t work out correctly. For example, 2 (1 + i) + 22 = 4 + 2i and this is not a positive number. The following properties of the inner product follow immediately from the definition and you should verify each of them. Properties of the inner product: 1. (u, v) = (v, u) 2. If a, b are numbers and u, v, z are vectors then ((au + bv) , z) = a (u, z) + b (v, z) . 3. (u, u) ≥ 0 and it equals 0 if and only if u = 0. Note this implies (x,αy) = α (x, y) because (x,αy) = (αy, x) = α (y, x) = α (x, y) The norm is defined as follows. Definition 11.4.2 For x ∈ Fn , ( |x| ≡

n ∑

)1/2 |xk |

2

1/2

= (x, x)

k=1

11.4.2

General Inner Product Spaces

Any time you have a vector space which possesses an inner product, something satisfying the properties 1 - 3 above, it is called an inner product space. As usual, F will mean the field of scalars, either C or R. Here is a fundamental inequality called the Cauchy Schwarz inequality which holds in any inner product space. First here is a simple lemma. Lemma 11.4.3 If z ∈ F there exists θ ∈ F such that θz = |z| and |θ| = 1. Proof: Let θ = 1 if z = 0 and otherwise, let θ =

z . Recall that for z = x + iy, z = x − iy and |z|

2

zz = |z| . In case z is real, there is no change in the above.  Theorem 11.4.4 (Cauchy Schwarz)Let H be an inner product space. The following inequality holds for x and y ∈ H. 1/2 1/2 |(x, y)| ≤ (x, x) (y, y) (11.3) Equality holds in this inequality if and only if one vector is a multiple of the other. Proof: Let θ ∈ F such that |θ| = 1 and θ (x, y) = |(x, y)| (

)

Consider p (t) ≡ x + θty, x + tθy where t ∈ R. Then from the above list of properties of the inner product, 0

≤ p (t) = (x, x) + tθ (x, y) + tθ (y, x) + t2 (y, y) =

(x, x) + tθ (x, y) + tθ(x, y) + t2 (y, y)

=

(x, x) + 2t Re (θ (x, y)) + t2 (y, y)

=

(x, x) + 2t |(x, y)| + t2 (y, y)

(11.4)

11.4. INNER PRODUCT AND NORMED LINEAR SPACES

267

and this must hold for all t ∈ R. Therefore, if (y, y) = 0 it must be the case that |(x, y)| = 0 also since otherwise the above inequality would be violated. Therefore, in this case, |(x, y)| ≤ (x, x)

1/2

(y, y)

1/2

.

On the other hand, if (y, y) ̸= 0, then p (t) ≥ 0 for all t means the graph of y = p (t) is a parabola which opens up and it either has exactly one real zero in the case its vertex touches the t axis or it has no real zeros. From the quadratic formula this happens exactly when 2

4 |(x, y)| − 4 (x, x) (y, y) ≤ 0 which is equivalent to 11.3. It is clear from a computation that if one vector is a scalar multiple of the other that equality 2 holds in 11.3. Conversely, suppose equality does hold. Then this is equivalent to saying 4 |(x, y)| − 4 (x, x) (y, y) = 0 and so from the quadratic formula, there exists one real zero to p (t) = 0. Call it t0 . Then 2 ) ( p (t0 ) ≡ x + θt0 y, x + t0 θy = x + θty = 0 and so x = −θt0 y. This proves the theorem.  Note that in establishing the inequality, I only used part of the above properties of the inner product. It was not necessary to use the one which says that if (x, x) = 0 then x = 0. Now the length of a vector can be defined. 1/2

Definition 11.4.5 Let z ∈ H. Then |z| ≡ (z, z)

.

Theorem 11.4.6 For length defined in Definition 11.4.5, the following hold. |z| ≥ 0 and |z| = 0 if and only if z = 0

(11.5)

If α is a scalar, |αz| = |α| |z|

(11.6)

|z + w| ≤ |z| + |w| .

(11.7)

Proof: The first two claims are left as exercises. To establish the third, |z + w|

2

≡ (z + w, z + w) =

(z, z) + (w, w) + (w, z) + (z, w)

=

|z| + |w| + 2 Re (w, z)



|z| + |w| + 2 |(w, z)|



|z| + |w| + 2 |w| |z| = (|z| + |w|) . 

2

2

2

2

2

2

2

One defines the distance between two vectors x, y in an inner product space as |x − y| . This produces a metric in the obvious way: d (x, y) ≡ |x − y| Not surprisingly we have the following theorem in which F will be either R or C. Theorem 11.4.7 Fn is complete. Also, if K is a nonempty closed and bounded subset of Fn , then K is compact. Also, if f : K → R, it achieves its maximum and minimum on K. Proof: Recall Example 11.1.23 which established completeness of Fn with the funny norm ||x||∞ ≡ max {|xi | , i = 1, 2, · · · , n} However, 1 √ |x| ≤ ||x||∞ ≤ |x| n

268

CHAPTER 11. NORMED LINEAR SPACES

and so the Cauchy sequences and limits are exactly the same for the two norms. Thus Fn is complete where the norm is the one just discussed. Now suppose K is closed and bounded. By the estimate on the norms just given, it is closed and bounded with respect to ||·||∞ also because a point is a limit point with respect to one norm if and only if it is a limit point with respect to the other. Now if B (0, r) ⊇ K, then B∞ (0,r) ⊇ K also where this symbol denotes the ball taken with respect to ||·||∞ rather than |·|. Hence K ⊆ ∏ n j=1 ([−r, r] + [−ir, ir]) . It suffices to verify sequential compactness thanks to Theorem 11.1.38. Letting {xn } ⊆ K, it follows that Re xni is in [−r, r] and Im xni is in [−r, r] and so, taking 2n subsequences, there exists a subsequence still denoted with n such that limn→∞ Re xni = ai ∈ [−r, r] , limn→∞ Im xni = bi for each i. Hence xn → a + ib ≡ x. Now, since K is closed, it follows that x ∈ K and this shows sequential compactness which is equivalent to compactness. The last claim is as follows. Let M ≡ sup {f (x) : x ∈ K} and let xn be a maximizing sequence so that M = limn→∞ f (xn ) . By compactness, there is a subsequence xnk → x ∈ K. Then by continuity, M = limk→∞ f (xnk ) = f (x) . The existence of the minimum is similar. 

11.4.3

Normed Vector Spaces

The best sort of a norm is one which comes from an inner product, because these norms preserve familiar geometrical ideas. However, any vector space V which has a function ||·|| which maps V to [0, ∞) is called a normed vector space if ||·|| satisfies 11.8 - 11.10. That is ||z|| ≥ 0 and ||z|| = 0 if and only if z = 0

(11.8)

If α is a scalar, ||αz|| = |α| ||z||

(11.9)

||z + w|| ≤ ||z|| + ||w|| .

(11.10)

The last inequality above is called the triangle inequality. Another version of this is |||z|| − ||w||| ≤ ||z − w||

(11.11)

Note that this shows that x → ∥x∥ is a continuous function. Thus B (z, r) ≡ {x : ∥x − z∥ < r} is an open set and D (z, r) ≡ {x : ∥x − z∥ ≤ r} is a closed set. To see that 11.11 holds, note ||z|| = ||z − w + w|| ≤ ||z − w|| + ||w|| which implies ||z|| − ||w|| ≤ ||z − w|| and now switching z and w, yields ||w|| − ||z|| ≤ ||z − w|| which implies 11.11. The distance between x, y is given by ∥x − y∥ This distance satisfies ∥x − y∥ = ∥y − x∥ ∥x − y∥ ≥ 0 and is 0 if and only if x = y ∥x − y∥ ≤ ∥x − z∥ + ∥z − y∥ Thus this yields a metric space, but it has more because it also involves interaction with the algebra of the vector space.

11.4. INNER PRODUCT AND NORMED LINEAR SPACES

11.4.4

269

The p Norms

Examples of norms are the p norms on Cn . These do not come from an inner product but they are norms just the same. Definition 11.4.8 Let x ∈ Cn . Then define for p ≥ 1, ( n )1/p ∑ p ||x||p ≡ |xi | i=1

The following inequality is called Holder’s inequality. Proposition 11.4.9 For x, y ∈ Cn , n ∑

|xi | |yi | ≤

( n ∑

i=1

p

|xi |

)1/p ( n ∑

i=1

)1/p′ p′

|yi |

i=1

The proof will depend on the following lemma. Lemma 11.4.10 If a, b ≥ 0 and p′ is defined by

1 p

1 p′

+

= 1, then



ab ≤

ap bp + ′. p p

Proof of the Proposition: If x or y equals the zero vector there is nothing to prove. Therefore, (∑ )1/p′ ∑n n p 1/p p′ . Then using assume they are both nonzero. Let A = ( i=1 |xi | ) and B = |y | i=1 i Lemma 11.4.10, [ ( )p ( ) p′ ] n n ∑ ∑ 1 |xi | |xi | |yi | 1 |yi | ≤ + ′ A B p A p B i=1 i=1 =

n n 1 1 ∑ 1 1 ∑ p p′ |x | + |yi | i p Ap i=1 p′ B p i=1

=

1 1 + =1 p p′

and so n ∑

|xi | |yi | ≤ AB =

( n ∑

i=1

p

|xi |

)1/p ( n ∑

)1/p′ p′



|yi |

i=1

i=1

Theorem 11.4.11 The p norms do indeed satisfy the axioms of a norm. Proof: It is obvious that ||·||p does indeed satisfy most of the norm axioms. The only one that is not clear is the triangle inequality. To save notation write ||·|| in place of ||·||p in what follows. Note also that pp′ = p − 1. Then using the Holder inequality, p

||x + y||

=

n ∑

p

|xi + yi |

i=1

≤ =

n ∑ i=1 n ∑

p−1

|xi + yi |

|xi + yi | p′ |xi | +

i=1



p

( n ∑

n ∑

|xi + yi |

i=1 n ∑

p−1

|yi |

p

|xi + yi | p′ |yi |

i=1

)1/p′ ( n )1/p ( n )1/p  ∑ ∑ p p p   + |xi + yi | |xi | |yi |

i=1

=

|xi | +

p/p′

||x + y||

(

i=1

||x||p + ||y||p

)

i=1

270

CHAPTER 11. NORMED LINEAR SPACES p/p′

so dividing by ||x + y||

, it follows −p/p′

p

||x + y|| ||x + y|| = ||x + y|| ≤ ||x||p + ||y||p ( ( ) ) p − pp′ = p 1 − p1′ = p p1 = 1. .  It only remains to prove Lemma 11.4.10. Proof of the lemma: Let p′ = q to save on notation and consider the following picture: x b x = tp−1 t = xq−1 t a ∫



a

ab ≤

tp−1 dt + 0

b

xq−1 dx = 0

ap bq + . p q

Note equality occurs when ap = bq .  Alternate proof of the lemma: First note that if either a or b are zero, then there is nothing to show so we can assume b, a > 0. Let b > 0 and let f (a) =

ap bq + − ab p q

Then the second derivative of f is positive on (0, ∞) so its graph is convex. Also f (0) > 0 and lima→∞ f (a) = ∞. Then a short computation shows that there is only one critical point, where f is minimized and this happens when a is such that ap = bq . At this point, f (a) = bq − bq/p b = bq − bq−1 b = 0 Therefore, f (a) ≥ 0 for all a and this proves the lemma.  Another example of a very useful norm on Fn is the norm ∥·∥∞ defined by ∥x∥∞ ≡ max {|xk | : k = 1, 2, · · · , n} You should verify that this satisfies all the axioms of a norm. Here is the triangle inequality. ∥x + y∥∞

= max {|xk + yk |} ≤ max {|xk | + |yk |} k

k

≤ max {|xk |} + max {|yk |} = ∥x∥∞ + ∥y∥∞ k

k

It turns out that in terms of analysis (limits of sequences, completeness and so forth), it makes absolutely no difference which norm you use. There are however, significant geometric differences. This will be explained later. First is the notion of an orthonormal basis.

11.4.5

Orthonormal Bases

Not all bases for an inner product space H are created equal. The best bases are orthonormal. Definition 11.4.12 Suppose {v1 , · · · , vk } is a set of vectors in an inner product space H. It is an orthonormal set if { 1 if i = j (vi , vj ) = δ ij = 0 if i ̸= j

11.4. INNER PRODUCT AND NORMED LINEAR SPACES

271

Every orthonormal set of vectors is automatically linearly independent. Indeed, if n ∑

ak vk = 0,

k=1

then taking the inner product with vj , yields 0 = use this simple observation whenever convenient.

∑n k=1

ak (vk , vj ) = aj . Thus each aj = 0. We will

Proposition 11.4.13 Suppose {v1 , · · · , vk } is an orthonormal set of vectors. Then it is linearly independent. ∑k Proof: Suppose i=1 ci vi = 0. Then taking inner products with vj , ∑ ∑ 0 = (0, vj ) = ci (vi , vj ) = ci δ ij = cj . i

i

Since j is arbitrary, this shows the set is linearly independent as claimed.  It turns out that if X is any subspace of H, then there exists an orthonormal basis for X. Lemma 11.4.14 Let X be a subspace of dimension n whose basis is {x1 , · · · , xn } . Then there exists an orthonormal basis for X, {u1 , · · · , un } which has the property that for each k ≤ n, span(x1 , · · · , xk ) = span (u1 , · · · , uk ) . Proof: Let {x1 , · · · , xn } be a basis for X. Let u1 ≡ x1 / |x1 | . Thus for k = 1, span (u1 ) = span (x1 ) and {u1 } is an orthonormal set. Now suppose for some k < n, u1 , · · · , uk have been chosen such that (uj , ul ) = δ jl and span (x1 , · · · , xk ) = span (u1 , · · · , uk ). Then define ∑k xk+1 − j=1 (xk+1 , uj ) uj , uk+1 ≡ (11.12) ∑k xk+1 − j=1 (xk+1 , uj ) uj where the denominator is not equal to zero because the xj form a basis and so xk+1 ∈ / span (x1 , · · · , xk ) = span (u1 , · · · , uk ) Thus by induction, uk+1 ∈ span (u1 , · · · , uk , xk+1 ) = span (x1 , · · · , xk , xk+1 ) . Also, xk+1 ∈ span (u1 , · · · , uk , uk+1 ) which is seen easily by solving 11.12 for xk+1 and it follows span (x1 , · · · , xk , xk+1 ) = span (u1 , · · · , uk , uk+1 ) . −1 ∑k If l ≤ k, then denoting by C the scalar xk+1 − j=1 (xk+1 , uj ) uj ,   k ∑ (uk+1 , ul ) = C (xk+1 , ul ) − (xk+1 , uj ) (uj , ul )  = C (xk+1 , ul ) −

j=1 k ∑

 (xk+1 , uj ) δ lj 

j=1

=

C ((xk+1 , ul ) − (xk+1 , ul )) = 0.

n {uj }j=1

, generated in this way are therefore an orthonormal basis because each vector The vectors, has unit length.  The process by which these vectors were generated is called the Gram Schmidt process. The following corollary is obtained from the above process. Corollary 11.4.15 Let X be a finite dimensional inner product space of dimension n whose basis is {u1 , · · · , uk , xk+1 , · · · , xn } . Then if {u1 , · · · , uk } is orthonormal, then the Gram Schmidt process applied to the given list of vectors in order leaves {u1 , · · · , uk } unchanged.

272

11.5

CHAPTER 11. NORMED LINEAR SPACES

Equivalence Of Norms

As mentioned above, it makes absolutely no difference which norm you decide to use. This holds in general finite dimensional normed spaces and is shown here. Definition 11.5.1 Let (V, ∥·∥) be a normed linear space and ∑let a basis be {v1 , · · · , vn }. For x ∈ V, let its component vector in Fn be (α1 , · · · , αn ) so that x = i αi vi . Then define θx ≡ α =

( α1

···

)T αn

Thus θ is well defined, one to one and onto from V to Fn . It is also linear and its inverse θ−1 satisfies all the same algebraic properties. The following fundamental lemma comes from the extreme value theorem for continuous functions defined on a compact set. Let





f (α) ≡ αi vi ≡ θ−1 α

i

∑ Then it is clear that f is a continuous function. This is because α → i αi vi is a continuous map into V and from the triangle inequality x → ∥x∥ is continuous as a map from V to R. Lemma 11.5.2 There exists δ > 0 and ∆ ≥ δ such that δ = min {f (α) : |α| = 1} , ∆ = max {f (α) : |α| = 1} Also, δ |α| δ |θv|



−1 θ α ≤ ∆ |α|

≤ ||v|| ≤ ∆ |θv|

(11.13) (11.14)

Proof: These numbers exist thanks ∑n to Theorem 11.4.7. It cannot be that δ = 0 because if it were, you would have |α| = 1 but j=1 αk vj = 0 which is impossible since {v1 , · · · , vn } is linearly independent. The first of the above inequalities follows from

( )

−1 α

=f α ≤∆ θ δ≤

|α| |α| the second follows from observing that θ−1 α is a generic vector v in V .  Now we can draw several conclusions about (V, ∥·∥) for V finite dimensional. Theorem 11.5.3 Let (V, ||·||) be a finite dimensional normed linear space. Then the compact sets are exactly those which are closed and bounded. Also (V, ||·||) is complete. If K is a closed and bounded set in (V, ||·||) and f : K → R, then f achieves its maximum and minimum on K. Proof: First note that the inequalities 11.13 and 11.14 show that both θ−1 and θ are continuous. Thus these take convergent sequences to convergent sequences. ∞ ∞ Let {wk }k=1 be a Cauchy sequence. Then from 11.14, {θwk }k=1 is a Cauchy sequence. Thanks n to Theorem 11.4.7, it converges to some β ∈ F . It follows that limk→∞ θ−1 θwk = limk→∞ wk = θ−1 β ∈ V . This shows completeness. Next let K be a closed and bounded set. Let {wk } ⊆ K. Then {θwk } ⊆ θK which is also a closed and bounded set thanks to the inequalities 11.13 and 11.14. Thus there is a subsequence still denoted with k such that θwk → β ∈ Fn . Then as just done, wk → θ−1 β. Since K is closed, it follows that θ−1 β ∈ K. Finally, why are the only compact sets those which are closed and bounded? Let K be compact. ∞ If it is not bounded, then there is a sequence of points of K, {km }m=1 such that ∥km ∥ ≥ m. It follows

11.6. NORMS ON L (X, Y )

273

that it cannot have a convergent subsequence because the points are further apart from each other than 1/2. Hence K is not sequentially compact and consequently it is not compact. It follows that K is bounded. If K is not closed, then there exists a limit point k which is not in K. (Recall that closed means it has all its limit points.) By Theorem 11.1.7, there is a sequence of distinct points ∞ having no repeats and none equal to k denoted as {km }m=1 such that km → k. Then this sequence {km } fails to have a subsequence which converges to a point of K. Hence K is not sequentially compact. Thus, if K is compact then it is closed and bounded. The last part is identical to the proof in Theorem 11.4.7. You just take a convergent subsequence of a minimizing (maximizing) sequence and exploit continuity.  Next is the theorem which states that any two norms on a finite dimensional vector space are equivalent. Theorem 11.5.4 Let ||·|| , |||·||| be two norms on V a finite dimensional vector space. Then they are equivalent, which means there are constants 0 < a < b such that for all v, a ||v|| ≤ |||v||| ≤ b ||v|| ˆ go with |||·|||. Then using the inequalities Proof: In Lemma 11.5.2, let δ, ∆ go with ||·|| and ˆδ, ∆ of this lemma, ˆ ˆ ∆ ∆∆ ∆∆ ||v|| ≤ ∆ |θv| ≤ |||v||| ≤ |θv| ≤ ||v|| ˆδ ˆδ δ ˆδ and so

ˆδ ˆ ∆ ||v|| ≤ |||v||| ≤ ||v|| ∆ δ

Thus the norms are equivalent.  It follows right away that the closed and open sets are the same with two different norms. Also, all considerations involving limits are unchanged from one norm to another. Corollary 11.5.5 Consider the metric spaces (V, ∥·∥1 ) , (V, ∥·∥2 ) where V has dimension n. Then a set is closed or open in one of these if and only if it is respectively closed or open in the other. In other words, the two metric spaces have exactly the same open and closed sets. Also, a set is bounded in one metric space if and only if it is bounded in the other. Proof: This follows from Theorem 11.1.27, the theorem about the equivalent formulations of continuity. Using this theorem, it follows from Theorem 11.5.4 that the identity map I (x) ≡ x is continuous. The reason for this is that the inequality of this theorem implies that if ∥vm − v∥1 → 0 then ∥Ivm − Iv∥2 = ∥I (vm − v)∥2 → 0 and the same holds on switching 1 and 2 in what was just written. Therefore, the identity map takes open sets to open sets and closed sets to closed sets. In other words, the two metric spaces have the same open sets and the same closed sets. Suppose S is bounded in (V, ∥·∥1 ). This means it is contained in B (0, r)1 where the subscript of 1 indicates the norm is ∥·∥1 . Let δ ∥·∥1 ≤ ∥·∥2 ≤ ∆ ∥·∥1 as described above. Then S ⊆ B (0, r)1 ⊆ B (0, ∆r)2 so S is also bounded in (V, ∥·∥2 ). Similarly, if S is bounded in ∥·∥2 then it is bounded in ∥·∥1 . 

11.6

Norms On L (X, Y )

First here is a definition which applies in all cases, even if X, Y are infinite dimensional. Definition 11.6.1 Let X and Y be normed linear spaces with norms ||·||X and ||·||Y respectively. Then L (X, Y ) denotes the space of linear transformations, called bounded linear transformations, mapping X to Y which have the property that ||A|| ≡ sup {||Ax||Y : ||x||X ≤ 1} < ∞.

274

CHAPTER 11. NORMED LINEAR SPACES

Then ||A|| is referred to as the operator norm of the bounded linear transformation A. We will always assume that if a norm is present the mappings are bounded. However, we show that this boundedness will be automatic in the case of finite dimensions. It is an easy exercise to verify that ||·|| is a norm on L (X, Y ) and it is always the case that ||Ax||Y ≤ ||A|| ||x||X . Furthermore, you should verify that you can replace ≤ 1 with = 1 in the definition. Thus ||A|| ≡ sup {||Ax||Y : ||x||X = 1} . In the case that the vector spaces are finite dimensional, the situation becomes very simple. Lemma 11.6.2 Let V be a finite dimensional vector space with norm ||·||V and let W be a vector space with norm ||·||W . Then if A is a linear map from V to W, then A is continuous and bounded. Proof: Suppose limk→∞ vk = v in V. Let {v1 , ·(· · , vn })be a basis and let θ be the coordinate map of Definition 11.5.1. Then by 11.14, limk→∞ θ vk − v = 0 ∈ Fn . Letting αk and α be θvk and θv respectively, it follows that αk → α and so Avk = A

n ∑ j=1

αkj vj =

n ∑

αkj Avj

j=1

∑n

which converges to k=1 αj Avj = Av as k → ∞. Thus A is continuous. Then also v → ||Av||W is a continuous function. Now let D be the closed ball of radius 1 in V . By Theorem 11.5.3, this set D is compact and so max {||Av||W : ||v||V ≤ 1} ≡ ||A|| < ∞.  Then we have the following theorem. Theorem 11.6.3 Let X and Y be finite dimensional normed linear spaces of dimension n and m respectively and denote by ||·|| the norm on either X or Y . Then if A is any linear function mapping X to Y, then A ∈ L (X, Y ) and (L (X, Y ) , ||·||) is a complete normed linear space of dimension nm with ||Ax|| ≤ ||A|| ||x|| . Also if A ∈ L (X, Y ) and B ∈ L (Y, Z) where X, Y, Z are normed linear spaces, ∥BA∥ ≤ ∥B∥ ∥A∥ Proof: It is necessary to show the norm defined on linear transformations really is a norm. Again the triangle inequality is the only property which is not obvious. It remains to show this and verify ||A|| < ∞. This last follows from the above Lemma 11.6.2. Thus the norm is at least well defined. It remains to verify its properties. ||A + B|| ≡ sup{||(A + B) (x)|| : ||x|| ≤ 1} ≤ sup{||Ax|| : ||x|| ≤ 1} + sup{||Bx|| : ||x|| ≤ 1} ≡ ||A|| + ||B|| . Next consider the assertion about the dimension of L (X, Y ) . It follows from Theorem 5.1.4. By Theorem 11.5.4 (L (X, Y ) , ||·||) is complete. If x ̸= 0, x 1 ≤ ||A|| ||Ax|| = A ||x|| ||x|| Thus ||Ax|| ≤ ||A|| ||x||.

11.7. LIMITS OF A FUNCTION

275

Consider the last claim. ∥BA∥ ≡ sup ∥B (A (x))∥ ≤ ∥B∥ sup ∥Ax∥ = ∥B∥ ∥A∥  ∥x∥≤1

∥x∥≤1

Note by Theorem 11.5.4 you can define a norm any way desired on any finite dimensional linear space which has the field of scalars R or C and any other way of defining a norm on this space yields an equivalent norm. Thus, it doesn’t much matter as far as notions of convergence are concerned which norm is used for a finite dimensional space. In particular in the space of m × n matrices, you can use the operator norm defined above, or some other way of giving this space a norm. A popular choice for a norm is the Frobenius norm. Definition 11.6.4 Define A∗ as the transpose of the conjugate of A. This is called the adjoint of A. Make the space of m × n matrices into a inner product space by defining ∑ ∑∑ ∑ ∗ (A, B) ≡ trace (AB ∗ ) ≡ (AB ∗ )ii = Aij Bji ≡ Aij Bij i 1/2

∥A∥ ≡ (A, A)

i

j

i,j

.

This is clearly a norm because, as implied by the notation, A, B → (A, B) is an inner product on the space of m × n matrices. You should verify that this is the case.

11.7

Limits Of A Function

As in the case of scalar valued functions of one variable, a concept closely related to continuity is that of the limit of a function. The notion of limit of a function makes sense at points x, which are limit points of D (f ) and this concept is defined next. In all that follows (V, ∥·∥) and (W, ∥·∥) are two normed linear spaces. Recall the definition of limit point first. Definition 11.7.1 Let A ⊆ W be a set. A point x, is a limit point of A if B (x, r) contains infinitely many points of A for every r > 0. Definition 11.7.2 Let f : D (f ) ⊆ V → W be a function and let x be a limit point of D (f ). Then lim f (y) = L

y→x

if and only if the following condition holds. For all ε > 0 there exists δ > 0 such that if 0 < ∥y − x∥ < δ, and y ∈ D (f ) then, ∥L − f (y)∥ < ε. Theorem 11.7.3 If limy→x f (y) = L and limy→x f (y) = L1 , then L = L1 . Proof: Let ε > 0 be given. There exists δ > 0 such that if 0 < |y − x| < δ and y ∈ D (f ), then ∥f (y) − L∥ < ε, ∥f (y) − L1 ∥ < ε. Pick such a y. There exists one because x is a limit point of D (f ). Then ∥L − L1 ∥ ≤ ∥L − f (y)∥ + ∥f (y) − L1 ∥ < ε + ε = 2ε. Since ε > 0 was arbitrary, this shows L = L1 .  As in the case of functions of one variable, one can define what it means for limy→x f (x) = ±∞.

276

CHAPTER 11. NORMED LINEAR SPACES

Definition 11.7.4 If f (x) ∈ R, limy→x f (x) = ∞ if for every number l, there exists δ > 0 such that whenever ∥y − x∥ < δ and y ∈ D (f ), then f (x) > l. limy→x f (x) = −∞ if for every number l, there exists δ > 0 such that whenever ∥y − x∥ < δ and y ∈ D (f ), then f (x) < l. The following theorem is just like the one variable version of calculus. Theorem 11.7.5 Suppose f : D (f ) ⊆ V → Fm . Then for x a limit point of D (f ), lim f (y) = L

(11.15)

lim fk (y) = Lk

(11.16)

y→x

if and only if y→x

where f (y) ≡ (f1 (y) , · · · , fp (y)) and L ≡ (L1 , · · · , Lp ). Suppose here that f has values in W, a normed linear space and lim f (y) = L, lim g (y) = K

y→x

y→x

where K,L ∈ W . Then if a, b ∈ F, lim (af (y) + bg (y)) = aL + bK,

(11.17)

lim (f, g) (y) = (L,K)

(11.18)

y→x

If W is an inner product space, y→x

If g is scalar valued with limy→x g (y) = K, lim f (y) g (y) = LK.

y→x

(11.19)

Also, if h is a continuous function defined near L, then lim h ◦ f (y) = h (L) .

y→x

(11.20)

Suppose limy→x f (y) = L. If ∥f (y) − b∥ ≤ r for all y sufficiently close to x, then |L−b| ≤ r also. Proof: Suppose 11.15. Then letting ε > 0 be given there exists δ > 0 such that if 0 < ∥y−x∥ < δ, it follows |fk (y) − Lk | ≤ ∥f (y) − L∥ < ε which verifies 11.16. Now suppose 11.16 holds. Then letting ε > 0 be given, there exists δ k such that if 0 < ∥y−x∥ < δ k , then |fk (y) − Lk | < ε. Let 0 < δ < min (δ 1 , · · · , δ p ). Then if 0 < ∥y−x∥ < δ, it follows ∥f (y) − L∥∞ < ε Any other norm on Fm would work out the same way because the norms are all equivalent. Each of the remaining assertions follows immediately from the coordinate descriptions of the various expressions and the first part. However, I will give a different argument for these. The proof of 11.17 is left for you. Now 11.18 is to be verified. Let ε > 0 be given. Then by the triangle inequality, |(f ,g) (y) − (L,K)| ≤ |(f ,g) (y) − (f (y) , K)| + |(f (y) , K) − (L,K)| ≤ ∥f (y)∥ ∥g (y) − K∥ + ∥K∥ ∥f (y) − L∥ .

11.7. LIMITS OF A FUNCTION

277

There exists δ 1 such that if 0 < ∥y−x∥ < δ 1 and y ∈ D (f ), then ∥f (y) − L∥ < 1, and so for such y, the triangle inequality implies, ∥f (y)∥ < 1 + ∥L∥. Therefore, for 0 < ∥y−x∥ < δ 1 , |(f ,g) (y) − (L,K)| ≤ (1 + ∥K∥ + ∥L∥) [∥g (y) − K∥ + ∥f (y) − L∥] .

(11.21)

Now let 0 < δ 2 be such that if y ∈ D (f ) and 0 < ∥x−y∥ < δ 2 , ∥f (y) − L∥
0 given, there exists η > 0 such that if ∥y−L∥ < η, then ∥h (y) −h (L)∥ < ε Now since limy→x f (y) = L, there exists δ > 0 such that if 0 < ∥y−x∥ < δ, then ∥f (y) −L∥ < η. Therefore, if 0 < ∥y−x∥ < δ,

∥h (f (y)) −h (L)∥ < ε.

It only remains to verify the last assertion. Assume ∥f (y) − b∥ ≤ r. It is required to show that ∥L−b∥ ≤ r. If this is not true, then ∥L−b∥ > r. Consider B (L, ∥L−b∥ − r). Since L is the limit of f , it follows f (y) ∈ B (L, ∥L−b∥ − r) whenever y ∈ D (f ) is close enough to x. Thus, by the triangle inequality, ∥f (y) − L∥ < ∥L−b∥ − r and so r


0 there exists δ > 0 such that if ∥x − y∥ < δ and y ∈ D (f ), then |f (x) − f (y)| < ε. In particular, this holds if 0 < ∥x − y∥ < δ and this is just the definition of the limit. Hence f (x) = limy→x f (y). Next suppose x is a limit point of D (f ) and limy→x f (y) = f (x). This means that if ε > 0 there exists δ > 0 such that for 0 < ∥x − y∥ < δ and y ∈ D (f ), it follows |f (y) − f (x)| < ε. However, if y = x, then |f (y) − f (x)| = |f (x) − f (x)| = 0 and so whenever y ∈ D (f ) and ∥x − y∥ < δ, it follows |f (x) − f (y)| < ε, showing f is continuous at x.  ( 2 ) −9 Example 11.7.7 Find lim(x,y)→(3,1) xx−3 ,y .

278

CHAPTER 11. NORMED LINEAR SPACES It is clear that lim(x,y)→(3,1)

x2 −9 x−3

= 6 and lim(x,y)→(3,1) y = 1. Therefore, this limit equals (6, 1).

Example 11.7.8 Find lim(x,y)→(0,0)

xy x2 +y 2 .

First of all, observe the domain of the function is R2 \ {(0, 0)}, every point in R2 except the origin. Therefore, (0, 0) is a limit point of the domain of the function so it might make sense to take a limit. However, just as in the case of a function of one variable, the limit may not exist. In fact, this is the case here. To see this, take points on the line y = 0. At these points, the value of the function equals 0. Now consider points on the line y = x where the value of the function equals 1/2. Since, arbitrarily close to (0, 0), there are points where the function equals 1/2 and points where the function has the value 0, it follows there can be no limit. Just take ε = 1/10 for example. You cannot be within 1/10 of 1/2 and also within 1/10 of 0 at the same time. Note it is necessary to rely on the definition of the limit much more than in the case of a function of one variable and there are no easy ways to do limit problems for functions of more than one variable. It is what it is and you will not deal with these concepts without suffering and anguish.

11.8

Exercises

1. Consider the metric space C ([0, T ] , Rn ) with the norm ∥f ∥ ≡ maxx∈[0,T ] ∥f (x)∥∞ . Explain why the maximum exists. Show this is a complete metric space. Hint: If you have {fm } a Cauchy sequence in C ([0, T ] , Rn ) , then for each x, you have {fm (x)} a Cauchy sequence in Rn so it converges by completeness of Rn . See Example 11.1.23. Thus there exists f (x) ≡ limm→∞ fm (x). You must show that f is continuous. Consider ∥fm (x) − fm (y)∥ ≤ ∥fm (x) − fn (x)∥ + ∥fn (x) − fn (y)∥ + ∥fn (y) − fm (y)∥ ≤ 2ε/3 + ∥fn (x) − fn (y)∥ for n large enough. Now let m → ∞ to get the same inequality with f on the left. Next use continuity of fn . Finally, ∥f (x) − fn (x)∥ = lim ∥fm (x) − fn (x)∥ m→∞

and since a Cauchy sequence, ∥fm − fn ∥ < ε whenever m > n for n large enough. Use to show that ∥f − f n ∥∞ → 0. 2. For f ∈ C ([0, T ] , Rn ) , you define the Riemann integral in the usual way using Riemann sums. Alternatively, you can define it as (∫ t ) ∫ t ∫ t ∫ t f (s) ds = f1 (s) ds, f2 (s) ds, · · · , fn (s) ds 0

0

0

0

Then show that the following limit exists in Rn for each t ∈ (0, T ) . ∫ t+h lim

h→0

0

f (s) ds − h

∫t 0

f (s) ds

= f (t)

You should use the fundamental theorem of calculus from one variable calculus and the definition of the norm to verify this. Recall that lim f (t) = l

t→s

means that for all ε > 0, there exists δ > 0 such that if 0 < |t − s| < δ, then ∥f (t) − l∥∞ < ε. You have to use the definition of a limit in order to establish that something is a limit.

11.8. EXERCISES

279

3. A collection of functions F of C ([0, T ] , Rn ) is said to be uniformly equicontinuous if for every ε > 0 there exists δ > 0 such that if f ∈ F and |t − s| < δ, then ∥f (t) − f (s)∥∞ < ε. Thus the functions are uniformly continuous all at once. The single δ works for every pair t, s closer together than δ and for all functions f ∈ F . As an easy case, suppose there exists K such that for all f ∈ F , ∥f (t) − f (s)∥∞ ≤ K |t − s| show that F is uniformly equicontinuous. Now suppose G is a collection of functions of C ([0, T ] , Rn ) which is bounded. That is, ∥f ∥ = maxt∈[0,T ] ∥f (t)∥∞ < M < ∞ for all f ∈ G. Then let F denote the functions which are of the form ∫ t F (t) ≡ y0 + f (s) ds 0

where f ∈ G. Show that F is uniformly equicontinuous. Hint: This is a really easy problem if you do the right things. Here is the way you should proceed. Remember ∫ ∫ the triangle inequality b b from one variable calculus which said that for a < b a f (s) ds ≤ a |f (s)| ds. Then



b

f (s) ds

a



∫ ∫ b ∫ b b = max fi (s) ds ≤ max |fi (s)| ds ≤ ∥f (s)∥∞ ds i a i a a

Reduce to the case just considered using the assumption that these f are bounded. 4. Let V be a vector space with {v1 , · · · , vn }. For v ∈ V, denote its coordinate vector as ∑basis n v = (α1 , · · · , αn ) where v = k=1 αk vk . Now define ∥v∥ ≡ ∥v∥∞ Show that this is a norm on V . 5. Let (X, ∥·∥) be a normed linear space. A set A is said to be convex if whenever x, y ∈ A the line segment determined by these points given by tx + (1 − t) y for t ∈ [0, 1] is also in A. Show that every open or closed ball is convex. Remember a closed ball is D (x, r) ≡ {ˆ x : ∥ˆ x − x∥ ≤ r} while the open ball is B (x, r) ≡ {ˆ x : ∥ˆ x − x∥ < r}. This should work just as easily in any normed linear space. 6. A vector v is in the convex hull of a nonempty set S if there are finitely many vectors of S, {v1 , · · · , vm } and nonnegative scalars {t1 , · · · , tm } such that v=

m ∑

tk v k ,

k=1

m ∑

tk = 1.

k=1

Such a linear combination is called a convex ∑m combination. Suppose now that S ⊆ V, a vector space of dimension n. Show that if v = k=1 tk vk is a vector in the convex hull for m > n + 1, then there exist other nonnegative scalars {t′k } summing to 1 such that v=

m−1 ∑

t′k vk .

k=1

Thus every vector in the convex hull of S can be obtained as a convex combination of at most n + 1 points of S. This incredible result is in Rudin [33]. Convexity is more a geometric property than a topological property. Hint: Consider L : Rm → V × R defined by (m ) m ∑ ∑ L (a) ≡ ak vk , ak k=1

k=1

280

CHAPTER 11. NORMED LINEAR SPACES Explain why ker (L) ̸= {0} . This will involve observing that Rm has higher dimension that V × R. Thus L cannot be one to one because one to one functions take linearly independent sets to linearly independent sets and you can’t have a linearly independent set with more than n + 1 vectors in V × R. Next, letting a ∈ ker (L) \ {0} and λ ∈ R, note that λa ∈ ker (L) . Thus for all λ ∈ R, m ∑ (tk + λak ) vk . v= k=1

Now vary λ till some tk +λak = 0 for some ak ̸= 0. You can assume each tk > 0 since otherwise, there is nothing to show. This is a really nice result because it can be used to show that the convex hull of a compact set is also compact. Show this next. This is also Problem 22 but here it is again. This is because it is a really nice result. 7. Show that the usual norm in Fn given by |x| = (x, x)

1/2

satisfies the following identities, the first of them being the parallelogram identity and the second being the polarization identity. 2

|x + y| + |x − y|

2

2

2

= 2 |x| + 2 |y| ) 1( 2 2 Re (x, y) = |x + y| − |x − y| 4

Show that these identities hold in any inner product space, not just Fn . 8. Let K be a nonempty closed and convex set in an inner product space (X, |·|) which is complete. For example, Fn or any other finite dimensional inner product space. Let y ∈ / K and let λ = inf {|y − x| : x ∈ K} Let {xn } be a minimizing sequence. That is λ = lim |y − xn | n→∞

Explain why such a minimizing sequence exists. Next explain the following using the parallelogram identity in the above problem as follows. 2 2 y − xn + xm = y − xn + y − xm 2 2 2 2 2 y x ( y x ) 2 1 1 n m 2 2 = − − − − + |y − xn | + |y − xm | 2 2 2 2 2 2 Hence xm − xn 2 2

2 xn + xm 1 1 2 2 = − y − + 2 |y − xn | + 2 |y − xm | 2 1 1 2 2 ≤ −λ2 + |y − xn | + |y − xm | 2 2

Next explain why the right hand side converges to 0 as m, n → ∞. Thus {xn } is a Cauchy sequence and converges to some x ∈ X. Explain why x ∈ K and |x − y| = λ. Thus there exists a closest point in K to y. Next show that there is only one closest point. Hint: To do this, 2 using the parallelogram law to show that this suppose there are two x1 , x2 and consider x1 +x 2 average works better than either of the two points which is a contradiction unless they are really the same point. This theorem is of enormous significance.

11.8. EXERCISES

281

9. Let K be a closed convex nonempty set in a complete inner product space (H, |·|) (Hilbert space) and let y ∈ H. Denote the closest point to y by P x. Show that P x is characterized as being the solution to the following variational inequality Re (z − P x, y − P x) ≤ 0 for all z ∈ K. Hint: Let x ∈ K. Then, due to convexity, a generic thing in K is of the form x + t (z − x) , t ∈ [0, 1] for every z ∈ K. Then 2

2

2

|x + t (z − x) − y| = |x − y| + t2 |z − x| − t2Re (z − x, y − x) If x = P y, then the minimum value of this on the left occurs when t = 0. Function defined on [0, 1] has its minimum at t = 0. What does it say about the derivative of this function at t = 0? Next consider the case that for some x the inequality Re (z − x, y − x) ≤ 0. Explain why this shows x = P y. 10. Using Problem 9 and Problem 8 show the projection map, P onto a closed convex subset is Lipschitz continuous with Lipschitz constant 1. That is |P x − P y| ≤ |x − y| 11. Suppose, in an inner product space, you know Re (x, y) . Show that you also know Im (x, y). That is, give a formula for Im (x, y) in terms of Re (x, y). Hint: (x, iy)

=

−i (x, y) = −i (Re (x, y) + iIm (x, y))

=

−iRe (x, y) + Im (x, y)

while, by definition, (x, iy) = Re (x, iy) + iIm (x, iy) Now consider matching real and imaginary parts. 12. Suppose K is a compact subset (If C is a set of open sets whose union contains K,(open cover) then there are finitely many sets of C whose union contains K.) of (X, d) a metric space. Also let C be an open cover of K. Show that there exists δ > 0 such that for all x ∈ K, B (x, δ) is contained in a single set of C. This number is called a Lebesgue number. Hint: For each x ∈ K, there exists B (x, δ x ) such that this ball is contained { (in a set)}ofn C. Now consider the { ( δx )} δ balls B x, 2 x∈K . Finitely many of these cover K. B xi , 2xi Now consider what i=1 } { δ happens if you let δ ≤ min 2xi , i = 1, 2, · · · , n . Explain why this works. You might draw a picture to help get the idea. 13. Suppose C is a set of compact sets (A set is compact if every open cover admits a finite subcover.) in a metric space (X, d) and suppose that the intersection of every finite subset of C is nonempty. This is called the finite intersection property. Show that ∩C, the intersection of all sets of C is nonempty. This particular result is enormously important. { } Hint: You could let U denote the set K C : K ∈ C . If ∩C is empty, then its complement is ∪U = X.{ Picking K ∈}C, it follows that U is an open cover of K. Therefore, you would need C to have K1C , · · · , Km is a cover of K. In other words, C m K ⊆ ∪m i=1 Ki = (∩i=1 Ki )

C

Now what does this say about the intersection of K with these Ki ? 14. Show that if f is continuous and defined on a compact set K in a metric space, then it is uniformly continuous. Continuous means continuous at every point. Uniformly continuous means: For every ε > 0 there exists δ > 0 such that if d (x, y) < δ, then d (f (x) , f (y)) < ε.

282

CHAPTER 11. NORMED LINEAR SPACES The difference is that δ does not depend on x. Hint: Use the existence of the Lebesgue number in Problem 12 to prove { continuity on a compact set } K implies uniform continuity on this set. Hint: Consider C ≡ f −1 (B (f (x) , ε/2)) : x ∈ X . This is an open cover of X. Let δ be a Lebesgue number for Suppose d (x, x ˆ) < δ. Then both x, x ˆ are in B (x, δ) and ( this ( openε cover. )) so both are in f −1 B f (¯ x) , 2 . Hence ρ (f (x) , f (¯ x)) < 2ε and ρ (f (ˆ x) , f (¯ x)) < 2ε . Now consider the triangle inequality. Recall the usual definition of continuity. In metric space it is as follows: For (D, d) , (Y, ρ) metric spaces, f : D → Y is continuous at x ∈ D means that for all ε > 0 there exists δ > 0 such that if d (y, x) < δ, then ρ (f (x) , f (y)) < ε. Continuity on D means continuity at every point of D.

15. The definition of compactness is that a set K is compact if and only if every open cover (collection of open sets whose union contains K) has a finite subset which is also an open cover. Show that this is equivalent to saying that every open cover consisting of balls has a finite subset which is also an open cover. 16. A set K in a metric space is said to be sequentially compact if whenever {xn } is a sequence in K, there exists a subsequence which converges to a point of K. Show that if K is compact, then it is sequentially compact. Hint: Explain why if x ∈ K, then there exist an open set Bx containing x which has xk for only finitely many values of k. Then use compactness. This was shown in the chapter, but do your own proof of this part of it. 17. Show that f : D → Y is continuous at x ∈ D where (D, d) , (Y, ρ) are metric spaces if and only if whenever xn → x in D, it follows that f (xn ) → f (x). Recall the usual definition of continuity. f is continuous at x means that for all ε > 0 there exists δ > 0 such that if d (y, x) < δ, then ρ (f (x) , f (y)) < ε. Continuity on D means continuity at every point of D. This is in the chapter, but go through the proof and write it down in your own words. 18. Give an easier proof of the result of Problem 14. Hint: If f is not uniformly continuous, then there exists ε > 0 and xn , yn , d (xn , yn ) < n1 but d (f (xn ) , f (yn )) ≥ ε. Now use sequential compactness of K to get a contradiction. 19. This problem will reveal the best kept secret in undergraduate mathematics, the definition of the derivative of a function of n variables. Let ∥·∥V be a norm on V and also denote by ∥·∥W a norm on W . Write ∥·∥ for both to save notation. Let U ⊆ V be an open set. Let f : U 7→ W be a function having values in W . Then f is differentiable at x ∈ U means that there exists A ∈ L (V, W ) such that for every ε > 0, there exists a δ > 0 such that whenever 0 < ∥v∥ < δ, it follows that ∥f (x + v) − f (x) − Av∥ 0. It is clear that Ni (λi I) = (λi I) N and so n

(Jri (λi )) =

n ( ) ∑ n k=0

k

N k λn−k = i

ri ( ) ∑ n k=0

k

N k λn−k i

which converges to 0 due to the assumption that |λi | < 1. There are finitely many terms and a typical one is a matrix whose entries are no larger than an expression of the form n−k

|λi |

Ck n (n − 1) · · · (n − k + 1) ≤ Ck |λi |

n−k

nk

∑∞ n−k k which converges to 0 because, by the root test, the series n=1 |λi | n converges. Thus for each i = 2, . . . , p, n lim (Jri (λi )) = 0. n→∞

By Condition 2, if

anij

denotes the ij th entry of An , then either p ∑

anij = 1 or

i=1

p ∑

anij = 1, anij ≥ 0.

j=1

This follows from Lemma 12.1.2. It is obvious each anij ≥ 0, and so the entries of An must be bounded independent of n. It follows easily from n times

z }| { P −1 AP P −1 AP P −1 AP · · · P −1 AP = P −1 An P that

P −1 An P = J n

(12.1)

Hence J must also have bounded entries as n → ∞. However, this requirement is incompatible with an assumption that N ̸= 0. If N ̸= 0, then N s ̸= 0 but N s+1 = 0 for some 1 ≤ s ≤ r. Then n

n

(I + N ) = I +

s ( ) ∑ n k=1

k

Nk

One of the entries of N s is nonzero by the definition of s. Let this entry be nsij . Then this implies ( ) n that one of the entries of (I + N ) is of the form ns nsij . This entry dominates the ij th entries of ( n) k k N for all k < s because ( ) ( ) n n lim / =∞ n→∞ s k

12.1. REGULAR MARKOV MATRICES

285

n

Therefore, the entries of (I + N ) cannot all be bounded. From block multiplication,   n (I + N )   n (Jr2 (λ2 ))   −1 n   P A P = ..  .   n (Jrm (λm )) and this is a contradiction because entries are bounded on the left and unbounded on the right. Since N = 0, the above equation implies limn→∞ An exists and equals   I   0   −1  P P  ..  .   0 Are there examples which will cause the eigenvalue condition of this theorem to hold? The following lemma gives such a condition. It turns out that if aij > 0, not just ≥ 0, then the eigenvalue condition of the above theorem is valid. Lemma 12.1.4 Suppose A = (aij ) is a stochastic matrix. Then λ = 1 is an eigenvalue. If aij > 0 for all i, j, then if µ is an eigenvalue of A, either |µ| < 1 or µ = 1. Proof: First consider the claim that 1 is an eigenvalue. By definition, ∑ 1aij = 1 i

(

)T

. Since A, AT have the same eigenvalues, this shows 1 1 ··· 1 is an eigenvalue of A. Suppose then that µ is an eigenvalue. Is |µ| < 1 or µ = 1? Let v be an eigenvector for AT and let |vi | be the largest of the |vj | . ∑ µvi = aji vj and so AT v = v where v =

j

and now multiply both sides by µvi to obtain ∑ ∑ 2 2 |µ| |vi | = aji vj µvi = aji Re (vj µvi ) j





j 2

2

aji |vi | |µ| = |µ| |vi |

j

Therefore, |µ| ≤ 1. If |µ| = 1, then equality must hold in the above, and so vj vi µ must be real and nonnegative for each j. In particular, this holds for j = i which shows µ is real and nonnegative. Thus, in this case, µ = 1 because µ ¯ = µ is nonnegative and equal to 1. The only other case is where |µ| < 1.  The next lemma is sort of a conservation result. It says the sign and sum of entries of a vector are preserved when multiplying by a Markov matrix. Lemma 12.1.5 ∑ Let A be any Markov matrix and let v be a vector having all ∑its components non negative with i vi = c. Then if w = Av, it follows that wi ≥ 0 for all i and i wi = c. Proof: From the definition of w, wi ≡

∑ j

aij vj ≥ 0.

286

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

Also



wi =

i

∑∑ i

aij vj =

j

∑∑ j

aij vj =

i



vj = c. 

j

The following theorem about limits is now easy to obtain. Theorem 12.1.6 Suppose A is a Markov matrix in which aij > 0 for all i, j and suppose w is a vector. Then for each i, ( ) lim Ak w i = vi k→∞

k

where Av = v. In words, A w ∑ always converges to a steady state. In addition to this, if the vector w satisfies w ≥ 0 for all i and i i wi = c, then the vector v will also satisfy the conditions, vi ≥ 0, ∑ i vi = c. Proof: By Lemma 12.1.4, since each aij > 0, the eigenvalues are either 1 or have absolute value less than 1. Therefore, the claimed limit exists by Theorem 12.1.3. The assertion that the components are nonnegative and sum to c follows from Lemma 12.1.5. That Av = v follows from v = lim An w = lim An+1 w = A lim An w = Av.  n→∞

n→∞

n→∞

It is not hard to generalize the conclusion of this theorem to regular Markov processes which are those having some power with all positive entries. Corollary 12.1.7 Suppose A is a regular Markov matrix, one for which the entries of Ak are all positive for some k, and suppose w is a vector. Then for each i, lim (An w)i = vi

n→∞

where Av = v. In words, An w ∑ always converges to a steady state. In addition to this, if the vector w satisfies w ≥ 0 for all i and i i wi = c, Then the vector v will also satisfy the conditions vi ≥ 0, ∑ v = c. i i Proof: Let the entries of Ak be all positive for some k. Now suppose that aij ≥ 0 for all i, j and A = (aij ) is a Markov matrix. Then if B = (bij ) is a Markov matrix with bij > 0 for all ij, it follows that BA is a Markov matrix which has strictly positive entries. This is because the ij th entry of BA is ∑ bik akj > 0, k

Thus, from Lemma 12.1.4, A has eigenvalues {1, λ1 , · · · , λr } , |λr | < 1. The same must be true of A. If Ax = µx for x ̸= 0 and µ ̸= 1, Then Ak x = µk x and so either µk = 1 or |µ| < 1. If µk = 1, then |µ| = 1 and the eigenvalues of Ak+1 are either 1 or have absolute value less than 1 because Ak+1 has all postive entries thanks to Lemma 12.1.4. Thus µk+1 = 1 and so k

1 = µk+1 = µµk = µ By Theorem 12.1.3, limn→∞ An w exists. The rest follows as in Theorem 12.1.6. 

12.2

Migration Matrices

Definition 12.2.1 Let n locations be denoted by the numbers 1, 2, · · · , n. Also suppose it is the case that each year aij denotes the proportion of residents in location j which move to location i. Also suppose no one escapes or emigrates from without these n locations. This last assumption requires ∑ a = 1. Thus (aij ) is a Markov matrix referred to as a migration matrix. i ij

12.3. ABSORBING STATES

287

T

If v = (x1 , · · · , xn ) where xi is the population of∑ location i at a given instant, you obtain the population of location i one year later by computing j aij xj = (Av)i . Therefore, the population ( ) of location i after k years is Ak v i . Furthermore, Corollary 12.1.7 can be used to predict in the case where A is regular what the long time population will be for the given locations. As an example of the above, consider the case where n = 3 and the migration matrix is of the form   .6 0 .1    .2 .8 0  . .2 .2 .9 Now



2  .1 . 38 .0 2   0  =  . 28 . 64 .9 . 34 . 34 ( k ) and so the Markov matrix is regular. Therefore, A v i will steady state. It follows the steady state can be obtained from .6   .2 .2

0 .8 .2

 . 15  .0 2  . 83 converge to the ith component of a solving the system

. 6x + . 1z = x . 2x + . 8y = y . 2x + . 2y + . 9z = z along with the stipulation that the sum of x, y, and z must equal the constant value present at the beginning of the process. The solution to this system is {y = x, z = 4x, x = x} . If the total population at the beginning is 150,000, then you solve the following system y = x, z = 4x, x + y + z = 150000 whose solution is easily seen to be {x = 25 000, z = 100 000, y = 25 000} . Thus, after a long time there would be about four times as many people in the third location as in either of the other two.

12.3

Absorbing States

There is a different kind of Markov process containing so called absorbing states which result in transition matrices which are not regular. However, Theorem 12.1.3 may still apply. One such example is the Gambler’s ruin problem. There is a total amount of money denoted by b. The Gambler starts with an amount j > 0 and gambles till he either loses everything or gains everything. He does this by playing a game in which he wins with probability p and loses with probability q. When he wins, the amount of money he has increases by 1 and when he loses, the amount of money he has decreases by 1. Thus the states are the integers from 0 to b. Let pij denote the probability that the gambler has i at the end of a game given that he had j at the beginning. Let pnij denote the probability that the gambler has i after n games given that he had j initially. Thus ∑ pn+1 = pik pnkj , ij k

288

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

and so pnij is the ij th entry of P n where P is the transition matrix. The above description indicates that this transition probability matrix is of the form   1 q 0 0 ··· 0    0 0 q 0    ..   ..  0 p 0  . .  P = (12.2)  .  . . .. .. q 0   ..     p 0 0   0 0 0 ··· 0 p 1 The absorbing states are 0 and b. In the first, the gambler has lost everything and hence has nothing else to gamble, so the process stops. In the second, he has won everything and there is nothing else to gain, so again the process stops. Consider the eigenvalues of this matrix which is a piece of the above transition matrix. Lemma 12.3.1 Let p, q > 0 and p + q = 1. Then the eigenvalues of   0 q 0    p 0 ...    A≡  .. ..   . . q   0 p 0 have absolute value less than 1. Proof: By Gerschgorin’s ∑ theorem, (See Page 119.) if λ is an eigenvalue, then |λ| ≤ 1. Alternatively, you note that i Aij ≤ 1. If λ is an eigenvalue of A then it is also one for AT and if AT x = λx where |xi | is the largest of the |xj | , ∑ ∑ Aji xj = λxi , |λ| |xi | ≤ Aji |xj | ≤ |xi | so |λ| ≤ 1. j

j

Now suppose v is an eigenvector for λ. Then  qv2  pv + qv 1 3   .. Av =  .    pvn−2 + qvn pvn−1





v1 v2 .. .

       = λ      vn−1  vn

    .   

Suppose |λ| = 1. Let vk be the first nonzero entry. Then vk−1 = 0 and so qvk+1 = λvk which implies |vk+1 | > |vk |. Thus |vk+1 | ≥ |vk |. Then consider the next term. From the above equations and what was just shown, |vk+1 | = |pvk + qvk+1 | ≤ p |vk | + q |vk+2 | ≤ p |vk+1 | + q |vk+2 | and so q |vk+1 | ≤ q |vk+2 | n

Continuing this way, it follows that the sequence {|vj |}j=k must be increasing. Specifically, if m {|vj |}j=k is increasing for some m > k, then p |vm−1 | + q |vm | ≥ |pvm−2 + qvm | = |λvm−1 | = |vm−1 |

12.3. ABSORBING STATES

289

and so q |vm | ≥ q |vm−1 | . Hence |vn | ≥ |vn−1 | > 0. However, this is contradicted by the the last line which states that p |vn−1 | = |vn | which requires that |vn−1 | > |vn | , a contradiction. Therefore, it must be that |λ| < 1.  Now consider the eigenvalues of 12.2. For P given there,   1−λ q 0 ··· 0   .  0 −λ . . 0      . .   .. q .. P − λI =  0 p    .. ..   . −λ  . 0  0 ··· 0 p 1−λ and so, expanding the determinant of the matrix column yields  −λ   p  2 (1 − λ) det   

along the first column and then along the last 

q ..

.

..

.

..

.

−λ p

   .  q  −λ

2

The roots of the polynomial after (1 − λ) have absolute value less than 1 because they are just the eigenvalues of a matrix of the sort in Lemma 12.3.1. It follows that the conditions of Theorem 12.1.3 apply and therefore, limn→∞ P n exists.  Of course, the above transition matrix, models many other kinds of problems. It is called a Markov process with two absorbing states, sometimes a random walk with two aborbing states. It is interesting to find the probability that the gambler loses all his money. This is given by limn→∞ pn0j .From the transition matrix for the gambler’s ruin problem, it follows that ∑

n−1 n−1 pn−1 0k pkj = qp0(j−1) + pp0(j+1) for j ∈ [1, b − 1] ,

pn0j

=

pn00

= 1, and pn0b = 0.

k

Assume here that p ̸= q. Now it was shown above that limn→∞ pn0j exists. Denote by Pj this limit. Then the above becomes much simpler if written as Pj

= qPj−1 + pPj+1 for j ∈ [1, b − 1] ,

(12.3)

P0

= 1 and Pb = 0.

(12.4)

It is only required to find a solution to the above difference equation with boundary conditions. To do this, look for a solution in the form Pj = rj and use the difference equation with boundary conditions to find the correct values of r. Thus you need rj = qrj−1 + prj+1 and so to find r you need to have pr2 − r + q = 0, and so the solutions for r are r = ) 1 ( ) √ √ 1 ( 1 + 1 − 4pq , 1 − 1 − 4pq 2p 2p Now

√ √ √ 1 − 4pq = 1 − 4p (1 − p) = 1 − 4p + 4p2 = 1 − 2p.

Thus the two values of r simplify to 1 q 1 (1 + 1 − 2p) = , (1 − (1 − 2p)) = 1 2p p 2p

290

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

Therefore, for any choice of Ci , i = 1, 2, ( )j q C1 + C2 p will solve the difference equation. Now choose C1 , C2 to satisfy the boundary conditions 12.4. Thus you need to have ( )b q =0 C1 + C2 = 1, C1 + C2 p It follows that C2 = Thus Pj = qb pb + b b b q −p p − qb

pb

pb qb , C1 = b b −q q − pb

( ) ( )j q j q b−j − pb−j qb q pb−j q j = b − b = p q − pb q − pb q b − pb

To find the solution in the case of a fair game, one could take the limp→1/2 of the above solution. Taking this limit, you get b−j Pj = . b You could also verify directly in the case where p = q = 1/2 in 12.3 and 12.4 that Pj = 1 and Pj = j are two solutions to the difference equation and proceeding as before.

12.4

Positive Matrices

Earlier theorems about Markov matrices were presented. These were matrices in which all the entries were nonnegative and either the columns or the rows added to 1. It turns out that many of the theorems presented can be generalized to positive matrices. When this is done, the resulting theory is mainly due to Perron and Frobenius. I will give an introduction to this theory here following Karlin and Taylor [25]. Definition 12.4.1 For A a matrix or vector, the notation, A >> 0 will mean every entry of A is positive. By A > 0 is meant that every entry is nonnegative and at least one is positive. By A ≥ 0 is meant that every entry is nonnegative. Thus the matrix or vector consisting only of zeros is ≥ 0. An expression like A >> B will mean A − B >> 0 with similar modifications for > and ≥. T For the sake of this section only, define the following for x = (x1 , · · · , xn ) , a vector. T

|x| ≡ (|x1 | , · · · , |xn |) . Thus |x| is the vector which results by replacing each entry of x with its absolute value1 . Also define for x ∈ Cn , ∑ ||x||1 ≡ |xk | . k

Lemma 12.4.2 Let A >> 0 and let x > 0. Then Ax >> 0. ∑ Proof: (Ax)i = j Aij xj > 0 because all the Aij > 0 and at least one xj > 0. Lemma 12.4.3 Let A >> 0. Define S ≡ {λ : Ax > λx for some x >> 0} , 1 This notation is just about the most abominable thing imaginable because it is the same notation but entirely different meaning than the norm. However, it saves space in the presentation of this theory of positive matrices and avoids the use of new symbols. Please forget about it when you leave this section.

12.4. POSITIVE MATRICES

291

and let K ≡ {x ≥ 0 such that ||x||1 = 1} . Now define S1 ≡ {λ : Ax ≥ λx for some x ∈ K} . Then sup (S) = sup (S1 ) . Proof: Let λ ∈ S. Then there exists x >> 0 such that Ax > λx. Consider y ≡ x/ ||x||1 . Then ||y||1 = 1 and Ay > λy. Therefore, λ ∈ S1 and so S ⊆ S1 . Therefore, sup (S) ≤ sup (S1 ) . Now let λ ∈ S1 . Then there exists x ≥ 0 such that ||x||1 = 1 so x > 0 and Ax > λx. Letting y ≡ Ax, it follows from Lemma 12.4.2 that Ay >> λy and y >> 0. Thus λ ∈ S and so S1 ⊆ S which shows that sup (S1 ) ≤ sup (S) .  This lemma is significant because the set, {x ≥ 0 such that ||x||1 = 1} ≡ K is a compact set in Rn . Define λ0 ≡ sup (S) = sup (S1 ) . (12.5) The following theorem is due to Perron. Theorem 12.4.4 Let A >> 0 be an n × n matrix and let λ0 be given in 12.5. Then 1. λ0 > 0 and there exists x0 >> 0 such that Ax0 = λ0 x0 so λ0 is an eigenvalue for A. 2. If Ax = µx where x ̸= 0, and µ ̸= λ0 . Then |µ| < λ0 . 3. The eigenspace for λ0 has dimension 1. T

Proof: To see λ0 > 0, consider the vector, e ≡ (1, · · · , 1) . Then ∑ (Ae)i = Aij > 0 j

and so $\lambda_0$ is at least as large as $\min_i \sum_{j} A_{ij}$.

Let {λk } be an increasing sequence of numbers from S1 converging to λ0 . Letting xk be the vector from K which occurs in the definition of S1 , these vectors are in a compact set. Therefore, there exists a subsequence, still denoted by xk such that xk → x0 ∈ K and λk → λ0 . Then passing to the limit, Ax0 ≥ λ0 x0 , x0 > 0. If Ax0 > λ0 x0 , then letting y ≡ Ax0 , it follows from Lemma 12.4.2 that Ay >> λ0 y and y >> 0. But this contradicts the definition of λ0 as the supremum of the elements of S because since Ay >> λ0 y, it follows Ay >> (λ0 + ε) y for ε a small positive number. Therefore, Ax0 = λ0 x0 . It remains to verify that x0 >> 0. But this follows immediately from ∑ 0< Aij x0j = (Ax0 )i = λ0 x0i . j

This proves 1. Next suppose Ax = µx and x ̸= 0 and µ ̸= λ0 . Then |Ax| = |µ| |x| . But this implies A |x| ≥ |µ| |x| . (See the above abominable definition of |x|.) Case 1: |x| ̸= x and |x| ̸= −x. In this case, A |x| > |Ax| = |µ| |x| and letting y = A |x| , it follows y >> 0 and Ay >> |µ| y which shows Ay >> (|µ| + ε) y for sufficiently small positive ε and verifies |µ| < λ0 . Case 2: |x| = x or |x| = −x

292

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

In this case, the entries of x are all real and have the same sign. Therefore, A |x| = |Ax| = |µ| |x| . Now let y ≡ |x| / ||x||1 . Then Ay = |µ| y and so |µ| ∈ S1 showing that |µ| ≤ λ0 . But also, the fact the entries of x all have the same sign shows µ = |µ| and so µ ∈ S1 . Since µ ̸= λ0 , it must be that µ = |µ| < λ0 . This proves 2. It remains to verify 3. Suppose then that Ay = λ0 y and for all scalars α, αx0 ̸= y. Then A Re y = λ0 Re y, A Im y = λ0 Im y. If Re y = α1 x0 and Im y = α2 x0 for real numbers, αi ,then y = (α1 + iα2 ) x0 and it is assumed this does not happen. Therefore, either t Re y ̸= x0 for all t ∈ R or t Im y ̸= x0 for all t ∈ R. Assume the first holds. Then varying t ∈ R, there exists a value of t such that x0 + t Re y > 0 but it is not the case that x0 + t Re y >> 0. Then A (x0 + t Re y) >> 0 by Lemma 12.4.2. But this implies λ0 (x0 + t Re y) >> 0 which is a contradiction. Hence there exist real numbers, α1 and α2 such that Re y = α1 x0 and Im y = α2 x0 showing that y = (α1 + iα2 ) x0 . This proves 3. It is possible to obtain a simple corollary to the above theorem. Corollary 12.4.5 If A > 0 and Am >> 0 for some m ∈ N, then all the conclusions of the above theorem hold. Proof: There exists µ0 > 0 such that Am y0 = µ0 y0 for y0 >> 0 by Theorem 12.4.4 and µ0 = sup {µ : Am x ≥ µx for some x ∈ K} . Let λm 0 = µ0 . Then

( ) (A − λ0 I) Am−1 + λ0 Am−2 + · · · + λm−1 I y0 = (Am − λm 0 0 I) y0 = 0 ( ) and so letting x0 ≡ Am−1 + λ0 Am−2 + · · · + λm−1 I y0 , it follows x0 >> 0 and Ax0 = λ0 x0 . 0 Suppose now that Ax = µx for x ̸= 0 and µ ̸= λ0 . Suppose |µ| ≥ λ0 . Multiplying both sides by m m A, it follows Am x = µm x and |µm | = |µ| ≥ λm 0 = µ0 and so from Theorem 12.4.4, since |µ | ≥ µ0 , m m m and µ is an eigenvalue of A , it follows that µ = µ0 . But by Theorem 12.4.4 again, this implies x = cy0 for some scalar, c and hence Ay0 = µy0 . Since y0 >> 0, it follows µ ≥ 0 and so µ = λ0 , a contradiction. Therefore, |µ| < λ0 . Finally, if Ax = λ0 x, then Am x = λm 0 x and so x = cy0 for some scalar, c. Consequently, ) ( m−1 ) ( A + λ0 Am−2 + · · · + λm−1 I x = c Am−1 + λ0 Am−2 + · · · + λm−1 I y0 0 0 =

cx0 .

Hence x = cx0 mλm−1 0 which shows the dimension of the eigenspace for λ0 is one.  The following corollary is an extremely interesting convergence result involving the powers of positive matrices. Corollary 12.4.6 Let A > 0 and Am >> 0 for ( some N. Then for λ0 given in 12.5, there )m m ∈ exists a rank one matrix P such that limm→∞ λA0 − P = 0. Proof: Considering AT , and the fact that A and AT have the same eigenvalues, Corollary 12.4.5 implies the existence of a vector, v >> 0 such that AT v = λ0 v.

12.4. POSITIVE MATRICES

293

Also let x0 denote the vector such that Ax0 = λ0 x0 with x0 >> 0. First note that xT0 v > 0 because both these vectors have all entries positive. Therefore, v may be scaled such that vT x0 = xT0 v = 1.

(12.6)

Define P ≡ x0 vT . Thanks to 12.6, A P = x0 vT = P, P λ0

(

A λ0

)

( = x0 v T

A λ0

) = x0 vT = P,

(12.7)

and P² = x0 vT x0 vT = x0 (vT x0 ) vT = x0 vT = P.

(12.8)

Therefore, (

A −P λ0

)2

( = ( =

A λ0 A λ0

)2

( −2

)2

A λ0

) P + P2

− P.

Continuing this way, using 12.7 repeatedly, it follows (( ) )m ( )m A A − P. (12.9) −P = λ0 λ0 ( ) The eigenvalues of λA0 − P are of interest because it is powers of this matrix which determine ( )m the convergence of λA0 to P. Therefore, let µ be a nonzero eigenvalue of this matrix. Thus ((

A λ0

)

) −P

x = µx

(12.10)

for x ̸= 0, and µ ̸= 0. Applying P to both sides and using the second formula of 12.7 yields ( ( ) ) A 0 = (P − P ) x = P − P 2 x = µP x. λ0 But since P x = 0, it follows from 12.10 that Ax = λ0 µx which implies λ0 µ is an eigenvalue of A. Therefore, by Corollary 12.4.5 it follows that either λ0 µ = λ0 in which case µ = 1, or λ0 |µ| < λ0 which implies |µ| < 1. But if µ = 1, then x is a multiple of x0 and 12.10 would yield (( ) ) A − P x0 = x0 λ0 which says x0 − x0 vT x0 = x0 and so by 12.6, x0 = 0 contrary to the property that ( x)0 >> 0. Therefore, |µ| < 1 and so this has shown that the absolute values of all eigenvalues of λA0 − P are less than 1. By Gelfand’s theorem, Theorem 15.2.4, it follows (( ) )m 1/m A > 0? As before, K ≡ {x ≥ 0 such that ||x||1 = 1} . Now define S1 ≡ {λ : Ax ≥ λx for some x ∈ K} and λ0 ≡ sup (S1 )

(12.11)

Theorem 12.4.7 Let A > 0 and let λ0 be defined in 12.11. Then there exists x0 > 0 such that Ax0 = λ0 x0 . Proof: Let E consist of the matrix which has a one in every entry. Then from Theorem 12.4.4 it follows there exists xδ >> 0 , ||xδ ||1 = 1, such that (A + δE) xδ = λ0δ xδ where λ0δ ≡ sup {λ : (A + δE) x ≥ λx for some x ∈ K} . Now if α < δ {λ : (A + αE) x ≥ λx for some x ∈ K} ⊆ {λ : (A + δE) x ≥ λx for some x ∈ K} and so λ0δ ≥ λ0α because λ0δ is the sup of the second set and λ0α is the sup of the first. It follows the limit, λ1 ≡ limδ→0+ λ0δ exists. Taking a subsequence and using the compactness of K, there exists a subsequence, still denoted by δ such that as δ → 0, xδ → x ∈ K. Therefore, Ax = λ1 x and so, in particular, Ax ≥ λ1 x and so λ1 ≤ λ0 . But also, if λ ≤ λ0 , λx ≤ Ax < (A + δE) x showing that λ0δ ≥ λ for all such λ. But then λ0δ ≥ λ0 also. Hence λ1 ≥ λ0 , showing these two numbers are the same. Hence Ax = λ0 x.  If Am >> 0 for some m and A > 0, it follows that the dimension of the eigenspace for λ0 is one and that the absolute value of every other eigenvalue of A is less than λ0 . If it is only assumed that A > 0, not necessarily >> 0, this is no longer true. However, there is something which is very interesting which can be said. First here is an interesting lemma. Lemma 12.4.8 Let M be a matrix of the form ( M= (

$$M = \begin{pmatrix} A & 0 \\ B & C \end{pmatrix} \quad\text{or}\quad M = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}$$
where A is an r × r matrix and C is an (n − r) × (n − r) matrix. Then det (M ) = det (A) det (C) and σ (M ) = σ (A) ∪ σ (C) .

12.4. POSITIVE MATRICES

295

Proof: To verify the claim about the determinants, note ( ) ( )( A 0 A 0 I = B C 0 I B Therefore,

( det

A B

0 C

)

( = det

A 0

0 I

)

0 C (

det

I B

)

0 C

) .

But it is clear from the method of Laplace expansion that ( ) A 0 det = det A 0 I and from the multilinear properties of the determinant and row operations that ( ) ( ) I 0 I 0 det = det = det C. B C 0 C The case where M is upper block triangular is similar. This immediately implies σ (M ) = σ (A) ∪ σ (C) . Theorem 12.4.9 Let A > 0 and let λ0 be given in 12.11. If λ is an eigenvalue for A such that m |λ| = λ0 , then λ/λ0 is a root of unity. Thus (λ/λ0 ) = 1 for some m ∈ N. Proof: Applying Theorem 12.4.7 to AT , there exists v > 0 such that AT v = λ0 v. In the first part of the argument it is assumed v >> 0. Now suppose Ax = λx, x ̸= 0 and that |λ| = λ0 . Then A |x| ≥ |λ| |x| = λ0 |x| and it follows that if A |x| > |λ| |x| , then since v >> 0, ( ) λ0 (v, |x|) < (v,A |x|) = AT v, |x| = λ0 (v, |x|) , a contradiction. Therefore, A |x| = λ0 |x| . It follows that

(12.12)

∑ ∑ A x Aij |xj | ij j = λ0 |xi | = j j

and so the complex numbers, Aij xj , Aik xk must have the same argument for every k, j because equality holds in the triangle inequality. Therefore, there exists a complex number, µi such that Aij xj = µi Aij |xj | and so, letting r ∈ N, Summing on j yields

(12.13)

Aij xj µrj = µi Aij |xj | µrj . ∑

Aij xj µrj = µi

j



Aij |xj | µrj .

(12.14)

j

Also, summing 12.13 on j and using that λ is an eigenvalue for x, it follows from 12.12 that ∑ ∑ λxi = Aij xj = µi Aij |xj | = µi λ0 |xi | . (12.15) j

j

296

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

From 12.14 and 12.15, ∑

Aij xj µrj



= µi

j

Aij |xj | µrj

j



= µi

see 12.15

z }| { Aij µj |xj | µjr−1

j



= µi

( Aij

j

( = µi

λ λ0

λ λ0

)∑

) xj µr−1 j

Aij xj µr−1 j

j

Now from 12.14 with r replaced by r − 1, this equals ( )∑ ( )∑ λ λ 2 µ2i Aij |xj | µr−1 = µ Aij µj |xj | µr−2 i j j λ0 λ 0 j j ( )2 ∑ λ 2 = µi Aij xj µr−2 . j λ0 j Continuing this way,



( Aij xj µrj

=

µki

j

λ λ0

)k ∑

Aij xj µr−k j

j

and eventually, this shows ∑

( Aij xj µrj

µri

=

j

( =

( and this says

λ λ0

)r+1

( is an eigenvalue for

A λ0

λ λ0

λ λ0 )r

)r ∑

Aij xj

j

λ (xi µri )

) with the eigenvector being T

(x1 µr1 , · · · , xn µrn ) . ( )2 ( )3 ( )4 Now recall that r ∈ N was arbitrary and so this has shown that λλ0 , λλ0 , λλ0 , · · · are each ( ) eigenvalues of λA0 which has only finitely many and hence this sequence must repeat. Therefore, ( ) λ is a root of unity as claimed. This proves the theorem in the case that v >> 0. λ0 Now it is necessary to consider the case where v > 0 but it is not the case that v >> 0. Then in this case, there exists a permutation matrix P such that   v1  .   .   .  ( )    vr  u  ≡ v1 Pv =   0 ≡ 0    .   .   .  0 Then λ0 v = AT v = AT P v1 .

12.5. FUNCTIONS OF MATRICES

297

Therefore, λ0 v1 = P AT P v1 = Gv1 Now P 2 = I because it is a permutation matrix. Therefore, the matrix G ≡ P AT P and A are similar. Consequently, they have the same eigenvalues and it suffices from now on to consider the matrix G rather than A. Then ( ) ( )( ) u M1 M2 u λ0 = 0 M3 M4 0 where M1 is r×r and M4 is (n − r)×(n − r) . It follows from block multiplication and the assumption that A and hence G are > 0 that ) ( A′ B . G= 0 C Now let λ be an eigenvalue of G such that |λ| = λ0 . Then from Lemma 12.4.8, either λ ∈ σ (A′ ) or λ ∈ σ (C) . Suppose without loss of generality that λ ∈ σ (A′ ) . Since A′ > 0 it has a largest positive eigenvalue λ′0 which is obtained from 12.11. Thus λ0′ ≤ λ0 but λ being an eigenvalue of A′ , has its absolute value bounded by λ′0 and so λ0 = |λ| ≤ λ′0 ≤ λ0 showing that λ0 ∈ σ (A′ ) . Now if there exists v >> 0 such that A′T v = λ0 v, then the first part of this proof applies to the matrix A and so (λ/λ0 ) is a root of unity. If such a vector, v does not exist, then let A′ play the role of A in the above argument and reduce to the consideration of ( ) A′′ B ′ ′ G ≡ 0 C′ where G′ is similar to A′ and λ, λ0 ∈ σ (A′′ ) . Stop if A′′T v = λ0 v for some v >> 0. Otherwise, decompose A′′ similar to the above and add another prime. Continuing this way you must eventually T obtain the situation where (A′···′ ) v = λ0 v for some v >> 0. Indeed, this happens no later than ′···′ when A is a 1 × 1 matrix. 
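The following is a numerical illustration of Corollary 12.4.6 (a sketch, not part of the text; the positive matrix below is arbitrary). It computes λ0, positive right and left eigenvectors, scales them so that vᵀx0 = 1 as in 12.6, and checks that (A/λ0)ᵐ approaches the rank one matrix P = x0 vᵀ.

```python
import numpy as np

# An arbitrary matrix with every entry positive, so Theorem 12.4.4 applies.
A = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.9],
              [0.4, 0.1, 0.7]])

lams, vecs = np.linalg.eig(A)
k = np.argmax(lams.real)                  # the Perron eigenvalue lambda_0 is real and largest
lam0 = lams[k].real
x0 = np.abs(vecs[:, k].real)              # right eigenvector; all its entries have the same sign

lamsT, vecsT = np.linalg.eig(A.T)
v = np.abs(vecsT[:, np.argmax(lamsT.real)].real)   # left eigenvector (eigenvector of A^T)
v = v / (v @ x0)                                   # scale so that v^T x_0 = 1, as in 12.6

P = np.outer(x0, v)                                # rank one matrix P = x_0 v^T
M = np.linalg.matrix_power(A / lam0, 60)
print(np.max(np.abs(M - P)))                       # close to zero
```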

12.5

Functions Of Matrices

The existence of the Jordan form also makes it possible to define various functions of matrices. Suppose
$$f(\lambda) = \sum_{n=0}^{\infty} a_n \lambda^n \qquad (12.16)$$
for all $|\lambda| < R$. There is a formula for $f(A) \equiv \sum_{n=0}^{\infty} a_n A^n$ which makes sense whenever $\rho(A) < R$. Thus you can speak of $\sin(A)$ or $e^{A}$ for $A$ an $n \times n$ matrix. To begin with, define
$$f_P(\lambda) \equiv \sum_{n=0}^{P} a_n \lambda^n \qquad (12.17)$$
so for $k < P$,
$$f_P^{(k)}(\lambda) = \sum_{n=k}^{P} a_n\, n \cdots (n-k+1)\, \lambda^{n-k} = \sum_{n=k}^{P} a_n \binom{n}{k} k!\, \lambda^{n-k}.$$
Thus
$$\frac{f_P^{(k)}(\lambda)}{k!} = \sum_{n=k}^{P} a_n \binom{n}{k} \lambda^{n-k}. \qquad (12.18)$$


To begin with consider f (Jm (λ)) where Jm (λ) is an m × m Jordan block. Thus Jm (λ) = D + N where N m = 0 and N commutes with D. Therefore, letting P > m P ∑

n

an Jm (λ)

n=0

n ( ) ∑ n Dn−k N k k n=0 k=0 P P ∑ ∑ (n) = an Dn−k N k k k=0 n=k ( ) m−1 P ∑ ∑ n k = N an Dn−k . k

=

P ∑

an

k=0

From 12.18 this equals m−1 ∑

( k

N diag

k=0

(12.19)

n=k

(k)

(k)

fP (λ) f (λ) ,··· , P k! k!

) (12.20)

where for k = 0, · · · , m − 1, define diagk (a1 , · · · , am−k ) the m × m matrix which equals zero everywhere except on the k th super diagonal where this diagonal is filled with the numbers, {a1 , · · · , am−k } from the upper left to the lower right. With no subscript, it is just the diagonal matrices having the indicated entries. Thus in 4 × 4 matrices, diag2 (1, 2) would be the matrix     

0 0 0 0

0 0 0 0

1 0 0 0



0 2 0 0

  . 

Then from 12.20 and 12.17,
$$\sum_{n=0}^{P} a_n J_m(\lambda)^n = \sum_{k=0}^{m-1} \operatorname{diag}_k\!\left(\frac{f_P^{(k)}(\lambda)}{k!}, \cdots, \frac{f_P^{(k)}(\lambda)}{k!}\right).$$
Therefore,
$$\sum_{n=0}^{P} a_n J_m(\lambda)^n =
\begin{pmatrix}
f_P(\lambda) & \dfrac{f_P'(\lambda)}{1!} & \dfrac{f_P^{(2)}(\lambda)}{2!} & \cdots & \dfrac{f_P^{(m-1)}(\lambda)}{(m-1)!} \\
 & f_P(\lambda) & \dfrac{f_P'(\lambda)}{1!} & \ddots & \vdots \\
 & & f_P(\lambda) & \ddots & \dfrac{f_P^{(2)}(\lambda)}{2!} \\
 & & & \ddots & \dfrac{f_P'(\lambda)}{1!} \\
0 & & & & f_P(\lambda)
\end{pmatrix} \qquad (12.21)$$

Now let A be an n × n matrix with ρ (A) < R where R is given above. Then the Jordan form of A is of the form   J1 0   J2    (12.22) J = ..   .   0 Jr where Jk = Jmk (λk ) is an mk × mk Jordan block and A = S −1 JS. Then, letting P > mk for all k, P ∑ n=0

an An = S −1

P ∑ n=0

an J n S,


and because of block multiplication of matrices,  ∑P n n=0 an J1  P  ∑  an J n =   n=0  0 and from 12.21

∑P n=0



0 ..

     

. ..

.

∑P n=0

an Jrn

an Jkn converges as P → ∞ to the mk × mk matrix   ′ f (2) (λk ) f (m−1) (λk ) k) · · · f (λk ) f (λ 1! 2! (mk −1)!     .. ′ . f (λ ) . k   . 0 f (λ ) . k 1!     (2) . f (λk ) .   . 0 0 f (λk )   2!   .. .. ..   f ′ (λk ) . . .   1! 0

···

0

0

(12.23)

f (λk )

There is no convergence problem because |λ| < R for all λ ∈ σ (A) . This has proved the following theorem. Theorem 12.5.1 Let f be given by 12.16 and suppose ρ (A) < R where R is the radius of convergence of the power series in 12.16. Then the series, ∞ ∑

a n An

(12.24)

k=0

converges in the space L (Fn , Fn ) with respect to any of the norms on this space and furthermore,   ∑∞ n 0 n=0 an J1   .. ∞   ∑ .  n −1  an A = S  S ..   . k=0   ∑∞ n 0 n=0 an Jr ∑∞ where n=0 an Jkn is an mk × mk matrix of the form given in 12.23 where A = S −1 JS and the Jordan form of A, J is given by 12.22. Therefore, you can define f (A) by the series in 12.24. Here is a simple example.    Example 12.5.2 Find sin (A) where A =  

4 1 0 −1

1 1 −1 2

−1 0 1 1

1 −1 −1 4

   . 

In this case, the Jordan canonical form of the matrix is not too hard to find.     

4 1 0 −1

1 1 −1 2

−1 0 1 1

1 −1 −1 4





    =  

2 1 0 −1

0 −4 0 4

−2 −2 −2 4

−1 −1 1 2

   · 

300

CHAPTER 12. LIMITS OF VECTORS AND MATRICES     

Then from the above theorem  4  0  sin   0 0 Therefore, sin (A) =  2 0 −2  1 −4 −2    0 0 −2 −1 4 4

−1 −1 1 2

4 0 0 0

0 2 0 0

0 1 2 0

0 0 1 2



1 2 1 8

    0 0

sin (J) is given by   0 0 0 sin 4  0 2 1 0    = 0 2 1   0 0 0 2 0

    

sin 4 0 0 0

0 sin 2 0 0

0 cos 2 sin 2 0

1 2 − 38 1 4 1 2

0 0 − 14 1 2

0 sin 2 0 0

0



1 2 − 18 1 4 1 2

  . 

0 cos 2 sin 2 0 

0

  . cos 2  sin 2

− sin 2 2

1 2 1 8

   cos 2   0 sin 2 0

− sin 2 2

where the columns of M are as follows from left to right,    sin 4 − sin 2 − cos 2 sin 4  1 sin 4 − 1 sin 2   1 sin 4 + 3 sin 2 − 2 cos 2  2   2 2 2  ,    0 − cos 2 − 12 sin 4 + 12 sin 2 − 12 sin 4 − 21 sin 2 + 3 cos 2   sin 4 − sin 2 − cos 2  1 sin 4 + 1 sin 2 − 2 cos 2   2  2  .   − cos 2 − 12 sin 4 +

1 2



1 2 − 38 1 4 1 2

      ,  

0 0 − 41 1 2

1 2 − 81 1 4 1 2

   =M 

− cos 2 sin 2 sin 2 − cos 2 cos 2 − sin 2

    

sin 2 + 3 cos 2

Perhaps this isn’t the first thing you would think of. Of course the ability to get this nice closed form description of sin (A) was dependent on being able to find the Jordan form along with a similarity transformation which will yield the Jordan form. The following corollary is known as the spectral mapping theorem. Corollary 12.5.3 Let A be an n × n matrix and let ρ (A) < R where for |λ| < R, f (λ) =

∞ ∑

an λn .

n=0

Then f (A) is also an n × n matrix and furthermore, σ (f (A)) = f (σ (A)) . Thus the eigenvalues of f (A) are exactly the numbers f (λ) where λ is an eigenvalue of A. Furthermore, the algebraic multiplicity of f (λ) coincides with the algebraic multiplicity of λ. All of these things can be generalized to linear transformations defined on infinite dimensional spaces and when this is done the main tool is the Dunford integral along with the methods of complex analysis. It is good to see it done for finite dimensional situations first because it gives an idea of what is possible.
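As a numerical check of the spectral mapping theorem (a sketch, not part of the text; scipy's general matrix function routine is used here in place of the Jordan form computation of Example 12.5.2, and the matrix below is an arbitrary one chosen for the illustration):

```python
import numpy as np
from scipy.linalg import funm

# An arbitrary matrix chosen for the illustration.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0]])

sinA = funm(A, np.sin)                                  # sin(A), computed from the Schur form
print(np.sort_complex(np.linalg.eigvals(sinA)))         # eigenvalues of sin(A)
print(np.sort_complex(np.sin(np.linalg.eigvals(A))))    # sin applied to the eigenvalues of A
```

Up to rounding, the two printed lists coincide, which is the assertion σ (f (A)) = f (σ (A)) for f = sin.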

12.6

Exercises

1. Suppose the migration matrix for three locations is   .5 0 .3    .3 .8 0  . .2 .2 .7

12.6. EXERCISES

301

Find a comparison for the populations in the three locations after a long time. ∑ 2. Show that if i aij = 1, then if A = (aij ) , then the sum of the entries of Av equals the sum of the entries of v. Thus it does not matter whether aij ≥ 0 for this to be so. 3. If A satisfies the conditions of the above problem, can it be concluded that limn→∞ An exists? 4. Give an example of a non regular Markov matrix which has an eigenvalue equal to −1. 5. Show that when a Markov matrix is non defective, all of the above theory can be proved very easily. In particular, prove the theorem about the existence of limn→∞ An if the eigenvalues are either 1 or have absolute value less than 1. 6. Find a formula for An where 

5 2

 5  A= 7  2 7 2

− 12 0 − 12 − 12

−1 −4 − 52 −2

0 0 1 2

0

    

Does limn→∞ An exist? Note that all the rows sum to 1. Hint: This matrix is similar to a diagonal matrix. The eigenvalues are 1, −1, 12 , 12 . 7. Find a formula for An where 

2  4  A= 5  2 3

− 12 0 − 12 − 12

−1 −4 −2 −2

1 2

1 1 1 2

    

Note that the rows sum to 1 in this matrix also. Hint: This matrix is not similar to a diagonal matrix but you can find the Jordan form and consider this in order to obtain a formula for this product. The eigenvalues are 1, −1, 12 , 12 . 8. Find limn→∞ An if it exists for the matrix  1   A= 

2 − 12 1 2 3 2

− 12 1 2 1 2 3 2

− 12 − 12 3 2 3 2

0 0 0 1

    

The eigenvalues are 12 , 1, 1, 1. 9. Give an example of a matrix A which has eigenvalues which are either equal to 1,−1, or have absolute value strictly less than 1 but which has the property that limn→∞ An does not exist. 10. If A is an n × n matrix such that all the eigenvalues have absolute value less than 1, show limn→∞ An = 0. 11. Find an example of a 3 × 3 matrix A such that limn→∞ An does not exist but limr→∞ A5r does exist. 12. If A is a Markov matrix and B is similar to A, does it follow that B is also a Markov matrix? ∑ 13. In ∑ Theorem 12.1.3 suppose everything is unchanged except that you assume either j aij ≤ 1 or i aij ≤ 1. Would the same conclusion be valid? What if you don’t insist that each aij ≥ 0? Would the conclusion hold in this case?

302

CHAPTER 12. LIMITS OF VECTORS AND MATRICES

14. Let V be an n dimensional vector space and let x ∈ V and x ̸= 0. Consider β x ≡ x, Ax, · · · ,Am−1 x where

( ) Am x ∈ span x,Ax, · · · ,Am−1 x

and m is the smallest such that the above inclusion in the span takes place. Show that { } x,Ax, · · · ,Am−1 x must be linearly independent. Next suppose {v1 , · · · , vn } is a basis for V . Consider β vi as just discussed, having length mi . Thus Ami vi is a linearly combination of vi ,Avi , · · · ,Am−1 vi for m as small as possible. Let pvi (λ) be the monic polynomial which expresses this linear combination. Thus pvi (A) vi = 0 and the degree of pvi (λ) is as small as possible for this to take place. Show that the minimum polynomial for A must be the monic polynomial which is the least common multiple of these polynomials pvi (λ).

Chapter 13

Inner Product Spaces and Least Squares In this chapter is a more complete discussion of important theorems for inner product spaces. These results are presented for inner product spaces, the typical example being Cn or Rn . The extra generality is used because most of the ideas have a straight forward generalization to something called a Hilbert space which is just a complete inner product space. First is a major result about projections.

13.1

Orthogonal Projections

Recall that any finite dimensional normed linear space is complete. The following definition includes the case where the norm comes from an inner product.

Definition 13.1.1 Let $(H, (\cdot,\cdot))$ be a complete inner product space. This means the norm comes from an inner product as described on Page 266, $|v| \equiv (v,v)^{1/2}$. Such a space is called a Hilbert space. As shown earlier, if $H$ is finite dimensional, then it is a Hilbert space automatically.

The following is the definition of a convex set. This is a set with the property that the line segment between any two points in the set is in the set.

Definition 13.1.2 A nonempty subset $K$ of a vector space is said to be convex if whenever $x, y \in K$ and $t \in [0, 1]$, it follows that $tx + (1-t)y \in K$.

Theorem 13.1.3 Let $K$ be a closed and convex nonempty subset of a Hilbert space $H$ and let $y \in H$. Also let $\lambda \equiv \inf\{|x - y| : x \in K\}$. Then if $\{x_n\} \subseteq K$ is a sequence such that $\lim_{n\to\infty}|x_n - y| = \lambda$, then it follows that $\{x_n\}$ is a Cauchy sequence and $\lim_{n\to\infty} x_n = x \in K$ with $|x - y| = \lambda$. Also if $|x - y| = \lambda = |\hat{x} - y|$, then $\hat{x} = x$.

Proof: Recall the parallelogram identity, valid in any inner product space:
$$|x+y|^2 + |x-y|^2 = 2|x|^2 + 2|y|^2.$$

First consider the claim about uniqueness. Letting x, x ˆ be as given, 2 2 2 2 x − y x x − x x + x ˆ ˆ ˆ ˆ − y + x − x = + − y + 2 2 2 2 2 2 2 x x − y ˆ − y + 2 = λ2 = 2 2 2 303


x Since x+ˆ ˆ since it shows that 2 ∈ K due to convexity, this is a contradiction unless x = x to y than λ. Now consider the minimizing sequence. From the same computation just given, 2 2 2 2 xn + xm + xn − xm = 2 xn − y + 2 xm − y − y 2 2 2 2 1 1 2 2 = |xn − y| + |xm − y| 2 2

since

xn +xm 2

∈ K,

x+ˆ x 2

is closer

xn − xm 2 ≤ 1 |xn − y|2 + 1 |xm − y|2 − λ2 2 2 2

and as n, m → ∞, the right side converges to 0 by definition. Hence {xn } is a Cauchy sequence as claimed. By completeness, it converges to some x ∈ H. Since K is closed, it follows that x ∈ K. Then from the triangle inequality, lim |y − xn | = |y − x| = λ. 

n→∞

In the above theorem, denote by P y the vector x ∈ K closest to y. It turns out there is an easy 2 way to characterize P y. For a given z ∈ K, one can consider the function t → |x + t (z − x) − y| for x ∈ K. By properties of the inner product, this is 2

t → |x − y| + 2t Re (z − x, x − y) + t2 |z − x|

2

according to whether Re (z − x, x − y) ≥ 0. Thus elementary considerations yield the two possibilities shown in the graph. Either this function is increasing on [0, 1] or it is not. In the case Re (z − x, x − y) < 0 the graph shows that x ̸= P y because there is a positive value of t such that 2 the function is less than |x − y| and in case Re (z − x, x − y) ≥ 0, we obtain x = P y if this is always true for any z ∈ K. Note that by convexity, x + t (z − x) ∈ K for all t ∈ [0, 1] since it equals (1 − t) x + tz.

|x − y|2 0

Re (z − x, x − y) < 0

t 1

0

Re (z − x, x − y) ≥ 0

t 1

Theorem 13.1.4 Let x ∈ K and y ∈ H. Then there exists a closest point of K to y denoted by P y. Then x = P y if and only if Re (z − x, y − x) ≤ 0 (13.1) for all z ∈ K. z K

θ

y

x

- y

Proof: First suppose 13.1 so Re (z − x, x − y) ≥ 0. Then for arbitrary z ∈ K, 2

2

|x + t (z − x) − y| = |x − y| + 2t Re (z − x, x − y) + t2 |z − x|

2 2

and is an increasing function on [0, 1]. Thus it has its minimum at t = 0. In particular |x − y| ≤ 2 |z − y| . Since z is arbitrary, this shows x = P y.

13.1. ORTHOGONAL PROJECTIONS

305 2

Next suppose x = P y. Then for arbitrary z ∈ K, the minimum of t → |x + t (z − x) − y| occurs when t = 0 since x + t (z − x) ∈ K. This will not happen unless Re (z − x, x − y) ≥ 0 because if this is less than 0, the minimum of that function will take place for some positive t. Thus 13.1 holds.  Every subspace is a closed and convex set. Note that Re and Im are real linear maps from C to R. Re (x + iy) ≡ x, Im (x + iy) ≡ y That is, for a, b real scalars and z, w complex numbers, Re (az + bw) =

a Re (z) + b Re (w)

Im (az + bw) =

a Im (z) + b Im (w)

This assertions follow directly from the definitions of complex arithmetic and will be used without any mention whenever convenient. The next proposition will be very useful in what follows. Proposition 13.1.5 If W is a subspace. Then Re (z, w) ≤ 0 for all w ∈ W, if and only if (z, w) = 0 for all w ∈ W . Proof: ⇒First of all, Re (z, −w) = − Re (z, w) so if Re (z, w) ≤ 0 for all w ∈ W, then for each w ∈ W, 0 ≥ Re (z, −w) = − Re (z, w) ≥ 0. Thus Re (z, w) = − Re (z, −w) = 0. Now also (z, iw) =

Re (z, iw) + i Im (z, w) = −i (z, w)

= −i [Re (z, w) + i Im (z, w)] = −i Re (z, w) + Im (z, w) and so Im (z, w) = (z, iw) . Therefore, if Re (z, w) ≤ 0 for all w ∈ W, it follows that Re (z, w) = 0 for all w and hence Im (z, w) = 0 for all w ∈ W and so (z, w) = 0 for all w ∈ W . ⇐ Conversely, if (z, w) = 0 for all w ∈ W, then obviously Re (z, w) = 0 for all w ∈ W .  Next is a fundamental result used in inner product spaces. It is called the Gram Schmidt process. Lemma 13.1.6 Let {v1 , · · · , vn } be a linearly independent subset of an inner product space H. Then there exists orthonormal vectors {u1 , · · · , un } which have the property that for each k ≤ n, span (v1 , · · · , vk ) = span (u1 , · · · , uk ) . Proof: Let u1 ≡ v1 / |v1 | . Thus for k = 1, span (u1 ) = span (v1 ) and {u1 } is an orthonormal set. Now suppose for some k < n, u1 , · · · , uk have been chosen such that (uj , ul ) = δ jl and span (v1 , · · · , vk ) = span (u1 , · · · , uk ). Then define ∑k vk+1 − j=1 (vk+1 , uj ) uj , (13.2) uk+1 ≡ ∑k vk+1 − j=1 (vk+1 , uj ) uj where the denominator is not equal to zero because the vj form a basis, and so vk+1 ∈ / span (v1 , · · · , vk ) = span (u1 , · · · , uk ) Thus by induction, uk+1 ∈ span (u1 , · · · , uk , vk+1 ) = span (v1 , · · · , vk , vk+1 ) .

306

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

Also, vk+1 ∈ span (u1 , · · · , uk , uk+1 ), which is seen easily by solving 13.2 for vk+1 , and it follows span (v1 , · · · , vk , vk+1 ) = span (u1 , · · · , uk , uk+1 ) . If l ≤ k,

 (uk+1 , ul ) = C (vk+1 , ul ) −  C (vk+1 , ul ) −

k ∑



k ∑

 (vk+1 , uj ) (uj , ul ) =

j=1

(vk+1 , uj ) δ lj  = C ((vk+1 , ul ) − (vk+1 , ul )) = 0.

j=1

The vectors {u1 , · · · , un } generated in this way are therefore orthonormal because each vector has unit length. 
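Here is a short numerical sketch of the Gram Schmidt process of Lemma 13.1.6 (not from the text; the three vectors below are made up):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors as in 13.2."""
    us = []
    for v in vectors:
        w = v.astype(float).copy()
        for u in us:
            w -= np.vdot(u, w) * u          # subtract the projection (v_{k+1}, u_j) u_j
        us.append(w / np.linalg.norm(w))    # normalize; nonzero since the v's are independent
    return np.array(us)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
U = gram_schmidt(V)
print(np.round(U @ U.T, 10))   # identity matrix: the u_j are orthonormal
```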

Theorem 13.1.7 Let K be a nonempty closed subspace of H a Hilbert space. Let y ∈ H. Then x = P y, the closest point in K to y if and only if (y − x, w) = 0 for all w ∈ K. If K is a finite dimensional subspace of H then by Lemma 13.1.6 it has an orthonormal basis {u1 , · · · , un } . Then n ∑ Py = (y, uk ) uk k=1

In particular, if y ∈ K, then y = Py =

n ∑

(y, uk ) uk

k=1

Proof: From Theorem 13.1.4, x = P y, x ∈ K if and only if for all z ∈ K, Re (y − x, z − x) ≤ 0 However, if w ∈ K, let z = x + w and this shows that x = P y if and only if for all w ∈ K, Re (y − x, w) ≤ 0 From Proposition 13.1.5 above, this happens if and only if (y − x, w) = 0 for all w ∈ K. It only remains to verify the orthogonality condition for the vector claimed to be the closest point. ( ) n n ∑ ∑ y− (y, uk ) uk , uj = (y, uj ) − (y, uk ) (uk , uj ) k=1

k=1

= (y, uj ) − (y, uj ) = 0 and so, from the first part, P y is indeed given by the claimed formula.  Because of this theorem, P y is called the orthogonal projection. What if H is not complete but K is a finite dimensional subspace? Is it still the case that you can obtain a projection? Proposition 13.1.8 Let H be an inner product space, not necessarily complete and let K be a finite dimensional subspace. Then if u ∈ H, a point z ∈ K is the closest point to u if and only if ∑n(u − z, w) = 0 for all w ∈ K. Furthermore, there exists a closest point and it is given by i=1 (u, ei ) ei where {e1 , ..., en } is an orthonormal basis for K.

13.1. ORTHOGONAL PROJECTIONS

307 2

Proof: Suppose z is the closest point to u in K. Then if w ∈ K, |u − (z + tw)| has a mini2 mum at t = 0. However, the function of t has a derivative. The function of t equals |u − z| − 2 2 2t Re (u − z, w) + t2 |w| and so its derivative is −2 Re (u − z, w) + t |w| and when t = 0 this is to be zero so Re (u − z, w) = 0 for all w ∈ K. Now (u − z, w) = Re (u − z, w) + i Im (u − z, w) and so (u − z, iw) = Re (u − z, iw) + i Im (u − z, iw) which implies that −i (u − z, w) = −i Re (u − z, w) + Im (u − z, w) = Re (u − z, iw)+i Im (u − z, iw) so Im (u − z, w) = Re (u − z, iw) and this shows that Im (u − z, w) = 0 = Re (u − z, w) so (u − z, w) = 0. 2 2 2 2 Next suppose (u − z, w) = 0 for all w ∈ K. Then |u − w| = |u − z + z − w| = |u − z| +|z − w| 2 2 because 2 Re (u − z, z − w) = 0∑and so it follows that |u − z| ≤ |u − w| for all w ∈ K. n It remains to verify that ∑i=1 (ei , u) ei is as close as possible. From what was just shown, n it ∑ suffices to verify that (u − i=1 (u, ei ) ei , ek ) = 0 for all ek . However, this is just (u, ek ) − i (u, ei ) (ei , ek ) = (u, ek ) − (u, ek ) = 0.  Example 13.1.9 Consider X equal to the continuous functions defined on [−π, π] and let the inner product be given by ∫ π f (x) g (x)dx −π

It is left to the reader to verify that this is an inner product. Letting ek be the function x → define ( ) n M ≡ span {ek }k=−n . Then you can verify that



(ek , em ) =

(

π

−π

1 √ e−ikx 2π

)(

√1 eikx , 2π

) ∫ π 1 1 √ emix dx = ei(m−k)x = δ km 2π −π 2π

then for a given function f ∈ X, the function from M which is closest to f in this inner product norm is n ∑ (f, ek ) ek g= √1 2π

∫π

k=−n

In this case (f, ek ) = f (x) e −π partial sum of the Fourier series.

ikx

dx. These are the Fourier coefficients. The above is the nth

2 To( show how this kind ) of thing approximates a given function, let f (x) = x . Let M = { }3 √1 e−ikx span . Then, doing the computations, you find the closest point is of the form 2π k=−3

( ) ) ∑ ( 3 k 1√ 5 1 (−1) 2 √ √ 1 −ikx + 2π 2 √ 2 π√ e 3 k2 2π 2π k=1 ( ) 3 k ∑ (−1) 2 √ √ 1 ikx + 2 π√ e k2 2π k=1

and now simplify to get 1 2 ∑ k π + (−1) 3 3

k=1

(

4 k2

) cos kx

Then a graph of this along with the graph of y = x2 is given below. In this graph, the dashed graph is of y = x2 and the solid line is the graph of the above Fourier series approximation. If we had taken the partial sum up to n much bigger, it would have been very hard to distinguish between the graph of the partial sum of the Fourier series and the graph of the function it is approximating. This is in contrast to approximation by Taylor series in which you only get approximation at a point of a function and its derivatives. These are very close near the point of interest but typically fail to approximate the function on the entire interval.
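To reproduce the comparison numerically (a sketch, not part of the text), one can evaluate the partial sum displayed above at a few points and watch the error shrink as more terms are taken:

```python
import numpy as np

# Partial Fourier sum of f(x) = x^2 on [-pi, pi], as computed in Example 13.1.9:
# S_n(x) = pi^2/3 + sum_{k=1}^{n} (-1)^k (4/k^2) cos(kx)
def fourier_partial_sum(x, n):
    s = np.full_like(x, np.pi**2 / 3.0)
    for k in range(1, n + 1):
        s += (-1)**k * (4.0 / k**2) * np.cos(k * x)
    return s

x = np.linspace(-np.pi, np.pi, 7)
print(np.max(np.abs(fourier_partial_sum(x, 3) - x**2)))    # noticeably off for n = 3
print(np.max(np.abs(fourier_partial_sum(x, 200) - x**2)))  # much closer for n = 200
```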


13.2

Formula for Distance to a Subspace

Let K be a finite dimensional subspace of a real inner product space H, for the sake of convenience, and suppose a basis for K is {v1 , ..., vn } . Thus this is a closed subspace. Then each point of H has a closest point in K thanks to Proposition 13.1.8. I want a convenient formula for the distance to K. Definition 13.2.1 If Gij ≡ (vi , vj ) where {v1 , ..., vn } are vectors, then G is called the Grammian matrix, also the metric tensor. This matrix will also be denoted as G ( v1 , ..., vn ) to indicate the vectors used in defining G. Thus, it is an n × n matrix. Proposition 13.2.2 {v1 , ..., vn } is linearly independent, if and only if G (v1 , ..., vn ) is invertible. ∑ ∑n Proof: If G is invertible, then if i=1 xi vi = 0, i (vj , vi ) xi = 0 and so Gx = 0 which ∑ can only hapen if x = 0 because G is invertible. If G is not invertible, then for some x ̸= 0, j Gij xj = (∑ ) ∑ j j = 0 for each vi and so j (vi , vj ) x = 0 for each i. However, this requires that j vj x , v i ∑ j v x = 0 where x = ̸ 0. Thus, if G is invertible, {v , ..., v } is independent and if G is not 1 n j j invertible, then {v1 , ..., vn } is not linearly independent.  Let V ≡ span (v1 , ..., vn ) where these spanning vectors constitute a linearly independent set. Suppose u ∈ H. I want to find a convenient formula for the distance between u and V . From Theorem 13.1.7, P u ≡ z, the projection of u onto K which is the closest point of K to u, is defined by (u − z, vi ) = 0 for all vi or equivalently (u − z, v) = 0 for all v ∈ V . Thus, for d the distance from u to V, 2 2 2 2 |u| = |u − z| + |z| = d2 + |z| , (u, vi ) = (z, vi ) for each i (13.3) ∑n Let z = i=1 z i vi . Then in the above,   n n ∑ ∑ Gij z j z j vj , v i  = (u, vi ) = (z, vi ) =  j=1

j=1

( )T Letting z ≡ z 1 , ..., z n and y ≡ ((u, v1 ) , (u, v2 ) , ..., (u, v3 )) , G (v1 , ..., vn ) z = y, zT G (v1 , ..., vn ) = yT

(13.4)

From 13.3 and 13.4, 2

|u| = d2 +



Gij z i z j = d2 + zT G (v1 , ..., vn ) z = d2 + yT z

i,j

Then from 13.3 and 13.4, (

G (v1 , ..., vn ) 0 yT 1

)(

z d2

)

( =

y 2 |u|

)

By Cramer's rule,
$$d^2 = \frac{\det\begin{pmatrix} G(v_1,\ldots,v_n) & y \\ y^T & |u|^2 \end{pmatrix}}{\det\left(G(v_1,\ldots,v_n)\right)} \equiv \frac{\det\left(G(v_1,\ldots,v_n,u)\right)}{\det\left(G(v_1,\ldots,v_n)\right)}.$$
This proves the interesting approximation theorem.

Theorem 13.2.3 Suppose $\{v_1,\ldots,v_n\}$ is a linearly independent set of vectors in $H$, an inner product space. Then if $u \in H$ and $d$ is the distance from $u$ to $V \equiv \operatorname{span}(v_1,\ldots,v_n)$, then
$$d^2 = \frac{\det\left(G(v_1,\ldots,v_n,u)\right)}{\det\left(G(v_1,\ldots,v_n)\right)}.$$
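A quick numerical check of Theorem 13.2.3 (a sketch, not from the text; the vectors are made up) compares the Grammian formula with the distance obtained from an explicit projection via the normal equations 13.4:

```python
import numpy as np

def gram(vectors):
    """Grammian matrix G_ij = (v_i, v_j) for the rows of `vectors`."""
    return vectors @ vectors.T

v = np.array([[1.0, 2.0, 0.0, 3.0],
              [2.0, 1.0, -3.0, 2.0]])      # a basis for a 2 dimensional subspace of R^4
u = np.array([0.0, 0.0, 1.0, 2.0])

# Distance from the Grammian formula d^2 = det G(v1, v2, u) / det G(v1, v2).
Vu = np.vstack([v, u])
d_gram = np.sqrt(np.linalg.det(gram(Vu)) / np.linalg.det(gram(v)))

# Distance from an explicit projection: solve G z = y for the closest point sum_i z^i v_i.
coeffs = np.linalg.solve(gram(v), v @ u)
d_proj = np.linalg.norm(u - coeffs @ v)

print(d_gram, d_proj)   # the two numbers agree
```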

13.3. RIESZ REPRESENTATION THEOREM, ADJOINT MAP

13.3

309

Riesz Representation Theorem, Adjoint Map

The next theorem is one of the most important results in the theory of inner product spaces. It is called the Riesz representation theorem. Theorem 13.3.1 Let f ∈ L (H, F) where H is a Hilbert space and f is continuous. Recall that in finite dimensions, this is automatic. Then there exists a unique z ∈ H such that for all x ∈ H, f (x) = (x, z) . Proof: First I will verify uniqueness. Suppose zj works for j = 1, 2. Then for all x ∈ H, 0 = f (x) − f (x) = (x, z1 − z2 ) and so z1 = z2 . If f (H) = 0, let z = 0 and this works. Otherwise, let u ∈ / f −1 (0) which is a closed subspace of H. Let w = u − P u ̸= 0. Then f (f (w) x − f (x) w) = f (w) f (x) − f (x) f (w) = 0 and so from Theorem 13.1.7, 0 = (f (w) x − f (x) w, w) = f (w) (x, w) − f (x) (w, w) It follows that for all x,

( f (x) =

x,

f (w)w |w|

)

2

.

This leads to the following important definition. Corollary 13.3.2 Let A ∈ L (X, Y ) where X and Y are two inner product spaces of finite dimension or else Hilbert spaces. Then there exists a unique A∗ ∈ L (Y, X) , the bounded linear transformations, such that (Ax, y)Y = (x, A∗ y)X (13.5) for all x ∈ X and y ∈ Y. The following formula holds ∗

(αA + βB) = αA∗ + βB ∗ ∗

Also, (A∗ ) = A. Proof: Let fy ∈ L (X, F) be defined as fy (x) ≡ (Ax, y)Y . This is linear and |fy (x)| = |(Ax, y)Y | ≤ |Ax| |y| ≤ (||A|| |y|) |x| Then by the Riesz representation theorem, there exists a unique element of X, A∗ (y) such that (Ax, y)Y = (x, A∗ (y))X . It only remains to verify that A∗ is linear. Let a and b be scalars. Then for all x ∈ X, (x, A∗ (ay1 + by2 ))X ≡ (Ax, (ay1 + by2 ))Y ≡ a (Ax, y1 ) + b (Ax, y2 ) ≡ ∗

a (x, A (y1 )) + b (x, A∗ (y2 )) = (x, aA∗ (y1 ) + bA∗ (y2 )) .

310

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

Since this holds for every x, it follows A∗ (ay1 + by2 ) = aA∗ (y1 ) + bA∗ (y2 ) which shows A∗ is linear as claimed. Consider the last assertion that ∗ is conjugate linear. ( ∗ ) x, (αA + βB) y ≡ ((αA + βB) x, y) = =

α (Ax, y) + β (Bx, y) = α (x, A∗ y) + β (x, B ∗ y) ) ) ( ) ( ( (x, αA∗ y) + x, βA∗ y = x, αA∗ + βA∗ y .

Since x is arbitrary,

( ) ∗ (αA + βB) y = αA∗ + βA∗ y

and since this is true for all y,



(αA + βB) = αA∗ + βA∗ .

( ∗ ) Finally, (A∗ x, y) = (y, A∗ x) = (Ay, x) = (x, Ay) while also, (A∗ x, y) = x, (A∗ ) y and so for all x,

(

) ∗ x, (A∗ ) y − Ay = 0



and so (A∗ ) = A.  Definition 13.3.3 The linear map, A∗ is called the adjoint of A. In the case when A : X → X and A = A∗ , A is called a self adjoint map. Such a map is also called Hermitian. ( )T Theorem 13.3.4 Let M be an m × n matrix. Then M ∗ = M in words, the transpose of the conjugate of M is equal to the adjoint. Proof: Using the definition of the inner product in Cn , ∑ ∑ ∑ (M ∗ )ij yj = (M x, y) = (x,M ∗ y) ≡ xi (M ∗ )ij yj xi . j

i

Also (M x, y) =

i,j

∑∑ j

Mji yj xi .

i

Since x, y are arbitrary vectors, it follows that Mji = (M ∗ )ij and so, taking conjugates of both sides, ∗ Mij = Mji  Some linear transformations preserve distance. Something special can be asserted about these which is in the next lemma. Lemma 13.3.5 Suppose R ∈ L (X, Y ) where X, Y are inner product spaces and R preserves distances. Then R∗ R = I. Proof: Since R preserves distances, |Ru| = |u| for every u. Let u,v be arbitrary vectors in X |u + v|

2

= |u| + |v| + 2 Re (u, v)

2

2

|Ru + Rv|

2

= |Ru| + |Rv| + 2 Re (Ru, Rv)

2

2

= |u| + |v| + 2 Re (R∗ Ru, v) 2

Thus

2

Re (R∗ Ru − u, v) = 0

for all v and so by Proposition 13.1.5, (R∗ Ru − u, v) = 0 for all v and so R∗ Ru = u for all u which implies R∗ R = I.  The next theorem is interesting. You have a p dimensional subspace of Fn where F = R or C. Of course this might be “slanted”. However, there is a linear transformation Q which preserves distances which maps this subspace to Fp .

13.3. RIESZ REPRESENTATION THEOREM, ADJOINT MAP

311

Theorem 13.3.6 Suppose V is a subspace of Fn having dimension p ≤ n. Then there exists a Q ∈ L (Fn , Fn ) such that QV ⊆ span (e1 , · · · , ep ) and |Qx| = |x| for all x. Also

Q∗ Q = QQ∗ = I. p

Proof: By Lemma 13.1.6 there exists an orthonormal basis for V, {vi }i=1 . By using the Gram Schmidt process this may be extended to an orthonormal basis of the whole space Fn , {v1 , · · · , vp , vp+1 , · · · , vn } . ∑n Now define Q ∈ L (Fn , Fn ) by Q (vi ) ≡ ei and extend linearly. If i=1 xi vi is an arbitrary element of Fn , ( n 2 n 2 ) 2 n n ∑ ∑ ∑ ∑ 2 xi v i = xi ei = |xi | = xi v i . Q i=1

i=1

i=1

i=1

Thus Q preserves lengths and so, by Lemma 13.3.5, it follows that Q∗ Q = I. Also, this shows that Q maps V onto V and so a generic element of V is of the form Qx. Now   =I z }| { 2 2 |Q∗ Qx| = (Q∗ Qx, Q∗ Qx) = Qx, QQ∗ Qx = (Qx, Qx) = |Qx| showing that Q∗ also preserves lengths. Hence it is also the case that QQ∗ = I because from the ∗ definition of the adjoint, (Q∗ ) = Q.  Definition 13.3.7 If U ∈ L (X, X) for X an inner product space, then U is called unitary if U ∗ U = U U ∗ = I. Note that it is actually shown that QV = span (e1 , · · · , ep ) and that in case p = n one obtains that a linear transformation which maps an orthonormal basis to an orthonormal basis is unitary. Unitary matrices are also characterized by preserving length. More generally Corollary 13.3.8 Suppose U ∈ L (X, X) where X is an inner product space. Then U is unitary if and only if |U x| = |x| for all x so it preserves distance. Proof: ⇒ If U is unitary, then |U x| = (U x, U x) = (U ∗ U x, x) = (x, x) = |x| . ⇐ If |U x| = |x| for all x then by Lemma 13.3.5, U ∗ U = I. Thus U is onto since it is one to one and so a generic element of X is U x. Note how this would fail if you had U ∈ L (X, Y ) where the dimension of Y is larger than the dimension of X. Then as above, 2

2

|U ∗ U x| = (U ∗ U x, U ∗ U x) = (U x, U U ∗ U x) = (U x, U x) = |U x| 2

2



Thus also U U ∗ = I because U ∗ preserves distances and (U ∗ ) = U from the definition.  Now here is an important result on factorization of an m × n matrix. It is called a QR factorization. Theorem 13.3.9 Let A be an m × n complex matrix. Then there exists a unitary Q and R, all zero below the main diagonal (Rij = 0if i > j) such that A = QR. Proof: This is obvious if m = 1. ( a1 · · · Suppose true for m − 1 and let

) an

( A=

a1

···

( = (1)

a1 )

an

···

) an

, A is m × n

312

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

Using Theorem 13.3.6, there exists Q1 a unitary matrix such that Q1 (a1 / |a1 |) = e1 in case a1 ̸= 0. Thus Q1 a1 = |a1 | e1 . If a1 = 0, let Q1 = I. Thus ( ) a bT Q1 A = 0 A1 where A1 is (m − 1) × (n − 1). If n = 1, this obtains ( ) ( ) a a ∗ Q1 A = , A = Q1 , let Q = Q∗1 . 0 0 That which is desired is obtained. So assume n > 1. By induction, there exists Q′2 an (m − 1) × ′ = 0 if i > j. Then (n − 1) unitary matrix such that Q′2 A1 = R′ , Rij ) ) ( ( a bT 1 0 =R Q1 A = 0 R′ 0 Q′2 Since the product of unitary matrices is unitary, there exists Q unitary such that Q∗ A = R and so A = QR. 
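As a sanity check on Theorem 13.3.9 (a sketch, not part of the text; numpy's QR routine is used rather than the inductive construction in the proof, and the matrix is made up):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 4.0]])

# "complete" mode returns a square unitary Q and an m x n upper triangular R.
Q, R = np.linalg.qr(A, mode="complete")

print(np.allclose(Q @ R, A))            # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # Q is unitary (orthogonal, since A is real)
print(np.allclose(R, np.triu(R)))       # R is zero below the main diagonal
```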

13.4

Least Squares

A common problem in experimental work is to find a straight line which approximates as well as p possible a collection of points in the plane {(xi , yi )}i=1 . The usual way of dealing with these problems is by the method of least squares and it turns out that all these sorts of approximation problems can be reduced to Ax = b where the problem is to find the best x for solving this equation even when there is no solution. Lemma 13.4.1 Let V and W be finite dimensional inner product spaces and let A : V → W be linear. For each y ∈ W there exists x ∈ V such that |Ax − y| ≤ |Ax1 − y| for all x1 ∈ V. Also, x ∈ V is a solution to this minimization problem if and only if x is a solution to the equation, A∗ Ax = A∗ y. Proof: By Theorem 13.1.7 on Page 306 there exists a point, Ax0 , in the finite dimensional 2 2 subspace, A (V ) , of W such that for all x ∈ V, |Ax − y| ≥ |Ax0 − y| . Also, from this theorem, this happens if and only if Ax0 − y is perpendicular to every Ax ∈ A (V ) . Therefore, the solution is characterized by (Ax0 − y, Ax) = 0 for all x ∈ V which is the same as saying (A∗ Ax0 − A∗ y, x) = 0 for all x ∈ V. In other words the solution is obtained by solving A∗ Ax0 = A∗ y for x0 .  Consider the problem of finding the least squares regression line in statistics. Suppose you have n given points in the plane, {(xi , yi )}i=1 and you would like to find constants m and b such that the line y = mx+b goes through all these points. Of course this will be impossible in general. Therefore, try to find m, b such that you do the best you can to solve the system     ( ) x1 1 y1  .   . ..   m  . = . .   .   . b xn 1 yn  ( ) y1  . m  −  .. which is of the form y = Ax. In other words try to make A b yn

 2   as small as possible. 

13.5. FREDHOLM ALTERNATIVE

313

According to what was just shown, it is desired to solve the  ( ) y1  m ∗ ∗  .. A A =A  . b yn Since A∗ = AT in this case, ( ∑ n x2i ∑i=1 n i=1 xi

∑n i=1

xi

n

)(

m b

)

following for m and b.   . 

( ∑ ) n i=1 xi yi = ∑n i=1 yi

Solving this system of equations for m and b, ∑n ∑n ∑n − ( i=1 xi ) ( i=1 yi ) + ( i=1 xi yi ) n m= ∑n ∑n 2 ( i=1 x2i ) n − ( i=1 xi ) and

∑n ∑n ∑n ∑n − ( i=1 xi ) i=1 xi yi + ( i=1 yi ) i=1 x2i . b= ∑n ∑n 2 ( i=1 x2i ) n − ( i=1 xi )

One could clearly do a least squares fit for curves of the form y = ax2 + bx + c in the same way. In this case you solve as well as possible for a, b, and c the system      y1 x21 x1 1 a   .   .. ..   .    . . .   b  =  ..   . c yn x2n xn 1 using the same techniques.
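A small numerical sketch of the regression line computation (not from the text; the data points are made up) that solves the normal equations AᵀAx = Aᵀy directly and compares with the closed formulas derived above:

```python
import numpy as np

# Made-up data points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 2.1, 3.9, 6.2])

A = np.column_stack([x, np.ones_like(x)])      # columns correspond to the unknowns m and b
m, b = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations A^T A (m, b)^T = A^T y
print(m, b)

# Same answer from the closed formulas derived above.
n = len(x)
denom = (x**2).sum() * n - x.sum()**2
m_formula = (-(x.sum()) * y.sum() + (x * y).sum() * n) / denom
b_formula = (-(x.sum()) * (x * y).sum() + y.sum() * (x**2).sum()) / denom
print(m_formula, b_formula)
```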

13.5

Fredholm Alternative

The best context in which to study the Fredholm alternative is in inner product spaces. This is done here. Definition 13.5.1 Let S be a subset of an inner product space, X. Define S ⊥ ≡ {x ∈ X : (x, s) = 0 for all s ∈ S} . The following theorem also follows from the above lemma. It is sometimes called the Fredholm alternative. Theorem 13.5.2 Let A : V → W where A is linear and V and W are inner product spaces. Then ⊥ A (V ) = ker (A∗ ) . Proof: Let y = Ax so y ∈ A (V ) . Then if A∗ z = 0, (y, z) = (Ax, z) = (x, A∗ z) = 0 ⊥



showing that y ∈ ker (A∗ ) . Thus A (V ) ⊆ ker (A∗ ) . ⊥ Now suppose y ∈ ker (A∗ ) . Does there exists x such that Ax = y? Since this might not be immediately clear, take the least squares solution to the problem. Thus let x be a solution to A∗ Ax = A∗ y. It follows A∗ (y − Ax) = 0 and so y−Ax ∈ ker (A∗ ) which implies from the assumption about y that (y − Ax, y) = 0. Also, since Ax is the closest point to y in A (V ) , Theorem 13.1.7 on Page 306 implies that (y − Ax, Ax1 ) = 0 for all x1 ∈ V. In particular this is true for x1 = x and so =0

z }| { 2 ⊥ 0 = (y − Ax, y) − (y − Ax, Ax) = |y − Ax| , showing that y = Ax. Thus A (V ) ⊇ ker (A∗ ) . 

314

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

Corollary 13.5.3 Let A, V, and W be as described above. If the only solution to A∗ y = 0 is y = 0, then A is onto W. Proof: If the only solution to A∗ y = 0 is y = 0, then ker (A∗ ) = {0} and so every vector from ⊥ W is contained in ker (A∗ ) and by the above theorem, this shows A (V ) = W . 
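A numerical illustration of Theorem 13.5.2 (a sketch with an arbitrary made-up matrix, not from the text): vectors in the range of A are orthogonal to ker (A∗).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])   # rank 2, so A(R^2) is a plane in R^3

# A basis for ker(A^T) via the SVD: the left singular vectors beyond the rank.
U, s, Vt = np.linalg.svd(A)
kernel_of_adjoint = U[:, 2]           # here A^T has a one dimensional kernel

y = A @ np.array([1.0, -3.0])         # any vector in the range of A ...
print(abs(kernel_of_adjoint @ y))     # ... is orthogonal to ker(A^T): prints roughly 0
```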

13.6

The Determinant and Volume

The determinant is the essential algebraic tool which provides a way to give a unified treatment of the concept of p dimensional volume. Here is the definition of what is meant by such a thing. In what follows, X will be typically some Rm . Definition 13.6.1 Let u1 , · · · , up be vectors in some inner product space X. The parallelepiped determined by these vectors will be denoted by P (u1 , · · · , up ) and it is defined as   p ∑  p P (u1 , · · · , up ) ≡ sj uj : sj ∈ [0, 1] = U Q, Q = [0, 1]   j=1

The volume of this parallelepiped is defined as volume of P (u1 , · · · , up ) ≡ v (P (u1 , · · · , up )) ≡ (det (G))

1/2

.

where Gij = ui · uj . This G is called the metric tensor or Grammian matrix. Let G (u1 , · · · , up ) denote the metric tensor determined by u1 , · · · , up . The vectors ui are dependent, if and only if the p dimensional volume just defined gives 0. That is, if and only if det (G (u1 , · · · , up )) = 0. The last assertion follows from Proposition 13.2.2. I am going to show that this is the only reasonable definition of volume for such a parallelepiped if you desire to preserve Euclidean ideas of distance and volume. Here is a picture which shows the relation between P (u1 , · · · , up−1 ) and P (u1 , · · · , up ). In particular, if you have a parallelepiped P (u1 , · · · , up−1 ) , 6 then by adding another vector u not in the span of the w {u1 , · · · , up−1 } you would want the p dimensional volume of P (u1 , · · · , up−1 , u) to equal the distance from u to the sub space spanned by u1 , · · · , up−1 multiplied by the p − 1 dimenu sional volume of P (u1 , · · · , up−1 ) , v (P (u1 , · · · , up−1 )). Thus, 3 from Theorem 13.2.3, assuming P (u1 , · · · , up−1 ) ̸= 0 so that θ P det G (u1 , · · · , up−1 ) ̸= 0 you would want the p dimensional volume to satisfy P = P (u1 , · · · , up−1 ) 2

v (P (u1 , · · · , up−1 , u)) =

det (G (u1 , · · · , up−1 , u)) 2 v (P (u1 , · · · , up−1 )) = det (G (u1 , · · · , up−1 , u)) det (G (u1 , · · · , up−1 ))

and so it follows that this is the only geometrically reasonable definition of the volume if the only 1/2 reasonable definition of one dimensional volume is det (G (v)) . Clearly if v = 0, this gives what the volume should be, 0. If v ̸= 0, then P (v) is just a line of the form 0 + tv : t ∈ [0, 1] and the endpoints would be 0 and v. We would want the one dimensional volume of this line to be its length. But if length is to be defined in terms of the Pythagorean theorem, this length is just 1/2 1/2 (v, v) = det (G (v)) . Therefore, the above is the only reasonable definition of Euclidean volume.
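For instance (a sketch, not from the text; the vectors are made up), the p dimensional volume of Definition 13.6.1 can be computed directly from the Grammian:

```python
import numpy as np

def parallelepiped_volume(vectors):
    """v(P(u_1,...,u_p)) = sqrt(det G) where G_ij = u_i . u_j (Definition 13.6.1)."""
    U = np.array(vectors, dtype=float)
    G = U @ U.T
    return np.sqrt(max(np.linalg.det(G), 0.0))   # det G >= 0; clamp tiny negative round-off

# Two made-up vectors in R^3: the 2 dimensional volume is the usual parallelogram area.
u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 2.0, 0.0])
print(parallelepiped_volume([u, v]))          # 2.0
print(parallelepiped_volume([u, v, u + v]))   # 0.0: dependent vectors give zero volume
```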

13.7. FINDING AN ORTHOGONAL BASIS

13.7

315

Finding an Orthogonal Basis

The Gram Schmidt process described above gives a way to generate an orthogonal set of vectors from a linearly independent set. Is there a convenient way to do this? Probably not. However, if you have access to a computer algebra system there might be a way which could help. In the following lemma, vi will be a vector and it is assumed that vi , i = 1, ..., n are linearly independent. Lemma 13.7.1 Let {v1 , ..., vn } be linearly independent and consider the following formal derivative:   (v1 , v1 ) (v1 , v2 ) ··· (v1 , vn−1 ) v1  (v , v ) (v2 , v2 ) ··· (v2 , vn−1 ) v2  2 1    .. .. .. ..    det  . . . .     (vn−1 , v1 ) (vn−1 , v2 ) · · · (vn−1 , vn−1 ) vn−1  (vn , v1 )

···

(vn , v2 )

(vn , vn−1 )

vn

Then the vector which results from expanding this determinant formally is perpendicular to each of v1 , ..., vn−1 . ∑n Proof: It is of the form i=1 vi Ci where Ci is a suitable (n − 1) × (n − 1) determinant. Thus the inner product of this with vk for k ≤ n − 1 is the expansion of a determinant which has two equal columns. However, the inner product with vn will be the Grammian of {v1 , ..., vn } which is not zero since these vectors vi are independent, this by Proposition 13.2.2  Example 13.7.2 The vectors 1, x, x2 , x3 are linearly independent on [0, 1], the vector space being the continuous functions defined on [0, 1]. You might show this. An inner product is given by ( ) ∫1 f (x) g (x) dx. Find an orthogonal basis for span 1, x, x2 , x3 . 0 You could use the above lemma. u1 (x) = 1. Now I will assemble the formal determinants as given above.     1 12 13 1 1 ( ) 1 2 1  1 1 1 x  1 1  1 1    det , det  2 3 x  , det  21 31 41 1 2    x x 1 1 2 3 4 5 x2 1 1 1 3 4 3 x 4 5 6 Now the orthogonal basis is obtained from evaluating these determinants and adding 1 to the list.{Thus an orthonormal basis is } 1 2 1 1 1 1 1 1, x − 12 , 12 x − 12 x + 72 , 2160 x3 − 1440 x2 + 3600 x − 43 1200 Is this horrible? Yes it is. However, if you have a computer algebra system do it for you, it isn’t so bad. For example, to get the last term, you just do     1 1 x x2  x ( )  x x 2 x3       2  1 x x2 =  2   x   x x3 x4  x3 Then you do the following.

 ∫ 0

1

   

x3

1 x x2 x3

x x2 x3 x4

x2 x3 x4 x5





     dx =   

1 1 2 1 3 1 4

x4

1 2 1 3 1 4 1 5

x5

1 3 1 4 1 5 1 6

    

You could get Matlab to do it for you. Then you add in the last column which consists of the original vectors. If you wanted an orthonormal basis, you could divide each vector by its magnitude. This was only painless because I let the computer do all the tedious busy work. However, I think it has independent interest because it gives a formula for a vector which will be orthogonal to a given set of linearly independent vectors.
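The computation in Example 13.7.2 can also be repeated symbolically (a sketch, not from the text, using sympy rather than Matlab) by expanding the formal determinant of Lemma 13.7.1 along its last column:

```python
import sympy as sp

x = sp.symbols('x')

def next_orthogonal(vs):
    """Expand the formal determinant of Lemma 13.7.1: Grammian columns plus the vectors themselves."""
    n = len(vs)
    rows = []
    for i in range(n):
        rows.append([sp.integrate(vs[i] * vs[j], (x, 0, 1)) for j in range(n - 1)] + [vs[i]])
    return sp.expand(sp.Matrix(rows).det())

basis = [sp.Integer(1), x, x**2, x**3]
orthogonal = [next_orthogonal(basis[:k]) for k in range(1, 5)]
for p in orthogonal:
    print(p)                     # 1, x - 1/2, x**2/12 - x/12 + 1/72, ...

# Check orthogonality of the last two polynomials on [0, 1].
print(sp.integrate(orthogonal[2] * orthogonal[3], (x, 0, 1)))   # 0
```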

316

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

13.8

Exercises

1. Find the best solution to the system x + 2y = 6 2x − y = 5 3x + 2y = 0 2. Find an orthonormal basis for R3 , {w1 , w2 , w3 } given that w1 is a multiple of the vector (1, 1, 2). 3. Suppose A = AT is a symmetric real n × n matrix which has all positive eigenvalues. Define (x, y) ≡ (Ax, y) . Show this is an inner product on Rn . What does the Cauchy Schwarz inequality say in this case? 4. Let ||x||∞ ≡ max {|xj | : j = 1, 2, · · · , n} . Show this is a norm on Cn . Here ( )T x = x1 · · · xn . Show 1/2

||x||∞ ≤ |x| ≡ (x, x) where the above is the usual inner product on Cn . 5. Let ||x||1 ≡

∑n

n j=1 |xj | .Show this is a norm on C . Here x =

( x1

···

)T xn

. Show

1/2

||x||1 ≥ |x| ≡ (x, x)

where the above is the usual inner product on Cn . Show there cannot exist an inner product such that this norm comes from the inner product as described above for inner product spaces. 6. Show that if ||·|| is any norm on any vector space, then |||x|| − ||y||| ≤ ||x − y|| . 7. Relax the assumptions in the axioms for the inner product. Change the axiom about (x, x) ≥ 0 and equals 0 if and only if x = 0 to simply read (x, x) ≥ 0. Show the Cauchy Schwarz inequality 1/2 1/2 still holds in the following form. |(x, y)| ≤ (x, x) (y, y) . n

8. Let H be an inner product space and let {uk }k=1 be an orthonormal basis for H. Show (x, y) =

n ∑

(x, uk ) (y, uk ).

k=1

9. Let the vector space V consist of real polynomials of degree no larger than 3. Thus a typical vector is∫ a polynomial of the form a + bx + cx2 + dx3 . For p, q ∈ V define the inner product, 1 (p, q) ≡ 0 p (x) q (x) dx. Show this is indeed an inner product. { } Then state the Cauchy Schwarz 2 3 inequality in terms of this inner product. Show 1, x, x , x is a basis for V . Finally, find an orthonormal basis for V. This is an example of some orthonormal polynomials. 10. Let Pn denote the polynomials of degree no larger than n − 1 which are defined on an interval [a, b] . Let {x1 , · · · , xn } be n distinct points in [a, b] . Now define for p, q ∈ Pn , (p, q) ≡

n ∑

p (xj ) q (xj )

j=1

Show this yields an inner product on Pn . Hint: Most of the axioms are obvious. The one which says (p, p) = 0 if and only if p = 0 is the only interesting one. To verify this one, note that a nonzero polynomial of degree no more than n − 1 has at most n − 1 zeros.

13.8. EXERCISES

317

11. Let C ([0, 1]) denote the vector space of continuous real valued functions defined on [0, 1]. Let the inner product be given as ∫ 1 (f, g) ≡ f (x) g (x) dx 0

Show this is an inner product. Also let V be the subspace described in Problem 9. Using the result of this problem, find the vector in V which is closest to x4 . 12. A regular Sturm Liouville problem involves the differential equation, for an unknown function of x which is denoted here by y, ′

(p (x) y ′ ) + (λq (x) + r (x)) y = 0, x ∈ [a, b] and it is assumed that p (t) , q (t) > 0 for any t ∈ [a, b] and also there are boundary conditions, C1 y (a) + C2 y ′ (a) = ′

C3 y (b) + C4 y (b) =

0 0

where C12 + C22 > 0, and C32 + C42 > 0. There is an immense theory connected to these important problems. The constant, λ is called an eigenvalue. Show that if y is a solution to the above problem corresponding to λ = λ1 and if z is a solution corresponding to λ = λ2 ̸= λ1 , then ∫ b q (x) y (x) z (x) dx = 0. (13.6) a

and this defines an inner product. Hint: Do something like this: ′

(p (x) y ′ ) z + (λ1 q (x) + r (x)) yz

= 0,

(p (x) z ) y + (λ2 q (x) + r (x)) zy

= 0.

′ ′

Now subtract and either use integration by parts or show ′



(p (x) y ′ ) z − (p (x) z ′ ) y = ((p (x) y ′ ) z − (p (x) z ′ ) y)



and then integrate. Use the boundary conditions to show that y ′ (a) z (a) − z ′ (a) y (a) = 0 and y ′ (b) z (b) − z ′ (b) y (b) = 0. The formula, 13.6 is called an orthogonality relation. It turns out there are typically infinitely many eigenvalues and it is interesting to write given functions as an infinite series of these “eigenfunctions”. ∫π 13. Consider the continuous functions defined on [0, π] , C ([0, π]) . Show (f, g) ≡ 0 f gdx is an }∞ {√ 2 inner product on this vector space. Show the functions are an orthonormal π sin (nx) n=1

set. What √ about the (√ does this mean ) dimension of the vector space C ([0, π])? Now let VN = 2 2 span π sin (x) , · · · , π sin (N x) . For f ∈ C ([0, π]) find a formula for the vector in VN which is closest to f with respect to the norm determined from the above inner product. This is called the N th partial sum of the Fourier series of f . An important problem is to determine whether and in what way this Fourier series converges to the function f . The norm which comes from this inner product is sometimes called the mean square norm. 14. Consider the subspace V ≡ ker (A) where    A= 

1 2 4 5

4 1 9 6

−1 2 0 3

−1 3 1 4

    

Find an orthonormal basis for V. Hint: You might first find a basis and then use the Gram Schmidt procedure.

318

CHAPTER 13. INNER PRODUCT SPACES AND LEAST SQUARES

15. The Gram Schmidt process starts with a basis for a subspace {v1 , · · · , vn } and produces an orthonormal basis for the same subspace {u1 , · · · , un } such that span (v1 , · · · , vk ) = span (u1 , · · · , uk ) for each k. Show that in the case of Rm the QR factorization does the same thing. More specifically, if ( ) A = v1 · · · vn and if A = QR ≡

( q1

···

) qn

R

then the vectors {q1 , · · · , qn } is an orthonormal set of vectors and for each k, span (q1 , · · · , qk ) = span (v1 , · · · , vk ) 16. Verify the parallelogram identify for any inner product space, 2

2

2

2

|x + y|^2 + |x − y|^2 = 2 |x|^2 + 2 |y|^2 . Why is it called the parallelogram identity?

17. Let H be an inner product space and let K ⊆ H be a nonempty convex subset. This means that if k1 , k2 ∈ K, then the line segment consisting of points of the form tk1 + (1 − t) k2 for t ∈ [0, 1] is also contained in K. Suppose for each x ∈ H, there exists P x defined to be a point of K closest to x. Show that P x is unique so that P actually is a map. Hint: Suppose z1 and z2 both work as closest points. Consider the midpoint, (z1 + z2 ) /2 and use the parallelogram identity of Problem 16 in an auspicious manner.

18. In the situation of Problem 17 suppose K is a closed convex subset and that H is complete. This means every Cauchy sequence converges. Recall a sequence {kn } is a Cauchy sequence if for every ε > 0 there exists Nε such that whenever m, n > Nε , it follows |km − kn | < ε. Let {kn } be a sequence of points of K such that
lim_{n→∞} |x − kn | = inf {|x − k| : k ∈ K} .
This is called a minimizing sequence. Show there exists a unique k ∈ K such that lim_{n→∞} |kn − k| = 0

and that k = P x. That is, there exists a well defined projection map onto the convex subset of H. Hint: Use the parallelogram identity in an auspicious manner to show {kn } is a Cauchy sequence which must therefore converge. Since K is closed it follows this will converge to something in K which is the desired vector. 19. Let H be an inner product space which is also complete and let P denote the projection map onto a convex closed subset, K. Show this projection map is characterized by the inequality Re (k − P x, x − P x) ≤ 0 for all k ∈ K. That is, a point z ∈ K equals P x if and only if the above variational inequality holds. This is what that inequality is called. This is because k is allowed to vary and the inequality continues to hold for all k ∈ K.


20. Using Problem 19 and Problems 17 - 18 show the projection map P onto a closed convex subset is Lipschitz continuous with Lipschitz constant 1. That is, |P x − P y| ≤ |x − y| .

21. Give an example of two vectors x, y in R4 or R3 and a subspace V such that x · y = 0 but P x · P y ̸= 0 where P denotes the projection map which sends x to its closest point on V .

22. Suppose you are given the data (1, 2) , (2, 4) , (3, 8) , (0, 0) . Find the linear regression line using the formulas derived above. Then graph the given data along with your regression line. (A short numerical check appears after Problem 25 below.)

23. Generalize the least squares procedure to the situation in which data is given and you desire to fit it with an expression of the form y = af (x) + bg (x) + c where the problem would be to find a, b and c in order to minimize the error. Could this be generalized to higher dimensions? How about more functions?

24. Let A ∈ L (X, Y ) where X and Y are finite dimensional vector spaces with the dimension of X equal to n. Define rank (A) ≡ dim (A (X)) and nullity(A) ≡ dim (ker (A)) . Show that nullity(A) + rank (A) = dim (X) . Hint: Let {xi }_{i=1}^r be a basis for ker (A) and let {xi }_{i=1}^r ∪ {yi }_{i=1}^{n−r} be a basis for X. Then show that {Ayi }_{i=1}^{n−r} is linearly independent and spans AX.

25. Let A be an m × n matrix. Show the column rank of A equals the column rank of A∗ A. Next verify the column rank of A∗ A is no larger than the column rank of A∗ . Next justify the following inequality to conclude the column rank of A equals the column rank of A∗ :
rank (A) = rank (A∗ A) ≤ rank (A∗ ) = rank (AA∗ ) ≤ rank (A) .
Hint: Start with an orthonormal basis {Axj }_{j=1}^r of A (Fn ) and verify {A∗ Axj }_{j=1}^r is a basis for A∗ A (Fn ) .
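The following is a small numerical sketch for Problem 22, added only for illustration; the use of numpy's least squares routine is an assumption of this sketch, not part of the text. It fits the line y = mx + b to the four data points in the least squares sense.

    import numpy as np

    # Data from Problem 22.
    x = np.array([1.0, 2.0, 3.0, 0.0])
    y = np.array([2.0, 4.0, 8.0, 0.0])

    A = np.column_stack([x, np.ones_like(x)])        # design matrix for y = m*x + b
    (m, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares solution of A [m b]^T ≈ y
    print(m, b)                                      # slope and intercept of the regression line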

26. Let A be a real m × n matrix and let A = QR be the QR factorization with Q orthogonal and R upper triangular. Show that there exists a solution x to the equation R^T R x = R^T Q^T b and that this solution is also a least squares solution defined above such that A^T A x = A^T b.

27. Here are three vectors in R4 : (1, 2, 0, 3)^T , (2, 1, −3, 2)^T , (0, 0, 1, 2)^T . Find the three dimensional volume of the parallelepiped determined by these three vectors.

28. Here are two vectors in R4 : (1, 2, 0, 3)^T , (2, 1, −3, 2)^T . Find the volume of the parallelepiped determined by these two vectors.
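As an illustrative addition for Problems 27 and 28 (not part of the original text): the k dimensional volume of the parallelepiped spanned by the columns of a matrix A is √(det (A^T A)), the Gram determinant, so an answer can be checked numerically as follows.

    import numpy as np

    def parallelepiped_volume(*vectors):
        # k-dimensional volume of the parallelepiped spanned by the given vectors,
        # computed from the Gram determinant det(A^T A)
        A = np.column_stack(vectors)
        return np.sqrt(np.linalg.det(A.T @ A))

    v1 = np.array([1.0, 2.0, 0.0, 3.0])
    v2 = np.array([2.0, 1.0, -3.0, 2.0])
    v3 = np.array([0.0, 0.0, 1.0, 2.0])
    print(parallelepiped_volume(v1, v2, v3))   # three dimensional volume (Problem 27)
    print(parallelepiped_volume(v1, v2))       # two dimensional volume (Problem 28)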

29. Here are three vectors in R2 : (1, 2) , (2, 1) , (0, 1) . Find the three dimensional volume of the parallelepiped determined by these three vectors. Recall that from the above theorem, this should equal 0. 30. Find the equation of the plane through the three points (1, 2, 3) , (2, −3, 1) , (1, 1, 7) . 31. Let T map a vector space V to itself. Explain why T is one to one if and only if T is onto. It is in the text, but do it again in your own words. 32. ↑Let all matrices be complex with complex field of scalars and let A be an n × n matrix and B a m × m matrix while X will be an n × m matrix. The problem is to consider solutions to Sylvester’s equation. Solve the following equation for X AX − XB = C

where C is an arbitrary n × m matrix. Show there exists a unique solution if and only if σ (A) ∩ σ (B) = ∅. Hint: If q (λ) is a polynomial, show first that if AX − XB = 0, then q (A) X − Xq (B) = 0. Next define the linear map T which maps the n × m matrices to the n × m matrices as follows. T X ≡ AX − XB. Show that the only solution to T X = 0 is X = 0, so that T is one to one, if and only if σ (A) ∩ σ (B) = ∅. Do this by using the first part for q (λ) the characteristic polynomial of B and then use the Cayley Hamilton theorem. Explain why q (A)^{−1} exists if and only if the condition σ (A) ∩ σ (B) = ∅.

33. What is the geometric significance of the Binet Cauchy theorem, Theorem 8.4.5?

34. Let U, H be finite dimensional inner product spaces. (More generally, complete inner product spaces.) Let A be a linear map from U to H. Thus AU is a subspace of H. For g ∈ AU, define A^{−1} g to be the unique element of {x : Ax = g} which is closest to 0. Then define (h, g)_{AU} ≡ (A^{−1} g, A^{−1} h)_U . Show that this is a well defined inner product and that if A is one to one, then ∥h∥_{AU} = ∥A^{−1} h∥_U and ∥Ax∥_{AU} = ∥x∥_U .

35. For f a piecewise continuous function,
Sn f (x) = (1/2π) ∑_{k=−n}^{n} e^{ikx} ( ∫_{−π}^{π} f (y) e^{−iky} dy )
where Sn f (x) denotes the nth partial sum of the Fourier series. Recall that this Fourier series was of the form
∑_{k=−n}^{n} a_k (1/√(2π)) e^{ikx} ,   a_k ≡ (1/√(2π)) ∫_{−π}^{π} f (y) e^{−iky} dy .
Show this can be written in the form
Sn f (x) = ∫_{−π}^{π} f (y) Dn (x − y) dy
where
Dn (t) = (1/2π) ∑_{k=−n}^{n} e^{ikt} .
This is called the Dirichlet kernel. Show that
Dn (t) = (1/2π) sin ((n + (1/2)) t) / sin (t/2) .
For V the vector space of piecewise continuous functions, define Sn : V → V by
Sn f (x) = ∫_{−π}^{π} f (y) Dn (x − y) dy .
Show that Sn is a linear transformation. (In fact, Sn f is not just piecewise continuous but infinitely differentiable. Why?) Explain why ∫_{−π}^{π} Dn (t) dt = 1. Hint: To obtain the formula, do the following.
e^{i(t/2)} Dn (t) = (1/2π) ∑_{k=−n}^{n} e^{i(k+(1/2))t}
e^{i(−t/2)} Dn (t) = (1/2π) ∑_{k=−n}^{n} e^{i(k−(1/2))t}


Change the variable of summation in the bottom sum and then subtract and solve for Dn (t).

36. ↑Let V be an inner product space and let U be a finite dimensional subspace with an orthonormal basis {ui }_{i=1}^n . If y ∈ V, show
|y|^2 ≥ ∑_{k=1}^{n} |⟨y, uk ⟩|^2 .
Now suppose that {uk }_{k=1}^∞ is an orthonormal set of vectors of V . Explain why
lim_{k→∞} ⟨y, uk ⟩ = 0.
When applied to functions, this is a special case of the Riemann Lebesgue lemma.

37. ↑Let f be any piecewise continuous real function which is bounded on [−π, π] . Show, using the above problem, that
lim_{n→∞} ∫_{−π}^{π} f (t) sin (nt) dt = lim_{n→∞} ∫_{−π}^{π} f (t) cos (nt) dt = 0
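Here is a brief numerical sanity check of the closed form in Problem 35. It is an illustrative addition, not part of the text, and the only thing it verifies is that the finite sum (1/2π) ∑ e^{ikt} and the expression sin ((n + 1/2) t) / (2π sin (t/2)) agree at a sample point.

    import numpy as np

    def dirichlet_sum(t, n):
        # D_n(t) directly from the definition (1/2π) Σ_{k=-n}^{n} e^{ikt}
        k = np.arange(-n, n + 1)
        return np.real(np.sum(np.exp(1j * k * t))) / (2 * np.pi)

    def dirichlet_closed(t, n):
        # the closed form from Problem 35
        return np.sin((n + 0.5) * t) / (2 * np.pi * np.sin(t / 2))

    t, n = 0.7, 5
    print(dirichlet_sum(t, n), dirichlet_closed(t, n))   # the two values should agree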

38. ↑∗ Let f be a function which is defined on (−π, π]. The 2π periodic extension is given by the formula f (x + 2π) = f (x) . In the rest of this problem, f will refer to this 2π periodic extension. Assume that f is piecewise continuous, bounded, and also that the following limits exist for this 2π extension:
lim_{y→0+} (f (x + y) − f (x+)) / y ,   lim_{y→0+} (f (x − y) − f (x−)) / y .
Here it is assumed that
f (x+) ≡ lim_{h→0+} f (x + h) ,   f (x−) ≡ lim_{h→0+} f (x − h)
both exist at every point. The above conditions rule out functions where the slope taken from either side becomes infinite. Actually, you don’t need anything about these quotients being bounded. It is enough to have what is called a Dini condition which is a bound involving a Holder condition and it gives the quotients in L1 but this kind of thing involves more analysis. The above result is still very interesting. Justify the following assertions and eventually conclude that under these very reasonable conditions (more general ones are possible)
lim_{n→∞} Sn f (x) = (f (x+) + f (x−)) /2 ,
the mid point of the jump. In words, the Fourier series converges to the midpoint of the jump of the function.
Sn f (x) = ∫_{−π}^{π} f (y) Dn (x − y) dy = ∫_{−π}^{π} f (x − y) Dn (y) dy
You just change variables and then use 2π periodicity to get this. Then
| Sn f (x) − (f (x+) + f (x−)) /2 | = | ∫_{−π}^{π} ( f (x − y) − (f (x+) + f (x−)) /2 ) Dn (y) dy |
 = | ∫_0^π f (x − y) Dn (y) dy + ∫_0^π f (x + y) Dn (y) dy − ∫_0^π (f (x+) + f (x−)) Dn (y) dy |
 ≤ | ∫_0^π (f (x − y) − f (x−)) Dn (y) dy | + | ∫_0^π (f (x + y) − f (x+)) Dn (y) dy |

Now apply some trig. identities and use the result of Problem 37 to conclude that both of these terms must converge to 0.


Chapter 14

Matrices And The Inner Product

14.1 Schur’s Theorem, Hermitian Matrices

Every matrix is related to an upper triangular matrix in a particularly significant way. This is Schur’s theorem and it is the most important theorem in the spectral theory of matrices. The important result which makes this theorem possible is the Gram Schmidt procedure of Lemma 11.4.14.

Definition 14.1.1 An n × n matrix U is unitary if U U ∗ = I = U ∗ U where U ∗ is defined to be the transpose of the conjugate of U. Thus Uij is the complex conjugate of Uji . Note that every real orthogonal matrix, meaning Q^T Q = I, is unitary. For A any matrix, A∗ , just defined as the conjugate of the transpose, is called the adjoint. As shown above, this is also defined by (Ax, y) = (x, A∗ y) .

Note that if U = ( v1 · · · vn ) where the vk are orthonormal vectors in Cn , then U is unitary. This follows because the ij th entry of U ∗ U is vi∗ vj = δ ij since the vi are assumed orthonormal.

Lemma 14.1.2 The following holds: (AB)∗ = B ∗ A∗ . Proof: Using the definition in terms of inner products, (x, (AB)∗ y) = (ABx, y) = (Bx, A∗ y) = (x, B ∗ A∗ y) and so, since x is arbitrary, (AB)∗ y = B ∗ A∗ y which shows the result since y is arbitrary. 

Theorem 14.1.3 Let A be an n × n matrix. Then there exists a unitary matrix U such that
U ∗ AU = T,

(14.1)

where T is an upper triangular matrix having the eigenvalues of A on the main diagonal listed according to multiplicity as roots of the characteristic equation. If A is a real matrix having all real eigenvalues, then U can be chosen to be an orthogonal real matrix.
Proof: The theorem is clearly true if A is a 1 × 1 matrix. Just let U = 1, the 1 × 1 matrix which has entry 1. Suppose it is true for (n − 1) × (n − 1) matrices, n ≥ 2 and let A be an n × n matrix. Then let v1 be a unit eigenvector for A. Then there exists λ1 such that Av1 = λ1 v1 , |v1 | = 1. Extend {v1 } to a basis and then use the Gram - Schmidt process or Theorem 13.3.6 to obtain {v1 , · · · , vn }, an orthonormal basis of Cn . Let U0 be a matrix whose ith column is vi so that U0 is

unitary. Consider U0∗ AU0 :

    U0∗ AU0 = [ v1∗ ]                       [ v1∗ ]
              [  ⋮  ] ( Av1 · · · Avn )  =  [  ⋮  ] ( λ1 v1 Av2 · · · Avn )
              [ vn∗ ]                       [ vn∗ ]

Thus U0∗ AU0 is of the form

    [ λ1   a  ]
    [  0   A1 ]

where A1 is an (n − 1) × (n − 1) matrix. Now by induction, there exists an (n − 1) × (n − 1) unitary matrix Ũ1 such that Ũ1∗ A1 Ũ1 = Tn−1 , an upper triangular matrix. Consider

    U1 ≡ [ 1   0  ]
         [ 0   Ũ1 ] .

Then

    U1∗ U1 = [ 1   0   ] [ 1   0  ] = [ 1   0     ]
             [ 0   Ũ1∗ ] [ 0   Ũ1 ]   [ 0   In−1  ] .

Also

    U1∗ U0∗ AU0 U1 = [ 1   0   ] [ λ1   a  ] [ 1   0  ] = [ λ1   ∗     ] ≡ T
                     [ 0   Ũ1∗ ] [  0   A1 ] [ 0   Ũ1 ]   [  0   Tn−1  ]

where T is upper triangular. Then let U = U0 U1 . It is clear that this is unitary because both matrices preserve distance. Therefore, so does the product and hence U . Alternatively,
(U0 U1 ) (U0 U1 )∗ = U0 U1 U1∗ U0∗ = I

and so, it follows that A is similar to T and that U0 U1 is unitary. Hence A and T have the same characteristic polynomials, and therefore the same eigenvalues listed according to multiplicity as roots of the characteristic equation. These are the diagonal entries of T listed with multiplicity and so this proves the main conclusion of the theorem. In case A is real with all real eigenvalues, the above argument can be repeated word for word using only the real dot product to show that U can be taken to be real and orthogonal. 
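Schur’s theorem is also available numerically. The following short sketch is an illustrative addition to the text (it assumes scipy is installed); it exhibits a unitary U and upper triangular T with U ∗ AU = T for a randomly chosen matrix.

    import numpy as np
    from scipy.linalg import schur

    A = np.random.randn(4, 4)
    T, U = schur(A, output='complex')                  # A = U T U*, T upper triangular, U unitary
    print(np.allclose(U.conj().T @ A @ U, T))          # U* A U = T
    print(np.allclose(U.conj().T @ U, np.eye(4)))      # U is unitary
    print(np.diag(T))                                  # eigenvalues of A appear on the diagonal of T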

As a simple consequence of the above theorem, here is an interesting lemma.

Lemma 14.1.4 Let A be a block upper triangular matrix of the form

    A = [ P1  · · ·   ∗  ]
        [        ⋱       ]
        [  0  · · ·   Ps ]

where Pk is an mk × mk matrix. Then det (A) = ∏_k det (Pk ) .

Proof: Let Uk be an mk × mk unitary matrix such that Uk∗ Pk Uk = Tk


where Tk is upper triangular. Then letting U denote the block diagonal matrix having the Ui as the blocks on the diagonal,

    U = [ U1       0  ]         U ∗ = [ U1∗       0   ]
        [     ⋱       ]               [      ⋱        ]
        [  0      Us  ]               [  0       Us∗  ]

and

    U ∗ A U = [ U1∗       0   ] [ P1  · · ·   ∗  ] [ U1       0  ]  =  [ T1  · · ·   ∗  ]
              [      ⋱        ] [        ⋱       ] [     ⋱       ]     [        ⋱       ]
              [  0       Us∗  ] [  0  · · ·   Ps ] [  0      Us  ]     [  0  · · ·   Ts ]

and so det (A) = ∏_k det (Tk ) = ∏_k det (Pk ) . 

Definition 14.1.5 An n × n matrix A is called Hermitian if A = A∗ . Thus a real symmetric (A = AT ) matrix is Hermitian. The following is the major result about Hermitian matrices. It says that any Hermitian matrix is similar to a diagonal matrix. We say it is unitarily similar because the matrix U in the following theorem which gives the similarity transformation is a unitary matrix. Theorem 14.1.6 If A is an n × n Hermitian matrix, there exists a unitary matrix U such that U ∗ AU = D

(14.2)

where D is a real diagonal matrix. That is, D has nonzero entries only on the main diagonal and these are real. Furthermore, the columns of U are an orthonormal basis of eigenvectors for Cn . If A is real and symmetric, then U can be assumed to be a real orthogonal matrix and the columns of U form an orthonormal basis for Rn .
Proof: From Schur’s theorem above, there exists U unitary (real and orthogonal if A is real) such that U ∗ AU = T where T is an upper triangular matrix. Then from Lemma 14.1.2,
T ∗ = (U ∗ AU )∗ = U ∗ A∗ U = U ∗ AU = T.
Thus T = T ∗ and T is upper triangular. This can only happen if T is really a diagonal matrix having real entries on the main diagonal. (If i ̸= j, one of Tij or Tji equals zero. But Tij is the conjugate of Tji and so they are both zero. Also Tii equals its own conjugate, so it is real.) Finally, let U = ( u1 u2 · · · un ) where the ui denote the columns of U and
D = diag (λ1 , · · · , λn ) .
The equation U ∗ AU = D implies
AU = ( Au1 Au2 · · · Aun ) = U D = ( λ1 u1 λ2 u2 · · · λn un )


where the entries denote the columns of AU and U D respectively. Therefore, Aui = λi ui and since the matrix is unitary, the ij th entry of U ∗ U equals δ ij , which says exactly that ui∗ uj = δ ij . This proves the theorem because it shows the vectors {ui } form an orthonormal basis of eigenvectors. In case A is real and symmetric, simply ignore all complex conjugations in the above argument. 
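Numerically, this is exactly the diagonalization that numpy's eigh routine computes for a Hermitian matrix. The short sketch below is an illustrative addition to the text, not part of it.

    import numpy as np

    A = np.array([[2.0, 1.0 - 1.0j],
                  [1.0 + 1.0j, 3.0]])                   # a small Hermitian matrix
    lam, U = np.linalg.eigh(A)                           # real eigenvalues, unitary U
    print(lam)                                           # the eigenvalues are real
    print(np.allclose(U.conj().T @ A @ U, np.diag(lam))) # U* A U = D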

This theorem is particularly nice because the diagonal entries are all real. What of a matrix which is unitarily similar to a diagonal matrix without assuming the diagonal entries are real? That is, A is an n × n matrix with U ∗ AU = D. Then this requires U ∗ A∗ U = D∗ and so, since the two diagonal matrices commute,
AA∗ = U DU ∗ U D∗ U ∗ = U DD∗ U ∗ = U D∗ DU ∗ = U D∗ U ∗ U DU ∗ = A∗ A .

The following definition describes these matrices.

Definition 14.1.7 An n × n matrix A is normal means: A∗ A = AA∗ .

We just showed that if A is unitarily similar to a diagonal matrix, then it is normal. The converse is also true. This involves the following lemma.

Lemma 14.1.8 If T is upper triangular and normal, then T is a diagonal matrix. If A is normal and U is unitary, then U ∗ AU is also normal.

Proof: This is obviously true if T is 1 × 1. In fact, it can’t help being diagonal in this case. Suppose then that the lemma is true for (n − 1) × (n − 1) matrices and let T be an upper triangular normal n × n matrix. Thus T is of the form

    T = [ t11   a∗ ]         T ∗ = [ t̄11   0^T ]
        [  0    T1 ] ,             [  a    T1∗ ] .

Then

    T T ∗ = [ t11   a∗ ] [ t̄11   0^T ]  =  [ |t11|^2 + a∗ a    a∗ T1∗  ]
            [  0    T1 ] [  a    T1∗ ]     [      T1 a          T1 T1∗ ]

    T ∗ T = [ t̄11   0^T ] [ t11   a∗ ]  =  [ |t11|^2           t̄11 a∗       ]
            [  a    T1∗ ] [  0    T1 ]     [ a t11             a a∗ + T1∗ T1 ]

Since these two matrices are equal, it follows a = 0. But now it follows that T1∗ T1 = T1 T1∗ and so by induction T1 is a diagonal matrix D1 . Therefore,

    T = [ t11   0^T ]
        [  0    D1  ] ,

a diagonal matrix. As to the last claim, let A be normal. Then
(U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ U U ∗ AU = U ∗ A∗ AU = U ∗ AA∗ U = U ∗ AU U ∗ A∗ U = (U ∗ AU ) (U ∗ AU )∗ 


Theorem 14.1.9 An n × n matrix is unitarily similar to a diagonal matrix if and only if it is normal. Proof: It was already shown above that if A is similar to a diagonal matrix then it is normal. Suppose now that A is normal. By Schur’s theorem, there is a unitary matrix U such that U ∗ AU = T where T is upper triangular. By Lemma 14.1.8, T is normal and, since it is upper triangular, it is a diagonal matrix. 

14.2 Quadratic Forms

Definition 14.2.1 A quadratic form in three dimensions is an expression of the form
( x y z ) A ( x y z )^T        (14.3)
where A is a 3 × 3 symmetric matrix. In higher dimensions the idea is the same except you use a larger symmetric matrix in place of A. In two dimensions A is a 2 × 2 matrix.

For example, consider

    ( x y z ) [  3  −4   1 ] [ x ]
              [ −4   0  −4 ] [ y ]        (14.4)
              [  1  −4   3 ] [ z ]

which equals 3x^2 − 8xy + 2xz − 8yz + 3z^2 . This is very awkward because of the mixed terms such as −8xy. The idea is to pick different axes such that if x, y, z are taken with respect to these axes, the quadratic form is much simpler. In other words, look for new variables x′ , y ′ , and z ′ and a unitary matrix U such that
U ( x′ y ′ z ′ )^T = ( x y z )^T        (14.5)
and if you write the quadratic form in terms of the primed variables, there will be no mixed terms. Any symmetric real matrix is Hermitian and is therefore normal. From Theorem 14.1.6, it follows there exists a real unitary matrix U, (an orthogonal matrix) such that U^T AU = D, a diagonal matrix. Thus in the quadratic form, 14.3,
( x y z ) A ( x y z )^T = ( x′ y ′ z ′ ) U^T AU ( x′ y ′ z ′ )^T = ( x′ y ′ z ′ ) D ( x′ y ′ z ′ )^T
and in terms of these new variables, the quadratic form becomes
λ1 (x′ )^2 + λ2 (y ′ )^2 + λ3 (z ′ )^2
where D = diag (λ1 , λ2 , λ3 ) . Similar considerations apply equally well in any other dimension. For the given example,

    [ −1/√2    0     1/√2 ] [  3  −4   1 ] [ −1/√2   1/√6   1/√3 ]   [ 2   0  0 ]
    [  1/√6  2/√6   1/√6 ] [ −4   0  −4 ] [    0    2/√6  −1/√3 ] = [ 0  −4  0 ]
    [  1/√3 −1/√3   1/√3 ] [  1  −4   3 ] [  1/√2   1/√6   1/√3 ]   [ 0   0  8 ]

and so if the new variables are given by

    [ −1/√2    0     1/√2 ] [ x ]   [ x′ ]
    [  1/√6  2/√6   1/√6 ] [ y ] = [ y ′ ] ,
    [  1/√3 −1/√3   1/√3 ] [ z ]   [ z ′ ]

it follows that in terms of the new variables the quadratic form is 2 (x′ )^2 − 4 (y ′ )^2 + 8 (z ′ )^2 . You can work other examples the same way.

2

2

The Estimation Of Eigenvalues

There are ways to estimate the eigenvalues for matrices. The most famous is known as Gerschgorin’s theorem. This theorem gives a rough idea where the eigenvalues are just from looking at the matrix. Theorem 14.3.1 Let A be an n × n matrix. Consider the n Gerschgorin discs defined as     ∑ Di ≡ λ ∈ C : |λ − aii | ≤ |aij | .   j̸=i

Then every eigenvalue is contained in some Gerschgorin disc. This theorem says to add up the absolute values of the entries of the ith row which are off the main diagonal and form the disc centered at aii having this radius. The union of these discs contains σ (A) . Proof: Suppose Ax = λx where x ̸= 0. Then for A = (aij ) , let |xk | ≥ |xj | for all xj . Thus |xk | ̸= 0. ∑ akj xj = (λ − akk ) xk . j̸=k

Then

∑ |xk | |akj | ≥ |akj | |xj | ≥ akj xj = |λ − aii | |xk | . j̸=k j̸=k j̸=k ∑



Now dividing by |xk |, it follows λ is contained in the k th Gerschgorin disc.  Example 14.3.2 Here is a matrix. Estimate its eigenvalues.   2 1 1    3 5 0  0 1 9

14.4. ADVANCED THEOREMS

329

According to Gerschgorin’s theorem the eigenvalues are contained in the disks D1 = {λ ∈ C : |λ − 2| ≤ 2} , D2 = {λ ∈ C : |λ − 5| ≤ 3} , D3 = {λ ∈ C : |λ − 9| ≤ 1} It is important to observe that these disks are in the complex plane. In general this is the case. If you want to find eigenvalues they will be complex numbers. iy

x 2

9

5

So what are the values of the eigenvalues? In this case they are real. You can compute them by graphing the characteristic polynomial, λ3 − 16λ2 + 70λ − 66 and then zooming in on the zeros. If you do this you find the solution is {λ = 1. 295 3} , {λ = 5. 590 5} , {λ = 9. 114 2} . Of course these are only approximations and so this information is useless for finding eigenvectors. However, in many applications, it is the size of the eigenvalues which is important and so these numerical values would be helpful for such applications. In this case, you might think there is no real reason for Gerschgorin’s theorem. Why not just compute the characteristic equation and graph and zoom? This is fine up to a point, but what if the matrix was huge? Then it might be hard to find the characteristic polynomial. Remember the difficulties in expanding a big matrix along a row or column. Also, what if the eigenvalues were complex? You don’t see these by following this procedure. However, Gerschgorin’s theorem will at least estimate them.

14.4

Advanced Theorems

More can be said but this requires some theory from complex variables1 . The following is a fundamental theorem about counting zeros. Theorem 14.4.1 Let U be a region and let γ : [a, b] → U be closed, continuous, bounded variation, and the winding number, n (γ, z) = 0 for all z ∈ / U. Suppose also that f is analytic on U having zeros a1 , · · · , am where the zeros are repeated according to multiplicity, and suppose that none of these zeros are on γ ([a, b]) . Then ∫ ′ m ∑ 1 f (z) n (γ, ak ) . dz = 2πi γ f (z) k=1

Proof: It is given that f (z) = rule,

∏m j=1

(z − aj ) g (z) where g (z) ̸= 0 on U. Hence using the product

f ′ (z) ∑ 1 g ′ (z) = + f (z) z − aj g (z) j=1 m

where

g ′ (z) g(z)

is analytic on U and so 1 2πi

∫ γ

∑ f ′ (z) 1 dz = n (γ, aj ) + f (z) 2πi j=1 m

∫ γ

∑ g ′ (z) dz = n (γ, aj ) .  g (z) j=1 m

Now let A be an n × n matrix. Recall that the eigenvalues of A are given by the zeros of the polynomial, pA (z) = det (zI − A) where I is the n × n identity. You can argue that small changes 1 If you haven’t studied the theory of a complex variable, you should skip this section because you won’t understand any of it.

330

CHAPTER 14. MATRICES AND THE INNER PRODUCT

in A will produce small changes in pA (z) and p′A (z) . Let γ k denote a very small closed circle which winds around zk , one of the eigenvalues of A, in the counter clockwise direction so that n (γ k , zk ) = 1. This circle is to enclose only zk and is to have no other eigenvalue on it. Then apply Theorem 14.4.1. According to this theorem ∫ ′ pA (z) 1 dz 2πi γ pA (z) is always an integer equal to the multiplicity of zk as a root of pA (t) . Therefore, small changes in A result in no change to the above contour integral because it must be an integer and small changes in A result in small changes in the integral. Therefore whenever B is close enough to A, the two matrices have the same number of zeros inside γ k , the zeros being counted according to multiplicity. By making the radius of the small circle equal to ε where ε is less than the minimum distance between any two distinct eigenvalues of A, this shows that if B is close enough to A, every eigenvalue of B is closer than ε to some eigenvalue of A.  Theorem 14.4.2 If λ is an eigenvalue of A, then if all the entries of B are close enough to the corresponding entries of A, some eigenvalue of B will be within ε of λ. Consider the situation that A (t) is an n × n matrix and that t → A (t) is continuous for t ∈ [0, 1] . Lemma 14.4.3 Let λ (t) ∈ σ (A (t)) for t < 1 and let Σt = ∪s≥t σ (A (s)) . Also let Kt be the connected component of λ (t) in Σt . Then there exists η > 0 such that Kt ∩ σ (A (s)) ̸= ∅ for all s ∈ [t, t + η] . Proof: Denote by D (λ (t) , δ) the disc centered at λ (t) having radius δ > 0, with other occurrences of this notation being defined similarly. Thus D (λ (t) , δ) ≡ {z ∈ C : |λ (t) − z| ≤ δ} . Suppose δ > 0 is small enough that λ (t) is the only element of σ (A (t)) contained in D (λ (t) , δ) and that pA(t) has no zeroes on the boundary of this disc. Then by continuity, and the above discussion and theorem, there exists η > 0, t + η < 1, such that for s ∈ [t, t + η] , pA(s) also has no zeroes on the boundary of this disc and A (s) has the same number of eigenvalues, counted according to multiplicity, in the disc as A (t) . Thus σ (A (s)) ∩ D (λ (t) , δ) ̸= ∅ for all s ∈ [t, t + η] . Now let ∪ H= σ (A (s)) ∩ D (λ (t) , δ) . s∈[t,t+η]

It will be shown that H is connected. Suppose not. Then H = P ∪ Q where P, Q are separated and λ (t) ∈ P. Let s0 ≡ inf {s : λ (s) ∈ Q for some λ (s) ∈ σ (A (s))} . There exists λ (s0 ) ∈ σ (A (s0 )) ∩ D (λ (t) , δ) . If λ (s0 ) ∈ / Q, then from the above discussion there are λ (s) ∈ σ (A (s)) ∩ Q for s > s0 arbitrarily close to λ (s0 ) . Therefore, λ (s0 ) ∈ Q which shows that s0 > t because λ (t) is the only element of σ (A (t)) in D (λ (t) , δ) and λ (t) ∈ P. Now let sn ↑ s0 . Then λ (sn ) ∈ P for any λ (sn ) ∈ σ (A (sn )) ∩ D (λ (t) , δ) and also it follows from the above discussion that for some choice of sn → s0 , λ (sn ) → λ (s0 ) which contradicts P and Q separated and nonempty. Since P is nonempty, this shows Q = ∅. Therefore, H is connected as claimed. But Kt ⊇ H and so Kt ∩ σ (A (s)) ̸= ∅ for all s ∈ [t, t + η] .  Theorem 14.4.4 Suppose A (t) is an n×n matrix and that t → A (t) is continuous for t ∈ [0, 1] . Let λ (0) ∈ σ (A (0)) and define Σ ≡ ∪t∈[0,1] σ (A (t)) . Let Kλ(0) = K0 denote the connected component of λ (0) in Σ. Then K0 ∩ σ (A (t)) ̸= ∅ for all t ∈ [0, 1] . Proof: Let S ≡ {t ∈ [0, 1] : K0 ∩ σ (A (s)) ̸= ∅ for all s ∈ [0, t]} . Then 0 ∈ S. Let t0 = sup (S) . Say σ (A (t0 )) = λ1 (t0 ) , · · · , λr (t0 ) . Claim: At least one of these is a limit point of K0 and consequently must be in K0 which shows that S has a last point. Why is this claim true? Let sn ↑ t0 so sn ∈ S. Now let the discs, D (λi (t0 ) , δ) , i = 1, · · · , r be disjoint with pA(t0 ) having no zeroes on γ i the boundary of

14.4. ADVANCED THEOREMS

331

D (λi (t0 ) , δ) . Then for n large enough it follows from Theorem 14.4.1 and the discussion following it that σ (A (sn )) is contained in ∪ri=1 D (λi (t0 ) , δ). It follows that K0 ∩ (σ (A (t0 )) + D (0, δ)) ̸= ∅ for all δ small enough. This requires at least one of the λi (t0 ) to be in K0 . Therefore, t0 ∈ S and S has a last point. Now by Lemma 14.4.3, if t0 < 1, then K0 ∪ Kt would be a strictly larger connected set containing λ (0) . (The reason this would be strictly larger is that K0 ∩ σ (A (s)) = ∅ for some s ∈ (t, t + η) while Kt ∩ σ (A (s)) ̸= ∅ for all s ∈ [t, t + η].) Therefore, t0 = 1.  Corollary 14.4.5 Suppose one of the Gerschgorin discs, Di is disjoint from the union of the others. Then Di contains an eigenvalue of A. Also, if there are n disjoint Gerschgorin discs, then each one contains an eigenvalue of A. ( ) Proof: Denote by A (t) the matrix atij where if i ̸= j, atij = taij and atii = aii . Thus to get A (t) multiply all non diagonal terms by t. Let t ∈ [0, 1] . Then A (0) = diag (a11 , · · · , ann ) and A (1) = A. Furthermore, the map, t → A (t) is continuous. Denote by Djt the Gerschgorin disc obtained from the j th row for the matrix A (t). Then it is clear that Djt ⊆ Dj the j th Gerschgorin disc for A. It follows aii is the eigenvalue for A (0) which is contained in the disc, consisting of the single point aii which is contained in Di . Letting K be the connected component in Σ for Σ defined in Theorem 14.4.4 which is determined by aii , Gerschgorin’s theorem implies that K ∩ σ (A (t)) ⊆ ∪nj=1 Djt ⊆ ∪nj=1 Dj = Di ∪ (∪j̸=i Dj ) and also, since K is connected, there are not points of K in both Di and (∪j̸=i Dj ) . Since at least one point of K is in Di ,(aii ), it follows all of K must be contained in Di . Now by Theorem 14.4.4 this shows there are points of K ∩ σ (A) in Di . The last assertion follows immediately.  This can be improved even more. This involves the following lemma. Lemma 14.4.6 In the situation of Theorem 14.4.4 suppose λ (0) = K0 ∩ σ (A (0)) and that λ (0) is a simple root of the characteristic equation of A (0). Then for all t ∈ [0, 1] , σ (A (t)) ∩ K0 = λ (t) where λ (t) is a simple root of the characteristic equation of A (t) . Proof: Let S ≡ {t ∈ [0, 1] : K0 ∩ σ (A (s)) = λ (s) , a simple eigenvalue for all s ∈ [0, t]} . Then 0 ∈ S so it is nonempty. Let t0 = sup (S) and suppose λ1 ̸= λ2 are two elements of σ (A (t0 )) ∩ K0 . Then choosing η > 0 small enough, and letting Di be disjoint discs containing λi respectively, similar arguments to those of Lemma 14.4.3 can be used to conclude Hi ≡ ∪s∈[t0 −η,t0 ] σ (A (s)) ∩ Di is a connected and nonempty set for i = 1, 2 which would require that Hi ⊆ K0 . But then there would be two different eigenvalues of A (s) contained in K0 , contrary to the definition of t0 . Therefore, there is at most one eigenvalue λ (t0 ) ∈ K0 ∩ σ (A (t0 )) . Could it be a repeated root of the characteristic equation? Suppose λ (t0 ) is a repeated root of the characteristic equation. As before, choose a small disc, D centered at λ (t0 ) and η small enough that H ≡ ∪s∈[t0 −η,t0 ] σ (A (s)) ∩ D is a nonempty connected set containing either multiple eigenvalues of A (s) or else a single repeated root to the characteristic equation of A (s) . But since H is connected and contains λ (t0 ) it must be contained in K0 which contradicts the condition for s ∈ S for all these s ∈ [t0 − η, t0 ] . Therefore, t0 ∈ S as hoped. 
If t0 < 1, there exists a small disc centered at λ (t0 ) and η > 0 such that for all s ∈ [t0 , t0 + η] , A (s) has only simple eigenvalues in D and the only eigenvalues of A (s) which could be in K0 are in D. (This last assertion follows from noting that λ (t0 ) is the only eigenvalue of A (t0 ) in K0 and so the others are at a positive distance from K0 . For s close enough to t0 , the eigenvalues of A (s) are either close to these eigenvalues of A (t0 ) at a positive distance from K0 or they are close to the eigenvalue λ (t0 ) in which case it can be assumed they are in D.) But this shows that t0 is not really an upper bound to S. Therefore, t0 = 1 and the lemma is proved.  With this lemma, the conclusion of the above corollary can be sharpened.



Corollary 14.4.7 Suppose one of the Gerschgorin discs, Di is disjoint from the union of the others. Then Di contains exactly one eigenvalue of A and this eigenvalue is a simple root to the characteristic polynomial of A. Proof: In the proof of Corollary 14.4.5, note that aii is a simple root of A (0) since otherwise the ith Gerschgorin disc would not be disjoint from the others. Also, K, the connected component determined by aii must be contained in Di because it is connected and by Gerschgorin’s theorem above, K ∩ σ (A (t)) must be contained in the union of the Gerschgorin discs. Since all the other eigenvalues of A (0) , the ajj , are outside Di , it follows that K ∩σ (A (0)) = aii . Therefore, by Lemma 14.4.6, K ∩ σ (A (1)) = K ∩ σ (A) consists of a single simple eigenvalue.  Example 14.4.8 Consider the matrix



5 1   1 1 0 1

 0  1  0

The Gerschgorin discs are D (5, 1) , D (1, 2) , and D (0, 1) . Observe D (5, 1) is disjoint from the other discs. Therefore, there should be an eigenvalue in D (5, 1) . The actual eigenvalues are not easy to find. They are the roots of the characteristic equation, t3 − 6t2 + 3t + 5 = 0. The numerical values of these are −. 669 66, 1. 423 1, and 5. 246 55, verifying the predictions of Gerschgorin’s theorem.

14.5 Exercises

1. Explain why it is typically impossible to compute the upper triangular matrix whose existence is guaranteed by Schur’s theorem. 2. Now recall the QR factorization of Theorem 13.3.9 on Page 311. The QR algorithm is a technique which does compute the upper triangular matrix in Schur’s theorem sometimes. There is much more to the QR algorithm than will be presented here. In fact, what I am about to show you is not the way it is done in practice. One first obtains what is called a Hessenburg matrix for which the algorithm will work better. However, the idea is as follows. Start with A an n×n matrix having real eigenvalues. Form A = QR where Q is orthogonal and R is upper triangular. (Right triangular.) This can be done using the technique of Theorem 13.3.9 using Householder matrices. Next take A1 ≡ RQ. Show that A = QA1 QT . In other words these two matrices, A, A1 are similar. Explain why they have the same eigenvalues. Continue by letting A1 play the role of A. Thus the algorithm is of the form An = QRn and An+1 = Rn+1 Q. Explain why A = Qn An QTn for some Qn orthogonal. Thus An is a sequence of matrices each similar to A. The remarkable thing is that often these matrices converge to an upper triangular matrix T and A = QT QT for some orthogonal matrix, the limit of the Qn where the limit means the entries converge. Then the process computes the upper triangular Schur form of the matrix A. Thus the eigenvalues of A appear on the diagonal of T. You will see approximately what these are as the process continues. 3. ↑Try the QR algorithm on

    [ −1  −2 ]
    [  6   6 ]

which has eigenvalues 3 and 2. I suggest you use a computer algebra system to do the computations.

4. ↑Now try the QR algorithm on

    [ 0  −1 ]
    [ 2   0 ]

Show that the algorithm cannot converge for this example. Hint: Try a few iterations of the algorithm. Use a computer algebra system if you like.
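For Problems 2 – 4, the following minimal Python sketch is an illustrative addition (any computer algebra system would serve equally well); it carries out the unshifted QR iteration described in Problem 2.

    import numpy as np

    def qr_algorithm(A, iterations=50):
        # unshifted QR iteration: factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k
        Ak = np.array(A, dtype=float)
        for _ in range(iterations):
            Q, R = np.linalg.qr(Ak)
            Ak = R @ Q              # similar to A_k, hence to the original A
        return Ak

    A = np.array([[-1.0, -2.0], [6.0, 6.0]])
    print(qr_algorithm(A))   # approximately upper triangular with 3 and 2 on the diagonal

    B = np.array([[0.0, -1.0], [2.0, 0.0]])
    print(qr_algorithm(B))   # does not settle down: B has complex eigenvalues, so no real triangular limit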



5. ↑Show the two matrices A ≡ [ 0 −1 ; 4 0 ] and B ≡ [ 0 −2 ; 2 0 ] are similar; that is, there exists a matrix S such that A = S^{−1} BS but there is no orthogonal matrix Q such that Q^T BQ = A. Show the QR algorithm does converge for the matrix B although it fails to do so for A.

6. Let F be an m × n matrix. Show that F ∗ F has all real eigenvalues and furthermore, they are all nonnegative.

7. If A is a real n × n matrix and λ is a complex eigenvalue λ = a + ib, b ̸= 0, of A having eigenvector z + iw, show that w ̸= 0.

8. Suppose A = Q^T DQ where Q is an orthogonal matrix and all the matrices are real. Also D is a diagonal matrix. Show that A must be symmetric.

9. Suppose A is an n × n matrix and there exists a unitary matrix U such that A = U ∗ DU where D is a diagonal matrix. Explain why A must be normal.

10. If A is Hermitian, show that det (A) must be real.

11. Show that every unitary matrix preserves distance. That is, if U is unitary, |U x| = |x| .

12. Show that if a matrix does preserve distances, then it must be unitary.

13. ↑Show that a complex normal matrix A is unitary if and only if its eigenvalues have magnitude equal to 1.

14. Suppose A is an n × n matrix which is diagonally dominant. Recall this means ∑_{j̸=i} |aij | < |aii | . Show A^{−1} must exist.

15. Give some disks in the complex plane whose union contains all the eigenvalues of the matrix

    [ 1 + 2i  4  2 ]
    [   0     i  3 ]
    [   5     6  7 ]

16. Show a square matrix is invertible if and only if it has no zero eigenvalues.

17. Using Schur’s theorem, show the trace of an n × n matrix equals the sum of the eigenvalues and the determinant of an n × n matrix is the product of the eigenvalues.

18. Using Schur’s theorem, show that if A is any complex n × n matrix having eigenvalues {λi } listed according to multiplicity, then ∑_{i,j} |Aij |^2 ≥ ∑_{i=1}^{n} |λi |^2 . Show that equality holds if and only if A is normal. This inequality is called Schur’s inequality. [29]

19. Here is a matrix.

    [ 1234    6      5    3 ]
    [    0  −654     9  123 ]
    [   98   123 10,000  11 ]
    [   56    78     98 400 ]

I know this matrix has an inverse before doing any computations. How do I know?



20. Show the critical points of the following function are (0, −3, 0) , (2, −3, 0) , and (1, −3, −1/3) and classify them as local minima, local maxima or saddle points.
f (x, y, z) = −(3/2) x^4 + 6x^3 − 6x^2 + zx^2 − 2zx − 2y^2 − 12y − 18 − (3/2) z^2 .

21. Here is a function of three variables. f (x, y, z) = 13x^2 + 2xy + 8xz + 13y^2 + 8yz + 10z^2 . Change the variables so that in the new variables there are no mixed terms, terms involving xy, yz etc. Two eigenvalues are 12 and 18.

22. Here is a function of three variables. f (x, y, z) = 2x^2 − 4x + 2 + 9yx − 9y − 3zx + 3z + 5y^2 − 9zy − 7z^2 . Change the variables so that in the new variables there are no mixed terms, terms involving xy, yz etc. The eigenvalues of the matrix which you will work with are −17/2, 19/2, −1.

23. Here is a function of three variables. f (x, y, z) = −x^2 + 2xy + 2xz − y^2 + 2yz − z^2 + x . Change the variables so that in the new variables there are no mixed terms, terms involving xy, yz etc.

24. Show the critical points of the function f (x, y, z) = −2yx^2 − 6yx − 4zx^2 − 12zx + y^2 + 2yz are points of the form
(x, y, z) = ( t, 2t^2 + 6t, −t^2 − 3t )
for t ∈ R and classify them as local minima, local maxima or saddle points.

25. Show the critical points of the function
f (x, y, z) = (1/2) x^4 − 4x^3 + 8x^2 − 3zx^2 + 12zx + 2y^2 + 4y + 2 + (1/2) z^2

are (0, −1, 0) , (4, −1, 0) , and (2, −1, −12) and classify them as local minima, local maxima or saddle points. 26. Let f (x, y) = 3x4 − 24x2 + 48 − yx2 + 4y. Find and classify the critical points using the second derivative test. 27. Let f (x, y) = 3x4 − 5x2 + 2 − y 2 x2 + y 2 . Find and classify the critical points using the second derivative test. 28. Let f (x, y) = 5x4 − 7x2 − 2 − 3y 2 x2 + 11y 2 − 4y 4 . Find and classify the critical points using the second derivative test. 29. Let f (x, y, z) = −2x4 − 3yx2 + 3x2 + 5x2 z + 3y 2 − 6y + 3 − 3zy + 3z + z 2 . Find and classify the critical points using the second derivative test. 30. Let f (x, y, z) = 3yx2 − 3x2 − x2 z − y 2 + 2y − 1 + 3zy − 3z − 3z 2 . Find and classify the critical points using the second derivative test.
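For Problems 20 – 30, the classification can be checked numerically by looking at the signs of the eigenvalues of the Hessian at each critical point. The sketch below is an illustrative addition (finite difference Hessian; Problem 26 is used only as an example, and the two points tested are the solutions of ∇f = 0 for that particular f, found by hand as noted in the comments).

    import numpy as np

    def hessian(f, p, h=1e-4):
        # symmetric finite-difference approximation of the Hessian of f at the point p
        p = np.asarray(p, dtype=float)
        n = p.size
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
                H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                           - f(p - ei + ej) + f(p - ei - ej)) / (4 * h * h)
        return H

    # Problem 26: f(x, y) = 3x^4 - 24x^2 + 48 - y x^2 + 4y.
    # f_y = 4 - x^2 = 0 and f_x = 12x^3 - 48x - 2xy = 0 give the critical points (2, 0), (-2, 0).
    f = lambda v: 3*v[0]**4 - 24*v[0]**2 + 48 - v[1]*v[0]**2 + 4*v[1]
    for p in [(2.0, 0.0), (-2.0, 0.0)]:
        print(p, np.linalg.eigvalsh(hessian(f, p)))
    # all eigenvalues > 0: local minimum; all < 0: local maximum; mixed signs: saddle point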



31. Let Q be orthogonal. Find the possible values of det (Q) . 32. Let U be unitary. Find the possible values of det (U ) . 33. If a matrix is nonzero can it have only zero for eigenvalues? 34. A matrix A is called nilpotent if Ak = 0 for some positive integer k. Suppose A is a nilpotent matrix. Show it has only 0 for an eigenvalue. 35. If A is a nonzero nilpotent matrix, show it must be defective. 36. Suppose A is a nondefective n × n matrix and its eigenvalues are all either 0 or 1. Show A2 = A. Could you say anything interesting if the eigenvalues were all either 0,1,or −1? By DeMoivre’s theorem, an nth root of unity is of the form ( ( ) ( )) 2kπ 2kπ cos + i sin n n Could you generalize the sort of thing just described to get An = A? Hint: Since A is nondefective, there exists S such that S −1 AS = D where D is a diagonal matrix. 37. This and the following problems will present most of a differential equations course. Most of the explanations are given. You fill in any details needed. To begin with, consider the scalar initial value problem y ′ = ay, y (t0 ) = y0 When a is real, show the unique solution to this problem is y = y0 ea(t−t0 ) . Next suppose y ′ = (a + ib) y, y (t0 ) = y0

(14.6)

where y (t) = u (t) + iv (t) . Show there exists a unique solution and it is given by y (t) = y0 ea(t−t0 ) (cos b (t − t0 ) + i sin b (t − t0 )) ≡ e(a+ib)(t−t0 ) y0 .

(14.7)

Next show that for a real or complex there exists a unique solution to the initial value problem y ′ = ay + f, y (t0 ) = y0 and it is given by



t

y (t) = ea(t−t0 ) y0 + eat

e−as f (s) ds.

t0

Hint: For the first part write as y ′ − ay = 0 and multiply both sides by e−at . Then explain why you get ) d ( −at e y (t) = 0, y (t0 ) = 0. dt Now you finish the argument. To show uniqueness in the second part, suppose y ′ = (a + ib) y, y (t0 ) = 0 and verify this requires y (t) = 0. To do this, note y ′ = (a − ib) y, y (t0 ) = 0 2

and that |y| (t0 ) = 0 and d 2 |y (t)| = y ′ (t) y (t) + y ′ (t) y (t) dt 2

= (a + ib) y (t) y (t) + (a − ib) y (t) y (t) = 2a |y (t)| . Thus from the first part |y (t)| = 0e−2at = 0. Finally observe by a simple computation that 14.6 is solved by 14.7. For the last part, write the equation as 2

y ′ − ay = f and multiply both sides by e−at and then integrate from t0 to t using the initial condition.



38. Now consider A an n × n matrix. By Schur’s theorem there exists unitary Q such that Q−1 AQ = T where T is upper triangular. Now consider the first order initial value problem x′ = Ax, x (t0 ) = x0 . Show there exists a unique solution to this first order system. Hint: Let y = Q−1 x and so the system becomes y′ = T y, y (t0 ) = Q−1 x0 (14.8) T

Now letting y = (y1 , · · · , yn ) , the bottom equation becomes ( ) yn′ = tnn yn , yn (t0 ) = Q−1 x0 n . Then use the solution you get in this to get the solution to the initial value problem which occurs one level up, namely ( ) ′ yn−1 = t(n−1)(n−1) yn−1 + t(n−1)n yn , yn−1 (t0 ) = Q−1 x0 n−1 Continue doing this to obtain a unique solution to 14.8. 39. Now suppose Φ (t) is an n × n matrix of the form ( Φ (t) = x1 (t) · · · where Explain why

) (14.9)

xn (t)

x′k (t) = Axk (t) . Φ′ (t) = AΦ (t)

if and only if Φ (t) is given in the form of 14.9. Also explain why if c ∈ Fn , y (t) ≡ Φ (t) c solves the equation y′ (t) = Ay (t) . 40. In the above problem, consider the question whether all solutions to x′ = Ax

(14.10)

are obtained in the form Φ (t) c for some choice of c ∈ Fn . In other words, is the general solution to this equation Φ (t) c for c ∈ Fn ? Prove the following theorem using linear algebra. Theorem 14.5.1 Suppose Φ (t) is an n × n matrix which satisfies Φ′ (t) = AΦ (t) . Then the −1 general solution to 14.10 is Φ (t) c if and only if Φ (t) exists for some t. Furthermore, if −1 −1 Φ′ (t) = AΦ (t) , then either Φ (t) exists for all t or Φ (t) never exists for any t. (det (Φ (t)) is called the Wronskian and this theorem is sometimes called the Wronskian alternative.) Hint: Suppose first the general solution is of the form Φ (t) c where c is an arbitrary constant −1 −1 vector in Fn . You need to verify Φ (t) exists for some t. In fact, show Φ (t) exists for every −1 t. Suppose then that Φ (t0 ) does not exist. Explain why there exists c ∈ Fn such that there is no solution x to the equation c = Φ (t0 ) x. By the existence part of Problem 38 there exists a solution to x′ = Ax, x (t0 ) = c but this cannot be in the form Φ (t) c. Thus for every t, Φ (t) −1 t0 , Φ (t0 ) exists. Let z′ = Az and choose c such that z (t0 ) = Φ (t0 ) c

−1

exists. Next suppose for some

14.5. EXERCISES

337

Then both z (t) , Φ (t) c solve

x′ = Ax, x (t0 ) = z (t0 )

Apply uniqueness to conclude z = Φ (t) c. Finally, consider that Φ (t) c for c ∈ Fn either is the −1 general solution or it is not the general solution. If it is, then Φ (t) exists for all t. If it is −1 not, then Φ (t) cannot exist for any t from what was just shown. −1

41. Let Φ′ (t) = AΦ (t) . Then Φ (t) is called a fundamental matrix if Φ (t) there exists a unique solution to the equation

exists for all t. Show

x′ = Ax + f , x (t0 ) = x0

(14.11)

and it is given by the formula −1

x (t) = Φ (t) Φ (t0 )



t

x0 + Φ (t)

Φ (s)

−1

f (s) ds

t0

Now these few problems have done virtually everything of significance in an entire undergraduate differential equations course, illustrating the superiority of linear algebra. The above formula is called the variation of constants formula. Hint: Uniquenss is easy. If x1 , x2 are two solutions then let u (t) = x1 (t) − x2 (t) and argue u′ = Au, u (t0 ) = 0. Then use Problem 38. To verify there exists a solution, you could just differentiate the above formula using the fundamental theorem of calculus and verify it works. Another way is to assume the solution in the form x (t) = Φ (t) c (t) and find c (t) to make it all work out. This is called the method of variation of parameters. −1

42. Show there exists a special Φ such that Φ′ (t) = AΦ (t) , Φ (0) = I, and suppose Φ (t) for all t. Show using uniqueness that

exists

−1

Φ (−t) = Φ (t) and that for all t, s ∈ R

Φ (t + s) = Φ (t) Φ (s) Explain why with this special Φ, the solution to 14.11 can be written as ∫ t x (t) = Φ (t − t0 ) x0 + Φ (t − s) f (s) ds. t0

Hint: Let Φ (t) be such that the j th column is xj (t) where x′j = Axj , xj (0) = ej . Use uniqueness as required. 43. You can see more on this problem and the next one in the latest version of Horn and Johnson, [23]. Two n × n matrices A, B are said to be congruent if there is an invertible P such that B = P AP ∗ Let A be a Hermitian matrix. Thus it has all real eigenvalues. Let n+ be the number of positive eigenvalues, n− , the number of negative eigenvalues and n0 the number of zero eigenvalues. For k a positive integer, let Ik denote the k × k identity matrix and Ok the k × k zero matrix. Then the inertia matrix of A is the following block diagonal n × n matrix.   I n+   I n−   On0

338

CHAPTER 14. MATRICES AND THE INNER PRODUCT Show that A is congruent to its inertia matrix. Next show that congruence is an equivalence relation on the set of Hermitian matrices. Finally, show that if two Hermitian matrices have the same inertia matrix, then they must be congruent. Hint: First recall that there is a unitary matrix, U such that   Dn+   U ∗ AU =  Dn−  On0 where the Dn+ is a diagonal matrix having the positive eigenvalues of A, Dn− being defined similarly. Now let Dn− denote the diagonal matrix which replaces each entry of Dn− with its absolute value. Consider the two diagonal matrices  −1/2  D n+   Dn− −1/2 D = D∗ =   In0 Now consider D∗ U ∗ AU D.

44. Show that if A, B are two congruent Hermitian matrices, then they have the same inertia matrix. Hint: Let A = SBS ∗ where S is invertible. Show that A, B have the same rank and this implies that they are each unitarily similar to a diagonal matrix which has the same number of zero entries on the main diagonal. Therefore, letting VA be the span of the eigenvectors associated with positive eigenvalues of A and VB being defined similarly, it suffices to show that these have the same dimensions. Show that (Ax, x) > 0 for all x ∈ VA . Next consider S ∗ VA . For x ∈ VA , explain why ( ) −1 (BS ∗ x,S ∗ x) = S −1 A (S ∗ ) S ∗ x,S ∗ x ) ( ) ( ( )∗ = S −1 Ax,S ∗ x = Ax, S −1 S ∗ x = (Ax, x) > 0 Next explain why this shows that S ∗ VA is a subspace of VB and so the dimension of VB is at least as large as the dimension of VA . Hence there are at least as many positive eigenvalues for B as there are for A. Switching A, B you can turn the inequality around. Thus the two have the same inertia matrix. 45. Let A be an m × n matrix. Then if you unraveled it, you could consider it as a vector in Cnm . The Frobenius inner product on the vector space of m × n matrices is defined as (A, B) ≡ trace (AB ∗ ) Show that this really does satisfy the axioms of an inner product space and that it also amounts to nothing more than considering m × n matrices as vectors in Cnm . 46. ↑Consider the n × n unitary matrices. Show that whenever U is such a matrix, it follows that √ |U |Cnn = n Next explain why if {Uk } is any sequence of unitary matrices, there exists a subsequence ∞ {Ukm }m=1 such that limm→∞ Ukm = U where U is unitary. Here the limit takes place in the sense that the entries of Ukm converge to the corresponding entries of U . 47. ↑Let A, B be two n × n matrices. Denote by σ (A) the set of eigenvalues of A. Define dist (σ (A) , σ (B)) = max min {|λ − µ| : µ ∈ σ (B)} λ∈σ(A)

14.6. CAUCHY’S INTERLACING THEOREM FOR EIGENVALUES

339

Explain why dist (σ (A) , σ (B)) is small if and only if every eigenvalue of A is close to some eigenvalue of B. Now prove the following theorem using the above problem and Schur’s theorem. This theorem says roughly that if A is close to B then the eigenvalues of A are close to those of B in the sense that every eigenvalue of A is close to an eigenvalue of B. This is a very important observation when you try to approximate eigenvalues using the QR algorithm. Theorem 14.5.2 Suppose limk→∞ Ak = A. Then lim dist (σ (Ak ) , σ (A)) = 0

k→∞

(

) a b 48. Let A = be a 2 × 2 matrix which is not a multiple of the identity. Show that A is c d similar to a 2 × 2 matrix which has at least one diagonal entry equal to 0. Hint: First note that there exists a vector a such that Aa is not a multiple of a. Then consider ( B=

)−1 a

Aa

( A

) a

Aa

Show B has a zero on the main diagonal. 49. ↑ Let A be a complex n × n matrix which has trace equal to 0. Show that A is similar to a matrix which has all zeros on the main diagonal. Hint: Use Problem 39 on Page 84 to argue that you can say that a given matrix is similar to one which has the diagonal entries permuted in any order desired. Then use the above problem and block multiplication to show that if the A has k nonzero entries, then it is similar to a matrix which has k − 1 nonzero entries. Finally, when A is similar to one which has at most one nonzero entry, this one must also be zero because of the condition on the trace. 50. ↑An n × n matrix X is a commutator if there are n × n matrices A, B such that X = AB − BA. Show that the trace of any commutator is 0. Next show that if a complex matrix X has trace equal to 0, then it is in fact a commutator. Hint: Use the above problem to show that it suffices to consider X having all zero entries on the main diagonal. Then define   1 0 { X   ij 2   i−j if i ̸= j  , Bij = A= ..   0 if i = j .   0

14.6

n

Cauchy’s Interlacing Theorem for Eigenvalues

Recall that every Hermitian matrix has all real eigenvalues. The Cauchy interlacing theorem compares the location of the eigenvalues of a Hermitian matrix with the eigenvalues of a principal submatrix. It is an extremely interesting theorem. Theorem 14.6.1 Let A be a Hermitian n × n matrix and let ( ) a y∗ A= y B where B is (n − 1) × (n − 1) . Let the eigenvalues of B be µ1 ≤ µ2 ≤ · · · ≤ µn−1 . Then if the eigenvalues of A are λ1 ≤ λ2 ≤ · · · ≤ λn , it follows that λ1 ≤ µ1 ≤ λ2 ≤ µ2 ≤ · · · ≤ µn−1 ≤ λn .

340

CHAPTER 14. MATRICES AND THE INNER PRODUCT Proof: First note that B is Hermitian because ( ) ( ∗ a y a A∗ = =A= y B∗ y

y∗ B

)

It is easiest to consider the case where strict inequality holds for the eigenvalues for B so first is an outline of reducing to this case. There exists U unitary, depending on B such that U ∗ BU = D where   µ1 0   ..  D= .   0 µn−1 Now let {εk } be a decreasing sequence of very small positive numbers converging to 0 and let Bk be defined by   µ1 + εk 0   µ2 + 2εk   ∗   U Bk U = Dk , Dk ≡  ..  .   0 µn−1 + (n − 1) εk where U is the above unitary matrix. Thus the eigenvalues of Bk , µ ˆ1 < · · · < µ ˆ n−1 are strictly increasing and µ ˆ j ≡ µj + jεk . Let Ak be given by ( Ak = Then

(

1 0∗ 0 U∗

)

( Ak

1 0∗ 0 U

)

( = ( =

1 0∗ 0 U∗ a U ∗y

y∗ Bk

a y

)

)(

)( ) a y∗ 1 0∗ y Bk 0 U )( ) ( y∗ 1 0∗ a = U ∗ Bk 0 U U ∗y

y∗ U Dk

)

We can replace y in the statement of the theorem with yk such that limk→∞ yk = y but zk ≡ U ∗ yk has the property that each component of zk is nonzero. This will probably take place automatically but if not, make the change. This makes a change in Ak but still limk→∞ Ak = A. The main part of this argument which follows has to do with fixed k. Expanding det (λI − Ak ) along the top row, the characteristic polynomial for Ak is then q (λ) = (λ − a)

n−1 ∏

(λ − µ ˆi) −

i=1

n−1 ∑

( ) 2 |zi | (λ − µ ˆ 1 ) · · · (λ\ −µ ˆi) · · · λ − µ ˆ n−1

(14.12)

i=2

∏n−1 where (λ\ −µ ˆ i ) indicates that this factor is omitted from the product i=1 (λ − µ ˆ i ) . To see why this is so, consider the case where Bk is 3 × 3. In this case, you would have (

1 0T 0 U∗

(

) (λI − Ak )

1 0T 0 U

)

   = 

λ−a z1 z2 z3

z1 λ−µ ˆ1 0 0

z2 0 λ−µ ˆ2 0

z3 0 0 λ−µ ˆ3

    

14.6. CAUCHY’S INTERLACING THEOREM FOR EIGENVALUES

341

In general, you would have an n × n matrix on the right with the same appearance. Then expanding as indicated, the determinant is   z 0 0 1 3 ∏   (λ − µ ˆ i ) − z 1 det  z2 λ − µ (λ − a) ˆ2 0  i=1 z3 0 λ−µ ˆ3     z1 λ − µ ˆ1 0 z1 λ − µ ˆ1 0     +z 2 det  z2 0 0 0 λ−µ ˆ2   − z 3 det  z2 z3 0 λ−µ ˆ3 z3 0 0 ( ) 3 2 2 ∏ |z1 | (λ − µ ˆ 2 ) (λ − µ ˆ 3 ) + |z2 | (λ − µ ˆ 1 ) (λ − µ ˆ3) = (λ − a) (λ − µ ˆi) − 2 + |z3 | (λ − µ ˆ 1 ) (λ − µ ˆ2) i=1 Notice how, when you expand the 3 × 3 determinants along the first column, you have only one non-zero term and the sign is adjusted to give the above claim. Clearly, it works the same for ( ) any size matrix. Since the µ ˆ i are strictly increasing in i, it follows from 14.12 that( q (ˆ µ)i ) q µ ˆ i+1 ≤ 0. However, since each |zi | ̸= 0, none of the q (ˆ µi ) can equal 0 and so q (ˆ µi ) q µ ˆ i+1 < 0. Hence, from the (intermediate value theorem of calculus, there is a root of q (λ) in each of the disjoint open ) intervals µ ˆi, µ ˆ i+1 . There are n − 2 of these intervals and so this accounts for n − 2 roots of q (λ). q (λ) = (λ − a)

n−1 ∏ i=1

(λ − µ ˆi) −

n−1 ∑

( ) 2 |zi | (λ − µ ˆ 1 ) · · · (λ\ −µ ˆi) · · · λ − µ ˆ n−1

i=2

( ) n−3 What of q (ˆ µ1 )? Its sign is the same as (−1) and also q µ ˆ n−1 < 0 . Therefore, there is a root to q (λ) which is larger than µ ˆ n−1 . Indeed, limλ→∞ q (λ) = ∞ so there exists a root of q (λ) strictly larger than µ ˆ n−1 . This accounts for n − 1 roots of q (λ) . Now consider q (ˆ µ1 ) . Suppose first that n is odd. Then you have q (ˆ µ1 ) > 0. Hence, there is a root of q (λ) which is no larger than µ ˆ 1 because in this case, limλ→−∞ q (λ) = −∞. If n is even, then q (ˆ µ1 ) < 0 and so there is a root of q (λ) which is smaller than µ ˆ 1 because in this case, limλ→−∞ q (λ) = ∞. This accounts for all roots of q (λ). Hence, if the roots of q (λ) are λ1 ≤ λ2 ≤ · · · ≤ λn , it follows that λ1 < µ ˆ 1 < λ2 < µ ˆ2 < · · · < µ ˆ n−1 < λn To get the complete result, simply take the limit as k → ∞. Then limk→∞ µ ˆ k = µk and Ak → A and so the eigenvalues of Ak converge to the corresponding eigenvalues of A (See Problem 47 on Page 338), and so, passing to the limit, gives the desired result in which it may be necessary to replace < with ≤.  Definition 14.6.2 Let A be an n × n matrix. An (n − r) × (n − r) matrix is called a principal submatrix of A if it is obtained by deleting from A the rows i1 , i2 , · · · , ir and the columns i1 , i2 , · · · , ir . Now the Cauchy interlacing theorem is really the following corollary. Corollary 14.6.3 Let A be an n × n Hermitian matrix and let B be an (n − 1) × (n − 1) principal submatrix. Then the interlacing inequality holds λ1 ≤ µ1 ≤ λ2 ≤ µ2 ≤ · · · ≤ µn−1 ≤ λn where the µi are the eigenvalues of B listed in increasing order and the λi are the eigenvalues of A listed in increasing order. Proof: Suppose B is obtained from A by deleting the ith row and the ith column. Then let P be the permutation matrix which switches the ith row with the first row. It is an orthogonal th matrix and so its inverse is its transpose. The transpose switches ( ) the i column with the first ∗ a y column. See Problem 40 on Page 84. Thus P AP T = and it follows that the result of y B the multiplication is indeed as shown, a Hermitian matrix because P, P T are orthogonal matrices. Now the conclusion of the corollary follows from Theorem 14.6.1. 

342

14.7

CHAPTER 14. MATRICES AND THE INNER PRODUCT

The Right Polar Factorization∗

The right polar factorization involves writing a matrix as a product of two other matrices, one which preserves distances and the other which stretches and distorts. This is of fundamental significance in geometric measure theory and also in continuum mechanics. Not surprisingly the stress should depend on the part which stretches and distorts. See [17]. First here are some lemmas which review and add to many of the topics discussed so far about adjoints and orthonormal sets and such things. Lemma 14.7.1 Let A be a Hermitian matrix such that all its eigenvalues are nonnegative. Then ( )2 there exists a Hermitian matrix A1/2 such that A1/2 has all nonnegative eigenvalues and A1/2 = A. Proof: Since A is Hermitian, there exists a diagonal matrix D having all real nonnegative entries and a unitary matrix U such that A = U ∗ DU. Then denote by D1/2 the matrix which is obtained by replacing each diagonal entry of D with its square root. Thus D1/2 D1/2 = D. Then define A1/2 ≡ U ∗ D1/2 U. Then

Since D1/2 is real,

( (

A1/2

)2

= U ∗ D1/2 U U ∗ D1/2 U = U ∗ DU = A.

U ∗ D1/2 U

)∗

( )∗ ∗ = U ∗ D1/2 (U ∗ ) = U ∗ D1/2 U

so A1/2 is Hermitian.  In fact this square root is unique. This is shown a little later after the main result of this section. Next it is helpful to recall the Gram Schmidt algorithm and observe a certain property stated in the next lemma. Lemma 14.7.2 Suppose {w1 , · · · , wr , vr+1 , · · · , vp } is a linearly independent set of vectors such that {w1 , · · · , wr } is an orthonormal set of vectors. Then when the Gram Schmidt process is applied to the vectors in the given order, it will not change any of the w1 , · · · , wr . Proof: Let {u1 , · · · , up } be the orthonormal set delivered by the Gram Schmidt process. Then u1 = w1 because by definition, u1 ≡ w1 / |w1 | = w1 . Now suppose uj = wj for all j ≤ k ≤ r. Then if k < r, consider the definition of uk+1 . ∑k+1 wk+1 − j=1 (wk+1 , uj ) uj uk+1 ≡ ∑k+1 wk+1 − j=1 (wk+1 , uj ) uj By induction, uj = wj and so this reduces to wk+1 / |wk+1 | = wk+1 since |wk+1 | = 1.  This lemma immediately implies the following lemma. Lemma 14.7.3 Let V be a subspace of dimension p and let {w1 , · · · , wr } be an orthonormal set of vectors in V . Then this orthonormal set of vectors may be extended to an orthonormal basis for V, {w1 , · · · , wr , yr+1 , · · · , yp } Proof: First extend the given linearly independent set {w1 , · · · , wr } to a basis for V and then apply the Gram Schmidt theorem to the resulting basis. Since {w1 , · · · , wr } is orthonormal it follows from Lemma 14.7.2 the result is of the desired form, an orthonormal basis extending {w1 , · · · , wr }.  Recall Lemma 13.3.5 which is about preserving distances. It is restated here in the case of an m × n matrix.



Lemma 14.7.4 Suppose R is an m × n matrix with m ≥ n and R preserves distances. Then R^*R = I.

With this preparation, here is the big theorem about the right polar factorization.

Theorem 14.7.5 Let F be an m × n matrix where m ≥ n. Then there exists a Hermitian n × n matrix U which has all nonnegative eigenvalues and an m × n matrix R which preserves distances and satisfies R^*R = I such that F = RU.

Proof: Consider F^*F. This is a Hermitian matrix because

(F^*F)^* = F^*(F^*)^* = F^*F

Also the eigenvalues of the n × n matrix F^*F are all nonnegative. This is because if λ is an eigenvalue and x a corresponding eigenvector, then

λ(x, x) = (F^*F x, x) = (F x, F x) ≥ 0.

Therefore, by Lemma 14.7.1, there exists an n × n Hermitian matrix U having all nonnegative eigenvalues such that U^2 = F^*F. Consider the subspace U(F^n). Let {U x_1, · · · , U x_r} be an orthonormal basis for U(F^n) ⊆ F^n. Note that U(F^n) might not be all of F^n. Using Lemma 14.7.3, extend to an orthonormal basis for all of F^n,

{U x_1, · · · , U x_r, y_{r+1}, · · · , y_n}.

Next observe that {F x_1, · · · , F x_r} is also an orthonormal set of vectors in F^m. This is because

(F x_k, F x_j) = (F^*F x_k, x_j) = (U^2 x_k, x_j) = (U x_k, U^* x_j) = (U x_k, U x_j) = δ_{jk}

Therefore, from Lemma 14.7.3 again, this orthonormal set of vectors can be extended to an orthonormal basis for Fm , {F x1 , · · · , F xr , zr+1 , · · · , zm } Thus there are at least as many zk as there are yj because m ≥ n. Now for x ∈ Fn , since {U x1 , · · · , U xr , yr+1 , · · · , yn } is an orthonormal basis for Fn , there exist unique scalars, c1 · · · , cr , dr+1 , · · · , dn such that x=

x = ∑_{k=1}^{r} c_k U x_k + ∑_{k=r+1}^{n} d_k y_k

Define

R x ≡ ∑_{k=1}^{r} c_k F x_k + ∑_{k=r+1}^{n} d_k z_k    (14.13)

Thus, since {F x_1, · · · , F x_r, z_{r+1}, · · · , z_n} is orthonormal,

|R x|^2 = ∑_{k=1}^{r} |c_k|^2 + ∑_{k=r+1}^{n} |d_k|^2 = |x|^2



and so it follows from Corollary 13.3.8 or Lemma 14.7.4 that R^*R = I. Then also there exist scalars b_k such that

U x = ∑_{k=1}^{r} b_k U x_k    (14.14)

and so from 14.13,

R U x = ∑_{k=1}^{r} b_k F x_k = F ( ∑_{k=1}^{r} b_k x_k )

Is F( ∑_{k=1}^{r} b_k x_k ) = F(x)? Using 14.14,

( F(∑_{k=1}^{r} b_k x_k) − F(x), F(∑_{k=1}^{r} b_k x_k) − F(x) )
 = ( (F^*F)(∑_{k=1}^{r} b_k x_k − x), ∑_{k=1}^{r} b_k x_k − x )
 = ( U^2(∑_{k=1}^{r} b_k x_k − x), ∑_{k=1}^{r} b_k x_k − x )
 = ( U(∑_{k=1}^{r} b_k x_k − x), U(∑_{k=1}^{r} b_k x_k − x) )
 = ( ∑_{k=1}^{r} b_k U x_k − U x, ∑_{k=1}^{r} b_k U x_k − U x ) = 0

Therefore, F( ∑_{k=1}^{r} b_k x_k ) = F(x) and this shows R U x = F x. 

Note that U^2 is completely determined by F because F^*F = U R^*R U = U^2. In fact, U is also uniquely determined. This will be shown later in Theorem 14.8.1. First is an easy corollary of this theorem.

Corollary 14.7.6 Let F be m × n and suppose n ≥ m. Then there exist a Hermitian U and an R such that F = U R, R R^* = I.

Proof: Recall that L^{**} = L and (M L)^* = L^* M^*. Now apply Theorem 14.7.5 to F^*. Thus F^* = R^* U where R^* and U satisfy the conditions of that theorem. In particular R^* preserves distances. Then F = U R and R R^* = R^{**} R^* = I. 
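In floating point practice the factorization of Theorem 14.7.5 is usually obtained from the eigendecomposition of F^*F (or from the singular value decomposition) rather than from the construction in the proof. Here is a small sketch, assuming NumPy; the function name and the pseudoinverse step are mine, and the pseudoinverse shortcut is only safe when F has full column rank.

```python
# Right polar factorization F = R U (Theorem 14.7.5) for an m x n matrix,
# m >= n, via the Hermitian square root of F*F.  Sketch assuming NumPy;
# for rank deficient F the pinv step below needs more care.
import numpy as np

def right_polar(F):
    G = F.conj().T @ F                      # F*F, Hermitian and nonnegative
    w, V = np.linalg.eigh(G)                # G = V diag(w) V*
    U = (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T   # Hermitian sqrt
    R = F @ np.linalg.pinv(U)               # R = F U^{-1} on the range of U
    return R, U

rng = np.random.default_rng(2)
F = rng.standard_normal((5, 3))
R, U = right_polar(F)
assert np.allclose(R @ U, F)
assert np.allclose(R.T @ R, np.eye(3))      # R preserves distances: R*R = I
assert np.allclose(U, U.T) and np.linalg.eigvalsh(U).min() >= -1e-12
```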

14.8 The Square Root

Now here is a uniqueness and existence theorem for the square root. It follows from this theorem that U in the above right polar decomposition of Theorem 14.7.5 is unique. Theorem 14.8.1 Let A be a self adjoint and nonnegative n × n matrix (all eigenvalues are nonnegative). Then there exists a unique self adjoint nonnegative matrix B such that B 2 = A. Proof: Suppose B 2 = A where B is such a Hermitian square root for A with nonnegative eigenvalues. Then by Theorem 14.1.6, B has an orthonormal basis for Fn of eigenvectors {u1 , · · · , un } . Bui = µi ui Thus B=

∑_i µ_i u_i u_i^*



because both linear transformations agree on the orthonormal basis. But this implies that

A u_i = B^2 u_i = µ_i^2 u_i

Thus these are also an orthonormal basis of eigenvectors for A. Hence, letting λ_i = µ_i^2,

A = ∑_i λ_i u_i u_i^*,   B = ∑_i λ_i^{1/2} u_i u_i^*

Let p(λ) be a polynomial such that p(λ_i) = λ_i^{1/2}. Say p(λ) = a_0 + a_1 λ + · · · + a_p λ^p. Then

A^m = ( ∑_i λ_i u_i u_i^* )^m = ∑_{i_1,···,i_m} λ_{i_1} u_{i_1} u_{i_1}^* λ_{i_2} u_{i_2} u_{i_2}^* · · · λ_{i_m} u_{i_m} u_{i_m}^*    (14.15)
 = ∑_{i_1,···,i_m} λ_{i_1} λ_{i_2} · · · λ_{i_m} u_{i_1} u_{i_1}^* u_{i_2} u_{i_2}^* · · · u_{i_m} u_{i_m}^*
 = ∑_{i_1,···,i_m} λ_{i_1} λ_{i_2} · · · λ_{i_m} u_{i_1} u_{i_m}^* δ_{i_1 i_2} δ_{i_2 i_3} · · · δ_{i_{m−1} i_m}
 = ∑_{i_1,···,i_{m−1}} λ_{i_1} λ_{i_2} · · · λ_{i_{m−1}}^2 u_{i_1} u_{i_{m−1}}^* δ_{i_1 i_2} δ_{i_2 i_3} · · · δ_{i_{m−2} i_{m−1}}
 ⋮
 = ∑_{i_1} λ_{i_1}^m u_{i_1} u_{i_1}^* = ∑_i λ_i^m u_i u_i^*    (14.16)

Therefore,

p(A) = a_0 I + a_1 A + · · · + a_p A^p = a_0 ∑_i u_i u_i^* + a_1 ∑_i λ_i u_i u_i^* + · · · + a_p ∑_i λ_i^p u_i u_i^*
 = ∑_i p(λ_i) u_i u_i^* = ∑_i λ_i^{1/2} u_i u_i^* = B    (14.17)

and so B commutes with every matrix which commutes with A. To see this, suppose CA = AC; then

BC = p(A) C = C p(A) = CB

This shows that if B is such a square root, then it commutes with every matrix C which commutes with A. It also shows, by a repeat of the argument 14.15 - 14.16, that B^2 = A. Could there be another such Hermitian square root which has all nonnegative eigenvalues? It was just shown that any such square root commutes with every matrix which commutes with A. Suppose B_1 is another square root which is self adjoint and has nonnegative eigenvalues. Since both B, B_1 are nonnegative,

(B (B − B_1) x, (B − B_1) x) ≥ 0,  (B_1 (B − B_1) x, (B − B_1) x) ≥ 0    (14.18)

Now, adding these together, and using the fact that the two commute because they both commute with every matrix which commutes with A,

((B + B_1)(B − B_1) x, (B − B_1) x) ≥ 0

while also

((B + B_1)(B − B_1) x, (B − B_1) x) = ((B^2 − B_1^2) x, (B − B_1) x) = ((A − A) x, (B − B_1) x) = 0.



It follows that both inner products in 14.18 equal 0. Next use the existence part shown above to take the square roots of B and B_1, denoted by √B, √B_1 respectively. Then

0 = ( √B (B − B_1) x, √B (B − B_1) x )
0 = ( √B_1 (B − B_1) x, √B_1 (B − B_1) x )

which implies √B (B − B_1) x = √B_1 (B − B_1) x = 0. Thus also

B (B − B_1) x = B_1 (B − B_1) x = 0

Hence

0 = (B (B − B_1) x − B_1 (B − B_1) x, x) = ((B − B_1) x, (B − B_1) x)

and so, since x is arbitrary, B_1 = B. 

Corollary 14.8.2 The U in Theorem 14.7.5 is unique.
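Computationally, the square root of Theorem 14.8.1 is obtained exactly as in the existence proof: diagonalize and take square roots of the eigenvalues. A short sketch, assuming NumPy; the function name is mine.

```python
# Unique nonnegative square root of a nonnegative Hermitian matrix
# (Theorem 14.8.1), from the spectral decomposition.  Assumes NumPy.
import numpy as np

def sqrtm_psd(A):
    lam, U = np.linalg.eigh(A)              # A = U diag(lam) U*
    lam = np.clip(lam, 0.0, None)           # clip tiny negative round-off
    return (U * np.sqrt(lam)) @ U.conj().T

rng = np.random.default_rng(3)
C = rng.standard_normal((4, 4))
A = C @ C.T                                 # symmetric and nonnegative
B = sqrtm_psd(A)
assert np.allclose(B @ B, A)
assert np.allclose(B, B.conj().T)           # B is Hermitian
assert np.linalg.eigvalsh(B).min() >= -1e-12
```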

14.9 An Application To Statistics

A random vector is a function X : Ω → R^p where Ω is a probability space. This means that there exists a σ algebra of measurable sets F and a probability measure P : F → [0, 1]. In practice, people often don't worry too much about the underlying probability space and instead pay more attention to the distribution measure of the random variable. For E a suitable subset of R^p, this measure gives the probability that X has values in E. There are often excellent reasons for believing that a random vector is normally distributed. This means that the probability that X has values in a set E is given by

∫_E ( 1 / ( (2π)^{p/2} det(Σ)^{1/2} ) ) exp( −(1/2) (x − m)^* Σ^{−1} (x − m) ) dx

The expression in the integral is called the normal probability density function. There are two parameters, m and Σ, where m is called the mean and Σ is called the covariance matrix. It is a symmetric matrix which has all real eigenvalues which are all positive. While it may be reasonable to assume this is the distribution, in general, you won't know m and Σ, and in order to use this formula to predict anything, you would need to know these quantities. I am following a nice discussion given in Wikipedia which makes use of the existence of square roots. What people do to estimate m and Σ is to take n independent observations x_1, · · · , x_n and try to predict what m and Σ should be based on these observations. One criterion used for making this determination is the method of maximum likelihood. In this method, you seek to choose the two parameters in such a way as to maximize the likelihood, which is given as

∏_{i=1}^{n} ( 1 / det(Σ)^{1/2} ) exp( −(1/2) (x_i − m)^* Σ^{−1} (x_i − m) ).

For convenience the term (2π)^{p/2} was ignored. Maximizing the above is equivalent to maximizing the ln of the above. So taking ln, the quantity to maximize is

(n/2) ln det(Σ^{−1}) − (1/2) ∑_{i=1}^{n} (x_i − m)^* Σ^{−1} (x_i − m)

Note that the above is a function of the entries of m. Take the partial derivative with respect to m_l. Since the matrix Σ^{−1} is symmetric, this implies

∑_{i=1}^{n} ∑_r (x_{ir} − m_r) Σ^{−1}_{rl} = 0 for each l.

Written in terms of vectors,

∑_{i=1}^{n} (x_i − m)^* Σ^{−1} = 0

and so, multiplying by Σ on the right and then taking adjoints, this yields

∑_{i=1}^{n} (x_i − m) = 0,  n m = ∑_{i=1}^{n} x_i,  m = (1/n) ∑_{i=1}^{n} x_i ≡ x̄.



Now that m is determined, it remains to find the best estimate for Σ. Since (x_i − m)^* Σ^{−1} (x_i − m) is a scalar and trace(AB) = trace(BA),

(x_i − m)^* Σ^{−1} (x_i − m) = trace( (x_i − m)^* Σ^{−1} (x_i − m) ) = trace( (x_i − m)(x_i − m)^* Σ^{−1} )

Therefore, the thing to maximize is

n ln det(Σ^{−1}) − ∑_{i=1}^{n} trace( (x_i − m)(x_i − m)^* Σ^{−1} ) = n ln det(Σ^{−1}) − trace( S Σ^{−1} ),  where S ≡ ∑_{i=1}^{n} (x_i − m)(x_i − m)^*.

We assume that S has rank p. Thus it is a self adjoint matrix which has all positive eigenvalues. Therefore, from the property of the trace, trace(AB) = trace(BA), the thing to maximize is

n ln det(Σ^{−1}) − trace( S^{1/2} Σ^{−1} S^{1/2} )

Now let B = S^{1/2} Σ^{−1} S^{1/2}. Then B is positive and self adjoint also, and so there exists U unitary such that B = U^* D U where D is the diagonal matrix having the positive scalars λ_1, · · · , λ_p down the main diagonal. Solving for Σ^{−1} in terms of B yields S^{−1/2} B S^{−1/2} = Σ^{−1}, and so

ln det(Σ^{−1}) = ln( det(S^{−1/2}) det(B) det(S^{−1/2}) ) = ln det(S^{−1}) + ln det(B)

which yields

C(S) + n ln det(B) − trace(B)

as the thing to maximize. Of course this yields

C(S) + n ln( ∏_{i=1}^{p} λ_i ) − ∑_{i=1}^{p} λ_i = C(S) + n ∑_{i=1}^{p} ln(λ_i) − ∑_{i=1}^{p} λ_i

as the quantity to be maximized. To do this, take ∂/∂λ_k and set equal to 0. This yields λ_k = n. Therefore, from the above, B = U^* n I U = n I. Also from the above,

B^{−1} = (1/n) I = S^{−1/2} Σ S^{−1/2}

and so

Σ = (1/n) S = (1/n) ∑_{i=1}^{n} (x_i − m)(x_i − m)^*

This has shown that the maximum likelihood estimates are

m = x̄ ≡ (1/n) ∑_{i=1}^{n} x_i,   Σ = (1/n) ∑_{i=1}^{n} (x_i − m)(x_i − m)^*.
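In code the two estimates are just a sample mean and an uncorrected (1/n rather than 1/(n−1)) sample covariance. A sketch assuming NumPy; the variable names are mine.

```python
# Maximum likelihood estimates of m and Sigma from observations x_1,...,x_n,
# as derived above: m = xbar, Sigma = (1/n) sum (x_i - m)(x_i - m)^T.
# Sketch assuming NumPy.
import numpy as np

rng = np.random.default_rng(4)
true_m = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_m, true_Sigma, size=5000)   # rows are x_i^T

n = X.shape[0]
m_hat = X.mean(axis=0)
D = X - m_hat
Sigma_hat = (D.T @ D) / n        # note 1/n, not 1/(n-1): the ML estimate

print(m_hat)                     # close to true_m
print(Sigma_hat)                 # close to true_Sigma
# np.cov(X, rowvar=False, bias=True) computes the same Sigma_hat
```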



14.10 Simultaneous Diagonalization

Recall the following definition of what it means for a matrix to be diagonalizable. Definition 14.10.1 Let A be an n × n matrix. It is said to be diagonalizable if there exists an invertible matrix S such that S −1 AS = D where D is a diagonal matrix. Also, here is a useful observation. Observation 14.10.2 If A is an n × n matrix and AS = SD for D a diagonal matrix, then each column of S is an eigenvector or else it is the zero vector. This follows from observing that for sk the k th column of S and from the way we multiply matrices, Ask = λk sk It is sometimes interesting to consider the problem of finding a single similarity transformation which will diagonalize all the matrices in some set. Lemma 14.10.3 Let A be an n × n matrix and let B be an m × m matrix. Denote by C the matrix ( ) A 0 C≡ . 0 B Then C is diagonalizable if and only if both A and B are diagonalizable. −1 −1 Proof: Suppose SA ASA = DA and SB BSB = DB( where DA ) and DB are diagonal matrices. SA 0 You should use block multiplication to verify that S ≡ is such that S −1 CS = DC , a 0 SB diagonal matrix. Consider the converse that C is diagonalizable. It is necessary to show that A has a basis of m eigenvectors for Fn and that ( B has a basis of) eigenvectors in F . Thus S has columns si . Suppose C is diagonalized by S = s1 · · · sn+m . For each of these columns, write in the form

s_i = ( x_i ; y_i )

where x_i ∈ F^n and y_i ∈ F^m. The result is

S = ( S_{11}  S_{12} ; S_{21}  S_{22} )

where S_{11} is an n × n matrix and S_{22} is an m × m matrix. Then there is a diagonal matrix D, with D_1 being n × n and D_2 being m × m, such that

D = diag(λ_1, · · · , λ_{n+m}) = ( D_1  0 ; 0  D_2 )

and such that

( A  0 ; 0  B ) ( S_{11}  S_{12} ; S_{21}  S_{22} ) = ( S_{11}  S_{12} ; S_{21}  S_{22} ) ( D_1  0 ; 0  D_2 )

Hence by block multiplication,

( A S_{11}  A S_{12} ; B S_{21}  B S_{22} ) = ( S_{11} D_1  S_{12} D_2 ; S_{21} D_1  S_{22} D_2 )

Thus AS_{11} = S_{11} D_1, BS_{22} = S_{22} D_2, BS_{21} = S_{21} D_1, AS_{12} = S_{12} D_2. It follows that each of the x_i is an eigenvector of A or else is the zero vector, and that each of the y_i is an eigenvector of B or is the zero vector. If there are n linearly independent x_i, then A is diagonalizable by Theorem 6.4.3. The row rank of the top half of S, the matrix ( x_1 · · · x_{n+m} ), must be n, because if this were not so, the row rank of S would be less than n + m, which would mean S^{−1} does not exist. Therefore, since the column rank equals the row rank, this top half of S has column rank equal to n, and this means there are n linearly independent eigenvectors of A, implying that A is diagonalizable. Similar reasoning applies to B by considering the bottom half of S. 

Note that once you know that each of A, B is diagonalizable, you can then use the specific method used in the first part to accomplish the diagonalization. The following corollary follows from the same type of argument as the above.

Corollary 14.10.4 Let A_k be an n_k × n_k matrix and let C denote the ( ∑_{k=1}^{r} n_k ) × ( ∑_{k=1}^{r} n_k ) block diagonal matrix given below.

C ≡ ( A_1  0 ; ⋱ ; 0  A_r )

Then C is diagonalizable if and only if each A_k is diagonalizable.

Definition 14.10.5 A set F of n × n matrices is said to be simultaneously diagonalizable if and only if there exists a single invertible matrix S such that for every A ∈ F, S^{−1}AS = D_A where D_A is a diagonal matrix. F is a commuting family of matrices if whenever A, B ∈ F, AB = BA.

Lemma 14.10.6 If F is a set of n × n matrices which is simultaneously diagonalizable, then F is a commuting family of matrices.

Proof: Let A, B ∈ F and let S be a matrix which has the property that S^{−1}AS is a diagonal matrix for all A ∈ F. Then S^{−1}AS = D_A and S^{−1}BS = D_B where D_A and D_B are diagonal matrices. Since diagonal matrices commute,

AB = S D_A S^{−1} S D_B S^{−1} = S D_A D_B S^{−1} = S D_B D_A S^{−1} = S D_B S^{−1} S D_A S^{−1} = BA. 

Lemma 14.10.7 Let D be a diagonal matrix of the form

D ≡ ( λ_1 I_{n_1}  0  · · ·  0 ; 0  λ_2 I_{n_2}  · · ·  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  · · ·  0  λ_r I_{n_r} ),    (14.19)



where I_{n_i} denotes the n_i × n_i identity matrix, λ_i ≠ λ_j for i ≠ j, and suppose B is a matrix which commutes with D. Then B is a block diagonal matrix of the form

B = ( B_1  0  · · ·  0 ; 0  B_2  ⋱  ⋮ ; ⋮  ⋱  ⋱  0 ; 0  · · ·  0  B_r )    (14.20)

where B_i is an n_i × n_i matrix.

Proof: Let B = (B_{ij}) where B_{ii} = B_i, a block matrix partitioned as in 14.20. Since it commutes with D,

( B_{11}  B_{12}  · · ·  B_{1r} ; B_{21}  B_{22}  · · ·  B_{2r} ; ⋮  ⋮  ⋱  ⋮ ; B_{r1}  B_{r2}  · · ·  B_{rr} ) ( λ_1 I_{n_1}  0  · · ·  0 ; 0  λ_2 I_{n_2}  · · ·  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  · · ·  0  λ_r I_{n_r} )

= ( λ_1 I_{n_1}  0  · · ·  0 ; 0  λ_2 I_{n_2}  · · ·  0 ; ⋮  ⋮  ⋱  ⋮ ; 0  · · ·  0  λ_r I_{n_r} ) ( B_{11}  B_{12}  · · ·  B_{1r} ; B_{21}  B_{22}  · · ·  B_{2r} ; ⋮  ⋮  ⋱  ⋮ ; B_{r1}  B_{r2}  · · ·  B_{rr} )

             

Thus λj Bij = λi Bij Therefore, if i ̸= j, Bij = 0. Hence B as the form which is claimed.  Lemma 14.10.8 Let F denote a commuting family of n × n matrices such that each A ∈ F is diagonalizable. Then F is simultaneously diagonalizable. commuting + diagonalizable ⇒ simultaneously diagonalizable Proof: First note that if every matrix in F has only one eigenvalue, there is nothing to prove. This is because for A such a matrix, S −1 AS = λI and so A = λI Thus all the matrices in F are diagonal matrices and you could pick any S to diagonalize them all. Therefore, without loss of generality, assume some matrix in F has more than one eigenvalue. The significant part of the lemma is proved by induction on n. If n = 1, there is nothing to prove because all the 1 × 1 matrices are already diagonal matrices. Suppose then that the theorem is true for all k ≤ n−1 where n ≥ 2 and let F be a commuting family of diagonalizable n×n matrices. Pick A ∈ F which has more than one eigenvalue and let S be an invertible matrix such that S −1 AS = D where D is of the form given in 14.19. By permuting the columns of S there { is no loss of generality } in assuming D has this form. Now denote by Fe the collection of matrices, S −1 CS : C ∈ F . Note Fe features the single matrix S. It follows easily that Fe is also a commuting family of diagonalizable matrices. Indeed, ) ( )( ( −1 )( ) ˆ ˆ = S −1 CCS ˆ ˆ = S −1 C CS = S −1 CS S CS S −1 CS S −1 CS



e then S −1 CS = M where C ∈ F and so so the matrices commute. Now if M is a matrix in F, C = SM S −1 By assumption, there exists T such that T −1 CT = D and so )−1 ( D = T −1 CT = T −1 SM S −1 T = S −1 T M S −1 T showing that M is also diagonalizable. By Lemma 14.10.7 every B ∈ Fe is a block diagonal matrix of the form given in 14.20 because each of these commutes with D described above as S −1 AS and so by block multiplication, the diagonal ˆi corresponding respectively to B, B ˆ ∈ Fe commute. blocks Bi , B By Corollary 14.10.4 each of these blocks is diagonalizable. This is because B is known to be so. Therefore, by induction, since all the blocks are no larger than n − 1 × n − 1, thanks to the assumption that A has more than one eigenvalue, there exist invertible ni × ni matrices, Ti such that Ti−1 Bi Ti is a diagonal matrix whenever Bi is one of the matrices making up the block diagonal e It follows that for T defined by of any B ∈ F.   T1 0 · · · 0  ..  ..  0 T . .    2 T ≡ . , .. ..  .  . . 0   . 0 ··· 0 Tr then T −1 BT = a diagonal matrix for every B ∈ Fe including D. Consider ST. It follows that for all C ∈ F, e something in F

T^{−1} ( S^{−1} C S ) T = (ST)^{−1} C (ST) = a diagonal matrix. 

Theorem 14.10.9 Let F denote a family of matrices which are diagonalizable. Then F is simultaneously diagonalizable if and only if F is a commuting family. Proof: If F is a commuting family, it follows from Lemma 14.10.8 that it is simultaneously diagonalizable. If it is simultaneously diagonalizable, then it follows from Lemma 14.10.6 that it is a commuting family.  This is really a remarkable theorem. Recall that if S −1 AS = D a diagonal matrix, then the columns of S are a basis of eigenvectors. Hence this says that when you have a commuting family of non defective matrices, then they have the same eigenvectors. This shows how remarkable it is when a set of matrices commutes.
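Theorem 14.10.9 can also be seen numerically in the Hermitian case: if A has distinct eigenvalues and B commutes with A, then the eigenvectors of A diagonalize B as well. The sketch below (an illustration only, assuming NumPy) builds two commuting diagonalizable matrices from a common eigenvector matrix and recovers a single diagonalizing S from one of them.

```python
# Commuting, diagonalizable matrices share a diagonalizing similarity
# (Theorem 14.10.9).  Numerical illustration assuming NumPy: build A and B
# with a common eigenvector matrix, recover S from A alone (its eigenvalues
# are distinct), and check that S also diagonalizes B.
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))      # common eigenvectors
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0]) @ Q.T           # distinct eigenvalues
B = Q @ np.diag([5.0, 5.0, -1.0, 0.0]) @ Q.T
assert np.allclose(A @ B, B @ A)                      # the family commutes

_, S = np.linalg.eigh(A)                              # diagonalizes A
DA = S.T @ A @ S
DB = S.T @ B @ S                                      # is also diagonal
assert np.allclose(DA, np.diag(np.diag(DA)))
assert np.allclose(DB, np.diag(np.diag(DB)))
```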

14.11 Fractional Powers

The main result is the following theorem. Theorem 14.11.1 Let A be a self adjoint and nonnegative n × n matrix (all eigenvalues are nonnegative) and let k be a positive integer. Then there exists a unique self adjoint nonnegative matrix B such that B k = A. n

Proof: By Theorem 14.1.6, there exists an orthonormal basis of eigenvectors of A, say {vi }i=1 such that Avi = λi vi with each λi real. In particular, there exists a unitary matrix U such that U ∗ AU = D, A = U DU ∗ where D has nonnegative diagonal entries. Define B in the obvious way. B ≡ U D1/k U ∗



Then it is clear that B is self adjoint and nonnegative. Also it is clear that B^k = A. What of uniqueness? Let p(t) be a polynomial whose graph contains the ordered pairs (λ_i, λ_i^{1/k}) where the λ_i are the diagonal entries of D, the eigenvalues of A. Then

p(A) = U p(D) U^* = U D^{1/k} U^* ≡ B

Suppose then that C^k = A and C is also self adjoint and nonnegative. Then

C B = C p(A) = C p(C^k) = p(C^k) C = p(A) C = B C

and so {B, C} is a commuting family of non defective matrices. By Theorem 14.10.9 this family of matrices is simultaneously diagonalizable. Hence there exists a single S such that

S^{−1} B S = D_B,  S^{−1} C S = D_C

where D_C, D_B denote diagonal matrices. Hence, raising to the power k, it follows that

A = B^k = S D_B^k S^{−1},  A = C^k = S D_C^k S^{−1}

Hence S D_B^k S^{−1} = S D_C^k S^{−1} and so D_B^k = D_C^k. Since the entries of the two diagonal matrices are nonnegative, this implies D_B = D_C and so S^{−1}BS = S^{−1}CS, which shows B = C. 

A similar result holds for a general finite dimensional inner product space. See Problem 21 in the exercises.
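The k-th root is computed the same way as the square root, via the spectral decomposition. A short sketch assuming NumPy; the function name is mine.

```python
# Unique nonnegative k-th root of a nonnegative Hermitian matrix
# (Theorem 14.11.1), via A = U D U*  ->  B = U D^(1/k) U*.  Assumes NumPy.
import numpy as np

def kth_root_psd(A, k):
    lam, U = np.linalg.eigh(A)
    lam = np.clip(lam, 0.0, None)           # guard against round-off
    return (U * lam ** (1.0 / k)) @ U.conj().T

rng = np.random.default_rng(6)
C = rng.standard_normal((5, 5))
A = C @ C.T                                 # symmetric, nonnegative
B = kth_root_psd(A, 3)
assert np.allclose(np.linalg.matrix_power(B, 3), A)
assert np.allclose(B, B.T)                  # self adjoint, as the theorem says
```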

14.12 Spectral Theory Of Self Adjoint Operators

First is some notation which may be useful since it will be used in the following presentation. Definition 14.12.1 Let X, Y be inner product space and let u ∈ Y, v ∈ X. Then define u ⊗ v ∈ L (X, Y ) as follows. u ⊗ v (w) ≡ (w, v) u where (w, v) is the inner product in X. Then this is clearly linear. That it is continuous follows right away from |(w, v) u| ≤ |u|Y |w|X |v|X and so sup |u ⊗ v (w)|Y ≤ |u|Y |v|X

|w|X ≤1

Sometimes this is called the tensor product, although much more can be said about the tensor product. Note how this is similar to the rank one transformations used to consider the dimension of the space L (V, W ) in Theorem 5.1.4. This is also a rank one transformation but here there is no restriction on the dimension of the vector spaces although, as usual, the interest is in finite dimensional spaces. In case you have {v1 , · · · , vn } an orthonormal basis for V and {u1 , · · · , um } an orthonormal basis for Y, (or even just a basis.) the linear transformations ui ⊗ vj are the same as those rank ∑ one transformations used before in the above theorem and are a basis for L (V, W ). Thus for A = i,j aij ui ⊗ vj , the matrix of A with respect to the two bases has its ij th entry equal to aij . This is stated as the following proposition. Proposition 14.12.2 Suppose {v1 , · · · , vn } is an ∑ orthonormal basis for V and {u1 , · · · , um } is a basis for W . Then if A ∈ L (V, W ) is given by A = i,j aij ui ⊗ vj , then the matrix of A with respect to these two bases is an m × n matrix whose ij th entry is aij .



In case A is a Hermitian matrix, and you have an orthonormal basis of eigenvectors and U is the unitary matrix having these eigenvectors as columns, recall that the matrix of A with respect to this basis is diagonal. Recall why this is. ( ) ( ) Au1 · · · Aun = u1 · · · un D where D is the diagonal matrix having the eigenvalues down the diagonal. Thus D = U ∗ AU and Aui = λi ui . It follows that as a linear transformation, ∑ λi ui ⊗ ui A= i

because both give the same answer when acting on elements of the orthonormal basis. This also says that the matrix of A with respect to the given orthonormal basis is just the diagonal matrix having the eigenvalues down the main diagonal. The following theorem is about the eigenvectors and eigenvalues of a self adjoint operator. Such operators may also be called Hermitian as in the case of matrices. The proof given generalizes to the situation of a compact self adjoint operator on a Hilbert space and leads to many very useful results. It is also a very elementary proof because it does not use the fundamental theorem of algebra and it contains a way, very important in applications, of finding the eigenvalues. This proof depends more directly on the methods of analysis than the preceding material. Recall the following notation. Definition 14.12.3 Let X be an inner product space and let S ⊆ X. Then S ⊥ ≡ {x ∈ X : (x, s) = 0 for all s ∈ S} . Note that even if S is not a subspace, S ⊥ is. Theorem 14.12.4 Let A ∈ L (X, X) be self adjoint (Hermitian) where X is a finite dimensional inner product space of dimension n. Thus A = A∗ . Then there exists an orthonormal basis of n eigenvectors, {vj }j=1 . Proof: Consider (Ax, x) . This quantity is always a real number because (Ax, x) = (x, Ax) = (x, A∗ x) = (Ax, x) thanks to the assumption that A is self adjoint. Now define λ1 ≡ inf {(Ax, x) : |x| = 1, x ∈ X1 ≡ X} . Claim: λ1 is finite and there exists v1 ∈ X with |v1 | = 1 such that (Av1 , v1 ) = λ1 . Proof of claim: The set of vectors {x : |x| = 1} is a closed and bounded subset of the finite dimensional space X. Therefore, it is compact and so the vector v1 exists by Theorem 11.5.3. I claim that λ1 is an eigenvalue and v1 is an eigenvector. Letting w ∈ X1 ≡ X, the function of the real variable, t, given by f (t) ≡

(A (v1 + tw) , v1 + tw) |v1 + tw|

2

=

(Av1 , v1 ) + 2t Re (Av1 , w) + t2 (Aw, w) 2

2

|v1 | + 2t Re (v1 , w) + t2 |w|

achieves its minimum when t = 0. Therefore, the derivative of this function evaluated at t = 0 must equal zero. Using the quotient rule, this implies, since |v1 | = 1 that 2

2 Re (Av1 , w) |v1 | − 2 Re (v1 , w) (Av1 , v1 ) = 2 (Re (Av1 , w) − Re (v1 , w) λ1 ) = 0. Thus Re (Av1 − λ1 v1 , w) = 0 for all w ∈ X. This implies Av1 = λ1 v1 by Proposition 13.1.5. ⊥ Continuing with the proof of the theorem, let X2 ≡ {v1 } . This is a closed subspace of X and A : X2 → X2 because for x ∈ X2 , (Ax, v1 ) = (x, Av1 ) = λ1 (x, v1 ) = 0.



Let λ2 ≡ inf {(Ax, x) : |x| = 1, x ∈ X2 } ⊥

As before, there exists v2 ∈ X2 such that Av2 = λ2 v2 , λ1 ≤ λ2 . Now let X3 ≡ {v1 , v2 } and continue ⊥ in this way. As long as k < n, it will be the case that {v1 , · · · , vk } ̸= {0}. This is because for k < n these vectors cannot be a spanning set and so there exists some w ∈ / span (v1 , · · · , vk ) . Then ⊥ letting z be the closest point to w from span (v1 , · · · , vk ) , it follows that w −z ∈ {v1 , · · · , vk } . Thus n there is an decreasing sequence of eigenvalues {λk }k=1 and a corresponding sequence of eigenvectors, {v1 , · · · , vn } with this being an orthonormal set.  Contained in the proof of this theorem is the following important corollary. Corollary 14.12.5 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner product space. Then all the eigenvalues are real and for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exists an orthonormal set of vectors {u1 , · · · , un } for which Auk = λk uk . Furthermore, λk ≡ inf {(Ax, x) : |x| = 1, x ∈ Xk } where



Xk ≡ {u1 , · · · , uk−1 } , X1 ≡ X. Corollary 14.12.6 Let A ∈ L (X, X) be self adjoint (Hermitian) where X is a finite dimensional inner product space. Then the largest eigenvalue of A is given by max {(Ax, x) : |x| = 1}

(14.21)

and the minimum eigenvalue of A is given by min {(Ax, x) : |x| = 1} .

(14.22)

Proof: The proof of this is just like the proof of Theorem 14.12.4. Simply replace inf with sup and obtain a decreasing list of eigenvalues. This establishes 14.21. The claim 14.22 follows from Theorem 14.12.4.  Another important observation is found in the following corollary. ∑ Corollary 14.12.7 Let A ∈ L (X, X) where A is self adjoint. Then A = i λi vi ⊗ vi where n Avi = λi vi and {vi }i=1 is an orthonormal basis. Proof : If vk is one of the orthonormal basis vectors, Avk = λk vk . Also, ∑ ∑ ∑ λi vi ⊗ vi (vk ) = λi vi (vk , vi ) = λi δ ik vi = λk vk . i

i

i

Since the two linear transformations agree on a basis, it follows they must coincide.  n By Proposition 14.12.2 this says the matrix of A with respect to this basis {vi }i=1 is the diagonal matrix having the eigenvalues λ1 , · · · , λn down the main diagonal. The result of Courant and Fischer which follows resembles Corollary 14.12.5 but is more useful because it does not depend on a knowledge of the eigenvectors. Theorem 14.12.8 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal vectors {u1 , · · · , un } for which Auk = λk uk . Furthermore, λk ≡

max

w1 ,··· ,wk−1 ⊥

{ { }} ⊥ min (Ax, x) : |x| = 1, x ∈ {w1 , · · · , wk−1 }

where if k = 1, {w1 , · · · , wk−1 } ≡ X.

(14.23)

14.12. SPECTRAL THEORY OF SELF ADJOINT OPERATORS

355

Proof: From Theorem 14.12.4, there exist eigenvalues and eigenvectors with {u1 , · · · , un } orthonormal and λi ≤ λi+1 . (Ax, x) =

n ∑

(Ax, uj ) (x, uj ) =

j=1

Recall that (z, w) =



n ∑ j=1

j=1

(z, uj ) (w, ui ). Then let Y = {w1 , · · · , wk−1 }

 k ∑ 

2

λj |(x, uj )|



j

inf {(Ax, x) : |x| = 1, x ∈ Y } = inf

≤ inf

n ∑

λj (x, uj ) (uj , x) =

 n ∑ 

2

λj |(x, uj )| : |x| = 1, x ∈ Y

j=1

2

λj |(x, uj )| : |x| = 1, (x, uj ) = 0 for j > k, and x ∈ Y

j=1

  

  

.

(14.24)

The reason this is so is that the infimum is taken over a smaller set. Therefore, the infimum gets larger. Now 14.24 is no larger than   n  ∑  2 inf λk |(x, uj )| : |x| = 1, (x, uj ) = 0 for j > k, and x ∈ Y ≤ λk   j=1

2

because since {u1 , · · · , un } is an orthonormal basis, |x| =

∑n j=1

2

|(x, uj )| . It follows, since

{w1 , · · · , wk−1 } is arbitrary, sup w1 ,··· ,wk−1

{ { }} ⊥ inf (Ax, x) : |x| = 1, x ∈ {w1 , · · · , wk−1 } ≤ λk .

(14.25)

Then from Corollary 14.12.5, { } ⊥ λk = inf (Ax, x) : |x| = 1, x ∈ {u1 , · · · , uk−1 } ≤ sup w1 ,··· ,wk−1

{ { }} ⊥ inf (Ax, x) : |x| = 1, x ∈ {w1 , · · · , wk−1 } ≤ λk

Hence these are all equal and this proves the theorem.  The following corollary is immediate. Corollary 14.12.9 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal vectors {u1 , · · · , un } for which Auk = λk uk . Furthermore, { λk ≡

max

w1 ,··· ,wk−1

{ min

(Ax, x) |x|

2

}} : x ̸= 0, x ∈ {w1 , · · · , wk−1 }



where if k = 1, {w1 , · · · , wk−1 } ≡ X. Here is a version of this for which the roles of max and min are reversed.



(14.26)



Corollary 14.12.10 Let A ∈ L (X, X) be self adjoint where X is a finite dimensional inner product space. Then for λ1 ≤ λ2 ≤ · · · ≤ λn the eigenvalues of A, there exist orthonormal vectors {u1 , · · · , un } for which Auk = λk uk . Furthermore,

{ λk ≡

min

w1 ,··· ,wn−k

{ max

(Ax, x) |x|

2

}} ⊥

: x ̸= 0, x ∈ {w1 , · · · , wn−k }

(14.27)



where if k = n, {w1 , · · · , wn−k } ≡ X.
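The extreme cases 14.21 and 14.22 say the largest and smallest eigenvalues are the extreme values of the Rayleigh quotient (Ax, x)/|x|^2. The following randomized check is only an illustration, assuming NumPy; it verifies that sampled Rayleigh quotients stay between λ_1 and λ_n and that the extremes are attained at eigenvectors.

```python
# Rayleigh quotient check for a Hermitian matrix: for every x != 0,
# lambda_1 <= (Ax, x)/|x|^2 <= lambda_n, with equality at eigenvectors
# (Corollary 14.12.6).  Sketch assuming NumPy.
import numpy as np

rng = np.random.default_rng(7)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2
lam, V = np.linalg.eigh(A)                        # ascending eigenvalues

def rayleigh(A, x):
    return float(x @ A @ x) / float(x @ x)

samples = [rayleigh(A, rng.standard_normal(6)) for _ in range(1000)]
assert lam[0] - 1e-12 <= min(samples) and max(samples) <= lam[-1] + 1e-12
assert np.isclose(rayleigh(A, V[:, 0]), lam[0])   # minimum attained
assert np.isclose(rayleigh(A, V[:, -1]), lam[-1]) # maximum attained
```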

14.13 Positive And Negative Linear Transformations

The notion of a positive definite or negative definite linear transformation is very important in many applications. In particular it is used in versions of the second derivative test for functions of many variables. Here the main interest is the case of a linear transformation which is an n × n matrix but the theorem is stated and proved using a more general notation because all these issues discussed here have interesting generalizations to functional analysis. Definition 14.13.1 A self adjoint A ∈ L (X, X) , is positive definite if whenever x ̸= 0, (Ax, x) > 0 and A is negative definite if for all x ̸= 0, (Ax, x) < 0. A is positive semidefinite or just nonnegative for short if for all x, (Ax, x) ≥ 0. A is negative semidefinite or nonpositive for short if for all x, (Ax, x) ≤ 0. The following lemma is of fundamental importance in determining which linear transformations are positive or negative definite. Lemma 14.13.2 Let X be a finite dimensional inner product space. A self adjoint A ∈ L (X, X) is positive definite if and only if all its eigenvalues are positive and negative definite if and only if all its eigenvalues are negative. It is positive semidefinite if all the eigenvalues are nonnegative and it is negative semidefinite if all the eigenvalues are nonpositive. Proof: Suppose first that A is positive definite and let λ be an eigenvalue. Then for x an eigenvector corresponding to λ, λ (x, x) = (λx, x) = (Ax, x) > 0. Therefore, λ > 0 as claimed. Now ∑n suppose all the eigenvalues of A are positive. From Theorem 14.12.4 and Corollary 14.12.7, A = i=1 λi ui ⊗ ui where the λi are the positive eigenvalues and {ui } are an orthonormal set of eigenvectors. Therefore, letting x ̸= 0, (( n ) ) ( n ) ∑ ∑ (Ax, x) = λi ui ⊗ ui x, x = λi ui (x, ui ) , x ( =

i=1 n ∑

i=1

)

λi (x, ui ) (ui , x)

i=1

=

n ∑

2

λi |(ui , x)| > 0

i=1 2

∑n

2

because, since {ui } is an orthonormal basis, |x| = i=1 |(ui , x)| . To establish the claim about negative definite, it suffices to note that A is negative definite if and only if −A is positive definite and the eigenvalues of A are (−1) times the eigenvalues of −A. The claims about positive semidefinite and negative semidefinite are obtained similarly.  The next theorem is about a way to recognize whether a self adjoint n × n complex matrix A is positive or negative definite without having to find the eigenvalues. In order to state this theorem, here is some notation. Definition 14.13.3 Let A be an n × n matrix. Denote by Ak the k × k matrix obtained by deleting the k + 1, · · · , n columns and the k + 1, · · · , n rows from A. Thus An = A and Ak is the k × k submatrix of A which occupies the upper left corner of A. The determinants of these submatrices are called the principle minors.



The following theorem is proved in [10]. For the sake of simplicity, we state this for real matrices since this is also where the main interest lies. Theorem 14.13.4 Let A be a self adjoint n × n matrix. Then A is positive definite if and only if det (Ak ) > 0 for every k = 1, · · · , n. Proof: This theorem is proved by induction on n. It is clearly true if n = 1. Suppose then that it is true for n − 1 where n ≥ 2. Since det (A) > 0, it follows that all the eigenvalues are nonzero. Are they all positive? Suppose not. Then there is some even number of them which are negative, even because the product of all the eigenvalues is known to be positive, equaling det (A). Pick two, λ1 and λ2 and let Aui = λi ui where ui ̸= 0 for i = 1, 2 and (u1 , u2 ) = 0. Now if y ≡ α1 u1 + α2 u2 is an element of span (u1 , u2 ) , then since these are eigenvalues and (u1 , u2 )Rn = 0, a short computation shows 2 2 2 2 (A (α1 u1 + α2 u2 ) , α1 u1 + α2 u2 ) = |α1 | λ1 |u1 | + |α2 | λ2 |u2 | < 0. Now letting x ∈ Rn−1 , x ̸= 0, the induction hypothesis implies ( ) ( T ) x x ,0 A = xT An−1 x = (An−1 x, x) > 0. 0 The dimension of {z ∈ Rn : zn = 0} is n−1 and the dimension of span (u1 , u2 ) = 2 and so there must be some nonzero x ∈ Rn which is in both of these subspaces of Rn . However, the first computation would require that (Ax, x) < 0 while the second would require that (Ax, x) > 0. This contradiction shows that all the eigenvalues must be positive. This proves the if part of the theorem. To show the converse, note that, as above, (Ax, x) = xT Ax. Suppose that A is positive definite. Then this is equivalent to having 2 xT Ax ≥ δ ∥x∥ Note that for x ∈ Rk ,

(

xT

(

) 0

A

x 0

) 2

= xT Ak x ≥ δ ∥x∥

From Lemma 14.13.2, this implies that all the eigenvalues of A_k are positive. Hence, from Lemma 14.13.2, it follows that det(A_k) > 0, being the product of its eigenvalues. 

Corollary 14.13.5 Let A be a self adjoint n × n matrix. Then A is negative definite if and only if (−1)^k det(A_k) > 0 for every k = 1, · · · , n.

Proof: This is immediate from the above theorem by noting that, as in the proof of Lemma 14.13.2, A is negative definite if and only if −A is positive definite. Therefore, det(−A_k) > 0 for all k = 1, · · · , n is equivalent to having A negative definite. However, det(−A_k) = (−1)^k det(A_k). 
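Theorem 14.13.4 gives a practical test. The sketch below (assuming NumPy; the function name is mine) compares the leading principal minor test with the eigenvalue criterion of Lemma 14.13.2 on random symmetric matrices.

```python
# Criterion of Theorem 14.13.4: a real symmetric A is positive definite
# iff det(A_k) > 0 for k = 1,...,n, where A_k is the upper left k x k
# corner.  Compared against the eigenvalue test.  Assumes NumPy.
import numpy as np

def positive_definite_minors(A):
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

rng = np.random.default_rng(8)
C = rng.standard_normal((5, 5))
A = C @ C.T + 0.1 * np.eye(5)          # positive definite by construction
B = A - 10.0 * np.eye(5)               # shifted: typically not positive definite

for M in (A, B):
    by_minors = positive_definite_minors(M)
    by_eigs = np.linalg.eigvalsh(M).min() > 0
    assert by_minors == by_eigs         # the two tests agree
```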

14.14 The Singular Value Decomposition

In this section, A will be an m × n matrix. To begin with, here is a simple lemma observed earlier.

Lemma 14.14.1 Let A be an m × n matrix. Then A^*A is self adjoint and all its eigenvalues are nonnegative.

Proof: It is obvious that A^*A is self adjoint. Suppose A^*Ax = λx. Then

λ |x|^2 = (λx, x) = (A^*Ax, x) = (Ax, Ax) ≥ 0. 



Definition 14.14.2 Let A be an m × n matrix. The singular values of A are the square roots of the positive eigenvalues of A∗ A.



With this definition and lemma here is the main theorem on the singular value decomposition. In all that follows, I will write the following partitioned matrix ( ) σ 0 0 0 where σ denotes an r × r diagonal matrix of the form  σ1 0  ..  .  0 σk

   

and the bottom row of zero matrices in the partitioned matrix, as well as the right columns of zero matrices are each of the right size so that the resulting matrix is m × n. Either could vanish completely. However, I will write it in the above form. It is easy to make the necessary adjustments in the other two cases. Theorem 14.14.3 Let A be an m × n matrix. Then there exist unitary matrices, U and V of the appropriate size such that ( ) σ 0 ∗ U AV = 0 0 where σ is of the form

  σ= 

σ1

0 ..

0

.

   

σk

for the σ i the singular values of A, arranged in order of decreasing size. n

Proof: By the above lemma and Theorem 14.12.4 there exists an orthonormal basis, {vi }i=1 for Fn such that A∗ Avi = σ 2i vi where σ 2i > 0 for i = 1, · · · , k, (σ i > 0) , and equals zero if i > k. Let the eigenvalues σ 2i be arranged in decreasing order. It is desired to have ( ) σ 0 AV = U 0 0 ( and so if U =

u1

···

) um

, one needs to have for j ≤ k, σ j uj = Avj . Thus let uj ≡ σ −1 j Avj , j ≤ k

Then for i, j ≤ k, (ui , uj ) =

−1 −1 −1 ∗ σ −1 j σ i (Avi , Avj ) = σ j σ i (A Avi , vj )

−1 2 = σ −1 j σ i σ i (vi , vj ) = δ ij

Now extend to an orthonormal basis of Fm , {u1 , · · · , uk , uk+1 , · · · , um } . If i > k, (Avi , Avi ) = (A∗ Avi , vi ) = 0 (vi , vi ) = 0 so Avi = 0. Then for σ given as above in the statement of the theorem, it follows that ( ) ( ) σ 0 σ 0 AV = U , U ∗ AV =  0 0 0 0 The singular value decomposition has as an immediate corollary the following interesting result.
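In practice the factorization of Theorem 14.14.3 is produced by a library routine rather than by the construction in the proof. A sketch assuming NumPy; note that np.linalg.svd returns the singular values in decreasing order and returns V^* rather than V.

```python
# Singular value decomposition U* A V = Sigma (Theorem 14.14.3) with NumPy.
# np.linalg.svd returns A = U S Vh where Vh = V*, singular values decreasing.
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3))
U, s, Vh = np.linalg.svd(A)                   # U is 5x5, Vh is 3x3

Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
assert np.allclose(U @ Sigma @ Vh, A)                       # A = U Sigma V*
assert np.allclose(U.conj().T @ A @ Vh.conj().T, Sigma)     # U* A V = Sigma

# singular values are square roots of the eigenvalues of A*A, and the rank
# is the number of nonzero singular values (Corollary 14.14.4)
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
assert np.linalg.matrix_rank(A) == np.sum(s > 1e-12)
```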



Corollary 14.14.4 Let A be an m × n matrix. Then the rank of both A and A∗ equals the number of singular values. Proof: Since V and U are unitary, they are each one to one and onto and so it follows that ( ) σ 0 ∗ rank (A) = rank (U AV ) = rank = number of singular values. 0 0 Also since U, V are unitary, ( ∗) rank (A∗ ) = rank (V ∗ A∗ U ) = rank (U ∗ AV ) (( = rank

14.15

σ 0

0 0

)∗ ) = number of singular values. 

Approximation In The Frobenius Norm

The Frobenius norm is one of many norms for a matrix. It is arguably the most obvious of all norms. Here is its definition. Definition 14.15.1 Let A be a complex m × n matrix. Then ||A||F ≡ (trace (AA∗ ))

1/2

Also this norm comes from the inner product (A, B)F ≡ trace (AB ∗ ) 2

Thus ||A||F is easily seen to equal

∑ ij

2

|aij | so essentially, it treats the matrix as a vector in Fm×n .

Lemma 14.15.2 Let A be an m × n complex matrix with singular matrix ( ) σ 0 Σ= 0 0 with σ as defined above, U ∗ AV = Σ. Then 2

2

||Σ||F = ||A||F

(14.28)

and the following hold for the Frobenius norm. If U, V are unitary and of the right size, ||U A||F = ||A||F , ||U AV ||F = ||A||F .

(14.29)

Proof: From the definition and letting U, V be unitary and of the right size, ||U A||F ≡ trace (U AA∗ U ∗ ) = trace (U ∗ U AA∗ ) = trace (AA∗ ) = ||A||F 2

Also,

2

||AV ||F ≡ trace (AV V ∗ A∗ ) = trace (AA∗ ) = ||A||F . 2

It follows

2

∥Σ∥F = ||U ∗ AV ||F = ||AV ||F = ||A||F .  2

2

2

Of course, this shows that 2

||A||F =

∑ i

the sum of the squares of the singular values of A.

σ 2i ,

2

360

CHAPTER 14. MATRICES AND THE INNER PRODUCT Why is the singular value decomposition important? It implies ( ) σ 0 A=U V∗ 0 0

where σ is the diagonal matrix having the singular values down the diagonal. Now sometimes A is a huge matrix, 1000×2000 or something like that. This happens in applications to situations where the entries of A describe a picture. What also happens is that most of the singular values are very small. What if you deleted those which were very small, say for all i ≥ l and got a new matrix ) ( σ′ 0 ′ V ∗? A ≡U 0 0 Then the entries of A′ would end up being close to the entries of A but there is much less information to keep track of. This turns out to be very useful. More precisely, letting   ( ) σ1 0   σ 0 ..  , U ∗ AV = σ= , .   0 0 0 σr ||A −

2 A′ ||F ′

( σ − σ′ = U 0

0 0

)

2 r ∑ σ 2k V = ∗

F

k=l+1



Thus A is approximated by A where A has rank l < r. In fact, it is also true that out of all matrices of rank l, this A′ is the one which is closest to A ) in the Frobenius norm. Here is roughly ( σ r×r 0 ˜ approximates A = as well as possible out of all matrices why this is so. Suppose B 0 0 ˜ having rank no more than l < r the size of the matrix σ r×r . B ˜ is l. Then obviously no column xj of B ˜ in a basis for the column space Suppose the rank of B can have j > r since if so, the approximation of A could be improved by simply making this column ( ) r into a zero column. Therefore there are choices for columns for a basis for the column space l ˜ of B. ˜ and let it be column j in the matrix Let x be a column in the basis for the column space of B ˜ B. Denote the diagonal entry by xj = σ j + h. Then the error incurred due to approximating with this column is ∑ h2 + x2i i̸=j

One obviously minimizes this error by letting h = 0 = xi for all i ̸= j. That is, the column should ˜ which are not pivot columns, have all zeroes with σ j in the diagonal position. As to any columns of B such a column is a linear combination of these basis columns which have exactly one entry, in the diagonal position. These non pivot columns must have a 0 in the diagonal position since if not, the rank of the matrix would be more than l. Then the off diagonal entries should equal zero to make the approximation as good as possible. Thus the non basis columns are columns consisting of zeros ˜ is a diagonal matrix with l nonzero diagonal entries selected from the first r columns of A. and B It only remains to observe that, since the singular values decrease in size from upper left to lower ˜ in order to right in A, to minimize the error, one should pick the first l columns for the basis for B use the sum of the squares of the smallest possible singular values in the error. That is, you would replace σ r×r with the upper left l × l corner of σ r×r . ( ) ( ) σ r×r 0 σ l×l 0 ˜ A= ,⇒ B = 0 0 0 0

14.16. LEAST SQUARES AND SINGULAR VALUE DECOMPOSITION For example, consider



3   0 0 The best rank 2 approximation is



3   0 0

0 2 0

0 0 1

 0  0  0

0 2 0

0 0 0

 0  0  0

361

Now suppose A is an m × n matrix. Let U, V be unitary and of the right size such that ( ) σ r×r 0 ∗ U AV = 0 0 Then suppose B approximates A as well as possible in the Frobenius norm, B having rank l < r. Then you would want

(

)

σ

0

r×r ∗ ∗ ∗ ∥A − B∥ = ∥U AV − U BV ∥ = − U BV

0 0 to be as small as possible. Therefore, from the above discussion, you should have ( ) ( ) σ 0 σ 0 l×l l×l ∗ ˜ ≡ U BV = B ,B = U V∗ 0 0 0 0 (

whereas A=U

14.16

σ r×r 0

0 0

) V∗

Least Squares And Singular Value Decomposition

The singular value decomposition also has a very interesting connection to the problem of least squares solutions. Recall that it was desired to find x such that |Ax − y| is as small as possible. Lemma 13.4.1 shows that there is a solution to this problem which can be found by solving the system A∗ Ax = A∗ y. Each x which solves this system solves the minimization problem as was shown in the lemma just mentioned. Now consider this equation for the solutions of the minimization problem in terms of the singular value decomposition. z ( V

A∗

A

A∗

}| ) {z ( }| ) { z ( }| ) { σ 0 σ 0 σ 0 U ∗U V ∗x = V U ∗ y. 0 0 0 0 0 0

Therefore, this yields the following upon using block multiplication and multiplying on the left by V ∗. ( ) ( ) σ2 0 σ 0 ∗ V x= U ∗ y. (14.30) 0 0 0 0 One solution to this equation which is very easy to spot is ( ) σ −1 0 x=V U ∗ y. 0 0

(14.31)



14.17 The Moore Penrose Inverse

The particular solution of the least squares problem given in 14.31 is important enough that it motivates the following definition. Definition 14.17.1 Let A be an m × n matrix. Then the Moore Penrose inverse of A, denoted by A+ is defined as ( ) σ −1 0 + A ≡V U ∗. 0 0 (

Here ∗

U AV =

σ 0

0 0

)

as above. Thus A+ y is a solution to the minimization problem to find x which minimizes |Ax − y| . In fact, one can say more about this. In the following picture My denotes the set of least squares solutions x such that A∗ Ax = A∗ y. A+ (y) I My x



ker(A∗ A)

Then A+ (y) is as given in the picture. Proposition 14.17.2 A+ y is the solution to the problem of minimizing |Ax − y| for all x which has smallest norm. Thus AA+ y − y ≤ |Ax − y| for all x and if x1 satisfies |Ax1 − y| ≤ |Ax − y| for all x, then |A+ y| ≤ |x1 | . Proof: Consider x satisfying 14.30, equivalently A∗ Ax =A∗ y, ) ) ( ( σ 0 σ2 0 ∗ V x= U ∗y 0 0 0 0 which has smallest norm. This is equivalent to making |V ∗ x| as small as possible because V ∗ is unitary and so it preserves norms. For z a vector, denote by (z)k the vector in Fk which consists of the first k entries of z. Then if x is a solution to 14.30 ) ( ) ( σ (U ∗ y)k σ 2 (V ∗ x)k = 0 0 and so (V ∗ x)k = σ −1 (U ∗ y)k . Thus the first k entries of V ∗ x are determined. In order to make |V ∗ x| as small as possible, the remaining n − k entries should equal zero. Therefore, ( ) ( ) ( ) (V ∗ x)k σ −1 (U ∗ y)k σ −1 0 ∗ V x= = = U ∗y 0 0 0 0

14.17. THE MOORE PENROSE INVERSE (

and so x=V

363

σ −1 0

0 0

) U ∗ y ≡ A+ y 

Lemma 14.17.3 The matrix A+ satisfies the following conditions. AA+ A = A, A+ AA+ = A+ , A+ A and AA+ are Hermitian. (

Proof: This is routine. Recall A=U

(

and A+ = V

σ 0 σ −1 0

(14.32)

)

0 0

V∗

0 0

) U∗

so you just plug in and verify it works.  A much more interesting observation is that A+ is characterized as being the unique matrix which satisfies 14.32. This is the content of the following Theorem. The conditions are sometimes called the Penrose conditions. Theorem 14.17.4 Let A be an m × n matrix. Then a matrix A0 , is the Moore Penrose inverse of A if and only if A0 satisfies AA0 A = A, A0 AA0 = A0 , A0 A and AA0 are Hermitian.

(14.33)

Proof: From the above lemma, the Moore Penrose inverse satisfies 14.33. Suppose then that A0 satisfies 14.33. It is necessary to verify that A0 = A+ . Recall that from the singular value decomposition, there exist unitary matrices, U and V such that ( ) σ 0 ∗ U AV = Σ ≡ , A = U ΣV ∗ . 0 0 (

Recall that +

A =V (

Let A0 = V

σ −1 0 P R

0 0 Q S

) U∗ ) U∗

(14.34)

where P is r × r, the same size as the diagonal matrix composed of the singular values on the main diagonal. Next use the first equation of 14.33 to write A

z (

z }| { U ΣV ∗ V

A0

}| ) { A A z }| { z }| { P Q ∗ ∗ U U ΣV = U ΣV ∗ . R S

Then multiplying both sides on the left by U ∗ and on the right by V, ( )( )( ) ( ) ( σ 0 P Q σ 0 σP σ 0 σ = = 0 0 R S 0 0 0 0 0

0 0

) (14.35)



Therefore, P = σ −1 . From the requirement that AA0 is Hermitian, z (

A

z }| { U ΣV ∗ V

A0

P R

}| ) { ( Q σ ∗ U =U S 0

0 0

must be Hermitian. Therefore, it is necessary that ( )( ) ( σ 0 P Q σP = 0 0 R S 0 (

is Hermitian. Then

I 0

)

σQ 0

( =

σQ 0

I Q∗ σ

)(

)

P R (

=

Q S

I 0

) U∗

)

σQ 0

)

0 0

and so Q = 0. Next, z ( V

A0

}| ) { A ( z }| { P Q Pσ U ∗ U ΣV ∗ = V R S Rσ

is Hermitian. Therefore, also

(

is Hermitian. Thus R = 0 because (

I Rσ

I Rσ

0 0

)∗

0 0

)

0 0

( =

( V∗ =V

I Rσ

0 0

) V∗

)

)

σ ∗ R∗ 0

I 0

which requires Rσ = 0. Now multiply on right by σ −1 to find that R = 0. Use 14.34 and the second equation of 14.33 to write z ( V which implies

A0

A0

A0

}| ) { A z ( }| ) { z ( }| ) { z }| { P Q P Q P Q U ∗ U ΣV ∗ V U∗ = V U ∗. R S R S R S (

P R

Q S

)(

σ 0

0 0

)(

P R

Q S

(

) =

P R

Q S

This yields from the above in which is was shown that R, Q are both 0 ( )( )( ) ( σ −1 0 σ 0 σ −1 0 σ −1 = 0 S 0 0 0 S 0 ( σ −1 = 0 Therefore, S = 0 also and so ( ∗

V A0 U ≡

P R

Q S

)

( =

σ −1 0

0 0

)

) .

0 0 0 S

) (14.36) ) .

(14.37)

14.18. THE SPECTRAL NORM AND THE OPERATOR NORM which says

(

σ −1 0

A0 = V

0 0

365

) U ∗ ≡ A+ . 

The theorem is significant because there is no mention of eigenvalues or eigenvectors in the characterization of the Moore Penrose inverse given in 14.33. It also shows immediately that the Moore Penrose inverse is a generalization of the usual inverse. See Problem 3.
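Numerically the Moore Penrose inverse is available as np.linalg.pinv, which is computed from the singular value decomposition in the manner of Definition 14.17.1. The sketch below (assuming NumPy) checks the Penrose conditions 14.33 and the minimum norm least squares property on a rank deficient matrix.

```python
# Moore Penrose inverse via np.linalg.pinv, with a check of the Penrose
# conditions 14.33 and of Proposition 14.17.2.  Sketch assuming NumPy.
import numpy as np

rng = np.random.default_rng(10)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]              # make A rank deficient (rank 3)
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)
assert np.allclose(Ap @ A @ Ap, Ap)
assert np.allclose((A @ Ap).T, A @ Ap)   # A A+ is Hermitian
assert np.allclose((Ap @ A).T, Ap @ A)   # A+ A is Hermitian

# A+ y is the minimum norm least squares solution
y = rng.standard_normal(6)
x_plus = Ap @ y
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]   # also the min norm solution
assert np.allclose(x_plus, x_lstsq)
```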

14.18 The Spectral Norm And The Operator Norm

Another way of describing a norm for an n × n matrix is as follows. Definition 14.18.1 Let A be an m × n matrix. Define the spectral norm of A, written as ||A||2 to be { } max λ1/2 : λ is an eigenvalue of A∗ A . That is, the largest singular value of A. (Note the eigenvalues of A∗ A are all positive because if A∗ Ax = λx, then 2 λ |x| = λ (x, x) = (A∗ Ax, x) = (Ax,Ax) ≥ 0.) Actually, this is nothing new. It turns out that ||·||2 is nothing more than the operator norm for A taken with respect to the usual Euclidean norm, ( |x| =

n ∑

)1/2 2

|xk |

.

k=1

Proposition 14.18.2 The following holds. ||A||2 = sup {|Ax| : |x| = 1} ≡ ||A|| . Proof: Note that A∗ A is Hermitian and so by Corollary 14.12.6, { } { } 1/2 1/2 ||A||2 = max (A∗ Ax, x) : |x| = 1 = max (Ax,Ax) : |x| = 1 = max {|Ax| : |x| = 1} = ||A|| .  Here is another ( proof of)this proposition. Recall there are unitary matrices of the right size U, V σ 0 such that A = U V ∗ where the matrix on the inside is as described in the section on the 0 0 singular value decomposition. Then since unitary matrices preserve norms,

||A||

( = sup U |x|≤1 ( = sup U |y|≤1

σ 0 σ 0

( ) σ 0 ∗ V x = sup U V x |V ∗ x|≤1 0 0 ( ) ) σ 0 0 y = σ 1 ≡ ||A||2 y = sup 0 0 0 |y|≤1

0 0

)



This completes the alternate proof. From now on, ||A||2 will mean either the operator norm of A taken with respect to the usual Euclidean norm or the largest singular value of A, whichever is most convenient.
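Proposition 14.18.2 can be confirmed directly: the largest singular value, the square root of the largest eigenvalue of A^*A, and np.linalg.norm(A, 2) all agree, and sampled values of |Ax| for unit x never exceed it. A sketch assuming NumPy; the operator norm is only estimated by random sampling here.

```python
# The spectral norm ||A||_2 is the largest singular value and equals the
# Euclidean operator norm sup{|Ax| : |x| = 1} (Proposition 14.18.2).
# Sketch assuming NumPy.
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((4, 6))

sigma_max = np.linalg.svd(A, compute_uv=False)[0]     # largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma_max)
assert np.isclose(np.sqrt(np.linalg.eigvalsh(A.T @ A).max()), sigma_max)

# crude lower estimate of the operator norm by sampling unit vectors
xs = rng.standard_normal((2000, 6))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
estimate = max(np.linalg.norm(A @ x) for x in xs)
assert estimate <= sigma_max + 1e-12                  # never exceeds ||A||_2
```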

14.19 The Positive Part Of A Hermitian Matrix

Actually, some of the most interesting functions of matrices do not come as a power series expanded about 0 which was presented earlier. One example of this situation has already been encountered in the proof of the right polar decomposition with the square root of an Hermitian transformation which had all nonnegative eigenvalues. Another example is that of taking the positive part of an Hermitian matrix. This is important in some physical models where something may depend on the positive part of the strain which is a symmetric real matrix. Obviously there is no way to consider this as a power series expanded about 0 because the function f (r) = r+ ≡ |r|+r is not even 2 differentiable at 0. Therefore, a totally different approach must be considered. Actually, the only use of this I know of involves real symmetric matrices but the general case is considered here. First the notion of a positive part is defined. Definition 14.19.1 Let A be an Hermitian matrix. Thus it suffices to consider A as an element of L (Fn , Fn ) according to the usual notion of matrix multiplication. Then there is a unitary matrix U such that A = U DU ∗ where D is a diagonal matrix. Then

A+ ≡ U D + U ∗

where D+ is obtained from D by replacing each diagonal entry with its positive part. This gives us a nice definition of what is meant but it turns out to be very important in the applications to determine how this function depends on the choice of symmetric matrix A. The following addresses this question. Then ∑ Ax = λi ui u∗i x i

You can see this is the case by checking on the uj . A agrees with give the same result for all vectors. Thus similarly ∑ ∗ A+ = λ+ i ui ui

∑ i

λi ui u∗i on a basis and so they

i

Theorem 14.19.2 If A, B be Hermitian matrices, then for |·| the Frobenius norm, |A+ − B+ | ≤ |A − B| . ∑ Proof: Let A = i λi vi vi∗ and let B = j µj wj wj∗ where {vi } and {wj } are orthonormal bases of eigenvectors. Now A+ , B+ are Hermitian and so their difference is also. It follows that ∑

 |A+ − B+ | = trace  2



∗ λ+ i vi vi −

i



2 ∗ µ+ = j wj wj

j

 ∑ ( )2 ∑ ( )2 trace  vi vi∗ + λ+ µ+ wj wj∗ i j i



∑ i,j

+ λ+ i µj

(wj , vi ) vi wj∗

j





 + λ+ i µj

(vi , wj ) wj vi∗ 

i,j

The trace satisfies trace (AB) = trace (BA) when both products make sense. Therefore, ( ) ( ) trace vi wj∗ = trace wj∗ vi = wj∗ vi ≡ (vi , wj ) ,



a similar formula for wj vi∗ . Therefore, this equals ∑ ( )2 ∑ ( )2 ∑ 2 + = λ+ + µ+ −2 λ+ i i µj |(vi , wj )| . j i

j

Since these are orthonormal bases, ∑

2

|(vi , wj )| = 1 =

i

and so 14.38 equals =

2

|(vi , wj )|

λ+ i

) ( )2 2 + + |(vi , wj )| . + µ+ − 2λ µ i j j

)2

j

Similarly, 2

|A − B| =

∑∑( i

2

) ( )2 2 2 (λi ) + µj − 2λi µj |(vi , wj )| .

j

(

Now it is easy to check that (λi ) + µj

14.20

∑ j

∑ ∑ (( i

(14.38)

i,j

)2

( )2 ( + ) 2 + − 2λi µj ≥ λ+ + µj − 2λ+ i i µj . 

Exercises ∗



1. Show (A∗ ) = A and (AB) = B ∗ A∗ . 2. Prove Corollary 14.12.10. 3. Show that if A is an n × n matrix which has an inverse then A+ = A−1 . 4. Using the singular value decomposition, show that for any square matrix A, it follows that A∗ A is unitarily similar to AA∗ . 5. Let A, B be a m × n matrices. Define an inner product on the set of m × n matrices by (A, B)F ≡ trace (AB ∗ ) . Show this is an inner∑ product satisfying all the inner product axioms. Recall for M an n × n n matrix, trace (M ) ≡ i=1 Mii . The resulting norm, ||·||F is called the Frobenius norm and it can be used to measure the distance between two matrices. 6. It was shown that a matrix A is normal if and only if it is unitarily similar to a diagonal matrix. It was also shown that if a matrix is Hermitian, then it is unitarily similar to a real diagonal matrix. Show the converse of this last statement is also true. If a matrix is unitarily similar to a real diagonal matrix, then it is Hermitian. ∑ 2 7. Let A be an m × n matrix. Show ||A||F ≡ (A, A)F = j σ 2j where the σ j are the singular values of A. 8. If A is a general n × n matrix having possibly repeated eigenvalues, show there is a sequence {Ak } of n × n matrices having distinct eigenvalues which has the property that the ij th entry of Ak converges to the ij th entry of A for all ij. Hint: Use Schur’s theorem. 9. Prove the Cayley Hamilton theorem as follows. First suppose A has a basis of eigenvectors n {vk }k=1 , Avk = λk vk . Let p (λ) be the characteristic polynomial. Show p (A) vk = p (λk ) vk = 0. Then since {vk } is a basis, it follows p (A) x = 0 for all x and so p (A) = 0. Next in the general case, use Problem 8 to obtain a sequence {Ak } of matrices whose entries converge to the entries of A such that Ak has n distinct eigenvalues and therefore by Theorem 6.6.1 on Page 126 Ak has a basis of eigenvectors. Therefore, from the first part and for pk (λ) the characteristic polynomial for Ak , it follows pk (Ak ) = 0. Now explain why and the sense in which limk→∞ pk (Ak ) = p (A) .



10. Show directly that if A is an n×n matrix and A = A∗ (A is Hermitian) then all the eigenvalues are real and eigenvectors can be assumed to be real and that eigenvectors associated with distinct eigenvalues are orthogonal, (their inner product is zero). 11. Let v1 , · · · , vn be an orthonormal basis for Fn . Let Q be a matrix whose ith column is vi . Show Q∗ Q = QQ∗ = I. 12. Show that an n × n matrix Q is unitary if and only if it preserves distances. This means |Qv| = |v| . This was done in the text but you should try to do it for yourself. 13. Suppose {v1 , · · · , vn } and {w1 , · · · , wn } are two orthonormal bases for Fn and suppose Q is an n × n matrix satisfying Qvi = wi . Then show Q is unitary. If |v| = 1, show there is a unitary transformation which maps v to e1 . This is done in the text but do it yourself with all details. 14. Let A be a Hermitian matrix so A = A∗ and suppose all eigenvalues of A are larger than δ 2 . Show 2 (Av, v) ≥ δ 2 |v| ∑n Where here, the inner product is (v, u) ≡ j=1 vj uj . 15. The discrete Fourier transform maps Cn → Cn as follows. n−1 1 ∑ −i 2π jk e n xj . F (x) = z where zk = √ n j=0

Show that F −1 exists and is given by the formula n−1 1 ∑ i 2π jk F −1 (z) = x where xj = √ e n zk n j=0

Here is one way to approach this problem. Note z = U x where    1  U=√  n  

e−i n 0·0 2π e−i n 0·1 2π e−i n 0·2 ...

e−i n 1·0 2π e−i n 1·1 2π e−i n 1·2 .. .

e−i n 2·0 2π e−i n 2·1 2π e−i n 2·2 .. .

··· ··· ···

e−i n (n−1)·0 2π e−i n (n−1)·1 2π e−i n (n−1)·2 .. .

e−i n 0·(n−1)

e−i n 1·(n−1)

e−i n 2·(n−1)

···

e−i n (n−1)·(n−1)















       



Now argue U is unitary and use this to establish the result. To show this verify each row 2π has length 1 and the inner product of two different rows gives 0. Now Ukj = e−i n jk and so 2π (U ∗ )kj = ei n jk . 16. Let f be a periodic function having period 2π. The Fourier series of f is an expression of the form ∞ n ∑ ∑ ck eikx ≡ lim ck eikx n→∞

k=−∞

k=−n

and the idea is to find ck such that the above sequence converges in some way to f . If f (x) =

∞ ∑ k=−∞

ck eikx



and you formally multiply both sides by e^{−imx} and then integrate from 0 to 2π, interchanging the integral with the sum without any concern for whether this makes sense, show it is reasonable from this to expect

cm = (1/2π) ∫_0^{2π} f (x) e^{−imx} dx.

Now suppose you only know f (x) at equally spaced points 2πj/n for j = 0, 1, · · · , n. Consider the Riemann sum for this integral obtained from using the left endpoint of the subintervals determined from the partition {(2π/n) j}_{j=0}^n. How does this compare with the discrete Fourier transform? What happens as n → ∞ to this approximation?

17. Suppose A is a real 3 × 3 orthogonal matrix (recall this means AA^T = A^T A = I) having determinant 1. Show it must have an eigenvalue equal to 1. Note this shows there exists a vector x ≠ 0 such that Ax = x. Hint: Show first or recall that any orthogonal matrix must preserve lengths. That is, |Ax| = |x|.

18. Let A be a complex m × n matrix. Using the description of the Moore Penrose inverse in terms of the singular value decomposition, show that

lim_{δ→0+} (A∗A + δI)^{−1} A∗ = A+

where the convergence happens in the Frobenius norm. Also verify, using the singular value decomposition, that the inverse exists in the above formula. Observe that this shows that the Moore Penrose inverse is unique.

19. Show that A+ = (A∗A)+ A∗. Hint: You might use the description of A+ in terms of the singular value decomposition.

20. In Theorem 14.11.1. Show that every matrix which commutes with A also commutes with A1/k the unique nonnegative self adjoint k th root. 21. Let X be a finite dimensional inner product space and let β = {u1 , · · · , un } be an orthonormal basis for X. Let A ∈ L (X, X) be self adjoint and nonnegative and let M be its matrix with respect to the given orthonormal basis. Show that M is nonnegative, self adjoint also. Use this to show that A has a unique nonnegative self adjoint k th root. ( ) σ 0 22. Let A be a complex m × n matrix having singular value decomposition U ∗ AV = 0 0 as explained above, where σ is k × k. Show that ker (A) = span (V ek+1 , · · · , V en ) , the last n − k columns of V . 23. The principal submatrices of an n × n matrix A are Ak where Ak consists those entries which are in the first k rows and first k columns of A. Suppose A is a real symmetric matrix and that x → ⟨Ax, x⟩ is positive definite. This means that if x ̸= 0, then ⟨Ax, x⟩ > 0. Show ( that ) ( ) x T each of the principal submatrices are positive definite. Hint: Consider x 0 A 0 where x consists of k entries. 24. ↑A matrix A has an LU factorization if it there exists a lower triangular matrix L having all ones on the diagonal and an upper triangular matrix U such that A = LU . Show that if A is a symmetric positive definite n × n real matrix, then A has an LU factorization with the


CHAPTER 14. MATRICES AND THE INNER PRODUCT property that each entry on the main diagonal in U is positive. Hint: This is pretty clear if A is 1×1. Assume true for (n − 1) × (n − 1). Then ( ) Aˆ a A= aT ann Then as above, Aˆ is positive definite. Thus it has an LU factorization with all positive entries on the diagonal of U . Notice that, using block multiplication, ( ) ( )( ) LU a L 0 U L−1 a A= = aT ann 0 1 aT ann ˜U ˜ where U ˜ has all positive Now consider that matrix on the right. Argue that it is of the form L diagonal entries except possibly for the one in the nth row and nth column. Now explain why ˜ are positive. det (A) > 0 and argue that in fact all diagonal entries of U

25. ↑Let A be a real symmetric n × n matrix and A = LU where L has all ones down the diagonal and U has all positive entries down the main diagonal. Show that A = LDH where L is lower triangular and H is upper triangular, each having all ones down the diagonal and D a diagonal matrix having all positive entries down the main diagonal. In fact, these are the diagonal entries of U . 26. ↑Show that if L, L1 are lower triangular with ones down the main diagonal and H, H1 are upper triangular with all ones down the main diagonal and D, D1 are diagonal matrices having all positive diagonal entries, and if LDH = L1 D1 H1 , then L = L1 , H = H1 , D = D1 . Hint: −1 Explain why D1−1 L−1 . Then explain why the right side is upper triangular and 1 LD = H1 H the left side is lower triangular. Conclude these are both diagonal matrices. However, there are all ones down the diagonal in the expression on the right. Hence H = H1 . Do something similar to conclude that L = L1 and then that D = D1 . 27. ↑Show that if A is a symmetric real matrix such that x → ⟨Ax, x⟩ is positive definite, then there exists a lower triangular matrix L having all positive entries down the diagonal such that A = LLT . Hint: From the above, A = LDH where L, H are respectively lower and upper triangular having all ones down the diagonal and D is a diagonal matrix having all positive entries. Then argue from the above problem and symmetry of A that H = LT . Now modify L by making it equal to LD1/2 . This is called the Cholesky factorization. 28. Given F ∈ L (X, Y ) where X, Y are inner product spaces and dim (X) = n ≤ m = dim (Y ) , there exists R, U such that U is nonnegative and Hermitian (U = U ∗ ) and R∗ R = I such that F = RU. Show that U is actually unique and that R is determined on U (X) . This was done in the book, but try to remember why this is so. 29. If A is a complex Hermitian n × n matrix which has all eigenvalues nonnegative, show that there exists a complex Hermitian matrix B such that BB = A. 30. ↑Suppose A, B are n × n real Hermitian matrices and they both have all nonnegative eigenvalues. Show that det (A + B) ≥ det (A)+det (B). Hint: Use the above problem and the Cauchy Binet theorem. Let P 2 = A, Q2 = B where P, Q are Hermitian and nonnegative. Then ( ) ( ) P A+B = P Q . Q (

) α c∗ is an (n + 1) × (n + 1) Hermitian nonnegative matrix where α is a 31. Suppose B = b A scalar and A is n × n. Show that α must be real, c = b, and A = A∗ , A is nonnegative, and that if α = 0, then b = 0. Otherwise, α > 0.


32. ↑If A is an n × n complex Hermitian and nonnegative matrix, show that there exists an upper triangular matrix B such that B ∗ B = A. Hint: Prove this by induction. It is obviously true if n = 1. Now if you have an (n ( + 1) × (n + 1))Hermitian nonnegative matrix, then from the α2 αb∗ above problem, it is of the form , α real. αb A 33. ↑ Suppose A is a nonnegative Hermitian matrix (all eigenvalues are nonnegative) which is partitioned as ( ) A11 A12 A= A21 A22 where A11 , A22 are square matrices. Show that det (A) ≤ det (A11 ) det (A22 ). Hint: Use the above problem to factor A getting )( ) ( ∗ B11 B12 0∗ B11 A= ∗ ∗ 0 B22 B12 B22 ∗ ∗ ∗ B22 . Use the Cauchy Binet theorem to Next argue that A11 = B11 B11 , A22 = B12 B12 + B22 ∗ ∗ ∗ B22 ) . Then explain why argue that det (A22 ) = det (B12 B12 + B22 B22 ) ≥ det (B22

det (A) =

∗ ∗ det (B11 ) det (B22 ) det (B11 ) det (B22 )

∗ ∗ = det (B11 B11 ) det (B22 B22 )

34. ↑ Prove the inequality of Hadamard. If A is ∏a Hermitian matrix which is nonnegative (all eigenvalues are nonnegative), then det (A) ≤ i Aii .
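For readers who want a numerical companion to these problems, here is a small Matlab spot-check of the claim in Problem 15 that the matrix U there is unitary. It is only a sketch, not part of the exercises; the size n = 8 and the random test vector are arbitrary choices.

n = 8;
[J,K] = meshgrid(0:n-1);               % J and K run over 0,...,n-1
U = exp(-2i*pi*J.*K/n)/sqrt(n);        % the matrix U of Problem 15
norm(U'*U - eye(n))                    % of the order of machine precision, so U is unitary
x = randn(n,1);
norm(U'*(U*x) - x)                     % F^{-1}(F(x)) = x
norm(fft(x) - sqrt(n)*U*x)             % Matlab's fft agrees with sqrt(n)*U*x up to rounding

The last line is only a consistency check against the built-in fast Fourier transform; the exercise itself asks for a proof.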


Chapter 15

Analysis Of Linear Transformations

15.1 The Condition Number

Let A ∈ L (X, X) be a linear transformation where X is a finite dimensional vector space and consider the problem Ax = b where it is assumed there is a unique solution to this problem. How does the solution change if A is changed a little bit and if b is changed a little bit? This is clearly an interesting question because you often do not know A and b exactly. If a small change in these quantities results in a large change in the solution x, then it seems clear this would be undesirable. In what follows ||·||, when applied to a linear transformation, will always refer to the operator norm. Recall the following property of the operator norm in Theorem 11.6.3.

Lemma 15.1.1 Let A, B ∈ L (X, X) where X is a normed vector space as above. Then for ||·|| denoting the operator norm, ||AB|| ≤ ||A|| ||B||.

Lemma 15.1.2 Let A, B ∈ L (X, X), A−1 ∈ L (X, X), and suppose ||B|| < 1/||A−1||. Then (A + B)−1 and (I + A−1B)−1 exist and

||(I + A−1B)−1|| ≤ (1 − ||A−1B||)−1,   (15.1)

||(A + B)−1|| ≤ ||A−1|| · 1/(1 − ||A−1B||).   (15.2)

The above formulas make sense because ||A−1B|| < 1.

Proof: By Lemma 11.6.3,

||A−1B|| ≤ ||A−1|| ||B|| < ||A−1|| (1/||A−1||) = 1.

Then from the triangle inequality,

||(I + A−1B) x|| ≥ ||x|| − ||A−1Bx|| ≥ ||x|| − ||A−1B|| ||x|| = (1 − ||A−1B||) ||x||.   (15.3)

It follows that I + A−1B is one to one because, from 15.3, 1 − ||A−1B|| > 0. Thus if (I + A−1B) x = 0, then x = 0. Thus I + A−1B is also onto, taking a basis to a basis. Then a generic y ∈ X is of the form y = (I + A−1B) x and the above shows that

||(I + A−1B)−1 y|| ≤ (1 − ||A−1B||)−1 ||y||

which verifies 15.1. Thus (A + B) = A (I + A−1B) is one to one and this with Lemma 11.6.3 implies 15.2. ∎

Proposition 15.1.3 Suppose A is invertible, b ≠ 0, Ax = b, and (A + B) x1 = b1 where ||B|| < 1/||A−1||. Then

||x1 − x|| / ||x|| ≤ ( ||A−1|| ||A|| / (1 − ||A−1B||) ) ( ||b1 − b||/||b|| + ||B||/||A|| ).

Proof: This follows from the above lemma.

||x1 − x|| / ||x|| = ||(I + A−1B)−1 A−1 b1 − A−1 b|| / ||A−1 b||

≤ (1/(1 − ||A−1B||)) ||A−1 b1 − (I + A−1B) A−1 b|| / ||A−1 b||

= (1/(1 − ||A−1B||)) ||A−1 (b1 − b) − A−1BA−1 b|| / ||A−1 b||

≤ (||A−1|| / (1 − ||A−1B||)) ( ||b1 − b|| / ||A−1 b|| + ||B|| )

because A−1 b / ||A−1 b|| is a unit vector. Now multiply and divide by ||A||. Then this is

≤ ( ||A−1|| ||A|| / (1 − ||A−1B||) ) ( ||b1 − b|| / (||A|| ||A−1 b||) + ||B||/||A|| )

≤ ( ||A−1|| ||A|| / (1 − ||A−1B||) ) ( ||b1 − b||/||b|| + ||B||/||A|| ). ∎

This shows that the number ||A−1|| ||A|| controls how sensitive the relative change in the solution of Ax = b is to small changes in A and b. This number is called the condition number. It is bad when this number is large because a small relative change in b, for example, could yield a large relative change in x. Recall that for A an n × n matrix, ||A||2 = σ1 where σ1 is the largest singular value. The largest singular value of A−1 is therefore 1/σn where σn is the smallest singular value of A. Therefore, the condition number is controlled by σ1/σn, the ratio of the largest to the smallest singular value of A, provided the norm is the usual Euclidean norm.
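As a quick numerical illustration of this sensitivity, here is a minimal Matlab sketch. The 8 × 8 Hilbert matrix is only a convenient, notoriously ill conditioned example, and the perturbation size is an arbitrary choice; the point is that the relative change in x can be roughly the condition number times the relative change in b.

A = hilb(8);                    % a famously ill conditioned symmetric matrix
s = svd(A);
kappa = s(1)/s(end)             % sigma_1/sigma_n, the same value cond(A) returns
b = A*ones(8,1);
db = 1e-10*randn(8,1);          % a tiny perturbation of b
x = A\b;  x1 = A\(b + db);
relx = norm(x1 - x)/norm(x)     % relative change in the solution
relb = norm(db)/norm(b)         % relative change in the data; relx can be near kappa*relb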

15.2

The Spectral Radius

Even though it is in general impractical to compute the Jordan form, its existence is all that is needed in order to prove an important theorem about something which is relatively easy to compute. This is the spectral radius of a matrix. Definition 15.2.1 Define σ (A) to be the eigenvalues of A. Also, ρ (A) ≡ max (|λ| : λ ∈ σ (A)) The number, ρ (A) is known as the spectral radius of A. Recall the following symbols and their meaning. lim sup an , lim inf an n→∞

n→∞


They are respectively the largest and smallest limit points of the sequence {an} where ±∞ is allowed in the case where the sequence is unbounded. They are also defined as

lim sup_{n→∞} an ≡ lim_{n→∞} (sup {ak : k ≥ n}),

lim inf_{n→∞} an ≡ lim_{n→∞} (inf {ak : k ≥ n}).

Thus, the limit of the sequence exists if and only if these are both equal to the same real number.

Lemma 15.2.2 Let J be a p × p Jordan block, the matrix having λ in every entry of the main diagonal, 1 in every entry of the superdiagonal, and 0 elsewhere. Then

lim_{n→∞} ||J^n||^{1/n} = |λ|.

Proof: The norm on matrices can be any norm. It could be operator norm for example. If λ = 0, there is nothing to show because J p = 0 and so the limit is obviously 0. Therefore, assume λ ̸= 0. ) ( p ∑ n Jn = N i λn−i i i=0 Then ∥J n ∥ ≤

p ∑

(

i=0 n

n i

)

p ∑

i n−i n n

N |λ| = |λ| + C |λ|

(

i=1

n i

n ˜ p ≤ |λ| (1 + Cnp ) ≤ |λ| Cn

∑p −i where the C depends on i=1 |λ| and the N i . Therefore,

lim sup ∥J n ∥

1/n

n→∞

≤ |λ| lim sup

(

˜ p Cn

) |λ|

−i

(15.4) (15.5)

)1/n

n→∞

= |λ|

Next let x be an eigenvector for λ such that ∥x∥ = 1, the norm being whatever norm is desired. Then Jx = λx. It follows that J n x = λn x Thus n

∥J n ∥ ≥ ∥J n x∥ = |λ| ∥x∥ = |λ|

n

It follows that lim inf ∥J n ∥

1/n

n→∞

≥ |λ|

Therefore, 1/n

|λ| ≤ lim inf ∥J n ∥ n→∞

1/n

≤ lim sup ∥J n ∥ n→∞

≤ |λ|

1/n

which shows that limn→∞ ∥J n ∥ = |λ|. The same conclusion holds for any other norm. Indeed, if |||·||| were another norm, there are constants δ, ∆ such that 1/n

δ 1/n ∥J n ∥

1/n

≤ |||J n |||

≤ ∆1/n ∥J n ∥

1/n

Then since limn→∞ δ 1/n = limn→∞ ∆1/n = 1, the squeezing theorem from calculus implies that lim |||J n |||

n→∞

1/n

= ρ. 


Corollary 15.2.3 Let J be in Jordan canonical form  J1  .. J = . 

    Js

where each Jk is a block diagonal having λk on the main diagonal and strings of ones on the super diagonal, as described earlier. Also let ρ ≡ max {|λi | : λi ∈ σ (J)} . Then for any norm ∥·∥ 1/n

lim ∥J n ∥



n→∞

Proof: For convenience, take the norm to be given as ∥A∥ ≡ max {|Aij | , i, j}. Then with this norm, { } 1/n 1/n ∥J n ∥ = max ∥Jkn ∥ , k = 1, · · · , s From Lemma 15.2.2, lim ∥Jkn ∥

1/n

n→∞

= |λk |

Therefore, lim ∥J n ∥

1/n

n→∞

(

{ }) 1/n max ∥Jkn ∥ , k = 1, · · · , s n→∞ ( ) 1/n = max lim ∥Jkn ∥ = max |λk | = ρ.

=

lim k

n→∞

k

Now let the norm on the matrices be any other norm say |||·||| . By equivalence of norms, there are δ, ∆ such that δ ∥A∥ ≤ |||A||| ≤ ∆ ∥A∥ for all matrices A. Therefore, 1/n

δ 1/n ∥J n ∥

1/n

1/n

≤ |||J n |||

≤ ∆1/n ∥J n ∥

and so, passing to a limit, it follows that, since limn→∞ δ 1/n = limn→∞ ∆1/n = 1, 1/n

ρ = lim |||J n ||| n→∞



Theorem 15.2.4 (Gelfand) Let A be a complex p × p matrix. Then if ρ is the absolute value of its largest eigenvalue,

lim_{n→∞} ||A^n||^{1/n} = ρ.

Here ||·|| is any norm on L (Cn, Cn).

Proof: Let ||·|| be the operator norm on L (Cn , Cn ). Then letting J denote the Jordan form of A, S −1 AS = J and these two J, A have the same eigenvalues. Thus it follows from Corollary 15.2.3 1/n

lim sup ||An ||

=

n→∞

1/n

)1/n ( ≤ lim sup ∥S∥ S −1 ∥J n ∥ lim sup SJ n S −1 n→∞

n→∞

=

)1/n n 1/n

∥J ∥ lim ∥S∥ S −1 =ρ (

n→∞

Letting λ be the largest eigenvalue of A, |λ| = ρ, and Ax = λx where ∥x∥ = 1, ∥An ∥ ≥ ∥An x∥ = ρn and so 1/n

lim inf ∥An ∥ n→∞

1/n

≥ ρ ≥ lim sup ∥An ∥ n→∞

It follows that lim inf_{n→∞} ||An||^{1/n} = lim sup_{n→∞} ||An||^{1/n} = lim_{n→∞} ||An||^{1/n} = ρ. As in Corollary 15.2.3, there is no difference if any other norm is used because they are all equivalent. ∎

I would argue that a better way to prove this theorem is to use the theory of complex analysis and tie it in with a Laurent series. However, there is a non trivial issue related to the set of convergence of the Laurent series which involves the theory of functions of a complex variable and this knowledge is not being assumed here. Thus the above gives an algebraic proof which does not involve so much hard analysis.

Example 15.2.5 Consider the matrix

[ 9 −1 2 ; −2 8 4 ; 1 1 8 ].

Estimate the absolute value of the largest eigenvalue.

A laborious computation reveals the eigenvalues are 5 and 10. Therefore, the right answer in this case is 10. Consider ||A^9||^{1/9} where the norm is obtained by taking the maximum of all the absolute values of the entries. Thus

[ 9 −1 2 ; −2 8 4 ; 1 1 8 ]^9 = [ 800 390 625  −199 609 375  399 218 750 ; −399 218 750  600 781 250  798 437 500 ; 199 609 375  199 609 375  600 781 250 ]

and taking the ninth root of the largest entry gives ρ (A) ≈ 800 390 625^{1/9} = 9.7556. Of course the interest lies primarily in matrices for which the exact roots to the characteristic equation are not known and in the theoretical significance.
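The computation in this example is easy to reproduce. Here is a minimal Matlab sketch of the same estimate; the exponent 9 is the one used above, and larger powers give better estimates.

A = [9 -1 2; -2 8 4; 1 1 8];
n = 9;
rho_est = max(max(abs(A^n)))^(1/n)   % about 9.756, as computed above
max(abs(eig(A)))                     % the true spectral radius, 10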

15.3

Series And Sequences Of Linear Operators

Before beginning this discussion, it is necessary to define what is meant by convergence in L (X, Y ) . ∞

Definition 15.3.1 Let {Ak }k=1 be a sequence in L (X, Y ) where X, Y are finite dimensional normed linear spaces. Then limn→∞ Ak = A if for every ε > 0 there exists N such that if n > N, then ||A − An || < ε. Here the norm refers to any of the norms defined on L (X, Y ) . By Corollary 11.5.4 and Theorem 5.1.4 it doesn’t matter which one is used. Define the symbol for an infinite sum in the usual way. Thus n ∞ ∑ ∑ Ak Ak ≡ lim k=1 ∞ {Ak }k=1

Lemma 15.3.2 Suppose normed linear spaces. Then if

n→∞

k=1

is a sequence in L (X, Y ) where X, Y are finite dimensional ∞ ∑

||Ak || < ∞,

k=1

It follows that

∞ ∑

Ak

k=1

exists (converges). In words, absolute convergence implies convergence. Also,





∑ ∑

Ak ≤ ∥Ak ∥

k=1

k=1

(15.6)

378

CHAPTER 15. ANALYSIS OF LINEAR TRANSFORMATIONS Proof: For p ≤ m ≤ n,

m n ∞ ∑ ∑ ∑ Ak − Ak ≤ ||Ak || k=1

k=1

k=p

and so for p large enough, this term on the right in the above inequality is less than ε. Since ε is arbitrary, this shows the partial sums of 15.6 are a Cauchy sequence. Therefore by Corollary 11.5.4 it follows that these partial sums converge. As to the last claim,

n n ∞



∑ ∑

Ak ≤ ∥Ak ∥ ≤ ∥Ak ∥

k=1

Therefore, passing to the limit,

k=1

k=1





∑ ∑

Ak ≤ ∥Ak ∥ . 

k=1

k=1

Why is this last step justified? (Recall the triangle inequality |∥A∥ − ∥B∥| ≤ ∥A − B∥. ) Now here is a useful result for differential equations. Theorem 15.3.3 Let X be a finite dimensional inner product space and let A ∈ L (X, X) . Define Φ (t) ≡

∞ k k ∑ t A

k!

k=0

Then the series converges for each t ∈ R. Also ∞



k=1

k=0

∑ tk Ak Φ (t + h) − Φ (t) ∑ tk−1 Ak = =A = AΦ (t) h→0 h (k − 1)! k!

Φ′ (t) ≡ lim

Also AΦ (t) = Φ (t) A and for all t, Φ (t) Φ (−t) = I so Φ (t) that A0 = I in the above formula.)

−1

= Φ (−t), Φ (0) = I. (It is understood

Proof: First consider the claim about convergence. ∞ ∞ k k k k ∑

t A ∑ |t| ∥A∥

≤ = e|t|∥A∥ < ∞

k! k! k=0

k=0

so it converges by Lemma 15.3.2. Φ (t + h) − Φ (t) h

=

=

( ) k k ∞ (t + h) − t Ak ∑ 1 h k! k=0 ( ) k−1 ∞ h Ak 1 ∑ k (t + θk h) h

k=0

k!

=

∞ k−1 k ∑ (t + θk h) A k=1

(k − 1)!

this by the mean value theorem. Note that the series converges thanks to Lemma 15.3.2. Here θk ∈ (0, 1). Thus ( )



∞ (t + θ h)k−1 − tk−1 Ak ∞

Φ (t + h) − Φ (t) ∑ k−1 k ∑ k

t A



=

h (k − 1)! (k − 1)!

k=1

k=1

( ) ( )



∞ (k − 1) (t + τ θ h)k−2 θ h Ak

∞ (t + τ θ h)k−2 θ Ak k k k k k k





= |h|

=



(k − 1)! (k − 2)!

k=1

k=2

15.3. SERIES AND SEQUENCES OF LINEAR OPERATORS ≤ |h|

∞ k−2 k−2 ∑ (|t| + |h|) ∥A∥

(k − 2)!

k=2

2

379 2

∥A∥ = |h| e(|t|+|h|)∥A∥ ∥A∥ 2

so letting |h| < 1, this is no larger than |h| e^{(|t|+1)||A||} ||A||². Hence the desired limit is valid. It is obvious that AΦ (t) = Φ (t) A. Also the formula shows that Φ′ (t) = AΦ (t) = Φ (t) A, Φ (0) = I.

Now consider the claim about Φ (−t). The above computation shows that Φ′ (−t) = AΦ (−t) and so (d/dt) (Φ (−t)) = −Φ′ (−t) = −AΦ (−t). Now let x, y be two vectors in X and consider (Φ (−t) Φ (t) x, y)_X. This equals (x, y) when t = 0. Take its derivative:

((−Φ′ (−t) Φ (t) + Φ (−t) Φ′ (t)) x, y)_X = ((−AΦ (−t) Φ (t) + Φ (−t) AΦ (t)) x, y)_X = (0, y)_X = 0.

Hence this scalar valued function equals a constant and so the constant must be (x, y)_X. Hence for all x, y, (Φ (−t) Φ (t) x − x, y)_X = 0, and this is so in particular for y = Φ (−t) Φ (t) x − x, which shows that Φ (−t) Φ (t) = I. ∎

In fact, one can prove a group identity of the form Φ (t + s) = Φ (t) Φ (s) for all t, s ∈ R.

Corollary 15.3.4 Let Φ (t) be given as above. Then Φ (t + s) = Φ (t) Φ (s) for any s, t ∈ R.

Proof: Let y (t) ≡ (Φ (t) Φ (s) − Φ (t + s)) x. Thus y (0) = 0. Now pick z ∈ X. Then

y′ (t) = Ay (t), (y′ (t), z) = (Ay (t), z).

Now, as above,

d dt

(Φ (−t)) = −AΦ (−t) and so d (Φ (−t) y (t) , z) = (−AΦ (−t) y (t) + Φ (−t) y′ (t) , z) dt

=

(−AΦ (−t) y (t) + Φ (−t) y′ (t) , z) = (−AΦ (−t) y (t) + Φ (−t) Ay (t) , z)

=

(−AΦ (−t) y (t) + AΦ (−t) y (t) , z) = 0

It follows from beginning calculus that (Φ (−t) y (t), z) = C. However, y (0) = 0 and so C = 0. Since z is arbitrary, it follows that Φ (−t) y (t) = 0. By Theorem 15.3.3, y (t) = 0. Now x was arbitrary so also Φ (t) Φ (s) − Φ (t + s) = 0 which proves the corollary. ∎

Note how this also shows that Φ (t) commutes with Φ (s) for any t, s. Also note that all of this works with no change if A ∈ L (X, X) where X is a Hilbert space, possibly not finite dimensional. In fact you don't even need a Hilbert space. It would work fine with a Banach space, and you would replace the inner product with the pairing with the dual space, but this requires more functional analysis than what is considered here.
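Here is a small numerical sanity check of Theorem 15.3.3 and Corollary 15.3.4. It is only a sketch: the 3 × 3 matrix is an arbitrary choice and Matlab's built-in expm plays the role of Φ (t) = e^{tA}.

A = [0 1 0; -2 0 1; 0 0 -1];           % an arbitrary test matrix
Phi = @(t) expm(t*A);                   % stand-in for the series defining Phi(t)
t = 0.7; s = -0.3; h = 1e-6;
norm(Phi(t)*Phi(-t) - eye(3))           % Phi(t)Phi(-t) = I
norm(Phi(t+s) - Phi(t)*Phi(s))          % the group identity of Corollary 15.3.4
norm((Phi(t+h) - Phi(t))/h - A*Phi(t))  % Phi'(t) = A Phi(t), up to O(h)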

As a special case, suppose λ ∈ C and consider

∑_{k=0}^{∞} t^k λ^k / k!

where t ∈ R. In this case the kth term is t^k λ^k / k! and you can think of it as being in L (C, C). Then the following


Corollary 15.3.5 Let f (t) ≡

∞ k k ∑ t λ k=0

k!

≡1+

∞ k k ∑ t λ k=1

k!

Then this function is a well defined complex valued function and furthermore, it satisfies the initial value problem, y ′ = λy, y (0) = 1 Furthermore, if λ = a + ib, |f | (t) = eat . Proof: The first part is a special case of the above theorem. Then ¯ y , y¯ (0) = 1 y¯′ = λ¯ It follows d 2 |y (t)| dy

= y ′ (t) y¯ (t) + y (t) y¯′ (t) 2

2

= (a + ib) |y (t)| + (a − ib) |y (t)| =

2

2

2a |y (t)| , |y (0)| = 1

It follows 2

|y (t)| = e2at , |y (t)| = eat as claimed. This follows because in general, if z ′ = cz, z (0) = 1, with c real it follows z (t) = ect . To see this, z ′ − cz = 0 and so, multiplying both sides by e−ct you get d ( −ct ) ze =0 dt and so ze−ct equals a constant which must be 1 because of the initial condition z (0) = 1.  Definition 15.3.6 The function in Corollary 15.3.5 given by that power series is denoted as exp (λt) or eλt . The next lemma is normally discussed in advanced calculus courses but is proved here for the convenience of the reader. It is known as the root test. Definition 15.3.7 For {an } any sequence of real numbers lim sup an ≡ lim (sup {ak : k ≥ n}) n→∞

n→∞

Similarly lim inf an ≡ lim (inf {ak : k ≥ n}) n→∞

n→∞

In case An is an increasing (decreasing) sequence which is unbounded above (below) then it is understood that limn→∞ An = ∞ (−∞) respectively. Thus either of lim sup or lim inf can equal +∞ or −∞. However, the important thing about these is that unlike the limit, these always exist.


It is convenient to think of these as the largest point which is the limit of some subsequence of {an} and the smallest point which is the limit of some subsequence of {an} respectively. Thus limn→∞ an exists and equals some point of [−∞, ∞] if and only if the two are equal.

Lemma 15.3.8 Let {ap} be a sequence of nonnegative terms and let

r = lim sup_{p→∞} ap^{1/p}.

Then if r < 1, it follows the series ∑_{k=1}^{∞} ak converges and if r > 1, then ap fails to converge to 0 so the series diverges. If A is an n × n matrix and

r = lim sup_{p→∞} ||A^p||^{1/p},   (15.7)

then if r > 1, then ∑_{k=0}^{∞} A^k fails to converge and if r < 1 then the series converges. Note that the series converges when the spectral radius is less than one and diverges if the spectral radius is larger than one. In fact, lim sup_{p→∞} ||A^p||^{1/p} = lim_{p→∞} ||A^p||^{1/p} from Theorem 15.2.4.

Proof: Suppose r < 1. Then there exists N such that if p > N, ap^{1/p} < R where R is chosen with r < R < 1. Hence ap < R^p for all p > N and so the series converges by comparison with the geometric series ∑ R^p. Next suppose r > 1. Then letting 1 < R < r, it follows there are infinitely many values of p at which R < ap^{1/p}, which implies R^p < ap, showing that ap cannot converge to 0 and so the series cannot converge either.

To see the last claim, if r > 1, then ||A^p|| fails to converge to 0 and so {∑_{k=0}^m A^k}_{m=0}^{∞} is not a Cauchy sequence. Hence ∑_{k=0}^{∞} A^k ≡ lim_{m→∞} ∑_{k=0}^m A^k cannot exist. If r < 1, then for all n large enough, ||A^n||^{1/n} ≤ r < 1 for some r, so ||A^n|| ≤ r^n. Hence ∑_n ||A^n|| converges and so by Lemma 15.3.2, it follows that ∑_{k=1}^{∞} A^k also converges. ∎

Now denote by σ (A)^p the collection of all numbers of the form λ^p where λ ∈ σ (A).

Lemma 15.3.9 σ (Ap ) = σ (A) ≡ {λp : λ ∈ σ (A)}. Proof: In dealing with σ (Ap ) , it suffices to deal with σ (J p ) where J is the Jordan form of A because J p and Ap are similar. Thus if λ ∈ σ (Ap ) , then λ ∈ σ (J p ) and so λ = α where α is one of the entries on the main diagonal of J p . These entries are of the form λp where λ ∈ σ (A). Thus p p λ ∈ σ (A) and this shows σ (Ap ) ⊆ σ (A) . Now take α ∈ σ (A) and consider αp . ( ) αp I − Ap = αp−1 I + · · · + αAp−2 + Ap−1 (αI − A) p

and so αp I − Ap fails to be one to one which shows that αp ∈ σ (Ap ) which shows that σ (A) ⊆ σ (Ap ) . 

15.4

Iterative Methods For Linear Systems

Consider the problem of solving the equation Ax = b

(15.8)

where A is an n × n matrix. In many applications, the matrix A is huge and composed mainly of zeros. For such matrices, the method of Gauss elimination (row operations) is not a good way to


solve the system because the row operations can destroy the zeros and storing all those zeros takes a lot of room in a computer. These systems are called sparse. The method is to write A = B − C where B−1 exists. Then the system is of the form Bx = Cx + b and so the solution x solves

x = B −1 Cx + B −1 b ≡ T x

In other words, you look for a fixed point of T . There are standard methods for finding such fixed points which hold in general Banach spaces which is the term for a complete normed linear space. Definition 15.4.1 A normed vector space, E with norm ||·|| is called a Banach space if it is also ∞ complete. This means that every Cauchy sequence converges. Recall that a sequence {xn }n=1 is a Cauchy sequence if for every ε > 0 there exists N such that whenever m, n > N, ||xn − xm || < ε. Thus whenever {xn } is a Cauchy sequence, there exists x such that lim ||x − xn || = 0.

n→∞

The following is an example of an infinite dimensional Banach space. We have already observed that finite dimensional normed linear spaces are Banach spaces. Example 15.4.2 Let E be a Banach space and let Ω be a nonempty subset of a normed linear space F . Let B (Ω; E) denote those functions f for which ||f || ≡ sup {||f (x)||E : x ∈ Ω} < ∞ Denote by BC (Ω; E) the set of functions in B (Ω; E) which are also continuous. Lemma 15.4.3 The above ∥·∥ is a norm on B (Ω; E). The subspace BC (Ω; E) with the given norm is a Banach space. Proof: It is obvious ||·|| is a norm. It only remains to verify BC (Ω; E) is complete. Let {fn } be a Cauchy sequence. Since ∥fn − fm ∥ → 0 as m, n → ∞, it follows that {fn (x)} is a Cauchy sequence in E for each x. Let f (x) ≡ limn→∞ fn (x). Then for any x ∈ Ω. ||fn (x) − fm (x)||E ≤ ||fn − fm || < ε whenever m, n are large enough, say as large as N . For n ≥ N, let m → ∞. Then passing to the limit, it follows that for all x, ||fn (x) − f (x)||E ≤ ε and so for all x, ∥f (x)∥E ≤ ε + ∥fn (x)∥E ≤ ε + ∥fn ∥ . It follows that ∥f ∥ ≤ ∥fn ∥ + ε and ∥f − fn ∥ ≤ ε. It remains to verify that f is continuous. ∥f (x) − f (y)∥E



∥f (x) − fn (x)∥E + ∥fn (x) − fn (y)∥E + ∥fn (y) − f (y)∥E 2ε ≤ 2 ∥f − fn ∥ + ∥fn (x) − fn (y)∥E < + ∥fn (x) − fn (y)∥E 3

for all n large enough. Now pick such an n. By continuity, the last term is less than 3ε if ∥x − y∥ is small enough. Hence f is continuous as well.  The most familiar example of a Banach space is Fn . The following lemma is of great importance so it is stated in general.


Lemma 15.4.4 Suppose T : E → E where E is a Banach space with norm |·|. Also suppose |T x − T y| ≤ r |x − y|

(15.9)

for some r ∈ (0, 1). Then there exists a unique fixed point, x ∈ E such that T x = x.

(15.10)

Letting x1 ∈ E, this fixed point x, is the limit of the sequence of iterates, x1 , T x1 , T 2 x1 , · · · .

(15.11)

In addition to this, there is a nice estimate which tells how close x1 is to x in terms of things which can be computed. 1 x − x ≤ 1 x1 − T x1 . (15.12) 1−r }∞ { Proof: This follows easily when it is shown that the above sequence, T k x1 k=1 is a Cauchy sequence. Note that 2 1 T x − T x1 ≤ r T x1 − x1 . Suppose

k 1 T x − T k−1 x1 ≤ rk−1 T x1 − x1 .

(15.13)

Then k+1 1 T x − T k x1

≤ r T k x1 − T k−1 x1 ≤ rrk−1 T x1 − x1 = rk T x1 − x1 .

By induction, this shows that for all k ≥ 2, 15.13 is valid. Now let k > l ≥ N. k−1 ∑( ∑ k 1 ) k−1 j+1 1 j 1 T j+1 x1 − T j x1 T x − T l x 1 = T x −T x ≤ j=l j=l ≤

k−1 ∑ j=N

rN r j T x 1 − x 1 ≤ T x 1 − x 1 1−r

which converges to 0 as N → ∞. Therefore, this is a Cauchy sequence so it must converge to x ∈ E. Then x = lim T k x1 = lim T k+1 x1 = T lim T k x1 = T x. k→∞

k→∞

k→∞

This shows the existence of the fixed point. To show it is unique, suppose there were another one, y. Then |x − y| = |T x − T y| ≤ r |x − y| and so x = y. It remains to verify the estimate. 1 x − x ≤ x1 − T x1 + T x1 − x = x1 − T x1 + T x1 − T x ≤ x1 − T x1 + r x1 − x and solving the inequality for x1 − x gives the estimate desired.  The following corollary is what will be used to prove the convergence condition for the various iterative procedures.


Corollary 15.4.5 Suppose T : E → E, for some constant C |T x − T y| ≤ C |x − y| , for all x, y ∈ E, and for some N ∈ N, N T x − T N y ≤ r |x − y| , for all x, y ∈ E where r}∈ (0, 1). Then there exists a unique fixed point for T and it is still the limit { of the sequence, T k x1 for any choice of x1 . Proof: From Lemma 15.4.4 there exists a unique fixed point for T N denoted here as x. Therefore, T x = x. Now doing T to both sides, T N T x = T x. N

By uniqueness, T x = x because the above equation shows T x is a fixed point of T N and there is only one fixed point of T N . In fact, there is only one fixed point of T because a fixed point of T is automatically a fixed point of T N . It remains to show T k x1 → x, the unique fixed point of T N . If this does not happen, there exists ε > 0 and a subsequence, still denoted by T k such that k 1 T x − x ≥ ε Now k = jk N + rk where rk ∈ {0, · · · , N − 1} and jk is a positive integer such that limk→∞ jk = ∞. Then there exists a single r ∈ {0, · · · , N − 1} such that for infinitely many k, rk = r. Taking a further subsequence, still denoted by T k it follows j N +r 1 T k (15.14) x − x ≥ ε However, T jk N +r x1 = T r T jk N x1 → T r x = x and this contradicts 15.14.  Now return to our system Ax = b. Recall it was a fixed point of T where x = B −1 Cx + B −1 b ≡ T x Then the fundamental theorem on convergence is the following. First note the following. ( ) T 2 x = B −1 C B −1 Cx + b + B −1 b ( )2 = B −1 C + e2 (b) where e2 (b) does not depend on x. Similarly, ( )n T n x = B −1 C + en (b)

(15.15)

where en (b) does not depend on x. Thus ( )n |T n x − T n y| ≤ B −1 C |x − y| . ( ) Theorem 15.4.6 Suppose ρ B −1 C < 1. Then the iterates described above converge to the unique solution of Ax = b. Proof: Consider the above iterates. Let T x = B −1 Cx + B −1 b. Then k ( ( ) ) ( −1 )k

T x − T k y = B −1 C k x − B −1 C k y ≤

B C |x − y| . Here ||·|| refers to any of the operator norms. It doesn’t matter which one you pick because they are all equivalent. I am writing the proof to indicate the operator norm taken with respect to the


usual norm on E. Since ρ (B−1C) < 1, it follows from Gelfand's theorem, Theorem 15.2.4 on Page 376, there exists N such that if k ≥ N, then ||(B−1C)^k|| ≤ r < 1. Consequently,

|T^N x − T^N y| ≤ r |x − y|.

Also |T x − T y| ≤ ||B−1C|| |x − y| and so Corollary 15.4.5 applies and gives the conclusion of this theorem. ∎

In the Jacobi method, you let B be the diagonal matrix whose diagonal entries are those of A and you let C be (−1) times the matrix obtained from A by making the diagonal entries 0 and retaining all the other entries of A. Thus A = B − C where B is the diagonal part of A and −C is the off diagonal part of A.

In the Gauss Seidel method, you keep the entries of A which are on or below the main diagonal in order to get B, so B is the lower triangular part of A including the diagonal. To get C you take −1 times the matrix obtained from A by replacing all entries below and on the main diagonal with zeros, so −C is the strictly upper triangular part of A.

Observation 15.4.7 Note that if A is diagonally dominant, meaning

|aii| > ∑_{j≠i} |aij| for each i,

then in both cases above,

( ) ρ B −1 C < 1

so the two iterative procedures will converge. To see this, suppose

B −1 Cx = λx, |λ| ≥ 1.

Then you get (λB − C) x = 0 However, in either the case of Jacobi iteration or Gauss Seidel iteration, the matrix λB − C will be diagonally dominant and so by Gerschgorin’s theorem will have no zero eigenvalues which requires ( ) that this matrix be one to one. Thus there are no eigenvectors for such λ and hence ρ B −1 C < 1.
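Here is a minimal Matlab sketch of both splittings, using a diagonally dominant matrix of the kind appearing in the exercises below. The particular matrix and the iteration count are arbitrary choices; the last line checks the convergence criterion ρ (B−1C) < 1 for each method.

A = [4 1 1; 1 5 2; 0 2 6];  b = [1; 2; 3];
Bj = diag(diag(A));  Cj = -(A - Bj);      % Jacobi splitting: B = diagonal part
Bg = tril(A);        Cg = -(A - Bg);      % Gauss Seidel: B = lower triangular part
x = zeros(3,1);  y = zeros(3,1);
for k = 1:50
    x = Bj \ (Cj*x + b);                  % Jacobi iterate
    y = Bg \ (Cg*y + b);                  % Gauss Seidel iterate
end
norm(A*x - b), norm(A*y - b)              % both residuals should be tiny
max(abs(eig(Bj\Cj))), max(abs(eig(Bg\Cg)))  % rho(B^{-1}C) < 1 in both cases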


15.5

Exercises

1. Solve the system

[ 4 1 1 ; 1 5 2 ; 0 2 6 ] [ x ; y ; z ] = [ 1 ; 2 ; 3 ]

using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations.

2. Solve the system

[ 4 1 1 ; 1 7 2 ; 0 2 4 ] [ x ; y ; z ] = [ 1 ; 2 ; 3 ]

using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations.

3. Solve the system

[ 5 1 1 ; 1 7 2 ; 0 2 4 ] [ x ; y ; z ] = [ 1 ; 2 ; 3 ]

using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it using row operations. 4. If you are considering a system of the form Ax = b and A−1 does not exist, will either the Gauss Seidel or Jacobi methods work? Explain. What does this indicate about finding eigenvectors for a given eigenvalue? 5. For ||x||∞ ≡ max {|xj | : j = 1, 2, · · · , n} , the parallelogram identity does not hold. Explain. 6. A norm ||·|| is said to be strictly convex if whenever ||x|| = ||y|| , x ̸= y, it follows x + y 2 < ||x|| = ||y|| . Show the norm |·| which comes from an inner product is strictly convex. 7. A norm ||·|| is said to be uniformly convex if whenever ||xn || , ||yn || are equal to 1 for all n ∈ N and limn→∞ ||xn + yn || = 2, it follows limn→∞ ||xn − yn || = 0. Show the norm |·| coming from an inner product is always uniformly convex. Also show that uniform convexity implies strict convexity which is defined in Problem 6. 8. Suppose A : Cn → Cn is a one to one and onto matrix. Define ||x|| ≡ |Ax| . Show this is a norm. 9. If X is a finite dimensional normed vector space and A, B ∈ L (X, X) such that ||B|| < ||A|| and A−1 exists, can it be concluded that A−1 B < 1? Either give a counter example or a proof. 10. Let X be a vector space with a norm ||·|| and let V = span (v1 , · · · , vm ) be a finite dimensional subspace of X such that {v1 , · · · , vm } is a basis for V. Show V is a closed subspace of X. This means that if wn → w and each wn ∈ V, then so is w. Next show that if w ∈ / V, dist (w, V ) ≡ inf {||w − v|| : v ∈ V } > 0


is a continuous function of w and |dist (w, V ) − dist (w1 , V )| ≤ ∥w1 − w∥ Next show that if w ∈ / V, there exists z such that ||z|| = 1 and dist (z, V ) > 1/2. For those who know some advanced calculus, show that if X is an infinite dimensional vector space having norm ||·|| , then the closed unit ball in X cannot be compact. Thus closed and bounded is never compact in an infinite dimensional normed vector space. 11. Suppose ρ (A) < 1 for A ∈ L (V, V ) where V is a p dimensional vector space having a norm ||·||. You can use Rp or Cp if you like. Show there exists a new norm |||·||| such that with respect to this new norm, |||A||| < 1 where |||A||| denotes the operator norm of A taken with respect to this new norm on V , |||A||| ≡ sup {|||Ax||| : |||x||| ≤ 1} Hint: You know from Gelfand’s theorem that 1/n

||An ||

0 there there exists δ > 0 such that if |s − t| < δ then ||Ψ (t) − Ψ (s)|| < ε. Show t → (Ψ (t) v, w) is continuous. Here it is the inner product in W. Also define what it means for t → Ψ (t) v to be continuous and show this is continuous. Do it all for differentiable in place of continuous. Next show t → ||Ψ (t)|| is continuous. 20. If z (t) ∈ W, a finite dimensional inner product space, what does it mean for t → z (t) to be continuous or differentiable? If z is continuous, define ∫

b

z (t) dt ∈ W a

(

as follows.



)

b

w,

z (t) dt



b



a

(w, z (t)) dt. a

Show that this definition is well defined and furthermore the triangle inequality, ∫ ∫ b b z (t) dt ≤ |z (t)| dt, a a and fundamental theorem of calculus, (∫ t ) d z (s) ds = z (t) dt a hold along with any other interesting properties of integrals which are true. 21. For V, W two inner product spaces, define ∫

b

Ψ (t) dt ∈ L (V, W ) a

as follows.

(



w,

)

b

Ψ (t) dt (v) a

∫ ≡

b

(w, Ψ (t) v) dt. a

∫b Show this is well defined and does indeed give a Ψ (t) dt ∈ L (V, W ) . Also show the triangle inequality ∫ ∫ b b ||Ψ (t)|| dt Ψ (t) dt ≤ a a where ||·|| is the operator norm and verify the fundamental theorem of calculus holds. (∫ a

t

)′ Ψ (s) ds = Ψ (t) .

390

CHAPTER 15. ANALYSIS OF LINEAR TRANSFORMATIONS Also verify the usual properties of integrals continue to hold such as the fact the integral is linear and ∫ b ∫ c ∫ c Ψ (t) dt + Ψ (t) dt = Ψ (t) dt a

b

a

and similar things. Hint: On showing the triangle inequality, it will help if you use the fact that |w|W = sup |(w, v)| . |v|≤1

You should show this also. 22. Prove Gronwall’s inequality. Suppose u (t) ≥ 0 and for all t ∈ [0, T ] , ∫

t

u (t) ≤ u0 +

Ku (s) ds. 0

where K is some nonnegative constant. Then u (t) ≤ u0 eKt . Hint: w (t) = following.

∫t 0

u (s) ds. Then using the fundamental theorem of calculus, w (t) satisfies the u (t) − Kw (t) = w′ (t) − Kw (t) ≤ u0 , w (0) = 0.

Now use the usual techniques you saw in an introductory differential equations class. Multiply both sides of the above inequality by e−Kt and note the resulting left side is now a total derivative. Integrate both sides from 0 to t and see what you have got. 23. With Gronwall’s inequality and the integral defined in Problem 21 with its properties listed there, prove there is at most one solution to the initial value problem y′ = Ay, y (0) = y0 . Hint: If there are two solutions, subtract them and call the result z. Then z′ = Az, z (0) = 0. It follows

∫ z (t) = 0+

t

Az (s) ds 0

and so

∫ ||z (t)|| ≤

t

∥A∥ ||z (s)|| ds 0

Now consider Gronwall’s inequality of Problem 22. 24. Suppose A is a matrix which has the property that whenever µ ∈ σ (A) , Re µ < 0. Consider the initial value problem y′ = Ay, y (0) = y0 . The existence and uniqueness of a solution to this equation has been established above in preceding problems, Problem 17 to 23. Show that in this case where the real parts of the eigenvalues are all negative, the solution to the initial value problem satisfies lim y (t) = 0.

t→∞

Hint: A nice way to approach this problem is to show you can reduce it to the consideration of the initial value problem z′ = Jε z, z (0) = z0

15.5. EXERCISES

391

where Jε is the modified Jordan canonical form where instead of ones down the main diagonal, there are ε down the main diagonal (Problem 14). Then z′ = Dz + Nε z where D is the diagonal matrix obtained from the eigenvalues of A and Nε is a nilpotent matrix commuting with D which is very small provided ε is chosen very small. Now let Ψ (t) be the solution of Ψ′ = −DΨ, Ψ (0) = I described earlier as

∞ k ∑ (−1) tk Dk

k!

k=0

.

Thus Ψ (t) commutes with D and Nε . Tell why. Next argue ′

(Ψ (t) z) = Ψ (t) Nε z (t) and integrate from 0 to t. Then ∫

t

Ψ (t) z (t) − z0 =

Ψ (s) Nε z (s) ds. 0

It follows



t

||Nε || ||Ψ (s) z (s)|| ds.

||Ψ (t) z (t)|| ≤ ||z0 || + 0

It follows from Gronwall’s inequality ||Ψ (t) z (t)|| ≤ ||z0 || e||Nε ||t Now look closely at the form of Ψ (t) to get an estimate which is interesting. Explain why   eµ1 t 0   ..  Ψ (t) =  .   0 eµn t and now observe that if ε is chosen small enough, ||Nε || is so small that each component of z (t) converges to 0. 25. Using Problem 24 show that if A is a matrix having the real parts of all eigenvalues less than 0 then if Ψ′ (t) = AΨ (t) , Ψ (0) = I it follows lim Ψ (t) = 0.

t→∞

Hint: Consider the columns of Ψ (t)? 26. Let Ψ (t) be a fundamental matrix satisfying Ψ′ (t) = AΨ (t) , Ψ (0) = I. Show Ψ (t) = Ψ (nt) . Hint: Subtract and show the difference satisfies Φ′ = AΦ, Φ (0) = 0. Use uniqueness. n

27. If the real parts of the eigenvalues of A are all negative, show that for every positive t, lim Ψ (nt) = 0.

n→∞

Hint: Pick Re (σ (A)) < −λ < 0 and use Problem 18 about the spectrum of Ψ (t) and Gelfand’s theorem for the spectral radius along with Problem 26 to argue that Ψ (nt) /e−λnt < 1 for all n large enough.

392

CHAPTER 15. ANALYSIS OF LINEAR TRANSFORMATIONS

28. Let H be a Hermitian matrix. (H = H ∗ ) . Show that eiH ≡

∑∞ n=0

(iH)n n!

is unitary.

29. Show the converse of the above exercise. If V is unitary, then V = eiH for some H Hermitian. Hint: First verify that V is normal. Thus U ∗ V U = D. Now verify that D∗ D = I. What does this mean for the diagonal entries of D? If you have a complex number which has magnitude 1, what form does it take? −1

30. If U is unitary and does not have −1 as an eigenvalue so that (I + U )

exists, show that

−1

H = i (I − U ) (I + U ) is Hermitian. Then, verify that

−1

U = (I + iH) (I − iH)

.

31. Suppose that A ∈ L (V, V) where V is a normed linear space. Also suppose that ||A|| < 1 where this refers to the operator norm of A. Verify that

(I − A)−1 = ∑_{i=0}^{∞} A^i.

This is called the Neumann series. Suppose now that you only know the algebraic condition ρ (A) < 1. Is it still the case that the Neumann series converges to (I − A)−1?
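A numerical illustration related to this last problem (a sketch only; the 2 × 2 matrix was chosen so that ρ (A) < 1 even though ||A|| > 1 in the usual operator norm):

A = [0.5 2; 0 0.4];            % eigenvalues 0.5 and 0.4, so rho(A) < 1, but norm(A) > 1
S = eye(2);  P = eye(2);
for i = 1:200
    P = P*A;                    % A^i
    S = S + P;                  % partial sum of the Neumann series
end
norm(S - inv(eye(2) - A))       % small, so the series still converges to (I - A)^{-1}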

Chapter 16

Numerical Methods, Eigenvalues

16.1 The Power Method For Eigenvalues

This chapter discusses numerical methods for finding eigenvalues. However, to do this correctly, you must include numerical analysis considerations which are distinct from linear algebra. The purpose of this chapter is to give an introduction to some numerical methods without leaving the context of linear algebra. In addition, some examples are given which make use of computer algebra systems. For a more thorough discussion, you should see books on numerical methods in linear algebra like some listed in the references. Let A be a complex p × p matrix and suppose that it has distinct eigenvalues {λ1 , · · · , λm } and that |λ1 | > |λk | for all k. Also let the Jordan form of A be   J1   ..  J = .   Jm with J1 an m1 × m1 matrix. Jk = λk Ik + Nk where

Nkrk

̸= 0 but

Nkrk +1

= 0. Also let P −1 AP = J, A = P JP −1 .

Now fix x ∈ Fp . Take Ax and let s1 be the entry of the vector Ax which has largest absolute value. Thus Ax/s1 is a vector y1 which has a component of 1 and every other entry of this vector has magnitude no larger than 1. If the scalars {s1 , · · · , sn−1 } and vectors {y1 , · · · , yn−1 } have been obtained, let yn ≡ Ayn−1 /sn where sn is the entry of Ayn−1 which has largest absolute value. Thus yn =

AAyn−2 An x ··· = sn sn−1 sn sn−1 · · · s1 

=

 P sn sn−1 · · · s1  1

 =

 λ1n P sn sn−1 · · · s1 



J1n ..

 −1 P x 

. n Jm



n λ−n 1 J1

..

. n λ−n 1 Jm


(16.1)

 −1 P x 

(16.2)


Consider one of the blocks in the Jordan form. First consider the k th of these blocks, k > 1. It equals rk ( ) ∑ n −n n−i i n λ−n J = λ λ Nk 1 k i 1 k i=0 which clearly converges to 0 as n → ∞ since |λ1 | > |λk |. An application of the ratio test or root test for each term in the sum will show this. When k = 1, this block is n λ−n 1 J1

=

n λ−n 1 Jk

r1 ( ) ∑ n

=

i=0

i

n−i i λ−n N1 1 λ1

( ) ] n [ −r1 r1 = λ 1 N1 + e n r1

where limn→∞ en = 0 because it is a sum of bounded matrices which are multiplied by This quotient converges to 0 as n → ∞ because i < r1 . It follows that 16.2 is of the form ) ( ) ( −r1 r1 ( ) λ 1 N1 + e n 0 λn1 n n λn1 yn = P wn P −1 x ≡ sn sn−1 · · · s1 r1 sn sn−1 · · · s1 r1 0 En

( n) ( n ) i / r1 .

( ) where En → 0, en → 0. Let P −1 x m1 denote the first m1 entries of the vector P −1 x. Unless a very ) ( unlucky choice for x was picked, it will follow that P −1 x m1 ∈ / ker (N1r1 ) . Then for large n, yn is close to the vector ) ( ) ( −r1 r1 ( ) λ 1 N1 0 λn1 n λn1 n −1 P w ≡ z ̸= 0 P x≡ sn sn−1 · · · s1 r1 sn sn−1 · · · s1 r1 0 0 However, this is an eigenvector because A−λ1 I

z }| { (A − λ1 I) w = P (J − λ1 I) P −1 P   P  ( = P

(



N1 ..

r1 1 λ−r 1 N1 0



 −1  P P   

.

0 0

) P −1 x = 

r1 1 λ−r 1 N1

Jm − λ1 I ) 0 P −1 x = 0 0

..

 −1 P x 

. 0

r1 1 N1 λ−r 1 N1

0

Recall N1r1 +1 = 0. Now you could recover an approximation to the eigenvalue as follows. (Az, z) (Ayn , yn ) u = λ1 (yn , yn ) (z, z) Here u means “approximately equal”. However, there is a more convenient way to identify the eigenvalue in terms of the scaling factors sk .

( )

λn1 n

sn · · · s1 r1 (wn − w) u 0 ∞ Pick the largest nonzero entry of w, wl . Then for large n, it is also likely the case that the largest entry of wn will be in the lth position because wm is close to w. From the construction, ( ) ( ) λn1 n λn1 n wnl = 1 u wl sn · · · s1 r1 sn · · · s1 r1

16.1. THE POWER METHOD FOR EIGENVALUES In other words, for large n

Therefore, for large n,


( ) λn1 n u 1/wl sn · · · s1 r1 ( ) ( ) λn1 n λn+1 n+1 1 u sn · · · s1 r1 sn+1 sn · · · s1 r1 ( ) ( ) n n+1 λ1 / u r1 r1 sn+1

and so

( ) ( ) But limn→∞ rn1 / n+1 = 1 and so, for large n it must be the case that λ1 u sn+1 . r1 This has proved the following theorem which justifies the power method. Theorem 16.1.1 Let A be a complex p × p matrix such that the eigenvalues are {λ1 , λ2 , · · · , λr } with |λ1 | > |λj | for all j ̸= 1. Then for x a given vector, let y1 =

Ax s1

where s1 is an entry of Ax which has the largest absolute value. If the scalars {s1 , · · · , sn−1 } and vectors {y1 , · · · , yn−1 } have been obtained, let yn ≡

Ayn−1 sn

where sn is the entry of Ayn−1 which has largest absolute value. Then it is probably the case that {sn } will converge to λ1 and {yn } will converge to an eigenvector associated with λ1 . If it doesn’t, you picked an incredibly inauspicious initial vector x. In summary, here is the procedure. Finding the largest eigenvalue with its eigenvector. 1. Start with a vector, u1 which you hope is not unlucky. 2. If uk is known, uk+1 =

Auk sk+1

where sk+1 is the entry of Auk which has largest absolute value.

3. When the scaling factors sk are not changing much, sk+1 will be close to the eigenvalue and uk+1 will be close to an eigenvector.

4. Check your answer to see if it worked well. If things don't work well, try another u1. You were miraculously unlucky in your choice.

Example 16.1.2 Find the largest eigenvalue of A = [ 5 −14 11 ; −4 4 −4 ; 3 6 −3 ].

You can begin with u1 = (1, · · · , 1)^T and apply the above procedure. However, you can accelerate the process if you begin with A^n u1 and then divide by the largest entry to get the first approximate eigenvector. Thus

[ 5 −14 11 ; −4 4 −4 ; 3 6 −3 ]^{20} [ 1 ; 1 ; 1 ] = [ 2.5558 × 10^{21} ; −1.2779 × 10^{21} ; −3.6562 × 10^{15} ].

Divide by the largest entry to obtain a good approximation.

(1 / (2.5558 × 10^{21})) [ 2.5558 × 10^{21} ; −1.2779 × 10^{21} ; −3.6562 × 10^{15} ] = [ 1.0 ; −0.5 ; −1.4306 × 10^{−6} ].

Now begin with this one.

[ 5 −14 11 ; −4 4 −4 ; 3 6 −3 ] [ 1.0 ; −0.5 ; −1.4306 × 10^{−6} ] = [ 12.000 ; −6.0000 ; 4.2918 × 10^{−6} ].

Divide by 12 to get the next iterate.

(1/12) [ 12.000 ; −6.0000 ; 4.2918 × 10^{−6} ] = [ 1.0 ; −0.5 ; 3.5765 × 10^{−7} ].

Another iteration will reveal that the scaling factor is still 12. Thus this is an approximate eigenvalue. In fact, it is the largest eigenvalue and the corresponding eigenvector is (1.0, −0.5, 0)^T. The process has worked very well.
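Here is a minimal Matlab sketch of the scaling procedure just carried out by hand; the number of iterations is an arbitrary choice.

A = [5 -14 11; -4 4 -4; 3 6 -3];
u = [1; 1; 1];
for k = 1:25
    v = A*u;
    [~, j] = max(abs(v));      % position of the entry of largest absolute value
    s = v(j);                  % scaling factor; it converges to the dominant eigenvalue
    u = v/s;
end
s                              % about 12
u                              % about (1, -0.5, 0)'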

16.1.1

The Shifted Inverse Power Method

This method can find various eigenvalues and eigenvectors. It is a significant generalization of the above simple procedure and yields very good results. One can find complex eigenvalues using this method. The situation is this: You have a number α which is close to λ, some eigenvalue of an n × n matrix A. You don’t know λ but you know that α is closer to λ than to any other eigenvalue. Your problem is to find both λ and an eigenvector which goes with λ. Another way to look at this is to start with α and seek the eigenvalue λ, which is closest to α along with an eigenvector associated with λ. If α is an eigenvalue of A, then you have what you want. Therefore, I will always assume α −1 is not an eigenvalue of A and so (A − αI) exists. The method is based on the following lemma. n

Lemma 16.1.3 Let {λk }k=1 be the eigenvalues of A. If xk is an eigenvector of A for the eigenvalue −1 λk , then xk is an eigenvector for (A − αI) corresponding to the eigenvalue λk1−α . Conversely, if −1

(A − αI)

y=

1 y λ−α

and y ̸= 0, then Ay = λy. Proof: Let λk and xk be as described in the statement of the lemma. Then (A − αI) xk = (λk − α) xk and so

1 −1 xk = (A − αI) xk . λk − α

(16.3)


1 Suppose 16.3. Then y = λ−α [Ay − αy] . Solving for Ay leads to Ay = λy.  1 Now assume α is closer to λ than to any other eigenvalue. Then the magnitude of λ−α is greater −1 than the magnitude of all the other eigenvalues of (A − αI) . Therefore, the power method applied −1 1 1 to (A − αI) will yield λ−α . You end up with sn+1 u λ−α and solve for λ.
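The next subsection spells this procedure out step by step. As a preview, here is a minimal Matlab sketch of the idea applied to the matrix of Example 16.1.2 with the guess α = −5, a hypothetical choice which happens to be closest to the eigenvalue −6 of that matrix.

A = [5 -14 11; -4 4 -4; 3 6 -3];
alpha = -5;                            % guess; the nearest eigenvalue of A is -6
F = inv(A - alpha*eye(3));
u = [1; 1; 1];
for k = 1:50
    v = F*u;
    [~, j] = max(abs(v));
    s = v(j);  u = v/s;
end
lambda = alpha + 1/s                   % recovers the eigenvalue closest to alpha
norm(A*u - lambda*u)                   % small, so u is an approximate eigenvector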

16.1.2

The Explicit Description Of The Method

Here is how you use this method to find the eigenvalue closest to α and the corresponding eigenvector. −1

1. Find (A − αI)

.

2. Pick u1 . If you are not phenomenally unlucky, the iterations will converge. 3. If uk has been obtained,

−1

uk+1 = −1

where sk+1 is the entry of (A − αI)

(A − αI) sk+1

uk

uk which has largest absolute value.

4. When the scaling factors, sk are not changing much and the uk are not changing much, find the approximation to the eigenvalue by solving sk+1 =

1 λ−α

for λ. The eigenvector is approximated by uk+1 . 5. Check your work by multiplying by the original matrix to see how well what you have found works. −1

Thus this amounts to the power method for the matrix (A − αI)

16.2

but you are free to pick α.

Automation With Matlab

You can do the above example and other examples using Matlab. Here are some commands which will do this. It is done here for a 3 × 3 matrix but you adapt for any size. a=[5 -8 6;1 0 0;0 1 0]; b=i; F=inv(a-b*eye(3)); S=1; u=[1;1;1]; d=1; k=1; while d >.00001 & k.0001 & k 0, it follows as in the first part that r22 → 1. Hence

lim qk2 = q2 .

k→∞

Continuing this way, it follows k lim rij =0

k→∞

for all i ̸= j and

k lim rjj = 1, lim qkj = qj .

k→∞

k→∞

Thus Rk → I and Q(k) → Q. This proves the first part of the lemma. The second part follows immediately. If QR = Q′ R′ = A where A−1 exists, then −1

Q∗ Q′ = R (R′ )

and I need to show both sides of the above are equal to I. The left side of the above is unitary and the right side is upper triangular having positive entries on the diagonal. This is because the inverse of such an upper triangular matrix having positive entries on the main diagonal is still upper triangular having positive entries on the main diagonal and the product of two such upper triangular matrices gives another of the same form having positive entries on the main diagonal. Suppose then

16.3. THE QR ALGORITHM

407

that Q = R where Q is unitary and R is upper triangular having positive entries on the main diagonal. Let Qk = Q and Rk = R. It follows IRk → R = Q −1

and so from the first part, Rk → I but Rk = R and so R = I. Thus applying this to Q∗ Q′ = R (R′ ) yields both sides equal I.  n A case of all( this is of great interest. ) Suppose A has a largest eigenvalue λ which is real. Then A is of the form An−1 a1 , · · · , An−1 an and so likely each of these columns will be pointing roughly in the direction of an eigenvector of A which corresponds to this eigenvalue. Then when you do the QR factorization of this, it follows from the fact that R is upper triangular, that the first column of Q will be a multiple of An−1 a1 and so will end up being roughly parallel to the eigenvector desired. Also this will require the entries below the top in the first column of An = QT AQ will all be small because they will be of the form qTi Aq1 u λqTi q1 = 0. Therefore, An will be of the form ( ) λ′ a e B where e is small. It follows that λ′ will be close to λ and q1 will be close to an eigenvector for λ. Then if you like, you could do the same thing with the matrix B to obtain approximations for the other eigenvalues. Finally, you could use the shifted inverse power method to get more exact solutions.

16.3.2

The Case Of Real Eigenvalues

With these lemmas, it is possible to prove that for the QR algorithm and certain conditions, the sequence Ak converges pointwise to an upper triangular matrix having the eigenvalues of A down the diagonal. I will assume all the matrices are real here. ( ) 0 1 This convergence won’t always happen. Consider for example the matrix . You can 1 0 verify quickly that the algorithm will return this matrix for each k. The problem here is that, although the matrix has the two eigenvalues −1, 1, they have the same absolute value. The QR algorithm works in somewhat the same way as the power method, exploiting differences in the size of the eigenvalues. If A has all real eigenvalues and you are interested in finding these eigenvalues along with the corresponding eigenvectors, you could always consider A + λI instead where λ is sufficiently large and positive that A + λI has all positive eigenvalues. (Recall Gerschgorin’s theorem.) Then if µ is an eigenvalue of A + λI with (A + λI) x = µx then Ax = (µ − λ) x so to find the eigenvalues of A you just subtract λ from the eigenvalues of A + λI. Thus there is no loss of generality in assuming at the outset that the eigenvalues of A are all positive. Here is the theorem. It involves a technical condition which will often hold. The proof presented here follows [36] and is a special case of that presented in this reference. Before giving the proof, note that the product of upper triangular matrices is upper triangular. If they both have positive entries on the main diagonal so will the product. Furthermore, the inverse of an upper triangular matrix is upper triangular. I will use these simple facts without much comment whenever convenient. Theorem 16.3.5 Let A be a real matrix having eigenvalues λ1 > λ2 > · · · > λn > 0

408

CHAPTER 16. NUMERICAL METHODS, EIGENVALUES

and let

A = SDS −1 

where

 D= 

λ1

0 ..

0

(16.9)

.

   

λn

and suppose S −1 has an LU factorization. Then the matrices Ak in the QR algorithm described above converge to an upper triangular matrix T ′ having the eigenvalues of A, λ1 , · · · , λn descending on the main diagonal. The matrices Q(k) converge to Q′ , an orthogonal matrix which equals Q except for possibly having some columns multiplied by −1 for Q the unitary part of the QR factorization of S, S = QR, and

lim Ak = T ′ = Q′T AQ′

k→∞

Proof: From Lemma 16.3.3

Ak = Q(k) R(k) = SDk S −1

Let S = QR where this is just a QR factorization which is known to exist and let S is assumed to exist. Thus Q(k) R(k) = QRDk LU and so

(16.10) −1

= LU which (16.11)

Q(k) R(k) = QRDk LU = QRDk LD−k Dk U

That matrix in the middle, Dk LD−k satisfies ( k ) D LD−k ij = λki Lij λ−k for j ≤ i, 0 if j > i. j Thus for j < i the expression converges to 0 because λj > λi when this happens. When i = j it reduces to 1. Thus the matrix in the middle is of the form I + Ek where Ek → 0. Then it follows Ak = Q(k) R(k) = QR (I + Ek ) Dk U ( ) = Q I + REk R−1 RDk U ≡ Q (I + Fk ) RDk U where Fk → 0. Then let I + Fk = Qk Rk where this is another QR factorization. Then it reduces to Q(k) R(k) = QQk Rk RDk U This looks really interesting because by Lemma 16.3.4 Qk → I and Rk → I because Qk Rk = (I + Fk ) → I. So it follows QQk is an orthogonal matrix converging to Q while ( )−1 Rk RDk U R(k) is upper triangular, being the product of upper triangular matrices. Unfortunately, it is not known that the diagonal entries of this matrix are nonnegative because of the U . Let Λ be just like the identity matrix but having some of the ones replaced with −1 in such a way that ΛU is an upper triangular matrix having positive diagonal entries. Note Λ2 = I and also Λ commutes with a diagonal matrix. Thus Q(k) R(k) = QQk Rk RDk Λ2 U = QQk Rk RΛDk (ΛU )

16.3. THE QR ALGORITHM

409

At this point, one does some inspired massaging to write the above in the form ] ( ) [( )−1 QQk ΛDk ΛDk Rk RΛDk (ΛU ) [( ] )−1 = Q (Qk Λ) Dk ΛDk Rk RΛDk (ΛU ) ≡Gk

=

z [ }| { ] ( )−1 Q (Qk Λ) Dk ΛDk Rk RΛDk (ΛU )

Now I claim the middle matrix in [·] is upper triangular and has all positive entries on the diagonal. This is because it is an upper triangular matrix which is similar to the upper triangular matrix[ Rk R and so it has] the same eigenvalues (diagonal entries) as Rk R. Thus the matrix )−1 ( Rk RΛDk (ΛU ) is upper triangular and has all positive entries on the diagonal. Gk ≡ Dk ΛDk Multiply on the right by G−1 k to get ′ Q(k) R(k) G−1 k = QQk Λ → Q

where $Q'$ is essentially equal to $Q$ but might have some of the columns multiplied by $-1$. This is because $Q_k \to I$ and so $Q_k \Lambda \to \Lambda$. Now by Lemma 16.3.4, it follows $Q^{(k)} \to Q'$, $R^{(k)} G_k^{-1} \to I$. It remains to verify $A_k$ converges to an upper triangular matrix. Recall that from 16.10 and the definition below this ($S = QR$),
$$A = S D S^{-1} = (QR) D (QR)^{-1} = Q R D R^{-1} Q^{T} = Q T Q^{T}$$
where $T$ is an upper triangular matrix. This is because it is the product of upper triangular matrices $R$, $D$, $R^{-1}$. Thus $Q^{T} A Q = T$. If you replace $Q$ with $Q'$ in the above, it still results in an upper triangular matrix $T'$ having the same diagonal entries as $T$. This is because
$$T = Q^{T} A Q = \left( Q' \Lambda \right)^{T} A \left( Q' \Lambda \right) = \Lambda Q'^{T} A Q' \Lambda$$
and considering the $ii^{th}$ entry yields
$$\left( Q^{T} A Q \right)_{ii} \equiv \sum_{j,k} \Lambda_{ij} \left( Q'^{T} A Q' \right)_{jk} \Lambda_{ki} = \Lambda_{ii} \Lambda_{ii} \left( Q'^{T} A Q' \right)_{ii} = \left( Q'^{T} A Q' \right)_{ii}$$
Recall from Lemma 16.3.3, $A_k = Q^{(k)T} A Q^{(k)}$. Thus taking a limit and using the first part,
$$A_k = Q^{(k)T} A Q^{(k)} \to Q'^{T} A Q' = T'. \;\blacksquare$$

An easy case is for $A$ symmetric. Recall Corollary 14.1.6. By this corollary, there exists an orthogonal (real unitary) matrix $Q$ such that $Q^{T} A Q = D$ where $D$ is diagonal having the eigenvalues on the main diagonal decreasing in size from the upper left corner to the lower right.

Corollary 16.3.6 Let $A$ be a real symmetric $n \times n$ matrix having eigenvalues $\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0$ and let $Q$ be defined by
$$Q D Q^{T} = A, \quad D = Q^{T} A Q, \qquad (16.12)$$

where Q is orthogonal and D is a diagonal matrix having the eigenvalues on the main diagonal decreasing in size from the upper left corner to the lower right. Let QT have an LU factorization. Then in the QR algorithm, the matrices Q(k) converge to Q′ where Q′ is the same as Q except having some columns multiplied by (−1) . Thus the columns of Q′ are eigenvectors of A. The matrices Ak converge to D.

Proof: This follows from Theorem 16.3.5. Here $S = Q$, $S^{-1} = Q^{T}$. Thus $Q = S = QR$ and $R = I$. By Theorem 16.3.5 and Lemma 16.3.3,
$$A_k = Q^{(k)T} A Q^{(k)} \to Q'^{T} A Q' = Q^{T} A Q = D$$
because formula 16.12 is unaffected by replacing $Q$ with $Q'$. $\blacksquare$

When using the QR algorithm, it is not necessary to check the technical condition about $S^{-1}$ having an LU factorization. The algorithm delivers a sequence of matrices which are similar to the original one. If that sequence converges to an upper triangular matrix, then the algorithm worked. Furthermore, the technical condition is sufficient but not necessary; the algorithm can work even without it.

Example 16.3.7 Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{pmatrix} 5 & 1 & 1\\ 1 & 3 & 2\\ 1 & 2 & 1 \end{pmatrix}$$

It is a symmetric matrix, but other than that, I just pulled it out of the air. By Lemma 16.3.3 it follows $A_k = Q^{(k)T} A Q^{(k)}$. And so, to get to the answer quickly, I could have the computer raise $A$ to a power and then take the QR factorization of what results to get the $k^{th}$ iteration using the above formula. Let's pick $k = 10$.
$$\begin{pmatrix} 5 & 1 & 1\\ 1 & 3 & 2\\ 1 & 2 & 1 \end{pmatrix}^{10} = \begin{pmatrix} 4.2273\times10^{7} & 2.5959\times10^{7} & 1.8611\times10^{7}\\ 2.5959\times10^{7} & 1.6072\times10^{7} & 1.1506\times10^{7}\\ 1.8611\times10^{7} & 1.1506\times10^{7} & 8.2396\times10^{6} \end{pmatrix}$$
Now take the QR factorization of this. The computer will do that also. This yields
$$\begin{pmatrix} 0.79785 & -0.59912 & -6.6943\times10^{-2}\\ 0.48995 & 0.70912 & -0.50706\\ 0.35126 & 0.37176 & 0.85931 \end{pmatrix} \cdot \begin{pmatrix} 5.2983\times10^{7} & 3.2627\times10^{7} & 2.338\times10^{7}\\ 0 & 1.2172\times10^{5} & 71946\\ 0 & 0 & 277.03 \end{pmatrix}$$
Next it follows
$$A_{10} = Q^{T} \begin{pmatrix} 5 & 1 & 1\\ 1 & 3 & 2\\ 1 & 2 & 1 \end{pmatrix} Q$$
where $Q$ is the orthogonal factor just obtained, and this equals
$$\begin{pmatrix} 6.0571 & 3.698\times10^{-3} & 3.4346\times10^{-5}\\ 3.698\times10^{-3} & 3.2008 & -4.0643\times10^{-4}\\ 3.4346\times10^{-5} & -4.0643\times10^{-4} & -0.2579 \end{pmatrix}$$


By Gerschgorin's theorem, the eigenvalues are pretty close to the diagonal entries of the above matrix. Note I didn't use the convergence theorem, just Lemma 16.3.3 and Gerschgorin's theorem, to verify the eigenvalues are close to the above numbers. The eigenvectors are close to
$$\begin{pmatrix} 0.79785\\ 0.48995\\ 0.35126 \end{pmatrix}, \quad \begin{pmatrix} -0.59912\\ 0.70912\\ 0.37176 \end{pmatrix}, \quad \begin{pmatrix} -6.6943\times10^{-2}\\ -0.50706\\ 0.85931 \end{pmatrix}$$
Let's check one of these.
$$\left( \begin{pmatrix} 5 & 1 & 1\\ 1 & 3 & 2\\ 1 & 2 & 1 \end{pmatrix} - 6.0571 \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} 0.79785\\ 0.48995\\ 0.35126 \end{pmatrix} = \begin{pmatrix} -2.1972\times10^{-3}\\ 2.5439\times10^{-3}\\ 1.3931\times10^{-3} \end{pmatrix} \approx \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}$$
Now let's see how well the smallest approximate eigenvalue and eigenvector work.
$$\left( \begin{pmatrix} 5 & 1 & 1\\ 1 & 3 & 2\\ 1 & 2 & 1 \end{pmatrix} - (-0.2579) \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} -6.6943\times10^{-2}\\ -0.50706\\ 0.85931 \end{pmatrix} = \begin{pmatrix} 2.704\times10^{-4}\\ -2.7377\times10^{-4}\\ -1.3695\times10^{-4} \end{pmatrix} \approx \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}$$

For practical purposes, this has found the eigenvalues and eigenvectors.
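The computation above is easy to reproduce. Here is a minimal Matlab sketch (variable names are my own); because Matlab's qr does not force positive diagonal entries in R, the signs of some columns may differ from the matrices displayed above, but the resulting iterate is the same up to those signs.

A = [5 1 1; 1 3 2; 1 2 1];
[Q10,R10] = qr(A^10);        % QR factorization of the 10th power, as in the example
A10 = Q10'*A*Q10             % by Lemma 16.3.3 this is the 10th QR iterate
% Alternatively, run the QR algorithm directly:
Ak = A;
for k = 1:10
  [Qk,Rk] = qr(Ak);
  Ak = Rk*Qk;                % each iterate is similar to A and approaches triangular form
end
Ak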

16.3.3

The QR Algorithm In The General Case

In the case where A has distinct positive eigenvalues it was shown above that under reasonable conditions related to a certain matrix having an LU factorization the QR algorithm produces a sequence of matrices {Ak } which converges to an upper triangular matrix. What if A is just an n × n matrix having possibly complex eigenvalues but A is nondefective? What happens with the QR algorithm in this case? The short answer to this question is that the Ak of the algorithm typically cannot converge. However, this does not mean the algorithm is not useful in finding eigenvalues. It turns out the sequence of matrices {Ak } have the appearance of a block upper triangular matrix for large k in the sense that the entries below the blocks on the main diagonal are small. Then looking at these blocks gives a way to approximate the eigenvalues. First it is important to note a simple fact about unitary diagonal matrices. In what follows Λ will denote a unitary matrix which is also a diagonal matrix. These matrices are just the identity matrix with some of the ones replaced with a number of the form eiθ for some θ. The important property of multiplication of any matrix by Λ on either side is that it leaves all the zero entries the same and also preserves the absolute values of the other entries. Thus a block triangular matrix multiplied by Λ on either side is still block triangular. If the matrix is close to being block triangular this property of being close to a block triangular matrix is also preserved by multiplying on either side by Λ. Other patterns depending only on the size of the absolute value occurring in the matrix are also preserved by multiplying on either side by Λ. In other words, in looking for a pattern in a matrix, multiplication by Λ is irrelevant. Now let A be an n × n matrix having real or complex entries. By Lemma 16.3.3 and the assumption that A is nondefective, there exists an invertible S, Ak = Q(k) R(k) = SDk S −1

(16.13)


where
$$D = \begin{pmatrix} \lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$

and by rearranging the columns of $S$, $D$ can be made such that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$. Assume $S^{-1}$ has an LU factorization. Then
$$A^{k} = S D^{k} L U = S D^{k} L D^{-k} D^{k} U.$$
Consider the matrix in the middle, $D^{k} L D^{-k}$. The $ij^{th}$ entry is of the form
$$\left( D^{k} L D^{-k} \right)_{ij} = \begin{cases} \lambda_i^{k} L_{ij} \lambda_j^{-k} & \text{if } j < i\\ 1 & \text{if } i = j\\ 0 & \text{if } j > i \end{cases}$$
and these all converge to 0 whenever $|\lambda_i| < |\lambda_j|$. Thus
$$D^{k} L D^{-k} = L_k + E_k$$
where $L_k$ is a lower triangular matrix which has all ones down the diagonal and some subdiagonal terms of the form
$$\lambda_i^{k} L_{ij} \lambda_j^{-k} \qquad (16.14)$$
for which $|\lambda_i| = |\lambda_j|$, while $E_k \to 0$. (Note the entries of $L_k$ are all bounded independent of $k$ but some may fail to converge.) Then
$$Q^{(k)} R^{(k)} = S \left( L_k + E_k \right) D^{k} U$$
Let
$$S L_k = Q_k R_k \qquad (16.15)$$
where this is the QR factorization of $S L_k$. Then
$$Q^{(k)} R^{(k)} = \left( Q_k R_k + S E_k \right) D^{k} U = Q_k \left( I + Q_k^{*} S E_k R_k^{-1} \right) R_k D^{k} U = Q_k \left( I + F_k \right) R_k D^{k} U$$
where $F_k \to 0$. Let $I + F_k = Q_k' R_k'$. Then $Q^{(k)} R^{(k)} = Q_k Q_k' R_k' R_k D^{k} U$. By Lemma 16.3.4,
$$Q_k' \to I \text{ and } R_k' \to I. \qquad (16.16)$$

Now let Λk be a diagonal unitary matrix which has the property that Λ∗k Dk U is an upper triangular matrix which has all the diagonal entries positive. Then Q(k) R(k) = Qk Q′k Λk (Λ∗k Rk′ Rk Λk ) Λ∗k Dk U That matrix in the middle has all positive diagonal entries because it is itself an upper triangular matrix, being the product of such, and is similar to the matrix Rk′ Rk which is upper triangular with positive diagonal entries. By Lemma 16.3.4 again, this time using the uniqueness assertion, Q(k) = Qk Q′k Λk , R(k) = (Λ∗k Rk′ Rk Λk ) Λ∗k Dk U


Note the term $Q_k Q_k' \Lambda_k$ must be real because the algorithm gives all $Q^{(k)}$ as real matrices. By 16.16 it follows that for $k$ large enough, $Q^{(k)} \approx Q_k \Lambda_k$, where $\approx$ means the two matrices are close. Recall $A_k = Q^{(k)T} A Q^{(k)}$ and so for large $k$,
$$A_k \approx \left( Q_k \Lambda_k \right)^{*} A \left( Q_k \Lambda_k \right) = \Lambda_k^{*} Q_k^{*} A Q_k \Lambda_k$$
As noted above, the form of $\Lambda_k^{*} Q_k^{*} A Q_k \Lambda_k$ in terms of which entries are large and small is not affected by the presence of $\Lambda_k$ and $\Lambda_k^{*}$. Thus, in considering what form this is in, it suffices to consider $Q_k^{*} A Q_k$. This could get pretty complicated but I will consider the case where
$$\text{if } |\lambda_i| = |\lambda_{i+1}|, \text{ then } |\lambda_{i+2}| < |\lambda_{i+1}|. \qquad (16.17)$$

This is typical of the situation where the eigenvalues are all distinct and the matrix A is real so the eigenvalues occur as conjugate pairs. Then in this case, Lk above is lower triangular with some nonzero terms on the diagonal right below the main diagonal but zeros everywhere else. Thus maybe (Lk )s+1,s ̸= 0 Recall 16.15 which implies Qk = SLk Rk−1

(16.18)

where Rk−1 is upper triangular. Also recall from the definition of S in 16.13, it follows that S −1 AS = D. Thus the columns of S are eigenvectors of A, the ith being an eigenvector for λi . Now from the form of Lk , it follows Lk Rk−1 is a block upper triangular matrix denoted by TB and so Qk = STB . It follows from the above construction in 16.14 and the given assumption on the sizes of the eigenvalues, there are finitely many 2 × 2 blocks centered on the main diagonal along with possibly some diagonal entries. Therefore, for large k the matrix Ak = Q(k)T AQ(k) is approximately of the same form as that of Q∗k AQk = TB−1 S −1 ASTB = TB−1 DTB which is a block upper triangular matrix. As explained above, multiplication by the various diagonal unitary matrices does not affect this form. Therefore, for large k, Ak is approximately a block upper triangular matrix. How would this change if the above assumption on the size of the eigenvalues were relaxed but the matrix was still nondefective with appropriate matrices having an LU factorization as above? It would mean the blocks on the diagonal would be larger. This immediately makes the problem more cumbersome to deal with. However, in the case that the eigenvalues of A are distinct, the above situation really is typical of what occurs and in any case can be quickly reduced to this case. To see this, suppose condition 16.17 is violated and λj , · · · , λj+p are complex eigenvalues having nonzero imaginary parts such that each has the same absolute value but they are all distinct. Then let µ > 0 and consider the matrix A + µI. Thus the corresponding eigenvalues of A + µI are λj + µ, · · · , λj+p + µ. A short computation shows |λj + µ| , · · · , |λj+p + µ| are all distinct and so the above situation of 16.17 is obtained. Of course, if there are repeated eigenvalues, it may not be possible to reduce to the case above and you would end up with large blocks on the main diagonal which could be difficult to deal with. So how do you identify the eigenvalues? You know Ak and behold that it is close to a block upper triangular matrix TB′ . You know Ak is also similar to A. Therefore, TB′ has eigenvalues which are close to the eigenvalues of Ak and hence those of A provided k is sufficiently large. See Theorem 14.4.2 which depends on complex analysis or the exercise on Page 338 which gives another way to see this. Thus you find the eigenvalues of this block triangular matrix TB′ and assert that these are good approximations of the eigenvalues of Ak and hence to those of A. How do you find the eigenvalues of a block triangular matrix? This is easy from Lemma 14.1.4. Say   B1 · · · ∗  ..  ..  TB′ =  . .   0 Bm


Then forming $\lambda I - T_B'$ and taking the determinant, it follows from Lemma 14.1.4 that this equals
$$\prod_{j=1}^{m} \det\left( \lambda I_j - B_j \right)$$

and so all you have to do is take the union of the eigenvalues for each Bj . In the case emphasized here this is very easy because these blocks are just 2 × 2 matrices. How do you identify approximate eigenvectors from this? First try to find the approximate eigenvectors for Ak . Pick an approximate eigenvalue λ, an exact eigenvalue for TB′ . Then find v solving TB′ v = λv. It follows since TB′ is close to Ak that Ak v u λv and so Q(k) AQ(k)T v = Ak v u λv Hence AQ(k)T v u λQ(k)T v and so Q(k)T v is an approximation to the eigenvector which goes with the eigenvalue of A which is close to λ. Example 16.3.8 Here is a matrix. 

$$\begin{pmatrix} 3 & 2 & 1\\ -2 & 0 & -1\\ -2 & -2 & 0 \end{pmatrix}$$
It happens that the eigenvalues of this matrix are $1$, $1+i$, $1-i$. Let's apply the QR algorithm as if the eigenvalues were not known.

Applying the QR algorithm to this matrix yields the following sequence of matrices.
$$A_1 = \begin{pmatrix} 1.2353 & 1.9412 & 4.3657\\ -0.39215 & 1.5425 & 5.3886\times10^{-2}\\ -0.16169 & -0.18864 & 0.22222 \end{pmatrix}$$
$$\vdots$$
$$A_{12} = \begin{pmatrix} 9.1772\times10^{-2} & 0.63089 & -2.0398\\ -2.8556 & 1.9082 & -3.1043\\ 1.0786\times10^{-2} & 3.4614\times10^{-4} & 1.0 \end{pmatrix}$$

At this point the bottom two terms on the left part of the bottom row are both very small, so it appears the real eigenvalue is near $1.0$. The complex eigenvalues are obtained from solving
$$\det\left( \lambda \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 9.1772\times10^{-2} & 0.63089\\ -2.8556 & 1.9082 \end{pmatrix} \right) = 0$$
This yields
$$\lambda = 1.0 - 0.98828i, \quad 1.0 + 0.98828i$$

Example 16.3.9 The equation $x^4 + x^3 + 4x^2 + x - 2 = 0$ has exactly two real solutions. You can see this by graphing it. However, the rational root theorem from algebra shows neither of these solutions is rational. Also, graphing it does not yield any information about the complex solutions. Let's use the QR algorithm to approximate all the solutions, real and complex.


A matrix whose characteristic polynomial is the given polynomial is
$$\begin{pmatrix} -1 & -4 & -1 & 2\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Using the QR algorithm yields the following sequence of iterates for $A_k$:
$$A_1 = \begin{pmatrix} 0.99999 & -2.5927 & -1.7588 & -1.2978\\ 2.1213 & -1.7778 & -1.6042 & -0.99415\\ 0 & 0.34246 & -0.32749 & -0.91799\\ 0 & 0 & -0.44659 & 0.10526 \end{pmatrix}$$
$$\vdots$$
$$A_9 = \begin{pmatrix} -0.83412 & -4.1682 & -1.939 & -0.7783\\ 1.05 & 0.14514 & 0.2171 & 2.5474\times10^{-2}\\ 0 & 4.0264\times10^{-4} & -0.85029 & -0.61608\\ 0 & 0 & -1.8263\times10^{-2} & 0.53939 \end{pmatrix}$$

Now this is similar to $A$ and the eigenvalues are close to the eigenvalues obtained from the two blocks on the diagonal,
$$\begin{pmatrix} -0.83412 & -4.1682\\ 1.05 & 0.14514 \end{pmatrix}, \qquad \begin{pmatrix} -0.85029 & -0.61608\\ -1.8263\times10^{-2} & 0.53939 \end{pmatrix}$$
since $4.0264\times10^{-4}$ is small. After routine computations involving the quadratic formula, these are seen to be
$$-0.85834, \quad 0.54744, \quad -0.34449 - 2.0339i, \quad -0.34449 + 2.0339i$$
When these are plugged in to the polynomial equation, you see that each is close to being a solution of the equation.
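This root-finding recipe is easy to try on a machine. Here is a minimal Matlab sketch (variable names are my own); it builds the companion matrix of the quartic above and runs the plain QR iteration, so the output should roughly match the displayed iterates, possibly up to signs.

p = [1 1 4 1 -2];          % coefficients of x^4 + x^3 + 4x^2 + x - 2
C = compan(p);             % companion matrix whose eigenvalues are the roots
Ak = C;
for k = 1:9
  [Qk,Rk] = qr(Ak);
  Ak = Rk*Qk;              % each iterate is similar to C
end
Ak                         % nearly block upper triangular
eig(C)                     % for comparison, Matlab's own eigenvalue routine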

16.3.4

Upper Hessenberg Matrices

It seems like most of the attention to the QR algorithm has to do with finding ways to get it to "converge" faster. Great and marvelous are the clever tricks which have been proposed to do this, but my intent is to present the basic ideas, not to go into the numerous refinements of this algorithm. However, there is one thing which should be done. It involves reducing to the case of an upper Hessenberg matrix, which is one which is zero below the main subdiagonal. The following shows that any square matrix is unitarily similar to such an upper Hessenberg matrix. Let $A$ be an invertible $n \times n$ matrix. Let $Q_1'$ be a unitary matrix
$$Q_1' \begin{pmatrix} a_{21}\\ \vdots\\ a_{n1} \end{pmatrix} = \begin{pmatrix} \sqrt{\sum_{j=2}^{n} |a_{j1}|^2}\\ 0\\ \vdots\\ 0 \end{pmatrix} \equiv \begin{pmatrix} a\\ 0\\ \vdots\\ 0 \end{pmatrix}$$
The vector $Q_1'$ is multiplying is just the bottom $n-1$ entries of the first column of $A$. Then let $Q_1$ be
$$\begin{pmatrix} 1 & 0\\ 0 & Q_1' \end{pmatrix}$$


It follows
$$Q_1 A Q_1^{*} = \begin{pmatrix} 1 & 0\\ 0 & Q_1' \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & & & \\ \vdots & & A_1' & \\ a_{n1} & & & \end{pmatrix} \begin{pmatrix} 1 & 0\\ 0 & Q_1'^{*} \end{pmatrix} = \begin{pmatrix} a_{11} & * & \cdots & *\\ a & & & \\ 0 & & A_1 & \\ \vdots & & & \\ 0 & & & \end{pmatrix}$$
Now let $Q_2'$ be the $(n-2) \times (n-2)$ matrix which does to the first column of $A_1$ the same sort of thing that the $(n-1) \times (n-1)$ matrix $Q_1'$ did to the first column of $A$. Let
$$Q_2 \equiv \begin{pmatrix} I & 0\\ 0 & Q_2' \end{pmatrix}$$
where $I$ is the $2 \times 2$ identity. Then applying block multiplication,
$$Q_2 Q_1 A Q_1^{*} Q_2^{*} = \begin{pmatrix} * & * & \cdots & * & *\\ * & * & \cdots & * & *\\ 0 & * & & & \\ \vdots & \vdots & & A_2 & \\ 0 & 0 & & & \end{pmatrix}$$
where $A_2$ is now an $(n-2) \times (n-2)$ matrix. Continuing this way, you eventually get a unitary matrix $Q$ which is a product of those discussed above such that
$$Q A Q^{T} = \begin{pmatrix} * & * & \cdots & * & *\\ * & * & \cdots & * & *\\ 0 & * & * & & \vdots\\ \vdots & \ddots & \ddots & \ddots & *\\ 0 & \cdots & 0 & * & * \end{pmatrix}$$

This matrix equals zero below the subdiagonal. It is called an upper Hessenberg matrix. It happens that in the QR algorithm, if $A_k$ is upper Hessenberg, so is $A_{k+1}$. To see this, note that the matrix being upper Hessenberg means that $A_{ij} = 0$ whenever $i - j \ge 2$. Here $A_{k+1} = R_k Q_k$ where $A_k = Q_k R_k$. Therefore, as shown before,
$$A_{k+1} = R_k A_k R_k^{-1}$$
Let the $ij^{th}$ entry of $A_k$ be $a^{k}_{ij}$. Then if $i - j \ge 2$,
$$a^{k+1}_{ij} = \sum_{p=i}^{n} \sum_{q=1}^{j} r_{ip} \, a^{k}_{pq} \, r^{-1}_{qj}$$
It is given that $a^{k}_{pq} = 0$ whenever $p - q \ge 2$. However, from the above sum, $p - q \ge i - j \ge 2$


and so the sum equals 0. Since upper Hessenberg matrices stay that way in the algorithm and such a matrix is closer to being upper triangular, it is reasonable to suppose the QR algorithm will yield good results more quickly for this upper Hessenberg matrix than for the original matrix. This would be especially true if the matrix is good sized. The other important thing to observe is that, starting with an upper Hessenberg matrix, the algorithm will restrict the size of the blocks which occur to being $2 \times 2$ blocks, which are easy to deal with. These blocks allow you to identify the eigenvalues.

Example 16.3.10 Let
$$A = \begin{pmatrix} 1 & 2 & 3 & 4\\ 2 & -2 & -3 & 3\\ 3 & -3 & 5 & 1\\ 4 & 3 & 1 & -3 \end{pmatrix}$$
a symmetric matrix. Thus it has real eigenvalues and can be diagonalized. Find its eigenvalues.

As explained above, there is an upper Hessenberg matrix similar to $A$. Matlab can find it using the techniques given above pretty quickly. The syntax is as follows.

A=[1 2 3 4; 2 -2 -3 3; 3 -3 5 1; 4 3 1 -3];
[P,H]=hess(A)

Then the Hessenberg matrix similar to $A$ is
$$H = \begin{pmatrix} -1.4476 & -4.9048 & 0 & 0\\ -4.9048 & 3.2553 & -2.0479 & 0\\ 0 & -2.0479 & 2.1923 & -5.0990\\ 0 & 0 & -5.0990 & -3 \end{pmatrix}$$
Note how it is symmetric also. This will always happen when you begin with a symmetric matrix. Now use the QR algorithm on this matrix. The syntax is as follows in Matlab.

H=[enter H here]
hold on
for k=1:100
[Q,R]=qr(H);
H=R*Q;
end
Q
R
H

You already have H and Matlab knows about it, so you don't need to enter H again. This yields the following matrix similar to the original one.
$$\begin{pmatrix} 7.4618 & 0 & 0 & 0\\ 0 & -6.3804 & 0 & 0\\ 0 & 0 & -4.419 & -0.3679\\ 0 & 0 & -0.3679 & 4.3376 \end{pmatrix}$$
The eigenvalues of this matrix are
$$7.4618, \quad -6.3804, \quad 4.353, \quad -4.4344$$
You might want to check that the product of these equals the determinant of the matrix and that the sum equals the trace of the matrix. In fact, this works out very well. To find eigenvectors, you could use the shifted inverse power method. They will be different for the Hessenberg matrix than for the original matrix $A$.
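For the suggested sanity check, a couple of Matlab one-liners suffice. This is a minimal sketch, assuming the matrices A and H from this example are already in the workspace.

lam = eig(H);           % eigenvalues of the Hessenberg matrix (the same as those of A)
[sum(lam) trace(A)]     % the sum of the eigenvalues should equal the trace
[prod(lam) det(A)]      % the product should equal the determinant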

16.4

Exercises

In these exercises which call for a computation, don’t waste time on them unless you use a computer or calculator which can raise matrices to powers and take QR factorizations. 1. In Example 16.2.7 an eigenvalue was found correct to several decimal places along with an eigenvector. Find the other eigenvalues along with their eigenvectors.   3 2 1   2. Find the eigenvalues and eigenvectors of the matrix A =  2 1 3  numerically. In this 1 3 2 √ case the exact eigenvalues are ± 3, 6. Compare with the exact answers.   3 2 1   3. Find the eigenvalues and eigenvectors of the matrix A =  2 5 3  numerically. The exact 1 3 2 √ √ eigenvalues are 2, 4 + 15, 4 − 15. Compare your numerical results with the exact values. Is it much fun to compute the exact eigenvectors?   0 2 1   4. Find the eigenvalues and eigenvectors of the matrix A =  2 5 3  numerically. I don’t 1 3 2 know the exact eigenvalues in this case. Check your answers by multiplying your numerically computed eigenvectors by the matrix.   0 2 1   5. Find the eigenvalues and eigenvectors of the matrix A =  2 0 3  numerically. I don’t 1 3 2 know the exact eigenvalues in this case. Check your answers by multiplying your numerically computed eigenvectors by the matrix.   3 2 3   T 6. Consider the matrix A =  2 1 4  and the vector (1, 1, 1) . Find the shortest distance 3 4 0 between the Rayleigh quotient determined by this vector and some eigenvalue of A.   1 2 1   T 7. Consider the matrix A =  2 1 4  and the vector (1, 1, 1) . Find the shortest distance 1 4 5 between the Rayleigh quotient determined by this vector and some eigenvalue of A.   3 2 3   T 8. Consider the matrix A =  2 6 4  and the vector (1, 1, 1) . Find the shortest distance 3 4 −3 between the Rayleigh quotient determined by this vector and some eigenvalue of A.   3 2 3   9. Using Gerschgorin’s theorem, find upper and lower bounds for the eigenvalues of A =  2 6 4  . 3 4 −3 10. Tell how to find a matrix whose characteristic polynomial is a given monic polynomial. This is called a companion matrix. Find the roots of the polynomial x3 + 7x2 + 3x + 7. 11. Find the roots to x4 + 3x3 + 4x2 + x + 1. It has two complex roots.


12. Suppose A is a real symmetric matrix and the technique of reducing to an upper Hessenberg matrix is followed. Show the resulting upper Hessenberg matrix is actually equal to 0 on the top as well as the bottom.


Part III

Analysis Which Involves Linear Algebra


Chapter 17

Approximation of Functions and the Integral

In this part of the book, some topics in analysis are considered in which linear algebra plays a key role. This is not about linear algebra, but linear algebra is used in a very essential manner. This first chapter is just what the title indicates. It will involve approximating functions and a simple definition of the integral. This definition is sufficient to consider all piecewise continuous functions and it does not depend on Riemann sums. Thus it is closer to what was done in the 1700's than in the 1800's. However, it is based on the Weierstrass approximation theorem, so it is definitely dependent on material which originated in the 1800's. After this is a very interesting application of ideas from linear algebra to prove Müntz's theorems. The notation C([0, b]; X) will denote the functions which are continuous on [0, b] with values in X, where X will always be a normed vector space. It could be C or R for example.

17.1

Weierstrass Approximation Theorem

An arbitrary continuous function defined on an interval can be approximated uniformly by a polynomial. There exists a similar theorem, a generalization of this, which will hold for continuous functions defined on a box or, more generally, on a closed and bounded set. However, we will settle for the case of a box first. The proof is based on the following lemma.

Lemma 17.1.1 The following estimate holds for $x \in [0,1]$ and $m \ge 2$.
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^{k} (1-x)^{m-k} \le \frac{1}{4} m$$

Proof: First of all, from the binomial theorem
$$\sum_{k=0}^{m} \binom{m}{k} (tx)^{k} (1-x)^{m-k} = (1 - x + tx)^{m}$$
Take a derivative and then let $t = 1$.
$$\sum_{k=0}^{m} \binom{m}{k} k (tx)^{k-1} x (1-x)^{m-k} = mx (tx - x + 1)^{m-1}$$
$$\sum_{k=0}^{m} \binom{m}{k} k \, x^{k} (1-x)^{m-k} = mx$$


Then also,
$$\sum_{k=0}^{m} \binom{m}{k} k (tx)^{k} (1-x)^{m-k} = mxt (tx - x + 1)^{m-1}$$
Take another time derivative of both sides.
$$\sum_{k=0}^{m} \binom{m}{k} k^2 (tx)^{k-1} x (1-x)^{m-k} = mx \left( (tx - x + 1)^{m-1} - tx (tx - x + 1)^{m-2} + mtx (tx - x + 1)^{m-2} \right)$$
Plug in $t = 1$.
$$\sum_{k=0}^{m} \binom{m}{k} k^2 x^{k} (1-x)^{m-k} = mx (mx - x + 1)$$
Then it follows
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^{k} (1-x)^{m-k} = \sum_{k=0}^{m} \binom{m}{k} \left( k^2 - 2kmx + x^2 m^2 \right) x^{k} (1-x)^{m-k}$$
and from what was just shown, this equals
$$x^2 m^2 - x^2 m + mx - 2mx (mx) + x^2 m^2 = -x^2 m + mx = \frac{m}{4} - m \left( x - \frac{1}{2} \right)^2.$$

Thus the expression is maximized when x = 1/2 and yields m/4 in this case. This proves the lemma.  With this preparation, here is the first version of the Weierstrass approximation theorem. I will allow f to have values in a complete, real or complex normed linear space. Thus, f ∈ C ([0, 1] ; X) where X is a Banach space, Definition 15.4.1. Thus this is a function which is continuous with values in X as discussed earlier with metric spaces. Theorem 17.1.2 Let f ∈ C ([0, 1] ; X) and let the norm on X be denoted by ∥·∥ . ( ) ( ) m ∑ m k m−k xk (1 − x) f pm (x) ≡ . m k k=0 Then these polynomials having coefficients in X converge uniformly to f on [0, 1]. Proof: Let ∥f ∥∞ denote the largest value of ∥f (x)∥. By uniform continuity of f , there exists a δ > 0 such that if |x − x′ | < δ, then ∥f (x) − f (x′ )∥ < ε/2. By the binomial theorem, ) (

( )

m ∑

m m−k

f k − f (x) ∥pm (x) − f (x)∥ ≤ xk (1 − x)

m k k=0



∑ k −x| 0 is given, there exists a polynomial p such that for all x ∈ [0, 1] , ∥p (x) − f ◦ l (x)∥ < ε. Therefore, letting y = l (x) , it follows that for all y ∈ [a, b] ,

( −1

)

p l (y) − f (y) < ε.  The exact form of the polynomial is as follows. ( ) ( ( )) m ∑ m k m−k p (x) = xk (1 − x) f l m k k=0 (

p l

−1

)

(y) =

m ∑ k=0

(

m k

)

(

l

−1

)k ( )m−k (y) 1 − l−1 (y) f

( ( )) k l m

(17.1)

As another corollary, here is the version which will be used in Stone’s generalization later. Corollary 17.1.4 Let f be a continuous function defined on [−M, M ] with f (0) = 0. Then there exists a sequence of polynomials {pm }, pm (0) = 0 and limm→∞ ∥pm − f ∥∞ = 0. Proof: From Corollary 17.1.3 there exists a sequence of polynomials {pc c m } such that ∥p m − f ∥∞ → 0. Simply consider pm = pc − p c (0).  m m
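The Bernstein polynomials of Theorem 17.1.2 are easy to compute, so the theorem can be seen numerically. Here is a minimal Matlab sketch (the test function and variable names are my own choices); it builds $p_m$ for a continuous $f$ on $[0,1]$ and reports the uniform error, which should shrink as $m$ grows.

f = @(x) abs(x - 1/2);                  % a continuous but non-smooth test function
x = linspace(0,1,1001);
for m = [5 20 50]
  p = zeros(size(x));
  for k = 0:m
    p = p + nchoosek(m,k) * x.^k .* (1-x).^(m-k) * f(k/m);   % Bernstein polynomial p_m
  end
  err = max(abs(p - f(x)));             % approximate uniform distance on [0,1]
  fprintf('m = %4d   error = %8.5f\n', m, err);
end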

17.2

Functions of Many Variables

First note that if h : K × H → R is a real valued continuous function where K, H are compact sets in metric spaces, max h (x, y) ≥ h (x, y) , so max max h (x, y) ≥ h (x, y) x∈K

y∈H x∈K

which implies maxy∈H maxx∈K h (x, y) ≥ max(x,y)∈K×H h (x, y) . The other inequality is also obtained. p Let f ∈ C (Rp ; X) where Rp = [0, 1] . Then let x ˆp ≡ (x1 , ..., xp−1 ) . By Theorem 17.1.2, if n is large enough,

( )( ) n



k ε n k

n−k f ·, < max xp (1 − xp ) − f (·, xp )

n k 2 xp ∈[0,1] k=0 C ([0,1]p−1 ;X )


( ) Now f ·, nk ∈ C (Rp−1 ; X) and so by induction, there is a polynomial pk (ˆ xp ) such that

) ( ) (

ε n k


0 there exists a polynomial p having coefficients in X such that ∥p − f ∥C(R;X) < ε. These Bernstein polynomials are very remarkable approximations. It turns out that if f is C 1 ([0, 1] ; X) , then limn→∞ p′n (x) → f ′ (x) uniformly on [0, 1] . This all works for functions of many variables as well, but here I will only show it for functions of one variable. I assume the reader knows about the derivative of a function of one variable. If not, skip this till later till after the derivative has been presented. Lemma 17.2.2 Let f ∈ C 1 ([0, 1]) and let ( ) ( ) m ∑ m k m−k k pm (x) ≡ x (1 − x) f m k k=0 be the mth Bernstein polynomial. Then in addition to ∥pm − f ∥[0,1] → 0, it also follows that ∥p′m − f ′ ∥[0,1] → 0 Proof: From simple computations, ( ) ( ) m ∑ m k m−k ′ k−1 pm (x) = kx (1 − x) f m k k=1 ( ) ( ) m−1 ∑ m k m−1−k − xk (m − k) (1 − x) f m k k=0 ( ) m (m − 1)! k m−k k−1 = x (1 − x) f (m − k)! (k − 1)! m k=1 ( ) ( ) m−1 ∑ m k m−1−k − xk (m − k) (1 − x) f m k k=0 m ∑

=

) k+1 m k=0 ( ) m−1 ∑ m (m − 1)! k m−1−k k − x (1 − x) f (m − 1 − k)!k! m m−1 ∑

m (m − 1)! k m−1−k x (1 − x) f (m − 1 − k)!k!

k=0

(

17.3. A GENERALIZATION WITH TIETZE EXTENSION THEOREM

=

427

( ( ) ( )) m (m − 1)! k k+1 k m−1−k x (1 − x) f −f (m − 1 − k)!k! m m k=0 ( ) ( ( ) ( k )) m−1 ∑ f k+1 −f m m−1 m−1−k k m x (1 − x) = 1/m k k=0

m−1 ∑

By the mean value theorem, ( ) (k) ( ) f k+1 −f m k k+1 ′ m = f (xk,m ) , xk,m ∈ , 1/m m m Now the desired result follows as before from the uniform continuity of f ′ on [0, 1]. Let δ > 0 be such that if |x − y| < δ, then |f ′ (x) − f ′ (y)| < ε k and let m be so large that 1/m < δ/2. Then if x − m < δ/2, it follows that |x − xk,m | < δ and so ) ( k ) ( −f m f k+1 ′ ′ ′ m |f (x) − f (xk,m )| = f (x) − < ε. 1/m Now as before, letting M ≥ |f ′ (x)| for all x, ( ) m−1 ∑ m−1 m−1−k ′ ′ |pm (x) − f (x)| ≤ xk (1 − x) |f ′ (xk,m ) − f ′ (x)| k k=0 (





k |< δ2 } {x:|x− m

+M

m−1 ∑ k=0

(

m−1 k

m−1 k

)

) xk (1 − x)

m−1−k

ε

2

4 (k − mx) k m−1−k x (1 − x) m2 δ 2

1 1 1 ≤ ε + 4M m 2 2 = ε + M < 2ε 4 m δ mδ 2 whenever m is large enough. Thus this proves uniform convergence. 

17.3

A Generalization with Tietze Extension Theorem

This is an interesting theorem which holds in arbitrary normal topological spaces. In particular it holds in metric space and this is the context in which it will be discussed. First is a lemma. Lemma 17.3.1 Let X be a metric space and let S be a nonempty subset of X. dist (x, S) ≡ inf {d (x, z) : z ∈ S} Then |dist (x, S) − dist (y, S)| ≤ d (x, y) . Proof: Say dist (x, S) ≥ dist (y, S) . Then letting ε > 0 be given, there exists z ∈ S such that d (y, z) < dist (y, S) + ε Then |dist (x, S) − dist (y, S)| = dist (x, S) − dist (y, S) ≤ dist (x, S) − (d (y, z) − ε) ≤ d (x, z) − (d (y, z) − ε) ≤ d (x, y) + d (y, z) − d (y, z) + ε = d (x, y) + ε Since ε is arbitrary, |dist (x, S) − dist (y, S)| ≤ d (x, y) It is similar if dist (x, S) < dist (y, S) . 

428

CHAPTER 17. APPROXIMATION OF FUNCTIONS AND THE INTEGRAL

Lemma 17.3.2 Let H, K be two nonempty disjoint closed subsets of X. Then there exists a continuous function, g : X → [−1/3, 1/3] such that g (H) = −1/3, g (K) = 1/3, g (X) ⊆ [−1/3, 1/3] . dist(x,H) Proof: Let f (x) ≡ dist(x,H)+dist(x,K) . The denominator is never equal to zero because if dist (x, H) = 0, then x ∈ H because H is closed. (To see this, pick hk ∈ B (x, 1/k) ∩ H. Then hk → x and since H is closed, x ∈ H.) Similarly, if dist (x, K) = 0, then x ∈ K and so the denominator is never zero as claimed. ( Hence )f is continuous and from its definition, f = 0 on H and f = 1 on K. Now let g (x) ≡ 32 f (x) − 12 . Then g has the desired properties. 

Definition 17.3.3 For f : M ⊆ X → R, define ∥f ∥M as sup {|f (x)| : x ∈ M } . This is just notation. I am not claiming this is a norm. Lemma 17.3.4 Suppose M is a closed set in X and suppose f : M → [−1, 1] is continuous at every point of M. Then there exists a function, g which is defined and continuous on all of X such that ∥f − g∥M < 32 , g (X) ⊆ [−1/3, 1/3] . Proof: Let H = f −1 ([−1, −1/3]) , K = f −1 ([1/3, 1]) . Thus H and K are disjoint closed subsets of M. Suppose first H, K are both nonempty. Then by Lemma 17.3.2 there exists g such that g is a continuous function defined on all of X and g (H) = −1/3, g (K) = 1/3, and g (X) ⊆ [−1/3, 1/3] . It follows ∥f − g∥M < 2/3. If H = ∅, then f has all its values in [−1/3, 1] and so letting g ≡ 1/3, the desired condition is obtained. If K = ∅, let g ≡ −1/3.  Lemma 17.3.5 Suppose M is a closed set in X and suppose f : M → [−1, 1] is continuous at every point of M. Then there exists a function g which is defined and continuous on all of X such that g = f on M and g has its values in [−1, 1] . Proof: Using Lemma 17.3.4, let g1 be such that g1 (X) ⊆ [−1/3, 1/3] and ∥f − g1 ∥M ≤ 32 . Suppose g1 , · · · , gm have been chosen such that gj (X) ⊆ [−1/3, 1/3] and

( )m m ( )i−1

∑ 2 2

gi < . (17.2)

f −

3 3 i=1 M

This has been done for m = 1. Then

( ) ( ) m ( )i−1

3 m

∑ 2

f− gi

2

3 i=1 and so

( 3 )m ( 2

f−

∑m ( 2 )i−1 i=1

3

)

gi

≤1

M

can play the role of f in the first step of the proof. Therefore,

there exists gm+1 defined and continuous on all of X such that its values are in [−1/3, 1/3] and

( ) ( ) m ( )i−1

3 m ∑ 2 2

f− gi − gm+1 ≤ .

2 3 3 i=1 M

Hence

(

) ( ) m ( )i−1 m

∑ 2 2

gi − gm+1

f−

3 3 i=1

M

( )m+1 2 ≤ . 3

It follows there exists a sequence, {gi } such that each has its values in [−1/3, 1/3] and for every m ∑∞ ( )i−1 17.2 holds. Then let g (x) ≡ i=1 23 gi (x) . It follows ∞ ( ) m ( )i−1 ∑ 2 i−1 ∑ 2 1 |g (x)| ≤ gi (x) ≤ ≤1 3 3 3 i=1 i=1 ( ) i−1 1 ( )i−1 gi (x) ≤ 23 and 23 3 so the Weierstrass M test applies and shows convergence is uniform. Therefore g must be continuous by Theorem 11.1.46. The estimate 17.2 implies f = g on M .  The following is the Tietze extension theorem.

17.4. AN APPROACH TO THE INTEGRAL

429

Theorem 17.3.6 Let M be a closed nonempty subset of X and let f : M → [a, b] be continuous at every point of M. Then there exists a function, g continuous on all of X which coincides with f on M such that g (X) ⊆ [a, b] . 2 Proof: Let f1 (x) = 1 + b−a (f (x) − b) . Then f1 satisfies the conditions of Lemma 17.3.5 and so there exists (g1 : )X → [−1, 1] such that g is continuous on X and equals f1 on M . Let g (x) = (g1 (x) − 1) b−a + b. This works.  2 For x ∈ M, (( ) )( ) 2 b−a g (x) = 1+ (f (x) − b) − 1 +b b−a 2 (( )) ( ) 2 b−a = (f (x) − b) +b b−a 2 = (f (x) − b) + b = f (x)

Also 1 +

2 b−a

(f (x) − b) ∈ [−1, 1] so

2 b−a

(f (x) − b) ∈ [−2, 0] and

(f (x) − b) ∈ [−b + a, 0] , f (x) ∈ [a, b] . With the Tietze extension theorem, here is a better version of the Weierstrass approximation theorem. Theorem 17.3.7 Let K be a closed and bounded subset of Rp and let f : K → R be continuous. Then there exists a sequence of polynomials {pm } such that lim (sup {|f (x) − pm (x)| : x ∈ K}) = 0.

m→∞

In other words, the sequence of polynomials converges uniformly to f on K. Proof: By the Tietze extension theorem, there exists an extension of f to a continuous function p g defined on ∏pall R such that g = f on K. Now since K is bounded, there exist intervals, [ak , bk ] such that K ⊆ k=1 [ak , bk ] = R. Then by the Weierstrass approximation theorem, Theorem 17.2.1 there exists a sequence of polynomials {pm } converging uniformly to g on R. Therefore, this sequence of polynomials converges uniformly to g = f on K as well. This proves the theorem.  By considering the real and imaginary parts of a function which has values in C one can generalize the above theorem. Corollary 17.3.8 Let K be a closed and bounded subset of Rp and let f : K → F be continuous. Then there exists a sequence of polynomials {pm } such that lim (sup {|f (x) − pm (x)| : x ∈ K}) = 0.

m→∞

In other words, the sequence of polynomials converges uniformly to f on K. More generally, the function f could have values in Rp . There is no change in the proof. You just use norm symbols rather than absolute values and nothing at all changes in the theorem where the function is defined on a rectangle. Then you apply the Tietze extension theorem to each component in the case the function has values in Rp . Using a better extension theorem than what is presented in this book, one could generalize this to a function having values in a Banach space.

17.4

An Approach to the Integral

First is a short review of the derivative of a function of one variable. (x) Definition 17.4.1 Let f : [a, b] → R. Then f ′ (x) ≡ limx→0 f (x+h)−f where h is always such h that x, x + h are both in the interval [a, b] so we include derivatives at the right and left end points in this definition.

430

CHAPTER 17. APPROXIMATION OF FUNCTIONS AND THE INTEGRAL

The most important theorem about derivatives of functions of one variable is the mean value theorem. Theorem 17.4.2 Let f : [a, b] → R be continuous. Then if the maximum value of f occurs at a point x ∈ (a, b) , it follows that if f ′ (x) = 0. If f achieves a minimum at x ∈ (a, b) where f ′ (x) exists, it also follows that f ′ (x) = 0. Proof: By Theorem 11.1.39, f achieves a maximum at some point x. If f ′ (x) exists, then f (x + h) − f (x) f (x + h) − f (x) = lim h→0− h→0+ h h

f ′ (x) = lim

However, the first limit is non-positive while the second is non-negative and so f ′ (x) = 0. The situation is similar if the minimum occurs at x ∈ (a, b).  The Cauchy mean value theorem follows. The usual one is obtained by letting g(x) = x. Theorem 17.4.3 Let f, g be continuous on [a, b] and differentiable on (a, b) . Then there exists x ∈ (a, b) such that f ′ (x) (g (b) − g (a)) = g ′ (x) (f (b) − f (a)). If g (x) = x, this yields f (b)−f (a) = f ′ (x) (b − a) , also f (a) − f (b) = f ′ (x) (a − b). Proof: Let h (x) ≡ f (x) (g (b) − g (a)) − g (x) (f (b) − f (a)) . Then h (a) = h (b) = f (a) g (b) − g (a) f (b) . If h is constant, then pick any x ∈ (a, b) and h′ (x) = 0. If h is not constant, then it has either a maximum or a minimum on (a, b) and so if x is the point where either occurs, then h′ (x) = 0 which proves the theorem.  Recall that an antiderivative of a function f is)just a function F such that F ′ = f . You know how ( n+1 ′ ∫ ∑n ∑n xk+1 k to find an antiderivative for a polynomial. xn+1 = xn so k=1 ak x = k=1 ak k+1 + C. With this information and the Weierstrass theorem, it is easy to define integrals of continuous functions with all the properties presented in elementary calculus courses. It is an approach which does not depend on Riemann sums yet still gives the fundamental theorem of calculus. Note that if F ′ (x) = 0 for x in an interval, then for x, y in that interval, F (y) − F (x) = 0 (y − x) so F is a constant. Thus, if F ′ = G′ on an open interval, F, G continuous on the closed interval, it follows that F − G is a constant and so F (b) − F (a) = G (b) − G (a). Definition 17.4.4 For p (x) a polynomial on [a, b] , let P ′ (x) = p (x) . Thus, by the mean value ∫b theorem if P ′ , Pˆ ′ both equal p, it follows that P (b) − P (a) = Pˆ (b) − Pˆ (a) . Then define a p (x) dx ≡ ∫b ∫b P (b) − P (a). If f ∈ C ([a, b]) , define a f (x) dx ≡ limn→∞ a pn (x) dx where limn→∞ ∥pn − f ∥ ≡ limn→∞ maxx∈[a,b] |f (x) − pn (x)| = 0. Proposition 17.4.5 The above integral is well defined and satisfies the following properties. ∫ ∫b b 1. a f dx = f (ˆ x) (b − a) for some x ˆ between a and b. Thus a f dx ≤ ∥f ∥ |b − a| . 2. If f is continuous on an interval which contains all necessary intervals, ∫ c ∫ b ∫ b ∫ b ∫ a ∫ b f dx + f dx = f dx, so f dx + f dx = f dx = 0 a

c

a

a

b

b

∫x

f dt, Then F ′ (x) = f (x) so any continuous function has an antiderivative, and ∫b for any a ̸= b, a f dx = G (b) − G (a) whenever G′ = f on the open interval determined by a, b and G continuous on the closed interval determined by a, b. Also, ∫ ∫ b ∫ b f (x) dx + β βg (x) dx (αf (x) + βg (x)) dx = α

3. If F (x) ≡

a

a

a

a

∫ ∫ ∫b b b If a < b, and f (x) ≥ 0, then a f dx ≥ 0. Also a f dx ≤ a |f | dx .

17.4. AN APPROACH TO THE INTEGRAL 4.

∫b a

431

1dx = b − a.

Proof: First, why is the integral well defined? With notation as in the above definition, the mean value theorem implies ∫ b p (x) dx ≡ P (b) − P (a) = p (ˆ x) (b − a) (17.3) a

∫ b where x ˆ is between a and b and so a p (x) dx ≤ ∥p∥ |b − a| . If ∥pn − f ∥ → 0, then lim ∥pn − pm ∥ = 0

m,n→∞

and so

∫ ∫ b b pn (x) dx − pm (x) dx a a

= |(Pn (b) − Pn (a)) − (Pm (b) − Pm (a))| = |(Pn (b) − Pm (b)) − (Pn (a) − Pm (a))| ∫ b = (pn − pm ) dx ≤ ∥pn − pm ∥ |b − a| a

Thus the limit exists because

{∫ b a

} pn dx is a Cauchy sequence and R is complete. n

From 17.3, 1. holds for a polynomial p (x). Let ∥pn − f ∥ → 0. Then by definition, ∫ b ∫ b f dx ≡ lim pn dx = pn (xn ) (b − a) n→∞

a

(17.4)

a

for some xn in the open interval determined by (a, b) . By compactness, there is a further subsequence, still denoted with n such that xn → x ∈ [a, b] . Then fixing m such that ∥f − pn ∥ < ε whenever n ≥ m, assume n > m. Then ∥pm − pn ∥ ≤ ∥pm − f ∥ + ∥f − pn ∥ < 2ε and so |f (x) − pn (xn )| ≤ |f (x) − f (xn )| + |f (xn ) − pm (xn )| + |pm (xn ) − pn (xn )| ≤ |f (x) − f (xn )| + ∥f − pm ∥ + ∥pm − pn ∥ < |f (x) − f (xn )| + 3ε Now if n is still larger, continuity of f shows that |f (x) − pn (xn )| < 4ε. Since ε is arbitrary, pn (xn ) → f (x) and so, passing to the limit with this subsequence in 17.4 yields 1. Now consider 2. It holds for polynomials p (x) obviously. So let ∥pn − f ∥ → 0. Then ∫ c ∫ b ∫ b pn dx + pn dx = pn dx a

c

a

∫b Pass to a limit as n → ∞ and use the definition to get 2. Also note that b f (x) dx = 0 follows from the definition. Next consider 3. Let h ̸= 0 and let x be in the open interval determined by a and b. Then for small h, ∫ F (x + h) − F (x) 1 x+h f (t) dt = f (xh ) = h h x where xh is between x and x + h. Let h → 0. By continuity of f, it follows that the limit of the right side exists and so F (x + h) − F (x) lim = lim f (xh ) = f (x) h→0 h→0 h If x is either end point, the argument is the same except you have to pay attention to the sign of h so that both x and x + h are in [a, b]. Thus F is continuous on [a, b] and F ′ exists on (a, b) so if G is an antiderivative, ∫ b f (t) dt ≡ F (b) = F (b) − F (a) = G (b) − G (a) a

432

CHAPTER 17. APPROXIMATION OF FUNCTIONS AND THE INTEGRAL

The claim that the integral is linear is obvious from this. Indeed, if F ′ = f, G′ = g, ∫ b (αf (t) + βg (t)) dt = αF (b) + βG (b) − (αF (a) + βG (a)) a

= α (F (b) − F (a)) + β (G (b) − G (a)) ∫ b ∫ b = α f (t) dt + β g (t) dt a

a

If f ≥ 0, then the mean value theorem implies that for some ∫ b t ∈ (a, b) , F (b) − F (a) = f dx = f (t) (b − a) ≥ 0. a

∫b ∫b ∫b ∫b (|f | − f ) dx ≥ 0, a (|f | + f ) dx ≥ 0 and so a |f | dx ≥ a f dx and a |f | dx ≥ − a f dx so ∫b ∫ b ∫ b this proves a f dx ≤ a |f | dx. This, along with part 2 implies the other claim that a f dx ≤ ∫ b a |f | dx . The last claim is obvious because an antiderivative of 1 is F (x) = x.  Note also that the usual change of variables theorem is available because if F ′ = f, then d f (g (x)) g ′ (x) = dx F (g (x)) so that, from the above proposition, ∫ b ∫ g(b) F (g (b)) − F (g (a)) = f (y) dy = f (g (x)) g ′ (x) dx. Thus

∫b

∫b a

g(a)

a



We usually let y = g (x) and dy = g (x) dx and then change the limits as indicated above, equivalently we massage the expression to look like the above. Integration by parts also follows from differentiation rules. ∫b ∫b αp 1 Consider the iterated integral a11 · · · app αxα 1 · · · xp dxp · · · dx1 . It means just what it meant in calculus. You do the integral with respect to xp first, keeping the other variables constant, obtaining a polynomial function of the other variables. Then you do this one with respect to xp−1 and so forth. Thus, doing the computation, it reduces to (∫ ) ) p p ( α +1 bk ∏ ∏ aαk +1 b k αk α xk dxk = α − αk + 1 αk + 1 ak k=1

k=1

and the same thing would be obtained for any other order of the iterated integrals. Since each of these integrals is linear, it follows that if (i1 , · · · , ip ) is any permutation of (1, · · · , p) , then for any polynomial q, ∫ b1 ∫ bp ∫ bi1 ∫ bip ··· q (x1 , ..., xp ) dxp · · · dx1 = ··· q (x1 , ..., xp ) dxip · · · dxi1 a1

∏p

ap

aip

aip

Now let f : k=1 [ak , bk ] → R be continuous. Then each iterated integral results in a continuous function of the remaining variables and so the iterated integral makes sense. For example, by ∫ ∫d d Proposition 17.4.5, c f (x, y) dy − c f (ˆ x, y) dy = ∫ d x, y)| < ε (f (x, y) − f (ˆ x, y)) dy ≤ max |f (x, y) − f (ˆ y∈[c,d] c if |x − x ˆ| is sufficiently small, thanks to uniform continuity of f on the compact set [a, b]×[c, d]. Thus ∫b∫d it makes perfect sense to consider the iterated integral a c f (x, y) dydx. Then using Proposition 17.4.5 on the iterated integrals along with Theorem 17.2.1, there exists a sequence of polynomials which converges to f uniformly {pn } . Then applying Proposition 17.4.5 repeatedly, ∫ ∫ bip ∫ bi1 ∫ bip bi1 pn (x) dxp · · · dx1 ··· f (x) dxp · · · dx1 − ··· aip aip aip aip

17.4. AN APPROACH TO THE INTEGRAL

433 p ∏

≤ ∥f − pn ∥

|bk − ak |

(17.5)

k=1

With this, it is easy to prove a rudimentary Fubini theorem valid for continuous functions. ∏p Theorem 17.4.6 f : k=1 [ak , bk ] → R be continuous. Then for (i1 , · · · , ip ) any permutation of (1, · · · , p) , ∫ bp ∫ b1 ∫ bi1 ∫ bip f (x) dxp · · · dx1 ··· f (x) dxip · · · dxi1 = ··· aip

aip

ap

a1

If f ≥ 0, then the iterated integrals are nonnegative if each ak ≤ bk . Proof: Let ∥pn − f ∥ → 0 where pn is a polynomial. Then from 17.5, ∫

bi1



bip

···

n→∞

aip

ai1



b1

= lim

n→∞

∫ f (x) dxip · · · dxi1 = lim ∫

···

a1

bp

bi1



aip



ap

pn (x) dxip · · · dxi1

aip

b1

pn (x) dxp · · · dx1 =

bip

··· ∫ ···

a1

bp

f (x) dxp · · · dx1 

ap

You could replace f with f XG where XG (x) = 1 if x ∈ G and 0 otherwise provided each section of G consisting of holding all variables constant but 1, consists of finitely many intervals. Thus you can integrate over all the usual sets encountered in beginning calculus. Definition 17.4.7 A function f : [a, b] → R is piecewise continuous if there are zi with a = z0 < z1 < · · · < zn = b, called a partition of [a, b] , and functions fi continuous on [zi−1 , zi ] such that f = fi on (zi−1 , zi ). For f piecewise continuous, define ∫

b

f (t) dt ≡ a

n ∫ ∑

zi

fi (s) ds

zi−1

i=1

Of course this gives what appears to be a new definition because if f is continuous on [a, b] , then it is piecewise continuous for any such partition. However, it gives the same answer because, from this new definition, ∫ b n ∑ (F (zi ) − F (zi−1 )) = F (b) − F (a) f (t) dt = a

i=1

Does this give the main properties of the integral? In particular, is the integral still linear? Suppose n f, g are piecewise continuous. Then let {zi }i=1 include all the partition points of both of these functions. Then, since it was just shown that no harm is done by including more partition points, ∫

b

αf (t) + βg (t) dt ≡ a

n ∫ ∑

=

∫ n ∑ α

=

zi

zi−1 i=1 ∫ n ∑ zi

α

i=1 ∫ b

α

fi (s) ds +

∫ n ∑ β

zi

zi−1 i=1 ∫ n ∑ zi

fi (s) ds + β

zi−1



f (t) dt + β a

(αfi (s) + βgi (s)) ds

zi−1

i=1

=

zi

i=1

gi (s) ds gi (s) ds

zi−1

b

g (t) dt a

∫b ∫c ∫b Also, the claim that a f dt = a f dt+ c f dt is obtained exactly as before by considering all partition points on each integral preserving the order of the limits in the small intervals determined by the

434

CHAPTER 17. APPROXIMATION OF FUNCTIONS AND THE INTEGRAL

partition points. That is, if a > c, you would have zi−1 > zi . Notice how this automatically takes care of orientation. Is this as general as a complete treatment of Riemann integration? No it is not. In particular, it ( ) does not include the well known example where f (x) = sin x1 for x ∈ (0, 1] and f (0) ≡ 0. However, it is sufficiently general to include all cases which are typically of interest. It would be enough to build a theory of ordinary differential equations. It would also be enough to provide the theory of convergence of Fourier series to the midpoint of the jump and so forth. Also, the Riemann integral is woefully inadequate when it comes to a need to handle limits. You need the Lebesgue integral and to obtain this, it is enough to consider knowledge of integrals of continuous functions. This is shown later.

17.5

The M¨ untz Theorems

All this about to be presented would work on any interval, but it would involve fussy considerations involved with extra constants. Therefore, I will only present what happens on [0, 1]. These theorems have to do with considering linear combinations of the functions fp (x) ≡ xp for p = p1 , p2 , ... and whether one can approximate an arbitrary continuous function with such a linear combination. Linear algebra techniques are what make this possible, at least in this book. I am following Cheney [13]. In what follows m will be a nonnegative integer. I will consider the real inner product space X ∫1 consisting of functions in C ([0, 1]) with the inner product 0 f gdx = (f, g) . Thus, as shown earlier, the Cauchy Schwarz inequality holds ∫

(∫

1

|f | |g| dx ≤ 0

I will write |f | ≡

(∫

1 0

2

|f | dx

1

)1/2 (∫ 2

|f | dx 0

1

)1/2 2 |g| dx

0

)1/2 . The above treatment of the integral of continuous functions is

sufficient for the needs here. Also let Vn ≡ span (fp1 , ..., fpn ) . The main idea is to estimate the distance between fm and Vm in X. The Grammian matrix of {fp1 , ..., fpn } is easily seen to be   1 · · · p1 +p1n +1 p1 +p1 +1   .. ..  G (fp1 , ..., fpn ) =  . .   1 1 · · · p1 +pn +1 pn +pn +1 I will assume pj > − 12 to avoid any possibility of terms which make no sense in the Grammian matrix given above. I will also assume none of these pj are integers so that Vn never contains fm , fm (x) = xm , m a positive integer. If such is in your list, it simply makes the approximation easier to obtain. By Theorem 8.6.5, the Cauchy identity for determinants, ∏ j 0 such that B ((x, y) , r) ∈ U.

This says that if (u, v) ∈ X × Y such that ||(u, v) − (x, y)|| < r, then (u, v) ∈ U. Thus if ||(u, y) − (x, y)|| = ||u − x|| < r, then (u, y) ∈ U. This has just said that B (x,r), the ball taken in X is contained in Uy . This proves the lemma.  Or course one could also consider Ux ≡ {y : (x, y) ∈ U } in the same way and conclude this set is open ∏n in Y . Also, the generalization to many factors yields the same conclusion. In this case, for x ∈ i=1 Xi , let ( ) ||x|| ≡ max ||xi ||Xi : x = (x1 , · · · , xn ) ∏n Then a similar argument to the above shows this is a norm on i=1 Xi . Consider the triangle inequality. ( ) ( ) ∥(x1 , · · · , xn ) + (y1 , · · · , yn )∥ = max ||xi + yi ||Xi ≤ max ∥xi ∥Xi + ∥yi ∥Xi (

i

)

(

≤ max ||xi ||Xi + max ||yi ||Xi Corollary 18.8.2 Let U ⊆

∏n

i

)

i

i

Xi be an open set and let { ( ) } U(x1 ,··· ,xi−1 ,xi+1 ,··· ,xn ) ≡ x ∈ Fri : x1 , · · · , xi−1 , x, xi+1 , · · · , xn ∈ U . i=1

Then U(x1 ,··· ,xi−1 ,xi+1 ,··· ,xn ) is an open set in Fri . ( ) Proof: Let z ∈ U(x1 ,··· ,xi−1 ,xi+1 ,··· ,xn ) . Then x1 , · · · , xi−1 , z, xi+1 , · · · , xn ≡ x ∈ U by definition. Therefore, since U is open, there exists r > 0 such that B (x, r) ⊆ U. It follows that for B (z, r)Xi denoting the ball in Xi , it follows that B (z, r)Xi ⊆ U(x1 ,··· ,xi−1 ,xi+1 ,··· ,xn ) because to say that ∥z − w∥Xi < r is to say that

( ) ( )

x1 , · · · , xi−1 , z, xi+1 , · · · , xn − x1 , · · · , xi−1 , w, xi+1 , · · · , xn < r and so w ∈ U(x1 ,··· ,xi−1 ,xi+1 ,··· ,xn ) .  Next is a generalization of the partial derivative. ∏n Definition 18.8.3 Let g : U ⊆ i=1 Xi → Y , where U is an open set. Then the map ( ) z → g x1 , · · · , xi−1 , z, xi+1 , · · · , xn is a function from the open set in Xi , { ( ) } z : x = x1 , · · · , xi−1 , z, xi+1 , · · · , xn ∈ U to Y . When this map ∏ is differentiable, its derivative is denoted by Di g (x). To aid in the notation, n for v∏∈ Xi , let θi v ∈ i=1 Xi be the vector (0, · · · , v, · · · , 0) where the v is in the ith slot and for n v ∈ i=1 Xi , let vi denote the entry in the ith slot of v. Thus, by saying ( ) z → g x1 , · · · , xi−1 , z, xi+1 , · · · , xn is differentiable is meant that for v ∈ Xi sufficiently small, g (x + θi v) − g (x) = Di g (x) v + o (v) . Note Di g (x) ∈ L (Xi , Y ) .

18.8. THE DERIVATIVE AND THE CARTESIAN PRODUCT

449

Definition 18.8.4 Let U ⊆ X be an open set. Then f : U → Y is C 1 (U ) if f is differentiable and the mapping x → Df (x) , is continuous as a function from U to L (X, Y ). With this definition of partial derivatives, here is the major theorem. Note the resemblance with the matrix of the derivative of a function having values in Rm in terms of the partial derivatives. ∏n Theorem 18.8.5 Let g, U, i=1 Xi , be given as in Definition 18.8.3. Then g is C 1 (U ) if and only if Di g exists and is continuous on U for each i. In this case, g is differentiable and ∑ Dg (x) (v) = Dk g (x) vk (18.8) k

where v = (v1 , · · · , vn ) . Proof: Suppose then that Di g exists and is continuous for each i. Note that k ∑

θj vj = (v1 , · · · , vk , 0, · · · , 0) .

j=1

Thus

∑n j=1

θj vj = v and define

∑0 j=1

g (x + v) − g (x) =

θj vj ≡ 0. Therefore,

n ∑

  g x+

k ∑





θj vj  − g x +

j=1

k=1

k−1 ∑

 θ j vj 

Consider the terms in this sum.     k−1 k ∑ ∑ θj vj  = g (x+θk vk ) − g (x) + θ j vj  − g x + g x+

g x+

k ∑

(18.10)

j=1

j=1

 

(18.9)

j=1





 

θj vj  − g (x+θk vk ) − g x +

j=1

k−1 ∑





θj vj  − g (x)

(18.11)

j=1

and the expression in 18.11 is of the form h (vk ) − h (0) where for small w ∈ Xk ,   k−1 ∑ h (w) ≡ g x+ θj vj + θk w − g (x + θk w) . j=1

Therefore,

 Dh (w) = Dk g x+

k−1 ∑

 θj vj + θk w − Dk g (x + θk w)

j=1

and by continuity, ||Dh (w)|| < ε provided ||v|| is small enough. Therefore, by Theorem 18.4.2, the mean value inequality, whenever ||v|| is small enough, ||h (vk ) − h (0)|| ≤ ε ||v|| which shows that since ε is arbitrary, the expression in 18.11 is o (v). Now in 18.10 g (x+θk vk ) − g (x) = Dk g (x) vk + o (vk ) = Dk g (x) vk + o (v) .

450

CHAPTER 18. THE DERIVATIVE, A LINEAR TRANSFORMATION

Therefore, referring to 18.9, g (x + v) − g (x) =

n ∑

Dk g (x) vk + o (v)

k=1

which shows Dg (x) exists and equals the formula given in 18.8. Also x →Dg (x) is continuous since each of the Dk g (x) are. Next suppose g is C 1 . I need to verify that Dk g (x) exists and is continuous. Let v ∈ Xk sufficiently small. Then g (x + θk v) − g (x)

=

Dg (x) θk v + o (θk v)

=

Dg (x) θk v + o (v)

since ||θk v|| = ||v||. Then Dk g (x) exists and equals Dg (x) ◦ θk Now ∏n x → Dg (x) is continuous. Since θk is linear, it follows from Theorem 11.6.3 that θk : Xk → i=1 Xi is also continuous.  Note that the above argument also works at a single point x. That is, continuity at x of the partials implies Dg (x) exists and is continuous at x. The way this is usually used is in the following corollary which has already been obtained. Remember the matrix of Df (x). Recall that if a function is C 1 in the sense that x → Df (x) is continuous then all the partial derivatives exist and are continuous. The next corollary says that if the partial derivatives do exist and are continuous, then the function is differentiable and has continuous derivative. Corollary 18.8.6 Let U be an open subset of Fn and let f :U → Fm be C 1 in the sense that all the partial derivatives of f exist and are continuous. Then f is differentiable and f (x + v) = f (x) +

n ∑ ∂f (x) vk + o (v) . ∂xk

k=1

Similarly, if the partial derivatives up to order k exist and are continuous, then the function is C k in the sense that the first k derivatives exist and are continuous.

18.9

Mixed Partial Derivatives

The following theorem about equality of partial derivatives was known to Euler around 1734 and was proved later. Theorem 18.9.1 Suppose f : U ⊆ F2 → R where U is an open set on which fx , fy , fxy and fyx exist. Then if fxy and fyx are continuous at the point (x, y) ∈ U , it follows fxy (x, y) = fyx (x, y) . Proof: Since U is open, there exists r > 0 such that B ((x, y) , r) ⊆ U. Now let |t| , |s| < r/2, t, s real numbers and consider h(t)

h(0)

}| { z }| { 1 z ∆ (s, t) ≡ {f (x + t, y + s) − f (x + t, y) − (f (x, y + s) − f (x, y))}. st Note that (x + t, y + s) ∈ U because ( )1/2 |(x + t, y + s) − (x, y)| = |(t, s)| = t2 + s2 ( 2 )1/2 r r2 r ≤ + = √ < r. 4 4 2

(18.12)

18.9. MIXED PARTIAL DERIVATIVES

451

As implied above, h (t) ≡ f (x + t, y + s) − f (x + t, y). Therefore, by the mean value theorem from one variable calculus and the (one variable) chain rule, ∆ (s, t)

1 1 (h (t) − h (0)) = h′ (αt) t st st 1 (fx (x + αt, y + s) − fx (x + αt, y)) s

= =

for some α ∈ (0, 1) . Applying the mean value theorem again, ∆ (s, t) = fxy (x + αt, y + βs) where α, β ∈ (0, 1). If the terms f (x + t, y) and f (x, y + s) are interchanged in 18.12, ∆ (s, t) is unchanged and the above argument shows there exist γ, δ ∈ (0, 1) such that ∆ (s, t) = fyx (x + γt, y + δs) . Letting (s, t) → (0, 0) and using the continuity of fxy and fyx at (x, y) , lim (s,t)→(0,0)

∆ (s, t) = fxy (x, y) = fyx (x, y) .

The following is obtained from the above by simply fixing all the variables except for the two of interest. Corollary 18.9.2 Suppose U is an open subset of X and f : U → R has the property that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk are both continuous at x ∈ U. Then fxk xl (x) = fxl xk (x) . By considering the real and imaginary parts of f in the case where f has values in C you obtain the following corollary. Corollary 18.9.3 Suppose U is an open subset of Fn and f : U → F has the property that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk are both continuous at x ∈ U. Then fxk xl (x) = fxl xk (x) . Finally, by considering the components of f you get the following generalization. Corollary 18.9.4 Suppose U is an open subset of Fn and f : U → Fm has the property that for two indices, k, l, fxk , fxl , fxl xk , and fxk xl exist on U and fxk xl and fxl xk are both continuous at x ∈ U. Then fxk xl (x) = fxl xk (x) . It is necessary to assume the mixed partial derivatives are continuous in order to assert they are equal. The following is a well known example [2]. Example 18.9.5 Let

{ f (x, y) =

xy (x2 −y 2 ) x2 +y 2

if (x, y) ̸= (0, 0) 0 if (x, y) = (0, 0)

From the definition of partial derivatives it follows immediately that fx (0, 0) = fy (0, 0) = 0. Using the standard rules of differentiation, for (x, y) ̸= (0, 0) , fx = y Now

x4 − y 4 + 4x2 y 2 (x2 +

2 y2 )

, fy = x

x4 − y 4 − 4x2 y 2 2

(x2 + y 2 )

−y 4 fx (0, y) − fx (0, 0) = lim = −1 y→0 (y 2 )2 y→0 y

fxy (0, 0) ≡ lim

452

CHAPTER 18. THE DERIVATIVE, A LINEAR TRANSFORMATION

while

fy (x, 0) − fy (0, 0) x4 = lim =1 x→0 x→0 (x2 )2 x

fyx (0, 0) ≡ lim

showing that although the mixed partial derivatives do exist at (0, 0) , they are not equal there. Incidentally, the graph of this function appears very innocent. Its fundamental sickness is not apparent. It is like one of those whited sepulchers mentioned in the Bible.

18.10

Newton’s Method

Remember Newton’s method from one variable calculus. It was an algorithm for finding the zeros of a function. Beginning with xk the next iterate was −1

xk+1 = xk − f ′ (xk )

(f (xk ))

Of course the same thing can sometimes work in Rn or even more generally. Here you have a function f (x) and you want to locate a zero. Then you could consider the sequence of iterates −1

xk+1 = xk − Df (xk )

(f (xk ))

If the sequence converges to x then you would have x = x−Df (x)

−1

(f (x))

and so you would need to have f (x) = 0. In the next section, a modification of this well known method will be used to prove the Implicit function theorem. The modification is that you look for a solution to the equation near x0 and replace the above algorithm with the simpler one −1

xk+1 = xk − Df (x0 ) −1

Then if T x = x − Df (x0 )

(f (xk ))

(f (x)) , it follows that as long as x is sufficiently close to x0 , −1

DT (x) = I − Df (x0 )

Df (x)

and the norm of this transformation is very small so one can use the mean value inequality to conclude that T is a contraction mapping and provide a sequence of iterates which converge to a fixed point. Actually, the situation will be a little more complicated because we will do the implicit function theorem first, but this is the idea.
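Here is a minimal numerical sketch of both iterations in Python (the two dimensional system and the starting guess are made-up examples chosen only to illustrate the idea; they are not from the text).

```python
import numpy as np

def newton(f, Df, x0, tol=1e-10, max_iter=50, freeze_jacobian=False):
    """Find a zero of f : R^n -> R^n starting from x0.

    If freeze_jacobian is True, the derivative is evaluated only once at x0,
    which is the simplified iteration x_{k+1} = x_k - Df(x0)^{-1} f(x_k)
    described above (a fixed point iteration for T)."""
    x = np.asarray(x0, dtype=float)
    J0 = Df(x) if freeze_jacobian else None
    for _ in range(max_iter):
        J = J0 if freeze_jacobian else Df(x)
        step = np.linalg.solve(J, f(x))   # solve J * step = f(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            break
    return x

# Illustrative system: x^2 + y^2 = 4 and x*y = 1.
f = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
Df = lambda v: np.array([[2 * v[0], 2 * v[1]], [v[1], v[0]]])

print(newton(f, Df, [2.0, 0.3]))                        # full Newton iteration
print(newton(f, Df, [2.0, 0.3], freeze_jacobian=True))  # simplified iteration
```

Both versions converge to the same zero from this starting point; the frozen-Jacobian version converges more slowly but is exactly the contraction used in the proof of the implicit function theorem.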

18.11

Exercises

1. For (x, y) ≠ (0, 0), let f (x, y) = xy⁴/(x² + y⁸). Show that this function has a limit as (x, y) → (0, 0) for (x, y) on an arbitrary straight line through (0, 0). Next show that this function fails to have a limit at (0, 0).

2. Here are some scalar valued functions of several variables. Determine which of these functions are o (v). Here v is a vector in Rn, v = (v1 , · · · , vn ).
(a) v1 v2  (b) v2 sin (v1)  (c) v1² + v2  (d) v2 sin (v1 + v2)  (e) v1 (v1 + v2 + xv3)  (f) (e^{v1} − 1 − v1)  (g) (x · v) |v|

3. Here is a function of two variables, f (x, y) = x²y + x². Find Df (x, y) directly from the definition. Recall this should be a linear transformation which results from multiplication by a 1 × 2 matrix. Find this matrix.

4. Let f (x, y) = (x² + y, y²)ᵀ. Compute the derivative directly from the definition. This should be the linear transformation which results from multiplying by a 2 × 2 matrix. Find this matrix.

5. You have h (x) = g (f (x)). Here x ∈ Rn, f (x) ∈ Rm and g (y) ∈ Rp, where f , g are appropriately differentiable. Thus Dh (x) results from multiplication by a matrix. Using the chain rule, give a formula for the ij th entry of this matrix. How does this relate to multiplication of matrices? In other words, you have two matrices which correspond to Dg (f (x)) and Df (x). Call z = g (y), y = f (x). Then
$$Dg(y) = \left(\frac{\partial z}{\partial y_1} \;\cdots\; \frac{\partial z}{\partial y_m}\right), \qquad Df(x) = \left(\frac{\partial y}{\partial x_1} \;\cdots\; \frac{\partial y}{\partial x_n}\right).$$
Explain the manner in which the ij th entry of Dh (x) is
$$\sum_k \frac{\partial z_i}{\partial y_k}\frac{\partial y_k}{\partial x_j}.$$
This is a review of the way we multiply matrices. What is the i th row of Dg (y) and the j th column of Df (x)?

6. Find fx , fy , fz , fxy , fyx , fzy for the following. Verify the mixed partial derivatives are equal.
(a) x²y³z⁴ + sin (xyz)  (b) sin (xyz) + x²yz

7. Suppose f is a continuous function and f : U → R where U is an open set and suppose that x ∈ U has the property that for all y near x, f (x) ≤ f (y). Prove that if f has all of its partial derivatives at x, then fxi (x) = 0 for each xi . Hint: Consider f (x + tv) = h (t). Argue that h′ (0) = 0 and then see what this implies about Df (x).


8. As an important application of Problem 7 consider the following. Experiments are done at n times, t1 , t2 , · · · , tn and at each time there results a collection of numerical outcomes. Denote by {(ti , xi)}ᵖᵢ₌₁ the set of all such pairs and try to find numbers a and b such that the line x = at + b approximates these ordered pairs as well as possible in the sense that out of all choices of a and b, Σᵖᵢ₌₁ (ati + b − xi)² is as small as possible. In other words, you want to minimize the function of two variables f (a, b) ≡ Σᵖᵢ₌₁ (ati + b − xi)². Find a formula for a and b in terms of the given ordered pairs. You will be finding the formula for the least squares regression line. (A numerical sketch related to this problem appears after Problem 18 below.)

9. Let f be a function which has continuous derivatives. Show that u (t, x) = f (x − ct) solves the wave equation utt − c²∆u = 0. What about u (x, t) = f (x + ct)? Here ∆u = uxx.

10. Show that if ∆u = λu where u is a function of only x, then e^{λt}u solves the heat equation ut − ∆u = 0. Here ∆u = uxx.

11. Show that if f (x) = o (x), then f′ (0) = 0.

12. Let f (x, y) be defined on R² as follows. f (x, x²) = 1 if x ≠ 0. Define f (0, 0) = 0, and f (x, y) = 0 if y ≠ x². Show that f is not continuous at (0, 0) but that
$$\lim_{h\to 0}\frac{f(ha, hb) - f(0,0)}{h} = 0$$
for (a, b) an arbitrary vector. Thus the Gateaux derivative exists at (0, 0) in every direction but f is not even continuous there.

13. Let
$$f(x,y) \equiv \begin{cases} \dfrac{xy^4}{x^2+y^8} & \text{if } (x,y) \neq (0,0) \\[4pt] 0 & \text{if } (x,y) = (0,0) \end{cases}$$
Show that this function is not continuous at (0, 0) but that the Gateaux derivative
$$\lim_{h\to 0}\frac{f(ha, hb) - f(0,0)}{h}$$
exists and equals 0 for every vector (a, b).

14. Let U be an open subset of Rn and suppose that f : [a, b] × U → R satisfies
$$(x,y) \to \frac{\partial f}{\partial y_i}(x,y), \qquad (x,y) \to f(x,y)$$
are all continuous. Show that
$$\int_a^b f(x,y)\,dx, \qquad \int_a^b \frac{\partial f}{\partial y_i}(x,y)\,dx$$
all make sense and that in fact
$$\frac{\partial}{\partial y_i}\left(\int_a^b f(x,y)\,dx\right) = \int_a^b \frac{\partial f}{\partial y_i}(x,y)\,dx.$$
Also explain why
$$y \to \int_a^b \frac{\partial f}{\partial y_i}(x,y)\,dx$$
is continuous. Hint: You will need to use the theorems from one variable calculus about the existence of the integral for a continuous function. You may also want to use theorems about uniform continuity of continuous functions defined on compact sets.


15. I found this problem in Apostol's book [1]. This is a very important result and is obtained very simply. Read it and fill in any missing details. Let
$$g(x) \equiv \int_0^1 \frac{e^{-x^2(1+t^2)}}{1+t^2}\,dt$$
and
$$f(x) \equiv \left(\int_0^x e^{-t^2}\,dt\right)^2.$$
Note
$$\frac{\partial}{\partial x}\left(\frac{e^{-x^2(1+t^2)}}{1+t^2}\right) = -2x\,e^{-x^2(1+t^2)}.$$
Explain why this is so. Also show the conditions of Problem 14 are satisfied so that
$$g'(x) = \int_0^1 \left(-2x\,e^{-x^2(1+t^2)}\right)dt.$$
Now use the chain rule and the fundamental theorem of calculus to find f′ (x). Then change the variable in the formula for f′ (x) to make it an integral from 0 to 1 and show f′ (x) + g′ (x) = 0. Now this shows f (x) + g (x) is a constant. Show the constant is π/4 by letting x → 0. Next take a limit as x → ∞ to obtain the following formula for the improper integral, ∫₀^∞ e^{−t²} dt,
$$\left(\int_0^\infty e^{-t^2}\,dt\right)^2 = \pi/4.$$
In passing to the limit in the integral for g as x → ∞ you need to justify why that integral converges to 0. To do this, argue the integrand converges uniformly to 0 for t ∈ [0, 1] and then explain why this gives convergence of the integral. Thus
$$\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2.$$

16. The gamma function is defined for x > 0 as
$$\Gamma(x) \equiv \int_0^\infty e^{-t}\,t^{x-1}\,dt \equiv \lim_{R\to\infty}\int_0^R e^{-t}\,t^{x-1}\,dt.$$
Show this limit exists. Note you might have to give a meaning to
$$\int_0^R e^{-t}\,t^{x-1}\,dt$$
if x < 1. Also show that Γ (x + 1) = xΓ (x), Γ (1) = 1. How does Γ (n) for n an integer compare with (n − 1)!?

17. Show the mean value theorem for integrals. Suppose f ∈ C ([a, b]). Then there exists x ∈ [a, b], in fact x can be taken in (a, b), such that
$$f(x)(b-a) = \int_a^b f(t)\,dt.$$
You will need to recall simple theorems about the integral from one variable calculus.


18. In this problem is a short argument showing a version of what has become known as Fubini's theorem. Suppose f ∈ C ([a, b] × [c, d]). Then
$$\int_a^b\!\!\int_c^d f(x,y)\,dy\,dx = \int_c^d\!\!\int_a^b f(x,y)\,dx\,dy.$$
First explain why the two iterated integrals make sense. Hint: To prove the two iterated integrals are equal, let a = x₀ < x₁ < · · · < xₙ = b and c = y₀ < y₁ < · · · < yₘ = d be two partitions of [a, b] and [c, d] respectively. Then explain why
$$\int_a^b\!\!\int_c^d f(x,y)\,dy\,dx = \sum_{i=1}^n\sum_{j=1}^m \int_{x_{i-1}}^{x_i}\!\!\int_{y_{j-1}}^{y_j} f(s,t)\,dt\,ds,$$
$$\int_c^d\!\!\int_a^b f(x,y)\,dx\,dy = \sum_{j=1}^m\sum_{i=1}^n \int_{y_{j-1}}^{y_j}\!\!\int_{x_{i-1}}^{x_i} f(s,t)\,ds\,dt.$$
Now use the mean value theorem for integrals to write
$$\int_{x_{i-1}}^{x_i}\!\!\int_{y_{j-1}}^{y_j} f(s,t)\,dt\,ds = f(\hat{s}_i,\hat{t}_j)\,(x_i - x_{i-1})(y_j - y_{j-1}),$$
do something similar for
$$\int_{y_{j-1}}^{y_j}\!\!\int_{x_{i-1}}^{x_i} f(s,t)\,ds\,dt,$$
and then observe that the difference between the sums can be made as small as desired by simply taking suitable partitions.
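The least squares line of Problem 8 can be checked numerically. The following is a minimal Python sketch (the data points are made up for the illustration): setting the partial derivatives of f (a, b) to zero gives the usual normal equations, whose solution is compared with a library routine.

```python
import numpy as np

# made-up data (t_i, x_i); any points would do for the illustration
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
p = len(t)

# Setting the partials of f(a,b) = sum (a t_i + b - x_i)^2 to zero gives
# the normal equations for the least squares line x = a t + b.
A = np.array([[np.sum(t**2), np.sum(t)],
              [np.sum(t),    p        ]])
rhs = np.array([np.sum(t * x), np.sum(x)])
a, b = np.linalg.solve(A, rhs)

print(a, b)                 # slope and intercept from the normal equations
print(np.polyfit(t, x, 1))  # the same line computed by a library routine
```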

Chapter 19

Implicit Function Theorem

19.1

Statement And Proof Of The Theorem

Recall the following notation. L (X, Y ) is the space of bounded linear mappings from X to Y where here (X, ∥·∥X ) and (Y, ∥·∥Y ) are normed linear spaces. Recall that this means that for each L ∈ L (X, Y ),
$$\|L\| \equiv \sup_{\|x\|\le 1}\|Lx\| < \infty.$$

As shown earlier, this makes L (X, Y ) into a normed linear space. In case X is finite dimensional, L (X, Y ) is the same as the collection of linear maps from X to Y . This was shown earlier. In what follows X, Y will be Banach spaces. If you like, think of them as finite dimensional normed linear spaces, but if you like more generality, just think: complete normed linear space and L (X, Y ) is the space of bounded linear maps.

Definition 19.1.1 A complete normed linear space is called a Banach space.

Theorem 19.1.2 If Y is a Banach space, then L(X, Y ) is also a Banach space.

Proof: Let {Ln} be a Cauchy sequence in L(X, Y ) and let x ∈ X. Then
$$\|L_n x - L_m x\| \le \|x\|\,\|L_n - L_m\|.$$
Thus {Ln x} is a Cauchy sequence. Let Lx = lim_{n→∞} Ln x. Then, clearly, L is linear because if x1 , x2 are in X, and a, b are scalars, then
$$L(ax_1+bx_2) = \lim_{n\to\infty} L_n(ax_1+bx_2) = \lim_{n\to\infty}\big(aL_nx_1+bL_nx_2\big) = aLx_1+bLx_2.$$
Also L is bounded. To see this, note that {∥Ln∥} is a Cauchy sequence of real numbers because |∥Ln∥ − ∥Lm∥| ≤ ∥Ln − Lm∥. Hence there exists K > sup{∥Ln∥ : n ∈ N}. Thus, if x ∈ X,
$$\|Lx\| = \lim_{n\to\infty}\|L_nx\| \le K\|x\|. \;\blacksquare$$
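For a concrete feel for the operator norm, here is a small Python sketch (an illustration only; the matrix is arbitrary and the Euclidean norms are assumed on Rn and Rm, in which case the operator norm of a matrix is its largest singular value). It compares that value with a crude random sampling of sup‖Lx‖ over ‖x‖ ≤ 1.

```python
import numpy as np

rng = np.random.default_rng(0)
L = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, -1.0]])   # a bounded linear map from R^3 to R^2

# With Euclidean norms, ||L|| = sup_{||x|| <= 1} ||Lx|| is the largest singular value.
print(np.linalg.norm(L, 2))

# Crude check: sample many unit vectors and take the largest value of ||Lx||.
xs = rng.normal(size=(100000, 3))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
print(np.max(np.linalg.norm(xs @ L.T, axis=1)))
```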

The following lemma is really nice. The series in this lemma is called the Neumann series.


Lemma 19.1.3 Let (X, ∥·∥) be a Banach space and suppose A ∈ L (X, X) with ∥A∥ = r < 1. Then
$$(I-A)^{-1} = \sum_{k=0}^\infty A^k \in L(X,X)$$
where the series converges in the Banach space L (X, X). If O consists of the invertible maps in L (X, X), then O is open and if I is the mapping which takes A to A⁻¹, then I is continuous.

Proof: First of all, why does the series make sense?
$$\left\|\sum_{k=p}^q A^k\right\| \le \sum_{k=p}^q \|A^k\| \le \sum_{k=p}^q \|A\|^k \le \sum_{k=p}^\infty r^k \le \frac{r^p}{1-r}$$
and so the partial sums are Cauchy in L (X, X). Therefore, the series converges to something in L (X, X) by completeness of this normed linear space. Now why is it the inverse?
$$\sum_{k=0}^\infty A^k (I-A) = \lim_{n\to\infty}\sum_{k=0}^n A^k(I-A) = \lim_{n\to\infty}\left(\sum_{k=0}^n A^k - \sum_{k=1}^{n+1} A^k\right) = \lim_{n\to\infty}\left(I-A^{n+1}\right) = I$$
because ∥A^{n+1}∥ ≤ ∥A∥^{n+1} ≤ r^{n+1}. Similarly,
$$(I-A)\sum_{k=0}^\infty A^k = \lim_{n\to\infty}\left(I-A^{n+1}\right) = I$$
and so this shows that this series is indeed the desired inverse.

Next suppose A ∈ O so A⁻¹ ∈ L (X, X). Then suppose ∥A − B∥ < r/(1 + ∥A⁻¹∥), r < 1. Does it follow that B is also invertible?
$$B = A - (A-B) = A\left[I - A^{-1}(A-B)\right]$$
Then ∥A⁻¹ (A − B)∥ ≤ ∥A⁻¹∥ ∥A − B∥ < r and so [I − A⁻¹ (A − B)]⁻¹ exists. Hence
$$B^{-1} = \left[I - A^{-1}(A-B)\right]^{-1}A^{-1}.$$
Thus O is open as claimed. As to continuity, let A, B be as just described. Then using the Neumann series,
$$\|IA - IB\| = \left\|A^{-1} - \left[I - A^{-1}(A-B)\right]^{-1}A^{-1}\right\| = \left\|A^{-1} - \sum_{k=0}^\infty\left(A^{-1}(A-B)\right)^kA^{-1}\right\|$$
$$= \left\|\sum_{k=1}^\infty\left(A^{-1}(A-B)\right)^kA^{-1}\right\| \le \sum_{k=1}^\infty\|A^{-1}\|^{k+1}\|A-B\|^k = \|A-B\|\,\|A^{-1}\|^2\sum_{k=0}^\infty\left(\frac{\|A^{-1}\|\,r}{1+\|A^{-1}\|}\right)^k$$
$$\le \|B-A\|\,\|A^{-1}\|^2\,\frac{1}{1-r}.$$
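A quick numerical illustration of the Neumann series in Python (the matrix below is an arbitrary example with ∥A∥ < 1, not taken from the text): the partial sums of Σ Aᵏ converge to (I − A)⁻¹.

```python
import numpy as np

A = np.array([[0.2, 0.1],
              [0.0, 0.3]])          # norm well below 1

exact = np.linalg.inv(np.eye(2) - A)

S = np.zeros((2, 2))
term = np.eye(2)                    # A^0
for k in range(50):
    S = S + term
    term = term @ A                 # next power of A

print(np.max(np.abs(S - exact)))    # tiny: the partial sums converge to (I - A)^{-1}
```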

Thus I is continuous at A ∈ O. ∎

Lemma 19.1.4 Let
$$O \equiv \{A \in L(X,Y) : A^{-1} \in L(Y,X)\}$$
and let I : O → L (Y, X), IA ≡ A⁻¹. Then O is open and I is in C^m (O) for all m = 1, 2, · · · . Also
$$DI(A)(B) = -I(A)(B)\,I(A). \tag{19.1}$$
In particular, I is continuous.

Proof: Let A ∈ O and let B ∈ L (X, Y ) with
$$\|B\| \le \tfrac12\,\|A^{-1}\|^{-1}.$$
Then
$$\|A^{-1}B\| \le \|A^{-1}\|\,\|B\| \le \tfrac12$$
and so by Lemma 19.1.3,
$$\left(I + A^{-1}B\right)^{-1} \in L(X,X).$$
It follows that
$$(A+B)^{-1} = \left(A\left(I+A^{-1}B\right)\right)^{-1} = \left(I+A^{-1}B\right)^{-1}A^{-1} \in L(Y,X).$$
Thus O is an open set. Thus
$$(A+B)^{-1} = \left(I+A^{-1}B\right)^{-1}A^{-1} = \sum_{n=0}^\infty(-1)^n\left(A^{-1}B\right)^nA^{-1} = \left[I - A^{-1}B + o(B)\right]A^{-1}$$
which shows that O is open and, also,
$$I(A+B) - I(A) = \sum_{n=0}^\infty(-1)^n\left(A^{-1}B\right)^nA^{-1} - A^{-1} = -A^{-1}BA^{-1} + o(B) = -I(A)(B)\,I(A) + o(B)$$
which demonstrates 19.1. It follows from this that we can continue taking derivatives of I. For ∥B₁∥ small,
$$-\left[DI(A+B_1)(B) - DI(A)(B)\right] = I(A+B_1)(B)\,I(A+B_1) - I(A)(B)\,I(A)$$
$$= I(A+B_1)(B)\,I(A+B_1) - I(A)(B)\,I(A+B_1) + I(A)(B)\,I(A+B_1) - I(A)(B)\,I(A)$$
$$= \left[I(A)(B_1)I(A) + o(B_1)\right](B)\,I(A+B_1) + I(A)(B)\left[I(A)(B_1)I(A) + o(B_1)\right]$$
$$= \left[I(A)(B_1)I(A) + o(B_1)\right](B)\left[A^{-1} - A^{-1}B_1A^{-1} + o(B_1)\right] + I(A)(B)\left[I(A)(B_1)I(A) + o(B_1)\right]$$
$$= I(A)(B_1)I(A)(B)I(A) + I(A)(B)I(A)(B_1)I(A) + o(B_1)$$
and so
$$D^2I(A)(B_1)(B) = I(A)(B_1)I(A)(B)I(A) + I(A)(B)I(A)(B_1)I(A)$$
which shows I is C² (O). Clearly we can continue in this way which shows I is in C^m (O) for all m = 1, 2, · · · . ∎

Here are the two fundamental results presented earlier which will make it easy to prove the implicit function theorem. First is the fundamental mean value inequality.
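Formula 19.1 says that near an invertible A, (A + B)⁻¹ ≈ A⁻¹ − A⁻¹BA⁻¹ up to o (B). A minimal numerical sanity check in Python (the matrices are chosen arbitrarily for the illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(3) + 0.3 * rng.normal(size=(3, 3))   # an invertible matrix (generic perturbation of I)
B = 1e-4 * rng.normal(size=(3, 3))              # a small perturbation

Ainv = np.linalg.inv(A)
exact = np.linalg.inv(A + B)
linear = Ainv - Ainv @ B @ Ainv                 # first order approximation from 19.1

print(np.max(np.abs(exact - Ainv)))             # size of the change: on the order of ||B||
print(np.max(np.abs(exact - linear)))           # much smaller: o(||B||), in fact O(||B||^2)
```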


Theorem 19.1.5 Suppose U is an open subset of X and f : U → Y has the property that Df (x) exists for all x in U and that x + t (y − x) ∈ U for all t ∈ [0, 1]. (The line segment joining the two points lies in U .) Suppose also that for all points on this line segment, ∥Df (x + t (y − x))∥ ≤ M. Then
$$\|f(y) - f(x)\| \le M\,\|y-x\|.$$

Next recall the following theorem about fixed points of a contraction map. It was Corollary 11.1.42.

Corollary 19.1.6 Let B be a closed subset of the complete metric space (X, d) and let f : B → X be a contraction map,
$$d(f(x), f(\hat{x})) \le r\,d(x,\hat{x}), \quad r < 1.$$
Also suppose there exists x0 ∈ B such that the sequence of iterates {fⁿ (x0)}ₙ₌₁^∞ remains in B. Then f has a unique fixed point in B which is the limit of the sequence of iterates. This is a point x ∈ B such that f (x) = x. In the case that B = B (x0 , δ), the sequence of iterates satisfies the inequality
$$d(f^n(x_0), x_0) \le \frac{d(x_0, f(x_0))}{1-r}$$
and so it will remain in B if
$$\frac{d(x_0, f(x_0))}{1-r} < \delta.$$

The implicit function theorem deals with the question of solving f (x, y) = 0 for x in terms of y and how smooth the solution is. It is one of the most important theorems in mathematics. The proof I will give holds with no change in the context of infinite dimensional complete normed vector spaces when suitable modifications are made on what is meant by L (X, Y ). There are also even more general versions of this theorem than for normed vector spaces. Recall that for X, Y normed vector spaces, the norm on X × Y is of the form
$$\|(x,y)\| = \max(\|x\|, \|y\|).$$

Theorem 19.1.7 (implicit function theorem) Let X, Y, Z be finite dimensional normed vector spaces and suppose U is an open set in X × Y . Let f : U → Z be in C¹ (U ) and suppose
$$f(x_0,y_0) = 0, \quad D_1f(x_0,y_0)^{-1} \in L(Z,X). \tag{19.2}$$
Then there exist positive constants δ, η, such that for every y ∈ B (y0 , η) there exists a unique x (y) ∈ B (x0 , δ) such that
$$f(x(y), y) = 0. \tag{19.3}$$
Furthermore, the mapping y → x (y) is in C¹ (B (y0 , η)).

Proof: Let T (x, y) ≡ x − D1 f (x0 , y0)⁻¹ f (x, y). Therefore,
$$D_1T(x,y) = I - D_1f(x_0,y_0)^{-1}D_1f(x,y). \tag{19.4}$$
By continuity of the derivative, which implies continuity of D1 T, it follows there exists δ > 0 such that if ∥x − x0∥ < δ and ∥y − y0∥ < δ, then ∥D1 T (x, y)∥ < 1/2.
[Figure: two curves in the level surface crossing at a point, with tangent vectors x′₁ (t₀) and x′₂ (s₀).]

4. The way we present this in engineering math is to consider a smooth C¹ curve (x (t) , y (t) , z (t)) for t ∈ (a, b) such that when t = c ∈ (a, b), (x (c) , y (c) , z (c)) equals the point (x, y, z) in the level surface and such that (x (t) , y (t) , z (t)) lies in this surface. Then
$$0 = f(x(t), y(t), z(t)).$$
Show, using the chain rule, that the gradient vector at the point (x, y, z) is perpendicular to (x′ (c) , y′ (c) , z′ (c)). Recall that the chain rule says that for h (t) = f (x (t) , y (t) , z (t)),
$$Dh(t) = \left(\frac{\partial f}{\partial x}(x(t),y(t),z(t)) \;\; \frac{\partial f}{\partial y}(x(t),y(t),z(t)) \;\; \frac{\partial f}{\partial z}(x(t),y(t),z(t))\right)\begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix}.$$



 x′ (t)    y ′ (t)  z ′ (t)

Since this holds for all such smooth curves in the surface which go through the given point, we say that the gradient vector is perpendicular to the surface. In the picture, there are two intersecting curves which are shown to intersect at a point of the surface. We present this to the students in engineering math and everyone is happy with it, but the argument is specious. Why do there exist any smooth curves in the surface through a point? What would you need to assume to justify the existence of smooth curves in the surface at some point of the level surface? Why?

5. This problem illustrates what can happen when the gradient of a scalar valued function vanishes or is not well defined. Consider the level surface given by z − √(x² + y²) = 0. Sketch the graph of this surface. Why is there no unique tangent plane at the origin (0, 0, 0)? Next consider z² − (x² + y²) = 0. What about a well defined tangent plane at (0, 0, 0)?

6. Suppose you have two level surfaces f (x, y, z) = 0 and g (x, y, z) = 0 which intersect at a point (x0 , y0 , z0), each f, g is C¹. Use the implicit function theorem to give conditions which will guarantee that the intersection of these two surfaces near this point is a curve. Explain why.

7. Let X, Y be Banach spaces and let U be an open subset of X. Let f : U → Y be C¹ (U ), let x0 ∈ U, and δ > 0 be given. Show there exists ε > 0 such that if x1 , x2 ∈ B (x0 , ε), then ∥f (x1) − f (x2) − Df (x0) (x1 − x2)∥ ≤ δ ∥x1 − x2∥. Hint: You know f (x1) − f (x2) = Df (x2) (x1 − x2) + o (x1 − x2). Use continuity.

8. ↑This problem illustrates how if Df (x0) is one to one, then near x0 the same is true of f . Suppose in this problem that all normed linear spaces are finite dimensional. Suppose Df (x0) is one to one. Here f : U → Y where U ⊆ X. (a) Show that there exists r > 0 such that ∥Df (x0) x∥ ≥ r ∥x∥. To do this, recall equivalence of norms. (b) Use the above problem to show that there is ε > 0 such that f is one to one on B (x0 , ε) provided Df (x0) is one to one.

9. If U, V are open sets in Banach spaces X, Y respectively and f : U → V is one to one and onto and both f, f⁻¹ are C¹, show that Df (x) : X → Y is one to one and onto for each x ∈ U. Hint: f ◦ f⁻¹ = identity. Now use the chain rule.

10. A function f : U ⊆ C → C where U is an open subset of the complex numbers C is called analytic if
lim_{h→0} (f (z + h) − f (z))/h ≡ f′ (z), z = x + iy,
exists and z → f′ (z) is continuous. Show that if f is analytic on an open set U and if f′ (z) ≠ 0, then there is an open set V containing z such that f (V ) is open, f is one to one, and f, f⁻¹ are both continuous. Hint: This follows very easily from the inverse function theorem. Recall that we have allowed the complex numbers as the field of scalars.

11. Problem 8 has to do with concluding that f is locally one to one if Df (x0) is only known to be one to one. The next obvious question concerns the situation where Df (x0) is possibly not one to one but is onto. There are two parts, a linear algebra consideration, followed by an application of the inverse function theorem. Thus these two problems are each generalizations of the inverse function theorem.

(a) Suppose X is a finite dimensional vector space and M ∈ L (X, Y ) is onto Y . Consider a basis for M (X) = Y, {M x1 , · · · , M xn}. Verify that {x1 , · · · , xn} is linearly independent. Define X̂ ≡ span (x1 , · · · , xn). Show that if M̂ is the restriction of M to X̂, then M̂ is one to one and onto Y .
(b) Now suppose f : U ⊆ X → Y is C¹ and Df (x0) is onto Y . Show that there is a ball B (f (x0) , ε) and an open set V ⊆ X such that f (V ) ⊇ B (f (x0) , ε), so that if Df (x) is onto for each x ∈ U , then f (U ) is an open set. This is called the open map theorem. You might use the inverse function theorem with the spaces X̂, Y . You might want to consider Problem 1. This is a nice illustration of why we developed the inverse and implicit function theorems on arbitrary normed linear spaces. You will see that this is a fairly easy problem.

12. Recall that a function f : U ⊆ X → Y, where here we assume X is finite dimensional, is Gateaux differentiable if
$$\lim_{t\to 0}\frac{f(x+tv) - f(x)}{t} \equiv D_vf(x)$$
exists. Here t ∈ R. Suppose that x → Dv f (x) exists and is continuous on U . Show it follows that f is differentiable and in fact Dv f (x) = Df (x) v. Hint: Let g (y) ≡ f (Σᵢ yᵢxᵢ) and argue that the partial derivatives of g all exist and are continuous. Conclude that g is C¹ and then argue that f is just the composition of g with a linear map.

13. Let
$$f(x,y) = \begin{cases} \dfrac{(x^2-y^4)^2}{(x^2+y^4)^2} & \text{if } (x,y) \neq (0,0) \\[4pt] 1 & \text{if } (x,y) = (0,0) \end{cases}$$
Show that f is not differentiable, and in fact is not even continuous, but Dv f (0, 0) exists and equals 0 for every v ≠ 0.

14. Let
$$f(x,y) = \begin{cases} \dfrac{xy^4}{x^2+y^8} & \text{if } (x,y) \neq (0,0) \\[4pt] 0 & \text{if } (x,y) = (0,0) \end{cases}$$

Show that f is not differentiable, and in fact is not even continuous, but Dv f (0, 0) exists and equals 0 for every v ̸= 0.

19.5

The Method Of Lagrange Multipliers

As an application of the implicit function theorem, consider the method of Lagrange multipliers from calculus. Recall the problem is to maximize or minimize a function subject to equality constraints. Let f : U → R be a C 1 function where U ⊆ Rn and let gi (x) = 0, i = 1, · · · , m

(19.15)

be a collection of equality constraints with m < n. Now consider the system of nonlinear equations f (x)

=

a

gi (x)

= 0, i = 1, · · · , m.

x0 is a local maximum if f (x0 ) ≥ f (x) for all x near x0 which also satisfies the constraints 19.15. A local minimum is defined similarly. Let F : U × R → Rm+1 be defined by   f (x) − a    g1 (x)  .  (19.16) F (x,a) ≡  ..    . gm (x)

19.6. THE TAYLOR FORMULA

467

Now consider the m + 1 × n Jacobian matrix, the matrix of the linear transformation, D1 F (x, a) with respect to the usual basis for Rn and Rm+1 .   fx1 (x0 ) · · · fxn (x0 )    g1x1 (x0 ) · · · g1xn (x0 )   . .. ..     . . gmx1 (x0 ) · · · gmxn (x0 ) If this matrix has rank m + 1 then some m + 1 × m + 1 submatrix has nonzero determinant. It follows from the implicit function theorem that there exist m + 1 variables, xi1 , · · · , xim+1 such that the system F (x,a) = 0 (19.17) specifies these m + 1 variables as a function of the remaining n − (m + 1) variables and a in an open set of Rn−m . Thus there is a solution (x,a) to 19.17 for some x close to x0 whenever a is in some open interval. Therefore, x0 cannot be either a local minimum or a local maximum. It follows that if x0 is either a local maximum or a local minimum, then the above matrix must have rank less than m + 1 which requires the rows to be linearly dependent. Thus, there exist m scalars, λ1 , · · · , λ m , and a scalar µ, not all zero such that    fx1 (x0 ) g1x1 (x0 )    .. ..    µ . .  = λ1  fxn (x0 ) g1xn (x0 ) If the column vectors





 gmx1 (x0 )    ..  + · · · + λm  . .    gmxn (x0 )



  g1x1 (x0 ) gmx1 (x0 )    .. ..  ,··· . .    g1xn (x0 ) gmxn (x0 )

(19.18)

   

are linearly independent, then, µ ̸= 0 and dividing by µ yields an expression of the form       fx1 (x0 ) g1x1 (x0 ) gmx1 (x0 )       .. .. ..   = λ1   + · · · + λm   . . .       fxn (x0 ) g1xn (x0 ) gmxn (x0 )

(19.19)

(19.20)

at every point x0 which is either a local maximum or a local minimum. This proves the following theorem. Theorem 19.5.1 Let U be an open subset of Rn and let f : U → R be a C 1 function. Then if x0 ∈ U is either a local maximum or local minimum of f subject to the constraints 19.15, then 19.18 must hold for some scalars µ, λ1 , · · · , λm not all equal to zero. If the vectors in 19.19 are linearly independent, it follows that an equation of the form 19.20 holds.
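As a concrete illustration of the first order conditions 19.18–19.20, here is a minimal Python sketch (the objective, constraint, and starting guess are made-up examples, not from the text): it solves ∇f = λ∇g together with g = 0 by Newton's method for the problem of maximizing f (x, y) = xy on the circle x² + y² = 2.

```python
import numpy as np

# Maximize f(x, y) = x*y subject to g(x, y) = x^2 + y^2 - 2 = 0.
# First order conditions: grad f = lambda * grad g together with g = 0.
def F(v):
    x, y, lam = v
    return np.array([y - 2 * lam * x,
                     x - 2 * lam * y,
                     x**2 + y**2 - 2.0])

def DF(v):
    x, y, lam = v
    return np.array([[-2 * lam, 1.0, -2 * x],
                     [1.0, -2 * lam, -2 * y],
                     [2 * x, 2 * y, 0.0]])

v = np.array([0.8, 1.2, 0.3])               # arbitrary starting guess
for _ in range(30):
    v = v - np.linalg.solve(DF(v), F(v))    # Newton step for the Lagrange system

print(v)   # approximately x = y = 1, lambda = 1/2, where f attains its maximum 1
```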

19.6

The Taylor Formula

First recall the Taylor formula with the Lagrange form of the remainder. It will only be needed on [0, 1] so that is what I will show. Theorem 19.6.1 Let h : [0, 1] → R have m + 1 derivatives. Then there exists t ∈ (0, 1) such that h (1) = h (0) +

m ∑ h(k) (0) k=1

k!

+

h(m+1) (t) . (m + 1)!

468

CHAPTER 19. IMPLICIT FUNCTION THEOREM Proof: Let K be a number chosen such that ( ) m ∑ h(k) (0) h (1) − h (0) + +K =0 k! k=1

Now the idea is to find K. To do this, let ( F (t) = h (1) −

h (t) +

m ∑ h(k) (t)

k!

k=1

) k

(1 − t) + K (1 − t)

m+1

Then F (1) = F (0) = 0. Therefore, by Rolle’s theorem there exists t between 0 and 1 such that F ′ (t) = 0. Thus, 0

=





−F (t) = h (t) +

m ∑ h(k+1) (t)

k!

k=1



m ∑ h(k) (t)

k!

k=1

k−1

k (1 − t)

(1 − t)

k

m

− K (m + 1) (1 − t)

And so = h′ (t) +

m ∑ h(k+1) (t)

k!

k=1

m−1 ∑ k=0

−K (m + 1) (1 − t) = h′ (t) +

k

(1 − t) −

h(k+1) (t) k (1 − t) k!

m

h(m+1) (t) m m (1 − t) − h′ (t) − K (m + 1) (1 − t) m!

and so K=

h(m+1) (t) . (m + 1)!

This proves the theorem.  Now let f : U → R where U ⊆ X a normed vector space and suppose f ∈ C m (U ) and suppose m+1 D f (x) exists on U . Let x ∈ U and let r > 0 be such that B (x,r) ⊆ U. Then for ||v|| < r consider

f (x+tv) − f (x) ≡ h (t)

for t ∈ [0, 1]. Then by the chain rule, h′ (t) = Df (x+tv) (v) , h′′ (t) = D2 f (x+tv) (v) (v) and continuing in this way, h(k) (t) = D(k) f (x+tv) (v) (v) · · · (v) ≡ D(k) f (x+tv) vk . It follows from Taylor’s formula for a function of one variable given above that f (x + v) = f (x) +

m ∑ D(k) f (x) vk k=1

k!

+

D(m+1) f (x+tv) vm+1 . (m + 1)!

This proves the following theorem. Theorem 19.6.2 Let f : U → R and let f ∈ C m+1 (U ). Then if B (x,r) ⊆ U, and ||v|| < r, there exists t ∈ (0, 1) such that 19.21 holds.

(19.21)

19.7. SECOND DERIVATIVE TEST

19.7

469

Second Derivative Test

Now consider the case where U ⊆ Rn and f : U → R is C 2 (U ). Then from Taylor’s theorem, if v is small enough, there exists t ∈ (0, 1) such that f (x + v) = f (x) + Df (x) v+

D2 f (x+tv) v2 . 2

(19.22)

Consider D2 f (x+tv) (ei ) (ej ) ≡

D (D (f (x+tv)) ei ) ej ) ( ∂f (x + tv) ej = D ∂xi ∂ 2 f (x + tv) = ∂xj ∂xi

where ei are the usual basis vectors. Letting v=

n ∑

vi ei ,

i=1

the second derivative term in 19.22 reduces to 1∑ 2 1∑ D f (x+tv) (ei ) (ej ) vi vj = Hij (x+tv) vi vj 2 i,j 2 i,j where Hij (x+tv) = D2 f (x+tv) (ei ) (ej ) = Definition 19.7.1 The matrix whose ij th entry is H (x).

∂ 2 f (x) ∂xj ∂xi is

∂ 2 f (x+tv) . ∂xj ∂xi

called the Hessian matrix, denoted as

From Theorem 18.9.1, this is a symmetric real matrix, thus self adjoint. By the continuity of the second partial derivative, 1 f (x + v) = f (x) + Df (x) v+ vT H (x) v+ 2 ) 1( T v (H (x+tv) −H (x)) v . 2 where the last two terms involve ordinary matrix multiplication and

(19.23)

vT = (v1 · · · vn ) for vi the components of v relative to the standard basis. Definition 19.7.2 Let f : D → R where D is a subset of some normed vector space. Then f has a local minimum at x ∈ D if there exists δ > 0 such that for all y ∈ B (x, δ) f (y) ≥ f (x) . f has a local maximum at x ∈ D if there exists δ > 0 such that for all y ∈ B (x, δ) f (y) ≤ f (x) .

470

CHAPTER 19. IMPLICIT FUNCTION THEOREM

Theorem 19.7.3 If f : U → R where U is an open subset of Rn and f is C 2 , suppose Df (x) = 0. Then if H (x) has all positive eigenvalues, x is a local minimum. If the Hessian matrix H (x) has all negative eigenvalues, then x is a local maximum. If H (x) has a positive eigenvalue, then there exists a direction in which f has a local minimum at x, while if H (x) has a negative eigenvalue, there exists a direction in which H (x) has a local maximum at x. Proof: Since Df (x) = 0, formula 19.23 holds and by continuity of the second derivative, H (x) is a symmetric matrix. Thus H (x) has all real eigenvalues. Suppose first that H (x) has all positive eigenvalues and that all are larger than δ 2 > 0. Then by Theorem 14.1.6, H (x) ∑nhas an orthonormal n basis of eigenvectors, {vi }i=1 and if u is an arbitrary vector, such that u = j=1 uj vj where uj = u · vj , then n n ∑ ∑ uT H (x) u = uj vjT H (x) uj vj j=1

=

n ∑ j=1

u2j λj ≥ δ 2

j=1 n ∑

2

u2j = δ 2 |u| .

j=1

From 19.23 and the continuity of H, if v is small enough, 1 1 δ2 2 2 2 f (x + v) ≥ f (x) + δ 2 |v| − δ 2 |v| = f (x) + |v| . 2 4 4 This shows the first claim of the theorem. The second claim follows from similar reasoning. Suppose H (x) has a positive eigenvalue λ2 . Then let v be an eigenvector for this eigenvalue. Then from 19.23, 1 f (x+tv) = f (x) + t2 vT H (x) v+ 2 ( ) 1 2 T t v (H (x+tv) −H (x)) v 2 which implies ) 1 1 ( 2 f (x) + t2 λ2 |v| + t2 vT (H (x+tv) −H (x)) v 2 2 1 2 2 2 ≥ f (x) + t λ |v| 4

f (x+tv) =

whenever t is small enough. Thus in the direction v the function has a local minimum at x. The assertion about the local maximum in some direction follows similarly. This proves the theorem.  This theorem is an analogue of the second derivative test for higher dimensions. As in one dimension, when there is a zero eigenvalue, it may be impossible to determine from the Hessian matrix what the local qualitative behavior of the function is. For example, consider f1 (x, y) = x4 + y 2 , f2 (x, y) = −x4 + y 2 . Then Dfi (0, 0) = 0 and for both functions, the Hessian matrix evaluated at (0, 0) equals ( ) 0 0 0 2 but the behavior of the two functions is very different near the origin. The second has a saddle point while the first has a minimum there.

19.8. THE RANK THEOREM

19.8

471

The Rank Theorem

This is a very interesting result. The proof follows Marsden and Hoffman. First here is some linear algebra. Theorem 19.8.1 Let L : Rn → RN have rank m. Then there exists a basis {u1 , · · · , um , um+1 , · · · , un } such that a basis for ker (L) is {um+1 , · · · , un } . Proof: Since L has rank m, there is a basis for L (Rn ) which is of the form {Lu1 , · · · , Lum } ∑

Then if

ci ui = 0

i

you can do L to both sides and conclude that each ci = 0. Hence {u1 , · ·∑ · , um } is linearly indepenm dent. Let {v1 , · · · , vk } be a basis for ker (L) . Let x ∈ Rn . Then Lx = i=1 ci Lui for some choice of scalars ci . Hence ( ) m ∑ L x− ci ui = 0 i=1

which shows that there exist dj such that x=

m ∑

ci ui +

i=1

k ∑

dj vj

j=1

It follows that span (u1 , · · · , um , v1 , · · · , vk ) = Rn . Is this set of vectors linearly independent? Suppose k m ∑ ∑ dj vj = 0 ci ui + j=1

i=1

Do L to both sides to get

Thus each ci = 0. Hence can let

m ∑

∑k j=1

ci Lui = 0

i=1

dj vj = 0 and so each dj = 0 also. It follows that k = n − m and we {v1 , · · · , vk } = {um+1 , · · · , un } . 

Another useful linear algebra result is the following lemma. ( ) Lemma 19.8.2 Let V ⊆ Rn be a subspace and suppose A (x) ∈ L V, RN for x in some open set U. Also suppose x → A (x) is continuous for x ∈ U . Then if A (x0 ) is one to one on V for some x0 ∈ U, then it follows that for all x close enough to x0 , A (x) is also one to one on V . ∗

Proof: Consider V as an inner product space with the inner product from Rn and A (x) A (x) . ∗ ∗ Then A (x) A (x) ∈ L (V, V ) and x → A (x) A (x) is also continuous. Also for v ∈ V, ( ) ∗ A (x) A (x) v, v V = (A (x) v, A (x) v)RN ∗

If A (x0 ) A (x0 ) v = 0, then from the above, it follows that A (x0 ) v = 0 also. Therefore, v = 0 and ∗ so A (x0 ) A (x0 ) is one to one on V . For all x close enough to x0 , it follows from continuity that ∗ ∗ A (x) A (x) is also one to one. Thus, for such x, if A (x) v = 0, Then A (x) A (x) v = 0 and so v = 0. Thus, for x close enough to x0 , it follows that A (x) is also one to one on V . 

472

CHAPTER 19. IMPLICIT FUNCTION THEOREM

Theorem 19.8.3 Let f : A ⊆ Rn → RN where A is open in Rn . Let f be a C r function and suppose that Df (x) has rank m for all x ∈ A. Let x0 ∈ A. Then there are open sets U, V ⊆ Rn with x0 ∈ V, and a C r function h : U → V with inverse h−1 : V → U also C r such that f ◦ h depends only on (x1 , · · · , xm ). Proof: Let L = Df (x0 ), and N0 = ker L. Using the above linear algebra theorem, there exists {u1 , · · · , um } such that {Lu1 , · · · , Lum } is a basis for LRn . Extend to form a basis for Rn , {u1 , · · · , um , um+1 , · · · , un } such that a basis for N0 = ker L is {um+1 , · · · , un }. Let M ≡ span (u1 , · · · , um ) . Let the coordinate maps be ψ k so that if x ∈ Rn , x = ψ 1 (x) u1 + · · · + ψ n (x) un Since these coordinate maps are linear, they are infinitely differentiable. Next I will define coordinate maps for x ∈ RN . Then by the above construction, {Lu1 , · · · , Lum } is a basis for L (Rn ). Let a basis for RN be { } Lu1 , · · · , Lum , vm+1 , · · · , vN (Note that, since the rank of Df (x) = m you must have N ≥ m.) The coordinate maps ϕi will be defined as follows for x ∈ RN . x = ϕ1 (x) Lu1 + · · · ϕm (x) Lum + ϕm+1 (x) vm+1 + · · · + ϕN (x) vN Now define two infinitely differentiable maps G : Rn → Rn and H : RN → Rn , ( ) G (x) ≡ 0, · · · , 0, ψ m+1 (x) , · · · , ψ n (x) H (y) ≡ (ϕ1 (y) , · · · , ϕm (y) , 0, · · · , 0) For x ∈ A ⊆ Rn , let

g (x) ≡ H (f (x)) + G (x) ∈ Rn

Thus the first term picks out the first m entries of f (x) and the second term the last n − m entries of x. It is of the form ( ) ϕ1 (f (x)) , · · · , ϕm (f (x)) , ψ m+1 (x) , · · · , ψ n (x) Then Dg (x0 ) (v) = HL (v) + Gv =HLv + Gv

(19.24)

which is of the form ( ) Dg (x0 ) (v) = ϕ1 (Lv) , · · · , ϕm (Lv) , ψ m+1 (v) , · · · , ψ n (v) If this equals 0, then all the components of v, ψ m+1 (v) , · · · , ψ n (v) are equal to 0. Hence v=

m ∑

ci ui .

i=1

But also the coordinates of Lv,ϕ1 (Lv) , · · · , ϕm (Lv) are all zero so Lv = 0 and so 0 = so by independence of the Lui , each ci = 0 and consequently v = 0.

∑m

i=1 ci Lui

19.8. THE RANK THEOREM

473

This proves the conditions for the inverse function theorem are valid for g. Therefore, there is an open ball U and an open set V , x0 ∈ V , such that g : V → U is a C r map and its inverse g−1 : U → V is also. We can assume by continuity and Lemma 19.8.2 that V and U are small enough that for each x ∈ V, Dg (x) is one to one. This follows from the fact that x → Dg (x) is continuous. Since it is assumed that Df (x) is of rank m, Df (x) (Rn ) is a subspace which is m dimensional, denoted as Px . Also denote L (Rn ) = L (M ) as P . RN

Px

P

M ⊆ Rn

Thus {Lu1 , · · · , Lum } is a basis for P . Using Lemma 19.8.2 again, by making V, U smaller if necessary, one can also assume that for each x ∈ V, Df (x) is one to one on M (although not on Rn ) and HDf (x) is one to one on M . This follows from continuity and the fact that L = Df (x0 ) is one to one on M . Therefore, it is also the case that Df (x) maps the m dimensional space M onto the m dimensional space Px and H is one to one on Px . The reason for this last claim is as follows: If Hz = 0 where z ∈ Px , then HDf (x) w = 0 where w ∈ M and Df (x) w = z. Hence w = 0 because HDf (x) is one to one, and so z = 0 which shows that indeed H is one to one on Px . Denote as Lx the inverse of H which is defined on Rm × 0, Lx : Rm × 0 → Px . That 0 refers to the N − m string of zeros in the definition given above for H. Define h ≡ g−1 and consider f1 ≡ f ◦ h. It is desired to show that f1 depends only on x1 , · · · , xm . Let D1 refer to (x1 , · · · , xm ) and let D2 refer to (xm+1 , · · · , xn ). Then f = f 1 ◦ g and so by the chain rule Df (x) (y) = Df1 (g (x)) Dg (x) (y) (19.25) Now as in 19.24, for y ∈ Rn , Dg (x) (y) = HDf (x) (y) + Gy

(

) = ϕ1 (Df (x) y) , · · · , ϕm (Df (x) y) , ψ m+1 (y) , · · · , ψ n (y) Recall that from the above definitions of H and G, ( ) G (y) ≡ 0, · · · , 0, ψ m+1 (y) , · · · , ψ n (y) H (Df (x) (y)) = (ϕ1 (Df (x) y) , · · · , ϕm (Df (x) y) , 0, · · · , 0) Let π 1 : Rn → Rm denote the projection onto the first m positions and π 2 the projection onto the last n − m. Thus π 1 Dg (x) (y) = π 2 Dg (x) (y) =

(ϕ1 (Df (x) y) , · · · , ϕm (Df (x) y)) ( ) ψ m+1 (y) , · · · , ψ n (y)

Now in general, for z ∈ Rn , Df1 (g (x)) z = D1 f1 (g (x)) π 1 z + D2 f1 (g (x)) π 2 z Therefore, it follows that Df1 (g (x)) Dg (x) (y) is given by Df (x) (y) = =

Df1 (g (x)) Dg (x) (y) D1 f1 (g (x)) π 1 Dg (x) (y) + D2 f1 (g (x)) π 2 Dg (x) (y) =π 1 Dg(x)(y)

z }| { Df (x) (y) = Df1 (g (x)) Dg (x) (y) = D1 f1 (g (x)) π 1 HDf (x) (y) + D2 f1 (g (x)) π 2 Gy

474

CHAPTER 19. IMPLICIT FUNCTION THEOREM

We need to verify the last term equals 0. Solving for this term, D2 f1 (g (x)) π 2 Gy = Df (x) (y) − D1 f1 (g (x)) π 1 HDf (x) (y) As just explained, Lx ◦ H is the identity on Px , the image of Df (x). Then D2 f1 (g (x)) π 2 Gy

= Lx ◦ HDf (x) (y) − D1 f1 (g (x)) π 1 HDf (x) (y) ( ) = Lx ◦ HDf (x) − D1 f1 (g (x)) π 1 HDf (x) (y)

Factoring out that underlined term, D2 f1 (g (x)) π 2 Gy = [Lx − D1 f1 (g (x)) π 1 ] HDf (x) (y) Now Df (x) : M → Px = Df (x) (Rn ) is onto. (This is based on the assumption that Df (x) has rank m.) Thus it suffices to consider only y ∈ M in the right side of the above. However, for such y,π 2 Gy = 0 because to be in M, ψ k (y) = 0 if k ≥ m + 1, and so the left side of the above equals 0. Thus it appears this term on the left is 0 for any y chosen. How can this be so? It can only take place if D2 f1 (g (x)) = 0 for every x ∈ V . Thus, since g is onto, it can only take place if D2 f1 (x) = 0 for all x ∈ U . Therefore on U it must be the case that f1 depends only on x1 , · · · , xm as desired. 

19.9

The Local Structure Of C 1 Mappings

In linear algebra it is shown that every invertible matrix can be written as a product of elementary matrices, those matrices which are obtained from doing a row operation to the identity matrix. Two of the row operations produce a matrix which will change exactly one entry of a vector when it is multiplied by the elementary matrix. The other row operation involves switching two rows and this has the effect of switching two entries in a vector when multiplied on the left by the elementary matrix. Thus, in terms of the effect on a vector, the mapping determined by the given matrix can be considered as a composition of mappings which either flip two entries of the vector or change exactly one. A similar local result is available for nonlinear mappings. I found this interesting result in the advanced calculus book by Rudin. Definition 19.9.1 Let U be an open set in Rn and let G : U → Rn . Then G is called primitive if it is of the form ( )T G (x) = x1 · · · α (x) · · · xn . Thus, G is primitive if it only changes one of the variables. A function F : Rn → Rn is called a flip if T F (x1 , · · · , xk , · · · , xl , · · · , xn ) = (x1 , · · · , xl , · · · , xk , · · · , xn ) . Thus a function is a flip if it interchanges two coordinates. Also, for m = 1, 2, · · · , n, define Pm (x) ≡

( x1

x2

···

xm

0

···

)T 0

−1

It turns out that if h (0) = 0, Dh (0) exists, and h is C 1 on U , then h can be written as a composition of primitive functions and flips. This is a very interesting application of the inverse function theorem. −1

Theorem 19.9.2 Let h : U → Rn be a C 1 function with h (0) = 0, Dh (0) exists. Then there is an open set V ⊆ U containing 0, flips F1 , · · · , Fn−1 , and primitive functions Gn , Gn−1 , · · · , G1 such that for x ∈ V, h (x) = F1 ◦ · · · ◦ Fn−1 ◦ Gn ◦ Gn−1 ◦ · · · ◦ G1 (x) . The primitive function Gj leaves xi unchanged for i ̸= j.

19.9. THE LOCAL STRUCTURE OF C 1 MAPPINGS Proof: Let

(

h1 (x) ≡ h (x) = ( Dh (0) e1 =

α1 (x) · · ·

α1,1 (0) · · ·

475 )T αn (x) )T αn,1 (0)

k where αk,1 denotes ∂α ∂x1 . Since Dh (0) is one to one, the right side of this expression cannot be zero. Hence there exists some k such that αk,1 (0) ̸= 0. Now define ( )T G1 (x) ≡ αk (x) x2 · · · xn

Then the matrix of DG1 (0) is of the form  αk,1 (0) · · ·  0 1   ..   . 0 0

··· ..

.

···

αk,n (0) 0 .. . 1

     

and its determinant equals αk,1 (0) ̸= 0. Therefore, by the inverse function theorem, there exists an open set U1 , containing 0 and an open set V2 containing 0 such that G1 (U1 ) = V2 and G1 is one to one and onto, such that it and its inverse are both C 1 . Let F1 denote the flip which interchanges xk with x1 . Now define h2 (y) ≡ F1 ◦ h1 ◦ G−1 1 (y) Thus h2 (G1 (x))

≡ F1 ◦ h1 (x) ( = αk (x) · · ·

Therefore,

α1 (x) · · ·

( P1 h2 (G1 (x)) =

Also

αk (x) 0

( P1 (G1 (x)) =

)T

αk (x) 0

··· ···

(19.26)

αn (x) )T .

0 )T 0 −1

so P1 h2 (y) = P1 (y) for all y ∈ V2 . Also, h2 (0) = 0 and Dh2 (0) exists because of the definition of h2 above and the chain rule. Since F21 = I, the identity map, it follows from (19.26) that h (x) = h1 (x) = F1 ◦ h2 ◦ G1 (x) .

(19.27)

Note that on an open set V2 ≡ G1 (U1 ) containing the origin, h2 leaves the first entry unchanged. This is what P1 h2 (G1 (x)) = P1 (G1 (x)) says. In contrast, h1 = h left possibly no entries unchanged. Suppose then, that for m ≥ 2, hm leaves the first m − 1 entries unchanged, Pm−1 hm (x) = Pm−1 (x)

(19.28) −1

for all x ∈ Um , an open subset of U containing 0, and hm (0) = 0, Dhm (0) exists. From (19.28), hm (x) must be of the form ( )T hm (x) = x1 · · · xm−1 α1 (x) · · · αn (x) where these αk are different than the ones used earlier. Then ( Dhm (0) em = 0 · · · 0 α1,m (0) · · ·

)T αn,m (0)

̸= 0

476

CHAPTER 19. IMPLICIT FUNCTION THEOREM

because Dhm (0) as before. Define

−1

exists. Therefore, there exists a k ≥ m such that αk,m (0) ̸= 0, not the same k Gm (x) ≡

( x1

···

xm−1

αk (x) · · ·

)T (19.29)

xn −1

so a change in Gm occurs only in the mth slot. Then Gm (0) = 0 and DGm (0) the above. In fact det (DGm (0)) = αk,m (0) .

exists similar to

Therefore, by the inverse function theorem, there exists an open set Vm+1 containing 0 such that Vm+1 = Gm (Um ) with Gm and its inverse being one to one, continuous and onto. Let Fm be the flip which flips xm and xk . Then define hm+1 on Vm+1 by hm+1 (y) = Fm ◦ hm ◦ G−1 m (y) . Thus for x ∈ Um , and consequently, since

F2m

hm+1 (Gm (x)) = (Fm ◦ hm ) (x) .

(19.30)

Fm ◦ hm+1 ◦ Gm (x) = hm (x)

(19.31)

= I,

It follows = Pm (Fm ◦ hm ) (x) ( = x1 · · · xm−1

Pm hm+1 (Gm (x))

and

( Pm (Gm (x)) =

x1

···

xm−1

αk (x) 0

αk (x) 0

···

)T

···

0 )T .

0

Therefore, for y ∈ Vm+1 , Pm hm+1 (y) = Pm (y) . As before, hm+1 (0) = 0 and Dhm+1 (0) obtaining the following: h (x) =

−1

exists. Therefore, we can apply (19.31) repeatedly,

F1 ◦ h2 ◦ G1 (x)

= F1 ◦ F2 ◦ h3 ◦ G2 ◦ G1 (x) .. . = F1 ◦ · · · ◦ Fn−1 ◦ hn ◦ Gn−1 ◦ · · · ◦ G1 (x) where hn fixes the first n − 1 entries, Pn−1 hn (x) = Pn−1 (x) = and so hn (x) is a primitive mapping of the form ( hn (x) = x1 · · ·

( x1

···

)T xn−1

0

,

)T xn−1

α (x)

.

Therefore, define the primitive function Gn (x) to equal hn (x). 

19.10

Brouwer Fixed Point Theorem Rn

This is on the Brouwer fixed point theorem and a discussion of some of the manipulations which are important regarding simplices. This here is an approach based on combinatorics or graph theory. It features the famous Sperner’s lemma. It uses very elementary concepts from linear algebra in an essential way. However, it is pretty technical stuff. This elementary proof is harder than those which are based on other approaches like integration theory or degree theory.

19.10. BROUWER FIXED POINT THEOREM RN

19.10.1

477

Simplices and Triangulations

Definition 19.10.1 Define an n simplex, denoted by [x0 , · · · , xn ], to be the convex hull of the n + 1 n points, {x0 , · · · , xn } where {xi − x0 }i=1 are linearly independent. Thus } { n n ∑ ∑ [x0 , · · · , xn ] ≡ ti x i : ti = 1, ti ≥ 0 . i=0

i=0

Note that {xj − xm }j̸=m are also independent. I will call the {ti } just described the coordinates of a point x. ∑ To see the last claim, suppose j̸=m cj (xj − xm ) = 0. Then you would have ∑ c0 (x0 − xm ) + cj (xj − xm ) = 0 j̸=m,0



= c0 (x0 − xm ) +



cj (xj − x0 ) + 

j̸=m,0

=





 cj  (x0 − xm ) = 0

j̸=m,0

 cj (xj − x0 ) + 

j̸=m,0



 cj  (x0 − xm )

j̸=m

∑ Then you get j̸=m cj = 0 and each cj = 0 for j ̸= m, 0. Thus c0 = 0 also because the sum is 0 and all other cj = 0. n Since {xi − x0 }i=1 is an independent set, the ti used to specify a point in the convex hull are uniquely determined. If two of them are n ∑

ti xi =

i=0

Then

n ∑

ti (xi − x0 ) =

n ∑

si xi

i=0 n ∑

si (xi − x0 )

i=0

i=0

so ti = si for i ≥ 1 by independence. Since the si and ti sum to 1, it follows that also s0 = t0 . If n ≤ 2, the simplex is a triangle, line segment, or point. If n ≤ 3, it is a tetrahedron, triangle, line segment or point. Definition 19.10.2 If S is an n simplex. Then it is triangulated if it is the union of smller subsimplices, the triangulation, such that if S1 , S2 are two simplices in the triangulation, with [ ] [ ] S1 ≡ z10 , · · · , z1m , S2 ≡ z20 , · · · , z2p then S1 ∩ S2 = [xk0 , · · · , xkr ] where [xk0 , · · · , xkr ] is in the triangulation and { } { } {xk0 , · · · , xkr } = z10 , · · · , z1m ∩ z20 , · · · , z2p or else the two simplices do not intersect. The following proposition is geometrically fairly clear. It will be used without comment whenever needed in the following argument about triangulations.

478

CHAPTER 19. IMPLICIT FUNCTION THEOREM

Proposition 19.10.3 Say [x1 , · · · , xr ] , [ˆ x1 , · · · , x ˆr ] , [z1 , · · · , zr ] are all r − 1 simplices and [x1 , · · · , xr ] , [ˆ x1 , · · · , x ˆr ] ⊆ [z1 , · · · , zr ] and [z1 , · · · , zr , b] is an r + 1 simplex and [y1 , · · · , ys ] = [x1 , · · · , xr ] ∩ [ˆ x1 , · · · , x ˆr ]

(19.32)

{y1 , · · · , ys } = {x1 , · · · , xr } ∩ {ˆ x1 , · · · , x ˆr }

(19.33)

where Then [x1 , · · · , xr , b] ∩ [ˆ x1 , · · · , x ˆr , b] = [y1 , · · · , ys , b] (19.34) ∑s Proof: If you have i=1 ti yi + ts+1 b in the right side, the ti summing to 1 and nonnegative, then it is obviously in both of the two simplices on the left because of 19.33. Thus [x1 , · · · , xr , b] ∩ [ˆ x1 , · · · , x ˆr , b] ⊇ [y1 , · ·∑ · , ys , b]. ∑r r ˆk = j=1 tˆkj zj , as usual, the scalars adding to 1 and nonnegative. Now suppose xk = j=1 tkj zj , x Consider something in both of the simplices on the left in 19.34. Is it in the right? The element on the left is of the form r r ∑ ∑ sα xα + sr+1 b = sˆα x ˆα + sˆr+1 b α=1

α=1

where the sα , are nonnegative and sum to one, similarly for sˆα . Thus r r ∑ ∑

sα tα j zj + sr+1 b =

j=1 α=1

r r ∑ ∑

sˆα tˆα ˆr+1 b j zj + s

(19.35)

α=1 j=1

Now observe that ∑∑ j

sα tα j + sr+1 =

∑∑

α

α

j

sα tα j + sr+1 =



sα + sr+1 = 1.

α

A similar observation holds for the right side of 19.35. By uniqueness of the coordinates in an r + 1 simplex, and assumption that [z1 , · · · , zr , b] is an r + 1 simplex, sˆr+1 = sr+1 and so r ∑

r ∑ sα sˆα xα = x ˆα 1 − s 1 − sr+1 r+1 α=1 α=1

∑ ∑ sα sˆα where α 1−sr+1 = α 1−sr+1 = 1, which would say that both sides are a single element of [x ] ∩ [ˆ x1 , · · · , x ˆr ] = [y1 , · · · , ys ] and this shows both are equal to something of the form ∑1s, · · · , xr∑ t y , t = 1, t ≥ 0. Therefore, i i i i i=1 i s r s ∑ ∑ ∑ sα xα = ti y i , sα xα = (1 − sr+1 ) ti yi 1 − sr+1 α=1 α=1 i=1 i=1 r ∑

It follows that r ∑ α=1

sα xα + sr+1 b =

s ∑

(1 − sr+1 ) ti yi + sr+1 b ∈ [y1 , · · · , ys , b]

i=1

which proves the other inclusion.  Next I will explain why any simplex can be triangulated in such a way that all sub-simplices have diameter less than ε. This is obvious if n ≤ 2. Supposing it to be true for n − 1, is it also so for n? The barycenter b ∑ 1 x . This point is not in the convex hull of any of the faces, those of a simplex [x0 , · · · , xn ] is 1+n i i simplices of the form [x0 , · · · , x ˆk , · · · , xn ] where the hat indicates xk has been left out. Thus, placing

19.10. BROUWER FIXED POINT THEOREM RN

479

b in the k th position, [x0 , · · · , b, · · · , xn ] is a n simplex also. First note that [x0 , · · · , x ˆk , · · · , xn ] is an n − 1 simplex. To be sure [x0 , · · · , b, · · · , xn ] is an n simplex, we need to check that certain vectors are linearly independent. If ( ) n k−1 n ∑ ∑ 1 ∑ dj (xj − x0 ) 0= cj (xj − x0 ) + ak xi − x0 + n + 1 i=0 j=1 j=k+1

then does it follow that ak = 0 = cj = dj ? k−1 ∑

1 cj (xj − x0 ) + ak 0= n + 1 j=1 0=

k−1 ∑(

cj +

j=1

ak n+1

)

ak n+1

(xj − x0 ) + ak

( n ∑

) (xi − x0 )

n ∑

+

i=0

dj (xj − x0 )

j=k+1

( ) n ∑ 1 ak (xk − x0 ) + dj + (xj − x0 ) n+1 n+1 j=k+1

ak n+1

ak n+1

Thus = 0 and each cj + = 0 = dj + so each cj and dj are also 0. Thus, this is also an n simplex. Actually, a little more is needed. Suppose [y0 , · · · , yn−1 ] is an n−1 simplex such that [y0 , · · · , yn−1 ] ⊆ n−1 [x0 , · · · , x ˆk , · · · , xn ] . Why is [y0 , · · · , yn−1 , b] an n simplex? We know the vectors {yj − y0 }k=1 are ∑ ∑ independent and that yj = i̸=k tji xi where i̸=k tji = 1 with each being nonnegative. Suppose n−1 ∑

cj (yj − y0 ) + cn (b − y0 ) = 0

(19.36)

j=1

If cn = 0, then by assumption, each cj = 0. The proof goes by assuming cn ̸= 0 and deriving a contradiction. Assume then that cn ̸= 0. Then you can divide by it and obtain modified constants, still denoted as cj such that b=

n−1 n ∑ 1 ∑ cj (yj − y0 ) xi = y0 + n + 1 i=0 j=1

Thus 1 n+1

n ∑ ∑

t0s (xi − xs ) =

i=0 s̸=k

n−1 ∑

cj (yj − y0 ) =

n−1 ∑

j=1

=

n−1 ∑ j=1

 cj 



j=1

tjs (xs − x0 ) −

s̸=k

  ∑ ∑ cj  tjs xs − t0s xs 



s̸=k



s̸=k

t0s (xs − x0 )

s̸=k

Modify the term on the left and simplify on the right to get   n n−1 ∑ ∑( ) 1 ∑∑ 0 ts ((xi − x0 ) + (x0 − xs )) = cj  tjs − t0s (xs − x0 ) n + 1 i=0 j=1 s̸=k

s̸=k

Thus, 1 n+1

n ∑ i=0

  ∑  t0s  (xi − x0 ) = s̸=k

1 ∑∑ 0 ts (xs − x0 ) n + 1 i=0 s̸=k   n−1 ∑ ∑( ) + cj  tjs − t0s (xs − x0 ) n

j=1

s̸=k

480

CHAPTER 19. IMPLICIT FUNCTION THEOREM

Then, taking out the i = k term on the left yields     1 ∑ 0  1 ∑ ∑ 0  ts (xk − x0 ) = − ts (xi − x0 ) n+1 n+1 s̸=k

1 n+1

i̸=k

s̸=k

  n ∑ n−1 ∑ ∑( ∑ ) tjs − t0s (xs − x0 ) t0s (xs − x0 ) + cj  i=0 s̸=k

j=1

s̸=k

That on the right is a linear combination of vectors (xr − x0 ) for r ̸= k so by independence, ∑ 0 0 r̸=k tr = 0. However, each tr ≥ 0 and these sum to 1 so this is impossible. Hence cn = 0 after all and so each cj = 0. Thus [y0 , · · · , yn−1 , b] is an n simplex. Now in general, if you have an n simplex is the maximum of |xk − xl | for ∑ [x0 , · · · , xn ] , its diameter ∑ n 1 n 1 diam (S). all k ̸= l. Consider |b − xj | . It equals i=0 n+1 (xi − xj ) = i̸=j n+1 (xi − xj ) ≤ n+1 Next consider the k th face of S [x0 , · · · , x ˆk , · · · , xn ]. By induction, it has a triangulation into n simplices which each have diameter no more than n+1 diam (S). Let these n − 1 simplices be de} {[ ]}mk ,n+1 { k k k noted by S1 , · · · , Smk . Then the simplices Si , b i=1,k=1 are a triangulation of S such that ([ ]) [ ] n diam Sik , b ≤ n+1 diam (S). Do for Sik , b what was just done for S obtaining a triangulation of S as the union of what is obtained such that each simplex has diameter no more than )2 ( n diam (S). Continuing this way shows the existence of the desired triangulation. You simply n+1 ( )k n do the process k times where n+1 diam (S) < ε.

19.10.2

Labeling Vertices

Next is a way to label the vertices. Let p0 , · · · , pn be the first n + 1 prime numbers. All vertices n of a simplex S = [x0 , · · · , xn ] having {xk − x0 }k=1 independent will be labeled with one of these primes. In particular, the vertex xk will be labeled as pk if the simplex is [x0 , · · · , xn ]. The “value” of a simplex will be the product of its labels. Triangulate this S. Consider a 1 simplex whose vertices are from the vertices of S, the original n simplex [xk1 , xk2 ], label xk1 as pk1 and xk2 as pk2 . Then label all other vertices of this triangulation which occur on [xk1 , xk2 ] either pk1 or pk2 . Note that by independence of {xk − xr }k̸=r , this cannot introduce an inconsistency because the segment cannot contain any other vertex of S. Then obviously there will be an odd number of simplices in this triangulation having value pk1 pk2 , that is a pk1 at one end and a pk2 at the other. Next consider the 2 simplices [xk1 , xk2 , xk3 ] where the xki are from S. Label all vertices of the triangulation which lie on one of these 2 simplices which have not already been labeled as either pk1 , pk2 , or pk2 . Continue this way. This labels all vertices of the triangulation of S which have at least one coordinate zero. For the vertices of the triangulation which have all coordinates positive, the interior points of S, label these at random from any of p0 , ..., pn . (Essentially, this is the same idea. The “interior” points are the new ones ∏ not already labeled.) The idea is to n show that there is an odd number of n simplices with value i=0 pi in the triangulation and more [ ] generally, for each m simplex xk1 , · · · , xkm+1 , m ≤ n with the xki an [ original vertex ] from S, there are an odd number of m simplices of the triangulation contained in xk1 , · · · , xkm+1 , having value pk1 · · · pkm+1 [ . It is clear ]that this is the case for all such 1 simplices. For convenience, call such simplices xk1 , · · · , xkm+1 m dimensional faces of S. An m simplex which is a subspace of this one will have the “correct” value if its value is pk1 · · · pkm+1 . Suppose that the labeling has produced an odd number of simplices of the triangulation contained in each m dimensional face of [S which have the correct value. Take such an m dimensional face [ ] ] xj1 , . . . , xjk+1 . Consider Sˆ[≡ xj1 , . . . xjk+1 , xjk+2 ]. Then by induction, there is an odd number of ∏ k simplices on the sth face xj1 , . . . , x ˆjs , · · · , xjk+2 having value i̸=s pji . In particular, the face [ ] ∏ xj1 , . . . , xjk+1 , x ˆjk+2 has an odd number of simplices with value i≤k+1 pji . ˆ No simplex in any other [ face of S can have ] this value by uniqueness ∏ of prime factorization. Pick a simplex on the face xj1 , . . . , xjk+1 , x ˆjk+2 which has correct value i≤k+1 pji and cross this

19.10. BROUWER FIXED POINT THEOREM RN

481

ˆ Continue crossing simplices having value ∏ simplex into S. i≤k+1 pji which have not been crossed till the process ends. It must end because there are an odd number of these simplices having value ∏ ˆ p . If the process leads to the outside of S, then one can always enter it again because i≤k+1 ji ∏ there are an odd number of simplices with value i≤k+1 pji available and you will have ∏ used up an even number. Note that in this process, if you have a simplex with one side labeled i≤k+1 pji , there is either one way in or out of this simplex or two depending on whether ∏k+2 the remaining vertex is labeled pjk+2 . When the process ends, the value of the simplex must be i=1 pji because it will have the additional label pjk+2 . Otherwise, there would be another route out of this, through the other ∏ ∏k+2 side labeled i≤k+1 pji . This identifies a simplex in the triangulation with value i=1 pji . Then [ ] ∏ ˆjk+2 which have not been repeat the process with i≤k+1 pji valued simplices on xj1 , . . . , xjk+1 , x ∏k+2 crossed. Repeating the process, entering from the outside, cannot deliver a i=1 pji valued simplex encountered earlier because of what was just noted. There is either ∏ one or two ways to cross the simplices. In other words, the process is one to one in selecting a i≤k+1 pji simplex from crossing ˆ Continue doing this, crossing a ∏ such a simplex on the selected face of S. i≤k+1 pji simplex on the ˆ face of∏S which has not been crossed previously. This identifies an odd number of simplices having k+2 value i=1 pji . These are the ones which are “accessible” from the outside using this process. If there are any which are not accessible from outside, applying the ∏ same process starting inside one of k+2 these, leads to exactly one other inaccessible simplex with value i=1 pji . Hence these inaccessible simplices occur in pairs and so there are an odd number of simplices in the triangulation having value ∏k+2 p . i=1 ji We refer to this procedure of labeling as Sperner’s lemma. The system of labeling is well n defined thanks to the assumption that {xk − x0 }k=1 is independent which implies that {xk − xi }k̸=i is also linearly independent. Thus there can be no ambiguity in the labeling of vertices on any “face” the convex hull of some of the original vertices of S. The following is a description of the system of labeling the vertices. n

Lemma 19.10.4 Let [x0 , · · · , xn ] be an n simplex with {xk − x0 }k=1 independent, and let the first n + 1 primes be p0 , p1 , · · · , pn . Label xk as pk and consider a triangulation of this simplex. Labeling the vertices of this triangulation which occur on [xk1 , · · · , xks ] with any of pk1 , · · · , pks , beginning with all 1 simplices [xk1 , xk2 ] and then 2 simplices and so forth, there are an odd number of simplices [yk1 , · · · , yks ] of the triangulation contained in [xk1 , · · · , xks ] which have value pk1 · · · pks . This for s = 1, 2, · · · , n. A combinatorial method We now give a brief discussion of the system of labeling for Sperner’s lemma from the point of view of counting numbers of faces rather than obtaining them with an algorithm. Let p0 , · · · , pn n be the first n + 1 prime numbers. All vertices of a simplex S = [x0 , · · · , xn ] having {xk − x0 }k=1 independent will be labeled with one of these primes. In particular, the vertex xk will be labeled as pk . The value of a simplex will be the product of its labels. Triangulate this S. Consider a 1 simplex coming from the original simplex [xk1 , xk2 ], label one end as pk1 and the other as pk2 . Then label all other vertices of this triangulation which occur on [xk1 , xk2 ] either pk1 or pk2 . The assumption of linear independence assures that no other vertex of S can be in [xk1 , xk2 ] so there will be no inconsistency in the labeling. Then obviously there will be an odd number of simplices in this triangulation having value pk1 pk2 , that is a pk1 at one end and a pk2 at the ] [ other. Suppose that the labeling has been done for all vertices of the triangulation which are on xj1 , . . . xjk+1 , } { xj1 , . . . xjk+1 ⊆ {x0 , . . . xn } any k simplex for k ≤ n − 1, and there is[an odd number of simplices from the triangulation having ] ∏k+1 value equal to i=1 pji . Consider Sˆ ≡ xj1 , . . . xjk+1 , xjk+2 . Then by induction, there is an odd number of k simplices on the sth face [ ] x j1 , . . . , x ˆjs , · · · , xjk+1 [ ] ∏ having value i̸=s pji . In particular the face xj1 , . . . , xjk+1 , x ˆjk+2 has an odd number of simplices ∏k+1 with value pj := Pˆk . We want to argue that some simplex in the triangulation which is i=1

i

482

CHAPTER 19. IMPLICIT FUNCTION THEOREM

∏k+2 contained in Sˆ has value Pˆk+1 := i=1 pji . Let Q be the number of k + 1 simplices from the triangulation contained in Sˆ which have two faces with value Pˆk (A k + 1 simplex has either 1 or 2 Pˆk faces.) and let R be the number of k + 1 simplices from the triangulation contained in Sˆ which have exactly one Pˆk face. These are the ones we want because they have value Pˆk+1 . Thus the number of faces having value Pˆk which is described here is 2Q + R. All interior Pˆk faces being ˆ counted twice by this number. Now we count [ ] the total number of Pk faces another way. There are P of them on the face xj1 , . . . , xjk+1 , x ˆjk+2 and by induction, P is odd. Then there are O of them which are not on this face. These faces got counted twice. Therefore, 2Q + R = P + 2O ˆ and so, since P is odd, so is R. Thus there is an odd number of Pˆk+1 simplices in S. We refer to this procedure of labeling as Sperner’s lemma. The system of labeling is well defined n thanks to the assumption that {xk − x0 }k=1 is independent which implies that {xk − xi }k̸=i is also linearly independent. Thus there can be no ambiguity in the labeling of vertices on any “face”, the convex hull of some of the original vertices of S. Sperner’s lemma is now a consequence of this discussion.

19.11

The Brouwer Fixed Point Theorem

$S \equiv [x_0,\cdots,x_n]$ is a simplex in $\mathbb{R}^n$. Assume $\{x_i - x_0\}_{i=1}^n$ are linearly independent. Thus a typical point of $S$ is of the form
$$\sum_{i=0}^n t_i x_i$$
where the $t_i$ are uniquely determined and the map $x \to t$ is continuous from $S$ to the compact set $\{t \in \mathbb{R}^{n+1} : \sum_i t_i = 1,\ t_i \ge 0\}$. The map $t \to x$ is one to one and clearly continuous. Since $S$ is compact, it follows that the inverse map is also continuous. This is a general consideration but what follows is a short explanation why this is so in this specific example.
To see this, suppose $x^k \to x$ in $S$. Let $x^k \equiv \sum_{i=0}^n t_i^k x_i$ with $x$ defined similarly with $t_i^k$ replaced with $t_i$, $x \equiv \sum_{i=0}^n t_i x_i$. Then
$$x^k - x_0 = \sum_{i=0}^n t_i^k x_i - \sum_{i=0}^n t_i^k x_0 = \sum_{i=1}^n t_i^k (x_i - x_0)$$
Thus
$$x^k - x_0 = \sum_{i=1}^n t_i^k (x_i - x_0),\qquad x - x_0 = \sum_{i=1}^n t_i (x_i - x_0)$$
Say $t_i^k$ fails to converge to $t_i$ for all $i \ge 1$. Then there exists a subsequence, still denoted with superscript $k$, such that for each $i = 1,\cdots,n$, it follows that $t_i^k \to s_i$ where $s_i \ge 0$ and some $s_i \ne t_i$. But then, taking a limit, it follows that
$$x - x_0 = \sum_{i=1}^n s_i (x_i - x_0) = \sum_{i=1}^n t_i (x_i - x_0)$$
which contradicts independence of the $x_i - x_0$. It follows that for all $i \ge 1$, $t_i^k \to t_i$. Since they all sum to 1, this implies that also $t_0^k \to t_0$. Thus the claim about continuity is verified.
Let $f : S \to S$ be continuous. When doing $f$ to a point $x$, one obtains another point of $S$ denoted as $\sum_{i=0}^n s_i x_i$. Thus in this argument the scalars $s_i$ will be the components after doing $f$ to a point of $S$ denoted as $\sum_{i=0}^n t_i x_i$.
Consider a triangulation of $S$ such that all simplices in the triangulation have diameter less than $\varepsilon$. The vertices of the simplices in this triangulation will be labeled from $p_0,\cdots,p_n$, the first $n+1$ prime numbers. If $[y_0,\cdots,y_n]$ is one of these simplices in the triangulation, each vertex is of the


form $\sum_{l=0}^n t_l x_l$ where $t_l \ge 0$ and $\sum_l t_l = 1$. Let $y_i$ be one of these vertices, $y_i = \sum_{l=0}^n t_l x_l$. Define $r_j \equiv s_j/t_j$ if $t_j > 0$ and $\infty$ if $t_j = 0$. Then $p(y_i)$ will be the label placed on $y_i$. To determine this label, let $r_k$ be the smallest of these ratios. Then the label placed on $y_i$ will be $p_k$ where $r_k$ is the smallest of all these extended nonnegative real numbers just described. If there is duplication, pick $p_k$ where $k$ is smallest. Note that for the vertices which are on $[x_{i_1},\cdots,x_{i_m}]$, these will be labeled from the list $\{p_{i_1},\cdots,p_{i_m}\}$ because $t_k = 0$ for each of these and so $r_k = \infty$ unless $k \in \{i_1,\cdots,i_m\}$. In particular, this scheme labels $x_i$ as $p_i$.
By the Sperner's lemma procedure described above, there are an odd number of simplices having value $\prod_{i\ne k} p_i$ on the $k^{th}$ face and an odd number of simplices in the triangulation of $S$ for which the product of the labels on their vertices, referred to here as its value, equals $p_0 p_1 \cdots p_n \equiv P_n$. Thus if $[y_0,\cdots,y_n]$ is one of these simplices, and $p(y_i)$ is the label for $y_i$,
$$\prod_{i=0}^n p(y_i) = \prod_{i=0}^n p_i \equiv P_n$$

What is $r_k$, the smallest of those ratios in determining a label? Could it be larger than 1? $r_k$ is certainly finite because at least some $t_j \ne 0$ since they sum to 1. Thus, if $r_k > 1$, you would have $s_k > t_k$. The $s_j$ sum to 1 and so some $s_j < t_j$ since otherwise, the sum of the $t_j$ equalling 1 would require the sum of the $s_j$ to be larger than 1. Hence $r_k$ was not really the smallest after all and so $r_k \le 1$. Hence $s_k \le t_k$.
Let $\mathcal{S} \equiv \{S_1,\cdots,S_m\}$ denote those simplices whose value is $P_n$. In other words, if $\{y_0,\cdots,y_n\}$ are the vertices of one of these simplices in $\mathcal{S}$, and
$$y_s = \sum_{i=0}^n t_i^s x_i,$$

$r_{k_s} \le r_j$ for all $j \ne k_s$ and $\{k_0,\cdots,k_n\} = \{0,\cdots,n\}$. Let $b$ denote the barycenter of $S_k = [y_0,\cdots,y_n]$,
$$b \equiv \frac{1}{n+1}\sum_{i=0}^n y_i$$
Do the same system of labeling for each $n$ simplex in a sequence of triangulations where the diameters of the simplices in the $k^{th}$ triangulation are no more than $2^{-k}$. Thus each of these triangulations has an $n$ simplex having diameter no more than $2^{-k}$ which has value $P_n$. Let $b_k$ be the barycenter of one of these $n$ simplices having value $P_n$. By compactness, there is a subsequence, still denoted with the index $k$, such that $b_k \to x$. This $x$ is a fixed point.
Consider this last claim. $x = \sum_{i=0}^n t_i x_i$ and after applying $f$, the result is $\sum_{i=0}^n s_i x_i$. Then $b_k$ is the barycenter of some simplex $\sigma_k$ having diameter no more than $2^{-k}$ which has value $P_n$. Say $\sigma_k$ has vertices $\{y_0^k,\cdots,y_n^k\}$ and the value of $[y_0^k,\cdots,y_n^k]$ is $P_n$. Thus also $\lim_{k\to\infty} y_i^k = x$.
Re ordering these if necessary, we can assume that the label for $y_i^k$ is $p_i$ which implies that, as noted above, for each $i = 0,\cdots,n$,
$$\frac{s_i}{t_i} \le 1,\qquad s_i \le t_i$$
the $i^{th}$ coordinate of $f(y_i^k)$ with respect to the original vertices of $S$ decreases and each $i$ is represented for $i \in \{0,1,\cdots,n\}$. As noted above, $y_i^k \to x$ and so the $i^{th}$ coordinate of $y_i^k$, $t_i^k$, must converge to $t_i$. Hence if the $i^{th}$ coordinate of $f(y_i^k)$ is denoted by $s_i^k$,
$$s_i^k \le t_i^k$$


By continuity of $f$, it follows that $s_i^k \to s_i$. Thus the above inequality is preserved on taking $k \to \infty$ and so $0 \le s_i \le t_i$, this for each $i$, and these $s_i, t_i$ pertain to the single point $x$. But these $s_i$ add to 1 as do the $t_i$ and so in fact $s_i = t_i$ for each $i$ and so $f(x) = x$. This proves the following theorem which is the Brouwer fixed point theorem.

Theorem 19.11.1 Let S be a simplex [x0 , · · · , xn ] such that {xi − x0 }i=1 are independent. Also let f : S → S be continuous. Then there exists x ∈ S such that f (x) = x. Corollary 19.11.2 Let K be a closed convex bounded subset of Rn . Let f : K → K be continuous. Then there exists x ∈ K such that f (x) = x. Proof: Let S be a large simplex containing K and let P be the projection map onto K. See Problem 10 on Page 281 for the necessary properties of this projection map. Consider g (x) ≡ f (P x) . Then g satisfies the necessary conditions for Theorem 19.11.1 and so there exists x ∈ S such that g (x) = x. But this says x ∈ K and so g (x) = f (x).  Definition 19.11.3 A set B has the fixed point property if whenever f : B → B for f continuous, it follows that f has a fixed point. The proof of this corollary is pretty significant. By a homework problem, a closed convex set is a retract of Rn . This is what it means when you say there is this continuous projection map which maps onto the closed convex set but does not change any point in the closed convex set. When you have a set A which is a subset of a set B which has the property that continuous functions f : B → B have fixed points, and there is a continuous map P from B to A which leaves points of A unchanged, then it follows that A will have the same “fixed point property”. You can probably imagine all sorts of sets which are retracts of closed convex bounded sets. Also, if you have a compact set B which has the fixed point property and h : B → h (B) with h one to one and continuous, it will follow that h−1 is continuous and that h (B) will also have the fixed point property. This is very easy to show. This will allow further extensions of this theorem. This says that the fixed point property is topological.
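The labeling rule used above (label a triangulation vertex y by the prime p_k for which the ratio s_k/t_k of the barycentric coordinates of f(y) and of y is smallest) is easy to compute. The following Python sketch only illustrates that rule; the helper names, the numerical tolerance, and any particular choice of simplex or map f are assumptions of the sketch, not part of the development above.

import numpy as np

def barycentric(x, verts):
    # Barycentric coordinates t of x with respect to the simplex [x0, ..., xn],
    # assuming {x_i - x_0} are linearly independent.
    v0 = np.asarray(verts[0], dtype=float)
    V = np.column_stack([np.asarray(v, dtype=float) - v0 for v in verts[1:]])
    t_rest = np.linalg.solve(V, np.asarray(x, dtype=float) - v0)
    return np.concatenate(([1.0 - t_rest.sum()], t_rest))

def sperner_label(y, f, verts, primes):
    # Label y by the prime p_k minimizing s_k / t_k; t_k = 0 gives ratio infinity,
    # so vertices lying on a face only receive labels belonging to that face.
    # Ties are broken by the smallest index, as in the scheme described above.
    t = barycentric(y, verts)
    s = barycentric(f(y), verts)
    ratios = np.where(t > 1e-12, s / np.maximum(t, 1e-12), np.inf)
    return primes[int(np.argmin(ratios))]

At a vertex labeled p_k this rule forces s_k <= t_k, which is exactly the inequality exploited in the proof of the fixed point theorem.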

19.12

Invariance Of Domain∗

An application of the inverse function theorem is a simple proof of the important invariance of domain theorem, which says that continuous one to one functions defined on an open set in $\mathbb{R}^n$ with values in $\mathbb{R}^n$ take open sets to open sets. You know that this is true for functions of one variable because a one to one continuous function must be either strictly increasing or strictly decreasing. This will be used when considering orientations of curves later. However, the $n$ dimensional version isn't at all obvious but is just as important if you want to consider manifolds with boundary for example. The need for this theorem occurs in many other places as well in addition to being extremely interesting for its own sake.

Lemma 19.12.1 Let $f$ be continuous and map $\overline{B(p,r)} \subseteq \mathbb{R}^n$ to $\mathbb{R}^n$. Suppose that for all $x \in \overline{B(p,r)}$, $|f(x) - x| < \varepsilon r$. Then it follows that
$$f\left(\overline{B(p,r)}\right) \supseteq B(p,(1-\varepsilon) r)$$
Proof: This is from the Brouwer fixed point theorem, Corollary 19.11.2. Consider for $y \in B(p,(1-\varepsilon) r)$,
$$h(x) \equiv x - f(x) + y$$


Then $h$ is continuous and for $x \in \overline{B(p,r)}$,
$$|h(x) - p| = |x - f(x) + y - p| < \varepsilon r + |y - p| < \varepsilon r + (1-\varepsilon) r = r$$
Hence $h : \overline{B(p,r)} \to \overline{B(p,r)}$ and so it has a fixed point $x$ by Corollary 19.11.2. Thus $x - f(x) + y = x$ so $f(x) = y$. $\blacksquare$
The notation $\|f\|_K$ means $\sup_{x\in K} |f(x)|$.
Lemma 19.12.2 Let $K$ be a compact set in $\mathbb{R}^n$ and let $g : K \to \mathbb{R}^n$ be continuous, $z \in K$ fixed. Let $\delta > 0$. Then there exists a polynomial $q$ (each component a polynomial) such that
$$\|q - g\|_K < \delta,\quad q(z) = g(z),\quad Dq(z)^{-1} \text{ exists}$$
Proof: By the Weierstrass approximation theorem, Theorem 17.3.7 (apply this theorem to the algebra of real polynomials), there exists a polynomial $\hat{q}$ such that
$$\|\hat{q} - g\|_K < \frac{\delta}{3}$$
Then define for $y \in K$
$$q(y) \equiv \hat{q}(y) + g(z) - \hat{q}(z)$$
Then $q(z) = \hat{q}(z) + g(z) - \hat{q}(z) = g(z)$. Also
$$|q(y) - g(y)| \le |(\hat{q}(y) + g(z) - \hat{q}(z)) - g(y)| \le |\hat{q}(y) - g(y)| + |g(z) - \hat{q}(z)|$$




$$\pi(u) + \varepsilon > \sum_i \|x_i\|\,\|y_i\|,\qquad \pi(v) + \varepsilon > \sum_j \|\hat{x}_j\|\,\|\hat{y}_j\|.$$
Then
$$\pi(u+v) \le \sum_i \|x_i\|\,\|y_i\| + \sum_j \|\hat{x}_j\|\,\|\hat{y}_j\| \le \pi(u) + \pi(v) + 2\varepsilon$$
Since $\varepsilon$ is arbitrary, $\pi(u+v) \le \pi(u) + \pi(v)$. Now suppose $\pi(u) = 0$. Does it follow that $u(A) = 0$ for all $A \in B(X\times Y, V)$? Let $A \in B(X\times Y; V)$ and let $u = \sum_i x_i \otimes y_i$ be such that
$$\varepsilon/\|A\| = \pi(u) + \varepsilon/\|A\| > \sum_i \|x_i\|\,\|y_i\|$$
Then
$$u(A) \equiv \sum_i A(x_i,y_i) \le \sum_i \|A\|\,\|x_i\|\,\|y_i\| < (\varepsilon/\|A\|)\,\|A\| = \varepsilon$$

and since ε is arbitrary, this requires u (A) = 0. Since A is arbitrary, this requires u = 0. Next is the very interesting equality that π (x ⊗ y) = ∥x∥ ∥y∥. It is obvious that π (x ⊗ y) ≤ ∥x∥ ∥y∥ because one way to write x ⊗ y is x ⊗ y. Let ϕ (x) = ∥x∥ , ψ (y) = ∥y∥ where ∥ϕ∥ , ∥ψ∥ = 1. Here ϕ, ψ ∈ X ′ , Y ′ respectively. You get them from the Hahn Banach theorem. Then consider


the continuous bilinear form $A(\hat{x},\hat{y}) \equiv \phi(\hat{x})\psi(\hat{y})$. Say $x \otimes y = \sum_i x_i \otimes y_i$. There is a linear map $\psi \in L(X\otimes Y, V)$ such that $\psi(\hat{x}\otimes\hat{y}) = A(\hat{x},\hat{y})$. You just specify this on all things of the form $e \otimes f$ where $e \in E$, a Hamel basis for $X$, and $f \in F$, a Hamel basis for $Y$. Then it must hold for the linear span of these things which would yield the desired result. Hence, in particular,
$$\|x\|\,\|y\| = \phi(x)\psi(y) = |A(x,y)| = |\psi(x\otimes y)| = \Big|\psi\Big(\sum_i x_i\otimes y_i\Big)\Big| = \Big|\sum_i A(x_i,y_i)\Big| = \Big|\sum_i \phi(x_i)\psi(y_i)\Big| \le \sum_i \|x_i\|\,\|y_i\|$$
It follows on taking inf of both sides over all such representations of $x\otimes y$ that $\|x\|\,\|y\| \le \pi(x\otimes y)$. $\blacksquare$
There is no difference if you replace $X\otimes Y$ with $X_1\otimes X_2\otimes\cdots\otimes X_p$. One modifies the definition as follows.
Definition 19.13.12 We define for $u \in X_1\otimes X_2\otimes\cdots\otimes X_p$
$$\pi(u) \equiv \inf\left\{ \sum_i \prod_{j=1}^p \big\|x_i^j\big\| : u = \sum_i x_i^1\otimes x_i^2\otimes\cdots\otimes x_i^p \right\}$$
In this context, it is assumed that the elements of $X_1\otimes X_2\otimes\cdots\otimes X_p$ act on continuous $p$ linear forms. That is, $A$ is $p$ linear and $|A(x_1,\cdots,x_p)| \le \|A\|\prod_i\|x_i\|$.
Corollary 19.13.13 $\pi(u)$ is well defined and is a norm. Also,
$$\pi\left(x^1\otimes x^2\otimes\cdots\otimes x^p\right) = \prod_k \big\|x^k\big\|.$$

Recall Corollary 19.13.9 and its commutative diagram: $\psi : X_1\times\cdots\times X_p \to V$ factors through the tensor product as $\psi = \hat{\psi}\circ\otimes_p$, where $\otimes_p : X_1\times\cdots\times X_p \to X_1\otimes X_2\otimes\cdots\otimes X_p$ and $\hat{\psi} : X_1\otimes X_2\otimes\cdots\otimes X_p \dashrightarrow V$.
Is it the case that $\hat{\psi}$ is continuous? Letting $a \in X_1\otimes X_2\otimes\cdots\otimes X_p$, does it follow that $\|\hat{\psi}(a)\|_V \le \|a\|_{X_1\otimes X_2\otimes\cdots\otimes X_p}$? Let $a = \sum_i x_i^1\otimes x_i^2\otimes\cdots\otimes x_i^p$. Then
$$\big\|\hat{\psi}(a)\big\|_V = \Big\|\hat{\psi}\Big(\sum_i x_i^1\otimes x_i^2\otimes\cdots\otimes x_i^p\Big)\Big\|_V = \Big\|\sum_i \hat{\psi}\big(x_i^1\otimes x_i^2\otimes\cdots\otimes x_i^p\big)\Big\|_V = \Big\|\sum_i A\big(x_i^1,\cdots,x_i^p\big)\Big\|_V \le \|A\|\sum_i \prod_{m=1}^p \|x_i^m\|$$
Then taking the inf over all such representations of $a$, it follows that $\|\hat{\psi}(a)\| \le \|A\|\,\|a\|_{X_1\otimes X_2\otimes\cdots\otimes X_p}$. This proves the following theorem.
Theorem 19.13.14 Suppose $A$ is a continuous $p$ linear map in $B(X_1\times\cdots\times X_p; V)$ where $V$ is a vector space. Let $\otimes_p : X_1\times\cdots\times X_p \to X_1\otimes X_2\otimes\cdots\otimes X_p$ be given by $\otimes_p(x_1,x_2,\cdots,x_p) \equiv x_1\otimes x_2\otimes\cdots\otimes x_p$. Then there exists a unique linear map $\hat{\psi} \in L(X_1\otimes X_2\otimes\cdots\otimes X_p; V)$ such that $\hat{\psi}\circ\otimes_p = \psi$. In other words, the diagram displayed above commutes. Also $\hat{\psi}$ is a continuous linear mapping.

19.13.2

The Taylor Formula And Tensors

Let $V$ be a Banach space and let $P$ be a function defined on an open subset of $V$ which has values in a Banach space $W$. Recall the nature of the derivatives. $DP(x) \in L(V,W)$. Thus $v \to DP(x)(v)$ is a differentiable map from $V$ to $W$. Then from the definition, $\hat{v} \to D^2P(x)(v)(\hat{v})$ is again a differentiable function with values in $W$. $D^2P(x) \in L(V, L(V,W))$. We can write it in the form $D^2P(x)(v,\hat{v})$. Of course it is linear in both variables from the definition. Similarly, we denote $P^j(x)$ as a $j$ linear form which is the $j^{th}$ derivative. In fact it is a symmetric $j$ linear form as long as it is continuous. To see this, consider the case where $j = 3$ and $P$ has values in $\mathbb{R}$. Then let
$$P(x + tu + sv + rw) = h(t,s,r)$$
By equality of mixed partial derivatives, $h_{tsr}(0,0,0) = h_{str}(0,0,0)$ etc. Thus $h_{tsr}(t,s,r) = P^3(x + tu + sv + rw)(u,v,w)$ and so $P^3(x)(u,v,w) = P^3(x)(v,u,w)$ etc. If $P$ has values in $W$, you just replace $P$ with $\phi(P)$ where $\phi \in W'$ and use this result to conclude that $\phi\big(P^3(u,v,w)\big) = \phi\big(P^3(v,u,w)\big)$ etc. Then since the dual space separates points, the desired result is obtained. Similarly, $D^3P(x)(v,u,w) = D^3P(x)(u,w,v)$ etc. Recall the following proof of Taylor's formula:
$$h(t) \equiv P(u + tv),\qquad h(t) = h(0) + h'(0)t + \cdots + \frac{1}{k!}h^{(k)}(0)t^k + \frac{1}{(k+1)!}h^{(k+1)}(s)t^{k+1},\quad |s| < |t|$$
Now
$$h'(t) = DP(u+tv)v,\qquad h''(t) = D^2P(u+tv)(v)(v),\qquad h'''(t) = D^3P(u+tv)(v)(v)(v)$$
etc. Thus letting $t = 1$,
$$P(u+v) = P(u) + DP(u)v + \frac{1}{2!}D^2P(u)(v,v) + \cdots + \frac{1}{k!}D^kP(u)(v,\cdots,v) + o\big(|v|^k\big)$$
Now you can use the representation theorem, Theorem 19.13.14, to write this in the following form.
$$P(u+v) = P(u) + P^1(u)v + \frac{1}{2!}P^2(u)v^{\otimes 2} + \cdots + \frac{1}{k!}P^k(u)v^{\otimes k} + o\big(|v|^k\big)$$
where $P^j(u) \in L\big(V^{\otimes j}, W\big)$. I think that the reason this is somewhat attractive is that the $P^k$ are linear operators. Recall $P^k(u)v^{\otimes k} = D^kP(u)(v,\cdots,v)$.
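The second order case of the expansion above is easy to check numerically. The following Python sketch compares P(u+v) with P(u) + DP(u)v + (1/2!)D^2 P(u)(v,v) for one concrete smooth P from R^2 to R; the particular P, the base point, and the step sizes are illustrative assumptions of the sketch.

import numpy as np

# A smooth map P : R^2 -> R, chosen only to illustrate the expansion.
P = lambda u: np.exp(u[0]) * np.sin(u[1])

def DP(u):   # first derivative as a linear form (the gradient)
    return np.array([np.exp(u[0]) * np.sin(u[1]), np.exp(u[0]) * np.cos(u[1])])

def D2P(u):  # second derivative as a symmetric bilinear form (the Hessian)
    return np.array([[np.exp(u[0]) * np.sin(u[1]),  np.exp(u[0]) * np.cos(u[1])],
                     [np.exp(u[0]) * np.cos(u[1]), -np.exp(u[0]) * np.sin(u[1])]])

u = np.array([0.3, 0.7])
for h in [1e-1, 1e-2, 1e-3]:
    v = h * np.array([1.0, -2.0])
    taylor = P(u) + DP(u) @ v + 0.5 * v @ D2P(u) @ v
    # The remainder is o(|v|^2); here it in fact shrinks roughly like |v|^3.
    print(h, abs(P(u + v) - taylor))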

19.14

Exercises

1. This problem was suggested to me by Matt Heiner. Earlier there was a problem in which two surfaces intersected at a point and this implied that in fact, they intersected in a smooth curve. Now suppose you have two spheres $x^2 + y^2 + z^2 = 1$ and $(x-2)^2 + y^2 + z^2 = 1$. These intersect at the single point $(1,0,0)$. Why does the implicit function theorem not imply that these surfaces intersect in a curve?

2. Maximize $2x + y$ subject to the condition that $\frac{x^2}{4} + \frac{y^2}{9} \le 1$. Hint: You need to consider interior points and also the method of Lagrange multipliers for the points on the boundary of this ellipse.

3. Maximize $x + y$ subject to the condition that $x^2 + \frac{y^2}{9} + z^2 \le 1$.

4. Find the points on y 2 x = 16 which are closest to (0, 0). 5. Let f (x, y, z) = x2 − 2yx + 2z 2 − 4z + 2. Identify all the points where Df = 0. Then determine whether they are local minima local maxima or saddle points. 6. Let f (x, y) = x4 − 2x2 + 2y 2 + 1. Identify all the points where Df = 0. Then determine whether they are local minima local maxima or saddle points. 7. Let f (x, y, z) = −x4 +2x2 −y 2 −2z 2 −1. Identify all the points where Df = 0. Then determine whether they are local minima local maxima or saddle points. 8. Let f : V → R where V is a finite dimensional normed vector space. Suppose f is convex which means f (tx + (1 − t) y) ≤ tf (x) + (1 − t) f (y) whenever t ∈ [0, 1]. Suppose also that f is differentiable. Show then that for every x, y ∈ V, (Df (x) − Df (y)) (x − y) ≥ 0. Thus convex functions have monotone derivatives. 9. Suppose B is an open ball in X and f : B → Y is differentiable. Suppose also there exists L ∈ L (X, Y ) such that ||Df (x) − L|| < k for all x ∈ B. Show that if x1 , x2 ∈ B, |f (x1 ) − f (x2 ) − L (x1 − x2 )| ≤ k |x1 − x2 | . Hint: Consider T x = f (x) − Lx and argue ||DT (x)|| < k. 10. Let f : U ⊆ X → Y , Df (x) exists for all x ∈ U , B (x0 , δ) ⊆ U , and there exists L ∈ L (X, Y ), such that L−1 ∈ L (Y, X), and for all x ∈ B (x0 , δ) ||Df (x) − L||
< r/||L^{-1}|| for some r < 1. Show that there exists ε > 0 and an open subset V of B(x0, δ) such that f : V → B(f(x0), ε) is one to one and onto. Also Df^{-1}(y) exists for each y ∈ B(f(x0), ε) and is given by the formula
$$Df^{-1}(y) = \left[Df\left(f^{-1}(y)\right)\right]^{-1}.$$
Hint: Let
$$T_y(x) \equiv T(x,y) \equiv x - L^{-1}(f(x) - y)$$
and for $|y - f(x_0)| < \frac{(1-r)\delta}{2\|L^{-1}\|}$, consider $\{T_y^n(x_0)\}$. This is a version of the inverse function theorem for $f$ only differentiable, not $C^1$.


11. In the last assignment, you showed that if Df (x0 ) is invertible, then locally the function f was one to one. However, this is a strictly local result! Let f :R2 → R2 be given by ( ) ex cos y f (x, y) = ex sin y This clearly is not one to one because if you replace y with y + 2π, you get the same value. −1 Now verify that Df (x, y) exists for all (x, y). ∑ 12. Show every polynomial, |α|≤k dα xα is C k for every k. 13. Suppose U ⊆ R2 is an open set and f : U → R3 is C 1 . Suppose Df (s0 , t0 ) has rank two and   x0   f (s0 , t0 ) =  y0  . z0 Show that for (s, t) near (s0 , t0 ), the points f (s, t) may be realized in one of the following forms. {(x, y, ϕ (x, y)) : (x, y) near (x0 , y0 )}, {(ϕ (y, z) , y, z) : (y, z) near (y0 , z0 )}, or {(x, ϕ (x, z) , z, ) : (x, z) near (x0 , z0 )}. This shows that parametrically defined surfaces can be obtained locally in a particularly simple form. ∑n ∑n 2 2 14. Minimize j=1 xj subject to the constraint j=1 xj = a . Your answer should be some function of a which you may assume is a positive number. 15. A curve is formed from the intersection of the plane, 2x+3y+z = 3 and the cylinder x2 +y 2 = 4. Find the point on this curve which is closest to (0, 0, 0) . 16. A curve is formed from the intersection of the plane, 2x+3y+z = 3 and the sphere x2 +y 2 +z 2 = 16. Find the point on this curve which is closest to (0, 0, 0) . 17. Let A = (Aij ) be an n×n matrix which is symmetric. Thus Aij = Aji and recall (Ax)i = Aij xj ∂ (Aij xj xi ) = 2Aij xj . Show that when where as usual sum over the repeated index. Show ∂x i you use the ∑ method of Lagrange multipliers to maximize the function, Aij xj xi subject to the n 2 constraint, j=1 xj = 1, the value of λ which corresponds to the maximum value of this functions is such that Aij xj = λxi . Thus Ax = λx. Thus λ is an eigenvalue of the matrix, A. 18. Let x1 , · · · , x5 be 5 positive numbers. Maximize their product subject to the constraint that x1 + 2x2 + 3x3 + 4x4 + 5x5 = 300. 19. Let f (x1 , · · · , xn ) = xn1 xn−1 · · · x1n . Then f achieves a maximum on the set, 2 { } n ∑ S ≡ x ∈ Rn : ixi = 1 and each xi ≥ 0 . i=1

If x ∈ S is the point where this maximum is achieved, find x1 /xn .


20. Maximize $\prod_{i=1}^n x_i^2$ $(\equiv x_1^2\times x_2^2\times x_3^2\times\cdots\times x_n^2)$ subject to the constraint $\sum_{i=1}^n x_i^2 = r^2$. Show the maximum is $\left(r^2/n\right)^n$. Now show from this that
$$\left(\prod_{i=1}^n x_i^2\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^n x_i^2$$
and finally, conclude that if each number $x_i \ge 0$, then
$$\left(\prod_{i=1}^n x_i\right)^{1/n} \le \frac{1}{n}\sum_{i=1}^n x_i$$

and there exist values of the xi for which equality holds. This says the “geometric mean” is always smaller than the arithmetic mean. 21. Show that there exists a smooth solution y = y (x) to the equation xey + yex = 0 which is valid for x, y both near 0. Find y ′ (x) at a point (x, y) near (0, 0) . Then find y ′′ (x) for such (x, y). Can you find an explicit formula for y (x)? 22. The next few problems involve invariance of domain. Suppose U is a nonempty open set in Rn , f : U → Rn is continuous, and suppose that for each x ∈ U, there is a ball Bx containing x such that f is one to one on Bx . That is, f is locally one to one. Show that f (U ) is open. 23. ↑ In the situation of the above problem, suppose f : Rn → Rn is locally one to one. Also suppose that lim|x|→∞ |f (x)| = ∞. Show that it follows that f (Rn ) = Rn . That is, f is onto. Show that this would not be true if f is only defined on a proper open set. Also show that this would not be true if the condition lim|x|→∞ |f (x)| = ∞ does not hold. Hint: You might show that f (Rn ) is both open and closed and then use connectedness. To get an example in the second case, you might think of ex+iy . It does not include 0 + i0. Why not? 24. ↑ Show that if f : Rn → Rn is C 1 and if Df (x) exists and is invertible for all x ∈ Rn , then f is locally one to one. Thus, from the above problem, if lim|x|→∞ |f (x)| = ∞, then f is also onto. Now consider f : R2 → R2 given by ( ) ex cos y f (x, y) = ex sin y Show that this does not map onto R2 . In fact, it fails to hit (0, 0), but Df (x, y) is invertible for all (x, y). Show why it fails to satisfy the limit condition.


Chapter 20

Abstract Measures And Measurable Functions

The Lebesgue integral is much better than the Riemann integral. This has been known for over 100 years. It is much easier to generalize to many dimensions and it is much easier to use in applications. That is why I am going to use it rather than struggle with an inferior integral. It is also this integral which is most important in probability. However, this integral is more abstract. This chapter will develop the abstract machinery necessary for this integral.
The next definition describes what is meant by a σ algebra. This is the fundamental object which is studied in probability theory. The events come from a σ algebra of sets. Recall that P(Ω) is the set of all subsets of the given set Ω. It may also be denoted by 2^Ω but I won't refer to it this way.
Definition 20.0.1 F ⊆ P(Ω), the set of all subsets of Ω, is called a σ algebra if it contains ∅, Ω, and is closed with respect to countable unions and complements. That is, if $\{A_n\}_{n=1}^\infty$ is countable and each $A_n \in \mathcal{F}$, then $\cup_{n=1}^\infty A_n \in \mathcal{F}$ also, and if $A \in \mathcal{F}$, then $\Omega\setminus A \in \mathcal{F}$.
It is clear that any intersection of σ algebras is a σ algebra. If K ⊆ P(Ω), σ(K) is the smallest σ algebra which contains K. If F is a σ algebra, then it is also closed with respect to countable intersections. Here is why. Let $\{F_k\}_{k=1}^\infty \subseteq \mathcal{F}$. Then $(\cap_k F_k)^C = \cup_k F_k^C \in \mathcal{F}$ and so
$$\cap_k F_k = \left(\left(\cap_k F_k\right)^C\right)^C = \left(\cup_k F_k^C\right)^C \in \mathcal{F}.$$
Example 20.0.2 You could consider N and for your σ algebra, you could have P(N). This satisfies all the necessary requirements. Note that in fact, P(S) works for any S. However, useful examples are not typically the set of all subsets.
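For a finite Ω one can compute σ(K) directly, since closing under complements and pairwise unions already gives a σ algebra (there are only finitely many subsets, so countable unions reduce to finite ones). The following Python sketch does this by brute force; the example Ω and K are illustrative choices, not anything prescribed above.

from itertools import combinations

def generate_sigma_algebra(omega, K):
    # Smallest sigma algebra on the finite set omega containing the sets in K.
    omega = frozenset(omega)
    G = {frozenset(), omega} | {frozenset(A) for A in K}
    changed = True
    while changed:
        changed = False
        for A in list(G):
            if omega - A not in G:
                G.add(omega - A); changed = True
        for A, B in combinations(list(G), 2):
            if A | B not in G:
                G.add(A | B); changed = True
    return G

# Example: K = {{1}, {2, 3}} inside Omega = {1, 2, 3, 4} gives the 8-element
# sigma algebra whose atoms are {1}, {2, 3}, {4}.
sigma = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {2, 3}])
print(sorted(sorted(s) for s in sigma))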

20.1

Simple Functions And Measurable Functions

Recall that a σ algebra is a collection of subsets of a set Ω which includes ∅, Ω, and is closed with respect to countable unions and complements. Definition 20.1.1 Let (Ω, F) be a measurable space, one for which F is a σ algebra contained in P (Ω). Let f : Ω → X where X is a metric space. Then f is measurable means that f −1 (U ) ∈ F whenever U is open. It is important to have a theorem about pointwise limits of measurable functions, those with the property that inverse images of open sets are measurable. The following is a fairly general such theorem which holds in the situations to be considered in these notes.


Theorem 20.1.2 Let $\{f_n\}$ be a sequence of measurable functions mapping $\Omega$ to $(X,d)$ where $(X,d)$ is a metric space and $(\Omega,\mathcal{F})$ is a measurable space. Suppose also that $f(\omega) = \lim_{n\to\infty} f_n(\omega)$ for all $\omega$. Then $f$ is also a measurable function.
Proof: It is required to show $f^{-1}(U)$ is measurable for all $U$ open. Let
$$V_m \equiv \left\{ x \in U : \operatorname{dist}\left(x, U^C\right) > \frac{1}{m} \right\}.$$
Thus, since dist is continuous,
$$\overline{V_m} \subseteq \left\{ x \in U : \operatorname{dist}\left(x, U^C\right) \ge \frac{1}{m} \right\}$$
and $V_m \subseteq \overline{V_m} \subseteq V_{m+1}$ and $\cup_m V_m = U$. Then since $V_m$ is open,
$$f^{-1}(V_m) \subseteq \cup_{n=1}^\infty \cap_{k=n}^\infty f_k^{-1}(V_m)$$
and so
$$f^{-1}(U) = \cup_{m=1}^\infty f^{-1}(V_m) \subseteq \cup_{m=1}^\infty \cup_{n=1}^\infty \cap_{k=n}^\infty f_k^{-1}(V_m) \subseteq \cup_{m=1}^\infty f^{-1}\left(\overline{V_m}\right) = f^{-1}(U)$$

which shows f −1 (U ) is measurable.  An important example of a metric space is of course R, C, Fn , where F is either R or C and so forth. However, it is also very convenient to consider the metric space (−∞, ∞], the real line with ∞ tacked on at the end. This can be considered as a metric space in a very simple way. ρ (x, y) = |arctan (x) − arctan (y)| with the understanding that arctan (∞) ≡ π/2. It is easy to show that this metric restricted to R gives the same open sets on R as the usual metric given by d (x, y) = |x − y| but in addition, allows the inclusion of that ideal point out at the end of the real line denoted as ∞. This is considered mainly because it makes the development of the theory easier. The open sets in (−∞, ∞] are described in the following lemma. Lemma 20.1.3 The open balls in (−∞, ∞] consist of sets of the form (a, b) for a, b real numbers and (a, ∞]. This is a separable metric space. Proof: If the center of the ball is a real number, then the ball will result in an interval (a, b) where a, b are real numbers. If the center of the ball is ∞, then the ball results in something of the form (a, ∞]. It is obvious that this is a separable metric space with the countable dense set being Q since every ball contains a rational number.  If you kept both −∞ and ∞ with the obvious generalization that arctan (−∞) ≡ − π2 , then the resulting metric space would be a complete separable metric space. However, it is not convenient to include −∞, so this won’t be done. The reason is that it will be desired to make sense of things like f + g. Then for functions which have values in (−∞, ∞] we have the following extremely useful description of what it means for a function to be measurable. Lemma 20.1.4 Let f : Ω → (−∞, ∞] where F is a σ algebra of subsets of Ω. Here (−∞, ∞] is the metric space just described with the metric given by ρ (x, y) = |arctan (x) − arctan (y)| . Then the following are equivalent. f −1 ((d, ∞]) ∈ F for all finite d,


f −1 ((−∞, d)) ∈ F for all finite d, f −1 ([d, ∞]) ∈ F for all finite d, f −1 ((−∞, d]) ∈ F for all finite d, f −1 ((a, b)) ∈ F for all a < b, −∞ < a < b < ∞. Any of these equivalent conditions is equivalent to the function being measurable. Proof: First note that the first and the third are equivalent. To see this, observe −1 f −1 ([d, ∞]) = ∩∞ ((d − 1/n, ∞]), n=1 f

and so if the first condition holds, then so does the third. −1 f −1 ((d, ∞]) = ∪∞ ([d + 1/n, ∞]), n=1 f

and so if the third condition holds, so does the first. Similarly, the second and fourth conditions are equivalent. Now $f^{-1}((-\infty,d]) = \left(f^{-1}((d,\infty])\right)^C$ so the first and fourth conditions are equivalent. Thus the first four conditions are equivalent and if any of them hold, then for $-\infty < a < b < \infty$, $f^{-1}((a,b)) = f^{-1}((-\infty,b)) \cap f^{-1}((a,\infty]) \in \mathcal{F}$. Finally, if the last condition holds,
$$f^{-1}([d,\infty]) = \left(\cup_{k=1}^\infty f^{-1}((-k+d,d))\right)^C \in \mathcal{F}$$
and so the third condition holds. Therefore, all five conditions are equivalent. Since $(-\infty,\infty]$ is a separable metric space, it follows from Theorem 11.1.29 that every open set $U$ is a countable union of open intervals $U = \cup_k I_k$ where $I_k$ is of the form $(a,b)$ or $(a,\infty]$ and, as just shown, if any of the equivalent conditions holds, then $f^{-1}(U) = \cup_k f^{-1}(I_k) \in \mathcal{F}$. Conversely, if $f^{-1}(U) \in \mathcal{F}$ for every open set $U \subseteq (-\infty,\infty]$, then $f^{-1}((a,b)) \in \mathcal{F}$ which is one of the equivalent conditions and so all the equivalent conditions hold. $\blacksquare$
There is a fundamental theorem about the relation of simple functions to measurable functions given in the next theorem.
Definition 20.1.5 Let $E \in \mathcal{F}$ for $\mathcal{F}$ a σ algebra. Then
$$\mathcal{X}_E(\omega) \equiv \begin{cases} 1 & \text{if } \omega \in E \\ 0 & \text{if } \omega \notin E \end{cases}$$
This is called the indicator function of the set $E$. Let $s : (\Omega,\mathcal{F}) \to \mathbb{R}$. Then $s$ is a simple function if it is of the form
$$s(\omega) = \sum_{i=1}^n c_i \mathcal{X}_{E_i}(\omega)$$

where Ei ∈ F and ci ∈ R, the Ei being disjoint. Thus simple functions have finitely many values and are measurable. In the next theorem, it will also be assumed that each ci ≥ 0. Each simple function is measurable. This is easily seen as follows. First of all, you can assume the ci are distinct because if not, you could just replace those Ei which correspond to a single value with their union. Then if you have any open interval (a, b) , s−1 ((a, b)) = ∪ {Ei : ci ∈ (a, b)} and this is measurable because it is the finite union of measurable sets.


Theorem 20.1.6 Let $f \ge 0$ be measurable. Then there exists a sequence of nonnegative simple functions $\{s_n\}$ satisfying
$$0 \le s_n(\omega) \qquad (20.1)$$
$$\cdots\ s_n(\omega) \le s_{n+1}(\omega)\ \cdots$$
$$f(\omega) = \lim_{n\to\infty} s_n(\omega)\ \text{ for all } \omega \in \Omega. \qquad (20.2)$$
If $f$ is bounded, the convergence is actually uniform. Conversely, if $f$ is nonnegative and is the pointwise limit of such simple functions, then $f$ is measurable.
Proof: Letting $I \equiv \{\omega : f(\omega) = \infty\}$, define
$$t_n(\omega) = \sum_{k=0}^{2^n} \frac{k}{n}\,\mathcal{X}_{f^{-1}([\frac{k}{n},\frac{k+1}{n}))}(\omega) + 2^n \mathcal{X}_I(\omega).$$
Then $t_n(\omega) \le f(\omega)$ for all $\omega$ and $\lim_{n\to\infty} t_n(\omega) = f(\omega)$ for all $\omega$. This is because $t_n(\omega) = 2^n$ for $\omega \in I$ and if $f(\omega) \in [0, \frac{2^n+1}{n})$, then
$$0 \le f(\omega) - t_n(\omega) \le \frac{1}{n}. \qquad (20.3)$$
Thus whenever $\omega \notin I$, the above inequality will hold for all $n$ large enough. Let $s_1 = t_1$, $s_2 = \max(t_1,t_2)$, $s_3 = \max(t_1,t_2,t_3)$, $\cdots$. Then the sequence $\{s_n\}$ satisfies 20.1-20.2. Also each $s_n$ has finitely many values and is measurable. To see this, note that
$$s_n^{-1}((a,\infty]) = \cup_{k=1}^n t_k^{-1}((a,\infty]) \in \mathcal{F}$$
To verify the last claim, note that in this case the term $2^n\mathcal{X}_I(\omega)$ is not present and for $n$ large enough, $2^n/n$ is larger than all values of $f$. Therefore, for all $n$ large enough, 20.3 holds for all $\omega$. Thus the convergence is uniform. The last claim follows right away from Theorem 20.1.2. $\blacksquare$
Although it is not needed here, there is a more general theorem which applies to measurable functions which have values in a separable metric space. In this context, a simple function is one which is of the form
$$\sum_{k=1}^m x_k \mathcal{X}_{E_k}(\omega)$$

where the $E_k$ are disjoint measurable sets and the $x_k$ are in $X$. I am abusing notation somewhat by using a sum. You can't add in a general metric space. The symbol means the function has value $x_k$ on the set $E_k$.
Theorem 20.1.7 Let $(\Omega,\mathcal{F})$ be a measurable space and let $f : \Omega \to X$ where $(X,d)$ is a separable metric space. Then $f$ is a measurable function if and only if there exists a sequence of simple functions $\{f_n\}$ such that for each $\omega \in \Omega$ and $n \in \mathbb{N}$,
$$d(f_n(\omega), f(\omega)) \ge d(f_{n+1}(\omega), f(\omega)) \qquad (20.4)$$
and
$$\lim_{n\to\infty} d(f_n(\omega), f(\omega)) = 0. \qquad (20.5)$$
Proof: Let $D = \{x_k\}_{k=1}^\infty$ be a countable dense subset of $X$. First suppose $f$ is measurable. Then since in a metric space every closed set is the countable intersection of open sets, it follows $f^{-1}(\text{closed set}) \in \mathcal{F}$. Now let $D_n = \{x_k\}_{k=1}^n$. Let
$$A_1 \equiv \left\{\omega : d(x_1, f(\omega)) = \min_{k\le n} d(x_k, f(\omega))\right\}$$


That is, $A_1$ are those $\omega$ such that $f(\omega)$ is approximated best out of $D_n$ by $x_1$. Why is this a measurable set? It is because $\omega \to d(x, f(\omega))$ is a real valued measurable function, being the composition of a continuous function $y \to d(x,y)$ and a measurable function $\omega \to f(\omega)$. Next let
$$A_2 \equiv \left\{\omega \notin A_1 : d(x_2, f(\omega)) = \min_{k\le n} d(x_k, f(\omega))\right\}$$
and continue in this manner obtaining disjoint measurable sets $\{A_k\}_{k=1}^n$ such that for $\omega \in A_k$ the best approximation to $f(\omega)$ from $D_n$ is $x_k$. Then
$$f_n(\omega) \equiv \sum_{k=1}^n x_k \mathcal{X}_{A_k}(\omega).$$
Note
$$\min_{k\le n+1} d(x_k, f(\omega)) \le \min_{k\le n} d(x_k, f(\omega))$$

and so this verifies 20.4. It remains to verify 20.5. Let ε > 0 be given and pick ω ∈ Ω. Then there exists xn ∈ D such that d (xn , f (ω)) < ε. It follows from the construction that d (fn (ω) , f (ω)) ≤ d (xn , f (ω)) < ε. This proves the first half. Now suppose the existence of the sequence of simple functions as described above. Each fn is a measurable function because fn−1 (U ) = ∪ {Ak : xk ∈ U }. Therefore, the conclusion that f is measurable follows from Theorem 20.1.2 on Page 498. 
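The construction of t_n and s_n in the proof of Theorem 20.1.6 can be tried numerically. The following Python sketch treats a single finite value f(omega) >= 0; the cap 2^n on the index k follows the reading of the displayed formula above, and the function names and the test value are illustrative assumptions.

import numpy as np

def t_n(f_val, n):
    # The n-th stage approximation for a finite value f_val >= 0: the largest k/n
    # with k <= 2**n and k/n <= f_val (the term 2**n * X_I is omitted since f_val
    # is assumed finite here).
    k = min(int(np.floor(f_val * n)), 2 ** n)
    return k / n

def s_n(f_val, n):
    # s_n = max(t_1, ..., t_n): nonnegative, nondecreasing in n, converging up to f_val.
    return max(t_n(f_val, m) for m in range(1, n + 1))

f_val = np.pi   # any fixed nonnegative value f(omega)
for n in [1, 2, 5, 10, 50]:
    print(n, s_n(f_val, n), f_val - s_n(f_val, n))

The printed errors are eventually below 1/n and the values s_n never decrease, as 20.1-20.3 require.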

20.2

Measures And Their Properties

First we define what is meant by a measure.
Definition 20.2.1 Let $(\Omega,\mathcal{F})$ be a measurable space. Here $\mathcal{F}$ is a σ algebra of sets of $\Omega$. Then $\mu : \mathcal{F} \to [0,\infty]$ is called a measure if whenever $\{F_i\}_{i=1}^\infty$ is a sequence of disjoint sets of $\mathcal{F}$, it follows that
$$\mu\left(\cup_{i=1}^\infty F_i\right) = \sum_{i=1}^\infty \mu(F_i)$$

Note that the series could equal ∞. If µ (Ω) < ∞, then µ is called a finite measure. An important case is when µ (Ω) = 1 when it is called a probability measure. Note that µ (∅) = µ (∅ ∪ ∅) = µ (∅) + µ (∅) and so µ (∅) = 0. Example 20.2.2 You could have P (N) = F and you could define µ (S) to be the number of elements of S. This is called counting measure. It is left as an exercise to show that this is a measure. Example 20.2.3 Here is a pathological example. Let Ω be uncountable and F will be those sets which have the property that either the set is countable or its complement is countable. Let µ (E) = 0 if E is countable and µ (E) = 1 if E is uncountable. It is left as an exercise to show that this is a measure. Of course the most important measure is Lebesgue measure which gives the “volume” of a subset of Rn . However, this requires a lot more work. ∑∞ Lemma 20.2.4 If µ is a measure and Fi ∈ F , then µ (∪∞ i=1 Fi ) ≤ i=1 µ (Fi ). Also if Fn ∈ F and Fn ⊆ Fn+1 for all n, then if F = ∪n Fn , µ (F ) = lim µ (Fn ) n→∞

If Fn ⊇ Fn+1 for all n, then if µ (F1 ) < ∞ and F = ∩n Fn , then µ (F ) = lim µ (Fn ) n→∞

Proof: Let $G_1 = F_1$ and if $G_1,\cdots,G_n$ have been chosen disjoint, let
$$G_{n+1} \equiv F_{n+1} \setminus \cup_{i=1}^n G_i$$
Thus the $G_i$ are disjoint. In addition, these are all measurable sets. Now $\mu(G_{n+1}) + \mu(F_{n+1} \cap (\cup_{i=1}^n G_i)) = \mu(F_{n+1})$ and so $\mu(G_n) \le \mu(F_n)$. Therefore,
$$\mu\left(\cup_{i=1}^\infty G_i\right) = \sum_i \mu(G_i) \le \sum_i \mu(F_i).$$

Now consider the increasing sequence of $F_n \in \mathcal{F}$. If $F \subseteq G$ and these are sets of $\mathcal{F}$, $\mu(G) = \mu(F) + \mu(G\setminus F)$ so $\mu(G) \ge \mu(F)$. Also
$$F = \cup_{i=1}^\infty (F_{i+1}\setminus F_i) \cup F_1 \quad\text{(a disjoint union)}$$
Then
$$\mu(F) = \sum_{i=1}^\infty \mu(F_{i+1}\setminus F_i) + \mu(F_1)$$
Now $\mu(F_{i+1}\setminus F_i) + \mu(F_i) = \mu(F_{i+1})$. If any $\mu(F_i) = \infty$, there is nothing to prove. Assume then that these are all finite. Then $\mu(F_{i+1}\setminus F_i) = \mu(F_{i+1}) - \mu(F_i)$ and so
$$\mu(F) = \sum_{i=1}^\infty \left(\mu(F_{i+1}) - \mu(F_i)\right) + \mu(F_1) = \lim_{n\to\infty} \sum_{i=1}^n \left(\mu(F_{i+1}) - \mu(F_i)\right) + \mu(F_1) = \lim_{n\to\infty} \mu(F_{n+1})$$

Next suppose $\mu(F_1) < \infty$ and $\{F_n\}$ is a decreasing sequence. Then $F_1\setminus F_n$ is increasing to $F_1\setminus F$ and so by the first part,
$$\mu(F_1) - \mu(F) = \mu(F_1\setminus F) = \lim_{n\to\infty} \mu(F_1\setminus F_n) = \lim_{n\to\infty} \left(\mu(F_1) - \mu(F_n)\right)$$
This is justified because $\mu(F_1\setminus F_n) + \mu(F_n) = \mu(F_1)$ and all numbers are finite by assumption. Hence $\mu(F) = \lim_{n\to\infty} \mu(F_n)$. $\blacksquare$
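The two continuity statements in Lemma 20.2.4 can be illustrated with counting measure (Example 20.2.2) on finite sets; the specific sets below are illustrative choices. The comment also records why the hypothesis mu(F_1) < infinity cannot be dropped for decreasing sequences.

def counting_measure(A):
    # Counting measure of a finite set A.
    return len(A)

# Continuity from below: F_n increasing, mu(F) = lim mu(F_n).
F = [set(range(n)) for n in range(1, 8)]            # F_1 subset F_2 subset ...
union = set().union(*F)
print([counting_measure(Fn) for Fn in F], counting_measure(union))

# Continuity from above needs mu(F_1) < infinity: with counting measure on N and
# F_n = {n, n+1, ...}, every mu(F_n) is infinite while the intersection is empty,
# so mu(F) = 0 is not lim mu(F_n).  The finite analogue below does obey the lemma.
G = [set(range(n, 8)) for n in range(1, 8)]          # G_1 superset G_2 superset ...
inter = set.intersection(*G)
print([counting_measure(Gn) for Gn in G], counting_measure(inter))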

20.3

Dynkin’s Lemma

Dynkin’s lemma is a very useful result. It is used quite a bit in books on probability. It will be used here to obtain n dimensional Lebesgue measure, also to establish an important result on regularity in the next section. Definition 20.3.1 Let Ω be a set and let K be a collection of subsets of Ω. Then K is called a π system if ∅, Ω ∈ K and whenever A, B ∈ K, it follows A ∩ B ∈ K.


The following is the fundamental lemma which shows these π systems are useful. This is due to Dynkin. Lemma 20.3.2 Let K be a π system of subsets of Ω, a set. Also let G be a collection of subsets of Ω which satisfies the following three properties. 1. K ⊆ G 2. If A ∈ G, then AC ∈ G ∞

3. If {Ai }i=1 is a sequence of disjoint sets from G then ∪∞ i=1 Ai ∈ G. Then G ⊇ σ (K) , where σ (K) is the smallest σ algebra which contains K. Proof: First note that if H ≡ {G : 1 - 3 all hold} then ∩H yields a collection of sets which also satisfies 1 - 3. Therefore, I will assume in the argument that G is the smallest collection satisfying 1 - 3. Let A ∈ K and define GA ≡ {B ∈ G : A ∩ B ∈ G} . I want to show GA satisfies 1 - 3 because then it must equal G since G is the smallest collection of subsets of Ω which satisfies 1 - 3. This will give the conclusion that for A ∈ K and B ∈ G, A ∩ B ∈ G. This information will then be used to show that if A, B ∈ G then A ∩ B ∈ G. From this it will follow very easily that G is a σ algebra which will imply it contains σ (K). Now here are the details of the argument. Since K is given to be a π system, K ⊆ G A . Property 3 is obvious because if {Bi } is a sequence of disjoint sets in GA , then ∞ A ∩ ∪∞ i=1 Bi = ∪i=1 A ∩ Bi ∈ G because A ∩ Bi ∈ G and the property 3 of G. It remains to verify Property 2 so let B ∈ GA . I need to verify that B C ∈ GA . In other words, I need to show that A ∩ B C ∈ G. However, ( )C A ∩ B C = AC ∪ (A ∩ B) ∈ G Here is why. Since B ∈ GA , A ∩ B ∈ G and since A ∈ K ⊆ G it follows AC ∈ G by assumption 2. It follows from assumption 3 the union of the disjoint sets, AC and (A ∩ B) is in G and then from 2 the complement of their union is in G. Thus GA satisfies 1 - 3 and this implies since G is the smallest such, that GA ⊇ G. However, GA is constructed as a subset of G. This proves that for every B ∈ G and A ∈ K, A ∩ B ∈ G. Now pick B ∈ G and consider GB ≡ {A ∈ G : A ∩ B ∈ G} . I just proved K ⊆ GB . The other arguments are identical to show GB satisfies 1 - 3 and is therefore equal to G. This shows that whenever A, B ∈ G it follows A ∩ B ∈ G. This implies G is a σ algebra. To show this, all that is left is to verify G is closed under countable unions because then it follows G is a σ algebra. Let {Ai } ⊆ G. Then let A′1 = A1 and A′n+1

$$\equiv A_{n+1} \setminus \left(\cup_{i=1}^n A_i\right) = A_{n+1} \cap \left(\cap_{i=1}^n A_i^C\right) = \cap_{i=1}^n \left(A_{n+1} \cap A_i^C\right) \in \mathcal{G}$$

because finite intersections of sets of G are in G. Since the A′i are disjoint, it follows ∞ ′ ∪∞ i=1 Ai = ∪i=1 Ai ∈ G

Therefore, G ⊇ σ (K). 

20.4

Measures And Regularity

It is often the case that Ω has more going on than to simply be a set. In particular, it is often the case that Ω is some sort of topological space, often a metric space. In this case, it is usually if not always the case that the open sets will be in the σ algebra of measurable sets. This leads to the following definition. Definition 20.4.1 A Polish space is a complete separable metric space. For a Polish space E or more generally a metric space or even a general topological space, B (E) denotes the Borel sets of E. This is defined to be the smallest σ algebra which contains the open sets. Thus it contains all open sets and closed sets and compact sets and many others. Don’t ever try to describe a generic Borel set. Always work with the definition that it is the smallest σ algebra containing the open sets. Attempts to give an explicit description of a “typical” Borel set tend to lead nowhere because there are so many things which can be done.You can take countable unions and complements and then countable intersections of what you get and then another countable union followed by complements and on and on. You just can’t get a good useable description in this way. However, it is easy to see that something like ( ∞ ∞ )C ∩i=1 ∪j=i Ej is a Borel set if the Ej are. This is useful. For example, R is a Polish space as is any separable Banach space. Amazing things can be said about finite measures on the Borel sets of a Polish space. First the case of a finite measure on a metric space will be considered. Definition 20.4.2 A measure, µ defined on B (E) will be called inner regular if for all F ∈ B (E) , µ (F ) = sup {µ (K) : K ⊆ F and K is closed} A measure, µ defined on B (E) will be called outer regular if for all F ∈ B (E) , µ (F ) = inf {µ (V ) : V ⊇ F and V is open} When a measure is both inner and outer regular, it is called regular. Actually, it is more useful and likely more standard to refer to µ being inner regular as µ (F ) = sup {µ (K) : K ⊆ F and K is compact} Thus the word “closed” is replaced with “compact”. For finite measures, defined on the Borel sets of a metric space, the first definition of regularity is automatic. These are always outer and inner regular provided inner regularity refers to closed sets. Lemma 20.4.3 Let µ be a finite measure defined on B (X) where X is a metric space. Then µ is regular. Proof: First note every open set is the countable union of closed sets and every closed set is the countable intersection of open sets. Here is why. Let V be an open set and let { ( ) } Kk ≡ x ∈ V : dist x, V C ≥ 1/k . Then clearly the union of the Kk equals V. Next, for K closed let Vk ≡ {x ∈ X : dist (x, K) < 1/k} . Clearly the intersection of the Vk equals K. Therefore, letting V denote an open set and K a closed set, µ (V ) =

sup {µ (K) : K ⊆ V and K is closed}

µ (K) =

inf {µ (V ) : V ⊇ K and V is open} .


Also since V is open and K is closed, µ (V ) =

inf {µ (U ) : U ⊇ V and V is open}

µ (K) =

sup {µ (L) : L ⊆ K and L is closed}

In words, µ is regular on open and closed sets. Let $\mathcal{F} \equiv \{F \in \mathcal{B}(X) \text{ such that } \mu \text{ is regular on } F\}$. Then $\mathcal{F}$ contains the open sets and the closed sets. Suppose $F \in \mathcal{F}$. Then there exists $V \supseteq F$ with $\mu(V\setminus F) < \varepsilon$. It follows $V^C \subseteq F^C$ and $\mu\left(F^C\setminus V^C\right) = \mu(V\setminus F) < \varepsilon$. Thus $F^C$ is inner regular. Since $F \in \mathcal{F}$, there exists $K \subseteq F$ where $K$ is closed and $\mu(F\setminus K) < \varepsilon$. Then also $K^C \supseteq F^C$ and $\mu\left(K^C\setminus F^C\right) = \mu(F\setminus K) < \varepsilon$. Thus if $F \in \mathcal{F}$ so is $F^C$.
Suppose now that $\{F_i\} \subseteq \mathcal{F}$, the $F_i$ being disjoint. Is $\cup F_i \in \mathcal{F}$? There exists closed $K_i \subseteq F_i$ such that $\mu(K_i) + \varepsilon/2^i > \mu(F_i)$. Then
$$\mu\left(\cup_{i=1}^\infty F_i\right) = \sum_{i=1}^\infty \mu(F_i) \le \varepsilon + \sum_{i=1}^\infty \mu(K_i),$$
and since $\cup_{i=1}^n K_i$ is closed with measure within $2\varepsilon$ of $\mu(\cup_{i=1}^\infty F_i)$ for $n$ large, this shows $\cup_{i=1}^\infty F_i$ is inner regular. Also there exist open $V_i \supseteq F_i$ with $\mu(F_i) + \varepsilon/2^i > \mu(V_i)$ and
$$\mu\left(\cup_{i=1}^\infty F_i\right) = \sum_{i=1}^\infty \mu(F_i) > -\varepsilon + \sum_{i=1}^\infty \mu(V_i) \ge -\varepsilon + \mu\left(\cup_{i=1}^\infty V_i\right)$$

which shows µ is outer regular on ∪∞ i=1 Fi . It follows F contains the π system consisting of open sets and so by the Lemma on π systems, Lemma 20.3.2, F contains σ (τ ) where τ is the set of open sets. Hence F contains the Borel sets and is itself a subset of the Borel sets by definition. Therefore, F = B (X) .  One can say more if the metric space is complete and separable. In fact in this case the above definition of inner regularity can be shown to imply the usual one where the closed sets are replaced with compact sets. Lemma 20.4.4 Let µ be a finite measure on a σ algebra containing B (X) , the Borel sets of X, a separable complete metric space. Then if C is a closed set, µ (C) = sup {µ (K) : K ⊆ C and K is compact.} It follows that for a finite measure on B (X) where X is a Polish space, µ is inner regular in the sense that for all F ∈ B (X) , µ (F ) = sup {µ (K) : K ⊆ F and K is compact} ( ) 1 Proof: Let {ak } be a countable dense subset of C. Thus ∪∞ k=1 B ak , n ⊇ C. Therefore, there exists mn such that ( ( )) 1 ε mn ≡ µ (C \ Cn ) < n . µ C \ ∪k=1 B ak , n 2


Now let K = C ∩ (∩∞ n=1 Cn ) . Then K is a subset of Cn for each n and so for each ε > 0 there exists an ε net for K since Cn has a 1/n net, namely a1 , · · · , amn . Since K is closed, it is complete and so it is also compact since it is complete and totally bounded, Theorem 11.1.38. Now µ (C \ K) = µ (∪∞ n=1 (C \ Cn ))
r

n ∑

µBk (E ∩ Ak ) > l

k=1

Thus this is inner regular. To show outer regular, it suffices to assume µ (E) < ∞ since otherwise there is nothing to prove. There exists an open Vn containing E ∩ An which is contained in Bn such that µBn (E ∩ An ) + ε/2n > µBn (Vn ) . Then let V be the union of all these Vn . µ (V \ E) = =

µ (∪k Vk \ ∪k (E ∩ Ak )) ≤ ∞ ∑

µBk (Vk \ (E ∩ Ak ))
µ (V ) 

20.5

When Is A Measure A Borel Measure?

You have an outer measure defined on the power set of some metric space. How can you tell that the σ algebra of measurable sets includes the Borel sets? This is what is discussed here. This is a very important idea because, from the above, you can then assert regularity of the measure if, for example it is finite on any ball. Definition 20.5.1 For two sets, A, B in a metric space, we define dist (A, B) ≡ inf {d (x, y) : x ∈ A, y ∈ B} . Theorem 20.5.2 Let µ be an outer measure on the subsets of (X, d), a metric space. If µ(A ∪ B) = µ(A) + µ(B) whenever dist(A, B) > 0, then the σ algebra of measurable sets S contains the Borel sets. Proof: It suffices to show that closed sets are in S, the σ-algebra of measurable sets, because then the open sets are also in S and consequently S contains the Borel sets. Let K be closed and let S be a subset of Ω. Is µ(S) ≥ µ(S ∩ K) + µ(S \ K)? It suffices to assume µ(S) < ∞. Let Kn ≡ {x : dist(x, K) ≤

1/n}

By Lemma 11.1.14 on Page 251, x → dist (x, K) is continuous and so Kn is closed. By the assumption of the theorem, µ(S) ≥ µ((S ∩ K) ∪ (S \ Kn )) = µ(S ∩ K) + µ(S \ Kn )

(20.6)

since S ∩ K and S \ Kn are a positive distance apart. Now µ(S \ Kn ) ≤ µ(S \ K) ≤ µ(S \ Kn ) + µ((Kn \ K) ∩ S).

(20.7)


If $\lim_{n\to\infty} \mu((K_n\setminus K)\cap S) = 0$ then the theorem will be proved because this limit along with 20.7 implies $\lim_{n\to\infty} \mu(S\setminus K_n) = \mu(S\setminus K)$ and then taking a limit in 20.6, $\mu(S) \ge \mu(S\cap K) + \mu(S\setminus K)$ as desired. Therefore, it suffices to establish this limit. Since $K$ is closed, a point $x \notin K$ must be at a positive distance from $K$ and so $K_n\setminus K = \cup_{k=n}^\infty K_k\setminus K_{k+1}$. Therefore
$$\mu(S\cap(K_n\setminus K)) \le \sum_{k=n}^\infty \mu(S\cap(K_k\setminus K_{k+1})). \qquad (20.8)$$
If
$$\sum_{k=1}^\infty \mu(S\cap(K_k\setminus K_{k+1})) < \infty, \qquad (20.9)$$
then $\mu(S\cap(K_n\setminus K)) \to 0$ because it is dominated by the tail of a convergent series so it suffices to show 20.9.
$$\sum_{k=1}^M \mu(S\cap(K_k\setminus K_{k+1})) = \sum_{k\ \mathrm{even},\,k\le M} \mu(S\cap(K_k\setminus K_{k+1})) + \sum_{k\ \mathrm{odd},\,k\le M} \mu(S\cap(K_k\setminus K_{k+1})). \qquad (20.10)$$

By the construction, the distance between any pair of sets $S\cap(K_k\setminus K_{k+1})$ for different even values of $k$ is positive and the distance between any pair of sets $S\cap(K_k\setminus K_{k+1})$ for different odd values of $k$ is positive. Therefore,
$$\sum_{k\ \mathrm{even},\,k\le M} \mu(S\cap(K_k\setminus K_{k+1})) + \sum_{k\ \mathrm{odd},\,k\le M} \mu(S\cap(K_k\setminus K_{k+1})) \le \mu\Big(\bigcup_{k\ \mathrm{even},\,k\le M}(S\cap(K_k\setminus K_{k+1}))\Big) + \mu\Big(\bigcup_{k\ \mathrm{odd},\,k\le M}(S\cap(K_k\setminus K_{k+1}))\Big) \le \mu(S) + \mu(S) = 2\mu(S)$$
and so for all $M$, $\sum_{k=1}^M \mu(S\cap(K_k\setminus K_{k+1})) \le 2\mu(S)$, showing 20.9. $\blacksquare$

20.6

Measures And Outer Measures

The above is all fine but is pretty abstract. Here is a simple example. Let Ω = N and let the σ algebra be the set of all subsets of N, P (N). Then let µ (A) ≡ the number of elements of A. This is known as counting measure. You can verify that this is an example of a measure and σ algebra. However, we really want more interesting examples than this. There is also something called an outer measure which is defined on the set of all subsets. Definition 20.6.1 Let Ω be a nonempty set and let λ : P (Ω) → [0, ∞) satisfy the following: 1. λ (∅) = 0 2. If A ⊆ B, then λ (A) ≤ λ (B) ∑∞ 3. λ (∪∞ i=1 Ei ) ≤ i=1 λ (Ei ) Then λ is called an outer measure.
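Every measure µ on a σ algebra F determines an outer measure µ̄(S) ≡ inf{µ(E) : E ⊇ S, E ∈ F}, as discussed below. On a finite space this infimum is a minimum and can be computed directly. The following Python sketch builds a small σ algebra from atoms and evaluates µ̄; the atoms chosen and the use of counting measure are illustrative assumptions of the sketch.

from itertools import combinations

def induced_outer_measure(S, sigma_algebra, mu):
    # mu_bar(S) = min { mu(E) : E in the sigma algebra, E contains S }.
    return min(mu[E] for E in sigma_algebra if S <= E)

atoms = [frozenset({1, 2}), frozenset({3}), frozenset({4})]
# The sigma algebra generated by the atoms: all unions of atoms.
F = [frozenset().union(*c) for r in range(len(atoms) + 1)
     for c in combinations(atoms, r)]
mu = {E: len(E) for E in F}                  # counting measure restricted to F

print(induced_outer_measure(frozenset({1}), F, mu))     # 2: smallest cover is {1, 2}
print(induced_outer_measure(frozenset({1, 3}), F, mu))  # 3: smallest cover is {1, 2, 3}
print(induced_outer_measure(frozenset({3}), F, mu))     # 1: {3} is itself measurable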


Every measure determines an outer measure. For example, suppose that µ is a measure on F a σ algebra of subsets of Ω. Then define µ ¯ (S) ≡ inf {µ (E) : E ⊇ S, E ∈ F } This is easily seen to be an outer measure. Also, we have the following Proposition. Proposition 20.6.2 Let µ be a measure as just described. Then µ ¯ as defined above, is an outer measure and also, if E ∈ F , then µ ¯ (E) = µ (E). Proof: The first two properties of an outer measure are obvious. What of the third? If any µ ¯ (Ei ) = ∞, then there is nothing to show so suppose each of these is finite. Let Fi ⊇ Ei such that Fi ∈ F and µ ¯ (Ei ) + 2εi > µ (Fi ) . Then µ ¯ (∪∞ i=1 Ei )


0, lim µ (ω : |fn (ω) − f (ω)| > ε) = 0

n→∞

Show that if this happens, then there exists a set of measure N such that if ω ∈ / N, then lim fn (ω) = f (ω) .

n→∞

Also show that limn→∞ fn (ω) = f (ω) , then fn converges in measure to f .


16. Let X, Y be separable metric spaces. Then X × Y can also be considered as a metric space with the metric ρ ((x, y) , (ˆ x, yˆ)) ≡ max (dX (x, x ˆ) , dY (y, yˆ)) Verify this. Then show that if K consists of sets A × B where A, B are Borel sets in X and Y respectively, then it follows ∏ that σ (K) = B (X × Y ) , the Borel sets from X × Y . Extend to the Cartesian product i Xi of finitely many separable metric spaces.

20.8

An Outer Measure On P (R)

A measure on R is like length. I will present something more general than length because it is no trouble to do so and the generalization is useful in many areas of mathematics such as probability. This outer measure will end up determining a measure known as the Lebesgue Stieltjes measure. Definition 20.8.1 The following definition is important. F (x+) ≡ lim F (y) , F (x−) = lim F (y) y→x+

y→x−

Thus one of these is the limit from the left and the other is the limit from the right. In probability, one often has F (x) ≥ 0, F is increasing, and F (x+) = F (x). This is the case where F is a probability distribution function. In this case, F (x) ≡ P (X ≤ x) where X is a random variable. In this case, limx→∞ F (x) = 1 but we are considering more general functions than this including the simple example where F (x) = x. This last example will end up giving Lebesgue measure on R. Definition 20.8.2 P (S) denotes the set of all subsets of S. Theorem 20.8.3 Let F be an increasing function defined on R. This will be called an integrator function. There exists a function µ : P (R) → [0, ∞] which satisfies the following properties. 1. If A ⊆ B, then 0 ≤ µ (A) ≤ µ (B) , µ (∅) = 0. ∑∞ 2. µ (∪∞ k=1 Ai ) ≤ i=1 µ (Ai ) 3. µ ([a, b]) = F (b+) − F (a−) , 4. µ ((a, b)) = F (b−) − F (a+) 5. µ ((a, b]) = F (b+) − F (a+) 6. µ ([a, b)) = F (b−) − F (a−). Proof: First it is necessary to define the function µ. This is contained in the following definition. Definition 20.8.4 For A ⊆ R, µ (A) = inf

 ∞ ∑ 

j=1

  (F (bi −) − F (ai +)) : A ⊆ ∪∞ (a , b ) i i i=1 

In words, you look at all coverings of A with open intervals. For each of these open coverings, you add the “lengths” of the individual open intervals and you take the infimum of all such numbers obtained. Then 1.) is obvious because if a countable collection of open intervals covers B, then it also covers A. Thus the set of numbers obtained for B is smaller than the set of numbers for A. Why is µ (∅) = 0? Pick a point of continuity of F. Such points exist because F is increasing and so it has


only countably many points of discontinuity. Let $a$ be this point. Then $\emptyset \subseteq (a-\delta, a+\delta)$ and so $\mu(\emptyset) \le F(a+\delta) - F(a-\delta)$ for every $\delta > 0$. Letting $\delta \to 0$, it follows that $\mu(\emptyset) = 0$.
Consider 2.). If any $\mu(A_i) = \infty$, there is nothing to prove. The assertion simply is $\infty \le \infty$. Assume then that $\mu(A_i) < \infty$ for all $i$. Then for each $m \in \mathbb{N}$ there exists a countable set of open intervals $\{(a_i^m, b_i^m)\}_{i=1}^\infty$ such that
$$\mu(A_m) + \frac{\varepsilon}{2^m} > \sum_{i=1}^\infty \left(F(b_i^m-) - F(a_i^m+)\right).$$
Then using Theorem 1.15.4 on Page 24,
$$\mu\left(\cup_{m=1}^\infty A_m\right) \le \sum_{i,m}\left(F(b_i^m-) - F(a_i^m+)\right) = \sum_{m=1}^\infty \sum_{i=1}^\infty \left(F(b_i^m-) - F(a_i^m+)\right) \le \sum_{m=1}^\infty \left(\mu(A_m) + \frac{\varepsilon}{2^m}\right) = \sum_{m=1}^\infty \mu(A_m) + \varepsilon,$$

and since $\varepsilon$ is arbitrary, this establishes 2.).
Next consider 3.). By definition, there exists a sequence of open intervals $\{(a_i,b_i)\}_{i=1}^\infty$ whose union contains $[a,b]$ such that
$$\mu([a,b]) + \varepsilon \ge \sum_{i=1}^\infty \left(F(b_i-) - F(a_i+)\right)$$
By Theorem 11.5.3, finitely many of these intervals also cover $[a,b]$. It follows there exist finitely many of these intervals, denoted as $\{(a_i,b_i)\}_{i=1}^n$, which overlap, such that $a \in (a_1,b_1)$, $b_1 \in (a_2,b_2)$, $\cdots$, $b \in (a_n,b_n)$. Therefore,
$$\mu([a,b]) \le \sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right).$$
It follows
$$\sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right) \ge \mu([a,b]) \ge \sum_{i=1}^n \left(F(b_i-) - F(a_i+)\right) - \varepsilon \ge F(b+) - F(a-) - \varepsilon$$
Therefore, directly from the definition, since $[a,b] \subseteq (a-\delta, b+\delta)$,
$$F(b+\delta) - F(a-\delta) \ge \mu([a,b]) \ge F(b+) - F(a-) - \varepsilon$$
Letting $\delta \to 0$,
$$F(b+) - F(a-) \ge \mu([a,b]) \ge F(b+) - F(a-) - \varepsilon$$
Since $\varepsilon$ is arbitrary, this shows $\mu([a,b]) = F(b+) - F(a-)$. This establishes 3.).
Consider 4.). For small $\delta > 0$, $\mu([a+\delta, b-\delta]) \le \mu((a,b)) \le \mu([a,b])$. Therefore, from 3.) and the definition of $\mu$,
$$F((b-\delta)) - F((a+\delta)) \le F((b-\delta)+) - F((a+\delta)-)$$


= µ ([a + δ, b − δ]) ≤ µ ((a, b)) ≤ F (b−) − F (a+) Now letting δ decrease to 0 it follows F (b−) − F (a+) ≤ µ ((a, b)) ≤ F (b−) − F (a+) This shows 4.) Consider 5.). From 3.) and 4.), for small δ > 0, F (b+) − F ((a + δ)) ≤ F (b+) − F ((a + δ) −) = µ ([a + δ, b]) ≤ µ ((a, b]) ≤ µ ((a, b + δ)) = F ((b + δ) −) − F (a+) ≤ F (b + δ) − F (a+) . Now let δ converge to 0 from above to obtain F (b+) − F (a+) = µ ((a, b]) = F (b+) − F (a+) . This establishes 5.) and 6.) is entirely similar to 5.).  The first two conditions of the above theorem are so important that we give something satisfying them a special name. Definition 20.8.5 Let Ω be a nonempty set. A function mapping P (Ω) → [0, ∞] is called an outer measure if it satisfies the following two condition. 1. If A ⊆ B, then 0 ≤ µ (A) ≤ µ (B) , µ (∅) = 0. ∑∞ 2. µ (∪∞ k=1 Ai ) ≤ i=1 µ (Ai )
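Properties 3.)-6.) of Theorem 20.8.3 can be checked numerically for a concrete integrator. The following Python sketch uses an increasing F with a single jump at 0 and approximates the one sided limits F(x+), F(x-) with a small offset; the particular F and the offset h are illustrative assumptions of the sketch.

def F(x):
    # An increasing integrator with a jump of size 1 at x = 0 (F(0+) - F(0-) = 1).
    return x + (1.0 if x >= 0 else 0.0)

def F_plus(x, h=1e-9):    # F(x+), the limit from the right
    return F(x + h)

def F_minus(x, h=1e-9):   # F(x-), the limit from the left
    return F(x - h)

def mu_closed(a, b):     return F_plus(b) - F_minus(a)   # mu([a, b])
def mu_open(a, b):       return F_minus(b) - F_plus(a)   # mu((a, b))
def mu_half_open(a, b):  return F_plus(b) - F_plus(a)    # mu((a, b])

# The jump at 0 is picked up exactly when 0 belongs to the interval:
print(round(mu_closed(0, 1), 6))      # 2.0 : length 1 plus the jump at the left endpoint
print(round(mu_open(0, 1), 6))        # 1.0 : the open interval misses the jump at 0
print(round(mu_half_open(-1, 0), 6))  # 2.0 : (-1, 0] contains 0, so it picks up the jump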

20.9

Measures From Outer Measures

Earlier in Theorem 20.8.3 an outer measure on P (R) was constructed. This can be used to obtain a measure defined on R. However, the procedure for doing so is a special case of a general approach due to Caratheodory in about 1918. Definition 20.9.1 Let Ω be a nonempty set and let µ : P(Ω) → [0, ∞] be an outer measure. For E ⊆ Ω, E is µ measurable if for all S ⊆ Ω, µ(S) = µ(S \ E) + µ(S ∩ E).

(20.11)

To help in remembering 20.11, think of a measurable set E, as a process which divides a given set into two pieces, the part in E and the part not in E as in 20.11. In the Bible, there are several incidents recorded in which a process of division resulted in more stuff than was originally present.1 Measurable sets are exactly those which are incapable of such a miracle. You might think of the measurable sets as the non-miraculous sets. The idea is to show that they form a σ algebra on which the outer measure µ is a measure. First here is a definition and a lemma. Definition 20.9.2 (µ⌊S)(A) ≡ µ(S ∩ A) for all A ⊆ Ω. Thus µ⌊S is the name of a new outer measure, called µ restricted to S. 11

Kings 17, 2 Kings 4, Mathew 14, and Mathew 15 all contain such descriptions. The stuff involved was either oil, bread, flour or fish. In mathematics such things have also been done with sets. In the book by Bruckner Bruckner and Thompson there is an interesting discussion of the Banach Tarski paradox which says it is possible to divide a ball in R3 into five disjoint pieces and assemble the pieces to form two disjoint balls of the same size as the first. The details can be found in: The Banach Tarski Paradox by Wagon, Cambridge University press. 1985. It is known that all such examples must involve the axiom of choice.
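Condition 20.11 can be tested exhaustively when Ω is finite. The following Python sketch checks, for a toy outer measure, which sets E satisfy µ(S) = µ(S \ E) + µ(S ∩ E) for every S ⊆ Ω; the outer measure chosen (1 on nonempty sets, 0 on the empty set) is an illustrative assumption of the sketch.

from itertools import chain, combinations

def subsets(omega):
    s = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def is_caratheodory_measurable(E, omega, outer):
    # Check mu(S) = mu(S minus E) + mu(S intersect E) for every S subset of omega.
    return all(outer[S] == outer[S - E] + outer[S & E] for S in subsets(omega))

omega = frozenset({1, 2, 3})
# A toy outer measure: 1 on nonempty sets, 0 on the empty set.  It is monotone and
# subadditive, but it cannot "see" how a nonempty set splits into pieces.
outer = {S: (0 if not S else 1) for S in subsets(omega)}

for E in subsets(omega):
    print(sorted(E), is_caratheodory_measurable(E, omega, outer))
# Only the empty set and omega pass: splitting omega by any other E gives 1 != 1 + 1.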


The next lemma indicates that the property of measurability is not lost by considering this restricted measure. Lemma 20.9.3 If A is µ measurable, then A is µ⌊S measurable. Proof: Suppose A is µ measurable. It is desired to to show that for all T ⊆ Ω, (µ⌊S)(T ) = (µ⌊S)(T ∩ A) + (µ⌊S)(T \ A). Thus it is desired to show µ(S ∩ T ) = µ(T ∩ A ∩ S) + µ(T ∩ S ∩ AC ).

(20.12)

But 20.12 holds because A is µ measurable. Apply Definition 20.9.1 to S ∩ T instead of S. 

If A is µ⌊S measurable, it does not follow that A is µ measurable. Indeed, if you believe in the existence of non measurable sets, you could let A = S for such a µ non measurable set and verify that S is µ⌊S measurable. The next theorem is the main result on outer measures which shows that, starting with an outer measure, you can obtain a measure.

Theorem 20.9.4 Let Ω be a set and let µ be an outer measure on P(Ω). The collection of µ measurable sets S forms a σ algebra, and if F_i ∈ S with F_i ∩ F_j = ∅ for i ≠ j, then

µ(∪_{i=1}^∞ F_i) = ∑_{i=1}^∞ µ(F_i).  (20.13)

If · · · F_n ⊆ F_{n+1} ⊆ · · · with F_n ∈ S and F = ∪_{n=1}^∞ F_n, then

µ(F) = lim_{n→∞} µ(F_n).  (20.14)

If · · · F_n ⊇ F_{n+1} ⊇ · · · with F_n ∈ S and F = ∩_{n=1}^∞ F_n, then, provided µ(F_1) < ∞,

µ(F) = lim_{n→∞} µ(F_n).  (20.15)

This measure space is also complete, which means that if µ(F) = 0 for some F ∈ S, then if G ⊆ F, it follows G ∈ S also.
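Before turning to the proof, here is a small numerical illustration of 20.13 through 20.15. It is only a sketch and is not part of the argument: it uses counting measure on an invented finite universe, where all the hypotheses hold trivially, and the closing comment recalls the standard example showing why µ(F₁) < ∞ is needed in 20.15.

```python
# Continuity of measure for counting measure on a finite universe.
mu = len   # counting measure of a finite set

# 20.14: increasing sets F_1 ⊆ F_2 ⊆ ...  with F = union of the F_n
F = [set(range(n)) for n in range(1, 51)]
union_F = set().union(*F)
print(mu(union_F), [mu(s) for s in F[-3:]])   # mu(F) = 50 = lim mu(F_n)

# 20.15: decreasing sets G_1 ⊇ G_2 ⊇ ... with mu(G_1) < infinity
G = [set(range(n, 50)) for n in range(1, 51)]
inter_G = set.intersection(*G)
print(mu(inter_G), [mu(s) for s in G[-3:]])   # mu(G) = 0 = lim mu(G_n)

# The finiteness hypothesis in 20.15 matters: for counting measure on N and
# G_n = {n, n+1, ...}, every mu(G_n) is infinite while the intersection is empty.
```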

Proof: First note that ∅ and Ω are obviously in S. Now suppose A, B ∈ S. I will show A \ B ≡ A ∩ B^C is in S. (The reader may find it helpful to draw the usual Venn diagram of S, A and B.) It is required to show that

µ(S) = µ(S \ (A \ B)) + µ(S ∩ (A \ B)).

First consider S \ (A \ B). From the picture, it equals

(S ∩ A^C ∩ B^C) ∪ (S ∩ A ∩ B) ∪ (S ∩ A^C ∩ B).

Therefore,

µ(S) ≤ µ(S \ (A \ B)) + µ(S ∩ (A \ B))
≤ µ(S ∩ A^C ∩ B^C) + µ(S ∩ A ∩ B) + µ(S ∩ A^C ∩ B) + µ(S ∩ (A \ B))
= µ(S ∩ A^C ∩ B^C) + µ(S ∩ A ∩ B) + µ(S ∩ A^C ∩ B) + µ(S ∩ A ∩ B^C)
= µ(S ∩ A^C ∩ B^C) + µ(S ∩ A ∩ B^C) + µ(S ∩ A ∩ B) + µ(S ∩ A^C ∩ B)
= µ(S ∩ B^C) + µ(S ∩ B) = µ(S),

the last two steps using the measurability of A (applied to S ∩ B^C and to S ∩ B) and then of B.



and so this shows that A \ B ∈ S whenever A, B ∈ S. Since Ω ∈ S, this shows that A ∈ S if and only if A^C ∈ S. Now if A, B ∈ S, A ∪ B = (A^C ∩ B^C)^C = (A^C \ B)^C ∈ S. By induction, if A_1, · · · , A_n ∈ S, then so is ∪_{i=1}^n A_i. If A, B ∈ S with A ∩ B = ∅,

µ(A ∪ B) = µ((A ∪ B) ∩ A) + µ((A ∪ B) \ A) = µ(A) + µ(B).

By induction, if A_i ∩ A_j = ∅ for i ≠ j and A_i ∈ S,

µ(∪_{i=1}^n A_i) = ∑_{i=1}^n µ(A_i).  (20.16)

Now let A = ∪_{i=1}^∞ A_i where A_i ∩ A_j = ∅ for i ≠ j. Then

∑_{i=1}^∞ µ(A_i) ≥ µ(A) ≥ µ(∪_{i=1}^n A_i) = ∑_{i=1}^n µ(A_i).

Since this holds for all n, you can take the limit as n → ∞ and conclude

∑_{i=1}^∞ µ(A_i) = µ(A)

which establishes 20.13. Consider part 20.14. Without loss of generality µ(F_k) < ∞ for all k, since otherwise there is nothing to show. Suppose {F_k} is an increasing sequence of sets of S. Then letting F_0 ≡ ∅, {F_{k+1} \ F_k}_{k=0}^∞ is a sequence of disjoint sets of S, since it was shown above that the difference of two sets of S is in S. Also note that from 20.16,

µ(F_{k+1} \ F_k) + µ(F_k) = µ(F_{k+1})

and so, since µ(F_k) < ∞,

µ(F_{k+1} \ F_k) = µ(F_{k+1}) − µ(F_k).

Therefore, letting F ≡ ∪_{k=1}^∞ F_k, which also equals ∪_{k=0}^∞ (F_{k+1} \ F_k), it follows from part 20.13 just shown that

µ(F) = ∑_{k=0}^∞ µ(F_{k+1} \ F_k) = lim_{n→∞} ∑_{k=0}^n µ(F_{k+1} \ F_k)
= lim_{n→∞} ∑_{k=0}^n (µ(F_{k+1}) − µ(F_k)) = lim_{n→∞} µ(F_{n+1}).

In order to establish 20.15, let the F_n be as given there. Then, since (F_1 \ F_n) increases to (F_1 \ F), 20.14 implies

lim_{n→∞} (µ(F_1) − µ(F_n)) = µ(F_1 \ F).

The problem is, I don't know that F ∈ S and so it is not clear that µ(F_1 \ F) = µ(F_1) − µ(F). However, µ(F_1 \ F) + µ(F) ≥ µ(F_1) and so µ(F_1 \ F) ≥ µ(F_1) − µ(F). Hence

lim_{n→∞} (µ(F_1) − µ(F_n)) = µ(F_1 \ F) ≥ µ(F_1) − µ(F)

which implies

lim_{n→∞} µ(F_n) ≤ µ(F).



But since F ⊆ F_n,

µ(F) ≤ lim_{n→∞} µ(F_n)

and this establishes 20.15. Note that it was assumed µ(F_1) < ∞ because µ(F_1) was subtracted from both sides.

It remains to show S is closed under countable unions. Recall that if A ∈ S, then A^C ∈ S and S is closed under finite unions. Let A_i ∈ S, A = ∪_{i=1}^∞ A_i, B_n = ∪_{i=1}^n A_i. Then

µ(S) = µ(S ∩ B_n) + µ(S \ B_n) = (µ⌊S)(B_n) + (µ⌊S)(B_n^C).  (20.17)

By Lemma 20.9.3, B_n is (µ⌊S) measurable and so is B_n^C. I want to show µ(S) ≥ µ(S \ A) + µ(S ∩ A). If µ(S) = ∞, there is nothing to prove. Assume µ(S) < ∞. Then apply Parts 20.15 and 20.14 to the outer measure µ⌊S in 20.17 and let n → ∞. Thus B_n ↑ A, B_n^C ↓ A^C and this yields

µ(S) = (µ⌊S)(A) + (µ⌊S)(A^C) = µ(S ∩ A) + µ(S \ A).

Therefore A ∈ S and this proves Parts 20.13, 20.14, and 20.15.

It only remains to verify the assertion about completeness. Letting G and F be as described above, let S ⊆ Ω. I need to verify

µ(S) ≥ µ(S ∩ G) + µ(S \ G).

However,

µ(S ∩ G) + µ(S \ G) ≤ µ(S ∩ F) + µ(S \ F) + µ(F \ G) = µ(S ∩ F) + µ(S \ F) = µ(S)

because by assumption, µ(F \ G) ≤ µ(F) = 0. 

Corollary 20.9.5 Completeness is the same as saying that if (E \ E′) ∪ (E′ \ E) ⊆ N ∈ F and µ(N) = 0, then if E ∈ F, it follows that E′ ∈ F also.

Proof: If the new condition holds, then suppose G ⊆ F where µ(F) = 0, F ∈ F. Then (G \ F) ∪ (F \ G) ⊆ F, since G \ F = ∅, and µ(F) is given to equal 0. Therefore, G ∈ F. Now suppose the earlier version of completeness and let (E \ E′) ∪ (E′ \ E) ⊆ N ∈ F where µ(N) = 0 and E ∈ F. Then we know (E \ E′), (E′ \ E) ∈ F and both have measure zero. It follows E \ (E \ E′) = E ∩ E′ ∈ F. Hence E′ = (E ∩ E′) ∪ (E′ \ E) ∈ F. 

The measure µ which results from the outer measure of Theorem 20.8.3 is called the Lebesgue Stieltjes measure associated with the integrator function F. Its properties will be discussed more in the next section. Here is a general result. If you have a measure µ, then by Proposition 20.6.2, the µ̄ defined there is an outer measure which agrees with µ on the σ algebra of measurable sets F. What of the measure determined by µ̄, still denoted by µ̄? Is µ̄ = µ on F? Is F a subset of the µ̄ measurable sets, those which satisfy the Caratheodory criterion? Suppose E ∈ F. Is it the case that

µ̄(S) = µ̄(S \ E) + µ̄(S ∩ E)?



As usual, there is nothing to show if µ̄(S) = ∞, so assume this does not happen. Let F ⊇ S, F ∈ F. Then by Proposition 20.6.2, µ(F) = µ(F \ E) + µ(F ∩ E) because everything is measurable. Then

µ̄(S) ≡ inf_{F⊇S} µ(F) = inf_{F⊇S} (µ(F \ E) + µ(F ∩ E))
≥ inf_{F⊇S} µ(F \ E) + inf_{F⊇S} µ(F ∩ E)
≥ inf_{F⊇S\E} µ(F \ E) + inf_{F⊇S∩E} µ(F ∩ E)
≡ µ̄(S \ E) + µ̄(S ∩ E).

Thus, indeed, F is a subset of the µ̄ measurable sets. By Proposition 20.6.2, µ̄ = µ on F. This gives a way to complete a measure space, which is described in the following proposition.

Proposition 20.9.6 Let (Ω, F, µ) be a measure space. Let µ̄ be the outer measure determined by µ as in Proposition 20.6.2. Also denote as F̄ the σ algebra of µ̄ measurable sets. Then (Ω, F̄, µ̄) is a complete measure space in which F̄ ⊇ F and µ̄ = µ on F. Also, in this situation, if µ̄(E) = 0, then E ∈ F̄. No new sets are obtained if (Ω, F, µ) is already complete.

Proof: If S is a set,

µ̄(S) ≤ µ̄(S ∩ E) + µ̄(S \ E) ≤ µ̄(E) + µ̄(S \ E) = µ̄(S \ E) ≤ µ̄(S)

and so all inequalities are equal signs. Thus, if E is a set with µ̄(E) = 0, then E ∈ F̄.

Suppose now that (Ω, F, µ) is complete. Let F ∈ F̄. Then there exists E ⊇ F, E ∈ F, such that µ(E) = µ̄(F). This is obvious if µ̄(F) = ∞. Otherwise, let E_n ⊇ F, E_n ∈ F, with µ̄(F) + 1/n > µ(E_n), and let E = ∩_n E_n. Now µ̄(E \ F) = 0, and so there exists a set W ∈ F such that µ(W) = 0 and W ⊇ E \ F. Thus E \ F ⊆ W, a set of measure zero. Hence by completeness of (Ω, F, µ), it must be the case that E \ F = E ∩ F^C = G ∈ F. Then taking complements of both sides, E^C ∪ F = G^C ∈ F. Now take intersections with E: F = E ∩ G^C ∈ F. 

20.10 One Dimensional Lebesgue Stieltjes Measure

Now with these major results about measures, it is time to specialize to the outer measure of Theorem 20.8.3. The next theorem gives Lebesgue Stieltjes measure on R. The conditions 20.18 and 20.19 given below are known respectively as inner and outer regularity.

Theorem 20.10.1 Let F denote the σ algebra of Theorem 20.9.4, associated with the outer measure µ in Theorem 20.8.3, on which µ is a measure. Then every open interval is in F. So are all open and closed sets. Furthermore, if E is any set in F,

µ(E) = sup {µ(K) : K compact, K ⊆ E}  (20.18)
µ(E) = inf {µ(V) : V is an open set, V ⊇ E}  (20.19)

Proof: The first task is to show (a, b) ∈ F. I need to show that for every S ⊆ R,

µ(S) ≥ µ(S ∩ (a, b)) + µ(S ∩ (a, b)^C).  (20.20)



Suppose first S is an open interval, (c, d). If (c, d) has empty intersection with (a, b) or is contained in (a, b), there is nothing to prove; the above expression reduces to nothing more than µ(S) = µ(S). Suppose next that (c, d) ⊇ (a, b). In this case, the right side of the above reduces to

µ((a, b)) + µ((c, a] ∪ [b, d)) ≤ F(b−) − F(a+) + F(a+) − F(c+) + F(d−) − F(b−)
= F(d−) − F(c+) ≡ µ((c, d)).

The only other cases are c ≤ a < d ≤ b or a ≤ c < d ≤ b. Consider the first of these cases. Then the right side of 20.20 for S = (c, d) is

µ((a, d)) + µ((c, a]) = F(d−) − F(a+) + F(a+) − F(c+) = F(d−) − F(c+) = µ((c, d)).

The last case is entirely similar. Thus 20.20 holds whenever S is an open interval. Now it is clear 20.20 also holds if µ(S) = ∞. Suppose then that µ(S) < ∞ and let S ⊆ ∪_{k=1}^∞ (a_k, b_k) such that

µ(S) + ε > ∑_{k=1}^∞ (F(b_k−) − F(a_k+)) = ∑_{k=1}^∞ µ((a_k, b_k)).

Then since µ is an outer measure, and using what was just shown,

µ(S ∩ (a, b)) + µ(S ∩ (a, b)^C)
≤ µ(∪_{k=1}^∞ (a_k, b_k) ∩ (a, b)) + µ(∪_{k=1}^∞ (a_k, b_k) ∩ (a, b)^C)
≤ ∑_{k=1}^∞ [µ((a_k, b_k) ∩ (a, b)) + µ((a_k, b_k) ∩ (a, b)^C)]
≤ ∑_{k=1}^∞ µ((a_k, b_k)) ≤ µ(S) + ε.

Since ε is arbitrary, this shows 20.20 holds for any S and so any open interval is in F.

It follows any open set is in F. This follows from the fact that any open set in R is the countable union of open intervals; see Theorem 11.2.8 for example. There can be no more than countably many disjoint open intervals because the rational numbers are dense and countable. Since each of these open intervals is in F and F is a σ algebra, their union is also in F. It follows every closed set is in F also. This is because F is a σ algebra, and if a set is in F then so is its complement; the closed sets are those which are complements of open sets. Then the regularity of this measure follows right away from Corollary 20.4.8 because the measure is finite on any open interval. 

Definition 20.10.2 When the integrator function is F(x) = x, the Lebesgue Stieltjes measure just discussed is known as one dimensional Lebesgue measure and is denoted as m.

Proposition 20.10.3 For m Lebesgue measure, m([a, b]) = m((a, b)) = b − a. Also m is translation invariant in the sense that if E is any Lebesgue measurable set, then m(x + E) = m(E).

Proof: The formula for the measure of an interval comes right away from Theorem 20.8.3. From this, it follows right away that whenever E is an interval, m(x + E) = m(E). Every open set is the countable disjoint union of open intervals by Theorem 11.2.8, so if E is an open set, then m(x + E) = m(E). What about closed sets? First suppose H is a closed and bounded set. Then letting (−n, n) ⊇ H,

µ(((−n, n) \ H) + x) + µ(H + x) = µ((−n, n) + x).



Hence, from what was just shown about open sets,

µ(H) = µ((−n, n)) − µ((−n, n) \ H) = µ((−n, n) + x) − µ(((−n, n) \ H) + x) = µ(H + x).

Therefore, the translation invariance holds for closed and bounded sets. If H is an arbitrary closed set, then

µ(H + x) = lim_{n→∞} µ(H ∩ [−n, n] + x) = lim_{n→∞} µ(H ∩ [−n, n]) = µ(H).

It follows right away that if G is the countable intersection of open sets (a G_δ set, pronounced "g delta" set), then

m(G ∩ (−n, n) + x) = m(G ∩ (−n, n)).

Now taking n → ∞, m(G + x) = m(G). Similarly, if F is the countable union of compact sets (an F_σ set, pronounced "F sigma" set), then µ(F + x) = µ(F). Now using Theorem 20.10.1, if E is an arbitrary measurable set, there exist an F_σ set F and a G_δ set G such that F ⊆ E ⊆ G and m(F) = m(G) = m(E). Then

m(F) = m(x + F) ≤ m(x + E) ≤ m(x + G) = m(G) = m(E) = m(F). 
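The interval formulas of Theorem 20.8.3 used above are easy to experiment with numerically. The following Python sketch is only an illustration: the integrator F in it, which has a jump at 0, is an invented example, and the one-sided limits F(x−), F(x+) are approximated with a small ε. Taking F(x) = x recovers Lebesgue measure, for which the output is just the interval length.

```python
# Lebesgue-Stieltjes measure of intervals from an increasing integrator F:
# mu((a,b)) = F(b-) - F(a+),  mu([a,b]) = F(b+) - F(a-).
EPS = 1e-9

def F(x):
    return x + (1.0 if x >= 0 else 0.0)   # increasing, jump of size 1 at 0

def left(F, x):  return F(x - EPS)   # numerical stand-in for F(x-)
def right(F, x): return F(x + EPS)   # numerical stand-in for F(x+)

def mu_open(a, b):   return left(F, b) - right(F, a)     # mu((a,b))
def mu_closed(a, b): return right(F, b) - left(F, a)     # mu([a,b])

print(round(mu_open(-1.0, 1.0), 6))    # 3.0: length 2 plus the unit point mass at 0
print(round(mu_closed(0.0, 0.0), 6))   # 1.0: mu({0}) = F(0+) - F(0-), the jump at 0
print(round(mu_open(1.0, 4.0), 6))     # 3.0: away from the jump this is just b - a
```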

20.11 Exercises

1. Suppose you have (X, F, µ) where F ⊇ B(X) and also µ(B(x₀, r)) < ∞ for all r > 0. Let S(x₀, r) ≡ {x ∈ X : d(x, x₀) = r}. Show that {r > 0 : µ(S(x₀, r)) > 0} cannot be uncountable. Explain why there exists a strictly increasing sequence r_n → ∞ such that µ(x : d(x, x₀) = r_n) = 0. In other words, the skin of the ball has measure zero except for possibly countably many values of the radius r.

2. In constructing Lebesgue Stieltjes measure on R, we defined the outer measure as

µ(A) ≡ inf {∑_{i=1}^∞ (F(b_i−) − F(a_i+)) : A ⊆ ∪_{i=1}^∞ (a_i, b_i)}.

It was also shown that F(b−) − F(a+) = µ((a, b)). Show that this implies that µ is outer regular on P(R). That is, for any set A,

µ(A) = inf {µ(V) : V ⊇ A and V is open}.

In particular, this holds for all A ∈ F, the σ algebra of measurable sets. Now show that if (X, µ, F) is a measure space such that X is a complete separable metric space (Polish space) and if µ is outer regular on F ⊇ B(X) and finite on every ball, then µ must be inner regular on each set of F. That is,

µ(A) = sup {µ(K) : K ⊆ A and K is compact}.

Hint: Let {r_n} be the increasing sequence of Problem 1. Also let A_n = B(x₀, r_n), r₀ ≡ 0, B_n ≡ B(x₀, r_{n+1}). Thus Ā_n ⊆ B_n. Let A ∈ F and A ⊆ A_n ⊆ Ā_n. Then show that there exists an open set V_n ⊇ Ā_n \ A, V_n ⊆ B_n, such that

µ(V_n \ (Ā_n \ A)) < ε.


Then explain why V_n^C ∩ A_n ⊆ A and µ(A \ (V_n^C ∩ A_n)) < ε. It might help to draw a picture on this last part. Thus there is a closed set H contained in A such that µ(A \ H) < ε. Now recall the interesting result about regularity in Polish space. Thus there is K compact such that µ(H \ K) < ε. Of course µ is not finite, but µ restricted to B_n is. Now let F be arbitrary. Then let l < µ(F) and argue that l < µ(F ∩ B(x₀, r_n)) for some n. Then use what was just shown.

3. Suppose you have any measure space (Ω, F, µ). The problem is that it might not be a complete measure space. That is, you might have µ(F) = 0 and G ⊆ F but G ∉ F. Define the following µ̂ on P(Ω):

µ̂(F) ≡ inf {∑_{i=1}^∞ µ(E_i) : F ⊆ ∪_{i=1}^∞ E_i, E_i ∈ F}.

Show first that µ̂ is an outer measure. Next show that it agrees with µ on F and that for every E ∈ F,

µ̂(S) = µ̂(S ∩ E) + µ̂(S ∩ E^C).

From the Caratheodory procedure for constructing a measure space, there exists a σ algebra F̂ which contains F on which µ̂ is a complete measure. This is called the completion of the measure space.

4. Consider the Cantor set. This is obtained by starting with [0, 1], deleting (1/3, 2/3), and then taking the two closed intervals which result and deleting the middle open third of each of these, and continuing this way. Let J_k denote the union of the 2^k closed intervals which result at the k-th step of the construction. The Cantor set is J ≡ ∩_{k=1}^∞ J_k. Explain why J is a nonempty compact subset of R. Show that every point of J is a limit point of J. Also show there exists a mapping from J onto [0, 1], even though the sum of the lengths of the deleted open intervals is 1. Show that the Cantor set has empty interior. If x ∈ J, consider the connected component of x. Show that this connected component is just x.

5. Lebesgue measure was discussed. Recall that m((a, b)) = b − a and it is defined on a σ algebra which contains the Borel sets (the outer measure being defined on all of P(R)). Also recall that m is translation invariant. Let x ∼ y if and only if x − y ∈ Q. Show this is an equivalence relation. Now let W be a set of positive measure which is contained in (0, 1). For x ∈ W, let [x] denote those y ∈ W such that x ∼ y. Thus the equivalence classes partition W. Use the axiom of choice to obtain a set S ⊆ W such that S consists of exactly one element from each equivalence class. Let T denote the rational numbers in [−1, 1]. Consider T + S ⊆ [−1, 2]. Explain why T + S ⊇ W. For T ≡ {r_j}, explain why the sets {r_j + S}_j are disjoint. Now suppose S is measurable. Then show that you have a contradiction if m(S) = 0, since m(W) > 0, and you also have a contradiction if m(S) > 0, because T + S consists of countably many disjoint sets. Explain why S cannot be measurable. Thus there exists T ⊆ R such that

m(T) < m(T ∩ S) + m(T ∩ S^C).

Is there an open interval (a, b) such that if T = (a, b), then the above inequality holds?

6. Consider the following nested sequence of compact sets {P_n}. Let P₁ = [0, 1], P₂ = [0, 1/3] ∪ [2/3, 1], etc. To go from P_n to P_{n+1}, delete the open interval which is the middle third of each closed interval in P_n. Let P = ∩_{n=1}^∞ P_n. By the finite intersection property of compact sets, P ≠ ∅. Show m(P) = 0. If you feel ambitious, also show there is a one to one onto mapping of [0, 1] to P. The set P is called the Cantor set. Thus, although P has measure zero, it has the same number of points in it as [0, 1] in the sense that there is a one to one and onto mapping from one to the other. Hint: There are various ways of doing this last part, but the most enlightenment is obtained by exploiting the topological properties of the Cantor set rather than some silly representation in terms of sums of powers of two and three. All you need to do is use the Schroder Bernstein theorem and show there is an onto map from the Cantor set



to [0, 1]. If you do this right and remember the theorems about characterizations of compact metric spaces, Proposition 11.1.38 on Page 257, you may get a pretty good idea why every compact metric space is the continuous image of the Cantor set.

7. Consider the sequence of functions defined in the following way. Let f₁(x) = x on [0, 1]. To get from f_n to f_{n+1}, let f_{n+1} = f_n on all intervals where f_n is constant. If f_n is nonconstant on [a, b], let f_{n+1}(a) = f_n(a), f_{n+1}(b) = f_n(b), and let f_{n+1} be piecewise linear and equal to (1/2)(f_n(a) + f_n(b)) on the middle third of [a, b]. Sketch a few of these and you will see the pattern. Show {f_n} converges uniformly on [0, 1]. If f(x) = lim_{n→∞} f_n(x), show that f(0) = 0, f(1) = 1, f is continuous, and f′(x) = 0 for all x ∉ P where P is the Cantor set of Problem 6. This function is called the Cantor function. It is a very important example to remember. Note it has derivative equal to zero a.e. and yet it succeeds in climbing from 0 to 1. Explain why this interesting function is not absolutely continuous although it is continuous. Hint: This isn't too hard if you focus on getting a careful estimate on the difference between two successive functions in the list, considering only a typical small interval in which the change takes place.

8. ↑ This problem gives a very interesting example found in the book by McShane [30]. Let g(x) = x + f(x) where f is the strange function of Problem 7. Let P be the Cantor set of Problem 6. Let [0, 1] \ P = ∪_{j=1}^∞ I_j where I_j is open and I_j ∩ I_k = ∅ if j ≠ k. These intervals are the connected components of the complement of the Cantor set. Show m(g(I_j)) = m(I_j) so

m(g(∪_{j=1}^∞ I_j)) = ∑_{j=1}^∞ m(g(I_j)) = ∑_{j=1}^∞ m(I_j) = 1.

Thus m(g(P)) = 1 because g([0, 1]) = [0, 2]. By Problem 5 there exists a set A ⊆ g(P) which is non measurable. Define ϕ(x) = X_A(g(x)). Thus ϕ(x) = 0 unless x ∈ P. Tell why ϕ is measurable. (Recall m(P) = 0 and Lebesgue measure is complete.) Now show that X_A(y) = ϕ(g⁻¹(y)) for y ∈ [0, 2]. Tell why g⁻¹ is continuous but ϕ ∘ g⁻¹ is not measurable. (This is an example of measurable ∘ continuous ≠ measurable.) Show there exist Lebesgue measurable sets which are not Borel measurable. Hint: The function ϕ is Lebesgue measurable. Now recall that Borel ∘ measurable = measurable.

9. Show that every countable set of real numbers is of measure zero.

10. Review the Cantor set in Problem 4 on Page 520. You deleted middle third open intervals. Show that you can take out open intervals in the middle which are not necessarily middle thirds, and end up with a set C which has Lebesgue measure equal to 1 − ε. Also show, if you can, that there exists a continuous and one to one map f : C → J where J is the usual Cantor set of Problem 4, which also has measure 0.
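The measure computations behind Problems 4, 6 and 10 can be previewed numerically. The following Python sketch is only an illustration and not a solution; the particular deleted proportions are invented examples. At stage k the middle-thirds construction leaves total length (2/3)^k, which tends to 0, while removing a total length of ε·2^(−j) at step j leaves a "fat" Cantor set of measure 1 − ε.

```python
# Total length remaining after k steps of deleting open middle pieces.

def middle_thirds_remaining(k):
    return (2.0 / 3.0) ** k        # length of J_k (or P_k) in Problems 4 and 6

def fat_remaining(k, eps):
    # at step j remove total length eps * 2**(-j), split among the intervals present,
    # so the total removed over all steps is at most eps (Problem 10)
    removed = sum(eps * 2.0 ** (-j) for j in range(1, k + 1))
    return 1.0 - removed

print([round(middle_thirds_remaining(k), 4) for k in (1, 5, 10, 20)])  # tends to 0
print([round(fat_remaining(k, 0.1), 6) for k in (1, 5, 20)])           # tends to 1 - eps = 0.9
```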



Chapter 21

The Abstract Lebesgue Integral

The general Lebesgue integral requires a measure space (Ω, F, µ) and, to begin with, a nonnegative measurable function. I will use Lemma 1.15.2 about interchanging two supremums frequently. Also, I will use the observation that if {a_n} is an increasing sequence of points of [0, ∞], then sup_n a_n = lim_{n→∞} a_n, which is obvious from the definition of sup. We have lots of good examples of measure spaces at this point, namely the Lebesgue Stieltjes measures defined above. Included in this is one dimensional Lebesgue measure, in which F(x) = x. However, what follows is completely general, requiring only a measure space and measure.

21.1 Definition For Nonnegative Measurable Functions

21.1.1 Riemann Integrals For Decreasing Functions

First of all, the notation [g < f] is short for {ω ∈ Ω : g(ω) < f(ω)}, with other variants of this notation being similar. Also, the convention 0 · ∞ = 0 will be used to simplify the presentation whenever it is convenient to do so. The notation a ∧ b means the minimum of a and b.

Definition 21.1.1 Let f : [a, b] → [0, ∞] be decreasing. Define

∫_a^b f(λ) dλ ≡ lim_{M→∞} ∫_a^b M ∧ f(λ) dλ = sup_M ∫_a^b M ∧ f(λ) dλ

where a ∧ b means the minimum of a and b. Note that for f bounded,

sup_M ∫_a^b M ∧ f(λ) dλ = ∫_a^b f(λ) dλ

where the integral on the right is the usual Riemann integral, because eventually M > f. For f a nonnegative decreasing function defined on [0, ∞),

∫_0^∞ f dλ ≡ lim_{R→∞} ∫_0^R f dλ = sup_{R>1} ∫_0^R f dλ = sup_{R>1} sup_{M>0} ∫_0^R f ∧ M dλ.
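The role of the truncation by M in this definition is easy to see numerically. The following Python sketch is only an illustration with an invented example, f(λ) = 1/√λ on [0, 1], which is unbounded near 0 but has integral 2: the Riemann integrals of M ∧ f increase toward 2 as M grows, and their supremum is the value assigned by the definition.

```python
# Numerical look at sup_M of the Riemann integrals of M ∧ f for a decreasing f.

def riemann(g, a, b, n=200000):
    # midpoint rule; adequate for these bounded integrands
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

f = lambda x: x ** -0.5     # decreasing on (0, 1], blows up at 0

for M in (10.0, 100.0, 1000.0):
    truncated = riemann(lambda x: min(M, f(x)), 0.0, 1.0)
    print(M, round(truncated, 4))     # 1.9, 1.99, 1.999: increases toward 2

# The exact truncated value is 2 - 1/M, so the supremum over M is 2.
```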

Since decreasing bounded functions are Riemann integrable, the above definition is well defined. Now here are some obvious properties.




Lemma 21.1.2 Let f be a decreasing nonnegative function defined on an interval [a, b]. Then if [a, b] = ∪_{k=1}^m I_k where I_k ≡ [a_k, b_k] and the intervals I_k are non overlapping, it follows that

∫_a^b f dλ = ∑_{k=1}^m ∫_{a_k}^{b_k} f dλ.

Proof: This follows from the computation,

∫_a^b f dλ ≡ lim_{M→∞} ∫_a^b f ∧ M dλ = lim_{M→∞} ∑_{k=1}^m ∫_{a_k}^{b_k} f ∧ M dλ = ∑_{k=1}^m ∫_{a_k}^{b_k} f dλ.

Note both sides could equal +∞. 

21.1.2 The Lebesgue Integral For Nonnegative Functions

Here is the definition of the Lebesgue integral of a function which is measurable and has values in [0, ∞].

Definition 21.1.3 Let (Ω, F, µ) be a measure space and suppose f : Ω → [0, ∞] is measurable. Then define

∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ

which makes sense because λ → µ([f > λ]) is nonnegative and decreasing. Note that if f ≤ g, then ∫ f dµ ≤ ∫ g dµ because µ([f > λ]) ≤ µ([g > λ]). For convenience, ∑_{i=1}^0 a_i ≡ 0.

Lemma 21.1.4 In the situation of the above definition,

∫ f dµ = sup_{h>0} ∑_{i=1}^∞ µ([f > ih]) h.

Proof: Let m(h, R) ∈ N satisfy R − h < h·m(h, R) ≤ R. Then lim_{R→∞} m(h, R) = ∞ and so

∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ = sup_M sup_{R>0} ∫_0^R µ([f > λ]) ∧ M dλ
= sup_M sup_{R>0} sup_{h>0} [ ∑_{k=1}^{m(h,R)} (µ([f > kh]) ∧ M) h + (µ([f > R]) ∧ M)(R − h·m(h, R)) ]  (21.1)
= sup_M sup_{R>0} sup_{h>0} ∑_{k=1}^{m(h,R)} (µ([f > kh]) ∧ M) h

because the sum in 21.1 is just a lower sum for the integral ∫_0^R µ([f > λ]) ∧ M dλ, these lower sums are increasing, and the last term is smaller than Mh. Hence, switching the order of the sups, this equals

sup_{R>0} sup_{h>0} sup_M ∑_{k=1}^{m(h,R)} (µ([f > kh]) ∧ M) h = sup_{R>0} sup_{h>0} lim_{M→∞} ∑_{k=1}^{m(h,R)} (µ([f > kh]) ∧ M) h
= sup_{h>0} sup_R ∑_{k=1}^{m(h,R)} µ([f > kh]) h = sup_{h>0} ∑_{k=1}^∞ µ([f > kh]) h. 


21.2 The Lebesgue Integral For Nonnegative Simple Functions

To begin with, here is a useful lemma.

Lemma 21.2.1 If f(λ) = 0 for all λ > a, where f is a decreasing nonnegative function, then

∫_0^∞ f(λ) dλ = ∫_0^a f(λ) dλ.

Proof: From the definition,

∫_0^∞ f(λ) dλ = lim_{R→∞} ∫_0^R f(λ) dλ = sup_{R>1} ∫_0^R f(λ) dλ
= sup_{R>1} sup_M ∫_0^R f(λ) ∧ M dλ = sup_M sup_{R>1} ∫_0^R f(λ) ∧ M dλ
= sup_M sup_{R>1} ∫_0^a f(λ) ∧ M dλ = sup_M ∫_0^a f(λ) ∧ M dλ ≡ ∫_0^a f(λ) dλ. 

Now that the Lebesgue integral for a nonnegative function has been defined, what does it do to a nonnegative simple function? Recall a nonnegative simple function is one which has finitely many nonnegative real values which it assumes on measurable sets. Thus a simple function can be written in the form

s(ω) = ∑_{i=1}^n c_i X_{E_i}(ω)

where the c_i are each nonnegative, the distinct values of s.

Lemma 21.2.2 Let s(ω) = ∑_{i=1}^p a_i X_{E_i}(ω) be a nonnegative simple function where the E_i are distinct but the a_i might not be. Then

∫ s dµ = ∑_{i=1}^p a_i µ(E_i).  (21.2)

Proof: Without loss of generality, assume 0 ≡ a₀ < a₁ ≤ a₂ ≤ · · · ≤ a_p and that µ(E_i) < ∞ for i > 0. Here is why. If µ(E_i) = ∞, then letting a ∈ (a_{i−1}, a_i), by Lemma 21.2.1 the left side would be

∫_0^∞ µ([s > λ]) dλ ≥ ∫_{a₀}^{a_i} µ([s > λ]) dλ ≡ sup_M ∫_0^{a_i} µ([s > λ]) ∧ M dλ ≥ sup_M M a_i = ∞

and so both sides are equal to ∞. Thus it can be assumed that for each i, µ(E_i) < ∞. Then it follows from Lemma 21.2.1 and Lemma 21.1.2 (noting that for λ ∈ (a_{k−1}, a_k), [s > λ] = ∪_{i=k}^p E_i),

∫_0^∞ µ([s > λ]) dλ = ∫_0^{a_p} µ([s > λ]) dλ = ∑_{k=1}^p ∫_{a_{k−1}}^{a_k} µ([s > λ]) dλ
= ∑_{k=1}^p (a_k − a_{k−1}) ∑_{i=k}^p µ(E_i) = ∑_{i=1}^p µ(E_i) ∑_{k=1}^i (a_k − a_{k−1}) = ∑_{i=1}^p a_i µ(E_i). 
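Lemma 21.2.2 can be checked numerically against Definition 21.1.3 and Lemma 21.1.4. The following Python sketch is only an illustration: the finite measure space and the simple function in it are invented examples, and the layer cake sum is evaluated with a small positive mesh h rather than a true supremum.

```python
# Compare the "layer cake" integral with the weighted sum over level sets.
w = {0: 0.5, 1: 1.0, 2: 0.25, 3: 2.0, 4: 0.75, 5: 1.5}   # mu({omega}) = w[omega]
s = {0: 0.0, 1: 3.0, 2: 1.0, 3: 1.0, 4: 2.0, 5: 3.0}     # a nonnegative simple function

def mu(level_set):
    return sum(w[o] for o in level_set)

def mu_above(lam):                       # lambda -> mu([s > lambda])
    return mu({o for o in w if s[o] > lam})

# Lemma 21.2.2: integral = sum of a_i * mu(E_i) over the level sets of s
by_levels = sum(a * mu({o for o in w if s[o] == a}) for a in set(s.values()))

# Lemma 21.1.4 with a small mesh h: integral ≈ sum_k mu([s > k h]) h
h = 1e-4
layer_cake = sum(mu_above(k * h) * h for k in range(1, int(max(s.values()) / h) + 1))

print(round(by_levels, 2), round(layer_cake, 2))   # both print 11.25
```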



Lemma 21.2.3 If a, b ≥ 0 and if s and t are nonnegative simple functions, then

∫ (as + bt) dµ = a ∫ s dµ + b ∫ t dµ.

Proof: Let

s(ω) = ∑_{i=1}^n α_i X_{A_i}(ω),  t(ω) = ∑_{j=1}^m β_j X_{B_j}(ω)

where the α_i are the distinct values of s and the β_j are the distinct values of t. Clearly as + bt is a nonnegative simple function because it has finitely many values on measurable sets. In fact,

(as + bt)(ω) = ∑_{j=1}^m ∑_{i=1}^n (aα_i + bβ_j) X_{A_i ∩ B_j}(ω)

where the sets A_i ∩ B_j are disjoint and measurable. By Lemma 21.2.2,

∫ (as + bt) dµ = ∑_{j=1}^m ∑_{i=1}^n (aα_i + bβ_j) µ(A_i ∩ B_j)
= a ∑_{i=1}^n ∑_{j=1}^m α_i µ(A_i ∩ B_j) + b ∑_{j=1}^m ∑_{i=1}^n β_j µ(A_i ∩ B_j)
= a ∑_{i=1}^n α_i µ(A_i) + b ∑_{j=1}^m β_j µ(B_j) = a ∫ s dµ + b ∫ t dµ. 

21.3 The Monotone Convergence Theorem

The following is called the monotone convergence theorem. This theorem and related convergence theorems are the reason for using the Lebesgue integral.

Theorem 21.3.1 (Monotone Convergence theorem) Let f have values in [0, ∞] and suppose {f_n} is a sequence of nonnegative measurable functions having values in [0, ∞] and satisfying

lim_{n→∞} f_n(ω) = f(ω) for each ω,  · · · f_n(ω) ≤ f_{n+1}(ω) · · · .

Then f is measurable and

∫ f dµ = lim_{n→∞} ∫ f_n dµ.

Proof: By Lemma 21.1.4,

lim_{n→∞} ∫ f_n dµ = sup_n ∫ f_n dµ = sup_n sup_{h>0} ∑_{k=1}^∞ µ([f_n > kh]) h
= sup_{h>0} sup_N sup_n ∑_{k=1}^N µ([f_n > kh]) h = sup_{h>0} sup_N ∑_{k=1}^N µ([f > kh]) h
= sup_{h>0} ∑_{k=1}^∞ µ([f > kh]) h = ∫ f dµ. 

To illustrate what goes wrong without the Lebesgue integral, consider the following example.



Example 21.3.2 Let {r_n} denote the rational numbers in [0, 1] and let

f_n(t) ≡ 1 if t ∈ {r₁, · · · , r_n}, and f_n(t) ≡ 0 otherwise.

Then f_n(t) ↑ f(t) where f is the function which is one on the rationals in [0, 1] and zero on the irrationals. Each f_n is Riemann integrable (why?) but f is not Riemann integrable. Therefore, you can't write ∫ f dx = lim_{n→∞} ∫ f_n dx. In fact, ∫ f_n dx = 0 for each n.

A meta-mathematical observation related to this type of example is this. If you can choose your functions, you don't need the Lebesgue integral; the Riemann Darboux integral is just fine. It is when you can't choose your functions and they come to you as pointwise limits that you really need the superior Lebesgue integral, or at least something more general than the Riemann integral. The Riemann integral is entirely adequate for evaluating the seemingly endless lists of boring problems found in calculus books. It is shown later that the two integrals coincide when the Lebesgue integral is taken with respect to Lebesgue measure and the function being integrated is Riemann integrable.
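The following Python sketch is only an illustration of this example; the partition size and the particular rational and irrational points it produces in each cell are invented for the demonstration. It records that each ∫ f_n dm = 0 (a finite set has measure zero), while every Darboux upper sum of the limit f is 1 and every lower sum is 0, so f is not Riemann integrable even though the monotone convergence theorem gives ∫ f dm = 0.

```python
from math import ceil, sqrt

# Each f_n is the indicator of a finite set, so its Lebesgue integral is 0.
print("integral of f_n with respect to m:", 0.0)

def rational_in(a, b):
    # a fraction p/q with a < p/q < b, found by taking q with 1/q < b - a
    q = ceil(1.0 / (b - a)) + 1
    p = ceil(a * q)
    if p / q <= a:
        p += 1
    return p, q

def irrational_in(a, b):
    # a + (b - a)/sqrt(2) is irrational when a, b are rational (floats only approximate this)
    return a + (b - a) / sqrt(2)

n_cells = 10
cells = [(k / n_cells, (k + 1) / n_cells) for k in range(n_cells)]
for a, b in cells:
    p, q = rational_in(a, b)
    x = irrational_in(a, b)
    assert a < p / q < b and a < x < b          # every cell meets Q and its complement

upper_sum = round(sum((b - a) * 1.0 for a, b in cells), 10)  # sup of f on each cell is 1
lower_sum = round(sum((b - a) * 0.0 for a, b in cells), 10)  # inf of f on each cell is 0
print("Darboux sums for f:", upper_sum, lower_sum)           # 1.0 and 0.0, for every partition
```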

21.4 Other Definitions

To review and summarize the above, if f ≥ 0 is measurable,

∫ f dµ ≡ ∫_0^∞ µ([f > λ]) dλ.  (21.3)

Another way to get the same thing for ∫ f dµ is to take an increasing sequence of nonnegative simple functions {s_n} with s_n(ω) → f(ω), and then by the monotone convergence theorem,

∫ f dµ = lim_{n→∞} ∫ s_n dµ,

where if s_n(ω) = ∑_{i=1}^m c_i X_{E_i}(ω), then

∫ s_n dµ = ∑_{i=1}^m c_i µ(E_i).

Similarly this also shows that for such a nonnegative measurable function,

∫ f dµ = sup {∫ s : 0 ≤ s ≤ f, s simple}.

Here is an equivalent definition of the integral of a nonnegative measurable function. The fact it is well defined has been discussed above.

Definition 21.4.1 For s a nonnegative simple function,

s(ω) = ∑_{k=1}^n c_k X_{E_k}(ω),  ∫ s = ∑_{k=1}^n c_k µ(E_k).

For f a nonnegative measurable function,

∫ f dµ = sup {∫ s : 0 ≤ s ≤ f, s simple}.
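The "sup over simple functions" description can be tested numerically with the standard dyadic approximations s_n(ω) = min(n, ⌊2ⁿ f(ω)⌋/2ⁿ), which satisfy s_n ≤ f and increase to f. The Python sketch below is only an illustration; the finite measure space and the function f in it are invented examples.

```python
from math import floor

w = {o: 0.1 * (o + 1) for o in range(10)}     # mu({o}) = w[o], a finite measure space
f = {o: (o / 3.0) ** 2 for o in range(10)}    # a nonnegative measurable function

exact = sum(f[o] * w[o] for o in w)           # on a finite space, ∫ f dmu is just this sum

def simple_integral(n):
    # the dyadic simple function s_n ≤ f and its integral, Definition 21.4.1
    s = {o: min(float(n), floor(f[o] * 2 ** n) / 2 ** n) for o in w}
    return sum(s[o] * w[o] for o in w)

for n in (1, 3, 6, 10):
    print(n, round(simple_integral(n), 6))    # increases with n
print("exact", round(exact, 6))               # the simple integrals increase to this value
```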



21.5 Fatou's Lemma

The next theorem, known as Fatou's lemma, is another important theorem which justifies the use of the Lebesgue integral.

Theorem 21.5.1 (Fatou's lemma) Let {f_n} be a sequence of nonnegative measurable functions and let g(ω) = lim inf_{n→∞} f_n(ω). Then g is measurable and

∫ g dµ ≤ lim inf_{n→∞} ∫ f_n dµ.

In other words,

∫ (lim inf_{n→∞} f_n) dµ ≤ lim inf_{n→∞} ∫ f_n dµ.

Proof: Let g_n(ω) = inf{f_k(ω) : k ≥ n}. Then

g_n^{−1}([a, ∞]) = ∩_{k=n}^∞ f_k^{−1}([a, ∞]) = (∪_{k=n}^∞ f_k^{−1}([a, ∞])^C)^C ∈ F.

Thus g_n is measurable by Lemma 20.1.4. Also g(ω) = lim_{n→∞} g_n(ω), so g is measurable because it is the pointwise limit of measurable functions. Now the functions g_n form an increasing sequence of nonnegative measurable functions, so the monotone convergence theorem applies. This yields

∫ g dµ = lim_{n→∞} ∫ g_n dµ ≤ lim inf_{n→∞} ∫ f_n dµ,

the last inequality holding because ∫ g_n dµ ≤ ∫ f_n dµ. (Note that it is not known whether lim_{n→∞} ∫ f_n dµ exists.) 
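Strict inequality really can occur in Fatou's lemma. The following Python sketch is only an illustration of the classical example f_n = n·X_(0,1/n) on [0, 1] with Lebesgue measure, which is not taken from the text: every ∫ f_n dm equals 1 while the pointwise lim inf is 0.

```python
# Fatou's lemma with strict inequality: f_n = n * X_(0, 1/n) on [0, 1].

def f(n, x):
    return float(n) if 0.0 < x < 1.0 / n else 0.0

# integrals: ∫ f_n dm = n * m((0, 1/n)) = 1 for every n
print([n * (1.0 / n) for n in (1, 10, 100)])            # [1.0, 1.0, 1.0]

# pointwise lim inf is 0: for fixed x, f_n(x) = 0 once n > 1/x (and f_n(0) = 0 always)
print([f(10**6, x) for x in (0.0, 1e-3, 0.3)])          # [0.0, 0.0, 0.0]

# Hence ∫ (lim inf f_n) dm = 0 < 1 = lim inf ∫ f_n dm.
```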

21.6 The Integral's Righteous Algebraic Desires

The monotone convergence theorem shows the integral wants to be linear. This is the essential content of the next theorem. We can't say it is linear yet because to be linear, something must be defined on a vector space or something similar where it makes sense to consider linear combinations, and the integral has only been defined at this point on nonnegative measurable functions.

Theorem 21.6.1 Let f, g be nonnegative measurable functions and let a, b be nonnegative numbers. Then af + bg is measurable and

∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ.  (21.4)

Proof: By Theorem 20.1.6 on Page 500 there exist increasing sequences of nonnegative simple functions, s_n → f and t_n → g. Then af + bg, being the pointwise limit of the simple functions a s_n + b t_n, is measurable. Now by the monotone convergence theorem and Lemma 21.2.3,

∫ (af + bg) dµ = lim_{n→∞} ∫ (a s_n + b t_n) dµ = lim_{n→∞} ( a ∫ s_n dµ + b ∫ t_n dµ ) = a ∫ f dµ + b ∫ g dµ. 

As long as you are allowing functions to take the value +∞, you cannot consider something like f + (−g), and so you can't very well expect a satisfactory statement about the integral being linear until you restrict yourself to functions which have values in a vector space. To be linear, a function must be defined on a vector space. This is discussed next.


21.7 The Lebesgue Integral, L1

The functions considered here have values in C, which is a vector space. A function f with values in C is of the form f = Re f + i Im f where Re f and Im f are real valued functions. In fact,

Re f = (f + f̄)/2,  Im f = (f − f̄)/(2i).

Definition 21.7.1 Let (Ω, S, µ) be a measure space and suppose f : Ω → C. Then f is said to be measurable if both Re f and Im f are measurable real valued functions.

Of course there is another definition of measurability which says that inverse images of open sets are measurable. This is equivalent to this new definition.

Lemma 21.7.2 Let f : Ω → C. Then f is measurable if and only if Re f, Im f are both real valued measurable functions. Also if f, g are complex measurable functions and a, b are complex scalars, then af + bg is also measurable.

Proof: ⇒ Suppose first that f is measurable. Recall that C is considered as R² with (x, y) being identified with x + iy. Thus the open sets of C can be obtained with either of the two equivalent norms |z| ≡ √((Re z)² + (Im z)²) or ∥z∥_∞ = max(|Re z|, |Im z|). Therefore, if f is measurable,

Re f^{−1}(a, b) ∩ Im f^{−1}(c, d) = f^{−1}((a, b) + i(c, d)) ∈ F.

In particular, you could let (c, d) = R and conclude that Re f is measurable because in this case, the above reduces to the statement that Re f^{−1}(a, b) ∈ F. Similarly Im f is measurable.

⇐ Next, if each of Re f and Im f is measurable, then

f^{−1}((a, b) + i(c, d)) = Re f^{−1}(a, b) ∩ Im f^{−1}(c, d) ∈ F

and so, since every open set is the countable union of sets of the form (a, b) + i(c, d), it follows that f is measurable.

Now consider the last claim. Let h : C × C → C be given by h(z, w) ≡ az + bw. Then h is continuous. If f, g are complex valued measurable functions, consider the complex valued function h ∘ (f, g) : Ω → C. Then

(h ∘ (f, g))^{−1}(open) = (f, g)^{−1}(h^{−1}(open)) = (f, g)^{−1}(open).

Now letting U, V be open in C,

(f, g)^{−1}(U × V) = f^{−1}(U) ∩ g^{−1}(V) ∈ F.

Since every open set in C × C is the countable union of sets of the form U × V, it follows that (f, g)^{−1}(open) is in F. Thus af + bg is also complex measurable. 

As is always the case for complex numbers, |z|² = (Re z)² + (Im z)². Also, for g a real valued function, one can consider its positive and negative parts, defined respectively as

g⁺(x) ≡ (g(x) + |g(x)|)/2,  g⁻(x) = (|g(x)| − g(x))/2.

Thus |g| = g + + g − and g = g + − g − and both g + and g − are measurable nonnegative functions if g is measurable. Then the following is the definition of what it means for a complex valued function f to be in L1 (Ω).



Definition 21.7.3 Let (Ω, F, µ) be a measure space. Then a complex valued measurable function f is in L1 (Ω) if

∫ |f| dµ < ∞.

For a function in L1 (Ω), the integral is defined as follows.

∫ f dµ ≡ ∫ (Re f)⁺ dµ − ∫ (Re f)⁻ dµ + i [ ∫ (Im f)⁺ dµ − ∫ (Im f)⁻ dµ ]

I will show that with this definition, the integral is linear and well defined. First note that it is clearly well defined because all the above integrals are of nonnegative functions and are each equal to a nonnegative real number, because for h equal to any of these functions, |h| ≤ |f| and ∫ |f| dµ < ∞. Here is a lemma which will make it possible to show the integral is linear.

Lemma 21.7.4 Let g, h, g′, h′ be nonnegative measurable functions in L1 (Ω) and suppose that g − h = g′ − h′. Then

∫ g dµ − ∫ h dµ = ∫ g′ dµ − ∫ h′ dµ.

Proof: By assumption, g + h′ = g′ + h. Then from the Lebesgue integral's righteous algebraic desires, Theorem 21.6.1,

∫ g dµ + ∫ h′ dµ = ∫ g′ dµ + ∫ h dµ

which implies the claimed result. 

Lemma 21.7.5 Let Re L1 (Ω) denote the vector space of real valued functions in L1 (Ω) where the field of scalars is the real numbers. Then ∫ dµ is linear on Re L1 (Ω), the scalars being real numbers.

Proof: First observe that from the definition of the positive and negative parts of a function,

(f + g)⁺ − (f + g)⁻ = f⁺ + g⁺ − (f⁻ + g⁻)

because both sides equal f + g. Therefore from Lemma 21.7.4 and the definition, it follows from Theorem 21.6.1 that

∫ (f + g) dµ ≡ ∫ (f + g)⁺ dµ − ∫ (f + g)⁻ dµ = ∫ (f⁺ + g⁺) dµ − ∫ (f⁻ + g⁻) dµ
= ∫ f⁺ dµ + ∫ g⁺ dµ − ( ∫ f⁻ dµ + ∫ g⁻ dµ ) = ∫ f dµ + ∫ g dµ.

What about taking out scalars? First note that if a is real and nonnegative, then (af)⁺ = a f⁺ and (af)⁻ = a f⁻, while if a < 0, then (af)⁺ = −a f⁻ and (af)⁻ = −a f⁺. These claims follow immediately from the above definitions of positive and negative parts of a function. Thus if a < 0 and f ∈ L1 (Ω), it follows from Theorem 21.6.1 that

∫ af dµ ≡ ∫ (af)⁺ dµ − ∫ (af)⁻ dµ = (−a) ∫ f⁻ dµ − (−a) ∫ f⁺ dµ
= a ( ∫ f⁺ dµ − ∫ f⁻ dµ ) ≡ a ∫ f dµ.

The case where a ≥ 0 works out similarly but easier. 

Now here is the main result.



Theorem 21.7.6 ∫ dµ is linear on L1 (Ω) and L1 (Ω) is a complex vector space. If f ∈ L1 (Ω), then Re f, Im f, and |f| are all in L1 (Ω). Furthermore, for f ∈ L1 (Ω),

∫ f dµ ≡ ∫ (Re f)⁺ dµ − ∫ (Re f)⁻ dµ + i [ ∫ (Im f)⁺ dµ − ∫ (Im f)⁻ dµ ] ≡ ∫ Re f dµ + i ∫ Im f dµ

and the triangle inequality holds,

|∫ f dµ| ≤ ∫ |f| dµ.

(21.5)

Also, for every f ∈ L1 (Ω) it follows that for every ε > 0 there exists a simple function s such that |s| ≤ |f| and

∫ |f − s| dµ < ε.

Also L1 (Ω) is a vector space.

Proof: First consider the claim that the integral is linear. It was shown above that the integral is linear on Re L1 (Ω). Then letting a + ib, c + id be scalars and f, g functions in L1 (Ω),

(a + ib) f + (c + id) g = (a + ib)(Re f + i Im f) + (c + id)(Re g + i Im g)
= a Re(f) − b Im(f) + c Re(g) − d Im(g) + i (b Re(f) + a Im(f) + d Re(g) + c Im(g)).

(b Re (f ) + c Im (g) + a Im (f ) + d Re (g))

Also, from the definition, ∫ ∫ (a + ib) f dµ + (c + id) gdµ

(21.6)

(∫

) ∫ Re f dµ + i Im f dµ (∫ ) ∫ + (c + id) Re gdµ + i Im gdµ

= (a + ib)

which equals ∫ =







Re f dµ − b Im f dµ + ib Re f dµ + ia Im f dµ ∫ ∫ ∫ ∫ +c Re gdµ − d Im gdµ + id Re gdµ − d Im gdµ. a

Using Lemma 21.7.5 and collecting terms, it follows that this reduces to 21.6. Thus the integral is linear as claimed. Consider the claim about approximation with a simple function. Letting h equal any of +



+



(Re f ) , (Re f ) , (Im f ) , (Im f ) ,

(21.7)

It follows from the monotone convergence theorem and Theorem 20.1.6 on Page 500 there exists a nonnegative simple function s ≤ h such that ∫ ε |h − s| dµ < . 4



Therefore, letting s1 , s2 , s3 , s4 be such simple functions, approximating respectively the functions listed in 21.7, and s ≡ s1 − s2 + i (s3 − s4 ) , ∫ ∫ ∫ + − |f − s| dµ ≤ (Re f ) − s1 dµ + (Re f ) − s2 dµ +

∫ ∫ − + (Im f ) − s3 dµ + (Im f ) − s4 dµ < ε

It is clear from the construction that |s| ≤ |f |. ∫ ∫ What about 21.5? Let θ ∈ C be such that |θ| = 1 and θ f dµ = f dµ . Then from what was shown above about the integral being linear, ∫ ∫ ∫ ∫ ∫ f dµ = θ f dµ = θf dµ = Re (θf ) dµ ≤ |f | dµ. If f, g ∈ L1 (Ω) , then it is known that for a, b scalars, it follows that af + bg is measurable. See Lemma 21.7.2. Also ∫ ∫ |af + bg| dµ ≤ |a| |f | + |b| |g| dµ < ∞.  The following corollary follows from this. The conditions of this corollary are sometimes taken as a definition of what it means for a function f to be in L1 (Ω). Corollary 21.7.7 f ∈ L1 (Ω) if and only if there exists a sequence of complex simple functions, {sn } such that sn (ω) → f (ω) for all ω ∈ Ω ∫ (21.8) limm,n→∞ (|sn − sm |) = 0 When f ∈ L1 (Ω) ,



∫ f dµ ≡ lim

sn .

n→∞

(21.9)

Proof: From the above theorem, if f ∈ L1 there exists a sequence of simple functions {sn } such that ∫ |f − sn | dµ < 1/n, sn (ω) → f (ω) for all ω Then



∫ |sn − sm | dµ ≤

∫ |sn − f | dµ +

|f − sm | dµ ≤

1 1 + . n m

Next suppose the existence of the approximating sequence of simple functions. Then f is measurable because its real and imaginary parts are the limit of measurable functions. By Fatou’s lemma, ∫ ∫ |f | dµ ≤ lim inf

n→∞

|sn | dµ < ∞

∫ ∫ ∫ |sn | dµ − |sm | dµ ≤ |sn − sm | dµ } {∫ |sn | dµ is a Cauchy sequence and is therefore, bounded. which is given to converge to 0. Hence In case f ∈ L1 (Ω) , letting {sn } be the approximating sequence, Fatou’s lemma implies ∫ ∫ ∫ ∫ f dµ − sn dµ ≤ |f − sn | dµ ≤ lim inf |sm − sn | dµ < ε because

m→∞

provided n is large enough. Hence 21.9 follows.  This is a good time to observe the following fundamental observation which follows from a repeat of the above arguments.



Theorem 21.7.8 Suppose Λ (f ) ∈ [0, ∞] for all nonnegative measurable functions and suppose that for a, b ≥ 0 and f, g nonnegative measurable functions, Λ (af + bg) = aΛ (f ) + bΛ (g) . In other words, Λ wants to be linear. Then Λ has a unique linear extension to the set of measurable functions {f measurable : Λ (|f |) < ∞} , this set being a vector space. Since all the manipulations used apply just as well to continuous as measurable, you can replace the word “measurable” in the above with the word “continuous” and ∫ dµ with Λ where Λ wants to be linear and acts like it is on the set of nonnegative functions.

21.8 The Dominated Convergence Theorem

One of the major theorems in this theory is the dominated convergence theorem. Before presenting it, here is a technical lemma about lim sup and lim inf which is really pretty obvious from the definition. Lemma 21.8.1 Let {an } be a sequence in [−∞, ∞] . Then limn→∞ an exists if and only if lim inf an = lim sup an n→∞

n→∞

and in this case, the limit equals the common value of these two numbers. Proof: Suppose first limn→∞ an = a ∈ R. Then, letting ε > 0 be given, an ∈ (a − ε, a + ε) for all n large enough, say n ≥ N. Therefore, both inf {ak : k ≥ n} and sup {ak : k ≥ n} are contained in [a − ε, a + ε] whenever n ≥ N. It follows lim supn→∞ an and lim inf n→∞ an are both in [a − ε, a + ε] , showing lim inf an − lim sup an < 2ε. n→∞

n→∞

Since ε is arbitrary, the two must be equal and they both must equal a. Next suppose limn→∞ an = ∞. Then if l ∈ R, there exists N such that for n ≥ N, l ≤ an and therefore, for such n, l ≤ inf {ak : k ≥ n} ≤ sup {ak : k ≥ n} and this shows, since l is arbitrary that lim inf an = lim sup an = ∞. n→∞

n→∞

The case for −∞ is similar. Conversely, suppose lim inf n→∞ an = lim supn→∞ an = a. Suppose first that a ∈ R. Then, letting ε > 0 be given, there exists N such that if n ≥ N, sup {ak : k ≥ n} − inf {ak : k ≥ n} < ε therefore, if k, m > N, and ak > am , |ak − am | = ak − am ≤ sup {ak : k ≥ n} − inf {ak : k ≥ n} < ε showing that {an } is a Cauchy sequence. Therefore, it converges to a ∈ R, and as in the first part, the lim inf and lim sup both equal a. If lim inf n→∞ an = lim supn→∞ an = ∞, then given l ∈ R, there exists N such that for n ≥ N, inf an > l. n>N

Therefore, limn→∞ an = ∞. The case for −∞ is similar.  Here is the dominated convergence theorem.



Theorem 21.8.2 (Dominated Convergence theorem) Let fn ∈ L1 (Ω) and suppose f (ω) = lim fn (ω), n→∞

and there exists a measurable function g, with values in [0, ∞],1 such that ∫ |fn (ω)| ≤ g(ω) and g(ω)dµ < ∞. Then f ∈ L1 (Ω) and

∫ ∫ |fn − f | dµ = lim f dµ − fn dµ n→∞

∫ 0 = lim

n→∞

Proof: f is measurable by Theorem 20.1.2. Since |f | ≤ g, it follows that f ∈ L1 (Ω) and |f − fn | ≤ 2g. By Fatou’s lemma (Theorem 21.5.1), ∫ 2gdµ ≤

∫ lim inf 2g − |f − fn |dµ n→∞ ∫ ∫ 2gdµ − lim sup |f − fn |dµ.

= Subtracting



n→∞

2gdµ,

∫ |f − fn |dµ.

0 ≤ − lim sup

n→∞

Hence

(∫ ≥ lim sup

0

n→∞

) |f − fn |dµ

(∫

≥ lim inf

) |f − fn |dµ

n→∞

∫ ∫ ≥ f dµ − fn dµ ≥ 0.

This proves the theorem by Lemma 21.8.1 because the lim sup and lim inf are equal.  Corollary 21.8.3 Suppose fn ∈ L1 (Ω) and f (ω) = limn→∞ fn (ω) ∫ . Suppose ∫ also there exist measurable functions, g , g with values in [0, ∞] such that lim g dµ = gdµ, gn (ω) → g (ω) µ n n→∞ n ∫ ∫ a.e. and both gn dµ and gdµ are finite. Also suppose |fn (ω)| ≤ gn (ω) . Then ∫ lim |f − fn | dµ = 0. n→∞

Proof: It is just like the above. This time g + gn − |f − fn | ≥ 0 and so by Fatou’s lemma, ∫ ∫ |f − fn | dµ = 2gdµ − lim sup n→∞





|f − fn | dµ ∫ ∫ = lim inf ((gn + g) − |f − fn |) dµ ≥ 2gdµ lim inf

n→∞

and so − lim supn→∞ 0



(gn + g) dµ − lim sup

n→∞

n→∞

|f − fn | dµ ≥ 0. Thus (∫ ) ≥ lim sup |f − fn |dµ n→∞ ) ∫ (∫ ∫ |f − fn |dµ ≥ f dµ − fn dµ ≥ 0.  ≥ lim inf n→∞

1 Note

that, since g is allowed to have the value ∞, it is not known that g ∈ L1 (Ω) .



Definition 21.8.4 Let E be a measurable subset of Ω. Then

∫_E f dµ ≡ ∫ f X_E dµ.

If L1 (E) is written, the σ algebra is defined as {E ∩ A : A ∈ F } and the measure is µ restricted to this smaller σ algebra. Clearly, if f ∈ L1 (Ω), then f XE ∈ L1 (E) and if f ∈ L1 (E), then letting f˜ be the 0 extension of f off of E, it follows f˜ ∈ L1 (Ω).

21.9 Exercises

1. Let Ω = N ={1, 2, · · · }. Let F = P(N), the set of all subsets of N, and let µ(S) = number of elements in S. Thus µ({1}) = 1 = µ({2}), µ({1, 2}) = 2, etc. In this case, all functions are measurable. For a nonnegative function, f defined on N, show ∫ f dµ = N

∞ ∑

f (k)

k=1

What do the monotone convergence and dominated convergence theorems say about this example? 2. For the measure space of Problem 1, give an example of a sequence of nonnegative measurable functions {fn } converging pointwise to a function f , such that inequality is obtained in Fatou’s lemma. 3. If (Ω, F, µ) is a measure space ∫ ∫ and f ≥ 0 is measurable,1 show that if g (ω) = f (ω) a.e. ω and g ≥ 0, then gdµ = f dµ. Show that if f, g ∈ L (Ω) and g (ω) = f (ω) a.e. then ∫ ∫ gdµ = f dµ. 4. Let {fn } , f be measurable functions with values in C. {fn } converges in measure if lim µ(x ∈ Ω : |f (x) − fn (x)| ≥ ε) = 0

n→∞

for each fixed ε > 0. Prove the theorem of F. Riesz. If fn converges to f in measure, then there exists a subsequence {fnk } which converges to f a.e. In case µ is a probability measure, this is called convergence in probability. It does not imply pointwise convergence but does imply that there is a subsequence which converges pointwise off a set of measure zero. Hint: Choose n1 such that µ(x : |f (x) − fn1 (x)| ≥ 1) < 1/2. Choose n2 > n1 such that µ(x : |f (x) − fn2 (x)| ≥ 1/2) < 1/22, n3 > n2 such that

µ(x : |f (x) − fn3 (x)| ≥ 1/3) < 1/23,

etc. Now consider what it means for fnk (x) to fail to converge to f (x). Use the Borel Cantelli lemma of Problem 14 on Page 510.



5. Suppose (Ω, µ) is a finite measure space (µ (Ω) < ∞) and S ⊆ L1 (Ω). Then S is said to be uniformly integrable if for every ε > 0 there exists δ > 0 such that if E is a measurable set satisfying µ (E) < δ, then ∫ |f | dµ < ε E

for all f ∈ S. Show S is uniformly integrable and bounded in L1 (Ω) if there exists an increasing function h which satisfies {∫ } h (t) lim = ∞, sup h (|f |) dµ : f ∈ S < ∞. t→∞ t Ω S is bounded if there is some number, M such that ∫ |f | dµ ≤ M for all f ∈ S. 6. A collection S ⊆ L1 (Ω) , (Ω, F, µ) a finite measure space, is called equiintegrable if for every ε > 0 there exists λ > 0 such that ∫ |f | dµ < ε [|f |≥λ]

for all f ∈ S. Show that S is equiintegrable, if and only if it is uniformly integrable and bounded. The equiintegrable condition is pretty popular in probability. 7. There is a general construction called product measure. You have two finite measure spaces. (X, F, µ) , (Y, G, ν) Let K be the π system of measurable rectangles A × B where A ∈ F and B ∈ G. Explain why this is really a π system. Now let F × G denote the smallest σ algebra which contains K. Let { } ∫ ∫ ∫ ∫ P≡ A∈F ×G : XA dνdµ = XA dµdν X

Y

Y

X

where both integrals make sense and are equal. Then show that P is closed with respect to complements and countable disjoint unions. By Dynkin’s lemma, P = F × G. Then define a measure µ × ν as follows. For A ∈ F × G ∫ ∫ µ × ν (A) ≡ XA dνdµ X

Y

Explain why this is a measure and why if f is F × G measurable and nonnegative, then ∫ ∫ ∫ ∫ ∫ f d (µ × ν) = XA dνdµ = XA dµdν X×Y

X

Y

Y

X

Hint: This is just a repeat of what I showed you in class except that it is easier because the measures are finite. Pay special attention to the way the monotone convergence theorem is used. 8. Let (X, F, µ) be a regular measure space. For example, it could be Rp with Lebesgue measure. Why do we care about a measure space being regular? This problem will show why. Suppose that closures of balls are compact as in the case of Rp . (a) Let µ (E) < ∞. By regularity, there exists K ⊆ E ⊆ V where K is compact and V is ¯ ⊆ V and W ¯ open such that µ (V \ K) < ε. Show there exists W open such that K ⊆ W is compact. Now show there exists a function h such that h has values in [0, 1] , h (x) = 1 for x ∈ K, and h (x) equals 0 off W . Hint: You might consider Problem 12 on Page 510.



(b) Show that

∫ |XE − h| dµ < ε

∑n (c) Next suppose s = i=1 ci XEi is a nonnegative simple function where each µ (Ei ) < ∞. Show there exists a continuous nonnegative function h which equals zero off some compact set such that ∫ |s − h| dµ < ε (d) Now suppose f ≥ 0 and f ∈ L1 (Ω) . Show that there exists h ≥ 0 which is continuous and equals zero off a compact set such that ∫ |f − h| dµ < ε (e) If f ∈ L1 (Ω) with complex values, show the conclusion in the above part of this problem is the same. 9. Let (Ω, F, µ) be a measure space and suppose f, g : Ω → (−∞, ∞] are measurable. Prove the sets {ω : f (ω) < g(ω)} and {ω : f (ω) = g(ω)} are measurable. Hint: The easy way to do this is to write {ω : f (ω) < g(ω)} = ∪r∈Q [f < r] ∩ [g > r] . Note that l (x, y) = x − y is not continuous on (−∞, ∞] so the obvious idea doesn’t work. Here [g > r] signifies {ω : g (ω) > r}. 10. Let {fn } be a sequence of real or complex valued measurable functions. Let S = {ω : {fn (ω)} converges}. Show S is measurable. Hint: You might try to exhibit the set where fn converges in terms of countable unions and intersections using the definition of a Cauchy sequence. 11. Suppose un (t) is a differentiable function for t ∈ (a, b) and suppose that for t ∈ (a, b), |un (t)|, |u′n (t)| < Kn where

∑∞ n=1

Kn < ∞. Show (

∞ ∑

un (t))′ =

n=1

∞ ∑

u′n (t).

n=1

Hint: This is an exercise in the use of the dominated convergence theorem and the mean value theorem. 12. Suppose {fn } is a sequence of nonnegative measurable functions defined on a measure space, (Ω, S, µ). Show that ∫ ∑ ∞ ∞ ∫ ∑ fk dµ = fk dµ. k=1

k=1

Hint: Use the monotone convergence theorem along with the fact the integral is linear.



13. Explain why for each t > 0, x → e−tx is a function in L1 (R) and ∫ ∞ 1 e−tx dx = . t 0 Thus



R

0

sin (t) dt = t

∫ 0

R





sin (t) e−tx dxdt

0

Now explain why you can change the order of integration in the above iterated integral. Then compute what you get. Next pass to a limit as R → ∞ and show ∫ ∞ 1 sin (t) dt = π t 2 0 This is a very important integral. Note that the thing on the left is an improper integral. sin (t) /t is not Lebesgue integrable because it is not absolutely integrable. That is ∫ ∞ sin t t dm = ∞ 0 It is important to understand that the Lebesgue theory of integration only applies to nonnegative functions and those which are absolutely integrable. ∑n k 14. Show limn→∞ 2nn k=1 2k = 2. This problem was shown to me by Shane Tang, a former student. It is a nice exercise in dominated convergence theorem if you massage it a little. Hint: ( ) n−1 n n n−1 n−1 ∑ ∑ ∑ ∑ n ∑ 2k l k−n n −l n −l = 2 = 2 = 2 1 + ≤ 2−l (1 + l) n 2 k k n−l n−l k=1

k=1

l=0

l=0

l



15. Let the rational numbers in [0, 1] be {rk }k=1 and define { 1 if t ∈ {r1 , · · · , rn } fn (t) = 0 if t ∈ / {r1 , · · · , rn } Show that limn→∞ fn (t) = f (t) where f is one on the rational numbers and 0 on the irrational numbers. Explain why each fn is Riemann integrable but f is not. However, each fn is actually a simple function and its Lebesgue and Riemann integral is equal to 0. Apply ∫ the monotone convergence theorem to conclude that f is Lebesgue integrable and in fact, f dm = 0. 16. Give an example of a sequence∫of functions {fn } , fn∫ ≥ 0 and a function f ≥ 0 such that f (x) = lim inf n→∞ fn (x) but f dm < lim inf n→∞ fn dm so you get strict inequality in Fatou’s lemma. 17. Let f be a nonnegative Riemann integrable function defined on [a, b] . Thus there is a unique number between all the upper sums and lower sums. First explain why, if ai ≥ 0, ∫ ∑ n ∑ ai X[ti ,ti−1 ) (t) dm = ai (ti − ti−1 ) i=1

i

Explain why there exists an increasing sequence of Borel measurable functions {gn } converging to a Borel measurable function g, and a decreasing sequence of functions {hn } which are also Borel measurable converging to a Borel measurable function h such that gn ≤ f ≤ hn , ∫ gn dm equals a lower sum ∫ hn dm equals an upper sum



∫ and (h − g) dm = 0. Explain why {x : f (x) ̸= g (x)} is a set of measure zero. Then explain ∫b ∫ why f is measurable and a f (x) dx = f dm so that the Riemann integral gives the same answer as the Lebesgue integral.



Chapter 22

Measures From Positive Linear Functionals

Rudin does it this way and I really don't know a better way to do it. In this chapter, we will only consider the measure space to be a complete separable metric space in which the measure of balls is finite. Rudin does this for an arbitrary locally compact Hausdorff space, but all examples of interest to me are metric spaces. In fact, the one of most interest is Rⁿ.

Definition 22.0.1 Let Ω be a Polish space (complete separable metric space). Define Cc (Ω) to be the functions which have complex values and compact support. This means spt(f), the closure of {x ∈ Ω : f(x) ≠ 0}, is a compact set. Then L : Cc (Ω) → C is called a positive linear functional if it is linear and if, whenever f ≥ 0, then L(f) ≥ 0 also.

The following definition gives some notation.

Definition 22.0.2 If K is a compact subset of an open set V, then K ≺ ϕ ≺ V if ϕ ∈ Cc (V), ϕ(K) = {1}, ϕ(Ω) ⊆ [0, 1], where Ω denotes the whole topological space considered. Also for ϕ ∈ Cc (Ω), K ≺ ϕ if ϕ(Ω) ⊆ [0, 1] and ϕ(K) = 1, and ϕ ≺ V if

ϕ(Ω) ⊆ [0, 1] and spt(ϕ) ⊆ V.

Now we need some lemmas.

Theorem 22.0.3 Let H be a compact subset of an open set U in a metric space having the property that the closures of balls are compact. Then there exists an open set V such that H ⊆ V ⊆ V̄ ⊆ U with V̄ compact. There also exists f such that H ≺ f ≺ V, meaning that f = 1 on H and spt(f) ⊆ V̄.

Proof: Consider the continuous function h → dist(h, U^C). This function achieves its minimum at some h₀ ∈ H because H is compact. Let δ ≡ dist(h₀, U^C). The distance is positive because U^C is closed. Now H ⊆ ∪_{h∈H} B(h, δ). Since H is compact, there are finitely many of these balls which cover H. Say H ⊆ ∪_{i=1}^k B(h_i, δ) ≡ V. Then, since there are finitely many of these balls, V̄ = ∪_{i=1}^k closure(B(h_i, δ)), which is a compact set since it is a finite union of compact sets.



To obtain f, let

f(x) ≡ dist(x, V^C) / (dist(x, V^C) + dist(x, H)).

Then f (x) ≤ 1 and if x ∈ H, its distance to V C is positive and dist (x, H) = 0 so f (x) = 1. If x ∈ V C , then its distance to H is positive and so f (x) = 0. It is obviously continuous because the denominator is a continuous function and never vanishes. Thus H ≺ f ≺ V .  Theorem 22.0.4 (Partition of unity) Let K be a compact subset of a Polish space in which the closures of balls are compact and suppose K ⊆ V = ∪ni=1 Vi , Vi open. Then there exist ψ i ≺ Vi with

n ∑

ψ i (x) = 1

i=1

for all x ∈ K. If H is a compact subset of Vi for some Vi there exists a partition of unity such that ψ i (x) = 1 for all x ∈ H Proof: Let K1 = K \ ∪ni=2 Vi . Thus K1 is compact and K1 ⊆ V1 . Let K1 ⊆ W1 ⊆ W 1 ⊆ V1 with W 1 compact. To obtain W1 , use Theorem 22.0.3 to get f such that K1 ≺ f ≺ V1 and let W1 ≡ {x : f (x) ̸= 0} . Thus W1 , V2 , · · · Vn covers K and W 1 ⊆ V1 . Let K2 = K \(∪ni=3 Vi ∪W1 ). Then K2 is compact and K2 ⊆ V2 . Let K2 ⊆ W2 ⊆ W 2 ⊆ V2 W 2 compact. Continue this way finally obtaining W1 , · · · , Wn , K ⊆ W1 ∪ · · · ∪ Wn , and W i ⊆ Vi W i compact. Now let W i ⊆ Ui ⊆ U i ⊆ Vi , U i compact. By Theorem 22.0.3, let U i ≺ ϕi ≺ Vi , ∪ni=1 W i ≺ γ ≺ ∪ni=1 Ui . Define { ∑n ∑n γ(x)ϕi (x)/ j=1 ϕj (x) if j=1 ϕj (x) ̸= 0, ψ (x) = ∑ Wi U i V i i n 0 if ϕ (x) = 0. j=1 j ∑n If x is such that j=1 ϕj (x) = 0, then x ∈ / ∪ni=1 U i . Consequently γ(y) ∑n = 0 for all y near x and so ψ i (y) = 0 for all y near x. Hence ψ i is continuous at such x. If so ψ i is continuous at such points. Therefore j=1 ϕj (x) ̸= 0, this situation persists near x and ∑ n ψ i is continuous. If x ∈ K, then γ(x) = 1 and so j=1 ψ j (x) = 1. Clearly 0 ≤ ψ i (x) ≤ 1 and fj ≡ Vj \ H. Now in the spt(ψ j ) ⊆ Vj . As to the last claim, keep Vi the same but replace Vj with V proof above, applied to this modified collection of open sets, if j ̸= i, ϕj (x) = 0 whenever x ∈ H. Therefore, ψ i (x) = 1 on H. Now with this preparation, here is the main result called the Riesz representation theorem for positive linear functionals. Theorem 22.0.5 (Riesz representation theorem) Let (Ω, τ ) be a Polish space for which the closures of the balls are compact and let L be a positive linear functional on Cc (Ω). Then there exists a σ algebra S containing the Borel sets and a unique measure µ, defined on S, such that

µ(K)
µ(Vi ). µ(E) ≤ µ(∪∞ i=1 Vi ) ≤

∞ ∑

µ(Vi ) ≤ ε +

∑∞ i=1

µ(Ei ).

i=1

i=1

Since ε was arbitrary, µ(E) ≤

∞ ∑

µ(Ei ) which proves the lemma. 

Lemma 22.0.8 Let K be compact, g ≥ 0, g ∈ Cc (Ω), and g = 1 on K. Then µ(K) ≤ Lg. Also µ(K) < ∞ whenever K is compact. Proof: Let α ∈ (0, 1) and Vα = {x : g(x) > α} so Vα ⊇ K and let h ≺ Vα .

Then h ≤ 1 on V_α while gα^{−1} ≥ 1 on V_α, so gα^{−1} ≥ h, which implies L(gα^{−1}) ≥ Lh and therefore, since L is linear, Lg ≥ αLh. Since h ≺ V_α is arbitrary and K ⊆ V_α, Lg ≥ αµ(V_α) ≥ αµ(K). Letting α ↑ 1 yields Lg ≥ µ(K). This proves the first part of the lemma. The second assertion follows from this and Theorem 22.0.3. If K is given, let K ≺ g ≺ Ω and so from what was just shown, µ(K) ≤ Lg < ∞. ∎
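The function used in the proof of Theorem 22.0.3 is easy to experiment with numerically. The following Python sketch is an illustration only; the particular sets H = [−1, 1] and V = (−2, 2) in R below are hypothetical choices, not taken from the text. It evaluates f(x) = dist(x, V^C)/(dist(x, V^C) + dist(x, H)) and checks that f = 1 on H and f = 0 off V.

    # Sketch: the Urysohn-type function from the proof of Theorem 22.0.3, in one dimension.
    # H = [-1, 1] (compact), V = (-2, 2) (open); these sets are illustrative assumptions.
    def dist_to_H(x):                  # distance from x to H = [-1, 1]
        return max(abs(x) - 1.0, 0.0)

    def dist_to_V_complement(x):       # distance from x to V^C = (-inf, -2] U [2, inf)
        return max(2.0 - abs(x), 0.0)

    def f(x):
        num = dist_to_V_complement(x)
        return num / (num + dist_to_H(x))   # denominator never vanishes since dist(H, V^C) > 0

    for x in [0.0, 0.5, 1.0, 1.5, 1.99, 2.0, 3.0]:
        print(x, round(f(x), 4))       # f = 1 on H, strictly between 0 and 1 on V \ H, 0 outside V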



Lemma 22.0.9 If A and B are disjoint subsets of Ω, with dist (A, B) > 0 then µ(A ∪ B) = µ(A) + µ(B). Proof: There )is nothing to show ( if)µ (A ∪ B) = ∞. Thus we can let δ ≡ dist (A, B) > 0. Then let ( U1 ≡ ∪a∈A B a, 3δ , V1 ≡ ∪b∈B B b, 3δ . It follows that these two open sets have empty intersection. Also, there exists W ⊇ A ∪ B such that µ (W ) − ε < µ (A ∪ B). let U ≡ U1 ∩ W, V ≡ V1 ∩ W. Then µ (A ∪ B) + ε > µ (W ) ≥ µ (U ∪ V ) Now let f ≺ U, g ≺ V such that Lf + ε > µ (U ) , Lg + ε > µ (V ) . Then µ (U ∪ V ) ≥ L (f + g) = L (f ) + L (g) > µ (U ) − ε + (µ (V ) − ε) ≥ µ (A) + µ (B) − 2ε It follows that µ (A ∪ B) + ε > µ (A) + µ (B) − 2ε and since ε is arbitrary, µ (A ∪ B) ≥ µ (A) + µ (B) ≥ µ (A ∪ B).  It follows from Theorem 20.5.2 that the measurable sets S contains the Borel σ algebra B (Ω). Since closures of balls are compact, it follows from Lemma 22.0.8 that µ is finite on every ball. Corollary 20.4.8 implies that µ is regular for every E a Borel set. That is, µ (E) = sup {µ (K) : K ⊆ E} , µ (E) = inf {µ (V ) : V ⊇ E} In particular, µ is inner regular on every open set V . This is obtained immediately. In fact the same thing holds for any F ∈ S in place of E in the above. The second of the two follows immediately from the definition of µ. It remains to verify the first. In doing so, first assume that µ (F ) is contained in a closed ball B. Let V ⊇ (B \ F ) such that µ (V ) < µ (B \ F ) + ε Then µ (V \ (B \ F )) + µ (B \ F ) = µ (V ) < µ (B \ F ) + ε and so µ (V \ (B \ F )) < ε. Now consider V C ∩ F . This is a closed subset of F . To see that it is closed, note that V C ∩ F = V C ∩ B which is a closed set. Why is this so? It is clear that V C ∩ F ⊆ V C ∩ B. Now if x ∈ V C ∩ B, then since V ⊇ (B \ F ) , it follows that x ∈ V C ⊆ B C ∪ F and so either x ∈ B C which doesn’t occur, or x ∈ F and so this must be the case. Hence, V C ∩ B is a closed, hence compact subset of F . Now ( ( )) ( ( )) µ F \ VC ∩B = µ F ∩ V ∪ BC ≤ µ (V \ B) ≤ µ (V \ (B \ F )) < ε (

)

It follows that µ (F ) < µ V C ∩ B + ε which shows inner regularity in case F is contained in some closed ball B. If this is not the case, let Bn be a sequence of closed balls having increasing radii and let Fn = Bn ∩ F . Then if l < µ (F ) , it follows that µ (Fn ) > l for all large enough n. Then picking one of these, it follows from what was just shown that there is a compact set K ⊆ Fn such that also µ (K) > l. Thus S contains the Borel sets and µ is inner regular on all sets of S. It remains to show µ satisfies 22.3. ∫ Lemma 22.0.10 f dµ = Lf for all f ∈ Cc (Ω). Proof: Let f ∈ Cc (Ω), f real-valued, and suppose f (Ω) ⊆ [a, b]. Choose t0 < a and let t0 < t1 < · · · < tn = b, ti − ti−1 < ε. Let Ei = f −1 ((ti−1 , ti ]) ∩ spt(f ).

(22.4)

Note that ∪_{i=1}^n E_i is a closed set, and in fact ∪_{i=1}^n E_i = spt(f)

(22.5)

since Ω = ∪ni=1 f −1 ((ti−1 , ti ]). Let Vi ⊇ Ei , Vi is open and let Vi satisfy f (ω) < ti + ε for all ω ∈ Vi ,

(22.6)

µ(Vi \ Ei ) < ε/n. By Theorem 22.0.4 there exists hi ∈ Cc (Ω) such that hi ≺ Vi ,

∑_{i=1}^n h_i(ω) = 1 on spt(f).

Now note that for each i, f (ω)hi (ω) ≤ hi (ω)(ti + ε). (If ω ∈ Vi , this follows from 22.6. If ω ∈ / Vi both sides equal 0.) Therefore, Lf

n n ∑ ∑ L( f hi ) ≤ L( hi (ti + ε))

=

i=1 n ∑

=

i=1 n ∑

=

i=1

(ti + ε)L(hi ) (|t0 | + ti + ε)L(hi ) − |t0 |L

i=1

( n ∑

) hi .

i=1

Now note that |t0 | + ti + ε ≥ 0 and so from the definition of µ and Lemma 22.0.8, this is no larger than n ∑ (|t0 | + ti + ε)µ(Vi ) − |t0 |µ(spt(f )) i=1



n ∑

(|t0 | + ti + ε) (µ(Ei ) + ε/n) − |t0 |µ(spt(f ))

i=1 µ(spt(f ))

z }| { n n ∑ ∑ ≤ |t0 | ti µ(Ei ) + ε(|t0 | + |b|) µ(Ei ) + |t0 |ε + i=1 n ∑ i=1

ti

i=1 n ∑

ε µ(Ei ) + ε2 − |t0 |µ(spt(f )). +ε n i=1

From 22.5 and 22.4, the first and last terms cancel. Therefore this is no larger than (2|t0 | + |b| + µ(spt(f )) + ε)ε n n ∑ ∑ ε + ti−1 µ(Ei ) + εµ(spt(f )) + (|t0 | + |b|) n i=1 i=1 ∫ ≤

f dµ + (2|t0 | + |b| + 2µ(spt(f )) + ε)ε + (|t0 | + |b|) ε

Since ε > 0 is arbitrary,

Lf ≤ ∫ f dµ    (22.7)



∫ ∫ for all f ∈ C∫c (Ω), f real. Hence equality holds in 22.7 because L(−f ) ≤ − f dµ so L(f ) ≥ f dµ. Thus Lf = f dµ for all f ∈ Cc (Ω). Just apply the result for real functions to the real and imaginary parts of f . This proves the Lemma. This gives the existence part of the Riesz representation theorem. It only remains to prove uniqueness. Suppose both µ1 and µ2 are measures on S satisfying the conclusions of the theorem. Then if K is compact and V ⊇ K, let K ≺ f ≺ V . Then ∫ ∫ µ1 (K) ≤ f dµ1 = Lf = f dµ2 ≤ µ2 (V ). Thus µ1 (K) ≤ µ2 (K) for all K. Similarly, the inequality can be reversed and so it follows the two measures are equal on compact sets. By the assumption of inner regularity on open sets, the two measures are also equal on all open sets. By outer regularity, they are equal on all sets of S.  The regularity of this measure is very significant. Here is a useful lemma. Lemma 22.0.11 Suppose f : Ω → [0, ∞) is measurable where µ is a regular measure as in the above theorem having the measure of any ball finite and the closures of balls compact. Then there is a set of measure zero N and a sequence of functions {hn } , hn : Ω → [0, ∞) each in Cc (Ω) such that for all ω ∈ Ω \ N, hn (ω) → f (ω) . Also, for ω ∈ / N, hn (ω) ≤ f (ω) for all n large enough. Proof: Consider fn (ω) ≡ XBn (ω) min (f (ω) , n) where Bn is a ball centered at ω 0 which has radius n. Thus fn (ω) is an increasing sequence and converges to f (ω) for each ω. Also by Corollary 20.1.6, there exists a simple function sn such that sn (ω) ≤ fn (ω) , sup |fn (ω) − sn (ω)| < ω∈Ω

Let sn (ω) =

mn ∑

1 2n

cnk XEkn (ω) , cnk > 0

k=1

∫ Then it must be the case that < ∞ because fn dµ < ∞. By regularity, there exists a compact set Kkn and an open set Vkn such that µ (Ekn )

Kkn



Ekn



Vkn ,

mn ∑

µ (Vkn \ Kkn )
r(B_{m+1}) > (3/4) sup{r : B(a, r) ∈ F, a ∈ A_m}.    (22.11)

Then letting B_j = B(a_j, r_j), this sequence satisfies

{B(a_j, r_j/3)}_{j=1}^J are disjoint,  A ⊆ ∪_{i=1}^J B_i.    (22.12)



Proof: First note that Bm+1 can be chosen as in 22.11. This is because the Am are decreasing and so 3 sup {r : B (a, r) ∈ F , a ∈ Am } 4 3 sup {r : B (a, r) ∈ F , a ∈ Am−1 } < r (Bm ) ≤ 4 Thus the r (Bk ) are strictly decreasing and so no Bk contains a center of any other Bj . If x ∈ B (aj , rj /3) ∩ B (ai , ri /3) where these balls are two which are chosen by the above scheme such that j > i, then from what was just shown ( ) rj ri 1 1 2 ∥aj − ai ∥ ≤ ∥aj − x∥ + ∥x − ai ∥ ≤ + ≤ + ri = ri < ri 3 3 3 3 3 and this contradicts the construction because aj is not covered by B (ai , ri ). Finally consider the claim that A ⊆ ∪Ji=1 Bi . Pick B1 satisfying 22.9. If B1 , · · · , Bm have been chosen, and Am is given in 22.10, then if Am = ∅, it follows A ⊆ ∪m i=1 Bi . Set J = m. Now let a be the center of Ba ∈ F. If a ∈ Am for all m,(That is a( does not get covered by the r ) Bi .) then rm+1 ≥ 43 r (Ba ) for all m, a contradiction since the balls B aj , 3j are disjoint and A is bounded, implying that rj → 0. Thus a must fail to be in some Am which means it was covered by some ball in the sequence.  The covering theorem is obtained by estimating how many Bj can intersect Bk for j < k. The thing to notice is that from the construction, no Bj contains the center of another Bi . Also, the r (Bk ) is a decreasing sequence. Let α > 1. There are two cases for an intersection. Either r (Bj ) ≥ αr (Bk ) or αr (Bk ) > r (Bj ) > r (Bk ). First consider the case where we have a ball B (a,r) intersected with other balls of radius larger than αr such that none of the balls contains the center of any other. This is illustrated in the following picture with two balls. This has to do with estimating the number of Bj for j ≤ k where r (Bj ) ≥ αr (Bk ). rx

[Figure: a ball B_a with center a and radius r, two larger balls B_x and B_y with centers x, y and radii r_x, r_y meeting it, and the projections p_x, p_y of x and y onto the boundary of B_a.]

Imagine projecting the center of each big ball as in the above picture onto the surface of the given ball, assuming the given ball has radius 1. By scaling the balls, you could reduce to this case that the given ball has radius 1. Then from geometric reasoning, there should be a lower bound to the distance between these two projections depending on dimension. Thus there is an estimate on how many large balls can intersect the given ball with no ball containing a center of another one. Intersections with relatively big balls Lemma 22.2.2 Let the balls Ba , Bx , By be as shown, having radii r, rx , ry respectively. Suppose the centers of Bx and By are not both in any of the balls shown, and suppose ry ≥ rx ≥ αr where α is x−a a number larger than 1. Also let Px ≡ a + r ∥x−a∥ with Py being defined similarly. Then it follows α−1 that ∥Px − Py ∥ ≥ α+1 r. There exists a constant L (p, α) depending on α and the dimension, such that if B1 , · · · , Bm are all balls such that any pair are in the same situation relative to Ba as Bx , and By , then m ≤ L (p, α) . Proof: From the definition,



∥P_x − P_y∥ = r ∥ (x − a)/∥x − a∥ − (y − a)/∥y − a∥ ∥

= r ∥(x − a) ∥y − a∥ − (y − a) ∥x − a∥∥ / (∥x − a∥ ∥y − a∥)

= r ∥∥y − a∥ (x − y) + (y − a) (∥y − a∥ − ∥x − a∥)∥ / (∥x − a∥ ∥y − a∥)

≥ r ∥x − y∥ ∥y − a∥ / (∥x − a∥ ∥y − a∥) − r |∥y − a∥ − ∥x − a∥| ∥y − a∥ / (∥x − a∥ ∥y − a∥)

= r ∥x − y∥ / ∥x − a∥ − (r / ∥x − a∥) |∥y − a∥ − ∥x − a∥|.    (22.13)

There are two cases. First suppose that ∥y − a∥ − ∥x − a∥ ≥ 0. Then the above =r

r ∥x − y∥ − ∥y − a∥ + r. ∥x − a∥ ∥x − a∥

From the assumptions, ∥x − y∥ ≥ ry and also ∥y − a∥ ≤ r + ry . Hence the above ry r r − (r + ry ) + r = r − r ∥x − a∥ ∥x − a∥ ∥x − a∥ ( ) ( ) ( ) r r 1 α−1 ≥ r 1− ≥r 1− ≥r 1− ≥r . ∥x − a∥ rx α α+1

≥ r

The other case is that ∥y − a∥ − ∥x − a∥ < 0 in 22.13. Then in this case 22.13 equals ( ) ∥x − y∥ 1 = r − (∥x − a∥ − ∥y − a∥) ∥x − a∥ ∥x − a∥ r = (∥x − y∥ − (∥x − a∥ − ∥y − a∥)) ∥x − a∥ Then since ∥x − a∥ ≤ r + rx , ∥x − y∥ ≥ ry , ∥y − a∥ ≥ ry , and remembering that ry ≥ rx ≥ αr, ≥ ≥ =

r r (ry − (r + rx ) + ry ) ≥ (ry − (r + ry ) + ry ) rx + r rx + r ( ) r r r 1 (ry − r) ≥ (rx − r) ≥ rx − rx rx + r rx + r α rx + α1 rx α−1 r (1 − 1/α) = r 1 + (1/α) α+1

Replacing r with something larger, α1 rx is justified by the observation that x → α−x α+x is decreasing. This proves the estimate between Px and Py . Finally, in the case of the balls Bi having centers at xi , then as above, let Pxi = a + r ∥xxii −a −a∥ . Then (Pxi − a) r−1 is on the unit sphere having center 0. Furthermore,

(Pxi − a) r−1 − (Pyi − a) r−1 = r−1 ∥Pxi − Pyi ∥ ≥ r−1 r α − 1 = α − 1 . α+1 α+1 How many points on the ( unit ) sphere can be pairwise this far apart? The unit sphere is compact 1 α−1 and so there exists a 4 α+1 net having L (p, α) points. Thus m cannot be any larger than L (p, α) because if it were, by the pigeon hole principal, two of the points (Pxi − a) r−1 would lie in a ( (then )) 1 α−1 single ball B p, 4 α+1 so they could not be α−1 α+1 apart.  The above lemma has to do with balls which are relatively large intersecting a given ball. Next is a lemma which has to do with relatively small balls intersecting a given ball. First is another lemma. m

Lemma 22.2.3 Let Γ > 1 and B (a, Γr) be a ball and suppose {B (xi , ri )}i=1 are balls contained in B (a, Γr) such that r ≤ ri and none of these balls contains the center of another ball. Then there is a constant M (p, Γ) such that m ≤ M (p, Γ).



Proof: Let zi = xi − a.( Then B ) (zi , ri ) are balls contained in B (0, Γr) with no ball containing a z i ri center of another. Then B Γr , Γr are balls in B (0,1) with no ball containing the center of another. ) ( M (p,Γ) 1 1 By compactness, there is a 8Γ net for B (0, 1), {yi }i=1 . Thus the( balls )B yi , 8Γ cover B (0, 1). zi 1 If m ≥ M (p, Γ) , then by the pigeon hole principle, one of these B y , would contain some Γr i 8Γ

z

( ) zj z r z r zi j 1 and Γr which requires Γri − Γrj ≤ 4Γ < 4Γr so Γr ∈ B Γrj , Γrj . Thus m ≤ M (p, γ, Γ).  Intersections with small balls Lemma 22.2.4 Let B be a ball having radius r and suppose B has nonempty intersection with the balls B1 , · · · , Bm having radii r1 , · · · , rm respectively, and as before, no Bi contains the center of any other and the centers of the Bi are not contained in B. Suppose α > 1 and r ≤ min (r1 , · · · , rm ), each ri < αr. Then there exists a constant M (p, α) such that m ≤ M (p, α). Proof: Let B = B (a, r). Then each Bi is contained in B (a, 2r + αr + αr) . This is because if y ∈ Bi ≡ B (xi , ri ) , ∥y − a∥ ≤ ∥y − xi ∥ + ∥xi − a∥ ≤ ri + r + ri < 2r + αr + αr Thus Bi does not contain the center of any other Bj , these balls are contained in B (a, r (2α + 2)) , and each radius is at least as large as r. By Lemma 22.2.3 there is a constant M (p, α) such that m ≤ M (p, α).  Now here is the Besicovitch covering theorem. In the proof, we are considering the sequence of balls described above. Theorem 22.2.5 There exists a constant Np , depending only on p with the following property. If F is any collection of nonempty balls in X with sup {diam (B) : B ∈ F } < D < ∞ and if A is the set of centers of the balls in F, then there exist subsets of F, H1 , · · · , HNp , such that each Hi is a countable collection of disjoint balls from F (possibly empty) and N

p A ⊆ ∪i=1 ∪ {B : B ∈ Hi }.

Proof: To begin with, suppose A is bounded. Let L (p, α) be the constant of Lemma 22.2.2 and let Mp = L (p, α) + M (p, α) + 1. Define the following sequence of subsets of F, G1 , G2 , · · · , GMp . Referring to the sequence {Bk } considered in Lemma 22.2.1, let B1 ∈ G1 and if B1 , · · · , Bm have been assigned, each to a Gi , place Bm+1 in the first Gj such that Bm+1 intersects no set already in Gj . The existence of such a j follows from Lemmas 22.2.2 and 22.2.4 and the pigeon hole principle. Here is why. Bm+1 can intersect at most L (p, α) sets of {B1 , · · · , Bm } which have radii at least as large as αr (Bm+1 ) thanks to Lemma 22.2.2. It can intersect at most M (p, α) sets of {B1 , · · · , Bm } which have radius smaller than αr (Bm+1 ) thanks to Lemma 22.2.4. Thus each Gj consists of disjoint sets of F and the set of centers is covered by the union of these Gj . This proves the theorem in case the set of centers is bounded. Now let R1 = B (0, 5D) and if Rm has been chosen, let Rm+1 = B (0, (m + 1) 5D) \ Rm Thus, if |k − m| ≥ 2, no ball from F having nonempty intersection with Rm can intersect any ball from F which has nonempty intersection with Rk . This is because all these balls have radius less than D. Now let Am ≡ A ∩ Rm and apply the above result for a bounded set of centers to those balls of F which intersect Rm to obtain sets of disjoint balls G1 (Rm ) , G2 (Rm ) , · · · , GMp (Rm ) covering ∞ Am . Then simply define Gj′ ≡ ∪∞ k=1 Gj (R2k ) , Gj ≡ ∪k=1 Gj (R2k−1 ) . Let Np = 2Mp and } { } { ′ H1 , · · · , HNp ≡ G1′ , · · · , GM , G , · · · , G 1 M p p



Note that the balls in Gj′ are disjoint. This is because those in Gj (R2k ) are disjoint and if you consider any ball in Gj (R2m ) , it cannot intersect a ball of Gj (R2k ) for m ̸= k because |2k − 2m| ≥ 2. Similar considerations apply to the balls of Gj .  Of course, you could pick a particular α. If you make α larger, L (p, α) should get smaller and M (p, α) should get larger. Obviously one could explore this at length to try and get a best choice of α. Now let f : Rn → Rn be a Lipschitz function. This means there is a constant K such that ∥f (x) − f (y)∥ ≤ K ∥x − y∥ where ∥·∥ denotes a norm on Rn . For example, f (x) could equal Ax where A is an n × n matrix. In this case, ∥Ax − Ay∥ ≤ ∥A∥ ∥x − y∥ with ∥A∥ the operator norm. Then the following proposition is a fundamental result which says that a Lipschitz map of a measurable set is a measurable set. This is a remarkable result because it is not even true if f is only continuous. For example, see Problem 8 on Page 521. Proposition 22.2.6 Let f : Rn → Rn be Lipschitz continuous with Lipschitz constant K. Then if F is a Lebesgue measurable set, then so is f (K). Also if N is a set of measure zero, then f (N ) is a set of measure zero. Proof: Consider the second claim first. Let V be an open set, mn (V ) < ε, and V ⊇ K. For each point x of K, there is a closed ball B (x, r) ⊆ V such that also r < 1. Let F denote this collection of balls and let {H1 , · · · , HNn } be the finite set each of which consists of countably many disjoint balls from F whose union includes all of N . Denote by f (Hk ) the set of f (B) where B ∈ Hk . Each is compact because f is continuous and B is closed. Thus f (N ) is contained in the union of the f (Hk ). It follows that Nn ∑ ∑ mn (f (N )) ≤ mn (f (B)) k=1 B∈Hk

Here the bar on the measure denotes the outer measure determined by mn . Now f (B) is contained in a ball of the form B (f (x) , Kr) where B = B (x, r) and so, by Proposition 22.1.4, mn (f (B)) ≤ K n rn α (n) , and also α (n) rn = mn (B) . Therefore, the above reduces to mn (f (N ))

≤ ∑_{k=1}^{N_n} ∑_{B∈H_k} m̄_n(f(B)) ≤ K^n ∑_{k=1}^{N_n} ∑_{B∈H_k} m_n(B) ≤ K^n N_n m_n(V) < K^n N_n ε

Since ε is arbitrary, this implies that mn (f (N )) = 0 and so f (N ) is Lebesgue measurable and in fact is a set of measure zero. This is by completeness of Lebesgue measure. See Proposition 20.9.6 and completeness of Lebesgue measure. For the first part, suppose F is Lebesgue measurable. Then there exists H ⊆ F such that H is the countable union of compact sets and also that mn (H) = mn (F ) . Thus H is measurable because it is the union of measurable sets. Say mn (F ) < ∞ first. Then mn (F \ H) = 0 and so f (F \ H) is a measurable set of measure zero. Then f (F ) = f (F \ H) ∪ f (H). The second is obviously measurable because it is a countable union of compact sets, the continuous image of a compact set n being compact. It follows that f (F ) is also measurable. In general, consider Fk ≡ F ∩ [−k, k] . Then f (Fk ) is measurable and now f (F ) = ∪k f (Fk ) so it is also measurable. 

22.3

Change Of Variables, Linear Map

It was shown above that if A is an n × n matrix, then AE is Lebesgue measurable whenever E is.



Lemma 22.3.1 Let A be an n × n matrix and let E be Lebesgue measurable. Then mn (A (E)) = |det (A)| mn (E) Proof: This is to be shown first for elementary matrices. To begin with, let A be the elementary matrix which involves adding the ith row of the identity to the j th row. Thus A (x1 , · · · , xn ) = (x1 , · · · , xi , · · · , xi + xj , · · · , xn ) ∏n Consider what it does to B ≡ k=1 [ak , bk ] . The j th variable now goes from aj + xi to bj + xi . Thus, by Fubini’s theorem ∫ ∫ A (B) dmn = XA(B) (x1 , · · · , xn ) dmn ∫ =

∫_{R^n} · · · ∫_{R^n} X_{A(B)}(x_1, · · · , x_n) dx_j dx_i dx_1 · · · dx_{j−1} dx_{j+1} · · · dx_n

the integration taken in an order such that the first two two are with respect to xi and xj . Then from what was just observed, this reduces to ∫ ∫ ∫ ∫ X[a1 ,b1 ] (x1 ) · · · X[an ,bn ] (xn ) X[ai ,bi ] (xi ) X[aj +xi ,bj +xi ] (xj ) dxj dxi dx1 · · · dxn ∏ That inner integral is still (bj − aj ) and so the whole thing reduces to k (bk − ak ) = mn (B) . The determinant of this elementary matrix is 1 because it is either upper triangular or lower triangular with all ones down the main diagonal. Thus in this case, mn (A (B)) = |det (A)| mn (B). In case A is the elementary matrix which involves multiplying the ith row by α ̸= 0, |det (A)| = |α| and A (B) is also a box which changes only the interval corresponding to xi , making it |α| times as long. Thus mn (A (B)) = |α| mn (B) = |det (A)| mn (B). The remaining kind of elementary matrix A involves ˆ where B ˆ is the Cartesian product of the same intervals as switching two rows of I and it takes B to B in B but the intervals now occur with respect to different variables. Thus the mn (A (B)) = mn (B). Also, |det (A)| = |− det (I)| = 1 and so again mn (A (B)) = |det (A)| mn (B). Letting A be one of these elementary matrices, let G denote the Borel sets E such that n

mn (A (E ∩ Rm )) = |det (A)| mn (E ∩ Rm ) , Rm ≡ [−m, m] (22.14) ∏n Then if K consists of sets of the form k=1 [ak , bk ] , the above shows that K ⊆ G. Also it is clear that G is closed with respect to countable disjoint unions and complements. Therefore, G ⊇ σ (K). But σ (K) ⊇ B (Rn ) because σ (K) clearly contains every open set. Now let m → ∞ in 22.14 to obtain the desired result. Next suppose that F is a Lebesgue measurable set, mn (F ) < ∞. Then by inner regularity, there is a set E ⊆ F such that E is the countable union of compact sets, hence Borel, and mn (F ) = mn (E) . Thus mn (F \ E) = 0, since mn (F ) = 0. By Proposition 22.2.6, mn (A (F )) ≤ mn (A (E)) + mn (A (F \ E)) = |det (A)| mn (E) ≤ |det (A)| mn (F ) Thus, also

( ) ( ) mn (F ) = mn A−1 (AF ) ≤ det A−1 mn (A (F ))

and so |det (A)| mn (F ) ≤ mn (A (F )) This with ∗ shows that |det (A)| mn (F ) = mn (A (F )) . In the general case, |det (A)| mn (F ∩ Rm ) = mn (A (F ∩ Rm )) Now let m → ∞ to get the desired result.

(*)



It follows from iterating this result that if A is any product of elementary matrices, E1 · · · Er , then for F Lebesgue measurable, mn (A (F ))

= m_n(E_1 · · · E_r(F)) = |det(E_1)| m_n(E_2 · · · E_r(F)) = · · · = ∏_{i=1}^r |det(E_i)| m_n(F) = |det(A)| m_n(F)

If A is not the product of elementary matrices, then it has rank less than n and so there are elementary matrices E_1, · · · , E_r such that E_1 · · · E_r A = B where B is in row reduced echelon form and has at least one row of zeros. Thus, if F is a Lebesgue measurable set, then

0 = m_n(B(F)) = m_n(E_1 · · · E_r A(F)) = ∏_{i=1}^r |det(E_i)| m_n(A(F))

The first equality comes from Fubini's theorem and Proposition 22.2.6: B(F) ⊆ {(x_1, · · · , x_n) ∈ R^n : x_n = 0} ≡ C, a Borel set, and therefore, by Fubini's theorem,

m_n(B(F)) ≤ m_n(C) = ∫ · · · ∫ X_C(x_1, · · · , x_n) dx_1 · · · dx_n = 0

Thus, even in this case, mn (A (F )) = |det (A)| mn (F ). 
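Lemma 22.3.1 is easy to spot check numerically. The following sketch is an illustration only, under the stated assumptions; the matrix A and the set E = [0, 1]² are arbitrary illustrative choices, not part of the text. It estimates m_2(A(E)) by Monte Carlo sampling and compares the result with |det A|.

    # Monte Carlo check of m_n(A(E)) = |det(A)| m_n(E) for E = [0,1]^2.
    import random

    A = [[2.0, 1.0], [0.5, 3.0]]                  # illustrative matrix, det = 5.5
    detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    inv = [[ A[1][1]/detA, -A[0][1]/detA],        # A^{-1}: y is in A(E) iff A^{-1} y is in E
           [-A[1][0]/detA,  A[0][0]/detA]]

    box = (3.0, 3.5)                              # bounding box of A([0,1]^2) for this A
    hits, N = 0, 200000
    for _ in range(N):
        y = (random.uniform(0, box[0]), random.uniform(0, box[1]))
        x0 = inv[0][0]*y[0] + inv[0][1]*y[1]
        x1 = inv[1][0]*y[0] + inv[1][1]*y[1]
        if 0.0 <= x0 <= 1.0 and 0.0 <= x1 <= 1.0:
            hits += 1
    estimate = hits / N * box[0] * box[1]
    print(estimate, abs(detA))                    # both close to 5.5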

22.4

Vitali Coverings

There is another covering theorem which may also be referred to as the Besicovitch covering theorem. As before, the balls can be taken with respect to any norm on Rn . At first, the balls will be closed but this assumption will be removed. Definition 22.4.1 A collection of balls, F covers a set, E in the sense of Vitali if whenever x ∈ E and ε > 0, there exists a ball B ∈ F whose center is x having diameter less than ε. I will give a proof of the following theorem. Theorem 22.4.2 Let µ be a Radon measure on Rn and let E be a set with µ (E) < ∞. Where µ is the outer measure determined by µ. Suppose F is a collection of closed balls which cover E in the sense of Vitali. Then there exists a sequence of disjoint balls, {Bi } ⊆ F such that ( ) µ E \ ∪∞ j=1 Bj = 0. Proof: Let Nn be the constant of the Besicovitch covering theorem. Choose r > 0 such that ( ) 1 −1 (1 − r) 1− ≡ λ < 1. 2Nn + 2 If µ (E) = 0, there is nothing to prove so assume µ (E) > 0. Let U1 be an open set containing E with (1 − r) µ (U1 ) < µ (E) and 2µ (E) > µ (U1 ) , and let F1 be those sets of F which are contained in U1 whose centers are in E. Thus F1 is also a Vitali cover of E. Now by the Besicovitch covering theorem proved earlier, there exist balls B, of F1 such that n E ⊆ ∪N i=1 {B : B ∈ Gi }



where Gi consists of a collection of disjoint balls of F1 . Therefore, µ (E) ≤

∑_{i=1}^{N_n} ∑_{B∈G_i} µ(B)

and so, for some i ≤ N_n,

(N_n + 1) ∑_{B∈G_i} µ(B) > µ(E).

It follows there exists a finite set of balls of G_i, {B_1, · · · , B_{m_1}}, such that

(N_n + 1) ∑_{i=1}^{m_1} µ(B_i) > µ(E)    (22.15)

and so

(2N_n + 2) ∑_{i=1}^{m_1} µ(B_i) > 2µ(E) > µ(U_1).

Since 2µ(E) ≥ µ(U_1), 22.15 implies

µ(U_1)/(2N_n + 2) ≤ 2µ(E)/(2N_n + 2) = µ(E)/(N_n + 1) < ∑_{i=1}^{m_1} µ(B_i).

Also U_1 was chosen such that (1 − r)µ(U_1) < µ(E), and so

λµ(E) ≥ λ(1 − r)µ(U_1) = (1 − 1/(2N_n + 2)) µ(U_1) ≥ µ(U_1) − ∑_{i=1}^{m_1} µ(B_i) = µ(U_1) − µ(∪_{j=1}^{m_1} B_j) = µ(U_1 \ ∪_{j=1}^{m_1} B_j) ≥ µ(E \ ∪_{j=1}^{m_1} B_j).

1 Since the balls are closed, you can consider the sets of F which have empty intersection with ∪m j=1 Bj m1 and this new collection of sets will be a Vitali cover of E \ ∪j=1 Bj . Letting this collection of balls 1 play the role of F in the above argument and letting E \ ∪m j=1 Bj play the role of E, repeat the above argument and obtain disjoint sets of F,

{Bm1 +1 , · · · , Bm2 } , such that and so

( ) (( ) ) ( ) m2 m2 1 1 λµ E \ ∪m E \ ∪m j=1 Bj > µ j=1 Bj \ ∪j=m1 +1 Bj = µ E \ ∪j=1 Bj , ( ) 2 λ2 µ (E) > µ E \ ∪m j=1 Bj .

Continuing in this way, yields a sequence of disjoint balls {Bi } contained in F and ( ( ) ) mk k µ E \ ∪∞ j=1 Bj ≤ µ E \ ∪j=1 Bj < λ µ (E ) ) ( for all k. Therefore, µ E \ ∪∞ j=1 Bj = 0 and this proves the Theorem.  It is not necessary to assume µ (E) < ∞. Corollary 22.4.3 Let µ be a Radon measure on Rn . Letting µ be the outer measure determined by µ, suppose F is a collection of closed balls which cover E in the sense of Vitali. Then there exists a sequence of disjoint balls, {Bi } ⊆ F such that ( ) µ E \ ∪∞ j=1 Bj = 0.



Proof: Since µ is a Radon measure it is finite on compact sets. Therefore, there are at most ∞ countably many numbers, {bi }i=1 such that µ (∂B (0, bi )) > 0. It follows there exists an increasing ∞ sequence of positive numbers, {ri }i=1 such that limi→∞ ri = ∞ and µ (∂B (0, ri )) = 0. Now let D1 · · · , Dm

≡ {x : ||x|| < r1 } , D2 ≡ {x : r1 < ||x|| < r2 } , ≡ {x : rm−1 < ||x|| < rm } , · · · .

Let Fm denote those closed balls of F which are contained in Dm . Then letting Em denote E ∩ Dm , Fm is a Vitali cover 22.4.2, there exists a countable sequence { of }E∞m , µ (Em ) < ∞,(and so by Theorem ) m of balls from Fm Bjm j=1 , such that µ Em \ ∪∞ B = 0. Then consider the countable collection j=1 j { m }∞ of balls, Bj j,m=1 . ( ) ∞ m µ E \ ∪∞ m=1 ∪j=1 Bj ∞ ∑ ) ( m + µ Em \ ∪ ∞ j=1 Bj



( ) µ ∪∞ j=1 ∂B (0, ri ) +

=

0

m=1

You don’t need to assume the balls are closed. In fact, the balls can be open, closed or anything in between and the same conclusion can be drawn. Corollary 22.4.4 Let µ be a Radon measure on Rn . Letting µ be the outer measure determined by µ, suppose F is a collection of balls which cover E in the sense of Vitali, open closed or neither. Then there exists a sequence of disjoint balls, {Bi } ⊆ F such that ( ) µ E \ ∪∞ j=1 Bj = 0. Proof: Let x ∈ E. Thus x is the center of arbitrarily small balls from F. Since µ is a Radon measure, at most countably many radii, r of these balls can have the property that µ (∂B (0, r)) = 0. Let F ′ denote the closures of the balls of F, B (x, r) with the property that µ (∂B (x, r)) = 0. Since for each x ∈ E there are only countably many exceptions, F ′ is still{a Vitali }∞ cover of E. Therefore, by Corollary 22.4.3 there is a disjoint sequence of these balls of F ′ , Bi i=1 for which ( ) µ E \ ∪∞ j=1 Bj = 0 However, since their boundaries have µ measure zero, it follows ( ) µ E \ ∪∞ j=1 Bj = 0. 

22.5

Change Of Variables

Here is an interesting proposition. Proposition 22.5.1 Let f : U → Rn be differentiable on the open set U ⊆ Rn . Then if F is a Lebesgue measurable set, then so is f (K). Also if N ⊆ U is a set of measure zero, then f (N ) is a set of measure zero. Proof: Consider the second claim first. Let N be a set of measure zero and let Nk ≡ {x ∈ N : ∥Df (x)∥ ≤ k} There is an open set V ⊇ Nk such that mn (V ) < ε. For each x ∈ Nk , there is a ball Bx centered at x with radius rx < 1 such that Bx ⊆ V and for y ∈ Bx , f (y) ∈ f (x) + Df (x) B (0, rx ) + B (0, εrx )

22.5. CHANGE OF VARIABLES

559

Thus f (Bx ) ⊆

f (x) + B (0, (∥Df (x)∥ + ε) rx )



B (f (x) , (k + ε) rx ) n

mn (f (Bx )) ≤

(k + ε) mn (B (x, rx ))

By the Besicovitch covering theorem, there are balls of this sort such that n Nk ⊆ ∪M j=1 ∪ {B ∈ Gj }

where Gj is a countable disjoint collection of these balls. Thus, mn (f (Nk )) ≤ ≤

Mn ∑ ∑

n

mn (f (B)) ≤ (k + ε)

j=1 B∈Gj n

(k + ε) Mn mn (V ) ≤ ε (k +

Mn ∑ ∑

mn (B)

j=1 B∈Gj n ε) Mn

Since ε is arbitrary, it follows that mn (f (Nk )) = 0 and so in fact f (Nk ) is measurable and has mn measure zero. See Proposition 20.9.6. Now let k → ∞ to conclude that mn (f (N )) = 0. Now the conclusion is shown the same as in Proposition 22.2.6. You exploit inner regularity, and what was just shown, to obtain the conclusion that f (F ) is measurable if F is.  Recall Lemma 19.12.1 which was based on the Brouwer fixed point theorem. The version of use here is stated below. In what follows, |·| is the usual Euclidean norm on Rn . Lemma 22.5.2 Let h be continuous and map B (0, r) ⊆ Rn to Rn . Suppose that for all x ∈ B (0, r), |h (x) − x| < εr Then it follows that

( ) h B (0, r) ⊇ B (0, (1 − ε) r)

Now the Besicovitch covering theorem for a Vitali cover is used to give an easy treatment of the change of variables for multiple integrals. This will be based on the following lemma. Lemma 22.5.3 Let h : U → Rn where U is an open bounded set and suppose that Dh (x) exists and is invertible and continuous on U and that h is one to one on U . Let A ⊆ U be a Lebesgue measurable set. Then ∫ mn (h (A)) = |det Dh (x)| dmn A

Proof: In what follows, ε will be a small positive number. Let A be a Lebesgue measurable subset of U and A ⊆ {x ∈ U : |det Dh (x)| < k} (**) let V be an open set containing A such that mn (V \ A) < ε. We can also assume that mn (h (V ) \ h (A)) < ε The reason for this is as follows. There exists G ⊇ A such that G is the countable intersection of nested open sets which each contain A and mn (G \ A) = 0. Say G = ∩i Vi . Then since h is one to one, mn (h (G)) − mn (h (A)) = mn (h (G) \ h (A)) = mn (h (G \ A)) = 0 Then mn (h (A)) = mn (h (G)) = lim mn (h (Vm )) m→∞

560

CHAPTER 22. MEASURES FROM POSITIVE LINEAR FUNCTIONALS

so eventually, for large enough m, mn (h (A)) + ε > mn (h (Vm )) Then for x ∈ A, h (x + v) = h (x) + Dh (x) v + o (v) ((h (x + v) − h (x)) − Dh (x) v) −1

Dh (x)

(h (x + v) − h (x)) − v

(22.16)

= o (v) = o (v)

Thus, for all v small enough, |v| ≤ r, −1 Dh (x) (h (x + v) − h (x)) − v < εr It follows from Lemma 22.5.2 that for each x ∈ A, there is a ball B (x, rx ) ⊆ V Dh (x)

−1

(h (x + B (0, λrx )) − h (x)) ⊇ B (0, (1 − ε) r) h (x + B (0, λrx )) − h (x) ⊇ Dh (x) B (0, (1 − ε) r) h (B (x, λrx )) ⊇ Dh (x) B (h (x) , (1 − ε) r)

this holding for all rx small enough. Here λ > 1 is arbitrary. Thus n

mn (h (B (x, λrx ))) ≥ |det (Dh (x))| (1 − ε) mn (B (x, rx )) Letting λ ↓ 1, and using that h maps sets of measure zero to sets of measure zero, n

mn (h (B (x, rx ))) ≥ |det (Dh (x))| (1 − ε) mn (B (x, rx )) As explained earlier, even if you have a general Radon measure, you could assume that rx is such {y : |x − y| = rx } has measure zero, although in the case of Lebesgue measure, this can be shown for all rx . It follows from Proposition 22.1.4 for example. Also from 22.16, we can assume that rx is small enough that h (B (x, rx )) − h (x) ⊆ Dh (x) B (0, (1 + ε) rx ) h (B (x, rx ))

⊆ Dh (x) B (h (x) , (1 + ε) rx )

and so n

mn (h (B (x, rx ))) ≤ |det Dh (x)| (1 + ε) mn (B (x, rx )) Thus, whenever rx is small enough, n

(1 − ε) |det Dh (x)| mn (B (x, rx )) ≤ mn (h (B (x, rx ))) n

≤ (1 + ε) |det Dh (x)| mn (B (x, rx ))

(*)

At this point, assume a little more on h. Assume it is actually C 1 (U ) . Then by making rx smaller if necessary, it can be assumed that for y ∈ B (x, rx ) , ||det Dh (x)| − |det Dh (y)|| < ε This is a Vitali cover for A. Therefore, there is a countable disjoint sequence of these balls {Bi (xi , ri )} such that mn (A \ ∪i Bi ) = 0. Then since h is one to one, mn (h (A)) ≤

∞ ∑ i=1

mn (h (Bi )) ≤ mn (h (V )) < mn (h (A)) + ε

22.5. CHANGE OF VARIABLES

561

Now the following chain of inequalities follow from the above. ∫ n

|Dh (x)| (1 − ε) dmn

∞ ∫ ∑



A

∞ ∑

i=1

i=1

Bi

∞ ∫ ∑





n

|Dh (x)| (1 − ε) dmn

Bi n

(|Dh (xi )| + ε) (1 − ε) dmn

n

n

mn (h (Bi )) + ε (1 − ε) mn (U ) ≤ mn (h (V )) + ε (1 − ε) mn (U )

i=1 n

≤ mn (h (A)) + ε + ε (1 − ε) mn (U ) ∞ ∑ n ≤ mn (h (Bi )) + ε + ε (1 − ε) mn (U ) i=1



∞ ∑

n

(1 + ε) |det Dh (xi )| mn (Bi ) + Cε

i=1

=

∞ ∑

∫ n



∞ ∑ i=1

|det Dh (xi )| dmn + Cε

(1 + ε)

Bi

i=1

∫ (1 + ε)

n

|det Dh (x)| dmn + Cε Bi

∫ n



|det Dh (x)| dmn (1 + ε) + Cε (V∫



)

∫ |det Dh (x)| dmn +

A

V \A

|det Dh (x)| dmn

n

(1 + ε) + Cε

∫ n



n

|det Dh (x)| dmn (1 + ε) + εk (1 + ε) + Cε A n

Since ε is arbitrary, the ends of this string and mn (h (A)) + ε + ε (1 − ε) mn (V ) in the middle show that ∫ mn (h (A)) = |Dh (x)| dmn A

Now we remove the assumption ∗∗. Letting A be arbitrary and measurable, let A_k ≡ A ∩ {x : |det Dh(x)| ≤ k}. From what was just shown,

m_n(h(A_k)) = ∫_{A_k} |det Dh(x)| dm_n

Let k → ∞ and use monotone convergence theorem to prove the lemma.  You can remove the assumption that h is C 1 but it is a little more trouble to do so. It is easy to remove the assumption that U is bounded. Corollary 22.5.4 Let h : U → Rn where U is an open set and suppose that Dh (x) exists and is invertible and continuous on U and that h is one to one on U . Let A ⊆ U be a Lebesgue measurable set. Then ∫ mn (h (A)) = |det Dh (x)| dmn A


Proof: Let A ⊆ U and let A_k ≡ B(0, k) ∩ A, U_k ≡ B(0, k) ∩ U. Then the above lemma shows

m_n(h(A_k)) = ∫_{A_k} |det Dh(x)| dm_n

Now use the monotone convergence theorem to obtain the conclusion of the lemma. ∎ Now it is easy to prove the change of variables formula.

Theorem 22.5.5 Let h : U → R^n where U is an open set and suppose that Dh(x) exists and is continuous and invertible on U and that h is one to one on U. Then if f ≥ 0 and is Lebesgue measurable,

∫_{h(U)} f(y) dm_n = ∫_U f(h(x)) |det Dh(x)| dm_n

Proof: Let f(y) = lim_{k→∞} s_k(y) where s_k is a nonnegative simple function. Say

s_k(y) = ∑_{i=1}^{m_k} c_i^k X_{F_i^k}(y)

Then

s_k(h(x)) = ∑_{i=1}^{m_k} c_i^k X_{F_i^k}(h(x)) = ∑_{i=1}^{m_k} c_i^k X_{h^{−1}(F_i^k)}(x)

It follows from the above corollary that

∫_U s_k(h(x)) |det Dh(x)| dm_n = ∑_{i=1}^{m_k} c_i^k ∫_{h^{−1}(F_i^k)} |det Dh(x)| dm_n = ∑_{i=1}^{m_k} c_i^k m_n(F_i^k) = ∑_{i=1}^{m_k} c_i^k ∫_{h(U)} X_{F_i^k}(y) dm_n = ∫_{h(U)} s_k(y) dm_n

Now apply monotone convergence theorem to obtain the desired result.  It is a good idea to remove the requirement that det Dh (x) ̸= 0. This is also fairly easy from the Besicovitch covering theorem. The following is Sard’s lemma. In the proof, it does not matter which norm you use in defining balls but it may be easiest to consider the norm ||x|| ≡ max {|xi | , i = 1, · · · , n}. Lemma 22.5.6 (Sard) Let U be an open set in Rn and let h : U → Rn be differentiable. Let Z ≡ {x ∈ U : det Dh (x) = 0} . Then mn (h (Z)) = 0. Proof: For convenience, assume the balls in the following argument come from ||·||∞ . First note that Z is a Borel set because h is continuous and so the component functions of the Jacobian matrix are each Borel measurable. Hence the determinant is also Borel measurable. Suppose that U is a bounded open set. Let ε > 0 be given. Also let V ⊇ Z with V ⊆ U open, and mn (Z) + ε > mn (V ) . Now let x ∈ Z. Then since h is differentiable at x, there exists δ x > 0 such that if r < δ x , then B (x, r) ⊆ V and also, h (B (x,r)) ⊆ h (x) + Dh (x) (B (0,r)) + B (0,rη) , η < 1. Regard Dh (x) as an m × m matrix, the matrix of the linear transformation Dh (x) with respect to the usual coordinates. Since x ∈ Z, it follows that there exists an invertible matrix A such that ADh (x) is in row reduced echelon form with a row of zeros on the bottom. Therefore, mn (A (h (B (x,r)))) ≤ mn (ADh (x) (B (0,r)) + AB (0,rη))

(22.17)



The diameter of ADh(x)(B(0, r)) is no larger than ∥A∥ ∥Dh(x)∥ 2r and it lies in R^{n−1} × {0}. The diameter of AB(0, rη) is no more than ∥A∥ (2rη). Therefore, the measure of the right side is no more than

[∥A∥ ∥Dh(x)∥ 2r + ∥A∥ (2rη)]^{n−1} (∥A∥ 2rη) ≤ C(∥A∥, ∥Dh(x)∥) (2r)^n η

Hence from the change of variables formula for linear maps,

m_n(h(B(x, r))) ≤ (η / |det(A)|) C(∥A∥, ∥Dh(x)∥) m_n(B(x, r))

Then letting δ_x be still smaller if necessary, corresponding to sufficiently small η,

m_n(h(B(x, r))) ≤ ε m_n(B(x, r))

The balls of this form constitute a Vitali cover of Z. Hence, by the Vitali covering theorem, Corollary 22.4.4, there exists {B_i}_{i=1}^∞, B_i = B_i(x_i, r_i), a collection of disjoint balls, each of which is contained in V, such that m_n(h(B_i)) ≤ ε m_n(B_i) and m_n(Z \ ∪_i B_i) = 0. Hence from Proposition 22.5.1,

m_n(h(Z) \ ∪_i h(B_i)) ≤ m_n(h(Z \ ∪_i B_i)) = 0

Therefore,

m_n(h(Z)) ≤ ∑_i m_n(h(B_i)) ≤ ε ∑_i m_n(B_i)

≤ ε m_n(V) ≤ ε (m_n(Z) + ε).

Since ε is arbitrary, this shows m_n(h(Z)) = 0. What if U is not bounded? Then consider Z_m = Z ∩ B(0, m). From what was just shown, h(Z_m) has measure 0 and so it follows that h(Z) also does, being the countable union of sets of measure zero. ∎ Now here is a better change of variables theorem.

Theorem 22.5.7 Let h : U → R^n where U is an open set and suppose that Dh(x) exists and is continuous on U and that h is one to one on U. Then if f ≥ 0 and is Lebesgue measurable,

∫_{h(U)} f(y) dm_n = ∫_U f(h(x)) |det Dh(x)| dm_n

Proof: Let U+ ≡ {x ∈ U : |det Dh (x)| > 0} , an open set, and U0 ≡ {x ∈ U : |det Dh (x)| = 0} . Then, using that h is one to one along with Lemma 22.5.6 and Theorem 22.5.5, ∫ ∫ f (y) dmn = f (y) dmn h(U ) h(U+ ) ∫ = f (h (x)) |det Dh (x)| dmn U+ ∫ = f (h (x)) |det Dh (x)| dmn  U 1

Now suppose h is only C , not necessarily one to one. For U+ ≡ {x ∈ U : |det Dh (x)| > 0} and Z the set where |det Dh (x)| = 0, Lemma 22.5.6 implies mn (h(Z)) = 0. For x ∈ U+ , the inverse function theorem implies there exists an open set Bx ⊆ U+ , such that h is one to one on Bx . Let {Bi } be a countable subset of {Bx }x∈U+ such that U+ = ∪∞ i=1 Bi . Let E1 = B1 . If E1 , · · · , Ek have been chosen, Ek+1 = Bk+1 \ ∪ki=1 Ei . Thus ∪∞ i=1 Ei = U+ , h is one to one on Ei , Ei ∩ Ej = ∅,



and each Ei is a Borel set contained in the open set Bi . Now define n(y) ≡

∑_{i=1}^∞ X_{h(E_i)}(y) + X_{h(Z)}(y).

The set h (Ei ) , h (Z) are measurable by Proposition 22.5.1. Thus n (·) is measurable. Lemma 22.5.8 Let F ⊆ h(U ) be measurable. Then ∫ ∫ n(y)XF (y)dmn = XF (h(x))| det Dh(x)|dmn . h(U )

U

Proof: Using Lemma 22.5.6 and the Monotone Convergence Theorem, and noting that m_n(h(Z)) = 0,

∫_{h(U)} n(y) X_F(y) dm_n = ∫_{h(U)} ( ∑_{i=1}^∞ X_{h(E_i)}(y) + X_{h(Z)}(y) ) X_F(y) dm_n

= ∑_{i=1}^∞ ∫_{h(U)} X_{h(E_i)}(y) X_F(y) dm_n = ∑_{i=1}^∞ ∫_{h(B_i)} X_{h(E_i)}(y) X_F(y) dm_n

= ∑_{i=1}^∞ ∫_{B_i} X_{E_i}(x) X_F(h(x)) |det Dh(x)| dm_n = ∑_{i=1}^∞ ∫_U X_{E_i}(x) X_F(h(x)) |det Dh(x)| dm_n

= ∫_U ∑_{i=1}^∞ X_{E_i}(x) X_F(h(x)) |det Dh(x)| dm_n = ∫_{U_+} X_F(h(x)) |det Dh(x)| dm_n = ∫_U X_F(h(x)) |det Dh(x)| dm_n. ∎

Definition 22.5.9 For y ∈ h(U ), define a function, #, according to the formula #(y) ≡ number of elements in h−1 (y). Observe that #(y) = n(y) a.e.

(22.18)

because n(y) = #(y) if y ∉ h(Z), a set of measure 0. Therefore, # is a measurable function because of completeness of Lebesgue measure.

Theorem 22.5.10 Let g ≥ 0, g measurable, and let h be C^1(U). Then

∫_{h(U)} #(y) g(y) dm_n = ∫_U g(h(x)) |det Dh(x)| dm_n.    (22.19)

Proof: From 22.18 and Lemma 22.5.8, 22.19 holds for all g, a nonnegative simple function. Approximating an arbitrary measurable nonnegative function, g, with an increasing pointwise convergent sequence of simple functions and using the monotone convergence theorem, yields 22.19 for an arbitrary nonnegative measurable function, g. 
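Theorem 22.5.7, and hence 22.5.10, can be sanity checked numerically. The sketch below is an illustration only: it takes h(r, θ) = (r cos θ, r sin θ) on U = (0, 1) × (0, 2π), where |det Dh(r, θ)| = r, and compares a Riemann-sum approximation of ∫_{h(U)} f dm_2 with ∫_U f(h(r, θ)) r dm_2 for the illustrative choice f(x, y) = x² + y², whose exact integral over the unit disk is π/2.

    # Numeric check of the change of variables formula with polar coordinates.
    import math

    def f(x, y):
        return x*x + y*y

    N = 400
    # right side: integral over U = (0,1) x (0, 2*pi) of f(h(r,t)) * r
    right = 0.0
    for i in range(N):
        for j in range(N):
            r = (i + 0.5) / N
            t = (j + 0.5) / N * 2 * math.pi
            x, y = r * math.cos(t), r * math.sin(t)
            right += f(x, y) * r * (1.0 / N) * (2 * math.pi / N)

    # left side: integral over h(U), essentially the unit disk, of f
    left = 0.0
    for i in range(N):
        for j in range(N):
            x = -1 + (i + 0.5) * 2.0 / N
            y = -1 + (j + 0.5) * 2.0 / N
            if x*x + y*y < 1.0:
                left += f(x, y) * (2.0 / N) * (2.0 / N)

    print(left, right, math.pi / 2)    # all three approximately equal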


22.6


Exercises

1. Suppose A ⊆ Rn is covered by a finite collection of Balls, F. Show that then there exists a m b b disjoint collection of these balls, {Bi }i=1 , such that A ⊆ ∪m i=1 Bi where Bi has the same center as Bi but 3 times the radius. Hint: Since the collection of balls is finite, they can be arranged in order of decreasing radius. 2. In Problem 8, you showed that if f ∈ L1 (Rn ) , there exists h which is continuous and equal to 0 off some compact set such that ∫ |f − h| dm < ε Define fy (x) ≡ f (x − y) . Explain why fy is Lebesgue measurable and ∫ ∫ |fy | dmn = |f | dmn Now justify the following formula. ∫ ∫ ∫ ∫ |fy − f | dmn ≤ |fy − hy | dmn + |hy − h| dmn + |h − f | dmn ∫ ≤ 2ε + |hy − h| dmn Now explain why the last term is less than ε if ∥y∥ is small enough. Explain continuity of translation in L1 (Rn ) which says that ∫ lim |fy − f | dmn = 0 y→0

Rn

3. This problem will help to understand that a certain kind of function exists. { 2 e−1/x if x ̸= 0 f (x) = 0 if x = 0 show that f is infinitely differentiable. Note that you only need to be concerned with what happens at 0. There is no question elsewhere. This is a little fussy but is not too hard. 4. ↑Let f (x) be as given above. Now let { fˆ (x) ≡

f (x) if x ≤ 0 0 if x > 0

Show that fˆ (x) is also infinitely differentiable. Now consider let r > 0 and define g (x) ≡ fˆ (− (x −∏ r)) fˆ (x + r). show that g is infinitely differentiable and vanishes for |x| ≥ r. Let n ψ (x) = k=1 g (xk ). For U = B (0, 2r) with the norm given by ∥x∥ = max {|xk | , k ≤ n} , show that ψ ∈ Cc∞ (U ). 5. ↑Using the above problem, let ψ ∈ Cc∞ (B (0, 1)) . Also let ∫ ψ ≥ 0 as in the above problem. Show there exists ψ ≥ 0 such that ψ ∈ Cc∞ (B (0, 1)) and ψdmn = 1. Now define ψ k (x) ≡ k n ψ (kx) ( ) ∫ Show that ψ k(equals zero off a compact subset of B 0, k1 and ψ k dmn = 1. We say that ) spt (ψ k ) ⊆ B 0, k1 . spt (f ) is defined as the closure of the ∫set on which f is not equal to 0. Such a sequence ) of functions as just defined {ψ k } where ψ k dmn = 1 and ψ k ≥ 0 and ( spt (ψ k ) ⊆ B 0, k1 is called a mollifier.



6. ↑It is important to be able to approximate functions with those which are infinitely differentiable. Suppose f ∈ L1 (Rn ) and let {ψ k } be a mollifier as above. We define the convolution as follows. ∫ f ∗ ψ k (x) ≡ f (x − y) ψ k (y) dmn (y) Here the notation means that the variable of integration is y. Show that f ∗ ψ k (x) exists and equals ∫ ψ k (x − y) f (y) dmn (y) Now show using the dominated convergence theorem that f ∗ ψ k is infinitely differentiable. Next show that ∫ lim |f (x) − f ∗ ψ k (x)| dmn = 0 k→∞

Thus, in terms of being close in L1 (Rn ) , every function in L1 (Rn ) is close to one which is infinitely differentiable. 7. ↑From Problem 8 above and f ∈ L1 (Rn ), there exists h ∈ Cc (Rn ) , continuous and spt (h) a compact set, such that ∫ |f − h| dmn < ε ( ( )) Now consider h ∗ ψ k . Show that this function is in Cc∞ spt (h) + B 0, k2 . The notation ( ) means you start with the compact set spt (h) and it up by adding the set B 0, k1 . ( fatten ) It means x + y such that x ∈ spt (h) and y ∈ B 0, k1 . Show the following. For all k large enough, ∫ |f − h ∗ ψ k | dmn < ε so one can approximate with a function which is infinitely differentiable and also has compact ( ) support. Also show that h ∗ ψ k converges uniformly to h. If h is a function in C k Rk in addition to being continuous with compact support, show that for each |α| ≤ k, Dα (h ∗ ψ k ) → Dα h uniformly. Hint: If you do this for a single partial derivative, you will see how it works in general. 8. ↑Let f ∈ L1 (R). Show that

∫ lim

k→∞

f (x) sin (kx) dm = 0

Hint: Use the result of the above problem to obtain g ∈ Cc∞ (R) , continuous and zero off a compact set, such that ∫ |f − g| dm < ε ∫

Then show that lim

k→∞

g (x) sin (kx) dm (x) = 0

You can do this by integration by parts. Then consider this. ∫ ∫ ∫ f (x) sin (kx) dm = f (x) sin (kx) dm − g (x) sin (kx) dm ∫ + g (x) sin (kx) dm ∫ ≤

∫ |f − g| dm + g (x) sin (kx) dm

This is the celebrated Riemann Lebesgue lemma which is the basis for all theorems about pointwise convergence of Fourier series.



9. As another application of theory of regularity, here is a very important result. Suppose f ∈ L1 (Rn ) and for every ψ ∈ Cc∞ (Rn ) ∫ f ψdmn = 0 Show that then it follows that f (x) = 0 for a.e.x. That is, there is a set of measure zero such that off this set f equals 0. Hint: What you can do is to let E be a measurable which is bounded and let Kk ⊆ E ⊆ Vk where mn (Vk \ Kk ) < 2−k . Here Kk is compact and Vk is open. By an earlier exercise, Problem 12 on Page 510, there exists a function ϕk which is continuous, has values in [0, 1] equals 1 on Kk and spt (ϕk ) ⊆ V. To get this last part, show ¯ k ⊆ Vk and Wk contains Kk . Then you use the problem to there exists Wk open such that W ¯ get spt (ϕk ) ⊆ Wk . Now you form η k = ϕk ∗ ψ l where {ψ l } is a mollifier. Show that for l large enough, η k has values in [0, 1] , spt (η k ) ⊆ Vk and η k ∈ Cc∞ (Vk ). Now explain why η k → XE off a set of measure zero. Then ∫ ∫ ∫ f XE dmn = f (XE − η k ) dmn + f η k dmn ∫ = f (XE − η k ) dmn Now explain why this converges to 0 on the right. This will involve the dominated convergence ∫ theorem. Conclude that f XE dmn = 0 for every bounded measurable set E. Show that this ∫ implies that f XE dmn = 0 for every measurable E. Explain why this requires f = 0 a.e. The result which gets used over and over in all of this is the dominated convergence theorem. 10. This is from the advanced calculus book by Apostol. Justify the following argument using convergence theorems. ∫ F (x) ≡ 0

Then you can take the derivative

d dx

1

2 2 (∫ x )2 e−x (1+t ) −t2 dt + e dt 1 + t2 0

and obtain

( ) 2 2 (∫ x ) e−x (1+t ) (−2x) 1 + t2 2 −t2 dt + 2 F (x) = e dt e−x 2 1+t 0 0 (∫ x ) ∫ 1 2 2 2 2 = e−x (1+t ) (−2x) dt + 2 e−t dt e−x ′



1

0

0

Why can you take the derivative on the inside of the integral? Change the variable in the second integral, letting t = sx. Thus dt = xds. Then the above reduces to (∫ 1 ) ∫ 1 2 2 2 2 2 = e−x (1+t ) (−2x) dt + 2x e−s x ds e−x 0



= (−2x) 0

1

2 2 e−x (1+t ) dt + 2x

(∫

0 1

e−(s

2

+1)x2

0

and so this function of x is constant. However, ∫ 1 dt 1 = π F (0) = 2 4 0 1+t Now you let x → ∞. What happens to that first integral? It equals (∫ 1 ) ∫ 1 −x2 t2 1 e −x2 −x2 dt ≤ e dt e 2 2 0 1+t 0 1+t

) ds = 0


and so it obviously converges to 0 as x → ∞. Therefore, taking a limit yields

(∫_0^∞ e^{−t²} dt)² = π/4,  ∫_0^∞ e^{−t²} dt = √π / 2

11. The Dini derivates are as follows. In these formulas, f is a real valued function defined on R. f (x + h) − f (x) f (x + h) − f (x) , D+ f (x) ≡ lim inf h→0+ h h h→0+ f (x) − f (x − h) f (x) − f (x − h) D− f (x) ≡ lim sup , D− f (x) ≡ lim inf h→0+ h h h→0+ D+ f (x) ≡ lim sup

Thus when these are all equal, the function has a derivative. Now suppose f is an increasing function. Let { } Nab = x : D+ f (x) > b > a > D+ f (x) , a ≥ 0 r Let V be an open set which contains Nab ∩ (−r, r) ≡ Nab such that

m (V \ (Nab ∩ (−r, r))) < ε Then explain why there exist disjoint intervals [ai , bi ] such that r r m (Nab \ ∪i [ai , bi ]) = m (Nab \ ∪i (ai , bi )) = 0

and f (bi ) − f (ai ) ≤ am (ai , bi ) each interval being contained in V ∩ (−r, r). Thus you have r r m (Nab ) = m (∪i Nab ∩ (ai , bi )) .

Next show there exist disjoint intervals (aj , bj ) such that each of these in some ∑ is contained r (ai , bi ), the (aj , bj ) are disjoint, f (bj ) − f (aj ) ≥ bm (aj , bj ) , and m (N ∩ (a j , bj )) = ab j r m (Nab ). Then you have the following thanks to the fact that f is increasing. ∑ ∑ r a (m (Nab ) + ε) > am (V ) ≥ a (bi − ai ) > f (bi ) − f (ai ) ≥



i

f (bj ) − f (aj ) ≥ b

j

≥ b





i

bj − a j

j r r m (Nab ∩ (aj , bj )) = bm (Nab )

j

and since ε > 0, r r am (Nab ) ≥ bm (Nab ) r showing that m (Nab ) = 0. This is for any r and so m (Nab ) = 0. Thus the derivative from the right exists for a.e. x by taking the complement of the union of the Nab for a, b nonnegative rational numbers. Now do the same thing to show that the derivative from the left exists a.e. and finally, show that D− f (x) = D+ f (x) for almost a.e. x. Off the union of these three exceptional sets of measure zero all the derivates are the same and so the derivative of f exists a.e. In other words, an increasing function has a derivative a.e.

12. This problem is on Eggoroff’s theorem. Suppose you have a measure space (Ω, F, µ) where µ (Ω) < ∞. Also suppose that {fk } is a sequence of measurable, complex valued functions which converge to f pointwise. Then Eggoroff’s theorem says that for any ε > 0 there is a set N with µ (N ) < ε and convergence is uniform on N C .



{ } 1 (a) Define Emk ≡ ∪∞ r=m ω : |f (ω) − fr (ω)| > k . Show that Emk ⊇ E(m+1)k for all m and that ∩m Emk = ∅ ( ) (b) Show that there exists m (k) such that µ Em(k)k < ε2−k . / N C , if r > m (k) , then (c) Let N ≡ ∪∞ k=1 Em(k)k . Explain why µ (N ) < ε and that for all ω ∈ 1 |f (ω) − fr (ω)| ≤ k . Thus uniform convergence takes place on N C . 13. Suppose you have a sequence {fn } which converges uniformly on each of sets A1 , · · · , An . Why does the sequence converge uniformly on ∪ni=1 Ai ? 14. ↑Now suppose you have a (Ω, F, µ) where µ is a finite Radon measure and Ω is a metric space. For example, you could have Lebesgue measure for Ω a bounded open subset of Rn . Suppose you have f has nonnegative real values for all ω and is measurable. Then Lusin’s theorem says that for every ε > 0, there exists an open set V with measure less than ε and a continuous function defined on Ω such that f (ω) = g (ω) for all ω ∈ / V. (a) By Lemma 22.0.11, there exists an increasing sequence {fn } ⊆ Cc (Ω) which converges ˆ to(f off ) a set N of measure zero. Use Eggoroff’s theorem to enlarge N to N such that ε ˆ < and convergence is uniform off N ˆ. µ N 2

ˆ having measure less than ε. Thus {fn } (b) Next use outer regularity to obtain open V ⊇ N converges uniformly on V C . Therefore, that which it converges to is continuous on V C a closed set. Now use the Tietze extension theorem. 15. Let h : U → Rn be differentiable for U an open set in Rn . Thus, as explained above, h takes measurable sets to measurable sets. Suppose h is one to one. For E a measurable subset of U define µ (E) ≡ mn (h (E)) Show that µ is a measure and whenever mn (E) = 0, it follows that µ (E) = 0. When this happens, we say that µ ≪ mn . Great things can be said in this situation. It is this which allows one to dispense with the assumption in the change of variables formula that h is C 1 (U ). 16. Say you have a change of variables formula which says that ∫ ∫ f (y) dmn = f (h (x)) |det Dh (x)| dmn h(U )

U

and that this holds for all f ≥ 0 and Lebesgue measurable. Show that the formula continues to hold if f ∈ L1 (h (U )) . Hint: Apply what is known to the positive and negative parts of the real and imaginary parts.
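Problems 3 through 7 above construct a mollifier and use it to smooth an integrable function by convolution. The following sketch is an illustration only: it builds a one dimensional bump ψ from e^{−1/x²}, scales it to ψ_k(x) = kψ(kx), normalizes it on a grid so the discrete integral is 1, and convolves a step function with ψ_k; the grid, the spacing, and the step function are illustrative assumptions, and the computed L¹ distance between f and f ∗ ψ_k shrinks as k grows.

    # Sketch of a mollifier (Problems 3-7): smoothing a step function by convolution.
    import math

    def bump(x):                       # smooth on R, zero for |x| >= 1
        return math.exp(-1.0 / (1.0 - x*x)) if abs(x) < 1.0 else 0.0

    h = 0.005                          # grid spacing
    grid = [i * h for i in range(-600, 601)]      # the interval [-3, 3]

    def f(x):                          # a discontinuous integrable function
        return 1.0 if 0.0 <= x <= 1.0 else 0.0

    def psi_k_values(k):               # psi_k sampled on its support, normalized on the grid
        ys = [y for y in grid if abs(y) <= 1.0 / k]
        vals = [k * bump(k * y) for y in ys]
        mass = sum(vals) * h
        return ys, [v / mass for v in vals]

    def mollify(k):                    # (f * psi_k)(x) = integral of f(x - y) psi_k(y) dy
        ys, pv = psi_k_values(k)
        return [sum(f(x - y) * p * h for y, p in zip(ys, pv)) for x in grid]

    for k in [2, 8, 32]:
        g = mollify(k)
        l1 = sum(abs(f(x) - gx) for x, gx in zip(grid, g)) * h
        print(k, round(l1, 4))         # the L^1 distance decreases as k increases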



Chapter 23

The Lp Spaces This book is about linear algebra and things which are related to this subject. Nearly all of it is on finite dimensional vector spaces. Now it is time to include some of the most useful function spaces. They are infinite dimensional vector spaces.

23.1

Basic Inequalities And Properties

One of the main applications of the Lebesgue integral is to the study of various sorts of functions space. These are vector spaces whose elements are functions of various types. One of the most important examples of a function space is the space of measurable functions whose absolute values are pth power integrable where p ≥ 1. These spaces, referred to as Lp spaces, are very useful in applications. In the chapter (Ω, S, µ) will be a measure space. Definition 23.1.1 Let 1 ≤ p < ∞. Define



L^p(Ω) ≡ {f : f is measurable and ∫_Ω |f(ω)|^p dµ < ∞}

For each p > 1 define q by

1/p + 1/q = 1.

Often one uses p′ instead of q in this context. L^p(Ω) is a vector space and has a norm. This is similar to the situation for R^n but the proof requires the following fundamental inequality.

Theorem 23.1.2 (Hölder's inequality) If f and g are measurable functions, then if p > 1,

∫ |f| |g| dµ ≤ (∫ |f|^p dµ)^{1/p} (∫ |g|^q dµ)^{1/q}.    (23.1)

Proof: First here is a proof of Young's inequality.

Lemma 23.1.3 If p > 1 and 0 ≤ a, b, then ab ≤ a^p/p + b^q/q.


Proof: Consider the graph of x = t^{p−1} (equivalently t = x^{q−1}) for t ∈ [0, a] and x ∈ [0, b]. From this picture, the sum of the area between the x axis and the curve added to the area between the t axis and the curve is at least as large as ab. Using beginning calculus, this is equivalent to the following inequality.

ab ≤ ∫_0^a t^{p−1} dt + ∫_0^b x^{q−1} dx = a^p/p + b^q/q.

The above picture represents the situation which occurs when p > 2 because the graph of the function is concave up. If 2 ≥ p > 1 the graph would be concave down or a straight line. You should verify that the same argument holds in these cases just as well. In fact, the only thing which matters in the above inequality is that the function x = t^{p−1} be strictly increasing. Note equality occurs when a^p = b^q. Here is an alternate proof.

Lemma 23.1.4 For a, b ≥ 0,

ab ≤ a^p/p + b^q/q

and equality occurs if and only if a^p = b^q.

Proof: If b = 0, the inequality is obvious. Fix b > 0 and consider

f(a) ≡ a^p/p + b^q/q − ab.

Then f′(a) = a^{p−1} − b. This is negative when a < b^{1/(p−1)} and is positive when a > b^{1/(p−1)}. Therefore, f has a minimum when a = b^{1/(p−1)}. In other words, when a^p = b^{p/(p−1)} = b^q since 1/p + 1/q = 1. Thus the minimum value of f is

b^q/p + b^q/q − b^{1/(p−1)} b = b^q − b^q = 0.

It follows f ≥ 0 and this yields the desired inequality.

Proof of Hölder's inequality: If either ∫ |f|^p dµ or ∫ |g|^q dµ equals ∞, the inequality 23.1 is obviously valid because ∞ ≥ anything. If either ∫ |f|^p dµ or ∫ |g|^q dµ equals 0, then f = 0 a.e. or g = 0 a.e., so in this case the left side of the inequality equals 0 and the inequality is therefore true. Therefore assume both ∫ |f|^p dµ and ∫ |g|^q dµ are less than ∞ and not equal to 0. Let

(∫ |f|^p dµ)^{1/p} = I(f) and let (∫ |g|^q dµ)^{1/q} = I(g).

Then using the lemma,

∫ (|f|/I(f)) (|g|/I(g)) dµ ≤ (1/p) ∫ |f|^p / I(f)^p dµ + (1/q) ∫ |g|^q / I(g)^q dµ = 1.

Hence,

∫ |f| |g| dµ ≤ I(f) I(g) = (∫ |f|^p dµ)^{1/p} (∫ |g|^q dµ)^{1/q}.

This proves Hölder's inequality. The following lemma will be needed.

Lemma 23.1.5 Suppose x, y ∈ C. Then

|x + y|^p ≤ 2^{p−1} (|x|^p + |y|^p).

(|x| + |y|)/2 = m

|x|

|y|

m

Now as shown above,

(

|x| + |y| 2

)p

p



p

|x| + |y| 2

which implies p

p

p

p

|x + y| ≤ (|x| + |y|) ≤ 2p−1 (|x| + |y| ) and this proves the lemma. Note that if y = ϕ (x) is any function for which the graph of ϕ is concave up, you could get a similar inequality by the same argument. Corollary 23.1.6 (Minkowski inequality) Let 1 ≤ p < ∞. Then (∫

)1/p p

|f + g| dµ

(∫ ≤

)1/p p

|f | dµ

(∫ +

)1/p p

|g| dµ

.

(23.2)

Proof: If p = 1, this is obvious because it is just the triangle inequality. Let p > 1. Without loss of generality, assume (∫ )1/p (∫ )1/p p p |f | dµ + |g| dµ 0 there exists N such that if n, m ≥ N , then ||fn − fm ||p < ε. Now select a subsequence as follows. Let n1 be such that ||fn − fm ||p < 2−1 whenever n, m ≥ n1 . Let n2 be such that n2 > n1 and ||fn − fm ||p < 2−2 whenever n, m ≥ n2 . If n1 , · · · , nk have been chosen, let nk+1 > nk and whenever n, m ≥ nk+1 , ||fn − fm ||p < 2−(k+1) . The subsequence just mentioned is {fnk }. Thus, ||fnk − fnk+1 ||p < 2−k . Let gk+1 = fnk+1 − fnk . Then by the corollary to Minkowski’s inequality, ∞>

∞ ∑

||gk+1 ||p ≥

k=1

for all m. It follows that

∫ (∑ m

m ∑ k=1

m ∑ ||gk+1 ||p ≥ |gk+1 | k=1

)p

(

|gk+1 |

dµ ≤

∞ ∑

p

)p ||gk+1 ||p

n. Thus f_n is uniformly bounded and product measurable. By the above lemma,
∫_{X_m} ( ∫_{Y_k} |f_n (x, y)|^p dλ )^{1/p} dµ ≥ ( ∫_{Y_k} ( ∫_{X_m} |f_n (x, y)| dµ )^p dλ )^{1/p}.    (23.9)
Now observe that |f_n (x, y)| increases in n and the pointwise limit is |f (x, y)|. Therefore, using the monotone convergence theorem in 23.9 yields the same inequality with f replacing f_n. Next let k → ∞ and use the monotone convergence theorem again to replace Y_k with Y. Finally let m → ∞ in what is left to obtain 23.8.  Note that the proof of this theorem depends on two manipulations, the interchange of the order of integration and Holder's inequality. Note that there is nothing to check in the case of double sums. Thus if a_{ij} ≥ 0, it is always the case that
( ∑_j ( ∑_i a_{ij} )^p )^{1/p} ≤ ∑_i ( ∑_j a_{ij}^p )^{1/p}

because the integrals in this case are just sums and (i, j) → aij is measurable. The Lp spaces have many important properties.
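For readers who want to experiment, the inequalities of this section reduce to statements about finite sums when µ is counting measure on finitely many points, and they are easy to sanity-check numerically. The following is a minimal Python sketch, not part of the development; the arrays f, g and the exponent p are arbitrary test choices.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3.0
q = p / (p - 1.0)          # conjugate exponent, 1/p + 1/q = 1

f = rng.random(1000)       # nonnegative "functions" for counting measure on 1000 points
g = rng.random(1000)

def norm(h, r):
    """l^r norm with respect to counting measure."""
    return (np.abs(h) ** r).sum() ** (1.0 / r)

# Holder:  sum |f g| <= ||f||_p ||g||_q
assert np.abs(f * g).sum() <= norm(f, p) * norm(g, q) + 1e-12

# Minkowski:  ||f + g||_p <= ||f||_p + ||g||_p
assert norm(f + g, p) <= norm(f, p) + norm(g, p) + 1e-12

print("Holder and Minkowski hold on this sample.")
```

Of course a finite check proves nothing, but it is a quick way to catch a misremembered direction of an inequality or a wrong conjugate exponent.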

23.2 Density Considerations

Theorem 23.2.1 Let p ≥ 1 and let (Ω, S, µ) be a measure space. Then the simple functions are dense in Lp (Ω).
Proof: Recall that a function f having values in R can be written in the form f = f⁺ − f⁻ where f⁺ = max (0, f), f⁻ = max (0, −f). Therefore, an arbitrary complex valued function f is of the form
f = Re f⁺ − Re f⁻ + i ( Im f⁺ − Im f⁻ ).
If each of these nonnegative functions is approximated by a simple function, it follows f is also approximated by a simple function. Therefore, there is no loss of generality in assuming at the outset that f ≥ 0. Since f is measurable, Theorem 20.1.6 implies there is an increasing sequence of simple functions, {s_n}, converging pointwise to f (x). Now |f (x) − s_n (x)| ≤ |f (x)|. By the Dominated Convergence theorem,
0 = lim_{n→∞} ∫ |f (x) − s_n (x)|^p dµ.

Thus simple functions are dense in Lp .  Recall that for Ω a topological space, Cc (Ω) is the space of continuous functions with compact support in Ω. Also recall the following definition.


Definition 23.2.2 Let (Ω, S, µ) be a measure space and suppose (Ω, τ ) is also a topological space. Then (Ω, S, µ) is called a regular measure space if the σ algebra of Borel sets is contained in S and for all E ∈ S, µ(E) = inf{µ(V ) : V ⊇ E and V open} and if µ (E) < ∞,

µ(E) = sup{µ(K) : K ⊆ E and K is compact }

and µ (K) < ∞ for any compact set, K. For example Lebesgue measure is an example of such a measure. More generally these measures are often referred to as Radon measures. The following useful lemma is stated here for convenience. It is Theorem 22.0.3 on Page 541. Lemma 23.2.3 Let Ω be a metric space in which the closed balls are compact and let K be a compact subset of V , an open set. Then there exists a continuous function f : Ω → [0, 1] such that f (x) = 1 for all x ∈ K and spt(f ) is a compact subset of V . That is, K ≺ f ≺ V. It is not necessary to be in a metric space to do this. You can accomplish the same thing using Urysohn’s lemma. Theorem 23.2.4 Let (Ω, S, µ) be a regular measure space as in Definition 23.2.2 where the conclusion of Lemma 23.2.3 holds. Then Cc (Ω) is dense in Lp (Ω). Proof: First consider a measurable set, E where µ (E) < ∞. Let K ⊆ E ⊆ V where µ (V \ K) < ε. Now let K ≺ h ≺ V. Then ∫ ∫ p |h − XE | dµ ≤ XVp \K dµ = µ (V \ K) < ε. It follows that for each s a simple function in Lp (Ω) , there exists h ∈ Cc (Ω) such that ||s − h||p < ε. This is because if m ∑ s(x) = ci XEi (x) i=1

is a simple function in Lp (Ω) where the c_i are the distinct nonzero values of s, each µ (E_i) < ∞ since otherwise s ∉ Lp due to the inequality
∫ |s|^p dµ ≥ |c_i|^p µ (E_i).

By Theorem 23.2.1, simple functions are dense in Lp (Ω), so Cc (Ω) is dense in Lp (Ω) as well. 
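As a numerical illustration of these density statements, one can approximate an Lp function by the simple functions obtained by truncating and discretizing its values, and watch the Lp error go to zero. The sketch below, with Lebesgue measure on (0, 1) replaced by a fine grid, is only a plausibility check; the particular function f and the truncation levels are arbitrary choices.

```python
import numpy as np

p = 2.0
x = np.linspace(0.0, 1.0, 200_001)[1:]   # grid on (0, 1]; dx stands in for dm
dx = x[1] - x[0]
f = x ** (-0.25)                         # unbounded near 0, but still in L^2(0,1)

def lp_err(a, b):
    return (np.abs(a - b) ** p * dx).sum() ** (1.0 / p)

for n in (1, 2, 4, 8, 16):
    s = np.minimum(f, n)                     # truncate at height n
    s = np.floor(s * 2.0 ** n) / 2.0 ** n    # round down to multiples of 2^-n: a simple function
    print(n, lp_err(f, s))                   # L^p error shrinks as n grows
```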

23.3 Separability

Theorem 23.3.1 For p ≥ 1 and µ a Radon measure, Lp (Rn , µ) is separable. Recall this means there exists a countable set, D, such that if f ∈ Lp (Rn , µ) and ε > 0, there exists g ∈ D such that ||f − g||p < ε. Proof: Let Q be all functions of the form cX[a,b) where [a, b) ≡ [a1 , b1 ) × [a2 , b2 ) × · · · × [an , bn ), and both ai , bi are rational, while c has rational real and imaginary parts. Let D be the set of all finite sums of functions in Q. Thus, D is countable. In fact D is dense in Lp (Rn , µ). To prove this it is necessary to show that for every f ∈ Lp (Rn , µ), there exists an element of D, s such that ||s − f ||p < ε. If it can be shown that for every g ∈ Cc (Rn ) there exists h ∈ D such that ||g − h||p < ε, then this will suffice because if f ∈ Lp (Rn ) is arbitrary, Theorem 23.2.4 implies


there exists g ∈ Cc (Rn ) such that ||f − g||p < ε/2 and then there would exist h ∈ D such that ||h − g||p < ε/2. By the triangle inequality,

||f − h||p ≤ ||h − g||p + ||g − f ||p < ε. Therefore, assume at the outset that f ∈ Cc (Rn ).∏ n Let Pm consist of all sets of the form [a, b) ≡ i=1 [ai , bi ) where ai = j2−m and bi = (j + 1)2−m for j an integer. Thus Pm consists of a tiling of Rn into half open rectangles having diameters 1 n 2−m n 2 . There are countably many of these rectangles; so, let Pm = {[ai , bi )}∞ = i=1 and R ∞ m ∪i=1 [ai , bi ). Let ci be complex numbers with rational real and imaginary parts satisfying −m |f (ai ) − cm , i | 0 there exists δ > 0 such that if |x − y| < δ, |f (x) − f (y)| < ε/2. Now let m be large enough that every box in Pm has diameter less than δ and also that 2−m < ε/2. Then if [ai , bi ) is one of these boxes of Pm , and x ∈ [ai , bi ), |f (x) − f (ai )| < ε/2 and

−m |f (ai ) − cm < ε/2. i | 0 be given. By Theorem 23.3.1 there exists s ∈ D such that ||F − s||p < ε. Thus ||s − f ||Lp (Ω,µ) ≤ ||s − F ||Lp (Rn ,µ) < ε e is dense in Lp (Ω). and so the countable set D
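The countable dense set D constructed in this proof is concrete enough to experiment with: replace a continuous f by its value at the left endpoint of each dyadic box [j 2^{−m}, (j + 1) 2^{−m}) and the Lp error is controlled by the modulus of continuity. A one-dimensional sketch follows; it omits the rational rounding of the coefficients, and the test function and grid are arbitrary choices.

```python
import numpy as np

p = 2.0
x = np.linspace(-4.0, 4.0, 800_001)
dx = x[1] - x[0]

def f(t):
    # continuous with (essentially) compact support
    return np.exp(-t ** 2) * np.cos(3.0 * t)

for m in (1, 2, 4, 6, 8):
    a = np.floor(x * 2.0 ** m) / 2.0 ** m     # left endpoint of the dyadic box containing x
    h = f(a)                                   # step function taking the value f(a_i) on each box
    err = (np.abs(f(x) - h) ** p * dx).sum() ** (1.0 / p)
    print(m, err)                              # decreases as the boxes shrink
```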


23.4 Continuity Of Translation

Definition 23.4.1 Let f be a function defined on U ⊆ Rn and let w ∈ Rn . Then fw will be the function defined on w + U by fw (x) = f (x − w). Theorem 23.4.2 (Continuity of translation in Lp ) Let f ∈ Lp (Rn ) with the measure being Lebesgue measure. Then lim ||fw − f ||p = 0. ||w||→0

Proof: Let ε > 0 be given and let g ∈ Cc (Rn ) with ||g − f ||p < ε/3. Since Lebesgue measure is translation invariant (mn (w + E) = mn (E)), ||gw − fw ||p = ||g − f ||p
0 is given there exists δ < δ1 such that if |w| < δ, it follows that
|g (x − w) − g (x)| < ε / ( 3 ( 1 + mn (B) )^{1/p} ).
Therefore,
||g − gw||p = ( ∫_B |g (x) − g (x − w)|^p dmn )^{1/p} ≤ ε mn (B)^{1/p} / ( 3 ( 1 + mn (B) )^{1/p} ) < ε/3.
Therefore, whenever |w| < δ, it follows ||g − gw||p < ε/3 and so from 23.11, ||f − fw||p < ε. This proves the theorem.

23.5 Mollifiers And Density Of Smooth Functions

Definition 23.5.1 Let U be an open subset of Rn . Cc∞ (U ) is the vector space of all infinitely differentiable functions which equal zero for all x outside of some compact set contained in U . Similarly, Ccm (U ) is the vector space of all functions which are m times continuously differentiable and whose support is a compact subset of U .
Example 23.5.2 Let U = B (z, 2r) and let
ψ (x) = exp( ( |x − z|^2 − r^2 )^{−1} ) if |x − z| < r,   ψ (x) = 0 if |x − z| ≥ r.
Then a little work shows ψ ∈ Cc∞ (U ). The following also is easily obtained.
Lemma 23.5.3 Let U be any open set. Then Cc∞ (U ) ≠ ∅.


Proof: Pick z ∈ U and let r be small enough that B (z, 2r) ⊆ U . Then let ψ ∈ Cc∞ (B (z, 2r)) ⊆ Cc∞ (U ) be the function of the above example.
Definition 23.5.4 Let U = {x ∈ Rn : |x| < 1}. A sequence {ψ_m} ⊆ Cc∞ (U ) is called a mollifier (this is sometimes called an approximate identity if the differentiability is not included) if
ψ_m (x) ≥ 0, ψ_m (x) = 0 if |x| ≥ 1/m, and ∫ ψ_m (x) dmn = 1.
Sometimes it may be written as {ψ_ε} where ψ_ε satisfies the above conditions except ψ_ε (x) = 0 if |x| ≥ ε. In other words, ε takes the place of 1/m and in everything that follows ε → 0 instead of m → ∞. As before, ∫ f (x, y) dµ (y) will mean x is fixed and the function y → f (x, y) is being integrated. To make the notation more familiar, dx is written instead of dmn (x).
Example 23.5.5 Let ψ ∈ Cc∞ (B (0, 1)) (B (0, 1) = {x : |x| < 1}) with ψ (x) ≥ 0 and ∫ ψ dm = 1. Let ψ_m (x) = c_m ψ (m x) where c_m is chosen in such a way that ∫ ψ_m dm = 1. By the change of variables theorem, c_m = m^n.
Definition 23.5.6 A function f is said to be in L1loc (Rn , µ) if f is µ measurable and if |f| X_K ∈ L1 (Rn , µ) for every compact set K. Here µ is a Radon measure on Rn . Usually µ = mn , Lebesgue measure. When this is so, write L1loc (Rn ) or Lp (Rn ), etc. If f ∈ L1loc (Rn , µ) and g ∈ Cc (Rn ),
f ∗ g (x) ≡ ∫ f (y) g (x − y) dµ.
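To see convolution with a mollifier in action, the next sketch builds the bump of Example 23.5.2 in one dimension (centered at 0 with r = 1), rescales it to ψ_m, and convolves it with a discontinuous function. The result is smooth and approaches the original in L1 as m grows. The grid sizes and the test function are arbitrary choices, and np.convolve together with the step size dx only approximates the integral defining f ∗ ψ_m.

```python
import numpy as np

dx = 1e-3
x = np.arange(-3.0, 3.0, dx)
f = (x > 0).astype(float)                 # a step function: rough, but locally integrable

def mollifier(m):
    """Discretized psi_m supported in (-1/m, 1/m), normalized so its samples sum to 1/dx."""
    t = np.arange(-1.0 / m + dx / 2, 1.0 / m, dx)
    w = np.zeros_like(t)
    inside = np.abs(m * t) < 1.0 - 1e-12
    w[inside] = np.exp(1.0 / ((m * t[inside]) ** 2 - 1.0))
    return w / (w.sum() * dx)

for m in (2, 8, 32):
    psi_m = mollifier(m)
    smooth = np.convolve(f, psi_m, mode="same") * dx   # approximates (f * psi_m)(x)
    print(m, np.abs(smooth - f).sum() * dx)            # L^1 distance on [-3,3] shrinks with m
```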

The following lemma will be useful in what follows. It says that one of these very irregular functions in L1loc (Rn , µ) is smoothed out by convolving with a mollifier.
Lemma 23.5.7 Let f ∈ L1loc (Rn , µ) and g ∈ Cc∞ (Rn ). Then f ∗ g is an infinitely differentiable function. Here µ is a Radon measure on Rn .
Proof: Consider the difference quotient for calculating a partial derivative of f ∗ g,
( f ∗ g (x + t e_j) − f ∗ g (x) ) / t = ∫ f (y) ( g (x + t e_j − y) − g (x − y) ) / t dµ (y).
Using the fact that g ∈ Cc∞ (Rn ), the quotient
( g (x + t e_j − y) − g (x − y) ) / t
is uniformly bounded. To see this easily, use Theorem 18.4.2 on Page 441 to get the existence of a constant M depending on max {||Dg (x)|| : x ∈ Rn } such that
|g (x + t e_j − y) − g (x − y)| ≤ M |t|
for any choice of x and y. Therefore, there exists a dominating function for the integrand of the above integral which is of the form C |f (y)| X_K where K is a compact set depending on the support of g. It follows that the limit of the difference quotient above passes inside the integral as t → 0 and
∂/∂x_j (f ∗ g) (x) = ∫ f (y) ∂/∂x_j g (x − y) dµ (y).
Now letting ∂g/∂x_j play the role of g in the above argument, partial derivatives of all orders exist. A similar use of the dominated convergence theorem shows all these partial derivatives are also continuous. This proves the lemma.
Recall also the following partition of unity result. It was proved earlier. See Corollary 22.0.4 on Page 542.

Theorem 23.5.8 Let K be a compact set in Rn and let {U_i}_{i=1}^∞ be an open cover of K. Then there exist functions ψ_i ∈ Cc∞ (U_i) such that ψ_i ≺ U_i and for all x ∈ K,
∑_{i=1}^∞ ψ_i (x) = 1.

If K1 is a compact subset of U1 there exist such functions such that also ψ 1 (x) = 1 for all x ∈ K1 . Note that in the last conclusion of above corollary, the set U1 could be replaced with Ui for any fixed i by simply renumbering. Theorem 23.5.9 For each p ≥ 1, Cc∞ (Rn ) is dense in Lp (Rn ). Here the measure is Lebesgue measure. Proof: Let f ∈ Lp (Rn ) and let ε > 0 be given. Choose g ∈ Cc (Rn ) such that ||f − g||p < 2ε . This can be done by using Theorem 23.2.4. Now let ∫ ∫ gm (x) = g ∗ ψ m (x) ≡ g (x − y) ψ m (y) dmn (y) = g (y) ψ m (x − y) dmn (y) where {ψ m } is a mollifier. It follows from Lemma 23.5.7 gm ∈ Cc∞ (Rn ). It vanishes if x ∈ / spt(g) + 1 B(0, m ). (∫ ||g − gm ||p

|g(x) −

= ≤

∫ g(x − y)ψ m (y)dmn (y)| dmn (x) p

) p1

(∫ ∫ ) p1 p ( |g(x) − g(x − y)|ψ m (y)dmn (y)) dmn (x) ∫ (∫



|g(x) − g(x − y)| dmn (x) p

) p1

∫ = 1 ) B(0, m

||g − gy ||p ψ m (y)dmn (y)
ε]∩Z = ∅ and without loss of generality, you can assume µ ([M f > ε]) > 0. Next, for each x ∈ [M f > ε] there exists a ball Bx = B (x,rx ) with rx ≤ 1 and ∫ −1 µ (Bx ) |f | dµ > ε. B(x,rx )

Let F be this collection of balls so that [M f > ε] is the set of centers of balls of F. By the Besicovitch covering theorem, n [M f > ε] ⊆ ∪N i=1 {B : B ∈ Gi } where Gi is a collection of disjoint balls of F. Now for some i, µ ([M f > ε]) /Nn ≤ µ (∪ {B : B ∈ Gi }) because if this is not so, then µ ([M f > ε])



Nn ∑

µ (∪ {B : B ∈ Gi })

i=1


ε]) = µ ([M f > ε]), Nn i=1

a contradiction. Therefore for this i, µ ([M f > ε]) Nn

≤ ≤

µ (∪ {B : B ∈ Gi }) = ε−1

∫ Rn



µ (B) ≤

B∈Gi



ε

−1

B∈Gi

∫ |f | dµ B

|f | dµ = ε−1 ||f ||1 .

This shows Claim 1. Claim 2: If g is any continuous function defined on Rn , then for x ∈ / Z, ∫ 1 lim |g (y) − g (x)| dµ (y) = 0 r→0 µ (B (x, r)) B(x,r) and

1 r→0 µ (B (x,r))



lim

g (y) dµ (y) = g (x).

(23.12)

B(x,r)

Proof: Since g is continuous at x, whenever r is small enough, ∫ ∫ 1 1 |g (y) − g (x)| dµ (y) ≤ ε dµ (y) = ε. µ (B (x, r)) B(x,r) µ (B (x,r)) B(x,r) 23.12 follows from the above and the triangle inequality. This proves the claim. Now let g ∈ Cc (Rn ) and x ∈ / Z. Then from the above observations about continuous functions, ]) ([ ∫ 1 |f (y) − f (x)| dµ (y) > ε (23.13) µ x∈ / Z : lim sup r→0 µ (B (x, r)) B(x,r)

23.6. FUNDAMENTAL THEOREM OF CALCULUS FOR RADON MEASURES ]) ∫ 1 ε ≤ µ x∈ / Z : lim sup |f (y) − g (y)| dµ (y) > 2 r→0 µ (B (x, r)) B(x,r) ([ ]) ε +µ x ∈ / Z : |g (x) − f (x)| > . 2 ([ ([ ε ]) ε ]) ≤ µ M (f − g) > + µ |f − g| > 2 2 Now ∫ ε ([ ε ]) |f − g| dµ ≥ µ |f − g| > 2 2 [|f −g|> 2ε ] and so from Claim 1 23.14 and hence 23.13 is dominated by ( ) 2 Nn + ||f − g||L1 (Rn ,µ) . ε ε


([

(23.14)

But by regularity of Radon measures, Cc (Rn ) is dense in L1 (Rn , µ) (See Theorem 23.5.9 for somewhat more than is needed.), and so since g in the above is arbitrary, this shows 23.13 equals 0. Now ([ ]) ∫ 1 µ x∈ / Z : lim sup |f (y) − f (x)| dµ (y) > 0 r→0 µ (B (x, r)) B(x,r) ([ ]) ∫ ∞ ∑ 1 1 ≤ µ x∈ / Z : lim sup |f (y) − f (x)| dµ (y) > =0 k r→0 µ (B (x, r)) B(x,r) k=1

By completeness of µ this implies [

1 x∈ / Z : lim sup µ (B (x, r)) r→0

]



|f (y) − f (x)| dµ (y) > 0 B(x,r)

is a set of µ measure zero.  The following corollary is the main result referred to as the Lebesgue Besicovitch Differentiation theorem. Recall L1loc (Rn , µ) refers to functions f which are measurable and also f XK ∈ L1 (Rn , µ) for any K compact. / Z, Corollary 23.6.3 If f ∈ L1loc (Rn , µ), then for a.e.x ∈ ∫ 1 lim |f (y) − f (x)| dµ (y) = 0 . r→0 µ (B (x, r)) B(x,r)

(23.15)

Proof: If f is replaced by f XB(0,k) then the conclusion 23.15 holds for all x ∈F / k where Fk is a set of µ measure 0. Letting k = 1, 2, · · · , and F ≡ ∪∞ F , it follows that F is a set of measure zero k k=1 and for any x ∈ / F , and k ∈ {1, 2, · · · }, 23.15 holds if f is replaced by f XB(0,k) . Picking any such x, and letting k > |x| + 1, this shows ∫ 1 lim |f (y) − f (x)| dµ (y) r→0 µ (B (x, r)) B(x,r) ∫ 1 f XB(0,k) (y) − f XB(0,k) (x) dµ (y) = 0.  = lim r→0 µ (B (x, r)) B(x,r) In case µ is ordinary Lebesgue measure on Rn , the set Z = ∅ and we obtain the usual Lebesgue fundamental theorem of calculus. Corollary 23.6.4 If f ∈ L1loc (Rn , mn ), then for a.e.x, ∫ 1 lim |f (y) − f (x)| dmn (y) = 0 . r→0 mn (B (x, r)) B(x,r) In particular, for a.e. x, 1 lim r→0 mn (B (x, r))

∫ f (y) dmn (y) = f (x) . B(x,r)

(23.16)
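In one dimension with Lebesgue measure, 23.16 can be watched numerically: the averages of f over the ball B(x, r) = (x − r, x + r) converge to f (x) as r → 0, for instance at any point of continuity. A rough sketch follows, with an arbitrary integrable f and an arbitrary point x0; the averages are computed by sampling on a fine grid.

```python
import numpy as np

def f(t):
    return np.exp(-np.abs(t)) * np.sign(t)   # an L^1(R) function, continuous away from 0

x0 = 0.7
for r in (1.0, 0.1, 0.01, 0.001):
    t = np.linspace(x0 - r, x0 + r, 20_001)
    avg = f(t).mean()                        # approximates (1/m(B(x0,r))) * integral over the ball
    print(r, avg, f(x0))                     # the averages approach f(x0)
```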



23.7

Exercises

1. Let E be a Lebesgue measurable set in R. Suppose m(E) > 0. Consider the set E − E = {x − y : x ∈ E, y ∈ E}. Show that E − E contains an interval. Hint: Let ∫ f (x) = XE (t)XE (x + t)dt. Note f is continuous at 0 and f (0) > 0 and use continuity of translation in Lp . 2. Establish the inequality ||f g||r ≤ ||f ||p ||g||q whenever

1 r

=

1 p

+ 1q .

3. Let (Ω, S, µ) be counting measure on N. Thus Ω = N and S = P (N) with µ (S) = number of things in S. Let 1 ≤ p ≤ q. Show that in this case, L1 (N) ⊆ Lp (N) ⊆ Lq (N) . ∫ Hint: This is real easy if you consider what Ω f dµ equals. How are the norms related? 4. Consider the function, f (x, y) =

xp−1 py

+

y q−1 qx

for x, y > 0 and

f (x, y) ≥ 1 for all such x, y and show this implies xy ≤

p

x p

+

1 p

q

y q

+

1 q

= 1. Show directly that

.

5. Give an example of a sequence of functions in Lp (R) which converges to zero in Lp but does not converge pointwise to 0. Does this contradict the proof of the theorem that Lp is complete? 6. Let K be a bounded subset of Lp (Rn ) and suppose that there exists G such that G is compact with ∫ p |u (x)| dx < εp Rn \G

and for all ε > 0, there exist a δ > 0 and such that if |h| < δ, then ∫ p |u (x + h) − u (x)| dx < εp for all u ∈ K. Show that K is precompact in Lp (Rn ). Hint: Let ϕk be a mollifier and consider Kk ≡ {u ∗ ϕk : u ∈ K} . Verify the conditions of the Ascoli Arzela theorem for these functions defined on G and show there is an ε net for each ε > 0. Can you modify this to let an arbitrary open set take the place of Rn ? 7. Let (Ω, d) be a metric space and suppose also that (Ω, S, µ) is a regular measure space such that µ (Ω) < ∞ and let f ∈ L1 (Ω) where f has complex values. Show that for every ε > 0, there exists an open set of measure less than ε, denoted here by V and a continuous function, g defined on Ω such that f = g on V C . Thus, aside from a set of small measure, f is continuous. If |f (ω)| ≤ M , show that it can be assumed that |g (ω)| ≤ M . This is called Lusin’s theorem. Hint: Use Theorems 23.2.4 and 23.1.10 to obtain a sequence of functions in Cc (Ω) , {gn } which converges pointwise a.e. to f and then use Egoroff’s theorem to obtain a small set, W of measure less than ε/2 such that)convergence is uniform on W C . Now let F be a closed ( subset of W C such that µ W C \ F < ε/2. Let V = F C . Thus µ (V ) < ε and on F = V C , the convergence of {gn } is uniform showing that the restriction of f to V C is continuous. Now use the Tietze extension theorem.

23.7. EXERCISES

587

8. Let ϕm ∈ Cc∞ (Rn ), ϕm (x) ≥ 0,and

∫ Rn

ϕm (y)dy = 1 with

lim sup {|x| : x ∈ spt (ϕm )} = 0.

m→∞

Show if f ∈ Lp (Rn ), limm→∞ f ∗ ϕm = f in Lp (Rn ). 9. Let ϕ : R → R be convex. This means ϕ(λx + (1 − λ)y) ≤ λϕ(x) + (1 − λ)ϕ(y) whenever λ ∈ [0, 1]. Verify that if x < y < z, then

ϕ(y)−ϕ(x) y−x



ϕ(z)−ϕ(y) z−y

and that

ϕ(z)−ϕ(x) z−x



ϕ(z)−ϕ(y) . z−y

Show if s ∈ R there exists λ such that ϕ(s) ≤ ϕ(t) + λ(s − t) for all t. Show that if ϕ is convex, then ϕ is continuous. If ϕ : R → R is ∫convex, µ(Ω) = 1, and f : Ω → R is in L1 (Ω), 10. ↑ Prove∫ Jensen’s inequality. ∫ then ϕ( Ω f du) ≤ Ω ϕ(f )dµ. Hint: Let s = Ω f dµ and use Problem 9. ′

11. Let p1 + p1′ = 1, p > 1, let f ∈ Lp (R), g ∈ Lp (R). Show f ∗ g is uniformly continuous on R and |(f ∗ g)(x)| ≤ ||f ||Lp ||g||Lp′ . Hint: You need to consider why f ∗ g exists and then this follows from the definition of convolution and continuity of translation in Lp . ∫∞ ∫1 12. B(p, q) = 0 xp−1 (1 − x)q−1 dx, Γ(p) = 0 e−t tp−1 dt for p, q > 0. The first of these is called the beta function, while the second is the gamma function. Show a.) Γ(p + 1) = pΓ(p); b.) Γ(p)Γ(q) = B(p, q)Γ(p + q). ∫x 13. Let f ∈ Cc (0, ∞) and define F (x) = x1 0 f (t)dt. Show ||F ||Lp (0,∞) ≤

p ||f ||Lp (0,∞) whenever p > 1. p−1

Hint: Argue ∫ ∞ there is no loss of generality in assuming f ≥ 0 and then assume this is so. Integrate 0 |F (x)|p dx by parts as follows: ∫



∫ z }| { − p F dx = xF p |∞ 0 show = 0

p



xF p−1 F ′ dx.

0

0 ′

Now show xF = f − F and use this in the last integral. Complete the argument by using Holder’s inequality and p − 1 = p/q. 14. ↑ ∫Now suppose f ∈ Lp (0, ∞), p > 1, and f not necessarily in Cc (0, ∞). Show that F (x) = 1 x x 0 f (t)dt still makes sense for each x > 0. Show the inequality of Problem 13 is still valid. This inequality is called Hardy’s inequality. Hint: To show this, use the above inequality along with the density of Cc (0, ∞) in Lp (0, ∞). 15. Suppose f, g ≥ 0. When does equality hold in Holder’s inequality? 16. ↑ Show the Vitali Convergence theorem implies the Dominated Convergence theorem for finite measure spaces but there exist examples where the Vitali convergence theorem works and the dominated convergence theorem does not. 17. ↑ Suppose µ(Ω) < ∞, {fn } ⊆ L1 (Ω), and ∫ h (|fn |) dµ < C Ω

for all n where h is a continuous, nonnegative function satisfying lim

t→∞

h (t) = ∞. t

Show {fn } is uniformly integrable. In applications, this often occurs in the form of a bound on ||fn ||p .

CHAPTER 23. THE LP SPACES

588

18. ↑ Sometimes, especially in books on probability, a different definition of uniform integrability is used than that presented here. A set of functions, S, defined on a finite measure space, (Ω, S, µ) is said to be uniformly integrable if for all ε > 0 there exists α > 0 such that for all f ∈ S, ∫ [|f |≥α]

|f | dµ ≤ ε.

Show that this definition is equivalent to the definition of uniform integrability with the addition of the condition that there is a constant, C < ∞ such that ∫ |f | dµ ≤ C for all f ∈ S. 19. f ∈ L∞ (Ω, µ) if there exists a set of measure zero, E, and a constant C < ∞ such that |f (x)| ≤ C for all x ∈ / E. ||f ||∞ ≡ inf{C : |f (x)| ≤ C a.e.}. Show || ||∞ is a norm on L∞ (Ω, µ) provided f and g are identified if f (x) = g(x) a.e. Show L∞ (Ω, µ) is complete. Hint: You might want to show that [|f | > ||f ||∞ ] has measure zero so ||f ||∞ is the smallest number at least as large as |f (x)| for a.e. x. Thus ||f ||∞ is one of the constants, C in the above. 20. Suppose f ∈ L∞ ∩ L1 . Show limp→∞ ||f ||Lp = ||f ||∞ . Hint: ∫ p p (||f ||∞ − ε) µ ([|f | > ||f ||∞ − ε]) ≤ |f | dµ ≤ [|f |>||f ||∞ −ε] ∫ ∫ ∫ p p−1 p−1 |f | dµ = |f | |f | dµ ≤ ||f ||∞ |f | dµ. Now raise both ends to the 1/p power and take lim inf and lim sup as p → ∞. You should get ||f ||∞ − ε ≤ lim inf ||f ||p ≤ lim sup ||f ||p ≤ ||f ||∞ 21. Suppose µ(Ω) < ∞. Show that if 1 ≤ p < q, then Lq (Ω) ⊆ Lp (Ω). Hint inequality.

Use Holder’s

√ 22. Show L1 (R) * L2 (R) and L2 (R) * L1 (R) if Lebesgue measure is used. Hint: Consider 1/ x and 1/x. 23. Suppose that θ ∈ [0, 1] and r, s, q > 0 with θ 1−θ 1 = + . q r s ∫

show that

∫ |f | dµ) q

(

1/q

If q, r, s ≥ 1 this says that

≤ ((

∫ |f | dµ) r

1/r θ

) ((

|f |s dµ)1/s )1−θ.

||f ||q ≤ ||f ||θr ||f ||1−θ . s

Using this, show that ( ) ln ||f ||q ≤ θ ln (||f ||r ) + (1 − θ) ln (||f ||s ) . ∫

Hint:

∫ |f | dµ = q

Now note that 1 =

θq r

+

q(1−θ) s

|f |qθ |f |q(1−θ) dµ.

and use Holder’s inequality.

23.7. EXERCISES

589

24. Suppose f is a function in L1 (R) and f is infinitely differentiable. Is f ′ ∈ L1 (R)? Hint: What if ϕ ∈ Cc∞ (0, 1) and f (x) = ϕ (2n (x − n)) for x ∈ (n, n + 1) , f (x) = 0 if x < 0? 25. Let (Rn F, µ) be a measure space with µ a Radon measure. That is, it is regular, F contains the Borel sets, µ is complete, and finite on compact sets. Let A be a measurable set. Show that for a.e.x ∈ A, µ (A ∩ B (x, r)) . 1 = lim r→0 µ (B (x, r)) Such points are called “points of density”. Hint: The above quotient is nothing more than ∫ 1 XA (x) dµ µ (B (x, r)) B(x,r) Now consider Corollary 23.6.3.

590

CHAPTER 23. THE LP SPACES

Chapter 24

Representation Theorems 24.1

Basic Theory

As explained earlier, a normed linear space is a vector space X with a norm. This is a map ∥·∥ : X → [0, ∞) which satisfies the following axioms. 1. ∥x∥ ≥ 0 and equals 0 if and only if x = 0 2. For α a scalar and x a vector in X, ∥αx∥ = |α| ∥x∥ 3. ∥x + y∥ ≤ ∥x∥ + ∥y∥ Then, as discussed earlier, this is a metric space if d (x, y) ≡ ∥x − y∥. The field of scalars will be either R or C, usually C. Then as before, there is a definition of an inner product space. If (X, ∥·∥) is complete, meaning that Cauchy sequences converge, then this is called a Banach space. A whole lot can be said about Banach spaces but not in this book. Here, a specialization will be considered which ties in well with the earlier material on inner product spaces. Definition 24.1.1 Let X be a vector space. An inner product is a mapping from X × X to C if X is complex and from X × X to R if X is real, denoted by (x, y) which satisfies the following. (x, x) ≥ 0, (x, x) = 0 if and only if x = 0,

(24.1)

(x, y) = (y, x).

(24.2)

(ax + by, z) = a(x, z) + b(y, z).

(24.3)

For a, b ∈ C and x, y, z ∈ X, Note that 24.2 and 24.3 imply (x, ay + bz) = a(x, y) + b(x, z). Such a vector space is called an inner product space. The Cauchy Schwarz inequality is fundamental for the study of inner product spaces. The proof is identical to that given earlier in Theorem 11.4.4 on Page 266. Theorem 24.1.2 (Cauchy Schwarz) In any inner product space |(x, y)| ≤ ||x|| ||y||. Also, as earlier, the norm given by the inner product, really is a norm. 1/2

Proposition 24.1.3 For an inner product space, ||x|| ≡ (x, x)

does specify a norm.

The following lemma is called the parallelogram identity. It was also discussed earlier. 591

592

CHAPTER 24. REPRESENTATION THEOREMS

Lemma 24.1.4 In an inner product space, ||x + y||2 + ||x − y||2 = 2||x||2 + 2||y||2. The proof, a straightforward application of the inner product axioms, is left to the reader. Lemma 24.1.5 For x ∈ H, an inner product space, ||x|| = sup |(x, y)|

(24.4)

||y||≤1

Proof: By the Cauchy Schwarz inequality, if x ̸= 0, ( ) x ||x|| ≥ sup |(x, y)| ≥ x, = ||x|| . ||x|| ||y||≤1 It is obvious that 24.4 holds in the case that x = 0. Definition 24.1.6 A Hilbert space is an inner product space which is complete. Thus a Hilbert space is a Banach space in which the norm comes from an inner product as described above. Often people use the symbol |·| to denote the norm rather than ∥·∥. In Hilbert space, one can define a projection map onto closed convex nonempty sets. Definition 24.1.7 A set, K, is convex if whenever λ ∈ [0, 1] and x, y ∈ K, λx + (1 − λ)y ∈ K. This was done in the problems beginning with Problem 8 on Page 280. Theorem 24.1.8 Let K be a closed convex nonempty subset of a Hilbert space, H, and let x ∈ H. Then there exists a unique point P x ∈ K such that ||P x − x|| ≤ ||y − x|| for all y ∈ K. Corollary 24.1.9 Let K be a closed, convex, nonempty subset of a Hilbert space, H, and let x ∈ H. Then for z ∈ K, z = P x if and only if Re(x − z, y − z) ≤ 0

(24.5)

for all y ∈ K. Here we present an easier version which will be sufficient for what is needed. First is a simple lemma which is interesting for its own sake. Lemma 24.1.10 Suppose Re (w, y) = 0 for all y ∈ M a subspace of H, an inner product space. Then (w, y) = 0. Proof: Consider the following: (w, y) = (w, iy) =

Re (w, y) + i Im (w, y) −i (w, y) = −i (Re (w, y) + i Im (w, y))

and so Im (w, y) = Re (w, iy) = 0. Since Im (w, y) = 0 as well as Re (w, y) , it follows that (w, y) = 0.  Theorem 24.1.11 Let H be a Hilbert space and let M be a closed subspace. Then if w ∈ H, there exists a unique P w ∈ H such that ∥P w − w∥ ≤ ∥y − w∥ for all y ∈ M . Thus P w is the point of M closest to w and it is unique. Then z ∈ M equals P w if and only if (w − z, y) = 0 for all y ∈ M Also the map w → P w is a linear map which satisfies ∥P w∥ ≤ ∥w∥.

24.1. BASIC THEORY

593

Proof: Let λ ≡ inf {∥y − w∥ : y ∈ M }. Let {yn } be a minimizing sequence. Say ∥yn − w∥ < λ + n1 . Then, by the parallelogram identity,



yn − ym 2 (yn − w) + (ym − w) 2

+



2 2

2

2

yn − ym

+ yn + ym − w



2 2 1 2 ∥ym − yn ∥ 4



yn − w 2

ym − w 2



= 2 + 2 2 2 ( )2 ( )2 1 1 1 1 ≤ λ+ + λ+ 2 n 2 m ( )2 ( )2 1 1 1 1 ≤ λ+ + λ+ − λ2 2 n 2 m

(24.6) (24.7)

and clearly the right side converges to 0 if m, n → ∞. Thus this is a Cauchy sequence and so it converges to some z ∈ M since M is closed. Then λ ≤ ∥z − w∥ = lim ∥yn − w∥ ≤ λ n→∞

and so P w ≡ z is a closest point. There can be only one closest point because if zi works, then by the parallelogram identity again,

2

z1 + z2

z1 − w z2 − w 2



− w = +

2 2 2





z2 − w z1 − z2 2

z1 − w 2



+ 2 − = 2 2 2 2 =

1 2 1 2 1 1 2 2 λ + λ − ∥z1 − z2 ∥ = λ2 − ∥z1 − z2 ∥ 2 2 4 4

2 and so if these are different, then z1 +z is closer to w than λ contradicting the choice of zi . 2 Now for the characterization: For z ∈ M, letting y ∈ M consider z + t (y − z) for t ∈ R. Then

2

2

2

∥w − (z + t (y − z))∥ = ∥w − z∥ + t2 ∥y − z∥ − 2t Re (w − z, y − z)

(*)

Then for z to be the closest point to w, one needs the above to be minimized when t = 0. Taking a derivative, this requires that Re (w − z, y − z) = 0 for any y ∈ M. But this is the same as saying that Re (w − z, y) = 0 for all y ∈ M . By Lemma 24.1.10, (w − z, y) = 0 for all y ∈ M . Thus (w − z, y) = 0 if z = P w. 2 Conversely, if (w − z, y) = 0 for all y ∈ M, then ∗ shows that ∥w − (z + t (y − z))∥ achieves its minimum when t = 0 for any y. But a generic point of M is of the form z + t (y − z) and so z = P w. As to P being linear, for y ∈ M arbitrary, 0 = (αw1 + βw2 − P (αw1 + βw2 ) , y) Also, 0 = (αw1 + βw2 − (αP (w1 ) + βP (w2 )) , y) By uniqueness, P (αw1 + βw2 ) = αP (w1 ) + βP (w2 ). Finally, 2 (w − P w, P w) = 0, ∥P w∥ = (w, P w) ≤ ∥w∥ ∥P w∥ which yields the desired estimate.  Note that the operator norm of P equals 1. ∥P ∥ = sup ∥P w∥ ≤ sup ∥w∥ = 1 ∥w∥≤1

∥w∥≤1

Now pick w ∈ M with ∥w∥ = 1. Then ∥P ∥ ≥ ∥P w∥ = ∥w∥ = 1.

594

CHAPTER 24. REPRESENTATION THEOREMS

Definition 24.1.12 If A : X → Y is linear where X, Y are two normed linear spaces, then A is said to be in L (X, Y ) if and only if ∥A∥ ≡ sup ∥Ax∥Y < ∞ ∥x∥X ≤1

In case that Y = F the field of scalars, equal to either R or C, L (X, F) is known as the dual space, written here as X ′ . Actually it is more often written as X ∗ but I prefer the former notation because the latter is sometimes used to denote a purely algebraic dual space, meaning only that its elements are linear maps into F with no requirement of continuity. Doubtless there are drawbacks to my notation also. Thus P the projection map in the above is in L (H, H). There is a general easy result about L (X, Y ) which follows. It says that these linear maps are continuous. Theorem 24.1.13 Let X and Y be two normed linear spaces and let L : X → Y be linear (L(ax + by) = aL(x) + bL(y) for a, b scalars and x, y ∈ X). The following are equivalent a.) L is continuous at 0 b.) L is continuous c.) There exists K > 0 such that ∥Lx∥Y ≤ K ∥x∥X for all x ∈ X (L is bounded). Proof: a.)⇒b.) Let xn → x. It is necessary to show that Lxn → Lx. But (xn − x) → 0 and so from continuity at 0, it follows L (xn − x) = Lxn − Lx → 0 so Lxn → Lx. This shows a.) implies b.). b.)⇒c.) Since L is continuous, L is continuous at 0. Hence ||Lx||Y < 1 whenever ||x||X ≤ δ for some δ. Therefore, suppressing the subscript on the || ||, ( ) δx ||L || ≤ 1. ||x|| Hence ||Lx|| ≤

1 ||x||. δ

c.)⇒a.) follows from the inequality given in c.).  The following theorem is called the Riesz representation theorem for the dual of a Hilbert space. If z ∈ H then define an element f ∈ H ′ by the rule (x, z) ≡ f (x). It follows from the Cauchy Schwarz inequality and the properties of the inner product that f ∈ H ′ . The Riesz representation theorem says that all elements of H ′ are of this form. Theorem 24.1.14 Let H be a Hilbert space and let f ∈ H ′ . Then there exists a unique z ∈ H such that f (x) = (x, z) (24.8) for all x ∈ H. Proof: Letting y, w ∈ H the assumption that f is linear implies f (yf (w) − f (y)w) = f (w) f (y) − f (y) f (w) = 0 which shows that yf (w) − f (y)w ∈ f −1 (0), which is a closed subspace of H since f is continuous. If f −1 (0) = H, then f is the zero map and z = 0 is the unique element of H which satisfies 24.8. If f −1 (0) ̸= H, pick u ∈ / f −1 (0) and let w ≡ u − P u ̸= 0. Thus (y, w) = 0 for all y ∈ f −1 (0). In particular, let y = xf (w) − f (x)w where x ∈ H is arbitrary. Therefore, 0 = (f (w)x − f (x)w, w) = f (w)(x, w) − f (x)||w||2.

24.2. RADON NIKODYM THEOREM

595

Thus, solving for f (x) and using the properties of the inner product, f (w)w ) ||w||2

f (x) = (x,

Let z = f (w)w/||w||2 . This proves the existence of z. If f (x) = (x, zi ) i = 1, 2, for all x ∈ H, then for all x ∈ H, then (x, z1 − z2 ) = 0 which implies, upon taking x = z1 − z2 that z1 = z2 .  If R : H → H ′ is defined by Rx (y) ≡ (y, x) , the Riesz representation theorem above states this map is onto. This map is called the Riesz map. It is routine to show R is conjugate linear and ∥Rx∥ = ∥x∥. In fact, ¯ (u, y) R (αx + βy) (u) ≡ (u, αx + βy) = α ¯ (u, x) + β ( ) ¯ ¯ ≡ α ¯ Rx (u) + βRy (u) = α ¯ Rx + βRy (u) so it is conjugate linear meaning it goes across plus signs and you factor out conjugates. ∥Rx∥ ≡ sup |Rx (y)| ≡ sup |(y, x)| = ∥x∥ ∥y∥≤1

24.2

∥y∥≤1

Radon Nikodym Theorem

The Radon Nikodym theorem, is a representation theorem for one measure in terms of another. The approach given here is due to Von Neumann and depends on the Riesz representation theorem for Hilbert space, Theorem 24.1.14. Definition 24.2.1 Let µ and λ be two measures defined on a σ-algebra S, of subsets of a set, Ω. λ is absolutely continuous with respect to µ,written as λ ≪ µ, if λ(E) = 0 whenever µ(E) = 0. It is not hard to think of examples which should be like this. For example, suppose one measure is volume and the other is mass. If the volume of something is zero, it is reasonable to expect the mass of it should also be equal to zero. In this case, there is a function called the density which is integrated over volume to obtain mass. The Radon Nikodym theorem is an abstract version of this notion. Essentially, it gives the existence of the density function. Theorem 24.2.2 (Radon Nikodym) Let λ and µ be finite measures defined on a σ-algebra, S, of subsets of Ω. Suppose λ ≪ µ. Then there exists a unique f ∈ L1 (Ω, µ) such that f (x) ≥ 0 and ∫ λ(E) = f dµ. E

If it is not necessarily the case that λ ≪ µ, there are two measures, λ⊥ and λ|| such that λ = λ⊥ + λ|| , λ|| ≪ µ and there exists a set of µ measure zero, N such that for all E measurable, λ⊥ (E) = λ (E ∩ N ) = λ⊥ (E ∩ N ) . In this case the two measures, λ⊥ and λ|| are unique and the representation of λ = λ⊥ + λ|| is called the Lebesgue decomposition of λ. The measure λ|| is the absolutely continuous part of λ and λ⊥ is called the singular part of λ. In words, λ⊥ is “supported” on a set of µ measure zero while λ|| is supported on the complement of this set. Proof: Let Λ : L2 (Ω, µ + λ) → C be defined by ∫ Λg = g dλ. Ω

By Holder’s inequality, (∫ |Λg| ≤

)1/2 (∫ 1 dλ



)1/2 1/2 = λ (Ω) ||g||2 |g| d (λ + µ) 2

2



596

CHAPTER 24. REPRESENTATION THEOREMS

where ||g||2 is the L2 norm of g taken with respect to µ + λ. Therefore, since Λ is bounded, it follows from Theorem 24.1.13 on Page 594 that Λ ∈ (L2 (Ω, µ+λ))′ , the dual space L2 (Ω, µ+λ). By the Riesz representation theorem in Hilbert space, Theorem 24.1.14, there exists a unique h ∈ L2 (Ω, µ + λ) with ∫ ∫ Λg = g dλ = hgd(µ + λ). (24.9) Ω



The plan is to show h is real and nonnegative at least a.e. Therefore, consider the set where Im h is positive. E = {x ∈ Ω : Im h(x) > 0} , Now let g = XE and use 24.9 to get



λ(E) =

(Re h + i Im h)d(µ + λ).

(24.10)

E

Since the left side of 24.10 is real, this shows ∫ ∫ 0= (Im h) d(µ + λ) ≥ E

(Im h) d(µ + λ) ≥

En

{

where En ≡

x : Im h (x) ≥

1 n

1 (µ + λ) (En ) n

}

Thus (µ + λ) (En ) = 0 and since E = ∪∞ n=1 En , it follows (µ + λ) (E) = 0. A similar argument shows that for E = {x ∈ Ω : Im h (x) < 0}, (µ + λ)(E) = 0. Thus there is no loss of generality in assuming h is real-valued. The next task is to show h is nonnegative. This is done in the same manner as above. Define the set where it is negative and then show this set has measure zero. Let E ≡ {x : h(x) < 0} and let En ≡ {x : h(x) < − n1 }. Then let g = XEn . Since E = ∪n En , it follows that if (µ + λ) (E) > 0 then this is also true for (µ + λ) (En ) for all n large enough. Then from 24.10 ∫ λ(En ) = h d(µ + λ) ≤ − (1/n) (µ + λ) (En ) < 0, En

a contradiction. Thus it can be assumed h ≥ 0. This shows that in every case, ∫ λ (E) = hd (µ + λ) E

At this point the argument splits into two cases. Case Where λ ≪ µ. In this case, h < 1. Let E = [h ≥ 1] and let g = XE . Then ∫ λ(E) = h d(µ + λ) ≥ µ(E) + λ(E). E

Therefore µ(E) = 0. Since λ ≪ µ, it follows that λ(E) = 0 also. Thus it can be assumed 0 ≤ h(x) < 1 for all x. From 24.9, whenever g ∈ L2 (Ω, µ + λ), ∫ ∫ g dλ = hgd(µ + λ) Ω



24.2. RADON NIKODYM THEOREM

597



and so

∫ g(1 − h)dλ =

hgdµ.



(24.11)



Now let E be a measurable set and define g(x) ≡

n ∑

hi (x)XE (x)

i=0

in 24.11. This yields

∫ (1 − h

n+1

(x))dλ =

E

Let f (x) = conclude

∑∞ i=1

∫ n+1 ∑

hi (x)dµ.

(24.12)

E i=1

hi (x) and use the Monotone Convergence theorem in 24.12 to let n → ∞ and ∫ λ(E) = f dµ. E

f ∈ L (Ω, µ) because λ is finite. The function, f is unique µ a.e. because, if g is another function which also serves to represent λ, consider for each n ∈ N the set, [ ] 1 En ≡ f − g > n 1



and conclude that

(f − g) dµ ≥

0= En

1 µ (En ) . n

Therefore, µ (En ) = 0. It follows that µ ([f − g > 0]) ≤

∞ ∑

µ (En ) = 0

n=1

Similarly, the set where g is larger than f has measure zero. This proves the theorem when µ ≫ λ. Case where it is not necessarily true that λ ≪ µ. In this case, let N = [h ≥ 1] and let g = XN . Then ∫ λ(N ) = h d(µ + λ) ≥ µ(N ) + λ(N ). N

and so µ (N ) = 0. Now define measures, λ⊥ , λ|| ( ) λ⊥ (E) ≡ λ (E ∩ N ) , λ|| (E) ≡ λ E ∩ N C so λ = λ⊥ + λ|| Since µ (N ) = 0,

( ) µ (E) = µ E ∩ N C ( ) Suppose then that µ (E) = µ E ∩ N C = 0. Does λ|| (E) = 0? Then since h < 1 on N C , if λ|| (E) > 0, ∫ ) ( h d(µ + λ) λ|| (E) ≡ λ E ∩ N C = E∩N C ( ) ( ) < µ E ∩ N C + λ E ∩ N C = µ (E) + λ|| (E) , which is a contradiction because of the strict inequality which results if λ|| (E) > 0. Therefore, λ|| ≪ µ because if µ (E) = 0, then λ|| (E) = 0. ˆ ⊥ and λ ˆ || It only remains to verify the two measures λ⊥ and λ|| are unique. Suppose then that λ ˆ ˆ play the roles of λ⊥ and λ|| respectively. Let N play the role of N in the definition of λ⊥ and let fˆ

598

CHAPTER 24. REPRESENTATION THEOREMS

[ ] ˆ || . I will show that f = fˆ µ a.e. Let Ek ≡ fˆ − f > 1/k for k ∈ N. Then on play the role of f for λ ˆ⊥ = λ ˆ || − λ|| observing that λ⊥ − λ 0

= ≥

( )( ) ∫ C ˆ λ⊥ − λ⊥ Ek ∩ (N1 ∪ N ) = ) 1 1 ( C µ Ek ∩ (N1 ∪ N ) = µ (Ek ) . k k

(

Ek ∩(N1 ∪N )C

) fˆ − f dµ

and so µ (Ek ) = 0. The last equality follows from ( ) ) ) ( ( C µ Ek ∩ (N1 ∪ N ) = µ Ek ∩ N1C ∩ N2C = µ Ek ∩ N1C = µ (Ek ) ([ ]) [ ] ˆ Therefore, µ fˆ − f > 0 = 0 because fˆ − f > 0 = ∪∞ k=1 Ek . It follows f ≤ f µ a.e. Similarly, ˆ || = λ|| and so λ⊥ = λ ˆ ⊥ also.  fˆ ≥ f µ a.e. Therefore, λ The f in the theorem for the absolutely continuous case is sometimes denoted by the Radon Nikodym derivative. The next corollary is a useful generalization to σ finite measure spaces.

dλ dµ

and is called

Corollary 24.2.3 Suppose λ ≪ µ and there exist sets Sn ∈ S, the σ algebra of measurable sets with Sn ∩ Sm = ∅, ∪∞ n=1 Sn = Ω, and λ(Sn ), µ(Sn ) < ∞. Then there exists f ≥ 0, where f is µ measurable, and ∫ λ(E) = f dµ E

for all E ∈ S. The function f is µ + λ a.e. unique. Proof: Define the σ algebra of subsets of Sn , Sn ≡ {E ∩ Sn : E ∈ S}. Then both λ, and µ are finite measures on Sn , and λ ≪ ∫ µ. Thus, by Theorem 24.2.2, there exists a nonnegative Sn measurable function fn ,with λ(E) = E fn dµ for all E ∈ Sn . Define f (x) = fn (x) for x ∈ Sn . Since the Sn are disjoint and their union is all of Ω, this defines f on all of Ω. The function, f is measurable because −1 f −1 ((a, ∞]) = ∪∞ n=1 fn ((a, ∞]) ∈ S.

Also, for E ∈ S, λ(E) = =

∞ ∑

λ(E ∩ Sn ) =

n=1 ∞ ∫ ∑

∞ ∫ ∑

XE∩Sn (x)fn (x)dµ

n=1

XE∩Sn (x)f (x)dµ

n=1

By the monotone convergence theorem ∞ ∫ ∑

XE∩Sn (x)f (x)dµ

=

n=1

= =

lim

N →∞

lim

N ∫ ∑

XE∩Sn (x)f (x)dµ

n=1

∫ ∑ N

N →∞

∫ ∑ ∞ n=1

XE∩Sn (x)f (x)dµ

n=1



XE∩Sn (x)f (x)dµ =

f dµ. E

24.2. RADON NIKODYM THEOREM

599

This proves the existence part of the corollary. To see f is unique, suppose f1 and f2 both work and consider for n ∈ N [ ] 1 Ek ≡ f1 − f2 > . k ∫

Then 0 = λ(Ek ∩ Sn ) − λ(Ek ∩ Sn ) =

Ek ∩Sn

f1 (x) − f2 (x)dµ.

Hence µ(Ek ∩ Sn ) = 0 for all n so µ(Ek ) = lim µ(E ∩ Sn ) = 0. n→∞

Hence µ([f1 − f2 > 0]) ≤

∑∞ k=1

µ (Ek ) = 0. Therefore, λ ([f1 − f2 > 0]) = 0 also. Similarly (µ + λ) ([f1 − f2 < 0]) = 0. 

This version of the Radon Nikodym theorem will suffice for most applications, but more general versions are available. To see one of these, one can read the treatment in Hewitt and Stromberg [20]. This involves the notion of decomposable measure spaces, a generalization of σ finite. Not surprisingly, there is a simple generalization of the Lebesgue decomposition part of Theorem 24.2.2. Corollary 24.2.4 Let (Ω, S) be a set with a σ algebra of sets. Suppose λ and µ are two measures ∞ defined on the sets of S and suppose there exists a sequence of disjoint sets of S, {Ωi }i=1 such that λ (Ωi ) , µ (Ωi ) < ∞, Ω = ∪i Ωi . Then there is a set of µ measure zero, N and measures λ⊥ and λ|| such that λ⊥ + λ|| = λ, λ|| ≪ µ, λ⊥ (E) = λ (E ∩ N ) = λ⊥ (E ∩ N ) . Proof: Let Si ≡ {E ∩ Ωi : E ∈ S} and for E ∈ Si , let λi (E) = λ (E) and µi (E) = µ (E) . Then by Theorem 24.2.2 there exist unique measures λi⊥ and λi|| such that λi = λi⊥ + λi|| , a set of µi measure zero, Ni ∈ Si such that for all E ∈ Si , λi⊥ (E) = λi (E ∩ Ni ) and λi|| ≪ µi . Define for E ∈ S λ⊥ (E) ≡



λi⊥ (E ∩ Ωi ) , λ|| (E) ≡



λi|| (E ∩ Ωi ) , N ≡ ∪i Ni .

i

i

First observe that λ⊥ and λ|| are measures. ∑ ( ) ( ) ∑∑ i λ⊥ ∪∞ ≡ λi⊥ ∪∞ λ⊥ (Ej ∩ Ωi ) j=1 Ej j=1 Ej ∩ Ωi = i

=

∑∑ j

=

i

λi⊥

(Ej ∩ Ωi ) =

i

∑∑ j

j

λi⊥ (Ej ∩ Ωi ) =

j

∑∑ ∑

λ (Ej ∩ Ωi ∩ Ni )

i

λ⊥ (Ej ) .

j

i

The argument for λ|| is similar. Now µ (N ) =



µ (N ∩ Ωi ) =

i



µi (Ni ) = 0

i

and λ⊥ (E) ≡

∑ i

=

∑ i

λi⊥ (E ∩ Ωi ) =



λi (E ∩ Ωi ∩ Ni )

i

λ (E ∩ Ωi ∩ N ) = λ (E ∩ N ) .

600

CHAPTER 24. REPRESENTATION THEOREMS

Also if µ (E) = 0, then µi (E ∩ Ωi ) = 0 and so λi|| (E ∩ Ωi ) = 0. Therefore, ∑ λ|| (E) = λi|| (E ∩ Ωi ) = 0. i

The decomposition is unique because of the uniqueness of the λi|| and λi⊥ and the observation that some other decomposition must coincide with the given one on the Ωi . 

24.3

Improved Change Of Variables Formula

Recall the change of variables formula. An assumption was made that the transformation was C 1 . This made the argument easy to push through with the use of the Besicovitch covering theorem. However, with the Radon Nikodym theorem and the fundamental theorem of calculus, it is easy to give a much shorter argument which actually gives a better result because it does not require the derivative to be continuous. Recall that U ⊆ Rn , was an open set on which h was differentiable and one to one with det (Dh (x)) ̸= 0. Then for x ∈ U, we had an inequality which came from the definition of the derivative and the Brouwer fixed point theorem. n

|det Dh (x)| (1 − ε) mn (B (x, rx )) ≤ mn (h (B (x, rx ))) n

≤ |det Dh (x)| (1 + ε) mn (B (x, rx ))

(18octe1s)

this for all rx sufficiently small. Now consider the following measure µ (E) ≡ mn (h (E)) . By Proposition 22.5.1 this is well defined. It is indeed a measure and µ ≪ mn the details in Problem 15 on Page 569. Therefore, by Corollary 24.2.3, there is a nonnegative measurable g which is in L1loc (Rn ) such that ∫ µ (E) ≡ mn (h (E)) =

gdmn E

It is clearly a Radon measure. In particular,



mn (h (B (x, r))) =

gdmn B(x,r)

The idea is to identify g. Let x be a Lebesgue point of g, for all r small enough, |det Dh (x)| (1 − ε)

n

≤ = ≤

mn (h (B (x, r))) mn (B (x, r)) ∫ 1 gdmn mn (B (x, r)) B(x,r) |det Dh (x)| (1 + ε)

n

Then letting r → 0, n

n

|det Dh (x)| (1 − ε) ≤ g (x) ≤ |det Dh (x)| (1 + ε)

and so, since ε is arbitrary, g (x) = |det Dh (x)|. Thus for any measurable A ⊆ U, ∫ mn (h (A)) = |det Dh| dmn

(**)

A

This yields the following theorem. Theorem 24.3.1 Let U be an open set in Rn and let h : U → Rn be one to one and differentiable with det Dh (x) ̸= 0. Then if f ≥ 0 and Lebesgue measurable, ∫ ∫ f dmn = (f ◦ h) |det Dh| dmn h(U )

U

24.4. VECTOR MEASURES

601

Proof: The proof is exactly as before. You just approximate with simple functions and use ∗∗. Now if A is measurable, so is h (A) by Proposition 22.5.1. Then from the above, ∫ ∫ Xh(A) (y) f (y) dmn = (XA (x) (f ◦ h) (x)) |det Dh (x)| dmn h(U )

Thus

U



∫ (f ◦ h) (x) |det Dh (x)| dmn

f (y) dmn = h(A)

A

This gives a slight generalization. Corollary 24.3.2 Let A be any measurable subset of U and let f be a nonnegative Lebesgue measurable function. Let h : U → Rn be one to one and differentiable with det Dh (x) ̸= 0. Then ∫ ∫ f (y) dmn = (f ◦ h) (x) |det Dh (x)| dmn h(A)

A

As to the generalization when h is only assumed to be one to one and differentiable, this also follows as before. The proof of Sard’s lemma in Theorem 22.5.6 did not require any continuity of the derivative. Thus as before, h (U0 ) = 0 where U0 ≡ {x : det Dh (x) = 0} , Borel measurable because det Dh is Borel measurable due to the fact that h is continuous and the entries of the matrix of Dh are each Borel measurable because they are limits of difference quotients which are continuous functions. Letting U+ ≡ {x : |det Dh (x)| > 0} , Corollary 24.3.2 and Sard’s lemma imply ∫ ∫ f (y) dmn = f (y) dmn h(U ) h(U+ ) ∫ (f ◦ h) (x) |det Dh (x)| dmn = U+ ∫ = (f ◦ h) (x) |det Dh (x)| dmn U

Then one can also obtain the result of Corollary 24.3.2 in the same way. This leads to the following version of the change of variables formula. Theorem 24.3.3 Let A be any measurable subset of U, an open set in Rn and let f be a nonnegative Lebesgue measurable function. Let h : U → Rn be one to one and differentiable. Then ∫ ∫ f (y) dmn = (f ◦ h) (x) |det Dh (x)| dmn h(A)

A

You can also tweak this a little more to get a slightly more general result. You could assume, for example, that h is continuous on the open set U and differentiable and one to one on some F ⊆ U where F is measurable and mn (h (U \ F )) = 0 rather than assuming that h is differentiable on all of the open set U . This would end up working out also, but the above is pretty close and is easier to remember.

24.4

Vector Measures

The next topic will use the Radon Nikodym theorem. It is the topic of vector and complex measures. The main interest is in complex measures although a vector measure can have values in any topological vector space. Whole books have been written on this subject. See for example the book by Diestal and Uhl [9] titled Vector measures.

602

CHAPTER 24. REPRESENTATION THEOREMS

Definition 24.4.1 Let (V, || · ||) be a normed linear space and let (Ω, S) be a measure space. A function µ : S → V is a vector measure if µ is countably additive. That is, if {Ei }∞ i=1 is a sequence of disjoint sets of S, ∞ ∑ µ(∪∞ E ) = µ(Ei ). i=1 i i=1

Note that it makes sense to take finite sums because it is given that µ has values in a vector space in which vectors can be summed. In the above, µ (Ei ) is a vector. It might be a point in Rn or in any other vector space. In many of the most important applications, it is a vector in some sort of function space which may be infinite dimensional. The infinite sum has the usual meaning. That is ∞ n ∑ ∑ µ(Ei ) = lim µ(Ei ) n→∞

i=1

i=1

where the limit takes place relative to the norm on V . Definition 24.4.2 Let (Ω, S) be a measure space and let µ be a vector measure defined on S. A subset, π(E), of S is called a partition of E if π(E) consists of finitely many disjoint sets of S and ∪π(E) = E. Let ∑ |µ|(E) = sup{ ||µ(F )|| : π(E) is a partition of E}. F ∈π(E)

|µ| is called the total variation of µ. The next theorem may seem a little surprising. It states that, if finite, the total variation is a nonnegative measure. Theorem 24.4.3 If |µ|(Ω) < ∞, then |µ| is a measure on S. Even if |µ| (Ω) = ∞, |µ| (∪∞ i=1 Ei ) ≤ ∑ ∞ |µ| ) That is is subadditive and (A) (B) A, B with ⊆ (E . |µ| |µ| ≤ |µ| whenever ∈ S A B. i i=1 Proof: Consider the last claim. Let a < |µ| (A) and let π (A) be a partition of A such that ∑ a< ||µ (F )|| . F ∈π(A)

Then π (A) ∪ {B \ A} is a partition of B and ∑ |µ| (B) ≥ ||µ (F )|| + ||µ (B \ A)|| > a. F ∈π(A)

Since this is true for all such a, it follows |µ| (B) ≥ |µ| (A) as claimed. ∞ Let {Ej }j=1 be a sequence of disjoint sets of S and let E∞ = ∪∞ j=1 Ej . Then letting a < |µ| (E∞ ) , it follows from the definition of total variation there exists a partition of E∞ , π(E∞ ) = {A1 , · · · , An } such that n ∑ a< ||µ(Ai )||. i=1

Also,

Ai = ∪∞ j=1 Ai ∩ Ej ∑∞ and so by the triangle inequality, ||µ(Ai )|| ≤ j=1 ||µ(Ai ∩ Ej )||. Therefore, by the above, and either Fubini’s theorem or Lemma 1.15.4 on Page 24

a
0 was arbitrary, |µ|(E1 ∪ E2 ) ≥ |µ|(E1 ) + |µ|(E2 ).

(24.13)

∑n Then 24.13 implies that whenever the Ei are disjoint, |µ|(∪nj=1 Ej ) ≥ j=1 |µ|(Ej ). Therefore, for the Ei disjoint, ∞ n ∑ ∑ n |µ|(Ej ) ≥ |µ|(∪∞ E ) ≥ |µ|(∪ E ) ≥ |µ|(Ej ). j=1 j j=1 j j=1

j=1

Since n is arbitrary, |µ|(∪∞ j=1 Ej ) =

∞ ∑

|µ|(Ej )

j=1

which shows that |µ| is a measure as claimed.  The following corollary is interesting. It concerns the case that µ is only finitely additive. Corollary 24.4.4 Suppose (Ω, F) is a set with ∑na σ algebra of subsets F and suppose µ : F → C is only finitely additive. That is, µ (∪ni=1 Ei ) = i=1 µ (Ei ) whenever the Ei are disjoint. Then |µ| , defined in the same way as above, is also finitely additive provided |µ| is finite. Proof: Say E ∩ F = ∅ for E, F ∈ F . Let π (E) , π (F ) suitable partitions for which the following holds. ∑ ∑ |µ| (E ∪ F ) ≥ |µ (A)| + |µ (B)| ≥ |µ| (E) + |µ| (F ) − 2ε. A∈π(E)

B∈π(F )

Since ε is arbitrary, |µ| (E ∩ F ) ≥ |µ| (E) + |µ| (F ) . Similar considerations apply to any finite union of disjoint sets. That is, if the Ei are disjoint, then |µ| (∪ni=1 Ei ) ≥

n ∑ i=1

|µ| (Ei ) .

604

CHAPTER 24. REPRESENTATION THEOREMS Now let E = ∪ni=1 Ei where the Ei are disjoint. Then letting π (E) be a partition of E, ∑ |µ| (E) − ε ≤ |µ (F )| , F ∈π(E)

it follows that |µ| (E) ≤ ε +

∑ F ∈π(E)

≤ ε+

n ∑

n ∑ ∑ |µ (F )| = ε + µ (F ∩ Ei ) F ∈π(E) i=1



|µ (F ∩ Ei )| ≤ ε +

i=1 F ∈π(E)

n ∑

|µ| (Ei )

i=1

∑n Since ε is arbitrary, this shows |µ| (∪ni=1 Ei ) ≤ i=1 |µ| (Ei ) . Thus |µ| is finitely additive.  In the case that µ is a complex measure, it is always the case that |µ| (Ω) < ∞. Theorem 24.4.5 Suppose µ is a complex measure on (Ω, S) where S is a σ algebra of subsets of Ω. That is, whenever, {Ei } is a sequence of disjoint sets of S, µ (∪∞ i=1 Ei ) =

∞ ∑

µ (Ei ) .

i=1

Then |µ| (Ω) < ∞. Proof: First here is a claim. Claim: Suppose |µ| (E) = ∞. Then there are disjoint subsets of E, A and B such that E = A∪B, |µ (A)| , |µ (B)| > 1 and |µ| (B) = ∞. Proof of the claim: From the definition of |µ| , there exists a partition of E, π (E) such that ∑ |µ (F )| > 20 (1 + |µ (E)|) . (24.14) F ∈π(E)

Here 20 is just a nice sized number. No effort is made to be delicate in this argument. Also note that µ (E) ∈ C because it is given that µ is a complex measure. Consider the following picture consisting of two lines in the complex plane having slopes 1 and −1 which intersect at the origin, dividing the complex plane into four closed sets, R1 , R2 , R3 , and R4 as shown. Let π i consist of those sets, A of π (E) for which µ (A) ∈ Ri . Thus, some sets, A of π (E) could be in two of the π i if µ (A) is on one of the intersecting lines. This is R2 not important. The thing which is important is that if µ (A) ∈ R1 √ 2 or R3 , then 2 |µ (A)| ≤ |Re (µ (A))| and if µ (A) ∈ R2 or R4 then √ R3 R1 2 2 |µ (A)| ≤ |Im (µ (A))| and both R1 and R3 have complex numbers R4 z contained in ∑these sets all∑have the same sign for Re (z). Thus, for zi ∈ R1 , | i Re (zi )| = i |Re (zi )|. A similar statement holds for ∑ zi ∈ R3 .∑In the case of R2 , R4 , similar considerations hold for the imaginary parts. Thus | i Im zi | = i |Im zi | is zi are all in R2 or else all in R4 . Then by 24.14, it follows that for some i, ∑ |µ (F )| > 5 (1 + |µ (E)|) . (24.15) F ∈π i

Suppose i equals 1 or 3. A similar argument using the imaginary part applies if i equals 2 or 4. Then, since Re (µ (F )) always has the same sign, ∑ ∑ ∑ |Re (µ (F ))| Re (µ (F )) = µ (F ) ≥ F ∈π i F ∈π i F ∈π i √ √ 2 ∑ 2 ≥ |µ (F )| > 5 (1 + |µ (E)|) . 2 2 F ∈π i

24.4. VECTOR MEASURES

605

Now letting C be the union of the sets in π i , ∑ 5 |µ (C)| = µ (F ) > (1 + |µ (E)|) > 1. 2

(24.16)

F ∈π i

Define D such that C ∪ D = E. Then, since µ is a measure, 5 (1 + |µ (E)|) < |µ (C)| = |µ (E) − µ (D)| 2 ≤ |µ (E)| + |µ (D)|

C

D

E

and so, subtracting |µ (E)| from both sides, 5 3 + |µ (E)| < |µ (D)| . 2 2 Now since |µ| (E) = ∞, it follows from Theorem 24.4.5 that ∞ = |µ| (E) ≤ |µ| (C) + |µ| (D) and so either |µ| (C) = ∞ or |µ| (D) = ∞. If |µ| (C) = ∞, let B = C and A = D. Otherwise, let B = D and A = C. This proves the claim. Now suppose |µ| (Ω) = ∞. Then from the claim, there exist A1 and B1 such that |µ| (B1 ) = ∞, |µ (B1 )| , |µ (A1 )| > 1, and A1 ∪ B1 = Ω. Let B1 ≡ Ω \ A play the same role as Ω and obtain A2 , B2 ⊆ B1 such that |µ| (B2 ) = ∞, |µ (B2 )| , |µ (A2 )| > 1, and A2 ∪ B2 = B1 . Continue in this way to obtain a sequence of disjoint sets, {Ai } such that |µ (Ai )| > 1. Then since µ is a measure, 1
1 µ(E) E

Proof of the lemma:



(0, 0) 1

Refer to the picture. However, this contradicts the assumption of the lemma. It follows µ(E) = 0. Since the set of complex numbers, z such that |z| > 1 is an open set, it equals the union of countably ∞ many balls, {Bi }i=1 . Therefore, ( ) ( ) −1 µ f −1 ({z ∈ C : |z| > 1} = µ ∪∞ (Bk ) k=1 f ∞ ∑ ( ) µ f −1 (Bk ) = 0. ≤ k=1

Thus |f (ω)| ≤ 1 a.e. as claimed.  Corollary 24.4.8 Let λ be a complex vector measure with |λ|(Ω) < ∞.1 Then there exists a unique ∫ 1 f ∈ L (Ω) such that λ(E) = E f d|λ|. Furthermore, |f | = 1 for |λ| a.e. This is called the polar decomposition of λ. Proof: First note that λ ≪ |λ| and so such an L1 function exists and is unique. It is required to show |f | = 1 a.e. If |λ|(E) ̸= 0, ∫ λ(E) 1 = ≤ 1. f d|λ| |λ|(E) |λ|(E) E Therefore by Lemma 24.4.7, |f | ≤ 1, |λ| a.e. Now let [ ] 1 En = |f | ≤ 1 − . n Let {F1 , · · · , Fm } be a partition of En . Then m m ∫ m ∫ ∑ ∑ ∑ ≤ |λ (Fi )| = f d |λ| i=1



Fi i=1 ( ∫ m ∑ i=1

Fi

1−

1 n

)

i=1

d |λ| =

) ( 1 . = |λ| (En ) 1 − n 1 As

|f | d |λ|

Fi

proved above, the assumption that |λ| (Ω) < ∞ is redundant.

m ( ∑ i=1

1−

1 n

) |λ| (Fi )

24.5. REPRESENTATION THEOREMS FOR THE DUAL SPACE OF LP

607

Then taking the supremum over all partitions, ( ) 1 |λ| (En ) ≤ 1 − |λ| (En ) n ∞ which shows |λ| ∫ (En ) = 0. Hence |λ| ([|f | < 1]) = 0 because [|f | ∫< 1] = ∪n=1 En .  If λ (E) ≡ E hdµ, µ a measure, it is also true that λ (E) = E gd |λ| . How do h and g compare?

Corollary 24.4.9 Suppose (Ω, S) is a measure space and µ is a finite nonnegative measure on S. Then for h ∈ L1 (µ) , define a complex measure λ by ∫ λ (E) ≡ hdµ. E



Then |λ| (E) =

|h| dµ. E

Furthermore, |h| = gh where gd |λ| is the polar decomposition of λ, ∫ λ (E) = gd |λ| E

Proof: From Corollary 24.4.8 there exists g such that |g| = 1, |λ| a.e. and for all E ∈ S ∫ ∫ λ (E) = gd |λ| = hdµ. E

E

Let sn be a sequence of simple functions converging pointwise to g, |sn | ≤ 1. Then from the above, ∫ ∫ gsn d |λ| = sn hdµ. E

E

Passing to the limit using the dominated convergence theorem, ∫ ∫ |λ| (E) = d |λ| = ghdµ. E

E

It follows from the same kind of arguments given above that gh ≥ 0 a.e. and |g| = 1. Therefore, |h| = |gh| = gh. It follows from the above, that ∫ ∫ ∫ |λ| (E) = d |λ| = ghdµ = |h| dµ  E

24.5

E

E

Representation Theorems For The Dual Space Of Lp

The next topic deals with the dual space of Lp for p ≥ 1 in the case where the measure space is σ finite or finite. In what follows q = ∞ if p = 1 and otherwise, p1 + 1q = 1. Recall that the dual space of X is L (X, C). In fact, this theorem holds without the assumption that the measure space is σ finite in case p > 1 but this is more technical to establish. Theorem 24.5.1 (Riesz representation theorem) Let p > 1 and let (Ω, S, µ) be a finite measure space. If Λ ∈ (Lp (Ω))′ , then there exists a unique h ∈ Lq (Ω) ( p1 + 1q = 1) such that ∫ Λf =

hf dµ. Ω

This function satisfies ||h||q = ||Λ|| where ||Λ|| is the operator norm of Λ.

608

CHAPTER 24. REPRESENTATION THEOREMS Proof: (Uniqueness) If h1 and h2 both represent Λ, consider f = |h1 − h2 |q−2 (h1 − h2 ),

where h denotes complex conjugation. By Holder’s inequality, it is easy to see that f ∈ Lp (Ω). Thus 0 = Λf − Λf =



h1 |h1 − h2 |q−2 (h1 − h2 ) − h2 |h1 − h2 |q−2 (h1 − h2 )dµ ∫ = |h1 − h2 |q dµ. Therefore h1 = h2 and this proves uniqueness. Now let λ(E) = Λ(XE ). Since this is a finite measure space XE is an element of Lp (Ω) and so it makes sense to write Λ (XE ). In fact λ is a complex measure having finite total variation. This follows from an easlier result but I will show it directly here. Let A1 , · · · , An be a partition of Ω. |ΛXAi | = wi (ΛXAi ) = Λ(wi XAi ) for some wi ∈ C, |wi | = 1. Thus n ∑

|λ(Ai )| =

i=1

n ∑

|Λ(XAi )| = Λ(

i=1

n ∑

wi XAi )

i=1

∫ ∫ ∑ n 1 1 1 ≤ ||Λ||( | wi XAi |p dµ) p = ||Λ||( dµ) p = ||Λ||µ(Ω) p. Ω

i=1

This is because if x ∈ Ω, x is contained in exactly one of the Ai and so the absolute value of the sum in the first integral above is equal to 1. Therefore |λ|(Ω) < ∞ because this was an arbitrary partition. Also, if {Ei }∞ i=1 is a sequence of disjoint sets of S, let Fn = ∪ni=1 Ei , F = ∪∞ i=1 Ei . Then by the Dominated Convergence theorem, ||XFn − XF ||p → 0. Therefore, by continuity of Λ, λ(F ) = Λ(XF ) = lim Λ(XFn ) = lim n→∞

n ∑

n→∞

Λ(XEk ) =

k=1

∞ ∑

λ(Ek ).

k=1

This shows λ is a complex measure with |λ| finite. It is also clear from the definition of λ that λ ≪ µ. Therefore, by the Radon Nikodym theorem, there exists h ∈ L1 (Ω) with ∫ λ(E) =

hdµ = Λ(XE ). E

∑m Actually h ∈ Lq and satisfies the other conditions above. Let s = i=1 ci XEi be a simple function. Then since Λ is linear, ∫ ∫ m m ∑ ∑ Λ(s) = ci Λ(XEi ) = ci hdµ = hsdµ. (24.17) i=1

i=1

Ei

Claim: If f is uniformly bounded and measurable, then ∫ Λ (f ) = hf dµ.

24.5. REPRESENTATION THEOREMS FOR THE DUAL SPACE OF LP

609

Proof of claim: Since f is bounded and measurable, there exists a sequence of simple functions {s_n} which converges to f pointwise and in L^p (Ω). This follows from Theorem 20.1.6 on Page 500 upon breaking f up into positive and negative parts of its real and imaginary parts. In fact, this theorem gives uniform convergence. Then

Λ(f) = lim_{n→∞} Λ(s_n) = lim_{n→∞} ∫ h s_n dµ = ∫ hf dµ,

the first equality holding because of continuity of Λ, the second following from 24.17, and the third holding by the dominated convergence theorem. This is a very nice formula, but it still has not been shown that h ∈ L^q (Ω).

Let E_n = {x : |h(x)| ≤ n}. Thus |hX_{E_n}| ≤ n and |hX_{E_n}|^{q−2} \overline{(hX_{E_n})} ∈ L^p (Ω). By the claim, it follows that

||hX_{E_n}||_q^q = ∫ h |hX_{E_n}|^{q−2} \overline{(hX_{E_n})} dµ = Λ( |hX_{E_n}|^{q−2} \overline{(hX_{E_n})} ) ≤ ||Λ|| · || |hX_{E_n}|^{q−2} (hX_{E_n}) ||_p = ||Λ|| ||hX_{E_n}||_q^{q/p},

the last equality holding because q − 1 = q/p and so

( ∫ | |hX_{E_n}|^{q−2} (hX_{E_n}) |^p dµ )^{1/p} = ( ∫ ( |hX_{E_n}|^{q/p} )^p dµ )^{1/p} = ||hX_{E_n}||_q^{q/p}.

Therefore, since q − q/p = 1, it follows that ||hX_{E_n}||_q ≤ ||Λ||. Letting n → ∞, the monotone convergence theorem implies

||h||_q ≤ ||Λ||.     (24.18)

Now that h has been shown to be in L^q (Ω), it follows from 24.17 and the density of the simple functions, Theorem 23.2.1 on Page 577, that

Λf = ∫ hf dµ

for all f ∈ L^p (Ω). It only remains to verify the last claim. By Holder's inequality and 24.18,

||Λ|| ≡ sup{ | ∫ hf dµ | : ||f||_p ≤ 1 } ≤ ||h||_q ≤ ||Λ||. 

To represent elements of the dual space of L^1 (Ω), another Banach space is needed.

Definition 24.5.2 Let (Ω, S, µ) be a measure space. L^∞ (Ω) is the vector space of measurable functions f such that for some M > 0, |f(x)| ≤ M for all x outside of some set of measure zero (|f(x)| ≤ M a.e.). Define f = g when f(x) = g(x) a.e. and

||f||_∞ ≡ inf{ M : |f(x)| ≤ M a.e. }.
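It may help to note, by way of a small illustration which is not part of the development, how ||f||_∞ differs from the pointwise supremum. With Lebesgue measure on [0, 1], the function f = X_{Q ∩ [0,1]} satisfies sup_x |f(x)| = 1, yet f = 0 a.e., so ||f||_∞ = 0 and f is the zero element of L^∞. Thus || ||_∞ measures the size of a function only up to sets of measure zero, which is exactly what is needed for the identification f = g when f = g a.e.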


Theorem 24.5.3 L^∞ (Ω) is a Banach space.

Proof: It is clear that L^∞ (Ω) is a vector space. Is || ||_∞ a norm?

Claim: If f ∈ L^∞ (Ω), then |f(x)| ≤ ||f||_∞ a.e.

Proof of the claim: The set E_n ≡ {x : |f(x)| ≥ ||f||_∞ + n^{−1}} is a set of measure zero according to the definition of ||f||_∞. Furthermore, {x : |f(x)| > ||f||_∞} = ∪_n E_n and so it is also a set of measure zero. This verifies the claim.

Now if ||f||_∞ = 0 it follows that f(x) = 0 a.e. Also if f, g ∈ L^∞ (Ω),

|f(x) + g(x)| ≤ |f(x)| + |g(x)| ≤ ||f||_∞ + ||g||_∞ a.e.

and so ||f||_∞ + ||g||_∞ serves as one of the constants M in the definition of ||f + g||_∞. Therefore,

||f + g||_∞ ≤ ||f||_∞ + ||g||_∞.

Next let c be a number, c ≠ 0. Then |cf(x)| = |c| |f(x)| ≤ |c| ||f||_∞ a.e. and so ||cf||_∞ ≤ |c| ||f||_∞. Applying this with 1/c in place of c, ||f||_∞ = ||(1/c)(cf)||_∞ ≤ (1/|c|) ||cf||_∞, which implies |c| ||f||_∞ ≤ ||cf||_∞. Thus || ||_∞ is a norm as claimed.

To verify completeness, let {f_n} be a Cauchy sequence in L^∞ (Ω) and use the above claim to get the existence of a set of measure zero E_{nm} such that for all x ∉ E_{nm},

|f_n(x) − f_m(x)| ≤ ||f_n − f_m||_∞.

Let E = ∪_{n,m} E_{nm}. Thus µ(E) = 0 and for each x ∉ E, {f_n(x)}_{n=1}^∞ is a Cauchy sequence in C. Let

f(x) = 0 if x ∈ E, and f(x) = lim_{n→∞} f_n(x) if x ∉ E; that is, f(x) = lim_{n→∞} X_{E^C}(x) f_n(x).

Then f is clearly measurable because it is the limit of measurable functions. If F_n = {x : |f_n(x)| > ||f_n||_∞} and F = ∪_{n=1}^∞ F_n, it follows µ(F) = 0 and that for x ∉ F ∪ E,

|f(x)| ≤ lim inf_{n→∞} |f_n(x)| ≤ lim inf_{n→∞} ||f_n||_∞ < ∞

because {||f_n||_∞} is a Cauchy sequence. ( | ||f_n||_∞ − ||f_m||_∞ | ≤ ||f_n − f_m||_∞ by the triangle inequality.) Thus f ∈ L^∞ (Ω). Let n be large enough that whenever m > n, ||f_m − f_n||_∞ < ε. Then, if x ∉ E,

|f(x) − f_n(x)| = lim_{m→∞} |f_m(x) − f_n(x)| ≤ lim inf_{m→∞} ||f_m − f_n||_∞ ≤ ε.

Hence ||f − f_n||_∞ ≤ ε for all n large enough. 

The next theorem is the Riesz representation theorem for (L^1 (Ω))′.

Theorem 24.5.4 (Riesz representation theorem) Let (Ω, S, µ) be a finite measure space. If Λ ∈ (L^1 (Ω))′, then there exists a unique h ∈ L^∞ (Ω) such that

Λ(f) = ∫_Ω hf dµ

for all f ∈ L^1 (Ω). If h is the function in L^∞ (Ω) representing Λ ∈ (L^1 (Ω))′, then ||h||_∞ = ||Λ||.

Proof: Just as in the proof of Theorem 24.5.1, there exists a unique h ∈ L^1 (Ω) such that for all simple functions s,

Λ(s) = ∫ hs dµ.     (24.19)


To show h ∈ L^∞ (Ω), let ε > 0 be given and let E = {x : |h(x)| ≥ ||Λ|| + ε}. Let k be measurable with |k| = 1 and hk = |h|. Since the measure space is finite, k ∈ L^1 (Ω). As in Theorem 24.5.1, let {s_n} be a sequence of simple functions converging to k in L^1 (Ω) and pointwise. It follows from the construction in Theorem 20.1.6 on Page 500 that it can be assumed |s_n| ≤ 1. Therefore

Λ(kX_E) = lim_{n→∞} Λ(s_n X_E) = lim_{n→∞} ∫_E h s_n dµ = ∫_E hk dµ,

where the last equality holds by the dominated convergence theorem. Therefore,

||Λ|| µ(E) ≥ |Λ(kX_E)| = | ∫_Ω h k X_E dµ | = ∫_E |h| dµ ≥ (||Λ|| + ε) µ(E).

It follows that µ(E) = 0. Since ε > 0 was arbitrary, ||Λ|| ≥ ||h||_∞. Since h ∈ L^∞ (Ω), the density of the simple functions in L^1 (Ω) and 24.19 imply

Λf = ∫_Ω hf dµ,   ||Λ|| ≥ ||h||_∞.     (24.20)

This proves the existence part of the theorem. To verify uniqueness, suppose h_1 and h_2 both represent Λ and let f ∈ L^1 (Ω) be such that |f| ≤ 1 and f(h_1 − h_2) = |h_1 − h_2|. Then

0 = Λf − Λf = ∫ (h_1 − h_2) f dµ = ∫ |h_1 − h_2| dµ.

Thus h_1 = h_2. Finally,

||Λ|| = sup{ | ∫ hf dµ | : ||f||_1 ≤ 1 } ≤ ||h||_∞ ≤ ||Λ||

by 24.20. 

Next these results are extended to the σ finite case.

Lemma 24.5.5 Let (Ω, S, µ) be a measure space and suppose there exists a measurable function r such that r(x) > 0 for all x, there exists M such that |r(x)| < M for all x, and ∫ r dµ < ∞. Then for Λ ∈ (L^p (Ω, µ))′, p ≥ 1, there exists h ∈ L^q (Ω, µ) (h ∈ L^∞ (Ω, µ) if p = 1) such that

Λf = ∫ hf dµ.

Also ||h|| = ||Λ||. (||h|| = ||h||_q if p > 1, ||h|| = ||h||_∞ if p = 1.) Here 1/p + 1/q = 1.

Proof: Define a new measure µ̃ according to the rule

µ̃(E) ≡ ∫_E r dµ.

Thus µ̃ is a finite measure on S. For Λ ∈ (L^p (µ))′,

Λ(f) = Λ( r^{1/p} ( r^{−1/p} f ) ) = Λ̃( r^{−1/p} f )     (24.21)

where Λ̃(g) ≡ Λ( r^{1/p} g ). Now Λ̃ is in (L^p (µ̃))′ because

|Λ̃(g)| ≡ |Λ( r^{1/p} g )| ≤ ||Λ|| ( ∫_Ω | r^{1/p} g |^p dµ )^{1/p} = ||Λ|| ( ∫_Ω |g|^p r dµ )^{1/p} = ||Λ|| ||g||_{L^p(µ̃)},

the next to last expression being ||Λ|| ( ∫_Ω |g|^p dµ̃ )^{1/p}. Therefore, by Theorems 24.5.4 and 24.5.1 there exists a unique h ∈ L^q (µ̃) which represents Λ̃. Here q = ∞ if p = 1 and satisfies 1/q + 1/p = 1 otherwise. Then

Λ(f) = Λ̃( r^{−1/p} f ) = ∫_Ω h f r^{−1/p} r dµ = ∫_Ω f h r^{1/q} dµ.

Now h̃ ≡ h r^{1/q} is in L^q (µ) since h ∈ L^q (µ̃). In case p = 1, L^q (µ̃) and L^q (µ) are exactly the same, and in this case you have

Λ(f) = Λ̃( r^{−1} f ) = ∫ h f r^{−1} r dµ = ∫ f h dµ.

Thus the desired representation holds. Then in any case,

|Λ(f)| ≤ ||h̃||_{L^q} ||f||_{L^p},

so ||Λ|| ≤ ||h̃||_{L^q(µ)}. Also, as before,

||h̃||_{L^q(µ)}^q = ∫_Ω |h̃|^{q−2} \overline{h̃} h̃ dµ = Λ( |h̃|^{q−2} \overline{h̃} ) ≤ ∥Λ∥ ( ∫_Ω ( |h̃|^{q−2} |h̃| )^p dµ )^{1/p} = ∥Λ∥ ( ∫_Ω ( |h̃|^{q/p} )^p dµ )^{1/p} = ∥Λ∥ ||h̃||_{L^q(µ)}^{q/p}

and so ||h̃||_{L^q(µ)} ≤ ∥Λ∥. It works the same for p = 1. Thus ||h̃||_{L^q(µ)} = ∥Λ∥. 

A situation in which the conditions of the lemma are satisfied is the case where the measure space is σ finite. In fact, you should show this is the only case in which the conditions of the above lemma hold.

Theorem 24.5.6 (Riesz representation theorem) Let (Ω, S, µ) be σ finite and let Λ ∈ (L^p (Ω, µ))′, p ≥ 1. Then there exists a unique h ∈ L^q (Ω, µ) (h ∈ L^∞ (Ω, µ) if p = 1) such that

Λf = ∫ hf dµ.

Also ||h|| = ||Λ||. (||h|| = ||h||_q if p > 1, ||h|| = ||h||_∞ if p = 1.) Here 1/p + 1/q = 1.


Proof: Without loss of generality, assume µ(Ω) = ∞. Then let {Ω_n} be a sequence of disjoint elements of S having the property that 1 < µ(Ω_n) < ∞ and ∪_{n=1}^∞ Ω_n = Ω. Define

r(x) = ∑_{n=1}^∞ (1/n^2) X_{Ω_n}(x) µ(Ω_n)^{−1},   µ̃(E) = ∫_E r dµ.

Thus r is measurable, bounded, and strictly positive, and

∫_Ω r dµ = µ̃(Ω) = ∑_{n=1}^∞ 1/n^2 < ∞,

so Lemma 24.5.5 applies and gives the conclusion of the theorem. 
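For a concrete instance of this construction, offered only as an illustration, take Ω = R^n with Lebesgue measure m_n and let Ω_k be annuli B(0, R_k) \ B(0, R_{k−1}), the radii chosen so that each annulus has measure in (1, ∞). Then r = ∑_k k^{−2} m_n(Ω_k)^{−1} X_{Ω_k} is bounded, positive, and integrable, and the theorem shows that every Λ ∈ (L^p (R^n))′, p ≥ 1, has the form Λf = ∫ hf dm_n for a unique h ∈ L^q (R^n) (h ∈ L^∞ if p = 1), with ||h|| = ||Λ||.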

24.6 The Dual Space of C0 (X)

Definition 24.6.1 Let X be a Polish space whose closed balls are compact. Then f ∈ C0 (X) means f is continuous and for every ε > 0 there exists a compact set K such that |f(x)| < ε whenever x ∉ K. Recall the norm on this space is

||f||_∞ ≡ ||f|| ≡ sup {|f(x)| : x ∈ X}.

The next lemma has to do with extending functionals which are defined on nonnegative functions to complex valued functions in such a way that the extended function is linear. This exact process was used earlier with the abstract Lebesgue integral. Basically, you can do it when the functional "desires to be linear".

Lemma 24.6.2 Suppose λ is a mapping which has nonnegative real values and which is defined on the nonnegative functions in C0 (X) such that

λ(af + bg) = aλ(f) + bλ(g)     (24.22)

whenever a, b ≥ 0 and f, g ≥ 0. Then there exists a unique extension Λ of λ to all of C0 (X) such that whenever f, g ∈ C0 (X) and a, b ∈ C, it follows that

Λ(af + bg) = aΛ(f) + bΛ(g).

If

|λ(f)| ≤ C ||f||_∞,     (24.23)

then |Λf| ≤ C ||f||_∞ and |Λf| ≤ λ(|f|).

Proof: Here λ is defined on the nonnegative functions. First extend it to the continuous real valued functions. There is only one way to do this and have the resulting map be linear. Let C0 (X; R) be the real-valued functions in C0 (X) and define

Λ_R (f) = Λ_R (f^+ − f^−) = Λ_R (f^+) − Λ_R (f^−) = λ(f^+) − λ(f^−)

for f ∈ C0 (X; R). This is the only thing possible if Λ_R is to be linear. Is Λ_R (f + g) = Λ_R (f) + Λ_R (g)? Note that

f + g = (f + g)^+ − (f + g)^− = f^+ − f^− + g^+ − g^−


and so

Λ_R (f + g) ≡ Λ_R ((f + g)^+) − Λ_R ((f + g)^−),
Λ_R f + Λ_R g ≡ Λ_R f^+ − Λ_R f^− + Λ_R g^+ − Λ_R g^−.

Are these equal? This will be so if and only if

Λ_R ((f + g)^+) + Λ_R f^− + Λ_R g^− = Λ_R ((f + g)^−) + Λ_R f^+ + Λ_R g^+,

equivalently, since Λ_R = λ on nonnegative functions and λ tries to be linear,

λ( (f + g)^+ + f^− + g^− ) = λ( (f + g)^− + f^+ + g^+ ).

But this is so because (f + g)^+ + f^− + g^− = (f + g)^− + f^+ + g^+. It is necessary to verify that Λ_R (cf) = cΛ_R (f) for all c ∈ R. But (cf)^± = cf^± if c ≥ 0, while (cf)^+ = −c f^− and (cf)^− = (−c) f^+ if c < 0. Thus, if c < 0,

Λ_R (cf) = λ((cf)^+) − λ((cf)^−) = λ((−c) f^−) − λ((−c) f^+) = −cλ(f^−) + cλ(f^+) = c(λ(f^+) − λ(f^−)) = cΛ_R (f).

A similar formula holds more easily if c ≥ 0. Now let

Λf ≡ Λ_R (Re f) + iΛ_R (Im f)

for arbitrary f ∈ C0 (X). This is the only possible definition if it is to be linear and agree with Λ_R on C0 (X; R). Then

Λ(f + g) = Λ_R (Re (f + g)) + iΛ_R (Im (f + g)) = Λ_R (Re f) + Λ_R (Re g) + i [Λ_R (Im f) + Λ_R (Im g)] = Λ(f) + Λ(g).

Also for b real, Re (ibf) = −b Im f and Im (ibf) = b Re (f). Then letting f = u + iv,

Λ((a + ib)(u + iv)) = Λ(au − bv + i(av + bu)) = Λ_R (au − bv) + iΛ_R (av + bu),
(a + ib) Λ(u + iv) = (a + ib)(Λ_R (u) + iΛ_R (v)) = aΛ_R (u) − bΛ_R (v) + i [bΛ_R (u) + aΛ_R (v)],

which is the same thing because Λ_R is linear. It remains to verify the claim about continuity of Λ in case of 24.23. This is really pretty obvious because f_n → 0 in C0 (X) if and only if the positive and negative parts of its real and imaginary parts also converge to 0, and λ of each of these converges to 0 by assumption. What of the last claim that |Λf| ≤ λ(|f|)? Let ω have |ω| = 1 and |Λf| = ωΛ(f). Since Λ is linear,

|Λf| = ωΛ(f) = Λ(ωf) = Λ_R (Re (ωf)) ≤ Λ_R ( (Re (ωf))^+ ) = λ( (Re (ωf))^+ ) ≤ λ(|f|). 

Let L ∈ C0 (X)′. Also denote by C0^+ (X) the set of nonnegative continuous functions defined on X.

Definition 24.6.3 Letting L ∈ C0 (X)′, define for f ∈ C0^+ (X),

λ(f) = sup{ |Lg| : |g| ≤ f }.

Note that λ(f) < ∞ because |Lg| ≤ ∥L∥ ∥g∥ ≤ ∥L∥ ||f|| for |g| ≤ f. Isn't this a lot like the total variation of a vector measure? Indeed it is, and the proof that λ wants to be linear is also similar to the proof that the total variation is a measure. This is the content of the following lemma.
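To get a feel for this definition, consider the simplest case, included only as an illustration: X = R and L(f) = f(0). For f ∈ C0^+ (R), any g with |g| ≤ f satisfies |Lg| = |g(0)| ≤ f(0), and g = f achieves this bound, so λ(f) = f(0). Thus λ agrees with L on nonnegative functions here, and one can check, once the representation theorem below is available, that the associated measure is the Dirac measure δ_0 with σ ≡ 1.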


Lemma 24.6.4 If c ≥ 0, λ(cf) = cλ(f); f_1 ≤ f_2 implies λ(f_1) ≤ λ(f_2); and λ(f_1 + f_2) = λ(f_1) + λ(f_2). Also

0 ≤ λ(f) ≤ ∥L∥ ||f||_∞.

Proof: The first two assertions are easy to see, so consider the third. For i = 1, 2 and f_i ∈ C0^+ (X), let |g_i| ≤ f_i with λ(f_i) ≤ |Lg_i| + ε. Then let |ω_i| = 1 and ω_i L(g_i) = |L(g_i)| so that

|L(g_1)| + |L(g_2)| = ω_1 L(g_1) + ω_2 L(g_2) = L(ω_1 g_1 + ω_2 g_2) = |L(ω_1 g_1 + ω_2 g_2)|.

Then, where |g_i| ≤ f_i,

λ(f_1) + λ(f_2) ≤ |L(g_1)| + |L(g_2)| + 2ε = ω_1 L(g_1) + ω_2 L(g_2) + 2ε = L(ω_1 g_1 + ω_2 g_2) + 2ε = |L(ω_1 g_1 + ω_2 g_2)| + 2ε.

Now

|ω_1 g_1 + ω_2 g_2| ≤ |g_1| + |g_2| ≤ f_1 + f_2,

and so the above shows λ(f_1) + λ(f_2) ≤ λ(f_1 + f_2) + 2ε. Since ε is arbitrary, λ(f_1) + λ(f_2) ≤ λ(f_1 + f_2). It remains to verify the other inequality. Let |g| ≤ f_1 + f_2 with |Lg| ≥ λ(f_1 + f_2) − ε. Let

h_i(x) = f_i(x) g(x) / ( f_1(x) + f_2(x) )  if f_1(x) + f_2(x) > 0,   h_i(x) = 0  if f_1(x) + f_2(x) = 0.

Then h_i is continuous and

h_1(x) + h_2(x) = g(x),   |h_i| ≤ f_i.

The function h_i is clearly continuous at points x where f_1(x) + f_2(x) > 0. The reason it is continuous at a point where f_1(x) + f_2(x) = 0 is that at every point y where f_1(y) + f_2(y) > 0, the top description of the function gives

|h_i(y)| = ( f_i(y) / ( f_1(y) + f_2(y) ) ) |g(y)| ≤ |g(y)| ≤ f_1(y) + f_2(y),

so if |y − x| is small enough, |h_i(y)| is also small. Then it follows from this definition of the h_i that

−ε + λ(f_1 + f_2) ≤ |Lg| = |Lh_1 + Lh_2| ≤ |Lh_1| + |Lh_2| ≤ λ(f_1) + λ(f_2).

Since ε > 0 is arbitrary, this shows that

λ(f_1 + f_2) ≤ λ(f_1) + λ(f_2) ≤ λ(f_1 + f_2).

The last assertion follows from

λ(f) = sup{ |Lg| : |g| ≤ f } ≤ sup_{∥g∥_∞ ≤ ∥f∥_∞} ∥L∥ ||g||_∞ ≤ ∥L∥ ||f||_∞. 
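As a slightly less trivial illustration, not needed for what follows, take X = R and L(f) = f(0) − f(1), so that ∥L∥ = 2. For f ≥ 0 in C0 (R), choosing g(x) = f(x) cos(πx) gives |g| ≤ f and Lg = f(0) + f(1), while any g with |g| ≤ f has |Lg| ≤ |g(0)| + |g(1)| ≤ f(0) + f(1). Hence λ(f) = f(0) + f(1), which is monotone and additive in f exactly as the lemma asserts, and 0 ≤ λ(f) ≤ 2∥f∥_∞ = ∥L∥ ∥f∥_∞.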

Let Λ be the unique linear extension of Theorem 21.7.8. It is just like defining the integral for functions when you understand it for nonnegative functions. Then from the above lemma,

|Λf| = ωΛf = Λ(ωf) ≤ λ(|f|) ≤ ∥L∥ ||f||_∞.     (24.24)

Also, if f ≥ 0,

Λf = λ(f) ≥ 0.

Therefore, Λ is a positive linear functional on C0 (X). In particular, it is a positive linear functional on Cc (X). Thus there are now two linear continuous mappings L, Λ which are defined on C0 (X). The above shows that in fact ∥Λ∥ ≤ ∥L∥. Also, from the definition of Λ,

|Lg| ≤ λ(|g|) = Λ(|g|) ≤ ∥Λ∥ ∥g∥_∞,

so in fact ∥L∥ ≤ ∥Λ∥, showing that these two have the same operator norms:

∥L∥ = ∥Λ∥.     (24.25)

By Theorem 22.0.5 on Page 542, since Λ is a positive linear functional on Cc (X), there exists a unique measure µ such that

Λf = ∫_X f dµ

for all f ∈ Cc (X). This measure is regular. In fact, it is actually a finite measure.

Lemma 24.6.5 Let L ∈ C0 (X)′ as above. Then letting µ be the Radon measure just described, it follows that µ is finite and

µ(X) = ∥Λ∥ = ∥L∥.

Proof: First of all, it was observed above in 24.25 that ∥Λ∥ = ∥L∥. Now by inner regularity,

µ(X) = sup {µ(K) : K ⊆ X, K compact}

and so, letting K ≺ f ≺ X for one of these K, it also follows that

µ(X) = sup {Λf : f ≺ X}.

However, for such nonnegative f ≺ X,

λ(f) ≡ sup {|Lg| : |g| ≤ f} ≤ sup {∥L∥ ∥g∥_∞ : |g| ≤ f} ≤ ∥L∥ ∥f∥_∞ = ∥L∥

and so 0 ≤ Λf = λf ≤ ∥L∥. It follows that µ(X) = sup {Λf : f ≺ X} ≤ ||L||. Now since Cc (X) is dense in C0 (X), there exists f ∈ Cc (X) such that ∥f∥_∞ ≤ 1 and |Λf| + ε > ∥Λ∥ = ∥L∥. Thus,

∥L∥ − ε < |Λf| ≤ λ(|f|) = Λ|f| ≤ µ(X).

Since ε is arbitrary, this shows ∥L∥ = µ(X). 

What follows is the Riesz representation theorem for C0 (X)′.


Theorem 24.6.6 Let L ∈ (C0 (X))′ for X a Polish space with closed balls compact. Then there exists a finite Radon measure µ and a function σ ∈ L^∞ (X, µ) such that for all f ∈ C0 (X),

L(f) = ∫_X f σ dµ.

Furthermore, µ(X) = ||L||, |σ| = 1 a.e., and if

ν(E) ≡ ∫_E σ dµ,

then µ = |ν|.

Proof: From the above there exists a unique Radon measure µ such that for all f ∈ Cc (X),

Λf = ∫_X f dµ.

Then for f ∈ Cc (X),

|Lf| ≤ λ(|f|) = Λ(|f|) = ∫_X |f| dµ = ||f||_{L^1(µ)}.

Since µ is both inner and outer regular, Cc (X) is dense in L^1 (X, µ). (See Theorem 23.2.4 for more than is needed.) Therefore L extends uniquely to an element L̃ of (L^1 (X, µ))′. By the Riesz representation theorem for L^1 for finite measure spaces, there exists a unique σ ∈ L^∞ (X, µ) such that for all f ∈ L^1 (X, µ),

L̃f = ∫_X f σ dµ.

In particular, for all f ∈ C0 (X),

Lf = ∫_X f σ dµ,

and it follows from Lemma 24.6.5 that µ(X) = ||L||. It remains to verify |σ| = 1 a.e. For any continuous f ≥ 0,

Λf ≡ ∫_X f dµ ≥ |Lf| = | ∫_X f σ dµ |.

Now if E is measurable, the regularity of µ implies there exists a sequence of nonnegative bounded functions f_n ∈ Cc (X) such that f_n(x) → X_E(x) a.e. and in L^1 (µ). Then using the dominated convergence theorem in the above,

∫_E dµ = lim_{n→∞} ∫_X f_n dµ = lim_{n→∞} Λ(f_n) ≥ lim_{n→∞} |Lf_n| = lim_{n→∞} | ∫_X f_n σ dµ | = | ∫_E σ dµ |,

and so if µ(E) > 0,

1 ≥ | (1/µ(E)) ∫_E σ dµ |,

which shows from Lemma 24.4.7 that |σ| ≤ 1 a.e. But also, choosing f_1 appropriately with ||f_1||_∞ ≤ 1 and |Lf_1| + ε > ∥L∥ = µ(X), and letting ω(Lf_1) = |Lf_1|, |ω| = 1,

µ(X) = ||L|| = sup_{||f||_∞ ≤ 1} |Lf| ≤ |Lf_1| + ε = ωLf_1 + ε = ∫_X f_1 ωσ dµ + ε = ∫_X Re (f_1 ωσ) dµ + ε ≤ ∫_X |σ| dµ + ε,

and since ε is arbitrary,

µ(X) ≤ ∫_X |σ| dµ ≤ µ(X),

which requires |σ| = 1 a.e., since it was shown to be no larger than 1 and if it were smaller than 1 on a set of positive measure, then the above could not hold. It only remains to verify µ = |ν|. By Corollary 24.4.9,

|ν|(E) = ∫_E |σ| dµ = ∫_E 1 dµ = µ(E)

and so µ = |ν|. 

Sometimes people write

∫_X f dν ≡ ∫_X f σ d|ν|,

where σ d|ν| is the polar decomposition of the complex measure ν. Then with this convention, the above representation is

L(f) = ∫_X f dν,   |ν|(X) = ||L||.

Also note that at most one ν can represent L. If there were two of them ν_i, i = 1, 2, then ν_1 − ν_2 would represent 0 and so |ν_1 − ν_2|(X) = 0. Hence ν_1 = ν_2.
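Continuing the illustration used earlier, and again only as an example rather than part of the theorem, let L(f) = f(0) − f(1) on C0 (R). The theorem produces µ = δ_0 + δ_1, a finite Radon measure with µ(R) = 2 = ∥L∥, and σ with σ(0) = 1 and σ(1) = −1 (its values off {0, 1} are irrelevant since the complement has µ measure zero), so that L(f) = ∫ fσ dµ = f(0) − f(1). The representing complex measure is ν = δ_0 − δ_1 and |ν| = δ_0 + δ_1 = µ, consistent with the last assertion of the theorem.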

24.7 Exercises

1. Suppose µ is a vector measure having values in R^n or C^n. Can you show that |µ| must be finite? Hint: You might define for each e_i, one of the standard basis vectors, the real or complex measure µ_{e_i} given by µ_{e_i}(E) ≡ e_i · µ(E). Why would this approach not yield anything for an infinite dimensional normed linear space in place of R^n?

2. The Riesz representation theorem of the L^p spaces can be used to prove a very interesting inequality. Let r, p, q ∈ (1, ∞) satisfy

1/r = 1/p + 1/q − 1.

Then

1/q = 1 + 1/r − 1/p > 1/r

and so r > q. Let θ ∈ (0, 1) be chosen so that θr = q. Then also, using 1/p + 1/p′ = 1,

1/r = (1 − 1/p′) + 1/q − 1 = 1/q − 1/p′,

and so

θ/q = 1/q − 1/p′,

which implies p′(1 − θ) = q. Now let f ∈ L^p (R^n), g ∈ L^q (R^n), f, g ≥ 0. Justify the steps in the following argument, using what was just shown, namely that θr = q and p′(1 − θ) = q. Let h ∈ L^{r′} (R^n), where 1/r + 1/r′ = 1. Then

∫ f ∗ g(x) h(x) dx = ∫ ∫ f(y) g(x − y) h(x) dx dy ≤ ∫ ∫ |f(y)| |g(x − y)|^θ · |g(x − y)|^{1−θ} |h(x)| dx dy

≤ ∫ ( ∫ ( |g(x − y)|^{1−θ} |h(x)| )^{r′} dx )^{1/r′} ( ∫ ( |f(y)| |g(x − y)|^{θ} )^{r} dx )^{1/r} dy

≤ [ ∫ ( ∫ ( |g(x − y)|^{1−θ} |h(x)| )^{r′} dx )^{p′/r′} dy ]^{1/p′} · [ ∫ ( ∫ ( |f(y)| |g(x − y)|^{θ} )^{r} dx )^{p/r} dy ]^{1/p}

≤ [ ∫ ( ∫ ( |g(x − y)|^{1−θ} |h(x)| )^{p′} dy )^{r′/p′} dx ]^{1/r′} · [ ∫ |f(y)|^{p} ( ∫ |g(x − y)|^{θr} dx )^{p/r} dy ]^{1/p}

= [ ∫ |h(x)|^{r′} ( ∫ |g(x − y)|^{(1−θ)p′} dy )^{r′/p′} dx ]^{1/r′} · ||g||_q^{q/r} ||f||_p

= ||g||_q^{q/p′} ||h||_{r′} ||g||_q^{q/r} ||f||_p = ||g||_q ||f||_p ||h||_{r′}.     (24.26)

Young's inequality says that

||f ∗ g||_r ≤ ||g||_q ||f||_p.     (24.27)

How does this inequality follow from the above computation? Does 24.26 continue to hold if r, p, q are only assumed to be in [1, ∞]? Explain. Does 24.27 hold even if r, p, and q are only assumed to lie in [1, ∞]?

3. Suppose (Ω, µ, S) is a finite measure space and that {f_n} is a sequence of functions which converge weakly to 0 in L^p (Ω). This means that

∫_Ω f_n g dµ → 0

for every g ∈ L^{p′} (Ω). Suppose also that f_n(x) → 0 a.e. Show that then f_n → 0 in L^{p−ε} (Ω) for every p > ε > 0.


4. Give an example of a sequence of functions in L^∞ (−π, π) which converges weak ∗ to zero but which does not converge pointwise a.e. to zero. Convergence weak ∗ to 0 means that for every g ∈ L^1 (−π, π), ∫_{−π}^{π} g(t) f_n(t) dt → 0. Hint: First consider g ∈ C_c^∞ (−π, π) and maybe try something like f_n(t) = sin(nt). Do integration by parts.

5. Let λ be a real vector measure on the measure space (Ω, F). That is, λ has values in R. The Hahn decomposition says there exist measurable sets P, N such that P ∪ N = Ω, P ∩ N = ∅, and for each F ⊆ P, λ(F) ≥ 0 and for each F ⊆ N, λ(F) ≤ 0. These sets P, N are called the positive set and the negative set respectively. Show the existence of the Hahn decomposition. Also explain how this decomposition is unique in the sense that if P′, N′ is another Hahn decomposition, then (P \ P′) ∪ (P′ \ P) has measure zero, a similar formula holding for N, N′. When you have the Hahn decomposition, as just described, you define λ^+(E) ≡ λ(E ∩ P), λ^−(E) ≡ λ(E ∩ N). This is sometimes called the Hahn Jordan decomposition. Hint: This is pretty easy if you use the polar decomposition above.

6. The Hahn decomposition holds for measures which have values in (−∞, ∞]. Let λ be such a measure which is defined on a σ algebra of sets F. This is not a vector measure because the set on which it has values is not a vector space. Thus this case is not included in the above discussion. N ∈ F is called a negative set if λ(B) ≤ 0 for all B ⊆ N. P ∈ F is called a positive set if for all F ⊆ P, λ(F) ≥ 0. (Here it is always assumed you are only considering sets of F.) Show that if λ(A) ≤ 0, then there exists N ⊆ A such that N is a negative set and λ(N) ≤ λ(A). Hint: This is done by subtracting off disjoint sets having positive measure. Let A ≡ N_0 and suppose N_n ⊆ A has been obtained. Tell why t_n ≡ sup{λ(E) : E ⊆ N_n} ≥ 0. Let B_n ⊆ N_n be such that

λ(B_n) > t_n / 2.

Then N_{n+1} ≡ N_n \ B_n. Thus the N_n are decreasing in n and the B_n are disjoint. Explain why λ(N_n) ≤ λ(N_0). Let N = ∩N_n. Argue t_n must converge to 0, since otherwise λ(N) = −∞. Explain why this requires N to be a negative set in A which has measure no larger than that of A.

7. Using Problem 6 complete the Hahn decomposition for λ having values in (−∞, ∞]. Now the Hahn Jordan decomposition for the measure λ is λ^+(E) ≡ λ(E ∩ P), λ^−(E) ≡ −λ(E ∩ N). Explain why λ^− is a finite measure. Hint: Let N_0 = ∅. For N_n a given negative set, let

t_n ≡ inf{λ(E) : E ∩ N_n = ∅}.

Explain why you can assume that for all n, t_n < 0. Let E_n ⊆ N_n^C be such that λ(E_n) < t_n/2 < 0 and from Problem 6 let A_n ⊆ E_n be a negative set such that λ(A_n) ≤ λ(E_n). Then N_{n+1} ≡ N_n ∪ A_n. If t_n does not converge to 0, explain why there exists a set having measure −∞, which is not allowed. Thus t_n → 0. Let N = ∪_{n=1}^∞ N_n and explain why P ≡ N^C must be positive due to t_n → 0.

8. What if λ has values in [−∞, ∞)? Prove there exists a Hahn decomposition for λ as in the above problem. Why do we not allow λ to have values in [−∞, ∞]? Hint: You might want to consider −λ.


9. Suppose X is a Banach space and let X′ denote its dual space. A sequence {x*_n}_{n=1}^∞ in X′ is said to converge weak ∗ to x* ∈ X′ if for every x ∈ X,

lim_{n→∞} x*_n (x) = x* (x).

Let {ϕ_n} be a mollifier. Also let δ be the measure defined by δ(E) = 1 if 0 ∈ E and δ(E) = 0 if 0 ∉ E. Explain how ϕ_n → δ weak ∗.


10. A Banach space X is called strictly convex if whenever x ≠ y and ∥x∥ = ∥y∥ = 1, it follows that ∥(x + y)/2∥ < (1/2)∥x∥ + (1/2)∥y∥ = 1. A Banach space is called uniformly convex if whenever ∥x_n∥, ∥y_n∥ ≤ 1 and ∥x_n + y_n∥ → 2, it follows that ∥x_n − y_n∥ → 0. Show that uniform convexity implies strict convexity. It was not done here, but it can be proved, using something called Clarkson's inequalities, that the L^p spaces for p > 1 are uniformly convex.


Part IV

Appendix


Appendix A

The Cross Product

This short explanation is included for the sake of those who have had a calculus course in which the geometry of the cross product has not been made clear. Unfortunately, this is the case in most of the current books. The geometric significance in terms of angles and lengths is very important, not just a formula for computing the cross product. The cross product is the other way of multiplying two vectors in R3. It is very different from the dot product in many ways. First the geometric meaning is discussed and then a description in terms of coordinates is given. Both descriptions of the cross product are important. The geometric description is essential in order to understand the applications to physics and geometry while the coordinate description is the only way to practically compute the cross product.

Definition A.0.1 Three vectors a, b, c form a right handed system if when you extend the fingers of your right hand along the vector a and close them in the direction of b, the thumb points roughly in the direction of c.

For an example of a right handed system of vectors, see the following picture.

(Figure: a right handed system of vectors a, b, c, with c pointing up out of the plane of a and b.)

In this picture the vector c points upwards from the plane determined by the other two vectors. You should consider how a right hand system would differ from a left hand system. Try using your left hand and you will see that the vector c would need to point in the opposite direction as it would for a right hand system. From now on, the vectors i, j, k will always form a right handed system. To repeat, if you extend the fingers of your right hand along i and close them in the direction of j, the thumb points in the direction of k. The following is the geometric description of the cross product. It gives both the direction and the magnitude and therefore specifies the vector.

Definition A.0.2 Let a and b be two vectors in R3. Then a × b is defined by the following two rules.

1. |a × b| = |a| |b| sin θ where θ is the included angle.


2. a × b · a = 0, a × b · b = 0, and a, b, a × b forms a right hand system.

Note that |a × b| is the area of the parallelogram determined by a and b: this parallelogram has base |a| and altitude |b| sin(θ).

The cross product satisfies the following properties.

a × b = − (b × a) ,   a × a = 0.     (1.1)

For α a scalar,

(αa) × b = α (a × b) = a × (αb) .     (1.2)

For a, b, and c vectors, one obtains the distributive laws,

a × (b + c) = a × b + a × c,     (1.3)
(b + c) × a = b × a + c × a.     (1.4)

Formula 1.1 follows immediately from the definition. The vectors a × b and b × a have the same magnitude, |a| |b| sin θ, and an application of the right hand rule shows they have opposite direction. Formula 1.2 is also fairly clear. If α is a nonnegative scalar, the direction of (αa) × b is the same as the direction of a × b, α (a × b) and a × (αb), while the magnitude is just α times the magnitude of a × b, which is the same as the magnitude of α (a × b) and a × (αb). Using this yields equality in 1.2. In the case where α < 0, everything works the same way except the vectors are all pointing in the opposite direction and you must multiply by |α| when comparing their magnitudes. The distributive laws are much harder to establish, but the second follows from the first quite easily. Thus, assuming the first, and using 1.1,

(b + c) × a = −a × (b + c) = − (a × b + a × c) = b × a + c × a.

A proof of the distributive law is given in a later section for those who are interested. Now from the definition of the cross product,

i × j = k    j × i = −k
k × i = j    i × k = −j
j × k = i    k × j = −i

With this information, the following gives the coordinate description of the cross product.

Proposition A.0.3 Let a = a_1 i + a_2 j + a_3 k and b = b_1 i + b_2 j + b_3 k be two vectors. Then

a × b = (a_2 b_3 − a_3 b_2) i + (a_3 b_1 − a_1 b_3) j + (a_1 b_2 − a_2 b_1) k.     (1.5)

Proof: From the above table and the properties of the cross product listed,

(a_1 i + a_2 j + a_3 k) × (b_1 i + b_2 j + b_3 k)
= a_1 b_2 i × j + a_1 b_3 i × k + a_2 b_1 j × i + a_2 b_3 j × k + a_3 b_1 k × i + a_3 b_2 k × j
= a_1 b_2 k − a_1 b_3 j − a_2 b_1 k + a_2 b_3 i + a_3 b_1 j − a_3 b_2 i
= (a_2 b_3 − a_3 b_2) i + (a_3 b_1 − a_1 b_3) j + (a_1 b_2 − a_2 b_1) k.     (1.6)

 It is probably impossible for most people to remember 1.5. Fortunately, there is a somewhat easier way to remember it. Define the determinant of a 2 × 2 matrix as follows:

| a b ; c d | ≡ ad − bc.

Then

a × b = | i j k ; a_1 a_2 a_3 ; b_1 b_2 b_3 |     (1.7)

where you expand the determinant along the top row. This yields

i (−1)^{1+1} | a_2 a_3 ; b_2 b_3 | + j (−1)^{1+2} | a_1 a_3 ; b_1 b_3 | + k (−1)^{1+3} | a_1 a_2 ; b_1 b_2 |
= i | a_2 a_3 ; b_2 b_3 | − j | a_1 a_3 ; b_1 b_3 | + k | a_1 a_2 ; b_1 b_2 | .

Note that to get the scalar which multiplies i you take the determinant of what is left after deleting the first row and the first column and multiply by (−1)^{1+1} because i is in the first row and the first column. Then you do the same thing for the j and k. In the case of the j there is a minus sign because j is in the first row and the second column and so (−1)^{1+2} = −1, while the k is multiplied by (−1)^{1+3} = 1. The above equals

(a_2 b_3 − a_3 b_2) i − (a_1 b_3 − a_3 b_1) j + (a_1 b_2 − a_2 b_1) k,     (1.8)

which is the same as 1.6. There will be much more presented on determinants later. For now, consider this an introduction if you have not seen this topic.

Example A.0.4 Find (i − j + 2k) × (3i − 2j + k).

Use 1.7 to compute this.

| i j k ; 1 −1 2 ; 3 −2 1 | = | −1 2 ; −2 1 | i − | 1 2 ; 3 1 | j + | 1 −1 ; 3 −2 | k = 3i + 5j + k.

Example A.0.5 Find the area of the parallelogram determined by the vectors (i − j + 2k), (3i − 2j + k). These are the same two vectors as in Example A.0.4.

From Example A.0.4 and the geometric description of the cross product, the area is just the norm of the vector obtained in Example A.0.4. Thus the area is √(9 + 25 + 1) = √35.

Example A.0.6 Find the area of the triangle determined by (1, 2, 3), (0, 2, 5), (5, 1, 2).

This triangle is obtained by connecting the three points with lines. Picking (1, 2, 3) as a starting point, there are two displacement vectors, (−1, 0, 2) and (4, −1, −1), such that the given vector added to these displacement vectors gives the other two vectors. The area of the triangle is half the area of the parallelogram determined by (−1, 0, 2) and (4, −1, −1). Thus (−1, 0, 2) × (4, −1, −1) = (2, 7, 1) and so the area of the triangle is (1/2)√(4 + 49 + 1) = (3/2)√6.

Observation A.0.7 In general, if you have three points (vectors) in R3, P, Q, R, the area of the triangle they determine is given by

(1/2) |(Q − P) × (R − P)| .

A.1 The Box Product

Definition A.1.1 A parallelepiped determined by the three vectors a, b, and c consists of

{ra + sb + tc : r, s, t ∈ [0, 1]} .

That is, if you pick three numbers r, s, and t, each in [0, 1], and form ra + sb + tc, then the collection of all such points is what is meant by the parallelepiped determined by these three vectors. The area of the base of the parallelepiped, the parallelogram determined by the vectors a and b, equals |a × b|, while the altitude of the parallelepiped is |c| cos θ, where θ is the angle between c and a × b. Therefore, the volume of this parallelepiped is the area of the base times the altitude, which is just

|a × b| |c| cos θ = a × b · c.

This expression is known as the box product and is sometimes written as [a, b, c]. You should consider what happens if you interchange the b with the c or the a with the c. You can see geometrically from drawing pictures that this merely introduces a minus sign. In any case, the box product of three vectors always equals either the volume of the parallelepiped determined by the three vectors or else minus this volume.

Example A.1.2 Find the volume of the parallelepiped determined by the vectors i + 2j − 5k, i + 3j − 6k, 3i + 2j + 3k.

According to the above discussion, pick any two of these, take the cross product, and then take the dot product of this with the third of these vectors. The result will be either the desired volume or minus the desired volume.

(i + 2j − 5k) × (i + 3j − 6k) = | i j k ; 1 2 −5 ; 1 3 −6 | = 3i + j + k.

Now take the dot product of this vector with the third, which yields

(3i + j + k) · (3i + 2j + 3k) = 9 + 2 + 3 = 14.


This shows the volume of this parallelepiped is 14 cubic units. There is a fundamental observation which comes directly from the geometric definitions of the cross product and the dot product. Lemma A.1.3 Let a, b, and c be vectors. Then (a × b) ·c = a· (b × c) . Proof: This follows from observing that either (a × b) ·c and a· (b × c) both give the volume of the parallelepiped or they both give −1 times the volume.  Notation A.1.4 The box product a × b · c = a · b × c is denoted more compactly as [a, b, c].
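To see the sign behavior just described in a concrete case (a check, not a new result), redo Example A.1.2 with two of the vectors interchanged. Using the same three vectors,

(i + 3j − 6k) × (i + 2j − 5k) = −3i − j − k,   (−3i − j − k) · (3i + 2j + 3k) = −9 − 2 − 3 = −14,

which is the negative of the answer 14 found before, while the even rearrangement (i + 3j − 6k) × (3i + 2j + 3k) · (i + 2j − 5k) = 21 − 42 + 35 = 14 gives the same answer again.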

A.2 The Distributive Law For Cross Product

Here is a proof of the distributive law for the cross product. Let x be a vector. From the above observation, x · a× (b + c) = (x × a) · (b + c) = (x × a) · b+ (x × a) · c = x · a × b + x · a × c = x· (a × b + a × c) . Therefore, x· [a× (b + c) − (a × b + a × c)] = 0 for all x. In particular, this holds for x = a× (b + c) − (a × b + a × c) showing that a× (b + c) = a × b + a × c and this proves the distributive law for the cross product.
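As a quick numerical check of the distributive law, included only as an illustration, take a = i, b = j, c = k. Then a × (b + c) = i × (j + k) = (1, 0, 0) × (0, 1, 1) = (0 · 1 − 0 · 1, 0 · 0 − 1 · 1, 1 · 1 − 0 · 0) = −j + k, while a × b + a × c = i × j + i × k = k − j, and the two agree.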


Appendix B

The Hausdorff Maximal Theorem First is the definition of what is meant by a partial order. Definition B.0.1 A nonempty set F is called a partially ordered set if it has a partial order denoted by ≺. This means it satisfies the following. If x ≺ y and y ≺ z, then x ≺ z. Also x ≺ x. It is like ⊆ on the set of all subsets of a given set. It is not the case that given two elements of F that they are related. In other words, you cannot conclude that either x ≺ y or y ≺ x. A chain, denoted by C ⊆ F has the property that it is totally ordered meaning that if x, y ∈ C, either x ≺ y or y ≺ x. A maximal chain is a chain C which has the property that there is no strictly larger chain. In other words, if x ∈ F \ ∪ C, then C∪ {x} is no longer a chain so x fails to be related to something in C. Here is the Hausdorff maximal theorem. The proof is a proof by contradiction. We assume there is no maximal chain and then show this cannot happen. The axiom of choice is used in choosing the xC right at the beginning of the argument. Theorem B.0.2 Let F be a nonempty partially ordered set with order ≺. Then there exists a maximal chain. Proof: Suppose no chain is maximal. Then for each chain C there exists xC ∈ F \ ∪ C such that C ∪ {xC } is a chain. Call two chains comparable if one is a subset of the other. Let X be the set of all chains. Thus the elements of X are chains and X is a subset of P (F). For C a chain, let θC denote C ∪ {xC } . Pick x0 ∈ F. A subset Y of X will be called a “tower” if θC ∈ Y whenever C ∈ Y, x0 ∈ ∪C for all C ∈ Y, and if S is a subset of Y such that all pairs of chains in S are comparable, then ∪S is in Y. Note that X is a tower. Let Y0 be the intersection of all towers. Then Y0 is also a tower, the smallest one. This follows from the definition. Claim 1: If C0 ∈ Y0 is comparable to every chain C ∈ Y0 , then if C0 ( C, it must be the case that θC0 ⊆ C. The symbol ( indicates proper subset. Proof of Claim 1: Consider B ≡ {D ∈ Y0 : D ⊆ C 0 or xC0 ∈ ∪D}. Let Y1 ≡ Y0 ∩ B. I want to argue that Y1 is a tower. Obviously all chains of Y1 contain x0 in their unions. If D ∈ Y1 , is θD ∈ Y1 ? case 1: D ! C0 . Then xC0 ∈ ∪D and so xC0 ∈ ∪θD so θD ∈ Y1 . case 2: D ⊆ C 0 . Then if θD ! C0 , it follows that D ⊆ C0 D ∪ {xD } . If x ∈ C0 \ D then x = xD . But then C0 D ∪ {xD } ⊆ C0 ∪ {xD } = C0 which is nonsense, and so C0 = D so xD = xC0 ∈ ∪θC0 = ∪θD and so θD ∈ Y1 . If θD ⊆ C0 then right away θD ∈ B. Thus B = Y0 because Y1 cannot be smaller than Y0 . In particular, if D ! C0 , then xC0 ∈ ∪D or in other words, θC0 ⊆ D. Claim 2: Any two chains in Y0 are comparable so if C ( D, then θC ⊆ D. Proof of Claim 2: Let Y1 consist of all chains of Y0 which are comparable to every chain of Y0 . (Like C0 above.) I want to show that Y1 is a tower. Let C ∈ Y1 and D ∈ Y0 . Since C is comparable to all chains in Y0 , either C ( D or C ⊇ D. I need to show that θC is comparable with 631


D. The second case is obvious so consider the first that C ( D. By Claim 1, θC ⊆ D. Since D is arbitrary, this shows that Y1 is a tower. Hence Y1 = Y0 because Y0 is as small as possible. It follows that every two chains in Y0 are comparable and so if C ( D, then θC ⊆ D. Since every pair of chains in Y0 are comparable and Y0 is a tower, it follows that ∪Y0 ∈ Y0 so ∪Y0 is a chain. However, θ ∪ Y0 is a chain which properly contains ∪Y0 and since Y0 is a tower, θ ∪ Y0 ∈ Y0 . Thus ∪ (θ ∪ Y0 ) ! ∪ (∪Y0 ) ⊇ ∪ (θ ∪ Y0 ) which is a contradiction. Therefore, it is impossible to obtain the xC described above for some chain C and so, this C is a maximal chain.  If X is a nonempty set,≤ is an order on X if x ≤ x, and if x, y ∈ X, then

either x ≤ y or y ≤ x

and if x ≤ y and y ≤ z then x ≤ z. ≤ is a well order and say that (X, ≤) is a well-ordered set if every nonempty subset of X has a smallest element. More precisely, if S ̸= ∅ and S ⊆ X then there exists an x ∈ S such that x ≤ y for all y ∈ S. A familiar example of a well-ordered set is the natural numbers. Lemma B.0.3 The Hausdorff maximal principle implies every nonempty set can be well-ordered. Proof: Let X be a nonempty set and let a ∈ X. Then {a} is a well-ordered subset of X. Let F = {S ⊆ X : there exists a well order for S}. Thus F ̸= ∅. For S1 , S2 ∈ F , define S1 ≺ S2 if S1 ⊆ S2 and there exists a well order for S2 , ≤2 such that (S2 , ≤2 ) is well-ordered and if y ∈ S2 \ S1 then x ≤2 y for all x ∈ S1 , and if ≤1 is the well order of S1 then the two orders are consistent on S1 . Then observe that ≺ is a partial order on F. By the Hausdorff maximal principle, let C be a maximal chain in F and let X∞ ≡ ∪C. Define an order, ≤, on X∞ as follows. If x, y are elements of X∞ , pick S ∈ C such that x, y are both in S. Then if ≤S is the order on S, let x ≤ y if and only if x ≤S y. This definition is well defined because of the definition of the order, ≺. Now let U be any nonempty subset of X∞ . Then S ∩ U ̸= ∅ for some S ∈ C. Because of the definition of ≤, if y ∈ S2 \ S1 , Si ∈ C, then x ≤ y for all x ∈ S1 . Thus, if y ∈ X∞ \ S then x ≤ y for all x ∈ S and so the smallest element of S ∩ U exists and is the smallest element in U . Therefore X∞ is well-ordered. Now suppose there exists z ∈ X \ X∞ . Define the following order, ≤1 , on X∞ ∪ {z}. x ≤1 y if and only if x ≤ y whenever x, y ∈ X∞ x ≤1 z whenever x ∈ X∞ . Then let

C̃ ≡ C ∪ {X∞ ∪ {z}}.

Then C̃ is a strictly larger chain than C, contradicting maximality of C. Thus X \ X∞ = ∅ and this shows X is well-ordered by ≤.  With these two lemmas the main result follows.


Theorem B.0.4 The following are equivalent:

1. The axiom of choice.
2. The Hausdorff maximal principle.
3. The well-ordering principle.

Proof: It only remains to prove that the well-ordering principle implies the axiom of choice. Let I be a nonempty set and let X_i be a nonempty set for each i ∈ I. Let X = ∪{X_i : i ∈ I} and well order X. Let f(i) be the smallest element of X_i. Then f ∈ ∏_{i∈I} X_i. 

B.1 The Hamel Basis

A Hamel basis is nothing more than the correct generalization of the notion of a basis for a finite dimensional vector space to vector spaces which are possibly not of finite dimension.

Definition B.1.1 Let X be a vector space. A Hamel basis is a subset Λ of X such that every vector of X can be written as a finite linear combination of vectors of Λ and the vectors of Λ are linearly independent in the sense that if {x_1, · · · , x_n} ⊆ Λ and

∑_{k=1}^n c_k x_k = 0,

then each c_k = 0.

The main result is the following theorem.

Theorem B.1.2 Let X be a nonzero vector space. Then it has a Hamel basis.

Proof: Let x_1 ∈ X, x_1 ≠ 0. Let F denote the collection of subsets Λ of X containing x_1 with the property that the vectors of Λ are linearly independent as described in Definition B.1.1, partially ordered by set inclusion. By the Hausdorff maximal theorem, there exists a maximal chain C. Let Λ = ∪C. Since C is a chain, it follows that if {x_1, · · · , x_n} ⊆ Λ, then there exists a single Λ′ ∈ C containing all these vectors. Therefore, if

∑_{k=1}^n c_k x_k = 0,

it follows each c_k = 0. Thus the vectors of Λ are linearly independent. Is every vector of X a finite linear combination of vectors of Λ? Suppose not. Then there exists z which is not equal to a finite linear combination of vectors of Λ. Consider Λ ∪ {z}. If

cz + ∑_{k=1}^m c_k x_k = 0,

where the xk are vectors of Λ, then if c ̸= 0 this contradicts the condition that z is not a finite linear combination of vectors of Λ. Therefore, c = 0 and now all the ck must equal zero because it was just shown Λ is linearly independent. It follows C∪ {Λ ∪ {z}} is a strictly larger chain than C and this is a contradiction. Therefore, Λ is a Hamel basis as claimed. 
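For example, and this observation is not needed later, Theorem B.1.2 shows that R, regarded as a vector space over the field Q, has a Hamel basis. Such a basis cannot be countable: a vector space with a countable Hamel basis over Q is a countable union of countable sets of finite rational linear combinations and is therefore countable, whereas R is uncountable.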

B.2 Exercises

1. Zorn’s lemma states that in a nonempty partially ordered set, if every chain has an upper bound, there exists a maximal element, x in the partially ordered set. x is maximal, means that if x ≺ y, it follows y = x. Show Zorn’s lemma is equivalent to the Hausdorff maximal theorem. 2. Show that if Y, Y1 are two Hamel bases of X, then there exists a one to one and onto map from Y to Y1 . Thus any two Hamel bases are of the same size.



Index (−∞, ∞], 498 C 1 , 444 C k , 446 C 1 , 444 C 1 and differentiability, 444 Cc (Ω), 541 Cc∞ , 580 Ccm , 580 Fσ , 519 Gδ , 519 L1 complex vector space, 531 L1loc , 581 Lp compactness, 586 completeness, 575 continuity of translation, 580 definition, 574 density of continuous functions, 578 density of simple functions, 577 norm, 574 separability, 578 Lp separable, 578 L1 (Ω), 530 L∞ , 588 Lp density of smooth functions, 582 ∩, 1 ∪, 1 ϵ net, 257 σ algebra, 497 A close to B eigenvalues, 330 Abel’s formula, 168, 172 Abelian group, 43 absolute convergence convergence, 377 absolute value complex number, 9 accumulation point, 250 adjoint, 323 of matrix, 275 adjugate, 159

algebraic number minimum polynomial, 59 algebraic numbers, 59 field, 61 alternating group, 233 3 cycles, 233 analytic function of matrix, 299 ann(m), 179 approximate identity, 581 arcwise connected, 263 area parallelogram, 627 area of a parallelogram, 626 arithmetic mean, 495 associated, 175 associates, 175 at most countable, 4 automorphism, 219 axiom of choice, 4 balls disjoint, almost covering a set, 558 Banach space, 457, 574, 591 barycenter, 478 basis, 264 basis of eigenvectors diagonalizable, 116 basis of vector space, 46 Bernstein polynomial approximation of derivative, 426 Besicovitch covering theorem, 553 Besicovitch covering theorem, 556, 558 Binet Cauchy volumes, 320 Binet Cauchy formula, 156 binomial theorem, 27 block diagonal matrix, 102 block matrix, 75 block multiplication, 74 Borel sets, 504 bounded continuous linear functions, 594 bounded linear transformations, 273 box product, 628 Brouwer fixed point theorem, 484 637

638

INDEX complact convex set, 484

Cantor function, 521 Cantor set, 520 Caratheodory’s procedure, 514 Cauchy interlacing theorem, 339, 341 Cauchy Schwarz inequality, 266, 591 Cauchy sequence, 252, 318 Cayley Hamilton theorem, 165, 170, 201, 367 chain, 631 chain rule, 439 change of variables, 559 better, 563 formula, 562 linear map, 555 map not one to one, 564 change of variables general case, 564 characteristic polynomial, 164 Cholesky factorization, 370 Clairaut’s theorem, 450 closed set, 250 closed sets limit points, 250 closure of a set, 254 cofactor, 157 column rank, 162 commutative ring, 205 commutative ring with unity, 23 commutator, 236, 339 commutator subgroup, 236 compact sequentially compact, 282 compact set, 256 compactness closed interval, 253 equivalent conditions, 257 companion matrix, 134, 144, 401 minimum polynomial of, 144 complete, 382 complex conjugate, 8 complex measure Radon Nikodym, 605 total variation, 604 complex numbers, 7 complex numbers arithmetic, 7 roots, 10 triangle inequality, 9 complex valued measurable functions, 529 components of a vector, 264 composition of linear transformations, 94 condition number, 374 conjugate of a product, 27

conjugate fields, 241 conjugate linear, 310 connected, 261 connected component, 262 connected components, 262 equivalence class, 262 equivalence relation, 262 open sets, 263 connected sets intersection, 262 intervals, 263 real line, 263 consistent, 37 continuous function, 255 continuous functions, 282 equivalent conditions, 255 contraction map, 258 fixed point, 460 fixed point theorem, 258 convergence in measure, 535 convex set, 592 convex functions, 587 convex combination, 98, 279 convex hull, 98, 279, 477 compactness, 99 convolution, 581 Coordinates, 45 coordinates, 477 countable, 4 counting zeros, 329 Courant Fischer theorem, 354 Cramer’s rule, 160 cross product, 625 area of parallelogram, 626 coordinate description, 626 distributive law, 629 geometric description, 625 cross product coordinate description, 626 geometric description, 625 parallelepiped, 628 cyclic basis, 127, 132 cyclic decomposition theorem, 192 cyclic set, 126, 129 independence, 130 De Moivre’s theorem, 10 definition of Lp , 574 definition of a C k function, 446 density of continuous functions in Lp , 578 derivative, 282 chain rule, 439

INDEX continuity, 445 continuity of Gateaux derivative, 445 continuous, 438 continuous Gateaux derivatives, 442 Frechet, 438 Gateaux, 440, 442 generalized partial, 448 higher order, 445, 446 matrix, 440 partial, 448 second, 445 well defined, 438 derivatives, 438 determinant definition, 153 estimate for Hermitian matrix, 371 expansion along row, column, 158 Hadamard inequality, 371 matrix inverse, 159 partial derivative, cofactor, 168 permutation of rows, 153 product, 155 product of eigenvalules, 333 row, column operations, 154 summary of properties, 164 symmetric definition, 154 transpose, 154 diagonal matrix, 115 diagonalizability, 115 diagonalizable, 115, 348 formal derivative, 120 minimal polynomial and its derivative, 120 differentiable, 438 continuous, 438 continuous partials, 449 differentiable function measurable sets, 558 sets of measure zero, 558 differential equations first order systems, 336 dimension of a vector space, 264 dimension of vector space, 46 direct sum, 101, 182 minimum polynomial splits, 194 notation, 101 torsion module, 181 directional derivative, 442 discrete Fourier transform, 368 distance, 249 to a subspace, 308 distance to a nonempty set, 251 distinct eigenvalues, 126 distinct roots polynomial and its derivative, 242

639 dominated convergence generalization, 534 dominated convergence theorem, 534 dot product, 265 dyadics, 87 Dynkin’s lemma, 503 echelon form, 35 Eggoroff theorem, 568 eigen-pair, 113 eigenvalue, 113, 494 existence, 113 eigenvalues, 164, 329 AB and BA, 166 eigenvector, 113 existence, 113 eigenvectors distinct eigenvalues, 126 independent, 126 elementary matrices, 78 elementary matrix inverse, 80 properties, 80 elementary operations, 33 elementary symmetric polynomials, 206 empty set, 2 equality of mixed partial derivatives, 452 equivalence class, 5, 54, 91 of polynomials, 54 equivalence relation, 5, 54, 91 Euclidean algorithm, 17 Euclidean domain, 173 existence of a fixed point, 384 factorization of a matrix general p.i.d., 186 factorization of matrix Euclidean domain, 184 Fatou’s lemma, 528 field axioms, 7 field extension dimension, 57 finite, 57 field extensions, 57 Field of scalars, 43 fields characteristic, 243 perfect, 244 fields perfect, 244 finite dimensional vector space, 46 finite measure regularity, 505 fixed field, 225

640

INDEX

fixed fields and subgroups, 230 flip, 474 formal derivative, 119 Fourier series, 317 Frechet derivative, 438 Fredholm alternative, 313 Frobenius inner product, 338 Frobenius norm, 275, 359 singular value decomposition, 359 Frobinius norm, 367 Fubini theorem, 548 function, 3 functions measurable, 497 fundamental theorem of algebra, 11, 13, 218 fundamental theorem of algebra plausibility argument, 13 rigorous proof, 14 fundamental theorem of arithmetic, 18 fundamental theorem of calculus general Radon measures, 585 Radon measures, 584 fundamental theorem of Galois theory, 231

Hahn Jordan decomposition, 620 Hamel basis, 633 Hardy’s inequality, 587 Hausdorff maximal principle, 631 Hermitian, 325 orthonormal basis eigenvectors, 353 positive definite, 356 Hermitian matrix factorization, 371 positive part, 366 positive part, Lipschitz continuous, 366 Hermitian operator, 310 largest, smallest, eigenvalues, 354 Hessian matrix, 469 higher order derivative multilinear form, 446 higher order derivatives, 445 implicit function theorem, 463 inverse function theorem, 463 Hilbert space, 303, 591 Holder’s inequality, 269, 571 homogeneous coordinates, 97 homomorphism, 219

g.c.d., 174 Galois group size, 224 Gamma function, 587 gamma function, 455 Gateaux derivative, 440, 442 continuous, 445 Gauss Elimination, 37 Gauss elimination, 34 Gauss Jordan method for inverses, 70 Gauss Seidel method, 385 generalized eigenvectors, 139 geometric mean, 495 Gerschgorin’s theorem, 328 Gram Schmidt process, 271, 305 Grammian matrix, 308, 314 invertible, 308 greatest common divisor, 17, 19 characterization, 17 description, 19 Gronwall’s inequality, 390 group definition, 224 group solvable, 236

ideal, 173 maximal, 202 principal, 173 implicit function theorem, 460 higher order derivatives, 463 inconsistent, 37 initial value problem uniqueness, 390 inner product, 265 inner product space, 591 adjoint operator, 309 inner regular, 504 compact sets, 504 inner regularity, 517 integers modulo a prime, 23 integral continuous function, 430 decreasing function, 523 functions in L1 , 530 linear, 530 operator valued function, 389 vector valued function, 389 integral domain, 23, 173 prime elements, 174, 176 integral over a measurable set, 535 integrals iterated, 433 interchange order of integration, 456 interior point, 249

Hadamard inequality, 371 Hahn decomposition, 620

INDEX intermediate value theorem, 263 intersection, 1 intervals notation, 1 invariance of domain, 486 invariant subspaces direct sum, block diagonal matrix, 102 inverse function theorem, 462, 493 higher order derivatives, 463 inverse image, 2 inverses and determinants, 159 invertible, 69 invertible maps, 458 different spaces, 458 irreducible, 19, 175 relatively prime, 20 isomorphism, 219 extensions, 221 iterated integrals, 433 iterative methods alternate proof of convergence, 387 diagonally dominant, 387 proof of convergence, 384 Jacobi method, 385 Jensens inequality, 587 Jordan block, 136 Jordan canonical form, 124 definition, 138 existence and uniqueness, 139 powers of a matrix, 140 Jordan form convergence of powers of blocks, 141 matrix description, 138 ker, 53 kernel of a product direct sum decomposition, 107 Kirchoff’s law, 41 Lagrange multipliers, 466, 467 Laplace expansion, 157 leading entry, 35 least squares, 312 least squares regression, 454 Lebesgue decomposition, 595 Lebesgue integral desires to be linear, 528 nonnegative function, 524 other definitions, 527 simple function, 525 Lebesgue measurable but not Borel, 521 Lebesgue measure one dimensional, 518 translation invariance, 518

641 translation invariant, 549 Lebesgue Stieltjes measure, 517 lim inf, 24 properties, 26 lim sup, 24 properties, 26 lim sup and lim inf limit, 533 limiit point, 250 limit continuity, 277 infinite limits, 276 limit of a function, 275 limit of a sequence, 250 well defined, 250 limit point, 275 limits combinations of functions, 276 existence of limits, 25 limits and continuity, 277 Lindeloff property, 256 Lindemann Weierstrass theorem, 215 Lindemannn Weierstrass theorem, 209 linear combination, 45, 155 linear independence, 265 linear maps continuous, 594 equivalent conditions, 594 linear transformation, 53, 85 defined on a basis, 86 dimension of vector space, 86 kernel, 106 matrix, 85 rank m, 471 linear transformations a vector space, 85 commuting, 108 composition, matrices, 94 sum, 85 linearly independent, 45 linearly independent set enlarging to a basis, 265 Lipschitz continuous, 258 measurable sets, 554 sets of measure zero, 554 little o notation, 438 local maximum, 469 local minimum, 469 locally one to one, 495 Lusin, 569 Lusin’s theorem, 586 map C 1 , 474

642 primitive and flips, 474 Markov matrix, 283 limit, 286 regular, 286 steady state, 283, 286 math induction, 6 mathematical induction, 6 matrices commuting, 350 notation, 65 transpose, 68 matrix, 65 differentiation operator, 89 inverse, 69 left inverse, 159 linear transformation, 88 lower triangular, 160 main diagonal, 115 Markov, 283 polynomial, 169 right inverse, 159 right, left inverse, 159 row, column, determinant rank, 162 stochastic, 283 upper triangular, 160 matrix positive definite, 369 matrix exponential, 388 matrix multiplication, 66 properties, 67 maximal chain, 631 maximal function Radon measures, 583 maximal ideal, 57, 202 maximum likelihood estimates covariance, 346 mean, 346 mean value inequality, 441, 460 mean value theorem, 441, 460 Cauchy, 430 measurability limit of simple functions, 500 measurable, 513 complex valued, 529 equivalent formulations, 498 linear combinations, 529 measurable complex functions simple functions, 532 measurable composed with continuous, 521 measurable function pointwise a.e. limit of continuous, 546 a.e. Borel, 547 measurable functions, 497 approximation, 500

INDEX pointwise limit, 498 simple functions, 498 measurable into (−∞, ∞], 498 measurable sets, 513 measure, 501 σ finite, 506 Borel, 507 inner regular, 504 metric space, 507 outer regular, 504 properties, 501 measure of balls, 549 measure space completion, 517 regular, 578 measures Lebesgue decomposition, 595 absolutely continuous, 595 decreasing sequences of sets, 501 increasing sequences of sets, 501 regularity, 504 measures from outer measures, 514 metric, 249 properties, 249 metric space, 249 compact sets, 257 complete, 252 completely separable, 255 open set, 249 separable, 255 metric tensor, 314 migration matrix, 286 minimal polynomial finding it, 142 minimum polynomial, 59, 108 algebraic number, 59 direct sum, 193 finding it, 110 Minkowski inequality, 573 integrals, 576 Minkowski’s inequality, 576 minor, 157 mixed partial derivatives, 450 module finitely generated, 179 mollifier, 581 convolution, 581 monomorphism, 219 monotone convergence theorem, 526 Moore Penrose inverse, 362 least squares, 362 uniqueness, 369 morphism, 183 multi-index, 446

multi-index notation, 446
multivariate normal distribution, 346
Muntz theorems, 435
negative definite, 356
negative part, 529
Neumann series, 392, 457, 458
Newton's method, 452
nilpotent, 135
  block diagonal matrix, 137
  Jordan form, uniqueness, 137
  Jordan normal form, 137
Noetherian module, 183
Noetherian rings, 176
non equal mixed partials
  example, 451
non solvable group, 238
nondefective, 115
norm
  p norm, 269
  strictly convex, 386
  uniformly convex, 386
normal closure, 241
normal extension, 227
normal matrix, 326
normal subgroup, 228, 236
null and rank, 319
open ball, 249
open cover, 256
open set, 249
open sets
  countable basis, 255
operator norm, 273
order, 15
ordered
  partial, 631
  totally ordered, 631
orthonormal, 270
orthonormal basis
  existence, 305
orthonormal polynomials, 316
outer measure, 508, 513
  determined by measure, 509
  measurable, 513
outer measure on R, 511
outer regular, 504
outer regularity, 517
p.i.d., 173
  relatively prime, 174
parallelepiped, 628
  volume, 314, 628
partial derivatives, 440, 448
  continuous, 449
partial order, 631
partially ordered set, 631
partition of unity, 542
  infinitely differentiable, 582
partitioned matrix, 75
Penrose conditions, 363
permutation, 152
permutation matrices, 78, 232
permutations
  cycle, 233
Perron's theorem, 291
piecewise continuous, 433
points of density, 589
pointwise convergence, 261
polar decomposition, 606
polar form
  complex number, 10
Polish space, 256, 504
polynomial, 18
  addition, 18
  degree, 18
  divides, 19
  division, 18
  equality, 18
  greatest common divisor, 19
  greatest common divisor, uniqueness, 19
  irreducible, 19
  irreducible factorization, 20
  multiplication, 18
  relatively prime, 19
polynomial
  leading term, 18
  matrix coefficients, 169
  monic, 18
polynomials
  canceling, 20
  coefficients in a field, 54
  factoring, 11
  factorization, 21, 108
  invertibles, 174
  principal ideal domain, 173
polynomials in finitely many algebraic numbers, 60
positive definite, 356
  positive eigenvalues, 356
  principal minors, 357
positive definite matrix, 369
positive part, 529
power method, 395
powers of a matrix
  existence of a limit, 283
  Jordan form, 283
  stochastic matrix, 283
prime number, 17
primitive, 474
principal ideal domain, 173
  greatest common divisor, 174
principal submatrix, 341
principal minors, 356
projection map
  convex set, 318
QR algorithm, 332, 405
  convergence, 407
  convergence theorem, 407
  non convergence, 332, 411
QR factorization, 311
quadratic form, 327
quadratic formula, 12
quotient
  prime ideal, 182
quotient group, 228
quotient module, 182
quotient ring
  field, 182
quotient space, 54, 63
quotient vector space, 63
radical, 196
Radon measure, 547, 578
Radon Nikodym derivative, 598
Radon Nikodym Theorem
  σ finite measures, 598
  finite measures, 595
random variable
  distribution measure, 506
rank
  number of pivot columns, 74
rank of a matrix, 74, 162
rank one transformations, 87
rank theorem, 472
rational canonical form, 194
  uniqueness, 146
Rayleigh quotient, 401
  how close?, 402
real and imaginary parts, 529
regression line, 312
regular, 504
regular measure space, 578
regular Sturm Liouville problem, 317
relatively prime, 17, 174
residue classes, 23
Riesz map, 595
Riesz representation theorem, 309
  Hilbert space, 594
  metric space, 542
Riesz representation theorem C(X), 616
Riesz representation theorem L^p
  finite measures, 607
  σ finite case, 612
Riesz representation theorem for L^1
  finite measures, 610
right handed system, 625
right polar factorization, 342, 343
row operations, 78
row rank, 162
row reduced echelon form, 35, 73
  unique, 74
Sard's lemma, 562
Sard's theorem, 562
scalars, 65
Schroder Bernstein theorem, 3
Schur's theorem, 323
second derivative, 445
second derivative test, 470
sections of open sets, 448
self adjoint, 310
self adjoint nonnegative
  roots, 344, 351
separable polynomial, 229
separable metric space
  Lindelöf property, 256
separated sets, 261
sequence, 250
  Cauchy, 252
  subsequence, 251
sequential compactness, 282
sequentially compact, 282
sequentially compact set, 256
set notation, 1
sgn, 151
  uniqueness, 152
shifted inverse power method, 397
  complex eigenvalues, 400
sigma algebra, 497
sign of a permutation, 152
similar matrix and its transpose, 143
similar matrices, 91, 168
similarity
  characteristic polynomial, 168
  determinant, 168
  trace, 168
similarity transformation, 91
simple field extension, 61
simple functions
  approximation, 498
simple groups, 235
simultaneously diagonalizable, 349
  commuting family, 351
singular value decomposition, 358
singular values, 357
skew symmetric, 68
solvable by radicals, 239
solvable group, 236
span, 45, 155
spectral mapping theorem, 300
spectral norm, 365
spectral radius, 374
Sperner's lemma, 481
splitting field, 57, 58
  dimension, 58
splitting fields
  isomorphic, 223
  normal extension, 227
spt, 541
stochastic matrix, 283
submodule, 179
  cyclic, 179
subsequence, 251
subspace, 48
  complementary, 171
  vector space, 48
subspaces
  direct sum, 101
  direct sum, basis, 101
substituting matrix into polynomial identity, 169
Sylvester law of inertia, 337
  dimension of kernel of product, 106
Sylvester's equation, 319
symmetric, 68
symmetric polynomial theorem, 206
symmetric polynomials, 206
Taylor formula, 467
Taylor's formula, 468
Taylor's theorem, 468
the space AU, 320
Tietze extension theorem, 428
torsion module, 179
total variation, 602
totally bounded, 257
totally ordered, 631
trace, 99, 143
  eigenvalues, 99, 143
  product, 99, 143
  similar matrices, 99, 143
  sum of eigenvalues, 333
translation invariant, 518
transpose, 68
  properties, 68
transposition, 233
triangle inequality, 268, 573
  complex numbers, 9
triangulated, 477
triangulation, 477
trichotomy, 15
uniform convergence, 261
uniform convergence and continuity, 261
uniformly integrable, 536, 588
union, 1
uniqueness of limits, 275
unitary, 311, 323
unitary matrix
  representation, 392
upper Hessenberg matrix, 415
Vandermonde determinant, 169
variation of constants formula, 337
variational inequality, 318
vector measures, 602
vector space, 43
  axioms, 66
  dimension, 264
vector space axioms, 43
vector valued function
  limit theorems, 276
vectors, 45, 66
Vitali cover, 556
volume
  parallelepiped, 314
Weierstrass approximation estimate, 423
well ordered, 6
well ordered sets, 632
well ordering, 5
Wilson's theorem, 29
Wronskian, 168, 336
Wronskian alternative, 336
Young's inequality, 571, 619