Numerical Linear Algebra and Matrix Factorizations (ISBN 978-3-030-36467-0, eBook ISBN 978-3-030-36468-7)


English, 376 pages, 2020


Table of contents:
Foreword......Page 6
Preface......Page 7
Acknowledgments......Page 9
Contents......Page 10
List of Figures......Page 17
List of Tables......Page 19
Listings......Page 20
1.1 Notation......Page 21
1.2 Vector Spaces and Subspaces......Page 25
1.2.1 Linear Independence and Bases......Page 26
1.2.2 Subspaces......Page 28
1.2.3 The Vector Spaces Rn and Cn......Page 30
1.3 Linear Systems......Page 31
1.3.1 Basic Properties......Page 32
1.3.2 The Inverse Matrix......Page 33
1.4 Determinants......Page 35
1.5 Eigenvalues, Eigenvectors and Eigenpairs......Page 38
1.6.1 Exercises Sect. 1.1......Page 40
1.6.2 Exercises Sect. 1.3......Page 41
1.6.3 Exercises Sect. 1.4......Page 42
Part I LU and QR Factorizations......Page 45
2.1 Cubic Spline Interpolation......Page 46
2.1.2 Piecewise Linear and Cubic Spline Interpolation......Page 47
2.1.3 Give Me a Moment......Page 50
2.1.4 LU Factorization of a Tridiagonal System......Page 53
2.2 A Two Point Boundary Value Problem......Page 56
2.2.1 Diagonal Dominance......Page 57
2.3.1 The Buckling of a Beam......Page 59
2.4 The Eigenpairs of the 1D Test Matrix......Page 60
2.5.1 Block Multiplication......Page 62
2.5.2 Triangular Matrices......Page 65
2.6.1 Exercises Sect. 2.1......Page 67
2.6.2 Exercises Sect. 2.2......Page 71
2.6.4 Exercises Sect. 2.4......Page 72
2.6.5 Exercises Sect. 2.5......Page 73
2.7 Review Questions......Page 74
3.1 3 by 3 Example......Page 75
3.2 Gauss and LU......Page 77
3.3.1 Algorithms for Triangular Systems......Page 80
3.3.2 Counting Operations......Page 82
3.4.2 Permutation Matrices......Page 84
3.4.3 Pivot Strategies......Page 87
3.5 The LU and LDU Factorizations......Page 88
3.5.1 Existence and Uniqueness......Page 89
3.6 Block LU Factorization......Page 92
3.7.1 Exercises Sect. 3.3......Page 93
3.7.2 Exercises Sect. 3.4......Page 94
3.7.3 Exercises Sect. 3.5......Page 96
3.8 Review Questions......Page 99
4.1 The LDL* Factorization......Page 100
4.2 Positive Definite and Semidefinite Matrices......Page 102
4.2.1 The Cholesky Factorization......Page 104
4.2.2 Positive Definite and Positive Semidefinite Criteria......Page 106
4.3 Semi-Cholesky Factorization of a Banded Matrix......Page 108
4.4 The Non-symmetric Real Case......Page 112
4.5.1 Exercises Sect. 4.2......Page 113
4.6 Review Questions......Page 114
5.1 Inner Products, Orthogonality and Unitary Matrices......Page 116
5.1.1 Real and Complex Inner Products......Page 117
5.1.2 Orthogonality......Page 119
5.1.3 Sum of Subspaces and Orthogonal Projections......Page 121
5.1.4 Unitary and Orthogonal Matrices......Page 123
5.2 The Householder Transformation......Page 124
5.3.1 The Algorithm......Page 128
5.3.3 Solving Linear Systems Using Unitary Transformations......Page 130
5.4.1 Existence......Page 131
5.5 QR and Gram-Schmidt......Page 133
5.6 Givens Rotations......Page 134
5.7.2 Exercises Sect. 5.2......Page 136
5.7.3 Exercises Sect. 5.4......Page 137
5.7.5 Exercises Sect. 5.6......Page 140
5.8 Review Questions......Page 142
Part II Eigenpairs and Singular Values......Page 144
6.1 Defective and Nondefective Matrices......Page 145
6.1.1 Similarity Transformations......Page 147
6.1.2 Algebraic and Geometric Multiplicity of Eigenvalues......Page 148
6.2 The Jordan Factorization......Page 149
6.3.2 Unitary and Orthogonal Matrices......Page 151
6.3.3 Normal Matrices......Page 153
6.3.5 The Quasi-Triangular Form......Page 155
6.3.6 Hermitian Matrices......Page 156
6.4 Minmax Theorems......Page 157
6.5 Left Eigenvectors......Page 159
6.5.1 Biorthogonality......Page 160
6.6.1 Exercises Sect. 6.1......Page 161
6.6.2 Exercises Sect. 6.2......Page 163
6.6.3 Exercises Sect. 6.3......Page 165
6.7 Review Questions......Page 166
7 The Singular Value Decomposition......Page 168
7.1.1 The Matrices A*A, AA*......Page 169
7.2.1 The Singular Value Factorization......Page 171
7.3 A Geometric Interpretation......Page 174
7.4.1 The Frobenius Norm......Page 176
7.4.2 Low Rank Approximation......Page 177
7.5.1 Exercises Sect. 7.1......Page 178
7.5.2 Exercises Sect. 7.2......Page 179
7.5.3 Exercises Sect. 7.4......Page 182
7.6 Review Questions......Page 183
Part III Matrix Norms and Least Squares......Page 184
8.1 Vector Norms......Page 185
8.2.1 Consistent and Subordinate Matrix Norms......Page 188
8.2.2 Operator Norms......Page 189
8.2.3 The Operator p-Norms......Page 191
8.2.4 Unitary Invariant Matrix Norms......Page 193
8.3 The Condition Number with Respect to Inversion......Page 194
8.3.1 Perturbation of the Right Hand Side in a Linear Systems......Page 195
8.3.2 Perturbation of a Square Matrix......Page 197
8.4 Proof That the p-Norms Are Norms......Page 199
8.4.1 p-Norms and Inner Product Norms......Page 202
8.5.1 Exercises Sect. 8.1......Page 204
8.5.2 Exercises Sect. 8.2......Page 205
8.5.3 Exercises Sect. 8.3......Page 208
8.5.4 Exercises Sect. 8.4......Page 211
8.6 Review Questions......Page 212
9 Least Squares......Page 213
9.1 Examples......Page 214
9.1.1 Curve Fitting......Page 216
9.2 Geometric Least Squares Theory......Page 218
9.3.1 Normal Equations......Page 219
9.3.2 QR Factorization......Page 220
9.3.3 Singular Value Decomposition, Generalized Inverses and Least Squares......Page 221
9.4 Perturbation Theory for Least Squares......Page 224
9.4.1 Perturbing the Right Hand Side......Page 225
9.4.2 Perturbing the Matrix......Page 226
9.5.1 The Minmax Theorem for Singular Values and the Hoffman-Wielandt Theorem......Page 227
9.6.1 Exercises Sect. 9.1......Page 230
9.6.2 Exercises Sect. 9.2......Page 231
9.6.3 Exercises Sect. 9.3......Page 232
9.6.5 Exercises Sect. 9.5......Page 235
9.7 Review Questions......Page 236
Part IV Kronecker Products and Fourier Transforms......Page 237
10.1 The 2D Poisson Problem......Page 238
10.1.1 The Test Matrices......Page 241
10.2 The Kronecker Product......Page 242
10.3 Properties of the 2D Test Matrices......Page 245
10.4.2 Exercises Sect. 10.3......Page 247
10.5 Review Questions......Page 249
11.1 Algorithms for a Banded Positive Definite System......Page 250
11.1.2 Block LU Factorization of a Block Tridiagonal Matrix......Page 251
11.2 A Fast Poisson Solver Based on Diagonalization......Page 252
11.3.2 The Discrete Fourier Transform (DFT)......Page 255
11.3.3 The Fast Fourier Transform (FFT)......Page 257
11.4.1 Exercises Sect. 11.3......Page 260
11.5 Review Questions......Page 263
Part V Iterative Methods for Large Linear Systems......Page 264
12.1 Classical Iterative Methods; Component Form......Page 265
12.1.1 The Discrete Poisson System......Page 267
12.2 Classical Iterative Methods; Matrix Form......Page 269
12.2.2 The Splitting Matrices for the Classical Methods......Page 270
12.3 Convergence......Page 272
12.3.1 Richardson's Method......Page 273
12.3.2 Convergence of SOR......Page 275
12.3.3 Convergence of the Classical Methods for the Discrete Poisson Matrix......Page 276
12.3.4 Number of Iterations......Page 278
12.3.5 Stopping the Iteration......Page 279
12.4.1 The Spectral Radius......Page 280
12.4.2 Neumann Series......Page 282
12.5 The Optimal SOR Parameter ω......Page 283
12.6.1 Exercises Sect. 12.3......Page 286
12.6.2 Exercises Sect. 12.4......Page 288
12.7 Review Questions......Page 289
13 The Conjugate Gradient Method......Page 290
13.1 Quadratic Minimization and Steepest Descent......Page 291
13.2.1 Derivation of the Method......Page 294
13.2.2 The Conjugate Gradient Algorithm......Page 296
13.2.4 Implementation Issues......Page 297
13.3.1 The Main Theorem......Page 299
13.3.3 Krylov Spaces and the Best Approximation Property......Page 300
13.4.1 Chebyshev Polynomials......Page 304
13.4.2 Convergence Proof for Steepest Descent......Page 307
13.4.3 Monotonicity of the Error......Page 309
13.5 Preconditioning......Page 310
13.6.1 A Variable Coefficient Problem......Page 313
13.6.2 Applying Preconditioning......Page 316
13.7.1 Exercises Sect. 13.1......Page 317
13.7.2 Exercises Sect. 13.2......Page 318
13.7.3 Exercises Sect. 13.3......Page 320
13.7.4 Exercises Sect. 13.4......Page 323
13.8 Review Questions......Page 324
Part VI Eigenvalues and Eigenvectors......Page 325
14.1 Eigenpairs......Page 326
14.2 Gershgorin's Theorem......Page 327
14.3 Perturbation of Eigenvalues......Page 329
14.3.1 Nondefective Matrices......Page 331
14.4 Unitary Similarity Transformation of a Matrix into Upper Hessenberg Form......Page 333
14.5 Computing a Selected Eigenvalue of a Symmetric Matrix......Page 335
14.5.1 The Inertia Theorem......Page 337
14.5.2 Approximating λm......Page 338
14.6.1 Exercises Sect. 14.1......Page 339
14.6.3 Exercises Sect. 14.3......Page 340
14.6.5 Exercises Sect. 14.5......Page 341
14.7 Review Questions......Page 343
15.1.1 The Power Method......Page 344
15.1.2 The Inverse Power Method......Page 348
15.1.3 Rayleigh Quotient Iteration......Page 349
15.2 The Basic QR Algorithm......Page 351
15.2.1 Relation to the Power Method......Page 352
15.2.2 Invariance of the Hessenberg Form......Page 353
15.3 The Shifted QR Algorithms......Page 354
15.4.1 Exercises Sect. 15.1......Page 355
15.5 Review Questions......Page 356
Part VII Appendix......Page 357
16 Differentiation of Vector Functions......Page 358
References......Page 361
Index......Page 362

Citation preview

Tom Lyche

Numerical Linear Algebra and Matrix Factorizations

Texts in Computational Science and Engineering, Volume 22
Series Editors: Timothy J. Barth, Michael Griebel, David E. Keyes, Risto M. Nieminen, Dirk Roose, Tamar Schlick
More information about this series at http://www.springer.com/series/5151

Tom Lyche
Blindern, University of Oslo
Oslo, Norway

ISSN 1611-0994, ISSN 2197-179X (electronic)
Texts in Computational Science and Engineering
ISBN 978-3-030-36467-0, ISBN 978-3-030-36468-7 (eBook)
https://doi.org/10.1007/978-3-030-36468-7
Mathematics Subject Classification (2010): 15-XX, 65-XX

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Foreword

It is a pleasure to write this foreword to the book “Numerical Linear Algebra and Matrix Factorizations” by Tom Lyche. I see this book project from three perspectives, corresponding to my three different roles: first, as a friend and close colleague of Tom for a number of years; secondly, as the present department head; and, finally, as a researcher within the international linear algebra and matrix theory community.

The book actually has a long history and started out as lecture notes that Tom wrote for a course in numerical linear algebra. For almost forty years this course has been an important and popular course for our students in mathematics, both in theoretical and more applied directions, as well as students in statistics, physics, mechanics and computer science. These notes have been revised multiple times over the years, and new topics have been added. I have had the pleasure of lecturing the course myself, using Tom’s lecture notes, and I believe that both the selection of topics and the combined approach of theory and algorithms are very appealing. This is also what our students point out when they have taken this course.

As we know, the area presented in this book plays a highly central role in many applications of mathematics and in scientific computing in general. Sometimes, in the international linear algebra and matrix theory community, one divides the area into numerical linear algebra, applied linear algebra and core (theoretical) linear algebra. This may serve some purpose, but often it is fruitful to have a more unified view on this, in order to see the interplay between theory, applications and algorithms. I think this view dominates this book, and that this makes the book interesting to a wide range of readers.

Finally, I would like to thank Tom for his work with this book and the mentioned course, and for being a good colleague from whom I have learned a lot. I know that his international research community in spline theory also shares this view. Most importantly, I hope that you, the reader, will enjoy the book!

Oslo, Norway
June 2019

Geir Dahl


Preface

This book, which has grown out of a one-semester course at the University of Oslo, targets upper undergraduate and beginning graduate students in mathematics, statistics, computational physics and engineering who need a mathematical background in numerical linear algebra and related matrix factorizations. Mastering the material in this book should enable a student to analyze computational problems and develop his or her own algorithms for solving problems of the following kinds:

• Systems of linear equations. Given a (square) matrix A and a vector b, find a vector x such that Ax = b.
• Least squares. Given a (rectangular) matrix A and a vector b, find a vector x such that the sum of squares of the components of b − Ax is as small as possible.
• Eigenvalues and eigenvectors. Given a (square) matrix A, find a number λ and/or a nonzero vector x such that Ax = λx.

Such problems can be large and difficult to handle, so much can be gained by understanding and taking advantage of special structures. For this we need a good understanding of basic numerical linear algebra and matrix factorizations. Factoring a matrix into a product of simpler matrices is a crucial tool in numerical linear algebra, for it allows one to tackle large problems through solving a sequence of easier ones.
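As a small illustration (a sketch of my own, not a listing from the book), the three problem types can be explored directly in MATLAB, the language used for the algorithms in this book; the matrices and vectors below are made-up examples:

    % Illustrative sketch only (not from the book): the three basic problem
    % types on small made-up data, using standard MATLAB tools.
    A = [4 1 0; 1 4 1; 0 1 4];   % a small square (tridiagonal) matrix
    b = [1; 2; 3];

    % Linear system Ax = b, solved via an LU factorization P*A = L*U,
    % so that only two triangular systems remain to be solved.
    [L, U, P] = lu(A);
    x = U \ (L \ (P*b));

    % Least squares: minimize the sum of squares of the components of
    % c - B*y for a rectangular B (here: fitting a line to three points).
    B = [1 0; 1 1; 1 2];
    c = [1; 2; 2];
    y = B \ c;                   % backslash returns the least squares solution

    % Eigenvalues and eigenvectors of a square matrix.
    [V, D] = eig(A);             % columns of V are eigenvectors, D is diagonal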

The main characteristics of this book are as follows:

1. It is self-contained, only assuming first year calculus, an introductory course in linear algebra, and some experience in solving mathematical problems on a computer. A special feature of this book is the detailed proofs of practically all results. Parts of the book can be studied independently, making it suitable for self-study.
2. There are numerous exercises, which can be found at the end of each chapter. In a separate book we offer solutions to all problems. Solutions of many exam problems given for this course at the University of Oslo are included in this separate volume.

3. The book consists of an introductory first chapter and 15 more chapters, and it falls naturally into six thematically related parts. The chapters are designed to be suitable for a one-semester course at a pace of roughly one chapter per week. Toward the goal of being self-contained, the first chapter contains a review of linear algebra, and is provided to the reader for convenient occasional reference.
4. Many of the chapters contain material beyond what might normally be covered in one week of lectures. A typical 15-week semester's curriculum could consist of the following curated material:
   • LU and QR factorizations: 2.4, 2.5, 3.2, 3.3, 3.5, 4.1, 4.2, 5.1–5.4, 5.6
   • SVD, norms and LSQ: 6.1, 6.3, 7.1–7.4, 8.1–8.3, 9.1–9.3, 9.4.1
   • Kronecker products: 10.1, 10.2, 10.3, 11.1, 11.2, 11.3
   • Iterative methods: 12.1–12.4, 13.1–13.3, 13.5
   • Eigenpairs: 14.1–14.5, 15.1–15.3
   Chapters 2–4 give a rather complete treatment of various LU factorizations. Chapters 5–9 cover QR and singular value factorizations, matrix norms, least squares methods and perturbation theory for linear equations and least squares problems. Chapter 10 gives an introduction to Kronecker products. We illustrate their use by giving simple proofs of properties of the matrix arising from a discretization of the 2-dimensional Poisson equation. Also, we study fast methods based on eigenvector expansions and the Fast Fourier Transform in Chap. 11. Some background from Chaps. 2, 3 and 4 may be needed for Chaps. 10 and 11. Iterative methods are studied in Chaps. 12 and 13. This includes the classical methods of Jacobi, Gauss-Seidel, Richardson and Successive Over Relaxation (SOR), as well as a derivation and convergence analysis of the methods of steepest descent and conjugate gradients. The preconditioned conjugate gradient method is introduced and applied to the Poisson problem with variable coefficients. In Chap. 14 we consider perturbation theory for eigenvalues, the power method and its variants, and use the Inertia Theorem to find a single eigenvalue of a symmetric matrix. Chapter 15 gives a brief informal introduction to one of the most celebrated algorithms of the twentieth century, the QR method for finding all eigenvalues and eigenvectors of a matrix.
5. In this book we give many detailed numerical algorithms for solving linear algebra problems. We have written these algorithms as functions in MATLAB. A list of these functions and the page number where they can be found is included after the table of contents. Moreover, their listings can be found online at http://folk.uio.no/tom/numlinalg/code. Complexity is discussed briefly in Sect. 3.3.2. As for programming issues, we often vectorize the algorithms, leading to shorter and more efficient programs. Stability is important both for the mathematical problems and for the numerical algorithms. Stability can be studied in terms of perturbation theory, which leads to condition numbers, see Chaps. 8, 9 and 14. We will often use phrases like “the algorithm is numerically stable” or “the algorithm is not numerically stable” without saying precisely what we mean by this. Loosely speaking, an algorithm is numerically stable if the solution, computed in floating point arithmetic, is the exact solution of a slightly perturbed problem. To determine upper bounds for these perturbations is the topic of backward error analysis (a small illustration is given at the end of this preface). We refer to [7] and [17, 18] for an in-depth treatment.

A list of freely available software tools for solving linear algebra problems can be found at www.netlib.org/utk/people/JackDongarra/la-sw.html

To supplement this volume the reader might consult Björck [2], Meyer [15] and Stewart [17, 18]. For matrix analysis the two volumes by Horn and Johnson [9, 10] contain considerable additional material.
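To make the interplay between backward error and condition number concrete, here is a small numerical experiment (my own sketch, not an example from the book): for an ill-conditioned matrix the computed solution of Ax = b can have a tiny residual, i.e. a small backward error, while the error in the solution itself is magnified by roughly the condition number.

    % Illustrative sketch only: a small residual does not imply a small error
    % when the condition number is large. Uses MATLAB's built-in hilb and cond.
    n = 12;
    A = hilb(n);                  % Hilbert matrix, severely ill-conditioned
    x = ones(n, 1);               % chosen exact solution
    b = A * x;
    xhat = A \ b;                 % solution computed in floating point
    res = norm(b - A*xhat);       % residual: typically of order eps
    err = norm(x - xhat);         % error: can be of order cond(A)*eps
    fprintf('residual = %.1e, error = %.1e, cond(A) = %.1e\n', res, err, cond(A));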

Acknowledgments

I would like to thank my colleagues Elaine Cohen, Geir Dahl, Michael Floater, Knut Mørken, Richard Riesenfeld, Nils Henrik Risebro, Øyvind Ryan and Ragnar Winther for all the inspiring discussions we have had over the years. Earlier versions of this book were converted to LaTeX by Are Magnus Bruaset and Njål Foldnes, with help for the final version from Øyvind Ryan. I thank Christian Schulz, Georg Muntingh and Øyvind Ryan, who helped me with the exercise sessions; we have, in a separate volume, provided solutions to practically all problems in this book. I also thank an anonymous referee for useful suggestions. Finally, I would like to give a special thanks to Larry Schumaker for his enduring friendship and encouragement over the years.

Oslo, Norway
June 2019

Tom Lyche

Contents

1 A Short Review of Linear Algebra
  1.1 Notation
  1.2 Vector Spaces and Subspaces
    1.2.1 Linear Independence and Bases
    1.2.2 Subspaces
    1.2.3 The Vector Spaces R^n and C^n
  1.3 Linear Systems
    1.3.1 Basic Properties
    1.3.2 The Inverse Matrix
  1.4 Determinants
  1.5 Eigenvalues, Eigenvectors and Eigenpairs
  1.6 Exercises Chap. 1
    1.6.1 Exercises Sect. 1.1
    1.6.2 Exercises Sect. 1.3
    1.6.3 Exercises Sect. 1.4

Part I: LU and QR Factorizations

2 Diagonally Dominant Tridiagonal Matrices; Three Examples
  2.1 Cubic Spline Interpolation
    2.1.1 Polynomial Interpolation
    2.1.2 Piecewise Linear and Cubic Spline Interpolation
    2.1.3 Give Me a Moment
    2.1.4 LU Factorization of a Tridiagonal System
  2.2 A Two Point Boundary Value Problem
    2.2.1 Diagonal Dominance
  2.3 An Eigenvalue Problem
    2.3.1 The Buckling of a Beam
  2.4 The Eigenpairs of the 1D Test Matrix
  2.5 Block Multiplication and Triangular Matrices
    2.5.1 Block Multiplication
    2.5.2 Triangular Matrices
  2.6 Exercises Chap. 2
    2.6.1 Exercises Sect. 2.1
    2.6.2 Exercises Sect. 2.2
    2.6.3 Exercises Sect. 2.3
    2.6.4 Exercises Sect. 2.4
    2.6.5 Exercises Sect. 2.5
  2.7 Review Questions

3 Gaussian Elimination and LU Factorizations
  3.1 3 by 3 Example
  3.2 Gauss and LU
  3.3 Banded Triangular Systems
    3.3.1 Algorithms for Triangular Systems
    3.3.2 Counting Operations
  3.4 The PLU Factorization
    3.4.1 Pivoting
    3.4.2 Permutation Matrices
    3.4.3 Pivot Strategies
  3.5 The LU and LDU Factorizations
    3.5.1 Existence and Uniqueness
  3.6 Block LU Factorization
  3.7 Exercises Chap. 3
    3.7.1 Exercises Sect. 3.3
    3.7.2 Exercises Sect. 3.4
    3.7.3 Exercises Sect. 3.5
    3.7.4 Exercises Sect. 3.6
  3.8 Review Questions

4 LDL* Factorization and Positive Definite Matrices
  4.1 The LDL* Factorization
  4.2 Positive Definite and Semidefinite Matrices
    4.2.1 The Cholesky Factorization
    4.2.2 Positive Definite and Positive Semidefinite Criteria
  4.3 Semi-Cholesky Factorization of a Banded Matrix
  4.4 The Non-symmetric Real Case
  4.5 Exercises Chap. 4
    4.5.1 Exercises Sect. 4.2
  4.6 Review Questions

5 Orthonormal and Unitary Transformations
  5.1 Inner Products, Orthogonality and Unitary Matrices
    5.1.1 Real and Complex Inner Products
    5.1.2 Orthogonality
    5.1.3 Sum of Subspaces and Orthogonal Projections
    5.1.4 Unitary and Orthogonal Matrices
  5.2 The Householder Transformation
  5.3 Householder Triangulation
    5.3.1 The Algorithm
    5.3.2 The Number of Arithmetic Operations
    5.3.3 Solving Linear Systems Using Unitary Transformations
  5.4 The QR Decomposition and QR Factorization
    5.4.1 Existence
  5.5 QR and Gram-Schmidt
  5.6 Givens Rotations
  5.7 Exercises Chap. 5
    5.7.1 Exercises Sect. 5.1
    5.7.2 Exercises Sect. 5.2
    5.7.3 Exercises Sect. 5.4
    5.7.4 Exercises Sect. 5.5
    5.7.5 Exercises Sect. 5.6
  5.8 Review Questions

Part II: Eigenpairs and Singular Values

6 Eigenpairs and Similarity Transformations
  6.1 Defective and Nondefective Matrices
    6.1.1 Similarity Transformations
    6.1.2 Algebraic and Geometric Multiplicity of Eigenvalues
  6.2 The Jordan Factorization
  6.3 The Schur Factorization and Normal Matrices
    6.3.1 The Schur Factorization
    6.3.2 Unitary and Orthogonal Matrices
    6.3.3 Normal Matrices
    6.3.4 The Rayleigh Quotient
    6.3.5 The Quasi-Triangular Form
    6.3.6 Hermitian Matrices
  6.4 Minmax Theorems
    6.4.1 The Hoffman-Wielandt Theorem
  6.5 Left Eigenvectors
    6.5.1 Biorthogonality
  6.6 Exercises Chap. 6
    6.6.1 Exercises Sect. 6.1
    6.6.2 Exercises Sect. 6.2
    6.6.3 Exercises Sect. 6.3
    6.6.4 Exercises Sect. 6.4
  6.7 Review Questions

7 The Singular Value Decomposition
  7.1 The SVD Always Exists
    7.1.1 The Matrices A*A, AA*
  7.2 Further Properties of SVD
    7.2.1 The Singular Value Factorization
    7.2.2 SVD and the Four Fundamental Subspaces
  7.3 A Geometric Interpretation
  7.4 Determining the Rank of a Matrix Numerically
    7.4.1 The Frobenius Norm
    7.4.2 Low Rank Approximation
  7.5 Exercises Chap. 7
    7.5.1 Exercises Sect. 7.1
    7.5.2 Exercises Sect. 7.2
    7.5.3 Exercises Sect. 7.4
  7.6 Review Questions

Part III: Matrix Norms and Least Squares

8 Matrix Norms and Perturbation Theory for Linear Systems
  8.1 Vector Norms
  8.2 Matrix Norms
    8.2.1 Consistent and Subordinate Matrix Norms
    8.2.2 Operator Norms
    8.2.3 The Operator p-Norms
    8.2.4 Unitary Invariant Matrix Norms
    8.2.5 Absolute and Monotone Norms
  8.3 The Condition Number with Respect to Inversion
    8.3.1 Perturbation of the Right Hand Side in a Linear Systems
    8.3.2 Perturbation of a Square Matrix
  8.4 Proof That the p-Norms Are Norms
    8.4.1 p-Norms and Inner Product Norms
  8.5 Exercises Chap. 8
    8.5.1 Exercises Sect. 8.1
    8.5.2 Exercises Sect. 8.2
    8.5.3 Exercises Sect. 8.3
    8.5.4 Exercises Sect. 8.4
  8.6 Review Questions

9 Least Squares
  9.1 Examples
    9.1.1 Curve Fitting
  9.2 Geometric Least Squares Theory
  9.3 Numerical Solution
    9.3.1 Normal Equations
    9.3.2 QR Factorization
    9.3.3 Singular Value Decomposition, Generalized Inverses and Least Squares
  9.4 Perturbation Theory for Least Squares
    9.4.1 Perturbing the Right Hand Side
    9.4.2 Perturbing the Matrix
  9.5 Perturbation Theory for Singular Values
    9.5.1 The Minmax Theorem for Singular Values and the Hoffman-Wielandt Theorem
  9.6 Exercises Chap. 9
    9.6.1 Exercises Sect. 9.1
    9.6.2 Exercises Sect. 9.2
    9.6.3 Exercises Sect. 9.3
    9.6.4 Exercises Sect. 9.4
    9.6.5 Exercises Sect. 9.5
  9.7 Review Questions

Part IV: Kronecker Products and Fourier Transforms

10 The Kronecker Product
  10.1 The 2D Poisson Problem
    10.1.1 The Test Matrices
  10.2 The Kronecker Product
  10.3 Properties of the 2D Test Matrices
  10.4 Exercises Chap. 10
    10.4.1 Exercises Sects. 10.1, 10.2
    10.4.2 Exercises Sect. 10.3
  10.5 Review Questions

11 Fast Direct Solution of a Large Linear System
  11.1 Algorithms for a Banded Positive Definite System
    11.1.1 Cholesky Factorization
    11.1.2 Block LU Factorization of a Block Tridiagonal Matrix
    11.1.3 Other Methods
  11.2 A Fast Poisson Solver Based on Diagonalization
  11.3 A Fast Poisson Solver Based on the Discrete Sine and Fourier Transforms
    11.3.1 The Discrete Sine Transform (DST)
    11.3.2 The Discrete Fourier Transform (DFT)
    11.3.3 The Fast Fourier Transform (FFT)
    11.3.4 A Poisson Solver Based on the FFT
  11.4 Exercises Chap. 11
    11.4.1 Exercises Sect. 11.3
  11.5 Review Questions

Part V: Iterative Methods for Large Linear Systems

12 The Classical Iterative Methods
  12.1 Classical Iterative Methods; Component Form
    12.1.1 The Discrete Poisson System
  12.2 Classical Iterative Methods; Matrix Form
    12.2.1 Fixed-Point Form
    12.2.2 The Splitting Matrices for the Classical Methods
  12.3 Convergence
    12.3.1 Richardson's Method
    12.3.2 Convergence of SOR
    12.3.3 Convergence of the Classical Methods for the Discrete Poisson Matrix
    12.3.4 Number of Iterations
    12.3.5 Stopping the Iteration
  12.4 Powers of a Matrix
    12.4.1 The Spectral Radius
    12.4.2 Neumann Series
  12.5 The Optimal SOR Parameter ω
  12.6 Exercises Chap. 12
    12.6.1 Exercises Sect. 12.3
    12.6.2 Exercises Sect. 12.4
  12.7 Review Questions

13 The Conjugate Gradient Method
  13.1 Quadratic Minimization and Steepest Descent
  13.2 The Conjugate Gradient Method
    13.2.1 Derivation of the Method
    13.2.2 The Conjugate Gradient Algorithm
    13.2.3 Numerical Example
    13.2.4 Implementation Issues
  13.3 Convergence
    13.3.1 The Main Theorem
    13.3.2 The Number of Iterations for the Model Problems
    13.3.3 Krylov Spaces and the Best Approximation Property
  13.4 Proof of the Convergence Estimates
    13.4.1 Chebyshev Polynomials
    13.4.2 Convergence Proof for Steepest Descent
    13.4.3 Monotonicity of the Error
  13.5 Preconditioning
  13.6 Preconditioning Example
    13.6.1 A Variable Coefficient Problem
    13.6.2 Applying Preconditioning
  13.7 Exercises Chap. 13
    13.7.1 Exercises Sect. 13.1
    13.7.2 Exercises Sect. 13.2
    13.7.3 Exercises Sect. 13.3
    13.7.4 Exercises Sect. 13.4
    13.7.5 Exercises Sect. 13.5
  13.8 Review Questions

Part VI: Eigenvalues and Eigenvectors

14 Numerical Eigenvalue Problems
  14.1 Eigenpairs
  14.2 Gershgorin's Theorem
  14.3 Perturbation of Eigenvalues
    14.3.1 Nondefective Matrices
  14.4 Unitary Similarity Transformation of a Matrix into Upper Hessenberg Form
    14.4.1 Assembling Householder Transformations
  14.5 Computing a Selected Eigenvalue of a Symmetric Matrix
    14.5.1 The Inertia Theorem
    14.5.2 Approximating λ_m
  14.6 Exercises Chap. 14
    14.6.1 Exercises Sect. 14.1
    14.6.2 Exercises Sect. 14.2
    14.6.3 Exercises Sect. 14.3
    14.6.4 Exercises Sect. 14.4
    14.6.5 Exercises Sect. 14.5
  14.7 Review Questions

15 The QR Algorithm
  15.1 The Power Method and Its Variants
    15.1.1 The Power Method
    15.1.2 The Inverse Power Method
    15.1.3 Rayleigh Quotient Iteration
  15.2 The Basic QR Algorithm
    15.2.1 Relation to the Power Method
    15.2.2 Invariance of the Hessenberg Form
    15.2.3 Deflation
  15.3 The Shifted QR Algorithms
  15.4 Exercises Chap. 15
    15.4.1 Exercises Sect. 15.1
  15.5 Review Questions

Part VII: Appendix

16 Differentiation of Vector Functions

References
Index

List of Figures

Fig. 1.1  The triangle T defined by the three points P1, P2 and P3
Fig. 2.1  The polynomial of degree 13 interpolating f(x) = arctan(10x) + π/2 on [−1, 1]. See text
Fig. 2.2  The piecewise linear polynomial interpolating f(x) = arctan(10x) + π/2 at n = 14 uniform points on [−1, 1]
Fig. 2.3  A cubic spline with one knot interpolating f(x) = x^4 on [0, 2]
Fig. 2.4  A cubic B-spline
Fig. 2.5  The cubic spline interpolating f(x) = arctan(10x) + π/2 at 14 equidistant sites on [−1, 1]. The exact function is also shown
Fig. 3.1  Gaussian elimination
Fig. 3.2  Lower triangular 5 × 5 band matrices: d = 1 (left) and d = 2 (right)
Fig. 5.1  The construction of v_1 and v_2 in Gram-Schmidt. The constant c is given by c := ⟨s_2, v_1⟩/⟨v_1, v_1⟩
Fig. 5.2  The orthogonal projections of s + t into S and T
Fig. 5.3  The Householder transformation in Example 5.1
Fig. 5.4  A plane rotation
Fig. 7.1  The ellipse y_1^2/9 + y_2^2 = 1 (left) and the rotated ellipse AS (right)
Fig. 8.1  A convex function
Fig. 9.1  A least squares fit to data
Fig. 9.2  Graphical interpretation of the bounds in Theorem 9.8
Fig. 10.1 Numbering of grid points
Fig. 10.2 The 5-point stencil
Fig. 10.3 Band structure of the 2D test matrix
Fig. 11.1 Fill-in in the Cholesky factor of the Poisson matrix (n = 100)
Fig. 12.1 The functions α → |1 − αλ_1| and α → |1 − αλ_n|
Fig. 12.2 ρ(G_ω) with ω ∈ [0, 2] for n = 100 (lower curve) and n = 2500 (upper curve)
Fig. 13.1 Level curves for Q(x, y) given by (13.4). Also shown is a steepest descent iteration (left) and a conjugate gradient iteration (right) to find the minimum of Q (cf. Examples 13.1, 13.2)
Fig. 13.2 The orthogonal projection of x − x_0 into W_k
Fig. 13.3 Illustration of the proof of Theorem 13.6 for k = 3. f ≡ Q − Q* has a double zero at μ_1 and one zero between μ_2 and μ_3
Fig. 14.1 The Gershgorin disk R_i
Fig. 15.1 Post multiplication in a QR step

List of Tables

Table 12.1 The number of iterations k_n to solve the discrete Poisson problem with n unknowns using the methods of Jacobi, Gauss-Seidel, and SOR (see text) with a tolerance 10^−8
Table 12.2 Spectral radii for G_J, G_1, G_ω* and the smallest integer k_n such that ρ(G)^{k_n} ≤ 10^−8
Table 13.1 The number of iterations K for the averaging problem on a √n × √n grid for various n
Table 13.2 The number of iterations K for the Poisson problem on a √n × √n grid for various n
Table 13.3 The number of iterations K (no preconditioning) and K_pre (with preconditioning) for the problem (13.52) using the discrete Poisson problem as a preconditioner
Table 15.1 Quadratic convergence of Rayleigh quotient iteration

Listings

2.1  trifactor
2.2  trisolve
2.3  splineint
2.4  findsubintervals
2.5  splineval
3.1  rforwardsolve
3.2  rbacksolve
3.3  cforwardsolve
3.4  L1U
3.5  cbacksolve
4.1  LDLs
4.2  bandcholesky
4.3  bandsemicholeskyL
5.1  housegen
5.2  housetriang
5.3  rothesstri
11.1 fastpoisson
11.2 fftrec
12.1 jdp
12.2 sordp
13.1 cg
13.2 cgtest
13.3 pcg
14.1 hesshousegen
14.2 accumulateQ
15.1 powerit
15.2 rayleighit

Chapter 1

A Short Review of Linear Algebra

In this introductory chapter we give a compact introduction to linear algebra with emphasis on Rn and Cn . For a more elementary introduction, see for example the book [13].

1.1 Notation

The following sets and notations will be used in this book.

1. The sets of natural numbers, integers, rational numbers, real numbers, and complex numbers are denoted by N, Z, Q, R, C, respectively.
2. We use the "colon equal" symbol v := e to indicate that the symbol v is defined by the expression e.
3. Rn is the set of n-tuples of real numbers which we will represent as bold face column vectors. Thus x ∈ Rn means

$$x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix},$$

where x_i ∈ R for i = 1, . . . , n. Row vectors are normally identified using the transpose operation. Thus if x ∈ Rn then x is a column vector and x^T is a row vector.


4. Addition and scalar multiplication are denoted and defined by

$$x + y := \begin{bmatrix} x_1 + y_1\\ \vdots\\ x_n + y_n \end{bmatrix}, \qquad a x := \begin{bmatrix} a x_1\\ \vdots\\ a x_n \end{bmatrix}, \qquad x, y \in \mathbb{R}^n,\ a \in \mathbb{R}.$$

5. Rm×n is the set of matrices A with real elements. The integers m and n are the number of rows and columns in the tableau

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The element in the ith row and jth column of A will be denoted by a_{i,j}, a_{ij}, A(i, j) or (A)_{i,j}. We use the notations

$$a_{:j} := \begin{bmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{bmatrix}, \qquad a_{i:}^T := [a_{i1}, a_{i2}, \ldots, a_{in}], \qquad A = [a_{:1}, a_{:2}, \ldots, a_{:n}] = \begin{bmatrix} a_{1:}^T\\ a_{2:}^T\\ \vdots\\ a_{m:}^T \end{bmatrix}$$

for the columns a_{:j} and rows a_{i:}^T of A. We often drop the colon and write a_j and a_i^T with the risk of some confusion. If m = 1 then A is a row vector, if n = 1 then A is a column vector, while if m = n then A is a square matrix. In this text we will denote matrices by boldface capital letters A, B, C, . . . and vectors most often by boldface lower case letters x, y, z, . . . .
6. A complex number is a number written in the form x = a + ib, where a, b are real numbers and i, the imaginary unit, satisfies i² = −1. The set of all such numbers is denoted by C. The numbers a = Re x and b = Im x are the real and imaginary part of x. The number x̄ := a − ib is called the complex conjugate of x = a + ib, and |x| := √(x x̄) = √(a² + b²) the absolute value or modulus of x. The complex exponential function can be defined by e^x = e^{a+ib} := e^a(cos b + i sin b). In particular, e^{iπ/2} = i, e^{iπ} = −1, e^{2iπ} = 1.


We have e^{x+y} = e^x e^y for all x, y ∈ C. The polar form of a complex number is

$$x = a + ib = re^{i\theta}, \qquad r = |x| = \sqrt{a^2 + b^2}, \qquad \cos\theta = \frac{a}{r}, \qquad \sin\theta = \frac{b}{r}.$$
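The conversion between Cartesian and polar form is easy to experiment with numerically. The following small Python sketch (not taken from the book; it only assumes the standard cmath module) checks the polar form and the identities e^{iπ/2} = i and e^{iπ} = −1 for a sample number.

import cmath

x = 3.0 + 4.0j                      # a + ib with a = 3, b = 4
r, theta = abs(x), cmath.phase(x)   # modulus r = |x| and argument theta
print(r, theta)                     # 5.0  0.927...
print(r * cmath.exp(1j * theta))    # recovers (3+4j): the polar form r*e^(i*theta)
print(cmath.exp(1j * cmath.pi / 2)) # approximately i
print(cmath.exp(1j * cmath.pi))     # approximately -1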

7. For matrices and vectors with complex elements we use the notation A ∈ Cm×n and x ∈ Cn. We define complex row vectors using either the transpose x^T or the conjugate transpose operation x^* := x̄^T = [x̄_1, . . . , x̄_n]. If x ∈ Rn then x^* = x^T.
8. For x, y ∈ Cn and a ∈ C the operations of vector addition and scalar multiplication are defined by component operations as in the real case (cf. 4.).
9. The arithmetic operations on rectangular matrices are
• matrix addition C := A + B if A, B, C are matrices of the same size, i.e., with the same number of rows and columns, and c_{ij} := a_{ij} + b_{ij} for all i, j.
• multiplication by a scalar C := αA, where c_{ij} := α a_{ij} for all i, j.
• matrix multiplication C := AB, C = A · B or C = A ∗ B, where A ∈ Cm×p, B ∈ Cp×n, C ∈ Cm×n, and c_{ij} := Σ_{k=1}^{p} a_{ik} b_{kj} for i = 1, . . . , m, j = 1, . . . , n.
• element-by-element matrix operations C := A × B, D := A/B, and E := A ∧ r, where all matrices are of the same size and c_{ij} := a_{ij} b_{ij}, d_{ij} := a_{ij}/b_{ij} and e_{ij} := a_{ij}^r for all i, j and suitable r. For the division A/B we assume that all elements of B are nonzero. The element-by-element product C = A × B is known as the Schur product and also the Hadamard product.
10. Let A ∈ Rm×n or A ∈ Cm×n. The transpose A^T and conjugate transpose A^* are n × m matrices with elements a_{ij}^T := a_{ji} and a_{ij}^* := ā_{ji}, respectively. If B is an n × p matrix then (AB)^T = B^T A^T and (AB)^* = B^* A^*. A matrix A ∈ Cn×n is symmetric if A^T = A and Hermitian if A^* = A.
11. The unit vectors in Rn and Cn are denoted by e_1 := [1, 0, 0, . . . , 0]^T, e_2 := [0, 1, 0, . . . , 0]^T, e_3 := [0, 0, 1, . . . , 0]^T, . . . , e_n := [0, 0, . . . , 0, 1]^T, while

$$I_n = I := [\delta_{ij}]_{i,j=1}^{n}, \qquad \delta_{ij} := \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{otherwise}, \end{cases} \qquad (1.1)$$

is the identity matrix of order n. Both the columns and the transpose of the rows of I are the unit vectors e_1, e_2, . . . , e_n.


12. Some matrices with many zeros have names indicating their "shape". Suppose A ∈ Rn×n or A ∈ Cn×n. Then A is
• diagonal if a_{ij} = 0 for i ≠ j.
• upper triangular or right triangular if a_{ij} = 0 for i > j.
• lower triangular or left triangular if a_{ij} = 0 for i < j.
• upper Hessenberg if a_{ij} = 0 for i > j + 1.
• lower Hessenberg if a_{ij} = 0 for i < j − 1.
• tridiagonal if a_{ij} = 0 for |i − j| > 1.
• d-banded if a_{ij} = 0 for |i − j| > d.
13. We use the following notations for diagonal and tridiagonal n × n matrices:

$$\operatorname{diag}(d_i) = \operatorname{diag}(d_1, \ldots, d_n) := \begin{bmatrix} d_1 & 0 & \cdots & 0\\ 0 & d_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & d_n \end{bmatrix},$$

$$B = \operatorname{tridiag}(a_i, d_i, c_i) = \operatorname{tridiag}(a, d, c) := \begin{bmatrix} d_1 & c_1 & & & \\ a_1 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{n-2} & d_{n-1} & c_{n-1}\\ & & & a_{n-1} & d_n \end{bmatrix}.$$

Here b_{ii} := d_i for i = 1, . . . , n, b_{i+1,i} := a_i, b_{i,i+1} := c_i for i = 1, . . . , n − 1, and b_{ij} := 0 otherwise.
14. Suppose A ∈ Cm×n and 1 ≤ i_1 < i_2 < · · · < i_r ≤ m, 1 ≤ j_1 < j_2 < · · · < j_c ≤ n. The matrix A(i, j) ∈ Cr×c is the submatrix of A consisting of rows i := [i_1, . . . , i_r] and columns j := [j_1, . . . , j_c]:

$$A(i, j) := A\begin{pmatrix} i_1 & i_2 & \cdots & i_r\\ j_1 & j_2 & \cdots & j_c \end{pmatrix} = \begin{bmatrix} a_{i_1,j_1} & a_{i_1,j_2} & \cdots & a_{i_1,j_c}\\ a_{i_2,j_1} & a_{i_2,j_2} & \cdots & a_{i_2,j_c}\\ \vdots & \vdots & \ddots & \vdots\\ a_{i_r,j_1} & a_{i_r,j_2} & \cdots & a_{i_r,j_c} \end{bmatrix}.$$

For the special case of consecutive rows and columns we also use the notation

$$A(r_1 : r_2,\ c_1 : c_2) := \begin{bmatrix} a_{r_1,c_1} & a_{r_1,c_1+1} & \cdots & a_{r_1,c_2}\\ a_{r_1+1,c_1} & a_{r_1+1,c_1+1} & \cdots & a_{r_1+1,c_2}\\ \vdots & \vdots & \ddots & \vdots\\ a_{r_2,c_1} & a_{r_2,c_1+1} & \cdots & a_{r_2,c_2} \end{bmatrix}.$$
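This notation maps directly onto array operations in most numerical languages. As a small illustration, here is a Python/NumPy sketch (not one of the book's listings; note that NumPy indexing is 0-based while the notation above is 1-based):

import numpy as np

d = np.array([2.0, 2.0, 2.0, 2.0])
a = np.array([-1.0, -1.0, -1.0])        # subdiagonal
c = np.array([-1.0, -1.0, -1.0])        # superdiagonal

D = np.diag(d)                           # diag(d1, ..., dn)
B = np.diag(d) + np.diag(a, -1) + np.diag(c, 1)   # tridiag(a, d, c)
print(B)

# Submatrix A(2:3, 1:2) in the 1-based notation above
A = np.arange(1, 17, dtype=float).reshape(4, 4)
print(A[1:3, 0:2])                       # rows 2-3 and columns 1-2 of A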


1.2 Vector Spaces and Subspaces Many mathematical systems have analogous properties to vectors in R2 or R3 . Definition 1.1 (Real Vector Space) A real vector space is a nonempty set V, whose objects are called vectors, together with two operations + : V × V −→ V and · : R × V −→ V, called addition and scalar multiplication, satisfying the following axioms for all vectors u, v, w in V and scalars c, d in R. (V1) (V2) (V3) (V4) (V5) (S1) (S2) (S3) (S4) (S5)

The sum u + v is in V, u + v = v + u, u + (v + w) = (u + v) + w, There is a zero vector 0 such that u + 0 = u, For each u in V there is a vector −u in V such that u + (−u) = 0, The scalar multiple c · u is in V, c · (u + v) = c · u + c · v, (c + d) · u = c · u + d · u, c · (d · u) = (cd) · u, 1 · u = u.

The scalar multiplication symbol · is often omitted, writing cv instead of c · v. We define u − v := u + (−v). We call V a complex vector space if the scalars consist of all complex numbers C. In this book a vector space is either real or complex. From the axioms it follows that 1. The zero vector is unique. 2. For each u ∈ V the negative −u of u is unique. 3. 0u = 0, c0 = 0, and −u = (−1)u. Here are some examples 1. The spaces Rn and Cn , where n ∈ N, are real and complex vector spaces, respectively. 2. Let D be a subset of R and d ∈ N. The set V of all functions f , g : D → Rd is a real vector space with (f + g)(t) := f (t) + g(t),

(cf )(t) := cf (t),

t ∈ D,

c ∈ R.

Two functions f , g in V are equal if f (t) = g(t) for all t ∈ D. The zero element is the zero function given by f (t) = 0 for all t ∈ D and the negative of f is given by −f = (−1)f . In the following we will use boldface letters for functions only if d > 1. 3. For n ≥ 0 the space Πn of polynomials of degree at most n consists of all polynomials p : R → R, p : R → C, or p : C → C of the form p(t) := a0 + a1 t + a2 t 2 + · · · + an t n ,

(1.2)


where the coefficients a0 , . . . , an are real or complex numbers. p is called the zero polynomial if all coefficients are zero. All other polynomials are said to be nontrivial. The degree of a nontrivial polynomial p given by (1.2) is the smallest integer 0 ≤ k ≤ n such that p(t) = a0 + · · · + ak t k with ak = 0. The degree of the zero polynomial is not defined. Πn is a vector space if we define addition and scalar multiplication as for functions. Definition 1.2 (Linear Combination) For n ≥ 1 let X := {x 1 , . . . , x n } be a set of vectors in a vector space V and let c1 , . . . , cn be scalars. 1. 2. 3. 4.

The sum c1 x 1 + · · · + cn x n is called a linear combination of x 1 , . . . , x n . The linear combination is nontrivial if cj x j = 0 for at least one j . The set of all linear combinations of elements in X is denoted span(X ). A vector space is finite dimensional if it has a finite spanning set; i.e., there exists n ∈ N and {x 1 , . . . , x n } in V such that V = span({x 1 , . . . , x n }).

Example 1.1 (Linear Combinations) 1. Any x = [x1 , . . . , xm ]T in Cm can be written as a linear combination of the unit vectors as x = x1 e1 + x2 e2 + · · · + xm em . Thus, Cm = span({e1 , . . . , em }) and Cm is finite dimensional. Similarly Rm is finite dimensional. 2. Let Π = ∪n Πn be the space of all polynomials. Π is a vector space that is not finite dimensional. For suppose Π is finite dimensional. Then Π = span({p1 , . . . , pm }) for some polynomials p1 , . . . , pm . Let d be an integer such that the degree of pj is less than d for j = 1, . . . , m. A polynomial of degree d cannot be written as a linear combination of p1 , . . . , pm , a contradiction.

1.2.1 Linear Independence and Bases Definition 1.3 (Linear Independence) A set X = {x 1 , . . . , x n } of nonzero vectors in a vector space is linearly dependent if 0 can be written as a nontrivial linear combination of {x 1 , . . . , x n }. Otherwise X is linearly independent. A set of vectors X = {x 1 , . . . , x n } is linearly independent if and only if c1 x 1 + · · · + cn x n = 0

⇒

c1 = · · · = cn = 0.

(1.3)

Suppose {x 1 , . . . , x n } is linearly independent. Then 1. If x ∈ span(X ) then the scalars c1 , . . . , cn in the representation x = c1 x 1 +· · ·+ cn x n are unique. 2. Any nontrivial linear combination of x 1 , . . . , x n is nonzero, Lemma 1.1 (Linear Independence and Span) Suppose v 1 , . . . , v n span a vector space V and that w1 , . . . , w k are linearly independent vectors in V. Then k ≤ n.


Proof Suppose k > n. Write w1 as a linear combination of elements from the set X0 := {v 1 , . . . , v n }, say w 1 = c1 v 1 + · · · + cn v n . Since w1 = 0 not all the c’s are equal to zero. Pick a nonzero c, say ci1 . Then v i1 can be expressed as a linear combination of w1 and the remaining v’s. So the set X1 := {w1 , v 1 , . . . , v i1 −1 , v i1 +1 , . . . , v n } must also be a spanning set for V. We repeat this for w 2 and X1 . In the linear combination w2 = di1 w1 + j =i1 dj v j , we must have di2 = 0 for some i2 with i2 = i1 . For otherwise w2 = d1 w1 contradicting the linear independence of the w’s. So the set X2 consisting of the v’s with v i1 replaced by w1 and v i2 replaced by w2 is again a spanning set for V. Repeating this process n − 2 more times we obtain a spanning set Xn where v 1 , . . . , v n have been replaced by w1 , . . . , w n . Since k > n we can then write wk as a linear combination of w1 , . . . , wn contradicting the linear independence of the w’s. We conclude that k ≤ n.   Definition 1.4 (Basis) A finite set of vectors {v 1 , . . . , v n } in a vector space V is a basis for V if 1. span{v 1 , . . . , v n } = V. 2. {v 1 , . . . , v n } is linearly independent. Theorem 1.1 (Basis Subset of a Spanning Set) Suppose V is a vector space and that {v 1 , . . . , v n } is a spanning set for V. Then we can find a subset {v i1 , . . . , v ik } that forms a basis for V. Proof If {v 1 , . . . , v n } is linearly dependent we can express one of the v’s as a nontrivial linear combination of the remaining v’s and drop that v from the spanning set. Continue this process until the remaining v’s are linearly independent. They still span the vector space and therefore form a basis.   Corollary 1.1 (Existence of a Basis) A vector space is finite dimensional (cf. Definition 1.2) if and only if it has a basis. Proof Let V = span{v 1 , . . . , v n } be a finite dimensional vector space. By Theorem 1.1, V has a basis. Conversely, if V = span{v 1 , . . . , v n } and {v 1 , . . . , v n } is a basis then it is by definition a finite spanning set.   Theorem 1.2 (Dimension of a Vector Space) Every basis for a vector space V has the same number of elements. This number is called the dimension of the vector space and denoted dim V. Proof Suppose X = {v 1 , . . . , v n } and Y = {w1 , . . . , w k } are two bases for V. By Lemma 1.1 we have k ≤ n. Using the same Lemma with X and Y switched we obtain n ≤ k. We conclude that n = k.   The set of unit vectors {e1 , . . . , en } form a basis for both Rn and Cn . Theorem 1.3 (Enlarging Vectors to a Basis) Every linearly independent set of vectors {v 1 , . . . , v k } in a finite dimensional vector space V can be enlarged to a basis for V.


Proof If {v 1 , . . . , v k } does not span V we can enlarge the set by one vector v k+1 which cannot be expressed as a linear combination of {v 1 , . . . , v k }. The enlarged set is also linearly independent. Continue this process. Since the space is finite dimensional it must stop after a finite number of steps.  
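In Rn or Cn, checking linear independence and extracting a basis from a spanning set reduces to a rank computation on the matrix whose columns are the given vectors. The following Python/NumPy sketch (an illustration with made-up vectors, not one of the book's listings) does this for a small example.

import numpy as np

# Columns of X are the vectors x1, ..., x4 in R^3; x3 = x1 + x2 is redundant.
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

rank = np.linalg.matrix_rank(X)
print("linearly independent columns?", rank == X.shape[1])   # False

# Greedily keep columns that increase the rank; they form a basis for span(X).
basis_cols = []
for j in range(X.shape[1]):
    if np.linalg.matrix_rank(X[:, basis_cols + [j]]) > len(basis_cols):
        basis_cols.append(j)
print("basis columns:", basis_cols)      # [0, 1, 3]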

1.2.2 Subspaces Definition 1.5 (Subspace) A nonempty subset S of a real or complex vector space V is called a subspace of V if (V1) (S1)

The sum u + v is in S for any u, v ∈ S. The scalar multiple cu is in S for any scalar c and any u ∈ S.

Using the operations in V, any subspace S of V is a vector space, i.e., all 10 axioms V 1 − V 5 and S1 − S5 are satisfied for S. In particular, S must contain the zero element in V. This follows since the operations of vector addition and scalar multiplication are inherited from V. Example 1.2 (Examples of Subspaces) 1. {0}, where 0 is the zero vector is a subspace, the trivial subspace. The dimension of the trivial subspace is defined to be zero. All other subspaces are nontrivial. 2. V is a subspace of itself. 3. span(X ) is a subspace of V for any X = {x 1 , . . . , x n } ⊆ V. Indeed, it is easy to see that (V1) and (S1) hold. 4. The sum of two subspaces S and T of a vector space V is defined by S + T := {s + t : s ∈ S and t ∈ T }.

(1.4)

Clearly (V1) and (S1) hold and it is a subspace of V . 5. The intersection of two subspaces S and T of a vector space V is defined by S ∩ T := {x : x ∈ S and x ∈ T }.

(1.5)

It is a subspace of V. 6. The union of two subspaces S and T of a vector space V is defined by S ∪ T := {x : x ∈ S or x ∈ T }.

(1.6)

In general it is not a subspace of V. 7. A sum of two subspaces S and T of a vector space V is called a direct sum and denoted S ⊕ T if S ∩ T = {0}.


Theorem 1.4 (Dimension Formula for Sums of Subspaces) Let S and T be two finite subspaces of a vector space V. Then dim(S + T ) = dim(S) + dim(T ) − dim(S ∩ T ).

(1.7)

In particular, for a direct sum dim(S ⊕ T ) = dim(S) + dim(T ).

(1.8)

Proof Let {u1 , . . . , up } be a basis for S ∩ T , where {u1 , . . . , up } = ∅, the empty set, in the case S ∩ T = {0}. We use Theorem 1.3 to extend {u1 , . . . , up } to a basis {u1 , . . . , up , s 1 , . . . , s q } for S and a basis {u1 , . . . , up , t 1 , . . . , t r } for T . Every x ∈ S + T can be written as a linear combination of {u1 , . . . , up , s 1 , . . . , s q , t 1 , . . . , t r } so these vectors span S + T . We show that they are linearly independent q and hence p a basis. Suppose u + s + t = 0, where u := j =1 αj uj , s := j =1 ρj s j , and t := rj =1 σj t j . Now s = −(u + t) belongs to both S and to T and hence s ∈ S T . Therefore s can be written as a linear ∩ q combination of u1 , . . . , up say s := p p β u . But then 0 = β u − j =1 j j j =1 j j j =1 ρj s j and since {u1 , . . . , up , s 1 , . . . , s q } is linearly independent we must have β1 = · · · = βp = ρ1 = · · · = ρq = 0 and hence s = 0. We then have u + t = 0 and by linear independence of {u1 , . . . , up , t 1 , . . . , t r } we obtain α1 = · · · = αp = σ1 = · · · = σr = 0. We have shown that the vectors {u1 , . . . , up , s 1 , . . . , s q , t 1 , . . . , t r } constitute a basis for S + T . But then dim(S +T ) = p +q +r = (p +q)+(p +r)−p = dim(S)+dim(T )−dim(S ∩T ) and (1.7) follows. Equation (1.7) implies (1.8) since dim{0} = 0.

 

It is convenient to introduce a matrix transforming a basis in a subspace into a basis for the space itself.

Lemma 1.2 (Change of Basis Matrix) Suppose S is a subspace of a finite dimensional vector space V and let {s_1, . . . , s_n} be a basis for S and {v_1, . . . , v_m} a basis for V. Then each s_j can be expressed as a linear combination of v_1, . . . , v_m, say

$$s_j = \sum_{i=1}^{m} a_{ij} v_i \quad \text{for } j = 1, \ldots, n. \qquad (1.9)$$


If x ∈ S then x = Σ_{j=1}^{n} c_j s_j = Σ_{i=1}^{m} b_i v_i for some coefficients b := [b_1, . . . , b_m]^T, c := [c_1, . . . , c_n]^T. Moreover b = Ac, where A = [a_{ij}] ∈ Cm×n is given by (1.9). The matrix A has linearly independent columns.

Proof Equation (1.9) holds for some a_{ij} since s_j ∈ V and {v_1, . . . , v_m} spans V. Since {s_1, . . . , s_n} is a basis for S and {v_1, . . . , v_m} a basis for V, every x ∈ S can be written x = Σ_{j=1}^{n} c_j s_j = Σ_{i=1}^{m} b_i v_i for some scalars (c_j) and (b_i). But then

$$\sum_{i=1}^{m} b_i v_i = x = \sum_{j=1}^{n} c_j s_j \overset{(1.9)}{=} \sum_{j=1}^{n} c_j \Bigl(\sum_{i=1}^{m} a_{ij} v_i\Bigr) = \sum_{i=1}^{m} \Bigl(\sum_{j=1}^{n} a_{ij} c_j\Bigr) v_i.$$

Since {v_1, . . . , v_m} is linearly independent it follows that b_i = Σ_{j=1}^{n} a_{ij} c_j for i = 1, . . . , m, or b = Ac. Finally, to show that A has linearly independent columns suppose b := Ac = 0 for some c = [c_1, . . . , c_n]^T. Define x := Σ_{j=1}^{n} c_j s_j. Then x = Σ_{i=1}^{m} b_i v_i, and since b = 0 we have x = 0. But since {s_1, . . . , s_n} is linearly independent it follows that c = 0. □

The matrix A in Lemma 1.2 is called a change of basis matrix.
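For V = Rm the lemma is just a matrix-vector product: stacking the basis vectors of S as columns of A expresses coordinates c with respect to {s_1, . . . , s_n} as coordinates b = Ac with respect to the standard basis. A small Python/NumPy sketch (illustrative only, with made-up vectors):

import numpy as np

# Basis for a 2-dimensional subspace S of R^3, written in the standard basis.
s1 = np.array([1.0, 1.0, 0.0])
s2 = np.array([0.0, 1.0, 1.0])
A = np.column_stack([s1, s2])        # change of basis matrix, columns are s1, s2

c = np.array([2.0, -1.0])            # coordinates of x with respect to {s1, s2}
b = A @ c                            # coordinates of the same x in the standard basis
print(b)                             # [ 2.  1. -1.], which equals 2*s1 - 1*s2

print(np.linalg.matrix_rank(A))      # 2: A has linearly independent columns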

1.2.3 The Vector Spaces Rn and Cn When V = Rm or Cm we can think of n vectors in V, say x 1 , . . . , x n , as a set X := {x 1 , . . . , x n } or as the columns of an m × n matrix X = [x 1 , . . . , x n ]. A linear combination can then be written as a matrix times vector Xc, where c = [c1 , . . . , cn ]T is the vector of scalars. Thus R(X) := {Xc : c ∈ Rn } = span(X ). Definition 1.6 (Column Space, Null Space, Inner Product and Norm) Associated with an m × n matrix X = [x 1 , . . . , x n ], where x j ∈ V, j = 1, . . . , n are the following subspaces of V. 1. The subspace R(X) is called the column space of X. It is the smallest subspace containing X = {x 1 , . . . , x n }. The dimension of R(X) is called the rank of X. The matrix X has rank n if and only if it has linearly independent columns. 2. R(X T ) is called the row space of X. It is generated by the rows of X written as column vectors. 3. The subspace N (X) := {y ∈ Rn : Xy = 0} is called the null space or kernel space of X. The dimension of N (X) is called the nullity of X and denoted null(X).


4. The standard inner product is

$$\langle x, y\rangle := y^* x = x^T \bar{y} = \sum_{j=1}^{n} x_j \bar{y}_j. \qquad (1.10)$$

5. The Euclidian norm is defined by

$$\|x\|_2 := \Bigl(\sum_{j=1}^{n} |x_j|^2\Bigr)^{1/2} = \sqrt{x^* x}. \qquad (1.11)$$

Clearly N (X) is nontrivial if and only if X has linearly dependent columns. Inner products and norms are treated in more generality in Chaps. 5 and 8. The following Theorem is shown in any basic course in linear algebra. See Exercise 7.10 for a simple proof using the singular value decomposition. Theorem 1.5 (Counting Dimensions of Fundamental Subspaces) Suppose X ∈ Cm×n . Then 1. rank(X) = rank(X∗ ). 2. rank(X) + null(X) = n, 3. rank(X) + null(X ∗ ) = m,
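The rank-nullity relations in Theorem 1.5 are easy to check numerically. The following Python sketch (illustrative, not a listing from the book; it assumes SciPy is available for its null_space helper) computes the rank and nullity of a small rank-deficient matrix.

import numpy as np
from scipy.linalg import null_space

X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # 4 x 3, rank 2 (column 3 = column 1 + column 2)

m, n = X.shape
rank = np.linalg.matrix_rank(X)
N = null_space(X)                        # orthonormal basis for the null space N(X)
print(rank, N.shape[1])                  # 2 1
print(rank + N.shape[1] == n)            # True: rank(X) + null(X) = n
print(np.linalg.matrix_rank(X.conj().T) == rank)   # True: rank(X) = rank(X*)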

1.3 Linear Systems

Consider a linear system

$$\begin{alignedat}{2} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ \vdots\qquad\qquad &\ \ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{alignedat}$$

of m equations in n unknowns. Here for all i, j, the coefficients a_{ij}, the unknowns x_j, and the components b_i of the right hand side are real or complex numbers. The system can be written as a vector equation

$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b,$$

where a_j = [a_{1j}, . . . , a_{mj}]^T ∈ Cm for j = 1, . . . , n and b = [b_1, . . . , b_m]^T ∈ Cm. It can also be written as a matrix equation

$$Ax = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix} = b.$$

The system is homogeneous if b = 0 and it is said to be underdetermined, square, or overdetermined if m < n, m = n, or m > n, respectively.

1.3.1 Basic Properties A linear system has a unique solution, infinitely many solutions, or no solution. To discuss this we first consider the real case, and a homogeneous underdetermined system. Lemma 1.3 (Underdetermined System) Suppose A ∈ Rm×n with m < n. Then there is a nonzero x ∈ Rn such that Ax = 0. Proof Suppose A ∈ Rm×n with m < n. The n columns of A span a subspace of Rm . Since Rm has dimension m the dimension of this subspace is at most m. By Lemma 1.1 the columns of A must be linearly dependent. It follows that there is a nonzero x ∈ Rn such that Ax = 0.   A square matrix is either nonsingular or singular. Definition 1.7 (Real Nonsingular or Singular Matrix) A square matrix A ∈ Rn×n is said to be nonsingular if the only real solution of the homogeneous system Ax = 0 is x = 0. The matrix is singular if there is a nonzero x ∈ Rn such that Ax = 0. Theorem 1.6 (Linear Systems; Existence and Uniqueness) Suppose A ∈ Rn×n . The linear system Ax = b has a unique solution x ∈ Rn for any b ∈ Rn if and only if the matrix A is nonsingular.   Proof Suppose A is nonsingular. We define B = A b ∈ Rn×(n+1) by adding a column to A. ByLemma 1.3 there is a nonzero z ∈ Rn+1 such that Bz = 0. If we T  z˜ write z = where z˜ = z1 , . . . , zn ∈ Rn and zn+1 ∈ R, then zn+1  Bz = [A b]



zn+1

 = A˜z + zn+1 b = 0.


We cannot have zn+1 = 0 for then A˜z = 0 for a nonzero z˜ , contradicting the nonsingularity of A. Define x := −˜z/zn+1 . Then  1 1  z˜ − zn+1 b = b, =− A˜z = − Ax = −A zn+1 zn+1 zn+1 so x is a solution. Suppose Ax = b and Ay = b for x, y ∈ Rn . Then A(x − y) = 0 and since A is nonsingular we conclude that x − y = 0 or x = y. Thus the solution is unique. Conversely, if Ax = b has a unique solution for any b ∈ Rn then Ax = 0 has a unique solution which must be x = 0. Thus A is nonsingular.   For the complex case we have Lemma 1.4 (Complex Underdetermined System) Suppose A ∈ Cm×n with m < n. Then there is a nonzero x ∈ Cn such that Ax = 0. Definition 1.8 (Complex Nonsingular Matrix) A square matrix A ∈ Cn×n is said to be nonsingular if the only complex solution of the homogeneous system Ax = 0 is x = 0. The matrix is singular if it is not nonsingular. Theorem 1.7 (Complex Linear System; Existence and Uniqueness) Suppose A ∈ Cn×n . The linear system Ax = b has a unique solution x ∈ Cn for any b ∈ Cn if and only if the matrix A is nonsingular.

1.3.2 The Inverse Matrix Suppose A ∈ Cn×n is a square matrix. A matrix B ∈ Cn×n is called a right inverse of A if AB = I . A matrix C ∈ Cn×n is said to be a left inverse of A if CA = I . We say that A is invertible if it has both a left- and a right inverse. If A has a right inverse B and a left inverse C then C = CI = C(AB) = (CA)B = I B = B and this common inverse is called the inverse of A and denoted by A−1 . Thus the inverse satisfies A−1 A = AA−1 = I . We want to characterize the class of invertible matrices and start with a lemma. Theorem 1.8 (Product of Nonsingular Matrices) If A, B, C ∈ Cn×n with AB = C then C is nonsingular if and only if both A and B are nonsingular. In particular, if either AB = I or BA = I then A is nonsingular and A−1 = B. Proof Suppose both A and B are nonsingular and let Cx = 0. Then ABx = 0 and since A is nonsingular we see that Bx = 0. Since B is nonsingular we have x = 0. We conclude that C is nonsingular.


For the converse suppose first that B is singular and let x ∈ Cn be a nonzero vector so that Bx = 0. But then Cx = (AB)x = A(Bx) = A0 = 0 so C is singular. Finally suppose B is nonsingular, but A is singular. Let x˜ be a nonzero vector such that Ax˜ = 0. By Theorem 1.7 there is a vector x such that Bx = x˜ and x is nonzero since x˜ is nonzero. But then Cx = (AB)x = A(Bx) = Ax˜ = 0 for a nonzero vector x and C is singular.   Theorem 1.9 (When Is a Square Matrix Invertible?) A square matrix is invertible if and only if it is nonsingular. Proof Suppose first A is a nonsingular matrix. By Theorem 1.7 eachof the linear  systems Abi = ei has a unique solution bi for i = 1, . . . , n. Let B = b1 , . . . , b n .     Then AB = Ab1 , . . . , Abn = e1 , . . . , en = I so that A has a right inverse B. By Theorem 1.8 B is nonsingular since I is nonsingular and AB = I . Since B is nonsingular we can use what we have shown for A to conclude that B has a right inverse C, i.e. BC = I . But then AB = BC = I so B has both a right inverse and a left inverse which must be equal so A = C. Since BC = I we have BA = I , so B is also a left inverse of A and A is invertible. Conversely, if A is invertible then it has a right inverse B. Since AB = I and I is nonsingular, we again use Theorem 1.8 to conclude that A is nonsingular.   To verify that some matrix B is an inverse of another matrix A it is enough to show that B is either a left inverse or a right inverse of A. This calculation also proves that A is nonsingular. We use this observation to give simple proofs of the following results. Corollary 1.2 (Basic Properties of the Inverse Matrix) Suppose A, B ∈ Cn×n are nonsingular and c is a nonzero constant. 1. 2. 3. 4. 5.

A−1 is nonsingular and (A−1 )−1 = A. C = AB is nonsingular and C −1 = B −1 A−1 . AT is nonsingular and (AT )−1 = (A−1 )T =: A−T . A∗ is nonsingular and (A∗ )−1 = (A−1 )∗ =: A−∗ . cA is nonsingular and (cA)−1 = 1c A−1 .

Proof 1. Since A−1 A = I the matrix A is a right inverse of A−1 . Thus A−1 is nonsingular and (A−1 )−1 = A. 2. We note that (B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 B = I . Thus AB is invertible with the indicated inverse since it has a left inverse. 3. Now I = I T = (A−1 A)T = AT (A−1 )T showing that (A−1 )T is a right inverse of AT . The proof of part 4 is similar. 4. The matrix 1c A−1 is a one sided inverse of cA.  
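These rules are convenient to sanity-check numerically. A short Python/NumPy sketch (illustrative only; it uses a random pair A, B shifted to be safely nonsingular):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
B = rng.standard_normal((4, 4)) + 4 * np.eye(4)

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
print(np.allclose(np.linalg.inv(A @ B), Bi @ Ai))        # (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A.T), Ai.T))             # (A^T)^{-1} = (A^{-1})^T
print(np.allclose(np.linalg.inv(3.0 * A), Ai / 3.0))     # (cA)^{-1} = (1/c) A^{-1}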


1.4 Determinants

The first systematic treatment of determinants was given by Cauchy in 1812. He adopted the word "determinant". The first use of determinants was made by Leibniz in 1693 in a letter to De L'Hôspital. By the beginning of the twentieth century the theory of determinants filled four volumes of almost 2000 pages (Muir, 1906–1923; historic references can be found in this work). The main use of determinants in this text will be to study the characteristic polynomial of a matrix and to show that a matrix is nonsingular. For any A ∈ Cn×n the determinant of A is defined by the number

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sign}(\sigma)\, a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n}. \qquad (1.12)$$

This sum ranges over all n! permutations of {1, 2, . . . , n}. Moreover, sign(σ) = (−1)^k, where k is the number of times a bigger integer precedes a smaller one in σ. We also denote the determinant by (Cayley, 1841)

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$

From the definition we have

$$\begin{vmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}.$$

The first term on the right corresponds to the identity permutation given by σ(i) = i, i = 1, 2. The second term comes from the permutation σ = {2, 1}. For n = 3 there are six permutations of {1, 2, 3}. Then

$$\begin{vmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13}.$$

This follows since sign({1, 2, 3}) = sign({2, 3, 1}) = sign({3, 1, 2}) = 1, and noting that interchanging two numbers in a permutation reverses its sign we find sign({2, 1, 3}) = sign({3, 2, 1}) = sign({1, 3, 2}) = −1. To compute the value of a determinant from the definition can be a trying experience. It is often better to use elementary operations on rows or columns to reduce it to a simpler form. For example, if A is triangular then


det(A) = a_{11} a_{22} · · · a_{nn}, the product of the diagonal elements. In particular, for the identity matrix det(I) = 1. The elementary operations using either rows or columns are
1. Interchanging two rows (columns): det(B) = − det(A),
2. Multiplying a row (column) by a scalar α: det(B) = α det(A),
3. Adding a constant multiple of one row (column) to another row (column): det(B) = det(A),
where B is the result of performing the indicated operation on A. If only a few elements in a row or column are nonzero then a cofactor expansion can be used. These expansions take the form

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) \quad \text{for } i = 1, \ldots, n \quad \text{(row)}, \qquad (1.13)$$

$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) \quad \text{for } j = 1, \ldots, n \quad \text{(column)}. \qquad (1.14)$$

Here A_{i,j} denotes the submatrix of A obtained by deleting the ith row and jth column of A. For A ∈ Cn×n and 1 ≤ i, j ≤ n the determinant det(A_{ij}) is called the cofactor of a_{ij}.

Example 1.3 (Determinant Equation for a Straight Line) The equation for a straight line through two points (x_1, y_1) and (x_2, y_2) in the plane can be written as the equation

$$\det(A) := \begin{vmatrix} 1 & x & y\\ 1 & x_1 & y_1\\ 1 & x_2 & y_2 \end{vmatrix} = 0$$

involving a determinant of order 3. We can compute this determinant using row operations of type 3. Subtracting row 2 from row 3 and then row 1 from row 2, and then using a cofactor expansion on the first column, we obtain

$$\begin{vmatrix} 1 & x & y\\ 1 & x_1 & y_1\\ 1 & x_2 & y_2 \end{vmatrix} = \begin{vmatrix} 1 & x & y\\ 0 & x_1 - x & y_1 - y\\ 0 & x_2 - x_1 & y_2 - y_1 \end{vmatrix} = \begin{vmatrix} x_1 - x & y_1 - y\\ x_2 - x_1 & y_2 - y_1 \end{vmatrix} = (x_1 - x)(y_2 - y_1) - (y_1 - y)(x_2 - x_1).$$

Rearranging the equation det(A) = 0 we obtain

$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}(x - x_1),$$


which is the slope form of the equation of a straight line.
We will freely use, without proofs, the following properties of determinants. If A, B are square matrices of order n with real or complex elements, then
1. det(AB) = det(A) det(B).
2. det(A^T) = det(A), and det(A^*) = \overline{\det(A)} (complex conjugate).
3. det(aA) = a^n det(A), for a ∈ C.
4. A is singular if and only if det(A) = 0.
5. If $A = \begin{bmatrix} C & D\\ 0 & E \end{bmatrix}$ for some square matrices C, E then det(A) = det(C) det(E).
6. Cramer's rule: Suppose A ∈ Cn×n is nonsingular and b ∈ Cn. Let x = [x_1, x_2, . . . , x_n]^T be the unique solution of Ax = b. Then

$$x_j = \frac{\det(A_j(b))}{\det(A)}, \qquad j = 1, 2, \ldots, n,$$

where A_j(b) denotes the matrix obtained from A by replacing the jth column of A by b.
7. Adjoint formula for the inverse: If A ∈ Cn×n is nonsingular then

$$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A),$$

where the matrix adj(A) ∈ Cn×n with elements adj(A)_{i,j} = (−1)^{i+j} det(A_{j,i}) is called the adjoint of A. Moreover, A_{j,i} denotes the submatrix of A obtained by deleting the jth row and ith column of A.
8. Cauchy-Binet formula: Let A ∈ Cm×p, B ∈ Cp×n and C = AB. Suppose 1 ≤ r ≤ min{m, n, p} and let i = {i_1, . . . , i_r} and j = {j_1, . . . , j_r} be integers with 1 ≤ i_1 < i_2 < · · · < i_r ≤ m and 1 ≤ j_1 < j_2 < · · · < j_r ≤ n. Then

$$\begin{vmatrix} c_{i_1,j_1} & \cdots & c_{i_1,j_r}\\ \vdots & & \vdots\\ c_{i_r,j_1} & \cdots & c_{i_r,j_r} \end{vmatrix} = \sum_{k} \begin{vmatrix} a_{i_1,k_1} & \cdots & a_{i_1,k_r}\\ \vdots & & \vdots\\ a_{i_r,k_1} & \cdots & a_{i_r,k_r} \end{vmatrix}\, \begin{vmatrix} b_{k_1,j_1} & \cdots & b_{k_1,j_r}\\ \vdots & & \vdots\\ b_{k_r,j_1} & \cdots & b_{k_r,j_r} \end{vmatrix},$$

where we sum over all k = {k_1, . . . , k_r} with 1 ≤ k_1 < k_2 < · · · < k_r ≤ p. More compactly,

$$\det\bigl(C(i, j)\bigr) = \sum_{k} \det\bigl(A(i, k)\bigr) \det\bigl(B(k, j)\bigr). \qquad (1.15)$$

Note the resemblance to the formula for matrix multiplication.
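Cramer's rule and the cofactor expansion are mainly of theoretical interest, but they are easy to state in code for small matrices. Below is a Python/NumPy sketch (illustrative only, not an algorithm recommended for large n): a recursive determinant via cofactor expansion along the first row, and Cramer's rule built on top of it.

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (O(n!), tiny n only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by Cramer's rule, x_j = det(A_j(b)) / det(A)."""
    d = det_cofactor(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b              # replace column j of A by b
        x[j] = det_cofactor(Aj) / d
    return x

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])
print(cramer(A, b))               # [1. 1. 1.]
print(np.linalg.solve(A, b))      # agrees with the library solver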


1.5 Eigenvalues, Eigenvectors and Eigenpairs Suppose A ∈ Cn×n is a square matrix, λ ∈ C and x ∈ Cn . We say that (λ, x) is an eigenpair for A if Ax = λx and x is nonzero. The scalar λ is called an eigenvalue and x is said to be an eigenvector.1 The set of eigenvalues is called the spectrum of A and is denoted by σ (A). For example, σ (I ) = {1, . . . , 1} = {1}. Eigenvalues are the roots of the characteristic polynomial. Lemma 1.5 (Characteristic Equation) For any A ∈ Cn×n we have λ ∈ σ (A) ⇐⇒ det(A − λI ) = 0. Proof Suppose (λ, x) is an eigenpair for A. The equation Ax = λx can be written (A − λI )x = 0. Since x is nonzero the matrix A − λI must be singular with a zero determinant. Conversely, if det(A − λI ) = 0 then A − λI is singular and (A − λI )x = 0 for some nonzero x ∈ Cn . Thus Ax = λx and (λ, x) is an eigenpair for A.   The expression det(A − λI ) is a polynomial of exact degree n in λ. For n = 3 we have    a11 − λ a12 a13   det(A − λI ) =  a21 a22 − λ a23  .  a a32 a33 − λ  31 Expanding this determinant by the first column we find        a − λ a23   − a21  a12 a13  det(A − λI ) = (a11 − λ)  22   a32 a33 − λ a32 a33 − λ     a12 a13   = (a11 − λ)(a22 − λ)(a33 − λ) + r(λ) + a31  a22 − λ a23  for some polynomial r of degree at most one. In general det(A − λI ) = (a11 − λ)(a22 − λ) · · · (ann − λ) + r(λ),

(1.16)

where each term in r(λ) has at most n − 2 factors containing λ. It follows that r is a polynomial of degree at most n − 2, det(A − λI ) is a polynomial of exact degree n in λ and the eigenvalues are the roots of this polynomial. We observe that det(A − λI ) = (−1)n det(λI − A) so det(A − λI ) = 0 if and only if det(λI − A) = 0.

¹ The word "eigen" is derived from German and means "own".


Definition 1.9 (Characteristic Polynomial of a Matrix) The function π_A : C → C given by π_A(λ) = det(A − λI) is called the characteristic polynomial of A. The equation det(A − λI) = 0 is called the characteristic equation of A.
By the fundamental theorem of algebra an n × n matrix has, counting multiplicities, precisely n eigenvalues λ_1, . . . , λ_n, some of which might be complex even if A is real. The complex eigenpairs of a real matrix occur in complex conjugate pairs. Indeed, taking the complex conjugate on both sides of the equation Ax = λx with A real gives A x̄ = λ̄ x̄.

Theorem 1.10 (Sums and Products of Eigenvalues; Trace) For any A ∈ Cn×n

$$\operatorname{trace}(A) = \lambda_1 + \lambda_2 + \cdots + \lambda_n, \qquad \det(A) = \lambda_1 \lambda_2 \cdots \lambda_n, \qquad (1.17)$$

where the trace of A ∈ Cn×n is the sum of its diagonal elements

$$\operatorname{trace}(A) := a_{11} + a_{22} + \cdots + a_{nn}. \qquad (1.18)$$

Proof We compare two different expansions of π_A. On the one hand, from (1.16) we find π_A(λ) = (−1)^n λ^n + c_{n−1} λ^{n−1} + · · · + c_0, where c_{n−1} = (−1)^{n−1} trace(A) and c_0 = π_A(0) = det(A). On the other hand, π_A(λ) = (λ_1 − λ) · · · (λ_n − λ) = (−1)^n λ^n + d_{n−1} λ^{n−1} + · · · + d_0, where d_{n−1} = (−1)^{n−1}(λ_1 + · · · + λ_n) and d_0 = λ_1 · · · λ_n. Since c_j = d_j for all j we obtain (1.17). □

For a 2 × 2 matrix the characteristic equation takes the convenient form

$$\lambda^2 - \operatorname{trace}(A)\lambda + \det(A) = 0. \qquad (1.19)$$

Thus, if $A = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}$ then trace(A) = 4, det(A) = 3, so that π_A(λ) = λ² − 4λ + 3. Since

A is singular ⟺ Ax = 0 for some x ≠ 0 ⟺ Ax = 0x for some x ≠ 0 ⟺ zero is an eigenvalue of A,

we obtain

Theorem 1.11 (Zero Eigenvalue) The matrix A ∈ Cn×n is singular if and only if zero is an eigenvalue.

Since the determinant of a triangular matrix is equal to the product of the diagonal elements, the eigenvalues of a triangular matrix are found on the diagonal. In general it is not easy to find all eigenvalues of a matrix. However, sometimes the dimension of the problem can be reduced. Since the determinant of a block triangular matrix is equal to the product of the determinants of the diagonal blocks we obtain


Theorem 1.12 (Eigenvalues of a Block Triangular Matrix) If $A = \begin{bmatrix} B & D\\ 0 & C \end{bmatrix}$ is block triangular then π_A = π_B · π_C.
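The identities (1.17) and Theorem 1.12 are easy to check numerically. A Python/NumPy sketch (illustrative only, using a random block triangular matrix):

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
C = rng.standard_normal((2, 2))
D = rng.standard_normal((3, 2))
A = np.block([[B, D], [np.zeros((2, 3)), C]])

eig_A = np.linalg.eigvals(A)
eig_BC = np.concatenate([np.linalg.eigvals(B), np.linalg.eigvals(C)])
print(np.allclose(np.sort_complex(eig_A), np.sort_complex(eig_BC)))  # eigenvalues of A = those of B and C
print(np.isclose(np.trace(A), eig_A.sum()))                          # trace(A) = sum of eigenvalues
print(np.isclose(np.linalg.det(A), eig_A.prod()))                    # det(A) = product of eigenvalues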

1.6 Exercises Chap. 1

1.6.1 Exercises Sect. 1.1

Exercise 1.1 (Strassen Multiplication (Exam Exercise 2017-1)) (By arithmetic operations we mean additions, subtractions, multiplications and divisions.) Let A and B be n × n real matrices.
a) With A, B ∈ Rn×n, how many arithmetic operations are required to form the product AB?
b) Consider the 2n × 2n block matrix

$$\begin{bmatrix} W & X\\ Y & Z \end{bmatrix} = \begin{bmatrix} A & B\\ C & D \end{bmatrix} \begin{bmatrix} E & F\\ G & H \end{bmatrix},$$

where all matrices A, . . . , Z are in Rn×n. How many operations does it take to compute W, X, Y and Z by the obvious algorithm?
c) An alternative method to compute W, X, Y and Z is to use Strassen's formulas:

P1 = (A + D)(E + H),   P2 = (C + D)E,   P3 = A(F − H),   P4 = D(G − E),
P5 = (A + B)H,   P6 = (C − A)(E + F),   P7 = (B − D)(G + H),

W = P1 + P4 − P5 + P7,   X = P3 + P5,   Y = P2 + P4,   Z = P1 + P3 − P2 + P6.

You do not have to verify these formulas. What is the operation count for this method?
d) Describe a recursive algorithm, based on Strassen's formulas, which given two matrices A and B of size m × m, with m = 2^k for some k ≥ 0, calculates the product AB.
e) Show that the operation count of the recursive algorithm is O(m^{log₂ 7}). Note that log₂(7) ≈ 2.8 < 3, so this is less costly than straightforward matrix multiplication.
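For part d), one possible recursive organization is sketched below in Python/NumPy (an illustration under the stated assumption m = 2^k, not a tuned implementation; in practice one switches to ordinary multiplication below some block size).

import numpy as np

def strassen(A, B):
    """Multiply two m x m matrices, m a power of 2, using Strassen's formulas."""
    m = A.shape[0]
    if m == 1:
        return A * B
    h = m // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    P1 = strassen(A11 + A22, B11 + B22)   # (A + D)(E + H)
    P2 = strassen(A21 + A22, B11)         # (C + D)E
    P3 = strassen(A11, B12 - B22)         # A(F - H)
    P4 = strassen(A22, B21 - B11)         # D(G - E)
    P5 = strassen(A11 + A12, B22)         # (A + B)H
    P6 = strassen(A21 - A11, B11 + B12)   # (C - A)(E + F)
    P7 = strassen(A12 - A22, B21 + B22)   # (B - D)(G + H)
    W = P1 + P4 - P5 + P7
    X = P3 + P5
    Y = P2 + P4
    Z = P1 + P3 - P2 + P6
    return np.block([[W, X], [Y, Z]])

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
print(np.allclose(strassen(A, B), A @ B))   # True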


1.6.2 Exercises Sect. 1.3

Exercise 1.2 (The Inverse of a General 2 × 2 Matrix) Show that

$$\begin{bmatrix} a & b\\ c & d \end{bmatrix}^{-1} = \alpha \begin{bmatrix} d & -b\\ -c & a \end{bmatrix}, \qquad \alpha = \frac{1}{ad - bc},$$

for any a, b, c, d such that ad − bc ≠ 0.

Exercise 1.3 (The Inverse of a Special 2 × 2 Matrix) Find the inverse of

$$A = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}.$$

Exercise 1.4 (Sherman-Morrison Formula) Suppose A ∈ Cn×n, and B, C ∈ Rn×m for some n, m ∈ N. If (I + C^T A^{-1} B)^{-1} exists then

$$(A + BC^T)^{-1} = A^{-1} - A^{-1}B\,(I + C^T A^{-1} B)^{-1} C^T A^{-1}.$$

Exercise 1.5 (Inverse Update (Exam Exercise 1977-1))
a) Let u, v ∈ Rn and suppose v^T u ≠ 1. Show that I − uv^T has an inverse given by I − τ uv^T, where τ = 1/(v^T u − 1).
b) Let A ∈ Rn×n be nonsingular with inverse C := A^{-1}, and let a ∈ Rn. Let Ā be the matrix which differs from A by exchanging the ith row of A with a^T, i.e., Ā = A − e_i(e_i^T A − a^T), where e_i is the ith column in the identity matrix I. Show that if

$$\lambda := a^T C e_i \ne 0, \qquad (1.20)$$

then Ā has an inverse C̄ = Ā^{-1} given by

$$\bar{C} = C\Bigl(I + \frac{1}{\lambda}\, e_i (e_i^T - a^T C)\Bigr). \qquad (1.21)$$

c) Write an algorithm which, given C and a, checks whether (1.20) holds and computes C̄ provided λ ≠ 0. (Hint: use (1.21) to find formulas for computing each column of C̄.)

Exercise 1.6 (Matrix Products (Exam Exercise 2009-1)) Let A, B, C, E ∈ Rn×n be matrices where A^T = A. In this problem an (arithmetic) operation is an addition or a multiplication. We ask about exact numbers of operations.
a) How many operations are required to compute the matrix product BC? How many operations are required if B is lower triangular?
b) Show that there exists a lower triangular matrix L ∈ Rn×n such that A = L + L^T.

1 A Short Review of Linear Algebra

c) We have E T AE = S + S T where S = E T LE. How many operations are required to compute E T AE in this way?

1.6.3 Exercises Sect. 1.4 Exercise 1.7 (Cramer’s Rule; Special Case) Solve the following system by Cramer’s rule:      12 x1 3 = 21 x2 6 Exercise 1.8 (Adjoint Matrix; Special Case) Show that if ⎡

⎤ 2 −6 3 A = ⎣ 3 −2 −6 ⎦ , 6 3 2 then ⎡

⎤ 14 21 42 adj(A) = ⎣ −42 −14 21 ⎦ . 21 −42 14 Moreover, ⎤ 343 0 0 adj(A)A = ⎣ 0 343 0 ⎦ = det(A)I . 0 0 343 ⎡

Exercise 1.9 (Determinant Equation for a Plane) Show that  x   x1   x2  x 3

y y1 y2 y3

z z1 z2 z3

 1  1  = 0. 1  1

is the equation for a plane through three points (x1 , y1 , z1 ), (x2 , y2 , z2 ), (x3 , y3 , z3 ) in space.

1.6 Exercises Chap. 1

23

Fig. 1.1 The triangle T defined by the three points P1 , P2 and P3

Exercise 1.10 (Signed Area of a Triangle) Let Pi = (xi , yi ), i = 1, 2, 3, be three points in the plane defining a triangle T . Show that the area of T is2   1 1 1   1 A(T ) =  x1 x2 x3  . 2 y1 y2 y3  The area is positive if we traverse the vertices in counterclockwise order. Exercise 1.11 (Vandermonde Matrix) Show that  1  1  .  ..  1

 x12 · · · x1n−1  x22 · · · x2n−1   (xi − xj ), .. ..  = . .  i>j xn xn2 · · · xnn−1  x1 x2 .. .

  where i>j (xi − xj ) = ni=2 (xi − x1 )(xi − x2 ) · · · (xi − xi−1 ). This determinant is called the Vandermonde determinant.3 Exercise 1.12 (Cauchy Determinant (1842)) Let α = [α1 , . . . , αn ]T , β = [β1 , . . . , βn ]T be in Rn . a) Consider the matrix A ∈ Rn×n with elements ai,j = 1/(αi + βj ), i, j = 1, 2, . . . , n. Show that det(A) = P g(α)g(β)

2 Hint: 3 Hint:

A(T ) = A(ABP3 P1 ) + A(P3 BCP2 ) − A(P1 ACP2 ), c.f. Fig. 1.1. subtract xnk times column k from column k +1 for k = n−1, n−2, . . . , 1.

24

1 A Short Review of Linear Algebra

where P =

n

n

j =1 aij ,

i=1

g(γ ) =

and for γ = [γ1 , . . . , γn ]T

n 

(γi − γ1 )(γi − γ2 ) · · · (γi − γi−1 )

i=2

 Hint: Multiply the ith row of A by nj=1 (αi + βj ) for i = 1, 2, . . . , n. Call the resulting matrix C. Each element of C is a product of n−1 factors αr +βs . Hence det(C) is a sum of terms where each term contain precisely n(n − 1) factors αr + βs . Thus det(C) = q(α, β) where q is a polynomial of degree at most n(n−1) in αi and βj . Since det(A) and therefore det(C) vanishes if αi = αj for some i = j or βr = βs for some r = s, we have that q(α, β) must be divisible by each factor in g(α) and g(β). Since g(α) and g(β) is a polynomial of degree n(n−1), we have q(α, β) = kg(α)g(β) for some constant k independent of α and β. Show that k = 1 by choosing βi + αi = 0, i = 1, 2, . . . , n. b) Notice that the cofactor of any element in the above matrix A is the determinant of a matrix of similar form. Use the cofactor and determinant of A to represent the elements of A−1 = (bj,k ). Answer: bj,k = (αk + βj )Ak (−βj )Bj (−αk ), where Ak (x) =

 αs − x , αs − αk

Bk (x) =

s =k

 βs − x . βs − βk

s =k

Exercise 1.13 (Inverse of the Hilbert Matrix) Let H n = (hi,j ) be the n × n matrix with elements hi,j = 1/(i+j−1). Use Exercise 1.12 to show that the elements n ti,j in T n = H −1 n are given by n ti,j =

f (i)f (j ) , i+j −1

where f (i +1) =

i 2 − n2 i2

f (i),

i = 1, 2, . . . ,

f (1) = −n.

Part I

LU and QR Factorizations

The first three chapters in this part consider ways of factoring a matrix A into a lower triangular matrix L and an upper triangular matrix U resulting in the product A = LU . We also consider the factorization A = LDU , where L is lower triangular, D is diagonal and U is upper triangular. Moreover, L and U have ones on their respective diagonals. Three simple introductory problems and related LU factorizations are considered in Chap. 2. We also consider some basic properties of triangular matrices and the powerful tool of block multiplication. We consider Gaussian elimination, it’s relation to LU factorization, and the general theory of LU factorizations in Chap. 3. Symmetric positive definite matrices, where LU factorizations play an important role, are considered in Chap. 4. There exists problems where Gaussian elimination leads to inaccurate results. Such problems can to a large extent be avoided by using an alternative method based on QR factorization. Here A = QR, where Q is unitary, i.e., Q∗ Q = I , and R is upper triangular. QR factorization is related to Gram-Schmidt orthogonalization of a basis in a vector space. The QR factorization plays an important role in computational least squares and eigenvalue problems.

Chapter 2

Diagonally Dominant Tridiagonal Matrices; Three Examples

In this chapter we consider three problems originating from: • cubic spline interpolation, • a two point boundary value problem, • an eigenvalue problem for a two point boundary value problem. Each of these problems leads to a linear algebra problem with a matrix which is diagonally dominant and tridiagonal. Taking advantage of structure we can show existence, uniqueness and characterization of a solution, and derive efficient and stable algorithms based on LU factorization to compute a numerical solution. For a particular tridiagonal test matrix we determine all its eigenvectors and eigenvalues. We will need these later when studying more complex problems. We end the chapter with an introduction to block multiplication, a powerful tool in matrix analysis and numerical linear algebra. Block multiplication is applied to derive some basic facts about triangular matrices.

2.1 Cubic Spline Interpolation We consider the following interpolation problem. Given an interval [a, b] , n + 1 ≥ 2 equidistant sites in [a, b] xi = a +

i−1 (b − a), n

i = 1, 2, . . . , n + 1

(2.1)

and y values y := [y1 , . . . , yn+1 ]T ∈ Rn+1 . We seek a function g : [a, b] → R such that g(xi ) = yi , for i = 1, . . . , n + 1. © Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_2

(2.2) 27

28

2 Diagonally Dominant Tridiagonal Matrices; Three Examples

For simplicity we only consider equidistant sites. More generally they could be any a ≤ x1 < x2 < · · · < xn+1 ≤ b.

2.1.1 Polynomial Interpolation Since there are n + 1 interpolation conditions in (2.2) a natural choice for a function g is a polynomial of degree n. As shown in most books on numerical methods such a g is uniquely defined and there are good algorithms for computing it. Evidently, when n = 1, g is the straight line g(x) = y1 +

y2 − y1 (x − x1 ), x2 − x1

(2.3)

known as the linear interpolation polynomial. Polynomial interpolation is an important technique which often gives good results, but the interpolant g can have undesirable oscillations when n is large. As an example, consider the function given by f (x) = arctan(10x) + π/2,

x ∈ [−1, 1].

The function f and the polynomial g of degree at most 13 satisfying (2.2) with [a, b] = [−1, 1] and yi = f (xi ), i = 1, . . . , 14 is shown in Fig. 2.1. The interpolant has large oscillations near the end of the range. This is an example of the Runge phenomenon. Using larger n will only make the oscillations bigger.1

2.1.2 Piecewise Linear and Cubic Spline Interpolation To avoid oscillations like the one in Fig. 2.1 piecewise linear interpolation can be used. An example is shown in Fig. 2.2. The interpolant g approximates the original function quite well, and for some applications, like plotting, the linear interpolant using many points is what is used. Note that g is a piecewise polynomial of the form ⎧ ⎪ p1 (x), if x1 ≤ x < x2 , ⎪ ⎪ ⎪ ⎪ ⎪ if x2 ≤ x < x3 , p (x), ⎪ ⎨ 2 .. g(x) := . (2.4) ⎪ ⎪ ⎪ ⎪ ⎪ p (x), if xn−1 ≤ x < xn , ⎪ ⎪ n−1 ⎩ pn (x), if xn ≤ x ≤ xn+1 ,

1 This is due to the fact that the sites are uniformly spaced. High degree interpolation converges uniformly to the function being interpolated when a sequence consisting of the extrema of the Chebyshev polynomial of increasing degree is used as sites. This is not true for any continuous function (the Faber theorem), but holds if the function is Lipschitz continuous.

2.1 Cubic Spline Interpolation

29

8 6 4 2 −1.0

0.5

−0.5

1.0

−2 −4 −6 Fig. 2.1 The polynomial of degree 13 interpolating f (x) = arctan(10x) + π/2 on [−1, 1]. See text

3.0 2.5 2.0 1.5 1.0 0.5

−1.0

−0.5

0.5

1.0

Fig. 2.2 The piecewise linear polynomial interpolating f (x) = arctan(10x) + π/2 at n = 14 uniform points on [−1, 1]

where each pi is a polynomial of degree ≤ 1. In particular, p1 is given in (2.3) and the other polynomials pi are given by similar expressions. The piecewise linear interpolant is continuous, but the first derivative will usually have jumps at the interior sites. We can obtain a smoother approximation by letting g be a piecewise polynomial of higher degree. With degree 3 (cubic) we obtain continuous derivatives of order ≤ 2 (C 2 ). We consider here the following functions giving examples of C 2 cubic spline interpolants.

30

2 Diagonally Dominant Tridiagonal Matrices; Three Examples

Definition 2.1 (The D2 -Spline Problem) Given n ∈ N, an interval [a, b], y ∈ Rn+1 , knots (sites) x1 , . . . , xn+1 given by (2.1) and numbers μ1 , μn+1 . The problem is to find a function g : [a, b] → R such that • piecewise cubic polynomial: g is of the form (2.4) with each pi a cubic polynomial, , • smoothness: g ∈ C 2 [a, b], i.e., derivatives of order ≤ 2 are continuous on R, • interpolation: g(xi ) = yi , i = 1, 2, . . . , n + 1, • D 2 boundary conditions: g  (a) = μ1 , g  (b) = μn+1 . We call g a D2 -spline. It is called an N-spline or natural spline if μ1 = μn+1 = 0. Example 2.1 (A D2 -Spline) Suppose we choose n = 2 and sample data from the function f : [0, 2] → R given by f (x) = x 4 . Thus we consider the D2 -spline problem with [a, b] = [0, 2], y := [0, 1, 16]T and μ1 = g  (0) = 0, μ3 = g  (2) = 48. The knots are x1 = 0, x2 = 1 and x3 = 2. The function g given by g(x) :=

p1 (x) = − 12 x + 32 x 3 , p2 (x) = 1 + 4(x − 1) +

if 0 ≤ x < 1, 9 2 (x



1)2

+

13 2 (x



1)3 ,

if 1 ≤ x ≤ 2, (2.5)

is a D2 -spline solving this problem. Indeed, p1 and p2 are cubic polynomials. For smoothness we find p1 (1) = p2 (1) = 1, p1 (1) = p2 (1) = 4, p1 (1) = p2 (1) = 9 which implies that g ∈ C 2 [0, 2]. Finally we check that the interpolation and boundary conditions hold. Indeed, g(0) = p1 (0) = 0, g(1) = p2 (1) = 1, g(2) = p2 (2) = 16, g  (0) = p1 (0) = 0 and g  (2) = p2 (2) = 48. Note that p1 (x) = 9 = 39 = p2 (x) showing that the third derivative of g is piecewise constant with a jump discontinuity at the interior knot. A plot of f and g is shown in Fig. 2.3. It is hard to distinguish one from the other. We note that • The C 2 condition is equivalent to (j )

(j )

pi−1 (xi ) = pi (xi ),

j = 0, 1, 2,

i = 2, . . . , n.

• The extra boundary conditions D2 or N are introduced to obtain a unique interpolant. Indeed counting requirements we have 3(n − 1) C 2 conditions, n + 1 conditions (2.2), and two boundary conditions, adding to 4n. Since a cubic polynomial has four coefficients, this number is equal to the number of coefficients of the n polynomials p1 , . . . , pn and give hope for uniqueness of the interpolant.

2.1 Cubic Spline Interpolation

31

Fig. 2.3 A cubic spline with one knot interpolating f (x) = x 4 on [0, 2]

2.1.3 Give Me a Moment Existence and uniqueness of a solution of the D2 -spline problem hinges on the nonsingularity of a linear system of equations that we now derive. The unknowns are derivatives at the knots. Here we use second derivatives which are sometimes called moments. We start with the following lemma. Lemma 2.1 (Representing Each pi Using (0, 2) Interpolation) Given a < b, h = (b−a)/n with n ≥ 2, xi = a+(i−1)h, and numbers yi , μi for i = 1, . . . , n+1. For i = 1, . . . , n there are unique cubic polynomials pi such that pi (xi ) = yi , pi (xi+1 ) = yi+1 ,

pi (xi ) = μi , pi (xi+1 ) = μi+1 .

(2.6)

Moreover, pi (x) = ci,1 + ci,2 (x − xi ) + ci,3 (x − xi )2 + ci,4 (x − xi )3

i = 1, . . . , n,

(2.7)

where ci1 = yi , ci2 =

h yi+1 − yi h μi μi+1 − μi − μi − μi+1 , ci,3 = , ci,4 = . h 3 6 2 6h (2.8)

32

2 Diagonally Dominant Tridiagonal Matrices; Three Examples

Proof Consider pi in the form (2.7) for some 1 ≤ i ≤ n. Evoking (2.6) we find pi (xi ) = ci,1 = yi . Since pi (x) = 2ci,3 +6ci,4 (x −xi ) we obtain ci,3 from pi (xi ) = 2ci,3 = μi (a moment), and then ci,4 from pi (xi+1 ) = μi + 6hci,4 = μi+1 . Finally we find ci,2 by solving pi (xi+1 ) = yi + ci,2 h + μ2i h2 + μi+16h−μi h3 = yi+1 . For j = 0, 1, 2, 3 the shifted powers (x − xi )j constitute a basis for cubic polynomials and the formulas (2.8) are unique by construction. It follows that pi is unique.   Theorem 2.1 (Constructing a D2 -Spline) Suppose for some moments μ1 , . . . , μn+1 that each pi is given as in Lemma 2.1 for i = 1, . . . , n. If in addition μi−1 + 4μi + μi+1 =

6 (yi+1 − 2yi + yi−1 ), h2

i = 2, . . . , n,

(2.9)

then the function g given by (2.4) solves a D2 -spline problem. Proof Suppose for 1 ≤ i ≤ n that pi is given as in Lemma 2.1 for some μ1 , . . . , μn+1 . Consider the C 2 requirement. Since pi−1 (xi ) = pi (xi ) = yi and  (x ) = p (x ) = μ for i = 2, . . . , n it follows that g ∈ C 2 if and only if pi−1 i i i i  (x ) = p (x ) for i = 2, . . . , n. By (2.7) pi−1 i i i  pi−1 (xi ) = ci−1,2 + 2hci−1,3 + 3h2 ci−1,4

yi − yi−1 h μi−1 h μi − μi−1 − μi−1 − μi + 2h + 3h2 h 3 6 2 6h h yi − yi−1 h + μi−1 + μi = h 6 3 h − y y h i+1 i − μi − μi+1 . pi (xi ) = ci2 = h 3 6 =

(2.10)

 (x ) = p (x ) if and only if (2.9) holds. A simple calculation shows that pi−1 i i i Finally consider the function g given by (2.4). If (2.9) holds then g ∈ C 2 [a, b]. By construction g(xi ) = yi , i = 1, . . . , n + 1, g  (a) = p1 (x1 ) = μ1 and g  (b) = pn (xn+1 ) = μn+1 . It follows that g solves the D2 -spline problem.  

In order for the D2 -spline to exist we need to show that μ2 , . . . , μn always can be determined from (2.9). For n ≥ 3 and with μ1 and μn+1 given (2.9) can be written in the form ⎡

4 ⎢1 ⎢ ⎢ ⎢ ⎢ ⎣

⎤⎡ ⎤ ⎡ 2 ⎤ μ2 δ y 2 − μ1 1 ⎥ ⎢ μ3 ⎥ ⎢ ⎥ δ 2 y3 4 1 ⎥⎢ ⎥ ⎢ ⎥ 6 ⎢ .. ⎥ ⎢ ⎥ 2 . .. .. .. ⎥ . = ⎥ ⎢ ⎥ ⎢ ⎥ , δ yi := yi+1 − 2yi + yi−1 . . . . ⎥ ⎢ . ⎥ h2 ⎢ . ⎥ 2 ⎦ ⎣ ⎦ ⎣ 1 4 1 μn−1 δ yn−1 ⎦ 1 4 μn δ 2 yn − μn+1 (2.11)

2.1 Cubic Spline Interpolation

33

This is a square linear system of equations. We recall (see Theorem 1.6) that a square system Ax = b has a solution for all right hand sides b if and only if the coefficient matrix A is nonsingular, i.e., the homogeneous system Ax = 0 only has the solution x = 0. Moreover, the solution is unique. We need to show that the coefficient matrix in (2.11) is nonsingular. We observe that the matrix in (2.11) is strictly diagonally dominant in accordance with the following definition. Definition 2.2 (Strict Diagonal Dominance) The matrix A = [aij ] ∈ Cn×n is strictly diagonally dominant if |aii | >

|aij |, i = 1, . . . , n.

(2.12)

j =i

Theorem 2.2 (Strict Diagonal Dominance) A strictly diagonally dominant matrix is nonsingular. Moreover, if A ∈ Cn×n is strictly diagonally dominant then the solution x of Ax = b is bounded as follows:  |b | 

i , where σi := |aii | − |aij |. 1≤i≤n σi

max |xi | ≤ max

1≤i≤n

(2.13)

j =i

Proof We first show that the bound (2.13) holds for any solution x. Choose k so that |x_k| = \max_i |x_i|. Then

    |b_k| = \big|a_{kk}x_k + \sum_{j \ne k} a_{kj}x_j\big| \ge |a_{kk}||x_k| - \sum_{j \ne k}|a_{kj}||x_j| \ge |x_k|\big(|a_{kk}| - \sum_{j \ne k}|a_{kj}|\big),

and this implies \max_{1\le i\le n}|x_i| = |x_k| \le |b_k|/\sigma_k \le \max_{1\le i\le n} |b_i|/\sigma_i. For the nonsingularity, if Ax = 0, then \max_{1\le i\le n}|x_i| \le 0 by (2.13), and so x = 0. □

For an alternative simple proof of the nonsingularity based on the Gershgorin circle theorem see Exercise 14.3. Theorem 2.2 implies that the system (2.11) has a unique solution giving rise to a function g detailed in Lemma 2.1 and solving the D2-spline problem. For uniqueness suppose g_1 and g_2 are two D2-splines interpolating the same data. Then g := g_1 - g_2 is an N-spline satisfying (2.2) with y = 0. The solution [\mu_2, \dots, \mu_n]^T of (2.11) and also \mu_1 = \mu_{n+1} are zero. It follows from (2.8) that all coefficients c_{i,j} are zero. We conclude that g = 0 and g_1 = g_2.

Example 2.2 (Cubic B-Spline) For the N-spline with [a, b] = [0, 4] and y = [0, 1/6, 2/3, 1/6, 0] the linear system (2.9) takes the form

    \begin{bmatrix} 4 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 4 \end{bmatrix} \begin{bmatrix} \mu_2 \\ \mu_3 \\ \mu_4 \end{bmatrix} = \begin{bmatrix} 2 \\ -6 \\ 2 \end{bmatrix}.



Fig. 2.4 A cubic B-spline

The solution is \mu_2 = \mu_4 = 1, \mu_3 = -2. The knot set is {0, 1, 2, 3, 4}. Using (2.8) (cf. Exercise 2.6) we find

    g(x) := \begin{cases} p_1(x) = \frac16 x^3, & 0 \le x < 1, \\ p_2(x) = \frac16 + \frac12(x-1) + \frac12(x-1)^2 - \frac12(x-1)^3, & 1 \le x < 2, \\ p_3(x) = \frac23 - (x-2)^2 + \frac12(x-2)^3, & 2 \le x < 3, \\ p_4(x) = \frac16 - \frac12(x-3) + \frac12(x-3)^2 - \frac16(x-3)^3, & 3 \le x \le 4. \end{cases}    (2.14)

A plot of this spline is shown in Fig. 2.4. On (0, 4) the function g equals the nonzero part of a function known as a C^2 cubic B-spline.
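The coefficients in (2.14) are easy to check with a few lines of MATLAB. The following sketch is ours (not one of the book's listings); it solves the 3 by 3 moment system with the backslash operator and rebuilds the coefficients from (2.8), with h = 1 and variable names chosen only for illustration.

y = [0; 1/6; 2/3; 1/6; 0]; h = 1; n = 4;
A = [4 1 0; 1 4 1; 0 1 4];
b = 6/h^2*(y(3:n+1) - 2*y(2:n) + y(1:n-1));   % right-hand side, equals [2; -6; 2]
mu = [0; A\b; 0];                             % moments; mu_1 = mu_5 = 0 for the N-spline
C = [y(1:n), (y(2:n+1)-y(1:n))/h - h*mu(1:n)/3 - h*mu(2:n+1)/6, ...
     mu(1:n)/2, (mu(2:n+1)-mu(1:n))/(6*h)]    % row i holds c_{i,1},...,c_{i,4}

The rows of C agree with the four cubic pieces listed in (2.14).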

2.1.4 LU Factorization of a Tridiagonal System

To find the D2-spline g we have to solve the tridiagonal system (2.11). Consider solving a general tridiagonal linear system Ax = b where A = tridiag(a_i, d_i, c_i) \in C^{n \times n}.



Instead of using Gaussian elimination directly, we can construct two matrices L and U such that A = LU. Since Ax = LUx = b we can find x by solving the two systems Lz = b and Ux = z. Moreover L and U are both triangular and bidiagonal, and if in addition they are nonsingular the two systems can be solved easily without using elimination. In our case we write the product A = LU in the form

    \begin{bmatrix} d_1 & c_1 & & & \\ a_1 & d_2 & c_2 & & \\ & \ddots & \ddots & \ddots & \\ & & a_{n-2} & d_{n-1} & c_{n-1} \\ & & & a_{n-1} & d_n \end{bmatrix}
    = \begin{bmatrix} 1 & & & \\ l_1 & 1 & & \\ & \ddots & \ddots & \\ & & l_{n-1} & 1 \end{bmatrix}
      \begin{bmatrix} u_1 & c_1 & & \\ & \ddots & \ddots & \\ & & u_{n-1} & c_{n-1} \\ & & & u_n \end{bmatrix}.    (2.15)

To find L and U we first consider the case n = 3. Equation (2.15) takes the form

    \begin{bmatrix} d_1 & c_1 & 0 \\ a_1 & d_2 & c_2 \\ 0 & a_2 & d_3 \end{bmatrix}
    = \begin{bmatrix} 1 & 0 & 0 \\ l_1 & 1 & 0 \\ 0 & l_2 & 1 \end{bmatrix}
      \begin{bmatrix} u_1 & c_1 & 0 \\ 0 & u_2 & c_2 \\ 0 & 0 & u_3 \end{bmatrix}
    = \begin{bmatrix} u_1 & c_1 & 0 \\ l_1 u_1 & l_1 c_1 + u_2 & c_2 \\ 0 & l_2 u_2 & l_2 c_2 + u_3 \end{bmatrix},

and the systems Lz = b and Ux = z can be written

    \begin{bmatrix} 1 & 0 & 0 \\ l_1 & 1 & 0 \\ 0 & l_2 & 1 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \qquad
    \begin{bmatrix} u_1 & c_1 & 0 \\ 0 & u_2 & c_2 \\ 0 & 0 & u_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}.

Comparing elements we find

    u_1 = d_1, \quad l_1 = a_1/u_1, \quad u_2 = d_2 - l_1 c_1, \quad l_2 = a_2/u_2, \quad u_3 = d_3 - l_2 c_2,
    z_1 = b_1, \quad z_2 = b_2 - l_1 z_1, \quad z_3 = b_3 - l_2 z_2,
    x_3 = z_3/u_3, \quad x_2 = (z_2 - c_2 x_3)/u_2, \quad x_1 = (z_1 - c_1 x_2)/u_1.

In general, if

    u_1 = d_1, \quad l_k = a_k/u_k, \quad u_{k+1} = d_{k+1} - l_k c_k, \quad k = 1, 2, \dots, n-1,    (2.16)

then A = LU. If u_1, u_2, \dots, u_{n-1} are nonzero then (2.16) is well defined. If in addition u_n \ne 0 then we can solve Lz = b and Ux = z for z and x. We formulate this as two algorithms. In trifactor, vectors l \in C^{n-1}, u \in C^n are computed from a, c \in C^{n-1}, d \in C^n. This implements the LU factorization of a tridiagonal matrix:



function [l,u]=trifactor(a,d,c) % [l,u]=trifactor(a,d,c) u=d; l=a; for k =1:length(a) l(k)=a(k)/u(k); u(k+1)=d(k+1)-l(k)*c(k); end Listing 2.1 trifactor

In trisolve, the solution x of a tridiagonal system with r right-hand sides is computed from a previous call to trifactor. Here l \in C^{n-1} and u \in C^n are output from trifactor, c \in C^{n-1} is the superdiagonal in (2.15), and b \in C^{n \times r} for some r \in N:
function x = trisolve(l,u,c,b)
% x = trisolve(l,u,c,b)
x=b; n=size(b,1);
for k=2:n
    x(k,:)=b(k,:)-l(k-1)*x(k-1,:);        % forward substitution: Lz = b
end
x(n,:)=x(n,:)/u(n);
for k=(n-1):-1:1
    x(k,:)=(x(k,:)-c(k)*x(k+1,:))/u(k);   % back substitution: Ux = z
end
Listing 2.2 trisolve
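As a quick illustration of how the two routines are combined (our sketch, not one of the book's listings), the following lines factor a strictly diagonally dominant tridiagonal matrix and compare the result with MATLAB's backslash solve; the test data are arbitrary.

n = 8;
a = -ones(n-1,1); c = -ones(n-1,1); d = 4*ones(n,1);   % A = tridiag(-1,4,-1)
A = diag(d) + diag(a,-1) + diag(c,1);
b = (1:n)';
[l,u] = trifactor(a,d,c);
x = trisolve(l,u,c,b);
norm(A*x - b)          % of the order of rounding errors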

Since division by zero can occur, the algorithms will not work in general, but for tridiagonal strictly diagonally dominant systems we have

Theorem 2.3 (LU of a Tridiagonal Strictly Diagonally Dominant System) A strictly diagonally dominant tridiagonal matrix has a unique LU factorization of the form (2.15).

Proof We show that the u_k's in (2.16) are nonzero for k = 1, \dots, n. For this it is sufficient to show by induction that

    |u_k| \ge \sigma_k + |c_k|, \quad \text{where } \sigma_k := |d_k| - |a_{k-1}| - |c_k| > 0, \quad k = 1, \dots, n,    (2.17)

and where a_0 := c_n := 0. By assumption |u_1| = |d_1| = \sigma_1 + |c_1|. Suppose |u_k| \ge \sigma_k + |c_k| for some 1 \le k \le n-1. Then |c_k|/|u_k| < 1 and by (2.16) and strict diagonal dominance

    |u_{k+1}| = |d_{k+1} - l_k c_k| = \Big|d_{k+1} - \frac{a_k c_k}{u_k}\Big| \ge |d_{k+1}| - \frac{|a_k||c_k|}{|u_k|}    (2.18)
             \ge |d_{k+1}| - |a_k| = \sigma_{k+1} + |c_{k+1}|.  □



Corollary 2.1 (Stability of the LU Factorization) Suppose A \in C^{n \times n} is tridiagonal and strictly diagonally dominant with computed elements in the LU factorization given by (2.16). Then (2.17) holds, u_1 = d_1 and

    |l_k| = \frac{|a_k|}{|u_k|} \le \frac{|a_k|}{|d_k| - |a_{k-1}|}, \quad |u_{k+1}| \le |d_{k+1}| + \frac{|a_k||c_k|}{|d_k| - |a_{k-1}|}, \quad k = 1, \dots, n-1.    (2.19)

Proof Using (2.16) and (2.17) for 1 \le k \le n-1 we find

    |l_k| = \frac{|a_k|}{|u_k|} \le \frac{|a_k|}{|d_k| - |a_{k-1}|}, \quad |u_{k+1}| \le |d_{k+1}| + |l_k||c_k| \le |d_{k+1}| + \frac{|a_k||c_k|}{|d_k| - |a_{k-1}|}.  □

• For a strictly diagonally dominant tridiagonal matrix it follows from Corollary 2.1 that the LU factorization algorithm trifactor is stable, meaning that we cannot have severe growth in the computed elements u_k and l_k.
• The number of arithmetic operations to compute the LU factorization of a tridiagonal matrix of order n using (2.16) is 3n - 3, while the number of arithmetic operations for Algorithm trisolve is r(5n - 4), where r is the number of right-hand sides. This means that the complexity of solving a tridiagonal system is O(n), or more precisely 8n - 7 when r = 1, and this number grows only linearly with n. (We show in Sect. 3.3.2 that Gaussian elimination on a full n × n system is an O(n^3) process.)

2.2 A Two Point Boundary Value Problem

Consider the simple two point boundary value problem

    -u''(x) = f(x), \quad x \in [0, 1], \qquad u(0) = 0, \quad u(1) = 0,    (2.20)

where f is a given continuous function on [0, 1] and u is an unknown function. This problem is also known as the one-dimensional (1D) Poisson problem. In principle it is easy to solve (2.20) exactly. We just integrate f twice and determine the two integration constants so that the homogeneous boundary conditions u(0) = u(1) = 0 are satisfied. For example, if f(x) = 1 then u(x) = x(1 - x)/2 is the solution.

Suppose f cannot be integrated exactly. Problem (2.20) can then be solved approximately using the finite difference method. We need a difference approximation to the second derivative. If g is a function differentiable at x then

    g'(x) = \lim_{h \to 0} \frac{g(x + h/2) - g(x - h/2)}{h},



and applying this to a function u that is twice differentiable at x,

    u''(x) = \lim_{h \to 0} \frac{u'(x + h/2) - u'(x - h/2)}{h}
           = \lim_{h \to 0} \frac{\frac{u(x+h)-u(x)}{h} - \frac{u(x)-u(x-h)}{h}}{h}
           = \lim_{h \to 0} \frac{u(x + h) - 2u(x) + u(x - h)}{h^2}.

To define the points where this difference approximation is used we choose a positive integer m, let h := 1/(m + 1) be the discretization parameter, and replace the interval [0, 1] by grid points x_j := jh for j = 0, 1, \dots, m+1. We then obtain approximations v_j to the exact solution u(x_j) for j = 1, \dots, m by replacing the differential equation by the difference equation

    \frac{-v_{j-1} + 2v_j - v_{j+1}}{h^2} = f(jh), \quad j = 1, \dots, m, \qquad v_0 = v_{m+1} = 0.

Moving the h^2 factor to the right-hand side this can be written as an m × m linear system

    Tv = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}
         \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{m-1} \\ v_m \end{bmatrix}
       = h^2 \begin{bmatrix} f(h) \\ f(2h) \\ \vdots \\ f((m-1)h) \\ f(mh) \end{bmatrix} =: b.    (2.21)

The matrix T is called the second derivative matrix and will occur frequently in this book. It is our second example of a tridiagonal matrix, T = tridiag(a_i, d_i, c_i) \in R^{m \times m}, where in this case a_i = c_i = -1 and d_i = 2 for all i.

2.2.1 Diagonal Dominance

We want to show that (2.21) has a unique solution. Note that T is not strictly diagonally dominant. However, T is weakly diagonally dominant in accordance with the following definition.

Definition 2.3 (Diagonal Dominance) The matrix A = [a_{ij}] \in C^{n \times n} is weakly diagonally dominant if

    |a_{ii}| \ge \sum_{j \ne i} |a_{ij}|, \quad i = 1, \dots, n.    (2.22)



We showed in Theorem 2.2 that a strictly diagonally dominant matrix is nonsingular. This is in general not true in the weakly diagonally dominant case. Consider the three matrices

    A_1 = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad
    A_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
    A_3 = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}.

They are all weakly diagonally dominant, but A_1 and A_2 are singular, while A_3 is nonsingular. Indeed, for A_1 column two is the sum of columns one and three, A_2 has a zero row, and det(A_3) = 4 \ne 0. It follows that for the nonsingularity and existence of an LU factorization of a weakly diagonally dominant matrix we need some additional conditions. Here are some sufficient conditions.

Theorem 2.4 (Weak Diagonal Dominance) Suppose A = tridiag(a_i, d_i, c_i) \in C^{n \times n} is tridiagonal and weakly diagonally dominant. If in addition |d_1| > |c_1| and a_i \ne 0 for i = 1, \dots, n-2, then A has a unique LU factorization (2.15). If in addition d_n \ne 0, then A is nonsingular.

Proof The proof is similar to the proof of Theorem 2.2. The matrix A has an LU factorization if the u_k's in (2.16) are nonzero for k = 1, \dots, n-1. For this it is sufficient to show by induction that |u_k| > |c_k| for k = 1, \dots, n-1. By assumption |u_1| = |d_1| > |c_1|. Suppose |u_k| > |c_k| for some 1 \le k \le n-2. Then |c_k|/|u_k| < 1 and by (2.16), and since a_k \ne 0,

    |u_{k+1}| = |d_{k+1} - l_k c_k| = \Big|d_{k+1} - \frac{a_k c_k}{u_k}\Big| \ge |d_{k+1}| - \frac{|a_k||c_k|}{|u_k|} > |d_{k+1}| - |a_k|.    (2.23)

This also holds for k = n-1 if a_{n-1} \ne 0. By (2.23) and weak diagonal dominance |u_{k+1}| > |d_{k+1}| - |a_k| \ge |c_{k+1}|, and it follows by induction that an LU factorization exists. It is unique since any LU factorization must satisfy (2.16). For the nonsingularity we need to show that u_n \ne 0. For then by Lemma 2.5, both L and U are nonsingular, and this is equivalent to A = LU being nonsingular. If a_{n-1} \ne 0 then by (2.23) |u_n| > |d_n| - |a_{n-1}| \ge 0 by weak diagonal dominance, while if a_{n-1} = 0 then by (2.16) |u_n| = |d_n| > 0.  □

Consider now the special system Tv = b given by (2.21). The matrix T is weakly diagonally dominant and satisfies the additional conditions in Theorem 2.4. Thus it is nonsingular and we can solve the system in O(n) arithmetic operations using the algorithms trifactor and trisolve. We could use the explicit inverse of T, given in Exercise 2.15, to compute the solution of Tv = b as v = T^{-1}b. However, this is not a good idea. In fact, all elements of T^{-1} are nonzero and the calculation of T^{-1}b requires O(n^2) operations.
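To make this concrete, here is a small sketch (ours, not a listing from the book) that sets up and solves (2.21) with trifactor and trisolve for f(x) = 1, where the exact solution u(x) = x(1 - x)/2 is known.

m = 50; h = 1/(m+1); xj = (1:m)'*h;
a = -ones(m-1,1); c = -ones(m-1,1); d = 2*ones(m,1);   % T = tridiag(-1,2,-1)
b = h^2*ones(m,1);                                     % f = 1
[l,u] = trifactor(a,d,c);
v = trisolve(l,u,c,b);
max(abs(v - xj.*(1-xj)/2))   % u is quadratic, so the scheme is exact up to rounding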



2.3 An Eigenvalue Problem Recall that if A ∈ Cn×n is a square matrix and Ax = λx for some nonzero x ∈ Cn , then λ ∈ C is called an eigenvalue and x an eigenvector. We call (λ, x) an eigenpair of A.

2.3.1 The Buckling of a Beam

Consider a horizontal beam of length L located between 0 and L on the x-axis of the plane. We assume that the beam is fixed at x = 0 and x = L and that a force F is applied at (L, 0) in the direction towards the origin. This situation can be modeled by the boundary value problem

    R y''(x) = -F y(x), \qquad y(0) = y(L) = 0,    (2.24)

where y(x) is the vertical displacement of the beam at x, and R is a constant defined by the rigidity of the beam. We can transform the problem to the unit interval [0, 1] by considering the function u : [0, 1] \to R given by u(t) := y(tL). Since u''(t) = L^2 y''(tL), the problem (2.24) then becomes

    u''(t) = -K u(t), \qquad u(0) = u(1) = 0, \qquad K := \frac{F L^2}{R}.    (2.25)

Clearly u = 0 is a solution, but we can have nonzero solutions corresponding to certain values of K known as eigenvalues. The corresponding function u is called an eigenfunction. If F = 0 then K = 0 and u = 0 is the only solution, but if the force is increased it will reach a critical value where the beam will buckle and maybe break. This critical value corresponds to the smallest eigenvalue of (2.25). With u(t) = sin(\pi t) we find u''(t) = -\pi^2 u(t) and this u is a solution if K = \pi^2. It can be shown that this is the smallest eigenvalue of (2.25), and solving for F we find F = \pi^2 R / L^2.

We can approximate this eigenvalue numerically. Choosing m \in N, h := 1/(m+1) and using for the second derivative the approximation

    u''(jh) \approx \frac{u((j+1)h) - 2u(jh) + u((j-1)h)}{h^2}, \quad j = 1, \dots, m,

(this is the same finite difference approximation as in Sect. 2.2) we obtain

    \frac{-v_{j-1} + 2v_j - v_{j+1}}{h^2} = K v_j, \quad j = 1, \dots, m, \quad h = \frac{1}{m+1}, \quad v_0 = v_{m+1} = 0,



where v_j \approx u(jh) for j = 0, \dots, m+1. If we define \lambda := h^2 K then we obtain the equation

    Tv = \lambda v, \quad \text{with } v = [v_1, \dots, v_m]^T,    (2.26)

and

    T = T_m := tridiag_m(-1, 2, -1) = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \in R^{m \times m}.    (2.27)

The problem now is to determine the eigenvalues of T. Normally we would need a numerical method to determine the eigenvalues of a matrix, but for this simple problem the eigenvalues can be determined exactly. We show in the next subsection that the smallest eigenvalue of (2.26) is given by \lambda = 4\sin^2(\pi h/2). Since \lambda = h^2 K = h^2 F L^2 / R we can solve for F to obtain

    F = \frac{4\sin^2(\pi h/2) R}{h^2 L^2}.

For small h this is a good approximation to the value \pi^2 R / L^2 we computed above.
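A quick numerical sanity check (our sketch, not from the book) compares the smallest eigenvalue of T with the formula 4 sin^2(\pi h/2) and with the limiting value \pi^2 h^2:

m = 20; h = 1/(m+1);
T = diag(2*ones(m,1)) + diag(-ones(m-1,1),1) + diag(-ones(m-1,1),-1);
lambda_min = min(eig(T));
[lambda_min, 4*sin(pi*h/2)^2, pi^2*h^2]   % first two agree; third is close for small h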

2.4 The Eigenpairs of the 1D Test Matrix

The second derivative matrix T = tridiag(-1, 2, -1) is a special case of the tridiagonal matrix

    T_1 := tridiag(a, d, a),    (2.28)

where a, d \in R. We call this the 1D test matrix. It is symmetric and strictly diagonally dominant if |d| > 2|a|. We show that the eigenvectors are the columns of the sine matrix defined by

    S = \Big[\sin\frac{jk\pi}{m+1}\Big]_{j,k=1}^{m} \in R^{m \times m}.    (2.29)



For m = 3,

    S = [s_1, s_2, s_3] = \begin{bmatrix} \sin\frac{\pi}{4} & \sin\frac{2\pi}{4} & \sin\frac{3\pi}{4} \\ \sin\frac{2\pi}{4} & \sin\frac{4\pi}{4} & \sin\frac{6\pi}{4} \\ \sin\frac{3\pi}{4} & \sin\frac{6\pi}{4} & \sin\frac{9\pi}{4} \end{bmatrix}
      = \begin{bmatrix} t & 1 & t \\ 1 & 0 & -1 \\ t & -1 & t \end{bmatrix}, \qquad t := \frac{1}{\sqrt{2}}.

Lemma 2.2 (Eigenpairs of 1D Test Matrix) Suppose T_1 = (t_{kj})_{k,j} = tridiag(a, d, a) \in R^{m \times m} with m \ge 2, a, d \in R, and let h = 1/(m+1).

1. We have T_1 s_j = \lambda_j s_j for j = 1, \dots, m, where

    s_j = [\sin(j\pi h), \sin(2j\pi h), \dots, \sin(mj\pi h)]^T,    (2.30)
    \lambda_j = d + 2a\cos(j\pi h).    (2.31)

2. The eigenvalues are distinct and the eigenvectors are orthogonal:

    s_j^T s_k = \frac{m+1}{2}\delta_{j,k} = \frac{1}{2h}\delta_{j,k}, \quad j, k = 1, \dots, m.    (2.32)

Proof We find for 1 < k < m

    (T_1 s_j)_k = \sum_{l=1}^m t_{k,l}\sin(lj\pi h)
                = a\big(\sin((k-1)j\pi h) + \sin((k+1)j\pi h)\big) + d\sin(kj\pi h)
                = 2a\cos(j\pi h)\sin(kj\pi h) + d\sin(kj\pi h) = \lambda_j s_{k,j}.

This also holds for k = 1, m, and part 1 follows. Since j\pi h = j\pi/(m+1) \in (0, \pi) for j = 1, \dots, m and the cosine function is strictly monotone decreasing on (0, \pi), the eigenvalues are distinct, and since T_1 is symmetric it follows from Lemma 2.3 below that the eigenvectors s_j are orthogonal. To finish the proof of (2.32) we compute

    s_j^T s_j = \sum_{k=1}^m \sin^2(kj\pi h) = \sum_{k=0}^m \sin^2(kj\pi h) = \sum_{k=0}^m \frac{1}{2}\big(1 - \cos(2kj\pi h)\big)
             = \frac{m+1}{2} - \frac{1}{2}\sum_{k=0}^m \cos(2kj\pi h) = \frac{m+1}{2},

since the last cosine sum is zero. We show this by summing a geometric series of complex exponentials. With i = \sqrt{-1} we find

    \sum_{k=0}^m \cos(2kj\pi h) + i\sum_{k=0}^m \sin(2kj\pi h) = \sum_{k=0}^m e^{2ikj\pi h} = \frac{e^{2i(m+1)j\pi h} - 1}{e^{2ij\pi h} - 1} = 0,

and (2.32) follows.  □
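The following few lines (our illustration, not one of the book's listings) check Lemma 2.2 numerically for one particular choice of a and d:

m = 6; h = 1/(m+1); a = -1; d = 3;
T1 = diag(d*ones(m,1)) + diag(a*ones(m-1,1),1) + diag(a*ones(m-1,1),-1);
S = sin(pi*h*(1:m)'*(1:m));            % sine matrix (2.29)
lambda = d + 2*a*cos(pi*h*(1:m));      % eigenvalues (2.31)
norm(T1*S - S*diag(lambda))            % rounding-error size
norm(S'*S - eye(m)/(2*h))              % orthogonality relation (2.32)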

Recall that the conjugate transpose of a matrix is defined by A^* := \bar{A}^T, where \bar{A} is obtained from A by taking the complex conjugate of all elements. A matrix A \in C^{n \times n} is Hermitian if A^* = A. A real symmetric matrix is Hermitian.

Lemma 2.3 (Eigenpairs of a Hermitian Matrix) The eigenvalues of a Hermitian matrix are real. Moreover, eigenvectors corresponding to distinct eigenvalues are orthogonal.

Proof Suppose A^* = A and Ax = \lambda x with x \ne 0. We multiply both sides of Ax = \lambda x from the left by x^* and divide by x^*x to obtain \lambda = \frac{x^*Ax}{x^*x}. Taking complex conjugates we find \bar{\lambda} = \frac{(x^*Ax)^*}{(x^*x)^*} = \frac{x^*A^*x}{x^*x} = \frac{x^*Ax}{x^*x} = \lambda, and \lambda is real. Suppose that (\lambda, x) and (\mu, y) are two eigenpairs for A with \mu \ne \lambda. Multiplying Ax = \lambda x by y^* gives \lambda y^*x = y^*Ax = (x^*A^*y)^* = (x^*Ay)^* = (\mu x^*y)^* = \mu y^*x, using that \mu is real. Since \lambda \ne \mu it follows that y^*x = 0, which means that x and y are orthogonal.  □
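As a two-line numerical illustration (ours, not from the book):

A = [2, 1+2i; 1-2i, 3];   % Hermitian: A' equals A (MATLAB's ' is the conjugate transpose)
eig(A)                    % both eigenvalues are real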

2.5 Block Multiplication and Triangular Matrices Block multiplication is a powerful and essential tool for dealing with matrices. It will be used extensively in this book. We will also need some basic facts about triangular matrices.

2.5.1 Block Multiplication

A rectangular matrix A can be partitioned into submatrices by drawing horizontal lines between selected rows and vertical lines between selected columns. For example, the matrix

    A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}



can be partitioned, for example, as

    (i)  A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad
    (ii) A = [a_{:1}, a_{:2}, a_{:3}], \quad
    (iii) A = \begin{bmatrix} a_{1:}^T \\ a_{2:}^T \\ a_{3:}^T \end{bmatrix}, \quad
    (iv) A = [A_{11}, A_{12}].

In (i) the matrix A is divided into four submatrices

    A_{11} = [1], \quad A_{12} = [2, 3], \quad A_{21} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}, \quad A_{22} = \begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix},

while in (ii) and (iii) A has been partitioned into columns and rows, respectively. The submatrices in a partition are often referred to as blocks and a partitioned matrix is sometimes called a block matrix. In the following we assume that A \in C^{m \times p} and B \in C^{p \times n}. Here are some rules and observations for block multiplication.

1. If B = [b_{:1}, \dots, b_{:n}] is partitioned into columns then the partition of the product AB into columns is AB = [Ab_{:1}, Ab_{:2}, \dots, Ab_{:n}]. In particular, if I is the identity matrix of order p then A = AI = A[e_1, e_2, \dots, e_p] = [Ae_1, Ae_2, \dots, Ae_p], and we see that column j of A can be written Ae_j for j = 1, \dots, p.

2. Similarly, if A is partitioned into rows then AB = \begin{bmatrix} a_{1:}^T \\ \vdots \\ a_{m:}^T \end{bmatrix}B = \begin{bmatrix} a_{1:}^T B \\ \vdots \\ a_{m:}^T B \end{bmatrix}, and taking A = I it follows that row i of B can be written e_i^T B for i = 1, \dots, m.

3. It is often useful to write the matrix-vector product Ax as a linear combination of the columns of A: Ax = x_1 a_{:1} + x_2 a_{:2} + \dots + x_p a_{:p}.



4. If B = [B_1, B_2], where B_1 \in C^{p \times r} and B_2 \in C^{p \times (n-r)}, then A[B_1, B_2] = [AB_1, AB_2]. This follows from Rule 1 by an appropriate grouping of columns.

5. If A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}, where A_1 \in C^{k \times p} and A_2 \in C^{(m-k) \times p}, then \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}B = \begin{bmatrix} A_1 B \\ A_2 B \end{bmatrix}. This follows from Rule 2 by a grouping of rows.

6. If A = [A_1, A_2] and B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, where A_1 \in C^{m \times s}, A_2 \in C^{m \times (p-s)}, B_1 \in C^{s \times n} and B_2 \in C^{(p-s) \times n}, then [A_1, A_2]\begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = A_1 B_1 + A_2 B_2. Indeed,

    (AB)_{ij} = \sum_{k=1}^p a_{ik}b_{kj} = \sum_{k=1}^s a_{ik}b_{kj} + \sum_{k=s+1}^p a_{ik}b_{kj} = (A_1 B_1)_{ij} + (A_2 B_2)_{ij} = (A_1 B_1 + A_2 B_2)_{ij}.

7. If A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} and B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} then

    \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
    = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix},

provided the vertical partition in A matches the horizontal one in B, i.e. the number of columns in A_{11} and A_{21} equals the number of rows in B_{11} and B_{12}, and the number of columns in A equals the number of rows in B. To show this we use Rule 4 to obtain

    AB = \Big[\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}, \ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{bmatrix} B_{12} \\ B_{22} \end{bmatrix}\Big].

We complete the proof using Rules 5 and 6.



8. Consider finally the general case. If all the matrix products A_{ik}B_{kj} in

    C_{ij} = \sum_{k=1}^s A_{ik}B_{kj}, \quad i = 1, \dots, p, \quad j = 1, \dots, q,

are well defined, then

    \begin{bmatrix} A_{11} & \cdots & A_{1s} \\ \vdots & & \vdots \\ A_{p1} & \cdots & A_{ps} \end{bmatrix}
    \begin{bmatrix} B_{11} & \cdots & B_{1q} \\ \vdots & & \vdots \\ B_{s1} & \cdots & B_{sq} \end{bmatrix}
    = \begin{bmatrix} C_{11} & \cdots & C_{1q} \\ \vdots & & \vdots \\ C_{p1} & \cdots & C_{pq} \end{bmatrix}.

The requirements are that
• the number of columns in A is equal to the number of rows in B;
• the position of the vertical partition lines in A has to match the position of the horizontal partition lines in B. The horizontal lines in A and the vertical lines in B can be anywhere. (A small numerical check of these rules follows this list.)

2.5.2 Triangular Matrices

We need some basic facts about triangular matrices and we start with

Lemma 2.4 (Inverse of a Block Triangular Matrix) Suppose

    A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},

where A, A_{11} and A_{22} are square matrices. Then A is nonsingular if and only if both A_{11} and A_{22} are nonsingular. In that case

    A^{-1} = \begin{bmatrix} A_{11}^{-1} & C \\ 0 & A_{22}^{-1} \end{bmatrix},    (2.33)

for some matrix C.

Proof Suppose A is nonsingular. We partition B := A^{-1} conformally with A and have

    BA = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} = I.

Using block multiplication we find B_{11}A_{11} = I, B_{21}A_{11} = 0, B_{21}A_{12} + B_{22}A_{22} = I, B_{11}A_{12} + B_{12}A_{22} = 0.



The first equation implies that A_{11} is nonsingular, this in turn implies that B_{21} = 0 \cdot A_{11}^{-1} = 0 in the second equation, and then the third equation simplifies to B_{22}A_{22} = I. We conclude that also A_{22} is nonsingular. From the fourth equation we find B_{12} = C = -A_{11}^{-1}A_{12}A_{22}^{-1}.

Conversely, if A_{11} and A_{22} are nonsingular then

    \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}\begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} = I,

and A is nonsingular with the indicated inverse.  □

Consider now a triangular matrix.

Lemma 2.5 (Inverse of a Triangular Matrix) An upper (lower) triangular matrix A = [a_{ij}] \in C^{n \times n} is nonsingular if and only if the diagonal elements a_{ii}, i = 1, \dots, n, are nonzero. In that case the inverse is upper (lower) triangular with diagonal elements a_{ii}^{-1}, i = 1, \dots, n.

Proof We use induction on n. The result holds for n = 1: the 1-by-1 matrix A = [a_{11}] is nonsingular if and only if a_{11} \ne 0, and in that case A^{-1} = [a_{11}^{-1}]. Suppose the result holds for n = k and let A \in C^{(k+1) \times (k+1)} be upper triangular. We partition A in the form

    A = \begin{bmatrix} A_k & a_k \\ 0 & a_{k+1,k+1} \end{bmatrix},

and note that A_k \in C^{k \times k} is upper triangular. By Lemma 2.4, A is nonsingular if and only if A_k and (a_{k+1,k+1}) are nonsingular, and in that case

    A^{-1} = \begin{bmatrix} A_k^{-1} & c \\ 0 & a_{k+1,k+1}^{-1} \end{bmatrix},

for some c \in C^k. By the induction hypothesis A_k is nonsingular if and only if the diagonal elements a_{11}, \dots, a_{kk} of A_k are nonzero, and in that case A_k^{-1} is upper triangular with diagonal elements a_{ii}^{-1}, i = 1, \dots, k. The result for A follows.  □

Lemma 2.6 (Product of Triangular Matrices) The product C = AB = (c_{ij}) of two upper (lower) triangular matrices A = (a_{ij}) and B = (b_{ij}) is upper (lower) triangular with diagonal elements c_{ii} = a_{ii}b_{ii} for all i.

Proof Exercise.  □

A matrix is called unit triangular if it is triangular with 1's on the diagonal.



Lemma 2.7 (Unit Triangular Matrices) For a unit upper (lower) triangular matrix A \in C^{n \times n}:
1. A is nonsingular and the inverse is unit upper (lower) triangular.
2. The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular.

Proof Part 1 follows from Lemma 2.5, while Lemma 2.6 implies part 2.  □

2.6 Exercises Chap. 2

2.6.1 Exercises Sect. 2.1

Exercise 2.1 (The Shifted Power Basis Is a Basis) Show that the polynomials {(x - x_i)^j}_{0 \le j \le n} form a basis for polynomials of degree n. (Hint: consider an arbitrary polynomial of degree n and expand it in a Taylor series around x_i.)

Exercise 2.2 (The Natural Spline, n = 1) How can one define an N-spline when n = 1?

Exercise 2.3 (Bounding the Moments) Show that for the N-spline the solution of the linear system (2.11) is bounded as follows (hint: use Theorem 2.2):

    \max_{2 \le j \le n} |\mu_j| \le \frac{3}{h^2}\max_{2 \le i \le n}|y_{i+1} - 2y_i + y_{i-1}|.

Exercise 2.4 (Moment Equations for 1. Derivative Boundary Conditions) Suppose instead of the D2 boundary conditions we use D1 conditions given by g'(a) = s_1 and g'(b) = s_{n+1} for some given numbers s_1 and s_{n+1}. Show that the linear system for the moments of a D1-spline can be written

    \begin{bmatrix} 2 & 1 & & & \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ & & & 1 & 2 \end{bmatrix}
    \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \\ \mu_{n+1} \end{bmatrix}
    = \frac{6}{h^2}\begin{bmatrix} y_2 - y_1 - hs_1 \\ \delta^2 y_2 \\ \vdots \\ \delta^2 y_n \\ hs_{n+1} - y_{n+1} + y_n \end{bmatrix},    (2.34)

where \delta^2 y_i := y_{i+1} - 2y_i + y_{i-1}, i = 2, \dots, n. Hint: Use (2.10) to compute g'(x_1) and g'(x_{n+1}). Is g unique?



Exercise 2.5 (Minimal Norm Property of the Natural Spline) Study the proof of the following theorem. (The name spline is inherited from a "physical analogue", an elastic ruler that is used to draw smooth curves. Heavy weights, called ducks, are used to force the ruler to pass through, or near, given locations. The ruler takes a shape that minimizes its potential energy. Since the potential energy is proportional to the integral of the square of the curvature, and the curvature can be approximated by the second derivative, it follows from Theorem 2.5 that the mathematical N-spline approximately models the physical spline.)

Theorem 2.5 (Minimal Norm Property of a Cubic Spline) Suppose g is an N-spline. Then

    \int_a^b \big(g''(x)\big)^2 dx \le \int_a^b \big(h''(x)\big)^2 dx

for all h \in C^2[a, b] such that h(x_i) = g(x_i), i = 1, \dots, n+1.

Proof Let h be any interpolant as in the theorem. We first show the orthogonality condition

    \int_a^b g''e'' = 0, \qquad e := h - g.    (2.35)

Integration by parts gives \int_a^b g''e'' = \big[g''e'\big]_a^b - \int_a^b g'''e'. The first term is zero since g'' is continuous and g''(b) = g''(a) = 0. For the second term, since g''' is equal to a constant v_i on each subinterval (x_i, x_{i+1}) and e(x_i) = 0 for i = 1, \dots, n+1,

    \int_a^b g'''e' = \sum_{i=1}^n \int_{x_i}^{x_{i+1}} g'''e' = \sum_{i=1}^n v_i \int_{x_i}^{x_{i+1}} e' = \sum_{i=1}^n v_i\big(e(x_{i+1}) - e(x_i)\big) = 0.

Writing h = g + e and using (2.35),

    \int_a^b (h'')^2 = \int_a^b (g'' + e'')^2 = \int_a^b (g'')^2 + \int_a^b (e'')^2 + 2\int_a^b g''e'' = \int_a^b (g'')^2 + \int_a^b (e'')^2 \ge \int_a^b (g'')^2,

and the proof is complete.  □

Exercise 2.6 (Computing the D2-Spline) Let g be the D2-spline corresponding to an interval [a, b], a vector y \in R^{n+1} and \mu_1, \mu_{n+1}. The vector x = [x_1, \dots, x_n] and



the coefficient matrix C ∈ Rn×4 in (2.7) are returned in the following algorithm. It uses Algorithms 2.1 and 2.2 to solve the tridiagonal linear system. function [x,C]=splineint(a,b,y,mu1,munp1) % [x,C]=splineint(a,b,y,mu1,munp1) y=y(:); n=length(y)-1; h=(b-a)/n; x=a:h:b-h; c=ones(n-2,1); [l,u]= trifactor(c,4*ones(n-1,1),c); b1=6/h^2*(y(3:n+1)-2*y(2:n)+y(1:n-1)); b1(1)=b1(1)-mu1; b1(n-1)=b1(n-1)-munp1; mu= [mu1;trisolve(l,u,c,b1);munp1]; C=zeros(4*n,1); C(1:4:4*n-3)=y(1:n); C(2:4:4*n-2)=(y(2:n+1)-y(1:n))/h... -h*mu(1:n)/3-h*mu(2:n+1)/6; C(3:4:4*n-1)=mu(1:n)/2; C(4:4:4*n)=(mu(2:n+1)-mu(1:n))/(6*h); C=reshape(C,4,n)’; end Listing 2.3 splineint

Use the algorithm to compute the ci,j in Example 2.2. Exercise 2.7 (Spline Evaluation) To plot a piecewise polynomial g in the form (2.4) we need to compute values g(rj ) at a number of sites r = [r1 , . . . , rm ] ∈ Rm for some reasonably large integer m. To determine g(rj ) for some j we need to find an integer ij so that g(rj ) = pij (rj ). Given k ∈ N, t = [t1 , . . . , tk ] and a real number x. We consider the problem of computing an integer i so that i = 1 if x < t2 , i = k if x ≥ tk , and ti ≤ x < ti+1 otherwise. If x ∈ Rm is a vector then an m-vector i should be computed, such that the j th component of i gives the location of the j th component of x. The following MATLAB function determines i = [i1 , . . . , im ]. It uses the built in MATLAB functions length, min, sort, find. function i = findsubintervals(t,x) %i = findsubintervals(t,x) k=length(t); m=length(x); if kk)-(1:m))’; end Listing 2.4 findsubintervals



Fig. 2.5 The cubic spline interpolating f (x) = arctan(10x) + π/2 at 14 equidistant sites on [−1, 1]. The exact function is also shown

Use findsubintervals and the algorithm splineval below to make the plots in Fig. 2.5. function [X,G]=splineval(x,C,X) % [X,G]=splineval(x,C,X) m=length(X); i=findsubintervals(x,X); G=zeros(m,1); for j=1:m k=i(j); t=X(j)-x(k); G(j)=[1,t,t^2,t^3]*C(k,:)’; end Listing 2.5 splineval

Given output x, C of splineint, defining a cubic spline g, and a vector X, splineval computes the vector G = g(X).
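One possible way to produce a plot like Fig. 2.5 with these routines is sketched below. The script is ours, not one of the book's listings, and as an assumption we use the natural boundary moments \mu_1 = \mu_{n+1} = 0 (the figure does not state which boundary conditions were used).

f = @(x) atan(10*x) + pi/2;
a = -1; b = 1; n = 13;                % n + 1 = 14 equidistant interpolation sites
y = f(a:(b-a)/n:b);
[x,C] = splineint(a,b,y,0,0);
X = linspace(a,b,200)';
[X,G] = splineval(x,C,X);
plot(X,G,'-',X,f(X),'--',a:(b-a)/n:b,y,'o')
legend('cubic spline','f','data')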



2.6.2 Exercises Sect. 2.2

Exercise 2.8 (Central Difference Approximation of 2. Derivative) Consider

    \delta^2 f(x) := \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}, \quad h > 0, \quad f : [x-h, x+h] \to R.

a) Show using Taylor expansion that if f \in C^2[x-h, x+h] then for some \eta_2

    \delta^2 f(x) = f''(\eta_2), \quad x - h < \eta_2 < x + h.

b) Show that if f \in C^4[x-h, x+h] then for some \eta_4

    \delta^2 f(x) = f''(x) + \frac{h^2}{12}f^{(4)}(\eta_4), \quad x - h < \eta_4 < x + h.

\delta^2 f(x) is known as the central difference approximation to the second derivative at x.

(2.36)

u(b) = g1 .

We assume that the given functions f, q and r are continuous on [a, b] and that q(x) ≥ 0 for x ∈ [a, b]. It can then be shown that (2.36) has a unique solution u. To solve (2.36) numerically we choose m ∈ N, h = (b −a)/(m+1), xj = a +j h for j = 0, 1, . . . , m + 1 and solve the difference equation vj +1 − vj −1 −vj −1 + 2vj − vj +1 + q(xj )vj = f (xj ), + r(xj ) h2 2h

j = 1, . . . , m, (2.37)

with v0 = g0 and vm+1 = g1 . a) Show that (2.37) leads to a tridiagonal linear system Av = b, where A = tridiag(aj , dj .cj ) ∈ Rm×m has elements aj = −1 −

h h r(xj ), cj = −1 + r(xj ), dj = 2 + h2 q(xj ), 2 2

2.6 Exercises Chap. 2

53

and ⎧ 2 ⎪ ⎪ ⎨h f (x1 ) − a1 g0 , bj = h2 f (xj ), ⎪ ⎪ ⎩h2 f (x ) − c g , m

m 1

if j = 1, if 2 ≤ j ≤ m − 1, if j = m.

b) Show that the linear system satisfies the conditions in Theorem 2.4 if the spacing h is so small that h2 |r(x)| < 1 for all x ∈ [a, b]. (c) Propose a method to find v1 , . . . , vm . Exercise 2.10 (Two Point Boundary Value Problem; Computation) a) Consider the problem (2.36) with r = 0, f = q = 1 and boundary conditions u(0) = 1, u(1) = 0. The exact solution is u(x) = 1 − sinh x/ sinh 1. Write a computer program to solve (2.37) for h = 0.1, 0.05, 0.025, 0.0125, and compute the “error” max1≤j ≤m |u(xj ) − vj | for each h. b) Make a combined plot of the solution u and the computed points vj , j = 0, . . . , m + 1 for h = 0.1. c) One can show that the error is proportional to hp for some integer p. Estimate p based on the error for h = 0.1, 0.05, 0.025, 0.0125.

2.6.3 Exercises Sect. 2.3

Exercise 2.11 (Approximate Force) Show that

    F = \frac{4\sin^2(\pi h/2)R}{h^2 L^2} = \frac{\pi^2 R}{L^2} + O(h^2).

Exercise 2.12 (Symmetrize Matrix (Exam Exercise 1977-3)) Let A ∈ Rn×n be tridiagonal and suppose ai,i+1 ai+1,i > 0 for i = 1, . . . , n − 1. Show that there exists a diagonal matrix D = diag(d1 , . . . , dn ) with di > 0 for all i such that B := DAD −1 is symmetric.

2.6.4 Exercises Sect. 2.4 Exercise 2.13 (Eigenpairs T of Order 2) Compute directly the eigenvalues and eigenvectors for T when n = 2 and thus verify Lemma 2.2 in this case.



Exercise 2.14 (LU Factorization of 2. Derivative Matrix) Show that T = LU, where

    L = \begin{bmatrix} 1 & & & \\ -\frac12 & 1 & & \\ & \ddots & \ddots & \\ & & -\frac{m-1}{m} & 1 \end{bmatrix}, \qquad
    U = \begin{bmatrix} 2 & -1 & & \\ & \frac32 & \ddots & \\ & & \ddots & -1 \\ & & & \frac{m+1}{m} \end{bmatrix},    (2.38)

i.e., L is unit lower bidiagonal with subdiagonal entries -\frac12, -\frac23, \dots, -\frac{m-1}{m}, and U is upper bidiagonal with diagonal entries \frac21, \frac32, \dots, \frac{m+1}{m} and superdiagonal entries -1. This is the LU factorization of T.

Exercise 2.15 (Inverse of the 2. Derivative Matrix) Let S \in R^{m \times m} have elements s_{ij} given by

    s_{i,j} = s_{j,i} = \frac{j(m+1-i)}{m+1}, \quad 1 \le j \le i \le m.    (2.39)

Show that ST = I and conclude that T^{-1} = S.

2.6.5 Exercises Sect. 2.5

Exercise 2.16 (Matrix Element as a Quadratic Form) For any matrix A show that a_{ij} = e_i^T A e_j for all i, j.

Exercise 2.17 (Outer Product Expansion of a Matrix) For any matrix A \in C^{m \times n} show that A = \sum_{i=1}^m \sum_{j=1}^n a_{ij} e_i e_j^T.

Exercise 2.18 (The Product A^T A) Let B = A^T A. Explain why this product is defined for any matrix A. Show that b_{ij} = a_{:i}^T a_{:j} for all i, j.

Exercise 2.19 (Outer Product Expansion) For A \in R^{m \times n} and B \in R^{p \times n} show that AB^T = a_{:1}b_{:1}^T + a_{:2}b_{:2}^T + \dots + a_{:n}b_{:n}^T. This is called the outer product expansion of the columns of A and B.

Exercise 2.20 (System with Many Right Hand Sides; Compact Form) Suppose A \in R^{m \times n}, B \in R^{m \times p}, and X \in R^{n \times p}. Show that

    AX = B \iff Ax_{:j} = b_{:j}, \quad j = 1, \dots, p.

Exercise 2.21 (Block Multiplication Example) Suppose A = [A_1, A_2] and B = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}. When is AB = A_1 B_1?

Exercise 2.22 (Another Block Multiplication Example) Suppose A, B, C \in R^{n \times n} are given in block form by

    A := \begin{bmatrix} \lambda & a^T \\ 0 & A_1 \end{bmatrix}, \quad B := \begin{bmatrix} 1 & 0^T \\ 0 & B_1 \end{bmatrix}, \quad C := \begin{bmatrix} 1 & 0^T \\ 0 & C_1 \end{bmatrix},

where A_1, B_1, C_1 \in R^{(n-1) \times (n-1)}. Show that

    CAB = \begin{bmatrix} \lambda & a^T B_1 \\ 0 & C_1 A_1 B_1 \end{bmatrix}.

2.7 Review Questions

2.7.1 How do we define nonsingularity of a matrix?
2.7.2 Define the second derivative matrix T. How did we show that it is nonsingular?
2.7.3 Why do we not use the explicit inverse of T to solve the linear system Tx = b?
2.7.4 What are the eigenpairs of the matrix T?
2.7.5 Why are the diagonal elements of a Hermitian matrix real?
2.7.6 Is the matrix \begin{bmatrix} 1 & 1+i \\ 1+i & 2 \end{bmatrix} Hermitian? Symmetric?
2.7.7 Is a weakly diagonally dominant matrix nonsingular?
2.7.8 Is a strictly diagonally dominant matrix always nonsingular?
2.7.9 Does a tridiagonal matrix always have an LU factorization?

Chapter 3
Gaussian Elimination and LU Factorizations

In this chapter we first review Gaussian elimination. Gaussian elimination leads to an LU factorization of the coefficient matrix, or more generally to a PLU factorization if row interchanges are introduced. Here P is a permutation matrix, L is lower triangular and U is upper triangular. We also consider in great detail the general theory of LU factorizations.

3.1 3 by 3 Example

Gaussian elimination with row interchanges is the classical method for solving n linear equations in n unknowns. (The method was known long before Gauss used it in 1809. It was further developed by Doolittle in 1881, see [4].) We first recall how it works on a 3 × 3 system.

Example 3.1 (Gaussian Elimination on a 3 × 3 System) Consider a nonsingular system of three equations in three unknowns:

    a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 = b_1^{(1)},    I
    a_{21}^{(1)}x_1 + a_{22}^{(1)}x_2 + a_{23}^{(1)}x_3 = b_2^{(1)},    II
    a_{31}^{(1)}x_1 + a_{32}^{(1)}x_2 + a_{33}^{(1)}x_3 = b_3^{(1)}.    III




(1)

To solve this system by Gaussian elimination suppose a_{11}^{(1)} \ne 0. We subtract l_{21}^{(1)} := a_{21}^{(1)}/a_{11}^{(1)} times equation I from equation II and l_{31}^{(1)} := a_{31}^{(1)}/a_{11}^{(1)} times equation I from equation III. The result is

    a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 = b_1^{(1)},    I
                      a_{22}^{(2)}x_2 + a_{23}^{(2)}x_3 = b_2^{(2)},    II
                      a_{32}^{(2)}x_2 + a_{33}^{(2)}x_3 = b_3^{(2)},    III

where b_i^{(2)} = b_i^{(1)} - l_{i1}^{(1)}b_1^{(1)} for i = 2, 3 and a_{ij}^{(2)} = a_{ij}^{(1)} - l_{i1}^{(1)}a_{1j}^{(1)} for i, j = 2, 3. If a_{11}^{(1)} = 0 and a_{21}^{(1)} \ne 0 we first interchange equation I and equation II. If a_{11}^{(1)} = a_{21}^{(1)} = 0 we interchange equation I and III. Since the system is nonsingular the first column cannot be zero and an interchange is always possible.

If a_{22}^{(2)} \ne 0 we subtract l_{32}^{(2)} := a_{32}^{(2)}/a_{22}^{(2)} times equation II from equation III to obtain

    a_{11}^{(1)}x_1 + a_{12}^{(1)}x_2 + a_{13}^{(1)}x_3 = b_1^{(1)},    I
                      a_{22}^{(2)}x_2 + a_{23}^{(2)}x_3 = b_2^{(2)},    II
                                        a_{33}^{(3)}x_3 = b_3^{(3)},    III

where a_{33}^{(3)} = a_{33}^{(2)} - l_{32}^{(2)}a_{23}^{(2)} and b_3^{(3)} = b_3^{(2)} - l_{32}^{(2)}b_2^{(2)}. If a_{22}^{(2)} = 0 then a_{32}^{(2)} \ne 0 (cf. Sect. 3.4) and we first interchange equation II and equation III. The reduced system is easy to solve since it is upper triangular. Starting from the bottom and moving upwards we find

    x_3 = b_3^{(3)}/a_{33}^{(3)},
    x_2 = (b_2^{(2)} - a_{23}^{(2)}x_3)/a_{22}^{(2)},
    x_1 = (b_1^{(1)} - a_{12}^{(1)}x_2 - a_{13}^{(1)}x_3)/a_{11}^{(1)}.

This is known as back substitution. Gaussian elimination leads to an LU factorization. Indeed, if a_{kk}^{(k)} \ne 0, k = 1, 2, then

    LU := \begin{bmatrix} 1 & 0 & 0 \\ l_{21}^{(1)} & 1 & 0 \\ l_{31}^{(1)} & l_{32}^{(2)} & 1 \end{bmatrix}
          \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} \\ 0 & 0 & a_{33}^{(3)} \end{bmatrix}
        = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} \\ l_{21}^{(1)}a_{11}^{(1)} & l_{21}^{(1)}a_{12}^{(1)} + a_{22}^{(2)} & l_{21}^{(1)}a_{13}^{(1)} + a_{23}^{(2)} \\ l_{31}^{(1)}a_{11}^{(1)} & l_{31}^{(1)}a_{12}^{(1)} + l_{32}^{(2)}a_{22}^{(2)} & l_{31}^{(1)}a_{13}^{(1)} + l_{32}^{(2)}a_{23}^{(2)} + a_{33}^{(3)} \end{bmatrix}
        = \begin{bmatrix} a_{11}^{(1)} & a_{12}^{(1)} & a_{13}^{(1)} \\ a_{21}^{(1)} & a_{22}^{(1)} & a_{23}^{(1)} \\ a_{31}^{(1)} & a_{32}^{(1)} & a_{33}^{(1)} \end{bmatrix} = A.



Thus Gaussian elimination leads to an LU factorization of the coefficient matrix A(1) (cf. the proof of Theorem 3.2).
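The elimination just described is easy to turn into code. The following MATLAB sketch is ours (not one of the book's listings); the function name lufact is hypothetical, and the code assumes all pivots encountered are nonzero. Save it as lufact.m and call, for example, [L,U] = lufact([2 -1 0; -1 2 -1; 0 -1 2]), which reproduces the factors of Exercise 2.14.

function [L,U] = lufact(A)
% [L,U] = lufact(A)  LU factorization by Gaussian elimination, no pivoting.
n = size(A,1); L = eye(n); U = A;
for k = 1:n-1
    for i = k+1:n
        L(i,k) = U(i,k)/U(k,k);               % multiplier l_{ik}
        U(i,k:n) = U(i,k:n) - L(i,k)*U(k,k:n);% eliminate row i in column k
    end
end
end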

3.2 Gauss and LU

In Gaussian elimination without row interchanges we start with a linear system Ax = b and generate a sequence of equivalent systems A^{(k)}x = b^{(k)} for k = 1, \dots, n, where A^{(1)} = A, b^{(1)} = b, and A^{(k)} has zeros under the diagonal in its first k - 1 columns. Thus A^{(n)} is upper triangular and the system A^{(n)}x = b^{(n)} is easy to solve. The process is illustrated in Fig. 3.1 (Gaussian elimination: A^{(1)} \to A^{(2)} \to \dots \to A^{(k)} \to \dots \to A^{(n)}). The matrix A^{(k)} takes the form

    A^{(k)} = \begin{bmatrix}
    a_{1,1}^{(1)} & \cdots & a_{1,k-1}^{(1)} & a_{1,k}^{(1)} & \cdots & a_{1,j}^{(1)} & \cdots & a_{1,n}^{(1)} \\
    & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
    & & a_{k-1,k-1}^{(k-1)} & a_{k-1,k}^{(k-1)} & \cdots & a_{k-1,j}^{(k-1)} & \cdots & a_{k-1,n}^{(k-1)} \\
    & & & a_{k,k}^{(k)} & \cdots & a_{k,j}^{(k)} & \cdots & a_{k,n}^{(k)} \\
    & & & \vdots & & \vdots & & \vdots \\
    & & & a_{i,k}^{(k)} & \cdots & a_{i,j}^{(k)} & \cdots & a_{i,n}^{(k)} \\
    & & & \vdots & & \vdots & & \vdots \\
    & & & a_{n,k}^{(k)} & \cdots & a_{n,j}^{(k)} & \cdots & a_{n,n}^{(k)}
    \end{bmatrix}.    (3.1)

The process transforming A^{(k)} into A^{(k+1)} for k = 1, \dots, n-1 can be described as follows:

    for i = k+1 : n
        l_{ik}^{(k)} = a_{ik}^{(k)} / a_{kk}^{(k)}
        for j = k : n
            a_{ij}^{(k+1)} = a_{ij}^{(k)} - l_{ik}^{(k)} a_{kj}^{(k)}    (3.2)

3 Gaussian Elimination and LU Factorizations

(k+1)

(k)

For j = k it follows from (3.2) that aik

= aik −

(k)

aik

(k)

akk

(k)

akk = 0 for i =

k + 1, . . . , n. Thus A(k+1) will have zeros under the diagonal in its first k columns (k) and the elimination is carried one step further. The numbers lik in (3.2) are called multipliers. Gaussian elimination with no row interchanges is valid if and only if the pivots (k) akk are nonzero for k = 1, . . . , n − 1. This depends on certain submatrices of A known as principal submatrices. Definition 3.1 (Principal Submatrix) For k = 1, . . . , n the matrices A[k] ∈ Ck×k given by ⎡

A[k]

a11 · · · ⎢ := A(1 : k, 1 : k) = ⎣ ...

⎤ ak1 .. ⎥ . ⎦

ak1 · · · akk are called the leading principal submatrices of A ∈ Cn×n . More generally, a matrix B ∈ Ck×k is called a principal submatrix of A if B = A(r, r), where r = [r1 , . . . , rk ] for some 1 ≤ r1 < · · · < rk ≤ n. Thus, bi,j = ari ,rj ,

i, j = 1, . . . , k.

The determinant of a (leading) principal submatrix is called a (leading) principal minor. A principal submatrix is leading if rj = j for j = 1, . . . , k. Also a principal submatrix is special in that it uses the same rows and columns of A. For k = 1 The only principal submatrices of order k = 1 are the diagonal elements of A. !1 2 3" Example 3.2 (Principal Submatrices) The principal submatrices of A = 4 5 6 789 are [1], [5], [9],

1 2 1 3 5 6 4 5 , 7 9 , 8 9 , A.

The leading principal submatrices are [1],

1 2 45

, A.

(k)

Theorem 3.1 We have ak,k = 0 for k = 1, . . . , n − 1 if and only if the leading principal submatrices A[k] of A are nonsingular for k = 1, . . . , n − 1. Moreover (1) (2) (k) a22 · · · akk , det(A[k] ) = a11

k = 1, . . . , n.

(3.3)

3.2 Gauss and LU

61 (k)

Proof Let B k = Ak−1 be the upper left k −1 corner of A(k) given by (3.1). Observe that the elements of B k are computed from A by using only elements from A[k−1] . Since the determinant of a matrix does not change under the operation of subtracting a multiple of one row from another row the determinant of A[k] equals the product (1) (k) · · · akk = 0 for k = of diagonal elements of B k+1 and (3.3) follows. But then a11 1, . . . , n − 1 if and only if det(A[k] ) = 0 for k = 1, . . . , n − 1, or equivalently A[k] is nonsingular for k = 1, . . . , n − 1.   Gaussian elimination is a way to compute the LU factorization of the coefficient matrix. Theorem 3.2 Suppose A ∈ Cn×n and that the leading principal submatrices A[k] are nonsingular for k = 1, . . . , n − 1. Then Gaussian elimination with no row interchanges results in an LU factorization of A. In particular A = LU , where ⎡



1

⎢ (1) ⎥ ⎢l21 1 ⎥ ⎥ L=⎢ .. ⎥ , ⎢ .. . ⎦ ⎣ . (1) (2) ln1 ln2 · · · 1

⎡ (1) a11 · · · ⎢ .. U =⎣ .

(1) ⎤ a1n .. ⎥ , . ⎦

(3.4)

(n) ann

(j )

where the lij and aij(i) are given by (3.2). Proof From (3.2) we have for all i, j (k) (k)

(k)

(k+1)

lik akj = aij − aij

(k) (j )

(j )

for k < min(i, j ), and lij ajj = aij for i > j.

Thus for i ≤ j we find (LU )ij =

i−1

(k) (k)

(i)

lik akj + aij =

k=1

i−1

 (k) (k+1)  (i) (1) aij − aij + aij = aij = aij ,

(3.5)

k=1

while for i > j (LU )ij =

j −1

k=1

(k) (k) lik akj

(j ) + lij ajj

=

j −1



 (j ) aij(k) − aij(k+1) + aij = aij .

(3.6)

k=1

  Note that this Theorem holds even if A is singular. Since L is nonsingular the (n) = 0 when A is singular. matrix U is then singular, and we must have ann

62

3 Gaussian Elimination and LU Factorizations

3.3 Banded Triangular Systems Once we know an LU factorization of A the system Ax = b is solved in two steps. Since LU x = b we have Ly = b, where y := U x. We first solve Ly = b, for y and then U x = y for x.

3.3.1 Algorithms for Triangular Systems A nonsingular triangular linear system Ax = b is easy to solve. By Lemma 2.5 A has nonzero diagonal elements. Consider first the lower triangular case. For n = 3 the system is ⎡

⎤⎡ ⎤ ⎡ ⎤ a11 0 0 x1 b1 ⎣a21 a22 0 ⎦ ⎣x2 ⎦ = ⎣b2 ⎦ . x3 b3 a31 a32 a33 From the first equation we find x1 = b1 /a11 . Solving the second equation for x2 we obtain x2 = (b2 − a21 x1 )/a22 . Finally the third equation gives x3 = (b3 − a31 x1 − a32 x2 )/a33 . This process is known as forward substitution. In general k−1

  xk = bk − ak,j xj /akk ,

k = 1, 2, . . . , n.

(3.7)

j =1

When A is a lower triangular band matrix the number of arithmetic operations necessary to find x can be reduced. Suppose A is a lower triangular d-banded, / {lk , lk + 1, . . . , k for k = 1, 2, . . . , n, and where so that ak,j = 0 for j ∈ lk := max(1, k − d), see Fig. 3.2. For a lower triangular d-band matrix the calculation in (3.7) can be simplified as follows k−1

  xk = bk − ak,j xj /akk ,

k = 1, 2, . . . , n.

(3.8)

j =lk

Note that (3.8) reduces to (3.7) if d = n. Letting A(k, lk : (k − 1)) ∗ x(lk : (k − 1)) denote the sum k−1 a j =lk kj xj we arrive at the following algorithm, where the initial Fig. 3.2 Lower triangular 5 × 5 band matrices: d = 1 (left) and d = 2 right



a11 ⎢a21 ⎢ ⎢ 0 ⎣ 0 0

0 a22 a32 0 0

0 0 a33 a43 0

0 0 0 a44 a54

⎤ 0 0 ⎥ ⎥ 0 ⎥, 0 ⎦ a55



a11 ⎢a21 ⎢ ⎢a31 ⎣ 0 0

0 a22 a32 a42 0

0 0 a33 a43 a53

0 0 0 a44 a54

⎤ 0 0 ⎥ ⎥ 0 ⎥ 0 ⎦ a55

3.3 Banded Triangular Systems

63

“r” in the name signals that this algorithm is row oriented. The algorithm takes a nonsingular lower triangular d-banded matrix A ∈ Cn×n , and b ∈ Cn , as input, and returns an x ∈ Cn so that Ax = b. For each k we take the inner product of a part of a row with the already computed unknowns. function x=rforwardsolve(A,b,d) % x=rforwardsolve(A,b,d) n=length(b); x=b; x(1)=b(1)/A(1,1); for k=2:n lk=max(1,k-d); x(k)=(b(k)-A(k,lk:(k-1))*x(lk:(k-1)))/A(k,k); end Listing 3.1 rforwardsolve

A system Ax = b, where A is upper triangular must be solved by back substitution or ’bottom-up’. We first find xn from the last equation and then move upwards for the remaining unknowns. For an upper triangular d-banded matrix this leads to the following algorithm, which takes a nonsingular upper triangular dbanded matrix A ∈ Cn×n , b ∈ Cn and d, as input, and returns an x ∈ Cn so that Ax = b. function x=rbacksolve(A,b,d) % x=rbacksolve(A,b,d) n=length(b); x=b; x(n)=b(n)/A(n,n); for k=n-1:-1:1 uk=min(n,k+d); x(k)=(b(k)-A(k,(k+1):uk)*x((k+1):uk))/A(k,k); end Listing 3.2 rbacksolve

Example 3.3 (Column Oriented Forwardsolve) In this example we develop a column oriented vectorized version of forward substitution. For a backward substitution see Exercise 3.1. Consider the system Ax = b, where A ∈ Cn×n is lower triangular. Suppose after k − 1 steps of the algorithm we have a reduced system in the form ⎡

ak,k 0 ⎢ak+1,k ak+1,k+1 ⎢ ⎢ . ⎣ .. an,k

··· ··· .. .

0 0 .. .

· · · an×n

⎤⎡

⎤ ⎡ ⎤ xk bk ⎥ ⎢xk+1 ⎥ ⎢bk+1 ⎥ ⎥⎢ ⎥ ⎢ ⎥ ⎥⎢ . ⎥ = ⎢ . ⎥. ⎦ ⎣ .. ⎦ ⎣ .. ⎦ xn

bn

This system is of order n − k + 1. The unknowns are xk , . . . , xn .

64

3 Gaussian Elimination and LU Factorizations

We see that xk = bk /ak,k and eliminating xk from the remaining equations we obtain a system of order n − k with unknowns xk+1 , . . . , xn ⎡ ak+1,k+1 0 ⎢ak+2,k+1 ak+2,k+2 ⎢ ⎢ .. ⎣ . an,k+1

··· ··· .. .

0 0 .. .

· · · an,n



⎡ ⎤ ⎡ ⎤ ⎡ ⎤ bk+1 ak+1,k ⎥ xk+1 ⎥ ⎢ .. ⎥ ⎢ .. ⎥ ⎢ . ⎥ ⎥ ⎣ . ⎦ = ⎣ . ⎦ − xk ⎣ .. ⎦ . ⎦ xn bn an,k

Thus at the kth step, k = 1, 2, . . . n we set xk = bk /A(k, k) and update b as follows: b((k + 1) : n) = b((k + 1) : n) − x(k) ∗ A((k + 1) : n, k). This leads to the following algorithm for column oriented forward solve, which takes a nonsingular lower triangular d-banded matrix A ∈ Cn×n , b ∈ Cn , and d as input, and returns an x ∈ Cn so that Ax = b. function x=cforwardsolve(A,b,d) %x=cforwardsolve(A,b,d) x=b; n=length(b); for k=1:n-1 x(k)=b(k)/A(k,k); uk=min(n,k+d); b((k+1):uk)=b((k+1):uk)-A((k+1):uk,k)*x(k); end x(n)=b(n)/A(n,n); end Listing 3.3 cforwardsolve

3.3.2 Counting Operations It is useful to have a number which indicates the amount of work an algorithm requires. In this book we measure this by estimating the total number of (complex) arithmetic operations. We count both additions, subtractions, multiplications and divisions, but not work on indices. As an example we show that the LU factorization of a full matrix of order n using Gaussian elimination requires exactly NLU :=

2 3 1 2 1 n − n − n 3 2 6

(3.9)

operations. Let M, D, A, S be the number of (complex) multiplications, divisions, additions, and subtractions. In (3.2) the multiplications and subtractions occur in the (k) (k) calculation of aijk+1 = aij(k) − lik akj which is carried out (n − k)2 times. Moreover,

3.3 Banded Triangular Systems

65

each calculation involves one subtraction and one multiplication. Thus we find M + n−1 2 =2 S = 2 n−1 (n − k) m2 = 23 n(n − 1)(n − 12 ). For each k there are n − k k=1 m=1 n−1 divisions giving a sum of k=1 (n − k) = 12 n(n − 1). Since there are no additions we obtain the total M +D+A+S =

2 1 1 n(n − 1)(n − ) + n(n − 1) = NLU 3 2 2

given by (3.9). We are only interested in NLU when n is large and for such n the term 23 n3 dominates. We therefore regularly ignore lower order terms and use number of operations both for the exact count and for the highest order term. We also say more loosely that the number of operations is O(n3 ). We will use the number of operations counted in one of these ways as a measure of the complexity of an algorithm and say that the complexity of LU factorization of a full matrix is O(n3 ) or more precisely 23 n3 . We will compare the number of arithmetic operations of many algorithms with the number of arithmetic operations of Gaussian elimination and define for n ∈ N the number Gn as follows: Definition 3.2 (Gn := 23 n3 ) We define Gn := 23 n3 . There is a quick way to arrive at the leading term 2n3 /3. We only consider the operations contributing to this term. In (3.2) the leading term comes from the inner loop contributing to M +S. Then we replace sums by integrals letting the summation indices be continuous variables and adjust limits of integration in an insightful way to simplify the calculation. Thus,  n−1

2 M +S =2 (n − k) ≈ 2 k=1

n−1 1



n

(n − k) dk ≈ 2 2

(n − k)2 dk =

0

2 3 n 3

and this is the correct leading term. Consider next NS , the number of forward plus backward substitutions. By (3.7) we obtain NS = 2

 n  n

(2k − 1) ≈ 2 (2k − 1)dk ≈ 4 k=1

1

n

kdk = 2n2 .

0

The last integral actually give the exact value for the sum in this case (cf. (3.26)). We see that LU factorization is an O(n3 ) process while solving a triangular system requires O(n2 ) arithmetic operations. Thus, if n = 106 and one arithmetic operation requires c = 10−14 seconds of computing time then cn3 = 104 seconds ≈ 3 hours and cn2 = 0.01 second, giving dramatic differences in computing time.

66

3 Gaussian Elimination and LU Factorizations

3.4 The PLU Factorization Theorem 3.1 shows that Gaussian elimination can fail on a    row  interchanges   without nonsingular system. A simple example is 01 11 xx12 = 11 . We show here that any nonsingular linear system can be solved by Gaussian elimination if we incorporate row interchanges.

3.4.1 Pivoting Interchanging two rows (and/or two columns) during Gaussian elimination is known as pivoting. The element which is moved to the diagonal position (k, k) is called the pivot element or pivot for short, and the row containing the pivot is called the pivot row. Gaussian elimination with row pivoting can be described as follows. 1. Choose rk ≥ k so that ar(k) = 0. k ,k 2. Interchange rows rk and k of A(k) . (k) and aij(k+1) using (3.2). 3. Eliminate by computing lik To show that Gaussian elimination can always be carried to completion by using suitable row interchanges suppose by induction on k that A(k) is nonsingular. Since A(1) = A this holds for k = 1. By Lemma 2.4 the lower right diagonal block in A(k) is nonsingular. But then at least one element in the first column of that block = 0. But then A(k+1) must be nonzero and it follows that rk exists so that ar(k) k ,k is nonsingular since it is computed from A(k) using row operations preserving the nonsingularity. We conclude that A(k) is nonsingular for k = 1, . . . , n.

3.4.2 Permutation Matrices Row interchanges can be described in terms of permutation matrices. Definition 3.3 A permutation matrix is a matrix of the form P = I (:, p) = [ei1 , ei2 , . . . , ein ] ∈ Rn×n , where ei1 , . . . , ein is a permutation of the unit vectors e1 , . . . , en ∈ Rn . Every permutation p = [i1 , . . . , in ]T of the integers 1, 2, . . . , n gives rise to a permutation matrix and vice versa. Post-multiplying a matrix A by a permutation matrix results in a permutation of the columns, while pre-multiplying by a permutation matrix gives a permutation of the rows. In symbols AP = A(:, p),

P T A = A(p, :).

(3.10)

3.4 The PLU Factorization

67

Indeed, AP = (Aei1 , . . . , Aein ) = A(:, p) and P T A = (AT P )T = (AT (: , p))T = A(p, :). Since P T P = I the inverse of P is equal to its transpose, P −1 = P T and P P T = I as well. We will use a particularly simple permutation matrix. Definition 3.4 We define a (j, k)-Interchange matrix I j k by interchanging column j and k of the identity matrix. Since I j k = I kj , and we obtain the identity by applying I j k twice, we see that I 2j k = I and an interchange matrix is symmetric and equal to its own inverse. Premultiplying a matrix by an interchange matrix interchanges two rows of the matrix, while post-multiplication interchanges two columns. We can keep track of the row interchanges using pivot vectors pk . We define p := pn , where p 1 := [1, 2, . . . , n]T , and p k+1 := I rk ,k p k for k = 1, . . . , n − 1. (3.11) We obtain p k+1 from pk by interchanging the entries rk and k in p k . In particular, since rk ≥ k, the first k − 1 components in pk and pk+1 are the same. There is a close relation between the pivot vectors p k and the corresponding interchange matrices P k := I rk ,k . Since P k I (pk , :) = I (P k p k , :) = I (p k+1 , :) we obtain P T = P n−1 · · · P 1 = I (p, :),

P = P 1 P 2 · · · P n−1 = I (:, p).

(3.12)

Instead of interchanging the rows of A during elimination we can keep track of the ordering of the rows using the pivot vectors p k . Gaussian elimination with row pivoting starting with aij(1) = aij can be described as follows: p = [1, . . . , n]T ; for k = 1 : n − 1 choose rk ≥ k so that ap(k) = 0. r ,k k

p = Irk ,k p

(3.13)

for i = k + 1 : n (k)

(k)

(k)

api ,k = api ,k /apk ,k for j = k : n ap(k+1) = ap(k) − ap(k) a (k) i ,j i ,j i ,k pk ,j

68

3 Gaussian Elimination and LU Factorizations

This leads to the following factorization: Theorem 3.3 Gaussian elimination with row pivoting on a nonsingular matrix A ∈ Cn×n leads to the factorization A = P LU , where P is a permutation matrix, L is lower triangular with ones on the diagonal, and U is upper triangular. More explicitly, P = I (:, p), where p = I rn−1 ,n−1 · · · I r1 ,1 [1, . . . , n]T , and ⎡



1

⎢a (1) ⎥ ⎢ p2 ,1 1 ⎥ ⎢ ⎥ L=⎢ . .. ⎥ , . . ⎦ ⎣ . ap(1) a (2) · · · 1 n ,1 pn ,2



⎤ (1) (1) ap1 ,1 · · · ap1 ,n ⎢ . ⎥ .. ⎥ U =⎢ . .. ⎦ . ⎣ ap(n) n ,n

(3.14)

Proof The proof is analogous to the proof for LU factorization without pivoting. From (3.13) we have for all i, j (k)

(k)

(k)

(k+1)

(k)

(j )

(j )

api ,k apk ,j = api ,j − api ,j for k < min(i, j ), and api ,j apj ,j = api ,j for i > j. Thus for i ≤ j we find (LU )ij =

n

li,k ukj =

k=1

=

i−1

i−1

ap(k) a (k) + ap(i)i ,j i ,k pk ,j

k=1



(k+1) 

(k)

api ,j − api ,j

  (i) (1) + api ,j = api ,j = api ,j = P T A ij ,

k=1

while for i > j (LU )ij =

n

k=1

=

j −1

(k) lik ukj

=

j −1

(j )

ap(k) a (k) + ap(k) a i ,k pk ,j i ,j pj ,j

k=1

 (k)   (j ) (k+1)  (1) api ,j − api ,j + api ,j = api ,j = api ,j = P T A ij .

k=1

  The PLU factorization can also be written P T A = LU . This shows that for a nonsingular matrix there is a permutation of the rows of A so that the permuted matrix has an LU factorization.

3.4 The PLU Factorization

69

3.4.3 Pivot Strategies The choice of pivot element in (3.13) is not unique. In partial pivoting we select the largest element (k) | := max{|ai,k | : k ≤ i ≤ n} |ar(k) k ,k

with rk the smallest such index in case of a tie. The following example illustrating that small pivots should be avoided. Example 3.4 Applying Gaussian elimination without row interchanges to the linear system 10−4 x1 + 2x2 = 4 x1 + x2 = 3 we obtain the upper triangular system 10−4 x1 + 2x2 = 4 (1 − 2 × 104 )x2 = 3 − 4 × 104 The exact solution is x2 =

−39997 ≈ 2, −19999

x1 =

4 − 2x2 20000 ≈ 1. = −4 10 19999

Suppose we round the result of each arithmetic operation to three digits. The solutions fl(x1 ) and fl(x2 ) computed in this way is fl(x2 ) = 2,

fl(x1 ) = 0.

The computed value 0 of x1 is completely wrong. Suppose instead we apply Gaussian elimination to the same system, but where we have interchanged the equations. The system is x1 + x2 = 3 10−4 x1 + 2x2 = 4 and we obtain the upper triangular system x1 + x2 = 3 (2 − 10−4 )x2 = 4 − 3 × 10−4

70

3 Gaussian Elimination and LU Factorizations

Now the solution is computed as follows x2 =

3.9997 ≈ 2, 1.9999

x1 = 3 − x2 ≈ 1.

In this case rounding each calculation to three digits produces fl(x1 ) = 1 and fl(x2 ) = 2 which is quite satisfactory since it is the exact solution rounded to three digits. Related to partial pivoting is scaled partial pivoting. Here rk is the smallest index such that |ar(k) | k ,k sk

:= max

# |a (k)| i,k

sk

$ :k≤i≤n ,

sk := max |akj |. 1≤j ≤n

This can sometimes give more accurate results if the coefficient matrix have coefficients of wildly different sizes. Note that the scaling factors sk are computed using the initial matrix. It also is possible to interchange both rows and columns. The choice (k) ar(k) := max{|ai,j | : k ≤ i, j ≤ n} k ,sk

with rk , sk the smallest such indices in case of a tie, is known as complete pivoting. Complete pivoting is known to be more numerically stable than partial pivoting, but requires a lot of search and is seldom used in practice.

3.5 The LU and LDU Factorizations Gaussian elimination without row interchanges is one way of computing an LU factorization of a matrix. There are other ways that can be advantageous for certain kind of problems. Here we consider the general theory of LU factorizations. Recall that A = LU is an LU factorization of A ∈ Cn×n if L ∈ Cn×n is lower triangular and U ∈ Cn×n is upper triangular , i.e., ⎡

l1,1 · · · ⎢ .. . . L=⎣ . . ln,1 · · ·

⎤ 0 .. ⎥ , . ⎦ ln,n

⎡ u1,1 · · · ⎢ .. . . U =⎣ . . 0 ···

⎤ u1,n .. ⎥ . . ⎦ un,n

To find an LU factorization there is one equation for each of the n2 elements in A, and L and U contain a total of n2 + n unknown elements. There are several ways to restrict the number of unknowns to n2 .

3.5 The LU and LDU Factorizations

71

L1U: lii = 1 all i, LU1: uii = 1 all i, LDU: A = LDU , lii = uii = 1 all i, D = diag(d11, . . . , dnn ).

3.5.1 Existence and Uniqueness Consider the L1U factorization. Three things can happen. An L1U factorization exists and is unique, it exists, but it is not unique, or it does not exist. The 2 × 2 case illustrates this. Example  3.5  (L1U of 2 × 2 Matrix) Let a, b, c, d ∈ C. An L1U factorization of A = [ ac db must satisfy the equations        u2 u1 ab 1 0 u1 u2 = = u1 l1 u2 l1 + u3 cd l1 1 0 u3 for the unknowns l1 in L and u1 , u2 , u3 in U . The equations are u1 = a,

u2 = b,

al1 = c,

bl1 + u3 = d.

(3.15)

These equations do not always have a solution. Indeed, the main problem is the equation al1 = c. There are essentially three cases 1. a = 0: The matrix has a unique L1U factorization. 2. a = c = 0: The L1U factorization exists, but it is not unique. Any value for l1 can be used. 3. a = 0, c = 0: No L1U factorization exists. Consider the four matrices     2 −1 01 A1 := , A2 := , −1 2 11

A3 :=

  01 , 02

 A4 :=

 11 . 11

From the previous discussion it follows that A1 has a unique L1U factorization, A2 has no L1U factorization, A3 has an L1U factorization but it is not unique, and A4 has a unique L1U factorization even if it is singular. In preparation for the main theorem about LU factorization we prove a simple lemma. Recall that ⎤ ⎡ a11 · · · ak1 ⎢ .. ⎥ A[k] := ⎣ ... . ⎦ ak1 · · · akk is called a leading principal submatrix of A.

72

3 Gaussian Elimination and LU Factorizations

Lemma 3.1 (L1U of Leading Principal Submatrices) Suppose A = LU is an L1U factorization of A ∈ Cn×n . For k = 1, . . . , n let A[k] , L[k] , U [k] be the leading principal submatrices of A, L, U , respectively. Then A[k] = L[k] U [k] is an L1U factorization of A[k] for k = 1, . . . , n. Proof For k = 1, . . . , n − 1 we partition A = LU as follows:        L[k] S k L[k] 0 U [k] S k L[k] U [k] A[k] B k = = , Ck F k 0 Tk M k U [k] M k S k + N k T k Mk N k

(3.16)

where F k , N k , T k ∈ Cn−k,n−k . Comparing blocks we find A[k] = L[k] U [k] . Since L[k] is unit lower triangular and U [k] is upper triangular this is an L1U factorization of A[k] .   The following theorem gives a necessary and sufficient condition for existence of a unique LU factorization. The conditions are the same for the three factorizations L1U, LU1 and LDU. Theorem 3.4 (LU Theorem) A square matrix A ∈ Cn×n has a unique L1U (LU1, LDU) factorization if and only if the leading principal submatrices A[k] of A are nonsingular for k = 1, . . . , n − 1. Proof Suppose A[k] is nonsingular for k = 1, . . . , n − 1. Under these conditions Gaussian elimination gives an L1U factorization (cf. Theorem 3.2). We give another proof here that in addition to showing uniqueness also gives alternative ways to compute the L1U factorization. The proofs for the LU1 and LDU factorizations are similar and left as exercises. We use induction on n to show that A has a unique L1U factorization. The result is clearly true for n = 1, since the unique L1U factorization of a 1 × 1 matrix is [a11] = [1][a11]. Suppose that A[n−1] has a unique L1U factorization A[n−1] = Ln−1 U n−1 , and that A[1] , . . . , A[n−1] are nonsingular. By block multiplication        Ln−1 0 U n−1 un Ln−1 U n−1 Ln−1 un A[n−1] cn = = , A= l Tn 1 r Tn ann 0 unn l Tn U n−1 l Tn un + unn (3.17) if and only if A[n−1] = Ln−1 U n−1 and l n , un ∈ Cn−1 and unn ∈ C are determined from U Tn−1 l n = r n ,

Ln−1 un = cn ,

unn = ann − l Tn un .

(3.18)

Since A[n−1] is nonsingular it follows that Ln−1 and U n−1 are nonsingular and therefore l n , un , and unn are uniquely given. Thus (3.17) gives a unique L1U factorization of A. Conversely, suppose A has a unique L1U factorization A = LU . By Lemma 3.1 A[k] = L[k] U [k] is an L1U factorization of A[k] for k = 1, . . . , n − 1. Suppose

3.5 The LU and LDU Factorizations

73

A[k] is singular for some k ≤ n − 1. We will show that this leads to a contradiction. Let k be the smallest integer so that A[k] is singular. Since A[j ] is nonsingular for j ≤ k − 1 it follows from what we have already shown that A[k] = L[k] U [k] is the unique L1U factorization of A[k] . The matrix U [k] is singular since A[k] is singular and L[k] is nonsingular. By (3.16) we have U T[k] M Tk = C Tk . This can be written as n − k linear systems for the columns of M Tk . By assumption M Tk exists, but since U T[k] is singular M k is not unique, a contradiction.   By combining the last two equations in (3.18) we obtain with k = n U Tk−1 l k = r k ,

     c Lk−1 0 uk = k . akk l Tk 1 ukk

This can be used in an algorithm to compute the L1U factorization. Moreover, if A is d-banded then the first k − d components in r k and ck are zero so both L and U will be d-banded. Thus we can use the banded rforwardsolve Algorithm 3.1 to solve the lower triangular system U Tk−1 l k = r k for the kth row l Tk in L and the   kth column uukkk in U for k = 2, . . . , n. This leads to the following algorithm to compute the L1U factorization of a d-banded matrix A with d ≥ 1. The algorithm will fail if the conditions in the LU theorem are not satisfied. function [L,U]=L1U(A,d) % [L,U]=L1U(A,d) n=length(A); L=eye(n,n); U=zeros(n,n);U(1,1)=A(1,1); for k=2:n km=max(1,k-d); L(k,km:(k-1))=rforwardsolve(U(km:(k-1) ... ,km:(k-1))’,A(k,km:(k-1))’,d)’; U(km:k,k)=rforwardsolve(L(km:k,km:k),A(km:k,k),d); end Listing 3.4 L1U

For each k we essentially solve a lower triangular linear system of order d. Thus the number of arithmetic operation for this algorithm is O(d 2 n). Remark 3.1 (LU of Upper Triangular Matrix) A matrix A ∈ Cn×n can have an LU factorization even if A[k] is singular for some k < n. By Theorem 4.1 such an LU factorization cannot be unique. An L1U factorization of an upper triangular matrix A is A = I A so it always exists even if A has zeros somewhere on the diagonal. By Lemma 2.5, if some akk is zero then A[k] is singular and the L1U factorization is not unique. In particular, for the zero matrix any unit lower triangular matrix can be used as L in an L1U factorization.

74

3 Gaussian Elimination and LU Factorizations

3.6 Block LU Factorization Suppose A ∈ Cn×n is a block matrix of the form ⎡

⎤ A11 · · · A1m ⎢ .. ⎥ , A := ⎣ ... . ⎦ Am1 · · · Amm

(3.19)

where each diagonal block Aii is square. We call the factorization ⎤⎡



I ⎢ L21 I ⎢ A = LU = ⎢ . ⎣ ..

..

⎥⎢ ⎥⎢ ⎥⎢ ⎦⎣

.

U 11 U 22

Lm1 · · · Lm,m−1 I

⎤ · · · U 1m · · · U 2m ⎥ ⎥ .. ⎥ .. . . ⎦ U mm

(3.20)

a block L1U factorization of A. Here the ith diagonal blocks I and U ii in L and U have the same size as Aii , the ith diagonal block in A. Moreover, the U ii are not necessarily upper triangular. Block LU1 and block LDU factorizations are defined similarly. The results for element-wise LU factorization carry over to block LU factorization as follows. Theorem 3.5 (Block LU Theorem) Suppose A ∈ Cn×n is a block matrix of the form (3.19). Then A has a unique block LU factorization (3.20) if and only if the leading principal block submatrices ⎡

A{k}

A11 · · · ⎢ .. := ⎣ .

⎤ A1k .. ⎥ . ⎦

Ak1 · · · Akk are nonsingular for k = 1, . . . , m − 1. Proof Suppose A{k} is nonsingular for k = 1, . . . , m − 1. Following the proof in Theorem 3.4 suppose A{m−1} has a unique block LU factorization A{m−1} = L{m−1} U {m−1} , and that A{1} , . . . , A{m−1} are nonsingular. Then L{m−1} and U {m−1} are nonsingular and   A{m−1} B C T Amm    L−1 L{m−1} 0 U {m−1} {m−1} B , = −1 C T U −1 0 Amm − C T U −1 {m−1} I {m−1} L{m−1} B

A=

(3.21)

3.7 Exercises Chap. 3

75

is a block LU factorization of A. It is unique by derivation. Conversely, suppose A has a unique block LU factorization A = LU . Then as in Lemma 3.1 it is easily seen that A{k} = L{k} U {k} is the unique block LU factorization of A[k] for k = 1, . . . , m. The rest of the proof is similar to the proof of Theorem 3.4.   Remark 3.2 (Comparing LU and Block LU) The number of arithmetic operations for the block LU factorization is the same as for the ordinary LU factorization. An advantage of the block method is that it combines many of the operations into matrix operations. Remark 3.3 (A Block LU Is Not an LU) Note that (3.20) is not an LU factorization of A since the U ii ’s are not upper triangular in general. To relate the block LU factorization to the usual LU factorization we assume that each U ii has an LU ˜ ii U˜ ii . Then A = L ˆ Uˆ , where L ˆ := L diag(L ˜ ii ) and factorization U ii = L −1 ˆ ˜ U := diag(Lii )U , and this is an ordinary LU factorization of A.

3.7 Exercises Chap. 3 3.7.1 Exercises Sect. 3.3 Exercise 3.1 (Column Oriented Backsolve) Suppose A ∈ Cn×n is nonsingular, upper triangular, d-banded, and b ∈ Cn . Justify the following column oriented vectorized algorithm for solving Ax = b. function x=cbacksolve(A,b,d) % x=cbacksolve(A,b,d) x=b; n=length(b); for k=n:-1:2 x(k)=b(k)/A(k,k); lk=max(1,k-d); b(lk:(k-1))=b(lk:(k-1))-A(lk:(k-1),k)*x(k); end x(1)=b(1)/A(1,1); end Listing 3.5 cbacksolve

Exercise 3.2 (Computing the Inverse of a Triangular Matrix) Suppose A ∈ Cn×n is a nonsingular lower triangular matrix. By Lemma 2.5 the inverse B = [b1 , . . . , bn ] is also lower triangular. The kth column bk of B is the solution of the linear systems Abk = ek . Show that bk (k) = 1/a(k, k) for k = 1, . . . , n, and explain why we can find bk by solving the linear systems A((k +1):n, (k +1):n)bk ((k +1):n) = −A((k +1):n, k)bk (k),

k = 1, . . . , n−1. (3.22)

76

3 Gaussian Elimination and LU Factorizations

Is it possible to store the interesting part of bk in A as soon as it is computed? When A instead is upper triangular, show also that we can find bk by solving the linear systems A(1:k, 1:k)bk (1:k) = I (1:k, k),

k = n, n − 1, . . . , 1,

(3.23)

for k = n, n − 1, . . . , 1. Exercise 3.3 (Finite Sums of Integers) Use induction on m, or some other method, to show that 1 m(m + 1), 2 1 1 12 + 22 + · · · + m2 = m(m + )(m + 1), 3 2 1 + 2 + ···+ m =

1 + 3 + 5 + · · · + 2m − 1 = m2 , 1 ∗ 2 + 2 ∗ 3 + 3 ∗ 4 + · · · + (m − 1)m =

1 (m − 1)m(m + 1). 3

(3.24) (3.25) (3.26) (3.27)

Exercise 3.4 (Multiplying Triangular Matrices) Show that the matrix multiplication AB can be done in 13 n(2n2 + 1) ≈ Gn arithmetic operations when A ∈ Rn×n is lower triangular and B ∈ Rn×n is upper triangular. What about BA?

3.7.2 Exercises Sect. 3.4 Exercise 3.5 (Using PLU for A∗ ) Suppose we know the PLU factors P , L, U in a PLU factorization A = P LU of A ∈ Cn×n . Explain how we can solve the system A∗ x = b economically. Exercise 3.6 (Using PLU for Determinant) Suppose we know the PLU factors P , L, U in a PLU factorization A = P LU of A ∈ Cn×n . Explain how we can use this to compute the determinant of A. Exercise 3.7 (Using PLU for A−1 ) Suppose the factors P , L, U in a PLU factorization of A ∈ Cn×n are known. Use Exercise 3.4 to show that it takes approximately 2Gn arithmetic operations to compute A−1 = U −1 L−1 P T . Here we have not counted the final multiplication with P T which amounts to n row interchanges. Exercise 3.8 (Upper Hessenberg System (Exam Exercise (1994-2)) Gaussian elimination with row pivoting can be written in the following form if for each k we exchange rows k and k + 1

3.7 Exercises Chap. 3

77

Algorithm 1 1. for k = 1, 2, . . . , n − 1 (a) exchange ak,j and ak+1,j for j = k, k + 1, . . . , n (b) for i = k + 1, k + 2, . . . , n i. ai,k = mi,k = ai,k /ak,k ii. ai,j = ai,j − mi,k ak,j for j = k + 1, k + 2, . . . , n To solve the set of equations Ax = b we have the following algorithm: Algorithm 2 1. for k = 1, 2, . . . , n − 1 (a) exchange bk and bk+1 (b) bi = bi − ai,k bk for i = k + 1, k + 2, . . . , n 2. xn = bn /an,n 3. for k = n − 1, n − 2, . . . , 1 (a) sum = 0 (b) sum = sum + ak,j xj for j = k + 1, k + 2, . . . , n (c) xk = (bk − sum)/ak,k We say that H ∈ Rn×n is unreduced upper Hessenberg if it is upper Hessenberg and the subdiagonal elements hi,i−1 = 0 for i = 2, . . . , n. a) Let H ∈ Rn×n be unreduced upper Hessenberg. Give an O(n2 ) algorithm for solving the linear system H x = b using suitable specializations of Algorithms 1 and 2. b) Find the number of multiplications/divisions in the algorithm you developed in exercise a). Is division by zero possible? c) Let U ∈ Rn×n be upper triangular and nonsingular. We define C := U + veT1 ,

(3.28)

where v ∈ Rn and e1 is the first unit vector in Rn . We also let P := I 1,2 I 2,3 · · · I n−1,n ,

(3.29)

where the I i,j are obtained from the identity matrix by interchanging rows i and j . Explain why the matrix E := CP is unreduced upper Hessenberg.

78

3 Gaussian Elimination and LU Factorizations

d) Let A ∈ Rn×n be nonsingular. We assume that A has a unique L1U factorization A = LU . To a given W ∈ Rn we define a rank one modification of A by B := A + weT1 .

(3.30)

Show that B has the factorization B = LH P T , where L is unit lower triangular, P is given by (3.29) and H is unreduced upper Hessenberg. e) Use the results above to sketch an O(n2 ) algorithm for solving the linear system Bx = b, where B is given by (3.30). We assume that the matrices L and U in the L1U factorization of A have already been computed.

3.7.3 Exercises Sect. 3.5 Exercise 3.9 (# Operations for Banded Triangular Systems) Show that for 1 ≤ d ≤ n Algorithm 3.4, with A(k, k) = 1 for k = 1, . . . , n in Algorithm 3.1, requires exactly NLU (n, d) := (2d 2 + d)n − (d 2 + d)(8d + 1)/6 = O(d 2 n) operations.2 In particular, for a full matrix d = n − 1 and we find NLU (n, n) = 23 n3 − 12 n2 − 16 n ≈ Gn in agreement with the exact count (3.9) for Gaussian elimination, while for a tridiagonal matrix NLU (n, 1) = 3n − 3 = O(n). Exercise 3.10 (L1U and LU1) Show that the matrix A3 in Example 3.5 has no LU1 or LDU factorization. Give an example of a matrix that has an LU1 factorization, but no LDU or L1U factorization. Exercise 3.11 (LU of Nonsingular Matrix) Show that the following are equivalent for a nonsingular matrix A ∈ Cn×n . 1. A has an LDU factorization. 2. A has an L1U factorization. 3. A has an LU1 factorization.

  Exercise 3.12 (Row Interchange) Show that A = [ 01 11 has a unique L1U factorization. Note that we have only interchanged rows in Example 3.5. Exercise 3.13 (LU and Determinant) Suppose A has an L1U factorization A = LU . Show that det(A[k] ) = u11 u22 · · · ukk for k = 1, . . . , n.

2 Hint:

Consider the cases 2 ≤ k ≤ d and d + 1 ≤ k ≤ n separately.

3.7 Exercises Chap. 3

79

Exercise 3.14 (Diagonal Elements in U) Suppose A ∈ Cn×n and A[k] is nonsingular for k = 1, . . . , n − 1. Use Exercise 3.13 to show that the diagonal elements ukk in the L1U factorization are u11 = a11,

ukk =

det(A[k] ) , for k = 2, . . . , n. det(A[k−1] )

(3.31)

Exercise 3.15 (Proof of LDU Theorem) Give a proof of the LU theorem for the LDU case. Exercise 3.16 (Proof of LU1 Theorem) Give a proof of the LU theorem for the LU1 case. Exercise 3.17 (Computing the Inverse (Exam Exercise 1978-1)) Let A ∈ Rn×n be nonsingular and with a unique L1U factorization A = LU . We partition L and U as follows     u1,1 uT1 1 0 , U= , (3.32) L= 0 U 2,2 1 L2,2 where L2,2 , U 2,2 ∈ R(n−1)×(n−1) . Define A2,2 := L2,2 U 2,2 and B 2,2 := A−1 2,2 . a) Show that A−1 = B, where   (1 + uT1 B 2,2 1 )/u1,1 −uT1 B 2,2 /u1,1 B := . −B 2,2 1 B 2,2

(3.33)

b) Suppose that the elements li,j , i > j in L and ui,j , j ≥ i in U are stored in A with elements ai,j . Write an algorithm that overwrites the elements in A with ones in A−1 . Only one extra vector s ∈ Rn should be used. Exercise 3.18 (Solving T H x = b (Exam Exercise 1981-3)) In this exercise we consider nonsingular matrices T , H , S ∈ Rn×n with T = (tij ) upper triangular, H = (hij ) upper Hessenberg and S := T H . We assume that H has a unique LU1 factorization H = LU with L∞ U ∞ ≤ KH ∞ for a constant K not too large. In this exercise the number of operations is the highest order term in the number of multiplications and divisions. a) Give an algorithm which computes S from T and H without using the lower parts (tij , i > j ) of T and (hij , i > j + 1) of H . In what order should the elements in S be computed if S overwrites the elements in H ? What is the number of operations of the algorithm? b) Show that L is upper Hessenberg. c) Give a detailed algorithm for finding the LU1-factorization of H stored in H . Determine the number of operations in the algorithm.

80

3 Gaussian Elimination and LU Factorizations

d) Given b ∈ Rn and T ,H as before. Suppose S and the LU1-factorization are not computed. We want to find x ∈ Rn such that Sx = b. We have the 2 following methods Method 1: 1. S = T H 2. Solve Sx = b Method 2: 1. Solve T z = b 2. Solve H x = z What method would you prefer? Give reasons for your answer. Exercise 3.19 (L1U Factorization Update (Exam Exercise 1983-1)) Let A ∈ Rn×n be nonsingular with columns a 1 , a 2 , . . . , a n . We assume that A has a unique L1U factorization A = LU . For a positive integer p ≤ n and b ∈ Rn we define B := [a 1 , . . . , a p−1 , a p+1 , . . . , a n , b] ∈ Rn×n . a) Show that H := L−1 B is upper Hessenberg. We assume that H has a unique L1U factorization. H = LH U H . b) Describe briefly how many multiplications/divisions are required to find the L1U factorization of H ? c) Suppose we have found the L1U factorization H := LH U H of H . Explain how we can find the L1U factorization of B from LH and U H . Exercise 3.20 (U1L Factorization (Exam Exercise 1990-1)) We say that A ∈ Rn×n has a U1L factorization if A = U L for an upper triangular matrix U ∈ Rn×n with ones on the diagonal and a lower triangular L ∈ Rn×n . A UL and the more common LU factorization are analogous, but normally not the same. a) Find a U1L factorization of the matrix 

 −3 −2 A := . 4 2 b) Let the columns of P ∈ Rn×n be the unit vectors in reverse order, i.e., P := [en , en−1 , . . . , e1 ].

3.8 Review Questions

c) d)

e)

f)

81

Show that P T = P and P 2 = I . What is the connection between the elements in A and P A? Let B := P AP . Find integers r, s, depending on i, j, n, such that bi,j = ar,s . Make a detailed algorithm which to given A ∈ Rn×n determines B := P AP . The elements bi,j in B should be stored in position i, j in A. You should not use other matrices than A and a scalar w ∈ R. Let P AP = MR be an L1U factorization of P AP , i.e., M is lower triangular with ones on the diagonal and R is upper triangular. Express the matrices U and L in a U1L factorization of A in terms of M, R and P . Give necessary and sufficient conditions for a matrix to have a unique U1L factorization.

3.7.4 Exercises Sect. 3.6 ˆ is unit lower triangular Exercise 3.21 (Making Block LU into LU) Show that L and Uˆ is upper triangular.

3.8 Review Questions 3.8.1 3.8.2 3.8.3 3.8.4

When is a triangular matrix nonsingular? What is the general condition for Gaussian elimination without row interchanges to be well defined? What is the content of the LU theorem? Approximately how many arithmetic operations are needed for • the multiplication of two square matrices? • The LU factorization of a matrix? • the solution of Ax = b, when A is triangular?

3.8.5 3.8.6

What is a PLU factorization? When does it exist? What is complete pivoting?

Chapter 4

LDL* Factorization and Positive Definite Matrices

In this chapter we consider LU factorizations of Hermitian and positive definite matrices. Recall that a matrix A ∈ Cn×n is Hermitian if A∗ = A, i.e., aj i = a ij for all i, j . A real Hermitian matrix is symmetric. Since aii = a ii the diagonal elements of a Hermitian matrix must be real.

4.1 The LDL* Factorization There are special versions of the LU factorization for Hermitian and positive definite matrices which takes advantage of the special properties of such matrices. The most important ones are 1. the LDL* factorization which is an LDU factorization with U = L∗ and D a diagonal matrix with real diagonal elements 2. the LL* factorization which is an LU factorization with U = L∗ and lii > 0 all i. A matrix A having an LDL* factorization must be Hermitian since D is real so that A∗ = (LDL∗ )∗ = LD ∗ L∗ = A. The LL* factorization is called a Cholesky factorization . Example 4.1 (LDL* of 2 × 2 Hermitian Matrix) Let a, d ∈ R and b ∈ C. An LDL* factorization of a 2 × 2 Hermitian matrix must satisfy the equations 

       1 0 d1 0 1 l1 d1 l1 d1 ab = = bd l1 1 0 d2 0 1 d1 l1 d1 |l1 |2 + d2

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_4

83

84

4 LDL* Factorization and Positive Definite Matrices

for the unknowns l1 in L and d1 , d2 in D. They are determined from d1 = a. al1 = b,

d2 = d − a|l1 |2 .

(4.1)

There are essentially three cases 1. a = 0: The matrix has a unique LDL* factorization. Note that d1 and d2 are real. 2. a = b = 0: The LDL* factorization exists, but it is not unique. Any value for l1 can be used. 3. a = 0, b = 0: No LDL* factorization exists. Lemma 3.1 carries over to the Hermitian case. Lemma 4.1 (LDL* of Leading Principal Sub Matrices) Suppose A = LDL∗ is an LDL* factorization of A ∈ Cn×n . For k = 1, . . . , n let A[k] ,L[k] and D [k] be the leading principal submatrices of A,L and D, respectively. Then A[k] = L[k] D [k] L∗[k] is an LDL* factorization of A[k] for k = 1, . . . , n. Proof For k = 1, . . . , n − 1 we partition A = LDL∗ as follows:      ∗ L[k] M ∗k L[k] 0 D [k] 0 A[k] B ∗k = LDU , = A= Bk F k Mk Nk 0 Ek 0 N ∗k 

(4.2)

where F k , N k , E k ∈ Cn−k,n−k . Block multiplication gives A[k] = L[k] D [k] L∗[k] . Since L[k] is unit lower triangular and D [k] is real and diagonal this is an LDL* factorization of A[k] .   Theorem 4.1 (LDL* Theorem) The matrix A ∈ Cn×n has a unique LDL* factorization if and only if A = A∗ and A[k] is nonsingular for k = 1, . . . , n − 1. Proof We essentially repeat the proof of Theorem 3.4 incorporating the necessary changes. Suppose A∗ = A and that A[k] is nonsingular for k = 1, . . . , n − 1. Note that A∗[k] = A[k] for k = 1, . . . , n. We use induction on n to show that A has a unique LDL* factorization. The result is clearly true for n = 1, since the unique LDL* factorization of a 1-by-1 matrix is [a11] = [1][a11][1] and a11 is real since A∗ = A. Suppose that A[n−1] has a unique LDL* factorization A[n−1] = Ln−1 D n−1 L∗n−1 , and that A[1] , . . . , A[n−1] are nonsingular. By definition D n−1 is real. Using block multiplication      ∗  Ln−1 l n A[n−1] a n Ln−1 0 D n−1 0 A= = l ∗n 1 0 dnn a ∗n ann 0∗ 1   Ln−1 D n−1 L∗n−1 Ln−1 D n−1 l n = l ∗n D n−1 L∗n−1 l ∗n D n−1 l n + dnn

(4.3)

4.2 Positive Definite and Semidefinite Matrices

85

if and only if A[n−1] = Ln−1 D n−1 L∗n−1 , and a n = Ln−1 D n−1 l n ,

ann = l ∗n D n−1 l n + dnn .

(4.4)

Thus we obtain an LDL* factorization of A that is unique since Ln−1 and D n−1 are nonsingular. Also dnn is real since ann and D n−1 are real. For the converse we use Lemma 4.1 in the same way as Lemma 3.1 was used to prove Theorem 3.4.   Here is an analog of Algorithm 3.4 that tries to compute the LDL* factorization of a d-banded matrix A with d ≥ 1. It uses the upper part of the matrix. function [L,dg]=LDLs(A,d) % [L,dg]=LDLs(A,d) n=length(A); L=eye(n,n); dg=zeros(n,1);dg(1)=A(1,1); for k=2:n m=rforwardsolve(L(1:k-1,1:k-1),A(1:k-1,k),d); L(k,1:k-1)=m./dg(1:k-1); dg(k)=A(k,k)-L(k,1:k-1)*m; end Listing 4.1 LDLs

The number of arithmetic operations for the LDL* factorization is approximately half the number of operations needed for the LU factorization. Indeed, in the L1U factorization we needed to solve two triangular systems to find the vectors s and m, while only one such system is needed to find m in the Hermitian case (4.3). The work to find dnn is O(n) and does not contribute to the highest order term. 1 2 Gn ,

Example 4.2 (A Factorization) Is the factorization       31 1 0 3 0 1 1/3 = 13 1/3 1 0 8/3 0 1 an LDL* factorization?

4.2 Positive Definite and Semidefinite Matrices Given A ∈ Cn×n . The function f : Cn → R given by f (x) = x ∗ Ax =

n n

i=1 j =1

aij x i xj

86

4 LDL* Factorization and Positive Definite Matrices

is called a quadratic form. Note that f is real valued if A is Hermitian. Indeed, f (x) = x ∗ Ax = (x ∗ Ax)∗ = x ∗ A∗ x = f (x). Definition 4.1 (Positive Definite Matrix) We say that a matrix A ∈ Cn×n is (i) positive definite if A∗ = A and x ∗ Ax > 0 for all nonzero x ∈ Cn ; (ii) positive semidefinite if A∗ = A and x ∗ Ax ≥ 0 for all x ∈ Cn ; (iii) negative (semi)definite if −A is positive (semi)definite. We observe that 1. The zero-matrix is positive semidefinite, while the unit matrix is positive definite. 2. The matrix A is positive definite if and only if it is positive semidefinite and x ∗ Ax = 0 ⇒ x = 0. 3. A positive definite matrix A is nonsingular. For if Ax = 0 then x ∗ Ax = 0 and this implies that x = 0. 4. It follows from Lemma 4.6 that a nonsingular positive semidefinite matrix is positive definite. 5. If A is real then it is enough to show definiteness for real vectors only. Indeed, if A ∈ Rn×n , AT = A and x T Ax > 0 for all nonzero x ∈ Rn then z∗ Az > 0 for all nonzero z ∈ Cn . For if z = x + iy = 0 with x, y ∈ Rn then z∗ Az = (x − iy)T A(x + iy) = x T Ax − iy T Ax + ix T Ay − i 2 y T Ay = x T Ax + y T Ay, and this is positive since at least one of the real vectors x, y is nonzero. Example 4.3 (Gradient and Hessian) Symmetric positive definite matrices is important in nonlinear optimization. Consider (cf. (16.1)) the gradient ∇f and hessian Hf of a function f : Ω ⊂ Rn → R ⎡ ∂f (x) ⎤ ⎢ ∇f (x) = ⎢ ⎣

∂x1

.. .

∂f (x) ∂xn

⎥ ⎥ ∈ Rn , ⎦

⎡ ∂ 2 f (x) ⎢ ∂x1.∂x1 Hf (x) = ⎢ ⎣ ..

∂ 2 f (x) ∂xn ∂x1

...

∂ 2 f (x) ∂x1 ∂xn

.. .

...

∂ 2 f (x) ∂xn ∂xn

⎤ ⎥ ⎥ ∈ Rn×n . ⎦

We assume that f has continuous first and second order partial derivatives on Ω. Under suitable conditions on the domain Ω it is shown in advanced calculus texts that if ∇f (x) = 0 and Hf (x) is positive definite then x is a local minimum for f . This can be shown using the second-order Taylor expansion (16.2). Moreover, x is a local maximum if ∇f (x) = 0 and Hf (x) is negative definite. Lemma 4.2 (The Matrix A∗ A) The matrix A∗ A is positive semidefinite for any m, n ∈ N and A ∈ Cm×n . It is positive definite if and only if A has linearly independent columns or equivalently rank n.

4.2 Positive Definite and Semidefinite Matrices

87

Proof Clearly A∗ A is Hermitian. Let x ∈ Cn and set z := Ax. By the definition (1.11) of the Euclidean norm we have x ∗ A∗ Ax = z∗ z = z22 = Ax22 ≥ 0 with equality if and only if Ax = 0. It follows that A∗ A is positive semidefinite and positive definite if and only if A has linearly independent columns. But this is equivalent to A having rank n (cf. Definition 1.6).   Lemma 4.3 (T Is Positive Definite) The second derivative matrix T tridiag(−1, 2, −1) ∈ Rn×n is positive definite.

=

Proof Clearly T is symmetric. For any x ∈ Rn xT T x = 2

n

n−1

xi2 −

i=1

=

n−1

i=1 n−1

xi2 − 2

i=1

= x12 + xn2 +

xi xi+1 −

n

xi−1 xi

i=2

xi xi+1 +

i=1

n−1

2 xi+1 + x12 + xn2

i=1

n−1

(xi+1 − xi )2 . i=1

Thus x T T x ≥ 0 and if x T T x = 0 then x1 = xn = 0 and xi = xi+1 for i = 1, . . . , n − 1 which implies that x = 0. Hence T is positive definite.  

4.2.1 The Cholesky Factorization Recall that a principal submatrix B = A(r, r) ∈ Ck×k of a matrix A ∈ Cn×n has elements bi,j = ari ,rj for i, j = 1, . . . , k, where 1 ≤ r1 < · · · < rk ≤ n. It is a leading principal submatrix, denoted A[k] if r = [1, 2, . . . , k]T . We have A(r, r) = X∗ AX,

X := [er1 , . . . , erk ] ∈ Cn×k .

(4.5)

Lemma 4.4 (Submatrices) Any principal submatrix of a positive (semi)definite matrix is positive (semi)definite. Proof Let X and B := A(r, r) be given by (4.5). If A is positive semidefinite then B is positive semidefinite since y ∗ By = y ∗ X∗ AXy = x ∗ Ax ≥ 0,

y ∈ Ck ,

x := Xy.

(4.6)

Suppose A is positive definite and y ∗ By = 0. By (4.6) we have x = 0 and since X has linearly independent columns it follows that y = 0. We conclude that B is positive definite.  

88

4 LDL* Factorization and Positive Definite Matrices

Theorem 4.2 (LDL* and LL*) The following is equivalent for a matrix A ∈ Cn×n . 1. A is positive definite, 2. A has an LDL* factorization with positive diagonal elements in D, 3. A has a Cholesky factorization. If the Cholesky factorization exists it is unique. Proof Recall that A−∗ := (A−1 )∗ = (A∗ )−1 . We show that 1 ⇒ 2 ⇒ 3 ⇒ 1. 1 ⇒ 2: Suppose A is positive definite. By Lemma 4.4 the leading principal submatrices A[k] ∈ Ck×k are positive definite and therefore nonsingular for k = 1, . . . , n − 1. Since A is Hermitian it has by Theorem 4.1 a unique LDL* factorization A = LDL∗ . To show that the ith diagonal element in D is positive we note that x i := L−∗ ei is nonzero since L−∗ is nonsingular. But then dii = e∗i Dei = e∗i L−1 AL−∗ ei = x ∗i Ax i > 0 since A is positive definite. 2 ⇒ 3: Suppose A has an LDL* factorization A = LDL∗ with positive ∗ 1/2 and D 1/2 := diagonal √ elements √ dii in D. Then A = SS , where S := LD diag( d11, . . . , dnn ), and this is a Cholesky factorization of A. 3 ⇒ 1: Suppose A has a Cholesky factorization A = LL∗ . Clearly A∗ = A. Since L has positive diagonal elements it is nonsingular and A is positive definite by Lemma 4.2. For uniqueness suppose LL∗ = SS ∗ are two Cholesky factorizations of the positive definite matrix A. Since A is nonsingular both L and S are nonsingular. Then S −1 L = S ∗ L−∗ , where by Lemma 2.5 S −1 L is lower triangular and S ∗ L−∗ is upper triangular, with diagonal elements ii /sii and sii /ii , respectively. But then both matrices must be equal to the same diagonal matrix and 2ii = sii2 . By positivity ii = sii and we conclude that S −1 L = I = S ∗ L−∗ which means that L = S.   A Cholesky factorization can also be written in the equivalent form A = R ∗ R, where R = L∗ is upper triangular with positive diagonal elements.   2 −1 Example 4.4 (2 × 2) The matrix A = has an LDL* and a Cholesky−1 2 factorization given by √      √  √   2√ 0 2 −1/ 2 2 −1 1 0 2 0 1 − 12 √ √ = . = 3/2 −1/ 2 3/2 0 −1 2 − 12 1 0 32 0 1



There are many good algorithms for finding the Cholesky factorization of a matrix, see [3]. The following version for finding the factorization of a matrix A with bandwidth d ≥ 1 uses the LDL* factorization Algorithm 4.1. Only the upper part of A is used. The algorithm uses the MATLAB command diag.

4.2 Positive Definite and Semidefinite Matrices

89

function L=bandcholesky(A,d) %L=bandcholesky(A,d) [L,dg]=LDL(A,d); L=L*diag(sqrt(dg)); end Listing 4.2 bandcholesky

As for the LDL* factorization the leading term in an operation count for a band matrix is O(d 2 n) . When d is small this is a considerable saving compared to the count 12 Gn = n3 /3 for a full matrix.

4.2.2 Positive Definite and Positive Semidefinite Criteria Not all Hermitian matrices are positive definite, and sometimes we can tell just by glancing at the matrix that it cannot be positive definite. Here are some necessary conditions. Theorem 4.3 (Necessary Conditions for Positive (Semi)Definiteness) If A ∈ Cn×n is positive (semi)definite then for all i, j with i = j 1. 2. 3. 4.

aii > 0, (aii ≥ 0), |Re (aij )| < (aii + ajj )/2, (|Re (aij )| ≤ (aii + ajj )/2), √ √ |aij | < aii ajj , (|aij | ≤ aii ajj ), If A is positive semidefinite and aii = 0 for some i then aij = aj i = 0 for j = 1, . . . , n.

Proof Clearly aii = eTi Aei > (≥)0 and Part 1 follows. If α, β ∈ C and αei +βej = 0 then 0 < (≤)(αei + βej )∗ A(αe i + βej ) = |α|2 aii + |β|2ajj + 2Re (αβaij ).

(4.7)

Taking α = 1, β = ±1 we obtain aii + ajj ± 2Re aij > 0 and this implies Part 2. We first show 3. when A is positive definite. Taking α = −aij , β = aii in (4.7) we find 0 < |aij |2 aii + aii2 ajj − 2|aij |2 aii = aii (aii ajj − |aij |2 ). Since aii > 0 Part 3 follows in the positive definite case. Suppose now A is positive semidefinite. For ε > 0 we define B := A + εI . The matrix B is positive definite since it is Hermitian and x ∗ Bx ≥ εx22 > 0 for any nonzero x ∈ Cn . From what we have shown %  |aij | = |bij | < bii bjj = (aii + ε)(ajj + ε), i = j.

90

4 LDL* Factorization and Positive Definite Matrices

Since ε > 0 is arbitrary Part 3 follows in the semidefinite case. Since A is Hermitian Part 3 implies Part 4.   Example 4.5 (Not Positive Definite) Consider the matrices A1 =

  01 , 11

A2 =

  12 , 22

A3 =

  −2 1 . 1 2

Here A1 and A3 are not positive definite, since a diagonal element is not positive. A2 is not positive definite since neither Part 2 nor Part 3 in Theorem 4.3 are satisfied.   The matrix 21 12 enjoys all the necessary conditions in Theorem 4.3. But to decide if it is positive definite it is nice to have sufficient conditions as well. We start by considering eigenvalues of a positive (semi)definite matrix. Lemma 4.5 (Positive Eigenvalues) A matrix is positive (semi)definite if and only if it is Hermitian and all its eigenvalues are positive (nonnegative). Proof Suppose A is positive (semi)definite. Then A is Hermitian by definition, and if Ax = λx and x is nonzero, then x ∗ Ax = λx ∗ x. This implies that λ > 0(≥ 0) since A is positive (semi)definite and x ∗ x = x22 > 0. Conversely, suppose A ∈ Cn×n is Hermitian with positive (nonnegative) eigenvalues λ1 , . . . , λn . By Theorem 6.9 (the spectral theorem) there is a matrix U ∈ Cn×n with U ∗ U = U U ∗ = I such that U ∗ AU = diag(λ1 , . . . , λn ). Let x ∈ Cn and define z := U ∗ x = [z1 , . . . , zn ]T ∈ Cn . Then x = U U ∗ x = U z and by the spectral theorem x ∗ Ax = z∗ U ∗ AU z = z∗ diag(λ1 , . . . , λn )z =

n

λj |zj |2 ≥ 0.

j =1

It follows that A is positive semidefinite. Since U ∗ is nonsingular we see that z = U ∗ x is nonzero if x is nonzero, and therefore A is positive definite.   Lemma 4.6 (Positive Semidefinite and Nonsingular) A matrix is positive definite if and only if it is positive semidefinite and nonsingular. Proof If A is positive definite then it is positive semidefinite and if Ax = 0 then x ∗ Ax = 0 which implies that x = 0. Conversely, if A is positive semidefinite then it is Hermitian with nonnegative eigenvalues (cf. Lemma 4.5). If it is nonsingular all eigenvalues are positive (cf. Theorem 1.11), and it follows from Lemma 4.5 that A is positive definite.   The following necessary and sufficient conditions can be used to decide if a matrix is positive definite.

4.3 Semi-Cholesky Factorization of a Banded Matrix

91

Theorem 4.4 (Positive Definite Characterization) The following statements are equivalent for a matrix A ∈ Cn×n . 1. A is positive definite. 2. A is Hermitian with only positive eigenvalues. 3. A is Hermitian and all leading principal submatrices have a positive determinant. 4. A = BB ∗ for a nonsingular B ∈ Cn×n . Proof 1 ⇐⇒ 2: This follows from Lemma 4.5. 1 ⇒ 3: A positive definite matrix has positive eigenvalues, and since the determinant of a matrix equals the product of its eigenvalues (cf. Theorem 1.10) the determinant is positive. Every leading principal submatrix of a positive definite matrix is positive definite (cf. Lemma 4.4) and therefore has a positive determinant. 3 ⇒ 4: Since a leading principal submatrix has a positive determinant it is nonsingular and Theorem 4.1 implies that A has a unique LDL* factorization and by Theorem 4.2 a unique Cholesky factorization A = BB ∗ with B = L. 4 ⇒ 1: This follows from Lemma 4.2.   Example  4.6  (Positive Definite Characterization) Consider the symmetric matrix 31 A := . 13 1. We have x T Ax = 2x12 + 2x22 + (x1 + x2 )2 > 0 for all nonzero x showing that A is positive definite. 2. The eigenvalues of A are λ1 = 2 and λ2 = 4. They are positive showing that A is positive definite since it is symmetric. 3. We find det(A[1] ) = 3 and det(A[2] ) = 8 showing again that A is positive definite since it is also symmetric. 4. Finally A is positive definite since by Example 4.2 we have A = BB ∗ ,

 B=

  √ 3 0 1 0 √ . 8/3 0 1/3 1

4.3 Semi-Cholesky Factorization of a Banded Matrix A positive semidefinite matrix has a factorization that is similar to the Cholesky factorization. Definition 4.2 (Semi-Cholesky Factorization) A factorization A = LL∗ of A ∈ Cn×n , where L is lower triangular with nonnegative diagonal elements is called a semi-Cholesky factorization.

92

4 LDL* Factorization and Positive Definite Matrices

Note that a semi-Cholesky factorization of a positive definite matrix is necessarily a Cholesky factorization. For if A is positive definite then it is nonsingular and then L must be nonsingular. Thus the diagonal elements of L cannot be zero. Theorem 4.5 (Characterization, Semi-Cholesky Factorization) A matrix A ∈ Cn×n has a semi-Cholesky factorization A = LL∗ if and only if it is positive semidefinite. Proof If A = LL∗ is a semi-Cholesky factorization then A is Hermitian. Moreover, x ∗ Ax = L∗ x22 ≥ 0 and A is positive semidefinite. For the converse we use induction on n. A positive semidefinite matrix of order one has a semi-Cholesky factorization since the only element in A is nonnegative. Suppose any positive semidefinite matrix of order n − 1 has a semi-Cholesky factorization and suppose A ∈ Cn×n is positive semidefinite. We partition A as follows 

 α v∗ A= , v B

α ∈ C, v ∈ Cn−1 , B ∈ C(n−1)×(n−1) .

(4.8)

There are two cases. Suppose first α = e∗1 Ae1 > 0. We claim that C := B − vv ∗ /α is positive semidefinite. C is Hermitian since B is. To show that C is positive semidefinite we consider any y ∈ Cn−1 and define x ∗ := [−y ∗ v/α, y ∗ ] ∈ Cn . Then  ∗  ∗  αv −v y/α v B y  ∗  −v y/α = [0, −(y ∗ v)v ∗ /α + y ∗ B] y

0 ≤ x ∗ Ax = [−y ∗ v/α, y ∗ ]

(4.9)

= −y ∗ vv ∗ y/α + y ∗ By = y ∗ Cy. So C ∈ C(n−1)×(n−1) is positive semidefinite and by the induction hypothesis it has a semi-Cholesky factorization C = L1 L∗1 . The matrix L∗ :=

 ∗  β v /β , 0 L∗1

β :=

√ α,

(4.10)

is upper triangular with nonnegative diagonal elements and LL∗ =



β 0 v/β L1

  ∗   ∗ β v /β αv = =A v B 0 L∗1

is a semi-Cholesky factorization of A. If α = 0 then part 4 of Theorem 4.3 implies that v = 0. Moreover, B ∈ C(n−1)×(n−1) in (4.8) is positive semidefinite and therefore has a semi-Cholesky

4.3 Semi-Cholesky Factorization of a Banded Matrix

93



 0 0∗ factorization B = But then where L = is a semi-Cholesky 0 L1 factorization of A. Indeed, L is lower triangular and L1 L∗1 .

LL∗ =

LL∗ ,



0 0∗ 0 L1

  ∗   ∗ 0 0 00 = = A. 0 L∗1 0B  

Recall that a matrix A is d-banded if aij = 0 for |i − j | > d. A (semi-) Cholesky factorization preserves bandwidth. Theorem 4.6 (Bandwidth Semi-Cholesky Factor) The semi-Cholesky factor L given by (4.10) has the same bandwidth as A. Proof Suppose A ∈ Cn×n is d-banded. Then v ∗ = [u∗ , 0∗ ] in (4.8), where u ∈ Cd , and therefore C := B − vv ∗ /α differs from B only in the upper left d × d corner. It follows that C has the same bandwidth as B and A. By induction on n, C = L1 L∗1 , where L∗1 has the same bandwidth as C. But then L in (4.10) has the same bandwidth as A.   Consider now implementing an algorithm based on the previous discussion. Since A is Hermitian we only need to use the lower part of A. The first column of L is [β, v ∗ /β]∗ if α > 0. If α = 0 then by 4 in Theorem 4.3 the first column of A is zero and this is also the first column of L. We obtain if A(1, 1) > 0  A(1, 1) = A(1, 1) A(2 : n, 1) = A(2 : n, 1)/A(1, 1)

(4.11)

for j = 2 : n A(j : n, j ) = A(j : n, j ) − A(j, 1) ∗ A(j : n, 1) Here we store the first column of L in the first column of A and the lower part of C = B − vv ∗ /α in the lower part of A(2 : n, 2 : n). The code can be made more efficient when A is a d-banded matrix. We simply replace all occurrences of n by min(i + d, n). Continuing the reduction we arrive at the following algorithm, which take a d-banded positive semidefinite A and d ≥ 1 as input, and returns a lower triangular matrix L so that A = LL∗ . This is the Cholesky factorization of A if A is positive definite and a semi-Cholesky factorization of A otherwise. The algorithm uses the MATLAB command tril:

94

4 LDL* Factorization and Positive Definite Matrices

function L=bandsemicholeskyL(A,d) %L=bandsemicholeskyL(A,d) n=length(A); for k=1:n kp=min(n,k+d); if A(k,k)>0 A(k,k)=sqrt(A(k,k)); A((k+1):kp,k)=A((k+1):kp,k)/A(k,k); for j=k+1:kp A(j:kp,j)=A(j:kp,j)-A(j,k)*A(j:kp,k); end else A(k:kp,k)=zeros(kp-k+1,1); end end L=tril(A); end Listing 4.3 bandsemicholeskyL

In the algorithm we overwrite the lower triangle of A with the elements of L. Column k of L is zero for those k where kk = 0. We reduce round-off noise by forcing those rows to be zero. In the semidefinite case no update is necessary and we “do nothing”. Deciding when a diagonal element is zero can be a problem in floating point arithmetic. We end the section with some necessary and sufficient conditions for a matrix to be positive semidefinite. Theorem 4.7 (Positive Semidefinite Characterization) The following is equivalent for a matrix A ∈ Cn×n . 1. 2. 3. 4.

A is positive semidefinite. A is Hermitian with only nonnegative eigenvalues. A is Hermitian and all principal submatrices have a nonnegative determinant. A = BB ∗ for some B ∈ Cn×n .

Proof 1 ⇐⇒ 2: This follows from Lemma 4.5. 1 ⇐⇒ 4: This follows from Theorem 4.5 1 ⇐⇒ 3: We refer to page 567 of [15], where it is shown that 4 ⇒ 3 (and therefore 1 ⇒ 3 since 1 ⇐⇒ 4) and 3 ⇒ 1.  

4.4 The Non-symmetric Real Case

95

Example 4.7 (Positive Semidefinite Characterization) Consider the symmetric  11 matrix A := . 11 1. We have x ∗ Ax = x12 + x22 + x1 x2 + x2 x1 = (x1 + x2 )2 ≥ 0 for all x ∈ R2 showing that A is positive semidefinite. 2. The eigenvalues of A are λ1 = 2 and λ2 = 0 and they are nonnegative showing that A is positive semidefinite since it is symmetric. 3. There are three principal sub matrices, and they have determinants det([a11]) = 1, det([a22]) = 1 and det(A) = 0 and showing again that A is positive semidefinite.   10 4. Finally A is positive semidefinite since A = BB ∗ , where B = . 10 In part 4 of Theorem 4.7 we require nonnegativity of all principal minors, while only positivity of leading principal minors was required for positive definite matrices (cf. Theorem 4.4). To see that nonnegativity of the leading principal minors  0 . The leading principal minors are is not enough consider the matrix A := 00 −1 nonnegative, but A is not positive semidefinite.

4.4 The Non-symmetric Real Case In this section we say that a matrix A ∈ Rn×n is positive semidefinite if x ∗ Ax ≥ 0 for all x ∈ Rn and positive definite if x ∗ Ax > 0 for all nonzero x ∈ Rn . Thus we do not require A to be symmetric. This means that some of the eigenvalues can be complex (cf. Example 4.8). Note that a non-symmetric positive definite matrix is nonsingular, but in Exercise 4.3 you can show that a converse is not true. We have the following theorem. Theorem 4.8 (The Non-symmetric Case) Suppose A ∈ Rn×n is positive definite. Then the following holds true. 1. 2. 3. 4. 5.

Every principal submatrix of A is positive definite, A has a unique LU factorization, the real eigenvalues of A are positive, det(A) > 0, aii ajj > aij aj i , for i = j .

Proof 1. The proof is the same as for Lemma 4.4. 2. Since all leading submatrices are positive definite they are nonsingular and the result follows from the LU Theorem 3.4.

96

4 LDL* Factorization and Positive Definite Matrices

3. Suppose (λ, x) is an eigenpair of A and that λ is real. Since A is real we can choose x to be real. Multiplying Ax = λx by x T and solving for λ we find T λ = xx TAx > 0. x 4. The determinant of A equals the product of its eigenvalues. The eigenvalues are either real and positive or occur in complex conjugate pairs. The product of two nonzero complex conjugate is positive.  a numbers a  5. The principal submatrix ajiii ajjij has a positive determinant.   Example 4.8 (2 × 2 Positive Definite) A non-symmetric positive definite matrix can have complex eigenvalues. The family of matrices 

 2 2−a A[a] := , a 1

a∈R

is positive definite for any a ∈ R. Indeed, for any nonzero x ∈ R2 x T Ax = 2x12 + (2 − a)x1 x2 + ax2 x1 + x22 = x12 + (x1 + x2 )2 > 0. The eigenvalues of A[a] are positive for a ∈ [1 − other values of a.



5 2 ,1

+



5 2 ]

and complex for

4.5 Exercises Chap. 4 4.5.1 Exercises Sect. 4.2 Exercise 4.1 (Positive Definite Characterizations)  Show directly that all 4 char21 acterizations in Theorem 4.4 hold for the matrix . 12 Exercise 4.2 (L1U factorization (Exam 1982-1)) ]Find the L1U factorization of the following matrix A ∈ Rn×n ⎛

⎞ 1 −1 0 · · · 0 ⎜ . ⎟ ⎜ −1 2 −1 . . . .. ⎟ ⎜ ⎟ ⎜ ⎟ A = ⎜ 0 ... ... ... 0 ⎟ . ⎜ ⎟ ⎜ . . ⎟ ⎝ .. . . −1 2 −1 ⎠ 0 · · · 0 −1 2 Is A positive definite?

4.6 Review Questions

97

Exercise 4.3 (A Counterexample) In the non-symmetric case a nonsingular positive semidefinite  matrixis not necessarily positive definite. Show this by considering 1 0 the matrix A := . −2 1 Exercise 4.4 (Cholesky Update (Exam Exercise 2015-2)) a) Let E ∈ Rn×n be of the form E = I + uuT , where u ∈ Rn . Show that E is symmetric and positive definite, and find an expression for E −1 .1 b) Let A ∈ Rn×n be of the form A = B + uuT , where B ∈ Rn×n is symmetric and positive definite, and u ∈ Rn . Show that A can be decomposed as A = L(I + vv T )LT , where L is nonsingular and lower triangular, and v ∈ Rn . c) Assume that the Cholesky decomposition of B is already computed. Outline a procedure to solve the system Ax = b, where A is of the form above. Exercise 4.5 (Cholesky Update (Exam Exercise 2016-2)) Let A ∈ Rn×n be a symmetric positive definite matrix with a known Cholesky factorization A = LLT . Furthermore, let A+ be a corresponding (n + 1) × (n + 1) matrix of the form  A+ =

 A a , aT α

where a is a vector in Rn , and α is a real number. We assume that the matrix A+ is symmetric positive definite. a) Show that if A+ = L+ LT+ is the Cholesky factorization of A+ , then L+ is of the form   L 0 L+ = T , y λ i.e., that the leading principal n × n submatrix of L+ is L. b) Explain why α > L−1 a22 . c) Explain how you can compute L+ when L is known.

4.6 Review Questions 4.6.1 4.6.2

1 Hint:

What is the content of the LDL* theorem? Is A∗ A always positive definite? The matrix E −1 is of the form E −1 = I + auuT for some a ∈ R.

98

4.6.3 4.6.4 4.6.5 4.6.6 4.6.7

4 LDL* Factorization and Positive Definite Matrices

! 10 4 3 " Is the matrix 4 0 2 positive definite? 3 25 What class of matrices has a Cholesky factorization? What is the bandwidth of the Cholesky factor of a band matrix? For a symmetric matrix give 3 conditions that are equivalent to positive definiteness. What class of matrices has a semi-Cholesky factorization?

Chapter 5

Orthonormal and Unitary Transformations

In Gaussian elimination and LU factorization we solve a linear system by transforming it to triangular form. These are not the only kind of transformations that can be used for such a task. Matrices with orthonormal columns, called unitary matrices can be used to reduce a square matrix to upper triangular form and more generally a rectangular matrix to upper triangular (also called upper trapezoidal) form. This lead to a decomposition of a rectangular matrix known as a QR decomposition and a reduced form which we refer to as a QR factorization. The QR decomposition and factorization will be used in later chapters to solve least squares- and eigenvalue problems. Unitary transformations have the advantage that they preserve the Euclidian norm of a vector. This means that when a unitary transformation is applied to an inaccurate vector then the error will not grow. Thus a unitary transformation is said to be numerically stable. We consider two classes of unitary transformations known as Householder- and Givens transformations, respectively.

5.1 Inner Products, Orthogonality and Unitary Matrices An inner product or scalar product in a vector space is a function mapping pairs of vectors into a scalar.

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_5

99

100

5 Orthonormal and Unitary Transformations

5.1.1 Real and Complex Inner Products Definition 5.1 (Inner Product) An inner product in a complex vector space V is a function V × V → C satisfying for all x, y, z ∈ V and all a, b ∈ C the following conditions: 1. x, x ≥ 0 with equality if and only if x = 0. 2. x, y = y, x 3. ax + by, z = ax, z + by, z.

(positivity) (skew symmetry) (linearity)

The pair (V, ·, ·) is called an inner product space. Note the complex conjugate in 2. Since x, ay + bz = ay + bz, x = ay, x + bz, x = ay, x + bz, x we find x, ay + bz = ax, y + bx, z,

ax, ay = |a|2 x, y.

(5.1)

An inner product in a real vector space V is real valued function satisfying Properties 1,2,3 in Definition 5.1, where we can replace skew symmetry by symmetry x, y = y, x (symmetry). In the real case we have linearity in both variables since we can remove the complex conjugates in (5.1). Recall that (cf. (1.10)) the standard inner product in Cn is given by x, y := y ∗ x = x T y =

n

xj yj .

j =1

Note the complex conjugate on y. It is clearly an inner product in Cn . The function · : V → R,

x −→ x :=

 x, x

(5.2)

is called the inner product norm. The inner product norm for the standard inner product is the Euclidian norm √ x = x2 = x ∗ x.

5.1 Inner Products, Orthogonality and Unitary Matrices

101

The following inequality holds for any inner product. Theorem 5.1 (Cauchy-Schwarz Inequality) For any x, y in a real or complex inner product space |x, y| ≤ xy,

(5.3)

with equality if and only if x and y are linearly dependent. Proof If y = 0 then 0x + y = 0 and x and y are linearly dependent. Moreover the inequality holds with equality since x, y = x, 0y = 0x, y = 0 and y = 0. So assume y = 0. Define z := x − ay,

a :=

x, y . y, y

By linearity z, y = x, y − ay, y = 0 so that by 2. and (5.1) ay, z + z, ay = az, y + az, y = 0.

(5.4)

But then x2 = x, x = z + ay, z + ay (5.4)

(5.1)

= z, z + ay, ay = z2 + |a|2y2

≥ |a|2 y2 =

|x, y|2 . y2

Multiplying by y2 gives (5.3). We have equality if and only if z = 0, which means that x and y are linearly dependent.   Theorem 5.2 (Inner Product Norm) For all x, y in an inner product space and all a in C we have 1. x ≥ 0 with equality if and only if x = 0. 2. ax = |a| x. 3. x + y ≤ x + y, √ where x := x, x.

(positivity) (homogeneity) (subadditivity)

In general a function   : Cn → R that satisfies these three properties is called a vector norm. A class of vector norms called p-norms will be studied in Chap. 8.

102

5 Orthonormal and Unitary Transformations

Proof The first statement is an immediate consequence of positivity, while the second one follows from (5.1). Expanding x + ay2 = x + ay, x + ay using (5.1) we obtain x + ay2 = x2 + ay, x + ax, y + |a|2 y2 ,

a ∈ C,

x, y ∈ V.

(5.5)

Now (5.5) with a = 1 and the Cauchy-Schwarz inequality implies x + y2 ≤ x2 + 2xy + y2 = (x + y)2 .  

Taking square roots completes the proof. In the real case the Cauchy-Schwarz inequality implies that −1 ≤ for nonzero x and y, so there is a unique angle θ in [0, π] such that cos θ =

x, y . xy

x,y xy

≤1

(5.6)

This defines the angle between vectors in a real inner product space.

5.1.2 Orthogonality Definition 5.2 (Orthogonality) Two vectors x, y in a real or complex inner product space are orthogonal or perpendicular, denoted as x ⊥ y, if x, y = 0. The vectors are orthonormal if in addition x = y = 1. From the definitions (5.6), (5.20) of angle θ between two nonzero vectors in Rn or Cn it follows that x ⊥ y if and only if θ = π/2. Theorem 5.3 (Pythagoras) For a real or complex inner product space x + y2 = x2 + y2 ,

if x ⊥ y.

Proof We set a = 1 in (5.5) and use the orthogonality.

(5.7)  

Definition 5.3 (Orthogonal- and Orthonormal Bases) A set of nonzero vectors {v 1 , . . . , v k } in a subspace S of a real or complex inner product space is an orthogonal basis for S if it is a basis for S and v i , v j  = 0 for i = j . It is an orthonormal basis for S if it is a basis for S and v i , v j  = δij for all i, j . A basis for a subspace of an inner product space can be turned into an orthogonalor orthonormal basis for the subspace by the following construction (Fig. 5.1).

5.1 Inner Products, Orthogonality and Unitary Matrices Fig. 5.1 The construction of v 1 and v 2 in Gram-Schmidt. The constant c is given by c := s 2 , v 1 /v 1 , v 1 

103

*

     v1 := s1 v2   A A A * 

* 

s2



 cv1  



Theorem 5.4 (Gram-Schmidt) Let {s 1 , . . . , s k } be a basis for a real or complex inner product space (S, ·, ·). Define v 1 := s 1 ,

v j := s j −

j −1

s j , v i  i=1

v i , v i 

vi ,

j = 2, . . . , k.

(5.8)

Then {v 1 , . . . , v k } is an orthogonal basis for S and the normalized vectors , {u1 , . . . , uk } :=

vk v1 ,..., v 1  v k 

-

form an orthonormal basis for S. Proof To show that {v 1 , . . . , v k } is an orthogonal basis for S we use induction on k. Define subspaces Sj := span{s 1 , . . . , s j } for j = 1, . . . , k. Clearly v 1 = s 1 is an orthogonal basis for S1 . Suppose for some j ≥ 2 that v 1 , . . . , v j −1 is an orthogonal basis for Sj −1 and let v j be given by (5.8) as a linear combination of s j and v 1 , . . . , v j −1 . Now each of these v i is a linear combination of s 1 , . . . , s i , j and we obtain v j = i=1 ai s i for some a0 , . . . , aj with aj = 1. Since s 1 , . . . , s j are linearly independent and aj = 0 we deduce that v j = 0. By the induction hypothesis v j , v l  = s j , v l  −

j −1

s j , v i  i=1

v i , v i 

v i , v l  = s j , v l  −

s j , v l  v l , v l  = 0 v l , v l 

for l = 1, . . . , j − 1. Thus v 1 , . . . , v j is an orthogonal basis for Sj . If {v 1 , . . . , v k } is an orthogonal basis for S then clearly {u1 , . . . , uk } is an orthonormal basis for S.   Sometimes we want to extend an orthogonal basis for a subspace to an orthogonal basis for a larger space.

104

5 Orthonormal and Unitary Transformations

Theorem 5.5 (Orthogonal Extension of Basis) Suppose S ⊂ T are finite dimensional subspaces of a vector space V. An orthogonal basis for S can always be extended to an orthogonal basis for T . Proof Suppose dim S := k < dim T = n. Using Theorem 1.3 we first extend an orthogonal basis s 1 , . . . , s k for S to a basis s 1 , . . . , s k , s k+1 , . . . , s n for T , and then apply the Gram-Schmidt process to this basis obtaining an orthogonal basis v 1 , . . . , v n for T . This is an extension of the basis for S since v i = s i for i = 1, . . . , k. We show this by induction. Clearly v 1 = s 1 . Suppose for some 2 ≤ r < k that v j = s j for j = 1, . . . , r − 1. Consider (5.8) for j = r. Since s r , v i  = s r , s i  = 0 for i < r we obtain v r = s r .   Letting S = span(s 1 , . . . , s k ) and T be Rn or Cn we obtain Corollary 5.1 (Extending Orthogonal Vectors to a Basis) For 1 ≤ k < n a set {s 1 , . . . , s k } of nonzero orthogonal vectors in Rn or Cn can be extended to an orthogonal basis for the whole space.

5.1.3 Sum of Subspaces and Orthogonal Projections Suppose S and T are subspaces of a real or complex vector space V endowed with an inner product x, y. We define • Sum: S + T := {s + t : s ∈ S and t ∈ T }, • direct sum S ⊕ T : a sum where S ∩ T = {0}, ⊥

• orthogonal sum S ⊕T : a sum where s, t = 0 for all s ∈ S and t ∈ T . We note that • S + T is a vector space, a subspace of V which in this book will be Rn or Cn (cf. Example 1.2). • Every v ∈ S ⊕ T can be decomposed uniquely in the form v = s + t, where s ∈ S and t ∈ T . For if v = s 1 + t 1 = s 2 + t 2 for s 1 , s 2 ∈ S and t 1 , t 2 ∈ T , then 0 = s 1 − s 2 + t 1 − t 2 or s 1 − s 2 = t 2 − t 1 . It follows that s 1 − s 2 and t 2 − t 1 belong to both S and T and hence to S ∩ T . But then s 1 − s 2 = t 2 − t 1 = 0 so s 1 = s 2 and t 2 = t 1 . By (1.8) in the introduction chapter we have dim(S ⊕ T ) = dim(S) + dim(T ). The subspaces S and T in a direct sum are called complementary subspaces. • An orthogonal sum is a direct sum. For if v ∈ S ∩T then v is orthogonal to itself, v, v = 0, which implies that v = 0. We often write T := S ⊥ . • Suppose v = s 0 + t 0 ∈ S ⊕ T , where s 0 ∈ S and t 0 ∈ T . The vector s 0 is called the oblique projection of v into S along T . Similarly, The vector t 0 is

5.1 Inner Products, Orthogonality and Unitary Matrices

105

Fig. 5.2 The orthogonal projections of s + t into S and T

s+t t0

S

s0



called the oblique projection of v into T along S. If S ⊕⊥ T is an orthogonal sum then s_0 is called the orthogonal projection of v into S. Similarly, t_0 is called the orthogonal projection of v into T = S^⊥. The orthogonal projections are illustrated in Fig. 5.2.

Theorem 5.6 (Orthogonal Projection) Let S and T be subspaces of a finite dimensional real or complex vector space V with an inner product ⟨·, ·⟩. The orthogonal projections s_0 of v ∈ S ⊕⊥ T into S and t_0 of v ∈ S ⊕⊥ T into T satisfy v = s_0 + t_0, and

$$ \langle s_0, s\rangle = \langle v, s\rangle \ \text{ for all } s \in S, \qquad \langle t_0, t\rangle = \langle v, t\rangle \ \text{ for all } t \in T. \tag{5.9} $$

Moreover, if {v_1, ..., v_k} is an orthogonal basis for S then

$$ s_0 = \sum_{i=1}^{k} \frac{\langle v, v_i\rangle}{\langle v_i, v_i\rangle}\, v_i. \tag{5.10} $$

Proof We have ⟨s_0, s⟩ = ⟨v − t_0, s⟩ = ⟨v, s⟩, since ⟨t_0, s⟩ = 0 for all s ∈ S, and (5.9) follows. If s_0 is given by (5.10) then for j = 1, ..., k

$$ \langle s_0, v_j\rangle = \sum_{i=1}^{k} \frac{\langle v, v_i\rangle}{\langle v_i, v_i\rangle}\,\langle v_i, v_j\rangle = \frac{\langle v, v_j\rangle}{\langle v_j, v_j\rangle}\,\langle v_j, v_j\rangle = \langle v, v_j\rangle. $$

By linearity (5.9) holds for all s ∈ S. By uniqueness it must be the orthogonal projection of v ∈ S ⊕⊥ T into S. The proof for t_0 is similar. □


Corollary 5.2 (Best Approximation) Let S be a subspace of a finite dimensional real or complex vector space V with an inner product ⟨·, ·⟩ and corresponding norm ‖v‖ := √⟨v, v⟩. If s_0 ∈ S is the orthogonal projection of v ∈ V then

$$ \|v - s_0\| < \|v - s\|, \quad \text{for all } s \in S,\ s \neq s_0. \tag{5.11} $$

Proof Let s ∈ S with s ≠ s_0 and set u := s_0 − s ∈ S, so that u ≠ 0. It follows from (5.9) that ⟨v − s_0, u⟩ = 0. By (5.7) (Pythagoras) we obtain ‖v − s‖² = ‖v − s_0 + u‖² = ‖v − s_0‖² + ‖u‖² > ‖v − s_0‖². □
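The projection formula (5.10) and the best approximation property (5.11) are easy to check numerically. The following MATLAB sketch is not from the text: it uses the built-in function orth (chosen here for convenience) to produce an orthonormal basis for a random subspace S, computes the projection s_0 by (5.10), and compares ‖v − s_0‖ with ‖v − s‖ for another element s of S.

n = 5; k = 2;
V = orth(randn(n,k));        % orthonormal basis for a random subspace S
v = randn(n,1);              % vector to be projected
s0 = zeros(n,1);
for i = 1:k                  % s0 = sum_i <v,v_i>/<v_i,v_i> v_i, formula (5.10)
    s0 = s0 + (V(:,i)'*v)/(V(:,i)'*V(:,i))*V(:,i);
end
s = V*randn(k,1);            % some other element of S
fprintf('||v-s0|| = %.4f <= ||v-s|| = %.4f\n', norm(v-s0), norm(v-s));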

5.1.4 Unitary and Orthogonal Matrices

In the rest of this chapter orthogonality is in terms of the standard inner product in C^n given by ⟨x, y⟩ := y^*x = Σ_{j=1}^{n} x_j ȳ_j. For symmetric and Hermitian matrices we have the following characterization.

Lemma 5.1 Let A ∈ C^{n×n} and ⟨x, y⟩ be the standard inner product in C^n. Then
1. A^T = A ⟺ ⟨Ax, y⟩ = ⟨x, Āy⟩ for all x, y ∈ C^n.
2. A^* = A ⟺ ⟨Ax, y⟩ = ⟨x, Ay⟩ for all x, y ∈ C^n.

Proof Suppose A^T = A and x, y ∈ C^n. Then

⟨x, Āy⟩ = (Āy)^*x = y^*Ā^*x = y^*A^Tx = y^*Ax = ⟨Ax, y⟩.

For the converse we take x = e_j and y = e_i for some i, j and obtain e_i^T A e_j = ⟨Ae_j, e_i⟩ = ⟨e_j, Āe_i⟩ = e_i^T A^T e_j. Thus A = A^T since they have the same i, j element for all i, j. The proof of 2. is similar. □

A square matrix U ∈ C^{n×n} is unitary if U^*U = I. If U is real then U^TU = I and U is called an orthogonal matrix. Unitary and orthogonal matrices have orthonormal columns. If U^*U = I the matrix U is nonsingular, U^{-1} = U^* and therefore UU^* = UU^{-1} = I as well. Moreover, both the columns and the rows of a unitary matrix of order n form orthonormal bases for C^n. We also note that the product of two unitary matrices is unitary. Indeed, if U_1^*U_1 = I and U_2^*U_2 = I then (U_1U_2)^*(U_1U_2) = U_2^*U_1^*U_1U_2 = I.


Theorem 5.7 (Unitary Matrix) The matrix U ∈ C^{n×n} is unitary if and only if ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ C^n. In particular, if U is unitary then ‖Ux‖_2 = ‖x‖_2 for all x ∈ C^n.

Proof If U^*U = I and x, y ∈ C^n then ⟨Ux, Uy⟩ = (Uy)^*(Ux) = y^*U^*Ux = y^*x = ⟨x, y⟩. Conversely, if ⟨Ux, Uy⟩ = ⟨x, y⟩ for all x, y ∈ C^n then U^*U = I since for i, j = 1, ..., n

(U^*U)_{i,j} = e_i^*U^*Ue_j = (Ue_i)^*(Ue_j) = ⟨Ue_j, Ue_i⟩ = ⟨e_j, e_i⟩ = e_i^*e_j,

so that (U^*U)_{i,j} = δ_{i,j} for all i, j. The last part of the theorem follows immediately by taking y = x. □
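As a quick numerical illustration, not part of the text, any orthogonal matrix preserves inner products and 2-norms; here the orthogonal factor Q returned by MATLAB's qr for a random matrix is used.

[Q, ~] = qr(randn(4));          % Q is orthogonal
x = randn(4,1); y = randn(4,1);
disp(abs((Q*x)'*(Q*y) - x'*y))  % inner product preserved, ~ rounding level
disp(abs(norm(Q*x) - norm(x)))  % 2-norm preserved, ~ rounding level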

5.2 The Householder Transformation

We consider a unitary matrix with many useful properties.

Definition 5.4 (Householder Transformation) A matrix H ∈ C^{n×n} of the form H := I − uu^*, where u ∈ C^n and u^*u = 2, is called a Householder transformation. The name elementary reflector is also used.

In the real case and for n = 2 we find

$$ H = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}\begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} 1-u_1^2 & -u_1u_2 \\ -u_2u_1 & 1-u_2^2 \end{bmatrix}. $$

A Householder transformation is Hermitian and unitary. Indeed, H^* = (I − uu^*)^* = H and H^*H = H² = (I − uu^*)(I − uu^*) = I − 2uu^* + u(u^*u)u^* = I. In the real case H is symmetric and orthogonal. There are several ways to represent a Householder transformation. Householder used I − 2uu^*, where u^*u = 1. For any nonzero v ∈ C^n the matrix

$$ H := I - 2\,\frac{vv^*}{v^*v} \tag{5.12} $$

is a Householder transformation. Indeed, H = I − uu^*, where u := √2 v/‖v‖_2 has length √2. Two vectors can, under certain conditions, be mapped into each other by a Householder transformation.

Lemma 5.2 Suppose x, y ∈ C^n are two vectors such that ‖x‖_2 = ‖y‖_2, y^*x is real and v := x − y ≠ 0. Then (I − 2 vv^*/(v^*v)) x = y.

Proof Since x^*x = y^*y and Re(y^*x) = y^*x we have

$$ v^*v = (x-y)^*(x-y) = 2x^*x - 2\,\mathrm{Re}(y^*x) = 2v^*x. \tag{5.13} $$

But then (I − 2 vv^*/(v^*v)) x = x − (2v^*x/(v^*v)) v = x − v = y. □

There is a nice geometric interpretation of this lemma. We have

$$ H = I - \frac{2vv^*}{v^*v} = P - \frac{vv^*}{v^*v}, \quad \text{where } P := I - \frac{vv^*}{v^*v}, $$

and

$$ Px = x - \frac{v^*x}{v^*v}\,v \overset{(5.13)}{=} x - \tfrac12 v = \tfrac12(x+y). $$

If x, y ∈ R^n it follows that Hx is the reflected image of x. The “mirror” M := {w ∈ R^n : w^*v = 0} contains the vector (x + y)/2 and has normal x − y. This is illustrated for the real case in Fig. 5.3.

Fig. 5.3 The Householder transformation in Example 5.1


Example 5.1 (Reflector) Suppose x := [1, 0, 1]^T and y := [−1, 0, 1]^T. Then v = x − y = [2, 0, 0]^T and

$$ H := I - \frac{2vv^T}{v^Tv} = \begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix} - \frac{2}{4}\begin{bmatrix} 2\\0\\0 \end{bmatrix}\begin{bmatrix} 2&0&0 \end{bmatrix} = \begin{bmatrix} -1&0&0\\0&1&0\\0&0&1 \end{bmatrix}, $$

$$ P := I - \frac{vv^T}{v^Tv} = \begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix} - \frac{1}{4}\begin{bmatrix} 2\\0\\0 \end{bmatrix}\begin{bmatrix} 2&0&0 \end{bmatrix} = \begin{bmatrix} 0&0&0\\0&1&0\\0&0&1 \end{bmatrix}. $$

The set

$$ M := \{w \in \mathbb{R}^3 : w^Tv = 0\} = \Big\{ [w_1, w_2, w_3]^T : 2w_1 = 0 \Big\} $$

is the yz plane (cf. Fig. 5.3), Hx = [−1, 0, 1]^T = y, and Px = [0, 0, 1]^T = (x + y)/2 ∈ M.

Householder transformations can be used to produce zeros in vectors. In the following theorem we map any vector in C^n into a multiple of the first unit vector.

Theorem 5.8 (Zeros in Vectors) For any x ∈ C^n there is a Householder transformation H ∈ C^{n×n} such that

$$ Hx = ae_1, \quad a = -\rho\|x\|_2, \quad \rho := \begin{cases} x_1/|x_1|, & \text{if } x_1 \neq 0, \\ 1, & \text{otherwise.} \end{cases} $$

Proof If x = 0 then Hx = 0 and a = 0. Any u with ‖u‖_2 = √2 will work, and we choose u := √2 e_1 in this case. For x ≠ 0 we define

$$ u := \frac{z + e_1}{\sqrt{1 + z_1}}, \quad \text{where } z := \bar{\rho}\,x/\|x\|_2. \tag{5.14} $$

Since |ρ| = 1 we have ρ‖x‖_2 z = |ρ|²x = x. Moreover, ‖z‖_2 = 1 and z_1 = |x_1|/‖x‖_2 is real, so that u^*u = (z + e_1)^*(z + e_1)/(1 + z_1) = (2 + 2z_1)/(1 + z_1) = 2. Finally,

$$ Hx = x - (u^*x)u = \rho\|x\|_2\big(z - (u^*z)u\big) = \rho\|x\|_2\Big(z - \frac{(z^* + e_1^*)z}{1 + z_1}(z + e_1)\Big) = \rho\|x\|_2\big(z - (z + e_1)\big) = -\rho\|x\|_2\,e_1 = ae_1. \quad \square $$


The formulas in Theorem 5.8 are implemented in the following algorithm adapted from [17]. To any given x ∈ C^n, a number a and a vector u with u^*u = 2 are computed so that (I − uu^*)x = ae_1:

function [u,a]=housegen(x)
% [u,a]=housegen(x)
a=norm(x);
if a==0
    u=x; u(1)=sqrt(2); return;
end
if x(1)==0
    r=1;
else
    r=x(1)/abs(x(1));
end
u=conj(r)*x/a;
u(1)=u(1)+1;
u=u/sqrt(u(1));
a=-r*a;
end

Listing 5.1 housegen

Note that
• In Theorem 5.8 the first component of z is z_1 = |x_1|/‖x‖_2 ≥ 0. Since ‖z‖_2 = 1 we have 1 ≤ 1 + z_1 ≤ 2. It follows that we avoid cancelation error when computing 1 + z_1, and u and a are computed in a numerically stable way.
• In order to compute Hx for a vector x we do not need to form the matrix H. Indeed, Hx = (I − uu^*)x = x − (u^*x)u (see the sketch after these notes). If u, x ∈ R^m this requires 2m operations to find u^Tx, m operations for (u^Tx)u and m operations for the final subtraction of the two vectors, a total of 4m arithmetic operations. If A ∈ R^{m×n} then 4mn operations are required for HA = A − u(u^TA), i.e., 4m operations for each of the n columns of A.
• Householder transformations can also be used to zero out only the lower part of a vector. Suppose x^T = [y^T, z^T], where y ∈ C^k, z ∈ C^{n−k} for some 1 ≤ k < n. The command [û, a] := housegen(z) defines a Householder transformation Ĥ = I − ûû^* so that Ĥz = ae_1 ∈ C^{n−k}. With u := [0; û] ∈ C^n we see that u^*u = û^*û = 2, and

$$ H := I - uu^* = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} 0 \\ \hat{u} \end{bmatrix}\begin{bmatrix} 0^* & \hat{u}^* \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & \hat{H} \end{bmatrix}, \qquad Hx = \begin{bmatrix} I & 0 \\ 0 & \hat{H} \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y \\ ae_1 \end{bmatrix}, $$

defines a Householder transformation that produces zeros in the lower part of x.
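The following MATLAB sketch, not part of the text, illustrates the second point above: the Householder transformation produced by housegen from Listing 5.1 (assumed to be available on the path) is applied to a vector and to a matrix without ever forming H explicitly.

x = [3; 4; 0];
[u, a] = housegen(x);
Hx = x - u*(u'*x);           % should equal a*e1
A = randn(3, 5);
HA = A - u*(u'*A);           % 4*m*n flops instead of forming H explicitly
disp([Hx, [a; 0; 0]])        % the two columns should agree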


5.3 Householder Triangulation

We say that a matrix R ∈ C^{m×n} is upper trapezoidal if r_{i,j} = 0 for j < i and i = 2, 3, ..., m. Upper trapezoidal matrices corresponding to m < n, m = n, and m > n look as follows:

$$ \begin{bmatrix} x&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{bmatrix}, \qquad \begin{bmatrix} x&x&x&x\\ 0&x&x&x\\ 0&0&x&x\\ 0&0&0&x \end{bmatrix}, \qquad \begin{bmatrix} x&x&x\\ 0&x&x\\ 0&0&x\\ 0&0&0 \end{bmatrix}. $$

In this section we consider a method for bringing a matrix to upper trapezoidal form using Householder transformations.

5.3.1 The Algorithm

We treat the cases m > n and m ≤ n separately and consider first m > n. We describe how to find a sequence H_1, ..., H_n of Householder transformations such that

$$ A_{n+1} := H_nH_{n-1}\cdots H_1A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = R, $$

where R_1 is square and upper triangular. We define

$$ A_1 := A, \qquad A_{k+1} = H_kA_k, \quad k = 1, 2, \dots, n. $$

Suppose A_k has the following form

$$ A_k = \begin{bmatrix}
a_{1,1}^{(1)} & \cdots & a_{1,k-1}^{(1)} & a_{1,k}^{(1)} & \cdots & a_{1,j}^{(1)} & \cdots & a_{1,n}^{(1)} \\
 & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
 & & a_{k-1,k-1}^{(k-1)} & a_{k-1,k}^{(k-1)} & \cdots & a_{k-1,j}^{(k-1)} & \cdots & a_{k-1,n}^{(k-1)} \\
 & & & a_{k,k}^{(k)} & \cdots & a_{k,j}^{(k)} & \cdots & a_{k,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{i,k}^{(k)} & \cdots & a_{i,j}^{(k)} & \cdots & a_{i,n}^{(k)} \\
 & & & \vdots & & \vdots & & \vdots \\
 & & & a_{m,k}^{(k)} & \cdots & a_{m,j}^{(k)} & \cdots & a_{m,n}^{(k)}
\end{bmatrix}
= \begin{bmatrix} B_k & C_k \\ 0 & D_k \end{bmatrix}. \tag{5.15} $$

Thus A_k is upper trapezoidal in its first k − 1 columns (which is true for k = 1).


Let Ĥ_k := I − û_kû_k^* be a Householder transformation that maps the first column [a_{k,k}^{(k)}, ..., a_{m,k}^{(k)}]^T of D_k to a multiple of e_1, Ĥ_k(D_ke_1) = a_ke_1. Using Algorithm 5.1 we have [û_k, a_k] = housegen(D_ke_1). Then

$$ H_k := \begin{bmatrix} I_{k-1} & 0 \\ 0 & \hat{H}_k \end{bmatrix} $$

is a Householder transformation and

$$ A_{k+1} := H_kA_k = \begin{bmatrix} B_k & C_k \\ 0 & \hat{H}_kD_k \end{bmatrix} = \begin{bmatrix} B_{k+1} & C_{k+1} \\ 0 & D_{k+1} \end{bmatrix}, $$

where B_{k+1} ∈ C^{k×k} is upper triangular and D_{k+1} ∈ C^{(m−k)×(n−k)}. Thus A_{k+1} is upper trapezoidal in its first k columns and the reduction has been carried one step further. At the end R := A_{n+1} = [R_1; 0], where R_1 is upper triangular.

The process can also be applied to A ∈ C^{m×n} if m ≤ n. If m = 1 then A is already in upper trapezoidal form. Suppose m > 1. In this case m − 1 Householder transformations will suffice and H_{m−1} ··· H_1A is upper trapezoidal.

In an algorithm we can store most of the vector û_k = [u_{kk}, ..., u_{mk}]^T and the matrix A_k in A. However, the elements u_{k,k} and a_k = r_{k,k} have to compete for the diagonal in A. For m = 4 and n = 3 the two possibilities look as follows:

$$ A = \begin{bmatrix} u_{11} & r_{12} & r_{13} \\ u_{21} & u_{22} & r_{23} \\ u_{31} & u_{32} & u_{33} \\ u_{41} & u_{42} & u_{43} \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ u_{21} & r_{22} & r_{23} \\ u_{31} & u_{32} & r_{33} \\ u_{41} & u_{42} & u_{43} \end{bmatrix}. $$

The following algorithm for Householder triangulation takes A ∈ C^{m×n} and B ∈ C^{m×r} as input, and uses housegen to compute Householder transformations H_1, ..., H_s so that R = H_s ··· H_1A is upper trapezoidal, and C = H_s ··· H_1B. The matrices R and C are returned. If B is the empty matrix then C is the empty matrix with m rows and 0 columns. r_{k,k} is stored in A, and u_{k,k} is stored in a separate vector. We will see that the algorithm can be used to solve linear systems and least squares problems with right hand side(s) B, and to compute the product of the Householder transformations by choosing B = I.

function [R,C] = housetriang(A,B)
% [R,C] = housetriang(A,B)
[m,n]=size(A); r=size(B,2); A=[A,B];
for k=1:min(n,m-1)
    [v,A(k,k)]=housegen(A(k:m,k));
    C=A(k:m,k+1:n+r);
    A(k:m,k+1:n+r)=C-v*(v'*C);
end
R=triu(A(:,1:n)); C=A(:,n+1:n+r);
end

Listing 5.2 housetriang


Here v = û_k and the update is computed as Ĥ_kC = (I − vv^*)C = C − v(v^*C). The MATLAB command triu extracts the upper triangular part of A, introducing zeros in rows n + 1, ..., m.

5.3.2 The Number of Arithmetic Operations

The bulk of the work in Algorithm 5.2 is the computation of C - v*(v'*C) for each k. Since in Algorithm 5.2, C ∈ C^{(m−k+1)×(n+r−k)} and m ≥ n, the cost of computing this update in the real case is 4(m − k + 1)(n + r − k) arithmetic operations. This implies that the work in Algorithm 5.2 can be estimated as

$$ \int_0^n 4(m-k)(n+r-k)\,dk \approx 2m(n+r)^2 - \frac{2}{3}(n+r)^3. \tag{5.16} $$

For m = n and r = 0 this gives 4n³/3 = 2G_n for the number of arithmetic operations to bring a matrix A ∈ R^{n×n} to upper triangular form using Householder transformations.

5.3.3 Solving Linear Systems Using Unitary Transformations Consider now the linear system Ax = b, where A is square. Using Algorithm 5.2 we obtain an upper triangular system Rx = c that is upper triangular and nonsingular if A is nonsingular. Thus, it can be solved by back substitution and we have a method for solving linear systems that is an alternative to Gaussian elimination. The two methods are similar since they both reduce A to upper triangular form using certain transformations and they both work for nonsingular systems. Which method is better? Here is a very brief discussion. • Advantages with Householder: – Row interchanges are not necessary, but see [3]. – Numerically stable. • Advantages with Gauss – Half the number of arithmetic operations compared to Householder. – Row interchanges are often not necessary. – Usually stable (but no guarantee). Linear systems can be constructed where Gaussian elimination will fail numerically even if row interchanges are used, see [21]. On the other hand the transformations used in Householder triangulation are unitary so the method is quite stable. So why is Gaussian elimination more popular than Householder triangulation? One


reason is that the number of arithmetic operations in (5.16) when m = n is 4n3 /3 = 2Gn , which is twice the number for Gaussian elimination. Numerical stability can be a problem with Gaussian elimination, but years and years of experience shows that it works well for most practical problems and pivoting is often not necessary. Also Gaussian elimination often wins for banded and sparse problems.
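As a hedged sketch of this alternative to Gaussian elimination (not an algorithm from the text), the following MATLAB lines solve a square system with housetriang from Listing 5.2, assumed available, followed by a plain back substitution loop used here in place of Algorithm 3.2.

n = 4;
A = randn(n); b = randn(n,1);
[R, c] = housetriang(A, b);   % R = H_{n-1}...H_1*A, c = H_{n-1}...H_1*b
x = zeros(n,1);               % back substitution for the triangular system Rx = c
for k = n:-1:1
    x(k) = (c(k) - R(k,k+1:n)*x(k+1:n)) / R(k,k);
end
disp(norm(A*x - b))           % should be of the order of rounding errors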

5.4 The QR Decomposition and QR Factorization Gaussian elimination without row interchanges results in an LU factorization A = LU of A ∈ Cn×n . Consider Householder triangulation of A. Applying Algorithm 5.2 gives R = H n−1 · · · H 1 A implying the factorization A = QR, where Q = H 1 · · · H n−1 is unitary and R is upper triangular. This is known as a QR factorization of A.

5.4.1 Existence

For a rectangular matrix we define the following.

Definition 5.5 (QR Decomposition) Let A ∈ C^{m×n} with m, n ∈ N. We say that A = QR is a QR decomposition of A if Q ∈ C^{m×m} is square and unitary and R ∈ C^{m×n} is upper trapezoidal. If m ≥ n then R takes the form

$$ R = \begin{bmatrix} R_1 \\ 0_{m-n,n} \end{bmatrix}, $$

where R_1 ∈ C^{n×n} is upper triangular and 0_{m−n,n} is the zero matrix with m − n rows and n columns. For m ≥ n we call A = Q_1R_1 a QR factorization of A if Q_1 ∈ C^{m×n} has orthonormal columns and R_1 ∈ C^{n×n} is upper triangular.

Suppose m ≥ n. A QR factorization is obtained from a QR decomposition A = QR by simply using the first n columns of Q and the first n rows of R. Indeed, if we partition Q as [Q_1, Q_2] and R = [R_1; 0], where Q_1 ∈ R^{m×n} and R_1 ∈ R^{n×n}, then A = Q_1R_1 is a QR factorization of A. On the other hand a QR factorization A = Q_1R_1 of A can be turned into a QR decomposition by extending the set of columns {q_1, ..., q_n} of Q_1 into an orthonormal basis {q_1, ..., q_n, q_{n+1}, ..., q_m} for R^m and adding m − n rows of zeros to R_1. We then obtain a QR decomposition A = QR, where Q = [q_1, ..., q_m] and R = [R_1; 0].


Example 5.2 (QR Decomposition and Factorization) Consider the factorization

$$ A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \times \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix} = QR. $$

Since Q^TQ = I and R is upper trapezoidal, this is a QR decomposition of A. A QR factorization A = Q_1R_1 is obtained by dropping the last column of Q and the last row of R, so that

$$ A = \frac12\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \times \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} = Q_1R_1. $$

Consider existence and uniqueness.

Theorem 5.9 (Existence of QR Decomposition) Any matrix A ∈ C^{m×n} with m, n ∈ N has a QR decomposition.

Proof If m = 1 then A is already in upper trapezoidal form and A = [1]A is a QR decomposition of A. Suppose m > 1 and set s := min(m − 1, n). Note that the function housegen(x) returns the vector u in a Householder transformation for any vector x. With B = I in Algorithm 5.2 we obtain R = CA and C = H_s ··· H_2H_1. Thus A = QR is a QR decomposition of A since Q := C^* = H_1 ··· H_s is a product of unitary matrices and therefore unitary. □

Theorem 5.10 (Uniqueness of QR Factorization) If m ≥ n then the QR factorization is unique if A has linearly independent columns and R has positive diagonal elements.

Proof Let A = Q_1R_1 be a QR factorization of A ∈ C^{m×n}. Now A^*A = R_1^*Q_1^*Q_1R_1 = R_1^*R_1. By Lemma 4.2 the matrix A^*A is positive definite, the matrix R_1 is nonsingular, and if its diagonal elements are positive then R_1^*R_1 is the Cholesky factorization of A^*A. Since the Cholesky factorization is unique it follows that R_1 is unique, and since necessarily Q_1 = AR_1^{-1}, it must also be unique. □

Example 5.3 (QR Decomposition and Factorization) Consider finding the QR decomposition and factorization of the matrix A = [2 −1; −1 2] using the method of the uniqueness proof of Theorem 5.10. We find B := A^TA = [5 −4; −4 5]. The Cholesky factorization of B = R^TR is given by R = (1/√5)[5 −4; 0 3]. Now R^{-1} = (1/(3√5))[3 4; 0 5], so Q = AR^{-1} = (1/√5)[2 1; −1 2]. Since A is square, A = QR is both the QR decomposition and QR factorization of A.
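The construction in the existence proof can be tried directly in MATLAB. The sketch below is not from the text; it assumes housetriang from Listing 5.2 is available and uses B = I, so that the returned C is the product of the Householder transformations and Q is its conjugate transpose.

m = 4; n = 3;
A = randn(m, n);
[R, C] = housetriang(A, eye(m));
Q = C';                        % unitary, and A = Q*R
disp(norm(A - Q*R))            % both quantities should be at rounding level
disp(norm(Q'*Q - eye(m)))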


5.5 QR and Gram-Schmidt

The Gram-Schmidt orthogonalization of the columns of A can be used to find the QR factorization of A.

Theorem 5.11 (QR and Gram-Schmidt) Suppose A ∈ R^{m×n} has rank n and let v_1, ..., v_n be the result of applying Gram-Schmidt to the columns a_1, ..., a_n of A, i.e.,

$$ v_1 = a_1, \qquad v_j = a_j - \sum_{i=1}^{j-1} \frac{a_j^Tv_i}{v_i^Tv_i}\,v_i, \quad \text{for } j = 2, \dots, n. \tag{5.17} $$

Let Q_1 := [q_1, ..., q_n], q_j := v_j/‖v_j‖_2, j = 1, ..., n, and

$$ R_1 := \begin{bmatrix}
\|v_1\|_2 & a_2^Tq_1 & a_3^Tq_1 & \cdots & a_{n-1}^Tq_1 & a_n^Tq_1 \\
0 & \|v_2\|_2 & a_3^Tq_2 & \cdots & a_{n-1}^Tq_2 & a_n^Tq_2 \\
 & 0 & \|v_3\|_2 & \cdots & a_{n-1}^Tq_3 & a_n^Tq_3 \\
 & & \ddots & \ddots & \vdots & \vdots \\
 & & & & \|v_{n-1}\|_2 & a_n^Tq_{n-1} \\
 & & & & 0 & \|v_n\|_2
\end{bmatrix}. \tag{5.18} $$

Then A = Q_1R_1 is the unique QR factorization of A.

Proof Let Q_1 and R_1 be given by (5.18). The matrix Q_1 is well defined and has orthonormal columns, since {q_1, ..., q_n} is an orthonormal basis for R(A) by Theorem 5.4. By (5.17)

$$ a_j = v_j + \sum_{i=1}^{j-1} \frac{a_j^Tv_i}{v_i^Tv_i}\,v_i = r_{jj}q_j + \sum_{i=1}^{j-1} q_ir_{ij} = Q_1R_1e_j, \quad j = 1, \dots, n. $$

Clearly R_1 has positive diagonal elements and the factorization is unique. □

Example 5.4 (QR Using Gram-Schmidt) Consider finding the QR decomposition and factorization of the matrix A = [2 −1; −1 2] = [a_1, a_2] using Gram-Schmidt. Using (5.17) we find v_1 = a_1 and v_2 = a_2 − (a_2^Tv_1/v_1^Tv_1)v_1 = (3/5)[1; 2]. Thus Q = [q_1, q_2], where q_1 = (1/√5)[2; −1] and q_2 = (1/√5)[1; 2]. By (5.18) we find

$$ R_1 = R = \begin{bmatrix} \|v_1\|_2 & a_2^Tq_1 \\ 0 & \|v_2\|_2 \end{bmatrix} = \frac{1}{\sqrt 5}\begin{bmatrix} 5 & -4 \\ 0 & 3 \end{bmatrix} $$

and this agrees with what we found in Example 5.3.


Warning The Gram-Schmidt orthogonalization process should not be used to compute the QR factorization numerically. The columns of Q1 computed in floating point arithmetic using Gram-Schmidt orthogonalization will often be far from orthogonal. There is a modified version of Gram-Schmidt which behaves better numerically, see [2]. Here we only considered Householder transformations (cf. Algorithm 5.2).
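For reference, here is a minimal MATLAB sketch, not an algorithm from the text, of the modified Gram-Schmidt variant referred to in the warning; it subtracts each projection as soon as q_i is available, which is what improves the numerical behavior. Usage: [Q1,R1] = mgs(A) for A with linearly independent columns.

function [Q1, R1] = mgs(A)
% Modified Gram-Schmidt: A = Q1*R1 with Q1 having orthonormal columns.
[~, n] = size(A);
Q1 = A; R1 = zeros(n);
for i = 1:n
    R1(i,i) = norm(Q1(:,i));
    Q1(:,i) = Q1(:,i)/R1(i,i);
    for j = i+1:n                      % orthogonalize the remaining columns
        R1(i,j) = Q1(:,i)'*Q1(:,j);
        Q1(:,j) = Q1(:,j) - Q1(:,i)*R1(i,j);
    end
end
end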

5.6 Givens Rotations

In some applications, the matrix we want to triangulate has a special structure. Suppose for example that A ∈ R^{n×n} is square and upper Hessenberg as illustrated by a Wilkinson diagram for n = 4

$$ A = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix}. $$

Only one element in each column needs to be annihilated and a full Householder transformation will be inefficient. In this case we can use a simpler transformation.

Definition 5.6 (Givens Rotation, Plane Rotation) A plane rotation (also called a Givens rotation) is a matrix P ∈ R^{2×2} of the form

$$ P := \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1. $$

A plane rotation is an orthogonal matrix and there is a unique angle θ ∈ [0, 2π) such that c = cos θ and s = sin θ. Moreover, the identity matrix is a plane rotation corresponding to θ = 0. A vector x in the plane is rotated an angle θ clockwise by P. See Exercise 5.16 and Fig. 5.4.

Fig. 5.4 A plane rotation


A Givens rotation can be used to introduce one zero in a vector. Consider first the case of a 2-vector. Suppose

$$ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \neq 0, \qquad c := \frac{x_1}{r}, \quad s := \frac{x_2}{r}, \quad r := \|x\|_2. $$

If x ∈ R² then

$$ Px = \frac1r\begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac1r\begin{bmatrix} x_1^2 + x_2^2 \\ 0 \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}, $$

and we have introduced a zero in x. We can take P = I when x = 0. For an n-vector x ∈ R^n and 1 ≤ i < j ≤ n we define a rotation in the i, j-plane as a matrix P_{ij} = (p_{kl}) ∈ R^{n×n} by p_{kl} = δ_{kl} except for positions ii, jj, ij, ji, which are given by

$$ \begin{bmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1. $$

Thus, for n = 4,

$$ P_{12} = \begin{bmatrix} c & s & 0 & 0 \\ -s & c & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad P_{13} = \begin{bmatrix} c & 0 & s & 0 \\ 0 & 1 & 0 & 0 \\ -s & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad P_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. $$

Premultiplying a matrix by a rotation in the i, j-plane changes only rows i and j of the matrix, while postmultiplying the matrix by such a rotation only changes columns i and j. In particular, if B = P_{ij}A and C = AP_{ij} then B(k, :) = A(k, :), C(:, k) = A(:, k) for all k ≠ i, j and

$$ \begin{bmatrix} B(i,:) \\ B(j,:) \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} A(i,:) \\ A(j,:) \end{bmatrix}, \qquad \big[\, C(:,i) \ \ C(:,j) \,\big] = \big[\, A(:,i) \ \ A(:,j) \,\big]\begin{bmatrix} c & s \\ -s & c \end{bmatrix}. \tag{5.19} $$

Givens rotations can be used as an alternative to Householder transformations for solving linear systems. It can be shown that for a dense system of order n the number of arithmetic operations is asymptotically 2n3 , corresponding to the work of 3 Gaussian eliminations, while, the work using Householder transformations corresponds to 2 Gaussian eliminations. However, for matrices with a special structure Givens rotations can be used to advantage. As an example consider an upper Hessenberg matrix A ∈ Rn×n . It can be transformed to upper triangular form using rotations P i,i+1 for i = 1, . . . , n − 1. For n = 4 the process can be illustrated as follows.

$$ A = \begin{bmatrix} x&x&x&x\\ x&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{12}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{23}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{34}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&r_{33}&r_{34}\\ 0&0&0&r_{44} \end{bmatrix}. $$

For an algorithm see Exercise 5.18. This reduction is used in the QR method discussed in Chap. 15.
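A small MATLAB sketch, not from the text, showing how c and s are chosen to zero the second component of a 2-vector and how a rotation in the 2,3-plane is then applied to just two rows of a matrix, as in (5.19):

x1 = 3; x2 = 4;
r = norm([x1, x2]);
c = x1/r; s = x2/r;
P = [c s; -s c];
disp(P*[x1; x2])               % gives [5; 0]
A = randn(4);                  % apply a rotation in the 2,3-plane to A
A([2 3], :) = P*A([2 3], :);   % only rows 2 and 3 change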

5.7 Exercises Chap. 5

5.7.1 Exercises Sect. 5.1

Exercise 5.1 (The A*A Inner Product) Suppose A ∈ C^{m×n} has linearly independent columns. Show that ⟨x, y⟩ := y^*A^*Ax defines an inner product on C^n.

Exercise 5.2 (Angle Between Vectors in Complex Case) Show that in the complex case there is a unique angle θ in [0, π/2] such that

$$ \cos\theta = \frac{|\langle x, y\rangle|}{\|x\|\,\|y\|}. \tag{5.20} $$

Exercise 5.3 (x T Ay Inequality (Exam Exercise 1979-3)) Suppose A ∈ Rn×n is symmetric positive definite. Show that |x T Ay|2 ≤ x T Ax y T Ay for all x, y ∈ Rn , with equality if and only if x and y are linearly dependent.

5.7.2 Exercises Sect. 5.2 Exercise 5.4 (What Does Algorithm Housegen Do When x = e1 ?) Determine H in Algorithm 5.1 when x = e1 . Exercise 5.5 (Examples of Householder Transformations) If x, y ∈ Rn with x = y and v := x − y = 0 then it follows from Example 5.1 that  2 vvT  2 I − 2 v T v x = y. Use this to construct a Householder transformation H such that H x = y in the following cases.     3 5 a) x = , y= . 4 0 ⎡ ⎤ ⎡ ⎤ 2 0 b) x = ⎣ 2 ⎦ , y = ⎣ 3 ⎦. 1 0


Exercise 5.6 (2 × 2 Householder Transformation) Show that a real 2 × 2 Householder transformation can be written in the form   − cos φ sin φ H = . sin φ cos φ Find H x if x = [cos φ, sin φ]T . Exercise 5.7 (Householder Transformation (Exam Exercise 2010-1)) a) Suppose x, y ∈ Rn with x2 = y2 and v := x − y = 0. Show that H x = y,

where H := I − 2

vv T . vT v

b) Let B ∈ R4,4 be given by ⎡

01 ⎢0 0 B := ⎢ ⎣0 0 0

⎤ 00 1 0⎥ ⎥, 0 1⎦ 00

where 0 < < 1. Compute a Householder transformation H and a matrix B 1 such that the first column of B 1 := H BH has a zero in the last two positions.

5.7.3 Exercises Sect. 5.4 Exercise 5.8 (QR Decomposition) ⎡

⎤ 12 ⎢1 2⎥ ⎥ A=⎢ ⎣1 0⎦ , 10



⎤ 1 1 1 1 1 ⎢1 1 −1 −1⎥ ⎥, Q= ⎢ 2 ⎣1 −1 −1 1 ⎦ 1 −1 1 −1



⎤ 22 ⎢0 2⎥ ⎥ R=⎢ ⎣0 0⎦ . 00

Show that Q is orthonormal and that QR is a QR decomposition of A. Find a QR factorization of A. Exercise 5.9 (Householder Triangulation) a) Let ⎡

⎤ 1 0 1 A := ⎣ −2 −1 0 ⎦ . 2 2 1


Find Householder transformations H 1 , H 2 ∈ R3×3 such that H 2 H 1 A is upper triangular. b) Find the QR factorization of A, when R has positive diagonal elements. Exercise 5.10 (Hadamard’s Inequality) In this exercise we use the QR factorization to prove a classical determinant inequality. For any A = [a 1 , . . . , a n ] ∈ Cn×n we have |det(A)| ≤

n 

a j 2 .

(5.21)

j =1

Equality holds if and only if A has a zero column or the columns of A are orthogonal. a) Show that if Q is unitary then |det(Q)| = 1. b) Let A = QR be a QR factorization of A and let R = [r 1 , . . . , r n ]. Show that ∗ 2 (A∗ A)jj = a j 22 = (R j 2 .  R)jj = r c) Show that |det(A)| = nj=1 |rjj | ≤ nj=1 a j 2 . d) Show that we have equality if A has a zero column, e) Suppose the columns of A are nonzero. Show that we have equality if and only if the columns of A are orthogonal.1 Exercise 5.11 (QL Factorization (Exam Exercise 1982-2)) Suppose B ∈ Rn×n is symmetric and positive definite. It can be shown that B has a factorization of the form B = LT L, where L is lower triangular with positive diagonal elements (you should not show this). Note that this is different from the Cholesky factorization B = LLT . a) Suppose B = LT L. Write down the equations to determine the elements li,j of L, in the order i = n, n − 1, . . . , 1 and j = i, 1, 2 . . . , i − 1. b) Explain (without making a detailed algorithm) how the LT L factorization can be used to solve the linear system Bx = c. Compute LF . Is the algorithm stable? c) Show that every nonsingular matrix A ∈ Rn×n can be factored in the form A = QL, where Q ∈ Rn×n is orthogonal and L ∈ Rn×n is lower triangular with positive diagonal elements. d) Show that the QL factorization in c) is unique. Exercise 5.12 (QL-Factorization (Exam Exercise 1982-3)) In this exercise we will develop an algorithm to find a QL-factorization of A ∈ Rn×n (cf. Exam exercise 1982-2) using Householder transformations. a) Given vectors a := [a1 , . . . , an ]T ∈ Rn and en := [0, . . . , 0, 1]T . Find v ∈ Rn ∗ such that the Householder transformation H := I − 2 vv v ∗ v satisfies H a = −sen , where |s| = a2 . How should we choose the sign of s?

1 Show that we have equality ⟺ R is diagonal ⟺ A^*A is diagonal.


b) Let 1 ≤ r ≤ n, v r ∈ Rr , v r = 0, and V r := I r − 2

√ vr v r v ∗r = I r − ur u∗r , with ur := 2 . ∗ vr v r v r 2

" ! 0 Show that H := V0r I n−r is a Householder transformation. Show also that if ai,j = 0 for i = 1, . . . , r and j = r + 1, . . . , n then the last r columns of A and H A are the same. c) Explain, without making a detailed algorithm, how we to a given matrix A ∈ Rn×n can find Householder transformations H 1 , . . . , H n−1 such that H n−1 , . . . , H 1 A is lower triangular. Give a QL factorization of A. Exercise 5.13 (QR Fact. of Band Matrices (Exam Exercise 2006-2)) Let A ∈ Rn×n be a nonsingular symmetric band matrix with bandwidth d ≤ n − 1, so that aij = 0 for all i, j with |i − j | > d. We define B := AT A and let A = QR be the QR factorization of A where R has positive diagonal entries. a) Show that B is symmetric. b) Show that B has bandwidth ≤ 2d. c) Write a MATLAB function B=ata(A,d) which computes B. You shall exploit the symmetry and the function should only use O(cn2 ) flops, where c only depends on d. d) Estimate the number of arithmetic operations in your algorithm. e) Show that AT A = R T R. f) Explain why R has upper bandwidth 2d. g) We consider 3 methods for finding the QR factorization of the band matrix A, where we assume that n is much bigger than d. The methods are based on 1. Gram-Schmidt orthogonalization, 2. Householder transformations, 3. Givens rotations. Which method would you recommend for a computer program using floating point arithmetic? Give reasons for your answer. Exercise 5.14 (Find QR Factorization (Exam Exercise 2008-2)) Let ⎡

⎤ 2 1 ⎢ 2 −3⎥ ⎥ A := ⎢ ⎣−2 −1⎦ −2 3 a) Find the Cholesky factorization of AT A. b) Find the QR factorization of A.


5.7.4 Exercises Sect. 5.5 Exercise 5.15 (QR Using Gram-Schmidt, II) Construct Q1 and R 1 in Example 5.2 using Gram-Schmidt orthogonalization.

5.7.5 Exercises Sect. 5.6 Exercise 5.16 ! " (Plane Rotation) Show that if x r cos (α−θ) r sin (α−θ) .

=

 r cos α  r sin α

then P x

=

Exercise 5.17 (Updating the QR Decomposition) Let H ∈ R^{4×4} be upper Hessenberg. Find Givens rotation matrices G_1, G_2, G_3 such that G_3G_2G_1H = R is upper triangular. (Here each G_k = P_{i,j} for suitable i, j, c and s, and for each k you are meant to find suitable i and j.)

Exercise 5.18 (Solving Upper Hessenberg System Using Rotations) Let A ∈ R^{n×n} be upper Hessenberg and nonsingular, and let b ∈ R^n. The following algorithm solves the linear system Ax = b using rotations P_{k,k+1} for k = 1, ..., n − 1. It uses the back solve Algorithm 3.2. Determine the number of arithmetic operations of this algorithm.

function x=rothesstri(A,b)
% x=rothesstri(A,b)
n=length(A); A=[A b];
for k=1:n-1
    r=norm([A(k,k),A(k+1,k)]);
    if r>0
        c=A(k,k)/r; s=A(k+1,k)/r;
        A([k k+1],k+1:n+1) ...
            =[c s;-s c]*A([k k+1],k+1:n+1);
    end
    A(k,k)=r; A(k+1,k)=0;
end
x=rbacksolve(A(:,1:n),A(:,n+1),n);
end

Listing 5.3 rothesstri

Exercise 5.19 (A Givens Transformation  (Exam Exercise 2013-2)) A Givens cs rotation of order 2 has the form G := ∈ R2×2 , where s 2 + c2 = 1. −s c


a) Is G symmetric and unitary? b) Given x1 , x2 ∈ R and set r :=     y x where 1 = G 1 . y2 x2

%

x12 + x22 . Find G and y1 , y2 so that y1 = y2 ,

Exercise 5.20 (Givens Transformations (Exam Exercise 2016-3)) Recall that a rotation in the ij -plane is an m × m-matrix, denoted P i,j , which differs from the identity matrix only in the entries ii, ij, j i, jj , which equal 

   pii pij cos θ sin θ = , pj i pjj − sin θ cos θ

i.e., these four entries are those of a Givens rotation. a) For θ ∈ R, let P be a Givens rotation of the form 

cos θ sin θ P = − sin θ cos θ



and let x be a fixed vector in R2 . Show that there exists a unique θ ∈ (−π/2, π/2] so that P x = ±x2 e1 , where e1 = (1, 0)T . b) Show that, for any vector w ∈ Rm , one can find rotations in the 12-plane, 23plane, . . ., (m − 1)m-plane, so that ⎡ ⎤ α ⎢0⎥ ⎢ ⎥ P 1,2 P 2,3 · · · P m−2,m−1 P m−1,m w = ⎢ . ⎥ , ⎣ .. ⎦ 0 where α = ±w2 . c) Assume that m ≥ n. Recall that an m × n-matrix A with entries ai,j is called upper trapezoidal if there are no nonzero entries below the main diagonal (a1,1 , a2,2 , . . . , an,n ) (for m = n, upper trapezoidal is the same as upper triangular). Recall also that an m × n-matrix is said to be in upper Hessenberg form if there are no nonzero entries below the subdiagonal (a2,1, a3,2 , . . . , an,n−1 ).


Explain that, if an m × n-matrix H is in upper Hessenberg form, one can find plane rotations so that P m−1,m P m−2,m−1 · · · P 2,3 P 1,2 H is upper trapezoidal. d) Let again A be an m × n-matrix with m ≥ n, and let A− be the matrix obtained by removing column k in A. Explain how you can find a QR Decomposition of A− , when we already have a QR decomposition A = QR of A.2 Exercise 5.21 (Cholesky and Givens (Exam Exercise 2018-2)) Assume that A is n × n symmetric positive definite, and with Cholesky factorization A = LL∗ . Assume also that z is a given column vector of length n. a) Explain why A + zz∗ has a unique Cholesky factorization. b) Assume that we are given a QR decomposition  ∗   L R = Q , z∗ 0 with R square and upper triangular. Explain why R is nonsingular. Explain also that, if R also has nonnegative diagonal entries, then A + zz∗ has the Cholesky factorization R ∗ R. c) Explain how one can find plane rotations Pi1 ,n+1 , Pi2 ,n+1 ,. . . ,Pin ,n+1 so that  ∗   L R Pi1 ,n+1 Pi2 ,n+1 · · · Pin ,n+1 ∗ = , z 0

(5.22)

withR  upper triangular, and explain how to obtain a QR decomposition of  L∗ from this. In particular you should write down the numbers i1 , . . . , in . z∗ Is it possible to choose the plane rotations so that R  in (5.22) also has positive diagonal entries?

2 Consider the matrix Q^T A_−.

5.8 Review Questions

5.8.1 What is a Householder transformation?
5.8.2 Why are they good for numerical work?
5.8.3 What are the main differences between solving a linear system by Gaussian elimination and Householder transformations?
5.8.4 What are the differences between a QR decomposition and a QR factorization?
5.8.5 Does any matrix have a QR decomposition?
5.8.6 What is a Givens transformation?
5.8.7 Is a unitary matrix always well conditioned?

Part II

Eigenpairs and Singular Values

We turn now to eigenpairs of matrices, i.e., eigenvalues and corresponding eigenvectors. The eigenpairs of a matrix are easily determined if it is diagonal. Indeed, the eigenvalues are the diagonal elements and the eigenvectors are unit vectors. We will see that not all matrices can be reduced to diagonal form using eigenvalue preserving transformations known as similarity transformations. This raises the question: how close to a diagonal matrix can we reduce a general matrix using similarity transformations? We give one answer to this question, the Jordan factorization or the Jordan canonical form. We also characterize matrices which can be diagonalized using unitary similarity transformations, and study the subclass of Hermitian matrices. Numerical methods for determining eigenvalues and eigenvectors will be considered in Chaps. 14 and 15. In the second chapter in this part we consider the important singular value decomposition of a rectangular matrix. This decomposition will play a central role in several of the remaining chapters in this book.

Chapter 6

Eigenpairs and Similarity Transformations

We have seen that a Hermitian matrix is positive definite if and only if it has positive eigenvalues. Eigenvalues and some related quantities called singular values occur in many branches of applied mathematics and are also needed for a deeper study of linear systems and least squares problems. In this and the next chapter we study eigenvalues and singular values. Recall that if A ∈ Cn×n is a square matrix, λ ∈ C and x ∈ Cn then (λ, x) is an eigenpair for A if Ax = λx and x is nonzero. The scalar λ is called an eigenvalue and x is said to be an eigenvector. The set of eigenvalues is called the spectrum of A and is denoted by σ (A). For example, σ (I ) = {1, . . . , 1} = {1}. The eigenvalues are the roots of the characteristic polynomial of A given for λ ∈ C by πA (λ) = det(A − λI ). The equation det(A − λI ) = 0 is called the characteristic equation of A. Equivalently the characteristic equation can be written det(λI − A) = 0.

6.1 Defective and Nondefective Matrices For the eigenvectors we will see that it is important to know if the eigenvectors of a matrix of order n form a basis for Cn . We say that A is defective if the eigenvectors do not form a basis for Cn and nondefective otherwise. We have the following sufficient condition for a matrix to be nondefective. Theorem 6.1 (Distinct Eigenvalues) A matrix with distinct eigenvalues is nondefective, i.e., its eigenvectors are linearly independent.


Proof The proof is by contradiction. Suppose A has eigenpairs (λ_k, x_k), k = 1, ..., n, with linearly dependent eigenvectors. Let m be the smallest integer such that {x_1, ..., x_m} is linearly dependent. Thus Σ_{j=1}^{m} c_jx_j = 0, where at least one c_j is nonzero. We must have m ≥ 2 since eigenvectors are nonzero. We find

$$ \sum_{j=1}^{m} c_jx_j = 0 \ \Rightarrow\ \sum_{j=1}^{m} c_jAx_j = \sum_{j=1}^{m} c_j\lambda_jx_j = 0. $$

From the last relation we subtract Σ_{j=1}^{m} c_jλ_mx_j = 0 and find Σ_{j=1}^{m−1} c_j(λ_j − λ_m)x_j = 0. But since λ_j − λ_m ≠ 0 for j = 1, ..., m − 1 and at least one c_j ≠ 0 for j < m, we see that {x_1, ..., x_{m−1}} is linearly dependent, contradicting the minimality of m. □

If some of the eigenvalues occur with multiplicity higher than one then the matrix can be either defective or nondefective.

Example 6.1 (Defective and Nondefective Matrices) Consider the matrices

$$ I := \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad J := \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. $$

Since Ix = x and λ_1 = λ_2 = 1, any vector x ∈ C² is an eigenvector for I. In particular the two unit vectors e_1 and e_2 are eigenvectors and form an orthonormal basis for C². We conclude that the identity matrix is nondefective. The matrix J also has the eigenvalue one with multiplicity two, but since Jx = x if and only if x_2 = 0, any eigenvector must be a multiple of e_1. Thus J is defective.

If the eigenvectors x_1, ..., x_n form a basis for C^n then any x ∈ C^n can be written

$$ x = \sum_{j=1}^{n} c_jx_j \quad \text{for some scalars } c_1, \dots, c_n. $$

We call this an eigenvector expansion of x. Thus to any nondefective matrix there corresponds an eigenvector expansion.

Example 6.2 (Eigenvector Expansion Example) Eigenpairs of [2 −1; −1 2] are (1, [1, 1]^T) and (3, [1, −1]^T). Any x = [x_1, x_2]^T ∈ C² has the eigenvector expansion

$$ x = \frac{x_1 + x_2}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{x_1 - x_2}{2}\begin{bmatrix} 1 \\ -1 \end{bmatrix}. $$
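A hedged MATLAB sketch, not part of the text, computing the expansion coefficients for the matrix of Example 6.2 with the built-in eig:

A = [2 -1; -1 2];
[X, D] = eig(A);               % columns of X are eigenvectors of A
x = [1; 2];
c = X\x;                       % expansion coefficients, x = sum_j c_j x_j
disp(norm(X*c - x))            % zero up to rounding
disp(norm(A*x - X*D*c))        % A*x = sum_j c_j*lambda_j*x_j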


6.1.1 Similarity Transformations We need a transformation that can be used to simplify a matrix without changing the eigenvalues. Definition 6.1 (Similar Matrices) Two matrices A, B ∈ Cn×n are said to be similar if there is a nonsingular matrix S ∈ Cn×n such that B = S −1 AS. The transformation A → B is called a similarity transformation. The columns of S are denoted by s 1 , s 2 , . . . , s n . We note that 1. Similar matrices have the same eigenvalues, they even have the same characteristic polynomial. Indeed, by the product rule for determinants det(AC) = det(A) det(C) so that   πB (λ) = det(S −1 AS − λI ) = det S −1 (A − λI )S = det(S −1 ) det(A − λI ) det(S) = det(S −1 S) det(A − λI ) = πA (λ), since det(I ) = 1. 2. (λ, x) is an eigenpair for S −1 AS if and only if (λ, Sx) is an eigenpair for A. In fact (S −1 AS)x = λx if and only if A(Sx) = λ(Sx). 3. If S −1 AS = D = diag(λ1 , . . . , λn ) we can partition AS = SD by columns to obtain [As 1 , . . . , As n ] = [λ1 s 1 , . . . , λn s n ]. Thus the columns of S are eigenvectors of A. Moreover, A is nondefective since S is nonsingular. Conversely, if A is nondefective then it can be diagonalized by a similarity transformation S −1 AS, where the columns of S are eigenvectors of A. 4. For any square matrices A, C ∈ Cn×n the two products AC and CA have the same characteristic polynomial. More generally, for rectangular matrices A ∈ Cm×n and C ∈ Cn×m , with say m > n, the bigger matrix has m − n extra zero eigenvalues πAC (λ) = λm−n πCA (λ),

λ ∈ C.

(6.1)

To show this define for any m, n ∈ N block triangular matrices of order n + m by E :=

  AC 0 , C 0

 F :=

 0 0 , C CA

S=

  I A . 0 I

  I −A = . Moreover, ES = SF so The matrix S is nonsingular with S 0 I E and F are similar and therefore have the same characteristic polynomials. Moreover, this polynomial is the product of the characteristic polynomial of the diagonal blocks. But then πE (λ) = λn πAC (λ) = πF (λ) = λm πCA (λ). This implies the statements for m ≥ n. −1


6.1.2 Algebraic and Geometric Multiplicity of Eigenvalues Linear independence of eigenvectors depends on the multiplicity of the eigenvalues in a nontrivial way. For multiple eigenvalues we need to distinguish between two kinds of multiplicities. Suppose A ∈ Cn×n has k distinct eigenvalues λ1 , . . . , λk with multiplicities a1 , . . . , ak so that πA (λ) := det(A − λI ) = (λ1 − λ)a1 · · · (λk − λ)ak ,

λi = λj , i = j,

k

ai = n.

i=1

(6.2) The positive integer ai = a(λi ) = aA (λi ) is called the multiplicity, or more precisely the algebraic multiplicity of the eigenvalue λi . The multiplicity of an eigenvalue is simple (double, triple) if ai is equal to one (two, three). To define a second kind of multiplicity we consider for each λ ∈ σ (A) the nullspace N (A − λI ) := {x ∈ Cn : (A − λI )x = 0}

(6.3)

of A − λI . The nullspace is a subspace of Cn consisting of all eigenvectors of A corresponding to the eigenvalue λ. The dimension of the subspace must be at least one since A − λI is singular. Definition 6.2 (Geometric Multiplicity) The geometric multiplicity g = g(λ) = gA (λ) of an eigenvalue λ of A is the dimension of the nullspace N (A − λI ). Example 6.3 (Geometric Multiplicity) The n × n identity matrix I has the eigenvalue λ = 1 with πI (λ) = (1 − λ)n . Since I − λI is the zero matrix when λ = 1, the nullspace of I − λI is all of n-space and it follows  that  a = g = n. On the other 11 hand we saw in Example 6.1 that the matrix J := has the eigenvalue λ = 1 01 with a = 2 and any eigenvector is a multiple of e1 . Thus g = 1. Theorem 6.2 (Geometric Multiplicity of Similar Matrices) Similar matrices have the same eigenvalues with the same algebraic and geometric multiplicities. Proof Similar matrices have the same characteristic polynomials and only the invariance of geometric multiplicity needs to be shown. Suppose λ ∈ σ (A), dim N (S −1 AS − λI ) = k, and dim N (A − λI ) = . We need to show that k = . Suppose v 1 , . . . , v k is a basis for N (S −1 AS − λI ). Then S −1 ASv i = λv i or ASv i = λSv i , i = 1, . . . , k. But then {Sv 1 , . . . , Sv k } ⊂ N (A − λI ), which implies that k ≤ . Similarly, if w 1 , . . . , w is a basis for N (A − λI ) then {S −1 w1 , . . . , S −1 w } ⊂ N (S −1 AS − λI ). which implies that k ≥ . We conclude that k = .  


For a proof of the following theorem see the next section.1 Theorem 6.3 (Geometric Multiplicity) We have 1. The geometric multiplicity of an eigenvalue is always bounded above by the algebraic multiplicity of the eigenvalue. 2. The number of linearly independent eigenvectors of a matrix equals the sum of the geometric multiplicities of the eigenvalues. 3. A matrix A ∈ Cn×n has n linearly independent eigenvectors if and only if the algebraic and geometric multiplicity of all eigenvalues are the same.

6.2 The Jordan Factorization We have seen that a nondefective matrix can be diagonalized by its eigenvectors, while a defective matrix does not enjoy this property. The following question arises. How close to a diagonal matrix can we reduce a general matrix by a similarity transformation? We give one answer to this question, called the Jordan factorization, or the Jordan canonical form, in Theorem 6.4. For a proof, see for example [10]. The Jordan factorization is an important tool in matrix analysis and it has applications to systems of differential equations, see [8]. The Jordan factorization involves bidiagonal matrices called Jordan blocks. Definition 6.3 (Jordan Block) A Jordan block of order m, denoted J m (λ) is an m × m matrix of the form ⎡λ 1



0 ··· 0 0 0 λ 1 ··· 0 0 ⎢ 0 0 λ ··· 0 0 ⎥

J m (λ) := ⎢ ⎣ ...

= λI m + E m , .. ⎥ .⎦

0 0 0 ··· λ 1 0 0 0 ··· 0 λ

⎡ 0 1 0 ···



00 0 0 1 ··· 0 0 ⎢ 0 0 0 ··· 0 0 ⎥

E m := ⎢ ⎣ ...

. .. ⎥ .⎦

(6.4)

0 0 0 ··· 0 1 0 0 0 ··· 0 0

!λ 1 0" A 3×3 Jordan block has the form J 3 (λ) = 0 λ 1 . Since a Jordan block is upper 0 0 λ triangular λ is an eigenvalue of J m (λ) and any eigenvector must be a multiple of e1 . Indeed, if J m (λ)v = λv for some v = [v1 , . . . , vm ] then λvi−1 + vi = λvi−1 , i = 2, . . . , m which shows that v2 = · · · = vm = 0. Thus, the eigenvalue λ of J m (λ) has algebraic multiplicity a = m and geometric multiplicity g = 1. The Jordan factorization is a factorization of a matrix into Jordan blocks. Theorem 6.4 (The Jordan Factorization of a Matrix) Suppose A ∈ Cn×n has k distinct eigenvalues λ1 , . . . , λk of algebraic multiplicities a1 , . . . , ak and geometric multiplicities g1 , . . . , gk . There is a nonsingular matrix

1 This can also be shown without using the Jordan factorization, see [9].


S ∈ Cn×n such that J := S −1 AS = diag(U 1 , . . . , U k ), with U i ∈ Cai ×ai ,

(6.5)

where each U i is block diagonal having gi Jordan blocks along the diagonal U i = diag(J mi,1 (λi ), . . . , J mi,gi (λi )).

(6.6)

Here mi,1 , . . . , mi,gi are positive integers and they gare unique if they are ordered so that mi,1 ≥ mi,2 ≥ · · · ≥ mi,gi . Moreover, ai = j i=1 mi,j for all i. We note that 1. The matrices S and J in (6.5) are called Jordan factors. We also call J the Jordan factorization of A. 2. The columns of S are called principal vectors or generalized eigenvectors. They satisfy the matrix equation AS = SJ . 3. Each U i is upper triangular with the eigenvalue λi on the diagonal and consists of gi Jordan blocks. These Jordan blocks can be taken in any order and it is customary to refer to any such block diagonal matrix as the Jordan factorization of A. Example 6.4 (Jordan Factorization) As an example consider the Jordan factorization ⎤

⎡2 1 0 021 ⎢0 0 2

J := diag(U 1 , U 2 ) = ⎢ ⎣

21 02

2

⎥ ⎥ ∈ R8×8 . ⎦

(6.7)

31 03

We encountered this matrix in Exercise 6.1. The eigenvalues together with their algebraic and geometric multiplicities can be read off directly from the Jordan factorization. • U 1 = diag(J 3 (2), J 2 (2), J 1 (2)) and U 2 = J 2 (3). • 2 is an eigenvalue of algebraic multiplicity 6 and geometric multiplicity 3, the number of Jordan blocks corresponding to λ = 2. • 3 is an eigenvalue of algebraic multiplicity 2 and geometric multiplicity 1. The columns of S = [s 1 , . . . , s 8 ] are determined from the columns of J as follows As 1 = 2s 1 ,

As 2 = s 1 + 2s 2 ,

As 4 = 2s 4 ,

As 5 = s 4 + 2s 5 ,

As 6 = 2s 6 , As 7 = 3s 7 ,

As 8 = s 7 + 3s 8 .

As 3 = s 2 + 2s 3 ,


We see that the generalized eigenvector corresponding to the first column in a Jordan block is an eigenvector of A. The remaining generalized eigenvectors are not eigenvectors. The matrix ⎡3 1 ⎢ J := ⎢ ⎣

03

⎤ 21 02

2

210 021 002

⎥ ⎥ ⎦

is also a Jordan factorization of A. In any Jordan factorization of this A the sizes of the 4 Jordan blocks J 2 (3), J 2 (2), J 1 (2), J 3 (2) are uniquely given. Proof of Theorem 6.3 1. The algebraic multiplicity ai of an eigenvalue λi is equal to the size of the corresponding U i . Moreover each U i contains gi Jordan blocks of size mi,j ≥ 1. Thus gi ≤ ai . 2. Since A and J are similar the geometric multiplicities of the eigenvalues of these matrices are the same, and it is enough to prove statement 2 for the Jordan factor J . We show this only for the matrix J given by (6.7). The general case should then be clear. There are only 4 eigenvectors of J , namely e1 , e4 , e6 , e7 corresponding to the 4 Jordan blocks. These 4 vectors are clearly linearly independent. Moreover there are k = 2 distinct eigenvalues and g1 + g2 = 3 + 1 = 4. 3. Since gi ≤ ai for all i and i ai = n we have i gi = n if and only if ai = gi for i = 1, . . . , k.

6.3 The Schur Factorization and Normal Matrices 6.3.1 The Schur Factorization We turn now to unitary similarity transformations S −1 AS, where S = U is unitary. Thus S −1 = U ∗ and a unitary similarity transformation takes the form U ∗ AU .

6.3.2 Unitary and Orthogonal Matrices Although not every matrix can be diagonalized it can be brought into triangular form by a unitary similarity transformation.


Theorem 6.5 (Schur Factorization) For each A ∈ Cn×n there exists a unitary matrix U ∈ Cn×n such that R := U ∗ AU is upper triangular. The matrices U and R in the Schur factorization are called Schur factors. We call A = U RU ∗ the Schur factorization of A. Proof We use induction on n. For n = 1 the matrix U is the 1 × 1 identity matrix. Assume that the theorem is true for all k ×k matrices, and suppose A ∈ Cn×n , where n := k + 1. Let (λ1 , v 1 ) be an eigenpair for A with v 1 2 = 1. By Theorem 5.5 we can extend v 1 to an orthonormal basis {v 1 , v 2 , . . . , v n } for Cn . The matrix V := [v 1 , . . . , v n ] ∈ Cn×n is unitary, and V ∗ AV e1 = V ∗ Av 1 = λ1 V ∗ v 1 = λ1 e1 . It follows that 

 λ1 x ∗ V AV = , for some M ∈ Ck×k and x ∈ Ck . 0 M ∗

(6.8)

By the induction hypothesis there is a unitary matrix W 1 ∈ C(n−1)×(n−1) such that W ∗1 MW 1 is upper triangular. Define 

1 0∗ W= 0 W1

 and U = V W .

Then W and U are unitary and 

1 0∗ 0 W ∗1



λ1 x ∗ 0 M   λ1 x ∗ W 1 = 0 W ∗1 MW 1

U ∗ AU = W ∗ (V ∗ AV )W =

is upper triangular.



1 0∗ 0 W1



 

If A has complex eigenvalues then U will be complex even if A is real. The following is a real version of Theorem 6.5. Theorem 6.6 (Schur Form, Real Eigenvalues) For each A ∈ Rn×n with real eigenvalues there exists an orthogonal matrix U ∈ Rn×n such that U T AU is upper triangular. Proof Consider the proof of Theorem 6.5. Since A and λ1 are real the eigenvector v 1 is real and the matrix W is real and W T W = I . By the induction hypothesis V is real and V T V = I . But then also U = V W is real and U T U = I .  


A real matrix with some complex eigenvalues can only be reduced to block triangular form by a real unitary similarity transformation. We consider this in Sect. 6.3.5. Example 6.5 (Deflation Example) By using the unitary transformation V on the n × n matrix A, we obtain a matrix M of order n − 1. M has the same eigenvalues as A except λ. Thus we can find another eigenvalue of A by working with a smaller matrix M. This is an example of a deflation technique which is very useful in  numerical work. The second derivative matrix T :=

2 −1 0 −1 2 −1 0 −1 2

has an eigenpair

(2, x 1 ), where x 1 = [−1, 0, 1]T . Find the remaining eigenvalues using deflation. For this we extend x 1 to a basis {x 1 , x 2 , x 3 } for R3 by defining x 2 = [0, 1, 0]T , x 3 = [1, 0, 1]T . This is already an orthogonal basis and normalizing we obtain the orthogonal matrix ⎡

− √1 0

⎢ V =⎣ 0

√1 2

2

√1 2



⎥ 1 0 ⎦. 0 √1 2

We obtain (6.8) with λ = 2 and √  2 − 2 √ M= . − 2 2 

We can now find the remaining eigenvalues of A from the 2 × 2 matrix M.
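The deflation step of Example 6.5 can be reproduced numerically. The following MATLAB sketch is not from the text; it forms V from the known eigenvector and reads off the remaining eigenvalues from the 2 × 2 block M.

T = [2 -1 0; -1 2 -1; 0 -1 2];
x1 = [-1; 0; 1]/sqrt(2);                 % known eigenvector for lambda = 2
V = [x1, [0;1;0], [1;0;1]/sqrt(2)];      % orthonormal basis extending x1
B = V'*T*V;                              % block form (6.8) up to rounding
M = B(2:3, 2:3);
disp(eig(M))                             % remaining eigenvalues 2 -/+ sqrt(2)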

6.3.3 Normal Matrices

A matrix A ∈ C^{n×n} is normal if A^*A = AA^*. In this section we show that a matrix has orthogonal eigenvectors if and only if it is normal. Examples of normal matrices are
1. A^* = A (Hermitian),
2. A^* = −A (Skew-Hermitian),
3. A^* = A^{-1} (Unitary),
4. A = diag(d_1, ..., d_n) (Diagonal).
Clearly the matrices in 1., 2., 3. are normal. If A is diagonal then A^*A = diag(d̄_1d_1, ..., d̄_nd_n) = diag(|d_1|², ..., |d_n|²) = AA^*,


and A is normal. The 2. derivative matrix T in (2.27) is symmetric and therefore normal. The eigenvalues of a normal matrix can be complex (cf. Exercise 6.21). However in the Hermitian case the eigenvalues are real (cf. Lemma 2.3). The following theorem shows that A has a set of orthogonal eigenvectors if and only if it is normal. Theorem 6.7 (Spectral Theorem for Normal Matrices) A matrix A ∈ Cn×n is normal if and only if there exists a unitary matrix U ∈ Cn×n such that U ∗ AU = D is diagonal. If D = diag(λ1 , . . . , λn ) and U = [u1 , . . . , un ] then (λj , uj ), j = 1, . . . , n are orthonormal eigenpairs for A. Proof If B = U ∗ AU , with B diagonal, and U ∗ U = I , then A = U BU ∗ and AA∗ = (U BU ∗ )(U B ∗ U ∗ ) = U BB ∗ U ∗ and A∗ A = (U B ∗ U ∗ )(U BU ∗ ) = U B ∗ BU ∗ . Now BB ∗ = B ∗ B since B is diagonal, and A is normal. Conversely, suppose A∗ A = AA∗ . By Theorem 6.5 we can find U with U ∗ U = I such that B := U ∗ AU is upper triangular. Since A is normal B is normal. Indeed, BB ∗ = U ∗ AU U ∗ A∗ U = U ∗ AA∗ U = U ∗ A∗ AU = B ∗ B. The proof is complete if we can show that an upper triangular normal matrix B must be diagonal. The diagonal elements eii in E := B ∗ B and fii in F := BB ∗ are given by eii =

n

bki bki =

i n n

|bki |2 and fii = bik bik = |bik |2 .

k=1

k=1

k=1

k=i

The result now follows by equating eii and fii for i = 1, 2, . . . , n. In particular for i = 1 we have |b11|2 = |b11|2 +|b12 |2 +· · ·+|b1n |2 , so b1k = 0 for k = 2, 3, . . . , n. Suppose B is diagonal in its first i − 1 rows so that bj k = 0 for j = 1, . . . , i −1, k = j +1, . . . , n. Then eii =

i

k=1

|bki |2 = |bii |2 =

n

|bik |2 = fii

k=i

and it follows that b_{ik} = 0, k = i + 1, ..., n. By induction on the rows we see that B is diagonal. The last part of the theorem follows from Sect. 6.1.1. □

Example 6.6 The orthogonal diagonalization of A = [2 −1; −1 2] is U^TAU = diag(1, 3), where U = (1/√2)[1 1; 1 −1].
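A small MATLAB check, not from the text, of the spectral theorem for the symmetric (hence normal) matrix of Example 6.6: eig returns an orthogonal U that diagonalizes A.

A = [2 -1; -1 2];
[U, D] = eig(A);
disp(norm(U'*U - eye(2)))      % U is orthogonal
disp(norm(U'*A*U - D))         % U'*A*U is diagonal with the eigenvalues 1 and 3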


6.3.4 The Rayleigh Quotient The Rayleigh quotient is a useful tool when studying eigenvalues. Definition 6.4 (Rayleigh Quotient) For A ∈ Cn×n and a nonzero x the number R(x) = RA (x) :=

x ∗ Ax x∗x

is called a Rayleigh quotient. ∗

If (λ, x) is an eigenpair for A then R(x) = xx ∗Ax x = λ. Equation (6.9) in the following theorem shows that the Rayleigh quotient of a normal matrix is a convex combination of its eigenvalues. Theorem 6.8 (Convex Combination of the Eigenvalues) Suppose A ∈ Cn×n is normal with orthonormal eigenpairs (λj , uj ), for j = 1, 2, . . . , n. Then the Rayleigh quotient is a convex combination of the eigenvalues of A n λi |ci |2 , RA (x) = i=1 n 2 j =1 |cj |

x = 0,

x=

n

cj uj .

(6.9)

j =1

n n c i u i cj u j = Proof By orthonormality of the eigenvectors x ∗ x = i=1 n j =1 n n n 2 . Similarly, x ∗ Ax = 2 |c | c u c λ u = j =1 j i=1 j =1 i i j j j i=1 λi |ci | . and (6.9) follows. This is clearly a combination of nonnegative quantities and a convex combination since ni=1 |ci |2 / nj=1 |cj |2 = 1.  

6.3.5 The Quasi-Triangular Form How far can we reduce a real matrix A with some complex eigenvalues by a real unitary similarity transformation? To study this we note that the complex eigenvalues of a real matrix occur in conjugate pairs, λ = μ + iν, λ = μ − iν, where μ, ν are real. The real 2 × 2 matrix 

μ ν M= −ν μ

 (6.10)

has eigenvalues λ = μ + iν and λ = μ − iν. Definition 6.5 (Quasi-Triangular Matrix) We say that a matrix is quasitriangular if it is block triangular with only 1 × 1 and 2 × 2 blocks on the diagonal. Moreover, no 2 × 2 block should have real eigenvalues.


As an example consider the matrix ⎡ ⎤     D 1 R 1,2 R 1,3   2 1 3 2 R := ⎣ 0 D 2 R 2,3 ⎦ , D 1 := , D 2 := 1 , D 3 := . −1 2 −1 1 0 0 D3 Since R is block triangular the characteristic polynomial of R is given by πR = πD1 πD 2 πD3 . We find πD1 (λ) = πD3 (λ) = λ2 − 4λ + 5,

πD2 (λ) = λ − 1,

and the eigenvalues D 1 and D 3 are λ1 = 2+i, λ2 = 2−i, while D 2 obviously has the eigenvalue λ = 1. Any A ∈ Rn×n can be reduced to quasi-triangular form by a real orthogonal similarity transformation. For a proof see [16]. We will encounter the quasitriangular form in Chap. 15.

6.3.6 Hermitian Matrices The special cases where A is Hermitian, or real and symmetric, deserve special attention. Theorem 6.9 (Spectral Theorem, Complex Form) Suppose A ∈ Cn×n is Hermitian. Then A has real eigenvalues λ1 , . . . , λn . Moreover, there is a unitary matrix U ∈ Cn×n such that U ∗ AU = diag(λ1 , . . . , λn ). For any such U the columns {u1 , . . . , un } of U are orthonormal eigenvectors of A and Auj = λj uj for j = 1, . . . , n. Proof That the eigenvalues are real was shown in Lemma 2.3. The rest follows from Theorem 6.7.   There is also a real version. Theorem 6.10 (Spectral Theorem (Real Form)) Suppose A ∈ Rn×n is symmetric. Then A has real eigenvalues λ1 , λ2 , . . . , λn . Moreover, there is an orthogonal matrix U ∈ Rn×n such that U T AU = diag(λ1 , λ2 , . . . , λn ). For any such U the columns {u1 , . . . , un } of U are orthonormal eigenvectors of A and Auj = λj uj for j = 1, . . . , n.


Proof Since a real symmetric matrix has real eigenvalues and eigenvectors this follows from Theorem 6.9.  

6.4 Minmax Theorems There are some useful characterizations of the eigenvalues of a Hermitian matrix in ∗ terms of the Rayleigh quotient R(x) = RA (x) := xx ∗Ax x . First we show Theorem 6.11 (Minmax) Suppose A ∈ Cn×n is Hermitian with orthonormal eigenpairs (λj , uj ), 1 ≤ j ≤ n, ordered so that λ1 ≥ · · · ≥ λn . Let 1 ≤ k ≤ n. For any subspace S of Cn of dimension n − k + 1 λk ≤ max R(x),

(6.11)

x∈S x =0

with equality for S = S˜ := span(uk , . . . , un ) and x = uk . Proof Let S be any subspace of Cn of dimension n − k + 1 and define S  := span(u1 , . . . , uk ). It is enough to find y ∈ S so that R(y) ≥ λk . Now S + S  := {s + s  : s ∈ S, s  ∈ S  } is a subspace of Cn and by (1.7) dim(S ∩ S  ) = dim(S) + dim(S  ) − dim(S + S  ) ≥ (n − k + 1) + k − n = 1. It follows that S ∩ S  is nonempty. Let y ∈ S ∩ S  = kj =1 cj uj with kj =1 |cj |2 = 1. Defining cj = 0 for k + 1 ≤ j ≤ n, we obtain by Theorem 6.8 max R(x) ≥ R(y) = x∈S x =0

n

λj |cj | = 2

j =1

k

λj |cj | ≥

j =1

2

k

λk |cj |2 = λk ,

j =1

˜ Now z = n dj uj for and (6.11) follows. To show equality suppose z ∈ S = S. j =k some dk , . . . , dn with nj=k |dj |2 = 1 and by Lemma 6.8 R(z) = nj=k λj |dj |2 ≤ λk . Since z ∈ S˜ is arbitrary we have maxx∈S˜ R(x) ≤ λk and equality in (6.11) ˜ Moreover, R(uk ) = λk . follows for S = S.

x =0

 

There is also a maxmin version of this result. Theorem 6.12 (Maxmin) Suppose A ∈ Cn×n is Hermitian with eigenvalues λ1 , . . . , λn , ordered so that λ1 ≥ · · · ≥ λn . Let 1 ≤ k ≤ n. For any subspace S of Cn of dimension k λk ≥ min R(x), x∈S x =0

(6.12)

142

6 Eigenpairs and Similarity Transformations

with equality for S = S˜ := span(u1 , . . . , uk ) and x = uk . Here (λj , uj ), 1 ≤ j ≤ n are orthonormal eigenpairs for A. Proof The proof is very similar to the proof of Theorem 6.11. We define S  := span(uk , . . . , un ) and show that R(y) ≤ λk for some y ∈ S ∩ S  . It is easy to see ˜ that R(y) ≥ λk for any y ∈ S.   These theorems immediately lead to classical minmax and maxmin characterizations. Corollary 6.1 (The Courant-Fischer Theorem) Suppose A ∈ Cn×n is Hermitian with eigenvalues λ1 , . . . , λn , ordered so that λ1 ≥ · · · ≥ λn . Then λk =

min

max R(x) =

dim(S )=n−k+1 x∈S x =0

max min R(x),

dim(S )=k x∈S x =0

k = 1, . . . , n.

(6.13)

Using Theorem 6.11 we can prove inequalities of eigenvalues without knowing the eigenvectors and we can get both upper and lower bounds. Theorem 6.13 (Eigenvalue Perturbation for Hermitian Matrices) Let A, B ∈ Cn×n be Hermitian with eigenvalues α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ βn . Then αk + εn ≤ βk ≤ αk + ε1 , for k = 1, . . . , n,

(6.14)

where ε1 ≥ ε2 ≥ · · · ≥ εn are the eigenvalues of E := B − A. Proof Since E is a difference of Hermitian matrices it is Hermitian and the eigenvalues are real. Let (αj , uj ), j = 1, . . . , n be orthonormal eigenpairs for A and let S := span{uk , . . . , un }. By Theorem 6.11 we obtain βk ≤ max RB (x) ≤ max RA (x) + max RE (x) x∈S x =0

x∈S x =0

x∈S x =0

≤ max RA (x) + maxn RE (x) = αk + ε1 , x∈S x =0

x∈C x =0

and this proves the upper inequality. For the lower one we define D := −E and observe that −εn is the largest eigenvalue of D. Since A = B+D it follows from the result just proved that αk ≤ βk − εn , which is the same as the lower inequality.   In many applications of this result the eigenvalues of the matrix E will be small and then the theorem states that the eigenvalues of B are close to those of A. Moreover, it associates a unique eigenvalue of A with each eigenvalue of B.

6.5 Left Eigenvectors

143

6.4.1 The Hoffman-Wielandt Theorem We can also give a bound involving all eigenvalues. The following theorem shows that the eigenvalue problem for a normal matrix is well conditioned. Theorem 6.14 (Hoffman-Wielandt Theorem) Suppose A, B ∈ Cn×n are both normal matrices with eigenvalues λ1 , . . . , λn and μ1 , . . . , μn , respectively. Then there is a permutation i1 , . . . , in of 1, 2, . . . , n such that n n n

2 |μij − λj | ≤ |aij − bij |2 . j =1

(6.15)

i=1 j =1

For a proof of this theorem see [19, p. 190]. For a Hermitian matrix we can use the identity permutation if we order both set of eigenvalues in nonincreasing or nondecreasing order.

6.5 Left Eigenvectors Definition 6.6 (Left and Right Eigenpairs) Suppose A ∈ Cn×n is a square matrix, λ ∈ C and y ∈ Cn is nonzero. We say that (λ, y) is a left eigenpair for A if y ∗ A = λy ∗ or equivalently A∗ y = λy. We say that (λ, y) is a right eigenpair for A if Ay = λy. If (λ, y) is a left eigenpair then λ is called a left eigenvalue and y a left eigenvector. Similarly if (λ, y) is a right eigenpair then λ is called a right eigenvalue and y a right eigenvector. In this book an eigenpair will always mean a right eigenpair. A left eigenvector is an eigenvector of A∗ . If λ is a left eigenvalue of A then λ is an eigenvalue of A∗ and then λ is an eigenvalue of A (cf. Exercise 6.3). Thus left and right eigenvalues are identical, but left and right eigenvectors are in general different. For a Hermitian matrix the right and left eigenpairs are the same. Using right and left linearly independent eigenpairs we get some useful eigenvector expansions. Theorem 6.15 (Biorthogonal Eigenvector Expansion) If A ∈ Cn×n has linearly independent right eigenvectors {x 1 , . . . , x n } then there exists a set of left eigenvectors {y 1 , . . ., y n } with y ∗i x j = δi,j . Conversely, if A ∈ Cn×n has linearly independent left eigenvectors {y 1 , . . . , y n } then there exists a set of right eigenvectors {x 1 , . . . , x n } with y ∗i x j = δi,j . For any scaling of these sets we have the eigenvector expansions v=

n y∗v

j j =1

y ∗j x j

xj =

n

x ∗k v y , y ∗k x k k k=1

v ∈ Cn .

(6.16)

144

6 Eigenpairs and Similarity Transformations

Proof For any right eigenpairs (λ1 , x 1 ), . . . , (λn , x n ) and left eigenpairs (λ1 , y 1 ), . . ., (λn , y n ) of A we have AX = XD, Y ∗ A = DY ∗ , where X := [x 1 , . . . , x n ],

Y := [y 1 , . . . , y n ],

D := diag(λ1 , . . . , λn ).

Suppose X is nonsingular. Then AX = XD ⇒ A = XDX −1 ⇒ X −1 A = DX −1 and it follows that Y ∗ := X−1 contains a collection of left eigenvectors such that Y ∗ X = I . Thus the columns of Y are linearly independent and y ∗i x j = δi,j . Similarly, if Y is nonsingular then AY −∗ = Y −∗ D and it follows that X := Y −∗ contains a collection of linearly independent right eigenvectors such that Y ∗ X = I . n n ∗ ∗ If v = j =1 cj x j then y i v = j =1 cj y i x j = ci y ∗i x i , so ci = y ∗i v/y ∗i x i for i = 1, . . . , n and the first expansion in (6.16) follows. The second expansion follows similarly.   For a Hermitian matrix the right eigenvectors {x 1 , . . . , x n } are also left eigenvectors and (6.16) takes the form v=

n x∗v

j j =1

x ∗j x j

xj .

(6.17)

6.5.1 Biorthogonality Left- and right eigenvectors corresponding to distinct eigenvalues are orthogonal. Theorem 6.16 (Biorthogonality) Suppose (μ, y) and (λ, x) are left and right eigenpairs of A ∈ Cn×n . If λ = μ then y ∗ x = 0. Proof Using the eigenpair relation in two ways we obtain y ∗ Ax = λy ∗ x = μy ∗ x and we conclude that y ∗ x = 0.   Right and left eigenvectors corresponding to the same eigenvalue are sometimes orthogonal, sometimes not. Theorem 6.17 (Simple Eigenvalue) Suppose (λ, x) and (λ, y) are right and left eigenpairs of A ∈ Cn×n . If λ has algebraic multiplicity one then y ∗ x = 0. Proof Assume that x2 = 1. We have (cf. (6.8)) 

 λ z∗ , V AV = 0M ∗

6.6 Exercises Chap. 6

145

where V is unitary and V e1 = x. We show that if y ∗ x = 0 then λ is also an eigenvalue of M contradicting the multiplicity assumption of λ. Let u := V ∗ y. Then (V ∗ A∗ V )u = V ∗ A∗ y = λV ∗ y = λu, so (λ, u) is an eigenpair of V ∗ A∗ V . But then y ∗ x = u∗ V ∗ V e1 = u∗ e1 . Suppose that u∗ e1 = 0, i.e., u = 0v for some nonzero v ∈ Cn−1 . Then 

λ 0∗ V A Vu = z M∗ ∗



      0 0 0 =λ = ∗ v v M v  

and λ is an eigenvalue of M.

The casewith multiple eigenvalues is more complicated. For example, the matrix A := 10 11 has one eigenvalue λ = 1 of algebraic multiplicity two, one right eigenvector x = e1 and one left eigenvector y = e2 . Thus x and y are orthogonal.

6.6 Exercises Chap. 6 6.6.1 Exercises Sect. 6.1 Exercise 6.1 (Eigenvalues of a Block Triangular Matrix) What are the eigenvalues of the matrix ⎡2 1 0 0 0 0 0 0⎤ 02100000

⎢0 0 2 0 0 0 0 0⎥ ⎢ 0 0 0 2 0 0 0 0 ⎥ ∈ R8,8 ? ⎣0 0 0 1 2 0 0 0⎦ 00000200 00000030 00000013

Exercise 6.2 (Characteristic Polynomial of Transpose) We have det(B T ) = det(B) and det(B) = det(B)for any square matrix B. Use this to show that a) πAT = πA , b) πA∗ (λ) = πA (λ). Exercise 6.3 (Characteristic Polynomial of Inverse) Suppose (λ, x) is an eigenpair for A ∈ Cn×n . Show that a) If A is nonsingular then (λ−1 , x) is an eigenpair for A−1 . b) (λk , x) is an eigenpair for Ak for k ∈ Z.

146

6 Eigenpairs and Similarity Transformations

Exercise 6.4 (The Power of the Eigenvector Expansion) Show that if A ∈ Cn×n is nondefective with eigenpairs (λj , x j ), j = 1, . . . , n then for any x ∈ Cn and k∈N A x= k

n

cj λkj x j for some scalars c1 , . . . , cn .

(6.18)

j =1

Show that if A is nonsingular then (6.18) holds for all k ∈ Z. Exercise 6.5 (Eigenvalues of an Idempotent Matrix) Let λ ∈ σ (A) where A2 = A ∈ Cn×n . Show that λ = 0 or λ = 1. (A matrix is called idempotent if A2 = A). Exercise 6.6 (Eigenvalues of a Nilpotent Matrix) Let λ ∈ σ (A) where Ak = 0 for some k ∈ N. Show that λ = 0. (A matrix A ∈ Cn×n such that Ak = 0 for some k ∈ N is called nilpotent). Exercise 6.7 (Eigenvalues of a Unitary Matrix) Let λ ∈ σ (A), where A∗ A = I . Show that |λ| = 1. Exercise 6.8 (Nonsingular Approximation of a Singular Matrix) Suppose A ∈ Cn×n is singular. Then we can find 0 > 0 such that A + I is nonsingular for all ∈ C with | | < 0 . Hint: det(A) = λ1 λ2 · · · λn , where λi are the eigenvalues of A. Exercise 6.9 (Companion Matrix) For q0 , . . . , qn−1 ∈ C let p(λ) = λn + qn−1 λn−1 + · · · + q0 be a polynomial of degree n in λ. We derive two matrices that have (−1)n p as its characteristic polynomial. a) Show that p = (−1)n πA where ⎡

−qn−1 −qn−2 ⎢ 1 0 ⎢ ⎢ 1 A=⎢ 0 ⎢ . .. ⎣ .. . 0 0

· · · −q1 ··· 0 ··· 0 . . .. . . ··· 1

⎤ −q0 0 ⎥ ⎥ 0 ⎥ ⎥. .. ⎥ . ⎦ 0

A is called a companion matrix of p. b) Show that p = (−1)n πB where ⎡

⎤ · · · 0 −q0 · · · 0 −q1 ⎥ ⎥ · · · 0 −q2 ⎥ ⎥. .. ⎥ . . .. . . . ⎦ 0 0 · · · 1 −qn−1

00 ⎢1 0 ⎢ ⎢ B = ⎢0 1 ⎢. . ⎣ .. ..

Thus B can also be regarded as a companion matrix for p.

6.6 Exercises Chap. 6

147

Exercise 6.10 (Find Eigenpair Example) Find eigenvalues and eigenvectors of ⎡ ⎤ 123 A = ⎣ 0 2 3 ⎦. Is A defective? 002 Exercise 6.11 (Right or Wrong? (Exam Exercise 2005-1)) Decide if the following statements are right or wrong. Give supporting arguments for your decisions. a) The matrix   1 3 4 A= 6 4 −3 is orthogonal? b) Let 

a1 A= 0a



where a ∈ R. There is a nonsingular matrix Y ∈ R2×2 and a diagonal matrix D ∈ R2×2 such that A = Y DY −1 ? Exercise 6.12 (Eigenvalues of Tridiagonal Matrix (Exam Exercise 2009-3)) Let A ∈ Rn,n be tridiagonal (i.e. aij = 0 when |i − j | > 1) and suppose also that ai+1,i ai,i+1 > 0 for i = 1, . . . , n − 1. Show that the eigenvalues of A are real.2

6.6.2 Exercises Sect. 6.2 Exercise 6.13 (Jordan Example) For the Jordan factorization of the matrix A = Find S.

!

3 0 1 −4 1 −2 −4 0 −1

"

we have J =

Exercise 6.14 (A Nilpotent Matrix) Show that (J m (λ) − λI )r = 1 ≤ r ≤ m − 1 and conclude that (J m (λ) − λI )m = 0.

!

!1 1 0" 010 001

0 I m−r 0 0

.

" for

Exercise 6.15 (Properties of the Jordan Factorization) Let J be the Jordan factorization of a matrix A ∈ Cn×n as given in Theorem 6.4. Then for r = 0, 1, 2, . . ., m = 2, 3, . . ., and any λ ∈ C a) Ar = SJ r S −1 , b) J r = diag(U r1 , . . . , U rk ),

2 Hint:

show that there is a diagonal matrix D such that D −1 AD is symmetric.

148

6 Eigenpairs and Similarity Transformations

c) U ri = diag(J mi,1 (λi )r , . . . , J mi,gi (λi )r ), min{r,m−1} r  r−k k d) J m (λ)r = (E m + λI m )r = k=0 Em. k λ Exercise 6.16 (Powers of a Jordan Block) Find J 100 and A100 for the matrix in Exercise 6.13. Exercise 6.17 (The Minimal Polynomial) Let J be the Jordan factorization of a matrix A ∈ Cn×n as given in Theorem 6.4. The polynomial μA (λ) :=

k 

(λi − λ)mi where mi := max mi,j , 1≤j ≤gi

i=1

(6.19)

is called the minimal polynomial of A. We define the matrix polynomial μA (A) by replacing the factors λi − λ by λi I − A.  g a) We have πA (λ) = ki=1 j i=1 (λi − λ)mi,j . Use this to show that the minimal polynomial divides the characteristic polynomial, i.e., πA = μA νA for some polynomial νA . b) Show that μA (A) = 0 ⇐⇒ μA (J ) = 0. c) (can be difficult) Use Exercises 6.14, 6.15 and the maximality of mi to show that μA (A) = 0 . Thus a matrix satisfies its minimal equation. Finally show that the degree of any polynomial p such that p(A) = 0 is at least as large as the degree of the minimal polynomial. d) Use 2. to show the Cayley-Hamilton Theorem which says that a matrix satisfies its characteristic equation πA (A) = 0. Exercise 6.18 (Cayley Hamilton Theorem (Exam Exercise 1996-3)) Suppose p is a polynomial given by p(t) := rj =0 bj t j , where bj ∈ C and A ∈ Cn×n . We define the matrix p(A) ∈ Cn×n by p(A) :=

r

bj Aj ,

j =0

where A0 := I . From this it follows that if p(t) := (t − α1 ) · · · (t − αr ) for some α0 , . . . , αr ∈ C then p(A) = (A − α1 ) · · · (A − αr ). We accept this without proof. Let U ∗ AU = T , where U is unitary and T upper triangular with the eigenvalues of A on the diagonal.   21 a) Find the characteristic polynomial πA to . Show that π(A) = 0. −1 4 b) Let now A ∈ Cn×n be arbitrary. For any polynomial p show that p(A) = U p(T )U ∗ . c) Let n, k ∈ N with 1 ≤ k < n. Let C, D ∈ Cn×n be upper triangular. Moreover, ci,j = 0 for i, j ≤ k and dk+1,k+1 = 0. Define E := CD and show that ei,j = 0 for i, j ≤ k + 1.

6.6 Exercises Chap. 6

149

d) Now let p := πA be the characteristic polynomial of A. Show that p(T ) = 0.3 Then show that p(A) = 0. (Cayley Hamilton Theorem)

6.6.3 Exercises Sect. 6.3 Exercise (Schur Factorization Example) Show   1 1 that  a Schur factorization of  6.19  −1  √1 , where U = A = 13 22 is U T AU = −1 −1 1 . 0 4 2

Exercise 6.20 (Skew-Hermitian Matrix) Suppose C = A + iB, where A, B ∈ Rn×n . Show that C is skew-Hermitian if and only if AT = −A and B T = B. Exercise 6.21 (Eigenvalues of a Skew-Hermitian Matrix) Show that any eigenvalue of a skew-Hermitian matrix is purely imaginary. Exercise 6.22 (Eigenvector Expansion Using Orthogonal Eigenvectors) Show that if the eigenpairs (λ1 , u1 ), . . . , (λn , un ) of A ∈ Cn×n are orthogonal, i.e., u∗j uk = 0 for j = k then the eigenvector expansions of x and Ax ∈ Cn take the form x=

n

j =1

cj u j ,

Ax =

n

cj λj uj , where cj =

j =1

u∗j x u∗j uj

.

(6.20)

Exercise 6.23 (Rayleigh Quotient (Exam Exercise 2015-3)) a) Let A ∈ Rn×n be a symmetric matrix. Explain how we can use the spectral theorem for symmetric matrices to show that λmin = min R(x) = min R(x), x =0

x2 =1

where λmin is the smallest eigenvalue of A, and R(x) is the Rayleigh quotient given by R(x) :=

3 Hint:

use a suitable factorization of p and use c).

x T Ax . xT x

150

6 Eigenpairs and Similarity Transformations

b) Let x, y ∈ Rn such that x2 = 1 and y = 0. Show that  T R(x − ty) = R(x) − 2t Ax − R(x)x y + O(t 2 ), where t > 0 is small.4 c) Based on the characterization given in a) above it is tempting to develop an algorithm for computing λmin by approximating the minimum of R(x) over the unit ball B1 := {x ∈ Rn | x2 = 1 }. Assume that x 0 ∈ B1 satisfies Ax 0 − R(x 0 )x 0 = 0, i.e., (R(x 0 ), x 0 ) is not an eigenpair for A. Explain how we can find a vector x 1 ∈ B1 such that R(x 1 ) < R(x 0 ).

6.6.4 Exercises Sect. 6.4 Exercise 6.24 (Eigenvalue Perturbation for Hermitian Matrices) Show that in Theorem 6.13, if E is symmetric positive semidefinite then βi ≥ αi . Exercise 6.25  (Hoffman-Wielandt)   −1  Show that (6.15) does not hold for the matrices A := 00 04 and B := −1 1 1 . Why does this not contradict the HoffmanWielandt theorem? Exercise 6.26 (Biorthogonal Expansion) Determine right and left eigenpairs for   the matrix A := 32 12 and the two expansions in (6.16) for any v ∈ R2 . Exercise 6.27 (Generalized Rayleigh Quotient) For A ∈ Cn×n and any y, x ∈ ∗ Cn with y ∗ x = 0 the quantity R(y, x) = RA (y, x) := yy ∗Ax x is called a generalized Rayleigh quotient for A. Show that if (λ, x) is a right eigenpair for A then R(y, x) = λ for any y with y ∗ x = 0. Also show that if (λ, y) is a left eigenpair for A then R(y, x) = λ for any x with y ∗ x = 0.

6.7 Review Questions 6.7.1 6.7.2

4 Hint:

Does A, AT and A∗ have the same eigenvalues? What about A∗ A and AA∗ ? Can a matrix with multiple eigenvalues be similar to a diagonal matrix?

Use Taylor’s theorem for the function f (t) = R(x − ty).

6.7 Review Questions

6.7.3 6.7.4 6.7.5 6.7.6 6.7.7 6.7.8 6.7.9 6.7.10 6.7.11

151

What is the geometric multiplicity of an eigenvalue? Can it be bigger than the algebraic multiplicity? What is the Jordan factorization of a matrix? What are the eigenvalues of a diagonal matrix? What are the Schur factors of a matrix? What is a quasi-triangular matrix? Give some classes of normal matrices. Why are normal matrices important? State the Courant-Fischer theorem. State the Hoffman-Wielandt theorem for Hermitian matrices. What is a left eigenvector of a matrix?

Chapter 7

The Singular Value Decomposition

The singular value decomposition and the reduced form called the singular value factorization are useful both for theory and practice. Some of their applications include solving over-determined equations, principal component analysis in statistics, numerical determination of the rank of a matrix, algorithms used in search engines, and the theory of matrices. We know from Theorem 6.7 that a square matrix A can be diagonalized by a unitary similarity transformation if and only if it is normal, that is A∗ A = AA∗ . In particular, if A ∈ Cn×n is normal then it has a set of orthonormal eigenpairs (λ1 , u1 ), . . . , (λn , un ). Letting U := [u1 , . . . , un ] ∈ Cn×n and D := diag(λ1 , . . . , λn ) we have the spectral decomposition A = U DU ∗ , where U ∗ U = I .

(7.1)

The singular value decomposition (SVD) is a decomposition of a matrix in the form A = U Σ V ∗ , where U and V are unitary, and Σ is a nonnegative diagonal matrix, i.e., Σij = 0 for all i = j and Σii ≥ 0 for all i. The diagonal elements σi := Σii are called singular values, while the columns of U and V are called singular vectors. To be a singular value decomposition the singular values should be ordered, i.e., σi ≥ σi+1 for all i. Example 7.1 (SVD) The following is a singular value decomposition of a rectangular matrix. ⎡ ⎤ ⎡ ⎤⎡ ⎤   14 2 1 2 2 20 1 ⎣ 1 1 3 4 = U ΣV ∗ . A= 4 22⎦ = ⎣ 2 −2 1 ⎦ ⎣0 1⎦ 15 3 5 4 −3 16 13 2 1 −2 00

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_7

(7.2)

153

154

7 The Singular Value Decomposition

Indeed, U and V are unitary since the columns (the singular vectors) are orthonormal, and Σ is a nonnegative diagonal matrix with singular values σ1 = 2 and σ2 = 1.

7.1 The SVD Always Exists The singular value decomposition is closely related to the eigenpairs of A∗ A and AA∗ .

7.1.1 The Matrices A∗ A, AA∗ To start we show that bases for the four fundamental subspaces R(A), N (A), R(A∗ ) and N (A∗ ) of a matrix A can be determined from the eigenpairs of A∗ A and AA∗ . Theorem 7.1 (The Matrices A∗ A, AA∗ ) Suppose m, n ∈ N and A ∈ Cm×n . 1. The matrices A∗ A ∈ Cn×n and AA∗ ∈ Cm×m have the same nonzero eigenvalues with the same algebraic multiplicities. Moreover the extra eigenvalues of the larger matrix are all zero. 2. The matrices A∗ A and AA∗ are Hermitian with nonnegative eigenvalues. 3. Let (λj , v j ) be orthonormal eigenpairs for A∗ A with λ1 ≥ · · · ≥ λr > 0 = λr+1 = · · · = λn . Then {Av 1 , . . . , Av r } is an orthogonal basis for the column space R(A) := {Ay ∈ Cm : y ∈ Cn } and {v r+1 , . . . , v n } is an orthonormal basis for the nullspace N (A) := {y ∈ Cn : Ay = 0}. 4. Let (λj , uj ) be orthonormal eigenpairs for AA∗ . If λj > 0, j = 1, . . . , r and λj = 0, j = r + 1, . . . , m then {A∗ u1 , . . . , A∗ ur } is an orthogonal basis for the column space R(A∗ ) and {ur+1 , . . . , um } is an orthonormal basis for the nullspace N (A∗ ). 5. The rank of A equals the number of positive eigenvalues of A∗ A and AA∗ . Proof 1. Consider the characteristic polynomials πA∗ A and πAA∗ . By (6.1) we have λm πA∗ A (λ) = λn πAA∗ (λ),

λ ∈ C,

and the claim follows. 2. The matrices A∗ A and AA∗ are Hermitian and positive semidefinite and therefore has nonnegative eigenvalues (cf. Lemmas 4.2 and 4.5). Moreover, if

7.1 The SVD Always Exists

155

A∗ Av = λv with v = 0, then λ=

Av22 v ∗ A∗ Av = ≥ 0. v∗v v22

(7.3)

3. By orthonormality of v 1 , . . . , v n we have (Av j )∗ Av k = v ∗j A∗ Av k = λk v ∗j v k = 0 for j = k, showing that Av 1 , . . . , Av n are orthogonal vectors. Moreover, (7.3) implies that Av 1 , . . . , Av r are nonzero and Av j = 0 for j = r + 1, . . . , n. In particular, the elements of {Av 1 , . . . , Av r } and {v r+1 , . . . , v n } are linearly independent vectors in R(A) and N (A), respectively. The proof will be complete once it is shown that R(A) ⊂ span(Av 1 , . . . , Av r ) and N (A) ⊂ span(v r+1 , . . . , v n ). Suppose x ∈ R(A). Then x = Ay for some y ∈ Cn , n Let y = of y. Since Av j = 0 for j =1 cj v j be an eigenvector expansion n r j = r + 1, . . . , n we obtain x = Ay = c Av = j =1 cj Av j ∈ n j =1 j j span(Av 1 , . . . , Av r ). Finally, if y = c v ∈ N (A), then we have j =1 j j r Ay = = · · · = cr = 0 since Av 1 , . . . , Av r are j =1 cj Av j = 0, and c1 linearly independent. But then y = nj=r+1 cj v j ∈ span(v r+1 , . . . , v n ). 4. Since AA∗ = B ∗ B with B := A∗ this follows from part 3 with A = B. 5. By part 1 and 2 A∗ A and AA∗ have the same number r of positive eigenvalues and by part 3 and 4 r is the rank of A.   The following theorem shows, in a constructive way, that any matrix has a singular value decomposition. Theorem 7.2 (Existence of SVD) Suppose for m, n, r ∈ N that A ∈ Cm×n has rank r, and that (λj , v j ) are orthonormal eigenpairs for A∗ A with λ1 ≥ · · · ≥ λr > 0 = λr+1 = · · · = λn . Define 1. V := [v 1 , . . . , v n ] ∈ Cn×n ,  2. Σ ∈ Rm×n is a diagonal matrix with diagonal elements σj := λj for j = 1, . . . , min(m, n), 3. U := [u1 , . . . , um ] ∈ Cm×m , where uj = σj−1 Av j for j = 1, . . . , r and ur+1 , . . . , um is any extension of u1 , . . . , ur to an orthonormal basis u1 , . . . , um for Cm . Then A = U Σ V ∗ is a singular value decomposition of A. Proof Let U , Σ, V be as in the theorem. The vectors u1 , . . . , ur are orthonormal since Av 1 , . . . , Av r are orthogonal and σj = Av j 2 > 0, j = 1, . . . , r by (7.3). But then U and V are unitary and Σ is a nonnegative diagonal matrix. Moreover, U Σ = U [σ1 e1 , . . . , σr er , 0, . . . , 0] = [σ1 u1 , . . . , σr ur , 0, . . . , 0] = [Av 1 , . . . , Av n ].

156

7 The Singular Value Decomposition

Thus U Σ = AV and since V is square and unitary we find U ΣV ∗ = AV V ∗ = A and we have an SVD of A with σ1 ≥ σ2 ≥ · · · ≥ σr .   ⎡ ⎤ 14 2 1 ⎣ Example 7.2 (Find SVD) To derive the SVD in (7.2) where A = 15 4 22⎦, we 16 13 first compute the eigenpairs of   1 52 36 B := A A = 25 36 73 T

as     3 3 B =4 , 4 4



   4 4 B =1 . −3 −3

  3 4 . Now u1 = Av 1 /σ1 = [1, 2, 2]T /3, Thus σ1 = 2, σ2 = 1, and V = 4 −3 u2 = Av 2 /σ2 = [2, −2, 1]T /3. For an SVD we also need u3 which is any vector of length one orthogonal to u1 and u2 . u3 = [2, 1, −2]T /3 is such a vector and we obtain the singular value decomposition (7.2). 1 5

7.2 Further Properties of SVD We first consider a reduced SVD that is often convenient.

7.2.1 The Singular Value Factorization Suppose A = U ΣV ∗ is a singular value decomposition of A of rank r. Consider the block partitions U = [U 1 , U 2 ] ∈ Cm×m ,

U 1 := [u1 , . . . , ur ],

U 2 := [ur+1 , . . . , um ],

V = [V 1 , V 2 ] ∈ Cn×n , V 1 := [v 1 , . . . , v r ], V 2 := [v r+1 , . . . , v n ],   Σ1 0r,n−r Σ= ∈ Rm×n , where Σ 1 := diag(σ1 , . . . , σr ). 0m−r,r 0m−r,n−r (7.4) Thus Σ1 contains the r positive singular values on the diagonal and for k, l ≥ 0 the symbol 0k,l = [ ] denotes the empty matrix if k = 0 or l = 0, and the zero matrix

7.2 Further Properties of SVD

157

with k rows and l columns otherwise. We obtain by block multiplication a reduced factorization A = U ΣV ∗ = U 1 Σ 1 V ∗1 .

(7.5)

As an example: 

         1 1   1  1 1 1 2 0 1 1 −1 1 −1 = √ =√ √ 2 √ 1 −1 . 1 −1 2 1 −1 0 0 2 1 1 2 1 2

Definition 7.1 (SVF) Let m, n, r ∈ N and suppose A ∈ Cm×n has r positive singular values, i.e., A has rank r. A singular value factorization (SVF) is a factorization of A ∈ Cm×n of the form A = U 1 Σ 1 V ∗1 , where U 1 ∈ Cm×r and V 1 ∈ Cn×r have orthonormal columns, and Σ 1 ∈ Rr×r is a diagonal matrix with σ1 ≥ · · · ≥ σr > 0. An SVD and an SVF of a matrix A of rank r are closely related. 1. Let A = U ΣV ∗ be an SVD of A. Then A = U 1 Σ 1 V ∗1 is an SVF of A, where U 1 , V 1 contain the first r columns of U , V respectively, and Σ1 ∈ Rr×r is a diagonal matrix with the positive singular values on the diagonal. 2. Conversely, suppose A = U 1 Σ 1 V ∗1 is a singular value factorization of A. Extend U 1 and V 1 in any way to unitary matrices U ∈ Cm×m and V ∈ Cn×n , and let Σ be given by (7.4). Then A = U ΣV ∗ is an SVD of A. 3. If A = [u1 , . . . , ur ] diag(σ1 , . . . , σr )[v 1 , . . . , v r ]∗ is a singular value factorization of A then A=

r

σj uj v ∗j .

(7.6)

j =1

This is known as the outer product form of the SVD and SVF. 4. We note that a nonsingular square matrix has full rank and only positive singular values. Thus the SVD and SVF are the same for a nonsingular matrix. Example 7.3 (r < n < m) To find the SVF and SVD of ⎡

⎤ 11 A = ⎣1 1⎦ . 00 we first compute eigenpairs of B := AT A =

  22 22

158

7 The Singular Value Decomposition

as     1 1 B =4 , 1 1



   1 1 B =0 , −1 −1

and we find σ1 = 2, σ2 = 0, Thus r = 1, m = 3, n = 2 and ⎡

Σ1 ⎣ Σ= 0 0

⎤ 0 0⎦ , 0

  1 1 1 V = √ . 2 1 −1

Σ 1 = [2],

√ We find u1 = Av 1 /σ1 = s 1 / 2, where s 1 = [1, 1, 0]T , and the SVF of A is given by ⎡ ⎤ 1 1 ⎣ ⎦ 1   A= √ 1 [2] √ 1 1 . 2 0 2 To find an SVD we need to extend u1 to an orthonormal basis for R3 . We first extend s 1 to a basis {s 1 , s 2 , s 3 } for R3 , apply the Gram-Schmidt orthogonalization process to {s 1 , s 2 , s 3 }, and then normalize. Choosing the basis s1 =

!1" 1 0

,

s2 =

!0" 1 0

,

s3 =

s T2 w 1 w w T1 w 1 1

!0" 0 1

, 

−1/2 1/2 0



we find from (5.8) w1 = s 1 , w 2 = s 2 − = , w3 = s 3 − !0" T T s 3 w1 s w w − w3T w2 w2 = 0 . Normalizing the wi ’s we obtain u1 = w1 /w 1 2 = w T1 w 1 1 1 √ √ 2 2 √ √ [1/ 2, 1/ 2, 0]T , u2 = w2 /w2 2 = [−1/ 2, 1/ 2, 0]T , and u3 = s 3 /s 3 2 = [0, 0, 1]T . Therefore, A = U ΣV T , is an SVD, where  U :=

√ √  1/√2 −1/√ 2 0 1/ 2 1/ 2 0 0 0 1

∈ R3,3 ,

Σ :=

!2 0" 00 00

∈ R3,2 ,

1  1  ∈ R2,2 . V := √ 11 −1 2

The method we used to find the singular value decomposition in the examples and exercises can be suitable for hand calculation with small matrices, but it is not appropriate as a basis for a general purpose numerical method. In particular, the Gram-Schmidt orthogonalization process is not numerically stable, and forming A∗ A can lead to extra errors in the computation. Standard computer implementations of the singular value decomposition [16] first reduces A to bidiagonal form and then use an adapted version of the QR algorithm where the matrix A∗ A is not formed. The QR algorithm is discussed in Chap. 15.

7.3 A Geometric Interpretation

159

7.2.2 SVD and the Four Fundamental Subspaces The singular vectors form orthonormal bases for the four fundamental subspaces R(A), N (A), R(A∗ ), and N (A∗ ). Theorem 7.3 (Singular Vectors and Orthonormal Bases) For positive integers m, n let A ∈ Cm×n have rank r and a singular value decomposition A = [u1 , . . . , um ]Σ[v 1 , . . . , v n ]∗ = U ΣV ∗ . Then the singular vectors satisfy Av i = σi ui , i = 1, . . . , r, ∗

A ui = σi v i , i = 1, . . . , r,

Av i = 0, i = r + 1, . . . , n, A∗ ui = 0, i = r + 1, . . . , m.

(7.7)

Moreover, 1. {u1 , . . . , ur } is an orthonormal basis for R(A), 2. {ur+1 , . . . , um }is an orthonormal basis for N (A∗ ),

(7.8)

3. {v 1 , . . . , v r } is an orthonormal basis for R(A∗ ), 4. {v r+1 , . . . , v n } is an orthonormal basis for N (A).

Proof If A = U ΣV ∗ then  = U Σ, or in terms of the block partition (7.4)  AV A[V 1 , V 2 ] = [U 1 , U 2 ] Σ01 00 . But then AV 1 = U 1 Σ 1 , AV 2 = 0, and this implies the first part of (7.7). Taking conjugate transpose of A = U Σ V ∗ gives A∗ = V Σ ∗ U ∗ or A∗ U = V Σ ∗ . Using the block partition as before we obtain the last part of (7.7). It follows from Theorem 7.1 that {Av 1 , . . . , Av r } is an orthogonal basis for R(A), {A∗ u1 , . . . , A∗ ur } is an orthogonal basis for R(A∗ ), {v r+1 , . . . , v m } is an orthonormal basis for N (A) and{ur+1 , . . . , um } is an orthonormal basis for N (A∗ ). By (7.7) {u1 , . . . , ur } is an orthonormal basis for R(A) and {v 1 , . . . , v r } is an orthonormal basis for R(A∗ ).  

7.3 A Geometric Interpretation The singular value decomposition and factorization give insight into the geometry of a linear transformation. Consider the linear transformation T : Rn → Rm given by T z := Az where A ∈ Rm×n . Assume that rank(A) = n. In the following theorem we show that the function T maps the unit sphere in Rn given by S := {z ∈ Rn : z2 = 1} onto an ellipsoid E := AS = {Az : z ∈ S} in Rm . Theorem 7.4 (SVF Ellipse) Suppose A ∈ Rm×n has rank r = n, and let A = U 1 Σ1 V T1 be a singular value factorization of A. Then E = U 1 E˜ where E˜ := {y = [y1, . . . , yn ]T ∈ Rn :

y12 σ12

+ ··· +

yn2 = 1}. σn2

160

7 The Singular Value Decomposition

Proof Suppose z ∈ S. Now Az = U 1 Σ1 V T1 z = U 1 y, where y := Σ1 V T1 z. Since rank(A) = n it follows that V 1 = V is square so that V 1 V T1 = I . But then V 1 Σ −1 1 y = z and we obtain −1 2 2 1 = z22 = V 1 Σ −1 1 y2 = Σ 1 y2 =

y12 σ12

+ ···+

yn2 . σn2

˜ Finally, x = Az = U 1 Σ1 V T z = U 1 y, where y ∈ E˜ This implies that y ∈ E. 1 ˜   implies that E = U 1 E. The equation 1 =

y12 σ12

+ ··· +

yn2 σn2

describes an ellipsoid in Rn with semiaxes

of length σj along the unit vectors ej for j = 1, . . . , n. Since the orthonormal transformation U 1 y → x preserves length, the image E = AS is a rotated ellipsoid with semiaxes along the left singular vectors uj = U ej , of length σj , j = 1, . . . , n. Since Av j = σj uj , for j = 1, . . . , n the right singular vectors defines points in S that are mapped onto the semiaxes of E. Example 7.4 (Ellipse) Consider the transformation A : R2 → R2 given by the matrix   1 11 48 A := 25 48 39 in Example 7.8. Recall that σ1 = 3, σ2 = 1, u1 = [3, 4]T /5 and u2 = [−4, 3]T /5. The ellipses y12 /σ12 + y22 /σ22 = 1 and E = AS = U 1 E˜ are shown in Fig. 7.1. Since

Fig. 7.1 The ellipse y12 /9 + y22 = 1 (left) and the rotated ellipse AS (right)

7.4 Determining the Rank of a Matrix Numerically

161

y = U T1 x = [3/5x1 + 4/5x2 , −4/5x1 + 3/5x2 ]T , the equation for the ellipse on the right is (− 45 x1 + 35 x2 )2 ( 35 x1 + 45 x2 )2 + = 1, 9 1

7.4 Determining the Rank of a Matrix Numerically In many elementary linear algebra courses a version of Gaussian elimination, called Gauss-Jordan elimination, is used to determine the rank of a matrix. To carry this out by hand for a large matrix can be a Herculean task and using a computer and floating point arithmetic the result will not be reliable. Entries, which in the final result should have been zero, will have nonzero values because of round-off errors. As an alternative we can use the singular value decomposition to determine rank. Although success is not at all guaranteed, the result will be more reliable than if Gauss-Jordan elimination is used. By Theorem 7.2 the rank of a matrix is equal to the number of nonzero singular values, and if we have computed the singular values, then all we have to do is to count the nonzero ones. The problem however is the same as for Gaussian elimination. Due to round-off errors none of the computed singular values are likely to be zero.

7.4.1 The Frobenius Norm This commonly occurring matrix norm will be used here in a discussion of how many of the computed singular values can possibly be considered to be zero. The Frobenius norm, of a matrix A ∈ Cm×n is defined by AF :=

m n 1/2  |aij |2 .

(7.9)

i=1 j =1

There is a relation between the Frobenius norm of a matrix and its singular values. First we derive some elementary properties of this norm. A systematic study of matrix norms is given in the next chapter. Lemma 7.1 (Frobenius Norm Properties) For any m, n ∈ N and any matrix A ∈ Cm×n 1. A∗ F = AF , 2. A2F = nj=1 a :j 22 ,

162

7 The Singular Value Decomposition

3. U AF = AV F = AF for any unitary matrices U ∈ Cm×m and V ∈ Cn×n , 4. ABF ≤ AF BF for any B ∈ Cn,k , k ∈ N, 5. Ax2 ≤ AF x2 , for all x ∈ Cn . Proof

m n 2 2 2 1. A∗ 2F = nj=1 m i=1 |a ij | = i=1 j =1 |aij | = AF . 2. This follows since the Frobenius norm is the Euclidian norm of a vector, AF := vec(A)2 , where vec(A) ∈ Cmn is the vector obtained by stacking the columns a :j of A on top of each other. 3. Recall that if U ∗ U = I then U x2 = x2 for all x ∈ Cn . Applying this to each 2. 2. column a :j of A we find U A2F = nj=1 U a :j 22 = nj=1 a :j 22 = A2F . Similarly, since V V ∗ = I we find AV F = V ∗ A∗ F = A∗ F = AF . 4. Using the Cauchy-Schwarz inequality and 2. we obtain 1.

AB2F =

1.

m m k k

|a ∗i: b:j |2 ≤ a i: 22 b :j 22 = A2F B2F . i=1 j =1

i=1 j =1

5. Since vF = v2 for a vector this follows by taking k = 1 and B = x in 4.   Theorem 7.5 (Frobenius Norm and Singular Values) We have AF % σ12 + · · · + σn2 , where σ1 , . . . , σn are the singular values of A.

=

Proof Using Lemma 7.1 we find AF = U ∗ AV F = ΣF = 3.

%

σ12 + · · · + σn2 .  

7.4.2 Low Rank Approximation m×n has a singular value decomposition A = Suppose  D  ∗m ≥ n ≥ 1 and A ∈ C U 0 V , where D = diag(σ1 , . . . , σn ). We choose > 0 and let 1 ≤ r ≤ n be   2 +· · ·+σ 2 < 2 . Define A := U D  V ∗ , where the smallest integer such that σr+1 n 0 D  := diag(σ1 , . . . , σr , 0, . . . , 0) ∈ Rn×n . By Lemma 7.1

A − A F = U

!

D−D  0

"

% " !  2 + · · · + σ 2 < . V ∗ F =  D−D  = σr+1 F n 0

7.5 Exercises Chap. 7

163

Thus, if is small then A is near a matrix A of % rank r. This can be used to determine 2 + · · · + σ 2 is “small”. Then we rank numerically. We choose an r such that σr+1 n postulate that rank(A) = r since A is close to a matrix of rank r. The following theorem shows that of all m × n matrices of rank r, A is closest to A measured in the Frobenius norm. Theorem 7.6 (Best Low Rank Approximation) Suppose A ∈ Rm×n has singular values σ1 ≥ · · · ≥ σn ≥ 0. For any r ≤ rank(A) we have A − A F =

min A − BF =

%

B∈Êm×n

2 + · · · + σ 2. σr+1 n

rank(B)=r

For the proof of this theorem we refer to p. 322 of [16].

7.5 Exercises Chap. 7 7.5.1 Exercises Sect. 7.1 Exercise 7.1 (SVD1) Show that the decomposition        1 1 1 11 20 1 1 1 √ A := = √ = U DU T 11 2 1 −1 0 0 2 1 −1

(7.10)

is both a spectral decomposition and a singular value decomposition. Exercise 7.2 (SVD2) Show that the decomposition 

      1 1 1 1 −1 2 0 1 1 −1 A := =√ =: U ΣV T √ 1 −1 2 1 −1 0 0 2 1 1

(7.11)

is a singular value decomposition. Show that A is defective so it cannot be diagonalized by any similarity transformation. Exercise 7.3 (SVD Examples) Find the singular value decomposition of the following matrices   3 a) A = . 4 ⎡ ⎤ 11 b) A = ⎣ 2 2 ⎦. 22

164

7 The Singular Value Decomposition

Exercise 7.4 (More SVD Examples) Find the singular value decomposition of the following matrices a) A = e1 the first unit vector in Rm . b) A = eTn the last unit vector in Rn . c) A = −10 03 . Exercise 7.5 (Singular Values of a Normal Matrix) Show that a) the singular values of a normal matrix are the absolute values of its eigenvalues, b) the singular values of a symmetric positive semidefinite matrix are its eigenvalues. Exercise 7.6 (The Matrices A∗ A, AA∗ and SVD) Show the following: If A = U Σ V is a singular value decomposition of A ∈ Cm×n then a) b) c) d)

A∗ A = V diag(σ12 , . . . , σn2 )V ∗ is a spectral decomposition of A∗ A. AA∗ = U diag(σ12 , . . . , σm2 )U ∗ is a spectral decomposition of AA∗ . The columns of U are orthonormal eigenvectors of AA∗ . The columns of V are orthonormal eigenvectors of A∗ A.

Exercise 7.7 (Singular Values (Exam Exercise 2005-2)) Given the statement: “If A ∈ Rn×n has singular values (σ1 , . . . , σn ) then A2 has singular values (σ12 , . . . , σn2 )”. Find a class of matrices for which the statement is true. Show that the statement is not true in general.

7.5.2 Exercises Sect. 7.2 Exercise7.8 (Nonsingular Matrix) Derive the SVF and SVD of the matrix1  1 11 48 . Also, using possibly a computer, find its spectral decomposition A = 25 48 39 U DU T . The matrix A is normal, but the spectral decomposition is not an SVD. Why? Exercise 7.9 (Full Row Rank) Find2 the SVF and SVD of   1 14 4 16 A := ∈ R2×3 . 15 2 22 13

     3 −4 3 0 1 3 4 . 4 3 0 1 5 4 −3 2 Hint: Take the transpose of the matrix in (7.2). 1 Answer:

A=

1 5

7.5 Exercises Chap. 7

165

Exercise 7.10 (Counting Dimensions of Fundamental Subspaces) Suppose A ∈ Cm×n . Show using SVD that a) rank(A) = rank(A∗ ). b) rank(A) + null(A) = n, c) rank(A) + null(A∗ ) = m, where null(A) is defined as the dimension of N (A). Exercise 7.11 (Rank and Nullity Relations) Use Theorem 7.1 to show that for any A ∈ Cm×n a) rank A = rank(A∗ A) = rank(AA∗ ), b) null(A∗ A) = null A, and null(AA∗ ) = null(A∗ ). Exercise 7.12 (Orthonormal Bases Example) Let A and B be as in Example 7.2. Give orthonormal bases for R(B) and N (B). Exercise 7.13 (Some Spanning Sets) Show for any A ∈ Cm×n that R(A∗ A) = R(V 1 ) = R(A∗ ) Exercise 7.14 (Singular Values and Eigenpair of Composite Matrix) Let A ∈ Cm×n with m ≥ n have singular values σ1 , . . . , σn , left singular vectors u1 , . . . , um ∈ Cm , and right singular vectors v 1 , . . . , v n ∈ Cn . Show that the matrix   0 A C := ∈ R(m+n)×(m+n) A∗ 0 has the n + m eigenpairs {(σ1 , p 1 ), . . . , (σn , p n ), (−σ1 , q 1 ), . . . , (−σn , q n ), (0, r n+1 ), . . . , (0, r m )}, where   u pi = i , vi



 ui qi = , −v i

  u r j = j , for i = 1, . . . , n, j = n + 1, . . . , m. 0

Exercise 7.15 (Polar Decomposition (Exam Exercise 2011-2)) Given n ∈ N and a singular value decomposition A = U ΣV T of a square matrix A ∈ Rn,n , consider the matrices Q := U V T ,

P := V ΣV T

(7.12)

of order n. a) Show that A = QP and show that Q is orthonormal.

(7.13)

166

7 The Singular Value Decomposition

b) Show that P is symmetric positive semidefinite and positive definite if A is nonsingular. The factorization in (7.13) is called a polar factorization c)  Use the singular value decomposition of A to give a suitable definition of B := AT A so that P = B. For the rest of this problem assume that A is nonsingular. Consider the iterative method X k+1 =

 1 , k = 0, 1, 2, . . . with X0 = A, X k + X −T k 2

(7.14)

for finding Q. d) Show that the iteration (7.14) is well defined by showing that X k = U Σ k V T , for a diagonal matrix Σk with positive diagonal elements, k = 0, 1, 2, . . .. e) Show that Xk+1 − Q =

  1 −T  T X k X k − QT X k − Q 2

(7.15)

and use (7.15) and the Frobenius norm to show (quadratic convergence to Q) X k+1 − QF ≤

1 −1 X F X k − Q2F . 2 k

(7.16)

f) Write a MATLAB program function [Q,P,k] = polardecomp(A,tol,K) to carry out the iteration in (7.14). The output is approximations Q and P = QT A to the polar decomposition A = QP of A and the number of iterations k such that Xk+1 − X k F < tol ∗ X k+1 F . Set k = K + 1 if convergence is not achieved in K iterations. The Frobenius norm in MATLAB is written norm(A,’fro’). Exercise 7.16 (Underdetermined System (Exam Exercise 2015-1)) a) Let A be the matrix ⎡

⎤ 1 2 A = ⎣ 0 1⎦ . −1 3 Compute A1 and A∞ . b) Let B be the matrix B=

  1 0 −1 . 11 1

Find the spaces span(B T ) and ker(B).

7.5 Exercises Chap. 7

167

c) Consider the underdetermined linear system x1

− x3 = 4,

x1 + x2 + x3 = 12. Find the solution x ∈ R3 with x2 as small as possible. d) Let A ∈ Rm×n be a matrix with linearly independent columns, and b ∈ Rm a vector. Assume that we use the Gauss-Seidel method (cf. Chap. 12) to solve the normal equations AT Ax = AT b. Will the method converge? Justify your answer.

7.5.3 Exercises Sect. 7.4 Exercise 7.17 (Rank Example) Consider the singular value decomposition ⎡

0 ⎢4 A := ⎢ ⎣4 0

⎤ ⎡ 1 1 1 1 ⎤⎡ 6 3 3 2 −2 −2 2 ⎢ 1 1 1 1 ⎥⎢0 1 −1 ⎥ ⎥ = ⎢ 2 2 2 2 ⎥⎢ 1 −1 ⎦ ⎣ 21 12 − 12 − 12 ⎦ ⎣ 0 1 1 1 1 0 3 3 2 −2 2 −2

⎤ 00 ⎡2 2 1⎤ 3 3 3 6 0⎥ ⎥ ⎣ 2 −1 −2 ⎦ ⎦ 3 3 3 00 2 2 1 − 3 3 3 00

a) Give orthonormal bases for R(A), R(AT ), N (A) and N (AT ). b) Explain why for all matrices B ∈ R4,3 of rank one we have A − BF ≥ 6. c) Give a matrix A1 of rank one such that A − A1 F = 6. Exercise 7.18 (Another Rank Example) Let A be the n × n matrix that for n = 4 takes the form   A=

1 −1 −1 −1 0 1 −1 −1 0 0 1 −1 0 0 0 1

.

Thus A is upper triangular with diagonal elements one and all elements above the diagonal equal to −1. Let B be the matrix obtained from A by changing the (n, 1) element from zero to −22−n . a) Show that Bx = 0, where x := [2n−2 , 2n−3 , . . . , 20 , 1]T . Conclude that B is singular, det(A) = 1, and A − BF = 22−n . Thus even if det(A) is not small the Frobenius norm of A − B is small for large n, and the matrix A is very close to being singular for large n. b) Use Theorem 7.6 to show that the smallest singular vale σn of A is bounded above by 22−n .

168

7 The Singular Value Decomposition

Exercise 7.19 (Norms, Cholesky and SVD (Exam Exercise 2016-1)) a) Let A be the matrix ⎡

⎤ 3 1 A = ⎣ 2 3⎦ . −1 5 Compute A1 , A∞ and AF . b) Let T be the matrix  T =

 2 −1 . −1 2

Show that T is symmetric positive definite, and find the Cholesky factorization T = LLT of T . c) Let A = U ΣV ∗ be a singular value decomposition of the m × n-matrix A with m ≥ n, and let A = ri=1 σi ui v ∗i , where 1 ≤ r ≤ n, σi are the singular values of A, and where ui , v i are the columns of U and V . Prove that 2 A − A 2F = σr+1 + · · · + σn2 .

7.6 Review Questions 7.6.1

Consider an SVD and an SVF of a matrix A. • • • • • •

7.6.2

What are the singular values of A? how is the SVD defined? how can we find an SVF if we know an SVD? how can we find an SVD if we know an SVF? what are the relations between the singular vectors? which singular vectors form bases for R(A) and N (A∗ )?

How are the Frobenius norm and singular values related?

Part III

Matrix Norms and Least Squares

We introduce vector and matrix norms and use them to study how sensitive the solution of a linear system is to perturbation in the data. This leads to the important concept of condition number. In the second chapter in this part we consider solving linear systems in the least squares sense. We give examples, the basic theory, discuss numerical methods and perturbation theory. Singular values and the important concept of generalized inverses play a central role in our presentation.

Chapter 8

Matrix Norms and Perturbation Theory for Linear Systems

Norms are used to measure the size of vector and matrices.

8.1 Vector Norms Definition 8.1 (Vector Norm) A (vector) norm in a real (resp. complex) vector space V is a function · : V → R that satisfies for all x, y in V and all a in R (resp. C) 1. x ≥ 0 with equality if and only if x = 0. 2. ax = |a| x. 3. x + y ≤ x + y.

(positivity) (homogeneity) (subadditivity)

The triple (V, R, ·) (resp. (V, C, ·)) is called a normed vector space and the inequality 3. is called the triangle inequality. In this book the vector space will be one of Rn , Cn or one of the matrix spaces or Cm×n . Vector addition is defined by element wise addition and scalar multiplication is defined by multiplying every element by the scalar. We encountered norms associated with any inner product in Rn or Cn in Chap. 5. That these inner product norms are really norms was shown in Theorem 5.2. In this book we will use the following family of vector norms on V = Cn and V = Rn . Rm×n ,

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_8

171

172

8 Matrix Norms and Perturbation Theory for Linear Systems

Definition 8.2 (Vector p-Norms) We define for p ≥ 1 and x ∈ Rn or x ∈ Cn the p-norms by xp :=

n

1/p |xj |p

(8.1)

,

j =1

x∞ := max |xj |.

(8.2)

1≤j ≤n

The most important cases are p = 1, 2, ∞: 1. x1 :=

n

|xj | ,

(the one-norm or l1 -norm)

j =1

% n 2 2. x2 := j =1 |xj | , 3. x∞ := max |xj |,

(the two-norm, l2 -norm, or Euclidian norm) (the infinity-norm, l∞ -norm, or max norm)

1≤j ≤n

Some remarks are in order. 1. In Sect. 8.4, we show that the p-norms are vector norms for 1 ≤ p ≤ ∞. 2. The triangle inequality x + yp ≤ xp + yp is called Minkowski’s inequality. 3. To prove it one first establishes Hölder’s inequality n

|xj yj | ≤ xp yq ,

j =1

1 1 + = 1, p q

x, y ∈ Cn .

(8.3)

The relation p1 + q1 = 1 means that if p = 1 then q = ∞ and vice versa. The Hölder’s inequality is the same as the Cauchy-Schwarz inequality (cf. Theorem 5.1) for the Euclidian norm p = 2. 4. The infinity norm is related to the other p-norms by lim xp = x∞ for all x ∈ Cn .

(8.4)

p→∞

5. The equation (8.4) clearly holds for x = 0. For x = 0 we write xp := x∞

n j =1

 |xj | p x∞

1/p .

Now each term in the sum is not greater than one and at least one term is equal to one, and we obtain x∞ ≤ xp ≤ n1/p x∞ ,

p ≥ 1.

Since limp→∞ n1/p = 1 for any fixed n ∈ N, we see that (8.4) follows.

(8.5)

8.1 Vector Norms

173

6. In Exercise 8.28 we show the following generalization of inequality (8.5) 

xp ≤ xp ≤ n1/p−1/p xp ,

x ∈ Cn ,

1 ≤ p ≤ p ≤ ∞.

(8.6)

We return now to the general vector norm case. Definition 8.3 (Equivalent Norms) We say that two norms · and · on V are equivalent if there are positive constants m and M such that for all vectors x ∈ V we have mx ≤ x ≤ Mx .

(8.7)

By (8.5) the p- and ∞-norms are equivalent for any p ≥ 1. This result is generalized in the following theorem. Theorem 8.1 (Basic Properties of Vector Norms) The following holds for a normed vector space (V, C, ·). (inverse triangle inequality). 1. x − y ≥ | x − y |, for all x, y ∈ Cn 2. The vector norm is a continuous function V → R. 3. All vector norms on V are equivalent provided V is finite dimensional. Proof 1. Since x = x − y + y ≤ x − y + y we obtain x − y ≥ x − y. By symmetry x − y = y − x ≥ y − x and we obtain the inverse triangle inequality. 2. This follows from the inverse triangle inequality. 3. The following proof can be skipped by those who do not have the necessary background in advanced calculus. Define the · unit sphere S := {y ∈ V : y = 1}. The set S is a closed and bounded set and the function f : S → R given by f (y) = y is continuous by what we just showed. Therefore f attains its minimum and maximum value on S. Thus, there are positive constants m and M such that m ≤ y ≤ M,

y ∈ S.

(8.8)

For any x ∈ V we have y := x/x ∈ S, and (8.7) follows if we apply (8.8) to these y.  

174

8 Matrix Norms and Perturbation Theory for Linear Systems

8.2 Matrix Norms For simplicity we consider only norms on the vector space (Cm×n , C). All results also holds for (Rm×n , R). A matrix norm   : Cm×n , → R is simply a vector norm on Cm×n . Thus 1., 2. and 3. in Definition 8.1 holds, where we replace x and y by m × n matrices A and B, respectively. The Frobenius norm AF :=

m n 1/2  |aij |2 i=1 j =1

is a matrix norm. Indeed, writing all elements in A in a string of length mn we see that the Frobenius norm is the Euclidian norm on the space Cmn . Adapting Theorem 8.1 to the matrix situation gives Theorem 8.2 (Matrix Norm Equivalence) All matrix norms on Cm×n are equivalent. Thus, if · and · are two matrix norms on Cm×n then there are positive constants μ and M such that μA ≤ A ≤ MA holds for all A ∈ Cm×n . Moreover, a matrix norm is a continuous function. Any vector norm ·V on Cmn defines a matrix norm on Cm×n given by A := vec(A)V , where vec(A) ∈ Cmn is the vector obtained by stacking the columns of A on top of each other. In particular, to the p vector norms for p = 1, 2, ∞, we have the corresponding sum norm, Frobenius norm, and max norm defined by AS :=

n m

i=1 j =1

|aij |,

AF :=

n m 

|aij |2

1/2

,

AM := max |aij |.

i=1 j =1

i,j

(8.9) Of these norms the Frobenius norm is the most useful. Some of its properties were derived in Lemma 7.1 and Theorem 7.5.

8.2.1 Consistent and Subordinate Matrix Norms Since matrices can be multiplied it is useful to have an analogue of subadditivity for matrix multiplication. For square matrices the product AB is defined in a fixed space Cn×n , while in the rectangular case matrix multiplication combines matrices in different spaces. The following definition captures this distinction.

8.2 Matrix Norms

175

Definition 8.4 (Consistent Matrix Norms) A matrix norm is called consistent on Cn×n if 4. AB ≤ A B

(submultiplicativity)

holds for all A, B ∈ Cn×n . A matrix norm is consistent if it is defined on Cm×n for all m, n ∈ N, and 4. holds for all matrices A, B for which the product AB is defined. Clearly the Frobenius norm is defined for all m, n ∈ N. From Lemma 7.1 it follows that the Frobenius norm is consistent. For a consistent matrix norm on Cn×n we have the inequality Ak  ≤ Ak for A ∈ Cn×n and k ∈ N.

(8.10)

When working with norms one often has to bound the vector norm of a matrix times a vector by the norm of the matrix times the norm of the vector. This leads to the following definition. Definition 8.5 (Subordinate Matrix Norms) Suppose m, n ∈ N are given, let   on Cm and  β on Cn be vector norms, and let   be a matrix norm on Cm×n . We say that the matrix norm   is subordinate to the vector norms   and  β if Ax ≤ A xβ for all A ∈ Cm×n and all x ∈ Cn . By Lemma 7.1 we have Ax2 ≤ AF x2 , for all x ∈ Cn . Thus the Frobenius norm is subordinate to the Euclidian vector norm. For consistent matrix norms we have Proposition 8.1 For m, n ∈ N, A ∈ Cm×n , all x ∈ Cn and any consistent matrix norm   Ax ≤ A x,

(8.11)

i.e., a consistent matrix norm is subordinate to itself. Moreover, the matrix power bound (8.10) holds for all square matrices A ∈ Cn×n . Proof Since a consistent matrix norm is defined on Cm×n for all m, n ∈ N the consistency implies that (8.11) holds for A ∈ Cm×n and B := x ∈ Cn×1 . The last statement also follows immediately from the consistency.  

8.2.2 Operator Norms Corresponding to vector norms on Cn and Cm there is an induced matrix norm on Cm×n which we call the operator norm. It is possible to consider one vector norm

176

8 Matrix Norms and Perturbation Theory for Linear Systems

on Cm and another vector norm on Cn , but we treat only the case of one vector norm defined on Cn for all n ∈ N.1 Definition 8.6 (Operator Norm) Let   be a vector norm defined on Cn for all n ∈ N. For given m, n ∈ N and A ∈ Cm×n we define A := max x =0

Ax . x

(8.12)

We call this the operator norm corresponding to the vector norm  . With a risk of confusion we use the same symbol for the operator norm and the corresponding vector norm. Before we show that the operator norm is a matrix norm we make some observations. 1. It is enough to take the max over subsets of Cn . For example A = max Ax.

(8.13)

S := {x ∈ Cn : x = 1}

(8.14)

x=1

The set

is the unit sphere in Cn with respect to the vector norm  . It is enough to take the max over this unit sphere since max x =0

.  x . Ax . . = max .A . = max Ay. x x =0 x y=1

2. The operator norm is subordinate to the corresponding vector norm. Thus, Ax ≤ Ax for all A ∈ Cm×n and x ∈ Cn .

(8.15)

3. We can use max instead of sup in (8.12). This follows by the following compactness argument. The unit sphere S given by (8.14) is bounded. It is also finite dimensional and closed, and hence compact. Moreover, since the vector norm   : S → R is a continuous function, it follows that the function f : S → R given by f (x) = Ax is continuous. But then f attains its max and min and we have A = Ax ∗  for some x ∗ ∈ S.

(8.16)

the case of one vector norm   on Cm and another vector norm  β on Cn we would define A := maxx =0 Ax xβ .

1 In

8.2 Matrix Norms

177

Lemma 8.1 (The Operator Norm Is a Consistent Matrix Norm) If   is vector norm defined on Cn for all n ∈ N, then the operator norm given by (8.12) is a consistent matrix norm. Moreover, I  = 1. Proof We use (8.13). In 2. and 3. below we take the max over the unit sphere S given by (8.14). 1. Nonnegativity is obvious. If A = 0 then Ay = 0 for each y ∈ Cn . In particular, each column Aej in A is zero. Hence A = 0. 2. cA = maxx cAx = maxx |c| Ax = |c| A. 3. A + B = maxx (A + B)x ≤ maxx Ax + maxx Bx = A + B. ABx ABx Bx 4. AB = maxx =0 ABx x = maxBx =0 x = maxBx =0 Bx x ≤ maxy =0

Ay y

maxx =0

Bx x

= A B.

That I  = 1 for any operator norm follows immediately from the definition.   √ Since I F = n, we see that the Frobenius norm is not an operator norm for n > 1.

8.2.3 The Operator p-Norms Recall that the p or p vector norms (8.1) are given by xp :=

n 

|xj |p

1/p

, p ≥ 1,

j =1

x∞ := max |xj |. 1≤j ≤n

The operator norms  p defined from these p-vector norms are used quite frequently for p = 1, 2, ∞. We define for any 1 ≤ p ≤ ∞ Ap := max x =0

Axp = max Ayp . yp =1 xp

(8.17)

For p = 1, 2, ∞ we have explicit expressions for these norms. Theorem 8.3 (One-Two-Inf-Norms) For A ∈ Cm×n we have A1 := max Aej 1 = max 1≤j ≤n

m

1≤j ≤n

A2 := σ1 ,

(max column sum)

(largest singular value of A)

A∞ = max eTk A1 = max 1≤k≤m

|ak,j |,

k=1

1≤k≤m

n

j =1

|ak,j |.

(max row sum)

(8.18)

178

8 Matrix Norms and Perturbation Theory for Linear Systems

The two-norm A2 is also called the spectral norm of A. Proof We proceed as follows: (a) We derive a constant Kp such that Axp ≤ Kp for any x ∈ Cn with xp = 1. (b) We give an extremal vector y e ∈ Cn with y e p = 1 so that Ay e p = Kp . It then follows from (8.17) that Ap = Ay e p = Kp . 2-norm: Let A = U ΣV ∗ be a singular value decomposition of A, define K2 = σ1 , c := V ∗ x, and y e = v 1 the singular vector corresponding to σ1 . Then x = V c, c2 = x2 = 1, and using (7.7) in (b) we find (a) Ax22 = U ΣV ∗ x22 = Σ c22 = nj=1 σj2 |cj |2 ≤ σ12 nj=1 |cj |2 = σ12 . (b) Av 1 2 = σ1 u1 2 = σ1 . 1-norm: Define K1 , c and y e by K1 := Aec 1 = max1≤j ≤n Aej 1 and y e := ec , a unit vector. Then y e 1 = 1 and we obtain (a) Ax1 =

n m m n m

 n     akj xj  ≤ |akj ||xj | = |akj | |xj | ≤ K1 . k=1 j =1

k=1 j =1

j =1

k=1

(b) Ay e 1 = K1 . ∞-norm: Define K∞ , r and y e by K∞ := eTr A1 = max1≤k≤meTk A1 and y e := [e−iθ1 , . . . , e−iθn ]T , where arj = |arj |eiθj for j = 1, . . . , n. (a) Ax∞

n n

    = max akj xj ≤ max |akj ||xj | ≤ K∞ . 1≤k≤m

j =1

1≤k≤m

j =1

  (b) Ay ∗ ∞ = max1≤k≤m  nj=1 akj e−iθj  = K∞ .   The last equality is correct because  nj=1 akj e−iθj  ≤ nj=1 |akj | ≤ K∞ with equality for k = r.   Example 8.1 (Comparing   One-Two-Inf-Norms) The largest singular value of the 1 14 4 16 matrix A := 15 2 22 13 , is σ1 = 2 (cf. Example 7.9). We find A1 =

29 , 15

A2 = 2,

A∞ =

37 , 15

AF =

√ 5.

The values of these norms do not differ by much. In some cases the spectral norm is equal to an eigenvalue of the matrix.

8.2 Matrix Norms

179

Theorem 8.4 (Spectral Norm) Suppose A ∈ Cn×n has singular values σ1 ≥ σ2 ≥ · · · ≥ σn and eigenvalues |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn |. Then A2 = σ1 and A−1 2 =

1 , σn

(8.19)

A2 = λ1 and A−1 2 =

1 , if A is positive definite, λn

(8.20)

1 , if A is normal. |λn |

(8.21)

A2 = |λ1 | and A−1 2 =

For the norms of A−1 we assume that A is nonsingular. Proof Since 1/σn is the largest singular value of A−1 , (8.19) follows. By Exercise 7.5 the singular values of a positive definite matrix (normal matrix) are equal to the eigenvalues (absolute value of the eigenvalues). This implies (8.20) and (8.21).   The following result is sometimes useful. Theorem 8.5 (Spectral Norm Bound) For any A ∈ Cm×n we have A22 ≤ A1 A∞ . Proof Let (σ 2 , v) be an eigenpair for A∗ A corresponding to the largest singular value σ of A. Then A22 v1 = σ 2 v1 = σ 2 v1 = A∗ Av1 ≤ A∗ 1 A1 v1 . Observing that A∗ 1 = A∞ by Theorem 8.3 and canceling v1 proves the result.  

8.2.4 Unitary Invariant Matrix Norms Definition 8.7 (Unitary Invariant Norm) A matrix norm   on Cm×n is called unitary invariant if U AV  = A for any A ∈ Cm×n and any unitary matrices U ∈ Cm×m and V ∈ Cn×n . When a unitary invariant matrix norm is used, the size of a perturbation is not increased by a unitary transformation. Thus if U and V are unitary then U (A + E)V = U AV + F , where F  = E. It follows from Lemma 7.1 that the Frobenius norm is unitary invariant. We show here that this also holds for the spectral norm.

180

8 Matrix Norms and Perturbation Theory for Linear Systems

Theorem 8.6 (Unitary Invariant Norms) The Frobenius norm and the spectral norm are unitary invariant. Moreover, A∗ F = AF and A∗ 2 = A2 . Proof The results for the Frobenius norm follow from Lemma 7.1. Suppose A ∈ Cm×n and let U ∈ Cm×m and V ∈ Cn×n be unitary. Since the 2-vector norm is unitary invariant we obtain U A2 = max U Ax2 = max Ax2 = A2 . x2 =1

x2 =1

Now A and A∗ have the same nonzero singular values, and it follows from Theorem 8.3 that A∗ 2 = A2 . Moreover V ∗ is unitary. Using these facts we find AV 2 = (AV )∗ 2 = V ∗ A∗ 2 = A∗ 2 = A2 .   It can be shown that the spectral norm is the only unitary invariant operator norm, see [10] p. 357.

8.2.5 Absolute and Monotone Norms A vector norm on Cn is an absolute norm if x =  |x|  for all x ∈ Cn . Here |x| := [|x1 |, . . . , |xn |]T , the absolute values of the components of x. Clearly the vector p norms are absolute norms. We state without proof (see Theorem 5.5.10 of [10]) that a vector norm on Cn is an absolute norm if and only if it is a monotone norm, i.e., |xi | ≤ |yi |, i = 1, . . . , n ⇒ x ≤ y, for all x, y ∈ Cn . Absolute and monotone matrix norms are defined as for vector norms. The study of matrix norms will be continued in Chap. 12.

8.3 The Condition Number with Respect to Inversion Consider the system of two linear equations x2 = 20 x1 + −16 x1 + (1 − 10 )x2 = 20 − 10−15

8.3 The Condition Number with Respect to Inversion

181

whose exact solution is x1 = x2 = 10. If we replace the second equation by x1 + (1 + 10−16 )x2 = 20 − 10−15 , the exact solution changes to x1 = 30, x2 = −10. Here a small change in one of the coefficients, from 1 − 10−16 to 1 + 10−16 , changed the exact solution by a large amount. A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned. Such problems can be difficult to solve on a computer. In this section we consider what effect a small change (perturbation) in the data A,b has on the inverse of A and on the solution x of a linear system Ax = b. To measure this we use vector and matrix norms. In this section   will denote a vector norm on Cn and also a matrix norm on Cn×n . We assume that the matrix norm is consistent on Cn×n and subordinate to the vector norm. Thus, for any A, B ∈ Cn×n and any x ∈ Cn we have AB ≤ A B and Ax ≤ A x. Recall that this holds if the matrix norm is the operator norm corresponding to the given vector norm. It also holds for the Frobenius matrix norm and the Euclidian n×n then I  = 1 vector norm. This follows from Lemma √ 7.1. We recall that if I ∈ R for an operator norm, while I F = n.

8.3.1 Perturbation of the Right Hand Side in a Linear Systems Suppose x, y solve Ax = b and (A + E)y = b+e, respectively. where A, A + E ∈ Cn×n are nonsingular and b, e ∈ Cn . How large can y−x be? The difference y −x measures the absolute error in y as an approximation to x, while y − x/x and y − x/y are measures for the relative error. We consider first the simpler case of a perturbation in the right-hand side b. Theorem 8.7 (Perturbation in the Right-Hand Side) Suppose A ∈ Cn×n is nonsingular, b, e ∈ Cn , b = 0 and Ax = b, Ay = b+e. Then 1 e y − x e ≤ ≤ K(A) , K(A) b x b

(8.22)

where K(A) = A A−1  is the condition number of A. Proof Subtracting Ax = b from Ay = b+e we have A(y−x) = e or y−x = A−1 e. Combining y − x = A−1 e ≤ A−1  e and b = Ax ≤ A x we obtain the upper bound in (8.22). Combining e ≤ A y − x and x ≤ A−1  b we obtain the lower bound.  

182

8 Matrix Norms and Perturbation Theory for Linear Systems

Consider (8.22). e/b is a measure of the size of the perturbation e relative to the size of b. The upper bound says that y − x/x in the worst case can be K(A) times as large as e/b. The bounds in (8.22) depends on K(A). This number is called the condition number with respect to inversion of a matrix, or just the condition number of A, if it is clear from the context that we are talking about inverting a matrix. The condition number depends on the matrix A and on the norm used. If K(A) is large, A is called ill-conditioned (with respect to inversion). If K(A) is small, A is called well-conditioned (with respect to inversion). We always have K(A) ≥ 1. For since x = I x ≤ I x for any x we have I  ≥ 1 and therefore A A−1  ≥ AA−1  = I  ≥ 1. Since all matrix norms are equivalent, the dependence of K(A) on the norm chosen is less important than the dependence on A. Example 8.1 provided an illustration of this. See also Exercise 8.19. Sometimes one chooses the spectral norm when discussing properties of the condition number, and the 1 , ∞ , or Frobenius norm when one wishes to compute it or estimate it. Suppose we have computed an approximate solution y to Ax = b. The vector r(y) := Ay − b is called the residual vector, or just the residual. We can bound x −y in terms of r. Theorem 8.8 (Perturbation and Residual) Suppose A ∈ Cn×n , b ∈ Cn , A is nonsingular and b = 0. Let r(y) = Ay − b for y ∈ Cn . If Ax = b then y − x r(y) 1 r(y) ≤ ≤ K(A) . K(A) b x b Proof We simply take e = r(y) in Theorem 8.7.

(8.23)  

Consider next a perturbation in the coefficient matrix in a linear system. Suppose A, E ∈ Cn×n with A, A + E nonsingular. We like to compare the solution x and y of the systems Ax = b and (A + E)y = b. Theorem 8.9 (Perturbation in Matrix) Suppose A, E ∈ Cn×n , b ∈ Cn with A nonsingular and b = 0. If r := A−1 E < 1 then A + E is nonsingular. If Ax = b and (A + E)y = b then E y − x ≤ r ≤ K(A) , y A

(8.24)

r K(A) E y − x ≤ ≤ . x 1−r 1 − r A

(8.25)

Proof We show A + E singular implies r ≥ 1. Suppose A + E is singular. Then (A + E)x = 0 for some nonzero x ∈ Cn . Multiplying by A−1 it follows that (I + A−1 E)x = 0 and this implies that x = A−1 Ex ≤ rx. But then r ≥ 1. Subtracting (A + E)y = b from Ax = b gives A(x − y) = Ey or x − y = A−1 Ey. Taking norms and dividing by y proves (8.24). Solving

8.3 The Condition Number with Respect to Inversion

183

x − y = A−1 Ey for y we obtain y = (I + A−1 E)−1 x. By Theorem 12.14 we x and (8.24) implies y − x ≤ ry ≤ have y ≤ (I + A−1 E)−1 x ≤ 1−r K(A) E r   1−r x ≤ 1−r A x. Dividing by x gives (8.25). In Theorem 8.9 we gave bounds for the relative error in x as an approximation to y and the relative error in y as an approximation to x. E/A is a measure for the size of the perturbation E in A relative to the size of A. The condition number again plays a crucial role. y − x/y can be as large as K(A) times E/A. It can be shown that the upper bound can be attained for any A and any b. In deriving the upper bound we used the inequality A−1 Ey ≤ A−1  E y. For a more or less random perturbation E this is not a severe overestimate for A−1 Ey. In the situation where E is due to round-off errors (8.24) can give a fairly realistic estimate for y − x/y. The following explicit expressions for the 2-norm condition number follow from Theorem 8.4. Theorem 8.10 (Spectral Condition Number) Suppose A ∈ Cn×n is nonsingular with singular values σ1 ≥ σ2 ≥ · · · ≥ σn > 0 and eigenvalues |λ1 | ≥ |λ2 | ≥ · · · ≥ |λn | > 0. Then ⎧ ⎪ if A is positive definite, ⎪ ⎨λ1 /λn , K2 (A) = |λ1 |/|λn |, if A is normal, (8.26) ⎪ ⎪ ⎩σ /σ , in general. 1

n

It follows that A is ill-conditioned with respect to inversion if and only if σ1 /σn is large, or λ1 /λn is large when A is positive definite. If A is well-conditioned, (8.23) says that y − x/x ≈ r(y)/b. In other words, the accuracy in y is about the same order of magnitude as the residual as long as b ≈ 1. If A is ill-conditioned, anything can happen. We can for example have an accurate solution even if the residual is large.

8.3.2 Perturbation of a Square Matrix Suppose A is nonsingular and E a perturbation of A. We expect B := A + E to be nonsingular when E is small relative to A. But how small is small? It is also useful to have bounds on B −1  in terms of A−1  and the difference B −1 − A−1 . We consider the relative errors B −1 /A−1 , B −1 − A−1 /B −1  and B −1 − A−1 /A−1 .

184

8 Matrix Norms and Perturbation Theory for Linear Systems

Theorem 8.11 (Perturbation of Inverse Matrix) Suppose A ∈ Cn×n is nonsingular and let B := A + E ∈ Cn×n be nonsingular. For any consistent matrix norm   we have B −1 − A−1  E ≤ A−1 E ≤ K(A) , −1 A B 

(8.27)

where K(A) := A A−1 . If r := A−1 E < 1 then B is nonsingular and B −1  1 1 ≤ . ≤ 1+r 1−r A−1 

(8.28)

We also have B −1 − A−1  A

−1





K(A) E r ≤ . 1−r 1 − r A

(8.29)

We can replace A−1 E by EA−1  everywhere. Proof That B is nonsingular if r < 1 follows from Theorem 8.9. We have −E = A − B = A(B −1 − A−1 )B = B(B −1 − A−1 )A so that B −1 − A−1 = −A−1 EB −1 = −B −1 EA−1 .

(8.30)

Therefore, if B is nonsingular then by (8.30) B −1 − A−1  ≤ A−1 EB −1  ≤ K(A)

E −1 B . A

Dividing through by B −1  gives the upper bounds in (8.27). Next, (8.30) implies B −1  ≤ A−1  + A−1 EB −1  ≤ A−1  + rB −1 . Solving for B −1  and dividing by A−1  we obtain the upper bound in (8.28). Similarly we obtain the lower bound in (8.28) from A−1  ≤ B −1  + rB −1 . The bound in (8.29) follows by multiplying (8.27) by B −1 /A−1  and using (8.28). That we can replace A−1 E by EA−1  everywhere follows from (8.30).  

8.4 Proof That the p-Norms Are Norms

185

8.4 Proof That the p-Norms Are Norms We want to show Theorem 8.12 (The p Vector Norms Are Norms) Let for 1 ≤ p ≤ ∞ and x ∈ Cn xp :=

n 

|xj |p

1/p

,

x∞ := max |xj |. 1≤j ≤n

j =1

Then for all 1 ≤ p ≤ ∞, x, y ∈ Cn and all a ∈ C 1. xp ≥ 0 with equality if and only if x = 0. 2. axp = |a| xp . 3. x + yp ≤ xp + yp .

(positivity) (homogeneity) (subadditivity)

Positivity and homogeneity follows immediately. To show the subadditivity we need some elementary properties of convex functions. Definition 8.8 (Convex Function) Let I ⊂ R be an interval. A function f : I → R is convex if   f (1 − λ)x1 + λx2 ≤ (1 − λ)f (x1 ) + λf (x2 )

(8.31)

for all x1 , x2 ∈ I with x1 < x2 and all λ ∈ [0, 1]. The sum nj=1 λj xj is called a convex combination of x1 , . . . , xn if λj ≥ 0 for j = 1, . . . , n and nj=1 λj = 1. The convexity condition is illustrated in Fig. 8.1.

(1 − λ)f (x1 ) + λf (x2 ) f (x)

x1 Fig. 8.1 A convex function

x = (1 − λ)x1 + λ x2

x2

186

8 Matrix Norms and Perturbation Theory for Linear Systems

Lemma 8.2 (A Sufficient Condition for Convexity) If f ∈ C 2 [a, b] and f  (x) ≥ 0 for x ∈ [a, b] then f is convex. Proof We recall the formula for linear interpolation with remainder, (cf a book on numerical methods) For any a ≤ x1 ≤ x ≤ x2 ≤ b there is a c ∈ [x1 , x2 ] such that f (x) =

x2 − x x − x1 f (x1 ) + f (x2 ) + (x − x1 )(x − x2 )f  (c)/2 x2 − x1 x2 − x1

= (1 − λ)f (x1 ) + λf (x2 ) + (x2 − x1 )2 λ(λ − 1)f  (c)/2,

λ :=

x − x1 . x2 − x1

Since λ ∈ [0, 1] we have f (x) ≤ (1 − λ)f (x1 ) + λf (x2 ). Moreover, x=

x2 − x x − x1 x1 + x2 = (1 − λ)x1 + λx2 x2 − x1 x2 − x1  

so that (8.31) holds, and f is convex.

The following inequality is elementary, but can be used to prove many nontrivial inequalities. Theorem 8.13 (Jensen’s Inequality) Suppose I ∈ R is an interval and f : I → R is Then for all n ∈ N, all λ1 , . . . , λn with λj ≥ 0 for j = 1, . . . , n and convex. n λ = 1, and all z1 , . . . , zn ∈ I j j =1 f(

n

j =1

λj zj ) ≤

n

λj f (zj ).

j =1

Proof We use induction on n. The result is trivial for n = 1. Let n ≥ 2, assume the inequality holds for n − 1, and let λj , zj for j = 1, . . . , n be given as in the theorem. Since n ≥ 2 we have λi < 1 for at least one i so assume without loss of λ generality that λ1 < 1, and define u := nj=2 1−λj 1 zj . Since nj=2 λj = 1 − λ1 this is a convex combination of n − 1 terms and the induction hypothesis implies λj that f (u) ≤ nj=2 1−λ f (zj ). But then by the convexity of f 1 f

n 



λj zj = f (λ1 z1 + (1 − λ1 )u) ≤ λ1 f (z1 ) + (1 − λ1 )f (u) ≤

j =1

and the inequality holds for n.

n

λj f (zj )

j =1

 

8.4 Proof That the p-Norms Are Norms

187

Corollary 8.1 (Weighted Geometric/Arithmetic Mean Inequality) Suppose n λ a is a convex combination of nonnegative numbers a1 , . . . , an . Then j j j =1 a1λ1 a2λ2 · · · anλn ≤

n

λj aj ,

(8.32)

j =1

where 00 := 0. Proof The result is trivial if one or more of the aj ’s are zero so assume aj > 0 for all j . Consider the function f : (0, ∞) → R given by f (x) = − log x. Since f  (x) = 1/x 2 > 0 for x ∈ (0, ∞), it follows from Lemma 8.2 that this function is convex. By Jensen’s inequality − log

n 

n

   λj aj ≤ − λj log(aj ) = − log a1λ1 · · · anλn

j =1

j =1

   n  or log a1λ1 · · · anλn ≤ log j =1 λj aj . The inequality follows since exp(log x) = x for x > 0 and the exponential function is monotone increasing.   Taking λj = n1 for all j in (8.32) we obtain the classical geometric/arithmetic mean inequality 1

(a1 a2 · · · an ) n ≤

1 aj . n n

(8.33)

j =1

Corollary 8.2 (Hölder’s Inequality) For x, y ∈ Cn and 1 ≤ p ≤ ∞ n

|xj yj | ≤ xp yq , where

j =1

1 1 + = 1. p q

Proof We leave the proof for p = 1 and p = ∞ as an exercise so assume 1 < p < ∞. For any a, b ≥ 0 the weighted arithmetic/geometric mean inequality implies that 1

1

ap bq ≤

1 1 1 1 a + b, where + = 1. p q p q

(8.34)

If x = 0 or y = 0 there is nothing to prove so assume that both x and y are nonzero. Using 8.34 on each term in the middle sum we obtain n n n

|xj |p p |yj |q q 1 1 |xj |p 1 |yj |q =1 |xj yj | = ≤ + p q xp yq p xpp q yqq xp yq 1

j =1

j =1

and the proof of the inequality is complete.

1

j =1

 

188

8 Matrix Norms and Perturbation Theory for Linear Systems

Corollary 8.3 (Minkowski’s Inequality) For x, y ∈ Cn and 1 ≤ p ≤ ∞ x + yp ≤ xp + yp . Proof We leave the proof for p = 1 and p = ∞ as an exercise so assume 1 < p < ∞. We write p

x + yp =

n

|xj + yj |p ≤

j =1

n

|xj ||xj + yj |p−1 +

j =1

n

|yj ||xj + yj |p−1 . j =1

We apply Hölder’s inequality with exponent p and q to each sum. In view of the relation (p − 1)q = p the result is p

p/q

x + yp ≤ xp x + yp

p/q

+ yp x + yp

p−1

= (xp + yp )x + yp

,

 

and canceling the common factor, the inequality follows.

8.4.1 p-Norms and Inner Product Norms It is possible to characterize the p-norms that are derived from an inner product. We start with the following identity. Theorem 8.14 (Parallelogram Identity) For all x, y in a real or complex inner product space x + y2 + x − y2 = 2x2 + 2y2 ,

(8.35)

where   is the inner product norm in the space. Proof We set a = ±1 in (5.5) and add the two equations.

 

Theorem 8.15 (When Is a Norm an Inner Product Norm?) To a given norm on a real or complex vector space V there exists an inner product on V such that x, x = x2 if and only if the parallelogram identity (8.35) holds for all x, y ∈ V. Proof If x, x = x2 then x + y2 + x − y2 = x + y, x + y + x − y, x − y = 2x2 + 2y2 and the parallelogram identity holds. For the converse we prove the real case and leave the complex case as an exercise. Suppose (8.35) holds for all x, y in the real vector space V. We show that x, y :=

 1 x + y2 − x − y2 , 4

x, y ∈ V

(8.36)

8.4 Proof That the p-Norms Are Norms

189

defines an inner product on V. Clearly 1. and 2. in Definition 5.1 hold. The hard part is to show 3. We need to show that x, z + y, z = x + y, z, ax, y = ax, y,

x, y, z ∈ V,

(8.37)

x, y ∈ V.

(8.38)

a ∈ R,

Now (8.36)

4x, z + 4y, z = x + z2 − x − z2 + y + z2 − y − z2   x +y x −y 2 x + y y −x 2 +  − z− +  = z+ 2 2 2 2   x +y x −y 2 x +y y −x 2 −  − z− −  + z+ 2 2 2 2 x +y 2 x−y 2 x +y 2 y−x 2 (8.35)  + 2  − 2z −  − 2  = 2z + 2 2 2 2 x+y (8.36) , z, = 8 2 or x, z + y, z = 2

x+y , z, 2

x, y, z ∈ V.

In particular, since y = 0 implies y, z = 0 we obtain x, z = 2 x2 , z for all x, z ∈ V. This means that 2 x+y 2 , z = x + y, z for all x, y, z ∈ V and (8.37) follows. We first show (8.38) when a = n is a positive integer. By induction (8.37)

nx, y = (n − 1)x + x, y = (n − 1)x, y + x, y = nx, y.

(8.39)

If m, n ∈ N then m2 

n (8.39) (8.39) x, y = mnx, y = mnx, y, m

implying that (8.38) holds for positive rational numbers 

n n x, y = x, y. m m

Now if a > 0 there is a sequence {an } of positive rational numbers converging to a. For each n (8.36)

an x, y = an x, y =

 1 an x + y2 − an x − y2 . 4

190

8 Matrix Norms and Perturbation Theory for Linear Systems

Taking limits and using continuity of norms we obtain ax, y = ax, y. This also holds for a = 0. Finally, if a < 0 then (−a) > 0 and from what we just showed (8.36)

(−a)x, y = (−a)x, y =

 1 −ax + y2 − −ax − y2 = −ax, y, 4  

so (8.38) also holds for negative a.

Corollary 8.4 (Are the p-Norms Inner Product Norms?) For the p vector norms on V = Rn or V = Cn , 1 ≤ p ≤ ∞, n ≥ 2, there is an inner product on V such that x, x = x2p for all x ∈ V if and only if p = 2. Proof For p = 2 the p-norm is the Euclidian norm which corresponds to the standard inner product. If p = 2 then the parallelogram identity (8.35) does not hold for say x := e1 and y := e2 .  

8.5 Exercises Chap. 8 8.5.1 Exercises Sect. 8.1 Exercise 8.1 (An A-Norm Inequality (Exam Exercise 1982-4)) Given a symmetric positive definite matrix A ∈ Rn×n with eigenvalues 0 < λn ≤ · · · ≤ λ1 . Show that / λ1 y2 , xA ≤ yA ⇒ x2 ≤ λn where xA :=



x T Ax,

x ∈ Rn .

Exercise 8.2 (A Orthogonal Bases (Exam Exercise 1995-4)) Let A ∈ Rn×n be a symmetric and positive definite matrix and assume b1 , . . . , bn is a basis for Rn . We define B k := [b1 , . . . , bk ] ∈ Rn×k for k = 1, . . . , n. We consider in this exercise the inner product ·, · defined by x, y := x T Ay for x, y ∈ Rn and the corresponding norm xA := x, x1/2 . We define b˜ 1 := b1 and  −1 b˜ k := bk − B k−1 B Tk−1 AB k−1 B Tk−1 Abk ,

k = 2, . . . , n.

a) Show that B Tk AB k is positive definite for k = 1, . . . , n. b) Show that for k = 2, . . . , n we have (i) b˜ k , bj  = 0 for j = 1, . . . , k − 1 and (ii) b˜ k − bk ∈ span(b1 , . . . , b k−1 ).

8.5 Exercises Chap. 8

191

c) Explain why b˜ 1 , . . . , b˜ n is a basis for Rn which in addition is A-orthogonal, i.e., b˜ i , b˜ j  = 0 for all i, j ≤ n, i = j . d) Define B˜ n := [b˜ 1 , . . . , b˜ n ]. Show that there is an upper triangular matrix T ∈ Rn×n with ones on the diagonal and satisfies B n = B˜ n T . e) Assume that the matrix T in d) is such that |tij | ≤ 12 for all i, j ≤ n, i = j . Assume also that b˜ k 2A ≤ 2b˜ k+1 2A for k = 1, . . . , n−1 and that det(B n ) = 1. Show that then2  b1 A b2 A · · · bn A ≤ 2n(n−1)/4 det(A).

8.5.2 Exercises Sect. 8.2 Exercise 8.3 (Consistency of Sum Norm?) Show that the sum norm is consistent. Exercise 8.4 (Consistency of Max Norm?) Show that the max norm is not consistent by considering 11 11 . Exercise 8.5 (Consistency of Modified Max Norm) a) Show that the norm A :=

√ mnAM ,

A ∈ Cm×n

is a consistent matrix norm. √ b) Show that the constant mn can be replaced by m and by n. Exercise 8.6 (What Is the Sum Norm Subordinate to?) Show that the sum norm is subordinate to the l1 -norm. Exercise 8.7 (What Is the Max Norm Subordinate to?) a) Show that the max norm is subordinate to the ∞ and 1 norm, i.e., Ax∞ ≤ AM x1 holds for all A ∈ Cm×n and all x ∈ Cn . b) Show that if AM = |akl |, then Ael ∞ = AM el 1 . ∞ c) Show that AM = maxx =0 Ax x1 . Exercise 8.8 (Spectral Norm) Let m, n ∈ N and A ∈ Cm×n . Show that A2 =

2 Hint:

max

x2 =y2 =1

Show that b˜ 1 2A · · · b˜ n 2A = det(A).

|y ∗ Ax|.

192

8 Matrix Norms and Perturbation Theory for Linear Systems

Exercise 8.9 (Spectral Norm of the Inverse) Suppose A ∈ Cn×n is nonsingular. Show that Ax2 ≥ σn for all x ∈ Cn with x2 = 1. Show that A−1 2 = max x =0

x2 . Ax2

Exercise 8.10 (p-Norm Example) Let  A=

 2 −1 . −1 2

Compute Ap and A−1 p for p = 1, 2, ∞. Exercise 8.11 (Unitary Invariance of the Spectral Norm) Show that V A2 = A2 holds even for a rectangular V as long as V ∗ V = I . Exercise 8.12 (AU 2 Rectangular A) Find A ∈ R2×2 and U ∈ R2×1 with U T U = I such that AU 2 < A2 . Thus, in general, AU 2 = A2 does not hold for a rectangular U even if U ∗ U = I . Exercise 8.13 (p-Norm of Diagonal Matrix) Show that Ap = ρ(A) := max |λi | (the largest eigenvalue of A), 1 ≤ p ≤ ∞, when A is a diagonal matrix. Exercise 8.14 (Spectral Norm of a Column Vector) A vector a ∈ Cm can also be considered as a matrix A ∈ Cm,1 . a) Show that the spectral matrix norm (2-norm) of A equals the Euclidean vector norm of a. b) Show that Ap = ap for 1 ≤ p ≤ ∞. Exercise 8.15 (Norm of Absolute Value Matrix) If A ∈ Cm×n has elements aij , let |A| ∈ Rm×n be the matrix with elements |aij |.   √ 1+i −2 a) Compute |A| if A = , i = −1. 1 1−i b) Show that for any A ∈ Cm×n AF =  |A| F , Ap =  |A| p for p = 1, ∞. c) Show that for any A ∈ Cm×n A2 ≤  |A| 2 . d) Find a real symmetric 2 × 2 matrix A such that A2 <  |A| 2 . Exercise 8.16 (An Iterative Method (Exam Exercise 2017-3)) Assume that A ∈ Cn×n is non-singular and nondefective (the eigenvectors of A form a basis for Cn ). We wish to solve Ax = b. Assume that we have a list of the eigenvalues {λ1 , λ2 , . . . , λm }, in no particular order. We have that m ≤ n, since some of the eigenvalues may have multiplicity larger than one. Given x 0 ∈ Cn , and k ≥ 0, we

8.5 Exercises Chap. 8

193

define the sequence {x k }m−1 k=0 by x k+1 = x k +

1 λk+1

r k , where r k = b − Ax k .

a) Let the coefficients cik be defined by rk =

n

cik ui ,

i=1

where {(σi , ui )}ni=1 are the eigenpairs of A. Show that ci,k+1 =

⎧ ⎨0 ⎩ci,k



 if σi = λk+1 , 1 − λk+1 otherwise. σi

b) Show that for some l ≤ m, we have that x l = x l+1 = · · · = x m = x, where Ax = b. c) Consider this iteration for the n × n matrix T = tridiag(c, d, c), where d and c are positive real numbers and d > 2c. The eigenvalues of T are λj = d + 2c cos

jπ , n+1

j = 1, . . . , n.

What is the operation count for solving T x = b using the iterative algorithm above? d) Let now B be a symmetric n × n matrix which is zero on the “tridiagonal”, i.e., bij = 0 if |i − j | ≤ 1. Set A = T + B, where T is the tridiagonal matrix above. We wish to solve Ax = b by the iterative scheme T x k+1 = b − Bx k .

(8.40)

Recall that if E ∈ Rn×n has eigenvalues λ1 , . . . , λn then ρ(E) := maxi |λi | is the spectral radius of E. Show that ρ(T −1 B) ≤ ρ(T −1 )ρ(B). e) Show that the iteration (8.40) will converge if3 ⎧ ⎨

⎫ n n ⎬

min max |bij |, max |bij | < d − 2c. ⎩ i ⎭ j j =1

3 Hint:

use Gershgorin’s theorem.

i=1

194

8 Matrix Norms and Perturbation Theory for Linear Systems

8.5.3 Exercises Sect. 8.3 Exercise 8.17 (Perturbed Linear Equation (Exam Exercise 1981-2)) Given the systems Ax = b, Ay = b + e, where   1.1 1 A := , 1 1

    2.1 b , b := 1 = b2 2.0

  e e := 1 , e2

e2 = 0.1.

We define δ := x − y2 /x2 . a) Determine K2 (A) = A2 A−1 2 . Give an upper bound and a positive lower bound for δ without computing x and y. b) Suppose as before that b2 = 2.0 and e2 = 0.1. Determine b1 and e which maximize δ. Exercise 8.18 (Sharpness of Perturbation Bounds) The upper and lower bounds for y − x/x given by (8.22) can be attained for any matrix A, but only for special choices of b. Suppose y A and y A−1 are vectors with y A  = y A−1  = 1 and A = Ay A  and A−1  = A−1 y A−1 . a) Show that the upper bound in (8.22) is attained if b = Ay A and e = y A−1 . b) Show that the lower bound is attained if b = y A−1 and e = Ay A . Exercise 8.19 (Condition Number of 2. Derivative Matrix) In this exercise we will show that for m ≥ 1 1 4 (m + 1)2 − 2/3 < condp (T ) ≤ (m + 1)2 , π2 2

p = 1, 2, ∞,

(8.41)

where T := tridiag(−1, 2, −1) ∈ Rm×m and condp (T ) := T p T −1 p is the pnorm condition number of T . The p matrix norm is given by (8.17). You will need the explicit inverse of T given by (2.39) and the eigenvalues given in Lemma 2.2. As usual we define h := 1/(m + 1). a) Show that for m ≥ 3 1 cond1 (T ) = cond∞ (T ) = 2

,

m odd, h−2 , −2 h − 1, m even.

and that cond1 (T ) = cond∞ (T ) = 3 for m = 2. b) Show that for p = 2 and m ≥ 1 we have cond2 (T ) = cot2

 πh  2

= 1/ tan2

 πh  . 2

(8.42)

8.5 Exercises Chap. 8

195

c) Show the bounds 4 4 −2 2 h − < cond2 (T ) < 2 h−2 . 2 π 3 π

(8.43)

Hint: For the upper bound use the inequality tan x > x valid for 0 < x < π/2. For the lower bound we use (without proof) the inequality cot2 x > x12 − 23 for x > 0. d) Show (8.41). Exercise 8.20 (Perturbation of the Identity Matrix) Let E be a square matrix. a) Show that if I − E is nonsingular then (I − E)−1 − I  ≤ E (I − E)−1  b) If E < 1 then (I − E)−1 is nonsingular by exists and 1 1 ≤ (I − E)−1  ≤ 1 + E 1 − E Show the lower bound. Show the upper bound if I  = 1. In general for a consistent matrix norm (i.e., the Frobenius norm) the upper bound follows from Theorem 12.14 using Neumann series. c) Show that if E < 1 then (I − E)−1 − I  ≤

E . 1 − E

Exercise 8.21 (Lower Bounds in (8.27) and (8.29)) a) Solve for E in (8.30) and show that K(B)−1

B −1 − A−1  E ≤ . A B −1 

b) Show using a) and (8.28) that B −1 − A−1  K(B)−1 E ≤ . 1 + r A A−1  Exercise 8.22 (Periodic Spline Interpolation (Exam Exercise 1993-2)) Let the components of x = [x0, . . . , xn ]T ∈ Rn+1 define a partition of the interval [a, b], a = x0 < x1 < · · · < xn = b,

196

8 Matrix Norms and Perturbation Theory for Linear Systems

and given a dataset y := [y0 , . . . , yn ]T ∈ Rn+1 , where we assume y0 = yn . The periodic cubic spline interpolation problem is defined by finding a cubic spline function g satisfying the conditions g(xi ) = yi ,

i = 0, 1, . . . , n,

g  (a) = g  (b),

g  (a) = g  (b).

(Recall that g is a cubic polynomial on each interval (xi−1 , xi ), for i = 1, . . . , n with smoothness C 2 [a, b].) We define si := g  (xi ), i = 0, . . . , n. It can be shown that the vector s := [s1 , . . . , sn ]T is determined from a linear system As = b,

(8.44)

where b ∈ Rn is a given vector determined by x and y. The matrix A ∈ Rn×n is given by ⎡

⎤ · · · 0 λ1 ⎥ . 2 μ2 . . 0 ⎥ ⎥ . ⎥ .. .. .. .. . . . . .. ⎥ ⎥, ⎥ .. .. .. .. . . . . 0 ⎥ ⎥ ⎥ .. . λn−1 2 μn−1 ⎦ 0 · · · 0 λn 2

2 μ1 0

⎢ ⎢ λ2 ⎢ ⎢ ⎢0 A := ⎢ ⎢ . ⎢ .. ⎢ ⎢ ⎣0 μn where λi :=

hi , hi−1 + hi

μi :=

hi−1 , hi−1 + hi

, i = 1, . . . , n,

and hi = xi+1 − xi ,

i = 0, . . . , n − 1, and hn = h0 .

You shall not argue or prove the system (8.44). Throughout this exercise we assume that 1 hi ≤ ≤ 2, 2 hi−1

i = 1, . . . , n.

8.5 Exercises Chap. 8

197

a) Show that A∞ = 3

A1 ≤

and that

10 . 3

b) Show that A−1 ∞ ≤ 1. c) Show that A−1 1 ≤ 32 . d) Let s and b be as in (8.44), where we assume b = 0. Let e ∈ Rn be such that ep /bp ≤ 0.01. Suppose sˆ satisfies Aˆs = b + e. Give estimates for ˆs − s∞ s∞

and

ˆs − s1 . s1

Exercise 8.23 (LSQ MATLAB Program (Exam Exercise 2013-4)) Suppose A ∈ Rm×n , b ∈ Rm , where A has rank n and let A = U Σ V T be a singular value factorization of A. Thus U ∈ Rm×n and Σ, V ∈ Rn×n . Write a MATLAB function [x,K]=lsq(A,b) that uses the singular value factorization of A to calculate a least squares solution x = V Σ −1 U T b to the system Ax = b and the spectral (2-norm) condition number of A. The MATLAB command [U,Sigma,V]=svd(A,0) computes the singular value factorization of A.

8.5.4 Exercises Sect. 8.4 Exercise 8.24 (When Is a Complex Norm an Inner Product Norm?) Given a vector norm in a complex vector space V, and suppose (8.35) holds for all x, y ∈ V. Show that  1 (8.45) x + y2 − x − y2 + ix + iy2 − ix − iy2 , 4 √ defines an inner product on V, where i = −1. The identity (8.45) is called the polarization identity.4 x, y :=

Exercise 8.25 (p Norm for p = 1 and p = ∞) Show that ·p is a vector norm in Rn for p = 1, p = ∞.

4 Hint:

We have x, y = s(x, y) + is(x, iy), where s(x, y) :=

1 4



 x + y2 − x − y2 .

198

8 Matrix Norms and Perturbation Theory for Linear Systems

Exercise 8.26 (The p-Norm Unit Sphere) The set Sp = {x ∈ Rn : xp = 1} is called the unit sphere in Rn with respect to p. Draw Sp for p = 1, 2, ∞ for n = 2. Exercise 8.27 (Sharpness of p-Norm Inequality) For p ≥ 1, and any x ∈ Cn we have x∞ ≤ xp ≤ n1/p x∞ (cf. (8.5)). Produce a vector x l such that x l ∞ = x l p and another vector x u such that x u p = n1/p x u ∞ . Thus, these inequalities are sharp. Exercise 8.28 (p-Norm Inequalities for Arbitrary p) If 1 ≤ q ≤ p ≤ ∞ then xp ≤ xq ≤ n1/q−1/p xp ,

x ∈ Cn .

Hint: For the rightmost inequality use Jensen’s inequality Cf. Theorem 8.13 with f (z) = zp/q and zi = |xi |q . For the left inequality consider first yi = xi /x∞ , i = 1, 2, . . . , n.

8.6 Review Questions 8.6.1 • • • • • • •

What is a consistent matrix norm? what is a subordinate matrix norm? is an operator norm consistent? why is the Frobenius norm not an operator norm? what is the spectral norm of a matrix? how do we compute A∞ ? what is the spectral condition number of a symmetric positive definite matrix?

8.6.2 Does there exist a vector norm   such that Ax ≤ AF x for all A ∈ Cn×n , x ∈ Cn , m, n ∈ N? 8.6.3 Why is A2 ≤ AF for any matrix A? 8.6.4 What is the spectral norm of the inverse of a normal matrix?

Chapter 9

Least Squares

Consider the linear system Ax = b of m equations in n unknowns. It is overdetermined, if m > n, square, if m = n, and underdetermined, if m < n. In either case the system can only be solved approximately if b ∈ / R(A), the column space of A. One way to solve Ax = b approximately is to select a vector norm ·, say a p-norm, and look for x ∈ Cn which minimizes Ax − b. The use of the one and ∞ norm can be formulated as linear programming problems, while the Euclidian norm leads to a linear system and has applications in statistics. Only this norm is considered here. Definition 9.1 (Least Squares Problem (LSQ)) Suppose m, n ∈ N, A ∈ Cm×n and b ∈ Cm . To find x ∈ Cn that minimizes E : Cn → R given by E(x) := Ax − b22 , is called the least squares problem. A minimizer x is called a least squares solution. √ Since the square root function is monotone, minimizing E(x) or E(x) is equivalent. Example 9.1 (Average) Consider an overdetermined linear system of 3 equations in one unknown x1 = 1 x1 = 1, x1 = 2

⎡ ⎤ 1 A = ⎣1⎦ ,

x = [x1 ],

1

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_9

⎡ ⎤ 1 b = ⎣1⎦ . 2

199

200

9 Least Squares

To solve this as a least squares problem we compute Ax − b22 = (x1 − 1)2 + (x1 − 1)2 + (x1 − 2)2 = 3x12 − 8x1 + 6. Setting the first derivative with respect to x1 equal to zero we obtain 6x1 − 8 = 0 or x1 = 4/3, the average of b1 , b2 , b3 . The second derivative is positive and x1 = 4/3 is a global minimum. We will show below the following results, valid for any m, n ∈ N, A ∈ Cm×n and b ∈ Cn . Theorem 9.1 (Existence) The least squares problem always has a solution. Theorem 9.2 (Uniqueness) The solution of the least squares problem is unique if and only if A has linearly independent columns. Theorem 9.3 (Characterization) x ∈ Cn is a solution of the least squares problem if and only if A∗ Ax = A∗ b. The linear system A∗ Ax = A∗ b is known as the normal equations. By Lemma 4.2 the coefficient matrix A∗ A is symmetric and positive semidefinite, and it is positive definite if and only if A has linearly independent columns. This is the same condition which guarantees that the least squares problem has a unique solution.

9.1 Examples Example 9.2 (Linear Regression) We want to fit a straight line p(t) = x1 + x2 t to m ≥ 2 given data (tk , yk ) ∈ R2 , k = 1, . . . , m. This is part of the linear regression process in statistics. We obtain the linear system ⎡ ⎤ ⎤ ⎡ ⎤ 1 t1   p(t1 ) y1 ⎢ .. ⎥ ⎢ .. ⎥ x1 ⎢ .. ⎥ Ax = ⎣ . ⎦ = ⎣ . ⎦ = ⎣ . ⎦ = b. x2 1 tm ym p(tm ) ⎡

This is square for m = 2 and overdetermined for m > 2. The matrix A has linearly independent columns if and only if the set {t1 , . . . , tm } of sites contains at least two distinct elements. For if say ti = tj then ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ 1 t1 0      0 1 ti c1 ⎢ .. ⎥ ⎢ .. ⎥ ⎢ .. ⎥ = ⇒ c1 = c2 = 0. c1 ⎣ . ⎦ + c2 ⎣ . ⎦ = ⎣ . ⎦ ⇒ 0 1 tj c2 tm 1 0

9.1 Examples

201

Conversely, if t1 = · · · = tm then the columns of A are linearly dependent. The normal equations are ⎡ ⎤  1 t1       m t x1 1 · · · 1 ⎢ . ⎥ x1 = k2 , A∗ Ax = ⎣ .. ⎦ x2 tk tk x2 t1 · · · tm 1 tm ⎡ ⎤     y1 yk 1 ··· 1 ⎢ . ⎥ = = A∗ b, ⎣ .. ⎦ = tk yk t1 · · · tm ym where k ranges from 1 to m in the sums. By what we showed the coefficient matrix is positive semidefinite and positive definite if we have at least two distinct cites. If m = 2 and t1 = t2 then both systems Ax = b and A∗ Ax = A∗ b are square, and p is the linear interpolant to the data. Indeed, p is linear and p(tk ) = yk , k = 1, 2. With the data t 1.0 2.0 3.0 4.0 y 3.1 1.8 1.0 0.1     6 4 10 x1 = . The data and the least 10.1 10 30 x2 squares polynomial p(t) = x1 + x2 t = 3.95 − 0.98t are shown in Fig. 9.1. 

the normal equations become

Example 9.3 (Input/Output Model) Suppose we have a simple input/output model. To every input u ∈ Rn we obtain an output y ∈ R. Assuming we have a linear relation y = u∗ x =

n

ui x i ,

i=1

Fig. 9.1 A least squares fit to data

y

6 @

@ × @

@

@

@ ×

@ @ × @

@ @ × @

- t

202

9 Least Squares

between u and y, how can we determine x? Performing m ≥ n experiments we obtain a table of values u u1 u2 · · · um . y y1 y2 · · · ym We would like to find x such that ⎡

u∗1 ⎢ u∗ ⎢ 2 Ax = ⎢ . ⎣ ..





⎤ y1 ⎥ ⎢ y2 ⎥ ⎥ ⎢ ⎥ ⎥ x = ⎢ . ⎥ = b. ⎦ ⎣ .. ⎦

u∗m

ym

We can estimate x by solving the least squares problem minAx − b22 .

9.1.1 Curve Fitting Given • • • •

size: 1 ≤ n ≤ m, sites: S := {t1 , t2 , . . . , tm } ⊂ [a, b], y-values: y = [y1 , y2 , . . . , ym ]∗ ∈ Rm , functions: φj : [a, b] → R, j = 1, . . . , n.

Find a function (curve fit) p : [a, b] → R given by p := nj=1 xj φj such that p(tk ) ≈ yk for k = 1, . . . , m. A solution to the curve fitting problem is found by finding an approximate solution to the following overdetermined set of linear equations ⎤ ⎡ ⎤⎡ ⎤ ⎡ ⎤ φ1 (t1 ) · · · φn (t1 ) x1 y1 p(t1 ) ⎥ ⎢ ⎥ ⎢ ⎢ .. ⎥ ⎢ .. . . .. ⎦ ⎣ .. ⎦ = ⎣ ... ⎥ Ax = ⎣ . ⎦ = ⎣ . ⎦ =: b. xn ym p(tm ) φ1 (tm ) · · · φn (tm ) ⎡

(9.1)

We propose to find x ∈ Rn as a solution of the corresponding least squares problem given by E(x) := Ax − b22 =

m n

 k=1

j =1

2 xj φj (tk ) − yk .

(9.2)

9.1 Examples

203

Typical examples of functions φj are polynomials, trigonometric functions, exponential functions, or splines. In (9.2) one can also include weights wk > 0 for k = 1, . . . , m and minimize E(x) :=

m

k=1

wk

n 

2 xj φj (tk ) − yk .

j =1

If yk is an accurate observation, we can choose a large weight wk . This will force p(tk ) − yk to be small. Similarly, a small wk will allow p(tk ) − yk to be large. If an estimate for the standard deviation δyk in yk is known for each k, we can choose wk = 1/(δyk )2 , k = 1, 2, . . . , m. For simplicity we will assume in the following that wk = 1 for all k. Lemma 9.1 (Curve Fitting) Let A be given by (9.1). The matrix A∗ A is symmetric positive definite if and only if p(tk ) :=

n

xj φj (tk ) = 0,

k = 1, . . . , m ⇒ x1 = · · · = xn = 0.

(9.3)

j =1

definite if and only if A has linearly Proof By Lemma 4.2 A∗ A is positive n independent columns. Since (Ax)k = j =1 xj φj (tk ), k = 1, . . . , m this is equivalent to (9.3).   Example 9.4 (Ill Conditioning and the Hilbert Matrix) The normal equations can be extremely ill-conditioned. Consider the curve fitting problem using the polynomials φj (t) := t j −1 , for j = 1, . . . , n and equidistant sites tk = (k − 1)/(m − 1) for k = 1, . . . , m. The normal equations are B n x = cn , where for n = 3 ⎤ 2⎤⎡ ⎤ ⎡ t x1 m t y k2 k3 k B 3 x := ⎣ tk t t ⎦ ⎣ x2 ⎦ = ⎣ tk yk ⎦ . 2 k3 k4 2 tk tk tk tk yk x3 ⎡

B n is positive definite if at least n of the t’s are distinct. However B n is extremely ill-conditioned even for moderate n. Indeed, m1 B n ≈ H n , where H n ∈ Rn×n is the Hilbert Matrix with i, j element 1/(i + j − 1). Thus, for n = 3 ⎡

1

H 3 = ⎣ 21 1 3

1 2 1 3 1 4

1⎤ 3 1⎦ . 4 1 5

204

9 Least Squares

The elements of m1 B n are Riemann sums approximations to the elements of H n . In fact, if B n = [bi,j ]ni,j =1 then m m 1 1 i+j −2 1 k − 1 i+j −2 bi,j = tk = m m m m−1  ≈ 0

k=1

1

x i+j −2 dx =

k=1

1 = hi,j . i+j −1

The elements of H −1 n are determined in Exercise 1.13. We find K1 (H 6 ) := H 6 1 H −1  ≈ 3 · 107 . It appears that m1 B n and hence B n is ill-conditioned 6 1 for moderate n at least if m is large. The cure for this problem is to use a different basis for polynomials. Orthogonal polynomials are an excellent choice. Another possibility is to use the shifted power basis (t − t˜)j −1 , j = 1, . . . , n, for a suitable t˜.

9.2 Geometric Least Squares Theory The least squares problem can be studied as a quadratic minimization problem. In the real case we have E(x) := Ax − b22 = (Ax − b)∗ (Ax − b) = x ∗ A∗ Ax − 2x ∗ A∗ b + b∗ b. Minimization of a quadratic function like E(x) will be considered in Chap. 13. Here we consider a geometric approach based on orthogonal sums of subspaces, cf. Sect. 5.1.3. With the usual inner product x, y = y ∗ x, orthogonal sums and projections we can prove the existence, uniqueness and characterization theorems for least squares problems. For A ∈ Cm×n we consider the column space S := R(A) of A and the null space T := N (A∗ ) of A∗ . These are subspaces of Cm and by Theorem 7.3 we have the orthogonal sum ⊥

Cm = R(A)⊕N (A∗ ).

(9.4)

Proof of Theorem 9.1 It follows from (9.4) that any b ∈ Cm can be decomposed uniquely as b = b1 + b2 , where b1 is the orthogonal projection of b into R(A) and b2 is the orthogonal projection of b into N (A∗ ). Suppose x ∈ Cn . Clearly Ax −b1 ∈ R(A) since it is a subspace and b2 ∈ N (A∗ ). But then Ax −b1 , b2  = 0 and by Pythagoras Ax − b22 = (Ax − b1 ) − b2 22 = Ax − b1 22 + b2 22 ≥ b2 22

9.3 Numerical Solution

205

with equality if and only if Ax = b1 . It follows that the set of all least squares solutions is {x ∈ Cn : Ax = b1 }.

(9.5)

This set is nonempty since b1 ∈ R(A).

 

Proof of Theorem 9.2 The set (9.5) contains exactly one element if and only if A has linearly independent columns.   Proof of Theorem 9.3 If x solves the least squares problem then Ax − b1 = 0 and it follows that A∗ (Ax − b) = A∗ (Ax − b1 ) = 0 since b2 ∈ N (A∗ ). This shows that the normal equations hold. Conversely, if A∗ Ax = A∗ b then A∗ b2 = 0 implies that A∗ (Ax − b1 ) = 0. But then Ax − b1 ∈ R(A) ∩ N (A∗ ) showing that Ax − b1 = 0, and x is a least squares solution.  

9.3 Numerical Solution We assume that m ≥ n, A ∈ Cm×n , b ∈ Cm . We consider numerical methods based on normal equations, QR factorization, or Singular Value Factorization. For more see [2]. We discuss the first two approaches in this section. Another possibility is to use an iterative method like the conjugate gradient method (cf. Exercise 13.10).

9.3.1 Normal Equations We assume that rank(A) = n, i.e., A has linearly independent columns. The coefficient matrix B := A∗ A in the normal equations is positive definite, and we can solve these equations using the Cholesky factorization of B. Consider forming the normal equations. We can use either a column oriented (inner product)- or a row oriented (outer product) approach. m 1. inner product: (A∗ A)i,j = k=1 a k,i ak,j , i, j = 1, . . . , n, (A∗ b)i = m a b , i = 1, . . . , n, k=1 ⎡ k,i ⎤k ⎡ ⎤ a k,1 a k,1 m ⎢ . ⎥ ⎢ . ⎥ ∗ 2. outer product: A∗ A = m k=1 ⎣ .. ⎦ [ak1 · · · akn ], A b = k=1 ⎣ .. ⎦ bk . a k,n

a k,n

The outer product form is suitable for large problems since it uses only one pass through the data importing one row of A at a time from some separate storage. Consider the number of operations to find the least squares solution for real data. We need 2m arithmetic operations for each inner product. Since B is symmetric we only need to compute n(n + 1)/2 such inner products. It follows that B can be computed in approximately mn2 arithmetic operations. In conclusion the number

206

9 Least Squares

of operations are mn2 to find B, 2mn to find c := A∗ b, n3 /3 to find L such that B = LL∗ , n2 to solve Ly = c and n2 to solve L∗ x = y. If m ≈ n it takes 43 n3 = 2Gn arithmetic operations. If m is much bigger than n the number of operations is approximately mn2 , the work to compute B. Conditioning of A can be a problem with the normal equation approach. We have Theorem 9.4 (Spectral Condition Number of A∗ A) Suppose 1 ≤ n ≤ m and that A ∈ Cm×n has linearly independent columns. Then K2 (A∗ A) := A∗ A2 (A∗ A)−1 2 =

σ2 λ1 = 12 = K2 (A)2 , λn σn

(9.6)

where λ1 ≥ · · · ≥ λn > 0 are the eigenvalues of A∗ A, and σ1 ≥ · · · ≥ σn > 0 are the singular values of A. Proof Since A∗ A is Hermitian it follows from Theorem 8.10 that K2 (A) = ∗

K2 (A A) =

λ1 λn .

But λi =

σi2

by Theorem 7.2 and the proof is complete.

σ1 σn

and  

It follows from Theorem 9.4 that the 2-norm condition number of B := A∗ A is the square of the condition number of A and therefore can be quite large even if A is only mildly ill-conditioned. Another difficulty which can be encountered is that the computed A∗ A might not be positive definite. See Problem 9.21 for an example.

9.3.2 QR Factorization The QR factorization can be used to solve the least squares problem. We assume that rank(A) = n, i.e., A has linearly independent columns. Suppose A = Q1 R 1 is a QR factorization of A. Since Q1 ∈ Cm×n has orthonormal columns we find A∗ A = R ∗1 Q∗1 Q1 R 1 = R ∗1 R 1 ,

A∗ b = R ∗1 Q∗1 b.

Since A has rank n the matrix R ∗1 is nonsingular and can be canceled. Thus A∗ Ax = A∗ b ⇒ R 1 x = c1 ,

c1 := Q∗1 b.

We can use Householder transformations or Givens rotations to find R 1 and c1 . Consider using the Householder triangulation algorithm Algorithm 5.2. We find R = Q∗ A and c = Q∗ b, where A = QR is the QR decomposition of A. The matrices R 1 and c1 are located in the first n rows of R and c. Using also Algorithm 3.2 we have the following method to solve the full rank least squares problem. 1. [R,c]=housetriang(A,b). 2. x=rbacksolve(R(1:n,1:n),c(1:n),n).

9.3 Numerical Solution

207

Example 9.5 (Solution Using QR Factorization) Consider the least squares problem with ⎡

⎤ ⎡ ⎤ 1 3 1 1 ⎢1 3 7 ⎥ ⎢1⎥ ⎥ ⎢ ⎥ A=⎢ ⎣1 −1 −4⎦ and b = ⎣1⎦ . 1 −1 2 1 This is the matrix in Example 5.2. The least squares solution x is found by solving the system ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎤⎡ ⎤ 1 1 1 1 1 2 223 x1 ⎢1⎥ 1 ⎣0 4 5⎦ ⎣x2 ⎦ = ⎣ 1 1 −1 −1⎦ × ⎢ ⎥ = ⎣0⎦ , ⎣1⎦ 2 x3 −1 1 −1 1 0 006 1 ⎡

and we find x = [1, 0, 0]∗. Using Householder triangulation is a useful alternative to normal equations for solving full rank least squares problems. It can even be extended to rank deficient problems, see [2]. The 2 norm condition number for the system R 1 x = c1 is K2 (R 1 ) = K2 (Q1 R 1 ) = K2 (A), and as discussed in the previous section this is the square root of K2 (A∗ A), the condition number for the normal equations. Thus if A is mildly ill-conditioned the normal equations can be quite ill-conditioned and solving the normal equations can give inaccurate results. On the other hand Algorithm 5.2 is quite stable. But using Householder transformations requires more work. The leading term in the number of arithmetic operations in Algorithm 5.2 is approximately 2mn2 − 2n3 /3, (cf. (5.16)) while the number of arithmetic operations needed to form the normal equations, taking advantage of symmetry is approximately mn2 . Thus for m much larger than n using Householder triangulation requires twice as many arithmetic operations as the approach based on the normal equations. Also, Householder triangulation have problems taking advantage of the structure in sparse problems. Using MATLAB a least squares solution can be found using x=A\b if A has full rank.For rank deficient problems the function x=lscov(A,b) finds a least squares solution with a maximal number of zeros in x.

9.3.3 Singular Value Decomposition, Generalized Inverses and Least Squares Further insight into the least squares problem can be obtained by considering a singular value decomposition of A and the corresponding singular value factorization.

208

9 Least Squares

If A has rank r then    Σ1 0 V ∗1 A = U ΣV = [U 1 , U 2 ] = U 1 Σ 1 V ∗1 , 0 0 V ∗2 ∗

Σ1 = diag(σ1 , . . . , σr ), (9.7)

where U 1 = [u1 , . . . , ur ], U 2 = [ur+1 , . . . , um ],

U ∗1 U 1 = I , U ∗2 U 2 = I ,

V 1 = [v 1 , . . . , v r ], V 2 = [v r+1 , . . . , v n ],

V ∗1 V 1 = I , V ∗2 V 2 = I

and σ1 ≥ · · · ≥ σr > 0. We recall (cf. Theorem 7.3) • • • •

the set of columns of U 1 the set of columns of U 2 the set of columns of V 1 the set of columns of V 2

is an orthonormal basis for R(A), is an orthonormal basis for N (A∗ ), is an orthonormal basis for R(A∗ ), is an orthonormal basis for N (A).

The concept of the inverse of a matrix can be generalized to any rectangular matrix. Theorem 9.5 (The Generalized Inverse) For any m, n ∈ N and any A ∈ Cm×n there is a unique matrix A† ∈ Cn×m such that AA† A = A, A† AA† = A† , (A† A)∗ = A† A, (AA† )∗ = AA† .

(9.8)

If U 1 Σ 1 V ∗1 is a singular value factorization of A then ∗ A† = V 1 Σ −1 1 U 1.

(9.9)

Proof For existence we show that the matrices A = U 1 Σ 1 V ∗1 ,

∗ A† := V 1 Σ −1 1 U1

satisfies (9.8). Since U 1 and V 1 have orthonormal columns we find ∗ ∗ ∗ A† A = V 1 Σ −1 1 U 1U 1Σ 1V 1 = V 1V 1, ∗ ∗ AA† = U 1 Σ 1 V ∗1 V 1 Σ −1 1 U 1 = U 1U 1.

(9.10)

A similar calculation shows that (A† A)∗ = V 1 V ∗1 and (AA† )∗ = U 1 U ∗1 showing that A† A and AA† are Hermitian. Moreover, by (9.10) AA† A = U 1 Σ 1 V ∗1 V 1 V ∗1 = U 1 Σ 1 V ∗1 = A ∗ ∗ −1 ∗ † A† AA† = V 1 Σ −1 1 U 1U 1U 1 = V 1Σ U 1 = A .

9.3 Numerical Solution

209

Thus (9.8) follows. That there is only one matrix A† ∈ Cn×m satisfying (9.8) is shown in Exercise 9.5.   The matrix A† is called the generalized inverse of A. We note that 1. If A is square and nonsingular then A−1 satisfies (9.8) so that A−1 = A† . Indeed, A−1 A = AA−1 = I implies that A−1 A and AA−1 are Hermitian. Moreover, AA−1 A = A, A−1 AA−1 = A−1 . By uniqueness A−1 = A† . 2. We show in Exercise 9.7 that if A has linearly independent columns then A† = (A∗ A)−1 A∗ .

(9.11)

For further properties and examples of the generalized inverse see the exercises. Orthogonal projections can be expressed in terms of generalized inverses and singular vectors. Theorem 9.6 (Orthogonal Projections) Given m, n ∈ N, A ∈ Cm×n of rank r, and let S be one of the subspaces R(A), N (A∗ ). The orthogonal projection of v ∈ Cm into S can be written as a matrix P S times the vector v in the form P S v, where P R(A) = AA† = U 1 U ∗1 =

r

uj u∗j ∈ Cm×m ,

j =1 †

P N (A∗ ) = I − AA =

U 2 U ∗2

=

m

(9.12) uj u∗j

∈C

m×m

.

j =r+1

where A† is the generalized inverse of A, and A = U ΣV ∗ ∈ Cm×n is a singular value decomposition of A (cf. (9.7)). Proof By block multiplication we have for any v ∈ Cm  ∗ U1 v = U U v = [U 1 , U 2 v = s + t, U ∗2 ∗

where s = U 1 U ∗1 v ∈ R(A) and t = U 2 U ∗2 v ∈ N (A∗ ). By uniqueness and (9.10) we obtain the first equation in (9.12). Since v = (U 1 U ∗1 + U 2 U ∗2 )v for any v ∈ Cm we have U 1 U ∗1 + U 2 U ∗2 = I , and hence U 2 U ∗2 = I − U 1 U ∗1 = I − AA† and the second equation in (9.12) follows.   Corollary 9.1 (LSQ Characterization Using Generalized Inverse) x ∈ Cn solves the least squares problem minx Ax − b22 if and only if x = A† b + z, where A† is the generalized inverse of A and z ∈ N (A).

210

9 Least Squares

Proof It follows from Theorem 9.6 that b1 := AA† b is the orthogonal projection of b ∈ Cm into R(A). Moreover, (9.5) implies that x is a least squares solution if and only if Ax = b1 . Let x be a least squares solution, i.e., Ax = b1 . If z := x − A† b then Az = Ax − AA† b = b1 − b1 = 0 and x = A† b + z. Conversely, if x = A† b + z with Az = 0 then Ax = A(A† b + z) = b1 and x is a least squares solution.   The least squares solution A† b has an interesting property. Theorem 9.7 (Minimal Norm Solution) The least squares solution with minimal Euclidian norm is x = A† b corresponding to z = 0. Proof Consider a singular value decomposition of A using the notation in (9.7). Suppose x = A† b + z, with z ∈ N (A). Since the columns of V 2 form a basis for N (A) we have z = V 2 y for some y. Moreover, V ∗2 V 1 = 0 since V has orthonormal columns. But then z∗ A† b = y ∗ V ∗2 V 1 Σ −1 U ∗1 b = 0. Thus z and A† b are orthogonal so that by Pythagoras x22 = A† b + z22 = A† b22 + z22 ≥ A† b22 with equality for z = 0.   Example 9.6 (Rank Deficient Least Squares Solution) Consider the least squares  problem with A = 11 11 and b = [1, 1]∗. The singular value factorization,A† and A† b are given by       1 1 !1" 1   1 1 1 1   1/2 √ 1 1 = A, A† b = A := √ . [2] √ 1 1 , A† = √ 1/2 1 1 2 4 2 2 2 2 Using Corollary 9.1 we find the general solution [1/2, 1/2] + [a, −a] for any a ∈ C. The MATLAB function lscov gives the solution [1, 0]∗ corresponding to a = 1/2, while the minimal norm solution is [1/2, 1/2] obtained for a = 0.

9.4 Perturbation Theory for Least Squares In this section we consider what effect small changes in the data A, b have on the solution x of the least squares problem minAx − b2 . If A has linearly independent columns then we can write the least squares solution x (the solution of A∗ Ax = A∗ b) as (cf. Exercise 9.7) x = A† b = A† b1 ,

A† = (A∗ A)−1 A∗ ,

where b1 is the orthogonal projection of b into the column space R(A).

9.4 Perturbation Theory for Least Squares

211

9.4.1 Perturbing the Right Hand Side Let us now consider the effect of a perturbation in b on x. Theorem 9.8 (Perturbing the Right Hand Side) Suppose A ∈ Cm×n has linearly independent columns, and let b, e ∈ Cm . Let x, y ∈ Cn be the solutions of minAx − b2 and minAy − b − e2 . Finally, let b1 , e1 be the orthogonal projections of b and e into R(A). If b1 = 0, we have for any operator norm 1 e1  y − x e1  ≤ ≤ K(A) , K(A) b1  x b1 

K(A) = AA† .

(9.13)

Proof Subtracting x = A† b1 from y = A† b1 + A† e1 we have y − x = A† e1 . Thus y − x = A† e1  ≤ A† e 1 . Moreover, b1  = Ax ≤ Ax. Therefore y − x/x ≤ AA† e1 /b1  proving the rightmost inequality. From A(x − y) = e1 and x = A† b1 we obtain the leftmost inequality.   Equation (9.13) is analogous to the bound (8.22) for linear systems. We see that the number K(A) = AA†  generalizes the condition number AA−1  for a square matrix. The main difference between (9.13) and (8.22) is however that e/b in (8.22) has been replaced by e1 /b1 , the orthogonal projections of e and b into R(A). If b lies almost entirely in N (A∗ ), i.e. b/b1  is large, then e1 /b 1  can be much larger than e/b. This is illustrated in Fig. 9.2. If b is almost orthogonal to R(A), then e1 /b 1  will normally be much larger than e/b. Example 9.7 (Perturbing the Right Hand Side) Suppose ⎡

⎤ 11 A = ⎣0 1⎦, 00



⎤ 10−4 b = ⎣ 0 ⎦, 1



⎤ 10−6 e = ⎣ 0 ⎦. 0

Fig. 9.2 Graphical interpretation of the bounds in Theorem 9.8

N(AT)

b

e e span(A)

1

b1

b

2

212

9 Least Squares

For this example we can compute K(A) by finding A† explicitly. Indeed,       11 2 −1 1 −1 0 ∗ −1 † ∗ −1 ∗ A A= , (A A) = , A = (A A) A = . 12 −1 1 0 1 0 ∗

Thus K∞ (A) = A∞ A† ∞ = 2 · 2 = 4 is quite small.! " 100 Consider now the projections b1 and e1 . We find AA† = 0 1 0 . Hence 000

b1 = AA† b = [10−4 , 0, 0]∗ ,

and e1 = AA† e = [10−6 , 0, 0]∗ .

Thus e1 ∞ /b1 ∞ = 10−2 and (9.13) takes the form 1 −2 y − x∞ 10 ≤ ≤ 4 · 10−2 . 4 x∞

(9.14)

To verify the bounds we compute the solutions as x = A† b = [10−4 , 0]∗ and y = A† (b + e) = [10−4 + 10−6 , 0]∗ . Hence 10−6 x − y∞ = −4 = 10−2 , x∞ 10 in agreement with (9.14) For each A we can find b and e so that we have equality in the upper bound in (9.13). The lower bound is best possible in a similar way.

9.4.2 Perturbing the Matrix The analysis of the effects of a perturbation E in A is quite difficult. The following result is stated without proof, see [12, p. 51]. For other estimates see [2] and [19]. Theorem 9.9 (Perturbing the Matrix) Suppose A, E ∈ Cm×n , m > n, where A has linearly independent columns and α := 1 − E2 A† 2 > 0. Then A + E has linearly independent columns. Let b = b1 + b2 ∈ Cm where b1 and b2 are the orthogonal projections into R(A) and N (A∗ ) respectively. Suppose b1 = 0. Let x and y be the solutions of minAx − b2 and min(A + E)y − b2 . Then ρ=

x − y2 E2 1 ≤ K(1 + βK) , x2 α A2

β=

b 2 2 , K = A2 A† 2 . b 1 2 (9.15)

Equation (9.15) says that the relative error in y as an approximation to x can be at most K(1+βK)/α times as large as the size E2 /A2 of the relative perturbation

9.5 Perturbation Theory for Singular Values

213

in A. β will be small if b lies almost entirely in R(A), and we have approximately ρ ≤ α1 KE2 /A2 . This corresponds to the estimate (8.25) for linear systems. If β is not small, the term α1 K 2 βE2 /A2 will dominate. In other words, the condition number is roughly K(A) if β is small and K(A)2 β if β is not small. Note that β is large if b is almost orthogonal to R(A) and that b2 = b −Ax is the residual of x.

9.5 Perturbation Theory for Singular Values In this section we consider what effect a small change in the matrix A has on the singular values.

9.5.1 The Minmax Theorem for Singular Values and the Hoffman-Wielandt Theorem We have a minmax and maxmin characterization for singular values. Theorem 9.10 (The Courant-Fischer Theorem for Singular Values) Suppose A ∈ Cm×n has singular values σ1 , σ2 , . . . , σn ordered so that σ1 ≥ · · · ≥ σn . Then for k = 1, . . . , n σk =

min

max

dim(S )=n−k+1 x∈S x =0

Ax2 Ax2 = max min . x2 x2 x∈ S dim(S )=k

(9.16)

x =0

Proof We have Ax22 x22

=

x ∗ A∗ Ax (Ax)∗ (Ax) = = RA∗ A (x), x∗x x∗x

the Rayleigh quotient of A∗ A. Since the singular values of A are the nonnegative square roots of the eigenvalues of A∗ A, the results follow from the Courant-Fischer Theorem for eigenvalues, see Theorem 6.1.   By taking k = 1 and k = n in (9.16) we obtain for any A ∈ Cm×n σ1 = maxn x∈C x =0

Ax2 , x2

σn = minn x∈C x =0

Ax2 . x2

This follows since the only subspace of Cn of dimension n is Cn itself. Using Theorem 9.10 we obtain the following result.

(9.17)

214

9 Least Squares

Theorem 9.11 (Perturbation of Singular Values) Let A, B ∈ Rm×n be rectangular matrices with singular values α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ βn . Then |αj − βj | ≤ A − B2 , for j = 1, 2, . . . , n.

(9.18)

Proof Fix j and let S be the n−j +1 dimensional subspace for which the minimum in Theorem 9.10 is obtained for B. Then (B + (A − B))x2 Bx2 (A − B)x2 ≤ max + max x2 x2 x∈S x∈S x2 x∈S

αj ≤ max x =0

x =0

x =0

≤ βj + A − B2 . By symmetry we obtain βj ≤ αj + A − B2 and the proof is complete.

 

The following result is an analogue of Theorem 8.11. Theorem 9.12 (Generalized Inverse When Perturbing the Matrix) Let A, E ∈ Rm×n have singular values α1 ≥ · · · ≥ αn and 1 ≥ · · · ≥ n . If rank(A + E) ≤ rank(A) = r and A† 2 E2 < 1 then 1. rank(A + E) = rank(A), A† 2 2. (A + E)† 2 ≤ †

1−A 2 E2

=

1 αr − 1 .

Proof Suppose A has rank r and let B := A + E have singular values β1 ≥ · · · ≥ βn . In terms of singular values the inequality A† 2 E2 < 1 can be written 1 /αr < 1 or αr > 1 . By Theorem 9.11 we have αr − βr ≤ E2 = 1 , which implies βr ≥ αr − 1 > 0, and this shows that rank(A + E) ≥ r. Thus 1. follows. To prove 2., the inequality βr ≥ αr − 1 implies that (A + E)† 2 =

1 1 1/αr A† 2 ≤ = = . βr αr − 1 1 − 1 /αr 1 − A† 2 E2  

The Hoffman-Wielandt Theorem, see Theorem 6.14, for eigenvalues of Hermitian matrices can be written n n n

|μj − λj |2 ≤ A − B2F := |aij − bij |2 , j =1

(9.19)

i=1 j =1

where A, B ∈ Cn×n are both Hermitian matrices with eigenvalues λ1 ≥ · · · ≥ λn and μ1 ≥ · · · ≥ μn , respectively. For singular values we have a similar result.

9.5 Perturbation Theory for Singular Values

215

Theorem 9.13 (Hoffman-Wielandt Theorem for Singular Values) For any m, n ∈ N and A, B ∈ Cm×n we have n

|βj − αj |2 ≤ A − B2F .

(9.20)

j =1

where α1 ≥ · · · ≥ αn and β1 ≥ · · · ≥ βn are the singular values of A and B, respectively. Proof We apply the Hoffman-Wielandt Theorem for eigenvalues to the Hermitian matrices     0 A 0 B C := and D := ∈ C(m+n)×(m+n) . A∗ 0 B∗ 0 If C and D have eigenvalues λ1 ≥ · · · ≥ λm+n and μ1 ≥ · · · ≥ μm+n , respectively then m+n

|λj − μj |2 ≤ C − D2F .

(9.21)

j =1

Suppose A has rank r and SVD [u1 , . . . , um ]Σ[v 1 , . . . , v n ]∗ . We use (7.7) and determine the eigenpairs of C as follows. 

        0 A ui Av i αi ui ui = = = α , i = 1, . . . , r, i A∗ 0 v i A∗ ui αi v i vi          −Av i −αi ui ui 0 A ui = = = −αi , i = 1, . . . , r, A∗ ui αi v i −v i A∗ 0 −v i          0 0 A ui 0 u = = = 0 i , i = r + 1, . . . , m, ∗ ∗ 0 0 0 A 0 A ui          Av i 0 0 0 A 0 = , i = r + 1, . . . , n. = =0 ∗ 0 0 vi A 0 vi Thus C has the 2r eigenvalues α1 , −α1 , . . . , αr , −αr and m + n − 2r additional zero eigenvalues. Similarly, if B has rank s then D has the 2s eigenvalues β1 , −β1 , . . . , βs , −βs and m + n − 2s additional zero eigenvalues. Let t := max(r, s). Then λ1 ≥ · · · ≥ λm+n = α1 ≥ · · · ≥ αt ≥ 0 = · · · = 0 ≥ −αt ≥ · · · ≥ −α1 , μ1 ≥ · · · ≥ μm+n = β1 ≥ · · · ≥ βt ≥ 0 = · · · = 0 ≥ −βt ≥ · · · ≥ −β1 .

216

9 Least Squares

We find m+n

|λj − μj | =

j =1

2

t

|αi − βi | +

i=1

2

t

|−αi + βi | = 2 2

i=1

t

|αi − βi |2

i=1

and  C −D2F = 

 0 A−B 2 F = B −A2F +(B −A)∗ 2F = 2B −A2F . A∗ − B ∗ 0

But then (9.21) implies ti=1 |αi − βi |2 ≤ B − A2F . Since t ≤ n and αi = βi = 0 for i = t + 1, . . . , n we obtain (9.20).   Because of Theorem 9.11 and the Hoffman-Wielandt Theorem for singular values, Theorem 9.13 we will say that the singular values of a matrix are well conditioned. Changing the Frobenius norm or the spectral norm of a matrix by small amount only changes the singular values by a small amount.

9.6 Exercises Chap. 9 9.6.1 Exercises Sect. 9.1 Exercise 9.1 (Fitting a Circle to Points) In this problem we derive an algorithm to fit a circle (t − c1 )2 + (y − c2 )2 = r 2 to m ≥ 3 given points (ti , yi )m i=1 in the (t, y)-plane. We obtain the overdetermined system (ti − c1 )2 + (yi − c2 )2 = r 2 , i = 1, . . . , m,

(9.22)

of m equations in the three unknowns c1 , c2 and r. This system is nonlinear, but it can be solved from the linear system ti x1 + yi x2 + x3 = ti2 + yi2 , i = 1, . . . , m,

(9.23)

and then setting c1 = x1 /2, c2 = x2 /2 and r 2 = c12 + c22 + x3 . a) Derive (9.23) from (9.22). Explain how we can find c1 , c2 , r once [x1 , x2 , x3 ] is determined. b) Formulate (9.23) as a linear least squares problem for suitable A and b. c) Does the matrix A in b) have linearly independent columns? d) Use (9.23) to find the circle passing through the three points (1, 4), (3, 2), (1, 0).

9.6 Exercises Chap. 9

217

Exercise 9.2 (Least Square Fit (Exam Exercise 2018-1)) √ √  2 √2 a) Let A be the matrix . Find the singular values of A, and compute 3 0 A2 .   3α b) Consider the matrix A = , where α is a real number. For which values of α1 α is A positive definite? c) We would like to fit the points p 1 = (0, 1), p 2 = (1, 0), p 3 = (2, 1) to a straight line in the plane. Find a line p(x) = mx + b which minimizes 3

p(xi ) − yi 2 ,

i=1

where p i = (xi , yi ). Is this solution unique?

9.6.2 Exercises Sect. 9.2 Exercise 9.3 (A Least Squares Problem (Exam Exercise 1983-2)) Suppose A ∈ Rm×n and let I ∈ Rn×n be the identity matrix. We define F : Rn → R by F (x) := Ax − b22 + x22 . a) Show that the matrix B := I + AT A is symmetric and positive definite. b) Show that F (x) = x T Bx − 2cT x + bT b,

where c = AT b.

c) Show that to every b ∈ Rm there is a unique x which minimizes F . Moreover, x is the unique solution of the linear system (I + AT A)x = AT b. Exercise 9.4 (Weighted Least Squares (Exam Exercise 1977-2)) For m ≥ n we are given A ∈ Rm×n with linearly independent columns, b ∈ Rm , and D := diag(d1 , d2 , . . . , dm ) ∈ Rm×m , where di > 0, i = 1, 2, . . . , m. We want to minimize r(x)2D :=

m

i=1

ri (x)2 di ,

x ∈ Rn ,

(9.24)

218

9 Least Squares

where ri = ri (x), i = 1, 2, . . . , m are the components of the vector r = r(x) = b − Ax. a) Show that r(x)2D in (9.24) obtains a unique minimum when x = x min is the solution of the system AT DAx = AT Db. b) Show that K2 (AT DA) ≤ K2 (AT A)K2 (D), where for any nonsingular matrix K2 (B) := B2 B −1 2 .

9.6.3 Exercises Sect. 9.3 Exercise 9.5 (Uniqueness of Generalized Inverse) Given A ∈ Cm×n , and suppose B, C ∈ Cn×m satisfy ABA BAB (AB)∗ (BA)∗

=A =B = AB = BA

(1) ACA = A, (2) CAC = C, (3) (AC)∗ = AC, (4) (CA)∗ = CA.

Verify the following proof that B = C. B = (BA)B = (A∗ )B ∗ B = (A∗ C ∗ )A∗ B ∗ B = CA(A∗ B ∗ )B = CA(BAB) = (C)AB = C(AC)AB = CC ∗ A∗ (AB) = CC ∗ (A∗ B ∗ A∗ ) = C(C ∗ A∗ ) = CAC = C. Exercise 9.6 (Verify That a !Matrix Inverse) Show that the " Is a Generalized   11 1 110 † generalized inverse of A = 1 1 is A = 4 1 1 0 without using the singular 00 value decomposition of A. Exercise 9.7 (Linearly Independent Columns and Generalized Inverse) Suppose A ∈ Cm×n has linearly independent columns. Show that A∗ A is nonsingular and A† = (A∗ A)−1 A∗ . If A has linearly independent rows, then show that AA∗ is nonsingular and A† = A∗ (AA∗ )−1 .

9.6 Exercises Chap. 9

219

Exercise 9.8 (More Orthogonal Projections) Given m, n ∈ N, A ∈ Cm×n of rank r, and let S be one of the subspaces R(A∗ ), N (A). Show that the orthogonal projection of v ∈ Cn into S can be written as a matrix P S times the vector v in the form P S v, where P R(A∗ ) = A† A = V 1 V ∗1 =

r

v j v ∗j ∈ Cn×n

j =1

P N (A) = I − A† A = V 2 V ∗2 =

n

(9.25) v j v ∗j ∈ Cn×n .

j =r+1

where A† is the generalized inverse of A and A = U ΣV ∗ ∈ Cm×n , is a singular value decomposition of A (cf. (9.7)). Thus (9.12) and (9.25) give the orthogonal projections into the 4 fundamental subspaces. Hint: by Theorem 7.3 we have the orthogonal sum



Cn = R(A∗ )⊕N (A).

Exercise 9.9 (The Generalized Inverse of a Vector) Show that u† = (u∗ u)−1 u∗ if u ∈ Cn,1 is nonzero. Exercise 9.10 (The Generalized Inverse of an Outer Product) If A = uv ∗ where u ∈ Cm , v ∈ Cn are nonzero, show that A† =

1 ∗ A , α

α = u22 v22 .

Exercise 9.11 (The Generalized Inverse of a Diagonal Matrix) Show that diag(λ1 , . . . , λn )† = diag(λ†1 , . . . , λ†n ) where , λ†i =

1/λi , λi = 0 0 λi = 0.

Exercise 9.12 (Properties of the Generalized Inverse) Suppose A ∈ Cm×n . Show that a) (A∗ )† = (A† )∗ . b) (A† )† = A. c) (αA)† = α1 A† , α = 0. Exercise 9.13 (The Generalized Inverse of a Product) Suppose k, m, n ∈ N, A ∈ Cm×n , B ∈ Cn×k . Suppose A has linearly independent columns and B has linearly independent rows. a) Show that (AB)† = B † A† . Hint: Let E = AB, F = B † A† . Show by using A† A = BB † = I that F is the generalized inverse of E. b) Find A ∈ R1,2 , B ∈ R2,1 such that (AB)† = B † A† .

220

9 Least Squares

Exercise 9.14 (The Generalized Inverse of the Conjugate Transpose) Show that A∗ = A† if and only if all singular values of A are either zero or one. Exercise 9.15 (Linearly Independent Columns) Show that if A has rank n then A(A∗ A)−1 A∗ b is the projection of b into R(A). (Cf. Exercise 9.8.) Exercise 9.16 (Analysis of the General Linear System) Consider the linear system Ax = b where A ∈ Cn×n has rank r > 0 and b ∈ Cn . Let U ∗ AV =



Σ1 0 0 0



represent the singular value decomposition of A. a) Let c = [c1 , . . . , cn ]∗ = U ∗ b and y = [y1 , . . . , yn ]∗ = V ∗ x. Show that Ax = b if and only if 

 Σ1 0 y = c. 0 0

b) Show that Ax = b has a solution x if and only if cr+1 = · · · = cn = 0. c) Deduce that a linear system Ax = b has either no solution, one solution or infinitely many solutions. Exercise 9.17 (Fredholm’s Alternative) For any A ∈ Cm×n , b ∈ Cn show that one and only one of the following systems has a solution (1)

Ax = b,

(2)

A∗ y = 0, y ∗ b = 0.

In other words either b ∈ R(A), or we can find y ∈ N (A∗ ) such that y ∗ b = 0. This is called Fredholm’s alternative. Exercise 9.18 (SVD (Exam Exercise 2017-2)) Let A ∈ Cm×n , with m ≥ n, be a matrix on the form   B A= C where B is a non-singular n × n matrix and C is in C(m−n)×n . Let A† denote the pseudoinverse of A. Show that A† 2 ≤ B −1 2 .

9.6 Exercises Chap. 9

221

9.6.4 Exercises Sect. 9.4 Exercise 9.19 (Condition Number) Let ⎡

⎤ 12 A = ⎣1 1⎦, 11

⎤ b1 b = ⎣ b2 ⎦ . b3 ⎡

a) Determine the projections b1 and b2 of b on R(A) and N (A∗ ). b) Compute K(A) = A2 A† 2 . Exercise 9.20 (Equality in Perturbation Bound) Let A ∈ Cm×n . Suppose y A and y A† are vectors with y A  = y A†  = 1 and A = Ay A  and A†  = A† y A† . a) Show that we have equality to the right in (9.13) if b = Ay A , e1 = y A† . b) Show that we have equality to the left if we switch b and e1 in a). c) Let A be as in Example 9.7. Find extremal b and e when the l∞ norm is used. This generalizes the sharpness results in Exercise 8.18. For if m = n and A is nonsingular then A† = A−1 and e1 = e. Exercise 9.21 (Problem Using Normal Equations) Consider the least squares problems where ⎡

⎤ 1 1 A = ⎣1 1 ⎦, 1 1+

⎡ ⎤ 2 b = ⎣ 3 ⎦ , ∈ R. 2

a) Find the normal equations and the exact least squares solution. ∗ 2 b) Suppose is small and we replace the (2, 2) √ entry 3+2 + in A A by 3+2 . (This will be done in a computer if < u, u being the round-off unit). For √ example, if u = 10−16 then u = 10−8 . Solve A∗ Ax = A∗ b for x and compare with the x found in a). (We will get a much more accurate result using the QR factorization or the singular value decomposition on this problem).

9.6.5 Exercises Sect. 9.5 Exercise 9.22 (Singular Values Perturbation (Exam Exercise 1980-2)) Let A( ) ∈ Rn×n be bidiagonal with ai,j = 0 for i, j = 1, . . . , n and j = i, i + 1. Moreover, for some 1 ≤ k ≤ n − 1 we have ak,k+1 = ∈ R. Show that |σi ( ) − σi (0)| ≤ | |,

i = 1, . . . , n,

where σi ( ), i = 1, . . . , n are the singular values of A( ).

222

9 Least Squares

9.7 Review Questions 9.7.1 9.7.2 9.7.3 9.7.4

Do the normal equations always have a solution? When is the least squares solution unique? Express the general least squares solution in terms of the generalized inverse. Consider perturbing the right-hand side in a linear equation and a least squares problem. What is the main difference in the perturbation inequalities? 9.7.5 Why does one often prefer using QR factorization instead of normal equations for solving least squares problems. 9.7.6 What is an orthogonal sum? 9.7.7 How is an orthogonal projection defined?

Part IV

Kronecker Products and Fourier Transforms

We give an introduction to Kronecker products of matrices and the fast Fourier transform. We illustrate the usefulness by giving a fast method for solving the 2 dimensional discrete Poison Equation based on the fast Fourier transform.

Chapter 10

The Kronecker Product

Matrices arising from 2D and 3D problems sometimes have a Kronecker product structure. Identifying a Kronecker structure can be very rewarding since it simplifies the derivation of properties of such matrices.

10.1 The 2D Poisson Problem Let Ω := (0, 1)2 = {(x, y) : 0 < x, y < 1} be the open unit square with boundary ∂Ω. Consider the problem − Δu := −

∂ 2u ∂ 2u − 2 = f on Ω, ∂x 2 ∂y

(10.1)

u := 0 on ∂Ω. Here the function f is given and continuous on Ω, and we seek a function u = u(x, y) such that (10.1) holds and which is zero on ∂Ω. Let m be a positive integer. We solve the problem numerically by finding approximations vj,k ≈ u(j h, kh) on a grid of points given by Ω h := {(j h, kh) : j, k = 0, 1, . . . , m + 1},

where h = 1/(m + 1).

The points Ωh := {(j h, kh) : j, k = 1, . . . , m} are called the interior points, while Ω h \ Ωh are the boundary points. The solution is zero at the boundary points. Using the difference approximation from Chap. 2 for the second derivative we obtain the

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_10

225

226

10 The Kronecker Product

following approximations for the partial derivatives vj −1,k − 2vj,k + vj +1,k ∂ 2 u(j h, kh) vj,k−1 − 2vj,k +vj,k+1 ∂ 2 u(j h, kh) ≈ , ≈ . ∂x 2 h2 ∂y 2 h2 Inserting this in (10.1) we get the following discrete analog of (10.1) −Δh vj,k = fj,k , vj,k = 0,

(j h, kh) ∈ Ωh , (j h, kh) ∈ ∂Ωh ,

(10.2) (10.3)

where fj,k := f (j h, kh) and − Δh vj,k :=

−vj −1,k + 2vj,k − vj +1,k −vj,k−1 + 2vj,k − vj,k+1 + . h2 h2

(10.4)

Let us take a closer look at (10.2). It consists of n := m2 linear equations. Since the values at the boundary points are known, the unknowns are the n numbers vj,k at the interior points. These linear equations can be written as a matrix equation in the form T V + V T = h2 F

with h = 1/(m + 1),

(10.5)

where T = tridiag(−1, 2, −1) ∈ Rm×m is the second derivative matrix given by (2.27) and ⎤ v1,1 · · · v1,m ⎢ .. ⎥ ∈ Rm×m , V := ⎣ ... . ⎦ vm,1 · · · vm,m ⎡

⎤ f1,1 · · · f1,m ⎢ .. ⎥ ∈ Rm×m . F := ⎣ ... . ⎦ fm,1 · · · fm,m ⎡

(10.6)

Indeed, the (j, k) element in T V + V T is given by m

i=1

T j,i vi,k +

m

vj,i T i,k = −vj −1,k + 2vj,k − vj +1,k − vj,k−1 + 2vj,k − vj,k+1 ,

i=1

and if we divide by h2 this is precisely the left hand side of (10.2). To write (10.5) in standard form Ax = b we need to order the unknowns vj,k in some way. The following operation of vectorization of a matrix gives one possible ordering. Definition 10.1 (vec Operation) For any B ∈ Rm×n we define the vector vec(B) := [b11 , . . . , bm1 , b12 , . . . , bm2 , . . . , b1n , . . . , bmn ]T ∈ Rmn by stacking the columns of B on top of each other.

10.1 The 2D Poisson Problem

227

1,1

1,2

1,3

1,3

2,3

3,3

7

8

9

2,1

2,2

2,3

1,2

2,2

3,2

4

5

6

3,1

3,2

3,3

1,1

2,1

3,1

1

2

3

vj,k in V - matrix

v in grid

x i in grid

x i+m

-1

j,k

Fig. 10.1 Numbering of grid points

vj,k+1 vj-1,k

vj+1,k x i-1

vj,k

x i+1

xi

vj,k-1

-1

4

xi-m

-1

-1

Fig. 10.2 The 5-point stencil

Let x := vec(V ) ∈ Rn , where n = m2 . Note that forming x by stacking the columns of V on top of each other means an ordering of the grid points. For m = 3 this is illustrated in Fig. 10.1. We call this the natural ordering. The elements in (10.2) defines a 5-point stencil, as shown in Fig. 10.2. To find the matrix A we note that for values of j, k where the 5-point stencil does not touch the boundary, (10.2) implies 4xi − xi−1 − xi+1 − xi−m − xi+m = bi , where xi = vj,k and bi = h2 fj,k . This must be modified close to the boundary. We then obtain the linear system Ax = b,

A ∈ Rn×n ,

b ∈ Rn ,

n = m2 ,

(10.7)

where x = vec(V ), b = h2 vec(F ) and A is the Poisson matrix given by aii = 4, ai+1,i = ai,i+1 = −1, ai+m,i = ai,i+m = −1, aij = 0,

i = 1, . . . , n, i = 1, . . . , n − 1, i = m, 2m, . . . , (m − 1)m, i = 1, . . . , n − m, otherwise. (10.8)

228

10 The Kronecker Product

n=9

n = 25

n = 100

Fig. 10.3 Band structure of the 2D test matrix

For m = 3 we have the following matrix ⎤ 4 −1 0 −1 0 0 0 0 0 ⎢ −1 4 −1 0 −1 0 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 −1 4 0 0 −1 0 0 0 ⎥ ⎥ ⎢ ⎢ −1 0 0 4 −1 0 −1 0 0 ⎥ ⎥ ⎢ ⎥ ⎢ A = ⎢ 0 −1 0 −1 4 −1 0 −1 0 ⎥ . ⎥ ⎢ ⎢ 0 0 −1 0 −1 4 0 0 −1 ⎥ ⎥ ⎢ ⎢ 0 0 0 −1 0 0 4 −1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 0 −1 0 −1 4 −1 ⎦ 0 0 0 0 0 −1 0 −1 4 ⎡

The bands of the weakly diagonally dominant matrix A are illustrated in Fig. 10.3.

10.1.1 The Test Matrices In Sect. 2.4 we encountered the 1-dimensional test matrix T 1 ∈ Rm×m defined for any real numbers a, d by T 1 := tridiag(a, d, a).

(10.9)

The (2-dimensional) Poisson matrix is a special case of the matrix T 2 = [aij ] ∈ Rn×n with elements aii ai,i+1 = ai+1,i ai,i+m = ai+m,i aij

= = = =

2d, i = 1, . . . , n, a, i = 1, . . . , n − 1, i = m, 2m, . . . , (m − 1)m, a, i = 1, . . . , n − m, 0, otherwise, (10.10)

10.2 The Kronecker Product

229

and where a, d are real numbers. We will refer to this matrix as simply the 2D test matrix. For m = 3 the 2D test matrix looks as follows ⎡

2d ⎢ a ⎢ ⎢ 0 ⎢ ⎢ a ⎢ ⎢ T2 = ⎢ 0 ⎢ ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0

a 2d a 0 a 0 0 0 0

0 a 2d 0 0 a 0 0 0

a 0 0 a 0 0 2d a a 2d 0 a a 0 0 a 0 0

0 0 a 0 a 2d 0 0 a

⎤ 0 0 0 0 0 0⎥ ⎥ 0 0 0⎥ ⎥ a 0 0⎥ ⎥ ⎥ 0 a 0⎥. ⎥ 0 0 a⎥ ⎥ 2d a 0 ⎥ ⎥ a 2d a ⎦ 0 a 2d

(10.11)

The partition into 3 × 3 sub matrices shows that T 2 is block tridiagonal. Properties of T 2 can be derived from properties of T 1 by using properties of the Kronecker product.

10.2 The Kronecker Product Definition 10.2 (Kronecker Product) For any positive integers p, q, r, s we define the Kronecker product of two matrices A ∈ Rp×q and B ∈ Rr×s as a matrix C ∈ Rpr×qs given in block form as ⎡

Ab1,1 Ab1,2 ⎢ Ab2,1 Ab2,2 ⎢ C=⎢ . .. ⎣ .. . Abr,1 Abr,2

⎤ · · · Ab1,s · · · Ab2,s ⎥ ⎥ . ⎥. .. . .. ⎦ · · · Abr,s

We denote the Kronecker product of A and B by C = A ⊗ B. This definition of the Kronecker product is known more precisely as the left Kronecker product. In the literature one often finds the right Kronecker product which in our notation is given by B ⊗ A.  T The Kronecker product u ⊗ v = uT v1 , . . . , uT vr of two column vectors u ∈ Rp and v ∈ Rr is a column vector of length p · r. The test matrix T 2 can be written as a sum of Kronecker products. Indeed, if m = 3 then ⎡

⎤ d a 0 T1 = ⎣a d a⎦, 0ad



⎤ 100 I = ⎣0 1 0⎦ 001

230

10 The Kronecker Product

and ⎡

⎤ ⎡ ⎤ T1 0 0 dI aI 0 T 1 ⊗ I + I ⊗ T 1 = ⎣ 0 T 1 0 ⎦ + ⎣ aI dI aI ⎦ = T 2 0 aI dI 0 0 T1 given by (10.11). This formula holds for any integer m ≥ 2 T 2 = T 1 ⊗ I + I ⊗ T 1,

T 1 , I ∈ Rm×m ,

2 )×(m2 )

T 2 ∈ R(m

.

(10.12)

The sum of two Kronecker products involving the identity matrix is worthy of a special name. Definition 10.3 (Kronecker Sum) For positive integers r, s, k, let A ∈ Rr×r , B ∈ Rs×s , and I k be the identity matrix of order k. The sum A ⊗ I s + I r ⊗ B is known as the Kronecker sum of A and B. In other words, the 2D test matrix T 2 is the Kronecker sum involving the 1D test matrix T 1 . The following simple arithmetic rules hold for Kronecker products. For scalars λ, μ and matrices A, A1 , A2 , B, B 1 , B 2 , C of dimensions such that the operations are defined, we have       λA ⊗ μB = λμ A ⊗ B ,   A1 + A2 ⊗ B = A1 ⊗ B + A2 ⊗ B, (10.13)   A ⊗ B 1 + B 2 = A ⊗ B 1 + A ⊗ B 2, (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C). Note however that in general we have A ⊗ B = B ⊗ A, but it can be shown that there are permutation matrices P , Q such that B ⊗ A = P (A ⊗ B)Q, see [9]. The following mixed product rule is an essential tool for dealing with Kronecker products and sums. Lemma 10.1 (Mixed Product Rule) Suppose A, B, C, D are rectangular matrices with dimensions so that the products AC and BD are well defined. Then the product (A ⊗ B)(C ⊗ D) is defined and (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD). Proof If B ∈ Rr,t and D ∈ Rt,s for some integers r, s, t, then ⎡

⎤⎡ ⎤ Ab1,1 · · · Ab1,t Cd1,1 · · · Cd1,s ⎢ .. ⎥ ⎢ .. .. ⎥ . (A ⊗ B)(C ⊗ D) = ⎣ ... . ⎦⎣ . . ⎦ Abr,1 · · · Abr,t Cdt,1 · · · Cdt,s

(10.14)

10.2 The Kronecker Product

231

Thus for all i, j ((A ⊗ B)(C ⊗ D))i,j = AC

t

bi,k dk,j = (AC)(BD)i,j = ((AC) ⊗ (BD))i,j .

k=1

  Using the mixed product rule we obtain the following properties of Kronecker products and sums. Theorem 10.1 (Properties of Kronecker Products) Suppose for r, s ∈ N that A ∈ Rr×r and B ∈ Rs×s are square matrices with eigenpairs (λi , ui ) i = 1, . . . , r and (μj , v j ), j = 1, . . . , s. Moreover, let F , V ∈ Rr×s . Then 1. (A ⊗ B)T = AT ⊗ B T , (this also holds for rectangular matrices). 2. If A and B are nonsingular then A ⊗ B is nonsingular. with (A ⊗ B)−1 = A−1 ⊗ B −1 . 3. If A and B are symmetric then A ⊗ B and A ⊗ I s + I r ⊗ B are symmetric. 4. (A ⊗ B)(ui ⊗ v j ) = λi μj (ui ⊗ v j ), i = 1, . . . , r, j = 1, . . . , s, 5. (A⊗I s +I r ⊗B)(ui ⊗v j ) = (λi +μj )(ui ⊗v j ), i = 1, . . . , r, j = 1, . . . , s., 6. If one of A, B is positive definite and the other is positive semidefinite then A ⊗ I + I ⊗ B is positive definite. 7. AV B T = F ⇔ (A ⊗ B) vec(V ) = vec(F ), 8. AV + V B T = F ⇔ (A ⊗ I s + I r ⊗ B) vec(V ) = vec(F ). Before giving the simple proofs of this theorem we present some comments. 1. The transpose (or the inverse) of an ordinary matrix product equals the transpose (or the inverse) of the matrices in reverse order. For Kronecker products the order is kept. 2. The eigenvalues of the Kronecker product (sum) are the product (sum) of the eigenvalues of the factors. The eigenvectors are the Kronecker products of the eigenvectors of the factors. In particular, the eigenvalues of the test matrix T 2 are sums of eigenvalues of T 1 . 3. Since we already know that T = tridiag(−1, 2, −1) is positive definite the 2D Poisson matrix A = T ⊗ I + I ⊗ T is also positive definite. 4. The system AV B T = F in part 7 can be solved by first finding W from AW = F , and then finding V from BV T = W T . This is preferable to solving the much larger linear system (A ⊗ B) vec(V ) = vec(F ). 5. A fast way to solve the 2D Poisson problem in the form T V + V T = F will be considered in the next chapter.

232

10 The Kronecker Product

Proof of Theorem 10.1 1. Exercise.        −1 −1 = AA−1 ⊗ BB −1 = 2. By the mixed product  rule A ⊗ B A ⊗ B I r ⊗ I s = I rs .Thus A ⊗ B is nonsingular with the indicated inverse. 3. By 1, (A ⊗ B)T = AT ⊗ B T = A ⊗ B. Moreover, since then A ⊗ I and I ⊗ B are symmetric, their sum is symmetric. 4. (A ⊗ B)(ui ⊗ v j ) = (Aui ) ⊗ (Bv j ) = (λi ui ) ⊗ (μj v j ) = (λi μj )(ui ⊗ v j ), for all i, j , where we used the mixed product rule. 5. (A ⊗ I s )(ui ⊗ v j ) = λi (ui ⊗ v j ), and (I r ⊗ B)(ui ⊗ v j ) = μj (ui ⊗ v j ). The result now follows by summing these relations. 6. By 1, A⊗I +I ⊗B is symmetric. Moreover, the eigenvalues λi +μj are positive since for all i, j , both λi and μj are nonnegative and one of them is positive. It follows that A ⊗ I + I ⊗ B is positive definite. 7. We partition V , F , and B T by columns as V = [v 1 , . . . , v s ], F = [f 1 , . . . , f s ] and B T = [b1 , . . . , bs ]. Then we have (A ⊗ B) vec(V ) = vec(F ) ⎡ ⎤⎡ ⎤ ⎡ ⎤ f1 Ab11 · · · Ab1s v1 ⎢ ⎢ .. ⎥ ⎢ ⎥ . . .. ⎦ ⎣ .. ⎦ = ⎣ ... ⎥ ⇔ ⎣ . ⎦



Abs1 · · · Abss vs fs  

A b1j v j , . . . , bsj v j = [f 1 , . . . , f s ] j



j

A[V b1 , . . . , V bs ] = F



AV B T = F .

8. From the proof of 7. we have (A ⊗ I s ) vec(V ) = AV I Ts and (I r ⊗ B) vec(V ) = I r V B T . But then (A ⊗ I s + I r ⊗ B) vec(V ) = vec(F ) ⇔

(AV I Ts + I r V B T ) = F



AV + V B T = F .  

For more on Kronecker products see [9].

10.3 Properties of the 2D Test Matrices Using Theorem 10.1 we can derive properties of the 2D test matrix T 2 from those of T 1 . Recall (cf. Lemma 2.2) that T 1 s j = λj s j for j = 1, . . . , m, where λj = d + 2a cos(j πh),

h :=

1 , m+1

(10.15)

10.3 Properties of the 2D Test Matrices

233

s j = [sin(j πh), sin(2j πh), . . . , sin(mj πh)]T .

(10.16)

Moreover, the eigenvalues are distinct and the eigenvectors are orthogonal s Tj s k =

m+1 1 δj,k = δj,k , 2 2h

j, k = 1, . . . , m.

(10.17)

Theorem 10.2 (Eigenpairs of 2D Test Matrix) For fixed m ≥ 2 let T 2 be the matrix given by (10.10) and let h = 1/(m + 1). Then T 2 (s j ⊗ s k ) = (λj + λk )(s j ⊗ s k )

j, k = 1, . . . , m,

(10.18)

where (λj , s j ) are the eigenpairs of T 1 given by (10.15) and (10.16). The eigenvectors s j ⊗ s k are orthogonal (s j ⊗ s k )T (s p ⊗ s q ) =

1 δj,p δk,q , 4h2

j, k, p, q = 1, . . . , m,

(10.19)

and T 2 is positive definite if d > 0 and d ≥ 2|a|. Proof Equation (10.18) follows from Part 5. of Theorem 10.1. Using the transpose rule, the mixed product rule and (2.32) we find for j, k, p, q = 1, . . . , m T           1 s p ⊗ s q = s Tj ⊗ s Tk s p ⊗ s q = s Tj s p ⊗ s Tk s q = 2 δj,p δk,q sj ⊗ sk 4h and (10.19) follows. Since T 2 is symmetric, T 2 is positive definite if the λj given by (10.15) are positive. But this is true whenever d > 0 and d ≥ 2|a| (cf. Exercise 10.5).   Corollary 10.1 The spectral condition number of the discrete Poisson matrix A ∈ 2 2 Rm ×m given by (10.8) is given by A2 A−1 2 =

cos2 w sin2 w

,

w :=

π . 2(m + 1)

(10.20)

Proof Recall that by (10.15) with d = 2, a = −1, and (10.18), the eigenvalues λj,k of A are λj,k = 4−2 cos (2j w)−2 cos (2kw) = 4 sin2 (j w)+4 sin2 (kw),

j, k =1, . . . , m.

Using trigonometric formulas, it follows that the largest and smallest eigenvalue of A, are given by λmax = 8 cos2 w,

λmin = 8 sin2 w.

234

10 The Kronecker Product

Since d > 0 and d ≥ 2|a| it follows that A is positive definite. By (8.26) we have A2 A−1 2 = λλmax and (10.20) follows.   min

10.4 Exercises Chap. 10 10.4.1 Exercises Sects. 10.1, 10.2 Exercise 10.1 (4 × 4 Poisson Matrix) Write down the Poisson matrix for m = 2 and show that it is strictly diagonally dominant. Exercise 10.2 (Properties of Kronecker Products) Prove (10.13). Exercise 10.3 (Eigenpairs of Kronecker Products (Exam Exercise 2008-3)) Let A, B ∈ Rn×n . Show that the eigenvalues of the Kronecker product A ⊗ B are products of the eigenvalues of A and B and that the eigenvectors of A ⊗ B are Kronecker products of the eigenvectors of A and B.

10.4.2 Exercises Sect. 10.3 Exercise 10.4 (2. Derivative Matrix Is Positive Definite) Write down the eigenvalues of T = tridiag(−1, 2, −1) using (10.15) and conclude that T is symmetric positive definite. Exercise 10.5 (1D Test Matrix Is Positive Definite?) Show that the matrix T 1 is symmetric positive definite if d > 0 and d ≥ 2|a|. Exercise 10.6 (Eigenvalues for 2D Test Matrix of Order 4) For m = 2 the matrix (10.10) is given by ⎡

2d a a ⎢ a 2d 0 A=⎢ ⎣ a 0 2d 0 a a

⎤ 0 a⎥ ⎥. a⎦ 2d

Show that λ = 2a + 2d is an eigenvalue corresponding to the eigenvector x = [1, 1, 1, 1]T . Verify that apart from a scaling of the eigenvector this agrees with (10.15) and (10.16) for j = k = 1 and m = 2. Exercise 10.7 (Nine Point Scheme for Poisson Problem) Consider the following 9 point difference approximation to the Poisson problem −Δu = f , u = 0 on the

10.4 Exercises Chap. 10

235

boundary of the unit square (cf. (10.1)) j, k = 1, . . . , m (a) −(h v)j,k = (μf )j,k (b) 0 = v0,k = vm+1,k = vj,0 = vj,m+1 , j, k = 0, 1, . . . , m + 1, (c) −(h v)j,k = [20vj,k − 4vj −1,k − 4vj,k−1 − 4vj +1,k − 4vj,k+1 − vj −1,k−1 − vj +1,k−1 − vj −1,k+1 − vj +1,k+1 ]/(6h2 ), (d)

(μf )j,k = [8fj,k + fj −1,k + fj,k−1 + fj +1,k + fj,k+1 ]/12. (10.21)

a) Write down the 4-by-4 system we obtain for m = 2. b) Find vj,k for j, k = 1, 2, if f (x, y) = 2π 2 sin (πx) sin (πy) and m = 2. Answer: vj,k = 5π 2 /66. It can be shown that (10.21) defines an O(h4 ) approximation to (10.1). Exercise 10.8 (Matrix Equation for Nine Point Scheme) Consider the nine point difference approximation to (10.1) given by (10.21) in Problem 10.7. a) Show that (10.21) is equivalent to the matrix equation 1 T V + V T − T V T = h2 μF . 6

(10.22)

Here μF has elements (μf )j,k given by (10.21d) and T = tridiag(−1, 2, −1). b) Show that the standard form of the matrix equation (10.22) is Ax = b, where A = T ⊗ I + I ⊗ T − 16 T ⊗ T , x = vec(V ), and b = h2 vec(μF ). Exercise 10.9 (Biharmonic Equation) Consider the biharmonic equation   Δ2 u(s, t) := Δ Δu(s, t) = f (s, t) (s, t) ∈ Ω, u(s, t) = 0, Δu(s, t) = 0 (s, t) ∈ ∂Ω.

(10.23)

Here Ω is the open unit square. The condition Δu = 0 is called the Navier boundary condition. Moreover, Δ2 u = uxxxx + 2uxxyy + uyyyy . a) Let v = −Δu. Show that (10.23) can be written as a system −Δv(s, t) = f (s, t) (s, t) ∈ Ω −Δu(s, t) = v(s, t) (s, t) ∈ Ω u(s, t) = v(s, t) = 0 (s, t) ∈ ∂Ω.

(10.24)

m×m , h = 1/(m + b) Discretizing, using (10.4), m with T = tridiag(−1, 2, −1) ∈ R 1), and F = f (j h, kh) j,k=1 we get two matrix equations

T V + V T = h2 F ,

T U + U T = h2 V .

236

10 The Kronecker Product

Show that (T ⊗ I + I ⊗ T )vec(V ) = h2 vec(F ),

(T ⊗ I + I ⊗ T )vec(U ) = h2 vec(V ).

and hence A = (T ⊗ I + I ⊗ T )2 is the matrix for the standard form of the discrete biharmonic equation. c) Show that with n = m2 the vector form and standard form of the systems in b) can be written T 2 U + 2T U T + U T 2 = h4 F

and Ax = b,

(10.25)

where A = T 2 ⊗I +2T ⊗T +I ⊗T 2 ∈ Rn×n , x = vec(U ), and b = h4 vec(F ). d) Determine the eigenvalues and eigenvectors of the matrix A in c) and show that it is positive definite. Also determine the bandwidth of A. e) Suppose we want to solve the standard form equation Ax = b. We have two representations for the matrix A, the product one in b) and the one in c). Which one would you prefer for the basis of an algorithm? Why?

10.5 Review Questions 10.5.1 Consider the Poisson matrix. • Write this matrix as a Kronecker sum, • how are its eigenvalues and eigenvectors related to the second derivative matrix? • is it symmetric? positive definite? 10.5.2 What are the eigenpairs of T 1 := tridiagonal(a, d, a)? 10.5.3 What are the inverse and transpose of a Kronecker product? 10.5.4 • give an economical general way to solve the linear system (A ⊗ B) vec(V ) = vec(F )? • Same for (A ⊗ I s + I r ⊗ B) vec(V ) = vec(F ).

Chapter 11

Fast Direct Solution of a Large Linear System

11.1 Algorithms for a Banded Positive Definite System In this chapter we present a fast method for solving Ax = b, where A is the Poisson matrix (10.8). Thus, for n = 9 ⎡

⎤ 4 −1 0 −1 0 0 0 0 0 ⎢ −1 4 −1 0 −1 0 0 0 0 ⎥ ⎢ ⎥ ⎢ 0 −1 4 0 0 −1 0 0 0 ⎥ ⎢ ⎥ ⎢ −1 0 0 4 −1 0 −1 0 0 ⎥ ⎢ ⎥ ⎢ ⎥ A = ⎢ 0 −1 0 −1 4 −1 0 −1 0 ⎥ ⎢ ⎥ ⎢ 0 0 −1 0 −1 4 0 0 −1 ⎥ ⎢ ⎥ ⎢ 0 0 0 −1 0 0 4 −1 0 ⎥ ⎢ ⎥ ⎣ 0 0 0 0 −1 0 −1 4 −1 ⎦ 0 0 0 0 0 −1 0 −1 4 ⎡ ⎤ T + 2I −I 0 =⎣ −I T + 2I −I ⎦ , 0 −I T + 2I where T = tridiag(−1, 2, −1). For the matrix A we know by now that 1. 2. 3. 4. 5.

It is positive definite. It is banded. It is block-tridiagonal. We know the eigenvalues and eigenvectors of A. The eigenvectors are orthogonal.

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_11

237

238

11 Fast Direct Solution of a Large Linear System

11.1.1 Cholesky Factorization Since A is positive definite we can use the Cholesky factorization A = LL∗ , with√L lower triangular, to solve Ax = b. Since A and L has the same bandwidth d = n the complexity of this factorization is O(nd 2 ) = O(n2 ), cf. Algorithm 4.2. We need to store A, and this can be done in sparse form. The nonzero elements in L are shown in Fig. 11.1 for n = 100. Note that most of the zeros between the diagonals in A have become nonzero in L. This is known as fill-inn.

11.1.2 Block LU Factorization of a Block Tridiagonal Matrix The Poisson matrix has a block tridiagonal structure. Consider finding the block LU factorization of a block tridiagonal matrix. We are looking for a factorization of the form ⎡D C ⎤ ⎡ I ⎤ ⎡ U 1 C1 ⎤ 1 1 L1 I ⎢ A1 D 2 C 2 ⎥ . . .. .. .. ⎢ .. .. ⎥=⎣ . . ⎦⎣ ⎦. (11.1) . . . ⎣ ⎦ .. .. U C m−1 m−1 A D C m−2

m−1

m−1

Am−1 D m

Lm−1 I

Fig. 11.1 Fill-inn in the Cholesky factor of the Poisson matrix (n = 100)

Um

11.2 A Fast Poisson Solver Based on Diagonalization

239

Here D 1 , . . . , D m and U 1 , . . . , U m are square matrices while A1 , . . . , Am−1 ,L1 , . . .,Lm−1 and C 1 , . . . , C m−1 can be rectangular. Using block multiplication the formulas (2.16) generalize to U 1 = D1,

Lk = Ak U −1 k ,

U k+1 = D k+1 − Lk C k ,

k = 1, 2, . . . , m − 1. (11.2)

To solve the system Ax = b we partition b conformally with A in the form bT = [bT1 , . . . , bTm ]. The formulas for solving Ly = b and U x = y are as follows: y 1 = b1 , x m = U −1 m ym,

y k = bk − Lk−1 y k−1 , x k = U −1 k (y k − C k x k+1 ),

k = 2, 3, . . . , m, k = m − 1, . . . , 2, 1.

(11.3)

The solution is then x T = [x T1 , . . . , x Tm ]. To find Lk in (11.2) we solve the linear systems Lk U k = Ak . Similarly we need to solve a linear system to find x k in (11.3). The number of arithmetic operations using block factorizations is O(n2 ), asymptotically the same as for Cholesky factorization. However we only need to store the m × m blocks, and using matrix operations can be advantageous.

11.1.3 Other Methods Other methods include • Iterative methods, (we study this in Chaps. 12 and 13), • multigrid. See [5], • fast solvers based on diagonalization and the fast Fourier transform. See Sects. 11.2, 11.3.

11.2 A Fast Poisson Solver Based on Diagonalization The algorithm we now derive will only require O(n3/2 ) arithmetic operations and we only need to work with matrices of order m. Using the fast Fourier transform the number of arithmetic operations can be reduced further to O(n log n). To start we recall that Ax = b can be written as a matrix equation in the form (cf. (10.5)) T V + V T = h2 F

with h = 1/(m + 1),

where T = tridiag(−1, 2, −1) ∈ Rm×m is the second derivative matrix, V = (vj,k ) ∈ Rm×m are the unknowns, and F = (fj,k ) = (f (j h, kh)) ∈ Rm×m contains function values.

240

11 Fast Direct Solution of a Large Linear System

Recall (cf. Lemma 2.2) that the eigenpairs of T are given by T s j = λj s j ,

j = 1, . . . , m,

s j = [sin (j πh), sin (2j πh), . . . , sin (mj πh)]T , λj = 2 − 2 cos(j πh) = 4 sin2 (j πh/2),

h = 1/(m + 1),

s Tj s k = δj,k /(2h) for all j, k. Let  m S := [s 1 , . . . , s m ] = sin (j kπh) j,k=1 ∈ Rm×m ,

D = diag(λ1 , . . . , λm ). (11.4)

Then T S = [T s 1 , . . . , T s m ] = [λ1 s 1 , . . . , λm s m ] = SD,

S2 = ST S =

1 I. 2h

Define X ∈ Rm×m by V = SXS, where V is the solution of T V + V T = h2 F . Then T V + V T = h2 F V =SXS

⇐⇒ T SXS + SXST = h2 F

S( )S

⇐⇒ ST SXS 2 + S 2 XST S = h2 SF S = h2 G T S=SD

⇐⇒ S 2 DXS 2 + S 2 XS 2 D = h2 G S 2 =I /(2h)

⇐⇒

DX + XD = 4h4 G.

Since D is diagonal, the equation DX + XD = 4h4 G, is easy to solve. For the j, k element we find (DX + XD)j,k =

m

=1

dj, x,k +

m

xj, d,k = λj xj,k + λk xj,k

=1

so that for all j, k xj,k = 4h4 gj,k /(λj + λk ) = h4 gj,k /(σj + σk ),

σj := λj /4 = sin2 (j πh/2).

11.2 A Fast Poisson Solver Based on Diagonalization

241

Thus to find V we compute 1. G = SF S, 2. xj,k = h4 gj,k /(σj + σk ), 3. V = SXS.

j, k = 1, . . . , m,

We can compute X, S and the σ ’s without using loops. Using outer products, element by element division, and raising a matrix element by element to a power we find  σ1    1 4 X = h G/M, where M := ... [ 1, ..., 1 ] + ... [ σ1 . ... .σm ] , σm 1 ⎡ ⎤ ⎡ ⎤ 1 1   πh 2   2 ⎣ . ⎦ ∧ 2. S = sin πh ⎣ .. ⎦ [ 1 2 ... m ] , σ = sin .. 2 . m

m

We now get the following algorithm to solve numerically the Poisson problem −Δu = f on Ω = (0, 1)2 and u = 0 on ∂Ω using the 5-point scheme, i.e., let m ∈ N, h = 1/(m + 1), and F = (f (j h, kh)) ∈ Rm×m . We compute V ∈ R(m+2)×(m+2) using diagonalization of T = tridiag(−1, 2, −1) ∈ Rm×m .

function V=fastpoisson(F) %function V=fastpoisson(F) m=length(F); h=1/(m+1); hv=pi*h*(1:m)’; sigma=sin(hv/2).^2; S=sin(hv*(1:m)); G=S*F*S; X=h^4*G./(sigma*ones(1,m)+ ones(m,1)*sigma’); V=zeros(m+2,m+2); V(2:m+1,2:m+1)=S*X*S; end Listing 11.1 fastpoisson

The formulas are fully vectorized. Since the 6th line in Algorithm 11.1 only requires O(m3 ) arithmetic operations, the complexity of this algorithm is for large m determined by 4 m-by-m matrix multiplications and is given by O(4 × 2m3 ) = O(8n3/2).1 The method is very fast and will be used as a preconditioner for a more complicated problem in Chap. 13. In 2012 it took about 0.2 seconds on a laptop to find the 106 unknowns vj,k on a 1000 × 1000 grid.

1 It is possible to compute V using only two matrix multiplications and hence reduce the complexity to O(4n3/2 ). This is detailed in Problem 11.4.

242

11 Fast Direct Solution of a Large Linear System

11.3 A Fast Poisson Solver Based on the Discrete Sine and Fourier Transforms In Algorithm 11.1 we need to compute the product of the sine matrix S ∈ Rm×m given by (11.4) and a matrix A ∈ Rm×m . Since the matrices are m-by-m this will normally require O(m3 ) operations. In this section we show that it is possible to calculate the products SA and AS in O(m2 log2 m) operations. We need to discuss certain transforms known as the discrete sine transform, the discrete Fourier transform and the fast Fourier transform. In addition we have the discrete cosine transform which will not be discussed here. These transforms are of independent interest. They have applications to signal processing and image analysis, and are often used when one is dealing with discrete samples of data on a computer.

11.3.1 The Discrete Sine Transform (DST) Given v = [v1 , . . . , vm ]T ∈ Rm we say that the vector w = [w1 , . . . , wm ]T given by wj =

m

k=1

sin

 j kπ  vk , m+1

j = 1, . . . , m

is the discrete sine transform (DST) of v. In matrix form we can write the DST as the matrix times vector w = Sv, where S is the sine matrix given by (11.4). We can then identify the matrix B = SA as the DST of A ∈ Rm,n , i.e. as the DST of the columns of A. The product B = AS can also be interpreted as a DST. Indeed, since S is symmetric we have B = (SAT )T which means that B is the transpose of the DST of the rows of A. It follows that we can compute the unknowns V in Algorithm 11.1 by carrying out discrete sine transforms on 4 m-by-m matrices in addition to the computation of X.

11.3.2 The Discrete Fourier Transform (DFT) The fast computation of the DST is based on its relation to the discrete Fourier transform (DFT) and the fact that the DFT can be computed by a technique known as the fast Fourier transform (FFT). To define the DFT let for N ∈ N ωN = exp−2πi/N = cos(2π/N) − i sin(2π/N),

(11.5)

11.3 A Fast Poisson Solver Based on the Discrete Sine and Fourier Transforms

243

√ where i = −1 is the imaginary unit. Given y = [y1 , . . . , yN ]T ∈ RN we say that z = [z1 , . . . , zN ]T given by zj +1 =

N−1

jk

ωN yk+1 ,

j = 0, . . . , N − 1,

k=0

is the discrete Fourier transform (DFT) of y. We can write this as a matrix times vector product z = F N y, where the Fourier matrix F N ∈ CN×N has elements jk ωN , j, k = 0, 1, . . . , N − 1. For a matrix we say that B = F N A is the DFT of A. As an example, since ω4 = exp−2πi/4 = cos(π/2) − i sin(π/2) = −i we find ω42 = (−i)2 = −1, ω43 = (−i)(−1) = i, ω44 = (−1)2 = 1, ω46 = i 2 = −1, ω49 = i 3 = −i, and so ⎡

1 1 ⎢ 1 ω4 F4 = ⎢ ⎣ 1 ω2 4 1 ω43

1 ω42 ω44 ω46

⎤ ⎤ ⎡ 1 1 1 1 1 ⎢ ⎥ ω43 ⎥ ⎥ = ⎢ 1 −i −1 i ⎥ . 6 ⎣ ⎦ 1 −1 1 −1 ⎦ ω4 9 1 i −1 −i ω4

(11.6)

The following lemma shows how the discrete sine transform of order m can be computed from the discrete Fourier transform of order 2m + 2. We recall that for any complex number w sin w =

eiw − e−iw . 2i

Lemma 11.1 (Sine Transform as Fourier Transform) Given a positive integer m and a vector x ∈ Rm . Component k of Sx is equal to i/2 times component k + 1 of F 2m+2 z where zT = [0, x T , 0, −x TB ] ∈ R2m+2 ,

x TB := [xm , . . . , x2 , x1 ].

In symbols (Sx)k =

i (F 2m+2 z)k+1 , 2

k = 1, . . . , m.

Proof Let ω = ω2m+2 = e−2πi/(2m+2) = e−πi/(m+1) . We note that ωj k = e−πij k/(m+1) ,

ω(2m+2−j )k = e−2πi eπij k/(m+1) = eπij k/(m+1) .

244

11 Fast Direct Solution of a Large Linear System

Component k + 1 of F 2m+2 z is then given by (F 2m+2 z)k+1 =

2m−1

ωj k zj +1 =

j =0

=

m

m

xj ω j k −

j =1

m

xj ω(2m+2−j )k

j =1

  xj e−πij k/(m+1) − eπij k/(m+1)

j =1

= −2i

m

j =1



j kπ xj sin m+1

= −2i(S m x)k .

Dividing both sides by −2i and noting −1/(2i) = −i/(2i 2) = i/2, proves the lemma.   It follows that we can compute the DST of length m by extracting m components from the DFT of length N = 2m + 2.

11.3.3 The Fast Fourier Transform (FFT) From a linear algebra viewpoint the fast Fourier transform is a quick way to compute the matrix- vector product F N y. Suppose N is even. The key to the FFT is a connection between F N and F N/2 which makes it possible to compute the FFT of order N as two FFT’s of order N/2. By repeating this process we can reduce the number of arithmetic operations to compute a DFT from O(N 2 ) to O(N log2 N). Suppose N is even. The connection between F N and F N/2 involves a permutation matrix P N ∈ RN×N given by P N = [e1 , e3 , . . . , eN−1 , e2 , e4 , . . . , eN ], where the ek = (δj,k ) are unit vectors. If A is a matrix with N columns [a 1 , . . . , a N ] then AP N = [a 1 , a 3 , . . . , a N−1 , a 2 , a 4 , . . . , a N ], i.e. post multiplying A by P N permutes the columns of A so that all the odd-indexed columns are followed by all the even-indexed columns. For example we have from (11.6) ⎡

1 ⎢0 P 4 = [e1 e3 e2 e4 ] = ⎢ ⎣0 0

00 01 10 00

⎤ 0 0⎥ ⎥ 0⎦ 1



⎤ 1 1 1 1 ⎢ 1 −1 −i i ⎥ ⎥ F 4P 4 = ⎢ ⎣ 1 1 −1 −1 ⎦ , 1 −1 i −i

11.3 A Fast Poisson Solver Based on the Discrete Sine and Fourier Transforms

245

where we have indicated a certain block structure of F 4 P 4 . These blocks can be related to the 2-by-2 matrix F 2 . We define the diagonal scaling matrix D 2 by  D 2 = diag(1, ω4 ) =

 1 0 . 0 −i

Since ω2 = exp−2πi/2 = −1 we find  F2 =

 1 1 , 1 −1

 D2F 2 =

 11 , −i i

and we see that  F 2 D2 F 2 . F 4P 4 = F 2 −D 2 F 2 

This result holds in general. Theorem 11.1 (Fast Fourier Transform) If N = 2m is even then 

F 2m P 2m

 F m Dm F m = , F m −D m F m

(11.7)

where m−1 2 D m = diag(1, ωN , ωN , . . . , ωN ).

(11.8)

Proof Fix integers p, q with 1 ≤ p, q ≤ m and set j := p − 1 and k := q − 1. Since jk

j

jk

m 2k k m = 1, ω2m = ωm , ω2m = −1, (F m )p,q = ωm , (D m F m )p,q = ω2m ωm , ωm

we find by considering elements in the four sub-blocks in turn (F 2m P 2m )p,q (F 2m P 2m )p+m,q (F 2m P 2m )p,q+m (F 2m P 2m )p+m,q+m

= = = =

j (2k)

ω2m (j +m)(2k) ω2m j (2k+1) ω2m (j +m)(2k+1) ω2m

= = = =

jk

ωm , (j +m)k jk ωm = ωm , j jk ω2m ωm , j +m (j +m)k j jk ω2m ωm =−ω2m ωm .

It follows that the four m-by-m blocks of F 2m P 2m have the required structure.

 

Using Theorem 11.1 we can carry out the DFT as a block multiplication. Let y ∈ R2m and set w = P T2m y = [wT1 , wT2 ]T , where wT1 = [y1, y3 , . . . , y2m−1 ],

wT2 = [y2 , y4 , . . . , y2m ].

246

11 Fast Direct Solution of a Large Linear System

Then F 2m y = F 2m P 2m P T2m y = F 2m P 2m w      F m DmF m w1 q1 + q2 = , = w2 q1 − q2 F m −D m F m where q 1 = F m w1 ,

and q 2 = D m (F m w2 ).

In order to compute F 2m y we need to compute F m w1 and F m w2 . Thus, by combining two FFT’s of order m we obtain an FFT of order 2m. If n = 2k then this process can be applied recursively as in the following MATLAB function: function z=fftrec(y) %function z=fftrec(y) y=y(:); n=length(y); if n==1 z=y; else q1=fftrec(y(1:2:n-1)); q2=exp(-2*pi*1i/n).^(0:n/2-1)’.*fftrec(y(2:2:n)); z=[q1+q2; q1-q2]; end Listing 11.2 fftrec

Statement 3 is included so that the input y ∈ Rn can be either a row or column vector, while the output z is a column vector. Such a recursive version of FFT is useful for testing purposes, but is much too slow for large problems. A challenge for FFT code writers is to develop nonrecursive versions and also to handle efficiently the case where N is not a power of two. We refer to [14] for further details. The complexity of the FFT is given by γ N log2 N for some constant γ independent of N. To show this for the special case when N is a power of two let xk be the complexity (the number of arithmetic operations) when N = 2k . Since we need two FFT’s of order N/2 = 2k−1 and a multiplication with the diagonal matrix D N/2 , it is reasonable to assume that xk = 2xk−1 + γ 2k for some constant γ independent of k. Since x0 = 0 we obtain by induction on k that xk = γ k2k . Indeed, this holds for k = 0 and if xk−1 = γ (k − 1)2k−1 then xk = 2xk−1 + γ 2k = 2γ (k − 1)2k−1 + γ 2k = γ k2k . Reasonable implementations of FFT typically have γ ≈ 5, see [14]. The efficiency improvement using the FFT to compute the DFT is spectacular for large N. The direct multiplication F N y requires O(8n2 ) arithmetic operations since

11.4 Exercises Chap. 11

247

complex arithmetic is involved. Assuming that the FFT uses 5N log2 N arithmetic operations we find for N = 220 ≈ 106 the ratio 8N 2 ≈ 84000. 5N log2 N Thus if the FFT takes one second of computing time and the computing time is proportional to the number of arithmetic operations then the direct multiplication would take something like 84000 seconds or 23 hours.

11.3.4 A Poisson Solver Based on the FFT We now have all the ingredients to compute the matrix products SA and AS using FFT’s of order 2m + 2 where m is the order of S and A. This can then be used for quick computation of the exact solution V of the discrete Poisson problem in Algorithm 11.1. We first compute H = SF using Lemma 11.1 and m FFT’s, one for each of the m columns of F . We then compute G = H S by m FFT’s, one for each of the rows of H . After X is determined we compute Z = SX and V = ZS by another 2m FFT’s. In total the work amounts to 4m FFT’s of order 2m + 2. Since one FFT requires O(γ (2m + 2) log2 (2m + 2)) arithmetic operations the 4m FFT’s amount to 8γ m(m + 1) log2 (2m + 2) ≈ 8γ m2 log2 m = 4γ n log2 n, where n = m2 is the size of the linear system Ax = b we would be solving if Cholesky factorization was used. This should be compared to the O(8n3/2 ) arithmetic operations used in Algorithm 11.1 requiring 4 straightforward matrix multiplications with S. What is faster will depend heavily on the programming of the FFT and the size of the problem. We refer to [14] for other efficient ways to implement the DST.

11.4 Exercises Chap. 11 11.4.1 Exercises Sect. 11.3 Exercise 11.1 (Fourier Matrix) Show that the Fourier matrix F 4 is symmetric, but not Hermitian. Exercise 11.2 (Sine Transform as Fourier Transform) Verify Lemma 11.1 directly when m = 1.

248

11 Fast Direct Solution of a Large Linear System

Exercise 11.3 (Explicit Solution of the Discrete Poisson Equation) Show that the exact solution of the discrete Poisson equation (10.5) can be written V = (vi,j )m i,j =1 , where  ipπ   j rπ   kpπ   lrπ  m m m m

sin m+1 sin m+1 sin m+1 sin m+1 1 vij = fk,l . !  " !  "2 (m + 1)4 pπ  2 rπ p=1 r=1 k=1 l=1 sin 2(m+1) + sin 2(m+1) Exercise 11.4 (Improved Version of Algorithm 11.1) Algorithm 11.1 involves multiplying a matrix by S four times. In this problem we show that it is enough to multiply by S two times. We achieve this by diagonalizing only the second T in T V + V T = h2 F . Let D = diag(λ1 , . . . , λm ), where λj = 4 sin2 (j πh/2), j = 1, . . . , m. a) Show that T X + XD = C, where X = V S, and C = h2 F S. b) Show that (T + λj I )x j = cj

j = 1, . . . , m,

(11.9)

where X = [x 1 , . . . , x m ] and C = [c1 , . . . , cm ]. Thus we can find X by solving m linear systems, one for each of the columns of X. Recall that a tridiagonal m × m system can be solved by Algorithms 2.1 and 2.2 in 8m − 7 arithmetic operations. Give an algorithm to find X which only requires O(δm2 ) arithmetic operations for some constant δ independent of m. c) Describe a method to compute V which only requires O(4m3 ) = O(4n3/2 ) arithmetic operations. d) Describe a method based on the fast Fourier transform which requires O(2γ n log2 n) where γ is the same constant as mentioned at the end of the last section. Exercise 11.5 (Fast Solution of 9 Point Scheme) Consider the equation 1 T V + V T − T V T = h2 μF , 6 that was derived in Exercise 10.8 for the 9-point scheme. Define the matrix X by V = SXS = (xj,k ) where V is the solution of (10.22). Show that 1 DX + XD − DXD = 4h4 G, where G = SμF S, 6

11.4 Exercises Chap. 11

249

where D = diag(λ1 , . . . , λm ), with λj = 4 sin2 (j πh/2), j = 1, . . . , m, and that xj,k =

h4 gj,k σj + σk −

2 3 σj σk

  , where σj = sin2 (j πh)/2 for j, k = 1, 2, . . . , m.

Show that σj + σk − 23 σj σk > 0 for j, k = 1, 2, . . . , m. Conclude that the matrix A in Exercise 10.8 b) is symmetric positive definite and that (10.21) always has a solution V . Exercise 11.6 (Algorithm for Fast Solution of 9 Point Scheme) Derive an algorithm for solving (10.21) which for large m requires essentially the same number of operations as in Algorithm 11.1. (We assume that μF already has been formed). Exercise 11.7 (Fast Solution of Biharmonic Equation) For the biharmonic problem we derived in Exercise 10.9 the equation T 2 U + 2T U T + U T 2 = h4 F . Define the matrix X = (xj,k ) by U = SXS where U is the solution of (10.25). Show that D 2 X + 2DXD + XD 2 = 4h6 G, where G = SF S, and that xj,k =

h6 gj,k , where σj = sin2 ((j πh)/2) for j, k = 1, 2, . . . , m. 4(σj + σk )2

Exercise 11.8 (Algorithm for Fast Solution of Biharmonic Equation) Use Exercise 11.7 to derive an algorithm function U=simplefastbiharmonic(F) which requires only O(δn3/2) operations to find U in Problem 10.9. Here δ is some constant independent of n. Exercise 11.9 (Check Algorithm for Fast Solution of Biharmonic Equation) In Exercise 11.8 compute the solution U corresponding to F = ones(m,m). For some small m’s check that you get the same solution obtained by solving the standard form Ax = b in (10.25). You can use x = A\b for solving Ax = b. 2 Use F(:) to vectorize a matrix and reshape(x,m,m) to turn a vector x ∈ Rm into an m×m matrix. Use the MATLAB command surf(U) for plotting U for, say, m = 50. Compare the result with Exercise 11.8 by plotting the difference between both matrices.

250

11 Fast Direct Solution of a Large Linear System

Exercise 11.10 (Fast Solution of Biharmonic Equation Using 9 Point Rule) Repeat Exercises 10.9, 11.8 and 11.9 using the nine point rule (10.21) to solve the system (10.24).

11.5 Review Questions 11.5.1 Consider the Poisson matrix. • What is the bandwidth of its Cholesky factor? • approximately how many arithmetic operations does it take to find the Cholesky factor? • same question for block LU, • same question for the fast Poisson solver with and without FFT. 11.5.2 What is the discrete sine transform and discrete Fourier transform of a vector?

Part V

Iterative Methods for Large Linear Systems

Gaussian elimination, LU and Cholesky factorization are direct methods. In absence of rounding errors they are used to find the exact solution of a linear system using a finite number of arithmetic operations. In an iterative method we start with an approximation x 0 to the exact solution x and then compute a sequence {x k } such that hopefully x k → x. Iterative methods are mainly used for large sparse systems, i.e., where many of the elements in the coefficient matrix are zero. The main advantages of iterative methods are reduced storage requirements and ease of implementation. In an iterative method the main work in each iteration is a matrix times vector multiplication, an operation which often does not need storing the matrix, not even in sparse form. In this part we consider the iterative methods of Jacobi, Gauss-Seidel, successive over relaxation (SOR), steepest descent and conjugate gradients.

Chapter 12

The Classical Iterative Methods

In this chapter we consider the classical iterative methods of Richardson, Jacobi, Gauss-Seidel and an accelerated version of Gauss-Seidel’s method called successive overrelaxation (SOR). David Young developed in his thesis a beautiful theory describing the convergence rate of SOR, see [22]. We give the main points of this theory specialized to the discrete Poisson matrix. With a careful choice of an acceleration parameter the amount of work using SOR on the discrete Poisson problem is the same as for the fast Poisson solver without FFT (cf. Algorithm 11.1). Moreover, SOR is not restricted to constant coefficient methods on a rectangle. However, to obtain fast convergence using SOR it is necessary to have a good estimate for an acceleration parameter. For convergence we need to study convergence of powers of matrices. In this chapter we only use matrix norms which are consistent on Cn×n and subordinate to a vector norm on Cn , (cf. Definitions 8.4 and 8.5).

12.1 Classical Iterative Methods; Component Form We start with an example showing how a linear system can be solved using an iterative method. Example 12.1 (Iterative Methods on a Special   y  2 × 12Matrix) Solving for the diag2 −1 onal elements, the linear system −1 z = 1 can be written in component 2 form as y = (z + 1)/2 and z = (y + 1)/2. Starting with y0 , z0 we generate two sequences {yk } and {zk } using the difference equations yk+1 = (zk + 1)/2 and zk+1 = (yk + 1)/2. This is an example of Jacobi’s method. If y0 = z0 = 0 then we find y1 = z1 = 1/2 and in general yk = zk = 1 − 2−k for k = 0, 1, 2, 3, . . .. The iteration converges to the exact solution [1, 1]T , and the error is halved in each iteration. © Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_12

253

254

12 The Classical Iterative Methods

We can improve the convergence rate by using the most current approximation in each iteration. This leads to Gauss-Seidel’s method: yk+1 = (zk + 1)/2 and zk+1 = (yk+1 + 1)/2. If y0 = z0 = 0 then we find y1 = 1/2, z1 = 3/4, y2 = 7/8, z2 = 15/16, and in general yk = 1 − 2 · 4−k and zk = 1 − 4−k for k = 1, 2, 3, . . .. The error is now reduced by a factor 4 in each iteration. Consider the general case. Suppose A ∈ Cn×n is nonsingular and b ∈ Cn . Suppose we know an approximation x k = [x k (1), . . . , x k (n)]T to the exact solution x of Ax = b. We need to assume that the rows are ordered so that A has nonzero diagonal elements. Solving the ith equation of Ax = b for x(i), we obtain a fixedpoint form of Ax = b i−1 n

  x(i) = − aij x(j ) − aij x(j ) + bi /aii , j =1

i = 1, 2, . . . , n.

(12.1)

j =i+1

1. In Jacobi’s method (J method) we substitute x k into the right hand side of (12.1) and compute a new approximation by i−1 n

  x k+1 (i) = − aij x k (j ) − aij x k (j ) + bi /aii , for i = 1, 2, . . . , n. j =1

j =i+1

(12.2) 2. Gauss-Seidel’s method (GS method) is a modification of Jacobi’s method, where we use the new x k+1 (i) immediately after it has been computed. i−1 n

  x k+1 (i) = − aij x k+1 (j ) − aij x k (j ) + bi /aii , for i = 1, 2, . . . , n. j =1

j =i+1

(12.3) 3. The Successive overrelaxation method (SOR method) is obtained by introducing an acceleration parameter 0 < ω < 2 in the GS method. We write x(i) = ωx(i) + (1 − ω)x(i) and this leads to the method i−1 n

  aij x k+1 (j ) − aij x k (j ) + bi /aii + (1 − ω)x k (i). x k+1 (i) = ω − j =1

j =i+1

(12.4) The SOR method reduces to the Gauss-Seidel method for ω = 1. Denoting the gs gs right hand side of (12.3) by x k+1 we can write (12.4) as x k+1 = ωx k+1 + (1 − ω)x k , and we see that x k+1 is located on the straight line passing through the gs two points x k+1 and x k . The restriction 0 < ω < 2 is necessary for convergence

12.1 Classical Iterative Methods; Component Form

255

(cf. Theorem 12.6). Normally, the best results are obtained for the relaxation parameter ω in the range 1 ≤ ω < 2 and then x k+1 is computed by linear gs extrapolation, i.e., it is not located between x k+1 and x k . 4. We mention also briefly the symmetric successive overrelaxation method SSOR. One iteration in SSOR consists of two SOR sweeps. A forward SOR sweep (12.4), computing an approximation denoted x k+1/2 instead of x k+1 , is followed by a backward SOR sweep computing i−1 n

  x k+1 (i) = ω − aij x k+1/2 (j ) − aij x k+1 (j ) + bi /aii + (1 − ω)x k+1/2 (i) j =1

j =i+1

(12.5) in the order i = n, n − 1, . . . 1. The method is slower and more complicated than the SOR method. Its main use is as a symmetric preconditioner. For if A is symmetric then SSOR combines the two SOR steps in such a way that the resulting iteration matrix is similar to a symmetric matrix. We will not discuss this method any further here and refer to Sect. 13.6 for an alternative example of a preconditioner. We will refer to the J, GS and SOR methods as the classical (iterative) methods.

12.1.1 The Discrete Poisson System Consider the classical methods applied to the discrete Poisson matrix A ∈ Rn×n given by (10.8). Let n = m2 and set h = 1/(m + 1). In component form the linear system Ax = b can be written (cf. (10.4)) 4v(i, j )−v(i−1, j )−v(i+1, j )−v(i, j−1)−v(i, j+1) = h2 fi,j ,

i, j =1, . . . , m,

with homogenous boundary conditions also given in (10.4). Solving for v(i, j ) we obtain the fixed point form   v(i, j ) = v(i−1, j ) + v(i+1, j ) + v(i, j−1) + v(i, j+1) + ei,j /4, (12.6) where ei,j := fi,j /(m + 1)2 . The J, GS , and SOR methods take the form  J : v k+1 (i, j ) = v k (i−1, j ) + v k (i, j−1) + v k (i+1, j ) + v k (i, j+1)

 + e(i, j ) /4

 GS : v k+1 (i, j ) = v k+1 (i−1, j ) + v k+1 (i, j−1) + v k (i+1, j ) + v k (i, j+1)  + e(i, j ) /4

256

12 The Classical Iterative Methods

Table 12.1 The number of iterations kn to solve the discrete Poisson problem with n unknowns using the methods of Jacobi, Gauss-Seidel, and SOR (see text) with a tolerance 10−8 k100 385 194 35

J GS SOR

k2500 8386 4194 164

k10 000

k40 000

k160 000

324

645

1286

 SOR : v k+1 (i, j ) = ω v k+1 (i−1, j ) + v k+1 (i, j − 1) + v k (i + 1, j )  + v k (i, j + 1) + e(i, j ) /4 + (1 − ω)v k (i, j ). (12.7) We note that for GS and SOR we have used the natural ordering, i.e., (i1 , j1 ) < (i2 , j2 ) if and only if j1 ≤ j2 and i1 < i2 if j1 = j2 . For the J method any ordering can be used. In Algorithm 12.1 we give a MATLAB program to test the convergence of Jacobi’s method on the discrete Poisson problem. We carry out Jacobi iterations on the linear system (12.6) with F = (fi,j ) ∈ Rm×m , starting with V 0 = 0 ∈ R(m+2)×(m+2) . The output is the number of iterations k, to obtain V (k) − U M := maxi,j |vij − uij | < tol. Here [uij ] ∈ R(m+2)×(m+2) is the ”exact" solution of (12.6) computed using the fast Poisson solver in Algorithm 11.1. We set k = K + 1 if convergence is not obtained in K iterations. In Table 12.1 we show the output k = kn from this algorithm using F = ones(m, m) for m = 10, 50, K = 104 , and tol = 10−8 . We also show the number of iterations for Gauss-Seidel  and SORπwith  a value of ω known as the optimal acceleration parameter ω∗ := 2/ 1 + sin( m+1 ) . We will derive this value later. function k=jdp(F,K,tol) % k=jdp(F,K,tol) m=length(F); U=fastpoisson(F); V=zeros(m+2,m+2); E=F/(m+1)^2; for k=1:K V(2:m+1,2:m+1)=(V(1:m,2:m+1)+V(3:m+2,2:m+1)... +V(2:m+1,1:m)+V(2:m+1,3:m+2)+E)/4; if max(max(abs(V-U))) ρ(I − αo A) = λ1 κ := , λn

(12.19)

262

12 The Classical Iterative Methods

Fig. 12.1 The functions α → |1 − αλ1 | and α → |1 − αλn |

Proof The eigenvalues of I − αA are 1 − αλj ,

ρα := ρ(I − αA) := max|1 − αλj | = j

j = 1, . . . , n. We have

⎧ ⎪ ⎪ ⎨1 − αλ1 ,

1 − αλn , ⎪ ⎪ ⎩αλ − 1, 1

if α ≤ 0 if 0 < α ≤ αo if α > αo ,

see Fig. 12.1. Clearly 1 − αλn = αλ1 − 1 for α = αo and ραo = αo λ1 − 1 =

λ1 − λn κ −1 < 1. = λ1 + λn κ +1

We have ρα < 1 if and only if α > 0 and αλ1 − 1 < 1 showing convergence if and only if 0 < α < 2/λ1 and ρα > ραo for α ≤ 0 and α ≥ 2/λ1 . Finally, if 0 < α < αo then ρα = 1 − αλn > 1 − αo λn = ραo and if αo < α < 2/λ1 then ρα = αλ1 − 1 > αo λ1 − 1 = ραo .   For a positive definite matrix we obtain Corollary 12.1 (Rate of Convergence for the R Method) Suppose A is positive definite with largest and  smallest  eigenvalue λmax and λmin , respectively. Richardson’s method x k+1 = I − αA x k + b converges if and only if 0 < α < 2/λmax . 2 With α = αo := λmax +λ we have the error estimate min x k − x2 ≤

 κ − 1 k x 0 − x2 , κ +1

k = 0, 1, 2, . . .

(12.20)

where κ := λmax /λmin is the spectral condition number of A. Proof The spectral norm  2 is consistent and therefore x k − x2 ≤ I − αo Ak2 x 0 − x2 . But for a positive definite matrix the spectral norm is equal to the spectral radius and the result follows form (12.19).  

12.3 Convergence

263

12.3.2 Convergence of SOR The condition ω ∈ (0, 2) is necessary for convergence of the SOR method. Theorem 12.6 (Necessary Condition for Convergence of SOR) Suppose A ∈ Cn×n is nonsingular with nonzero diagonal elements. If the SOR method applied to A converges then ω ∈ (0, 2).   Proof We have (cf. (12.13)) Dx k+1 = ω AL x k+1 + AR x k + b + (1 − ω)Dx k   or x k+1 = ω Lx k+1 + Rx k + D −1 b + (1 − ω)x k , where L := D −1 AL and R := D −1 AR . Thus (I − ωL)x k+1 = (ωR + (1 − ω)I )x k + ωD −1 b so the following form of the iteration matrix is obtained   Gω = (I − ωL)−1 ωR + (1 − ω)I .

(12.21)

We next compute the determinant of Gω . Since I − ωL is lower triangular with ones on the diagonal, the same holds for the inverse by Lemma 2.5, and therefore the determinant of this matrix is equal to one. The matrix ωR + (1 − ω)I is upper triangular with 1 − ω on the diagonal and therefore its determinant equals (1 − ω)n . It follows that det(Gω ) = (1 − ω)n . Since the determinant of a matrix equals the product of its eigenvalues we must have |λ| ≥ |1 − ω| for at least one eigenvalue λ of Gω and we conclude that ρ(Gω ) ≥ |ω − 1|. But then ρ(Gω ) ≥ 1 if ω is not in the interval (0, 2) and by Theorem 12.4 SOR diverges.   The SOR method always converges for a positive definite matrix. Theorem 12.7 (SOR on Positive Definite Matrix) SOR converges for a positive definite matrix A ∈ Rn×n if and only if 0 < ω < 2. In particular, Gauss-Seidel’s method converges for a positive definite matrix. Proof By Theorem 12.6 convergence implies 0 < ω < 2. Suppose now 0 < ω < 2 and let (λ, x) be an eigenpair for Gω . Note that λ and x can be complex. We need to show that |λ| < 1. The following identity will be shown below: ω−1 (2 − ω)|1 − λ|2 x ∗ Dx = (1 − |λ|2 )x ∗ Ax,

(12.22)

where D := diag(a11 , . . . , ann ). Now x ∗ Ax and x ∗ Dx are positive for all nonzero x ∈ Cn since a positive definite matrix has positive diagonal elements aii = eTi Aei > 0. It follows that the left hand side of (12.22) is nonnegative and then the right hand side must be nonnegative as well. This implies |λ| ≤ 1. If |λ| = 1 then (12.22) implies that λ = 1 and it remains to show that this is not possible. By (12.10) and (12.12) we have −1 −1 Gω x = (I − M −1 ω A)x = x − (ω D − AL ) Ax

264

12 The Classical Iterative Methods

and the eigenpair equation Gω x = λx can be written x −(ω−1 D −AL )−1 Ax = λx or Ax = (ω−1 D − AL )y,

y := (1 − λ)x.

(12.23)

Now Ax = 0 implies that λ = 1. To prove equation (12.22) we first show that Ey = λAx,

E := ω−1 D + AR − D = ω−1 D − AL − A.

(12.24)

The second equality follows immediately from A = D − AL − AR . By (12.23) and (12.24) we have Ey = (ω−1 D − AL − A)y = Ax − Ay = Ax − (1 − λ)Ax = λAx. Again using (12.23), (12.24) and adding (Ax)∗ y = y ∗ (ω−1 D − AL )∗ y = y ∗ (ω−1 D − AR )y and y ∗ (λAx) = y ∗ Ey = y ∗ (ω−1 D + AR − D)y we find (Ax)∗ y + y ∗ (λAx) = y ∗ (ω−1 D − AR )y + y ∗ (ω−1 D + AR − D)y = y ∗ (2ω−1 − 1)Dy = ω−1 (2 − ω)|1 − λ|2 x ∗ Dx. Since (Ax)∗ = x ∗ A∗ = x ∗ A, y := (1 − λ)x and y ∗ = (1 − λ)x ∗ this also equals (Ax)∗ y + y ∗ (λAx) = (1 − λ)x ∗ Ax + λ(1 − λ)x ∗ Ax = (1 − |λ|2 )x ∗ Ax,  

and (12.22) follows.

12.3.3 Convergence of the Classical Methods for the Discrete Poisson Matrix We know the eigenvalues of the discrete Poisson matrix A given by (10.8) and we can use this to estimate the number of iterations necessary to achieve a given accuracy for the various methods. Recall that by (10.15) the eigenvalues λj,k of A are λj,k = 4 − 2 cos (j πh) − 2 cos (kπh),

j, k = 1, . . . , m, h = 1/(m + 1).

Consider first the J method. The matrix GJ = I −D −1 A = I −A/4 has eigenvalues 1 1 1 μj,k = 1 − λj,k = cos(j πh) + cos(kπh), 4 2 2

j, k = 1, . . . , m.

(12.25)

12.3 Convergence

265

It follows that ρ(GJ ) = cos(πh) < 1. Since GJ is symmetric it is normal, and the spectral norm is equal to the spectral radius (cf. Theorem 8.4). We obtain x k −x2 ≤ GJ k2 x 0 −x2 = cosk (πh)x 0 −x2 ,

k = 0, 1, 2, . . .

(12.26)

The R method given by x k+1 = x k + αr k with α = 2/(λmax + λmin ) = 1/4 is the same as the J-method so (12.26) holds in this case as well. This also follows from Corollary 12.1 with κ given by (10.20). For the SOR method it is possible to explicitly determine ρ(Gω ) for any ω ∈ (0, 2). The following result will be shown in Sect. 12.5. Theorem 12.8 (The Spectral Radius of SOR Matrix) Consider the SOR iteration (12.1.1), with the natural ordering. The spectral radius of Gω is ⎧   ⎨ 1 ωβ + (ωβ)2 − 4(ω − 1) 2 , 4 ρ(Gω ) = ⎩ω − 1,

for 0 < ω ≤ ω∗ , for ω∗ < ω < 2,

(12.27)

where β := ρ(GJ ) = cos(πh) and ω∗ :=

2  > 1. 1 + 1 − β2

(12.28)

Moreover, ρ(Gω ) > ρ(Gω∗ ) for ω ∈ (0, 2) \ {ω∗ }.

(12.29)

A plot of ρ(Gω ) as a function of ω ∈ (0, 2) is shown in Fig. 12.2 for n = 100 (lower curve) and n = 2500 (upper curve). As ω increases the spectral radius of Gω decreases monotonically to the minimum ω∗ . Then it increases linearly to the value one for ω = 2. We call ω∗ the optimal relaxation parameter. For the discrete Poisson problem we have β = cos(πh) and it follows from (12.27), (12.28) that ω∗ =

2 , 1 + sin(πh)

ρ(Gω∗ ) = ω∗ − 1 =

1 − sin(πh) , 1 + sin(πh)

h=

1 . m+1 (12.30)

Letting ω = 1 in (12.27) we find ρ(G1 ) = β 2 = ρ(GJ )2 = cos2 (πh) for the GS method. Thus, for the discrete Poisson problem the J method needs twice as many iterations as the GS method for a given accuracy. The values of ρ(GJ ), ρ(G1 ), and ρ(Gω∗ ) = ω∗ − 1 are shown in Table 12.2 for n = 100 and n = 2500. We also show the smallest integer kn such that ρ(G)kn ≤ 10−8 . This is an estimate for the number of iteration needed to obtain an accuracy of 10−8 . These values are comparable to the exact values given in Table 12.1.

266

12 The Classical Iterative Methods

1.0 0.9 0.8 0.7 0.6 0.5 0.0

0.5

1.0

1.5

2.0

Fig. 12.2 ρ(Gω ) with ω ∈ [0, 2] for n = 100, (lower curve) and n = 2500 (upper curve) Table 12.2 Spectral radial for GJ , G1 , Gω∗ and the smallest integer kn such that ρ(G)kn ≤ 10−8

J GS SOR

n=100 0.959493 0.920627 0.56039

n = 2500 0.998103 0.99621 0.88402

k100 446 223 32

k2500 9703 4852 150

12.3.4 Number of Iterations Consider next the rate of convergence of the iteration x k+1 = Gx k + c. We like to know how fast the iterative method converges. Recall that x k − x = Gk (x 0 − x). For k sufficiently large x k − x ≤ Gk x 0 − x ≈ ρ(G)k x 0 − x. For the last formula we apply Theorem 12.13 which says that limk→∞ Gk 1/ k = ρ(G). For Jacobi’s method and the spectral norm we have GkJ 2 = ρ(GJ )k (cf. (12.26)). For fast convergence we should use a G with small spectral radius. Lemma 12.1 (Number of Iterations) Suppose ρ(G) = 1 − η for some 0 < η < 1 and let s ∈ N. Then s log(10) k˜ := η

(12.31)

is an estimate for the smallest number of iterations k so that ρ(G)k ≤ 10−s .

12.3 Convergence

267

Proof The estimate k˜ is an approximate solution of the equation ρ(G)k = 10−s . Thus, since − log(1 − η) ≈ η when η is small k=−

s log (10) s log(10) ˜ ≈ = k. log(1 − η) η  

The following estimates are obtained. They agree with those we found numerically in Sect. 12.1.1. • R and J: ρ(GJ ) = cos(πh) = 1 − η, η = 1 − cos(πh) = 12 π 2 h2 + O(h4 ) = π2 −2 2 /n + O(n ).

Thus, 2 log(10)s n + O(n−1 ) = O(n). k˜n = π2

• GS: ρ(G1 ) = cos2 (πh) = 1 − η, η = 1 − cos2 (πh) = sin2 πh = π 2 h2 + O(h4 ) = π 2 /n + O(n−2 ). Thus, log(10)s k˜n = n + O(n−1 ) = O(n). π2 • SOR: ρ(Gω∗ ) =

1−sin(πh) 1+sin(πh)

= 1 − 2πh + O(h2 ). Thus,

√ log(10)s √ n + O(n−1/2 ) = O( n). k˜n = 2π We note that 1. The convergence depends on the behavior of the powers Gk as k increases. The matrix M should be chosen so that all elements in Gk converge quickly to zero and such that the linear system (12.9) is easy to solve for x k+1 . These are conflicting demands. M should be an approximation to A to obtain a G with small elements, but then (12.9) might not be easy to solve for x k+1 . 2. The convergence limk→∞ Gk 1/ k = ρ(G) can be quite slow (cf. Exercise 12.15).

12.3.5 Stopping the Iteration In Algorithms 12.1 and 12.2 we had access to the exact solution and could stop the iteration when the error was sufficiently small in the infinity norm. The decision when to stop is obviously more complicated when the exact solution is not known. One possibility is to choose a vector norm, keep track of x k+1 − x k , and stop

268

12 The Classical Iterative Methods

when this number is sufficiently small. The following result indicates that x k − x can be quite large if G is close to one. Lemma 12.2 (Be Careful When Stopping) If x k = Gx k−1 + c, x = Gx + c and G < 1 then x k − x k−1  ≥

1 − G x k − x, G

k ≥ 1.

(12.32)

Proof We find x k − x = G(x k−1 − x) ≤ Gx k−1 − x   = Gx k−1 − x k + x k − x ≤ G x k−1 − x k  + x k − x . Thus (1 − G)x k − x ≤ Gx k−1 − x k  which implies (12.32).

 

Another possibility is to stop when the residual vector r k := b − Ax k is sufficiently small in some norm. To use the residual vector for stopping it is convenient to write the iterative method (12.10) in an alternative form. If M is the splitting matrix of the method then by (12.9) we have Mx k+1 = Mx k − Ax k + b. This leads to x k+1 = x k + M −1 r k ,

r k = b − Ax k .

(12.33)

Testing on r k works fine if A is well conditioned, but Theorem 8.8 shows that the relative error in the solution can be much larger than the relative error in r k if A is ill-conditioned.

12.4 Powers of a Matrix Let A ∈ Cn×n be a square matrix. In this section we consider the special matrix sequence {Ak } of powers of A. We want to know when this sequence converges to the zero matrix. Such a sequence occurs in iterative methods (cf. (12.16)), in Markov processes in statistics, in the converge of geometric series of matrices (Neumann series cf. Sect. 12.4.2) and in many other applications.

12.4.1 The Spectral Radius In this section we show the following important theorem.

12.4 Powers of a Matrix

269

Theorem 12.10 (When Is limk→∞ Ak = 0?) For any A ∈ Cn×n we have lim Ak = 0 ⇐⇒ ρ(A) < 1,

k→∞

where ρ(A) is the spectral radius of A given by (12.17). Clearly ρ(A) < 1 is a necessary condition for limk→∞ Ak = 0. For if (λ, x) is an eigenpair of A with |λ| ≥ 1 and x2 = 1 then Ak x = λk x, and this implies Ak 2 ≥ Ak x2 = λk x2 = |λ|k , and it follows that Ak does not tend to zero. The sufficiency condition is harder to show. We construct a consistent matrix norm on Cn×n such that A < 1 and then use Theorems 12.2 and 12.3. We start with Theorem 12.11 (Any Consistent Norm Majorizes the Spectral Radius) For any matrix norm · that is consistent on Cn×n and any A ∈ Cn×n we have ρ(A) ≤ A. Proof Let (λ, x) be an eigenpair for A,   a consistent matrix norm on Cn×n and define X := [x, . . . , x] ∈ Cn×n . Then λX = AX, which implies |λ| X = λX = AX ≤ A X. Since X = 0 we obtain |λ| ≤ A.   The next theorem shows that if ρ(A) < 1 then A < 1 for some consistent matrix norm on Cn×n , thus completing the proof of Theorem 12.10. Theorem 12.12 (The Spectral Radius Can Be Approximated by a Norm) Let A ∈ Cn×n and > 0 be given. There is a consistent matrix norm · on Cn×n such that ρ(A) ≤ A ≤ ρ(A) + . Proof Let A have eigenvalues λ1 , . . . , λn . By the Schur Triangulation Theorem 6.5 there is a unitary matrix U and an upper triangular matrix R = [rij ] such that U ∗ AU = R. For t > 0 we define D t := diag(t, t 2 , . . . , t n ) ∈ Rn×n , and note that i−j r for all i, j . For n = 3 the (i, j ) element in D t RD −1 ij t is given by t ⎤ λ1 t −1 r12 t −2 r13 = ⎣ 0 λ2 t −1 r23 ⎦ . 0 0 λ3 ⎡

D t RD −1 t

For each B ∈ Cn×n and t > 0 we use the one norm to define the matrix norm Bt := D t U ∗ BU D −1 t 1 . We leave it as an exercise to show that  t is a consistent matrix norm on Cn×n . We define B := Bt , where t is chosen so large that the sum of the absolute values of all off-diagonal elements in D t RD −1 t is less than . Then A = D t U



AU D −1 t 1

=

D t RD −1 t 1

≤ max (|λj | + ) = ρ(A) + . 1≤j ≤n

n

  = max | D t RD −1 | t ij 1≤j ≤n

i=1

270

12 The Classical Iterative Methods

  A consistent matrix norm of a matrix can be much larger than the spectral radius. However the following result holds. Theorem 12.13 (Spectral Radius Convergence) For any consistent matrix norm · on Cn×n and any A ∈ Cn×n we have lim Ak 1/ k = ρ(A).

k→∞

(12.34)

Proof Let   be a consistent matrix norm on Cn×n . If λ is an eigenvalue of A then λk is an eigenvalue of Ak for any k ∈ N. By Theorem 12.11 we then obtain ρ(A)k = ρ(Ak ) ≤ Ak  for any k ∈ N so that ρ(A) ≤ Ak 1/ k . Let > 0 and consider the matrix B := (ρ(A) + )−1 A. Then ρ(B) = ρ(A)/(ρ(A) + ) < 1 and B k  → 0 by Theorem 12.10 as k → ∞. Choose N ∈ N such that B k  < 1 for all k ≥ N. Then for k ≥ N  k  k Ak  = (ρ(A) + )k B k  = ρ(A) + B k  < ρ(A) + . We have shown that ρ(A) ≤ Ak 1/ k ≤ ρ(A) + for k ≥ N. Since is arbitrary the result follows.  

12.4.2 Neumann Series Let B be a square matrix. In this section we consider the Neumann series ∞

Bk

k=0

which is a matrix analogue of a geometric series of numbers. n×n . We say that the series Consider an infinite series ∞ k=0 Ak of matrices in C converges if the sequence of partial sums {S m } given by S m = m k=0 Ak converges. The series converges if and only if {S m } is a Cauchy sequence, i.e. to each > 0 there exists an integer N so that S l − S m  < for all l > m ≥ N. Theorem 12.14 (Neumann Series) Suppose B ∈ Cn×n . Then k 1. The series ∞ k=0 B converges if and only if ρ(B) < 1. k 2. If ρ(B) < 1 then (I − B) is nonsingular and (I − B)−1 = ∞ k=0 B . n×n 3. If B < 1 for some consistent matrix norm · on C then (I − B)−1  ≤

1 . 1 − B

(12.35)

12.5 The Optimal SOR Parameter ω

271

Proof

k 1. Suppose ρ(B) < 1. We show that S m := m k=0 B is a Cauchy sequence and hence convergent. Let > 0. By Theorem 12.12 there is a consistent matrix norm · on Cn×n such that B < 1. Then for l > m S l − S m  = 

l

Bk ≤

k=m+1

l

Bk ≤ Bm+1

k=m+1



Bm+1 . Bk = 1 − B k=0

But then {S m } is a Cauchy sequence provided N is such that B 1−B < . Conversely, suppose (λ, x) is an eigenpair for B with |λ| ≥ 1. We find S m x =   m k x = m λk x. Since λk does not tend to zero the series ∞ λk is B k=0 k=0 k=0 not convergent and therefore {S m x} and hence {S m } does not converge. 2. We have N+1

m 

 B k (I − B) = I + B + · · · + B m − (B + · · · + B m+1 ) = I − B m+1 .

k=0

(12.36) Since ρ(B) < 1 we conclude that B m+1 → 0 and hence taking limits in (12.36)  ∞ k − B) = I which completes the proof of 2. we obtain k=0 B (I ∞ 1 k k 3. By 2: (I − B)−1  =  ∞ k=0 B  ≤ k=0 B = 1−B .  

12.5 The Optimal SOR Parameter ω The following analysis is only carried out for the discrete Poisson matrix. It also holds for the averaging matrix given by (10.10). A more general theory is presented in [22]. We will compare the eigenpair equations for GJ and Gω . It is convenient to write these equations using the matrix formulation T V + V T = h2 F . If GJ v = μv is an eigenpair of GJ then 1 (vi−1,j + vi,j −1 + vi+1,j + vi,j +1 ) = μvi,j , 4

i, j = 1, . . . , m,

(12.37)

2

where v := vec(V ) ∈ Rm and vi,j = 0 if i ∈ {0, m + 1} or j ∈ {0, m + 1}. Suppose (λ, w) is an eigenpair for Gω . By (12.21) (I − ωL)−1 ωR + (1 −  ω)I w = λw or (ωR + λωL)w = (λ + ω − 1)w.

(12.38)

272

12 The Classical Iterative Methods

Let w = vec(W ), where W ∈ Cm×m . Then (12.38) can be written ω (λwi−1,j + λwi,j −1 + wi+1,j + wi,j +1 ) = (λ + ω − 1)wi,j , 4

(12.39)

where wi,j = 0 if i ∈ {0, m + 1} or j ∈ {0, m + 1}. Theorem 12.15 (The Optimal ω) Consider the SOR method applied to the discrete Poisson matrix (10.10), where we use the natural ordering. Moreover, assume ω ∈ (0, 2). 1. If λ = 0 is an eigenvalue of Gω then μ :=

λ+ω−1 ωλ1/2

(12.40)

is an eigenvalue of GJ . 2. If μ is an eigenvalue of GJ and λ satisfies the equation μωλ1/2 = λ + ω − 1

(12.41)

then λ is an eigenvalue of Gω . Proof Suppose (λ, w) is an eigenpair for Gω . We claim that (μ, v) is an eigenpair for GJ , where μ is given by (12.40) and v = (V ) with vi,j := λ−(i+j )/2 wi,j . Indeed, replacing wi,j by λ(i+j )/2 vi,j in (12.39) and cancelling the common factor λ(i+j )/2 we obtain ω (vi−1,j + vi,j −1 + vi+1,j + vi,j +1 ) = λ−1/2 (λ + ω − 1)vi,j . 4 But then GJ v = (L + R)v =

λ+ω−1 v = μv. ωλ1/2

For the converse let (μ, v) be an eigenpair for GJ and let et λ be a solution of (12.41). We define as before v =: vec(V ), W = vec(W ) with wi,j := λ(i+j )/2 vi,j . Inserting this in (12.37) and canceling λ−(i+j )/2 we obtain 1 1/2 (λ wi−1,j + λ1/2 wi,j −1 + λ−1/2 wi+1,j + λ−1/2 wi,j +1 ) = μwi,j . 4 Multiplying by ωλ1/2 we obtain ω (λwi−1,j + λwi,j −1 + wi+1,j + wi,j +1 ) = ωμλ1/2 wi,j , 4 Thus, if ωμλ1/2 = λ + ω − 1 then by (12.39) (λ, w) is an eigenpair for Gω .

 

12.5 The Optimal SOR Parameter ω

273

Proof of Theorem 12.8 Combining statement 1 and 2 in Theorem 12.15 we see that ρ(Gω ) = |λ(μ)|, where λ(μ) is an eigenvalue of Gω satisfying (12.41) for some eigenvalue μ of GJ . The eigenvalues of GJ are 12 cos(j πh) + 12 cos(kπh), j, k = 1, . . . , m, so μ is real and both μ and −μ are eigenvalues. Thus, to compute ρ(Gω ) it is enough to consider (12.41) for a positive eigenvalue μ of GJ . Solving (12.41) for λ = λ(μ) gives λ(μ) :=

1 ωμ ± 4

% 2 (ωμ)2 − 4(ω − 1) .

(12.42)

Both roots λ(μ) are eigenvalues of Gω . The discriminant d(ω) := (ωμ)2 − 4(ω − 1). is strictly decreasing on (0, 2) since d  (ω) = 2(ωμ2 − 2) < 2(ω − 2) < 0. Moreover d(0) = 4 > 0 and d(2) = 4μ2 − 4 < 0. As a function of ω, λ(μ) changes from real to complex when d(ω) = 0. The root in (0, 2) is ω = ω(μ) ˜ := 2

1−



1 − μ2 2  = . μ2 1 + 1 − μ2

(12.43)

In the complex case we find |λ(μ)| =

 1 (ωμ)2 + 4(ω − 1) − (ωμ)2 = ω − 1, 4

ω(μ) ˜ < ω < 2.

In the real case both roots of (12.42) are positive and the larger one is 1 λ(μ) = ωμ + 4

%

2 (ωμ)2 − 4(ω − 1) ,

0 < ω ≤ ω(μ). ˜

(12.44)

Both λ(μ) and ω(μ) ˜ are strictly increasing as functions of μ. It follows that |λ(μ)| is maximized for μ = ρ(GJ ) =: β and for this value of μ we obtain (12.27) for 0 < ω ≤ ω(β) ˜ = ω∗ . Evidently ρ(Gω ) = ω − 1 is strictly increasing in ω∗ < ω < 2. Equation (12.29) will follow if we can show that ρ(Gω ) is strictly decreasing in 0 < ω < ω∗ . By differentiation %  β (ωβ)2 − 4(ω − 1) + ωβ 2 − 2 d  2  ωβ + (ωβ) − 4(ω − 1) = . dω (ωβ)2 − 4(ω − 1) Since β 2 (ω2 β 2 − 4ω + 4) < (2 − ωβ 2 )2 the numerator is negative and the strict decrease of ρ(Gω ) in 0 < ω < ω∗ follows.  

274

12 The Classical Iterative Methods

12.6 Exercises Chap. 12 12.6.1 Exercises Sect. 12.3 Exercise 12.1 (Richardson and Jacobi) Show that if aii = d = 0 for all i then Richardson’s method with α := 1/d is the same as Jacobi’s method. Exercise 12.2 (R-Method When Eigenvalues Have Positive Real Part) Suppose all eigenvalues λj of A have positive real parts uj for j = 1, . . . , n and that α is real. Show that the R method converges if and only if 0 < α < minj (2uj /|λj |2 ). Exercise 12.3 (Divergence Example for J and GS)  Show that both Jacobi’s method and Gauss-Seidel’s method diverge for A = 13 24 . Exercise 12.4 (2 by 2 Matrix) We want to show that converges if  Gauss-Seidel a12  2×2 . and only if Jacobi converges for a 2 by 2 matrix A := aa11 ∈ R 21 a22 a) Show that the spectral radius for the Jacobi method is ρ(GJ ) =



|a21a12 /a11 a22|.

b) Show that the spectral radius for the Gauss-Seidel method is ρ(G1 ) = |a21 a12 /a11a22 |. c) Conclude that Gauss-Seidel converges if and only if Jacobi converges. Exercise 12.5 (Example: ! GS "Converges, J Diverges) Show (by finding its eigen1a a

values) that the matrix a 1 a is positive definite for −1/2 < a < 1. Thus, GS a a 1 converges for these values of a. Show that the J method does not converge for 1/2 < a < 1. Exercise 12.6 (Example: GS Diverges, J Converges) Let GJ and G1 be the iteration  matrices  for the Jacobi and Gauss-Seidel methods applied to the matrix A :=

1 0 1/2 1 1 0 −1 1 1

.1

a) Show that G1 :=



0 0 −1/2 0 0 1/2 0 0 −1

 and conclude that GS diverges.

b) Show that p(λ) := det(λI − GJ ) = λ3 + 12 λ + 12 . c) Show that if |λ| ≥ 1 then p(λ) = 0. Conclude that J converges. Exercise 12.7 (Strictly Diagonally Dominance; The J Method) Show that the J method converges if |aii | > j =i |aij | for i = 1, . . . , n. 1 Stewart Venit, “The convergence of Jacobi and Gauss-Seidel iteration”, Mathematics Magazine 48 (1975), 163–167.

12.6 Exercises Chap. 12

275

Exercise 12.8 (Strictly Diagonally Dominance; The GS Method) Consider the |aij | GS method. Suppose r := maxi ri < 1, where ri = j =i |aii | . Show using induction on i that | k+1 (j )| ≤ r k ∞ for j = 1, . . . , i. Conclude that GaussSeidel’s method is convergent when A is strictly diagonally dominant. Exercise 12.9 (Convergence Example for Fix Point Iteration) Consider for a ∈ C        0 a x1 1−a x1 = + x := =: Gx + c. x2 a 0 x2 1−a Starting with x 0 = 0 show by induction x k (1) = x k (2) = 1 − a k ,

k ≥ 0,

and conclude that the iteration converges to the fixed-point x = [1, 1]T for |a| < 1 and diverges for |a| > 1. Show that ρ(G) = 1 − η with η = 1 − |a|. Compute the estimate (12.31) for the rate of convergence for a = 0.9 and s = 16 and compare with the true number of iterations determined from |a|k ≤ 10−16 . Exercise 12.10 (Estimate in Lemma 12.1 Can Be Exact) Consider the iteration in Example 12.2. Show that ρ(GJ ) = 1/2. Then show that x k (1) = x k (2) = 1−2−k for k ≥ 0. Thus the estimate in Lemma 12.1 is exact in this case. Exercise 12.11 (Iterative Method (Exam Exercise 1991-3)) Let A ∈ Rn×n be a symmetric positive definite matrix with ones on the diagonal and let b ∈ Rn . We will consider an iterative method for the solution of Ax = b. Observe that A may be written A = I −L−LT , where L is lower triangular with zero’s on the diagonal, li,j = 0, when j >= i. The method is defined by Mx k+1 = N x k + b,

(12.45)

where M and N are given by the splitting A = M − N,

M = (I − L)(I − LT ),

N = LLT .

(12.46)

a) Let x = 0 be an eigenvector of M −1 N with eigenvalue λ. Show that λ=

xT N x . + xT N x

x T Ax

(12.47)

b) Show that the sequence {x k } generated by (12.45) converges to the solution x of Ax = b for any starting vector x 0 .

276

12 The Classical Iterative Methods

c) Consider the following algorithm 1. Choose x = [x(1), x(2), . . . , x(n)]T . 2. for k = 1, 2, 3, . . . for i = 1, 2, . . . , n − 1, n, n, n − 1, n − 2, . . . , 1

a(i, j )x(j ) x(i) = b(i) −

(12.48)

j =i

Is there a connection between this algorithm and the method of Gauss-Seidel? Show that the algorithm (12.48) leads up to the splitting (12.46). Exercise 12.12 (Gauss-Seidel Method (Exam Exercise 2008-1)) Consider the linear system Ax = b in which ⎡

⎤ 301 A := ⎣0 7 2⎦ 124 and b := [1, 9, −2]T . a) With x 0 = [1, 1, 1]t , carry out one iteration of the Gauss-Seidel method to find x 1 ∈ R3 . b) If we continue the iteration, will the method converge? Why? c) Write a MATLAB program for the Gauss-Seidel method applied to a matrix A ∈ Rn×n and right-hand side b ∈ Rn . Use the ratio of the current residual to the initial residual as the stopping criterion, as well as a maximum number of iterations.2

12.6.2 Exercises Sect. 12.4 Exercise 12.13 (A Special Norm) Show that Bt := D t U ∗ BU D −1 t 1 defined in the proof of Theorem 12.12 is a consistent matrix norm on Cn×n . Exercise 12.14 (Is A + E Nonsingular?) Suppose A ∈ Cn×n is nonsingular and E ∈ Cn×n . Show that A + E is nonsingular if ρ(A−1 E) < 1.

2 Hint:

The function C=tril(A) extracts the lower part of A into a lower triangular matrix C.

12.7 Review Questions

277

Exercise 12.15 (Slow Spectral Radius Convergence) The convergence limk→∞  Ak 1/ k = ρ(A) can be quite slow. Consider ⎤

⎡λ a

0 ··· 0 0 0 λ a ··· 0 0 ⎢ 0 0 λ ··· 0 0 ⎥

A := ⎢ ⎣ ...

∈ Rn×n . .. ⎥ .⎦

0 0 0 ··· λ a 0 0 0 ··· 0 λ

If |λ| = ρ(A) < 1 then limk→∞ Ak = 0 for any a ∈ R. We show below that the k (1, n) element of Ak is given by f (k) := n−1 a n−1 λk−n+1 for k ≥ n − 1. a) Pick an n, e.g. n = 5, and make a plot of f (k) for λ = 0.9, a = 10, and n − 1 ≤ k ≤ 200. Your program should also compute maxk f (k). Use your program to determine how large k must be before f (k) < 10−8 . b) We can determine the elements!of Ak explicitly for any k. Let E := (A − λI )/a. " for 1 ≤ k ≤ n − 1 and that E n = 0. Show by induction that E k = 00 I n−k 0 min{k,n−1} k  j k−j j c) We have Ak = (aE + λI )k = j =0 E and conclude that the j a λ (1, n) element is given by f (k) for k ≥ n − 1.

12.7 Review Questions 12.7.1 Consider a matrix A ∈ Cn×n with nonzero diagonal elements. • • • •

Define the J and GS method in component form, Do they always converge? Give a necessary and sufficient condition that An → 0. Is there a matrix norm   consistent on Cn×n such that A < ρ(A)?

12.7.2 What is a Neumann series? when does it converge? 12.7.3 How do we define convergence of a fixed point iteration x k+1 = Gx k + c? When does it converge? 12.7.4 Define Richardson’s method.

Chapter 13

The Conjugate Gradient Method

The conjugate gradient method was published by Hestenes and Stiefel in 1952, [6] as a direct method for solving linear systems. Today its main use is as an iterative method for solving large sparse linear systems. On a test problem we show that it performs as well as the SOR method with optimal acceleration parameter, and we do not have to estimate any such parameter. However the conjugate gradient method is restricted to positive definite systems. We also consider a mathematical formulation of the preconditioned conjugate gradient method. It is used to speed up convergence of the conjugate gradient method. We only give one example of a possible preconditioner. See [1] for a more complete treatment of iterative methods and preconditioning. The conjugate gradient method can also be used for minimization and is related to a method known as steepest descent. This method and the conjugate gradient method are both minimization methods, and iterative methods, for solving equations. Throughout this chapter A ∈ Rn×n will be a symmetric and positive definite matrix. We recall that A has positive eigenvalues and that the spectral (2-norm) condition number of A is given by κ := λλmax , where λmax and λmin are the largest min and smallest eigenvalue of A. The analysis of the methods in this chapter is in terms of two inner products T on Rn , the √ usual inner product x, y = x y with the associated Euclidian norm T x2 = x x , and the A-inner product and the corresponding A-norm given by x, yA := x T Ay,

yA :=

% y T Ay,

x, y ∈ Rn .

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_13

(13.1)

279

280

13 The Conjugate Gradient Method

We note that the A-inner product is an inner product on Rn . Indeed, for any x, y, z ∈ Rn 1. x, xA = x T Ax ≥ 0 and x, xA = 0 if and only if x = 0, since A is positive definite, 2. x, yA := x T Ay = (x T Ay)T = y T AT x = y T Ax = y, xA by symmetry of A, 3. x + y, zA := x T Az + y T Az = x, zA + y, zA , true for any A. By Theorem 5.2 the A-norm is a vector norm on Rn since it is an inner product norm, and the Cauchy-Schwarz inequality holds |x T Ay|2 ≤ (x T Ax)(y T Ay),

x, y ∈ Rn .

(13.2)

13.1 Quadratic Minimization and Steepest Descent We start by discussing some aspect of quadratic minimization and its relation to solving linear systems. Consider for a positive definite A ∈ Rn×n , b ∈ Rn and c ∈ R the quadratic function Q : Rn → R given by Q(y) :=

1 T y Ay − bT y + c. 2

(13.3)

As an example, some level curves of Q(x, y) :=

   1   2 −1 x = x 2 − xy + y 2 xy −1 2 y 2

(13.4)

are shown in Fig. 13.1. The level curves are ellipses and the graph of Q is a paraboloid (cf. Exercise 13.2). The following expansion will be used repeatedly. For y, h ∈ Rn and ε ∈ R 1 Q(y + εh) = Q(y) − εhT r(y) + ε2 hT Ah, where r(y) := b − Ay. 2

(13.5)

Minimizing a quadratic function is equivalent to solving a linear system. Lemma 13.1 (Quadratic Function) A vector x ∈ Rn minimizes Q given by (13.3) if and only if Ax = b. Moreover, the residual r(y) := b − Ay for any y ∈ Rn is "T ! equal to the negative gradient, i.e., r(y) = −∇Q(y), where ∇ := ∂y∂ 1 , . . . , ∂y∂ n . Proof If y = x, ε = 1, and Ax = b, then (13.5) simplifies to Q(x + h) = Q(x) + 12 hT Ah, and since A is positive definite Q(x + h) > Q(x) for all nonzero h ∈ Rn . It follows that x is the unique minimum of Q. Conversely, if Ax = b and

13.1 Quadratic Minimization and Steepest Descent

x2

x0

281

x2 x3

x1

x0

x1

Fig. 13.1 Level curves for Q(x, y) given by (13.4). Also shown is a steepest descent iteration (left) and a conjugate gradient iteration (right) to find the minimum of Q (cf Examples 13.1,13.2)

h := r(x), then by (13.5), Q(x + εh) − Q(x) = −ε(hT r(x) − 12 εhT Ah) < 0 for ε > 0 sufficiently small. Thus x does not minimize Q. By (13.5) for y ∈ Rn ∂ 1 Q(y) := lim (Q(y + εei ) − Q(y)) ε→0 ε ∂yi 1 1 = lim −εeTi r(y)) + ε2 eTi Aei = −eTi r(y), ε→0 ε 2 showing that r(y) = −∇Q(y).

i = 1, . . . , n,  

A general class of minimization algorithms for Q and solution algorithms for a linear system is given as follows: 1. Choose x 0 ∈ Rn . 2. For k = 0, 1, 2, . . . Choose a “search direction” p k , Choose a “step length” αk , Compute x k+1 = x k + αk p k . (13.6) We would like to generate a sequence {x k } that converges quickly to the minimum x of Q.

282

13 The Conjugate Gradient Method

For a fixed direction p k we say that αk is optimal if Q(x k+1 ) is as small as possible, i.e. Q(x k+1 ) = Q(x k + αk p k ) = min Q(x k + αp k ). α∈R

By (13.5) we have Q(x k + αp k ) = Q(x k ) − αp Tk r k + 12 α 2 pTk Apk , where r k := ∂ b−Ax k . Since pTk Apk ≥ 0 we find a minimum αk by solving ∂α Q(x k +αp k ) = 0. It follows that the optimal αk is uniquely given by αk :=

pTk r k p Tk Ap k

(13.7)

.

In the method of steepest descent, also known as the gradient method, we choose p k = r k the negative gradient, and the optimal αk . Starting from x 0 we compute for k = 0, 1, 2 . . . x k+1 = x k +

 r Tk r k  r Tk Ar k

(13.8)

rk.

This is similar to Richardson’s method (12.18), but in that method we used a constant step length. Computationally, a step in the steepest descent iteration can be organized as follows p k = r k , t k = Apk , αk = (pTk r k )/(pTk t k ),

(13.9)

x k+1 = x k + αk pk , r k+1 = r k − αk t k . Here, and in general, the following update of the residual is used: r k+1 = b − Ax k+1 = b − A(x k + αk pk ) = r k − αk Ap k .

(13.10)

In the steepest descent method the choice pk = r k implies that the last two gradients are orthogonal. Indeed, by (13.10), r Tk+1 r k = (r k − αk Ar k )T r k = 0 since αk = r Tk r k r Tk Ar k

and A is symmetric.

Example 13.1 (Steepest Descent Iteration) Suppose Q(x, y) is given by (13.4). Starting with x 0 = [−1, −1/2]T and r 0 = −Ax 0 = [3/2, 0]T we find t0 = 3



1 −1/2

t 1 = 3 ∗ 4−1



α0 =

,

 −1  2

,

1 , 2

α1 =

x 1 = −4−1 1 , 2

1 2

x 2 = −4−1

, 

r 1 = 3 ∗ 4−1 1 1/2



,

0 1

r 2 = 3 ∗ 4−1

!

1/2 0

" ,

13.2 The Conjugate Gradient Method

283

and in general for k ≥ 1  1      t 2k−2 = 3 ∗ 41−k −1/2 , x 2k−1 = −4−k 12 , r 2k−1 = 3 ∗ 4−k 01 ! "    1  −k −k 1/2 . , x , r = −4 = 3 ∗ 4 t 2k−1 = 3 ∗ 4−k −1 2k 2k 1/2 2 0 Since αk = 1/2 is constant for all k the methods of Richardson, Jacobi and steepest descent are the same on this simple problem. See the left part of Fig. 13.1. The rate of convergence is determined from x j +1 2 /x j  = r j +1 2 /r j 2 = 1/2 for all j .

13.2 The Conjugate Gradient Method In the steepest descent method the last two gradients are orthogonal. In the conjugate gradient method all gradients are orthogonal.1 We achieve this by using A-orthogonal search directions i.e., p Ti Ap j = 0 for all i = j .

13.2.1 Derivation of the Method As in the steepest descent method we choose a starting vector x 0 ∈ Rn . If r 0 = b −Ax 0 = 0 then x 0 is the exact solution and we are finished, otherwise we initially make a steepest descent step. It follows that r T1 r 0 = 0 and p0 := r 0 . For the general case we define for j ≥ 0 p j := r j −

j −1 T

r j Ap i ( T )p i , pi Api i=0

x j +1 := x j + αj p j

αj :=

r j +1 = r j − αj Ap j .

r Tj r j p Tj Ap j

(13.11)

,

(13.12) (13.13)

We note that 1. p j is computed by the Gram-Schmidt orthogonalization process applied to the residuals r 0 , . . . , r j using the A-inner product. The search directions are therefore A-orthogonal and nonzero as long as the residuals are linearly independent. 2. Equation (13.13) follows from (13.10). 3. It can be shown that the step length αj is optimal for all j (cf. Exercise 13.7). 1 It

is this property that has given the method its name.

284

13 The Conjugate Gradient Method

Lemma 13.2 (The Residuals Are Orthogonal) Suppose that for some k ≥ 0 that x j is well defined, r j = 0, and r Ti r j = 0 for i, j = 0, 1, . . . , k, i = j . Then x k+1 is well defined and r Tk+1 r j = 0 for j = 0, 1, . . . , k. Proof Since the residuals r j are orthogonal and nonzero for j ≤ k, they are linearly independent, and it follows form the Gram-Schmidt Theorem 5.4 that pk is nonzero and p Tk Ap i = 0 for i < k. But then x k+1 and r k+1 are well defined. Now r Tk+1 r j

(13.13)

=

(r k − αk Ap k )T r j

(13.11) T = rk rj pTk Api = 0

=



− αk p Tk A

j −1 T

r j Api  pj + ( T )p i p i Ap i i=0

r Tk r j − αk p Tk Ap j = 0,

j = 0, 1, . . . , k.

That the final expression is equal to zero follows by orthogonality and Aorthogonality for j < k and by the definition of αk for j = k. This completes the proof.   The conjugate gradient method is also a direct method. The residuals are orthogonal and therefore linearly independent if they are nonzero. Since dim Rn = n the n + 1 residuals r 0 , . . . , r n cannot all be nonzero and we must have r k = 0 for some k ≤ n. Thus we find the exact solution in at most n iterations. The expression (13.11) for pk can be greatly simplified. All terms except the last one vanish, since by orthogonality of the residuals r Tj Api

(13.13) T  r i = rj

− r i+1  = 0, αi

i = 0, 1, . . . , j − 2.

With j = k + 1 (13.11) therefore takes the simple form p k+1 = r k+1 + βk p k and we find βk := −

r Tk+1 Apk p Tk Ap k

(13.13)

=

r Tk+1 (r k+1 − r k ) αk pTk Apk

(13.12)

=

r Tk+1 r k+1 r Tk r k

.

(13.14)

To summarize, in the conjugate gradient method we start with x 0 , p 0 = r 0 = b − Ax 0 and then generate a sequence of vectors {x k } as follows: For k = 0,1, 2, . . . x k+1 := x k + αk pk , r k+1 := r k − αk Apk ,

αk :=

r Tk r k p Tk Ap k

,

(13.15) (13.16)

13.2 The Conjugate Gradient Method

pk+1 := r k+1 + βk p k ,

285

βk :=

r Tk+1 r k+1 r Tk r k

.

(13.17)

The residuals and search directions are orthogonal and A-orthogonal, respectively. For computation we organize the iterations as follows for k = 0, 1, 2, . . . t k = Ap k , αk = (r Tk r k )/(p Tk t k ), x k+1 = x k + αk pk , r k+1 = r k − αk t k ,

(13.18)

βk = (r Tk+1 r k+1 )/(r Tk r k ), p k+1 := r k+1 + βk pk . Note that (13.18) differs from (13.9) only in the computation of the search direction. Example 13.2 (Conjugate Gradient (13.18) applied to the pos 2 −1  Iteration)  x1   0 Consider  itive definite linear system −1 = . Starting as in Example 13.1 with 2 ! x2" 0 " ! −1 3/2 x 0 = −1/2 we find p 0 = r 0 = 0 and then t0 = p1 =

! !

3 −3/2 3/8 3/4

" ,

" ,

" ! ! " −1/4 0 α0 = 1/2, x 1 = −1/2 , r 1 = 3/4 , β0 = 1/4, ! " 0 t 1 = 9/8 , α1 = 2/3, x 2 = 0, r 2 = 0.

Thus x 2 is the exact solution as illustrated in the right part of Fig. 13.1.

13.2.2 The Conjugate Gradient Algorithm In this section we give numerical examples and discuss implementation. The formulas in (13.18) form a basis for the following algorithm, which solves the positive definite linear system Ax = b by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ||r k ||2 /||b||2 ≤ tol or

286

13 The Conjugate Gradient Method

k > itmax. K is the number of iterations used: function [x,K]=cg(A,b,x,tol,itmax) % [x,K]=cg(A,b,x,tol,itmax) r=b-A*x; p=r; rho0=b’*b; rho=r’*r; for k=0:itmax if sqrt(rho/rho0) 0 and d ≥ 2|a| (cf. Theorem 10.2). We set h = 1/(m + 1) and f = [1, . . . , 1]T ∈ Rn . We consider two problems. 1. a = 1/9, d = 5/18, the Averaging matrix. 2. a = −1, d = 2, the Poisson matrix.

13.2.4 Implementation Issues Note that for our test problems T 2 only has O(5n) nonzero elements. Therefore, taking advantage of the sparseness of T 2 we can compute t in Algorithm 13.1

13.2 The Conjugate Gradient Method

287

in O(n) arithmetic operations. With such an implementation the total number of arithmetic operations in one iteration is O(n). We also note that it is not necessary to store the matrix T 2 . To use the conjugate gradient algorithm on the test matrix for large n it is advantageous to use a matrix equation formulation. We define matrices V , R, P , B, T ∈ Rm×m by x = vec(V ), r = vec(R), p = vec(P ), t = vec(T ), and h2 f = vec(B). Then T 2 x = h2 f ⇐⇒ T 1 V + V T 1 = B, and t = T 2 p ⇐⇒ T = T 1 P + P T 1 . This leads to the following algorithm for testing the conjugate gradient algorithm on the matrix 2 )×(m2 )

A = tridiagm (a, d, a) ⊗ I m + I m ⊗ tridiagm (a, d, a) ∈ R(m

.

function [V,K]=cgtest(m,a,d,tol,itmax) % [V,K]=cgtest(m,a,d,tol,itmax) R=ones(m)/(m+1)^2; rho=sum(sum(R.*R)); rho0=rho; P=R; V=zeros(m,m); T1=sparse(tridiagonal(a,d,a,m)); for k=1:itmax if sqrt(rho/rho0) 0,

(13.19)

while for the conjugate gradient method we have √ k κ −1 ||x − x k ||A − √2 k ≤2 √ < 2e κ , ||x − x 0 ||A κ +1

k ≥ 0.

(13.20)

Here κ = cond2(A) := λmax /λmin is the spectral condition number of A, where λmax and λmin are the largest and smallest eigenvalue of A, respectively. Theorem 13.3 implies 1. Since κ−1 κ+1 < 1 the steepest descent method always converges for a positive definite matrix. The convergence can be slow when κ−1 κ+1 is close to one, and this happens even for a moderately ill-conditioned A. 2. The rate of convergence for the conjugate gradient method appears to be determined by the square root of the spectral condition number. This is much better than the estimate for the steepest descent method. Especially for problems with large condition numbers. 3. The proofs of the estimates in (13.19) and (13.20) are quite different. This is in spite of their similar appearance.

13.3 Convergence

289

13.3.2 The Number of Iterations for the Model Problems Consider the test matrix 2 )×(m2 )

T 2 := tridiagm (a, d, a) ⊗ I m + I m ⊗ tridiagm (a, d, a) ∈ R(m

.

The eigenvalues were given in (10.15) as λj,k = 2d + 2a cos(j πh) + 2a cos(kπh),

j, k = 1, . . . , m.

(13.21)

For the averaging problem given by d = 5/18, a = 1/9, the largest and smallest eigenvalue of T 2 are given by λmax = 59 + 49 cos (πh) and λmin = 59 − 49 cos (πh). Thus κA =

5 + 4 cos(πh) ≤ 9, 5 − 4 cos(πh)

and the condition number is bounded independently of n. It follows from (13.20) that the number of iterations can be bounded independently of the size n of the problem, and this is in agreement with what we observed in Table 13.1. For the Poisson problem we have by (10.20) the condition number κP =

2 2√ λmax cos2 (πh/2) cos(πh/2) √ ≈ ≈ = n. and κP = 2 λmin sin(πh/2) πh π sin (πh/2)

Thus, (see also Exercise 8.19) we solve the discrete Poisson problem in O(n3/2 ) arithmetic operations using the conjugate gradient method. This is the same as for the SOR method and for the fast method without the FFT. In comparison the Cholesky Algorithm requires O(n2 ) arithmetic operations both for the averaging and the Poisson problem.

13.3.3 Krylov Spaces and the Best Approximation Property For the convergence analysis of the conjugate gradient method certain subspaces of Rn called Krylov spaces play a central role. In fact the iterates in the conjugate gradient method are best approximation of the solution from these subspaces using the A-norm to measure the error. The Krylov spaces are defined by W0 = {0} and Wk = span(r 0 , Ar 0 , A2 r 0 , . . . , Ak−1 r 0 ),

k = 1, 2, 3, · · · .

290

13 The Conjugate Gradient Method

They are nested subspaces W0 ⊂ W1 ⊂ W2 ⊂ · · · ⊂ Wn ⊂ Rn with dim(Wk ) ≤ k for all k ≥ 0. Moreover, If v ∈ Wk then Av ∈ Wk+1 . Lemma 13.3 (Krylov Space) For the iterates in the conjugate gradient method we have x k − x 0 ∈ Wk ,

r k , p k ∈ Wk+1 ,

k = 0, 1, . . . ,

(13.22)

w ∈ Wk .

(13.23)

and r Tk w = p Tk Aw = 0,

Proof Equation (13.22) clearly holds for k = 0 since p 0 = r 0 . Suppose it holds for some k ≥ 0. Then r k+1 = r k − αk Apk ∈ Wk+2 , p k+1 = r k+1 + βk pk ∈ (13.12)

Wk+2 and x k+1 − x 0 = x k − x 0 + αk p k ∈ Wk+1 . Thus (13.22) follows by induction. The equation (13.23) follows since any w ∈ Wk is a linear combination of {r 0 , r 1 , . . . , r k−1 } and also {p0 , p1 , . . . , p k−1 }.   Theorem 13.4 (Best Approximation Property) Suppose Ax = b, where A ∈ Rn×n is positive definite and {x k } is generated by the conjugate gradient method (cf. (13.15)). Then x − x k A = min x − x 0 − wA . w∈Wk

(13.24)

Proof Fix k, let w ∈ Wk and u := x k −x 0 −w. By (13.22) u ∈ Wk and then (13.23) implies that x − x k , u = r Tk u = 0. Using Corollary 5.2 we obtain x − x 0 − wA = x − x k + uA ≥ x − x k A , with equality for u = 0.

 

If x 0 = 0 then (13.24) says that x k is the element in Wk that is closest to the solution x in the A-norm. More generally, if x 0 = 0 then x − x k = (x − x 0 ) − (x k − x 0 ) and x k − x 0 is the element in Wk that is closest to x − x 0 in the A-norm. This is the orthogonal projection of x − x 0 into Fig. 13.2. Wk , see m Recall that to each polynomial p(t) := m j =0 aj t there corresponds a matrix polynomial p(A) := a0 I + a1 A + · · · + am Am . Moreover, if (λj , uj ) are eigenpairs of A then (p(λj ), uj ) are eigenpairs of p(A) for j = 1, . . . , n. Lemma 13.4 (Krylov Space and Polynomials) Suppose Ax = b where A ∈ Rn×n is positive definite with orthonormal eigenpairs (λj , uj ), j = 1, 2, . . . , n, and let r 0 := b − Ax 0 for some x 0 ∈ Rn . To each w ∈ Wk there corresponds

13.3 Convergence

291

x − x0 x − xk

x k − x0

Fig. 13.2 The orthogonal projection of x − x 0 into Wk

a polynomial P (t) := n j =1 σj uj then

k−1

j =0 aj t

||x − x 0 − w||2A =

k−1

n σ2

j j =1

λj

such that w = P (A)r 0 . Moreover, if r 0 =

Q(λj )2 ,

Q(t) := 1 − tP (t).

(13.25)

Proof If w ∈ Wk then w = a0 r 0 + a1 Ar 0 + · · · + ak−1 Ak−1 r 0 for some scalars a0 , . . . , ak−1 . But then w = P (A)r 0 . We find x − x 0 − P (A)r 0 = A−1 (r 0 − AP (A))r 0 = A−1 Q(A)r 0 and A(x − x 0 − P (A)r 0 ) = Q(A)r 0 . Therefore, x − x 0 − P (A)r 0 2A = cT A−1 c where c = (I − AP (A))r 0 = Q(A)r 0 . (13.26) Using the eigenvector expansion for r 0 we obtain c=

n

j =1

σj Q(λj )uj ,

A−1 c =

n

i=1

σi

Q(λi ) ui . λi

Now (13.25) follows by the orthonormality of the eigenvectors. We will use the following theorem to estimate the rate of convergence.

(13.27)  

292

13 The Conjugate Gradient Method

Theorem 13.5 (cg and Best Polynomial Approximation) Suppose [a, b] with 0 < a < b is an interval containing all the eigenvalues of A. Then in the conjugate gradient method ||x − x k ||A ≤ min max |Q(x)|, Q∈Πk a≤x≤b ||x − x 0 ||A

(13.28)

Q(0)=1

where Πk denotes the class of univariate polynomials of degree ≤ k with real coefficients. Proof By (13.25) with Q(t) = 1 (corresponding to P (A) = 0) we find ||x − σ2 x 0 ||2A = nj=1 λjj . Therefore, by the best approximation property Theorem 13.4 and (13.25), for any w ∈ Wk ||x−x k ||2A ≤ ||x−x 0 −w||2A ≤ max |Q(x)|2 a≤x≤b

n σ2

j j =1

λj

= max |Q(x)|2 ||x−x 0 ||2A , a≤x≤b

where Q ∈ Πk and Q(0) = 1. Minimizing over such polynomials Q and taking square roots the result follows.   In the next section we use properties of the Chebyshev polynomials to show that ||x − x k ||A ≤ min Q∈Πk ||x − x 0 ||A

max

Q(0)=1

λmin ≤x≤λmax

|Q(x)| =

2 , −k a + ak

√ κ −1 a := √ , κ +1 (13.29)

where κ = λmax /λmin is the spectral condition number of A. Ignoring the second term in the denominator this implies the first inequality in (13.20). Consider the second inequality in (13.20). The inequality x−1 < e−2/x x+1

for x > 1

(13.30)

follows from the familiar series expansion of the exponential function. Indeed, with y = 1/x, using 2k /k! = 2, k = 1, 2, and 2k /k! < 2 for k > 2, we find e2/x = e2y =



(2y)k k=0

and (13.30) follows.

k!

< −1 + 2



k=0

yk =

x+1 1+y = 1−y x−1

13.4 Proof of the Convergence Estimates

293

13.4 Proof of the Convergence Estimates 13.4.1 Chebyshev Polynomials The proof of the estimate (13.29) for the error in the conjugate gradient method is based on an extremal property of the Chebyshev polynomials. Suppose a < b, c ∈ [a, b] and k ∈ N. Consider the set Sk of all polynomials Q of degree ≤ k such that Q(c) = 1. For any continuous function f on [a, b] we define f ∞ = max |f (x)|. a≤x≤b

We want to find a polynomial Q∗ ∈ Sk such that Q∗ ∞ = min Q∞ . Q∈Sk

We will show that Q∗ is uniquely given as a suitably shifted and normalized version of the Chebyshev polynomial. The Chebyshev polynomial Tn of degree n can be defined recursively by Tn+1 (t) = 2tTn (t) − Tn−1 (t),

n ≥ 1,

t ∈ R,

starting with T0 (t) = 1 and T1 (t) = t. Thus T2 (t) = 2t 2 − 1, T3 (t) = 4t 3 − 3t etc. In general Tn is a polynomial of degree n. There are some convenient closed form expressions for Tn . Lemma 13.5 (Closed Forms of Chebyshev Polynomials) For n ≥ 0 1. Tn (t) = cos t) for t ∈ [−1, √ √ 1], −n   (n arccos n for |t| ≥ 1. 2. Tn (t) = 12 t + t 2 − 1 + t + t 2 − 1 Proof 1. With Pn (t) = cos (n arccos t) we have Pn (t) = cos nφ, where t = cos φ. Therefore, Pn+1 (t) + Pn−1 (t) = cos (n + 1)φ + cos (n − 1)φ = 2 cos φ cos nφ = 2tPn (t), and it follows that Pn satisfies the same recurrence relation as Tn . Since P0 = T0 and P1 = T1 we have Pn = Tn for all n ≥ 0. 2. Fix t with |t| ≥ 1 and let xn := Tn (t) for n ≥ 0. The recurrence relation for the Chebyshev polynomials can then be written xn+1 − 2txn + xn−1 = 0 for n ≥ 1, with x0 = 1, x1 = t.

(13.31)

294

13 The Conjugate Gradient Method

To solve this difference equation we insert xn = zn into (13.31) and obtain zn+1 − 2tzn + zn−1 = 0 or z2 − 2tz + 1 = 0. The roots of this equation are z1 = t +



t 2 − 1,

z2 = t −

   −1 t2 − 1 = t + t2 − 1 .

Now z1n , z2n and more generally c1 z1n + c2 z2n are solutions of (13.31) for any constants c1 and c2 . We find these constants from the initial conditions x0 = c1 + c2 = 1 and x1 = c1 z1 + c2 z2 = t. Since z1 + z2 = 2t the solution is c1 = c2 = 12 .   We show that the unique solution to our minimization problem is Q∗ (x) =

Tk (u(x)) , Tk (u(c))

u(x) =

b + a − 2x . b−a

(13.32)

Clearly Q∗ ∈ Sk . Theorem 13.6 (A Minimal Norm Problem) Suppose a < b, c ∈ [a, b] and k ∈ N. If Q ∈ Sk and Q = Q∗ then Q∞ > Q∗ ∞ . Proof Recall that a nonzero polynomial p of degree k can have at most k zeros. If p(z) = p (z) = 0, we say that p has a double zero at z. Counting such a zero as two zeros it is still true that a nonzero polynomial of degree k has at most k zeros. |Q∗ | takes on its maximum 1/|Tk (u(c))| at the k + 1 points μ0 , . . . , μk in [a, b] such that u(μi ) = cos(iπ/k) for i = 0, 1, . . . , k. Suppose Q ∈ Sk and that Q∞ ≤ Q∗ ∞ . We have to show that Q ≡ Q∗ . Let f ≡ Q − Q∗ . We show that f has at least k zeros in [a, b]. Since f is a polynomial of degree ≤ k and f (c) = 0, this means that f ≡ 0 or equivalently Q ≡ Q∗ . Consider Ij = [μj −1 , μj ] for a fixed j . Let σj = f (μj −1 )f (μj ). We have σj ≤ 0. For if say Q∗ (μj ) > 0 then Q(μj ) ≤ Q∞ ≤ Q∗ ∞ = Q∗ (μj ) so that f (μj ) ≤ 0. Moreover, −Q(μj −1 ) ≤ Q∞ ≤ Q∗ ∞ = −Q∗ (μj −1 ). Thus f (μj −1 ) ≥ 0 and it follows that σj ≤ 0. Similarly, σj ≤ 0 if Q∗ (μj ) < 0. If σj < 0, f must have a zero in Ij since it is continuous. Suppose σj = 0. Then f (μj −1 ) = 0 or f (μj ) = 0. If f (μj ) = 0 then Q(μj ) = Q∗ (μj ). But then μj is a maximum or minimum both for Q and Q∗ . If μj ∈ (a, b) then Q (μj ) =

13.4 Proof of the Convergence Estimates

295

Q∗  (μj ) = 0. Thus f (μj ) = f  (μj ) = 0, and f has a double zero at μj . We can count this as one zero for Ij and one for Ij +1 . If μj = b, we still have a zero in Ij . Similarly, if f (μj −1 ) = 0, a double zero of f at μj −1 appears if μj −1 ∈ (a, b). We count this as one zero for Ij −1 and one for Ij . In this way we associate one zero of f for each of the k intervals Ij , j = 1, 2, . . . , k. We conclude that f has at least k zeros in [a, b].   Theorem 13.6 with a, and b, the smallest and largest eigenvalue of A, and c = 0 implies that the minimizing polynomial in (13.29) is given by b + a − 2x b+a Q∗ (x) = Tk /Tk . (13.33) b−a b−a By Lemma 13.5      b + a − 2x   max Tk (t) = 1. max Tk  = −1≤t a≤x≤b  ≤1 b−a Moreover with t = (b + a)/(b − a) we have √  κ +1 2 t + t −1= √ , κ −1

(13.34)

κ = b/a.

Thus again by Lemma 13.5 we find Tk

b+a b−a



= Tk

κ +1 κ −1



1 = 2

 √ √  κ +1 k κ −1 k √ + √ κ −1 κ +1

(13.35)

and (13.29) follows (Fig. 13.3).

Fig. 13.3 This is an illustration of the proof of Theorem 13.6 for k = 3. f ≡ Q −Q∗ has a double zero at μ1 and one zero between μ2 and μ3

296

13 The Conjugate Gradient Method

13.4.2 Convergence Proof for Steepest Descent For the proof of (13.19) the following inequality will be used. Theorem 13.7 (Kantorovich Inequality) For any positive definite matrix A ∈ Rn×n 1≤

(y T Ay)(y T A−1 y) (M + m)2 ≤ 4Mm (y T y)2

y = 0, y ∈ Rn ,

(13.36)

where M := λmax and m := λmin are the largest and smallest eigenvalue of A, respectively. Proof If (λj , uj ) are orthonormal eigenpairs of A then (λ−1 j , uj ) are eigenpairs n −1 for A , j = 1, . . . , n. Let y = j =1 cj uj be the corresponding eigenvector expansion of a vector y ∈ Rn . By orthonormality, (cf. (6.9))

y T Ay ti λi , = yT y n

a :=

i=1

ti y T A−1 y , = yT y λi n

b :=

(13.37)

i=1

where c2 ti = n i

2 j =1 cj

≥ 0,

i = 1, . . . , n and

n

ti = 1.

(13.38)

i=1

Thus a and b are convex combinations of the eigenvalues of A and A−1 , respectively. Let c be a positive constant to be chosen later. By the geometric/arithmetic mean inequality (8.33) and (13.37) √

ab =



 1 1  (ac)(b/c) ≤ (ac + b/c)/2 = ti λi c + 1/(λi c) = ti f (λi c), 2 2 n

n

i=1

i=1

where f : [mc, Mc] → R is given by f (x) := x + 1/x. By (13.38) √

ab ≤

1 max f (x). 2 mc≤x≤Mc

Since f ∈ C 2 and f  is positive it follows from Lemma 8.2 that f is a convex function. But a convex function takes it maximum at one of the endpoints of the range (cf. Exercise 13.16) and we obtain √ 1 ab ≤ max{f (mc), f (Mc)}. 2

(13.39)

13.4 Proof of the Convergence Estimates

297

% % √ M m Choosing c := 1/ mM we find f (mc) = f (Mc) = + m M = By (13.39) we obtain

M+m √ . mM

(y T Ay)(y T A−1 y) (M + m)2 , = ab ≤ (y T y)2 4Mm the upper bound in (13.36). For the lower bound we use the Cauchy-Schwarz inequality as follows 3 n 42 n n n

 2    1/2 1/2 1= ti = (ti λi ) (ti /λi ) ≤ ti λi ti /λi = ab. i=1

i=1

i=1

i=1

  Proof of (13.19) Let j := x − x j , j = 0, 1, . . ., where Ax = b. It is enough to show that  k+1 2A  k 2A for then  k A ≤



κ−1 κ+1





κ −1 κ +1

2

 k−1  ≤ · · · ≤

k = 0, 1, 2, . . . ,

, 

k+1 = k − αk r k ,

κ−1 κ+1

k

(13.40)

 0 . It follows from (13.8) that

αk :=

r Tk r k r Tk Ar k

.

We find  k 2A = Tk A k = r Tk A−1 r k ,  k+1 2A = ( k − αk r k )T A( k − αk r k ) = Tk A k − 2αk r Tk A k + αk2 r Tk Ar k =  k 2A −

(r Tk r k )2 r Tk Ar k

.

Combining these and using Kantorovich inequality  k+1 2A  k 2A

=1−

(r Tk r k )2

(r Tk Ar k )(r Tk A−1 r k )

≤1−

4λmin λmax = (λmin + λmax )2



κ −1 κ +1

2

and (13.40) is proved.  

298

13 The Conjugate Gradient Method

13.4.3 Monotonicity of the Error The error analysis for the conjugate gradient method is based on the A-norm. We end this chapter by considering the Euclidian norm of the error, and show that it is strictly decreasing. Theorem 13.8 (The Error in cg Is Strictly Decreasing) Let in the conjugate gradient method m be the smallest integer such that r m+1 = 0. For k ≤ m we have  k+1 2 <  k 2 . More precisely,  k 22 −  k+1 22 =

p k 22 p k 2A

( k 2A +  k+1 2A )

where j = x − x j and Ax = b. Proof For j ≤ m j = x m+1 − x j = x m − x j + αm p m = x m−1 − x j + αm−1 p m−1 + αm p m = . . . so that j =

m

αi pi ,

αi =

i=j

r Ti r i p Ti Ap i

(13.41)

.

By (13.41) and A-orthogonality  j 2A = j A j =

m

αi2 pTi Api =

i=j

m

(r T r i )2 i

i=j

pTi Ap i

.

(13.42)

By (13.17) and Lemma 13.3 pTi p k = (r i + βi−1 p i−1 )T pk = βi−1 p Ti−1 pk = · · · = βi−1 · · · βk (p Tk pk ), and since βi−1 · · · βk = (r Ti r i )/(r Tk r k ) we find p Ti p k =

r Ti r i r Tk r k

pTk p k ,

i ≥ k.

Since  k 22 =  k+1 + x k+1 − x k 22 =  k+1 + αk pk 22 ,

(13.43)

13.5 Preconditioning

299

we obtain    k 22 −  k+1 22 =αk 2pTk k+1 + αk pTk p k m m m

    = αk 2 αi pTi p k + αk pTk pk = + αk αi pTi pk

(13.41)

i=k+1 m

i=k

m

 r Tk r k (13.43) = + p Tk Ap k i=k i=k+1 2 (13.42) p k 2   k 2A = p k 2A

r Ti r i pTi Api

i=k+1

r Ti r i T pk p k r Tk r k

 +  k+1 2A .  

and the Theorem is proved.

13.5 Preconditioning For problems Ax = b of size n, where both n and cond2 (A) are large, it is often possible to improve the performance of the conjugate gradient method by using a technique known as preconditioning. Instead of Ax = b we consider an equivalent system BAx = Bb, where B is nonsingular and cond2 (BA) is smaller than cond2 (A). The matrix B will in many cases be the inverse of another matrix, B = M −1 . We cannot use CG on BAx = Bb directly since BA in general is not symmetric even if both A and B are. But if B (and hence M) is positive definite then we can apply CG to a symmetrized system and then transform the recurrence formulas to an iterative method for the original system Ax = b. This iterative method is known as the preconditioned conjugate gradient method. We shall see that the convergence properties of this method is determined by the eigenvalues of BA. Suppose B is positive definite. By Theorem 4.4 there is a nonsingular matrix C such that B = C T C. (C is only needed for the derivation and will not appear in the final formulas). Now BAx = Bb ⇔ C T (CAC T )C −T x = C T Cb ⇔ (CAC T )y = Cb, where y := C −T x. We have 3 linear systems Ax = b

(13.44)

BAx = Bb

(13.45)

(CAC )y = Cb, & x = C y. T

T

(13.46)

300

13 The Conjugate Gradient Method

Note that (13.44) and (13.46) are positive definite linear systems. In addition to being positive definite the matrix CAC T is similar to BA. Indeed, C T (CAC T )C −T = BA. Thus CAC T and BA have the same eigenvalues. Therefore, if we apply the conjugate gradient method to (13.46) then the rate of convergence will be determined by the eigenvalues of BA. We apply the conjugate gradient method to (CAC T )y = Cb. Denoting the search direction by q k and the residual by zk := Cb − CAC T y k we obtain the following from (13.15), (13.16), and (13.17). y k+1 = y k + αk q k ,

αk = zTk zk /q Tk (CAC T )q k ,

zk+1 = zk − αk (CAC T )q k , q k+1 = zk+1 + βk q k ,

βk = zTk+1 zk+1 /zTk zk .

With x k := C T y k ,

p k := C T q k ,

s k := C T zk ,

r k := C −1 zk

(13.47)

this can be transformed into x k+1 = x k + αk pk ,

αk =

s Tk r k pTk Apk

(13.48)

,

r k+1 = r k − αk Apk ,

(13.49)

s k+1 = s k − αk BAp k ,

(13.50)

p k+1 = s k+1 + βk pk ,

βk =

s Tk+1 r k+1 s Tk r k

.

(13.51)

Here x k will be an approximation to the solution x of Ax = b, r k = b − Ax k is the residual in the original system, and s k = C T zk = C T (C − CAC T )y k = Bb − BAx k is the residual in the preconditioned system. If we set r 0 = b − Ax 0 , p 0 = s 0 = Br 0 , we obtain the following preconditioned conjugate gradient algorithm for determining approximations x k to the solution of a positive definite system Ax = b, by considering the system BAx = Bb, with B positive definite. The iteration is stopped when ||r k ||2 /||b||2 ≤ tol or k > itmax. K is the number of iterations used, and x(= x 0 ) is the starting iteration.

13.5 Preconditioning

301

function [x,K]=pcg(A,B,b,x,tol,itmax) % [x,K]=pcg(A,B,b,x,tol,itmax) r=b-A*x; p=B*r; s=p; rho=s’*r; rho0=b’*b; for k=0:itmax if sqrt(rho/rho0) 0 for all (x, y) ∈ Ω. The problem (13.52) reduces to the Poisson problem (10.1) in the special case where c(x, y) = 1 for (x, y) ∈ Ω . To solve (13.52) numerically, we choose m ∈ N, set h := 1/(m + 1), and define index sets Im := {(j, k) : 1 ≤ j, k ≤ m}, I m := {(j, k) : 0 ≤ j, k ≤ m + 1}, ∂Im := I m \ Im . We compute approximations vj,k ≈ u(xj , yk ) on a grid of points {(xj , yk ) = (j h, kh) : (j, k) ∈ I m }

13.6 Preconditioning Example

303

using a finite difference method. For univariate functions f, g we approximate derivatives by using the central difference approximations d d h d h d h f (t) g(t) ≈ f (t + ) g(t + h/2) − f (t − ) g(t − ) / h dt dt 2 dt 2 dt 2   h  h  ≈ f (t + ) g(t + h) − g(t) − f (t − ) g(t) − g(t − h) / h2 2 2

to obtain (Lh v)j,k :=

(dv)j,k = fj,k , h2

vj,k = 0,

(j, k) ∈ Im ,

(13.53)

(j, k) ∈ ∂Im ,

(13.54)

where fj,k := f (xj , yk ), (dv)j,k := (d1 v)j,k + (d2 v)j,k , ∂  ∂u  c , ∂x ∂x j,k ∂  ∂u  c − vj,k−1 ) − cj,k+ 1 (vj,k+1 − vj,k ) ≈ −h2 , 2 ∂y ∂y j,k (13.55)

(d1 v)j,k := cj − 1 ,k (vj,k − vj −1,k ) − cj + 1 ,k (vj +1,k − vj,k ) ≈ −h2 2

2

(d2 v)j,k := cj,k− 1 (vj,k 2

and where cp,q = c(ph, qh) for p, q ∈ R. The equation (13.53) can be written in matrix form as ⎡ ⎡ ⎤ ⎤ f1,1 . . . f1,m (dv)1,1 . . . (dv)1,m 1 ⎢ ⎢ . ⎥ .. .. ⎥ . Lh v = F , Lh v := 2 ⎣ ... ⎦ , F := ⎣ .. . . ⎦ h (dv)m,1 . . . (dv)m,m fm,1 . . . fm,m (13.56) This is a linear system with the elements of ⎡

⎤ v1,1 . . . v1,m ⎢ .. ⎥ V := ⎣ ... . ⎦ vm,1 . . . vm,m as unknowns. The system h2 Lh v = h2 F can be written in standard form Ax = b where x = vec(V ), b = h2 vec(F ), and the coefficient matrix A ∈ Rn×n is defined as follows Ax = Avec(V ) := h2 vec(Lh v).

(13.57)

304

13 The Conjugate Gradient Method

If c(x, y) = 1 for all (x, y) ∈ Ω we recover the Poisson matrix (10.8). In general we can show that A is positive definite for all m ∈ N provided c(x, y) > 0 for all (x, y) ∈ Ω. For this we do not need the explicit form of A. To start we define for m ∈ N a discrete inner product on the space of matrices Rm×m V , W  := h2

m

vj,k wj,k ,

(13.58)

j,k=1

We then have the following lemma. Lemma 13.6 (Discrete Inner Product) If V , W ∈ Rm×m and vj,k = wj,k = 0 for (j, k) ∈ ∂Im , then Lh v, W  =

m m

   cj,k+ 1 vj,k+1 − vj,k wj,k+1 − wj,k 2

j =1 k=0

+

m m

j =0 k=1

cj + 1 ,k 2



  vj +1,k − vj,k wj +1,k − wj,k .

(13.59)

Proof If m ∈ N, ai , bi , ci ∈ R for i = 0, . . . , m and b0 = c0 = bm+1 = cm+1 = 0 then m



m

 ai−1 (bi − bi−1 ) − ai (bi+1 − bi ) ci = ai (bi+1 − bi )(ci+1 − ci ).

i=1

(13.60)

i=0

Indeed, the left hand side can be written m

i=0

ai (bi+1 − bi )ci+1 −

m

ai (bi+1 − bi )ci ,

i=0

and the right hand side of (13.60) follows. We apply (13.60) to (d1 v)j,k wj,k and (d2 v)j,k wj,k given by (13.55) and (13.59) follows.   Theorem 13.10 (Positive Definite Matrix) If c(x, y) > 0 for (x, y) ∈ Ω then the matrix A defined by (13.57) via the linear system (13.56) is positive definite. Proof By (13.59) Lh v, W  = W , Lh v and symmetry follows. We take W = V and obtain quadratic factors in (13.59). Since cj + 1 ,k and cj,k+ 1 correspond to 2 2 values of c in Ω for the values of j, k in the sums, it follows that they are positive and Lh v, V  ≥ 0 for all V ∈ Rm×m . If Lh v, V  = 0 then all the quadratic factors must be zero, and vj,k+1 = vj,k for k = 0, 1, . . . , m and j = 1, . . . , m. Now vj,0 = vj,m+1 = 0 implies that V = 0. It follows that the linear system (13.56) is positive definite.  

13.6 Preconditioning Example

305

13.6.2 Applying Preconditioning Consider solving Ax = b, where A is given by (13.57) and b ∈ Rn . Since A is positive definite it is nonsingular and the system has a unique solution x ∈ Rn . Moreover we can use either Cholesky factorization or the block tridiagonal solver √ to find x. Since the bandwidth of A is m = n both of these methods require O(n2 ) arithmetic operations for large n. If we choose c(x, y) ≡ 1 in (13.52), we get the Poisson problem. With this in mind, we may think of the coefficient matrix Ap arising from the discretization of the Poisson problem as an approximation to the matrix (13.57). This suggests using B = A−1 p , the inverse of the discrete Poisson matrix as a preconditioner for the system (13.53). Consider Algorithm 13.3. With this preconditioner the calculation w = Bt takes the form Ap w k = t k . In Sect. 11.2 we developed a Simple fast Poisson Solver, Cf. Algorithm 11.1. This method can be utilized to solve Ap w = t. Consider the specific problem where c(x, y) = e−x+y and f (x, y) = 1. We have used Algorithm 13.1 (conjugate gradient without preconditioning), and Algorithm 13.3 (conjugate gradient with preconditioning) to solve the problem (13.52). We used x 0 = 0 and = 10−8 . The results are shown in Table 13.3. Without preconditioning the number of iterations still seems to be more or √ less proportional to n although the convergence is slower than for the constant coefficient problem. Using preconditioning speeds up the convergence considerably. The number of iterations appears to be bounded independently of n. Using a preconditioner increases the work in each iteration. For the present example the number of arithmetic operations in each iteration changes from O(n) without preconditioning to O(n3/2 ) or O(n log2 n) with preconditioning. This is not a large increase and both the number of iterations and the computing time is reduced significantly. Let us finally show that the number κ = λmax /λmin which determines the rate of convergence for the preconditioned conjugate gradient method applied to (13.52) can be bounded independently of n.

Table 13.3 The number of iterations K (no preconditioning) and Kpre (with preconditioning) for the problem (13.52) using the discrete Poisson problem as a preconditioner n K √ K/ n Kpre

2500 222 4.44 22

10000 472 4.72 23

22500 728 4.85 23

40000 986 4.93 23

62500 1246 4.98 23

306

13 The Conjugate Gradient Method

Theorem 13.12 (Eigenvalues of Preconditioned Matrix) Suppose 0 < c0 ≤ c(x, y) ≤ c1 for all (x, y) ∈ [0, 1]2. For the eigenvalues of the matrix BA = A−1 p A just described we have κ=

λmax c1 ≤ . λmin c0

n Proof Suppose A−1 p Ax = λx for some x ∈ R \ {0}. Then Ax = λAp x. T Multiplying this by x and solving for λ we find

λ=

x T Ax . x T Ap x

We computed x T Ax in (13.59) and we obtain x T Ap x by setting all the c’s there equal to one x T Ap x =

m m



vi,j +1 − vi,j

i=1 j =0

2

+

m m



2 vi+1,j − vi,j .

j =1 i=0

Thus x T Ap x > 0 and bounding all the c’s in (13.59) from below by c0 and above by c1 we find c0 (x T Ap x) ≤ x T Ax ≤ c1 (x T Ap x) which implies that c0 ≤ λ ≤ c1 for all eigenvalues λ of BA = A−1 p A.

 

Using c(x, y) = e−x+y as above, we find c0 = e−2 and c1 = 1. Thus κ ≤ e2 ≈ 7.4, a quite acceptable matrix condition number which explains the convergence results from our numerical experiment.

13.7 Exercises Chap. 13 13.7.1 Exercises Sect. 13.1 Exercise 13.1 (A-Norm) One can show that the A-norm is a vector norm on Rn without using the fact that it is an inner product norm. Show this with the help of the Cholesky factorization of A. Exercise 13.2 (Paraboloid) Let A = U DU T be the spectral decomposition of A, i.e., U is orthogonal and D = diag(λ1 , . . . , λn ) is diagonal. Define new variables

13.7 Exercises Chap. 13

307

v = [v1 , . . . , vn ]T := U T y, and set c := U T b = [c1 , . . . , cn ]T . Show that Q(y) =

1 λj vj2 − cj vj . 2 n

n

j =1

j =1

Exercise 13.3 (Steepest Descent Iteration) Verify the numbers in Example 13.1. Exercise 13.4 (Steepest Descent (Exam Exercise 2011-1)) The method of steepest descent can be used to solve a linear system Ax = b for x ∈ Rn , where A ∈ Rn,n is symmetric and positive definite, and b ∈ Rn . With x 0 ∈ Rn an initial guess, the iteration is x k+1 = x k + αk r k , where r k is the residual, r k = b − Ax k , and r Tk r k . r Tk Ar k

αk =



 2 −1 a) Compute x 1 if A = , b = [1 1]T and x 0 = 0. −1 2 b) If the k-th error, ek = x k − x, is an eigenvector of A, what can you say about x k+1 ?

13.7.2 Exercises Sect. 13.2 Exercise 13.5 (Conjugate Gradient Iteration, II) Do one iteration with the T conjugate gradient method when x 0 = 0. (Answer: x 1 = bT b b.) b Ab

Exercise 13.6 (Conjugate Gradient Iteration, III) Do two conjugate gradient iterations for the system 

2 −1 −1 2

    0 x1 = x2 3

starting with x 0 = 0. Exercise 13.7 (The cg Step Length Is Optimal) Show that the step length αk in the conjugate gradient method is optimal.2 Exercise 13.8 (Starting Value in cg) Show that the conjugate gradient method (13.18) for Ax = b starting with x 0 is the same as applying the method to the system Ay = r 0 := b − Ax 0 starting with y 0 = 0.3 k−1

2 Hint:

use induction on k to show that pk = r k +

3 Hint:

The conjugate gradient method for Ay = r 0 can be written y k+1 := y k + γk q k , γk :=

s Tk s k , q Tk Aq k

j =0 ak,j r j

s k+1 := s k − γk Aq k , q k+1 := s k+1 + δk q k , δk :=

s k = r k , and q k = pk , for k = 0, 1, 2 . . ..

for some constants ak,j .

s Tk+1 s k+1 . s Tk s k

Show that y k = x k − x 0 ,

308

13 The Conjugate Gradient Method

Exercise 13.9 (Program Code for Testing Steepest Descent) Write a function K=sdtest(m,a,d,tol,itmax) to test the steepest descent method on the matrix T 2 . Make the analogues of Tables 13.1 and 13.2. For Table 13.2 it √ is enough to test for say n = 100, 400, 1600, 2500, and tabulate K/n instead of K/ n in the last row. Conclude that the upper bound (13.19) is realistic. Compare also with the number of iterations for the J and GS method in Table 12.1. Exercise 13.10 (Using cg to Solve Normal Equations) Consider solving the linear system AT Ax = AT b by using the conjugate gradient method. Here A ∈ Rm,n , b ∈ Rm and AT A is positive definite.4 Explain why only the following modifications in Algorithm 13.1 are necessary 1. r=A’(b-A*x); p=r; 2. a=rho/(t’*t); 3. r=r-a*A’*t; Note that the condition number of the normal equations is cond2 (A)2 , the square of the condition number of A. Exercise 13.11 (AT A Inner Product (Exam Exercise 2018-3)) In this problem we consider linear systems of the form Ax = b, where A ∈ Rn×n and b ∈ Rn are given, and x ∈ Rn is the unknown vector. We assume throughout that A is nonsingular. a) Let {v i }ki=1 be a set of linearly independent vectors in Rn , and let ·, · be an inner product in Rn . Explain that the k × k-matrix N with entries nij = v i , v j  is symmetric positive definite. b) Let W ⊂ Rn be any linear subspace. Show that there is one and only one vector xˆ ∈ W so that wT AT Axˆ = w T AT b,

for all w ∈ W,

and that xˆ satisfies b − Ax ˆ 2 ≤ b − Aw2 ,

for all w ∈ W.

c) In the rest of this problem we consider the situation above, but where the vector space W is taken to be the Krylov space Wk := span(b, Ab, . . . , Ak−1 b). We use the inner product in Rn given by v, wA := v T AT Aw,

v, w ∈ Rn .

4 This system known as the normal equations appears in linear least squares problems and was considered in this context in Chap. 9.

13.7 Exercises Chap. 13

309

The associated approximations of x, corresponding to xˆ in Wk , are then denoted x k . Assume that x k ∈ Wk is already determined. In addition, assume that we already have computed a “search direction” pk ∈ Wk+1 such that Apk 2 = pk A = 1, and such that p k , wA = 0,

for all w ∈ Wk .

Show that x k+1 = x k + αk p k for a suitable αk ∈ R, and express αk in terms of the residual r k := b − Ax k , and p k . d) Assume that A is symmetric, but not necessarily positive definite. Assume further that the vectors pk−2 , p k−1 , and pk are already known with properties as above. Show that Ap k−1 ∈ span(pk−2 , p k−1 , pk ). Use this to suggest how the search vectors p k can be computed recursively.

13.7.3 Exercises Sect. 13.3 Exercise 13.12 (Krylov Space and cg Iterations) Consider the linear system Ax = b where ⎡

⎤ 2 −1 0 A = ⎣ −1 2 −1 ⎦ , 0 −1 2

⎡ ⎤ 4 ⎣ and b = 0 ⎦ . 0

a) Determine the vectors defining the Krylov spaces for k ≤ 3 taking as initial ⎡ ⎤ 4 8 20 approximation x = 0. Answer: [b, Ab, A2 b] = ⎣ 0 −4 −16 ⎦ . 0 0 4 b) Carry out three CG-iterations on Ax = b. Answer: ⎡

⎤ 0 2 8/3 3 [x 0 , x 1 , x 2 , x 3 ] = ⎣ 0 0 4/3 2 ⎦ , 00 01 ⎡

⎤ 40 00 [r 0 , r 1 , r 2 , r 3 ] = ⎣ 0 2 0 0 ⎦ , 0 0 4/3 0

310

13 The Conjugate Gradient Method



⎤ 8 0 0 [Ap0 , Ap1 , Ap2 ] = ⎣ −4 3 0⎦, 0 −2 16/9 ⎡

⎤ 4 1 4/9 0 [p0 , p1 , p 2 , p 3 ] = ⎣ 0 2 8/9 0 ⎦ , 0 0 12/9 0 c) Verify that • • • • • •

dim(Wk ) = k for k = 0, 1, 2, 3. x 3 is the exact solution of Ax = b. r 0 , . . . , r k−1 is an orthogonal basis for Wk for k = 1, 2, 3. p 0 , . . . , pk−1 is an A-orthogonal basis for Wk for k = 1, 2, 3. {r k  is monotonically decreasing. {x k − x is monotonically decreasing.

Exercise 13.13 (Antisymmetric System (Exam Exercise 1983-3)) In this and the next exercise x, y = x T y is the usual inner product in Rn . We note that x, y = y, x, Cx, y = y, C T x

x, y ∈ Rn ,

(13.61)

(13.61)

= C T x, y,

x, y ∈ Rn , C ∈ Rn×n .

(13.62)

Let B ∈ Rn×n be an antisymmetric matrix, i.e., B T = −B, and let A := I − B, where I is the unit matrix in Rn . a) Show that Bx, x = 0,

x ∈ Rn ,

(13.63)

Ax, x = x, x = x22 . b) Show that Ax22 = x22 + Bx22 and that A2 = c) Show that A is nonsingular, A−1 2 = max x =0

(13.64) %

1 + B22 .

x2 , Ax2

and A2 ≤ 1. d) Let 1 ≤ k ≤ n, W = span(w1 , . . . , w k ) a k-dimensional subspace of Rn and b ∈ Rn . Show that if x ∈ W is such that Ax, w = b, w for all w ∈ W, then x2 ≤ b2 .

(13.65)

13.7 Exercises Chap. 13

311

k With x := j =1 xj w j the problem (13.65) is equivalent to finding real numbers x1 , . . . , xk solving the linear system k

xj Awj , w i  = b, wi ,

i = 1, . . . , k.

(13.66)

j =1

Show that (13.65) has a unique solution x ∈ W. e) Let x ∗ := A−1 b. Show that x ∗ − x2 ≤ A2 min x ∗ − w2 . w∈W

(13.67)

Exercise 13.14 (cg Antisymmetric System (Exam Exercise 1983-4)) (It is recommended to study Exercise 13.13 before starting this exercise.) As in Exercise 13.13 let B ∈ Rn×n be an antisymmetric matrix, i.e., B T = −B, let x, y = x T y be the usual inner product in Rn , let A := I − B, where I is the unit matrix in Rn and b ∈ Rn . The purpose of this exercise is to develop an iterative algorithm for the linear system Ax = b. The algorithm is partly built on the same idea as for the conjugate gradient method for positive definite systems. Let x 0 = 0 be the initial approximation to the exact solution x ∗ := A−1 b. For k = 1, 2, . . . , n we let Wk := span(b, Bb, . . . , B k−1 b). For k = 1, 2, . . . , n we define x k ∈ Wk by Ax k , w = b, w, for all w ∈ Wk . The vector x k is uniquely determined as shown in Exercise 13.13d) and that it is a “good” approximation to x∗ follows from (13.67). In this exercise we will derive a recursive algorithm to determine x k . For k = 0, . . . , n we set r k := b − Ax k , and ρk := r k 22 . Let m ∈ N be such that ρk = 0,

k = 0, . . . , m.

Let ω0 , ω1 , . . . , ωm be real numbers defined recursively for k = 1, 2, . . . , m by ωk :=

1, −1 (1 + ωk−1 ρk /ρk−1 )−1 ,

if k = 0 otherwise.

(13.68)

312

13 The Conjugate Gradient Method

We will show below that x k and r k satisfy the following recurrence relations for k = 0, 1, . . . , m − 1 x k+1 = (1 − ωk )x k−1 + ωk (x k + r k ),

(13.69)

r k+1 = (1 − ωk )r k−1 + ωk Br k ,

(13.70)

starting with x 0 = x −1 = 0 and r 0 = r −1 = b. a) Show that 0 < ωk < 1 for k = 1, 2, . . . , m. b) Explain briefly how to define an iterative algorithm for determining x k using the formulas (13.68), (13.69), (13.70) and estimate the number of arithmetic operations in each iteration. c) Show that r k , r j  = 0 for j = 0, 1, . . . , k − 1. d) Show that if k ≤ m + 1 then Wk = span(r 0 , r 1 , . . . , r k−1 ) and dimWk = k. e) Show that if 1 ≤ k ≤ m − 1 then Br k = αk r k+1 + βk r k−1 ,

(13.71)

where αk := Br k , r k+1 /ρk+1 and βk := Br k , r k−1 /ρk−1 . f) Define α0 := Br 0 , r 1 /ρ1 and show that α0 = 1. g) Show that if 1 ≤ k ≤ m − 1 then βk = −αk−1 ρk /ρk−1 . h) Show that5 r k+1 , A−1 r k+1  = r k+1 , A−1 r j ,

j = 0, 1, . . . , k.

(13.72)

i) Use (13.71) and (13.72) to show that αk + βk = 1 for k = 1, 2, . . . , m − 1. j) Show that αk ≥ 1 for k = 1, 2, . . . , m − 1. k) Show that x k , r k and ωk satisfy the recurrence relations (13.68), (13.69) and (13.70).

13.7.4 Exercises Sect. 13.4 Exercise 13.15 (Another Explicit Formula for the Chebyshev Polynomial) Show that Tn (t) = cosh(narccosh t) for t ≥ 1, where arccosh is the inverse function of cosh x := (ex + e−x )/2.

5 Hint:

Show that A−1 (r k+1 − r j ) ∈ Wk+1 .

13.8 Review Questions

313

Exercise 13.16 (Maximum of a Convex Function) Show that if f : [a, b] → R is convex then maxa≤x≤b f (x) ≤ max{f (a), f (b)}.

13.7.5 Exercises Sect. 13.5 Exercise 13.17 (Variable form ⎡ a1,1 ⎢ ⎢ −c 3 ,1 2 Ax = ⎢ ⎢ −c 3 ⎣ 1, 2 0

Coefficient) For m = 2, show that (13.57) takes the −c 3 ,1 −c1, 3 2

2

a2,2

0

0

a3,3

−c2, 3 −c 3 ,2 2

2



⎡ ⎤ ⎡ ⎤ (dv)1,1 ⎥ v1,1 −c2, 3 ⎥ ⎢v2,1 ⎥ ⎢(dv)2,1 ⎥ 2 ⎥⎢ ⎥ ⎢ ⎥ ⎣v1,2 ⎦ = ⎣(dv)1,2 ⎦ , −c 3 ,2 ⎥ ⎦ 2 v2,2 (dv)2,2 a4,4 0

where ⎡ ⎤ c 1 ,1 + c1, 1 a1,1 2 ⎢ 2 ⎢ a2,2 ⎥ ⎢ c 3 ,1 + c2, 1 ⎥=⎢ 2 ⎢ 2 ⎣ a3,3 ⎦ ⎢ c 1 + c 3 ⎣ 2 ,2 1, 2 a4,4 c3 + c 3 ⎡

2 ,2

2, 2

+ c1, 3 + c 3 ,1



2 2 ⎥ + c2, 3 + c 5 ,1 ⎥ ⎥. 2 2 + c1, 5 + c 3 ,2 ⎥ ⎦ 2 2 + c2, 5 + c 5 ,2 2

2

Show that the matrix A is symmetric, and if c(x, y) > 0 for all (x, y) ∈ Ω then it is strictly diagonally dominant.

13.8 Review Questions 13.8.1 13.8.2 13.8.3 13.8.4 13.8.5

Does the steepest descent and conjugate gradient method always converge? What kind of orthogonalities occur in the conjugate gradient method? What is a Krylov space? What is a convex function? How do SOR and conjugate gradient compare?

Part VI

Eigenvalues and Eigenvectors

In this and the next chapter we briefly give some numerical methods for finding one or more eigenvalues and eigenvectors of a matrix. Both Hermitian and non hermitian matrices are considered. But first we consider a location result for eigenvalues and then give a useful upper bound for how much an eigenvalue can change when the elements of the matrix is perturbed.

Chapter 14

Numerical Eigenvalue Problems

14.1 Eigenpairs Consider the eigenpair problem for some classes of matrices A ∈ Cn×n . Diagonal Matrices The eigenpairs are easily determined. Since Aei = aii ei the eigenpairs are (λi , ei ), where λi = aii for i = 1, . . . , n. Moreover, the eigenvectors of A are linearly independent. Triangular Matrices Suppose A is upper  or lower triangular. Consider finding the eigenvalues Since det(A − λI ) = ni=1 (aii − λ) the eigenvalues are λi = aii for i = 1, . . . , n, the diagonal elements of A. To determine the eigenvectors can be more challenging since A can be defective, i.e., the eigenvectors are not necessarily linearly independent, cf. Chap. 6. Block Diagonal Matrices Suppose A = diag(A1 , A2 , . . . , Ar ),

Ai ∈ Cmi ×mi .

Here the eigenpair problem reduces to r smaller problems. Let Ai X i = Xi D i define the eigenpairs of Ai for i = 1, . . . , r and let X := diag(X1 , . . . , X r ), D := diag(D 1 , . . . , D r ). Then the eigenpairs for A are given by AD = diag(A1 , . . . , Ar ) diag(X 1 , . . . , Xr ) = diag(A1 X1 , . . . , Ar Xr ) = diag(X1 D 1 , . . . , Xr D r ) = XD.

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_14

317

318

14 Numerical Eigenvalue Problems

Block Triangular matrices Matrices Let A11 , A22 , . . . , Arr be the diagonal blocks of A. By Property 8. of determinants det(A − λI ) =

r 

det(Aii − λI )

i=1

and the eigenvalues are found from the eigenvalues of the diagonal blocks. In this and the next chapter we consider some numerical methods for finding one or more of the eigenvalues and eigenvectors of a matrix A ∈ Cn×n . Maybe the first method which comes to mind is to form the characteristic polynomial πA of A, and then use a polynomial root finder, like Newton’s method to determine one or several of the eigenvalues. It turns out that this is not suitable as an all purpose method. One reason is that a small change in one of the coefficients of πA (λ) can lead to a large change in the roots of the polynomial. For example, if πA (λ :) = λ16 and q(λ) = λ16 −10−16 then the roots of πA are all equal to zero, while the roots of q are λj = 10−1 e2πij/16 , j = 1, . . . , 16. The roots of q have absolute value 0.1 and a perturbation in one of the polynomial coefficients of magnitude 10−16 has led to an error in the roots of approximately 0.1. The situation can be somewhat remedied by representing the polynomials using a different basis. In this text we will only consider methods which work directly with the matrix. But before that, in Sect. 14.3 we consider how much the eigenvalues change when the elements in the matrix are perturbed. We start with a simple but useful result for locating the eigenvalues.

14.2 Gershgorin’s Theorem The following theorem is useful for locating eigenvalues of an arbitrary square matrix. Theorem 14.1 (Gershgorin’s Circle Theorem) Suppose A ∈ Cn×n . Define for i = 1, 2, . . . , n Ri = {z ∈ C : |z − aii | ≤ ri },

ri :=

n

|aij |,

j =1 j =i

Cj = {z ∈ C : |z − ajj | ≤ cj },

cj :=

n

i=1 i =j

|aij |.

14.2 Gershgorin’s Theorem

319

Imaginary axis

Fig. 14.1 The Gershgorin disk Ri

6 ai,i

- ri - Real axis

Then any eigenvalue of A lies in R ∩ C where R = R1 ∪ R2 ∪ · · · ∪ Rn and C = C1 ∪ C2 ∪ · · · ∪ Cn . Proof Suppose (λ, x) is an eigenpair for A. We claim that λ ∈ Ri , where i is such that |x | = x . Indeed, Ax = λx implies that i ∞ j aij xj = λxi or (λ − aii )xi = a x . Dividing by x and taking absolute values we find ij j i j =i |λ − aii | = |

aij xj /xi | ≤

j =i

|aij ||xj /xi | ≤ ri

j =i

since |xj /xi | ≤ 1 for all j . Thus λ ∈ Ri . Since λ is also an eigenvalue of AT , it must be in one of the row disks of AT . But these are the column disks Cj of A. Hence λ ∈ Cj for some j .   The set Ri is a subset of the complex plane consisting of all points inside a circle with center at aii and radius ri , c.f. Fig. 14.1. Ri is called a (Gerschgorin) row disk. An eigenvalue λ lies in the union of the row disks R1 , . . . , Rn and also in the union of the column disks C1 , . . . , Cn . If A is Hermitian then Ri = Ci for i = 1, 2, . . . , n. Moreover, in this case the eigenvalues of A are real, and the Gerschgorin disks can be taken to be intervals on the real line. Example 14.1 (Gershgorin) Let T = tridiag(−1, 2, −1) ∈ Rm×m be the second derivative matrix. Since A is Hermitian we have Ri = Ci for all i and the eigenvalues are real. We find R1 = Rm = {z ∈ R : |z − 2| ≤ 1} and Ri = {z ∈ R : |z − 2| ≤ 2},

i = 2, 3, . . . , m − 1.

We conclude that λ ∈ [0, 4] for any eigenvalue λ of T . To check this, we recall that by Lemma 2.2 the eigenvalues of T are given by  λj = 4 sin

jπ 2(m + 1)

2 ,

j = 1, 2, . . . , m.

320

14 Numerical Eigenvalue Problems

"2 ! π When m is large the smallest eigenvalue 4 sin 2(m+1) is very close to zero and "2 ! mπ the largest eigenvalue 4 sin 2(m+1) is very close to 4. Thus Gerschgorin’s theorem gives a remarkably good estimate for large m. Sometimes some of the Gerschgorin disks are distinct and we have Corollary 14.1 (Disjoint Gershgorin Disks) If p of the Gershgorin row disks are disjoint from the others, the union of these disks contains precisely p eigenvalues. The same result holds for the column disks. Proof Consider a family of matrices A(t) := D + t (A − D),

D := diag(a11 , . . . , ann ),

t ∈ [0, 1].

We have A(0) = D and A(1) = A. As a function of t, every eigenvalue of A(t) is a continuous function of t. This follows from Theorem 14.2, see Exercise 14.5. The row disks Ri (t) of A(t) have radius proportional to t, indeed Ri (t) = {z ∈ C : |z − aii | ≤ tri },

ri :=

n

|aij |.

j=1

j =i

Clearly 0 ≤ t1 < t25≤ 1 implies Ri (t1 ) ⊂ Ri (t2 ) and Ri (1) is a row disk of p A for all i. Suppose k=1 Rik (1) are disjoint from the other disks of A and set 5 p R p (t) := k=1 Rik (t) for t ∈ [0, 1]. Now R p (0) contains only the p eigenvalues ai1 ,i1 , . . . , aip ,ip of A(0) = D. As t increases from zero to one the set R p (t) is disjoint from the other row disks of A and by the continuity of the eigenvalues cannot loose or gain eigenvalues. It follows that R p (1) must contain p eigenvalues of A.     1 1 2 Example 14.2 Consider the matrix A = 3 2 4 , where | i | ≤ 10−15 all i. By 5 6 3

Corollary 14.1 the eigenvalues λ1 , λ2 , λ3 of A are distinct and satisfy |λj − j | ≤ 2 × 10−15 for j = 1, 2, 3.

14.3 Perturbation of Eigenvalues In this section we study the following problem. Given matrices A, E ∈ Cn×n , where we think of E as a perturbation of A. By how much do the eigenvalues of A and A+ E differ? Not surprisingly this problem is more complicated than the corresponding problem for linear systems. We illustrate this by considering two examples. Suppose A0 := 0 is the zero matrix. If λ ∈ σ (A0 + E) = σ (E), then |λ| ≤ E∞ by Theorem 12.11, and any

14.3 Perturbation of Eigenvalues

321

zero eigenvalue of A0 is perturbed by at most E∞ . On the other hand consider for > 0 the matrices ⎤ 0 0⎥ ⎥ .. ⎥ , .⎥ ⎥ 0 0 · · · 0 1⎦ 0 0 0 ··· 0 0

⎡ 0 ⎢0 ⎢ ⎢ A1 := ⎢ ... ⎢ ⎣0

1 0 ··· 0 1 ··· .. .. . .

0 0 .. .



⎤ 00 0 0⎥ ⎥ .. .. ⎥ = e eT . n 1 . .⎥ ⎥ ⎦ 0 ··· 0 0 0 0 ··· 0 0

00 ⎢0 0 ⎢ ⎢ E := ⎢ ... ... ⎢ ⎣0 0

0 ··· 0 ··· .. .

The characteristic polynomial of A1 + E is π(λ) := (−1)n (λn − ), and the zero 1/n eigenvalues of A1 are perturbed by the amount |λ| = E∞ . Thus, for n = 16, a −16 perturbation of say = 10 gives a change in eigenvalue of 0.1. 1/n The following theorem shows that a dependence E∞ is the worst that can happen. Theorem 14.2 (Elsner’s Theorem (1985)) Suppose A, E ∈ Cn×n . To every μ ∈ σ (A + E) there is a λ ∈ σ (A) such that 1/n

|μ − λ| ≤ KE2 ,

1−1/n  K = A2 + A + E2 .

(14.1)

Proof Suppose A has eigenvalues λ1 , . . . , λn and let λ1 be one which is closest to μ. Let u1 with u1 2 = 1 be an eigenvector corresponding to μ, and extend u1 to an orthonormal basis {u1 , . . . , un } of Cn . Note that (μI − A)u1 2 = (A + E)u1 − Au1 2 = Eu1 2 ≤ E2 , n 

(μI − A)uj 2 ≤

j =2

n 

n−1  (|μ| + Auj 2 ) ≤ (A + E)2 + A2 .

j =2

Using this and Hadamard’s inequality (5.21) we find |μ − λ1 |n ≤

n 

  |μ − λj | = |det(μI − A)| = |det (μI − A)[u1 , . . . , un ] |

j =1

≤ (μI − A)u1 2

n 

 n−1 (μI − A)uj 2 ≤ E2 (A + E)2 + A2 .

j =2

The result follows by taking nth roots in this inequality.

 

It follows from this theorem that the eigenvalues depend continuously on the 1/n elements of the matrix. The factor E2 shows that this dependence is almost,  11 but not √ quite, differentiable. As an example, the eigenvalues of the matrix 1 are 1 ± and these expressions are not differentiable at = 0.

322

14 Numerical Eigenvalue Problems

14.3.1 Nondefective Matrices Recall that a matrix is nondefective if the eigenvectors form a basis for Cn . For nondefective matrices we can get rid of the annoying exponent 1/n in E2 in (14.1). For a more general discussion than the one in the following theorem see [19]. Theorem 14.3 (Absolute Errors) Suppose A ∈ Cn×n has linearly independent eigenvectors {x 1 , . . . , x n } and let X = [x 1 , . . . , x n ] be the eigenvector matrix. To any μ ∈ C and x ∈ Cn with xp = 1 we can find an eigenvalue λ of A such that |λ − μ| ≤ Kp (X)rp ,

1 ≤ p ≤ ∞,

(14.2)

where r := Ax − μx and Kp (X) := Xp X−1 p . If for some E ∈ Cn×n it holds that (μ, x) is an eigenpair for A + E, then we can find an eigenvalue λ of A such that |λ − μ| ≤ Kp (X)Ep ,

1 ≤ p ≤ ∞,

(14.3)

Proof If μ ∈ σ (A) then we can take λ = μ and (14.2), (14.3) hold trivially. So assume μ ∈ / σ (A). Since A is nondefective it can be diagonalized, we have A = XDX−1 , where D = diag(λ1 , . . . , λn ) and (λj , x j ) are the eigenpairs of A for j =   −1 −1 1, . . . , n. Define D 1 := D − μI . Then D −1 1 = diag (λ1 − μ) , . . . , (λn − μ) exists and   −1 −1 −1 r = (A − μI )−1 (A − μI )x = x. XD −1 1 X r = X(D − μI )X Using this and Lemma 14.1 below we obtain −1 −1 1 = xp = XD −1 1 X rp ≤ D 1 p Kp (X)rp =

Kp (X)rp . minj |λj − μ|

But then (14.2) follows. If (A + E)x = μx then 0 = Ax − μx + Ex = r + Ex. But then rp = −Exp ≤ Ep . Inserting this in (14.2) proves (14.3).   The equation (14.3) shows that for a nondefective matrix the absolute error can be magnified by at most Kp (X), the condition number of the eigenvector matrix with respect to inversion. If Kp (X) is small then a small perturbation changes the eigenvalues by small amounts. Even if we get rid of the exponent 1/n, the equation (14.3) illustrates that it can be difficult or sometimes impossible to compute accurate eigenvalues and eigenvectors of matrices with almost linearly dependent eigenvectors. On the other hand the eigenvalue problem for normal matrices is better conditioned. Indeed, if A is normal then it has a set of orthonormal eigenvectors and the eigenvector matrix is unitary. If

14.3 Perturbation of Eigenvalues

323

we restrict attention to the 2-norm then K2 (X) = 1 and (14.3) implies the following result. Theorem 14.4 (Perturbations, Normal Matrix) Suppose A ∈ Cn×n is normal and let μ be an eigenvalue of A + E for some E ∈ Cn×n . Then we can find an eigenvalue λ of A such that |λ − μ| ≤ E2 . For an even stronger result for Hermitian matrices see Corollary 6.13. We conclude that the situation for the absolute error in an eigenvalue of a Hermitian matrix is quite satisfactory. Small perturbations in the elements are not magnified in the eigenvalues. In the proof of Theorem 14.3 we used that the p-norm of a diagonal matrix is equal to its spectral radius. Lemma 14.1 (p-Norm of a Diagonal Matrix) If A = diag(λ1 , . . . , λn ) is a diagonal matrix then Ap = ρ(A) for 1 ≤ p ≤ ∞. Proof For p = ∞ the proof is left as an exercise. For any x ∈ Cn and p < ∞ we have Axp = [λ1 x1 , . . . , λn xn ]T p =

n  1/p |λj |p |xj |p ≤ ρ(A)xp . j =1

Ax

Thus Ap = maxx =0 xpp ≤ ρ(A). But from Theorem 12.11 we have ρ(A) ≤   Ap and the proof is complete. For the accuracy of an eigenvalue of small magnitude we are interested in the size of the relative error. Theorem 14.5 (Relative Errors) Suppose in Theorem 14.3 that A ∈ Cn×n is nonsingular. To any μ ∈ C and x ∈ Cn with xp = 1, we can find an eigenvalue λ of A such that rp |λ − μ| ≤ Kp (X)Kp (A) , |λ| Ap

1 ≤ p ≤ ∞,

(14.4)

where r := Ax − μx. If for some E ∈ Cn×n it holds that (μ, x) is an eigenpair for A + E, then we can find an eigenvalue λ of A such that Ep |λ − μ| ≤ Kp (X)A−1 Ep ≤ Kp (X)Kp (A) , |λ| Ap

1 ≤ p ≤ ∞,

Proof Applying Theorem 12.11 to A−1 we have for any λ ∈ σ (A) Kp (A) 1 ≤ A−1 p = λ Ap

(14.5)

324

14 Numerical Eigenvalue Problems

and (14.4) follows from (14.2). To prove (14.5) we define the matrices B := μA−1  μ −1 and F := −A E. If (λj , x) are the eigenpairs for A then λj , x are the eigenpairs for B for j = 1, . . . , n. Since (μ, x) is an eigenpair for A + E we find   (B + F − I )x = (μA−1 − A−1 E − I )x = A−1 μI − (E + A) x = 0. Thus (1, x) is an eigenpair for B + F . Applying Theorem 14.3 to this eigenvalue we can find λ ∈ σ (A) such that | μλ − 1| ≤ Kp (X)F p = Kp (X)A−1 Ep which proves the first estimate in (14.5). The second inequality in (14.5) follows from the submultiplicativity of the p-norm.  

14.4 Unitary Similarity Transformation of a Matrix into Upper Hessenberg Form Before attempting to find eigenvalues and eigenvectors of a matrix (exceptions are made for certain sparse matrices), it is often advantageous to reduce it by similarity transformations to a simpler form. Orthogonal or unitary similarity transformations are particularly important since they are insensitive to round-off errors in the elements of the matrix. In this section we show how this reduction can be carried out. Recall that a matrix A ∈ Cn×n is upper Hessenberg if ai,j = 0 for j = 1, 2, . . . , i −2, i = 3, 4, . . . , n. We will reduce A ∈ Cn×n to upper Hessenberg form by unitary similarity transformations. Let A1 = A and define Ak+1 = H k Ak H k for k = 1, 2, . . . , n − 2. Here H k is a Householder transformation chosen to introduce zeros in the elements of column k of Ak under the subdiagonal. The final matrix An−1 will be upper Hessenberg. Householder transformations were used in Chap. 5 to reduce a matrix to upper triangular form. To preserve eigenvalues similarity transformations are needed and then the final matrix in the reduction cannot in general be upper triangular. If A1 = A is Hermitian, the matrix An−1 will be Hermitian and tridiagonal. For if A∗k = Ak then A∗k+1 = (H k Ak H k )∗ = H k A∗k H k = Ak+1 . Since An−1 is upper Hessenberg and Hermitian, it must be tridiagonal. To describe the reduction to upper Hessenberg or tridiagonal form in more detail we partition Ak as follows 

 Bk Ck Ak = . Dk Ek

14.4 Unitary Similarity Transformation of a Matrix into Upper Hessenberg Form

325

Suppose B k ∈ Ck,k is upper Hessenberg, and the first k − 1 columns of D k ∈ Cn−k,k are zero, i.e. D k = [0, 0, . . . , 0, d k ]. Let V k = I − v k v ∗k ∈ Cn−k,n−k be a Householder transformation such that V k d k = αk e1 . Define 

Ik 0 Hk = 0 Vk

 ∈ Cn×n .

The matrix H k is a Householder transformation, and we find 

Ak+1

Ik 0 = H k Ak H k = 0 Vk   Ck V k Bk . = V k Dk V k Ek V k



Bk Ck Dk Ek



Ik 0 0 Vk



Now V k D k = [V k 0, . . . , V k 0, V k d k ] = (0, . . . , 0, αk e1 ). Moreover, the matrix B k is not affected by the H k transformation. Therefore the upper left (k+1)×(k+1) corner of Ak+1 is upper Hessenberg and the reduction is carried one step further. The reduction stops with An−1 which is upper Hessenberg. To find Ak+1 we use Algorithm 5.1 to find v k and αk . We store v k in the kth column of a matrix L as L(k + 1 : n, k) = v k . This leads to the following algorithm for reducing a matrix A ∈ Cn×n to upper Hessenberg form using Householder transformations. The algorithm returns the reduced matrix B. B is tridiagonal if A is symmetric. Details of the transformations are stored in a lower triangular matrix L, also returned by the algorithm. The elements of L can be used to assemble a unitary matrix Q such that B = Q∗ AQ. Algorithm 5.1 is used in each step of the reduction: function [L,B] = hesshousegen(A) n=length(A); L=zeros(n,n); B=A; for k=1:n-2 [v,B(k+1,k)]=housegen(B(k+1:n,k)); L((k+1):n,k)=v; B((k+2):n,k)=zeros(n-k-1,1); C=B((k+1):n,(k+1):n); B((k+1):n,(k+1):n)=C-v*(v’*C); C=B(1:n,(k+1):n); B(1:n,(k+1):n)=C-(C*v)*v’; end end Listing 14.1 hesshousegen

326

14 Numerical Eigenvalue Problems

14.4.1 Assembling Householder Transformations We can use the output of Algorithm 14.1 to assemble the matrix Q ∈ Rn×n such that Q is orthogonal and Q∗ AQ is upper !Hessenberg. " We need to compute the product I 0 Q = H 1 H 2 · · · H n−2 , where H k = 0 I −v k vT and v k ∈ Rn−k . Since v 1 ∈ Rn−1 k

and v n−2 ∈ R2 it is most economical to assemble the product from right to left. We compute Qn−1 = I and Qk = H k Qk+1 for k = n − 2, n − 3, . . . , 1. ! Suppose Qk+1 has the form Qk =

Ik 0 0 Uk

"

, where U k ∈ Rn−k,n−k . Then

      Ik 0 0 Ik 0 Ik ∗ = . 0 I − v k v Tk 0 Uk 0 U k − v k (v Tk U k )

This leads to the following algorithm for assembling Householder transformations. The algorithm assumes that L is output from Algorithm 14.1, and assembles an orthogonal matrix Q from the columns of L so that Q∗ AQ is upper Hessenberg. function Q = accumulateQ(L) n=length(L); Q=eye(n); for k=n-2:-1:1 v=L((k+1):n,k); C=Q((k+1):n,(k+1):n); Q((k+1):n,(k+1):n)=C-v*(v’*C); end Listing 14.2 accumulateQ

14.5 Computing a Selected Eigenvalue of a Symmetric Matrix Let A ∈ Rn×n be symmetric with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn . In this section we consider a method to compute an approximation to the mth eigenvalue λm for some 1 ≤ m ≤ n. Using Householder similarity transformations as outlined in the

14.5 Computing a Selected Eigenvalue of a Symmetric Matrix

327

previous section we can assume that A is symmetric and tridiagonal. ⎡

d1 ⎢ c1 ⎢ ⎢ A=⎢ ⎢ ⎣

⎤ c1 ⎥ d 2 c2 ⎥ ⎥ .. .. .. ⎥. . . . ⎥ cn−2 dn−1 cn−1 ⎦ cn−1 dn

(14.6)

Suppose ! one"of the off-diagonal elements is equal to zero, say ci = 0. We then have A = A01 A02 , where ⎡

d1 ⎢ c1 ⎢ ⎢ A1 = ⎢ ⎢ ⎣

⎤ ⎤ ⎡ c1 di+1 ci+1 ⎥ ⎥ ⎢ci+1 di+2 ci+2 d 2 c2 ⎥ ⎥ ⎢ ⎥ ⎥ ⎢ .. .. .. . . . . . . = and A ⎥ ⎥. ⎢ 2 . . . . . . ⎥ ⎥ ⎢ ⎦ ⎣ ci−2 di−1 ci−1 cn−2 dn−1 cn−1 ⎦ ci−1 di cn−1 dn

Thus A is block diagonal and we can split the eigenvalue problem into two smaller problems involving A1 and A2 . We assume that this reduction has been carried out so that A is irreducible, i.e., ci = 0 for i = 1, . . . , n − 1. We first show that irreducibility implies that the eigenvalues are distinct. Lemma 14.2 (Distinct Eigenvalues of a Tridiagonal Matrix) An irreducible, tridiagonal and symmetric matrix A ∈ Rn×n has n real and distinct eigenvalues. Proof Let A be given by (14.6). By Theorem 6.10 the eigenvalues are real. Define for x ∈ R the polynomial pk (x) := det(xI k − Ak ) for k = 1, . . . , n, where Ak is the upper left k × k corner of A (the leading principal submatrix of order k). The eigenvalues of A are the roots of the polynomial pn . Using the last column to expand for k ≥ 2 the determinant pk+1 (x) we find pk+1 (x) = (x − dk+1 )pk (x) − ck2 pk−1 (x).

(14.7)

Since p1 (x) = x −d1 and p2 (x) = (x −d2 )(x −d1 )−c12 this also holds for k = 0, 1 if we define p−1 (x) = 0 and p0 (x) = 1. For M sufficiently large we have p2 (−M) > 0,

p2 (d1 ) < 0,

p2 (+M) > 0.

Since p2 is continuous there are y1 ∈ (−M, d1 ) and y2 ∈ (d1 , M) such that p2 (y1 ) = p2 (y2 ) = 0. It follows that the root d1 of p1 separates the roots of p2 , so y1 and y2 must be distinct. Consider next p3 (x) = (x − d3 )p2 (x) − c22 p1 (x) = (x − d3 )(x − y1 )(x − y2 ) − c22 (x − d1 ).

328

14 Numerical Eigenvalue Problems

Since y1 < d1 < y2 we have for M sufficiently large p3 (−M) < 0,

p3 (y1 ) > 0,

p3 (y2 ) < 0,

p3 (+M) > 0.

Thus the roots x1 , x2 , x3 of p3 are separated by the roots y1 , y2 of p2 . In the general case suppose for k ≥ 2 that the roots z1 , . . . , zk−1 of pk−1 separate the roots y1 , . . . , yk of pk . Choose M so that y0 := −M < y1 , yk+1 := M > yk . Then y0 < y1 < z1 < y2 < z2 · · · < zk−1 < yk < yk+1 . We claim that for M sufficiently large pk+1 (yj ) = (−1)k+1−j |pk+1 (yj )| = 0, for j = 0, 1, . . . , k + 1. This holds for j = 0, k + 1, and for j = 1, . . . , k since pk+1 (yj ) = −ck2 pk−1 (yj ) = −ck2 (yj − z1 ) · · · (yj − zk−1 ). It follows that the roots x1 , . . . , xk+1 are separated by the roots y1 , . . . , yk of pk and by induction the roots of pn (the eigenvalues of A) are distinct.  

14.5.1 The Inertia Theorem We say that two matrices A, B ∈ Cn×n are congruent if A = E ∗ BE for some nonsingular matrix E ∈ Cn×n . By Theorem 6.7 a Hermitian matrix A is both congruent and similar to a diagonal matrix D, U ∗ AU = D where U is unitary. The eigenvalues of A are the diagonal elements of D. Let π(A), ζ(A) and υ(A) denote the number of positive, zero and negative eigenvalues of A. If A is Hermitian then all eigenvalues are real and π(A) + ζ (A) + υ(A) = n. Theorem 14.6 (Sylvester’s Inertia Theorem) If A, B ∈ Cn×n are Hermitian and congruent then π(A) = π(B), ζ (A) = ζ (B) and υ(A) = υ(B). Proof Suppose A = E ∗ BE, where E is nonsingular. Assume first that A and B are diagonal matrices. Suppose π(A) = k and π(B) = m < k. We shall show that this leads to a contradiction. Let E 1 be the upper left m × k corner of E. Since m < k, we can find a nonzero x such that E 1 x = 0 (cf. Lemma 1.3). Let y T = [x T , 0T ] ∈ Cn , and z = [z1 , . . . , zn ]T = Ey. Then zi = 0 for i = 1, 2, . . . , m. If A has positive eigenvalues λ1 , . . . , λk and B has eigenvalues μ1 , . . . , μn , where μi ≤ 0 for i ≥ m + 1 then ∗

y Ay =

n

i=1

2

λi |yi | =

k

i=1

λi |xi |2 > 0.

14.5 Computing a Selected Eigenvalue of a Symmetric Matrix

329

But y ∗ Ay = y ∗ E ∗ BEy = z∗ Bz =

n

μi |zi |2 ≤ 0,

i=m+1

a contradiction. We conclude that π(A) = π(B) if A and B are diagonal. Moreover, υ(A) = π(−A) = π(−B) = υ(B) and ζ (A) = n − π(A) − υ(A) = n − π(B) − υ(B) = ζ(B). This completes the proof for diagonal matrices. Let in the general case U 1 and U 2 be unitary matrices such that U ∗1 AU 1 = D 1 and U ∗2 BU 2 = D 2 where D 1 and D 2 are diagonal matrices. Since A = E ∗ BE, we find D 1 = F ∗ D 2 F where F = U ∗2 EU 1 is nonsingular. Thus D 1 and D 2 are congruent diagonal matrices. But since A and D 1 , B and D 2 have the same eigenvalues, we find π(A) = π(D 1 ) = π(D 2 ) = π(B). Similar results hold for ζ and υ.   Corollary 14.2 (Counting Eigenvalues Using the LDL* Factorization) Suppose A = tridiag(ci , di , ci ) ∈ Rn×n is symmetric and that α ∈ R is such that A − αI has an symmetric LU factorization, i.e. A − αI = LDLT where L is unit lower triangular and D is diagonal. Then the number of eigenvalues of A strictly less than α equals the number of negative diagonal elements in D. The diagonal elements d1 (α), . . . , dn (α) in D can be computed recursively as follows 2 d1 (α) = d1 − α, dk (α) = dk − α − ck−1 /dk−1 (α), k = 2, 3, . . . , n.

(14.8)

Proof Since the diagonal elements in L in an LU factorization equal the diagonal elements in D in an LDLT factorization we see that the formulas in (14.8) follows immediately from (2.16). Since L is nonsingular, A − αI and D are congruent. By the previous theorem υ(A − αI ) = υ(D), the number of negative diagonal elements in D. If Ax = λx then (A − αI )x = (λ − α)x, and λ − α is an eigenvalue of A − αI . But then υ(A − αI ) equals the number of eigenvalues of A which are less than α.  

14.5.2 Approximating λm Corollary 14.2 can be used to determine the mth eigenvalue of A, where λ1 ≥ λ2 ≥ · · · ≥ λn . Using Gerschgorin’s theorem we first find an interval [a, b], such that (a, b) contains the eigenvalues of A. Let for x ∈ [a, b] ρ(x) := #{k : dk (x) > 0 for k = 1, . . . , n}

330

14 Numerical Eigenvalue Problems

be the number of eigenvalues of A which are strictly greater than x. Clearly ρ(a) = n, ρ(b) = 0. Choosing a tolerance and using bisection we proceed as follows: h = b − a; f or j = 1 : itmax c = (a + b)/2; if b − a < eps ∗ h λ = (a + b)/2; return

(14.9)

end k = ρ(c); if k ≥ m a = c else b = c; end We generate a sequence {[aj , bj ]} of intervals, each containing λm and bj −aj = 2−j (b − a). As it stands this method will fail if in (14.8) one of the dk (α) is zero. One possibility is to replace such a dk (α) by a suitable small number, say δk = ck M , where M is the Machine epsilon, typically 2 × 10−16 for MATLAB. This replacement is done if |dk (α)| < |δk |.

14.6 Exercises Chap. 14 14.6.1 Exercises Sect. 14.1 Exercise 14.1 (Yes or No (Exam Exercise 2006-1)) Answer simply yes or no to the following questions: a) Every matrix A ∈ Cm×n has a singular value decomposition? b) The algebraic multiplicity of an eigenvalue is always less than or equal to the geometric multiplicity? c) The QR factorization of a matrix A ∈ Rn×n can be determined by Householder transformations in O(n2 ) arithmetic operations? d) Let ρ(A) be the spectral radius of A ∈ Cn×n . Then limk→∞ Ak = 0 if and only if ρ(A) < 1?

14.6 Exercises Chap. 14

331

14.6.2 Exercises Sect. 14.2 Exercise 14.2 (Nonsingularity Using Gershgorin) Consider the matrix ⎛

410 ⎜1 4 1 A=⎜ ⎝0 1 4 001

⎞ 0 0⎟ ⎟. 1⎠ 4

Show using Gershgorin’s theorem that A is nonsingular. Exercise 14.3 (Gershgorin, Strictly Diagonally Dominant Matrix) Show using Gershgorin’s circle theorem that a strictly diagonally dominant matrix A (|ai,i | > |a | for all i) is nonsingular. j =i i,j Exercise 14.4 (Gershgorin Disks (Exam Exercise 2009-2)) The eigenvalues of A ∈ Rn,n lie inside R ∩ C, where R := R1 ∪ · · · ∪ Rn is the union of the row disks Ri of A, and C = C1 ∪ · · · ∪ Cn is the union of the column disks Cj . You do not need to prove this. Write a MATLAB function [s,r,c]=gershgorin(A) that computes the centres s = [s1 , . . . , sn ] ∈ Rn of the row and column disks, and their radii r = [r1 , . . . , rn ] ∈ Rn and c = [c1 , . . . , cn ] ∈ Rn , respectively.

14.6.3 Exercises Sect. 14.3 Exercise 14.5 (Continuity of Eigenvalues) Suppose A(t) := D + t (A − D),

D := diag(a11, . . . , ann ),

t ∈ R.

0 ≤ t1 < t2 ≤ 1 and that μ is an eigenvalue of A(t2 ). Show, using Theorem 14.2 with A = A(t1 ) and E = A(t2 ) − A(t1 ), that A(t1 ) has an eigenvalue λ such that   |λ − μ| ≤ C(t2 − t1 )1/n , where C ≤ 2 D2 + A − D2 . Thus, as a function of t, every eigenvalue of A(t) is a continuous function of t. Exercise 14.6 (∞-Norm of a Diagonal Matrix) Give a direct proof that A∞ = ρ(A) if A is diagonal. Exercise 14.7 (Eigenvalue Perturbations (Exam Exercise 2010-2)) Let A = [akj ], E = [ekj ], and B = [bkj ] be matrices in Rn,n with akj =

1, j = k + 1, 0, otherwise,

ekj =

,

k = n, j = 1,

0, otherwise,

(14.10)

332

14 Numerical Eigenvalue Problems

and B = A + E, where 0 < < 1. Thus for n = 4, ⎡ 0 ⎢0 A := ⎢ ⎣0 0

⎤ 100 0 1 0⎥ ⎥, 0 0 1⎦ 000



000 ⎢0 0 0 E := ⎢ ⎣0 0 0 00

⎤ 0 0⎥ ⎥, 0⎦ 0



01 ⎢0 0 B := ⎢ ⎣0 0 0

⎤ 00 1 0⎥ ⎥. 0 1⎦ 00

a) Find the eigenvalues of A and B. b) Show that A2 = B2 = 1 for arbitrary n ∈ N. c) Recall Elsner’s Theorem (Theorem 14.2). Let A, E, B be given by (14.10). What upper bound does (14.1) in Elsner’s theorem give for the eigenvalue μ = 1/n of B? How sharp is this upper bound?

14.6.4 Exercises Sect. 14.4 Exercise 14.8 (Number of Arithmetic Operations, Hessenberg Reduction) 3 Show that the number of arithmetic operations for Algorithm 14.1 is 10 3 n = 5Gn . Exercise 14.9 (Assemble Householder Transformations) Show that the number of arithmetic operations required by Algorithm 14.2 is 43 n3 = 2Gn . Exercise 14.10 (Tridiagonalize a Symmetric Matrix) If A is real and symmetric we can modify Algorithm 14.1 as follows. To find Ak+1 from Ak we have to compute V k E k V k where E k is symmetric. Dropping subscripts we have to compute a product of the form G = (I − vv T )E(I − vv T ). Let w := Ev, β := 12 v T w and z := w −βv. Show that G = E −vzT −zv T . Since G is symmetric, only the sub- or superdiagonal elements of G need to be computed. Computing G in this way, it can be shown that we need O(4n3 /3) operations to tridiagonalize a symmetric matrix by orthonormal similarity transformations. This is less than half the work to reduce a nonsymmetric matrix to upper Hessenberg form. We refer to [18] for a detailed algorithm.

14.6.5 Exercises Sect. 14.5 Exercise 14.11 (Counting Eigenvalues) Consider the matrix in Exercise 14.2. Determine the number of eigenvalues greater than 4.5.

14.6 Exercises Chap. 14

333

Exercise 14.12 (Overflow in LDL* Factorization) Let for n ∈ N ⎡

⎤ 10 1 0 · · · 0 ⎢ . ⎥ ⎢ 1 10 1 . . . .. ⎥ ⎢ ⎥ ⎢ ⎥ An = ⎢ 0 . . . . . . . . . 0 ⎥ ∈ Rn×n . ⎢ ⎥ ⎢ . . ⎥ ⎣ .. . . 1 10 1 ⎦ 0 · · · 0 1 10 a) Let √ dk be the diagonal elements of D in an LDL* factorization of An . Show that 5+ 24 < dk ≤ 10, k = 1, 2, . . . ,√ n. b) Show that Dn := det(An ) > (5 + 24)n . Give n0 ∈ N such that your computer gives an overflow when Dn0 is computed in floating point arithmetic. Exercise 14.13 (Simultaneous Diagonalization) (Simultaneous diagonalization of two symmetric matrices by a congruence transformation). Let A, B ∈ Rn×n where AT = A and B is symmetric positive definite. Then B = U T DU where U is orthonormal and D = diag(d1 , . . . , dn ) has positive diagonal elements. Let ˆ = D −1/2 U AU T D −1/2 where A  −1/2 −1/2  . D −1/2 := diag d1 , . . . , dn ˆ is symmetric. a) Show that A ˆ = Uˆ T D ˆ Uˆ where Uˆ is orthonormal and D ˆ is diagonal. Set E = Let A T U T D −1/2 Uˆ . ˆ E T BE = I . b) Show that E is nonsingular and that E T AE = D, For a more general result see Theorem 10.1 in [11]. Exercise 14.14 (Program Code for One Eigenvalue) Suppose A = tridiag(c, d, c) is symmetric and tridiagonal with elements d1 , . . . , dn on the diagonal and c1 , . . . , cn−1 on the neighboring subdiagonals. Let λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of A. We shall write a program to compute one eigenvalue λm for a given m using bisection and the method outlined in (14.9). a) Write a function k=counting(c,d,x) which for given x counts the number of eigenvalues of A strictly greater than x. Use the replacement described above if one of the dj (x) is close to zero. b) Write a function lambda=findeigv(c,d,m) which first estimates an interval (a, b] containing all eigenvalues of A and then generates a sequence {(aj , bj ]} of intervals each containing λm . Iterate until bj − aj ≤ (b − a) M , where M is MATLAB’s machine epsilon eps. Typically M ≈ 2.22 × 10−16. c) Test the program on T := tridiag(−1, 2, −1) of size 100. Compare the exact value of λ5 with your result and the result obtained by using MATLAB’s built-in function eig.

334

14 Numerical Eigenvalue Problems

Exercise 14.15 (Determinant of Upper Hessenberg Matrix) Suppose A ∈ Cn×n is upper Hessenberg and x ∈ C. We will study two algorithms to compute f (x) = det(A − xI ). a) Show that Gaussian elimination without pivoting requires O(n2 ) arithmetic operations. b) Show that the number of arithmetic operations is the same if partial pivoting is used. c) Estimate the number of arithmetic operations if Given’s rotations are used. d) Compare the two methods discussing advantages and disadvantages.

14.7 Review Questions 14.7.1

Suppose A, E ∈ Cn×n . To every μ ∈ σ (A + E) there is a λ ∈ σ (A) which is in some sense close to μ. • • • •

14.7.2 14.7.3

14.7.4 14.7.5 14.7.6

What is the general result (Elsner’s theorem)? what if A is non defective? what if A is normal? what if A is Hermitian?

Can Gerschgorin’s theorem be used to check if a matrix is nonsingular? How many arithmetic operation does it take to reduce a matrix by similarity transformations to upper Hessenberg form by Householder transformations? Give a condition ensuring that a tridiagonal symmetric matrix has real and distinct eigenvalues: What is the content of Sylvester’s inertia theorem? Give an application of this theorem.

Chapter 15

The QR Algorithm

The QR algorithm is a method to find all eigenvalues and eigenvectors of a matrix. In this chapter we give a brief informal introduction to this important algorithm. For a more complete treatment see [18]. The QR algorithm is related to a simpler method called the power method and we start studying this method and its variants.

15.1 The Power Method and Its Variants These methods can be used to compute a single eigenpair of a matrix. They also play a role when studying properties of the QR algorithm.

15.1.1 The Power Method The power method in its basic form is a technique to compute an approximation to the eigenvector corresponding to the largest (in absolute value) eigenvalue of a matrix A ∈ Cn×n . As a by product we can also find an approximation to the corresponding eigenvalue. We define a sequence {zk } of vectors in Cn by zk := Ak z0 = Azk−1 ,

k = 1, 2, . . . .

(15.1)

Example 15.1 (Power Method) Let  A=

 2 −1 , −1 2

z0 :=

  1 . 0

© Springer Nature Switzerland AG 2020 T. Lyche, Numerical Linear Algebra and Matrix Factorizations, Texts in Computational Science and Engineering 22, https://doi.org/10.1007/978-3-030-36468-7_15

335

336

15 The QR Algorithm

We find 

 2 z1 = Az0 = , −1



   1 1 + 3k 5 , ··· . z2 = Az1 = , · · · , zk = −4 2 1 − 3k

It follows that 2zk /3k converges to [1, −1], an eigenvector corresponding to the dominant eigenvalue λ = 3. The sequence of Rayleigh quotients {zTk Azk /zTk zk } will converge to the dominant eigenvalue λ = 3. To understand better what happens we expand z0 in terms of the eigenvectors     1 1 1 1 z0 = + = c1 v 1 + c2 v 2 . 2 −1 2 1 Since Ak has eigenpairs (λkj , v j ), j = 1, 2 we find zk = c1 λk1 v 1 + c2 λk2 v 2 = c1 3k v 1 + c2 1k v 2 . Thus 3−k zk = c1 v 1 + 3−k c2 v 2 → c1 v 1 . Since c1 = 0 we obtain convergence to the dominant eigenvector. Let A ∈ Cn×n have eigenpairs (λj , v j ), j = 1, . . . , n with |λ1 | > |λ2 | ≥ · · · ≥ |λn |. Given z0 ∈ Cn we assume that |λ1 | > |λ2 | ≥ |λ3 | ≥ · · · ≥ |λn |,

(i)

(ii) zT0 v 1 = 0

(15.2)

(iii) A has linearly independent eigenvectors. The first assumption means that A has a dominant eigenvalue λ1 of algebraic multiplicity one. The second assumption says that z0 has a component in the direction v 1 . The third assumption is not necessary, but is included in order to simplify the analysis. To see what happens let z0 = c1 v 1 + c2 v 2 + · · · + cn v n , where by assumption (ii) of (15.2) we have c1 = 0. Since Ak v j = λkj v j for all j we see that zk = c1 λk1 v 1 + c2 λk2 v 2 + · · · + cn λkn v n ,

k = 0, 1, 2, . . . .

(15.3)

Dividing by λk1 we find zk λk1

= c1 v 1 + c2

 λ k 2

λ1

v 2 + · · · + cn

 λ k n

λ1

vn,

k = 0, 1, 2, . . . .

(15.4)

15.1 The Power Method and Its Variants

337

Assumption (i) of (15.2) implies that (λj /λ1 )k → 0 as k → ∞ for all j ≥ 2 and we obtain zk = c1 v 1 , k→∞ λk 1 lim

(15.5)

the dominant eigenvector of A. It can be shown that this also holds for defective matrices as long as (i) and (ii) of (15.2) hold, see for example page 58 of [18]. In practice we need to scale the iterates zk somehow and we normally do not know λ1 . Instead we choose a norm on Cn , set x 0 = z0 /z0  and generate for k = 1, 2, . . . unit vectors as follows: (i)

y k = Ax k−1

(15.6)

(ii) x k = y k /y k .

Lemma 15.1 (Convergence of the Power Method) Suppose (15.2) holds. Then  |λ | k c1 v 1 1 . xk = k→∞ λ1 |c1 | v 1  lim

In particular, if λ1 > 0 and c1 > 0 then the sequence {x k } will converge to the eigenvector u1 := v 1 /v 1  of unit length. Proof By induction on k it follows that x k = zk /zk  for all k ≥ 0, where zk = Ak z0 . Indeed, this holds for k = 1, and if it holds for k − 1 then y k = Ax k−1 = Azk−1 /zk−1  = zk /zk−1  and x k = (zk /zk−1 )(zk−1 /zk ) = zk /zk . But then

xk =

zk = zk 

c1 λk1 |c1 λk1 |

v1 +

c2 c1

v 1 +

c2 c1

and this implies the lemma.

 

λ2 λ1 λ2 λ1

k k

v2 + · · · +

cn c1

v2 + · · · +

cn c1

 

λn λ1 λn λ1

k k

vn , vn

k = 0, 1, 2, . . . ,

 

Suppose we know an approximate eigenvector u of A, but not the corresponding eigenvalue μ. One way of estimating μ is to minimize the Euclidian norm of the residual r(λ) := Au − λu. Theorem 15.1 (The Rayleigh Quotient Minimizes the Residual) Let A ∈ Cn×n , u ∈ Cn \ {0}, and let ρ : C → R be given by ρ(λ) = Au − λu2 . Then ρ is ∗ minimized when λ := uu∗Au u , the Rayleigh quotient for A.

338

15 The QR Algorithm

Proof Assume u∗ u = 1 and extend u to an orthonormal basis {u, U } for Cn . Then U ∗ u = 0 and   ∗   ∗  ∗ u Au − λ u Au − λu∗ u u = . (Au − λu) = U ∗ Au − λU ∗ u U ∗ Au U∗ By unitary invariance of the Euclidian norm ρ(λ)2 = |u∗ Au − λ|2 + U ∗ Au22 , and ρ has a global minimum at λ = u∗ Au.

 

Using Rayleigh quotients we can incorporate the calculation of the eigenvalue into the power iteration. We can then compute the residual and stop the iteration when the residual is sufficiently small. But what does it mean to be sufficiently small? Recall that if A is nonsingular with a nonsingular eigenvector matrix X and (μ, u) is an approximate eigenpair with u2 = 1, then by (14.4) we can find an eigenvalue λ of A such that |λ − μ| Au − μu2 ≤ K2 (X)K2 (A) . |λ| A2 Thus if the relative residual is small and both A and X are well conditioned then the relative error in the eigenvalue will be small. This discussion leads to the power method with Rayleigh quotient computation. Given A ∈ Cn×n , a starting vector z ∈ Cn , a maximum number K of iterations, and a convergence tolerance tol. The power method combined with a Rayleigh quotient estimate for the eigenvalue is used to compute a dominant eigenpair (l, x) of A with x2 = 1. The integer it returns the number of iterations needed in order for Ax − lx2 /AF < tol. If no such eigenpair is found in K iterations the value it = K + 1 is returned. function [l,x,it]=powerit(A,z,K,tol) % [l,x,it]=powerit(A,z,K,tol) af=norm(A,’fro’); x=z/norm(z); for k=1:K y=A*x; l=x’*y; if norm(y-l*x)/af