Linear Algebra for Earth Scientists
 9781032555942, 9781032557502, 9781003432043

Table of contents :
Cover
Half Title
Title Page
Copyright Page
Contents
Preface
Author Bios
CHAPTER 1: Rows and Columns
1.1. THE THREE-POINT PROBLEM
1.1.1. Graphical Solution
1.1.2. Three Equations, Three Unknowns, and Nine Knowns
1.2. CAN WE SOLVE THIS MORE EASILY USING LINEAR ALGEBRA?
1.2.1. Putting a Problem Written as Equations into a Linear Algebra Form
1.2.2. Solving a Problem in Linear Algebra Form
1.2.3. Remember the Units
1.3. THINKING ABOUT COLUMNS AND VECTORS
1.4. VECTORS
1.4.1. Properties and Operations of Vectors
1.4.2. Some Geologic Vectors
1.4.3. What about a Dipping Bed or Surface?
1.4.4. Direction Cosines
1.4.5. Lines as well as Planes
1.5. A GEOCHEMICAL EXAMPLE
1.5.1. Do the Units Work?
1.5.2. Graphing the Row and Column Views
1.5.3. Using an Inverse
1.6. SUMMARY
1.7. EXERCISES
CHAPTER 2: Matrix Multiplication
2.1. MULTIPLYING THE LINEAR ALGEBRA WAY
2.1.1. Conventions Used for Matrices and Vectors
2.1.2. Multiply a Vector Times a Vector
2.1.3. Multiply a Matrix Times a Vector—the Column Approach
2.1.4. Multiply a Matrix Times a Vector—the Row Approach
2.1.5. Three-point Problem as a Matrix Times a Vector
2.1.6. Multiply a Matrix Times a Matrix with Dot Products
2.1.7. Multiply a Matrix Times a Matrix with Outer Products
2.1.8. Another Use for Dot Products
2.1.9. Some Important Rules about Matrix Math
2.2. TRANSPOSE OF A VECTOR OR MATRIX
2.2.1. The Transpose of a Vector Times Itself
2.2.2. A Couple Useful Rules About Transposes
2.3. LINEAR COMBINATIONS
2.3.1. Rows and Columns Graphed
2.3.2. Coordinate Systems and Unit Vectors
2.3.3. The Diagonal Identity Matrix
2.3.4. Multiplying by Diagonal Matrices
2.4. LINEAR TRANSFORMATIONS
2.4.1. The Linear Transformation View
2.4.2. Linear Combinations and the Span of a Matrix
2.4.3. Do We have it all Straight?
2.5. GEOLOGIC TRANSFORMATIONS
2.5.1. Strain Matrix and Transformation
2.5.2. Transformation by 2D Rotation
2.5.3. Transformation by 3D Rotation
2.6. WHAT MORE CAN WE DO WITH MATRICES?
2.6.1. Computing Direction Cosines from Orientation Measurements
2.6.2. Matrix Transformation from Trend and Plunge to Direction Cosines
2.6.3. Plane Orientations to Direction Cosines
2.6.4. Equation of a Plane and Its Normal or Pole
2.7. MORE ABOUT 3D ROTATIONS
2.8. SUMMARY
2.9. EXERCISES
CHAPTER 3: Solving Ax=b
3.1. ELIMINATION
3.1.1. The Three-point Problem, Revisited
3.1.2. What Does Elimination Tell Us about the Matrix A?
3.1.3. The Column View of the Three-point Problem
3.2. ELIMINATION MATRICES AND ELEMENTARY ROW OPERATIONS
3.2.1. Elimination Using Outer Products
3.2.2. The Three-point Problem Using Elimination Matrices
3.2.3. Swapping Rows When Needed
3.2.4. Summary
3.3. ELEMENTARY MATRICES AND THEIR INVERSES
3.3.1. What are Elementary Matrices, and Why are They Important?
3.3.2. Undoing an Operation and the Inverse of an Elementary Matrix
3.4. OUR FIRST FACTORIZATIONS—A = LU = LDU
3.4.1. Finding the Inverse of E and Getting L
3.4.2. Getting to L U
3.4.3. Computing LDU and Learning More About the Matrix A
3.5. GAUSS-JORDAN ELIMINATION AND COMPUTING A−1
3.5.1. Setting up Gauss-Jordan Elimination
3.5.2. Cleaning Upwards
3.6. THE THREE-POINT PROBLEM MANY WAYS
3.6.1. Simple Elimination with an Augmented Matrix
3.6.2. Gauss-Jordan Elimination on the Three-point Problem
3.6.3. LDU for the Three-point Problem
3.7. REWORKING THE GEOCHEMICAL EXAMPLE
3.8. SUMMARY
3.9. EXERCISES
CHAPTER 4: When Does Ax = b have a Solution?
4.1. SLIP WITH DIFFERENT NUMBERS OF FAULTS
4.1.1. Three Faults
4.1.2. Two Faults
4.1.3. Four Faults
4.2. THE THREE-POINT PROBLEM WITH MORE POINTS
4.3. THE VECTOR b MUST BE IN THE COLUMN SPACE OF A TO FIND x
4.3.1. Vector Spaces and Subspaces
4.3.2. Dimension, Rank, and Basis
4.4. EXPLORING THE VECTOR SPACES AND SUBSPACES OF Ax=b
4.4.1. The Data Vector b and the Column Space of A
4.4.2. The Solution Vector x and Null Space are in a Different Vector Space from C(A)
4.4.3. Finding the Solution for a Rectangular Matrix
4.4.4. The N(A) and C(A) for Some Square Matrices
4.4.5. The Row Space of A is the Column Space of the Transpose of A
4.4.6. Null Space of the Transpose of A
4.4.7. Plotting Ax=b in Vector Spaces and Subspaces
4.4.8. Two, Three, and Four Faults Revisited
4.4.9. What’s Hiding in the Null Space?
4.5. RANK AND SIZE OF A DETERMINES IF WE CAN SOLVE Ax = b
4.5.1. Unique Solution—Rank and Number of Rows and Columns are Equal: r = m = n
4.5.2. The Most Common Case: Full Column Rank: r = n < m
4.5.3. Lots and Lots of Solutions: r = m < n
4.5.4. The General Case: r < m and r < n
4.6. SUMMARY
4.7. EXERCISES
CHAPTER 5: Solving Ax = b When There is No Solution
5.1. PROJECTION
5.1.1. Lines in 2D and 3D
5.1.2. Going from a to A
5.1.3. More About AT A
5.2. SOLVING PROJECTION FOR LINES AND PLANES
5.2.1. Fitting a Line to More Than Two Points
5.2.2. How We Normally Find the Best-fitting Line
5.2.3. The Solution Viewed from Vector Space
5.2.4. Solving the Six-point Problem
5.2.5. Using Algebra and Calculus for Least Squares
5.2.6. The Vector Space Picture of Projection
5.3. CAN WE FIT SOMETHING OTHER THAN A LINE OR PLANE?
5.3.1. Fitting a Stream Profile
5.3.2. Tomography of the Upper Crust
5.3.3. How Certain are We that the Best Solution is Good?
5.4. SUMMARY
5.5. EXERCISES
CHAPTER 6: Determinants and Orthogonality
6.1. DETERMINANTS, STRAIN, AND GEOLOGIC TRANSFORMATIONS
6.1.1. Determinants
6.1.2. What does the Determinant Measure?
6.1.3. What is the Volume in Higher Dimensions?
6.1.4. Why can the Determinant have a Positive or Negative Sign?
6.1.5. Computing Determinants
6.2. STRAIN, DETERMINANTS, AND BASIC MATRIX OPERATIONS
6.2.1. Two Strains and the Determinant of AB
6.2.2. The Same Strain Again and the Determinant of A2
6.2.3. Undoing a Strain and the Determinant of A−1
6.2.4. Switch Strain Axes and the Determinant of AT
6.2.5. Multiplying Strains by a Factor of 2 and the Determinant of 2A
6.2.6. Adding Two Strains and the Determinant of A + B
6.3. CROSS PRODUCTS
6.3.1. Cross Product of Two Vectors
6.3.2. Cross Product and the Angle Between Vectors
6.3.3. The Three-point Problem with Cross Products
6.3.4. Combining Cross Product and Dot Product
6.3.5. Using the Cross Product in Orientation Analysis
6.4. ORTHOGONALITY AND GRAM-SCHMIDT PROCESS
6.4.1. Gram-Schmidt Process
6.4.2. Orthogonal Vectors in 2D
6.4.3. Orthogonal Vectors in 3D and More
6.4.4. Normalizing the Orthogonal Vectors
6.4.5. Example of Gram-Schmidt
6.4.6. Orthogonal or Orthonormal?
6.5. QR DECOMPOSITION—OUR NEXT FACTORIZATION
6.5.1. Constructing Q and R
6.5.2. Using QR Decomposition When Ax=b has No Solution
6.6. SUMMARY
6.7. EXERCISES
CHAPTER 7: An Earth Science View of Eigenvalues and Eigenvectors
7.1. MOTIVATION FOR EIGENANALYSIS
7.1.1. The Special Directions are the Eigenvectors
7.1.2. The Scaling Factors are the Eigenvalues
7.2. EIGENVALUES AND EIGENVECTORS
7.2.1. Finding Eigenvalues with the Characteristic Equation
7.2.2. Two Examples from Strain
7.2.3. Finding the Eigenvectors
7.3. LOOKING AT DATA AND NOT Ax=b
7.3.1. Back to Orientation Data
7.3.2. Using AT A with Arrays of Data
7.3.3. Computing Eigenvalues and Eigenvectors for Orientation Data
7.3.4. More Information About the Orientation Arrays
7.4. DIAGONALIZATION INTO XΛX–1
7.4.1. More Strain
7.4.2. Rotation Matrices
7.4.3. What else We Get from Diagonalization
7.4.4. The Spectral Theorem
7.5. A DETAILED LOOK AT WHAT XT, Λ, AND X DO
7.6. SUMMARY
7.7. EXERCISES
CHAPTER 8: Change of Basis, Eigenbasis, and Quadratic Forms
8.1. CHANGE OF BASIS
8.1.1. Basis Vectors, Components, and Change of Basis Matrices
8.1.2. Change of Basis Matrices from a Different Angle
8.1.3. Change of Basis Matrices can Look Just Like Rotation Ones
8.2. WHAT DO THESE RESULTS MEAN FOR THE EIGENBASIS?
8.3. THE GEOLOGIST’S DILEMMA—SETTING A BRUNTON® COMPASS
8.3.1. What are the Basis Vectors?
8.3.2. Change of Basis Matrices
8.3.3. Put a Ring on It
8.4. A NICE GNEISS
8.5. STRESSING THE USE OF EIGENANALYSIS
8.5.1. Stress Tensor or Matrix
8.5.2. Finding Principal Stresses
8.6. QUADRATIC FORMS
8.6.1. Circles Become Ellipses
8.6.2. Axes and Coordinate System are Aligned
8.6.3. Tilted Ellipsoids and Quadratic Forms
8.6.4. Finding the Ellipse
8.6.5. Finding the Ellipse with more Points Using Least Squares
8.7. SUMMARY
8.8. EXERCISES
CHAPTER 9: Singular Value Decomposition
9.1. WHAT IS SINGULAR VALUE DECOMPOSITION OR SVD?
9.2. SVD FOR ANY MATRIX
9.2.1. Back to Simple Shear
9.2.2. What is the Connection Between Λ and Σ?
9.2.3. What are the Sizes and Ranks of the Matrices in SVD?
9.3. WHEN DO SIMPLE AND PURE SHEAR PRODUCE THE SAME STRAIN ELLIPSE?
9.4. USING SVD TO SOLVE Ax = b
9.4.1. The Four Subspaces via SVD
9.4.2. Four Easy Steps to Get from the SVD to the Inverse or Pseudoinverse
9.4.3. Calculating a Particular Solution xp and the Full Solution to Ax = b
9.5. CONDITION NUMBERS, OR, WHEN GOOD MATRICES GO BAD
9.5.1. Testing all of Our Approaches to Solving Ax = b
9.5.2. Small Errors and Big Levers
9.6. SUMMARY
9.7. EXERCISES
Index
MATLAB® Moments


Linear Algebra for Earth Scientists is written for undergraduate and graduate students in Earth and Environmental Sciences. It is intended to give students enough background in linear algebra to work with systems of equations and data in geology, hydrology, geophysics, or whatever part of the Earth Sciences they engage with. The book does not presuppose any extensive prior knowledge of linear algebra. Instead, the book builds students up from a low base to a working understanding of the subject that they can apply to their work, using many familiar examples in the geosciences.

Features
• Suitable for students of Earth and Environmental Sciences
• Minimal prerequisites — written in a way that is accessible and engaging for those without a mathematical background
• All material presented with examples and applications to the Earth Sciences.

Linear Algebra for Earth Scientists

J. Douglas Walker and Noah M. McLean University of Kansas, USA

MATLAB® and Simulink® are trademarks of The MathWorks, Inc. and are used with permission. The MathWorks does not warrant the accuracy of the text or exercises in this book. This book's use or discussion of MATLAB® or Simulink® software or related products does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular use of the MATLAB® and Simulink® software.

First edition published 2024
by CRC Press, 2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431
and by CRC Press, 4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 J. Douglas Walker, Noah M. McLean

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Names: Walker, J. Douglas, 1958- author. | McLean, Noah M., author.
Title: Linear algebra for earth scientists / J. Douglas Walker, University of Kansas, USA, Noah M. McLean, University of Kansas, USA.
Description: First edition. | Boca Raton : C&H/CRC Press, 2024. | Includes bibliographical references and index.
Identifiers: LCCN 2023050876 (print) | LCCN 2023050877 (ebook) | ISBN 9781032555942 (hbk) | ISBN 9781032557502 (pbk) | ISBN 9781003432043 (ebk)
Subjects: LCSH: Earth sciences--Mathematics--Textbooks. | Algebras, Linear--Textbooks.
Classification: LCC QE33.2.M3 W348 2024 (print) | LCC QE33.2.M3 (ebook) | DDC 550.1/5125--dc23/eng/20231221
LC record available at https://lccn.loc.gov/2023050876
LC ebook record available at https://lccn.loc.gov/2023050877

ISBN: 978-1-032-55594-2 (hbk)
ISBN: 978-1-032-55750-2 (pbk)
ISBN: 978-1-003-43204-3 (ebk)

DOI: 10.1201/9781003432043

Typeset in Latin Modern font by KnowledgeWorks Global Ltd.

Publisher's note: This book has been prepared from camera-ready copy provided by the authors.

Preface

In years of teaching students how to use quantitative methods and leading them to gain a more intuitive feel for numerical analysis, we have found it very helpful to introduce them to introductory linear algebra in setting up and solving problems. Undergraduate courses in calculus give students some familiarity with vectors and matrix math but do little to help them develop the ability to formulate problems represented by systems of equations. Many required courses for undergraduate geology majors do not expose the students to the rich array of tools available from linear algebra to gain additional insight into Earth Science problems.

The book is intended to cover fundamental aspects of linear algebra. The scope is from vector properties, matrix operations, and solving systems of equations to the computation and use of eigenvectors and eigenvalues for problems such as stress and strain. We go further and take up the subject of singular value decomposition, a newer focus of applied linear algebra. All of the material will be presented with examples from and applications to the Earth Sciences.

Throughout, we try to present familiar problems that appeal to physical scientists in a mathematically sound way. A familiar example, the three-point problem to determine the orientation of a planar surface, is revisited in most of the chapters. Chapter 1 starts with the graphical and algebraic solution to the three-point problem, but by the time students reach Chapter 5, they will be able to deal with a fifty-point problem if needed. Likewise, we bring back many examples from structural geology and strain to grow the understanding and complexity of what we can determine using linear algebra. In early chapters, the reader will learn that deformation is just a linear transformation, but by Chapter 8, they will understand how to move in and out of the eigenbasis for a problem. This repetition, accompanied by increasing complexity, lets the reader appreciate their growing understanding of linear algebra and problem-solving as they read the book.

The authors do not consider this a math textbook. The undergraduate and graduate students we have worked with for decades have had extensive exposure to solving mathematical exercises before seeing the material presented here. While some students can apply a theorem to problems in their field, be it the Earth Sciences or Biology or Chemistry, our experience is that most students benefit from extra motivation and context that derives from observations on physical systems. Our aim with this book is to provide a resource for students and instructors who are spurred on more by making measurements and calculations on rocks, for example, rather than by formulating a rigorous mathematical proof.


How to Use the Book

Geology offers perhaps the best motivation possible for learning linear algebra. The examples and problems presented in this book are framed using the Earth and are primarily set in physical 2D or 3D systems. The limited number of dimensions means we can visualize the problem and see where and how the math is used. Once readers understand three dimensions, taking concepts and equations to higher dimensions is almost always accessible. Except for a couple of topics noted in the text, if you can work the algebra in three dimensions, you can understand it in a thousand. We assume in this book that we are working in a finite number of dimensions, with the field being the real numbers.

We also include some exercises at the end of each chapter. Rather than focusing on equation solving, the problems are written so that readers can learn to formulate problems in the Earth Sciences. There is a vast array of resources to use for learning how to solve equations for a matrix inverse or eigenvectors and eigenvalues, and readers should be able to access these online readily. Less available are examples of setting up a coordinate system, constructing a set of coupled equations, or laying out the geometry of a tomography exercise. These are the problems that are posed in this book. The reader will build their linear algebra expertise by first putting the problems in matrix and vector form and ultimately using least squares and eigenanalysis to solve, for instance, for velocities and chemical components.

The text also presents many examples of using MATLAB® for calculations in linear algebra. These are separated into MATLAB® Moment boxes throughout the text to help the reader build their MATLAB® expertise while learning linear algebra. Other popular scientific programming languages, like R and Python, have very similar functionality and syntax. This book does not provide extended code examples to complete problems. We suggest interested readers start with the book Structural Geology Algorithms by Allmendinger and others¹.

¹Allmendinger, R., Cardozo, N., and Fisher, D., 2011, Structural Geology Algorithms: Vectors and Tensors. Cambridge: Cambridge University Press, doi:10.1017/CBO9780511920202

Acknowledgments

A host of scientists inspired the authors in writing this book. Richard Allmendinger and Allen Glazner have always stressed the importance of computation and quantitative methods in the Earth Sciences, and we have been fortunate to collaborate with them. Basil Tikoff, Julie Newman, and Pete Copeland have greatly encouraged us to progress with this text. In a more mathematical direction, McLean started his linear algebra career in the classes of Gilbert Strang at MIT. Both authors have significantly benefitted from Strang's books and online courseware. We also want to thank Dr. Chris Glazner for review and comments on the rigor of our presentations. Dr. Kyle Maddox greatly helped with discussions on abstract and linear algebra. Of course, all mistakes, errors, and misunderstandings are solely the authors' responsibility.

An excellent resource for the authors has been videos available on YouTube™. We have gained knowledge and inspiration from YouTubers Grant Sanderson, Trefor Bazett, Nathaniel Johnston, Dr. Peyam Tabrizian, Steve Brunton, Nathan Kutz, and many others. One of the first links we give students is to the Essence of Linear Algebra at 3Blue1Brown. The value of these videos in helping students gain a further and different understanding of linear algebra cannot be overstated. Mark A. Wilson, College of Wooster, provided the brachiopod image we use in several chapters. F. Pantigny helped the authors immensely with the LaTeX package NiceMatrix. The authors also wish to thank the alums and benefactors of the Geology Department at the University of Kansas for their support of the authors, notably the Union Pacific Resources Foundation (Walker) and Hubert H. and Kathleen M. Hall (McLean). We are also greatly indebted to the numerous graduate and undergraduate students who have taken courses using the material discussed in this book. In preparing this text, we tried to use everything we gleaned from students to make the examples clearer and the explanations better.

Author Bios

J. Douglas Walker works on the integration of Tectonics, Geochronology, and Geoinformatics to better understand the geologic development of plate boundaries. His work has addressed problems in the development of contractile belts in a backarc setting, the response of the lithosphere to extensional deformation, and past and active strike-slip faulting. This work relies on the quantification of Earth Science problems and the assessment of uncertainty in geology. Walker's active research program spans the gamut from developing software/hardware combinations for geologic mapping in the field to regional database development for solving tectonic problems. Walker received his B.S., M.S., and Ph.D. from the Massachusetts Institute of Technology. Since defending his dissertation work in 1985, he has been at the University of Kansas teaching courses in structural geology, tectonics, field geology, historical geology, error analysis, and linear algebra. He is currently the Union Pacific Resources Distinguished Professor in Geology. He has a long service record with the Geological Society of America and is a former President of this organization. He also received the Outstanding Contributions in Geoinformatics Award from the Geological Society of America (2019) and the Distinguished Service Award from the Geochemical Society (2013).

Noah M. McLean works on problems in geochronology, thermochronology, and isotope geochemistry. One direction of his research involves understanding how ages are measured, and making measured ages more accurate and precise. This line of questions has led him to collaborations with geochemists and geophysicists. With computer scientists, he has helped to build community tools for interpreting those measurements. Other lines of research involve applying geochronology techniques to understanding geological problems like large-volume silicic magma system behavior, geochronology of the terrestrial sedimentary record, and timescales of mountain-building processes. McLean received his B.S. from the University of North Carolina at Chapel Hill and his Ph.D. from the Massachusetts Institute of Technology. After a postdoctoral position at the British Geological Survey, he joined the Geology Department at the University of Kansas, where he is currently the Hubert H. and Kathleen M. Hall Associate Professor.


CHAPTER 1: Rows and Columns

The most confusing aspect for Earth Scientists starting to learn linear algebra is understanding how to apply linear algebra to problems in the real world. In this chapter, we describe some geologic examples that should be familiar and then render these problems in the language and framework of linear algebra. Our main aim is to show why using the tools of linear algebra to solve these problems is much more powerful and extensible than other approaches, such as the commonly used graphical constructions. We try to emphasize setting up problems and the procedures for solving them, and we discuss essential but seemingly mundane aspects, such as what units imply and how they are used in error checking.

1.1 THE THREE-POINT PROBLEM

We will start our exploration of linear algebra with a common exercise in geology: determining the orientation of a surface where the location and elevation of three points on the surface are known. This is the classic three-point problem illustrated on a map in Figure 1.1.

1.1.1 Graphical Solution

The first way to solve this problem is to use a graphical solution familiar to structural geology, subsurface geology, or hydrogeology labs. The problem starts using a map with a scale and three points. Each point is assigned an elevation (as is typical in determining the orientation of dipping beds or groundwater surfaces) or depth below the surface. The problem we present here is to determine the strike and dip of an inclined bed using its outcrop pattern on a topographic map, as shown in Figure 1.1. The graphical approach is the best solution if you are a geologist with your map, a protractor, and a scale in the field. The method is relatively straightforward, and the answer is right on the map. Although quick and easy for solving a single three-point problem, is this the right approach for 50 such solutions when the data are in a Geographical Information System or otherwise digitally available? Probably not.


Figure 1.1  The classic three-point problem and its graphical solution. On the left, a topographic map with contours (in meters), a scale bar, and a north arrow. Points 1, 2, and 3 are on the base of a dipping unit shown in blue. The elevations of these points are shown on the right. The graphical solution involves finding the spot on the line from 1 to 2 (from the highest and lowest points, red line) that corresponds to the elevation of 3 (the middle elevation point). The middle elevation is 60 m or halfway between the other two. We put a point (shown in green) on the line joining 1 and 2. A line of strike is a line of equal elevation, so we draw a line from point 3 through the green point (red) and add parallel lines of strike through the other two points (shown in black). We measure the strike angle α from the north, shown in the lower right. The dip δ is computed using trigonometry. The tangent of the dip is the change in elevation between any two lines of strike divided by the distance between them, so the dip angle is the inverse tangent of this ratio (here, δ = tan⁻¹(60/50)).

1.1.2 Three Equations, Three Unknowns, and Nine Knowns

We need to describe the problem in a different way to use linear algebra. We repose the problem, devise a way of solving it (which will require a numerical approach and not a graphical one), and then write the answer to compare it to the previous solution. This is a bit tedious the first time, but it becomes quicker with time and practice.


Figure 1.2  Diagram of a sloping surface. In the left part, the surface slopes down to the front right. In the right part, the surface is shown in map view and rotated 90° clockwise. The coordinate system is right-handed, and we measure angles counterclockwise from x.

Our new view, as shown in Figure 1.2, is an inclined surface that forms the top of a box. There is a scale, location, and elevation, which we illustrate here with a coordinate system x-y-z. In this picture, the surface slopes obliquely relative to x and y, with the elevation on the surface as z. The origin is set at the front of the box at $(x_0, y_0, z_0)$, and the surface intersects this point. We set up the positive directions of the x, y, and z axes to form a right-handed coordinate system. For each point, we can think of its position in space being defined by an equation of a plane. The general equation for a plane is:

$$ m_x x + m_y y + c = z. \qquad (1.1) $$

The values for x, y, and z correspond to the locations and elevations on the surface. The constant c is the base level for the z value. The constants $m_x$ and $m_y$ are the gradients in the x and y directions. These correspond to the partial derivatives of the surface elevation:

$$ m_x = \frac{\partial(\text{elevation})}{\partial x} \quad \text{and} \quad m_y = \frac{\partial(\text{elevation})}{\partial y}. $$

For points 1, 2, and 3, we can write an equation expressing the elevation in terms of slopes, locations, and base level. For simplicity, we will assume that $x_0 = y_0 = 0$, but the value $z_0$ remains one of the unknowns. The equations for each point then become:

$$ m_x x_1 + m_y y_1 + z_0 = z_1 \qquad (1.2) $$
$$ m_x x_2 + m_y y_2 + z_0 = z_2 \qquad (1.3) $$
$$ m_x x_3 + m_y y_3 + z_0 = z_3. \qquad (1.4) $$

Equations 1.2 to 1.4 show that we have three equations and three unknowns $m_x$, $m_y$, and $z_0$. We know the coordinates of each of the points, so we have nine known values. Setting $x_0 = y_0 = 0$ may seem like an arbitrary shortcut, and the reader may wonder why we do not just set $z_0$ to zero and need only two equations. If we calculate $\Delta x_{12}$ as $x_1 - x_2$, then it is exactly the same as $(x_1 - x_0) - (x_2 - x_0)$. Any shift in $x_0$ or $y_0$ cancels out. This is not the case for $z_0$, which we must determine. This setup should give us a unique solution for the unknowns. We need to rearrange the equations and solve for the unknown variables. After a bunch of painful algebra, during which we could make a lot of mistakes, we can arrive at a solution. The steps are:

1. Solving equation 1.2 for $z_0$:
$$ z_1 - m_x x_1 - m_y y_1 = z_0. \qquad (1.5) $$

2. Substituting $z_0$ into equation 1.3, $m_x x_2 + m_y y_2 + z_0 = z_2$, then gives us
$$ m_y y_2 - m_y y_1 = z_2 - z_1 - m_x x_2 + m_x x_1, $$
which simplifies to
$$ m_y (y_2 - y_1) = (z_2 - z_1) + m_x (x_1 - x_2), $$
giving us the equation for $m_y$:
$$ m_y = \frac{(z_2 - z_1) + m_x (x_1 - x_2)}{(y_2 - y_1)}. \qquad (1.6) $$

3. Inserting $m_y$ into equation 1.4 in order to solve for $m_x$:
$$ m_x x_3 + m_y y_3 + z_0 = z_3, $$
$$ m_x x_3 + y_3 \frac{(z_2 - z_1) + m_x (x_1 - x_2)}{(y_2 - y_1)} + z_1 - m_x x_1 - m_y y_1 = z_3, $$
$$ m_x x_3 + y_3 \frac{(z_2 - z_1) + m_x (x_1 - x_2)}{(y_2 - y_1)} + z_1 - m_x x_1 - y_1 \frac{(z_2 - z_1) + m_x (x_1 - x_2)}{(y_2 - y_1)} = z_3. $$

4. Combining terms, simplifying, and solving for $m_x$:
$$ m_x \left[ (x_3 - x_1) + (y_3 - y_1)\frac{(x_1 - x_2)}{(y_2 - y_1)} \right] = (z_3 - z_1) - (y_3 - y_1)\frac{(z_2 - z_1)}{(y_2 - y_1)}, $$
$$ m_x = \frac{(z_3 - z_1) - (y_3 - y_1)\dfrac{(z_2 - z_1)}{(y_2 - y_1)}}{(x_3 - x_1) + (y_3 - y_1)\dfrac{(x_1 - x_2)}{(y_2 - y_1)}}. \qquad (1.7) $$

We can now compute $m_x$ using the known values. Then we substitute the result for $m_x$ in equation 1.6 to get $m_y$, and then $m_y$ in equation 1.5 to get $z_0$ for the final value for the unknowns. Now we have solved for all three unknowns, $m_x$, $m_y$, and $z_0$.

MATLAB® moment—Some basics

Here are some basic operations you will want to know for scalars. They follow all the standard conventions you might find in, e.g., Microsoft® Excel.

>>                      The prompt in the Command Window. Do operations here.
>>x1=5                  Create a new variable, x1, and set its value to five.
>>x3=x2+x1              Compute a new value from defined ones and show the result.
>>x3=x2+x1;             Compute but suppress output using semicolon.
>>x+y, x-y, x*y, x/y    Addition, subtraction, multiplication, division.
>>x3=4+(x2*x1)          MATLAB® follows normal order of operations.
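To make the hand calculation concrete, here is a minimal MATLAB sketch (ours, not from the book) that evaluates equations 1.5–1.7 for one hypothetical set of points; the coordinates are made-up values used only for illustration.

% Hypothetical point coordinates (x, y, z) in meters -- illustrative values only
x = [0; 100; 80];   y = [0; 20; 120];   z = [80; 60; 40];

% Equation 1.7: slope in the x direction
mx = ((z(3)-z(1)) - (y(3)-y(1))*(z(2)-z(1))/(y(2)-y(1))) / ...
     ((x(3)-x(1)) + (y(3)-y(1))*(x(1)-x(2))/(y(2)-y(1)));

% Equation 1.6: back-substitute mx to get the slope in the y direction
my = ((z(2)-z(1)) + mx*(x(1)-x(2))) / (y(2)-y(1));

% Equation 1.5: back-substitute mx and my to get the base level z0
z0 = z(1) - mx*x(1) - my*y(1);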

1.2 CAN WE SOLVE THIS MORE EASILY USING LINEAR ALGEBRA?

In the previous case, we did the three-point problem by treating it as an exercise in equations and unknowns. In other words, we solved a series of coupled equations to get the result. This is the Row View in linear algebra and is where we will start in moving toward posing the problem in a linear algebra framework.

1.2.1 Putting a Problem Written as Equations into a Linear Algebra Form

In section 1.1.2, we translated the locations of the three points into the equations for planes. In doing this, we used nine constraints (the x, y, and z values of each point) and three unknowns: the two slopes, $m_x$ and $m_y$, and the intercept on the z-axis, $z_0$.


We can express each of the equations together in terms of known coefficients and unknown variables similar to equations 1.2, 1.3, and 1.4:

$$ x_1 \cdot m_x + y_1 \cdot m_y + 1 \cdot z_0 = z_1 \qquad (1.8) $$
$$ x_2 \cdot m_x + y_2 \cdot m_y + 1 \cdot z_0 = z_2 \qquad (1.9) $$
$$ x_3 \cdot m_x + y_3 \cdot m_y + 1 \cdot z_0 = z_3. \qquad (1.10) $$

Notice that we inserted a 1 into each equation to have a coefficient to multiply $z_0$. This is another known factor that we use in solving this problem. We can then rewrite our three equations in matrix form as a single equation using the language and representation of linear algebra, which gives a single equation to replace the three equations 1.8 to 1.10.

$$ \underbrace{\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} m_x \\ m_y \\ z_0 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}}_{b} \qquad (1.11) $$

$$ A x = b \qquad (1.12) $$

Ax = b is the Fundamental Equation of Linear Algebra. We express the problem in terms of explicit elements in equation 1.11 and simple linear algebra form in equation 1.12. The first element is a 3 × 3 Matrix shown as A. When we describe the size of matrices, we list them as the number of rows and then the number of columns. An m × n matrix has m rows and n columns. Each entry of the matrix is called a Component. The next two elements, x and b, are vectors, which are the same as 3 × 1 matrices. When there is a matrix with a single column, n is 1; we call it a Column Vector. When it has a single row, m is 1; we call it a Row Vector. We call any matrix 1 × n or m × 1 a vector. Each entry of the vector is also called a component. A vector that is m × 1 has m components. We can think of matrices m × n as being formed by row or column vectors, whichever approach is more natural for the problem.

Chapter 2 details how to perform the multiplication in equation 1.11 and how to multiply matrices in general. But for now, what we are generally doing is multiplying the matrix times the first vector by taking the first row of A times x to get the first entry in b. This operation is shown in Figure 1.3 and produces the first equation 1.8 of our coupled equations for the three-point problem. We could continue this row-by-row to reproduce the equations 1.9 and 1.10 if needed. The matrix A consists of the coefficients of the three equations $x_n m_x + y_n m_y + 1 \cdot z_0 = z_n$, vector x contains the unknowns $m_x$, $m_y$, and $z_0$, and vector b has the data $z_n$ for the coefficients in A.

Figure 1.3  Multiplying the elements of the first row of A times the column vector x to get the first element of the data vector b. This operation is the same as the equation on the right-hand side of the figure. We repeat this step with the second and third rows of the matrix A to get the second and third equations.

Note to the reader: In all the following text, we adopt a bold capital letter for matrices and a bold lowercase letter for vectors. We will mostly call matrix A the Coefficients Matrix or Design Matrix, x the Solution or Unknowns Vector, and b the Data Vector or Vector of Results.

A matrix is a collection of row vectors arranged vertically or column vectors arranged horizontally. The best view depends on the problem we are solving. In the previous example for the three-point problem, each equation is a row in the coefficients matrix. The data vector had the result for each equation in its rows. The only confusing part is that the unknowns in the equation became a column vector even though they were multiplying entries in the coefficients matrix as rows. It turns out that this is just right once we understand the linear algebra method of multiplication, which is explained in detail in the next chapter. Remember that the matrix operation—written fully in equation 1.11 and as a linear algebra equation in 1.12—is the fastest, most informative, and most descriptive way of getting the result we want.
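As a quick illustration (ours, not the book's), the row view in Figure 1.3 is easy to try in MATLAB: with hypothetical numbers, the first row of A times the column vector x reproduces the first entry of b.

% Hypothetical coefficient matrix (columns x, y, 1) and unknowns vector [mx; my; z0]
A    = [0 0 1; 100 20 1; 80 120 1];
xvec = [0.3; -0.2; 80];

b1 = A(1,:)*xvec;   % first row of A times x gives the first entry of b (equation 1.8)
b  = A*xvec;        % all three rows at once give the full data vector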

1.2.2 Solving a Problem in Linear Algebra Form

Now that we have rewritten our problem from equations into linear algebra, we are left with the problem of how to solve for the unknowns. If this were just algebra and we had $A \cdot x = b$, we would simply write $x = b/A$. It is almost that easy in linear algebra, with a slight rewrite to accommodate the fact that we are using matrices and vectors:

$$ x = A^{-1} b, \qquad (1.13) $$

where the $A^{-1}$ multiplying b serves just like dividing b by A. The matrix $A^{-1}$, like the matrix A, is 3 × 3. But we still have a lot to learn. A is a matrix, x and b are vectors and not numbers, we have no idea how to compute $A^{-1}$ (it is called the Inverse of A), and we still are not fully versed in multiplying matrices or vectors. We will handle the ideas of multiplication in the next chapter, along with some basic vector and matrix operations like the transpose of a matrix. What $A^{-1}$ is and how we compute it is a major component of later chapters. Using it in multiplication is the same as shown in Figure 1.3.
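As a preview of solving Ax = b numerically, here is a minimal MATLAB sketch (ours, not the book's, using the same made-up coordinates as in the earlier sketch); the backslash operator solves the system directly, and inv(A)*b gives the same answer using the inverse.

% Coefficient matrix and data vector for the three-point problem (made-up values)
A = [0 0 1; 100 20 1; 80 120 1];   % rows are [x y 1] for points 1, 2, 3
b = [80; 60; 40];                  % elevations z1, z2, z3 in meters

xsol = A\b;                        % solves Ax = b; equivalent to inv(A)*b
mx = xsol(1);  my = xsol(2);  z0 = xsol(3);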

1.2.3 Remember the Units

One way to avoid having trouble setting up the equations in linear algebra is to remember to keep track of the units being used. The three-point problem is a straightforward example of this. We would have the locations in a horizontal unit expressed as a length, probably meters from northing and easting. The elevations would also be expressed in the same units, with the slopes as unit changes per unit. This is shown written out:

$$ x\,[\mathrm{m}] \cdot m_x\,[\mathrm{m/m}] + y\,[\mathrm{m}] \cdot m_y\,[\mathrm{m/m}] + 1 \cdot z_0\,[\mathrm{m}] = z\,[\mathrm{m}] \;\rightarrow $$
$$ (x \cdot m_x)\,[\mathrm{m}] + (y \cdot m_y)\,[\mathrm{m}] + (1 \cdot z_0)\,[\mathrm{m}] = z\,[\mathrm{m}]. $$

This case is easy, but others will be more complex. We will further explore how units propagate through linear algebra calculations in the following sections. Also, note that we use square brackets to indicate we are addressing the units of a quantity. So [cm] would indicate units of centimeters and [ℓ] dimensions of length. If we were doing physics problems, keeping the units consistent would be called Dimensional Analysis and would be integral to our work. We seldom see dimensional analysis used in mathematics texts. The applied examples of linear algebra presented here are important because they can give great insights into the problem while we set up the basic math for the systems of equations. We stress it here because we are working in the Earth Sciences on problems in the real world.

1.3 THINKING ABOUT COLUMNS AND VECTORS

We set up the three-point problem by using rows to create a matrix for a system of coupled equations. As previously noted, each matrix column can be considered a vector—a column vector. Now, we will do a different problem by thinking about the matrix as being defined by column vectors. Although we could do this with the example of the three-point problem, we pick a different one that seems more natural as columns. Some problems lend themselves to being viewed as rows and others as columns; for many, either approach is equally natural.

Our next example is a problem encountered in structural geology and neotectonics. We know from geodetic measurements how fast two points on the surface of the Earth move away from each other horizontally. This example deals with extension: two faults separate the points, and for each, we know the motion direction from criteria such as slickenlines, but we do not know the slip rate on either one. Assuming that the overall horizontal motion between the two points results from the combined motion on the faults, we can figure out how fast each structure is moving. The problem is shown in a block diagram and on a map in Figure 1.4.

How do we start solving this problem? We must combine motions on the two faults to develop the resultant motion vector. We label the direction of normal fault motion a and strike-slip fault motion b. We know the direction of motion for each fault but not the rate of motion, the slip rate. The total horizontal motion between the two GPS points that we have determined over some period of time is r, and we know both the direction and magnitude for this vector. We have to combine and scale a and b by the unknown slip rates u and v to get the full resultant motion vector r over the period of observation:

$$ u\,\mathbf{a} + v\,\mathbf{b} = \mathbf{r} \qquad (1.14) $$

Figure 1.4  Diagram of normal and strike-slip faults in a north-east-down (NED), right-handed system. Positive rotations appear clockwise on the map surface. The motion direction is known for each structure, not the slip rate. The red arrows show the movement directions. Square boxes show the location of GPS stations used to determine the block's motion on the lower part, away from the upper part. Blue vectors show the components of motion relative to N and E. The blue arrow on the green box shows the net motion of the right-hand block in the N-E reference frame, with displacement given in appropriate units (e.g., mm/yr). This vector has been measured. The map scale for such a problem would typically be several kilometers, but it is unimportant for our setup.

Before we start solving for the unknowns, we first consider the units. The motions a and b are unitless because they have an orientation but no magnitude. They could, for example, be assigned units of [mm/mm] or [m/m] since, in practice, the slickenline orientations are derived by making some physical measurements, but these units divide out. The resultant GPS-measured motion vector r has both a direction and units [mm/yr]. The unknowns u and v must also have units [mm/yr] so that when we multiply them by a and b, we get units consistent with r:

$$ u\,[\mathrm{mm/yr}] \cdot \mathbf{a}\,[\mathrm{mm/mm}] + v\,[\mathrm{mm/yr}] \cdot \mathbf{b}\,[\mathrm{mm/mm}] = \mathbf{r}\,[\mathrm{mm/yr}]. $$

In this problem, we have three knowns: the total horizontal motion r and the relative horizontal motion directions a and b. The knowns are not scalars but have two values, one referencing the N direction and the other the E direction. Each pair of values can be represented as elements of a single vector. Our unknowns are the two scalar slip rates u and v—the slip rate on faults a and b, respectively. This pair of scalar values can also be represented as elements of a single vector.

$$ \mathbf{a} = \begin{bmatrix} a_N \\ a_E \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} b_N \\ b_E \end{bmatrix}, \quad \text{slip rates} = \begin{bmatrix} u \\ v \end{bmatrix}, \quad \text{and} \quad \mathbf{r} = \begin{bmatrix} r_N \\ r_E \end{bmatrix} \qquad (1.15) $$

We take all of these as column vectors and use a and b as the columns of a matrix. The multiplication would be done as in Figure 1.3, except now we only have a 2 × 2 matrix.

$$ u \cdot \begin{bmatrix} a_N \\ a_E \end{bmatrix} + v \cdot \begin{bmatrix} b_N \\ b_E \end{bmatrix} = \begin{bmatrix} a_N & b_N \\ a_E & b_E \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} r_N \\ r_E \end{bmatrix} \qquad (1.16) $$

And this gets us right back to having the equation in the linear algebra form Ax = b. We got to this result in a very different way than we did for the three-point problem. We did not write any equations but recognized that we could start with vectors and add and scale them. In this case, we arranged the column vectors into a matrix. Note from this point forward, when we use the term vector, it will be for a column vector. We will explicitly call row vectors as such when discussing them. There seems to be a little trickery here. Why do the values for u and v appear on both the right and left sides of the a and b vectors? In our case, u and v are scalar quantities and can multiply from the right or left sides; however, order matters when we have vectors and matrices. So, although we could rearrange the scalars on the left-hand side of 1.16, the order of the matrix and vectors is fixed. We will see geologic examples of matrix, vector, and scalar multiplication very soon.
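A minimal MATLAB sketch of this column view (ours, not the book's; the direction components and GPS rate below are made-up numbers):

% Hypothetical N and E components of the fault motion directions and the GPS vector
a = [0.64; 0.77];    % normal-fault slip direction (unitless)
b = [0.98; -0.17];   % strike-slip-fault slip direction (unitless)
r = [4.2; 2.9];      % measured horizontal motion between stations, mm/yr

A = [a b];           % the fault directions become the columns of the matrix
rates = A\r;         % rates(1) = u and rates(2) = v, both in mm/yr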

1.4 VECTORS

We have examined vectors using examples and constructed them in a fault slip problem. We have seen that vectors are the basic building blocks of matrices. Since vectors are such fundamental features of linear algebra, we will explore them in more detail in this section. We will first better define a vector and then show some of its properties in mathematical terms. Then, we present examples of vectors that mostly come from structural geology. So far, we have used vectors for sets of unknown and data values. We now explore their geometrical significance. In later chapters, we will extend the usage of vectors to long arrays of data values—something to look forward to.

1.4.1 Properties and Operations of Vectors

We know that a vector is a column or row of numbers that we call components. So far, the components are the coefficients for terms in a set of equations or an array of unknown values. Perhaps one of the most common ways to first think of a vector is that it is an arrow in space, as we show in the left-hand part of Figure 1.5. The vector a can be considered just an object in space but acquires more meaning when we put it in a coordinate system. We show the axes u-v as orthogonal axes. This gives us directions for the reference frame. To this, we add two special vectors $\hat{u}$ and $\hat{v}$ that are parallel to the axes. We define each to have unit length; these become the standard unit vectors for our coordinate system, and we refer to these as the unit Basis Vectors. They add the idea of length to the axes, and these vectors completely define the dimension and orientation of the coordinate system. Lastly, we place the tail of the arrow a at the origin of the coordinates and read its components using the unit basis vectors as shown in Figure 1.5.

Figure 1.5  Diagram of basic vector operations. On the left is a single vector a in the u-v coordinate system. The unit basis vectors give dimension to the axes to establish their length and coordinates. The middle shows simple addition and subtraction in 2D. The right side shows multiplication by a scalar in 3D. As with scalar addition and multiplication, the operations could also be combined. Vector addition and multiplying a vector with a scalar are commutative (e.g., a + b = b + a and ca = ac).

What mathematical Operations on Vectors do we perform? There are two basic rules that vectors follow. The first is that we can multiply a vector by a scalar constant, c, and the second is that we can add a vector to another vector of the same size. We show the operations in the following equations and Figure 1.5.

$$ c \cdot \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \cdot c = \begin{bmatrix} c \cdot a_1 \\ c \cdot a_2 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \end{bmatrix} \qquad (1.17) $$

Vector addition and scalar multiplication follow the associative, commutative, and distributive laws, although we have not yet covered how to 'multiply' two vectors.

$$ \mathbf{a} + (\mathbf{b} + \mathbf{c}) = (\mathbf{a} + \mathbf{b}) + \mathbf{c} \quad \rightarrow \text{associative,} $$
$$ \mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}, \ \text{and} \ c \cdot \mathbf{a} = \mathbf{a} \cdot c \quad \rightarrow \text{commutative,} $$
$$ c \cdot (\mathbf{a} + \mathbf{b}) = c \cdot \mathbf{a} + c \cdot \mathbf{b} \quad \rightarrow \text{distributive.} $$

These are the most important basic rules for vectors in linear algebra. Remember, we are doing a form of algebra, so our main interests are multiplying, dividing, adding, subtracting, and sometimes taking powers. We now know how to add vectors and multiply by a constant. Collectively, a set of vectors following these rules and definitions could be called a Vector Space, but further discussion of space and other operations will come in later chapters.


MATLAB® moment—Making vectors

Create a new variable as a row or column vector (both examples of an 'array' in MATLAB). Use square brackets, [ ], to wrap the elements in arrays.

>>a=[1,2,3] or >>a=[1 2 3] gives the 1 × 3 row vector a = [1 2 3]. You can use commas, spaces, or both between elements in the same row. Use semicolons to create row breaks.

>>a=[1;2;3] gives the 3 × 1 column vector a = [1; 2; 3].

MATLAB® moment—Addition and multiplication

A common operation in linear algebra is multiplying a vector or a matrix by a scalar. One of the defining properties of being linear is that this can be done.

>>a=[1;2]; b=[3;1]; c=3; gives a = [1; 2], b = [3; 1], and c = 3.
>>d=c*a gives d = ca = 3·[1; 2] = [3; 6], and
>>f=a+b gives f = a + b = [1; 2] + [3; 1] = [4; 3].

1.4.2 Some Geologic Vectors

We will start with some simple geologic vectors. This will show us how to set up coordinate systems and give us ideas on what we can use vectors for in geologic applications. We will take examples from structural geology and field geology to start. Such examples correspond to real rocks and solid geometries that we hope readers find natural to visualize. In addition, you can simply use your hands to reproduce the features we describe here if you get confused or the figures do not make sense. We will also reuse some of these examples in later sections and chapters. Suppose we have a dipping fault surface across which we have motion, as shown in Figure 1.6. We can orient several features here, but we will start with the slip


Figure 1.6  Block diagram of a dipping fault with its slip vector. On the right, we isolate the fault plane and slip and present it in three coordinate systems. We use the north-east-down (NED), north-up dip-up (NUDU), and a right-handed system, oriented parallel to the plane, with one axis in the slip direction, the next perpendicular to slip, and the last up perpendicular to the fault surface (SPU). All of these are arranged as column vectors with components corresponding to the main axes of the coordinate system. The subscript after each vector gives the coordinate system that is the basis for the vector components.

vector, representing the motion between the hanging wall and footwall blocks. This is a vector in space and exists without any particular coordinate system. Once we pick the reference frame, the vector has a length, expressed in appropriate units, and an orientation. When working with data vectors, such as time series or measurement data, there may be no coordinate system, just a vector of data with specified units arranged into a column or row. We almost always start by specifying a coordinate system for vectors representing quantities in the physical world. In this geologic case, our usual approach uses the handy North-East-Down or NED coordinate system. In geology, we always look down into the Earth, so a positive downward direction is natural. Also, since we are on the Earth, north is an obvious coordinate, and we will use it as the principal cardinal direction. Given those two axes or directions, we get east as the final axis of a right-handed coordinate system and list the coordinate axes in the order north-east-down (NED). Let the fault have 30 m dip-slip and 60 m left-lateral strike-slip motion in this example. In the NED system, we can do a bit of trigonometry to figure out the E and D components. The east component is 30 · cos(60°) = 15 m, and the down component is 30 · sin(60°) ≈ 26 m. The north component is straightforward; it is just 60 m. We can also explore other coordinate systems that might be useful (Figure 1.6). The second is NUDU (north-up dip-up), with the coordinate system in the plane and directed upward. In this case, the north component is 60 m, but the up component is 0 m, and the up-dip is 30 m, as given in the figure. The next system is in the



Figure 1.7  Block diagram of a dipping bed or surface. The right-handed coordinate system north-east-down (NED) is shown. The main features that fix the orientation of the dipping bed are the strike and dip, the dip azimuth and dip, or the pole to the bed. Note that the pole to the bed points downward in the positive D direction. We will not address plane facing at this point.

reference frame of the fault surface and the slip direction, which we can call SPU (slip-perpendicular-up). In this system, there is only motion in the slip direction; in this case, it is the length of the slip vector. This gives $\sqrt{60^2 + 30^2} \approx 67$ m. So, we see three ways of expressing the same vector using different cartesian coordinate systems. The vector has three components in each system since this is a problem in three dimensions. The components here are the motion in meters relative to the coordinate axes. Not to complicate things too much, but we can assign a unit-length vector to each coordinate direction. For NED, we would write these as N̂, Ê, and D̂. We will spend much more time on these unit vectors in later chapters. Lastly, as we noted in our starting discussion of vectors, the vector exists without the coordinate systems. But NED makes describing what the vector does to the Earth much easier.
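To make the arithmetic concrete, here is a small MATLAB sketch of ours (not one of the book's MATLAB moments) that builds the NED slip vector, assuming, as the numbers above imply, a fault dipping 60° toward the east with 60 m of left-lateral strike-slip and 30 m of dip-slip:

    % NED components of the slip vector and the total slip length
    dip   = 60;                      % fault dip in degrees
    slipN = 60;                      % strike-parallel (north) component, m
    slipE = 30*cosd(dip);            % horizontal part of the 30 m dip-slip, m (= 15)
    slipD = 30*sind(dip);            % vertical part of the 30 m dip-slip, m (about 26)
    s_ned = [slipN; slipE; slipD]    % slip vector in NED coordinates
    norm([60; 30])                   % total slip measured in the fault plane, about 67 m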

1.4.3 What about a Dipping Bed or Surface?

We will continue our discussion with another example, a dipping bed. This is shown in Figure 1.7 with all the orientation quantities we can assign to a bedding surface.


The reader may be most familiar either with the Strike and Dip or with the Dip Azimuth and Dip terminology from fieldwork or classes, and may also know the Pole to the Plane or Plane Normal from using stereonets in structural geology. Figure 1.7 shows these various terms and vectors. All of these notations express the orientation of the bed but do so in different ways. The strike-and-dip and dip-azimuth-and-dip notations imply the orientation of two different vectors. This is obvious for strike and dip because the strike is a vector parallel to the surface of the Earth, and the dip is perpendicular to the strike and headed into the Earth. Writing a quadrant notation such as N45E 25SE or azimuth as 45, 25 completely fixes the orientation of two vectors in space, and two perpendicular lines define a plane. Dip azimuth and dip do as well, in that the direction perpendicular to the dip azimuth is the strike and thus specifies not only the dip-azimuth line but also the strike line, as we see in Figure 1.7. Finally, as explained, we can view the pole or normal to the plane as a vector, making an angle with each coordinate direction. The next task is to figure out how to bring this into a linear algebra framework. We stated that all three representations are equivalent, as they have to be since they are recording the same observation but written in very different ways. In this book, our approach for planes is to work as much as possible with the pole to the plane, with some mention of dip azimuth and dip. Plenty of texts and resources explain converting strikes and dips to poles or normal vectors, so we will not deal with that fully. We adopt the treatment where we turn the pole-to-the-plane into direction cosines. We can form vectors from the geologic inputs in all of these cases.

MATLAB® moment—Making matrices

A way to build a matrix in MATLAB® is by coding it in a single line.

>>A=[1,2,3;4,5,6] gives the 2 × 3 matrix A = [1 2 3; 4 5 6].

Another way to build a matrix is by stitching together, or concatenating, multiple separately defined arrays.

>>a=[1;2;3]; b=[4;5;6]; makes vectors a = [1; 2; 3] and b = [4; 5; 6].

Vectors a and b can be concatenated by wrapping their variable names in square brackets, just like assembling an array from individual elements.

>>C=[a,b] gives C = [1 4; 2 5; 3 6].





Figure 1.8  Block diagram of the definition and determination of direction cosines. The cartesian coordinate system is x, y, z on the left-hand diagram and NED on the right. The vector v has length ∥v∥ and forms the direction angles α, β, γ with the coordinate axes. The values for l, m, n are given; these are equal to $x_v/\|v\|$, $y_v/\|v\|$, and $z_v/\|v\|$. The vector on the right is the pole or normal to a schematic dipping plane.

1.4.4 Direction Cosines

The Direction Cosines are just the cosines of the angles between the coordinate axes and the vector or pole we are trying to orient. This is shown in the left part of Figure 1.8 with the vector v and the x, y, and z axes. The cosines of the angles bear a particular relation between the vector's length and the vector's projection onto the coordinate axes. If the projections are $x_v$, $y_v$, and $z_v$, and the length of the vector v is ∥v∥, then:

$$\cos\alpha = \frac{x_v}{\|v\|} = l, \qquad \cos\beta = \frac{y_v}{\|v\|} = m, \qquad \cos\gamma = \frac{z_v}{\|v\|} = n. \tag{1.18}$$

We use l, m, and n for the values of the direction cosines relative to x, y, and z, respectively. We also show the same relations for the NED coordinate system to identify the angles (right side, Figure 1.8). A handy identity we will use is $l^2 + m^2 + n^2 = 1$. We usually make v a unit vector, with ∥v∥ = 1. This is especially easy for orientation data because we can normalize the pole to the plane vector or a linear vector to any value we want.

1.4.5 Lines as well as Planes

In the Earth Sciences, we come across a lot of linear features that fit naturally into a linear algebra framework. Called Lineations, these features are oriented by specifying the trend as the angle from north in the horizontal plane, and the plunge as the vertical angle of the line or lineation with the horizontal, as shown in the left



Figure 1.9  Tectonite with deformation features and the orientations of geologic lineations. The middle shows a rock sample with a strong lineation of the white feldspar grains. The left is the orientation relative to the NED coordinate system of both the foliation and lineation. On the right are the angles used for direction cosines relative to the NED framework for the lineation and the foliation. The lineation lies in the plane of the foliation. We measure its trend and plunge separately from the foliation. The lower left shows a new quantity, the rake of the lineation, which is the angle between it and the strike line of the foliation. The rake is constrained to lie in the plane of the foliation.

side of Figure 1.9. In the previous section, we had an example of a vector that defined fault slip. Another good example is a deformed rock with a strong gneissic layering and a prominent lineation (Figure 1.9), a rock we can call a tectonite. Here, geologists would measure the dip and strike of the foliation, and we discussed how to orient the foliation as it is just a plane. The lineation is oriented using trend and plunge. We can convert the trend and plunge into direction cosines using the same method for the pole to the foliation plane. This will just use the cosines of the angles α from N, β from E, and γ from D (Figures 1.8 and 1.9). A quantity called rake is another way to orient a linear feature. The rake is easy to measure in the field, just the angle between the strike or horizontal line and the lineation measured downward in the plane (Figure 1.9 in the lower left). This is true for any linear feature on a surface, including slickenlines on a fault surface. Rake can vary from 0° to 180°. The question is, where is the zero position? Since the rake is measured relative to a line of strike on a plane, we mainly have to figure out which end of the strike line we use. Structural geologists take this as the strike azimuth, the direction where the plane appears to dip to the right. This is also just 90° counterclockwise from the dip azimuth. Computations using rake are standard in the Earth Sciences.
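To tie this back to direction cosines, the following MATLAB sketch (ours, with hypothetical trend and plunge values) converts a lineation's trend and plunge to l, m, and n in the NED convention used above:

    % trend and plunge (degrees) to direction cosines in NED
    trend = 40;  plunge = 25;            % hypothetical lineation orientation
    l = cosd(plunge)*cosd(trend);        % cosine of the angle from N
    m = cosd(plunge)*sind(trend);        % cosine of the angle from E
    n = sind(plunge);                    % cosine of the angle from D (down is positive)
    l^2 + m^2 + n^2                      % returns 1, the handy identity from Section 1.4.4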


1.5 A GEOCHEMICAL EXAMPLE

Our next example will be from geochemistry. In this case, we are using the proportions of elements in minerals to model the mineral composition of a rock. This operation will be explained in detail. One of the nice features of this example is that the problem is equally at home when seen in the row view, column view, or simply as a set of equations to solve. It is also a good problem to ensure the units are set up correctly. For this example, we do not get the help of a simple physical view, such as a dipping bed, but the linear algebra view is clear. Suppose we are studying an altered rock composed primarily of three minerals—potassium feldspar (K-spar), biotite, and chlorite. These are silicate minerals that contain some of all the major elements, but we can single out three elements that distinguish them from each other: Potassium (K), Aluminum (Al), and Iron (Fe). Table 1.1 shows the proportions of each element in each mineral and the altered rock as follows:

TABLE 1.1  Elemental analysis of a fictional altered rock composed of three minerals. The compositions are in simple elemental proportions and not in more involved units such as weight percent.

Analyte      K    Al   Fe
Biotite      1     1    3
Feldspar     1     1    0
Chlorite     0     3    1
Rock         5    10    8

We set up the problem further by defining b = amount of biotite, f = amount of feldspar, and c = amount of chlorite. Now that we know the rock's chemical composition, we would like to know the proportion of each mineral needed to make up the rock. Using the minerals and elements, we can write down three equations that express the total amount of an element in the rock as the sum coming from some mixture of each of the individual minerals. These equations must follow the relationship between minerals and rock, which comes from the basic idea that the Element in mineral × Mineral proportion = Element in rock.

$$\begin{aligned} \text{K} &= 1b + 1f + 0c = 5 \\ \text{Al} &= 1b + 1f + 3c = 10 \\ \text{Fe} &= 3b + 0f + 1c = 8. \end{aligned} \tag{1.19}$$

This is the Row or Equation View of the problem. We could also write this focusing on the individual minerals. This form expresses the relation between the amounts of each mineral and the rock's elemental composition.

$$b\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} + f\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \\ 8 \end{bmatrix} \tag{1.20}$$


This is the Column View of the problem. In this view, we treat the problem as adding up the biotite, feldspar, and chlorite vectors to create the data vector that is the rock's composition expressed as elements. We are setting up the problem as the scaling and addition of column vectors rather than a set of equations. Regardless of how we set the system up, we will still get a result of the form Ax = b.

$$\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 3 \\ 3 & 0 & 1 \end{bmatrix} \begin{bmatrix} b \\ f \\ c \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \\ 8 \end{bmatrix} \tag{1.21}$$

1.5.1 Do the Units Work?

Next, what are the units associated with the matrix and vector components? Remember, these must work out, or we have incorrectly set up the problem. The units for the coefficient matrix A are Element per mineral. For the unknowns in x, the units are just the Amount of mineral. Finally, we have units of Amount of element for the data vector b. Applying this to the result, we get units for the following matrices.

$$\begin{bmatrix} 1\,[\text{K/b}] & 1\,[\text{K/f}] & 0\,[\text{K/c}] \\ 1\,[\text{Al/b}] & 1\,[\text{Al/f}] & 3\,[\text{Al/c}] \\ 3\,[\text{Fe/b}] & 0\,[\text{Fe/f}] & 1\,[\text{Fe/c}] \end{bmatrix} \begin{bmatrix} [\text{amount of b}] \\ [\text{amount of f}] \\ [\text{amount of c}] \end{bmatrix} = \begin{bmatrix} 5\,[\text{amount of K}] \\ 10\,[\text{amount of Al}] \\ 8\,[\text{amount of Fe}] \end{bmatrix}$$









So, it is clear that the units work out. Considering the first entry, we get (K/b) × Amount of b = Amount of K. We immediately see that any other variable arrangement would not give reasonable units. Exchanging values between x and b would give: (K/b) × Amount of K ≠ Amount of b.

MATLAB® moment—Random integer arrays

Sometimes, you want a vector or a matrix of integers to test your code or check a linear algebra concept. A lot of the matrices and vectors we present in the text use integer values. Such matrices are nice examples because you can do many operations in your head and see the result. Here is an easy way to generate an integer vector or matrix of any size.

>>a=randi(9,1,3) gives a = [8 9 6].

>>A=randi(5,3,2) gives A = [2 3; 5 2; 1 4].

The randi(n,r,c) command produces random integers between 1 and n, inclusive; the maximum value n is the first argument of the function. The variables r and c are the number of rows and columns in the output array or matrix. This command is very similar to the rand() function we saw earlier.


1.5.2 Graphing the Row and Column Views

We have set the problem up just as we did with the three-point problem. We can go through the isolation of variables, do lots of algebra, and solve for the unknowns. This is a very standard way to find solutions. However, it does not give us much insight into how the combination of minerals works together to make the rock. For this problem, the column view is the more informative and visual approach, which we show along with the row view in Figure 1.10. We start in the upper left of Figure 1.10, which shows a vector for each mineral's elemental composition. The mineral vectors are fixed by how much of each element, K, Al, and Fe, is present. Emphasizing the elemental axes is natural because these are the units of the whole rock measurement. In the upper right of Figure 1.10, we show the whole-rock composition in terms of elemental axes and axes of the minerals b, f, and c. The latter is appropriate since these are the units of the solution vector x expressed as the amount of each mineral. The mineral axes are not orthogonal, but that is not a problem when looking at the columns of a matrix. The lower left shows the solution vector emphasizing the elemental components. The blue background shows the data vectors in terms of the elemental axes. The lower right highlights the solution as mineral components for the column view. The blue background here shows the coordinates using the minerals, not the elements, and it is clear what combination of the minerals is needed to arrive at the whole rock composition. Figure 1.10 clarifies that the whole-rock analysis is a vector that can be considered to have coordinates of either K-Al-Fe or b-f-c. When we set up the K-Al-Fe system, we start with an orthogonal or cartesian coordinate system. On the other hand, we can also set up the problem in the b-f-c coordinate system, which is not orthogonal. In the previous example, the matrix A tells us how to translate between the two coordinate systems. This is a linear transformation matrix we will discuss in the next chapter. As we will learn later, the b-f-c coordinate system based on the columns of A works because we can fill the entire space using combinations and multiples of the three vectors. We say the vectors span the entire space.

1.5.3 Using an Inverse

We have introduced the idea that we can solve Ax = b by rewriting it as x = A⁻¹b. We will show the inverse here and how to use it, but we will not yet go through how to compute A⁻¹ from A. We get the result for the inverse shown in 1.22 and its use to get the unknowns in 1.23.

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 3 \\ 3 & 0 & 1 \end{bmatrix} \;\rightarrow\; A^{-1} = \begin{bmatrix} 1/9 & -1/9 & 1/3 \\ 8/9 & 1/9 & -1/3 \\ -1/3 & 1/3 & 0 \end{bmatrix}, \tag{1.22}$$

$$A^{-1}b = \begin{bmatrix} 1/9 & -1/9 & 1/3 \\ 8/9 & 1/9 & -1/3 \\ -1/3 & 1/3 & 0 \end{bmatrix} \begin{bmatrix} 5 \\ 10 \\ 8 \end{bmatrix} = \begin{bmatrix} 19/9 \\ 26/9 \\ 5/3 \end{bmatrix} = \begin{bmatrix} b \\ f \\ c \end{bmatrix}. \tag{1.23}$$
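As a quick check of these numbers (a sketch of ours, not part of the text's derivation), MATLAB reproduces the result with either the explicit inverse or the backslash operator:

    % geochemical example: solve Ax = b for the mineral amounts
    A = [1 1 0; 1 1 3; 3 0 1];      % element-per-mineral matrix (rows are K, Al, Fe)
    b = [5; 10; 8];                 % whole-rock amounts of K, Al, Fe
    x1 = inv(A)*b;                  % using the inverse, as in equation 1.23
    x2 = A\b;                       % using backslash
    % both give [19/9; 26/9; 5/3] (as decimals): biotite, feldspar, chlorite amounts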



















 


Figure 1.10  The row and column views of the geochemical example. The upper left shows the mineral components for biotite, b, chlorite, c, and feldspar, f, plotted for elemental axes. Red arrows show the elemental vector defined by each mineral component. The upper right shows an expanded view of the data vector in green and the mineral vectors in orange required to meet this composition. Dashed lines are the mineral components extended along their length, and red dots are at the component's length needed to add to the whole rock composition. The lower left shows the system plotted using the row view, so the solution emphasizes the vector lengths in terms of their elemental components. On the lower right, we show the column view, where the solution is focused on the addition of mineral vectors, which form a non-orthogonal coordinate system.


We discussed the need to consider the units when setting up a problem. What units are associated with the inverse? They are just what you would expect: the inverse of the units of the beginning matrix. Our chemistry matrix had units of Element/mineral. The inverse matrix has units of Mineral/element. This seems like an odd unit, but when we multiply A⁻¹b for, say, feldspar and K, we have (f/K) × K giving us a final answer in f. This will be covered in other chapters in detail.

MATLAB® moment—Random arrays

Sometimes, you just want a vector or a matrix to test your code or check a linear algebra concept. Here is an easy way to generate a vector or matrix of any size for your purpose.

>>a=rand(1,3) gives a = [0.9649 0.2134 0.8927].

>>A=rand(2,2) gives A = [0.8147 0.0975; 0.9058 0.6324].

The rand() command produces random real numbers between 0 and 1. The first argument, or number in parentheses, is the number of rows; the second, after a comma, is the number of columns.

1.6 SUMMARY

This chapter covered the primary rationale for representing geologic problems in a linear algebra form. The first example showed how the three-point problem could be recast as a series of equations and that these equations could be shown in matrix and vector form. The three-point problem is best thought of as a series of coupled equations, fundamentally taking the row view of linear algebra. We then went over an example of the GPS motion of a fault block and saw that this is simpler to think about as column vectors that are turned into a matrix. This stresses the column view of the matrix and linear algebra. At the end of this chapter, we considered a geochemical example that we presented in both the column and row view. We concluded that some geologic problems might be better thought of as rows to a matrix, others as the columns of a matrix, and others can be presented either way. We only have a couple of equations in this chapter that we will see or use again. These are two versions of the fundamental equation of linear algebra.

$$A x = b \quad\text{and}\quad x = A^{-1} b. \tag{1.24}$$

However, the concepts of systems of equations and the row and column views are at the heart of linear algebra.


1.7 EXERCISES

1. You have the equations for two lines:

   1x + 5y = 12
   4x + 3y = 8.

   (i) Put the data in linear algebra form.
   (ii) Plot a graph using both column and row approaches.
   (iii) Solve the equations by elimination of variables.
   (iv) In MATLAB®, enter and display the matrix and data vector.

2. A more complicated example using w, x, y, and z. You have the following equations:

   1w + 3x + 5y + 2z = 12
   2w + 5x + 4y + 1z = 11
   8w + 4x + 6y + 1z = 13
   5w − 4x − 3y − 2z = 10

   (i) Make the data into a matrix and vectors.
   (ii) Solve the equations by elimination of variables.
   (iii) In MATLAB®, enter the matrix and data vector.

3. An oblique-slip reverse fault strikes north-south and dips 60 degrees to the west. It has 60 m of dip-slip and 40 m of left-lateral strike-slip. A setup figure is shown in Figure 1.11.


Figure 1.11  Reverse fault dipping 60° west and striking due north.


Figure 1.12  Setup for exercise on tomography. Assume each block is 1 km on a side.

Give vectors for total slip relative to:

   (i) North-east-down (NED).
   (ii) A coordinate system you define using the footwall of the fault.
   (iii) Assume it moves an additional 40 m left laterally; redefine the vectors in parts i and ii using vector addition and the total slip vector using NED.
   (iv) Now assume the fault moves two times farther in the strike direction and three times farther in the dip direction. Redefine the vectors in the first two parts using vector multiplication by a scalar and the total slip vector using NED.

4. Let's set up the equations for a problem in tomography. Tomography is a method by which we can determine the seismic velocity, the speed of a seismic wave through the Earth, as it crosses an area divided into blocks with different velocities. By looking at travel times crossing different blocks in different directions, we can figure out the velocity structure of the blocks. This method is used to look at the three-dimensional interior of the Earth using seismic waves produced by large earthquakes. We will start with a simple two-dimensional geometry shown in Figure 1.12. For our example, we will consider four 1 km square blocks traversed by seismic waves. You are trying to determine the unknown seismic velocity in each colored block, v1, v2, v3, and v4. The stars represent the source positions, and the times to cross the pattern are shown as t1, t2, t3, and t4. These travel times are your data. The basic equation we use to relate distance, rate, and time is distance = rate × time.

   (i) Write an equation for the time t it will take the first seismic wave to traverse just the top left block on the diagram, with unknown seismic velocity v1. Assume this block and all other blocks are 1 km on a side.


   (ii) The equation from (i) includes an awkward term that looks like (1/v1). Let's call the reciprocal velocity the slowness and denote it v⁻¹. From here on, we can solve linear equations formulated in terms of the slownesses v1⁻¹, v2⁻¹, v3⁻¹, and v4⁻¹. When we have a solution, we can take the reciprocal of each slowness to calculate the velocity we're looking for.
   (iii) Write an equation for the total time t1 it will take the first seismic wave to traverse the top left block with slowness v1⁻¹, and the top right block with slowness v2⁻¹.
   (iv) Now write a series of equations and a linear algebra expression that relates the unknown slownesses v1⁻¹, v2⁻¹, v3⁻¹, and v4⁻¹ to the travel times t1, t2, t3, and t4, and the distances.
   (v) Explain how you design the matrix and vectors to solve the problem Ax = b. Make sure you show what quantities carry what physical units (e.g., time and distance) and that the equations balance once you set them up. Also, show the units in matrix and vector form. This will be Ax = b without the variable names but with the unit designations.

5. This exercise is on the mineral garnet and its components as cations. We will see this problem several more times in this book, so this is just the start of setting up the exercise. You will set up the matrices and equations in the form of Ax = b, but first, you will have to figure out what A should look like. Garnet has the general chemical formula A₃B₂Si₃O₁₂, where A is +2 cations and can be Fe²⁺, Ca, Mg, or Mn, and B is +3 cations and can be Fe³⁺, Al, or Cr. So, 7 cations can appear as A or B. There are also 6 components or classifications of Garnet based on chemistry:

   1. Andradite (An) – Ca₃Fe³⁺₂Si₃O₁₂
   2. Grossular (Gr) – Ca₃Al₂Si₃O₁₂
   3. Pyrope (Py) – Mg₃Al₂Si₃O₁₂
   4. Spessartine (Sp) – Mn₃Al₂Si₃O₁₂
   5. Almandine (Alm) – Fe²⁺₃Al₂Si₃O₁₂
   6. Uvarovite (Uv) – Ca₃Cr₂Si₃O₁₂

So, there are 6 components and 7 cations. When analyzing garnets, we usually measure all the important elements and then compute the relative abundances of the components based on the cations. Once we have the chemical cation abundances (totaling 5), we are then interested in computing the relative contributions of each garnet component (Alm vs. Py vs. Sp, etc.). What data are we given? The chemistry of the garnet components and the amount of each cation. What do we want? Based on all this chemistry, we want to express the garnet composition as fractions (adding to 1) of each garnet type. In the most general way: (Chemistry of components) × (amount of each component) = cations of each element, or Ax = b.


In this, A is a matrix, and x and b are both vectors. We assign x to the vector of unknown components and b to the vector of measured cation abundances. Put the x vector in the order An, Gr, Py, Sp, Alm, and Uv. Put the b vector in the order Fe²⁺, Ca, Mg, Mn, Fe³⁺, Al, and Cr. Exercise: Construct matrix A (chemistry of the components). First, write down the vectors x and b in terms of the component and element symbols. Then, decide what appears in the rows and the columns of A – rows are either components or cations and vice versa for columns. Write this in the form Ax = b.

CHAPTER 2

Matrix Multiplication

This chapter will take up some formalities of linear algebra, namely matrix and vector multiplication, and operations like transpose and dot product. There will be many examples using small matrices, but all operations can be extended to matrices and vectors of any size. We restrict ourselves to the Real Numbers, although we can do linear algebra across other fields, such as Complex Numbers. The number of dimensions or the size of the number array is symbolized using ℝ² for something like (0.707, 3.14) in 2 dimensions and ℝⁿ for n dimensions. For example, every vector in ℝ⁹ will have 9 components. We will also introduce restrictions on which vectors or matrices can multiply other ones. We will end the chapter with geologic examples and a discussion of orientation data.

2.1 MULTIPLYING THE LINEAR ALGEBRA WAY

We have already discussed the two most important properties of vectors and matrices, that they can be multiplied by a constant and can be added if they are the same size and shape. Now, we must consider what would happen if we want to multiply one matrix times another.

2.1.1 Conventions Used for Matrices and Vectors

In this book, we will follow as closely as possible a defined manner of using the typographical style of letters and symbols to indicate their roles as linear algebra variables or operators. Lower, upper case, regular, and bold fonts imply different things about a matrix, vector, or variable. A bold capital letter means a quantity is a matrix: A. A bold lower-case letter refers to a vector: x or b. A bold lower-case letter with a subscript like aᵢ refers to the ith column or row of the matrix A. One convention we do not adopt is placing an arrow over a vector, such as x⃗, because it seems to us to clutter up the text unnecessarily. We will use x̂ (x-hat) in many cases to designate a unit vector. We also use x̃ (x-tilde) sparingly later in the text for vectors with some special property or significance. Finally, a lowercase letter such as x is just a variable, a scalar, or a component of a vector or matrix. We refer to a matrix as m × n, meaning it has m rows and n columns (rows before columns). Any n × n matrix is square. We also will use square brackets [ ] to enclose



entries in a matrix or vector. We have used these conventions; plenty of examples are in the previous sections. Other conventions will be used in subsequent chapters, but this is all we need now.

2.1.2 Multiply a Vector Times a Vector

The first operation creates the product of two vectors of the same size. This is surprisingly easy. With two vectors u and v, we create what we call the dot product or inner product, that is symbolized by u · v. If u = [1, 3] and v = [2, 4], then

$$u \cdot v = (1 \cdot 2) + (3 \cdot 4) = 14. \tag{2.1}$$

Do not be confused by the symbols for dot product and multiplication used in the equations. They look exactly the same, but the difference is clear: when · is between two vectors (bold, lower case letters), it must represent a dot product, but a · between two numbers or scalar variables means just ordinary multiplication.

MATLAB® moment—Dot product

Computing the dot product of two vectors.

>>a=[1;2;3], >>b=[4;5;6] gives a = [1; 2; 3] and b = [4; 5; 6],

>>c=dot(a,b) gives c = a · b = 32.

Use ( ) for functions.

The dot product of two vectors always produces a single number or scalar. For this reason, the product is also called the Scalar Product of two vectors. Although the notation ⟨u, v⟩ is commonly used to symbolize the inner product, we will stick to the dot product in this text. In general, if u = [u1, u2, . . . , un] and v = [v1, v2, . . . , vn], then

$$u \cdot v = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n. \tag{2.2}$$

Notice also that u · v = v · u, meaning that the dot product is Commutative. You will also see this written as a row vector times a column vector without the · between


them. This can just be called an inner product. 



$$uv = \begin{bmatrix} u_1 & u_2 & \ldots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \ldots + u_n v_n \tag{2.3}$$

This is precisely how we showed the matrix-vector multiplication in the previous chapter when discussing Ax = b.

2.1.3 Multiply a Matrix Times a Vector—the Column Approach

Now, we move on to a matrix times a vector. We have already seen this in the previous chapter, but it needs a bit more explanation and in-depth treatment. First, we put the matrix on the left side of the vector, which means we are multiplying the vector by the matrix. The multiplying matrix must have the same number of columns as the vector has rows. In other words, if we have an n × 1 vector, the matrix must have n columns. So, if our matrix is m × n and the vector n × 1, we get a resulting vector that is m × 1. We can write the multiplication in two main ways - using a column approach or a row approach. This section discusses the former and the latter in the next section. The column approach is similar to the fault-slip example given in the previous chapter in that it emphasizes viewing the matrix as a series of columns that are acted on by the row entries of the vector. In a bigger example, we take a 4 × 3 matrix multiplying a 3 × 1 vector: 

a1 a  2  a3 a4

b1 b2 b3 b4







 

 

c1   a1 b1 c1 u1 a  b  c  c2     2  2  2  u  = u1   + u2   + u3   = a3  b3  c3  c3  2 u3 c4 a4 b4 c4 u1 a1 + u2 b1 + u3 c1 r1 u a + u b + u c  r   1 2  2 2 2 3 2   =  . u1 a3 + u2 b3 + u3 c3  r3  u1 a4 + u2 b4 + u3 c4 r4 



 

(2.4)

We multiply each column by the corresponding row in the vector and add them. This results in a 4 × 1 vector, the right result for a 4 × 3 matrix times a 3 × 1 vector. The multiplication in this way forms a final vector that is a combination of the columns of the matrix. This utilizes both fundamental properties of a matrix, multiplying by a scalar and adding like-shaped matrices. Note we can call any array of entries, whether a full matrix or simply a vector, a matrix. Vectors are just special matrices with a single column or row. Once more, the operation takes the view of the matrix as a set of columns or column vectors that we operate on during multiplication. The key idea here is that in the equation Ax = b, the vector x takes a linear combination of the columns of the matrix A that add up to the data vector b. The linear combination of vectors


is one of the fundamental ideas underlying all of linear algebra. It will be revisited many times in the coming sections and chapters.
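A short MATLAB sketch of ours (with made-up numbers) shows the column view directly: the built-in product and the explicit combination of columns agree.

    % column view of A*u as a linear combination of the columns of A
    A = [1 0 2; 3 1 0; 0 2 1; 1 1 1];                     % a 4x3 example matrix
    u = [2; -1; 3];                                        % a 3x1 vector
    b_cols = u(1)*A(:,1) + u(2)*A(:,2) + u(3)*A(:,3);      % combination of columns
    isequal(A*u, b_cols)                                   % returns 1 (true)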

2.1.4 Multiply a Matrix Times a Vector—the Row Approach

Our second way of multiplying a vector by a matrix is inspired by the three-point problem described in the first chapter. Instead of looking at the multiplying matrix as columns, we view it as rows, similar to the equations described for solving for slopes and base elevation. We will use the same matrix and vector as in Section 2.1.3 but operate on rows times the column vector.

a1 a  2  a3 a4

b1 b2 b3 b4

 



r1 c1   u1   c2     r2   u  =   , r3  c3  2 u3 r4 c4

  h i u1   c1 u2  , r2 = a2

h

where r1 = a1 b1

b2

u3



h

r3 = a3 b3

u3



i u1 h   c3 u2  , and r4 = a4

  i u1   c2 u2  , 

b4

u3

 i u1   c4 u2  .

(2.5)

u3

And of course r1 = u1 a1 + u2 b1 + u3 c1 , and so forth for r2 , r3 , r4 , r1 u1 a1 + u2 b1 + u3 c1 u a + u b + u c  r   2  2 2 3 2 giving  1 2  =  . u1 a3 + u2 b3 + u3 c3  r3  u1 a4 + u2 b4 + u3 c4 r4 



 

(2.6)

The results of 2.4 and 2.6 are identical. The operation views the matrix as a set of rows that each multiplies the column vector to create the row's content in the data vector. This is how we analyzed the three-point problem as a series of rows representing equations. Either view of the matrix, as columns or as rows, must and does yield the same answer.
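The row view is just as easy to check numerically; in this sketch of ours, each entry of the result is one row of A times the vector.

    % row view of A*u: row-by-row dot products
    A = [1 0 2; 3 1 0; 0 2 1; 1 1 1];   % 4x3 example matrix (made-up numbers)
    u = [2; -1; 3];                     % 3x1 vector
    r = zeros(4,1);
    for i = 1:4
        r(i) = A(i,:)*u;                % ith row (1x3) times u (3x1) gives a scalar
    end
    isequal(r, A*u)                     % returns 1 (true)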

2.1.5 Three-point Problem as a Matrix Times a Vector

In the previous chapter, our first example of a matrix equation and its linear algebra companion equation was for the three-point problem. We repeat that equation here but expand the multiplication to show the result. This should look very familiar to the reader. We have included the units for each term and will discuss this as follows.

$$\begin{bmatrix} x_1\,[\mathrm{m}] & y_1\,[\mathrm{m}] & 1\,[\,] \\ x_2\,[\mathrm{m}] & y_2\,[\mathrm{m}] & 1\,[\,] \\ x_3\,[\mathrm{m}] & y_3\,[\mathrm{m}] & 1\,[\,] \end{bmatrix} \begin{bmatrix} m_x\,[\,] \\ m_y\,[\,] \\ z_0\,[\mathrm{m}] \end{bmatrix} = \begin{bmatrix} z_1\,[\mathrm{m}] \\ z_2\,[\mathrm{m}] \\ z_3\,[\mathrm{m}] \end{bmatrix} \tag{2.7}$$

$$\mathbf{A}\,\mathbf{x} = \mathbf{b} \tag{2.8}$$






So, in this case, we get that our multiplication of a matrix times a vector gives a final vector whose entries are precisely the entries describing the equations on the left-hand side of the three-point problem. Simply putting the b back on the right-hand side gives us the three-point problem equations in vector form.

$$\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix} \begin{bmatrix} m_x \\ m_y \\ z_0 \end{bmatrix} \;\rightarrow\; \begin{bmatrix} x_1 m_x + y_1 m_y + 1 z_0 \\ x_2 m_x + y_2 m_y + 1 z_0 \\ x_3 m_x + y_3 m_y + 1 z_0 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \tag{2.9}$$

This example's other important feature is that the matrix and vectors can have different units. For the coefficients matrix, two columns have units of length, but the other is unitless. The unknowns vector has two dimensionless components and one in [m]. This situation means we must ensure the products work for the data vector. In the end, the multiplication of A with x produces a vector whose entries all have dimension length, the same as the data vector b.

MATLAB® moment—Multiplying matrix times vector

Computing the product of a matrix times a vector.

>>A=[1,2;4,3], >>b=[1;2] give A = [1 2; 4 3] and b = [1; 2].

>>c=A∗b gives c = Ab = [5; 10].

Use ∗ to multiply arrays, provided they have the right dimensions.

2.1.6 Multiply a Matrix Times a Matrix with Dot Products

Now for the big step, multiplying one matrix times another matrix. We have already said that a matrix times a vector has the restriction that the number of columns in the matrix must equal the number of rows in the vector. This goes for matrix-matrix multiplication. We are multiplying the columns of the second matrix by the rows of the first, so they must have the same number of components. In other words, we can only multiply two matrices if they are m × n and n × p. The resulting matrix is m×p. This “row times column” process is the method most taught in linear algebra and other math classes working with matrices and vectors. Here is a simple example for a 3 × 2 times a 2 × 2 matrix. 







$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{bmatrix} \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} = \begin{bmatrix} r_1 & s_1 \\ r_2 & s_2 \\ r_3 & s_3 \end{bmatrix}, \tag{2.10}$$




 



  

where

$$r_1 = \begin{bmatrix} a_1 \\ b_1 \end{bmatrix} \cdot \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad r_2 = \begin{bmatrix} a_2 \\ b_2 \end{bmatrix} \cdot \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad r_3 = \begin{bmatrix} a_3 \\ b_3 \end{bmatrix} \cdot \begin{bmatrix} u_1 \\ u_2 \end{bmatrix},$$

$$\text{and}\quad s_1 = \begin{bmatrix} a_1 \\ b_1 \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad s_2 = \begin{bmatrix} a_2 \\ b_2 \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \quad s_3 = \begin{bmatrix} a_3 \\ b_3 \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}.$$

This gives as a final result

$$\begin{bmatrix} r_1 & s_1 \\ r_2 & s_2 \\ r_3 & s_3 \end{bmatrix} = \begin{bmatrix} a_1 u_1 + b_1 u_2 & a_1 v_1 + b_1 v_2 \\ a_2 u_1 + b_2 u_2 & a_2 v_1 + b_2 v_2 \\ a_3 u_1 + b_3 u_2 & a_3 v_1 + b_3 v_2 \end{bmatrix}. \tag{2.11}$$

We can isolate the result for any particular entry in the product matrix. In the following example for AB = C, it is simply the inner product of a row $\mathbf{a}_i$ in the first matrix with a column $\mathbf{b}_j$ in the second to give the entry $c_{ij}$ in C.

$$\begin{bmatrix} a_{11} & \cdots & \cdots & a_{1n} \\ \vdots & & & \vdots \\ a_{i1} & \cdots & a_{ik} & a_{in} \\ \vdots & & & \vdots \\ a_{m1} & \cdots & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ \vdots & & b_{kj} & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix} = \begin{bmatrix} c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\ \vdots & & \vdots & & \vdots \\ c_{i1} & \cdots & c_{ij} & \cdots & c_{ip} \\ \vdots & & \vdots & & \vdots \\ \cdots & & c_{mj} & & \cdots \end{bmatrix} \tag{2.12}$$

That is, $c_{ij} = \mathbf{a}_i \cdot \mathbf{b}_j = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}$.

MATLAB® moment—Multiplying matrices

Computing the product of a matrix times a matrix.

>>A=[1,2;4,3], >>B=[1,3;4,2] give A = [1 2; 4 3] and B = [1 3; 4 2],

>>C=A∗B gives C = AB = [9 7; 16 18].

Use ∗ to multiply matrices, provided they have the right dimensions.

2.1.7 Multiply a Matrix Times a Matrix with Outer Products

The previous example presented matrix multiplication in the most common form: a row times a column. An alternative method is multiplying columns times rows to


create a series of matrices that we combine to get the final product. This multiplication method takes an Outer Product. We show a simple outer product as a column times a row vector.

$$\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} a_1 u_1 & a_1 u_2 \\ a_2 u_1 & a_2 u_2 \end{bmatrix} \tag{2.13}$$

This may seem odd and violate the required number of rows and columns for multiplication. It does not. We state that the only requirement for multiplication is that the number of columns in the matrix on the left has to be the same as the number of rows in the matrix on the right. In the example in equation 2.13, we have a 2 × 1 vector multiplying a 1 × 2 vector. It produces a matrix, in this case, 2 × 2, not a vector. A simple rule is that an inner product of a row times a column produces a scalar, but an outer product of a column times a row produces an entire matrix. We will demonstrate this using the 3 × 2 and 2 × 2 matrices from equation 2.10.

$$\begin{bmatrix} a_1 & b_1 \\ a_2 & b_2 \\ a_3 & b_3 \end{bmatrix} \begin{bmatrix} u_1 & v_1 \\ u_2 & v_2 \end{bmatrix} = \begin{bmatrix} r_1 & s_1 \\ r_2 & s_2 \\ r_3 & s_3 \end{bmatrix}$$

Now we do the multiplication as columns × rows. This is done one column at a time for the matrix on the left, acting on a single row of the matrix on the right. We multiply the first column by the first row, the second column by the second row, and so forth. This operation will create a new 3 × 2 matrix since the columns are 3 × 1 and the rows are 1 × 2. 







 

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \begin{bmatrix} u_1 & v_1 \end{bmatrix} = \begin{bmatrix} a_1 u_1 & a_1 v_1 \\ a_2 u_1 & a_2 v_1 \\ a_3 u_1 & a_3 v_1 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \begin{bmatrix} u_2 & v_2 \end{bmatrix} = \begin{bmatrix} b_1 u_2 & b_1 v_2 \\ b_2 u_2 & b_2 v_2 \\ b_3 u_2 & b_3 v_2 \end{bmatrix}. \tag{2.14}$$

We add the two resulting matrices together to get the final result.

$$\begin{bmatrix} a_1 u_1 & a_1 v_1 \\ a_2 u_1 & a_2 v_1 \\ a_3 u_1 & a_3 v_1 \end{bmatrix} + \begin{bmatrix} b_1 u_2 & b_1 v_2 \\ b_2 u_2 & b_2 v_2 \\ b_3 u_2 & b_3 v_2 \end{bmatrix} = \begin{bmatrix} a_1 u_1 + b_1 u_2 & a_1 v_1 + b_1 v_2 \\ a_2 u_1 + b_2 u_2 & a_2 v_1 + b_2 v_2 \\ a_3 u_1 + b_3 u_2 & a_3 v_1 + b_3 v_2 \end{bmatrix} = \begin{bmatrix} r_1 & s_1 \\ r_2 & s_2 \\ r_3 & s_3 \end{bmatrix} \tag{2.15}$$

So, the results are identical between equations 2.11 and 2.15, as they should be. The first was done as rows times columns and the second as columns times rows. This again emphasizes that the row and column views are interchangeable.
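Here is a small MATLAB sketch of ours confirming that the sum of the column-times-row outer products reproduces the ordinary product:

    % matrix multiplication as a sum of outer products
    A = [1 2; 3 4; 5 6];                       % 3x2 matrix (made-up numbers)
    B = [7 8; 9 10];                           % 2x2 matrix
    C_outer = A(:,1)*B(1,:) + A(:,2)*B(2,:);   % column k of A times row k of B, summed
    isequal(C_outer, A*B)                      % returns 1 (true)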

2.1.8 Another Use for Dot Products

Another use of the dot product is finding the angle between two vectors. Again, we will not go into a proof, but if we have two vectors u and v that intersect, then we have a simple formula that relates the dot product to the angle between them.

$$u \cdot v = \|u\|\,\|v\| \cos\theta = \sqrt{u \cdot u}\,\sqrt{v \cdot v}\,\cos\theta, \tag{2.16}$$


$$\theta = \arccos\left(\frac{u \cdot v}{\|u\|\,\|v\|}\right). \tag{2.17}$$

Notice that all we get is the angle between the two vectors. We do not get any orientation or rotation sense, just an angle. We will have to go back to the vectors themselves to do more. Note also that θ is 0° to 180°. If we are dealing with unit vectors, such as using direction cosines to represent orientations, then our equations become:

$$u \cdot v = \cos\theta, \quad\text{or}\quad \theta = \arccos(u \cdot v). \tag{2.18}$$
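A one-line MATLAB version of equation 2.17 (ours, with made-up vectors):

    % angle between two vectors from the dot product
    u = [1; 3; 2];  v = [2; 0; 1];                   % example vectors
    theta = acosd(dot(u,v)/(norm(u)*norm(v)))        % angle in degrees; use acos for radians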

We will find it handy to use equation 2.18 in many instances, especially as we work on orientation data or other spatially-based geological data expressed as Unit Vectors. These vectors have unit length or magnitude.

MATLAB® moment—Matrices that cannot be multiplied

MATLAB® will not let you multiply matrices that are mismatched by size. This is a common enough error that we show it with an example. Let's compute C = AB, where A = [1 2 3; 3 2 1] and B = [1 2; 2 1].

>>C=A∗B gives
>>Error using *
>>Incorrect dimensions for matrix multiplication. Check that the number of columns in the first matrix matches the number of rows in the second matrix.

You cannot multiply a 2 × 3 matrix by one that is 2 × 2.

2.1.9 Some Important Rules about Matrix Math

We stated that the operation of multiplying a vector times a vector, the dot product, is commutative, meaning that u · v = v · u. While this is true for two vectors, it is not true for two matrices, so that AB ≠ BA. We can see this with a simple example.

$$\text{Let } A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \ \text{and}\ B = \begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix}, \tag{2.19}$$

$$\text{then } AB = \begin{bmatrix} 4 & 1 \\ 10 & 5 \end{bmatrix} \neq BA = \begin{bmatrix} 11 & 16 \\ -2 & -2 \end{bmatrix}. \tag{2.20}$$

Matrix Multiplication ■ 35

and B is 4 × 3 then AB works out to be a 6 × 3 matrix. However, we cannot even perform BA because we have no way to multiply a 4 × 3 matrix times a 6 × 4 one. The number of columns in the first differs from the number of rows in the second. But matrix multiplication is associative. We can write the equation ABC as (AB)C or A(BC). It is sometimes very convenient to regroup our multiplications. We cannot change the order of the matrices, but we can choose in what grouping we multiply them. There will be numerous examples of this given in the following and in later chapters. In the previous example, if we define the matrix C, we can see that the multiplication is associative. "

#

0 1 Let C = , then −1 2 "

#"

4 1 (AB)C = 10 5 "

1 2 and A(BC) = 3 4

#"

(2.21) #

0 1 −1 2

"

#

−1 6 = , −5 20

#

"

(2.22)

#

−3 8 −1 6 = . 1 −1 −5 20

(2.23)

And matrix multiplication is distributive. Simply put, A(B + C) = AB + AC. Also, we can see that (B + C)A = BA + CA. But again, A(B + C) ̸= (B + C)A. Matrices work the same way as vectors regarding multiplication by a constant and addition. We can always multiply any matrix by a constant, cA, and cA = Ac, so the scalar multiplication is commutative. Matrix addition is also commutative, and A + B = B + A, as long as A and B are the same size and shape. As an example, using A and B from equation 2.19, we get: "

#

"

#

"

#

"

#

"

#

1 2 2 3 2 3 1 2 3 5 + = + = . 3 4 1 −1 1 −1 3 4 4 3

(2.24)

Addition is also associative because (A + B) + C = A + (B + C). And, of course, we can apply the associative property and multiply by a constant simultaneously. For example, c(A + B) + C = cA + (cB + C).

2.2 TRANSPOSE OF A VECTOR OR MATRIX

When we turn a matrix or vector on its side, we call it transposing the matrix, and the on-its-side matrix is called the transpose. The usual way to write this for two vectors is:

$$u \cdot v = u^T v, \quad\text{where}\quad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad\text{and}\quad u^T = \begin{bmatrix} u_1 & u_2 & \ldots & u_n \end{bmatrix}. \tag{2.25}$$


If we have a matrix, we turn its first row into the first column of its transpose, the second row into the second column, etc. This looks like this:

$$A = \begin{bmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \\ a_4 & b_4 & c_4 \end{bmatrix}, \quad\text{then}\quad A^T = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \end{bmatrix}. \tag{2.26}$$

MATLAB® moment—Transposing matrices and vectors

Transposing a matrix and a vector.

>>A=[1,2;4,3], >>b=[1;2] gives A = [1 2; 4 3] and b = [1; 2],

>>C=A′ gives C = Aᵀ = [1 4; 2 3], and

>>d=b′ gives d = bᵀ = [1 2].

Use ′ to get the transpose of any array.

And even in a more general way, we can write a matrix using subscripts that indicate the row and column of the entry. For some entry u of the matrix U, we would have $u_{ij}$, where i is the row number, and j is the column number. If our matrix is m × n, then we could write U and Uᵀ as:

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ u_{31} & u_{32} & \cdots & u_{3n} \\ \vdots & \vdots & & \vdots \\ u_{m1} & u_{m2} & \cdots & u_{mn} \end{bmatrix}, \quad\text{and}\quad U^T = \begin{bmatrix} u_{11} & u_{21} & u_{31} & \cdots & u_{m1} \\ u_{12} & u_{22} & u_{32} & \cdots & u_{m2} \\ \vdots & \vdots & \vdots & & \vdots \\ u_{1n} & u_{2n} & u_{3n} & \cdots & u_{mn} \end{bmatrix}. \tag{2.27}$$

Transposing is thus swapping rows and columns. For square matrices, we imagine this as flipping the matrix across the Main Diagonal. The entries starting at the upper left and continuing down to the right, one row and one column at a time, form the main diagonal of a matrix. These are the matrix elements with matching indices, e.g., u22 or unn. For rectangular matrices, we can just switch the order of the indices, e.g., u12 → u21 or umn → unm, as we see in equation 2.27.

2.2.1 The Transpose of a Vector Times Itself

We started this section with the idea that we could equate a dot product with multiplication by a transpose. We consider a couple of special cases here. What if we


take the dot product of a vector with itself? We could write this u · u, but we will try to stay with the convention in this book of writing it $u^T u$. Let's see what we get.

$$\text{Let } u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \quad\text{then}\quad u^T u = u_1^2 + u_2^2 + \cdots + u_n^2 = \|u\|^2. \tag{2.28}$$

We call the result of $u^T u$ or $\|u\|^2$ the square of the Norm of the vector. This is simply the length of the vector squared, with the norm ∥u∥ corresponding to the length. The norm ∥u∥ is also called the 2-norm, the L2 norm, or the Euclidean Norm.

MATLAB® moment—Computing the norm of a vector

Here are two ways to compute the norm of a vector. One uses the norm function, the other the sqrt or square root function.

>>a=[1;2;3] gives a = [1; 2; 3], and we want the norm of a = ∥a∥.

>>b=sqrt(a′∗a) gives b = √14 = 3.74, or we could use

>>b=norm(a), giving the same result, b = ∥a∥ = 3.74.

We could also explore changing the order of multiplication and transposition. We would do this as follows:

$$\text{Let } u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \tag{2.29}$$

Then $u^T v = v^T u = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = u \cdot v = v \cdot u$. These are very handy expressions that we will use later in this book. Note also that $u^T v \neq u v^T$. The former produces a scalar product and the latter results in an n × n matrix.

2.2.2 A Couple Useful Rules About Transposes

In this book, we keep formal proofs to a minimum but give rules that we will need to use and remember later, which may be useful as you apply linear algebra to the Earth Sciences. There are two of these for the transpose. The first concerns taking the transpose of a transpose. This gives you what you would think; transposing the transposed matrix returns the original matrix. So for matrix A, $(A)^T = A^T$, and


$(A^T)^T = A$. When we have matrix multiplication, things get a little more involved. Remember, $(ABC)^T = C^T B^T A^T$. When we transpose a product, we reverse the order of multiplication for the transposed matrices. We have only given one example of using the transpose: computing the norm of a vector. We will find later that the transpose is essential in linear algebra. One of the most common expressions we will use starting in Chapter 5, Solving Ax = b when there is no solution, is the computation of the matrix $A^T A$.
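A quick numerical check of these transpose rules (our own sketch using random integer matrices):

    % transpose rules checked numerically
    A = randi(9,3,2);  B = randi(9,2,4);    % random integer matrices of compatible sizes
    isequal((A')', A)                        % transposing twice returns A, so this is 1
    isequal((A*B)', B'*A')                   % reverse the order when transposing a product, 1
    AtA = A'*A                               % the square matrix A'A that appears in Chapter 5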

2.3 LINEAR COMBINATIONS

Now that we have gone over various ways of multiplying matrices, let's step back and think about what multiplication accomplishes. We presented in Section 2.1 the idea that we could view linear algebra as working with rows and finding values for the solutions to equations or with columns and their linear combination to build a solution vector. These are such essential and complementary concepts that we present one more example. We aim to understand the row and column views by using them to find the intersection of two lines. These two views use exactly the same matrix, solution, and data vectors but must be plotted using different coordinate systems. We will then discuss other aspects of multiplication, including multiplying by 1 and multiplying using a diagonal matrix on the left or right side.

2.3.1 Rows and Columns Graphed

Let's take the following matrix equation and express it as the solution to equations for a row problem and as a combination of vectors for a column problem. The two approaches are contrasted in Figure 2.1. Remember, we are just solving the equation Ax = r. The setup and values are:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 3 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}. \tag{2.30}$$

We will start with the row view. This views the operation of multiplication using two coupled equations coming from the rows of the matrix and values in the data vector to solve for the unknowns vector.

$$a_{11} u + a_{12} v = r_1 \quad\text{and}\quad a_{21} u + a_{22} v = r_2, \tag{2.31}$$

$$1 \cdot u + 3 \cdot v = 5 \quad\text{and}\quad 4 \cdot u + (-1) \cdot v = 7. \tag{2.32}$$

The row version looks like equations we would do algebra on to get the answers for u and v. We could just multiply the first equation by 4, subtract it from the second, and solve for v and then u. The column version is different. We turn the matrix into column vectors a1 and a2. The unknowns u and v are the amounts of the first and second columns we need,


Figure 2.1  The two ways to solve 2.30. The left side shows the solution to two equations or the row version. The solution is the point of intersection between two lines, and the axes are u and v. The right side shows the column solution where the columns of A are combined to find the solution vector r. In both cases, the solution is the same at (2, 1) regardless of whether the axes are u-v or x-y.


so they sum together to give the data vector that we call r.

$$u\begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + v\begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} \;\rightarrow\; u \cdot \mathbf{a}_1 + v \cdot \mathbf{a}_2 = r, \tag{2.33}$$

$$u\begin{bmatrix} 1 \\ 4 \end{bmatrix} + v\begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 7 \end{bmatrix} \tag{2.34}$$

The column version is much more like linear algebra because it is the linear combination of some vectors. The variables are vector and scalar quantities in equation 2.33. In the row version, we have two equations with the variables u and v, so how should we graph the problem and label the axes? Since the equation is in u and v, those are the unknowns, and we use those for the axes on the left side of Figure 2.1. And the unit vectors are û and v̂. The equations in 2.32 describe lines in the u-v space. Solving the coupled equations finds the point of intersection, which in this case is (u, v) = (2, 1). The solution to the column version is more subtle. Of course, we have to get the same answer, that (u, v) is (2, 1), but the meaning is different. Now u and v are not the coordinates of a point, but rather the amounts of a1 and a2 that we need to arrive at the solution vector r. The key to understanding the difference is answering the question, what are the axes in the column version graph? They are NOT u and v! The axes are whatever we want to assign to the units of vector r and vectors a1 and a2. In this case, we use x and y, with unit vectors î and ĵ.
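For readers who want to reproduce the numbers in Figure 2.1, a short MATLAB sketch of ours:

    % the 2x2 system from equation 2.30, solved and checked both ways
    A = [1 3; 4 -1];  r = [5; 7];
    x = A\r                          % returns [2; 1], so u = 2 and v = 1
    norm(A*x - r)                    % essentially zero: 2*a1 + 1*a2 reaches the data vector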


MATLAB® moment—Multiplying vectors

Multiplying two column vectors without dot products.

>>a=[1;2;3;4], >>b=[4;3;2;1] gives a = [1; 2; 3; 4] and b = [4; 3; 2; 1].

>>c=a′∗b gives c = aᵀb = 20.

2.3.2 Coordinate Systems and Unit Vectors

In the problem of row view and column view, we emphasized that the solutions are the same, but the way we get there is different. In the row approach, we set up equations and unknowns, and the graph we use is in the coordinate system of the unknowns. In the example in equation 2.30 and Figure 2.1, we were solving for u and v, so we set up the graph in terms of lines in u and v. Picking the unit vectors for the coordinate system, we just used û and v̂. As noted, putting a "hat" on something makes it a little more special and specific. For the coordinate system u and v, the û and v̂ are unit vectors in the directions of u and v. Also, unless otherwise stated, u and v are perpendicular and form a right-hand system (counterclockwise rotations are considered positive). We commonly call a coordinate system like this, set up with orthogonal unit vectors, the standard coordinate system, or the standard basis. We will explain the idea of a basis later in this text. For the column approach, we start with two column vectors. What coordinate system are they in? They are not in u and v because (u, v) is not a point in the coordinate system! The point we are trying to reach is represented by r, the data vector, which is also not in u-v. Because we are combining the vectors a1 and a2 to reach r, we could use these for the coordinate axes. The vectors of A would form an unusual reference coordinate system because they are not perpendicular. The solution is to graph the vectors a1 and a2 in another standard coordinate system. Each vector has two components, so we will make an orthogonal coordinate system in 2 dimensions. In the right part of Figure 2.1, we have called the axes of this coordinate system x and y with unit vectors î and ĵ. Now we plot the vectors a1 and a2 using the first component of each as the x coordinate and the second component as the y coordinate. This produces the result that u = 2 and v = 1 and 2a1 + 1a2 = r. Some problems require two different graphs, and some can be expressed in the same coordinate system. In the geochemical example in Chapter 1, we did not encounter this issue because the row and column views were constructed using different expressions of the same element-based system. Other authors commonly show both


systems with no axes or generic x and y axes. They do not explain that the two views can require different coordinate systems. We avoid that practice. Understanding this difference will help the reader in later chapters to understand more deeply topics like vector spaces and the least-squares fitting of data. We will use terms like row and column space, so there is much to look forward to.

2.3.3 The Diagonal Identity Matrix

We need to introduce two more matrix ideas before going further. Fortunately, we can use a single matrix for this, the Identity Matrix. The identity matrix is always square. It goes by the notation I or In to show the matrix is of size n × n. We show a couple of examples of the identity matrix.

\[
I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2,
\quad\text{and}\quad
I = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix} = I_n. \tag{2.35}
\]

First of all, these matrices are Diagonal matrices, a matrix whose only nonzero entries are along the main diagonal from the upper left to the lower right. Second, these matrices are square, so they can all be called n × n. Third, an identity matrix is a special diagonal matrix with ones along the main diagonal and zeros everywhere else. The last key property of an identity matrix is that multiplying it or multiplying by it does not change anything; it is the matrix way of multiplying by 1. The identity matrix is always commutative for matrix multiplication.

\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
=
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
=
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \tag{2.36}
\]

It may seem strange to emphasize a matrix that does nothing, but it is a handy tool in almost all of linear algebra. We will see it again in this chapter and often after that.

2.3.4 Multiplying by Diagonal Matrices

The diagonal identity matrix makes no difference whether we put it on the right or the left in multiplication; we get the same result. We showed that this was not generally true in Section 2.1.9 where it was clear that AB ̸= BA. Is there something different about diagonal matrices, or is it just the identity matrix? The answer is both. To explore this, we will do operations like AD and DA. The identity matrix is special because, always, AI = IA. But, diagonal matrices are different because they accomplish different operations on the matrix depending on whether they are on the right or the left. We work on an example to show the important but consistent difference. In equation 2.37, we show a matrix operation with a diagonal matrix D on the left, and in equation 2.38, we put D on the right.


\[
DA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{bmatrix}, \tag{2.37}
\]

\[
AD = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \\ 1 & 2 & 3 \end{bmatrix}. \tag{2.38}
\]

The previous results show that a diagonal matrix on the left multiplies the rows, but putting it on the right multiplies the columns. This is a good trick to remember. When doing Gaussian Elimination, we aim to add and subtract multiples of one row from another. This means we will always put an elimination matrix on the left to act on the rows of the matrix it multiplies. In the next chapter on elimination, when computing what we will call LDU decomposition, and later in the book, when going over Eigenvectors, Eigenvalues, and Singular Value Decomposition, one of our primary goals is to come up with special diagonal matrices. We call this diagonalization and say that a matrix is diagonalizable. And as you may have guessed, we reserve D for diagonal matrices.
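For readers who want to see the row-versus-column scaling directly, here is a minimal MATLAB® sketch of our own (not one of the text's MATLAB® moments) using the matrices from equations 2.37 and 2.38:

D = diag([1 2 3]);   % the diagonal matrix D
A = ones(3);         % the 3 x 3 matrix of ones
DA = D*A             % left multiplication scales the rows: [1 1 1; 2 2 2; 3 3 3]
AD = A*D             % right multiplication scales the columns: [1 2 3; 1 2 3; 1 2 3]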

2.4 LINEAR TRANSFORMATIONS

So far, we have viewed multiplying a matrix times a vector as a way of re-forming coupled equations to find a set of unknowns. But we can also view multiplication as stretching and rotating vectors. We call this the Linear Transformation view. This gives us two perspectives that we must explore further. The first is what we can learn and visualize about the operations of a matrix in transforming vectors or other matrices and not just solving equations. The second perspective will concern how the coordinate system of the target vector relates to that of the resultant vector. We got a hint of this perspective in the last chapter when we looked at the chemistry and minerals problem. It became a more evident issue when we worked on graphing the row and column views of matrix multiplication in Section 2.3.1.

2.4.1 The Linear Transformation View

Let’s explore the transformation view by starting with a simple case of a 2 × 2 matrix and a 2 × 1 vector to get another 2 × 1 vector. We have seen this numerous times in Ax = b. In previous cases, we have seen the vector x as the solutions to equations or the factors for scaling and multiplying the columns of A to get b in the row and column views, respectively. We could also look at the matrix A as transforming the vector x into a new vector r. Now we have one more interpretation for Ax = b, the most basic equation in linear algebra, the Linear Transformation View. For the transformation view, there are no unknowns. Instead, we have a vector x composed not of unknowns but of the inputs we want to transform. The transformation is done by the matrix A producing the output vector b. Let’s go back to


Figure 2.2 Transformation view of matrix multiplication. The coordinate system of u-v and point (2, 1) on the left are transformed by multiplication by the matrix A in 2.39. The matrix causes a stretching of the axes as well as a rotation. The transformation of the unit vectors û and v̂ is shown between the two graphs. Subscripts after the vectors indicate the coordinate system.

the problem we used to contrast the row view and column view in a 2D graph at the start of this section. We had an equation to solve for u and v. Our result was that u = 2 and v = 1. We graphed this in the row view for the u-v axes and the x-y axes for the column view in Figure 2.1. We could also just view this equation in the transformation view. For this, we rewrite the equation by putting in the values for u and v to see where the vector ends up in the x-y coordinate system.

\[
\begin{bmatrix} 1 & 3 \\ 4 & -1 \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
=
\begin{bmatrix} x \\ y \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 3 \\ 4 & -1 \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \end{bmatrix}
=
\begin{bmatrix} 5 \\ 7 \end{bmatrix} \tag{2.39}
\]

In this form, the matrix takes the vector (2, 1) and transforms it into the vector (5, 7). In Figure 2.2, we show how this takes the point (2, 1)uv and transforms it to (5, 7)xy , where the subscripts refer to the coordinate system. This emphasizes the amazing fact that in the equation Ax = b, the vectors x and b can be viewed as residing in different coordinate systems. We explored this notion by contrasting the row and column views in Section 2.3.2. We will let this idea rest for now, but it will come back in full feature in a later chapter where we will use terms like row space and column space. Thus, the row and column views will expand in significance later in this text. We have seen what happens to a single point, but Figure 2.2 also shows what happens to the coordinate axes as they are transformed by multiplication by A. They remain orthogonal but are stretched and rotated. Lines parallel before transformation are parallel after transformation, and lines and points that are evenly spaced before transformation remain so after transformation. This result is exactly what it means for a Transformation to be Linear. The main point here is that the matrix A


tells us what happens to the basis vectors û and v̂ during the transformation. The û and v̂ vectors are the standard basis in the starting coordinate system, the ones we would pick automatically. Multiplication in the transformation view can accomplish something even more expansive if we start with an n × n matrix multiplying an n × 1000 matrix or a matrix of 1000, n × 1 vectors. Our result will be an n × 1000 matrix. In geology, we usually think that n applies to the Earth, so it is commonly equal to 2 or 3 in most physical cases. We can write out a simple case as AU = R as follows:

\[
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1,1000} \\ u_{21} & u_{22} & \cdots & u_{2,1000} \end{bmatrix}
=
\begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1,1000} \\ r_{21} & r_{22} & \cdots & r_{2,1000} \end{bmatrix}. \tag{2.40}
\]

So, what is the resultant matrix? Since we look at the matrix U as composed of 1000 columns of 2 × 1 vectors, the resultant is also 1000 columns of 2 × 1 vectors. The other thing to note is that the operation takes the matrix U and transforms it into the coordinate system of A. Once again, remember to consider the units; you can avoid many problems and mistakes. Suppose the previous problem was aimed at taking data originally recorded in decimal degrees and translating it to a projected coordinate system like UTM. Then, we would see the units as follows:

\[
A~[\text{meters/degrees}] \times U~[\text{degrees}] = R~[\text{meters}]. \tag{2.41}
\]
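To make the bookkeeping concrete, here is a minimal MATLAB® sketch, of our own, showing that one multiplication transforms every column of a matrix of input vectors at once. The 2 × 2 matrix simply reuses the numbers from equation 2.39 as a stand-in, and the random input coordinates are placeholders, not data from the text.

A = [1 3; 4 -1];     % any 2 x 2 transformation matrix (placeholder values)
U = rand(2, 1000);   % 1000 input points stored as 2 x 1 column vectors
R = A*U;             % every column of U is transformed in one multiplication
size(R)              % returns [2 1000], the same shape as U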

Of course, we would have to add the proper northing and easting to the data with UTM. This can be done again with another matrix, say another easy-to-build 2 × 1000 matrix, with each row having a constant value to apply the proper shift corresponding to the projection's datum. This change from one basis to another will be taken up in a later chapter.

2.4.2 Linear Combinations and the Span of a Matrix

In the column view for Ax = b, we have x as a solution vector whose components tell us how to combine the columns of A to get to the point b. Another way of stating this is that b is formed from a Linear Combination of the columns of A using the factors in x as the multipliers. In this case, and all of linear algebra, a linear combination of the columns means we multiply the columns by scalars and add them together to get a data vector. We can also check whether any of the columns are a combination of some or all of the other columns. If not, the columns are all Linearly Independent and can fill a space with the same dimensions as the number of columns. If some are dependent or simple combinations of the others, then we have to lower the number of dimensions that the columns of A can fill and call the columns Linearly Dependent. We call the space filled by the columns the Span of the matrix. Span is a fundamental concept in linear algebra, which we will use often in later chapters, commonly combined with the words basis, rank, and dimension. So, for the example in the left part of Figure 2.3, we see that we can take some combination of the column vectors a1 and a2 and arrive at any point (x, y). We say that the two vectors a1 and a2 span all of x-y, or, in this case, all of the two dimensions of the graph.

Figure 2.3 Span of some vectors. The left side shows the situation in Figure 2.1, but with more resultant vectors shown in blue that are different combinations of the starting components a1 and a2. Combinations of a1 and a2 can reach any point (x, y). The vector a3 equals −2(a1), so a3 and a1 are colinear and not independent. In the right-hand part, we show three vectors, b1, b2, and b3, in the x-y-z system. None of these vectors are colinear, nor are they orthogonal. We can find a linear combination of b1, b2, and b3 that will make any point (x, y, z) or resultant vector r.

The vectors themselves have two components. We are dealing with Real Numbers, which we symbolize with the letter R. For the example here, we would say they are in the vector space R2. This means that the two vectors are in R2 and span all of R2. In a later chapter, we will discuss vector spaces in great detail, but at this point, we put vectors with n components in vector space Rn. The vectors a2 and a3 span the x-y space, but vectors a1 and a3 do not. Why? In this case, a3 is a linear combination of a1 because a3 = −2(a1). The vectors a1 and a3 are not Linearly Independent. We would say that a1 and a3 do not span all of R2, but only span the line that is parallel to a1 and runs through the origin. In other words, a1 ∥ a3. On the right side of Figure 2.3, we have an example for three dimensions or R3. Here we have the three vectors b1, b2, and b3 (presumably from a matrix B) that are not colinear but not necessarily orthogonal. Here, we can see that these vectors span all of R3. We could position the vector r anywhere in the x-y-z space and we could still find a combination of b1, b2, and b3 that could reach r. In other words, we can find scalars c1, c2, and c3 so that c1b1 + c2b2 + c3b3 = r for all vectors r in the x-y-z space. We can even do this if r is the zero vector by setting c1 = c2 = c3 = 0.
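A quick numerical way to test independence is to look at the rank of a matrix whose columns are the vectors in question. The sketch below is ours (not one of the text's MATLAB® moments); it uses [1; 4] and [3; −1], the columns of the matrix in equation 2.39 that Figure 2.3 builds on, together with a third vector equal to −2 times the first.

a1 = [1; 4];  a2 = [3; -1];  a3 = -2*a1;
rank([a1 a2])   % returns 2, so a1 and a2 span all of the plane
rank([a1 a3])   % returns 1, so a1 and a3 only span a line through the origin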


2.4.3 Do We have it all Straight?

There are three ways of thinking about the equation Ax = b. The first is that it is a system of simultaneous equations that we have to solve for x with knowns A and b. This is the row view. The second is that we combine column vectors from A to get b by solving for the entries in x. This is the column view. The last is transforming the vector x into the vector b using the matrix A. This is the transformation view. In this case, we are using both A and x as knowns to produce an output b. In geologic examples, we use all three approaches to help the reader figure out which view(s) is the appropriate or informative one for the problem at hand.

2.5 GEOLOGIC TRANSFORMATIONS

This section presents several examples of geologic problems well suited to matrix multiplication. We will emphasize many examples from structural geology, but the processes we show are entirely general. The examples will show mostly the transformation view. The following chapters will present many examples for thinking about the equation Ax = b as solving for x.

2.5.1 Strain Matrix and Transformation

Strain is the relation of the deformed shape of an object or marker to its original shape. This process is a transformation of points to different locations. We will take a square with unit-length sides as our undeformed shape and start with two examples that form the end member types of finite strain. The first is stretching and squishing alone. We call this type of deformation Pure Shear Strain. The second leaves the base fixed but transforms the rest of the square to the right—or left—as we go upward. This is called Simple Shear Strain. We give the matrix that describes the initial shape and the pure and simple shear matrices. We specify the initial shape using the coordinates of the four corner points in a 2 × 4 matrix. We could also consider this as four column-vectors that go to each corner point, as shown in Figure 2.4. Either way, it gives the same result. The strain is a transformation matrix acting on vectors with two components, so it must be 2 × 2.

\[
\underbrace{\begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}}_{\text{Starting matrix}}
\qquad
\underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}}_{\text{Pure shear (100\%)}}
\qquad
\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{\text{Simple shear (45°)}} \tag{2.42}
\]

The pure shear transformation takes every point in the square and multiplies the horizontal coordinate by 2 and the vertical coordinate by 0.5. We can read the final position of the upper right corner as taking the point (1, 1) and moving it to (2, 0.5). The pure shear matrix is diagonal because there are only nonzero terms along the upper left to lower right diagonal. The simple shear transformation shifts points in the horizontal direction by an amount equal to the distance in the vertical. We can read the final position of the upper right corner as taking the point (1, 1) and moving


Figure 2.4 Example of deformed unit box and brachiopod. The pure shear transformation is shown along the bottom, and the simple shear transformation across the top. Understanding and modeling such strains is a basic operation in structural geology. The original brachiopod image is courtesy of Mark A. Wilson, College of Wooster.

it to (2, 1). The two strain matrices accomplish very different deformations, as shown in Figure 2.4. We show another set of examples of the pure and simple shear strains in Figure 2.5. What if we overprint the two types of shear? In other words, superpose simple on pure shear and pure shear on simple shear. We might expect the final shapes to be similar after having experienced both deformations. We show the results in the right-hand part of Figure 2.5. Again, the final shapes are quite different.

Figure 2.5 Pure and simple shear deformation. The upper left is an undeformed square in a cartesian coordinate system. The top line of shapes is progressive pure, then simple shear; the lower line is simple followed by pure shear. The deformation can be viewed as matrix multiplication, so the order of deformation matters. All strains preserve the area of the square and are called plane strain deformations.


If we call the pure shear matrix A and the simple shear matrix B, then we can compute the resultant strain matrices as AB and BA. These are two different transformations or paths for deformation. The transformations multiply, and we get two different finite strain matrices, as we see in the following equation:

\[
AB = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 2 \\ 0 & 0.5 \end{bmatrix}, \tag{2.43}
\]

\[
BA = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}
= \begin{bmatrix} 2 & 0.5 \\ 0 & 0.5 \end{bmatrix}. \tag{2.44}
\]

This is a nice example showing that AB ̸= BA and affirming, in this case, that the multiplication of matrices is not commutative. This also fits with the fact that strain is a path-dependent process.
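A minimal MATLAB® sketch of our own (not one of the text's MATLAB® moments) applies the strain matrices of equation 2.42 to the unit square and shows the same order dependence numerically:

P = [2 0; 0 0.5];             % pure shear matrix
S = [1 1; 0 1];               % simple shear matrix
box = [0 0 1 1; 0 1 1 0];     % corners of the unit square as column vectors
pure_then_simple = S*(P*box)  % same as (S*P)*box; corner (1,1) ends at (2.5, 0.5)
simple_then_pure = P*(S*box)  % same as (P*S)*box; corner (1,1) ends at (4, 0.5)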

MATLAB® moment—Identity and other matrices

Using an Identity Matrix is common. It is so common that there is a function to do it.

>>A=eye(3) gives A = [1 0 0; 0 1 0; 0 0 1], >>A=ones(2) gives A = [1 1; 1 1],

and >>A=zeros(3) gives A = [0 0 0; 0 0 0; 0 0 0].

>>A=ones(3, 2) gives A = [1 1; 1 1; 1 1].

The first index is the number of rows; the second is the number of columns. This also works for zeros() and eye(). Note that eye(2, 3) gives a rectangular matrix.

2.5.2 Transformation by 2D Rotation

Another transformation matrix rotates data around the origin in a 2D graph with no change in the lengths of vectors. We show this in Figure 2.6 with a rotation for unit vectors û and v̂ by angle θ. We will put these vectors into the columns of a matrix we call A and follow them as they rotate. In the figure and equation 2.46, we see that the (1, 0) position goes to (cos θ, sin θ) and that (0, 1) goes to (− sin θ, cos θ). We will put the new positions in the columns of a matrix we call B. We get the new positions by taking BA.


Figure 2.6 The left side shows a counterclockwise rotation around the origin by an angle θ. The starting and ending positions of coordinate vectors are shown with red and blue arrows, respectively, for their initial and final positions. Positive rotations are counterclockwise. The right side shows the rotation in three dimensions, with a vector parallel to the u axis undergoing a positive rotation around the v axis. The resulting rotated vector is shown in blue.

\[
\hat{u} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \rightarrow \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix},
\quad
\hat{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \rightarrow \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix},
\quad
A = \begin{bmatrix} \hat{u} & \hat{v} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \tag{2.45}
\]

\[
B = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\;\rightarrow\;
BA = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{2.46}
\]

We can see that the vectors û and v̂ together form an Identity matrix. These unit vectors are a basis for the u-v system. They span all of u-v and are orthogonal. As a general rule, if we know what happens to these basis unit vectors, we know how to transform all of the vectors in the system. The matrix B is constructed to tell us where the vectors û and v̂ go during the transformation. Because any vector in the u-v system can be written as t = c1û + c2v̂ we get:

\[
Bt = B(c_1\hat{u} + c_2\hat{v}) = c_1 B\hat{u} + c_2 B\hat{v}. \tag{2.47}
\]
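As a concrete check of equation 2.47, here is a small MATLAB® sketch of our own (not one of the text's MATLAB® moments) that rotates a vector two ways: all at once with B, and by rotating the basis vectors first and then recombining them. The angle and the components c1 and c2 are arbitrary placeholders.

theta = 30;                                                % rotation angle in degrees (placeholder)
B = [cosd(theta) -sind(theta); sind(theta) cosd(theta)];   % 2D rotation matrix
c = [2; 1];                                                % components c1 and c2 of a vector t
t = c(1)*[1; 0] + c(2)*[0; 1];                             % t written in the standard basis
Bt_direct = B*t                                            % rotate t directly
Bt_by_basis = c(1)*B*[1; 0] + c(2)*B*[0; 1]                % same result, basis vectors rotated first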

Also, note that if the upper right entry of B is negative, this corresponds to a counterclockwise rotation. If the upper right is positive and the lower left is negative, it is a clockwise rotation in this 2D, right-handed coordinate system.

2.5.3 Transformation by 3D Rotation

We can also see in Figure 2.6 the setup for rotations in 3D graphs. In this case, we have the coordinate axes u, v, and w with their unit vectors. This is a right-handed system, and the positive rotations are shown. We start here by individually considering rotations about each axis by an angle θ similar to what we did in 2D.


With that, our setup looks like this:

\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta_u & -\sin\theta_u \\ 0 & \sin\theta_u & \cos\theta_u \end{bmatrix}}_{\text{Rotation }u\text{-axis}},
\quad
\underbrace{\begin{bmatrix} \cos\theta_v & 0 & \sin\theta_v \\ 0 & 1 & 0 \\ -\sin\theta_v & 0 & \cos\theta_v \end{bmatrix}}_{\text{Rotation }v\text{-axis}},
\quad
\underbrace{\begin{bmatrix} \cos\theta_w & -\sin\theta_w & 0 \\ \sin\theta_w & \cos\theta_w & 0 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{Rotation }w\text{-axis}}. \tag{2.48}
\]

We can easily see that these matrices accomplish what we want as far as rotating around the coordinate axes. We get the final vector we seek if we take any coordinate vector, say (1, 0, 0), and apply the appropriate rotation. For example, multiplying by rotation around the v-axis would give us (cos θv, 0, −sin θv), or a vector rotated to point toward us and down in Figure 2.6. We previously noted that if we are given direction cosines for a vector, l² + m² + n² = 1. This means that if we know the rotation relative to two axes, the magnitude of rotation with the third axis is defined. We can establish the sense of rotation from the signs of the other two. In the following section, we will discuss this more fully when we return to direction cosines again.
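The three matrices in equation 2.48 are easy to build as MATLAB® anonymous functions. The short sketch below is ours, works in degrees, and uses the function names Ru, Rv, and Rw as our own labels, not names used in the text:

Ru = @(t) [1 0 0; 0 cosd(t) -sind(t); 0 sind(t) cosd(t)];   % rotation about the u-axis
Rv = @(t) [cosd(t) 0 sind(t); 0 1 0; -sind(t) 0 cosd(t)];   % rotation about the v-axis
Rw = @(t) [cosd(t) -sind(t) 0; sind(t) cosd(t) 0; 0 0 1];   % rotation about the w-axis
Rv(30)*[1; 0; 0]   % gives [cosd(30); 0; -sind(30)], as described in the text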

MATLAB® moment—Trigonometric functions

All of the familiar trigonometric functions are available to use. You can work in degrees or radians if you pick the correct function.

>>t=90, r=pi give t = 90 degrees and r = π radians.

>>cos(r) gives cos(π) = −1 using radians.

>>sind(t) gives sin(90) = 1 using degrees.

>>cos(x), sin(x), tan(x) take x in radians. >>cosd(x), sind(x), tand(x) take x in degrees.

Other trigonometric functions like secant, cosecant, and cotangent follow the same pattern. Inverse functions are made by adding "a" as a prefix, like acos = cos⁻¹ gives the result in radians, and acosd gives the answer in degrees.

2.6 WHAT MORE CAN WE DO WITH MATRICES?

We now have taken three different views of matrices. The first two we saw were the column and row views, and we were motivated to take these views in solving coupled linear equations, like the three-point problem or the mineralogy and geochemistry of a rock. Data fit into equations with unknowns we wanted to determine in these cases. The third view, the transformation view, moved a series of points to new coordinates by multiplying matrices and vectors. We used examples of geologic strain and reprojecting a set of coordinates. What more can we do that also fits in with the theme


of this chapter on matrix multiplication? With the transformation view, we get the idea that we can organize information into a matrix and then transform it. We are all used to organizing data this way. It looks like a table; we see tables everywhere, and you have probably made many. A table is a matrix. The cells are the values; the rows and columns are usually different quantities, like samples arranged in rows and characteristics arranged in columns. We can take the row view by reading rows or use column vectors by reading columns. The next step we will take is using data organized into a matrix to do more calculations where we mimic the multiplication operation, like multiplying columns by rows but adding the multiplier's ability to be a function that acts on matrix elements. This lets the multiplying matrix act on the elements of the target matrix in different ways. We will show an example of computing direction cosines of linear and planar features measured by a geologist. Again, many examples here will come from structural geology, but the use of oriented data is ubiquitous across all areas of the Earth Sciences. We revisit the direction cosines and introduce some more advanced concepts. Remember that although trend and plunge and dip azimuth and dip seem like single vectors, they hold even more information because they are measured in an orthogonal coordinate system of compass directions and up or down.

MATLAB® moment—Why is sin(π) ̸= 0?

Because MATLAB® is a computer program using a binary representation for numbers, sometimes you will not get exactly the answer you expect.

>>theta=pi gives theta = π radians.

>>sin(theta) gives 1.2246e-16, but sin(π) = 0!

This is the result of operations done at machine precision. MATLAB® used a binary representation very, very, very close to π and computed a number indistinguishable from 0. Nothing is wrong. Just recognize that 0.00000000000000012246 ≈ 0, although it is not exactly 0.

2.6.1 Computing Direction Cosines from Orientation Measurements

Most geologists use a hand-held mechanical compass or a digital compass mobile application to collect orientation data. Regardless of the method, the geologist gets bearings relative to north and inclinations down into the Earth relative to the horizontal. We now consider how to get from the geographic system to one expressed in a matrix form as direction cosines. Please review the explanation of direction cosines in the previous chapter if necessary. In Figure 2.7, we show an example of a vector used to orient lines from geologic features. This shows how to orient a single line measurement, such as a slickenline or a slip vector. In the field, we describe the orientation by taking the trend and plunge of our feature (referred to as T and P).


Figure 2.7 The direction cosines for a plunging lineation with a horizontal unit circle and the NED coordinate system. The green point is the intersection of the vertical projection of the tip of the lineation with the horizontal surface. The direction cosines are given by coordinates (l, m, n) for the tip of the vector. The magnitudes of l, m, and n are the projections of the red arrow/lineation resolved onto the NED axes. The white area around n is meant to show a hole through the surface of the gray disk, revealing the point and vector below.

The trend is the bearing of the line relative to north projected on the horizontal plane. The plunge is measured as the vertical angle from the trend line on the horizontal surface. Given the measurements T and P, how do we get the direction cosines in the NED coordinate system? This is worked out in many texts, and we only quote the result here:

\[
\begin{aligned}
l &= \cos(\alpha) = \cos(T)\cos(P), & \text{(2.49)} \\
m &= \cos(\beta) = \sin(T)\cos(P), & \text{(2.50)} \\
n &= \cos(\gamma) = \sin(P). & \text{(2.51)}
\end{aligned}
\]

The variables α, β, and γ are the angles between the vector of interest and the N, E, and D axes, respectively. A full explanation and illustration of measuring the angles α, β, and γ are given in the previous chapter. We can take the same approach if we work with the vector for dip azimuth and dip of planar features as shown in Figure 2.8. In this case, dip takes on the role of plunge, and dip azimuth acts like the trend.

\[
\text{dip azimuth} \rightarrow \text{trend}, \quad\text{and}\quad \text{dip} \rightarrow \text{plunge} \tag{2.52}
\]

We orient the line running down the surface just like we did the trend and plunge for a lineation. It is common to use the Pole or Normal Vector to a plane rather than its dip azimuth and dip, especially when calculating an average orientation or more complex features such as fold axial directions (more on this later). We show the relations of the normal vector to the dip azimuth and dip in Figure 2.8. The conversion is easy


Figure 2.8 A plane in blue is shown on a unit radius lower hemisphere. The dip azimuth and dip are shown as the red vector pointing down to the left; the pole is on the right. The dip azimuth-dip and pole are vectors in the NED coordinate system. The E-axis is omitted for clarity. Vertical projections of the pole and dip azimuth-dip vectors are shown as green circles on the upper disk. Note that the strike direction is northeast because, when looking in this direction, the plane dips to the right.

in that the orientation of the pole/normal is:

\[
\text{pole bearing} = \text{dip azimuth} - 180° = \text{strike} - 90°, \tag{2.53}
\]

\[
\text{pole or normal inclination} = 90° - \text{dip}. \tag{2.54}
\]

In all these cases, just a single pair of numbers gives the orientation. One is like the trend in that it provides a bearing relative to north in the horizontal plane, and the other is like the plunge that gives the inclination from the horizontal. Using these measurements, we can compute the direction cosines appropriate for any line or plane we may encounter. This has the advantage that we can see the attitude immediately from the signs of l, m, and n, that is, which way the line is plunging or the plane is dipping, as explained.

2.6.2 Matrix Transformation from Trend and Plunge to Direction Cosines

Given the rules for converting a trend and plunge or dip azimuth and dip to direction cosines, is there a way to represent this in matrix form? The measurements can be arranged to look like a table:

\[
m = \begin{bmatrix} T \\ P \end{bmatrix},
\quad\text{or for several entries}\rightarrow
M = \begin{bmatrix} T_1 & T_2 & \cdots & T_n \\ P_1 & P_2 & \cdots & P_n \end{bmatrix}. \tag{2.55}
\]

What would we want to do with this matrix? We want each row or column vector to form the input for matrix multiplication or a function. We must expand our view of functions to one where the function can take a vector as an input and give a vector as an output. In our case, we will use each matrix column as the input to a function; the function’s output will be another column vector. This function is Multivariate,


the common name, or simply a Vector Input Function. We must use equations 2.49 to 2.51 to turn the trend-and-plunge vector into a direction cosine vector. We will arrange the function to resemble a matrix multiplication for L.

\[
L = \begin{bmatrix} \cos(\;)\cdot\cos(\;) \\ \sin(\;)\cdot\cos(\;) \\ (1)\cdot\sin(\;) \end{bmatrix}. \tag{2.56}
\]

Each row of L takes in the same column vector as an input. We insert the T or first column value into the first ( ) of L and the P or second column value into the second ( ) of each row of L. The last row of L has no place for the variable T; it is already taken up by 1, so it only uses the value for P. Now, we let the function L act on a single trend and plunge vector or the whole trend and plunge matrix.

\[
L\!\left(\begin{bmatrix} T \\ P \end{bmatrix}\right)
= \begin{bmatrix} \cos(T)\cdot\cos(P) \\ \sin(T)\cdot\cos(P) \\ (1)\cdot\sin(P) \end{bmatrix}
= \begin{bmatrix} l \\ m \\ n \end{bmatrix}. \tag{2.57}
\]

The results in equation 2.57 are identical to equations 2.49 to 2.51. The function L takes in the vector of trend and plunge, 2 × 1, and returns a vector of direction cosines 3 × 1. The function L is a vector input and output function. If we used the matrix M of trends and plunges from 2.55, we would create a wider matrix of results. Working on the matrix M that is 2 × p, we get an output matrix of direction cosines we will call C that is 3 × p.

\[
L(M) = L\!\left(\begin{bmatrix} T_1 & T_2 & \cdots & T_p \\ P_1 & P_2 & \cdots & P_p \end{bmatrix}\right)
= \begin{bmatrix} l_1 & l_2 & \cdots & l_p \\ m_1 & m_2 & \cdots & m_p \\ n_1 & n_2 & \cdots & n_p \end{bmatrix} = C \tag{2.58}
\]

We now can use our matrices and vectors to multiply each other, and we can take a vector or matrix of one size as the input to a function and get out a vector or matrix of a different size. In equation 2.58, we see we can simply write L(M) = C.

2.6.3 Plane Orientations to Direction Cosines

We have seen our definitions of trend and plunge and how these translate to direction cosines using the vector input function L. We show in Figure 2.8 a dipping plane with its strike and dip, and noted above that the dip azimuth and dip or the pole to the plane can also be represented by single vectors with a bearing and vertical angle. In practice, dip azimuth and dip or poles to planes can fit into the function we made for trend and plunge. We use the dip azimuth the same way we use the trend, and the dip angle works similarly to plunge as shown in equation 2.52. Thus, our input vector is written in terms of dip azimuth and dip. This gives our input vector and the vector input


function L.

\[
\text{Input vector: } \begin{bmatrix} DD \\ D \end{bmatrix}
\quad
\begin{aligned} \text{dip azimuth} &= DD \\ \text{dip} &= D \end{aligned} \tag{2.59}
\]

\[
L\!\left(\begin{bmatrix} DD \\ D \end{bmatrix}\right)
= \begin{bmatrix} \cos(DD)\cdot\cos(D) \\ \sin(DD)\cdot\cos(D) \\ \sin(D) \end{bmatrix}
= \begin{bmatrix} l \\ m \\ n \end{bmatrix}. \tag{2.60}
\]

The most common way to plot and analyze plane orientation data is to use the pole to the plane as illustrated in Figure 2.8 and discussed in the previous chapter. The normal vector points downward from the plane, and the orientation of the pole is 180° to the dip azimuth and has an inclination that is just the complement to the dip, that is, 90° − dip. The pole, just like the dip azimuth and dip, is a vector, so we can make equations for the pole similar to those for trend and plunge. We showed the relationship above in equations 2.53 and 2.54. We get the equations for the direction cosines to use in a vector input function and can simplify them using trigonometric identities.

\[
\text{Input vector: } \begin{bmatrix} DD \\ D \end{bmatrix}
\quad
\begin{aligned} \text{pole direction} &= DD - 180° \\ \text{pole inclination} &= 90° - D \end{aligned} \tag{2.61}
\]

\[
L_{pole}\!\left(\begin{bmatrix} DD \\ D \end{bmatrix}\right)
= \begin{bmatrix} \cos(DD - 180°)\cdot\cos(90° - D) \\ \sin(DD - 180°)\cdot\cos(90° - D) \\ \sin(90° - D) \end{bmatrix}, \tag{2.62}
\]

\[
L_{pole}\!\left(\begin{bmatrix} DD \\ D \end{bmatrix}\right)
= \begin{bmatrix} -\cos(DD)\cdot\sin(D) \\ -\sin(DD)\cdot\sin(D) \\ \cos(D) \end{bmatrix}
= \begin{bmatrix} l \\ m \\ n \end{bmatrix}_{pole}. \tag{2.63}
\]

We know how to get from strike and dip to dip azimuth and dip; for the pole, we simply subtract 90° from the strike to get its direction, as in equation 2.53. From this, we can go to the pole as outlined above.

\[
\text{Input vector: } \begin{bmatrix} \text{strike} \\ D \end{bmatrix}
\quad
\begin{aligned} \text{pole direction} &= \text{strike} - 90° \\ \text{pole inclination} &= 90° - D \end{aligned} \tag{2.64}
\]

\[
L_{pole}\!\left(\begin{bmatrix} \text{strike} \\ D \end{bmatrix}\right)
= \begin{bmatrix} \sin(\text{strike})\cdot\sin(D) \\ -\cos(\text{strike})\cdot\sin(D) \\ \cos(D) \end{bmatrix}
= \begin{bmatrix} l \\ m \\ n \end{bmatrix}_{pole}. \tag{2.65}
\]
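For readers who want to try equation 2.65 directly, here is a small MATLAB® sketch of our own; the function name dircosPoleSD is our label, not one used in the text. It takes a strike and dip in degrees and returns the direction cosines of the pole:

function dcvect = dircosPoleSD(strike, dip)
% DIRCOSPOLESD  Direction cosines [l; m; n] of the pole to a plane,
% computed from strike and dip in degrees (equation 2.65).
dcvect = [ sind(strike)*sind(dip);    % l, relative to North
          -cosd(strike)*sind(dip);    % m, relative to East
           cosd(dip)];                % n, relative to Down
end

For example, dircosPoleSD(0, 30) returns [0; −0.5; 0.866], the pole to a plane striking due north and dipping 30° to the east.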

2.6.4 Equation of a Plane and Its Normal or Pole

We want to emphasize here why it is natural to orient a plane by specifying the orientation of its normal. Let us take a straightforward example of the equation of a plane using x + y + z = 0. This is an equation for a plane going through the origin and having the configuration shown in Figure 2.9.


Figure 2.9 Plane and pole to the plane. The plane is shown in yellow and is defined by the equation x + y + z = 0. The normal direction or pole is perpendicular to the plane and goes from the origin through (1, 1, 1). These components are shown in the blue dashed line. We can take the length of this vector, which is √3, and divide each of the components by this to get the direction cosine vector (l, m, n) = (1/√3, 1/√3, 1/√3). We can multiply the unit vector by any constant c and retain the perpendicular angle with the plane.

What is a normal vector to the plane? The simplest way to find a normal is to identify the coefficients that multiply x, y, and z, in this case, 1, 1, and 1. This point defines a vector normal to the plane we write in the form u = [1 1 1]. We can write this in equation 2.66.

\[
\text{Plane } P \rightarrow ax + by + cz = 0 \;\rightarrow\; \text{Pole or normal vector } u = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \tag{2.66}
\]

So, we see how to write a vector that is normal to a plane by using the equation of the plane. The coefficients of the equation of a plane are the same as the coordinates of a normal vector to the plane. What would we write for the equation of a line in 3D? This becomes tricky because the easiest way to do this is through a parametric equation. This means that we express the conditions on x, y, and z using another variable, let's call it t in this example, and have the position x, y, z expressed as functions of t. We would write that x = x(t), y = y(t), and z = z(t). For the vector in Figure 2.9, our equations are just x(t) = y(t) = z(t) = t. In fact, for any line from the origin through a point (a, b, c), the equations are done by a simple linear transformation as x = at, y = bt, z = ct. This means that for t = 1, the line goes through the point (a, b, c). This is a vector, and if we divide the components by the length of the vector—√(a² + b² + c²)—we would have a set of direction cosines. In orientation analysis, we express the orientation as the value of the direction cosines l, m, and n. For the case in Figure 2.9, we know l = m = n. If we take these values proportional to the point (1, 1, 1), then we get the length of the vector being √3, so l = m = n = 1/√3.
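A two-line MATLAB® check of this normalization (our sketch, not the text's) for the plane x + y + z = 0:

u = [1; 1; 1];     % coefficients of x + y + z = 0, a normal to the plane
dc = u/norm(u)     % direction cosines, each equal to 1/sqrt(3), about 0.5774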


Figure 2.10 Diagram for rotation around each axis in the NED system. The views are along the E, N, and D axes. The lower right part of the figure shows the setup of axes and lineation shown in red. The top left shows the setting plunge relative to the north axis. In this case, the rotation is clockwise, looking down the E axis. This means that the angle must have a negative sign and will be shown in equations as −P. Lower left is rotation around D, looking upward. In this orientation, the rotation is counterclockwise for the trend. The upper right shows rotation around the north axis for completeness.

2.7 MORE ABOUT 3D ROTATIONS

We discussed 3D rotations in Section 2.5.3. We revisit these ideas here using the trend and plunge of lineations. We can take these values and compute the direction cosines. This might seem like extra work when we already know the trend and plunge, but the effort is rewarded because the direction cosine values l, m, and n form a unit-length vector in NED. When we have the trend and plunge, we use equation 2.57 to get the projections onto the north, east, and down axes, as shown in Figure 2.8. We can make similar calculations for planar features using equations 2.62 for dip azimuth and dip and 2.65 for strike and dip. How can we connect these values to 3D rotations? A way to view the trend and plunge is that it results from the rotation of a unit vector initially parallel to one of the axes. Since trend is measured from north, we would start with the point (1, 0, 0) in NED. This is then rotated by the plunge angle around the E axis. This is shown in Figure 2.10. For this, we have:

\[
\begin{bmatrix} \cos(P) & 0 & \sin(P) \\ 0 & 1 & 0 \\ -\sin(P) & 0 & \cos(P) \end{bmatrix}
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} \cos(P) \\ 0 \\ -\sin(P) \end{bmatrix}. \tag{2.67}
\]

A confusing aspect of this multiplication is that the entry for the down component is -sin(P ), which looks like a negative number. But the line is plunging downward, so the value of n must be positive. This is because rotating downward from due north


Figure 2.11 The left part shows the NED coordinate system and the signs of positive rotation around all axes. Also shown are the signs of the direction cosines l, m, and n for each plunge and dip azimuth quadrant. Remember, plunge is always measured in its down direction relative to the map view. This means that n is always positive because we follow vectors into the Earth. The right-hand side is another representation of the coordinate system and rotations. The angles we measure are shown relative to the starting position for each rotation angle. Also given are the unit vectors for representing vectors in NED.

is a negative rotation angle if this is done relative to the positive E axis. The plunge is measured downward from the horizontal, whereas the rotation angle is measured upward from the D direction. This is shown in the upper left of Figure 2.10. The rotation from north downward is a negative angle relative to the east axis. The sine of a negative acute angle is negative, so the last entry in 2.67 will be positive. We show all rotations about the NED directions in excruciating detail in Figures 2.10 and 2.11. Again, another advantage of computing the direction cosines l, m, and n is that looking at their signs reveals which quadrant the line plunges or plane dips, also shown in Figure 2.11. We rotate the plunging vector by the amount of the trend. This is done by using the rotation matrix for the D axis. In this case, the angle is T, and because it is rotating in a positive manner, as we see in Figure 2.10, we do not have to worry about signs. The result of our multiplication is

\[
\begin{bmatrix} \cos(T) & -\sin(T) & 0 \\ \sin(T) & \cos(T) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos(P) \\ 0 \\ \sin(P) \end{bmatrix}
= \begin{bmatrix} \cos(T)\cos(P) \\ \sin(T)\cos(P) \\ \sin(P) \end{bmatrix}
= \begin{bmatrix} l \\ m \\ n \end{bmatrix}. \tag{2.68}
\]

The result is exactly the same as our formulas for direction cosines in equations 2.49 to 2.51. It is very similar to the one we got above in equation 2.67, except for the sign of the last term. Because of the way we set up the problem by starting with a negative rotation for the plunge, the negative sign for the last term will always cancel out when computing the direction cosines. Since data are always collected looking downward, there is no need to worry about the sense of rotation for the plunge or dip; just taking the sine gives the correct value for n. The geologist's trend observation ensures the vector points in the right direction.
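A short MATLAB® sketch of our own (not part of the text's MATLAB® moments) confirms that the rotation in equation 2.68 reproduces the direction cosine formulas for an arbitrary trend and plunge:

T = 40;  P = 25;                                          % example trend and plunge in degrees
RD = [cosd(T) -sind(T) 0; sind(T) cosd(T) 0; 0 0 1];      % rotation about the D axis by the trend
dc_rotated = RD*[cosd(P); 0; sind(P)]                     % equation 2.68
dc_formula = [cosd(T)*cosd(P); sind(T)*cosd(P); sind(P)]  % equations 2.49 to 2.51; identical result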


MATLAB® moment—Function: Direction Cosines

Here is an example of making a function. The one here we call dircosTP, and it takes the trend and plunge as inputs and returns a vector of direction cosines.

function [ dcvect ] = dircosTP(trend, plunge)
% DIRCOSTP
% This is a function that takes in a single Trend and Plunge as input and returns
% a vector that is the three direction cosines for that Trend and Plunge.
% Variable names follow common usage for input.
% Input variables are assumed to be in degrees, not radians.
% Other functions will take input in degrees.
% Output is a single vector and is for a North-East-Down or NED system.
%
% trend - angle measured clockwise from north. Trend on a horizontal surface.
% plunge - vertical angle with horizontal. This value can be positive
%   or negative, although the usual convention is to be positive.
%   If negative, the dcvect is pointed upward.
% dcvect - vector of direction cosines. The convention is that these
%   will be in the order [l, m, n]. This means they use angles relative
%   to the North for l, East for m, and Down for n.
%
dcvect = ones(3,1);         % Initialize output vector as column vector of ones
trend = deg2rad(trend);     % Convert trend to radians
plunge = deg2rad(plunge);   % Convert plunge to radians

dcvect(1) = cos(trend) * cos(plunge);   % Compute l
dcvect(2) = sin(trend) * cos(plunge);   % Compute m
dcvect(3) = sin(plunge);                % Compute n
end                                     % You always need an end statement


2.8 SUMMARY

This chapter covered matrix and vector multiplication in all its splendor. The fundamental equation of this part of linear algebra is Ax = b, a matrix times a vector resulting in a vector. When we multiply a vector times a vector, we get a scalar, but a matrix times a vector is another vector. This latter multiplication can happen when the matrix has m rows and n columns, and the vector has n rows, producing another vector with m components. We can also multiply a matrix times a matrix as long as the matrix on the left has the same number of columns as the matrix on the right has rows. Multiplication of a vector times a vector follows all the good rules of a product. It is commutative, associative, and distributive. And multiplying by a scalar is fine as well. Matrix multiplication, on the other hand, is decidedly not commutative but is associative and distributive.

The next big subject concerned the main diagonal of a matrix. The operation of transposing a matrix is flipping rows and columns around the main diagonal. For square matrices, this is pretty simple and makes sense. For rectangular matrices, this exchanges rows for columns. The easiest way to show this is that a component aij in a matrix A goes to the component aji in Aᵀ. We also call a matrix diagonal if the only nonzero entries are along the main diagonal. The most used diagonal matrix is the identity matrix we symbolize with I. The identity matrix has ones along the main diagonal. Multiplying by I for a matrix or a vector is like multiplying by 1 for a scalar. And unlike most matrices, the identity matrix always commutes in multiplication.

Multiplying matrices can be used to solve sets of equations. We commonly take this as the row view of the system. Alternatively, we can think of multiplication in terms of columns when trying to find amounts of different chemical components, for example. The last view is that multiplication performs a change in coordinates or a linear transformation of coordinates. This is naturally called the transformation view. In the Earth Sciences, one of the most common transformations is rotation around a cardinal direction, like north or east. This requires a rotation matrix to act on the data to change its orientation or reference system. This view also applies to understanding strain within rocks in that it can describe how deformation changes the shape of any geological marker.

2.9 EXERCISES

1. Consider the following three matrices.

\[
A = \begin{bmatrix} 1 & 3 & 5 & 7 & 9 \\ -2 & -10 & -12 & 5 & 1 \end{bmatrix},
\quad
B = \begin{bmatrix} 2 & 3 \\ -9 & 7 \\ 5 & 7 \\ 1 & 3 \\ -2 & -10 \end{bmatrix},
\quad
C = \begin{bmatrix} 2 & 3 \\ -9 & 7 \\ 1 & 3 \\ 5 & 3 \\ -2 & -10 \end{bmatrix}.
\]

(i) Give the sizes of A, B, and C in terms of rows and columns.
(ii) A times B.
(iii) B times A.




(iv) B times I.
(v) A times C.
(vi) C times A.
(vii) Aᵀ times A.
(viii) Bᵀ times B.
(ix) A times Aᵀ.

2. Let's work with some strain matrices.

(i) Write the 2D simple shear matrix in terms of a shear variable λ.
(ii) Write the 2D pure shear matrix in terms of the deformed length divided by the original length. This representation is called stretch and is usually symbolized as S.
(iii) Is pure shear then simple shear the same as simple shear then pure shear?
(iv) Write a matrix for 3D pure shear with no volume change. Justify that the strain does not change volume.

3. Look over the MATLAB® function to compute direction cosines. Copy this over into MATLAB® on your computer. Then, run it on the following data in trend and plunge and give the results in terms of direction cosines. Note that the first row is the trends, and the second is the plunges.

\[
A = \begin{bmatrix} 45 & 55 & 105 & 265 & 303 \\ 5 & 10 & 20 & 45 & 65 \end{bmatrix} \tag{2.69}
\]

Now, see if you can modify the function to take an array as input, not just a single point. You will not have to change very much.

4.

We will now make several random matrices in MATLAB® using the rand() function we saw in the last chapter. Make at least one each of a square matrix, a wide rectangular matrix, and a tall rectangular matrix. Now multiply each matrix by its transpose and each transpose by the starting matrix. Describe the shapes and other special features of the products A'*A and A*A'.

CHAPTER 3

Solving Ax=b

In this chapter, we take up one of the most important applications in linear algebra, finding the solution to a set of simultaneous linear equations. The problem is presented in the simple algebraic form Ax = b. Each variable plays a critical role in this equation. First, the matrix A contains the coefficients of the equations we aim to solve. The coefficients matrix A multiplies the unknowns in the vector x. We are trying to find the value of each variable in x, so we refer to this as the solution vector or vector of unknowns. The product of Ax gives us the vector b, which we refer to as the data vector. This chapter starts with Gaussian elimination and follows with the first factorizations of the matrix A. We end with the Gauss-Jordan method for elimination and computing the inverse of A.

3.1 ELIMINATION

A common way to solve for x in Ax = b is using a process called Gaussian Elimination. This process successively replaces or eliminates components of the coefficients matrix until we get a single component equal to a single result in the data vector. Dividing the result by the remaining coefficient gives us a value for one of the unknowns. Once we solve for one component of the solution vector, we successively solve for the others by a process we call Back Substitution. We also stated before that the solution vector is just x = A−1b for square coefficient matrices. A commonly used and reliable method for getting A−1 relies on Gaussian elimination. It turns out that the result of elimination, depending on the method we use, not only gives us the values of the unknowns, it can tell us whether a matrix has an inverse and can provide the inverse matrix A−1 and the value of the determinant of the matrix—more on this in later chapters. We will start by solving a familiar problem and examples that meet two criteria. The first criterion is that the matrix is square. This means we have an n × n matrix that multiplies an n × 1 vector of unknowns to reach a data vector that is also n × 1. The second criterion is that the columns of the matrix are linearly independent, meaning that no column is a combination or factor of other columns. If both these conditions are met, then we can find a vector of unknowns that is the solution to Ax = b for any A and b.




Figure 3.1 Diagram of the three-point problem in NED coordinates. Each point is fixed in its location relative to N and E. The third coordinate value is the depth measured in the D direction. Refer to Chapter 1 for a more detailed explanation of how we arrived at the equations for the problem.

3.1.1 The Three-point Problem, Revisited

Let's return to the three-point problem we discussed earlier and solved using regular algebra. We mentioned that it is a bit easier to solve in linear algebra and that the equations are pretty straightforward. This is a good problem in that, for linear algebra, the problem is solvable and well-behaved so long as the points are not colinear or nearly so in map view. Different combinations of elevations present no issues with the three-point problem. This will be our first problem in using elimination. We give a somewhat new setup in Figure 3.1. First, we use the NED coordinate system. Because the positive D is downward, we will refer to the depth of points and not their elevation. Second, we emphasize the north and east locations of the points because these form the coefficients matrix for the problem. The locations set the coefficients matrix A, the unknowns are in the vector x, and the data points are in the vector of depths b. We repeat the equations that we will solve and give Ax = b with the known values substituted for the variables.

\[
\begin{aligned}
10m_N + 20m_E + D_0 &= D_1, & \text{(3.1)} \\
20m_N + 70m_E + D_0 &= D_2, & \text{(3.2)} \\
70m_N + 60m_E + D_0 &= D_3, & \text{(3.3)}
\end{aligned}
\]


\[
\begin{bmatrix} 10 & 20 & 1 \\ 20 & 70 & 1 \\ 70 & 60 & 1 \end{bmatrix}
\begin{bmatrix} m_N \\ m_E \\ D_0 \end{bmatrix}
=
\begin{bmatrix} D_1 \\ D_2 \\ D_3 \end{bmatrix}. \tag{3.4}
\]

We see that A in equation 3.4 is made up of the coefficients of the equations in equations 3.1 to 3.3. The unknowns are arranged in the vector x containing mN, mE, and D0. Finally, the data vector appears on the right-hand side of the equation and comprises the observed depths. For the first case, the depths used are 30, 90, and 130 for D1, D2, and D3, respectively. The process of elimination works on the rows of the matrix A. We aim to work through the matrix row by row and column by column to eliminate coefficients one at a time. The process will produce a matrix in a form we call Upper Triangular, which means that all the coefficients below the main diagonal are set to zero. This process for a suitable matrix, one that is square and has independent columns, results in a single nonzero entry in the last column of the last row. This means we can solve for the last unknown using the remaining coefficient. We then work our way back through the matrix, a row at a time, solving for the other unknowns. This process is called Back Substitution and is the necessary step following elimination to determine the values of the unknowns. Everything done to the matrix A must also be done to b. We introduce an Augmented Matrix form to make the process easier to follow and compute. This is a fancy way of saying that we stick something on the right side of the matrix. In this case, it is just the data vector b but separated by a line (see equation 3.5). We also can indicate the matrix of A augmented with b by [A|b]. We then can start the process of elimination on the augmented matrix.

MATLAB® moment—Ax = b for this Chapter

For this chapter on solving Ax = b by elimination and various factorizations, we will focus the MATLAB® moments on using the coefficients matrix and data vector from the three-point problem. We use the first data vector for the problem presented in this chapter. This gives us the following matrices and vectors.

>>A=[10,20,1;20,70,1;70,60,1] gives A = [10 20 1; 20 70 1; 70 60 1],

>>b=[30;90;130] gives b = [30; 90; 130]. The unknowns vector is x = [mN; mE; D0].

After this one, we will forgo giving the instructions for creating matrices and vectors in the MATLAB® moments.


Figure 3.2 Diagram of the three-point problem. Each point is fixed in its location relative to N and E. The parameters that define the plane are given below the left and center examples, and these set the D value for each point. A plane can fit the 3 points regardless of the D values. The change in the plane is recorded in the values for mN, mE, and D0. The right side emphasizes that we can solve for any plane intersecting the three given locations. The locations set the coefficients matrix, and the depths set the data vector. This is the row view of the three-point problem set up in equations 3.1 to 3.3.

We will work on the two examples in Figure 3.2. We will start using the depths in the left part of the figure and substitute values of 30, 90, and 130 for D1, D2, and D3 in equation 3.5. These values for depth were computed as D = N + E, which will give the result that mN = mE = 1 as shown in the left part of Figure 3.2.

\[
\left[\begin{array}{ccc|c} 10 & 20 & 1 & D_1 \\ 20 & 70 & 1 & D_2 \\ 70 & 60 & 1 & D_3 \end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c} 10 & 20 & 1 & 30 \\ 20 & 70 & 1 & 90 \\ 70 & 60 & 1 & 130 \end{array}\right]. \tag{3.5}
\]

Elimination starts with the upper left entry or the left entry of the first row, in this case, 10. We call this the Pivot or Pivot Position, and show it as a boxed value in equation 3.6. We then use this value to eliminate the first entry in the following rows so there are zeros below the first pivot in the first column. We do this by multiplying the whole first row, including the augmented part, by the amount needed to have the pivot equal to the value below it. For the second row in this example, we multiply by 2. We then subtract the modified first row from the second row. This will leave a zero in the first coefficient of the second row and create new values for the other coefficients and the augmented entry. For the third row, we multiply the first row by


7 and subtract.

\[
\left[\begin{array}{ccc|c} \boxed{10} & 20 & 1 & 30 \\ (20-20) & (70-40) & (1-2) & (90-60) \\ (70-70) & (60-140) & (1-7) & (130-210) \end{array}\right]
=
\left[\begin{array}{ccc|c} \boxed{10} & 20 & 1 & 30 \\ 0 & 30 & -1 & 30 \\ 0 & -80 & -6 & -80 \end{array}\right]. \tag{3.6}
\]

MATLAB® moment—Making an augmented matrix

In working with Ax = b, we want to make the augmented matrix. This is the same as concatenating vectors to make a matrix, except now we put together a matrix and a vector.

>>C=[A b] gives C = [10 20 1 30; 20 70 1 90; 70 60 1 130], A augmented with b or [A|b].

Our result is shown in equation 3.7, with the next pivot highlighted. This is the second entry of the second row. We repeat the following process to eliminate values in the second column below the pivot. To do this, we multiply the second row as it now appears in equation 3.7 by 8/3 (80 ÷ 30) and add it to the third row. The result is shown in equation 3.7, with the last pivot position highlighted.

\[
\left[\begin{array}{ccc|c} 10 & 20 & 1 & 30 \\ 0 & \boxed{30} & -1 & 30 \\ 0 & (-80+80) & (-6-2.7) & (-80+80) \end{array}\right]
=
\left[\begin{array}{ccc|c} 10 & 20 & 1 & 30 \\ 0 & 30 & -1 & 30 \\ 0 & 0 & \boxed{-8.7} & 0 \end{array}\right] \tag{3.7}
\]

We are now done with the elimination steps and need to do back substitution. Note here that the variables associated with the three columns, from left to right, are mN, mE, and D0, respectively. We start with the last line and see that D0 = 0 (0 ÷ −8.7). Further back substitution gives us mN = 1 and mE = 1. We should have expected these values because the D value was set to N + E for each N and E pair. This gives a D intercept of 0 and a slope of 1 in the N and E directions. We have now completely solved the three simultaneous equations in 3.1 to 3.3 as well as the form Ax = b in equation 3.4. When we finish Gaussian elimination, the resulting matrix has a stair-step-like pattern formed by the pivots with all zeros below. We can call this the Row Echelon Form of the matrix. Let's try the other example in the middle of Figure 3.2.

\[
\left[\begin{array}{ccc|c} 10 & 20 & 1 & 12.5 \\ 20 & 70 & 1 & 90 \\ 70 & 60 & 1 & 87.5 \end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c} 10 & 20 & 1 & 12.5 \\ 0 & 30 & -1 & 65 \\ 0 & 0 & -8.7 & 173.3 \end{array}\right]. \tag{3.8}
\]


We keep the coefficients matrix the same, but we will adjust the data vector. The augmented matrix on the left-hand side of equation 3.8 shows that the new resultant is D1 = 12.5, D2 = 90, and D3 = 87.5. The right-hand side gives the results of elimination. Notice that the matrix of coefficients reduces to the same eliminated form and that the difference is in what happens to the data vector that forms the augmented part. We start with the last line and see that D0 = −20 (173.3 ÷ −8.7, within rounding). Further back substitution gives us mN = 0.25 and mE = 1.5.
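A quick MATLAB® check of both back-substitution results, using the backslash solver as a sketch of our own rather than a method the text has introduced yet:

A = [10 20 1; 20 70 1; 70 60 1];
x1 = A\[30; 90; 130]      % returns [1; 1; 0], so mN = mE = 1 and D0 = 0
x2 = A\[12.5; 90; 87.5]   % returns [0.25; 1.5; -20], matching the second example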

20 1  70 1, 60 1

10  then we can subtract >> A(3,1)=A(3,1)-50 gives A =  50 20

20 1 70 1 . 60 1









MATLAB® moment—Change a row or a column in an array
Replacing a row or column in an array with another value is sometimes necessary. This uses the colon character : in the array row or column designation.
>>A(:,1)=50 changes the first column entries for all rows, giving A = [50 20 1; 50 70 1; 50 60 1].
Starting again from the matrix A of the previous moment, >>A(2,:)=50 changes the second row entries for all columns, giving A = [10 20 1; 50 50 50; 20 60 1].

3.1.2 What Does Elimination Tell Us about the Matrix A?

We started the previous section with some conditions on the matrix A. These were that it is square and that the columns are independent. The variables are in the solution vector, each component multiplying one column in the matrix. Thus, the matrix is as wide or has the same number of columns as unknowns. We need as many equations as unknowns to expect a solution, no more and no less. This means that the number of rows equals the number of unknowns, making the matrix square. This requirement tells us that a solution for the unknowns may exist. The independence of the columns as a requirement is more subtle. Suppose in our example for the three-point problem that our A matrix is as given in equation 3.9 with the result of elimination given on the right-hand side.
\[
\begin{bmatrix}10 & 30 & 1\\20 & 10 & 1\\0 & 50 & 1\end{bmatrix}
\rightarrow
\begin{bmatrix}10 & 30 & 1\\0 & -50 & -1\\0 & 0 & 0\end{bmatrix}
\tag{3.9}
\]

The last row has a 0 in the third column, where we usually expect a pivot. What happened? The zero row means that the third row must have been a combination of the first two rows. This means that the rows are not Independent. We can see that if we subtract one of the second row from twice the first row, we get the last row of the matrix on the left. This also means that the columns are not independent—subtracting twice the first column from fifty times the last column gives the middle column. This violates one of the conditions we needed to do elimination successfully on a square matrix. The problem shows up clearly as a 0 value in a pivot position, and a 0 value is not allowed as a pivot. The matrix has only 2 and not 3 pivots. The last row, representing the last equation, cannot be solved for the value of the third unknown, D0, because this row gives either the equation 0 × D0 = 0 or 0 × D0 = n for some nonzero n. In the first case, D0 is not determined because the equation is true for any value of D0, giving an infinite number of solutions, and in the second case there is no solution at all. We refer to the number of pivots as the Rank of the matrix. For an n × n matrix to give a solution in Ax = b, the matrix must have n pivots. Such a matrix is called Full Rank. A full-rank square matrix has another important property. It is Invertible. This means that Ax = b has a solution for x that we can find by either elimination and back substitution or by computing the inverse of A and finding x = A−1b.

MATLAB® moment—Find the rank of a matrix
Once we get into linear algebra functions and operations, MATLAB® becomes really efficient. Finding the rank of a matrix is a one-word command.
>>rank(A) gives 3, the rank of the matrix A we have been using,
>>rank([A,b]) also gives 3, the rank of the augmented matrix.
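To connect this to the example in equation 3.9, a short check of our own (not from the text) confirms the missing pivot and the column dependence noted above:

B = [10 30 1; 20 10 1; 0 50 1];   % the matrix from equation 3.9
rank(B)                            % returns 2, not 3: only two pivots survive elimination
50*B(:,3) - 2*B(:,1)               % reproduces the middle column, the dependence described above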

If the matrix is not square, it cannot be full rank. If it has a pivot in every column (n < m), it is full Column Rank. Likewise, a pivot in every row (n > m) means it is full Row Rank. Or it may have a rank lower than n and m. Please note that for all matrices, the number of independent rows is the same as the number of independent columns, so the column and row ranks are the same. We refer to the rank of the matrix when referring to the number of independent rows and columns, regardless of the shape of the matrix. If we call it full rank, we are talking about a square matrix.

Figure 3.3: Diagram showing the column view for the three-point problem. The three black vectors on the left are the matrix's columns and their components. The red vectors in the middle and right show how much of each component we need to meet the solution. The units for the unknowns vector are a little complicated. The slopes mN and mE are in units of m/m (dimensionless), and D0 is in meters. The components in vectors a1 and a2 are in meters, but a3 is dimensionless. This diagram gives great insight into the column view of the linear algebra problem. Unfortunately, it does not add physical insight into understanding it.

3.1.3 The Column View of the Three-point Problem

We looked at the three-point problem as a series of equations focusing on the rows of the matrix. We can also pose the problem differently and look at it in terms of its columns. When looking at the columns, we are not in the x, y, z or NED real-world coordinate system. We will call it the u, v, w coordinate system to emphasize the difference. The linear algebra equation is set as three vectors, the first component of which is the u value, the second the v value, and the third component the w value. Of course, this is just a way of writing Ax = b, with the matrix A composed of column vectors a1, a2, and a3, with the unknowns multiplying the columns as we see in Figure 3.3. This is shown in equation 3.10. Our solution is to find the multipliers for the columns of A so that their combination or sum reaches the resulting point (30, 90, 130).
\[
m_N\begin{bmatrix}10\\20\\70\end{bmatrix}
+ m_E\begin{bmatrix}20\\70\\60\end{bmatrix}
+ D_0\begin{bmatrix}1\\1\\1\end{bmatrix}
= \begin{bmatrix}30\\90\\130\end{bmatrix}
\tag{3.10}
\]

The solution gives us the factors that multiply vectors a1, a2, and a3, which are just the values for mN, mE, and D0, the same as before with D0 = 0, mN = 1, and mE = 1. A plot of the column vectors, the solution point, and the multipliers is shown in the left and middle parts of Figure 3.3. We can also set up the situation for the second set of D values shown in equation 3.8. In this case, we have the solution vector with the values of D0 = −20, mN = 0.25, and mE = 1.5. This column view solution is plotted in the right-hand part of Figure 3.3. Unlike some examples from previous chapters, the column view of the three-point problem does not lead to a deeper understanding of the physical system. Sometimes, the column view is helpful or the more natural and informative approach, but in this case, it is not. This is not to say we are abandoning the column approach, far from it, because, for many problems, it is the path to deeply understanding the physical system and linear algebra.
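As a quick numerical illustration of the column view (our own sketch, not from the text), the two solutions above can be checked by forming the corresponding combinations of the columns:

a1 = [10; 20; 70];  a2 = [20; 70; 60];  a3 = [1; 1; 1];   % the columns of A
1.00*a1 + 1.00*a2 +  0*a3     % reaches (30, 90, 130), the first data vector
0.25*a1 + 1.50*a2 - 20*a3     % reaches (12.5, 90, 87.5), the second data vector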

3.2 ELIMINATION MATRICES AND ELEMENTARY ROW OPERATIONS

Gaussian elimination acts on the rows of a matrix. The elimination steps use Elementary Row Operations and do not change the unknowns we get. The unknown variables are each associated with a column of the matrix and are unchanged by row operations. The three elementary operations are Multiplying a row by a scalar, Adding one row to another, and Swapping or Exchanging rows. For linear algebra, the elementary row operations are the tools we use in Gaussian elimination and many other operations. And, of course, we do elementary row operations using matrix multiplication. Our approach is to add or subtract multiples of the rows from the rows below to create the pivots and eliminate components until we get to an upper triangular matrix. We can do this using a series of Elimination Matrices that are Lower Triangular, meaning that all the coefficients above the main diagonal are zero. We use one elimination matrix for every entry in the matrix A that we want to set to zero.

3.2.1 Elimination Using Outer Products

Chapter 2 introduced matrix multiplication as inner products (rows × columns) and outer products (columns × rows). Because elimination is done on the rows of the coefficients matrix using Row Reduction, we will use the outer product method to show better the actions of elements of the elimination matrix on the rows of the design matrix. We set up an elimination matrix E that looks like the identity matrix with an extra 1 in the second row of the first column. We multiply E times the coefficients matrix that we will call U in equation 3.11. As usual, we will work on a small 3 × 3 example, but everything we do scales easily to square matrices of any size.
\[
EU = \begin{bmatrix}1&0&0\\1&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}u_{11}&u_{12}&u_{13}\\u_{21}&u_{22}&u_{23}\\u_{31}&u_{32}&u_{33}\end{bmatrix}
\tag{3.11}
\]


Our first operation is to create the outer product of the first column of E with the first row of U.
\[
\begin{bmatrix}1\\1\\0\end{bmatrix}
\begin{bmatrix}u_{11}&u_{12}&u_{13}\end{bmatrix}
= \begin{bmatrix}u_{11}&u_{12}&u_{13}\\u_{11}&u_{12}&u_{13}\\0&0&0\end{bmatrix}
\tag{3.12}
\]

This outer product places a copy of the first row in the first two rows of the product matrix. We will use the second and third columns of E without modification to get the remaining matrices used in the outer product. Remember, these multiply only the second and third rows of U, respectively.
\[
\begin{bmatrix}0\\1\\0\end{bmatrix}
\begin{bmatrix}u_{21}&u_{22}&u_{23}\end{bmatrix}
= \begin{bmatrix}0&0&0\\u_{21}&u_{22}&u_{23}\\0&0&0\end{bmatrix},
\qquad
\begin{bmatrix}0\\0\\1\end{bmatrix}
\begin{bmatrix}u_{31}&u_{32}&u_{33}\end{bmatrix}
= \begin{bmatrix}0&0&0\\0&0&0\\u_{31}&u_{32}&u_{33}\end{bmatrix}
\tag{3.13}
\]

Adding the product matrices from equations 3.12 and 3.13, we get the matrix for EU:
\[
\begin{bmatrix}u_{11}&u_{12}&u_{13}\\u_{11}&u_{12}&u_{13}\\0&0&0\end{bmatrix}
+ \begin{bmatrix}0&0&0\\u_{21}&u_{22}&u_{23}\\0&0&0\end{bmatrix}
+ \begin{bmatrix}0&0&0\\0&0&0\\u_{31}&u_{32}&u_{33}\end{bmatrix}
= \begin{bmatrix}u_{11}&u_{12}&u_{13}\\u_{11}+u_{21}&u_{12}+u_{22}&u_{13}+u_{23}\\u_{31}&u_{32}&u_{33}\end{bmatrix}.
\tag{3.14}
\]

We see a pattern of what the column entries do to the product matrices. If we had made the second entry of E in equation 3.12 equal to −(u21/u11), we would get:
\[
\begin{bmatrix}1\\-(u_{21}/u_{11})\\0\end{bmatrix}
\begin{bmatrix}u_{11}&u_{12}&u_{13}\end{bmatrix}
= \begin{bmatrix}u_{11}&u_{12}&u_{13}\\-u_{21}&-(u_{12}u_{21}/u_{11})&-(u_{13}u_{21}/u_{11})\\0&0&0\end{bmatrix}.
\tag{3.15}
\]

Now, if we add the product matrices from equations 3.13 and 3.15, we know we will get the matrix:
\[
\begin{bmatrix}
u_{11}&u_{12}&u_{13}\\
0&u_{22}-(u_{12}u_{21}/u_{11})&u_{23}-(u_{13}u_{21}/u_{11})\\
u_{31}&u_{32}&u_{33}
\end{bmatrix}.
\tag{3.16}
\]

The first entry of the second row is zero, so we are on our way to elimination and getting an upper triangular matrix. We do a similar operation on the third row to finish creating the pivot in the first column; for the third row, we would use −(u31/u11). Once the first column is done, we repeat the whole procedure, using the new second element of the second row acting on the third row to create the next pivot. We use outer products here to demonstrate elimination. We will, of course, just use ordinary-looking elimination matrices and simple matrix multiplication to do the work.
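The outer-product view is easy to verify numerically. Here is a small sketch of our own (not from the text) showing that summing the column-times-row products of E and U reproduces the ordinary product EU:

U = [10 20 1; 20 70 1; 70 60 1];          % the matrix being reduced
E = eye(3);  E(2,1) = -U(2,1)/U(1,1);     % elimination entry -(u21/u11)
S = zeros(3);
for k = 1:3
    S = S + E(:,k)*U(k,:);                 % column k of E times row k of U
end
S - E*U                                    % all zeros: the outer-product sum equals EU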


MATLAB® moment—Making an elimination matrix
Making an elimination matrix means combining two operations. The first is to make the identity matrix I and then change the entry in the elimination position. Again, we name the elimination matrices after the row and column position they act on.
>>E21=eye(3) produces a 3 × 3 identity matrix,
>>E21(2,1)=-2 produces the matrix E21 = [1 0 0; -2 1 0; 0 0 1],
which subtracts two of the first row from the second.

3.2.2 The Three-point Problem Using Elimination Matrices

Let's return to the three-point problem we solved by elimination earlier and show the individual elimination matrices. We will show all the steps from a coefficients matrix A to the upper triangular matrix U. Above, we stated that to create the first pivot and eliminate the entries below, we first added −2 of the first row to the second and −7 of the first row to the third. The operations are shown in equations 3.17 and 3.18 and give the elimination matrices we would use. We are doing the Elimination on the Augmented Matrix [A|b]. This fits ordinary matrix multiplication: the elimination matrix is 3 × 3, and the augmented matrix is 3 × 4. This will be fine for all cases because we multiply an n × n matrix on the left with an n × (n + 1) matrix on the right. We can augment as many columns as we like, and multiplication will work if the number of rows in the target matrix is the same as the number of columns in the elimination matrix.
\[
\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}
\left[\begin{array}{ccc|c}10&20&1&30\\20&70&1&90\\70&60&1&130\end{array}\right]
= \left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\70&60&1&130\end{array}\right],
\tag{3.17}
\]
\[
\begin{bmatrix}1&0&0\\0&1&0\\-7&0&1\end{bmatrix}
\left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\70&60&1&130\end{array}\right]
= \left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\0&-80&-6&-80\end{array}\right].
\tag{3.18}
\]

We now have the pivot in the second column of the second row. And our last step is to eliminate the −80 in the last row. We do this by multiplying the second row by 80 ÷ 30 (≈ 2.7) and then adding to the last row. That is just what the matrices in equation 3.19 accomplish.
\[
\begin{bmatrix}1&0&0\\0&1&0\\0&2.7&1\end{bmatrix}
\left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\0&-80&-6&-80\end{array}\right]
= \left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\0&0&-8.7&0\end{array}\right].
\tag{3.19}
\]


We have eliminated the entries below each of the pivots. We get a form that can be solved for the unknowns by using back substitution. Because we used the augmented matrix, we have the correct extra column to determine the solutions. We label elimination matrices by the element they eliminate. In equation 3.17 for the three-point problem, we eliminated the element A21 so the elimination matrix is called E21. Similarly, we had E31 in equation 3.18 and E32 in equation 3.19. The subscript tells us which off-diagonal entry, row number, then column number, is the target for elimination. For this example in equation 3.17 to equation 3.19, if we call the main matrix A and the eliminated matrix with pivots U, we can simply write the equation E32 E31 E21 A = EA = U, where the matrix E is given in equation 3.20.
\[
E = E_{32}E_{31}E_{21}
= \begin{bmatrix}1&0&0\\0&1&0\\0&2.7&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\-7&0&1\end{bmatrix}
\begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}
= \begin{bmatrix}1&0&0\\-2&1&0\\-12.4&2.7&1\end{bmatrix}.
\tag{3.20}
\]
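A quick numerical check of our own (not from the text) multiplies the three elimination matrices and applies the product to the augmented matrix:

E21 = eye(3);  E21(2,1) = -2;
E31 = eye(3);  E31(3,1) = -7;
E32 = eye(3);  E32(3,2) = 8/3;             % printed rounded as 2.7 in the text
E = E32*E31*E21                             % matches equation 3.20 up to rounding
A = [10 20 1; 20 70 1; 70 60 1];  b = [30; 90; 130];
E*[A b]                                     % the upper triangular augmented matrix of equation 3.19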











Notice that the order of the multiplication matters. We must multiply the elimination matrices the same way we do elimination, starting with our first elimination matrix on the right and moving to the last on the left.

3.2.3 Swapping Rows When Needed

Because we have worked on problems with independent columns, there will always be a solution. The only hitch we could have is if one of the entries becomes zero in a pivot position before we are done. An example is shown in equation 3.21 with this problem appearing after we create the first Pivot Column.
\[
\begin{bmatrix}1&2&5\\1&2&3\\2&7&10\end{bmatrix}
\ \text{goes to}\ 
\begin{bmatrix}1&2&5\\0&0&-2\\0&3&0\end{bmatrix}.
\tag{3.21}
\]

What can we do to get around this? If we interchange rows 2 and 3, the problem will disappear. We use a Permutation Matrix to do this operation. This is just a matrix that changes the order of rows, which we call a row swap. The permutation matrix looks like an identity matrix in that each column contains a single 1, but the order differs. Here is what we could do:
\[
\begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}
\begin{bmatrix}1&2&5\\1&2&3\\2&7&10\end{bmatrix}
= \begin{bmatrix}1&2&5\\2&7&10\\1&2&3\end{bmatrix}
\ \text{goes to}\ 
\begin{bmatrix}1&2&5\\0&3&0\\0&0&-2\end{bmatrix}.
\tag{3.22}
\]

So, the result of this operation is that any matrix can undergo elimination by applying elimination matrices and permutation matrices, provided it is square and all the columns are independent. This still works if we are using an augmented matrix. The permutation matrix swaps rows no matter how long they are.
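For a concrete check (our own sketch, not from the text), the swap can also be applied after the first column is cleared; either order ends at the matrix on the right of equation 3.22:

A = [1 2 5; 1 2 3; 2 7 10];               % the matrix of equation 3.21
E = eye(3);  E(2,1) = -1;  E(3,1) = -2;   % clear the first column
E*A                                        % a zero lands in the (2,2) pivot position
P = [1 0 0; 0 0 1; 0 1 0];                 % permutation matrix that swaps rows 2 and 3
P*(E*A)                                    % pivots restored, matching equation 3.22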


MATLAB® moment—Swapping rows and permutation matrix
Swapping two rows in a matrix can be done using a permutation matrix. We show here making a permutation matrix from an identity matrix. This matrix, called S12, switches rows 1 and 2.
>>S12=eye(3), I3=eye(3) produces two copies of the 3 × 3 identity matrix,
>>S12(2,:)=I3(1,:), S12(1,:)=I3(2,:) produces S12 = [0 1 0; 1 0 0; 0 0 1].
We make two copies of I and modify one of them. This creates a matrix to insert values into, and it is good practice to initialize array variables before working with them. Then the permutation matrix is made by copying rows from I3 into S12.

The matrix U we get from elimination is upper triangular, and we work toward getting the upper triangular form whenever we do elimination. The elimination matrices E are also special because they are always lower triangular. In later sections, we will do more operations to U, such as making the diagonal entries equal to one and eliminating some of the entries above the diagonal. But for now, use U for the last matrix in equation 3.22.

3.2.4 Summary

We can now do Gaussian elimination to solve for the unknowns in the equation Ax = b. Any square coefficients matrix with independent columns can undergo elimination to produce an upper triangular matrix we call U . We make the augmented matrix by taking the matrix A and placing b onto the right side. The result of Gaussian elimination will be A and b in a form ready for back substitution to determine the vector of unknowns x. Elimination is done using simple matrix multiplications to operate on the rows of the matrix [A|b]. If needed, we can swap rows in A by using permutation matrices. The elimination and permutation matrices are a special class of matrices we call elementary matrices that we discuss in the next section.

3.3 ELEMENTARY MATRICES AND THEIR INVERSES

This section discusses elementary matrices, which we have used as elimination and permutation matrices for a while. One of the powerful and fortunate aspects of using an elementary matrix is that it is easy to figure out how to undo what it does. Undoing the multiplication of one of the Es is the same as taking its inverse, that is, calculating E −1 . Computing the Inverse of an Elimination Matrix is easy and usually only requires one operation.


3.3.1 What are Elementary Matrices, and Why are They Important?

The best example of an Elementary Matrix is the identity matrix I. All other elementary matrices are made by simple or elementary row operations on I. There are three elementary operations: 1) exchanging rows of I, thus creating a permutation matrix; 2) multiplying a row of I by a constant; and 3) adding a multiple of one row to another row, building an elimination matrix. We show an example matrix of each below.
\[
\begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}
\quad \text{Exchanging rows 2 and 3,}
\tag{3.23}
\]
\[
\begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix}
\quad \text{Subtracting } 3 \times \text{row 1 from row 2,}
\tag{3.24}
\]
\[
\begin{bmatrix}1&0&0\\0&3&0\\0&0&1\end{bmatrix}
\quad \text{Multiplying row 2 by a scalar value 3.}
\tag{3.25}
\]

The elementary matrices do a lot of the work that has to be done in linear algebra. They perform all the basic operations done in Gaussian elimination. Each elimination step is done by swapping rows using matrices such as equation 3.23 or adding a multiple of a row to another row as shown in equation 3.24. Although we have not done it yet, we will use equation 3.25 soon in operating on matrices.

3.3.2 Undoing an Operation and the Inverse of an Elementary Matrix

How do we undo an elementary row operation? Suppose we have, as usual, the matrix A that we multiply by an elementary matrix E to get a resulting matrix R. In other words, EA = R. What do we do to R to get A back? The obvious linear algebraic operation is
\[
\text{If } EA = R, \text{ then } A = E^{-1}R.
\tag{3.26}
\]

The matrix E−1 then is not only the right matrix to undo the multiplication by E, but it is also the matrix we name the inverse of E. For an elementary matrix, determining E−1 is straightforward. Let's illustrate this with an example. The matrix E21 in equation 3.27 adds two of row 1 to row 2 in the matrix A, so the matrix E21−1 must subtract two of row 1 from row 2 in A. In matrices, we would write:
\[
\text{If } E_{21} = \begin{bmatrix}1&0&0\\2&1&0\\0&0&1\end{bmatrix},
\text{ then } E_{21}^{-1} = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix}.
\tag{3.27}
\]


MATLAB® moment—Inverting an elementary matrix
We will show two ways to get the inverse of an elementary elimination matrix. The first way is just to change the sign of the only off-diagonal entry. The second way is to use the built-in MATLAB® function that inverts a matrix.
>>E21I=E21, E21I(2,1)=-E21(2,1) changes the sign of the element,
>>E21I=inv(E21) produces the inverse matrix directly.
Both methods take E21 = [1 0 0; 2 1 0; 0 0 1] to the inverse matrix E21−1 = [1 0 0; -2 1 0; 0 0 1].
Even though you can use MATLAB®'s inverse function, continue to develop your skills and insights into linear algebra by learning other methods, like LDU and Gauss-Jordan, as we show.

So, to get the inverse of one of our elementary elimination matrices, we change the sign of the only nonzero entry below the main diagonal.
\[
\text{If } E = \begin{bmatrix}1&0&0\\0&1&0\\a&0&1\end{bmatrix},
\text{ then } E^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\-a&0&1\end{bmatrix}.
\tag{3.28}
\]

The inverses of a permutation matrix and a matrix that multiplies a single row are just as simple.
\[
\text{If } E = \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix},
\text{ then } E^{-1} = \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix},
\tag{3.29}
\]
\[
\text{If } E = \begin{bmatrix}a&0&0\\0&1&0\\0&0&1\end{bmatrix},
\text{ then } E^{-1} = \begin{bmatrix}1/a&0&0\\0&1&0\\0&0&1\end{bmatrix}.
\tag{3.30}
\]

















Amazingly, the Inverse of a Permutation Matrix that swaps two rows is just the matrix itself. In the example in equation 3.28, we changed the sign of the a in the matrix, but in equation 3.30, we took 1/a. Why? We added and subtracted from the third row in equation 3.28. The factor a was not in the diagonal position, so we changed its sign to invert the matrix. In equation 3.30, the factor a was on the main diagonal, so its only effect is to multiply a single row. Thus, we take its reciprocal to get the inverse. We show the multiplication for each.
\[
E^{-1}E = \begin{bmatrix}1/a&0&0\\0&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}a&0&0\\0&1&0\\0&0&1\end{bmatrix}
= \begin{bmatrix}a/a&0&0\\0&1&0\\0&0&1\end{bmatrix} = I, \text{ and}
\tag{3.31}
\]
\[
\begin{bmatrix}1&0&0\\0&1&0\\-a&0&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\a&0&1\end{bmatrix}
= \begin{bmatrix}1&0&0\\0&1&0\\a-a&0&1\end{bmatrix} = I.
\tag{3.32}
\]

This points out one more important fact about matrices and their inverses. We get the identity matrix if we multiply a matrix by its inverse or an inverse by the matrix. This is one of the fundamental properties of an inverse matrix:
\[
A^{-1}A = AA^{-1} = I.
\tag{3.33}
\]
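A brief numerical check of our own (not from the text) for the three types of elementary matrices:

a = 5;                                      % any nonzero scalar, our example value
Eadd   = [1 0 0; 0 1 0; a 0 1];             % adds a times row 1 to row 3
Eswap  = [1 0 0; 0 0 1; 0 1 0];             % swaps rows 2 and 3
Escale = [a 0 0; 0 1 0; 0 0 1];             % multiplies row 1 by a
inv(Eadd), inv(Eswap), inv(Escale)          % sign flip, the same matrix, and 1/a (equations 3.28-3.30)
Eadd*inv(Eadd)                              % the identity matrix, as in equation 3.33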

The use of elementary matrices is straightforward. They are simple to formulate following the rules for elementary matrix operations. Their inverses are easy to compute. Lastly, although it might get tedious, we could string together as many matrices as we wanted to perform a set of operations on a matrix. Using elementary matrices for elimination leads us to our next major step—the Factorization of matrix A into several matrices with specific qualities. In this chapter, we will use factorization to solve Ax = b. In the following chapters, we will develop a host of factorizations that solve other fundamental problems in the linear algebra framework.

3.4 OUR FIRST FACTORIZATIONS—A = LU = LDU

Next, let's consider the more general case where we have EA = U or A = E−1U. We know that we construct the matrix E by multiplying, in the right order, a whole series of individual elimination matrices that are themselves elementary matrices. In the three-point problem in equations 3.17 to 3.19, we multiplied A first by E21, then E31, and finally by E32. We can set the final matrix E = E32E31E21 as shown in equation 3.20. The product matrix E is definitely no longer an elementary matrix as it has many off-diagonal nonzero terms. This means we do not yet know how to get its inverse. Below we will describe how to get E−1, and how to turn E−1U into LDU, the factorization of A into the product of three matrices L, D, and U. We will start with the factorization A = LU before calculating the matrix D.

3.4.1 Finding the Inverse of E and Getting L

There is no simple relation between E and E−1 as there was for the individual elementary matrices. Because E is not elementary, we must do more work to calculate it. Can we use the fact that the matrix E is the product of elementary matrices to find its inverse? Yes, and it uses a straightforward process. We start with a basic identity, not proven here, that
\[
(AB)^{-1} = B^{-1}A^{-1}.
\tag{3.34}
\]


Of course, if A and B are large and complicated n × n matrices, then we do not yet know how to compute their inverses. On the other hand, if A and B are elementary matrices, no matter their size, we can immediately write down their inverses. If we take our recurring example E = E32E31E21, then using the previous identity, we can immediately say that:
\[
E^{-1} = (E_{32}E_{31}E_{21})^{-1} = E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}.
\tag{3.35}
\]
And we know how to compute all of the inverse matrices! We can do this for any set of elementary matrices:
\[
\text{If } E = E_1E_2\cdots E_{n-1}E_n, \text{ then } E^{-1} = E_n^{-1}E_{n-1}^{-1}\cdots E_2^{-1}E_1^{-1}.
\tag{3.36}
\]
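As a quick check of our own (not from the text), the reversed-order rule can be confirmed numerically for any elementary matrices, for example:

E21 = eye(3);  E21(2,1) = -2;
E31 = eye(3);  E31(3,1) = -3;
E32 = eye(3);  E32(3,2) = -1;
E = E32*E31*E21;
norm(inv(E) - inv(E21)*inv(E31)*inv(E32))   % essentially zero: inverses multiply in reverse order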

This means that if we do elimination by using elementary elimination matrices, we can always compute a single inverse matrix that undoes the elimination process. So, if we have EA = U then we can immediately get A = E−1U. We give a new designation to the special, lower-triangular matrix E−1, and that is L. We use L because this matrix is lower triangular. The lower triangular matrix L is the logical companion to the upper triangular matrix U. We can write down the equation A = LU. This may seem like a lot of work to reproduce A, a matrix we already knew, but we will do more with L and U to determine quite a bit about the linear algebra of the system. This is just our start, and we will use different factorizations to accomplish important tasks in linear algebra as we go forward.

3.4.2 Getting to LU

Now we are ready to take an example matrix from A to LU factorization using elimination matrices, their inverses, all of our elementary matrix operations, and the fundamental operations of linear algebra. Rather than set up a geologic problem, we will use an easy starting matrix (all results are integers) to demonstrate the method and sequence of operations. We will take as our starting matrix:
\[
A = \begin{bmatrix}1&2&4\\2&6&10\\3&8&9\end{bmatrix}.
\tag{3.37}
\]

We can compute the elimination matrices, knowing that we first subtract 2 of row 1 from row 2 and 3 of row 1 from row 3. The last elimination would be subtracting 1 of the remaining row 2 from the remaining row 3. The matrices we need are listed in equation 3.38.
\[
E_{21} = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix},\quad
E_{31} = \begin{bmatrix}1&0&0\\0&1&0\\-3&0&1\end{bmatrix},\quad
E_{32} = \begin{bmatrix}1&0&0\\0&1&0\\0&-1&1\end{bmatrix}.
\tag{3.38}
\]
Giving us for the inverses:
\[
E_{21}^{-1} = \begin{bmatrix}1&0&0\\2&1&0\\0&0&1\end{bmatrix},\quad
E_{31}^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\3&0&1\end{bmatrix},\quad
E_{32}^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\0&1&1\end{bmatrix}.
\tag{3.39}
\]


We can start by multiplying the elimination matrices to give us the overall matrix E. We can then multiply their inverses in reverse order and get the result for E−1.
\[
E = \begin{bmatrix}1&0&0\\-2&1&0\\-1&-1&1\end{bmatrix},\quad
E^{-1} = L = \begin{bmatrix}1&0&0\\2&1&0\\3&1&1\end{bmatrix},\ \text{and}\quad
U = \begin{bmatrix}1&2&4\\0&2&2\\0&0&-5\end{bmatrix}.
\tag{3.40}
\]

So at this point, we can compute E and get the matrix U (= EA). If we augmented the data vector onto the matrix A, then we would be set up to do back substitution and solve for the unknowns (we can solve Ax = b). We can also compute L = E−1, and arrive at A = LU.
\[
LU = \begin{bmatrix}1&0&0\\2&1&0\\3&1&1\end{bmatrix}
\begin{bmatrix}1&2&4\\0&2&2\\0&0&-5\end{bmatrix}
= \begin{bmatrix}1&2&4\\2&6&10\\3&8&9\end{bmatrix} = A.
\tag{3.41}
\]

MATLAB® moment—Using the lu() function
The MATLAB® function lu() is shown here for the matrix in equation 3.37.
>>[L U P]=lu(A) does LU factorization on the matrix A. It gives three output matrices:
L = [1.000 0 0; 0.667 1 0; 0.333 -1 1], U = [3.000 8.000 9.000; 0 0.667 4.000; 0 0 5.000], and P = [0 0 1; 0 1 0; 1 0 0].
We noted that many of MATLAB®'s built-in functions swap rows for numerical stability. Although the matrix A should not present problems, the rows were reordered by swapping row 1 with row 3. The record of this is contained in the matrix P, which we see does just this. Once rows are swapped, the LU calculation proceeds on the new matrix A that we call A1. The matrices L and U in the output are in this swapped order. This is not an issue because columns remain the same, so there is no change to the solution for x. We can recover the starting matrix. The math to remember is that LU = PA. We get the matrix A back using the previous matrices.
>>A1=L*U gives A1 = [3 8 9; 2 6 10; 1 2 4], the pivoted version of the matrix A.
>>A=P*L*U, or A=P*A1, gives A = [1 2 4; 2 6 10; 3 8 9], the original matrix A.


3.4.3 Computing LDU and Learning More About the Matrix A

Our next task is to get to A = LDU and see why this factorization is important. Let's start with the equation in a slightly different form as A = LIU, with I, of course, being the identity matrix. Let's write this solution down for the previous example.
\[
LIU = \begin{bmatrix}1&0&0\\2&1&0\\3&1&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}1&2&4\\0&2&2\\0&0&-5\end{bmatrix}.
\tag{3.42}
\]

Sticking I anywhere is OK because it is the same as multiplying by 1. We notice here that the matrix L is a lower triangular matrix with ones along the diagonal. Doing this for U would also be nice and symmetric. For the IU, let's divide each row of U by the number in the pivot position (that is, along the diagonal) and then put this factor into I. Now I becomes a diagonal matrix with the values needed to recover the previous version of U. Remember from the last chapter that multiplying a matrix with a diagonal matrix on the left multiplies each row by the diagonal value in that row. We also saw this for the elementary row operation in equation 3.30. Note well here that this changes the look of U, but in no way changes the solution it offers for the unknowns by back substitution, provided we do the same division to the data vector b. This makes equation 3.42 into the following form.
\[
LDU = \begin{bmatrix}1&0&0\\2&1&0\\3&1&1\end{bmatrix}
\begin{bmatrix}1&0&0\\0&2&0\\0&0&-5\end{bmatrix}
\begin{bmatrix}1&2&4\\0&1&1\\0&0&1\end{bmatrix}.
\tag{3.43}
\]

What is important about the matrix D? It turns out that computing the matrix D is the easiest way to get the determinant of A, which we symbolize |A| or det(A). Once we have D, a diagonal matrix, we can compute |A| = |D|. We will discuss the determinant in detail and find many uses for it in later chapters. To compute |D| we simply multiply the diagonal terms, that is, |D| = d11 d22 · · · dnn. Because A is square and has independent columns, this means that |D| ≠ 0. This means the matrix A is full rank and invertible. We could have done the same operation using the diagonal terms of U. Even more magic will come from D if we want to work with eigenvalues (whatever they are) in a later chapter.
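A short numerical check of our own (not from the text) of the determinant claim for this example:

A  = [1 2 4; 2 6 10; 3 8 9];
U0 = [1 2 4; 0 2 2; 0 0 -5];      % U from the elimination in equation 3.40
d  = diag(U0)                      % the pivots 1, 2, -5, which become the entries of D
prod(d)                            % -10
det(A)                             % also -10, so |A| = |D| as claimed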

3.5 GAUSS-JORDAN ELIMINATION AND COMPUTING A−1

We have now found a way to solve the problem Ax = b by doing elimination on the augmented matrix of A with b and solving for x by back substitution. Early on, however, we stated that another solution to solving Ax = b was to write this as x = A−1b. We showed previously in doing Gaussian elimination to get the LDU form that we could compute the inverses of the elimination matrices efficiently and use these to compute E−1. We could do this because we were dealing with elementary matrices, and finding their inverses is simple. We observed that directly finding E−1 was not as simple (well, without using elementary matrices). The same goes for A and its inverse. Fortunately, there is a method we can apply while doing elimination to create LDU that also produces A−1. We call this method Gauss-Jordan Elimination.

3.5.1 Setting up Gauss-Jordan Elimination

We have seen that using augmented matrices can ease the bookkeeping in the calculations done during elimination. It keeps track of row operations like subtraction and row swapping. We now introduce another augmented form specifically for Gauss-Jordan Elimination: making the augmented matrix of A by sticking the appropriately sized I on the right side. For the matrix used in 3.37, we get the following augmented matrix as [A|I].
\[
\begin{bmatrix}1&2&4\\2&6&10\\3&8&9\end{bmatrix}
\rightarrow
\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\2&6&10&0&1&0\\3&8&9&0&0&1\end{array}\right]
\tag{3.44}
\]

Our row operations are the same as in Gaussian elimination on A augmented with b as in equation 3.5. What we do for elimination on the matrix A is also done row by row on the augmented matrix [A|I].
\[
E_{21}\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\2&6&10&0&1&0\\3&8&9&0&0&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\3&8&9&0&0&1\end{array}\right]
\tag{3.45}
\]
\[
E_{31}\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\3&8&9&0&0&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\0&2&-3&-3&0&1\end{array}\right]
\tag{3.46}
\]
\[
E_{32}\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\0&2&-3&-3&0&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\0&0&-5&-1&-1&1\end{array}\right]
\tag{3.47}
\]

We have accomplished the first step of Gauss-Jordan elimination and have produced the matrix U augmented with the modified identity matrix. So far, so good.

3.5.2 Cleaning Upwards

Our next step is Cleaning Upwards to get the left-hand side of the augmented matrix from an upper triangular matrix to just a diagonal one. This involves starting with row 3, in this case, and adding or subtracting upwards as needed. Our first step is eliminating the 2 and 4 above the −5 in the third column. We do this by adding upward multiples of the third row (using 2 ÷ 5 for row 2 and 4 ÷ 5 for the first row). We have to do this for the entire third row, so we have to look at the terms in the augmented part and do the same above them.
\[
\left[\begin{array}{ccc|ccc}1&2&4&1&0&0\\0&2&2&-2&1&0\\0&0&-5&-1&-1&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&2&0&1/5&-4/5&4/5\\0&2&0&-12/5&3/5&2/5\\0&0&-5&-1&-1&1\end{array}\right].
\tag{3.48}
\]


We do the same for the second column using the 2 in row 2. This involves subtracting row 2 from row 1 across the entire augmented matrix.
\[
\left[\begin{array}{ccc|ccc}1&2&0&1/5&-4/5&4/5\\0&2&0&-12/5&3/5&2/5\\0&0&-5&-1&-1&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&0&0&13/5&-7/5&2/5\\0&2&0&-12/5&3/5&2/5\\0&0&-5&-1&-1&1\end{array}\right].
\tag{3.49}
\]







Our last step is to make the left-hand side into the identity matrix. We do this by dividing through by the diagonal entries for each row, in this case, −5 for row 3, 2 for row 2, and 1 for row 1.
\[
\left[\begin{array}{ccc|ccc}1&0&0&13/5&-7/5&2/5\\0&2&0&-12/5&3/5&2/5\\0&0&-5&-1&-1&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&0&0&13/5&-7/5&2/5\\0&1&0&-6/5&3/10&1/5\\0&0&1&1/5&1/5&-1/5\end{array}\right].
\tag{3.50}
\]







The Gauss-Jordan elimination started by working on the augmented matrix [A|I]. It produced another augmented matrix starting with I. What is the second part? It is A−1, the Inverse Matrix of A. This means the final matrix is [I|A−1]!

MATLAB® moment—Cleaning upward
We have a firm grasp on using an elimination matrix to put a target matrix into upper triangular form from Gaussian elimination. This involved making a lower-triangular elimination matrix. Can we clean upward using an elimination matrix? Yes, but now the matrix doing the cleaning must be upper triangular! Let's set the matrix B to the augmented matrix in equation 3.48.
B = [1 2 4 1 0 0; 0 2 2 -2 1 0; 0 0 -5 -1 -1 1]
We got this from doing elimination on the augmented matrix [A|I]. We need to eliminate the 2 and 4 in the last column of the original design matrix and apply whatever we do to the entire matrix B. This is done by adding 2/5 of the last row to row 2 and 4/5 of the last row to row 1. To do this, we use an upper triangular cleaning matrix that we will call C3 because it is cleaning the third column.
C3 = [1 0 0.8; 0 1 0.4; 0 0 1], and C3*B gives the result we see on the left-hand side of equation 3.49 (the right-hand side of equation 3.48).
To make C3 we would use the following in MATLAB®:
>>C3=eye(3) produces a 3 × 3 identity matrix,
>>C3(1,3)=0.8; C3(2,3)=0.4; producing the matrix we see earlier.
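The whole Gauss-Jordan process can also be checked in one line with rref(); this is our own check rather than part of the text:

A  = [1 2 4; 2 6 10; 3 8 9];
GJ = rref([A eye(3)]);            % Gauss-Jordan on the augmented matrix [A|I]
Ainv = GJ(:,4:6)                  % the right half, equal to the inverse in equation 3.50
A*Ainv                            % returns the identity matrix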


Now, we have two methods for solving Ax = b where A is square and has independent columns. The first is the LDU reduction, giving us a final matrix U that, along with the modified b that we augmented onto A, we can use to back substitute and solve for x. The second method is Gauss-Jordan, which gives A−1. We can then compute A−1b to get x. In the process of doing Gauss-Jordan, we do not have to give up the benefits of LDU. We just make the augmented matrix [A|I] instead of [A|b] and get L, D, and U to use as needed. Because we are getting A−1, we can ignore b during elimination and use it in its unmodified form later with the inverse. We could also make another augmented matrix [A|I|b] and have every possible method at our disposal.

MATLAB® moment—My output looks different than expected
It is common when using one of the MATLAB® functions for inversion, LU factorization, and many others that the output matrices may have a different row order from what you input. This is because MATLAB® works to make calculations as numerically stable as possible. This is called pivoting or partial pivoting, not to be confused with pivot columns or variables. An example of a matrix needing partial pivoting is given:
A = [0.001 25 3; -2000 0.1 5; 1 1 1] goes to [0.001 25 3; 0 50,000,000 6,000,005; 0 -24,999 -2999] without row swaps.
In our regular Gaussian elimination, we multiply the first row by 2,000,000. We can see that other entries in the matrix can blow up in value, leading to numerical instability in the calculations. MATLAB® swaps rows in this case. Its final result may be in a different row order, but the variables remain unchanged as they are tied to the columns.

3.6 THE THREE-POINT PROBLEM MANY WAYS

We presented the three-point problem, its elimination results, and how to think of it with elimination matrices. We will now work it through using all the methods described previously. We are returning to the version of the three-point problem we presented at the beginning of this chapter. We will also revisit the problem in the next chapter.

3.6.1 Simple Elimination with an Augmented Matrix

We start by simply presenting the result of earlier elimination. We create the augmented matrix and then do elimination to isolate the pivots. This is shown, summarizing our previous results from section 3.1.1 and equation 3.7.
\[
\left[\begin{array}{ccc|c}10&20&1&30\\20&70&1&90\\70&60&1&130\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\0&0&-8.7&0\end{array}\right]
\rightarrow
\begin{array}{l}m_x = 1\\m_y = 1\\z_0 = 0\end{array}
\tag{3.51}
\]

So, we do back substitution for this case and get the results for the unknowns. One other thing we can do here is similar to what we are going to do for the LDU factorization and divide through by the value of each pivot. This gives a result where we can more easily solve for the unknowns.
\[
\left[\begin{array}{ccc|c}10&20&1&30\\0&30&-1&30\\0&0&-8.7&0\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c}1&2&0.1&3\\0&1&-0.033&1\\0&0&1&0\end{array}\right].
\tag{3.52}
\]

And if we clean upwards with the result in equation 3.52, we can convert the matrix to a form that we call Reduced Row Echelon Form, where we can read the results off directly from the augmented matrix.
\[
\left[\begin{array}{ccc|c}1&2&0.1&3\\0&1&-0.033&1\\0&0&1&0\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c}1&0&0&1\\0&1&0&1\\0&0&1&0\end{array}\right],
\ \text{giving}\ 
\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}m_x\\m_y\\z_0\end{bmatrix}
= \begin{bmatrix}1\\1\\0\end{bmatrix}.
\tag{3.53}
\]

The last equation is a tidy solution to the problem. Any square matrix that is full rank will produce a reduced row echelon form that is the identity matrix. Row echelon means the pivots are arranged in a downward and rightward stair-step pattern. Reduced row echelon form means all the pivots have been set to one, and values above the pivots have been reduced to zero. We will discuss the row echelon form a bit more at the end of this chapter and see it again in the next chapter when we start working with rectangular matrices that are definitely not full rank and do not have pivots in every column or row. When we get the row echelon form, we sometimes refer to the resulting matrix using R. What we are doing is taking the augmented matrix [A|b] to [R|b′], where R is A taken through elimination and put in row echelon form, and b′ is the vector b after all of the elimination and cleaning operations.

3.6.2 Gauss-Jordan Elimination on the Three-point Problem

Now, we can set the problem up as its Gauss-Jordan version. This starts with the matrix and augments the identity matrix. We can then apply the elimination matrices to get the first part of our result.
\[
\left[\begin{array}{ccc|ccc}10&20&1&1&0&0\\20&70&1&0&1&0\\70&60&1&0&0&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}10&20&1&1&0&0\\0&30&-1&-2&1&0\\0&0&-8.7&-12.4&2.7&1\end{array}\right].
\]









Now we do the upward cleaning steps:
\[
\left[\begin{array}{ccc|ccc}10&20&0&-0.43&0.31&0.115\\0&30&0&-0.57&0.69&-0.115\\0&0&-8.7&-12.4&2.7&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}10&0&0&-0.04&-0.15&0.19\\0&30&0&-0.57&0.69&-0.115\\0&0&-8.7&-12.4&2.7&1\end{array}\right].
\]
Which finally gives us [I|A−1]:
\[
\left[\begin{array}{ccc|ccc}1&0&0&-0.004&-0.015&0.019\\0&1&0&-0.019&0.023&-0.004\\0&0&1&1.42&-0.31&-0.115\end{array}\right].
\tag{3.54}
\]

Now that we have A−1, we can multiply it times our data vector to get the solution vector.
\[
\begin{bmatrix}-0.004&-0.015&0.019\\-0.019&0.023&-0.004\\1.42&-0.31&-0.115\end{bmatrix}
\begin{bmatrix}30\\90\\130\end{bmatrix}
= \begin{bmatrix}1\\1\\0\end{bmatrix}
= \begin{bmatrix}m_N\\m_E\\z_0\end{bmatrix}
\tag{3.55}
\]

The advantage of this method is that if we work on a problem with the same coefficient matrix, we can easily produce results for any number of data vectors without any additional elimination. Suppose, for example, our problem was one where we had three monitoring wells, measured the water table's depth, and computed its slope. Because the wells are not changing position, the coefficient matrix A does not change. What does change is the data vector b as this records water levels, which can change with time. At any given moment, we can record the water levels in b and multiply them by A−1 to get the unknowns in x.

MATLAB® moment—Use rref() for reduced row echelon form
The MATLAB® function rref() computes the reduced row echelon form of an augmented matrix. We show an example here using the A matrix and b data vector for the three-point problem that is the Ax = b arrays for this chapter.
>>rref([A,b]) puts A augmented with b into reduced row echelon form:
[A|b] = [10 20 1 30; 20 70 1 90; 70 60 1 130] goes to [1 0 0 1; 0 1 0 1; 0 0 1 0], identical to equation 3.53.
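To make the reuse of A−1 concrete, here is a small sketch of our own (the second column of measurements is hypothetical) showing several data vectors processed with one inverse:

A    = [10 20 1; 20 70 1; 70 60 1];
Ainv = inv(A);                         % computed once for the fixed measurement locations
B    = [30 12.5; 90 90; 130 87.5];     % each column is one set of measured values
X    = Ainv*B                          % each column of X holds the unknowns for that data vector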

3.6.3 LDU for the Three-point Problem

We will start the process on the three-point problem by doing Gaussian elimination on the augmented matrix as described earlier and creating LDU as a result. We will revisit it now, starting with the elimination matrices and their inverses. We do not give the full augmented matrix as we solved the previously mentioned problem in section 3.6.1.
\[
E_{21} = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix},\quad
E_{31} = \begin{bmatrix}1&0&0\\0&1&0\\-7&0&1\end{bmatrix},\quad
E_{32} = \begin{bmatrix}1&0&0\\0&1&0\\0&2.7&1\end{bmatrix},
\]
\[
E_{21}^{-1} = \begin{bmatrix}1&0&0\\2&1&0\\0&0&1\end{bmatrix},\quad
E_{31}^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\7&0&1\end{bmatrix},\quad
E_{32}^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\0&-2.7&1\end{bmatrix}.
\]

We will use the fact that \(L = E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}\) to get L. We will also recall the solution for U since we already got it in equation 3.19. This solves the LU factorization. As a final step, we compute D using the factors in equation 3.19 to get the final result in equation 3.56 for L, D, and U for the three-point problem. We can check these matrices by multiplying them to get A back again in equation 3.57.

\[
L \rightarrow \begin{bmatrix}1&0&0\\2&1&0\\7&-2.7&1\end{bmatrix},\quad
D \rightarrow \begin{bmatrix}10&0&0\\0&30&0\\0&0&-8.7\end{bmatrix},\quad
U \rightarrow \begin{bmatrix}1&2&0.1\\0&1&-0.033\\0&0&1\end{bmatrix},
\tag{3.56}
\]
\[
LDU = \begin{bmatrix}10&20&1\\20&70&1\\70&60&1\end{bmatrix} = \text{Coefficient Matrix } A.
\tag{3.57}
\]
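Multiplying these out with the exact fractions (rather than the rounded values printed above) recovers A exactly; this is our own check, not from the text:

L = [1 0 0; 2 1 0; 7 -8/3 1];     % -8/3 is printed as -2.7 above
D = diag([10 30 -26/3]);          % -26/3 is printed as -8.7 above
U = [1 2 0.1; 0 1 -1/30; 0 0 1];  % -1/30 is printed as -0.033 above
L*D*U                              % returns the coefficient matrix of equation 3.57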

















Once we are at this step, we can fully use the results to solve for x by working with the matrix U and modifying the data vector b using the same elimination steps or by augmenting it to the starting coefficients matrix for the three-point problem. We can also get information about the determinant and eigenvalues of the coefficients matrix by using D. MATLAB® moment—Use rref() for Gauss-Jordan elimination The reduced row echelon function rref() can also be used to perform Gauss-Jordan elimination on a matrix augmented with the identity matrix. This is one way to compute the inverse of a matrix. >>R=rref([A,eye(3)]) augements A with an appropriately sized I. 1 0 0 −0.004 −0.015 0.019 rref 10 20 1 1 0 0    0 1 0 −0.019 0.023 −0.004  −1   20 70 1 0 1 0  → [I|A ] =  0 0 1 1.42 −0.31 −0.115 70 60 1 0 0 1 





>>A1=[R(:,4),R(:,5),R(:,6)] would set A1 to the inverse of A.




3.7 REWORKING THE GEOCHEMICAL EXAMPLE

We will also work through the geochemical example presented in Chapter 1 and show how elimination leads to the same solution and how the Gauss-Jordan method leads to the inverse. We had the following matrix setup: K, Al, and Fe are elemental compositions, and b, f, and c stand for the minerals biotite, feldspar, and chlorite, respectively.
\[
\begin{array}{l}
\mathrm{K} = 1b + 1f + 0c = 5\\
\mathrm{Al} = 1b + 1f + 3c = 10\\
\mathrm{Fe} = 3b + 0f + 1c = 8
\end{array}
,\ \text{giving}\ 
\begin{bmatrix}1&1&0\\1&1&3\\3&0&1\end{bmatrix}
\begin{bmatrix}b\\f\\c\end{bmatrix}
= \begin{bmatrix}5\\10\\8\end{bmatrix}.
\tag{3.58}
\]

The way the problem is set up, we will have to swap rows two and three to have pivots along the diagonal.
\[
\begin{bmatrix}1&1&0\\1&1&3\\3&0&1\end{bmatrix}
\ \text{elimination}\rightarrow\ 
\begin{bmatrix}1&1&0\\0&0&3\\0&-3&1\end{bmatrix},
\ \text{swap row 2 with row 3}\ \rightarrow
\tag{3.59}
\]
\[
\begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix}
\begin{bmatrix}1&1&0\\0&0&3\\0&-3&1\end{bmatrix}
= \begin{bmatrix}1&1&0\\0&-3&1\\0&0&3\end{bmatrix} = U.
\tag{3.60}
\]

We then use the augmented matrix, put the matrix into an upper triangular form, make the reduced row echelon form, and finally solve for the unknown amounts of each mineral. Notice that because we swap two rows, we must also swap the last two entries of the data vector. We must mimic what we do with the coefficients matrix with the data vector. The rows in both must correspond. Notice, too, that swapping rows does nothing to the unknowns. These multiply the columns of the coefficient matrix, and the columns are not swapped or exchanged.
\[
\left[\begin{array}{ccc|c}1&1&0&5\\3&0&1&8\\1&1&3&10\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c}1&1&0&5\\0&-3&1&-7\\0&0&3&5\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|c}1&0&0&19/9\\0&1&0&26/9\\0&0&1&5/3\end{array}\right]
\tag{3.61}
\]
\[
\text{Giving}\ 
\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}
\begin{bmatrix}b\\f\\c\end{bmatrix}
= \begin{bmatrix}19/9\\26/9\\5/3\end{bmatrix}
\ \rightarrow\ 
\begin{array}{l}b = 19/9\\f = 26/9\\c = 5/3\end{array}.
\tag{3.62}
\]

We then want to look at putting the matrix into its LDU factorization to get to the matrix D. We will start with the matrix in its version with the swapped second and third rows. This does not change any of the values of the unknowns but will change the sign of the matrix's determinant, which will be discussed later. We have already determined the U matrix in equation 3.60, so we will need to recapture our elimination steps to get L. We subtracted three of row 1 from row 2 and one of row 1 from row 3. This gives us the following:
\[
E_{21} = \begin{bmatrix}1&0&0\\-3&1&0\\0&0&1\end{bmatrix},\ 
E_{21}^{-1} = \begin{bmatrix}1&0&0\\3&1&0\\0&0&1\end{bmatrix},\ 
E_{31} = \begin{bmatrix}1&0&0\\0&1&0\\-1&0&1\end{bmatrix},\ \text{and}\ 
E_{31}^{-1} = \begin{bmatrix}1&0&0\\0&1&0\\1&0&1\end{bmatrix}
\ \rightarrow\ 
E^{-1} = L = \begin{bmatrix}1&0&0\\3&1&0\\1&0&1\end{bmatrix}.
\tag{3.63}
\]

From this, we get LDU by dividing through by the pivots in U to create D.
\[
\underbrace{\begin{bmatrix}1&0&0\\3&1&0\\1&0&1\end{bmatrix}}_{L}
\underbrace{\begin{bmatrix}1&0&0\\0&-3&0\\0&0&3\end{bmatrix}}_{D}
\underbrace{\begin{bmatrix}1&1&0\\0&1&-1/3\\0&0&1\end{bmatrix}}_{U}
= \underbrace{\begin{bmatrix}1&1&0\\3&0&1\\1&1&3\end{bmatrix}}_{A}.
\tag{3.64}
\]

And finally, we will do Gauss-Jordan elimination on the matrix to compute its inverse. Again, we will swap the rows before we begin so that the elimination goes well.
\[
\left[\begin{array}{ccc|ccc}1&1&0&1&0&0\\3&0&1&0&1&0\\1&1&3&0&0&1\end{array}\right]
\rightarrow
\left[\begin{array}{ccc|ccc}1&1&0&1&0&0\\0&-3&1&-3&1&0\\0&0&3&-1&0&1\end{array}\right]\rightarrow
\tag{3.65}
\]
\[
\left[\begin{array}{ccc|ccc}1&1&0&1&0&0\\0&1&-1/3&1&-1/3&0\\0&0&1&-1/3&0&1/3\end{array}\right]\rightarrow
\tag{3.66}
\]
\[
\left[\begin{array}{ccc|ccc}1&0&0&1/9&1/3&-1/9\\0&1&0&8/9&-1/3&1/9\\0&0&1&-1/3&0&1/3\end{array}\right]
= \left[I\,|\,A^{-1}\right],\ \text{giving}
\tag{3.67}
\]
\[
\begin{bmatrix}1/9&1/3&-1/9\\8/9&-1/3&1/9\\-1/3&0&1/3\end{bmatrix}
\begin{bmatrix}5\\8\\10\end{bmatrix}
= \begin{bmatrix}19/9\\26/9\\5/3\end{bmatrix}
\ \rightarrow\ 
\begin{array}{l}b = 19/9\\f = 26/9\\c = 5/3\end{array}.
\tag{3.68}
\]

Again, all methods give us the same result. The geochemical example is nice because it makes us swap rows to get nonzero elements along the main diagonal. As stated earlier, these nonzero values form the pivots for the matrix. If a zero value appears in a pivot position, we must do a row swap to get a nonzero value as the pivot.
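As a quick check of our own (not from the text), the swapped geochemical system can be solved in one line in MATLAB (the backslash operator is described in the MATLAB moment below):

A = [1 1 0; 3 0 1; 1 1 3];        % rows ordered K, Fe, Al after the swap
rhs = [5; 8; 10];                 % data vector with the same two entries swapped
A\rhs                             % returns [19/9; 26/9; 5/3], the amounts b, f, and c in equation 3.68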


MATLAB® moment—Two ways to solve Ax = b
Linear Algebra is the foundation of MATLAB®, so there are several easy ways to solve Ax = b. Computing the inverse matrix is a simple command inv(). If you want to calculate x directly, you can simply write x = A\b. The backslash operator, based on the LU factorization for square matrices, is the fastest and most numerically stable (more in Chapter 9) and should be your go-to method when you have just one b. We demonstrate by calculating solutions for the three-point problem example.
>>R=inv(A) gives A−1 = [-0.004 -0.015 0.019; -0.019 0.023 -0.004; 1.42 -0.31 -0.115], then use x = A−1b.
>>x=A\b gives x = [1; 1; 0].

3.8 SUMMARY

We have put together a great toolbox of methods for solving Ax = b where A is a square, full-rank matrix. The easiest method is Gaussian Elimination, where we do all of our operations on the rows of a matrix. This creates an upper-triangular matrix where all the entries below the main diagonal are zero. For an n × n matrix, the bottom row contains a single nonzero component in the right-most position; the next row up has its first nonzero element one column to the left, and so forth. Finally, the uppermost row contains a nonzero entry in the leftmost position, or in other words, the upper left corner of the matrix. We call each of these entries along the main diagonal a pivot. All pivot positions must be nonzero for a matrix to be full rank. The entries below the main diagonal are all zero, and the rest above the main diagonal can take on any real number.

The operations on the rows of the coefficients matrix can be considered in single steps that act on only two rows. This process can be implemented with matrix multiplication by using an elimination matrix. The elimination matrix starts with the identity matrix but adds a nonzero component in the lower triangular part of the matrix. This means we can keep track of all the elimination operations and reverse them as needed. The elimination matrices are examples of what we call elementary matrices. The elementary matrices are simple and can be easily inverted.

Because elimination is acting on the rows of the coefficients matrix, everything done to those rows must also be done with the data vector. We use a matrix form we call the augmented matrix to make this process easier. This means we stick the data vector as an additional column to the coefficients matrix. If the coefficients are in an n × n matrix, then the data vector is n × 1. Any elimination or permutation matrix will also have to be n × n, acting the same way on rows of the matrix and the data vector.

Once the matrix is reduced to its pivots by elimination, we say it is in row echelon form. The pivots form a rightward-descending staircase of nonzero entries. We can take a further step in that we can clear upwards from the pivots, creating a diagonal matrix, and then we can divide each row by the value of the pivot to get the matrix in what we call reduced row echelon form. The reduced row echelon form leaves us with the identity matrix for a square, full-rank matrix. If we perform these operations on the augmented matrix, we can read the solution for each unknown by merely reading across the matrix a row at a time. No back substitution is needed.

Another version of the augmented matrix is used when we do the Gauss-Jordan elimination method. Here, we put an identity matrix onto the square coefficients matrix. This Gauss-Jordan method then turns the coefficient part of the matrix into the identity matrix by elimination and cleaning upward from the pivots. Once the matrix is in this form, the augmented part becomes A−1. Why? Because it records every action necessary to take our starting matrix A on the left side of the augmented matrix to I. For this reason, the augmented I has now become the inverse of A. The end result is that we convert the matrix [A|I] into [I|A−1]. What should we do with the data vector? Nothing. Because we computed A−1, we can simply get our unknowns by computing x = A−1b.

We also got our first two major factorizations of a matrix. These were the LU and the LDU forms. We saw that the process of elimination takes the coefficients matrix into an upper triangular matrix that we call U. In doing this, we do a series of elimination steps using elementary matrices. We take the inverses of the elimination matrices and multiply them in reverse order to get a lower triangular matrix that we call L. The matrix L can then be used to multiply U to recover the coefficients matrix A using LU = A. Sticking an identity matrix in the middle allows us to recast LU as LIU. We then divide each row of U by its pivot value and make these pivot values the entries of the middle matrix I to get the form LDU. The matrix D is diagonal, and we have the result that LDU = A. From the D matrix, we can compute the determinant of A.

3.9 EXERCISES

1. Do elimination on the following matrices (Ax = b). Use an augmented matrix and show your work. Determine each matrix's unknowns vector x.
\[
\text{(i)}\ A = \begin{bmatrix}2&3&0\\-9&7&3\\7&5&-5\end{bmatrix},\ b = \begin{bmatrix}5\\-4\\2\end{bmatrix},
\qquad
\text{(ii)}\ A = \begin{bmatrix}2&3&4\\-2&7&3\\4&12&-2\end{bmatrix},\ b = \begin{bmatrix}3\\4\\5\end{bmatrix}.
\]


2. Now for some more work on elimination.
(i) Give the step-by-step elimination matrices for part (i) of question 1. Multiply them to get a single elimination matrix.
(ii) Compute the inverse of each elimination matrix to find L in the LU decomposition.
(iii) Show the LDU decomposition.
(iv) Do Gauss-Jordan elimination for each example, giving the matrix A−1.
(v) Check that A−1b gives the same result as back substitution.

3. Perform the previous operations using MATLAB®. You should also compute the vector x using the MATLAB® command A\b. Does using the lu() command produce the same result as you get by hand calculation?

4. In Chapter 1, we worked on garnet components and chemistry to make the linear algebra components for Ax = b. We still have not gotten any data vectors.
(i) Can you perform elimination on the matrix A? Can you make the elimination matrices?
(ii) If the answer is yes, reread this chapter. If the answer is no, what went wrong? The basic problem is that the design matrix is not square. We do not know how to deal with this—but we will find out in the following two chapters! So, we now need to make a simplifying assumption to create a square matrix. The matrix has as many rows as we have elements, so we need to compromise on the elements.
(i) What element(s) can you eliminate or group with another element to pose a solvable problem? Write out new design and data matrices for this example.
(ii) As a specific possibility, group Fe3+ in the data vector with Fe2+ and let the An component be all Ca so that this component has Ca5. Write out the new design and data matrices. Solve for the vector x using the following data vectors.
\[
\begin{array}{l|ccc}
\text{Cations} & & & \\
\hline
\mathrm{Fe} & 2.805 & 2.46 & 2.01\\
\mathrm{Ca} & 0.14 & 0.17 & 1.13\\
\mathrm{Mg} & 0.015 & 0.015 & 0.03\\
\mathrm{Mn} & 0.048 & 0.375 & 0.03\\
\mathrm{Al} & 1.988 & 1.976 & 1.78\\
\mathrm{Cr} & 0.004 & 0.004 & 0.02\\
\text{Total} & 5 & 5 & 5
\end{array}
\]

5. We will go back to our tomography example from Chapter 1. In that chapter, you set up the matrix and vectors in the form Ax = b. Now, we will change the problem slightly and perform elimination. The new setup is shown in Figure 3.4. You will have to remake the design matrix A to take into account the diagonal elements. Since we assume 1 km blocks, the distance traveled by the third wave across blocks 1 and 4 will be √2 km. The vector x is a list of the unknowns, and b is the data vector. You should perform the following operations by hand and/or using MATLAB®.

Figure 3.4: Setup for exercise on tomography. Assume each block is 1 km on a side.

(i) Formulate and perform elimination on the matrix A. Keep track of the row operations on the time vector b. You can do this as a separate vector or as an augmented matrix.
(ii) Compute the unknown vector x, which is the slowness (reciprocal of velocity) for the four blocks. Use data vectors [0.35; 0.37; 0.51; 0.45] and [0.354; 0.354; 0.511; 0.370] and back substitution. The data vectors contain the seismic wave travel times and have units of seconds.
(iii) Do Gauss-Jordan elimination on the matrix A and find A−1.
(iv) Check that A−1b gives the same result as back substitution.

CHAPTER 4: When Does Ax = b have a Solution?

In the last chapter, we gave the solution to Ax = b under two special conditions: the matrix A that defines the coefficients is square, and its columns are all independent. These conditions ensure that the same number appears in three places: the number of unknowns in the solution vector x, the number of measurements in the data vector b, and therefore the number of rows and columns in A. The columns of A are all independent, and therefore, the matrix is full rank and has as many pivots as rows and columns. Square and full rank together mean that Ax = b has a unique solution and that A is invertible. This chapter addresses what to do when the matrix A is not square or if it is not full rank even if it is square. Either case means that, in effect, there are more equations than unknowns or more unknowns than equations, and there could be trouble finding a solution. Such matrices, if square, are called Noninvertible or Singular. Rectangular matrices are not invertible regardless of rank. Once we are comfortable with these results, we will describe what they mean in terms of vector spaces and solving for a variable that may have no unique solution.

MATLAB® moment—The fault vectors for this chapter
We will see the fault problems several times in this chapter. We give the slip vectors here and how we combine them into the various problems for Ax = b. This way, we can set and change vectors as needed and rebuild matrices. The column slip vectors are listed by fault label:
>>a=[1;0;0]; b=[2;1;1]; c=[1;-1;1]; d=[3;-1;1]
To define matrices F2, F3, and F4 with two, three, and four faults, respectively, use:
>>F2=[a,b]; F3=[a,b,c]; F4=[a,b,c,d];




Figure 4.1: Map views in upper row and block diagram below. The block diagram is for a two-fault case. Red arrows show the components of motion or slip on each fault. A black dot on a thin line shows the fault also has a vertical downward motion in the slip direction. We can measure the total displacement of S relative to F by its North, East, and Down components. Components on faults are listed in the lower right and equation 4.1. Recall that this problem was set up with just two faults in Chapter 1.

4.1 SLIP WITH DIFFERENT NUMBERS OF FAULTS

We start with the geologic problem we saw in Chapter 1, which described finding the slip on two faults if we knew the horizontal motion across all the structures. Across the top of Figure 4.1, we show the problem again in map form for more cases. We now include movement in all three coordinate directions: North, East, and Down. The goal is to find the slip on individual faults given a specific number of faults and their slip directions. As we'll see, these new arrangements may have one solution, no solution, or infinitely many solutions. In this problem setup, we have estimated the total movement vector, s. It is the displacement of the point S relative to F and is the data vector for this problem. The movement directions of the faults are set using their vector components, expressed


in NED coordinates. For example, fault a is a strike-slip fault that, for every 1 unit of north slip, has 0 units of east or down motion when it moves. Fault b is a normal fault, and it has its slip vector in the proportions 2 north : 1 east : 1 down. For normal faults c and d, we have the proportions 1 : −1 : 1 and 3 : −1 : 1, respectively. The −1 indicates that the hanging wall moves westward, meaning it has a negative motion relative to the positive east direction in NED coordinates. These vectors are arranged as columns to form the coefficients matrices A that we work with in what follows.
\[
\mathbf{a} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\quad
\mathbf{b} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\quad
\mathbf{c} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},\quad
\mathbf{d} = \begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}. \tag{4.1}
\]

The vectors in equation 4.1 show the slip direction on each structure but not the magnitude of the slip, or in other words, how far each fault moved. For each fault, we will have an unknown magnitude of slip that we will call as, bs, cs, and ds. We arrange these components into the column vector of unknowns x. Lastly, we have the total movement vector s with its components sN, sE, and sD, the measured total displacement between S and F. The vector s becomes the data vector b in Ax = b. The problem is now set up so that we have identified A, x, and b, and we can work to find the motion on each fault given a total motion vector s. We will use different numbers of faults to explore the solvability of Ax = b. Our intuition tells us we should find a solution if we have 3 faults and 3 dimensions in NED. At least, we do not go into the exercise suspecting trouble. On the other hand, if we have 2 or 4 faults, we already know that finding a unique solution is unlikely, but perhaps we can solve the problem under the right circumstances.

4.1.1 Three Faults

We will start with a problem we think should have a solution: three faults moving in three dimensions. This means we will have a square 3 × 3 coefficients matrix, a data vector with three components, and three unknowns to find. In the NED coordinate system, let's assume we start with faults a, b, and c, and the data vector has components 4, 1, and 2. We estimate that our three faults have moved the station S 4 units to the north, 1 to the east, and 2 units in the down direction relative to the fixed point F. We get the following equations before and after elimination.
\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}. \tag{4.2}
\]

From inspection, we can see that all the columns are independent (there are no zero pivots), so we expect one unique solution. We use elimination to get the upper triangular matrix U , and then we can solve for the unknowns by back substitution. This gives the result that cs = 1/2, bs = 3/2, and then as = 1/2. What could be easier? Because the coefficients matrix is square and full rank, another option would have been to compute an inverse for the matrix and get the unknowns vector using x = A−1 b. Regardless of the approach, this problem has one unique solution.
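As a quick check, the same numbers drop out of MATLAB® with the backslash operator, using the fault vectors defined at the start of the chapter and the data vector from equation 4.2:

a = [1; 0; 0]; b = [2; 1; 1]; c = [1; -1; 1];
F3 = [a, b, c];     % coefficients matrix: fault slip directions as columns
s  = [4; 1; 2];     % measured displacement of S relative to F in NED
x  = F3\s           % returns [0.5; 1.5; 0.5], that is, as, bs, and cs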


4.1.2 Two Faults

Now, suppose that only two faults have been mapped in this area. We will use faults a and b and the same data vector. We only have one tool to find a solution: elimination. We cannot find A−1 by Gauss-Jordan because the matrix is not square and thus has no inverse.
\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}. \tag{4.3}
\]

Does this version of the problem Ax = b have a solution? No. After elimination, multiplying the last row in the matrix A by the unknown vector implies that 0as + 0bs = 1, which is impossible to solve. Are there cases where we could get a solution? The answer is yes, but those cases require special conditions on the components of the data vector. Suppose we have the same setup with a slightly different data vector and then do elimination.
\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 0 \end{bmatrix}. \tag{4.4}
\]

Now the last row gives 0as + 0bs = 0. No problem with this, and back substitution gives us that bs = 1 and as = 2. This result shows that the problem can be solved in certain cases. But what are those cases? The simple answer is that to solve Ax = b, the data vector b must be some combination of the columns of A. In equation 4.4, we see that the data vector is two of the first column plus one of the second column. In the previous case for two faults in equation 4.3, it is clear that b was not the result of any combination of the columns.

4.1.3 Four Faults

What if there are 4 faults in our system? We can also set this problem up and try to get a solution. Of course, we try to determine the unknowns using elimination.
\[
\begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \\ d_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 2 & 2 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \\ d_s \end{bmatrix}
= \begin{bmatrix} 4 \\ 1 \\ 1 \end{bmatrix}. \tag{4.5}
\]

The last row tells us that 2cs + 2ds = 1. This is a sensible equation, so we expect to be able to get a solution for cs and ds. In fact, we find an infinite number of solutions because all we know is that cs = 1/2 − ds. Pick any value for ds, and we get a value for cs. We can use these values and back substitution to determine bs and as. Unfortunately, these values for as, bs, and cs will change with our arbitrary choice for ds. To sum up, our results so far tell us that a rectangular coefficients matrix may give a unique solution, no solution, or an infinite number of solutions, depending on the data vector and the number of columns. There is a lot still to understand.
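MATLAB® makes the same diagnosis quickly; a short sketch using the F4 matrix from the MATLAB moment at the start of the chapter:

a = [1;0;0]; b = [2;1;1]; c = [1;-1;1]; d = [3;-1;1];
F4 = [a, b, c, d];
rank(F4)     % returns 3: only three independent columns for four unknowns
null(F4)     % one basis vector, about [-0.82; 0; -0.41; 0.41]; adding any
             % multiple of it to one solution produces another solution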


Figure 4.2: Diagram of a three-point problem with six locations and elevations. There are six points, though only three are needed to define the orientation of the surface. For this reason, the problem is now over-constrained. We might expect a solution for any choice of three points since three points define a plane. The complete system is shown in equation 4.6.

4.2 THE THREE-POINT PROBLEM WITH MORE POINTS

Let's go back to the three-point problem to gain insight from an example we have studied. We start by adding three more points to the three-point problem we saw in Chapter 1. The problem in the form Ax = b is given in equation 4.6.
\[
\begin{bmatrix}
5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1
\end{bmatrix}
\begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}
= \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 50 \end{bmatrix}. \tag{4.6}
\]

We repeat the problem setup in Figure 4.2, but now with the additional points. Our lessons from the previous examples lead us to think that we might get one unique solution, no solution, or possibly infinite solutions. So we go ahead and do elimination on the 6 points now instead of 3, and get the following result:
\[
\begin{bmatrix}
5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1
\end{bmatrix}
\begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}
= \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 50 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix}
5 & 70 & 1 \\ 0 & -930 & -13 \\ 0 & 0 & 0.6 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}
= \begin{bmatrix} 30 \\ -330 \\ -5.5 \\ -0.53 \\ -0.27 \\ -25.5 \end{bmatrix}. \tag{4.7}
\]


No solution! Because the last three components of the data vector did not all go to zero, there can be no solution. But we should still be able to find the plane that includes any choice of three points, right? Let's try to get an answer using points 1, 4, and 5. The coefficients matrix and the results after elimination are shown in equation 4.8.
\[
\begin{bmatrix} 5 & 70 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \end{bmatrix}
\begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}
= \begin{bmatrix} 30 \\ 90 \\ 60 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 5 & 70 & 1 \\ 0 & -1010 & -14 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}
= \begin{bmatrix} 30 \\ -360 \\ 0 \end{bmatrix}. \tag{4.8}
\]

We immediately encounter a problem: the last row is all zeros. This means we cannot get a unique solution because there is no constraint on the value of z0; we are free to assume any value we want for z0. What went wrong? In terms of linear algebra, the third row must have been a combination of the first two, so the rows of the matrix and, therefore, the columns are not independent. The matrix is square and 3 × 3 but is only of rank 2. This means that the coefficients matrix is singular, that is, not invertible, and thus we cannot solve the equation Ax = b. We could have guessed this by inspecting Figure 4.2. Points 1, 4, and 5 are collinear in map view. Because the last entry of b went to 0, the points must also be collinear in three dimensions. We know that if this is the case, we will not get a solution to the three-point problem. They form a line in space and not a plane.

MATLAB® moment—Dimensions of a vector or matrix
We will need to know the size of vectors and matrices when we want to make sure they fit together in an equation. Doing this is simple using the MATLAB® functions size() and length(). We will use the matrices and vectors mentioned for faults.
>>size(a) gives >>3 1 as the number of rows then columns, just the way we would write its size as 3 × 1.
>>length(a) gives >>3, which is the size of the longest dimension of the array.

So the previous example shows that 6 points do not solve a three-point problem and that sometimes neither do 3 points. When do we get a solution? In the last chapter, we had the same number of equations as unknowns, but now we see that does not always work. In the previous chapter, all the matrices were full rank; in this chapter, they may not be. The rank of the matrix must be important.
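A fast way to anticipate these outcomes in MATLAB® is to check ranks; a sketch using the matrices from equations 4.6 and 4.8:

A6 = [ 5 70 1; 70 50 1; 60 10 1; 75 40 1; 40 55 1; 70 20 1];
A3 = [ 5 70 1; 75 40 1; 40 55 1];   % points 1, 4, and 5 only
rank(A6)   % returns 3: independent columns, but six equations cannot all be satisfied
rank(A3)   % returns 2: the three points are collinear, so the matrix is singular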


MATLAB® moment—What happens when a matrix is singular?
The three-point problem in equation 4.8 is singular. What happens if we try to solve the problem using MATLAB®? We use A as the design matrix in equation 4.8 and b for the data vector. Let's try to solve for the unknowns using the \ method.
>>A\b gives a warning in red type:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 6.597971e-19.
Or: Warning: Matrix is singular to working precision.
Any time MATLAB® outputs RED TEXT it means something went wrong. In this case, the matrix is singular. Using the inv() function gives the same warning.

4.3 THE VECTOR b MUST BE IN THE COLUMN SPACE OF A TO FIND x

Given the previous examples, we are now wondering what determines whether Ax = b has a solution. A concise mathematical statement would be that the vector b must be in the column space of A for there to be a solution x. This is the more formal way of saying that the data vector b must be a combination of the columns of the coefficients matrix. What that means and how it is determined will require a bit of discussion, primarily focused on the more abstract parts of linear algebra. We must establish several new definitions and connections to understand the solvability of Ax = b. But the payoff is huge: with this understanding, we can design the experiments and make the measurements we need to solve geologic problems and know when we don't have enough information to make a sound scientific interpretation. Once we have a few new ideas, we will return to the cases described previously in Section 4.5 and see what we can glean from them.

4.3.1 Vector Spaces and Subspaces

To understand what the column space of A is, we need to step back and ask, What do we mean when we call something a Vector Space? It is the space where its elements—called vectors—follow the rules for the addition and multiplication of vectors we discussed in Chapter 1. Calling something a vector space means that any vector resulting from adding the vectors in the space remains in the same space and that if we multiply a vector by a scalar, the resulting vector is also in the same space. We could say that the vector space is Closed under the linear operations of scalar multiplication and vector addition. The vector 0 or zero vector is considered the trivial vector space.


Figure 4.3: Diagram of a part of R3 that is a subspace on the left and not on the right. On the left, the subspace is the coordinate plane of u-v where w = 0. Vectors a and b can be added or scalar multiplied in any way and still be in the subspace. Some examples are shown in the green vectors. On the right, we take that part of the subspace on the +v side as another candidate for a subspace. With the same vectors a and b, we can compute products and sums that are both in (green) and out (orange) of the proposed subspace. Thus, the right part of the figure is not a subspace.

For this book, we will work with Real Vector Spaces, one where the vectors consist of real numbers. Our vector spaces are constructed in the n-dimensional space of the Real numbers Rn . Vectors in a vector space must all be the same size, meaning each vector must have n components. Just as rows and columns give the size of a matrix, the number of components gives the size of a vector. And when we use the number of components, we are also referring to the number of Dimensions that the vector is in. A vector with 6 components is in the vector space R6 . This is even true if some or all the components are zero. In fact, a vector space must contain 0, the zero vector. What is a Subspace? A subspace is the part of a vector space spanned by a particular set of vectors. For example, two independent vectors with three components define a plane in three dimensions. This plane is a two-dimensional subspace of R3 , as shown in Figure 4.3. On the left-hand side, we have two vectors a and b that lie on the u-v plane but are vectors in R3 in the u-v-w coordinate system. Each vector has 3 components, but the w-component is zero. Any multiple or sum of these two vectors is still in the plane, so the plane is a subspace. We have the same two vectors on the right-hand side of Figure 4.3, but now we are asking whether the +v side of the u-v plane is a subspace. No. This is not a subspace because multiples and sums of the vectors starting in the +v sector can lie outside the sector, so they violate the idea of closure for vector spaces and subspaces. If we had a third vector with a nonzero w component, the vectors could fill all of R3 . Even in this case, we could refer to the vector space as a subspace. We broached the subject of subspaces before discussing the Span of a set of vectors. Let’s do a span refresher. In the previous case, we have two vectors in three


dimensions, but their span is only a plane. The vectors in Figure 4.3 are in one of the coordinate planes, but we could have just as easily made them at any orientation. Regardless, the span of two vectors is, at most, a plane or two-dimensional surface in however many dimensions we have for the vector space. For example, if we have two vectors with six components each, then the most they can span is two dimensions in a six-dimensional space. It is hard to visualize R6, so we will stick with three dimensions and R3 for now and in our following examples.

4.3.2 Dimension, Rank, and Basis

We will describe the dimension of a vector and subspace to avoid confusion. A column or row vector represents a one-dimensional line in space. That part is easy. Any combination or multiple of the vector with itself stays on the same line, and nothing we do can get off it. On the other hand, a vector can have any number of components. This means that a one-dimensional vector with n components is located in Rn but has a span of only one dimension. We would say that the vector defines a one-dimensional subspace of Rn. Likewise, a plane has two dimensions, and a volume has three in any vector space. The space we are in depends on the number of components, so n components mean that we are in Rn, but a line fills only a one-dimensional subspace, a plane a two-dimensional subspace, and so forth.

Suppose we have a square matrix A that is n × n. The matrix will be formed from n vectors of length n. We have defined the rank of the matrix to be the number of independent columns, which is also the same as the number of independent rows. We call the matrix full-rank if the n × n matrix has all independent columns. The columns of the full-rank matrix can be linearly combined to fill all of Rn. We would say the columns of the matrix are a Basis for the space. A basis is a set of n independent vectors that span the n-dimensional vector space.

We commonly see a particular convention when we have the basis vectors to span a subspace. These vectors are turned into vectors with length one. This is called Normalization. This is a straightforward process that means finding the norm of the vector and then dividing all of the components by the norm. Here is a good example:
\[
\mathbf{b} = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix},\quad
\lVert \mathbf{b} \rVert = \sqrt{\mathbf{b}^{T}\mathbf{b}} = \sqrt{9} = 3
\;\rightarrow\;
\frac{\mathbf{b}}{\lVert \mathbf{b} \rVert} = \frac{1}{3}\begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2/3 \\ 2/3 \\ 1/3 \end{bmatrix} = \hat{\mathbf{b}}. \tag{4.9}
\]

We indicate that a vector has been normalized to unit length by putting the hat symbol over it; we call our unit vector b̂ "b-hat." A rectangular matrix A has m rows and n columns. The column rank of the matrix is the number of independent columns, which must be equal to the number of independent rows, meaning that the row rank is equal to the column rank for a rectangular matrix. Regardless of whether we are taking the column view or the row view, the rank of the matrix is the same. For this reason, we talk about the rank of the matrix, whether it is rectangular or square.


We will eventually relate the solvability and number of solutions to the rank and size of the coefficients matrix. The only additional aspect we must consider is whether m > n, m = n, or m < n. We review all the possibilities in Section 4.5 and summarize when and how we solve Ax = b.

MATLAB® moment—Normalizing a vector
The easiest way to calculate the norm of a vector in MATLAB® is to use the norm() function.
>>bhat=b/norm(b) gives b̂ = [2/3; 2/3; 1/3].

We can see what is meant by b being a combination of the columns of A by considering the following example for Ax = b. Let the vectors a and b from Figure 4.3 form the columns of matrix A in equation 4.10. " # 2 −1 2 −1 2x − 1y b1 x           A = 2 1  , x = → Ax = x 2 + y  1  = 2x + 1y  = b2  = b y 0 0 0 0 0x + 0y b3 (4.10) 



 









 

The vector b is a sum of the first column of A times the first component of x and the second column of A times the second component of x. The columns of A have zero as their w component, meaning that only vectors b with the third component of 0 can have solutions. If the third component of b is not zero, then there is no vector x whose components can multiply the columns of A to give b. So what is the Column Space of A? The column space is the subspace Spanned by the Column Vectors that form A, which is the u-v plane as shown in Figure 4.3. We denote the column space of the matrix A as C(A) .
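A practical way to test whether a given b lies in the column space is to compare ranks; a MATLAB® sketch with data vectors chosen only for illustration:

A  = [2 -1; 2 1; 0 0];
b1 = [1; 3; 0];              % third component zero, so b1 lies in C(A)
b2 = [1; 3; 1];              % nonzero third component cannot be reached
rank([A, b1]) == rank(A)     % true:  Ax = b1 is solvable (x = [1; 1])
rank([A, b2]) == rank(A)     % false: Ax = b2 has no solution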


For an m × n matrix, each column will have m components, the number of rows in the matrix. So the column space is always in Rm or a subspace of Rm. In the previous example, we showed two vectors in R3. With two vectors, we only span two dimensions, which is still a subspace of R3. If we have a single vector, we span only a single dimension: a line containing all scalar multiples of the single vector. Of course, we could span all of R3 if we have three independent vectors with three components. Do three vectors with three components always span R3? No. If two vectors are the same or the third is a combination of the other two, we can only span two dimensions. If all three are the same or multiples of one another, they only span one dimension. We show in equation 4.11 an example of matrices A and B that are different shapes but have the same column space. In each case, the column vector has 3 components, so C(A) must be in R3. It does not matter how many columns are in the matrix; the column space must be all of R3 or a subspace of R3. Thus, the matrix B has 3 columns but only spans the same one-dimensional subspace spanned by matrix A.
\[
A = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\;\rightarrow\;
C(A) = c_1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \simeq C(B) = c_2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \tag{4.11}
\]

So, these two matrices have the same column space. They differ in their row spaces, but that will be discussed later. In our example in Figure 4.3, the two vectors have three components but only span a plane. When those vectors are part of a matrix, the dimension of the span equals the rank, the number of independent columns. Let's get to a few more examples of vectors in R3 with different numbers of columns. For each of these, we will give the rank of the matrix, calling it r, and what configuration is displayed by the column space.
\[
\underbrace{\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix},\; r = 1}_{\text{line in } \mathbb{R}^3};\qquad
\underbrace{\begin{bmatrix} 1 & 0 \\ 3 & 0 \\ 1 & 1 \end{bmatrix},\; r = 2}_{\text{plane in } \mathbb{R}^3};\qquad
\underbrace{\begin{bmatrix} 1 & 0 & 1 \\ 3 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix},\; r = 3}_{\text{volume in } \mathbb{R}^3}. \tag{4.12}
\]

What if there are many more columns than rows? In this case, the dimension of C(A) cannot exceed the number of components in the columns, which is the same as the number of rows. This also means that the number of independent columns is the same or less than the number of rows. So the C(A) is either all of or a subspace of Rm.
\[
A = \begin{bmatrix} 1 & 1 & 1 & 3 \\ 3 & 0 & 0 & 1 \\ -1 & 1 & -1 & 1 \end{bmatrix},\quad r = 3,\quad
C(A) = c_1\begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}. \tag{4.13}
\]

For the matrix in equation 4.13, we have the C(A) in R3 as there are only 3 components. This also means that, at most, 3 of the columns can be independent. The rank of the matrix is 3.
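These rank statements are easy to spot-check in MATLAB®:

rank([1; 3; 1])                        % 1: one column spans a line
rank([1 0; 3 0; 1 1])                  % 2: two independent columns span a plane
rank([1 0 1; 3 0 -1; 1 1 1])           % 3: three independent columns fill R^3
rank([1 1 1 3; 3 0 0 1; -1 1 -1 1])    % 3: four columns, but at most three are independent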


In summary, suppose we have the m × n matrix A. The columns of A have m components, so the column space of A must be in m dimensions. However, we have n total columns. If n < m, the column space can fill or span n dimensions at most. If n = m, the column space could fill all the m dimensions provided the columns are independent. And if n > m, then the column space may span all of m dimensions (Rm), but never more than this many dimensions. For this reason, C(A) is in m-dimensional vector space but can only span an m or fewer dimensional subspace.

When we are presented with the equation Ax = b, the columns of A are multiplied by the components of x, and the sum forms the data vector b. This emphasizes again that the operation Ax produces a vector that is a combination of the columns of A. This also means that if b is in the column space of A, then there is a solution vector x that solves Ax = b. There could also be infinite solutions, but that requires more discussion.

MATLAB® moment—Finding the column space
There is no single command to pick the columns forming the column space of a matrix. What you easily can get out of MATLAB® is an orthonormal basis for the column space. We will not take up orthonormality in detail until a later chapter, but this term means that the vectors are perpendicular and unit length. For the normal x-y-z coordinate system, the I3 matrix is an orthonormal basis. The command is orth(), and it returns an array that is the basis for the column space.
>> r=orth(F4) gives r = [−0.924 0 0.383; 0 −1 0; −0.383 0 −0.924].
We get three column vectors for the column space since this matrix is rank 3.

4.4.2 The Solution Vector x and Null Space are in a Different Vector Space from C(A)

The data vector b is in the column space of A, which is in a vector space Rm . What vector space does the solution vector x occupy? The vector x has the same number of components as A has columns, so it must be in Rn . The number of columns equals the number of components in each row, so the row space of the matrix A is also a subspace of Rn . We will not discuss the row space quite yet but mention it so that in the following sections, the reader will not be surprised when the row space and x are plotted on a different set of axes from C(A). The null space of A is also a subspace of Rn . To motivate this new subspace, we ask the question: Is there a vector b for which there is always a solution to the equation Ax = b? Yes! If b is the zero vector, then we can always find a solution to Ax = 0 even if it means that x must also be the zero vector. When we solve


for b = 0, we are solving for all vectors x that fit this criterion. The x vectors that are solutions form a subspace of Rn that we call the Null Space. The span of this set of vectors tells us what subspace the null space occupies. We symbolize the null space of A as N(A). Solving Ax = 0 is like finding the Homogeneous Solution to a differential equation. The N(A) is also referred to as the Kernel of the matrix.

The idea of a null space is probably new to most readers. It is fundamental to deeply understanding the equation Ax = b. To clarify its significance and the solvability of Ax = b, we will use a series of examples for a 2 × 3 matrix and several 3 × 3 matrices. We will thoroughly explore the vector spaces, column and null spaces, and conditions on the data vector and the unknowns in x. As stated previously, matrices of this size have the advantage that we can visualize them in a u-v-w plot showing the extent of the column space and an x1-x2-x3 plot for the null space. If you understand the arguments for determining the column and null spaces in 2 or 3 dimensions, then there should be no trouble calculating them for 50 dimensions. We also use the coordinate system of x1-x2-···-xn for the N(A) to emphasize the connection between the vector x, the null space, and Rn.

MATLAB® moment—Finding the null space
There is a simple command to find the null space of a matrix. It is null() and returns an array that is the basis for the null space. This will be an orthonormal basis. We will discuss orthonormality in a later chapter. Take b as a vector in Rn.
>>r=null(b) gives r = 1 × 0 empty double row vector. The null space of b is only the zero vector, which MATLAB® omits from its output.
>>r=null(F4) gives r = [−0.816; 0; −0.408; 0.408].
We only get one vector for the null space because this matrix is rank 3 but has 4 columns.

4.4.3 Finding the Solution for a Rectangular Matrix

We will start with the 2 × 3 matrix A in Figure 4.4. The column space for this setup is in R2 , whereas the null space is in R3 . The column space is based on the number of components in the columns, which is 2, the same as the number of rows. The null space is based on the length of the solution vector. This vector x multiplies the columns of our matrix A, so it must have 3 components. And, the column space is not an R2 subspace of the R3 null space. The C(A) and N(A) are in separate vector spaces entirely. We adopt a standard color scheme for the vectors in


Figure 4.4: The column and null space for a 2 × 3 matrix. For A, the column space comprises all multiples of the column vectors of the matrix. Only one of the columns is independent. The red vector shows the basis for the column space in a u-v axis plot. The column space is just a line in R2. The null space is on the right in R3. Two vectors for the null space are shown in green, as is the plane they span. We also show the possible space for the solution vector xp, where the subscript stands for particular. We pick the line containing the x1-axis, shown in the blue arrow, with the variables x2 and x3 equal to zero.

Figure 4.4. Red vectors are in the column space, and green vectors are in the null space. If the column space or null space spans two dimensions, we color the corresponding plane red or green, respectively. For the solution space that results from Gaussian elimination, we use blue. To get the results shown in Figure 4.4, what computations must we make? First, we determine the C(A), which is simply the first column of A. From this, we see that C(A) is one dimensional, a line made by taking all multiples of the first column of A as shown by the red arrow in the left part of Figure 4.4. The other columns are dependent, so there is only one linearly independent column, and the rank of the matrix is 1. Now, we can rewrite the equation Ax = b in terms of the unknowns, the vectors for the column space, and the data vector.
\[
A\mathbf{x} = x_1\begin{bmatrix} 3 \\ 3 \end{bmatrix} + x_2\begin{bmatrix} 3 \\ 3 \end{bmatrix} + x_3\begin{bmatrix} 3 \\ 3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \mathbf{b}. \tag{4.14}
\]

Just from inspection of equation 4.14, we can see that for the vector b to be in C(A), both components of it must be the same. So, for there to be a solution, we have b1 = b2. We now do Gaussian elimination on the augmented matrix to see what we can determine about the solution vector x.
\[
\left[\begin{array}{ccc|c} 3 & 3 & 3 & b_1 \\ 3 & 3 & 3 & b_2 = b_1 \end{array}\right]
\;\rightarrow\;
\left[\begin{array}{ccc|c} 3 & 3 & 3 & b_1 \\ 0 & 0 & 0 & 0 \end{array}\right]
\;\rightarrow\;
\begin{bmatrix} 3 & 3 & 3 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} b_1 \\ 0 \end{bmatrix}. \tag{4.15}
\]


This equation has a single pivot, the 3 on the left. This means that we have one Pivot Column. This pivot column corresponds to the variable x1 of the unknowns vector x. We call x1 the Pivot Variable. We highlighted the 0s in the second and third columns in the second row. These columns are called the Free Columns, and the variables x2 and x3 will be called Free Variables. The equations we have to solve are: 3x1 + 3x2 + 3x3 = b

(4.16)

0x1 + 0x2 + 0x3 = 0.

(4.17)

We call the resulting value simply b because the data vector has b1 = b2 for the system to have a solution. Equation 4.16 shows that we need to have the condition x1 = b/3 − (x2 + x3) to get a solution. Equation 4.17 shows that we could set x2 and x3 to any value we want, and we can still solve for x1 in equation 4.16. This tells us that there must be an infinite number of solutions because, for any value of b, we could pick an infinite combination of values for x1, x2, and x3. This result seems like it could be more enlightening, so we reorganize our analysis around the variables and what we know about them from elimination. We identified pivot and free variables. Our approach will be to find solutions for the pivot variables by setting the value of the free variables to zero. This changes equation 4.16 to:
\[
3x_1 + 3(0) + 3(0) = b \;\rightarrow\; b = 3x_1,\quad
\mathbf{b} = \begin{bmatrix} 3x_1 \\ 3x_1 \end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix}. \tag{4.18}
\]

Let's return to equation 4.14 and set both components of b, our data vector, to 9. This is clearly in the column space, so we must have a solution. The 9 is arbitrary and could have easily been -2 or 25. Setting the free variables to zero will give us a unique solution. We call such a solution a Particular Solution or xp.
\[
\mathbf{b} = \begin{bmatrix} 9 \\ 9 \end{bmatrix} = x_1\begin{bmatrix} 3 \\ 3 \end{bmatrix}
\;\rightarrow\; x_1 = 3,\quad
\mathbf{x}_p = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}. \tag{4.19}
\]

The result for b means that the solution for x1 is 3 and 0 for x2 and x3 . This is our particular solution, which solves for the pivot variables in matrix A after elimination. This means that the solution vector x must be on the x1 axis, as shown in the blue vector on the right-hand side of Figure 4.4. The solution occupies just one dimension: a one-dimensional subspace of R3 . What fills the other two dimensions? The null space. Is the particular solution unique? No, it is not. For this example, we could have arranged the columns differently and had the quantity associated with x2 in the first column. This would show that x2 is 3 as the pivot variable and x1 = x3 = 0 for our particular solution. So, the particular solution depends on how we arrange the columns. In addition, we could have set x1 = x2 = 0 and arrived at a particular solution using only x3 . This is only true if the matrix is not full column rank and


we have some free variables. We should get a single, unique solution if the matrix is full-rank. In this problem, we have two free variables and one pivot variable. We must explore the null space for a complete solution encompassing all possible values for x. We solve the equation Ax = 0 using pivot and free variables. Our practice will be to set one of the free variables to 1 and any others to zero. We proceed to compute a solution vector one free variable at a time. This provides a solution for the pivot variables using one of the free variables. x3 = 1, x2 = 0 → x1 = −1,

(4.20)

x2 = 1, x3 = 0 → x1 = −1,

(4.21)









\[
N(A) = c_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \mathbf{x}_s. \tag{4.22}
\]

This gives us vectors that we can call the Special Solutions or xs to the problem. These are the solutions from the null space and consist of the same number of vectors as we have free columns. Of course, if we put in 1 for the free variable and solve it, we could always put in any other real number and get a solution. For this reason, we put a scalar in front of the special vectors to show that any multiple of the vectors is also in the null space. And any combination of multiples of the vectors in the null space is also in the null space. Thus, in equation 4.22, we have two independent vectors that Span the Null Space, again shown in Figure 4.4. These vectors form a Basis for the Null Space. What can we do with the vectors in the null space relative to xp? These vectors must have at least one non-zero value for x2 and x3 to reach beyond just the x1 axis. Let's reform our solution using two vectors x1 and x2, where x1 is just the solution vector x from equation 4.19 and x2 is another vector where there are non-zero values for x2 and x3, so it is in the span of the null space.
\[
A(\mathbf{x}_1 + \mathbf{x}_2) = \mathbf{b} \;\rightarrow\; A\mathbf{x}_1 + A\mathbf{x}_2 = \mathbf{b}. \tag{4.23}
\]
Then we get our total solution for the unknowns vector x:
\[
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_s = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}
+ c_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}
\quad\text{for}\quad \mathbf{b} = \begin{bmatrix} 9 \\ 9 \end{bmatrix}. \tag{4.24}
\]

We can see that this works because the particular solution gives us the result for b, and any addition from the null space gives us zero. The particular solution does not have a leading scalar; it solves for a specific b, so it must be represented by a single point (not a subspace). In equation 4.25, any multiples and sums of the null space vectors in xs give zero.
\[
\begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}\begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 9 \\ 9 \end{bmatrix},\quad
c_2\begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\quad\text{and}\quad
c_3\begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{4.25}
\]
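A quick numerical check of equations 4.24 and 4.25 in MATLAB® (the values of c2 and c3 below are arbitrary):

A  = [3 3 3; 3 3 3];
xp = [3; 0; 0];                       % particular solution for b = [9; 9]
xs = 2*[-1; 1; 0] + 5*[-1; 0; 1];     % arbitrary combination of the special solutions
A*xp          % returns [9; 9]
A*(xp + xs)   % also returns [9; 9]: the null-space part contributes nothing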


Figure 4.5: Examples of the total solution vector x as combinations of the particular and null space solutions. In all cases, the particular solution must be a point, shown in blue along the x1 axis. Left shows an xp at 1 for x1. Right shows the xp at −1 for x1. We can add any vector from xs, a combination of the vectors defining the null space. These vectors are shown in green. The plane of these vectors, shown in green, no longer coincides with the origin but is shifted to be centered on xp. The final solution for x is shown in the green dot and is just the sum of xp with xs. Note that the shifted plane is no longer a subspace because it does not contain the zero vector.

So, no matter what we do with the constants c2 and c3 , the solutions from the N(A) always give the zero vector and add nothing to the particular solution. The particular and special solutions span the vector space occupied by x. For the previous example, the particular solution must be along the x1 axis, meaning that it is one-dimensional. The special solution consists of two independent vectors that span a two-dimensional plane. This means that a combination of xp and xs can fill all of R3 . Figure 4.5 shows examples where the particular solution is a point for two different vectors b. The particular solution xp is restricted to the x1 axis. To this solution, we can add a vector from the null space by setting values for c2 and c3 in equation 4.22. Doing this forms a plane of solutions that is centered on xp but stretches to other values of the total solution for x in combinations of xp with xs . Note well that Ax = Axp = b, where x = xp + xs , in both cases. Figure 4.5 is intended to show in detail how the equation x = xp + xs plots in three dimensions. A similar figure could be constructed for any other system in any number of dimensions. The other feature evident from the figure is that the vectors defined by the special and particular solutions span all of R3 . Since we have determined that the particular solution is not unique, what about the special solution? We can investigate this by setting x2 as the pivot variable and


Figure 4.6: The null and particular solution spaces for the matrix A with x2 chosen to be the pivot variable. The main diagram shows the vectors spanning the null and solution spaces and the equations for xp and xs. The lower right inset shows the same configuration with x1 as the pivot as in Figure 4.4.

solving for x1 and x3 as free variables. When we do this, we get the result:
\[
x_3 = 1,\; x_1 = 0 \;\rightarrow\; x_2 = -1; \qquad
x_1 = 1,\; x_3 = 0 \;\rightarrow\; x_2 = -1, \tag{4.26}
\]
\[
N(A) = c_1\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \mathbf{x}_s. \tag{4.27}
\]

Comparing equation 4.27 with equation 4.22 from the previous xs, we see that the results are different. This, of course, confirms that neither xp nor xs are unique. What about the null space? We show these two cases in Figure 4.6 and see that the vectors plot differently, but the null space looks the same. If we call the first special solution xs1 and the second xs2, then we just need to see that xs2 spans xs1 and conversely that xs1 spans xs2. We will show this for the first condition by demonstrating that the two vectors that define xs1 can be constructed using the vectors in xs2. We have two equations to satisfy:
\[
\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = a\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + b\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} = c\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} + d\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}. \tag{4.28}
\]

These are easily solved, giving a = 1, b = −1, c = 0, and d = −1. We also need to show the converse, but it is just as simple, so we will skip it here. Because we can always do this, we know that changing the pivot and free variables will not change the null space of a system. In addition, because xp is not in the span of N(A), we know that xp and xs will span R3.


Figure 4.7: The column space and null space for a full rank 3 × 3 matrix. For the matrix A in this example, the column space comprises all multiples of the column vectors of the matrix. The columns are independent. The three red vectors show the basis for the column space. The column space fills all of R3. The unknowns vector x has a span that also fills R3 but with different axes and possibly units from the column space. The only null space is the zero vector, shown by the green dot at the origin on the right.

4.4.4 The N(A) and C(A) for Some Square Matrices

Now, we will start to work in the C(A) and N(A) of square matrices. The approach and tools we have developed with a rectangular matrix transfer to the square case. We know the spaces are in Rn, so it is tempting to think of the column and null spaces as sharing a vector space. They do not. They are in separate spaces, but they share some similarities based on the rank of the matrix.

Full rank matrix
Our first example is a 3 × 3, full rank matrix in Figure 4.7. The columns have 3 components; thus, the column space is a subspace of R3. Because it is full rank, the column space spans all of R3, and we can pick any data vector b and find a solution vector x that will produce it. If b is the zero vector, then the only possible vector x is the zero vector, which plots at the origin. The vector x also has three components and is in an R3 vector space. However, this is a different vector space from the column space in R3. The column and solution space fill R3. The column space is the span of the three columns of the matrix A and is referenced in whatever units are used for u-v-w. The solution space for x is the span of the unit vectors in the x1, x2, and x3 directions and is in three dimensions. The null space is the single point at the origin, and the only vector in the null space is the zero vector.

Rank 2 matrix
Now let's consider a square matrix that is not full rank in Figure 4.8. The matrix B has the same second and third columns. They are not independent, and the rank of the matrix is 2. The column space is only two-dimensional and spanned by the


Figure 4.8: The column space and null space for a 3 × 3 matrix of rank 2. For the matrix B in this example, the column space comprises all linear combinations of the first two column vectors of the matrix. The two red vectors show the basis for the column space. The column space fills a 2-dimensional subspace of R3. The extent of the C(B) is shown by the red plane that contains the u and v axes. The green vector shows the null space. It is in the same space as the solution vectors shown in blue. These two vectors span the x1-x2 plane.

first two columns. These columns contain the u and v axes, as shown in the left part of Figure 4.8. We must determine possible solution vectors x. The components x1 and x2 are the pivot variables, and x3 is the only free variable. The solution space is spanned by all combinations of the x1 and x2 . Since these can take on any value, the solution space spans the entire x1 -x2 coordinate plane as shown on the right in Figure 4.8. This also means that our data vector has conditions on it as well. We can have b1 and b2 take on any value, but we can only solve the equation if b3 = 0. Now, we look for the null space. We get the three equations for solving Ax = 0. 3x1 + 0x2 + 0x3 = 0,

(4.29)

0x1 + 3x2 + 3x3 = 0,

(4.30)

0x1 + 0x2 + 0x3 = 0.

(4.31)

Since we have one free variable, we know the null space must be one-dimensional. We set the free variable x3 = 1. From equation 4.30, we get that x2 = −1. Plugging these values into equation 4.29 gives us x1 = 0. So the null space must be:
\[
N(B) = c_1\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \mathbf{x}_s. \tag{4.32}
\]

This vector is plotted on the right side of Figure 4.8. Finally, we give the complete solution for Ax = b. In this equation, the values of 3x1 and 3x2 are set to the b1 and b2 components of the data vector b.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ x_2 \\ 0 \end{bmatrix}
+ c_1\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}. \tag{4.33}
\]
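In MATLAB®, with B as above and a data vector whose third component is zero (the choice b = [6; 9; 0] is just for illustration):

B  = [3 0 0; 0 3 3; 0 0 0];
b  = [6; 9; 0];                 % b3 must be 0 for a solution to exist
xp = [2; 3; 0];                 % particular solution: 3*x1 = 6, 3*x2 = 9, x3 = 0
xs = [0; -1; 1];                % null-space direction from equation 4.32
B*xp           % returns [6; 9; 0]
B*(xp + 4*xs)  % also returns [6; 9; 0]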


Figure 4.9: The column space and null space for a 3 × 3 matrix of rank 1. For the matrix C in this example, the column space comprises all multiples of the first column vector. The red vector shows the basis for the column space. The column space fills a 1-dimensional subspace of R3. The null space is shown by the green plane that represents the span of the two green vectors in the right-hand drawing. Notice that the diagram on the right is identical to the previous case for a 2 × 3 matrix shown in Figure 4.4.

Rank 1 matrix
Our last example is of a rank 1 matrix in R3. We have set the columns of C to be all the same. In this case, the column space is only multiples of the first column. Solving Ax = 0 must give us two independent vectors that span the null space. Figure 4.9 shows the column and null space. We will show the steps in computing the null space. We start with the matrix C and do our standard elimination steps. We do not need to augment the data vector because it is the zero vector, and elimination cannot change this.
\[
C = \begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 3 & 3 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 3 & 3 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \mathbf{0}. \tag{4.34}
\]

The equations that we would solve for this would be: 3x1 + 3x2 + 3x3 = 0,

(4.35)

0x1 + 0x2 + 0x3 = 0,

(4.36)

0x1 + 0x2 + 0x3 = 0.

(4.37)

There are two free variables, x2 and x3 . We set one to 1 and the other to 0 and solve for each variable. This will give us two solutions. We can write down the null space


of C as shown in equation 4.40 and Figure 4.9. x2 = 0, x3 = 1 → x1 = −1.

(4.38)

x2 = 1, x3 = 0 → x1 = −1.

(4.39)









\[
N(C) = c_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \mathbf{x}_s. \tag{4.40}
\]

We show these special solutions as xs. And we can get a particular solution, xp, when b is in the column space of C. Suppose that all the components of our data vector are 9. This is clearly in the column space, so we must have a unique particular solution. Then, our total solution for the unknowns vector x is
\[
\mathbf{b} = \begin{bmatrix} 9 \\ 9 \\ 9 \end{bmatrix}
\;\rightarrow\;
\mathbf{x} = \mathbf{x}_p + \mathbf{x}_s = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}
+ c_2\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \tag{4.41}
\]

This result should seem quite familiar. Equation 4.41 is precisely the same result we got previously in equation 4.24. How can this be? In the first example, we were solving for a 2 × 3 coefficients matrix, and now our matrix C is 3 × 3. How can we possibly get the same solution? The solution space for the unknowns is similar in both cases: it is a subspace of R3. We are solving a problem that requires 3 components in the solution. That the column space is in R2 in the former example and R3 in this example does not change the fact that the solution vector x is in R3.

4.4.5 The Row Space of A is the Column Space of the Transpose of A

Like the definition of the column space, the Row Space occupies the vector space with dimensions corresponding to the number of components in the rows. If each row has three components, the A row space is in R3 . However, the row space is further defined as the subspace spanned by the rows of A, so it is possibly all of R3 or a zero, one, or two-dimension subspace of R3 . The Independent Rows Span the row space in the matrix. This all should sound very familiar from our development of the column space. We adopt another important convention to describe our matrix’s rows and row space. Instead of reworking our diagrams and rotating our thinking by 90° for rows instead of columns, we use AT. Taking the transpose of a matrix simply puts the rows into the columns we are used to working with. Thus, the row space of A is the column space of AT and is symbolized C(AT ). For an m × n matrix, the column space is in Rm and the row space in Rn .
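The equality of row rank and column rank is easy to spot-check in MATLAB®; a sketch using the rectangular matrix from section 4.4.3:

A = [3 3 3; 3 3 3];
rank(A)     % 1
rank(A')    % also 1: the row space of A is the column space of A'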


MATLAB® moment—Output of MATLAB® functions
In previous MATLAB® moments, we have noted that the program provides orthonormal bases for both the column and null spaces. What does it give for the rectangular matrix A in section 4.4.3 and Figure 4.4?
>> s=orth(A) gives s = [−0.707; −0.707]. The vector s is parallel to the column space in equation 4.14 and Figure 4.4. It is just the vector [3; 3] normalized to unit length.
>> r=null(A) gives r = [−0.577 −0.577; 0.789 −0.211; −0.211 0.789].
Although the vectors in r look different from those in equation 4.22, they are the basis vectors for the same 2D subspace. An easy way to show this is by taking dot products with the vector t = [1 1 1], which is perpendicular to the null space in equation 4.22. We will show how to find this vector later in the chapter. For now, the dot product of t with any of the vectors in r or from equation 4.22 is zero, demonstrating that those vectors are all coplanar.

Now, we can replicate everything we said in the previous section on column space for the row space. The only difference is that each column of AT has n components, not m. This means that the C(AT) must be either the whole of Rn or a subspace of Rn. If a matrix is full row rank, the rows are all independent, as are the columns of AT. Let's start by looking at the row spaces of the examples in equation 4.11.
\[
A^{T} = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix},\quad
B^{T} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\;\rightarrow\;
C(A^{T}) = c_1\begin{bmatrix} 1 \end{bmatrix} \not\simeq C(B^{T}) = c_2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \tag{4.42}
\]

In this example, the matrices A and B both had column spaces in R3, as shown in equation 4.11. In equation 4.42, their row spaces live in different vector spaces: the row space of A is a subspace of R1, while the row space of B is a subspace of R3. Again, this shows that the row and column spaces belong to different vector spaces. For a matrix that is m × n, the column space is in Rm, and the row space is in Rn. For a matrix n × n, the column space is in Rn, and the row space is in a different vector space Rn. To see this in action, let's also go back to the matrices we used in equations 4.12 and 4.13. We present these as the transposes, so we consider the dimension of the


Figure 4.10: The row space and null space for the matrix B from section 4.4.4. Matrix B is a 3 × 3 matrix of rank 2. This means that the row space, shown as C(B T), spans two dimensions and is the orange plane and vectors in the diagram. The last dimension is spanned by the N(B), shown as the green vector. Note that the null and row spaces are orthogonal, as any dot product between vectors in the two spaces will give zero. The solution space for the vector x is all of the x1-x2 plane and is shown in blue.

column space of the transpose.
\[
\underbrace{\begin{bmatrix} 1 & 3 & 1 \end{bmatrix}}_{\text{scalar in } \mathbb{R}^1},\quad
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{plane in } \mathbb{R}^2},\quad
\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 0 & 1 \\ 1 & -1 & 1 \end{bmatrix}}_{\text{volume in } \mathbb{R}^3},\quad
\underbrace{\begin{bmatrix} 1 & 3 & -1 \\ 1 & 0 & 1 \\ 1 & 0 & -1 \\ 3 & 1 & 1 \end{bmatrix}}_{\text{hyperplane in } \mathbb{R}^4}. \tag{4.43}
\]

The C(AT ) is in R4 for the last example in equation 4.43, but with 3 columns, it can only fill a 3-dimensional subspace. We call this a Hyperplane in R4 because it has no volume. To have volume in R4 , we need four vectors that span this space. Three vectors do not span R4 , so they must be a subspace of only 3 dimensions in 4-dimensional space. We discuss the configurations of higher dimensional shapes in later chapters. We will look at the C(AT ) for two more examples from earlier. We will use the rectangular example in section 4.4.3 and the matrix of rank 2 from section 4.4.4. In Figures 4.10 and 4.11, we show only the results for the row space and null space, both subspaces in the same vector space R3 . Note that we follow the same color scheme for the subspaces but add an orange color to indicate the row space components. We can note something significant in Figure 4.10. The row and null spaces appear perpendicular. For the matrix B, we can verify this by taking the dot products of


Figure 4.11: The row space and null space for matrix C from section 4.4.4. This is a rank 1 matrix. Again, the row and null space occupy the vector space R3. Because there is only one independent column in C T, the C(C T) spans just one dimension. The other two dimensions are spanned by the N(C). Note that the null and row spaces are orthogonal, as any dot product between vectors in the two spaces will give zero. The solution space for the vector x is all of the x1 axis, shown in blue and with a blue vector.

the basis vectors:
\[
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix} = 0,
\quad\text{and}\quad
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix} = 0. \tag{4.44}
\]
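The same check can be done in MATLAB® using the orthonormal bases returned by null() and orth():

B = [3 0 0; 0 3 3; 0 0 0];
s = null(B);        % basis for N(B)
t = orth(B');       % basis for the row space C(B')
t'*s                % inner products are (numerically) zero: the subspaces are orthogonal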

A similar diagram is shown in Figure 4.11 for matrix C from section 4.4.4. This is a rank 1 matrix, and the column space of C T spans just one dimension. Again, the row and null spaces appear perpendicular. Taking dot products of the basis vectors confirms this. The orthogonality is easy to see in these examples, but if we had more rows and columns, we would have to rely on the inner products to be sure. Our conclusion is that C(AT) is orthogonal to N(A). We also show the possible solution spaces in Figures 4.10 and 4.11. For the B matrix in Figure 4.10, the particular solution space is the x1-x2 plane, which spans two dimensions. The row space also spans two dimensions. For the C matrix in Figure 4.11, xp is a line. The row space for this matrix is also one-dimensional. So, it appears that the dimension of the particular solution space is the same as the row space. This is indeed the case. On the other hand, the row space is perpendicular to the null space, but the solution space has no set relation to the other two. Therefore, we can span Rn using basis vectors of the null space and from either the row or solution space. The relationship


of the row to the solution space of Ax = b is a complicated one beyond the scope of this text. Knowing that they are of the same dimension is sufficient for our purposes. The row and null spaces are subspaces of the vector space Rn, where n is the number of columns in the coefficients matrix A. The row subspace has dimension r, the rank of the matrix, and the null space has dimension n − r. The row and null spaces span all of Rn. Likewise, the column space resides in Rm and is a subspace of dimension r. The natural question is, what takes up the other m − r part of the vector space Rm? That is the subject of the next section.

MATLAB® moment—Finding the row space
There is an easy way to get the row space of a matrix. We will use the null() and orth() functions and the matrix B from section 4.4.4.
B = [3 0 0; 0 3 3; 0 0 0], which is a 3 × 3, rank 2 matrix.
>> s=null(B); t=orth(B') gives s = [0; −0.707; 0.707] and t = [0 1; −0.707 0; −0.707 0].
We get a solution vector x = t[x1; x2] + c1 s. We use both B and B T because the null and row spaces are subspaces of the same vector space. If the matrix were m × n, then both the row space and null space are in Rn.

4.4.6 Null Space of the Transpose of A

We have seen that the row and null spaces are subspaces that fill Rn . We now are looking for a similarly disposed companion for the column space in Rm . This last subspace is the Null Space of AT , which we symbolize as N(AT ), and it has dimension m − r. We got the null space of A by solving for the vector x in Ax = 0. And the vector x had n components that we label x1 , x2 , · · · xn . We get the null space of AT by solving for y in ATy = 0. The vector y has m components, y1 , y2 , · · · ym . In doing this, we perform the same operations we did to find N(A) but just work with the rows of A as the columns of AT . This will allow us to solve for the Span of N(AT ). We show in Figure 4.12 examples of rank 1 matrices from sections 4.4.3 and 4.4.4. For the 2 × 3 matrix on the left side of Figure 4.12, the column space and the null space of AT are both in R2 . We can set up the matrix AT and solve for vectors y


Figure 4.12: Vectors in lavender showing the null space of AT. On the left is the 2 × 3, rank 1 matrix. The column space is perpendicular to the null space of AT. The N(AT) is computed by solving ATy = 0. The right side shows another rank 1 matrix in 3 dimensions. The N(C T) and C(C) in this diagram closely resemble the configuration of N(C) and C(C T) in Figures 4.10 and 4.11.

which solve ATy = 0. This gives us the expression for N(AT).
\[
A^{T} = \begin{bmatrix} 3 & 3 \\ 3 & 3 \\ 3 & 3 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 3 & 3 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\quad\text{giving}\quad
\mathbf{y} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \tag{4.45}
\]

Doing this in a similar manner for the 3 × 3 matrix gives us the following expressions for N(C T):
\[
C^{T} = \begin{bmatrix} 3 & 3 & 3 \\ 3 & 3 & 3 \\ 3 & 3 & 3 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 3 & 3 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\quad\text{giving}\quad
\mathbf{y} = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix},\;
\mathbf{y} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}. \tag{4.46}
\]

These results are plotted in Figure 4.12. We follow the same color scheme for the subspaces but add a lavender color to indicate the N(AT) components. We are also back with the u-v-w axes we used for Rm earlier. By our convention, y1 is in the u direction, y2 is in the v direction, and y3 is in the w direction. For the 3 × 3 rank 1 matrix C, we get a result in Figure 4.12 that is very similar to the result for the null space shown in Figure 4.9. The reason for this is clear. In this case, C = C T, so all the spaces are in R3, and the rows are the same as the columns. They look the same, but in Figure 4.12, we are in the u-v-w system of the column space, while in Figure 4.9, we are in the x1-x2-x3 coordinate system we used for the row space.
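As a quick numerical check of this orthogonality, the minimal sketch below builds the rank 1 matrix C, computes bases for C(C) and N(C T) with orth() and null(), and verifies that their inner products vanish. The variable names are ours, not the book's.

C = 3*ones(3);           % the 3 x 3, rank 1 matrix from section 4.4.3
colC = orth(C);          % one basis vector for the column space C(C)
leftnull = null(C');     % two basis vectors for N(C')
colC' * leftnull         % both entries are zero (to round-off): the subspaces are orthogonal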


The column space and null space of the transpose for matrix B from section 4.4.4. This is a rank 2 matrix, so the column space forms a plane in three dimensions. The N(B T ) is one dimensional and perpendicular to the column space. An equation for the N(B T ) vector is given in the figure. Figure 4.13

For completeness, we show in Figure 4.13 the subspace of N(B T ) for the matrix B from section 4.4.4. The column space is in R3 , but the matrix is rank 2. This means that C(B) is a two-dimensional subspace of R3 and N(B T ) is a one-dimensional subspace. Both subspaces are shown in Figure 4.13. Again, we see that C(B) is perpendicular to N(B T ). We can also solve for the null space of AT by taking the transpose of ATy = 0. (ATy)T = 0T → y TA = 0T

(4.47)

In this case, we will find the subspace formed by y T. Because this vector sits on the left-hand side of A (not AT), we commonly call this the Left Null Space of A. In contrast, the vector x is multiplied on the right-hand side of A. In the diagrams in Figures 4.12 and 4.13, we see that the column space of A and the null space of AT appear to be perpendicular. As we did for the null and row spaces earlier, we can take inner products and verify this is true. So we have sets of subspaces that have common properties, namely, the C(A) and N(AT) are orthogonal and both in Rm, while C(AT) and N(A) are orthogonal and in Rn. This section on the N(AT) is short mostly because this subspace does not play a significant role in solving Ax = b when there is a solution. However, this null space becomes critical in dealing with Ax = b when A is full column rank and there is no solution. This is the topic of the next chapter and relies on N(AT)!

4.4.7 Plotting Ax = b in Vector Spaces and Subspaces

We are now ready to summarize all the spaces and subspaces as they pertain to solving Ax = b. We have six different parts of the system to consider. Four revolve around the matrix A. The columns of the matrix span a subspace of Rm that we call the column space C(A). The rows of the matrix span a subspace in Rn that we can


Plots of all the subspaces associated with Bx = b. The matrix B comes from a previous example in section 4.4.4. The left side shows those elements in Rm . This is all vectors of length equal to m or the number of rows in the matrix, including the column vectors of B, the data vector b, and the null space of B T . The right side shows elements in Rn , which is set by the number of columns. Here, we see the row space and null space of B and the solution or unknowns vector x. Figure 4.14

call C(AT). When we think about the row space, we do it by looking at the columns of AT, which puts the rows of the matrix into the columns of the matrix's transpose. Both the C(A) and the C(AT) have a span equal to the rank of the matrix. The rank is the number of independent columns or rows, as these are equal. Thus, the column space spans an r-dimensional subspace of the vector space Rm, and the row space spans an r-dimensional subspace of the vector space Rn. For the C(A), if r < m, what accounts for the other m − r dimensions in the vector space? This is N(AT), which has dimension m − r. Likewise, for C(AT), if r < n, then the N(A) is of dimension n − r in the vector space of dimension n. If r = m, then N(AT) = 0; if r = n, N(A) = 0. The last two vectors in the system for Ax = b are the unknowns and data vectors x and b. The possible values of x and b are in the row and column spaces, respectively. We show these subspaces in Figure 4.14. Our example is the rank 2 matrix B from section 4.4.4, also shown in Figures 4.8, 4.10, 4.11, and 4.13. Notice that for the C(A) or Rm side, we have the b vector for the data plotting as part of the column space. This has been the requirement for a solution in this chapter. On the Rn side, we have the solution vector plotting neither coincident with the row space nor the null space. All three spaces intersect at the origin or the zero vector. This is important in that any vector space must contain the zero vector. As we noted earlier, each of the two sides of the figure has a set of perpendicular subspaces. The column space is orthogonal to the null space of B T in Rm, and the row space is orthogonal to the null space in Rn. Now that we have discussed and illustrated the C(A), N(A), C(AT), and N(AT), can we fit these spaces along with the data vector b and the solution vector x into


The 4 subspaces and their relationships to rank, dimension, and vector spaces. The row space and null space are in Rn , and the column and left null spaces are in Rm . The row and column spaces have r dimensions, where r is the rank of A. The null spaces occupy the remaining dimensions of their respective vector spaces. A solution vector xp in the row space, if it exists, is taken by A to the vector b in the column space, whereas xs from the null space is taken to the zero vector in Rm . A combination of xp with any vectors xs still produces the data vector b. Note that 0 overlaps both subspaces in Rm and Rn as any subspace must contain the zero vector. Figure 4.15

a more general summary diagram? The answer is yes. We show the configuration in Figure 4.15, a diagram modified from Strang (1993)¹ and several textbooks by the same author. This diagrams the two principal spaces we consider when examining a matrix. The visualization of these spaces is critical, but it is underlain by the fact that the system Ax = b is part of two different vector spaces. The row and column spaces are in the vector spaces Rn and Rm, respectively. If a matrix is m × n, the columns have m components, and the rows have n components. This establishes the two different and separate vector spaces in Figure 4.15 that we explored in detail in the previous

¹ The Fundamental Theorem of Linear Algebra: American Mathematical Monthly, v. 100, pp. 848–855.


sections. The row and column spaces are subspaces of Rn and Rm, although they may fill the entire vector space. For the equation Ax = b to have a particular solution xp, some linear combination of the columns must give b. There are n columns in A, which we symbolize as a1 through an, so we have to have the same number of scalar multipliers. We can write this as the equation: Axp = x1 a1 + x2 a2 + · · · + xn an = b.

(4.48)

So the matrix A acts on a vector we call xp with n components, x1 to xn. This is the operation across the top of Figure 4.15 that takes xp to b—a vector in Rn to Rm. There is one vector in Rm that always has a solution. If b is the zero vector with m components, then we know that the zero vector x with n components is a solution. The zero vector, as we know, is always present in a vector space. For there to be nonzero vectors x that solve Ax = 0, the row space cannot fill all of Rn and n − r > 0. Axs = x1 a1 + x2 a2 + · · · + xn an = 0.

(4.49)

This means that the null space of A is more than just the zero vector, and we can find special solution vectors xs that solve Ax = 0. This is the operation across the lower part of Figure 4.15 that takes xs to the zero vector in Rm . With the actions of the row and column spaces defined earlier, we show how to combine xs and xp across the center of the diagram. The vector xp comes from the row space and gives a nonzero value of b. At the same time, we can combine xp with any vector xs , and it does not change the result. The relationship is: A(xp + xs ) = Axp + Axs = b + 0 = b.

(4.50)

If the null space is not just the zero vector, it must be a nonzero subspace of Rn. This gives an infinite number of vectors xs that solve Ax = 0.

4.4.8 Two, Three, and Four Faults Revisited

To better demonstrate the four subspaces and when we can get a solution vector to Ax = b, we will look at the problem of slip on different numbers of faults that started this chapter. We will discuss the four subspaces for each case—the column, null, row, and left null spaces. Our emphasis here is to look at the coefficients matrices. After all, the four subspaces are defined by these matrices. These will be easiest to visualize for two and three faults since these are in R3 or R2 . As we analyze these, we can determine what conditions there are on solvability. Two faults We start with the two-fault example. The matrices and vectors, along with the reduced row echelon form for the matrix A, are shown in equation 4.51. The slip vectors are shown here and in the following sections with a subscript such as as . The data vector that we usually call b will be shown as the net motion vector s with its components sN , sE , and sD . We show the setup in the left part of



Figure 4.16 The complete look at the two-fault problem. Left shows the problem again as in Figure 4.1, and the slip matrix is shown in the bottom left. Middle shows C(A) and N(AT )in R3 . In this case, the axes are NED and shown with D pointed upward for clarity. Also shown is the possible data vector b. The C(AT ) and possible solution space fill the entire as -bs coordinate system in R2 . For this reason, we show the plane half in orange and half in blue.

Figure 4.16. The rest of the diagram shows the four subspaces for the system based on Gaussian elimination on the matrix A. The solution to the problem and the reduced row echelon version is

\[
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D \end{bmatrix} \rightarrow
\begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D - s_E \end{bmatrix} \rightarrow
\text{rref: }
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \end{bmatrix} =
\begin{bmatrix} s_N - 2s_E \\ s_E \\ s_D - s_E \end{bmatrix},
\quad
\begin{aligned}
1a_s &= s_N - 2s_E \\
1b_s &= s_E \\
0a_s + 0b_s &= s_D - s_E.
\end{aligned} \tag{4.51}
\]

We can see that for a solution to exist, the data vector has the condition that sE = sD. If this condition is met, a solution can be found for any values of sE and sN. So, for this case of two faults, there is either one solution or no solution to Ax = b. The column space is in R3, and the row space is in R2. The column space contains two independent vectors, so it is 2-dimensional, meaning the null space of the transpose must be 1-dimensional. The row space is formed from two independent rows, so it is 2-dimensional. The null space is only the zero vector. An important observation here is that the 4 subspaces can be evaluated regardless of whether or not there is a solution to Ax = b. Suppose the vector b does not lie in the C(A) in the middle diagram; then no solution exists. Again, note that the slip vectors plot in the NED coordinate system of R3 whereas the solution vector has as-bs coordinates in R2.
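A quick way to test the solvability condition numerically is to compare the rank of A with the rank of the augmented matrix [A s]. The minimal sketch below does this for the two-fault matrix; the numeric slip values are our own example, not from the text.

A = [1 2; 0 1; 0 1];          % two-fault slip matrix, columns are the fault slip vectors
s = [3; 1; 1];                % net motion with sE = sD, so a solution exists
rank(A) == rank([A s])        % true: s lies in C(A)
x = A\s                       % returns the slips as and bs
s_bad = [3; 1; 2];            % here sE is not equal to sD
rank(A) == rank([A s_bad])    % false: no exact solution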



The complete look at the three-fault problem. Left shows the problem again as in Figure 4.1 and the slip matrix shown in the upper left. Middle shows C(A) and N(AT). The axes are NED and shown with D pointed upward for clarity. Right side shows the N(A) and C(AT). The oval in the lower right is a hole through the as-bs coordinate plane showing the vector pointed toward negative cs values. Figure 4.17

Three faults We present the same treatment for three faults. The matrices and vectors, along with the reduced row echelon form for the matrix A, are shown in equations 4.52 to 4.54. The subspaces are shown in Figure 4.17.

\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D \end{bmatrix} \rightarrow
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D - s_E \end{bmatrix} \tag{4.52}
\]

\[
\text{rref: }
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix} =
\begin{bmatrix} s_N - 2\left(s_E + (s_D - s_E)/2\right) - (s_D - s_E)/2 \\ s_E + (s_D - s_E)/2 \\ (s_D - s_E)/2 \end{bmatrix} \tag{4.53}
\]

\[
\begin{aligned}
a_s &= s_N - 3s_D/2 - s_E/2 \\
b_s &= (s_E + s_D)/2 \\
c_s &= (s_D - s_E)/2.
\end{aligned} \tag{4.54}
\]

The coefficients matrix is full rank and square. For this matrix, there is a solution for every data vector. The column and row space are in R3 . The rows and columns are independent, so the C(A) and C(AT ) fill all of R3 . This means that the null and left null spaces are the zero vector. As noted earlier and in the previous chapters, a square, full-rank matrix is the best type to encounter. Every data vector has a unique solution vector.
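Because this coefficients matrix is square and full rank, the backslash operator returns the unique solution directly. A minimal sketch, with slip values we chose for illustration:

A = [1 2 1; 0 1 -1; 0 1 1];   % three-fault slip matrix
s = [4; 0; 2];                % example net motion vector [sN; sE; sD]
rank(A)                       % returns 3: square and full rank
x = A\s                       % the unique slip on faults a, b, and c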



The four-fault problem. Left shows the problem again as in Figure 4.1, and the slip matrix is shown in the upper left. Middle shows C(A). In this case, the axes are NED and shown with D pointed upward for clarity. Also shown is the possible data vector b. The right is a schematic view of the row space, a subspace of R4 . There are three dimensions to the row space and one dimension for the null space. There are an infinite number of possible solutions. Figure 4.18

Four faults We end with the four-fault case and the vector spaces and subspaces shown in Figure 4.18. The matrices and vectors, along with the reduced row echelon form for the matrix A, are shown in equations 4.55 to 4.57.

\[
\begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \\ d_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D \end{bmatrix} \rightarrow
\begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 2 & 2 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \\ d_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D - s_E \end{bmatrix} \tag{4.55}
\]

\[
\text{rref: }
\begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \\ d_s \end{bmatrix} =
\begin{bmatrix} s_N - (s_D - s_E)/2 - 2\left(s_E + (s_D - s_E)/2\right) \\ s_E + (s_D - s_E)/2 \\ (s_D - s_E)/2 \end{bmatrix} \tag{4.56}
\]

\[
\begin{aligned}
a_s + 2d_s &= s_N - 3s_D/2 - s_E/2 \\
b_s &= (s_E + s_D)/2 \\
c_s + d_s &= (s_D - s_E)/2.
\end{aligned} \tag{4.57}
\]



For this example, the matrix is full row rank but not full column rank. The last row tells us that we have 2cs +2ds = sD −sE for a solution. This means we can always solve for any data vector, but we get infinite solutions since we can put in any value for either cs or ds and solve for the other unknown. For this example, the column space is in R3 , and the row space is in R4 . Because the rows are independent and there are three independent columns, we get that the column space is 3 dimensional and that the left null space is just the zero vector. There are three independent rows for the row space, so they span a three-dimensional subspace, meaning the null space must be 1 dimensional. The row and null spaces are orthogonal and fill all dimensions defined



Three faults and a rank 2 matrix. The diagram on the left shows that faults b and c have identical slip components in the coefficients matrix. The center diagram shows the column space and the left null space. The right diagram shows the null space and solution space. The row space is omitted for clarity but would be shown as a plane perpendicular to the null space. Figure 4.19

by the number of components in the rows—the same as the number of columns. We cannot show much with the row and null spaces because of the extra dimension.
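Even though the four-dimensional row and null spaces cannot be drawn, they are easy to compute. The sketch below finds a particular solution and the one-dimensional null space for the four-fault matrix; the slip values are again our own example.

A = [1 2 1 3; 0 1 -1 -1; 0 1 1 1];   % four-fault slip matrix, full row rank
s = [4; 0; 2];                        % example net motion [sN; sE; sD]
xp = pinv(A)*s;                       % one particular solution (the minimum-norm choice)
xs = null(A);                         % basis for the 1-dimensional null space
A*(xp + 5*xs)                         % still returns s: adding null space vectors changes nothing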

4.4.9 What's Hiding in the Null Space?

It is pretty clear how the column and row spaces are connected to solving Ax = b. What do we gain from looking at the null space? Does it help us understand problems in the Earth and Environmental Sciences, or is it just needed for bookkeeping in Rn? To illustrate how much we can learn from the null space, we set up a three-fault problem a bit different from the one in Section 4.1.1. Once again, this is a three-dimensional arrangement so that we can visualize the connection of linear algebra to the solution of a physical problem. The conclusions may seem obvious to the reader, which is our intention. It is impossible to start with a 50-dimensional example and give much physical insight.

As labeled and shown in Figure 4.19, we will have three faults: one strike-slip fault labeled a and two normal faults labeled b and c. The difference from the example in Section 4.1.1 is that the normal faults are parallel, and the motion vectors for b and c are also parallel. The rows and columns have 3 components, so the vector spaces associated with this picture must be in R3. The reader can refer to Figure 4.1 to see the orientations and slips in a schematic block diagram. The slip vectors and the coefficients matrix for this problem are shown in equation 4.58. The columns of A are the slip vectors for the faults, with components in the order N-E-D. Notably, the normal faults are parallel and have the same orientation for their slip vectors.

\[
a = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
b = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
c = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\begin{matrix} N \\ E \\ D \end{matrix}. \tag{4.58}
\]


We now write down the problem for Ax = b and put it in reduced row echelon form. This only requires a row swap and cleaning upwards in the coefficients matrix. We also adopt the usual names for the solution vector and data vector.

\[
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix} =
\begin{bmatrix} s_N \\ s_E \\ s_D \end{bmatrix}
\begin{matrix} N \\ E \\ D \end{matrix}
\rightarrow
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} a_s \\ b_s \\ c_s \end{bmatrix} =
\begin{bmatrix} s_N - s_D \\ s_D \\ s_E \end{bmatrix}
\begin{matrix} N \\ D \\ E \end{matrix}. \tag{4.59}
\]

We see immediately that the matrix is rank 2, with the last row all zeros. This means that for there to be a solution, the value of sE, which is eastward motion, must be zero. This is clear from Figure 4.19 and equation 4.58 because none of the faults have a component of motion in the east direction. We can write down the equations for this system in equation 4.60 to get the following:

\[
\begin{aligned}
1a_s + 0b_s + 0c_s &= s_N - s_D \\
0a_s + 1b_s + 1c_s &= s_D \\
0a_s + 0b_s + 0c_s &= s_E.
\end{aligned} \tag{4.60}
\]

In solving this rank 2 system, we assign cs as the free variable and assume its value is 0. Setting sE = 0 gives the solution that as = sN − sD, bs = sD, and cs = 0. This last term means that fault c has no motion. The only active fault with down motion is b. Any motion in the D direction on b is balanced with an equal amount of movement to the north. Motion on fault a is only to the north. Let's take the slip vector s as having components sN = 4, sE = 0, and sD = 2. The simple solution is as = 2 and bs = 2 with cs = 0. We will consider this the particular solution to the problem, so we call this vector xp. We can multiply through and see that this works.

\[
x_p = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix} \rightarrow A x_p = s \rightarrow
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix} =
\begin{bmatrix} 4 \\ 0 \\ 2 \end{bmatrix}
\begin{matrix} N \\ E \\ D \end{matrix}. \tag{4.61}
\]

Let's recast the results in equation 4.60 by adding all three equations together.

\[
1a_s + 1b_s + 1c_s = s_N. \tag{4.62}
\]

It is clear from this last equation that we should be able to solve fault motions with nonzero values of all three unknowns as, bs, and cs. Where do the other solutions come from? The null space! Let's now write down the full solution to the problem that contains a particular solution and the null space. We set components in the east direction of the data vector to zero.

\[
N(A) = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \rightarrow
x = x_p + x_s =
\begin{bmatrix} s_N - s_D \\ s_D \\ 0 \end{bmatrix} + c_1
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}. \tag{4.63}
\]

We can see now that the components bs and cs can play off each other. We can add slip to fault c by subtracting an equal amount of slip from fault b. Figure 4.20 shows several examples of this. We will keep the slip vector s with sN = 4, sE = 0,



Map views of solution vectors x combining the particular solution and components in the null space. Thin red lines with arrows are the net slip vectors, the sum of the vector xp with the null space contribution. The xs vectors are shown as thick green lines. The upper left shows the particular solution, taken when c1 = 0, along with the total north component in the blue arrow. Total solutions for x are given with c1 values of 1, 2, and 3. For c1 = 1, both faults b and c must move. When c1 = 2, all motion is on fault c. Finally, for c1 = 3 (or even > 2), fault b moves in the opposite direction, which is geologically unreasonable. Figure 4.20

and sD = 2 with its particular solution as = 2, bs = 2, and cs = 0. Another solution using the results of equation 4.62 is as = 2, bs = 0, and cs = 2. How can we get to this vector of unknowns? A simple way is to take equation 4.63 and make the value of c1 = 2. From this, we get the following:

\[
x_p = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}, \quad
x_s = 2 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -2 \\ 2 \end{bmatrix}
\rightarrow
x = x_p + x_s =
\begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -2 \\ 2 \end{bmatrix} =
\begin{bmatrix} 2 \\ 0 \\ 2 \end{bmatrix}. \tag{4.64}
\]


If we take the result for x in equation 4.64 and multiply it by the A matrix, we will get the slip vector s. Figure 4.20 shows another series of solutions we can get by setting c1 to 1 and 3 as well as 2. We see that each choice gives us a solution! Are all of the solutions reasonable? It is clear that for c1 = 0, 1, or 2, the motion on faults b and c is zero or down to the north. The last solution with c1 = 3 implies that fault b moves up and to the south. Although mathematically possible, this is not a possible solution for this geologic configuration with normal and strike-slip faults. In addition, we can see that c1 < 0 causes one or both faults to move in the wrong direction. From this null space analysis, we gain important insights into the problem. The possible values of c1 are not limited in the null space but are restricted in the real-world problem. We must conclude that 0 ≤ c1 ≤ 2. Again, this is a simple and obvious problem that can be visualized in three dimensions. The null space can tell us how different variables in the solution vector can interact. We know that to have a null space, the columns of the matrix cannot be all independent. In setting up and solving the problem, we can discover how and which variables may interact and offset each other. This may be the easiest way to identify such tradeoffs in higher dimensional systems. Our last bit of analysis for this problem will be to ask: What if the east component of s is not zero? We immediately know that there is no exact solution as nothing in the matrix A can account for a nonzero sE . What if we want to compute the closest solution? We could set the sE to zero and forge ahead. But we could also add a component of sE by adding something from the null space of AT . We can see in Figure 4.19 that N(AT ) points in the east direction and offers another null space to find a component for the data vector b. Using this approach forms the basis for most of the next chapter on Solving Ax = b when there is no solution.
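The tradeoff between faults b and c can be explored by sweeping the null space coefficient c1, as in the minimal sketch below. The setup mirrors the example above; the loop bounds are our choice.

A = [1 1 1; 0 0 0; 0 1 1];    % slip matrix: strike-slip fault a, parallel normal faults b and c
s = [4; 0; 2];                 % net motion with sE = 0
xp = [s(1)-s(3); s(3); 0];     % particular solution with cs = 0
xs = [0; -1; 1];               % null space direction: b and c trade slip
for c1 = 0:3
    x = xp + c1*xs;            % every one of these x satisfies A*x = s
    fprintf('c1 = %d: as = %g, bs = %g, cs = %g\n', c1, x(1), x(2), x(3));
end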

4.5 RANK AND SIZE OF A DETERMINES IF WE CAN SOLVE Ax = b

The following sections discuss the generalities of when solutions may exist. Elimination identifies the number and positions of pivots and the rank of the matrix. Remember, the rank (r) is equal to the number of pivots, and this is the number of independent columns and variables that we can solve for. Any extra columns (n − r) then appear as free columns/variables in the system. Any extra rows (m − r) may cause issues with the overall solvability of the system. We can distill this down to four cases for rank and matrix size that determine the solvability of the system and the possible number of solutions. In all of this, a central conclusion is that there is a solution vector when the data vector is in the column space of the coefficients matrix.

4.5.1 Unique Solution—Rank and Number of Rows and Columns are Equal: r = m = n

When the rank, number of rows, and number of columns are equal, the system is solvable and always gives a single unique solution. This was the entire focus of the


last chapter. This is the case for the following matrix.

\[
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & 3 & 5 \end{bmatrix}
\rightarrow \text{elimination} \rightarrow
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 4 \end{bmatrix}
\rightarrow
R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I. \tag{4.65}
\]

So in equation 4.65, we see that the number of pivots and, hence, the rank is equal to the number of rows and columns for this matrix. This matrix is full rank and solvable for any data vector b. We have the identity matrix when we get to the reduced row echelon form R. This signals that there is one unique solution.
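A quick MATLAB confirmation of this case, using the matrix from equation 4.65 and a data vector we made up for the demonstration:

A = [1 2 3; 2 5 4; 1 3 5];    % square, full rank matrix from equation 4.65
rref(A)                        % returns the 3 x 3 identity: three pivots
b = [1; 2; 3];                 % any data vector works here
x = A\b                        % the single unique solution
A*x - b                        % zero (to round-off): the solution is exact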

4.5.2 The Most Common Case: Full Column Rank: r = n < m

What happens if there are more rows than columns, similar to the three-point problem with more points? In this case, our m × n matrices have m > n, and we simply have more equations than unknowns. We set up a system like that shown in equation 4.66 where we have 5 data points. We will let b be the null vector for now.

\[
\begin{aligned}
1x_1 + 2x_2 + 2x_3 &= 0 \\
3x_1 + 7x_2 + 0x_3 &= 0 \\
4x_1 + 8x_2 + 9x_3 &= 0 \\
1x_1 + 2x_2 + 4x_3 &= 0 \\
2x_1 + 4x_2 + 6x_3 &= 0
\end{aligned}
\qquad
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 7 & 0 \\ 4 & 8 & 9 \\ 1 & 2 & 4 \\ 2 & 4 & 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.66}
\]

We need to do elimination. Are we anticipating problems? Possibly not because our data vector is the null vector. We will find that when we do elimination, at least two of the rows will go to zero, so we might be left with three nonzero equations for the three unknowns in x.

\[
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 7 & 0 \\ 4 & 8 & 9 \\ 1 & 2 & 4 \\ 2 & 4 & 6 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 2 & 2 \\ 0 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
= R = \begin{bmatrix} I \\ 0 \end{bmatrix}. \tag{4.67}
\]

We end up with three pivots and three unknowns to solve for. We show the row-reduced echelon form of the matrix on the right. The matrix R goes to the identity matrix as earlier, with zero rows filling in the rest of the matrix. If we take the pivots and put the matrix into equations, we get:

\[
\begin{aligned}
1x_1 + 2x_2 + 2x_3 &= 0 \\
0x_1 + 3x_2 + 4x_3 &= 0 \\
0x_1 + 0x_2 + 1x_3 &= 0 \\
0x_1 + 0x_2 + 0x_3 &= 0 \\
0x_1 + 0x_2 + 0x_3 &= 0.
\end{aligned}
\]


Unfortunately, the null vector is the only possible solution vector for x, which we take to mean there is no solution. So the case works out, but not very interestingly, and we get little added information. Is the situation better if the data vector b is nonzero? Possibly, but probably not. In this case, we will have to analyze as we did in the previous examples for faults above.

\[
A x = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix} \rightarrow
\begin{bmatrix} 1 & 2 & 2 \\ 0 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} x =
\begin{bmatrix} b_1 \\ b_2 - 3b_1 \\ b_3 - 4b_1 \\ b_4 - 9b_1 - 2b_3 \\ b_5 - 10b_1 - 2b_3 \end{bmatrix}. \tag{4.68}
\]

And again, the possibilities for b are minimal. So, although we can get a solution, we are not guaranteed one. As stated in Section 4.3, for there to be a solution, b must be in the column space of A. So, for full column rank with many additional rows, we conclude that we can end up with zero or one solution, but no more. Having more equations than unknowns is quite common. Although there is no unique solution, there are robust ways to deal with this. This case is so common and important that the next chapter is devoted almost entirely to solving it.
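One way to test whether a particular data vector lands in the column space is to compare rank(A) with rank([A b]), as sketched below for the tall matrix of equation 4.66; the two b vectors are ours.

A = [1 2 2; 3 7 0; 4 8 9; 1 2 4; 2 4 6];   % 5 x 3, full column rank
b_in  = A*[1; -1; 2];                       % built from the columns, so it is in C(A)
b_out = [1; 0; 0; 0; 0];                    % not reachable from the columns of A
rank([A b_in])  == rank(A)                  % true: exactly one solution
rank([A b_out]) == rank(A)                  % false: no exact solution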

MATLAB® moment—Making a singular matrix

When we have made matrices using the rand() command in MATLAB®, we have come up with invertible matrices. What can we do to get a singular or rank-deficient matrix? The easiest way is to use the magic square command. A magic square is one where all the columns and all the rows sum to the same value—it is like a Sudoku puzzle.

>>A=magic(4) gives A = [16 2 3 13; 5 11 10 8; 9 7 6 12; 4 14 15 1].



Because of the row and column behavior, if we specify a size of 4, we get only 3 independent columns and rows. >>rank(A) will return the value 3.

4.5.3 Lots and Lots of Solutions: r = m < n

Our next case is when the matrix is full row rank but has additional columns. This is a bit messier than the previous case because we will definitely have to deal


with the null space and explore what that implies for solutions. This is the case we looked at earlier with four faults but only three motion directions, which is more unknowns than equations. We will set up this example with five columns and three rows to get a wider matrix.

\[
\begin{aligned}
1x_1 + 2x_2 + 2x_3 + 2x_4 + 2x_5 &= 0 \\
1x_1 + 2x_2 + 4x_3 + 8x_4 + 2x_5 &= 0 \\
2x_1 + 4x_2 + 6x_3 + 10x_4 + 8x_5 &= 0
\end{aligned}
\rightarrow
\begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 4 & 8 & 2 \\ 2 & 4 & 6 & 10 & 8 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.69}
\]

We have set the data vector to be the null vector. We already know there will be trouble because we have fewer equations than unknowns. Let's do the elimination and see if it gets worse.

\[
\begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 4 & 8 & 2 \\ 2 & 4 & 6 & 10 & 8 \end{bmatrix}
\rightarrow
\begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 0 & 0 & 2 & 6 & 0 \\ 0 & 0 & 0 & 0 & 4 \end{bmatrix}. \tag{4.70}
\]

The columns of the echelon form are, in order: pivot, free, pivot, free, pivot.

And it does. So now we have three rows/equations and five unknowns. We can quickly identify the pivots and put the matrix in row echelon form. Three columns have pivots, and two have 0 in the pivot position. This is a matrix in R3 because each column has 3 components. The matrix is of rank 3 because it has 3 pivot columns and there are n − m or 2 free variables in the N(A). The most informative way to view this is with the matrix in row echelon or reduced row echelon form, as shown in equation 4.71. The pivot columns end up with a 1 at the left part of each row. The columns above the pivots are set to zero by upward cleaning, but we are left with other columns with nonzero values and no pivot in the free columns. We saw this all before in section 4.4.3. The free variables in the x vector match the free columns.

\[
R x =
\begin{bmatrix} 1 & 2 & 0 & -4 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
\begin{matrix} \text{pivot} \\ \text{free} \\ \text{pivot} \\ \text{free} \\ \text{pivot} \end{matrix}. \tag{4.71}
\]

We establish the null space by setting one of the free variables to one and the other to zero and solving the system for the null vector. Again, we have gone through this procedure before in section 4.4.3. So, we get the equation for the null space or the special solutions to the system.

\[
x_s = c_1 \begin{bmatrix} 4 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}
+ c_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.72}
\]


Let's take an example of a nonzero data vector after elimination and row reduction. We will set them as follows:

\[
\begin{bmatrix} 1 & 2 & 0 & -4 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix} = b,
\quad \text{giving equations} \quad
\begin{aligned}
1x_1 + 2x_2 + 0x_3 - 4x_4 + 0x_5 &= 2 \\
0x_1 + 0x_2 + 1x_3 + 3x_4 + 0x_5 &= 4 \\
0x_1 + 0x_2 + 0x_3 + 0x_4 + 1x_5 &= 2.
\end{aligned} \tag{4.73}
\]

Let's do what we know how to do after elimination – back substitution – to get a particular solution if it exists. The last equation in 4.73 gives x5 = 2. We then go to the middle equation and realize that we can solve for x3 only if we assume a value for the free variable x4. If we take x4 = 0 as we always do with free variables when finding a particular solution, we get that x3 = 4. Now we want to solve for x1. Again, we can solve for x1 only if we set the free variable x2 = 0 and assign a zero value to x4. This gives us the result that x1 = 2. So, our full solution to the system of equations is:

\[
\begin{bmatrix} 1 & 2 & 0 & -4 & 0 \\ 0 & 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} =
\begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix}
\rightarrow
x = \begin{bmatrix} 2 \\ 0 \\ 4 \\ 0 \\ 2 \end{bmatrix}
+ c_1 \begin{bmatrix} 4 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}
+ c_2 \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= x_p + x_s. \tag{4.74}
\]

We have a particular solution. Notice in xp that only nonzero values are associated with the pivot variables. We end up with infinite solutions because we can always add a factor of the special solution to the result. The special solutions in equation 4.74 are in the null space and are defined by Ax = 0. These solutions can be added to any result we get and do not change the result. We conclude that we have one or an infinite number of solutions for this case of full row rank with many additional columns.

One last thing we have to solve in the case of full-row rank is the column space of our matrix. When we start with the full matrix before elimination, we do not know which columns will be pivot columns. Therefore, we cannot pick a column space for the matrix at the start. We do the elimination and identify the pivots as shown in equation 4.75. Once we have these, we return to the original matrix, identify the associated columns, and let them form C(A).

\[
A = \begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 4 & 8 & 2 \\ 2 & 4 & 6 & 10 & 8 \end{bmatrix}
\rightarrow \text{echelon form}
\begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 0 & 0 & 2 & 6 & 0 \\ 0 & 0 & 0 & 0 & 4 \end{bmatrix}
\rightarrow
C(A) = c_1 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
+ c_2 \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}
+ c_3 \begin{bmatrix} 2 \\ 2 \\ 8 \end{bmatrix}. \tag{4.75}
\]
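The same bookkeeping can be done in MATLAB: rref() identifies the pivot columns, a particular solution comes from setting the free variables to zero, and null() supplies the special solutions. A minimal sketch for the 3 × 5 example, with a data vector of our own choosing:

A = [1 2 2 2 2; 1 2 4 8 2; 2 4 6 10 8];   % full row rank, 3 x 5
b = [2; 4; 2];                             % any 3-component data vector is solvable here
[R, p] = rref([A b]);                      % p = [1 3 5], the pivot columns
xp = zeros(5, 1);
xp(p) = R(:, end);                         % particular solution with free variables x2 = x4 = 0
xs = null(A, 'r');                         % two special solutions spanning N(A)
norm(A*xp - b)                             % zero: xp solves the system
norm(A*(xp + xs*[3; -1]) - b)              % still zero: null space additions change nothing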


4.5.4 The General Case: r < m and r < n

The last case to consider is when the rank is less than the number of rows and columns. The relative size of the rows and columns does not matter because we will have some zero rows at the bottom of the matrix, and we will get some free variables because the matrix is not full column rank. So, the matrix will be messy after elimination and row reduction. We give an example in equation 4.76.

\[
A = \begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 1 & 2 & 4 & 8 & 2 \\ 2 & 4 & 6 & 10 & 4 \end{bmatrix}
\rightarrow \text{echelon form}
\begin{bmatrix} 1 & 2 & 2 & 2 & 2 \\ 0 & 0 & 2 & 6 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \tag{4.76}
\]

This gives us three free variables. We solve Ax = 0 and identify the null space for these. We can make infinite additions from the null space and not change the result. We have the problem, however, that there is a zero row. We must be lucky to get a particular solution when b ≠ 0. And without a particular solution, there is no solution possible. For those reasons, we either get zero or an infinite number of solutions.

MATLAB® moment—Finding pivots and column space

Can we use MATLAB® to find the pivots of our matrix? Yes, we can do that with the rref() function. For the matrix A in equation 4.76 we would do the following:

>>[R,p]=rref(A) will give a reduced row echelon form matrix R and a vector p with the columns containing pivots.

In this case we will get R = [1 2 0 -4 2; 0 0 1 3 0; 0 0 0 0 0] and p = [1 3].

Now we make vectors for the column space of A and put them in a matrix F.

>>F=A(:,p) gives F = [1 2; 1 4; 2 6], which are columns 1 and 3 of A.
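The other two subspaces can be pulled out the same way with orth() and null(). A short sketch for the matrix of equation 4.76; the variable names are ours.

A = [1 2 2 2 2; 1 2 4 8 2; 2 4 6 10 4];   % rank 2, less than both m = 3 and n = 5
colspace  = orth(A);     % 2 columns: a basis for C(A) in R^3
rowspace  = orth(A');    % 2 columns: a basis for C(A') in R^5
nullspace = null(A);     % 3 columns: a basis for N(A) in R^5
leftnull  = null(A');    % 1 column:  a basis for N(A') in R^3
rowspace' * nullspace    % all zeros: the row and null spaces are orthogonal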

4.6 SUMMARY

In this chapter, we figured out the conditions necessary to solve Ax = b. The solutions were surprisingly varied, from no solutions to perhaps one unique solution to infinite solutions. The solvability of the fundamental equation of linear algebra rested on knowing the rank of the matrix compared to the number of rows and columns. The last chapter was about when the number of rows and columns and


the rank of the matrix were the same. This chapter took things much further and handled rectangular matrices. We had rectangular matrices with a pivot in every column (full-column rank) and a pivot in every row (full-row rank), but not both. Square matrices may not be full-rank either—something we saw briefly in the last chapter when dealing with the independence of the columns of a matrix—and can misbehave just like rectangular matrices. Understanding vector spaces and subspaces gives us the tools and insights to work with matrices not of full rank.

A vector space is defined in Rn by vectors with n components. The vector space must be closed under addition and scalar multiplication. All vectors in the space must have n components, including the zero vector. A subspace is a part of the Rn vector space spanned by vectors with n components. The subspace and its spanning vectors must be closed under addition and scalar multiplication. A subspace can span any number of dimensions from 0 to n, although the subspace must contain at least the null vector. The spanning vectors form a basis for the subspace.

The equation Ax = b is an m × n matrix multiplying an n × 1 vector to get an m × 1 vector. The columns of the matrix A and the data vector b have m components, so they are in the vector space Rm, while the rows of A and the unknowns vector x have n components and are in Rn. A simple statement reflecting this action is: the matrix A takes a vector x in Rn and puts it into Rm as the vector b. For the matrix A we have a subspace we call the column space or C(A) spanned by the independent columns of A. For Ax = b to have a solution, the vector b must be a combination of the columns of A, so it must be in C(A). If not, there cannot be an exact solution. The rows of A form a subspace of Rn. Because we like using column vectors, we can write the row space as C(AT), and it is spanned by the columns of AT. The next subspace is the null space of the matrix A. This is the subspace spanned by the solutions to the equation Ax = 0 and uses the notation N(A). The null space is in the same space as the solution vector x, so it plots in Rn along with the C(AT). The last space we have is the null space of AT, written N(AT), which solves for the combinations of the rows of A that add to zero. Each vector in this left null space has m components, so it plots in Rm along with the C(A). Another important property is that C(A) ⊥ N(AT) and C(AT) ⊥ N(A).

We look mainly to the size and rank of the matrix A to figure out how many solutions there are to the equation Ax = b. We set the size of A to be m × n and the rank to be r. There are four possible cases. The first is when the matrix A is full rank and square, r = m = n; then there is always a unique solution for any vector b. The second is when the matrix is full column rank but has extra rows, r = n < m, and leads to having one or no solutions. For this case, there is a solution only if b is in the column space. Third, if the matrix is full row rank but has extra columns, r = m < n, we can have one or an infinite number of solutions. If the equation is solvable, we get a particular solution we call xp. To this, we can add an infinite number of special solutions xs that result from free variables in the system after elimination, which define a nonzero null space. Fourth and last is the case where A is neither full column nor row rank, r < n and r < m, where we expect either no solution or an infinite number of solutions.


4.7 EXERCISES

We have worked out solutions to the tomography problem in the previous chapters. We will now expand the problem to figure out when there is a solution and how many solutions we might have. The figure shows six combinations of sources and paths across our familiar 4-block problem, labeled A to F.

1. For each case, A to F, set up the design matrix and describe its solvability. Again, assume that each block is 1 km on a side and that diagonal paths are simply √2 km in length. Besides giving the matrix, give dimensions for all subspaces (column, row, null, left null spaces) and write down a basis for each. Give a final equation in the form x = xp + xs if you can.

2. We will take a new approach to garnet components and chemistry. As a reminder, we have 6 types or components of garnet based on chemistry:

1. Andradite (An) – Ca3 Fe3+2 Si3 O12
2. Grossular (Gr) – Ca3 Al2 Si3 O12
3. Pyrope (Py) – Mg3 Al2 Si3 O12
4. Spessartine (Sp) – Mn3 Al2 Si3 O12
5. Almandine (Alm) – Fe2+3 Al2 Si3 O12
6. Uvarovite (Uv) – Ca3 Cr2 Si3 O12

With the 6 components, we have 7 cations, which are +2 cations Fe2+ , Ca, Mg, and Mn, and +3 cations that can be Fe3+ , Al, and Cr. In the last chapter, we


made simplifying assumptions to get down to 6 cations, but now we want to work with all 7.

(i) Set up this problem again, this time with 7 cations and 6 components in a new matrix. Then, go through elimination to get conditions on b (as in Ax = b), the list of cations. Your b vector should have 7 rows, and you should have the possible combinations of the components after elimination. Hint—keep the matrix you made last time and add Fe3+ as the last row.
(ii) Now ignore the +3 cation in each component. Write a new matrix that only takes into account the +2 cations. You will have to change the size of the design matrix and vectors. What matrix do you get? Find the null space of this matrix.
(iii) Finally, use only the +2 cations and ignore the Ca components. What is the matrix setup here, and does it give a solution?

3. We present one more matrix that fits the condition that the rank is less than the number of rows and columns. Use MATLAB® functions to compute bases for all four subspaces. You can try using similar functions on A and AT.

\[
A = \begin{bmatrix}
1 & 1 & 2 & 2 & 1 & 2 \\
2 & 2 & 4 & 6 & 2 & 4 \\
1 & 2 & 4 & 8 & 1 & 6 \\
2 & 2 & 4 & 1 & 2 & 4
\end{bmatrix}.
\]

CHAPTER 5

Solving Ax = b When There is No Solution

The last two chapters centered on solving the equation Ax = b. We encountered cases where the equation could have one unique solution, an infinite number of solutions, and times when there was no solution. In this chapter, we deal with Ax = b where we know that there is no unique solution, but we want to find the best solution we can. This is one of the most important uses of linear algebra and usually arises from having many more rows than columns. Figure 5.1 shows two common cases where this can happen. The first is the fitting of a line to scattered but aligned data. The second is the 3-dimensional version of this problem, fitting a plane to scattered points akin to the three-point problem we have seen numerous times before. We know there is no exact solution in both cases, but we want one that most closely fits the data. Besides framing this in linear algebra terms, what we are embarking on is the basis for Linear Regression, perhaps the most common term we have for fitting a line to a set of data. In linear algebra terms, we can view this in light of our trying to solve the familiar equation Ax = b. The setup will have m >> n = r and look like the problem posed in equation 5.1, with possibly more columns but lots of rows.

\[
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\vdots & \vdots \\
a_{m1} & a_{m2}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}. \tag{5.1}
\]

Our main point of the last chapter was that b had to be in the column space of A for there to be a solution vector x. If we do elimination on equation 5.1, we can never have more than 2 nonzero pivots with just two columns. This would mean that all but the first two components of b would have to go to zero for b to be in the column space of A. This is unlikely and is not achieved in any meaningful, real-world problems. For this reason, the data vector will never be exactly in the column space of the design matrix.




Examples of data points fitted to a line in 2D and a plane in 3D. No single line or plane can go through all the data points, so we attempt to find the line or plane that fits the data the best in each case. Much of this chapter explores what we mean by the term best.

Figure 5.1

If there is no exact or true solution, what is the value of finding a vector x that is close? When we think about statistics, we do not find a true value but an average or best value when we take a mean value of scattered observations. In linear algebra, this corresponds to getting a solution vector for Ax = b when no exact solution exists. We want to find the solution vector x that best aligns the resultant vector b with the column space of A. We will call this vector x ˆ (x-hat) to distinguish it from an exact solution. We now are solving the equation Aˆ x ≈ b since there is no vector x ˆ that will make the product Aˆ x exactly equal to b. To find the vector x ˆ, we must find the solution closest to being aligned with the column space. This process is referred to as Projection and is the subject of the next section. We are now moving beyond just using linear algebra to solve a particular set of coupled equations!

5.1 PROJECTION

We know that vectors can be added together and multiplied by a constant. This section explores a third operation – projecting vectors. We are already familiar with this idea. We use their projections on the coordinate axes when we express vectors as components in a coordinate system. We will start to understand the more general method by projecting one vector onto another and work toward projecting a vector onto the column space of a coefficients matrix. As we have done before, we present examples in two and three dimensions to visualize the process.

5.1.1 Lines in 2D and 3D

Figure 5.2 shows examples of projecting a vector onto another in 2D and 3D. Of course, the projection concept is general to however many dimensions we want to work in. On the left-hand side of Figure 5.2, we show two vectors a and b, and we want to project b onto a. In this case, we want the point on a that is the closest to



Projections in 2 and 3D. The left side shows the vector b projected onto a. Its projection is labeled as vector p, which is xˆ times a. The projection of b on a is the closest Euclidean distance, which is the distance between the vectors measured perpendicular to a. How much b misses a is indicated by the vector e, which is just the difference between b and p. The right-hand side shows the same situation in 3D. The vectors, properties, and significances are the same as in the left-hand diagram. Figure 5.2

b, shown here by the tip of the vector p. By Closest, we mean in the Euclidean Distance sense of adding squared distances in each dimension we use. This is just the vector norm from the inner product described earlier. This is where we put a vector that we call e that is perpendicular to a that meets the tip of vector b. We show that the Projection Vector p has a misfit to b by the vector e, using e to stand for Error Vector as is typical in all of mathematics. Vector p is just a multiple of a, and call the scalar multiplier xˆ. Note that xˆ is only a scalar when projecting a vector. When fitting many points or lines, x ˆ will be a vector. What have we solved in this case? In this 2D example, we have solved the equation aˆ x = p. Given that b is not parallel to a, it cannot be in a’s column space, so the best we can do is find where it would end up if we had to put it in the column space. How? We take the nearest projection of b onto a, and that is p, which we get by multiplying a by xˆ. Again, ∥p∥ is the shortest Euclidean distance between b and a. We get the same set of operations to project one vector on another in higher dimensions, and we show the results for 3D in the right part of Figure 5.2. Let’s write all these vectors in a way that looks like linear algebra. First, we summarize what we know about the system. p = xˆa,

(5.2)

e ⊥ a,

(5.3)

e = b − p,

(5.4)

e = b − xˆa.

(5.5)


Now, let’s go back to the ideas of dot products. We know that if e ⊥ a, then a·e = 0. We can then rewrite equation 5.5 by taking the dot product with a. a · e = a · (b − xˆa) = 0 → a · b − xˆa · a = 0 → a · b = xˆ a · a.

(5.6)

We have a lot of flexibility in arranging terms because taking the dot product produces a scalar and is commutative under multiplication. We can rearrange equation 5.6 into a simple quotient to solve for x̂. Remember, in this case, x̂ is a scalar so that it can be taken to the front of the equation.

\[
\hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^T b}{a^T a}. \tag{5.7}
\]

It is common to recast equation 5.6 by substituting aT for (a·) to produce what we call the Normal Equation:

\[
a^T a \hat{x} = a^T b. \tag{5.8}
\]

Now we can find x̂ by just taking the dot products of known vectors or multiplying the transpose of one vector times itself and another vector. For the previous example, we are trying to find x̂, a scalar. This means that we can write p as either x̂a or ax̂. In addition, for vectors, the product aTa is a scalar. We can use equation 5.7 in the following way.

\[
p = a\hat{x} = a\left(\frac{a^T b}{a^T a}\right) = \left(\frac{a a^T}{a^T a}\right) b. \tag{5.9}
\]

The operations are all associative so that we can move the parentheses. As noted earlier, aTa is a scalar, but the term aaT produces a square matrix. For this reason, we define a new matrix P that we call the Projection Matrix. The critical feature of the projection matrix is that it depends only on a.

\[
P = \frac{a a^T}{a^T a}. \tag{5.10}
\]

The term aTa is just the norm or length of the vector a and is used in the projection matrix to establish the length or scale of the vector. The matrix computed from aaT gives us information about the orientation of a and thus how to interact with different vectors b to create the projection. We can use P with any vector b to compute the projection vector p: p = P b.

(5.11)

Let's look at a numeric example starting with a vector a. We will work out the projection matrix P and then the projection vectors for two examples of b.

\[
a = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad \text{then} \quad
a^T a = \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = 6. \tag{5.12}
\]


\[
a a^T = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} =
\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix},
\quad \text{so} \quad
P = \frac{1}{6}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}. \tag{5.13}
\]

Now that we have P, our first example will be a vector b that is not in the column space of a.

\[
b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \text{then} \quad
p = Pb = \frac{1}{6}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} =
\begin{bmatrix} 2/3 \\ 4/3 \\ 2/3 \end{bmatrix}, \tag{5.14}
\]

\[
\text{and} \quad
\hat{x} = \frac{a^T b}{a^T a} = \frac{1}{6}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{2}{3}. \tag{5.15}
\]

3

1

(5.14)

We see in equation 5.14 that the vectors p and b are clearly different, and equation 5.15 shows that p = 23 a. Now try a vector b in the column space of a. In this case, we will just use b = 2a. 2 1 2 1 2 12 2 1 1        b = 4 , then P b = 2 4 2 4 = 24 = 4 . 6 6 2 1 2 1 2 12 2  



 





 

(5.16)

Notice that if b is in the column space of a, we get the vector back; it is its own projection vector. What about the value of xˆ?   i 2   aT b = 1 2 1 4 = 12, h

2

1 so xˆ = (12) = 2. 6

(5.17)

This gives us the result that the projection of b on a is twice a, exactly the setup we used with b = 2a. The next question is, what if we project the vector twice? In other words, we will project the vector b by P , take the result, and project it again. Let’s start by doing this for the example in equation 5.14, where b is not in the column space of a. Our first projection gives: 2/3   P b = 4/3 . 2/3 



(5.18)

Now let’s project the result again by P and see what happens: 1 2 1 2/3 2/3 1     P (P b) = 2 4 2 4/3 = 4/3 . 6 1 2 1 2/3 2/3 









(5.19)

This gives us the important result: projecting a second time leaves the result unchanged. Once projected, it is firmly in the column space of a. For this reason, we


can write that P² = P. The matrix P is also called an Orthogonal Projector because it projects the point b orthogonally onto a. And what if we use P to project a? We get:

\[
Pa = \frac{1}{6}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = a. \tag{5.20}
\]

This makes sense because projecting a with respect to a should give a back again. We can apply this to an equation with P and the identity matrix I. If P is a projection matrix or orthogonal projector, then I − P is also a projection matrix. This is because we can take the matrix I − P and square it and get the same matrix back. (I − P )2 = I 2 − 2P + P 2 = I − 2P + P = I − P

(5.21)

The matrix (I − P ) is called the Complementary Projector because it projects orthogonally to P . We can easily see that: P (I − P ) = P I − P 2 = P − P = 0.

(5.22)
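These projector identities are easy to verify numerically. A short check using the same vector a as in the example above:

a = [1; 2; 1];
P = (a*a')/(a'*a);        % projection matrix onto the line through a
I = eye(3);
norm(P*P - P)             % zero (to round-off): P is idempotent
norm(P*(I - P))           % zero: P and its complementary projector annihilate each other
b = [1; 1; 1];
e = (I - P)*b;            % the error vector
a'*e                      % zero: e is perpendicular to a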

5.1.2 Going from a to A

We saw how to project a vector onto a vector in the previous section—what if we are interested in projecting a vector onto a plane spanned by two vectors? The vectors can be put into a matrix A, and we want to project onto the span of A. This means that x ˆ is not a single value but must also be a vector. And this vector will have to reside in the row space because we will be solving the equation Aˆ x ≈ b. This will be pretty easy to visualize in three dimensions but impossible past that. Luckily for us, the 3D case is as complicated as we need to see, and going to n dimensions is just a simple extension of the same methodology. We examine the example in Figure 5.3. What is the setup here? The plane has to be defined, and an easy way is to specify two vectors a1 and a2 that form a basis for the plane. These vectors must be independent but not necessarily orthogonal. In three dimensions in the u-v-w system we see that a1 and a2 are each 3 × 1 vectors. We can arrange these into the columns of a matrix, and we call that matrix A, which is 3 × 2. The two independent vectors define the plane and the column space of A. 



\[
A = \begin{bmatrix} | & | \\ a_1 & a_2 \\ | & | \end{bmatrix}. \tag{5.23}
\]

So we start with the equation Ax = b. Because there are more rows than columns, we have more equations than unknowns and anticipate problems. As usual, we can solve Ax = b provided that b is in the column space of A meaning that it lies within the plane. It is easy to see that the vector b1 in Figure 5.3 is within the span of a1 and a2 because we can take some combination of these basis vectors and get b1 . To


Projecting a vector onto a plane in 3 dimensions. The matrix A will have two columns defined by vectors a1 and a2 . The blue plane shows C(A). Vector b1 is in C(A), so it does not require projection—we could simply find scaling factors for a1 and a2 . Vector b2 is not in C(A), and we must find its projection p in the column space. The projection misses the C(A) by the error vector e, which is perpendicular to the plane for the column space spanned by a1 and a2 . Figure 5.3


solve this case, we can find an exact vector x that is 2 × 1 and formed by the coefficients multiplying a1 and a2. We are not so lucky for vector b2—it is not on the plane and not in the column space of A. So, for Ax = b1 there is a solution vector x, but Ax = b2 has no solution. We have to find the best solution by projecting it onto the plane. This will require a projection matrix P, a projected vector p, and an error vector e. Our solution will now be x̂, a 2 × 1 vector, not a scalar as earlier. We will not repeat all the steps to get to P, p, e, and x̂, but just give the results. We start with the projection matrix and get a result similar to equation 5.10. The usual equations hold as well, so we can simply rewrite the equations 5.2, 5.4, and 5.5 in terms of A. Remember, A has full column rank, but m > n, so we get one or no solutions to Ax = b. To make that equals sign true, we need to change the equation slightly so that now Ax̂ + e = b.

e ⊥ A,

(5.24)

e = b − Ax̂ (x̂ is a vector), and

(5.25)

e = b − p.

(5.26)

Because e and A are perpendicular, we get that ATe = 0. This gives us the results: AT (b − p) = 0 = AT (b − Aˆ x), →

(5.27)

ATAˆ x = AT b.

(5.28)


Equation 5.28 is one of the most important equations in linear algebra and is the multi-dimensional version of the Normal Equation. It looks exactly like equation 5.8 with aT changed to AT. Using simple algebra, equation 5.28 seems to give the result that:

\[
\hat{x} = \frac{A^T b}{A^T A}.
\]

This result seems strange and needs to be corrected. When projecting one vector onto another, we used scalars and dot products that compute to scalars, and division by scalars is easy. We know how to multiply matrices, but how do we divide them? We can’t. Instead, we take the inverse of a matrix. We convert the denominator ATA, which is a square matrix, to its inverse and thus get a result for p, the projection matrix P , and the solution vector x ˆ. The matrix ATA is always invertible, as is explained in the next section. x ˆ = (ATA)−1 AT b,

(5.29)

p = Aˆ x = A(ATA)−1 AT b = P b,

(5.30)

and P = A(ATA)−1 AT = H, hat matrix.

(5.31)

The projection matrix is so universally used and important in the linear algebra of least squares and regression analysis that it is named the Hat Matrix. It is commonly shown as H.

MATLAB® moment—Using the normal equation

We now have the normal equation ATAx̂ = ATb for doing our projection and fitting of data. There are several ways to use this in MATLAB® to find the value of x̂.

>>xhat=inv(A'*A)*A'*b will give the answer we are after. However, there are better ways. It's slow to compute for large matrices A, and it can be numerically unstable for matrices close to singular.

>>xhat=A\b will provide x̂ with a faster, more numerically stable algorithm. The easy deployment of the sophisticated \ to solve problems Ax = b as x̂ = A\b is a major strength of the MATLAB® language for linear algebra problems.

We are now ready to get the best solution using Aˆ x ≈ b since there is no solution to Ax = b. The vector x ˆ is just right to get as close as we can to fitting a solution into the column space of A. The projection process starts with a matrix A with independent columns. For 2 columns, the solution vector has two components, and the projection is a line on the


plane spanned by the column space. So, in this case, we are using the vector x̂ to combine the columns of A to get the best or closest vector p in the span or column space of A. This is what is shown in Figure 5.3, and we use the representation for the columns of A as shown in equation 5.23. In equation form, we would get the following:



\[ p = A\hat{x} = \begin{bmatrix} | & | \\ a_1 & a_2 \\ | & | \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \hat{x}_1 a_1 + \hat{x}_2 a_2. \]    (5.32)

In Figure 5.3, we project the vector b2 to p by taking a combination of the vectors a1 and a2 so that the projected vector is in C(A). We follow this same procedure to find the best solution vector x̂ no matter how many columns there are or how long they are. We can consistently compute the projected vector p by finding the best unknowns vector x̂. And we get the best vector by producing the projection matrix P from matrix A. If A is full column rank, then we can always compute the projection matrix because A^T A will always be invertible and produce a good result. Computing x̂ involves multiplication of the n × n matrix (A^T A)^{−1} with the n × m matrix A^T and the m × 1 vector b. This produces an n × 1 vector, which is just right to be the vector x̂. The projection matrix P = A(A^T A)^{−1} A^T is the product of an m × n with an n × n with an n × m matrix, which produces an m × m projection matrix. The matrix P or, as we sometimes symbolize it, H, multiplies the m × 1 data vector to give the projected vector p.

5.1.3 More About A^T A

The matrix A^T A has some special properties. First, regardless of the shape of A, the matrix resulting from A^T A is square. This is easy to see in that if A is m × n, then A^T A takes an n × m matrix times an m × n matrix, giving a result that is n × n. Depending on the relative sizes of m and n, the result may be a matrix with fewer or more elements than the starting matrix A. A 2 × 5 matrix produces a 5 × 5 result, but a 5 × 2 matrix gives a smaller 2 × 2 matrix. Second, the matrix A^T A is symmetric, meaning it is symmetrical across the main diagonal. Why this is true is easy to see. We label the elements of A as a_ij, and the elements of A^T are a_ji. The ith column of A—shown using the symbol a_i—becomes the ith row of A^T. We call the resulting matrix B and label its elements b_ij. Each term b_ij is the inner product of the ith row of A^T, which is a_i, with the jth column of A, which is a_j. Likewise, the term b_ji is the inner product of the jth row of A^T with the ith column of A. Because a_i · a_j = a_j · a_i, then b_ij = b_ji, making the matrix B symmetric. This is an important fact to remember when you are computing A^T A—the result must be a symmetric matrix, or you made an error in calculation. Lastly, we can state that if the columns of A are independent, then A^T A has independent columns. Since A^T A is square, this also means that A^T A is invertible. Therefore, we know that (A^T A)^{−1} exists. We will see more proof of this fact in a later chapter.


These three properties—a square, symmetric, and invertible matrix—make A^T A a well-behaved matrix. It is just the sort of matrix that never gives problems in any operation in linear algebra. And this matrix has a name—it is called the Gram Matrix of A. We will sometimes designate this matrix as G.
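A quick MATLAB sketch, using a randomly generated matrix purely as an example, lets the reader confirm these three properties of the Gram matrix G = A^T A.

A = randi(9, 5, 3);     % random 5-by-3 integer matrix; almost surely full column rank
G = A' * A;             % the Gram matrix of A
size(G)                 % 3-by-3: square
norm(G - G', 'fro')     % 0: symmetric
rank(G)                 % 3: invertible because the columns of A are independent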

5.2 SOLVING PROJECTION FOR LINES AND PLANES

In this section, we will show how finding the solution to Ax̂ ≈ b fits into the four fundamental subspaces discussed in the previous chapter. The key to understanding why projection gives the best solution is based on the left null space or N(A^T) and how it interacts with the column space. The discussion will be based on the most straightforward problem we can devise—fitting a line to three noncollinear points in 2 dimensions. And, of course, we will discuss the row and column views of the problem as they are visualized using the row and column subspaces. Even if you skipped over the detailed discussion of the fundamental subspaces in Chapter 4, you should get a lot out of this section. It sets the stage for understanding and computing least-squares problems.

5.2.1 Fitting a Line to More Than Two Points

The equation of a line is simple: v = mu + v0. How do we solve for m and v0 with just two different points? Let's say the first point has coordinates (u1, v1) and the second has different coordinates (u2, v2). Then we have two equations and two unknowns, shown in equation 5.33, which sets up as a linear algebra equation in 5.34. The matrix of coefficients in equation 5.34 is full rank and square, so we know there is a solution for the unknowns m and v0. And anyway, we know two distinct points define a line.

v1 = mu1 + v0, and v2 = mu2 + v0.    (5.33)

\[ \begin{bmatrix} u_1 & 1 \\ u_2 & 1 \end{bmatrix} \begin{bmatrix} m \\ v_0 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}. \]    (5.34)

What if we add one more distinct point, (u3, v3), to the problem? We get the equation shown in 5.35, which usually does not have an exact solution.

\[ \begin{bmatrix} u_1 & 1 \\ u_2 & 1 \\ u_3 & 1 \end{bmatrix} \begin{bmatrix} m \\ v_0 \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}. \]    (5.35)

So, we are after the values of m and v0 that give us the best fit to the points. We want x̂, composed of what we can call m̂ and v̂0, that gives the projected results that are closest to the actual data vector. We could find the solution using letter variables but formulate it here in a numerical example because it will be easier to visualize. We will set our three (u, v) points to (1, 2), (2, 1), and (3, 3). This gives us the system shown in linear algebra form in equation 5.36 and Figure 5.4. We will use this example for the next few sections.


Diagram for fitting a line to points in equation 5.36. Axes are labeled u and v. The data points are labeled a, b, and c. The points do not fall on a single line, so we must find the line in blue that minimizes the misfit between the line and the points, shown as the errors ea, eb, and ec. The misfit is assumed to be in the v direction because the equation we use is v = mu + v0, making v the dependent variable. Figure 5.4

Do we have a solution when we do elimination and take the system to reduced row echelon form?

\[ \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} m \\ v_0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \;\xrightarrow{\text{elimination}}\; \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} m \\ v_0 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 3 \end{bmatrix}. \]    (5.36)

No. We cannot find any values for m and v0 that will give a result. So now we find the closest values m̂ and v̂0 we can.

5.2.2 How We Normally Find the Best-fitting Line

This problem is usually posed as finding the slope and intercept m̂ and v̂0 that give the line through the points most closely aligned with them. That is how it is presented in Figure 5.4. Given the arrangement of points, it is clear that it will be impossible to find a single m and v0 that will work. We are attempting to minimize the misfit between the line and the position of the points. We will refer to the mismatch by using e, which is the error or distance of projection from the point to the line. We can do this by Least Squares, which minimizes the sum of the squared uncertainties. In other words, it finds the m̂ and v̂0 that make e_a^2 + e_b^2 + e_c^2 a minimum. As you may have guessed, this requires doing calculus and algebra that gets the right result but offers little intuition into the problem. The linear algebra approach gives us the same answer and nice insights into the solution. We take the basic setup in equation 5.36 and Figure 5.4, and work out the answer for Ax̂ ≈ b using a simple linear algebra equation. The coefficients matrix A is the u values arranged in a column and a column of ones, as given in equation 5.35. We get our solution by solving the normal equation A^T A x̂ = A^T b. Rewriting this and filling in our matrices, we get the equations in 5.37:

\[ A^T A\hat{x} = A^T b \;\rightarrow\; \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} \hat{m} \\ \hat{v}_0 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}. \]    (5.37)


This gives us a problem we can solve quickly—two equations with two unknowns, as shown in the linear algebra way in equation 5.38. We use elimination and then back substitution to arrive at the answer.

\[ \begin{bmatrix} 14 & 6 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} \hat{m} \\ \hat{v}_0 \end{bmatrix} = \begin{bmatrix} 13 \\ 6 \end{bmatrix} \;\xrightarrow{\text{elimination}}\; \hat{m} = \tfrac{1}{2}, \;\; \hat{v}_0 = 1. \]    (5.38)

Our solution vector x̂ is [1/2, 1]. This tells us everything we need to know about the line. It has slope m̂ of 0.5 and v intercept v̂0 of 1. This slope and intercept are just right to minimize the sum of the squared errors. That sum, 1.5, is the least we can get for a line using this combination of points.
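The same numbers drop out of a few lines of MATLAB; this is one minimal way to carry out the fit in equations 5.36 to 5.38.

u = [1; 2; 3];  v = [2; 1; 3];   % the three data points
A = [u ones(3,1)];               % coefficients matrix from equation 5.36
xhat = (A'*A) \ (A'*v);          % normal equation: xhat = [0.5; 1]
% xhat = A\v gives the same slope and intercept with a more stable algorithm
SSE = norm(v - A*xhat)^2;        % sum of squared errors = 1.5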

5.2.3 The Solution Viewed from Vector Space

Although we got a solution earlier, looking at Figures 5.2 and 5.4 and comparing with equations 5.37 and 5.38 raises some questions. First, why do we have three error vectors e in Figure 5.4, but only one in Figure 5.2 when we were setting up the matrix operations initially? Second, the error vectors are pointing in the v direction, but the vector e is perpendicular to the vector a in Figure 5.3. Does this make sense? Third, the line we seek does not go through the origin, meaning it is not a vector in our usual sense, so how can we use linear algebra? The solution in equation 5.38 is correct, but how did we get it with all these inconsistencies? The answer is simply that Figure 5.4 is a view of the problem better suited to line-fitting using algebra combined with calculus, not linear algebra and projections. Please do not be mistaken; the line is precisely the solution we are seeking for the three points, but using the normal equation 5.37 demands that we look at this in terms of the column space, row space, both null spaces, as well as the u-v physical space which we use to set up the problem, to gain deeper insights into how and why we get the best solution.

How should we go about analyzing the problem? First, consider the upper left diagram in Figure 5.5 that shows the data we are trying to fit—quantities represented by u-v pairs plot as the data points we graph. The axes commonly represent physical quantities in some Physical Space. These could be force and extension on a spring, time and distance traveled, or a host of other factors. We will also have unit vectors û and v̂ carrying the physical units associated with the problem. And finally, u and v are related by an equation, which in this case is the formula for a line um + v0 = v. This takes care of the physical setup, so we can now get to linear algebra. The reader should reference Figure 5.5 to understand the configuration of the subspaces and the operations we describe. Let's restate the linear algebra setup in equation 5.39, first as it would appear as variables in equations and second as the plotted values shown in the figure.

\[ A\hat{x} \approx b \;\rightarrow\; \begin{bmatrix} u_a & 1 \\ u_b & 1 \\ u_c & 1 \end{bmatrix} \begin{bmatrix} \hat{m} \\ \hat{v}_0 \end{bmatrix} = \begin{bmatrix} v_a \\ v_b \\ v_c \end{bmatrix} \;\rightarrow\; \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} \hat{m} \\ \hat{v}_0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}. \]    (5.39)


Diagrams of fitting three points to a line. The upper left shows the points and best-fitting line in the coordinate system u-v. Lower left is the row space. As shown in the orange vectors, the row space spans all of R2 and can reach any combination of m and v0 . The null space is the zero vector shown as the green dot at the origin. The right side shows R3 containing the column space and the left null space. These subspaces are of dimension 2 and 1, respectively. The N(AT ) is orthogonal to C(A). The upper right shows the axes with the a-b plane flat. The lower right is a rotated view, so we look at a relatively flat column space plane. Note that the yellow-colored quadrant is the same part of the a-b plane in both the upper and lower parts. The cyan vector is the data vector we use in this problem and the vector we will project onto the column space shown by the red plane. Figure 5.5


We start understanding the subspaces by first analyzing the coefficients matrix A. It has two columns of three elements, so its column space is in R^3. The matrix A is full column rank, but its columns only span a plane, which makes C(A) a 2-dimensional subspace of R^3. The data vector b must sit in this plane for there to be a unique solution to the Ax = b problem. The row space is defined by the column space of A^T. The transpose has three columns of two elements, so it is in R^2. The matrix A^T is full row rank, so the subspace for C(A^T) fills all of R^2. There is no room left for anything but the zero vector in N(A), the vector space paired with C(A^T) in R^2. We already knew that the null space N(A) is just the zero vector since A is full column rank. The last space we consider is the null space of A^T. Because C(A) spans a 2-dimensional subspace of R^3, we know a single, three-component vector must span N(A^T). We understand that this is a lot to take in about the problem. The main points to remember are that N(A) = 0 and that N(A^T) ≠ 0. There is something in the left null space. As with the null space in the last chapter, we ask: what is hiding in the left null space?

Let's explore the spaces in detail using Figure 5.5. The lower left shows R^2 containing the row and null space. This is the space for the unknowns of slope and intercept for the best-fit line. The row space has axes for m and v0. These may seem like odd axes, but this is the row space and not the data space, so it is posed in terms of the unknowns we are trying to determine. The two orange vectors form a basis for C(A^T). We can see for A^T that we get:

\[ A^T = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix} \;\rightarrow\; C(A^T) = c_1 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 2 \\ 1 \end{bmatrix}. \]    (5.40)

These span the two dimensions of m and v0, and the unknowns vector shown in blue can fit anywhere in the space. The physical and row spaces are both in R^2, but their vector spaces are different. The u and v axes represent some physical quantities, say time and distance. What units are associated with the coordinate system for the row space? The first variable m is the slope, and if u and v are time and distance, then this variable has units of velocity. The second variable, v0, can be a little more complicated. If v is distance, then v0 could have dimensions of length. This shows why we cannot just stick x-y labels on axes simply because they are in R^2. The two plots on the left side of Figure 5.5 are in different vector spaces.

The right-hand part of Figure 5.5 shows two perspectives of C(A) and N(A^T) in R^3. Using components based in these subspaces makes the projection method and the normal equation clear. The C(A) spans 2 dimensions with two independent columns. The N(A^T), which is perpendicular to C(A), is what allows us to span all of R^3. What coordinate axes do we use for R^3? Each column of A represents an unknown in the equation um + v0 = v. The first column is the u component that multiplies the unknown m, and the second is a 1 multiplying the unknown v0. Each row of A is one of the data points we call a, b, and c. Each data point defines one of the components in the R^3 coordinate system. Thus, the columns of A are vectors a1 and a2, and the rows are the components as shown in equation 5.41.



Diagram of the column view for fitting a line to points in equation 5.36. On the left is a 3D view with the projection vector p shown in purple. It lies in the column space. It best fits the cyan data vector b ending at [2, 1, 3]. The error vector e connects the ends of p and b and is parallel to the left null space vector. The upper right side shows the combination of vectors—one half of [1, 2, 3] plus one of [1, 1, 1]—to reach the projection p, and the lower shows the plane itself with the components, column space vectors, and p. Figure 5.6

So we label the axes a, b, and c in Figure 5.5. If we add more data points, we have to add more axes and lose the ability to visualize.

\[ A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{matrix} \;a\text{-component} \\ \;b\text{-component} \\ \;c\text{-component} \end{matrix}, \qquad \text{columns } a_1 \text{ and } a_2. \]    (5.41)

Because the vector b is not in the column space, we have to find the best vector p which is. From the first section, we know that if we project the vector b perpendicularly onto the column space, we find p. The vector between these is called the error vector, so that e = b − p, with e perpendicular to the column space. What else is perpendicular to the column space? The null space of A^T! We are looking for the vector e perpendicular to the span, and that is precisely the orientation of the vector for N(A^T). Thus, establishing the 4 fundamental subspaces gives us all the information we need to solve Ax̂ ≈ b. We transfer the setup in Figure 5.5 to a view focused on the column space and left null space shown in Figure 5.6. For the specific case shown in Figures 5.5 and 5.6, e is the difference between [2, 1, 3] and [1½, 2, 2½]. This gives e = [½, −1, ½]. Thus, e = ½ [1, −2, 1], which is ½ times the vector that spans N(A^T), as shown in Figures 5.5 and 5.6.


These results are summarized in equations 5.42 and 5.43.

\[ A\hat{x} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} 1\tfrac{1}{2} \\ 2 \\ 2\tfrac{1}{2} \end{bmatrix} = p, \]    (5.42)

\[ e = b - A\hat{x} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1\tfrac{1}{2} \\ 2 \\ 2\tfrac{1}{2} \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} \\ -1 \\ \tfrac{1}{2} \end{bmatrix} = \tfrac{1}{2}\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}. \]    (5.43)

This is set up by using the normal equation A^T A x̂ = A^T b. This is called the Normal Equation because it finds the normal direction, which is also the left null space direction, to the column space. It finds the perpendicular direction to C(A) no matter how many dimensions we are in. Notice that the three components we found for e are identical to the three errors we got in Section 5.2.2. This is no accident. We go from the physical view in the upper part of Figure 5.5 to the column space view in Figure 5.6 by posing the problem in terms of the data points and not the u-v axes. The column space, therefore, is defined by the components of our data points. We hope this detailed example makes the operation of fitting data clear to the reader. Each part of the problem is necessary to understand how to reach the solution or unknowns vector x̂. These parameters best fit the data vector relative to the column space. Remember, we still have to compute the product Ax̂ to get the projected vector p. A lot of the time, however, we are most interested in the values of x̂. In the previous example, we are usually most interested in the slope and intercept values from the x̂ vector.
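As a check on this geometry, the short MATLAB sketch below reproduces the worked example and confirms that the error vector sits in the left null space: A^T e is zero, and e is parallel to the basis vector MATLAB returns for N(A^T). The variable names are arbitrary.

A = [1 1; 2 1; 3 1];  b = [2; 1; 3];
xhat = (A'*A) \ (A'*b);   % [0.5; 1]
p = A*xhat;               % [1.5; 2; 2.5]
e = b - p;                % [0.5; -1; 0.5] = 0.5*[1; -2; 1]
A'*e                      % essentially [0; 0]: e is perpendicular to C(A)
null(A')                  % unit vector proportional to [1; -2; 1], spanning the left null space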

5.2.4 Solving the Six-point Problem

Let's go back to the three-point problem we have seen many times before, but now consider it as the Six-Point problem we showed in the previous chapter. We show this diagram in Figure 5.7 and equation 5.44.

\[ \begin{matrix} \text{point 1} \\ \text{point 2} \\ \text{point 3} \\ \text{point 4} \\ \text{point 5} \\ \text{point 6} \end{matrix} \;\; \begin{bmatrix} 5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1 \end{bmatrix} \begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix} = \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 50 \end{bmatrix}. \]    (5.44)

Elimination does not help, as the resultant vector does not zero out, as shown in the reduced augmented matrix in equation 5.45.

\[ \left[\begin{array}{ccc|c} 5 & 70 & 1 & 30 \\ 70 & 50 & 1 & 90 \\ 60 & 10 & 1 & 60 \\ 75 & 40 & 1 & 90 \\ 40 & 55 & 1 & 60 \\ 70 & 20 & 1 & 50 \end{array}\right] \;\rightarrow\; \left[\begin{array}{ccc|c} 5 & 70 & 1 & 30 \\ 0 & -930 & -13 & -330 \\ 0 & 0 & 0.6 & -5.5 \\ 0 & 0 & 0 & -0.53 \\ 0 & 0 & 0 & -0.27 \\ 0 & 0 & 0 & -25.5 \end{array}\right]. \]    (5.45)


Diagram of the three-point problem with additional points. The left side shows maps with points plotted on contour lines. The right side shows fits to three points at a time; the top is 1, 2, 3, and the bottom is 4, 5, 6. Notice the mismatch of unselected points. Both parts are shown from the same view angle, roughly parallel to the surface, looking down to the southwest toward the dip.

Figure 5.7

This means that there is definitely no solution vector that can satisfy all the equations. We can get a solution for several combinations of three points, but not all, and the other points plot above and below the plane so defined, as shown in Figure 5.7. Each combination gives different values for the slopes and z0. This is our general problem when there are more equations than unknowns.

Now, we can solve the six-point problem. Our setup is exactly the same as in equation 5.37, and we use A, A^T, x̂, and b.

\[ \begin{bmatrix} 5 & 70 & 60 & 75 & 40 & 70 \\ 70 & 50 & 10 & 40 & 55 & 20 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1 \end{bmatrix} \begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix} = \begin{bmatrix} 5 & 70 & 60 & 75 & 40 & 70 \\ 70 & 50 & 10 & 40 & 55 & 20 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 50 \end{bmatrix}, \]    (5.46)

\[ \begin{bmatrix} 20650 & 11050 & 320 \\ 11050 & 12525 & 245 \\ 320 & 245 & 6 \end{bmatrix} \begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix} = \begin{bmatrix} 22700 \\ 15100 \\ 380 \end{bmatrix} \;\rightarrow\; \begin{matrix} m_E = 1.07 \\ m_N = 0.688 \\ z_0 = -21.6 \end{matrix}. \]    (5.47)

All we have to do is write down the equation A^T A x̂ = A^T b and plug in the values as shown in equation 5.46. Once this is done, we multiply and get the results in equation 5.47. We then do elimination and back substitution to solve for x̂.
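A minimal MATLAB sketch of this computation, using the coordinates and elevations from equation 5.44, might look like the following.

E = [5; 70; 60; 75; 40; 70];    % easting of each point
N = [70; 50; 10; 40; 55; 20];   % northing of each point
z = [30; 90; 60; 90; 60; 50];   % elevation of each point
A = [E N ones(6,1)];            % coefficients matrix of equation 5.44
xhat = (A'*A) \ (A'*z);         % normal equation: roughly [1.07; 0.688; -21.6]
% xhat = A\z gives the same slopes and intercept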


This procedure has only a couple of steps and is easy to write down as a single linear algebra equation. We could also eliminate using Gauss-Jordan, compute the inverse of A^T A, and solve that way if we wanted. The method and solution could not be more straightforward.

How should we visualize this problem? Our best plane will look like Figure 5.1, where we have a plane with some points above, some below, and perhaps some on or near the plane itself. Why not make a figure showing the plane and the error vectors similar to Figure 5.4? We could do this and set up the problem similar to the solution for a line, but we do not gain any insights into the problem. It is too hard to draw accurately in three dimensions, and an inaccurate drawing gives us no more knowledge than we get from just looking at Figure 5.1. Even if we do this, we will still perform a projection exercise, so we can go straight to equation 5.28 or 5.37 to solve the problem. Can we create a visualization similar to Figure 5.6 to help us see the solution? The answer is a resounding no. Why? Because this would require us to work with three vectors that are 6 × 1. In other words, we will solve for three variables in the vector space R^6, a drawing we do not know how to make. Our path here is just to solve the projection problem, knowing that we already did this in three dimensions easily and that there is nothing special about R^3, so going to higher dimensions is permitted by our method.

5.2.5 Using Algebra and Calculus for Least Squares

We will use something other than the word simple to describe setting up a least squares solution without using linear algebra. The final result is the same as using the previous projection method, but it takes a bit of work to derive and must be re-derived every time we change the number of unknowns, that is, the number of columns in the matrix. In linear algebra, we have one equation derived once. We refer to Figure 5.4 to set up the least squares problem. We see in the figure that there are three error vectors ea, eb, and ec. Our task now is to find values for the entries in x̂, that is m̂ and v̂0, that give the best fit by minimizing the sum of the squared errors. This is why the method is usually called Least Squares fitting. So what is the equation for the error vectors e_i? It will be the difference between the computed v and observed value v_obs at each of the observed points. If the equation of the line is v = mu + v0, then e = v_obs − v. We get e^2 = (v_obs − mu − v0)^2. If we index each of the three points, we will get the following:

\[ e_a^2 = (v_a - m u_a - v_0)^2, \quad e_b^2 = (v_b - m u_b - v_0)^2, \quad e_c^2 = (v_c - m u_c - v_0)^2, \quad \text{and} \quad \sum e_i^2 = e_a^2 + e_b^2 + e_c^2. \]    (5.48)

What is next? We have to minimize the sum of the e_i^2 with respect to both m and v0 simultaneously. This means we must set up coupled equations of partial derivatives and solve them simultaneously. For this, we will have to compute the following:

\[ \frac{\partial\left(\sum e_i^2\right)}{\partial m} \quad \text{and} \quad \frac{\partial\left(\sum e_i^2\right)}{\partial v_0}. \]    (5.49)

After a bunch of painful algebra and taking partial derivatives and more algebra, during which we could make a lot of mistakes, we get to some equations for computing the slope and the intercept. There are many ways to write the result, but we pick one of the more popular notations from the book by Taylor on Error Analysis. Again, this is to get the result we got from using the normal equation and the setup of the matrices.

\[ m \sum_{i=1}^{n} x_i^2 + v_0 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \quad \text{and} \]    (5.50)

\[ m \sum_{i=1}^{n} x_i + n v_0 = \sum_{i=1}^{n} y_i. \]    (5.51)

The index n on the sums is the number of points we use to fit a line. If we use these equations and plug in the case we described for three points fitting a line in Section 5.2.2, we arrive at two coupled equations:

14m + 6v_0 = 13, and    (5.52)
6m + 3v_0 = 6.    (5.53)

These are exactly the same two coupled equations we got in equation 5.38. Of course, these give the result: m = 0.5 and v0 = 1! What an ordeal to arrive at the result we got easily using linear algebra. Lesson learned. This is the correct result, but it seems strange that we started with equations involving powers and ended up with the same linear equations as in linear algebra. The reason is that the highest order terms we differentiate contain either m^2 or v_0^2 in isolated form. Taking the derivative of either of these terms will result in a factor of m or v_0 in isolation so that they can be arranged into the same linear coupled equations. As stated earlier, getting there takes some work and is prone to mistakes, which must be redone for each new situation. In using the projection and matrix method, all we have to do is be able to write a linear equation for the data, like mx + v_0 = y or maybe even y = ae^x, and we can jump quickly and with little effort to the solution. The most important point to remember here is that we have a relationship that can guide us to the best method to follow:

Algebra, derivatives, sums (Hard) → Linear algebra and projection (Easy).    (5.54)

Keeping this rule in mind, the reader should see how much easier linear algebra is for fitting lines.


Figure 5.8 How the 4 subspaces fit into projection to solve Ax̂ = p. The row space and null space are in R^n, and the column and left null spaces are in R^m. The problem is usually set so that m >> n and both the column and row space are of rank = n. The best fitting vector for b is p in C(A). We get the best solution vector x̂ in the row space. The N(A) is the zero vector. The left null space contains the error vector e. Finally, we can find p by using the projector matrix P and e using the complementary projector (I − P).

5.2.6 The Vector Space Picture of Projection

In the last chapter, we took a step back to see how Ax = b fits into the structure of the column spaces, null spaces, and real numbers. We do the same in Figure 5.8. Because we are dealing exclusively with the situation that m > n and A is full column rank, the N(A) is just the zero vector. The C(A^T) has row rank and dimension equal to n and resides in R^n. The solution vector x̂ must sit in the row space. Things get more interesting for R^m. The first observation in Figure 5.8 is that the data vector b is neither in the column space nor the null space. We already know that to get p, we have to use the orthogonal projector P times b to get a vector in the column space. We also have the vector e in the N(A^T). How do we get to e? We want to project in the direction of the null space from b because we know p ⊥ e. We saw in Section 5.1.1 that the matrix (I − P) is the complementary projector to P and orthogonal


to it. For this reason, we can use (I − P) to project onto N(A^T) to get e.

e = b − p = b − Pb = (I − P)b.    (5.55)

And of course e = b − p gives b = e + p, as we can see plotted in Figure 5.8. The data vector b is neither in the column space nor the null space of A^T. But it does reside in R^m, so it must be a combination of the vectors in C(A) and N(A^T). Unlike the case when working with N(A) and mapping vectors in it to the zero vector in R^m, b has no map back to R^n.

5.3 CAN WE FIT SOMETHING OTHER THAN A LINE OR PLANE?

So far, we have fit lines and planes with slopes and intercepts. Are there other functions that will work for our purpose? The answer is a resounding yes. We can use the projection method described here to find the best-fitting unknowns in most equations that we can linearize. Making the problem linear is just getting it into a form where all the unknown terms are to a first power. In the case of a line, we have y = mx + y0. We get the values of x and y as data, so nothing is nonlinear. Our unknowns are m and y0. These are to a single power, so they are linear too. The equation of the line is linear in the unknowns. Can we solve the equation a + b cos(x) = y to determine a and b? At first glance, the cos(x) seems to be a barrier. However, the equation uses the data points specified by x and y. For this reason, cos(x) takes on a scalar value.

\[ Ax = b \;\rightarrow\; \begin{bmatrix} 1 & \cos(x_1) \\ 1 & \cos(x_2) \\ \vdots & \vdots \\ 1 & \cos(x_n) \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}. \]    (5.56)

This means that the constants a and b work just like the slope and intercept for the equation of a line. We could not, however, use this setup to solve a problem like a + b cos(cx) = y. What is the difference with what we do for equation 5.56? In this case, a and b are linear coefficients, but we cannot solve for c because it is a coefficient inside the cosine function. We do not have an easy way to get it outside. What if our function is a + be^x = y? Of course, we can also use the same method for this problem. We would just put together a table that looks like the following:

\[ Ax = b \;\rightarrow\; \begin{bmatrix} 1 & e^{x_1} \\ 1 & e^{x_2} \\ \vdots & \vdots \\ 1 & e^{x_n} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}. \]    (5.57)

And the problem is solved. Again, the x are points that are evaluated by the exponential function. If we put x into e^x, we just get back a scalar.
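As a concrete illustration of such a linearized fit, the MATLAB sketch below fits a + b cos(x) = y to synthetic data invented here (a = 2 and b = 3 plus a little noise); the design matrix simply carries a column of ones and a column of cos(x) values, as in equation 5.56.

x = (0:0.5:6)';                          % made-up sample locations
y = 2 + 3*cos(x) + 0.1*randn(size(x));   % synthetic data with known a = 2, b = 3
A = [ones(size(x)) cos(x)];              % design matrix from equation 5.56
ab = (A'*A) \ (A'*y);                    % ab(1) is close to 2 and ab(2) close to 3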


The setup for solving an exponential to fit data given in the graph and shown in the matrix M. The profile is for the shape of a stream going downriver. The units for the axes are kilometers for x and meters for y. The M entries are each point's x and y values. This is all the information needed. The orange line is the best-fitting exponential for the equation in 5.62. Figure 5.9

We see that equation 5.57 is essentially the same as equation 5.56. It is no more complex than v = mu + v0, which started Section 5.2.

5.3.1 Fitting a Stream Profile

What happens if the exponent contains a constant so that instead of e^x, we start with e^{bx}? We will explore this by trying to fit the parameters of a stream profile. To a good first approximation, the headward to downstream profile will resemble exponential decay. We show an example of a profile in Figure 5.9 along with profile values in a measurements matrix we will call M. The function we will need to fit is ae^{bx} = y. Our data are x and y measurements, and the constants we need to evaluate are a and b. The constant a seems like it should be easy to find, but b is in the exponential term. This is not linear, but can we get a linear representation in both a and b? The answer is yes, and we must take the logarithm of each side of the equation.

ln(ae^{bx}) = ln(y)  →  ln(a) + bx = ln(y).    (5.58)

Once we have done this step, we can use the set of linear equations to construct a coefficients matrix, an unknowns vector, and the data vector:

\[ Ax = b \;\rightarrow\; \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \ln(a) \\ b \end{bmatrix} = \begin{bmatrix} \ln(y_1) \\ \ln(y_2) \\ \vdots \\ \ln(y_n) \end{bmatrix}. \]    (5.59)

This gives us the following equations and the normal equation to solve this problem. The only complication is to remember that we are solving for ln(â), meaning we have to take the exponential of this value to use in our equation.


The setup for tomography. For simplicity, each block is 1 km on a side, and the seismic energy sources enter at the starred arrows at time = 0. The signal travel time is measured across the block and is recorded by the value tn . Equations for each travel time are shown on the righthand side. Figure 5.10

\[ A\hat{x} \approx b \;\rightarrow\; \begin{bmatrix} 1 & 50 \\ 1 & 100 \\ 1 & 150 \\ 1 & 200 \\ 1 & 250 \\ 1 & 300 \\ 1 & 350 \\ 1 & 400 \end{bmatrix} \begin{bmatrix} \ln(\hat{a}) \\ \hat{b} \end{bmatrix} = \begin{bmatrix} \ln(600) \\ \ln(500) \\ \ln(225) \\ \ln(100) \\ \ln(90) \\ \ln(60) \\ \ln(40) \\ \ln(20) \end{bmatrix}, \]    (5.60)

\[ A^T A\hat{x} = A^T b \;\rightarrow\; \begin{bmatrix} 8 & 1800 \\ 1800 & 510000 \end{bmatrix} \begin{bmatrix} \ln(\hat{a}) \\ \hat{b} \end{bmatrix} = \begin{bmatrix} 37.9 \\ 7517 \end{bmatrix}. \]    (5.61)

Now, we can solve the linear algebra problem in equation 5.61 and come up with values for the constants. We can substitute our results for â and b̂ to get the following best-fitting exponential equation:

ln(â) = 6.909,  â = 1001,  b̂ = −0.00988  →  y = 1001 e^{−0.00988x}.    (5.62)
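Here is a minimal MATLAB sketch of this log-linearized fit, using the eight (x, y) profile values read from Figure 5.9 and equation 5.60.

x = (50:50:400)';                          % distance downstream, km
y = [600; 500; 225; 100; 90; 60; 40; 20];  % elevation, m
A = [ones(size(x)) x];                     % design matrix of equation 5.59
c = (A'*A) \ (A'*log(y));                  % c(1) = ln(a_hat), c(2) = b_hat
a_hat = exp(c(1));                         % about 1001
b_hat = c(2);                              % about -0.00988
% fitted profile: y = a_hat*exp(b_hat*x)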

5.3.2 Tomography of the Upper Crust

Our second example is performing seismic tomography. The physical setup is shown in Figure 5.10. If you have been doing the exercises at the end of the chapters, you have seen this several times before. If not, here is some explanation. Tomography is a method by which we can determine the seismic velocity, the speed of a compression wave through the Earth, as it crosses an area of blocks with different velocities. By looking at travel times crossing different blocks in different directions, we can figure out the velocity structure of the blocks. This method is used to look at the entire interior of the Earth using seismic waves produced by large earthquakes. For our example, we will consider four 1 km square blocks to be sampled by the seismic waves. We know that we can write an equation that relates travel time to distance and velocity, and the reader is probably familiar with many problems set up


this way. For our example, we can write the time taken to cross a single block, here taken as the time to traverse Block A whose velocity is v1, as:

v1 [km/s] × t [s] = 1 [km]  →  t = 1/v1 = v1^{−1}  →  if t = 0.2 s, then v1 = 5 km/s.    (5.63)

With this, we can set up an equation for each time t that will depend on the distance traveled and the velocity in each block. The distance in each block is set to 1 km for the horizontal or vertical paths. The distance increases to √2 km for the diagonal paths. For the paths giving t1 and t6 we write the following equations:

t1 = v1^{−1} + v2^{−1},  and  t6 = (√2)v2^{−1} + (√2)v3^{−1}.    (5.64)

All relevant equations for the other times are given on the right side of Figure 5.10. We have the physical setup, but what are we trying to find? The seismic velocities of each block. So, our unknowns are the inverse of the seismic velocities. What is the data vector? It has to be the travel times that we observe. This way, we can arrange each equation as a scalar constant multiplying the inverse velocities and equalling the time. We recast the equations in Figure 5.10 as the linear algebra equation 5.65:

\[ Ax = b \;\rightarrow\; \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \sqrt{2} & 0 & 0 & \sqrt{2} \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & \sqrt{2} & \sqrt{2} & 0 \end{bmatrix} \begin{bmatrix} v_1^{-1} \\ v_2^{-1} \\ v_3^{-1} \\ v_4^{-1} \end{bmatrix} = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \end{bmatrix} = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.50 \\ 0.39 \\ 0.36 \\ 0.48 \end{bmatrix}. \]    (5.65)

As we go through this problem, we must remember we are solving for inverse velocities and will have to take the reciprocal to get the actual seismic velocities in [km/s]. Because this is always the case in this sort of work, it is common in tomography to call inverse velocity Slowness and substitute a variable such as s1 rather than using v1^{−1}. We will not follow that convention here because it introduces another term the reader has to keep track of in this text. Now, we use the normal equation for the matrix and vectors we have set up to determine the unknowns vector and get the velocities in each parcel.

\[ A^T A\hat{x} = A^T b \;\rightarrow\; \begin{bmatrix} 4 & 1 & 1 & 2 \\ 1 & 4 & 2 & 1 \\ 1 & 2 & 4 & 1 \\ 2 & 1 & 1 & 4 \end{bmatrix} \begin{bmatrix} v_1^{-1} \\ v_2^{-1} \\ v_3^{-1} \\ v_4^{-1} \end{bmatrix} = \begin{bmatrix} 1.45 \\ 1.39 \\ 1.42 \\ 1.42 \end{bmatrix}, \]    (5.66)

\[ \text{giving} \;\; \begin{bmatrix} v_1^{-1} \\ v_2^{-1} \\ v_3^{-1} \\ v_4^{-1} \end{bmatrix} = \begin{bmatrix} 0.19 \\ 0.17 \\ 0.18 \\ 0.17 \end{bmatrix} \;\rightarrow\; \begin{matrix} v_1 = 5.3 \text{ km/s} \\ v_2 = 6.0 \text{ km/s} \\ v_3 = 5.5 \text{ km/s} \\ v_4 = 5.8 \text{ km/s} \end{matrix}. \]    (5.67)
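A minimal MATLAB sketch of the tomography solve, built directly from equation 5.65, could look like this.

d = sqrt(2);                                  % length of a diagonal path, km
A = [1 1 0 0; 0 0 1 1; d 0 0 d; 1 0 1 0; 0 1 0 1; 0 d d 0];
t = [0.35; 0.35; 0.50; 0.39; 0.36; 0.48];     % observed travel times, s
shat = (A'*A) \ (A'*t);                       % best-fit inverse velocities (slownesses)
vhat = 1 ./ shat;                             % block velocities, roughly [5.3 6.0 5.5 5.8] km/s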


Problems like the previous one are common in Geophysics. In such problems in the Earth, however, it would be common to work with thousands of velocity cells arranged in 3 dimensions. They present more complex examples to solve with much more detail, but many solutions start with the sort of computations we made earlier.

5.3.3 How Certain are We that the Best Solution is Good?

We will end this section by discussing uncertainty in our results for the best vector. This will not be an in-depth treatment of uncertainty or error analysis but will show how well the results fit the data. When we use the equation Ax̂ = p, we are computing the data vector using our best estimate of the values in the x̂ vector. So our approach will be to use the differences between the p and b vectors to show the mismatch between the observed values in the data vector b and the computed values in the projection vector p. We call the difference the Residual of the Fit or simply the Residual. Close alignment of values will produce a small difference. In contrast, weak alignment will result in higher residuals. Remember, we are trying to figure out velocities, but we are fitting using arrival times in the vector b, so the residuals from the analysis will be in terms of time. We can later convert this to velocity if needed. Both b and p are m × 1 vectors. We can compute their difference by just taking b − p and getting a residuals vector, which we will call e. We use e because the residual corresponds to the difference between observed and fitted values that we have already called the error. As noted earlier, each row of b corresponds to a different data point we are attempting to fit. In computing e, we find the residuals for every data point of the vector. We want to see how big the vector e is. To do this, we just want the magnitude or norm of the residual vector, which is easily computed.

‖e‖^2 = e^T e → (b − p)^T(b − p) = (b − Ax̂)^T(b − Ax̂).    (5.68)

What are the units associated with this quantity? The vectors b and p both have units of time, so the quantity ‖e‖^2 must have units of [s^2]. We will eventually work with the square root of this quantity, which will have the correct units of [s]. Now that we have the magnitude of the error vector, how can we use it? The first thing we must do is normalize its length to reflect the number of data points. For the same magnitude of scatter around a line, we expect a dataset with 100 data points to have a much larger ‖e‖^2 than a dataset with 10 points. We are headed toward making the magnitude of the residual vector akin to the Standard Deviation we apply when taking the mean of a set of values. Recall that we can compute the standard deviation, which we will call σ, using a simple summation.

\[ \sigma^2 = \frac{1}{(n-1)} \sum_{i=1}^{n} (x_i - \bar{x})^2, \]    (5.69)

where n is the number of observations and x̄ is the mean or average value. We will do something similar here, symbolizing it as s = √(σ^2). We replace the part in the summation in equation 5.69 with e^T e. This gives an equivalent treatment for the residuals as the sum of (x_i − x̄)^2. The term with n − 1 is a divisor that normalizes for


the number of points. The equivalent term for the matrix operations is m − r, where m is the number of rows and r is the rank of the matrix, which will almost always be equal to the number of columns. This correction term is based on the number of degrees of freedom in the system, a subject we will not explore in this text. This gives us the final result in units of time:

\[ s^2 = \frac{\|e\|^2}{m - r} \;\rightarrow\; s = \sqrt{\frac{\|e\|^2}{m - r}} = \frac{\|e\|}{\sqrt{m - r}}. \]    (5.70)

Let's compute s using the data from the tomography example.

\[ A\hat{x} = p \;\rightarrow\; \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \sqrt{2} & 0 & 0 & \sqrt{2} \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & \sqrt{2} & \sqrt{2} & 0 \end{bmatrix} \begin{bmatrix} 0.19 \\ 0.17 \\ 0.18 \\ 0.17 \end{bmatrix} = \begin{bmatrix} 0.354 \\ 0.354 \\ 0.511 \\ 0.370 \\ 0.340 \\ 0.491 \end{bmatrix}, \]    (5.71)

\[ e = b - p = \begin{bmatrix} 0.35 \\ 0.35 \\ 0.50 \\ 0.39 \\ 0.36 \\ 0.48 \end{bmatrix} - \begin{bmatrix} 0.354 \\ 0.354 \\ 0.511 \\ 0.370 \\ 0.340 \\ 0.491 \end{bmatrix} = \begin{bmatrix} -0.004 \\ -0.004 \\ -0.011 \\ 0.020 \\ 0.020 \\ -0.011 \end{bmatrix}, \]    (5.72)

\[ s = \frac{\|e\|}{\sqrt{m - r}} = \frac{0.034}{\sqrt{2}} \approx 0.024 \text{ [s]}. \]    (5.73)
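The residual calculation is only a few lines in MATLAB; this sketch repeats the tomography fit and then evaluates the misfit statistic of equation 5.70.

d = sqrt(2);
A = [1 1 0 0; 0 0 1 1; d 0 0 d; 1 0 1 0; 0 1 0 1; 0 d d 0];
b = [0.35; 0.35; 0.50; 0.39; 0.36; 0.48];   % observed travel times, s
xhat = (A'*A) \ (A'*b);                     % best-fit slownesses
e = b - A*xhat;                             % residuals vector
s = norm(e) / sqrt(length(b) - rank(A));    % misfit statistic, about 0.02 s for these data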


This gives us an average residual of the travel times of about 0.02 seconds. Since our total travel times are around 0.4 seconds, this is about 5% (=0.02/0.4). With an average velocity of 5.5 km/s, this works out to a 0.3 km/s residual (5.5×0.05 [km/s]). With more work, we can compute the uncertainties on each of the individual block velocities, but that is beyond the scope of this book.

5.4 SUMMARY

This chapter featured a lot of fundamental equations related to the projection of vectors onto other vectors and subspaces that we can use for solving a host of linear algebra problems. These equations are the basis for the least squares method used throughout the sciences and by anyone wanting to plot one variable against another. The problem centers around how to solve Ax = b when there is no solution because the number of rows greatly exceeds the number of columns. In other words, the number of equations exceeds the number of unknowns. We solve instead for Ax̂ = p, where p is the vector in the column space closest to b. Thus, the vector x̂ is the best solution to the problem. The vector x̂ is usually determined using the Normal Equation. The principal equations for vectors and matrices are given.


For vectors, we get:

Projected vector → a x̂ = p,    (5.74)
Normal equation → a^T a x̂ = a^T b,    (5.75)
Projection matrix → P = (a a^T)/(a^T a),    (5.76)
Solution vector → x̂ = (a^T b)/(a^T a).    (5.77)

For matrices, we get:

Projected vector → A x̂ = p,    (5.78)
Normal equation → A^T A x̂ = A^T b,    (5.79)
Projection or hat matrix → P = A(A^T A)^{−1} A^T = H,    (5.80)
Solution vector → x̂ = (A^T A)^{−1} A^T b.    (5.81)

The solution vector x̂ can arise from many common operations. Getting it from the normal equation can yield a best-fitting exponential to a set of data, a seismic velocity model for the Earth, and solutions to a host of other problems. Using this method relies on being able to write a series of equations, usually related to some physical or modeled process. As stated earlier, this is the basis for the ubiquitous least squares method. The normal equation is perhaps one of the most used equations in science and engineering. This is all based on the method of projection along the left null space.

5.5 EXERCISES

1. We will start with a simple line-fitting exercise. You are given the points (1,2), (3,4), (3,5), (4,4), (4,5), and (4,6).
(i) Perform a least squares fit for the best line. Work this example using the normal equation. Also, compute the residuals of fit.
(ii) Plot the data and line in MATLAB®.
(iii) Now add the point (10,1) and recompute. How different is your result? Show this by comparing the fitted line and the residuals of fit.

2. Use the formula A^T A x̂ = A^T b to solve for the best x (= x̂) for the following matrices and vectors. Also, compare this to the results of using the MATLAB® command \.

(i) \[ \begin{bmatrix} 2 & 3 \\ -9 & 7 \\ 7 & 5 \end{bmatrix} x = \begin{bmatrix} 5 \\ -9 \\ 2 \end{bmatrix}. \]








The graph of data points x-y for data given in the measurement matrix M. The entries in M are each point's x and y values. This is all the information needed, along with the order of the desired polynomial. The orange line is the best-fitting polynomial for a simple parabola. Figure 5.11


3 2 4    2 3  3   −9 7  3     (ii)   =  . 3  7 5     −2 3  3 3 4 −5 (iii) Try to draw a picture of the column space defined by the matrix and the b and x ˆ vectors for part (i). 

3. We have seen several examples of fitting curves. Now, we will fit a set of points that, at first glance, seem to mark a parabola. The measurement matrix M is given in the right part of Figure 5.11. We will do several exercises on this data set to look at fitting the points to polynomials of different degrees. For each part, you will need to find the coefficients of the polynomial and compute the residuals of fit.

(i) Assume that the data can be fit by a parabola defined by the equation au^2 + bu + c = v. Find the coefficients and the residual of this fit. Also, write an equation in the form Ax = b for the problem showing the matrix and vectors.
(ii) Now analyze the data with three more polynomials of increasing degree from 3 to 5. These will be au^3 + bu^2 + cu + d = v, au^4 + bu^3 + cu^2 + du + e = v, and au^5 + bu^4 + cu^3 + du^2 + eu + f = v. Find the coefficients and residuals. Is there any difference you notice between the fits or coefficients using even as opposed to odd power polynomials or terms of u?
(iii) Try fitting the points with a curve of order 7, au^7 + bu^6 + cu^5 + du^4 + eu^3 + fu^2 + gu + h = v. What happens when you take the residual of the fit? Write out the system as a matrix and vectors. What is the rank of the design matrix in this case? Write out the matrices and vectors, but you should use MATLAB® to perform the computations.

CHAPTER 6: Determinants and Orthogonality

This chapter covers two important topics of linear algebra—determinants and orthogonality. Determinants give us some information about a matrix and linear transformations. In particular, they tell us something about how big the matrix is, whether it fills its dimensions, and whether it has a right- or left-handed coordinate system/transformation. Orthogonality is a central theme in the rest of the book. We are figuring out a transformation that takes vectors from the native coordinate axes of the starting matrix to a different but functionally equivalent set of basis vectors that are orthogonal and in a more convenient or natural coordinate system. We start going over orthogonality by presenting what may be familiar to the reader in the cross-product of two vectors.

6.1 DETERMINANTS, STRAIN, AND GEOLOGIC TRANSFORMATIONS

In Chapter 2, we showed examples of the transformation view of matrix multiplication. One of the first geologic examples we discussed was strain. We gave diagrams of pure shear and simple shear deformations using deformed squares and stated that the strain changed the squares to rectangles and parallelograms but did not change the areas. We presented matrices to perform the transformations. We now repeat the pure shear and simple shear approaches but apply them to a brachiopod shell shown in the middle of Figure 6.1. The surrounding shells are deformed by simple shear deformation on the left and pure shear deformation on the right. Any of the deformations may be accompanied by a component of Dilation or Contraction, leading to an increase or decrease, respectively, in the area. Which of the transformations shown preserve the area of the original shell? Do any of them maintain the area? It is difficult to tell whether there is any dilation or contraction just based on a quick examination of Figure 6.1. Perhaps we could get out a ruler, make some measurements, and then have a better idea, but that would take some effort. Is there some way to use linear algebra to figure this out? The answer is, of course, yes, if we look at the transformation matrices. We list these in the following equations,




Center of the diagram shows an undeformed shell of a brachiopod (courtesy of Mark A. Wilson, College of Wooster). Shells A and B are transformed by a simple shear strain of 45◦ with possible changes in area. Shells C and D are deformed by pure shear, again with a possible component of dilation or contraction. Deformations are discussed in the text, as is the overall strain in each case.

Figure 6.1

along with the accompanying dilations.

\[ A = \begin{bmatrix} 1 & 0.707 \\ 0 & 1 \end{bmatrix} \;\rightarrow\; \text{no dilation}, \]    (6.1)

\[ B = \begin{bmatrix} 1.2 & 0.848 \\ 0 & 1.2 \end{bmatrix} \;\rightarrow\; 44\% \text{ area increase}, \]    (6.2)

\[ C = \begin{bmatrix} 2.4 & 0 \\ 0 & 0.6 \end{bmatrix} \;\rightarrow\; 44\% \text{ area increase}, \]    (6.3)

\[ D = \begin{bmatrix} 1.414 & 0 \\ 0 & 0.707 \end{bmatrix} \;\rightarrow\; \text{no dilation}. \]    (6.4)

The area changes can be obtained by computing the determinants of each of the transformation matrices. For these matrices, the determinants are easy to find. We go over computing determinants and more strain examples in the following sections. 6.1.1 Determinants
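One quick way to verify those dilation statements in MATLAB is shown below; the matrices are the ones in equations 6.1 to 6.4, and det() returns the area factor of each transformation.

A = [1 0.707; 0 1];        % simple shear, equation 6.1
B = [1.2 0.848; 0 1.2];    % simple shear with dilation, equation 6.2
C = [2.4 0; 0 0.6];        % pure shear with dilation, equation 6.3
D = [1.414 0; 0 0.707];    % pure shear, equation 6.4
areaFactors = [det(A) det(B) det(C) det(D)]   % approximately [1 1.44 1.44 1]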

We start our study of determinants by computing the determinant of the 2 × 2 matrix A. We symbolize the determinant by det(A). We will more commonly use vertical braces – | | – around the matrix to signify that we are taking the determinant. "

#

a b A= , c d

a b det(A) → |A| → = ad − bc. c d

(6.5)

Equation 6.5 gives the formula for the determinant of a 2 × 2 matrix. This is simple, and we can immediately see from the examples in Figure 6.1 and equations 6.1 to 6.4 that the determinants for these matrices are: |A| = 1, |B| = 1.44, |C| = 1.44, and |D| = 1.

(6.6)


What is the determinant of a matrix that changes neither area nor shape? That could be the identity matrix, and we get |I| = 1, which is true regardless of the number of dimensions! The determinant of each transformation matrix reflects the factor of change in area in each case, so we get the following dilations when we compare these values to 1 for I. Given this, our changes in areas are:

Area changes: A = 0, B = +0.44, C = +0.44, and D = 0.    (6.7)

The results in equation 6.7 are the same as those given in equations 6.1 to 6.4. In Chapter 3, we saw another method to calculate the determinant. The product of the diagonal terms of the matrix D from the factorization LDU gives the determinant. We do not have to go all the way to LDU either, as the main diagonal of the matrix U from LU decomposition by elimination is the same as that of D. The examples in equations 6.1 to 6.4 are in upper triangular form, so the determinants are the same as the product of the terms on the main diagonal of these matrices.

MATLAB® moment—Finding the determinant
Getting the determinant of a matrix requires just a single command.
>>det(A) gives 10 for the matrix A = [4 2; 3 4].
For the matrix B = [3 4 2; 9 3 4], >>det(B) gives:
>>Error using det, Matrix must be square. Bad news, red text.

6.1.2 What does the Determinant Measure?

So, what does the determinant of a matrix tell us? What the determinant does extremely well, if it is not zero, is reflect what we will call the size or Volume of the matrix in whatever space it is in. If the matrix is used as a Linear Transformation, it reflects how much the original object is dilated or contracted. We present some examples in Figure 6.2. And again, if the determinant is zero, the matrix is singular and has no inverse. For the u-v and u-v-w coordinate systems, we will set the unit vectors up to coincide with the axis directions. This gives us the following vectors corresponding to the examples in Figure 6.2.

\[ \hat{u} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \;\; \hat{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad \text{or} \quad \hat{u} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \;\; \hat{v} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \;\; \hat{w} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \]    (6.8)



Diagrams for the determinant representing area and volume. Left is a 2D example with unit vectors and multiples of the unit vectors. The middle is a unit cube. Right is an example where we multiplied each unit vector by three. See the text for examples of matrices associated with these volumes. Figure 6.2

For our first example of unit vectors in the u-v system, we know that the determinant equals 1. This is also the area of the unit box in 2D. What if we double the length of the u-component or the lengths of both the u and v-components? This increases the determinant to correspond to the increase in the area of the enclosed regions. As an aside, the units of our quantities work well for these operations. If we are considering the vectors to have units of [ℓ], then the 2D determinant has units [ℓ^2].

\[ \det[\text{unit box}] = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1 \cdot 1 - 0 \cdot 0 = 1, \]    (6.9)

\[ \det[2u \text{ box}] = \begin{vmatrix} 2 & 0 \\ 0 & 1 \end{vmatrix} = 2 \cdot 1 - 0 \cdot 0 = 2, \]    (6.10)

\[ \det[2u\,2v \text{ box}] = \begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix} = 2 \cdot 2 - 0 \cdot 0 = 4. \]    (6.11)

The determinant computes the volume of an enclosed shape in 3D. Again, the volume of the unit box is equal to 1. Of course, the determinant, in this case, has units of [ℓ^3], exactly what we expect for a volume. What if we triple the length of each of the vectors? This increases the determinant by 3^3 = 27, see Figure 6.2. We will not give the formula for the determinant here, but we can see that tripling gives (3)·(Identity matrix), so the matrix has 3s along the diagonal, which are also the pivots. The matrix is already in the form of D, so we multiply these terms to get the


General area and volume for 2D and 3D from matrices. The matrix columns give the coordinates for each vector in the system. Using determinants and their formulas in 2D and 3D is certainly simpler than trying to figure out areas and volumes geometrically. The back faces of the parallelepiped are opaque for the 3D example. Figure 6.3

determinant.

\[ \det[\text{unit cube}] = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1, \quad \text{and} \]    (6.12)

\[ \det[\text{cube with sides of } 3] = \begin{vmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{vmatrix} = 3 \cdot 3 \cdot 3 = 27. \]    (6.13)

Regardless of the orientations of the vectors, the matrix and its determinant will define a volume. We show two examples in Figure 6.3. Here, we will give constants for the values of the vectors and solve the determinants to get the volume. The boxes in Figure 6.3 are constructed from the following matrices in equation 6.14 and equation 6.15. We can take the determinant of each of the matrices using set formulas. You should remember the formula for the 2 × 2 case. The formula for the 3 × 3 matrix is hard to remember and also has terms that switch signs based on where the multiplier is in the matrix. Higher-dimensional matrices have longer formulas and more sign switches and can be prone to math errors. If you must compute a determinant by hand, the easiest way is to use LDU or LU elimination.

\[ \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \]    (6.14)

\[ \det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{matrix} \phantom{+}a_{11}(a_{22}a_{33} - a_{23}a_{32}) \\ -a_{12}(a_{21}a_{33} - a_{23}a_{31}) \\ +a_{13}(a_{21}a_{32} - a_{22}a_{31}) \end{matrix}. \]    (6.15)
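The sketch below, using an arbitrary 3 × 3 matrix invented for illustration, checks the cofactor formula of equation 6.15 against MATLAB's built-in det().

A = [2 1 3; 0 4 1; 5 2 6];                          % arbitrary example matrix
dFormula = A(1,1)*(A(2,2)*A(3,3) - A(2,3)*A(3,2)) ...
         - A(1,2)*(A(2,1)*A(3,3) - A(2,3)*A(3,1)) ...
         + A(1,3)*(A(2,1)*A(3,2) - A(2,2)*A(3,1));  % cofactor expansion, equation 6.15
dBuiltin = det(A);                                  % same value from the built-in routine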


6.1.3 What is the Volume in Higher Dimensions?

Since we can take the determinant of any square matrix, a fair question to ask is, what does the determinant mean for volume when we get into higher dimensions? For example, if we have the identity matrix in 6D, what does it mean for a cube (if that is the correct term) in 6D to have a volume of 1? We call such cubes in higher dimensions Hypercubes or n-cubes. The determinant then measures the n-volume of the n-cube. The exact application of volume is clear in 2D and 3D, and we will keep our examples in this book to those dimensions. The volume of a 25-cube is not a terribly transparent concept in the Earth Sciences. So, what information do we get from the determinant for twenty-five dimensions? Although the exact meaning of size is a bit hard to understand in 25D, the determinant does give us a measure of how big a matrix is. This is easy to see in 2D and 3D, and we can appreciate why we want to do this. We will see examples applied to strain in a later section. Also, the determinant gives us a factor of dilation or contraction when the matrix is used for a linear transformation.

A curious and important result comes from considering what a zero determinant means relative to the number of dimensions. We already know that |A| = 0 indicates that the columns of a matrix are not all independent and that at least one of the pivots becomes zero. A zero pivot implies that at least one matrix row becomes all zeros during elimination. A zero row or dependent column gives the same information—that the matrix is not full rank and, therefore, represents fewer dimensions than is indicated by the number of columns or rows in a square matrix. We can then ask a simple question—what is the volume of a 24D hypercube in 25D? Surprisingly, it has to be zero, and this is so whenever the rank is less than the number of columns. This makes sense if we return to our familiar 2D and 3D examples. In 2D, a one-dimensional object is a line that has zero area. The determinant is 0. In 3D, a two-dimensional object is a plane, and a one-dimensional object is a line; neither has volume. This pattern continues to higher dimensions.

6.1.4 Why can the Determinant have a Positive or Negative Sign?

We will now compare two matrices that seem very similar. The first is just the identity matrix, and the second is the identity matrix with the rows flipped. These are simple 2 × 2 matrices, and we can use the formula in equation 6.5 to compute the determinants.

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \;\rightarrow\; |A| = 1 \cdot 1 - 0 \cdot 0 = 1, \]    (6.16)

\[ B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \;\rightarrow\; |B| = 0 \cdot 0 - 1 \cdot 1 = -1. \]    (6.17)

What is the difference between the two cases? An area of 1 makes sense, but what does an area of −1 mean? We show in Figure 6.4 the situations for 2D and 3D for positive and negative determinants. The first 2D case is an identity matrix that gives a value for the determinant of +1. The second is a permutation matrix


Figure 6.4 Signs of rotation depending on the matrix. For the left side, a 2D example is shown, one with positive rotation (CCW) that corresponds to a positive determinant value, the other a negative rotation (CW) for a determinant of −1. The first column in the +1 case is u but is v for the −1 case. We see u → v is a CCW rotation, but v → u is CW. The right-hand side shows 3D examples. For the identity matrix, the rotations are positive (CCW) relative to the orthogonal axis, and the determinant is 1. For the example of a permutation matrix for one permutation, the rotations are negative (CW), and the determinant is −1.

that exchanges rows and has a determinant of −1. We see similar computations in the 3D case of the identity and permutation matrices. The figure shows that the matrices describe a right-handed system for matrices with positive determinants, with rotations from positive to positive axes in the counterclockwise direction. For the diagrams with determinants of −1, the rotation is in a negative sense in that it is clockwise. For this reason, a negative value for the determinant indicates that the coordinate system is not strictly right-handed and that the operation of the matrix flips the handedness. This is sometimes referred to as meaning that the matrix creates a Reflection during linear transformation. Regardless, the absolute value of the determinant is the same, indicating that the object occupies the same amount of space in its coordinate system.

6.1.5 Computing Determinants

Historically, the determinant and its computation were a central topic in linear algebra. We can write linear algebraic formulas for computing the determinant. Repeated substitution and computing determinants can give us A−1 to be used to solve for the vector of unknowns in the equation Ax = b. The determinant is integral to an operation called Cramer’s Rule, the classic method used to compute A−1 . But for most purposes, the determinant is just a simple by-product of other calculations that are more efficient at solving Ax = b. In the Chapter on elimination, we noted that an advantage of the LDU method was that the Diagonal Entries of the matrix D could be multiplied together to give the value for the determinant of the matrix. If we just do LU decomposition, we can do the same thing with the entries of the U matrix along the main diagonal. Other ways to get the determinant do not involve elimination, but these methods are computationally slow and long. As an example, the General Formula for an


n × n determinant looks like this, with the power p depending on which term we are computing:

$$|A| = \sum_{k=1}^{n!} (\pm 1)^{p} \cdot (\text{polynomials of degree } k). \tag{6.18}$$

This sum involves working with ∼n! numerical Operations. On the other hand, using the Gauss-Jordan method only requires about ∼n³ operations. Both seem like they could become huge, but n! grows much faster. For example, at n = 5, n! ≈ n³, but by n = 10, n! is about 3600 times larger than n³. Elimination is the easiest and fastest way to get a determinant for most matrices. Another way of getting the determinant is through eigenanalysis. We will commonly factor a matrix into its eigenvalues and eigenvectors. If we take a matrix and figure out its eigenvalues, the determinant of the matrix is equal to the product of the eigenvalues. This is discussed in much greater detail in the next chapter.

MATLAB® moment—Random integer arrays
We emphasize here the randi() function discussed earlier. When making up examples, you often want a matrix or vector whose components are all integers. We get random real numbers between 0 and 1 using rand().
>>a=randi(10) assigns a random integer between 1 and 10 to a.
>>b=randi(5,6,1) gives b, a column vector of 6 integers between 1 and 5.
>>A=randi(10,3,6) gives A, a 3 × 6 matrix of integers between 1 and 10.
The first argument is the maximum value for the random integers. The arguments after this give the size of the array; supply both dimensions when you want a vector.
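These routes to the determinant are easy to compare numerically. Below is a minimal MATLAB sketch with a small made-up matrix; it compares the built-in determinant, the product of the pivots from an LU factorization, and the product of the eigenvalues. All three give the same number.

% Minimal sketch: three routes to the same determinant (the matrix A is made up)
A = [2 1 0; 1 3 1; 0 1 2];
d1 = det(A);                     % built-in determinant
[L, U, P] = lu(A);               % LU factorization with row pivoting
d2 = det(P) * prod(diag(U));     % product of the pivots, times +1 or -1 for any row swaps
d3 = prod(eig(A));               % product (not the sum) of the eigenvalues
% d1, d2, and d3 all return the same value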

6.2 STRAIN, DETERMINANTS, AND BASIC MATRIX OPERATIONS

Now that we can find the determinant, let's see some examples of using it. Since we are making the point that the determinant measures how big a matrix is and how much space it fills, we can ask about several special cases for matrices. We will demonstrate these in terms of finite strains but not offer extensive linear algebra proofs. Again, the reader can go to a linear algebra book for mathematicians, as opposed to Earth Scientists, to get the complete proofs and many more ways of explaining determinants in detail. The straightforward results using the ideas of finite strain make the identities natural. We start by assuming that what we are deforming has an area of 1. This is most easily represented by an identity matrix with |I| = 1. In Chapter 2, we started a discussion of finite strain with the matrices for pure and simple shear as shown here in equation 6.19. We noted that the determinants of the strains were 1, so there was no area change, but we did not show how to compute this. We now know how to take the determinants of these. For pure shear,


Figure 6.5 Simple shear followed by pure shear deformation with no changes in volume. Matrices for the deformation are shown in equation 6.20.

the determinant of the strain matrix is (2)(0.5) − (0)(0) = 1. For simple shear it is (1)(1) − (1)(0) = 1. Since the determinant is 1, there is no volume change, and the strain could be called Plane Strain in two dimensions.

$$\text{Pure shear (100\%)}\ A = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad \text{Simple shear (45°)}\ B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \tag{6.19}$$

6.2.1 Two Strains and the Determinant of AB

Now, let us consider other strain combinations; the first is when we apply one strain and then another. Let's start with examples of simple shear followed by pure shear. In this case, we would just multiply the strain accomplished by simple shear, which is just the strain matrix, by the pure shear matrix. This gives us the result in equation 6.20, which again has a determinant of 1 [(2)(0.5) − (2)(0) = 1]. The result is shown in Figure 6.5.

$$AB = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 0 & 0.5 \end{bmatrix}. \tag{6.20}$$

What if one of the strains changes volume? Let's have the pure shear one double the volume. Although we are unsure how to do this geologically, it is pretty easy to set it up in matrix form. In this case, we could have the equations and result in equation 6.21. We can see that |A| = 2, |B| = 1, and |AB| = 2. It is easy to see by comparing Figure 6.5 to Figure 6.6 that the area of the final strain parallelogram has doubled in Figure 6.6.

$$AB = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 0 & 1 \end{bmatrix}. \tag{6.21}$$

From this we get that the determinant of the Product of Matrices is det(AB) = det(A) × det(B).

Figure 6.6 Simple shear followed by pure shear deformation with changes in volume. Matrices for the deformation are shown in equation 6.21.


Figure 6.7 Constant volume pure shear deformation. Volume is preserved, but the square becomes longer and flatter with more strain.

Figure 6.8 Pure shear deformation that doubles the volume. Two stages of deformation produce 4 times the volume.

6.2.2 The Same Strain Again and the Determinant of A²

Now, we can see what happens to volume if we apply the same strain twice as in Figure 6.7 for no volume change.

$$AA = A^{2} = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 0.25 \end{bmatrix}. \tag{6.22}$$

Now, we will try using a matrix that doubles the volume in Figure 6.8.

$$A^{2} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}. \tag{6.23}$$

From this we get the result that det(A²) = det(A) × det(A) = (det(A))².

6.2.3 Undoing a Strain and the Determinant of A⁻¹

For this case, we must find the Inverse of a matrix that undoes the strain. If we use the example in equation 6.23 and Figure 6.8, we want to go from right back to the left. To do this, we require the inverse of the strain matrix A, which follows.

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad A^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix} \rightarrow A^{-1}A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{6.24}$$

From this we get once again that |A| = 2, and we get that |A⁻¹| = 1/2. This reciprocal relationship between the determinants of A and A⁻¹ must always hold so that the volume of the strained object returns to its original value. From this we get the result that det(A⁻¹) = 1/det(A).


Figure 6.9 Changing the axis along which simple shear occurs. This is done with just the transpose of the simple shear strain matrix. The final results are different but have the same volume.

6.2.4 Switch Strain Axes and the Determinant of Aᵀ

What happens if we Transpose the strain axes? This is shown well if we look at the example of simple shear. We will switch the shearing so that it is upward on the right instead of top to the right. This is shown in Figure 6.9. This involves interchanging the shearing term, the one in the upper right, across the matrix. This operation is the same as transposing, so we get the result for B and its transpose in equation 6.25. It is clear from Figure 6.9 that there is no change in volume, so in this case, the two determinants, det(B) and det(Bᵀ), must both be 1. Even with a volume change, such as that for pure shear deformation for matrix A in equation 6.19, the two determinants are equal whether we are stretching left to right or down to up.

$$B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B^{T} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \tag{6.25}$$

From this we get the result that det(Aᵀ) = det(A).

6.2.5 Multiplying Strains by a Factor of 2 and the Determinant of 2A

Our last case here is multiplying the strain matrix by a scalar value. We will look again at the pure shear case as it is easier to see volume changes. We will see what happens when the scalar factor is 2 and 3 for the pure shear matrix. For these cases, we get the matrices in equation 6.26.

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad 2A = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}, \quad 3A = \begin{bmatrix} 6 & 0 \\ 0 & 1.5 \end{bmatrix}. \tag{6.26}$$

From our previous work, we can get the volume change simply from the determinants. For this we get det(A) = 1, det(2A) = 4, and det(3A) = 9. The relationship shows we take the scale factor to power 2, or the number of dimensions. We saw Figure 6.2 that for the factor of 3 in three dimensions, we increase the size by a factor of 27,


Figure 6.10 Adding two finite strains.

or 3³. We generally take the scalar multiplier to a power equal to the n for a square, n × n matrix. From this we get the result that det(2A) = 2ⁿ det(A), and more generally det(mA) = mⁿ det(A).

6.2.6 Adding Two Strains and the Determinant of A + B

This last section is perhaps the most important to remember and understand. How do we Add strains? The obvious answer is we just compute a result by taking A + B. We can add two like-sized matrices with the same units, so let's do this with A and B from earlier in equation 6.19 and then see what the resulting strain looks like in Figure 6.10.

$$A + B = C \rightarrow \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 0 & 1.5 \end{bmatrix}. \tag{6.27}$$

Something has gone horribly wrong with this approach. We know from the discussion and previous figures that the determinants of A and B are 1. That means they do not change volume by their actions. But the determinant of C is 4.5, meaning that adding two matrices that do not change volume results in a transformation that increases volume several times. In structural geology, straining once and then again is called the Superposition of strains. Superposition is doing one strain, then taking the strained object and doing another. This is matrix multiplication, and we have shown it many times. We also showed in a previous chapter that AB ≠ BA, meaning that the order of multiplications and the order in which we apply strains matters. Saying that we are adding strains together is a misleading and incorrect way of saying we are superposing strains. From this we get the result that det(A + B) ≠ det(A) + det(B), and one strain followed by another is always done by AB, not A + B.
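These determinant rules are easy to confirm numerically. The following minimal MATLAB sketch checks them with the pure shear and simple shear matrices of equation 6.19; it is only an illustration, not a proof.

% Minimal sketch: checking the determinant rules with the strain matrices of equation 6.19
A = [2 0; 0 0.5];               % pure shear, det(A) = 1
B = [1 1; 0 1];                 % simple shear, det(B) = 1
[det(A*B), det(A)*det(B)]       % both 1:   det(AB) = det(A)det(B)
[det(A^2), det(A)^2]            % both 1:   det(A^2) = det(A)^2
[det(inv(A)), 1/det(A)]         % both 1:   det(inverse of A) = 1/det(A)
[det(A'), det(A)]               % both 1:   det(A transpose) = det(A)
[det(2*A), 2^2*det(A)]          % both 4:   det(mA) = m^n det(A), with n = 2 here
[det(A+B), det(A)+det(B)]       % 4.5 versus 2: det(A+B) is not det(A) + det(B)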

6.3 CROSS PRODUCTS

Our next subject is the cross-products of two vectors, a topic the reader may have worked with before. The primary rule here is that cross products only apply to two vectors, no more or less, and are always done in three dimensions. The cross-product produces a third vector that is perpendicular to the ones used for the product. Since we live in 3D, cross-products can be extremely useful and help motivate the discussion of orthogonality. It can also tell us if the vectors are parallel much in the way dot


products can inform about whether vectors are orthogonal. The use of the cross product is that the resultant vector defines the Null Space of the two vectors.

6.3.1 Cross Product of Two Vectors

The Cross Product is a special multiplication we do to two vectors in 3D. If we have two vectors u and v as in equation 6.28, then we define the cross product in the way shown in equation 6.29, which we put into vector form in equation 6.30. Notice the negative sign in front of the middle term.

$$u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, \ \text{and}\ v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, \tag{6.28}$$

$$u \times v = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \hat{i}\begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} - \hat{j}\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} + \hat{k}\begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}. \tag{6.29}$$

MATLAB® moment—Finding the cross product
Getting the cross-product of two vectors is just a single command.
>>c=cross(a,b) gives c = [−6; 12; −6] for a = [1; 2; 3] and b = [5; 4; 3].
>>norm(c) gives 14.7 for the area of the parallelogram.

The result in equation 6.29 is odd in that the computation of the cross product looks like the determinant of a matrix. This matrix has the coefficients of u and v in the second and third rows, respectively, but has the vectors î, ĵ, and k̂ in the first row. The result of taking this determinant is not a single number but a vector, as shown in equation 6.30. Otherwise, the computation of the cross-product is done similarly to that done for the determinant. This similarity will become even closer when we combine dot products with cross products.

$$u \times v = w = \begin{bmatrix} u_2v_3 - u_3v_2 \\ u_3v_1 - u_1v_3 \\ u_1v_2 - u_2v_1 \end{bmatrix}. \tag{6.30}$$

Is there anything special about the vector w? Indeed, there is, as it is Perpendicular to u and v, and its length is equal in magnitude to the Area of the Parallelogram described by u and v. These relations are shown in Figure 6.11.


Figure 6.11 The cross product w of the vectors u and v. Left side shows the vectors in the 123 coordinate system, with unit vectors î, ĵ, and k̂. Middle shows the three vectors and the angle between u and v. It also shows the parallelogram defined by u and v. The right side shows cross-products among the various axes and directions.

If we look at the 1-2-3 coordinate system in Figure 6.11, we can consider what the cross product of the different axes gives. Let's start with 1 × 2. This gives a vector in the 3 direction.

$$1 \times 2 = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} = \hat{k}\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = \hat{k} = 3. \tag{6.31}$$

The same goes for 2 × 3 and 3 × 1. They all give the third axis, and the resultant vector points in the positive direction of the third axis. What if we start with the 3 axis and take 3 × 2?

$$3 \times 2 = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} = \hat{i}\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -\hat{i} = -1. \tag{6.32}$$

In this case, the resulting vector will point parallel to 1 but in the negative direction. How about starting in the negative 2 direction and crossing this with positive 1? In this case, the result points toward positive 3. The vector is perpendicular to the two starting vectors and points in the direction consistent with the right-handed rotation of the first to the second vector. Its length is proportional to the area of the parallelogram formed by the two vectors. So, in computing cross-products, we produce a single vector.

6.3.2 Cross Product and the Angle Between Vectors

In a previous chapter, we saw that the dot product could be used to tell us something about the Angle between vectors. The same is true for the cross-product. Again, we will not go into a proof, but if we have any two vectors u and v that intersect, then we have a simple formula that relates the scalar magnitude of the cross product vector to the angle between them.

$$\|u \times v\| = \|u\|\|v\| \sin\theta = \sqrt{u \cdot u}\,\sqrt{v \cdot v}\,\sin\theta, \tag{6.33}$$

$$\theta = \arcsin\left(\frac{\|u \times v\|}{\|u\|\|v\|}\right). \tag{6.34}$$

Note that θ is 0° to 180°. If we are dealing with unit vectors, such as using direction cosines to represent orientations, then our equations become

$$\|u \times v\| = \sin\theta, \tag{6.35}$$

$$\theta = \arcsin(\|u \times v\|). \tag{6.36}$$
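As a quick numerical check of equations 6.35 and 6.36, here is a minimal MATLAB sketch with two made-up unit vectors; the 40° separation is arbitrary.

% Minimal sketch: angle between two unit vectors from the cross product (vectors are made up)
u = [1 0 0];                        % direction cosines of the first orientation
v = [cosd(40) sind(40) 0];          % second orientation, 40 degrees away in the same plane
theta = asind(norm(cross(u, v)))    % returns 40, matching equation 6.36
% note: asind only returns 0 to 90 degrees, so an obtuse angle reports as its supplement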

We will find it handy to use equations 6.33 to 6.36 in many instances, especially as we work on orientation data or other types of spatially-based geological data. Using the cross product is an easy way to find the perpendicular or pole to a plane, especially if two unit vectors define the plane. A word of caution here: in many sources, you will see equation 6.36 written as θ = arcsin(u × v). The cross product (u × v) is a vector, and we are seeking the norm of this vector.

6.3.3 The Three-point Problem with Cross Products

In the three-point problem, we are trying to find the coefficients to the equation for a plane given in the x-y-z coordinate system in equation 6.37.

$$m_x x + m_y y + z_0 = z. \tag{6.37}$$

Our way of solving it has been to use the coordinates of 3 points to set up 3 coupled equations so that we could solve for 3 unknowns. But there is another way to solve the problem now that we know about cross-products. With three points, we can write coefficients for two vectors. We do this simply by writing the vector between two pairs of points, as shown in Figure 6.12. We will take point 1 as the starting point and compute the 1-2 and 1-3 vectors by subtracting the coordinates of point 1 away from 2 and 3. We are in ENU coordinates. We can then set up the cross-product equations as seen in equation 6.38 and evaluate them to get the result in equation 6.39. Note that we are in the ENU coordinate system on this figure, making counterclockwise rotations on the map surface have positive signs. This is a positive rotation angle since we are taking the cross product of 3-1 to 2-1.

$$\text{3-1} \times \text{2-1} = \begin{vmatrix} \hat{E} & \hat{N} & \hat{U} \\ 55 & -60 & 30 \\ 65 & -20 & 60 \end{vmatrix} = \hat{E}\begin{vmatrix} -60 & 30 \\ -20 & 60 \end{vmatrix} - \hat{N}\begin{vmatrix} 55 & 30 \\ 65 & 60 \end{vmatrix} + \hat{U}\begin{vmatrix} 55 & -60 \\ 65 & -20 \end{vmatrix}, \tag{6.38}$$

$$\text{3-1} \times \text{2-1} = \hat{E}(-3000) - \hat{N}(1350) + \hat{U}(2800). \tag{6.39}$$


Figure 6.12 Using the cross product to solve the three-point problem. The left is the setup of the problem. The vectors from points 1 to 2 and 1 to 3 with their components are given on the right. This problem is posed in the east-north-up (ENU) coordinate system and positive rotations are counterclockwise. The solution using the cross-product vector is shown in orange. Note this is not to scale in that the orange vector should be about 50 times longer. The components indicate the arrow is directed to the southwest. Middle shows how we go from the orientation of the cross-product vector to the east slope. A similar approach is used for the north slope and is explained in the text.

From this, we can easily compute the slopes as they are given by components of the cross-product vector. However, this vector is perpendicular to the surface of interest. For that reason, we have to use the reciprocal of the vector's slope and reverse its sign to convert from the normal vector to the plane. See the center part of Figure 6.12 for reference. For the slope in the east direction, we get mE = −(−3000/2800) = 1.07. Likewise, we have mN = −(−1350/2800) = 0.48 for north. We got these same results using elimination, so the slopes check out. What about the value of U0, as that did not come out of our solution? Nor should it. For the cross product, we had two equations, one for each vector, so we could only solve for two unknowns. To get U0, we must use the fact that we have one more equation, which is the equation for the plane itself given in equation 6.37. We substitute the values for mE and mN into this equation, use data for any of the three points, 1, 2, or 3, and then solve for U0. So the total is three equations, one from the plane and two from the cross product, giving our three unknowns mE, mN, and U0. Using the cross-product works here because the problem is confined to two vectors in 3D. Any more or fewer dimensions or data points and the cross-product method is not at our disposal.
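The whole computation takes only a few MATLAB lines. This minimal sketch uses the vectors of equation 6.38 in ENU order and applies the slope conversion just described; the variable names are ours.

% Minimal sketch: three-point problem by cross product, vectors from equation 6.38 (ENU order)
v31 = [55 -60 30];              % vector from point 1 to point 3
v21 = [65 -20 60];              % vector from point 1 to point 2
n = cross(v31, v21)             % normal to the plane, [-3000 -1350 2800]
mE = -n(1)/n(3)                 % east slope,  about 1.07
mN = -n(2)/n(3)                 % north slope, about 0.48
% U0 then comes from substituting mE, mN, and any one of the points into equation 6.37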

6.3.4 Combining Cross Product and Dot Product

Our next cross-product exercise is combining cross products with the dot product when we have three vectors. In other words, for these vectors, we take the cross product


Cross and dot product. The left side is an oblique view through a parallelepiped tilted away from us. The right side is the backplane viewed from the +b direction. Red vectors a, b, and c have lengths from the origin to the red circles. The cross product of a × b is the blue vector at angle θ to the c vector. The blue dot is the dot product of c with the crossproduct vector. The yellow plane is the base of the parallelepiped. Height equals the dot product of c with a×b, and volume equals base times height. Figure 6.13

of any two and then the dot product of the third. The result is a volume equal to the determinant of the matrix we get by taking the vectors and making them the columns of a 3 × 3 matrix. We illustrate the process we are about to go through in Figure 6.13. In this diagram, we take the base of the object to be formed by vectors a and b at angle ϕ so that the area of the base is equal to a × b. The three vectors a, b, and c form a parallelepiped. The vector from a × b is perpendicular to the base of the parallelepiped. The dot product of c with a × b is the projection of c onto a line perpendicular to the base of the parallelepiped, which is the object's height. This dot product is a scalar value that is equal to the volume. Given this result, what do we get if we expand out the equation for c · (a × b)? Notice that in equation 6.40, we reverse the order of the terms in the middle or ĵ row so that we do not have to worry about carrying through the negative sign.

$$a \times b = \begin{bmatrix} a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1 \end{bmatrix}, \ \text{and}\ c = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}. \tag{6.40}$$

Now, we can take the dot product with the cross product. We know that we can rewrite c · (a × b) as a product using the transpose to get cᵀ(a × b) and get the equations in 6.41. This is the same as the result we got in equation 6.15 for the determinant of a 3 × 3 matrix, of course, with the sign of the middle row made positive by reversing the difference inside the parentheses.

$$c^{T}(a \times b) = \begin{bmatrix} c_1 & c_2 & c_3 \end{bmatrix}\begin{bmatrix} a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1 \end{bmatrix} = c_1(a_2b_3 - a_3b_2) + c_2(a_3b_1 - a_1b_3) + c_3(a_1b_2 - a_2b_1) = \begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}. \tag{6.41}$$


We have made a matrix out of the three vectors a, b, and c, using them as row vectors. The order of the vectors as rows does not matter as the final equation is the same after some painful algebra, regardless of how we put the matrix together. The only difference may be the sign of the result. This is a natural result since exchanging a single pair of rows in a matrix reverses the sign of the determinant. Our final result is that cᵀ(a × b) = det(a b c), up to the sign of the result. Because c · (a × b) produces a single real number, it is commonly called the Scalar Triple Product. The reader may wonder why we took this detour between cross-products, dot products, and determinants. The reason is that much of the work done in the Earth Sciences is in three dimensions, and computing volumes is a common task. This gives us another linear algebra tool to use.
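A short MATLAB check of this result, using three made-up vectors, shows the scalar triple product and the determinant agreeing.

% Minimal sketch: the scalar triple product equals a 3 x 3 determinant (vectors are made up)
a = [1 0 1]; b = [0 2 1]; c = [3 0 0];
v1 = dot(c, cross(a, b))        % c . (a x b), gives -6
v2 = det([c; a; b])             % same value; the parallelepiped volume is abs(v1), here 6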

6.3.5 Using the Cross Product in Orientation Analysis

The cross-product can be helpful for orientation analysis in structural geology. The pole to a plane, strike and dip, dip azimuth and dip, or trend and plunge are limited to exactly three dimensions. What are some other important aspects of orientation analysis? First, we are constantly dealing with direction cosines in orientation analysis, so the length of each orientation vector is always 1. This means the magnitude of the cross-product of two orientation vectors is simply the sine of the angle between them. Second, the cross product gives us the angle between any two orientations using the result in equation 6.34. Third, if we suspect or know that two measurements are related by folding, then the cross product gives us the orientation of the fold axis, as well as the angle of folding. Our first operation is to look at two orientation vectors, the poles to layering or bedding. The definitions of these quantities and their vector representations were given in Chapter 1, and their computation in Chapter 2. Each orientation vector will have length 1 because the sum of the squared direction cosines is 1. We show two such vectors a and b in Figure 6.14 along with their cross-product vector. We can take their cross product, and the coefficients of the resultant vector we will call c as shown in equations 6.42 and 6.43.

$$c = a \times b = \begin{vmatrix} \hat{N} & \hat{E} & \hat{D} \\ a_N & a_E & a_D \\ b_N & b_E & b_D \end{vmatrix} = \hat{N}\begin{vmatrix} a_E & a_D \\ b_E & b_D \end{vmatrix} - \hat{E}\begin{vmatrix} a_N & a_D \\ b_N & b_D \end{vmatrix} + \hat{D}\begin{vmatrix} a_N & a_E \\ b_N & b_E \end{vmatrix}, \tag{6.42}$$

$$c = \begin{bmatrix} c_N \\ -c_E \\ c_D \end{bmatrix}, \ \text{with}\ c_N = \begin{vmatrix} a_E & a_D \\ b_E & b_D \end{vmatrix},\ c_E = \begin{vmatrix} a_N & a_D \\ b_N & b_D \end{vmatrix},\ \text{and}\ c_D = \begin{vmatrix} a_N & a_E \\ b_N & b_E \end{vmatrix}. \tag{6.43}$$

The coefficients for î, ĵ, and k̂ can be either positive or negative from this calculation. A negative k̂ or D component would indicate the vector is pointed in the upward direction. Geologic data are reported only with plunges or dips into the Earth, so we have to have a D component that is ≥ 0. For c to fit a geologic context, we might have to change all components' signs, as shown in equation 6.44.

$$\text{If } c_D < 0 \rightarrow c = -\hat{N}c_N + \hat{E}c_E - \hat{D}c_D. \tag{6.44}$$

Figure 6.14 Left side shows two orientations, a and b, and their cross product c. If the cross product c of a × b points upward away from D, then the signs of its components must change. The right part of the figure is an illustration of a folded layer. The direction along which folding occurred is shown as the axis of folding or simply fold axis. The arrows pointing upward from the surface show two examples, one from each side, of the orientation of a pole. The vector c points in the direction of the axis of folding.

Finding the pole to a plane can be done easily with a cross-product. In a previous chapter, we gave the equations of the direction cosines for a pole to a plane whose orientation was given either by strike and dip or by dip azimuth and dip. We derived these from geometric arguments. Here, we can consider another way of getting the same result for strike and dip measurements by using cross-products. If the reader has any questions on determining the strike and dip of a plane, look again at the discussion in the first chapter. The strike vector is measured on a horizontal plane so that the inclination of the vector is 0. In the NED coordinate system, the strike vector is given simply by the direction cosines of the strike line measured from north. We use the subscript s to indicate that these are for the strike vector. ls = cos(strike), ms = sin(strike), and ns = 0.

(6.45)

Our next vector will be the dip vector, which has an orientation of strike+90° and an inclination of the dip value. A familiar result gives its direction cosines: ld = − sin(strike) cos(dip), md = cos(strike) cos(dip), and nd = sin(dip).

(6.46)

We use d to indicate these are associated with the dip vector. We then take the cross product of these two in the NED system as shown in equations 6.47 and 6.48. The final result in equation 6.48 is the same as in the


previous chapter.

$$\begin{vmatrix} \hat{N} & \hat{E} & \hat{D} \\ l_s & m_s & n_s \\ l_d & m_d & n_d \end{vmatrix} = \hat{N}\sin(\text{strike})\sin(\text{dip}) - \hat{E}\cos(\text{strike})\sin(\text{dip}) + \hat{D}\cos(\text{dip}), \tag{6.47}$$

$$\begin{bmatrix} \sin(\text{strike})\sin(\text{dip}) \\ -\cos(\text{strike})\sin(\text{dip}) \\ \cos(\text{dip}) \end{bmatrix} = \begin{bmatrix} l_p \\ m_p \\ n_p \end{bmatrix}. \tag{6.48}$$

We use the subscript p to indicate that this last result is the vector of direction cosines for the pole to the plane of interest. A straightforward application is figuring out the axis of folding in rocks. We will describe this in much more detail when we have learned about Eigenvectors, but here, we can perform a simple analysis by imagining a folded layer, as shown in Figure 6.14. A good way to understand folding is to start with a planar layer that we bent around the folding axis. This bends the layer in an orientation perpendicular to the axis. Therefore, a pole projected perpendicular to the layer is also perpendicular to the axis. This is true for any pole on the layer. If we have a pole on either side of the axis of folding, as shown in Figure 6.14, these two are both perpendicular to the axis. Taking the cross product of them gives a third vector that is perpendicular to both. This orientation must be the same as the axis of folding. In this way, we can take two observations and find the folding relation between the two. If the layer on either side of the axis was roughly planar, then the cross-product could also give us the angle through which the layer was folded using equation 6.34.
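Here is a minimal MATLAB sketch of this calculation for a made-up strike and dip; it builds the strike and dip vectors of equations 6.45 and 6.46 and confirms that their cross product matches the closed form in equation 6.48.

% Minimal sketch: pole to a plane from strike and dip via a cross product (NED order, angles made up)
strike = 30; dip = 40;                                              % degrees
s = [cosd(strike), sind(strike), 0];                                % strike vector, equation 6.45
d = [-sind(strike)*cosd(dip), cosd(strike)*cosd(dip), sind(dip)];   % dip vector, equation 6.46
p = cross(s, d)                                                     % pole to the plane
p_check = [sind(strike)*sind(dip), -cosd(strike)*sind(dip), cosd(dip)]   % equation 6.48, same values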

6.4 ORTHOGONALITY AND GRAM-SCHMIDT PROCESS

Our next subject is orthogonality and computing orthonormal vectors and matrices. We can get to orthogonal columns if we start with a square or rectangular matrix with independent columns and apply the steps described later in this chapter. If we then adjust the norm of the column vectors to equal 1, the vectors or matrix are called Orthonormal. It is the practice in linear algebra not to call the matrix Orthogonal unless it is square. We commonly use the notation Q for matrices with orthonormal columns and qₙ for individual columns or rows. Orthonormal matrices are of particular interest to Earth Scientists and are the focus of much of the rest of this book. In particular, square orthogonal matrices behave well in all linear algebra operations. For the case of orthogonal columns, we have some special results for the matrix AᵀA. First, we know that AᵀA is square and has size n × n. If we name the columns of A using the notation aᵢ, with i = 1, 2, ..., n, then for the entries of AᵀA we get that

$$(A^{T}A)_{ij} = a_i^{T}a_j = \begin{cases} \|a_i\|^2, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases}. \tag{6.49}$$


This produces a diagonal matrix with the values ∥aᵢ∥² along the diagonal. The fact that aᵢᵀaⱼ = 0 follows simply from the columns being orthogonal, making their dot product zero—aᵢ · aⱼ = 0 if i ≠ j. When we apply the term "normal" or "norm" in the context of orthogonality, we are implying that a vector is of unit length. If we have orthogonal columns in a matrix with unit length, we call them orthonormal. We would also call A by the name Q and the column vectors qᵢ rather than aᵢ. The result for the matrix QᵀQ in this case is:

$$(Q^{T}Q)_{ij} = q_i^{T}q_j = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j \end{cases}. \tag{6.50}$$

The result is, in this case, that we get: (QT Q) = I.

(6.51)

This is true regardless of the shape of Q. We get an additional important identity if Q is square. QT Q = Q−1 Q = I → QT = Q−1 .

(6.52)

One more important feature is associated with orthogonal matrices Q. We can show that the determinant of Q = ±1. In Section 6.2, we showed that det(AB) = det(A) det(B). Using equation 6.51, we would compute: 1 = det(I) = det(QT Q) = det(QT ) det(Q).

(6.53)

We know further that det(AT ) = det(A), so 1 = det(QT ) det(Q) = det(Q) det(Q) = (det(Q))2 → det(Q) = ±1.

(6.54)
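These identities are easy to confirm numerically. A minimal MATLAB sketch, using orth() to manufacture matrices with orthonormal columns from random made-up starting matrices:

% Minimal sketch: checking Q'Q = I and det(Q) = +-1 for matrices with orthonormal columns
Qrect = orth(rand(5, 3));      % 5 x 3 with orthonormal columns (rectangular case)
Qrect' * Qrect                 % 3 x 3 identity to machine precision, equation 6.51
Qsq = orth(rand(4));           % square case: an orthogonal matrix
det(Qsq)                       % +1 or -1, equation 6.54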

6.4.1 Gram-Schmidt Process

We noted that the columns form a basis if a matrix has independent columns. If the columns are orthogonal, then we can have an easier time graphing the system in 2D or 3D and tend to produce good results in linear algebra operations. In both cases, the columns span the column space, but we can think of the orthogonal columns doing this most efficiently. The orthonormal case operates most closely to our typical cartesian coordinate system. We can rewrite any set of independent vectors as a set of orthogonal vectors. We do this by applying what we call the Gram-Schmidt Process for orthogonalization. If we make the vectors into unit length, we can call it Gram-Schmidt orthonormalization. The Gram-Schmidt process can act on any number of vectors of any length. We build the Gram-Schmidt process by selecting a starting vector and then sequentially performing projections to isolate the perpendicular components of the other vectors. Finding the orthogonal vector is the first part of the process, and normalizing is the second, with projection being the key component.


Figure 6.15 Doing the Gram-Schmidt process in 2 dimensions. We create new vectors a⊥ and b⊥ using the starting vectors a and b and projection.

We will start our discussion with vectors in 2D as illustrated in Figure 6.15. On the left side, we see two vectors, a and b, that are not orthogonal but are independent and span the space of the axes u-v. Now, we want to use these to construct orthogonal vectors. We first accept that the vector a is fine as it is and is a good starting point. We may call it a⊥ for the Gram-Schmidt process. We will use the notation with the ⊥ for the orthogonal vectors created in this process. We must construct a perpendicular vector based on b. We have already done this in the previous chapter in the section on projection, where we created a vector e = b−p, where p was the projection vector of b onto a. We know that we can write the equation for p and therefore b⊥ as shown in equation 6.55. Remember that b⊥ is perpendicular to a⊥ , but neither of these vectors is normalized to unit length. To do this we would have to divide b⊥ by ∥b⊥ ∥ and a⊥ by ∥a⊥ ∥. Notice in this process that we are only working with column vectors. This means that the calculation of aTb and aTa produce scalar quantities. !

p=

aT b a, giving b⊥ = b − aTa

!

aT b a. aT a

(6.55)

Another way to describe what we did earlier is that in finding p, we have isolated all of b that is parallel to a. When we then subtract this part from b, all that is left is the part that is perpendicular to a, that being b⊥ . Note also that b⊥ is a vector ⊥ with components b⊥ 1 and b2 . 6.4.3 Orthogonal Vectors in 3D and More

We want to add another vector to the setup in Figure 6.15. To do this, we will have to add another dimension to the plot, shown in Figure 6.16. In this figure, we have a, b, and c that are not perpendicular and that are linearly independent. Since

Determinants and Orthogonality ■ 189

RHS

Gram-Schmidt process in 3 dimensions. The left side shows the three vectors in red. The right side shows the perpendicular components in blue. Orange lines are the differences between the starting and the orthogonal vectors. Note that a⊥ , b⊥ , and c⊥ are mutually orthogonal. Figure 6.16

they are independent, we can recast them as orthogonal vectors that are a basis. We start exactly as earlier with a and our result for b⊥ given in equation 6.55. What can we do to find c⊥ ? We do the previously mentioned operation described for p and b. That is, we want to isolate the components of c that are parallel to a and b. This is given in equation 6.56, a somewhat longer version of equation 6.55. !

c



=c−

aT c a− aT a

(b⊥ )T (c) b⊥ . (b⊥ )T (b⊥ ) !

(6.56)

The Gram-Schmidt process has produced three perpendicular vectors from three independent vectors not initially orthogonal. What if we continue to higher dimensions with more independent vectors? We would just keep plugging new vectors into equation 6.56 to eliminate all but the orthogonal part of the next vector. This procedure is the Gram-Schmidt Process and can be carried out as long as we have independent starting vectors. To finish the process, we will have to make the vectors of unit length via normalization. 6.4.4

Normalizing the Orthogonal Vectors

The previous results have produced a matrix now with orthogonal column vectors. If we start with an n × 3 matrix without orthogonal columns, the Gram-Schmidt process will produce a same-sized matrix but with orthogonal columns. We show this in equation 6.57. 

a1   a2  .  .  .

b1 b2 .. .



c1 c2   ..   → Gram-Schmidt Process → .

an bn cn

a1 b⊥ c⊥ 1 1   c⊥  a2 b⊥ 2 2  . .. ..   . . . .  . an b⊥ c⊥ n n 



(6.57)

190 ■ Linear Algebra for Earth Scientists

Remember, the process acts on the columns of the matrix, so we only need to consider these when we create the normalization factor. For the first column, which √ we symbolize with a, we simply take aTa = ∥a∥, and use this to normalize all the components of a. We get the result in equation 6.58 from this. The symbol a ˆn indicates a component of the unit or normalized vector. a ˆ1 =

a1 a2 an , a ˆ2 = , ···a ˆn = . ∥a∥ ∥a∥ ∥a∥

(6.58)

We can do the same for b⊥ and c⊥ . ˆb⊥ = 1

⊥ ⊥ b⊥ 1 ˆb⊥ = b2 , · · · ˆb⊥ = bn , , 2 n ∥b⊥ ∥ ∥b⊥ ∥ ∥b⊥ ∥

cˆ⊥ 1 =d

c⊥ c⊥ c⊥ 1 2 n ⊥ ⊥ , c ˆ = , · · · c ˆ = . 2 n ∥c⊥ ∥ ∥c⊥ ∥ ∥c⊥ ∥

a ˆ1 a1 b⊥ c⊥ 1 1    ⊥ a ˆ  a2 b2 c⊥   2  .  2 .. ..   .  → Normalization →  .. . .  .  . ⊥ an b⊥ c a ˆn n n 



(6.59)



(6.60)

ˆb⊥ cˆ⊥   1 1 | | | ˆb⊥ cˆ⊥   2 2   ˆ ˆ b⊥ cˆ⊥  . .. ..   → a . . | | | ˆb⊥ cˆ⊥ n n 

(6.61)

Now, we have created an orthonormal matrix using Gram-Schmidt orthonormalization. 6.4.5

Example of Gram-Schmidt

We will take up an example of a 3×3 matrix and show the Gram-Schmidt process taken to orthonormalization. Be warned, this is a messy process and can result in √ lots of terms looking like x. We start with the matrix in equation 6.62. 1 0 3 | | | 1       G = 0 2 0 → a b c , a = a⊥ = 0 . 1 1 0 | | | 1 







 

(6.62)

Then "

b⊥

#

aT b =b− T a= a a

(6.63)

− 12 0 1 0 1 h i 0    1       1    2  2 −  1 0 1 2 0 = 2 − 0 =   . 2 2 1 1 1 1 1 1  



   

 

 





2

(6.64)

Determinants and Orthogonality ■ 191

Followed by (b⊥ )T (c) aT c ⊥ =c− b − a= (b⊥ )T (b⊥ ) aT a "



c  



3   h 0 − 1  − 1   4.5  2 0

#

#

(6.65)

1 i 3 − 2  1 h i 3 1 1     −  1 0 1 0 0 =     2 0  2  2 1 0 0 1 2

  

2

"





   

4 3 −1 1   1  2  3    32  0 +  2  − 0 =     3  2    3 . 1 0 1 − 43 2

 





 



(6.66)



(6.67)

Giving for the final matrix with orthogonal columns 

1 − 12 2 1 12



G⊥

 = 0

 1 √  2 4  3   2  , normalizing →   0 3    4 −3    1 



2

0.5 −√ 4.5 2 √ 4.5 0.5 √ 4.5

2 3



     1   = Q. 3     2



(6.68)

3

This is a messy result, to say the least. Creating orthonormal matrices is critical in linear algebra. As stated earlier, matrices with orthonormal columns work well because they do not give computation difficulties.

MATLAB® moment—Gram-Schmidt process
There is no command for the Gram-Schmidt process in MATLAB®. Instead, we just use the orth() function.
>>C=orth(G) gives C = [−0.99 0.10 −0.092; −0.043 −0.86 −0.51; −0.13 −0.51 0.85] for G in equation 6.62.



Notice that this is very different from the result in equation 6.68. The orthogonal basis of a matrix is not unique. This can be easily seen if we start the Gram-Schmidt process using the second column vector and not the first. The process still produces an orthogonal or orthonormal basis.
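Because there is no built-in command, a small function is one way to carry out the process yourself. The sketch below is a minimal classical Gram-Schmidt implementation; the function name gram_schmidt is ours, and it assumes the input columns are independent.

% Minimal sketch of classical Gram-Schmidt (the function name is ours, not built in)
function Q = gram_schmidt(A)
    [m, n] = size(A);
    Q = zeros(m, n);
    for j = 1:n
        v = A(:, j);
        for i = 1:j-1
            v = v - (Q(:, i)' * A(:, j)) * Q(:, i);   % strip off the part along each earlier column
        end
        Q(:, j) = v / norm(v);                        % normalize to unit length
    end
end

Calling gram_schmidt on the matrix G of equation 6.62 reproduces the matrix Q of equation 6.68, rather than the different (but equally valid) orthonormal basis that orth() returns.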

6.4.6 Orthogonal or Orthonormal?

When do we call a matrix orthogonal as opposed to orthonormal? This is another messy aspect of the whole orthogonal columns and Gram-Schmidt work. In general,


we have the following usage:

Orthogonal matrix = a square matrix with orthonormal columns.
Orthonormal matrix = a rectangular matrix with orthonormal columns.

What do we call a square or rectangular matrix with orthogonal but not orthonormal columns?

Matrix with orthogonal columns = a matrix with orthogonal columns.

This is an excellent place to end this part on Gram-Schmidt orthonormalization. The process is integral to doing linear algebra, especially computational linear algebra. Many times, however, an innocuous-appearing matrix can go to a matrix that is not very attractive. In the text, we use orthogonal to mean perpendicular and normalization to refer to unit length. Thus, orthonormal is perpendicular with unit length.

6.5 QR DECOMPOSITION—OUR NEXT FACTORIZATION

We will end this chapter with another important factorization, one that we call QR factorization or decomposition. With a little work, we can show that m × n matrices A, with m ≥ n, can be decomposed into two other matrices that we call Q and R with the property that A = QR. What matrix is Q in QR factorization? It is the orthonormal matrix that the Gram-Schmidt process produces. What is R? It is an upper triangular matrix that results from the interaction of Q with A. This factorization is used widely in cases where the equation Ax = b has no solution. From the last chapter, this is common for matrices with many more rows than columns where we are performing projection or least square analysis. For QR decomposition in this book, we limit ourselves to full column-rank matrices.

6.5.1 Constructing Q and R

The primary outcome of the previous section was the computation of an orthonormal matrix from a starting matrix with full column rank. This process results in the matrix Q. We also noted previously in Section 6.4 that if we start with an m × n matrix A having orthogonal columns of unit length, taking the product AᵀA produces an n × n identity matrix. We show the result for an m × 3 matrix.

$$Q^{T}Q = \begin{bmatrix} \text{---} & \hat{a}^{T} & \text{---} \\ \text{---} & (\hat{b}^{\perp})^{T} & \text{---} \\ \text{---} & (\hat{c}^{\perp})^{T} & \text{---} \end{bmatrix}\begin{bmatrix} | & | & | \\ \hat{a} & \hat{b}^{\perp} & \hat{c}^{\perp} \\ | & | & | \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{6.69}$$

This is because a ˆ Ta ˆ = 1, but a ˆT ˆ b⊥ = 0, and an n × m times a m × n matrix produces an n × n matrix. Again, we use b⊥ to indicate the orthogonal component of vector b and ˆ b to signify a unit vector.


Let us assume that the factorization A = QR is correct. We can perform the following calculations to get a result for the matrix R. QT (A = QR) → QTA = QT (QR) = (QT Q)R = IR = R.

(6.70)

Why should QᵀA produce an upper triangular matrix? We can reformulate this as follows:

$$Q^{T}A = \begin{bmatrix} \text{---} & \hat{a}^{T} & \text{---} \\ \text{---} & (\hat{b}^{\perp})^{T} & \text{---} \\ \text{---} & (\hat{c}^{\perp})^{T} & \text{---} \end{bmatrix}\begin{bmatrix} | & | & | \\ a & b & c \\ | & | & | \end{bmatrix}. \tag{6.71}$$

The inner product of â with a produces a positive scalar value, the vector length. Likewise, taking the inner products of â with b and c can produce scalar values different from zero so long as the vectors are not orthogonal. But b̂⊥ · a = 0, because the Gram-Schmidt process produces b̂⊥ so that it is orthogonal to the column vectors to its left in the matrix Q, or earlier than it in Qᵀ. Putting this together, we get an R matrix that is n × n and looks like the following for an n = 3 example:

$$R = Q^{T}A = \begin{bmatrix} \hat{a}\cdot a & \hat{a}\cdot b & \hat{a}\cdot c \\ \hat{b}^{\perp}\cdot a & \hat{b}^{\perp}\cdot b & \hat{b}^{\perp}\cdot c \\ \hat{c}^{\perp}\cdot a & \hat{c}^{\perp}\cdot b & \hat{c}^{\perp}\cdot c \end{bmatrix} = \begin{bmatrix} \hat{a}\cdot a & \hat{a}\cdot b & \hat{a}\cdot c \\ 0 & \hat{b}^{\perp}\cdot b & \hat{b}^{\perp}\cdot c \\ 0 & 0 & \hat{c}^{\perp}\cdot c \end{bmatrix}. \tag{6.72}$$

So the matrix R is upper triangular. As we saw in previous chapters, such matrices are ideal for back substitution.

6.5.2 Using QR Decomposition When Ax=b has No Solution

When we are presented with a problem with many more equations than variables, we know that there is most likely no exact solution. The last chapter covered this case extensively and developed techniques to find the best solution that we called x̂. To get to this, we worked through what we called the normal equation, which is shown in equation 6.73. Now that we have the QR factorization, let's use it in the normal equation to recast it into a clearer form.

$$A^{T}A\hat{x} = A^{T}b \rightarrow \hat{x} = (A^{T}A)^{-1}A^{T}b, \tag{6.73}$$

$$\hat{x} = ((QR)^{T}QR)^{-1}(QR)^{T}b. \tag{6.74}$$

Now we can rearrange all the terms in equation 6.74 and simplify using the fact that QᵀQ = (Rᵀ)⁻¹Rᵀ = I and that (QR)ᵀ = RᵀQᵀ:

$$\hat{x} = (R^{T}(Q^{T}Q)R)^{-1}R^{T}Q^{T}b = (R^{T}R)^{-1}R^{T}Q^{T}b \tag{6.75}$$

$$= R^{-1}(R^{T})^{-1}R^{T}Q^{T}b, \ \text{giving}\ \hat{x} = R^{-1}Q^{T}b. \tag{6.76}$$

And if we multiply each side of the last equation in 6.76 by R, we get the nice result:

$$R\hat{x} = Q^{T}b. \tag{6.77}$$


This is an especially lovely result because the matrix R is upper triangular, and QTb is just a vector. That means that we can just read off the components for x ˆ by back substitution into equation 6.77.

MATLAB® moment—Using QR decomposition
QR decomposition uses the qr() function. MATLAB® handles the Q part of QR decomposition a little differently than we do in this text. We will start with a random matrix and then do the decomposition.
>>A=randi(9,4,2) gives A = [2 6; 5 3; 9 7; 4 3], then >>[Q,R]=qr(A) gives
Q = [−0.18 0.97 −0.16 −0.05; −0.45 −0.23 −0.78 −0.36; −0.80 −0.06 0.56 −0.19; −0.36 −0.05 −0.21 0.91] and R = [−11.22 −9.09; 0 4.52; 0 0; 0 0].
The matrix Q here has four columns. The first two come from the normalization of the matrix A and the next two serve to form the rest of an orthonormal basis in R⁴ as computed by MATLAB®. Because each column of A has four components, an orthogonal matrix should span R⁴.

Let's return to our example of fitting a parabola from the exercises in the last chapter. We were trying to solve for a, b, and c in the equation ax² + bx + c = y of a parabolic but scattered data set. We set up the problem, with no solution, as:

$$Ax = b \rightarrow \begin{bmatrix} 4 & -2 & 1 \\ 1 & -1 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \\ 9 & 3 & 1 \\ 16 & 4 & 1 \\ 25 & 5 & 1 \\ 36 & 6 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 7 \\ 3 \\ 2 \\ 1 \\ -1 \\ 1 \\ 6 \\ 7 \end{bmatrix}. \tag{6.78}$$

We will not show the steps, but we can orthonormalize A and then do QR decomposition on this and get the following matrices, rounded to two significant figures. The results look messy, but we warned you that the Gram-Schmidt process could produce


ugly results.

$$Q = \begin{bmatrix} 0.084 & -0.75 & 0.29 \\ 0.021 & -0.33 & 0.42 \\ 0 & 0 & 0.49 \\ 0.021 & 0.23 & 0.50 \\ 0.19 & 0.37 & 0.37 \\ 0.34 & 0.28 & 0.22 \\ 0.52 & 0.095 & 0.0095 \\ 0.75 & -0.20 & -0.25 \end{bmatrix}, \quad R = Q^{T}A = \begin{bmatrix} 47.7 & 8.89 & 1.93 \\ 0 & 3.61 & -0.32 \\ 0 & 0 & 2.04 \end{bmatrix}, \tag{6.79}$$

$$R\begin{bmatrix} \hat{a} \\ \hat{b} \\ \hat{c} \end{bmatrix} = Q^{T}b \rightarrow \begin{bmatrix} 47.7 & 8.89 & 1.93 \\ 0 & 3.61 & -0.32 \\ 0 & 0 & 2.04 \end{bmatrix}\begin{bmatrix} \hat{a} \\ \hat{b} \\ \hat{c} \end{bmatrix} = \begin{bmatrix} 9.20 \\ -7.42 \\ 1.91 \end{bmatrix}. \tag{6.80}$$





 



 





Now, by back substitution, we get that:

$$\hat{c} = 1.91/2.04 = 0.936, \tag{6.81}$$

$$\hat{b} = (-7.42 + 0.30)/3.61 = -1.97, \ \text{and} \tag{6.82}$$

$$\hat{a} = (9.20 + 17.51 - 1.81)/47.7 = 0.525. \tag{6.83}$$
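The same numbers drop out of MATLAB in a few lines. This minimal sketch rebuilds the data of equation 6.78 and uses an economy-size QR; the signs inside Q and R may differ from those printed above, but the solution is the same.

% Minimal sketch: the parabola fit of equation 6.78 via QR decomposition
x = [-2 -1 0 1 3 4 5 6]';
y = [ 7  3 2 1 -1 1 6 7]';
A = [x.^2, x, ones(size(x))];     % columns multiply a, b, and c
[Q, R] = qr(A, 0);                % economy size: Q is 8 x 3, R is 3 x 3
coef = R \ (Q' * y)               % back substitution on R*coef = Q'*y
% gives roughly 0.52, -1.97, 0.94, matching equations 6.81 to 6.83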

6.6 SUMMARY

We first viewed the determinant as a measure of the size and orientation of a matrix. By size, we meant what the matrix does to a vector or how we view the matrix as performing a linear transformation. We stated in previous chapters that the linear transformation view of a matrix is one of the fundamental ways of understanding linear algebra. We presented this in terms of the geologic strain of an object by deformation and showed examples of changing the shape and area of some two-dimensional images. We used the determinant of the matrix for linear transformation to understand whether the deformation was dilatational or contractional. We called the deformation plane strain for 2D if the size was unchanged. The determinant can have either a positive or negative sign. A positive sign means that the linear transformation returns a picture similar to the starting one, although with some changes in lengths and angles. On the other hand, a negative sign shows that the axes' role changes during the transformation, and the overall coordinate system goes from right-handed to left-handed. This change is important in many applications. We also have several rules involving square matrices and their determinants.

1. det(AB) = det(BA) = det(A) × det(B).
2. det(A²) = det(A)².
3. det(A⁻¹) = 1/det(A).


4. det(Aᵀ) = det(A).
5. det(Q) = ±1 for orthogonal matrices.

A special operation in linear algebra can be done for three-dimensional cases, taking a cross-product. The cross product makes a vector combination of two other vectors that works almost exactly like computing a determinant. The operation produces a perpendicular (orthogonal) vector to the two vectors and has a length proportional to the area of the parallelogram described by the two vectors. Cross products are a quick operation to find an orthogonal vector in three dimensions. When the cross product is combined with a dot product, such as computing c · (a × b), it produces the volume of the parallelepiped described by a-b-c.

The last part of the chapter covered orthogonality. Our primary outcome was a method to turn a matrix with independent columns into one where all the columns are orthogonal and normalized to unit length. We used what we called the Gram-Schmidt process to accomplish this chore. Gram-Schmidt is an algorithm that leads us through creating the orthonormal matrix. Once created, we can use the matrix as a basis for our system. We can also use the result to generate the QR factorization that is handy in solving Ax = b when A has many more rows than columns.

6.7 EXERCISES

1. Compute determinants of the linear transformations implied in Figure 6.17. The upper left is the pre-deformed state, and all other ellipses are deformed versions of this circular marker. Use the grid to determine the size of each feature and write a deformation matrix to go from the circle to this shape.

Figure 6.17 Undeformed initial circle in the upper left and four ellipses created by deformation.

2. Perform the Gram-Schmidt process on the following matrix. Also, compute the orthogonal basis using the MATLAB® orth() command. Once you have the matrix Q, show that it is orthogonal by computing the dot products of the columns.

$$\begin{bmatrix} 1 & 1 & -2 \\ 2 & 3 & -2 \\ 2 & -2 & -1 \end{bmatrix}.$$



3. Now, we will return to the six-point problem we saw in Chapter 5. We solved this problem using the normal equation, but now we are going to work on it with QR decomposition. Here are the matrix and vectors we need to use.

$$A = \begin{bmatrix} 5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix}, \quad b = \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 70 \end{bmatrix}.$$





Plan on using the MATLAB® orth() command on the matrix A to get Q. You will then have all the matrices and vectors to solve the problem.

CHAPTER

7

An Earth Science View of Eigenvalues and Eigenvectors

This chapter is about Eigenvectors and Eigenvalues. Eigenanalysis finds these values and vectors and can lead to profound insights into the fundamental properties of a matrix and, therefore, processes and measurements of the Earth. Getting through this chapter will let the reader declare that they understand the essential features of linear algebra and its applications to many parts of the Earth Sciences.

7.1 MOTIVATION FOR EIGENANALYSIS

Geologic strain is one of the most visible and easy-to-understand linear transformations we encounter. In this chapter, we will rely heavily on examining deformed rocks. Strain compares the deformed state to the undeformed state. Strain is measured by comparing lengths and angles of observed features as they appear before and after deformation. Features where we know the initial shape present perfect examples for doing Eigenanalysis to determine the eigenvectors and eigenvalues for a geologic problem. We start with examples of a deformed fossil. In each case, the fossil is a brachiopod, shown in the left side of Figures 7.1 and 7.2, the same fossil we saw in the last chapter. It is repeated nine times; each shell rotated 10° counterclockwise from the last. In Figure 7.1, the array of shells is sheared by 30° on the top relative to the base, and each shell is deformed in this strain field. If we look at this as a simple shear problem, our strain matrix is given in equation 7.1. This is a linear transformation. It is easy to see that all the shells change shape differently depending on their orientation. Are there any orientations where it is easier to understand the strain matrix based on the shape of the shell?

$$\text{Simple shear strain} \rightarrow \begin{bmatrix} 1 & \sin(30°) \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}. \tag{7.1}$$



Figure 7.1 Left shows an array of brachiopods. Each one is rotated 10° relative to the one before. Right shows the array sheared 30°. The original image is courtesy of Mark A. Wilson, College of Wooster.

Left shows an array of brachiopods as in Figure 7.1. Right shows the array elongated 40% horizontally. They were also shortened 30% vertically so that the deformation roughly preserves area. Figure 7.2

In Figure 7.2, the original array of shells is stretched about 40% in the horizontal direction and shortened about 30% vertically. We can formulate this as a pure shear problem that preserves area, giving factors of √2 and 1/√2 for stretching, as shown in the strain matrix in equation 7.2. Again, it is easy to see that all the shells change shape, but in different ways depending on their orientation. Is there any particular or special orientation or shell for the pure shear case that tells us something about the strain matrix?

$$\text{Pure shear strain} \rightarrow \begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix}. \tag{7.2}$$

In both cases, the upper left shell in each array shows the special orientations well, illustrated with examples in Figure 7.3. We can see that the long axis of the shell is unrotated by the deformations. For simple shear, we point out only one special direction, at least for now: horizontal. This is the only line that is not rotated. For pure shear, two directions do not rotate, horizontal and vertical, again evident in


Figure 7.3 Upper shows the original and deformed shell from the upper left positions in Figure 7.1. Note that the long axis or hinge line is the only direction that points in the same direction before and after deformation. Lower shows the original and pure-shear deformed shell from the upper left positions in Figure 7.2. The long and short axes are not rotated, but all other lines rotate.

Figure 7.4 Some examples taken from Figures 7.1 and 7.2. The upper row shows undeformed shells, and the lower row shows deformed shells. The left shows simple shear deformation. A single but different ridge in each is roughly horizontal and does not change orientation during the deformation. Right shows the middle shell in the array from Figure 7.2. Two ridges, one horizontal and one vertical, are not rotated. All other ridges have a different orientation.

the shell. In the case of simple shear, the length of the line remains unchanged. The horizontal line is lengthened for pure shear, and the deformation shortens the vertical line. We can find more lines that appear unrotated in the other shells in Figure 7.1 and 7.2, and show some examples in Figure 7.4. For the simple shear case, we can see several examples where one of the shell’s ridges does not rotate and does not change length. These are always oriented roughly in the horizontal direction. In pure shear, we see shells where two lines do not rotate, but the length and width of the shells do change. One example is shown in the right part of Figure 7.4. Ribs in these orientations are perpendicular before and after the pure shear deformation and do not undergo rotation.


7.1.1 The Special Directions are the Eigenvectors

We have identified in these examples the special directions that are unchanged during a linear transformation or, in our case, strain. For these orientations, we have corresponding vectors. We refer to these vectors as the Eigenvectors. This does not imply that nothing happens in these directions, far from it. We can easily see in the pure shear example in Figures 7.3 and 7.4 that although the directions did not change, the length of the shells did. In other words, the result of our linear transformation left the eigenvectors in the same orientation (the definition of eigenvectors) but did change their lengths. In linear algebra, the eigenvectors are an essential set of vectors that we associate with a matrix. In our example, we had a 2-dimensional strain matrix, and simple shear gave one eigenvector, and pure shear gave two. In most cases, we will work with matrices that provide the same number of eigenvectors as dimensions. Remember, the strain matrix applies a linear transformation to the whole system, and the eigenvectors are components that may change length but do not rotate during the transformation.

7.1.2 The Scaling Factors are the Eigenvalues

The scaling factors of the eigenvectors are also special, and they are called the Eigenvalues. These can be real or complex numbers for a matrix, but we get reals for most Earth Science problems. The eigenvalues tell us how much the eigenvector is stretched in the transformation and whether the vector flips direction. Like the case for determinants, negative signs represent changes in the coordinate system. In the previous examples, we can quickly evaluate the eigenvalues. For the case of simple shear, the shell does not change its length at all in the eigenvector direction. For this case, the eigenvector has an eigenvalue of 1. We have lengthened the shells about 40% horizontally and shortened about 30% vertically for pure shear. In this case, the associated eigenvalues are about 1.4 and 0.7, respectively.

7.2 EIGENVALUES AND EIGENVECTORS

We aim to find eigenvalues and eigenvectors to identify vectors that stay in the same orientation between the untransformed and transformed states but are squished, stretched, flipped, or some combination of these changes. We use the matrix A for the linear transformation. This matrix can act on any vector, but the eigenvectors are those vectors that remain aligned with their original directions after we perform the transformation with A. We can write this simply as:

$$A\mathbf{x} = \lambda\mathbf{x}, \quad \text{where } \mathbf{x} \text{ is an eigenvector and } \lambda \text{ is the associated eigenvalue.} \tag{7.3}$$

The eigenvector x is on both sides of the equation. On the left, it is transformed by the matrix A. The result on the right is the same vector multiplied by a scalar, the eigenvalue. This expresses well the conditions we described earlier for both the eigenvector and the eigenvalue; the only difference is that we call the eigenvalue λ instead of the stretching factor. In equation 7.3, we are only dealing with one eigenvector and one eigenvalue at a time.


7.2.1 Finding Eigenvalues with the Characteristic Equation

We can rearrange equation 7.3 and do some multiplications to come up with a result that can lead us to the eigenvalues. Although we started with the special directions as the eigenvectors, we commonly get them by finding the eigenvalues first and then solving for the eigenvectors.

$$A\mathbf{x} - \lambda\mathbf{x} = \mathbf{0}, \quad \text{where } \lambda \text{ is a scalar,} \tag{7.4}$$

$$A\mathbf{x} - \lambda I\mathbf{x} = \mathbf{0}, \quad \text{where } \lambda I \text{ is a matrix,} \tag{7.5}$$

$$(A - \lambda I)\mathbf{x} = \mathbf{0}, \quad \text{where } A - \lambda I \text{ is a matrix.} \tag{7.6}$$

For equation 7.6 to be true, we must meet one of two conditions. The first possibility is that the vector x is the zero vector. This is the trivial case and is always a solution; as we know, the zero vector is part of every vector space. In the previous examples of shells, we saw that nonzero vectors correspond to the eigenvectors. If x is not zero, then the other possible condition is that the matrix (A − λI) is singular. If this matrix is not full rank, then there is a null space and, therefore, a nonzero solution for x in the equation (A − λI)x = 0. This means that equation 7.7, known as the Characteristic Equation of a square matrix, must be satisfied.

$$\det(A - \lambda I) = |A - \lambda I| = 0. \tag{7.7}$$

Expanding this out for a 2 × 2 matrix with arbitrary real components gives us the result in equation 7.8.

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad \lambda I = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}, \quad |A - \lambda I| = \begin{vmatrix} a-\lambda & b \\ c & d-\lambda \end{vmatrix} \rightarrow$$

$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0. \tag{7.8}$$

We call equation 7.8 the Characteristic Polynomial of the matrix. The polynomial, in this case, is a quadratic equation that we know how to solve. We will get two values, which we will call λ1 and λ2, from the quadratic formula using the coefficients in equation 7.8.

$$\lambda_1, \lambda_2 = \frac{(a+d) \pm \sqrt{(a+d)^2 - 4(ad - bc)}}{2}. \tag{7.9}$$

This is the general solution for the eigenvalues in the case of a 2 × 2 matrix. The two values are just the ones needed to make the matrix in the characteristic equation singular. In some cases, especially those used as exercises in books on linear algebra, we can easily factor the polynomial in equation 7.8 or find its roots by inspection.

7.2.2 Two Examples from Strain

Below are examples of finding the eigenvalues for the cases of simple and pure shear deformation presented in equations 7.1 and 7.2 and Figures 7.1 and 7.2.


$$\begin{vmatrix} 1-\lambda & 0.5 \\ 0 & 1-\lambda \end{vmatrix} \rightarrow (1-\lambda)(1-\lambda) - 0 = 0 \rightarrow \lambda_1 = \lambda_2 = 1, \tag{7.10}$$

$$\begin{vmatrix} 1.41-\lambda & 0 \\ 0 & 0.71-\lambda \end{vmatrix} \rightarrow (1.41-\lambda)(0.71-\lambda) - 0 = 0 \rightarrow \lambda_1 = 1.41,\ \lambda_2 = 0.71. \tag{7.11}$$

These are two cases where we could have figured out the eigenvalues just by inspection. We emphasize again: the simple shear case gave us a repeated eigenvalue, and the pure shear example gave two distinct eigenvalues. Consider a more general example of a 2 × 2 matrix. In this case, we can factor the characteristic polynomial into two distinct roots for λ.

$$\begin{bmatrix} 5 & -3 \\ 1 & 1 \end{bmatrix} \rightarrow \begin{vmatrix} 5-\lambda & -3 \\ 1 & 1-\lambda \end{vmatrix} \rightarrow (5-\lambda)(1-\lambda) + 3 = 0 \rightarrow$$

$$\lambda^2 - 6\lambda + 8 = 0 \rightarrow (\lambda-4)(\lambda-2) = 0 \rightarrow \lambda_1 = 4,\ \lambda_2 = 2. \tag{7.12}$$
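One way to check this result numerically is a minimal MATLAB sketch: build the matrix from equation 7.12, form its characteristic polynomial, and compare the polynomial roots with the output of eig().

% A minimal check of equation 7.12: characteristic polynomial versus eig().
A = [5 -3; 1 1];

% poly(A) returns the coefficients of det(lambda*I - A), here [1 -6 8],
% i.e. lambda^2 - 6*lambda + 8, as in equation 7.12.
p = poly(A);

lambda_from_poly = roots(p)   % gives 4 and 2
lambda_from_eig  = eig(A)     % also gives 4 and 2 (possibly in a different order)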

If we have a larger matrix, say a 4 × 4 one, our characteristic polynomial will have order 4. We can only apply eigenanalysis to square matrices, so an n × n matrix gives us a polynomial of degree n. This means that as n grows toward 50 or more, it becomes increasingly difficult to find the eigenvalues by hand. Fortunately, in many of our Earth Science examples, we will end up with 3 × 3 matrices for the eigenanalysis.

Why is the eigenvalue equation characteristic of the matrix? This is a bit of a trick question. The most direct answer is that the word Eigen is German and can be translated as own. Another possible translation is characteristic, so the eigenvalues are the matrix's own characteristic values. The first use of eigen with value and vector is traced back to mathematician David Hilbert in 1904, although the concept preceded his usage.

Does the order of the eigenvalues matter? The short answer is no; the ordering of the eigenvalues by magnitude implies nothing. They can be listed from greatest to least or least to greatest. What is critical is that whatever order is used for the eigenvalues, the same order must be used for the eigenvectors. The standard convention is to use λ1 to label the largest eigenvalue, λ2 for the next largest, and so forth. In the Earth Sciences, we usually follow the order λ1 ≥ λ2 ≥ · · · ≥ λn because we order principal axes for stress or strain in this manner.

7.2.3 Finding the Eigenvectors

We have the eigenvalues. These are the stretching factors for vectors that remain in the same orientation after transformation. So, how do we find the eigenvectors? We do this by plugging each eigenvalue into the matrix A − λI. The eigenvalues make this matrix singular, and once we have done the substitution, we have an equation we can solve for x, an eigenvector of the matrix. The eigenvector is in the null space of (A − λI). We repeat this operation for every distinct eigenvalue.


Let's start with the example of pure shear from equation 7.11. Here, we let the vector b be the zero vector to find the null space. Out of this substitution, we get two more matrices and their linear algebra equations in the form Ax = b.

$$\begin{bmatrix} 1.41-\lambda & 0 \\ 0 & 0.71-\lambda \end{bmatrix}\mathbf{x} = \mathbf{0} \rightarrow \begin{bmatrix} 0 & 0 \\ 0 & -0.71 \end{bmatrix}\mathbf{x} = \mathbf{0}, \ \text{and} \ \begin{bmatrix} 0.71 & 0 \\ 0 & 0 \end{bmatrix}\mathbf{x} = \mathbf{0}. \tag{7.13}$$

Of course, these matrices are singular; that was the whole point of the characteristic polynomial, to find values of λ that make the matrix singular. The substitution of the eigenvalues takes an invertible matrix and makes it into a matrix with a null space. Now, we solve for the different eigenvectors in equation 7.13 by finding the null space of the new matrices. These equations are easy to solve for x. If we use u and v for the unknowns and x1 and x2 for the solution vectors, then the equations for the null space become:

$$\lambda_1 = 1.41 \rightarrow 0u - 0.71v = 0, \quad \mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \text{and} \tag{7.14}$$

$$\lambda_2 = 0.71 \rightarrow 0.71u - 0v = 0, \quad \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{7.15}$$
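A minimal MATLAB sketch of this null-space view of equations 7.13 through 7.15 might look like the following; it uses only the built-in functions eig() and null().

% Finding an eigenvector as a null-space vector: substitute each eigenvalue
% of the pure shear matrix into (A - lambda*I) and ask for its null space.
A = [1.41 0; 0 0.71];
lambda = eig(A);                 % 0.71 and 1.41 (MATLAB may order them this way)

for k = 1:length(lambda)
    % null() returns an orthonormal basis for the null space of the
    % singular matrix (A - lambda*I); each basis vector is an eigenvector.
    x = null(A - lambda(k)*eye(2));
    fprintf('lambda = %.2f, eigenvector = [%.2f %.2f]\n', lambda(k), x(1), x(2));
end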

Since these are singular, the matrices have a free variable. We set the free variable to 1 in each case and then solve for the pivot variable. In both cases, the pivot variable must be zero. What exactly do the results in equations 7.14 and 7.15 mean, and how do we interpret them? The results indicate that the vector [1, 0], which is horizontal, remains horizontal and stretches by a factor of 1.41, and that the vector [0, 1], which is vertical, remains vertical and shortens by a factor of 0.71. This is consistent with what we see in Figure 7.2. The upper left shell has major and minor axes parallel to the eigenvectors. The long side is unrotated and stretched, consistent with elongation by 1.41. The short side is also unrotated and is shorter, probably by a factor of 0.71.

Let's take up the simple shear problem from earlier in equation 7.10. This case gave us two eigenvalues that are the same. We called this a repeated eigenvalue in Section 7.2.2. When we create the Ax = b, we get only one new matrix since we have only one unique eigenvalue.

$$\begin{bmatrix} 1-\lambda & 0.5 \\ 0 & 1-\lambda \end{bmatrix}\mathbf{x} = \mathbf{0} \rightarrow \begin{bmatrix} 0 & 0.5 \\ 0 & 0 \end{bmatrix}\mathbf{x} = \mathbf{0}. \tag{7.16}$$

This equation makes us find the vectors in the null space, but it only gives us one solution. Using the notation we had earlier, we get for λ = 1 that 0u − 0.5v = 0, giving x = [1, 0]. This means that our repeated eigenvalue gives us a repeated eigenvector.

$$\lambda = \lambda_1 = \lambda_2 = 1, \quad \mathbf{x}_1 = \mathbf{x}_2 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \tag{7.17}$$


Figure 7.5 Transformation of various vectors by the matrix in equation 7.12. Eigenvectors x1 and x2 are shown in blue across the top of the figure. Eigenvectors stay in the same orientation but stretch. In the bottom part, we show the horizontal unit vector e1 in orange and the vertical unit vector e2 in green. These vectors change length and rotate. Also, note that e1 goes to the first column vector of the matrix and that e2 goes to the second matrix column.

What does this result mean? It tells us that only one vector remains in its original orientation, the horizontal vector [1, 0], and it does not change length. This is exactly the result in Figures 7.3 and 7.4. The only lines that remain in the same orientation are the ones that start horizontally. This is true for the shell's hinge line or rib ornaments that start horizontally.

Let's continue with the last example in equation 7.12. The eigenvalues are λ1 = 4 and λ2 = 2; we use the larger eigenvalue of 4 first, then the smaller value of 2.

$$\begin{bmatrix} 5-\lambda & -3 \\ 1 & 1-\lambda \end{bmatrix}\mathbf{x} = \mathbf{0} \rightarrow \begin{bmatrix} 1 & -3 \\ 1 & -3 \end{bmatrix}\mathbf{x} = \mathbf{0}, \ \text{and} \ \begin{bmatrix} 3 & -3 \\ 1 & -1 \end{bmatrix}\mathbf{x} = \mathbf{0}. \tag{7.18}$$

Now, we solve for the eigenvectors and get the following result.

$$\lambda_1 = 4, \ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 0.33 \end{bmatrix}, \ \text{and} \ \lambda_2 = 2, \ \mathbf{x}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{7.19}$$

What happens to the eigenvectors during transformation? What happens to the unit coordinate vectors? We can see this in Figure 7.5. The eigenvectors do not rotate but stretch. All other vectors rotate and stretch. We can create this sort of figure for any square, invertible matrix.


MATLAB® moment—Getting eigenvectors and eigenvalues

Finding eigenvectors and eigenvalues is a single function in MATLAB®. We will work with the matrix in equation 7.12, $A = \begin{bmatrix} 5 & -3 \\ 1 & 1 \end{bmatrix}$.

>> [V,D] = eig(A)

gives the matrix V of the eigenvectors and the matrix D of the eigenvalues:

$$V = \begin{bmatrix} 0.949 & 0.707 \\ 0.316 & 0.707 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix}.$$

Why does this result differ from the eigenvectors shown in equation 7.19 that we got for the same matrix? The function eig() returns unit-length eigenvectors. If x1 and x2 in equation 7.19 are normalized, they are equal to the results we see here.
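As a quick follow-up sketch, we can confirm the defining relation AV = VD numerically:

% Verify the defining relation A*V = V*D column by column.
A = [5 -3; 1 1];
[V, D] = eig(A);
residual = A*V - V*D;       % should be numerically zero
max(abs(residual(:)))       % a value near machine precision, e.g. ~1e-15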

7.3 LOOKING AT DATA AND NOT Ax = b

The previous examples show the power of eigenanalysis in understanding the characteristics of linear transformations. We will return to the strain examples later in this chapter and connect them to the principal strains in deformed rocks and objects. But first, we look at the insights we can gain from arrays of data by using eigenanalysis. Such analyses are common for understanding everything from image analysis and face recognition to the relative importance of different variables in systems with many controlling factors or components.

For all of the examples here, we will assume we are working with an m × n starting matrix with n < m. In other words, the matrix is constructed to have more data points than the dimension we are working in. We call the starting matrix A as usual. The target of our analysis will be the matrix AᵀA, the Gram matrix of A. We met this combination before when we developed the projection matrix and the normal equation used for least squares calculations. The form AᵀA is also the basis for computing a covariance matrix and doing principal component analysis.

We will use an example we have met already: direction cosines computed from dip azimuth and dip, or trend and plunge, of geologic features. We have learned to efficiently convert geologic orientation data into direction cosines using matrix operations. We have also noted some important features of the direction cosines, including the fact that l² + m² + n² = 1 and that we could find the orientation of a fold axis using a cross product. We are now ready to combine this information with our newly developed techniques for eigenvectors and eigenvalues on the matrix AᵀA to determine more from orientation data.


Figure 7.6 Comparing the sum of the direction cosine vectors for the more clustered first and more scattered second examples of orientation data. Black arrows are for the more clustered data presented in equation 7.20. Green arrows show the direction cosine vectors for the more scattered data set in equation 7.22. Vertical shadows of vectors are shown on the NE plane, and positions of the final sums are shown connecting to the D axis and as circles on the NE plane. The more clustered data give a longer vector, whereas the scattered data give a noticeably shorter one. This reflects the summation of components in the direction of the vector t. The length of the vector t decreases as the scatter in the data increases.

7.3.1 Back to Orientation Data

Our first problem will be finding an average orientation of a set of linear features. These could be striations on a fault surface or a set of fold axes measured on a single set of minor folds. This problem has been analyzed in detail in a statistical sense. Still, we start with linear algebra to lead to the eigenanalysis that characterizes current practice in the Earth Sciences. We start by turning the trend and plunge of each linear feature into direction cosines in the NED coordinate system. Example data are given in equation 7.20 for four different trends and plunges.

$$\begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 45 \\ 50 \end{bmatrix}\begin{bmatrix} 40 \\ 51 \end{bmatrix}\begin{bmatrix} 42 \\ 55 \end{bmatrix}\begin{bmatrix} 46 \\ 49 \end{bmatrix} \rightarrow \begin{bmatrix} l \\ m \\ n \end{bmatrix} = \begin{bmatrix} 0.455 & 0.482 & 0.426 & 0.456 \\ 0.455 & 0.405 & 0.384 & 0.472 \\ 0.766 & 0.777 & 0.819 & 0.755 \end{bmatrix}. \tag{7.20}$$

The direction cosines l, m, and n are just the coefficients of each vector. We can consider this vector to have coordinates $l\hat{N}$, $m\hat{E}$, and $n\hat{D}$ in the NED system. We want to use these to get an average or best orientation. We could start by adding all the vectors together tail-to-head as shown in Figure 7.6. This is the start of how we would average any set of data. This gives us a resultant vector t, expressed


as sums of l, m, and n in equation 7.21. Consider how we normalize these. For an average, we usually divide by the number of points, but here we find the length of t by computing $\sqrt{\mathbf{t}^T\mathbf{t}}$. We can then divide each of the components of t by ∥t∥ to get the direction cosines of the unit resultant vector we will call r. This unit vector points in the direction of t. These direction cosines can then be converted back into trend and plunge.

$$\mathbf{t} = \begin{bmatrix} \sum l \\ \sum m \\ \sum n \end{bmatrix} = \begin{bmatrix} 1.82 \\ 1.71 \\ 3.12 \end{bmatrix}, \quad \|\mathbf{t}\| = 3.995, \quad \mathbf{r} = \begin{bmatrix} r_l \\ r_m \\ r_n \end{bmatrix} = \begin{bmatrix} 0.455 \\ 0.429 \\ 0.780 \end{bmatrix}, \quad \begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 43.3 \\ 51.3 \end{bmatrix}, \tag{7.21}$$

where $P = \arcsin(r_n)$, and $T = \arccos(r_l/\cos P)$ or $T = \arcsin(r_m/\cos P)$. This was a simple operation in which we used much of what we learned in the earlier chapters. We converted orientation data to direction cosine vectors. We added these vectors. We found the length of the resultant vector using the product of the transpose times the vector. We multiplied by a scalar. Finally, we converted the resultant unit vector back to the desired units.

We will now give another example that results in the same trend and plunge but has different starting vectors. This is given in equation 7.22. The vectors are also plotted in Figure 7.6.

$$\begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 68 \\ 20 \end{bmatrix}\begin{bmatrix} 20 \\ 51 \end{bmatrix}\begin{bmatrix} 41 \\ 68 \end{bmatrix}\begin{bmatrix} 29 \\ 58 \end{bmatrix} \rightarrow \mathbf{r} = \begin{bmatrix} 0.456 \\ 0.428 \\ 0.780 \end{bmatrix}, \quad \|\mathbf{t}\| = 3.71, \quad \begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 43.3 \\ 51.3 \end{bmatrix}. \tag{7.22}$$
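A minimal MATLAB sketch of the vector-sum average in equations 7.20 and 7.21 might look like the following; the conversion back to trend uses atan2d, which is equivalent to the arccos and arcsin expressions given above.

% Vector-sum mean of orientation data. Trends and plunges are in degrees;
% the NED direction cosines are l = cos(P)cos(T), m = cos(P)sin(T), n = sin(P).
T = [45 40 42 46];              % trends of the clustered data set
P = [50 51 55 49];              % plunges of the clustered data set

l = cosd(P).*cosd(T);
m = cosd(P).*sind(T);
n = sind(P);

t = [sum(l); sum(m); sum(n)];   % resultant vector t
len = sqrt(t'*t);               % its length, about 3.995 here
r = t/len;                      % unit resultant vector r

% Convert the unit vector back to trend and plunge (equation 7.21).
P_mean = asind(r(3));
T_mean = atan2d(r(2), r(1));    % about 43.3 and 51.3 degrees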

It is easy to compare the data in equation 7.20 with equation 7.22 and see that the latter has much more variation, so it is more scattered. We also see that the length of the summation vector is smaller. In equation 7.21, the length of the total sum vector t is almost 4, but in equation 7.22, it is 3.71. Although this is smaller, we do not have a good feel for the meaning of the difference of about 0.3 in terms of scatter. If all the measurements were colinear, the length of t would be 4. Can we do something that will give us a feel for how well the data cluster, or whether there are other patterns in the scatter? We will continue working on examples with orientation data to see how eigenanalysis can help.

7.3.2 Using AᵀA with Arrays of Data

As we noted in the chapter on solving Ax = b when there is no solution, we commonly encounter matrices that have many more rows than columns, m ≫ n. There, we used a projection matrix to solve for the best or closest solution to the problem. We can do something similar with arrays of data and eigenanalysis, and we show an example using orientation data. We have already looked at one method, but the one presented here gives us much richer results. In data analysis, it is common to compute a covariance matrix to analyze how the different variables in the data vary together. This involves finding the mean of each column and using it to center the data on the mean. The calculation computes $(x - \bar{x})$ for every data point and column.


We compute something simpler for eigenanalysis using a Gram matrix. For the orientation problem presented in equation 7.20, we rewrite the individual orientation vectors into the rows of a matrix:

$$\begin{bmatrix} 0.455 & 0.482 & 0.426 & 0.456 \\ 0.455 & 0.405 & 0.384 & 0.472 \\ 0.766 & 0.777 & 0.819 & 0.755 \end{bmatrix} \rightarrow A = \begin{bmatrix} 0.455 & 0.455 & 0.766 \\ 0.482 & 0.405 & 0.777 \\ 0.426 & 0.384 & 0.819 \\ 0.456 & 0.472 & 0.755 \end{bmatrix}. \tag{7.23}$$

Each row is a data point, and each column lists the components for the direction cosines l, m, and n. We then compute the Gram matrix AᵀA.

$$G = A^TA = \begin{bmatrix} 0.829 & 0.781 & 1.416 \\ 0.781 & 0.741 & 1.334 \\ 1.416 & 1.334 & 2.431 \end{bmatrix}. \tag{7.24}$$
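A minimal MATLAB sketch reproduces the Gram matrix of equation 7.24 and confirms its symmetry:

% Build the data matrix of equation 7.23 and form its Gram matrix.
A = [0.455 0.455 0.766;
     0.482 0.405 0.777;
     0.426 0.384 0.819;
     0.456 0.472 0.755];

G = A'*A                   % 3 x 3, approximately the matrix in equation 7.24
norm(G - G', 'fro')        % zero (or at machine precision): G is symmetric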

We noted in the earlier chapter that this gives a symmetric matrix as the product. We revisit that claim here with an example of an m × 3 matrix, using indices for the elements.

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ \vdots & \vdots & \vdots \\ a_{m1} & a_{m2} & a_{m3} \end{bmatrix}, \quad A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ a_{13} & a_{23} & \cdots & a_{m3} \end{bmatrix}, \tag{7.25}$$

$$A^TA = \begin{bmatrix} \sum (a_{i1}\cdot a_{i1}) & \sum (a_{i1}\cdot a_{i2}) & \sum (a_{i1}\cdot a_{i3}) \\ \sum (a_{i2}\cdot a_{i1}) & \sum (a_{i2}\cdot a_{i2}) & \sum (a_{i2}\cdot a_{i3}) \\ \sum (a_{i3}\cdot a_{i1}) & \sum (a_{i3}\cdot a_{i2}) & \sum (a_{i3}\cdot a_{i3}) \end{bmatrix}. \tag{7.26}$$

We know this is symmetric because the cross-diagonal terms are equal. We get that $\sum (a_{i3}\cdot a_{i1}) = \sum (a_{i1}\cdot a_{i3})$ because the dot product is commutative: $(a_{i3}\cdot a_{i1}) = (a_{i1}\cdot a_{i3})$. We can also name the columns of A using the notation a1, a2, and a3; then for the entries of AᵀA we get:

$$A = \begin{bmatrix} | & | & | \\ \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{a}_3 \\ | & | & | \end{bmatrix} \rightarrow A^TA = \begin{bmatrix} \mathbf{a}_1^T\mathbf{a}_1 & \mathbf{a}_1^T\mathbf{a}_2 & \mathbf{a}_1^T\mathbf{a}_3 \\ \mathbf{a}_2^T\mathbf{a}_1 & \mathbf{a}_2^T\mathbf{a}_2 & \mathbf{a}_2^T\mathbf{a}_3 \\ \mathbf{a}_3^T\mathbf{a}_1 & \mathbf{a}_3^T\mathbf{a}_2 & \mathbf{a}_3^T\mathbf{a}_3 \end{bmatrix}. \tag{7.27}$$

The matrix AᵀA is also special because it belongs to a class of matrices we call Symmetric Positive Definite. Positive definite means that xᵀ(AᵀA)x > 0 for every nonzero vector x; for AᵀA this holds because xᵀAᵀAx = ∥Ax∥², which is positive whenever A has full column rank. As a consequence, the eigenvalues of the matrix are all real and positive. This n × n matrix is also symmetric and nonsingular; it has an inverse. Positive-definite matrices are always well-behaved in that they give little trouble computationally. If A does not have full column rank, then ∥Ax∥² can be zero for some nonzero x; in that case the matrix is called Positive Semi-definite and is still an easy matrix to work with, although one or more of its eigenvalues are zero.


7.3.3 Computing Eigenvalues and Eigenvectors for Orientation Data

We will look at the matrix for the second set of orientation data shown in equation 7.22. We show the matrix and the result for AᵀA.

$$A = \begin{bmatrix} 0.352 & 0.871 & 0.342 \\ 0.591 & 0.215 & 0.777 \\ 0.283 & 0.246 & 0.927 \\ 0.464 & 0.257 & 0.848 \end{bmatrix} \rightarrow G = A^TA = \begin{bmatrix} 0.768 & 0.622 & 1.235 \\ 0.622 & 0.931 & 0.911 \\ 1.235 & 0.911 & 2.300 \end{bmatrix}. \tag{7.28}$$

Note that in the previous example, we renamed the matrix AᵀA as G. We do this explicitly so that the reader recognizes that the matrix AᵀA is different because it is a square matrix made from two rectangular ones. It is the Gram matrix of A. Because of this difference, the matrix G does not directly correspond to the actions of the matrix A. In the strain examples, we had A performing the linear transformation that is the objective of the problem. Since it was square, we could use A directly for calculating the eigenvectors and eigenvalues and directly see the actions performed by A. When we have the matrix G, the starting matrix A was probably a data list or table of vector quantities. In this case, we have computed AᵀA to make a square matrix whose entries, in one analogy, reflect the covariation or correlation between the column vectors of A. In this case, the matrix G describes the arrangement of the data and not a linear transformation. We learn more about the overall arrangement of the data by eigenanalysis of the matrix G.

We can now solve the matrix G for its eigenvalues and eigenvectors. This is done exactly as outlined earlier by computing the characteristic polynomial for det(G − λI) = 0, finding the λ values, and then substituting into the characteristic equation to solve for the eigenvectors. Results are given in equation 7.29. We call the eigenvalues λ1, λ2, and λ3, and their associated unit eigenvectors x1, x2, and x3.

$$\lambda_1 = 3.466,\ \mathbf{x}_1 = \begin{bmatrix} 0.456 \\ 0.398 \\ 0.796 \end{bmatrix}, \quad \lambda_2 = 0.477,\ \mathbf{x}_2 = \begin{bmatrix} -0.009 \\ -0.892 \\ 0.456 \end{bmatrix}, \ \text{and} \tag{7.29}$$

$$\lambda_3 = 0.059,\ \mathbf{x}_3 = \begin{bmatrix} -0.890 \\ 0.213 \\ 0.404 \end{bmatrix}. \tag{7.30}$$
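Because eig() does not order its output, a short sorting step is needed to match the λ1 ≥ λ2 ≥ λ3 convention used here. A minimal sketch that reproduces equations 7.29 and 7.30, up to the sign of each eigenvector, is:

% Eigenanalysis of the Gram matrix in equation 7.28, reordered so that
% lambda_1 >= lambda_2 >= lambda_3.
A = [0.352 0.871 0.342;
     0.591 0.215 0.777;
     0.283 0.246 0.927;
     0.464 0.257 0.848];

G = A'*A;                                   % Gram matrix, 3 x 3 and symmetric
[X, L] = eig(G);                            % columns of X are unit eigenvectors

[lambda, idx] = sort(diag(L), 'descend');   % order eigenvalues, largest first
X = X(:, idx);                              % reorder eigenvectors to match

lambda    % approximately [3.466; 0.477; 0.059]
X(:,1)    % approximately [0.456; 0.398; 0.796], up to an overall sign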

What did this get us? Let's compare the orientation of the λ1 eigenvector to the mean vector r we got from addition in equation 7.22:

$$\mathbf{r} = \begin{bmatrix} 0.456 \\ 0.428 \\ 0.780 \end{bmatrix} \ \text{and} \ \mathbf{x}_1 = \begin{bmatrix} 0.456 \\ 0.398 \\ 0.796 \end{bmatrix} \rightarrow \begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 43.3 \\ 51.3 \end{bmatrix}. \tag{7.31}$$

For this example of orientation data, we get the same best-fitting or average vector to the data either way. The orientation we get using addition is the same as for the λ1 orientation from eigenanalysis! The vector quantities differ slightly due to the different calculation methods, but the result is the same.


We can recast the eigenvalues and eigenvectors into two matrices. The first is the Eigenvector Matrix. That is easy; it is just the column vectors x1, x2, and x3 arranged into a 3 × 3 matrix. We call this matrix X and show it in equation 7.32. What can we do with the eigenvalues? We could make these into a vector, but that will not work because we want to multiply each of the columns of X by its corresponding eigenvalue. To do this, we must have a matrix of the same size, in our case 3 × 3. We insert the eigenvalues as the entries of a diagonal Eigenvalue Matrix that we will call Λ, with the same order of eigenvalues as in the eigenvector matrix X. We give the result in equation 7.32.

$$X = \begin{bmatrix} 0.456 & -0.009 & -0.900 \\ 0.398 & -0.892 & 0.213 \\ 0.796 & 0.452 & 0.403 \end{bmatrix}, \ \text{and} \ \Lambda = \begin{bmatrix} 3.466 & 0 & 0 \\ 0 & 0.477 & 0 \\ 0 & 0 & 0.059 \end{bmatrix}. \tag{7.32}$$

This keeps the eigenvectors and eigenvalues in the same order between the matrices X and Λ. We can do the same for the data from the first, clustered set of orientations. This gives us the following:

$$\lambda_1 = 3.991,\ \mathbf{x}_1 = \begin{bmatrix} 0.455 \\ 0.429 \\ 0.780 \end{bmatrix}, \quad \lambda_2 = 0.007,\ \mathbf{x}_2 = \begin{bmatrix} -0.199 \\ -0.805 \\ 0.559 \end{bmatrix}, \ \text{and} \quad \lambda_3 = 0.002,\ \mathbf{x}_3 = \begin{bmatrix} -0.868 \\ 0.401 \\ 0.281 \end{bmatrix}. \tag{7.33}$$

Again, the λ1 eigenvector is aligned with that for the more scattered data and the orientations of the summed vectors. Thus, in both cases, the summation and eigenanalysis give the same result for the orientation of the vectors. The other eigenvalues and vectors are somewhat different, with the ratio of λ1 to λ2 giving ≈ 600 for the clustered data and ≈ 7 for the scattered data. We explore interpreting this difference in the next section.

7.3.4 More Information About the Orientation Arrays

We get much more information about the orientation data using eigenanalysis. In Figure 7.7, we plot the eigenvectors and data points for the more scattered, second example in equation 7.29 (in green) and the more clustered first example from equation 7.33 (in yellow). The two examples give the same orientation for the summed vectors and λ1 eigenvector directions for the scattered and more clustered dataset. Thus, in both cases, all results align in the same direction, whether using eigenanalysis or just summing vectors. Combining the results of eigenanalysis with the original data shows more about the scatter. For the clustered example, the eigenvalues are such that λ1 >> λ2 ≈ λ3 whereas the scattered example gives λ1 > λ2 > λ3 . This is because the eigenvalues indicate how much the data clusters, with more aligned data giving λ1 much greater


Figure 7.7 Positions of eigenvectors in the NED coordinate system. The left side shows a stereonet view common in the geosciences; this is simply the top view of the right side projected onto a plane. Green points show data and eigenvectors for the second, more scattered example. The yellow points and square are the data and eigenvectors for the clustered example. The right side is the unit lower-hemisphere illustration, showing results of the eigenanalysis only for the more scattered example. The x1 vectors are colinear between the two cases.

than the other eigenvalues. The fact that λ1 is so much larger in the first example reflects the close alignment of the data. The second example's lower value of λ1 shows more scatter. In both cases, the sum λ1 + λ2 + λ3 = 4, the number of unit vectors we are working with.

The relative magnitudes of the eigenvalues and positions of the eigenvectors can reflect alignment between points. In Figure 7.7, the data from the second example scatter in an E-W orientation along the plane spanned by x1 and x2. The fact that λ2 is relatively large and significantly greater than λ3 reflects the scatter of the data in an E-W orientation. In geologic terms, the orientation forms an E-W girdle reflecting the scatter. For the clustered example, the scatter is weak and oriented in an ENE-WSW direction. For this case, λ2 is close to λ3, showing that the data are concentrated at a point maximum.

An important feature of X is that it is orthogonal if the matrix we start with is symmetric positive definite. In other words, we can compute a matrix X whose columns are all unit-length vectors that are orthogonal. We will see later that this matrix spans the same number of dimensions as our starting matrix and can be a set of basis vectors for the system.

7.4 DIAGONALIZATION INTO XΛX⁻¹

This section presents our next major decomposition, or diagonalization, using the matrices Λ and X that we just constructed. We have seen diagonalization


before with LDU. The eigen-based diagonalization is perhaps the most important the reader will encounter. We will combine the ideas of the earlier sections with properties discussed in previous chapters and present physical examples so that you can appreciate the power of eigenvectors and eigenvalues.

Let's think about diagonalization of a matrix we will call B. The only condition on the matrix B is that it is symmetric positive definite, as described in Section 7.3.2. The matrix B may result from forming the Gram matrix AᵀA of a long and skinny matrix A that is an array of data. The matrix may also be a linear transformation like pure shear strain. Because the matrix is symmetric positive definite, the eigenvectors of B will always form an orthogonal matrix X: the eigenvectors, or column vectors of X, are orthonormal. It also means that X is always invertible and that the matrix Λ will always have positive values.

OK, on to the linear algebra. We start with the simple product of B times its eigenvector matrix X. We know by the definition of an eigenvector that these vectors do not rotate during multiplication by B but only change length by scaling. The scaling factors, of course, are the eigenvalues; we can write the operation as follows in equation 7.34.

$$BX = \begin{bmatrix} \cdots & \mathbf{b}_1 & \cdots \\ & \vdots & \\ \cdots & \mathbf{b}_n & \cdots \end{bmatrix}\begin{bmatrix} \vdots & & \vdots \\ \mathbf{x}_1 & \cdots & \mathbf{x}_n \\ \vdots & & \vdots \end{bmatrix} = \begin{bmatrix} \vdots & & \vdots \\ \mathbf{x}_1\lambda_1 & \cdots & \mathbf{x}_n\lambda_n \\ \vdots & & \vdots \end{bmatrix}. \tag{7.34}$$

We can see that we get just the columns of the eigenvector matrix times the corresponding eigenvalues, that is, x1λ1, x2λ2, out to xnλn. We can recast this into the two matrices X and Λ.

$$\begin{bmatrix} \vdots & & \vdots \\ \mathbf{x}_1 & \cdots & \mathbf{x}_n \\ \vdots & & \vdots \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = X\Lambda. \tag{7.35}$$

The matrix Λ must go to the right of X. Arranging the eigenvectors and eigenvalues into matrices means putting the diagonal matrix on the right to multiply the columns of the X matrix, as we discussed in the chapter on matrix multiplication. The result of equations 7.34 and 7.35 is shown in equation 7.36.

$$BX = X\Lambda. \tag{7.36}$$

We can manipulate this equation further by multiplying equation 7.36 by X⁻¹ on either the right or the left. This gives two important new results using the fact that X⁻¹X = XX⁻¹ = I.

$$BXX^{-1} = X\Lambda X^{-1} \longrightarrow B = X\Lambda X^{-1}, \tag{7.37}$$

$$X^{-1}BX = X^{-1}X\Lambda \longrightarrow X^{-1}BX = \Lambda. \tag{7.38}$$

Now we have two new equations by which we can determine B if we know X and Λ, or find Λ if we know B and X. We can always compute X⁻¹ because X


Figure 7.8 Pure shear strain: transformation of a square by pure shear deformation with principal strain axes oriented at +45° and −45° to the coordinate axes. The square is transformed into a rhombus.

has independent columns, so the inverse must exist. It may seem strange to want to compute B as shown in equation 7.37, but this situation arises often in geologic problems; we will show such an example in Section 7.4.1. Lastly, if all we have is B, we start by determining the eigenvalues using the characteristic equation, ordering the values from greatest to least, and then computing the eigenvectors.

Which matrices are diagonalizable? All symmetric positive definite matrices can be diagonalized into eigenvalue and eigenvector matrices regardless of the size of the starting matrix. In fact, any symmetric matrix is diagonalizable in the same way. This is a very powerful result, and symmetric matrices are encountered frequently when working on problems in the Earth and Environmental Sciences. This means that eigenanalysis is a vital tool to keep in mind.

7.4.1 More Strain

Strain analysis of rocks is a natural application of the eigenvectors and eigenvalues of a matrix. We showed in previous examples how we can view strain as a linear transformation through the lens of linear algebra. Although most problems in strain analysis involve heterogeneous strains across areas, we divide this problem into a series of homogeneous strains that we can treat simply as local linear transformations. Because of this, it makes sense to analyze strain using the tools of linear algebra, such as eigenvectors and eigenvalues. Sometimes, we can figure out the orientations of the eigenvectors without computing them. In Figure 7.8, we show such a case where a square has been strained


by pure shear deformation. The lengthening is in the direction we call S1, oriented at +45°, and the shortening is in the direction we call S2, at −45° to the u axis. In this analysis, the letter S stands for Stretch. Stretch is defined as:

$$S = \frac{\ell_{\text{deformed}}}{\ell_{\text{undeformed}}}. \tag{7.39}$$

In this equation, ℓ is the length of a line compared before and after deformation. Note that S > 0. The other description for changes in length is Elongation. This is defined as:

$$\varepsilon = \frac{\ell_{\text{deformed}} - \ell_{\text{undeformed}}}{\ell_{\text{undeformed}}} = \frac{\Delta\ell}{\ell}. \tag{7.40}$$

The elongation can be positive or negative, so lines that get longer have ε > 0, and shorter ones have ε < 0. Finally, we can describe angular changes in a deformed rock. This is done using what we call the Shear Strain. If a line is rotated by an angle θ, then we define the shear strain as:

$$\gamma = \tan(\theta). \tag{7.41}$$

For stretch and elongation, we can find directions in which S and ε take on maximum and minimum values. We call these directions and their values the Principal Stretches and Elongations. In two dimensions, we give these subscripts, with S1 being the greatest principal stretch and S2 being the least. Similarly, we have principal elongations ε1 and ε2. We can also consider three dimensions, using S1 ≥ S2 ≥ S3 and ε1 ≥ ε2 ≥ ε3. For angular shear, we single out the orientation of maximum shear strain as γmax. The principal stretches and elongations form an orthogonal coordinate system, which we will discuss in more detail.

For the example in Figure 7.8, the lengthening and shortening directions are clear and are oriented parallel to a and b. These are the principal stretches S1 and S2. Because we start with a square, we can immediately determine the orientation of x1 and x2, getting x1 parallel to S1 and x2 parallel to S2, given in equation 7.42. We do not know the eigenvalues yet, but we know which value goes with which vector.

$$\lambda_1,\ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \ \text{and} \ \lambda_2,\ \mathbf{x}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \tag{7.42}$$

Once we know the orientation of the eigenvectors, we can easily compute the accompanying eigenvalues for the case shown in Figure 7.8. We know the length of a′ is $\sqrt{1.41^2 + 1.41^2} = 2$. We set up this problem using (1, 1) for the coordinates of a, which gives the vector length from the origin to a as $\sqrt{2} = 1.41$. Since the length of a is 1.41 and the length of a′ is 2, this gives a stretching factor of 2 ÷ 1.41, or simply 1.41. This is λ1. Likewise, for λ2, we see that b has gone from length 1.41 to b′ with length 1, yielding a shortening factor of 0.71.

$$\lambda_1 = 1.41,\ \mathbf{x}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \ \text{and} \ \lambda_2 = 0.71,\ \mathbf{x}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \tag{7.43}$$


The eigenvector matrix has orthogonal columns, but they are not of unit length. To get this, we must divide each eigenvector by its length. In our case, we divide each by $\sqrt{2}$. With that, we get the final result for Λ and X in equation 7.44.

$$\Lambda = \begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix}, \ \text{and} \ X = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0.71 & -0.71 \\ 0.71 & 0.71 \end{bmatrix}. \tag{7.44}$$

What do we do now with Λ and X? We can recover the strain matrix from diagonalization using the result in equation 7.37.

$$A = X\Lambda X^{-1} = \begin{bmatrix} 0.71 & -0.71 \\ 0.71 & 0.71 \end{bmatrix}\begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix}\begin{bmatrix} 0.71 & 0.71 \\ -0.71 & 0.71 \end{bmatrix} = \begin{bmatrix} 1.06 & 0.35 \\ 0.35 & 1.06 \end{bmatrix}. \tag{7.45}$$
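A minimal MATLAB sketch reproduces equation 7.45 and confirms that the determinant of the recovered strain matrix equals that of Λ:

% Rebuild the pure shear strain matrix from its eigenvectors and eigenvalues.
X = (1/sqrt(2))*[1 -1; 1 1];       % unit eigenvectors at +/-45 degrees
Lambda = [sqrt(2) 0; 0 1/sqrt(2)]; % principal stretches 1.41 and 0.71

A = X*Lambda/X                     % X*Lambda*inv(X); about [1.06 0.35; 0.35 1.06]

% Because X is orthogonal, X' can replace inv(X) (see equation 7.57):
A_alt = X*Lambda*X';               % same result
det(A)                             % equals det(Lambda) = 1, so area is preserved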

Note that we call the matrix A because it is a strain matrix doing transformation, not the result of forming the Gram matrix of some other matrix. Note also that the determinant of A is the same as that of Λ, which is one in this case. Thus, the strain is plane strain, as we have discussed earlier. We will always see that eigendiagonalization of a matrix gives an eigenvalue matrix Λ whose determinant is the same as that of the starting matrix.

This shows the power and usefulness of diagonalization for recovering the strain matrix. In the previous example, the eigenvectors and eigenvalues were pretty clear from inspection. Still, the actual matrix doing the linear transformation, the strain matrix A, was not obvious at all. Although geologists themselves, the authors would never have come up with the result for A without the eigenanalysis. The other important factor is that the deformation described earlier was pure shear. One way to distinguish pure shear from simple shear is that if the strain matrix is diagonalizable, the strain is pure shear.

Now, let's look at an example where we know the strain matrix A and the orientation of the eigenvectors. This will come from looking at simple shear strain, but in a case where it is reasonable and possible to justify having two sets of eigenvectors and values and not a repeated set. In this example, we will use infinitesimal strain, or minimal magnitude strains typically of less than 1%. Such strains are usually impossible to detect visually and form the increments that add up to the larger finite strains we have been using. Our setup is shown in Figure 7.9. We assign the maximum and minimum elongations to ε1 and ε2, respectively, and the maximum shear strain to γmax. Specifying γmax to be parallel to the u axis means that the elongations are oriented at ±45° to u. In this setup, the eigenvectors are essentially parallel to ε1 and ε2. This is relatively accurate for infinitesimal strains. However, the strain matrix would not be diagonalizable for large, simple shear deformation cases.

We can also define the deformation or strain matrix B. We will let the amount of shear strain be 0.01. We also know the orientation of the eigenvectors at ±45°. This gives us the same eigenvector matrix we had in equation 7.45. From this, we can set up the problem to solve for the eigenvalues as we saw earlier in equation 7.38. So now we can solve for the eigenvalues associated with this infinitesimal strain.

$$\Lambda = X^{-1}BX = \begin{bmatrix} 0.71 & 0.71 \\ -0.71 & 0.71 \end{bmatrix}\begin{bmatrix} 1 & 0.01 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0.71 & -0.71 \\ 0.71 & 0.71 \end{bmatrix} = \begin{bmatrix} 1.005 & 0.005 \\ -0.005 & 0.995 \end{bmatrix}. \tag{7.46}$$
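A minimal sketch of the same calculation in MATLAB:

% Project an infinitesimal simple shear (gamma = 0.01) onto the +/-45 degree
% axes and inspect the result (equation 7.46).
B = [1 0.01; 0 1];                 % simple shear strain matrix
X = (1/sqrt(2))*[1 -1; 1 1];       % assumed eigenvector directions at +/-45 degrees

L = X\B*X                          % inv(X)*B*X; about [1.005 0.005; -0.005 0.995]

% The off-diagonal terms are small but not zero, so this is only an
% approximate diagonalization, as discussed below.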


Figure 7.9 Infinitesimal strain axes for simple shear deformation. Fixing the orientation of γmax also defines the directions of ε1 and ε2, the greatest and least elongations, at ±45° to the maximum shear strain.

This is not an exact diagonalization because the off-diagonal terms are not zero, just small. We should have expected this because the strain we applied was a simple shear deformation, and we know there is only one, repeated eigenvalue in this case. So, what does our result mean? It approximates what happens to the elongations during the small increment of simple shear. We see about a 0.005 change in elongation when we apply a 0.01 shear. This is useful because the elongations are roughly the same between the shortening and lengthening components, although they have opposite signs. Again, although it is not an exact result, it helps show how the elongations are distributed, something we would be hard-pressed to say confidently without this exercise.

7.4.2 Rotation Matrices

Another possible way of straining an object is by rotating it. In this case, our strain matrix is a rotation matrix. We will start with a 2 × 2 example, as in equation 7.47.

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \tag{7.47}$$

When we expand this out to find the eigenvalues by subtracting λ from the main diagonal and computing the determinant, we end up with equation 7.49. This equation is a bit messy, and the roots are not evident. For this reason, we use the quadratic formula, with the result shown in equations 7.50 and 7.51 after the substitution $\cos^2\theta - 1 = -\sin^2\theta$.

$$|R - \lambda I| = (\cos\theta - \lambda)^2 + \sin^2\theta \rightarrow \tag{7.48}$$

$$\lambda^2 - 2\lambda\cos\theta + \cos^2\theta + \sin^2\theta = \lambda^2 - 2\lambda\cos\theta + 1 = 0, \tag{7.49}$$

$$\lambda_1, \lambda_2 = \frac{2\cos\theta \pm \sqrt{(2\cos\theta)^2 - 4}}{2} = \cos\theta \pm \sqrt{\cos^2\theta - 1}, \tag{7.50}$$

$$\lambda_1, \lambda_2 = \cos\theta \pm \sqrt{-\sin^2\theta} \rightarrow \cos\theta \pm i\sin\theta. \tag{7.51}$$


Our result is surprising, as it is a complex number. What does this mean? The eigenvalues are not real numbers, so the only real solution we can plot is the zero vector. Otherwise, every vector changes direction when a rotation matrix is applied. This makes sense in that the fundamental role of a rotation matrix is to rotate all vectors in a system. Does this ever have any real and nonzero roots? Yes, when θ = 0 or 2πn, with n an integer, we get eigenvalues of 1, which implies that all nonzero vectors are eigenvectors. This makes sense because, in this case, no vector rotates, and they all retain the same length. (When θ = π plus any full turns, the eigenvalues are both −1: every vector stays on its original line but flips direction.) Also, if we have a larger rotation matrix that does not rotate around all axes, we get some eigenvalues of one. For example, the following matrix has one real eigenvalue:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{7.52}$$

In this case, we get an eigenvalue of 1 for the eigenvector [0 0 1]ᵀ: only this vector remains unrotated.
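A minimal MATLAB sketch illustrates both results: the 2 × 2 rotation matrix has a complex-conjugate pair of eigenvalues, while the 3 × 3 matrix of equation 7.52 keeps the rotation axis [0 0 1]ᵀ as its only real eigenvector.

% Eigenvalues of 2D and 3D rotation matrices, using a 30 degree rotation.
theta = 30;
R2 = [cosd(theta) -sind(theta); sind(theta) cosd(theta)];
eig(R2)            % complex pair cos(theta) +/- i*sin(theta); no real eigenvectors

R3 = [cosd(theta) -sind(theta) 0;
      sind(theta)  cosd(theta) 0;
      0            0           1];
[V, D] = eig(R3);                 % one eigenvalue equals 1
V(:, abs(diag(D) - 1) < 1e-12)    % its eigenvector is [0 0 1]', the rotation axis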

7.4.3 What Else We Get from Diagonalization

When introducing strain as a linear transformation, we used a strain matrix that produced no volume change with the deformation. We discussed in the last chapter the determinants of several different strain matrices and how to read them as a volume change. With the diagonalization of the matrix B, we get the formula B = XΛX⁻¹. We then have an easy way to figure out the Determinant. From the rules we have established:

$$\det(B) = \det(X\Lambda X^{-1}) = \det(X)\det(\Lambda)\det(X^{-1}). \tag{7.53}$$

But we know that det(X) det(X⁻¹) = det(XX⁻¹) = det(I) = 1. This simplifies equation 7.53 to:

$$\det(B) = \det(\Lambda). \tag{7.54}$$

And Λ is a diagonal matrix with entries λ1, λ2, out to λn. For a diagonal matrix, the determinant is just the product of the diagonal elements, so we get:

$$\det(B) = \lambda_1 \times \lambda_2 \times \cdots \times \lambda_n. \tag{7.55}$$
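As a numerical check, we can compare the determinant with the product of the eigenvalues for the Gram matrix of equation 7.28:

% The determinant equals the product of the eigenvalues (equations 7.53-7.55).
G = [0.768 0.622 1.235;
     0.622 0.931 0.911;
     1.235 0.911 2.300];

lambda = eig(G);
det_from_eigs = prod(lambda)   % product of eigenvalues
det_direct    = det(G)         % agrees to within rounding of the stored entries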

This approach means that if we can establish accurate eigenvalues reflecting the deformation of a rock, we could get not only the deformed shape but also potential volume or area changes. It also gives us another method for finding the determinant of a matrix if we are already committed to doing eigenanalysis. The examples for pure shear and orientation analysis show that the starting matrix is symmetric positive definite. These matrices are always diagonalizable and give real eigenvalues, as noted earlier. The other notable property of these matrices, and any symmetric matrix, is that as long as these are of full rank, they give us a full set of orthogonal eigenvectors. This means that the eigenvectors establish an orthogonal


system of basis vectors. We know that any line parallel to an eigenvector does not rotate and only stretches. Thus, the eigenvector basis remains in the same orientation throughout, regardless of what linear transformation we are doing. Although some eigenvalues could be repeated, a symmetric, full-rank n × n matrix will always give n independent, orthogonal eigenvectors.

For symmetric matrices, it is common to refer to the eigenvectors as the Principal Axes. The eigenvectors do not rotate, and the eigenvalue matrix Λ is diagonal, so each eigenvector is associated with a single scalar eigenvalue. Although the eigenvectors are all orthogonal, the eigenvalues need not be distinct. The identity matrix is the best example of a symmetric matrix with repeated eigenvalues and orthogonal eigenvectors. For a 3 × 3 example we get:

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \lambda_1 = \lambda_2 = \lambda_3 = 1, \quad \mathbf{x}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \ \mathbf{x}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \ \mathbf{x}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \tag{7.56}$$

We are assuming here that the components of the symmetric matrix are real numbers. This means that all of the eigenvalues are real numbers as well. The eigenvectors are orthonormal and form an orthogonal matrix. We know from the discussion on orthogonality that if a matrix Q is orthogonal, then QQᵀ = QᵀQ = I. This means we can rewrite equations 7.37 and 7.38 as follows:

$$B = X\Lambda X^T, \tag{7.57}$$

$$X^TBX = \Lambda. \tag{7.58}$$

This is a much preferable representation because now we can use Xᵀ and not X⁻¹. Transposing is computationally much faster than computing an inverse!

7.4.4 The Spectral Theorem

In the introduction to Section 7.4, we remarked that the only condition on the matrix B for eigenanalysis is that it is symmetric positive definite. This was a bit too strict. We can say that any matrix B that is real and symmetric can be diagonalized using an orthogonal matrix and a diagonal matrix. In other words, we get the result in equation 7.59:

$$B = QDQ^T. \tag{7.59}$$

This is true whenever B is symmetric. We call this the Spectral Theorem, one of the most important ideas in linear algebra. And, although we do not discuss it further in this book, the matrix B can also be complex as long as it equals its own conjugate transpose. Such complex matrices are called Hermitian. We will see the power of the spectral theorem when we view the process of XΛXᵀ on a symmetric matrix as a change of basis, then multiplication by a diagonal matrix, and then a reversal of the change of basis.


This is a special case of the more general idea of Similar Matrices. We say that two matrices are similar if the following relationship holds:

$$B = PAP^{-1}. \tag{7.60}$$

The matrices A and B are called similar if there exists an invertible matrix P that satisfies the relation in equation 7.60. This is true regardless of whether A and B are real or complex, and neither needs to be symmetric. If similar, A and B represent the same linear transformation expressed in different bases. For the spectral theorem on real-valued matrices, we require B to be symmetric, which makes P orthogonal and A diagonal. In addition, the entries of A in equation 7.60 are the eigenvalues (all real numbers), and P⁻¹ = Pᵀ.

7.5 A DETAILED LOOK AT WHAT Xᵀ, Λ, AND X DO

We now have the matrix in the form B = XΛXᵀ by diagonalization. This means we use three matrices to accomplish the action of our one starting matrix. What do these matrices do that helps us understand problems in the Earth Sciences? Let's look at another case where the eigenvectors are not aligned with the original coordinate axes, using an example from strain.

$$B = \begin{bmatrix} 1.62 & 0.65 \\ 0.65 & 0.88 \end{bmatrix}, \quad X = \begin{bmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}, \quad X^T = \begin{bmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{bmatrix}. \tag{7.61}$$

What does each matrix do? The matrix Λ is the sort of stretching matrix we have seen before. It takes the first coordinate components and makes them longer, and the second coordinate components and makes them shorter. The determinant of Λ equals 1, so there is no volume change. What about X and Xᵀ? We can compare these matrices in equation 7.61 with the rotation matrix in equation 7.47 and see that they are rotations of +π/6 (counterclockwise) for X and −π/6 (clockwise) for Xᵀ. Figure 7.10 illustrates these in more detail. So both Xᵀ and X are rotation matrices and inverses of each other. Their main role is changing between the u-v and the x1-x2 coordinate systems. We call such operations a Change of Basis because we go from using $\hat{u}$ and $\hat{v}$ as the orthonormal basis vectors to $\hat{x}_1$ and $\hat{x}_2$, which are the basis vectors of the Eigenbasis. The matrix Λ has its components in terms of the eigenbasis. Finally, the matrix B is the single-step deformation matrix that represents the overall linear transformation using the $\hat{u}$ and $\hat{v}$ basis.

To better visualize the multiplication by Xᵀ, Λ, and X, we show a unit circle with a color gradient from yellow to blue. The circle will be deformed into an ellipse by the linear transformation. In a later section, we will enter into a much more detailed discussion of the deformation of circles and spheres. The color gradient helps show the rotations involved in the process. We also show the intersection of a unit vector oriented to the upper right with the unit circle. We label this vector w and show its intersection with the unit circle as a red circle. The vector is oriented 30° to u and is specified in equation 7.62. Notice


Figure 7.10 Eigenanalysis for a pure shear stretch by a factor of 2 directed up and to the right at 30° to the u-axis. The full matrices are given in equation 7.61. The black arrow with B is the direct transformation; the orange arrows show the matrix multiplication path going through the eigenbasis. The red circle at the head of w follows the transformation. The starting coordinate system is u-v with basis vectors $\hat{u}$ and $\hat{v}$. The eigenbasis is the x1-x2 coordinate system with basis vectors $\hat{x}_1$ and $\hat{x}_2$. The orange arrows show the changes associated with the matrices for the path using Xᵀ, Λ, and X. Orange letters are coordinates in the eigenbasis, and black ones reference the original basis.

that we use the shorthand wu to indicate the direction cosine relative to the u-axis. This is the entry in w for the first component. The story is similar for the second component wv.

$$\mathbf{w} = \begin{bmatrix} w_u \\ w_v \end{bmatrix} \rightarrow \begin{bmatrix} 0.87 \\ 0.5 \end{bmatrix}. \tag{7.62}$$

We will follow w's ride through multiplication by Xᵀ, Λ, and X by tracking the red circle at its head. The matrix Xᵀ goes first. The rows of this matrix are the vectors x1 and x2 expressed in the u-v system. For example, we read the symbol x1u as the component of x1 in the u direction. Note that in the matrix X of the eigenvectors, x1 and x2 form the columns of the matrix, but they form the rows of Xᵀ, as we see in the following equation.

$$X^T = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} \longrightarrow \begin{bmatrix} x_{1u} & x_{1v} \\ x_{2u} & x_{2v} \end{bmatrix} \rightarrow \begin{bmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{bmatrix}. \tag{7.63}$$


Now we want to find the components of w in the x1 and x2 directions. This is easy; we simply take the dot product to find the projection of one vector on another. We would just compute x1 · w and x2 · w, which is what the following equation accomplishes.

$$X^T\mathbf{w} = \begin{bmatrix} x_{1u} & x_{1v} \\ x_{2u} & x_{2v} \end{bmatrix}\begin{bmatrix} w_u \\ w_v \end{bmatrix} = \begin{bmatrix} x_{1u}w_u + x_{1v}w_v \\ x_{2u}w_u + x_{2v}w_v \end{bmatrix} = \begin{bmatrix} w_{x1} \\ w_{x2} \end{bmatrix} = \mathbf{w}_e. \tag{7.64}$$

In this equation, (x1u wu + x1v wv) is the dot product of x1 with w, giving the component of w in the x1 direction, which we label wx1. Likewise, (x2u wu + x2v wv) gives the x2 component of w. The final result is we, or w in the eigenbasis, that is, the components of w using x1 and x2 as the basis vectors.

$$X^T\mathbf{w} = \begin{bmatrix} 0.87 & 0.5 \\ -0.5 & 0.87 \end{bmatrix}\begin{bmatrix} 0.87 \\ 0.5 \end{bmatrix}_{uv} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_{x_1x_2} = \mathbf{w}_e. \tag{7.65}$$

This is now the vector w expressed in the coordinates of the eigenbasis, as we can see in the lower left of Figure 7.10. We have also adopted a new convention to indicate which basis is used for the vector's components. We start in the u-v system, indicating this basis using the subscript uv. We go to the eigenbasis using Xᵀ, using the subscript x1x2. For simplicity, we use the subscript e to indicate that we have taken the vector w to the eigenbasis.

The next multiplication is by Λ. This is a diagonal matrix, so the first row only multiplies the x1 component of the vector we and the second row only the x2 component.

$$\Lambda \longrightarrow \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}. \tag{7.66}$$

Now, we form the matrix product with the vector we.

$$\Lambda\mathbf{w}_e = \begin{bmatrix} \lambda_1 w_{x1} + 0\,w_{x2} \\ 0\,w_{x1} + \lambda_2 w_{x2} \end{bmatrix} = \begin{bmatrix} \lambda_1 w_{x1} \\ \lambda_2 w_{x2} \end{bmatrix} = \mathbf{w}_{e\lambda}. \tag{7.67}$$

We have to remember that this result is still in the eigenbasis. The matrix Λ is written with respect to the x1 and x2 directions, and we had already put w into the eigenbasis as we.

$$\Lambda\mathbf{w}_e = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}_{x_1x_2} = \mathbf{w}_{e\lambda}. \tag{7.68}$$

We have the result in the lower right of Figure 7.10. Adding the λ to the subscript of we indicates that the vector has now been multiplied by Λ. Our starting circle is now an ellipse with an axial ratio of 2 : 0.5, the same as the eigenvalues. Our last operation is to return the stretched vector from the eigenbasis to our original u-v basis. We do this by recovering the [u, v] components of the vector weλ, which requires returning to the original basis using the X matrix.

$$X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 \end{bmatrix} \longrightarrow \begin{bmatrix} x_{1u} & x_{2u} \\ x_{1v} & x_{2v} \end{bmatrix} \rightarrow \begin{bmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{bmatrix}. \tag{7.69}$$


We set up the multiplication and get the following result:

$$X\mathbf{w}_{e\lambda} = \begin{bmatrix} x_{1u}\lambda_1 w_{x1} + x_{2u}\lambda_2 w_{x2} \\ x_{1v}\lambda_1 w_{x1} + x_{2v}\lambda_2 w_{x2} \end{bmatrix} = \lambda_1 w_{x1}\begin{bmatrix} x_{1u} \\ x_{1v} \end{bmatrix} + \lambda_2 w_{x2}\begin{bmatrix} x_{2u} \\ x_{2v} \end{bmatrix} = \mathbf{w}_d, \tag{7.70}$$

$$X\mathbf{w}_{e\lambda} = \begin{bmatrix} 0.87 & -0.5 \\ 0.5 & 0.87 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix}_{x_1x_2} = \begin{bmatrix} 1.73 \\ 1.0 \end{bmatrix}_{uv} = \mathbf{w}_d. \tag{7.71}$$

This gives us the result in the upper right of Figure 7.10. We use the d subscript to indicate that this is the deformed vector in its original coordinate system. This agrees exactly with the product Bw that we get without going through the eigenbasis, as we see in the top row of Figure 7.10.

$$B\mathbf{w} = \begin{bmatrix} 1.62 & 0.65 \\ 0.65 & 0.88 \end{bmatrix}\begin{bmatrix} 0.87 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 1.73 \\ 1.0 \end{bmatrix} = \mathbf{w}_d. \tag{7.72}$$

Equation 7.72 accomplishes in one step what took three using the eigenanalysis. What did we gain with the extra work? The matrix B does not immediately tell us what the deformed shape looks like; visualizing it simply by examining the deformation matrix B is tough. From Λ, we know the Shape of the ellipse, and X shows its orientation. The eigenanalysis matrices present an understandable and clear picture of the overall transformation; from B alone, at best we know that the matrix is symmetric and positive definite, but that is about all.
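A minimal MATLAB sketch traces the three-step path of Figure 7.10; small differences from the printed values come from rounding 0.87 and 0.5 in equation 7.61.

% Follow the vector w through X'*w, Lambda*(X'*w), X*(Lambda*X'*w)
% and compare with the one-step product B*w.
X      = [0.87 -0.5; 0.5 0.87];   % rotation by +30 degrees (equation 7.61)
Lambda = [2 0; 0 0.5];
B      = X*Lambda*X';             % close to [1.62 0.65; 0.65 0.88]
w      = [0.87; 0.5];             % unit vector at 30 degrees to u

w_e      = X'*w;                  % w in the eigenbasis, about [1; 0]
w_elam   = Lambda*w_e;            % stretched in the eigenbasis, about [2; 0]
w_d      = X*w_elam;              % back in the u-v basis, about [1.73; 1.0]

w_direct = B*w;                   % one-step result; matches w_d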

7.6 SUMMARY

This chapter defines and gives examples of Eigenanalysis, one of the most important operations in Linear Algebra. We started by examining deformation matrices that applied plane strain to some fossils. The deformation matrices are viewed as linear transformations that change the shapes of objects on which they operate. For pure shear deformation, we found two directions in which the starting vector did not rotate but changed length; for simple shear transformations, we found only one direction; and for rotational strains, there were no such directions. The special directions are found by computing the eigenvectors of the linear transformation matrix, and the stretching factors correspond to the eigenvalues.

To figure out the eigenvectors and eigenvalues, we use the characteristic equation for the matrix, which is |A − λI| = 0. We expand this determinant into the characteristic polynomial in the variable λ. For an n × n matrix, we find n roots of the polynomial. For most cases in the Earth Sciences, the roots will be real and distinct. We order the eigenvalues from λ1 to λn, usually in decreasing order. The eigenvalues are then substituted back into A − λI, and we solve for the n eigenvectors.

A pervasive operation in Linear Algebra for the Earth Sciences is to compute the matrix AᵀA, called the Gram matrix. The Gram matrix arises in least squares computations and can be used in a host of other problems. Assuming we start with a tall matrix A, where m > n and A has full column rank, the Gram matrix will be symmetric and positive definite. This means we can find real-valued eigenvalues and a set of orthonormal eigenvectors.


If we can find a full set of real, independent eigenvectors and real eigenvalues for a matrix, then the matrix is diagonalizable. Symmetric real matrices are always diagonalizable and are very common in problems related to the Earth. When we perform the diagonalization, the eigenvector and eigenvalue matrices X and Λ are interpreted as a rotation matrix and a diagonal stretching or linear transformation matrix. We use the matrix Xᵀ to recast our problem into one with basis vectors that are the eigenvectors. We can easily visualize the transformation by Λ and any accompanying dilations, contractions, or coordinate flips. Finally, using X, we can switch our problem back to the starting basis for the matrix A.

7.7 EXERCISES

1. We will do two exercises on deformed plutonic rocks, shown in Figure 7.11.
(i) For each image, measure the orientation of the long axis of 25 grains. You can do this as an angle relative to north, with clockwise angles having a positive sign. Compile the results into a 25 × 2 matrix of direction cosines.
(ii) For each image, perform an eigenanalysis of the orientation data. You will get two eigenvector directions.
(iii) Can you say anything about the degree of deformation or alignment using the eigenvalues?
(iv) Now consider the aspect ratio of each of the measured grains. Can these be worked into the eigenanalysis? If you come up with an idea for this, go ahead and make some measurements and calculations.

2. In this problem, we will explore how eigenanalysis can help us compute something like Aⁿ quickly and easily. We know from Section 7.4 that if we have a real symmetric matrix A, we can write A = XΛX⁻¹. This is also true for many real matrices that are not necessarily symmetric. So we will suppose A is real and gives real eigenvectors and eigenvalues, but not necessarily an orthogonal set of eigenvectors. We will use this eigendecomposition for A in the following exercises.
(i) Compute A². What do you notice about the resulting matrix?
(ii) Using the idea of taking A to a power, come up with a general equation for Aⁿ.
(iii) Now let's work on a 2 × 2 strain matrix. We will use:

$$A = \begin{bmatrix} 1.0100 & 0.01 \\ 0 & 0.9901 \end{bmatrix}.$$

Compute the determinant of A. What sort of strain does this matrix accomplish?
(iv) Compute X and Λ for the matrix A. Then compute A²⁵ by direct multiplication and by using the eigendecomposition. How do the results compare?
(v) Could you use Xᵀ instead of X⁻¹? Try the computation with Xᵀ and compare to the last result. If this does not work, what do you need to do to X to be able to use Xᵀ?


Figure 7.11 Two deformed plutonic rocks. Dots show the grains to focus on for measurements of the long directions of grains. Also shown are ellipses in each picture for examples of grain shapes tied to specific grains. Assume that north is aligned with the vertical direction and east is to the right in each photo. Some grains may be clear; others may require the reader to make their best approximation. Yellow bars about 5 cm in length.


Figure 7.12 Undeformed initial circle in the upper left and four ellipses created by deformation.

3. For our next problem we will reuse an exercise we did last chapter. We repeat the setup in Figure 7.12. We can form a vector for the long axis of each ellipse. Do this by measuring the long axis using the center dot and your best estimate for the position of the tip of the major semiaxis of each ellipse. Record this as a vector like [1.75 3.75].
   (i) Form a 4 × 2 matrix of the data. Do the eigenanalysis on this matrix.
   (ii) Now we will work with the orientation of the axes alone. To do this, each vector will have to be normalized to unit length. Do the normalization and then repeat the eigenanalysis.
   (iii) The two previous parts should have given somewhat different answers. What is the reason for this?

CHAPTER 8: Change of Basis, Eigenbasis, and Quadratic Forms

The last chapter focused on eigenanalysis to create a matrix of eigenvalues and corresponding eigenvectors. The eigenvectors form a separate set of basis vectors for the system we are investigating. This chapter explains how to use the eigenbasis and how to think about basis vectors in general. This necessarily deals with changing from one basis to another. Changing basis is one of the more confusing and hard-to-remember procedures in linear algebra, so we will lead the reader through it gradually. Eigenanalysis also leads us to address quadratic forms, which we will examine at the end of this chapter.

What is a Basis? We know that basis vectors span a vector space: there are just enough of them to reach every point in the space, but not so many that any of them become linearly dependent. A basis can add a Coordinate System to our linear algebra problem so that we can measure and calculate. So what exactly is a change in basis? For the aims of this book, we consider a change of basis mainly as a simple change in the coordinate system. This is the way it is used in most Earth Science situations. Such changes would be going from a UTM to a State Plane grid in cartography, making a down-plunge projection in structural geology, aligning with the flow direction in hydrology, or simply changing from the North-West-Up coordinate system used by many mobile devices for recording orientations to the NED system that we have used extensively in this text. One fundamental change of basis is to go from whatever our measurement framework is to an eigenvector and eigenvalue system. For any reader who has heard the term Principal Stress Tensor, this reflects a particular change of basis. Changing basis is a profound subject in linear algebra, with many important implications and elegant methods. Those aspects, however, are beyond the scope of this text.




MATLAB® moment—Finding a basis
The main function to get an orthonormal basis is orth(). This returns an orthonormal basis for the column space of a matrix. For m × n matrices of full column rank, this yields n basis vectors in R^m. If the matrix is not full column rank, it returns a basis for the independent columns.
>> s = orth(A) gives s = [0.32 0.95; −0.95 0.32] for A = [1 2; 4 5].

8.1 CHANGE OF BASIS

We showed examples in the last chapter of the actions of the eigenvector and eigenvalue matrices. We take data or components into the eigenbasis using X^T and out of it using X. At first glance, this seems backward, so why is this the right transformation? And what are we changing when we go into the eigenbasis? In this section, we use simple examples of stretching and rotation to motivate the logic and mechanics of the procedure we call Change of Basis. We will briefly review what we mean by a basis, basis vectors, and vector components. The most straightforward statement is that a basis is defined by its basis vectors, and we write components in terms of these vectors. Change of basis means rewriting the vector components of the old basis in terms of new basis vectors.

Let's start with Figure 8.1 to ensure we are comfortable with all the definitions we need for the change of basis. We use just two dimensions; this is not hard to visualize, and we can go to any number of dimensions and utilize the same methods and procedures. The left side of Figure 8.1 shows, in no particular reference frame, our three objects of interest: the unit vectors of the u-v and x-y systems, which are the basis vectors for these coordinates, and a vector a. On the right side, we align the origins for the basis vectors and the tail of the vector a. We have set the x-y basis vectors to be twice the length of the u-v ones. Both systems are right-handed. How do we transform the basis vectors from the u-v to the x-y system? Examining Figure 8.1, we see that the x-y basis vectors are twice the length of the u-v ones. So, in going from u-v to x-y, the basis vectors become two times longer. What happens to the vector components when we go from the u-v to the x-y system?

\[
\hat{x} = 2\hat{u} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}_{uv} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_{xy}
\quad\text{and}\quad
\hat{y} = 2\hat{v} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}_{uv} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_{xy}.
\tag{8.1}
\]

In this equation, the xy Subscript Notation means that this is measured in terms of the x-y basis vectors, and the uv subscript means that this is measured in terms of the u-v basis vectors. We read the equation in 8.1 for x̂ as:

\[
\hat{x} \text{ has components } \begin{bmatrix} 2 \\ 0 \end{bmatrix} \text{ in the u-v basis and components } \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ in the x-y basis.}
\tag{8.2}
\]

in the x-y basis. (8.2)


Figure 8.1 Left side shows the basis vectors for the two coordinate systems and the vector a we will use in this example. The black dots are the origin of the coordinates or the tail of the vector. We align the coordinates and vector to the same origin on the right side. Using the subscript to indicate the basis, we also show a as its components in the u-v and x-y systems. We show the relationship between the basis vectors of the two systems on the right.

The statements for ŷ are similar. This means that as the basis vectors get longer in going from u-v to x-y, the components get smaller. The components change in the opposite manner to the basis vectors. On the other hand, if we go from the x-y to the u-v system, we would get:

\[
\hat{u} = \frac{\hat{x}}{2} = \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix}_{xy} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_{uv},
\quad
\hat{v} = \frac{\hat{y}}{2} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}_{xy} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_{uv},
\tag{8.3}
\]

which reads

\[
\hat{u} \text{ has components } \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} \text{ in the x-y basis and } \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ in the u-v basis.}
\tag{8.4}
\]

In this case, the basis vectors get shorter in going from x-y to u-v, but the components get larger. Again, the components change in the opposite manner to the basis vectors. We will now do the same comparison for the vector a. We can find the components of this vector in both bases as shown in Figure 8.1.

\[
\mathbf{a} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}_{uv} = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \end{bmatrix}_{xy}.
\tag{8.5}
\]

This example for a shows that if the basis vectors are longer, like x̂ and ŷ, then the components are smaller, and if the basis vectors are shorter, like û and v̂, the components are larger. Knowing this behavior will take the reader a long way in understanding change of basis.


8.1.1 Basis Vectors, Components, and Change of Basis Matrices

We must specify the basis vectors for the x-y and u-v systems. Using the Standard Basis for each system is pretty easy. This assignment gives:

\[
\hat{u} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_{uv} \text{ and } \hat{v} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_{uv},
\quad
\hat{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_{xy} \text{ and } \hat{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_{xy}.
\tag{8.6}
\]

The result is that the standard basis vectors look the same in their own coordinate systems. This is just how we define the standard basis. We can also write down an equation for any vector in the system by adding scaled products of the basis vectors.

\[
\begin{bmatrix} a \\ b \end{bmatrix}_{uv} = a\hat{u} + b\hat{v}.
\tag{8.7}
\]

How do we express the basis vectors of one system in terms of the basis vectors in the other? We make equations for each basis vector using those of the other system.

\[
\hat{u}_{xy} \to \tfrac{1}{2}\hat{x} + 0\hat{y} = \begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix}_{xy},
\quad
\hat{v}_{xy} \to 0\hat{x} + \tfrac{1}{2}\hat{y} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}_{xy},
\quad\text{and}
\tag{8.8}
\]
\[
\hat{x}_{uv} \to 2\hat{u} + 0\hat{v} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}_{uv},
\quad
\hat{y}_{uv} \to 0\hat{u} + 2\hat{v} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}_{uv}.
\tag{8.9}
\]

So now, if we want to change the u-v components of the vector in equation 8.7 to the x-y basis, we can just make a simple substitution.

\[
\begin{bmatrix} a \\ b \end{bmatrix}_{xy} = a\hat{u}_{xy} + b\hat{v}_{xy}
= a\begin{bmatrix} \tfrac{1}{2} \\ 0 \end{bmatrix} + b\begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}
= \begin{bmatrix} a/2 \\ b/2 \end{bmatrix}_{xy}.
\tag{8.10}
\]

We can do this with matrix multiplication using equations 8.7 and 8.8.

\[
\begin{bmatrix} a \\ b \end{bmatrix}_{xy}
= \big[\hat{u}_{xy}\;\; \hat{v}_{xy}\big]\begin{bmatrix} a \\ b \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix},
\quad\text{and let } B = \begin{bmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{bmatrix}.
\tag{8.11}
\]

The components get smaller when we go into a basis with longer basis vectors. We call the matrix B the Change of Basis Matrix that converts components determined for the u-v system into components in the x-y system. We can add a Subscript to the Change of Basis Matrix as needed for clarity:

\[
B \to B_{uv\to xy}.
\tag{8.12}
\]

Another way of expressing this is that the matrix B is how the u-v basis vectors û and v̂ appear as components in the x-y basis. The columns of B are exactly û and v̂ expressed in components using x̂ and ŷ.
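As a quick numerical check of equations 8.5 and 8.11, here is a short MATLAB® sketch (our own illustration, with variable names that are not from the text):

% Scaling change of basis from Figure 8.1: the x-y basis vectors are twice
% as long as the u-v ones, so components shrink by a factor of two.
B = [1/2 0; 0 1/2];      % B_uv->xy, columns are u-hat and v-hat in x-y components
a_uv = [1; 1];           % the vector a written in the u-v basis (equation 8.5)
a_xy = B * a_uv          % gives [0.5; 0.5], matching equation 8.5
C = inv(B);              % C_xy->uv = [2 0; 0 2], as in equation 8.14
C * a_xy                 % returns [1; 1], back in the u-v basis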


Let's now change the components of a vector in the x-y basis to the u-v one.

\[
\begin{bmatrix} c \\ d \end{bmatrix}_{xy} = c\hat{x} + d\hat{y}
\;\to\;
\begin{bmatrix} c \\ d \end{bmatrix}_{uv} = c\hat{x}_{uv} + d\hat{y}_{uv}
= c\begin{bmatrix} 2 \\ 0 \end{bmatrix} + d\begin{bmatrix} 0 \\ 2 \end{bmatrix}
= \begin{bmatrix} 2c \\ 2d \end{bmatrix}_{uv}.
\tag{8.13}
\]

Again, we do this with a matrix multiplication.

\[
\begin{bmatrix} c \\ d \end{bmatrix}_{uv}
= \big[\hat{x}_{uv}\;\; \hat{y}_{uv}\big]\begin{bmatrix} c \\ d \end{bmatrix}
= \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} c \\ d \end{bmatrix},
\quad\text{and let } C = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}.
\tag{8.14}
\]

And C is the change of basis matrix that converts components determined for the x-y system into components in the u-v system. We can also write C → C_{xy→uv}. The columns of C are exactly x̂ and ŷ expressed as components using û and v̂. The reader may notice that the matrices B and C look somewhat similar. We can write C = 4B and B = (1/4)C. If we multiply the two matrices we get BC = CB = I, meaning B = C^{-1} and C = B^{-1}. The matrices are the inverses of each other. The matrix B changes components from u-v to x-y while C does the opposite and goes from x-y to u-v. So, as long as we know one of the change of basis matrices, we can determine the other.

We can write a general equation for a change of basis matrix we will call B. We want to go between two bases we designate α and β, having the basis vectors α1 ··· αn and β1 ··· βn. Both vector spaces are n-dimensional. We can write a change of basis matrix for an n × n system as follows, using the basis vectors and the scalars b11 ··· bnn.

\[
B_{\alpha\to\beta} = \big[\,[\alpha_1]_\beta \;\; [\alpha_2]_\beta \;\cdots\; [\alpha_n]_\beta\,\big],
\quad\text{where } [\alpha_i]_\beta = b_{1i}\beta_1 + b_{2i}\beta_2 + \cdots + b_{ni}\beta_n,
\tag{8.15}
\]
\[
B_{\alpha\to\beta} =
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{bmatrix}.
\tag{8.16}
\]

We write the α basis vectors as they appear in the β basis. These new vectors, such as [αn ]β , then become the columns of the change of basis matrix Bα→β . Each basis vector has n components, and there are n such vectors.

MATLAB® moment—Creating the change of basis matrix MATLAB® will not generate a change of basis matrix. You have to do this yourself. To get the matrix requires working through the logic presented in this chapter or using the result of a function like eig(). You can do plenty of things with the matrix once you know it!
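A minimal sketch of doing this yourself, assuming both sets of basis vectors are written as columns in a common frame (variable names are ours, not MATLAB built-ins):

% Build a change of basis matrix column by column, as in equations 8.15-8.16.
old_basis = [1 0; 0 1];               % alpha basis vectors as columns (standard basis here)
new_basis = [2 0; 0 2];               % beta basis vectors as columns, in the same frame
B_old_to_new = new_basis \ old_basis  % column i solves new_basis*c = old_basis(:,i)
x_old = [1; 1];                       % components of a vector in the alpha basis
x_new = B_old_to_new * x_old          % the same vector written in beta components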


Figure 8.2 Figure of rotation for change of basis. The left diagram uses the u-v basis, with the standard basis vectors û and v̂. Given are the coordinates of the x-y basis vectors x̂ and ŷ, as expressed in the u-v basis. The right diagram shows the system in the x-y basis, with x̂ and ŷ as the standard basis vectors. Given are the coordinates of the u-v basis vectors as expressed in the x-y basis. In both parts, we show the vector b, which is at +45° to the û direction.

8.1.2 Change of Basis Matrices from a Different Angle

What happens when we rotate a set of orthogonal basis vectors without rescaling? We show the setup in Figure 8.2. We call the rotation angle θ and will take counterclockwise rotations as positive and clockwise rotations as negative. This is the same approach we have used throughout this text to set up the right-handed coordinate system in two dimensions. We use the subscript uv for the first basis and û and v̂ for the basis vectors; xy is used for the second basis with x̂ and ŷ as the basis vectors. These vectors are in the standard form as shown in equation 8.6. In the previous section, we changed the basis by scaling the basis vectors and keeping them in the same orientation. For rotations, our main task is to answer the question: How do the basis vectors in the old coordinate system move to arrive at the new basis? Let's review the problem in Figure 8.2. Just as we did for scaling in Section 8.1.1, our first task is to write the basis vectors as components in the other system, that is, û and v̂ in x-y components, and x̂ and ŷ in u-v components:

\[
\hat{u}_{xy} = \cos(\theta)\hat{x} - \sin(\theta)\hat{y} = \begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix}_{xy},
\tag{8.17}
\]
\[
\hat{v}_{xy} = \sin(\theta)\hat{x} + \cos(\theta)\hat{y} = \begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}_{xy},
\tag{8.18}
\]
\[
\hat{x}_{uv} = \cos(\theta)\hat{u} + \sin(\theta)\hat{v} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}_{uv},
\quad\text{and}
\tag{8.19}
\]
\[
\hat{y}_{uv} = -\sin(\theta)\hat{u} + \cos(\theta)\hat{v} = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}_{uv}.
\tag{8.20}
\]


So now we have the equations that reflect the changes of the basis vectors from one system to the other. What about changing the components of an arbitrary vector? Let's use b as our example vector. We start by stating that even if we change our basis vectors and coordinate system, the vector b does not change. It is a fixed vector in space, the same as the previous example given with the vector a used for scaling changes in Figure 8.1. For b in the u-v and x-y bases, we get that:

\[
b \to b_{uv} = \begin{bmatrix} b_u \\ b_v \end{bmatrix}
\quad\text{and}\quad
b \to b_{xy} = \begin{bmatrix} b_x \\ b_y \end{bmatrix}.
\tag{8.21}
\]

Again, the components will change depending on the basis, but the vector b itself is the same. When changing basis, we specify the components to reflect changes in basis vectors but do not alter the starting vector in any way. This is the same argument we used for a tensor object in a previous chapter. We will go back to the approach we used in the previous section. We want to change the basis for b from the u-v to the x-y basis.

\[
b_{xy} = b_u \hat{u}_{xy} + b_v \hat{v}_{xy}
= b_u \begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix}_{xy} + b_v \begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}_{xy}
= \begin{bmatrix} b_u\cos\theta + b_v\sin\theta \\ -b_u\sin\theta + b_v\cos\theta \end{bmatrix}_{xy}.
\tag{8.22}
\]

We can rearrange the vector and scalar quantities to get:

\[
b_{xy} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} b_u \\ b_v \end{bmatrix}
= \big[\hat{u}_{xy}\;\; \hat{v}_{xy}\big]\, b_{uv},
\quad\text{and let } B = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.
\tag{8.23}
\]

We again have a matrix that changes components from the u-v to the x-y system. We use the same notation and write that B → B_{uv→xy}. Again, B has columns that show how û and v̂ appear as components relative to x̂ and ŷ. We also need to notice that the matrix B is an orthogonal matrix. We will exploit the special properties of this Q-type matrix soon. Now, we will consider the vector b shown in both parts of Figure 8.2. The vector b is at 45° to û, written as ∠bû = 45°. The components of this vector in the u-v basis are:

\[
b_{uv} = \begin{bmatrix} \cos 45° \\ \sin 45° \end{bmatrix} = \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}_{uv}.
\tag{8.24}
\]

Let the angle θ be 15° in the u-v basis as shown in Figure 8.2. We can then write out the change of basis matrix and the multiplication that takes b_{uv} to b_{xy}.

\[
b_{xy} = B\, b_{uv} = \begin{bmatrix} 0.966 & 0.259 \\ -0.259 & 0.966 \end{bmatrix}\begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}
= \begin{bmatrix} 0.866 \\ 0.50 \end{bmatrix}_{xy}
= \begin{bmatrix} \cos 30° \\ \sin 30° \end{bmatrix}.
\tag{8.25}
\]

We can see this is the right result. If we set θ = 15°, then we know that ∠bx̂ = 30°. And this is precisely the result we got in equation 8.25, that b is 30° from the x̂ basis vector.


We see that x̂ and ŷ are rotated counterclockwise from û and v̂, but the components of vector b appear to rotate clockwise! In u-v, b is at 45° to û but is at 30° to x̂ in x-y. This means that a 15° counterclockwise rotation of the basis vectors results in a 15° clockwise rotation of the components. So, as is the case for scaling basis vectors, components behave in the opposite manner to the basis vectors. Let's go in the other direction, from the x-y system to u-v. We can set up components and basis vectors similarly to equation 8.22. We now write the vector b_{uv} in terms of the components of b_{xy}.

\[
b_{uv} = b_x \hat{x}_{uv} + b_y \hat{y}_{uv}
= b_x \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}_{uv} + b_y \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}_{uv}
= \begin{bmatrix} b_x\cos\theta - b_y\sin\theta \\ b_x\sin\theta + b_y\cos\theta \end{bmatrix}_{uv}.
\tag{8.26}
\]

This rearranges into a version of equation 8.23.

\[
b_{uv} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} b_x \\ b_y \end{bmatrix}
= \big[\hat{x}_{uv}\;\; \hat{y}_{uv}\big]\, b_{xy},
\quad\text{and let } C = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\tag{8.27}
\]

This gives us the following using ∠bx̂ = 30° and θ = 15°:

\[
b_{uv} = C\, b_{xy} = \begin{bmatrix} 0.966 & -0.259 \\ 0.259 & 0.966 \end{bmatrix}\begin{bmatrix} 0.866 \\ 0.50 \end{bmatrix}
= \begin{bmatrix} 0.707 \\ 0.707 \end{bmatrix}_{uv}
= \begin{bmatrix} \cos 45° \\ \sin 45° \end{bmatrix}.
\tag{8.28}
\]

So now we see that the basis vectors rotate 15° clockwise, but the components appear to rotate 15° counterclockwise. Once again, the components behave inversely relative to the basis vectors. What can we say about the change of basis matrices B and C? We know from the previous section, and can see in equations 8.23 and 8.27, that C = B^{-1}. This has to be the case because B rotates in the opposite direction to C. And since B is orthogonal, C = B^{-1} = B^T. It doesn't get better than that!

MATLAB® moment—Changing basis using a matrix
We have now gone through creating a change of basis matrix. If we use the change of basis matrix B for a 15° rotation in equation 8.25 and the vector b in equation 8.24, we get the following results.
>> br = B*b and we get [0.866; 0.50]. We can go back using the transpose.
>> B'*br gives [0.707; 0.707]. The * command does all the work.

We get compelling results by recapping the procedure on purely orthogonal matrices in two dimensions. First, it is easy to set up the change of basis matrices. We follow the right-hand convention with counterclockwise rotations being positive. If we assume that u-v is the starting coordinate system and we want to go to the x-y one, then we can write the change of basis matrix B_{uv→xy}. And the inverse matrix is just B^T to get from x-y to u-v.

\[
B_{uv\to xy} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\quad\text{and}\quad
B_{xy\to uv} = (B_{uv\to xy})^T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\tag{8.29}
\]

We also know we can undo the rotation by rotating by the angle −θ. We can substitute this into equation 8.29 for B_{uv→xy}.

\[
B_{uv\to xy}(-\theta) = \begin{bmatrix} \cos(-\theta) & \sin(-\theta) \\ -\sin(-\theta) & \cos(-\theta) \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
= B_{xy\to uv}.
\tag{8.30}
\]

The lesson here is that going from u-v to x-y behaves normally with counterclockwise rotations in this setup. That means that when we go from x-y to u-v, we must view the rotation in reverse. Another way of stating this is that if the target basis is moving counterclockwise relative to the starting basis, then the starting basis appears to move clockwise with respect to the target basis. We see this by comparing the rotations in the left and right parts of Figure 8.2. And if the basis rotation is counterclockwise, the components appear to move clockwise.

8.1.3 Change of Basis Matrices can Look Just Like Rotation Ones

The change of basis matrices in the last section should look familiar to the reader. The change of basis matrices B in equation 8.23 and C in equation 8.27 look just like the rotation matrices for the angles θ and −θ, respectively, that we saw in earlier chapters. And indeed, they are. We will quickly review what we did with rotation matrices before we jump into the eigenbasis and several examples. We restrict ourselves to the 2D case for simplicity in visualization, but the arguments here extend to however many dimensions we want. The unique feature of the 2D case is that rotation matrices commute under multiplication, whereas they do not in 3D and higher dimensions. Since we first learned about these matrices in Chapter 2, we have developed the idea of orthogonal matrices, where the inner product of one column with another column is 0, and the norm of each column is 1. We have also used the symbol Q for such matrices. Let's explore the action of these matrices using the example in Figure 8.3. In this case we are using three vectors called û, x̂, and a. We will have the first two as unit basis vectors and the last as an arbitrary vector in space. The angle +θ is measured counterclockwise, and −θ is clockwise, the same convention we have used. So, in this figure, we get that:

\[
20°\,\text{ccw} = +20°, \quad 20°\,\text{cw} = -20°, \quad 60°\,\text{ccw} = +60°, \quad\text{and } 60°\,\text{cw} = -60°.
\tag{8.31}
\]

Now, we can look at the angles and the rotation matrices. As we saw in earlier chapters, as well as the previous section, a Rotation Matrix looks like:

\[
R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\tag{8.32}
\]


Figure 8.3 Figure for rotation matrices and a change of basis. The left diagram uses the u-v basis, with the standard basis vector û, to specify x̂ and a. We omit v̂ for clarity. The center part is the same operation for the x-y basis, again omitting the ŷ basis vector. The right part shows how to combine rotations or angles in the x-y and u-v bases to specify the orientation of the vector a.

So for the left part of Figure 8.3, we can consider a as a vector rotated 20° counterclockwise from û and 60° clockwise from x̂. How do we use rotation matrices to compute ∠ûx̂? We could first rotate 20° to get to a and then an additional 60° to get to x̂. We would do this using two rotation matrices, first applying R(20°) and then R(60°). Note that the rotation from a to x̂ is counterclockwise.

\[
R(60°)R(20°) =
\begin{bmatrix} 0.5 & -0.87 \\ 0.87 & 0.5 \end{bmatrix}
\begin{bmatrix} 0.94 & -0.34 \\ 0.34 & 0.94 \end{bmatrix}
= \begin{bmatrix} 0.17 & -0.98 \\ 0.98 & 0.17 \end{bmatrix}
= R(80°).
\tag{8.33}
\]

The matrix multiplication is commutative, so R(60°)R(20°) = R(20°)R(60°) = R(80°). Note that commutativity is only true for Plane or 2D rotations, and does not hold in 3 dimensions and more. The result of equation 8.33 is that by rotating a vector parallel to û by 80° ccw it aligns with x̂. We can repeat this operation with x̂ as the basis vector and rotating to û, as shown in the middle part of Figure 8.3. Rotations are clockwise, so we have angles of −60° to get to a and −20° to get to û. We could call the final rotation matrix R(−80), keeping the notation used in equations 8.32 and 8.33. Also, notice that the angles added in these equations to get the final rotation angle, precisely what we would expect. This is only true, however, for Plane Rotations, rotations confined to a single plane.

Let's now explore rotations as a change of basis. What if we changed the basis from u-v to x-y? We can use our logic from Section 8.1.2 and see that we want to be able to express the starting basis vectors as components in the other basis. This means we are rotating −80° because this is how û appears in the x-y basis. The first column of the matrix R(−80) is the û basis vector in components of the x-y basis, with the second column representing the orthogonal basis vector v̂.

\[
R(-80°) = \begin{bmatrix} 0.17 & 0.98 \\ -0.98 & 0.17 \end{bmatrix}
\;\to\;
\hat{u}_{xy} = \begin{bmatrix} 0.17 \\ -0.98 \end{bmatrix}
\quad\text{and}\quad
\hat{v}_{xy} = \begin{bmatrix} 0.98 \\ 0.17 \end{bmatrix}.
\tag{8.34}
\]


The critical feature here is that x̂ is 80° ccw from û, but we use the opposite rotation of 80° cw to change to the x-y basis. Components move in the opposite sense to the change in the basis vectors. This means that, as we wrote in the previous section, B_{uv→xy} = R(−80). And by similar reasoning, B_{xy→uv} = R(80). We can also see in the right part of Figure 8.3 that we get the components of a in the u-v basis by just using the first column of R(20). Then to get a into the x-y basis we multiply this by R(−80).

\[
a_{xy} = B_{uv\to xy} R(20) = \begin{bmatrix} 0.17 & 0.98 \\ -0.98 & 0.17 \end{bmatrix}\begin{bmatrix} 0.94 & -0.34 \\ 0.34 & 0.94 \end{bmatrix}
\tag{8.35}
\]
\[
= \begin{bmatrix} 0.5 & 0.87 \\ -0.87 & 0.5 \end{bmatrix} = R(-60) = R(60\,\text{cw}).
\tag{8.36}
\]

This is just the right result for Figure 8.3. And once again, we could have set B_{uv→xy} = B and B_{xy→uv} = B^T.
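A short numerical check of this section (a sketch using the angles of Figure 8.3; R here is a helper we define, not a built-in function):

% Plane rotations compose by adding angles, and the change of basis matrix
% uses the opposite rotation (Section 8.1.3).
R = @(t) [cosd(t) -sind(t); sind(t) cosd(t)];   % 2D rotation matrix, angle in degrees
R(60)*R(20) - R(80)          % ~zero matrix: R(60)R(20) = R(80)
B_uv_to_xy = R(-80);         % equation 8.34: components change with the opposite rotation
a_uv = R(20)*[1; 0];         % a is 20 degrees ccw from u-hat (first column of R(20))
a_xy = B_uv_to_xy * a_uv     % ~[0.5; -0.87], i.e. 60 degrees cw from x-hat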

8.2 WHAT DO THESE RESULTS MEAN FOR THE EIGENBASIS?

Now that we have a general understanding of how to change bases in two dimensions by lengthening and rotating, we are ready to consider more advanced ideas and, finally, understand what we are doing with the eigenvector and eigenvalue matrices. The most important equation we have in this regard is expressed in the Spectral Theorem for the case of symmetric real matrices. In the last chapter, we showed the roles played by the eigenvector and eigenvalue matrices in decomposing a symmetric matrix into A = XΛX^T. This means that the linear map or transformation done by A on a vector can be carried out by first using X^T to convert the vector to the Eigenbasis, then using Λ to do some stretching, and finally using X to take the vector back to the original basis.

We explain this process in more detail. First, we start with the Eigenvector Matrix X. This matrix contains the eigenvectors, the directions only lengthened or shortened by the transformation done by A, written in the basis for A. We know X is an orthogonal rotation matrix. Using the previous discussion, this means that X holds the basis vectors for the eigenbasis written in components of the basis vectors of A. We could write the basis vectors of X as [x1]_a. We will use the subscript e to indicate that the components of a vector or matrix are in the eigenbasis and a to stand for the original basis of A. Using equation 8.23, this means that X is a change of basis matrix that changes from components written in the eigenbasis to ones in the original basis. So X^T takes vectors in the a basis and converts them to components in the eigenbasis. We could write the following for the change of basis matrices:

\[
X^T_{\,a\to e} \quad\text{and}\quad X_{\,e\to a}.
\tag{8.37}
\]

In Sections 8.1.1 and 8.1.2, we used the B matrix to change from our starting components to the target basis and C to return to the starting basis. So, in our case, the matrix X^T takes on the role of B by taking our starting components to the eigenbasis, and X acts like C by returning components from the eigenbasis to our original basis. Just as C = B^{-1} undoes that change of basis, X reverses the work of X^T, as these are Q-like orthogonal matrices. And there is no restriction on the number of dimensions, although we show examples in two or three dimensions. Now, what does Λ do? It is a diagonal matrix that stretches vectors in the eigenbasis. Because it is diagonal, the stretching is easy to understand and visualize, at least in two or three dimensions. We apply Λ to vectors written in the e-basis. Finally, once we are done with operations and visualization in the eigenbasis, we can return to the a-basis using X as the change of basis matrix from e to the original basis. We can write this as an equation with subscripts that capture the bases and changes of bases.

\[
A_{a\to a}\, x_a = X_{e\to a}\, \Lambda_{e\to e}\, X^T_{a\to e}\, x_a.
\tag{8.38}
\]

Operations on the left side of equation 8.38 occur in the a basis. On the right side, we change the basis to the eigenbasis, then do a linear transformation in this basis, and then return to the original a-basis. And once again, the previous case is for symmetric matrices A. Otherwise, we are not assured that X is orthogonal, in which case we use:

\[
A_{a\to a}\, x_a = X_{e\to a}\, \Lambda_{e\to e}\, X^{-1}_{a\to e}\, x_a.
\tag{8.39}
\]

But the overall process and workflow of operations are similar between the two cases. We present three additional examples of eigenanalysis and change of basis. The first is a common problem for geologists: going from magnetic to true north. The next is for orientation data on a gneissic rock with a defined foliation and lineation. We use the results to interpret the orientations of the fabric and the principal strains. The last example is determining the magnitudes and orientations of principal stresses given stresses measured in an arbitrarily oriented coordinate system. None of these examples depends on solving Ax = b, but all require a change of basis.
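The workflow in equation 8.38 is easy to verify numerically. Here is a minimal MATLAB® sketch with an illustrative symmetric matrix (not a matrix from the text):

% Equation 8.38 step by step: change of basis, stretch, change back.
A = [3 1; 1 2];                 % symmetric, so the eigenvectors are orthonormal
[X, L] = eig(A);                % columns of X are eigenvectors, L is Lambda
x_a = [1; 2];                   % a vector in the original a basis
x_e = X' * x_a;                 % change of basis into the eigenbasis (X^T = X_a->e)
y_e = L * x_e;                  % stretch along the eigenvector directions
y_a = X * y_e;                  % change of basis back to the a basis
y_a - A*x_a                     % ~[0; 0]: the three steps reproduce A*x_a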

8.3 THE GEOLOGIST'S DILEMMA—SETTING A BRUNTON® COMPASS

When working in the field, geologists use a magnetic compass to establish the orientations of features like layering in rocks or lineations on surfaces. Compass needles indicate the direction of Magnetic North, but the geologist wants their reading to be in terms of True or Geographic North. This means that they have to correct for the difference between magnetic north, the actual direction the compass points toward the magnetic pole, and the geographic or map direction north referenced to the geographic pole. This difference is called the Magnetic Declination. Shown in Figure 8.4 is the compass of choice historically for geologists, the Brunton® pocket transit or simply the Brunton® compass. The main figure shows the dial set to a declination of 10° east, and the inset shows the compass itself. The reader can think of the geologist aligning the long axis or body of the compass, including the black arm, with the direction they intend to measure.

Figure 8.4 Photograph of Brunton® compass face with the inset showing the whole compass with mirror and sighting arm. The numbered ring on the compass face can be rotated to compensate for magnetic declination. The ring in the yellow circle shows the declination set to 10°E. Note that although north is up in the photograph, the E and W on the ring are reversed. Flipping the cardinal directions is key to understanding declination and reading the compass. The white arrow is the magnetic north end of the compass needle. We read the number at its tip to determine the orientation of the compass body. This compass is a quadrant model that divides the dial into NE, NW, SE, and SW quadrants. The reading is about 54° west of north, expressed as N54W.

The compass's dial rotates so the user can compensate for the local magnetic declination from the geographic north direction. In the example in Figure 8.4, by rotating the ring 10° toward the east for the declination, the user directly uses the north needle to read the compass's orientation relative to geographic north. As noted in the caption to Figure 8.4, the compass is reading a 54° west-of-north trend for the compass body after a 10° rotation of the ring.

At first glance, the readings in Figure 8.4 seem backward. The north magnetic arrow is rotated clockwise from the north direction of the compass body, but we are stating that this is a west-of-north heading. The quadrant ring on the compass is set to 10° east, but this setting is counterclockwise of the zero position. These readings seem to be the opposite of what we would expect. Why are these the correct ways of using the compass?

Figure 8.5 Brunton® compass on a rock face on the left side. The right side shows a schematic version of the compass with the aiming arm and division of the dial around the declination pin, shown as a black line in the blue ring. The lower right part of the diagram shows the compass with the body pointing to geographic north and magnetic north about 10° east. This is the 0° position of the ring. This rotation of the ring clockwise in the body compensates for the declination. The upper right shows the compass body rotated west to align with a rock face. Magnetic north is unchanged, but now the needle is reading in the western half of the dial.

Figure 8.5 shows how the magnetic north arrow and the body of the compass interact. We can see in this figure and Figure 8.4 that the ring, face, and body of the compass move together, but the north arrow is independent and is stationary as we rotate the compass. Since magnetic north is fixed, as the compass body rotates westward, which is counterclockwise, the north arrow appears to rotate clockwise or eastward. We want to read the orientation of the compass body, aligned with our direction of interest. The body and the north magnetic arrow move in an opposite sense to each other. This sounds like the variation between basis vectors and vector components we discussed earlier for a rotational change of basis. And we also need to figure out how to set the declination. In Figure 8.6, we show the variation in the declination of the Earth's field for North America. It is 0° along a roughly north-south line through the continent, but everywhere else it is rotated either to the east or to the west of north depending on the latitude and longitude. This is the angle for which we compensate by setting the declination on the compass using the adjustable ring. Although it seems backward, we must explain why this way of reading the compass and setting the ring is correct. Naturally, we can understand the whole process of using the compass by doing some linear algebra with a change of basis.

8.3.1 What are the Basis Vectors?

Figure 8.6 The center shows a map of magnetic declination for North America. Declination varies from over 20° west of geographic north in eastern North America to almost 20° east in the west. Examples of the magnetic and compass basis vectors are shown. We use black arrows for the body, aligned with geographic north, and orange for the magnetic coordinate system. The left side shows the rotation of the compass ring to account for the 18° east declination of southern Alaska. Note this is indicated by a short black line at the head of the arrow at 0°. The right side shows the two systems of basis vectors. We take the angle θ = 45°. The equation shown is the magnetic basis vectors expressed in both the magnetic and geographic bases. The center part is modified from a figure at http://usgs.gov.

We will start by identifying the bases and the basis vectors. The first basis is the Earth's magnetic field; we read the magnetic needle on the compass as it rotates freely while the compass changes orientation. The second system is the geographic basis, a fixed imaginary grid on the Earth's surface. Using the north direction as the first basis vector in both systems makes sense. What orientation should we use for the second basis vector? Throughout this book, we have used the north-east-down (NED) system, so our second basis vector will be east for both. We show these in equations 8.40 and 8.41. Going from north to east appears as a clockwise rotation in plan view in Figure 8.6. Still, this apparent negative-signed rotation is positive in the NED, right-handed system. We use the compass's long axis for its orientation and will call this the P direction.

\[
\text{Magnetic: } \hat{N}_m = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_m, \;
\hat{E}_m = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_m, \quad\text{and}
\tag{8.40}
\]
\[
\text{Geographic: } \hat{N}_g = \begin{bmatrix} 1 \\ 0 \end{bmatrix}_g, \;
\hat{E}_g = \begin{bmatrix} 0 \\ 1 \end{bmatrix}_g.
\tag{8.41}
\]

The result is that the basis vectors look exactly the same in their own coordinate systems. This is just how we define orthogonal, unit-length basis vectors. A subscript after a matrix or vector indicates which basis we use to write the components. We use g for the geographic and m for the magnetic system. We can write any arbitrary vector as scalar components times the basis vectors. For example, the vector x in the geographic basis would be:

\[
x_g = \begin{bmatrix} a \\ b \end{bmatrix}_g = a\hat{N}_g + b\hat{E}_g.
\tag{8.42}
\]

As a reminder, we read equation 8.42 to mean that the Components a and b are written in terms of the geographic Basis Vectors. To understand how to apply a change of basis, we start by figuring out what one set of basis vectors looks like in the other system. This is shown on the right side of Figure 8.6. The operation is similar to the rotation example in Section 8.1.2. We will write the geographic basis vectors in terms of their magnetic coordinates and the magnetic basis vectors in geographic components. The notation [N̂_g]_m means the north geographic basis vector written in magnetic components.

\[
[\hat{N}_g]_m = \cos(\theta)\hat{N}_m - \sin(\theta)\hat{E}_m = \begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix}_m,
\tag{8.43}
\]
\[
[\hat{E}_g]_m = \sin(\theta)\hat{N}_m + \cos(\theta)\hat{E}_m = \begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}_m,
\tag{8.44}
\]
\[
[\hat{N}_m]_g = \cos(\theta)\hat{N}_g + \sin(\theta)\hat{E}_g = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}_g,
\quad\text{and}
\tag{8.45}
\]
\[
[\hat{E}_m]_g = -\sin(\theta)\hat{N}_g + \cos(\theta)\hat{E}_g = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}_g.
\tag{8.46}
\]

g

These equations are very similar to those shown in equations 8.17 through 8.20 of Section 8.1.2. The main difference here is that in the view of the compass, clockwise rotations have positive signs. Again, following Section 8.1.2, this means that we can write positive angles as θ E and negative angles as θ W. 8.3.2 Change of Basis Matrices

Using equations 8.43 through 8.46, we can form matrices that express any vector with components in the geographic basis as a vector with components in the magnetic basis or one in the magnetic basis as components in the geographic basis, respectively. We will specify a matrix we call B that is the basis vectors of the geographic system written as components in the magnetic system. h

ˆg E ˆg B= N

"

i m

cos θ sin θ = − sin θ cos θ

#

.

(8.47)

m

Similar to the example in Section 8.1.2, the matrix B changes the vector components of the geographic basis vectors to components expressed in the magnetic basis. The vectors themselves do not change, but their components do. We know we can express any vector written in the geographic basis to look like scalars multiplying the basis vectors shown for xg in equation 8.42.


Figure 8.7 The center part shows the negative rotation angle between magnetic north and the body of the compass. The left part shows the declination of 18°E for magnetic north. The right part is the correction for declination in the opposite sense to the rotation of the compass body P.

To get x from the geographic basis to the magnetic basis, we multiply the vector x_g in equation 8.42 on the left by the matrix B. This gives us the following:

\[
Bx = aB\hat{N}_g + bB\hat{E}_g
= a\begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix} + b\begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}
= \begin{bmatrix} a\cos\theta + b\sin\theta \\ -a\sin\theta + b\cos\theta \end{bmatrix}_m.
\tag{8.48}
\]

The important point here is that we can take any vector with components measured in the geographic basis to components in the magnetic basis by multiplying by B. As we have done in previous sections, we sometimes give this matrix a special subscript to show how it changes the components.

\[
B_{g\to m} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},
\quad\text{and}\quad
B_{g\to m}\, x_g = x_m.
\tag{8.49}
\]

Remember, if we want to go the other direction, from components in the magnetic basis to the geographic basis, we need only use B^T, since we know from Section 8.1.2 that B_{m→g} = B^T_{g→m}. We will compute the change of basis matrix for the example shown in the left part of Figure 8.7. The magnetic declination is 18° in a clockwise sense, or 18°E relative to true north. Since this is a positive rotation, we can use the following results:

\[
B = \begin{bmatrix} \cos(18°) & \sin(18°) \\ -\sin(18°) & \cos(18°) \end{bmatrix}
= \begin{bmatrix} 0.95 & 0.31 \\ -0.31 & 0.95 \end{bmatrix}_m,
\tag{8.50}
\]
\[
[\hat{N}_g]_m = B\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.95 \\ -0.31 \end{bmatrix}_m,
\quad
[\hat{E}_g]_m = B\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.31 \\ 0.95 \end{bmatrix}_m.
\tag{8.51}
\]

Equation 8.51 makes sense for the left part of Figure 8.7. With respect to the magnetic basis, N̂_g has a positive N and a negative E component, and Ê_g has positive N and E magnetic components. If we want to use a rotation matrix, then we see that the geographic basis is 18° to the west, so we would use R(−18°) = B^T.

In the center part of Figure 8.7, we see an example of measuring the compass orientation from magnetic north. This means we are in the magnetic basis, and the compass is rotated 63° counterclockwise, or toward the west of magnetic north. We would read this as 63°W in the magnetic basis. We set the components of the compass orientation using a rotation matrix from magnetic north to the P vector. In this case, we get:

\[
R(-63°) = \begin{bmatrix} 0.45 & 0.89 \\ -0.89 & 0.45 \end{bmatrix},
\quad\text{and}\quad
[R(63W)]\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.45 \\ -0.89 \end{bmatrix} = P_m.
\tag{8.52}
\]

What we are after, however, is the compass orientation in the geographic system. We must change the basis from the magnetic to the geographic system to get to this. The right part of Figure 8.7 shows both sets of basis vectors and the vector P. We have the change of basis matrix in equation 8.50 for going from the geographic to the magnetic basis, so we can just use the transpose of this matrix to go the other direction. Since we have the components of P_m, we can just multiply on the left by B^T to get the following:

\[
B^T P_m = \begin{bmatrix} 0.95 & -0.31 \\ 0.31 & 0.95 \end{bmatrix}\begin{bmatrix} 0.45 \\ -0.89 \end{bmatrix}
= \begin{bmatrix} 0.71 \\ -0.71 \end{bmatrix} = P_g.
\tag{8.53}
\]

We can find the angle for the rotation matrix that takes geographic north to P_g and get:

\[
P_g = \begin{bmatrix} 0.71 \\ -0.71 \end{bmatrix}
= \begin{bmatrix} \cos(-45°) \\ \sin(-45°) \end{bmatrix}
\;\rightarrow\; [R(45W)]\begin{bmatrix} 1 \\ 0 \end{bmatrix}.
\tag{8.54}
\]

This gives us the angle between geographic north and the compass body as −45°, or 45°W, so we say the compass is pointed toward 45° west of north. Notice also that this result can be determined using the R(63W) and R(18E) rotation matrices, where we use E and W to set the sign of the angle.
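The declination example above can be checked with a few lines of MATLAB® (a sketch; variable names are ours):

% Declination example of Section 8.3.2: 18 degrees E declination and a
% compass body read 63 degrees W of magnetic north.
theta_d = 18;                                  % declination, degrees east of true north
B = [cosd(theta_d) sind(theta_d); ...
    -sind(theta_d) cosd(theta_d)];             % B_g->m, equation 8.50
Pm = [cosd(-63); sind(-63)];                   % compass body in the magnetic basis, equation 8.52
Pg = B' * Pm                                   % ~[0.71; -0.71], equation 8.53
trend = atan2d(Pg(2), Pg(1))                   % ~-45, read as 45 degrees west of true north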

8.3.3 Put a Ring on It

Doing matrix multiplication or adding and subtracting angles in the field would be inefficient. That is why the Brunton® compass performs the change of basis using a rotation of the compass ring. Figure 8.5 shows a geologist holding a compass against the surface of a rock. The Earth's magnetic field sets the needle direction, and the rock's orientation sets the body. The right side of this figure shows the magnetic declination with the compass body pointing toward geographic north. The big question is how do we rotate the ring so we can directly read the orientation of the body relative to true north?

The only part of the compass that stays fixed in its orientation is the north arrow. The body can rotate, but the needle should always point to magnetic north. Because we use this arrow but want to read it in the geographic basis, the ring of the compass has its east and west scales reversed. From the physical perspective, this should make sense to the reader, as shown in Figures 8.5 and 8.7, and as a rotation matrix. It is also consistent with how components change as we move between bases. The compass body is in the geographic basis, meaning that components in the magnetic basis (which is what the angle east or west reflects) should behave in the opposite manner. In the geographic basis, clockwise rotations are toward the east, so they should read toward the west in the magnetic basis. This is exactly how the ring is marked off.

How should we rotate the ring to compensate for the declination? We show examples of reading the compass and setting the ring in Figure 8.8. The left side shows the compass body aligned with true north and the magnetic needle inclined to the upper right. We see immediately that we can read the ring in two ways. The first is just using the orientation of the north magnetic arrow and reading the intersection at the ring's interior. The second way is to read the outer part where the pin attached to the body of the compass intersects the ring. Looking at the small portion of the ring shown in the middle of Figure 8.8, we see that the pin is reading 18°E while the magnetic north arrow is reading 0°. The inner part is set to read 0° in the magnetic basis when the pin and body are aligned with geographic north. The outer edge reads 18°E in this configuration, which is the orientation of the north arrow in the geographic basis. This means that the change of basis is performed and hard-wired into the compass by rotating the ring. The inner part of the ring reads the north arrow in the magnetic basis, and the outer part reads the declination in the geographic basis.

Figure 8.8 Setting the declination ring. The left part shows the compass body aligned with geographic north and set for a declination of 18°E for magnetic north, shown as angle θd. The ring is rotated so that the 0° reading on the ring is at 18° east of true north and aligned with magnetic north. The center part enlarges the compass and shows that the ring reads in the geographic basis (θg) outward and the magnetic basis (θm) inward. The figure's right part shows the body pointing toward 45°W of geographic north. True and magnetic norths stay fixed as the body rotates. The offset of 18°E is maintained between the body and the 0° mark by moving the ring, reflecting the constant 18°E declination.

So the setting and reading of the Brunton® compass make sense from a linear algebra point of view. We want to know the orientation of linear and planar features in a geographic or true north reference frame, and we reference north and east as the basis vectors for this system. The compass needle gives us the orientation of the magnetic basis in terms of north and east for that system. We read this orientation using the inner part of the compass ring, but with east and west swapped on the ring because components change in an opposite manner to the change in the basis vectors. Finally, we must correct for the mismatch of magnetic and geographic north. This is done in the geographic system by reading and rotating using the outer part of the ring.

Figure 8.9 Stereographic image of gneiss viewed downward on the foliation plane. The elongated minerals are feldspars, showing a scatter in orientation. We use the orientations of six grains in the eigenanalysis.

8.4 A NICE GNEISS

Figure 8.9 shows a stereoscopic image of augen gneiss. Augen is the German word for Eye, which is used because of the almond shape of the deformed feldspar grains. Note we are using augen to understand eigen stuff. In this figure, we are looking perpendicular to the foliation in the rock. The augen are lined up on the foliation surface and form ellipses with a limited range of orientations. We can use the eigenanalysis of the long axes of the grains to figure out the orientations of the principal strains. The trend and plunge of the λ1 direction give us the S1 direction, and the λ2 and λ3 directions let us interpret the orientations of S2 and S3. We show the analysis of this rock in Figure 8.10. The center part shows the orientation data plotted in NED coordinates, with the positive D axis oriented vertically upward. The scatter of the data in a steep, north-northeast plane is evident when we examine the center part of Figure 8.10. We will follow these vectors during the progression of the analysis.

Figure 8.10 Actions of the G, X, Λ, and X^T matrices in equation 8.55. The data from A are plotted using NED coordinates of the direction cosines in the center. Note that the axes are rotated so that the D-axis points upward. The data strongly scatter in the D-direction. The upper left is the original orientation data in the NED coordinate system redrawn in a more oblique perspective. The actions of the eigenanalysis matrices are shown going counterclockwise around the figure. We show X^T A by changing to the eigenbasis x1-x2-x3 in the lower left. Across the lower part, we show stretching using the matrix Λ. On the right, we return the data to the NED system using X. Across the top, we show the action of the G matrix. Both paths lead to the same transformation of the orientation vectors in the upper right.

The matrices needed for the eigenanalysis are shown in equation 8.55. The rows of matrix A are the direction cosines for the axis orientations of six augen. The direction cosines are in the NED convention, with the first column being the N coefficient, the second the E, and the third the D. We then compute the matrix G = A^T A to be used for the eigenanalysis. We also show the matrices Λ and X that form the eigenvalues and eigenvectors of G.

\[
A = \begin{bmatrix}
0.96 & 0.27 & 0.10 \\
0.94 & 0.22 & 0.24 \\
0.92 & 0.22 & 0.32 \\
0.78 & 0.43 & 0.48 \\
0.69 & 0.47 & 0.55 \\
0.51 & 0.23 & 0.82
\end{bmatrix},
\quad
G = A^T A = \begin{bmatrix}
4.00 & 1.45 & 1.78 \\
1.45 & 0.63 & 0.79 \\
1.78 & 0.79 & 1.36
\end{bmatrix},
\]
\[
\Lambda = \begin{bmatrix}
0.057 & 0 & 0 \\
0 & 0.478 & 0 \\
0 & 0 & 5.460
\end{bmatrix},
\quad
X = \begin{bmatrix}
0.22 & 0.49 & 0.84 \\
-0.94 & -0.13 & 0.32 \\
0.27 & -0.86 & 0.43
\end{bmatrix}.
\tag{8.55}
\]

In this case, G and its eigenvectors and eigenvalues will reflect the average orientation of the axes of the augen. Remember that G is in the original geographic north basis. We show the progression of the eigenanalysis matrices in Figure 8.10. The direction cosines are plotted in the NED system in the center and upper left. The upper left shows the eigenvectors as they are oriented in the NED system. The lower left shows the eigenvectors as the coordinate system. Both diagrams clarify that the orientation data do not cluster but lie in a NE direction and spread along an arc running primarily in the D direction. The x1 direction best aligns with the overall center of the data, and the data scatter along a plane through x1 and x2. The x3 direction is normal to the scatter defined by the orientation data. This leads to the conclusion that the λ1 direction most closely aligns with the data. We see λ2 perpendicular to λ1 but still somewhat aligned with the data. Finally, the λ3 direction forms the last axial direction in the eigenvector coordinate system and has the least alignment with the data. This configuration becomes amplified during the rest of the analysis.

Across the bottom of Figure 8.10, we show the action of the eigenvalue matrix Λ. The most obvious result is the significant stretching of the vectors in the x1 direction and the contraction of scatter in both the x2 and x3 directions. The ratios of the eigenvalues are about 11.4 for λ1/λ2 and 96 for λ1/λ3, and they reflect the relative stretching toward the x1 direction and contraction in the other two. Lastly, the right side shows the rotation of the transformed vectors back to the NED coordinate system. Comparing the starting vectors to those in the upper right of Figure 8.10 shows that the values are strongly stretched and reoriented into the λ1-x1 direction.

One result of the operation is to compute the orientation of the best-fitting long axis of the augen. We associate this with the x1 direction and interpret it as the orientation of S1, the greatest principal stretch. What do we get from x2 and x3? Looking at the lower left part of Figure 8.10, we see that the orientation vectors point mostly toward x1 but are secondarily dispersed in the x2 direction. Only a small component of the orientation vectors is in the x3 direction. If we read the magnitude of the eigenvalues as the tendency of the data to align with the associated eigenvector direction, then x3 has the least elongation of the augen and is thus the S3 direction. This means x2 is best interpreted as associated with S2, the intermediate principal stretch.

These orientations fit the structural geology of the augen gneiss. The elongation direction of the augen is called a lineation, in this case a stretching lineation, in the direction of S1. To get the orientation of the lineation, we convert the x1 direction cosines into trend and plunge. The result is given in equation 8.56. The lineation and the S2 direction are in the plane of the foliation, which is the planar fabric in the rock. We could find the normal vector to the foliation by taking the cross product of x1 with x2, but this is unnecessary because that is the x3 direction. We convert the direction cosines for x3 into the dip azimuth and dip of the foliation plane.

\[
x_1 = \begin{bmatrix} 0.84 \\ 0.32 \\ 0.43 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} T \\ P \end{bmatrix} = \begin{bmatrix} 21.5 \\ 25.5 \end{bmatrix},
\text{ giving } S_1 = 021.5, 25.5,
\tag{8.56}
\]
\[
x_3 = \begin{bmatrix} 0.22 \\ -0.94 \\ 0.27 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} DD \\ D \end{bmatrix} = \begin{bmatrix} 102.5 \\ 74.3 \end{bmatrix},
\text{ giving foliation} = 102.5, 74.3.
\tag{8.57}
\]









We have set up this example so that the direction cosines for the D-direction had a positive sign in the solution vectors. This ensures the plunge is into the Earth following the geologic convention for trend and plunge. What if they were negative? We would reverse the signs of all components. This would yield a positive D value, so a plunge into the earth. The signs of the N and D components flip, leading to a 180° rotation in the trend direction. This is sensible because a line through the origin can be represented by a vector pointing in either direction. For geology, we pick the one that plunges into the Earth and has a positive D-direction cosine.
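The full calculation can be sketched in MATLAB® using the direction cosines of equation 8.55. The sorting and sign flips below are our housekeeping, since the order and signs of eigenvectors returned by eig() are not fixed.

% Augen orientation analysis of Section 8.4: Gram matrix, eigenanalysis,
% and conversion of eigenvectors to geologic orientations (NED convention).
A = [0.96 0.27 0.10; 0.94 0.22 0.24; 0.92 0.22 0.32; ...
     0.78 0.43 0.48; 0.69 0.47 0.55; 0.51 0.23 0.82];
G = A'*A;                                   % Gram matrix of equation 8.55
[X, L] = eig(G);
[lam, idx] = sort(diag(L), 'descend');      % lambda1 >= lambda2 >= lambda3
X = X(:, idx);
x1 = X(:,1); if x1(3) < 0, x1 = -x1; end    % keep D positive: plunge into the Earth
trend  = mod(atan2d(x1(2), x1(1)), 360)     % ~21.5, the S1 trend
plunge = asind(x1(3))                       % ~25.5, the S1 plunge
x3 = X(:,3); if x3(3) < 0, x3 = -x3; end    % pole to the foliation
dip_azimuth = mod(atan2d(x3(2), x3(1)) + 180, 360)   % ~102.5
dip         = 90 - asind(x3(3))                      % ~74.3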

8.5 STRESSING THE USE OF EIGENANALYSIS

The idea of stress acting in the Earth to create faults and mantle and crustal motions is fundamental to the Earth Sciences. Figure 8.11 shows a common visualization for stress. There is an overall stress vector σ acting on a plane we are interested in, such as a fault. The stress vector is resolved into components perpendicular and parallel to the plane, shown as σn and τ, respectively. These are called normal stress and shear stress. The coordinate axes are arbitrary, so we will use x, y, and z.

Figure 8.11 Diagrams for stress on a plane. The left side shows the plane in yellow and the overall stress vector σ in blue. Red arrows show the normal stress σn and shear stress τ acting on the plane. The right side shows a stress vector broken into components acting on the coordinate planes. Planes are named by their normal vector. The tractions are resolved into one normal stress and two shear stresses acting on a unit area of the coordinate planes.

Stress is a force acting on an area, so the force and the area must be specified. This means it cannot be represented as a simple, three-component vector in a 3-dimensional framework. Instead, we must combine the stress and the area it acts on into a single mathematical object. For doing linear algebra, it is specified by a 2 × 2 matrix in 2D and a 3 × 3 matrix in 3D. In engineering, physics, and parts of the Earth Sciences, these matrices are called Second-order Tensors. The reader may encounter a host of confusing and inconsistent definitions of the term Tensor depending on the source. For this book, a tensor is a physical quantity present regardless of the coordinate system. The authors' favorite example of a tensor is a one-kilogram weight. Its mass is unchanged whether we measure it in terms of 1 kilogram, 1000 grams, 2.2 pounds, or 35.3 ounces. This is a zeroth-order tensor, but we will keep calling it a scalar. Likewise, a speeding car travels at the same velocity whether we express it in kilometers/hour or miles/minute. This is a first-order tensor, but we will call it a vector. The simplest definition is that a Tensor is something that exists as a physical object or quantity independent of the coordinate system but is well specified in whichever coordinate system we select. To follow the traditions and literature of the Earth Sciences, we will use the term Stress Tensor or State of Stress even though it is just a square matrix for a specific coordinate system. Changing the coordinate system changes the scalar, vector, or matrix components but not the actual physical quantity. This is the same as the change of basis vectors we dealt with earlier.

To define the stress tensor in three dimensions, we must specify the coordinate system and nine components of a 3 × 3 matrix, as shown on the right side of Figure 8.11. We can do this by determining three Traction vectors acting on the orthogonal coordinate planes. Traction sounds like it should be a shear stress, but it is not. In mechanics, traction is any force on an area. The typical approach is to show the traction as acting on a unit area of one of the coordinate planes. We name a coordinate plane by the axis normal to it. We name the tractions σ(x), σ(y), and σ(z) as shown in Figure 8.11. Because we are in three dimensions, each traction vector is defined by three components acting in our coordinate system's directions. One acts perpendicular, and the other two act parallel to a coordinate plane. Three coordinate planes defined in three dimensions require defining nine values. We will keep this figure in mind as we discuss the stress tensor in the next section.

8.5.1 Stress Tensor or Matrix

We look at a set of stresses acting along and perpendicular to all the coordinate axes, giving us three components for every direction as seen in Figure 8.11. These components become the entries in a 3 × 3 matrix we call the stress tensor, given in equation 8.58. For the entries in the stress tensor, the first subscript corresponds to the plane acted on and the second to the direction in which it acts. For this reason, σxy is the stress acting on the x plane in the y direction.

$$
\begin{array}{c@{\quad}ccc}
 & \text{x-plane} & \text{y-plane} & \text{z-plane} \\
\text{x-direction} & \sigma_{xx} & \sigma_{yx} & \sigma_{zx} \\
\text{y-direction} & \sigma_{xy} & \sigma_{yy} & \sigma_{zy} \\
\text{z-direction} & \sigma_{xz} & \sigma_{yz} & \sigma_{zz}
\end{array}
\tag{8.58}
$$


So, we see that the stress tensor combines the nine individual components of the three traction vectors. Although we show the stress tensor in terms of the x-y-z coordinate planes, we can use it to find the stress acting on any plane of interest in the system. We commonly say that the stress tensor represents the state of stress or Stress at a Point. Both terms imply that it encapsulates stresses acting in all ways in all directions. The state of stress is independent of the coordinate system. This means the stress state exists for our system without the plane we are interested in, but τ and σn do not exist until we declare a plane of interest. Because accelerations in the Earth are mostly very low or zero, the stresses must be in equilibrium. This means that the cross-diagonal terms in the stress tensor must be equal. In other words, σxy = σyx, σxz = σzx, and σzy = σyz. The stress tensor is a bit simpler because rather than 9 unique components, we only need 6. But much more importantly, this means we can rewrite the stress tensor as a symmetric matrix, as shown in equation 8.59. This Matrix for Real-world Stresses is commonly symmetric positive definite, the best kind of matrix.



$$
\begin{bmatrix}
\sigma_{xx} & \sigma_{xy} & \sigma_{xz} \\
\sigma_{xy} & \sigma_{yy} & \sigma_{yz} \\
\sigma_{xz} & \sigma_{yz} & \sigma_{zz}
\end{bmatrix} = T, \text{ the stress tensor.}
\tag{8.59}
$$
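To make the bookkeeping concrete, here is a minimal MATLAB sketch of using T to get the traction on one plane. The tensor values are taken from equation 8.60 below, the plane normal nhat is an assumption made up for illustration, and the step uses the standard Cauchy relation t = T n, which the text describes in words rather than deriving here.

% Minimal sketch: traction on an assumed plane from a stress tensor T.
T = [ 500   25  200;
       25 1000  100;
      200  100 1500];            % symmetric stress tensor in NED components

nhat = [1; 1; 1] / sqrt(3);      % assumed unit normal to the plane of interest

t       = T * nhat;              % traction vector acting on that plane
sigma_n = nhat' * t;             % normal stress: component of t along nhat
tau     = norm(t - sigma_n * nhat);   % shear stress: in-plane component of t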

What is the next obvious step? Find the eigenvectors and eigenvalues of the stress tensor matrix! This takes a little justification, which we give in the next section. A little more understanding is needed about stress before we can fully appreciate and use the eigenanalysis.


8.5.2 Finding Principal Stresses

Understanding stresses acting in the Earth is a subject richly enhanced by eigenanalysis. In this section, we will set up a problem to be solved by eigenanalysis, perform the diagonalization, and then show how to read the results in terms of a change in basis. We start with the stress tensor, a 3 × 3 symmetric matrix. We then analyze this matrix and recover the eigenvectors and eigenvalues. From these, we interpret the magnitudes and orientations of the Principal Stresses. The principal stresses are like the principal strains in that we reduce the problem to a diagonal matrix and an orthogonal coordinate system. We show a sample stress tensor with the coordinate axes in the NED coordinate system in equation 8.60. Notice that this is a symmetric matrix. Following geologic practice, normal stresses that are compressive inward are considered to have positive signs, and the normal stresses are arrayed along the diagonal of the matrix. The off-diagonal components are the shear stresses, and explaining how to assign them signs is beyond the scope of this chapter.

$$
\begin{bmatrix}
500 & 25 & 200 \\
25 & 1000 & 100 \\
200 & 100 & 1500
\end{bmatrix}.
\tag{8.60}
$$




We perform the eigenanalysis and get the eigenvectors and eigenvalues as shown.

$$
\Lambda = \begin{bmatrix}
1557 & 0 & 0 \\
0 & 981 & 0 \\
0 & 0 & 461
\end{bmatrix}
\quad \text{and} \quad
X = \begin{bmatrix}
-0.188 & -0.024 & 0.982 \\
-0.182 & 0.983 & -0.011 \\
-0.966 & -0.180 & -0.188
\end{bmatrix}.
\tag{8.61}
$$







How do we interpret these results? The principal stresses are the Λ components. Like the principal strains, we order the principal stresses from greatest to least and label their magnitudes σ1, σ2, and σ3. In this case, we have σ1 = 1557, σ2 = 981, and σ3 = 461. Now, we have to associate the principal stresses with directions. We pair each eigenvector with its eigenvalue so that they are numbered x1, x2, and x3. For simplicity, we refer to the coordinate system as 1-2-3, remembering that 1 is the greatest and 3 is the least principal stress direction. Remember, in equation 8.60, we set the stresses in terms of NED coordinates. Hence, the eigenvector matrix represents the orientations of the principal axes or principal stresses in terms of NED components. We show the axes in Figure 8.12. Any scalar multiple of an eigenvector is still an eigenvector, so multiplying by −1 is permitted, and we can plot each axis using either an eigenvector or its negative. Switching the sign of two eigenvectors at a time preserves the handedness of the coordinate system. The principal stresses are roughly N-S for the least principal stress, E-W for the intermediate stress, and nearly vertical for the greatest principal stress. We should have anticipated something like this. The diagonal components are much greater than the off-diagonal ones, and the column vectors in the matrix in equation 8.60 have their largest components in the N, E, and D directions. We can also consider each eigenvector as the normal vector to a principal plane. In other words, we can define the plane formed by the 2 and 3 axes as the 1 principal plane.
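The diagonalization above is easy to reproduce. A minimal MATLAB sketch follows; note that eig() does not guarantee the descending order used in the text, so the sketch sorts the eigenvalues and reorders the eigenvectors to match.

% Sketch: eigenanalysis of the stress tensor in equation 8.60 (NED components).
T = [ 500   25  200;
       25 1000  100;
      200  100 1500];

[X, L] = eig(T);                              % eigenvectors (columns) and eigenvalues
[lambda, order] = sort(diag(L), 'descend');   % order so sigma_1 >= sigma_2 >= sigma_3
X = X(:, order);                              % reorder eigenvectors to match

sigma1 = lambda(1);  sigma2 = lambda(2);  sigma3 = lambda(3);
% Each column of X now holds the N, E, D direction cosines of a principal axis.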


Figure 8.12 Principal axes gained from eigenanalysis of the matrix in equation 8.60. These vectors indicate the orientation of the principal stresses in the NED system. The principal stresses are labeled from the greatest principal stress as 1 to the least as 3. Note that we are looking at the top of a horizontal plane with the D axis downward. In plotting the eigenvectors, we have chosen the axis orientation so that the D component is negative and the vector plots above the plane. We can switch the sign of two eigenvectors and retain a right-handed system.



The eigenvalues represent the principal stresses acting in the direction of each eigenvector and on the principal planes. The tensor for the principal stresses is diagonal, meaning that no shear stresses act on any of the principal planes. Thus, the diagonalization of the stress tensor in equation 8.60 gives us the magnitudes and directions of the principal stresses. In turn, we can view the magnitudes as the lengths of the semiaxes of an ellipse or an ellipsoid representing the state of stress in the system. The directed radii of the stress ellipse or ellipsoid can then be interpreted as the stress vectors acting on planes through the point, as shown in Figure 8.13. Again, the stress state is coordinate system independent, but the stress tensor is defined by components referenced to some system. This means we can rotate the coordinate system however we want. It does not change the stress state, but it does change the tensor components. If we specify a particular plane using its normal vector, we can compute the stresses acting on it. This will give us values for the shear stress and normal stress on the surface. Changing coordinates does not change these because we apply the same changes to all vectors. When making measurements such as stress and strain, it is nearly impossible to set the initial or observation coordinate system to coincide with the principal axes given by the eigenvectors. This means that the ellipses and ellipsoids we use to visualize the stress or strain state will appear to be rotated, as shown for the ellipse


Figure 8.13 Stress ellipse showing radii representing inward-directed stress vectors. The principal stress directions σ1 and σ2 correspond to the long and short axes of the ellipse. The plane of interest is shown as the bold red line. Stresses resolve on the plane to create an inward-directed normal stress and a shear stress oriented in a clockwise manner. The shear stress arises from the longer vectors to one side of the plane normal. The σn and τ components are shown on both sides but grayed out on the left. As the equilibrium condition demands, these components are directed in opposite directions on either side of the plane.


in Figure 8.13. The typical equation we use for these figures will contain cross terms, such as the equation:

$$5x^2 + 10xy + 4y^2 = 1. \tag{8.62}$$

This is a perfectly good equation for an ellipse, but how do we relate the various terms to the axial lengths and orientations? For the case of an ellipse aligned with the coordinate axes, we had no terms that looked like xy. The matrix was diagonal, so relating the eigenvalues to the figure parameters was clear. This is different in equation 8.62 where we have cross terms. This pushes us to discuss quadratic forms, another important topic in linear algebra.

8.6 QUADRATIC FORMS

We have the linear algebra technique of eigenanalysis, which produces eigenvector and eigenvalue matrices. The numerous examples show that these matrices can be interpreted as linear transformations and are easily visualized as circles and ellipses. How do we formulate circles or ellipsoids using vectors and matrices in linear algebra? The answer leads us to the general subject of quadratic forms. Our discussion will focus on two- and three-dimensional examples that we can visualize, but the approach can be generalized to any number of dimensions.


8.6.1 Circles Become Ellipses

The matrix we get in most applications, a symmetric positive definite matrix, has an excellent visualization through eigenanalysis. In two or three dimensions, we can easily look at the action of the matrix as taking an initially circular or spherical object and linearly transforming it into an ellipse or ellipsoid. This visualization technique is ubiquitous in the Earth and Environmental sciences. Doing eigenanalysis gives us the lengths of the semiaxes and their orientation. Determining the absolute measurements needed to establish eigenvalues when studying deformed rocks is exceedingly tricky. It requires much knowledge about the exact mechanisms of deformation and some markers of known original lengths. Such markers are usually impossible or very difficult to find. A standard solution is to use the Ratio of the principal strains rather than worrying about their absolute values. These ratios are easier to visualize and compute. A common approach is to study the deformation of originally circular or spherical markers in a rock, such as the pink feldspar grains shown in Figure 8.14. This is another example of an augen gneiss. Circular Markers are special because their radii are the same length, and all their tangents are perpendicular to a radius. When deformed, they become elliptical. For ellipses, the radii have different lengths, and the tangents make an angle with them. For an ellipse formed from a circular object, we know that all ellipse radii started the same length, and all tangent lines were initially perpendicular to the radii. The visualization clearly shows both Elongation and Angular Deformation. We have the transformation of a circular marker to an elliptical shape, but how do we relate the two? Let’s start with the equations of a circle and an ellipse.

$$\text{Circle: } x^2 + y^2 = r^2 \;\rightarrow\; \frac{x^2}{r^2} + \frac{y^2}{r^2} = 1, \tag{8.63}$$

$$\text{Ellipse: } \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{8.64}$$


In these equations, r is the circle’s radius, a is the long or major semiaxis, and b is the short or minor semiaxis of the ellipse. The area of the circle is πr²; the area of the ellipse is πab. The a and b axes have lengths that are a function of the eigenvalues, and their orientations are parallel to the eigenvectors. This shows why calling the eigenvectors the principal axes is appropriate. We show how to relate the parameters of deformed shapes to the eigenvectors and eigenvalues we get from diagonalization. The setup is in Figure 8.15 for a square and a circle. We have dealt with the case for the square already and will use those results in our discussion here. For the circles, we will assume a radius of 1. The strain is pure shear deformation, giving us a familiar-looking strain matrix similar to what we analyzed earlier.

$$
\begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}
\;\rightarrow\;
\lambda_1 = 2,\; x_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\qquad
\lambda_2 = 0.5,\; x_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\tag{8.65}
$$



Figure 8.14 Picture of augen gneiss, with an undeformed sample for comparison in the inset. Pink grains are K-feldspars, and we use these to monitor deformation. Notice that we can approximate the original pink grains as circular, but they are elliptical in the deformed samples. There is a range in grain sizes in the deformed and undeformed rocks, making volume change estimates impossible. The changes in the shape of grains can be tracked. The yellow ellipse in the main image can be described by its eigenvalues for the strain ratios and eigenvectors for its orientation. This case is simpler than the fossil example at the beginning of the chapter because we are monitoring the deformation of regular shapes. Foliation in the rock is parallel to the long axis of the strain ellipse. Axes for the greatest principal strain S1 and least S2 are shown parallel and perpendicular to foliation.


The result for the square is a rectangle half as tall and twice as wide. The strain does not change the area of the deformed square. This is consistent with the eigenanalysis, which shows no change in area because λ1 × λ2 = 1. We use the same approach for understanding the circle. It starts with radius 1, and we apply the deformation matrix in equation 8.65 to get an ellipse twice as wide as the initial circle but only half as tall. The major semiaxis is length 2, and the minor semiaxis is 0.5. These, in turn, are the same as the eigenvalues. There is no area change because the area of the circle is π, and the area of the ellipse is the same at (2 × 0.5) × π = 1 × π = π.
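This is simple to check numerically. The short MATLAB sketch below is our own illustration: it pushes points on a unit circle through the strain matrix of equation 8.65 and confirms the semiaxes and the unchanged area.

% Sketch: deform a unit circle by the pure shear matrix of equation 8.65.
A = [2 0; 0 0.5];

theta   = linspace(0, 2*pi, 361);
circle  = [cos(theta); sin(theta)];   % points on the unit circle (radius 1)
ellipse = A * circle;                 % deformed marker

semiaxes   = [max(ellipse(1,:)), max(ellipse(2,:))]   % ~[2, 0.5], the eigenvalues
area_ratio = abs(det(A))                              % = 1, so no area change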


Figure 8.15 Applying the same pure-shear deformation to a square and a circle. The strain applied is a doubling of length in the x-direction and a halving of length in the y-direction. This deformation corresponds to the eigenvalues and eigenvectors given in equation 8.65. The ellipse’s long axis would be evident in many rocks, such as the gneiss shown in Figure 8.14, by a pronounced fabric or foliation. Also, note that the horizontal and vertical lines do not rotate, although they stretch. These are the eigenvector directions, and the stretching is by a factor of the eigenvalues. A line not in the eigenvector direction rotates, as shown by the yellow lines starting at 45° to x. The yellow lines rotate by the same angle.

This shows that if we can determine what happens to a circular marker and establish the ratio of long and short semiaxes of the resulting ellipse, we also establish the eigenvalues if we assume no accompanying area change.

$$
\Lambda = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
= \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}
\;\rightarrow\;
\frac{x^2}{2^2} + \frac{y^2}{0.5^2} = 1.
\tag{8.66}
$$


The eigenvalues are exactly the lengths of the major and minor axes of the ellipse. We use these in the equation for an ellipse, where the long-axis coefficient is 1/a² and the short-axis coefficient is 1/b². In this way, the eigenanalysis ultimately defines the strain matrix, its orientation, and how to visualize it as an ellipse. It is common in deformed rocks to be able to establish at least an average orientation for the long and short semiaxes, such as S1 parallel to the foliation shown in Figure 8.14, and thus get the orientation of the eigenvectors by assuming x1 ∥ S1. In two dimensions, the x2 direction is assumed to be parallel to S2, perpendicular to S1. In this case, we get the following:

$$S_1 = 2, \quad S_2 = 0.5. \tag{8.67}$$

The principal strains are equal to the eigenvalues. These results fit with equation 8.66 and Figure 8.15 and show how we can directly interpret the eigenanalysis in terms of the deformation of rocks. After looking at the correspondence of the matrix Λ with the a and b semiaxes of an ellipse, we wonder whether all such eigenvalue matrices represent ellipses. The answer is yes, so long as the eigenvalues are positive, which is always true for symmetric positive definite matrices. This result is important. It means we can present


any two-dimensional case as an ellipse and any three-dimensional case as an ellipsoid, and have some idea of how to start visualizing the actions of matrices in four or more dimensions.

8.6.2 Axes and Coordinate System are Aligned

This section will cover the quadratic forms for ellipses and ellipsoids where the axes of the shape are aligned with those of the coordinate system. For these systems, there is no rotation to and from the eigenbasis, and the eigenvector matrix is the identity matrix, X = I. We will take on cases where the axes are not aligned after this. We start with the standard method to express a circle in quadratic form using vectors and matrices. Using an x-y coordinate system and calling the radius of the circle r, we write the Circle Matrix:

$$
\frac{x^2}{r^2} + \frac{y^2}{r^2}
= \begin{bmatrix} x \\ y \end{bmatrix}^T
\begin{bmatrix} 1/r^2 & 0 \\ 0 & 1/r^2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 1/r^2 & 0 \\ 0 & 1/r^2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= 1.
\tag{8.68}
$$


This gives the familiar equation x² + y² = 1 if r = 1. We can do the same operation with an Ellipse Matrix. In this case, we associate the long axis a with x and the short axis b with y:

$$
\frac{x^2}{a^2} + \frac{y^2}{b^2}
= \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= 1.
\tag{8.69}
$$


A common and informative formulation of this result is given by working with a diagonal matrix A of the reciprocals of a and b to a single power. Because this matrix is diagonal, A^T = A. We also fix a column vector u that combines x and y. Using the matrix and vector in the Quadratic Form, we get:

$$
A = \begin{bmatrix} 1/a & 0 \\ 0 & 1/b \end{bmatrix},
\qquad
\begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 1/a^2 & 0 \\ 0 & 1/b^2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= u^T A^2 u
\tag{8.70}
$$

$$
= \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 1/a & 0 \\ 0 & 1/b \end{bmatrix}^T
\begin{bmatrix} 1/a & 0 \\ 0 & 1/b \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= u^T A^T A u = (Au)^T Au = 1.
\tag{8.71}
$$


Of course, this can be easily extended to three or more dimensions. The nice feature of equation 8.71 is that it expresses the square of the product of a vector and a matrix. It would have been simple to express this as something looking like (Au)² = 1, where Au holds all the details about the sort of figure or shape we are working with and its dimensions. We know that the middle term A^TA in equation 8.71 is a symmetric matrix. The last aspect of this is that the product looks like a magnitude because (Au)^TAu looks like ∥Au∥². And this is also a quadratic equation in Au. How do we incorporate the eigenvalues into this equation? In equation 8.66, we showed an example of what to do. The axis dimensions a and b come from the eigenvalues of the linear transformation matrix. So if a is the long axis, we should associate it with the largest eigenvalue λ1 and b with λ2. For this example, we will


associate the x-axis with λ1 and the y-axis with λ2. From this we can rewrite equation 8.69 as:

$$
D = A^T A
\;\rightarrow\;
u^T D u
= \begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} 1/\lambda_1^2 & 0 \\ 0 & 1/\lambda_2^2 \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
= \frac{x^2}{\lambda_1^2} + \frac{y^2}{\lambda_2^2} = 1.
\tag{8.72}
$$


We are using the matrix D = A^TA for the diagonal matrix of quadratic form. This relates to but differs from the eigenvalue matrix Λ. One thing to be careful with in this version is that we will have diagonal coefficients in the matrix D in reverse order. This is because of the following conditions:

$$
\lambda_1 > \lambda_2 \text{ and } a > b
\;\rightarrow\;
\frac{1}{a^2} < \frac{1}{b^2} \text{ and } \frac{1}{\lambda_1} < \frac{1}{\lambda_2}.
\tag{8.73}
$$


We get the semiaxes of the ellipse from the eigenvalues and the matrix Λ. To use these axial lengths in the equations for quadratic form, however, we must use their squared reciprocals in the coefficients matrix, as we see in equation 8.72. Again, we use D for the matrix of quadratic form and not a symbol involving Λ to avoid confusion. If we go to three dimensions, our result would be:

$$
\begin{bmatrix} x & y & z \end{bmatrix}
\begin{bmatrix}
1/\lambda_1^2 & 0 & 0 \\
0 & 1/\lambda_2^2 & 0 \\
0 & 0 & 1/\lambda_3^2
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \frac{x^2}{\lambda_1^2} + \frac{y^2}{\lambda_2^2} + \frac{z^2}{\lambda_3^2} = 1.
\tag{8.74}
$$


This is the equation for an Ellipsoid Matrix aligned with the axes of the x-y-z coordinate system. Our matrix D grows as we go up in dimensions but is strictly diagonal. When aligned with coordinate axes, the eigenvector matrix of the transformation remains equal to the identity matrix. Of course, we can go up to any number of dimensions and still do the same analysis. In higher dimensions, we could refer to the figure that we define as an n-Ellipsoid.
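As a small numerical check (our own illustration, not from the text), the MATLAB sketch below builds D from the eigenvalues of equation 8.65 and confirms that points on the axis-aligned ellipse of equation 8.66 satisfy u'Du = 1.

% Sketch: evaluate the quadratic form u'*D*u on the ellipse of equation 8.66.
lambda = [2; 0.5];                    % semiaxes a and b from the eigenvalues
D = diag(1 ./ lambda.^2);             % reciprocal squares on the diagonal

t   = linspace(0, 2*pi, 7);
pts = [lambda(1) * cos(t); lambda(2) * sin(t)];   % points on the ellipse

for k = 1:numel(t)
    u = pts(:, k);
    disp(u' * D * u)                  % each value is 1 to round-off
end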

8.6.3 Tilted Ellipsoids and Quadratic Forms

The previous section was the case of ellipsoids whose axes were aligned with the coordinate system. The eigenvalue matrix is then the same as the linear transformation matrix (e.g., principal strain matrix) and is a diagonal matrix. What about the previous cases where the strain matrix is not diagonal but symmetric? We saw these could be diagonalized into the eigenvector and eigenvalue matrices, but the picture that emerges is one of a Tilted Ellipse. How do we take these diagonalizable symmetric matrices into quadratic form? We can view the components of the matrix of quadratic form as the coefficients for the quadratic equation of an ellipse in R² or an ellipsoid in R³. We retain the quadratic form in higher dimensions but have no good term for describing an n-dimensional ellipsoid. Let’s start with the 2 × 2 matrix B in equation 8.75 and show how to specify the symmetric terms. We use B instead of D to emphasize that, in this case, the matrix of quadratic form is not diagonal.

$$
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} b_{11} & b_{12} \\ b_{12} & b_{22} \end{bmatrix},
\text{ as } b_{12} = b_{21}.
\tag{8.75}
$$



Our result from this will be equation 8.76. We will work in a vector space with coordinates u and v.

$$
\begin{bmatrix} u & v \end{bmatrix}
\begin{bmatrix} b_{11} & b_{12} \\ b_{12} & b_{22} \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= b_{11}u^2 + 2b_{12}uv + b_{22}v^2 = 1.
\tag{8.76}
$$


Equation 8.76 is still a quadratic form as all terms have degree 2, like u² or uv. In three-dimensional cases, we get similar but somewhat longer answers.

$$
B = \begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{bmatrix}
\;\rightarrow\;
\begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{12} & b_{22} & b_{23} \\
b_{13} & b_{23} & b_{33}
\end{bmatrix},
\text{ as }
\begin{array}{l} b_{12} = b_{21} \\ b_{13} = b_{31} \\ b_{23} = b_{32} \end{array}
\tag{8.77}
$$

$$
b_{11}u^2 + b_{22}v^2 + b_{33}w^2 + 2b_{12}uv + 2b_{13}uw + 2b_{23}vw = 1.
\tag{8.78}
$$


We can take this to more dimensions, but the equation remains at degree 2. Hence, the quadratic form will generalize to as many dimensions as we want.

8.6.4 Finding the Ellipse

In the cases we have studied so far, we have been given a symmetric strain matrix that we analyze for its eigenvectors and eigenvalues to get the size and orientation of the principal strains. Unfortunately, this is not the case for most analyses we would do. We see rocks such as the augen gneiss shown in Figure 8.14 where we have a lot of elliptical grains, but nowhere is there a strain matrix. Instead, we must establish this matrix ourselves. How would we go about this in a two-dimensional example? We would have to analyze elliptical-shaped markers we think originated as circular features. Let’s start with the ellipse shown in Figure 8.16. We measure it using an arbitrary coordinate system and scale, plotted as the u and v axes. Consider the quadratic form again for an ellipse in equation 8.76. This is one equation in two variables (u and v) containing three constants (b11, b12, and b22). The values for the variables u and v come from points on the ellipse in Figure 8.16, so they are known input parameters. The three constants that relate to the size and tilt of the ellipse are the unknowns. So the degree-2 equation 8.76 is in terms of three unknowns. To solve this system, we must have three distinct points on the ellipse to write three equations. Taking u and v values from Figure 8.16 and inserting them into equation 8.76 gives us three equations.

$$
\begin{aligned}
b_{11}(-4.1)^2 + 2b_{12}(-4.1)(0) + b_{22}(0)^2 &= 1 \\
b_{11}(0)^2 + 2b_{12}(0)(3.2) + b_{22}(3.2)^2 &= 1 \\
b_{11}(-2)^2 + 2b_{12}(-2)(-3.5) + b_{22}(-3.5)^2 &= 1
\end{aligned}
\tag{8.79}
$$


If we compute each of the second-degree terms for u and v, the three equations in 8.79 can be rewritten in matrix and vector form as:

$$
\begin{bmatrix}
16.8 & 0 & 0 \\
0 & 0 & 10.2 \\
4 & 14 & 12.2
\end{bmatrix}
\begin{bmatrix} b_{11} \\ b_{12} \\ b_{22} \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\tag{8.80}
$$





 



Figure 8.16 Tilted ellipse drawn in the u-v coordinate system. The major axis is up to the right, and the minor is up to the left. Visual estimates for point positions used in this section are shown in the figure as red dots with coordinates. Additional points used in the next section are shown as black dots. Establishing the location of points on an ellipse in a local coordinate system is a common exercise in the Earth Sciences. The ellipse was constructed using a = 5, b = 3, and the major axis inclined at 30° counterclockwise from u.



The u-v points were roughly estimated for the ellipse in Figure 8.16. This matrix is full rank, so we can solve for b11, b12, and b22 by inverting the coefficient matrix and multiplying it times the vector of ones. Once we have these values, we can substitute them into equation 8.75 and obtain B, the matrix of quadratic form.

$$
\begin{bmatrix} b_{11} \\ b_{12} \\ b_{22} \end{bmatrix}
= \begin{bmatrix} 0.0595 \\ -0.0310 \\ 0.0980 \end{bmatrix}
\;\rightarrow\;
\begin{bmatrix} 0.0595 & -0.0310 \\ -0.0310 & 0.0980 \end{bmatrix} = B.
\tag{8.81}
$$








Now we can do eigenanalysis to get the eigenvector and eigenvalue matrices of B.

$$
X_B = \begin{bmatrix} 0.8739 & -0.4860 \\ 0.4860 & 0.8739 \end{bmatrix},
\qquad
\Lambda_B = \begin{bmatrix} 0.0423 & 0 \\ 0 & 0.1152 \end{bmatrix} = D.
\tag{8.82}
$$


We have subscripted X and Λ because these matrices apply to the matrix B, the quadratic-form matrix, which is symmetric but not diagonal. This is undoubtedly a linear transformation, one related to, but different from, the strain matrices we used in the previous sections on eigenanalysis. The matrices B and D specify the quadratic form. Getting to the eigenvectors and eigenvalues of the underlying linear transformation is simple. First, the eigenvector matrices are the same: the eigenvectors of the quadratic form are the same as for the strain matrix, XB = X. To get the eigenvalue matrix, we take the square roots of the reciprocals of the diagonal elements of D (equal to ΛB). We can now compute a strain matrix.

$$
X = \begin{bmatrix} 0.8739 & -0.4860 \\ 0.4860 & 0.8739 \end{bmatrix},
\quad
\Lambda = \begin{bmatrix} 4.86 & 0 \\ 0 & 2.95 \end{bmatrix},
\quad
A = X\Lambda X^T = \begin{bmatrix} 4.41 & 0.811 \\ 0.811 & 3.40 \end{bmatrix}.
\tag{8.83}
$$


With these three points, we get semi-axes of lengths a = 4.86 and b = 2.95. We view the eigenvector matrix X as a rotation matrix and can determine the inclination of the major axis using inverse trigonometric functions to get 29.1° or R(29.1).

$$
X = \begin{bmatrix} 0.8739 & -0.4860 \\ 0.4860 & 0.8739 \end{bmatrix}
= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
\;\rightarrow\; \theta = 29.1^\circ.
\tag{8.84}
$$

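The whole three-point fit is compact in MATLAB. The sketch below is our own illustration; the point coordinates are the rough estimates used above, and the variable names are arbitrary.

% Sketch: fit the quadratic form B from three points (equations 8.79-8.84),
% then recover semiaxes and tilt of the ellipse.
pts = [-4.1  0; 0  3.2; -2 -3.5];    % rough (u, v) picks from Figure 8.16
u = pts(:,1);  v = pts(:,2);

M = [u.^2, 2*u.*v, v.^2];            % rows of b11*u^2 + 2*b12*u*v + b22*v^2
b = M \ ones(3,1);                   % solve for [b11; b12; b22]

B = [b(1) b(2); b(2) b(3)];          % matrix of quadratic form
[XB, D] = eig(B);                    % eigenvectors and eigenvalues of B

semiaxes = 1 ./ sqrt(diag(D));       % ~[4.86, 2.95] (ordering may differ)
theta = atan2d(XB(2,1), XB(1,1));    % inclination of one principal axis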

For this example, we set the starting ellipse shown in Figure 8.16 with a = 5 and b = 3. The major axis is inclined 30° to u. This gives us the following transformation matrix A, matrix of quadratic form B, eigenvector matrix X, and eigenvalue matrix Λ.

$$
A = \begin{bmatrix} 4.5 & 0.87 \\ 0.87 & 3.5 \end{bmatrix},
\qquad
B = \begin{bmatrix} 0.0578 & -0.0308 \\ -0.0308 & 0.0933 \end{bmatrix},
\tag{8.85}
$$

$$
X = \begin{bmatrix} 0.866 & -0.5 \\ 0.5 & 0.866 \end{bmatrix},
\quad \text{and} \quad
\Lambda = \begin{bmatrix} 5 & 0 \\ 0 & 3 \end{bmatrix}.
\tag{8.86}
$$


We can see that the matrices in equations 8.85 and 8.86 are reasonably close to those in equations 8.81 and 8.83. Those latter values were computed using estimated points on the ellipse, not ones derived from the values in 8.85. This reminds us why the eigenvector and eigenvalue matrices are so important and enlightening. It is hard to tell much just from looking at the A matrix other than that it is symmetric positive definite, indicating that it is diagonalizable. The linear transformation it accomplishes is much less obvious.

8.6.5 Finding the Ellipse with more Points Using Least Squares

The previous example had just the right number of points on the ellipse to get an answer. Looking at Figure 8.14, it is easy to see that strain analysis will use more than just three measurements. The irregularities in the grains make it necessary to take more measurements on each grain and to collect ellipse parameters from several grains. We will look back at the ellipse in Figure 8.16 and add three more estimated points to the figure. How do we solve this now that we have 6 total points? The tool we go to is least squares fitting. We will set this up for the three points we have already analyzed and the three remaining ones shown as black dots in Figure 8.16. This will give us a 6 × 3 matrix that we will call A.

$$
A\hat{x} = b
\;\rightarrow\;
\begin{bmatrix}
16.8 & 0 & 0 \\
0 & 0 & 10.2 \\
4 & 14 & 12.2 \\
16 & 24 & 9 \\
4 & -9.2 & 5.3 \\
19.4 & 31.6 & 13
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.
\tag{8.87}
$$



 


We use the Normal Equation to get a solution.

$$
A^T A\hat{x} = A^T b
\;\rightarrow\;
\begin{bmatrix}
947 & 1016 & 466 \\
1016 & 1855 & 749 \\
466 & 749 & 531
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
= \begin{bmatrix} 60.2 \\ 60.4 \\ 49.7 \end{bmatrix}.
\tag{8.88}
$$











$$
\text{for six points: }
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}
= \begin{bmatrix} 0.0556 \\ -0.0370 \\ 0.0971 \end{bmatrix}.
\tag{8.89}
$$








Now we take these values into a matrix for the coefficients in what we have been calling the matrix B, the matrix that holds the coefficients of the quadratic form u^TBu. This gives us the following:

$$
B = \begin{bmatrix} 0.0556 & -0.0370 \\ -0.0370 & 0.0971 \end{bmatrix},
\text{ which gives }
D = \begin{bmatrix} 0.034 & 0 \\ 0 & 0.119 \end{bmatrix},
\tag{8.90}
$$

$$
A = \begin{bmatrix} 4.78 & 1.10 \\ 1.10 & 3.55 \end{bmatrix},
\quad
X = \begin{bmatrix} 0.863 & -0.505 \\ 0.505 & 0.863 \end{bmatrix},
\quad \text{and} \quad
\Lambda = \begin{bmatrix} 5.4 & 0 \\ 0 & 2.9 \end{bmatrix}.
\tag{8.91}
$$


With six points using least squares, we get semi-axes of lengths 5.4 and 2.9 and an inclination of 30.3◦ . This is slightly different from what we got with three points but still in reasonable agreement with the values used to set up the problem. The positions of all 6 points were estimated from the figure.
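The six-point least-squares fit can be sketched in a few MATLAB lines. This is our own illustration: the last three point coordinates are back-calculated from the rows of equation 8.87, so treat them as assumptions rather than the authors' picks.

% Sketch: least-squares fit of the quadratic form with six points (eqs 8.87-8.91).
pts = [-4.1 0; 0 3.2; -2 -3.5; 4 3; -2 2.3; 4.4 3.6];
u = pts(:,1);  v = pts(:,2);

A = [u.^2, 2*u.*v, v.^2];            % 6 x 3 design matrix
x = (A' * A) \ (A' * ones(6,1));     % normal equations; A \ ones(6,1) also works

B = [x(1) x(2); x(2) x(3)];
[X, D] = eig(B);
semiaxes = 1 ./ sqrt(diag(D))        % should come out near 5.4 and 2.9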

8.7 SUMMARY

This chapter covered the operation of Change of Basis. Changing the basis or coordinate system is done all the time in the Earth Sciences. This can include going from a geographic to a UTM coordinate grid, aligning with the fabric in a metamorphic rock, or describing a paleomagnetic field in a rock sample. Changing the basis means that we go from using one set of axes with an associated scale to another set with different orientations, scales, or both. The basis vectors are set by the type of measurement and the easiest coordinate orientations. A great example of a natural basis is the NED system we use in field geology. Geologists view observations as going down into the Earth, so we use a positive unit vector pointing downward. Since most geographic systems use north as the starting cardinal direction, this defines east as the second principal direction, followed by down. The components of vectors are written in terms of the basis vectors of a system. This is the way we have been working with vectors and matrices throughout this text. If we start with a system we will call α, with basis vectors α1, α2, ..., αn, then any vector is written as:

$$u_\alpha = a_1\alpha_1 + a_2\alpha_2 + \cdots + a_n\alpha_n. \tag{8.92}$$


In this, a1, a2, ..., an are the vector components of u in the α basis. We use a subscript on a vector, in this case α, to indicate which basis the vector is using. If we want to change to another basis, which we will call β, we need to know where each of the starting basis vectors goes in the new system. We will write this as [αi]β and have a representation of each of the αi in terms of the β basis vectors.

$$[\alpha_i]_\beta = b_{1i}\beta_1 + b_{2i}\beta_2 + \cdots + b_{ni}\beta_n. \tag{8.93}$$



Once this is done, we write out the vector in the new coordinate system as follows:

$$u_\beta = a_1[\alpha_1]_\beta + a_2[\alpha_2]_\beta + \cdots + a_n[\alpha_n]_\beta. \tag{8.94}$$


This gives us our change of basis matrix and a way to express the vector uα as uβ.

$$
u_\beta = B_{\alpha\to\beta}\, u_\alpha =
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1n} \\
b_{21} & b_{22} & \cdots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nn}
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.
\tag{8.95}
$$







Our change of basis matrix is built out of the bij terms in equation 8.93. We can call this the change of basis matrix B, and we indicate its operation with a simple subscript, Bα→β. The components of the n × n matrix B multiply the components of the n × 1 vector u in the α basis to create a new n × 1 vector with components in the β basis. In this chapter, we saw several examples of changing basis in Earth Science applications. However, the motivating application for our work was going in and out of the eigenbasis. We saw that if the matrices X and Λ are the eigenvector and eigenvalue matrices, respectively, then we convert our system components into the eigenbasis by left multiplying by X^T and convert back using X. Importantly, the matrix X holds the eigenvectors specified as components in the basis of our starting system. The matrix Λ performs its operations in the eigenbasis. The other operation we covered was using quadratic forms. This was based on the representation of features such as regular curves or shapes specified by second-order polynomials. We perform eigendecomposition on the coefficients of the polynomial by putting them into a matrix. This means that we can view the eigenvalues as the principal components or axes of an n-ellipsoid and the eigenvectors as the orientation of the figure.
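The eigenbasis round trip described above is easy to verify numerically. The MATLAB sketch below is our own illustration using the strain matrix from equation 8.83; because that matrix is symmetric, eig() returns an orthonormal eigenvector matrix, so X' plays the role of the change of basis into the eigenbasis.

% Sketch: move a vector into the eigenbasis, act with Lambda, and rotate back.
A = [4.41 0.811; 0.811 3.40];       % symmetric strain matrix from equation 8.83
[X, L] = eig(A);

u  = [1; 2];                        % components in the starting basis
ue = X' * u;                        % components in the eigenbasis
Au = X * (L * ue);                  % act with Lambda there, then change basis back

disp(norm(Au - A * u))              % ~0: same result as applying A directly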

8.8 EXERCISES

1.

There are three common geographic coordinate systems used for orientations in the Earth Sciences, shown in Figure 8.17. First is the North-East-Down or NED system we have adopted in this book. The second is East-North-Up, or ENU, which is very common in geology texts and programs. Finally, the North-West-Up or NWU is commonly used in mobile apps and devices. All three systems are right-handed, but NED has a positive angle in map view measured clockwise; the others are counterclockwise rotations on a map. The clockwise rotation is also the most familiar to beginning users as everybody is taught to go from north 90° clockwise to east, 180° clockwise to south, and 270° clockwise to west.

(i) Write the change of basis matrix for going from ENU to NED and NWU to NED. Start by writing the equations expressing the new basis vectors in the old system.




Figure 8.17 Geographic coordinate axes for the NED, NWU, and ENU geographic systems. Basis vectors are shown.

(ii) Show two equivalent paths for going from NWU to NED. The first will directly use one of the matrices you defined. The other is to go from NWU to ENU and then ENU to NED.
(iii) In the middle frame of Figure 8.17, we show a vector pointing downward and intersecting the N-E plane in the southwest quadrant. Make a rough estimate of the direction cosines of this vector and transform them into the other two bases.

2.

In Chapter 1, we saw a diagram of an oblique-slip reverse fault that strikes north-south and dips 60° to the west. It has 60 m of dip-slip and 40 m of left-lateral strike-slip. A setup figure is repeated in Figure 8.18. We set up the coordinate system for the motion using NED coordinates and then defined our vectors relative to this system. We are going to define three new coordinate systems. The first system will have its first basis vector parallel to the slip vector, its second perpendicular to slip but in the fault plane, and the last perpendicular to these in a right-hand sense. The second system will start with the up-dip direction as the first basis vector, and the second will be along strike, with the third oriented to give a right-handed system. The last system will not be orthogonal but will have axes in the south, down-dip, and east directions.
(i) Draw a diagram showing the three systems and the NED axes.
(ii) Write a change of basis matrix to go from components in the NED system to each of the other three.
(iii) What are the implications of having a non-orthogonal basis?


Figure 8.18 Reverse fault dipping 60° west and striking due north, with 60 m of up-dip slip and 40 m of left slip shown on NED axes.

3.



In Chapters 1 and 3, we worked on a geochemical example that we will revisit now. We posed this as Ax = b with the following equations.

$$
\begin{aligned}
K &= 1b + 1f + 0c = 5 \\
Al &= 1b + 1f + 3c = 10 \\
Fe &= 3b + 0f + 1c = 8
\end{aligned}
\;\text{, giving}\;
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 3 \\ 3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b \\ f \\ c \end{bmatrix}
= \begin{bmatrix} 5 \\ 10 \\ 8 \end{bmatrix}.
\tag{8.96}
$$

 






The problem was set up using the elements for the main basis vectors, and we used this example to contrast row and column views. We see this in Figure 8.19. The diagram in Figure 8.19 shows the elemental axes as being orthogonal, and we have defined the problem in terms of the unit vectors $\hat{K}$, $\hat{Al}$, and $\hat{Fe}$. The setup is done using an element basis. The diagram also shows axes for the minerals biotite, feldspar, and chlorite. These are not orthogonal in this basis, and the unit basis vectors $\hat{b}$, $\hat{f}$, and $\hat{c}$ have different lengths in this view. We can now re-examine this diagram and system in terms of the basis vectors. We can see in Figure 8.19 that our data vector can be put in terms of either minerals or elements. The rock vector is unchanged regardless of which we use, so the elements and minerals are two different possible bases for this problem. We will call the element basis C for chemistry and the mineral basis M. Again, C has the ordered basis vectors $\hat{K}$, $\hat{Al}$, and $\hat{Fe}$. Basis M has ordered basis vectors $\hat{b}$, $\hat{f}$, and $\hat{c}$. We want to let C be the starting basis and M be the target basis, but it is hard to imagine how to express single elements as a sum of several minerals. We know how to go from a mineral to a sum of elements. For that reason, we will set M as the starting basis and C as the target basis.


Figure 8.19 The geochemical problem from Chapter 1. Default axes are given in amounts of an element. Also shown are the proportions of K, Al, and Fe in the minerals biotite (b), feldspar (f), and chlorite (c). The blue arrow shows the composition of a rock sample in terms of the elements. We set up and solved the linear algebra problem to determine the combination of the minerals needed to reach the rock’s chemical composition. Those numbers are shown as well.



This will give us $B_{M\to C}$. Here is how we write biotite in the C basis.

$$
\hat{b} = 1\hat{K} + 1\hat{Al} + 3\hat{Fe}
= 1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
+ 1\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
+ 3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \frac{1}{\sqrt{11}}\begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}_C.
\tag{8.97}
$$

 

 

 


We can write similar representations for $\hat{f}$ and $\hat{c}$. This will give us the matrix $B_{M\to C}$. We need only invert this to get the matrix $B_{C\to M}$. Now, address the following exercises.
(i) Calculate the two change of basis matrices, $B_{M\to C}$ and then $B_{C\to M}$.
(ii) Explain the meaning of the terms in $B_{C\to M}$. You should have gotten some negative-signed terms.
(iii) Compute the rock analysis of K = 5, Al = 10, and Fe = 8. Does this agree with the result you got previously?
(iv) The matrix $B_{C\to M}$ looks the same as the A⁻¹ matrix we computed in the last section of Chapter 3. Explain how a change of basis relates to solving a set of coupled linear equations.

CHAPTER 9

Singular Value Decomposition

Linear algebra is one of the younger fields in mathematics. Although it dates back centuries, it has received significant renewed attention since the advent of readily available and inexpensive computing after the 1960s. This chapter discusses one of the newer foci of linear algebra, the process of Singular Value Decomposition or SVD. The SVD is a powerful tool! It can quickly and accurately solve many of the linear algebra applications presented in the previous chapters. Those include solving for x in Ax = b, calculating the particular solution and any special solutions when A is square or rectangular, when it’s full rank or singular, when it has more rows than columns or more columns than rows, and even when it looks like it’s full rank but isn’t quite what we’d hoped for. Another benefit of solving for x with SVD is that the calculation is numerically stable (more about this below). Last but not least, calculating the SVD also gives us orthonormal bases for each of the four subspaces pictured in Figure 4.15 and all the insight into A and Ax = b that they provide. Collectively, these features make SVD a natural fit for solving Ax = b and understanding the solutions. We begin with a geometric view of the SVD, paralleling our treatment of the eigenvalues and eigenvectors in Chapter 7. We then pick apart the matrix factors of the SVD and explore how to use them to solve Ax = b, revisiting the tomography problem first presented in Chapter 1.

9.1 WHAT IS SINGULAR VALUE DECOMPOSITION OR SVD?

We start by looking at the same set of shells we saw in the chapter on Eigenvectors and Eigenvalues. We did an exercise where we identified a special direction for the strain matrix in equation 9.1 and Figure 9.1. We noted that the horizontal axis of the upper left shell is unrotated during simple shear, and this is an eigenvector of this matrix. In the case of pure shear, we observed two such directions. We focus on the middle right shell in the undeformed and deformed array in Figure 9.1 and notice that the long and short axis/ridge lines remain perpendicular. These vectors are ones we can discover and understand through Singular Value Decomposition or, as it



Figure 9.1 Left shows an array of brachiopods. Each one is rotated 10° counterclockwise relative to the one before. Right shows the array sheared 30°. Notice that, although elongated, the middle right shell’s hinge line and middle rib have approximately the same orthogonal shape before and after deformation.

Figure 9.2 The left part shows a set of lines perpendicular before and after deformation that correspond to similar ornaments on the shell. Blue arrows in the middle diagram correspond to the SVD analysis of the simple shear strain, with red bars corresponding to the amount of stretching. These are shown only for the greatest singular value direction.

is usually called, SVD.

$$
\text{Simple shear strain} \;\rightarrow\; \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}
\tag{9.1}
$$


We know that the transformation in equation 9.1 gives only one repeated eigenvalue and eigenvector, and this made sense in what we know about eigenvectors and eigenvalues. Only one direction remains unrotated by the deformation. But it is now apparent in this example, as we see in Figure 9.2, that two directions in the initial shell are reoriented but stay perpendicular before and after deformation. In other words, two directions retain their angular relationship but are rotated by the deformation. They change lengths and orientations but not their relative angles. These must also be special directions! The directions are called the Singular Vectors, and their stretching factors are called the Singular Values. The special vectors in the starting case are always an orthonormal basis (independent and orthogonal, unit length vectors). We can find a similar set of special orthonormal vectors


in the results matrix. So, our result is that we have two orthogonal directions in the original image that rotate to two orthogonal directions in the deformed image. We can write this as an equation that governs the process of singular value decomposition (SVD), given in equation 9.2. This is another example of a diagonalization of a matrix.

$$
A = \underbrace{U}_{\text{rotation}}\; \underbrace{\Sigma}_{\text{stretching}}\; \underbrace{V^T}_{\text{rotation}}.
\tag{9.2}
$$

In this equation we call the matrix Σ the Singular Values Matrix, with the singular values on its diagonal; the columns of U are the Left Singular Vectors, and the columns of V are the Right Singular Vectors. Our first goal in SVD is to decompose A into its three factorization matrices A = UΣV^T. We will use the simple shear example in equation 9.1 to explain the parts of equation 9.2.

$$
U = \begin{bmatrix} 0.788 & -0.615 \\ 0.615 & 0.788 \end{bmatrix},
\qquad
\Sigma = \begin{bmatrix} 1.281 & 0 \\ 0 & 0.781 \end{bmatrix},
\qquad
V^T = \begin{bmatrix} 0.615 & 0.788 \\ -0.788 & 0.615 \end{bmatrix}.
\tag{9.3}
$$


The matrices U and V^T are rotation matrices. The matrix U corresponds to a 38° counterclockwise rotation and gives the approximate orientation of the deformed shell. The right singular vectors in V^T reflect a 52° rotation, which is the orientation of the long axis of the undeformed shell. The matrix Σ is equivalent to the pure shear matrices we have seen before. So, the result of the SVD here is what every structural geologist would state: a simple shear deformation is equivalent to a pure shear deformation combined with a rotation. As noted in a previous chapter, a 2D rotation matrix with a positive upper-right term corresponds to a clockwise rotation, and one with a negative upper-right term corresponds to a counterclockwise rotation.

MATLAB® moment: Using the svd() function

The SVD function in MATLAB® takes in a matrix and returns three matrices that form the singular value decomposition.

>> [U, S, V] = svd(A)

gives U for the left singular vectors, V for the right singular vectors, and S for the singular values matrix. Note that to perform the multiplication of the U, S, and V matrices to recover A, we have to use the transpose of the matrix V output by MATLAB®. In other words, the multiplication would be:

>> A = U * S * V'

For the simple shear example in equation 9.1, we get $V = \begin{bmatrix} 0.615 & -0.788 \\ 0.788 & 0.615 \end{bmatrix}$, which is a +52° counterclockwise rotation. This makes V^T = R(−52°).

We must understand that the SVD is related to the simple shear matrix in equation 9.1 and not to the particular shell we picked. The middle right shell in Figure 9.1


Figure 9.3 Three examples of different starting orientations for the brachiopod. The left side is before simple shear deformation, and the right side is after. The top row corresponds to no rotation of the shell, the middle is 40°, and the bottom row is 80°. The blue lines are the singular vectors. They are perpendicular and fixed in the starting orientation for V^T on the left side and in the final deformed orientation for U on the right side. Note that the ridges marked in the left parts correspond to the same but rotated ridges on the right-hand side. In this way, we see that the deformation and the singular vectors and singular values correspond to the deformation matrix and not to the shape or orientation of the shell.

was nice because it was marked by the hinge of the shell and the ridge line perpendicular to the hinge, as we show in Figure 9.2. This means the shell stretched but did not appear distorted in an angular sense like all the other shells. This makes it look the same as a shell that undergoes pure shear where the hinge line is parallel to the stretching direction. We can find similar markers in all the shells, as shown in Figure 9.3. In this figure, the blue lines are perpendicular, all with the same orientations as the special directions. The only difference is that the initially perpendicular ridges they mark are not the same as in Figure 9.2 and change with the orientation of the starting shell. Thus, the rotation and stretching stay the same, but the markers used are not fixed but defined by the shell’s orientation. Just as we saw for eigenanalysis that the eigenvectors and eigenvalues were found using the characteristic equation for square matrices, for SVD we have a similar but more complicated operation: finding two sets of singular vectors and a set of singular values associated with any matrix, regardless of its shape. We will confine our work to the real numbers but will entertain the broad range of matrices we encountered in the chapter about the solvability of Ax = b based on matrix rank and shape.

9.2 SVD FOR ANY MATRIX

We got the result for a 2 × 2 matrix. We can also get eigenvectors and eigenvalues for this matrix, but they are not a full set, and we get one repeated eigenvector and eigenvalue. Why was this? With eigendecomposition, we are looking for vectors that


end at the same orientation that they start. In the case of simple shear, there is just one such direction: the direction of shearing. Otherwise, all vectors rotate. For this reason, eigenanalysis provides limited information about many matrices. We know that we can get the eigenvectors and eigenvalues of a symmetric matrix, that its eigenvalues are real, and that its eigenvectors are orthogonal. For other square matrices, we may have no luck in decomposition. What if we wanted the eigenvectors and eigenvalues of a non-square matrix, say one that was 1000 × 3? This cannot be done because eigendecomposition only works on square matrices. It still seems like there should be some process that will give us information about the size and orientation of the vectors in the matrix. There is, and it is singular value decomposition. SVD will work on any matrix. Once again, our main equation is A = UΣV^T. In this, A is any matrix, and U and V^T are two special orthonormal matrices corresponding to the left and the right singular vectors. The foundation of SVD is that for any matrix, we can always find U and V^T as well as a unique matrix Σ. From this equation, we can derive some properties by which we can solve for U, V^T, and Σ. We do this by isolating U or V^T and not solving for both at once. We can manipulate the results into equations 9.4 and 9.5 that we can solve.

$$
A^TA = (U\Sigma V^T)^T U\Sigma V^T = V\Sigma^T (U^TU)\Sigma V^T = V\Sigma^T I\,\Sigma V^T
\;\rightarrow\; A^TA = V\Sigma^2 V^T,
\tag{9.4}
$$

$$
AA^T = U\Sigma V^T (U\Sigma V^T)^T = U\Sigma (V^TV)\Sigma^T U^T = U\Sigma I\,\Sigma^T U^T
\;\rightarrow\; AA^T = U\Sigma^2 U^T.
\tag{9.5}
$$


We went through a tortuous process to get to final equations containing A^TA and AA^T, something we know how to compute. Both matrices can be called a Gram Matrix. Note that we are given that Σ is diagonal, meaning that Σ² is also diagonal. Our main result is that A^TA or AA^T = (orthonormal matrix) × (diagonal matrix) × (transposed orthonormal matrix). It is important to recognize that the transpose of an orthonormal matrix is the same as its inverse, so in the previous equations, we could substitute U⁻¹ and V⁻¹ for U^T and V^T. In addition, both A^TA and AA^T are symmetric, positive definite matrices (or possibly semidefinite). This means that the final equations in 9.4 and 9.5 are identical to the eigendecomposition equation B = XΛX⁻¹, where we substitute A^TA or AA^T for B, Σ² for Λ, and V or U for X. We have reduced SVD to a process we have done before, namely eigendecomposition! SVD works for any matrix because A^TA and AA^T are symmetric positive semi-definite, so they can be decomposed into eigenvalues and eigenvectors.
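A minimal MATLAB sketch of this idea follows; it is our own illustration for the simple shear matrix of equation 9.1. It recovers the singular values from the eigenvalues of A'A and compares them with the built-in svd() result.

% Sketch: singular values from the Gram matrix A'*A, checked against svd().
A = [1 0.5; 0 1];

[V, S2] = eig(A' * A);                 % A'A = V * Sigma^2 * V'
[s2, order] = sort(diag(S2), 'descend');
V = V(:, order);
sigma = sqrt(s2);                      % singular values are square roots

[U, S, Vsvd] = svd(A);                 % built-in SVD for comparison
disp([sigma, diag(S)])                 % both columns ~[1.281; 0.781]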

9.2.1 Back to Simple Shear

We will look again at the problem that started the chapter: simple shear.

$$
\text{Simple shear strain} \;\rightarrow\; A = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}.
\tag{9.6}
$$



Figure 9.4 Left side shows starting unit box and shell. This is followed by ca. 52° clockwise rotation. In the third frame, the box is elongated parallel to the horizontal axis and shortened parallel to the vertical. This corresponds to ribs on the shell after the rotation. The deformed box undergoes a 38° counterclockwise rotation in the last frame.

We get results for A from this.

$$
A^TA = \begin{bmatrix} 1 & 0 \\ 0.5 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1.25 \end{bmatrix},
\tag{9.7}
$$

$$
AA^T = \begin{bmatrix} 1 & 0.5 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0.5 & 1 \end{bmatrix}
= \begin{bmatrix} 1.25 & 0.5 \\ 0.5 & 1 \end{bmatrix}.
\tag{9.8}
$$

Now we can find the eigenvectors and eigenvalues for equations 9.7 and 9.8. "

#

1 0.5 A A= 0.5 1.25 T

"

#

1.25 0.5 AA = 0.5 1 T

"

"

"

"

A = U ΣV

(9.9)

#

0.788 −0.615 → X=U = 0.615 0.788 #

1.64 0 Λ = Σ2 = 0 0.61 T

#

0.615 −0.788 → X=V = 0.788 0.615

1.281 0 → Σ= 0 0.781

#"

0.788 −0.615 = 0.615 0.788

"

1.281 0 0 0.781

#"

(9.10)

#

(9.11) #

0.615 0.788 . −0.788 0.615


Our solution equals the SVD we showed in equation 9.3. What do these matrices mean? In explaining SVD in equation 9.2, we noted that Σ is a stretching matrix and that U and V T are rotation matrices. How does this apply to the case of simple shear? Perhaps the easiest way to see this is by following the deformation of a box and shell as they go through the various transformations. We will set up the box as the matrix shown in Figure 9.4, with the coordinates of its corners given by the column vectors of the matrix, and follow the deformation. This is


equivalent to following what happens to the four corners of the box during deformation. We set the shell so that the hinge line is horizontal. This looks very similar to the eigenanalysis progression we saw in previous chapters. There, we had only one rotation matrix, and the diagonalization gave A = XΛX^T. The set of operations in the eigenanalysis rotates and counter-rotates by the same amount, leaving the eigenvectors unrotated after transformation. For SVD, the two rotations are different and capture perpendicular lines that may rotate but retain the same orthogonal orientation after transformation. Calculating the SVD is different and, in some ways, less restrictive than eigenanalysis.
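The sequence in Figure 9.4 can be followed numerically. The MATLAB sketch below is our own illustration: it pushes the box corners through the three SVD factors one at a time and checks that the result matches applying A directly.

% Sketch: box corners through the SVD factors of the simple shear matrix.
A = [1 0.5; 0 1];
[U, S, V] = svd(A);

box = [0 1 1 0;        % x-coordinates of the four corners (columns)
       0 0 1 1];       % y-coordinates

step1 = V' * box;      % rotate (~52 degrees clockwise here)
step2 = S * step1;     % stretch along the axes by the singular values
step3 = U * step2;     % rotate back (~38 degrees counterclockwise)

disp(norm(step3 - A * box))   % ~0: the three steps reproduce A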

9.2.2 What is the Connection Between Λ and Σ?

The reader is probably wondering when to use eigenvalues, when to use singular values, and what their differences are. Of course, we only have singular values at our disposal for rectangular matrices. What about square matrices? In the chapter on eigenvectors and eigenvalues, we used these to infer the orientation of the strain ellipse and its size for pure shear deformation. In this chapter, we have used the singular values and vectors for the same purpose but on simple shear. But we also stated that Λ = Σ², so how can the magnitude of strain be both the eigenvalues and the square root of the eigenvalues? The difference is that for eigenvalues and vectors, we are dealing with a single square matrix A, but for singular values, we are working with A^TA and AA^T. If A is symmetric positive definite, we can diagonalize it and write A = XΛX⁻¹ with an orthogonal X. Let’s suppose we are doing SVD and use the same matrix A in A^TA to get the singular values and the right singular vectors. Using the equation A = XΛX⁻¹ will lead to the result shown in equation 9.13.

$$
A^TA = (X\Lambda X^{-1})^T X\Lambda X^{-1} = X\Lambda X^{-1} X\Lambda X^{-1} = X\Lambda^2 X^{-1}.
\tag{9.13}
$$


We would reach an equivalent result using AA^T. This means if we are getting the singular values from SVD, then we are computing the eigenvectors and eigenvalues of either A^TA or AA^T. We get something corresponding to Λ² that we call Σ² to avoid confusion. If the components of A had units of meters, for example, then A^TA and AA^T will have units of meters². We then take the square root of the matrix Σ² to have the correct power and units associated with a single power of A. When we find eigenvalues and eigenvectors for just A, we get plain, single-powered Λ as we did in the previous chapters. To make sure this is a clear result, we will redo the pure shear analysis with both eigendecomposition and SVD. In a previous chapter, we got the eigenvalue and eigenvector matrices for a pure shear matrix, shown in 9.14.

$$
A = \begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix}
\;\rightarrow\;
\Lambda = \begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix},
\quad
X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\tag{9.14}
$$



Now, we will do SVD for the same pure shear matrix. We compute both A^TA and AA^T in equation 9.15, and their eigenvectors and eigenvalues in equations 9.16 and 9.17. We see that the value of Σ from the SVD is equal to Λ from the eigendecomposition we got in 9.14.

$$
A^TA = AA^T = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix},
\tag{9.15}
$$

$$
U = V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\qquad
\Sigma^2 = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix},
\tag{9.16}
$$

$$
\Sigma = \begin{bmatrix} 1.41 & 0 \\ 0 & 0.71 \end{bmatrix}.
\tag{9.17}
$$

So we conclude that for A alone, we can use the eigenvectors and eigenvalues, provided that A is symmetric positive definite—they provide the same information as the SVD. This is undoubtedly the case for pure shear deformation. If A is not symmetric positive definite, as in the simple shear case, or is rectangular, then we cannot use eigenanalysis to yield the same insights, but we can use SVD. In detail, this works because both ATA and AAT are symmetric positive definite for any realvalued starting matrix A. 9.2.3

What are the Sizes and Ranks of the Matrices in SVD?

We will start with a specific example and then move to a more general result. Let’s start with a matrix A that is 20 × 3. We can easily compute the sizes of U and V T . The matrix U is associated with ATA, so it comes from a 3 × 20 matrix multiplying a 20 × 3 one. The result is a 3 × 3 square symmetric matrix. Likewise, V T is associated with AAT , so a 20 × 3 matrix multiplying a 3 × 20 one. The result is a 20 × 20 square matrix. The matrix Σ sits between these, so it must be multiplied by a 20 × 20 matrix and multiply itself times a 3 × 3 matrix. This only works if the matrix Σ is 20 × 3. So the result is matrices multiplied in the order 20 × 20, 20 × 3, 3 × 3, which produces a final matrix of 20 × 3, the same as A. Now, we will consider the equation A = U ΣV T and start with A being a m × n matrix. We then will get the result shown in 9.18. A = U m×n m×m

Σ m×n

VT . n×n

(9.18)

It is also straightforward to figure out the rank of the matrices and a bit more of what they look like in most cases. We have not specified the relative magnitude of m or n. Suppose m > n, and the matrix A has rank r = n. This is a common situation in least squares analysis and means that the matrix V T is smaller than U . It will be full rank since r = n. This means the matrix Σ, which will be rectangular diagonal, has size m × n. Rectangular Diagonal means that the diagonal terms in the first n rows will be positive, but the rest will be zero. The matrix U is m × m, but has rank only r = n. Below, we give an example of an orientation matrix with 5 rows

276 ■ Linear Algebra for Earth Scientists

and 3 columns. 

l1 l 2  l3  l4 l5

m1 m2 m3 m4 m5

n1  σ1 0 0    0 σ | | | | | n2  2 0       n3  = u1 u2 u3 u4 u5   0 0 σ3      | | | | | 0 0 0 n4  0 0 0 n5 





v1 v2 v3

  .

(9.19)

For this example, A is 5 × 3, U is 5 × 5, Σ is 5 × 3, and V is 3 × 3. The matrix Σ is rectangular diagonal. If n > m, then matrix U will be smaller, and the matrix A is at most of rank m. If r = m, we will get m singular values in the matrix Σ. Then, rather than having rows of zeros as in equation 9.19, it will have columns of zeros on the right-hand side of Σ. In either case, the number of nonzero singular values is the same as the rank of A.
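To see these sizes and ranks in practice, here is a minimal MATLAB® sketch (ours, for illustration) using a random 5 × 3 matrix, the same shape as the orientation-matrix example in equation 9.19.

% A rectangular matrix with more rows than columns (m = 5, n = 3)
A = rand(5, 3);

[U, S, V] = svd(A);

size(U)   % 5 x 5
size(S)   % 5 x 3, rectangular diagonal
size(V)   % 3 x 3

% The rank equals the number of nonzero singular values
r = nnz(diag(S) > 1e-12)   % 3 for a random, full-rank A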

9.3 WHEN DO SIMPLE AND PURE SHEAR PRODUCE THE SAME STRAIN ELLIPSE?

Now, we take up a statement by structural geologists made earlier, that a simple shear could be viewed as a pure shear plus a rotation. Using SVD, we can look at the strain ellipse for simple shear, something we could not have done before. Let's set up the problem with the strains given by variables and see when we get the same overall strain for pure and simple shear. We show this in Figure 9.5. When we say the same strain, we mean that the strain ellipses have the same aspect ratio. The total strain will not be the same since simple shear will cause a rotation. We do this problem for the 2D case, recognizing that we could extend it into as many dimensions as we like. The measure of the strain ellipse is given by the singular values, much in the way we interpreted it before as represented by the eigenvalues. Let's start by setting up each problem so there is no change in volume. This is easy for the simple shear case: as long as the diagonal terms are 1, the determinant is 1 and area is preserved. For pure shear, we need to set the shortening to the reciprocal of the extension. Calling the pure shear matrix C for Coaxial deformation and the simple shear matrix N for Non-Coaxial, we get the following in equations 9.20 and 9.21. In addition, for the strain to have no area change in 2D, the matrix of singular values Σ must be of the form shown in 9.22.

C = \begin{bmatrix} c & 0 \\ 0 & 1/c \end{bmatrix}, \quad c = \text{stretching factor},    (9.20)

N = \begin{bmatrix} 1 & \gamma \\ 0 & 1 \end{bmatrix}, \quad \gamma = \text{shearing factor},    (9.21)

\Sigma = \begin{bmatrix} d & 0 \\ 0 & 1/d \end{bmatrix}, \quad d = \text{long axis},    (9.22)

\Sigma^2 = \begin{bmatrix} d^2 & 0 \\ 0 & 1/d^2 \end{bmatrix} \rightarrow \text{eigenvalues for } C^TC \text{ and } N^TN.    (9.23)


Figure 9.5 Examples of pure shear (top) and simple shear (bottom) strain ellipses. The middle diagram is the starting circle and box, with unit diameter. The semiaxis lengths are in the same ratio for the pure shear and simple shear ellipses and correspond to the singular values given in equation 9.27. Yellow lines show the pure shear strain axes, the same as the eigenvectors for the pure shear matrix in equation 9.24. Red lines show the orientation of the singular vectors for simple shear. The center diagram has these coincident with the right singular vectors, and the lower diagram shows the orientation of the left singular vectors. These come from the matrices in equation 9.27. The lines in the middle diagram are rotated 58.3° clockwise by Vᵀ, stretched, and then rotated 31.7° counterclockwise by U: U = R(31.7°), Vᵀ = R(−58.3°).

We can now start working toward the solution. In both cases, we will use SVD to solve the problem. For C, we will compute CᵀC and find its eigenvectors and values.

C^TC = \begin{bmatrix} c^2 & 0 \\ 0 & 1/c^2 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} c^2 & 0 \\ 0 & 1/c^2 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.    (9.24)

For N, we will compute NᵀN and find its eigenvectors and values.

N^TN = \begin{bmatrix} 1 & \gamma \\ \gamma & 1 + \gamma^2 \end{bmatrix}.    (9.25)


Finding the eigenvalues and vectors for the simple shear case is more involved. We only quote the result for the eigenvalues in 9.26, as this is all we need to match the pure and simple shear cases.

\lambda_1, \lambda_2 = \frac{(2 + \gamma^2) \pm \sqrt{(2 + \gamma^2)^2 - 4}}{2} = \frac{(2 + \gamma^2) \pm \gamma\sqrt{\gamma^2 + 4}}{2}.    (9.26)

So the solution for the simple shear case is not, well, very simple. On the other hand, the pure shear case is quite easy to solve. The easiest way to approach the problem is to establish a value for γ and compute the eigenvalues. We then take the square root of these values to get Σ and the lengths of the long and short axes of the strain ellipse, the quantities d and 1/d in equation 9.22. These values will simply be equal to c and 1/c. Let's explore a simple case where we set γ to 1. We substitute this into equation 9.26 and get the following results for Σ. We then determine the singular vectors.

U = \begin{bmatrix} 0.851 & -0.526 \\ 0.526 & 0.851 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 1.618 & 0 \\ 0 & 0.618 \end{bmatrix}, \quad V = \begin{bmatrix} 0.526 & -0.851 \\ 0.851 & 0.526 \end{bmatrix}.    (9.27)

This gives a rotation angle of about 58.3° between the horizontal axis and the long axis of the simple shear ellipse for the right singular vectors in V T and 31.7° between the horizontal axis and the left singular vectors in U .
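The numbers in equation 9.27 and the two angles are easy to reproduce; the sketch below is our own, not the book's, and uses the MATLAB® function atan2d(), which returns angles in degrees. Because singular vectors are only defined up to sign, the angles you get may differ by 180°.

% Simple shear matrix with gamma = 1 (equation 9.21)
N = [1 1; 0 1];

[U, Sigma, V] = svd(N);

diag(Sigma)   % 1.618 and 0.618, the semiaxes of the strain ellipse

% Angles between the horizontal axis and the first singular vectors
angleV = atan2d(V(2,1), V(1,1))   % about 58.3 degrees (up to sign)
angleU = atan2d(U(2,1), U(1,1))   % about 31.7 degrees (up to sign)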

9.4 USING SVD TO SOLVE Ax = b

We can use the insight into the SVD gleaned from its geometric interpretation to help us find all the solutions to Ax = b in a simple and numerically stable way. Calculating U, Σ, and V for any real matrix A is arduous, as we found out, so throughout the following, we'll take advantage of MATLAB®'s SVD capabilities. See the MATLAB® moment at the end of Section 9.2.1 for a refresher. We will sprinkle some MATLAB® commands into the text as well, although we have avoided this practice up to now. Hand calculations are tedious, and the authors would use MATLAB® to compute the SVD of even a 2 × 2 matrix. To demonstrate the power of SVD, we'll use the tomography problem introduced in Exercise 4 of Chapter 1, revisited in Exercise 5 of Chapter 3, solved in several permutations in Exercise 1 of Chapter 4, and re-revisited in Section 5.3.2 as a least-squares problem. To recap Exercise 4 of Chapter 1, Figure 1.12 is reproduced here as Figure 9.6. We set up equations for travel time measurements of seismic waves across four square blocks to find out something about the seismic velocity structure of those blocks. We quickly realized that, to formulate linear equations for the travel times, we needed to use the reciprocal of the velocity v, known as slowness v⁻¹, along with the distances the seismic waves traversed across the blocks. For the problem set up in Figure 1.12, we wrote the linear system Ax = b as

\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} v_1^{-1} \\ v_2^{-1} \\ v_3^{-1} \\ v_4^{-1} \end{bmatrix} = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix}.    (9.28)


Figure 9.6 Setup for exercise on tomography. Assume each block is 1 km on a side.

The matrix A contains distances in kilometers, the unknown vector x contains the slowness terms in seconds per kilometer, and the data vector b contains the travel times in seconds.

9.4.1 The Four Subspaces via SVD

The matrix A in equation 9.28 looks like it ought to yield a solution for x: it has four rows, four columns, and it describes ray paths that traverse all four blocks. Unfortunately, as we learned in Exercise 4.1E, there is no unique solution, and A is singular with rank r = 3. Using our linear algebra tools based on elimination, we can better understand the action of A in Ax = b by calculating its four subspaces (see Figure 4.15). SVD offers a quick and easy way to find those four subspaces. Using A from equation 9.28, we get:

A = U\Sigma V^T \rightarrow U = \begin{bmatrix} -1/2 & 1/2 & -1/2 & -1/2 \\ -1/2 & -1/2 & 1/2 & -1/2 \\ -1/2 & -1/2 & -1/2 & 1/2 \\ -1/2 & 1/2 & 1/2 & 1/2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & \sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},

V = \begin{bmatrix} -1/2 & \sqrt{2}/2 & 0 & -1/2 \\ -1/2 & 0 & -\sqrt{2}/2 & 1/2 \\ -1/2 & 0 & \sqrt{2}/2 & 1/2 \\ -1/2 & -\sqrt{2}/2 & 0 & -1/2 \end{bmatrix}.    (9.29)

(Singular vectors are only determined up to sign, and the pair of columns that share the repeated singular value √2 only up to a rotation within their plane, so the columns your software returns for u2, u3, v2, and v3 may differ from these; any valid choice spans the same subspaces.)


Numbering the column vectors of U and V and the singular values σ inside Σ,

U = \begin{bmatrix} | & | & | & | \\ u_1 & u_2 & u_3 & u_4 \\ | & | & | & | \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad V = \begin{bmatrix} | & | & | & | \\ v_1 & v_2 & v_3 & v_4 \\ | & | & | & | \end{bmatrix}.    (9.30)

As expected, the 4 × 4 matrix A yields three 4 × 4 factors U, Σ, and V. The rank of A is r = 3, so we get three positive singular values along the diagonal of Σ ordered from largest to smallest. The fourth row and column of Σ are all zeros. This information tells us where to look for each of the four subspaces introduced in Chapter 4. The first three columns of U are orthogonal and span the column space of A. These three columns belong to the three singular values in Σ. The fourth column u4 of U spans the left null space of A. Recall from Chapter 5 that the left null space is the subspace where we find the error vector in projection and least squares problems. As expected, the left null space is orthogonal to the column space (see Figure 4.15) and has dimension m − r. This follows from the property of U that all its columns are orthogonal, so its fourth column is orthogonal to the subspace spanned by its first three columns. You can check that the fourth column of U spans N(Aᵀ) by multiplying its transpose by A, which results in a zero row vector:

u_4^T A = \begin{bmatrix} -1/2 & -1/2 & 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \end{bmatrix}.    (9.31)

We continue to use the observation that A has rank r = 3. The first three columns of V, which also belong to the three singular values in Σ, are orthogonal and span the row space of A. Recall that V becomes Vᵀ in the SVD factorization, so the first three columns of V will become the first three rows of Vᵀ. The last column of V, v4, belongs to the fourth row and column of Σ that's full of zeros, and it defines the null space of A. As expected, the null space in our example has dimension n − r = 1. Also, N(A) is perpendicular to the row space, consistent with the property that the columns of V are orthogonal and the first three columns span the row space. You can check that the fourth column of V spans the null space:

A v_4 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.    (9.32)

To restate the SVD's properties we've explored, the rank r of an m × n matrix A is the number of nonzero singular values that appear along the diagonal of Σ. The SVD factors A = UΣVᵀ, where U has dimensions m × m, Σ has dimensions m × n, and V has dimensions n × n. The first r columns of U span the column space of A, and the last m − r columns of U span the left null space. The first r columns of V span the row space of A, and the last n − r columns of V span the null space. In other words, we get the rank and all four subspaces of A with one quick matrix factorization. The SVD is a true linear algebra power tool. But wait, there's more!
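The four subspaces can be read directly off the factors that svd() returns. The sketch below is our own illustration for the tomography matrix A in equation 9.28; the rank is found by counting singular values above a small tolerance.

A = [1 1 0 0; 0 0 1 1; 0 1 0 1; 1 0 1 0];

[U, S, V] = svd(A);
s = diag(S);
r = nnz(s > 1e-10);          % rank of A, here r = 3

colspace  = U(:, 1:r);       % spans the column space C(A)
leftnull  = U(:, r+1:end);   % spans the left null space N(A')
rowspace  = V(:, 1:r);       % spans the row space C(A')
nullspace = V(:, r+1:end);   % spans the null space N(A)

A  * nullspace               % zero vector, as in equation 9.32
A' * leftnull                % zero vector, the transpose of equation 9.31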


9.4.2 Four Easy Steps to Get from the SVD to the Inverse or Pseudoinverse

Armed with the null space we found in V, we have xs and are halfway to finding the full solution to Ax = b → x = xp + xs. Now we need to find the particular solution xp. In Chapter 3, one way that we solved for x was to determine the inverse of A so that we could calculate x = A⁻¹b. Unfortunately, the A in equation 9.28 is singular and doesn't have an inverse. It would be really handy to have something like an inverse for cases where A is rectangular or A⁻¹ doesn't exist. Many matrices don't have an inverse, but all real matrices have a Pseudoinverse. The pseudoinverse discussed here is the Moore-Penrose Inverse, denoted A⁺. When A is invertible, then A⁺ = A⁻¹. When A is not invertible, the pseudoinverse A⁺ doesn't act like the inverse A⁻¹ we've met in the past. For instance,

A^{-1}A = AA^{-1} = I \quad \text{but} \quad A^{+}A \neq AA^{+} \neq I,    (9.33)

where A⁻¹ is the inverse of an invertible matrix and A⁺ is the pseudoinverse of a noninvertible matrix. However, it is true for any A that

AA^{+}A = A \quad \text{and} \quad A^{+}AA^{+} = A^{+}.    (9.34)

Helpfully, we can use A⁺ to solve for xp when A⁻¹ doesn't exist. Specifically,

x_p = A^{+} b.    (9.35)

Now, we need to calculate A⁺, and we'll have our complete solution. It turns out we've calculated a pseudoinverse before for least squares solutions (Chapter 5) in the case of overdetermined systems where m > n. In equations 5.27 and 5.29, we used a tall and skinny m > n full rank matrix A to formulate the so-called normal equation,

\hat{x} = (A^TA)^{-1} A^T b.    (9.36)

In the process, we actually calculated our first pseudoinverse:

A^{+} = (A^TA)^{-1} A^T \quad \text{such that} \quad \hat{x} = A^{+} b.    (9.37)

For our m > n least squares examples with rank r = n, the matrix A was rectangular and didn't have an inverse, but we got to an n × n matrix that we could invert and multiply by b to calculate x̂. We're looking for a more general solution that works when the rank r < n. Can we use the SVD? For a matrix factorization A = BCD, you could use the matrix property (BC)⁻¹ = C⁻¹B⁻¹ to figure out that, where A is invertible,

A = BCD \rightarrow A^{-1} = (BCD)^{-1} = D^{-1} C^{-1} B^{-1}.    (9.38)

We can apply this method to the SVD factorization, with A invertible as a starting point. We get

A = U\Sigma V^T \rightarrow A^{-1} = (U\Sigma V^T)^{-1} = V^{-T} \Sigma^{-1} U^{-1}.    (9.39)


In Section 6.4, we learned that orthonormal matrices Q are special in that their transpose acts like their inverse: QᵀQ = I (see equation 6.52). Because both U and V are orthonormal, no matter what type of matrix A is, they follow this rule:

U^{-1} = U^T \quad \text{and} \quad (V^T)^{-1} = V.    (9.40)

The last step is to find the inverse of the diagonal matrix Σ. If A is invertible with rank = r, then the inverse of Σ is calculated simply by taking the reciprocal of each of the r singular values on the diagonal,

\Sigma^{-1} = \begin{bmatrix} 1/\sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1/\sigma_r \end{bmatrix}.    (9.41)

Check for yourself that Σ⁻¹Σ = I when Σ is square and full rank. Modifying the expression in equation 9.39, we now know how to calculate an inverse using the SVD in four easy steps:

1. Calculate the inverse of Vᵀ by just using V,
2. calculate the inverse of Σ by taking the reciprocal of its diagonal elements,
3. calculate the inverse of U by taking its transpose, and
4. multiply the three matrices → A⁻¹ = VΣ⁻¹Uᵀ.

We know how to do each of these four calculations with just the basic linear algebra skills we developed in Chapter 2. And we will use the MATLAB® function svd() to do most of our work. But what if A is not square or full rank? We can adapt the approach earlier to matrices A that are rectangular or square but not invertible. For an m × n matrix, the SVD factor U is m × m, the factor V is n × n, and both matrices are orthonormal. They are both square, and their transposes act as their inverses, so no problems there. The factor Σ, though, is m × n, and even a square Σ can have zeros on the diagonal. The key insight here is that, for rank = r, there are r singular values along the diagonal of Σ, and the first r columns of U and V belong to these singular values and span the column and row spaces of A. We want just these components to calculate our pseudoinverse. If the matrices Ur and Vr are the first r columns of U and V, and the matrix Σr⁻¹ is the inverse of the r × r block of Σ that holds the singular values, then we can write an equation for the pseudoinverse that works for any A,

A^{+} = V_r \Sigma_r^{-1} U_r^T.    (9.42)
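Equation 9.42 translates almost line for line into MATLAB®. The sketch below (ours, for illustration) builds the pseudoinverse of the tomography matrix from the truncated SVD factors and checks it against the built-in pinv().

A = [1 1 0 0; 0 0 1 1; 0 1 0 1; 1 0 1 0];

[U, S, V] = svd(A);
s = diag(S);
r = nnz(s > 1e-10);          % rank, here 3

Ur = U(:, 1:r);
Vr = V(:, 1:r);
Sr = diag(s(1:r));           % r x r and invertible

Aplus = Vr / Sr * Ur';       % equation 9.42: A+ = Vr * inv(Sr) * Ur'

norm(Aplus - pinv(A))        % essentially zero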

9.4.3 Calculating a Particular Solution xp and the Full Solution to Ax = b

Now that we’ve calculated the pseudoinverse A+ , we have a clear path to our particular solution, xp , using equation 9.35. First, let’s confirm that the size of the A+ is correct. For an m × n matrix A, equation 9.42 for the pseudoinverse yields


the matrix Vr that is n × r, the matrix Σr that is r × r, and the matrix Urᵀ that is r × m. The inner dimensions of each factor agree, and the outer dimensions give the dimensions of A⁺ as n × m, the same dimensions as Aᵀ. Therefore, we can safely multiply the m × 1 vector b by A⁺ to get the n × 1 vector xp. For the 4 × 4 matrix A from equation 9.28 that describes Figure 9.6, we will get a 4 × 4 matrix A⁺. In the examples, we will use a very simple vector b of measured travel times (in seconds).

A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 2 \end{bmatrix},    (9.43)

A^{+} = \begin{bmatrix} 3/8 & -1/8 & -1/8 & 3/8 \\ 3/8 & -1/8 & 3/8 & -1/8 \\ -1/8 & 3/8 & -1/8 & 3/8 \\ -1/8 & 3/8 & 3/8 & -1/8 \end{bmatrix}, \quad A^{+} b = x_p = \begin{bmatrix} v_1^{-1} \\ v_2^{-1} \\ v_3^{-1} \\ v_4^{-1} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.    (9.44)

By inspection, we can see that if each square in Figure 9.6 has v = v⁻¹ = 1 kilometer per second (velocity) or second per kilometer (slowness), then each travel time will be two seconds; this particular solution indeed solves Ax = b. For the complete solution, we add an arbitrary constant(s) multiplied by each vector in the null space, which we discovered is the column v4 of the matrix V in equation 9.29 in Section 9.4.1:

x = x_p + x_s = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + c \begin{bmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{bmatrix}.    (9.45)
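As a numerical check (ours, not the book's), the lines below confirm that xp from the pseudoinverse reproduces b exactly and that adding any multiple of the null space vector leaves Ax unchanged.

A  = [1 1 0 0; 0 0 1 1; 0 1 0 1; 1 0 1 0];
b  = [2; 2; 2; 2];

xp = pinv(A) * b;             % particular solution, [1; 1; 1; 1]
xs = [-1/2; 1/2; 1/2; -1/2];  % null space vector v4 from equation 9.29

A * xp                        % returns b
A * (xp + 3.7 * xs)           % also returns b, for any multiple of xs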

More generally, the full solution to Ax = b is

x = V_r \Sigma_r^{-1} U_r^T b + V_{r+} c.    (9.46)

The matrix V_{r+}, if it exists, holds the null space basis found in columns r + 1 to n of V, and c is an arbitrary column vector with n − r components. For full-rank overdetermined problems where m > n there is generally no exact solution; the null space contains only the zero vector, and the x in equation 9.46 is the least squares solution. We can now see that many velocity structures are possible from the complete solution to our tomography problem in equation 9.45. For instance, the x vectors in equation 9.47 all solve Ax = b for the b in equation 9.43.

x = \begin{bmatrix} 0.9 \\ 1.1 \\ 1.1 \\ 0.9 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1.1 \\ 0.9 \\ 0.9 \\ 1.1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 0.1 \\ 1.9 \\ 1.9 \\ 0.1 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 1.9 \\ 0.1 \\ 0.1 \\ 1.9 \end{bmatrix}.    (9.47)


So what makes the pseudoinverse 'pick out' the particular xp in equation 9.44 when many others would work? The answer is that the xp calculated using the Moore-Penrose pseudoinverse has the minimum vector norm of all possible choices of xp. Our xp has ∥xp∥ = 2, while the others in equation 9.47 have norms of 2.01, 2.01, 2.69, and 2.69, respectively. Minimum norm solutions are useful when you expect the elements of x to be, collectively, as close to the zero vector as possible.

MATLAB® moment—Calculating a pseudoinverse with pinv()
The Moore-Penrose pseudoinverse function in MATLAB® takes in a matrix A and returns the pseudoinverse A⁺. The use of the function looks like:
>>Aplus = pinv(A)
where inside of MATLAB®, pinv() is calculated using the MATLAB® function svd() and equation 9.42. With the pinv() function, we can check the inequalities in equation 9.33. Using the tomography problem example, for instance, >>pinv(A)*A does not equal >>A*pinv(A), and neither equals >>eye(4). However, per equation 9.34, >>A*pinv(A)*A does return A.

Finally, a practical consideration. Our tomography problem involves measuring the travel times for four seismic wave ray paths. If any of the slowness parameters in xi = vi⁻¹ were zero, then the seismic wave would have 1/0 = ∞ velocity through that block, which is impossible. Also, negative slowness and velocity make no physical sense in this context. Therefore, we can limit the elements of x to be positive real numbers greater than zero. This limits our value of c in the general solution (equation 9.45) to be −2 < c < 2, which keeps all elements xi > 0. We saw a similar situation for the three-fault problem in Section 4.4.9, What's hiding in the null space? In that problem, we limited our solutions using the knowledge that the faults could move in only one direction, greatly restricting the possible contributions from the null space. In practice, solutions to natural physical systems require extra steps of examination and thought in addition to the theoretical considerations this book focuses upon. We'll continue with more practical considerations in the next section, focusing on how well our computers can handle the numbers we've arranged in matrices and vectors, especially in the presence of the ubiquitous measurement errors that result from doing real science.
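A short loop (added by us as an illustration) makes the −2 < c < 2 limit concrete by checking which members of the solution family in equation 9.45 keep every slowness positive.

xp = [1; 1; 1; 1];
xs = [-1/2; 1/2; 1/2; -1/2];

for c = [-2.5 -1.9 0 1.9 2.5]
    x = xp + c * xs;
    fprintf('c = %5.1f   all slownesses positive: %d\n', c, all(x > 0));
end
% Only the values with |c| < 2 keep every element of x greater than zero.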

9.5 CONDITION NUMBERS, OR, WHEN GOOD MATRICES GO BAD

Figure 9.7 Our previous tomography problem, but with the seismic source at the start of the first ray path moved backward 1 mm after recent maintenance. The top left block, colored blue, is now 1.000001 km wide.

The tomography problem set up in Section 9.4 explores travel times for seismic waves moving from sources to receivers across a 2 km × 2 km grid broken into four 1 km square blocks. These measurements are reflected in the design matrix A in equation 9.28. Consider a hypothetical scenario: after servicing the seismic source to the left of block one, you measure that it has moved backward exactly 1 mm. The top left block is now slightly wider, but all other lengths are unchanged, as illustrated in Figure 9.7. Our new design matrix for the tomography problem is as follows:

A' = \begin{bmatrix} 1.000001 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}.    (9.48)

We can recognize the consequences of this small shift to our system by performing an SVD on A′.

U = \begin{bmatrix} -0.500 & 0.653 & -0.271 & -0.500 \\ -0.500 & -0.653 & 0.271 & -0.500 \\ -0.500 & -0.271 & -0.653 & 0.500 \\ -0.500 & 0.271 & 0.653 & 0.500 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 2.000 & 0 & 0 & 0 \\ 0 & 1.414 & 0 & 0 \\ 0 & 0 & 1.414 & 0 \\ 0 & 0 & 0 & 2.500 \times 10^{-7} \end{bmatrix},

V = \begin{bmatrix} -0.500 & 0.653 & 0.271 & -0.500 \\ -0.500 & 0.271 & -0.653 & 0.500 \\ -0.500 & -0.271 & 0.653 & 0.500 \\ -0.500 & -0.653 & -0.271 & -0.500 \end{bmatrix}.    (9.49)

We have rounded values to three decimal places. Note that the first three singular values in Σ are very close to those of A, but a new fourth singular value has appeared! We can confirm this seeming stroke of good luck in MATLAB® , where rank(A) = 4. With just one millimeter of displacement to one source, we should now be able to solve our system for unique seismic velocities in each of the four blocks. That was easy! Or was it?


If we use our particular solution from equation 9.44, that all velocities and slownesses are 1 (km/s or s/km, respectively), we can Forward Model the vector b.

A'x = b \rightarrow \begin{bmatrix} 1.000001 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2.000001 \\ 2 \\ 2 \\ 2 \end{bmatrix}.    (9.50)

Forward modeling involves assuming values for the unknown parameter vector x and using them to predict the measured data vector b. We did something similar when we calculated xp instead of b in equation 9.44 using A⁺b = xp. Our first question is, can we use our new perturbed A and predicted b to recover the same x, a vector of ones, that we assumed in our forward model? Let's try several ways of approaching the problem.

9.5.1 Testing all of Our Approaches to Solving Ax = b

We start by using Gaussian elimination, as introduced in Chapter 3. We could do this by implementing the elimination algorithm on an augmented matrix [A|b] as demonstrated in Section 3.2.2, or with MATLAB®'s backslash operator \, which depends on an LU factorization for most square matrices. Both perform the same arithmetic operations and arrive at the same conclusion, stated in equation 9.51. The other approach from Chapter 3, calculating the inverse using MATLAB®'s inv() function, also uses an LU factorization but requires more calculations and therefore takes longer to execute. So far, so good, though you might notice that the x we calculate differs from a vector of integer ones by about 1 × 10⁻¹⁰. That's a pretty small difference and nothing to worry about too much.

x = \begin{bmatrix} 1.00000000011 \\ 0.99999999989 \\ 0.99999999989 \\ 1.00000000011 \end{bmatrix}, \quad x - \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.1 \\ -1.1 \\ -1.1 \\ 1.1 \end{bmatrix} \times 10^{-10}.    (9.51)

We could also solve the system using the normal equation introduced in Chapter 5, making use of MATLAB®'s inv() function.

(A^TA)^{-1}A^T b = x = \begin{bmatrix} 1.0032 \\ 1.0017 \\ 1.0007 \\ 1.0003 \end{bmatrix}, \quad x - \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3.2 \\ 1.7 \\ 0.7 \\ 0.3 \end{bmatrix} \times 10^{-3}.    (9.52)

Using the normal equation seems to be problematic: we're not correctly calculating x even though we are using the A and b from the consistent set of equations in (9.50). The results are similar if we use MATLAB®'s backslash for the inverse: (A'*A)\(A'*b) yields errors in all components of x of about 10⁻³. We were better off calculating more accurate values for x using elimination/LU-based approaches.
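The experiment behind equations 9.51 and 9.52 is easy to rerun. The sketch below is our own reconstruction of it; the exact digits depend on your MATLAB® version, but the pattern, with backslash and pinv() near 10⁻¹⁰ and the normal equation near 10⁻³, should persist.

Ap = [1.000001 1 0 0; 0 0 1 1; 0 1 0 1; 1 0 1 0];   % perturbed design matrix, equation 9.48
b  = Ap * ones(4, 1);                                % forward-modeled travel times, equation 9.50

x_backslash = Ap \ b;                     % elimination / LU
x_normal    = inv(Ap' * Ap) * (Ap' * b);  % normal equation
x_pinv      = pinv(Ap) * b;               % SVD-based pseudoinverse

norm(x_backslash - 1)   % on the order of 1e-10
norm(x_normal    - 1)   % on the order of 1e-3
norm(x_pinv      - 1)   % on the order of 1e-10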


For our last method, let's try using the pseudoinverse based on the SVD in equation 9.49.

V\Sigma^{-1}U^T b = x = \begin{bmatrix} 1.00000000046 \\ 1.00000000000 \\ 0.99999999953 \\ 1.00000000023 \end{bmatrix}, \quad x - \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4.6 \\ 0.0 \\ -4.7 \\ 2.3 \end{bmatrix} \times 10^{-10}.    (9.53)

The same solution can be reached using the SVD-based MATLAB® function pinv(), coded as pinv(A)*b, where A and b are as earlier. To sum up the results of our experiment, our new matrix A is indeed full rank, and we can recover a unique x using its forward-modeled b. Our elimination/LU- and SVD-based solutions most accurately determine x, and the normal equation does not. This observation on its own is good enough reason to recommend MATLAB®'s backslash (also known as matrix left division) operator as a first choice when solving Ax = b, here and for all full rank problems. MATLAB® uses a QR factorization when you use backslash on rectangular matrices, another highly accurate algorithm (see Section 6.5.2). In second place is using inv() or pinv(), depending on whether the matrix A is or is not invertible. One reason these approaches come in second is that they take longer to compute. Another is that, in general, the x that you calculate with the elimination-based backslash yields matrix-vector products Ax that are closer to b, in the sense that they generally have a smaller residual norm ∥b − Ax∥. In last place is the normal equation, which we gleaned great insight from in Chapter 5 but which can produce less numerically accurate solutions than our other approaches.

9.5.2 Small Errors and Big Levers

We’re not yet finished with practical concerns for our modified tomography problem, where we added 1 millimeter to the width of the first block. In Sections 9.4.1 through 9.4.3, our measured travel times were all exactly 2 seconds, and in Section 9.5, we’re now considering a measured travel time that is 2 seconds plus an additional microsecond. We must entertain the possibility, though, that we cannot make measurements of travel times with infinite precision. Even the best measuring devices will impart some small measurement errors. We use the term error here not to mean a mistake but instead to represent the difference between a measured value and a true value, which might be due to any number of factors, including instrument noise. Let’s consider that our measured travel times have some small measurement errors and assess the effect of those errors on our solutions and resulting geologic interpretations. We add a very small error of just 0.1 microseconds to the first term in b, bringing the first travel time to 2.0000011 seconds. We then use our preferred backslash operator in MATLAB® to solve for a new x, 1.000001  0    0 1 

1 0 1 0

0 1 0 1

0 2.0000011 1.100      1  2  0.900 \ = .  0.900 1  2 0 2 1.100 









On no—our slownesses (and seismic velocities) changed by ten percent!

(9.54)


More experimentation shows that adding or subtracting that additional 0.1 microseconds from any other element of b will have a similar effect. Changing an element of b by about 10⁻⁷ will change elements of the resulting solution x by about 10⁻¹. Our solution is extraordinarily sensitive to small amounts of noise in the measured travel times! Perturbing an element of b by 10⁻⁶, adding or subtracting a microsecond to any measured travel time, results in changes to x on the order of 10⁰ = 1, and a perturbation of 10⁻⁸ to an element of b will change elements of x by 10⁻². The pattern is that a perturbation of b results in a roughly 10⁶ times larger perturbation in x. Our original matrix A in this chapter was singular, and the matrix we made by adding 1 millimeter to a length is now full rank, but still quite close to singular. That proximity shows up in the high sensitivity of the solution x to small errors in the measured data b. The term for this is that A is Poorly Conditioned or Ill-Conditioned. There are lots of ways to define poorly conditioned or its opposite, Well Conditioned, and the concept appears in several mathematical contexts. But here, we mean the sensitivity of the solution to our inverse problem. An example of a well-conditioned tomography problem can be found in Chapter 5, Figure 5.10 and equation 5.65. The Condition Number quantifies and puts an upper bound on the effect that small changes in b will have on x, or vice versa. The condition number of the matrix A, denoted κ(A), is the ratio of its largest to its smallest singular value. More precisely, the condition number sets a bound on normalized perturbations to x and b,

\kappa(A) = \frac{\sigma_1}{\sigma_n}, \qquad \frac{\lVert \delta x \rVert}{\lVert x \rVert} \le \kappa(A)\, \frac{\lVert \delta b \rVert}{\lVert b \rVert}.    (9.55)

MATLAB® moment—Computing the condition number
Getting the condition number of a matrix is simple in MATLAB® using the cond() function. We will get the following for our matrix A′ in equation 9.48.
>>kappa=cond(A) gives κ(A) = 8.0 × 10⁶.

Using the singular values from the SVD in equation 9.49, the matrix A′ therefore has a condition number of σ1/σ4 = 2.0/(2.5 × 10⁻⁷) = 8.0 × 10⁶. As we observed earlier, this is a large condition number, meaning that the matrix is poorly conditioned. The factor of about 10⁶ by which perturbations in b were amplified in x is less than or equal to, but not far from, this condition number. A condition number closer to 1 would be well-conditioned. However, there is no cutoff in the loose definition of these terms, and what defines an acceptable condition number depends on the problem you're solving. As another example, the matrix A in equation 5.65 has a condition number of 2 and is therefore well conditioned.
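The bound in equation 9.55 can be checked directly. This sketch (ours) perturbs one travel time by a microsecond and compares the relative change in x with κ(A′) times the relative change in b.

Ap = [1.000001 1 0 0; 0 0 1 1; 0 1 0 1; 1 0 1 0];
b  = Ap * ones(4, 1);

db = [1e-6; 0; 0; 0];           % one microsecond of measurement error
x  = Ap \ b;
dx = Ap \ (b + db) - x;

lhs = norm(dx) / norm(x)              % relative change in the solution, about 0.1
rhs = cond(Ap) * norm(db) / norm(b)   % upper bound from equation 9.55, about 2
% The observed change is large but stays below the bound, as the condition number predicts.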


The condition number can show up in solving for x even without an explicit error in b. For instance, we noticed in Section 9.5.1 that the normal equation yielded a much worse estimate of x even with our error-free b, with errors in x on the order of 10⁻³. The errors we observed in x derive from two factors. The first is that the condition number of a square matrix is the same as that of its transpose, and the condition number of the product AᵀA is the product of the two individual condition numbers, so it equals κ(A)²,

\kappa(A) = \kappa(A^T), \qquad \kappa(A^TA) = \kappa(A^T)\,\kappa(A) = \kappa(A)^2.    (9.56)

So, if the condition number of A is 8.0 × 10⁶, then the condition number of AᵀA, the matrix we must invert in the normal equation, is (8.0 × 10⁶)² = 6.4 × 10¹³. The second factor contributing to the failure of the normal equation to solve Ax = b is that numbers for generic scientific computing, stored on computers and used in MATLAB® and other languages, are generally Double Precision floating point numbers. Because of the way they're stored in binary format in computer memory, 'doubles' are only accurate to about the sixteenth decimal place, or a relative precision of 2.2 × 10⁻¹⁶. We encountered a related problem computing direction cosines in Section 2.6.1's MATLAB® moment. Even without an explicit error in b, the vector can only be represented on the computer, at best, to this level of precision. Using equation 9.55, we might expect relative errors in x using the normal equation as large as (2.2 × 10⁻¹⁶)(6.4 × 10¹³) = 0.014. Our actual error in x, from equation 9.52, was not much smaller. So, the normal equation works fine for well-conditioned matrices but can go awry for poorly conditioned matrices. For LU-based methods like the backslash, we expect the error to have a maximum norm of (2.2 × 10⁻¹⁶)(8.0 × 10⁶) = 1.8 × 10⁻⁹. As expected, this is just larger than the norm of our observed x error in equation 9.51, which was around 10⁻¹⁰.

MATLAB® moment—Poorly conditioned matrix
Sometimes, you may need a poorly conditioned matrix to test some code or a function. To create a simple one in MATLAB®, you can make a Hilbert matrix, whose components are a_{ij} = 1/(i + j − 1). The command is hilb(n), where n is the size of the square Hilbert matrix.

>>B=hilb(5) gives B = \begin{bmatrix} 1 & 1/2 & 1/3 & 1/4 & 1/5 \\ 1/2 & 1/3 & 1/4 & 1/5 & 1/6 \\ 1/3 & 1/4 & 1/5 & 1/6 & 1/7 \\ 1/4 & 1/5 & 1/6 & 1/7 & 1/8 \\ 1/5 & 1/6 & 1/7 & 1/8 & 1/9 \end{bmatrix} \rightarrow \kappa(B) \approx 4.7661 \times 10^{5}.

Setting n = 12 is as big as you will need to go.


9.6 SUMMARY

In this chapter, we explored the theory behind a relatively recent development in linear algebra applications, the singular value decomposition or SVD. We learned that the SVD is a real power tool in our linear algebra toolbox: the SVD can help us solve linear systems of equations and provide enormous insight into those equations along the way. To use another metaphor, the SVD is like a matrix X-ray, revealing its inner workings. We also explored some practical considerations in solving linear systems on modern computer systems. These insights helped us develop rules of thumb for choosing which of the various techniques we've explored in this textbook to apply when solving linear systems of equations for practical Earth Science problems.

We started the chapter with a new matrix factorization, the singular value decomposition, A = UΣVᵀ. The matrices U and V are orthogonal, and the matrix Σ is diagonal, with the singular values σi arranged from largest to smallest along its diagonal. Matrices U and Vᵀ describe rotations, and Σ is a stretching, so A can be seen as the composition of a rotation by Vᵀ, a stretching by Σ, and finally a rotation by U. This is an elegant system partly because it will work for any matrix A, even if A is rectangular. Eigendecomposition, which looks similar to the SVD on the surface, won't work on rectangular matrices. But it will work on the positive semi-definite products of rectangular or square matrices, AᵀA or AAᵀ. When we perform this eigendecomposition, we discover that the eigenvectors of the products are the singular vectors of A, and the eigenvalues of the products are the squared singular values of A. We can use this property to calculate the singular value decomposition from the eigendecomposition of AᵀA or AAᵀ.

The SVD provides considerable insight into any real matrix A. The rank r is given by the number of nonzero singular values on the diagonal of Σ. The first r orthonormal columns of U span the column space of A. Columns r + 1 to m of U, if they exist, span the left null space, the subspace in which we find the error vector in projection and least squares problems. The first r orthonormal columns of V span the row space of A, and columns r + 1 to n of V, if they exist, span the null space of A. For an invertible matrix, the ratio of the largest to the smallest singular value, σ1/σn, is the condition number, which tells us something about how close to singular the matrix is and how well it will behave when solving Ax = b.

We do not expect the reader ever to calculate an SVD or pseudoinverse by hand. Rather, using MATLAB® or some other application is the most practical method, so we discuss some of the functions here. We can use the SVD to calculate a Moore-Penrose pseudoinverse of any matrix A as A⁺ = VrΣr⁻¹Urᵀ. The pseudoinverse is a quick and numerically accurate way to solve for the particular solution xp, to go along with a special solution xs in the null space from V, giving us the complete solution x to the system Ax = b. For systems with r = n, whether square or tall with m > n, LU-based approaches that use Gaussian elimination, like MATLAB®'s backslash operator \, are quick and numerically accurate. Use the backslash in MATLAB® for these systems, or the function inv() if you really need the inverse. Use the pseudoinverse function pinv() for matrices with rank r < n to find particular solutions with the smallest vector norm.

When you know or suspect that the matrix A has a high condition number and, therefore, is poorly conditioned, don't use the normal equation, and no matter which method you use, beware: small changes to b can result in large changes to x.

Figure 9.8 Diagram of the three-point problem with three extra measurements. The map shows the data points, measured locations in (x, y, z) coordinates, plotted on contour lines.

9.7 EXERCISES

1. Revisit Exercise 3 of Chapter 4 with your new SVD tools. The exercise contains a matrix whose rank is less than the number of rows and columns. Use the svd() function in MATLAB® to compute bases for all four subspaces.

A = \begin{bmatrix} 1 & 1 & 2 & 2 & 1 & 2 \\ 2 & 2 & 4 & 6 & 2 & 4 \\ 1 & 2 & 4 & 8 & 1 & 6 \\ 2 & 2 & 4 & 1 & 2 & 4 \end{bmatrix}

2. Revisit the three-point problem expanded to become a six-point problem described in Section 5.2.4 and shown in Figure 9.8.

\begin{matrix} \text{point 1} \\ \text{point 2} \\ \text{point 3} \\ \text{point 4} \\ \text{point 5} \\ \text{point 6} \end{matrix} \quad \begin{bmatrix} 5 & 70 & 1 \\ 70 & 50 & 1 \\ 60 & 10 & 1 \\ 75 & 40 & 1 \\ 40 & 55 & 1 \\ 70 & 20 & 1 \end{bmatrix} \begin{bmatrix} m_E \\ m_N \\ z_0 \end{bmatrix} = \begin{bmatrix} 30 \\ 90 \\ 60 \\ 90 \\ 60 \\ 70 \end{bmatrix}    (9.57)


(i) Calculate the SVD of the design matrix A.
(ii) Write out and explain the four subspaces of A.
(iii) Predict its size, then use the SVD to calculate the pseudoinverse A⁺ of the rectangular matrix A.
(iv) Use the pseudoinverse to solve the least squares problem in equation 5.44.
(v) Now consider what might happen if you only have data from points 1, 2, and 5. Calculate the SVD of the resulting A. What is the condition number κ(A) of this new matrix? Using the geometry of these three points in 3D space, can you explain why κ(A) is so high? How might small errors in the measured elevations affect the unknown parameters mE, mN, and z0?
(vi) You can now add a fourth data point to your system. For which of the remaining points (3, 4, or 6) would the condition number be reduced the most? Reducing the condition number improves your estimate of the unknown parameters when your measured data have uncertainties.

spectral theorem, 219
standard basis, 230
standard deviation, 163
strain, 46, 198
    circular marker, 255
    coaxial, 276
    contraction, 167
    dilation, 167
    elongation, 215
    matrix, 46, 168
    non-coaxial, 276
    plane strain, 175
    principal elongations, 215
    principal strains, 215
    pure shear, 46, 175
    shear strain, 215
    simple shear, 46, 175
    stretch, 215
    superposition, 178
strain ratios, 255
Strang theory, 122
stress at a point, 251
stress state, 250
stress tensor, 250
    positive definite, 251
    symmetric, 251
stress vector, 249
    normal, 249
    shear, 249
stretch, 215
subspace, 100
superposition, 178
SVD, 269, 270
    left singular vectors, 270
    right singular vectors, 270
    simple shear, 272
    singular value matrix, 270
    singular values, 269
    singular vectors, 269
swapping rows, 73
symmetric matrix, 147
symmetric positive definite, 209

T
tectonite, 17
tensor, 250
    example, 250
    first-order, 250
    second-order, 249
    zeroth-order, 250
three-point problem
    cross product, 181
    elimination, 63
    graphical solution, 1
    matrix multiplication, 31
    matrix solution, 7
tomography, 24, 91, 137, 161
traction
    on plane, 250
traction vector, 250
transformation view, 42
transpose, 35
    product, 38
    rules, 37
    vector multiplication, 36
trend and plunge, 17, 52
true north, 238
typographical styles, 27
    hat, 27
    matrix, 27
    row or column, 27
    tilde, 27
    vector, 27

U
unit vector, 34
unknowns vector, 6
upper triangular, 64
upward cleaning, 81

V
vector
    angle, 33
    associative, commutative, distributive, 11
    basis vector, 11
    basis vectors, 242
    column, 6
    components, 242
    data, 6
    dot product, 28
    error vector, 141
    inner product, 28
    multiplication, 28
        using transpose, 37
    norm, 37
    normal, 52, 55
    normalization, 101
    operations, 11
    orthonormal, 186
    pole, 52, 55
    projection vector, 141
    results, 6
    row, 6
    scalar product, 28
    solution, 6
    stress, 249
    traction, 250
    transpose, 35
    typographical style, 27
    unit vector, 34
    unknowns, 6
    vector space, 11
    zero, 99
vector input function, 54
vector space, 11, 99
    basis, 101
    closed, 99
    column space, 102
    dimension, 100
    left null space, 120
    null space, 105
    null space of Aᵀ, 118
    real, 100
    row space, 114
    subspace, 100

W
well conditioned, 288

X
x-hat, 140

Z
zero vector, 99
zeroth-order tensor, 250

MATLAB® Moments

A
Adding vectors, 12
arithmetic operations, 5
augmented matrix, 66

B
Basic operations, 5
basis for column space, 228
basis using orth(), 228

C
change of basis matrix, 231
changing basis, 234
Changing basis using a matrix, 234
Chapter 3 matrix and vector, 64
Chapter 4 fault vectors, 93
cleaning upward, 82
Condition number, 288
cosine, 50
cosine degrees, 50
Creating I and other matrices, 48
Creating a function, 59
Creating the change of basis matrix, 231
cross product, 179

D
degrees, 50
determinant, 169
dimensions, 98
Dimensions of arrays, 98
Dot product, 28

E
Edit column, 67
Edit row, 67
editing
    cells, 67
    columns, 67
    rows, 67
Editing single cells, 67
Eigenvalues, 206
Eigenvectors, 206
Elimination matrix, 72
Elimination matrix cleaning upward, 82

F
Finding column space, 135
Finding pivots, 135
Finding the column space, 104
Finding the cross product, 179
Finding the determinant, 169
Finding the null space, 105
function
    create, 59
Function \, 89
Function cond(), 288
Function cross(), 179
Function det(), 169
Function eig(), 206
Function inv(), 76
Function length(), 98
Function lu(), 79
Function norm(), 102
Function null(), 115
Function orth(), 115
Function pinv(), 284
Function qr(), 194
Function rand(), 22
Function randi(), 19, 174
Function rank(), 68
Function rref(), 85
Function size(), 98
Function svd(), 270

G
Gauss-Jordan elimination, 86
Gram-Schmidt process, 191

H
Hilbert matrix, 289

I
identity matrix, 48
Invert a matrix, 76

M
machine precision, 51
Machine precision on numbers, 51
Making a singular matrix, 132
Making augmented matrix, 66
Making matrices, 15
Making vectors, 12
matrix
    cleaning upward, 82
    column space, 135
    elimination, 72
    elimination upward, 82
    entering, 15
    identity, 48
    inverse, 76
    magic square, 132
    multiplication error, 34
    multiply vector, 31
    multiplying, 32
    ones, 48
    pivots, 135
    random, 22
    random integer, 19, 174
    rank, 68
    singular, 99, 132
    swapping rows, 74
    transpose, 36
    zeros, 48
Matrix rank, 68
Matrix times vector, 31
Multiplying matrices, 32
Multiplying vectors, 40

N
norm, 37
Norm of a vector, 37
Normalizing a vector, 102

O
ones matrix, 48
Output formats, 83

P
Permutation matrix, 74
pinv(), 284
pivots, 135
poorly conditioned matrix, 289
precision, 51

Q
QR decomposition, 194
qr(), 194

R
radians, 50
rand(), 22
randi(), 19, 174
random integer
    matrix, 174
    vector, 174
Random integer vector or matrix, 19, 174
Random vector or matrix, 22
rank, 68
red text, 99
Reduced row echelon form, 85
row space, 118
rref and column space, 135
rref and pivots, 135

S
Scalar multiplying vector, 12
sine, 50
sine degrees, 50
singular matrix, 99
svd(), 270
Swapping rows, 74

T
transpose, 36
Transposing matrices and vectors, 36
trig functions, 50
Trigonometric functions, 50

U
Using A\b, 89
Using the normal equation, 146

V
vector
    adding, 12
    dot product, 28
    entering, 12
    multiplying, 40
    norm, 37, 102
    random, 22
    random integer, 19, 174
    scalar multiplication, 12
    transpose, 36

W
When a matrix is singular, 99
When matrices will not multiply, 34

Z
zeros matrix, 48