This book is the second of two volumes on linear algebra for graduate students in mathematics, the sciences, and economics.



COURANT
FREDERICK P. GREENLEAF
SOPHIE MARQUES

Linear Algebra II

American Mathematical Society Courant Institute of Mathematical Sciences

30 LECTURE NOTES

10.1090/cln/030

Linear Algebra II

Courant Lecture Notes in Mathematics
Executive Editor: Jalal Shatah
Managing Editor: Paul D. Monsour
Production Editor: Neelang Parghi

Frederick P. Greenleaf Courant Institute

Sophie Marques Stellenbosch University

30

Linear Algebra II

Courant Institute of Mathematical Sciences New York University New York, New York American Mathematical Society Providence, Rhode Island

2010 Mathematics Subject Classification. Primary 97H60, 15-01, 15A04, 15A15, 15A18, 15A21, 40A05, 40A25, 20B30.

For additional information and updates on this book, visit www.ams.org/bookpages/cln-30

Library of Congress Cataloging-in-Publication Data
Names: Greenleaf, Frederick P., author. | Marques, Sophie, 1986- author.
Title: Linear algebra II / Frederick P. Greenleaf, Courant Institute, New York University, Sophie Marques, Stellenbosch University.
Description: Providence, RI : American Mathematical Society, [2020] | Series: Courant lecture notes in mathematics, 1529-9031 ; 30 | Includes bibliographical references.
Identifiers: LCCN 2019059330 | ISBN 9781470454258 (paperback) | ISBN 9781470456429 (ebook)
Subjects: LCSH: Algebras, Linear. | AMS: Mathematics education – Algebra – Linear algebra. | Linear and multilinear algebra; matrix theory – Instructional exposition (textbooks, tutorial papers, etc.). | Linear and multilinear algebra; matrix theory – Basic linear algebra – Linear transformations, semilinear transformations. | Linear and multilinear algebra; matrix theory – Basic linear algebra – Determinants, permanents, other special matrix functions [See also 19B10, 19B14]. | Linear and multilinear algebra; matrix theory – Basic linear algebra – Matrix exponential and similar functions of matrices. | Linear and multilinear algebra; matrix theory – Basic linear algebra – Eigenvalues, singular values, and eigenvectors. | Linear and multilinear algebra; matrix theory – Basic linear algebra – Canonical forms, reductions, classification. | Sequences, series, summability – Convergence and divergence of infinite limiting processes – Convergence and divergence of series and sequences. | Sequences, series, summability – Convergence and divergence of infinite limiting processes – Approximation to limiting values (summation of series, etc.) For the Euler-Maclaurin summation formula, | Group theory and generalizations – Permutation groups – Symmetric groups.
Classification: LCC QA184.2 .G7255 2020 — DDC 512/.5–dc23
LC record available at https://lccn.loc.gov/2019059330

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Requests for permission to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For more information, please visit www.ams.org/publications/pubpermissions. Requests for translation rights and licensed reprints should be sent to reprint-permission@ams.org.

© 2020 by the authors. All rights reserved. Printed in the United States of America. ∞ The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at https://www.ams.org/

10 9 8 7 6 5 4 3 2 1

25 24 23 22 21 20

Contents

Preface
  Organization of the Text
  Acknowledgments

Chapter 1. Generalized Eigenspaces and the Jordan Decomposition
  An Overview of This Chapter
  1.1. Nilpotent Operators and Examples
    “Stable Range” and “Stable Kernel” of a Linear Map
    Basic Properties of Nilpotent Operators
    Cyclic Vectors and Cyclic Subspaces
  1.2. Fine Structure of Nilpotent Operators
    The General Inductive Step in Theorem 1.16
    Cyclic Subspace Decompositions
    Uniqueness of Cyclic Subspace Decomposition
  1.3. Generalized Eigenspaces
    Jordan Decomposition of a Generalized Eigenspace
  1.4. The Generalized Eigenspace Decomposition
    Further Properties of Characteristic Polynomials
    Proof of Generalized Eigenspace Decomposition
    The Jordan Canonical Form
  1.5. Consequences of the Jordan Canonical Form (JCF)
    The General Spectral Mapping Theorem
    Cayley-Hamilton Theorem
    Alternative Proof of Cayley-Hamilton
    Further Examples of Jordan Form
    Jordan Form J(T) as a Similarity Invariant
    The Minimal Polynomial 𝑚𝑇(𝑥) for 𝑇 ∶ 𝑉 → 𝑉
    Additional Exercises
  1.6. Appendix A: Brief Review of Diagonalization
    Algebraic vs. Geometric Multiplicity

Chapter 2. Further Applications of Jordan Form
  An Overview of This Chapter
  2.1. The Jordan Form and Differential Equations
    Solving Higher-Order Differential Equations
    General Comments on Constant Coefficient ODE
    Transforming an ODE to a First-Order System 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡)
    The Case of a Double Root
  2.2. Normal Forms for Linear Operators over ℝ
    Complexification: A Case Study
    Complexification of Real Vector Spaces and Operators
    Relations Between ℝ[𝑡] and ℂ[𝑡]
    Complexification of Linear Operators over ℝ
    Subspaces of Real Type in 𝑉ℂ
    Concordance Between Eigenspaces of 𝑇 in 𝑉 and of 𝑇ℂ in 𝑉ℂ
    The Normal Form of a Linear Operator Over ℝ
    When 𝑇ℂ Is Not Diagonalizable
    Additional Exercises

Chapter 3. Bilinear, Quadratic, and Multilinear Forms
  An Overview of This Chapter
  3.1. Basic Definitions and Examples
    Variants: Sesquilinear Forms
    Dealing with Degeneracy of 𝐵
  3.2. Canonical Models for Bilinear Forms
    The Automorphism Group of a Bilinear Form 𝐵
    Canonical Forms. Case 1: 𝐵 Symmetric, 𝕂 = ℝ
    Sylvester’s Theorem: Invariance of the Signature on O(𝑝, 𝑞)
    A Diagonalization Algorithm
    The Gauss-Seidel Algorithm
    Canonical Forms. Case 2: 𝐵 Symmetric, 𝕂 = ℂ
    Canonical Forms: Case 3: 𝐵 Antisymmetric, 𝕂 = ℝ or ℂ
  3.3. Sesquilinear Forms (𝕂 = ℂ)
    Sesquilinear Change of Basis Formula
    Additional Exercises

Chapter 4. Tensor Fields, Manifolds, and Vector Calculus
  An Overview of This Chapter
  4.1. Tangent Vectors, Cotangent Vectors, and Tensors
    Smooth Functions and Mappings on Manifold 𝑀
    Tangent Vectors and the Tangent Spaces TM𝑝
    Some Objections: Finding the Right Definition
    Change of Coordinates
    Vector Fields as Differential Operators on 𝑀
    The Differential of a Map Between Manifolds
  4.2. Cotangent Vectors and Differential Forms
    The Canonical 𝑑-Operator on Functions
    Exterior Derivatives (an Overview)
    Primitives of 1-Forms
    Interpretations of 1-Forms and Their Primitives
  4.3. Differential Forms on 𝑀 and Their Exterior Derivatives
    Action of Permutation Groups 𝑆𝑛 on Tensors
    Products of Tensors and Tensor Fields
    Wedge Product of Antisymmetric Tensors
    Calculating Wedge Products on a Chart
    Exterior Derivative of Higher Rank 𝑘-Forms
    Primitives of 𝑘-Forms. Poincaré’s Lemma
    Transferring Calculations Between 𝑀 and ℝ𝑚
    Proof of Poincaré’s Lemma
  4.4. Div, Grad, Curl, and All That
    Vector Operations in Two Dimensions
    Differential Forms and the Cross Product
    Additional Exercises

Chapter 5. Matrix Lie Groups
  An Overview of This Chapter
  5.1. Matrix Groups and the Implicit Function Theorem
    Smooth Mappings and Their Differentials
    Smooth Hypersurfaces and the Implicit Function Theorem (IFT)
    Rank vs Dimension of Level Sets
    The “Maximal Rank” Case
    The Inverse Mapping Theorem (IMT)
    Differentiable Manifolds (General Definition)
    Differentiable Manifolds and the IFT
  5.2. Matrix Lie Groups
    Real vs Complex Matrix Groups
    Examples of Lie Groups
    Translations and Automorphisms on a Lie Group
    Tangent Spaces TM𝑝 of a Manifold
    The Differential of a Map
    Smooth Vector Fields on a Manifold
    Smooth Vector Fields as Differential Operators on 𝑀
  5.3. Lie Algebra Structure in Tangent Spaces of Lie Groups
    Lie Algebra of a Matrix Lie Group
    Adjoint Action of 𝐺 on Its Lie Algebra 𝔤
    Lie Algebras of the Classical Groups
    Some Lie Algebras Determined by Commutation Relations
  5.4. The Exponential Map for Matrix Lie Groups
    One-Parameter Subgroups in Matrix Lie Groups
    The Logarithm Log(𝐴) of a Matrix
    Singularities of the Exponential Map on 𝑀(𝑛, ℂ)
    Relation Between Exp ∶ 𝔤𝔩 → GL and Its Restriction exp ∶ 𝔤 → 𝐺
    Case Study: Nilpotent Lie Groups
    The Maps Exp, Ad, ad Revisited
    Connected Lie Groups
    The Campbell-Baker-Hausdorff Formula: Recovering 𝐺 from 𝔤
  5.5. The Lie Correspondence: Lie Groups vs Lie Algebras
    Further Development of Lie Theory
    Additional Exercises

Bibliography

Preface

The previous volume, Linear Algebra I (LA I), provided a self-contained account of basic topics that might be covered in one semester at the introductory graduate level, ending with:

• In Chapter 5, a general discussion of diagonalization including: norm convergence of linear operators, matrix-valued power series, and use of the exponential map Exp(𝐴) = 𝑒^𝐴 to solve systems of differential equations 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) in which 𝐴 is diagonalizable.
• In Chapter 6, a discussion of orthogonal diagonalization of operators on inner product spaces, including spectral decomposition, the spectral mapping theorem, polar decomposition 𝑇 = 𝑈 ⋅ |𝑇|, and singular value decomposition.

Linear Algebra II begins with more challenging basic topics, presented in Chapters 1–3, whose contents are described below. The final Chapters 4 and 5 are different. They are intended as more-or-less independent and self-contained surveys of two special topics:

• Chapter 4: Tensor Fields, Differentiable Manifolds and Vector Calculus
• Chapter 5: Matrix Lie Groups

Both are vast subjects, so presentations in these chapters will not be as fully developed as those in preceding chapters or LA I. Our intent was to present some advanced topics that illustrate uses of linear algebra in realms beyond a standard second course in linear algebra. In practice, every instructor can choose which to present, according to his or her interests and those of the class. Each chapter in LA II begins with a detailed overview of the topics that will be covered. Here is a brief description of the chapter contents.

Chapter 1. Generalized Eigenspaces and the Jordan Decomposition. The first serious obstacle to diagonalization is dealing with nilpotent operators, which is addressed by working out a detailed procedure for computing their cyclic subspace decompositions.
Although there are elegant existence proofs regarding such decompositions, the desired cyclic subspaces are not unique (unlike eigenspaces in diagonalization), and any procedure for finding them is inevitably complicated by the need to make some arbitrary choices. For any linear operator 𝑇 ∶ 𝑉 → 𝑉 the space 𝑉 is uniquely a direct sum (the Fitting decomposition) 𝑉 = 𝐾∞(𝑇) ⊕ 𝑅∞(𝑇) of 𝑇-invariant subspaces, the “stable kernel” and “stable range” of 𝑇, on which 𝑇|𝐾∞ is nilpotent and 𝑇|𝑅∞ is bijective. If 𝜆 is an eigenvalue of 𝑇, (𝑇 − 𝜆𝐼) is obviously nilpotent on its stable kernel 𝑀𝜆(𝑇) = 𝐾∞(𝑇 − 𝜆𝐼), the generalized 𝜆-eigenspace of 𝑇. These spaces are 𝑇-invariant and linearly independent. Whenever the characteristic polynomial 𝑝𝑇(𝑥) = det(𝑇 − 𝑥𝐼) splits in 𝕂[𝑥], a proof by induction on dim(𝑉) employs the Fitting decomposition to show that 𝑉 is the direct sum 𝑉 = ⨁𝜆 𝑀𝜆(𝑇) of its generalized eigenspaces. Since each component 𝑀𝜆(𝑇) has its own cyclic subspace decomposition, this “generalized eigenspace decomposition” leads directly to the Jordan canonical form 𝐽(𝑇) for 𝑇, which has many applications. Chapter 1 ends with a brief appendix reviewing key theorems about diagonalization that were covered in Linear Algebra I, illustrating a few techniques of proof that will be relevant in discussing what to do when diagonalization fails.

Chapter 2. Further Applications of the Jordan Form. In the first half of this chapter we employ the Jordan form to solve higher-order ODEs by converting them into linear systems 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) of constant coefficient first-order equations. We then recast the coefficient matrix 𝐴 in Jordan canonical form 𝐽(𝐴), made up of diagonal blocks 𝐵𝑘 = 𝜆𝑘𝐼 + 𝐸𝑘 in which 𝐸𝑘 is an elementary nilpotent matrix. Owing to nilpotence, the one-parameter groups 𝑒^{𝑡𝐵𝑘} can be written explicitly, and the system solved by taking 𝑋(𝑡) = 𝑒^{𝑡𝐽(𝐴)} ⋅ 𝑋(0). The solution of the original 𝑛th-order ODE is easily read from this. These techniques work when the characteristic polynomial 𝑝𝐴(𝑥) splits into linear factors in the space of polynomials 𝕂[𝑥]. The second half of Chapter 2 concerns itself with complexifying real vector spaces and linear operators 𝑇 ∶ 𝑉 → 𝑉 to get complex spaces 𝑉ℂ and operators 𝑇ℂ to which the preceding methods apply. By reverse-engineering the construction of the Jordan form 𝐽(𝑇ℂ), the original ℝ-linear operator 𝑇 can be recast in a “real-normal form” that reveals its structure.
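Readers who like to experiment can check the explicit form of these one-parameter groups with a computer algebra system. The following sympy sketch (our illustration, not part of the text; the symbols t and lam are ours) computes 𝑒^{𝑡𝐵} for a single 2 × 2 Jordan block 𝐵 = 𝜆𝐼 + 𝐸, where nilpotence of 𝐸 truncates the exponential series.

```python
from sympy import Matrix, exp, simplify, symbols

t, lam = symbols("t lam")

# A single 2x2 Jordan block B = lam*I + E, with E elementary nilpotent (E^2 = 0)
B = Matrix([[lam, 1],
            [0,   lam]])

# Matrix exponential of t*B, computed symbolically via the Jordan form
expB = (t * B).exp()

# Since E^2 = 0, the series terminates: e^{tB} = e^{t*lam} * (I + t*E)
expected = exp(t * lam) * Matrix([[1, t],
                                  [0, 1]])

assert (expB - expected).applyfunc(simplify).is_zero_matrix
```

Each Jordan block of 𝐽(𝐴) contributes such a closed-form factor, which is why 𝑋(𝑡) = 𝑒^{𝑡𝐽(𝐴)} ⋅ 𝑋(0) can be written out explicitly.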
There are actually two such real forms, depending on whether 𝑇ℂ is diagonalizable or not (in which case one applies the Jordan form).

Chapter 3. Bilinear, Quadratic, and Multilinear Forms. Bilinear forms 𝐵(𝑥, 𝑦) on a vector space 𝑉 arise in many areas of mathematics, inner products on real vector spaces being just one example. We restrict attention to symmetric and antisymmetric 𝐵, the ones most frequently encountered. Bilinear forms can be described by 𝑛 × 𝑛 matrices [𝐵]𝔛 = [𝐵(𝑒𝑖, 𝑒𝑗)] once a basis 𝔛 = {𝑒𝑖} in 𝑉 is specified. Modulo a change of basis, bilinear forms over ℝ or ℂ can be classified according to their normal forms, for which there are only three possibilities. We also work out the transformation law [𝐵]𝔛 → [𝐵]𝔜 between coordinate descriptions of 𝐵. Every bilinear form 𝐵 ∶ 𝑉 × 𝑉 → 𝕂 determines a group Aut(𝐵) of automorphisms, linear operators that leave 𝐵 invariant, so 𝐵(𝑇𝑥, 𝑇𝑦) = 𝐵(𝑥, 𝑦). As examples we have the orthogonal group of linear rigid motions O(𝑛) on ℝ𝑛 and the unitary operators U(𝑛) on ℂ𝑛, which preserve the usual inner products on these spaces; there is also a third family, the symplectic groups Sp(2𝑛, 𝕂). These classical groups are prevalent in physics and geometry, or wherever one has to deal with symmetries. As an example we discuss the Lorentz group, the automorphism group Aut(𝐵) = SO(3, 1) for the form

  𝐵(𝐱, 𝐲) = 𝑥1𝑦1 + 𝑥2𝑦2 + 𝑥3𝑦3 − 𝑥4𝑦4   for 𝐱, 𝐲 ∈ ℝ4.
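The defining condition 𝐵(𝑇𝑥, 𝑇𝑦) = 𝐵(𝑥, 𝑦) is easy to test in coordinates: with Gram matrix 𝐺 = diag(1, 1, 1, −1), membership in Aut(𝐵) amounts to 𝑇ᵀ𝐺𝑇 = 𝐺. The sympy sketch below (our illustration, not from the text; the boost parameter 𝑎 is ours) verifies this for a hyperbolic “boost” mixing the coordinates 𝑥1 and 𝑥4.

```python
from sympy import Matrix, cosh, diag, simplify, sinh, symbols

a = symbols("a", real=True)  # rapidity of the boost (illustrative parameter)

# Gram matrix of B(x, y) = x1*y1 + x2*y2 + x3*y3 - x4*y4
G = diag(1, 1, 1, -1)

# A Lorentz boost mixing the x1 (space) and x4 (time) coordinates
T = Matrix([[cosh(a), 0, 0, sinh(a)],
            [0,       1, 0, 0      ],
            [0,       0, 1, 0      ],
            [sinh(a), 0, 0, cosh(a)]])

# B(Tx, Ty) = B(x, y) for all x, y  <=>  T^T * G * T = G
# (the check reduces to the identity cosh^2 - sinh^2 = 1)
assert (T.T * G * T - G).applyfunc(simplify).is_zero_matrix
```

Matrices satisfying 𝑇ᵀ𝐺𝑇 = 𝐺 with this 𝐺 are exactly the elements of the Lorentz group.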

This is the symmetry group underlying Einstein’s theory of special relativity (taking 𝑥1, 𝑥2, 𝑥3 to be space coordinates and 𝑥4 = time). It tells us how to transform measurements made in one space-time coordinate frame to those made in a different frame moving at constant velocity. (The outcome is decidedly nonintuitive!) Beyond all this, Chapter 3 makes an initial foray into the world of tensors and multilinear forms, which have become central topics in differential geometry and physics too — general relativity is all about tensors, as is much of electromagnetic theory and the analysis of stress in solid media. This part of Chapter 3 covers a few basic facts that set the stage for a much more extensive discussion of tensor algebra and differential geometry in Chapter 4, should the instructor choose to pursue this special topic upon completion of Chapters 1–3.

Chapter 4. Tensor Fields, Manifolds, and Vector Calculus. This chapter explores aspects of linear and multilinear algebra that lie at the heart of modern differential geometry and ends with a reinterpretation of the main results of traditional multivariate calculus. After a brief review of the classical vector operators div, grad, and curl of calculus, attention shifts to the concept of a differentiable manifold and the present-day interpretation of the vector fields that live on them, which are now recognized to be of many types — fields of tangent vectors, cotangent vectors, and tensors of various ranks. Early on, manifolds were viewed as smooth hypersurfaces embedded in some Euclidean space; tangent vectors, normal vectors, and tensors were realized as objects within that external space. But everyone knew this universal Euclidean space was a fiction.
Furthermore, in trying to do calculus on, say, a 4-dimensional hypersurface 𝑀 embedded in a 10-dimensional space, one is confronted with some vexing questions, for instance: What do we mean by a “tangent vector to 𝑀” if we are forbidden to speak of anything external to 𝑀? More generally, how can one get rid of all reference to some mythical “all-encompassing Euclidean space” and deal with such questions in an intrinsic and coordinate-independent way? A satisfactory answer emerged in the first quarter of the twentieth century, based on the concept of a differentiable manifold, which circumvented these philosophical issues. Along with it came the realization that many different types of vector fields could be associated with a manifold. We illustrate the possibilities by examining a few examples from the physical sciences, in which we explain why:

• velocity fields, as in fluid flow, must be regarded as fields of tangent vectors;
• force fields such as electric fields, gravitational fields, etc., must be interpreted as fields of cotangent vectors;
• magnetic fields, on the other hand, are usually described in introductory courses as vector fields, but the vectors involved are something quite different, neither tangent nor cotangent vectors. They are in fact represented by fields of rank-2 antisymmetric tensors.

Our goals in Chapter 4 are to:

• Develop enough of the theory of differentiable manifolds, and the multilinear forms (tensor fields) that live on them, to reformulate traditional multivariate calculus in modern terms.
• Work through basic concepts of multilinear algebra needed to understand the concepts upon which modern differential geometry is based.
• Provide enough exposure to these concepts so that one can carry out meaningful computations.

The last section of this chapter focuses on tying the concepts discussed here to what you might have been taught in lower-level multivariate calculus.

Chapter 5. Matrix Lie Groups. The subject of Lie groups is the nexus where three major areas of mathematics come into play — calculus-style analysis, modern algebra, and differential geometry are all intertwined in this field, underpinned by the tools linear algebra provides. That interplay among so many disciplines can be daunting, but it is also what makes Lie groups such an interesting subject. This brief chapter cannot be a treatise on the subject. It is intended to be an introduction to the players involved, a few of the most important concepts, and examples illustrating the techniques involved in working with them. Some knowledge of differential geometry will be assumed, but the necessary topics are reprised early in the chapter to make the discussion self-contained. For additional information see Section 4.1.
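A central tool of the chapter is the exponential map exp ∶ 𝔤 → 𝐺. As a small preview (our sketch in sympy, with illustrative numeric entries, not from the text): for a strictly upper triangular matrix, an element of the Lie algebra of the nilpotent Heisenberg group treated in the chapter’s case study, the exponential series terminates and can be evaluated exactly.

```python
from sympy import Matrix, eye

# An element of the Heisenberg Lie algebra: strictly upper triangular,
# hence nilpotent with X**3 = 0 (entries chosen for illustration)
X = Matrix([[0, 1, 2],
            [0, 0, 3],
            [0, 0, 0]])

# The exponential series terminates: exp(X) = I + X + X**2 / 2
expected = eye(3) + X + X**2 / 2

assert X.exp() == expected
assert X.exp().det() == 1  # exp(X) lands in the group of unitriangular matrices
```

Terminating series like this one are what make nilpotent Lie groups an especially tractable case study.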
Many Lie groups of interest in physics and geometry were originally modeled as curvilinear smooth hypersurfaces 𝑀 embedded in Euclidean spaces ℝ𝑛 (and sometimes ℂ𝑛). For example, the “classical matrix groups” — the orthogonal groups O(𝑛), unitary groups U(𝑛), and symplectic groups Sp(2𝑛) introduced in Chapter 3 as the symmetry groups associated with various bilinear forms that arise throughout mathematics — are realized as subsets of matrix space M(𝑛, 𝕂) with 𝕂 = ℝ or ℂ. Beginning in the twentieth century, Lie theory (and most of differential geometry as well) moved away from the notion of embedded hypersurfaces and developed an intrinsic theory of such spaces — now called differentiable manifolds — in which Lie groups and manifolds were regarded as self-contained universes in their own right. To keep things simple, however, we focus on matrix Lie groups, which can be modeled as subsets of matrix space over ℝ or ℂ. Besides, almost everything said for matrix Lie groups carries over to the general theory of Lie groups, although some proofs get harder in that setting.


The main topics of Chapter 5 are:

• To show how differentiable structure can be imposed on a matrix group 𝐺 using the implicit function theorem to define Euclidean coordinates on a family of open sets 𝑈𝛼 that cover 𝐺. Equipped with these coordinate charts, 𝐺 becomes a Lie group, in which the group multiplication operation 𝐺 × 𝐺 → 𝐺 becomes a differentiable map.
• To discuss what tangent vectors, tangent spaces, and derivatives of functions 𝑓 ∶ 𝐺 → ℝ might mean on a group 𝐺 equipped with differentiable structure, which might be realized as a curvilinear hypersurface of high dimension. This is the realm of differential geometry.
• The result is a construct 𝐺 that has both geometric and algebraic aspects. Lie groups can be quite complicated, but deal with them we must, because they turn up at the heart of modern physics (and other fields), for instance as the symmetry groups that govern: the interactions of subatomic particles; the behavior of the cosmos according to special and general relativity; and the seemingly paradoxical rules of quantum mechanics. Even the periodic table of chemistry can in the end be deduced as a consequence of the mathematics associated with that classic example of a Lie group — the special orthogonal group SO(3) of rotations in 3-dimensional space.
• Finally, we will show that the tangent space to a Lie group 𝐺 at its identity element, which at first sight is just a vector space of the same dimension as the coordinate charts that cover 𝐺, acquires algebraic structure induced by the multiplication law in 𝐺 and becomes what is known as a Lie algebra. The important point is that this Lie algebra is a linear structure, much easier to deal with than 𝐺, and yet it encodes almost all the information needed to completely reconstruct and understand the Lie group from which it was derived.

Organization of the Text

The handling of exercises is somewhat unconventional. Many are placed within the main text, as the topics they address first occur.
Each chapter ends with an extensive set of section-by-section Additional Exercises. Some recap the main topics of each section; others are longer and intended to be more challenging; each set begins with a block of true/false questions, which students often find more challenging than you might expect. There are a few unconventional notations, explained as they are introduced:

• We often write |𝑉| for the dimension dim(𝑉), and 𝑅(𝑇), 𝐾(𝑇) for the range and kernel of 𝑇, respectively, when this is convenient.
• A special symbol 1- is used to distinguish the constant function (or polynomial) everywhere equal to 1 from the scalar 1. The symbol 𝕂 indicates a generic ground field.


Acknowledgments

The authors are particularly indebted to Paul Monsour, managing editor of the AMS/Courant Lecture Notes Series, and the in-house production editor, Neelang Parghi, for their tireless efforts in creating this volume. Thanks also to Ina Mette, AMS acquisitions editor, and to Holli Chopra, our copy-editor, who was able to catch many errors in the original LaTeX file — those that remain, for which we apologize in advance, are entirely the authors’ responsibility.

10.1090/cln/030/01

CHAPTER 1

Generalized Eigenspaces and the Jordan Decomposition What to do when diagonalization fails.

An Overview of This Chapter. In the previous volume, Linear Algebra I, we analyzed operators that were diagonalizable and the techniques by which they can be cast into diagonal form. The current volume, Linear Algebra II, explores what can be done to understand the structure of a linear operator when diagonalization fails. We continue the conventions employed in Linear Algebra I, referred to as “LA I” in the present Notes. For instance, we often write dim(𝑉) = |𝑉| where appropriate; 𝕂, ℂ, ℝ, ℚ for number fields; and 1- for the constant polynomial in 𝕂[𝑥]. Chapter 1 ends with a brief appendix that recapitulates key concepts developed in LA I about diagonalizing linear operators. This appendix also includes some details of proof illustrating techniques that will be relevant to our efforts in the current chapter.

Once we pass beyond the topics of LA I, the analysis gets harder and the revealed structures more complicated. With that in mind, and as a guide to readers of this chapter, we shall briefly outline the major steps needed to move beyond the realm of diagonalizability.

• The first serious obstacle is dealing with nilpotent operators, those for which 𝑇^𝑛 = 0 (the zero operator) for all sufficiently large 𝑛 ∈ ℕ. These have 𝜆 = 0 as their only eigenvalue for any ground field 𝕂, and the only eigenspace 𝐸𝜆=0(𝑇) can be ⫋ 𝑉. Decomposing 𝑉 into eigenspaces gets you nowhere.

• The “Fitting decomposition” of 𝑉 provides a path toward understanding an arbitrary linear operator 𝑇 ∶ 𝑉 → 𝑉. In it, we extend the notions of kernel 𝐾(𝑇) and range 𝑅(𝑇) of an operator by defining the stable kernel 𝐾∞(𝑇) and stable range 𝑅∞(𝑇):

  𝐾∞(𝑇) = ⋃_{𝑛∈ℕ} 𝐾(𝑇^𝑛)   and   𝑅∞(𝑇) = ⋂_{𝑛∈ℕ} 𝑅(𝑇^𝑛).

The subspaces 𝐾(𝑇^𝑛) are strictly increasing until they become constant, with 𝐾(𝑇^𝑛) = 𝐾(𝑇^{𝑛+1}) = ⋯ = 𝐾∞(𝑇); the ranges are decreasing and stabilize at the same exponent 𝑛, with 𝑅(𝑇^𝑛) = 𝑅∞(𝑇). These stable subspaces are 𝑇-invariant and yield a direct sum decomposition 𝑉 = 𝐾∞(𝑇) ⊕ 𝑅∞(𝑇); this Fitting decomposition is the basis for all further progress. Notice that the restricted operator 𝑇̃ = 𝑇|𝐾∞(𝑇) is nilpotent, with 𝑇̃^{𝑛−1} ≠ 0 and 𝑇̃^𝑛 = 0, and nilpotence degree 𝑛 ≤ dim(𝐾∞).

• The structure of a nilpotent operator is revealed by decomposing 𝑉 as a direct sum 𝑉 = 𝐶1 ⊕ ⋯ ⊕ 𝐶𝑟 of cyclic subspaces, where each 𝐶𝑘 contains a vector 𝑣0 whose iterates 𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑑𝑘−1}(𝑣0) are linearly independent and span 𝐶𝑘, with 𝑇^{𝑑𝑘}(𝑣0) = 0. Thus 𝔛 = {𝑇^𝑗(𝑣0) ∶ 0 ≤ 𝑗 ≤ 𝑑𝑘 − 1} is a basis, dim(𝐶𝑘) = 𝑑𝑘, and the restriction 𝑇|𝐶𝑘 is nilpotent with nilpotence degree 𝑑(𝑇|𝐶𝑘) = 𝑑𝑘. If we list these basis vectors in reverse order, the restriction has a particularly simple matrix realization as an elementary nilpotent matrix

  [𝑇|𝐶𝑘]𝔛𝔛 =
  ⎛ 0 1       ⎞
  ⎜   0 1     ⎟
  ⎜     ⋱ ⋱  ⎟
  ⎜       0 1 ⎟
  ⎝         0 ⎠ (𝑑𝑘 × 𝑑𝑘).

• The cyclic subspace decomposition 𝑉 = 𝐶1 ⊕ ⋯ ⊕ 𝐶𝑟 is, unfortunately, not unique. The subspaces 𝐶𝑘 are not uniquely determined, unlike the eigenspaces of a diagonalizable operator 𝑇; "algorithms" for finding suitable subspaces 𝐶𝑘 will necessarily involve some arbitrary choices that cannot be determined in advance. On the other hand, this decomposition is unique in the sense that the number of subspaces 𝐶𝑘, their dimensions 𝑑𝑘, and the elementary nilpotent matrix realization of 𝑇|𝐶𝑘 for a suitably chosen basis are unique and can be computed explicitly. The trouble comes if you find it necessary to exhibit an actual basis 𝔛 that does the job.

• After working out a detailed procedure for handling nilpotent operators, we turn to the analysis of general operators whose characteristic polynomial 𝑝𝑇(𝑥) = det(𝑇 − 𝑥𝟙) splits into linear factors in 𝕂[𝑥]. This is always true when 𝕂 = ℂ, and in any case this obstacle can always be surmounted by "enlarging" the ground field 𝕂, a process that will be illustrated for 𝕂 = ℝ vs 𝕂 = ℂ in Chapter 2.

• This brings us to the central result of Chapter 1, the generalized eigenspace decomposition of a linear operator 𝑇 and the associated Jordan canonical form (JCF) for matrices, which apply whenever 𝑝𝑇(𝑥) splits over 𝕂. For any 𝜆 ∈ 𝕂, we define the generalized eigenspace

𝑀𝜆(𝑇) = 𝐾∞(𝑇 − 𝜆𝐼) = {𝑥 ∈ 𝑉 ∶ (𝑇 − 𝜆𝐼)^𝑘 𝑥 = 0 for some 𝑘 ∈ ℕ},


and 𝜆 is a generalized eigenvalue for 𝑇 if 𝑀𝜆(𝑇) ≠ (0). Clearly, 𝜆 is an eigenvalue ⇔ it is a generalized eigenvalue, but 𝐸𝜆(𝑇) ⫋ 𝑀𝜆(𝑇) in general, and that makes all the difference. The 𝑀𝜆(𝑇) are 𝑇-invariant, and when 𝑝𝑇(𝑥) splits over 𝕂, we get a direct sum decomposition

$$V = \bigoplus_{\lambda} M_\lambda(T)$$

indexed by the distinct eigenvalues of 𝑇 in 𝕂.

• That analysis begins by considering a fixed eigenvalue 𝜆 and invoking both the Fitting decomposition and the generalized eigenspace decomposition. Observe that the stable kernel of the operator 𝑇 − 𝜆𝐼 is invariant under both (𝑇 − 𝜆𝐼) and 𝑇, and that (𝑇 − 𝜆𝐼) is nilpotent on its stable kernel

𝐾∞(𝑇 − 𝜆𝐼) = {𝑥 ∈ 𝑉 ∶ (𝑇 − 𝜆𝐼)^𝑘 𝑥 = 0 for some 𝑘} = 𝑀𝜆(𝑇).

Therefore, the restriction of (𝑇 − 𝜆𝐼) to this stable kernel has a cyclic subspace decomposition 𝑀𝜆(𝑇) = ⨁_{𝑗=1}^{𝑠} 𝐶𝜆,𝑗. Each cyclic subspace has a basis such that the restriction of (𝑇 − 𝜆𝐼) to 𝐶𝜆,𝑗 is represented by an elementary nilpotent matrix. Since 𝑇 itself leaves these subspaces invariant, the matrix of 𝑇 = (𝑇 − 𝜆𝐼) + 𝜆𝐼 with respect to the same basis is a typical Jordan block

$$B_j = [T|_{C_{\lambda,j}}]_{\mathfrak{X}\mathfrak{X}} = \begin{pmatrix} \lambda & 1 & & & \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ & & & & \lambda \end{pmatrix}_{d_j \times d_j}.$$

• In the endgame (only sketched here, leaving details to the main text), we proceed by returning to the Fitting decomposition, which becomes 𝑉 = 𝑀𝜆(𝑇) ⊕ 𝑅∞ (abbreviating 𝑅∞ = 𝑅∞(𝑇 − 𝜆𝐼)). We will show the following: (i) 𝑅∞ is 𝑇-invariant. (ii) The characteristic polynomial 𝑝𝑇′ of 𝑇′ = 𝑇|𝑅∞ splits over 𝕂 if 𝑝𝑇 does. That means the restriction of 𝑇 to 𝑅∞ satisfies all the hypotheses of the theorem we are trying to prove, and since dim(𝑅∞) < dim(𝑉), we may argue by induction on dim(𝑉) to conclude that we have similar "fine structure" in each of the generalized eigenspaces.

That concludes our outline of things to come. Now for the details.

1.1. Nilpotent Operators and Examples

Nilpotent operators present the first obstacle in attempts to diagonalize a linear operator.

Definition 1.1. We say 𝑇 ∶ 𝑉 → 𝑉 is nilpotent if 𝑇^𝑘 = 0 for some 𝑘 ∈ ℕ and unipotent if 𝑇 = 𝐼 + 𝑁 with 𝑁 nilpotent. The smallest exponent 𝑑 such that 𝑇^𝑑 = 0 is the nilpotence degree deg(𝑇). Obviously, 𝑇 is unipotent ⇔ 𝑇 − 𝐼 is nilpotent. Nilpotent and unipotent matrices 𝐴 ∈ M(𝑛, 𝕂) are defined the same way as for operators.

Nilpotent operators cannot be diagonalized unless 𝑇 is the zero operator (or 𝑇 = 𝐼, if unipotent). Any analysis of normal forms must therefore examine these operators in detail. As examples, all strictly upper triangular matrices (with zeros on the diagonal) as well as those that are strictly lower triangular are nilpotent in view of the following observations.

Exercise 1.2. If 𝐴 has upper triangular form with zeros on and below the diagonal, prove that

$$A^2 = \begin{pmatrix} 0 & 0 & * & & \\ & 0 & 0 & \ddots & \\ & & \ddots & \ddots & * \\ & & & 0 & 0 \\ & & & & 0 \end{pmatrix}, \qquad A^3 = \begin{pmatrix} 0 & 0 & 0 & * & \\ & \ddots & \ddots & \ddots & \ddots \\ & & \ddots & \ddots & 0 \\ & & & \ddots & 0 \\ & & & & 0 \end{pmatrix},$$

etc., so that 𝐴^𝑛 = 0.

Matrices of the same form but with 1's on the diagonal all correspond to unipotent operators. We will see that if 𝑁 ∶ 𝑉 → 𝑉 is nilpotent, there is a basis 𝔛 such that

$$[N]_{\mathfrak{X}\mathfrak{X}} = \begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix},$$

but this is not true for all bases. Furthermore, a lot more can be said about the off-diagonal terms (∗) for suitably chosen bases.

Exercise 1.3. In M(𝑛, 𝕂), show that the following sets of upper triangular matrices are both subgroups in GL(𝑛, 𝕂).

(a) The strictly upper triangular group
$$\mathcal{N} = \left\{\, \begin{pmatrix} 1 & & * \\ & \ddots & \\ 0 & & 1 \end{pmatrix} \,\right\}$$
with entries in 𝕂.

(b) The full upper triangular group
$$\mathcal{P} = \left\{\, \begin{pmatrix} a_{1,1} & & * \\ & \ddots & \\ 0 & & a_{n,n} \end{pmatrix} \,\right\}$$
with entries in 𝕂 such that ∏_{𝑖=1}^{𝑛} 𝑎𝑖𝑖 ≠ 0.

Verify that 𝒩 and 𝒫 are closed under taking matrix products and inverses.

Example 1.4. Let $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ in M(2, 𝕂). This is a nilpotent matrix, and in any ground field 𝜆 = 0 is the only root of its characteristic polynomial 𝑝𝐴(𝜆) = det(𝐴 − 𝜆𝐼) = 𝜆².


There is a nontrivial eigenvector 𝑒1 = (1, 0) corresponding to eigenvalue 𝜆 = 0 because ker(𝐴) = 𝕂 ⋅ 𝑒1 is nontrivial (as it must be for any nilpotent operator). But you can easily verify that scalar multiples of 𝑒1 are the only eigenvectors, so there is no basis of eigenvectors in 𝕂² and 𝐴 cannot be diagonalized by any similarity transformation, regardless of the ground field 𝕂.

We now turn to an important tool for advancing beyond the limitations of diagonalization, the Fitting decomposition.

"Stable Range" and "Stable Kernel" of a Linear Map. Given a linear operator 𝑇 ∶ 𝑉 → 𝑉 on a finite-dimensional vector space (arbitrary ground field), let 𝐾𝑖 = 𝐾(𝑇^𝑖) = ker(𝑇^𝑖) and 𝑅𝑖 = 𝑅(𝑇^𝑖) = range(𝑇^𝑖) for 𝑖 = 0, 1, 2, …. These spaces are nested:

(0) ⊆ 𝐾1 ⊆ 𝐾2 ⊆ ⋯ ⊆ 𝐾𝑖 ⊆ 𝐾𝑖+1 ⊆ ⋯
𝑉 ⊇ 𝑅1 ⊇ 𝑅2 ⊇ ⋯ ⊇ 𝑅𝑖 ⊇ 𝑅𝑖+1 ⊇ ⋯,

and if dim(𝑉) < ∞, they must each stabilize at some point, say with 𝐾𝑟 = 𝐾𝑟+1 = ⋯ and 𝑅𝑠 = 𝑅𝑠+1 = ⋯ for some integers 𝑟 and 𝑠. In fact, if 𝑟 is the first (smallest) index such that 𝐾𝑟 = 𝐾𝑟+1 = ⋯, the sequence of ranges must also stabilize at the same point because |𝑉| = |𝐾𝑖| + |𝑅𝑖| at each step. With this in mind, we define

Stable Range of 𝑇: $\displaystyle R_\infty = \bigcap_{i=1}^{\infty} R_i = R_r = R_{r+1} = \cdots$

Stable Kernel of 𝑇: $\displaystyle K_\infty = \bigcup_{i=1}^{\infty} K_i = K_r = K_{r+1} = \cdots$
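The stabilization behavior is easy to watch numerically. Below is a minimal sketch (not from the text; the block matrix 𝑇 and the NumPy calls are our own choices) tracking dim 𝑅𝑖 = rank(𝑇^𝑖) until it becomes constant; dim 𝐾𝑖 = |𝑉| − rank(𝑇^𝑖) stabilizes at the same index 𝑟.

```python
import numpy as np

# T is block diagonal: a nilpotent 3x3 block N plus an invertible 2x2
# block B, so K_infty is the first summand (dim 3) and R_infty the
# second (dim 2).  The example matrix is our own choice.
N = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])          # nilpotent of degree 3
B = np.array([[2., 0.],
              [0., 3.]])              # invertible
T = np.block([[N, np.zeros((3, 2))],
              [np.zeros((2, 3)), B]])

ranks = []                            # dim R_i for i = 1..5
P = np.eye(5)
for i in range(5):
    P = P @ T                         # now P = T^(i+1)
    ranks.append(int(np.linalg.matrix_rank(P)))

print(ranks)   # → [4, 3, 2, 2, 2]: the ranges shrink, then stabilize at r = 3
```

The ranks drop strictly until the stable range is reached, exactly as Proposition 1.5 below asserts.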

Proposition 1.5. 𝑉 = 𝑅∞ ⊕ 𝐾∞ and the spaces 𝑅∞, 𝐾∞ are 𝑇-invariant. Furthermore, 𝑅𝑖+1 ≠ 𝑅𝑖 and 𝐾𝑖+1 ≠ 𝐾𝑖 for 𝑖 < 𝑟.

Proof. To see there is a nontrivial jump 𝑅𝑖 ⫌ 𝑅𝑖+1 at every step until 𝑖 = 𝑟, it suffices to show that 𝑅𝑖+1 = 𝑅𝑖 at the 𝑖th step implies 𝑅𝑖+2 = 𝑅𝑖+1 (a similar result for kernels then follows automatically). Obviously, 𝑅𝑖+2 ⊆ 𝑅𝑖+1 for all 𝑖 since the ranges get smaller; to prove the reverse inclusion, let 𝑣 ∈ 𝑅𝑖+1. Then there is some 𝑤1 ∈ 𝑉 such that 𝑣 = 𝑇^{𝑖+1}(𝑤1) = 𝑇(𝑇^𝑖(𝑤1)). By hypothesis, 𝑇^{𝑖+1}(𝑉) = 𝑅𝑖+1 = 𝑅𝑖 = 𝑇^𝑖(𝑉), so there is some 𝑤2 ∈ 𝑉 such that 𝑇^𝑖(𝑤1) = 𝑇^{𝑖+1}(𝑤2). Thus

𝑣 = 𝑇^{𝑖+1}(𝑤1) = 𝑇(𝑇^𝑖(𝑤1)) = 𝑇(𝑇^{𝑖+1}(𝑤2)) = 𝑇^{𝑖+2}(𝑤2) ∈ 𝑅𝑖+2,

so 𝑅𝑖+1 ⊆ 𝑅𝑖+2. Hence 𝑅𝑖 = 𝑅𝑖+1 = 𝑅𝑖+2, and by induction 𝑅𝑟 = 𝑅𝑟+1 = ⋯ = 𝑅∞.

For 𝑇-invariance, we have 𝑇(𝑅𝑖) = 𝑅𝑖+1 for 𝑖 ≥ 𝑟, with |𝑅𝑖| = |𝑅𝑖+1| by definition of 𝑟. Thus 𝑇 ∶ 𝑅𝑖 → 𝑅𝑖+1 is a bijection for all large 𝑖, and hence 𝑇 ∶ 𝑅∞ → 𝑅∞ is a bijection. Similarly, 𝑇(𝐾𝑖) ⊆ 𝐾𝑖+1 = 𝐾𝑖 for all 𝑖 ≥ 𝑟, so 𝐾∞ = 𝐾𝑟 is 𝑇-invariant.

To see 𝑉 = 𝐾∞ ⊕ 𝑅∞, we show (i) 𝑅∞ + 𝐾∞ = 𝑉 and (ii) 𝑅∞ ∩ 𝐾∞ = {0}. For (ii), if 𝑣 ∈ 𝑅∞ = 𝑅𝑟, there is some 𝑤 ∈ 𝑉 such that 𝑇^𝑟(𝑤) = 𝑣; but if 𝑣 ∈ 𝐾∞ = 𝐾𝑟, then 𝑇^𝑟(𝑣) = 0. Consequently 𝑇^{2𝑟}(𝑤) = 𝑇^𝑟(𝑣) = 0. But


𝑇 ∶ 𝑅𝑖 → 𝑅𝑖+1 is a bijection for 𝑖 ≥ 𝑟. Since 𝑣 = 𝑇^𝑟(𝑤) ∈ 𝑅𝑟 and 𝑇^𝑟 ∶ 𝑅𝑟 → 𝑅2𝑟 = 𝑅𝑟 is a bijection, 0 = 𝑇^{2𝑟}(𝑤) = 𝑇^𝑟(𝑇^𝑟(𝑤)) = 𝑇^𝑟(𝑣) forces 𝑣 = 𝑇^𝑟(𝑤) = 0, so 𝐾∞ ∩ 𝑅∞ = {0}. Then (ii) ⇒ (i) because

|𝑅∞ + 𝐾∞| = |𝑅𝑟| + |𝐾𝑟| − |𝐾𝑟 ∩ 𝑅𝑟| = |𝐾𝑟| + |𝑅𝑟| = |𝑉|

(by LA I-Exercise 1.67, a variant of the dimension theorem, LA I-Theorem 1.54, together with |𝑉| = |𝐾𝑟| + |𝑅𝑟|). We conclude that 𝑅∞ + 𝐾∞ = 𝑉, proving (i). □

Lemma 1.6 (Fitting Decomposition). 𝑇|𝐾∞ is a nilpotent operator on 𝐾∞ and 𝑇|𝑅∞ is a bijective linear map of 𝑅∞ → 𝑅∞. Hence, every linear operator 𝑇 on a finite-dimensional space 𝑉 over any field has a direct sum decomposition 𝑇 = (𝑇|𝑅∞) ⊕ (𝑇|𝐾∞) such that 𝑇|𝐾∞ is nilpotent and 𝑇|𝑅∞ is bijective on 𝑅∞.

Proof. 𝑇^𝑟(𝐾∞) = 𝑇^𝑟(ker(𝑇^𝑟)) = {0}, so (𝑇|𝐾∞)^𝑟 = 0 and 𝑇|𝐾∞ is nilpotent of degree ≤ 𝑟, the index at which the ranges stabilize at 𝑅∞. Bijectivity of 𝑇|𝑅∞ was established in Proposition 1.5. □

Basic Properties of Nilpotent Operators.

Lemma 1.7. If 𝑁 ∶ 𝑉 → 𝑉 is nilpotent, the unipotent operator 𝐼 + 𝑁 is invertible.

Proof. If 𝑁^𝑘 = 0, the geometric series ∑_{𝑗=0}^{∞} 𝑁^𝑗 = 𝐼 + 𝑁 + ⋯ + 𝑁^{𝑘−1} terminates, and a simple calculation shows that

(𝐼 − 𝑁)(𝐼 + 𝑁 + ⋯ + 𝑁^{𝑘−1}) = 𝐼 − 𝑁^𝑘 = 𝐼.

Since 𝑁^𝑘 = 0 (by nilpotence), we get

(1.1)  (𝐼 − 𝑁)^{−1} = 𝐼 + 𝑁 + ⋯ + 𝑁^{𝑘−1}.

Replacing 𝑁 by −𝑁 (which is also nilpotent) inverts 𝐼 + 𝑁. □
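Formula (1.1) is easy to sanity-check numerically. A small sketch (our own example, not from the text) with a strictly upper triangular 𝑁 of nilpotence degree 4:

```python
import numpy as np

N = np.triu(np.ones((4, 4)), k=1)    # strictly upper triangular => nilpotent
I = np.eye(4)

S = I + N + N @ N + N @ N @ N        # the truncated geometric series (1.1)
assert np.allclose(np.linalg.matrix_power(N, 4), 0)   # N^4 = 0
assert np.allclose((I - N) @ S, I)                    # so S = (I - N)^{-1}
```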

Lemma 1.8. If 𝑇 ∶ 𝑉 → 𝑉 is nilpotent, then 𝑝𝑇(𝜆) = det(𝑇 − 𝜆𝐼) is equal to (−1)^𝑛 𝜆^𝑛 (𝑛 = dim 𝑉), so 𝜆 = 0 is the only eigenvalue (over any field 𝕂). It is an eigenvalue since ker(𝑇) ≠ {0}, and this kernel is the full subspace of 𝜆 = 0 eigenvectors 𝐸𝜆=0(𝑇).

Proof. Take a basis 𝔛 = {𝑒1, ⋯, 𝑒𝑛} that runs first through 𝐾(𝑇) = 𝐾1 = ker(𝑇) and is then augmented to a basis in 𝐾2 = ker(𝑇²), etc. With respect to such a basis, [𝑇]𝔛𝔛 is an upper triangular matrix with blocks of zero matrices on the diagonal (see Exercise 1.10 below). Obviously, [𝑇 − 𝜆𝐼]𝔛𝔛 has diagonal values −𝜆, so det(𝑇 − 𝜆𝐼) = (−1)^𝑛 𝜆^𝑛 as claimed. □

Similarly, a unipotent operator 𝑇 has 𝜆 = 1 as its only eigenvalue (over any field), and its characteristic polynomial is 𝑝𝑇(𝑥) = (1 − 𝑥)^𝑛. The sole eigenspace 𝐸𝜆=1(𝑇) is the set of fixed points Fix(𝑇) = {𝑣 ∶ 𝑇(𝑣) = 𝑣} in 𝑉.


Exercise 1.9. Prove the following:
(a) A nilpotent operator 𝑇 is diagonalizable if and only if 𝑇 = 0.
(b) A unipotent operator 𝑇 is diagonalizable if and only if 𝑇 is the identity operator 𝐼 = id𝑉.

Exercise 1.10. If 𝑇 ∶ 𝑉 → 𝑉 is a nilpotent linear operator on a finite-dimensional space and {𝑒1, …, 𝑒𝑛} is a basis that passes through successive kernels 𝐾𝑖 = ker(𝑇^𝑖), 1 ≤ 𝑖 ≤ 𝑑 = deg(𝑇), prove that [𝑇]𝔛𝔛 is upper triangular with 𝑚𝑖 × 𝑚𝑖 zero-blocks on the diagonal, 𝑚𝑖 = dim(𝐾𝑖/𝐾𝑖−1).
Hint. The problem is to devise efficient notation to handle this question. Partition the indices 1, 2, …, 𝑛 into consecutive intervals 𝐽1, …, 𝐽𝑑 (𝑑 = deg(𝑇)) with {𝑒𝑗 ∶ 𝑗 ∈ 𝐽1} a basis for 𝐾1, {𝑒𝑖 ∶ 𝑖 ∈ 𝐽1 ∪ 𝐽2} a basis for 𝐾2, etc. Matrix entries are determined by the system of vector equations

$$T(e_i) = \sum_{j=1}^{n} T_{ji}\, e_j \qquad (1 \le i \le n = \dim V).$$

What do the inclusions 𝑇(𝐾𝑖 ) ⊆ 𝐾𝑖−1 tell you about the coefficients 𝑇𝑖𝑗 ? If 𝑇 ∶ 𝑉 → 𝑉 is nilpotent, the powers 𝑇 𝑘 eventually kill every vector 𝑣 ≠ 0, so there is an 𝑚 = 𝑚(𝑣) in ℕ such that {𝑣, 𝑇(𝑣), … , 𝑇 𝑚−1 (𝑣)} are nonzero and 𝑇 𝑚 (𝑣) = 0. The nilpotence degree deg(𝑇) is the smallest exponent 𝑑 = 0, 1, 2, … such that 𝑇 𝑑 = 0. Proposition 1.11. Let 𝑇 ∶ 𝑉 → 𝑉 be nilpotent and 𝑣0 ≠ 0. If 𝑣0 , 𝑇(𝑣0 ), … , 𝑇 𝑚−1 (𝑣0 ) are nonzero and 𝑇 𝑚 (𝑣0 ) = 0, the subspace 𝑊(𝑣0 ) = 𝕂-span{𝑣0 , 𝑇(𝑣0 ), … , 𝑇 𝑚−1 (𝑣0 )} is 𝑇-invariant and the vectors {𝑣0 , 𝑇(𝑣0 ), … , 𝑇 𝑚−1 (𝑣0 )} are independent. Hence they are a basis for this “cyclic subspace” 𝑊(𝑣0 ) determined by 𝑣0 and the action of 𝑇. Proof. The {𝑇 𝑘 (𝑣0 ) ∶ 0 ≤ 𝑘 ≤ 𝑚 − 1} span 𝑊(𝑣0 ) by definition. They are independent because if 0 = 𝑐0 𝑣0 + 𝑐1 𝑇(𝑣0 ) + ⋯ + 𝑐𝑚−1 𝑇 𝑚−1 (𝑣0 ) for some choice of 𝑐𝑘 ∈ 𝕂, then 0 = 𝑇 𝑚−1 (0) = 𝑇 𝑚−1 (𝑐0 𝑣0 + 𝑐1 𝑇(𝑣0 ) + ⋯ + 𝑐𝑚−1 𝑇 𝑚−1 (𝑣0 )) = 𝑐0 𝑇 𝑚−1 (𝑣0 ) + 𝑐1 ⋅ 0 + ⋯ + 𝑐𝑚−1 ⋅ 0, which implies 𝑐0 = 0 since 𝑇 𝑚−1 (𝑣0 ) ≠ 0 by minimality of the exponent 𝑚. Next, apply 𝑇 𝑚−2 to the original sum, which now has the form 𝑐1 𝑇(𝑣0 ) + ⋯ + 𝑐𝑚−1 𝑇 𝑚−1 (𝑣0 ); we get 𝑇 𝑚−2 (0) = 𝑇 𝑚−2 (𝑐1 𝑇(𝑣0 ) + ⋯ + 𝑐𝑚−1 𝑇 𝑚−1 (𝑣0 )) = 𝑐1 𝑇 𝑚−1 (𝑣0 ) + 0 + ⋯ + 0 and then 𝑐1 = 0. Continuing this process, we eventually obtain 𝑐0 = 𝑐1 = 𝑐2 = ⋯ = 𝑐𝑚−1 = 0. □


Obviously, 𝑊(𝑣0) is 𝑇-invariant and 𝑇0 = 𝑇|𝑊(𝑣0) is nilpotent (with degree 𝑚 = deg(𝑇0) ≤ deg(𝑇)) because for each basis vector 𝑇^𝑘(𝑣0) we have 𝑇0^𝑚(𝑇^𝑘(𝑣0)) = 𝑇^𝑘(𝑇^𝑚(𝑣0)) = 0; in fact deg(𝑇0) = 𝑚 because 𝑇0^{𝑚−1}(𝑣0) ≠ 0. Now consider the ordered basis

𝔛 = {𝑒1 = 𝑇^{𝑚−1}(𝑣0), 𝑒2 = 𝑇^{𝑚−2}(𝑣0), …, 𝑒𝑚 = 𝑣0}

in 𝑊(𝑣0). Since 𝑇(𝑒𝑘+1) = 𝑒𝑘 for each 𝑘 ≥ 1 and 𝑇(𝑒1) = 0, the matrix [𝑇]𝔛𝔛 has the form

$$[T]_{\mathfrak{X}\mathfrak{X}} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ \vdots & & & 0 & 1 \\ 0 & \cdots & \cdots & 0 & 0 \end{pmatrix},$$

and the action on these ordered basis vectors is

$$0 \overset{T}{\longleftarrow} e_1 \overset{T}{\longleftarrow} e_2 \overset{T}{\longleftarrow} \cdots \overset{T}{\longleftarrow} e_{m-1} \overset{T}{\longleftarrow} e_m = v_0.$$

The "top vector" 𝑒𝑚 = 𝑣0 is referred to as a cyclic vector for the invariant subspace 𝑊(𝑣0). Any matrix having the form

$$\begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix}$$

is called an elementary nilpotent matrix.
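The passage from a cyclic vector to an elementary nilpotent matrix is purely mechanical, as the following sketch shows (the particular 𝑇 and the choice 𝑣0 = (1, …, 1) are our own illustrations; this 𝑇 has nonzero superdiagonal entries, so 𝑣0 ∉ 𝐾𝑚−1 and 𝑣0 is cyclic):

```python
import numpy as np

m = 4
rng = np.random.default_rng(0)
# Strictly upper triangular with nonzero superdiagonal entries:
# nilpotent with deg(T) = m = dim(V).
T = np.triu(rng.integers(1, 5, (m, m)).astype(float), k=1)
v0 = np.ones(m)                      # T^(m-1) v0 != 0, so v0 is cyclic

iterates = [v0]
for _ in range(m - 1):
    iterates.append(T @ iterates[-1])
P = np.column_stack(iterates[::-1])  # ordered basis T^{m-1}v0, ..., T v0, v0

J = np.linalg.inv(P) @ T @ P         # matrix of T in the reversed basis
print(np.round(J, 10))               # an elementary nilpotent matrix
```

Reversing the order of the iterates is what puts the 1's on the superdiagonal rather than the subdiagonal.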

Cyclic Vectors and Cyclic Subspaces. To put this in its proper context, we leave the world of nilpotent operators for a moment.

Definition 1.12. If dim(𝑉) < ∞, 𝑇 ∶ 𝑉 → 𝑉 is any linear operator, and 𝑊 ⊆ 𝑉 is a nonzero 𝑇-invariant subspace, we say 𝑊 is a cyclic subspace if it contains a "cyclic vector" 𝑣0 ∈ 𝑊 such that 𝑊 = 𝕂-span{𝑣0, 𝑇(𝑣0), 𝑇²(𝑣0), …}.

Only finitely many iterates 𝑇^𝑖(𝑣0) under the action of 𝑇 can be linearly independent, so there will be a first (smallest) exponent 𝑘 = 𝑘(𝑣0) such that {𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑘−1}(𝑣0)} are linearly independent and 𝑇^𝑘(𝑣0) is a linear combination of the previous vectors.

Lemma 1.13. Let 𝑇 ∶ 𝑉 → 𝑉 be an arbitrary linear operator on a finite-dimensional vector space. If 𝑣0 ∈ 𝑉 is nonzero, there is a unique exponent 𝑘 = 𝑘(𝑣0) ≥ 1 such that {𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑘−1}(𝑣0)} are linearly independent and 𝑇^𝑘(𝑣0) is a linear combination of these vectors. Obviously,

𝑊 = 𝕂-span{𝑇^𝑗(𝑣0) ∶ 𝑗 = 0, 1, 2, …} = 𝕂-span{𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑘−1}(𝑣0)}

and dim(𝑊) = 𝑘. Furthermore, 𝑇(𝑊) ⊆ 𝑊 and 𝑊 is a cyclic subspace in 𝑉.


Proof. By definition of 𝑘 = 𝑘(𝑣0), 𝑇^𝑘(𝑣0) is a linear combination 𝑇^𝑘(𝑣0) = ∑_{𝑗=0}^{𝑘−1} 𝑐𝑗 𝑇^𝑗(𝑣0). Arguing recursively,

$$T^{k+1}(v_0) = T(T^k(v_0)) = \sum_{j=0}^{k-1} c_j\, T^{j+1}(v_0) \in c_{k-1} T^k(v_0) + \mathbb{K}\text{-span}\{v_0, T(v_0), \ldots, T^{k-1}(v_0)\}.$$

Since we already know 𝑇^𝑘(𝑣0) lies in 𝕂-span{𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑘−1}(𝑣0)}, so does 𝑇^{𝑘+1}(𝑣0). Continuing this process, we find that all iterates 𝑇^𝑖(𝑣0) (𝑖 ≥ 𝑘) lie in 𝑊. By definition 𝑣0, 𝑇(𝑣0), …, 𝑇^{𝑘−1}(𝑣0) are linearly independent and span 𝑊, so dim(𝑊) = 𝑘. □

When 𝑇 is nilpotent, there is a simpler description of the cyclic subspace 𝑊 generated by the action of 𝑇 on 𝑣0 ≠ 0. Since 𝑇^𝑑 = 0 on all of 𝑉 when 𝑑 = deg(𝑇), there is a smallest exponent ℓ with 𝑣0, 𝑇(𝑣0), …, 𝑇^{ℓ−1}(𝑣0) nonzero and 𝑇^ℓ(𝑣0) = 𝑇^{ℓ+𝑖}(𝑣0) = 0 for all 𝑖 ≥ 0. These vectors are independent, and 𝑇^ℓ(𝑣0) = 0 lies in 𝕂-span{𝑣0, 𝑇(𝑣0), …, 𝑇^{ℓ−1}(𝑣0)}, so ℓ is precisely the exponent of the previous lemma and 𝑊 = 𝕂-span{𝑣0, 𝑇(𝑣0), …, 𝑇^{ℓ−1}(𝑣0)} is the cyclic subspace generated by 𝑣0.

1.2. Fine Structure of Nilpotent Operators

Continuing the discussion of nilpotent operators, observe that if 𝑇 ∶ 𝑉 → 𝑉 is nilpotent and nonzero, the chain of kernels 𝐾𝑖 = ker(𝑇^𝑖),

{0} = 𝐾0 ⫋ 𝐾1 = ker(𝑇) ⫋ 𝐾2 ⫋ ⋯ ⫋ 𝐾𝑑 = 𝑉  (𝑑 = deg(𝑇)),

terminates at 𝑉 in finitely many steps. The difference sets partition 𝑉 ∼ (0) into disjoint “layers” 𝑉 ∼ (0) = (𝐾𝑑 ∼ 𝐾𝑑−1 ) ∪ ⋯ ∪ (𝐾𝑖 ∼ 𝐾𝑖−1 ) ∪ ⋯ ∪ (𝐾1 ∼ 𝐾0 ) where 𝐾0 = (0). The layers 𝐾𝑖 ∼ 𝐾𝑖−1 correspond to the quotient spaces 𝐾𝑖 /𝐾𝑖−1 , and by examining the action of 𝑇 on these quotients, we will determine the structure of the operator 𝑇. Exercise 1.14. If 𝑣0 is in the “top layer” 𝑉 ∼ 𝐾𝑑−1 , prove that the subspace 𝑊 spanned by {𝑇 𝑗 (𝑣0 ) ∶ 𝑗 ≥ 0} has dimension 𝑑 and that every 𝑣0 in this layer is a cyclic vector under the action of 𝑇 on 𝑊. Remark 1.15. Since |𝐾𝑑−1 | < |𝐾𝑑 | = |𝑉|, the subspace 𝐾𝑑−1 is a very thin subset of 𝑉 and has “measure zero” in 𝑉 when 𝕂 = ℝ or ℂ. If you could pick a vector 𝑣0 ∈ 𝑉 at random, you would have 𝑣0 ∈ 𝑉 ∼ 𝐾𝑑−1 “with probability 1,” and every such choice of 𝑣0 would generate a cyclic subspace of dimension 𝑑. “Unsuccessful” choices, which occur with “probability zero,” yield cyclic subspaces 𝑊(𝑣0 ) of dimension < 𝑑. ○ We now state the main structure theorem for nilpotent operators.


Theorem 1.16 (Cyclic Subspace Decomposition). Given a nilpotent linear operator 𝑇 ∶ 𝑉 → 𝑉 on a finite-dimensional vector space 𝑉, there is a decomposition 𝑉 = 𝐶1 ⊕ ⋯ ⊕ 𝐶𝑟 into cyclic 𝑇-invariant subspaces. Obviously the restrictions 𝑇𝑖 = 𝑇|𝐶𝑖 are nilpotent, with nilpotence degrees 𝑚𝑖 = dim(𝐶𝑖) = (smallest 𝑚 such that 𝑇^𝑚 kills the cyclic generator 𝑣𝑖 ∈ 𝐶𝑖). Listed in descending order 𝑚1 ≥ 𝑚2 ≥ ⋯ ≥ 𝑚𝑟 > 0 (repeats allowed), these degrees are unique and ∑_{𝑖=1}^{𝑟} 𝑚𝑖 = dim(𝑉).

While it is nice to know such structure exists, it is equally important to develop a constructive procedure for finding suitable cyclic subspaces 𝐶1, …, 𝐶𝑟. This is complicated by the fact that the 𝐶𝑘 are not necessarily unique, unlike the eigenspaces 𝐸𝜆(𝑇) in the diagonalization problem. Any algorithm for constructing suitable 𝐶𝑗 will necessarily involve some arbitrary choices. The rest of this section provides a proof of Theorem 1.16 that yields an explicit construction of the desired subspaces. There are some shorter, more elegant proofs of Theorem 1.16, but they are existential rather than constructive and so are less informative.

Corollary 1.17. If 𝑇 ∶ 𝑉 → 𝑉 is nilpotent, there is a decomposition into cyclic subspaces 𝑉 = 𝐶1 ⊕ ⋯ ⊕ 𝐶𝑟, so there is a basis 𝔛 such that [𝑇]𝔛𝔛 consists of elementary nilpotent blocks on the diagonal:

$$[T]_{\mathfrak{X}\mathfrak{X}} = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_r \end{pmatrix} \qquad\text{with}\qquad B_i = \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}.$$
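Corollary 1.17 can also be checked with a computer algebra system. The sketch below is our own illustration (it assumes SymPy's Matrix.jordan_form is available; the matrix A is an arbitrary example): for a nilpotent matrix, the Jordan form consists exactly of elementary nilpotent blocks.

```python
import sympy as sp

A = sp.Matrix([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])                      # strictly upper triangular, hence nilpotent
P, J = A.jordan_form()  # A = P * J * P**(-1)
print(J)                # one 3x3 and one 1x1 nilpotent Jordan block
```

Since 𝜆 = 0 is the only eigenvalue, every Jordan block of J has zero diagonal, i.e., is an elementary nilpotent matrix.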

We start with the special case in which 𝑇 has the largest possible degree of nilpotence. Lemma 1.18. If 𝑇 is nilpotent and deg(𝑇) = dim(𝑉), there is a cyclic vector in 𝑉 and a basis 𝔛 such that [𝑇]𝔛𝔛 is an elementary nilpotent matrix. Proof. If deg(𝑇) = 𝑑 is equal to dim(𝑉) the spaces 𝐾𝑖 = ker(𝑇 𝑖 ) increase, with |𝐾𝑖+1 | ≥ 1 + |𝐾𝑖 | at each step in the chain {0} ⫋ 𝐾1 ⫋ ⋯ ⫋ 𝐾𝑑−1 ⫋ 𝐾𝑑 = 𝑉, but there are only 𝑑 = dim(𝑉) steps, so we must have |𝐾𝑖+1 | = 1 + |𝐾𝑖 |. If 𝑣0 ∈ 𝑉 ∼ 𝐾𝑑−1 , then 𝑇 𝑑 (𝑣0 ) = 0 and by definition of 𝐾𝑑−1 the vectors 𝑣0 , 𝑇(𝑣0 ), … , 𝑇 𝑑−1 (𝑣0 ) are all nonzero, so 𝑣0 is a cyclic vector for the iterated action of 𝑇. □


If 𝑇 ∶ 𝑉 → 𝑉 is nilpotent of degree 𝑑, the idea behind the proof of Theorem 1.16 is to look at the kernels 𝐾𝑖 = ker(𝑇^𝑖):

𝑉 = 𝐾𝑑 ⫌ 𝐾𝑑−1 ⫌ ⋯ ⫌ 𝐾2 ⫌ 𝐾1 = ker(𝑇) ⫌ {0}.

As the kernels get smaller, more of 𝑉 is "uncovered": the differences 𝑉 ∼ 𝐾𝑠 and the quotients 𝑉/𝐾𝑠 get bigger, and 𝑇|𝐾𝑠 reveals more details about the full action of 𝑇 on 𝑉. It will be important to note that 𝑇(𝐾𝑖) ⊆ 𝐾𝑖−1 (since 0 = 𝑇^𝑖(𝑥) = 𝑇^{𝑖−1}(𝑇(𝑥)) for 𝑥 ∈ 𝐾𝑖, so 𝑇(𝑥) ∈ 𝐾𝑖−1). Furthermore, 𝑥 ∉ 𝐾𝑖 implies that 0 ≠ 𝑇^𝑖(𝑥) = 𝑇^{𝑖−1}(𝑇(𝑥)), so that 𝑇(𝑥) ∉ 𝐾𝑖−1. Therefore,

(1.2)  𝑇 maps (𝐾𝑖+1 ∼ 𝐾𝑖) into (𝐾𝑖 ∼ 𝐾𝑖−1) for all 𝑖.

However, although 𝑇(𝐾𝑖) ⊆ 𝐾𝑖−1, it is not generally true that 𝑇(𝐾𝑖) = 𝐾𝑖−1, so the induced map on quotient spaces 𝑇̃ ∶ 𝐾𝑖+1/𝐾𝑖 → 𝐾𝑖/𝐾𝑖−1 need not be surjective.

Definition 1.19. Let 𝑇 ∶ 𝑉 → 𝑉 be an arbitrary linear map and 𝑊 a 𝑇-invariant subspace. We say that vectors 𝑒1, …, 𝑒𝑚 in 𝑉

1. are independent (mod 𝑊) if their images in 𝑉/𝑊 are linearly independent. Since ∑_{𝑖=1}^{𝑚} 𝑐𝑖𝑒𝑖 = 0 in 𝑉/𝑊 if and only if ∑_{𝑖=1}^{𝑚} 𝑐𝑖𝑒𝑖 is in 𝑊, that is the same as saying

$$\sum_{i=1}^{m} c_i e_i \in W \;\Longrightarrow\; c_1 = \cdots = c_m = 0 \qquad (c_i \in \mathbb{K}).$$

2. span 𝑉 (mod 𝑊) if the images span 𝑉/𝑊, which means that given 𝑣 in 𝑉, there are 𝑐𝑖 ∈ 𝕂 such that (𝑣 − ∑_{𝑖=1}^{𝑚} 𝑐𝑖𝑒𝑖) ∈ 𝑊, i.e., 𝑣 = ∑_{𝑖=1}^{𝑚} 𝑐𝑖𝑒𝑖 in 𝑉/𝑊.

3. are a basis for 𝑉 (mod 𝑊) if the images are a basis in 𝑉/𝑊, which happens if and only if (1) and (2) hold.

Exercise 1.20. Let 𝑊 ⊆ ℝ^5 be the solution set of the system

𝑥1 + 𝑥3 + 2𝑥4 = 0
𝑥1 − 𝑥4 + 2𝑥5 = 0,

and let {𝑒𝑖 } be the standard basis in 𝑉 = ℝ5 . (a) Find vectors 𝑣1 , 𝑣2 that are a basis for 𝑉 (mod 𝑊). (b) If 𝑣1 , 𝑣2 are the vectors in (a), is 𝔛 = {𝑒1 , 𝑒2 , 𝑒3 , 𝑣1 , 𝑣2 } a basis for 𝑉? (c) Find a basis {𝑓1 , 𝑓2 , 𝑓3 } for the subspace 𝑊. Exercise 1.21. Let 𝑇 ∶ 𝑉 → 𝑉 be an arbitrary linear map and let 𝑊 be a 𝑇-invariant subspace. Explain why independence of vectors 𝑓1 , … , 𝑓𝑟 mod a 𝑇-invariant subspace 𝑊 ⊆ 𝑉 implies their independence (mod 𝑊 ′ ) for any smaller 𝑇-invariant subspace 𝑊 ′ ⊆ 𝑊 ⊆ 𝑉. Proof. (Proof of Theorem 1.16.) Below we will construct two related sets of vectors ℱ1 , ℱ2 , ℱ3 , … and ℰ1 = ℱ1 ⊆ ℰ2 ⊆ ℰ3 ⊆ ⋯ ⊆ ℰ𝑟 such that ℰ𝑟 is a basis for 𝑉 aligned with the kernels 𝐾𝑑 = 𝑉 ⊇ 𝐾𝑑−1 ⊇ ⋯ ⊇ 𝐾1 = ker(𝑇) ⊇ {0}.


When the construction terminates, the vectors in ℰ𝑟 will be a basis for all of 𝑉 that provides the desired decomposition into cyclic subspaces.

(Initial) Step 1: Let ℱ1 = ℰ1 be a set of vectors {𝑒𝑖 ∶ 𝑖 ∈ 𝐼1} ⊆ 𝑉 ∼ 𝐾𝑑−1 that are a basis for 𝑉 (mod 𝐾𝑑−1), so their images are a basis in 𝑉/𝐾𝑑−1. Obviously the index set 𝐼1 has cardinality |𝐼1| = |𝑉/𝐾𝑑−1| = |𝑉| − |𝐾𝑑−1|, the dimension of the quotient space. You might feel more comfortable indicating the index sets 𝐼1, 𝐼2, … being constructed here as consecutive blocks of integers, say 𝐼1 = {1, 2, …, 𝑠1}, 𝐼2 = {𝑠1 + 1, …, 𝑠2}, etc., but this notation becomes really cumbersome after the first two steps. And in fact there is no need to explicitly name the indices in each block. From here on, you should refer to the chart shown in Figure 1.1, which lists all the players that will emerge in our discussion.

Step 2: The 𝑇-images 𝑇(ℱ1) lie in the layer 𝑇(𝑉 ∼ 𝐾𝑑−1) ⊆ 𝐾𝑑−1 ∼ 𝐾𝑑−2, as noted in (1.2). In this step we shall verify two assertions.

Claim (i): The vectors 𝑇(ℱ1) = {𝑇(𝑒𝑖) ∶ 𝑖 ∈ 𝐼1} ⊆ 𝐾𝑑−1 ∼ 𝐾𝑑−2 are independent (mod 𝐾𝑑−2). If these vectors are not already representatives of a basis for 𝐾𝑑−1/𝐾𝑑−2, we can adjoin additional vectors ℱ2 = {𝑒𝑖 ∶ 𝑖 ∈ 𝐼2} ⊆ 𝐾𝑑−1 ∼ 𝐾𝑑−2 chosen so that 𝑇(ℱ1) ∪ ℱ2 corresponds to a basis for 𝐾𝑑−1/𝐾𝑑−2; otherwise we take ℱ2 = ∅.

Claim (ii): The vectors ℰ2 = ℱ2 ∪ [ℰ1 ∪ 𝑇(ℱ1)] = ℰ1 ∪ [𝑇(ℱ1) ∪ ℱ2] are a basis for all of 𝑉 (mod 𝐾𝑑−2).

Remark 1.22. In Chapter 1 of LA I we saw that if {𝑒1, …, 𝑒𝑟} is a basis for 𝑊, we can adjoin successive "outside vectors" 𝑒𝑟+1, ⋯, 𝑒𝑛 to get a basis for 𝑉. (These can even be found by deleting some of the vectors in a preordained basis in 𝑉.) Then the images {𝑒𝑟+1, …, 𝑒𝑛} are a basis for the quotient space 𝑉/𝑊, as in the proof of the dimension formula |𝑉| = |𝑊| + |𝑉/𝑊| for finite-dimensional 𝑉 (Theorem 1.79 in LA I). ○

Proof. (Proof of Claim (i)). If there is a relation

$$\sum_{i\in I_1} a_i\, T(e_i) = T\Big(\sum_{i\in I_1} a_i e_i\Big) \equiv 0 \pmod{K_{d-2}},$$

then ∑_{𝑖∈𝐼1} 𝑎𝑖𝑒𝑖 is in 𝐾𝑑−2 and also lies in the larger space 𝐾𝑑−1 ⊇ 𝐾𝑑−2. But by definition, vectors in ℱ1 = {𝑒𝑖 ∶ 𝑖 ∈ 𝐼1} are independent (mod 𝐾𝑑−1), so we must have 𝑎𝑖 = 0 for 𝑖 ∈ 𝐼1, proving independence (mod 𝐾𝑑−2) of the vectors in 𝑇(ℱ1). □

Proof. (Proof of Claim (ii)). Suppose 𝑎𝑖^{(1)}, 𝑎𝑖^{(2)}, 𝑏𝑖 ∈ 𝕂 are coefficients such that

$$(1.3)\qquad \sum_{i\in I_1} a_i^{(1)} e_i + \sum_{i\in I_1} b_i\, T(e_i) + \sum_{i\in I_2} a_i^{(2)} e_i \equiv 0 \pmod{K_{d-2}}.$$

[Figure 1.1 is a 𝑑 × 𝑑 chart, not reproduced here. Its rows list the families ℱ1; 𝑇(ℱ1) ∪ ℱ2; 𝑇²(ℱ1) ∪ 𝑇(ℱ2) ∪ ℱ3; and so on, aligned with the chain of kernels 𝐾𝑑 = 𝑉 ⊇ 𝐾𝑑−1 ⊇ ⋯ ⊇ 𝐾1 = ker(𝑇) ⊇ 𝐾0 = (0).]

Figure 1.1. Steps in the construction of a basis that decomposes vector space 𝑉 into cyclic subspaces under the action of a nilpotent linear operator 𝑇 ∶ 𝑉 → 𝑉. The subspaces 𝐾𝑖 are the kernels of the powers 𝑇^𝑖 for 1 ≤ 𝑖 ≤ 𝑑 = deg(𝑇), with 𝐾𝑑 = 𝑉 and 𝐾0 = (0).


This sum lies in 𝐾𝑑−2, hence also in the larger subspace 𝐾𝑑−1, and the last two terms are already in 𝐾𝑑−1 because ℱ2 ∪ 𝑇(ℱ1) ⊆ 𝐾𝑑−1 ∼ 𝐾𝑑−2. Thus

$$\sum_{i\in I_1} a_i^{(1)} e_i \equiv 0 \pmod{K_{d-1}},$$

and since {𝑒𝑖 ∶ 𝑖 ∈ 𝐼1} are independent (mod 𝐾𝑑−1), we must have 𝑎𝑖^{(1)} = 0 for all 𝑖 ∈ 𝐼1. Now the sum (1.3) reduces to its last two terms, which all lie in 𝐾𝑑−1. But by construction, ℱ2 ∪ 𝑇(ℱ1) is a basis for 𝐾𝑑−1 (mod 𝐾𝑑−2), which implies 𝑎𝑖^{(2)} = 0 for 𝑖 ∈ 𝐼2 and 𝑏𝑖 = 0 for 𝑖 ∈ 𝐼1. Thus ℰ2 = ℱ1 ∪ [𝑇(ℱ1) ∪ ℱ2] = [ℱ1 ∪ 𝑇(ℱ1)] ∪ ℱ2 ⊆ 𝑉 is an independent set of vectors (mod 𝐾𝑑−2).

It remains to show ℰ2 spans 𝑉 (mod 𝐾𝑑−2). If 𝑣 ∈ 𝑉 is not contained in 𝐾𝑑−1, then 𝑣 − 𝑣1 ≡ 0 (mod 𝐾𝑑−1) for some 𝑣1 ∈ 𝕂-span{ℱ1}, so 𝑣 − 𝑣1 ∈ 𝐾𝑑−1. If this difference lies outside of 𝐾𝑑−2, we can find some 𝑣2 ∈ 𝕂-span{𝑇(ℱ1) ∪ ℱ2} such that 𝑣 − (𝑣1 + 𝑣2) ∈ 𝐾𝑑−2, because 𝑇(ℱ1) ∪ ℱ2 is a basis for 𝐾𝑑−1/𝐾𝑑−2. Thus 𝑣 ≡ 𝑣1 + 𝑣2 (mod 𝐾𝑑−2), and since 𝑣1 + 𝑣2 ∈ 𝕂-span{ℱ1 ∪ 𝑇(ℱ1) ∪ ℱ2}, statement (ii) is proved. □

That completes Step 2, which fills in the second row of Figure 1.1. Further inductive steps that fill in the remaining rows involve no new ideas, but things can get out of hand unless the notation is managed carefully. Below we spell out the general inductive step in this process, which could be skipped on first reading. It is followed by a final paragraph proving uniqueness of the multiplicities 𝑚𝑖 (which you should read). □

The General Inductive Step in Theorem 1.16. With the chart from Figure 1.1 in hand to keep track of the players, we continue the inductive construction of basis vectors. At step 𝑟, we have defined sets ℱ𝑖 ⊆ 𝐾𝑑−𝑖+1 ∼ 𝐾𝑑−𝑖 for 1 ≤ 𝑖 ≤ 𝑟 such that ℰ1 = ℱ1 and

ℰ𝑟 = ℰ𝑟−1 ∪ [𝑇^{𝑟−1}(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑟−1) ∪ ℱ𝑟]

is a basis for 𝑉/𝐾𝑑−𝑟. At the next step, we take the vectors 𝑇^{𝑟−1}(ℱ1) ∪ 𝑇^{𝑟−2}(ℱ2) ∪ ⋯ ∪ ℱ𝑟 ⊆ 𝐾𝑑−𝑟+1 ∼ 𝐾𝑑−𝑟 created in the previous steps and form their 𝑇-images 𝑇^𝑟(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑟) ⊆ 𝐾𝑑−𝑟 ∼ 𝐾𝑑−𝑟−1. To complete the inductive step we show the following:

1. These vectors in 𝐾𝑑−𝑟 are independent (mod 𝐾𝑑−𝑟−1).
2.
When we adjoin additional vectors ℱ𝑟+1 ⊆ 𝐾𝑑−𝑟 ∼ 𝐾𝑑−𝑟−1 as needed to produce a basis for 𝐾𝑑−𝑟 /𝐾𝑑−𝑟−1 , taking ℱ𝑟+1 = ∅ if the vectors 𝑇 𝑟 (ℱ1 ) ∪ ⋯ ∪ 𝑇(ℱ𝑟 ) are already basis representatives for 𝐾𝑑−𝑟 /𝐾𝑑−𝑟−1 , then ℰ𝑟+1 = ℰ𝑟 ∪ [𝑇 𝑟 (ℱ1 ) ∪ ⋯ ∪ 𝑇(ℱ𝑟 ) ∪ ℱ𝑟+1 ] is a basis for 𝑉 (mod 𝐾𝑑−𝑟−1 ). When this process terminates, the final set ℰ𝑑 is a basis for all of 𝑉.


Proof. Step 1. If the vectors 𝑇^𝑟(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑟) are not representatives for independent vectors in 𝐾𝑑−𝑟/𝐾𝑑−𝑟−1, there would be sets of coefficients

{𝑐𝑖^{(1)} ∶ 𝑖 ∈ 𝐼1}, …, {𝑐𝑖^{(𝑟)} ∶ 𝑖 ∈ 𝐼𝑟}

such that

$$(1.4)\qquad \sum_{i\in I_1} c_i^{(1)}\, T^r(e_i) + \cdots + \sum_{i\in I_r} c_i^{(r)}\, T(e_i) \equiv 0 \pmod{K_{d-r-1}}.$$

This sum is in 𝐾𝑑−𝑟−1 and also in the larger space 𝐾𝑑−𝑟. But the vectors 𝑇^{𝑟−1}{𝑒𝑖 ∶ 𝑖 ∈ 𝐼1} ∪ ⋯ ∪ {𝑒𝑖 ∶ 𝑖 ∈ 𝐼𝑟} are independent (mod 𝐾𝑑−𝑟) by hypothesis and are representatives for a basis in 𝐾𝑑−𝑟+1/𝐾𝑑−𝑟. We may rewrite the congruence in (1.4) as

$$T\Big( \sum_{i\in I_1} c_i^{(1)}\, T^{r-1}(e_i) + \cdots + \sum_{i\in I_r} c_i^{(r)}\, e_i \Big) \equiv 0 \pmod{K_{d-r-1}}.$$

Thus 𝑇(⋯) ∈ 𝐾𝑑−𝑟−1, and (⋯) ∈ 𝐾𝑑−𝑟 too. By (mod 𝐾𝑑−𝑟) independence of the 𝑒𝑖, we must have 𝑐𝑖^{(𝑗)} = 0 in 𝕂 for all 𝑖, 𝑗, so the vectors 𝑇^𝑟(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑟) are independent (mod 𝐾𝑑−𝑟−1) as claimed.

Step 2. To verify independence of the updated set of vectors ℰ𝑟+1 = ℰ𝑟 ∪ [𝑇^𝑟(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑟) ∪ ℱ𝑟+1] in 𝑉/𝐾𝑑−𝑟−1, suppose some linear combination 𝑆 = 𝑆′ + 𝑆″ is zero (mod 𝐾𝑑−𝑟−1), where 𝑆′ is a sum over vectors in ℰ𝑟 and 𝑆″ a sum over vectors in 𝑇^𝑟(ℱ1) ∪ ⋯ ∪ ℱ𝑟+1. Then 𝑆 ≡ 0 (mod 𝐾𝑑−𝑟−1) implies 𝑆 ≡ 0 (mod 𝐾𝑑−𝑟), and then by independence of vectors in ℰ𝑟 (mod 𝐾𝑑−𝑟), all coefficients appearing in 𝑆′ are zero. The remaining term 𝑆″ in the reduced sum lies in 𝐾𝑑−𝑟 ∼ 𝐾𝑑−𝑟−1, and by independence of the set of vectors 𝑇^𝑟(ℱ1) ∪ ⋯ ∪ ℱ𝑟+1 in 𝐾𝑑−𝑟/𝐾𝑑−𝑟−1, all coefficients in 𝑆″ are also zero. Thus ℰ𝑟+1 ⊆ 𝑉 corresponds to an independent set in 𝑉/𝐾𝑑−𝑟−1.

Dimension counting reveals that

|𝑉/𝐾𝑑−1| = |ℱ1|,
|𝐾𝑑−1/𝐾𝑑−2| = |𝑇(ℱ1)| + |ℱ2| = |ℱ1| + |ℱ2|,
⋮
|𝐾𝑑−𝑟/𝐾𝑑−𝑟−1| = |ℱ1| + ⋯ + |ℱ𝑟+1|.

Thus |𝑉/𝐾𝑑−𝑟−1| = |𝑉/𝐾𝑑−1| + ⋯ + |𝐾𝑑−𝑟/𝐾𝑑−𝑟−1| is precisely the number |ℰ𝑟+1| of basis vectors appearing in the first 𝑟 + 1 rows from the top of the chart in Figure 1.1. But this is also equal to dim(𝑉/𝐾𝑑−𝑟−1), so ℰ𝑟+1 is a basis for 𝑉/𝐾𝑑−𝑟−1 and Step (𝑟 + 1) of the induction is complete. That also completes the proof of Theorem 1.16. □


Cyclic Subspace Decompositions. A direct sum decomposition of 𝑉 into cyclic subspaces can now be read out of the chart in Figure 1.1, in which basis vectors have been constructed row-by-row. For this, consider what happens when we partition into columns. For each 𝑒𝑖 ∈ ℱ1 (𝑖 ∈ 𝐼1), we have 𝑒𝑖, 𝑇(𝑒𝑖), 𝑇²(𝑒𝑖), …, 𝑇^{𝑑−1}(𝑒𝑖) ≠ 0 and 𝑇^𝑑(𝑒𝑖) = 0, so these vectors span a cyclic subspace 𝐸(𝑒𝑖) such that 𝑇|𝐸(𝑒𝑖) has nilpotence degree 𝑑 with 𝑒𝑖 as its cyclic vector. Since the vectors that span 𝐸(𝑒𝑖) are part of a basis ℰ𝑑 for all of 𝑉, we obtain a direct sum of cyclic 𝑇-invariant subspaces ⨁_{𝑖∈𝐼1} 𝐸(𝑒𝑖) ⊆ 𝑉 with |𝐼1| = |ℱ1| summands. Vectors 𝑒𝑖 ∈ ℱ2 (𝑖 ∈ 𝐼2) generate cyclic subspaces 𝐸(𝑒𝑖) with dim(𝐸(𝑒𝑖)) = deg(𝑇|𝐸(𝑒𝑖)) = 𝑑 − 1; these become part of

$$\bigoplus_{i\in I_1} E(e_i) \;\oplus\; \bigoplus_{i\in I_2} E(e_i),$$

etc. (Of course ⨁_{𝑖∈𝐼2} 𝐸(𝑒𝑖) = (0) when 𝐼2 = ∅.) In the last step, the vectors 𝑒𝑖 ∈ ℱ𝑑 (𝑖 ∈ 𝐼𝑑) determine 𝑇-invariant one-dimensional cyclic spaces such that 𝑇(𝕂𝑒𝑖) = (0), with nilpotence degree 1; i.e., for 𝑖 ∈ 𝐼𝑑, the spaces 𝐸(𝑒𝑖) = 𝕂𝑒𝑖 all lie within ker(𝑇). The end result is a cyclic subspace decomposition

$$(1.5)\qquad V = \Big(\bigoplus_{i_1\in I_1} E(e_{i_1})\Big) \oplus \Big(\bigoplus_{i_2\in I_2} E(e_{i_2})\Big) \oplus \cdots \oplus \Big(\bigoplus_{i_d\in I_d} E(e_{i_d})\Big)$$

of the entire space 𝑉, since all basis vectors in ℰ𝑑 are accounted for. (Various summands in (1.5) may of course be trivial.)

Uniqueness of Cyclic Subspace Decomposition. A direct sum decomposition 𝑉 = ⨁_{𝑗=1}^{𝑠} 𝐸𝑗 into 𝑇-invariant cyclic subspaces can be refined by gathering together those 𝐸𝑖 of the same dimension, writing

$$V = \bigoplus_{k=1}^{d} \mathcal{H}_k \qquad\text{where}\qquad \mathcal{H}_k = \bigoplus \{\, E_i : \dim(E_i) = \deg(T|_{E_i}) = k \,\}$$

for 1 ≤ 𝑘 ≤ 𝑑 = deg(𝑇). The action of 𝑇 is the same on each summand in ℋ𝑘.

Proposition 1.23. In any direct sum decomposition 𝑉 = ⨁_{𝑗=1}^{𝑠} 𝐸𝑗 into cyclic 𝑇-invariant subspaces, the number of subspaces of dimension dim(𝐸𝑖) = 𝑘, 1 ≤ 𝑘 ≤ 𝑑 = deg(𝑇), can be computed in terms of the dimensions of the quotients 𝐾𝑖/𝐾𝑖−1. These numbers are then the same for all cyclic decompositions.

Proof. Let us regard Figure 1.1 as a 𝑑 × 𝑑 array of "cells" with 𝐶𝑖𝑗 the cell in Row𝑖 (from the top) and Col𝑗 (from the left); the "size" |𝐶𝑖𝑗| of a cell is the number of basis vectors it contains. Note that

1. |𝐶𝑖𝑗| = 0 if the cell lies above the diagonal, with 𝑗 > 𝑖, because those cells are empty (others may be empty too).
2. |𝐶𝑖𝑗| = |ℱ𝑗| for all cells on and below the diagonal in Col𝑗 of the array. In particular, |𝐶𝑗1| = |ℱ1| for all nonempty cells in Col1, |𝐶𝑗2| = |ℱ2| for those in Col2, etc.


By our construction, it is evident that the vectors in the nonempty cells of Row𝑟 of Figure 1.1 correspond to a basis for the quotient space 𝐾𝑑−𝑟+1/𝐾𝑑−𝑟. Counting the total number of basis vectors in Row𝑟, we find that

dim(𝐾𝑑−𝑟+1/𝐾𝑑−𝑟) = |𝐶𝑟,1| + ⋯ + |𝐶𝑟,𝑟| = |ℱ1| + ⋯ + |ℱ𝑟|.

We may now recursively compute the values of |𝐶𝑟𝑗| and |ℱ𝑗| from the dimensions of the quotient spaces 𝐾𝑖/𝐾𝑖−1. But as noted above, each 𝑒𝑖 ∈ ℱ𝑘 lies in the diagonal cell 𝐶𝑘𝑘 and generates a distinct cyclic space in the decomposition. That completes the proof of Proposition 1.23. □

Remark 1.24. To summarize,

1. We define 𝐾𝑖 = ker(𝑇^𝑖) for 1 ≤ 𝑖 ≤ 𝑑 = nilpotence degree of 𝑇.
2. The following relations hold:
ℰ1 = ℱ1 ⊆ 𝑉 ∼ 𝐾𝑑−1 is a basis for 𝑉/𝐾𝑑−1,
ℰ2 = ℰ1 ∪ [𝑇(ℱ1) ∪ ℱ2] ⊆ 𝑉 ∼ 𝐾𝑑−2 is a basis for 𝑉/𝐾𝑑−2,
⋮
ℰ𝑟+1 = ℰ𝑟 ∪ [𝑇^𝑟(ℱ1) ∪ 𝑇^{𝑟−1}(ℱ2) ∪ ⋯ ∪ ℱ𝑟+1] ⊆ 𝑉 ∼ 𝐾𝑑−𝑟−1 is a basis for 𝑉/𝐾𝑑−𝑟−1,
⋮
ℰ𝑑 = ℰ𝑑−1 ∪ [𝑇^{𝑑−1}(ℱ1) ∪ ⋯ ∪ 𝑇(ℱ𝑑−1) ∪ ℱ𝑑] is a basis for all of 𝑉. ○

In working examples, it usually helps to start by determining a basis ℬ(0) = ℬ(1) ∪ ⋯ ∪ ℬ(𝑑) for 𝑉 aligned with the kernels so that ℬ(1) is a basis for 𝐾1, ℬ(2) determines a basis for 𝐾2/𝐾1, etc. This yields a convenient basis in 𝑉 to start the construction.
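Proposition 1.23 gives a practical recipe: the multiplicities are determined by the numbers |𝐾𝑖| = |𝑉| − rank(𝑇^𝑖) alone. Here is a sketch (not from the text; the helper names and the test matrix are our own) that recovers the number of cyclic blocks of each size from ranks of powers:

```python
import numpy as np

def elem_nilpotent(m):
    """m x m elementary nilpotent block: 1's on the superdiagonal."""
    return np.eye(m, k=1)

def block_counts(T, d):
    """Number of cyclic blocks of each size k = 1..d, computed from
    rank(T^i) (equivalently from dim K_i = n - rank(T^i))."""
    n = T.shape[0]
    rank = [n] + [int(np.linalg.matrix_rank(np.linalg.matrix_power(T, i)))
                  for i in range(1, d + 2)]
    # blocks of size exactly k: rank(T^{k-1}) - 2*rank(T^k) + rank(T^{k+1})
    return {k: rank[k - 1] - 2 * rank[k] + rank[k + 1]
            for k in range(1, d + 1)}

# Assemble a nilpotent T with known block sizes 3, 2, 2, 1 and recover them.
sizes = [3, 2, 2, 1]
n = sum(sizes)
T = np.zeros((n, n))
pos = 0
for m in sizes:
    T[pos:pos + m, pos:pos + m] = elem_nilpotent(m)
    pos += m

print(block_counts(T, d=3))   # → {1: 1, 2: 2, 3: 1}
```

The counting formula is just the difference of the quotient dimensions |𝐾𝑘/𝐾𝑘−1| − |𝐾𝑘+1/𝐾𝑘| in rank form.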

Example 1.25. Let 𝑉 = 𝕂5 and 𝑇 ∶ 𝑉 → 𝑉 the operator 𝑇(𝑥1 , … , 𝑥5 ) = (0, 𝑥3 + 𝑥4 , 0, 𝑥3 , 𝑥1 + 𝑥4 ) whose matrix with respect to the standard basis 𝔛 = {𝑒1 , … , 𝑒5 } in 𝕂5 is

𝐴 = [𝑇]𝔛𝔛 =
⎛ 0 0 0 0 0 ⎞
⎜ 0 0 1 1 0 ⎟
⎜ 0 0 0 0 0 ⎟
⎜ 0 0 1 0 0 ⎟
⎝ 1 0 0 1 0 ⎠.

Show that 𝑇 is nilpotent, and then determine deg(𝑇) and the kernels 𝑉 = 𝐾𝑑 ⊇ 𝐾𝑑−1 ⊇ ⋯ ⊇ 𝐾1 ⊇ {0}. Find a basis 𝔜 such that [𝑇]𝔜𝔜 has block diagonal form, with each block 𝐵𝑖 an elementary nilpotent matrix. This is the Jordan canonical form for a nilpotent linear operator.

18

1. GENERALIZED EIGENSPACES AND THE JORDAN DECOMPOSITION

Discussion. First find bases for the kernels 𝐾𝑖 = ker(𝑇^𝑖). We have

𝐾1 = ker(𝑇) = {𝐱 ∶ 𝑥3 + 𝑥4 = 0, 𝑥3 = 0, 𝑥1 + 𝑥4 = 0} = {𝐱 ∶ 𝑥1 = 𝑥3 = 𝑥4 = 0}
   = {(0, 𝑥2, 0, 0, 𝑥5) ∶ 𝑥2, 𝑥5 ∈ 𝕂} = 𝕂-span{𝑒2, 𝑒5}.

Iteration of 𝑇 sends 𝐱 to

𝑇(𝐱) = (0, 𝑥3 + 𝑥4, 0, 𝑥3, 𝑥1 + 𝑥4),
𝑇^2(𝐱) = 𝑇(𝑇(𝐱)) = (0, 𝑥3, 0, 0, 𝑥3),
𝑇^3(𝐱) = (0, …, 0)

for 𝐱 ∈ 𝕂^5. Clearly 𝑇 is nilpotent with deg(𝑇) = 3, and

|𝐾1| = 2 ∶ 𝐾1 = 𝕂-span{𝑒2, 𝑒5} = {𝐱 ∶ 𝑥1 = 𝑥3 = 𝑥4 = 0},
|𝐾2| = 4 ∶ 𝐾2 = ker(𝑇^2) = {𝐱 ∶ 𝑥3 = 0} = 𝕂-span{𝑒1, 𝑒2, 𝑒4, 𝑒5},
|𝐾3| = 5 ∶ 𝐾3 = 𝕂^5.

In this example, 𝔛 = {𝑒2, 𝑒5; 𝑒1, 𝑒4; 𝑒3} = ℬ^{(1)} ∪ ℬ^{(2)} ∪ ℬ^{(3)} is an ordered basis for 𝑉 aligned with the kernels, running through {0} ⊆ 𝐾1 ⊆ 𝐾2 ⊆ 𝐾3 = 𝑉. From this we can determine the families ℱ1, ℱ2, ℱ3 of Theorem 1.16.

Step 1: Since |𝐾3/𝐾2| = 1, any 𝐱 ≠ 𝟎 in the layer 𝐾3 ∼ 𝐾2 = {𝐱 ∶ 𝑥3 ≠ 0} yields a basis vector for 𝐾3/𝐾2. We shall take ℱ1 = {𝑒3}, chosen from the standard basis 𝔛, and then ℰ1 = {𝑒3} too. (Any 𝐱 with 𝑥3 ≠ 0 would also work.)

Step 2: The image set 𝑇(ℱ1), consisting of the single vector 𝑇(𝑒3) = 𝑒2 + 𝑒4, lies in the next layer

𝐾2 ∼ 𝐾1 = {𝐱 ∶ 𝑥3 = 0} ∼ 𝕂-span{𝑒2, 𝑒5} = {𝐱 ∶ 𝑥3 = 0 and 𝑥1, 𝑥4 are not both = 0}.

Since |𝑇(ℱ1)| = 1 and dim(𝐾2/𝐾1) = |𝐾2| − |𝐾1| = 4 − 2 = 2, we must adjoin one suitably chosen new vector 𝐱 from the layer 𝐾2 ∼ 𝐾1 to 𝑇(ℱ1) to get representatives for the desired basis in 𝐾2/𝐾1. Then ℱ2 = {𝐱}, and ℰ2 = (ℱ1 ∪ 𝑇(ℱ1)) ∪ ℱ2 = {𝑒3, 𝑒2 + 𝑒4, 𝐱} is a basis for 𝑉/𝐾1, as in the first inductive step of Theorem 1.16.

A suitable vector 𝐱 = (𝑥1, …, 𝑥5) in 𝐾2 ∼ 𝐾1 must have 𝑥3 = 0 (so 𝐱 ∈ 𝐾2) and 𝑥1, 𝑥3, 𝑥4 not all zero (so 𝐱 ∉ 𝐾1). This holds if and only if 𝑥3 = 0 and 𝑥1, 𝑥4 are not both 0. But we must also ensure that our choice of 𝐱 makes {𝑒3, 𝑒2 + 𝑒4, 𝐱} independent (mod 𝐾1). The following lemma is helpful.

Lemma 1.26. Let 𝑉 = 𝕂^𝑛, 𝑊 a subspace, 𝔛 = {𝑣1, …, 𝑣𝑟} independent vectors in 𝑉, and 𝔜 = {𝑤1, …, 𝑤𝑛−𝑟} a basis for 𝑊. If 𝑀 = 𝕂-span{𝑣1, …, 𝑣𝑟}, the following assertions are equivalent:
1. 𝔛 is a set of representatives for a basis in 𝑉/𝑊.
2. 𝔜 ∪ 𝔛 = {𝑤1, …, 𝑤𝑛−𝑟, 𝑣1, …, 𝑣𝑟} is a basis for 𝑉.
3. 𝑉 = 𝑊 ⊕ 𝑀 (a direct sum of subspaces).
In every case, we have 𝑟 = |𝑉/𝑊|.


Proof. In LA I, we showed that the images 𝑣1, …, 𝑣𝑟 are a basis for 𝑉/𝑊 ⇔ {𝑣1, …, 𝑣𝑟} ∪ 𝔜 is a basis for 𝑉. Obviously (2) ⇔ (3). □

Corollary 1.27. In the setting of Theorem 1.79 in LA I, the "outside vectors" 𝑣1, …, 𝑣𝑟 ∉ 𝑊 are representatives for a basis in 𝑉/𝑊 if and only if the 𝑛 × 𝑛 matrix 𝐴 whose rows are 𝑅1 = 𝑣1, …, 𝑅𝑟 = 𝑣𝑟, 𝑅𝑟+1 = 𝑤1, …, 𝑅𝑛 = 𝑤𝑛−𝑟 has rank(𝐴) = 𝑛.

Armed with this observation (and the known basis {𝑒2, 𝑒5} for 𝐾1), we seek a vector 𝐱 = (𝑥1, …, 𝑥5) such that (i) 𝐱 ∈ 𝐾2 (so 𝑥3 = 0), (ii) 𝑥1, 𝑥4 are not both equal to 0, and (iii) the matrix with rows 𝑒3, 𝑒2 + 𝑒4, 𝐱 = (𝑥1, 𝑥2, 0, 𝑥4, 𝑥5), 𝑒2, 𝑒5,

𝐴 =
⎛ 0  0  1  0  0 ⎞
⎜ 0  1  0  1  0 ⎟
⎜ 𝑥1 𝑥2 0  𝑥4 𝑥5 ⎟
⎜ 0  1  0  0  0 ⎟
⎝ 0  0  0  0  1 ⎠,

has rank(𝐴) = 5. Symbolic row operations put this into the form

⎛ 𝑥1 𝑥2 0  𝑥4 𝑥5 ⎞
⎜ 0  1  0  0  0 ⎟
⎜ 0  0  1  0  0 ⎟
⎜ 0  0  0  1  0 ⎟
⎝ 0  0  0  0  1 ⎠,

which has rank = 5 if and only if 𝑥1 ≠ 0. Thus we may take 𝑒1 as the additional vector we seek, and then

ℱ1 = {𝑒3},   𝑇(ℱ1) = {𝑒2 + 𝑒4},   ℱ2 = {𝑒1},

and ℰ2 = [ℱ1 ∪ 𝑇(ℱ1)] ∪ ℱ2. That completes Step 2. (Actually, any 𝐱 with 𝑥1 ≠ 0, 𝑥3 = 0 would work.)

Step 3: In the next layer 𝐾1 ∼ 𝐾0, we have the vectors

𝑇^2(ℱ1) = {𝑇^2(𝑒3) = 𝑇(𝑒2 + 𝑒4) = 𝑒2 + 𝑒5}   and   𝑇(ℱ2) = {𝑇(𝑒1)} = {𝑒5}.

Since |𝐾1/𝐾0| = |𝐾1| = 2, there is no need to adjoin additional vectors from this layer, so ℱ3 = ∅. A suitable basis in 𝑉 is

ℰ3 = ℱ1 ∪ [𝑇(ℱ1) ∪ ℱ2] ∪ [𝑇^2(ℱ1) ∪ 𝑇(ℱ2)] = {𝑒3; 𝑒2 + 𝑒4, 𝑒1; 𝑒2 + 𝑒5, 𝑒5}.

The diagram in Figure 1.1 takes the form

ℱ1 = {𝑒1^(1) = 𝑒3}
𝑇(ℱ1) = {𝑒2^(1) = 𝑒2 + 𝑒4},   ℱ2 = {𝑒1^(2) = 𝑒1}
𝑇^2(ℱ1) = {𝑒3^(1) = 𝑒2 + 𝑒5},   𝑇(ℱ2) = {𝑒2^(2) = 𝑒5},

and the iterated action of 𝑇 sends

𝑒3 → 𝑇(𝑒3) = 𝑒2 + 𝑒4 → 𝑇^2(𝑒3) = 𝑒2 + 𝑒5 → 0   and   𝑒1 → 𝑇(𝑒1) = 𝑒5 → 0.
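The hand computation in Example 1.25 is easy to double-check numerically. In this sketch (ours, not the text's) each cyclic basis is ordered as {𝑇²𝑣, 𝑇𝑣, 𝑣} so that the 1's land on the superdiagonal of each block:

```python
import numpy as np

# The matrix A = [T] of Example 1.25 in the standard basis
A = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
], dtype=float)

# T is nilpotent of degree 3: A^3 = 0 but A^2 != 0
assert np.allclose(np.linalg.matrix_power(A, 3), 0)
assert not np.allclose(np.linalg.matrix_power(A, 2), 0)

# Kernel dimensions |K_1| = 2, |K_2| = 4, as computed in the text
assert 5 - np.linalg.matrix_rank(A) == 2
assert 5 - np.linalg.matrix_rank(A @ A) == 4

# Change-of-basis columns: T^2(e3) = e2+e5, T(e3) = e2+e4, e3; T(e1) = e5, e1
P = np.array([
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
], dtype=float)

J = np.linalg.inv(P) @ A @ P
print(J.astype(int))   # a 3x3 and a 2x2 elementary nilpotent block on the diagonal
```

The result is exactly the block diagonal matrix displayed below in the text, confirming the cyclic decomposition 𝑉 = 𝐸1 ⊕ 𝐸2.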


The cyclic subspaces are 𝐸1 = 𝕂-span{𝑒3, 𝑇(𝑒3), 𝑇^2(𝑒3)} = 𝕂-span{𝑒3, 𝑒2 + 𝑒4, 𝑒2 + 𝑒5} and 𝐸2 = 𝕂-span{𝑒1, 𝑇(𝑒1)} = 𝕂-span{𝑒1, 𝑒5}, with 𝑉 = 𝐸1 ⊕ 𝐸2. With respect to this basis 𝔜, [𝑇]𝔜𝔜 has the block diagonal form

[𝑇]𝔜𝔜 =
⎛ 0 1 0 0 0 ⎞
⎜ 0 0 1 0 0 ⎟
⎜ 0 0 0 0 0 ⎟
⎜ 0 0 0 0 1 ⎟
⎝ 0 0 0 0 0 ⎠,

with each diagonal block an elementary nilpotent matrix. The number and size of such blocks are uniquely determined, but the bases are not unique, nor are the cyclic subspaces in the splitting 𝑉 = 𝐸1 ⊕ 𝐸2. That concludes our discussion of Example 1.25. ○

Exercise 1.28. Let 𝑊 be the 3-dimensional subspace in 𝑉 = 𝕂^5 determined by the equations

{ 𝑥1 − 2𝑥2 + 𝑥3 − 𝑥5 = 0,
{ 3𝑥1 + 5𝑥3 − 𝑥4 = 0,

which is equivalent to the matrix equation 𝐴𝐱 = 𝟎 with

𝐴 = ⎛ 1 −2 1  0 −1 ⎞
    ⎝ 3  0 5 −1  0 ⎠.

(a) Find vectors {𝑣1, 𝑣2, 𝑣3} that are a basis for 𝑊.
(b) Find 2 vectors {𝑣4, 𝑣5} that are representatives for a basis in 𝑉/𝑊.
(c) Find two of the standard basis vectors {𝑒1, 𝑒2, 𝑒3, 𝑒4, 𝑒5} in 𝕂^5 that are a basis for 𝑉 (mod 𝑊).

Exercise 1.29. Do either of the vectors

𝑓1 = 2𝑒1 − 3𝑒2 + 𝑒3 + 𝑒4,   𝑓2 = −𝑒1 + 2𝑒2 + 5𝑒3 − 2𝑒4

in 𝕂^5 lie in the subspace 𝑊 determined by the system of equations in Exercise 1.28? Do these vectors form a basis for 𝕂^5 (mod 𝑊)?

Exercise 1.30. Which of the following matrices 𝐴 are nilpotent?

(a) ⎛ 0 0 0 ⎞   (b) ⎛ 0 1 2 ⎞   (c) ⎛  1  2 −1 ⎞   (d) ⎛  5 −6 −6 ⎞
    ⎜ 1 0 0 ⎟       ⎜ 0 0 3 ⎟       ⎜ −1 −2  1 ⎟       ⎜ −1  4  2 ⎟
    ⎝ 0 1 0 ⎠       ⎝ 1 0 0 ⎠       ⎝ −1 −2  1 ⎠       ⎝  3 −6  4 ⎠

If 𝐴 is nilpotent, find a basis for 𝕂^3 that puts 𝐴 into block diagonal form with elementary nilpotent blocks. What is the resulting block diagonal form if the blocks are listed in order of decreasing size?


Hint. As a quick test before launching into calculations: if 𝐴 is nilpotent, what does that tell you about Tr(𝐴) and det(𝐴)?

Exercise 1.31. If 𝑁1, 𝑁2 are nilpotent, is 𝑁1𝑁2 nilpotent? What if 𝑁1 and 𝑁2 commute, with 𝑁1𝑁2 = 𝑁2𝑁1?

Exercise 1.32. Prove the following:
(a) If 𝑁1, 𝑁2 are nilpotent operators that commute, so 𝑁1𝑁2 = 𝑁2𝑁1, then linear combinations 𝑐1𝑁1 + 𝑐2𝑁2 are also nilpotent.
(b) If 𝑁1, …, 𝑁𝑟 are nilpotent and commute pairwise, so [𝑁𝑖, 𝑁𝑗] = 0 for 𝑖 ≠ 𝑗, then all operators in 𝕂-span{𝑁1, …, 𝑁𝑟} are nilpotent.

Exercise 1.33. Let 𝑉 = 𝒫𝑛(𝕂) be the polynomials 𝑓 = ∑_{𝑘=0}^{𝑛} 𝑐𝑘𝑥^𝑘 in 𝕂[𝑥] of degree ≤ 𝑛.
(a) Show that the differentiation operator 𝐷 ∶ 𝑉 → 𝑉,

𝐷𝑓 = 𝑑𝑓/𝑑𝑥 = 𝑐1 + 2𝑐2𝑥 + ⋯ + 𝑛⋅𝑐𝑛𝑥^{𝑛−1},

is nilpotent with deg(𝐷) = 𝑛 + 1 = dim(𝑉).
(b) Prove that any constant-coefficient differential operator 𝐿 ∶ 𝑉 → 𝑉 of the form 𝑎1𝐷 + 𝑎2𝐷^2 + ⋯ + 𝑎𝑛𝐷^𝑛 that has no constant term 𝑎0𝐼 is nilpotent on 𝑉.
(c) Does this remain true if there is a nonzero constant term 𝑎0𝐼 in (b)?
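For Exercise 1.33(a), the matrix of 𝐷 in the monomial basis makes the nilpotency visible: it is strictly upper triangular. A quick computational sketch (ours, with an arbitrary choice 𝑛 = 4):

```python
import numpy as np

def diff_matrix(n):
    """Matrix of D = d/dx on P_n(K) in the monomial basis {1, x, ..., x^n}:
    D(x^k) = k * x^(k-1), so column k has the entry k in row k-1."""
    D = np.zeros((n + 1, n + 1))
    for k in range(1, n + 1):
        D[k - 1, k] = k
    return D

n = 4
D = diff_matrix(n)
assert np.allclose(np.linalg.matrix_power(D, n + 1), 0)   # D^(n+1) = 0 ...
assert not np.allclose(np.linalg.matrix_power(D, n), 0)   # ... but D^n != 0
print("deg(D) =", n + 1)
```

So deg(𝐷) = 𝑛 + 1 = dim(𝒫𝑛), as the exercise asserts.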

Exercise 1.34. In the space of polynomials ℝ[𝑥], let

𝑉1 = {𝑓 ∶ 𝑓(−𝑥) = 𝑓(𝑥)}, the even polynomials,
𝑉2 = {𝑓 ∶ 𝑓(−𝑥) = −𝑓(𝑥)}, the odd polynomials.

Prove that 𝒫𝑛 is their direct sum 𝑉1 ⊕ 𝑉2. Are these subspaces invariant under differentiation?

Exercise 1.35. Explain why Tr(𝑇) = 0 and det(𝑇) = 0 for any nilpotent linear operator 𝑇 ∶ 𝑉 → 𝑉 on a finite-dimensional space. Is the converse true?

1.3. Generalized Eigenspaces

We will give a general structure theorem for linear operators 𝑇 over a field 𝕂 large enough that the characteristic polynomial 𝑝𝑇(𝑥) = det(𝑇 − 𝑥𝐼) splits into linear factors 𝑝𝑇(𝑥) = 𝑐 ⋅ ∏_{𝑖=1}^{𝑠} (𝑥 − 𝑎𝑖)^{𝑚𝑖} in 𝕂[𝑥]. This is always true if 𝕂 = ℂ but need not be true for other fields; and even if 𝑝𝑇(𝑥) does split, that alone is not enough to guarantee that 𝑇 is diagonalizable. In Appendix A at the end of this chapter, we briefly review diagonalizability of linear operators over a general field 𝕂, which means that 𝑉 contains a basis of eigenvectors, so that the eigenspaces 𝐸𝜆(𝑇) span 𝑉. If you already have a good understanding of these matters, you may nevertheless wish to review Theorem A.5 in this appendix, as well as its proof. The techniques used there will be applied to develop the generalized eigenspace decomposition and Jordan canonical form, the main topics of this section.


The Fitting decomposition (Lemma 1.6), reprised below, is a first step toward decomposing linear operators 𝑇 ∶ 𝑉 → 𝑉 that are not diagonalizable.

(Fitting decomposition) Given a linear map 𝑇 ∶ 𝑉 → 𝑉 on a finite-dimensional vector space over any field, there is a direct sum decomposition 𝑉 = 𝑁 ⊕ 𝑆 into 𝑇-invariant subspaces such that 𝑇|𝑆 ∶ 𝑆 → 𝑆 is a bijection (an invertible linear operator on 𝑆) and 𝑇|𝑁 ∶ 𝑁 → 𝑁 is nilpotent. The relevant subspaces are the stable kernel and stable range of 𝑇,

𝐾∞ = ⋃_{𝑖=1}^{∞} 𝐾𝑖   (𝐾𝑖 = ker(𝑇^𝑖), with {0} ⫋ 𝐾1 ⫋ ⋯ ⫋ 𝐾𝑟 = 𝐾𝑟+1 = ⋯ = 𝐾∞),
𝑅∞ = ⋂_{𝑖=1}^{∞} 𝑅𝑖   (𝑅𝑖 = range(𝑇^𝑖), with 𝑉 ⫌ 𝑅1 ⫌ ⋯ ⫌ 𝑅𝑟 = 𝑅𝑟+1 = ⋯ = 𝑅∞)

(see Section 1.1). Obviously, 𝑇 = (𝑇|𝑅∞) ⊕ (𝑇|𝐾∞), which splits 𝑇 into canonically defined nilpotent and invertible parts.

Exercise 1.36. Prove that the Fitting decomposition is unique: if 𝑉 = 𝑁 ⊕ 𝑆 with both subspaces 𝑇-invariant, 𝑇|𝑁 ∶ 𝑁 → 𝑁 nilpotent, and 𝑇|𝑆 ∶ 𝑆 → 𝑆 invertible, show that 𝑁 = 𝐾∞(𝑇) and 𝑆 = 𝑅∞(𝑇).

Given a linear operator 𝑇 ∶ 𝑉 → 𝑉, we may apply these remarks to the operators (𝑇 − 𝜆𝐼) associated with eigenvalues 𝜆 in sp𝕂(𝑇). The eigenspace 𝐸𝜆(𝑇) = ker(𝑇 − 𝜆𝐼) is the first in an ascending chain of 𝑇-invariant subspaces:

{0} ⫋ ker(𝑇 − 𝜆𝐼) = 𝐸𝜆(𝑇) ⫋ ker(𝑇 − 𝜆𝐼)^2 ⫋ ⋯ ⫋ ker(𝑇 − 𝜆𝐼)^𝑟 = ⋯ = 𝐾∞(𝑇 − 𝜆𝐼).

Definition 1.37. If 𝜆 ∈ 𝕂, the "stable kernel" of (𝑇 − 𝜆𝐼),

𝐾∞ (𝜆) =

⋃

ker(𝑇 − 𝜆𝐼)𝑚

𝑚=1

is called the generalized 𝜆-eigenspace, which we will hereafter denote by 𝑀𝜆 (𝑇). (1.6)

𝑀𝜆 (𝑇) = {𝑣 ∈ 𝑉 ∶ (𝑇 − 𝜆𝐼)𝑘 𝑣 = 0 for some 𝑘 ∈ ℕ} ⊇ 𝐸𝜆 (𝑇) = {𝑣 ∶ (𝑇 − 𝜆𝐼)𝑣 = 0}

We refer to any 𝜆 ∈ 𝕂 such that 𝑀𝜆 (𝑇) ≠ (0) as a generalized eigenvalue for 𝑇. Since 𝑀𝜆 (𝑇) ≠ {0} ⇔ 𝐸𝜆 (𝑇) ≠ {0} ⇔ det(𝑇 − 𝜆𝐼) = 0, these are just the usual eigenvalues of 𝑇 in 𝕂. Generalized eigenspaces have the properties laid out below. Lemma 1.38. The spaces 𝑀𝜆 (𝑇) are 𝑇-invariant. Proof. 𝑇 commutes with all the operators (𝑇 − 𝜆)𝑚 , which commute with each other. Thus, 𝑣 ∈ 𝑀𝜆 (𝑇) ⇒ (𝑇 − 𝜆𝐼)𝑘 𝑣 = 0 for some 𝑘 ∈ ℕ ⇒ (𝑇 − 𝜆𝐼)𝑘 𝑇(𝑣) = 𝑇(𝑇 − 𝜆𝐼)𝑘 𝑣 = 𝑇(0) = 0. Hence 𝑇(𝑣) ∈ 𝑀𝜆 (𝑇).

□
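Since the kernels ker(𝑇 − 𝜆𝐼)^𝑚 stabilize no later than 𝑚 = dim 𝑉, the generalized eigenspace in (1.6) can be computed in practice as ker(𝐴 − 𝜆𝐼)^𝑛 with 𝑛 = dim 𝑉. A small sympy sketch (our example matrix, not from the text):

```python
import sympy as sp

# lambda = 2 has algebraic multiplicity 2 but only a 1-dimensional eigenspace
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 5]])
n = A.rows

E = (A - 2 * sp.eye(n)).nullspace()          # ordinary eigenspace E_2(T)
M = ((A - 2 * sp.eye(n)) ** n).nullspace()   # generalized eigenspace M_2(T)
print(len(E), len(M))   # 1 2
```

Here 𝐸2(𝑇) ⫋ 𝑀2(𝑇): the generalized eigenspace picks up the extra dimension that the ordinary eigenspace misses.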


We now show that 𝑇|𝑀𝜆 has upper triangular form with respect to a suitably chosen basis in 𝑀𝜆.

Proposition 1.39 (Block Upper Triangular Decomposition of 𝑀𝜆(𝑇)). Each generalized eigenspace 𝑀𝜆(𝑇), 𝜆 ∈ sp(𝑇), has a basis 𝔛 that puts the matrix of 𝑇|𝑀𝜆(𝑇) into upper triangular form, with 𝜆 at every diagonal entry:

[𝑇|𝑀𝜆]𝔛𝔛 =
⎛ 𝜆     ∗ ⎞
⎜   ⋱     ⎟
⎝ 0     𝜆 ⎠.

Proof. We already know that any nilpotent operator 𝑁 on a finite-dimensional vector space can be put into strictly upper triangular form (zeros on and below the diagonal) by a suitable choice of basis. Now write

𝑇|𝑀𝜆 = (𝑇 − 𝜆𝐼)|𝑀𝜆 + 𝜆 𝐼|𝑀𝜆,

taking 𝑉 = 𝑀𝜆 and 𝑁 = (𝑇 − 𝜆𝐼)|𝑀𝜆. Since [𝐼|𝑀𝜆]𝔛𝔛 = 𝐼𝑚×𝑚 for any basis, a basis that puts (𝑇 − 𝜆𝐼)|𝑀𝜆 into strictly upper triangular form automatically yields

[𝑇|𝑀𝜆]𝔛𝔛 = [(𝑇 − 𝜆𝐼)|𝑀𝜆]𝔛𝔛 + 𝜆[𝐼|𝑀𝜆]𝔛𝔛,

which is upper triangular with 𝜆's on the diagonal. □

A more precise result is obtained using the cyclic subspace decomposition for nilpotent operators (Theorem 1.16) to guide the choice of basis. As a preliminary step, we might pick a basis 𝔛 aligned with the kernels

{0} ⫋ 𝐾1 = ker(𝑇) ⫋ 𝐾2 = ker(𝑇^2) ⫋ ⋯ ⫋ 𝐾𝑑 = 𝑉,

where 𝑑 = deg(𝑇) for a nilpotent 𝑇. As in Proposition 1.39, [𝑇]𝔛𝔛 is then upper triangular with 𝑚𝑖 × 𝑚𝑖 blocks of zeros on the diagonal and 𝑚1 + ⋯ + 𝑚𝑑 = dim(𝑉). If we apply this to the action of (𝑇 − 𝜆𝐼) on 𝑀𝜆(𝑇), the matrix of the nilpotent operator 𝑇 − 𝜆𝐼 becomes upper triangular with zero blocks on the diagonal. Writing 𝑇 = (𝑇 − 𝜆𝐼) + 𝜆𝐼, the matrix of 𝑇 with respect to any basis 𝔛 running through the successive kernels 𝐾𝑖 = ker(𝑇 − 𝜆𝐼)^𝑖 must have the form

(1.7)  [𝑇|𝑀𝜆]𝔛𝔛 = 𝜆 ⋅ 𝐼𝑛×𝑛 + [𝑇 − 𝜆𝐼]𝔛𝔛 =
⎛ 𝜆⋅𝐼𝑚1×𝑚1            ∗ ⎞
⎜      𝜆⋅𝐼𝑚2×𝑚2         ⎟
⎜            ⋱           ⎟
⎝ 0            𝜆⋅𝐼𝑚𝑟×𝑚𝑟 ⎠

with 𝑚𝑖 = dim(𝐾𝑖/𝐾𝑖−1) = dim(𝐾𝑖) − dim(𝐾𝑖−1) and 𝑛 = dim(𝑉) = ∑𝑖 𝑚𝑖. The shape of the "block upper triangular form" (1.7) is completely determined by the dimensions of the kernels 𝐾𝑖 = 𝐾𝑖(𝑇 − 𝜆𝐼).


In short, (1.7) says that 𝑇|𝑀𝜆 = 𝜆𝐼𝜆 + 𝑁𝜆, where 𝐼𝜆 = id𝑀𝜆, 𝜆𝐼𝜆 is a scalar operator on 𝑀𝜆, and 𝑁𝜆 = (𝑇 − 𝜆𝐼)|𝑀𝜆 is a nilpotent operator whose matrix with respect to the basis 𝔛 is the matrix in (1.7) but with 𝑚𝑖 × 𝑚𝑖 zero blocks on the diagonal. We conclude that the restriction 𝑇|𝑀𝜆 has an "additive decomposition"

𝑇|𝑀𝜆 = (scalar) + (nilpotent) = 𝜆𝐼𝜆 + 𝑁𝜆

into commuting scalar and nilpotent parts: in matrix form, an upper triangular matrix with 𝜆's on the diagonal splits as 𝜆𝐼 plus a strictly upper triangular matrix. Both terms in this decomposition commute with 𝑇|𝑀𝜆. There is also a "multiplicative decomposition" (when 𝜆 ≠ 0)

𝑇|𝑀𝜆 = (scalar) ⋅ (unipotent) = (𝜆𝐼𝜆) ⋅ 𝑈𝜆,

where 𝑈𝜆 is a unipotent operator, the identity plus a nilpotent: in matrix form, an upper triangular matrix with 𝜆's on the diagonal factors as 𝜆𝐼 times an upper triangular matrix with 1's on the diagonal. The off-diagonal entries (∗) in 𝑁𝜆 and 𝑈𝜆 need not be the same in these two decompositions.
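Both decompositions are easy to see on a single 3 × 3 example (our own illustration); note that the multiplicative version needs 𝜆 ≠ 0:

```python
import numpy as np

lam = 3.0
T = lam * np.eye(3) + np.diag([1.0, 1.0], k=1)   # upper triangular, lam's on diagonal
I = np.eye(3)

# additive decomposition: T = lam*I + N with N nilpotent; the parts commute
N = T - lam * I
assert np.allclose(np.linalg.matrix_power(N, 3), 0)
assert np.allclose((lam * I) @ N, N @ (lam * I))

# multiplicative decomposition (lam != 0): T = (lam*I) @ U with U unipotent
U = I + N / lam
assert np.allclose(T, (lam * I) @ U)
assert np.allclose(np.linalg.matrix_power(U - I, 3), 0)   # U - I is nilpotent

# the off-diagonal entries differ: N carries 1's, U - I carries 1/lam's
```

This illustrates the final remark above: the nilpotent parts of the two decompositions share the same shape but not the same entries.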

Jordan Decomposition of a Generalized Eigenspace. We now show that the block-upper triangular description of the action of 𝑇 on 𝑀𝜆 (𝑇) can be refined to provide much more information about the off-diagonal terms (∗). We will also see that for many purposes the block upper-triangular form will suffice and is easier to compute than the Jordan decomposition since we only need to determine the kernels 𝐾𝑖 . Now consider what happens if we take a basis 𝔜 in 𝑀𝜆 corresponding to a cyclic subspace decomposition of the nilpotent operator 𝑁𝜆 = (𝑇 − 𝜆𝐼)|𝑀𝜆 = (𝑇|𝑀𝜆 ) − 𝜆𝐼𝜆

(𝐼𝜆 = 𝐼|𝑀𝜆 ).

Then [𝜆𝐼𝜆 ]𝔜𝔜 is 𝜆 times the identity matrix (as it is for any basis in 𝑀𝜆 ) while [𝑁𝜆 ]𝔜𝔜 consists of diagonal blocks, each an elementary nilpotent matrix 𝑁𝑖 .

[𝑁𝜆]𝔜𝔜 = [(𝑇 − 𝜆𝐼)|𝑀𝜆]𝔜𝔜 =
⎛ 𝑁1        0 ⎞
⎜    𝑁2       ⎟
⎜       ⋱     ⎟
⎝ 0        𝑁𝑟 ⎠
with each
𝑁𝑖 =
⎛ 0 1       0 ⎞
⎜   ⋱ ⋱      ⎟
⎜      ⋱ 1   ⎟
⎝ 0         0 ⎠


of size 𝑑𝑖 × 𝑑𝑖, but with 𝑁𝑖 a 1 × 1 zero matrix when 𝑑𝑖 = 1. This yields the Jordan block decomposition of 𝑇 on a typical generalized eigenspace 𝑀𝜆.

Theorem 1.40 (Jordan Decomposition of 𝑇|𝑀𝜆). If 𝑇 ∶ 𝑉 → 𝑉 is a linear operator on a finite-dimensional space, then in each generalized eigenspace 𝑀𝜆 there is a basis 𝔜 that puts the restriction 𝑇|𝑀𝜆 into the block diagonal form

(1.8)  [𝑇|𝑀𝜆]𝔜𝔜 = 𝜆 ⋅ [𝐼|𝑀𝜆]𝔜𝔜 + [𝑁𝜆]𝔜𝔜 =
⎛ 𝐵1                  0 ⎞
⎜     ⋱                 ⎟
⎜          𝐵𝑚           ⎟
⎝ 0            𝜆⋅𝐼𝑟×𝑟 ⎠,

where each 𝐵𝑖 = 𝜆⋅𝐼𝑑𝑖×𝑑𝑖 + (elementary nilpotent) is referred to as a "Jordan block." When 𝑑𝑖 > 1,

𝐵𝑖 =
⎛ 𝜆 1       0 ⎞
⎜   ⋱ ⋱      ⎟
⎜      ⋱ 1   ⎟
⎝ 0         𝜆 ⎠.

Remark 1.41. The last block in (1.8) is exceptional. The others correspond to restrictions 𝑇|𝐶𝑖(𝜆) to cyclic subspaces of dimension 𝑑𝑖 > 1 in a cyclic subspace decomposition 𝑀𝜆(𝑇) = 𝐶1(𝜆) ⊕ ⋯ ⊕ 𝐶𝑚(𝜆). However, some cyclic subspaces might be one-dimensional, and any such 𝐶𝑖(𝜆) is contained in the ordinary eigenspace 𝐸𝜆(𝑇). If there are 𝑟 such degenerate cyclic subspaces, we may lump them together into a single subspace

𝐸 = ⨁ {𝐶𝑖(𝜆) ∶ dim(𝐶𝑖(𝜆)) = 1} ⊆ 𝐸𝜆(𝑇) ⊆ 𝑀𝜆(𝑇).

If we write dim(𝐸) = 𝑟 and 𝑇|𝐸 = 𝜆 ⋅ 𝐼𝐸, it should also be evident that 𝑟 + 𝑑1 + ⋯ + 𝑑𝑚 = dim(𝑀𝜆(𝑇)). ○

1.4. The Generalized Eigenspace Decomposition

So far we have only determined the structure of 𝑇 restricted to a single generalized eigenspace 𝑀𝜆(𝑇). Several issues must be resolved to arrive at the overall structure of 𝑇 on 𝑉.

• If the generalized eigenspaces 𝑀𝜆𝑖(𝑇) fail to span 𝑉 (for instance, if 𝑝𝑇(𝑥) fails to split in 𝕂[𝑥]), knowing the behavior of 𝑇 only on their span 𝑀 = ∑_{𝜆∈sp(𝑇)} 𝑀𝜆(𝑇) leaves the global behavior of 𝑇 beyond reach.


• It is equally important to prove (as we did in LA I, Proposition 2.29, for ordinary eigenspaces) that the span of the generalized eigenspaces is in fact a direct sum,

𝑀 = ⨁_{𝜆∈sp(𝑇)} 𝑀𝜆(𝑇).

That means the actions of 𝑇 on different 𝑀𝜆 are independent and can be examined separately to get 𝑇|𝑀 = ⨁_{𝜆∈sp(𝑇)} (𝑇|𝑀𝜆). Both issues will be resolved in our favor if 𝑝𝑇(𝑥) splits into linear factors in 𝕂[𝑥], which is always true if 𝕂 = ℂ. We are not so lucky for linear operators over 𝕂 = ℝ or over finite fields such as 𝕂 = ℤ𝑝. When this program succeeds, the result is the generalized eigenspace decomposition.

Theorem 1.42 (Generalized Eigenspace Decomposition). If 𝑇 is a linear operator on a finite-dimensional vector space 𝑉 and its characteristic polynomial 𝑝𝑇(𝑥) = det(𝑇 − 𝑥𝐼) splits over 𝕂, then 𝑉 is a direct sum of the generalized eigenspaces for 𝑇,

𝑉 = ⨁_{𝜆∈sp(𝑇)} 𝑀𝜆(𝑇).

Since the spaces 𝑀𝜆 are 𝑇-invariant, we obtain a decomposition of 𝑇 itself,

(1.9)  𝑇 = ⨁_{𝜆∈sp(𝑇)} 𝑇|𝑀𝜆(𝑇),

into operators, each of which can be put into Jordan form (1.8) by choosing bases compatible with a decomposition of 𝑀𝜆 into 𝑇-invariant subspaces that are cyclic under the action of (𝑇 − 𝜆𝐼). Proof. The following proof that the generalized eigenspaces are independent components in a direct sum follows the same lines as a similar result for ordinary eigenspaces (Proposition 2.29 of LA I) but with additional technical complications. Proof that they span 𝑉 will require some new ideas based on the Fitting decomposition. Proposition 1.43 (Independence of the 𝑀𝜆 ). If 𝑇 ∶ 𝑉 → 𝑉 is a linear operator on a finite-dimensional space, the span 𝑀 = ∑𝜆∈sp(𝑇) 𝑀𝜆 (𝑇) of the generalized eigenspaces (which might be a proper subspace in 𝑉) is always a direct sum, 𝑀 = ⨁𝜆∈sp(𝑇) 𝑀𝜆 (𝑇). Proof. Let 𝜆1 , … , 𝜆𝑟 be the distinct eigenvalues in 𝕂. By definition of “direct sum” we must show the components 𝑀𝜆 are independent so that (1.10)

0 = 𝑣1 + ⋯ + 𝑣𝑟 with 𝑣𝑖 ∈ 𝑀𝜆𝑖 ⇒ each term 𝑣𝑖 is zero.

Fix an index 𝑘. For each 1 ≤ 𝑗 ≤ 𝑟, let 𝑚𝑗 = deg((𝑇 − 𝜆𝑗𝐼)|𝑀𝜆𝑗).

If 𝑣𝑘 = 0, we’re done, and if 𝑣𝑘 ≠ 0 let 𝑚 = 𝑚(𝑣𝑘 ) ≤ 𝑚𝑘 be the smallest exponent such that (𝑇 − 𝜆𝑘 𝐼)𝑚 𝑣𝑘 = 0 and (𝑇 − 𝜆𝑘 𝐼)𝑚−1 𝑣𝑘 ≠ 0. Then 𝑤 = (𝑇 − 𝜆𝑘 𝐼)𝑚−1 𝑣𝑘 is a nonzero eigenvector in 𝐸𝜆𝑘 .


Define

𝐴 = ∏_{𝑖≠𝑘} (𝑇 − 𝜆𝑖𝐼)^{𝑚𝑖} ⋅ (𝑇 − 𝜆𝑘𝐼)^{𝑚−1},

which is keyed to the particular index 𝑘 and the base point 𝑤 chosen above. Then

0 = 𝐴(0) = 0 + 𝐴𝑣𝑘   (since 𝐴𝑣𝑖 = 0 for 𝑖 ≠ 𝑘)
  = ∏_{𝑖≠𝑘} (𝑇 − 𝜆𝑖𝐼)^{𝑚𝑖} ((𝑇 − 𝜆𝑘𝐼)^{𝑚−1} 𝑣𝑘) = ∏_{𝑖≠𝑘} (𝑇 − 𝜆𝑖𝐼)^{𝑚𝑖} 𝑤
  = ∏_{𝑖≠𝑘} ((𝑇 − 𝜆𝑘𝐼) + (𝜆𝑘 − 𝜆𝑖)𝐼)^{𝑚𝑖} 𝑤   (a familiar algebraic trick)
  = ∏_{𝑖≠𝑘} ( ∑_{𝑠=0}^{𝑚𝑖} (𝑚𝑖 choose 𝑠) (𝑇 − 𝜆𝑘𝐼)^{𝑚𝑖−𝑠} (𝜆𝑘 − 𝜆𝑖)^𝑠 ) 𝑤   (binomial expansion of (⋯)^{𝑚𝑖}).

Since (𝑇 − 𝜆𝑘𝐼)𝑤 = 0, all terms in each binomial sum are zero except the one with 𝑠 = 𝑚𝑖, so we get

0 = ( ∏_{𝑖≠𝑘} (𝜆𝑘 − 𝜆𝑖)^{𝑚𝑖} ) ⋅ 𝑤.

The factor (⋯) is nonzero because the 𝜆𝑖 are the distinct eigenvalues of 𝑇 in 𝕂, so 𝑤 must be zero. This is a contradiction because 𝑤 ≠ 0 by definition. We conclude that every term 𝑣𝑘 in (1.10) is zero, so the span 𝑀 is a direct sum of the 𝑀𝜆. □

Exercise 1.44. If a linear operator 𝑇 ∶ 𝑉 → 𝑉 has characteristic polynomial (a) 𝑝𝑇(𝑥) = 𝑥(𝑥 − 2)^2, (b) 𝑝𝑇(𝑥) = (𝑥 − 2)^2 (𝑥 − 5)^2, make diagrams showing the possible Jordan canonical forms 𝑇 can have. Normalize your diagrams by listing the Jordan blocks for each eigenvalue in order of decreasing size.

Further Properties of Characteristic Polynomials. Before proving Theorem 1.42, we digress to develop a few more facts about characteristic polynomials, needed to work out the relation between the eigenvalues of 𝑇 and those of the restriction 𝑇|𝑅∞, where 𝑅∞ = 𝑅∞(𝑇 − 𝜆1𝐼). These will be needed in the induction proof of Theorem 1.42.

Lemma 1.45. If

𝐴 = ⎛ 𝐵  ∗ ⎞
    ⎝ 0  𝐶 ⎠,

where 𝐵 is 𝑟 × 𝑟 and 𝐶 is (𝑛 − 𝑟) × (𝑛 − 𝑟), then det(𝐴) = det(𝐵) ⋅ det(𝐶).

Corollary 1.46. If 𝐴 ∈ M(𝑛, 𝕂) is upper triangular with values 𝑐1, …, 𝑐𝑛 on the diagonal, then det(𝐴) = ∏_{𝑖=1}^{𝑛} 𝑐𝑖.
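Lemma 1.45 is easy to sanity-check numerically on random blocks (a sketch of ours, not part of the text's proof):

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 3, 7
B = rng.standard_normal((r, r))
C = rng.standard_normal((n - r, n - r))
star = rng.standard_normal((r, n - r))     # the arbitrary (*) block

# assemble the block upper triangular matrix of Lemma 1.45
A = np.block([[B, star],
              [np.zeros((n - r, r)), C]])
assert np.isclose(np.linalg.det(A), np.linalg.det(B) * np.linalg.det(C))
```

Note that the (∗) block plays no role in the determinant, exactly as the template argument below explains.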


Proof (of Lemma 1.45). Consider an 𝑛 × 𝑛 template corresponding to some 𝜎 ∈ 𝑆𝑛. If any of the marked spots in columns 𝐶1, …, 𝐶𝑟 occurs in a row 𝑅𝑖 with 𝑟 + 1 ≤ 𝑖 ≤ 𝑛, then 𝑎𝑖,𝜎(𝑖) = 0 and so is the corresponding term in ∑_{𝜎∈𝑆𝑛}(⋯). Thus columns 𝐶𝑗, 1 ≤ 𝑗 ≤ 𝑟, can only be marked in rows 𝑅1, …, 𝑅𝑟 if the template is to yield a nonzero term in det(𝐴). It follows immediately that all columns 𝐶𝑗 with 𝑟 + 1 ≤ 𝑗 ≤ 𝑛 must be marked in rows 𝑅𝑖 with 𝑟 + 1 ≤ 𝑖 ≤ 𝑛 if 𝜎 is to contribute to

det(𝐴) = ∑_{𝜎∈𝑆𝑛} sgn(𝜎) ⋅ ∏_{𝑖=1}^{𝑛} 𝑎𝑖,𝜎(𝑖).

Therefore only permutations 𝜎 that leave invariant the blocks of indices [1, 𝑟] and [𝑟 + 1, 𝑛] can contribute. These 𝜎 are composites 𝜎 = 𝜇 × 𝜏̃ of permutations 𝜇 ∈ 𝑆𝑟 and 𝜏 ∈ 𝑆𝑛−𝑟 (identified with a permutation 𝜏̃ of {𝑟 + 1, …, 𝑛}):

𝜎(𝑘) = { 𝜇(𝑘)                  if 1 ≤ 𝑘 ≤ 𝑟,
       { 𝜏̃(𝑘) = 𝑟 + 𝜏(𝑘 − 𝑟)   if 𝑟 + 1 ≤ 𝑘 ≤ 𝑛.

Furthermore, we have sgn(𝜎) = sgn(𝜇 × 𝜏̃) = sgn(𝜇) ⋅ sgn(𝜏) by definition of the signature sgn, because 𝜇 and 𝜏 operate on disjoint subsets of indices in [1, 𝑛]. In the matrix 𝐴 we have

𝐵𝑘,ℓ = 𝐴𝑘,ℓ for 1 ≤ 𝑘, ℓ ≤ 𝑟,   𝐶𝑘,ℓ = 𝐴𝑘+𝑟,ℓ+𝑟 for 1 ≤ 𝑘, ℓ ≤ 𝑛 − 𝑟,

so we get

det(𝐴) = ∑_{(𝜇,𝜏)∈𝑆𝑟×𝑆𝑛−𝑟} sgn(𝜇 × 𝜏̃) ⋅ (∏_{𝑖=1}^{𝑟} 𝐵𝑖,𝜇(𝑖)) ⋅ (∏_{𝑗=1}^{𝑛−𝑟} 𝐶𝑗,𝜏(𝑗))
       = (∑_{𝜇∈𝑆𝑟} sgn(𝜇) ⋅ ∏_{𝑖=1}^{𝑟} 𝐵𝑖,𝜇(𝑖)) ⋅ (∑_{𝜏∈𝑆𝑛−𝑟} sgn(𝜏) ⋅ ∏_{𝑗=1}^{𝑛−𝑟} 𝐶𝑗,𝜏(𝑗))
       = det(𝐵) ⋅ det(𝐶). □

Corollary 1.47. If 𝑇 ∶ 𝑉 → 𝑉 is a linear operator on a finite-dimensional vector space and 𝑀 ⊆ 𝑉 is a 𝑇-invariant subspace, the characteristic polynomial 𝑝_{𝑇|𝑀}(𝑥) of the restriction 𝑇|𝑀 divides 𝑝𝑇(𝑥) = det(𝑇 − 𝑥𝐼) in 𝕂[𝑥].

Proof. If 𝑀 ⊆ 𝑉 is 𝑇-invariant and we take a basis 𝔛 = {𝑒𝑖} that first spans 𝑀 and then picks up additional vectors to get a basis for 𝑉, the matrix [𝑇]𝔛𝔛 has block upper triangular form

[𝑇]𝔛𝔛 = ⎛ 𝐴  ∗ ⎞          [𝑇 − 𝑥𝐼]𝔛𝔛 = ⎛ 𝐴 − 𝑥𝐼     ∗    ⎞
        ⎝ 0  𝐵 ⎠,  and then               ⎝    0     𝐵 − 𝑥𝐼 ⎠.

But it is trivial to check that 𝐴 − 𝑥𝐼 = [(𝑇 − 𝑥𝐼)|𝑀]𝔛′𝔛′, where 𝔛′ = {𝑒1, ⋯, 𝑒𝑟} are the initial vectors that span 𝑀. Thus det(𝐴 − 𝑥𝐼) = det((𝑇 − 𝑥𝐼)|𝑀) = 𝑝_{𝑇|𝑀}(𝑥) divides 𝑝𝑇(𝑥) = det(𝐴 − 𝑥𝐼) ⋅ det(𝐵 − 𝑥𝐼). □


Exercise 1.48. Given (𝑉, 𝑀, 𝑇) as in the previous corollary, 𝑇 induces a linear map 𝑇̃ ∶ 𝑉/𝑀 → 𝑉/𝑀 such that 𝑇̃(𝑣 + 𝑀) = 𝑇(𝑣) + 𝑀 for 𝑣 ∈ 𝑉. Prove that the characteristic polynomial 𝑝_{𝑇̃}(𝑥) = det_{𝑉/𝑀}(𝑇̃ − 𝑥𝐼) also divides 𝑝𝑇(𝑥) = det(𝐴 − 𝑥𝐼) ⋅ det(𝐵 − 𝑥𝐼).

Lemma 1.49. If 𝑓 and 𝑃 are nonconstant polynomials in 𝕂[𝑥] and 𝑃 divides 𝑓, so 𝑓(𝑥) = 𝑃(𝑥)𝑄(𝑥) for some other 𝑄 ∈ 𝕂[𝑥], then 𝑃(𝑥) must split over 𝕂 if 𝑓(𝑥) does.

Proof. If 𝑄 is constant, there is nothing to prove. Nonconstant polynomials 𝑓 ≠ 0 in 𝕂[𝑥] have unique factorizations 𝑓 = ∏_{𝑖=1}^{𝑟} 𝐹𝑖 into irreducible polynomials 𝐹𝑖, which cannot be written as products of nonconstant polynomials of lower degree. Each of the polynomials 𝑓, 𝑃, 𝑄 has such a factorization:

𝑃 = ∏_{𝑘=1}^{𝑚} 𝑃𝑘,   𝑄 = ∏_{𝑗=1}^{𝑚′} 𝑄𝑗,   so   𝑓 = 𝑃𝑄 = ∏_{𝑘} 𝑃𝑘 ⋅ ∏_{𝑗} 𝑄𝑗.

Since 𝑓 splits over 𝕂, it can also be written as a product of linear factors 𝑓(𝑥) = ∏_{𝑖=1}^{𝑛} (𝑥 − 𝛼𝑖), where the 𝛼𝑖 are the roots of 𝑓(𝑥) in 𝕂 counted according to their multiplicities. Linear factors (𝑥 − 𝛼) are obviously irreducible, and the two irreducible decompositions of 𝑓(𝑥) must agree. Thus 𝑃 (and 𝑄) are products of linear factors, and 𝑃(𝑥) splits over 𝕂. □

This lemma has a useful corollary.

Corollary 1.50. If the characteristic polynomial 𝑝𝑇(𝑥) of a linear operator 𝑇 ∶ 𝑉 → 𝑉 splits over 𝕂, so does 𝑝_{𝑇|𝑊} for any 𝑇-invariant subspace 𝑊 ⊆ 𝑉.

Proof of Generalized Eigenspace Decomposition. We now take up the proof of Theorem 1.42. The operator 𝑇 has eigenvalues in 𝕂 because 𝑝𝑇 splits, and its eigenvalues {𝜆1, …, 𝜆𝑟} are the distinct roots of 𝑝𝑇 in 𝕂. Recall that 𝐸𝜆 ≠ {0} ⇔ 𝑀𝜆 ≠ {0}. Pick an eigenvalue 𝜆1 and let 𝑉 = 𝐾∞ ⊕ 𝑅∞ be the Fitting decomposition of the operator (𝑇 − 𝜆1𝐼), so 𝐾∞ is the generalized eigenspace 𝑀𝜆1(𝑇) while 𝑅∞ is the stable range of (𝑇 − 𝜆1𝐼). Both spaces are invariant under 𝑇 − 𝜆1𝐼 and also under 𝑇, since 𝜆1𝐼 commutes with 𝑇. It will be important to note that 𝜆1 cannot be an eigenvalue of 𝑇|𝑅∞: if 𝑣 ∈ 𝑅∞ is nonzero, then (𝑇 − 𝜆1𝐼)𝑣 = 0 ⇒ 𝑣 ∈ 𝐾∞ ∩ 𝑅∞ = {0}. Hence sp(𝑇|𝑅∞) ⊆ {𝜆2, …, 𝜆𝑟}.

We now argue by induction on 𝑛 = dim(𝑉). There is little to prove if 𝑛 = 1: there is an eigenvalue, so 𝐸𝜆 = 𝑉 and 𝑇 = 𝜆𝐼. So assume 𝑛 > 1 and that the theorem has been proved for all spaces 𝑉′ of dimension < 𝑛 and for all operators 𝑇′ ∶ 𝑉′ → 𝑉′ such that det(𝑇′ − 𝑥𝐼) splits over 𝕂. The natural move is to apply this inductive hypothesis to 𝑇′ = 𝑇|𝑅∞ = 𝑇|_{𝑅∞(𝑇−𝜆1𝐼)}, since dim(𝑅∞) = dim(𝑉) − dim(𝑀𝜆1) < dim(𝑉) = 𝑛. But to do so, we must show that 𝑝_{𝑇′} splits over 𝕂, as required in the induction hypothesis.


By Corollary 1.47 and Lemma 1.49, the characteristic polynomial of 𝑇′ = 𝑇|𝑅∞ splits over 𝕂, and so, by induction on dimension, 𝑅∞ is a direct sum of generalized eigenspaces for the restricted operator 𝑇′:

𝑅∞ = ⨁_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇′),

where sp(𝑇′) is the set of distinct roots of 𝑝_{𝑇′} in 𝕂. To compare the roots of 𝑝𝑇 and 𝑝_{𝑇′}, we invoke our observation: 𝑝_{𝑇′} divides 𝑝𝑇. The roots sp(𝑇′) = sp(𝑇|𝑅∞) are a subset of the roots sp(𝑇) of 𝑝𝑇(𝑥), so every eigenvalue 𝜇 of 𝑇′ is an eigenvalue of 𝑇. Now label the distinct eigenvalues of 𝑇 so that sp(𝑇′) = {𝜆𝑠, 𝜆𝑠+1, …, 𝜆𝑟} ⊆ sp(𝑇) = {𝜆1, …, 𝜆𝑟} (with 𝑠 > 1, because 𝜆1 ∉ sp(𝑇|𝑅∞), as we observed earlier). Furthermore, for each 𝜇 ∈ sp(𝑇′), the generalized eigenspace 𝑀𝜇(𝑇′) is a subspace of 𝑅∞ ⊆ 𝑉 and must be contained in 𝑀𝜇(𝑇) because

(𝑇′ − 𝜇𝐼)^𝑘 𝑣 = 0 ⇒ (𝑇 − 𝜇𝐼)^𝑘 𝑣 = 0   for all 𝑣 ∈ 𝑀𝜇(𝑇′).

Abbreviating 𝑅∞ = 𝑅∞(𝑇 − 𝜆1𝐼), we then have

𝑅∞ = ⨁_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇′) ⊆ ∑_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇) ⊆ ∑_{𝜆∈sp(𝑇)} 𝑀𝜆(𝑇)

(since 𝑅∞ = ⨁_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇′) by the induction hypothesis). Therefore the generalized eigenspaces 𝑀𝜆(𝑇), 𝜆 ∈ sp(𝑇), must span 𝑉, because

(1.11)
𝑉 = 𝐾∞ ⊕ 𝑅∞ = 𝑀𝜆1(𝑇) ⊕ 𝑅∞ = 𝑀𝜆1(𝑇) ⊕ (⨁_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇′))
  ⊆ 𝑀𝜆1(𝑇) + (∑_{𝜇∈sp(𝑇′)} 𝑀𝜇(𝑇))   (because 𝑀𝜇(𝑇′) ⊆ 𝑀𝜇(𝑇))
  ⊆ 𝑀𝜆1(𝑇) + (∑_{𝜆∈sp(𝑇)} 𝑀𝜆(𝑇)) ⊆ 𝑉   (because sp(𝑇′) ⊆ sp(𝑇)).

We draw the following conclusion: the 𝑀𝜆, 𝜆 ∈ sp(𝑇), span 𝑉, so 𝑉 is a direct sum of its generalized eigenspaces by Proposition 1.43. That finishes the proof of Theorem 1.42. □

We draw the following conclusion: for 𝜆 ∈ sp(𝑇), the 𝑀𝜆 span 𝑉, so 𝑉 is a direct sum of its generalized eigenspaces by Proposition 1.43. That finishes the proof of Theorem 1.42. □ It is worth noting that sp(𝑇 ′ ) = {𝜆2 , … , 𝜆𝑟 } and

𝑀𝜆𝑖 (𝑇 ′ ) = 𝑀𝜆𝑖 (𝑇)

for 2 ≤ 𝑖 ≤ 𝑟.

Since 𝑀𝜇 (𝑇 ′ ) ⊆ 𝑀𝜇 (𝑇) for all 𝜇 ∈ sp(𝑇 ′ ) and 𝑀𝜆1 (𝑇) ∩ 𝑉 ′ = (0), 𝜆1 cannot appear in sp(𝑇 ′ ); on the other hand, every 𝜇 ∈ sp(𝑇 ′ ) must lie in sp(𝑇).
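Theorem 1.42 can be checked on a concrete matrix whose characteristic polynomial splits (our own example, with sympy): the dimensions of the 𝑀𝜆 add up to dim 𝑉, and each equals the algebraic multiplicity of 𝜆.

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 3, 1],
               [0, 0, 0, 3]])
n = A.rows

# p_T(x) = (x - 2)^2 (x - 3)^2 splits over Q, so Theorem 1.42 applies
mults = A.eigenvals()   # {2: 2, 3: 2}
dims = {lam: len(((A - lam * sp.eye(n)) ** n).nullspace()) for lam in mults}

assert sum(dims.values()) == n                             # the M_lambda span V
assert all(dims[lam] == m for lam, m in mults.items())     # dim M_lam = alg. mult.
print(dims)
```

The second assertion anticipates Corollary 1.53 below: dim 𝑀𝜆 is exactly the algebraic multiplicity of 𝜆 in 𝑝𝑇.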


The Jordan Canonical Form. The preceding discussion has several corollaries that lead to a fully developed version of the Jordan canonical form (JCF). However, keep in mind that some things can be proved using just the simpler block upper triangular form presented below.

Corollary 1.51 (Block Upper Triangular Form). If the characteristic polynomial of 𝑇 ∶ 𝑉 → 𝑉 splits over 𝕂, and in particular if 𝕂 = ℂ, there is a basis 𝔛 such that [𝑇]𝔛𝔛 has block upper triangular form

(1.12)  [𝑇]𝔛𝔛 =
⎛ 𝑇1        0 ⎞
⎜    ⋱        ⎟
⎝ 0        𝑇𝑟 ⎠

with blocks on the diagonal

𝑇𝑖 =
⎛ 𝜆𝑖 ∗       ∗ ⎞
⎜    ⋱ ⋱       ⎟
⎜       ⋱  ∗   ⎟
⎝ 0         𝜆𝑖 ⎠

of size 𝑚𝑖 × 𝑚𝑖 such that:
1. 𝜆1, …, 𝜆𝑟 are the distinct eigenvalues of 𝑇 (which may appear in several different blocks).
2. The block sizes are the algebraic multiplicities 𝑚𝑖 of the 𝜆𝑖 in the splitting of the characteristic polynomial 𝑝𝑇(𝑥) (see the next corollary for details).
3. 𝑝𝑇(𝑥) = (−1)^𝑛 ⋅ ∏_{𝑗=1}^{𝑟} (𝑥 − 𝜆𝑗)^{𝑚𝑗} with 𝑛 = 𝑚1 + ⋯ + 𝑚𝑟.
The blocks 𝑇𝑖 may or may not have off-diagonal terms.

Corollary 1.52. If the characteristic polynomial of an 𝑛 × 𝑛 matrix 𝐴 splits over 𝕂, there is a similarity transform 𝐴 ↦ 𝑆𝐴𝑆^{−1}, 𝑆 ∈ GL(𝑛, 𝕂), such that 𝑆𝐴𝑆^{−1} has the block upper triangular form shown above.

Corollary 1.53. If the characteristic polynomial of 𝑇 ∶ 𝑉 → 𝑉 splits over 𝕂, and in particular if 𝕂 = ℂ, then for every 𝜆𝑗 ∈ sp(𝑇) we have (algebraic multiplicity of 𝜆𝑗) = dim(𝑀𝜆𝑗) = 𝑚𝑗, where 𝑚𝑗 is the block size in (1.12).

Proof. Taking a basis that casts [𝑇]𝔛𝔛 in the form (1.12), [𝑇 − 𝑥𝐼]𝔛𝔛 will have the same form but with diagonal entries 𝜆𝑗 replaced by (𝜆𝑗 − 𝑥). Then

det[𝑇 − 𝑥𝐼]𝔛𝔛 = ∏_{𝑗=1}^{𝑟} (𝜆𝑗 − 𝑥)^{dim(𝑀𝜆𝑗)} = 𝑝𝑇(𝑥),

since the blocks 𝑇𝑗 correspond to 𝑇|𝑀𝜆𝑗. Obviously, the exponent on (𝜆𝑗 − 𝑥) is also the multiplicity of 𝜆𝑗 in the splitting of 𝑝𝑇. □


Corollary 1.54. If the characteristic polynomial of an 𝑛 × 𝑛 matrix 𝐴 with distinct eigenvalues sp𝕂(𝐴) = {𝜆1, …, 𝜆𝑟} splits over 𝕂, then the following hold:
1. det(𝐴) = ∏_{𝑖=1}^{𝑟} 𝜆𝑖^{𝑚𝑖}, the product of the eigenvalues counted according to their algebraic multiplicities 𝑚𝑖.
2. Tr(𝐴) = ∑_{𝑖=1}^{𝑟} 𝑚𝑖𝜆𝑖, the sum of the eigenvalues counted according to their algebraic multiplicities 𝑚𝑖.
3. More generally, if 𝕂 = ℂ, there are explicit formulas for all coefficients of the characteristic polynomial when we write it in the form

𝑝𝐴(𝑥) = det(𝐴 − 𝑥𝐼) = ∑_{𝑘=0}^{𝑛} (−1)^𝑘 𝑐𝑘(𝐴) 𝑥^𝑘.

If the eigenvalues are listed according to their multiplicities 𝑚𝑘 = 𝑚(𝜆𝑘), say as 𝜇1, …, 𝜇𝑛 with 𝑛 = dim(𝑉) and

𝜇1 = ⋯ = 𝜇𝑚1 = 𝜆1,   𝜇𝑚1+1 = ⋯ = 𝜇𝑚1+𝑚2 = 𝜆2,   etc.,

then 𝑐𝑛 = 1 and

𝑐𝑛−1 = ∑_{𝑗=1}^{𝑛} 𝜇𝑗 = Tr(𝐴),
𝑐𝑛−2 = ∑_{𝑗1<𝑗2} 𝜇𝑗1𝜇𝑗2,
⋮
𝑐0 = ∏_{𝑗=1}^{𝑛} 𝜇𝑗 = det(𝐴).
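The coefficient formulas in Corollary 1.54 are easy to verify numerically (our own illustration, using numpy's characteristic polynomial routine):

```python
import numpy as np

# A triangular example: eigenvalues 2, 2, 5 read off the diagonal
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
mu = np.linalg.eigvals(A)                        # eigenvalues with multiplicity

assert np.isclose(mu.sum(), np.trace(A))         # sum of eigenvalues = Tr(A)
assert np.isclose(mu.prod(), np.linalg.det(A))   # product of eigenvalues = det(A)

# np.poly(A) returns the monic polynomial det(xI - A) = x^3 - e1 x^2 + e2 x - e3,
# whose coefficients are signed elementary symmetric functions of the mu_j
coeffs = np.poly(A)
e2 = sum(mu[i] * mu[j] for i in range(3) for j in range(i + 1, 3))
assert np.isclose(coeffs[1], -mu.sum())
assert np.isclose(coeffs[2], e2)                 # c_{n-2} = sum_{j1<j2} mu_j1 mu_j2
assert np.isclose(coeffs[3], -mu.prod())
print(coeffs)
```

In short, the 𝑐𝑘(𝐴) are the elementary symmetric functions of the eigenvalues 𝜇1, …, 𝜇𝑛.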

Recall that for a complex variable 𝑧 = 𝑥 + 𝑖𝑦 we have 𝑒^𝑧 = 𝑒^𝑥(cos 𝑦 + 𝑖 sin 𝑦) and

cosh(𝑧) = (𝑒^𝑧 + 𝑒^{−𝑧})/2,    cos(𝑧) = (𝑒^{𝑖𝑧} + 𝑒^{−𝑖𝑧})/2,
sinh(𝑧) = (𝑒^𝑧 − 𝑒^{−𝑧})/2,    sin(𝑧) = (𝑒^{𝑖𝑧} − 𝑒^{−𝑖𝑧})/(2𝑖),

and the preceding formulas are easy consequences.
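These identities are easy to spot-check numerically (a quick sketch of ours with Python's cmath):

```python
import cmath

z = 0.7 + 1.3j
i = 1j

# e^z = e^x (cos y + i sin y) for z = x + iy
assert cmath.isclose(cmath.exp(z),
                     cmath.exp(z.real) * (cmath.cos(z.imag) + i * cmath.sin(z.imag)))

# Euler-type identities for cos, sin, cosh, sinh of a complex argument
assert cmath.isclose(cmath.cos(z), (cmath.exp(i * z) + cmath.exp(-i * z)) / 2)
assert cmath.isclose(cmath.sin(z), (cmath.exp(i * z) - cmath.exp(-i * z)) / (2 * i))
assert cmath.isclose(cmath.cosh(z), (cmath.exp(z) + cmath.exp(-z)) / 2)
assert cmath.isclose(cmath.sinh(z), (cmath.exp(z) - cmath.exp(-z)) / 2)
```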

74

2. FURTHER APPLICATIONS OF JORDAN FORM

Remark 2.7. For 𝑘 ∈ ℝ, the differential equations 𝐿𝑦 = 0 and 𝐿𝑦 = 𝑓 involving the operators

𝐿𝑦 = 𝑑²𝑦/𝑑𝑡² + 𝑘𝑦(𝑡)

have been important in physics since the time of Isaac Newton. We mention just a few ways they appear in classical mechanics.

• Newton's Law of Motion. 𝐹 = 𝑚𝑎 describes the instantaneous response of a body subjected to an external force 𝐹(𝑥, 𝑡) that may depend on time and position. (Think of a point mass moving in one dimension for simplicity.) Newton's crucial observation was that the force 𝐹 and the mass 𝑚 of the object are all that matter in determining the moment-to-moment acceleration 𝑎(𝑡) = 𝑑²𝑥/𝑑𝑡² of the object. Thus Newton's law is a differential equation, and one of second order at that:

𝑚 𝑑²𝑥/𝑑𝑡² − 𝐹(𝑥, 𝑡) ≡ 0   for all 𝑡.

To find the quantities of interest, the position function 𝑥(𝑡) and the velocity function 𝑥′(𝑡), we must solve this ODE.

• Motion of a Pendulum. When a pendulum (a weight attached to a string) is moved away from its resting position and released, the balance between the downward force of gravity and the restraining force exerted by the string causes the weight to feel a net force pushing it back toward its equilibrium position. From the geometry of the situation, one can deduce that for small displacements this force is essentially equal to a constant 𝑘 > 0 times the horizontal displacement 𝑥(𝑡) of the mass. For small initial displacements, its motion is then governed by the differential equation

(2.8)  𝑚 𝑑²𝑥/𝑑𝑡² + 𝑘𝑥(𝑡) ≡ 0   for all 𝑡,

since 𝐹 = −𝑘𝑥(𝑡). Owing to the (−) sign appearing here, the lateral force felt by the mass attached to the string is always opposite to the horizontal displacement, which is why the pendulum bob oscillates instead of flying off into the distance.

An even better example is the motion of a mass suspended on a spring. Left alone, the system will find a stable equilibrium position, but it is the nature of a spring to resist displacements from that position, exerting a force proportional to the displacement but in the opposite direction. The motion is then exactly governed by the differential equation (2.8), and the solutions are periodic up-and-down oscillations about the equilibrium position; now the constant 𝑘 > 0 reflects the strength of the spring. Versions of (2.8) in which the constant 𝑘 is a complex number can arise in certain models of physical situations. ○
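Equation (2.8) can be solved symbolically; the sketch below (ours, using sympy) confirms the oscillatory solutions with angular frequency √(𝑘/𝑚):

```python
import sympy as sp

t = sp.symbols('t', real=True)
m, k = sp.symbols('m k', positive=True)
omega = sp.sqrt(k / m)

# general solution of m x'' + k x = 0
x = sp.Function('x')
sol = sp.dsolve(sp.Eq(m * x(t).diff(t, 2) + k * x(t), 0), x(t))
print(sol.rhs)   # a combination of sin and cos with frequency sqrt(k/m)

# check directly that cos(omega*t) solves (2.8)
cand = sp.cos(omega * t)
assert sp.simplify(m * cand.diff(t, 2) + k * cand) == 0
```

The same check works for sin(𝜔𝑡), so the solution space is spanned by the two expected oscillations.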


We end this section with an exercise involving a third-order differential operator whose coefficients are real but whose 𝑝𝐴(𝑥) may have some complex eigenvalues, so there may be solutions of 𝐴(𝐱) = 𝐛 with 𝐱(𝑡) in ℂⁿ. We can always find complex vector-valued solutions to (2.4) by regarding 𝐴 ∈ M(𝑛, ℝ) as a complex matrix with real entries; then the solutions will have complex values 𝐳(𝑡) = 𝐱(𝑡) + 𝑖𝐲(𝑡) in ℂⁿ, whose real and imaginary parts 𝐱(𝑡) and 𝐲(𝑡) are both real-valued solutions since
    𝑑𝐳/𝑑𝑡 = 𝑑𝐱/𝑑𝑡 + 𝑖 𝑑𝐲/𝑑𝑡,
and these will include all the real vector-valued solutions.

Exercise 2.8. Recast the third-order ordinary differential equation
    𝐿𝑦 = 𝑑³𝑦/𝑑𝑡³ − 𝑑²𝑦/𝑑𝑡² − 𝑑𝑦/𝑑𝑡 + 𝑦(𝑡) = 0
as a first-order constant-coefficient system
    𝑑𝑍/𝑑𝑡 = 𝐴 ⋅ 𝑍(𝑡)    with 𝑍(0) = (𝑐₁, …, 𝑐₃).
Find the Jordan form of the coefficient matrix 𝐴. Then give a general formula for the solutions 𝑍(𝑡) = 𝑒^{𝑡𝐴} ⋅ 𝑍(0), and find a basis for the full set of solutions to 𝐿𝑦 = 0 in 𝒞∞(ℝ).
Hint. The characteristic polynomial 𝑝𝐴(𝑥) of the matrix 𝐴 in the associated first-order system 𝑑𝑍/𝑑𝑡 = 𝐴 ⋅ 𝑍(𝑡) has 𝜆 = 1 as a root.
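A computational sketch of this exercise (not from the text): we take the state vector 𝑍 = (𝑦, 𝑦′, 𝑦″), one standard choice, form the companion matrix 𝐴, and let SymPy compute the characteristic polynomial and Jordan form:

```python
import sympy as sp

x = sp.symbols("x")

# Companion matrix for y''' - y'' - y' + y = 0 with Z = (y, y', y'')
A = sp.Matrix([
    [0, 1, 0],
    [0, 0, 1],
    [-1, 1, 1],   # last row encodes y''' = -y + y' + y''
])

p = A.charpoly(x).as_expr()
print(sp.factor(p))        # (x - 1)**2*(x + 1):  lambda = 1 is a double root

P, J = A.jordan_form()     # A = P*J*P**-1
print(J)                   # one 2x2 Jordan block for lambda = 1, a 1x1 block for -1
```

Since 𝜆 = 1 is a double root and 𝜆 = −1 is simple, the solution basis for 𝐿𝑦 = 0 is {𝑒^𝑡, 𝑡𝑒^𝑡, 𝑒^{−𝑡}}.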

2.2. Normal Forms for Linear Operators over ℝ

How can one analyze a linear operator 𝑇 ∶ 𝑉 → 𝑉 when the characteristic polynomial 𝑝𝑇(𝑥) does not split over 𝕂? One approach is via the "rational canonical form," which makes no attempt to replace the ground field 𝕂 with a larger field of scalars 𝕂̃ over which 𝑝𝑇 might split. We will not pursue that topic in these Notes. A different approach, which we will illustrate for 𝕂 = ℝ, is to examine an enlarged field of scalars 𝕂̃ ⊇ 𝕂 (a "field extension"). Then we may in an obvious way regard 𝕂[𝑥] as a subalgebra within 𝕂̃[𝑥], and since 𝕂̃ ⊇ 𝕂, there is a better chance that 𝑓(𝑥) will have roots in 𝕂̃ and split into linear factors in 𝕂̃[𝑥]:

(2.9)    𝑓(𝑥) = 𝑐 ⋅ ∏_{𝑗=1}^{𝑑} (𝑥 − 𝜇𝑗)^{𝑚𝑗}    with 𝑐 and 𝜇𝑗 in 𝕂̃.

It is, in fact, always possible to embed 𝕂 in a field 𝕂̃ that is algebraically closed, which means that every polynomial 𝑓(𝑥) ∈ 𝕂̃[𝑥] has roots in 𝕂̃ and splits as in (2.9). The fundamental theorem of algebra asserts that the complex


number field is algebraically closed, but the real number system ℝ is not: for instance, 𝑥² + 1 and 𝑥² + 𝑥 + 1 ∈ ℝ[𝑥] have no roots in ℝ and cannot split into linear factors in ℝ[𝑥]. However, they do split when regarded as elements of ℂ[𝑥]; for example,
    𝑥² + 𝑥 + 1 = (𝑥 − 𝑧₊) ⋅ (𝑥 − 𝑧₋) = (𝑥 − ½(−1 + 𝑖√3)) ⋅ (𝑥 − ½(−1 − 𝑖√3)),
where 𝑖 = √−1. In this simple example one can find the complex roots 𝑧± = ½(−1 ± 𝑖√3) using the quadratic formula.
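A quick numerical illustration (ours, not the text's) of the quadratic formula carried out over ℂ, where `cmath.sqrt` handles the negative discriminant:

```python
import cmath

# Roots of x**2 + x + 1 from the quadratic formula
a, b, c = 1, 1, 1
disc = b * b - 4 * a * c                    # -3 < 0: no real roots
z_plus = (-b + cmath.sqrt(disc)) / (2 * a)
z_minus = (-b - cmath.sqrt(disc)) / (2 * a)

print(z_plus, z_minus)                      # the conjugate pair (-1 ± i*sqrt(3))/2
```

The two roots form a conjugate pair, as they must for a real polynomial.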

Complexification: A Case Study. "Complexification" is a fancy word for a process you have already encountered by other names. In its simplest form, you learned to solve quadratic equations 𝑎𝑥² + 𝑏𝑥 + 𝑐 = 0 having coefficients in ℝ by passing from the real numbers to the larger field of complex numbers ℂ. This is what lets us make sense of all the solutions provided by the quadratic formula
    𝑥± = (−𝑏 ± √(𝑏² − 4𝑎𝑐)) / 2𝑎.
This perplexed mathematicians when the discriminant 𝑏² − 4𝑎𝑐 turned out to be negative; the simplest example is the quadratic equation 𝑥² + 1 = 0, whose solutions 𝑥 = ±√−1 were deemed "imaginary" because no real number can have 𝑥² = −1 (the complex number system had not yet been invented). Nowadays this is familiar stuff, and we have already encountered this shift from ℝ to ℂ, for instance to find and interpret the complex eigenvalues that can arise when we solve the polynomial equation 𝑝𝐴(𝑥) = det(𝐴 − 𝑥𝐼) = 0, even if the entries in 𝐴 are real. The point of this section is to provide a deeper understanding of this "complexification" process and its use to reveal the structure of matrix multiplication operators on ℝⁿ,
(2.10)    𝐲 = 𝐿𝐴(𝐱) = 𝐴 ⋅ 𝐱    with    𝑦𝑗 = ∑_{𝑘=1}^{𝑛} 𝑎𝑗𝑘𝑥𝑘,
determined by matrices 𝐴 ∈ M(𝑛, ℝ). Obviously, the same formula could be applied to 𝑛-tuples 𝐳 = (𝑧₁, …, 𝑧𝑛) having complex entries 𝑧𝑗 = 𝑥𝑗 + 𝑖𝑦𝑗 with 𝑥𝑗, 𝑦𝑗 ∈ ℝ. This yields an operator 𝐿̃𝐴 ∶ ℂⁿ → ℂⁿ given by
    𝐰 = 𝐿̃𝐴(𝐳) = 𝐴 ⋅ 𝐳 = 𝐴 ⋅ [𝐱] + 𝑖𝐴 ⋅ [𝐲]
with 𝑤𝑗 = ∑_{𝑘=1}^{𝑛} 𝑎𝑗𝑘𝑧𝑘. While 𝐿𝐴 was ℝ-linear on ℝⁿ, the "extension" 𝐿̃𝐴 to ℂⁿ turns out to be a complex-linear operator on the complex vector space ℂⁿ, whose elements will hereafter be written 𝐳 = 𝐱 + 𝑖𝐲 if 𝐳 = (𝑧₁, …, 𝑧𝑛) with 𝑧𝑘 = 𝑥𝑘 + 𝑖𝑦𝑘 = 𝑥𝑘 + √−1 𝑦𝑘.
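The extension just described can be checked numerically; a small sketch (matrix chosen arbitrarily for illustration):

```python
import numpy as np

# A real matrix acts on z = x + iy in C^n entrywise; the result is
# A.x + i A.y, exactly the extension described above.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
z = x + 1j * y

w = A @ z
print(np.allclose(w.real, A @ x), np.allclose(w.imag, A @ y))  # True True
```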


Exercise 2.9. Verify that 𝐿̃𝐴 is a complex-linear map on ℂⁿ for any 𝐴 in M(𝑛, ℝ) by showing that
(a) 𝐿̃𝐴((𝑎 + 𝑖𝑏) ⋅ 𝐳) = (𝑎 + 𝑖𝑏) ⋅ 𝐿̃𝐴(𝐳)
(b) 𝐿̃𝐴(𝐳 + 𝐰) = 𝐿̃𝐴(𝐳) + 𝐿̃𝐴(𝐰)
for all 𝐳, 𝐰 ∈ ℂⁿ and (𝑎 + 𝑖𝑏) ∈ ℂ.

If we restrict 𝐿̃𝐴 to the "real points" 𝐳 = 𝐱 + 𝑖𝟎 in ℂⁿ with 𝐱 ∈ ℝⁿ, we find that 𝐿̃𝐴 is an extension of 𝐿𝐴 in the following sense:
    𝐿̃𝐴(𝐱 + 𝑖𝟎) = 𝐿𝐴(𝐱) + 𝑖𝟎    for 𝐱 ∈ ℝⁿ.
Therefore, it is natural to think of ℂⁿ as a "complexification" of ℝⁿ because ℂⁿ = ℝⁿ + 𝑖ℝⁿ, and then 𝑇ℂ = 𝐿̃𝐴 ∶ ℂⁿ → ℂⁿ becomes the complexification of the ℝ-linear operator 𝑇 = 𝐿𝐴 on ℝⁿ.

Complexification of Real Vector Spaces and Operators. We can do the same for any vector space 𝑉 over ℝ, defining its complexification 𝑉ℂ to be the set of formal sums 𝑧 = 𝑥 + 𝑖𝑦 with 𝑥, 𝑦 ∈ 𝑉. The same arguments as in Exercise 2.9 show that 𝑉ℂ = 𝑉 + 𝑖𝑉 is a vector space over ℂ if we define
(2.11)    (𝑎 + 𝑖𝑏) ⋅ (𝑥 + 𝑖𝑦) = (𝑎𝑥 − 𝑏𝑦) + 𝑖(𝑎𝑦 + 𝑏𝑥)
          (𝑥 + 𝑖𝑦) + (𝑢 + 𝑖𝑣) = (𝑥 + 𝑢) + 𝑖(𝑦 + 𝑣)
for 𝑎, 𝑏 ∈ ℝ, 𝑥, 𝑦 ∈ 𝑉. If 𝑇 ∶ 𝑉 → 𝑉 is a linear operator on a real vector space, its complexification 𝑇ℂ ∶ 𝑉ℂ → 𝑉ℂ is the ℂ-linear map
(2.12)    𝑇ℂ(𝑥 + 𝑖𝑦) = 𝑇(𝑥) + 𝑖𝑇(𝑦)    in 𝑉ℂ, for any 𝑥, 𝑦 ∈ 𝑉.

The characteristic polynomial 𝑝𝑇ℂ (𝑡) = det(𝑇ℂ − 𝑡𝐼) ∈ ℂ[𝑡] turns out to be the same as 𝑝𝑇 (𝑡) when we view 𝑝𝑇 (𝑡) ∈ ℝ[𝑡] as a polynomial in ℂ[𝑡] that happens to have real coefficients. Since 𝑝𝑇ℂ (𝑡) always splits over ℂ, the theory developed in Chapter 1 applies to 𝑇ℂ . Our task is then to interpret that theory in terms of the original real linear operator 𝑇 ∶ ℝ𝑛 → ℝ𝑛 . Exercise 2.10. If 𝑇 is a linear operator from ℝ𝑛 → ℝ𝑛 and 𝑇ℂ ∶ ℂ𝑛 → ℂ𝑛 is its complexification as defined in (2.11) and (2.12), verify the following: (a) 𝑇ℂ is in fact a ℂ-linear operator on ℂ𝑛 . (b) The characteristic polynomials 𝑝𝑇 (𝑡) in ℝ[𝑡] and 𝑝𝑇ℂ (𝑡) in ℂ[𝑡] are the same, with 𝑝𝑇ℂ (𝑡) = 𝑝𝑇 (𝑡) when we identify ℝ[𝑡] ⊆ ℂ[𝑡]. Hint. Polynomials in 𝕂[𝑡] are equal ⇔ they have the same coefficients. Here, 𝑝𝑇ℂ has coefficients in ℂ while 𝑝𝑇 has coefficients in ℝ, but we are identifying ℝ = ℝ + 𝑖0 ⊆ ℂ.


Relations Between ℝ[𝑡] and ℂ[𝑡]. When we view a real-coefficient polynomial 𝑓(𝑡) = ∑_{𝑗=0}^{𝑚} 𝑎𝑗𝑡^𝑗 ∈ ℝ[𝑡] as an element of ℂ[𝑡], it can have complex roots as well as real roots. However:

Complex Roots of a Real Polynomial. When we view 𝑓(𝑡) ∈ ℝ[𝑡] as an element of ℂ[𝑡], nonreal roots must occur in conjugate pairs 𝑧± = 𝑢 ± 𝑖𝑣 with 𝑢, 𝑣 real and 𝑣 ≠ 0. Such roots can have nontrivial multiplicities, resulting in paired factors (𝑡 − 𝑧₊)^𝑚 ⋅ (𝑡 − 𝑧₋)^𝑚 in the irreducible factorization of 𝑓(𝑡) in ℂ[𝑡].

In fact, if 𝑧 = 𝑥 + 𝑖𝑦 with 𝑥, 𝑦 real and 𝑓(𝑧) = 0, the complex conjugate 𝑧̄ = 𝑥 − 𝑖𝑦 is also a root of 𝑓 because the coefficients 𝑎𝑗 are real, so
    𝑓(𝑧̄) = ∑_{𝑗=0}^{𝑚} 𝑎𝑗 𝑧̄^𝑗 = conj( ∑_{𝑗=0}^{𝑚} 𝑎𝑗 𝑧^𝑗 ) = conj(𝑓(𝑧)) = 0.
(Recall that conj(𝑧 + 𝑤) = 𝑧̄ + 𝑤̄, conj(𝑧𝑤) = 𝑧̄ ⋅ 𝑤̄, and conj(𝑧̄) = 𝑧 for 𝑧, 𝑤 ∈ ℂ.) Thus, the number of nonreal roots is even if any exist, while the number of real roots is unrestricted and might be zero. The splitting of 𝑓 in ℂ[𝑡] can then be written as

(2.13)    𝑓(𝑡) = 𝑐 ⋅ ∏_{𝑗=1}^{𝑟} (𝑡 − 𝜇𝑗)^{𝑚𝑗} (𝑡 − 𝜇̄𝑗)^{𝑚𝑗} ⋅ ∏_{𝑘=𝑟+1}^{𝑠} (𝑡 − 𝑟𝑘)^{𝑚𝑘}
               = 𝑐 ⋅ ∏_{𝑗=1}^{𝑟} ( (𝑡 − 𝜇𝑗)(𝑡 − 𝜇̄𝑗) )^{𝑚𝑗} ⋅ ∏_{𝑘=𝑟+1}^{𝑠} (𝑡 − 𝑟𝑘)^{𝑚𝑘},

where the 𝜇𝑗 are complex and nonreal (𝜇𝑗 ≠ 𝜇̄𝑗) and the 𝑟𝑘 are the distinct real roots of 𝑓. Obviously
    𝑛 = deg(𝑓) = ∑_{𝑗=1}^{𝑟} 2𝑚𝑗 + ∑_{𝑘=𝑟+1}^{𝑠} 𝑚𝑘.
Since 𝑓 has real coefficients, all complex numbers must disappear when the products involving nonreal roots are expanded. In particular, each nonreal conjugate pair 𝜇, 𝜇̄ yields a quadratic with real coefficients. Hence,
(2.14)    𝑄𝜇(𝑡) = (𝑡 − 𝜇)(𝑡 − 𝜇̄) = 𝑡² − 2 Re(𝜇) 𝑡 + |𝜇|²
and
    𝑓(𝑡) = 𝑐 ⋅ ∏_{𝑘=𝑟+1}^{𝑠} (𝑡 − 𝑟𝑘)^{𝑚𝑘} ⋅ ∏_{𝑗=1}^{𝑟} (𝑄𝜇𝑗(𝑡))^{𝑚𝑗}

is a factorization of 𝑓(𝑡) into linear and irreducible quadratic factors in ℝ[𝑡], and every 𝑓 ∈ ℝ[𝑡] can be decomposed this way. This is the (unique) decomposition of 𝑓(𝑡) into irreducible factors in ℝ[𝑡]. By definition, the 𝑄𝜇 (𝑡) have no real roots and cannot be products of polynomials of lower degree in ℝ[𝑡], while the linear factors (𝑡 − 𝑟𝑗 ) are irreducible over ℝ as they stand. The following terminology is standard.
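The decomposition into real-irreducible factors can be seen in SymPy; here is a sketch for the sample polynomial 𝑡⁴ − 1 (our choice, not the text's):

```python
import sympy as sp

t = sp.symbols("t")
f = t**4 - 1   # sample real polynomial

print(sp.factor(f))   # (t - 1)*(t + 1)*(t**2 + 1): linear and irreducible quadratic factors
roots = sp.roots(f)   # mapping root -> multiplicity
# Nonreal roots occur in conjugate pairs with equal multiplicities:
for r, m in roots.items():
    assert roots[sp.conjugate(r)] == m
```

The quadratic factor 𝑡² + 1 is exactly 𝑄𝜇(𝑡) for the conjugate pair 𝜇, 𝜇̄ = ±𝑖.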


Definition 2.11. A nonconstant polynomial 𝑓 ∈ 𝕂[𝑡] is irreducible if it cannot be factored as 𝑓(𝑡) = 𝑔(𝑡)ℎ(𝑡) with 𝑔, ℎ nonconstant, hence of lower degree than 𝑓. A polynomial is monic if its leading coefficient is 1. It is well known that every monic 𝑓 ∈ 𝕂[𝑡] factors uniquely as ∏_{𝑗=1}^{𝑟} ℎ𝑗(𝑡)^{𝑚𝑗} where each ℎ𝑗 is monic and irreducible in 𝕂[𝑡]. The exponent 𝑚𝑗 ≥ 1 is the multiplicity of ℎ𝑗, and this factorization is unique.

The simplest irreducibles (over any 𝕂) are the linear polynomials 𝑎𝑡 + 𝑏 (with 𝑎 ≠ 0, since "irreducibility" only applies to nonconstant polynomials). This follows from the degree formula:

Degree Formula. We have deg(𝑔ℎ) = deg(𝑔) + deg(ℎ) for all nonzero 𝑔, ℎ ∈ 𝕂[𝑡].

If we could factor 𝑎𝑡 + 𝑏 = 𝑔(𝑡)ℎ(𝑡), either 𝑔(𝑡) or ℎ(𝑡) would have degree 0, which is not allowed in a nontrivial factorization. Thus 𝑎𝑡 + 𝑏 has no nontrivial factorization. When 𝕂 = ℂ, all irreducible polynomials have degree 1, but if 𝕂 = ℝ, they can have degree 1 or 2.

Lemma 2.12. The irreducible monic polynomials in ℝ[𝑡] have the form
1. (𝑡 − 𝑟) with 𝑟 ∈ ℝ, or
2. 𝑡² + 𝑏𝑡 + 𝑐 with 𝑏² − 4𝑐 < 0. These are the polynomials (𝑡 − 𝜇)(𝑡 − 𝜇̄) with 𝜇 a nonreal element of ℂ.

Proof. Linear polynomials 𝑎𝑡 + 𝑏 (𝑎 ≠ 0) in ℝ[𝑡] are irreducible, as noted above. If 𝑓 has the form (2.6), the quadratic formula can be applied to 𝑓(𝑡) = 𝑡² + 𝑏𝑡 + 𝑐 in ℂ[𝑡] to find its roots
    𝜇, 𝜇̄ = (−𝑏 ± √(𝑏² − 4𝑐))/2 = (−𝑏 ± 𝑖√(4𝑐 − 𝑏²))/2.
There are three possible outcomes:
1. When 𝑏² − 4𝑐 = 0, 𝑓(𝑡) has a single real root with multiplicity 𝑚 = 2, and then 𝑓(𝑡) = (𝑡 + 𝑏/2)².
2. When 𝑏² − 4𝑐 > 0, there are two distinct real roots
    𝑟± = ½( −𝑏 ± √(𝑏² − 4𝑐) ),
and then 𝑓(𝑡) = (𝑡 − 𝑟₊)(𝑡 − 𝑟₋).
3. When the discriminant 𝑏² − 4𝑐 is negative, there are two distinct conjugate nonreal roots in ℂ,
    𝜇 = (−𝑏 + 𝑖√(4𝑐 − 𝑏²))/2  and  𝜇̄ = (−𝑏 − 𝑖√(4𝑐 − 𝑏²))/2    (𝑖 = √−1).
In this case, 𝑓(𝑡) = (𝑡 − 𝜇)(𝑡 − 𝜇̄) has the form (2.6) in ℝ[𝑡].
The quadratic 𝑓(𝑡) is irreducible in ℝ[𝑡] exactly when it has two nonreal roots; otherwise it would have a factorization (𝑡 − 𝑟₁)(𝑡 − 𝑟₂) in ℝ[𝑡] and also in ℂ[𝑡] that would disagree with (𝑡 − 𝜇)(𝑡 − 𝜇̄), contrary to unique factorization in ℂ[𝑡]. That cannot occur. □


Complexification of Linear Operators over ℝ. We now discuss complexification of arbitrary vector spaces over ℝ and complexifications 𝑇ℂ of the ℝ-linear operators 𝑇 ∶ 𝑉 → 𝑉 that act on them.

Definition 2.13 (Complexification). Given an arbitrary vector space 𝑉 over ℝ, its complexification 𝑉ℂ is made up of the formal sums 𝐳 = 𝑥 + 𝑖𝑦 with 𝑥, 𝑦 ∈ 𝑉, equipped with the operations
    𝐳 + 𝐰 = (𝑥 + 𝑖𝑦) + (𝑢 + 𝑖𝑣) = (𝑥 + 𝑢) + 𝑖(𝑦 + 𝑣)
and
    (𝑎 + 𝑖𝑏) ⋅ 𝐳 = (𝑎 + 𝑖𝑏)(𝑥 + 𝑖𝑦) = (𝑎𝑥 − 𝑏𝑦) + 𝑖(𝑏𝑥 + 𝑎𝑦)
for 𝑎 + 𝑖𝑏 ∈ ℂ. Two symbols 𝐳 = (𝑥 + 𝑖𝑦) and 𝐳′ = (𝑥′ + 𝑖𝑦′) designate the same element of 𝑉ℂ ⇔ 𝑥′ = 𝑥 and 𝑦′ = 𝑦. The real points in 𝑉ℂ are those of the form 𝑉 + 𝑖0. This set is a vector space over ℝ (but not over ℂ) because
    (𝑐 + 𝑖0) ⋅ (𝑥 + 𝑖0) = (𝑐𝑥) + 𝑖0    for 𝑐 ∈ ℝ, 𝑥 ∈ 𝑉,
    (𝑥 + 𝑖0) + (𝑥′ + 𝑖0) = (𝑥 + 𝑥′) + 𝑖0    for 𝑥, 𝑥′ ∈ 𝑉.
Clearly the operations (+) and (scaling by a real number 𝑐 + 𝑖0) match the usual operations in 𝑉 when restricted to 𝑉 + 𝑖0. If 𝑇 ∶ 𝑉 → 𝑉 is an ℝ-linear operator, its complexification 𝑇ℂ ∶ 𝑉ℂ → 𝑉ℂ is the map
(2.15)    𝑇ℂ(𝑥 + 𝑖𝑦) = 𝑇(𝑥) + 𝑖𝑇(𝑦)    for 𝑥, 𝑦 ∈ 𝑉,
which turns out to be a ℂ-linear operator on 𝑉ℂ.

Exercise 2.14. Prove that 𝑉ℂ is actually a vector space over ℂ.
Hint. Check the vector space axioms. In particular, you must check that (𝑧₁𝑧₂) ⋅ 𝑤 = 𝑧₁ ⋅ (𝑧₂ ⋅ 𝑤) for 𝑧₁, 𝑧₂ ∈ ℂ and 𝑤 ∈ 𝑉ℂ and that (𝑐 + 𝑖0) ⋅ (𝑥 + 𝑖0) = (𝑐 ⋅ 𝑥) + 𝑖0 for 𝑐 ∈ ℝ, so 𝑉 + 𝑖0 is a subspace over ℝ isomorphic to 𝑉 under the map 𝑥 → 𝑥 + 𝑖0.

Example 2.15. You should check that 𝑇ℂ is a ℂ-linear operator on the complex vector space 𝑉ℂ. Most steps are routine; the messy part is proving that 𝑇ℂ(𝑧 ⋅ 𝐰) = 𝑧 ⋅ 𝑇ℂ(𝐰) for 𝑧 ∈ ℂ, 𝐰 ∈ 𝑉ℂ, so we will only do that. If 𝑧 = 𝑎 + 𝑖𝑏 ∈ ℂ and 𝐰 = 𝑢 + 𝑖𝑣 in 𝑉ℂ, we get
    𝑇ℂ((𝑎 + 𝑖𝑏)(𝑢 + 𝑖𝑣)) = 𝑇ℂ((𝑎𝑢 − 𝑏𝑣) + 𝑖(𝑏𝑢 + 𝑎𝑣))
        = 𝑇(𝑎𝑢 − 𝑏𝑣) + 𝑖𝑇(𝑏𝑢 + 𝑎𝑣)
        = 𝑎𝑇(𝑢) − 𝑏𝑇(𝑣) + 𝑖𝑏𝑇(𝑢) + 𝑖𝑎𝑇(𝑣)
        = (𝑎 + 𝑖𝑏) ⋅ (𝑇(𝑢) + 𝑖𝑇(𝑣)) = (𝑎 + 𝑖𝑏) ⋅ 𝑇ℂ(𝑢 + 𝑖𝑣).

Exercise 2.16. If 𝐴, 𝐵 ∈ M(𝑛, ℝ) and we define an operator (𝐴 + 𝑖𝐵) on ℂⁿ = ℝⁿ + 𝑖ℝⁿ by letting
    (𝐴 + 𝑖𝐵) ⋅ (𝑥 + 𝑖𝑦) = (𝐴𝑥 − 𝐵𝑦) + 𝑖(𝐵𝑥 + 𝐴𝑦)    for 𝑥, 𝑦 ∈ ℝⁿ,
is this a ℂ-linear operator on ℂⁿ? What if 𝐴 = 𝐵?

Exercise 2.17. If 𝑀₁, …, 𝑀𝑟 are vector spaces over ℝ, verify that the complexification of 𝑉 = 𝑀₁ ⊕ ⋯ ⊕ 𝑀𝑟 is 𝑉ℂ = (𝑀₁)ℂ ⊕ ⋯ ⊕ (𝑀𝑟)ℂ.
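Definitions (2.11) and (2.15) can be modeled directly; this sketch (not the book's code) represents 𝑉ℂ for 𝑉 = ℝⁿ as pairs (𝑥, 𝑦) and checks ℂ-linearity of 𝑇ℂ numerically:

```python
import numpy as np

def scale(a, b, x, y):
    """(a + ib) . (x + iy) = (ax - by) + i(bx + ay), as in (2.11)."""
    return a * x - b * y, b * x + a * y

def complexified(T, x, y):
    """T_C(x + iy) = T(x) + iT(y), as in (2.15)."""
    return T @ x, T @ y

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a, b = 0.5, -2.0

# C-linearity:  T_C((a + ib).(x + iy)) == (a + ib).T_C(x + iy)
lhs = complexified(T, *scale(a, b, x, y))
rhs = scale(a, b, *complexified(T, x, y))
print(np.allclose(lhs[0], rhs[0]) and np.allclose(lhs[1], rhs[1]))  # True
```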


Lemma 2.18 (Bases in 𝑉 vs Bases in 𝑉ℂ). If 𝔛 = {𝑒𝑗} is an ℝ-basis in 𝑉, then {𝑒̃𝑗 = 𝑒𝑗 + 𝑖0} is a ℂ-basis in 𝑉ℂ. In particular, dimℝ(𝑉) = dimℂ(𝑉ℂ).

Proof. If 𝑤 = 𝑢 + 𝑖𝑣 in 𝑉ℂ, there are real coefficients {𝑐𝑗}, {𝑑𝑗} such that
    𝑤 = ( ∑𝑗 𝑐𝑗𝑒𝑗 ) + 𝑖( ∑𝑗 𝑑𝑗𝑒𝑗 ) = ∑𝑗 (𝑐𝑗 + 𝑖𝑑𝑗)(𝑒𝑗 + 𝑖0),
so the {𝑒̃𝑗} span 𝑉ℂ. As for independence, if we have
    0 + 𝑖0 = ∑𝑗 𝑧𝑗𝑒̃𝑗 = ∑𝑗 (𝑐𝑗 + 𝑖𝑑𝑗) ⋅ (𝑒𝑗 + 𝑖0) = ( ∑𝑗 𝑐𝑗𝑒𝑗 ) + 𝑖( ∑𝑗 𝑑𝑗𝑒𝑗 )
in 𝑉ℂ for coefficients 𝑧𝑗 = 𝑐𝑗 + 𝑖𝑑𝑗 in ℂ, then ∑𝑗 𝑐𝑗𝑒𝑗 = 0 = ∑𝑗 𝑑𝑗𝑒𝑗, which implies 𝑐𝑗 = 0 and 𝑑𝑗 = 0 because {𝑒𝑗} is a basis in 𝑉. Thus 𝑧𝑗 = 0 for all 𝑗. □

Example 2.19. If 𝑉 = ℝⁿ, then in an obvious sense 𝑉ℂ = ℝⁿ + 𝑖ℝⁿ is the same as ℂⁿ. If 𝐴 ∈ M(𝑛, ℝ), we get an ℝ-linear operator 𝑇 = 𝐿𝐴 that sends 𝑣 to 𝐴 ⋅ 𝑣, whose matrix with respect to the standard basis 𝔛 = {𝑒𝑗} in ℝⁿ is [𝑇]𝔛𝔛 = 𝐴. If 𝔜 = {𝑒̃𝑗 = 𝑒𝑗 + 𝑖0} is the corresponding basis in 𝑉ℂ, it is easy to check that we have [(𝐿𝐴)ℂ]𝔜𝔜 = [𝐿𝐴]𝔛𝔛 = 𝐴, so (𝐿𝐴)ℂ is obtained by letting the matrix 𝐴 with real entries act on complex column vectors by matrix multiplication (regarding 𝐴 as a complex matrix that happens to have real entries).

Definition 2.20. The conjugation operator 𝐽 on 𝑉ℂ maps 𝑥 + 𝑖𝑦 → 𝑥 − 𝑖𝑦. It is an ℝ-linear operator on 𝑉ℂ with
    𝐽(𝑐 ⋅ 𝑤) = 𝑐 ⋅ 𝐽(𝑤)    if 𝑐 = 𝑐 + 𝑖0 ∈ ℝ,
but it is conjugate-linear over ℂ, with
    𝐽(𝑧 ⋅ 𝑤) = 𝑧̄ ⋅ 𝐽(𝑤)    and    𝐽(𝑤₁ + 𝑤₂) = 𝐽(𝑤₁) + 𝐽(𝑤₂)
for 𝑧 ∈ ℂ, 𝑤 ∈ 𝑉ℂ.

Exercise 2.21. Verify the following basic properties of conjugation.
(a) 𝐽² = 𝐽 ∘ 𝐽 = id, so 𝐽⁻¹ = 𝐽.
(b) 𝑤 ∈ 𝑉ℂ is a real point if and only if 𝐽(𝑤) = 𝑤.
(c) (𝑤 + 𝐽(𝑤))/2 = 𝑥 + 𝑖0 and (𝑤 − 𝐽(𝑤))/2𝑖 = 𝑦 + 𝑖0 if 𝑤 = 𝑥 + 𝑖𝑦 in 𝑉ℂ.

We can use the operator 𝐽 to identify the ℂ-linear maps 𝑆 ∶ 𝑉ℂ → 𝑉ℂ of real type, those 𝑆 ∈ Homℂ(𝑉ℂ, 𝑉ℂ) such that 𝑆 = 𝑇ℂ for some ℝ-linear operator 𝑇 on 𝑉 (so 𝑆(𝑥 + 𝑖𝑦) = 𝑇(𝑥) + 𝑖𝑇(𝑦)).

Lemma 2.22. If 𝑇 ∶ 𝑉 → 𝑉 is ℝ-linear and 𝑇ℂ is its complexification, then 𝐽𝑇ℂ = 𝑇ℂ𝐽, or equivalently 𝐽𝑇ℂ𝐽 = 𝑇ℂ.
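The criterion 𝑆𝐽 = 𝐽𝑆 can be probed numerically for matrix operators on ℂⁿ. This sketch uses a helper of our own devising, `commutes_with_J`, which tests the identity on random vectors:

```python
import numpy as np

# Conjugation J(z) = conj(z) on C^n.  For S = L_A, the operator commutes
# with J exactly when A has real entries (i.e., S is of real type).
def commutes_with_J(A, trials=20, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    for _ in range(trials):
        z = rng.normal(size=n) + 1j * rng.normal(size=n)
        if not np.allclose(np.conj(A @ z), A @ np.conj(z)):
            return False
    return True

A_real = np.array([[1.0, 2.0], [3.0, 4.0]])
A_cplx = np.array([[1j, 0.0], [0.0, 1.0]])
print(commutes_with_J(A_real), commutes_with_J(A_cplx))  # True False
```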


If 𝑆 ∶ 𝑉ℂ → 𝑉ℂ is any ℂ-linear operator, the following properties are equivalent.
1. 𝑆 = 𝑇ℂ for some ℝ-linear map 𝑇 ∶ 𝑉 → 𝑉.
2. 𝑆𝐽 = 𝐽𝑆.
3. 𝑆 leaves invariant the space 𝑉 + 𝑖0 of real vectors in 𝑉ℂ.

Proof. Implication (1) ⇒ (2) follows because
    𝐽𝑇ℂ𝐽(𝑥 + 𝑖𝑦) = 𝐽𝑇ℂ(𝑥 − 𝑖𝑦) = 𝐽(𝑇𝑥 − 𝑖𝑇𝑦) = 𝑇𝑥 + 𝑖𝑇𝑦 = 𝑇ℂ(𝑥 + 𝑖𝑦).
As for (2) ⇒ (3), suppose 𝑆𝐽 = 𝐽𝑆. We have 𝐰 ∈ 𝑉 + 𝑖0 ⇔ 𝐰 = ½(𝐰 + 𝐽(𝐰)), and for such vectors we have (by (2))
    𝑆𝐰 = ½(𝑆𝐰 + 𝑆𝐽𝐰) = ½(𝑆𝐰 + 𝐽𝑆𝐰),
so 𝑆𝐰 is in (𝑉 + 𝑖0). For (3) ⇒ (1): if 𝑆 leaves 𝑉 + 𝑖0 invariant, then 𝑆(𝑥 + 𝑖0) = 𝑇(𝑥) + 𝑖0 for a unique 𝑇(𝑥) in 𝑉, and we must show 𝑥 → 𝑇𝑥 is ℝ-linear. In fact, if 𝑐₁, 𝑐₂ ∈ ℝ and 𝑥₁, 𝑥₂ ∈ 𝑉, we have 𝑆((𝑐₁𝑥₁ + 𝑐₂𝑥₂) + 𝑖0) = 𝑇(𝑐₁𝑥₁ + 𝑐₂𝑥₂) + 𝑖0, while by ℂ-linearity 𝑆 must also satisfy the identities
    𝑆((𝑐₁𝑥₁ + 𝑐₂𝑥₂) + 𝑖0) = 𝑆((𝑐₁𝑥₁ + 𝑖0) + (𝑐₂𝑥₂ + 𝑖0))
        = 𝑆((𝑐₁ + 𝑖0)(𝑥₁ + 𝑖0)) + 𝑆((𝑐₂ + 𝑖0)(𝑥₂ + 𝑖0))
        = (𝑐₁ + 𝑖0)(𝑇𝑥₁ + 𝑖0) + (𝑐₂ + 𝑖0)(𝑇𝑥₂ + 𝑖0)
        = (𝑐₁𝑇𝑥₁ + 𝑐₂𝑇𝑥₂) + 𝑖0.
Thus 𝑇 is ℝ-linear. Furthermore, 𝑇ℂ = 𝑆 because
    𝑇ℂ(𝑥 + 𝑖𝑦) = 𝑇𝑥 + 𝑖𝑇𝑦 = (𝑇𝑥 + 𝑖0) + 𝑖(𝑇𝑦 + 𝑖0)
        = 𝑆(𝑥 + 𝑖0) + 𝑖𝑆(𝑦 + 𝑖0) = 𝑆((𝑥 + 𝑖0) + 𝑖(𝑦 + 𝑖0)) = 𝑆(𝑥 + 𝑖𝑦)
for all 𝑥 + 𝑖𝑦 ∈ 𝑉ℂ. □

Exercise 2.23. Prove the following:
(a) (𝑇ℂ)^𝑘 = (𝑇^𝑘)ℂ for all 𝑘 = 0, 1, 2, … .
(b) 𝑒^{𝑇ℂ} = (𝑒^𝑇)ℂ.

Lemma 2.24. If 𝑝𝑇(𝑥) is the characteristic polynomial of an ℝ-linear operator 𝑇 ∶ 𝑉 → 𝑉, then 𝑝𝑇ℂ(𝑥) = 𝑝𝑇(𝑥) when we regard ℝ[𝑥] ⊆ ℂ[𝑥].

Proof. By Lemma 2.18, if 𝔛 = {𝑒𝑗} is an ℝ-basis in 𝑉, then 𝔜 = {𝑒̃𝑗 = 𝑒𝑗 + 𝑖0} is a ℂ-basis in 𝑉ℂ and [𝑇]𝔛𝔛 = [𝑇ℂ]𝔜𝔜 because
    𝑇ℂ(𝑒̃𝑗) = 𝑇ℂ(𝑒𝑗 + 𝑖0) = 𝑇(𝑒𝑗) + 𝑖𝑇(0) = ( ∑𝑘 𝑡𝑘𝑗𝑒𝑘 ) + 𝑖0 = ∑𝑘 (𝑡𝑘𝑗 + 𝑖0)(𝑒𝑘 + 𝑖0) = ∑𝑘 (𝑡𝑘𝑗 + 𝑖0) 𝑒̃𝑘.
The matrices agree, and so do det([𝑇]𝔛𝔛 − 𝑥𝐼) and det([𝑇ℂ]𝔜𝔜 − 𝑥𝐼). □


With this in mind, we write 𝑝𝑇(𝑥) for 𝑝𝑇ℂ(𝑥), leaving the context to determine whether we are speaking of elements of ℝ[𝑥] or ℂ[𝑥]. As noted earlier, 𝑝𝑇(𝑥) splits over ℂ but not necessarily over ℝ. Furthermore, the nonreal complex roots in the irreducible factorization of 𝑝𝑇 come in conjugate pairs, so we may systematically list the distinct real eigenvalues 𝜆𝑗 and distinct conjugate pairs (𝜇𝑗, 𝜇̄𝑗) of eigenvalues in spℂ(𝑇ℂ), taking Im(𝜇𝑗) > 0 for each pair:

(2.16)    𝜇₁, 𝜇̄₁, …, 𝜇𝑟, 𝜇̄𝑟; 𝜆𝑟₊₁, …, 𝜆𝑠    (𝜆𝑖 real; 𝜇𝑗 ≠ 𝜇̄𝑗 with Im(𝜇𝑗) > 0).

For eigenvalues and conjugate pairs we write
    𝑑(𝜇𝑗) = dimℂ(𝐸𝜇𝑗(𝑇ℂ))    and    𝑑(𝜆𝑗) = dimℂ(𝐸𝜆𝑗(𝑇ℂ)).

We first take up the case when 𝑇ℂ is diagonalizable, followed by the general case in which the Jordan form of 𝑇ℂ must be used. When 𝑇ℂ is diagonalizable, we will show that the eigenspaces 𝐸𝜆𝑖, 𝐸𝜇𝑗, 𝐸𝜇̄𝑗 in 𝑉ℂ can be used to construct a direct sum decomposition of 𝑉ℂ into 𝑇ℂ-invariant subspaces.

Remark 2.25. If 𝜇 ≠ 𝜇̄, then 𝐰 ∈ 𝑉ℂ is an eigenvector for 𝜇 ⇔ 𝐽𝐰 is an eigenvector for 𝜇̄ because
    𝑇ℂ(𝐽𝐰) = 𝐽(𝑇ℂ𝐰) = 𝐽(𝜇𝐰) = 𝜇̄ ⋅ 𝐽𝐰.
Hence 𝐽(𝐸𝜇(𝑇ℂ)) = 𝐸𝜇̄(𝑇ℂ), and 𝐽 is an ℝ-linear bijection between these complex subspaces of 𝑉ℂ. ○
Observe also that 𝐽(𝐸𝜇 ⊕ 𝐸𝜇̄) = 𝐸𝜇 ⊕ 𝐸𝜇̄ even though neither summand is 𝐽-invariant, while 𝐽(𝐸𝜆) = 𝐸𝜆 for real eigenvalues of 𝑇ℂ.

Subspaces of Real Type in 𝑉ℂ. The complex subspaces
    𝐸𝜇,𝜇̄ = 𝐸𝜇(𝑇ℂ) ⊕ 𝐸𝜇̄(𝑇ℂ)    with    dimℂ(𝐸𝜇,𝜇̄) = 2𝑑(𝜇)
are of a special "real type" in 𝑉ℂ owing to their conjugation invariance.

Definition 2.26. If 𝑊 is a ℂ-subspace in the complexification 𝑉ℂ = 𝑉 + 𝑖𝑉, its real points are those in the ℝ-subspace 𝑊 ∩ (𝑉 + 𝑖0) in 𝑉ℂ, which corresponds to an ℝ-subspace 𝑊ℝ ⊆ 𝑉 such that 𝑊 ∩ (𝑉 + 𝑖0) = 𝑊ℝ + 𝑖0. Hereafter, we will identify 𝑊ℝ + 𝑖0 in 𝑉ℂ with 𝑊ℝ in 𝑉, referring to both as "𝑊ℝ." Complexifying this ℝ-subspace in 𝑉, we obtain a complex subspace (𝑊ℝ)ℂ = 𝑊ℝ + 𝑖𝑊ℝ ⊆ 𝑊 in 𝑉ℂ. In general, 𝑊ℝ + 𝑖𝑊ℝ can be smaller than the original complex subspace 𝑊 and could even be trivial. We say that a complex subspace 𝑊 ⊆ 𝑉ℂ is of real type if
    𝑊 = 𝑊ℝ + 𝑖𝑊ℝ    (𝑊ℝ = 𝑊 ∩ (𝑉 + 𝑖0), the real points in 𝑊).
In other words, a complex subspace 𝑊 is of real type ⇔ it is the complexification of its subspace 𝑊 ∩ (𝑉 + 𝑖0) of real points.

The following observation makes it easy to identify subspaces 𝑊 of real type in 𝑉ℂ.


Lemma 2.27. A complex subspace 𝑊 in 𝑉ℂ is of real type ⇔ 𝐽(𝑊) = 𝑊.

Proof. For (⇒) we have 𝑊 = 𝑊ℝ + 𝑖𝑊ℝ, so 𝐽𝑊 = 𝑊ℝ − 𝑖𝑊ℝ = 𝑊ℝ + 𝑖𝑊ℝ = 𝑊 since 𝑊ℝ = −𝑊ℝ. For (⇐), write 𝐰 ∈ 𝑊 as 𝐰 = 𝑥 + 𝑖𝑦 (𝑥, 𝑦 ∈ 𝑉). Then we have
    ½(𝐰 + 𝐽𝐰) = 𝑥 + 𝑖0    and    (1/2𝑖)(𝐰 − 𝐽𝐰) = 𝑦 + 𝑖0,
both of which lie in 𝑊ℝ = 𝑊 ∩ (𝑉 + 𝑖0) because 𝐽𝑊 = 𝑊. Thus,
    𝐰 = (𝑥 + 𝑖0) + 𝑖(𝑦 + 𝑖0) = 𝑥 + 𝑖𝑦    and    𝐰 ∈ 𝑊ℝ + 𝑖𝑊ℝ,
so 𝑊 is of real type. □

It follows immediately that the complex subspaces 𝑊 = 𝐸𝜆+𝑖0(𝑇ℂ) for 𝜆 ∈ ℝ and 𝑊 = 𝐸𝜇(𝑇ℂ) ⊕ 𝐸𝜇̄(𝑇ℂ) for 𝜇 ≠ 𝜇̄ are of real type in 𝑉ℂ and are 𝑇ℂ-invariant. We now compare what 𝑇ℂ does in these invariant subspaces with the action of 𝑇ℂ on the real subspace 𝑊ℝ + 𝑖0. This subspace actually is 𝑇ℂ-invariant (even though 𝑇ℂ is acting on an ℝ-subspace of 𝑉ℂ) because
    𝑇ℂ(𝑊ℝ) = 𝑇ℂ(𝑊 ∩ (𝑉 + 𝑖0)) ⊆ 𝑇ℂ(𝑊) ∩ 𝑇ℂ(𝑉 + 𝑖0) ⊆ 𝑊 ∩ (𝑉 + 𝑖0) = 𝑊ℝ
(𝑊 and 𝑉 + 𝑖0 are both 𝑇ℂ-invariant). Furthermore, its restriction to 𝑊ℝ is closely tied to the action of 𝑇 on a certain ℝ-subspace of 𝑉. This will, with some effort, provide an ℝ-basis for 𝑉 that reveals the structure of the original ℝ-linear operator 𝑇 ∶ 𝑉 → 𝑉.

Exercise 2.28. Whether 𝕂 = ℝ or ℂ, a matrix in M(𝑛, 𝕂) determines a linear operator 𝐿𝐴 ∶ 𝕂ⁿ → 𝕂ⁿ. Verify the following relationships between an operator on 𝑉 = ℝⁿ and its complexification 𝑉ℂ = ℂⁿ = ℝⁿ + 𝑖ℝⁿ.
(a) If 𝕂 = ℝ, (𝐿𝐴)ℂ ∶ ℂⁿ → ℂⁿ is the same as the operator 𝐿𝐴 ∶ ℂⁿ → ℂⁿ we get by regarding 𝐴 as a complex matrix whose entries are all real.
(b) Consider 𝐴 ∈ M(𝑛, ℂ) and regard ℂⁿ as the complexification of ℝⁿ. Verify that 𝐿𝐴 ∶ ℂⁿ → ℂⁿ is of real type ⇔ all entries in 𝐴 are real, so 𝐴 ∈ M(𝑛, ℝ).

Exercise 2.29. If 𝑆 and 𝑇 are ℝ-linear operators on a real vector space 𝑉, is the map (𝑆 + 𝑖𝑇) ∶ (𝑥 + 𝑖𝑦) → 𝑆(𝑥) + 𝑖𝑇(𝑦) on 𝑉ℂ a ℂ-linear operator? If so, when is it of real type? How about the same question for the map (𝑆 + 𝑖𝑇) ∶ (𝑥 + 𝑖𝑦) → (𝑆(𝑥) − 𝑇(𝑦)) + 𝑖(𝑇(𝑥) + 𝑆(𝑦))?

Exercise 2.30. If 𝑇 ∶ 𝑉 → 𝑉 is an ℝ-linear operator on a real vector space, prove the following:
(a) (𝑇ℂ)^𝑘 = (𝑇^𝑘)ℂ for all 𝑘 ∈ ℕ.
(b) 𝑒^{𝑇ℂ} = (𝑒^𝑇)ℂ on 𝑉ℂ.


Exercise 2.31. If 𝐴 ∈ M(𝑛, ℝ), we say 𝜆 ∈ ℂ is in the complex spectrum of 𝐴 if 𝐴 ⋅ 𝐳 = 𝜆𝐳 for some nonzero vector 𝐳 ∈ ℂⁿ. This is the same as saying 𝜆 is a complex root of the characteristic polynomial 𝑝𝐴(𝑥) = det(𝐴 − 𝑥𝐼), which has real coefficients. Prove that this happens if and only if 𝜆 ∈ spℂ((𝐿𝐴)ℂ), where (𝐿𝐴)ℂ is the complexification of 𝐿𝐴 on ℝⁿ.

Exercise 2.32. Let 𝑉ℂ = 𝑉 + 𝑖𝑉 be the complexification of a real vector space, and let 𝑆 ∶ 𝑉ℂ → 𝑉ℂ be a ℂ-linear operator of real type with 𝑆(𝑥 + 𝑖𝑦) = 𝑇(𝑥) + 𝑖𝑇(𝑦) for some 𝑇 ∶ 𝑉 → 𝑉. If 𝑊 = 𝑊ℝ + 𝑖𝑊ℝ ⊆ 𝑉ℂ is a complex subspace of real type that is 𝑆-invariant, verify that
(a) 𝑆(𝑊ℝ + 𝑖0) ⊆ (𝑊ℝ + 𝑖0)    and    (b) 𝑆|(𝑊ℝ+𝑖0) = (𝑇|𝑊ℝ) + 𝑖0.

Concordance Between Eigenspaces of 𝑻 in 𝑽 and of 𝑻ℂ in 𝑽ℂ. The complexified operator 𝑇ℂ acts on a complex vector space 𝑉ℂ and therefore can be put into Jordan form (or perhaps diagonalized) by the methods worked out in Chapter 1. We first examine the special case when 𝑇ℂ is diagonalizable before resolving the main problem of interest in this chapter:

Question. If 𝑇ℂ ∶ 𝑉ℂ → 𝑉ℂ is the complexification of an ℝ-linear operator on 𝑉, what can be said about the structure of the original ℝ-linear operator 𝑇 ∶ 𝑉 → 𝑉?

Case 1: 𝑇ℂ is diagonalizable. It will suffice to focus on a single eigenvalue for 𝑇ℂ or a single conjugate pair.

Subcase 1a: A Real Eigenvalue 𝜆 + 𝑖0 for 𝜆 ∈ ℝ. Then the correspondence between 𝐸𝜆(𝑇) in 𝑉 and 𝐸𝜆+𝑖0(𝑇ℂ) in 𝑉ℂ is straightforward.

Lemma 2.33. If 𝑇 ∶ 𝑉 → 𝑉 is ℝ-linear, then
    spℝ(𝑇) + 𝑖0 = spℂ(𝑇ℂ) ∩ (ℝ + 𝑖0),
and if 𝜆 ∈ spℝ(𝑇),
    (𝐸𝜆(𝑇))ℂ = 𝐸𝜆(𝑇) + 𝑖𝐸𝜆(𝑇)    is equal to    𝐸𝜆+𝑖0(𝑇ℂ) ⊆ 𝑉ℂ.

Proof. Let 𝜆 ∈ ℝ with 𝜆 + 𝑖0 ∈ spℂ(𝑇ℂ). Then there is a 𝐰 = 𝑢 + 𝑖𝑣 ≠ 0 such that
    𝑇ℂ(𝐰) = 𝑇𝑢 + 𝑖𝑇𝑣    is equal to    (𝜆 + 𝑖0)𝐰 = 𝜆𝑢 + 𝑖𝜆𝑣,
which happens if and only if 𝑇𝑢 = 𝜆𝑢 and 𝑇𝑣 = 𝜆𝑣. Since at least one of 𝑢, 𝑣 is nonzero, 𝜆 ∈ spℝ(𝑇). Conversely, if 𝑢 ∈ 𝑉 is nonzero and 𝑇𝑢 = 𝜆𝑢, we get
    𝑇ℂ(𝑢 + 𝑖0) = 𝑇𝑢 + 𝑖0 = (𝜆𝑢) + 𝑖0 = (𝜆 + 𝑖0) ⋅ (𝑢 + 𝑖0),
so 𝜆 + 𝑖0 ∈ spℂ(𝑇ℂ). The second statement follows because
    𝑇ℂ(𝑢 + 𝑖𝑣) = 𝑇𝑢 + 𝑖𝑇𝑣 = (𝜆 + 𝑖0)(𝑢 + 𝑖𝑣) ⇔ 𝑇𝑢 = 𝜆𝑢 and 𝑇𝑣 = 𝜆𝑣.


Thus 𝐸𝜆+𝑖0(𝑇ℂ) is of real type in 𝑉ℂ and is the complexification of its set of real points 𝐸𝜆(𝑇) + 𝑖0. If 𝔛 = {𝑒𝑗} is a basis for 𝐸𝜆(𝑇), the vectors 𝔜 = {𝑒̃𝑗 = 𝑒𝑗 + 𝑖0} are a ℂ-basis for 𝐸𝜆+𝑖0(𝑇ℂ) such that
(2.17)    [𝑇|𝐸𝜆(𝑇)]𝔛𝔛 = [𝑇ℂ|𝐸𝜆+𝑖0(𝑇ℂ)]𝔜𝔜.

In particular, dimℝ(𝐸𝜆(𝑇)) = dimℂ(𝐸𝜆+𝑖0(𝑇ℂ)) = 𝑑(𝜆) for real eigenvalues. □

Subcase 1b: A Nonreal Conjugate Pair. If 𝜇 ≠ 𝜇̄ in spℂ(𝑇ℂ), we have seen that 𝐸𝜇,𝜇̄ = 𝐸𝜇(𝑇ℂ) ⊕ 𝐸𝜇̄(𝑇ℂ) is 𝑇ℂ-invariant and of real type. List the distinct pairs (𝜇, 𝜇̄) of nonreal eigenvalues (taking Im(𝜇) > 0) as in (2.16) and pick a ℂ-basis for 𝐸𝜇(𝑇ℂ),
    𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) + 𝑖𝑦𝑗^(𝜇)    (𝑥𝑗^(𝜇), 𝑦𝑗^(𝜇) ∈ 𝑉)
for 1 ≤ 𝑗 ≤ 𝑑(𝜇) = dimℂ(𝐸𝜇(𝑇ℂ)). Then 𝐸𝜇(𝑇ℂ) = ⨁_{𝑗=1}^{𝑑(𝜇)} ℂ𝑓𝑗^(𝜇), a direct sum of one-dimensional complex subspaces. Since 𝐽(𝐸𝜇) = 𝐸𝜇̄, we get a matching ℂ-basis in 𝐸𝜇̄(𝑇ℂ):
    𝐽𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) − 𝑖𝑦𝑗^(𝜇)    in    𝐸𝜇̄ = ⨁_{𝑗=1}^{𝑑(𝜇)} ℂ ⋅ 𝐽𝑓𝑗^(𝜇).
Then define the two-dimensional complex subspaces 𝑉𝑗^(𝜇) in 𝐸𝜇,𝜇̄:
    𝑉𝑗^(𝜇) = ℂ𝑓𝑗^(𝜇) ⊕ ℂ(𝐽𝑓𝑗^(𝜇)),    𝑗 = 1, 2, …, 𝑑(𝜇) = dimℂ(𝐸𝜇(𝑇ℂ)).
Each is 𝑇ℂ-invariant since 𝑇ℂ(𝑓𝑗^(𝜇)) = 𝜇𝑓𝑗^(𝜇) and 𝑇ℂ(𝐽𝑓𝑗^(𝜇)) = 𝜇̄ 𝐽𝑓𝑗^(𝜇). Furthermore, 𝑉𝑗^(𝜇) is of real type since it is 𝐽-invariant. Clearly 𝐸𝜇,𝜇̄ = ⨁_{𝑗=1}^{𝑑(𝜇)} 𝑉𝑗^(𝜇). We claim that:

Lemma 2.34. In each complex subspace
    𝑉𝑗^(𝜇) = ℂ𝑓𝑗^(𝜇) ⊕ ℂ(𝐽𝑓𝑗^(𝜇)),    𝑗 = 1, …, 𝑑(𝜇) = dimℂ(𝐸𝜇(𝑇ℂ)),
we can find a ℂ-basis consisting of two vectors of real type in
    (𝑉𝑗^(𝜇))ℝ = 𝑉𝑗^(𝜇) ∩ (𝑉 + 𝑖0),
something that cannot be done for the spaces ℂ𝑓𝑗^(𝜇) alone or the eigenspace 𝐸𝜇(𝑇ℂ).

Proof. If 𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) + 𝑖𝑦𝑗^(𝜇) and 𝐽𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) − 𝑖𝑦𝑗^(𝜇), then
    𝑥𝑗^(𝜇) + 𝑖0 = ½(𝑓𝑗^(𝜇) + 𝐽𝑓𝑗^(𝜇))    and    𝑦𝑗^(𝜇) + 𝑖0 = (1/2𝑖)(𝑓𝑗^(𝜇) − 𝐽𝑓𝑗^(𝜇))
are in (𝑉𝑗^(𝜇))ℝ, and their ℂ-span is obviously all of 𝑉𝑗^(𝜇). Since dimℂ(𝑉𝑗^(𝜇)) = 2, they are independent over ℂ and form a ℂ-basis for 𝑉𝑗^(𝜇). But they are also an ℝ-basis for the two-dimensional subspace (𝑉𝑗^(𝜇))ℝ = (ℝ𝑥𝑗^(𝜇) + ℝ𝑦𝑗^(𝜇)) + 𝑖0 of real points in 𝑉𝑗^(𝜇). As previously noted, 𝑇ℂ leaves 𝑉𝑗^(𝜇) invariant; furthermore, 𝑇 leaves invariant the subspace of real vectors (𝑉𝑗^(𝜇))ℝ ⊆ 𝑉 because
    𝑇ℂ(𝑉𝑗^(𝜇) ∩ (𝑉 + 𝑖0)) ⊆ 𝑇ℂ(𝑉𝑗^(𝜇)) ∩ (𝑇(𝑉) + 𝑖0) ⊆ 𝑉𝑗^(𝜇) ∩ (𝑉 + 𝑖0) = (𝑉𝑗^(𝜇))ℝ.
That proves Lemma 2.34. □

We can now determine the matrix of the restriction 𝑇|(𝑉𝑗^(𝜇))ℝ with respect to the ordered ℝ-basis 𝔛 = {𝑥₁^(𝜇), 𝑦₁^(𝜇), 𝑥₂^(𝜇), 𝑦₂^(𝜇), …}. (From here on, we suppress most superscripts "(𝜇)" for clarity.) If 𝜇 = 𝑎 + 𝑖𝑏 and 𝜇̄ = 𝑎 − 𝑖𝑏 (𝑏 ≠ 0), we have
    𝑇ℂ(𝑓𝑗) = 𝑇ℂ(𝑥𝑗 + 𝑖𝑦𝑗) = 𝜇(𝑥𝑗 + 𝑖𝑦𝑗) = (𝑎𝑥𝑗 − 𝑏𝑦𝑗) + 𝑖(𝑎𝑦𝑗 + 𝑏𝑥𝑗),
    𝑇ℂ(𝐽𝑓𝑗) = 𝑇ℂ(𝑥𝑗 − 𝑖𝑦𝑗) = 𝜇̄(𝑥𝑗 − 𝑖𝑦𝑗) = (𝑎𝑥𝑗 − 𝑏𝑦𝑗) − 𝑖(𝑎𝑦𝑗 + 𝑏𝑥𝑗).
Writing 𝜇 in polar form, 𝜇, 𝜇̄ = 𝑎 ± 𝑖𝑏 = 𝑟𝑒^{±𝑖𝜃} = 𝑟 cos(𝜃) ± 𝑖𝑟 sin(𝜃), we get
    𝑇𝑥𝑗 + 𝑖0 = 𝑇ℂ(𝑥𝑗 + 𝑖0) = 𝑇ℂ(½(𝑓𝑗 + 𝐽𝑓𝑗)) = ½(𝑇ℂ𝑓𝑗 + 𝑇ℂ𝐽𝑓𝑗)
            = ½( (𝑎 + 𝑖𝑏)𝑓𝑗 + 𝐽((𝑎 + 𝑖𝑏)𝑓𝑗) )    (𝑇ℂ and 𝐽 commute)
            = (𝑎𝑥𝑗 − 𝑏𝑦𝑗)    (all other terms cancel)
            = 𝑟(cos(𝜃) ⋅ 𝑥𝑗 − sin(𝜃) ⋅ 𝑦𝑗)
and similarly
    𝑇𝑦𝑗 + 𝑖0 = 𝑇ℂ( (1/2𝑖)(𝑓𝑗 − 𝐽𝑓𝑗) ) = 𝑏𝑥𝑗 + 𝑎𝑦𝑗 = 𝑟(sin(𝜃) ⋅ 𝑥𝑗 + cos(𝜃) ⋅ 𝑦𝑗).
Thus, the matrix of 𝑇 restricted to (𝑉𝑗^(𝜇))ℝ with respect to the ℝ-basis 𝔛 = {𝑥𝑗, 𝑦𝑗} in (𝑉𝑗^(𝜇))ℝ is a scalar multiple of a rotation matrix (by −𝜃 radians):

    [𝑇|(𝑉𝑗^(𝜇))ℝ]𝔛𝔛 = 𝑟 ⋅ (  cos 𝜃   sin 𝜃 )
                          ( −sin 𝜃   cos 𝜃 )  =  𝑟 ⋅ 𝑅(−𝜃).

(Reversing the order of the basis vectors 𝑥𝑗, 𝑦𝑗 would yield the scalar multiple 𝑟 ⋅ 𝑅(𝜃) of the matrix describing a rotation by +𝜃 radians.)

But 𝔛̃ = {𝑥𝑗 + 𝑖0, 𝑦𝑗 + 𝑖0} is also a ℂ-basis for 𝑉𝑗^(𝜇), and by (2.17) we have
    [𝑇|(𝑉𝑗^(𝜇))ℝ]𝔛𝔛 = [𝑇ℂ|𝑉𝑗^(𝜇)]𝔛̃𝔛̃,
whose entries are real despite the fact that 𝔛̃ is a complex basis. Of course, with respect to the original ℂ-basis ℨ = {𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) + 𝑖𝑦𝑗^(𝜇), 𝐽𝑓𝑗^(𝜇) = 𝑥𝑗^(𝜇) − 𝑖𝑦𝑗^(𝜇)} in 𝑉𝑗^(𝜇), we get the expected diagonal matrix with complex entries

    [𝑇ℂ|𝑉𝑗^(𝜇)]ℨℨ = ( 𝑟𝑒^{𝑖𝜃}       0     )
                    (    0      𝑟𝑒^{−𝑖𝜃} ).
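A numerical sketch of this computation (example matrix ours): the real and imaginary parts of a complex eigenvector give an ℝ-basis in which 𝑇 acts by the scaled rotation with matrix [[𝑎, 𝑏], [−𝑏, 𝑎]]:

```python
import numpy as np

# Example real matrix with nonreal eigenvalues mu = 2 ± i
A = np.array([[1.0, -2.0],
              [1.0,  3.0]])
evals, evecs = np.linalg.eig(A)

mu = evals[0]                      # one member of the conjugate pair
f = evecs[:, 0]                    # complex eigenvector:  A f = mu f
a, b = mu.real, mu.imag
x, y = f.real, f.imag              # real and imaginary parts of f

P = np.column_stack([x, y])        # the R-basis {x, y} of R^2
B = np.linalg.inv(P) @ A @ P       # matrix of T in this basis
print(np.round(B, 6))              # [[a, b], [-b, a]], a scaled rotation
```

Here 𝑟 = |𝜇| = √(𝑎² + 𝑏²), which for this example equals √5 since det 𝐴 = 5.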

The Normal Form of a Linear Operator Over ℝ. Whether or not the complexified operator 𝑇ℂ ∶ 𝑉ℂ → 𝑉ℂ is diagonalizable, we will exhibit a real normal form for the original ℝ-linear operator 𝑇 ∶ 𝑉 → 𝑉. This is closely related to the JCF for operators on complex spaces; it provides geometric insight into the way 𝑇 acts on the original vector space 𝑉.

When 𝑇ℂ is diagonalizable, we first recall that the subspace 𝐸𝜇,𝜇̄ = 𝐸𝜇 ⊕ 𝐸𝜇̄ is of real type in 𝑉ℂ for each distinct complex-conjugate pair in sp(𝑇ℂ). For each pair, we claim that 𝐸𝜇,𝜇̄ is a direct sum of two-dimensional 𝑇ℂ-invariant complex subspaces, each of which is of real type, with
    (𝐸𝜇 ⊕ 𝐸𝜇̄) ∩ (𝑉 + 𝑖0) = ⨁_{𝑗=1}^{𝑑(𝜇)} (𝑉𝑗^(𝜇))ℝ.
The sum on the right is direct, so that dimℝ(⨁𝑗 (𝑉𝑗^(𝜇))ℝ) = 2𝑑(𝜇) = 2 ⋅ dimℂ(𝐸𝜇), and since dimℝ((𝐸𝜇 ⊕ 𝐸𝜇̄)ℝ) = dimℂ(𝐸𝜇 ⊕ 𝐸𝜇̄) = 2𝑑(𝜇), the spaces coincide. When 𝑇ℂ is diagonalizable, our previous efforts yield:

Theorem 2.35 (Real-Normal Form of 𝑇; 𝑇ℂ Diagonalizable). Consider a linear operator 𝑇 ∶ 𝑉 → 𝑉 on a real vector space such that 𝑇ℂ is diagonalizable. If we list the distinct (pairs of) eigenvalues as in (2.16),
    𝜇₁, 𝜇̄₁, …, 𝜇𝑟, 𝜇̄𝑟, 𝜆𝑟₊₁, …, 𝜆𝑠    (Im 𝜇𝑗 > 0),
there is an ℝ-basis 𝔛 in 𝑉 such that [𝑇]𝔛𝔛 has the block-diagonal form
    [𝑇]𝔛𝔛 = diag( 𝑅₁, …, 𝑅𝑟, 𝐷𝑟₊₁, …, 𝐷𝑠 ).
For 1 ≤ 𝑘 ≤ 𝑟, we have
    𝑅𝑘 = diag( 𝑟𝑘𝑅(𝜃𝑘), …, 𝑟𝑘𝑅(𝜃𝑘) ),


which consists of 𝑑(𝜇𝑘) = dimℂ(𝐸𝜇𝑘) blocks, each a scalar multiple of the same 2 × 2 rotation matrix. For 𝑟 + 1 ≤ 𝑘 ≤ 𝑠, 𝜆𝑘 is real and
    𝐷𝑘 = diag( 𝜆𝑘, …, 𝜆𝑘 )
with 𝑑(𝜆𝑘) = dimℂ(𝐸𝜆𝑘) diagonal entries.

When 𝑻ℂ Is Not Diagonalizable. When 𝑇ℂ cannot be diagonalized, we apply the Jordan form of 𝑇ℂ.

Lemma 2.36. Let 𝑇 ∶ 𝑉 → 𝑉 be a linear operator on a vector space over ℝ. For 𝜇 ∈ ℂ, the generalized eigenspace in 𝑉ℂ is
    𝑀𝜇 = 𝑀𝜇(𝑇ℂ) = {𝑤 ∈ 𝑉ℂ ∶ (𝑇ℂ − 𝜇𝐼)^𝑘 𝑤 = 0 for some 𝑘 ∈ ℕ}.
Then 𝑤 ∈ 𝑀𝜇 ⇔ 𝐽(𝑤) ∈ 𝑀𝜇̄, so that
    𝐽(𝑀𝜇) = 𝑀𝜇̄    and    𝐽(𝑀𝜇̄) = 𝑀𝜇.
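For a concrete 3 × 3 example (ours, with one conjugate pair and one real eigenvalue), the real normal form of Theorem 2.35 can be produced numerically:

```python
import numpy as np

# A has eigenvalues ±i (a conjugate pair) and 2 (real).
A = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 2.0]])
evals, evecs = np.linalg.eig(A)

i_c = int(np.argmax(np.abs(evals.imag)))   # index of a nonreal eigenvalue
i_r = int(np.argmin(np.abs(evals.imag)))   # index of the real eigenvalue
mu = evals[i_c]
a, b = mu.real, mu.imag

f = evecs[:, i_c]                          # complex eigenvector for mu
v = evecs[:, i_r].real                     # real eigenvector for lambda = 2
P = np.column_stack([f.real, f.imag, v])   # R-basis {x, y, v}

B = np.linalg.inv(P) @ A @ P               # block-diagonal real normal form
print(np.round(B, 6))                      # 2x2 rotation block, then the entry 2
```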

Proof. Since 𝐽 = 𝐽⁻¹, the map Φ ∶ 𝑆 → 𝐽𝑆𝐽 = 𝐽𝑆𝐽⁻¹ is an automorphism of the algebra of all ℂ-linear maps Homℂ(𝑉ℂ, 𝑉ℂ):
• It is a bijection such that Φ(𝐼) = 𝐼, and it preserves products, Φ(𝑆₁𝑆₂) = Φ(𝑆₁)Φ(𝑆₂), because 𝐽² = 𝐼 ⇒ 𝐽𝑆₁𝑆₂𝐽 = (𝐽𝑆₁𝐽)(𝐽𝑆₂𝐽).
• Furthermore, Φ(𝑆₁ + 𝑆₂) = Φ(𝑆₁) + Φ(𝑆₂) and Φ(𝑐𝑆) = 𝑐̄ Φ(𝑆) for 𝑐 ∈ ℂ, and in particular Φ(𝑆^𝑘) = Φ(𝑆)^𝑘 for 𝑘 ∈ ℕ.
Thus
    𝐽((𝑇ℂ − 𝜇𝐼)^𝑘)𝐽 = (𝐽(𝑇ℂ − 𝜇𝐼)𝐽)^𝑘 = (𝐽𝑇ℂ𝐽 − 𝐽(𝜇𝐼)𝐽)^𝑘 = (𝑇ℂ − 𝜇̄𝐼)^𝑘
(here 𝐽𝑇ℂ = 𝑇ℂ𝐽 because 𝑇ℂ is of real type by definition), and
    (𝐽(𝜇𝐼)𝐽)(𝑤) = 𝐽(𝜇 ⋅ 𝐽(𝑤)) = 𝜇̄𝐽²(𝑤) = 𝜇̄𝑤    for 𝑤 ∈ 𝑉ℂ.
Finally, we have (𝑇ℂ − 𝜇𝐼)^𝑘 𝑤 = 0 if and only if
    𝐽(𝑇ℂ − 𝜇𝐼)^𝑘 𝑤 = 0 ⇔ 𝐽(𝑇ℂ − 𝜇𝐼)^𝑘 𝐽(𝐽(𝑤)) = 0 ⇔ (𝑇ℂ − 𝜇̄𝐼)^𝑘 𝐽(𝑤) = 0 ⇔ 𝐽(𝑤) ∈ 𝑀𝜇̄.
Then 𝐽(𝑀𝜇) = 𝑀𝜇̄ and 𝑀𝜇 = 𝐽(𝑀𝜇̄) since 𝐽² = 𝐼. □

Theorem 2.37 (Real-Normal Form of 𝑇; 𝑇ℂ Not Diagonalizable). Consider a linear operator 𝑇 ∶ 𝑉 → 𝑉 on a real vector space and list the (pairs of) eigenvalues
    𝜇₁, 𝜇̄₁, …, 𝜇𝑟, 𝜇̄𝑟, 𝜆𝑟₊₁, …, 𝜆𝑠    (Im 𝜇𝑘 > 0)
of 𝑇ℂ ∶ 𝑉ℂ → 𝑉ℂ as in (2.16). Then there is an ℝ-basis 𝔛 for 𝑉 that puts [𝑇]𝔛𝔛 in block-diagonal form:
    [𝑇]𝔛𝔛 = diag( 𝐴₁, …, 𝐴𝑚 ).


Each block 𝐴𝑗 has one of two possible block-upper-triangular forms: either

    𝐴𝑗 = ( 𝜆  1        0 )
         (    𝜆  ⋱      )
         (       ⋱   1  )
         ( 0          𝜆 )    for real eigenvalues 𝜆 of 𝑇ℂ

or

    𝐴𝑗 = ( 𝑅𝜇,𝜇̄  𝐼₂              0    )
         (       𝑅𝜇,𝜇̄  ⋱             )
         (              ⋱     𝐼₂     )
         ( 0                  𝑅𝜇,𝜇̄   )    for conjugate pairs 𝜇 = 𝑟𝑒^{𝑖𝜃}, 𝜇̄ = 𝑟𝑒^{−𝑖𝜃},

where 𝐼₂ is the 2 × 2 identity matrix and
    𝑅𝜇,𝜇̄ = 𝑟 ( cos 𝜃  −sin 𝜃 )
             ( sin 𝜃   cos 𝜃 ).

Proof. The proof will of course employ the generalized eigenspace decomposition 𝑉ℂ = ⨁_{𝑧∈sp(𝑇ℂ)} 𝑀𝑧(𝑇ℂ). As above, we write 𝑀𝜇,𝜇̄ = 𝑀𝜇(𝑇ℂ) ⊕ 𝑀𝜇̄(𝑇ℂ) and write 𝑑(𝜇) = 2 ⋅ dim(𝑀𝜇(𝑇ℂ)) for each distinct pair of nonreal eigenvalues. The summands 𝑀𝜆, 𝑀𝜇,𝜇̄ are of real type and are 𝑇ℂ-invariant, so for each 𝜇 we may write 𝑀𝜇,𝜇̄ as the complexification of its space of real points:
(2.18)    𝑊𝜇 = 𝑀𝜇,𝜇̄ ∩ (𝑉 + 𝑖0).
Here 𝑀𝜇,𝜇̄ = 𝑊𝜇 + 𝑖𝑊𝜇 is 𝑇ℂ-invariant. Since 𝑇ℂ is of real type (by definition), it leaves invariant the ℝ-subspace 𝑉 + 𝑖0; therefore the space of real points 𝑊𝜇 is also 𝑇ℂ-invariant, with 𝑇ℂ|(𝑊𝜇+𝑖0) = 𝑇|𝑊𝜇. It follows that 𝑇ℂ|𝑀𝜇,𝜇̄ is the complexification of 𝑇|𝑊𝜇, with
    (𝑇|𝑊𝜇)ℂ(𝑥 + 𝑖𝑦) = (𝑇|𝑊𝜇)(𝑥) + 𝑖(𝑇|𝑊𝜇)(𝑦)    for 𝑥 + 𝑖𝑦 ∈ 𝑀𝜇,𝜇̄.
If we can find an ℝ-basis in the space (2.18) of real points 𝑊𝜇 ⊆ 𝑀𝜇,𝜇̄ for which 𝑇|𝑊𝜇 takes the form described in the theorem, then our proof is complete. For this we may obviously restrict attention to a single subspace 𝑀𝜇,𝜇̄ (or 𝑀𝜆 if 𝜆 ∈ ℝ) in 𝑉ℂ.

Case 1: A Real Eigenvalue 𝜆. If 𝜆 is real, then
    (𝑇ℂ − 𝜆)^𝑘(𝑥 + 𝑖𝑦) = (𝑇ℂ − 𝜆)^{𝑘−1}[ (𝑇 − 𝜆)𝑥 + 𝑖(𝑇 − 𝜆)𝑦 ] = ⋯ = (𝑇 − 𝜆)^𝑘𝑥 + 𝑖(𝑇 − 𝜆)^𝑘𝑦    for 𝑘 ∈ ℕ.
Thus 𝑥 + 𝑖𝑦 ∈ 𝑀𝜆(𝑇ℂ) if and only if 𝑥, 𝑦 ∈ 𝑀𝜆(𝑇), and the subspace of real points in 𝑀𝜆(𝑇ℂ) = 𝑀𝜆(𝑇) + 𝑖𝑀𝜆(𝑇) is precisely 𝑀𝜆(𝑇) + 𝑖0. There is a ℂ-basis 𝔛 = {𝑓𝑗} consisting of real vectors in 𝑀𝜆(𝑇ℂ) that yields the same matrix for 𝑇ℂ and for its restriction to this subspace of real points. We conclude that
    𝑇ℂ|𝑊𝜆 = 𝑇|𝑊𝜆 = 𝑇|𝑀𝜆(𝑇)    and    𝑇ℂ|𝑀𝜆(𝑇) = 𝑇|𝑀𝜆(𝑇).


Case 2: A Conjugate Pair 𝜇, 𝜇̄. The space 𝑀_{𝜇,𝜇̄} = 𝑀𝜇(𝑇ℂ) ⊕ 𝑀𝜇̄(𝑇ℂ) is of real type because 𝐽(𝑀𝜇) = 𝑀𝜇̄. In fact, if 𝑣 ∈ 𝑀𝜇, then (𝑇ℂ − 𝜇)^𝑘𝑣 = 0 for some 𝑘. But 𝑇ℂ𝐽 = 𝐽𝑇ℂ, so

(𝑇ℂ − 𝜇̄)^𝑘𝐽(𝑣) = (𝑇ℂ − 𝜇̄)^{𝑘−1}𝐽(𝑇ℂ − 𝜇)(𝑣) = ⋯ = 𝐽(𝑇ℂ − 𝜇)^𝑘(𝑣) = 0.

Thus, 𝑀_{𝜇,𝜇̄} is the complexification of its subspace 𝑉𝜇 = 𝑀_{𝜇,𝜇̄}(𝑇ℂ) ∩ (𝑉 + 𝑖0) of real points. By the cyclic subspace decomposition (Theorem 1.16), a generalized eigenspace 𝑀𝜇(𝑇ℂ) is a direct sum of 𝑇ℂ-invariant spaces 𝐶𝑗 that are cyclic under the action of the nilpotent operator (𝑇ℂ − 𝜇𝐼). In 𝐶𝑗, take a basis 𝔛𝑗 = {𝑓₁^(𝑗), …, 𝑓_{𝑑𝑗}^(𝑗)} that puts (𝑇ℂ − 𝜇𝐼)|𝐶𝑗 into elementary nilpotent form

$$\begin{pmatrix} 0 & 1 & & 0\\ & \ddots & \ddots & \\ & & \ddots & 1\\ 0 & & & 0 \end{pmatrix} \qquad\text{so that}\qquad [T_{\mathbb C}|_{C_j}] = \begin{pmatrix} \mu & 1 & & 0\\ & \ddots & \ddots & \\ & & \ddots & 1\\ 0 & & & \mu \end{pmatrix}.$$

For the basis 𝔛𝑗 (suppressing the superscript (𝑗)), we have

(𝑇ℂ − 𝜇𝐼)𝑓₁ = 0 and (𝑇ℂ − 𝜇𝐼)𝑓𝑗 = 𝑓_{𝑗−1} for 𝑗 > 1,

which implies that 𝑇ℂ(𝑓₁) = 𝜇𝑓₁ and 𝑇ℂ(𝑓𝑗) = 𝜇𝑓𝑗 + 𝑓_{𝑗−1} for 𝑗 > 1. If 𝑓𝑗 = 𝑥𝑗 + 𝑖𝑦𝑗 ∈ 𝑊𝜇 + 𝑖𝑊𝜇, we have 𝑓̄𝑗 = 𝐽(𝑓𝑗) = 𝑥𝑗 − 𝑖𝑦𝑗. Hence, since 𝑇ℂ is of real type,

𝑇ℂ(𝑓̄𝑗) = 𝑇ℂ(𝐽(𝑓𝑗)) = 𝐽(𝑇ℂ(𝑓𝑗)) = 𝐽(𝜇𝑓𝑗 + 𝑓_{𝑗−1}) = 𝜇̄𝐽(𝑓𝑗) + 𝐽(𝑓_{𝑗−1}),

and we get 𝑇ℂ(𝑓̄𝑗) = 𝜇̄𝑓̄𝑗 + 𝑓̄_{𝑗−1}. Writing 𝜇 = 𝑎 + 𝑖𝑏 with 𝑏 ≠ 0 (or in polar form, 𝜇 = 𝑟𝑒^{𝑖𝜃} with 𝜃 ∉ 𝜋ℤ), we get

𝑇(𝑥𝑗) + 𝑖𝑇(𝑦𝑗) = 𝑇ℂ(𝑥𝑗 + 𝑖𝑦𝑗) = 𝑇ℂ(𝑓𝑗) = 𝜇𝑓𝑗 + 𝑓_{𝑗−1} = (𝑎 + 𝑖𝑏)(𝑥𝑗 + 𝑖𝑦𝑗) + (𝑥_{𝑗−1} + 𝑖𝑦_{𝑗−1}) = (𝑎𝑥𝑗 − 𝑏𝑦𝑗) + 𝑖(𝑏𝑥𝑗 + 𝑎𝑦𝑗) + (𝑥_{𝑗−1} + 𝑖𝑦_{𝑗−1}).

Since 𝜇 = 𝑟𝑒^{𝑖𝜃} and 𝜇̄ = 𝑟𝑒^{−𝑖𝜃},

𝑇(𝑥𝑗) = 𝑎𝑥𝑗 − 𝑏𝑦𝑗 + 𝑥_{𝑗−1} = 𝑥𝑗 ⋅ 𝑟 cos(𝜃) − 𝑦𝑗 ⋅ 𝑟 sin(𝜃) + 𝑥_{𝑗−1},
𝑇(𝑦𝑗) = 𝑏𝑥𝑗 + 𝑎𝑦𝑗 + 𝑦_{𝑗−1} = 𝑥𝑗 ⋅ 𝑟 sin(𝜃) + 𝑦𝑗 ⋅ 𝑟 cos(𝜃) + 𝑦_{𝑗−1},

with respect to the ℝ-basis

{𝑥₁^(1), 𝑦₁^(1); …; 𝑥_{𝑑₁}^(1), 𝑦_{𝑑₁}^(1); 𝑥₁^(2), 𝑦₁^(2); …}

in 𝑉𝜇 = 𝑀_{𝜇,𝜇̄} ∩ (𝑉 + 𝑖0) = ⨁_{𝑗=1}^{𝑑} 𝑉𝑗^(𝜇). Thus, the matrix [𝑇]𝔛𝔛 consists of diagonal blocks of size 2𝑑𝑗 × 2𝑑𝑗 that have the form

$$B = \begin{pmatrix} rR(\theta) & I_2 & & 0\\ & rR(\theta) & \ddots & \\ & & \ddots & I_2\\ 0 & & & rR(\theta) \end{pmatrix}$$

in which 𝐼₂ is the 2 × 2 identity matrix and

$$r\,R(\theta) = \begin{pmatrix} r\cos\theta & -r\sin\theta\\ r\sin\theta & r\cos\theta \end{pmatrix} = \begin{pmatrix} a & -b\\ b & a \end{pmatrix}$$

if 𝜇 = 𝑎 + 𝑖𝑏 = 𝑟𝑒^{𝑖𝜃}. That concludes the proof of Theorem 2.37. □
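The 2 × 2 mechanism behind Case 2 can be checked numerically. The following sketch (the matrix `A` is an assumed example with eigenvalues 2 ± 2𝑖, not taken from the text) extracts a complex eigenvalue 𝜇 = 𝑎 + 𝑖𝑏 of a real 2 × 2 matrix, splits an eigenvector into real and imaginary parts 𝑧 = 𝑥 + 𝑖𝑦, and verifies the relations 𝑇(𝑥) = 𝑎𝑥 − 𝑏𝑦 and 𝑇(𝑦) = 𝑏𝑥 + 𝑎𝑦 derived in the proof:

```python
# Verify A x = a x - b y and A y = b x + a y for a real 2x2 matrix with a
# complex eigenvalue mu = a + ib and eigenvector z = x + iy (compare the
# real and imaginary parts of A z = mu z).
import cmath

A = [[2.0, 2.0], [-2.0, 2.0]]   # assumed example; eigenvalues 2 +/- 2i

# one eigenvalue via the quadratic formula for t^2 - tr*t + det = 0
tr = A[0][0] + A[1][1]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
mu = (tr + cmath.sqrt(tr*tr - 4*det)) / 2
a, b = mu.real, mu.imag

# an eigenvector for mu (valid when A[0][1] != 0): z = (A[0][1], mu - A[0][0])
z = (A[0][1], mu - A[0][0])
x = [z[0].real, z[1].real]
y = [z[0].imag, z[1].imag]

def mat_vec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

Ax, Ay = mat_vec(A, x), mat_vec(A, y)
for k in range(2):
    assert abs(Ax[k] - (a*x[k] - b*y[k])) < 1e-12
    assert abs(Ay[k] - (b*x[k] + a*y[k])) < 1e-12
```

In the basis {𝑥, 𝑦} these two relations are exactly what produce the 2 × 2 rotation-scaling block of the theorem.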

Example 2.38. Find the Jordan form when the following matrix in M(3, ℝ) is viewed as a complex matrix having real entries:

$$A = \begin{pmatrix} 4 & -1 & -1\\ -2 & 0 & 4\\ 2 & -2 & 2 \end{pmatrix}.$$

Then:
1. Find the real-normal form for 𝐴.
2. Find an invertible transition matrix 𝑄 ∈ M(3, ℝ) such that 𝑄𝐴𝑄⁻¹ has the normal form in 1.
3. Exhibit an ℝ-basis 𝔛 for ℝ³ such that [𝐿𝐴]𝔛𝔛 has this normal form.

Discussion. The characteristic polynomial of 𝐴 is

$$p_A(x) = \det\begin{pmatrix} 4-x & -1 & -1\\ -2 & -x & 4\\ 2 & -2 & 2-x \end{pmatrix} = -x^3 + 6x^2 - 16x + 16 = -(x-2)(x^2 - 4x + 8),$$

so the eigenvalues are

𝜆₁ = 2,  𝜆₂ = 2 − 2𝑖,  𝜆₃ = 2 + 2𝑖.

Since the eigenvalues are distinct, the diagonal matrix 𝐷 = diag(𝜆₁, 𝜆₂, 𝜆₃) is already a JCF for 𝐴. Applying symbolic row operations, we can compute bases for the one-dimensional eigenspaces

𝐸_{𝜆₁}(𝐴) = ker(𝐴 − 𝜆₁𝐼) = ℂ ⋅ (𝐞₁ + 𝐞₂ + 𝐞₃)
𝐸_{𝜆₂}(𝐴) = ker(𝐴 − 𝜆₂𝐼) = ℂ ⋅ ((3 − 𝑖)𝐞₁ + (3 + 4𝑖)𝐞₂ + 5𝐞₃)
𝐸_{𝜆₃}(𝐴) = ker(𝐴 − 𝜆₃𝐼) = ℂ ⋅ ((3 + 𝑖)𝐞₁ + (3 − 4𝑖)𝐞₂ + 5𝐞₃).

The proofs of Lemmas 2.33 and 2.34, which together yield the normal form Corollary 2.35, are constructive and provide an explicit recipe for constructing ℂ-bases for 𝑇ℂ that lie within the ℝ-subspaces of real points in the complex vector spaces 𝐸_{𝜆₁}(𝑇ℂ) or 𝑀_{𝜆₂,𝜆̄₂}(𝑇ℂ).


If we identify ℂ³ as the complexification of ℝ³, the conjugation map becomes 𝐽(𝐱 + 𝑖𝐲) = 𝐱 − 𝑖𝐲. Label the three eigenvectors as

𝐳_{𝜆ₖ} = 𝐱ₖ + 𝑖𝐲ₖ = (𝑥₁^(𝑘) + 𝑖𝑦₁^(𝑘), …, 𝑥₃^(𝑘) + 𝑖𝑦₃^(𝑘))  for 𝑘 = 1, 2, 3

(with 𝑑ₖ = 𝑑(𝜆ₖ) = 1 since all eigenvalues 𝜆ₖ have multiplicity 1). The polar forms of the eigenvalues are

𝜆₁ = 2,  𝜆₂ = √8 𝑒^{−𝑖𝜋/4},  𝜆₃ = √8 𝑒^{𝑖𝜋/4}.

The first eigenvector

𝐳₁ = (𝐞₁ + 𝐞₂ + 𝐞₃) + 𝑖𝟎 = 𝐱₁^(1) + 𝑖𝟎

is already of real type. The eigenvectors 𝐳₂, 𝐳₃ = 𝐽(𝐳₂) are a ℂ-basis for 𝑀 = 𝑀_{𝜆₂,𝜆̄₂}.

As in the proof of Theorem 2.35, the real and imaginary parts of 𝐳₂ = 𝐱₁^(2) + 𝑖𝐲₁^(2),

𝐱₁^(2) = Re(𝐳₂) = 3𝐞₁ + 3𝐞₂ + 5𝐞₃ = (3, 3, 5) in ℝ³,
𝐲₁^(2) = Im(𝐳₂) = −𝐞₁ + 4𝐞₂ = (−1, 4, 0) in ℝ³,

are vectors of real type that form a ℂ-basis for 𝑀 (so do the real and imaginary parts of 𝐳₃ = 𝐽(𝐳₂)). The point of Theorem 2.37 is that 𝔛 = {𝐱₁^(1), 𝐱₁^(2), 𝐲₁^(2)} is also an ℝ-basis in ℝ³ that puts 𝑇 into the real-normal form

$$[T]_{\mathfrak X\mathfrak X} = \begin{pmatrix} 2 & 0 & 0\\ 0 & \sqrt 8\cos\theta & \sqrt 8\sin\theta\\ 0 & -\sqrt 8\sin\theta & \sqrt 8\cos\theta \end{pmatrix}$$

where 𝜃 = 𝜋/4 radians. Compare this with the diagonalized form

$$[T]_{\mathfrak Y\mathfrak Y} = \begin{pmatrix} 2 & 0 & 0\\ 0 & \sqrt 8\,e^{-i\pi/4} & 0\\ 0 & 0 & \sqrt 8\,e^{i\pi/4} \end{pmatrix}$$

of 𝑇ℂ for the complex basis 𝔜 = {𝐳₁, 𝐳₂, 𝐳₃} in ℂ³.

○
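The computation in Example 2.38 can be verified with exact rational arithmetic. A sketch (pure Python, using the basis vectors found above) forms 𝑄 with columns 𝐱₁^(1), 𝐱₁^(2), 𝐲₁^(2) and checks that 𝑄⁻¹𝐴𝑄 is block diagonal, with a 2 × 2 rotation-scaling block whose entries are ±2 = ±√8 cos(𝜋/4); the off-diagonal signs within that block depend on the ordering convention for the pair {𝐱, 𝐲}:

```python
# Check that conjugating A by the real basis from Example 2.38 produces a
# block-diagonal real-normal form, using exact fractions.
from fractions import Fraction as F

A = [[4, -1, -1], [-2, 0, 4], [2, -2, 2]]
basis = [(1, 1, 1), (3, 3, 5), (-1, 4, 0)]          # x1^(1), x1^(2), y1^(2)
Q = [[F(basis[j][i]) for j in range(3)] for i in range(3)]   # columns = basis

def mat_mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(M):
    # Gauss-Jordan elimination on [M | I] with exact fractions
    n = 3
    aug = [list(M[i]) + [F(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f*w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

Af = [[F(v) for v in row] for row in A]
N = mat_mul(inverse(Q), mat_mul(Af, Q))             # N = Q^{-1} A Q

assert N[0] == [2, 0, 0] and N[1][0] == 0 and N[2][0] == 0
assert N[1][1] == 2 and N[2][2] == 2                # sqrt(8) cos(pi/4) = 2
assert N[1][2] == -N[2][1] and N[1][2]**2 == 4      # +/- sqrt(8) sin(pi/4)
```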

Exercise 2.39. Find the Jordan form for the matrix in M(4, ℝ),

$$A = \begin{pmatrix} 2 & 1 & 0 & 0\\ -1 & 4 & 0 & 0\\ -1 & 4 & 1 & -1\\ -1 & 3 & 2 & -1 \end{pmatrix},$$

regarded as a matrix in M(4, ℂ). Use this complex JCF to find a real normal form for 𝐴.


Additional Exercises

Section 2.1. Jordan Form and Differential Equations

1. True/False Questions (“True” if the statement is always true.)
   T/F Questions (a)–(e) below are concerned with systems of constant-coefficient differential equations
   𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡)  (𝐴 ∈ M(𝑛, ℝ))
   and their vector-valued solutions 𝑋(𝑡) with values in ℝⁿ.
   (a) For any matrix 𝐴 ∈ M(𝑛, ℝ), the system 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) has solutions that are differentiable for all 𝑡 ∈ ℝ.
   (b) For any 𝐴 ∈ M(𝑛, ℝ) and 𝐜 ∈ ℝⁿ, there is a unique solution to the system 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) that satisfies the initial condition 𝑋(0) = 𝐜.
   (c) For some choices of 𝐴 ∈ M(𝑛, ℝ) and 𝐜 ∈ ℝⁿ, the system 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) has no smooth solutions with 𝑋(0) = 𝐜.
   (d) If 𝐞₁, …, 𝐞ₙ are the standard basis vectors in ℝⁿ and 𝑋𝑗(𝑡) is a solution of the system such that 𝑋𝑗(0) = 𝐞𝑗, then 𝑋₁(𝑡), …, 𝑋ₙ(𝑡) are linearly independent in the space 𝒞^∞(ℝ, ℝⁿ) of infinitely differentiable vector-valued functions on the real line.
   (e) Every solution of the system 𝑑𝑋/𝑑𝑡 = 𝐴 ⋅ 𝑋(𝑡) in (d) lies in the finite-dimensional vector subspace spanned by the particular solutions 𝑋₁(𝑡), …, 𝑋ₙ(𝑡).
   T/F Questions (f)–(g) below are concerned with the solutions of an 𝑛th-order constant-coefficient differential equation
   𝐿𝑦(𝑡) = 𝐷ⁿ𝑦(𝑡) + 𝑎₁𝐷ⁿ⁻¹𝑦(𝑡) + ⋯ + 𝑎ₙ𝑦(𝑡) = 0.
   (f) Solutions of this ODE form an 𝑛-dimensional subspace in the space 𝒞^∞(ℝ) of smooth real-valued functions 𝑦(𝑡) on the real line.
   (g) The scalar-valued solution 𝑦(𝑡) of this ODE exists and is unique once we specify its initial value 𝑦(0) = 𝑐 (𝑐 ∈ ℝ) at time 𝑡 = 0.

2. For each of the following linear systems of constant-coefficient differential equations, rewrite the system in the form 𝑑𝑌/𝑑𝑡 = 𝐴 ⋅ 𝑌(𝑡) for vector-valued functions 𝑌(𝑡).
   (i) 𝑑𝑦₁/𝑑𝑡 = 𝑦₂(𝑡),  𝑑𝑦₂/𝑑𝑡 = −𝑦₁(𝑡)
   (ii) 𝑑𝑦₁/𝑑𝑡 = 𝑦₂(𝑡),  𝑑𝑦₂/𝑑𝑡 = 𝑦₁(𝑡)
   (iii) 𝑑𝑦₁/𝑑𝑡 = 𝑦₁(𝑡),  𝑑𝑦₂/𝑑𝑡 = 𝑦₁(𝑡) + 𝑦₂(𝑡)
   (iv) 𝑑𝑦₁/𝑑𝑡 = 𝑦₁(𝑡) + 𝑦₂(𝑡),  𝑑𝑦₂/𝑑𝑡 = 𝑦₁(𝑡) + 2𝑦₂(𝑡),  𝑑𝑦₃/𝑑𝑡 = −𝑦₁(𝑡) + 𝑦₃(𝑡)


   Then
   (a) Find solutions 𝑌(𝑡) = (𝑦₁(𝑡), 𝑦₂(𝑡)) that satisfy the initial conditions 𝑌(0) = 𝐜 in ℝ² for the initial values 𝐜₁ = (1, 0) and 𝐜₂ = (0, 1) in ℝ².
   (b) Explain why a solution satisfying an arbitrary initial condition 𝑌(0) = 𝐜 = (𝑐₁, 𝑐₂) can always be obtained by taking a linear combination of these two basic solutions (with what coefficients?).
   (c) Find the general solution of this equation, independent of initial conditions.

3. Find all solutions of the linear system of constant-coefficient differential equations 𝑑𝑌/𝑑𝑡 = 𝐴 ⋅ 𝑌(𝑡),
   𝑑𝑦₁/𝑑𝑡 = 𝑦₂ + 𝑦₃,  𝑑𝑦₂/𝑑𝑡 = 𝑦₁ + 𝑦₃,  𝑑𝑦₃/𝑑𝑡 = 𝑦₁ + 𝑦₂,
   such that 𝐲(0) = (1, 0, −1).

4. Continuing with Additional Exercise 3, find a basis 𝔛 = {𝐲₁(𝑡), 𝐲₂(𝑡), 𝐲₃(𝑡)} for the three-dimensional vector space of vector-valued solutions to the system 𝑑𝐲/𝑑𝑡 = 𝐴 ⋅ 𝐲(𝑡).

5. Find all solutions of the differential equation
   𝑑³𝑦/𝑑𝑡³ − 𝑑𝑦/𝑑𝑡 = 0.
   Then exhibit a solution for which 𝑦(0) = 3, 𝑦′(0) = −1. What is the value of 𝑑²𝑦/𝑑𝑡²(0) for this solution?

6. Find all solutions of the second-order differential equation
   𝑑²𝑦/𝑑𝑡² + 2 𝑑𝑦/𝑑𝑡 + 𝑦(𝑡) = 0.

Section 2.2. Normal Forms for Linear Operators over ℝ

1. True/False Questions (“True” if the statement is always true.) In (a)–(c) below, let 𝐴 ∈ M(𝑛, ℂ), define 𝐿𝐴(𝐳) = 𝐴 ⋅ 𝐳 on ℂⁿ, and regard ℂⁿ as the complexification ℝⁿ + 𝑖ℝⁿ of ℝⁿ.
   (a) 𝐿𝐴 is of real type ⇔ 𝐴(𝐱 + 𝑖0) ∈ ℝⁿ + 𝑖0 for all 𝐱 ∈ ℝⁿ.
   (b) 𝐿𝐴 is of real type ⇔ all entries in 𝐴 are real.
   (c) 𝐿𝐴 is of real type ⇔ the spectrum sp_ℂ(𝐿𝐴) is contained in ℝ + 𝑖0.
   (d) If 𝑇 is a linear operator on a vector space over ℝ, then 𝜆 = 0 is in sp_ℝ(𝑇) ⇔ it is in sp_ℂ(𝑇ℂ).
   (e) If 𝑇 is a linear operator on a vector space over ℝ, then det(𝑇) is the product of the distinct eigenvalues of the complexification 𝑇ℂ.

2. If 𝑇 is a linear operator on a vector space 𝑉 over ℝ, is rank(𝑇) = dim_ℝ(𝑇(𝑉)) equal to rank(𝑇ℂ) = dim_ℂ(𝑇ℂ(𝑉ℂ))?


3. If 𝐴 ∈ M(𝑛, ℝ), the multiplication operator
   𝐿𝐴(𝐱) = 𝐴 ⋅ 𝐱  for 𝐱 ∈ ℝⁿ
   is an ℝ-linear operator on ℝⁿ, but 𝐴 can also be viewed as a complex matrix whose entries happen to be real.
   (a) Verify that
       𝐿̃𝐴(𝐳) = 𝐴 ⋅ 𝐳  for 𝐳 ∈ ℂⁿ
       defines a complex-linear operator on ℂⁿ.
   (b) If we view ℂⁿ as the complexification (ℝⁿ)ℂ of ℝⁿ, the complexification (𝐿𝐴)ℂ ∶ 𝑉ℂ → 𝑉ℂ is also a ℂ-linear operator on ℂⁿ. Verify that these constructs agree: 𝐿̃𝐴 = (𝐿𝐴)ℂ on ℂⁿ.

4. Let 𝐵 be a bilinear form on a vector space 𝑉 over ℝ, and define a map 𝐵̃ ∶ 𝑉ℂ × 𝑉ℂ → ℂ by letting
   𝐵̃(𝑥 + 𝑖𝑦, 𝑥′ + 𝑖𝑦′) = [𝐵(𝑥, 𝑥′) − 𝐵(𝑦, 𝑦′)] + 𝑖[𝐵(𝑥, 𝑦′) + 𝐵(𝑥′, 𝑦)].
   Is 𝐵̃ a complex-bilinear form on 𝑉ℂ (ℂ-linear in each input when the other entry is held fixed)?

5. Let 𝑇 be a linear operator on a finite-dimensional real vector space 𝑉 and 𝑇ℂ on 𝑉ℂ its complexification. If 𝜆₁, …, 𝜆ₙ are the complex eigenvalues of 𝑇ℂ listed according to their multiplicities, prove the following:
   (a) det(𝑇) = det(𝑇ℂ) = ∏_{𝑗=1}^{𝑛} 𝜆𝑗
   (b) Tr(𝑇) = Tr(𝑇ℂ) = ∑_{𝑗=1}^{𝑛} 𝜆𝑗.

6. For each of the following matrices in M(2, ℝ),

   (i) $A = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}$  (ii) $A = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}$  (iii) $A = \begin{pmatrix} 1 & 1\\ -1 & 1 \end{pmatrix}$  (iv) $A = \begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}$,

   (a) Find the Jordan form of the complexification 𝐿𝐴 ∶ ℂ² → ℂ².
   (b) Find the real-normal form of 𝐴.
   (c) Find a transition matrix 𝑄 such that 𝑄𝐴𝑄⁻¹ puts 𝐴 into the real-normal form in (b).

7. Find the complex Jordan form of the matrix

   $$A = \begin{pmatrix} 3 & 2 & -2 & -1\\ 1 & 0 & -1 & 0\\ 3 & 3 & -2 & -1\\ 4 & 0 & -2 & -1 \end{pmatrix}$$


regarding 𝐴 as a complex matrix. Then use this to find a real-normal form for 𝐴. Hint. 𝜆 = 𝑖 is a complex double root of 𝑝𝐴 .
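The first-order systems in the Section 2.1 exercises can all be solved via the matrix exponential 𝑋(𝑡) = 𝑒^{𝑡𝐴}𝑋(0). A numerical sketch (not the text's method; it uses the system of Exercise 2(i), whose solution with 𝑋(0) = (1, 0) is (cos 𝑡, −sin 𝑡)) approximating 𝑒^{𝑡𝐴} by a truncated power series:

```python
# Solve dX/dt = A X via the truncated series e^{tA} = sum_{k<=N} (tA)^k / k!
import math

A = [[0.0, 1.0], [-1.0, 0.0]]   # dy1/dt = y2, dy2/dt = -y1  (Exercise 2(i))

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k]*Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(M, t, terms=30):
    n = len(M)
    tM = [[t*M[i][j] for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    power = [row[:] for row in result]
    fact = 1.0
    for k in range(1, terms):
        power = mat_mul(power, tM)
        fact *= k
        result = [[result[i][j] + power[i][j]/fact for j in range(n)]
                  for i in range(n)]
    return result

t = 1.3
E = expm(A, t)
X = [E[0][0], E[1][0]]           # X(t) = e^{tA} X(0) with X(0) = (1, 0)
assert abs(X[0] - math.cos(t)) < 1e-9
assert abs(X[1] + math.sin(t)) < 1e-9
```

For this rotation generator, 𝑒^{𝑡𝐴} is exactly the rotation matrix with entries cos 𝑡 and ±sin 𝑡, which is the simplest instance of the real-normal form blocks of Section 2.2.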


CHAPTER 3

Bilinear, Quadratic, and Multilinear Forms

An Overview of This Chapter. Bilinear forms are maps 𝐵 ∶ 𝑉 × 𝑉 → 𝕂 that act on ordered pairs (𝑣, 𝑤) of vectors to produce a scalar 𝐵(𝑣, 𝑤) in the ground field 𝕂. “Bilinearity” means that the output 𝐵(𝑣, 𝑤) is a linear function of 𝑣 when the other input 𝑤 is held fixed, and likewise if 𝑣 is held fixed while 𝑤 varies. In a certain sense, a bilinear map involves the joint action of two linear maps from 𝑉 → 𝕂, each of which is an element of the dual vector space 𝑉∗, which was a main topic in Chapter 3 of LA I.

Bilinear forms arise often in physics and many areas of mathematics. For instance, the inner products 𝐵(𝑢, 𝑣) = (𝑢, 𝑣) on real vector spaces are well-known examples of bilinear forms. Thus, it is of some importance to find natural “canonical forms” that reveal the behavior of a given form 𝐵. This is analogous to the problem of diagonalizing a linear operator 𝑇, or finding a basis that puts the matrix [𝑇]𝔛𝔛 into Jordan canonical form when 𝑇 cannot be diagonalized. Thus, we will often speak of “diagonalizing” bilinear forms, although we will soon see that the two problems are different and have markedly different outcomes. (There are only finitely many possible canonical forms for 𝐵 on a vector space with dim_𝕂 𝑉 = 𝑛.)

Important symmetry groups in physics arise as groups of linear operators that leave invariant particular bilinear forms. As an example, we have the Lorentz group, the family of linear operators that leave invariant the “Lorentz form”

𝐵((𝐱, 𝑡), (𝐱, 𝑡)) = 𝑥₁² + 𝑥₂² + 𝑥₃² − 𝑐²𝑡²  (with 𝑐 = the speed of light),

that acts on the coordinates (𝐱, 𝑡) ∈ ℝ³ × ℝ of space-time in Einstein’s theory of special relativity. These transformations tell us how measurements of space and time change when we shift from the viewpoint of one observer to that of a different observer moving at constant velocity. Invariance of this form embodies the observed fact that the speed of a passing light wave is the same for all observers even if they are moving relative to one another. Many consequences of this simple fact are profoundly counterintuitive, and invariance of the Lorentz form is the key to understanding them. Detailed study of this symmetry group revealed that neither space nor time measurements have any absolute meaning; on the other hand, it does allow us to compute how appearances change as we move from one frame of reference to another.

Beyond all this, we will enter the world of tensors and multilinear forms, with which you may be less familiar. These have become central topics in differential geometry and in physics too — general relativity is all about tensors; so is much of electromagnetic theory and the analysis of stress in solid media


such as the Earth’s mantle and core. This chapter provides a basic introduction to tensors. It is followed in Chapter 4 by a much more detailed introduction to the role of multilinear algebra in modern differential geometry, ending with a re-examination of what was really going on in vector calculus where you first encountered div, grad, curl and all that.

3.1. Basic Definitions and Examples

Definition 3.1. A bilinear form is a map 𝐵 ∶ 𝑉 × 𝑉 → 𝕂 that is linear in each entry when the other entry is held fixed, so that

𝐵(𝛼𝑥, 𝑦) = 𝛼𝐵(𝑥, 𝑦) = 𝐵(𝑥, 𝛼𝑦)
𝐵(𝑥₁ + 𝑥₂, 𝑦) = 𝐵(𝑥₁, 𝑦) + 𝐵(𝑥₂, 𝑦)
𝐵(𝑥, 𝑦₁ + 𝑦₂) = 𝐵(𝑥, 𝑦₁) + 𝐵(𝑥, 𝑦₂)

for 𝛼 ∈ 𝕂 and 𝑥ₖ, 𝑦ₖ ∈ 𝑉. (This of course forces 𝐵(𝑥, 𝑦) = 0 if either input is zero.) We say 𝐵 is symmetric if 𝐵(𝑥, 𝑦) = 𝐵(𝑦, 𝑥) for all 𝑥, 𝑦 and antisymmetric if 𝐵(𝑥, 𝑦) = −𝐵(𝑦, 𝑥). Similarly, a multilinear form (aka 𝑘-linear form or a tensor of rank 𝑘) is a map 𝐵 ∶ 𝑉 × ⋯ × 𝑉 → 𝕂 that is linear in each entry when the other entries are held fixed. We write 𝑉^(0,𝑘) = 𝑉∗ ⊗ ⋯ ⊗ 𝑉∗ for the set of 𝑘-linear forms.

The reason we use the dual space 𝑉∗ here rather than 𝑉, and the rationale for this “tensor product” notation, will soon become clear. You can think of a rank-𝑘 tensor on a vector space 𝑉 as a black box that accepts an ordered list of input vectors chosen from 𝑉 and returns a single scalar in 𝕂 as its output:

(𝑣₁, …, 𝑣ₖ) ⟶ 𝐵 ⟶ 𝐵(𝑣₁, …, 𝑣ₖ) ∈ 𝕂.

The set 𝑉^(0,2) = 𝑉∗ ⊗ 𝑉∗ of bilinear forms on 𝑉 becomes a vector space over 𝕂 if we define the following:
1. Zero Element. 𝐵(𝑥, 𝑦) = 0 for all 𝑥, 𝑦 ∈ 𝑉.
2. Scalar Multiple. (𝛼𝐵)(𝑥, 𝑦) = 𝛼𝐵(𝑥, 𝑦) for 𝛼 ∈ 𝕂 and 𝑥, 𝑦 ∈ 𝑉.
3. Addition. (𝐵₁ + 𝐵₂)(𝑥, 𝑦) = 𝐵₁(𝑥, 𝑦) + 𝐵₂(𝑥, 𝑦) for 𝑥, 𝑦 ∈ 𝑉.
When 𝑘 > 2, the space of 𝑘-linear forms 𝑉∗ ⊗ ⋯ ⊗ 𝑉∗ is also a vector space using the same definitions.

The space 𝑉^(0,1) of 1-forms (the tensors of rank 1 on 𝑉) is the dual space 𝑉∗ = Hom_𝕂(𝑉, 𝕂) consisting of all 𝕂-linear maps ℓ ∶ 𝑉 → 𝕂. By convention, the space of 0-forms is identified with the ground field: 𝑉^(0,0) = 𝕂. Its elements are not mappings on 𝑉. It is also possible (and useful) to define multilinear forms of mixed type, mappings 𝜃 ∶ 𝑉₁ × ⋯ × 𝑉ₖ → 𝕂 in which the components 𝑉𝑗 are not all the same. These forms also constitute a vector space. We postpone any discussion of forms of “mixed type.”

If ℓ₁, ℓ₂ ∈ 𝑉∗, we can create a bilinear form ℓ₁ ⊗ ℓ₂ by taking a “tensor product” of these forms, defining

ℓ₁ ⊗ ℓ₂(𝑣₁, 𝑣₂) = ⟨ℓ₁, 𝑣₁⟩ ⋅ ⟨ℓ₂, 𝑣₂⟩  for 𝑣₁, 𝑣₂ ∈ 𝑉.


Bilinearity is easily checked. More generally, if ℓ₁, …, ℓₖ ∈ 𝑉∗, we obtain a 𝑘-linear map from 𝑉 × ⋯ × 𝑉 → 𝕂 if we let

ℓ₁ ⊗ ⋯ ⊗ ℓₖ(𝑣₁, …, 𝑣ₖ) = ∏_{𝑗=1}^{𝑘} ⟨ℓ𝑗, 𝑣𝑗⟩.

We will show that monomials of the form ℓ₁ ⊗ ⋯ ⊗ ℓₖ with ℓ𝑗 ∈ 𝑉∗ span the space 𝑉^(0,𝑘) of rank-𝑘 tensors, but they do not by themselves form a vector space except when 𝑘 = 1.

Exercise 3.2. If 𝐴 ∶ 𝑉 → 𝑉 is any linear operator on a real inner product space, verify that

𝜙(𝑣₁, 𝑣₂) = (𝐴𝑣₁, 𝑣₂)  for 𝑣₁, 𝑣₂ ∈ 𝑉

is a bilinear form.

Note. This would not be true if 𝕂 = ℂ. Inner products on a complex vector space are conjugate-linear in their second input, with (𝑥, 𝑧 ⋅ 𝑦) = 𝑧̄ ⋅ (𝑥, 𝑦) for 𝑧 ∈ ℂ; for ℂ-linearity in the second entry, we would need (𝑥, 𝑧 ⋅ 𝑦) = 𝑧 ⋅ (𝑥, 𝑦). However, 𝑐̄ = 𝑐 for real scalars, so an inner product on a real vector space is a bona fide rank-2 tensor in 𝑉^(0,2). ○

Example 3.3. Let 𝐴 ∈ M(𝑛, 𝕂) and 𝑉 = 𝕂ⁿ. Regarding elements of 𝕂ⁿ as 𝑛 × 1 column vectors, define

𝐵(𝐱, 𝐲) = 𝐱ᵀ𝐴𝐲 = ∑_{𝑖,𝑗=1}^{𝑛} 𝑥𝑖𝐴𝑖𝑗𝑦𝑗

where 𝐱ᵀ is the 1 × 𝑛 transpose of the 𝑛 × 1 column vector 𝐱. If we interpret the 1 × 1 product matrix as a scalar in 𝕂, then 𝐵 is a typical bilinear form on 𝑉 = 𝕂ⁿ.

The analogous construction for multilinear forms is more complicated. For instance, to describe a rank-3 tensor 𝐵(𝐱, 𝐲, 𝐳) on 𝑉 × 𝑉 × 𝑉, we would need a three-dimensional 𝑛 × 𝑛 × 𝑛 array of coefficients {𝐵_{𝑖₁,𝑖₂,𝑖₃} ∶ 1 ≤ 𝑖ₖ ≤ 𝑛}, from which we recover the original multilinear form via a sum with 𝑛³ terms:

𝐵(𝐱, 𝐲, 𝐳) = ∑_{𝑖₁,𝑖₂,𝑖₃=1}^{𝑛} 𝑥_{𝑖₁} 𝑦_{𝑖₂} 𝑧_{𝑖₃} 𝐵_{𝑖₁,𝑖₂,𝑖₃}  for 𝐱, 𝐲, 𝐳 ∈ 𝕂ⁿ.

If 𝐵 has rank 𝑘, the coefficient array is 𝑘-dimensional with 𝑛ᵏ entries, where 𝑛 = dim(𝑉); 𝐵 is described by an 𝑛 × 𝑛 matrix only for bilinear forms (𝑘 = 2). For the time being, we will focus on bilinear forms, which are quite important in their own right. Many examples involve symmetric or antisymmetric bilinear forms, and in any case, we have the following result.

Lemma 3.4. Every bilinear form 𝐵 is uniquely the sum 𝐵 = 𝐵₊ + 𝐵₋ of a symmetric and an antisymmetric form.


Proof. 𝐵± are given by

𝐵₊(𝑣₁, 𝑣₂) = (𝐵(𝑣₁, 𝑣₂) + 𝐵(𝑣₂, 𝑣₁))/2  and  𝐵₋(𝑣₁, 𝑣₂) = (𝐵(𝑣₁, 𝑣₂) − 𝐵(𝑣₂, 𝑣₁))/2.

As for uniqueness, you cannot have 𝐵 = 𝐵′ with 𝐵 symmetric and 𝐵′ antisymmetric without both being the zero form; so if 𝐵₊ + 𝐵₋ = 𝐵₊′ + 𝐵₋′, the difference 𝐵₊ − 𝐵₊′ = 𝐵₋′ − 𝐵₋ is both symmetric and antisymmetric, hence zero. □

Variants: Sesquilinear Forms. If 𝑉 is a vector space over ℂ, a map 𝐵 ∶ 𝑉 × 𝑉 → ℂ is sesquilinear if it is a linear function of its first entry when the other is held fixed but conjugate-linear in the second entry, so that

𝐵(𝑥₁ + 𝑥₂, 𝑦) = 𝐵(𝑥₁, 𝑦) + 𝐵(𝑥₂, 𝑦),  𝐵(𝑥, 𝑦₁ + 𝑦₂) = 𝐵(𝑥, 𝑦₁) + 𝐵(𝑥, 𝑦₂)

and

𝐵(𝛼𝑥, 𝑦) = 𝛼𝐵(𝑥, 𝑦),  𝐵(𝑥, 𝛼𝑦) = 𝛼̄ 𝐵(𝑥, 𝑦)

for all 𝛼 ∈ ℂ. (This is the same as bilinearity when 𝕂 = ℝ.) The map is Hermitian symmetric if $B(y, x) = \overline{B(x, y)}$. On a vector space over ℝ, an inner product is a special type of sesquilinear form, one that is strictly positive definite in the sense that

(3.1)  𝐵(𝑥, 𝑥) ≥ 0 for all 𝑥 ∈ 𝑉  and  𝐵(𝑥, 𝑥) = ‖𝑥‖² = 0 ⇒ 𝑥 = 0.

Over ℂ, an inner product is a map 𝐵 ∶ 𝑉 × 𝑉 → ℂ that is sesquilinear, Hermitian symmetric, and satisfies the positivity conditions (3.1).

A bilinear form 𝐵 ∈ 𝑉∗ ⊗ 𝑉∗ is completely determined by its action on a basis 𝔛 = {𝑒𝑖} via the matrix [𝐵]𝔛 = [𝐵𝑖𝑗] with entries

𝐵𝑖𝑗 = 𝐵(𝑒𝑖, 𝑒𝑗)  for 1 ≤ 𝑖, 𝑗 ≤ 𝑛.

This matrix is symmetric/antisymmetric if and only if 𝐵 has these properties. Given [𝐵]𝔛, we recover 𝐵 by writing 𝑥 = ∑𝑖 𝑥𝑖𝑒𝑖, 𝑦 = ∑𝑗 𝑦𝑗𝑒𝑗, so that

(3.2)  𝐵(𝑥, 𝑦) = 𝐵(∑𝑖 𝑥𝑖𝑒𝑖, ∑𝑗 𝑦𝑗𝑒𝑗) = ∑𝑖 𝑥𝑖 𝐵(𝑒𝑖, ∑𝑗 𝑦𝑗𝑒𝑗) = ∑_{𝑖,𝑗} 𝑥𝑖𝐵𝑖𝑗𝑦𝑗 = [𝑥]ᵀ𝔛 [𝐵]𝔛 [𝑦]𝔛,

regarding a 1 × 1 matrix as an element of 𝕂. Conversely, given a basis and a matrix 𝐴 ∈ M(𝑛, 𝕂), the previous equality determines a bilinear form 𝐵 (which is symmetric if and only if 𝐴 = 𝐴ᵀ) such that [𝐵]𝔛 = 𝐴. Thus we have isomorphisms between vector spaces over 𝕂:
1. The space of rank-2 tensors 𝑉^(0,2) = 𝑉∗ ⊗ 𝑉∗ is isomorphic to M(𝑛, 𝕂) via the correspondence 𝐵 → [𝐵]𝔛 once a basis in 𝑉 is specified.
2. The space of symmetric bilinear forms is isomorphic to the vector space of symmetric matrices.
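The correspondence 𝐵 ↔ [𝐵]𝔛 and the symmetric/antisymmetric split of Lemma 3.4 can be illustrated concretely. A sketch (with hypothetical numbers) realizing 𝐵(𝑥, 𝑦) = 𝑥ᵀ[𝐵]𝔛 𝑦 on 𝕂² and the matrix-level split 𝑀 = (𝑀 + 𝑀ᵀ)/2 + (𝑀 − 𝑀ᵀ)/2:

```python
# B(x, y) = x^T M y for a hypothetical matrix M, with the split B = B+ + B-
# of Lemma 3.4 realized on matrices as M = (M + M^T)/2 + (M - M^T)/2.
from fractions import Fraction as F

M = [[F(1), F(2)], [F(0), F(3)]]              # a hypothetical [B]_X

def B(x, y, M=M):
    n = len(M)
    return sum(x[i]*M[i][j]*y[j] for i in range(n) for j in range(n))

Msym  = [[(M[i][j] + M[j][i])/2 for j in range(2)] for i in range(2)]
Manti = [[(M[i][j] - M[j][i])/2 for j in range(2)] for i in range(2)]

x, y, xp = [F(1), F(-1)], [F(2), F(5)], [F(3), F(4)]
# bilinearity in the first slot: B(x + x', y) = B(x, y) + B(x', y)
assert B([x[0]+xp[0], x[1]+xp[1]], y) == B(x, y) + B(xp, y)
# B = B+ + B-, with B+ symmetric and B- antisymmetric
assert B(x, y) == B(x, y, Msym) + B(x, y, Manti)
assert B(x, y, Msym) == B(y, x, Msym)
assert B(x, y, Manti) == -B(y, x, Manti)
```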


Note. We write [𝐵]𝔛 for the matrix associated with a bilinear form 𝐵 ∶ 𝑉 × 𝑉 → ℝ given a basis 𝔛 in 𝑉. In this situation, both copies of 𝑉 are involved on the same footing and there is no “initial space” 𝑉 or “target space” 𝑊. In defining the matrix [𝑇]𝔜𝔛 associated with a linear operator 𝑇 ∶ 𝑉 → 𝑊, we had to specify bases 𝔛 in 𝑉 and 𝔜 in 𝑊 (even if 𝑉 = 𝑊 and 𝔛 = 𝔜). ○

We next produce a basis for the space 𝑉^(0,2) = 𝑉∗ ⊗ 𝑉∗ of bilinear forms and determine its dimension.

Proposition 3.5. If 𝔛 = {𝑒𝑖} is a basis in a finite-dimensional vector space 𝑉 and 𝔛∗ = {𝑒𝑖∗} is the dual basis in 𝑉∗ such that ⟨𝑒𝑖∗, 𝑒𝑗⟩ = 𝛿𝑖𝑗, the monomials 𝑒𝑖∗ ⊗ 𝑒𝑗∗ given by

𝑒𝑖∗ ⊗ 𝑒𝑗∗(𝑣₁, 𝑣₂) = ⟨𝑒𝑖∗, 𝑣₁⟩ ⋅ ⟨𝑒𝑗∗, 𝑣₂⟩  (𝑣₁, 𝑣₂ ∈ 𝑉)

are a basis in 𝑉∗ ⊗ 𝑉∗. Hence, dim(𝑉∗ ⊗ 𝑉∗) = 𝑛² = dim(𝑉)².

Proof. The monomials 𝑒𝑖∗ ⊗ 𝑒𝑗∗ span 𝑉∗ ⊗ 𝑉∗, because if 𝐵 is any bilinear form and 𝐵𝑖𝑗 = 𝐵(𝑒𝑖, 𝑒𝑗), then 𝐵̃ = ∑_{𝑖,𝑗} 𝐵𝑖𝑗 ⋅ 𝑒𝑖∗ ⊗ 𝑒𝑗∗ has the same action on pairs 𝑒ₖ, 𝑒ℓ ∈ 𝑉 as the original tensor 𝐵:

𝐵̃(𝑒ₖ, 𝑒ℓ) = (∑_{𝑖,𝑗} 𝐵𝑖𝑗 ⋅ 𝑒𝑖∗ ⊗ 𝑒𝑗∗)(𝑒ₖ, 𝑒ℓ) = ∑_{𝑖,𝑗} 𝐵𝑖𝑗 ⟨𝑒𝑖∗, 𝑒ₖ⟩ ⋅ ⟨𝑒𝑗∗, 𝑒ℓ⟩ = ∑_{𝑖,𝑗} 𝐵𝑖𝑗 𝛿𝑖ₖ𝛿𝑗ℓ = 𝐵ₖℓ = 𝐵(𝑒ₖ, 𝑒ℓ),

so 𝐵̃ = 𝐵 ∈ 𝕂-span{𝑒𝑖∗ ⊗ 𝑒𝑗∗}. As for linear independence, if 𝐵̃ = ∑_{𝑖,𝑗} 𝑏𝑖𝑗 𝑒𝑖∗ ⊗ 𝑒𝑗∗ is the zero element in 𝑉^(0,2), then 𝐵̃(𝑥, 𝑦) = 0 for all 𝑥, 𝑦, so

𝑏ₖℓ = 𝐵̃(𝑒ₖ, 𝑒ℓ) = 0  for 1 ≤ 𝑘, ℓ ≤ 𝑛. □
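The spanning argument in Proposition 3.5 can be made concrete. A sketch (with a hypothetical 2 × 2 form) expanding 𝐵 in the dual-basis monomials 𝑒𝑖∗ ⊗ 𝑒𝑗∗ and checking that ∑ 𝐵𝑖𝑗 𝑒𝑖∗ ⊗ 𝑒𝑗∗ reproduces 𝐵 on arbitrary vectors:

```python
# Expand a bilinear form on K^2 in the monomials e_i* (x) e_j* and check
# that B~ = sum_{ij} B_ij e_i* (x) e_j* agrees with B.
from fractions import Fraction as F

Bmat = [[F(2), F(-1)], [F(4), F(0)]]     # hypothetical: B(e_i, e_j) = Bmat[i][j]

def B(x, y):
    return sum(x[i]*Bmat[i][j]*y[j] for i in range(2) for j in range(2))

def dual(i):                             # e_i*: picks out the i-th coordinate
    return lambda v: v[i]

def monomial(i, j):                      # e_i* (x) e_j* as a bilinear map
    return lambda x, y: dual(i)(x) * dual(j)(y)

def B_tilde(x, y):
    return sum(Bmat[i][j] * monomial(i, j)(x, y)
               for i in range(2) for j in range(2))

for x, y in [([F(1), F(0)], [F(0), F(1)]), ([F(3), F(-2)], [F(5), F(7)])]:
    assert B(x, y) == B_tilde(x, y)
```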

A similar discussion shows that the space 𝑉^(0,𝑟) of rank-𝑟 tensors has dimension

dim(𝑉^(0,𝑟)) = dim(𝑉∗ ⊗ ⋯ ⊗ 𝑉∗) = dim(𝑉)ʳ = 𝑛ʳ.

If 𝔛 = {𝑒₁, …, 𝑒ₙ} is a basis for 𝑉 and {𝑒𝑖∗} is the dual basis in 𝑉∗, the monomials

𝑒_{𝑖₁}∗ ⊗ ⋯ ⊗ 𝑒_{𝑖ᵣ}∗,  1 ≤ 𝑖₁, …, 𝑖ᵣ ≤ 𝑛,

are a basis for 𝑉^(0,𝑟).

Theorem 3.6 (Change of Basis). Given 𝐵 ∈ 𝑉∗ ⊗ 𝑉∗ and a basis 𝔛 in 𝑉, we describe 𝐵 by its matrix as in (3.2). If 𝔜 = {𝑓𝑗} is another basis and if

𝑓𝑗 = id(𝑓𝑗) = ∑ₖ 𝑆ₖ𝑗 𝑒ₖ  for 1 ≤ 𝑗 ≤ 𝑛,


then 𝑆 = [𝑆𝑖𝑗] = [id]𝔛𝔜 is the transition matrix for basis vectors and we have

([𝐵]𝔜)𝑖𝑗 = 𝐵(𝑓𝑖, 𝑓𝑗) = 𝐵(∑ₖ 𝑆ₖ𝑖 𝑒ₖ, ∑ℓ 𝑆ℓ𝑗 𝑒ℓ) = ∑_{𝑘,ℓ} 𝑆ₖ𝑖 𝐵ₖℓ 𝑆ℓ𝑗 = ∑_{𝑘,ℓ} (𝑆ᵀ)𝑖ₖ 𝐵ₖℓ 𝑆ℓ𝑗 = (𝑆ᵀ[𝐵]𝔛 𝑆)𝑖𝑗.

Remark 3.7. We can also write this as [𝐵]𝔜 = 𝑃[𝐵]𝔛 𝑃ᵀ, taking 𝑃 = 𝑆ᵀ = [id]ᵀ𝔛𝔜, where id ∶ 𝑉 → 𝑉 is the (linear) identity map on 𝑉. (Recall the notational conventions set forth in equation (2.8) of Section 2.4 in LA I.) Thus, change of basis is effected by “congruence” of matrices 𝐴 → 𝑃𝐴𝑃ᵀ with det(𝑃) ≠ 0. This differs considerably from the “similarity transforms” 𝐴 → 𝑆𝐴𝑆⁻¹ that describe the effect of change of basis on the matrix [𝑇]𝔛𝔛 of a linear operator 𝑇 ∶ 𝑉 → 𝑉. Note that 𝑆ᵀ is generally not equal to 𝑆⁻¹, so congruence and similarity are not the same thing. The difference between these concepts will emerge when we seek “normal forms” for various kinds of bilinear (or sesquilinear) forms. ○
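The congruence rule [𝐵]𝔜 = 𝑆ᵀ[𝐵]𝔛 𝑆 of Theorem 3.6 can be checked directly. A sketch (hypothetical form and transition matrix on 𝕂²) comparing the entries 𝐵(𝑓𝑖, 𝑓𝑗) computed from the form itself against the matrix formula:

```python
# Verify [B]_Y = S^T [B]_X S by computing B(f_i, f_j) directly for the new
# basis vectors f_j = sum_k S[k][j] e_k.
from fractions import Fraction as F

BX = [[F(1), F(2)], [F(3), F(4)]]      # hypothetical [B]_X
S  = [[F(1), F(1)], [F(0), F(1)]]      # hypothetical transition matrix [id]_XY

def B(x, y):                           # the form in X-coordinates
    return sum(x[i]*BX[i][j]*y[j] for i in range(2) for j in range(2))

f = [[S[0][j], S[1][j]] for j in range(2)]     # new basis, X-coordinates

BY_direct = [[B(f[i], f[j]) for j in range(2)] for i in range(2)]

def mat_mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

ST = [[S[j][i] for j in range(2)] for i in range(2)]
BY_formula = mat_mul(ST, mat_mul(BX, S))

assert BY_direct == BY_formula
```

Note that 𝑆ᵀ appears here, not 𝑆⁻¹: replacing `ST` by the inverse of `S` would compute a similarity transform instead, and the two generally disagree.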

Definition 3.8. A bilinear form 𝐵 is nondegenerate if

𝐵(𝑣, 𝑉) = 0 ⇒ 𝑣 = 0  and  𝐵(𝑉, 𝑣) = 0 ⇒ 𝑣 = 0.

If 𝐵 is either symmetric or antisymmetric, we only need the one-sided version. The radical of 𝐵 is the subspace

rad(𝐵) = {𝑣 ∈ 𝑉 ∶ 𝐵(𝑣, 𝑣′) = 0 for all 𝑣′ ∈ 𝑉},

which measures the degree of degeneracy of the form 𝐵. The 𝐵-orthocomplement of a subspace 𝑊 ⊆ 𝑉 is defined to be

𝑊^{⟂,𝐵} = {𝑣 ∈ 𝑉 ∶ 𝐵(𝑣, 𝑊) = (0)}.

Remark 3.9. Obviously, 𝑊^{⟂,𝐵} is a subspace, and when 𝐵 is symmetric or antisymmetric, the conditions 𝐵(𝑣, 𝑊) = 0 and 𝐵(𝑊, 𝑣) = 0 determine the same “complementary” subspace 𝑊^{⟂,𝐵}. In particular, nondegeneracy means that 𝑉^{⟂,𝐵} = {0}, and in general 𝑉^{⟂,𝐵} is equal to the radical of 𝐵. ○

The notion of “nondegeneracy” is a little ambiguous when the bilinear form 𝐵 is neither symmetric nor antisymmetric: Is there a difference between “right-nondegenerate,” in which 𝐵(𝑉, 𝑦) = 0 ⇒ 𝑦 = 0, and “left-nondegenerate,” in which 𝐵(𝑥, 𝑉) = 0 ⇒ 𝑥 = 0? The answer is no. In fact, if we impose a basis 𝔛 on 𝑉 and view vectors 𝑥, 𝑦 ∈ 𝑉 as 𝑛 × 1 columns, we may write 𝐵(𝑥, 𝑦) = [𝑥]ᵀ𝔛 [𝐵]𝔛 [𝑦]𝔛, and if [𝐵]𝔛 were singular, there would be some 𝑦 ≠ 0 such that [𝐵]𝔛[𝑦]𝔛 = 0, and hence 𝐵(𝑉, 𝑦) = 0 with 𝑦 ≠ 0. Thus, [𝐵]𝔛 must be nonsingular if 𝐵 is right-nondegenerate. But this works in both directions, making this condition both necessary and sufficient. The same argument applies to left-nondegeneracy, so the two nondegeneracy conditions are equivalent. We have proved the following lemma:

Lemma 3.10. If 𝐵 is a bilinear form on 𝑉, it is right-nondegenerate if and only if [𝐵]𝔛 is nonsingular, and likewise for left-nondegeneracy.


Thus, we may drop the “left/right” conditions on nondegeneracy.

Lemma 3.11. If 𝐵 is a nondegenerate bilinear form on a finite-dimensional vector space 𝑉, 𝐵 determines a natural linear bijection 𝐽𝐵 ∶ 𝑉 → 𝑉∗ given by

⟨𝐽𝐵(𝑣), 𝑣′⟩ = 𝐵(𝑣′, 𝑣)  for 𝑣′ ∈ 𝑉.

Proof. If dim(𝑉) < ∞, any nondegenerate bilinear form 𝐵 mediates a natural bijection 𝐽 = 𝐽𝐵 ∶ 𝑉 → 𝑉∗ that identifies each vector 𝑣 ∈ 𝑉 with a functional 𝐽(𝑣) in 𝑉∗ such that

⟨𝐽(𝑣), 𝑣′⟩ = 𝐵(𝑣′, 𝑣)  for all 𝑣, 𝑣′ ∈ 𝑉.

This map is clearly 𝕂-linear, and 𝐽(𝑤) = 0 ⇒ 𝐵(𝑉, 𝑤) = 0 ⇒ 𝑤 = 0 by nondegeneracy of 𝐵, so 𝐽 is one-to-one. It is then a bijection, hence an isomorphism, because dim(𝑉) = dim(𝑉∗). □

Hereafter, it will often be convenient to abbreviate dim(𝑉) = |𝑉| in discussing 𝐵-orthocomplements of subspaces in 𝑉.

Theorem 3.12 (Dimension of 𝐵-Orthocomplements). If 𝐵 is a nondegenerate bilinear form on a finite-dimensional space 𝑉, the dimension of the 𝐵-orthocomplement 𝑀^{⟂,𝐵} = {𝑤 ∶ 𝐵(𝑤, 𝑀) = 0} of a subspace is related to that of 𝑀 by

(3.3)  Dimension Formula: |𝑀| + |𝑀^{⟂,𝐵}| = |𝑉|,

even though we need not have 𝑀 ∩ 𝑀^{⟂,𝐵} = (0) or 𝑉 = 𝑀 ⊕ 𝑀^{⟂,𝐵}.

This is essentially a consequence of previous results about annihilators 𝑀° ⊆ 𝑉∗ of subspaces 𝑀 ⊆ 𝑉 (discussed in LA I, Exercises 3.25–3.26). For completeness, here is a self-contained proof of this relation. In Section 2.3 of LA I, we defined the “annihilator” 𝑀° of a subspace 𝑀 ⊆ 𝑉 to be the subspace in 𝑉∗ with

𝑀° = {ℓ ∈ 𝑉∗ ∶ ⟨ℓ, 𝑀⟩ = 0}

and explained why

(𝑀°)° = 𝑀  and  |𝑉| = |𝑀| + |𝑀°|

when |𝑉| < ∞. The annihilator 𝑀° is analogous to the orthogonal complement 𝑀⟂ in an inner product space, but it lives in the dual space 𝑉∗ instead of 𝑉. It has the advantage that 𝑀° makes sense in any vector space 𝑉, whether or not it is equipped with an inner product or a nondegenerate bilinear form. (Also, orthogonal complements 𝑀⟂ depend on the particular inner product on 𝑉, while the annihilator 𝑀° has an absolute meaning all observers can agree upon.)

Exercise 3.13. When 𝑉 is equipped with a nondegenerate bilinear form 𝐵, we may invoke the natural isomorphism 𝑉 ≅ 𝑉∗ it induces (as in Lemma 3.11) to identify an annihilator 𝑀° in 𝑉∗ with a uniquely defined subspace 𝐽𝐵⁻¹(𝑀°) in 𝑉. From the definitions, verify that 𝑀° ⊆ 𝑉∗ becomes the 𝐵-orthocomplement 𝑀^{⟂,𝐵} ⊆ 𝑉 under these identifications.
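The dimension formula (3.3) can be tested numerically. A sketch (hypothetical nondegenerate form on 𝕂³) computing dim 𝑀^{⟂,𝐵} as the nullity of the linear system 𝐵(𝑤, 𝑚ᵢ) = 0 over the basis vectors 𝑚ᵢ of 𝑀:

```python
# For nondegenerate B on K^3 and M = span{m}, the space M^{perp,B} is the
# null space of the system with constraint vectors BX m; check that
# |M| + |M^{perp,B}| = |V| via rank-nullity.
from fractions import Fraction as F

BX = [[F(1), F(1), F(0)],
      [F(0), F(2), F(1)],
      [F(0), F(0), F(3)]]               # hypothetical [B]_X, nonsingular

def rank(rows):
    rows = [list(r) for r in rows]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f*b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

M_basis = [[F(1), F(0), F(2)]]          # a hypothetical subspace M, dim 1
# B(w, m) = sum_i w_i (BX m)_i, so each m contributes the constraint row BX m
rows = [[sum(BX[i][j]*m[j] for j in range(3)) for i in range(3)]
        for m in M_basis]

dim_M = rank(M_basis)
dim_perp = 3 - rank(rows)               # nullity of the constraint system
assert dim_M + dim_perp == 3            # the dimension formula (3.3)
```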


With this in mind, the dimension formula (3.3) is an immediate consequence of the following exercise regarding annihilators.

Exercise 3.14. If 𝑉 is finite dimensional and 𝑀 is a subspace, let 𝔛 = {𝑣₁, …, 𝑣ᵣ, 𝑣ᵣ₊₁, …, 𝑣ₙ} be a basis in 𝑉 such that 𝕂-span{𝑣₁, …, 𝑣ᵣ} = 𝑀, and let 𝔛∗ = {𝑣₁∗, …, 𝑣ₙ∗} be the basis in 𝑉∗ dual to 𝔛 in 𝑉.
(a) Verify that the annihilator 𝑀° is equal to 𝕂-span{𝑣ᵣ₊₁∗, …, 𝑣ₙ∗}.
(b) Use this and the remarks of Exercise 3.13 to show that |𝑀^{⟂,𝐵}| = |𝑀°| = 𝑛 − 𝑟 = |𝑉| − |𝑀|, as asserted in equation (3.3).

Dealing with Degeneracy of 𝑩. If 𝐵 is degenerate, so the radical rad(𝐵) is nonzero, the role of the radical can be eliminated for most practical purposes, allowing us to focus on nondegenerate forms.

Exercise 3.15. Let 𝑀 = rad(𝐵) and form the quotient space 𝑉̃ = 𝑉/𝑀. Show the following.
(a) 𝐵 induces a well-defined bilinear form 𝐵̃ ∶ 𝑉̃ × 𝑉̃ → 𝕂 if we let 𝐵̃(𝑥 + 𝑀, 𝑦 + 𝑀) = 𝐵(𝑥, 𝑦) for all 𝑥, 𝑦 ∈ 𝑉.
(b) 𝐵̃ is symmetric (or antisymmetric) ⇔ 𝐵 is.
(c) Prove that 𝐵̃ is nondegenerate on the quotient space 𝑉̃ = 𝑉/𝑀.

Exercise 3.16. Given 𝑛 × 𝑛 matrices 𝐴, 𝐵, show that 𝐱ᵀ𝐵𝐲 = 𝐱ᵀ𝐴𝐲 for all 𝐱, 𝐲 ∈ 𝕂ⁿ if and only if 𝐴 = 𝐵.

3.2. Canonical Models for Bilinear Forms

In doing calculations, it is natural to work with the matrices [𝐵]𝔛 that represent 𝐵 with respect to various bases and seek bases yielding the simplest possible form. If a bilinear form 𝐵 is represented by 𝐴 = [𝐵]𝔛 for some basis, we must examine the effect of a change of basis 𝔛 → 𝔜 and describe [𝐵]𝔜 in terms of the transition matrix 𝑆 = [id]𝔛𝔜 that tells us how to write vectors in the 𝔜-basis in terms of vectors in 𝔛, as in (3.2). Writing 𝔛 = {𝑒𝑖} and 𝔜 = {𝑓𝑗}, let 𝑆 = [𝑠𝑖𝑗] = [id]𝔛𝔜 be the matrix of coefficients such that

(3.4)  𝑓𝑗 = id(𝑓𝑗) = ∑ₖ 𝑠ₖ𝑗 𝑒ₖ  for 1 ≤ 𝑗 ≤ 𝑛.

Obviously, det(𝑆) ≠ 0 because this system of vector equations must be invertible. In Theorem 3.6, we worked out the effect of this basis change:

[𝐵]𝔜 = 𝑆ᵀ[𝐵]𝔛 𝑆  where 𝑆 = [id]𝔛𝔜,

and if we let 𝑃 = 𝑆ᵀ, this becomes

(3.5)  [𝐵]𝔜 = 𝑃[𝐵]𝔛 𝑃ᵀ  with 𝑃 = 𝑆ᵀ.

We now show that there is only a limited number of possible “canonical forms” for nondegenerate bilinear forms on a vector space of dim 𝑉 = 𝑛, at least when


𝐵 is either symmetric or antisymmetric (the forms of greatest interest in applications). We might also ask whether these canonical forms are unique. (Answer: not very.)

The Automorphism Group of a Bilinear Form 𝐵. If a vector space is equipped with a nondegenerate bilinear form 𝐵, a natural (and important) symmetry group, the automorphism group Aut(𝐵) ⊆ GL_𝕂(𝑉), comes along with it. It consists of the invertible linear maps 𝑇 ∶ 𝑉 → 𝑉 that “leave the form invariant,” with

𝐵(𝑇(𝑥), 𝑇(𝑦)) = 𝐵(𝑥, 𝑦)  for all vectors 𝑥, 𝑦 ∈ 𝑉.

We have encountered such automorphism groups before, by various names. For example,

1. The Real-Orthogonal Group O(𝑛) consists of the invertible linear maps 𝑇 on ℝⁿ that preserve the usual inner product,

𝐵(𝐱, 𝐲) = ∑_{𝑖=1}^{𝑛} 𝑥𝑖𝑦𝑖  for 𝐱, 𝐲 ∈ ℝⁿ.

As explained in Theorem 6.68 of LA I, the automorphisms that preserve this symmetric bilinear form are precisely the linear rigid motions on Euclidean space, those that leave invariant lengths of vectors and distances between them, so that

‖𝑇(𝐱)‖ = ‖𝐱‖  and  ‖𝑇(𝐱) − 𝑇(𝐲)‖ = ‖𝐱 − 𝐲‖

for 𝐱, 𝐲 ∈ ℝⁿ, where ‖𝐱‖ = (∑_{𝑖=1}^{𝑛} |𝑥𝑖|²)^{1/2} (Pythagoras’ formula).

2. The Unitary Group U(𝑛) is the group of invertible linear operators on 𝑉 = ℂⁿ that preserve the (Hermitian, sesquilinear) standard inner product

𝐵(𝐳, 𝐰) = ∑_{𝑘=1}^{𝑛} 𝑧ₖ𝑤̄ₖ

on complex 𝑛-space. For these operators, the following conditions are equivalent (see Theorem 6.68 and Proposition 6.71 of LA I):

𝑇 ∈ U(𝑛) ⇔ 𝐵(𝑇(𝐳), 𝑇(𝐰)) = 𝐵(𝐳, 𝐰) ⇔ ‖𝑇(𝐳)‖ = ‖𝐳‖ ⇔ ‖𝑇(𝐳) − 𝑇(𝐰)‖ = ‖𝐳 − 𝐰‖

for 𝐳, 𝐰 ∈ ℂⁿ, where ‖𝐳‖ = 𝐵(𝐳, 𝐳)^{1/2} = (∑_{𝑖=1}^{𝑛} |𝑧𝑖|²)^{1/2} (Pythagoras’ formula for ℂⁿ).


Exercise 3.17. Explain why U(𝑛) is a closed and bounded subset in matrix space M(𝑛, ℂ) ≅ ℂ^{𝑛²} — i.e.,
(a) There is a bound 𝑀 > 0 such that all entries in a unitary matrix 𝐴 satisfy |𝑎𝑖𝑗| ≤ 𝑀, and
(b) If a sequence {𝐴ₙ} of unitary matrices converges to a limit in matrix space M(𝑛, ℂ), say with 𝐴ₙ → 𝐴 as 𝑛 → ∞, the limit 𝐴 is also a unitary matrix.
Hint. In (a) you must find a common upper bound 𝑀 for the size |𝑎𝑖𝑗| of the (complex) entries in any unitary 𝑛 × 𝑛 matrix. In (b) recall the discussion of limits of matrices and linear operators in LA I (Sections 5.3–5.4). For matrices, we have 𝐴^(𝑛) → 𝐴 as 𝑛 → ∞ ⇔ 𝐴^(𝑛)_{𝑖𝑗} → 𝐴𝑖𝑗 in ℂ as 𝑛 → ∞ for all 𝑖, 𝑗.

3. The Complex Orthogonal Group O(𝑛, ℂ) is the automorphism group of the following bilinear form on complex 𝑛-space ℂⁿ:

𝐵(𝐳, 𝐰) = ∑_{𝑘=1}^{𝑛} 𝑧ₖ𝑤ₖ  (𝐳, 𝐰 ∈ ℂⁿ).

This is bilinear over 𝕂 = ℂ, but is not an inner product; it is not conjugate-linear in the entry 𝐰 because 𝑤𝑘 appears in 𝐵 instead of 𝑤𝑘 . Furthermore, not all vectors have 𝐵(𝐳, 𝐳) ≥ 0 (try 𝐳 = (1, 𝑖) in ℂ2 ). In the present section, we will systematically examine the canonical forms and associated automorphism groups for nondegenerate symmetric or antisymmetric forms over 𝕂 = ℝ or ℂ. The number of possibilities is surprisingly small. Definition 3.18. The automorphism group of a nondegenerate symmetric or antisymmetric form 𝐵 ∶ 𝑉 × 𝑉 → 𝕂 is (3.6) Aut(𝐵) = {𝑇 ∈ GL𝕂 (𝑉) ∶ 𝐵(𝑇(𝑣), 𝑇(𝑤)) = 𝐵(𝑣, 𝑤)

for all 𝑣, 𝑤 ∈ 𝑉} .

Here, GL𝕂(𝑉) = {𝑇 ∶ det(𝑇) ≠ 0} is the general linear group on 𝑉, consisting of all invertible 𝕂-linear operators 𝑇 ∶ 𝑉 → 𝑉. Aut(𝐵) is a group because it contains the identity element 𝐼 = id_𝑉, the composition product 𝑆 ∘ 𝑇 of any two elements, and the inverse 𝑇⁻¹ of any element. Given a basis 𝔛 = {𝑒ᵢ} for 𝑉, each linear operator 𝑇 ∈ Aut(𝐵) corresponds to an invertible matrix [𝑇]𝔛𝔛 such that 𝑇(𝑒ⱼ) = ∑ₖ 𝑡ₖⱼ𝑒ₖ, and these matrices form a group 𝐺𝐵,𝔛 = { [𝑇]𝔛𝔛 ∶ 𝑇 ∈ Aut(𝐵) } under matrix multiplication (⋅). The group (Aut(𝐵), ∘) and the matrix group (𝐺𝐵,𝔛, ⋅) are isomorphic and are often identified. Matrices in 𝐺𝐵,𝔛 are characterized by the following algebraic property:

(3.7)   𝐺𝐵,𝔛 = {𝐸 ∈ GL(𝑛, 𝕂) ∶ 𝐸ᵀ[𝐵]𝔛𝐸 = [𝐵]𝔛}.
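As a numerical illustration of the boundedness claim in Exercise 3.17(a) (a sketch in pure Python, not a solution from the text): the columns of a unitary matrix are unit vectors, so every entry has modulus at most 1. The 3 × 3 discrete-Fourier matrix, which is unitary, makes a quick test case.

```python
import cmath

# Sketch for Exercise 3.17(a): entries of a unitary matrix satisfy |a_ij| <= 1
# because each column is a unit vector.  We check this on the unitary 3x3
# Fourier matrix U with entries w^(ij)/sqrt(3), w a primitive cube root of 1.

def conj_transpose(A):
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

w = cmath.exp(2j * cmath.pi / 3)
U = [[w**(i * j) / 3**0.5 for j in range(3)] for i in range(3)]

P = matmul(conj_transpose(U), U)   # U* U should be the identity
assert all(abs(P[i][j] - (1 if i == j else 0)) < 1e-12 for i in range(3) for j in range(3))
assert max(abs(U[i][j]) for i in range(3) for j in range(3)) <= 1 + 1e-12
```

The same column argument gives the common bound 𝑀 = 1 for unitary matrices of any size.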

3.2. CANONICAL MODELS FOR BILINEAR FORMS


This follows because [𝑇(𝑥)]𝔛 = [𝑇]𝔛𝔛 ⋅ [𝑥]𝔛 for 𝑥 ∈ 𝑉. Hence,

𝑇 ∈ Aut(𝐵) ⇔ 𝐵(𝑥, 𝑦) = 𝐵(𝑇(𝑥), 𝑇(𝑦)) for all 𝑥, 𝑦 ∈ 𝑉
           ⇔ [𝑥]𝔛ᵀ[𝐵]𝔛[𝑦]𝔛 = [𝑇(𝑥)]𝔛ᵀ[𝐵]𝔛[𝑇(𝑦)]𝔛 = [𝑥]𝔛ᵀ([𝑇]𝔛𝔛ᵀ[𝐵]𝔛[𝑇]𝔛𝔛)[𝑦]𝔛   for 𝑥, 𝑦 ∈ 𝑉
           ⇔ [𝐵]𝔛 = [𝑇]𝔛𝔛ᵀ[𝐵]𝔛[𝑇]𝔛𝔛 ,

so 𝑇 is an automorphism of the bilinear form 𝐵 if and only if the matrix [𝑇]𝔛𝔛 satisfies the identity [𝐵]𝔛 = [𝑇]𝔛𝔛ᵀ[𝐵]𝔛[𝑇]𝔛𝔛, and this must be true for any basis 𝔛. Matrices in 𝐺𝐵,𝔛 are precisely the matrix realizations (with respect to basis 𝔛) of the various automorphisms in Aut(𝐵).

Exercise 3.19. If 𝐵 is a nondegenerate bilinear form, show that 𝐺𝐵 = Aut(𝐵) is a subgroup of the general linear group GL𝕂(𝑉) — i.e., that (a) 𝐼 ∈ 𝐺𝐵, (b) 𝑇₁, 𝑇₂ ∈ 𝐺𝐵 ⇒ 𝑇₁ ∘ 𝑇₂ ∈ 𝐺𝐵, and (c) 𝑇 ∈ 𝐺𝐵 ⇒ 𝑇⁻¹ ∈ 𝐺𝐵.

We can also assess the effect of a change of basis 𝔛 → 𝔜. The matrix group 𝐺𝐵,𝔜 is a conjugate 𝐺𝐵,𝔜 = 𝐴 ⋅ 𝐺𝐵,𝔛 ⋅ 𝐴⁻¹ of 𝐺𝐵,𝔛 for some 𝐴 ∈ GL(𝑛, 𝕂).

Exercise 3.20. If 𝔛, 𝔜 are bases in 𝑉, define 𝐺𝐵,𝔛 and 𝐺𝐵,𝔜 as in (3.7) and prove that

𝐺𝐵,𝔜 = 𝑆 𝐺𝐵,𝔛 𝑆⁻¹

where 𝑆 = [id]𝔜𝔛 and 𝑆⁻¹ = [id]𝔛𝔜.

Recall that 𝑆 = [id]𝔜𝔛 is the matrix such that 𝑒ᵢ = id(𝑒ᵢ) = ∑_{𝑘=1}^{𝑛} 𝑠ₖᵢ𝑓ₖ if 𝔛 = {𝑒ᵢ} and 𝔜 = {𝑓ⱼ}.

The general linear group GL𝕂(𝑉) in which all these automorphism groups live is defined by the condition det(𝑇) ≠ 0, which makes no reference to a bilinear form. The special linear group on 𝑉, SL𝕂(𝑉) = {𝑇 ∈ GL𝕂(𝑉) ∶ det(𝑇) = 1}, is another “classical group” that does not arise as the automorphism group of a bilinear form 𝐵. All the other classical groups of physics and geometry are automorphism groups of particular bilinear forms on 𝑉 or their intersections with SL𝕂(𝑉).

We determine the congruence classes of nondegenerate bilinear forms according to whether 𝐵 is symmetric or antisymmetric and whether the ground field is 𝕂 = ℝ or ℂ. The analysis is the same for antisymmetric forms over 𝕂 = ℝ or ℂ, so there are really only three cases to deal with.

Canonical Forms. Case 1: 𝑩 Symmetric, 𝕂 = ℝ. If 𝐵 is a nondegenerate symmetric bilinear form on a vector space 𝑉 over ℝ with dim(𝑉) = 𝑛, there are just 𝑛 + 1 possible canonical forms.
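The conjugation in Exercise 3.20 can be tested on a concrete example (our own 2 × 2 sketch, not from the text): take [𝐵]𝔛 = diag(1, −1) on ℝ², for which the hyperbolic matrices 𝐴 = [[cosh 𝑡, sinh 𝑡], [sinh 𝑡, cosh 𝑡]] lie in 𝐺𝐵,𝔛, and a sample change of basis 𝑆.

```python
import math

# Sketch of Exercise 3.20: if A^T [B]_X A = [B]_X, then M = S A S^-1 preserves
# [B]_Y = (S^-1)^T [B]_X S^-1.  The matrices below are our own sample data.

def T(M):
    return [list(r) for r in zip(*M)]

def mul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def close(M, N):
    return all(abs(M[i][j] - N[i][j]) < 1e-12 for i in range(2) for j in range(2))

t = 0.4
A = [[math.cosh(t), math.sinh(t)], [math.sinh(t), math.cosh(t)]]
BX = [[1, 0], [0, -1]]
assert close(mul(mul(T(A), BX), A), BX)        # A lies in G_{B,X}

S, Sinv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]  # sample transition matrix and inverse
BY = mul(mul(T(Sinv), BX), Sinv)               # matrix of B in the new basis
M = mul(mul(S, A), Sinv)                       # conjugated automorphism
assert close(mul(mul(T(M), BY), M), BY)        # S A S^-1 lies in G_{B,Y}
```

The assertion at the end is exactly the statement 𝐺𝐵,𝔜 ⊇ 𝑆𝐺𝐵,𝔛𝑆⁻¹ tested on one element; running the same computation with 𝑆 and 𝑆⁻¹ exchanged gives the reverse inclusion.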


Theorem 3.21 (𝐵 symmetric; 𝕂 = ℝ). There is an ℝ-basis 𝔛 ⊆ 𝑉 such that the matrix describing 𝐵 has the form

(3.8)   [𝐵]𝔛 = ( 𝐼𝑝×𝑝     0    )      with 𝑝 + 𝑞 = 𝑛 = dim(𝑉).
                (  0    −𝐼𝑞×𝑞  )

In this case, we say 𝐵 has signature (𝑝, 𝑞). The possibilities 𝑝 = 0 and 𝑝 = 𝑛 are allowed.

Proof. First observe that we have a polarization identity for symmetric 𝐵, which determines 𝐵(𝑣, 𝑤) from homogeneous expressions of the form 𝐵(𝑢, 𝑢), just as with inner products over ℝ:

(3.9)   𝐵(𝑣, 𝑤) = ½ [ 𝐵(𝑣 + 𝑤, 𝑣 + 𝑤) − 𝐵(𝑣, 𝑣) − 𝐵(𝑤, 𝑤) ]

for all 𝑣, 𝑤 ∈ 𝑉.

Definition 3.22. The map 𝑄(𝑣) = 𝐵(𝑣, 𝑣) from 𝑉 to ℝ is the quadratic form associated with a nondegenerate symmetric bilinear form 𝐵 ∶ 𝑉 × 𝑉 → ℝ.

Note that 𝑄(𝜆𝑣) = 𝐵(𝜆𝑣, 𝜆𝑣) = 𝜆²𝑄(𝑣), and the quadratic form 𝑄 ∶ 𝑉 → ℝ determines the full bilinear form 𝐵 ∶ 𝑉 × 𝑉 → 𝕂 via the polarization identity (3.9).

We proceed by induction on 𝑛 = dim(𝑉). The case 𝑛 = 1 is trivial, so we assume Theorem 3.21 true for all spaces of dimension ≤ 𝑛 and consider a form 𝐵 on a space with dim(𝑉) = 𝑛 + 1. Since 𝐵 ≢ 0, it follows from (3.9) that there is some 𝑣₁ ≠ 0 such that 𝐵(𝑣₁, 𝑣₁) ≠ 0, and after scaling 𝑣₁ by some 𝑎 ≠ 0, we can ensure that 𝐵(𝑣₁, 𝑣₁) = ±1. But because 𝕂 = ℝ, we cannot control whether the outcome will be +1 or −1. Let 𝑀₁ = ℝ ⋅ 𝑣₁ and

𝑀₁^{⟂,𝐵} = {𝑣 ∈ 𝑉 ∶ 𝐵(𝑣, 𝑣₁) = 0}.

We have 𝑀₁ ∩ 𝑀₁^{⟂,𝐵} = {0} because any 𝑤 in the intersection must have the form 𝑤 = 𝑐₁𝑣₁, 𝑐₁ ∈ ℝ. But 𝑤 ∈ 𝑀₁^{⟂,𝐵} too, so 0 = 𝐵(𝑤, 𝑤) = 𝑐₁²𝐵(𝑣₁, 𝑣₁) = ±𝑐₁², hence 𝑐₁ = 0 and 𝑤 = 0. Therefore 𝑀₁ ⊕ 𝑀₁^{⟂,𝐵} = 𝑉 because dim(𝑊) + dim(𝑊^{⟂,𝐵}) = dim(𝑉) for any subspace 𝑊 ⊆ 𝑉 (Exercise 3.14). For an alternative proof, recall the general dimension formula dim(𝑊₁ + 𝑊₂) = dim(𝑊₁) + dim(𝑊₂) − dim(𝑊₁ ∩ 𝑊₂) for subspaces 𝑊₁, 𝑊₂ in a finite-dimensional vector space 𝑉.

If 𝐵₁ is the restriction of 𝐵 to 𝑀₁^{⟂,𝐵}, we claim that 𝐵₁ ∶ 𝑀₁^{⟂,𝐵} × 𝑀₁^{⟂,𝐵} → ℝ is nondegenerate on this lower-dimensional subspace. Otherwise, there would be an 𝑥 ≠ 0 in 𝑀₁^{⟂,𝐵} such that 𝐵(𝑥, 𝑀₁^{⟂,𝐵}) = 0. But since 𝑥 ∈ 𝑀₁^{⟂,𝐵} too, we would also have 𝐵(𝑥, 𝑀₁) = 0, and therefore by additivity of 𝐵 in each entry,

𝐵(𝑥, 𝑉) = 𝐵(𝑥, 𝑀₁^{⟂,𝐵} + 𝑀₁) = 0.

Nondegeneracy of 𝐵 on 𝑉 then forces 𝑥 = 0, a contradiction.


We may therefore continue the induction. Choosing a suitable basis 𝔛′ = {𝑣₂, …, 𝑣_{𝑛+1}} in 𝑀₁^{⟂,𝐵} and taking 𝔛 = {𝑣₁, 𝑣₂, …, 𝑣_{𝑛+1}} in 𝑉, we get

         ( ±1    0       0     )
[𝐵]𝔛 =  (  0   𝐼𝑝×𝑝     0     )      with 𝑝 + 𝑞 = 𝑛.
         (  0    0    −𝐼𝑞×𝑞   )

If the top left entry is −1, we may switch the basis vectors 𝑒₁ ↔ 𝑒_{𝑝+1}, which replaces [𝐵]𝔛 with [𝐵]𝔜 = 𝐸ᵀ[𝐵]𝔛𝐸, where 𝐸 is the permutation matrix that interchanges the first and (𝑝+1)st coordinates: 𝐸 has entries 𝐸_{1,𝑝+1} = 𝐸_{𝑝+1,1} = 1, diagonal entries 𝐸ᵢᵢ = 1 for 𝑖 ≠ 1, 𝑝 + 1, and zeros elsewhere. Note that 𝐸ᵀ = 𝐸 for this permutation matrix, so [𝐵]𝔜 has the block-diagonal form (3.8). That concludes the proof of Theorem 3.21. □

Later on we will describe an algorithmic procedure for putting 𝐵 into canonical form diag(+1, …, +1, −1, …, −1); these algorithms work the same way over 𝕂 = ℝ or ℂ. We will also see that an antisymmetric 𝐵 cannot be diagonalized by any congruence, but it nevertheless has a different (and equally useful) canonical form.

The Real Orthogonal Groups O(𝑝, 𝑞), 𝑝 + 𝑞 = 𝑛. The outcome in Theorem 3.21 breaks into 𝑛 + 1 possibilities (including 𝑝 = 0 and 𝑞 = 0). If 𝔛 is a basis such that [𝐵]𝔛 has the standard form (3.8), then 𝐴 ∈ 𝐺𝐵,𝔛 if and only if

(3.10)   𝐴ᵀ ( 𝐼𝑝×𝑝     0    ) 𝐴  =  ( 𝐼𝑝×𝑝     0    ) ,
             (  0    −𝐼𝑞×𝑞  )         (  0    −𝐼𝑞×𝑞  )

which can be written concisely as

𝐴ᵀ𝐽𝐴 = 𝐽    where 𝐽 = ( 𝐼𝑝×𝑝     0    ) .
                       (  0    −𝐼𝑞×𝑞  )

Members of this family of matrix groups over ℝ are denoted by O(𝑝, 𝑞), and each contains as a subgroup the Special Orthogonal Group SO(𝑝, 𝑞) = O(𝑝, 𝑞) ∩ SL(𝑛, ℝ). Several of the groups O(𝑝, 𝑞) and SO(𝑝, 𝑞) are of particular interest.

The Real Orthogonal Groups O(𝑛, 0) = O(𝑛) and SO(𝑛). With respect to the standard basis 𝔛 in ℝⁿ, we have [𝐵]𝔛 = 𝐼𝑛×𝑛, so 𝐽 = 𝐼𝑛×𝑛 in (3.10) and

O(𝑛, 0) = 𝐺𝐵,𝔛 = {𝐴 ∶ 𝐴ᵀ𝐴 = 𝐴ᵀ𝐼𝐴 = 𝐼}.


The set of multiplication operators 𝐿_𝐴(𝐱) = 𝐴 ⋅ 𝐱 for 𝐴 ∈ O(𝑛, 0) is just the familiar group of orthogonal linear transformations on ℝⁿ, traditionally denoted O(𝑛). This group is a closed and bounded set in matrix space M(𝑛, ℝ) ≅ ℝ^{𝑛²}.

The Lorentz Group O(𝑛 − 1, 1). This is the group of space-time symmetries at the center of Einstein’s theory of special relativity, for 𝑛 − 1 space dimensions 𝑥₁, …, 𝑥_{𝑛−1} and one time dimension 𝑥ₙ (which is generally labeled “𝑡” by physicists). For a suitably chosen basis 𝔛 in ℝⁿ, the matrix describing an arbitrary nondegenerate symmetric bilinear form 𝐵 of signature (𝑛 − 1, 1) becomes

(3.11)   [𝐵]𝔛 = diag(1, …, 1, −1),

and the associated quadratic form is

𝐵(𝑥, 𝑥) = [𝑥]𝔛ᵀ[𝐵]𝔛[𝑥]𝔛 = 𝑥₁² + ⋯ + 𝑥_{𝑛−1}² − 𝑥ₙ².

Remark 3.23. The physicists’ version of this is a little different:

𝐵(𝑥, 𝑥) = 𝑥₁² + ⋯ + 𝑥_{𝑛−1}² − 𝑐²𝑡²,

where 𝑐 is the speed of light. But the numerical value of 𝑐 depends on the physical units used to describe it — feet per second, etc. — and one can always choose the units of length and time to make the experimentally measured speed of light have numerical value 𝑐 = 1. For instance, we could measure 𝑡 in seconds and measure lengths in light-seconds (the distance a light ray travels in one second); or we could measure 𝑡 in years and lengths in light-years. Either way, the numerical value of the speed of light is 𝑐 = 1. ○

From (3.11), it is clear that 𝐴 is in O(𝑛 − 1, 1) if and only if

(3.12)   𝐴ᵀ ( 𝐼𝑛−1   0  ) 𝐴  =  ( 𝐼𝑛−1   0  ) .
             (  0    −1 )         (  0    −1 )

O(𝑛 − 1, 1) contains the subgroup SO(𝑛 − 1, 1) = O(𝑛 − 1, 1) ∩ SL(𝑛, ℝ), which corresponds to the proper Lorentz transformations, those with det(𝐴) = +1. Within the group SO(𝑛 − 1, 1), we find a copy S̃O(𝑛 − 1) of the usual orthogonal group SO(𝑛 − 1) embedded in M(𝑛, ℝ) via the one-to-one homomorphism

𝐴 ∈ SO(𝑛 − 1) in M(𝑛 − 1, ℝ)  ↦  ( 𝐴  0 )  ∈ SO(𝑛 − 1, 1) in M(𝑛, ℝ).
                                   ( 0  1 )

The subgroup S̃O(𝑛 − 1), called the Galilean group, acts only on the “space coordinates” 𝑥₁, …, 𝑥_{𝑛−1} in ℝⁿ, leaving the time coordinate 𝑡 = 𝑥ₙ fixed. The following family of matrices in O(𝑛 − 1, 1) is of particular interest in understanding the meaning of special relativity:

(3.13)   𝐴 = ( 1/√(1 − 𝑣²)    0  ⋯  0   −𝑣/√(1 − 𝑣²) )
             (      0         1      0        0       )
             (      ⋮            ⋱            ⋮       )
             (      0         0      1        0       )
             ( −𝑣/√(1 − 𝑣²)   0  ⋯  0    1/√(1 − 𝑣²) ) .


When we employ units that make the speed of light 𝑐 = 1, the parameter 𝑣 must have values |𝑣| < 1 to prevent the corner entries in this array from having physically meaningless imaginary values; as 𝑣 → 1, these entries blow up, so SO(𝑛 − 1, 1) is indeed an unbounded set in matrix space M(𝑛, ℝ).

In special relativity, an event is described by a point (𝐱, 𝑡) in space-time ℝ^{𝑛−1} × ℝ that specifies the location 𝐱 and the time 𝑡 at which the event occurred. Now suppose two observers are moving through space at constant velocity with respect to one another (no acceleration as time passes). Each will use his or her own frame of reference in observing an event to assign space-time coordinates to it. The matrix 𝐴 in (3.13) tells us how to make the (relativistic) transition from the values (𝐱, 𝑡) seen by Observer #1 to those recorded by Observer #2:¹

( 𝐱′ )  =  𝐴 ⋅ ( 𝐱 ) .
( 𝑡′ )         ( 𝑡 )
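A quick numerical sanity check (our own sketch, not part of the text) that a boost matrix of the form (3.13) really satisfies the Lorentz condition 𝐴ᵀ𝐽𝐴 = 𝐽 with 𝐽 = diag(1, …, 1, −1):

```python
import math

# Check A^T J A = J for the boost (3.13) with n = 4 and sample speed v = 0.6.
v = 0.6
g = 1 / math.sqrt(1 - v * v)              # the factor 1/sqrt(1 - v^2)
A = [[g, 0, 0, -v * g],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [-v * g, 0, 0, g]]
J = [[float(i == j) * (1 if i < 3 else -1) for j in range(4)] for i in range(4)]

# (A^T J A)_{ij} = sum_{k,l} A_{ki} J_{kl} A_{lj}
AtJA = [[sum(A[k][i] * J[k][l] * A[l][j] for k in range(4) for l in range(4))
         for j in range(4)] for i in range(4)]
assert all(abs(AtJA[i][j] - J[i][j]) < 1e-12 for i in range(4) for j in range(4))
```

The check boils down to the identity 𝑔² − 𝑣²𝑔² = 𝑔²(1 − 𝑣²) = 1 in the corner block.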

Exercise 3.24. Verify that the matrices in (3.13) all lie in SO(𝑛 − 1, 1). Be sure to check that det(𝐴) = +1.
Hint. Show that (3.12) ⇒ det(𝐴)² = 1, so det(𝐴) = ±1, and then argue that det(𝐼) = 1 and det(𝐴) is a continuous function of the real-valued parameter −1 < 𝑣 < +1.

Exercise 3.25. Show that

      ( cosh(𝑦)   0   0   sinh(𝑦) )
𝐵 =   (    0      1   0      0    )
      (    0      0   1      0    )
      ( sinh(𝑦)   0   0   cosh(𝑦) )

is in SO(3, 1) for all 𝑦 ∈ ℝ.

Remark 3.26. If we work with physical units that do not make 𝑐 = 1, as assumed in (3.13), we must replace “√(1 − 𝑣²)” everywhere it appears with

√(1 − (𝑣/𝑐)²),

in which the speed of light 𝑐 appears explicitly. ○
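A numerical spot-check of the matrix in Exercise 3.25 (a sketch, not the proof the exercise asks for): the condition 𝐵ᵀ𝐽𝐵 = 𝐽 reduces to cosh²(𝑦) − sinh²(𝑦) = 1.

```python
import math

# Verify B^T J B = J for the cosh/sinh matrix of Exercise 3.25, J = diag(1,1,1,-1),
# at one sample parameter value y.
y = 0.83
c, s = math.cosh(y), math.sinh(y)
B = [[c, 0, 0, s], [0, 1, 0, 0], [0, 0, 1, 0], [s, 0, 0, c]]
J = [[float(i == j) * (1 if i < 3 else -1) for j in range(4)] for i in range(4)]
BtJB = [[sum(B[k][i] * J[k][l] * B[l][j] for k in range(4) for l in range(4))
         for j in range(4)] for i in range(4)]
assert all(abs(BtJB[i][j] - J[i][j]) < 1e-12 for i in range(4) for j in range(4))
assert abs(c * c - s * s - 1) < 1e-12     # the key hyperbolic identity
```

In fact the boost (3.13) is exactly this matrix with cosh(𝑦) = 1/√(1 − 𝑣²) and sinh(𝑦) = −𝑣/√(1 − 𝑣²), the usual "rapidity" parametrization.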

Sylvester’s Theorem: Invariance of the Signature (𝑝, 𝑞). One way to compute the signature would be to find a basis that puts [𝐵]𝔛 into the block-diagonal form (3.8), but how do we know the signature does not depend on the basis used to compute it? That it does not is the subject of the next theorem. Proving this amounts to showing that the signature is a congruence invariant: you cannot transform

( 𝐼𝑝×𝑝     0    )         ( 𝐼𝑝′×𝑝′      0     )
(  0    −𝐼𝑞×𝑞  )   ⟶     (   0      −𝐼𝑞′×𝑞′  )

¹To keep things simple, the transition matrix (3.13) describes what happens when Observer #2 is moving with velocity 𝑣 in the positive 𝑥₁-direction, as seen by Observer #1, so that 𝑥′₁ = (𝑥₁ − 𝑣𝑡)/√(1 − 𝑣²), 𝑥′₂ = 𝑥₂, …, 𝑥′_{𝑛−1} = 𝑥_{𝑛−1}. The general formula is more complicated.


unless 𝑝′ = 𝑝 and 𝑞′ = 𝑞. This fact is often referred to as “Sylvester’s law of inertia.”

Theorem 3.27 (Sylvester). If 𝐴 is a nondegenerate real symmetric 𝑛 × 𝑛 matrix, there is some 𝑃 ∈ GL(𝑛, ℝ) such that 𝑃ᵀ𝐴𝑃 = diag(1, …, 1, −1, …, −1). The number 𝑝 of +1 entries in the canonical form (3.8) is uniquely determined.

Proof. Existence of a diagonalization has already been proved in Theorem 3.21. If 𝐵(𝐱, 𝐲) = ∑_{𝑖,𝑗} 𝑥ᵢ𝐴ᵢⱼ𝑦ⱼ = 𝐱ᵀ𝐴𝐲 is a nondegenerate symmetric bilinear form on ℝⁿ, so [𝐵] = [𝐴ᵢⱼ] with respect to the standard basis, then there is a basis 𝔛 such that [𝐵]𝔛 = diag(1, …, 1, −1, …, −1). Suppose 𝑝 = #(entries = +1) for 𝔛 and that there is another diagonalizing basis 𝔜 such that 𝑝′ = #(entries = +1) is ≠ 𝑝. We may assume 𝑝 < 𝑝′. Writing 𝔛 = {𝑣₁, …, 𝑣ₚ, 𝑣_{𝑝+1}, …, 𝑣ₙ} and 𝔜 = {𝑤₁, …, 𝑤_{𝑝′}, 𝑤_{𝑝′+1}, …, 𝑤ₙ}, define 𝐿 ∶ 𝑉 → ℝ^{𝑝−𝑝′+𝑛} via

𝐿(𝐱) = (𝐵(𝐱, 𝑣₁), …, 𝐵(𝐱, 𝑣ₚ), 𝐵(𝐱, 𝑤_{𝑝′+1}), …, 𝐵(𝐱, 𝑤ₙ)).

Then rank(𝐿) is at most dim(ℝ^{𝑝−𝑝′+𝑛}) = 𝑝 − 𝑝′ + 𝑛 < 𝑛. Hence, dim(ker(𝐿)) = dim(𝑉) − rank(𝐿) > 0, and there is some 𝑣₀ ≠ 0 in 𝑉 such that 𝐿(𝑣₀) = 0. That means

𝐵(𝑣₀, 𝑣ᵢ) = 0 for 1 ≤ 𝑖 ≤ 𝑝    and    𝐵(𝑣₀, 𝑤ᵢ) = 0 for 𝑝′ + 1 ≤ 𝑖 ≤ 𝑛.

Writing 𝑣₀ in terms of the two bases, we have 𝑣₀ = ∑_{𝑗=1}^{𝑛} 𝑎ⱼ𝑣ⱼ = ∑_{𝑘=1}^{𝑛} 𝑏ₖ𝑤ₖ. For 𝑖 ≤ 𝑝, we get

0 = 𝐵(𝑣₀, 𝑣ᵢ) = 𝐵( ∑ⱼ 𝑎ⱼ𝑣ⱼ , 𝑣ᵢ ) = ∑ⱼ 𝑎ⱼ𝐵(𝑣ⱼ, 𝑣ᵢ) = 𝑎ᵢ𝐵(𝑣ᵢ, 𝑣ᵢ) = 𝑎ᵢ

because [𝐵]𝔛 = diag(1, …, 1, −1, …, −1) and 𝐵(𝑣ᵢ, 𝑣ᵢ) = +1 for 𝑖 ≤ 𝑝. We conclude that 𝑎ᵢ = 0 for 1 ≤ 𝑖 ≤ 𝑝. Similarly, 𝑏ⱼ = 0 for 𝑝′ + 1 ≤ 𝑗 ≤ 𝑛. Since 𝑣₀ ≠ 0, it follows that 𝑎ᵢ ≠ 0 for some 𝑝 < 𝑖 ≤ 𝑛, and hence

𝐵(𝑣₀, 𝑣₀) = 𝐵( ∑_{𝑗=1}^{𝑛} 𝑎ⱼ𝑣ⱼ , ∑_{ℓ=1}^{𝑛} 𝑎ℓ𝑣ℓ ) = ∑_{𝑗=1}^{𝑛} 𝑎ⱼ²𝐵(𝑣ⱼ, 𝑣ⱼ) = ∑_{𝑗=𝑝+1}^{𝑛} 𝑎ⱼ²𝐵(𝑣ⱼ, 𝑣ⱼ) < 0.

Furthermore,

𝐵(𝑣₀, 𝑣₀) = 𝐵( ∑_{𝑗=1}^{𝑛} 𝑏ⱼ𝑤ⱼ , ∑_{ℓ=1}^{𝑛} 𝑏ℓ𝑤ℓ ) = ∑_{𝑗=1}^{𝑛} 𝑏ⱼ²𝐵(𝑤ⱼ, 𝑤ⱼ) = ∑_{𝑗=1}^{𝑝′} 𝑏ⱼ²𝐵(𝑤ⱼ, 𝑤ⱼ) ≥ 0.

Thus 𝐵(𝑣₀, 𝑣₀) < 0 and 𝐵(𝑣₀, 𝑣₀) ≥ 0, which is a contradiction. □


Corollary 3.28. Two nonsingular symmetric matrices in M(𝑛, ℝ) are congruent via 𝐴 → 𝑃ᵀ𝐴𝑃 for some 𝑃 ∈ GL(𝑛, ℝ) if and only if they have the same signature (𝑝, 𝑞).

A Diagonalization Algorithm. Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix with entries from a field 𝕂 not of characteristic two (so 1 + 1 ≠ 0 in 𝕂). We know that there are matrices 𝑄, 𝐷 ∈ M(𝑛, 𝕂) such that 𝑄 is invertible and 𝑄ᵀ𝐴𝑄 = 𝐷 is diagonal. We now give a method for computing a suitable 𝑄 and diagonal form 𝐷 via elementary row and column operations; a short additional step then yields the signature (𝑝, 𝑞) when 𝕂 = ℝ.

The effect of an elementary column operation on 𝐴 is obtained by right multiplication 𝐴 → 𝐴𝐸 by a suitable “elementary matrix” 𝐸 (recall Sections 1.1 and 4.2 of LA I). Furthermore, the same elementary operation on rows is effected by a left multiplication 𝐴 → 𝐸ᵀ𝐴 using the same 𝐸. If we perform an elementary operation on rows followed by the same elementary operation on columns, the net result is the congruence transformation 𝐴 → 𝐸ᵀ𝐴𝐸. (Since matrix multiplication is associative, we have 𝐸ᵀ(𝐴𝐸) = (𝐸ᵀ𝐴)𝐸, so no parentheses are needed in this expression.) Now suppose that 𝑄 is an invertible matrix such that 𝑄ᵀ𝐴𝑄 = 𝐷 is diagonal. Any invertible 𝑄 is a product of elementary matrices, say 𝑄 = 𝐸₁𝐸₂ ⋯ 𝐸ₖ. Hence

𝐷 = 𝑄ᵀ𝐴𝑄 = 𝐸ₖᵀ𝐸ₖ₋₁ᵀ ⋯ 𝐸₁ᵀ 𝐴 𝐸₁𝐸₂ ⋯ 𝐸ₖ .

Putting these observations together, we get the following theorem:

Theorem 3.29. A suitably chosen sequence of paired elementary row and column operations can transform any real symmetric matrix 𝐴 into a diagonal matrix 𝐷. Furthermore, if 𝐸₁, …, 𝐸ₖ are the elementary matrices that effect the necessary column operations (indexed in the order performed), then 𝑄ᵀ𝐴𝑄 = 𝐷 if we take 𝑄 = 𝐸₁𝐸₂ ⋯ 𝐸ₖ.

The same idea works for symmetric bilinear forms over any field 𝕂 with characteristic ≠ 2.

Corollary 3.30. If 𝐴 ∈ M(𝑛, 𝕂) is a symmetric matrix with entries in a field not of characteristic 2, then 𝐴 is congruent to a diagonal matrix.

Example 3.31. Taking 𝐴 to be the symmetric matrix in M(3, ℝ)

      (  1  −1  3 )
𝐴 =   ( −1   2  1 ) ,
      (  3   1  1 )

we apply the procedure just described to find an invertible matrix 𝑄 such that 𝑄ᵀ𝐴𝑄 = 𝐷 is diagonal.

Discussion. We begin by eliminating all nonzero entries in the first row and first column except for the entry 𝑎₁₁. To this end, we start by performing the column operation Col₂ → Col₂ + Col₁; this yields a new matrix to which


we apply the same operation on rows, Row₂ → Row₂ + Row₁. These first steps yield

(  1  −1  3 )      (  1  0  3 )      ( 1  0  3 )
( −1   2  1 )  →   ( −1  1  1 )  →   ( 0  1  4 )  =  𝐸₁ᵀ𝐴𝐸₁
(  3   1  1 )      (  3  4  1 )      ( 3  4  1 )

where

      ( 1  1  0 )
𝐸₁ =  ( 0  1  0 ) .
      ( 0  0  1 )

The second round of moves is Col₃ → Col₃ − 3⋅Col₁ followed by the same operation on rows:

( 1  0  3 )      ( 1  0   0 )      ( 1  0   0 )
( 0  1  4 )  →   ( 0  1   4 )  →   ( 0  1   4 )  =  𝐸₂ᵀ𝐸₁ᵀ𝐴𝐸₁𝐸₂
( 3  4  1 )      ( 3  4  −8 )      ( 0  4  −8 )

where

      ( 1  0  −3 )
𝐸₂ =  ( 0  1   0 ) .
      ( 0  0   1 )

Finally, we achieve a diagonal form by applying Col₃ → Col₃ − 4⋅Col₂ and then the corresponding operation on rows to get

                        ( 1  0   0  )                     ( 1  0   0 )
𝐸₃ᵀ𝐸₂ᵀ𝐸₁ᵀ𝐴𝐸₁𝐸₂𝐸₃  =  ( 0  1   0  )     where    𝐸₃ =  ( 0  1  −4 ) .
                        ( 0  0  −24 )                     ( 0  0   1 )

Since the outcome is a diagonal matrix, the process is complete. To summarize: by taking

                ( 1  1  −7 )
𝑄 = 𝐸₁𝐸₂𝐸₃ =   ( 0  1  −4 ) ,
                ( 0  0   1 )

we get a diagonal form

             ( 1  0   0  )
𝐷 = 𝑄ᵀ𝐴𝑄 =  ( 0  1   0  ) .
             ( 0  0  −24 )

To obtain the canonical form (3.8), we need one more pair of operations

Row₃ → (1/√24) ⋅ Row₃    and    Col₃ → (1/√24) ⋅ Col₃,

both of which correspond to the (diagonal) elementary matrix

      ( 1  0    0    )
𝐸₄ =  ( 0  1    0    ) .
      ( 0  0  1/√24  )


Therefore, the canonical form is

diag(1, 1, −1) = 𝑄̃ᵀ𝐴𝑄̃    where 𝑄̃ = 𝐸₁𝐸₂𝐸₃𝐸₄ = 𝑄𝐸₄. ○

This example also shows that the diagonal form of a real symmetric matrix achieved through congruence transformations 𝐴 → 𝑄ᵀ𝐴𝑄 is not unique; both diag(1, 1, −24) and diag(1, 1, −1) are congruent to 𝐴. Only the signature (2, 1) is a true congruence invariant.

The Gauss-Seidel Algorithm. In Section 4.2 of LA I, we showed that the inverse 𝐴⁻¹ of an invertible matrix can be obtained by multiplying on the left by a sequence of elementary matrices (or, equivalently, by executing the corresponding sequence of elementary row operations on the matrix). We also developed the Gauss-Seidel Algorithm, which does this efficiently.

Gauss-Seidel Algorithm. Starting with the 𝑛 × 2𝑛 augmented matrix [𝐴 ∶ 𝐼𝑛×𝑛], apply row operations to bring the left-hand block into reduced echelon form, which must equal 𝐼𝑛×𝑛 since 𝐴 is invertible. Applying the same moves to the entire 𝑛 × 2𝑛 augmented matrix, we arrive at a matrix [𝐼𝑛×𝑛 ∶ 𝐴⁻¹] whose right-hand block is the desired inverse.

An algorithm similar to Gauss-Seidel yields a matrix 𝑄 such that 𝑄ᵀ𝐴𝑄 = 𝐷 is diagonal; the signature (𝑝, 𝑞) can then be determined by inspection as in the last steps of Example 3.31. The reader should justify the method, illustrated below, for computing an appropriate 𝑄 without recording each elementary matrix separately. Starting with an augmented 𝑛 × 2𝑛 matrix [𝐴 ∶ 𝐼𝑛×𝑛], we apply paired row and column operations to drive the left-hand block into diagonal form, applying the row operations to the entire augmented matrix. When the left-hand block achieves diagonal form 𝐷, the right-hand block in [𝐷 ∶ 𝑄ᵀ] is a matrix such that 𝑄ᵀ𝐴𝑄 = 𝐷. Starting with Col₂ → Col₂ + Col₁ and then the corresponding operation on rows, we get

             (  1  −1  3 | 1  0  0 )
[𝐴 ∶ 𝐼] =   ( −1   2  1 | 0  1  0 )
             (  3   1  1 | 0  0  1 )

   paired R/C opns.   ( 1  0  3 | 1  0  0 )
  ───────────────→    ( 0  1  4 | 1  1  0 )
                      ( 3  4  1 | 0  0  1 )

   paired R/C opns.   ( 1  0   0 |  1  0  0 )
  ───────────────→    ( 0  1   4 |  1  1  0 )
                      ( 0  4  −8 | −3  0  1 )

   paired R/C opns.   ( 1  0   0  |  1   0  0 )
  ───────────────→    ( 0  1   0  |  1   1  0 )   =  [𝐷 ∶ 𝑄ᵀ] .
                      ( 0  0  −24 | −7  −4  1 )

Therefore,

       (  1   0  0 )            ( 1  1  −7 )
𝑄ᵀ =   (  1   1  0 ) ,   𝑄 =   ( 0  1  −4 ) ,
       ( −7  −4  1 )            ( 0  0   1 )


and the diagonalized form 𝑄ᵀ𝐴𝑄 is

      ( 1  0   0  )
𝐷 =   ( 0  1   0  ) .
      ( 0  0  −24 )
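The paired-operation procedure of Example 3.31 can be sketched in code (our own sketch, not the book's; it assumes each diagonal pivot is nonzero when needed, which holds for this 𝐴, whereas a full implementation would need the extra moves the text describes):

```python
from fractions import Fraction

# Reduce a symmetric A to diagonal D by paired column/row operations,
# accumulating Q = E1 E2 ... Ek so that Q^T A Q = D (exact rational arithmetic).
def congruence_diagonalize(A):
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    Q = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for j in range(k + 1, n):
            c = A[k][j] / A[k][k]
            for i in range(n):            # Col_j -> Col_j - c*Col_k  (on A and Q)
                A[i][j] -= c * A[i][k]
                Q[i][j] -= c * Q[i][k]
            for i in range(n):            # matching row operation on A
                A[j][i] -= c * A[k][i]
    return A, Q

A0 = [[1, -1, 3], [-1, 2, 1], [3, 1, 1]]
D, Q = congruence_diagonalize(A0)
# Verify Q^T A Q = D exactly:
QT_A_Q = [[sum(Q[k][i] * A0[k][l] * Q[l][j] for k in range(3) for l in range(3))
           for j in range(3)] for i in range(3)]
assert QT_A_Q == D
print([int(D[i][i]) for i in range(3)])   # -> [1, 1, -24], signature (2, 1)
```

Running this on the example reproduces the matrix 𝑄 found above.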

This concludes our discussion of Example 3.31.

Exercise 3.32. On 𝑉 = M(2, ℝ), consider the symmetric bilinear form 𝐵(𝑃, 𝑄) = Tr(𝑃𝑄), the trace of the product matrix 𝑃𝑄.
(a) Explain why this form is symmetric.
(b) Is this form nondegenerate? Explain.
(c) Determine the matrix [𝐵]𝔛 of this form with respect to the ordered basis 𝔛 = {𝐸₁₁, 𝐸₁₂, 𝐸₂₁, 𝐸₂₂} of matrix units.
(d) Determine the signature of this bilinear form.
Hint. For calculations, use the standard basis of “matrix units” 𝐸ᵢⱼ, 1 ≤ 𝑖, 𝑗 ≤ 2, in 𝑉 whose entries are zero except for a “1” in the (𝑖, 𝑗) spot. Recall that 𝐸ᵢⱼ ⋅ 𝐸ₖℓ = 𝛿ⱼₖ𝐸ᵢℓ, where 𝛿ᵢⱼ = Kronecker delta.

Canonical Forms. Case 2: 𝑩 Symmetric, 𝕂 = ℂ. In this setting, there is just one canonical form for each dimension.

Theorem 3.33 (𝐵 symmetric; 𝕂 = ℂ). If 𝐵 is a nondegenerate symmetric bilinear form over 𝕂 = ℂ, there is a basis 𝔛 such that [𝐵]𝔛 = 𝐼𝑛×𝑛. In coordinates, for this basis we have

𝐵(𝐳, 𝐰) = ∑_{𝑗=1}^{𝑛} 𝑧ⱼ𝑤ⱼ    (no conjugate, even though 𝕂 = ℂ).

Proof. By our discussion for 𝕂 = ℝ, we know we can put 𝐵 in diagonal form [𝐵]𝔛 = diag(𝜆₁, …, 𝜆ₙ) with each 𝜆ᵢ ≠ 0 since 𝐵 is nondegenerate. Now take square roots in ℂ and let 𝑃 = diag(1/√𝜆₁, …, 1/√𝜆ₙ) to get 𝑃ᵀ[𝐵]𝔛𝑃 = 𝐼𝑛×𝑛. □

There are various symmetric nondegenerate bilinear forms 𝐵 on ℂⁿ, but the associated automorphism groups Aut(𝐵) ⊆ GL(𝑛, ℂ) are essentially the same: we have seen that there is always a basis 𝔛 such that [𝐵]𝔛 = 𝐼, and then the matrix realization 𝐺𝐵,𝔛 of Aut(𝐵) becomes the complex orthogonal group encountered earlier:

O(𝑛, ℂ) = 𝐺𝐵,𝔛 = {𝐴 ∈ M(𝑛, ℂ) ∶ 𝐴ᵀ𝐼𝑛×𝑛𝐴 = 𝐼𝑛×𝑛},

in which 𝐴ᵀ = 𝐴⁻¹. (Notice the use of 𝐴ᵀ here and not the adjoint 𝐴* = 𝐴̄ᵀ, even though 𝕂 = ℂ.) Among the subgroups of O(𝑛, ℂ), we find the special orthogonal group over ℂ, SO(𝑛, ℂ) = O(𝑛, ℂ) ∩ SL(𝑛, ℂ). Both O(𝑛, ℂ) and SO(𝑛, ℂ) are closed, unbounded subsets in M(𝑛, ℂ).
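The rescaling step in the proof of Theorem 3.33 is easy to test numerically (a sketch with sample diagonal entries of our own choosing); note there is no conjugation, so the scaled entries are 𝜆ᵢ/𝜆ᵢ = 1 exactly.

```python
import cmath

# Sketch of the proof step: with P = diag(1/sqrt(l_i)), the bilinear
# congruence P^T diag(l_1,...,l_n) P has diagonal entries P_i * l_i * P_i = 1.
lams = [2 + 1j, -3, 5j]                   # sample nonzero complex entries
P = [1 / cmath.sqrt(l) for l in lams]
scaled = [P[i] * lams[i] * P[i] for i in range(3)]   # no conjugate anywhere
assert all(abs(s - 1) < 1e-12 for s in scaled)
```

This is exactly where 𝕂 = ℂ differs from 𝕂 = ℝ: over ℝ, a negative 𝜆ᵢ has no real square root, so only the sign ±1 can be normalized.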


Exercise 3.34. Taking 𝑛 = 2, show that SO(2, ℂ) has the following properties:
(a) SO(2, ℂ) is abelian and isomorphic to the direct product group 𝑆¹ × ℝ, in which 𝑆¹ = {𝑧 ∈ ℂ ∶ |𝑧| = 1} and the product operation on pairs (𝑧, 𝑡) is (𝑧, 𝑡) ∗ (𝑧′, 𝑡′) = (𝑧𝑧′, 𝑡 + 𝑡′). Find a bijection 𝜙 ∶ 𝑆¹ × ℝ → SO(2, ℂ) that implements the desired group isomorphism, so that

𝜙((𝑧, 𝑡) ∗ (𝑧′, 𝑡′)) = 𝜙(𝑧, 𝑡) ⋅ 𝜙(𝑧′, 𝑡′)    (matrix product)

and 𝜙(1, 0) = 𝐼₂×₂.
(b) Show that 𝐴 ∈ M(2, ℂ) is in SO(2, ℂ) if and only if

𝐴 = (  𝑎  𝑏 )
    ( −𝑏  𝑎 )

with 𝑎, 𝑏 ∈ ℂ and 𝑎² + 𝑏² = 1.
(c) Show that SO(2, ℂ) is an unbounded subset in M(2, ℂ) and, hence, that SO(𝑛, ℂ) is unbounded in M(𝑛, ℂ) for all 𝑛 ≥ 2 because we may embed SO(2, ℂ) in SO(𝑛, ℂ) via

𝐴 ∈ SO(2, ℂ)  ↦  ( 𝐴  0 ⋯ 0 )
                 ( 0  1     )
                 ( ⋮    ⋱   )
                 ( 0      1 ) .

Hint. In (b) write 𝐴 = [𝑎, 𝑏; 𝑐, 𝑑]. The identities 𝐴ᵀ𝐴 = 𝐼 and det(𝐴) = 1 yield five equations in the complex unknowns 𝑎, 𝑏, 𝑐, 𝑑, but only four remain when duplicates are deleted. There is still some redundancy in the remaining system of equations, which can nevertheless be solved by algebraic elimination despite its nonlinearity. In (c) use the sup-norm ‖𝐴‖ = max_{𝑖,𝑗}{|𝐴ᵢⱼ|} on matrices to discuss bounded sets in matrix space.

Remark 3.35. A similar problem was posed in Section 6.5 of LA I (see Euler’s theorem 6.84) regarding the group of real matrices SO(3) ⊆ M(3, ℝ). That came down to understanding the action of the group of real orthogonal matrices SO(2) on the plane ℝ². The groups O(2, ℂ) and SO(2, ℂ), acting as ℂ-linear operators on ℂ², are the “complexified” analogs of the real orthogonal groups acting on ℝ². ○

Exercise 3.36. Which scalar matrices 𝜆𝐼 (𝜆 ∈ ℂ) lie in SO(𝑛, ℂ) or in O(𝑛, ℂ)? Which diagonal matrices lie in these groups?
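A quick numerical check of the parametrization in Exercise 3.34(b) (a sketch, not a solution): taking 𝑎 = cos(𝑤), 𝑏 = sin(𝑤) for a complex parameter 𝑤 automatically gives 𝑎² + 𝑏² = 1, and letting 𝑤 be purely imaginary makes the entries blow up, which is the unboundedness in part (c).

```python
import cmath

# a = cos(w), b = sin(w) with complex w gives a point of SO(2, C).
w = 0.7 + 2.0j
a, b = cmath.cos(w), cmath.sin(w)
assert abs(a * a + b * b - 1) < 1e-12        # a^2 + b^2 = 1
det = a * a - b * (-b)                        # det [[a, b], [-b, a]]
assert abs(det - 1) < 1e-12

# Unboundedness: cos(i*t) = cosh(t) grows without bound as t -> infinity.
assert abs(cmath.cos(50j)) > 1e20
```

This also suggests the isomorphism in part (a): writing 𝑤 = 𝜃 + 𝑖𝑡 separates a circle parameter from an unbounded real parameter.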


Exercise 3.37. Prove that SO(𝑛, ℂ) is a closed subset in M(𝑛, ℂ) — i.e., if a sequence {𝐴⁽ᵏ⁾} in SO(𝑛, ℂ) converges to a limit 𝐴 in M(𝑛, ℂ), the limit 𝐴 must also be in SO(𝑛, ℂ). (One cannot escape from SO(𝑛, ℂ) by taking a limit.)
Hint. Recall the discussion of matrix and operator limits in LA I, Sections 5.3–5.4. Convergence 𝐴⁽ᵏ⁾ → 𝐴 as 𝑘 → ∞ means 𝐴⁽ᵏ⁾ᵢⱼ → 𝐴ᵢⱼ in ℂ for each matrix entry.

Canonical Forms. Case 3: 𝑩 Antisymmetric, 𝕂 = ℝ or ℂ. In the antisymmetric case, the same argument applies whether 𝕂 = ℝ or ℂ. By antisymmetry we have 𝐵(𝑣, 𝑣) = 0 for all 𝑣, and if 𝑊 ⊆ 𝑉, the 𝐵-orthocomplement 𝑊^{⟂,𝐵} = {𝑣 ∶ 𝐵(𝑣, 𝑊) = 0} need not be a subspace complementary to 𝑊. We might even have 𝑊^{⟂,𝐵} ⊇ 𝑊, although the identity dim(𝑊) + dim(𝑊^{⟂,𝐵}) = dim(𝑉) remains valid.

Theorem 3.38 (𝐵 antisymmetric; 𝕂 = ℝ or ℂ). If 𝐵 is a nondegenerate antisymmetric form over 𝕂 = ℝ or ℂ, there is a basis 𝔛 such that

[𝐵]𝔛 = 𝐽 = (   0      𝐼𝑚×𝑚 ) .
            ( −𝐼𝑚×𝑚    0    )

In particular, dim𝕂(𝑉) must be even if 𝑉 carries a nondegenerate antisymmetric bilinear form.

Proof. Recall that dim(𝑊) + dim(𝑊^{⟂,𝐵}) = dim(𝑉) for any nondegenerate bilinear form 𝐵 on 𝑉 and any subspace 𝑊 ⊆ 𝑉. Fix 𝑣₁ ≠ 0. Then 𝑀₁ = (𝕂𝑣₁)^{⟂,𝐵} has dimension 𝑛 − 1 if dim(𝑉) = 𝑛, but it includes 𝕂𝑣₁ ⊆ (𝕂𝑣₁)^{⟂,𝐵} because 𝐵(𝑣, 𝑣) ≡ 0. Now take any 𝑣₂ ∉ 𝑀₁ (so 𝑣₂ ≠ 0); scale it to get 𝐵(𝑣₁, 𝑣₂) = 1 and let 𝑀₂ = (𝕂𝑣₂)^{⟂,𝐵}. Again, nondegeneracy implies dim(𝑀₂) = 𝑛 − 1 = dim(𝑀₁). But 𝑀₂ ≠ 𝑀₁ since 𝑣₂ ∈ 𝑀₂ and 𝑣₂ ∉ 𝑀₁. The basic dimension formula

dim(𝑊₁ + 𝑊₂) = dim(𝑊₁) + dim(𝑊₂) − dim(𝑊₁ ∩ 𝑊₂)    for subspaces 𝑊₁, 𝑊₂ ⊆ 𝑉

then implies that dim(𝑀₁ ∩ 𝑀₂) = 𝑛 − 2. The space 𝑀 = 𝑀₁ ∩ 𝑀₂ is 𝐵-orthogonal to 𝕂-span{𝑣₁, 𝑣₂} by definition of these vectors. Furthermore, 𝐵|𝑀 is antisymmetric and nondegenerate. In fact, we already know that 𝐵(𝑀, 𝑣₁) = 𝐵(𝑀, 𝑣₂) = 0 and 𝑉 = 𝕂𝑣₁ ⊕ 𝕂𝑣₂ ⊕ 𝑀, so if 𝐵(𝑤, 𝑀) = 0 for some 𝑤 ∈ 𝑀, then 𝐵(𝑤, 𝑉) = 𝐵(𝑤, 𝕂𝑣₁ + 𝕂𝑣₂ + 𝑀) = 0 and 𝑤 = 0 by nondegeneracy. Finally, if 𝑁 = 𝕂-span{𝑣₁, 𝑣₂}, we have 𝑉 = 𝑁 ⊕ 𝑀. (Why?)

We can now argue by induction on dimension: 𝑛 = dim(𝑉) must be even, and there is a basis 𝔛₀ = {𝑣₃, …, 𝑣ₙ} in 𝑀 such that

              ( 𝑅        0 )
[𝐵|𝑀]𝔛₀ =    (    ⋱       )      with   𝑅 = (  0  1 ) .
              ( 0        𝑅 )                  ( −1  0 )

Hence, 𝔛 = {𝑣₁, 𝑣₂} ∪ 𝔛₀ is a basis for 𝑉 such that

         ( 𝑅       0       )     ( 𝑅        0 )
[𝐵]𝔛 =  (    [𝐵|𝑀]𝔛₀     )  =  (    ⋱       ) .
         ( 0               )     ( 0        𝑅 )

A simple relabeling of basis vectors (corresponding to some permutation matrix 𝐸 such that 𝐸ᵀ = 𝐸⁻¹) gives the standard form

𝐸ᵀ[𝐵]𝔛𝐸 = [𝐵]𝔜 = (   0      𝐼𝑚×𝑚 )      where 𝑚 = ½ dim(𝑉). □
                    ( −𝐼𝑚×𝑚    0    )

A skew-symmetric nondegenerate form 𝐵 imposes a symplectic structure on 𝑉. The dimension dim𝕂(𝑉) must be even, and as we saw earlier, there is just one such nondegenerate structure up to congruence of the representative matrix.

Definition 3.39. The automorphism group Aut(𝐵) of a nondegenerate skew-symmetric form on 𝑉 is called a symplectic group.

If 𝔛 is a basis that puts 𝐵 into standard form, we have

𝐵(𝑥, 𝑦) = [𝑥]𝔛ᵀ[𝐵]𝔛[𝑦]𝔛 = [𝑥]𝔛ᵀ𝐽[𝑦]𝔛    where   𝐽 = (   0      𝐼𝑚×𝑚 ) .
                                                      ( −𝐼𝑚×𝑚    0    )

By (3.7), 𝑇 is an operator in Aut(𝐵) if and only if 𝐵(𝑥, 𝑦) = 𝐵(𝑇𝑥, 𝑇𝑦) for 𝑥, 𝑦 ∈ 𝑉. If 𝐴 = [𝑇]𝔛𝔛, we claim that 𝐴 ∈ 𝐺𝐵,𝔛 ⇔ 𝐴ᵀ𝐽𝐴 = 𝐽. In fact, 𝐵(𝑥, 𝑦) = 𝐵(𝑇𝑥, 𝑇𝑦) for all 𝑥, 𝑦 ∈ 𝑉 happens if and only if

[𝑥]𝔛ᵀ𝐽[𝑦]𝔛 = [𝑇𝑥]𝔛ᵀ𝐽[𝑇𝑦]𝔛 = [𝑥]𝔛ᵀ[𝑇]𝔛𝔛ᵀ𝐽[𝑇]𝔛𝔛[𝑦]𝔛 .

Since this is true for all 𝑥, 𝑦, we have 𝑇 ∈ Aut(𝐵) ⇔ 𝐴 = [𝑇]𝔛𝔛 ∈ 𝐺𝐵,𝔛 ⇔ 𝐴ᵀ𝐽𝐴 = 𝐽, as claimed. The corresponding matrix group is the classical Symplectic Group of Degree 𝑚 = ½ dim(𝑉):

Sp(𝑛, 𝕂) = 𝐺𝐵,𝔛 = {𝐴 ∈ M(𝑛, 𝕂) ∶ 𝐴ᵀ𝐽𝐴 = 𝐽} .
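For 𝑛 = 2, the symplectic condition 𝐴ᵀ𝐽𝐴 = 𝐽 reduces to det(𝐴) = 1, so Sp(2, ℝ) = SL(2, ℝ). A small sketch (our own example matrix, not from the text) confirms this on one element:

```python
# Check A^T J A = J for J = [[0, 1], [-1, 0]] and a sample A with det(A) = 1.
def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

J = [[0, 1], [-1, 0]]
A = [[2, 3], [1, 2]]                       # det(A) = 2*2 - 3*1 = 1
assert matmul(matmul(transpose(A), J), A) == J
```

In general one computes 𝐴ᵀ𝐽𝐴 = det(𝐴)⋅𝐽 for 2 × 2 matrices, which is why the determinant condition alone suffices in this lowest-dimensional case.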

The related matrix

       ( 𝑅        0 )
𝐽′ =   (    ⋱       )      with   𝑅 = (  0  1 )
       ( 0        𝑅 )                  ( −1  0 )

is a GL-conjugate of 𝐽, with 𝐽′ = 𝐶𝐽𝐶⁻¹ for some 𝐶 ∈ GL(2𝑚, ℝ), and the algebraic condition 𝐴ᵀ𝐽′𝐴 = 𝐽′ determines a subgroup 𝐺′ = {𝐴 ∶ 𝐴ᵀ𝐽′𝐴 = 𝐽′} that is a conjugate of (hence isomorphic to) the matrix group 𝐺𝐵,𝔛 = Sp(𝑛, 𝕂) above. The commutation

relations involving 𝐽 and 𝐽′ have both been used in the literature to describe the matrices associated with elements of Aut(𝐵). The version involving 𝐽 is often favored by physicists in discussions of Hamiltonian systems in classical mechanics and in quantum mechanics.

Note. We have det(𝐴) ≠ 0 automatically because det(𝐽) = det(𝐴ᵀ𝐽𝐴) = det(𝐴)² ⋅ det(𝐽) ⇒ det(𝐴)² = 1 for 𝐴 ∈ Sp(𝑛, 𝕂), so det(𝐴) = ±1 whether the underlying field is ℝ or ℂ. The only scalar matrices 𝜆𝐼 in Sp(𝑛, 𝕂) are those such that 𝜆² = 1. Incidentally, det(𝐽) = 1 because 𝑚 row transpositions send 𝐽 → diag(−𝐼𝑚×𝑚, 𝐼𝑚×𝑚), which has det = (−1)ᵐ; but transposing those 𝑚 rows creates another factor of (−1)ᵐ, making det(𝐽) = +1. ○

3.3. Sesquilinear Forms (𝕂 = ℂ)

Finally, we take up sesquilinear forms 𝐵 ∶ 𝑉 × 𝑉 → ℂ (which are only defined on complex vector spaces). These are linear in the first entry of 𝐵(𝑣, 𝑤) but conjugate-linear in the second, so that 𝐵(𝑥, 𝜆𝑦) = 𝜆̄𝐵(𝑥, 𝑦) and 𝐵(𝜆𝑥, 𝑦) = 𝜆𝐵(𝑥, 𝑦). There is a limited number of possibilities.

Lemma 3.40. A sesquilinear form on 𝑉 cannot be symmetric or antisymmetric unless it is zero.

Proof. If 𝐵 were symmetric, we would have

𝜆𝐵(𝑥, 𝑦) = 𝐵(𝜆𝑥, 𝑦) = 𝐵(𝑦, 𝜆𝑥) = 𝜆̄𝐵(𝑦, 𝑥) = 𝜆̄𝐵(𝑥, 𝑦)

for all 𝜆 ∈ ℂ, which is impossible unless 𝐵(𝑥, 𝑦) ≡ 0. A similar argument works when a sesquilinear 𝐵 is antisymmetric. □

Thus, the only natural symmetries of sesquilinear forms over ℂ are
1. Hermitian Symmetry. 𝐵(𝑥, 𝑦) = \overline{𝐵(𝑦, 𝑥)}.
2. Skew-Hermitian Symmetry. 𝐵(𝑥, 𝑦) = −\overline{𝐵(𝑦, 𝑥)}.

However, if 𝐵 is Hermitian, then 𝑖𝐵 (where 𝑖 = √−1) is skew-Hermitian and vice versa, so once we analyze Hermitian sesquilinear forms, there is nothing new to say about skew-Hermitian forms. The sesquilinear forms on 𝑉 are a vector space over ℂ. Every such form is uniquely a sum 𝐵 = 𝐵_𝐻 + 𝐵_𝑆 of a Hermitian and a skew-Hermitian form:

𝐵(𝑣, 𝑤) = [𝐵(𝑣, 𝑤) + \overline{𝐵(𝑤, 𝑣)}]/2 + [𝐵(𝑣, 𝑤) − \overline{𝐵(𝑤, 𝑣)}]/2    for all 𝑣, 𝑤 ∈ 𝑉.

As usual, a sesquilinear form 𝐵 is determined by its matrix representation relative to a basis 𝔛 = {𝑒₁, …, 𝑒ₙ} in 𝑉, given by

[𝐵]𝔛 = [𝐵ᵢⱼ]    where 𝐵ᵢⱼ = 𝐵(𝑒ᵢ, 𝑒ⱼ).

Given any basis 𝔛, the form 𝐵 is
1. Nondegenerate if and only if [𝐵]𝔛 is nonsingular (nonzero determinant).


2. Hermitian symmetric if and only if [𝐵]𝔛 is self-adjoint ([𝐵]𝔛 = [𝐵]𝔛*).
3. The correspondence 𝐵 → [𝐵]𝔛 is a ℂ-linear isomorphism between the vector space of sesquilinear forms on 𝑉 and matrix space M(𝑛, ℂ).

Sesquilinear Change of Basis Formula. The change of basis formula is a bit different from that for bilinear forms. If 𝔜 = {𝑓ⱼ} is another basis related to 𝔛 = {𝑒ᵢ} via

𝑓ᵢ = ∑_{𝑗=1}^{𝑛} 𝑠ⱼᵢ𝑒ⱼ    where 𝑆 = [id]𝔛𝔜 ,

we have

([𝐵]𝔜)ᵢⱼ = 𝐵(𝑓ᵢ, 𝑓ⱼ) = 𝐵( ∑ₖ 𝑠ₖᵢ𝑒ₖ , ∑ℓ 𝑠ℓⱼ𝑒ℓ ) = ∑_{𝑘,ℓ} 𝑠ₖᵢ \overline{𝑠ℓⱼ} ([𝐵]𝔛)ₖℓ = (𝑆ᵀ[𝐵]𝔛𝑆̄)ᵢⱼ ,

where 𝑆̄ is the conjugate matrix: (𝑆̄)ᵢⱼ = \overline{𝑠ᵢⱼ}. Letting 𝑃 = 𝑆̄, we may rewrite the outcome as

(3.14)   [𝐵]𝔜 = 𝑃*[𝐵]𝔛𝑃

where det(𝑃) ≠ 0 and 𝑃* = 𝑃̄ᵀ. In terms of the transition matrix 𝑆 between bases, we have 𝑃 = 𝑆̄ with 𝑆 = [id]𝔛𝔜. Here 𝑃* need not be equal to 𝑃⁻¹, so 𝑃 need not be a unitary matrix in M(𝑛, ℂ). Formula (3.14) differs from the bilinear change of basis formula in that 𝑃ᵀ has been replaced by 𝑃*.

Exercise 3.41. Prove the basic properties 1–3 of sesquilinear forms listed above.

Exercise 3.42. If 𝐵 is sesquilinear, 𝔛 is a basis in 𝑉, and 𝑥 = ∑ᵢ 𝑥ᵢ𝑒ᵢ, 𝑦 = ∑ⱼ 𝑦ⱼ𝑒ⱼ in 𝑉, show that

𝐵(𝑥, 𝑦) = [𝑥]𝔛ᵀ[𝐵]𝔛\overline{[𝑦]𝔛}    so that    𝐵(𝑥, 𝑦) = ∑ᵢⱼ 𝑥ᵢ𝐵ᵢⱼ\overline{𝑦ⱼ} .
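The evaluation formula in Exercise 3.42 can be checked numerically (a sketch with our own sample matrix): it is linear in 𝑥, conjugate-linear in 𝑦, and takes real values on the diagonal when [𝐵] is Hermitian.

```python
# B(x, y) = sum_ij x_i B_ij conj(y_j) for a Hermitian 2x2 matrix [B].
Bmat = [[1, 2j], [-2j, -1]]          # self-adjoint: B_ij = conj(B_ji)

def B(x, y):
    return sum(x[i] * Bmat[i][j] * y[j].conjugate()
               for i in range(2) for j in range(2))

x, y, lam = [1 + 1j, 2 + 0j], [3 + 0j, 1j], 2 - 1j
assert abs(B(x, [lam * t for t in y]) - lam.conjugate() * B(x, y)) < 1e-12
assert abs(B([lam * t for t in x], y) - lam * B(x, y)) < 1e-12
assert abs(B(x, x).imag) < 1e-12     # Hermitian [B] gives real B(x, x)
```

The last assertion previews the remark after Definition 3.43 below: Hermitian symmetry forces the associated quadratic form to be real-valued.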

Definition 3.43. An inner product is a sesquilinear form that is
1. Hermitian. 𝐵(𝑥, 𝑦) = \overline{𝐵(𝑦, 𝑥)}.
2. Positive Definite. 𝐵(𝑥, 𝑥) ≥ 0 for all 𝑥.
3. Nondegenerate. 𝐵(𝑥, 𝑉) = (0) ⇔ 𝑥 = 0.

If 𝐵 is Hermitian as in (1), the associated quadratic form 𝑄(𝑥) = 𝐵(𝑥, 𝑥) has real values for all 𝑥 ∈ 𝑉. In general, conditions (2) and (3) amount to saying 𝐵(𝑥, 𝑥) ≥ 0 and 𝐵(𝑥, 𝑥) = 0 ⇒ 𝑥 = 0 — i.e., the form is strictly positive definite. This equivalence follows from the polarization identity for sesquilinear forms.


Lemma 3.44 (Polarization Identity). If 𝐵 is any sesquilinear form on a complex vector space, then

𝐵(𝑣, 𝑤) = ¼ [ ∑_{𝑘=0}^{3} 𝑖ᵏ ⋅ 𝐵(𝑣 + 𝑖ᵏ𝑤, 𝑣 + 𝑖ᵏ𝑤) ]    where 𝑖 = √−1.

In particular, a sesquilinear form 𝐵 is identically zero if 𝐵(𝑥, 𝑥) = 0 for all 𝑥 ∈ 𝑉.

Proof. Expand the sum. □
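The identity is easy to verify numerically (a sketch with a sample sesquilinear form of our own choosing, not from the text):

```python
# Check the polarization identity of Lemma 3.44 for the sesquilinear form
# B(z, w) = z1*conj(w1) + 2*z1*conj(w2) on C^2 (linear in z, conjugate-linear in w).
def B(z, w):
    return z[0] * w[0].conjugate() + 2 * z[0] * w[1].conjugate()

v, w = [1 + 2j, -1j], [0.5 + 0j, 3 - 1j]
total = sum((1j)**k * B([v[0] + (1j)**k * w[0], v[1] + (1j)**k * w[1]],
                        [v[0] + (1j)**k * w[0], v[1] + (1j)**k * w[1]])
            for k in range(4)) / 4
assert abs(total - B(v, w)) < 1e-12
```

Expanding each term 𝐵(𝑣 + 𝑖ᵏ𝑤, 𝑣 + 𝑖ᵏ𝑤) and summing against 𝑖ᵏ kills the 𝐵(𝑣, 𝑣), 𝐵(𝑤, 𝑣), and 𝐵(𝑤, 𝑤) contributions and leaves 4𝐵(𝑣, 𝑤), which is exactly what the assertion tests.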

Proposition 3.45. Every nondegenerate Hermitian sesquilinear form 𝐵 on a complex vector space can be put into one of the following canonical forms by a suitable choice of basis in 𝑉:

(3.15)   [𝐵]𝔛 = ( 𝐼𝑝×𝑝     0    )      (forms of type (𝑝, 𝑞))
                 (  0    −𝐼𝑞×𝑞  )

where 𝑝 + 𝑞 = 𝑛 = dim(𝑉) and 𝑝, 𝑞 ≥ 0. If 𝐳 = ∑ᵢ 𝑧ᵢ𝑒ᵢ, 𝐰 = ∑ⱼ 𝑤ⱼ𝑒ⱼ with respect to a basis 𝔛 such that [𝐵]𝔛 has canonical form, we get the standard realization on 𝑉 = ℂⁿ of a sesquilinear form of type (𝑝, 𝑞):

(3.16)   𝐵𝑝,𝑞(𝐳, 𝐰) = ∑_{𝑖=1}^{𝑝} 𝑧ᵢ𝑤̄ᵢ − ∑_{𝑖=𝑝+1}^{𝑛} 𝑧ᵢ𝑤̄ᵢ    (with 𝑝 + 𝑞 = 𝑛).

When 𝑝 = 𝑛, we obtain the standard inner product (𝐳, 𝐰) on ℂⁿ,

𝐵𝑛,0(𝐳, 𝐰) = ∑_{𝑗=1}^{𝑛} 𝑧ⱼ𝑤̄ⱼ = (𝐳, 𝐰)    for 𝐳, 𝐰 ∈ ℂⁿ,

and if 𝑞 = 𝑛, we get

𝐵0,𝑛(𝐳, 𝐰) = − ∑_{𝑖=1}^{𝑛} 𝑧ᵢ𝑤̄ᵢ = −(𝐳, 𝐰).
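A small sketch of the intermediate case (our own example, not the text's): the type (1, 1) form on ℂ² from (3.16) is Hermitian and real-valued on the diagonal, but indefinite, so it is not an inner product.

```python
# The standard type (1,1) form on C^2: B(z, w) = z1*conj(w1) - z2*conj(w2).
def B11(z, w):
    return z[0] * w[0].conjugate() - z[1] * w[1].conjugate()

z = [1 + 1j, 2 + 0j]
# Real values on the diagonal: B(z, z) = |z1|^2 - |z2|^2.
assert abs(B11(z, z) - (abs(z[0])**2 - abs(z[1])**2)) < 1e-12
# Indefinite: positive on e1, negative on e2.
assert B11([1, 0j], [1, 0j]).real > 0 and B11([0j, 1], [0j, 1]).real < 0
```

Only the type (𝑛, 0) form satisfies the positivity condition of Definition 3.43.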

Proof. If 𝐵 is nondegenerate Hermitian and 𝑣 ≠ 0, there must be some 𝑤 ∈ 𝑉 such that 𝐵(𝑣, 𝑤) ≠ 0, and by the polarization identity, nondegeneracy of 𝐵 implies there is some 𝑣 such that 𝐵(𝑣, 𝑣) ≠ 0. Hermitian symmetry implies 𝐵(𝑣, 𝑣) is real, and if 𝑐 ∈ ℂ, we have 𝐵(𝑐𝑣, 𝑐𝑣) = |𝑐|²𝐵(𝑣, 𝑣); thus, by rescaling, we can make 𝐵(𝑣, 𝑣) = ±1, depending on whether 𝐵(𝑣, 𝑣) is positive or negative.

We now argue by induction on dim(𝑉), assuming dim(𝑉) = 𝑛 and that [𝐵0]𝔛 can be put into canonical form for any nondegenerate Hermitian form 𝐵0 on any complex vector space of dimension < 𝑛. As in the preceding paragraph, we can find a vector 𝑣1 such that 𝐵(𝑣1, 𝑣1) = ±1; let 𝑀1 = ℂ𝑣1 and relabel

𝑀1⟂,𝐵 = 𝑀1⟂ = 𝑀0.

By nondegeneracy of 𝐵, we have dim(𝑀0) = 𝑛 − 1, and then 𝑀1 ∩ 𝑀0 = (0) because 𝑤 ∈ 𝑀1 ∩ 𝑀1⟂ ⇒ 𝑤 = 𝑐𝑣1 and also 0 = 𝐵(𝑤, 𝑣1) = 𝑐𝐵(𝑣1, 𝑣1), which implies 𝑐 = 0. Thus, 𝑉 = 𝑀1 ⊕ 𝑀0.

3.3. SESQUILINEAR FORMS (𝕂 = ℂ)


The restricted form 𝐵0 = 𝐵|𝑀0 is also nondegenerate because if 𝐵(𝑤, 𝑀0) is zero for some nonzero 𝑤 ∈ 𝑀0, then 𝐵(𝑤, 𝑉) = 𝐵(𝑤, 𝑀1 + 𝑀0) = (0) too, contrary to nondegeneracy of 𝐵 on 𝑉. By the inductive hypothesis, we conclude that there is a basis 𝔛 = {𝑒1 = 𝑣1, 𝑒2, …, 𝑒𝑛} in 𝑉 = 𝑀1 ⊕ 𝑀0 such that

[𝐵]𝔛 = ( 𝜇1    0         0
          0   𝐼𝑝′×𝑝′      0
          0    0     −𝐼𝑞′×𝑞′ )

with 𝜇1 = ±1 and 𝑝′ + 𝑞′ = 𝑛 − 1. Finally, apply a permutation matrix 𝐸 (relabel basis vectors) to get a new basis 𝔜 such that

[𝐵]𝔜 = 𝐸∗[𝐵]𝔛𝐸 = ( 𝐼𝑝×𝑝    0
                     0    −𝐼𝑞×𝑞 )

with 𝑝 + 𝑞 = 𝑛 = dim(𝑉) and 𝑝, 𝑞 ≥ 0. This reduces to [𝐵]𝔛 = 𝐼𝑛×𝑛 (or −𝐼𝑛×𝑛) when 𝑝 = 𝑛 (or 𝑝 = 0). □

Proposition 3.45 implies that there are just 𝑛 + 1 congruence classes of nondegenerate Hermitian sesquilinear forms on a complex vector space of dimension 𝑛, distinguished by their signatures (𝑝, 𝑞). Convenient models of the possible automorphism groups

Aut(𝐵) = {𝑇 ∶ det(𝑇) ≠ 0 and 𝐵(𝑇(𝑣), 𝑇(𝑤)) = 𝐵(𝑣, 𝑤) for all 𝑣, 𝑤}
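Since the signs of the eigenvalues of any Hermitian matrix representing 𝐵 are congruence invariants (Sylvester's law of inertia), the signature (𝑝, 𝑞) can be read off numerically. A minimal sketch (the matrices 𝐽 and 𝑃 below are arbitrary illustrative choices, not from the text):

```python
import numpy as np

def signature(H, tol=1e-10):
    """Signature (p, q) of a nondegenerate Hermitian matrix H:
    p = number of positive eigenvalues, q = number of negative ones.
    By Sylvester's law of inertia, (p, q) is a congruence invariant."""
    eigs = np.linalg.eigvalsh(H)       # real, since H is Hermitian
    p = int(np.sum(eigs > tol))
    q = int(np.sum(eigs < -tol))
    return p, q

J = np.diag([1.0, 1.0, -1.0])          # canonical form of type (2, 1)
P = np.array([[1, 2, 0],
              [0, 1, 1j],
              [1, 0, 3]], dtype=complex)   # an invertible matrix
H = P.conj().T @ J @ P                 # congruent to J, hence same signature
assert signature(H) == (2, 1)
```

The assertion illustrates the claim above: all Gram matrices congruent to the type-(2, 1) canonical form share the signature (2, 1).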

are provided by the matrix groups 𝐺𝐵,𝔛 determined by a basis 𝔛 that puts 𝐵 into canonical form. This yields the Unitary Groups of Type (𝑝, 𝑞), for which Aut(𝐵) is isomorphic to the matrix group

(3.17)
U(𝑝, 𝑞) = {𝐴 ∶ 𝐴∗𝐽𝑝,𝑞𝐴 = 𝐽𝑝,𝑞}        with 𝐽𝑝,𝑞 = ( 𝐼𝑝×𝑝    0
                                                     0    −𝐼𝑞×𝑞 ).
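A quick sanity check of membership in U(𝑝, 𝑞): for 𝑝 = 𝑞 = 1 the "hyperbolic rotations" below satisfy 𝐴∗𝐽1,1𝐴 = 𝐽1,1 without being unitary (an illustrative computation, not an example from the text):

```python
import numpy as np

J = np.diag([1.0, -1.0])               # J_{1,1}

def in_U11(A, tol=1e-12):
    # Membership test for U(1, 1): A* J A = J
    return np.allclose(A.conj().T @ J @ A, J, atol=tol)

t = 0.7
# Hyperbolic rotations lie in U(1, 1) but not in U(2):
A = np.array([[np.cosh(t), np.sinh(t)],
              [np.sinh(t), np.cosh(t)]])
assert in_U11(A)                                    # A* J A = J holds
assert not np.allclose(A.conj().T @ A, np.eye(2))   # but A is not unitary
# cosh(t), sinh(t) grow without bound as t -> infinity,
# so U(1, 1) contains matrices of arbitrarily large norm.
```

The identity 𝐴∗𝐽𝐴 = 𝐽 here reduces to cosh²𝑡 − sinh²𝑡 = 1, and the unbounded family {𝐴(𝑡)} foreshadows Exercise 3.48 below.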

There is a slight twist in the correspondence between operators 𝑇 ∈ Aut(𝐵) and matrices 𝐴 = [𝑇]𝔛𝔛 ∈ 𝐺𝐵,𝔛 in U(𝑝, 𝑞).

Exercise 3.46. Let 𝐵 be nondegenerate Hermitian sesquilinear and let 𝔛 = {𝑒𝑖} be a basis such that [𝐵]𝔛 is in canonical form (3.15). If 𝐴 = [𝑇]𝔛𝔛 is the matrix associated with 𝑇 ∈ Aut(𝐵) and the basis 𝔛 in 𝑉, verify that the complex conjugate 𝐴̄ satisfies the identity in (3.17), and conversely, if 𝐴 ∈ U(𝑝, 𝑞), then 𝐴 = ([𝑇]𝔛𝔛)̄ for some 𝑇 ∈ Aut(𝐵).

Thus, the correspondence Φ ∶ 𝑇 ↦ ([𝑇]𝔛𝔛)̄ (rather than 𝑇 ↦ 𝐴 = [𝑇]𝔛𝔛) is a bijection between Aut(𝐵) and the matrix group U(𝑝, 𝑞) ⊆ M(𝑛, ℂ) such that Φ(𝑇1 ∘ 𝑇2) = Φ(𝑇1) ⋅ Φ(𝑇2) (matrix product). Hence Φ is a group isomorphism between Aut(𝐵) and U(𝑝, 𝑞).



When 𝑝 = 𝑛 and we identify 𝑉 ≅ ℂ𝑛 via a basis such that [𝐵]𝔛 = 𝐼𝑛×𝑛, Aut(𝐵) becomes the classical group of unitary operators on ℂ𝑛, and the group of matrices 𝐺𝐵,𝔛 is the familiar group of unitary matrices

U(𝑛) = U(𝑛, 0) = {𝐴 ∈ M(𝑛, ℂ) ∶ 𝐴∗𝐴 = 𝐼𝑛×𝑛},

which includes the special unitary group SU(𝑛) = U(𝑛) ∩ SL(𝑛, ℂ) as a closed subgroup. Similarly, U(𝑝, 𝑞) contains the Special Unitary Group of Type (𝑝, 𝑞), SU(𝑝, 𝑞) = U(𝑝, 𝑞) ∩ SL(𝑛, ℂ). For 𝐴 ∈ U(𝑝, 𝑞), the identity 𝐴∗𝐽𝑝,𝑞𝐴 = 𝐽𝑝,𝑞 implies

|det(𝐴)|² det(𝐽𝑝,𝑞) = det(𝐽𝑝,𝑞),

so |det(𝐴)| = 1 and det(𝐴) lies on the unit circle 𝑆¹ = {𝑧 ∶ |𝑧| = 1} in ℂ.

We know that matrices in U(𝑛) are orthogonally diagonalizable over ℂ since they are normal operators (𝐴∗𝐴 = 𝐴𝐴∗, because 𝐴∗𝐴 = 𝐼 ⇔ 𝐴𝐴∗ = 𝐼). And because ‖𝐴𝑥‖² = ‖𝑥‖² for all 𝑥, all eigenvalues 𝜆𝑖 have absolute value 1, so for unitary matrices (or operators) the spectrum spℂ(𝐴) is a subset of the unit circle 𝑆¹ = {𝑧 ∈ ℂ ∶ |𝑧| = 1}. Furthermore, U(𝑛) contains a copy 𝑆¹ ≅ {𝜆𝐼𝑛×𝑛 ∶ |𝜆| = 1} of the unit circle, which is a group under the usual multiplication of complex numbers because |𝑧𝑤| = |𝑧| ⋅ |𝑤| and |𝑧| = 1 ⇒ |1/𝑧| = 1. In SU(𝑛), however, the only scalar matrices are those of the form 𝜆𝐼 where 𝜆 is an 𝑛th root of unity, 𝜆 = 𝑒^{2𝜋𝑖𝑘/𝑛} with 0 ≤ 𝑘 ≤ 𝑛 − 1.

Notice the parallel between the orthogonal groups over 𝕂 = ℝ and the unitary groups over 𝕂 = ℂ.
1. SO(𝑝, 𝑞) and O(𝑝, 𝑞) over ℝ are the “real points” in the groups SU(𝑝, 𝑞) and U(𝑝, 𝑞). In fact, we have O(𝑝, 𝑞) = U(𝑝, 𝑞) ∩ (M(𝑛, ℝ) + 𝑖0) when we identify M(𝑛, ℂ) = M(𝑛, ℝ) + √−1 M(𝑛, ℝ) by splitting 𝐴 = [𝑧𝑖𝑗] as [𝑥𝑖𝑗] + √−1 [𝑦𝑖𝑗] when 𝑧𝑖𝑗 = 𝑥𝑖𝑗 + √−1 𝑦𝑖𝑗.
2. We also recognize SO(𝑛) and O(𝑛) as the real points in the complex matrix groups SO(𝑛, ℂ) and O(𝑛, ℂ). At the same time, these subgroups are also the real points in the unitary groups SU(𝑛) and U(𝑛).

Exercise 3.47. Prove that U(𝑛) is a closed, bounded subset when we identify M(𝑛, ℂ) ≈ ℂ^{𝑛²}. Hence, it is a compact matrix group.

Exercise 3.48.
If 0 < 𝑝 < 𝑛, prove that U(𝑝, 𝑞) and SU(𝑝, 𝑞) are closed but unbounded subsets in matrix space M(𝑛, ℂ).

Exercise 3.49. Verify that the matrix group U(𝑝, 𝑞) is in fact the matrix model 𝐺𝐵,𝔛 of the automorphism group Aut(𝐵𝑝,𝑞) with respect to any basis that puts 𝐵 = 𝐵𝑝,𝑞 into canonical form (3.15).
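In the spirit of Exercises 3.47 and 3.48, the facts quoted above about U(𝑛) (unit-modulus eigenvalues, |det 𝐴| = 1, boundedness) can be verified numerically. An illustrative sketch; the random matrix and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
# The QR factorization of a (generically invertible) random complex
# matrix yields a unitary factor Q:
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(Z)
assert np.allclose(Q.conj().T @ Q, np.eye(4))       # Q lies in U(4)

# Every eigenvalue of a unitary matrix lies on the unit circle,
# and |det Q| = 1:
eigs = np.linalg.eigvals(Q)
assert np.allclose(np.abs(eigs), 1.0)
assert np.isclose(abs(np.linalg.det(Q)), 1.0)

# The columns of any A in U(n) are orthonormal, so the squared entries
# sum to n; hence U(n) is a bounded subset of M(n, C) = C^{n^2}:
assert np.isclose(np.sum(np.abs(Q)**2), 4.0)
```

The last assertion is one half of Exercise 3.47: boundedness follows because every matrix in U(𝑛) has norm √𝑛 under the identification M(𝑛, ℂ) ≈ ℂ^{𝑛²}.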


Additional Exercises

Section 3.1. Bilinear, Quadratic, and Multilinear Forms

1. True/False Questions (“True” if the statement is always true.)
(a) If a bilinear form 𝐵 on a finite-dimensional vector space 𝑉 is symmetric, then its matrix [𝐵]𝔛 with respect to any basis 𝔛 in 𝑉 is symmetric, so that [𝐵]𝔛T = [𝐵]𝔛.
(b) For an arbitrary bilinear form on a vector space over ℝ or ℂ, the left- and right-handed nondegeneracy conditions
𝐵(𝑥, 𝑉) = {0} ⇒ 𝑥 = 0        and        𝐵(𝑉, 𝑥) = {0} ⇒ 𝑥 = 0
are equivalent.
(c) If 𝐵 is a bilinear form on a vector space 𝑉 such that dim(𝑉) > 1 and 𝑥 ≠ 0 in 𝑉, there is some 𝑦 ∈ 𝑉 such that 𝑦 ≠ 0 but 𝐵(𝑥, 𝑦) = 0.
(d) If 𝐵 ∶ 𝑉 × ⋯ × 𝑉 → 𝕂 is a rank-𝑘 tensor on a vector space 𝑉, then 𝐵(𝑣1, …, 𝑣𝑘) = 0 if one of the inputs 𝑣1, …, 𝑣𝑘 is zero.

Congruence of Matrices. Recall that matrices 𝐴, 𝐴′ ∈ M(𝑛, 𝕂) are congruent, indicated by writing 𝐴′ ≅ 𝐴, if there is a nonsingular matrix 𝑃 such that 𝐴′ = 𝑃T𝐴𝑃. This happens ⇔ they are matrix realizations of the same bilinear form 𝐵 on an 𝑛-dimensional vector space 𝑉 with respect to different bases, so

𝐴 = [𝐵]𝔛 and 𝐴′ = [𝐵]𝔜        for bases 𝔛, 𝔜 in 𝑉.

(e) If two matrices in M(𝑛, 𝕂) are congruent, so 𝐵 = 𝑄T𝐴𝑄, they must have the same eigenvalues in 𝕂.
(f) Any symmetric matrix 𝐴 ∈ M(𝑛, 𝕂) is congruent to a diagonal matrix.
(g) Every quadratic form is a bilinear form.

2. Which of the following functions with inputs 𝐱, 𝐲, 𝐳 from 𝑉 = ℝ𝑛 have the right properties to be rank-3 tensors in 𝑉(0,3) = 𝑉∗ ⊗ 𝑉∗ ⊗ 𝑉∗?
(a) 𝐵(𝐱, 𝐲, 𝐳) = 3𝑥1𝑥2𝑥3 − 𝑥3𝑧1𝑦4, 𝑉 = ℝ4
(b) 𝐵(𝐱, 𝐲, 𝐳) = 3𝑥1²𝑥2𝑦4 + 2𝑥3𝑧3, 𝑉 = ℝ4
(c) 𝐵(𝐱, 𝐲, 𝐳) = det ( 𝑥1 𝑥2 𝑥3
                        𝑦1 𝑦2 𝑦3
                        𝑧1 𝑧2 𝑧3 ), 𝑉 = ℝ3.

3. The map 𝐵 ∶ ℂ4 × ℂ4 → ℂ given by 𝐵(𝐯, 𝐰) = 𝑣1𝑤1 + 𝑣2𝑤2 − 𝑣3𝑤3 − 𝑣4𝑤4 is easily seen to be bilinear (a rank-2 tensor). Find the 4 × 4 matrix [𝐵]𝔛 that describes it with respect to the standard basis 𝐞1, …, 𝐞4 in ℂ4. Is this matrix symmetric? Antisymmetric? Is 𝐵 a sesquilinear form on ℂ4?


4. Let 𝐵 ∶ 𝑉 × 𝑉 → ℝ be a bilinear form on a vector space 𝑉 over ℝ.
(a) Is 𝐵̃(𝑥 + 𝑖𝑦, 𝑥′ + 𝑖𝑦′) = (𝐵(𝑥, 𝑥′) − 𝐵(𝑦, 𝑦′)) + 𝑖(𝐵(𝑥, 𝑦′) + 𝐵(𝑥′, 𝑦)) a complex-bilinear form on the complexification 𝑉ℂ = 𝑉 + 𝑖𝑉?
(b) Is 𝐵̃ symmetric if 𝐵 is symmetric?
(c) We say that 𝐵̃ is nondegenerate if 𝐵̃(𝑥, 𝑉ℂ) = (0) ⇒ 𝑥 = 0 in 𝑉ℂ. Is 𝐵̃ nondegenerate on 𝑉ℂ if 𝐵 is nondegenerate on 𝑉?

5. Regarding the columns 𝐶1, 𝐶2 of a matrix 𝐴 ∈ M(2, 𝕂) as vectors in 𝑉 = 𝕂², is the map 𝐵(𝐶1, 𝐶2) = det(𝐴) a bilinear form on 𝑉? If so, is 𝐵 symmetric? Antisymmetric?

6. If a matrix 𝐴 is congruent to a diagonal matrix 𝐷 with 𝐷 = 𝑄T𝐴𝑄, is 𝐴 necessarily a symmetric matrix?

7. If 𝐵 is a symmetric bilinear form on a vector space 𝑉 over a field 𝕂, the associated quadratic form 𝑄 ∶ 𝑉 → 𝕂 is given by 𝑄(𝑥) = 𝐵(𝑥, 𝑥) for all 𝑥 ∈ 𝑉. Prove the polarization identity

𝐵(𝑥, 𝑦) = (1/2)[𝑄(𝑥 + 𝑦) − 𝑄(𝑥) − 𝑄(𝑦)]        for 𝑥, 𝑦 ∈ 𝑉,

which allows us to reconstruct 𝐵(𝑥, 𝑦) from 𝑄.

Section 3.2. Canonical Models for Bilinear Forms

1. True/False Questions (“True” if the statement is always true.)
(a) When 𝕂 = ℂ, every symmetric matrix can be diagonalized by a similarity transformation 𝐴 → 𝑆𝐴𝑆−1 (𝑆 invertible).
(b) If 𝕂 = ℝ or ℂ, every symmetric matrix in M(𝑛, 𝕂) is congruent to a diagonal matrix.

2. Which pairs from the following list of matrices are congruent?

(a) 𝐴 = ( 1  1 −2
          −1  2  1
           3  1  1 )

(b) 𝐵 = ( 1 2 1
          2 3 2
          3 1 1 )

(c) 𝐶 = ( 1 0 1
          0 1 2
          1 2 1 )

Give your reasoning. Hint: You could use Sylvester’s theorem to compare symmetric matrices. You might also consider the ranks of the matrices.

3. Determine which pairs of matrices in Additional Exercise 2 are in the same similarity class.

4. Let 𝑉 be an inner product space over ℝ and 𝐵(𝑥, 𝑦) a symmetric bilinear form on 𝑉.
(a) Prove that there is an orthonormal basis 𝔛 in 𝑉 such that [𝐵]𝔛 is a diagonal matrix.
(b) What can you say about these diagonal entries? (In particular, are they necessarily related to the eigenvalues of [𝐵]𝔛?)


5. Let 𝐴 = ( 𝑎 𝑏
             𝑐 𝑑 )
be a real-orthogonal matrix in O(2). Explain why the following hold:
(a) det(𝐴) = +1 or −1.
(b) |𝜆| = 1 for any complex eigenvalue of 𝐴.
(c) 𝐿̃𝐴 ∶ ℂ² → ℂ² is diagonalizable.

6. If det(𝐴) = +1 and 𝐴 ≠ 𝐼, we have shown (Euler’s theorem, LA I) that 𝐿𝐴 acts on ℝ² as a rotation about the origin. Now suppose det(𝐴) = 𝑎𝑑 − 𝑏𝑐 = −1.
(a) Explain why
𝑎² + 𝑏² = 1,    𝑐² + 𝑑² = 1,    𝑎𝑐 + 𝑏𝑑 = 0.
Then show that
𝐴 = ( cos 𝜃    sin 𝜃
      sin 𝜃   −cos 𝜃 )
for some real 𝜃.
(b) Show that 𝐿𝐴 is reflection across some line 𝐿 through the origin, with 𝜃 related to the angle between 𝐿 and the +𝑥-axis. Hint: Recall the discussion of Euler’s theorem 6.48 in LA I.

7. A matrix 𝐴 is in the complex orthogonal group O(𝑛, ℂ) if and only if 𝜔(𝐴𝐳, 𝐴𝐰) = 𝜔(𝐳, 𝐰) for the canonical symmetric bilinear form on ℂ𝑛,

𝜔(𝐳, 𝐰) = ∑_{𝑗=1}^{𝑛} 𝑧𝑗𝑤𝑗        (𝐳, 𝐰 ∈ ℂ𝑛),

whose signature is (𝑛, 0). Now consider the related symmetric bilinear form 𝜔̃(𝐳, 𝐰) = 𝜔(𝑖𝐳, 𝑖𝐰) where 𝑖 = √−1. Explain why the following hold:
(a) The matrix [𝜔̃]𝔛 of this form with respect to the standard basis in ℂ𝑛 is −𝐼𝑛×𝑛.
(b) What does this mean for the product 𝐴𝐴T?
(c) What is the signature of 𝜔̃?

8. In discussing symplectic groups (the automorphism groups of nondegenerate skew-symmetric bilinear forms 𝐵 over ℝ or ℂ), we mentioned two canonical forms in common use, namely

[𝐵]𝔛 = 𝐽 = (    0      𝐼𝑚×𝑚
             −𝐼𝑚×𝑚      0   )

and (for a different basis)

[𝐵]𝔜 = ( 𝑅        0
             ⋱
          0        𝑅 )        where 𝑅 = (  0  1
                                          −1  0 ).

Restricting attention to the simpler case of 4 × 4 matrices, find a permutation matrix 𝑃 such that 𝑃T[𝐵]𝔛𝑃 = [𝐵]𝔜.


Hint: In this case the desired change of basis can be achieved by a simple relabeling of the basis vectors in 𝔛 without taking any linear combinations (which is why 𝑃 is a permutation matrix).

Section 3.3. Sesquilinear Forms Over ℂ

1. True/False Questions (“True” if the statement is always true.)
(a) For bilinear forms on a vector space over ℝ, there is no difference between “symmetry” and “Hermitian symmetry.”
(b) Let 𝐴 be a skew-adjoint complex matrix (so 𝐴∗ = −𝐴) and define 𝐵 ∶ ℂ𝑛 × ℂ𝑛 → ℂ by letting 𝐵(𝐳, 𝐰) = 𝐳T𝐴𝐰, regarding 1 × 1 matrices as scalars and vectors in ℂ𝑛 as 𝑛 × 1 column vectors. Then 𝐵 is a sesquilinear form on the complex vector space 𝑉 = ℂ𝑛.

2. If (𝐳, 𝐰) is the standard inner product on ℂ𝑛 and 𝐴 is a complex matrix, what properties must 𝐴 have if 𝐵(𝐳, 𝐰) = (𝐴𝐳, 𝐰) (𝐳, 𝐰 ∈ 𝑉 = ℂ𝑛) is to be the following:
(a) A sesquilinear form on 𝑉.
(b) A nondegenerate sesquilinear form on 𝑉 (so 𝐵(𝑉, 𝐰) = 0 if and only if 𝐰 = 0).
(c) A Hermitian symmetric sesquilinear form on 𝑉.
(d) A skew-Hermitian sesquilinear form on 𝑉.
(e) An inner product on 𝑉.

10.1090/cln/030/04

CHAPTER 4

Tensor Fields, Manifolds, and Vector Calculus ... or, what they never told you in Calculus I-III

An Overview of This Chapter. This chapter explores some aspects of linear algebra that lie at the heart of modern differential geometry, and it ends with a reinterpretation of many of the main results of multivariate calculus. This is a vast subject, so the presentations in this chapter will not be as fully developed as those in preceding chapters.

Euclidean 𝑛-dimensional space 𝐸𝑛 is a featureless space (a line, a plane, etc.), perhaps equipped with a metric: a distance function 𝑑(𝑥, 𝑦) between points. By marking an origin and imposing coordinates, we can model 𝐸𝑛 by the 𝑛-tuples of real numbers 𝐱 = (𝑥1, …, 𝑥𝑛) in ℝ𝑛. In addition to allowing us to describe locations of points 𝑝 as coordinate 𝑛-tuples, ℝ𝑛 comes equipped with certain algebraic operations, scaling and vector addition, which make Euclidean space into a vector space over ℝ. This extra structure, unknown to Euclid, was inspired by the “parallelogram law” for vector addition that arose in physics as the correct law for adding forces. In the late 1800s, additional operations on coordinate space ℝ𝑛 were introduced, such as the inner product (𝐱, 𝐲) = ∑_{𝑖=1}^{𝑛} 𝑥𝑖𝑦𝑖 (for arbitrary dimensions) and the cross product 𝐱 × 𝐲 (which makes sense only in 𝑛 = 3 dimensions).

The tangent space TE𝑝 to 𝐸𝑛 at a base point 𝑝 was thought of as a copy of the vector space ℝ𝑛 attached to 𝐸𝑛 at 𝑝, whose elements are described as pairs (𝑝, 𝐱) where 𝑝 is a base point in 𝐸𝑛 and 𝐱 a vector in ℝ𝑛. There is a separate tangent space attached to each base point in 𝐸𝑛. In calculus, tangent vectors based at 𝑝 were thought of as “arrows” attached to the base point, which can be scaled and added to other tangent vectors attached to the same base point via the rules

𝜆 ⋅ (𝑝, 𝐱) = (𝑝, 𝜆𝐱)        and        (𝑝, 𝐱1) + (𝑝, 𝐱2) = (𝑝, 𝐱1 + 𝐱2).

There is, however, no meaningful way to add tangent vectors (𝑝, 𝐱) ∈ TE𝑝 and (𝑞, 𝐲) ∈ TE𝑞 attached to different base points 𝑝 ≠ 𝑞. When elements of TE𝑝 are viewed as arrows, those arrows might represent force vectors acting on objects located at 𝑝, or the velocity of a moving particle as it passes through 𝑝, for example. They could also represent more general fields: an electric, magnetic, or gravitational field, or a field of stress tensors pervading some solid medium, etc. Many mathematical models are concerned with fields of vectors 𝐅(𝐱) defined on 𝐸𝑛 (or some open subset). These assign a tangent vector 𝐅(𝑝) ∈ TE𝑝 to each base point, and the coordinates imposed on 𝐸𝑛 determine basis vectors


{𝐞1, …, 𝐞𝑛} in the tangent space TE𝑝 that allow us to describe any tangent vector at 𝑝 as a linear combination 𝐚 = 𝑎1(𝑝)𝐞1 + ⋯ + 𝑎𝑛(𝑝)𝐞𝑛 (𝑎𝑖 ∈ ℝ). Similarly, we can describe a field of vectors on 𝐸𝑛 as 𝐅(𝑝) = 𝐹1(𝑝)𝐞1 + ⋯ + 𝐹𝑛(𝑝)𝐞𝑛, whose scalar-valued coefficients 𝐹𝑘(𝑝) vary with the base point. If 𝐚, 𝐛 are in TE𝑝 and 𝑛 = 3, their scalar multiples and vector sums are given by

𝜆𝐚 = ∑_{𝑖=1}^{3} 𝜆𝑎𝑖𝐞𝑖        and        𝐚 + 𝐛 = ∑_{𝑖=1}^{3} (𝑎𝑖 + 𝑏𝑖)𝐞𝑖,

and their cross product is

𝐚 × 𝐛 = det ( 𝐞1 𝐞2 𝐞3
              𝑎1 𝑎2 𝑎3
              𝑏1 𝑏2 𝑏3 )
       = (𝑎2𝑏3 − 𝑎3𝑏2)𝐞1 − (𝑎1𝑏3 − 𝑎3𝑏1)𝐞2 + (𝑎1𝑏2 − 𝑎2𝑏1)𝐞3.

By definition, scalar fields on 𝐸𝑛 are just scalar-valued functions 𝐹 ∶ 𝐸𝑛 → ℝ. This coordinate description of vector fields on ℝ3 or ℝ𝑛 is used in calculus to define the standard “vector operations” on vector fields 𝐲 = 𝐅(𝐱) = ∑_{𝑖=1}^{𝑛} 𝐹𝑖(𝐱)𝐞𝑖 that have differentiable coefficients. Writing 𝐷𝑥𝑖𝜙(𝑝) for the 𝑖th partial derivative 𝜕𝜙/𝜕𝑥𝑖(𝑝) of a scalar-valued function 𝜙 ∶ ℝ𝑚 → ℝ, we define the following operations:

1. Gradient:

𝐠𝐫𝐚𝐝 𝐹(𝑝) = ∇𝐹(𝑝) = ∑_{𝑖=1}^{𝑛} 𝐷𝑥𝑖𝐹(𝑝) ⋅ 𝐞𝑖

for scalar fields 𝐹 ∶ ℝ𝑛 → ℝ.

2. Curl:

𝐜𝐮𝐫𝐥 𝐅(𝑝) = ∇ × 𝐅(𝑝) = det (  𝐞1      𝐞2      𝐞3
                              𝜕/𝜕𝑥1   𝜕/𝜕𝑥2   𝜕/𝜕𝑥3
                               𝐹1      𝐹2      𝐹3  )
 = ( 𝜕𝐹3/𝜕𝑥2 − 𝜕𝐹2/𝜕𝑥3 )𝐞1 − ( 𝜕𝐹3/𝜕𝑥1 − 𝜕𝐹1/𝜕𝑥3 )𝐞2 + ( 𝜕𝐹2/𝜕𝑥1 − 𝜕𝐹1/𝜕𝑥2 )𝐞3

for smooth vector fields 𝐅 = 𝐹1𝐞1 + 𝐹2𝐞2 + 𝐹3𝐞3. This definition only works for fields on 3-dimensional space ℝ3, although, as we will see in Section 4.4, it can be adapted to deal with vector fields on two-dimensional space.

3. Divergence:

𝐝𝐢𝐯 𝐅(𝑝) = ∇ ⋅ 𝐅(𝑝) = 𝜕𝐹1/𝜕𝑥1(𝑝) + 𝜕𝐹2/𝜕𝑥2(𝑝) + 𝜕𝐹3/𝜕𝑥3(𝑝) = Trace[ 𝜕𝐹𝑖/𝜕𝑥𝑗(𝑝) ]
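These operators can be probed numerically. The sketch below approximates partial derivatives by central differences (the sample fields 𝑓, 𝐆, the step size, and the test point are arbitrary choices, not from the text) and checks the familiar identities 𝐜𝐮𝐫𝐥(𝐠𝐫𝐚𝐝 𝑓) = 𝟎 and 𝐝𝐢𝐯(𝐜𝐮𝐫𝐥 𝐆) = 0:

```python
import numpy as np

h = 1e-4
def partial(g, i, x):
    """Central-difference approximation to the partial derivative D_{x_i} g(x)."""
    e = np.zeros(3)
    e[i] = h
    return (g(x + e) - g(x - e)) / (2 * h)

def grad(f):
    return lambda x: np.array([partial(f, i, x) for i in range(3)])

def curl(F):
    def c(x):
        # dF[i, j] = D_{x_i} F_j
        dF = np.array([[partial(lambda y: F(y)[j], i, x) for j in range(3)]
                       for i in range(3)])
        return np.array([dF[1, 2] - dF[2, 1],
                         dF[2, 0] - dF[0, 2],
                         dF[0, 1] - dF[1, 0]])
    return c

def div(F):
    return lambda x: sum(partial(lambda y: F(y)[i], i, x) for i in range(3))

f = lambda x: x[0]**2 * x[1] + np.sin(x[2])                 # sample scalar field
G = lambda x: np.array([x[1]*x[2], x[0]**2, x[0] + x[1]])   # sample vector field
p = np.array([0.3, -1.2, 0.5])

assert np.allclose(curl(grad(f))(p), 0.0, atol=1e-5)   # curl(grad f) = 0
assert abs(div(curl(G))(p)) < 1e-5                     # div(curl G) = 0
```

These two identities hold for all smooth fields because mixed second partials commute; they are a first hint of why these particular first-derivative combinations recur throughout vector calculus.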


The divergence is defined for smooth vector fields 𝐅 ∶ ℝ3 → ℝ3, and the result is a scalar field. This too works only for fields on 3-dimensional Euclidean space, but it can be adapted to work for fields on ℝ2.

Various questions lurk in the background of these definitions.

Questions. All these operators are described in terms of Cartesian coordinates imposed on the blank slate of Euclidean space. But how do we know that the numerical results of our calculations are anything more than artifacts of the coordinates we imposed to do the calculation? For instance:
1. What happens in polar coordinates or other coordinate systems?
2. How are bases {𝐞1, …, 𝐞𝑚} in the tangent spaces TE𝑝 induced by different coordinate systems on 𝐸𝑛?
3. How should the vector operations of calculus

𝐠𝐫𝐚𝐝 𝐹 = ∇𝐹,        𝐜𝐮𝐫𝐥 𝐅 = ∇ × 𝐅,        𝐝𝐢𝐯 𝐅 = ∇ ⋅ 𝐅

be described in non-Cartesian coordinates?
4. Finally, do vector fields and the operations on them have physical or geometric meaning independent of any particular system of coordinates?

Physicists and mathematicians also wondered why these particular combinations of the 𝑛² first-order partial derivatives 𝜕𝐹𝑖/𝜕𝑥𝑗 keep showing up in so many applications. Or, what analogs of the vector operators 𝐝𝐢𝐯, 𝐠𝐫𝐚𝐝, 𝐜𝐮𝐫𝐥 that pervade the physical sciences might exist in dimensions 𝑛 ≥ 4, where the classical definitions of 𝐜𝐮𝐫𝐥 and 𝐝𝐢𝐯 no longer make sense?

Featureless Euclidean space is a very limited setting for developing mathematical physics and for understanding the possible meaning of “tangent vectors.” It is simply too special to reveal the subtleties involved. For instance, suppose you wanted to model phenomena occurring on the spherical surface of the Earth.
• What does it mean to speak of a vector field on the two-dimensional sphere
𝑆² = {𝐱 ∈ ℝ3 ∶ 𝑥1² + 𝑥2² + 𝑥3² = 1},
a smooth two-dimensional hypersurface in ℝ3? How about calculations involving vector fields on the 3-dimensional sphere
𝑆³ = {𝐱 ∈ ℝ4 ∶ ‖𝐱‖² = 1},
a curved 3-dimensional hypersurface that lives in 4-dimensional Euclidean space?
• What becomes of the classical vector operations on vector fields in higher-dimensional settings?
The same questions apply to vector fields defined on any curved lower-dimensional hypersurface embedded in ℝ𝑛. There are deeper and more fundamental issues to deal with in a consistent re-imagining of calculus. Suppose we regard 𝑆² or 𝑆³ as the entire universe


of discourse. For instance, in general relativity the universe can be viewed as a 3-dimensional sphere 𝑆³ with a metric structure that evolves with time. There is no outside space in which this object lives, and any reference to one is pure metaphysics, divorced from phenomena observable from within 𝑆³. “Tangent vectors” or “tangent spaces” viewed as arrows or hyperplanes attached to base points 𝑝 ∈ 𝑆³ and extending into the “surrounding space” simply have no meaning. There is no “outside” to the universe. So let’s see how calculus might be developed using concepts intrinsic to 𝑆² or 𝑆³, or to the more general spaces one might use to model phenomena. Underlying everything is this issue: How can you deal with such questions in a coordinate-independent way? A satisfactory answer emerged only in the first quarter of the 20th century, based on the concept of a “differentiable manifold.” Differential geometry, the area of mathematics that deals with these issues, is a hybrid of hard analysis and advanced topics in linear algebra (tensors, differential forms), which will also be explored in this chapter.

4.1. Tangent Vectors, Cotangent Vectors, and Tensors

The definition of a manifold begins with an 𝑚-dimensional locally Euclidean space 𝑀: a set of points that can be covered with coordinate charts (𝑈𝛼, 𝑥𝛼), each consisting of an open set 𝑈𝛼 ⊆ 𝑀, the chart domain, and a continuous chart map 𝑥𝛼 ∶ 𝑈𝛼 → ℝ𝑚 that assigns Euclidean coordinates 𝐱 = 𝑥𝛼(𝑢) = (𝑥1, …, 𝑥𝑚) in ℝ𝑚 to each point 𝑢 ∈ 𝑈𝛼. To make 𝑀 into a differentiable manifold, we require the following:

(4.1)
1. The chart domains 𝑈𝛼 cover all of 𝑀.
2. The images 𝑉𝛼 = 𝑥𝛼(𝑈𝛼) are open sets in ℝ𝑚 for each index 𝛼.
3. The chart map 𝑥𝛼 is a bicontinuous bijection (a homeomorphism) from 𝑈𝛼 ⊆ 𝑀 to the open set 𝑉𝛼 = 𝑥𝛼(𝑈𝛼) in coordinate space ℝ𝑚.

The chart map 𝑥𝛼 ∶ 𝑈𝛼 → 𝑉𝛼 ⊆ ℝ𝑚 and its inverse 𝑥𝛼−1 ∶ 𝑉𝛼 → 𝑈𝛼 ⊆ 𝑀 are both continuous maps. Thus a locally Euclidean space 𝑀 looks like ℝ𝑚 near any base point 𝑝 ∈ 𝑀; in particular, Cartesian coordinates can be imposed on 𝑀 near 𝑝. The idea is obviously inspired by the way the spherical surface of the Earth is mapped in a geographic atlas, each page being a “chart” that describes the features of a particular region. There may of course be overlap between the regions covered by certain pages, but the way those regions fit together can be read off by comparing the appropriate pages. But more is needed to do calculus on 𝑀: in order to discuss derivatives of functions defined on 𝑀, the charts (𝑈𝛼, 𝑥𝛼) should be “differentiably related” whenever two chart domains 𝑈𝛼 and 𝑈𝛽 overlap.

Definition 4.1. For 1 ≤ 𝑘 ≤ ∞, a 𝒞 (𝑘) -structure on an 𝑚-dimensional locally Euclidean space 𝑀 is a family 𝒰 of charts (𝑈𝛼 , 𝑥𝛼 ), labeled by indices 𝛼 in some index set 𝐼, such that the chart domains 𝑈𝛼 cover 𝑀 and the coordinates on them are differentiably related whenever two chart domains overlap, so the



Figure 4.1. Coordinate transition maps 𝑥𝛼 ∘ 𝑦𝛽−1 and 𝑦𝛽 ∘ 𝑥𝛼−1 between charts (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) and their (shaded) domains of definition 𝑁𝛼, 𝑁𝛽 in ℝ𝑚 are shown. The domains 𝑁𝛼, 𝑁𝛽 live in different copies of coordinate space ℝ𝑚; both correspond to the intersection 𝑈𝛼 ∩ 𝑈𝛽 of the chart domains, which is an open set in the locally Euclidean space 𝑀.

coordinate transition maps between (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽),

𝑥𝛼 ∘ 𝑦𝛽−1 ∶ ℝ𝑚 → ℝ𝑚        and        𝑦𝛽 ∘ 𝑥𝛼−1 ∶ ℝ𝑚 → ℝ𝑚,

are of class 𝒞(𝑘) on the open sets in coordinate space ℝ𝑚 where they are defined. Such a family of charts 𝒰 is a “founding family” for a differentiable structure on 𝑀 that will make it a differentiable manifold. The meaning of the coordinate transition maps is illustrated in Figure 4.1. Here is a basic example of how this works.

Example 4.2. Consider the two-dimensional sphere 𝑀 = 𝑆² in ℝ3:

𝑀 = {𝐱 ∈ ℝ3 ∶ 𝑥1² + 𝑥2² + 𝑥3² = 1}.

One family of charts that cover 𝑀 is given by the pairs (𝐻𝑖±, 𝑃𝑖), where the 𝐻𝑖± are the six open hemispheres

𝐻𝑖+ = {𝐱 ∈ 𝑆² ∶ 𝑥𝑖 > 0}        and        𝐻𝑖− = {𝐱 ∈ 𝑆² ∶ 𝑥𝑖 < 0}        (1 ≤ 𝑖 ≤ 3)

and 𝑃𝑖 is the restriction to the sphere 𝑆² of the projection map 𝑃𝑖 ∶ ℝ3 → ℝ2,

𝑃1(𝐱) = (𝑥2, 𝑥3), …, 𝑃3(𝐱) = (𝑥1, 𝑥2).

Each 𝑃𝑖 simply erases one of the coordinates in 𝐱 = (𝑥1, 𝑥2, 𝑥3), as indicated in Figure 4.2. In each case, range(𝑃𝑖) is the open unit disc 𝐷 = {(𝑠1, 𝑠2) ∈ ℝ2 ∶ 𝑠1² + 𝑠2² < 1}. Obviously the restriction 𝑃𝑖|𝐻𝑖± ∶ 𝐻𝑖± → 𝐷 is one-to-one and continuous onto 𝐷


and is a 𝒞∞ map 𝑃𝑖 ∶ ℝ3 → ℝ2 whose inverse can be calculated explicitly. For instance, when 𝑖 = 3, we have

(𝑃3|𝐻3+)−1(𝑠1, 𝑠2) = (𝑠1, 𝑠2, √(1 − (𝑠1² + 𝑠2²)))        for all 𝐬 ∈ 𝐷,

which is clearly a 𝒞∞ map from 𝐷 ⊆ ℝ2 onto 𝐻3+ ⊆ ℝ3. The coordinate transition maps between two such charts, say 𝑥𝛼 = 𝑃3|𝐻3+ and 𝑦𝛽 = 𝑃2|𝐻2−, are then

𝑥𝛼 ∘ 𝑦𝛽−1(𝐬) = 𝑃3(𝑠1, −√(1 − (𝑠1² + 𝑠2²)), 𝑠2) = (𝑠1, −√(1 − (𝑠1² + 𝑠2²)))

and

𝑦𝛽 ∘ 𝑥𝛼−1(𝐬) = 𝑃2(𝑠1, 𝑠2, +√(1 − (𝑠1² + 𝑠2²))) = (𝑠1, +√(1 − (𝑠1² + 𝑠2²)))

for 𝐬 = (𝑠1, 𝑠2) ∈ 𝐷. Both are 𝒞∞ maps from ℝ2 → ℝ2.

Figure 4.2. One of the charts (𝑈𝛼, 𝑥𝛼) that determine the standard manifold structure for the smooth hypersurface 𝑀 = 𝑆², a two-dimensional sphere embedded in ℝ3. The chart domain is the hemisphere 𝑈𝛼 = 𝐻3+ = {𝐱 ∶ ‖𝐱‖ = 1 and 𝑥3 > 0}. The chart map 𝑥𝛼 = 𝑃3|𝐻3+, sending 𝐱 → (𝑥1, 𝑥2), projects the hemisphere 𝑈𝛼 onto an open disc 𝐷 of radius 𝑟 = 1 in the 𝑥1, 𝑥2-plane.
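The chart maps and transition map of this example are simple enough to verify numerically. A sketch (the sample point 𝐬 is an arbitrary choice):

```python
import numpy as np

# Charts on the unit sphere S^2 (Example 4.2): P3 on the hemisphere H3+
# (x3 > 0) and P2 on the hemisphere H2- (x2 < 0).
P3 = lambda x: np.array([x[0], x[1]])           # erase the third coordinate
P2 = lambda x: np.array([x[0], x[2]])           # erase the second coordinate

P3_inv = lambda s: np.array([s[0], s[1],  np.sqrt(1 - s[0]**2 - s[1]**2)])
P2_inv = lambda s: np.array([s[0], -np.sqrt(1 - s[0]**2 - s[1]**2), s[1]])

s = np.array([0.2, 0.4])                        # a point in the unit disc D

# Round trips within each chart:
assert np.allclose(P3(P3_inv(s)), s)
assert np.allclose(P2(P2_inv(s)), s)

# Transition map x_alpha o y_beta^{-1} = P3 o (P2|H2-)^{-1} agrees with
# the closed form derived above:
lhs = P3(P2_inv(s))
rhs = np.array([s[0], -np.sqrt(1 - s[0]**2 - s[1]**2)])
assert np.allclose(lhs, rhs)
```

Note that P2_inv(s) lands on the sphere with 𝑥2 < 0, so the composite really is the transition map between the overlapping chart domains 𝐻3+ ∩ 𝐻2−.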


A map 𝐲 = 𝑓(𝐱) = (𝑓1(𝐱), …, 𝑓𝑛(𝐱)) from ℝ𝑚 → ℝ𝑛 is “of class 𝒞(𝑘)” on an open set 𝑈 ⊆ ℝ𝑚 if all partial derivatives

𝐷𝑥^𝛼 𝑓𝑖(𝐱) = 𝐷𝑥1^{𝛼1} ⋯ 𝐷𝑥𝑚^{𝛼𝑚} 𝑓𝑖(𝐱),        𝛼 = (𝛼1, …, 𝛼𝑚) ∈ ℤ+𝑚,

of the scalar components 𝑓𝑖 ∶ 𝑈 → ℝ of the map 𝐲 = 𝑓(𝐱) exist and are continuous for 1 ≤ 𝑖 ≤ 𝑛 and for all 𝑚-tuples 𝛼 = (𝛼1, …, 𝛼𝑚) with |𝛼| = 𝛼1 + ⋯ + 𝛼𝑚 ≤ 𝑘. The “exponents” 𝛼 = (𝛼1, …, 𝛼𝑚) are called multi-indices, and 𝐷𝑥^𝛼 is a partial derivative of total degree |𝛼| = 𝛼1 + ⋯ + 𝛼𝑚. By convention, 𝐷𝑥𝑖^{𝛼𝑖} = 𝐼 (the identity operator) if 𝛼𝑖 = 0, and in particular the multi-index 𝛼 = (0, …, 0) yields 𝐷𝑥^𝛼 = 𝐼.

Once we have a “founding set” of 𝒞(𝑘)-related charts 𝒰 = {(𝑈𝛼, 𝑥𝛼) ∶ 𝛼 ∈ 𝐼} whose domains cover 𝑀, it determines a maximal atlas of consistent, differentiably related charts on 𝑀. The maximal atlas 𝒰̄ determined by 𝒰 consists of all charts (𝜙, 𝑊) with 𝜙 ∶ 𝑊 → ℝ𝑚, as in (4.1), such that (𝜙, 𝑊) is differentiably related to all the founding charts (𝑈𝛼, 𝑥𝛼) in 𝒰. The following observation shows that this makes sense.

Lemma 4.3. All charts (𝜙, 𝑊), (𝜓, 𝑊′) in the maximal atlas 𝒰̄ are 𝒞(𝑘)-related to each other, so the coordinate transition maps 𝜙 ∘ 𝜓−1 and 𝜓 ∘ 𝜙−1 from ℝ𝑚 → ℝ𝑚 are 𝒞(𝑘) wherever the chart domains 𝑊, 𝑊′ overlap.

Proof. This follows from the classical chain rule: if 𝑝 ∈ 𝑊 ∩ 𝑊′ and (𝑈𝛼, 𝑥𝛼) is a founding chart in 𝒰 whose domain contains 𝑝, we can split

𝜙 ∘ 𝜓−1 = (𝜙 ∘ 𝑥𝛼−1) ∘ (𝑥𝛼 ∘ 𝜓−1)

at points near 𝑝. The factors (𝜙 ∘ 𝑥𝛼−1), (𝑥𝛼 ∘ 𝜓−1) are both 𝒞(𝑘) maps between copies of ℝ𝑚 because 𝜙, 𝜓 are 𝒞(𝑘)-related to the founding chart (𝑈𝛼, 𝑥𝛼). □

The maximal atlas 𝒰̄ in Example 4.2 was determined by the six founding charts, but 𝒰̄ includes many other charts. For instance, determinations of spherical coordinates can be defined on various open sets 𝑈 ⊆ 𝑆² that avoid the north and south poles 𝑁 = (0, 0, 1) and 𝑆 = (0, 0, −1) to obtain admissible coordinate charts in the maximal atlas. For example, spherical coordinates (𝜃, 𝜙) can be defined on the open hemisphere 𝐻1+ as shown in Figure 4.3: If 𝑥1 > 0 for 𝐱 = (𝑥1, 𝑥2, 𝑥3) ∈ 𝑆², we can define a chart (𝑈𝛾, 𝑧𝛾) on 𝑈𝛾 = 𝐻1+ by taking

(𝜃, 𝜙) = 𝑧𝛾(𝐱) = ( arctan(𝑥2/𝑥1), arcsin(𝑥3) ),

so the transition map from (𝑈𝛼, 𝑥𝛼) = (𝐻3+, 𝑃3) to (𝑈𝛾, 𝑧𝛾) with 𝑈𝛾 = 𝐻1+ is

(𝜃, 𝜙) = 𝑧𝛾 ∘ 𝑥𝛼−1(𝑠1, 𝑠2) = 𝑧𝛾(𝑠1, 𝑠2, √(1 − (𝑠1² + 𝑠2²)))
        = ( arctan(𝑠2/𝑠1), arcsin √(1 − (𝑠1² + 𝑠2²)) ).

The inverse map can be handled similarly; both are 𝒞∞ from ℝ2 → ℝ2.
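This chart can also be checked numerically. The helper z_gamma_inv below is a hypothetical parametrization of 𝐻1+ by longitude and latitude (not defined in the text); the sketch confirms that 𝑧𝛾 inverts it:

```python
import numpy as np

# Spherical chart z_gamma on the hemisphere H1+ (x1 > 0), with the
# latitude-style convention used in the text:
def z_gamma(x):
    return np.array([np.arctan(x[1] / x[0]), np.arcsin(x[2])])

def z_gamma_inv(a):
    # Hypothetical inverse: point of S^2 with longitude theta, latitude phi.
    theta, phi = a
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

theta, phi = 0.5, -0.3
x = z_gamma_inv((theta, phi))
assert np.isclose(np.dot(x, x), 1.0)            # the point lands on S^2
assert x[0] > 0                                 # and lies in the domain H1+
assert np.allclose(z_gamma(x), [theta, phi])    # the chart inverts it
```

For 𝜃 ∈ (−𝜋/2, 𝜋/2) the arctangent recovers 𝜃 exactly, which is consistent with the restriction to the hemisphere 𝑥1 > 0.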


Figure 4.3. Spherical coordinates (𝜃, 𝜙) on the two-dimensional sphere 𝑆² = {𝐱 ∈ ℝ3 ∶ 𝑥1² + 𝑥2² + 𝑥3² = 1} are assigned as shown. We have chosen a somewhat nonstandard definition of 𝜙, one that agrees with the way latitude is assigned on the surface of the Earth. Both angles are ambiguous up to a multiple of 2𝜋 radians (= 360°); to get definite values, our convention restricts −𝜋 < 𝜃 < 𝜋 and −𝜋/2 < 𝜙 < 𝜋/2. The angles 𝜃 and 𝜙 cannot be defined at the north pole 𝑁 = (0, 0, 1) or south pole 𝑆 = (0, 0, −1) on the sphere.

Example 4.4 (Stereographic Projections on 𝑆²). Actually, the differentiable structure of the two-dimensional sphere 𝑆², and even that of the 𝑛-dimensional sphere

𝑆ⁿ = {𝐱 ∈ ℝ𝑛+1 ∶ 𝑥1² + ⋯ + 𝑥𝑛+1² = 1} in ℝ𝑛+1,

can be determined by just two founding charts that impose Euclidean coordinates on the “punctured spheres” 𝑈+ = 𝑆ⁿ ∼ {𝑆} and 𝑈− = 𝑆ⁿ ∼ {𝑁}, obtained by deleting the south pole 𝑆 = (0, …, −1) or the north pole 𝑁 = (0, …, 1), and mapping these open subsets of 𝑆ⁿ by stereographic projection onto the 𝑛-dimensional tangent planes passing through the opposite pole 𝑁 or 𝑆. Stereographic projection will be discussed at some length in Exercise 5.21 of Chapter 5, so we won’t say more about it here.

Smooth Functions and Mappings on a Manifold 𝑴. Once a manifold 𝑀 is equipped with a differentiable structure (a maximal atlas of differentiably related covering charts), we can define the following:
1. Smooth functions 𝑓 ∶ 𝑀 → ℝ.
2. Smooth parametrized curves 𝛾 ∶ ℝ → 𝑀.
3. Smooth mappings 𝜙 ∶ 𝑀 → 𝑁 between differentiable manifolds 𝑀 and 𝑁.


Definition 4.5. A function 𝑓 ∶ 𝑀 → ℝ is of class 𝒞∞, indicated by writing 𝑓 ∈ 𝒞∞(𝑀), if 𝑓 ∘ 𝑥𝛼−1 ∶ 𝑉𝛼 → ℝ is 𝒞∞ for every chart; i.e., 𝑓 is smooth (of class 𝒞∞ on 𝑀) if it is smooth in the classical sense when described in local chart coordinates. The same definition, applied to any open set 𝑈 ⊆ 𝑀, determines the space 𝒞∞(𝑈) of smooth scalar-valued functions on 𝑈. These are not only infinite-dimensional vector spaces; they are also associative algebras, since they are closed under forming pointwise products (𝑓 ⋅ ℎ)(𝑢) = 𝑓(𝑢)ℎ(𝑢), as well as scalar multiples 𝜆 ⋅ 𝑓, sums 𝑓1 + 𝑓2, and linear combinations ∑_{𝑖=1}^{𝑟} 𝑐𝑖𝑓𝑖. By the classical chain rule, smoothness of 𝑓 ∶ 𝑀 → ℝ does not depend on any particular choice of local coordinates. We now call attention to certain special families of functions associated with a base point 𝑝 ∈ 𝑀.

Definition 4.6 (The Local Algebra 𝒞∞(𝑝)). The local algebra 𝒞∞(𝑝) at a base point 𝑝 ∈ 𝑀 is the associative algebra of 𝒞∞ functions defined at and near 𝑝. A typical element of 𝒞∞(𝑝) is therefore a pair (𝑓, 𝑈) involving an open neighborhood 𝑈 of 𝑝 and a 𝒞∞ function 𝑓 ∶ 𝑈 → ℝ. We then define sum (+), pointwise product (⋅), and scaling operations on 𝒞∞(𝑝):

(𝑓1, 𝑈1) + (𝑓2, 𝑈2) = (𝑓1 + 𝑓2, 𝑈1 ∩ 𝑈2)
(𝑓1, 𝑈1) ⋅ (𝑓2, 𝑈2) = (𝑓1 ⋅ 𝑓2, 𝑈1 ∩ 𝑈2)        (pointwise product)
𝜆 ⋅ (𝑓, 𝑈) = (𝜆 ⋅ 𝑓, 𝑈)        for all 𝜆 ∈ ℝ.

The zero element in this vector space is the pair 𝟎 = (0, 𝑀), and 𝟏 = (1, 𝑀) is the multiplicative identity element, where 0 and 1 denote the constant functions on 𝑀. Note that 𝒞∞(𝑝) includes the algebra 𝒞∞(𝑀) of globally defined 𝒞∞ functions on 𝑀, as well as 𝒞∞(𝑈) for any open set 𝑈 that contains 𝑝.

Definition 4.7. A map 𝜙 ∶ 𝑀 → 𝑁 between differentiable manifolds 𝑀, 𝑁 is smooth near 𝑝 if it becomes a 𝒞∞ map from ℝ𝑚 → ℝ𝑛 when described in local coordinates, so that

𝑦𝛽 ∘ 𝜙 ∘ 𝑥𝛼−1 ∶ ℝ𝑚 → ℝ𝑛        is 𝒞∞ in the classical sense

for all charts (𝑈𝛼, 𝑥𝛼) about 𝑝 and (𝑈𝛽, 𝑦𝛽) about 𝑞 = 𝜙(𝑝). The map is 𝒞∞ on an open subset 𝑈 (or on all of 𝑀) if it is 𝒞∞ near each base point in 𝑈.

Definition 4.8. A parametrized curve 𝛾 ∶ ℝ → 𝑀 is of class 𝒞(𝑘) if, for all charts (𝑈𝛼, 𝑥𝛼) on 𝑀, 𝑥𝛼 ∘ 𝛾(𝑡) is a 𝒞(𝑘) map from ℝ → ℝ𝑚 on some open interval (𝑎, 𝑏) ⊆ ℝ. It is of class 𝒞(𝑘) on a closed interval [𝑎, 𝑏] if there is a slightly larger interval (𝑎 − 𝜖, 𝑏 + 𝜖) on which 𝛾 is defined and of class 𝒞(𝑘). Thus, in local coordinates, 𝛾 becomes a classical 𝒞(𝑘) curve

𝛾̃(𝑡) = 𝑥𝛼 ∘ 𝛾(𝑡) = (𝑥1(𝑡), …, 𝑥𝑚(𝑡))

from ℝ → ℝ𝑚 whose scalar components 𝑥𝑘(𝑡) are differentiable of class 𝒞(𝑘).


The classical derivative 𝑑𝛾/𝑑𝑡 of a smooth curve 𝛾 ∶ [𝑎, 𝑏] → ℝ^𝑚 is a vector in ℝ^𝑚:
\[
\frac{d\gamma}{dt}(t_0) = \lim_{\Delta t \to 0} \frac{\Delta\gamma}{\Delta t}
= \lim_{\Delta t \to 0} \frac{\gamma(t_0 + \Delta t) - \gamma(t_0)}{\Delta t}
= \Bigl( \frac{dx_1}{dt}(t_0), \dots, \frac{dx_m}{dt}(t_0) \Bigr). \tag{4.2}
\]
But this makes no sense for smooth curves 𝛾 ∶ [𝑎, 𝑏] → 𝑀 with values in a manifold; 𝑀 is not a vector space, and differences Δ𝛾 of points in 𝑀 are undefined. (How would you interpret the difference quotients Δ𝛾/Δ𝑡 in (4.2) for a curve that lives within some smooth curved hypersurface 𝑀 in ℝ^𝑛 without making any reference to the surrounding space ℝ^𝑛?) We can, however, make sense of 𝛾′(𝑡₀) as an operator 𝛾′(𝑡₀) ∶ 𝒞^∞(𝑝) → ℝ that acts as a directional derivative on the local algebra of 𝒞^∞ functions defined near 𝑝 = 𝛾(𝑡₀), letting
\[
\langle \gamma'(t_0), f \rangle = \frac{d}{dt}\{ f \circ \gamma(t) \}\Big|_{t=t_0}. \tag{4.3}
\]
The operator 𝛾′(𝑡₀) sends 𝑓 to the time-derivative of the values 𝑓(𝛾(𝑡)) seen as 𝑡 increases past 𝑡₀; it is a linear operator from 𝒞^∞(𝑝) → ℝ.

Given a chart (𝑈𝛼, 𝑥𝛼) about 𝑝, the action of 𝛾′(𝑡₀) on functions can be computed by writing 𝑓 ∘ 𝛾(𝑡) as a composite (𝑓 ∘ 𝑥𝛼⁻¹) ∘ (𝑥𝛼 ∘ 𝛾) ∶ ℝ → ℝ^𝑚 → ℝ and applying the classical chain rule. If 𝑥𝛼 ∘ 𝛾(𝑡) = (𝑥₁(𝑡), … , 𝑥𝑚(𝑡)), we get
\[
\begin{aligned}
\langle \gamma'(t_0), f \rangle
&= \frac{d}{dt}\{ f \circ \gamma(t) \}\Big|_{t=t_0}
= \frac{d}{dt}\{ (f \circ x_\alpha^{-1}) \circ (x_\alpha \circ \gamma) \}\Big|_{t=t_0} \\
&= \sum_{i=1}^{m} D_{x_i}(f \circ x_\alpha^{-1})(x_\alpha \circ \gamma(t_0)) \cdot \frac{dx_i}{dt}(t_0)
= \sum_{i=1}^{m} \bigl(D_{x_i}(f \circ x_\alpha^{-1})\bigr)(x_\alpha(p)) \cdot \frac{dx_i}{dt}(t_0) \\
&= \sum_{i=1}^{m} \frac{\partial f}{\partial x_i}(p) \cdot \frac{dx_i}{dt}(t_0).
\end{aligned} \tag{4.4}
\]
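The chain-rule identity (4.4) can be checked symbolically. A minimal sketch with sympy, using a hypothetical curve and test function chosen for illustration (neither is from the text):

```python
import sympy as sp

t = sp.symbols('t')
x1, x2 = sp.symbols('x1 x2')

# Hypothetical data: a smooth curve in a 2-dimensional chart,
# and a test function f described in chart coordinates.
curve = (sp.cos(t), sp.sin(t))       # x_alpha o gamma(t) = (x1(t), x2(t))
f = x1**2 * x2 + sp.exp(x2)          # f o x_alpha^{-1}

# Left side of (4.4): d/dt { f(gamma(t)) }
lhs = sp.diff(f.subs({x1: curve[0], x2: curve[1]}), t)

# Right side of (4.4): sum_i (D_{x_i} f)(gamma(t)) * dx_i/dt
rhs = sum(sp.diff(f, xi).subs({x1: curve[0], x2: curve[1]}) * sp.diff(ci, t)
          for xi, ci in zip((x1, x2), curve))

assert sp.simplify(lhs - rhs) == 0
```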

Notation. Here, as in earlier chapters, we use “bracket notation” ⟨ℓ, 𝑣⟩ to indicate the result ℓ(𝑣) when we combine a vector 𝑣 ∈ 𝑉 with a dual vector ℓ ∈ 𝑉 ∗ . This “bracket notation” turns out to be a good idea. ○ Definition 4.9. Any linear map ℓ ∶ 𝒞 ∞ (𝑝) → ℝ with the property ⟨ℓ, 𝑓 ⋅ ℎ⟩ = ⟨ℓ, 𝑓⟩ ⋅ ℎ(𝑝) + 𝑓(𝑝) ⋅ ⟨ℓ, ℎ⟩ is referred to as a derivation on the local algebra. If 𝛾 ∶ (𝑎, 𝑏) → 𝑀 is a smooth curve passing through 𝑝 ∈ 𝑀 when 𝑡 = 𝑡0 , the directional derivative along 𝛾, 𝛾′ (𝑡0 ) ∶ 𝒞 ∞ (𝑝) → ℝ, is an example of a derivation on 𝒞 ∞ (𝑝). With some effort, it can be shown that all derivations on the local algebra 𝒞 ∞ (𝑝) arise in this manner, but we omit these technical

4.1. TANGENT VECTORS, COTANGENT VECTORS, AND TENSORS


details. Given this, we arrive at the modern definition of tangent vector and tangent space for a differentiable manifold 𝑀.

Figure 4.4. The tangent space TM𝑝 to a smooth 𝑚-dimensional hypersurface 𝑀 ⊆ ℝ^𝑛 is a translate 𝑝 + 𝐸 of some 𝑚-dimensional vector subspace 𝐸 ⊆ ℝ^𝑛. Elements 𝐯 = (𝑝, 𝐚), 𝐰 = (𝑝, 𝐛) in TM𝑝 are indicated by arrows attached to the base point 𝑝. Scaling and vector addition in TM𝑝 are accomplished by parallel transport transferring arrows back to the corresponding vectors in 𝐸, which can be added via the usual parallelogram law for vector addition. This makes TM𝑝 an 𝑚-dimensional vector space.

Tangent Vectors and the Tangent Spaces TM𝑝. Classically, manifolds were taken to be smooth hypersurfaces 𝑀 of some dimension 𝑚 embedded in a larger Euclidean space ℝ^𝑛. Elements of the tangent space TM𝑝 at a base point 𝑝 ∈ 𝑀 were "tangent vectors," thought of as pairs (𝑝, 𝐚) consisting of a base point 𝑝 ∈ 𝑀 and an arrow 𝐚 ∈ ℝ^𝑚 attached to 𝑝. Vectors in TM𝑝 make up an 𝑚-dimensional hyperplane attached to 𝑀 at 𝑝, which is a translate TM𝑝 = 𝑝 + 𝐸, centered at 𝑝, of some 𝑚-dimensional vector subspace 𝐸 ⊆ ℝ^𝑛. Algebraic structure — addition and scaling operations on tangent vectors in TM𝑝 — was introduced by translating arrows 𝑝 + 𝐚, 𝑝 + 𝐛 attached to 𝑝 back to vectors 𝐚, 𝐛 in the vector subspace 𝐸, where they can be added or scaled. Their sum 𝐚 + 𝐛 was then translated back to the pair (𝑝, 𝐚 + 𝐛) at 𝑝. In effect, sums and scalar multiples of tangent vectors at 𝑝 were defined to be
\[
(p, \mathbf{a}) + (p, \mathbf{b}) = (p, \mathbf{a} + \mathbf{b}), \qquad
\lambda \cdot (p, \mathbf{a}) = (p, \lambda \cdot \mathbf{a}).
\]
These operations are not defined for tangent vectors attached to different base points in 𝑀.



Some Objections: Finding the Right Definition.
1. None of this makes sense if we regard 𝑀 as our entire universe of discourse and are forbidden to refer to objects lying in some mythical all-encompassing Euclidean space.
2. A more intrinsic approach (still regarding 𝑀 as an embedded hypersurface in ℝ^𝑛) is to recall that the velocity vector of a moving point in ℝ^𝑛 whose position is 𝛾(𝑡) at time 𝑡 is given by
\[
\frac{d\gamma}{dt}(t_0) = \lim_{\Delta t \to 0} \frac{\Delta\gamma}{\Delta t}
= \lim_{\Delta t \to 0} \frac{\gamma(t_0 + \Delta t) - \gamma(t_0)}{\Delta t}
\]
(derivative taken in ℝ^𝑛). When this velocity vector is viewed as an arrow attached to 𝑝 = 𝛾(𝑡₀) in 𝑀, it always lies in the classical tangent space to 𝑀 at 𝑝 because 𝛾(𝑡) remains within 𝑀 at all times. If we interpret tangent vectors at 𝑝 as velocity vectors for curves passing through 𝑝, this defines tangent vectors in terms of objects intrinsic to 𝑀 — smooth parametrized curves in 𝑀. But the derivative 𝑑𝛾/𝑑𝑡 is a limit of difference quotients Δ𝛾/Δ𝑡, which can't be computed without referring to the surrounding space ℝ^𝑛.
3. The tangent space to 𝑀 at 𝑝 is supposed to be a vector space. If smooth curves 𝛾₁, 𝛾₂ in 𝑀 pass through 𝑝 when 𝑡 = 𝑡₀, it is not immediately obvious how to find a curve 𝜂(𝑡) such that 𝜂(𝑡₀) = 𝑝 and 𝜂′(𝑡₀) is the sum 𝛾₁′(𝑡₀) + 𝛾₂′(𝑡₀) of the tangent vectors. There is also an ambiguous relationship between curves through 𝑝 and tangent vectors at 𝑝 since different curves through 𝑝 can determine the same classical tangent vector 𝑑𝛾/𝑑𝑡.

Another approach is already implicit in these comments. We shall view tangent vectors 𝛾′(𝑡₀) as "directional derivatives" of functions 𝑓 ∶ 𝑀 → ℝ defined near 𝑝 = 𝛾(𝑡₀), obtained by computing the time derivative of the values of 𝑓 seen by the moving point 𝛾(𝑡). From this point of view, 𝛾′(𝑡₀) becomes a linear operator
\[
\gamma'(t_0) : f \mapsto \langle \gamma'(t_0), f \rangle = \frac{d}{dt}\{ f(\gamma(t)) \}\Big|_{t=t_0}
\]
acting on elements 𝑓 in the local algebra 𝒞^∞(𝑝), as in (4.3).
The objects 𝑓, 𝛾(𝑡), and 𝛾′ (𝑡0 ) are now defined on 𝑀 without assuming that 𝑀 lives within some “universal Euclidean space.” Definition 4.10. If 𝑀 is a smooth manifold and 𝑝 ∈ 𝑀 is a base point, tangent vectors at 𝑝 are regarded as the derivations on the local algebra 𝒞 ∞ (𝑝), and the tangent space TM𝑝 at 𝑝 is the set of all such derivations. It is a vector space because linear combinations of derivations on 𝒞 ∞ (𝑝) are again derivations. Derivations on 𝒞 ∞ (𝑝) determined by directional derivatives along curves in 𝑀 have the following properties: 1. Derivation Property. Directional derivatives 𝛾′ (𝑡0 ) along smooth curves through 𝑝 act as linear maps from 𝒞 ∞ (𝑝) to ℝ — i.e., they are linear functionals in the dual space 𝑉 ∗ of the ∞-dimensional vector



space 𝑉 = 𝒞^∞(𝑝). The classical product formula for derivatives in one variable,
\[
\frac{d}{dt}\{(f \cdot h)(t)\} = \frac{df}{dt} \cdot h(t) + f(t) \cdot \frac{dh}{dt},
\]
shows that directional derivatives along a curve are also derivations on 𝒞^∞(𝑝) because we have (𝑓 ⋅ ℎ)(𝛾(𝑡)) = (𝑓 ∘ 𝛾(𝑡)) ⋅ (ℎ ∘ 𝛾(𝑡)) for all 𝑡. Then
\[
\begin{aligned}
\langle \gamma'(t_0), f \cdot h \rangle
&= \frac{d}{dt}\{ (f \circ \gamma(t)) \cdot (h \circ \gamma(t)) \}\Big|_{t=t_0} \\
&= \frac{d}{dt}\{ f \circ \gamma(t) \}\Big|_{t=t_0} \cdot h(p) + f(p) \cdot \frac{d}{dt}\{ h \circ \gamma(t) \}\Big|_{t=t_0} \\
&= \langle \gamma'(t_0), f \rangle \cdot h(p) + f(p) \cdot \langle \gamma'(t_0), h \rangle
\end{aligned} \tag{4.5}
\]
for all 𝑓, ℎ ∈ 𝒞^∞(𝑝).
2. Vector Space Structure. Sums ℓ₁ + ℓ₂ and scalar multiples 𝜆 ⋅ ℓ of derivations are again derivations on 𝒞^∞(𝑝), so the tangent space TM𝑝 is a vector space over ℝ.
3. Local Operators Property. Directional derivatives 𝛾′(𝑡₀) are local operators on 𝒞^∞(𝑝): the outcome ⟨𝛾′(𝑡₀), 𝑓⟩ depends only on the behavior of 𝑓 in the immediate vicinity of 𝑝 = 𝛾(𝑡₀). In particular, if 𝑓 ≡ ℎ on any open neighborhood of 𝑝, then ⟨𝛾′(𝑡₀), 𝑓⟩ = ⟨𝛾′(𝑡₀), ℎ⟩, and if 𝑓 ≡ 0 near 𝑝, we get ⟨𝛾′(𝑡₀), 𝑓⟩ = 0.

Which curves passing through 𝑝 when 𝑡 = 0 determine the same operation on 𝒞^∞(𝑝)? If we describe such curves 𝛾(𝑡), 𝜂(𝑡) in local coordinates, say with
\[
\mathbf{x}(t) = x_\alpha \circ \gamma(t) = (x_1(t), \dots, x_m(t)), \qquad
\mathbf{y}(t) = x_\alpha \circ \eta(t) = (y_1(t), \dots, y_m(t)),
\]
then 𝛾′(𝑡₀) = 𝜂′(𝑡₀) are equal as operations on 𝒞^∞(𝑝) if and only if the "first-order terms" agree, so that
\[
\mathbf{x}(t_0) = \mathbf{y}(t_0), \qquad
\frac{dx_1}{dt}(t_0) = \frac{dy_1}{dt}(t_0), \;\dots,\; \frac{dx_m}{dt}(t_0) = \frac{dy_m}{dt}(t_0).
\]
This is clear from equation (4.4), from which it also follows that the higher-order derivatives of 𝐱(𝑡) and 𝐲(𝑡) are irrelevant. These definitions are framed in a way that does not favor any particular system of local coordinates near 𝑝 in the maximal atlas. In this sense, all definitions so far are coordinate-independent.

Differential Operators 𝝏/𝝏𝒙ᵢ on a Manifold. Given a chart (𝑈𝛼, 𝑥𝛼) on 𝑀, certain derivations on 𝒞^∞(𝑝) correspond to the familiar partial derivatives of calculus. If 𝑝 in 𝑀 corresponds under the chart map to 𝐩 = 𝑥𝛼(𝑝) = (𝑝₁, … , 𝑝𝑚)



in coordinate space ℝ^𝑚, we can define parametrized straight lines 𝛾̃ᵢ(𝑡) in ℝ^𝑚 such that 𝛾̃ᵢ(0) = 𝐩 by
\[
\tilde{\gamma}_i(t) = \mathbf{p} + t\mathbf{e}_i = (p_1, \dots, p_i + t, \dots, p_m) = (x_1(t), \dots, x_m(t))
\]
for 1 ≤ 𝑖 ≤ 𝑚, where 𝐞₁, … , 𝐞𝑚 are the standard basis vectors in ℝ^𝑚. These curves in coordinate space can be transferred to 𝒞^∞ curves 𝛾ᵢ(𝑡) = 𝑥𝛼⁻¹ ∘ 𝛾̃ᵢ(𝑡) in 𝑀 that pass through 𝑝 when 𝑡 = 0. Directional derivatives along these curves are then derivations 𝛾ᵢ′(0) on 𝒞^∞(𝑝), which will hereafter be denoted by the symbol
\[
\Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr) : \mathcal{C}^\infty(p) \to \mathbb{R}
\qquad \text{for } 1 \le i \le m.
\]

These operators act on functions that live on the manifold 𝑀, while the classical partial derivatives 𝐷ₓᵢ act on functions that live on coordinate space ℝ^𝑚. The effect of (𝜕/𝜕𝑥ᵢ|𝑝) on an element 𝑓 ∈ 𝒞^∞(𝑝) will often be indicated using "bracket notation," writing
\[
\frac{\partial f}{\partial x_i}(p) = \Bigl\langle \frac{\partial}{\partial x_i}\Big|_p \,,\, f \Bigr\rangle
\quad \text{for } f \in \mathcal{C}^\infty(U_\alpha) \text{ and } 1 \le i \le m. \tag{4.6}
\]

Note also that a chart (𝑈𝛼, 𝑥𝛼) simultaneously determines correlated tangent vectors 𝑋𝑝 = (𝜕/𝜕𝑥ᵢ|𝑝) for 1 ≤ 𝑖 ≤ 𝑚 at every base point 𝑝 ∈ 𝑈𝛼. The result is a "field of tangent vectors" on the chart domain 𝑈𝛼 for each 𝑖. If we let the base point 𝑝 vary within 𝑀, we obtain scalar-valued functions 𝜕𝑓/𝜕𝑥ᵢ(𝑢) = ⟨𝛾ᵢ′(0), 𝑓⟩ defined for all 𝑢 in the chart domain 𝑈𝛼. A simplified version of (4.4) tells us how to compute these partial derivatives 𝜕𝑓/𝜕𝑥ᵢ on the manifold.

Lemma 4.11. If (𝑈𝛼, 𝑥𝛼) is a chart on 𝑀 and 𝑓 ∈ 𝒞^∞(𝑈𝛼), the partial derivatives 𝜕𝑓/𝜕𝑥ᵢ on the manifold are given by
\[
\frac{\partial f}{\partial x_i}(p) = \bigl( D_{x_i}(f \circ x_\alpha^{-1}) \bigr) \circ x_\alpha(p),
\qquad 1 \le i \le m,
\]
for all 𝑝 ∈ 𝑈𝛼. Thus we get 𝜕𝑓/𝜕𝑥ᵢ on a chart in 𝑀 in three steps:
• Transfer 𝑓(𝑢) in 𝒞^∞(𝑀) to a 𝒞^∞ function (𝑓 ∘ 𝑥𝛼⁻¹)(𝐱) on ℝ^𝑚.
• Take the classical partial derivative 𝐷ₓᵢ(𝑓 ∘ 𝑥𝛼⁻¹)(𝐱) on ℝ^𝑚.
• Bring the result back to 𝑀 to get
\[
\frac{\partial f}{\partial x_i}(u) = \bigl( D_{x_i}(f \circ x_\alpha^{-1}) \bigr)(x_\alpha(u))
\]
for 𝑢 ∈ 𝑈𝛼.

Proof. Write 𝐩 = 𝑥𝛼(𝑝) and 𝐱(𝑡) = 𝑥𝛼 ∘ 𝛾ᵢ(𝑡) = (𝑥₁(𝑡), … , 𝑥𝑚(𝑡)). Then factor 𝑓 ∘ 𝛾ᵢ = (𝑓 ∘ 𝑥𝛼⁻¹) ∘ (𝑥𝛼 ∘ 𝛾ᵢ) to get maps ℝ → ℝ^𝑚 → ℝ, to which we may



apply the classical chain rule to get
\[
\begin{aligned}
\frac{\partial f}{\partial x_i}(p) = \Bigl\langle \frac{\partial}{\partial x_i}\Big|_p , f \Bigr\rangle
&= \langle \gamma_i'(0), f \rangle
= \frac{d}{dt}\{ f \circ \gamma_i(t) \}\Big|_{t=0}
= \frac{d}{dt}\{ (f \circ x_\alpha^{-1})(x_\alpha(\gamma_i(t))) \}\Big|_{t=0} \\
&= \frac{d}{dt}\{ (f \circ x_\alpha^{-1})(p_1, \dots, p_i + t, \dots, p_m) \}\Big|_{t=0} \\
&= \sum_{j=1}^{m} \Bigl[ \bigl(D_{x_j}(f \circ x_\alpha^{-1})\bigr)(p_1, \dots, p_i + t, \dots, p_m) \cdot \frac{dx_j}{dt}(t) \Bigr]\Big|_{t=0} \\
&= 0 + \dots + 1 \cdot D_{x_i}(f \circ x_\alpha^{-1})(\mathbf{p}) + \dots + 0
= D_{x_i}(f \circ x_\alpha^{-1}) \circ x_\alpha(p)
\end{aligned}
\]
because 𝑥𝛼 ∘ 𝛾ᵢ(𝑡) = (𝑝₁, … , 𝑝ᵢ + 𝑡, … , 𝑝𝑚) if 𝐩 = 𝑥𝛼(𝑝) = 𝑥𝛼 ∘ 𝛾ᵢ(0). □
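The three-step recipe of Lemma 4.11 can be tried on a concrete example. A sympy sketch under an illustrative assumption: take 𝑀 to be the unit circle with angle chart, so 𝑥𝛼⁻¹(𝜃) = (cos 𝜃, sin 𝜃), and let 𝑓 be the first ambient coordinate restricted to 𝑀 (a hypothetical choice, not from the text):

```python
import sympy as sp

theta = sp.symbols('theta')

# Step 1: transfer f to coordinate space:  f o x_alpha^{-1}(theta) = cos(theta)
f_transferred = sp.cos(theta)

# Step 2: take the classical derivative on coordinate space
classical = sp.diff(f_transferred, theta)

# Step 3: read the result back on M as a function of the point via theta = x_alpha(u)
assert classical == -sp.sin(theta)
```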

We will often invoke the following example.

Example 4.12. If (𝑈𝛼, 𝑥𝛼) is a chart on a differentiable manifold, the chart map 𝑥𝛼 can be written as
\[
\mathbf{x} = x_\alpha(u) = (X_1(u), \dots, X_m(u)),
\]
whose components 𝑋𝑘(𝑢) are scalar-valued 𝒞^∞ functions 𝑋𝑘 ∶ 𝑈𝛼 → ℝ on the chart domain 𝑈𝛼. It is amusing to compute 𝜕𝑋𝑘/𝜕𝑥ⱼ on 𝑈𝛼 as an exercise in understanding how the notation works. We claim that
\[
\frac{\partial X_k}{\partial x_j}(u) \equiv \delta_{kj}
\quad \text{for all } u \in U_\alpha \text{ and all } j, k, \tag{4.7}
\]
where 𝛿𝑘ⱼ is the Kronecker delta symbol, equal to 1 if 𝑘 = 𝑗 and 0 if 𝑗 ≠ 𝑘.

Discussion. Since
\[
\mathbf{x} = (x_1, \dots, x_m) = x_\alpha(x_\alpha^{-1}(\mathbf{x}))
= \bigl( X_1(x_\alpha^{-1}(\mathbf{x})), \dots, X_m(x_\alpha^{-1}(\mathbf{x})) \bigr),
\]
we have 𝑋𝑘 ∘ 𝑥𝛼⁻¹(𝐱) ≡ 𝑥𝑘 for all 𝐱 in the open set 𝑉𝛼 = 𝑥𝛼(𝑈𝛼) ⊆ ℝ^𝑚. Applying Lemma 4.11 and standard facts from calculus, we get
\[
\frac{\partial X_k}{\partial x_j}(u)
= \Bigl\langle \frac{\partial}{\partial x_j}\Big|_u , X_k \Bigr\rangle
= \bigl( D_{x_j}(X_k \circ x_\alpha^{-1}) \bigr)(x_\alpha(u))
= D_{x_j}(x_k)\Big|_{\mathbf{x} = x_\alpha(u)} = \delta_{kj}
\]
for all 𝑢 ∈ 𝑈𝛼. ○

If (𝑈𝛼, 𝑥𝛼) is a chart on 𝑀 and 𝑝 ∈ 𝑀, we can evaluate the directional derivative of a function 𝑓 ∈ 𝒞^∞(𝑝) along a smooth curve 𝛾(𝑡) through 𝑝 by passing the problem over to coordinate space, where the outcome can be determined using the familiar tools of multivariate calculus. The final result is an easily applied formula involving the functions 𝜕𝑓/𝜕𝑥ᵢ on chart domains in 𝑀.

Corollary 4.13. Let (𝑈𝛼, 𝑥𝛼) be a chart on 𝑀, 𝑝 a base point in 𝑈𝛼, and 𝑓 ∈ 𝒞^∞(𝑝). If 𝛾(𝑡) is a 𝒞^∞ curve such that 𝛾(0) = 𝑝, whose description in local coordinates has the form
\[
x_\alpha(\gamma(t)) = (x_1(t), \dots, x_m(t))
\quad \text{with } \mathcal{C}^\infty \text{ coefficients } x_i(t),
\]



then the tangent vector 𝑋𝑝 = 𝛾′(0) determined by differentiating along 𝛾(𝑡) is given by
\[
\langle X_p, f \rangle = \langle \gamma'(0), f \rangle
= \sum_{j=1}^{m} \frac{dx_j}{dt}(0) \cdot \frac{\partial f}{\partial x_j}(p)
= \Bigl\langle \sum_{j=1}^{m} \frac{dx_j}{dt}(0) \cdot \Bigl( \frac{\partial}{\partial x_j}\Big|_p \Bigr),\, f \Bigr\rangle
\]
for all 𝑓 ∈ 𝒞^∞(𝑝). Hence
\[
X_p = \sum_{j=1}^{m} \frac{dx_j}{dt}(0) \cdot \Bigl( \frac{\partial}{\partial x_j}\Big|_p \Bigr)
\quad \text{in } \mathrm{TM}_p. \tag{4.8}
\]

Proof. After factoring 𝑓 ∘ 𝛾 = (𝑓 ∘ 𝑥𝛼⁻¹) ∘ (𝑥𝛼 ∘ 𝛾), apply the chain rule to get
\[
\langle \gamma'(t), f \rangle = \frac{d}{dt}\{ f \circ \gamma(t) \}
= \frac{d}{dt}\{ f \circ x_\alpha^{-1}(x_1(t), \dots, x_m(t)) \}
= \sum_{j=1}^{m} \frac{dx_j}{dt}(t) \cdot \frac{\partial f}{\partial x_j}(\gamma(t)).
\]
Then set 𝑡 = 0. □

Corollary 4.14. If (𝑈𝛼, 𝑥𝛼) is a chart on 𝑀, the vectors
\[
\mathfrak{X} = \Bigl\{ \Bigl( \frac{\partial}{\partial x_1}\Big|_p \Bigr), \dots, \Bigl( \frac{\partial}{\partial x_m}\Big|_p \Bigr) \Bigr\}
\]
are a basis for the tangent space TM𝑝 for every 𝑝 ∈ 𝑈𝛼. In particular, dim(TM𝑝) = 𝑚, so the space of derivations on 𝒞^∞(𝑝) is finite dimensional even though the local algebra 𝒞^∞(𝑝) itself is infinite dimensional.

Proof. With some effort, one can prove that every local derivation 𝐷 ∶ 𝒞^∞(𝑝) → ℝ is a directional derivative 𝛾′(0) along some (not necessarily unique) smooth curve such that 𝛾(0) = 𝑝. The identity (4.8) shows that the vectors in 𝔛 span TM𝑝. They are also independent: if there are 𝑐ᵢ such that $0 = \sum_{i=1}^{m} c_i\,(\partial/\partial x_i|_u)$ for all 𝑢 ∈ 𝑈𝛼, we may apply this functional on 𝒞^∞(𝑈𝛼) to each of the scalar components 𝑋𝑘(𝑢) of the chart map 𝑥𝛼(𝑢) = (𝑋₁(𝑢), … , 𝑋𝑚(𝑢)) to get
\[
0 = \sum_{i=1}^{m} c_i \Bigl\langle \frac{\partial}{\partial x_i}\Big|_u , X_k \Bigr\rangle
= \sum_{i=1}^{m} c_i \cdot \delta_{ik} = c_k
\quad \text{for } k = 1, 2, \dots, m.
\]
Thus, 𝔛 is a basis for TM𝑝 at every base point 𝑝 ∈ 𝑈𝛼. □



Change of Coordinates. At each point 𝑢 in the chart domain 𝑈𝛼, a chart (𝑈𝛼, 𝑥𝛼) determines basis vectors (𝜕/𝜕𝑥ᵢ|𝑢) and partial derivatives 𝜕𝑓/𝜕𝑥ᵢ(𝑢). A different chart (𝑈𝛽, 𝑦𝛽) about 𝑝 will assign other basis vectors (𝜕/𝜕𝑦ᵢ|𝑢) and partial derivatives 𝜕𝑓/𝜕𝑦ᵢ(𝑢) at points where the domains overlap. We often need to pass descriptions of these constructs from one coordinate system to another.

Given overlapping charts (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) on 𝑀, the scalar components in 𝑥𝛼 = (𝑋₁, … , 𝑋𝑚) and 𝑦𝛽 = (𝑌₁, … , 𝑌𝑚) are 𝒞^∞ functions on the chart domains. For future reference, we determine the coordinate transition maps 𝑥𝛼 ∘ 𝑦𝛽⁻¹ and 𝑦𝛽 ∘ 𝑥𝛼⁻¹ shown in Figure 1.1. They are defined on the open sets in ℝ^𝑚 that correspond to 𝑈 = 𝑈𝛼 ∩ 𝑈𝛽 in 𝑀 under the chart maps 𝑥𝛼 and 𝑦𝛽. A point 𝐱 ∈ 𝑥𝛼(𝑈) maps to
\[
\mathbf{y} = (y_1, \dots, y_m) = y_\beta \circ x_\alpha^{-1}(\mathbf{x})
= \bigl( Y_1 \circ x_\alpha^{-1}(\mathbf{x}), \dots, Y_m \circ x_\alpha^{-1}(\mathbf{x}) \bigr)
\]
in 𝑦𝛽(𝑈), so 𝑦ⱼ = 𝑌ⱼ ∘ 𝑥𝛼⁻¹(𝐱). The vector-valued map 𝐲 = 𝐹(𝐱) = 𝑦𝛽 ∘ 𝑥𝛼⁻¹(𝐱) from ℝ^𝑚 to ℝ^𝑚 is 𝒞^∞, and its classical Jacobian matrix is
\[
[DF(\mathbf{x})] = [D_{x_j}(y_i)] = \bigl[ D_{x_j}(Y_i \circ x_\alpha^{-1})(\mathbf{x}) \bigr]
= \frac{\partial(Y_1 \circ x_\alpha^{-1}, \dots, Y_m \circ x_\alpha^{-1})}{\partial(x_1, \dots, x_m)} \tag{4.9}
\]
for 𝐱 ∈ 𝑥𝛼(𝑈). In the reverse direction, we have
\[
\mathbf{x} = (x_1, \dots, x_m) = G(\mathbf{y}) = x_\alpha \circ y_\beta^{-1}(\mathbf{y})
= (X_1 \circ y_\beta^{-1}, \dots, X_m \circ y_\beta^{-1}),
\]
and this inverse map has Jacobian matrix
\[
[DG(\mathbf{y})] = [D_{y_j}(x_i)] = \bigl[ D_{y_j}(X_i \circ y_\beta^{-1})(\mathbf{y}) \bigr]
\quad \text{for } \mathbf{y} \in y_\beta(U). \tag{4.10}
\]

The transformation law (4.11) below will let us write derivatives 𝜕𝑓/𝜕𝑥ᵢ computed with respect to one chart on 𝑀 in terms of the partial derivatives 𝜕𝑓/𝜕𝑦ⱼ for another chart. We will make heavy use of these laws. Note that the resulting formulas are cast entirely in terms of functions that live on the manifold 𝑀.

Theorem 4.15 (Change of Variable for Partial Derivatives). Let (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) be overlapping charts on a manifold 𝑀. For any 𝒞^∞ function 𝑓 on 𝑀, we have
\[
\frac{\partial f}{\partial x_i}(u)
= \sum_{j=1}^{m} \frac{\partial f}{\partial y_j}(u) \cdot \frac{\partial Y_j}{\partial x_i}(u)
\quad \text{for all } u \in U_\alpha \cap U_\beta, \tag{4.11}
\]
where 𝑌ⱼ ∈ 𝒞^∞(𝑈𝛽) are the scalar components of the chart map 𝑦𝛽 = (𝑌₁, … , 𝑌𝑚).

Proof. The proof is a mildly strenuous exercise in applying the classical chain rule (and our definition of the operators 𝜕/𝜕𝑥ᵢ on the manifold). For 𝑢 ∈ 𝑈𝛼 ∩ 𝑈𝛽, we have
\[
\frac{\partial f}{\partial x_i}(u)
= \bigl( D_{x_i}(f \circ x_\alpha^{-1}) \bigr) \circ x_\alpha(u)
= \bigl[ D_{x_i}\bigl( (f \circ y_\beta^{-1}) \circ (y_\beta \circ x_\alpha^{-1}) \bigr) \bigr] \circ x_\alpha(u).
\]



Insertion of 𝑦𝛽⁻¹ ∘ 𝑦𝛽 is a crucial step that makes the map 𝑓 ∘ 𝑥𝛼⁻¹ ∶ ℝ^𝑚 → ℝ a composite of smooth maps
\[
\mathbb{R}^m \xrightarrow{\;\phi\;} \mathbb{R}^m \xrightarrow{\;F\;} \mathbb{R}
\]
between Euclidean spaces, with 𝐹 = 𝑓 ∘ 𝑦𝛽⁻¹ and 𝜙 = 𝑦𝛽 ∘ 𝑥𝛼⁻¹. The classical chain rule can then be applied to obtain
\[
D_{x_i}(F \circ \phi)(\mathbf{x}) = \sum_{j=1}^{m} D_{y_j}F(\phi(\mathbf{x})) \cdot D_{x_i}y_j(\mathbf{x}),
\]
where 𝐲 = 𝜙(𝐱) = (𝑦₁(𝐱), … , 𝑦𝑚(𝐱)) maps ℝ^𝑚 → ℝ^𝑚. Now set 𝐹 = 𝑓 ∘ 𝑦𝛽⁻¹ and 𝐲 = 𝜙(𝐱) = 𝑦𝛽 ∘ 𝑥𝛼⁻¹(𝐱) mapping ℝ^𝑚 → ℝ^𝑚 to get
\[
\begin{aligned}
\frac{\partial f}{\partial x_i}(u)
&= \sum_{j=1}^{m} \bigl[ (D_{y_j}(f \circ y_\beta^{-1})) \circ (y_\beta \circ x_\alpha^{-1}) \bigr](x_\alpha(u)) \cdot \bigl[ D_{x_i}\bigl( (y_\beta \circ x_\alpha^{-1})_j \bigr) \bigr](x_\alpha(u)) \\
&= \sum_{j=1}^{m} \bigl[ \bigl( (D_{y_j}(f \circ y_\beta^{-1})) \circ y_\beta \bigr)(u) \bigr] \cdot \bigl[ D_{x_i}(Y_j \circ x_\alpha^{-1}) \bigr](x_\alpha(u)) \\
&= \sum_{j=1}^{m} \frac{\partial f}{\partial y_j}(u) \cdot \frac{\partial Y_j}{\partial x_i}(u)
\qquad \text{(definition of } \partial f/\partial y_j \text{ and } \partial Y_j/\partial x_i \text{).}
\end{aligned}
\]
□
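The change-of-variable law (4.11) can be verified in coordinates for a specific pair of charts. A sympy sketch, using a hypothetical polar-type transition valid on the region 𝑥, 𝑦 > 0 (an illustrative assumption, not an example from the text):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
r, th = sp.symbols('r theta', positive=True)

# Hypothetical transition: Y_1 = r(x, y), Y_2 = theta(x, y) on x, y > 0.
R = sp.sqrt(x**2 + y**2)
TH = sp.atan(y / x)
g = r**2 * sp.cos(th)             # f described in the y-chart (illustrative)

f_in_x = g.subs({r: R, th: TH})   # the same f described in the x-chart

# (4.11):  df/dx = df/dr * dY_1/dx + df/dtheta * dY_2/dx
lhs = sp.diff(f_in_x, x)
rhs = (sp.diff(g, r).subs({r: R, th: TH}) * sp.diff(R, x)
       + sp.diff(g, th).subs({r: R, th: TH}) * sp.diff(TH, x))

assert sp.simplify(lhs - rhs) == 0
```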

Corollary 4.16 (Change of Variable Formula for Operators). The operators 𝜕/𝜕𝑥ᵢ and 𝜕/𝜕𝑦ⱼ determined by overlapping charts (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) on a manifold 𝑀 transform in the following way:
\[
\frac{\partial}{\partial x_i}\Big|_u
= \sum_{j=1}^{m} \frac{\partial Y_j}{\partial x_i}(u) \cdot \frac{\partial}{\partial y_j}\Big|_u
\quad \text{for } 1 \le i \le m \text{ and } u \in U_\alpha \cap U_\beta. \tag{4.12}
\]

These "change of variable" formulas are easy to remember: the 𝑌ⱼ and 𝑦ⱼ should "cancel" when the formula is written correctly, leaving only terms that involve 𝜕/𝜕𝑥ᵢ.

Given overlapping charts (𝑈𝛼, 𝑥𝛼), (𝑈𝛽, 𝑦𝛽) on 𝑀, the entries in the Jacobian matrices (4.9) and (4.10) are 𝒞^∞ scalar-valued functions on open subsets of coordinate space ℝ^𝑚:
\[
\Bigl[ \frac{\partial y_i}{\partial x_j}(\mathbf{x}) \Bigr]
= \bigl[ D_{x_j}(Y_i \circ x_\alpha^{-1}) \bigr]
\quad \text{at points } \mathbf{x} \in x_\alpha(U_\alpha) \subseteq \mathbb{R}^m,
\]
\[
\Bigl[ \frac{\partial x_i}{\partial y_j}(\mathbf{y}) \Bigr]
= \bigl[ D_{y_j}(X_i \circ y_\beta^{-1}) \bigr]
\quad \text{at points } \mathbf{y} \in y_\beta(U_\beta) \subseteq \mathbb{R}^m.
\]
These matrices are inverses of each other when evaluated at base points 𝐱, 𝐲 in ℝ^𝑚 that correspond under the transition maps. This relation between Jacobian matrices becomes clearer if we move these matrix-valued functions from coordinate space ℝ^𝑚 to the manifold 𝑀 itself. The resulting matrix-valued functions



defined on 𝑀,
\[
\Bigl[ \frac{\partial y_i}{\partial x_j} \circ x_\alpha \Bigr]
= \Bigl[ \frac{\partial(Y_i \circ x_\alpha^{-1})}{\partial x_j} \circ x_\alpha \Bigr]
= \Bigl[ \frac{\partial Y_i}{\partial x_j} \Bigr] \quad \text{(by Lemma 4.11)},
\]
\[
\Bigl[ \frac{\partial x_i}{\partial y_j} \circ y_\beta \Bigr]
= \Bigl[ \frac{\partial(X_i \circ y_\beta^{-1})}{\partial y_j} \circ y_\beta \Bigr]
= \Bigl[ \frac{\partial X_i}{\partial y_j} \Bigr] \quad \text{(Lemma 4.11 again)},
\]
are then matrix-inverses of each other at every base point in 𝑈𝛼 ∩ 𝑈𝛽.

Exercise 4.17. Prove that the transferred Jacobian matrices on 𝑀 are inverses of each other at every base point in 𝑈𝛼 ∩ 𝑈𝛽, so
\[
\Bigl[ \frac{\partial Y_i}{\partial x_j}(u) \Bigr] \cdot \Bigl[ \frac{\partial X_i}{\partial y_j}(u) \Bigr] = I_{m \times m}
\quad \text{for all } u \in U_\alpha \cap U_\beta. \tag{4.13}
\]

Hint. By Example 4.12, 𝜕𝑌ᵢ/𝜕𝑦ⱼ = 𝛿ᵢⱼ; rewrite 𝜕/𝜕𝑦ⱼ in terms of the 𝜕/𝜕𝑥𝑘 as in (4.12).

Vector Fields as Differential Operators on 𝑴. On 𝑀, or any open subset thereof, a vector field 𝑋̃ is a map that assigns a tangent vector 𝑋𝑝 ∈ TM𝑝 at each 𝑝 ∈ 𝑀. By Corollary 4.13, on any chart domain 𝑈𝛼, there are uniquely determined coefficients 𝑐ᵢ(𝑢) such that
\[
\tilde{X}_u = \sum_{i=1}^{m} c_i(u) \cdot \Bigl( \frac{\partial}{\partial x_i}\Big|_u \Bigr)
\quad \text{for } u \in U_\alpha.
\]

These coefficients will change if we pass to another coordinate chart, but smoothness of the coefficients is always preserved.

Lemma 4.18. If $X_u = \sum_i c_i(u)\,(\partial/\partial x_i|_u)$ for 𝑢 near base point 𝑝 ∈ 𝑈𝛼 when described in local coordinates (𝑈𝛼, 𝑥𝛼), then in any other chart (𝑈𝛽, 𝑦𝛽) containing 𝑝, we have
\[
X_u = \sum_{j=1}^{m} d_j(u) \cdot \Bigl( \frac{\partial}{\partial y_j}\Big|_u \Bigr)
\]
with coefficients
\[
d_j(u) = \sum_{k=1}^{m} c_k(u) \cdot \frac{\partial Y_j}{\partial x_k}(u),
\]
where the 𝑌ⱼ are the scalar components of 𝑦𝛽 = (𝑌₁(𝑢), … , 𝑌𝑚(𝑢)).

Proof. This is immediate from equation (4.11). □

A smooth vector field 𝑋˜ is a field 𝑝 ↦ 𝑋𝑝 of tangent vectors on 𝑀 whose description in local coordinates has 𝒞 ∞ coefficients for every chart. It follows from Lemma 4.18 that smoothness of 𝑋˜ on 𝑀 has a coordinate-independent meaning because the scalar components 𝑌𝑗 (𝑢) of 𝑦𝛽 and their derivatives 𝜕𝑌𝑗 /𝜕𝑥𝑘 are smooth functions on 𝑀.



Corollary 4.19. If 𝑋̃ is a vector field on 𝑀 whose description in local coordinates is $X_u = \sum_i c_i(u)\,(\partial/\partial x_i|_u)$ with coefficients that are 𝒞^∞ near 𝑝, then the same will be true on 𝑈𝛼 ∩ 𝑈𝛽 for any other chart (𝑈𝛽, 𝑦𝛽).

The set of smooth vector fields on 𝑀 is denoted 𝒟^(1,0)(𝑀), which becomes a vector space over ℝ if we define
\[
(\tilde{X} + \tilde{Y})_p = X_p + Y_p \quad \text{and} \quad (\lambda \tilde{X})_p = \lambda \cdot X_p \quad \text{for } p \in M.
\]
This ∞-dimensional space is also a "𝒞^∞(𝑀)-module" because there is a natural action 𝒞^∞(𝑀) × 𝒟^(1,0)(𝑀) → 𝒟^(1,0)(𝑀) defined by pointwise multiplication,
\[
(f \cdot \tilde{X})_p = f(p) \cdot X_p \quad \text{for all } p.
\]
Similarly, an action 𝒟^(1,0)(𝑀) × 𝒞^∞(𝑀) → 𝒞^∞(𝑀) is obtained by letting a vector field 𝑋̃ act on functions in the following way:
\[
\tilde{X}f(u) = \langle X_u, f \rangle \quad \text{for all } u \in M. \tag{4.14}
\]
When 𝑋̃𝑓 is described in local chart coordinates, 𝑋̃ becomes an operator that acts on smooth functions on coordinate space ℝ^𝑚 as a first-order partial differential operator with variable coefficients and no scalar term 𝑎₀(𝑢)𝐼. In fact, if (𝑈𝛼, 𝑥𝛼) is a chart on 𝑀 and we describe 𝑋̃ in chart coordinates, we get
\[
X_u = \sum_i c_i(u) \cdot \Bigl( \frac{\partial}{\partial x_i}\Big|_u \Bigr)
\quad \text{with } c_i(u) \in \mathcal{C}^\infty(U_\alpha),
\]
and then
\[
\tilde{X}f(u) = \langle X_u, f \rangle
= \sum_{i=1}^{m} c_i(u) \cdot \Bigl\langle \frac{\partial}{\partial x_i}\Big|_u , f \Bigr\rangle
= \sum_{i=1}^{m} c_i(u) \cdot \frac{\partial f}{\partial x_i}(u)
\]
is in 𝒞^∞(𝑈𝛼).

When smooth vector fields are regarded as differential operators on 𝑀, they have their own "global derivation" property. In addition to being linear operators 𝑋̃ ∶ 𝒞^∞(𝑀) → 𝒞^∞(𝑀), they are linear derivations on 𝒞^∞(𝑀) with
\[
\tilde{X}(f \cdot h)(u) = \tilde{X}f(u) \cdot h(u) + f(u) \cdot \tilde{X}h(u)
\]
for 𝑢 ∈ 𝑀 and 𝑓, ℎ ∈ 𝒞^∞(𝑀). This follows directly from Definition 4.5.

Another important property involves the support sets of functions 𝑓 in 𝒞^∞(𝑀):
\[
\operatorname{supp}(f) = \text{closure in } M \text{ of the set } \{u \in M : f(u) \neq 0\}
= \text{complement in } M \text{ of the open set } \{u \in M : f \equiv 0 \text{ near } u\}.
\]

Proposition 4.20 (Reduction of Supports). The action of a smooth vector field 𝑋̃ can only decrease the support set supp(𝑓) of a function 𝑓 ∈ 𝒞^∞(𝑀), so that
\[
f \equiv 0 \text{ on an open set } U \subseteq M \;\Rightarrow\; \tilde{X}(f) \equiv 0 \text{ on } U.
\]
Proof. If 𝑓 ≡ 0 near 𝑝, write 𝑋̃ in local chart coordinates to see that 𝑋̃𝑓 ≡ 0 near 𝑝. □
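The action (4.14) of a vector field as a first-order operator, and its derivation (Leibniz) property, can be checked with a small sympy sketch; the field's coefficients 𝑐ᵢ below are hypothetical choices:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# An illustrative vector field X = c1 * d/dx1 + c2 * d/dx2 in one chart.
c = (x2, -x1)

def X(f):
    """Apply the field as in (4.14): X f = sum_i c_i * df/dx_i."""
    return sum(ci * sp.diff(f, xi) for ci, xi in zip(c, (x1, x2)))

f, h = x1**2 + x2, sp.sin(x1 * x2)

# X is a derivation: X(f*h) = (Xf)*h + f*(Xh) pointwise.
assert sp.simplify(X(f * h) - (X(f) * h + f * X(h))) == 0
```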



By looking at differences 𝑓 − ℎ, we see that 𝑓 ≡ ℎ on an open set 𝑈 ⊆ 𝑀 implies 𝑋̃𝑓 ≡ 𝑋̃ℎ on 𝑈 for all 𝑓, ℎ ∈ 𝒞^∞(𝑀). Thus, smooth vector fields 𝑋̃ are local operators on 𝒞^∞(𝑀): the value of 𝑋̃𝑓 at a point 𝑝 is determined only by the behavior of 𝑓 in the immediate vicinity of 𝑝.

The Differential of a Map Between Manifolds. If 𝜙 ∶ 𝑀 → 𝑁 is a 𝒞^∞ map, 𝑝 ∈ 𝑀, and 𝑞 = 𝜙(𝑝) in 𝑁, then under the action of 𝜙, we have the following:
• Points 𝑝 ∈ 𝑀 get moved forward to points 𝑞 = 𝜙(𝑝) in 𝑁.
• Functions 𝑓 ∈ 𝒞^∞(𝑁) get "pulled back" to functions 𝜙^T𝑓 = 𝑓 ∘ 𝜙 in 𝒞^∞(𝑀).
• Tangent vectors 𝑋𝑝 in TM𝑝 get "pushed forward" to vectors (𝑑𝜙)𝑝(𝑋𝑝) in TN𝑞 by the differential of 𝜙 at 𝑝, the linear map (𝑑𝜙)𝑝 from TM𝑝 to TN𝑞 defined as follows.

Definition 4.21 (The Differential 𝑑𝜙𝑝 ∶ TM𝑝 → TN_𝜙(𝑝)). For 𝑋𝑝 ∈ TM𝑝 and 𝑓 ∈ 𝒞^∞_𝑁(𝜙(𝑝)), we define (𝑑𝜙)𝑝𝑋𝑝 in TN_𝜙(𝑝) to be the unique tangent vector 𝑌𝑞 at 𝑞 = 𝜙(𝑝) such that
\[
\langle Y_q, f \rangle = \langle (d\phi)_p X_p, f \rangle = \langle X_p, f \circ \phi \rangle
\]
for all 𝒞^∞ functions 𝑓 defined on 𝑁 near 𝜙(𝑝).

It is obvious that 𝑌𝑞 = (𝑑𝜙)𝑝𝑋𝑝 is a derivation on 𝒞^∞_𝑁(𝜙(𝑝)), so it is in TN𝑞; it is also obvious that (𝑑𝜙)𝑝 is a linear map from TM𝑝 → TN_𝜙(𝑝), which we call the "differential of 𝜙 at 𝑝." None of these definitions refer to coordinates on 𝑀 or 𝑁.

To calculate the effect of (𝑑𝜙)𝑝, we impose local charts (𝑈𝛼, 𝑥𝛼) about 𝑝 and (𝑈𝛽, 𝑦𝛽) about 𝑞 = 𝜙(𝑝), which determine bases 𝔛 = {(𝜕/𝜕𝑥ᵢ|𝑝)} and 𝔜 = {(𝜕/𝜕𝑦ⱼ|𝑞)} in TM𝑝 and TN𝑞. Since (𝑑𝜙)𝑝 ∶ TM𝑝 → TN_𝜙(𝑝) is linear, its action is completely determined once we know what it does to these basis vectors. The end result is a law similar to the change of coordinates rule in Corollary 4.16 to Theorem 4.15.

Theorem 4.22 (Transformation Law for Tangent Vectors). If 𝜙 ∶ 𝑀 → 𝑁 is a 𝒞^∞ map, 𝑝 ∈ 𝑀, and 𝑞 = 𝜙(𝑝), let (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) be charts about 𝑝 and 𝑞 respectively. If the scalar components of the chart map 𝑦𝛽 on 𝑁 are given by 𝑦𝛽(𝑣) = (𝑌₁(𝑣), … , 𝑌𝑛(𝑣)) for 𝑣 ∈ 𝑈𝛽, then
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr)
= \sum_{j=1}^{n} \frac{\partial(Y_j \circ \phi)}{\partial x_i}(p) \cdot \Bigl( \frac{\partial}{\partial y_j}\Big|_{\phi(p)} \Bigr)
\quad \text{for } 1 \le i \le m. \tag{4.15}
\]
Proof. On any chart (𝑈𝛽, 𝑦𝛽) in 𝑁, the scalar components of 𝑦𝛽 have the property 𝜕𝑌ᵢ/𝜕𝑦ⱼ ≡ 𝛿ᵢⱼ (Kronecker delta)



throughout the chart domain 𝑈𝛽. The image in TN_𝜙(𝑝) of any basis vector (𝜕/𝜕𝑥ᵢ|𝑝) in TM𝑝 can be written
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr)
= \sum_{j=1}^{n} c_j(q) \cdot \Bigl( \frac{\partial}{\partial y_j}\Big|_q \Bigr)
\qquad (q = \phi(p)).
\]
By (4.7) the coefficients 𝑐ⱼ(𝑞) can be recovered by bracketing with the scalar components 𝑌ⱼ of the chart map 𝑦𝛽, so
\[
\begin{aligned}
c_j(q) &= \Bigl\langle (d\phi)_p \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr),\, Y_j \Bigr\rangle \\
&= \Bigl\langle \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr),\, Y_j \circ \phi \Bigr\rangle
\qquad \text{(Definition 4.21 of } (d\phi)_p\text{)} \\
&= \frac{\partial(Y_j \circ \phi)}{\partial x_i}(p).
\end{aligned}
\]
That proves the identity (4.15). □

Note the formal cancellation of "𝑦ⱼ" and "𝑌ⱼ" when the formula is written correctly. Note also that we recover the change of variable formula (Corollary 4.16) when 𝑀 = 𝑁 and 𝜙 = id_𝑀.

Exercise 4.23. Let $M \xrightarrow{\;\psi\;} N \xrightarrow{\;\phi\;} Q$ be 𝒞^∞ maps between manifolds. If 𝑝 ∈ 𝑀 and 𝑞 = 𝜓(𝑝) in 𝑁, explain why the composite 𝜙 ∘ 𝜓 ∶ 𝑀 → 𝑄 is a 𝒞^∞ map and prove that
\[
d(\phi \circ \psi)_p = (d\phi)_{\psi(p)} \circ (d\psi)_p
\]
as maps TM𝑝 → TN_𝜓(𝑝) → TQ_𝜙(𝜓(𝑝)).

Next, we show how to calculate the differential (𝑑𝜙)𝑝 ∶ TM𝑝 → TN_𝜙(𝑝) of a smooth mapping between manifolds.

Example 4.24. The complex variable "squaring map" 𝑤 = 𝜙(𝑧) = 𝑧² takes the form
\[
u + iv = \phi(x, y) = (x + iy)^2 = (x^2 - y^2) + i(2xy) = (x^2 - y^2,\, 2xy)
\]
in Cartesian coordinates when we write 𝑧 = 𝑥 + 𝑖𝑦 and 𝑤 = 𝑢 + 𝑖𝑣. This is a map 𝜙 ∶ 𝑀 → 𝑁 between two copies of the complex plane ℂ, each of which can be regarded as a 𝒞^∞ manifold whose structure is determined by a single global chart:
• For 𝑀, take (𝑈𝛼, 𝑥𝛼) with 𝑈𝛼 = 𝑀 and (𝑥, 𝑦) = 𝑥𝛼(𝑧) = (𝑋(𝑧), 𝑌(𝑧)) in ℝ², where 𝑋(𝑧) = 𝑥 and 𝑌(𝑧) = 𝑦 if 𝑧 = 𝑥 + 𝑖𝑦.
• For 𝑁, take (𝑈𝛽, 𝑦𝛽) with 𝑈𝛽 = 𝑁 and (𝑢, 𝑣) = 𝑦𝛽(𝑤) = (𝑈(𝑤), 𝑉(𝑤)) in ℝ², where 𝑈(𝑤) = 𝑢 and 𝑉(𝑤) = 𝑣 if 𝑤 = 𝑢 + 𝑖𝑣.
At a typical point 𝑧 = 𝑥 + 𝑖𝑦 in 𝑀, the identity (𝑢, 𝑣) = 𝑦𝛽 ∘ 𝜙 ∘ 𝑥𝛼⁻¹(𝑥, 𝑦) forces the scalar components of the chart maps 𝑥𝛼 = (𝑋, 𝑌) and 𝑦𝛽 = (𝑈, 𝑉) to satisfy



the relations
\[
u = X^2(z) - Y^2(z) = x^2 - y^2, \qquad v = 2\,X(z)Y(z) = 2xy,
\]
where 𝑤 = 𝑢 + 𝑖𝑣 = 𝑧². The players involved are shown in the following diagram:
\[
\begin{array}{ccc}
z = x + iy \in \mathbb{C} & \xrightarrow{\;\phi\;} & w = u + iv = z^2 \in \mathbb{C} \\
\big\downarrow\, x_\alpha & & \big\downarrow\, y_\beta \\
(x, y) = (X(z), Y(z)) \in \mathbb{R}^2 & \xrightarrow{\;\Phi \,=\, y_\beta \circ \phi \circ x_\alpha^{-1}\;} & (u, v) = (U(w), V(w)) \in \mathbb{R}^2
\end{array}
\]
where 𝑈(𝑤) = 𝑥² − 𝑦² and 𝑉(𝑤) = 2𝑥𝑦. If 𝑝 ∈ 𝑀 and 𝑞 = 𝜙(𝑝) ∈ 𝑁, the charts 𝑥𝛼, 𝑦𝛽 determine basis vectors (𝜕/𝜕𝑥|𝑝), (𝜕/𝜕𝑦|𝑝) and (𝜕/𝜕𝑢|𝑞), (𝜕/𝜕𝑣|𝑞) in the tangent spaces TM𝑝, TN𝑞, which are two-dimensional over ℝ. To compute (𝑑𝜙)𝑝, we describe 𝜙 in these local coordinates. The 𝒞^∞ map Φ = 𝑦𝛽 ∘ 𝜙 ∘ 𝑥𝛼⁻¹ ∶ ℝ² → ℝ² between coordinate spaces is
\[
(u, v) = \Phi(x, y) = y_\beta \circ \phi \circ x_\alpha^{-1}(x, y)
= y_\beta(\phi(x + iy)) = (x^2 - y^2,\, 2xy),
\]
so its Jacobian matrix is
\[
[D\Phi(x, y)] = \frac{\partial(u, v)}{\partial(x, y)}
= \begin{bmatrix} 2x & -2y \\ 2y & 2x \end{bmatrix}.
\]
If 𝑝 = 𝑥 + 𝑖𝑦 in 𝑀, so 𝑥𝛼(𝑝) = (𝑋(𝑝), 𝑌(𝑝)) = (𝑥, 𝑦), basis vectors in TM𝑝 are transformed to vectors in TN_𝜙(𝑝) as in (4.15):
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial x}\Big|_p \Bigr)
= \frac{\partial(U \circ \phi)}{\partial x}(p) \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr)
+ \frac{\partial(V \circ \phi)}{\partial x}(p) \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr).
\]
But on 𝑈𝛼 ⊆ 𝑀, we have 𝜕𝑋/𝜕𝑥 ≡ 1, 𝜕𝑋/𝜕𝑦 ≡ 0, etc. (courtesy of Example 4.12). Hence
\[
\frac{\partial(U \circ \phi)}{\partial x} = \frac{\partial}{\partial x}(X^2 - Y^2)
= 2X \cdot \frac{\partial X}{\partial x} - 2Y \cdot \frac{\partial Y}{\partial x} = 2X,
\]
\[
\frac{\partial(V \circ \phi)}{\partial x} = \frac{\partial}{\partial x}(2XY)
= 2Y \cdot \frac{\partial X}{\partial x} + 2X \cdot \frac{\partial Y}{\partial x} = 2Y,
\]
so at 𝑝 = 𝑥 + 𝑖𝑦 in 𝑀, we have
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial x}\Big|_p \Bigr)
= 2X(p) \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr) + 2Y(p) \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr)
= 2x \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr) + 2y \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr) \tag{4.16}
\]
where 𝜙(𝑝) = 𝑝². As a numerical example, suppose 𝑝 = 2 + 𝑖 in 𝑀, so 𝜙(𝑝) = (2 + 𝑖)² = 3 + 4𝑖, and (4.16) becomes
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial x}\Big|_p \Bigr)
= 4 \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr) + 2 \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr).
\]



A similar calculation yields
\[
(d\phi)_p \Bigl( \frac{\partial}{\partial y}\Big|_p \Bigr)
= -2Y(p) \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr) + 2X(p) \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr)
= -2 \cdot \Bigl( \frac{\partial}{\partial u}\Big|_{\phi(p)} \Bigr) + 4 \cdot \Bigl( \frac{\partial}{\partial v}\Big|_{\phi(p)} \Bigr). \tag{4.17}
\]
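The coefficient vectors in (4.16) and (4.17) are exactly the columns of the Jacobian [𝐷Φ] evaluated at 𝑥𝛼(𝑝), which gives a quick cross-check. A sympy sketch for the squaring map at 𝑝 = 2 + 𝑖:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Phi = y_beta o phi o x_alpha^{-1} for the squaring map, in coordinates.
Phi = sp.Matrix([x**2 - y**2, 2 * x * y])
J = Phi.jacobian([x, y])

# At p = 2 + i, i.e. (x, y) = (2, 1):
Jp = J.subs({x: 2, y: 1})
assert Jp.col(0) == sp.Matrix([4, 2])     # image of d/dx|_p, as in (4.16)
assert Jp.col(1) == sp.Matrix([-2, 4])    # image of d/dy|_p, as in (4.17)
```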

Given this action on basis vectors, the action of (𝑑𝜙)𝑝 ∶ TM𝑝 → TN_𝜙(𝑝) on arbitrary tangent vectors is determined by linearity.

4.2. Cotangent Vectors and Differential Forms

As in Chapter 3 of LA I, the dual space 𝑉* of a vector space 𝑉 over ℝ is the space of linear functionals on 𝑉 — i.e., the linear maps ℓ ∶ 𝑉 → ℝ. If dim(𝑉) < ∞, then dim(𝑉*) = dim(𝑉), and any basis 𝔛 = {𝑒₁, … , 𝑒ₙ} in 𝑉 induces a dual basis 𝔛* = {𝑒₁*, … , 𝑒ₙ*} in 𝑉* that is determined by the property
\[
\langle e_i^*, e_j \rangle = \delta_{ij} \quad \text{(Kronecker delta symbol)}.
\]
If $v = \sum_{i=1}^{n} c_i e_i$ in 𝑉, the dual functional 𝑒𝑘* reads the 𝑘th coefficient 𝑐𝑘 of 𝑣, so ⟨𝑒𝑘*, 𝑣⟩ = 𝑐𝑘, and if $\ell = \sum_{j=1}^{n} d_j e_j^*$ in 𝑉*, its coefficients can be found by bracketing ℓ with the basis vectors 𝑒ⱼ to get
\[
d_j = \Bigl\langle \sum_{k=1}^{n} d_k e_k^*, \; e_j \Bigr\rangle = \langle \ell, e_j \rangle
\quad \text{for } 1 \le j \le n.
\]

For differentiable manifolds, the tangent spaces TM𝑝 and their dual spaces TM𝑝* both play crucial roles in differential geometry.

Definition 4.25. The cotangent space at 𝑝 ∈ 𝑀 is the dual space TM𝑝* to the tangent space TM𝑝. For reasons that will gradually emerge, this space is often referred to as the space of 1-forms, or rank-1 differential forms, on TM𝑝 and is often denoted by $\bigwedge^1(\mathrm{TM}_p^*)$. By definition, the field of scalars ℝ is regarded as the space of rank-0 differential forms on TM𝑝, and we denote this 1-dimensional space by $\bigwedge^0(\mathrm{TM}_p^*)$.

Notation. Given a chart (𝑈𝛼, 𝑥𝛼) on 𝑀 and a point 𝑝 ∈ 𝑀, we get a basis
\[
\mathfrak{X}_p = \Bigl\{ \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr) \Bigr\} \text{ in } \mathrm{TM}_p
\quad \text{and a dual basis} \quad
\mathfrak{X}_p^* = \Bigl\{ \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr)^{\!*} \Bigr\} \text{ in } \mathrm{TM}_p^*.
\]
For various reasons, these dual vectors have come to be denoted by other symbols that may seem peculiar at first, but the notation will grow on you as its advantages become apparent. Hereafter, we shall write
\[
(dx_i)_p \equiv \Bigl( \frac{\partial}{\partial x_i}\Big|_p \Bigr)^{\!*} \tag{4.18}
\]



so the dual basis determined by a chart can be written succinctly as
\[
(dx_1)_p, \dots, (dx_m)_p
\quad \text{instead of} \quad
\Bigl( \frac{\partial}{\partial x_1}\Big|_p \Bigr)^{\!*}, \dots, \Bigl( \frac{\partial}{\partial x_m}\Big|_p \Bigr)^{\!*}.
\]
By definition of "dual basis," we then have
\[
\Bigl\langle (dx_i)_p \,,\, \Bigl( \frac{\partial}{\partial x_j}\Big|_p \Bigr) \Bigr\rangle = \delta_{ij}
\quad \text{(Kronecker delta)} \tag{4.19}
\]
for 1 ≤ 𝑖, 𝑗 ≤ 𝑚. ○

The Canonical d-Operator on Functions. Each 𝑓 ∈ 𝒞^∞(𝑝) determines an element (𝑑𝑓)𝑝 ∈ TM𝑝*, the exterior derivative of 𝑓 at 𝑝. This is defined via a construction that makes no mention of local coordinates.

Definition 4.26 (Exterior Derivative of a Function). If 𝑓 ∶ 𝑀 → ℝ is a smooth function defined near base point 𝑝 in a manifold 𝑀, its exterior derivative at 𝑝 is the linear map (𝑑𝑓)𝑝 ∶ TM𝑝 → ℝ, a cotangent vector in TM𝑝* determined by the identity
\[
\langle (df)_p, X_p \rangle = \langle X_p, f \rangle \tag{4.20}
\]
for all 𝑓 ∈ 𝒞^∞(𝑝) and 𝑋𝑝 ∈ TM𝑝.

On any chart domain 𝑈𝛼, the outcome is described in local coordinates by writing (𝑑𝑓)𝑝 as a linear combination of the dual basis vectors (𝑑𝑥₁)𝑝, … , (𝑑𝑥𝑚)𝑝 determined by the chart (𝑈𝛼, 𝑥𝛼).

Proposition 4.27. Given a chart (𝑈𝛼, 𝑥𝛼) and a function 𝑓 ∈ 𝒞^∞(𝑈𝛼), the exterior derivative (𝑑𝑓)𝑝 ∈ TM𝑝* has the following description in local chart coordinates at every base point 𝑝 ∈ 𝑈𝛼:
\[
(df)_p = \sum_{i=1}^{m} \frac{\partial f}{\partial x_i}(p) \cdot (dx_i)_p. \tag{4.21}
\]

Proof. At each 𝑝 ∈ 𝑈𝛼, there exist unique coefficients such that $(df)_p = \sum_{i=1}^{m} c_i(p)\,(dx_i)_p$ in TM𝑝*. To determine 𝑐ᵢ(𝑝), simply apply (𝑑𝑓)𝑝 to the basis vectors (𝜕/𝜕𝑥ᵢ|𝑝) in TM𝑝; by (4.18), (4.20), and (4.21), we get
\[
c_i(p) = \Bigl\langle (df)_p \,,\, \frac{\partial}{\partial x_i}\Big|_p \Bigr\rangle
= \Bigl\langle \frac{\partial}{\partial x_i}\Big|_p \,,\, f \Bigr\rangle
= \frac{\partial f}{\partial x_i}(p)
\]
for 1 ≤ 𝑖 ≤ 𝑚. □
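Formula (4.21) reduces the computation of (𝑑𝑓)𝑝 to ordinary partial derivatives in the chart. A quick sympy sketch in a hypothetical 3-dimensional chart (the function 𝑓 and the point 𝑝 are illustrative choices):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
coords = (x1, x2, x3)

# A hypothetical smooth function described in chart coordinates.
f = x1 * sp.exp(x2) + x3**2

# Coefficients of (df)_p in the dual basis (dx_1)_p, ..., (dx_m)_p, per (4.21):
df_coeffs = [sp.diff(f, xi) for xi in coords]

p = {x1: 1, x2: 0, x3: 2}
assert [c.subs(p) for c in df_coeffs] == [1, 1, 4]
```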

It is important to observe that (𝑑𝑓)𝑝 has the same general form in all coordinate systems. For example, given Cartesian coordinates 𝐱 = 𝑥𝛼(𝑢) = (𝑥, 𝑦) and polar coordinates 𝐲 = 𝑦𝛽(𝑢) = (𝑟, 𝜃) on an open subset 𝑈 ⊆ 𝑀 = ℝ², we have
\[
(df)_u = \frac{\partial f}{\partial x}(u) \cdot (dx)_u + \frac{\partial f}{\partial y}(u) \cdot (dy)_u
= \frac{\partial f}{\partial r}(u) \cdot (dr)_u + \frac{\partial f}{\partial \theta}(u) \cdot (d\theta)_u
\]
for all 𝑢 where the charts overlap. Notice the resemblance between the description of (𝑑𝑓)𝑝 in (4.21) and the classical gradient $\nabla f(p) = \sum_{i=1}^{n} D_{x_i} f(p)\, \mathbf{e}_i$ of a



smooth function 𝑓 ∶ ℝⁿ → ℝ. This is not an accident. In the theory of manifolds, the classical gradient operation ∇𝑓 is obtained by applying the exterior derivative $d : \mathcal{C}^\infty(M) \to \bigwedge^1(\mathrm{TM}_p^*)$; then ∇𝑓 becomes the 1-form 𝑑𝑓 of (4.21). Every cotangent vector 𝜔𝑝 ∈ TM𝑝* is equal to (𝑑𝑓)𝑝 for some 𝑓 ∈ 𝒞^∞(𝑝), but the 𝑓 is not unique.

Lemma 4.28. The map 𝑑 ∶ 𝒞^∞(𝑝) → TM𝑝* that sends 𝑓 ↦ (𝑑𝑓)𝑝 is linear and surjective but not one-to-one. The nontrivial kernel is
\[
\ker(d) = \Bigl\{ f \in \mathcal{C}^\infty(p) :
\frac{\partial f}{\partial x_1}(p) = \dots = \frac{\partial f}{\partial x_m}(p) = 0 \Bigr\}.
\]
Thus (𝑑𝑓)𝑝 = 0 in $\bigwedge^1(\mathrm{TM}_p^*)$ if and only if 𝑝 is a critical point for 𝑓 ∶ ℝ^𝑚 → ℝ when 𝑓 is described in local coordinates.

Proof. If we have 𝜕𝑓/𝜕𝑥ᵢ(𝑝) = 0 for 1 ≤ 𝑖 ≤ 𝑚 for one chart (𝑈𝛼, 𝑥𝛼) about 𝑝, then by (4.21) this must be true for every other chart (𝑈𝛽, 𝑦𝛽) about 𝑝. Thus, the property "𝑝 is a critical point for 𝑓" is independent of any choice of local coordinates.

Surjectivity follows for an interesting reason. The scalar component functions 𝑋ᵢ(𝑢) of the chart map 𝑥𝛼(𝑢) = (𝑋₁(𝑢), … , 𝑋𝑚(𝑢)) are in 𝒞^∞(𝑈𝛼). But by (4.7) we get

(𝑑𝑋𝑖 )𝑢 = (𝑑𝑥𝑖 )𝑢

In fact, when we apply (𝑑𝑋𝑖 )𝑝 ∈ (4.7) yields

TM∗𝑝

for 1 ≤ 𝑖 ≤ 𝑚.

to a typical basis vector in TM𝑝 , equation 𝑚

𝑚

𝜕𝑋𝑖 (𝑢) ⋅ (𝑑𝑥𝑗 )𝑢 = ∑ 𝛿𝑖𝑗 ⋅ (𝑑𝑥𝑗 )𝑢 = (𝑑𝑥𝑖 )𝑢 . (𝑑𝑋𝑖 )𝑢 = ∑ 𝜕𝑥𝑗 𝑗=1 𝑗=1 ∗

Thus (dX_i)_u and the dual vector (dx_i)_u = (∂/∂x_i|_u)* determined by the chart (U_α, x_α) are the same element in TM*_u for all u ∈ U_α, despite their very different origins. It is now clear that the range of d : 𝒞^∞(p) → TM*_p contains a basis for TM*_p, and therefore d is surjective.  □

Let ω : p → ω_p ∈ TM*_p be a field of 1-forms on M. Given a chart (U_α, x_α), there are unique coefficients c_j(p) such that

(4.23)    ω_p = ∑_{j=1}^m c_j(p) · (dx_j)_p    in TM*_p for p ∈ U_α.

By (4.19) the coefficients can be recovered by bracketing ω_p with the basis vectors {∂/∂x_i|_p} in TM_p determined by the chart.

Definition 4.29. We say ω is a smooth field of 1-forms, or a rank-1 differential form on M, if for every chart (U_α, x_α) the coefficient functions c_k(u) are in 𝒞^∞(U_α). As we will show in Theorem 4.30 below, this definition is coordinate-independent.

4.2. COTANGENT VECTORS AND DIFFERENTIAL FORMS

The set of such fields on M is denoted by 𝒟^(0,1)(M), or sometimes by ⋀^1 M, depending on the context. This becomes an infinite-dimensional vector space if sums and scalar multiples in 𝒟^(0,1)(M) are given by

(λω)_p = λ · ω_p    and    (ω + μ)_p = ω_p + μ_p    in TM*_p

for all p ∈ M and fields ω, μ ∈ 𝒟^(0,1)(M). This space of 1-forms is also a 𝒞^∞(M)-module under the action 𝒞^∞(M) × 𝒟^(0,1)(M) → 𝒟^(0,1)(M) given by

(f · ω)_p = f(p) · ω_p    for p ∈ M, f ∈ 𝒞^∞(M), ω ∈ 𝒟^(0,1)(M).

The following change of variable law for 1-forms implies that smoothness is a coordinate-independent property of a field of cotangent vectors. The proof follows easily from the previously established transformation laws for tangent vectors, Theorem 4.15 or Corollary 4.16.

Theorem 4.30 (Change of Variable for Cotangent Vectors). Cotangent vectors (dx_i)_u and (dy_j)_u determined by charts (U_α, x_α) and (U_β, y_β) on a manifold M transform in the following way:

(dy_i)_u = ∑_{j=1}^m (∂Y_i/∂x_j)(u) · (dx_j)_u    in TM*_u, for 1 ≤ i ≤ m,

for all base points u ∈ U_α ∩ U_β. Here the Y_i are the scalar components of the chart map y_β = (Y_1, …, Y_m).

Proof. There are unique coefficients such that (dy_i)_u = ∑_{j=1}^m c_j(u) · (dx_j)_u. If the scalar components of y_β are y_β(u) = (Y_1(u), …, Y_m(u)), then by (4.19) and (4.22), bracketing with the dual basis vectors ∂/∂x_k|_u yields

c_k(u) = ⟨(dy_i)_u, ∂/∂x_k|_u⟩ = ⟨(dY_i)_u, ∂/∂x_k|_u⟩ = ⟨∂/∂x_k|_u, Y_i⟩ = (∂Y_i/∂x_k)(u)    for 1 ≤ i ≤ m,

proving the formula.  □
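This transformation law can be checked numerically for the Cartesian/polar pair on ℝ² ∖ {𝟎}; the base point and the finite-difference scheme below are my own illustrative choices, not from the text.

```python
import math

def partial(g, u, j, h=1e-6):
    """Central-difference approximation of ∂g/∂x_j at the point u."""
    up, um = list(u), list(u)
    up[j] += h
    um[j] -= h
    return (g(up) - g(um)) / (2 * h)

# Scalar components Y = (Y_1, Y_2) = (r, theta) of the polar chart,
# written as functions of the Cartesian coordinates (x, y).
Y = [lambda u: math.hypot(u[0], u[1]),
     lambda u: math.atan2(u[1], u[0])]

u0 = [0.8, 0.6]                                   # arbitrary base point, r = 1
J = [[partial(Y[i], u0, j) for j in range(2)] for i in range(2)]

# Theorem 4.30 says (dy_i)_u = sum_j (∂Y_i/∂x_j)(u) (dx_j)_u, so the coefficients
# of dr and dθ in the Cartesian coframe are the rows of the Jacobian J. Compare
# with the closed forms dr = (x dx + y dy)/r and dθ = (-y dx + x dy)/r².
x, y = u0
r = math.hypot(x, y)
assert all(abs(J[0][j] - [x / r, y / r][j]) < 1e-6 for j in range(2))
assert all(abs(J[1][j] - [-y / r**2, x / r**2][j]) < 1e-6 for j in range(2))
```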

A law similar to Corollary 4.16 for tangent vectors describes the action by which a 𝒞^∞ map φ : M → N "pulls back" cotangent vectors from base points in N to base points in M. We won't pursue this here since it won't be needed in later discussion, but it is a fundamental result in differential geometry.

Exterior Derivatives (an Overview). Equation (4.21) defines a linear operator d = d_0 : ⋀^0 M → ⋀^1 M called the rank-0 exterior derivative, where ⋀^0 M = 𝒞^∞(M) is the space of rank-0 differential forms. By Definition 4.26, (df)_p acts on elements X_p ∈ TM_p via

(4.24)    ⟨(df)_p, X_p⟩ = ⟨X_p, f⟩    for all f ∈ 𝒞^∞(p), p ∈ M, X_p ∈ TM_p.

This d-operator has its own derivation property:

(d(fh))_u = (df)_u · h(u) + f(u) · (dh)_u    for all u ∈ M and f, h ∈ 𝒞^∞(M).


Exercise 4.31. If f(u), h(u) are 𝒞^∞ functions defined on a chart (U_α, x_α) in a manifold M, verify that (d(f · h))_u = (df)_u h(u) + f(u)(dh)_u for all u ∈ U_α.

We will eventually define a hierarchy of spaces of smooth tensor fields

⋀^k M = (the space of smooth rank-k differential forms on M)

together with associated rank-k exterior derivatives d = d_k : ⋀^k M → ⋀^{k+1} M such that

(4.25)    ⋀^0 M --d=d_0--> ⋀^1 M --d=d_1--> ⋀^2 M --d=d_2--> ⋯ --d=d_{m−1}--> ⋀^m M,

where m = dim(M). In all dimensions, these "d-operators" d_k play the roles occupied by the vector operators div, grad, curl of calculus when m = 3, but a few more ideas must be developed before that connection can be explained.

Primitives of 𝟏-Forms. If ω is a smooth 1-form on an open set U ⊆ M, a primitive of ω is an f ∈ 𝒞^∞(U) such that df = ω on U. If f is such a primitive on U, consider a chart (U_α, x_α) about p; replacing U_α by U ∩ U_α, we may assume U_α ⊆ U. On U_α, we can describe ω in local coordinates:

ω_u = ∑_{i=1}^m w_i(u) · (dx_i)_u    with w_i ∈ 𝒞^∞(U_α).

By (4.21), 𝑑𝑓 = 𝜔 on 𝑈𝛼 means that 𝑚

𝜕𝑓 (𝑢) (𝑑𝑥𝑖 )𝑢 𝜕𝑥 𝑖 𝑖=1

(𝑑𝑓)𝑢 = ∑

𝑚

is equal to

𝜔𝑢 = ∑ 𝑤𝑖 (𝑢) (𝑑𝑥𝑖 )𝑢 𝑖=1

for all 𝑢 ∈ 𝑈𝛼 . The coefficients must agree, so if there is an 𝑓 such that 𝑑𝑓 = 𝜔, the coefficient functions 𝑤𝑖 (𝑢) must be related to the partial derivatives of 𝑓 via the following system of partial differential equations: 𝜕𝑓 (𝑢) for all 𝑢 ∈ 𝑈𝛼 and 1 ≤ 𝑖 ≤ 𝑚. 𝜕𝑥𝑖 This system does not always admit solutions, even locally. In fact, the coefficients 𝑤𝑖 (𝑢) of 𝜔 in local coordinates must satisfy “consistency condition” if the equation 𝑑𝑓 = 𝑤 is to have any local solutions; within the chart domain we must have 𝜕𝑤𝑖 𝜕𝑤𝑗 (4.26) − ≡ 0 on 𝑈𝛼 for 1 ≤ 𝑖 < 𝑗 ≤ 𝑚. 𝜕𝑥𝑗 𝜕𝑥𝑖 𝑤𝑖 (𝑢) =

These conditions are necessary because if 𝑓 is of class at least 𝐶 (2) , its mixed 2nd 1 order partial derivatives must agree, so the 𝑤𝑖 (𝑢) satisfy a system of (𝑚2 − 𝑚) 2 equations on ℝ𝑚 : 𝜕𝑤𝑗 𝜕𝑤𝑖 𝜕 𝜕𝑓 𝜕 𝜕𝑓 = for 1 ≤ 𝑖, 𝑗 ≤ 𝑚. ( )= ( )= 𝜕𝑥𝑗 𝜕𝑥𝑗 𝜕𝑥𝑖 𝜕𝑥𝑖 𝜕𝑥𝑗 𝜕𝑥𝑖
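The consistency test (4.26) is easy to run numerically. The sketch below (the two 1-forms are illustrative choices, not from the text) checks the defect ∂w_i/∂x_j − ∂w_j/∂x_i at sample points for an exact form ω = d(x²y) = 2xy dx + x² dy, which passes, and for ω = y dx, which fails with defect 1 everywhere.

```python
def partial(g, u, j, h=1e-6):
    """Central-difference approximation of ∂g/∂x_j at the point u."""
    up, um = list(u), list(u)
    up[j] += h
    um[j] -= h
    return (g(up) - g(um)) / (2 * h)

def consistency_defect(w, u):
    """Max |∂w_i/∂x_j - ∂w_j/∂x_i| at u for a 1-form with coefficient
    functions w = [w_1, ..., w_m], as in the consistency condition (4.26)."""
    m = len(w)
    return max(abs(partial(w[i], u, j) - partial(w[j], u, i))
               for i in range(m) for j in range(i + 1, m))

exact_form = [lambda u: 2 * u[0] * u[1], lambda u: u[0] ** 2]   # omega = d(x^2 y)
broken_form = [lambda u: u[1], lambda u: 0.0]                   # omega = y dx

pts = [[0.3, -1.2], [2.0, 0.7], [-0.5, 0.4]]
assert all(consistency_defect(exact_form, u) < 1e-5 for u in pts)              # (4.26) holds
assert all(abs(consistency_defect(broken_form, u) - 1.0) < 1e-5 for u in pts)  # defect = 1
```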


In order to defer a long digression, we list below the basic facts about local and global primitives of 1-forms ω ∈ ⋀^1(M).

1. The consistency conditions for ω take the same form (4.26) in every system of local coordinates. (This is fairly obvious, as noted in the remarks following Proposition 4.27.)
2. Using line integrals ∫_γ ω of 1-forms along 𝒞^(1) curves, one can prove the existence of local solutions near any base point p ∈ U_α if the conditions (4.26) are satisfied.
3. Conditions (4.26) do not by themselves imply the existence of a "global solution" f in 𝒞^∞(U) to the identity df = ω. The geometry of U can present obstructions to existence of global solutions for certain forms ω even when (4.26) holds (see Example 4.34 below).
4. If ω is a smooth 1-form on a manifold M and the equation df = ω has a 𝒞^∞ solution on an open subset U ⊆ M that is connected, all other solutions on U are obtained by adding an arbitrary constant to f.

Exercise 4.32. If f is a 𝒞^∞ function near p ∈ M, prove that the following properties are equivalent:
(a) ∂f/∂x_i ≡ 0 near p for 1 ≤ i ≤ m.
(b) f ≡ (constant) on some open set containing p.
Hint. When transferred to coordinate space via a chart about p, this local result can be proved by calculus methods.

Remark 4.33. It follows from the definition of "connectedness" that f ≡ (constant) on any connected open set U ⊆ M such that df ≡ 0 on U. ○

Example 4.34. Let M be the "punctured plane" ℝ² ∼ {𝟎}. The angle-variable function

θ(x, y) = arcsin( y / √(x² + y²) )

is multiple-valued, but on any open half-plane H ⊆ M bounded by a line through the origin, there are smooth single-valued 𝒞^∞ determinations of θ. When angles are measured in radians, these functions θ(x, y) can only differ on H by an added constant of the form c = 2πn, n ∈ ℤ, because H is connected. Since d(f + c·1) = df, all smooth determinations of θ(x, y) on H must have the same single-valued exterior derivative ω = dθ in ⋀^1 M, namely

(4.27)    ω_𝐱 = ( −y/(x² + y²) ) dx + ( x/(x² + y²) ) dy    for 𝐱 = (x, y) ∈ H

when ω is described in Euclidean coordinates on H. In particular, ω has local primitives near every p ≠ 𝟎 in M, but the multiple-valued nature of the primitive θ(x, y) on the punctured plane M prevents us from piecing together these local solutions to get a globally defined 𝒞^∞ function f : M → ℝ such that df = ω throughout M.
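On the right half-plane {x > 0}, θ(x, y) = arctan(y/x) is one smooth determination of the angle variable, and its exterior derivative can be checked against (4.27) by finite differences; the sample points below are arbitrary illustrative choices.

```python
import math

def partial(g, u, j, h=1e-6):
    """Central-difference approximation of ∂g/∂x_j at the point u."""
    up, um = list(u), list(u)
    up[j] += h
    um[j] -= h
    return (g(up) - g(um)) / (2 * h)

theta = lambda u: math.atan(u[1] / u[0])      # smooth determination on {x > 0}

for u in [[1.0, 0.0], [0.5, 2.0], [3.0, -1.5]]:
    x, y = u
    r2 = x * x + y * y
    assert abs(partial(theta, u, 0) - (-y / r2)) < 1e-5   # coefficient of dx in (4.27)
    assert abs(partial(theta, u, 1) - (x / r2)) < 1e-5    # coefficient of dy in (4.27)
```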


Exercise 4.35. The punctured plane M = ℝ² ∼ {𝟎} is covered by the four open half-planes

H_1 = {(x, y) : y > 0},    H_2 = {(x, y) : x > 0},
H_3 = {(x, y) : y < 0},    H_4 = {(x, y) : x < 0}.

For each half-plane, give an explicit single-valued 𝒞^∞ determination of the angle variable θ(x, y). In each case show that the exterior derivative ω = dθ is in fact equal to (4.27).
Hint. You may have to express your answer in terms of the functions arcsin, arctan, or arccos from calculus, depending on which H_i you examine.

Exercise 4.36. The following 1-forms on M = ℝ² ∼ {𝟎} are described with respect to the standard Euclidean coordinates on this space. Identify those that can have well-defined local primitives:
(a) ω = ( y/(x² + y²) ) · dx + ( x/(x² + y²) ) · dy
(b) ω = ( −y/(x² + y²) ) · dx + ( x/(x² + y²) ) · dy
(c) ω = ( y/(x² + y²)^{3/2} ) · dx + ( x/(x² + y²)^{3/2} ) · dy
(d) ω = ln(x² + y²) · dx + ln(x² + y²) · dy
(e) ω = (x² − y²) · dx + 2xy · dy
(f) ω = (2xy) · dx − (x² − y²) · dy

Exercise 4.37. Verify that the following 1-form on M = ℝ³ ∼ {𝟎},

ω = ( 1/(x² + y² + z²)^{3/2} ) · (x dx + y dy + z dz),

has as a primitive the negative of the famed "1/r potential":

φ(𝐱) = −1/r,    in which r = ‖𝐱‖ = √(x² + y² + z²).

Interpretations of 𝟏-Forms and Their Primitives. In calculus, "vector fields" on ℝⁿ (or open subsets thereof) are represented as 𝐅(𝐱) = ∑_{i=1}^n F_i(𝐱) 𝐞_i, where 𝐞_1, …, 𝐞_n are the standard basis vectors in ℝⁿ. But the nature of these basis vectors is seldom mentioned. This is a problem because fields appearing in applications are not all of the same type. Some examples:

1. Velocity Fields X̃ on M = ℝⁿ must be regarded as fields of tangent vectors. In fact, if γ(t) is a curve of class 𝒞^(1), its vector derivative γ′(t) is the instantaneous velocity of the moving point p = γ(t), and this is a tangent vector in TM_p. Therefore, at any p ∈ M, the basis vectors 𝐞_i in the identity

γ′(t) = v_1 𝐞_1 + ⋯ + v_n 𝐞_n    (v_i ∈ ℝ)

should also be interpreted as vectors in TM_p. The instantaneous velocities of particles in a fluid flow at a particular moment in time provide a physical example of a smooth vector field of tangent vectors on an open subset M ⊆ ℝⁿ.

2. Gradient Fields ∇f determined by a function f : M → ℝ are classically presented in the form ∇f(𝐱) = D_{x_1}f(𝐱) 𝐞_1 + ⋯ + D_{x_m}f(𝐱) 𝐞_m. But it should be apparent by now that there are many reasons to interpret every gradient field ∇f as a smoothly varying field of cotangent vectors, not tangent vectors, and then the basis vectors 𝐞_i must also be regarded as cotangent vectors. Formula (4.21) strongly suggests the following interpretation: taking global coordinates x_α(𝐱) = (x_1, …, x_m) on Euclidean space M (and U_α = ℝᵐ), we have basis vectors 𝔛*_p = {(dx_1)_p, …, (dx_m)_p} in TM*_p for every base point p. In the standard chart coordinates on ℝᵐ, the exterior derivative d mapping ⋀^0 M → ⋀^1 M takes the form

(df)_p = ∑_{i=1}^m (∂f/∂x_i)(p) · (dx_i)_p    for all p ∈ ℝᵐ.

If f ∈ 𝒞^∞(ℝᵐ) and we identify the standard basis vectors 𝐞_1, …, 𝐞_m attached to p ∈ ℝᵐ with the dual vectors 𝐞_1* = (dx_1)_p, …, 𝐞_m* = (dx_m)_p in the cotangent space TM*_p, then the exterior derivative (df)_p becomes the classical gradient ∇f(p) everywhere on ℝᵐ.

3. Electric Fields 𝐄(𝐱) = E_1(𝐱) 𝐞_1 + ⋯ + E_m(𝐱) 𝐞_m are also fields of cotangent vectors. It has long been realized by physicists that, at least locally, 𝐄 fields are gradients 𝐄 = ∇φ of scalar "potential functions" φ : M → ℝ. If M = ℝᵐ and we identify the 𝐞_i at a base point p with the 1-forms (dx_i)_p determined by the standard coordinate chart on ℝᵐ, then the classical statement 𝐄 = ∇φ becomes a statement about the exterior derivative of φ:

𝐄(𝐱) = (dφ)_𝐱 = D_{x_1}φ(𝐱) · (dx_1)_𝐱 + ⋯ + D_{x_m}φ(𝐱) · (dx_m)_𝐱    for all 𝐱 ∈ ℝᵐ.

4. Magnetic Fields are described in introductory courses as "fields of vectors," but the "vectors" involved are something quite different, neither tangent vectors nor cotangent vectors. Magnetic fields are in fact represented by fields of rank-2 antisymmetric tensors, which assign to each base point p a nondegenerate antisymmetric bilinear form B_p on the tangent space TM_p. We will see how these arise in the scheme (4.25), but understanding this last statement requires an excursion into multilinear algebra, which will be elaborated in the next section.

4.3. Differential Forms on 𝑴 and Their Exterior Derivatives

In Chapter 3, we discussed bilinear forms ω : V × V → 𝕂 for 𝕂 = ℝ or ℂ, but to deal with the issues that arise in differential geometry and various areas of applied mathematics, it becomes necessary to get a deeper understanding of multilinear algebra, particularly multilinear forms on vector spaces (aka tensors) and fields of tensors on differentiable manifolds.

It may be helpful to regard a rank-k tensor ω on V as a "black box" that accepts as input an ordered list (v_1, …, v_k) of vectors from V and produces a scalar ω(v_1, …, v_k) ∈ 𝕂 as its output; the rank rk(ω) of the tensor is the number of inputs to the box. The workings of the box are subject to just one rule: for each 1 ≤ i ≤ k, the output ω(v_1, …, v_k) must be a linear function of v_i ∈ V when the other inputs v_j (j ≠ i) are held fixed.

The bilinear forms examined in Chapter 3 are all rank-2 tensors, and in particular any inner product on a vector space V over ℝ is a rank-2 tensor on V. The rank-1 tensors on V are just the linear functionals ℓ : V → 𝕂 in the dual space V*, and we define the rank-0 tensors on V to be the scalars in the ground field 𝕂. The rank-k tensors on V are the multilinear forms ω : V × ⋯ × V → 𝕂 with k inputs, which are often referred to as k-linear forms. We make these into a vector space by imposing the following addition and scaling operations:

(ω_1 + ω_2)(v_1, …, v_k) = ω_1(v_1, …, v_k) + ω_2(v_1, …, v_k)
(λ · ω)(v_1, …, v_k) = λ · ω(v_1, …, v_k).

The resulting vector space of "tensors of type (0, k)" is denoted by V^(0,k), with V^(0,1) = V* and V^(0,0) = 𝕂, so

dim(V^(0,1)) = dim(V*) = dim(V)    and    dim(V^(0,0)) = 1.
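Multilinearity means a rank-k tensor is pinned down by its mᵏ values on k-tuples of basis vectors. A small pure-Python sketch for m = 3 and k = 2 (the bilinear form, built from a random matrix, is an arbitrary illustrative choice):

```python
from itertools import product
import random

random.seed(1)
m, k = 3, 2
A = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(m)]

def omega(v, w):
    """An arbitrary rank-2 tensor (bilinear form) on R^m: omega(v, w) = v·A·w."""
    return sum(A[i][j] * v[i] * w[j] for i in range(m) for j in range(m))

def e(i):
    """Standard basis vector e_i of R^m."""
    return [1.0 if j == i else 0.0 for j in range(m)]

# Feeding basis vectors into the "black box" recovers the coefficients
# c_(i,j) = omega(e_i, e_j); there are exactly m**k of them.
c = {(i, j): omega(e(i), e(j)) for (i, j) in product(range(m), repeat=k)}
assert len(c) == m ** k

# Bilinearity then reproduces omega on arbitrary inputs from those coefficients.
v = [random.uniform(-1, 1) for _ in range(m)]
w = [random.uniform(-1, 1) for _ in range(m)]
expansion = sum(cI * v[i] * w[j] for (i, j), cI in c.items())
assert abs(omega(v, w) - expansion) < 1e-12
```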

Tensors of arbitrary mixed type V^(r,k) on V can be defined, but we won't need them in this account, with one exception: in this notation, the space V itself is denoted by V = V^(1,0), and its elements are the tensors of type (1, 0).

If 𝔛 = {e_i} is a basis for V and 𝔛* = {e_i*} is the dual basis in V*, the space V^(0,k) of k-linear forms ω : V × ⋯ × V → 𝕂 is a vector space of dimension dim(V^(0,k)) = mᵏ if dim(V) = m. We will prove this by showing that it is spanned by "monomials," which are "tensor products" of vectors in the dual basis 𝔛*. If I = (i_1, …, i_k) is a multi-index with each i_j ∈ [1, m], the corresponding monomial e_I* ∈ V^(0,k) is the k-linear map

(4.28)    e_I* = e_{i_1}* ⊗ ⋯ ⊗ e_{i_k}*    that sends    (v_1, …, v_k) ↦ ∏_{j=1}^k ⟨e_{i_j}*, v_j⟩,

where the v_j ∈ V. These operators are easily seen to be rank-k multilinear forms and are a basis for V^(0,k), so every rank-k multilinear form can be written uniquely as ω = ∑_{I ∈ [1,m]^k} c_I e_I* with c_I ∈ 𝕂. Except for the number of players on the field, the proof is the same as that given in Chapter 3 to compute the dimensions of spaces of bilinear forms.

Our main interest here will be in tensors that act on the tangent spaces to a differentiable manifold M, so at p ∈ M we consider tensors that act on V = TM_p. However, many results discussed below are purely algebraic in nature and are true for tensors on arbitrary vector spaces that might be attached to base points in M.

A tensor field of rank k on M assigns a k-linear form ω_u ∈ TM_u^(0,k) at each base point u ∈ M. Given a chart (U_α, x_α) on M and a base point u ∈ U_α, we have bases {∂/∂x_i|_u} in TM_u and dual bases {(dx_i)_u} in TM*_u, and then there are uniquely determined coefficients c_I(u) on U_α for I = (i_1, …, i_k) ∈ [1, m]^k such that

ω_u = ∑_I c_I(u) (dx_I)_u = ∑_I c_I(u) · (dx_{i_1})_u ⊗ ⋯ ⊗ (dx_{i_k})_u    for u ∈ U_α.

A tensor field ω on M is smooth if the coefficients c_I(u) are in 𝒞^∞(U_α) for all charts on M. A routine (but messy) calculation involving the change of basis formula shows that smoothness of ω described in one chart implies smoothness with respect to any other chart in the maximal atlas. The space 𝒟^(0,k)(M) of smooth rank-k tensor fields is an infinite-dimensional vector space; it is also a 𝒞^∞(M)-module under the action

𝒞^∞(M) × 𝒟^(0,k)(M) → 𝒟^(0,k)(M)    given by    (fω)_u = f(u) · ω_u

for u ∈ M and f ∈ 𝒞^∞(M). When k = 1, the space 𝒟^(0,1)(M) is precisely the space ⋀^1 M of smooth 1-forms on M, and 𝒟^(0,0)(M) is equal to ⋀^0 M = 𝒞^∞(M) by definition. The space 𝒟^(0,2)(M) of smooth rank-2 tensor fields consists of smoothly varying fields of bilinear forms on the tangent spaces TM_p.

Example 4.38 (Riemannian Structure on M). On any vector space V over ℝ, an inner product g(v_1, v_2) is a particular type of rank-2 tensor in V^(0,2), but there are many possible inner products on V. A Riemannian structure on a manifold M is a smooth field of inner products p ↦ g_p with values g_p ∈ TM_p^(0,2). When M is equipped with this extra structure, we can define the following:

• Length ‖X_p‖ = √(g_p(X_p, X_p)) of any vector X_p ∈ TM_p. This determines a vector space norm on each tangent space that allows us to speak of the "length" of a tangent vector.
• Orthogonality of vectors X_p, Y_p ∈ TM_p, which we interpret to mean g_p(X_p, Y_p) = 0. The angle between two nonzero tangent vectors at p is then

cos(θ(X_p, Y_p)) = g_p(X_p, Y_p) / (‖X_p‖ ‖Y_p‖).

Lengths, angles, and orthogonality of tangent vectors cannot be defined in the absence of a Riemannian structure on M.

Discussion. In particular, it now becomes meaningful to speak of orthonormal bases in each tangent space TM_p, as well as more exotic constructs such as a "field of orthogonal frames": a family of smooth vector fields {X̃_1, …, X̃_m} on M such that 𝔛_p = {(X̃_1)_p, …, (X̃_m)_p} is an orthonormal basis in TM_p with respect to the inner product g_p : TM_p × TM_p → ℝ at each base point p. Furthermore, if γ : [a, b] → M is a 𝒞^∞ curve, or merely one of class 𝒞^(1), its length has a natural definition as the Riemann integral

Arc Length:    L(γ) = ∫_a^b ‖γ′(t)‖ dt = ∫_a^b √( g_{γ(t)}(γ′(t), γ′(t)) ) dt,

where γ′(t) is the tangent vector to the curve at p = γ(t). The change of variable formula for Riemann integrals shows that the value of L(γ) is unchanged, with L(η) = L(γ), if η is any orientation-preserving reparametrization of γ such that

η = γ ∘ φ, where φ : [c, d] → [a, b] is in 𝒞^(1) with dφ/ds > 0 for all s.

The same substitution gives L(η) = L(γ) for orientation-reversing reparametrizations as well, since the integrand ‖η′(s)‖ = ‖γ′(φ(s))‖ · |dφ/ds| involves only the absolute value of dφ/ds. With considerably more effort, one can show that if p ∈ M, there is an open neighborhood U such that any q ∈ U can be connected to p by a geodesic: a 𝒞^(1) curve γ_0 : [a, b] → U such that γ_0(a) = p, γ_0(b) = q, γ_0(t) ∈ U for all t, and

L(γ_0) ≤ L(η)    for any 𝒞^(1) curve η in U that connects p to q.

These "minimal length" curves in M are the analogs of straight line segments when Euclidean space ℝⁿ is equipped with the Euclidean Riemannian structure, obtained by taking the standard global chart x_α(𝐱) = (x_1, …, x_n) on U_α = ℝⁿ and defining g_p = ∑_{i=1}^n (dx_i)_p ⊗ (dx_i)_p, so that

g_p( ∂/∂x_i|_p, ∂/∂x_j|_p ) = δ_ij    (Kronecker delta).
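With the Euclidean structure just defined, arc length and its reparametrization invariance can be checked numerically. The curve below (half of the unit circle) and the reparametrization φ(s) = π(eˢ − 1)/(e − 1) are arbitrary test choices, not from the text.

```python
import math

def arc_length(gamma, a, b, n=20000):
    """Midpoint-rule sketch of L(γ) = ∫_a^b ‖γ'(t)‖ dt for the Euclidean
    Riemannian structure g_p = Σ (dx_i)_p ⊗ (dx_i)_p; γ' by central differences."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        d = [(x1 - x0) / 2e-6
             for x0, x1 in zip(gamma(t - 1e-6), gamma(t + 1e-6))]
        total += math.sqrt(sum(di * di for di in d)) * h
    return total

gamma = lambda t: (math.cos(t), math.sin(t))                  # half of the unit circle
phi = lambda s: math.pi * (math.exp(s) - 1) / (math.e - 1)    # phi' > 0 on [0, 1]
eta = lambda s: gamma(phi(s))                                 # orientation-preserving reparametrization

L1 = arc_length(gamma, 0.0, math.pi)
L2 = arc_length(eta, 0.0, 1.0)
assert abs(L1 - math.pi) < 1e-4       # length of half a great circle of radius 1
assert abs(L2 - L1) < 1e-4            # L(η) = L(γ)
```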

The bases 𝔛_p = {∂/∂x_i|_p} induced in TM_p by the chart coordinates are then orthonormal bases with respect to the inner product g_p in each tangent space to ℝⁿ.

This notion of arc length also induces a natural metric on any Riemannian manifold:

d_M(p, q) = inf{ L_g(γ) : γ any 𝒞^∞ curve such that γ(a) = p, γ(b) = q }.

(However, verifying that d_M satisfies the triangle inequality takes considerable effort.) While the inf{…} is actually achieved by a unique 𝒞^∞ curve (a geodesic) for all q sufficiently close to p, this need not be true for points far from p in the manifold. Even if the minimal length d_M(p, q) is achieved, the geodesic connecting p to q might not be unique if p and q are widely separated. Think of the north and south poles on the unit sphere S² ⊆ ℝ³. The sphere inherits a natural Riemannian structure from the surrounding Euclidean space (because the restriction of an inner product on V to a vector subspace W is an inner product on W), and the length-minimizing geodesics in the sphere S² are segments of great circles (intersections of the sphere with planes through the origin). Every great circle from N to S is a geodesic with the same length L(γ) = π. ○

We won't have time to explore the geometry of Riemannian manifolds in these Notes, but we emphasize that there are many examples. An excellent account of this subject is given in the book Riemannian Geometry [1] by Manfredo do Carmo. All 𝒞^∞ manifolds embedded in a Euclidean space ℝⁿ, such as the spheres Sⁿ of various dimensions or smooth level hypersurfaces determined via the implicit function theorem, inherit a natural Riemannian structure induced by the standard Riemannian structure on ℝⁿ described above. This Riemannian structure will, however, depend on how the manifold is embedded in ℝⁿ.

Action of Permutation Groups 𝑺_𝒌 on Tensors. Permutations σ ∈ S_k act as linear operators on the space of rank-k tensors t ∈ V^(0,k) if we let

σ · t(v_1, …, v_k) = t(v_{σ(1)}, …, v_{σ(k)})    for v_j ∈ V.

Notice that in evaluating the action of σ · t, the inputs v_1, …, v_k are permuted by σ ∈ S_k before being fed into the tensor t. Then σ : V^(0,k) → V^(0,k) is a linear map, and we obtain a "left action of S_k on tensors," which means that

(4.29)    (στ) · t = σ · (τ · t)    for all σ, τ ∈ S_k and t ∈ V^(0,k).

Remark 4.39. Many find the proof of this "covariance" property confusing. For a straightforward argument that does the job, observe that

σ · (τ · t)(v_1, …, v_k) = (τ · t)(v_{σ(1)}, …, v_{σ(k)})
    = (τ · t)(w_1, …, w_k)|_{w_1 = v_{σ(1)}, …, w_k = v_{σ(k)}}
    = t(w_{τ(1)}, …, w_{τ(k)})|_{w_i = v_{σ(i)}}
    = t(v_{σ(τ(1))}, …, v_{σ(τ(k))})    (since w_{τ(i)} = v_{σ(τ(i))})
    = t(v_{(στ)(1)}, …, v_{(στ)(k)})
    = (στ) · t(v_1, …, v_k)    for all v_i ∈ V.  ○

Exercise 4.40. If ℓ_1, …, ℓ_k ∈ V*, we have defined their tensor product to be the k-linear form ℓ_1 ⊗ ⋯ ⊗ ℓ_k as in (4.28). Verify that the action of S_k on such monomials is given by

σ · (ℓ_1 ⊗ ⋯ ⊗ ℓ_k) = ℓ_{σ⁻¹(1)} ⊗ ⋯ ⊗ ℓ_{σ⁻¹(k)}.

(Compare with the initial definition, and note the role σ⁻¹ plays here.)

All of this applies to manifolds M if we take V = TM_u at base points u ∈ M. If ω is a tensor field in 𝒟^(0,k)(M), so ω_u ∈ TM_u^(0,k) for all base points in M, we let σ act independently on each tangent space (but not on the base points to which they are attached). Then (σ · ω)_u = σ · (ω_u) in TM_u^(0,k) for each u ∈ M, and we get a left action of the group S_k by linear operators σ : 𝒟^(0,k)(M) → 𝒟^(0,k)(M) on the space of smooth rank-k tensor fields.
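The left-action law (4.29) can be verified exhaustively for a small example. The sketch below stores a random rank-3 tensor on ℝ² by its values on basis index tuples (an illustrative encoding of my own, not the text's notation) and checks the law for every pair σ, τ ∈ S₃.

```python
from itertools import permutations, product
import random

random.seed(2)
m, k = 2, 3
# A random rank-3 tensor t on V = R^m, stored by its values t_I = t(e_{i1}, e_{i2}, e_{i3}).
t = {I: random.uniform(-1, 1) for I in product(range(m), repeat=k)}

def act(sigma, s):
    """(σ·s)(v_1,...,v_k) = s(v_{σ(1)},...,v_{σ(k)}), expressed on coefficients
    (σ is a 0-indexed tuple, so σ(j) corresponds to sigma[j])."""
    return {I: s[tuple(I[sigma[j]] for j in range(k))] for I in s}

def compose(sigma, tau):
    """(στ)(i) = σ(τ(i))."""
    return tuple(sigma[tau[i]] for i in range(k))

# Verify (4.29): (στ)·t = σ·(τ·t) for all σ, τ in S_3.
for sigma in permutations(range(k)):
    for tau in permutations(range(k)):
        assert act(compose(sigma, tau), t) == act(sigma, act(tau, t))
```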


Antisymmetric tensors and tensor fields on manifolds are objects of special interest in calculus on manifolds.

Definition 4.41 (Symmetric and Antisymmetric Tensor Fields). If V is a finite-dimensional vector space, a tensor ω in V^(0,k) is symmetric or antisymmetric if

(4.30)    σ · ω = ω for all σ ∈ S_k    or    σ · ω = sgn(σ) · ω for all σ ∈ S_k,

where sgn(σ) is the signature of the permutation σ ∈ S_k. The symmetric tensors of rank k are a vector subspace S^k(V) ⊆ V^(0,k); the antisymmetric tensors are also a vector subspace, denoted by ⋀^k V. When k ≥ 2, S^k(V) ∩ ⋀^k(V) = {0}. When k = 0, 1 the definition (4.30) is ambiguous, so tensors in V^(0,1) = V* and V^(0,0) = 𝕂 are regarded as being both symmetric and antisymmetric; in particular, S^1 V = ⋀^1 V = V* and S^0 V = ⋀^0 V = 𝕂.

For a tensor field ω ∈ 𝒟^(0,k)(M) on a manifold M, (anti)symmetry means that ω_p is an (anti)symmetric tensor at each base point p ∈ M. Smooth fields of antisymmetric rank-k tensors are of special interest. They are called k-forms, or differential forms of rank k, and the space of all such tensor fields is denoted ⋀^k M. This is a vector subspace of 𝒟^(0,k)(M), and both become 𝒞^∞(M)-modules if we let 𝒞^∞(M) act in the following way on rank-k tensors:

(f · ω)_u(v_1, …, v_k) = f(u) · ω_u(v_1, …, v_k)    for v_1, …, v_k ∈ TM_u, u ∈ M.

The space of symmetric tensor fields of rank k is denoted S^k M, and the symmetric rank-k tensors on TM_p are denoted by S^k(TM_p).

Many tensors on a vector space V are neither symmetric nor antisymmetric. Antisymmetric tensors can be created by "antisymmetrizing" arbitrary rank-k tensors in V^(0,k) via a surjective linear projection map Alt : V^(0,k) → ⋀^k V ⊆ V^(0,k), defined as follows:

(4.31)    Alt(ω) = (1/k!) ∑_{σ ∈ S_k} sgn(σ) (σ · ω)    for all ω ∈ V^(0,k).

The map Alt is linear; the fudge factor 1/k! is needed to make Alt a projection operator, with Alt² = Alt. We obtain a similar "symmetrization operator" S : V^(0,k) → S^k(V) by dropping the factor sgn(σ) in equation (4.31), but symmetrization will not play a role in the present narrative.

On a manifold M, applying Alt to ω_u ∈ TM_u^(0,k) at each base point converts any smooth tensor field ω ∈ 𝒟^(0,k)(M) into a smooth field of rank-k antisymmetric tensors; the resulting tensor field Alt(ω) is then given by

(4.32)    (Alt(ω))_u = Alt(ω_u)    for all u ∈ M, ω ∈ 𝒟^(0,k)(M).

Below, we list without proof the basic facts about antisymmetrization.


Theorem 4.42. If V is a vector space, the map Alt : V^(0,k) → V^(0,k) in (4.31) is linear, and the following hold:
1. Alt² = Alt ∘ Alt is equal to Alt, so Alt is a projection onto its range in V^(0,k).
2. Alt(V^(0,k)) ⊆ ⋀^k V, and if ω was an antisymmetric tensor on V to begin with, then Alt(ω) = ω. Thus, Alt = id on ⋀^k V, and range(Alt) is all of ⋀^k V.

As an operator on tensor fields, Alt maps 𝒟^(0,k)(M) → ⋀^k M if we take V = TM_u in (4.31) at every base point.

Products of Tensors and Tensor Fields. The tensor product ω ⊗ μ of ω ∈ V^(0,k) and μ ∈ V^(0,ℓ) is a rank-(k + ℓ) tensor on V such that

ω ⊗ μ(v_1, …, v_k, v_{k+1}, …, v_{k+ℓ}) = ω(v_1, …, v_k) · μ(v_{k+1}, …, v_{k+ℓ}).

The (⊗) operation sends pairs of tensors (ω, μ) in the Cartesian product space V^(0,k) × V^(0,ℓ) to tensors ω ⊗ μ of rank k + ℓ in V^(0,k+ℓ). It is easily seen to be a linear function of each input to ω ⊗ μ when the other is held fixed, and it is an associative operation in the sense that

ω ⊗ (μ ⊗ τ) = (ω ⊗ μ) ⊗ τ    in V^(0,k+ℓ+m).

It is not a commutative operation, because ω ⊗ μ and μ ⊗ ω are generally different elements of V^(0,k+ℓ).

Wedge Product of Antisymmetric Tensors. The wedge product ω ∧ μ is a bilinear map of antisymmetric tensors

⋀^k V × ⋀^ℓ V → ⋀^{k+ℓ} V

that takes ω ∈ ⋀^k V and μ ∈ ⋀^ℓ V to an antisymmetric tensor of rank k + ℓ:

(4.33)    ω ∧ μ = ( (k + ℓ)! / (k! ℓ!) ) · Alt(ω ⊗ μ).

Note that the ordinary tensor product ω ⊗ μ will have rank k + ℓ, but it need not be antisymmetric even if both factors were antisymmetric, so ω ⊗ μ has to be antisymmetrized to end up in ⋀^{k+ℓ} V. The algebraic properties of the wedge product are listed below. We start with a simple observation: the transpose operation A ↦ Aᵀ on linear operators induces a natural "transpose operation" on tensors in V^(0,k).

Lemma 4.43. If V is a finite-dimensional vector space and A : V → V is a linear operator, its transpose acts on the dual space V* with

⟨Aᵀℓ, v⟩ = ⟨ℓ, Av⟩    for v ∈ V, ℓ ∈ V*.

This construction can be extended to a transpose operation Aᵀ : V^(0,k) → V^(0,k) on arbitrary rank-k tensors by letting A act independently on each input vector v_i in (v_1, …, v_k) before ω is applied:

(4.34)    Aᵀω(v_1, …, v_k) = ω(A(v_1), …, A(v_k)).
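The operators Alt and ∧ defined in (4.31) and (4.33) are straightforward to implement on coefficient data. The sketch below (storing a tensor by its values on basis index tuples, an illustrative encoding of my own) checks Alt ∘ Alt = Alt, φ ∧ φ = 0, and the k = ℓ = 1 case of the anticommutation law.

```python
from itertools import permutations, product
from math import factorial
import random

random.seed(3)
m = 3

def sgn(sigma):
    """Signature of a permutation, via inversion count."""
    return (-1) ** sum(1 for i in range(len(sigma))
                       for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])

def act(sigma, t):
    """Left action: (σ·t) evaluated on basis tuples (0-indexed σ)."""
    k = len(sigma)
    return {I: t[tuple(I[sigma[j]] for j in range(k))] for I in t}

def alt(t, k):
    """Alt(ω) = (1/k!) Σ_σ sgn(σ) (σ·ω), eq. (4.31), on coefficient dicts."""
    out = dict.fromkeys(t, 0.0)
    for sigma in permutations(range(k)):
        s = act(sigma, t)
        for I in out:
            out[I] += sgn(sigma) * s[I] / factorial(k)
    return out

def wedge(a, b, ka, kb):
    """ω ∧ μ = ((k+ℓ)!/(k!ℓ!)) Alt(ω ⊗ μ), eq. (4.33)."""
    tp = {I + J: a[I] * b[J] for I in a for J in b}     # tensor product ω ⊗ μ
    coeff = factorial(ka + kb) // (factorial(ka) * factorial(kb))
    return {I: coeff * v for I, v in alt(tp, ka + kb).items()}

phi = {(i,): random.uniform(-1, 1) for i in range(m)}   # arbitrary covectors
psi = {(i,): random.uniform(-1, 1) for i in range(m)}

assert all(abs(v) < 1e-12 for v in wedge(phi, phi, 1, 1).values())   # φ ∧ φ = 0
ab, ba = wedge(phi, psi, 1, 1), wedge(psi, phi, 1, 1)
assert all(abs(ab[I] + ba[I]) < 1e-12 for I in ab)      # φ∧ψ = (-1)^(1·1) ψ∧φ, eq. (4.35)

w = {I: random.uniform(-1, 1) for I in product(range(m), repeat=2)}
aw = alt(w, 2)
assert all(abs(alt(aw, 2)[I] - aw[I]) < 1e-12 for I in aw)           # Alt ∘ Alt = Alt
```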


The operator 𝐴T leaves the subspace ⋀ 𝑉 invariant, so 𝐴T induces a linear oper𝑘

𝑘

ator on the space of antisymmetric tensors 𝐴T ∶ ⋀ 𝑉 → ⋀ 𝑉 as well as on the full space 𝑉 (0,𝑘) of rank-𝑘 tensors. Much of what follows is based on the interaction between the wedge product and transpose 𝐴T operations on (anitsymmetric) tensors. Theorem 4.44 (Algebraic Properties of the Wedge Product). Let 𝑉 be a finite-dimensional vector space with basis 𝔛 = {𝑒𝑖 } and dual basis 𝔛∗ = {𝑒𝑖∗ } in 𝑉 ∗ . The wedge product has the following algebraic properties: 1. Antisymmetry of wedge. If 𝜙1 , 𝜙2 are in 𝑉 ∗ (rank-1 tensors on 𝑉), their wedge product is an antisymmetric tensor in 𝑉 (0,2) , 1 𝜙1 ∧ 𝜙2 = (𝜙1 ⊗ 𝜙2 − 𝜙2 ⊗ 𝜙1 ) = (−1) ⋅ (𝜙2 ∧ 𝜙1 ). 2 In particular, 𝜙 ∧ 𝜙 = 0 in 𝑉 (0,2) for every rank-1 tensor 𝜙 ∈ 𝑉 ∗ . If 𝑘 ℓ 𝜔 ∈ ⋀ 𝑉 and 𝜇 ∈ ⋀ 𝑉, we have the more general commutation relations 𝜔 ∧ 𝜇 = (−1)𝑘𝑙 𝜇 ∧ 𝜔.

(4.35)

𝑘

ℓ

𝑚

2. Associativity of Wedge. If 𝜔 ∈ ⋀ 𝑉, 𝜇 ∈ ⋀ 𝑉, 𝜏 ∈ ⋀ 𝑉, then 𝜔∧(𝜇∧𝜏) = (𝜔∧𝜇)∧𝜏. Associativity implies that when we form the wedge product 𝜔1 ∧ ⋯ ∧ 𝜔𝑟 of several antisymmetric tensors, we don’t have to worry about where to put the parentheses. The proof of associativity also yields a fact that greatly simplifies many calculations involving iterated wedge products: (4.36)

(𝑘 + ℓ + 𝑚)! Alt(𝜔 ⊗ 𝜇 ⊗ 𝜏). 𝑘! ℓ! 𝑚! Notice that evaluating our original definition of 𝜔 ∧ (𝜇 ∧ 𝜏), 𝜔 ∧ (𝜇 ∧ 𝜏) = (𝜔 ∧ 𝜇) ∧ 𝜏 =

(𝑘 + ℓ + 𝑚)! Alt(𝜔 ⊗ (Alt(𝜇 ⊗ 𝜏)), 𝑘! ℓ! 𝑚! involves two applications of “𝐴𝑙𝑡,” but the last expression in (4.36) requires only one. 3. Evaluating Monomial Wedge Products. If 𝔛 = {𝑒1 , … , 𝑒𝑚 } is a ∗ basis in 𝑉 and 𝔛∗ = {𝑒1∗ , … , 𝑒𝑚 } ⊆ 𝑉 ∗ is the dual basis, we can define a rank-𝑘 antisymmetric tensor 𝜔 ∧ (𝜇 ∧ 𝜏) =

𝑒𝐽∗ = 𝑒𝑗∗1 ∧ ⋯ ∧ 𝑒𝑗∗𝑘

for any multi-index 𝐽 = (𝑗1 , … , 𝑗𝑘 )

without any need to interpolate parentheses among the factors since the wedge product is associative. One important computational fact is ∗ 𝑒1∗ ∧ ⋯ ∧ 𝑒𝑚 (𝑒1 , … , 𝑒𝑚 ) = 1 .

More generally, we can evaluate 𝑒𝑗∗1 ∧ ⋯ ∧ 𝑒𝑗∗𝑘 (𝑒𝑖1 , … , 𝑒𝑖𝑘 ) = 𝑒𝐽∗ (𝑒𝑖1 , … , 𝑒𝑖𝑘 )

4.3. DIFFERENTIAL FORMS ON 𝑴 AND THEIR EXTERIOR DERIVATIVES

169

for arbitrary multi-indices 𝐼, 𝐽 in [1, 𝑚]𝑘 (ordered or not). The outcome is zero if there are any repeated entries in the 𝑘-tuple 𝐽 = (𝑗1 , … , 𝑗𝑘 ); otherwise, there is a unique permutation 𝜎 ∈ 𝑆𝑘 such that 𝐼 = 𝜎 ⋅ 𝐽 = (𝑗𝜍(1) , … , 𝑗𝜍(𝑘) ) is an ordered multi-index with 𝑗𝜍(1) < ⋯ < 𝑖𝜍(𝑘) . The 𝑘

corresponding “ordered monomials” 𝜎_𝐼 turn out to be a basis for ⋀^𝑘 𝑉.

Theorem 4.45 (The Basis Theorem). Given a basis 𝔛 = {𝑒_𝑖} in 𝑉 and dual basis 𝔛^∗ = {𝑒_𝑖^∗} in 𝑉^∗, we define the set ℰ_𝑘 of ordered 𝑘-tuples 𝐼 = (𝑖_1, … , 𝑖_𝑘) with 1 ≤ 𝑖_1 < ⋯ < 𝑖_𝑘 ≤ 𝑚 = dim(𝑉). The corresponding “ordered monomials” in ⋀^𝑘 𝑉,

    𝑒_𝐼^∗ = 𝑒_{𝑖_1}^∗ ∧ ⋯ ∧ 𝑒_{𝑖_𝑘}^∗   (𝐼 = (𝑖_1 < ⋯ < 𝑖_𝑘) ∈ ℰ_𝑘),

are a basis for the space of 𝑘-forms ⋀^𝑘 𝑉. In particular,

    dim(⋀^𝑘 𝑉) = (𝑚 choose 𝑘) = #(multi-indices in ℰ_𝑘).

When 𝑘 = 0, we have dim(⋀^0 𝑉) = dim_ℝ(ℝ) = 1. It is important to notice that the space ⋀^𝑚 𝑉 of antisymmetric tensors of maximal rank 𝑚 = dim(𝑉) is also one-dimensional, and ⋀^𝑘 𝑉 = {0} for all 𝑘 > 𝑚 = dim(𝑉).

Exercise 4.46. If 𝜙_1, 𝜙_2 ∈ 𝑉^∗, verify by example that 𝜙_1 ⊗ 𝜙_2 need not equal 𝜙_2 ⊗ 𝜙_1 as elements of 𝑉^(0,2), so “⊗” is not commutative. Remember, equality of rank-2 tensors means they have the same action on all inputs (𝑣_1, 𝑣_2) and that the inputs are ordered lists, so (𝑣_1, 𝑣_2) ≠ (𝑣_2, 𝑣_1).

Exercise 4.47. Let 𝑉 be a finite-dimensional vector space, 𝔛 = {𝑒_𝑖} a basis, and 𝔛^∗ = {𝑒_𝑖^∗} the dual basis. In Chapter 3, we explained how every bilinear form 𝐵 ∶ 𝑉 × 𝑉 → ℝ is described by an 𝑚 × 𝑚 matrix [𝐵]_𝔛 (𝑚 = dim 𝑉) with entries 𝐵_𝑖𝑗 = 𝐵(𝑒_𝑖, 𝑒_𝑗). Given two vectors 𝜙_1 = ∑_{𝑖=1}^𝑚 𝑎_𝑖 𝑒_𝑖^∗ and 𝜙_2 = ∑_{𝑖=1}^𝑚 𝑏_𝑖 𝑒_𝑖^∗ in 𝑉^∗, we get the bilinear forms 𝜙_1 ⊗ 𝜙_2 and 𝜙_2 ⊗ 𝜙_1 on 𝑉, using the rules of Theorem 4.44.
(a) Rewrite 𝜙_1 ∧ 𝜙_2 as a linear combination ∑_{𝐼∈ℰ_2} 𝑐_𝐼 𝑒_𝐼^∗ of standard basis vectors in ⋀^2 𝑉. Compute the coefficients 𝑐_𝐼 in terms of {𝑎_𝑖} and {𝑏_𝑖}.
(b) Compute the associated matrices [𝜙_1 ∧ 𝜙_2]_𝔛 and [𝜙_2 ∧ 𝜙_1]_𝔛. Are they equal?

Computations with antisymmetric tensors often reduce to working with wedge products 𝜙_1 ∧ ⋯ ∧ 𝜙_𝑘 of rank-1 tensors 𝜙_𝑖 ∈ 𝑉^∗. The following simple formula, easily derived by induction from the associative law in Theorem 4.44, provides a way to directly evaluate such products:

    𝜙_1 ∧ ⋯ ∧ 𝜙_𝑘 = ((1 + ⋯ + 1)! / (1! ⋯ 1!)) ⋅ Alt(𝜙_1 ⊗ ⋯ ⊗ 𝜙_𝑘) = 𝑘! ⋅ Alt(𝜙_1 ⊗ ⋯ ⊗ 𝜙_𝑘).

Note that the factor out front is now 𝑘!, not 1/𝑘! as in (4.31).
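For two rank-1 tensors this formula says (𝜙_1 ∧ 𝜙_2)(𝑣, 𝑤) = 𝜙_1(𝑣)𝜙_2(𝑤) − 𝜙_1(𝑤)𝜙_2(𝑣), a 2 × 2 determinant of pairings. The following sketch is not from the text — the helper `wedge2` and the sample vectors are illustrative choices, assuming numpy is available — but it makes the antisymmetry of Exercise 4.47 concrete:

```python
# Illustrative check (not from the text): for rank-1 tensors phi1, phi2 in V*,
# the wedge phi1 ^ phi2 = phi1 (x) phi2 - phi2 (x) phi1 acts on (v, w) as a
# 2x2 determinant of the pairings <phi_i, v_j>.
import numpy as np

def wedge2(phi1, phi2, v, w):
    """Evaluate (phi1 ^ phi2)(v, w) = phi1(v)phi2(w) - phi1(w)phi2(v)."""
    M = np.array([[phi1 @ v, phi1 @ w],
                  [phi2 @ v, phi2 @ w]])
    return np.linalg.det(M)

phi1 = np.array([1.0, 2.0, 0.0])   # coefficients a_i in phi1 = sum a_i e_i*
phi2 = np.array([0.0, 1.0, 3.0])   # coefficients b_i in phi2 = sum b_i e_i*
v = np.array([1.0, 0.0, 1.0])
w = np.array([2.0, 1.0, 0.0])

# Antisymmetry in the inputs ...
assert np.isclose(wedge2(phi1, phi2, v, w), -wedge2(phi1, phi2, w, v))
# ... and in the factors: phi2 ^ phi1 = -(phi1 ^ phi2).
assert np.isclose(wedge2(phi2, phi1, v, w), -wedge2(phi1, phi2, v, w))
```

Expanding the determinant also recovers the coefficient formula asked for in Exercise 4.47(a): the coefficient of 𝑒_𝐼^∗ for 𝐼 = (𝑖 < 𝑗) is 𝑐_𝐼 = 𝑎_𝑖 𝑏_𝑗 − 𝑎_𝑗 𝑏_𝑖.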

170

4. TENSOR FIELDS, MANIFOLDS, AND VECTOR CALCULUS

If {𝑒_𝑖} is a basis for 𝑉 and {𝑒_𝑖^∗} the dual basis in 𝑉^∗, one can evaluate a wedge product 𝑒_𝐽^∗ = 𝑒_{𝑗_1}^∗ ∧ ⋯ ∧ 𝑒_{𝑗_𝑘}^∗ for an arbitrary multi-index 𝐽 = (𝑗_1, … , 𝑗_𝑘) of length 𝑘 with 𝑗_𝑠 ∈ [1, 𝑚], whose entries need not be increasing or even distinct.

Exercise 4.48. If {𝜙_1^∗, … , 𝜙_𝑟^∗} are vectors in 𝑉^∗, use the algebraic rules governing the “∧” operation to explain why 𝜙_1^∗ ∧ ⋯ ∧ 𝜙_𝑟^∗ = 0 in the following circumstances:
(a) There is a repeated vector, say with 𝜙_𝑖^∗ = 𝜙_𝑗^∗ for indices 𝑖 ≠ 𝑗.
(b) One of the vectors 𝜙_𝑖^∗ is the zero vector in 𝑉^∗.
(c) The vectors {𝜙_𝑖^∗} are not linearly independent in 𝑉^∗.
(d) 𝑟 > dim(𝑉^∗).
In particular, ⋀^𝑟 𝑉 = (0) for 𝑟 > dim(𝑉).

Exercise 4.49. Let 𝑉 be a vector space with basis {𝑒_1, … , 𝑒_𝑚} and dual basis {𝑒_1^∗, … , 𝑒_𝑚^∗} in 𝑉^∗, and let 𝑒_𝐽^∗ = 𝑒_{𝑗_1}^∗ ∧ ⋯ ∧ 𝑒_{𝑗_𝑘}^∗ be an arbitrary monomial in the dual basis vectors with 𝑗_1, … , 𝑗_𝑘 ∈ [1, 𝑚] but not necessarily with 𝑗_1 < ⋯ < 𝑗_𝑘.
(a) If we interchange adjacent factors 𝑒_{𝑗_𝑠}^∗ ↔ 𝑒_{𝑗_{𝑠+1}}^∗ in the wedge product, explain why 𝑒_𝐽^∗ becomes −𝑒_𝐽^∗.
(b) If the indices (𝑗_1, … , 𝑗_𝑘) are distinct, there is a unique permutation 𝜎 in 𝑆_𝑘 such that 𝐼 = 𝜎 ⋅ 𝐽 = (𝑗_{𝜎(1)}, … , 𝑗_{𝜎(𝑘)}) has the same indices listed in increasing order 𝑗_{𝜎(1)} < ⋯ < 𝑗_{𝜎(𝑘)}, so 𝑒_𝐼^∗ is a standard basis vector in ⋀^𝑘 𝑉. Explain why 𝑒_𝐽^∗ = 𝑒_{𝜎(𝐽)}^∗ = sgn(𝜎) ⋅ 𝑒_𝐼^∗.
Hint. In any case, 𝑒_𝐽^∗ is either zero or is ±𝑒_𝐼^∗ for a unique standard basis vector 𝑒_𝐼^∗ ∈ ⋀^𝑘 𝑉. If there are identical factors in 𝑒_𝐽^∗, repeated swapping of adjacent factors will bring the identical factors together, where they annihilate each other because 𝜙 ∧ 𝜙 = 0. If the factors in 𝑒_𝐽^∗ are distinct, some permutation of the entries yields a multi-index 𝐼 in ℰ_𝑘. Then 𝐽 = 𝜎 ⋅ 𝐼 for some 𝜎 ∈ 𝑆_𝑘 and 𝑒_𝐽^∗ = ±𝑒_𝐼^∗. The problem in (b) is to show that the ± sign is given by the signature sgn(𝜎).

Calculating Wedge Products on a Chart. If 𝐽 = (𝑗_1, … , 𝑗_𝑘) is an arbitrary multi-index of length 𝑘 (entries not necessarily increasing or distinct), we can form the wedge product (𝑑𝑥)_𝐽 = (𝑑𝑥_{𝑗_1}) ∧ ⋯ ∧ (𝑑𝑥_{𝑗_𝑘}) of 1-forms (𝑑𝑥_𝑖) determined by the chart (𝑈_𝛼, 𝑥_𝛼). But in view of the anti-commutation relations for wedge products, we get 𝑒_𝐽^∗ ≡ 0 on 𝑈_𝛼 unless the entries in 𝐽 are distinct, and if they are distinct, then 𝑒_𝐽^∗ = ±𝑒_𝐼^∗ for a unique standard basis vector 𝑒_𝐼^∗, 𝐼 ∈ ℰ_𝑘.

One last thing should be noted. In discussing wedge products 𝜔 ∧ 𝜇 of differential forms on 𝑀, we often encounter “weighted” expressions like 𝑓 (𝑑𝑥_𝐼) ∧ ℎ (𝑑𝑥_𝐽) involving 𝒞^∞ functions 𝑓, ℎ on the chart domain 𝑈_𝛼. By definition, (𝜔 ∧ 𝜇)_𝑢 = 𝜔_𝑢 ∧ 𝜇_𝑢 at all base points. Hence

(4.37)   (𝑓𝜔) ∧ (ℎ𝜇) = (𝑓ℎ) ⋅ (𝜔 ∧ 𝜇)

as differential forms on 𝑀.
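The sign bookkeeping in Exercise 4.49 can be mechanized: repeatedly swap adjacent entries to sort the multi-index, flipping the sign with each swap, and return 0 when an index repeats. The helper below is hypothetical (not part of the text), a small sketch of that procedure:

```python
# Hypothetical utility (not from the text) mechanizing Exercise 4.49:
# sort the multi-index J by adjacent swaps; each swap flips the sign,
# and a repeated index makes the monomial zero.
def standardize(J):
    """Return (sign, I) with e_J* = sign * e_I*, I increasing;
    sign = 0 if J has a repeated index (then e_J* = 0)."""
    J = list(J)
    if len(set(J)) < len(J):
        return 0, None
    sign = 1
    for top in range(len(J) - 1, 0, -1):      # bubble sort, counting swaps
        for s in range(top):
            if J[s] > J[s + 1]:
                J[s], J[s + 1] = J[s + 1], J[s]
                sign = -sign                  # adjacent transposition: e_J* -> -e_J*
    return sign, tuple(J)

assert standardize((2, 1, 3)) == (-1, (1, 2, 3))   # one adjacent swap
assert standardize((3, 1, 2)) == (1, (1, 2, 3))    # even permutation
assert standardize((1, 2, 2))[0] == 0              # repeated factor: e_J* = 0
```

The sign returned by the swap count is exactly sgn(𝜎), which is what part (b) of the exercise asks you to prove.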

4.3. DIFFERENTIAL FORMS ON 𝑴 AND THEIR EXTERIOR DERIVATIVES

171

Example 4.50. Let 𝑈_𝛼 be an open set in 𝑀 = ℝ^2 on which Cartesian coordinates 𝑥_𝛼(𝐱) = (𝑥, 𝑦) and polar coordinates 𝑦_𝛽(𝐱) = (𝑟, 𝜃) are both defined. The smooth 2-forms 𝑑𝑥 ∧ 𝑑𝑦 and 𝑑𝑟 ∧ 𝑑𝜃 each provide a basis for the 1-dimensional space ⋀^2(TM_𝑝) at each 𝑝 ∈ 𝑈_𝛼. We shall rewrite the smooth 2-form 𝑑𝑥 ∧ 𝑑𝑦 as 𝐹(𝑟, 𝜃) ⋅ (𝑑𝑟 ∧ 𝑑𝜃) using the algebraic properties of Theorem 4.44 and the change of variable formula for cotangent vectors (Theorem 4.30).

Discussion. The chart maps have scalar components 𝑥_𝛼(𝐱) = (𝑥, 𝑦) = (𝑋(𝐱), 𝑌(𝐱)) and 𝑦_𝛽(𝐱) = (𝑟, 𝜃) = (𝑅(𝐱), Θ(𝐱)) for 𝐱 ∈ 𝑀, and the coordinate transition maps in either direction are given by

(4.38)   𝑋 = 𝑅 cos(Θ), 𝑌 = 𝑅 sin(Θ)   for 𝑥_𝛼 ∘ 𝑦_𝛽^{−1};
         𝑅 = (𝑋^2 + 𝑌^2)^{1/2}, Θ = arctan(𝑌/𝑋)   for 𝑦_𝛽 ∘ 𝑥_𝛼^{−1}.

Apply the exterior derivative to the scalar component function 𝑋 in (4.38) to get

    𝑑𝑥 = 𝑑𝑋 = (𝜕𝑋/𝜕𝑟) ⋅ 𝑑𝑟 + (𝜕𝑋/𝜕𝜃) ⋅ 𝑑𝜃
       = (𝜕/𝜕𝑟){𝑅 cos Θ} ⋅ 𝑑𝑟 + (𝜕/𝜕𝜃){𝑅 cos Θ} ⋅ 𝑑𝜃
       = [ (𝜕𝑅/𝜕𝑟) cos(Θ) − 𝑅 sin(Θ) ⋅ (𝜕Θ/𝜕𝑟) ] ⋅ 𝑑𝑟 + [ (𝜕𝑅/𝜕𝜃) cos(Θ) − 𝑅 sin(Θ) ⋅ (𝜕Θ/𝜕𝜃) ] ⋅ 𝑑𝜃,

where 𝑑𝑋 = (𝑑𝑥) is the exterior derivative of the scalar component 𝑋 ∶ 𝑈_𝛼 → ℝ in the chart map 𝑥_𝛼. By (4.22), (𝑑𝑥) and 𝑑𝑋 are the same smooth 1-form on 𝑈_𝛼, and by (4.7) we get

    𝜕𝑅/𝜕𝑟 ≡ 1,  𝜕Θ/𝜕𝑟 ≡ 0,  𝜕𝑅/𝜕𝜃 ≡ 0,  𝜕Θ/𝜕𝜃 ≡ 1

on 𝑈_𝛼, so on the chart domain we have

    𝑑𝑥 = 𝑑𝑋 = cos(Θ) ⋅ 𝑑𝑟 − 𝑅 sin(Θ) ⋅ 𝑑𝜃.

The same sort of calculation yields

    𝑑𝑦 = 𝑑𝑌 = sin(Θ) ⋅ 𝑑𝑟 + 𝑅 cos(Θ) ⋅ 𝑑𝜃.

Adopting the time-honored abuse of notation that ignores the distinction between a function 𝑟 = 𝑅(𝐱) and its values 𝑟, and likewise for the variable 𝜃, we get

    𝑑𝑥 ∧ 𝑑𝑦 = [cos(𝜃) ⋅ 𝑑𝑟 − 𝑟 sin(𝜃) ⋅ 𝑑𝜃] ∧ [sin(𝜃) ⋅ 𝑑𝑟 + 𝑟 cos(𝜃) ⋅ 𝑑𝜃]
            = sin(𝜃) cos(𝜃) ⋅ (𝑑𝑟 ∧ 𝑑𝑟) − 𝑟 sin^2(𝜃) ⋅ (𝑑𝜃 ∧ 𝑑𝑟) + 𝑟 cos^2(𝜃) ⋅ (𝑑𝑟 ∧ 𝑑𝜃) − 𝑟^2 sin(𝜃) cos(𝜃) ⋅ (𝑑𝜃 ∧ 𝑑𝜃)
            = −𝑟 sin^2(𝜃) ⋅ (𝑑𝜃 ∧ 𝑑𝑟) + 𝑟 cos^2(𝜃) ⋅ (𝑑𝑟 ∧ 𝑑𝜃) = 𝑟 (𝑑𝑟 ∧ 𝑑𝜃)

because 𝑑𝑟 ∧ 𝑑𝑟 = 𝑑𝜃 ∧ 𝑑𝜃 ≡ 0 and 𝑑𝜃 ∧ 𝑑𝑟 = −𝑑𝑟 ∧ 𝑑𝜃 by antisymmetry. ○
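The coefficient 𝐹(𝑟, 𝜃) = 𝑟 found in Example 4.50 is exactly the Jacobian determinant of the transition map (𝑟, 𝜃) ↦ (𝑥, 𝑦), which can be verified symbolically. A minimal sketch, assuming sympy is available:

```python
# Sketch (assuming sympy): the coefficient in dx ^ dy = F(r, theta) * (dr ^ dtheta)
# is the Jacobian determinant of the transition map (r, theta) -> (x, y).
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = r * sp.cos(th)
y = r * sp.sin(th)

J = sp.Matrix([[sp.diff(x, r), sp.diff(x, th)],
               [sp.diff(y, r), sp.diff(y, th)]])
factor = sp.simplify(J.det())   # F(r, theta) in dx ^ dy = F * dr ^ dtheta
assert factor == r              # agrees with Example 4.50
```

The same determinant computation answers the ℝ^3 half of Exercise 4.51 with a 3 × 3 Jacobian.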


Exercise 4.51. On ℝ^2, compute 𝑑𝑟 ∧ 𝑑𝜃 in terms of 𝑑𝑥 ∧ 𝑑𝑦. On ℝ^3, compute 𝑑𝑥 ∧ 𝑑𝑦 ∧ 𝑑𝑧 in terms of 𝑑𝜌 ∧ 𝑑𝜙 ∧ 𝑑𝜃 (spherical coordinates).

Remark 4.52. Recall that spherical coordinates (𝜌, 𝜃, 𝜙) and Cartesian coordinates (𝑥, 𝑦, 𝑧) on ℝ^3 are related via

    𝑥 = 𝜌 cos(𝜃) cos(𝜙),  𝑦 = 𝜌 sin(𝜃) cos(𝜙),  𝑧 = 𝜌 sin(𝜙)

(see Figure 4.3).

○

Exterior Derivative of Higher Rank 𝒌-Forms. Smooth 𝑘-forms (smooth rank-𝑘 differential forms) 𝜔 are tensor fields whose values 𝜔_𝑝 lie in ⋀^𝑘(TM_𝑝) for each 𝑝 ∈ 𝑀. Given any chart (𝑈_𝛼, 𝑥_𝛼), such a field is uniquely described on 𝑈_𝛼 as a sum

    𝜔_𝑢 = ∑_{𝐼∈ℰ_𝑘} 𝑐_𝐼(𝑢) ⋅ (𝑑𝑥_{𝑖_1})_𝑢 ∧ ⋯ ∧ (𝑑𝑥_{𝑖_𝑘})_𝑢 = ∑_{𝐼∈ℰ_𝑘} 𝑐_𝐼(𝑢) ⋅ (𝑑𝑥_𝐼)_𝑢   for 𝑢 ∈ 𝑈_𝛼,

and smoothness means the 𝑐_𝐼 are in 𝒞^∞(𝑈_𝛼) for all charts. The general exterior derivatives 𝑑_𝑘 on a manifold with dim(𝑀) = 𝑚,

    ⋀^0 𝑀 —𝑑=𝑑_0⟶ ⋀^1 𝑀 —𝑑=𝑑_1⟶ ⋀^2 𝑀 —𝑑=𝑑_2⟶ ⋯ —𝑑=𝑑_{𝑚−1}⟶ ⋀^𝑚 𝑀 —𝑑=𝑑_𝑚⟶ 0,

are operations on rank-𝑘 differential forms for 𝑘 = 0, 1, 2, …. The rank-0 derivative 𝑑_0 has been defined in a coordinate-free way on ⋀^0 𝑀 = 𝒞^∞(𝑀) in (4.20) for 𝑓 ∈ 𝒞^∞(𝑀); the result (𝑑𝑓) ∈ ⋀^1 𝑀 is the 1-form such that

(4.39)   ⟨(𝑑𝑓)_𝑝, 𝑋_𝑝⟩ = ⟨𝑋_𝑝, 𝑓⟩   for all 𝑋_𝑝 ∈ TM_𝑝, 𝑓 ∈ 𝒞^∞(𝑀).

As in (4.21), in local chart coordinates, 𝑑𝑓 takes the form

    (𝑑𝑓)_𝑝 = ∑_{𝑖=1}^{𝑚} (𝜕𝑓/𝜕𝑥_𝑖)(𝑝) ⋅ (𝑑𝑥_𝑖)_𝑝   for all 𝑝 ∈ 𝑈_𝛼.

Coordinate-free formulas exist describing the action of exterior derivatives 𝑑_𝑘 of all ranks 𝑘 = 0, 1, 2, …. In rank 0, the formula (4.39) for 𝑑_0 is simple, but such descriptions become more complicated and unintuitive for 𝑘 ≥ 1. There is, however, a fairly straightforward way to describe 𝑑𝜔 in local chart coordinates for 𝜔 of any rank 𝑘, and this is the approach we shall pursue. The only problem lies in showing that we get the same element 𝑑𝜔_𝑝 in ⋀^{𝑘+1}(TM_𝑝) at all base points in 𝑀, no matter which local chart (𝑈_𝛼, 𝑥_𝛼) we use to compute it. (The proof that the definition is chart-independent is fairly arduous, and we won’t have time to prove it in this brief survey.) □

Here is the definition of 𝑑𝜔 in local coordinates.

Definition 4.53 (General Exterior Derivative 𝑑_𝑘). If (𝑈_𝛼, 𝑥_𝛼) is any chart and 𝜔 ∈ ⋀^𝑘 𝑀, there are unique 𝒞^∞ coefficients 𝑐_𝐼(𝑢) such that

    𝜔_𝑢 = ∑_{𝐼∈ℰ_𝑘} 𝑐_𝐼(𝑢) ⋅ (𝑑𝑥_𝐼)_𝑢 = ∑_{𝐼∈ℰ_𝑘} 𝑐_𝐼(𝑢) ⋅ (𝑑𝑥_{𝑖_1} ∧ ⋯ ∧ 𝑑𝑥_{𝑖_𝑘})

for all 𝑢 ∈ 𝑈_𝛼. Then the exterior derivative 𝑑𝜔 ∈ ⋀^{𝑘+1} 𝑀 is given on 𝑈_𝛼 by

(4.40)   𝑑𝜔 = ∑_{𝐼∈ℰ_𝑘} (𝑑𝑐_𝐼) ∧ (𝑑𝑥_𝐼) = ∑_{𝐼∈ℰ_𝑘} ( ∑_{𝑗=1}^{𝑚} (𝜕𝑐_𝐼/𝜕𝑥_𝑗) 𝑑𝑥_𝑗 ) ∧ (𝑑𝑥_𝐼)
             = ∑_{𝐼∈ℰ_𝑘} ∑_{𝑗=1}^{𝑚} (𝜕𝑐_𝐼/𝜕𝑥_𝑗) ⋅ (𝑑𝑥_𝑗) ∧ (𝑑𝑥_{𝑖_1} ∧ ⋯ ∧ 𝑑𝑥_{𝑖_𝑘}),

where 𝑑𝑐_𝐼 is the usual exterior derivative (4.39) of the scalar function 𝑐_𝐼(𝑢). The chart determines a basis 𝔛_𝑝^∗ = {(𝑑𝑥_1)_𝑝, … , (𝑑𝑥_𝑚)_𝑝} in TM_𝑝^∗ for each 𝑝 ∈ 𝑈_𝛼 and “standard” basis vectors in ⋀^𝑘(TM_𝑝^∗),

    𝑒_𝐼^∗ = 𝑑𝑥_𝐼 = (𝑑𝑥_{𝑖_1} ∧ ⋯ ∧ 𝑑𝑥_{𝑖_𝑘})   for 𝐼 = (𝑖_1 < ⋯ < 𝑖_𝑘) in ℰ_𝑘.

A monomial 𝑑𝑥_𝑗 ∧ (𝑑𝑥_{𝑖_1} ∧ ⋯ ∧ 𝑑𝑥_{𝑖_𝑘}) in (4.40) will be zero in ⋀^{𝑘+1}(TM_𝑝^∗) if 𝑗 is one of the entries in 𝐼; otherwise it will be nonzero but not a standard basis vector in ⋀^{𝑘+1}(TM_𝑢^∗) unless 𝑗 < 𝑖_1 < ⋯ < 𝑖_𝑘. However, it is easy to rewrite any monomial 𝑑𝑥_𝑗 ∧ (𝑑𝑥_{𝑖_1} ∧ ⋯ ∧ 𝑑𝑥_{𝑖_𝑘}) appearing in (4.40) as a sum involving ordered monomials using the rules provided in Theorem 4.44 and Exercise 4.47.

Remark 4.54. An annoying technical issue arises in applying (4.40). This definition presumes that 𝜔 has been presented as a sum of monomials 𝑐_𝐼 𝑒_𝐼^∗ involving the standard basis vectors with 𝐼 ∈ ℰ_𝑘. If we wish to compute the exterior derivative of a 𝑘-form 𝜔 = ∑_{𝐽∈[1,𝑚]^𝑘} 𝑐_𝐽 ⋅ 𝑒_{𝑗_1}^∗ ∧ ⋯ ∧ 𝑒_{𝑗_𝑘}^∗ involving unordered monomials 𝑒_𝐽^∗, it seems we would have to rewrite each summand in terms of the standard (ordered) basis monomials 𝑒_𝐼^∗ with 𝐼 ∈ ℰ_𝑘 before applying (4.40). This could be done using the commutation rule 𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗 = −𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖, repeatedly transposing adjacent factors to get 𝑑𝑥_𝐽 = ±𝑑𝑥_𝐼 for some 𝐼 ∈ ℰ_𝑘. That would be a terrible nuisance, but the following observation saves the day. ○

Lemma 4.55 (Extended Exterior Derivative). Formula (4.40) remains valid for 𝑘-forms on a chart domain 𝑈_𝛼,

    𝜔 = ∑_{𝐽∈[1,𝑚]^𝑘} 𝑐_𝐽 ⋅ 𝑒_𝐽^∗   (𝑐_𝐽 ∈ 𝒞^∞(𝑈_𝛼)),

even if they involve monomials 𝑒_𝐽^∗ that are unordered or have repeated indices (in which case the term is zero).

Proof. If an arbitrary multi-index 𝐽 ∈ [1, 𝑚]^𝑘 has a repeated entry, say 𝑗_𝑟 = 𝑗_𝑠, then 𝑑(𝑓 𝑑𝑥_𝐽) = (𝑑𝑓) ∧ 𝑑𝑥_𝐽 = 0 and does not contribute to the sum (4.40). Otherwise there is a unique permutation 𝜎 ∈ 𝑆_𝑘 such that 𝐽 = 𝜎 ⋅ 𝐼 for an ordered index 𝐼 ∈ ℰ_𝑘, and then for any 𝑓 ∈ 𝒞^∞(𝑈_𝛼), we have

    𝑑(𝑓 ⋅ 𝑑𝑥_𝐽) = 𝑑(𝑓 ⋅ sgn(𝜎) 𝑑𝑥_𝐼) = sgn(𝜎) ⋅ (𝑑𝑓 ∧ 𝑑𝑥_𝐼)   (definition of the 𝑑-operator)
               = 𝑑𝑓 ∧ sgn(𝜎) ⋅ 𝑑𝑥_𝐼 = 𝑑𝑓 ∧ 𝑑𝑥_𝐽.   □
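Formula (4.40) is easy to apply mechanically. For a 1-form on ℝ^2 it produces a single coefficient, since only the cross terms survive. A minimal symbolic sketch (assuming sympy; the coefficients c1, c2 are illustrative choices, not from the text):

```python
# Minimal sketch of (4.40) (assuming sympy) for a 1-form on R^2:
# for omega = c1 dx1 + c2 dx2,  d(omega) = (dc2/dx1 - dc1/dx2) dx1 ^ dx2.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
c1 = x1**2 * x2           # illustrative smooth coefficients
c2 = sp.sin(x1) + x2

# d(omega) = dc1 ^ dx1 + dc2 ^ dx2; dx1^dx1 = dx2^dx2 = 0, and
# dx2 ^ dx1 = -dx1 ^ dx2, so the lone dx1 ^ dx2 coefficient is:
coeff = sp.diff(c2, x1) - sp.diff(c1, x2)
assert sp.simplify(coeff - (sp.cos(x1) - x1**2)) == 0
```

The same bookkeeping, term by term over 𝐼 ∈ ℰ_𝑘 and 𝑗 = 1, … , 𝑚, evaluates (4.40) for forms of any rank.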


We will take advantage of this very useful fact below.

We now list the basic properties of the operators 𝑑_𝑘 ∶ ⋀^𝑘 𝑀 → ⋀^{𝑘+1} 𝑀. The proofs are complicated, so we will not go into them here in order to press on toward reinterpreting multivariate calculus in terms of differential forms.

Theorem 4.56 (Basic Properties of Exterior Derivatives). Although 𝑑_𝑘 𝜔 is defined on 𝑀 using its description on a typical coordinate chart (𝑈_𝛼, 𝑥_𝛼), formula (4.40) determines the same differential form (𝑑_𝑘 𝜔)_𝑝 in ⋀^{𝑘+1} 𝑀 for every chart (𝑈_𝛼, 𝑥_𝛼) containing 𝑝. Thus, (4.40) yields a well-defined differential form 𝑑_𝑘 𝜔 in ⋀^{𝑘+1} 𝑀 for every 𝜔 ∈ ⋀^𝑘 𝑀. Furthermore, we have the following:

1. The Rank-0 Case. The rank-0 derivative 𝑑_0 ∶ ⋀^0 𝑀 → ⋀^1 𝑀 is just the 𝑑-operator discussed earlier in (4.24), such that ⟨𝑑_0 𝑓, 𝑋_𝑝⟩ = ⟨𝑋_𝑝, 𝑓⟩ for 𝑋_𝑝 ∈ TM_𝑝, 𝑓 ∈ 𝒞^∞(𝑝).

2. Linearity. Each 𝑑_𝑘 is a linear operator from ⋀^𝑘 𝑀 → ⋀^{𝑘+1} 𝑀.

3. General Derivation Property. If 𝜔_1 ∈ ⋀^𝑘 𝑀 and 𝜔_2 ∈ ⋀^ℓ 𝑀, then

    𝑑_{𝑘+ℓ}(𝜔_1 ∧ 𝜔_2) = 𝑑_𝑘(𝜔_1) ∧ 𝜔_2 + (−1)^𝑘 𝜔_1 ∧ 𝑑_ℓ(𝜔_2)

in ⋀^{𝑘+ℓ+1} 𝑀. This property governs the interaction between exterior derivatives and wedge products of differential forms on 𝑀.

4. By far the most interesting property is 𝑑^2 = 0, which is a shorthand version of the statement

(4.41)   𝑑_{𝑘+1} ∘ 𝑑_𝑘(𝜔) = 0 in ⋀^{𝑘+2} 𝑀   for all 𝜔 ∈ ⋀^𝑘 𝑀.

This follows from equality of mixed second-order partial derivatives of class 𝒞^(2) functions on ℝ^𝑚. Parts 1 and 2 are trivial; here is a proof that 𝑑^2 = 0.

Lemma 4.57. For each 𝑘 = 0, 1, 2, … we have 0 = 𝑑^2 = 𝑑_{𝑘+1} ∘ 𝑑_𝑘.

Proof. It suffices to show 𝑑^2 𝜔 = 0 on a coordinate chart (𝑈_𝛼, 𝑥_𝛼). In local coordinates, 𝜔 ∈ ⋀^𝑘 𝑀 is a sum over 𝐼 ∈ ℰ_𝑘 of terms like 𝑓 ⋅ 𝑑𝑥_𝐼. Since the exterior derivative 𝑑_𝑘 is linear, 𝑑_𝑘 𝜔 is a sum of terms

    𝑑(𝑓 𝑑𝑥_𝐼) = ∑_{𝑖=1}^{𝑚} (𝜕𝑓/𝜕𝑥_𝑖) ⋅ (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝐼),

and then 𝑑^2(𝜔) = 𝑑_{𝑘+1}(𝑑_𝑘 𝜔) consists of terms

    𝑑(𝑑𝜔) = ∑_{𝑗=1}^{𝑚} ∑_{𝑖=1}^{𝑚} (𝜕^2𝑓 / 𝜕𝑥_𝑗 𝜕𝑥_𝑖) ⋅ 𝑑𝑥_𝑗 ∧ (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝐼)   with 𝐼 ∈ ℰ_𝑘.

The monomials (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝐼) appearing in 𝑑(𝑓 𝑑𝑥_𝐼) might not be in standard form, but formula (4.40) nevertheless determines the same differential form (𝑑𝜔)_𝑝 in ⋀^{𝑘+1}(TM_𝑝^∗) for every chart 𝑈_𝛼 containing 𝑝. Since 𝜕^2𝑓/𝜕𝑥_𝑗 𝜕𝑥_𝑖 = 𝜕^2𝑓/𝜕𝑥_𝑖 𝜕𝑥_𝑗, anticommutativity of wedge products makes the terms

    (𝜕^2𝑓/𝜕𝑥_𝑗 𝜕𝑥_𝑖) (𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝐼)
and
    (𝜕^2𝑓/𝜕𝑥_𝑖 𝜕𝑥_𝑗) (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝐼) = (−1) (𝜕^2𝑓/𝜕𝑥_𝑖 𝜕𝑥_𝑗) (𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝐼)

in the double sum cancel in pairs for 𝑖 ≠ 𝑗, and when 𝑗 = 𝑖, we have 𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖 = 𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑖 ≡ 0 on 𝑈_𝛼. Thus, 𝑑(𝑑𝜔) = 0 on any chart, as claimed. □

The following observation is also useful in calculations involving exterior derivatives.
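The cancellation in the proof of Lemma 4.57 is exactly the equality of mixed partials. That equality is easy to spot-check symbolically (an illustrative sketch, assuming sympy; the function 𝑓 is an arbitrary sample choice):

```python
# The heart of Lemma 4.57 in sympy: for a smooth function f, the coefficient of
# dx_j ^ dx_i in d(df) is a difference of mixed partials, which vanishes.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x * y) * sp.sin(z)    # any C^2 function will do (illustrative choice)

for u, v in [(x, y), (x, z), (y, z)]:
    assert sp.simplify(sp.diff(f, u, v) - sp.diff(f, v, u)) == 0
```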

Lemma 4.58. If 𝑓 ∈ 𝒞^∞(𝑀) and 𝜔 ∈ ⋀^𝑘 𝑀 with 𝜔 = ∑_{𝐼∈ℰ_𝑘} 𝑐_𝐼(𝑢) ⋅ (𝑑𝑥_𝐼) in local coordinates, then the exterior derivative of the combined 𝑘-form 𝑓𝜔 is described on the chart domain 𝑈_𝛼 by

(4.42)   𝑑(𝑓𝜔) = ∑_{𝐼∈ℰ_𝑘} [ 𝑐_𝐼(𝑢) ⋅ (𝑑𝑓) ∧ (𝑑𝑥_𝐼) + 𝑓(𝑢) ⋅ (𝑑𝑐_𝐼) ∧ (𝑑𝑥_𝐼) ]   in ⋀^{𝑘+1}(𝑈_𝛼).

Proof. The exterior derivative takes the form

    𝑑(𝑓𝜔) = 𝑑( ∑_{𝐼∈ℰ_𝑘} (𝑓 ⋅ 𝑐_𝐼)(𝑢) ⋅ (𝑑𝑥_𝐼) ) = ∑_{𝐼∈ℰ_𝑘} 𝑑((𝑓𝑐_𝐼) ⋅ (𝑑𝑥_𝐼))
           = ∑_{𝐼∈ℰ_𝑘} 𝑑(𝑓𝑐_𝐼) ∧ (𝑑𝑥_𝐼)   (by (4.40))

in local coordinates. Since 𝑑 is a derivation on 𝒞^∞(𝑈_𝛼), we get (4.42). □
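Both sides of (4.42) can be compared coefficient by coefficient for a 1-form on ℝ^2. A sketch assuming sympy is available (the particular 𝑓, 𝑐_1, 𝑐_2 are illustrative choices, not from the text):

```python
# Checking (4.42) in coordinates (assuming sympy): for omega = c1 dx + c2 dy
# and smooth f, both sides of d(f*omega) = df ^ omega + f * d(omega) have the
# same dx ^ dy coefficient.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x) * y                 # illustrative choices
c1, c2 = x * y, sp.cos(y)

lhs = sp.diff(f * c2, x) - sp.diff(f * c1, y)          # coefficient of d(f*omega)
rhs = (sp.diff(f, x) * c2 - sp.diff(f, y) * c1) \
      + f * (sp.diff(c2, x) - sp.diff(c1, y))          # df^omega + f*d(omega)
assert sp.simplify(lhs - rhs) == 0
```

The check is just the product rule for partial derivatives, which is what the phrase “𝑑 is a derivation” packages.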

Primitives of 𝒌-Forms. We say that 𝜇 ∈ ⋀^𝑘 𝑀 is a primitive for 𝜔 in ⋀^{𝑘+1} 𝑀 if 𝑑_𝑘 𝜇 = 𝜔 on 𝑀. A primitive for 𝜔 might not exist, but the condition 𝑑^2 ≡ 0 imposes a necessary condition for 𝜔 to have locally defined primitives, namely 𝑑𝜔 = 𝑑_{𝑘+1} 𝜔 ≡ 0 on 𝑀. In fact, if 𝜔 = 𝑑𝜇 on some open set 𝑈 ⊆ 𝑀, then 𝑑𝜔 = 𝑑^2 𝜇 = 0 on 𝑈.

For rank-1 forms, this identity is equivalent to the set of “consistency conditions” (4.26) mentioned earlier. It suffices to verify this on a typical chart, so suppose 𝑑_1 𝜔 = 0 in ⋀^2(𝑈_𝛼) for 𝜔 = ∑_{𝑖=1}^{𝑚} 𝑐_𝑖 ⋅ 𝑑𝑥_𝑖 in ⋀^1(𝑈_𝛼). Since 𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗 = 0 if 𝑖 = 𝑗,

    𝑑𝜔 = ∑_{𝑖=1}^{𝑚} 𝑑𝑐_𝑖 ∧ 𝑑𝑥_𝑖 = ∑_{1≤𝑖<𝑗≤𝑚} ( 𝜕𝑐_𝑗/𝜕𝑥_𝑖 − 𝜕𝑐_𝑖/𝜕𝑥_𝑗 ) ⋅ (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗),

so 𝑑𝜔 = 0 on 𝑈_𝛼 exactly when 𝜕𝑐_𝑗/𝜕𝑥_𝑖 = 𝜕𝑐_𝑖/𝜕𝑥_𝑗 for all 𝑖 < 𝑗.
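A quick symbolic check of the consistency condition, together with a verification that a candidate primitive works (assuming sympy; the particular 𝜔 and 𝜇 are illustrative choices, not from the text):

```python
# Sketch (assuming sympy): for omega = c1 dx + c2 dy on R^2, the consistency
# condition for a primitive is dc1/dy = dc2/dx, i.e. d(omega) = 0.
import sympy as sp

x, y = sp.symbols('x y')
c1, c2 = 2 * x * y, x**2          # illustrative closed 1-form omega = c1 dx + c2 dy
assert sp.diff(c1, y) == sp.diff(c2, x)       # d(omega) = 0

mu = x**2 * y                      # candidate primitive: d(mu) = omega
assert sp.diff(mu, x) == c1 and sp.diff(mu, y) == c2
```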

Let {𝐞_𝑖} be the standard basis vectors in 𝑉 = ℝ^3 and {𝐞_𝑖^∗} the dual vectors in 𝑉^∗ = ⋀^1 𝑉. Fix a base point 𝑝 in 𝑀 = ℝ^3. If we interpret the 𝐞_𝑖 as tangent vectors at 𝑝, so 𝐞_𝑖 = the directional derivative (𝜕/𝜕𝑥_𝑖 |_𝑝) in TM_𝑝, then the dual basis vectors are 𝐞_𝑖^∗ = (𝑑𝑥_𝑖)_𝑝 in TM_𝑝^∗. We have already seen that the

    Classical Gradient: ∇𝑓 = ∑_{𝑖=1}^{3} 𝐷_{𝑥_𝑖}𝑓(𝐱) ⋅ 𝐞_𝑖

should be interpreted as a rank-1 differential form, the exterior derivative of 𝑓 ∈ ⋀^0 𝑀 = 𝒞^∞(𝑀), which involves the dual basis vectors 𝐞_𝑖^∗ = (𝑑𝑥_𝑖)_𝑝,

    ∇𝑓(𝑝) = ∑_{𝑖=1}^{3} (𝜕𝑓/𝜕𝑥_𝑖)(𝑝) 𝐞_𝑖^∗ = ∑_{𝑖=1}^{3} (𝜕𝑓/𝜕𝑥_𝑖)(𝑝) (𝑑𝑥_𝑖)_𝑝 = (𝑑𝑓)_𝑝   in TM_𝑝^∗


rather than as a tangent vector

    ∑_{𝑖=1}^{3} (𝜕𝑓/𝜕𝑥_𝑖)(𝑝) 𝐞_𝑖 = ∑_{𝑖=1}^{3} (𝜕𝑓/𝜕𝑥_𝑖)(𝑝) (𝜕/𝜕𝑥_𝑖 |_𝑝)   in TM_𝑝.

As noted earlier, the exterior derivative 𝑑𝑓 = ∇𝑓 takes the same general form in all coordinate systems. For instance, in spherical coordinates 𝑦_𝛽(𝐱) = (𝜌, 𝜃, 𝜙) and Cartesian coordinates 𝑥_𝛼(𝐱) = (𝑥, 𝑦, 𝑧), we have

    𝑑𝑓 = (𝜕𝑓/𝜕𝑥) 𝑑𝑥 + ⋯ + (𝜕𝑓/𝜕𝑧) 𝑑𝑧 = (𝜕𝑓/𝜕𝜌) 𝑑𝜌 + ⋯ + (𝜕𝑓/𝜕𝜙) 𝑑𝜙.

Now consider a classical “vector field” on ℝ^3 of the sort encountered in calculus, which is usually written as

    𝐅(𝐱) = 𝐹_1(𝐱) 𝐢 + 𝐹_2(𝐱) 𝐣 + 𝐹_3(𝐱) 𝐤

with smooth coefficients 𝐹_𝑖 ∶ ℝ^3 → ℝ. It is not clear how the basis vectors {𝐢, 𝐣, 𝐤} in these classical narratives are to be interpreted — as tangent vectors attached to each base point 𝑝? Cotangent vectors? Something else? The answer depends on the physical nature of the “field” being modeled. For instance, we have seen that an electric field 𝐄(𝑝), being ∇𝜙 for some scalar potential function, should be regarded as a field of cotangent vectors and not as a field of tangent vectors. Our interpretation of ∇𝑓 as a field of cotangent vectors suggests that, in this case at least, we might interpret the traditional basis vectors {𝐢, 𝐣, 𝐤} as

    𝐢 = (𝑑𝑥)_𝑝,  𝐣 = (𝑑𝑦)_𝑝,  𝐤 = (𝑑𝑧)_𝑝.

On the other hand, consider the operator curl = ∇× which sends a classical vector field 𝐅 to a new vector field ∇ × 𝐅 given by

(4.49)   ∇ × 𝐅 = curl(𝐅) = det [ 𝐢  𝐣  𝐤 ; 𝜕/𝜕𝑥  𝜕/𝜕𝑦  𝜕/𝜕𝑧 ; 𝐹_1  𝐹_2  𝐹_3 ]
                = (𝜕𝐹_3/𝜕𝑦 − 𝜕𝐹_2/𝜕𝑧) 𝐢 + (𝜕𝐹_1/𝜕𝑧 − 𝜕𝐹_3/𝜕𝑥) 𝐣 + (𝜕𝐹_2/𝜕𝑥 − 𝜕𝐹_1/𝜕𝑦) 𝐤.

Should we still interpret the symbols 𝐢, 𝐣, 𝐤 the same way — as dual vectors in TM_𝑝^∗? The proper interpretation will depend on what type of tensor field 𝐅 represents — a vector field in 𝒟^(1,0)(𝑀), a field of cotangent vectors in 𝒟^(0,1)(𝑀) = ⋀^1 𝑀, or whatever. If 𝐅 represents a smooth 1-form (say an electric field in space), it turns out that ∇ × 𝐅 should be interpreted as the exterior derivative 𝑑𝐅, making ∇ × 𝐅 a smooth field of antisymmetric rank-2 tensors, a 2-form in ⋀^2 𝑀.
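The three component expressions in (4.49) are exactly the coefficients of the exterior derivative 𝑑𝐅 on 𝑑𝑦 ∧ 𝑑𝑧, 𝑑𝑧 ∧ 𝑑𝑥, and 𝑑𝑥 ∧ 𝑑𝑦. A symbolic sketch (assuming sympy; the components F1, F2, F3 are illustrative sample choices):

```python
# Sketch (assuming sympy): evaluate the three curl components of (4.49),
# which are the coefficients of d(F1 dx + F2 dy + F3 dz) on the ordered
# 2-form monomials dy^dz, dz^dx, dx^dy.
import sympy as sp

x, y, z = sp.symbols('x y z')
F1, F2, F3 = -y, x, x * y * z     # illustrative smooth components

curl = (sp.diff(F3, y) - sp.diff(F2, z),
        sp.diff(F1, z) - sp.diff(F3, x),
        sp.diff(F2, x) - sp.diff(F1, y))

expected = (x * z, -y * z, 2)
assert all(sp.simplify(a - b) == 0 for a, b in zip(curl, expected))
```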

4.4. DIV, GRAD, CURL, AND ALL THAT

183

To see that this is the correct interpretation, write 𝐅 = 𝐹_1 𝑑𝑥_1 + 𝐹_2 𝑑𝑥_2 + 𝐹_3 𝑑𝑥_3, regarding 𝐅 as a 1-form in ⋀^1 𝑀. Its exterior derivative would then be

    𝑑𝐅 = ∑_{𝑖=1}^{3} (𝑑𝐹_𝑖) ∧ 𝑑𝑥_𝑖 = ∑_{𝑖=1}^{3} ( ∑_{𝑗=1}^{3} (𝜕𝐹_𝑖/𝜕𝑥_𝑗) 𝑑𝑥_𝑗 ) ∧ 𝑑𝑥_𝑖
       = ∑_{𝑖≠𝑗} (𝜕𝐹_𝑖/𝜕𝑥_𝑗) ⋅ 𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖
       = ∑_{1≤𝑖<𝑗≤3} (𝜕𝐹_𝑗/𝜕𝑥_𝑖 − 𝜕𝐹_𝑖/𝜕𝑥_𝑗) ⋅ (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗).

On the upper hemisphere 𝐻_3^+ = {𝐱 ∶ 𝑥_3 > 0}, the radial variable 𝜌 is irrelevant because 𝜌 = ‖𝐱‖ = 1 for all points on the sphere 𝕊^2. The standard coordinate chart on the upper hemisphere 𝐻_3^+ is

    𝑥_𝛼(𝑢) = 𝑃(𝑥_1, 𝑥_2, 𝑥_3) = (𝑥_1, 𝑥_2)

in the unit disc 𝐷 ⊆ ℝ^2.


Rewrite the following rank-2 differential forms on 𝑆^2 in new coordinates:
(a) (𝑑𝜃) ∧ (𝑑𝜙) in terms of (𝑑𝑥_1) ∧ (𝑑𝑥_2).
(b) (𝑑𝑥_1) ∧ (𝑑𝑥_2) in terms of (𝑑𝜃) ∧ (𝑑𝜙).

Section 4.3. Differential Forms on 𝑴 and Exterior Derivatives

1. True/False Questions (“True” if the statement is always true.)
(a) Every rank-𝑘 tensor 𝜔 on a finite-dimensional vector space 𝑉 is either symmetric with 𝜔(𝑣_{𝜎(1)}, … , 𝑣_{𝜎(𝑘)}) = 𝜔(𝑣_1, … , 𝑣_𝑘) or antisymmetric with 𝜔(𝑣_{𝜎(1)}, … , 𝑣_{𝜎(𝑘)}) = sgn(𝜎) ⋅ 𝜔(𝑣_1, … , 𝑣_𝑘), for all permutations 𝜎 ∈ 𝑆_𝑘.
(b) Every rank-𝑘 tensor 𝜔 ∈ 𝑉^(0,𝑘) is a sum 𝜔 = 𝜔_𝑎 + 𝜔_𝑠 of an antisymmetric tensor and a symmetric tensor.
(c) Every rank-2 tensor 𝜔(𝑣_1, 𝑣_2) in 𝑉^(0,2) has the symmetry property that allows you to move a scalar 𝜆 from one input to the other: 𝜔(𝜆𝑣_1, 𝑣_2) = 𝜔(𝑣_1, 𝜆𝑣_2).
(d) A rank-𝑟 tensor 𝜔(𝑣_1, … , 𝑣_𝑟) is zero if any of its inputs 𝑣_𝑖 ∈ 𝑉 is the zero vector in 𝑉.
(e) If dim(𝑉) = 𝑛, 𝜔 ∈ 𝑉^(0,𝑟) is a tensor of rank 𝑟 ≥ 2 on 𝑉, and 1 ≤ 𝑖 < 𝑗 ≤ 𝑟, then for all inputs 𝑣_1, … , 𝑣_𝑟, we have 𝜔(𝑣_1, … , 𝑣_𝑗, … , 𝑣_𝑖, … , 𝑣_𝑟) = (−1) ⋅ 𝜔(𝑣_1, … , 𝑣_𝑖, … , 𝑣_𝑗, … , 𝑣_𝑟), i.e., transposing any two inputs to 𝜔 switches the sign (±) of the output.
(f) The output of a rank-𝑘 tensor on a finite-dimensional vector space is “translation invariant”: for fixed 𝐚 = (𝑎_1, … , 𝑎_𝑘) in 𝑉, we have 𝜔(𝑣_1 + 𝑎_1, … , 𝑣_𝑘 + 𝑎_𝑘) = 𝜔(𝑣_1, … , 𝑣_𝑘) for all inputs 𝑣_1, … , 𝑣_𝑘 ∈ 𝑉.

2. Explain why

    𝜔(𝜆_1 𝑣_1, … , 𝜆_𝑘 𝑣_𝑘) = ( ∏_{𝑖=1}^{𝑘} 𝜆_𝑖 ) ⋅ 𝜔(𝑣_1, … , 𝑣_𝑘)   for 𝑣_𝑖 ∈ 𝑉, 𝜆_𝑖 ∈ 𝕂

for any rank-𝑘 tensor 𝜔 on a vector space 𝑉.

3. If ℓ_1, … , ℓ_𝑘 are in 𝑉^∗, verify that the “tensor product operator”

    ℓ_1 ⊗ ⋯ ⊗ ℓ_𝑘 ∶ (𝑣_1, … , 𝑣_𝑘) ↦ ∏_{𝑗=1}^{𝑘} ⟨ℓ_𝑗, 𝑣_𝑗⟩

from 𝑉 × ⋯ × 𝑉 → ℝ is a 𝑘-linear map and hence a tensor in 𝑉^(0,𝑘).

ADDITIONAL EXERCISES

195

4. Given a chart (𝑈_𝛼, 𝑥_𝛼) with 𝑥_𝛼(𝑢) = (𝑥_1, … , 𝑥_𝑚) on a manifold 𝑀, we may define rank-2 tensor fields (𝑑𝑥_𝑖 ⊗ 𝑑𝑥_𝑗) on 𝑈_𝛼.
(a) If 𝑖 ≠ 𝑗, is (𝑑𝑥_𝑖 ⊗ 𝑑𝑥_𝑗)_𝑝 ≠ 0 for all 𝑝 ∈ 𝑈_𝛼?
(b) If 𝑖 ≠ 𝑗, is (𝑑𝑥_𝑖 ⊗ 𝑑𝑥_𝑗)_𝑝 ≠ (𝑑𝑥_𝑗 ⊗ 𝑑𝑥_𝑖)_𝑝 for all 𝑝 ∈ 𝑈_𝛼?
(c) Is (𝑑𝑥_𝑖 ⊗ 𝑑𝑥_𝑖)_𝑝 nonzero for all 𝑝 ∈ 𝑈_𝛼?
Explain your answers.

5. Explain why a wedge product of vectors in 𝑉^∗,

    𝜙_1^∗ ∧ ⋯ ∧ 𝜙_𝑟^∗   with 1 ≤ 𝑟 ≤ 𝑛 = dim(𝑉),

is zero unless the vectors 𝜙_𝑖^∗ are linearly independent.

6. If 𝜔 ∈ ⋀^1 𝑀, prove that 𝜔 ∧ 𝜔 = 0 in ⋀^2 𝑀. If 𝜔 ∈ ⋀^2 𝑀, is 𝜔 ∧ 𝜔 = 0 in ⋀^4 𝑀?
Hint. You can restrict attention to a chart domain, where we have (𝑑𝑥_𝑖 ∧ 𝑑𝑥_𝑗) = (−1) ⋅ (𝑑𝑥_𝑗 ∧ 𝑑𝑥_𝑖). For the second part, try a sum of two suitably chosen monomials 𝑒_𝑖^∗ ∧ 𝑒_𝑗^∗.

7. We have noted the very useful fact that every rank-2 tensor 𝜔 (a bilinear form) on 𝑉 can always be split as a sum 𝜔_𝑠 + 𝜔_𝑎 of a symmetric tensor 𝜔_𝑠 ∈ 𝑆^2(𝑉) and an antisymmetric tensor 𝜔_𝑎 ∈ ⋀^2(𝑉). This is not always true for tensors of rank 𝑟 ≥ 3, and for large 𝑟, the dimension of the linear span 𝑆^𝑟(𝑉) + ⋀^𝑟(𝑉) lags far behind dim 𝑉^(0,𝑟) = (dim 𝑉)^𝑟. Prove that it is not always possible to decompose tensors of rank 𝑟 > 𝑛 = dim(𝑉) in 𝑉^(0,𝑟) as 𝜔 = (antisymmetric) + (symmetric).
Hint. Antisymmetric tensors are zero when 𝑟 > 𝑛, so existence of such a decomposition would force all tensors of rank 𝑟 > dim 𝑉 to be symmetric. Do you really believe that is so? In seeking a counterexample, try 𝑉 = ℝ^2 and see if you can construct a rank-3 tensor on 𝑉 that is not symmetric. In fact, if 𝔛 = {𝐞_𝑖} is a basis in 𝑉 with dual basis 𝔛^∗ = {𝐞_𝑖^∗}, try a few of the 2^3 = 8 possible monomials 𝐞_{𝑖_1}^∗ ⊗ 𝐞_{𝑖_2}^∗ ⊗ 𝐞_{𝑖_3}^∗ that are basis vectors for 𝑉^(0,3). Are they all symmetric tensors?

8. Determinants as Tensors on 𝕂^𝑛. If 𝐴 ∈ M(𝑛, 𝕂) and we regard its rows 𝑅_𝑖 = (𝑎_{𝑖1}, … , 𝑎_{𝑖𝑛}) ∈ 𝕂^𝑛 as inputs to the map 𝜔(𝑅_1, … , 𝑅_𝑛) = det(𝐴), explain why the following hold:
(a) 𝜔 ∶ 𝑉 × ⋯ × 𝑉 → 𝕂 is a rank-𝑛 multilinear map on 𝑉 = 𝕂^𝑛.
(b) The output 𝜔(𝑅_1, … , 𝑅_𝑛) changes sign if two inputs are switched before applying 𝜔.
(c) The output 𝜔(𝑅_1, … , 𝑅_𝑛) gets multiplied by sgn(𝜎) if we apply an arbitrary permutation 𝜎 ∈ 𝑆_𝑛 to the inputs, sending 𝑅_𝑖 → 𝑅_{𝜎(𝑖)} for 1 ≤ 𝑖 ≤ 𝑛.
Thus 𝜔 is an antisymmetric tensor of rank = dim(𝑉).

Note. ⋀^𝑛 𝑉 is one-dimensional when 𝑛 = dim(𝑉); furthermore, if 𝔛 = {𝑒_𝑗} is a basis for 𝑉 and 𝔛^∗ = {𝑒_𝑗^∗} is the dual basis in 𝑉^∗, then 𝜔 = 𝑒_1^∗ ∧ ⋯ ∧ 𝑒_𝑛^∗ is a basis vector for ⋀^𝑛 𝑉. Thus, the determinant of 𝐴 is the unique scalar 𝑑(𝐴) such that det(𝐴) = 𝑑(𝐴) ⋅ 𝑒_1^∗ ∧ ⋯ ∧ 𝑒_𝑛^∗ in ⋀^𝑛 𝑉. This should be compared with the official definition of det(𝐴) in equation (4.3) of LA I. ○
Hint. Review the subsection Row Operations, Determinants, and Matrix Inverses in Section 4.3 of LA I.

9. On 𝑀 = ℝ^2, we have Cartesian coordinates 𝑥_𝛼(𝑢) = (𝑥, 𝑦) and polar coordinates 𝑦_𝛽(𝑢) = (𝑟, 𝜃). Using the change of variable formula (Corollary 4.16) for tangent vectors,
(a) Rewrite the vector fields (𝜕/𝜕𝑥) and (𝜕/𝜕𝑦) in terms of (𝜕/𝜕𝑟) and (𝜕/𝜕𝜃).
(b) Regarding smooth vector fields 𝑋̃ on 𝑀 as operators that act on 𝒞^∞ functions via 𝑋̃(𝑓)(𝑢) = ⟨𝑋_𝑢, 𝑓⟩ for 𝑓 ∈ 𝒞^∞(𝑀), 𝑢 ∈ 𝑀, express the following composite operators on functions

    𝜕^2𝑓/𝜕𝑥^2 = (𝜕/𝜕𝑥)(𝜕𝑓/𝜕𝑥)   and   𝜕^2𝑓/𝜕𝑦^2 = (𝜕/𝜕𝑦)(𝜕𝑓/𝜕𝑦)

in terms of the operators (𝜕/𝜕𝑟) and (𝜕/𝜕𝜃).
(c) What form does the classical Laplace operator

    Δ_2 𝑓 = 𝜕^2𝑓/𝜕𝑥^2 + 𝜕^2𝑓/𝜕𝑦^2

take when rewritten in terms of polar coordinates?

Calculations with Vector and Tensor Fields on 𝑆^2. The next Additional Exercises show how to handle calculations involving vector and tensor fields on a manifold, the two-dimensional sphere 𝑆^2 = {𝐱 ∶ 𝑥_1^2 + 𝑥_2^2 + 𝑥_3^2 = 1} in ℝ^3. One of the founding charts (𝑈_𝛼, 𝑥_𝛼) that determine the manifold structure of 𝑆^2 has as its domain the upper hemisphere 𝑀 = 𝐻_3^+ = {𝐱 ∈ ℝ^3 ∶ 𝑥_3 > 0}, with chart map 𝑥_𝛼 the projection 𝑃_3(𝑥_1, 𝑥_2, 𝑥_3) = (𝑥_1, 𝑥_2) ∈ ℝ^2 of 𝑀 onto the unit disc 𝐷 = {(𝑥_1, 𝑥_2) ∶ 𝑥_1^2 + 𝑥_2^2 < 1} in the 𝑥_1, 𝑥_2-plane. (Recall Example 4.2 and the accompanying figure.) The calculations will also illustrate the use of the change of variable formulas for tangent vectors (Theorem 4.15), for cotangent vectors (Theorem 4.30), and for a particularly important rank-2 tensor, the Euclidean metric tensor on ℝ^2 discussed in Example 4.38.

10. Define coordinate charts on the hemisphere 𝑀 ⊆ 𝑆^2 as follows:
(i) (𝑈_𝛼, 𝑥_𝛼) with 𝑈_𝛼 = 𝑀 and chart map 𝑥_𝛼(𝐱) = 𝑃(𝐱) = (𝑥, 𝑦) for points 𝐱 = (𝑥, 𝑦, 𝑧) ∈ 𝑆^2. (This is of course one of the “founding charts” by which differentiable manifold structure was imposed on 𝑆^2.)


(ii) Define (𝑈_𝛽, 𝑦_𝛽) letting 𝑈_𝛽 = 𝑀 and 𝑦_𝛽(𝑢) = (𝑟, 𝜃) the map that sends 𝑢 = (𝑥, 𝑦, 𝑧) ∈ 𝑆^2 to the polar coordinates (𝑟, 𝜃) of the projected point 𝑃(𝑢) = (𝑥, 𝑦) in the 𝑥, 𝑦-plane.
Both charts determine smooth fields of tangent vectors on 𝑀 ⊆ 𝑆^2:

    𝑋 = 𝜕/𝜕𝑥 |_𝑢,  𝑌 = 𝜕/𝜕𝑦 |_𝑢   and   𝑉 = 𝜕/𝜕𝑟 |_𝑢,  𝑊 = 𝜕/𝜕𝜃 |_𝑢

for 𝑢 ∈ 𝑀. Use the change of variable formula for tangent vectors, Corollary 4.16, to rewrite the vector field 𝜕/𝜕𝜃 in terms of 𝜕/𝜕𝑥 and 𝜕/𝜕𝑦 on 𝑀:

    𝜕/𝜕𝜃 |_𝑢 = 𝐴(𝑢) ⋅ 𝜕/𝜕𝑥 |_𝑢 + 𝐵(𝑢) ⋅ 𝜕/𝜕𝑦 |_𝑢.

Write the coefficients 𝐴(𝑢) and 𝐵(𝑢) in terms of the scalar components of the chart map 𝑦_𝛽(𝑢) = (𝑅(𝑢), Θ(𝑢)) and their derivatives.

11. In the setting of Additional Exercise 10, take (𝑈_𝛼, 𝑥_𝛼) to be the same, but let 𝑈_𝛽 be the hemisphere 𝑈_𝛽 = 𝑀 equipped with spherical coordinates 𝑦_𝛽(𝑢) = (𝜃, 𝜙). Use the change of variable formula for tangent vectors on 𝑆^2 to rewrite

    𝜕/𝜕𝜙 |_𝑢   in terms of   𝜕/𝜕𝑥 |_𝑢 and 𝜕/𝜕𝑦 |_𝑢.

As in the previous exercise, write the coefficients 𝐴(𝑢), 𝐵(𝑢) in terms of the components of the chart map 𝑦_𝛽(𝑢) = (𝜃, 𝜙) = (Θ(𝑢), Φ(𝑢)).
Note. For (𝑥, 𝑦, 𝑧) ∈ 𝑆^2 we have

    𝑥 = cos(𝜃) cos(𝜙),  𝑦 = sin(𝜃) cos(𝜙),  𝑧 = sin(𝜙)

since ‖𝐱‖ = 1 on 𝑆^2 (see Figure 4.3). Keep in mind our convention −𝜋/2 < 𝜙 < 𝜋/2 regarding the “azimuth angle” 𝜙. ○

Calculations Involving Differential Forms on 𝑆^2. The coordinates 𝑥_𝛼(𝑢) = (𝑥, 𝑦) and 𝑦_𝛽 = (𝜃, 𝜙) imposed on 𝑆^2 in Additional Exercise 11 also determine differential forms (smooth fields of cotangent vectors) on the sphere, denoted by

    (𝑑𝑥)_𝑢, (𝑑𝑦)_𝑢   and   (𝑑𝜃)_𝑢, (𝑑𝜙)_𝑢,

respectively. Below we illustrate use of the change of variable formula for cotangent vectors, Theorem 4.30. Recall that these fields of cotangent vectors are the exterior derivatives of the scalar components of the chart maps 𝑥_𝛼 = (𝑋(𝑢), 𝑌(𝑢)) and 𝑦_𝛽 = (Θ(𝑢), Φ(𝑢)), with

    𝑑𝑥_𝑢 = (𝑑𝑋)_𝑢, 𝑑𝑦_𝑢 = (𝑑𝑌)_𝑢   and   𝑑𝜃_𝑢 = (𝑑Θ)_𝑢, 𝑑𝜙_𝑢 = (𝑑Φ)_𝑢.

(Recall the discussion surrounding equation (4.22) of differential forms (𝑑𝑥_𝑖)_𝑝 associated with a chart vs. the exterior derivatives (𝑑𝑋)_𝑝 of the scalar components of the chart map 𝑥_𝛼 = (𝑋_1(𝑢), … , 𝑋_𝑚(𝑢)).)


12. For points 𝑢 ∈ 𝑀 = ℝ^3, let 𝐱 = 𝑥_𝛼(𝑢) = (𝑥_1, 𝑥_2, 𝑥_3) be the standard Euclidean coordinates and define new coordinates, letting 𝐲 = 𝑦_𝛽(𝑢) = (𝑦_1, 𝑦_2, 𝑦_3) where 𝐲 = 𝐿_𝐴(𝐱) = 𝐴 ⋅ 𝐱 for the nonsingular matrix

    𝐴 = ( 0  2  −1
          1  2   2
          3  0  −1 ).

(a) Determine the entries in the Jacobian matrix [𝜕𝑦_𝑖/𝜕𝑥_𝑗(𝑢)] at a typical point 𝑢 ∈ 𝑀.
(b) Find the scalar-valued functions 𝐴(𝑢) and 𝐵(𝑢) on 𝑀 such that

    (𝑑𝑦_1 ∧ 𝑑𝑦_2 ∧ 𝑑𝑦_3)_𝑢 = 𝐴(𝑢) ⋅ (𝑑𝑥_1 ∧ 𝑑𝑥_2 ∧ 𝑑𝑥_3)_𝑢,
    (𝑑𝑥_1 ∧ 𝑑𝑥_2 ∧ 𝑑𝑥_3)_𝑢 = 𝐵(𝑢) ⋅ (𝑑𝑦_1 ∧ 𝑑𝑦_2 ∧ 𝑑𝑦_3)_𝑢

in ⋀^3(ℝ^3) by rewriting (𝑑𝑦_𝑖) in terms of (𝑑𝑥_𝑖) (or vice-versa) using the change of variable formula for cotangent vectors (Theorem 4.30). Use the commutation relations to write products 𝑑𝑥_𝐼 = 𝑑𝑥_{𝑖_1} ∧ 𝑑𝑥_{𝑖_2} ∧ 𝑑𝑥_{𝑖_3} as ordered monomials ±(𝑑𝑥_1 ∧ 𝑑𝑥_2 ∧ 𝑑𝑥_3), and similarly rewrite 𝑑𝑦_𝐽 = 𝑑𝑦_{𝑗_1} ∧ 𝑑𝑦_{𝑗_2} ∧ 𝑑𝑦_{𝑗_3} = ±(𝑑𝑦_1 ∧ 𝑑𝑦_2 ∧ 𝑑𝑦_3).

13. Rewrite the smooth differential forms 𝑑𝜃 and 𝑑𝜙 on 𝑀 = 𝑆^2 in terms of the forms 𝑑𝑥 and 𝑑𝑦, and vice-versa.

14. As shown in Figure 4.3, spherical coordinates 𝑦_𝛽(𝑝) = (𝜌, 𝜃, 𝜙) on 𝑀 = ℝ^3 are related to Cartesian coordinates via

    𝑥 = 𝜌 cos(𝜃) cos(𝜙),  𝑦 = 𝜌 sin(𝜃) cos(𝜙),  𝑧 = 𝜌 sin(𝜙).

Since ⋀^3 𝑀 is one-dimensional, the maximal rank forms 𝑑𝑥 ∧ 𝑑𝑦 ∧ 𝑑𝑧 and 𝑑𝜌 ∧ 𝑑𝜃 ∧ 𝑑𝜙 are scalar multiples of each other at each base point 𝑝 ∈ 𝑀, but the scale factor 𝑐(𝑝) may vary with 𝑝. Find the weight function 𝐴(𝜌, 𝜃, 𝜙) = 𝑐(𝑦_𝛽(𝑝)) such that

    (𝑑𝑥 ∧ 𝑑𝑦 ∧ 𝑑𝑧) = 𝐴(𝜌, 𝜃, 𝜙) ⋅ (𝑑𝜌 ∧ 𝑑𝜃 ∧ 𝑑𝜙).

Hint. When you pass from Cartesian to spherical coordinates, each of the differential forms 𝑑𝑥, 𝑑𝑦, 𝑑𝑧 becomes a sum with three terms — 3^3 = 27 terms in all. But all rank-3 wedge products of 𝑑𝜌, 𝑑𝜃, 𝑑𝜙 containing repeated factors are zero, so the only ones that survive are the 3! = 6 permutations of the symbol string (𝑑𝜌 ∧ 𝑑𝜃 ∧ 𝑑𝜙) (and if you are willing to ignore a (±) sign, there is just one survivor: 𝑑𝜌 ∧ 𝑑𝜃 ∧ 𝑑𝜙).

Change of Variable for Tensors. If two charts (𝑈_𝛼, 𝑥_𝛼), (𝑈_𝛽, 𝑦_𝛽) overlap on a manifold, a field 𝜔 of rank-2 tensors will have different descriptions

    𝜔 = ∑_{𝑗_1,𝑗_2=1}^{𝑚} 𝑐^{(𝛼)}_{𝑗_1,𝑗_2}(𝑢) ⋅ (𝑑𝑥_{𝑗_1} ⊗ 𝑑𝑥_{𝑗_2})_𝑢 = ∑_{𝑖_1,𝑖_2=1}^{𝑚} 𝑐^{(𝛽)}_{𝑖_1,𝑖_2}(𝑢) ⋅ (𝑑𝑦_{𝑖_1} ⊗ 𝑑𝑦_{𝑖_2})_𝑢


(𝑚^2 terms in each sum). On the open set 𝑈_𝛼 ∩ 𝑈_𝛽 in 𝑀 where the chart domains overlap, the change of variable formula (Theorem 4.30) allows us to pass from a description in 𝑦_𝛽-coordinates to one in 𝑥_𝛼-coordinates. Simply replace (𝑑𝑦_𝑖) wherever it appears by the weighted sum involving the forms 𝑑𝑥_𝑗,

    𝑑𝑦_𝑖 = ∑_{𝑗=1}^{𝑚} (𝜕𝑌_𝑖/𝜕𝑥_𝑗)(𝑢) (𝑑𝑥_𝑗)_𝑢,

in which the 𝑌_𝑖(𝑢) are the scalar components of the chart map 𝑦_𝛽 = (𝑌_1, … , 𝑌_𝑚). Each factor (𝑑𝑦_𝑖) appearing in 𝜔 becomes a sum of forms (𝑑𝑥_𝑗) associated with the 𝑥_𝛼-chart. Using multilinearity of 𝜔, we can then expand the resulting product of sums as a sum of simple monomials (𝑑𝑥_𝑖) ⊗ (𝑑𝑥_𝑗) because

    ( ∑_{𝑖=1}^{𝑚} 𝑎_𝑖 (𝑑𝑥_𝑖) ) ⊗ ( ∑_{𝑗=1}^{𝑚} 𝑏_𝑗 (𝑑𝑥_𝑗) ) = ∑_{𝑖,𝑗=1}^{𝑚} 𝑎_𝑖 𝑏_𝑗 ⋅ (𝑑𝑥_𝑖 ⊗ 𝑑𝑥_𝑗).

Putting all this together, we obtain the description of 𝜔 in 𝑥_𝛼-coordinates:

(4.55)   𝜔 = ∑_{𝑖_1,𝑖_2=1}^{𝑚} 𝑐^{(𝛽)}_{𝑖_1,𝑖_2}(𝑢) ⋅ ∑_{𝑗_1,𝑗_2=1}^{𝑚} ( (𝜕𝑌_{𝑖_1}/𝜕𝑥_{𝑗_1})(𝑢) (𝜕𝑌_{𝑖_2}/𝜕𝑥_{𝑗_2})(𝑢) ) ⋅ (𝑑𝑥_{𝑗_1} ⊗ 𝑑𝑥_{𝑗_2})_𝑢.
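For a linear transition map 𝐲 = 𝐴𝐱, the partials 𝜕𝑌_𝑖/𝜕𝑥_𝑗 are just the constant entries 𝐴_{𝑖𝑗}, and (4.55) collapses to the matrix congruence 𝐶_𝛼 = 𝐴^𝖳 𝐶_𝛽 𝐴 for the coefficient matrices. A numerical sketch (assuming numpy; the matrices here are hypothetical examples, not the data of any exercise):

```python
# Sketch (assuming numpy) of (4.55) for a linear chart change y = A x:
# the rank-2 coefficient matrix transforms as C_alpha = A^T C_beta A,
# since dY_i/dx_j = A_ij when the transition map is linear.
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])        # hypothetical transition matrix
C_beta = np.array([[1.0, 0.0], [0.0, 3.0]])   # coefficients c^(beta)_{i1,i2}

C_alpha = A.T @ C_beta @ A                    # instance of (4.55)

rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)
# omega(v, w) evaluated in y-coordinates (inputs pushed through dy = A dx) ...
lhs = (A @ v) @ C_beta @ (A @ w)
# ... equals omega(v, w) evaluated from the transformed x-coefficients.
assert np.isclose(lhs, v @ C_alpha @ w)
```

For a nonlinear transition (such as polar coordinates in Exercise 15 below), the same computation applies pointwise with the Jacobian matrix in place of 𝐴.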

The coefficients (⋯) in this sum are clearly 𝒞^∞ functions of 𝑢 ∈ 𝑀, so smoothness of the tensor field is independent of any particular choice of local coordinates. With sufficient patience, the same sort of calculation can be done for higher rank tensor fields on a manifold, but we will not pursue the general change of variable formula here. It uses the same ideas, but the notation becomes more baroque as 𝑟 = rank(𝜔) increases.

15. The standard Euclidean metric tensor on 𝑀 = ℝ^2 is

    𝑔 = (𝑑𝑥_1 ⊗ 𝑑𝑥_1)_𝑢 + (𝑑𝑥_2 ⊗ 𝑑𝑥_2)_𝑢

when described in the usual Cartesian coordinates 𝑥_𝛼(𝑢) = (𝑥_1, 𝑥_2) on 𝑀. In polar coordinates 𝑦_𝛽(𝑢) = (𝑟, 𝜃), this becomes

    𝑔 = 𝐴_11(𝑢) ⋅ (𝑑𝑟 ⊗ 𝑑𝑟)_𝑢 + 𝐴_12(𝑢) ⋅ (𝑑𝑟 ⊗ 𝑑𝜃)_𝑢 + 𝐴_21(𝑢) ⋅ (𝑑𝜃 ⊗ 𝑑𝑟)_𝑢 + 𝐴_22(𝑢) ⋅ (𝑑𝜃 ⊗ 𝑑𝜃)_𝑢.

Find the coefficients 𝐴_𝑖𝑗(𝑢) expressed in terms of the scalar components of the chart map 𝑦_𝛽(𝑢) = (𝑅(𝑢), Θ(𝑢)).

Remark 4.69 (Differential Forms as “Infinitesimal Volume Elements”). In Example 4.50, we computed the relation between the differential forms 𝑑𝑟 ∧ 𝑑𝜃 and 𝑑𝑥 ∧ 𝑑𝑦 on 𝑀 = ℝ^2 that describes the effect of a transition from Cartesian to polar coordinates: (𝑑𝑥 ∧ 𝑑𝑦) = 𝑟 (𝑑𝑟 ∧ 𝑑𝜃). Expressions of this sort should also be familiar from calculus discussions of integrals ∬_𝑅 𝑓(𝐱) 𝑑𝑥 𝑑𝑦 over regions 𝑅 ⊆ ℝ^2. When such an integral is computed


in Cartesian or polar coordinates, one obtains an identity that can be found in any calculus text:

    ∬ 𝑓(𝑥, 𝑦) 𝑑𝑥 𝑑𝑦 = ∬ 𝑓(𝑟, 𝜃) 𝑟 𝑑𝑟 𝑑𝜃.

We won’t have time to go into the theory of integration on manifolds, which is an extension of integration of functions over hypersurfaces in ℝ^𝑛 (but without the “ℝ^𝑛”). In more advanced accounts you would discover that expressions like 𝑑𝑥 𝑑𝑦 and 𝑟 𝑑𝑟 𝑑𝜃 in calculus get replaced by the differential forms (𝑑𝑥 ∧ 𝑑𝑦) and 𝑟 (𝑑𝑟 ∧ 𝑑𝜃) on the manifold 𝑀 = ℝ^2; and more generally, functions 𝑓(𝑢) on a 𝑘-dimensional manifold get integrated against some rank-𝑘 (maximal rank) differential form on 𝑀 to arrive at a numerical value for the resulting integral. When 𝑀 = ℝ^2, this leads to such identities as

    ∬_𝑀 𝑓(𝑝) (𝑑𝑥 ∧ 𝑑𝑦) = ∬_𝑀 𝑓(𝑝) 𝑟 (𝑑𝑟 ∧ 𝑑𝜃)

involving integration of functions against maximal rank differential forms on the manifold. With this in mind, mathematicians have often regarded expressions such as 𝑑𝑥 ∧ 𝑑𝑦 and 𝑟 𝑑𝑟 ∧ 𝑑𝜃 as infinitesimal elements of oriented volume in the manifold over which the integration is being performed. That is why change of variable formulas such as (𝑑𝑥 ∧ 𝑑𝑦) = 𝑟 (𝑑𝑟 ∧ 𝑑𝜃), worked out in Example 4.50, become important when attention turns to integration on manifolds. For example, if volumes computed using integrals over regions in ℝ^2 are to make geometric sense, each small region of the form 𝑑𝑟 × 𝑑𝜃 in polar coordinate space (𝑟, 𝜃) has to be assigned a two-dimensional volume ≈ 𝑟 𝑑𝑟 ⋅ 𝑑𝜃, while small regions 𝑑𝑥 × 𝑑𝑦 in Cartesian coordinate space (𝑥, 𝑦) are assigned volumes = 𝑑𝑥 ⋅ 𝑑𝑦, if the calculations are to agree. ○

Section 4.4. Div, Grad, Curl, and All That

1. True/False Questions (“True” if the statement is always true.)
(a) The triple product (𝐚 × 𝐛) • 𝐜 of vectors in ℝ^3 is the volume of the parallelopiped 𝑃 = {𝑐_1 𝐚 + 𝑐_2 𝐛 + 𝑐_3 𝐜 ∶ 0 ≤ 𝑐_𝑖 ≤ 1} with one corner at the origin and 𝐚, 𝐛, 𝐜 as its edges emanating from the origin.
(b) We have 𝐚 × 𝐛 ≠ 𝟎 for vectors in ℝ^3 unless they are collinear.
(c) We have 𝐚 × 𝐛 = 𝟎 for vectors in ℝ^3 if and only if they are perpendicular.

2. Some calculus texts describe the cross product 𝐚 × 𝐛 as an operation whose input is an ordered pair of vectors in ℝ^3 and whose output is another vector in ℝ^3.
(a) Verify that the cross product is a vector-valued bilinear map from 𝑉 × 𝑉 → 𝑉: a linear function of each input vector when the other is held fixed.


There seems to be no reason that we could not then compute iterated products in ℝ3 such as 𝐚 × 𝐛 × 𝐜. Or is there? (b) Try it, taking 𝐚 = (1, 2, 0),

𝐛 = (1, −1, 1),

𝐜 = (−1, 2, 3).

Did you encounter any cognitive dissonance in carrying out this calculation based on the definition of 𝐚 × 𝐛 in (4.53)?
(c) Explain why it is not possible to make a consistent definition of products like 𝐚 × 𝐛 × 𝐜 or 𝐚 × 𝐛 × 𝐜 × 𝐝.
Hint. The cross product operation only accepts two inputs. There are two different ways to insert parentheses into 𝐚 × 𝐛 × 𝐜 to get expressions that only involve multiplying two vectors at a time. Write out and evaluate both expressions using the vectors listed in (b). Do the outcomes agree?

3. Given nonzero vectors 𝐚 and 𝐛 in ℝ3,
(a) Explain why the cross product of two vectors in ℝ3 has Euclidean length ‖𝐚 × 𝐛‖ = ‖𝐚‖ ‖𝐛‖ ⋅ sin 𝜃(𝐚, 𝐛), where 𝜃 is the angle between 𝐚 and 𝐛, measured in the plane they span in ℝ3.
(b) Explain why ‖𝐚 × 𝐛‖ = Area(𝑄), where 𝑄 is the parallelogram 𝑄 = {𝑐1𝐚 + 𝑐2𝐛 ∶ 0 ≤ 𝑐1, 𝑐2 ≤ 1} ⊆ ℝ3 determined by 𝐚 and 𝐛. What determines whether a (+) or (−) sign should be applied to get the “signed area” of 𝑄?

4. Prove that the cross product 𝐚 × 𝐛 of two vectors in ℝ3 is perpendicular to the plane 𝑄 determined by 𝐚 and 𝐛 (orthogonal with respect to the Euclidean inner product in ℝ3).

5. Why isn’t ∇(∇ ∘ 𝐅) = 𝐠𝐫𝐚𝐝(𝐝𝐢𝐯 𝐅) ≡ 0 a valid identity for smooth vector fields 𝐅(𝐱) on ℝ3?

6. Mathematicians often use the symbol “Δ” to indicate the Laplace operator, the second-order partial differential operator
Δ(𝐹) = 𝐷𝑥1²𝐹 + ⋯ + 𝐷𝑥𝑛²𝐹 = 𝜕²𝐹/𝜕𝑥1² + ⋯ + 𝜕²𝐹/𝜕𝑥𝑛²
that acts on smooth scalar-valued functions 𝐹(𝐱) on ℝ𝑛. A more archaic notation, still widely used when 𝑛 = 3, writes the Laplacian as
∇²𝐹 = ∇ ∘ (∇𝐹) = 𝐝𝐢𝐯(𝐠𝐫𝐚𝐝 𝐹).
(a) Verify that this is correct for 𝑛 = 3. (It is even true when 𝑛 = 2 using our modified definitions of div and grad on two-dimensional space.)
(b) What goes wrong when the dimension is ≥ 4?
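The computations in Exercises 2–4 can be sanity-checked numerically. The following sketch (not part of the text; it assumes NumPy) evaluates both parenthesizations of 𝐚 × 𝐛 × 𝐜 for the vectors of Exercise 2(b), and checks the perpendicularity and length claims of Exercises 3 and 4:

```python
import numpy as np

# Vectors from Exercise 2(b)
a = np.array([1.0, 2.0, 0.0])
b = np.array([1.0, -1.0, 1.0])
c = np.array([-1.0, 2.0, 3.0])

# Exercise 2(c): the two ways of inserting parentheses disagree,
# so "a x b x c" has no consistent meaning.
left = np.cross(np.cross(a, b), c)   # (a x b) x c
right = np.cross(a, np.cross(b, c))  # a x (b x c)
print(left, right)

# Exercise 4: a x b is orthogonal to both of its factors.
ab = np.cross(a, b)
print(np.dot(ab, a), np.dot(ab, b))

# Exercise 3(a): ||a x b|| = ||a|| ||b|| sin(theta).
cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
sin_t = np.sqrt(1.0 - cos_t**2)
print(np.isclose(np.linalg.norm(ab),
                 np.linalg.norm(a) * np.linalg.norm(b) * sin_t))
```

Running it shows the two iterated products differ, confirming that the cross product is not associative.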


CHAPTER 5

Matrix Lie Groups

An Overview of This Chapter. The subject of Lie groups is the nexus where three major areas of mathematics come into play — calculus-style analysis, modern algebra, and differential geometry are all intertwined in this field, underpinned by the tools linear algebra provides. That interplay among so many disciplines can be daunting, but it is also what makes Lie groups such an interesting subject.¹ This brief chapter cannot be a treatise on the subject; rather, it is an introduction to the players: a few of the most important concepts, and examples illustrating the techniques used in working with them. Some knowledge of differential geometry will be assumed, particularly the concept of a differentiable manifold (which will also be described below), but familiarity with Section 4.1 in Chapter 4 of these Notes should suffice. Many Lie groups of great interest in physics and geometry were originally modeled as curvilinear smooth hypersurfaces 𝑀 embedded in higher dimensional Euclidean spaces ℝ𝑛 (and sometimes ℂ𝑛 ), for example the “classical matrix groups” – the orthogonal groups O(𝑛), unitary groups U(𝑛), and symplectic groups Sp(2𝑛) introduced in Chapter 3 as the symmetry groups associated with various bilinear forms that arise throughout mathematics. They will often be used as concrete examples. Beginning in the 20th century, Lie theory (and most of differential geometry as well) moved away from the notion of hypersurfaces embedded in Euclidean spaces, and developed an “intrinsic” theory of hypersurfaces — now called manifolds — that viewed Lie groups and manifolds as self-contained universes in their own right. Although we will speak of Lie groups as manifolds in these Notes, to keep things simple we will focus on matrix Lie groups, which can be modeled as subsets of matrix space M(𝑛, 𝕂), taking 𝕂 = ℝ or ℂ.
Besides, almost everything said for matrix Lie groups carries over to the general theory of Lie groups, although some proofs get harder in that setting. In this chapter we will

• Show how differentiable structure can be defined on a matrix group 𝐺 using the implicit function theorem to define Euclidean coordinates on a family of open sets 𝑈𝛼 that cover 𝐺. Equipped with these “coordinate charts,” 𝐺 becomes a “Lie group,” in which the group multiplication operation 𝐺 × 𝐺 → 𝐺 becomes a differentiable map.

¹The “Lie” in “Lie groups” is pronounced “Lee.” Sophus Lie was a Norwegian mathematician who pioneered the study of these structures in the late 1800s. Esoteric concepts then, they are ubiquitous and indispensable in modern physics and differential geometry.


• Discuss what “tangent vectors,” “tangent spaces,” and “derivatives of smooth functions” 𝑓 ∶ 𝐺 → ℝ might mean on a group 𝐺 equipped with differentiable structure, which might be realized as a curvilinear hypersurface of high dimension. This is the realm of differential geometry.

• The result is a construct 𝐺 that has both geometric and algebraic aspects. Lie groups can be quite complicated, but deal with them we must, because they turn up at the heart of modern physics (and other fields), for instance as the symmetry groups that govern the interactions of subatomic particles, the behavior of the cosmos according to special and general relativity, and the seemingly paradoxical rules of quantum mechanics. Even the periodic table of chemistry can in the end be deduced as a consequence of the mathematics associated with that classic example of a Lie group, the “special orthogonal group” SO(3) of rotations in 3-dimensional space.

• Finally, we will show that the tangent space to a Lie group 𝐺 at its identity element, which at first sight is just a vector space of the same dimension as the coordinate charts that cover 𝐺, acquires algebraic structure induced by the multiplication law in 𝐺 and becomes what is known as a “Lie algebra.” The important point is that this Lie algebra, which is a linear structure, is much easier to deal with than 𝐺, and yet it encodes almost enough information to completely reconstruct and understand the Lie group from which it was derived.

5.1. Matrix Groups and the Implicit Function Theorem

The rank of a linear operator 𝑇 ∶ 𝑉 → 𝑊 is rank(𝑇) = dim(range(𝑇)) = dim(𝑉) − dim(ker(𝑇)). In this chapter, we will often abbreviate rank(𝑇) to “rk(𝑇)” for linear operators, and likewise for matrices. If 𝔛, 𝔜 are bases in 𝑉, 𝑊, the rank of 𝑇 can be determined from the matrix 𝐴 = [𝑇]𝔜𝔛 as follows. A 𝑘 × 𝑘 submatrix is obtained by designating 𝑘 rows and 𝑘 columns and extracting from 𝐴 the 𝑘 × 𝑘 array where these meet.
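In NumPy (an illustration, not part of the text), extracting the submatrix where chosen rows and columns meet is a one-liner with np.ix_, and the rank characterization proved in Lemma 5.1 below can be checked by brute force on small matrices:

```python
import numpy as np
from itertools import combinations

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # row 2 = 2 * row 1, so rk(A) = 2
              [0.0, 1.0, 1.0]])

I, J = [0, 2], [1, 2]            # row indices I and column indices J
A_IJ = A[np.ix_(I, J)]           # the 2 x 2 array where rows I meet columns J
print(A_IJ)

def k_max(A):
    """Largest k such that A has a nonsingular k x k submatrix A_IJ."""
    n, m = A.shape
    best = 0
    for k in range(1, min(n, m) + 1):
        for I in combinations(range(n), k):
            for J in combinations(range(m), k):
                if abs(np.linalg.det(A[np.ix_(I, J)])) > 1e-9:
                    best = k
    return best

print(k_max(A), np.linalg.matrix_rank(A))  # the two agree, as Lemma 5.1 asserts
```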
To describe the outcome, we must specify the row indices 𝐼 = {𝑖1 < ⋯ < 𝑖𝑘} and column indices 𝐽 = {𝑗1 < ⋯ < 𝑗𝑘}; the resulting submatrix is indicated by writing 𝐴𝐼𝐽 (see Figure 5.1). The matrix 𝐴 itself is not necessarily square; it is 𝑛 × 𝑚 if dim(𝑉) = 𝑚, dim(𝑊) = 𝑛.

Lemma 5.1. Given a nonzero 𝑛 × 𝑚 matrix 𝐴, its rank rk(𝐴) is
𝑘max = max{𝑘 ∈ ℕ ∶ 𝐴 has a nonsingular 𝑘 × 𝑘 submatrix 𝐴𝐼𝐽}.

Proof. We have rk(𝐴) ≥ 𝑘max because if 𝐴𝐼𝐽 is nonsingular, its columns {𝐶′𝑗1, …, 𝐶′𝑗𝑘} are linearly independent, and are truncated versions of the corresponding columns {𝐶𝑗1, …, 𝐶𝑗𝑘} of 𝐴, which forces the latter to be linearly independent. Hence 𝑘 = |𝐽| ≤ rk(𝐴) for every nonsingular submatrix 𝐴𝐼𝐽, so 𝑘max ≤ rk(𝐴).


Figure 5.1. A square 𝑘 × 𝑘 submatrix 𝐴𝐼𝐽 is extracted from an 𝑛 × 𝑚 matrix by specifying row indices 𝐼 = {𝑖1 < ⋯ < 𝑖𝑘} and column indices 𝐽 = {𝑗1 < ⋯ < 𝑗𝑘}. The rank rk(𝐴) is equal to 𝑘 if 𝐴𝐼𝐽 is nonsingular and all nonsingular square submatrices have size 𝑟 ≤ 𝑘.

We also have rk(𝐴) ≤ 𝑘max, for if rk(𝐴) = 𝑘, there is some set 𝐽 of column indices with |𝐽| = 𝑘 such that {𝐶𝑗 ∶ 𝑗 ∈ 𝐽} are linearly independent. If 𝐵 is the 𝑛 × 𝑘 matrix [𝐶𝑗1; …; 𝐶𝑗𝑘], it is well known that row rank(𝐵) = column rank(𝐵) (see Corollary 4.47 of LA I), so we can find a set 𝐼 of row indices with |𝐼| = |𝐽| = 𝑘 such that the rows {𝑅𝑖(𝐵) ∶ 𝑖 ∈ 𝐼} are linearly independent. The rows in the 𝑛 × 𝑘 matrix 𝐵 = [𝐶𝑗1; …; 𝐶𝑗𝑘] are truncated versions of the corresponding rows in 𝐴, and those with row indices in 𝐼 are precisely the rows of the 𝑘 × 𝑘 submatrix 𝐴𝐼𝐽. Obviously, this submatrix is nonsingular, so 𝑘max ≥ 𝑘 = rk(𝐴𝐼𝐽) = rk(𝐴). □

Note that various choices 𝐼, 𝐽 of row and column indices may yield nonsingular square submatrices 𝐴𝐼𝐽 of maximal size.

Smooth Mappings and Their Differentials. Now consider a mapping 𝐲 = 𝜙(𝐱) = (𝜙1(𝐱), …, 𝜙𝑛(𝐱)) from 𝕂𝑚 → 𝕂𝑛 (𝕂 = ℝ or ℂ, but mostly ℝ in our discussion). We say that 𝜙 is a 𝒞∞ map (or smooth map) if the scalar components 𝜙𝑘(𝐱) have continuous partial derivatives of all orders. The Jacobian matrix for 𝜙 at base point 𝑝 is the 𝑛 × 𝑚 matrix
(𝐷𝜙)𝑝 = [𝜕𝑦𝑖/𝜕𝑥𝑗(𝑝)]𝑛×𝑚   (1 ≤ 𝑖 ≤ 𝑛, 1 ≤ 𝑗 ≤ 𝑚),
whose entries are smooth scalar-valued functions of 𝐱 ∈ 𝕂𝑚. We will be concerned with the rank rk(𝑑𝜙)𝐱 of the Jacobian matrix at and near various base points. We assign a linear operator, the differential of 𝜙 at 𝑝,
(𝑑𝜙)𝑝 ∶ 𝕂𝑚 → 𝕂𝑛   such that   (𝑑𝜙)𝑝(𝐯) = (𝐷𝜙)𝑝 ⋅ col(𝑣1, …, 𝑣𝑚)


at each base point 𝑝 ∈ 𝕂𝑚 where 𝜙 is smooth. The operator (𝑑𝜙)𝑝 is the unique linear operator 𝕂𝑚 → 𝕂𝑛 that “closely approximates” the behavior of the (nonlinear) map 𝜙 ∶ 𝕂𝑚 → 𝕂𝑛 near 𝑝, in the sense that
Δ𝜙 = 𝜙(𝑝 + Δ𝐱) − 𝜙(𝑝) = (𝑑𝜙)𝑝 ⋅ (Δ𝐱) + 𝐸(Δ𝐱),
in which the “error term” 𝐸(Δ𝐱) becomes very small compared to Δ𝐱 for small increments away from the base point 𝑝:

(5.1) Error Estimate: ‖𝐸(Δ𝐱)‖ / ‖Δ𝐱‖ ⟶ 0 as ‖Δ𝐱‖ → 0 (norms taken in 𝕂𝑛 and 𝕂𝑚 respectively).

We indicate this by writing “𝐸(Δ𝐱) = 𝑜(Δ𝐱).” As a function of the base point 𝑝 ∈ 𝕂𝑚, the operator (𝑑𝜙)𝑝 varies smoothly: its entries 𝜕𝜙𝑖/𝜕𝑥𝑗(𝑝) are 𝒞∞ functions of 𝑝.

The rank at 𝑝 of a map 𝜙 is the rank 𝑟 = rk(𝑑𝜙)𝑝 of its Jacobian matrix. As above, rk(𝑑𝜙)𝑝 = 𝑟 ⇔ dim(range(𝑑𝜙)𝑝) = 𝑟 ⇔ dim(ker(𝑑𝜙)𝑝) = 𝑚 − 𝑟 ⇔ there are 𝑟 row indices 𝐼 = {𝑖1 < ⋯ < 𝑖𝑟} and column indices 𝐽 = {𝑗1 < ⋯ < 𝑗𝑟} such that
1. The 𝑟 × 𝑟 submatrix ((𝑑𝜙)𝑝)𝐼𝐽 is nonsingular, and
2. No larger square submatrix (with 𝑘 > 𝑟) can be nonsingular, so 𝑟 × 𝑟 is the maximum size of any nonsingular square submatrix.

Note the following points:
• Various choices of 𝐼, 𝐽 may yield nonsingular submatrices ((𝑑𝜙)𝑝)𝐼𝐽 of maximal size 𝑟 × 𝑟. The valid choices may vary with the base point 𝑝.
• For fixed 𝐼, 𝐽 the entries in the 𝑟 × 𝑟 matrix ((𝑑𝜙)𝐱)𝐼𝐽 vary smoothly with 𝐱, and so does its determinant det((𝑑𝜙)𝐱)𝐼𝐽, since the entries 𝜕𝜙𝑖/𝜕𝑥𝑗 are 𝒞∞. Thus, if this determinant is nonzero at 𝑝, it must also be nonzero for all 𝐱 near 𝑝 by continuity of the determinant det ∶ M(𝑟, 𝕂) → 𝕂. Hence, for fixed choice of 𝐼, 𝐽, we have
rk(𝑑𝜙)𝐱 ≥ 𝑟 for all 𝐱 near 𝑝 if det((𝑑𝜙)𝑝)𝐼𝐽 ≠ 0.
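These points can be seen in a small numerical experiment (a sketch, not from the text; the polynomial map 𝜙(𝐱) = (𝑥1² + 𝑥2², 𝑥2𝑥3) is a made-up example): once the Jacobian has maximal rank at 𝑝, small perturbations cannot lower it, though the rank can drop on thin sets such as the origin.

```python
import numpy as np

# Jacobian of the (hypothetical) polynomial map phi(x) = (x1^2 + x2^2, x2*x3).
def jacobian(x):
    x1, x2, x3 = x
    return np.array([[2*x1, 2*x2, 0.0],
                     [0.0,  x3,   x2]])

p = np.array([1.0, 1.0, 1.0])
print(np.linalg.matrix_rank(jacobian(p)))        # 2, the maximal possible rank

# Rank is stable under small perturbations of the base point, by continuity
# of the 2 x 2 subdeterminants of the Jacobian.
rng = np.random.default_rng(0)
ranks = [np.linalg.matrix_rank(jacobian(p + 1e-3 * rng.standard_normal(3)))
         for _ in range(100)]
print(set(ranks))

# But the rank does drop on a thin exceptional set, e.g. at the origin.
print(np.linalg.matrix_rank(jacobian(np.zeros(3))))  # 0
```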

If 𝑟max is the largest value of rk(𝑑𝜙)𝑝 on 𝕂𝑚, then rk(𝑑𝜙)𝐱 ≡ 𝑟max (constant rank) on some open neighborhood of any base point 𝑝 at which rk(𝑑𝜙)𝑝 = 𝑟max. Often, as when 𝜙 ∶ 𝕂𝑚 → 𝕂𝑛 has scalar components 𝜙 = (𝜙1(𝐱), …, 𝜙𝑛(𝐱)) that are polynomials in 𝐱 = (𝑥1, …, 𝑥𝑚), this open set is dense in 𝕂𝑚, and its complement has Lebesgue measure zero — i.e., maximal rank is achieved at “almost all” points in 𝕂𝑚.
• For any choice of an index set 𝐽 ⊆ [1, 𝑚] with |𝐽| = 𝑘 ≤ min{𝑚, 𝑛}, define 𝐽′ = [1, 𝑚] ∼ 𝐽 and let 𝕂𝐽, 𝕂𝐽′ ⊆ 𝕂𝑚 be the subspaces
𝕂𝐽 = 𝕂-span{𝑒𝑖 ∶ 𝑖 ∈ 𝐽} and 𝕂𝐽′ = 𝕂-span{𝑒𝑖 ∶ 𝑖 ∈ 𝐽′},
where {𝑒𝑖} is the standard basis in 𝕂𝑚. Then 𝕂𝑚 splits as a direct sum 𝕂𝐽 ⊕ 𝕂𝐽′, and this decomposition determines projections
𝜋𝐽 ∶ 𝕂𝑚 → 𝕂𝐽 and 𝜋𝐽′ ∶ 𝕂𝑚 → 𝕂𝐽′
onto these subspaces.

Figure 5.2. The projection maps 𝜋𝐽, 𝜋𝐽′ and the direct sum decomposition 𝕂𝑚 = 𝕂𝐽 ⊕ 𝕂𝐽′ are associated with a partition of column indices [1, 𝑚] = 𝐽′ ∪ 𝐽, 𝐽′ ∩ 𝐽 = ∅.

Near any base point 𝑝 ∈ 𝑀, this splits the variables in 𝐱 = (𝑥1, …, 𝑥𝑚) into two groups, so 𝐱 = (𝐱𝐽′, 𝐱𝐽) with 𝐱𝐽 = (𝑥𝑘1, …, 𝑥𝑘𝑟) and 𝐱𝐽′ = (𝑥ℓ1, …, 𝑥ℓ𝑠), where 𝑟 = |𝐽|, 𝑠 = |𝐽′|, and 𝑟 + 𝑠 = 𝑚 = dim(𝕂𝑚). In our narrative, such partitions of variables arise in discussing the rank 𝑟 = rk(𝑑𝜙)𝑝 of the 𝑛 × 𝑚 Jacobian matrix [𝜕𝜙𝑖/𝜕𝑥𝑗] of a differentiable map 𝜙 ∶ 𝕂𝑚 → 𝕂𝑛 at points on a typical level set 𝐿𝜙(𝐱)=𝑞. By composing 𝜙 with translations in 𝕂𝑚 and 𝕂𝑛, we can assume 𝜙 maps the origin in 𝕂𝑚 to the origin in 𝕂𝑛. This will not change rk(𝑑𝜙)𝑝, but it does simplify the notation. The following exercise shows that this maneuver does not affect the Jacobian matrices or their determinants.

Exercise 5.2. Let 𝑝 ∈ ℝ𝑚.
(a) Let 𝐲 = 𝜙(𝐱) = (𝑥1 + 𝑝1, …, 𝑥𝑚 + 𝑝𝑚) be a translation operator from ℝ𝑚 → ℝ𝑚. Prove that (𝑑𝜙)𝑝 = 𝐼𝑚×𝑚 at every base point 𝑝.
(b) Given smooth maps 𝜙 ∶ ℝ𝑚 → ℝ𝑛 and 𝜓 ∶ ℝ𝑛 → ℝ𝑘 and base points 𝑝 ∈ ℝ𝑚, 𝑞 = 𝜙(𝑝) ∈ ℝ𝑛, explain why the differential of 𝜓 ∘ 𝜙 from ℝ𝑚 → ℝ𝑘 is the matrix product of their differentials: 𝑑(𝜓 ∘ 𝜙)𝑝 = (𝑑𝜓)𝜙(𝑝) ⋅ (𝑑𝜙)𝑝.

Smooth Hypersurfaces and the Implicit Function Theorem (IFT). The implicit function theorem (IFT) concerns itself with level sets
𝐿𝜙(𝐱)=𝑞 = {𝐱 ∈ 𝕂𝑚 ∶ 𝜙(𝐱) = 𝑞} ⊆ 𝕂𝑚,
on which a smooth mapping 𝜙 ∶ 𝕂𝑚 → 𝕂𝑛 has constant (vector) value 𝜙(𝐱) = 𝑞 ∈ 𝕂𝑛. In essence, the IFT says that if 𝑝 is in a level set 𝐿𝜙(𝐱)=𝐪, then the locus 𝐿𝜙(𝐱)=𝐪 is carved out of 𝕂𝑚 by imposing 𝑛 scalar constraints 𝜙1(𝐱) = 𝑐1, …, 𝜙𝑛(𝐱) = 𝑐𝑛


Figure 5.3. Level sets for the map 𝜙 ∶ ℝ2 → ℝ with 𝜙(𝑥, 𝑦) = |𝑧2 − 1|2, identifying 𝑧 = 𝑥 + 𝑖𝑦 with (𝑥, 𝑦) ∈ ℝ2. If 𝑐 < 0 the level set 𝐿𝑐 = 𝐿𝜙(𝑧)=𝑐 is empty; when 𝑐 = 0 it consists of the two isolated points 𝑧 = −1, +1; and for 𝑐 > 0 the locus is usually a smooth curve (perhaps with more than one connected component, as when 0 < 𝑐 < 1). But when 𝑐 = 1 the locus has a singularity at the origin. It cannot be represented near the origin as the graph of any smooth function 𝑦 = ℎ(𝑥) or 𝑥 = 𝑔(𝑦). The origin is a “branch point” for the locus.

on the components of 𝜙, and when the hypotheses of the IFT are satisfied at 𝑝, this level set can be described locally as a smooth hypersurface of dimension 𝑚 − 𝑟 in 𝕂𝑚, where 𝑟 = rk(𝑑𝜙)𝑝. Thus, near 𝑝 the level set is the graph
Γ = {(𝐱, 𝑓(𝐱)) ∶ 𝐱 ∈ 𝕂𝑚−𝑟} ⊆ 𝕂𝑚 = 𝕂𝑚−𝑟 × 𝕂𝑟
of a smooth map 𝑓 ∶ 𝕂𝑚−𝑟 → 𝕂𝑟. The idea is illustrated in the following example (see also Figure 5.2). The map 𝐲 = 𝑓(𝐱) is the “implicit function” of the IFT.

Example 5.3. Define 𝜙(𝑧) = |𝑧2 − 1|2 for 𝑧 ∈ ℂ and regard it as a map ℝ2 → ℝ by identifying 𝑧 = 𝑥 + 𝑖𝑦 ∈ ℂ with 𝐱 = (𝑥, 𝑦) ∈ ℝ2. Then 𝜙 becomes a 4th-degree polynomial in 𝑥 and 𝑦:
𝜙(𝑥, 𝑦) = 𝑥⁴ + 2𝑥²𝑦² + 𝑦⁴ − 2𝑥² + 2𝑦² + 1.
The level sets 𝐿𝑐 = 𝐿𝜙(𝐱)=𝑐 are empty if 𝑐 < 0; reduce to the two isolated points {−1, +1} if 𝑐 = 0; and for 𝑐 > 0 are smooth curves (with more than one connected component when 0 < 𝑐 < 1). However, there is one exception. When 𝑐 = 1, the locus 𝐿𝜙=1, shown in Figure 5.3, has a singularity at the origin. Near 𝑧 = 0 + 𝑖0, it cannot be described locally as the graph of a smooth function 𝑦 = ℎ(𝑥) or 𝑥 = 𝑔(𝑦). The 1 × 2 Jacobian matrix 𝐷𝜙(𝑧) = [𝜕𝜙/𝜕𝑥(𝑧), 𝜕𝜙/𝜕𝑦(𝑧)] = (𝑑𝜙)𝑧 has rk(𝑑𝜙)𝑧 ≡ 1 (constant) throughout ℝ2 except at 𝑧 = 0, 𝑧 = −1, and


𝑧 = +1 on the real axis (the “critical points” where both partial derivatives of 𝜙 are zero).

Exercise 5.4. Let 𝜙 ∶ ℝ2 → ℝ be the function in Example 5.3.
(a) Verify that the Jacobian matrix (𝑑𝜙)𝐱 = [𝜕𝜙/𝜕𝑥, 𝜕𝜙/𝜕𝑦] has rank zero (both components = 0) if and only if 𝐱 = (−1, 0), (0, 0), or (+1, 0) in ℝ2, by solving the system of equations 𝜕𝜙/𝜕𝑥(𝐱) = 0, 𝜕𝜙/𝜕𝑦(𝐱) = 0.
(b) At which points 𝐱 is one of the derivatives 𝜕𝜙/𝜕𝑥 and 𝜕𝜙/𝜕𝑦 zero while the other is nonzero? Draw pictures of the sets
𝑆1 = {𝐱 ∶ 𝜕𝜙/𝜕𝑥 = 0 and 𝜕𝜙/𝜕𝑦 ≠ 0}, 𝑆2 = {𝐱 ∶ 𝜕𝜙/𝜕𝑥 ≠ 0 and 𝜕𝜙/𝜕𝑦 = 0}.
(c) Verify that rk(𝑑𝜙)𝐱 = 1 near every point in the sets 𝑆1 and 𝑆2 identified in (b). How are these points related to the pattern of level curves shown in Figure 5.3?

In discussing the IFT, we will discover that a level set 𝐿𝑐 = 𝐿𝜙(𝐱)=𝑐 in Example 5.3 can be described as the graph of a smooth function 𝑦 = 𝑓(𝑥) near any point 𝑝 on the locus where 𝜕𝜙/𝜕𝑥(𝑝) ≠ 0, and similarly we can write 𝑥 = 𝑔(𝑦) if 𝜕𝜙/𝜕𝑦(𝑝) ≠ 0. In Example 5.3, at least one of these conditions is satisfied at every base point except the origin 𝐱 = (0, 0), which lies on the locus 𝐿𝜙=1, and the points 𝐱 = (−1, 0) and (1, 0), which make up the degenerate locus 𝐿𝜙=0. Consequently, for 𝑐 ≠ 0 or 1, the nonempty level sets 𝐿𝑐 = 𝐿𝜙(𝐱)=𝑐 can be described locally as smooth curves (the graphs of smooth functions 𝑦 = 𝑓(𝑥) or 𝑥 = 𝑔(𝑦)). Furthermore, 𝐿𝑐 can be described both ways (with 𝑦 = 𝑓(𝑥) or with 𝑥 = 𝑔(𝑦)) near most points 𝑝 ∈ 𝐿𝑐, but at a few points, only one such description is possible — these are the points on the curves in Figure 5.3 at which 𝐿𝑐 has either a horizontal or vertical tangent line.

We will apply the IFT to show that the “classical matrix groups” O(𝑛), SO(𝑛), U(𝑛), SO(𝑛, ℂ), etc., are “matrix Lie groups,” smooth hypersurfaces in matrix space M(𝑛, 𝕂) ≃ 𝕂𝑛² that are also groups under matrix multiplication. This combination of algebraic and geometric aspects, not to mention their appearance in many aspects of modern physics,
makes them objects of considerable interest.

Given a 𝒞∞ map 𝜙 ∶ 𝕂𝑚 → 𝕂𝑛 with 𝑚 ≥ 𝑛, we write (𝑑𝜙)𝐱 for the Jacobian matrix [𝜕𝜙𝑖/𝜕𝑥𝑗(𝐱)]𝑛×𝑚 at 𝐱 ∈ ℝ𝑚 and rk(𝑑𝜙)𝐱 for its rank. The simplest version of the IFT asserts that if rk(𝑑𝜙)𝑝 = 𝑛 (the maximum possible rank), then the level set
𝐿𝐜 = 𝐿𝜙(𝐱)=𝐜 with 𝐜 = (𝑐1, …, 𝑐𝑛) = 𝜙(𝑝) in ℝ𝑛
can be described near 𝑝 as a smooth hypersurface of dimension 𝑚 − 𝑛 in 𝕂𝑚. If the rank condition holds near each base point in 𝐿𝐜, the whole level set is a


smooth surface embedded in ℝ𝑚. We will occasionally invoke a stronger version which says this: if we only know that rk(𝑑𝜙)𝐱 ≡ 𝑟 (constant) on 𝐿𝐜, without requiring maximal rank 𝑟 = 𝑛, the conclusion is still true. This will show that various level sets are in fact differentiable manifolds. For simplicity, we shall focus on these results taking 𝕂 = ℝ, although most remain true almost verbatim for 𝕂 = ℂ.

Theorem 5.5 (Implicit Function Theorem (IFT)). Let 𝜙 ∶ ℝ𝑚 → ℝ𝑛 with 𝑚 ≥ 𝑛 be a 𝒞∞ map defined near 𝑝 ∈ ℝ𝑚, let 𝐜 = 𝜙(𝑝), and let 𝑀 = 𝐿𝜙(𝐱)=𝐜 be the level set in ℝ𝑚 containing 𝑝, which is determined by 𝑛 scalar constraint identities 𝜙1(𝐱) = 𝑐1, …, 𝜙𝑛(𝐱) = 𝑐𝑛. Assume rk(𝑑𝜙)𝐱 ≡ 𝑟 for 𝐱 near 𝑝. Then there exist index sets
𝐼 = {𝑖1 < ⋯ < 𝑖𝑟} ⊆ [1, 𝑛] and 𝐽 = {𝑗1 < ⋯ < 𝑗𝑟} ⊆ [1, 𝑚]
such that [(𝑑𝜙)𝑝]𝐼𝐽 is a nonsingular square submatrix of maximal size. The column indices 𝐽 determine a partition of indices
[1, 𝑚] = 𝐽 ∪ 𝐽′ with 𝐽′ = [1, 𝑚] ∼ 𝐽,
from which we get the following: a direct sum decomposition ℝ𝑚 = ℝ𝐽 ⊕ ℝ𝐽′ with |𝐽| = 𝑟 and |𝐽′| = 𝑚 − 𝑟, and linear projections 𝜋𝐽, 𝜋𝐽′ onto the summands. Then there is an open rectangular neighborhood 𝐵1 × 𝐵2 of 𝑝 aligned with the decomposition ℝ𝑚 = ℝ𝐽′ ⊕ ℝ𝐽 such that:
1. On the relatively open neighborhood 𝑈𝑝 = (𝐵1 × 𝐵2) ∩ 𝑀 of 𝑝 in 𝑀, the restriction 𝜋𝐽′|𝑈𝑝 ∶ 𝑈𝑝 → 𝐵1 of the linear projection 𝜋𝐽′ from ℝ𝑚 to ℝ𝐽′ ≅ ℝ𝑚−𝑟 is a bicontinuous bijection between the relatively open set 𝑈𝑝 ⊆ 𝑀 and the open set 𝐵1 ⊆ ℝ𝑚−𝑟. It assigns Euclidean coordinates 𝐱 = (𝑥1, …, 𝑥𝑚−𝑟) to every point in 𝑈𝑝.
2. The inverse of this map,
Ψ = (𝜋𝐽′|𝑈𝑝)⁻¹ ∶ 𝐵1 → 𝑈𝑝 ⊆ ℝ𝑚,
is a 𝒞∞ map from the open set 𝐵1 ⊆ ℝ𝐽′ into all of ℝ𝑚 that maps 𝐵1 onto the “relatively open neighborhood” 𝑈𝑝 of 𝑝 (the intersection of the level set 𝑀 with an open set in ℝ𝑚).

We obtain a 𝒞∞ map 𝑓 ∶ 𝐵1 → 𝐵2 ⊆ ℝ𝐽 ≅ ℝ𝑟 by following Ψ with the “horizontal” projection 𝜋𝐽 shown in Figure 5.4. Then 𝑓 = 𝜋𝐽 ∘ Ψ maps 𝐵1 ⊆ ℝ𝑚−𝑟 to 𝐵2 ⊆ ℝ𝑟, and Ψ is the graph map of 𝑓 because
Ψ(𝑥) = (𝜋𝐽′(Ψ(𝑥)), 𝜋𝐽(Ψ(𝑥))) = (𝑥, 𝜋𝐽 ∘ Ψ(𝑥)) = (𝑥, 𝑓(𝑥)) for 𝑥 ∈ 𝐵1.
In particular, the open neighborhood 𝑈𝑝 in 𝑀 is precisely the graph of the 𝒞∞ map 𝑓 ∶ ℝ𝐽′ → ℝ𝐽.
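Numerically, the implicit function promised by the theorem can be produced point by point with root-finding. The sketch below (not from the text) does this for the quartic of Example 5.3, 𝜙(𝑥, 𝑦) = 𝑥⁴ + 2𝑥²𝑦² + 𝑦⁴ − 2𝑥² + 2𝑦² + 1, near the base point 𝑝 = (√1.5, 0) on the level set 𝜙 = 0.25, where 𝜕𝜙/𝜕𝑥(𝑝) ≠ 0, so the locus is locally a graph 𝑥 = 𝑔(𝑦):

```python
import numpy as np

# phi from Example 5.3 and its x-partial, 4x(x^2 + y^2 - 1).
def phi(x, y):
    return x**4 + 2*x**2*y**2 + y**4 - 2*x**2 + 2*y**2 + 1

def dphi_dx(x, y):
    return 4*x*(x**2 + y**2 - 1)

c = 0.25                                  # phi(sqrt(1.5), 0) = (1.5 - 1)^2 = 0.25

def g(y, x0=np.sqrt(1.5)):
    """Implicit function x = g(y) near p, computed by Newton iteration in x."""
    x = x0
    for _ in range(50):
        x -= (phi(x, y) - c) / dphi_dx(x, y)
    return x

for y in [0.0, 0.05, -0.1]:
    x = g(y)
    print(y, x, phi(x, y))                # phi(g(y), y) = c along the graph
```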


Figure 5.4. A diagram showing the players in the implicit function theorem. Here, 𝑆𝑝 is the level set passing through 𝑝 for a 𝒞∞ map 𝜙 ∶ ℝ𝑚 → ℝ𝑛. By splitting coordinates in ℝ𝑚 into two groups, we get a decomposition ℝ𝑚 = ℝ𝐽′ ⊕ ℝ𝐽 and associated projections 𝜋𝐽′, 𝜋𝐽 from ℝ𝑚 to ℝ𝐽′ or ℝ𝐽 such that (i) the restriction 𝜋𝐽′|𝑈𝑝 becomes a bijective bicontinuous map to an open set 𝐵1 in ℝ𝐽′ for a suitably chosen open neighborhood 𝑈𝑝 of 𝑝 in 𝑆𝑝, and (ii) the inverse Ψ = (𝜋𝐽′|𝑈𝑝)⁻¹ ∶ 𝐵1 → 𝑈𝑝 is a 𝒞∞ map from the open set 𝐵1 ⊆ ℝ𝐽′ into the entire Euclidean space ℝ𝑚 in which the level set 𝑆𝑝 lives. The set 𝑈𝑝 = 𝑆𝑝 ∩ (𝐵1 × 𝐵2) (the shaded rectangular region containing 𝑝) is a relatively open neighborhood of 𝑝 in the level set 𝐿𝐜 where 𝜙(𝐱) = 𝐜.

Conclusion: Near 𝑝, the level set 𝑀 = 𝐿𝜙(𝐱)=𝐜 looks like part of a smooth hypersurface in ℝ𝑚 of dimension 𝑑 = 𝑚 − 𝑟. The situation described in the IFT is shown in Figure 5.4.

Rank vs Dimension of Level Sets. A rough general principle is at work here. If 𝜙 ∶ ℝ𝑚 → ℝ is a scalar-valued 𝒞∞ function, one often finds that the solution set 𝐿𝜙(𝐱)=𝑐, 𝑐 ∈ ℝ, is a smooth hypersurface of dimension 𝑚 − 1. For a vector-valued map 𝜙 ∶ ℝ𝑚 → ℝ𝑛 with 𝜙 = (𝜙1, …, 𝜙𝑛), a typical level set 𝐿𝜙(𝐱)=𝐜 with 𝐜 = (𝑐1, …, 𝑐𝑛) is the intersection of the solution sets for the system of scalar constraints 𝜙1(𝐱) = 𝑐1, …, 𝜙𝑛(𝐱) = 𝑐𝑛.

The solution set tends to lose one degree of freedom for each imposed constraint, so the outcome is usually a smooth hypersurface in ℝ𝑚 of dimension 𝑚 − 𝑟. But that is not always the case, and the point of the IFT is to make clear when it is true. This principle also suggests why it is natural to restrict attention to the case 𝑚 ≥ 𝑛, in which “maximal rank” means rk(𝑑𝜙)𝑝 = 𝑛, the dimension of the “target space” ℝ𝑛 for 𝜙. When the number of constraints 𝑛 exceeds the dimension 𝑚 of the space ℝ𝑚 in which the level set lives, the locus may be


degenerate with no solutions at all, or it may reduce to a set of isolated points in ℝ𝑚. It is worth asking “What’s going on if 𝑟 = rk(𝑑𝜙)𝑝 < 𝑛?” The short answer is that the imposed constraints 𝜙1(𝐱) = 𝑐1, …, 𝜙𝑛(𝐱) = 𝑐𝑛 (one for each degree of freedom in the target space ℝ𝑛) include some redundant constraints whose elimination would not change the solution set 𝐿𝜙(𝐱)=𝐜 but would make 𝑛 equal to 𝑟 = rk(𝑑𝜙)𝑝. Thus, we may as well assume the dimension 𝑛 of the target space for 𝜙 is equal to 𝑟 = rk(𝑑𝜙)𝑝. Unfortunately, ferreting out the redundant constraints could prove an unpleasant chore.

The “Maximal Rank” Case. As a particular example, if 𝜙 maps 𝕂𝑚 → 𝕂𝑛 and 𝑚 ≥ 𝑛, the maximum possible rank of (𝑑𝜙)𝑝 is 𝑛. If this is achieved at some base point 𝑝 ∈ 𝕂𝑚, 𝜙 will automatically have the same (maximal) rank at all points 𝐱 near 𝑝 in 𝕂𝑚, because the entries in the Jacobian matrix [𝐷𝜙(𝐱)] vary continuously with 𝐱 and some 𝑛 × 𝑛 submatrix has det[𝐷𝜙(𝑝)]𝐼𝐽 ≠ 0. The maximal rank case 𝑟 = 𝑛 is the one most often encountered, but sometimes one needs the stronger “constant rank” version in which 𝑟 ≤ 𝑛.

Exercise 5.6. Consider the 𝒞∞ map 𝜙 ∶ ℝ4 → ℝ2 given by
𝐲 = (𝑦1(𝐱), 𝑦2(𝐱)) = 𝜙(𝑥1, 𝑥2, 𝑥3, 𝑥4) = (𝑥1² + 𝑥2², 𝑥3² − 𝑥4² + 𝑥1𝑥4).
(a) Show that the locus 𝑀 = 𝐿𝜙(𝐱)=𝐪 can be described as a smooth two-dimensional hypersurface in ℝ4 near 𝑝 = (1, 2, −1, 3), at which 𝑞 = 𝜙(𝑝) = (5, −5). Identify all pairs of variables 𝑥𝑖, 𝑥𝑗 (1 ≤ 𝑖 < 𝑗 ≤ 4) that can be used to smoothly parametrize this hypersurface near 𝑝.
(b) Is this locus smooth near all of its points?
Hint. In (b), start by showing 𝑥4 ≠ 0 for every point 𝐱 ∈ 𝑀, so you can assume 𝑥4 ≠ 0 in calculations involving points on 𝑀, even if you cannot draw a picture. In answering (b), you will have to compute rk(𝐴𝐼𝐽) for various square submatrices of the 2 × 4 Jacobian matrix [𝜕𝑦𝑖/𝜕𝑥𝑗], which has variable coefficients. Do this using symbolic row operations.

Exercise 5.7. Consider the 𝒞∞ scalar-valued function 𝜙 ∶ ℝ2 → ℝ, 𝜙(𝑥, 𝑦) = 𝑥³𝑦 + 2𝑒𝑥𝑦.
(a) Find all critical points, where both partial derivatives 𝜕𝜙/𝜕𝑥 and 𝜕𝜙/𝜕𝑦 are zero. At a critical point 𝑝 there is no way to represent the level set 𝑆𝑝 = 𝐿𝜙(𝐱)=𝜙(𝐩) as the graph of a smooth function 𝑦 = 𝑓(𝑥) or 𝑥 = 𝑔(𝑦).
(b) Locate all points 𝑝 where one of the partial derivatives is zero but the other is not (two cases to consider). Find the value of 𝜙 at each such base point to determine which level sets 𝐿𝜙(𝐱)=𝑐 contain such points.
(c) The locus 𝑀 = 𝐿𝜙(𝐱)=2 obviously contains the horizontal and vertical axes. Prove that there are no other points on this locus. (Thus the origin is a singularity for the locus 𝑀, and there are no others.)
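Returning to Exercise 5.6, a quick numerical sanity check (a sketch, not a substitute for the symbolic row-operation analysis the exercise asks for) confirms that the Jacobian has maximal rank 2 at the given base point, so the IFT applies there:

```python
import numpy as np

# phi(x) = (x1^2 + x2^2, x3^2 - x4^2 + x1 x4) from Exercise 5.6,
# with 2 x 4 Jacobian [[2 x1, 2 x2, 0, 0], [x4, 0, 2 x3, x1 - 2 x4]].
def phi(x):
    return np.array([x[0]**2 + x[1]**2, x[2]**2 - x[3]**2 + x[0]*x[3]])

def jacobian(x):
    x1, x2, x3, x4 = x
    return np.array([[2*x1, 2*x2, 0.0,  0.0],
                     [x4,   0.0,  2*x3, x1 - 2*x4]])

p = np.array([1.0, 2.0, -1.0, 3.0])
print(phi(p))                      # (5, -5), the level q in part (a)
J = jacobian(p)
print(J)
print(np.linalg.matrix_rank(J))    # 2 = maximal rank, so the IFT applies at p
```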


Hint. In (c): quadrant-by-quadrant, what is the sign of 𝜕𝜙/𝜕𝑦 off of the 𝑥- and 𝑦-axes?

Exercise 5.8. Let 𝐲 = 𝜙(𝐱) = (𝑦1(𝐱), 𝑦2(𝐱)) be a 𝒞∞ map from ℝ3 → ℝ2 such that
1. 𝜙(𝑝) = 𝑞 = (0, 0) at base point 𝑝 = (3, −1, 2).
2. At 𝑝, the Jacobian matrix is [𝜕𝑦𝑖/𝜕𝑥𝑗] = ( 1 2 1 ; 1 −1 1 ) (rows separated by semicolons).
Answer the following questions without knowing anything more about 𝜙:
(a) Can the level set 𝑀 = 𝐿𝜙(𝐱)=𝑞 be described near 𝑝 = (3, −1, 2) as a smooth hypersurface in ℝ3? Of what dimension 𝑘?
(b) Which of the variables 𝑥1, 𝑥2, 𝑥3 can legitimately be used to parametrize 𝑀 near 𝑝 = (3, −1, 2) as the graph of a smooth map 𝑓 from ℝ𝑘 to ℝ3? List all valid choices of the parametrizing variables 𝑥𝑖1, …, 𝑥𝑖𝑘.

Exercise 5.9. If 𝑓 ∶ 𝕂𝑟 → 𝕂𝑠 is a 𝒞∞ map defined on an open set 𝐵 ⊆ 𝕂𝑟, its graph
Γ = {(𝐱, 𝑓(𝐱)) ∈ 𝕂𝑟 × 𝕂𝑠 ∶ 𝐱 ∈ 𝐵}
is the range of the graph map 𝐹(𝐱) = (𝐱, 𝑓(𝐱)) from 𝕂𝑟 → 𝕂𝑟+𝑠. Show that the graph map is 𝒞∞ for 𝐱 ∈ 𝐵 ⊆ 𝕂𝑟 and that rk(𝑑𝐹)𝐱 ≡ 𝑟 (constant) for all 𝐱 ∈ 𝐵.

The Inverse Mapping Theorem (IMT). A result closely related to the implicit function theorem tells us when a differentiable map 𝜙 ∶ 𝕂𝑛 → 𝕂𝑛 (𝕂 = ℝ or ℂ) has a locally defined inverse near a base point 𝑝. This variant of the IFT will be central in our discussion of matrix Lie groups.

Theorem 5.10 (Inverse Mapping Theorem). Let 𝜙(𝐱) = (𝜙1(𝐱), …, 𝜙𝑛(𝐱)) be a map from ℝ𝑛 → ℝ𝑛 that is differentiable of class 𝒞(𝑟), so all partial derivatives 𝐷𝛼𝜙𝑖 of order |𝛼| = 𝛼1 + ⋯ + 𝛼𝑛 ≤ 𝑟 exist and are continuous near a base point 𝑝 in ℝ𝑛. If the Jacobian matrix at 𝑝,
𝐷𝜙(𝑝) = [𝜕𝜙𝑖/𝜕𝑥𝑗(𝑝)],
is invertible, so det(𝐷𝜙(𝑝)) ≠ 0, there exist open neighborhoods 𝑈 about 𝑝 and 𝑉 about 𝑞 = 𝜙(𝑝) such that
1. The map 𝐲 = 𝜙(𝐱) is of class 𝒞(𝑟) on 𝑈 and is a bijective map from 𝑈 → 𝑉.
2. The inverse map 𝐱 = 𝜓(𝐲) from 𝑉 → 𝑈 is differentiable of class at least 𝒞(𝑟) on 𝑉.

Remark 5.11.
Obviously, the inverse map will be of class 𝒞∞ if the forward map 𝜙 is 𝒞∞; the theorem remains true for multivariate maps 𝜙 ∶ ℂ𝑛 → ℂ𝑛 involving complex variables. Moreover, it even applies to analytic functions of several real or complex variables. Analyticity at 𝑝 means each scalar component 𝜙𝑘 has an absolutely convergent multivariable Taylor series expansion
𝜙𝑘(𝐱) = ∑𝛼∈ℤ₊𝑛 (𝐷𝛼𝜙𝑘(𝐩) / 𝛼1! ⋯ 𝛼𝑛!) ⋅ (𝐱 − 𝐩)𝛼,  where 𝐱𝛼 = 𝑥1^𝛼1 ⋯ 𝑥𝑛^𝛼𝑛,


on some open neighborhood of 𝑝. The class of real- or complex-analytic functions on a domain 𝑈 is indicated by the special symbol 𝒞𝜔(𝑈).² As an example, the exponential map Exp(𝐴) = 𝑒𝐴 is a 𝒞𝜔 map on 𝕂𝑛² when we identify 𝐴 ∈ M(𝑛, 𝕂) with 𝐚 = (𝑎11, …, 𝑎1𝑛, …, 𝑎𝑛1, …, 𝑎𝑛𝑛) in 𝕂𝑛² for 𝕂 = ℝ (or in ℝ2𝑛² when 𝕂 = ℂ). Even in one variable, analyticity of 𝜙 is a much stronger condition than 𝜙 ∈ 𝒞∞. ○

Differentiable Manifolds (General Definition). A space 𝑀 is locally Euclidean of dimension 𝑑 if it can be covered by a family of “charts” {(𝑥𝛼, 𝑈𝛼) ∶ 𝛼 in some index set 𝐼}, where 𝑥𝛼 ∶ 𝑈𝛼 → 𝑉𝛼 ⊆ ℝ𝑑 is a bicontinuous map from an open subset 𝑈𝛼 ⊆ 𝑀 to an open set 𝑉𝛼 = 𝑥𝛼(𝑈𝛼) in the Euclidean space ℝ𝑑. The “chart maps” 𝑥𝛼 assign locally defined Euclidean coordinates 𝑥𝛼(𝑢) = (𝑥1(𝛼)(𝑢), …, 𝑥𝑑(𝛼)(𝑢)) for 𝑢 ∈ 𝑈𝛼. Thus, 𝑀 looks locally like Euclidean coordinate space ℝ𝑑. Where the domains of two charts (𝑥𝛼, 𝑈𝛼), (𝑥𝛽, 𝑈𝛽) overlap, we have the situation shown in Figure 5.5. The intersection 𝑈𝛼 ∩ 𝑈𝛽 is an open set in 𝑀, the images 𝑁𝛼 = 𝑥𝛼(𝑈𝛼 ∩ 𝑈𝛽), 𝑁𝛽 = 𝑥𝛽(𝑈𝛼 ∩ 𝑈𝛽) are open sets in coordinate space ℝ𝑑, and we have induced coordinate transition maps that tell us how the coordinates 𝐱 = 𝑥𝛼(𝑢) and 𝐲 = 𝑥𝛽(𝑢) assigned to 𝑢 ∈ 𝑀 by the chart maps are related. These transition maps
𝐱 = (𝑥𝛼 ∘ 𝑥𝛽⁻¹)(𝐲) from 𝑁𝛽 → 𝑁𝛼 and 𝐲 = (𝑥𝛽 ∘ 𝑥𝛼⁻¹)(𝐱) from 𝑁𝛼 → 𝑁𝛽

are bicontinuous bijections between the open sets 𝑁𝛼 and 𝑁𝛽 in ℝ𝑑.

Definition 5.12. A locally Euclidean space 𝑀 is a differentiable manifold of dimension dim(𝑀) = 𝑑 if the charts (𝑈𝛼, 𝑥𝛼) that cover 𝑀 each map 𝑀 into ℝ𝑑 and are 𝒞∞-related, so the coordinate transition maps are 𝒞∞ between the open sets 𝑁𝛼, 𝑁𝛽 ⊆ ℝ𝑑 that correspond to the (open) intersection 𝑈𝛼 ∩ 𝑈𝛽 of chart domains in 𝑀.

This allows us to make sense of such “smooth manifolds” without requiring that they be embedded in some surrounding Euclidean space. It is the starting point for modern differential geometry. Once we make 𝑀 a 𝒞∞ manifold by introducing 𝒞∞-related covering charts, we can begin to do calculus on 𝑀. The following concepts now make sense:

1. Smooth Functions on 𝑀. A map 𝑓 ∶ 𝑀 → ℝ is a 𝒞∞ function on 𝑀 if the scalar-valued function
𝑦 = 𝐹(𝐱) = (𝑓 ∘ 𝑥𝛼⁻¹)(𝐱),
defined on the open set 𝑉𝛼 = 𝑥𝛼(𝑈𝛼) in coordinate space ℝ𝑑, has continuous partial derivatives of all orders for each of the covering charts that determine the manifold structure of 𝑀.
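A toy instance of this chart formalism (an illustration, not from the text): cover the unit circle 𝑀 ⊆ ℝ2 by the chart 𝑥𝛼(𝑢) = 𝑢1 on the upper arc and 𝑥𝛽(𝑢) = 𝑢2 on the right arc. The function 𝑓(𝑢) = 𝑢1𝑢2 is 𝒞∞ on 𝑀 because its coordinate expression 𝐹(𝑥) = (𝑓 ∘ 𝑥𝛼⁻¹)(𝑥) = 𝑥√(1 − 𝑥²) is 𝒞∞ on (−1, 1), and the transition map on the overlap is √(1 − 𝑥²):

```python
import numpy as np

def chart_alpha_inv(x):      # x in (-1, 1) -> the point on the upper arc above x
    return np.array([x, np.sqrt(1.0 - x**2)])

def chart_beta(u):           # chart on the right arc {u1 > 0}: keep u2
    return u[1]

def f(u):                    # a scalar-valued function on M
    return u[0] * u[1]

def F(x):                    # coordinate expression  f o x_alpha^{-1}
    return f(chart_alpha_inv(x))

def transition(x):           # x_beta o x_alpha^{-1}  on the overlap quarter-arc
    return chart_beta(chart_alpha_inv(x))

for x in [0.2, 0.5, 0.9]:
    u = chart_alpha_inv(x)
    print(np.isclose(np.dot(u, u), 1.0),                  # u really lies on M
          np.isclose(F(x), x * np.sqrt(1.0 - x**2)),      # smooth in the chart
          np.isclose(transition(x), np.sqrt(1.0 - x**2))) # smooth transition
```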



Figure 5.5. The coordinate transition maps 𝑥𝛼 ∘ 𝑥𝛽⁻¹ and 𝑥𝛽 ∘ 𝑥𝛼⁻¹ between two charts (𝑥𝛼, 𝑈𝛼), (𝑥𝛽, 𝑈𝛽) and their (shaded) domains of definition in ℝ𝑑 are shown. Both shaded domains 𝑁𝛼, 𝑁𝛽 in ℝ𝑑 correspond to the intersection 𝑈𝛼 ∩ 𝑈𝛽 of the chart domains, which is an open set in the locally Euclidean space 𝑀.

2. Smooth Maps Between Manifolds. A map 𝜙 ∶ 𝑀 → 𝑁 is a 𝒞∞ mapping between manifolds 𝑀 and 𝑁 of dimensions 𝑚 and 𝑛 if it becomes a 𝒞∞ map from ℝ𝑚 → ℝ𝑛 when described in local coordinates on 𝑀 and 𝑁. Thus, if (𝑈𝛼, 𝑥𝛼) and (𝑈𝛽, 𝑦𝛽) are charts on 𝑀 and 𝑁 respectively, the composite
𝐲 = Φ(𝐱) = (𝑦𝛽 ∘ 𝜙 ∘ 𝑥𝛼⁻¹)(𝐱)
is a 𝒞∞ map from ℝ𝑚 → ℝ𝑛 wherever it is well-defined.

3. Parametric Curves in 𝑀. A parametric curve in 𝑀 is any continuous map 𝑦 = 𝛾(𝑡) from some interval [𝑎, 𝑏] ⊆ ℝ into 𝑀. It is a 𝒞∞ curve in 𝑀 if it becomes a 𝒞∞ vector-valued map
𝐱 = (𝑥1(𝑡), …, 𝑥𝑚(𝑡)) = 𝑥𝛼 ∘ 𝛾(𝑡) for 𝑡 ∈ [𝑎, 𝑏]
from [𝑎, 𝑏] → ℝ𝑚 for every chart (𝑈𝛼, 𝑥𝛼) on 𝑀.

Exercise 5.13. Suppose ℎ ∶ 𝑀 → 𝑁 and 𝑓 ∶ 𝑁 → 𝑄 are 𝒞∞ maps between 𝒞∞ manifolds. In terms of the preceding definitions, explain why the composite 𝑓 ∘ ℎ is a 𝒞∞ map from 𝑀 → 𝑄 wherever it is well-defined.
Hint. If 𝑀, 𝑁, 𝑄 are Euclidean coordinate spaces ℝ𝑚, ℝ𝑛, ℝ𝑞, this follows by the chain rule of multivariate calculus.

Differentiable Manifolds and the IFT. One consequence of the IFT is this: If 𝜙 is a 𝒞∞ map ℝ𝑚 → ℝ𝑛 and if rk(𝑑𝜙)𝐱 ≡ 𝑟 (constant) near every point in a level set 𝑀 = 𝐿𝜙(𝐱)=𝐜, 𝐜 ∈ ℝ𝑛, we can


use the IFT to create a family of 𝒞∞-related charts (𝑥𝛼, 𝑈𝛼) whose domains 𝑈𝛼 cover 𝑀. The resulting standard 𝒞∞ structure makes 𝑀 into a smooth (𝑚 − 𝑟)-dimensional manifold. The crucial fact that the charts are 𝒞∞-related follows directly from the way the standard charts are constructed using the IFT, as in the following example.

Example 5.14 (Constructing “Standard Charts” on Level Sets). The IFT and the “constant rank” condition allow us to construct a chart (𝑥𝛼, 𝑈𝛼) about a typical base point 𝑝 ∈ 𝑀.

Step 1. Write 𝜙 as 𝐯 = 𝜙(𝐮) in terms of the standard coordinates 𝐮 = (𝑢1, …, 𝑢𝑚) and 𝐯 = (𝑣1, …, 𝑣𝑛) in ℝ𝑚 and ℝ𝑛. By Lemma 5.1 and the “constant rank” condition, we can, for each 𝐮 ∈ 𝑀, choose row and column indices 𝐼 ⊆ [1, 𝑛] and 𝐽 ⊆ [1, 𝑚] with |𝐼| = |𝐽| = 𝑟 = rk(𝑑𝜙)𝐮 such that the square submatrices [𝜕𝑣𝑖/𝜕𝑢𝑗(𝐮)]𝐼𝐽 are nonsingular for 𝐮 near 𝑝 in ℝ𝑚. By Lemma 5.1, this cannot be done for any larger square submatrix.

Step 2. Using the column indices determined in Step 1, let 𝐽′ = [1, 𝑚] ∼ 𝐽, split ℝ𝑚 = ℝ𝐽 ⊕ ℝ𝐽′, and let 𝜋𝐽, 𝜋𝐽′ be the projection maps from ℝ𝑚 to ℝ𝐽 or ℝ𝐽′. By the IFT, there is a rectangular open neighborhood 𝐵1 × 𝐵2 of 𝑝 in ℝ𝑚 such that the projection
𝜋𝐽′ ∶ (𝐵1 × 𝐵2) → 𝐵1
maps the relatively open neighborhood 𝑈𝑝 = (𝐵1 × 𝐵2) ∩ 𝑀 in 𝑀 bijectively to the open set 𝐵1 ⊆ ℝ𝐽′ ≅ ℝ𝑚−𝑟. To get a chart (𝑥𝛼, 𝑈𝛼) that imposes Euclidean coordinates on 𝑀 near 𝑝, we take 𝑈𝛼 = 𝑈𝑝 and bijective chart map
𝑥𝛼 = (𝜋𝐽′|𝑈𝑝) ∶ 𝑈𝑝 → 𝐵1 (an open set in ℝ𝑚−𝑟).
Such charts (𝑥𝛼, 𝑈𝛼) obviously cover 𝑀, owing to constancy of rk(𝑑𝜙) near every point in 𝑀.

Step 3. The inverse map Ψ = (𝜋𝐽′|𝑈𝑝)⁻¹ ∶ 𝐵1 → 𝑈𝑝 ⊆ 𝑀 ⊆ ℝ𝑚 is 𝒞∞ from 𝐵1 ⊆ ℝ𝐽′ into all of ℝ𝑚, and its range is precisely the chart domain 𝑈𝑝.

Now we must show that charts created this way, perhaps about different base points, are always 𝒞∞-related wherever they overlap.

Theorem 5.15. If 𝜙 ∶ ℝ𝑚 → ℝ𝑛 is a 𝒞∞ map, let 𝑀 = 𝐿𝜙(𝐱)=𝐜 be a level set such that rk(𝑑𝜙)𝐱 ≡ 𝑟 (constant) on an open neighborhood of every point in 𝑀. Then all standard charts on 𝑀 obtained by the preceding construction are 𝒞∞-related where they overlap. This determines the standard 𝒞∞ structure on 𝑀. The dimension of the resulting 𝒞∞ manifold is 𝑑 = 𝑚 − 𝑟.

Proof. Consider two standard charts (𝑥𝛼, 𝑈𝛼) and (𝑥𝛽, 𝑈𝛽) about a typical point 𝑝 in 𝑈𝛼 ∩ 𝑈𝛽. The chart (𝑥𝛼, 𝑈𝛼) is determined by a partition of column indices [1, 𝑚] = 𝐽′(𝛼) ∪ 𝐽(𝛼) and a choice of row indices 𝐼(𝛼) ⊆ [1, 𝑛] with |𝐼(𝛼)| = |𝐽(𝛼)| = 𝑟; then |𝐽′(𝛼)| = 𝑚 − 𝑟 and [(𝑑𝜙)𝑝]𝐼𝐽 is nonsingular. In the


notation of the IFT, we then have
𝑈𝛼 = (𝐵1𝛼 × 𝐵2𝛼) ∩ 𝑀,  𝑥𝛼 = (𝜋𝐽′(𝛼)|𝑈𝛼),
𝑉𝛼 = 𝑥𝛼(𝑈𝛼) = 𝐵1𝛼, an open set in ℝ𝑚−𝑟.
The chart map 𝑥𝛼 is the restriction to 𝑈𝛼 of a linear (hence 𝒞∞) projection map 𝜋𝐽′(𝛼) ∶ ℝ𝑚 → ℝ𝐽′(𝛼) ≅ ℝ𝑚−𝑟,
𝑥𝛼 = (𝜋𝐽′(𝛼)|𝑈𝛼) ∶ 𝑈𝛼 → 𝐵1𝛼 ⊆ ℝ𝑚−𝑟.
Its inverse Ψ𝛼 = 𝑥𝛼⁻¹ is the graph map
Ψ𝛼 = (𝜋𝐽′(𝛼)|𝑈𝛼)⁻¹ ∶ 𝑉𝛼 → 𝑈𝛼,
which is actually a 𝒞∞ map from 𝐵1𝛼 into all of ℝ𝑚.
The chart (𝑥𝛽, 𝑈𝛽) corresponds to some other choice of column and row indices [1, 𝑚] = 𝐽′(𝛽) ∪ 𝐽(𝛽) and 𝐼(𝛽) ⊆ [1, 𝑛], and a corresponding rectangular open neighborhood 𝐵1𝛽 × 𝐵2𝛽 of 𝑝 in ℝ𝑚. The chart map on 𝑈𝛽 = (𝐵1𝛽 × 𝐵2𝛽) ∩ 𝑀 is just the restriction to 𝑈𝛽 of a linear projection 𝜋𝐽′(𝛽), and by the IFT its inverse is a 𝒞∞ map from 𝐵1𝛽 into all of ℝ𝑚. Therefore the coordinate transition map
𝑥𝛽 ∘ 𝑥𝛼⁻¹ = 𝜋𝐽′(𝛽) ∘ (𝜋𝐽′(𝛼)|𝑈𝛼)⁻¹ = 𝜋𝐽′(𝛽) ∘ Ψ𝛼
is the composite of a linear map and a 𝒞∞ map,
ℝ𝑚−𝑟 ≅ ℝ𝐽′(𝛼) ⟶(𝑥𝛼⁻¹)⟶ ℝ𝑚 ⟶(𝑥𝛽)⟶ ℝ𝐽′(𝛽) ≅ ℝ𝑚−𝑟,
and is certainly 𝒞∞. Likewise for the reverse transition map. □

The preceding proof is burdened by the complicated notation used to label all the players. A stripped-down outline emphasizes the intuition behind the proof. To show that 𝐲 = x_β ∘ x_α^{−1}(𝐱) is 𝒞^∞ near 𝐱₀, we observe that:
• Near 𝐱₀, the map x_α^{−1} coincides with the map (π_{J′(α)}|U_α)^{−1}, which by the IFT is a 𝒞^∞ map from an open set in ℝ^{m−r} into all of ℝ^m that sends 𝐱₀ → p and whose range is contained in M.
• Near p, the chart map x_β coincides with the globally defined linear projection map π_{J′(β)}, which is certainly 𝒞^∞.
Therefore the transition map x_β ∘ x_α^{−1} is the composite of a linear map and a 𝒞^∞ map,
ℝ^{m−r} ≅ ℝ^{J′(α)} --x_α^{−1}--> ℝ^m --x_β--> ℝ^{J′(β)} ≅ ℝ^{m−r},
and is itself 𝒞^∞. Likewise for the transition map in the reverse direction.

Example 5.16. Let φ : ℝ³ → ℝ¹ with φ(𝐱) = x₃² − x₁² − x₂². At any p = (x₁, x₂, x₃), the 1 × 3 Jacobian matrix
(dφ)_p = [∂φ/∂x₁, ∂φ/∂x₂, ∂φ/∂x₃] = [−2x₁, −2x₂, 2x₃]
is essentially the classical "gradient" vector ∇φ(p). Then rk(dφ)_𝐱 is constant ≡ 1 unless all three entries are zero, which happens only at the origin p = (0, 0, 0).


5. MATRIX LIE GROUPS

Figure 5.6. In (a) we show some level sets L_c = L_{φ(𝐱)=c} for the map φ(𝐱) = x₃² − x₂² − x₁² from ℝ³ → ℝ¹. For c = 0 the level set, where x₃² − x₁² − x₂² = 0, shown in (b), is a double cone with a singularity at the origin, where it fails to be locally Euclidean. All other level sets are smooth two-dimensional hypersurfaces in ℝ³, but the geometry of L_c changes as we pass from c < 0 to c > 0. For c < 0, we get a single connected surface; for c > 0 there are two isolated pieces, both smooth.

The level set M₀ = L_{φ(𝐱)=0} is the double cone shown in Figure 5.6(b). This two-dimensional hypersurface has a singularity at the origin in ℝ³, where it fails to be locally Euclidean, so L_{φ(𝐱)=0} cannot be made into a smooth manifold by covering it with suitably defined coordinate charts. All other level surfaces L_{φ(𝐱)=c}, c ≠ 0, are smooth two-dimensional manifolds, a few of which are shown in Figure 5.6(a).
Consider the possible charts we might impose near the point p = (1, 1, √3) on the particular level set L_{φ(𝐱)=1}. The entries of (dφ)_p = [−2x₁, −2x₂, 2x₃] are nonzero at and near p, and rk(dφ)_𝐱 is constant ≡ 1 near p, so we may apply the IFT to define standard charts about p. Each nonzero entry in (dφ)_p corresponds to a nonsingular 1 × 1 submatrix, so several legitimate groupings of variables are available to parametrize M near p:

(5.2)  J = {1}, J′ = {2, 3};   J = {2}, J′ = {1, 3};   J = {3}, J′ = {1, 2}.

Thus L_{φ(𝐱)=1} can be described as the graph in ℝ³ of various smooth functions x_k = f_k(x_i, x_j), obtained by solving 1 = φ(𝐱) = x₃² − x₂² − x₁² for one variable in terms of the other two.


1. x₁ = f₁(x₂, x₃) = +√(x₃² − x₂² − 1) near (1, √3) in the (x₂, x₃)-plane.
2. x₂ = f₂(x₁, x₃) = +√(x₃² − x₁² − 1) near (1, √3) in the (x₁, x₃)-plane.
3. x₃ = f₃(x₁, x₂) = +√(1 + x₁² + x₂²) near (1, 1) in the (x₁, x₂)-plane.

The coordinate transition map (x₁, x₃) = y_β ∘ x_α^{−1}(x₂, x₃) can be computed directly by writing x₁ = f₁(x₂, x₃) to get (x₁, x₃) in terms of (x₂, x₃). The resulting transition map
(x₁, x₃) = Φ(x₂, x₃) = y_β ∘ x_α^{−1}(x₂, x₃) = (x₁, x₃)|_{x₁ = f₁(x₂, x₃)} = (√(x₃² − x₂² − 1), x₃)
is clearly a 𝒞^∞ map from (x₂, x₃) to (x₁, x₃). So is its inverse.

Exercise 5.17. In Example 5.16,
(a) Compute the inverse map (x₂, x₃) = Φ^{−1}(x₁, x₃).
(b) One of the valid splittings [1, 3] = J′ ∪ J of column indices listed in (5.2) is J′ = {1, 2}, J = {3}. Find an explicit formula for the corresponding projection map (x₁, x₂) = π_{J′}(x₁, x₂, x₃) that assigns Euclidean coordinates to points 𝐱 = (x₁, x₂, x₃) on M near p = (1, 1, √3).
(c) Give an explicit formula for the inverse (x₁, x₂, x₃) = (π_{J′}|M)^{−1}(x₁, x₂) of the projection map in (b).

Exercise 5.18. Consider the points
(a) p = (1, 0, √2)   (b) p = (0, 0, −1)
on the two-dimensional hypersurface M = L_{φ(𝐱)=1} of Example 5.16. In each case, determine all groups of coordinates x_{J′} = (x_i, x_j) that parametrize M near the base point p.

Exercise 5.19. The unit sphere S² = L_{φ(𝐱)=1} in ℝ³ is determined by φ(𝐱) = x₁² + x₂² + x₃² = 1. Verify that it is a 𝒞^∞ manifold in ℝ³ by showing that rk(dφ)_𝐱 ≡ 1 near every point 𝐱 ∈ S².
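As a numerical sketch of Exercise 5.17(a) (our own helper names, using the chart conventions of Example 5.16), the transition map Φ and its inverse can be checked to be mutually inverse near p = (1, 1, √3):

```python
import math

def f1(x2, x3):
    # graph function solving x3^2 - x1^2 - x2^2 = 1 for x1 > 0
    return math.sqrt(x3 * x3 - x2 * x2 - 1.0)

def Phi(x2, x3):
    # transition map (x1, x3) = Phi(x2, x3) between the two charts
    return (f1(x2, x3), x3)

def Phi_inv(x1, x3):
    # inverse transition: solve x3^2 - x1^2 - x2^2 = 1 for x2 > 0
    return (math.sqrt(x3 * x3 - x1 * x1 - 1.0), x3)

x2, x3 = 1.0, math.sqrt(3.0)   # alpha-coordinates of p = (1, 1, sqrt(3))
x1, _ = Phi(x2, x3)
back = Phi_inv(x1, x3)
print((x1, x3), back)          # round trip recovers (x2, x3) up to rounding
```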

Exercise 5.20. Describe a set of standard charts covering the unit sphere 𝑆 2 = 𝐿𝜙(𝐱)=1 where 𝜙(𝐱) = 𝑥12 + 𝑥22 + 𝑥32 , taking for your chart domains the relatively open hemispheres (boundary circles excluded) 𝑈𝑘+ = {𝐱 ∈ 𝑆 2 ∶ 𝑥𝑘 > 0}

𝑈𝑘− = {𝐱 ∈ 𝑆 2 ∶ 𝑥𝑘 < 0}

for k = 1, 2, 3. (All six hemispheres are required to fully cover the sphere S².) The chart maps x_k^± : U_k^± → ℝ² project points 𝐱 ∈ U_1^± onto the open unit disc x₂² + x₃² < 1 in the (x₂, x₃)-plane when k = 1; they project U_2^± onto the


open disc in the (x₁, x₃)-plane when k = 2; and they project onto the disc in the (x₁, x₂)-plane when k = 3.
(a) Give explicit formulas for the chart maps on the particular domains U_1^+ and U_3^−.
(b) Compute the coordinate transition maps in both directions for this pair of charts, noting that they have the form (x_i, x_j) = x_α(𝐱) = x_α(x₁, x₂, x₃) for 𝐱 ∈ M.
These are examples of standard charts on the level set L_{φ(𝐱)=1}.

Exercise 5.21 (Stereographic Projection on the 2-Sphere S²). Let H⁺ be the two-dimensional hyperplane in ℝ³ that is tangent to the unit sphere M = S² at its "north pole" N = (0, 0, +1), and consider the "punctured sphere" U_α = S² ∼ {S} obtained by deleting the south pole S = (0, 0, −1) from the sphere. Each point 𝐮 ∈ U_α determines a straight line in ℝ³ that emanates from S and passes through 𝐮; continuing along this line, we meet the hyperplane H⁺ in a unique point with coordinates (x₁(𝐮), x₂(𝐮), +1). The resulting bijection Φ⁺ from U_α to H⁺ is an example of stereographic projection. Dropping the redundant coordinate entry "1", we obtain the stereographic projection map x_α : U_α → ℝ², x_α(𝐮) = (x₁(𝐮), x₂(𝐮)) ∈ ℝ², which is bicontinuous from the open subset U_α ⊆ S² onto all of coordinate space ℝ². Similarly, we may project the punctured sphere U_β = S² ∼ {N} along lines radiating from N onto the hyperplane H⁻ tangent to the sphere at the south pole S = (0, 0, −1) to define a second chart map x_β(𝐯) = (x′, y′) ∈ ℝ² for 𝐯 ∈ U_β.
(a) Give an explicit formula for the stereographic projection map (x₁, x₂) = x_α(𝐮) = x_α(u₁, u₂, u₃). Note carefully that x_α maps triples 𝐮 with u₁² + u₂² + u₃² = 1 to pairs (x₁, x₂) ∈ ℝ².
(b) Compute the coordinate transition map 𝐯 = x_β ∘ x_α^{−1}(𝐮) and its inverse, and check that it is 𝒞^∞ where defined.

Stereographic projection allows us to cover the sphere 𝑆 2 with just two 𝒞 ∞ related charts, the minimum number possible since it is well known that 𝑆 2 cannot be mapped bicontinuously to an open set in the plane ℝ2 . But note: for many purposes, the covering with hemispheres leads to simpler computations than stereographic projection. Hint. In (a), use similar triangles and rotational symmetry. Note. The charts 𝐻 + and 𝐻 − are not among the founding charts for the differentiable structure on 𝑆2 described in Example 5.14, but they are 𝒞 ∞ -related to all founding charts (which, by Theorem 5.15, are 𝒞 ∞ -related to each other). ○
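Exercise 5.21(a) can be previewed numerically. In the sketch below (the function names and sample point are ours), each projection is computed by intersecting the ray from the opposite pole with the tangent plane; we also check that at the sample point the two chart values are related by (x′, y′) = 4(x, y)/(x² + y²), a formula we derive under the tangent-plane conventions of this exercise rather than quote from the text:

```python
def proj_from_south(u):
    # chart on S^2 minus the south pole S = (0, 0, -1):
    # the line S + t(u - S) meets the plane z = +1 when t = 2 / (1 + u3)
    u1, u2, u3 = u
    t = 2.0 / (1.0 + u3)
    return (t * u1, t * u2)

def proj_from_north(u):
    # chart on S^2 minus the north pole N = (0, 0, +1):
    # the line N + t(u - N) meets the plane z = -1 when t = 2 / (1 - u3)
    u1, u2, u3 = u
    t = 2.0 / (1.0 - u3)
    return (t * u1, t * u2)

u = (0.6, 0.0, 0.8)                # a point on S^2 away from both poles
x = proj_from_south(u)
xp = proj_from_north(u)
r2 = x[0] ** 2 + x[1] ** 2
print(xp, (4.0 * x[0] / r2, 4.0 * x[1] / r2))   # both approximately (6.0, 0.0)
```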


5.2. Matrix Lie Groups

The classical groups, first discussed in Chapter 3, are level sets of certain polynomial maps 𝕂^m → 𝕂^n, except for the general linear group GL(n, 𝕂) = {A ∈ M(n, 𝕂) : det(A) ≠ 0}, often abbreviated to "GL." It is an open, dense subset of matrix space M(n, 𝕂) ≅ 𝕂^{n²}, and so can be regarded as a smooth manifold covered by a single chart (U_α, x_α) with U_α = GL and chart map x_α : GL(n, 𝕂) → 𝕂^{n²}, which we shall write as
x_α(A) = (a₁₁, …, a₁ₙ; a₂₁, …, a₂ₙ; …; aₙ₁, …, aₙₙ)
in what follows. Obviously dim_𝕂(GL) = n² since GL is an open set in the n²-dimensional matrix space. All other classical groups are closed subsets of lower dimension in M(n, 𝕂) ≅ 𝕂^{n²}.

Definition 5.22. A smooth manifold G is a Lie group if:
1. It is a group under some product operation P : G × G → G, indicated by writing P(x, y) = x·y, and under the inversion map J : G → G that sends x → x^{−1}.
2. The product operation P and inversion operation J are 𝒞^∞ maps of the underlying differentiable manifold.
The dimension d = dim_𝕂(G) is the common dimension of the coordinate charts that cover G and provide its differentiable structure. In particular, if (U_α, x_α), (U_β, y_β) are coordinate charts, the product operation becomes a 𝒞^∞ map P : 𝕂^d × 𝕂^d → 𝕂^d in these coordinates, where d = dim(G). Thus, if x ∈ U_α, y ∈ U_β, and (U_γ, x_γ) is a chart containing the product z = x·y, the composite map
x_γ ∘ P ∘ (x_α^{−1} × y_β^{−1}) : 𝕂^d × 𝕂^d → G × G → G → 𝕂^d
is a 𝒞^∞ map. Similarly, if z ∈ U_α and z^{−1} ∈ U_β, then
x_β ∘ J ∘ x_α^{−1} : 𝕂^d → 𝕂^d
is a 𝒞^∞ map.

The general theory of Lie groups has become a vast subject. In general, we will restrict attention to matrix Lie groups, which comprise GL(n, 𝕂) itself and its "closed" subgroups: the subsets G ⊆ GL(n, 𝕂) such that G is
1. a group under matrix multiplication, as in Definition 5.22;
2. a 𝒞^∞ manifold (smooth hypersurface) in matrix space, as in Definition 5.12;
3. a closed subgroup of GL(n, 𝕂), which means that if a sequence of matrices in G converges to a limit A in M(n, 𝕂), so that A_n → A as n → ∞, then the limit A is also in G. Notice that GL(n, 𝕂) itself is not a closed subset of matrix space, because the upper triangular matrices A_n = [1/n, 1; 0, 1] converge to B = [0, 1; 0, 1], which has det(B) = 0 and is not in GL.
But this point of view soon becomes cumbersome with its continual need to refer to the external matrix space in which G is embedded.
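A quick numerical sketch of item 3 (the helper `det2` is ours):

```python
def det2(M):
    # determinant of a 2x2 matrix given as nested lists
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# A_n = [1/n, 1; 0, 1] is invertible for every n, since det(A_n) = 1/n
for n in (1, 10, 1000):
    A = [[1.0 / n, 1.0], [0.0, 1.0]]
    assert det2(A) != 0.0          # each A_n lies in GL(2, R)

B = [[0.0, 1.0], [0.0, 1.0]]       # the entrywise limit of the A_n
print(det2(B))                     # 0.0 -- the limit has escaped GL(2, R)
```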


The general theory, in which G is a differentiable manifold not embedded in any external Euclidean space, becomes accessible if you have read Section 4.1 of Chapter 4. The following discussion is pretty much self-contained but does presume some familiarity with that introductory discussion of manifolds, tangent vectors, vector fields, etc.

Real vs Complex Matrix Groups. The bijection J : ℂ → ℝ² = ℝ ⊕ ℝ given by z = x + √−1 y ↦ J(z) = (x, y) ∈ ℝ² identifies ℂ ≅ ℝ² as vector spaces over ℝ, since J is an ℝ-linear bijection; however, J is not a ℂ-linear map. Similarly, if we view M(n, ℂ) as a vector space over ℝ, it becomes a direct sum M(n, ℂ) = M(n, ℝ) ⊕ √−1 M(n, ℝ) of vector subspaces over ℝ,
M(n, ℂ) ≅ M(n, ℝ) ⊕ M(n, ℝ) ≅ ℝ^{2n²},
via the ℝ-linear bijection that sends Z = {z_ij} = {x_ij + √−1 y_ij} → (X, Y) = ({x_ij}, {y_ij}). In particular, we have
dim_ℝ(ℂ) = 2·dim_ℂ(ℂ) = 2·1 = 2,
dim_ℝ M(n, ℂ) = 2·dim_ℝ M(n, ℝ) = 2n².
In these Notes, we will generally view M(n, ℂ) as a vector space over ℝ with dimension 2n² and regard all classical groups mentioned above as real hypersurfaces in ℝ^{n²} or ℝ^{2n²}, even if they are carved out of complex matrix space by real- or complex-valued constraint equations. The hypotheses of the IFT can be shown to hold for all the classical groups, and we shall regard these matrix groups as 𝒞^∞ manifolds equipped with the (real) coordinate charts the IFT provides.
For instance, in M(n, ℝ) ≅ ℝ^{n²}, once the conditions of the IFT are verified, the single real-valued constraint equation det(A) = 1 yields a 𝒞^∞ hypersurface G_ℝ = SL(n, ℝ) with dim_ℝ(G_ℝ) = n² − 1.
If we identify complex matrix space M(n, ℂ) ≅ ℂ^{n²}, the matrix group
G_ℂ = SL(n, ℂ) = {A ∈ M(n, ℂ) : det(A) = 1 + i0}
has complex dimension dim(G_ℂ) = n² − 1 because it is carved out of M(n, ℂ) ≅ ℂ^{n²} by a single complex constraint. By the complex-variable version of the IFT, G_ℂ can be described (at least locally) by imposing n² − 1 independent complex coordinates on the group. On the other hand, if we view M(n, ℂ) as a real vector space
M(n, ℝ) ⊕ M(n, ℝ) ≅ ℝ^{n²} ⊕ ℝ^{n²} ≅ ℝ^{2n²},
then as a subset of ℝ^{2n²} the group SL(n, ℂ) becomes a hypersurface of real dimension 2n² − 2 = 2·dim_ℂ G_ℂ, because the complex constraint identity
det(A) = Re(det(A)) + i Im(det(A)) = F₁(A) + iF₂(A) = 1 + i0
involves two independent real-valued constraints, F₁(A) = 1 and F₂(A) = 0, for matrices with complex entries.


Geometrically, SL(n, ℝ) can be viewed as a subset of SL(n, ℂ), since SL(n, ℝ) = SL(n, ℂ) ∩ (M(n, ℝ) + i0) is the set of complex matrices with det(A) = 1 whose entries happen to be real. We can also think of this intersection as the set of "real points" in the complex matrix group SL(n, ℂ) when we identify M(n, ℂ) = M(n, ℝ) + √−1 M(n, ℝ) as the complexification of the real vector space M(n, ℝ).

Example 5.23 (Unitary Groups U(n) and SU(n)). The unitary groups are another important family of matrix groups in M(n, ℂ):
U(n) = {A : AA* = I}

and

SU(𝑛) = U(𝑛) ∩ SL(𝑛, ℂ).

Discussion. In Chapter 6 of LA I we saw that a complex matrix A is unitary if and only if its rows form an orthonormal basis in ℂⁿ, with inner products
(R_i, R_j) = δ_ij  (Kronecker delta)  for 1 ≤ i ≤ j ≤ n.
This involves ½(n² + n) = ½(n² − n) + n independent complex constraint identities on row vectors in ℂⁿ.
If we regard M(n, ℂ) ≅ ℝ^{2n²}, the expressions (R_i, R_i) = ‖R_i‖² = 1 are automatically real-valued (and positive) and impose n real constraints, while the (n² − n)/2 complex off-diagonal constraints (R_j, R_k) = 0 + i0 for j < k each impose two real constraints. None of these can be dropped if we want to ensure AA* = I. Thus, when we identify M(n, ℂ) ≅ ℝ^{2n²}, U(n) is determined by n + 2·((n² − n)/2) = n² real constraints. If the conditions of the IFT can be verified, we would conclude that U(n) is a matrix Lie group with
dim_ℝ U(n) = dim_ℝ M(n, ℂ) − #(irredundant real constraints) = 2n² − n² = n².

Remark 5.24 (Dimension of Level Sets in IFT). If φ : ℝ^m → ℝ^n is a 𝒞^∞ map such that r = rk(dφ)_𝐱 is constant near each point in a level set M = L_{φ(𝐱)=𝐜} (𝐜 ∈ ℝ^n), the IFT provides a covering of M by 𝒞^∞-related coordinate charts that make it into a differentiable manifold. A careful reading of the implicit function theorem, Theorem 5.5, reveals the connection between the dimension of M and the rank r = rk(dφ):

(5.3)  dim_ℝ(M) = dim(ℝ^m) − #(irredundant constraints) = dim(ℝ^m) − rk(dφ) = m − r,

because this is also the dimension of the coordinate charts that cover M. We would have r = rk(dφ) < n if φ involved some redundant constraints. ○

As for SU(n), the additional condition det(A) = 1 + i0 might seem to impose two more real constraints. However, the real and imaginary parts of det(A) are not independent when A ∈ U(n), as they were for A ∈ SL(n, ℂ). The very definition of U(n) imposes a constraint on the values of det(A):
AA* = I ⇒ det(A)·det(A*) = |det(A)|² = 1 ⇒ |det(A)| = 1,


which must be taken into account. The value of det(A) = e^{iθ(A)} is determined by a single real-valued "angle variable" θ(A). Since |det(A)| = 1 holds automatically, requiring det(A) = 1 imposes just one real constraint on ℝ^{2n²}, so that
dim_ℝ SU(n) = dim_ℝ U(n) − 1 = n² − 1.
In particular, when n = 2 we have dim_ℝ M(2, ℂ) = 8, dim_ℝ U(2) = 4, and dim_ℝ SU(2) = 3.
The classical groups will mostly be viewed as smooth real hypersurfaces in ℝ^{n²} (or ℝ^{2n²} if they are defined as subsets of complex matrix space M(n, ℂ)). But a few actually are intrinsically complex manifolds that can be equipped with differentiably related complex coordinate charts. For instance, we have
dim_ℂ GL(n, ℂ) = n²  and  dim_ℂ SL(n, ℂ) = n² − 1,
even though we will mostly regard them as real manifolds with dim_ℝ = 2n² or 2n² − 2.

Example 5.25 (Complex Orthogonal Groups O(n, ℂ)). Another example of a complex matrix Lie group is O(n, ℂ) = {A ∈ M(n, ℂ) : AAᵀ = I}, on which det(A) can only take the values ±1 because
AAᵀ = I ⇒ 1 = det(I) = det(AAᵀ) = (det(A))² ⇒ det(A) = ±1.
Since det(A) = ±1 yet varies continuously with A, its value must be constant on some open neighborhood of each A ∈ O(n, ℂ). Therefore the sets of constancy O⁺ and O⁻ for det(A) are disjoint open sets in O(n, ℂ), and O⁺ = SO(n, ℂ) is an open subgroup in the full orthogonal group. This means SO(n, ℂ) ⊆ O(n, ℂ) are complex matrix groups having the same dimension over ℂ (which gets multiplied by 2 when they are viewed as real hypersurfaces in ℝ^{2n²}). In particular, det(A) ≡ +1 for all A close to the identity in O(n, ℂ), because det(I_{n×n}) = 1 and det(A) varies continuously with A.
The group O(n, ℂ) is the matrix realization of the automorphism group Aut(B) of the canonical nondegenerate symmetric bilinear form on ℂⁿ,
B(𝐳, 𝐰) = 𝐳ᵀ I_{n×n} 𝐰 = z₁w₁ + ⋯ + zₙwₙ  for 𝐳, 𝐰 ∈ ℂⁿ,
with respect to the standard basis in ℂⁿ (recall the discussion following Definition 3.17). Although B is bilinear and symmetric on ℂⁿ, it is not an inner product and should not be confused with the standard Euclidean inner product on ℂⁿ,
(𝐳, 𝐰) = ∑_{j=1}^{n} z_j w̄_j.
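The contrast between B and the Hermitian inner product is visible already on ℂ²; in this sketch (helper names ours), B is the unconjugated pairing:

```python
def B(z, w):
    # canonical symmetric bilinear form: no complex conjugation
    return sum(zj * wj for zj, wj in zip(z, w))

def herm(z, w):
    # standard Euclidean (Hermitian) inner product on C^n
    return sum(zj * wj.conjugate() for zj, wj in zip(z, w))

z = (1.0 + 0.0j, 1.0j)
print(B(z, z))     # 0: B(z, z) vanishes on this nonzero vector
print(herm(z, z))  # 2: while (z, z) = ||z||^2 = 2 > 0
```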

For instance, the values of B(𝐳, 𝐳) need not be real or ≥ 0.
Discussion. By definition of matrix product, we have AAᵀ = I ⇔ the rows (and columns) of A are B-orthonormal, with
B(R_r, R_s) = δ_rs  (Kronecker delta).
Thus ½(n² + n) irredundant complex constraints on M(n, ℂ) carve out O(n, ℂ), so dim_ℂ O(n, ℂ) = n² − ½(n² + n) = ½(n² − n). As indicated above, O(n, ℂ) is made up of two open,


disjoint pieces O⁺ ∪ O⁻, one being the special orthogonal group SO(n, ℂ). Both have the same (complex) dimension; when they are regarded as real Lie groups, their dimensions are dim_ℝ G = n² − n. It remains to verify the hypotheses of the IFT to show that these are in fact matrix Lie groups, but we omit these details. The calculations would be tedious, and we will soon cite a general theorem (Theorem 5.37 below) that circumvents such issues. ○

Examples of Lie Groups.
Example 5.26. The simplest example of a Lie group is G = (ℝⁿ, +), whose differentiable structure is given by the trivial chart (U_α, x_α) = (ℝⁿ, id). Clearly the (+) operation is a 𝒞^∞ map ℝⁿ × ℝⁿ → ℝⁿ, as is the inversion map J(x) = −x on ℝⁿ. This is a commutative (or abelian) Lie group, but strictly speaking it is not a matrix Lie group because its elements are not matrices. However, (ℝ², +) can be realized as a matrix group G ⊆ M(3, ℝ) via the bijective isomorphism
φ(𝐭) = [1, 0, t₁; 0, 1, t₂; 0, 0, 1]  for 𝐭 = (t₁, t₂),
such that φ(𝐬 + 𝐭) = φ(𝐬)·φ(𝐭), and this works in all dimensions n ≥ 2. Similarly, G = ℂⁿ with (+) as its group operation is a Lie group with dim_ℂ ℂⁿ = n and dim_ℝ ℂⁿ = 2n; it can be modeled as a matrix group in the same way as ℝⁿ.
The IFT shows that the unit circle S¹ = {z = x + iy ∈ ℂ : |z|² = x² + y² = 1} in the complex plane is a 𝒞^∞ manifold with dim_ℝ(S¹) = 1, because S¹ can be covered by two real 𝒞^∞-related charts (recall the stereographic projection discussed in Exercise 5.21). However, S¹ is also an abelian group under the multiplication operation in ℂ, and the operations P(z, w) = z·w, J(z) = 1/z are differentiable when described in local coordinates. There is a natural bijective correspondence between the circle S¹ equipped with complex multiplication and the group SO(2) of rotation matrices on the plane, via a diffeomorphism that intertwines the two group operations. This makes S¹ and SO(2) isomorphic as Lie groups. Further details about S¹ as a Lie group are provided by Exercises 2 - 3 in Section 5.2 of the Additional Exercises.

Example 5.27 (G = GL(n, 𝕂)). The general linear group GL = GL(n, 𝕂) is a noncommutative matrix Lie group with dim_𝕂(G) = n² for 𝕂 = ℝ or ℂ. We can assign chart coordinates to elements A ∈ GL, letting
x_α(A) = (A₁₁, …, A₁ₙ; …; Aₙ₁, …, Aₙₙ) ∈ 𝕂^{n²},
which is the restriction to GL of a bijective 𝕂-linear map M(n, 𝕂) → 𝕂^{n²}. The product operation in GL is given by matrix multiplication,

(5.4)  (AB)_ij = ∑_{k=1}^{n} A_ik B_kj,


and inversion is effected by Cramer's rule (see LA I, Theorem 4.48),
A^{−1} = (1/det(A))·Cof(A)ᵀ.
This rational map involves the transpose of the cofactor matrix Cof(A), whose entries are polynomials in the entries of A; det(A) is also a polynomial in the entries of A and is nonvanishing on GL.
Other matrix Lie groups are level sets for various 𝒞^∞ maps φ : M(n, 𝕂) → 𝕂^k with k ≤ n² = dim_𝕂 M(n, 𝕂). It follows from the IFT that they are all smooth manifolds, but to verify that they are also Lie groups, we must also prove that the multiplication and inversion operations are differentiable maps between manifolds. The next observation takes care of that.

Theorem 5.28. Suppose M is the level set through A ∈ M(n, ℝ) for some 𝒞^∞ map φ : M(n, ℝ) → ℝ^k such that rk(dφ)_A is constant on some open neighborhood of each point in M. If M is also a subgroup of GL(n, ℝ) under matrix multiplication, then M is automatically a Lie group in the standard 𝒞^∞ structure imposed via the IFT as a smooth submanifold of matrix space (identifying M(n, ℝ) ≅ ℝ^{n²}).

Proof. If a, b ∈ M, let (U_α, x_α), (U_β, x_β) be standard charts about a, b, and let (U_γ, x_γ) be a chart about c = a·b. By the IFT, x_γ is the restriction to U_γ of a linear projection π_{J′(γ)} from 𝕂^{n²} onto 𝕂^d ≅ 𝕂^{J′(γ)}. The matrix product sending M(n, 𝕂) × M(n, 𝕂) → M(n, 𝕂) is obviously 𝒞^∞ on matrix space (in fact it is a polynomial map). Then the group operation P(x, y) = x·y is a 𝒞^∞ map with respect to the manifold structure on M, because it becomes a 𝒞^∞ map from ℝ^d × ℝ^d → ℝ^d when written in local coordinates:
x_γ ∘ P ∘ (x_α^{−1} × x_β^{−1}) = (π_{J′(γ)}|U_γ) ∘ P ∘ (x_α^{−1} × x_β^{−1}),
where d = dim_𝕂(G) = n² − r and r = rk(dφ). Breaking this composite into steps, we have
𝕂^d × 𝕂^d --(x_α^{−1} × x_β^{−1})--> U_α × U_β --P--> U_γ --(x_γ = π_{J′(γ)})--> 𝕂^d.
The matrix product operation P : 𝕂^{n²} × 𝕂^{n²} → 𝕂^{n²} is defined and 𝒞^∞ on all of M(n, 𝕂), and P(U_α × U_β) is contained in G because G is a group. As in the IFT, the map x_γ : U_γ → 𝕂^d is the restriction to U_γ ⊆ G of a globally defined linear projection map π_{J′(γ)} from 𝕂^{n²} → 𝕂^{J′(γ)} ≅ 𝕂^d, so it is the restriction to U_γ of a globally 𝒞^∞ map. Clearly then, the description of P in local coordinates
x_γ ∘ P ∘ (x_α^{−1} × x_β^{−1}) = π_{J′(γ)} ∘ P ∘ (x_α^{−1} × x_β^{−1}) : 𝕂^d × 𝕂^d → 𝕂^d
is 𝒞^∞ too. Similarly, the inversion map J : G → G is 𝒞^∞ when expressed in standard chart coordinates. □

Below is the first example in which we need the implicit function theorem.


Example 5.29 (Special Linear Groups SL(n, ℂ) and SL(n, ℝ)). By appealing to the IFT, we will show that G = SL(n, ℝ) is a matrix Lie group with dim_ℝ(G) = n² − 1. Similar arguments would show that SL(n, ℂ) is a smooth complex hypersurface in M(n, ℂ) ≅ ℂ^{n²} with
dim_ℂ SL(n, ℂ) = n² − 1  and  dim_ℝ SL(n, ℂ) = 2(n² − 1).
To keep things simple, we will focus on the case 𝕂 = ℝ, which can be handled by the familiar methods of real analysis.
Discussion. SL(n, ℝ) is the level set L_{φ(A)=1} where φ(A) = det(A) on ℝ^{n²} ≅ M(n, ℝ). By the IFT, this is a smooth manifold in ℝ^{n²} if the 1 × n² Jacobian matrix has rk(dφ)_A = 1 near every A ∈ SL(n, ℝ). This can be proved in all dimensions. In fact, if we list the entries of A as (a₁₁, …, a₁ₙ; …; aₙ₁, …, aₙₙ) in ℝ^{n²}, then by definition of determinant,
det(A) = ∑_{σ∈S_n} sgn(σ)·a_{1,σ(1)} ⋯ a_{n,σ(n)}
is a polynomial in the n² entries a_ij. By the product formula for derivatives we get
∂φ/∂a_{k,ℓ} = ∑_{σ∈S_n} sgn(σ)·[ ∑_{j=1}^{n} a_{1,σ(1)} ⋯ (∂a_{j,σ(j)}/∂a_{k,ℓ}) ⋯ a_{n,σ(n)} ]
            = ∑_{σ: σ(k)=ℓ} sgn(σ)·a_{1,σ(1)} ⋯ â_{k,ℓ} ⋯ a_{n,σ(n)},
because
∂a_{j,σ(j)}/∂a_{k,ℓ} = δ_{jk}·δ_{σ(j),ℓ}  (product of Kronecker deltas).
(Here we use the standard math notation b₁ ⋯ b̂ᵢ ⋯ bₙ = ∏_{j≠i} b_j to indicate a deleted factor in a product.) Indeed, ∂a_{j,σ(j)}/∂a_{k,ℓ} = 0 unless (j, σ(j)) = (k, ℓ), which happens if and only if j = k and σ(k) = ℓ, in which case it is equal to 1. Therefore, we have
a_{kℓ}·(∂φ/∂a_{k,ℓ}) = ∑_{σ: σ(k)=ℓ} sgn(σ)·a_{1,σ(1)} ⋯ a_{k,ℓ} ⋯ a_{n,σ(n)}.
But the 1 × n² matrix [∂φ/∂a_{kℓ}] has rank = 1 (maximal rank) unless all entries are zero, in which case we would have
det(A) = ∑_{σ(1)=1}(⋯) + ∑_{σ(1)=2}(⋯) + ⋯ + ∑_{σ(1)=n}(⋯) = ∑_{σ∈S_n} sgn(σ)·a_{1,σ(1)} ⋯ a_{n,σ(n)} = 0.
That contradicts the hypothesis det(A) = 1 and cannot occur. Thus (dφ)_A has maximal rank (= 1) at each point of SL(n, ℝ), so dim_ℝ SL(n, ℝ) = n² − 1 as claimed. ○
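The two computational facts used above, that ∂φ/∂a_{kℓ} is the (k, ℓ) cofactor of A and that multiplying by a_{kℓ} and summing over a row reproduces det(A), can be spot-checked numerically for n = 3. This sketch (helper names ours) uses centered differences, which are exact here up to rounding because det is linear in each single entry:

```python
def det3(A):
    # determinant of a 3x3 matrix by cofactor expansion along the first row
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def dphi(A, k, l, h=1e-6):
    # finite-difference partial derivative of phi = det at entry (k, l)
    P = [row[:] for row in A]
    M = [row[:] for row in A]
    P[k][l] += h
    M[k][l] -= h
    return (det3(P) - det3(M)) / (2.0 * h)

A = [[2.0, 1.0, 0.0], [0.0, 0.5, 0.0], [1.0, 0.0, 1.0]]   # det(A) = 1, so A is in SL(3, R)
grad = [dphi(A, k, l) for k in range(3) for l in range(3)]
row0 = sum(A[0][l] * dphi(A, 0, l) for l in range(3))
print(any(abs(g) > 1e-8 for g in grad))   # True: (d phi)_A is nonzero, so it has rank 1
print(round(row0, 6))                     # 1.0: cofactor expansion of det(A) along row 0
```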


Example 5.30 (Real Orthogonal Groups O(n) and SO(n)). The (real) orthogonal groups are defined as hypersurfaces in M(n, ℝ), and both have the same dimension dim_ℝ G = ½(n² − n) for n ≥ 2. In particular, dim_ℝ SO(3) = 3.

Discussion. The group O(n) is a closed set (hence a closed subgroup) in M(n, ℝ): if a sequence {A_k} in O(n) converges to some limit A in matrix space, so A_k → A as k → ∞, then A_kᵀ → Aᵀ (obvious) and
I = A_k A_kᵀ → AAᵀ  as k → ∞
by joint continuity of the matrix product operation (see LA I, Exercise 5.36). Thus A is in O(n), and it is in SO(n) if the A_k ∈ SO(n), since det(A_k) → det(A).
On O(n) we have 1 = det(I) = det(AAᵀ) = (det(A))², so det(A) can only achieve the values ±1, and −1 is actually achieved, for instance at J = diag(1, …, 1, −1). Thus O(n) consists of two disjoint pieces: one, the subgroup SO(n), where det(A) = +1 by definition, and the other a multiplicative coset J·SO(n), on which det(A) ≡ −1. Both are open subsets of O(n), so they have the same dimension as real manifolds.
These two sets have geometric significance. Elements A ∈ SO(n) yield the orientation-preserving orthogonal linear maps L_A : ℝⁿ → ℝⁿ, while matrices in J·SO(n) correspond to orientation-reversing linear operators on ℝⁿ. The latter do not form a subgroup of O(n), because the product of two orientation-reversing maps is orientation-preserving.
We show O(n) is a 𝒞^∞ manifold by directly verifying the hypotheses of the IFT. Recall that A ∈ O(n) ⇔ AAᵀ = I ⇔ the rows R_i are an orthonormal basis in ℝⁿ, so their inner products in ℝⁿ are
(R_i, R_j) = δ_ij  (Kronecker delta for 1 ≤ i, j ≤ n).
Eliminating redundant entries in this list of n² identities by requiring i ≤ j, we define the map
φ : ℝ^{n²} ≅ M(n, ℝ) → ℝ^{(n²+n)/2} = ℝ^d
whose entries are the inner products (R_i, R_j) of the rows of A with i ≤ j,
φ(A) = ((R₁,R₁), …, (R₁,Rₙ); (R₂,R₂), …, (R₂,Rₙ); ⋯; (Rₙ,Rₙ)).
Then O(n) is the level set L_{φ(A)=𝐜}, in which 𝐜 = (1, 0, …, 0; …; 1, 0, 0; 1, 0; 1), and we must show rk(dφ)_A = d = ½(n² + n) for all A ∈ O(n). The idea is clearly illustrated by the case n = 3, where φ(A) = (φ₁, …, φ₆) with
φ(A) = ( ∑_{i=1}^{3} a₁ᵢ², ∑ᵢ a₁ᵢa₂ᵢ, ∑ᵢ a₁ᵢa₃ᵢ; ∑ᵢ a₂ᵢ², ∑ᵢ a₂ᵢa₃ᵢ; ∑ᵢ a₃ᵢ² ).


Dropping duplicate constraints, the 6 × 9 Jacobian matrix (dφ)_A is

[∂φᵢ/∂aⱼₖ] =
  [ 2a₁₁  2a₁₂  2a₁₃    0     0     0     0     0     0  ]
  [  a₂₁   a₂₂   a₂₃   a₁₁   a₁₂   a₁₃    0     0     0  ]
  [  a₃₁   a₃₂   a₃₃    0     0     0    a₁₁   a₁₂   a₁₃ ]
  [   0     0     0   2a₂₁  2a₂₂  2a₂₃    0     0     0  ]
  [   0     0     0    a₃₁   a₃₂   a₃₃   a₂₁   a₂₂   a₂₃ ]
  [   0     0     0     0     0     0   2a₃₁  2a₃₂  2a₃₃ ].

Recall that the row and column rank of any matrix are equal. Symbolic row/column operations show that the row rank rk(dφ)_A = 6 = ½(n² + n), so dim_ℝ O(3) = dim_ℝ SO(3) = 9 − 6 = 3 (see the next exercise for hints on this calculation). ○

Exercise 5.31. For the real orthogonal group SO(3) in M(3, ℝ):
(a) In terms of the matrix entries a_ij for A ∈ SO(3), write out the constraints φ₁(A) = c₁, …, φ₆(A) = c₆ that determine SO(3) as a hypersurface in M(3, ℝ), and verify that the entries in the preceding Jacobian matrix are correct.
(b) Verify that the "constant rank" condition rk(dφ)_A = 6 of the IFT is satisfied by all A close to the identity I in SO(3).
Hint. Row rank is the number of linearly independent rows or columns. After scaling the first row of (dφ)_A by 1/2, the original matrix A appears as a 3 × 3 submatrix in the upper left corner with zeros below it. The original rows and columns of A are linearly independent, so we can transform the submatrix A to I₃ₓ₃ by column operations. What happens when these column operations are applied to the full Jacobian matrix? (Additional steps, using both row and column operations, will be needed to finish the calculation.)

Remark 5.32. In the next section, we will show that if a matrix Lie group G ⊆ M(m, ℝ) is determined by constraints φ(A) = 𝐜, then the rank rk(dφ)_A is constant near every A ∈ G if it is constant for all matrices A ∈ G close to the identity element I_{m×m}. This is entirely due to the fact that G is a group of matrices, which forces the behavior of the rank of (dφ)_A near any A ∈ G to be the same as that near the identity element. Constancy of rk(dφ)_A near the identity is a "local" condition on G, much easier to verify than constancy throughout G. ○

Exercise 5.33. In Example 5.30 we described the orthogonal group G = SO(3) as the level set M = L_{φ(𝐱)=𝐜} for a 𝒞^∞ map φ : M(3, ℝ) ≅ ℝ⁹ → ℝ⁶ and computed the 6 × 9 Jacobian matrix for arbitrary A ∈ G.
(a) Write out the Jacobian matrix (dφ)_A at the identity element A = I₃ₓ₃ in G and show that rk(dφ)_I = 6. (Notice how the rank calculations simplify in this special case.)
(b) Explain why we must then have rk(dφ)_A = 6 for all A in SO(3) that lie close to I. Hint. What is the maximum possible rank for dφ in this example?
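Exercise 5.33(a) can be previewed numerically. The sketch below (our own code, not the book's) assembles the 6 × 9 Jacobian of φ for any A and computes its row rank by Gaussian elimination; at A = I₃ₓ₃ the rank comes out to 6, the maximal possible value:

```python
def jacobian(A):
    # 6x9 Jacobian of phi(A) = ((R1,R1), (R1,R2), (R1,R3), (R2,R2), (R2,R3), (R3,R3))
    # with respect to the entries a_{r i}, listed row by row
    pairs = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
    J = []
    for r, s in pairs:
        row = [0.0] * 9
        for i in range(3):
            row[3 * r + i] += A[s][i]   # d/da_{r i} of sum_i a_{r i} a_{s i}
            row[3 * s + i] += A[r][i]   # (doubles the entry when r == s)
        J.append(row)
    return J

def rank(M, tol=1e-9):
    # row rank via Gaussian elimination with partial pivoting
    M = [row[:] for row in M]
    m, n, r = len(M), len(M[0]), 0
    for c in range(n):
        piv = max(range(r, m), key=lambda i: abs(M[i][c]))
        if abs(M[piv][c]) < tol:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, m):
            f = M[i][c] / M[r][c]
            for j in range(c, n):
                M[i][j] -= f * M[r][j]
        r += 1
        if r == m:
            break
    return r

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(rank(jacobian(I3)))   # 6, so rk(d phi)_I = 6 and dim SO(3) = 9 - 6 = 3
```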


Exercise 5.34. Verify that the special orthogonal group SO(2) in M(2, ℝ) ≅ ℝ⁴ is a commutative matrix Lie group with dim_ℝ SO(2) = 1.

The following questions outline the details needed to fully verify that the unitary groups
U(2) = {A ∈ M(2, ℂ) : AA* = I}  and  SU(2) = {A ∈ M(2, ℂ) : AA* = I and det(A) = 1}
satisfy the conditions of the IFT, so they are matrix Lie groups in ℝ⁸ with dim_ℝ U(2) = 4 and dim_ℝ SU(2) = 3 when we identify M(2, ℂ) ≅ ℝ⁸. To do so, we shall identify complex matrices
A = ( z₁ z₂ ; z₃ z₄ )  with entries z_k = x_k + √−1 y_k  (x_k, y_k ∈ ℝ)

with real 8-tuples 𝐱 = (x₁, …, x₄; y₁, …, y₄), regarding these groups as subsets of ℝ⁸. Our comments will focus on verifying the hypotheses of the IFT for SU(2). Analysis of U(2) is more complicated, but that case can be handled directly once we know SU(2) is a matrix Lie group, by noting that U(2) is a product S¹·SU(2) of familiar subgroups. Here S¹ is the unit-circle group realized as the set of complex matrices S¹ = {e^{iθ} I₂ₓ₂ : θ ∈ ℝ} in M(2, ℂ).

Exercise 5.35. Let
A = ( a b ; c d )  (a, b, c, d ∈ ℂ)

be a matrix in M(2, ℂ).
(a) Write out the complex constraint identities on a, b, c, d that hold when AA* = I₂ₓ₂ and det(A) = 1. Then rewrite this system to eliminate the variables c and d, and prove that A ∈ SU(2) ⇔ A has the form
A = ( a b ; −b̄ ā )  with  |a|² + |b|² = 1
for a, b ∈ ℂ.
(b) Recast these as identities in the real variables x₁, …, x₄; y₁, …, y₄ to describe SU(2) as the level set L_{φ(𝐱)=𝐜} for a suitably chosen map φ from ℝ⁸ → ℝ⁵. What is the appropriate value 𝐜 = (c₁, …, c₅)?
(c) Once SU(2) has been described as a level set as in (b), compute the 5 × 8 Jacobian matrix and prove that rk(dφ)_𝐩 = 5 for all 𝐩 ∈ SU(2). (Then the rank is 5 near every 𝐩 in SU(2) ⊆ M(2, ℂ) because this is a maximal-rank case.)
Hint. We have AA* = I ⇔ the rows of A are an orthonormal basis in ℂ²; eliminating redundant (complex) identities leaves us with three identities (R_i, R_j) = δ_ij (Kronecker delta) plus the constraint det(A) = ad − bc = 1. Recall that |det(A)| = 1 for all unitary matrices, so det(A) = e^{iθ} for some θ ∈ ℝ.
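The normal form in part (a) can be spot-checked with Python's complex arithmetic; the particular values of a and b below are our own choice, subject only to |a|² + |b|² = 1:

```python
import cmath

a = 0.6 * cmath.exp(0.7j)          # |a| = 0.6
b = 0.8 * cmath.exp(-0.3j)         # |b| = 0.8, so |a|^2 + |b|^2 = 1
A = [[a, b], [-b.conjugate(), a.conjugate()]]

# check A A* = I: the rows of A are orthonormal in C^2
dot = lambda R, S: R[0] * S[0].conjugate() + R[1] * S[1].conjugate()
print(abs(dot(A[0], A[0]) - 1))    # ~0: first row has unit length
print(abs(dot(A[0], A[1])))        # ~0: rows are orthogonal
# check det(A) = a * conj(a) + b * conj(b) = |a|^2 + |b|^2 = 1
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(abs(det - 1))                # ~0: determinant equals 1
```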

5.2. MATRIX LIE GROUPS

231

Continuing the discussion of Exercise 5.35, we pose a different question: Which choices of 𝑑 coordinates 𝑥𝑖1, … , 𝑥𝑖𝑑 from 𝑥1, … , 𝑥8 give valid parametrizations of SU(2) near the identity element? As in the IFT, we must find a partition of coordinate indices [1, 8] = 𝐽 ∪ 𝐽′ such that the restriction (𝜋𝐽′ |𝑀) ∶ SU(2) → ℝ𝐽′ is an admissible coordinate chart near 𝐼. (Not all choices work.)

Exercise 5.36. Consider the identity element 𝐼2×2 in 𝐺 = SU(2).

(a) Produce a partition [1, 8] = 𝐽′ ∪ 𝐽 such that 𝜋𝐽′ ∶ ℝ8 → ℝ𝐽′ restricts to SU(2) to give a standard chart defined near 𝐼2×2. It will follow that dimℝ SU(2) = |𝐽′| = 3.

(b) Determine all partitions [1, 8] = 𝐽′ ∪ 𝐽 that produce admissible chart maps 𝜋𝐽′ ∶ 𝑈𝛼 → ℝ3 near the identity in SU(2).

(c) Produce an explicit partition [1, 8] = 𝐽′ ∪ 𝐽 with |𝐽′| = 3 that does not yield valid coordinates describing SU(2) near the identity 𝐼.

In practice, the need for such intricate calculations to prove that a group 𝐺 in M(𝑛, 𝕂) is a differentiable manifold (and hence a Lie group) can often be circumvented by appeal to a theorem due to von Neumann and developed further by Élie Cartan.

Theorem 5.37 (J. von Neumann/É. Cartan). Every closed subgroup 𝐺 in M(𝑛, ℝ) is a matrix Lie group: it can be covered with 𝒞∞-related charts (𝑥𝛼, 𝑈𝛼) that make it into a Lie group over ℝ.

This result does not identify the dimension 𝑑 = dim(𝐺) of the charts that cover 𝐺, which must be determined separately. On the other hand, it does not require that 𝐺 be a level set for some map 𝜙 ∶ ℝ𝑚 → ℝ𝑛, and even if it is, there is no need to investigate rk(𝑑𝜙)𝐴 on 𝐺. The subgroup 𝐺 could even be a closed discrete subgroup (a zero-dimensional Lie group consisting of isolated points in matrix space) such as 𝐺 = ℤ ⊆ ℝ or

SL(𝑛, ℤ) = {𝐴 ∈ M(𝑛, ℝ) ∶ all 𝑎𝑖𝑗 ∈ ℤ and det(𝐴) = 1},

for which differentiability considerations are irrelevant. As an example, consider 𝐺 = SO(𝑛, ℂ).
The constant rank conditions of the IFT could be verified by brute-force calculations, but it is easily seen that this matrix group is a closed subset in matrix space. Indeed, if 𝐴𝑘 → 𝐴 as 𝑘 → ∞ in matrix space and the 𝐴𝑘 are in SO(𝑛, ℂ), the limit 𝐴 must be in SO(𝑛, ℂ) because 𝐼 = 𝐴𝑘(𝐴𝑘)T → 𝐴𝐴T and det(𝐴) = lim𝑘→∞ det(𝐴𝑘) = 1. The same is true for the groups U(𝑛) and SU(𝑛): 𝐴𝑘 → 𝐴 in M(𝑛, ℂ) ⇒ (𝐴𝑘)∗ → 𝐴∗, so 𝐴𝑘 ∈ U(𝑛) ⇒ 𝐴𝐴∗ = lim𝑘→∞ 𝐴𝑘(𝐴𝑘)∗ = 𝐼 and 𝐴 ∈ U(𝑛); and if in addition det(𝐴𝑘) = 1, then 𝐴 ∈ SU(𝑛) because det(𝐴) = lim𝑘→∞ det(𝐴𝑘) = 1.

Translations and Automorphisms on a Lie Group. Many algebraic symmetry operations on matrix Lie groups are actually 𝒞∞ maps on the underlying manifolds. If 𝑥 ∈ 𝐺, we can define left and right translation operators 𝜆𝑥, 𝜌𝑥 ∶ 𝐺 → 𝐺, letting

𝜆𝑥(𝑔) = 𝑥 ⋅ 𝑔   and   𝜌𝑥(𝑔) = 𝑔 ⋅ 𝑥.

232

5. MATRIX LIE GROUPS

These are continuous and invertible maps with the properties

𝜆𝑒 = id𝐺 ,   𝜆𝑥⋅𝑦 = 𝜆𝑥 ∘ 𝜆𝑦 ,   𝜆𝑥−1 = (𝜆𝑥)−1,

and likewise for right translations, except that 𝜌𝑥⋅𝑦 = 𝜌𝑦 ∘ 𝜌𝑥 (note the reversal of factors).
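These composition laws are easy to test numerically; a quick sketch (the random invertible matrices are our own choices, kept near the identity so invertibility is safe):

```python
import numpy as np

# Check lambda_{x.y} = lambda_x o lambda_y and rho_{x.y} = rho_y o rho_x
# on random matrices near the identity (hence invertible).
rng = np.random.default_rng(0)
x, y, g = (np.eye(3) + 0.1 * rng.normal(size=(3, 3)) for _ in range(3))

lam = lambda a: (lambda h: a @ h)    # left translation  lambda_a(h) = a.h
rho = lambda a: (lambda h: h @ a)    # right translation rho_a(h)   = h.a

left_ok = np.allclose(lam(x @ y)(g), lam(x)(lam(y)(g)))
right_ok = np.allclose(rho(x @ y)(g), rho(y)(rho(x)(g)))   # reversed factors
assert left_ok and right_ok
```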

These maps are diffeomorphisms of 𝐺: they are bijections, and when 𝜆𝑥 or 𝜌𝑥 are described in terms of chart coordinates, they (and their inverses) become 𝒞∞ maps from ℝ𝑑 → ℝ𝑑, where 𝑑 = dimℝ(𝐺). In fact, 𝜆𝑥 is obtained by holding fixed the first variable in the product map 𝑃(𝑥, 𝑦) = 𝑥𝑦 while the other is allowed to vary. Since 𝑃 ∶ 𝐺 × 𝐺 → 𝐺 is, by definition, jointly differentiable in all its variables, fixing one variable or the other yields 𝒞∞ maps 𝜆𝑥 and 𝜌𝑥 on 𝐺. There is a related action 𝐺 × 𝒞∞(𝐺) → 𝒞∞(𝐺) of 𝐺 on globally smooth functions 𝑓 on 𝐺:

𝐿𝑔𝑓(𝑥) = 𝑓(𝑔−1𝑥) = 𝑓 ∘ 𝜆𝑔−1(𝑥)   (𝑔, 𝑥 ∈ 𝐺).

The inverse 𝑔−1 appearing here makes this a left action on 𝒞∞(𝐺) in the sense that

𝐿𝑒 = 𝐼 ,   𝐿𝑔1𝑔2 = 𝐿𝑔1 ∘ 𝐿𝑔2 ,   𝐿𝑔−1 = (𝐿𝑔)−1.

(The action 𝑓(𝑥) ↦ 𝐿′𝑔𝑓(𝑥) = 𝑓(𝑔 ⋅ 𝑥) = 𝑓 ∘ 𝜆𝑔(𝑥) is a right action, with 𝐿′𝑔1𝑔2 = 𝐿′𝑔2 ∘ 𝐿′𝑔1, in which the order of the factors is reversed.)

Definition 5.38 (The Automorphism Group Aut(𝐺)). An automorphism of a group 𝐺 is a bijection 𝛼 ∶ 𝐺 → 𝐺 that preserves all group operations, so

𝛼(𝑒) = 𝑒 ,   𝛼(𝑥 ⋅ 𝑦) = 𝛼(𝑥) ⋅ 𝛼(𝑦) ,   𝛼(𝑥−1) = (𝛼(𝑥))−1

for 𝑥, 𝑦 ∈ 𝐺. The automorphisms Aut(𝐺) of a matrix Lie group 𝐺 are the bijections that preserve both the algebraic and the differentiable manifold structure of 𝐺. They are the 𝒞∞ diffeomorphisms of 𝐺 that are also group automorphisms.

Exercise 5.39. Show that the inverse 𝛼−1 of an automorphism is also an automorphism and that the set of automorphisms Aut(𝐺) is itself a group under composition of operators, (𝛼1 ∘ 𝛼2)(𝑥) = 𝛼1(𝛼2(𝑥)) for all 𝑥 ∈ 𝐺.

Example 5.40. The group 𝐺 acts on itself by "conjugation," yielding the inner automorphisms 𝑖𝑔 ∶ 𝑥 ↦ 𝑔𝑥𝑔−1 of 𝐺.2 These are diffeomorphisms from 𝐺 → 𝐺 because 𝑖𝑔 = 𝜆𝑔 ∘ 𝜌𝑔−1 is a composite of invertible 𝒞∞ translations. The inverse of 𝑖𝑔 is the inner automorphism (𝑖𝑔)−1 = 𝑖𝑔−1, and the correspondence Φ ∶ 𝑔 ↦ 𝑖𝑔 is a homomorphism from 𝐺 into the full group Aut(𝐺) of automorphisms. The set Inn(𝐺) of inner automorphisms is a subgroup of Aut(𝐺). Notice that 𝜆𝑥 and 𝜌𝑦 commute for all 𝑥, 𝑦 ∈ 𝐺 since they act on opposite sides of each group element.

2There may be other automorphisms of 𝐺, called outer automorphisms, that do not arise as conjugations by elements of 𝐺. For instance, the identity map id is the only inner automorphism on a commutative group such as (ℝ𝑛, +), but this group has plenty of outer automorphisms: every invertible linear operator is an automorphism of ℝ𝑛.
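A numerical sketch of Example 5.40 (the random matrices are our own choices): conjugation is a homomorphism that preserves inverses, and left and right translations commute because matrix multiplication is associative.

```python
import numpy as np

# Illustration of Example 5.40 on random invertible matrices.
rng = np.random.default_rng(0)
g, x, y = (np.eye(3) + 0.1 * rng.normal(size=(3, 3)) for _ in range(3))
ginv = np.linalg.inv(g)

i_g = lambda h: g @ h @ ginv                       # inner automorphism i_g
assert np.allclose(i_g(x @ y), i_g(x) @ i_g(y))    # homomorphism property
assert np.allclose(np.linalg.inv(i_g(x)),
                   i_g(np.linalg.inv(x)))          # preserves inverses
assert np.allclose(x @ (g @ y), (x @ g) @ y)       # lambda_x, rho_y commute
```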


Exercise 5.41. Verify that conjugations 𝑖𝑔(𝑥) = 𝑔𝑥𝑔−1 have the properties that make Φ(𝑔) = 𝑖𝑔 a group homomorphism from 𝐺 → Aut(𝐺):

(a) 𝑖𝑒 = id𝐺
(b) 𝑖𝑔1𝑔2 = 𝑖𝑔1 ∘ 𝑖𝑔2
(c) (𝑖𝑔)−1 = 𝑖𝑔−1

for all 𝑔, 𝑔1, 𝑔2 in 𝐺.

The 𝒞∞ structure on a matrix Lie group is often obtained by identifying 𝐺 as a subset in M(𝑛, ℝ) ≅ ℝ𝑛² and then covering 𝐺 with standard charts (𝑥𝛼, 𝑈𝛼) constructed via the IFT. The chart maps 𝑥𝛼 are just restrictions to 𝐺 of various projection maps 𝜋𝐽′ ∶ ℝ𝑛² → ℝ𝐽′ that correspond to splittings ℝ𝑛² = ℝ𝐽 ⊕ ℝ𝐽′ with |𝐽′| = 𝑑 = dim(𝐺). Because of the way standard charts are defined, the next result makes it easy to check whether an arbitrary map 𝜓 ∶ ℝ𝑘 → 𝐺 into a matrix Lie group is a 𝒞∞ map with respect to the standard manifold structure on 𝐺.

Proposition 5.42. Suppose 𝑀 = 𝐿𝜙(𝐱)=𝐜 is a smooth 𝑑-dimensional manifold in M(𝑚, ℝ) ≅ ℝ𝑚², a level set for some 𝒞∞ map 𝜙 ∶ ℝ𝑚² → ℝ𝑛. If 𝜓 ∶ ℝ𝑘 → M(𝑚, ℝ) is a 𝒞∞ map whose range happens to lie within 𝑀, then 𝜓 ∶ ℝ𝑘 → 𝑀 is automatically a 𝒞∞ map with respect to the standard 𝒞∞ manifold structures on 𝑀 and ℝ𝑘.

Proof. If 𝐱0 ∈ ℝ𝑘, 𝑝0 = 𝜓(𝐱0), and (𝑈𝛼, 𝑥𝛼) is a standard chart on 𝑀 about 𝑝0, then 𝑥𝛼 ∘ 𝜓 is a composite 𝜋𝐽′ ∘ 𝜓 of a 𝒞∞ map 𝜓 ∶ ℝ𝑘 → 𝑀 ⊆ ℝ𝑚² and a linear map 𝜋𝐽′ ∶ ℝ𝑚² → ℝ𝐽′ ≅ ℝ𝑑 with 𝑑 = dim(𝑀), so 𝜓 is a 𝒞∞ map between Euclidean spaces when described in local coordinates on 𝑀. □

Tangent Spaces TM𝑝 of a Manifold. The tangent space TM𝑝 to a smooth hypersurface 𝑀 ⊆ ℝ𝑚 at a base point 𝑝 is often thought of as the set of (vector-valued) derivatives 𝛾′(0) of smooth parametrized curves 𝛾(𝑡) that pass through 𝑝 when 𝑡 = 0 and remain within 𝑀 at all times. If 𝛾(𝑡) = ∑𝑗=1𝑚 𝑥𝑗(𝑡) 𝐞𝑗 has differentiable coefficients, the classical derivative 𝑑𝛾/𝑑𝑡 is given by

𝛾′(0) = (𝑑𝛾/𝑑𝑡)(0) = lim∆𝑡→0 Δ𝛾/Δ𝑡 = lim∆𝑡→0 [𝛾(Δ𝑡) − 𝛾(0)]/Δ𝑡 = (𝑑𝑥1/𝑑𝑡)(0) 𝐞1 + ⋯ + (𝑑𝑥𝑚/𝑑𝑡)(0) 𝐞𝑚 ,

where the 𝐞𝑗 are the standard basis vectors in ℝ𝑚. The resulting tangent vectors 𝛾′(0) at 𝑝 form a vector space, the tangent space TM𝑝 ⊆ ℝ𝑚, whose elements are generally denoted by capital letters 𝑋𝑝, 𝑌𝑝, etc. Here, 𝑑 = dim(𝑀) ≤ 𝑚 = dim(ℝ𝑚). However, this classical interpretation of tangent vectors for hypersurfaces embedded in Euclidean space has proved inadequate for modern differential geometry and is antithetical to its spirit, which focuses on things intrinsic to the manifold and not hypothetical spaces in which they might (or might not) be embedded. Section 4.1 of these Notes provides a fast introduction to this

234

5. MATRIX LIE GROUPS

point of view, which regards tangent vectors at 𝑝 as operators 𝑋𝑝 ∶ 𝒞∞(𝑝) → ℝ that act on the local algebra 𝒞∞(𝑝), the space of smooth scalar-valued functions defined near 𝑝. Instead of regarding tangent vectors as derivatives 𝛾′(0) along differentiable curves passing through 𝑝 when 𝑡 = 0, we are instead using 𝛾′(0) to define a directional derivative 𝑋𝑝, a linear operator whose action on functions in 𝒞∞(𝑝) is defined to be

⟨𝑋𝑝, 𝑓⟩ = ⟨𝛾′(0), 𝑓⟩ = (𝑑/𝑑𝑡){𝑓(𝛾(𝑡))}|𝑡=0   for 𝑓 ∈ 𝒞∞(𝑝).

Thus, ⟨𝑋𝑝, 𝑓⟩ is the instantaneous rate of change in the values of 𝑓, seen as 𝛾(𝑡) passes through 𝑝. The operators 𝑋𝑝 ∶ 𝒞∞(𝑝) → ℝ are called derivations at 𝑝; they are completely characterized by two simple algebraic properties:

1. Linearity. They are linear functionals on the infinite-dimensional local algebra 𝒞∞(𝑝), so ⟨𝑋𝑝, 𝑐1𝑓1 + 𝑐2𝑓2⟩ = 𝑐1⟨𝑋𝑝, 𝑓1⟩ + 𝑐2⟨𝑋𝑝, 𝑓2⟩ for 𝑓1, 𝑓2 ∈ 𝒞∞(𝑝), 𝑐1, 𝑐2 ∈ ℝ.

2. Derivation Property. If 𝑓 ⋅ ℎ is the pointwise product of functions in 𝒞∞(𝑝), then ⟨𝑋𝑝, 𝑓 ⋅ ℎ⟩ = ⟨𝑋𝑝, 𝑓⟩ ⋅ ℎ(𝑝) + 𝑓(𝑝) ⋅ ⟨𝑋𝑝, ℎ⟩ (as in the "product rule" for derivatives in calculus).

The space of derivations is the modern interpretation of the tangent space TM𝑝. It is a vector space because sums and scalar multiples of derivations are again derivations.

The Differential of a Map. A 𝒞∞ map between manifolds induces linear maps between the various tangent spaces.

Definition 5.43 (Differential (𝑑𝜙)𝑝 of a Map 𝜙). A 𝒞∞ map 𝜙 ∶ 𝑀 → 𝑁 between manifolds that sends 𝑝 ∈ 𝑀 to 𝑞 = 𝜙(𝑝) ∈ 𝑁 induces a natural map between tangent spaces, a linear map (𝑑𝜙)𝑝 ∶ TM𝑝 → TN𝜙(𝑝) called the differential of 𝜙 at 𝑝. Notice that 𝜙 moves points forward from 𝑀 to 𝑁 and "pulls back" functions from 𝑁 to 𝑀 via 𝑓 ↦ 𝑓 ∘ 𝜙, while the differential (𝑑𝜙)𝑝 "pushes tangent vectors forward" from TM𝑝 → TN𝜙(𝑝).

If we regard tangent vectors in TM𝑝, TN𝑞 as derivation operators on the local algebras 𝒞𝑀∞(𝑝), 𝒞𝑁∞(𝑞), the action of the differential (𝑑𝜙)𝑝 can be described by specifying its action on elements 𝑋𝑝 ∈ TM𝑝, letting

(5.5)   ⟨(𝑑𝜙)𝑝𝑋𝑝, 𝑓⟩ = ⟨𝑋𝑝, 𝑓 ∘ 𝜙⟩   for 𝑓 ∈ 𝒞𝑁∞(𝑞).
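For maps between Euclidean spaces, (5.5) reduces to multiplication by the Jacobian matrix. A small numerical check (the map 𝜙 and curve 𝛾 below are our own toy examples, not from the text): the pushforward of 𝛾′(0) by the Jacobian agrees with the derivative of the image curve 𝜙 ∘ 𝛾.

```python
import numpy as np

# Toy example: phi(u, v) = (u^2, u*v), gamma(t) = (1 + t, 2t), p = gamma(0).
phi = lambda u, v: np.array([u**2, u*v])
gamma = lambda t: (1.0 + t, 2.0*t)

p = gamma(0.0)                        # base point (1, 0)
J = np.array([[2*p[0], 0.0],          # Jacobian of phi at p (computed by hand)
              [p[1],   p[0]]])
Xp = np.array([1.0, 2.0])             # gamma'(0)

h = 1e-6                              # centered difference for (phi o gamma)'(0)
eta_prime = (phi(*gamma(h)) - phi(*gamma(-h))) / (2*h)
assert np.allclose(J @ Xp, eta_prime, atol=1e-5)
```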

On the other hand, if 𝑋𝑝 ∈ TM𝑝 is viewed as a vector derivative 𝛾′ (0) along some curve 𝛾(𝑡) in 𝑀 passing through 𝑝, then 𝜙 maps 𝛾(𝑡) to a smooth curve


𝜂(𝑡) = 𝜙(𝛾(𝑡)) in 𝑁 that passes through 𝑞 = 𝜙(𝑝) when 𝑡 = 0. The derivative 𝑌𝑞 = 𝜂′(0) of this curve is precisely the image (𝑑𝜙)𝑝𝑋𝑝. The formal version of this statement is:

Lemma 5.44. If 𝑋𝑝 = 𝛾′(0) in TM𝑝 and 𝜙 ∶ 𝑀 → 𝑁 is a 𝒞∞ map, then in TN𝜙(𝑝) we have

(𝑑𝜙)𝑝𝑋𝑝 = (𝜙 ∘ 𝛾)′(0) = 𝜂′(0)   where 𝜂(𝑡) = 𝜙(𝛾(𝑡)).

Proof. By definition of the vector derivative 𝑑𝛾/𝑑𝑡, we have

⟨(𝑑𝜙)𝑝𝑋𝑝, 𝑓⟩ = ⟨𝑋𝑝, 𝑓 ∘ 𝜙⟩ = (𝑑/𝑑𝑡){𝑓(𝜙 ∘ 𝛾(𝑡))}|𝑡=0 = (𝑑/𝑑𝑡){𝑓(𝜂(𝑡))}|𝑡=0 = ⟨𝜂′(0), 𝑓⟩

for all 𝑓 ∈ 𝒞∞(𝑞), so (𝑑𝜙)𝑝𝑋𝑝 = 𝜂′(0). □

Exercise 5.45. Given 𝒞∞ maps 𝜓 ∶ 𝑀 → 𝑁 and 𝜙 ∶ 𝑁 → 𝑅 and points 𝑝 ∈ 𝑀, 𝑞 = 𝜓(𝑝) ∈ 𝑁, prove that 𝑑(𝜙 ∘ 𝜓)𝑝 = (𝑑𝜙)𝑞 ∘ (𝑑𝜓)𝑝 as maps TM𝑝 → TN𝑞 → TR𝜙(𝑞) = TR𝜙(𝜓(𝑝)).

Smooth Vector Fields on a Manifold. A vector field 𝑋˜ on a manifold assigns a tangent vector 𝑋˜𝑝 ∈ TM𝑝 to each base point in 𝑀.

Example 5.46 (Translation-Invariant Vector Fields). For instance, if 𝑀 is a matrix Lie group 𝐺, we can move a tangent vector 𝑋 ∈ 𝔤 = TG𝑒 at the identity 𝑒 to a tangent vector (𝑑𝜆𝑔)𝑒𝑋 in TG𝑔, obtaining a uniquely defined field of tangent vectors 𝑋˜𝑔 = (𝑑𝜆𝑔)𝑒𝑋 on 𝐺. This is a left-invariant vector field because

(5.6)   𝑋˜𝑔⋅𝑝 = (𝑑𝜆𝑔)𝑝𝑋˜𝑝 in TG𝑔⋅𝑝   for all 𝑔, 𝑝 in 𝐺 (here 𝑋˜𝑝 ∈ TG𝑝).

Such fields are uniquely determined by their value 𝑋˜𝑒 = 𝑋 at the identity 𝑒 ∈ 𝐺. Similarly, right-invariant vector fields are obtained by applying right translations (𝑑𝜌𝑔)𝑒 to an element 𝑋 ∈ TG𝑒. But left- or right-invariant vector fields are quite special. For arbitrary vector fields, the values 𝑋˜𝑝 are not correlated as we move from one base point to another. Our interest here will be in "smoothly varying vector fields," but the notion of smoothness can be defined in various ways, and it requires some preliminary discussion. For matrix Lie groups, one (not very elegant) approach is to argue as follows: when we view a manifold 𝑀 as a 𝑑-dimensional hypersurface embedded in some higher-dimensional space ℝ𝑚 (perhaps a matrix space), tangent vectors are described as vectors in the larger space ℝ𝑚,

𝑋˜𝑝 = 𝛾′(0) = (𝑐1(𝑝), … , 𝑐𝑚(𝑝)) ∈ ℝ𝑚,


attached to points 𝑝 ∈ 𝑀. Why not define smoothness of 𝑋˜ to mean that the entries 𝑐𝑗(𝑝), 1 ≤ 𝑗 ≤ 𝑚, are 𝒞∞ functions on 𝑈𝛼 for all charts? That works for some purposes but is in many ways unsatisfactory. For one thing, the dimension 𝑑 = dim(𝑀) is usually < 𝑚 = the dimension of the larger space ℝ𝑚 in which 𝑀 is embedded. All those extra dimensions lie outside of 𝑀 and have little to do with its geometry. One would like to define smoothness of a vector field on 𝑀 in terms of constructs intrinsic to 𝑀.

A better way to describe vector fields and perform calculations is to systematically prescribe basis vectors in the individual tangent spaces TM𝑝 attached to 𝑀 and use the fact that the tangent spaces are permuted by left translations (𝑑𝜆𝑔). The charts (𝑈𝛼, 𝑥𝛼) that cover 𝑀 provide a way to do this. Given a chart, we can construct correlated bases

(5.7)   𝔛𝑝 = { (𝜕/𝜕𝑥1)|𝑝 , … , (𝜕/𝜕𝑥𝑑)|𝑝 }

throughout 𝑈𝛼. (We will soon explain this odd notation for the basis vectors.) If 𝐮 = 𝑥𝛼(𝑢) = (𝑥1(𝑢), … , 𝑥𝑑(𝑢)) in coordinate space ℝ𝑑 for 𝑢 ∈ 𝑈𝛼, we can define smooth straight-line curves in ℝ𝑑 emanating from 𝐩 = 𝑥𝛼(𝑝):

𝛾𝑖∗(𝑡) = 𝐩 + 𝑡𝐞𝑖   (𝐞𝑖 the standard basis vectors in ℝ𝑑, 1 ≤ 𝑖 ≤ 𝑑).

The inverse 𝑥𝛼−1 ∶ ℝ𝑑 → 𝑀 of the chart map transfers these curves back to 𝑀, and the resulting smooth curves 𝛾𝑖(𝑡) = 𝑥𝛼−1(𝛾𝑖∗(𝑡)) in 𝑀 determine tangent vectors (𝜕/𝜕𝑥𝑖)|𝑝 in TM𝑝 whose action as derivations on 𝒞∞(𝑝) is

⟨(𝜕/𝜕𝑥𝑖)|𝑝 , 𝑓⟩ = ⟨𝛾𝑖′(0), 𝑓⟩ = (𝑑/𝑑𝑡){𝑓(𝛾𝑖(𝑡))}|𝑡=0 = (𝑑/𝑑𝑡){𝑓 ∘ 𝑥𝛼−1(𝐩 + 𝑡𝐞𝑖)}|𝑡=0 = 𝐷𝑥𝑖(𝑓 ∘ 𝑥𝛼−1)(𝐩),

where 𝐷𝑥𝑖 is the classical 𝑖th partial derivative at 𝐩 = 𝑥𝛼(𝑝) of functions on ℝ𝑑. The corresponding derivation 𝛾𝑖′(0) on 𝒞∞(𝑝) is just the partial derivative 𝐷𝑥𝑖 on ℝ𝑑 lifted back to 𝑀, which explains the suggestive notation (5.7) for the basis vectors associated with a chart. With this in mind, we can also define partial derivatives

(𝜕𝑓/𝜕𝑥𝑖)(𝑝) = ⟨(𝜕/𝜕𝑥𝑖)|𝑝 , 𝑓⟩   (for 𝑝 ∈ 𝑈𝛼; 1 ≤ 𝑖 ≤ 𝑑)

of smooth functions 𝑓 ∶ 𝑀 → ℝ on the chart domain. Note carefully that 𝑓(𝑢) and its partial derivatives (𝜕𝑓/𝜕𝑥𝑖)(𝑢) all live on the manifold and not on coordinate space ℝ𝑑. Furthermore, we have simultaneously assigned basis vectors (𝜕/𝜕𝑥1)|𝑝, … , (𝜕/𝜕𝑥𝑑)|𝑝 in TM𝑝 for all 𝑝 ∈ 𝑈𝛼.
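This recipe can be carried out symbolically on a toy manifold (the circle chart below is our own illustration, not from the text): pull a function back through the chart inverse and differentiate in the coordinate.

```python
import sympy as sp

# Toy chart: on the upper unit circle M = {x^2 + y^2 = 1, y > 0}, take the
# chart x_alpha(x, y) = x, with inverse x_alpha^{-1}(t) = (t, sqrt(1 - t^2)).
# The basis vector (d/dx)|_p acts on f via D_t (f o x_alpha^{-1})(p).
t = sp.symbols('t')
chart_inv = (t, sp.sqrt(1 - t**2))

f = lambda x, y: x * y           # a smooth function defined near p
p = sp.Rational(1, 2)            # chart coordinate of the base point

val = sp.diff(f(*chart_inv), t).subs(t, p)   # <(d/dx)|_p, f>
# By hand: d/dt [t*sqrt(1-t^2)] at t = 1/2 is sqrt(3)/2 - 1/(2*sqrt(3)) = 1/sqrt(3).
assert sp.simplify(val - 1/sp.sqrt(3)) == 0
```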


Exercise 5.47. The chart map 𝑥𝛼 ∶ 𝑈𝛼 → ℝ𝑑 has scalar components 𝑥𝛼(𝑢) = (𝑋1(𝑢), … , 𝑋𝑑(𝑢)), where each 𝑋𝑘 ∶ 𝑈𝛼 → ℝ is a scalar-valued 𝒞∞ function on 𝑀.

(a) Prove from the definitions that

(𝜕𝑋𝑖/𝜕𝑥𝑗)(𝑢) = ⟨(𝜕/𝜕𝑥𝑗)|𝑢 , 𝑋𝑖⟩ = 𝛿𝑖𝑗 (Kronecker delta)

on all of 𝑈𝛼.

(b) Use this to show that the vectors (𝜕/𝜕𝑥𝑖)|𝑝 are linearly independent in TM𝑝 for each 𝑝 ∈ 𝑈𝛼.

(c) Explain why these vectors also span the tangent space TM𝑝, so 𝔛𝑝 is a basis.

Hint. Proving (a) is easy once you get comfortable with the notation; then (a) ⇒ (b) follows easily. In (c), consider the tangent vectors you get by taking directional derivatives along curves 𝛾(𝑡) through 𝑝 ∈ 𝑀 that correspond to straight lines through 𝐩 = 𝑥𝛼(𝑝) in coordinate space.

We shall define smoothness of vector fields in terms of the bases (5.7) in TM𝑝 determined by charts on 𝑀. This definition is intrinsic to the manifold because it makes no reference to any "surrounding space" ℝ𝑚. The following lemma, proved as Corollary 4.16 of Chapter 4, makes it easy to write tangent vectors 𝛾′(0) to smooth curves through 𝑝 in terms of the basis 𝔛𝑝.

Lemma 5.48. If 𝑝 ∈ 𝑀 lies in a chart (𝑈𝛼, 𝑥𝛼) and 𝛾(𝑡) is a 𝒞∞ curve passing through 𝑝 when 𝑡 = 0, then in terms of the basis (5.7) we have

𝛾′(0) = ∑𝑗=1𝑑 (𝑑𝑥𝑗/𝑑𝑡)(0) ⋅ (𝜕/𝜕𝑥𝑗)|𝑝 ,

where 𝑥𝛼 ∘ 𝛾(𝑡) = (𝑥1(𝑡), … , 𝑥𝑑(𝑡)) is the description of 𝛾(𝑡) in local chart coordinates.

Proof. Apply both sides to a typical 𝒞∞ function defined near 𝑝 and compare. □

With all this in mind, we make the following definition of smoothness.

Definition 5.49 (Smooth Vector Fields on a Manifold 𝑀). On any chart (𝑈𝛼, 𝑥𝛼), a vector field 𝑋˜ on 𝑀 can be uniquely represented in terms of the basis vectors 𝔛𝑢,

(5.8)   𝑋˜𝑢 = ∑𝑗=1𝑑 𝑐𝑗𝛼(𝑢) ⋅ (𝜕/𝜕𝑥𝑗)|𝑢   for 𝑢 ∈ 𝑈𝛼.

We say 𝑋˜ is a smooth vector field on 𝑀 if the coefficients 𝑐𝑗𝛼(𝑢) are 𝒞∞ functions of 𝑢 ∈ 𝑈𝛼 for all charts (so 𝑐𝑗𝛼 ∘ 𝑥𝛼−1 ∶ ℝ𝑑 → ℝ is 𝒞∞).


This definition is independent of the choice of local chart coordinates: in Lemma 4.18 and Corollary 4.16 of Chapter 4, we proved that if the coefficients are 𝒞∞ near 𝑝 for one chart, they are 𝒞∞ for any other chart containing 𝑝.

Exercise 5.50 (Matrix Lie Groups Versus Lie Groups). Elements of a (real) matrix Lie group 𝐺 are matrices [𝐴𝑖𝑗] ∈ M(𝑛, ℝ); tangent vectors 𝑋 ∈ TG𝑝 are also matrices in M(𝑛, ℝ), being derivatives 𝛾′(0) = [𝑋𝑖𝑗(𝑝)] of smooth matrix-valued curves in M(𝑛, ℝ) that remain within 𝐺 at all times. Vector fields on 𝐺 are likewise matrix-valued since they are derivatives of matrix-valued curves in matrix space. Suppose we define smoothness of such a vector field 𝑋˜ to mean that the individual matrix entries 𝑋𝑖𝑗(𝑝) are 𝒞∞ functions with respect to the charts that determine the differentiable structure of 𝐺, so 𝑋𝑖𝑗 ∘ 𝑥𝛼−1 ∶ ℝ𝑑 → ℝ is 𝒞∞ in the usual sense. Prove that 𝑋˜ is then a smooth vector field as in Definition 5.49.

Hint. This is a matter of reconciling various definitions, but it is a good check of whether you've got the definitions straight.

For instance, when 𝐺 is a matrix Lie group, every left-invariant vector field on 𝐺 is obtained by transporting a fixed tangent vector 𝑋 ∈ TG𝑒 at the identity to other base points in 𝐺 by left translation, letting 𝑋˜𝑔 = (𝑑𝜆𝑔)𝑒𝑋. In this situation, 𝑋˜ is automatically smooth, essentially because group operations are differentiable. Although this seems intuitively obvious, proving it from the preceding definition is a bit messy and not particularly instructive, so we won't go into those details here.

Exercise 5.51. If 𝐺 is a Lie group as in Definition 5.22 and 𝑋 ∈ TG𝑒, we get a well-defined field of vectors 𝑋˜ on 𝐺 if we take 𝑋˜𝑔 = (𝑑𝜆𝑔)𝑒𝑋.

(a) Verify that 𝑋˜ is left-invariant, so that 𝑋˜𝑔⋅𝑝 = (𝑑𝜆𝑔)𝑝𝑋˜𝑝 for all 𝑝, 𝑔 ∈ 𝐺.
(b) If we define left translates of functions on 𝐺, letting 𝐿𝑔𝑓(𝑥) = 𝑓(𝑔−1𝑥) for all 𝑓 ∈ 𝒞∞(𝐺) and 𝑔, 𝑥 ∈ 𝐺, verify that 𝑋˜ is left-invariant ⇔ 𝐿𝑔(𝑋˜𝑓) = 𝑋˜(𝐿𝑔𝑓) for all 𝑓 ∈ 𝒞∞(𝐺), i.e., 𝑋˜ commutes with all translation operators 𝐿𝑔 on 𝒞∞(𝐺).

(c) Using the fact that left translations 𝜆𝑔 are 𝒞∞ diffeomorphisms of the manifold 𝐺, verify the claim that 𝑋˜ is always a smooth vector field with respect to the manifold structure of 𝐺 for any 𝑋 ∈ 𝔤.

Smooth Vector Fields as Differential Operators on 𝑀. The set 𝒟(𝑀) of all smooth vector fields on a manifold 𝑀 is an infinite-dimensional vector space if we define the vector space operations, letting

(𝑐𝑋˜)𝑝 = 𝑐 ⋅ 𝑋˜𝑝 for 𝑐 ∈ ℝ   and   (𝑋˜ + 𝑌˜)𝑝 = 𝑋˜𝑝 + 𝑌˜𝑝

in TM𝑝 at every base point 𝑝 ∈ 𝑀. Since the local algebras 𝒞∞(𝑝) at various base points all include the space 𝒞∞(𝑀) of globally 𝒞∞ functions on 𝑀, each


smooth vector field 𝑋˜ determines a natural linear operator 𝑋˜ on the space of globally defined smooth functions 𝒞∞(𝑀) if we let

𝑋˜𝑓(𝑝) = ⟨𝑋˜𝑝, 𝑓⟩   for all 𝑓 ∈ 𝒞∞(𝑀), 𝑝 ∈ 𝑀.

The functions 𝑋˜𝑓 are 𝒞∞ because, by equation (5.8), on any chart domain 𝑈𝛼 we have

(5.9)   𝑋˜𝑓(𝑢) = ⟨ ∑𝑗=1𝑑 𝑐𝑗𝛼(𝑢) (𝜕/𝜕𝑥𝑗)|𝑢 , 𝑓 ⟩ = ∑𝑗=1𝑑 𝑐𝑗𝛼(𝑢) (𝜕𝑓/𝜕𝑥𝑗)(𝑢) ∈ 𝒞∞(𝑈𝛼)

for 𝑓 ∈ 𝒞∞(𝑈𝛼). In particular, a vector field 𝑋˜ on a matrix Lie group 𝐺 is left-invariant ⇔ it commutes with left translations, so that

(5.10)   𝐿𝑔(𝑋˜𝑓) = 𝑋˜(𝐿𝑔𝑓), or 𝐿𝑔 ∘ 𝑋˜ ∘ 𝐿𝑔−1 = 𝑋˜, for all 𝑔 ∈ 𝐺 and 𝑓 ∈ 𝒞∞(𝐺),3

where 𝐿𝑔𝑓(𝑥) = 𝑓(𝑔−1𝑥). Thus, in local coordinates on a chart (𝑈𝛼, 𝑥𝛼), the action of 𝑋˜ on smooth functions 𝑓 ∈ 𝒞∞(𝑀) is the same as that of the partial differential operator with variable coefficients

(5.11)   𝐿 = ∑𝑗=1𝑑 𝑐𝑗𝛼(𝑢) ⋅ (𝜕/𝜕𝑥𝑗)|𝑢   for 𝑢 ∈ 𝑈𝛼.

Notice that 𝐿 is described on 𝑈𝛼 as a first-order partial differential operator with 𝒞∞ coefficients and is homogeneous in the sense that it has no degree-0 scalar term 𝑐0(𝑢)𝐼. This representation of 𝑋˜ will change if we pass to a different local chart (𝑈𝛽, 𝑦𝛽), but the new description is easily found using the change of variables formula (Corollary 4.16), and its coefficients are again 𝒞∞ on 𝑈𝛽.

It is not so clear what one might mean by a "partial differential operator with smooth coefficients" on a manifold 𝑀, since 𝑀 has no preferred system of global coordinates. A natural definition, then, is to say 𝐿 is a smooth differential operator on 𝑀 if it is a linear operator 𝐿 ∶ 𝒞∞(𝑀) → 𝒞∞(𝑀) whose action looks like that of a smooth-coefficient partial differential operator whenever it is described in local coordinates, so that on any chart (𝑈𝛼, 𝑥𝛼) the action of 𝐿 is described by a finite sum

(5.12)   𝐿𝑓(𝑢) = 𝑐0(𝑢)𝑓(𝑢) + ∑𝑗=1𝑚 𝑐𝑗(𝑢) (𝜕𝑓/𝜕𝑥𝑗)(𝑢) + ∑𝑗,𝑘=1𝑚 𝑐𝑗𝑘(𝑢) (𝜕2𝑓/𝜕𝑥𝑗𝜕𝑥𝑘)(𝑢) + ⋯

for all 𝑓 ∈ 𝒞∞(𝑈𝛼). The "degree" deg(𝐿) is that of the highest-order partial derivative appearing with a nonzero coefficient, and the order-zero term 𝑐0(𝑢)𝐼 is referred to as the "scalar term" (which might be zero).

3The "𝑔−1" that appears in the definition of 𝐿𝑔𝑓(𝑥) = 𝑓(𝑔−1𝑥) ensures that we have a "left action" of 𝐺 on 𝒞∞(𝐺), with 𝐿𝑔1𝑔2 = 𝐿𝑔1 ∘ 𝐿𝑔2; without it we would have a "right action," in which the factors on the right appear in reverse order.


It is clear from (5.11) that smooth vector fields on 𝑀 are precisely the smooth homogeneous first-order partial differential operators on 𝑀. (It is not hard to check that the notions of "first order" and "homogeneous" persist and make sense in all coordinate systems.) The set 𝒟(𝑀) of all smooth vector fields on a manifold 𝑀 becomes an infinite-dimensional vector space if we define sums 𝑋˜ + 𝑌˜ and scalar multiples 𝑐 ⋅ 𝑋˜ base point by base point, letting

(𝑋˜ + 𝑌˜)𝑝 = 𝑋˜𝑝 + 𝑌˜𝑝   and   (𝑐𝑋˜)𝑝 = 𝑐 ⋅ 𝑋˜𝑝

in TM𝑝 for all 𝑝 ∈ 𝑀.

Now let 𝑀 be a matrix Lie group 𝐺. We have noted that left-invariant vector fields on 𝐺 are automatically smooth (essentially because group multiplication is a differentiable map 𝐺 × 𝐺 → 𝐺). They form a vector subspace ℒ(𝐺) in the larger space 𝒟(𝐺) because two such fields 𝑋˜, 𝑌˜ are determined by their values 𝑋 = 𝑋˜𝑒, 𝑌 = 𝑌˜𝑒 in the tangent space TG𝑒, and (𝑋˜ + 𝑌˜)𝑒 = 𝑋 + 𝑌. Then the map Φ ∶ 𝔤 → ℒ(𝐺) that assigns to 𝑋 ∈ 𝔤 the unique left-invariant vector field 𝑋˜ with values

𝑋˜𝑝 = (𝑑𝜆𝑝)𝑒(𝑋) ∈ TG𝑝   for 𝑝 ∈ 𝐺, 𝑋 ∈ 𝔤

is a bijective linear isomorphism between 𝔤 = TG𝑒 and the vector space ℒ(𝐺) in 𝒟(𝐺).

This correspondence between tangent vectors 𝑋 at the identity and left-invariant partial differential operators 𝑋˜ on the whole group leads to many important insights about Lie groups. For one thing, it reveals, in a natural way, that the tangent space 𝔤 = TG𝑒 of a Lie group has a remarkable "Lie algebra structure" that becomes apparent when we realize 𝔤 as a space ℒ(𝐺) of left-invariant partial differential operators on 𝐺. In order to explain this connection, we shall take time below to describe the general concept of a "Lie algebra" and provide a direct construction of the associated "Lie bracket operation" in TG𝑒. That direct construction (for matrix Lie groups) will perhaps seem unintuitive, but its techniques are transparent and elementary enough to allow straightforward matrix calculations. An alternative approach is needed to construct the Lie bracket that makes 𝔤 = TG𝑒 a Lie algebra for general Lie groups, and the identification 𝔤 ≅ ℒ(𝐺) is the key to that construction at the heart of modern Lie theory. In this limited account, we will not be able to present all those details, but here is a meaningful outline of what happens:

1. One begins by identifying 𝔤 with the space ℒ(𝐺) of left-invariant vector fields 𝑋˜ on 𝐺, which in turn can be identified as the finite-dimensional vector space ℒ(𝐺) of homogeneous first-order partial differential operators defined on all of 𝐺. As such, ℒ(𝐺) is a subspace of the infinite-dimensional vector space Diff(𝐺) of all linear partial differential operators (of any degree) with 𝒞∞ coefficients. But in addition to being a vector space, Diff(𝐺) also has a natural multiplication operation given by composition of operators,

(𝑆 ⋅ 𝑇)(𝑓) = 𝑆(𝑇(𝑓))   for 𝑓 ∈ 𝒞∞(𝐺).


This makes Diff(𝐺) a (noncommutative) associative algebra with identity.

2. Any associative algebra 𝒜 has a natural bracket operation, the associative commutator of two operators

[𝑆, 𝑇] = 𝑆𝑇 − 𝑇𝑆   (𝑆, 𝑇 ∈ 𝒜).

This bilinear operation on pairs of elements 𝑆, 𝑇 ∈ 𝒜 measures the degree to which they fail to commute, since [𝑆, 𝑇] = 0 ⇔ 𝑆 and 𝑇 commute. Equipped with this bracket operation, Diff(𝐺) becomes a "Lie algebra," a concept that will be defined and examined further in Section 5.3.

3. We now ask: Is the finite-dimensional vector subspace ℒ(𝐺) ⊆ Diff(𝐺) closed under formation of commutators [𝑋˜, 𝑌˜] = 𝑋˜𝑌˜ − 𝑌˜𝑋˜? There are several reasons this may seem unlikely, but they aren't worth discussing because the answer is "yes," and for an interesting reason that we now explain. In any case, once we know that [ℒ(𝐺), ℒ(𝐺)] ⊆ ℒ(𝐺), it follows immediately that ℒ(𝐺) becomes a finite-dimensional Lie algebra. By transferring this operation over to the tangent space 𝔤 = TG𝑒 ≅ ℒ(𝐺), we make 𝔤 into a Lie algebra with dim(𝔤) = dim(𝐺) = dim ℒ(𝐺).

We close this commentary by showing why ℒ(𝐺) is closed under formation of commutators. Smooth vector fields 𝑋˜ on 𝐺 are particular examples of globally defined smooth partial differential operators in Diff(𝐺). If we describe 𝑋˜ in local coordinates on a chart (𝑈𝛼, 𝑥𝛼), it becomes a homogeneous first-order partial differential operator with variable coefficients (no scalar term):

𝑋˜𝑢 = ∑𝑗=1𝑑 𝑐𝑗𝛼(𝑢) ⋅ (𝜕/𝜕𝑥𝑗)|𝑢   for 𝑢 ∈ 𝑈𝛼

with coefficients in 𝒞∞(𝑈𝛼). Such operators are determined by what they do to smooth functions 𝑓 ∈ 𝒞∞(𝑈𝛼), and within Diff(𝐺) two partial differential operators on 𝑈𝛼 can be multiplied by composition to get

𝑋˜𝑌˜(𝑓) = 𝑋˜(𝑌˜(𝑓))   for 𝑓 ∈ 𝒞∞(𝑈𝛼).

When we describe 𝑋˜, 𝑌˜ in local coordinates,

𝑋˜ = ∑𝑗=1𝑑 𝑎𝑗(𝑢) ⋅ 𝜕/𝜕𝑥𝑗   and   𝑌˜ = ∑𝑘=1𝑑 𝑏𝑘(𝑢) ⋅ 𝜕/𝜕𝑥𝑘 ,

we find that their product is a second-order partial differential operator,

𝑋˜𝑌˜𝑓(𝑢) = ∑𝑗=1𝑑 𝑎𝑗(𝑢) ( ∑𝑘=1𝑑 (𝜕𝑏𝑘/𝜕𝑥𝑗)(𝑢) ⋅ (𝜕𝑓/𝜕𝑥𝑘)(𝑢) ) + ∑𝑗,𝑘=1𝑑 𝑎𝑗(𝑢)𝑏𝑘(𝑢) (𝜕2𝑓/𝜕𝑥𝑗𝜕𝑥𝑘)(𝑢)


for 𝑓 ∈ 𝒞∞(𝑈𝛼), and similarly for 𝑌˜𝑋˜𝑓. But when we take the difference of these expressions, all terms in the commutator [𝑋˜, 𝑌˜] = 𝑋˜𝑌˜ − 𝑌˜𝑋˜ involving second partial derivatives cancel because the mixed second-order partial derivatives of smooth functions are equal. Thus the action of the commutator [𝑋˜, 𝑌˜] on 𝒞∞(𝑀) is described, on any chart, as a homogeneous first-order operator:

[𝑋˜, 𝑌˜]𝑓(𝑢) = ∑𝑗=1𝑑 ( ∑𝑖=1𝑑 𝑎𝑖 (𝜕𝑏𝑗/𝜕𝑥𝑖) − 𝑏𝑖 (𝜕𝑎𝑗/𝜕𝑥𝑖) ) ⋅ (𝜕𝑓/𝜕𝑥𝑗)|𝑢   for 𝑢 ∈ 𝑈𝛼.
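The cancellation of second-order terms can be verified symbolically; a sketch with generic coefficients of our own choosing (one coefficient per operator, for brevity):

```python
import sympy as sp

# Symbolic check: the commutator of two homogeneous first-order operators
# has no second-order terms, and matches the displayed first-order formula.
x1, x2 = sp.symbols('x1 x2')
f = sp.Function('f')(x1, x2)

a = x1**2 + x2                     # X~ = a . d/dx1
b = sp.sin(x1)                     # Y~ = b . d/dx2
Xt = lambda h: a * sp.diff(h, x1)
Yt = lambda h: b * sp.diff(h, x2)

comm = sp.expand(Xt(Yt(f)) - Yt(Xt(f)))
# Mixed partials d^2 f/dx1 dx2 cancel; what remains is first order:
expected = a * sp.diff(b, x1) * sp.diff(f, x2) - b * sp.diff(a, x2) * sp.diff(f, x1)
assert sp.simplify(comm - expected) == 0
```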

˜ 𝑌˜] is a smooth Since this is true on every chart domain 𝑈𝛼 , the commutator [𝑋, vector field in 𝒟(𝑀). Finally, this commutator field is left-invariant (as in (5.10)) and hence is equal to 𝑍˜ for a unique element 𝑍 ∈ 𝔤. Thus 𝔤 ≅ ℒ(𝐺) is closed under formation of associative commutators in Diff(𝐺) and is a Lie algebra. 5.3. Lie Algebra Structure in Tangent Spaces of Lie Groups The tangent space TG𝑒 of a matrix Lie group at its identity element 𝑒 ∈ 𝐺 is of special interest to us, and we shall denote such spaces by German “fraktur” letters 𝔤 = TG𝑒 , 𝔥 = TH𝑒 , etc. These are of course vector subspaces in the matrix space M(𝑛, ℝ) in which 𝐺 is embedded. But their most important property is that the group operation 𝑥 ⋅ 𝑦 in 𝐺 induces a distinctive bilinear Lie bracket operation [𝑋, 𝑌 ] on tangent vectors that sends a pair 𝑋, 𝑌 in 𝔤 to a new tangent vector [𝑋, 𝑌 ] in 𝔤. This makes 𝔤 into a “Lie algebra.” Definition 5.52. A Lie algebra 𝔤 over a field of scalars 𝕂 is a vector space over 𝕂, equipped with a bracket operation 𝐵(𝑋, 𝑌 ) = [𝑋, 𝑌 ] that has the following properties: 1. Bilinearity. It is a vector-valued bilinear map from 𝔤 × 𝔤 → 𝔤 (linear in each input when the other is held fixed). 2. Skew-Symmetry. [𝑌 , 𝑋] = −[𝑋, 𝑌 ] for all 𝑋, 𝑌 ∈ 𝔤. In particular [𝑋, 𝑋] = 0 (the zero vector in 𝔤) for all 𝑋. 3. Jacobi Identity. For 𝑋, 𝑌 , 𝑍 ∈ 𝔤 we have [𝑋, [𝑌 , 𝑍]] + [𝑌 , [𝑍, 𝑋]] + [𝑍, [𝑋, 𝑌 ]] = 0. In view of these laws, if a Lie algebra 𝔤 is finite dimensional and 𝔛 = {𝑋1 , … , 𝑋𝑑 } is a vector basis, its algebraic structure is completely determined once we know the brackets of the basis vectors 𝑑

(𝑘)

[𝑋𝑖 , 𝑋𝑗 ] = ∑ 𝑐𝑖𝑗 𝑋𝑘

(𝑘)

(𝑐𝑖𝑗 ∈ 𝕂, 1 ≤ 𝑖, 𝑗 ≤ 𝑑).

𝑘=1 3

The 𝑑 coefficients appearing here are the structure constants of 𝔤 with respect to the basis 𝔛. The basic example of a Lie algebra is provided by the “commutator bracket” operation [𝑋, 𝑌 ] = 𝑋𝑌 − 𝑌 𝑋 (matrix products)


on the associative algebra M(𝑛, 𝕂) of 𝑛×𝑛 matrices with entries in 𝕂; it measures the degree to which two matrices 𝑋 and 𝑌 fail to commute.

Exercise 5.53. Verify that the Jacobi identity of Definition 5.52 holds for the associative commutator [𝑋, 𝑌] = 𝑋𝑌 − 𝑌𝑋 on the matrix algebra M(𝑛, 𝕂).

Hint. Expand all commutators in long form, as sums of expressions that are permutations of 𝑋𝑌𝑍. Everything cancels.

Similarly, the space Hom𝕂(𝑉, 𝑉) of 𝕂-linear operators on a vector space 𝑉 becomes an associative algebra if we take composition of operators 𝑆 ∘ 𝑇 as its multiplication operation; it becomes a Lie algebra when equipped with the commutator bracket [𝑆, 𝑇] = 𝑆 ∘ 𝑇 − 𝑇 ∘ 𝑆. No surprise here: if dim(𝑉) = 𝑛 and 𝔛 is a basis in 𝑉, we saw in Section 2.4 of LA I that the map 𝑇 ↦ [𝑇]𝔛𝔛 is a natural isomorphism Hom𝕂(𝑉, 𝑉) → M(𝑛, 𝕂) of associative algebras. This respects linear combinations and products, so it sends commutators to commutators. Hence it is also an isomorphism between the Lie algebras.

Lie Algebra of a Matrix Lie Group. We can now explain how the multiplication operation 𝑥 ⋅ 𝑦 in 𝐺 induces a natural Lie bracket operation in the tangent space 𝔤 = TG𝑒 at the identity element in 𝐺, making 𝔤 into a Lie algebra. Two observations motivate our interest in this Lie algebra structure.

• Being a vector space, the Lie algebra 𝔤 is a linear structure to which we can apply all the tools of Linear Algebra. That is a good thing because Lie algebras are much more accessible objects than the nonlinear Lie groups from which they are derived.

• The Lie algebra 𝔤 might have remained a mathematical curiosity were it not for its most important property: the Lie algebra structure of 𝔤, equipped with its Lie bracket, encodes almost all the information needed to completely reconstruct the original Lie group. This has been heavily exploited to reduce many investigations of Lie groups to purely algebraic questions about their Lie algebras.
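A numerical companion to Exercise 5.53 and the structure-constant formula above (the so(3) basis of infinitesimal rotations is a standard example, not taken from the text):

```python
import numpy as np

# 1) Jacobi identity for the commutator bracket on random 4x4 matrices.
rng = np.random.default_rng(0)
X, Y, Z = rng.normal(size=(3, 4, 4))
br = lambda A, B: A @ B - B @ A

assert np.allclose(br(Y, X), -br(X, Y))                      # skew-symmetry
jacobi = br(X, br(Y, Z)) + br(Y, br(Z, X)) + br(Z, br(X, Y))
assert np.allclose(jacobi, 0)

# 2) Structure constants of so(3) with the usual basis E1, E2, E3.
E1 = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
E2 = np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]])
E3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])
basis = [E1, E2, E3]
B = np.column_stack([E.ravel() for E in basis])

c = np.zeros((3, 3, 3))
for i in range(3):
    for j in range(3):
        # Expand [E_i, E_j] in the basis: solve B c = bracket (least squares).
        c[i, j], *_ = np.linalg.lstsq(B, br(basis[i], basis[j]).ravel(),
                                      rcond=None)

# [E1, E2] = E3 and cyclic permutations: c is the Levi-Civita symbol.
assert np.isclose(c[0, 1, 2], 1.0) and np.isclose(c[1, 0, 2], -1.0)
assert np.isclose(c[1, 2, 0], 1.0)
```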
By now, Lie algebras have become a topic as important as the Lie groups from which they arose.

It is not immediately apparent how the multiplication operation in 𝐺 induces a Lie bracket operation in the tangent space 𝔤. One approach is to get a handle on vectors in the tangent spaces TG𝑝 by regarding tangent vectors at 𝑝 as derivatives 𝛾′(0) along differentiable curves 𝛾(𝑡) in 𝐺 that pass through 𝑝 at time 𝑡 = 0. This identifies tangent vectors at 𝑝 with derivation operators 𝑋𝑝 on functions 𝑓 in the local algebra 𝒞∞(𝑝), letting

⟨𝑋𝑝, 𝑓⟩ = ⟨𝛾′(0), 𝑓⟩ = ⟨ (𝑑/𝑑𝑡){𝛾(𝑡)}|𝑡=0 , 𝑓 ⟩ = (𝑑/𝑑𝑡){𝑓(𝛾(𝑡))}|𝑡=0

for 𝑓 ∈ 𝒞∞(𝑝). These operators are derivations on 𝒞∞(𝑝), and in modern differential geometry they are the elements of the tangent space TM𝑝.


Next we record some basic facts about derivatives of algebraic combinations of smooth curves in a Lie group, and the tangent vectors 𝑋𝑝 ∈ TG𝑝 they determine as operators on the local algebras 𝒞∞(𝑝).

Lemma 5.54. If 𝛾, 𝜂 ∶ ℝ → M(𝑛, 𝕂) are 𝒞∞ curves defined for 𝑎 < 𝑡 < 𝑏, then
1. Sum Rule. 𝑑/𝑑𝑡{𝛾(𝑡) + 𝜂(𝑡)} = 𝑑/𝑑𝑡{𝛾(𝑡)} + 𝑑/𝑑𝑡{𝜂(𝑡)} for all 𝑡.
2. Scaling Law. 𝑑/𝑑𝑡{𝑐 ⋅ 𝛾(𝑡)} = 𝑐 ⋅ 𝑑𝛾/𝑑𝑡(𝑡) for 𝑐 ∈ 𝕂.
3. Derivation Law. The derivative of the matrix product 𝛾(𝑡) ⋅ 𝜂(𝑡) is

𝑑/𝑑𝑡{𝛾(𝑡) ⋅ 𝜂(𝑡)} = 𝑑𝛾/𝑑𝑡(𝑡) ⋅ 𝜂(𝑡) + 𝛾(𝑡) ⋅ 𝑑𝜂/𝑑𝑡(𝑡)

for all 𝑡. The factors in the Derivation Law cannot be interchanged unless 𝐺 is an abelian group.

Proof. Items (1.) and (2.) are trivial. For (3.) write the appropriate difference quotients as

[𝛾(𝑡+Δ𝑡) ⋅ 𝜂(𝑡+Δ𝑡) − 𝛾(𝑡) ⋅ 𝜂(𝑡)] / Δ𝑡
  = [𝛾(𝑡+Δ𝑡) ⋅ 𝜂(𝑡+Δ𝑡) − 𝛾(𝑡) ⋅ 𝜂(𝑡+Δ𝑡)] / Δ𝑡 + [𝛾(𝑡) ⋅ 𝜂(𝑡+Δ𝑡) − 𝛾(𝑡) ⋅ 𝜂(𝑡)] / Δ𝑡
  = (Δ𝛾/Δ𝑡) ⋅ 𝜂(𝑡+Δ𝑡) + 𝛾(𝑡) ⋅ (Δ𝜂/Δ𝑡).

These converge to 𝛾′(𝑡) ⋅ 𝜂(𝑡) + 𝛾(𝑡) ⋅ 𝜂′(𝑡) as Δ𝑡 → 0 because the product operation is jointly continuous, with 𝛾(𝑠) ⋅ 𝜂(𝑡) → 𝛾(0) ⋅ 𝜂(0) as 𝑠, 𝑡 → 0 independently. □

This yields the following description of the vector space operations in the tangent spaces of a matrix Lie group.

Lemma 5.55. The Lie algebra 𝔤 = TG𝑒 of a matrix Lie group is a vector space over ℝ, as are the tangent spaces TG𝑝 at arbitrary base points in 𝐺.

Proof. If 𝑐 ∈ ℝ and 𝛾(𝑡) is a 𝒞∞ curve in 𝐺 passing through the identity 𝐼 when 𝑡 = 0, with 𝛾′(0) = 𝑋 ∈ 𝔤, then 𝜂(𝑡) = 𝛾(𝑐𝑡) is also a smooth curve through 𝐼, and by the chain rule

𝑑/𝑑𝑡{𝜂(𝑡)} = 𝑐 ⋅ 𝑑𝛾/𝑑𝑡(𝑐𝑡) for all 𝑡.

Hence 𝜂′(0) = 𝑐 ⋅ 𝑋 is in 𝔤 = TG𝑒 for 𝑋 ∈ 𝔤. For sums, fix 𝑔 ∈ 𝐺 and let 𝛾, 𝜂 be smooth curves with 𝛾(0) = 𝜂(0) = 𝑔 and 𝛾′(0) = 𝑋, 𝜂′(0) = 𝑌 in TG𝑔. The product Φ(𝑡) = 𝛾(𝑡) ⋅ 𝑔⁻¹ ⋅ 𝜂(𝑡) is a 𝒞∞ curve with Φ(0) = 𝑔𝑔⁻¹𝑔 = 𝑔, and by the Derivation Law of Lemma 5.54 we get

Φ′(0) = 𝛾′(0) ⋅ 𝑔⁻¹𝜂(0) + 𝛾(0)𝑔⁻¹ ⋅ 𝜂′(0) = 𝑋 + 𝑌 in TG𝑔

(taking 𝑔 = 𝑒 gives the vector space structure of 𝔤 itself). □
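The Derivation Law is easy to test numerically. Here is a sketch (assuming NumPy, not part of the text) comparing a central finite-difference derivative of a matrix product against the formula 𝛾′𝜂 + 𝛾𝜂′:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A, C = rng.standard_normal((2, n, n))
I = np.eye(n)

gamma  = lambda t: I + t * A + 0.5 * t**2 * (A @ A)   # a smooth matrix curve
dgamma = lambda t: A + t * (A @ A)                     # its exact derivative
eta    = lambda t: I + t * C                           # another smooth curve
deta   = lambda t: C

t0, h = 0.7, 1e-6
prod = lambda t: gamma(t) @ eta(t)
numeric = (prod(t0 + h) - prod(t0 - h)) / (2 * h)      # central difference
formula = dgamma(t0) @ eta(t0) + gamma(t0) @ deta(t0)  # Derivation Law
assert np.allclose(numeric, formula, atol=1e-5)
```

Note that the order of the factors matters: swapping them changes the answer unless the matrices involved happen to commute.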

5.3. LIE ALGEBRA STRUCTURE IN TANGENT SPACES OF LIE GROUPS


Lemma 5.56. Let 𝛾(𝑡) be a 𝒞∞ curve in a matrix Lie group 𝐺 such that 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋 in TG𝑒.
1. Inverses 𝜇(𝑡) = (𝛾(𝑡))⁻¹ exist for sufficiently small values of 𝑡, and 𝜇(𝑡) is a 𝒞∞ curve in 𝐺 with 𝜇(0) = 𝐼 and 𝜇′(0) = −𝛾′(0).
2. For fixed 𝑔 ∈ 𝐺, 𝜂(𝑡) = 𝜆𝑔(𝛾(𝑡)) = 𝑔 ⋅ 𝛾(𝑡) is a 𝒞∞ curve in 𝐺 with 𝜂(0) = 𝑔 and vector derivative

𝜂′(𝑡) = 𝑔 ⋅ 𝛾′(𝑡) (matrix product in M(𝑛, ℝ))

for all 𝑡 ∈ ℝ.
3. The differential (𝑑𝜆𝑔)𝑒 sends 𝑋 ∈ TG𝑒 to

(𝑑𝜆𝑔)𝑒𝑋 = 𝜂′(0) = 𝑔 ⋅ 𝑋 (matrix product in M(𝑛, ℝ))

in TG𝑔. Similarly for right translations we have (𝑑𝜌𝑔)𝑒𝑋 = 𝑋 ⋅ 𝑔.

Proof. Since 𝛾(𝑡) → 𝛾(0) = 𝐼 as 𝑡 → 0, det(𝛾(𝑡)) → det(𝐼) = 1, so det(𝛾(𝑡)) is nonzero and 𝛾(𝑡) is invertible for small 𝑡. Inversion 𝑥 ↦ 𝐽(𝑥) = 𝑥⁻¹ is a 𝒞∞ map (by definition) and we are assuming 𝛾 ∶ ℝ → 𝐺 ⊆ M(𝑛, ℝ) is 𝒞∞, so the composite 𝜇(𝑡) = 𝛾(𝑡)⁻¹ = 𝐽(𝛾(𝑡)) is also 𝒞∞. To prove 𝜇′(0) = −𝛾′(0) we differentiate the identity 𝐼𝑛×𝑛 = 𝛾(𝑡) ⋅ 𝛾(𝑡)⁻¹ = 𝛾(𝑡) ⋅ 𝜇(𝑡) to get

0 = 𝑑𝛾/𝑑𝑡 ⋅ 𝜇(𝑡) + 𝛾(𝑡) ⋅ 𝑑𝜇/𝑑𝑡.

Set 𝑡 = 0 and rearrange terms to conclude that 𝜇′(0) = −𝛾′(0).

In (2), left translations 𝜆𝑔 ∶ 𝐺 → 𝐺 are diffeomorphisms of 𝐺, so 𝜂(𝑡) = 𝜆𝑔(𝛾(𝑡)) is a 𝒞∞ curve in 𝐺 with 𝜂(0) = 𝑔. Then 𝜂′(𝑡) = 𝑔 ⋅ 𝛾′(𝑡) because

𝜂′(𝑡) = 𝑑/𝑑𝑡{𝜆𝑔(𝛾(𝑡))} = lim_{Δ𝑡→0} [𝑔 ⋅ 𝛾(𝑡+Δ𝑡) − 𝑔 ⋅ 𝛾(𝑡)]/Δ𝑡 = 𝑔 ⋅ lim_{Δ𝑡→0} Δ𝛾/Δ𝑡 = 𝑔 ⋅ 𝛾′(𝑡) (product of 𝑛 × 𝑛 matrices)

for all 𝑡. For (3) we must wend our way through several definitions. Viewing tangent vectors in TG𝑒 and TG𝑔 as derivations on the local algebras 𝒞∞(𝑒) and 𝒞∞(𝑔), we have just seen that 𝜂′(0) = 𝑔 ⋅ 𝑋, while for all 𝑓 ∈ 𝒞∞(𝑔) we have

⟨(𝑑𝜆𝑔)𝑒𝑋, 𝑓⟩ = ⟨𝑋, 𝑓 ∘ 𝜆𝑔⟩ = ⟨𝛾′(0), 𝑓 ∘ 𝜆𝑔⟩ = 𝑑/𝑑𝑡{𝑓(𝜆𝑔 ∘ 𝛾(𝑡))}|𝑡=0 = 𝑑/𝑑𝑡{𝑓(𝑔 ⋅ 𝛾(𝑡))}|𝑡=0 = ⟨𝜂′(0), 𝑓⟩ = ⟨𝑔 ⋅ 𝑋, 𝑓⟩.

Thus (𝑑𝜆𝑔)𝑒𝑋 = 𝑔 ⋅ 𝑋 in M(𝑛, 𝕂). □

We can now show how the tangent space 𝔤 = TG𝑒 acquires a Lie algebra structure.


Proposition 5.57 (The Lie Bracket in 𝔤). If 𝐺 ⊆ M(𝑛, ℝ) is a matrix Lie group, its tangent space at the identity 𝔤 = TG𝑒 is closed under formation of associative commutators in M(𝑛, ℝ):

(5.13) 𝑋, 𝑌 ∈ 𝔤 ⇒ [𝑋, 𝑌] = 𝑋𝑌 − 𝑌𝑋 is also in 𝔤,

so 𝔤 is a Lie subalgebra in M(𝑛, ℝ).

Note. Tangent vectors 𝑋, 𝑌 ∈ 𝔤 = TG𝑒 are in M(𝑛, ℝ), so we can form their matrix products 𝑋𝑌 and 𝑌𝑋, but these need not be elements of 𝔤; nevertheless, the associative commutator [𝑋, 𝑌] = 𝑋𝑌 − 𝑌𝑋 always ends up in 𝔤. Once we prove that, it will be obvious that the bracket operation (5.6) makes 𝔤 into a Lie algebra over ℝ. ○

Proof. Let 𝛾(𝑡), 𝜂(𝑡) be 𝒞∞ curves in 𝐺 that pass through 𝑒 = 𝐼 when 𝑡 = 0 and have derivatives 𝛾′(0) = 𝑋, 𝜂′(0) = 𝑌 in 𝔤. For 𝑠, 𝑡 ∈ ℝ the function

𝑓(𝑡) = 𝛾(𝑠)𝜂(𝑡)𝛾(𝑠)⁻¹ (product of elements in 𝐺)

is 𝒞∞, 𝑓(0) = 𝐼 for all values of 𝑠, and 𝑓(𝑡) remains within 𝐺 for all 𝑡 (and 𝑠) because 𝐺 is a group. Thus 𝑓′(0) is in the Lie algebra 𝔤. But we also have

𝑓′(0) = ∂/∂𝑡{𝑓(𝑡)}|𝑡=0 = 𝑑/𝑑𝑡{𝛾(𝑠)𝜂(𝑡)𝛾(𝑠)⁻¹}|𝑡=0 = 𝛾(𝑠)𝑌𝛾(𝑠)⁻¹

for small values of 𝑠. Hence 𝛾(𝑠) ⋅ TG𝑒 ⋅ 𝛾(𝑠)⁻¹ ⊆ TG𝑒 for all small 𝑠. Since 𝔤 is a finite-dimensional (hence closed) subspace of M(𝑛, ℝ), the derivative at 𝑠 = 0 of the 𝔤-valued curve 𝑠 ↦ 𝛾(𝑠)𝑌𝛾(𝑠)⁻¹ also lies in 𝔤, and by the Derivation Law and Lemma 5.56 we get

∂/∂𝑠{𝛾(𝑠)𝑌𝛾(𝑠)⁻¹}|𝑠=0 = 𝛾′(0) ⋅ 𝑌𝛾(0)⁻¹ + 𝛾(0)𝑌 ⋅ 𝑑/𝑑𝑠{𝛾(𝑠)⁻¹}|𝑠=0
  = 𝛾′(0) ⋅ 𝑌𝛾(0)⁻¹ + 𝛾(0)𝑌 ⋅ (−𝛾′(0))
  = 𝑋𝑌 − 𝑌𝑋 = [𝑋, 𝑌] (commutator in M(𝑛, ℝ)),

as required to show [𝔤, 𝔤] ⊆ 𝔤. □

Another way to say this is:

(5.14) [𝑋, 𝑌] = ∂²𝑓/∂𝑠∂𝑡(0) = ∂²/∂𝑠∂𝑡{𝛾(𝑠)𝜂(𝑡)𝛾(𝑠)⁻¹}|𝑠=𝑡=0 .
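Formula (5.14) can be checked numerically: taking 𝛾(𝑠) = 𝑒^{𝑠𝑋} and 𝜂(𝑡) = 𝑒^{𝑡𝑌}, the mixed second partial of 𝛾(𝑠)𝜂(𝑡)𝛾(𝑠)⁻¹ at the origin should reproduce the commutator. A sketch (assuming NumPy; the matrix exponential is summed directly from its series):

```python
import numpy as np

def expm(A, terms=25):
    """Matrix exponential via its power series (fine for small ||A||)."""
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

rng = np.random.default_rng(2)
n = 3
X, Y = rng.standard_normal((2, n, n))

# F(s, t) = gamma(s) eta(t) gamma(s)^{-1} with gamma(s) = e^{sX}, eta(t) = e^{tY}.
F = lambda s, t: expm(s * X) @ expm(t * Y) @ expm(-s * X)

# Mixed second partial at (0, 0) by central differences.
h = 1e-3
mixed = (F(h, h) - F(h, -h) - F(-h, h) + F(-h, -h)) / (4 * h**2)
assert np.allclose(mixed, X @ Y - Y @ X, atol=1e-4)
```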

Adjoint Action of 𝐺 on Its Lie Algebra 𝔤. The action of a Lie group on itself by inner automorphisms 𝑖𝑔(𝑥) = 𝑔𝑥𝑔⁻¹ (aka conjugations) plays a major role in Lie theory. The next result shows that the differentials (𝑑𝑖𝑔)𝑒 are linear operators on the Lie algebra 𝔤, and the resulting linear action 𝐺×𝔤 → 𝔤 is called the adjoint action of 𝐺 on the Lie algebra. The modern literature reflects this now universal terminology by writing the differential (𝑑𝑖𝑔)𝑒 of a conjugation as Ad𝑔 ∶ 𝔤 → 𝔤.

Lemma 5.58. An inner automorphism 𝑖𝑔 ∶ 𝑥 ↦ 𝑔𝑥𝑔⁻¹ fixes the identity element in 𝐺, and its differential Ad𝑔 = (𝑑𝑖𝑔)𝑒 leaves the Lie algebra 𝔤 = TG𝑒 invariant for every 𝑔 ∈ 𝐺, so Ad𝑔(𝔤) ⊆ 𝔤. Furthermore, we have

Ad𝑔(𝑌) = 𝑔𝑌𝑔⁻¹ (matrix product)

for all 𝑌 ∈ 𝔤, 𝑔 ∈ 𝐺.


Proof. If 𝑋 ∈ 𝔤, 𝑔 ∈ 𝐺, and 𝛾 is a 𝒞∞ curve in 𝐺 such that 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋, then 𝜂(𝑡) = 𝑖𝑔(𝛾(𝑡)) = 𝑔𝛾(𝑡)𝑔⁻¹ is a smooth curve in 𝐺 with 𝜂(0) = 𝐼 because products and inverses in 𝐺 are 𝒞∞ operations in 𝐺. By Lemma 5.54 we have 𝜂′(𝑡) = 𝑔𝛾′(𝑡)𝑔⁻¹, and setting 𝑡 = 0 we see that

𝜂′(0) = 𝑔𝑋𝑔⁻¹ (matrix product)

is always in 𝔤. By Part (3.) of Lemma 5.56 the inner automorphism 𝑖𝑔(𝑥) = 𝑔𝑥𝑔⁻¹ can be written 𝑖𝑔 = 𝜆𝑔 ∘ 𝜌𝑔⁻¹, hence

Ad𝑔(𝑋) = (𝑑𝑖𝑔)𝑒(𝑋) = (𝑑𝜆𝑔)𝑒 ∘ (𝑑𝜌𝑔⁻¹)𝑒(𝑋) = (𝑑𝜆𝑔)𝑒(𝑋 ⋅ 𝑔⁻¹) = 𝑔𝑋𝑔⁻¹,

as claimed. □
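Since Ad𝑔 acts on matrices by plain conjugation, its basic properties are easy to probe numerically. A sketch (assuming NumPy), working in GL(𝑛, ℝ) where every matrix belongs to the Lie algebra 𝔤𝔩(𝑛, ℝ) = M(𝑛, ℝ):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
Y = rng.standard_normal((n, n))          # an element of gl(n, R) = M(n, R)
g1 = np.eye(n) + np.triu(rng.standard_normal((n, n)), 1)   # unipotent, det = 1
g2 = np.eye(n) + np.tril(rng.standard_normal((n, n)), -1)  # unipotent, det = 1

# For matrix groups, Ad_g acts by conjugation: Ad_g(Y) = g Y g^{-1}.
Ad = lambda g, Z: g @ Z @ np.linalg.inv(g)

assert np.allclose(Ad(np.eye(n), Y), Y)                 # Ad_e = identity
assert np.allclose(Ad(g1 @ g2, Y), Ad(g1, Ad(g2, Y)))   # Ad_{g1 g2} = Ad_{g1} o Ad_{g2}
assert np.allclose(Ad(np.linalg.inv(g1), Ad(g1, Y)), Y) # (Ad_g)^{-1} = Ad_{g^{-1}}
```

These are exactly the identities of Exercise 5.59 below, checked on unipotent group elements chosen so that invertibility is guaranteed.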

Exercise 5.59. Verify that the adjoint action Ad ∶ 𝐺 × 𝔤 → 𝔤 of a matrix Lie group on its Lie algebra 𝔤 = TG𝑒 has the following properties.
(a) Ad𝑔 = (𝑑𝑖𝑔)𝑒 is an invertible linear operator on 𝔤.
(b) The map "Ad" that sends 𝑔 ∈ 𝐺 to the invertible linear operator Ad𝑔 = (𝑑𝑖𝑔)𝑒 on 𝔤 is a group homomorphism from 𝐺 → Aut(𝔤):

Ad𝑒 = id𝔤    Ad𝑔₁𝑔₂ = Ad𝑔₁ ∘ Ad𝑔₂    (Ad𝑔)⁻¹ = Ad𝑔⁻¹

for 𝑔₁, 𝑔₂ ∈ 𝐺.

Exercise 5.60. For a Lie algebra 𝔤 over any field of scalars 𝕂 we can define the linear map ad that sends 𝑋 ∈ 𝔤 to a linear operator ad𝑋 on 𝔤, letting

ad𝑋(𝑌) = [𝑋, 𝑌] for all 𝑋, 𝑌 in 𝔤.

(a) Prove that ad𝑋 is a Lie algebra derivation, so that

ad𝑋([𝑌, 𝑍]) = [ad𝑋(𝑌), 𝑍] + [𝑌, ad𝑋(𝑍)] for all 𝑋, 𝑌, 𝑍 ∈ 𝔤.

(b) Verify that this derivation property of ad𝑋 is equivalent to the Jacobi identity in the definition of Lie algebras.

We will have more to say about the adjoint action before long.

Lie Algebras of the Classical Groups. There is more to be done if we wish to understand particular Lie algebras 𝔤 and do effective calculations with them. We need concrete models of 𝔤, and vector bases whose pattern of structure constants is simple enough to reveal the properties of 𝔤. Using the facts just developed we will identify the matrices appearing in the Lie algebras of the classical matrix Lie groups, and in some low-dimensional examples we will produce vector bases with particularly illuminating patterns of structure constants.

When 𝐺 = GL(𝑛, 𝕂) there is nothing to do because GL(𝑛, 𝕂) is an open set that fills most of matrix space, and its Lie algebra 𝔤𝔩(𝑛, 𝕂) is all of M(𝑛, 𝕂) equipped with the associative commutator as its Lie bracket. The standard "matrix units" 𝐸𝑖𝑗, with (𝑘, ℓ) entry equal to 𝛿𝑖𝑘 ⋅ 𝛿𝑗ℓ (Kronecker deltas), are a 𝕂-basis for 𝔤𝔩(𝑛, 𝕂). The structure constants are given by the commutator relations

[𝐸𝑖𝑗, 𝐸𝑘ℓ] = 𝐸𝑖𝑗𝐸𝑘ℓ − 𝐸𝑘ℓ𝐸𝑖𝑗 = 𝛿𝑗𝑘𝐸𝑖ℓ − 𝛿𝑖ℓ𝐸𝑘𝑗.


This is admittedly a fairly complicated set of relations. It becomes a little more transparent if you notice that the pairs of addresses (𝑖𝑗), (𝑘ℓ) and (𝑘𝑗), (𝑖ℓ) lie at diametrically opposite corners of a rectangle in the 𝑛 × 𝑛 array of matrix addresses.

Example 5.61 (Special Linear Group SL(𝑛, 𝕂)). The Lie algebra of 𝐺 = SL(𝑛, 𝕂) = {𝐴 ∈ M(𝑛, 𝕂) ∶ det(𝐴) = 1} is

𝔰𝔩(𝑛, 𝕂) = {𝑋 ∈ M(𝑛, 𝕂) ∶ Tr(𝑋) = 0},

which has dim𝕂 𝔰𝔩(𝑛, 𝕂) = 𝑛² − 1 because it is the kernel of the 𝕂-linear trace function Tr ∶ 𝔤 → 𝕂.

Discussion. Let 𝛾(𝑡) be a 𝒞∞ curve in 𝐺 with 𝛾(0) = 𝐼𝑛×𝑛 and 𝛾′(0) = 𝑋 in the Lie algebra 𝔤. By definition of SL,

1 ≡ det(𝛾(𝑡)) = ∑_{𝜋∈𝑆𝑛} sgn(𝜋) ⋅ ∏_{𝑗=1}^{𝑛} 𝛾(𝑡)𝑗,𝜋(𝑗) for all 𝑡.

The product rule implies

0 = 𝑑/𝑑𝑡{det 𝛾(𝑡)} = ∑_{𝜋∈𝑆𝑛} sgn(𝜋) ⋅ ( ∑_{𝑘=1}^{𝑛} 𝛾(𝑡)₁,𝜋(1) ⋅ ⋯ ⋅ 𝛾′(𝑡)𝑘,𝜋(𝑘) ⋅ ⋯ ⋅ 𝛾(𝑡)𝑛,𝜋(𝑛) ).

Since 𝛾(0) = 𝐼𝑛×𝑛 when 𝑡 = 0, the 𝑘th term in ( ∑_{𝑘=1}^{𝑛} ⋯ ) is zero at 𝑡 = 0 unless 𝜋(𝑗) = 𝑗 for every 𝑗 ≠ 𝑘, and then 𝜋(𝑘) = 𝑘 too, so 𝜋 = id in 𝑆𝑛. We conclude that the entire sum over 𝑘 is zero, except for the term corresponding to 𝜋 = id in the outer sum over 𝑆𝑛. Since 𝛾′(0) = 𝑋 in M(𝑛, 𝕂) we get

0 = ∑_{𝑘=1}^{𝑛} 1 ⋅ ⋯ ⋅ 1 ⋅ 𝑥𝑘𝑘 ⋅ 1 ⋅ ⋯ ⋅ 1 = ∑_{𝑘=1}^{𝑛} 𝑥𝑘𝑘 = Tr(𝑋),

where 𝑥𝑖𝑗 = [𝛾′(0)]𝑖𝑗. Conversely, if 𝑋 ∈ M(𝑛, 𝕂) and Tr(𝑋) = 0, the matrix-exponential series

𝛾𝑋(𝑡) = 𝑒𝑡𝑋 = ∑_{𝑛=0}^{∞} (𝑡ⁿ/𝑛!) 𝑋ⁿ

converges in matrix space to a 𝒞∞ curve that passes through 𝐼 when 𝑡 = 0. Furthermore, the exponent law 𝑒^{(𝑠+𝑡)𝑋} = 𝑒𝑠𝑋 ⋅ 𝑒𝑡𝑋 implies that

(5.15) 𝑑/𝑑𝑡{𝑒𝑡𝑋} = 𝑋 ⋅ 𝑒𝑡𝑋 for all 𝑋 ∈ M(𝑛, 𝕂), 𝑡 ∈ ℝ,

so 𝛾𝑋′(0) = 𝑋. (Matrix exponentials 𝑒𝑡𝐴 are discussed in Sections 5.4 and 5.5 of LA I; see Application 5.5: Linear Systems of Differential Equations.) Finally, if 𝑋 ∈ M(𝑛, 𝕂) has trace Tr(𝑋) = 0, the exponentials 𝑒𝑡𝑋 remain within 𝐺 for all 𝑡 because

det(𝑒𝑡𝑋) = 𝑒^{𝑡⋅Tr(𝑋)} = 1

(see Lemma 5.62 below). We conclude that

𝔰𝔩(𝑛, 𝕂) = {𝑋 ∈ M(𝑛, 𝕂) ∶ Tr(𝑋) = 0} = {𝑋 ∈ M(𝑛, 𝕂) ∶ 𝑒𝑡𝑋 ∈ SL(𝑛, 𝕂) for all 𝑡 ∈ ℝ}.

When 𝕂 = ℂ, 𝐺 = SL(𝑛, ℂ) is a complex Lie group and 𝔰𝔩(𝑛, ℂ) is a Lie algebra over ℂ. Whether 𝕂 = ℝ or ℂ, dim𝕂(𝐺) = dim𝕂 𝔰𝔩(𝑛, 𝕂), and our identification of the Lie algebra immediately shows that dim𝕂 𝔰𝔩(𝑛, 𝕂) = 𝑛² − 1, being the kernel of the linear trace function Tr ∶ 𝔤 → 𝕂.

Lemma 5.62. The identity

(5.16) det(𝑒𝐴) = 𝑒^{Tr(𝐴)}

holds for 𝕂 = ℝ or ℂ, or for arbitrary 𝕂 if the characteristic polynomial 𝑝𝐴 splits over 𝕂.

Proof. If 𝑝𝐴 splits this follows because the Jordan canonical form of 𝐴 exists and is upper triangular. Its diagonal entries are the eigenvalues 𝜆𝑗 ∈ 𝕂, appearing according to their multiplicities 𝑚𝑗. By Corollary 1.53 we get

det(𝐴) = ∏_{𝑗=1}^{𝑟} 𝜆𝑗^{𝑚𝑗}  while  Tr(𝐴) = ∑_{𝑗=1}^{𝑟} 𝑚𝑗𝜆𝑗.

Then 𝑒𝐴 is (up to similarity) upper triangular with diagonal entries 𝑒^{𝜆𝑗}, each appearing 𝑚𝑗 times, so det(𝑒𝐴) = ∏_{𝑗=1}^{𝑟} 𝑒^{𝑚𝑗𝜆𝑗}. Meanwhile, 𝑒^{Tr(𝐴)} = 𝑒^{(𝑚₁𝜆₁+⋯+𝑚𝑟𝜆𝑟)}, and the identity follows.

When 𝕂 = ℝ equation (5.16) remains valid even if the characteristic polynomial does not split over ℝ. If 𝑇 = 𝐿𝐴 ∶ ℝⁿ → ℝⁿ for some 𝐴 in M(𝑛, ℝ), (5.16) follows by examining the Jordan canonical form of the complexification 𝑇ℂ on ℂⁿ and observing that

det(𝑇ℂ) = det(𝑇) = det(𝐴)  and  Tr(𝑇ℂ) = Tr(𝑇) = Tr(𝐴).

The formulas for det(𝐴) and Tr(𝐴) involve all the complex eigenvalues of the real matrix 𝐴, but the terms in (5.16) are real-valued owing to cancellations. □

Proposition 5.63 (Lie Algebras of Unitary Groups U(𝑛) and SU(𝑛)). When 𝐺 = U(𝑛) or SU(𝑛) the Lie algebras are

𝔲(𝑛) = {𝑋 ∈ M(𝑛, ℂ) ∶ 𝑋* = −𝑋}
𝔰𝔲(𝑛) = {𝑋 ∈ M(𝑛, ℂ) ∶ 𝑋* = −𝑋 and Tr(𝑋) = 0}.

These groups are real, not complex, Lie groups, even though they are carved out of M(𝑛, ℂ), and we have

dimℝ 𝔲(𝑛) = 𝑛²    dimℝ 𝔰𝔲(𝑛) = 𝑛² − 1.

Proof. For (⊆), let 𝛾(𝑡) be a 𝒞∞ curve in 𝐺 with 𝛾(0) = 𝐼, 𝛾′(0) = 𝑋 in 𝔤. Since

𝑑/𝑑𝑡{𝛾(𝑡)*} = (𝑑𝛾/𝑑𝑡)*


and

0 = 𝑑/𝑑𝑡{𝛾(𝑡)* ⋅ 𝛾(𝑡)} = 𝑑/𝑑𝑡{𝛾(𝑡)}* ⋅ 𝛾(𝑡) + 𝛾(𝑡)* ⋅ 𝑑/𝑑𝑡{𝛾(𝑡)},

we get

0 = 𝛾′(𝑡)*𝛾(𝑡) + 𝛾(𝑡)*𝛾′(𝑡)

for all small 𝑡. Setting 𝑡 = 0 gives 𝑋* = −𝑋, so 𝔲(𝑛) ⊆ {𝑋 ∶ 𝑋* = −𝑋}; similarly 𝔰𝔲(𝑛) ⊆ {𝑋 ∶ 𝑋* = −𝑋 and Tr(𝑋) = 0}. For the converse inclusion (⊇), if 𝑋* = −𝑋 and 𝛾(𝑡) = 𝑒𝑡𝑋 we have

𝛾(𝑡)* = 𝑒^{𝑡𝑋*} = 𝑒^{−𝑡𝑋} = 𝛾(𝑡)⁻¹ for all 𝑡,

so 𝛾(ℝ) ⊆ U(𝑛). Since 𝛾(𝑡) is 𝒞∞, 𝛾(0) = 𝐼, and 𝛾′(0) = 𝑋, the tangent vector 𝑋 is in 𝔲(𝑛), proving the inclusion (⊇). The same argument works for 𝔰𝔲(𝑛). □

Exercise 5.64. Review the preceding discussion to explain why
(a) The exponential curve 𝛾𝑋(𝑡) = 𝑒𝑡𝑋 remains within U(𝑛) if 𝑋 ∈ 𝔲(𝑛).
(b) A vector 𝑋 ∈ M(𝑛, ℂ) is in the Lie algebra 𝔲(𝑛) ⇔ 𝑒𝑡𝑋 is in U(𝑛) for all 𝑡 ∈ ℝ.

Example 5.65 (Complex Orthogonal Groups O(𝑛, ℂ) and SO(𝑛, ℂ)). The Lie algebras of O(𝑛, ℂ) and SO(𝑛, ℂ) are the same. Since det(𝐴) can only take the values ±1 on O(𝑛, ℂ), that group consists of two disjoint open subsets, the one containing the identity 𝐼 being SO(𝑛, ℂ). Thus we have

𝔰𝔬(𝑛, ℂ) = 𝔬(𝑛, ℂ) = {𝑋 ∈ M(𝑛, ℂ) ∶ 𝑋ᵀ = −𝑋}.

This automatically implies that the diagonal entries of 𝑋 are zero, so Tr(𝑋) = 0.

Discussion. If 𝛾(𝑠) is a smooth curve in SO(𝑛, ℂ) passing through 𝐼 when 𝑠 = 0, we have 𝛾(𝑠)𝛾(𝑠)ᵀ ≡ 𝐼. Applying 𝑑/𝑑𝑠 we get

𝛾′(𝑠)𝛾(𝑠)ᵀ + 𝛾(𝑠)𝛾′(𝑠)ᵀ = 0 for all 𝑠,

which implies that 𝑋ᵀ = −𝑋 when we set 𝑠 = 0. Conversely if 𝑋ᵀ = −𝑋, the exponential curve 𝛾(𝑠) = 𝑒𝑠𝑋 has 𝛾(0) = 𝐼, 𝛾′(0) = 𝑋, and remains confined within SO(𝑛, ℂ) for all 𝑠 because

𝛾(𝑠)ᵀ = (𝑒𝑠𝑋)ᵀ = 𝑒^{𝑠𝑋ᵀ} = 𝑒^{−𝑠𝑋} = 𝛾(𝑠)⁻¹.

Thus 𝛾(𝑠)𝛾(𝑠)ᵀ = 𝐼 for all 𝑠 ∈ ℝ. The subgroup SO(𝑛, ℂ) is a complex group with

dimℂ SO(𝑛, ℂ) = dimℂ 𝔰𝔬(𝑛, ℂ) = (1/2)(𝑛² − 𝑛),

but SO(𝑛, ℂ) can also be regarded as a Lie group over ℝ of twice the dimension, dimℝ 𝔰𝔬(𝑛, ℂ) = 𝑛² − 𝑛.

Example 5.66 (Real Orthogonal Groups O(𝑛) and SO(𝑛)). The groups O(𝑛) and SO(𝑛) are the real points in the corresponding complex groups,

O(𝑛) = O(𝑛, ℂ) ∩ (M(𝑛, ℝ) + 𝑖0)    SO(𝑛) = SO(𝑛, ℂ) ∩ (M(𝑛, ℝ) + 𝑖0).


They have the same Lie algebra because SO(𝑛) is an open subgroup in O(𝑛) (det(𝐴) varies continuously and det(𝐴) = ±1 on O(𝑛)). The same argument used in Example 5.65 shows that

𝔰𝔬(𝑛) = {𝑋 ∈ M(𝑛, ℝ) ∶ 𝑋ᵀ = −𝑋}

(and the trace Tr(𝑋) is automatically zero). It is also clear that

dimℝ SO(𝑛) = dimℝ 𝔰𝔬(𝑛) = (1/2)(𝑛² − 𝑛).

Example 5.67 (Symplectic Group Sp(2𝑛, 𝕂)). The Lie algebra of the symplectic group Sp(2𝑛, 𝕂) is

𝔰𝔭(2𝑛, 𝕂) = {𝑋 ∈ M(2𝑛, 𝕂) ∶ 𝑋ᵀ𝐽 + 𝐽𝑋 = 0},  where  𝐽 = [0, 𝐼𝑛×𝑛; −𝐼𝑛×𝑛, 0]₂ₙ×₂ₙ

and 𝐽ᵀ = −𝐽 = 𝐽⁻¹. The dimensions over 𝕂 = ℝ or ℂ are dim𝕂 𝔰𝔭(2𝑛, 𝕂) = 2𝑛² + 𝑛. Here 2𝑛 is needed because, as we saw in Chapter 3, only even-dimensional vector spaces can support nondegenerate skew-symmetric bilinear forms. Matrices in the Lie algebra 𝔰𝔭(2𝑛, 𝕂) have the block form

𝑋 = [𝐴, 𝐵; 𝐶, −𝐴ᵀ]₂ₙ×₂ₙ ,

where 𝐴, 𝐵, 𝐶 are arbitrary 𝑛 × 𝑛 matrices such that 𝐵ᵀ = 𝐵 and 𝐶ᵀ = 𝐶.

Discussion. The defining relation for 𝐴 ∈ Sp(2𝑛, 𝕂) is: 𝐴 is nonsingular and

𝐴ᵀ𝐽𝐴 = 𝐽  (or 𝐽𝐴𝐽⁻¹ = (𝐴ᵀ)⁻¹)

(recall Definition 3.39). Since det(𝐽) ≠ 0, it follows that (det(𝐴))² = 1, so det(𝐴) = ±1 for 𝐴 ∈ Sp(2𝑛, 𝕂), whether 𝕂 = ℝ or 𝕂 = ℂ. To prove the inclusion (⊆), consider a 𝒞∞ curve 𝛾(𝑡) in 𝐺 = Sp(2𝑛, 𝕂) such that 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋 in 𝔤. Differentiate the identity 𝛾(𝑡)ᵀ𝐽𝛾(𝑡) = 𝐽 to get

(𝛾′(𝑡))ᵀ𝐽𝛾(𝑡) + 𝛾(𝑡)ᵀ𝐽𝛾′(𝑡) = 0 ;

setting 𝑡 = 0 yields

𝑋ᵀ𝐽 + 𝐽𝑋 = 0  or  𝑋ᵀ𝐽 = −𝐽𝑋.

Notice that 𝑋ᵀ = −𝐽𝑋𝐽⁻¹ = 𝐽𝑋𝐽 since 𝐽⁻¹ = −𝐽, so

Tr(𝑋) = Tr(𝑋ᵀ) = Tr(𝐽𝑋𝐽) = Tr(𝑋𝐽²) = −Tr(𝑋).

Thus Tr(𝑋) = 0 automatically for elements of 𝔰𝔭(2𝑛, 𝕂). To prove the reverse inclusion (⊇), consider any 𝑋 such that 𝑋ᵀ𝐽 = −𝐽𝑋 (so 𝑋ᵀ = 𝐽𝑋𝐽 = −𝐽𝑋𝐽⁻¹). Then 𝛾(𝑠) = 𝑒𝑠𝑋 is a 𝒞∞ matrix-valued curve such that 𝛾(0) = 𝐼, 𝛾′(0) = 𝑋, and 𝛾(𝑠) remains within Sp(2𝑛, 𝕂) because

𝐽𝛾(𝑠)𝐽⁻¹ = 𝑒^{𝑠(𝐽𝑋𝐽⁻¹)} = 𝑒^{−𝑠𝑋ᵀ} = (𝑒^{−𝑠𝑋})ᵀ = (𝛾(𝑠)ᵀ)⁻¹ for all 𝑠 ∈ ℝ,

which is equivalent to 𝛾(𝑠)ᵀ𝐽𝛾(𝑠) = 𝐽.
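Both halves of Example 5.67 can be tested numerically. The following sketch (assuming NumPy) builds a random 𝑋 in the stated block form, checks the linear relation 𝑋ᵀ𝐽 + 𝐽𝑋 = 0, and verifies that 𝑒^{𝑠𝑋} satisfies the symplectic identity:

```python
import numpy as np

def expm(M, terms=30):
    """Matrix exponential by partial sums of its series (small ||M||)."""
    E, T = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

rng = np.random.default_rng(4)
n = 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n)); B = B + B.T   # symmetric
C = rng.standard_normal((n, n)); C = C + C.T   # symmetric
X = np.block([[A, B], [C, -A.T]])              # block form of Example 5.67

I = np.eye(n)
J = np.block([[np.zeros((n, n)), I], [-I, np.zeros((n, n))]])

# X is in sp(2n, R): X^T J + J X = 0.
assert np.allclose(X.T @ J + J @ X, 0)

# The one-parameter group gamma(s) = e^{sX} stays in Sp(2n, R).
s = 0.1
g = expm(s * X)
assert np.allclose(g.T @ J @ g, J, atol=1e-10)
```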


Exercise 5.68. Use the symmetry properties of matrices in 𝔰𝔭(2𝑛, 𝕂) to verify that matrices in these Lie algebras have the block form shown in Example 5.67. Verify that 𝑋 ∈ M(2𝑛, 𝕂) is in 𝔰𝔭(2𝑛, 𝕂) ⇔ 𝑒𝑡𝑋 ∈ Sp(2𝑛, 𝕂) for all 𝑡 ∈ 𝕂.

Some Lie Algebras Determined by Commutation Relations. Commutative Lie algebras are determined by the trivial relations [𝑋𝑖, 𝑋𝑗] = 0 for any basis. The simplest noncommutative example is the Lie algebra of the "𝑎𝑥 + 𝑏 group."

Example 5.69 (Lie Algebra of the "𝑎𝑥 + 𝑏" Group). The affine transformations 𝑇𝑎,𝑏(𝑥) = 𝑎𝑥 + 𝑏 (𝑎 ≠ 0) of the real line form a group of transformations under composition that is isomorphic to the matrix group

𝐺 = { [𝑎, 𝑏; 0, 1] ∶ 𝑎, 𝑏 ∈ ℝ, 𝑎 ≠ 0 }

under the map 𝜙([𝑎, 𝑏; 0, 1]) = 𝑇𝑎,𝑏 (for more details see the Additional Exercises for Section 5.3). It is not hard to see this is a two-dimensional matrix Lie group. 𝒞∞ curves through the identity 𝐼 that determine basis vectors in the tangent space 𝔤 = TG𝑒 are provided by the exponential functions 𝛾(𝑡) = 𝑒𝑡𝑋𝑘, where

𝑋₁ = [1, 0; 0, 0]  with  𝑒𝑡𝑋₁ = [𝑒ᵗ, 0; 0, 1]
𝑋₂ = [0, 1; 0, 0]  with  𝑒𝑡𝑋₂ = [1, 𝑡; 0, 1].

The tangent vectors 𝛾𝑘′(0) = 𝑋𝑘 span a two-dimensional vector space 𝔪 in M(2, ℝ); since [𝑋₁, 𝑋₂] = 𝑋₂ and the Lie bracket is bilinear, it is easy to check that commutators [𝐴, 𝐵] = 𝐴𝐵 − 𝐵𝐴 of arbitrary elements of 𝔪 are in 𝔪. Thus 𝔪 is a Lie algebra over ℝ, and is precisely the two-dimensional tangent space of the 𝑎𝑥 + 𝑏 group.

Note. Although the 𝑎𝑥 + 𝑏 group 𝐺 has been defined as a group of matrices, it is not a "matrix Lie group" in our sense because it is not a closed subset in matrix space M(2, ℝ) ≅ ℝ⁴, although it is a Lie group in the more general sense. For instance, the diagonal matrices 𝐴𝑛 = [1/𝑛, 0; 0, 1] ∈ 𝐺 converge in matrix space to 𝐵 = [0, 0; 0, 1], which is not invertible and not in 𝐺. All the classical groups are closed subgroups of matrices. ○

Exercise 5.70. In the Lie algebra 𝔤 of the 𝑎𝑥 + 𝑏 group, write

𝑋 = [1, 0; 0, 0]  and  𝑌 = [0, 1; 0, 0].


Prove that
(a) [𝑋, 𝑌] = 𝑌.
(b) [𝔪, 𝔪] ⊆ 𝔪 if 𝔪 = ℝ-span{𝑋, 𝑌}.
(c) 𝔪 is contained in the tangent space TG𝑒 of the 𝑎𝑥 + 𝑏 group.
(d) 𝔪 is equal to the Lie algebra of the 𝑎𝑥 + 𝑏 group.
(e) Compute the one-parameter subgroups 𝑒𝑠𝑋 and 𝑒𝑡𝑌 and their products 𝑒𝑠𝑋𝑒𝑡𝑌. Does every element 𝐴 in the 𝑎𝑥 + 𝑏 group have such a factorization? (What if det(𝐴) < 0?) Are such factorizations unique when det(𝐴) > 0?

Example 5.71 (The Standard Basis in 𝔰𝔩(2, ℝ)). The Lie algebra of SL(2, ℝ),

𝔰𝔩(2, ℝ) = { [𝑎, 𝑏; 𝑐, −𝑎] ∶ 𝑎, 𝑏, 𝑐 ∈ ℝ },

has a distinguished basis

𝐻 = [1, 0; 0, −1]    𝐸 = [0, 1; 0, 0]    𝐹 = [0, 0; 1, 0],

whose commutation relations

[𝐻, 𝐸] = 2𝐸    [𝐻, 𝐹] = −2𝐹    [𝐸, 𝐹] = 𝐻

are easily verified by direct calculation. It is clear that ad𝐻(𝑋) = [𝐻, 𝑋] leaves the two-dimensional subspace 𝔪 = ℝ-span{𝐸, 𝐹} invariant, and ad𝐻|𝔪 is diagonalizable with eigenvalues 𝜆± = ±2. The basis vectors are the derivatives 𝛾′(0) for the one-parameter subgroups in SL(2, ℝ)

𝑒𝑡𝐻 = [𝑒ᵗ, 0; 0, 𝑒⁻ᵗ]    𝑒𝑡𝐸 = [1, 𝑡; 0, 1]    𝑒𝑡𝐹 = [1, 0; 𝑡, 1],

which all lie in SL(2, ℝ). The matrices 𝐸, 𝐹, 𝐻 are obviously linearly independent and lie in 𝔰𝔩(2, ℝ), so they are a basis for this 3-dimensional Lie algebra.

Exercise 5.72 (Standard Basis in 𝔰𝔬(3)). Verify that:
(a) The matrices in M(3, ℝ)

𝐈 = [0, 0, 0; 0, 0, 1; 0, −1, 0]    𝐉 = [0, 0, 1; 0, 0, 0; −1, 0, 0]    𝐊 = [0, 1, 0; −1, 0, 0; 0, 0, 0]

satisfy the commutation relations

[𝐈, 𝐉] = 𝐊    [𝐉, 𝐊] = 𝐈    [𝐊, 𝐈] = 𝐉.

(b) Verify that 𝐈, 𝐉, 𝐊 are in the Lie algebra 𝔰𝔬(3) of the group SO(3) of orientation-preserving rotations about axes through the origin in ℝ³.
(c) Compute the one-parameter groups 𝑅₁(𝑡) = 𝑒𝑡𝐈, 𝑅₂(𝑡) = 𝑒𝑡𝐉, 𝑅₃(𝑡) = 𝑒𝑡𝐊 in "closed form," i.e. find explicit formulas for the series sums.
(d) The 𝑅𝑘(𝑡) are rotation operators in ℝ³. In each case find the axis of rotation ℓ𝑘 and the angle of rotation 𝜃𝑘(𝑡) when 𝑡 = 1.
Hint. For (c) you might revisit Euler's Theorem (Theorem 6.84 in LA I).


Exercise 5.73 (The Standard Basis in 𝔰𝔲(2)). Verify that:
(a) The matrices in M(2, ℂ)

𝐈′ = (1/2)[𝑖, 0; 0, −𝑖]    𝐉′ = (1/2)[0, 1; −1, 0]    𝐊′ = (1/2)[0, 𝑖; 𝑖, 0]

satisfy the commutation relations

[𝐈′, 𝐉′] = 𝐊′    [𝐉′, 𝐊′] = 𝐈′    [𝐊′, 𝐈′] = 𝐉′.

(b) These matrices are basis vectors for the Lie algebra 𝔰𝔲(2) of the special unitary group SU(2).
(c) Find explicit formulas for the matrices in the one-parameter subgroups 𝛾(𝑡) = 𝑒𝑡𝐴 of SU(2) for each of the matrices 𝐴 = 𝐈′, 𝐉′, 𝐊′.

Note. Our labeling of the basis vectors for 𝔰𝔬(3) and 𝔰𝔲(2) reveals that they have the same commutation relations and structure constants. This immediately implies there is a unique Lie algebra isomorphism between 𝔰𝔬(3) and 𝔰𝔲(2) that sends 𝐈 → 𝐈′, … , 𝐊 → 𝐊′. These commutation relations should look familiar. The same relations

𝐢 × 𝐣 = 𝐤    𝐣 × 𝐤 = 𝐢    𝐤 × 𝐢 = 𝐣

govern the calculus cross product operation 𝐚 × 𝐛 on ℝ³ when applied to the standard basis vectors 𝐢 = (1, 0, 0), 𝐣 = (0, 1, 0), 𝐤 = (0, 0, 1). This is not an accident. In classical and quantum mechanics the basis vectors 𝐈, 𝐉, 𝐊 in 𝔰𝔬(3) are often described as "infinitesimal generators" of rotations about the 𝑥-axis, 𝑦-axis, and 𝑧-axis in ℝ³. (See also the discussion of "Differential Forms and the Cross-Product" in Section 4.4 of these Notes.) ○

5.4. The Exponential Map for Matrix Lie Groups

For 𝕂 = ℝ or ℂ, a matrix Lie group 𝐺 is a closed subgroup in GL(𝑛, 𝕂) that is also a smooth differentiable manifold embedded in matrix space M(𝑛, 𝕂) ≅ 𝕂^{𝑛²}, with dimension 𝑑 = dim(𝐺) ≤ 𝑛². Its Lie algebra is the tangent space 𝔤 = TG𝑒 at the identity in 𝐺, and the other tangent spaces TG𝑔 are left translates

(𝑑𝜆𝑔)𝑒(𝔤) = 𝑔 ⋅ 𝔤 (matrix product).

The principal tools for understanding Lie groups 𝐺 and their Lie algebras 𝔤 are the exponential map exp ∶ 𝔤 → 𝐺 and the one-parameter subgroups 𝛾𝑋(𝑡) = exp(𝑡𝑋) it determines for elements 𝑋 ∈ 𝔤. Defining exp for general Lie groups would require a digression on integral curves (solution curves) for smooth left-invariant vector fields 𝑋̃ on 𝐺, essentially a matter of solving differential equations on the manifold, with 𝑋̃ playing the role of the differential operator. The exponential is simpler for matrix Lie groups, so they will be the focus of these Notes. For such groups, 𝔤 is a subspace in M(𝑛, 𝕂) and exp turns out to be the restriction to 𝔤 = TG𝑒 of the classical exponential map for matrices,

(5.17) Exp(𝐴) = 𝑒𝐴 = ∑_{𝑘=0}^{∞} (1/𝑘!) 𝐴ᵏ (𝐴 ∈ M(𝑛, 𝕂)).


Our whole discussion will rest on the fact that the subgroups 𝛾𝑋(𝑡) = 𝑒𝑡𝑋 generated by elements 𝑋 ∈ M(𝑛, 𝕂) remain within 𝐺 for all 𝑡 ∈ ℝ if and only if 𝑋 ∈ 𝔤. This was true for each of the classical groups, but for general matrix Lie groups it is not evident from the definition of 𝔤 = TG𝑒, which says: 𝑋 ∈ TG𝑒 ⇔ 𝑋 = 𝛾′(0) for some 𝒞∞ curve in M(𝑛, 𝕂) defined near 𝑡 = 0, such that 𝛾(0) = 𝐼 and 𝛾(𝑡) ∈ 𝐺 for all 𝑡. For any matrix 𝑋 the curves 𝛾𝑋(𝑡) = 𝑒𝑡𝑋 from ℝ into GL(𝑛, 𝕂) are obviously 𝒞∞ maps defined for all 𝑡 ∈ ℝ. Once we know they remain confined to 𝐺 when 𝑋 ∈ 𝔤, we can use Exp to impose "exponential coordinates" that identify points near the identity in 𝐺 with an open subset of the vector space 𝔤 of dimension 𝑑 = dim(𝐺). Our first task is to resolve this issue for Exp ∶ M(𝑛, 𝕂) → GL(𝑛, 𝕂), and then for the restrictions exp = Exp|𝔤 ∶ 𝔤 → 𝐺 for arbitrary matrix Lie groups.

One-Parameter Subgroups in Matrix Lie Groups. For 𝕂 = ℝ or ℂ, the exponential map (5.17) sends M(𝑛, 𝕂) into GL(𝑛, 𝕂), but is not surjective or one-to-one. That presents some problems when we try to define logarithms Log(𝐴) of matrices in GL(𝑛, 𝕂). As maps 𝕂^{𝑛²} → 𝕂, the matrix entries (𝑒𝐴)𝑖𝑗 are 𝒞∞ functions of the entries in 𝐴, so Exp is a 𝒞∞ map from M(𝑛, 𝕂) → M(𝑛, 𝕂) with range in GL(𝑛, 𝕂).⁴ Each matrix 𝑋 ∈ M(𝑛, 𝕂) determines a one-parameter subgroup

𝛾𝑋(𝑡) = Exp(𝑡𝑋) = 𝑒𝑡𝑋 (𝑡 ∈ ℝ).

This 𝒞∞ map is also a group homomorphism (ℝ, +) → (GL(𝑛, 𝕂), ⋅) because

𝛾𝑋(𝑠 + 𝑡) = 𝛾𝑋(𝑠) ⋅ 𝛾𝑋(𝑡)    𝛾𝑋(0) = 𝐼𝑛×𝑛 (the identity matrix),

by the exponent laws. Thus Exp maps radial lines ℝ ⋅ 𝑋 through the origin in matrix space to (nonlinear) one-dimensional subgroups in GL(𝑛, 𝕂) emanating from the identity element 𝐼. In Section 5.1 of LA I we saw that

𝑑𝛾𝑋/𝑑𝑡(𝑡) = 𝑋 ⋅ 𝛾𝑋(𝑡) (matrix product)

for all 𝑡, and in particular the tangent vector at 𝐼 = 𝛾𝑋(0) is 𝛾𝑋′(0) = 𝑋.

Exercise 5.74. Compute the one-parameter groups 𝛾𝑋(𝑡) = 𝑒𝑡𝑋 in closed form for the following matrices (evaluate the series sum).

(a) 𝐴₁ = [−1, 0; 0, −1]    (b) 𝐴₂ = [−1, 0; 0, 1]    (c) 𝐴₃ = [0, 1; −1, 0]
(d) 𝐴₄ = [0, 1; 1, 0]    (e) 𝐴₅ = [𝑖, 1; −1, 0]

where 𝑖 = √−1.

⁴ When we identify M(𝑛, ℝ) ≅ ℝ^{𝑛²} the matrix entries are actually real-analytic functions of the 𝑛² entries in 𝐴, with absolutely convergent multivariate Taylor series expansions at every base point.
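For parts (c) and (d) of Exercise 5.74, the closed forms can be anticipated from the algebraic identities 𝐴₃² = −𝐼 and 𝐴₄² = 𝐼, and checked against the series numerically. A sketch (assuming NumPy; 𝐴₅ has complex entries and is left to the reader):

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by partial sums of its series."""
    E, T = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

t = 0.6
A3 = np.array([[0., 1.], [-1., 0.]])
A4 = np.array([[0., 1.], [1., 0.]])

# A3^2 = -I, so even/odd powers collect into cosine and sine:
# e^{tA3} = [[cos t, sin t], [-sin t, cos t]], a rotation.
assert np.allclose(expm(t * A3),
                   [[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

# A4^2 = +I, so the same split gives hyperbolic functions:
# e^{tA4} = [[cosh t, sinh t], [sinh t, cosh t]].
assert np.allclose(expm(t * A4),
                   [[np.cosh(t), np.sinh(t)], [np.sinh(t), np.cosh(t)]])
```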


The Logarithm 𝐋𝐨𝐠(𝑨) of a Matrix. Recall that 𝑒^{𝐴+𝐵} = 𝑒𝐴 ⋅ 𝑒𝐵 for commuting matrices. The following complex power series are convergent on the domains indicated:

Exp(𝑧) = ∑_{𝑘=0}^{∞} (1/𝑘!) 𝑧ᵏ for all 𝑧 ∈ ℂ

Log(𝑧) = ∑_{𝑘=1}^{∞} ((−1)^{𝑘+1}/𝑘) (𝑧 − 1)ᵏ for |𝑧 − 1| < 1.

By grouping together even and odd powers in the series

Exp(𝑖𝑦) = 1 + 𝑖𝑦 − (1/2!)𝑦² − 𝑖(1/3!)𝑦³ + ⋯

for real 𝑦, we can verify the basic computational identity

𝑒^{𝑥+𝑖𝑦} = 𝑒ˣ ⋅ 𝑒^{𝑖𝑦} = 𝑒ˣ cos(𝑦) + 𝑖𝑒ˣ sin(𝑦) for 𝑥, 𝑦 ∈ ℝ.

Then 𝑒^{2𝜋𝑖𝑛} = 1 for all 𝑛, so the complex exponential function 𝑓(𝑧) = 𝑒ᶻ is periodic under translations by Δ𝑧 = 2𝜋𝑖.

These series are norm convergent when applied to matrices of sufficiently small norm. The series Exp(𝑋) converges for all 𝑋 in any norm, while Log(𝐴) converges when ‖𝐴 − 𝐼‖ < 1 for any norm such that ‖𝐴𝐵‖ ≤ ‖𝐴‖ ⋅ ‖𝐵‖, for instance the operator norm ‖𝐴‖ₒₚ introduced in Section 5.3 of LA I. By wrangling with absolute norm convergence of series one can verify that the matrix-valued versions of these series behave very much like their scalar-valued counterparts, with

(5.18) Log(Exp(𝑋)) = 𝑋 for ‖𝑋‖ < log 2
        Exp(Log(𝐴)) = 𝐴 for ‖𝐴 − 𝐼‖ < 1.

When 𝑋 is a nilpotent matrix the series Exp(𝑋) obviously reduces to a finite sum, and so does Log(𝐴) when 𝐴 = 𝐼 + 𝑁 is unipotent; the identities (5.18) then hold for all such 𝑋 and 𝐴, with no norm restrictions. This becomes important in dealing with Jordan forms of complex matrices, which consist of Jordan blocks 𝐵𝑘 = 𝜆𝑘𝐼 + 𝑁𝑘 on the diagonal, where 𝑁𝑘 is an elementary nilpotent matrix.

Example 5.75. We will show that:

1. When 𝜆 ≠ 0, the 𝑛 × 𝑛 Jordan block 𝐴 = 𝜆𝐼 + 𝑁 (the upper triangular matrix with 𝜆 on the diagonal and 1s on the superdiagonal) is the exponential of some complex matrix 𝑋.

2. The Jordan form can be used to conclude that every invertible complex matrix 𝐴 is 𝑒𝑋 for some 𝑋 in M(𝑛, ℂ), so Exp is surjective from M(𝑛, ℂ) to GL(𝑛, ℂ).


3. There are no solutions in M(𝑛, ℂ) to the equation Exp(𝑋) = 0𝑛×𝑛. Nor are there any matrix solutions to Exp(𝑋) = 𝑁 when 𝑁 is a nilpotent matrix.

Discussion. In Part 1, we have

𝜆𝐼 + 𝑁 = 𝜆 ⋅ (𝐼 + 𝑁/𝜆) = 𝑒^{log(𝜆)} ⋅ 𝑒^{Log(𝐼+𝑁/𝜆)}
  = 𝑒^{log(𝜆)} ⋅ Exp( 𝑁/𝜆 − (1/2)(𝑁/𝜆)² + ⋯ + ((−1)ⁿ/(𝑛−1))(𝑁/𝜆)ⁿ⁻¹ )
  = Exp( log(𝜆)𝐼 + ( ⋯ ) ),

so 𝜆𝐼 + 𝑁 is in the range of Exp. If 𝐴 ∈ GL(𝑛, ℂ), some conjugate 𝑆𝐴𝑆⁻¹ will be in Jordan form, consisting of Jordan blocks on the diagonal, 𝑆𝐴𝑆⁻¹ = diag(𝐵₁, … , 𝐵𝑟). Each block is 𝐵𝑘 = 𝜆𝑘𝐼 + 𝑁𝑘 with 𝑁𝑘 elementary nilpotent, and corresponds to an 𝐿𝐴-invariant subspace 𝐸𝑘 in ℂⁿ, with a particular choice of basis 𝔛𝑘. For the basis 𝔛 = 𝔛₁ ∪ ⋯ ∪ 𝔛𝑟 we have [𝐿𝐴]𝔛𝔛 = diag(𝐵₁, … , 𝐵𝑟). If dim(𝐸𝑘) = 𝑚𝑘, each 𝐵𝑘 = 𝑒^{𝐶𝑘} for some 𝑚𝑘 × 𝑚𝑘 matrix 𝐶𝑘, and if 𝐶 = diag(𝐶₁, … , 𝐶𝑟) we have

𝑒𝐶 = diag(𝑒^{𝐶₁}, … , 𝑒^{𝐶𝑟}) = diag(𝐵₁, … , 𝐵𝑟) = 𝑆𝐴𝑆⁻¹.

Since conjugation commutes with Exp, 𝐴 = 𝑆⁻¹𝑒𝐶𝑆 = 𝑒^{𝑆⁻¹𝐶𝑆} is in the range of Exp.

As for Item 3, if 𝑒𝑋 = 0 for some 𝑋 we would have 𝐼 = 𝑒𝑋 ⋅ 𝑒⁻𝑋 = 0, which is impossible. If 𝑁 is nilpotent with 𝑁ᵈ = 0 and 𝑁 = 𝑒𝑋, then 0 = 𝑁ᵈ = 𝑒^{𝑑𝑋}, and again no such matrix 𝑑 ⋅ 𝑋 or 𝑋 can exist. In particular, the claim proved in Part 1 cannot hold if 𝜆 = 0 (so that 𝐴 is an elementary nilpotent matrix). ○

Remark 5.76. Any complex number 𝑧 ≠ 0 has a logarithm 𝑤 = log(𝑧) such that 𝑒ʷ = 𝑧; in fact there are infinitely many, given by

log(𝑧) = log |𝑧| + 𝑖𝜃 + 2𝜋𝑖𝑛, 𝑛 ∈ ℤ,

where 𝑧 = |𝑧|𝑒^{𝑖𝜃} is the polar form of 𝑧. If 𝑧 = 𝑟 + 𝑖0 is real and negative, it cannot have a real logarithm because 𝑒ᵗ > 0 for all real 𝑡. If you re-examine the proof in Part 1 of 5.75 you will find that: if 𝜆 is nonzero and real, the only obstacle to finding a real matrix 𝑋 such that Exp(𝑋) = 𝜆𝐼 + 𝑁 arises when the logarithm log(𝜆) is nonreal, which happens for instance when 𝜆 = −1. The blocks in the Jordan form of any complex matrix 𝐴 have the form 𝐵 = 𝜆𝐼 or 𝐵 = 𝜆𝐼 + 𝑁, so the only obstacle to finding a real matrix 𝑋 such that 𝑒𝑋 = 𝐴 for a nonsingular matrix 𝐴 ∈ M(𝑛, ℝ) is the presence of at least one non-real eigenvalue when 𝐴 is viewed as a complex matrix, since that would produce a non-real constant term log(𝜆) in the solution. ○

Example 5.77. Understanding the map Exp ∶ M(𝑛, 𝕂) → GL(𝑛, 𝕂) for 𝕂 = ℂ or ℝ begins with finding all solutions of the matrix equation

Exp(𝑋) = 𝑒𝑋 = 𝐼𝑛×𝑛 (𝑋 ∈ M(𝑛, 𝕂)),


as well as the equation 𝑒𝑋 = 0𝑛×𝑛. The only solutions of 𝑒𝑋 = 𝐼 in M(𝑛, ℂ) are

𝑋 = diag(2𝜋𝑖𝑘₁, … , 2𝜋𝑖𝑘𝑟) for 𝑘₁, … , 𝑘𝑟 ∈ ℤ.

Real solutions (if any) can be read out of this by inspection.

Discussion. We get the Jordan form of 𝑋 by a similarity transformation 𝐽(𝑋) = 𝑆𝑋𝑆⁻¹; it too is a solution because 𝑆𝐼𝑆⁻¹ = 𝐼. But 𝐽(𝑋) = diag(𝐵₁, … , 𝐵𝑟) consists of Jordan blocks, and 𝑒^{𝐽(𝑋)} = diag(𝑒^{𝐵₁}, … , 𝑒^{𝐵𝑟}), so we are reduced to solving 𝑒𝐵 = 𝐼𝑑×𝑑 for a Jordan block 𝐵 = 𝜆𝐼 + 𝑁. Then we have

𝐼 = 𝑒𝐵 = 𝑒^𝜆𝐼 ⋅ 𝑒𝑁 = 𝑒^𝜆 ⋅ [1, 1/1!, 1/2!, ⋯, 1/(𝑑−1)!; 0, 1, 1/1!, ⋯, 1/(𝑑−2)!; ⋮, ⋱, ⋱, ⋮; 0, ⋯, 0, 1]𝑑×𝑑 .

This cannot happen unless 𝑑 = 1 and 𝜆 = 2𝜋𝑖𝑘 for some 𝑘 ∈ ℤ. Thus the blocks 𝐵𝑗 in a solution of 𝑒𝑋 = 𝐼 are all 1 × 1, but the entries 2𝜋𝑘𝑗√−1 may differ from cell to cell. The solutions are the diagonal matrices 𝑋 = diag(2𝜋𝑖𝑘₁, … , 2𝜋𝑖𝑘𝑟) with 𝑘𝑗 ∈ ℤ. ○
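The construction in Part 1 of Example 5.75 can be carried out numerically: since 𝑁 is nilpotent, Log(𝐼 + 𝑁/𝜆) is a finite sum, and exponentiating log(𝜆)𝐼 + Log(𝐼 + 𝑁/𝜆) should recover the Jordan block. A sketch (assuming NumPy):

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by partial sums of its series."""
    E, T = np.eye(M.shape[0], dtype=complex), np.eye(M.shape[0], dtype=complex)
    for k in range(1, terms):
        T = T @ M / k
        E = E + T
    return E

n, lam = 4, 2.0 + 1.0j
N = np.diag(np.ones(n - 1), 1)          # elementary nilpotent, N^n = 0
A = lam * np.eye(n) + N                 # Jordan block

# Log(I + N/lam) is a *finite* sum because N/lam is nilpotent.
M = N / lam
L = sum((-1) ** (k + 1) * np.linalg.matrix_power(M, k) / k for k in range(1, n))
X = np.log(lam) * np.eye(n) + L         # log of a complex scalar on the diagonal

assert np.allclose(expm(X), A, atol=1e-10)
```

Changing `lam` to a negative real number still works here because `np.log` then returns a complex logarithm, which is exactly the obstruction to real solutions discussed in Remark 5.76.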

Exercise 5.78. Use the JCF to prove that Exp ∶ M(𝑛, ℝ) → GL(𝑛, ℝ) need not be surjective by showing that

𝐴 = [−1, 1; 0, −1]

is not in the range of the exponential map.

Singularities of the Exponential Map on 𝐌(𝒏, ℂ). Next we show that the differential 𝑑(Exp)₀ of the exponential map from M(𝑛, 𝕂) into GL(𝑛, 𝕂) ⊆ 𝕂^{𝑛²} has nonzero Jacobian determinant at the origin in matrix space. Existence of a locally defined 𝒞∞ logarithm Log that inverts Exp near the zero matrix is then an immediate consequence of the inverse mapping theorem 5.10.

Proposition 5.79 (Local Inverses of Exp). There are open neighborhoods 𝑈 about the zero matrix in M(𝑛, 𝕂), and 𝑉 about the identity 𝐼𝑛×𝑛 in GL(𝑛, 𝕂), such that Exp ∶ 𝑈 → 𝑉 is bijective and 𝒞∞ in both directions. The locally defined inverse Log(𝐴) is defined on 𝑉 = Exp(𝑈), with Log(𝐼𝑛×𝑛) = 0𝑛×𝑛.

Proof. Since 𝑒𝑋 = ∑_{𝑘=0}^{∞} 𝑋ᵏ/𝑘! is absolutely norm convergent, it is easily verified that the matrix entries (𝑒𝑋)𝑖𝑗 are 𝒞∞ functions of the entries 𝑥𝑖𝑗 in 𝑋, so Exp is a 𝒞∞ map on matrix space. By convergence of the exponential series, simple norm estimates (as in Chapter 5 of LA I) show that

Exp(𝑋) = 𝐼 + 𝑋 + 𝑜(‖𝑋‖)  with  lim_{𝑋→0} 𝑜(‖𝑋‖)/‖𝑋‖ = 0,


so the differential (𝑑Exp)₀ at 𝑋 = 0 is the identity map 𝐼. Thus det(𝑑Exp)₀ = 1 and Exp is smoothly invertible near 𝑋 = 0 in M(𝑛, 𝕂) = 𝔤𝔩(𝑛, 𝕂) by Theorem 5.10. □

This is of course a local result about the behavior of Exp ∶ M(𝑛, 𝕂) → GL(𝑛, 𝕂) near 𝑋 = 0. But with further effort we can compute the differential of Exp at any base point in M(𝑛, 𝕂); see equation (5.22) below. Then (𝑑Exp)𝑋 is singular precisely at the base points 𝑋 where det(𝑑Exp)𝑋 = 0, and finding these singular points becomes a straightforward problem in linear algebra. In order to derive such formulas we need a few more basic facts about the maps Exp, Ad, and ad associated with a matrix Lie group 𝐺 and its Lie algebra 𝔤, so we digress momentarily to derive a crucial result relating vectors 𝑋 in the Lie algebra to one-parameter subgroups 𝛾𝑋(𝑡) = Exp(𝑡𝑋) in a matrix Lie group 𝐺. The analytic estimates involved are related to the calculus proof that

lim_{𝑛→∞} (1 + 𝑡/𝑛)ⁿ = 𝑒ᵗ.

Theorem 5.80. Let 𝐺 be a matrix Lie group and 𝛾(𝑡) a 𝒞∞ curve in 𝐺 defined on some interval |𝑡| < 𝑟. If 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋, so 𝑋 is in the Lie algebra 𝔤 = TG𝑒, then the one-parameter group

𝛾𝑋(𝑡) = Exp(𝑡𝑋) is equal to lim_{𝑘→∞} 𝛾(𝑡/𝑘)ᵏ,

and remains within 𝐺 for all 𝑡 ∈ ℝ.

Proof. The last part follows from the first because 𝐺 is a group and we have 𝛾𝑋(𝑠 + 𝑡) = 𝛾𝑋(𝑠) ⋅ 𝛾𝑋(𝑡). In fact, once 𝛾𝑋(−𝑟, 𝑟) ⊆ 𝐺, the exponent law implies that 𝛾𝑋(−2𝑟, 2𝑟) = 𝛾𝑋(−𝑟, 𝑟)² ⊆ 𝐺, etc., so 𝛾𝑋(ℝ) ⊆ 𝐺.

Now choose 𝛿 > 0 so that 𝛾(𝑡) ∈ 𝑉 (the neighborhood of 𝐼 in Proposition 5.79) for |𝑡| < 2𝛿; then 𝑍(𝑡) = Log(𝛾(𝑡)) is a 𝒞∞ curve with 𝑍(0) = 0. By the chain rule

𝑍′(0) = (𝑑Exp)₀⁻¹ ⋅ 𝛾′(0) = 𝐼 ⋅ 𝑋 = 𝑋,

so by Taylor's formula

𝑍(𝑡) = 𝑡𝑋 + 𝑂(𝑡²) as 𝑡 → 0.

For fixed 𝑋, if we replace 𝑡 by 𝑡/𝑘 for 𝑘 ∈ ℕ, the last identity becomes

𝑍(𝑡/𝑘) = (𝑡/𝑘)𝑋 + 𝑂(𝑡²/𝑘²),

hence there exists 𝜖 > 0 such that

𝑘𝑍(𝑡/𝑘) = 𝑡𝑋 + 𝑂(1/𝑘) for |𝑡| < 𝜖 and 𝑘 ∈ ℕ.

Therefore

𝛾(𝑡/𝑘)ᵏ = (Exp 𝑍(𝑡/𝑘))ᵏ = Exp(𝑘𝑍(𝑡/𝑘)) = Exp(𝑡𝑋 + 𝑂(1/𝑘))


5. MATRIX LIE GROUPS

is in 𝐺 for all 𝑘 and |𝑡| < 𝜖. Now let 𝑘 → ∞; by continuity of Exp we conclude that 𝑒^{𝑡𝑋} ∈ 𝐺 for |𝑡| < 𝜖. □

Exercise 5.81. Whether 𝕂 = ℝ or ℂ, prove that

    lim_{𝑛→∞} (𝐼 + 𝑋/𝑛)^𝑛 = 𝑒^𝑋   in GL(𝑛, 𝕂)

for any 𝑋 ∈ M(𝑛, 𝕂). Hint. The group GL(𝑛, 𝕂) with Lie algebra 𝔤𝔩(𝑛, 𝕂) = M(𝑛, 𝕂) is an open set in matrix space, so it contains all points 𝐴 near the identity 𝐼 (recall Proposition 5.41 and Exercise 5.43 of LA I). Now apply Theorem 5.80 to the curve 𝛾(𝑡) = 𝐼 + 𝑡𝑋 for small 𝑡 ∈ ℝ in GL.

When 𝕂 = ℂ an instructive alternative proof of the identity in Exercise 5.81 can be obtained using the JCF. When 𝑋 = diag(𝑎_1, …, 𝑎_𝑛) is diagonal the result follows immediately from the classical calculus result just mentioned. If 𝑋 is diagonalizable, with 𝑆𝑋𝑆^{−1} = 𝐷, then

    𝑆(𝐼 + 𝑋/𝑛)^𝑛 𝑆^{−1} = (𝐼 + 𝑆𝑋𝑆^{−1}/𝑛)^𝑛 = (𝐼 + 𝐷/𝑛)^𝑛 → 𝑒^𝐷   as 𝑛 → ∞.

Applying the inverse similarity transformation 𝐵 → 𝑆^{−1}𝐵𝑆 to both sides of this identity, we get

    (𝐼 + 𝑋/𝑛)^𝑛 → 𝑆^{−1}𝑒^𝐷𝑆 = 𝑒^𝑋   as 𝑛 → ∞.
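The limit in Exercise 5.81 is easy to probe numerically, including for a matrix that is not diagonalizable; here is a minimal sketch (NumPy; the helper `expm_series`, the test matrix, and the truncation level are our own illustrative choices):

```python
import numpy as np

def expm_series(A, terms=60):
    """Sum the exponential series I + A + A^2/2! + ... (ample terms for small A)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# A 2x2 Jordan block: not diagonalizable over C, so the JCF case is exercised.
X = np.array([[1.0, 1.0],
              [0.0, 1.0]])
n = 100_000
approx = np.linalg.matrix_power(np.eye(2) + X / n, n)
print(np.max(np.abs(approx - expm_series(X))))  # small, shrinking like 1/n
```

For this particular block one can also check against the closed form 𝑒^𝑋 = 𝑒 ⋅ (𝐼 + 𝑁), where 𝑁 is the nilpotent part.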

We leave it as an exercise for the reader to carry out the final step, in which 𝑋 is not diagonalizable over ℂ and the JCF must be invoked.

Relation Between Exp ∶ 𝔤𝔩 → GL and Its Restriction exp ∶ 𝔤 → 𝐺. Theorem 5.80 yields a precise description of the vectors 𝑋 ∈ M(𝑛, 𝕂) in the Lie algebra 𝔤 of a matrix Lie group 𝐺. The following corollary shows, for any such group, that Exp maps 𝔤 into 𝐺, so the restriction exp_𝐺 = Exp|_𝔤 (usually written as "exp" with 𝐺 and 𝔤 understood) is a well-defined map from 𝔤 → 𝐺 that can be regarded as the "intrinsic exponential map" on the Lie algebra of a matrix Lie group.

Corollary 5.82. For 𝕂 = ℝ or ℂ, let 𝐺 be a matrix Lie group in GL(𝑛, 𝕂). Then
1. The Lie algebra of 𝐺 in M(𝑛, 𝕂) is
   𝔤 = {𝑋 ∈ M(𝑛, 𝕂) ∶ Exp(𝑡𝑋) ∈ 𝐺 for all 𝑡 ∈ ℝ}.
2. The restricted exponential map exp = Exp|_𝔤 sends 𝔤 into 𝐺, and since (𝑑Exp)_0 is the identity map on 𝔤𝔩(𝑛, 𝕂) the differential (𝑑exp)_0 of its restriction to 𝔤 is the identity map on 𝔤. Thus exp ∶ 𝔤 → 𝐺 is a locally invertible 𝒞^∞ map whose inverse is a 𝒞^∞ map with respect to the manifold structure on 𝐺.


Proof. The Lie algebra 𝔤 can be identified with the manifold 𝑀 = ℝ^𝑚 (𝑚 = dim(𝐺)), and by Theorem 5.80 exp is a 𝒞^∞ map from 𝑀 into the manifold 𝐺 of the same dimension. We have noted that the differential (𝑑exp)_0 at the zero element in ℝ^𝑚 is the identity map (hence nonsingular), so by the inverse mapping Theorem 5.10 exp maps an open neighborhood 𝑈 ⊆ 𝑀 of zero bijectively to an open neighborhood 𝑉 of the identity element 𝑒 ∈ 𝐺. In particular, the inverse map log = (exp|_𝑈)^{−1} is a 𝒞^∞ map with respect to the manifold structures on 𝐺 and 𝑀 = ℝ^𝑚. □

Case Study: Nilpotent Lie Groups. In the next stages of Lie theory (beyond the scope of these Notes) the goal is to understand the structure of a Lie algebra 𝔤 and the various connected Lie groups associated with it. This begins with investigations of Lie subalgebras 𝔥 ⊆ 𝔤, vector subspaces that are closed under formation of commutators [𝑋, 𝑌], and so are Lie algebras in their own right. This can be a complicated business, but there is one venue in which the whole theory of Lie groups takes a particularly simple form: the study of nilpotent Lie groups and Lie algebras. In this subsection we give examples illustrating the particularly simple form many concepts take for these groups.

Every Lie algebra determines a sequence of subalgebras 𝔤 = 𝔤^{(1)} ⊇ 𝔤^{(2)} ⊇ 𝔤^{(3)} ⊇ ⋯, defined inductively by taking linear combinations of iterated commutators [𝑋_1, [𝑋_2, …, [𝑋_𝑟, 𝑋_{𝑟+1}] … ]] of vectors in 𝔤:

(5.19)   𝔤^{(1)} = 𝔤
         𝔤^{(2)} = [𝔤, 𝔤] = 𝕂-span{[𝑋, 𝑌] ∶ 𝑋, 𝑌 ∈ 𝔤}
         𝔤^{(3)} = [𝔤, 𝔤^{(2)}] = 𝕂-span{[𝑋, 𝑌] ∶ 𝑋 ∈ 𝔤, 𝑌 ∈ 𝔤^{(2)}}
         ⋮
         𝔤^{(𝑘+1)} = [𝔤, 𝔤^{(𝑘)}] = 𝕂-span{[𝑋, 𝑌] ∶ 𝑋 ∈ 𝔤, 𝑌 ∈ 𝔤^{(𝑘)}}
         ⋮

Definition 5.83. A Lie algebra over 𝕂 is nilpotent if this commutator series dies after finitely many steps, with

    𝔤 = 𝔤^{(1)} ⫌ 𝔤^{(2)} ⫌ ⋯ ⫌ 𝔤^{(𝑟)} ⫌ 𝔤^{(𝑟+1)} = {0}.

We leave it as an Additional Exercise for Section 5.4 to show that the subspaces [𝔤, 𝔤^{(𝑘)}] are decreasing and closed under formation of commutators [𝐴, 𝐵], so they are all Lie subalgebras of 𝔤; furthermore, when 𝔤 is nilpotent each ad_𝑋 ∶ 𝔤 → 𝔤 is a nilpotent operator. It is possible to have [𝔤, 𝔤] = 𝔤 at the first step, and then the terms in the commutator series are constant. (This is true for the Lie algebras 𝔤 = 𝔰𝔩(𝑛, ℝ), 𝔰𝔬(𝑛), 𝔲(𝑛) for 𝑛 ≥ 2, and many other classical groups.) Finally, we say that a connected Lie group 𝐺 is nilpotent if its Lie algebra 𝔤 is nilpotent.


The Heisenberg groups 𝐻_𝑛 are nilpotent Lie groups whose algebraic properties underlie the Heisenberg uncertainty principle for 𝑛-particle systems in quantum mechanics. They can be modeled as matrix Lie groups with dimensions dim_ℝ 𝐻_𝑛 = 2𝑛 + 1, but their basic features are clearly suggested by what happens when 𝑛 = 1 and dim_ℝ(𝐺) = 3, a case we now examine in detail.

Example 5.84 (The Heisenberg Group 𝐻_3). The lowest order Heisenberg group is the matrix group

          ( 1  𝑥  𝑧 )
    𝐺 = { ( 0  1  𝑦 ) ∶ 𝑥, 𝑦, 𝑧 ∈ ℝ }.
          ( 0  0  1 )

Although 𝐺 is embedded in a 9-dimensional matrix space, its dimension as a Lie group is dim(𝐺) = 3 because its elements are parametrized by the variables 𝑥, 𝑦, 𝑧. We shall indicate elements of 𝐺 by writing (𝑥, 𝑦, 𝑧) for the matrix displayed above, with 𝐼 = (0, 0, 0). The differentiable structure on 𝐺 is determined by a single chart (𝑈_𝛼, 𝑥_𝛼) with 𝑈_𝛼 = (all of 𝐺), 𝑉_𝛼 = 𝑥_𝛼(𝑈_𝛼) = ℝ³, and chart map 𝑥_𝛼((𝑥, 𝑦, 𝑧)) = (𝑥, 𝑦, 𝑧) in ℝ³.

The group multiplication law in 𝐺 is an easily computed polynomial in the parameters 𝑥, 𝑦, 𝑧,

    (𝑥, 𝑦, 𝑧) ⋅ (𝑥′, 𝑦′, 𝑧′) = (𝑥 + 𝑥′, 𝑦 + 𝑦′, 𝑧 + 𝑧′ + 𝑥𝑦′).

We leave it as an exercise to show that the inverse of an element in 𝐺 is also described by a polynomial, with (𝑥, 𝑦, 𝑧)^{−1} = (−𝑥, −𝑦, −𝑧 + 𝑥𝑦). In the Lie algebra 𝔤 we can take basis vectors

        ( 0  1  0 )         ( 0  0  0 )         ( 0  0  1 )
    𝑋 = ( 0  0  0 )     𝑌 = ( 0  0  1 )     𝑍 = ( 0  0  0 ),
        ( 0  0  0 )         ( 0  0  0 )         ( 0  0  0 )
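The multiplication law, the inverse formula, and the commutator of these basis matrices can all be verified directly (a quick NumPy sketch; the helper name `h` and the sample parameter values are our own choices):

```python
import numpy as np

def h(x, y, z):
    """The Heisenberg matrix denoted (x, y, z) in the text."""
    return np.array([[1.0, x, z],
                     [0.0, 1.0, y],
                     [0.0, 0.0, 1.0]])

# Multiplication law: (x,y,z)·(x',y',z') = (x+x', y+y', z+z'+x·y')
assert np.allclose(h(1, 2, 3) @ h(4, 5, 6), h(1 + 4, 2 + 5, 3 + 6 + 1 * 5))

# Inverse: (x,y,z)^{-1} = (-x, -y, -z + x·y)
assert np.allclose(h(1, 2, 3) @ h(-1, -2, -3 + 1 * 2), np.eye(3))

# Basis vectors of the Lie algebra; the matrix commutator gives X Y - Y X = Z
X = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
Y = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
Z = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
assert np.allclose(X @ Y - Y @ X, Z)
print("all Heisenberg identities check out")
```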

which are the derivatives 𝛾′(0) of the 𝒞^∞ curves

    𝛾_1(𝑡) = (𝑡, 0, 0),   𝛾_2(𝑡) = (0, 𝑡, 0),   𝛾_3(𝑡) = (0, 0, 𝑡)

in 𝐺. The one-parameter groups determined by these tangent vectors are

    𝛾_𝑋(𝑡) = 𝑒^{𝑡𝑋} = 𝐼 + 𝑡𝑋 + (𝑡²/2!)𝑋² + ⋯ = 𝐼 + 𝑡𝑋

since 𝑋² = 0; similarly

    𝛾_𝑌(𝑡) = 𝐼 + 𝑡𝑌 = (0, 𝑡, 0)   and   𝛾_𝑍(𝑡) = 𝐼 + 𝑡𝑍 = (0, 0, 𝑡).

From this we get the one nontrivial commutation relation [𝑋, 𝑌 ] = 𝑍 between the basis vectors 𝑋, 𝑌 , 𝑍 in 𝔤; the general commutator [𝐴, 𝐵] can be found using bilinearity of the Lie bracket. We prove [𝑋, 𝑌 ] = 𝑍 using the construction presented in Proposition 5.57.


Discussion. Consider the product of one-parameter subgroups

    𝑓(𝑠, 𝑡) = 𝛾_𝑋(𝑠)𝛾_𝑌(𝑡)𝛾_𝑋(𝑠)^{−1} = 𝑒^{𝑠𝑋}𝑒^{𝑡𝑌}𝑒^{−𝑠𝑋}

with 𝛾_𝑋′(0) = 𝑋, 𝛾_𝑌′(0) = 𝑌 in 𝔤. For any matrix Lie group, we have shown that the commutator [𝑋, 𝑌] of elements in 𝔤 is given by

    [𝑋, 𝑌] = ∂²𝑓/∂𝑠∂𝑡 |_{𝑠=𝑡=0}.

Applying ∂/∂𝑠 we get

    ∂𝑓/∂𝑠 (𝑠, 𝑡) = 𝑋𝑒^{𝑠𝑋} ⋅ 𝑒^{𝑡𝑌}𝑒^{−𝑠𝑋} + 𝑒^{𝑠𝑋}𝑒^{𝑡𝑌} ⋅ (−𝑋𝑒^{−𝑠𝑋}).

Setting 𝑠 = 0 this becomes 𝑋𝑒^{𝑡𝑌} − 𝑒^{𝑡𝑌}𝑋; then applying ∂/∂𝑡 and setting 𝑡 = 0 we get 𝑋𝑌 − 𝑌𝑋 = [𝑋, 𝑌]. But 𝑓(𝑠, 𝑡) can be written another way using the multiplication law in 𝐺,

    𝑓(𝑠, 𝑡) = (𝑠, 0, 0) ⋅ (0, 𝑡, 0) ⋅ (−𝑠, 0, 0) = (𝑠, 𝑡, 𝑠𝑡) ⋅ (−𝑠, 0, 0) = (0, 𝑡, 𝑠𝑡),

from which we get

    ∂𝑓/∂𝑠 (𝑠, 𝑡) = 𝑡𝑍   and   ∂²𝑓/∂𝑠∂𝑡 |_{𝑠=𝑡=0} = 𝑍.

Hence [𝑋, 𝑌] = 𝑍, [𝑌, 𝑋] = −𝑍, and all other structure constants are zero.

Exercise 5.85. Show that the exponential map Exp ∶ 𝔤 → 𝐺 has the form

                              ( 0  𝑎  𝑐 )   ( 1  𝑎  𝑐 + ½𝑎𝑏 )
    Exp(𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍) = Exp ( 0  0  𝑏 ) = ( 0  1  𝑏       ),
                              ( 0  0  0 )   ( 0  0  1       )

for 𝑎, 𝑏, 𝑐 ∈ ℝ. Note that 𝐴³ = 0 for the strictly upper triangular matrices in 𝔤, so the exponential series is finite.

Exercise 5.86. Show that the exponential map Exp ∶ 𝔤 → 𝐺 has a globally defined 𝒞^∞ inverse Log ∶ 𝐺 → 𝔤 given by

        ( 1  𝑥  𝑧 )   ( 0  𝑥  𝑧 − ½𝑥𝑦 )
    Log ( 0  1  𝑦 ) = ( 0  0  𝑦       ),
        ( 0  0  1 )   ( 0  0  0       )

for 𝑥, 𝑦, 𝑧 ∈ ℝ.

Finally we show how to compute the smooth vector field 𝐻̃ on 𝐺 determined by an arbitrary vector 𝐻 = 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍 in the Lie algebra 𝔤. With respect to the chart coordinates 𝑥_𝛼(𝑔) = (𝑥, 𝑦, 𝑧) the basis vectors are

    𝑋 = (∂/∂𝑥)|_𝑒, …, 𝑍 = (∂/∂𝑧)|_𝑒,

where 𝑒 = 𝐼.
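The closed forms asserted in Exercises 5.85 and 5.86 can be confirmed numerically, since all the series involved terminate (a sketch with arbitrary sample values of 𝑎, 𝑏, 𝑐):

```python
import numpy as np

a, b, c = 1.5, -2.0, 0.25
A = np.array([[0, a, c],
              [0, 0, b],
              [0, 0, 0]], dtype=float)

# A^3 = 0, so Exp(A) = I + A + A^2/2 exactly.
expA = np.eye(3) + A + A @ A / 2
predicted = np.array([[1, a, c + a * b / 2],
                      [0, 1, b],
                      [0, 0, 1]], dtype=float)
assert np.allclose(expA, predicted)

# Log(g) = (g - I) - (g - I)^2 / 2, again a finite series.
N = expA - np.eye(3)
logA = N - N @ N / 2
assert np.allclose(logA, A)   # Log inverts Exp globally on G
print("Exp/Log closed forms verified")
```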


Example 5.87. The left-invariant vector field 𝐻̃ on 𝐺 determined by a tangent vector 𝐻 = 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍 can be described in terms of the chart coordinates and the fields of basis vectors (∂/∂𝑥), …, (∂/∂𝑧) they induce throughout the chart domain 𝑈_𝛼 = 𝐺.

Discussion. Regarding tangent vectors in 𝔤 as derivations on the local algebra 𝒞^∞(𝑒) at the identity in 𝐺, 𝐻̃ becomes a partial differential operator on the space 𝒞^∞(𝐺) of smooth globally defined functions on 𝐺, and we have

    𝐻̃𝑓(𝑥, 𝑦, 𝑧) = ⟨𝐻̃_{(𝑥,𝑦,𝑧)}, 𝑓⟩ = ⟨(𝑑𝜆_{(𝑥,𝑦,𝑧)})_𝑒 𝐻, 𝑓⟩
                = ⟨𝐻, 𝑓 ∘ 𝜆_{(𝑥,𝑦,𝑧)}⟩   (by definition of (𝑑𝜆_𝑔)_𝑒 on 𝔤)
                = ⟨𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍, 𝑓 ∘ 𝜆_{(𝑥,𝑦,𝑧)}⟩
                = 𝑎 ⋅ ∂/∂𝑥 (𝑓 ∘ 𝜆_{(𝑥,𝑦,𝑧)})(0, 0, 0) + ⋯ + 𝑐 ⋅ ∂/∂𝑧 (𝑓 ∘ 𝜆_{(𝑥,𝑦,𝑧)})(0, 0, 0).

By the multiplication law we obtain

    (𝑓 ∘ 𝜆_{(𝑥,𝑦,𝑧)})(𝑥′, 𝑦′, 𝑧′) = 𝑓(𝑥 + 𝑥′, 𝑦 + 𝑦′, 𝑧 + 𝑧′ + 𝑥𝑦′).

Take partial derivatives (∂/∂𝑥′), …, (∂/∂𝑧′) of this expression and set 𝑥′ = 𝑦′ = 𝑧′ = 0 to get

    𝐻̃𝑓(𝑥, 𝑦, 𝑧) = 𝑎 ⋅ ∂𝑓/∂𝑥 (𝑥, 𝑦, 𝑧) + 𝑏 ⋅ ∂𝑓/∂𝑦 (𝑥, 𝑦, 𝑧) + (𝑏𝑥 + 𝑐) ⋅ ∂𝑓/∂𝑧 (𝑥, 𝑦, 𝑧)
                = ⟨(𝑎 ∂/∂𝑥 + 𝑏 ∂/∂𝑦 + (𝑏𝑥 + 𝑐) ∂/∂𝑧)|_{(𝑥,𝑦,𝑧)}, 𝑓⟩.

Conclusion: at any base point 𝑔 = (𝑥, 𝑦, 𝑧) ∈ 𝐺 the value of the left-invariant field 𝐻̃ is

    𝐻̃_𝑔 = 𝑎 (∂/∂𝑥)|_𝑔 + 𝑏 (∂/∂𝑦)|_𝑔 + (𝑏𝑥 + 𝑐)(∂/∂𝑧)|_𝑔.

The same ideas apply for any Lie group, but when several charts are needed to cover 𝐺 we would have to do this calculation for each chart. ○

The Maps Exp, Ad, ad Revisited. Almost everything we need to know about these maps for general matrix Lie groups follows from the corresponding properties of the general linear group 𝐺 = GL(𝑛, 𝕂) and its Lie algebra 𝔤 = 𝔤𝔩(𝑛, 𝕂), in which "Exp" is the classical matrix-exponential map

    Exp(𝑋) = 𝑒^𝑋 = ∑_{𝑛=0}^∞ 𝑋^𝑛/𝑛!   for 𝑋 ∈ M(𝑛, 𝕂).
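Before moving on, the conclusion of Example 5.87 can be spot-checked by finite differences: 𝐻̃𝑓(𝑔) should equal the derivative of 𝑓 along the curve 𝑔 ⋅ exp(𝑡𝐻) at 𝑡 = 0 (a numerical sketch; the test function, base point, and step size are our own choices):

```python
import numpy as np

# Heisenberg coordinates: (x, y, z)·(x', y', z') = (x+x', y+y', z+z'+x·y')
def mul(g, h):
    return (g[0] + h[0], g[1] + h[1], g[2] + h[2] + g[0] * h[1])

def f(p):                       # an arbitrary smooth test function on G
    x, y, z = p
    return np.sin(x) + x * y * z + z**2

a, b, c = 0.7, -1.3, 0.4        # the vector H = aX + bY + cZ
g = (1.0, 2.0, 3.0)
t = 1e-6

# d/dt f(g · exp(tH)) at t = 0, with exp(tH) = (ta, tb, tc + t^2·ab/2)
curve = lambda s: mul(g, (s * a, s * b, s * c + s**2 * a * b / 2))
lhs = (f(curve(t)) - f(curve(-t))) / (2 * t)

# Predicted value: a·f_x + b·f_y + (b·x + c)·f_z at g, via central differences
def partial(i):
    e = [0.0, 0.0, 0.0]; e[i] = t
    return (f((g[0] + e[0], g[1] + e[1], g[2] + e[2]))
          - f((g[0] - e[0], g[1] - e[1], g[2] - e[2]))) / (2 * t)

rhs = a * partial(0) + b * partial(1) + (b * g[0] + c) * partial(2)
print(abs(lhs - rhs))  # ~ 0 up to finite-difference error
```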

For matrix Lie groups 𝐺, • By definition, the map “ad” sends each 𝑋 ∈ 𝔤 to the linear operator ad𝑋 (𝑌 ) = [𝑋, 𝑌 ] = 𝑋𝑌 − 𝑌 𝑋 on the Lie algebra. These are all linear Lie derivations on 𝔤 in the sense that ad𝑋 ([𝑌 , 𝑍]) = [ad𝑋 (𝑌 ), 𝑍] + [𝑌 , ad𝑋 (𝑍)].
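This derivation property is the Jacobi identity in disguise, and can be spot-checked on random matrices (a minimal sketch; the random seed and the dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y, Z = (rng.standard_normal((3, 3)) for _ in range(3))

bracket = lambda A, B: A @ B - B @ A   # the matrix commutator [A, B]

def ad(A):
    return lambda B: bracket(A, B)     # ad_A(B) = [A, B]

lhs = ad(X)(bracket(Y, Z))
rhs = bracket(ad(X)(Y), Z) + bracket(Y, ad(X)(Z))
print(np.allclose(lhs, rhs))  # True
```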


• The map “Ad” sends each 𝑔 ∈ 𝐺 to the differential Ad𝑔 = (𝑑𝑖𝑔 )𝑒 ∶ 𝔤 → 𝔤 of the conjugation operator 𝑖𝑔 (𝑥) = 𝑔𝑥𝑔−1 . Since 𝑖𝑔 is an automorphism of 𝐺, the differential turns out to be an automorphism of the Lie algebra, with Ad𝑔 ([𝑋, 𝑌 ]) = [Ad𝑔 (𝑋), Ad𝑔 (𝑌 )]

for 𝑋, 𝑌 ∈ 𝔤, 𝑔 ∈ 𝐺.

The automorphisms of 𝔤 form a group Aut(𝔤) under composition of operators and the map Ad ∶ 𝑔 → Ad𝑔 is a group homomorphism from (𝐺, ⋅ ) → (Aut(𝔤), ∘). (Recall Exercise 5.59.) These definitions apply verbatim to arbitrary Lie groups. The first fact we need connects all three players in this saga, and pervades most discussion of Lie groups and their Lie algebras. The proof is based on Corollary 5.82, in which we showed that the restriction exp = Exp|𝔤 of the classical exponential map Exp to the Lie algebra 𝔤 ⊆ M(𝑛, 𝕂) of a matrix Lie group 𝐺 turns out to be the “intrinsic” exponential map exp ∶ 𝔤 → 𝐺 associated with 𝐺. To state the result in full generality we consider a 𝒞 ∞ homomorphism 𝜙 ∶ 𝐺 → 𝐻 between two matrix Lie groups: a 𝒞 ∞ map that is also a group homomorphism, so that 𝜙(𝑒𝐺 ) = 𝑒𝐻

and

𝜙(𝑥 ⋅𝐺 𝑦) = 𝜙(𝑥) ⋅𝐻 𝜙(𝑦)

for all 𝑥, 𝑦 ∈ 𝐺.

Theorem 5.88. For any 𝒞^∞ homomorphism 𝜙 ∶ 𝐺 → 𝐻 between matrix Lie groups we have

(5.20)   𝜙(exp_𝐺(𝑡𝑋)) = exp_𝐻(𝑡 (𝑑𝜙)_𝑒 𝑋) = 𝑒^{𝑡(𝑑𝜙)_𝑒 𝑋}   for 𝑋 ∈ 𝔤, 𝑡 ∈ ℝ,

where exp_𝐺 and exp_𝐻 are the exponential maps from 𝔤 → 𝐺 and 𝔥 → 𝐻.

Proof. For 𝑋 ∈ 𝔤 consider the 𝒞^∞ curves (one-parameter subgroups in 𝐻)

    𝛾_1(𝑡) = 𝜙(exp_𝐺(𝑡𝑋))   and   𝛾_2(𝑡) = exp_𝐻(𝑡 (𝑑𝜙)_𝑒 𝑋) = 𝑒^{𝑡(𝑑𝜙)_𝑒 𝑋},

which are group homomorphisms from (ℝ, +) into 𝐻 because exp((𝑠 + 𝑡)𝑋) = exp(𝑠𝑋) ⋅ exp(𝑡𝑋) for both exp_𝐺 and exp_𝐻. By Corollary 5.82 the curve 𝛾(𝑡) = exp_𝐺(𝑡𝑋) is 𝒞^∞ in 𝐺 with 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋 ∈ 𝔤. Viewing elements of 𝔥 as derivations on functions 𝑓 in the local algebra 𝒞_𝐻^∞(𝑒) at the identity in 𝐻, it follows from the definition of the differential (𝑑𝜙)_𝑒 and Lemma 5.44 that

    ⟨𝛾_2′(0), 𝑓⟩ = ⟨(𝑑𝜙)_𝑒 𝛾′(0), 𝑓⟩ = ⟨(𝑑𝜙)_𝑒 𝑋, 𝑓⟩ = ⟨𝛾′(0), 𝑓 ∘ 𝜙⟩
                = 𝑑/𝑑𝑡 {𝑓(𝜙(𝛾(𝑡)))}|_{𝑡=0}
                = 𝑑/𝑑𝑡 {𝑓(𝜙(exp_𝐺(𝑡𝑋)))}|_{𝑡=0}
                = ⟨𝑑/𝑑𝑡 {𝜙(exp_𝐺(𝑡𝑋))}|_{𝑡=0}, 𝑓⟩
                = ⟨𝛾_1′(0), 𝑓⟩


for all 𝑓 ∈ 𝒞_𝐻^∞(𝑒). Thus

    𝛾_2′(0) = (𝑑𝜙)_𝑒 𝑋 = 𝛾_1′(0).

The matrix Lie groups 𝐺, 𝐻 need not live in the same matrix space.⁵ Nevertheless, if 𝐺 ⊆ M(𝑛, ℝ) and 𝐻 ⊆ M(𝑚, ℝ) we may regard both one-parameter groups 𝛾_1(𝑡) and 𝛾_2(𝑡) in 𝐻 as one-parameter groups in matrix space M(𝑚, ℝ), with the same derivatives 𝛾_𝑘′(0) = (𝑑𝜙)_𝑒 𝑋 at 𝑡 = 0. Viewed this way, both curves 𝛾_𝑘(𝑡) are solutions 𝑋(𝑡) of the differential equation

    𝑑𝑋/𝑑𝑡 = 𝐵 ⋅ 𝑋(𝑡),  𝐵 = (𝑑𝜙)_𝑒 𝑋,   with initial condition 𝑋(0) = 𝐼_{𝑚×𝑚},

on the Euclidean space M(𝑚, ℝ) ≅ ℝ^{𝑚²}. We now invoke a fundamental existence theorem in the theory of systems of 𝑚 constant-coefficient ordinary differential equations. Recasting the system as a single differential equation involving vector-valued functions 𝑓(𝑡) with values in ℝ^𝑚, the existence theorem takes the form:

Theorem 5.89 (Fundamental Existence Theorem for ODE). Given a matrix 𝐵 ∈ M(𝑚, ℝ), the vector-valued solutions of the differential equation

    𝑑𝑋/𝑑𝑡 = 𝐵 ⋅ 𝑋(𝑡)   with initial value 𝑋(0) = 𝐜

are defined and 𝒞^∞ for all 𝑡 ∈ ℝ. The solution is unique once the initial value is specified, and is given by

    𝑋(𝑡) = 𝑒^{𝑡𝐵}(𝐜)   for −∞ < 𝑡 < ∞.
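Theorem 5.89 can be illustrated numerically by comparing 𝑒^{𝑡𝐵}(𝐜) with a naive forward-Euler integration of the system (a sketch; the matrix 𝐵, the step count, and the helper `expm_series` are our own choices):

```python
import numpy as np

def expm_series(A, terms=40):
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

B = np.array([[0.0, 1.0],
              [-1.0, 0.0]])          # a rotation generator
c = np.array([1.0, 0.0])
t_final, steps = 1.0, 100_000

# Forward Euler for dX/dt = B X(t), X(0) = c
X = c.copy()
dt = t_final / steps
for _ in range(steps):
    X = X + dt * (B @ X)

print(np.max(np.abs(X - expm_series(t_final * B) @ c)))  # shrinks like 1/steps
```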

Returning to the proof of Theorem 5.88, we may take 𝐵 = (𝑑𝜙)_𝑒 𝑋 and initial condition 𝑋(0) = 𝐼_{𝑚×𝑚} to conclude that

    𝛾_1(𝑡) = 𝛾_2(𝑡) = 𝑒^{𝑡(𝑑𝜙)_𝑒 𝑋} ⋅ 𝐼_{𝑚×𝑚} = Exp(𝑡 (𝑑𝜙)_𝑒 𝑋)   for all 𝑡,

so the identity (5.20) is verified. □

This yields a frequently invoked special case when we take 𝐺 = 𝐻 and for 𝜙 a typical inner automorphism 𝜙 = 𝑖_{exp_𝐺(𝑋)} for some 𝑋 ∈ 𝔤, so that (𝑑𝜙)_𝑒 = Ad_{exp_𝐺(𝑋)}. Writing ad_𝑋(𝑌) = [𝑋, 𝑌] as usual, we get:

Corollary 5.90. If 𝐺 is a matrix Lie group and 𝑋, 𝑌 ∈ 𝔤 we have

(5.21)   Ad_{exp(𝑡𝑋)}(𝑌) = 𝑒^{𝑡 ad_𝑋}(𝑌) = ∑_{𝑛=0}^∞ (𝑡^𝑛/𝑛!) (ad_𝑋)^𝑛(𝑌)   for all 𝑡 ∈ ℝ,

where "exp" is the restriction exp_𝐺 = Exp|_𝔤.

Proof. The first equality is immediate from Theorem 5.88; the second follows upon replacing 𝑒^{𝑡 ad_𝑋} by its power series expansion. □

⁵ The surrounding matrix spaces are irrelevant. The result is actually true as stated for 𝒞^∞ homomorphisms between general Lie groups 𝐺, 𝐻.
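For matrices, (5.21) says that 𝑒^{𝑡𝑋} 𝑌 𝑒^{−𝑡𝑋} = 𝑒^{𝑡 ad_𝑋}(𝑌), which is easy to test numerically (a sketch; the helper `expm_series`, the random data, and the truncation level are our own choices):

```python
import numpy as np

def expm_series(A, terms=60):
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

rng = np.random.default_rng(1)
X, Y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
t = 0.3

# Left side of (5.21): Ad_{exp(tX)}(Y) = e^{tX} Y e^{-tX}
lhs = expm_series(t * X) @ Y @ expm_series(-t * X)

# Right side: sum over n of (t^n/n!) (ad_X)^n (Y)
rhs, term = Y.copy(), Y.copy()
for n in range(1, 60):
    term = (t / n) * (X @ term - term @ X)   # applies (t/n)·ad_X to the previous term
    rhs = rhs + term

print(np.allclose(lhs, rhs))  # True
```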


This result is reflected in the following commutative diagram,

               exp
         𝔤 ─────────→ 𝐺
      ad │             │ Ad
         ↓     exp     ↓
    Der(𝔤) ─────────→ Aut(𝔤)

where Der(𝔤) is the space of derivations on 𝔤.

Lemma 5.91. If 𝐺 is a matrix Lie group then
1. The linear operators Ad_𝑔 = (𝑑𝑖_𝑔)_𝑒 from 𝔤 → 𝔤 are Lie algebra homomorphisms, so that

    Ad_𝑔([𝑋, 𝑌]) = [Ad_𝑔(𝑋), Ad_𝑔(𝑌)]

for all 𝑋, 𝑌 in 𝔤.

2. The map Ad that sends 𝑔 → Ad𝑔 is a group homomorphism from 𝐺 into the automorphism group Aut(𝔤) of its Lie algebra. Proof. For Part 2: Ad𝑒 = 𝐼 (the identity map on 𝔤) and we have 𝑖𝑔1 𝑔2 = 𝑖𝑔1 ∘ 𝑖𝑔2 . Then by Exercise 5.59 we get Ad𝑔1 𝑔2 = Ad𝑔1 ∘ Ad𝑔2 and (2.) follows. For Part 1 apply Lemma 5.58 in which we proved that Ad𝑔 (𝑋) = (𝑑𝑖𝑔 )𝑒 (𝑋) = 𝑔𝑋𝑔−1

(matrix product).

We also know the Lie bracket [𝔤, 𝔤] in 𝔤 is the associative commutator [𝑋, 𝑌 ] = 𝑋𝑌 −𝑌 𝑋 computed in the full matrix algebra M(𝑛, 𝕂). (Remember: 𝑋𝑌 and 𝑌 𝑋 are not elements of 𝔤, but this commutator is back in 𝔤 owing to cancellations hidden from view in the proof of Proposition 5.57.) Now compute in M(𝑛, 𝕂): if 𝑔 ∈ 𝐺 and 𝑋, 𝑌 ∈ 𝔤 we have Ad𝑔 [𝑋, 𝑌 ] = 𝑔[𝑋, 𝑌 ]𝑔−1 = 𝑔(𝑋𝑌 − 𝑌 𝑋)𝑔−1 = [𝑔𝑋𝑔−1 , 𝑔𝑌 𝑔−1 ] (after some cancellations 𝑔𝑔−1 = 𝑒) = [Ad𝑔 (𝑋), Ad𝑔 (𝑌 )].

□

When 𝐺 = (ℝ, +) the exponential map is Exp(𝑥) = 𝑒^𝑥, but the calculation is not so straightforward for 𝐺 = (ℝ^𝑛, +). First, our discussion of Exp was framed for matrix Lie groups, for which the usual formula Exp(𝐴) = 𝑒^𝐴 makes sense because 𝐺 is embedded in some matrix space M(𝑛, ℝ). For general Lie groups the definitions of the Lie algebra 𝔤 and exponential map exp ∶ 𝔤 → 𝐺 are quite a bit more subtle. So, to apply our previous discussion and define an exponential map to 𝐺 = ℝ^𝑛 we must first realize 𝐺 as a matrix group.

As an immediate consequence of equation (5.14), the bracket operation on the Lie algebra of any abelian matrix Lie group is trivial: [𝑋, 𝑌] = 0 for all 𝑋, 𝑌 ∈ 𝔤. Thus 𝔤 is just a copy of ℝ^𝑛 where 𝑛 = dim_ℝ(𝐺). Also, as indicated in Example 5.26, (ℝ², +) is isomorphic to the matrix Lie group

          ( 1  0  𝑡_1 )
    𝐺 = { ( 0  1  𝑡_2 ) ∶ 𝐭 = (𝑡_1, 𝑡_2) ∈ ℝ² }   in M(3, ℝ),
          ( 0  0  1   )

whose exponential map we shall calculate in the following exercise. Similar calculations apply to any ℝ^𝑛.


Exercise 5.92 (Calculating Exp for (𝐺, ⋅) ≅ (ℝ², +)). Define the matrices

        ( 0  0  1 )         ( 0  0  0 )
    𝑋 = ( 0  0  0 )     𝑌 = ( 0  0  1 ),
        ( 0  0  0 )         ( 0  0  0 )

so that a typical element in 𝐺 is 𝐼 + 𝑡_1𝑋 + 𝑡_2𝑌.
(a) Calculate the one-parameter groups 𝛾_𝑋(𝑡_1) = 𝑒^{𝑡_1𝑋}, 𝛾_𝑌(𝑡_2) = 𝑒^{𝑡_2𝑌}.
(b) Calculate the derivatives 𝛾_𝑋′(0), 𝛾_𝑌′(0) and explain why 𝑋, 𝑌 are basis vectors for the Lie algebra 𝔤 ⊆ M(3, ℝ).
(c) Verify that Exp(𝑡_1𝑋 + 𝑡_2𝑌) = Exp(𝑡_1𝑋) ⋅ Exp(𝑡_2𝑌) is equal to 𝐼 + 𝑡_1𝑋 + 𝑡_2𝑌 for 𝑡_1, 𝑡_2 ∈ ℝ, so exp_𝐺 = Exp|_𝔤 maps 𝔤 bijectively onto the hyperplane 𝑀 = 𝐼 + ℝ𝑋 + ℝ𝑌.
Hint. The associative product 𝑋𝑌 is zero.

We now extend Proposition 5.79 to identify all the singularities of the exponential map Exp on M(𝑛, 𝕂).

Proposition 5.93. The differential (𝑑Exp)_𝑋 of the exponential map at an arbitrary base point 𝑋 in M(𝑛, 𝕂) = 𝔤𝔩(𝑛, 𝕂) is

(5.22)   (𝑑Exp)_𝑋 = (𝑑𝜆_{Exp(𝑋)})_𝐼 ∘ ((𝐼 − 𝑒^{−ad_𝑋})/ad_𝑋) = (𝑑𝜆_{Exp(𝑋)})_𝐼 ∘ 𝑓(ad_𝑋),

where 𝑓(𝑧) is the analytic function of a complex variable

    𝑓(𝑧) = (1 − 𝑒^{−𝑧})/𝑧 = 1 − (1/2!)𝑧 + (1/3!)𝑧² − ⋯   (𝑓(0) = 1 by L'Hôpital),

which converges for all 𝑧 ∈ ℂ by the ratio test.
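Granting (5.22), the singular points of Exp can be probed numerically: the finite-difference Jacobian of Exp on M(2, ℝ) has determinant ≈ 1 at the origin (Proposition 5.79), but collapses at a base point whose ad-eigenvalues include ±2𝜋𝑖 (a sketch; the helpers and the specific base point are our own choices):

```python
import numpy as np

def expm_series(A, terms=60):
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def jacobian_of_exp(X, h=1e-5):
    """Finite-difference Jacobian of Exp: M(2,R) -> M(2,R), as a 4x4 matrix."""
    J = np.zeros((4, 4))
    for j in range(4):
        E = np.zeros(4); E[j] = h
        E = E.reshape(2, 2)
        J[:, j] = (expm_series(X + E) - expm_series(X - E)).ravel() / (2 * h)
    return J

print(np.linalg.det(jacobian_of_exp(np.zeros((2, 2)))))   # ~ 1 at the origin
X0 = np.array([[0.0, np.pi], [-np.pi, 0.0]])              # eigenvalues ±iπ, so ad_X0 has ±2πi
print(np.linalg.det(jacobian_of_exp(X0)))                 # ~ 0: a singular point of Exp
```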

Remark 5.94. Before we take up the proof, this identity needs some interpretation. The first factor in (5.22) is the differential of a left-translation operator 𝜆_𝑔, 𝑔 = Exp(𝑋), which is always a nonsingular linear operator on 𝔤𝔩(𝑛, 𝕂); as such it is irrelevant in deciding whether (𝑑Exp)_𝑋 is nonsingular, so we can ignore it. In the second factor, 𝑒^{−ad_𝑋} is a nonsingular linear operator on 𝔤𝔩(𝑛, 𝕂), but the difference 𝐼 − 𝑒^{−ad_𝑋} might have determinant zero for certain choices of 𝑋. By Proposition 5.40 of LA I the matrix-valued series obtained by setting 𝑧 = ad_𝑋 is absolutely norm convergent, so the factor 𝑓(ad_𝑋) always makes sense. If we write 𝐴 for 𝑓(ad_𝑋) ∶ 𝔤𝔩 → 𝔤𝔩 in (5.22), deciding whether (𝑑Exp)_𝑋 is singular for a given 𝑋 comes down to deciding whether 𝐴 is singular, which happens ⇔ 𝜆 = 0 is in the complex spectrum sp_ℂ(𝐴). By the spectral mapping theorem (Lemma 6.12 of LA I if 𝕂 = ℂ, or Theorem 1.62 if 𝕂 = ℝ), we have sp(𝑓(ad_𝑋)) = 𝑓(sp(ad_𝑋)). We conclude that the factor 𝐴 is singular ⇔ 𝑓(𝜆) = 0 for some complex eigenvalue 𝜆 of ad_𝑋 ∶ 𝔤 → 𝔤. The geometric problem of finding the singularities of


Exp ∶ 𝔤𝔩(𝑛, 𝕂) → GL(𝑛, 𝕂) has now been reduced to a straightforward problem in linear algebra. ○

Exercise 5.95. If 𝐴 ∈ M(𝑛, ℝ) is viewed as a complex matrix that happens to have all-real entries, it is quite possible that sp_ℂ(𝐴) ⫌ sp_ℝ(𝐴). Explain why 𝐴 is singular ⇔ 𝜆 = 0 is in sp_ℝ(𝐴) ⇔ 𝜆 = 0 is in sp_ℂ(𝐴).

Exercise 5.96. Explain why Exp ∶ M(𝑛, 𝕂) → GL(𝑛, 𝕂) has a smooth local inverse near 𝑋 unless ad_𝑋 has an eigenvalue of the form 𝜆 = 2𝜋𝑛𝑖 for 𝑛 = ±1, ±2, ….

We return to the task of proving Proposition 5.93.

Proof. For a smooth curve 𝑋(𝑡) in M(𝑛, 𝕂) define the function

    𝑌(𝑠, 𝑡) = Exp(−𝑠𝑋(𝑡)) ⋅ ∂/∂𝑡 {Exp(𝑠𝑋(𝑡))}   for 𝑠, 𝑡 ∈ ℝ.

Applying ∂/∂𝑠 (and abbreviating 𝑋(𝑡) = 𝑋 for the moment), we get

    ∂𝑌/∂𝑠 (𝑠, 𝑡) = (−𝑋)Exp(−𝑠𝑋) ⋅ ∂/∂𝑡 {Exp(𝑠𝑋)} + Exp(−𝑠𝑋) ⋅ ∂/∂𝑠 {∂/∂𝑡 {Exp(𝑠𝑋)}}.

Now write (∗∗∗) as a placeholder for the first term, and interchange the partial derivatives in 𝑠 and 𝑡 in the second, to get

    ∂𝑌/∂𝑠 = (∗∗∗) + Exp(−𝑠𝑋) ⋅ ∂/∂𝑡 {𝑋 ⋅ Exp(𝑠𝑋)}
          = (∗∗∗) + Exp(−𝑠𝑋) ⋅ [ (𝑑𝑋/𝑑𝑡) Exp(𝑠𝑋) + 𝑋 ⋅ ∂/∂𝑡 {Exp(𝑠𝑋)} ].

The term 𝑋 ⋅ Exp(−𝑠𝑋) ⋅ ∂/∂𝑡 {Exp(𝑠𝑋)} appearing here cancels the first term (∗∗∗), since 𝑋 commutes with Exp(−𝑠𝑋), leaving

    ∂𝑌/∂𝑠 = Exp(−𝑠𝑋) (𝑑𝑋/𝑑𝑡) Exp(𝑠𝑋) = Ad_{exp(−𝑠𝑋)}(𝑑𝑋/𝑑𝑡) = 𝑒^{−𝑠 ad_𝑋} ⋅ (𝑑𝑋/𝑑𝑡)   (by Lemma 5.58).

Since 𝑌(0, 𝑡) ≡ 0, when 𝑠 = 1 we see that

    Exp(−𝑋) ⋅ 𝑑/𝑑𝑡 {Exp(𝑋)} = 𝑌(1, 𝑡) = ∫_0^1 ∂𝑌/∂𝑠 (𝑠, 𝑡) 𝑑𝑠,

and since

    ∂𝑌/∂𝑠 = Exp(−𝑠 ad_𝑋) (𝑑𝑋/𝑑𝑡) = ∑_{𝑘=0}^∞ ((−1)^𝑘 𝑠^𝑘/𝑘!) (ad_𝑋)^𝑘 ⋅ 𝑑𝑋/𝑑𝑡,


we may integrate term-by-term from 𝑠 = 0 to 𝑠 = 1 to get

    Exp(−𝑋(𝑡)) ⋅ 𝑑/𝑑𝑡 {Exp(𝑋(𝑡))} = ∑_{𝑘=0}^∞ ((−1)^𝑘/(𝑘 + 1)!) (ad_{𝑋(𝑡)})^𝑘 ⋅ 𝑑𝑋/𝑑𝑡 = 𝑓(ad_{𝑋(𝑡)}) ⋅ 𝑑𝑋/𝑑𝑡.

Multiplying both sides on the left by Exp(𝑋(𝑡)) yields

    𝑑/𝑑𝑡 {Exp(𝑋(𝑡))} = Exp(𝑋(𝑡)) ⋅ 𝑓(ad_{𝑋(𝑡)}) ⋅ 𝑑𝑋/𝑑𝑡   for all 𝑡.

This is true for all smooth curves with 𝑋(0) = 𝑋; taking 𝑡 = 0 we get (5.22). (Recall that (𝑑𝜆_𝑔)_𝑒(𝐻) = 𝑔 ⋅ 𝐻 (matrix product) for all 𝑔 ∈ 𝐺, 𝐻 ∈ 𝔤.) □

As a historical note, the most commonly cited analytic proof of the CBH formula (discussed next) follows once equation (5.22) is in hand, see [2]; a different, almost completely algebraic proof that proceeds along the lines developed by Dynkin can be found in [3, 4].

Connected Lie Groups. A matrix Lie group 𝐺 becomes a metric space when equipped with the Euclidean distance function

    𝑑(𝐴, 𝐵) = (∑_{𝑖,𝑗=1}^𝑛 |𝐴_{𝑖𝑗} − 𝐵_{𝑖𝑗}|²)^{1/2}

it inherits from matrix space M(𝑛, 𝕂) ≅ 𝕂^{𝑛²}. A set 𝐴 ⊆ 𝐺 is an open set if, for every 𝑎 ∈ 𝐴, all 𝑥 ∈ 𝐺 sufficiently close to 𝑎 are also in 𝐴.

Definition 5.97 (Open Sets in 𝐺). A subset 𝐴 in a metric space 𝑋 is an open set if, for any 𝑎 ∈ 𝐴 there is a radius 𝑟(𝑎) > 0 such that 𝑑(𝑥, 𝑎) < 𝑟(𝑎) ⇒ 𝑥 ∈ 𝐴. It is a closed set if its complement 𝑋 ∼ 𝐴 is open.

That said, we can give a precise definition of what it means for a metric space 𝑋 to be connected. This is an important concept because some results on Lie groups apply only to groups that are connected (or nearly so). In the other direction we have discrete groups such as the integers (ℤ, +) whose points are isolated from one another; in this extreme disconnected situation there is no geometry to discuss and we are in the realm of pure algebraic group theory.

Definition 5.98. A metric space 𝑋 is disconnected if it contains nonempty open subsets 𝑈, 𝑉 such that 𝑋 = 𝑈 ∪ 𝑉 and 𝑈 ∩ 𝑉 = ∅. 𝑋 is connected if no such partition of 𝑋 into isolated open subsets is possible.

The metric space 𝑋 itself is open, and by default so is the empty set, because there are no elements in it for which the definition fails. In a metric space, saying "𝐴 is a closed set" is the same as saying that you cannot escape from 𝐴 by taking a limit lim_{𝑛→∞} 𝑎_𝑛 of points 𝑎_𝑛 ∈ 𝐴. It is not hard to verify two basic


properties of open sets in any metric space:
(i) Finite intersections 𝑈_1 ∩ ⋯ ∩ 𝑈_𝑛 of open sets are open.
(ii) An arbitrary union ⋃_{𝛼∈𝐼} 𝑈_𝛼 of open sets is open.

The matrix Lie groups we are concerned with are "locally Euclidean" by definition, and such spaces are connected if and only if they are "arcwise connected," which is easier to verify (and often obvious).

Definition 5.99. A metric space 𝑋 is arcwise connected if any two points 𝑎, 𝑏 ∈ 𝑋 can be joined by a continuous curve 𝛾 ∶ [0, 1] → 𝑋 such that 𝛾(0) = 𝑎 and 𝛾(1) = 𝑏. Of course the roles of 𝑎 and 𝑏 are interchangeable since the orientation-reversed curve −𝛾(𝑡) = 𝛾(1 − 𝑡) goes from 𝑏 to 𝑎 within 𝑋.

A little thought should convince you that a matrix Lie group 𝐺 is arcwise connected if and only if every element 𝑎 ∈ 𝐺 is arc-connected to the identity element 𝑒 ∈ 𝐺.

Lemma 5.100. A connected matrix Lie group 𝐺 is generated by any relatively open neighborhood 𝑈 of the identity element, no matter how small, in the sense that

    𝐺 = ⋃_{𝑛∈ℕ} 𝑈^𝑛,   where 𝑈^𝑛 = 𝑈 ⋅ ⋯ ⋅ 𝑈 (𝑛 factors).

In essence, if 𝐺 is a connected Lie group the nature of 𝐺 near the identity propagates to determine the nature of the whole group.

Proof (Sketch). The relatively open sets in 𝐺 mentioned here are those of the form 𝐴 = 𝐺 ∩ 𝑈 for some open set 𝑈 in the full matrix space M(𝑛, ℝ), just as open subsets of a hypersurface 𝑆 in ℝ^𝑛 are the intersections of 𝑆 with actual open subsets in ℝ^𝑛. (They are also the open sets in 𝐺 determined by the metric in matrix space, restricted to the subset 𝐺.) The proof proceeds in several steps:

• 𝑈^{−1} = {𝑢^{−1} ∶ 𝑢 ∈ 𝑈} is also open, so 𝑈 ∩ 𝑈^{−1} is a symmetric open neighborhood of 𝑒 in 𝐺. Thus it will suffice to show that 𝐺 = ⋃_{𝑛=1}^∞ 𝑈^𝑛 for any open neighborhood of 𝑒 such that 𝑈 = 𝑈^{−1}.

• It follows that the union 𝐺_0 = ⋃_{𝑛=1}^∞ 𝑈^𝑛 is closed under forming products 𝑥 ⋅ 𝑦 or inverses 𝑥^{−1} of its elements. Hence 𝐺_0 is the subgroup of 𝐺 generated by 𝑈 (the smallest subgroup containing the set 𝑈). We used the property 𝑈^{−1} = 𝑈 to ensure that 𝑥 ∈ 𝐺_0 ⇒ 𝑥^{−1} ∈ 𝐺_0.

• In matrix Lie groups the product 𝐴 ⋅ 𝐵 = {𝑎𝑏 ∶ 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵} of two open sets is again an open set, and so is the set of inverses 𝐴^{−1} = {𝑎^{−1} ∶ 𝑎 ∈ 𝐴}. This is an immediate consequence of the fact that the group multiplication operation 𝑃 ∶ 𝐺 × 𝐺 → 𝐺 is jointly continuous and the inversion operation 𝐽 ∶ 𝐺 → 𝐺 is bicontinuous.

• Thus all product sets 𝑈^𝑛 are open sets in 𝐺, as is their union 𝐺_0, which is now seen to be an open subgroup in the larger group 𝐺.


• For the endgame we observe that no connected matrix Lie group can have a proper open subgroup, which forces the desired conclusion that 𝐺_0 = 𝐺. To see why, note that left translations 𝜆_𝑔(𝑥) = 𝑔𝑥 on 𝐺 are bicontinuous maps, so they send open sets to open sets. If 𝐻 is an open subgroup in 𝐺 the cosets (translates) 𝑔𝐻 = 𝜆_𝑔(𝐻) are also open sets. We now assert that 𝐺 is a union of pairwise disjoint translates 𝑔_𝑖𝐻 ≠ 𝑔_𝑗𝐻 (with 𝑔_1 = 𝑒), which split 𝐺 into nonempty disjoint open sets

    𝐻_1 = 𝑒𝐻 = 𝐻   and   𝐻_2 = ⋃_{𝑔_𝑖 ≠ 𝑒} 𝑔_𝑖𝐻.

Unless 𝐺 = 𝐻 (in which case we're done), we get a contradiction to the hypothesized connectedness of 𝐺, and that will complete the proof of Lemma 5.100. The assertion follows immediately once we observe that

    In a matrix Lie group, left translates 𝑥𝐻 and 𝑦𝐻 of an open subgroup 𝐻 are either identical (𝑥𝐻 = 𝑦𝐻 as sets) or disjoint (𝑥𝐻 ∩ 𝑦𝐻 = ∅), so the distinct translates of 𝐻 partition 𝐺 into disjoint open pieces.

In fact, if 𝑥𝐻 and 𝑦𝐻 have nontrivial intersection we must have 𝑥ℎ_1 = 𝑦ℎ_2 for some ℎ_1, ℎ_2 ∈ 𝐻, and then 𝑦𝐻 = 𝑥(ℎ_1ℎ_2^{−1})𝐻 = 𝑥𝐻 because 𝐻 is a subgroup of 𝐺, ℎ_1ℎ_2^{−1} is in 𝐻, and ℎ𝐻 = 𝐻 for any ℎ ∈ 𝐻. So in this case 𝑥𝐻 = 𝑦𝐻, and otherwise 𝑥𝐻 and 𝑦𝐻 are disjoint as claimed. That completes the proof of Lemma 5.100. □

It is also worth noting (without going into the proof) that any Lie group 𝐺, connected or not, contains a largest connected open subgroup 𝐻, and 𝐺 is a disjoint union of its translates; 𝐻 is referred to as the "connected component of the identity element in 𝐺." To mention a few examples: 𝐺 = (ℝ^𝑛, +) and the circle group 𝐺 = (𝑆¹, ⋅) are arcwise connected, hence connected, and so are the unitary groups 𝐺 = U(𝑛) with 𝑛 ≥ 2. But the real orthogonal groups O(𝑛) for 𝑛 ≥ 3 have two connected components: the maximal connected open subgroup 𝐻 = SO(𝑛) of proper orthogonal transformations having det(𝐴) = +1, and a disjoint translate whose elements all have det(𝐴) = −1.

As we noted in Corollary 5.82, the restricted exponential map Exp|_𝔤 on a connected matrix group sends 𝔤 into 𝐺 and its range includes an open neighborhood 𝑈 of the identity in 𝐺, which in turn generates the entire connected group 𝐺.

The Campbell-Baker-Hausdorff Formula: Recovering 𝑮 from 𝔤. The Campbell-Baker-Hausdorff formula (CBH) was the culmination of efforts by several mathematicians at the beginning of the 20th century to show that when the


product operation 𝑥 ⋅ 𝑦 near the identity in 𝐺 is transferred via the exponential map to an operation 𝑋 ∗ 𝑌 on 𝔤,

    𝑋 ∗ 𝑌 = Log(Exp(𝑋) ⋅ Exp(𝑌))   for 𝑋, 𝑌 ∈ 𝔤,

the result can be written as an infinite series

    𝑋 ∗ 𝑌 = 𝑋 + 𝑌 + (1/2)[𝑋, 𝑌] + (1/12)[𝑋, [𝑋, 𝑌]] − (1/12)[𝑌, [𝑋, 𝑌]]
            − (1/48)[𝑌, [𝑋, [𝑋, 𝑌]]] − (1/48)[𝑋, [𝑌, [𝑋, 𝑌]]] + ⋯

that is absolutely norm convergent near zero in 𝔤, and whose terms involve only iterated Lie brackets. Baker, Campbell, and Hausdorff discussed recursive procedures for determining the series coefficients; it was only in 1947 that E.B. Dynkin found explicit combinatorial formulas for all terms. Thus the bracket operation in 𝔤 indeed determines the group structure of any connected Lie group 𝐺, up to a local isomorphism.

A full proof of CBH would take us beyond the scope of these Notes, but can be found in [2, 5, 6]. One approach can be sketched as follows. The Taylor expansion of Log(𝑧) about 𝑧 = 1 + 𝑖0 is

    Log(𝑧) = ∑_{𝑛=1}^∞ ((−1)^{𝑛+1}/𝑛)(𝑧 − 1)^𝑛   for |𝑧 − 1| < 1 in ℂ.
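As a concrete preview of where this is headed: in the Heisenberg Lie algebra of Example 5.84 all brackets beyond [𝑋, 𝑌] vanish, so the CBH series truncates and 𝑒^𝐴𝑒^𝐵 = 𝑒^{𝐴+𝐵+½[𝐴,𝐵]} holds exactly (a numerical sketch; the helper names and sample coefficients are our own):

```python
import numpy as np

def expm_series(A, terms=30):
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def heis(a, b, c):
    """aX + bY + cZ in the Heisenberg Lie algebra (strictly upper triangular)."""
    return np.array([[0, a, c],
                     [0, 0, b],
                     [0, 0, 0]], dtype=float)

A, B = heis(1.0, 2.0, 3.0), heis(-0.5, 4.0, 1.5)
bracket = A @ B - B @ A

# CBH truncates after the first bracket: all higher commutators vanish here.
lhs = expm_series(A) @ expm_series(B)
rhs = expm_series(A + B + bracket / 2)
print(np.allclose(lhs, rhs))  # True
```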

Replacing 𝑧 by any complex matrix with operator norm ‖𝐴‖_op < 1 yields an absolutely norm convergent sum. By formally multiplying the series expansions for 𝑒^𝑋 and 𝑒^𝑌 and solving for 𝑍 in 𝑒^𝑍 = 𝑒^𝑋 ⋅ 𝑒^𝑌 we get

    𝑍 = Log(𝑒^𝑋𝑒^𝑌) = ∑_{𝑛=1}^∞ ((−1)^{𝑛+1}/𝑛) (𝑒^𝑋𝑒^𝑌 − 𝐼)^𝑛

      = ∑_{𝑛>0} ((−1)^{𝑛+1}/𝑛) ∑_{𝑝_𝑖+𝑞_𝑖>0, 1≤𝑖≤𝑛} (𝑋^{𝑝_1}𝑌^{𝑞_1} ⋯ 𝑋^{𝑝_𝑛}𝑌^{𝑞_𝑛}) / ((∑_{𝑖=1}^𝑛 (𝑝_𝑖 + 𝑞_𝑖)) ⋅ ∏_{𝑗=1}^𝑛 𝑝_𝑗! 𝑞_𝑗!).

Dynkin's formula is obtained by showing that the expressions 𝑋^{𝑝_1} ⋯ 𝑌^{𝑞_𝑛} can be replaced by closely related expressions involving commutators

    [𝑋^{𝑝_1} ⋯ 𝑌^{𝑞_𝑛}] = ((ad_𝑋)^{𝑝_1}(ad_𝑌)^{𝑞_1} ⋯ (ad_𝑋)^{𝑝_𝑛}(ad_𝑌)^{𝑞_𝑛−1}) 𝑌,

where ad_𝐴(𝐵) = [𝐴, 𝐵] for 𝐴, 𝐵 ∈ 𝔤. Since [𝐴, 𝐴] = 0 a term is zero if 𝑞_𝑛 > 1, and it is zero (by definition) if 𝑞_𝑛 = 0 and 𝑝_𝑛 > 1. Absolute convergence of the series near (𝑋, 𝑌) = (0, 0) would also have to be addressed.

Theorem 5.101 (Campbell-Baker-Hausdorff Formula). Let 𝔤 be the Lie algebra of a matrix Lie group over ℝ and write 𝑋 ∗ 𝑌 = Log(𝑒^𝑋𝑒^𝑌) for 𝑋, 𝑌 in 𝔤. On a sufficiently small neighborhood of the origin we have an absolutely norm


convergent series expansion of the product operation in 𝐺:

(5.23)   𝑋 ∗ 𝑌 = Log(𝑒^𝑋𝑒^𝑌)
               = ∑_{𝑛>0} ((−1)^{𝑛+1}/𝑛) ∑_{𝑝_𝑖+𝑞_𝑖>0, 1≤𝑖≤𝑛} ((ad_𝑋)^{𝑝_1}(ad_𝑌)^{𝑞_1} ⋯ (ad_𝑋)^{𝑝_𝑛}(ad_𝑌)^{𝑞_𝑛−1}) 𝑌 / ((∑_{𝑖=1}^𝑛 (𝑝_𝑖 + 𝑞_𝑖)) ⋅ ∏_{𝑗=1}^𝑛 𝑝_𝑗! 𝑞_𝑗!),

where ad𝐴 (𝐵) = [𝐴, 𝐵]. Observe that this formula is “universal,” with the same terms no matter which Lie group we consider. In many situations we only need to know that such a series expansion exists, but there are many instances in which we want to know the exact form of the terms, particularly for nilpotent Lie groups, for which the CBH formula has only finitely may nonzero terms. Then 𝑋 ∗ 𝑌 is a multivariate polynomial in the variables 𝑋 and 𝑌 . 5.5. The Lie Correspondence: Lie Groups vs Lie Algebras Let 𝐺 be a connected 𝑑-dimensional matrix Lie group. We have mentioned that the Lie algebra 𝔤, a purely algebraic object, encodes almost all the information needed to reconstruct 𝐺. The obstacles to a complete reconstruction lie in the fact that Lie groups are geometric objects (manifolds) as well as algebraic objects (groups), and both aspects are important. The Lie algebra provides a good handle on the algebraic aspects, but the methods of differential geometry are needed to deal with large-scale geometric issues. For instance, we defined the Lie algebra as the tangent space 𝔤 = TG𝑒 at the identity element 𝑒, so vectors in 𝔤 are derivatives 𝑋 = 𝛾′ (0) along smooth curves in 𝐺 as they pass through 𝑒. These are determined entirely by the nature of 𝐺 in the immediate vicinity of 𝑒; the geometry of the group far from 𝑒 has no influence on the outcome. By comparison, algbraic properties near 𝑒 propagate throughout the group (at least if 𝐺 is connected) because, as mentioned earlier, 𝐺 is generated by any open neighborhood 𝑈 about 𝑒, no matter how small, in the sense that 𝐺 = ⋃𝑛∈ℕ 𝑈 𝑛 . Now consider what it means for Lie algebras or Lie groups to be “isomorphic.” Definition 5.102 (Isomorphism of Lie Groups vs Lie algebras). For Lie algebras and matrix Lie groups we say that: 1. Two Lie algebras are isomorphic, written as 𝔤 ≅ 𝔥, if there is a linear bijection 𝜙 ∶ 𝔤 → 𝔥 that intertwines the Lie algebra structures, with 𝜙([𝑋, 𝑌 ]𝔤 ) = [𝜙(𝑋), 𝜙(𝑌 )]𝔥

for 𝑋, 𝑌 ∈ 𝔤.

This equivalence relation divides all Lie algebras into isomorphism classes, and all members of a single class are regarded as being “the same.”
2. Two Lie groups are differentiably isomorphic, written 𝐺 ≅ 𝐻, if there is a bijection 𝜌 ∶ 𝐺 → 𝐻 that is: (i) a 𝒞∞ diffeomorphism (𝜌 and its
inverse are both 𝒞 ∞ maps), and (ii) 𝜌 is a group isomorphism, so that 𝜌(𝑥 ⋅𝐺 𝑦) = 𝜌(𝑥) ⋅𝐻 𝜌(𝑦)

for 𝑥, 𝑦 ∈ 𝐺.
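For a concrete instance of condition 2, readers who like to experiment can check a familiar example numerically: the exponential map is a differentiable group isomorphism from (ℝ, +) onto the multiplicative group of positive reals. The short Python sketch below is an illustration only, not part of the text.

```python
import math

# rho(x) = e^x is a C-infinity bijection from (R, +) onto (R_+, *)
# with C-infinity inverse log, and it converts sums into products:
#     rho(x + y) = rho(x) * rho(y).
def rho(x: float) -> float:
    return math.exp(x)

def rho_inv(u: float) -> float:
    return math.log(u)

# Spot-check the homomorphism property and bijectivity on a few samples.
for x, y in [(0.0, 0.0), (1.5, -2.25), (3.0, 0.5)]:
    assert abs(rho(x + y) - rho(x) * rho(y)) < 1e-12   # homomorphism
    assert abs(rho_inv(rho(x)) - x) < 1e-12            # invertibility
```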

By time-honored abuse of notation we often say that two Lie groups are “isomorphic” when 𝐺 ≅ 𝐻. This condition is sensitive to the large-scale geometries of 𝐺 and 𝐻, because of condition 2. It is easy to see that 𝐺 ≅ 𝐻 implies 𝔤 ≅ 𝔥. The following simple example shows why the reverse implication can fail. It is quite possible for 𝐺 and 𝐻 to be “locally isomorphic” near their identity elements (which guarantees that 𝔤 ≅ 𝔥), but have quite different large-scale geometries, so that they are not globally diffeomorphic Lie groups as in (2.).

Example 5.103 (Locally Isomorphic Lie Groups). The real line (ℝ, +) and the Euclidean spaces (ℝ𝑛 , +) are Lie groups whose manifold structures are determined by a single chart. The unit circle (𝑆1 , ⋅) = {𝑧 ∈ ℂ ∶ |𝑧| = 1} with complex multiplication as the group operation is also a Lie group, but two overlapping charts are required to define its manifold structure. By the exponent laws, the complex exponential function 𝜙(𝑡) = 𝑒2𝜋𝑖𝑡 = cos(2𝜋𝑡) + 𝑖 sin(2𝜋𝑡) is a periodic 𝒞∞ map such that 𝜙(𝑡 + 𝑛) = 𝜙(𝑡) for all 𝑛 ∈ ℤ that repeatedly wraps the line around the circle. It is also a group homomorphism from (ℝ, +) onto (𝑆1 , ⋅ ) because
𝜙(𝑠 + 𝑡) = 𝜙(𝑠) ⋅ 𝜙(𝑡) and 𝜙(0) = 1 + 𝑖0
for all 𝑠, 𝑡 ∈ ℝ. Although 𝜙 is not one-to-one, it is one-to-one for points near zero in ℝ, for instance on the interval 𝐼 = (−1/2, 1/2). Furthermore, for 𝑠, 𝑡 sufficiently close to the origin that 𝑠 + 𝑡 ∈ 𝐼 (say, with |𝑠|, |𝑡| < 1/4), 𝜙 behaves like a group isomorphism since it is one-to-one and 𝜙(𝑠 + 𝑡) = 𝜙(𝑠) ⋅ 𝜙(𝑡) for all such 𝑠, 𝑡. In other words, 𝜙 is a “local isomorphism” between these Lie groups.

This phenomenon is encountered quite often. We saw earlier that SO(𝑛, ℂ) and O(𝑛, ℂ) are locally isomorphic, as are the rotation group SO(3) on ℝ3 and the full group O(3) of linear rigid motions on ℝ3 (including reflections across planes through the origin as well as rotations). The case of ℝ vs 𝑆1 is more interesting since these groups are connected and not related by inclusion – for details see Additional Exercises to Section 5.2, Exercise #3.

This concept turns out to be just what is needed to capture the true relation between connected (matrix) Lie groups and their Lie algebras. We start with a precise definition, after which we can at last state the Lie Correspondence between (matrix) Lie groups and their Lie algebras.

Definition 5.104 (Local Isomorphisms). A local homomorphism between Lie groups, 𝜙 ∶ 𝐺 → 𝐻, is a 𝒞∞ map defined on an open neighborhood 𝑈 of the identity in 𝐺 such that
𝜙(𝑥 ⋅𝐺 𝑦) = 𝜙(𝑥) ⋅𝐻 𝜙(𝑦) in 𝐻 for all 𝑥, 𝑦 ∈ 𝑈 such that 𝑥 ⋅𝐺 𝑦 ∈ 𝑈,
or equivalently, 𝜙(𝑥 ⋅𝐺 𝑦) = 𝜙(𝑥) ⋅𝐻 𝜙(𝑦) for all 𝑥, 𝑦 near the identity in 𝐺. A local isomorphism is a bijective diffeomorphism between open neighborhoods 𝑈, 𝑉 of the identities in 𝐺 and 𝐻 such that
𝜙 ∶ 𝑈 → 𝑉

and

𝜓 = 𝜙−1 ∶ 𝑉 → 𝑈

are local homomorphisms.

This brings us to a fundamental result in Lie theory.

Theorem 5.105 (The Lie Correspondence). If 𝐺 and 𝐻 are connected matrix Lie groups with Lie algebras 𝔤 and 𝔥, the following statements are equivalent.
1. There is a diffeomorphism 𝜙 between open neighborhoods 𝑈, 𝑉 of the identities in 𝐺, 𝐻 (a bijective 𝒞∞ map with 𝒞∞ inverse) that is also a local isomorphism 𝜙 ∶ 𝐺 → 𝐻, so 𝜙(𝑥 ⋅𝐺 𝑦) = 𝜙(𝑥) ⋅𝐻 𝜙(𝑦) if 𝑥, 𝑦 ∈ 𝑈 and 𝑥 ⋅𝐺 𝑦 ∈ 𝑈.
2. There is a bijective linear isomorphism 𝑑𝜙 ∶ 𝔤 → 𝔥 between the Lie algebras that intertwines the Lie bracket operations, with 𝑑𝜙([𝑋, 𝑌 ]𝔤 ) = [𝑑𝜙(𝑋), 𝑑𝜙(𝑌 )]𝔥 for all 𝑋, 𝑌 ∈ 𝔤.
In short, 𝐺 and 𝐻 are locally isomorphic connected Lie groups if and only if they have algebraically isomorphic Lie algebras.

Exercise 5.106. Verify that the map 𝜙 ∶ (ℝ, +) → (𝑆1 , ⋅ ) given by 𝜙(𝑡) = 𝑒2𝜋𝑖𝑡 = cos(2𝜋𝑡) + 𝑖 sin(2𝜋𝑡) is a local isomorphism between these abelian Lie groups, but is not a global isomorphism.

Exercise 5.107. Explain why SO(𝑛, ℝ) and SL(𝑛, ℝ) cannot be locally isomorphic Lie groups.

Further Development of Lie Theory. This is as far as we can go with Lie theory in this brief chapter. We conclude by indicating some of the next steps in a more complete account.
1. Making the transition from matrix Lie groups to general Lie groups. This begins by elaborating an alternative model of the Lie algebra, mentioned earlier, in which the Lie algebra is modeled as the space 𝔤̃ = ℒ(𝐺) of smooth left-invariant vector fields 𝑋̃ on 𝐺. By interpreting these as smooth first-order partial differential operators on the manifold 𝐺 we can embed 𝔤̃ in a much larger structure Diff(𝐺), the associative algebra of all smooth partial differential operators on 𝐺 (of any finite order). This turns out to have many advantages.
2. Coordinate systems in Lie groups. Exponential coordinates and multiplicative coordinates (akin to rectilinear vs polar coordinates) can be defined, which allow computational flexibility in attacking problems.
3. The correspondence: (Lie subgroups in 𝐺) ↔ (Lie subalgebras in 𝔤). Many geometric subtleties arise in defining what should be meant by a “Lie subgroup” of a Lie group 𝐺. This reflects similar problems that arise in differential geometry when one defines “submanifolds” of a differentiable manifold 𝑀. When “Lie subgroups” are properly interpreted there is a bijective correspondence that matches structural features in 𝐺 with those in the Lie algebra 𝔤.
4. Resolving the ambiguities in the Lie correspondence. There is an elegant answer to the question: Which connected Lie groups 𝐺 have the same Lie algebra 𝔤, and what are the global relations between them? Every Lie algebra 𝔤 determines a unique “simply connected covering group” 𝐺̃, such that every connected Lie group 𝐺 with 𝔤 as its Lie algebra is a quotient 𝐺 ≅ 𝐺̃/𝐷 by a discrete subgroup 𝐷 ⊆ 𝐺̃. The subgroup 𝐷 encodes information of profound importance about the large-scale geometric structure of 𝐺. All connected Lie groups with the same Lie algebra 𝔤 are easily understood once the covering group 𝐺̃ and the discrete subgroup 𝐷 are known. The quotient map 𝜋 ∶ 𝐺̃ → 𝐺̃/𝐷 ≅ 𝐺 is a surjective local isomorphism between connected Lie groups.
5. Complete analysis of compact connected Lie groups. Using the tools mentioned here a complete analysis can be made of the algebraic and geometric structure of all compact connected Lie groups, which include many of the symmetry groups that govern interactions of fundamental particles in nuclear physics, as well as other “secret symmetries of the universe.”

Additional Exercises

Section 5.1. Matrix Groups and Implicit Function Theorem
1. True/False Questions (“True” if the statement is always true.)
(a) The map 𝜙(𝑡) = 𝑒2𝜋𝑖𝑡 = cos(2𝜋𝑡) + 𝑖 sin(2𝜋𝑡) from ℝ into ℂ ≅ ℝ2 is a 𝒞∞ map with rk(𝑑𝜙)𝑡 = 1 at all base points.
(b) The set of matrices
𝐶𝑡 = ( 1 𝑡
       0 1 )   (𝑡 ∈ ℝ),
when equipped with matrix multiplication (⋅), is a matrix group that is isomorphic to the additive group (ℝ, +) of real numbers.
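A quick numerical experiment (illustrative only, and not a substitute for a proof) makes statement (b) plausible: the matrices 𝐶𝑡 multiply by adding their parameters.

```python
import numpy as np

def C(t: float) -> np.ndarray:
    """The unipotent matrix C_t = [[1, t], [0, 1]]."""
    return np.array([[1.0, t], [0.0, 1.0]])

s, t = 0.7, -2.4
assert np.allclose(C(s) @ C(t), C(s + t))        # C_s C_t = C_{s+t}
assert np.allclose(np.linalg.inv(C(t)), C(-t))   # C_t^{-1} = C_{-t}
```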
(c) The set of matrices
𝐶𝐱 = ( 1 0 0 𝑥1
       0 1 0 𝑥2
       0 0 1 𝑥3
       0 0 0 1 )   (𝐱 ∈ ℝ3 ),
equipped with matrix multiplication (⋅), is a matrix group that is isomorphic to the additive group (ℝ3 , +).
(d) If we identify the complex plane ℂ with ℝ2 , the complex exponential map 𝜙(𝑧) = 𝑒𝑧 = 𝑒𝑥+𝑖𝑦 = 𝑒𝑥 (cos(𝑦) + 𝑖 sin(𝑦)), for 𝑧 = 𝑥 + 𝑖𝑦 ∈ ℂ, becomes a 𝒞∞ map from ℝ2 → ℝ2 whose rank rk(𝑑𝜙)𝑧 is equal to 2 at every base point 𝑧 ∈ ℂ.
(e) If 𝐺 is a matrix Lie group in M(𝑛, ℝ), then the “chart map”
𝑥𝛼 ∶ 𝐴 → (𝑎11 , … , 𝑎1𝑛 ; … ; 𝑎𝑛1 , … , 𝑎𝑛𝑛 ) ∈ ℝ𝑛²
that reads off the entries 𝑎𝑖𝑗 in 𝐴 imposes Cartesian coordinates on 𝐺 that make it into a differentiable manifold.
2. If we delete the origin 0 + 𝑖0 from the complex plane ℂ, complex multiplication 𝑃(𝑧, 𝑤) = 𝑧 ⋅ 𝑤 still makes sense in the “punctured plane” ℂ∗ = ℂ ∼ 0, even though the (+) operation does not (why?). Answer the following questions about the system (ℂ∗ , ⋅). Cartesian coordinates are imposed on ℂ∗ by the globally defined chart map.
(a) Show that (ℂ∗ , ⋅) is a group because
• It is closed under formation of products 𝑧 ⋅ 𝑤 for 𝑧, 𝑤 ∈ ℂ∗ .
• It has a multiplicative identity element 𝑒 such that 𝑒 ⋅ 𝑧 = 𝑧 for all 𝑧 ∈ ℂ∗ .
• For every 𝑧 ∈ ℂ∗ there is a multiplicative inverse 𝑤 = 𝑧−1 such that 𝑧 ⋅ 𝑤 = 𝑤 ⋅ 𝑧 = 𝑒.
Find 𝑧−1 if 𝑧 = 4 − 5𝑖.
(b) If we let ℝ2∗ = ℝ2 ∼ {(0, 0)}, multiplication 𝑧 ⋅ 𝑤 and the inversion map 𝐽(𝑧) = 1/𝑧 on ℂ∗ become 𝒞∞ maps from ℝ2∗ × ℝ2∗ → ℝ2∗ , or from ℝ2∗ → ℝ2∗ , when described in the coordinates (𝑥, 𝑦) on ℝ2∗ .
3. The 3-dimensional Heisenberg group is the set of upper triangular matrices 𝐺 in M(3, ℝ):
𝐴(𝑥, 𝑦, 𝑧) = ( 1 𝑥 𝑧
              0 1 𝑦
              0 0 1 )   for 𝑥, 𝑦, 𝑧 ∈ ℝ.
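Before working the parts below by hand, it can help to multiply two such matrices numerically. This hypothetical NumPy sketch (not part of the text) suggests the pattern of the product entries:

```python
import numpy as np

def A(x: float, y: float, z: float) -> np.ndarray:
    """Heisenberg matrix A(x, y, z) with 1's on the diagonal."""
    return np.array([[1.0, x, z],
                     [0.0, 1.0, y],
                     [0.0, 0.0, 1.0]])

x, y, z = 1.0, 2.0, 3.0
xp, yp, zp = -0.5, 4.0, 0.25
P = A(x, y, z) @ A(xp, yp, zp)
# The product is again in G, with a "twisted" corner entry z + z' + x*y'.
assert np.allclose(P, A(x + xp, y + yp, z + zp + x * yp))
```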

(a) Verify that 𝐺 is closed under formation of matrix products and compute the entries 𝑥″ , 𝑦″ , 𝑧″ in the product of matrices 𝐴(𝑥″ , 𝑦″ , 𝑧″ ) = 𝐴(𝑥, 𝑦, 𝑧) ⋅ 𝐴(𝑥′ , 𝑦′ , 𝑧′ )
and in the inverse matrix 𝐴−1 . (Since 𝐺 obviously contains the identity matrix 𝐼3×3 , it is then a group under matrix multiplication.)
(b) Viewing matrices 𝐴 ∈ M(3, ℝ) as vectors in ℝ9 , 𝐴 = (𝑎11 , … , 𝑎13 , … , 𝑎31 , … , 𝑎33 ), how would you describe 𝐺 as a level set 𝐿𝜙(𝐱)=𝐜 in ℝ9 for a suitably chosen smooth mapping 𝜙 ∶ ℝ9 → ℝ𝑟 , and for what values of 𝑟 > 0 and 𝐜?
(c) Write out the 𝑟 × 9 Jacobian matrix of 𝜙,
𝐷𝜙(𝐴) = ( 𝜕𝜙𝑘 /𝜕𝑎𝑖𝑗 ),
at the identity element 𝐼3×3 in 𝐺 and at an arbitrary base point 𝐴 = 𝐴(𝑥, 𝑦, 𝑧) in 𝐺.
(d) Is 𝐺 a smooth hypersurface (differentiable manifold) in ℝ9 , and if so, of what dimension? Explain.

Section 5.2. Matrix Lie Groups
1. True/False Questions (“True” if the statement is always true.)
(a) If 𝐺 is a matrix Lie group in M(𝑛, ℂ), it is not a complex Lie group unless dimℝ (𝐺) is even.
(b) A matrix Lie group 𝐺 ⊆ M(𝑛, ℂ) is a complex Lie group if its dimension dimℝ (𝐺) is even.
(c) The additive complex numbers (ℂ, +) can be realized as a matrix group in M(2, ℂ) because the set of matrices
𝐺′ = { ( 1 𝑧
        0 1 ) ∶ 𝑧 ∈ ℂ } ,
equipped with matrix multiplication (⋅), is isomorphic to (ℂ, +).
2. If 𝑋 is in the tangent space 𝔤 = TG𝑒 of a matrix Lie group 𝐺, prove that the globally defined vector field with 𝑋̃𝑔 = (𝑑𝜆𝑔 )𝑒 𝑋 in TG𝑔 actually satisfies the left-invariance condition
𝑋̃𝑔⋅𝑝 = (𝑑𝜆𝑔 )𝑝 (𝑋̃𝑝 )
for all base points 𝑝 and group elements 𝑔. Hint. Tangent vectors 𝑋𝑝 , 𝑌𝑝 ∈ TM𝑝 are equal if they have the same action on the local algebra 𝒞∞ (𝑝).

The Circle Group 𝑆1 . The unit circle 𝑆1 = {𝑧 ∈ ℂ ∶ |𝑧| = 1} in the complex plane is a group under complex multiplication (⋅) since |𝑧 ⋅ 𝑤| = |𝑧| ⋅ |𝑤| = 1 and |1/𝑧| = 1. It can also be viewed as the unit circle 𝐾 = {𝐱 ∈ ℝ2 ∶ 𝜙(𝐱) = 𝑥2 + 𝑦2 = 1}. By the IFT, this is a smooth 1-dimensional manifold in ℝ2 , but from the latter
point of view, its group structure is less apparent. Carry out the following steps to show 𝐺 = (𝑆1 , ⋅ ) is an abelian (commutative) Lie group.
3. For the circle group 𝐺 = (𝑆1 , ⋅ ):
(a) Verify that the requirements of the IFT are satisfied for the level set 𝐾 = 𝐿𝜙(𝑥,𝑦)=1 in ℝ2 , where 𝜙(𝑥, 𝑦) = 𝑥2 + 𝑦2 maps ℝ2 → ℝ.
(b) We can parametrize 𝐾 via the map 𝛾(𝑡) = 𝑒𝑖𝑡 = (cos 𝑡, sin 𝑡) = (𝑥(𝑡), 𝑦(𝑡)) for 𝑡 ∈ ℝ. The multiplication 𝐾 inherits as a subset 𝑆1 ⊆ ℂ is given by the usual exponent law 𝛾(𝑡) ⋅ 𝛾(𝑡′ ) = 𝛾(𝑡 + 𝑡′ ). If you only know the Cartesian coordinates 𝛾(𝑡) = (𝑥, 𝑦) and 𝛾(𝑡′ ) = (𝑥′ , 𝑦′ ), find the Cartesian coordinates of the product 𝛾(𝑡) ⋅ 𝛾(𝑡′ ) in terms of 𝑥, 𝑦, 𝑥′ , 𝑦′ .
4. Regarding 𝑁 = (0, 1) and 𝑆 = (0, −1) as the “north and south poles” of the circle 𝐾 of Additional Exercise 3, define the stereographic projection maps 𝑥𝛼 ∶ 𝑈𝛼 → ℝ and 𝑥𝛽 ∶ 𝑈𝛽 → ℝ that send the arcs 𝑈𝛼 = 𝐾 ∼ {𝑆} and 𝑈𝛽 = 𝐾 ∼ {𝑁} bijectively onto the tangent lines 𝐿𝑁 = {(𝑡, +1) ∶ 𝑡 ∈ ℝ} and 𝐿𝑆 = {(𝑠, −1) ∶ 𝑠 ∈ ℝ} passing through 𝑁 and 𝑆 respectively.
(a) Draw a picture and find explicit formulas for the maps 𝑡 = 𝑥𝛼 (𝑥, 𝑦) and 𝑠 = 𝑥𝛽 (𝑥, 𝑦) from 𝑆1 into ℝ.
(b) Compute the coordinate transition maps 𝑥𝛽 ∘ 𝑥𝛼−1 and 𝑥𝛼 ∘ 𝑥𝛽−1 and verify that they are 𝒞∞ maps from ℝ → ℝ. Since these two 𝒞∞ -related charts cover the circle, they impose a differentiable structure on 𝐾 = 𝑆1 .

Computations Involving O(2, ℂ). The following exercises examine properties of the complex orthogonal group O(2, ℂ) consisting of matrices such that 𝐴𝐴T = 𝐼2×2 . Identifying M(2, ℂ) ≅ ℂ4 and dropping one redundant identity, we are left with 3 irredundant complex constraint identities involving the rows 𝑅𝑗 = (𝐴𝑗1 , 𝐴𝑗2 ) of 𝐴, 𝜙1 = 𝐵(𝑅1 , 𝑅1 ) = 1 + 𝑖0,

𝜙2 = 𝐵(𝑅2 , 𝑅2 ) = 1 + 𝑖0,

𝜙3 = 𝐵(𝑅1 , 𝑅2 ) = 0,

so O(2, ℂ) is the level set 𝐿𝜙(𝐴)=(1,1,0) in ℂ4 . Here 𝐵 ∶ ℂ2 × ℂ2 → ℂ is the canonical nondegenerate symmetric bilinear form on ℂ2 ,
𝐵(𝐳, 𝐰) = 𝐳T 𝐼2×2 𝐰 = 𝑧1 𝑤1 + 𝑧2 𝑤2
for 𝐳 = (𝑧1 , 𝑧2 ), 𝐰 = (𝑤1 , 𝑤2 ).
5. Regarding O(2, ℂ) as a subset in M(2, ℂ) ≅ ℂ4 ,
(a) Prove that a matrix
𝐴 = ( 𝑎 𝑏
      𝑐 𝑑 )
with complex entries is in O(2, ℂ) if and only if 𝑑=𝑎

𝑐 = −𝑏

𝑎2 + 𝑏2 = 1.
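These conditions can be probed numerically. In the sketch below (an illustration, not part of the text) the complex “angle” 𝑤 is a hypothetical parametrization: setting 𝑎 = cos 𝑤, 𝑏 = sin 𝑤 forces 𝑎2 + 𝑏2 = 1 even for complex 𝑤, and the resulting matrix satisfies the defining identity 𝐴𝐴T = 𝐼 of O(2, ℂ).

```python
import numpy as np

w = 0.3 + 1.1j                  # arbitrary complex parameter (hypothetical)
a, b = np.cos(w), np.sin(w)     # then a^2 + b^2 = 1 automatically
A = np.array([[a, b],
              [-b, a]])         # d = a and c = -b, as in the conditions above

assert np.isclose(a * a + b * b, 1.0)
assert np.allclose(A @ A.T, np.eye(2))   # plain transpose, no conjugation
```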

(b) Verify that all matrices in O(2, ℂ) commute. (You could use (a) for this. It is no longer true for O(𝑛, ℂ) when 𝑛 ≥ 3.)
(c) In terms of the constraint identities that determine them, explain why dimℝ O(2, ℂ) = dimℝ SO(2, ℂ) = 2.
6. Viewing M(2, ℂ) ≅ ℝ8 , rewrite the complex constraints in Part (a) of the previous exercise as a system of real constraint identities 𝜙(𝐱) = 𝐜. Then verify that a constant rank condition rk(𝑑𝜙)𝐴 = 𝑟 holds near every base point 𝐴 ∈ O(2, ℂ), as required to apply the IFT. Hint. Calculations like those in (a) were previously encountered in discussing Euler’s theorem (LA I, Theorem 6.84); Exercises 6.86 and 6.88 in LA I might also provide some guidance.
7. Verify that the set of “real points” 𝐾 = SO(2, ℂ) ∩ (M(2, ℝ) + 𝑖0) in SO(2, ℂ) is the real-orthogonal group
SO(2) = { ( cos 𝜃 −sin 𝜃
           sin 𝜃 cos 𝜃 ) ∶ 𝜃 ∈ ℝ } .
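The composition law 𝑅𝜃1 ∘ 𝑅𝜃2 = 𝑅𝜃1+𝜃2 used below is easy to confirm numerically; the following sketch is illustrative only.

```python
import numpy as np

def R(theta: float) -> np.ndarray:
    """Rotation of R^2 through angle theta (a matrix in SO(2))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

t1, t2 = 0.4, 1.9
assert np.allclose(R(t1) @ R(t2), R(t1 + t2))     # angles add
assert np.allclose(R(t1) @ R(t2), R(t2) @ R(t1))  # hence SO(2) is abelian
```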

(a) Explain why this group of matrices is isomorphic to the group Rot(2) of rotations ℛ about the origin in ℝ2 , with composition of operators 𝑅𝜃1 ∘ 𝑅𝜃2 = 𝑅𝜃1+𝜃2 as its multiplication operation.
(b) Explain why 𝐾 is a closed, bounded set of matrices in M(2, ℝ).
(c) Explain how 𝐾 acquires the differentiable structure needed to make it a matrix Lie group.
8. 𝐺 = SL(𝑛, ℝ) is a hypersurface in M(𝑛, ℝ) with dimℝ (𝐺) = 𝑛2 − 1. Taking 𝑛 = 3 and identifying matrices 𝐴 ∈ M(3, ℝ) with vectors 𝐱 = (𝑎11 , 𝑎12 , 𝑎13 ; … ; 𝑎31 , 𝑎32 , 𝑎33 ) ∈ ℝ9 , 𝐺 = SL(3, ℝ) is determined by the single constraint 𝜙(𝐴) = det(𝐴) = 1, where 𝜙 ∶ ℝ9 → ℝ.
(a) Write out the Jacobian matrix [(𝑑𝜙)𝐴 ] and verify that its rank is = 1 near any 𝐴 ∈ 𝐺.
(b) Exhibit a partition [1, 9] = 𝐽′ ∪ 𝐽 as in the IFT such that the projection (𝜋𝐽′ |𝐺 ) from ℝ𝐽′ → ℝ8 yields a standard chart on SL(3, ℝ) near the identity element 𝐼3×3 .
(c) Does every choice 𝐽′ = {1 ≤ 𝑖1 < ⋯ < 𝑖8 ≤ 9} of eight matrix entries (out of 9) yield a valid coordinate chart about the identity 𝐼3×3 in 𝐺? If not, exhibit an explicit choice for which this fails.
9. For 𝐺 = SL(2, ℝ), an obvious map 𝐺 → ℝ4 ,
𝐴 = ( 𝑎 𝑏
      𝑐 𝑑 ) → 𝑥𝛼 (𝐴) = (𝑎, 𝑏, 𝑐, 𝑑),
assigns “coordinates” to matrices in 𝐺 but does not provide a “chart” for the standard 𝒞∞ manifold structure on 𝐺 because dimℝ 𝐺 = 3. Explain why we can obtain a valid coordinate chart (𝑈𝛼 , 𝑥𝛼 ) near the identity element 𝐼2×2 by deleting one of the four variables from 𝑎, 𝑏, 𝑐, 𝑑. Identify at least one choice of the “redundant variable” that does not work.
10. Repeat Exercise 9, taking
𝐴 = ( 2 3
      0 1/2 )
as the base point in SL(2, ℝ).

Section 5.3. Lie Algebra Structure in Tangent Spaces of Lie Groups
1. True/False Questions (“True” if the statement is always true.)
(a) If a matrix Lie group is abelian (𝑥𝑦 = 𝑦𝑥 for all 𝑥, 𝑦), the Lie bracket on 𝔤 is trivial, with [𝑋, 𝑌 ] = 0 for all 𝑋, 𝑌 ∈ 𝔤.
(b) If 𝐺 is a matrix Lie group, every automorphism of its Lie algebra 𝔤 is the differential Ad𝑔 = (𝑑𝑖𝑔 )𝑒 of some conjugation operation 𝑖𝑔 (𝑥) = 𝑔𝑥𝑔−1 (𝑔 ∈ 𝐺).
(c) For the abelian Lie group 𝐺 = (ℝ𝑛 , +), every invertible linear operator on 𝔤 is an automorphism of the Lie algebra 𝔤.
(d) The set 𝒯𝑛 = {𝐴 ∈ M(𝑛, 𝕂) ∶ 𝑎𝑖𝑗 = 0 unless 𝑖 < 𝑗} of strictly upper triangular 𝑛 × 𝑛 matrices (over any ground field 𝕂) becomes a nilpotent Lie algebra when equipped with the associative bracket [𝐴, 𝐵] = 𝐴𝐵 − 𝐵𝐴.
(e) If 𝔤 is a nilpotent Lie algebra, any Lie subalgebra 𝔥 ⊆ 𝔤 is also nilpotent.
2. The Lie algebra of the general linear group GL(𝑛, ℝ) is all of matrix space 𝔤𝔩(𝑛, ℝ) = M(𝑛, ℝ). A basis is provided by the “matrix units” 𝐸𝑖𝑗 whose entries are zero except for a “1” in the (𝑖, 𝑗) spot.
(a) For 𝑛 = 3, give explicit formulas for the one-parameter subgroups 𝛾𝑖𝑗 (𝑡) = Exp(𝑡𝐸𝑖𝑗 ) by summing the exponential series. The outcome will depend on 𝑖, 𝑗. What are the corresponding tangent vectors 𝛾𝑖𝑗′ (0) in 𝔤𝔩(𝑛, ℝ)?
(b) The Lie algebra 𝔰𝔩(𝑛, ℝ) of SL(𝑛, ℝ) has dimension dimℝ 𝔰𝔩(𝑛, ℝ) = 𝑛2 − 1 < dimℝ 𝔤𝔩(𝑛, ℝ) = 𝑛2 .
Find a way to select a basis 𝑌1 , … , 𝑌𝑛2−1 for 𝔰𝔩(𝑛, ℝ) from the full set of matrix units 𝐸𝑖𝑗 in 𝔤𝔩(𝑛, ℝ) = M(𝑛, ℝ). Hint. In Example 5.61, we characterized matrices 𝐴 in 𝔰𝔩(𝑛, ℝ) in terms of their traces Tr(𝐴) = 𝑎11 + ⋯ + 𝑎𝑛𝑛 .
(c) Describe the nonzero commutators [𝑌𝑖 , 𝑌𝑗 ] = 𝑌𝑖 𝑌𝑗 − 𝑌𝑗 𝑌𝑖 for the basis in (b).

(d) Compute the one-parameter groups 𝛾𝑌 (𝑡) = Exp(𝑡𝑌 ) for the basis vectors 𝑌 = 𝑌𝑘 in 𝔰𝔩(𝑛, ℝ) constructed in Part (b).
3. If 𝑋, 𝑌 ∈ 𝔤 and the left-invariant vector fields 𝑋̃, 𝑌̃ they determine on a matrix Lie group 𝐺 are viewed as operators on 𝒞∞ (𝐺), explain why the vector field [𝑋̃, 𝑌̃] = 𝑋̃ ∘ 𝑌̃ − 𝑌̃ ∘ 𝑋̃ is left-invariant.
4. In Example 5.69, we considered the group Aff(ℝ) of affine transformations 𝑇𝑎,𝑏 (𝑥) = 𝑎𝑥 + 𝑏 on the real line.
(a) Determine the multiplication law 𝑇𝑎,𝑏 ∘ 𝑇𝑎′,𝑏′ = 𝑇𝑎″,𝑏″ for composing affine transformations.
(b) Check that Aff(ℝ) is a group, with
(i) Identity Element 𝐼 such that 𝐼 ∘ 𝑇 = 𝑇 ∘ 𝐼 = 𝑇.
(ii) Inverses 𝑇−1 such that 𝑇−1 𝑇 = 𝐼 = 𝑇𝑇−1 .
(iii) Associative Law 𝑇1 (𝑇2 𝑇3 ) = (𝑇1 𝑇2 )𝑇3 , for all 𝑇𝑎,𝑏 ∈ Aff(ℝ).
Note. That last rule is important because it allows us to write products 𝑇1 ⋯ 𝑇𝑛 without worrying about where to put the parentheses. ○
(c) Obviously 𝑇1,0 = 𝐼 (identity transformation on ℝ). Compute the parameters 𝑎′ , 𝑏′ such that 𝑇𝑎′,𝑏′ = (𝑇𝑎,𝑏 )−1 .
(d) Verify that
𝜙( 𝑎 𝑏
   0 1 ) = 𝑇𝑎,𝑏   (where 𝑎, 𝑏 ∈ ℝ, 𝑎 ≠ 0)
is a bijective isomorphism between the group (𝐺, ⋅ ) of matrices in Example 5.69 and the group (Aff(ℝ), ∘).

Note. The group Aff(ℝ) is a differentiable manifold whose structure is determined by two charts defined on disjoint open sets (it is a Lie group with two connected components):
𝑈+ = {𝑇𝑎,𝑏 ∶ 𝑎 > 0, 𝑏 ∈ ℝ}   (the identity component, a subgroup)
𝑈− = {𝑇𝑎,𝑏 ∶ 𝑎 < 0, 𝑏 ∈ ℝ}   (not a subgroup).

Parts (a) and (c) show that the group operations (matrix multiply) and (matrix inverse) become 𝒞∞ maps when expressed in local chart coordinates, so Aff(ℝ) is a Lie group. Strictly speaking, it is not a “matrix Lie group,” but it is isomorphic to the bona fide matrix Lie group defined in (d). ○
5. Matrices 𝐴 in the 𝑎𝑥 + 𝑏 group 𝐺 are of two types:
(i) Those with det(𝐴) > 0, for which 𝑇𝑎,𝑏 ∶ ℝ → ℝ preserves the order 𝑥 < 𝑦 of points on the number line.
(ii) Those with det(𝐴) < 0, which reverse their order.

Thus 𝐺 is a union 𝐺 = 𝐺+ ∪ 𝐺− of disjoint, open subsets, according to the sign of det(𝐴). The piece 𝐺+ containing the identity element 𝐼2×2 is a subgroup; its companion 𝐺− is not. (Why?)
(a) Prove that the “one-parameter subgroups” 𝛾1 (𝑠) = 𝑒𝑠𝑋 and 𝛾2 (𝑡) = 𝑒𝑡𝑌 of Exercise 5.70 uniquely label each 𝑔 in 𝐺+ by a pair (𝑠, 𝑡) in ℝ2 by showing that every 𝑔 ∈ 𝐺+ has a unique factorization
𝑔 = 𝑔(𝑠, 𝑡) = 𝑒𝑠𝑋 ⋅ 𝑒𝑡𝑌 = ( 𝑒𝑠 0     ( 1 𝑡
                           0  1 ) ⋅   0 1 ).
(b) If an element of 𝐺 is presented as
𝐴 = ( 𝑎 𝑏
      0 1 ),
what are the values of the parameters (𝑠, 𝑡) in Part (a)?
6. If 𝔤 is a finite-dimensional Lie algebra over ℝ, its complexification 𝔤ℂ = 𝔤 + 𝑖𝔤 = {𝑋 + 𝑖𝑌 ∶ 𝑋, 𝑌 ∈ 𝔤} is a vector space over ℂ if we define (+) and (⋅) operations
(𝑋 + 𝑖𝑌 ) + (𝑋′ + 𝑖𝑌′ ) = (𝑋 + 𝑋′ ) + 𝑖(𝑌 + 𝑌′ )
(𝑎 + 𝑖𝑏) ⋅ (𝑋 + 𝑖𝑌 ) = (𝑎𝑋 − 𝑏𝑌 ) + 𝑖(𝑎𝑌 + 𝑏𝑋)
(recall the discussion in Section 2.2). Verify that 𝔤ℂ becomes a Lie algebra over ℂ if we define its bracket to be
[(𝑋 + 𝑖𝑌 ), (𝑋′ + 𝑖𝑌′ )] = ([𝑋, 𝑋′ ] − [𝑌 , 𝑌′ ]) + 𝑖([𝑋, 𝑌′ ] + [𝑌 , 𝑋′ ]).
(Checking the Jacobi identity is the messy part.)
7. Verify that 𝔰𝔬(3, ℂ) = {𝐴 ∈ M(3, ℂ) ∶ 𝐴 + 𝐴T = 0} is the complexification of its real points 𝔰𝔬(3) = 𝔰𝔬(3, ℂ) ∩ (M(3, ℝ) + 𝑖0).

Section 5.4. The Exponential Map for Matrix Lie Groups
1. True/False Questions (“True” if the statement is always true.)
(a) If 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋 ∈ 𝔤 for a smooth matrix-valued curve 𝛾(𝑡) that lies within a matrix Lie group 𝐺 ⊆ M(𝑛, ℝ), then for some 𝑟 > 0, 𝛾(𝑡) = 𝛾𝑋 (𝑡) = 𝑒𝑡𝑋 for −𝑟 < 𝑡 < 𝑟.
(b) M(𝑛, ℝ) equipped with the associative bracket [𝐴, 𝐵] = 𝐴𝐵 − 𝐵𝐴 is the Lie algebra 𝔤𝔩 of the general linear group GL(𝑛, ℝ).
(c) Every matrix 𝐴 ∈ M(𝑛, ℂ) can be written as 𝐷 + 𝑁 where 𝐷 is a diagonal matrix, 𝑁 is nilpotent, and the factors commute with 𝐷𝑁 = 𝑁𝐷. Hint. Think (JCF).
(d) Whenever a matrix 𝐴 ∈ M(𝑛, ℂ) can be decomposed as a sum 𝐷 + 𝑁 of commuting diagonal and nilpotent matrices, the 𝐷 and 𝑁 are unique.

(e) The Lie algebra of the unitary group U(𝑛) is 𝔲(𝑛) = {𝐴 ∈ M(𝑛, ℂ) ∶ |det(𝐴)| = 1}.
(f) Many curves 𝛾(𝑡) in 𝐺 that pass through the identity 𝑒 = 𝐼 can determine the same tangent vector 𝑋 = 𝛾′(0) in 𝔤 = TG𝑒 when 𝑡 = 0, but the one-parameter group 𝛾𝑋 (𝑡) = 𝑒𝑡𝑋 is the only smooth curve in 𝐺 such that 𝛾(0) = 𝐼 and 𝛾′(0) = 𝑋.
(g) For any matrix Lie group 𝐺, the union of all the curves traced out by the one-parameter subgroups 𝛾𝑋 (𝑡) with generators 𝑋 ∈ 𝔤 contains an open neighborhood of the identity 𝐼 in 𝐺.
2. Which of the following identities between Lie algebras are valid?
(a) 𝔲(𝑛) = 𝔰𝔲(𝑛) in M(𝑛, ℂ)
(b) 𝔬(𝑛) = 𝔰𝔬(𝑛) in M(𝑛, ℝ)
(c) 𝔬(𝑛, ℂ) = 𝔰𝔬(𝑛, ℂ) in M(𝑛, ℂ)
3. If 𝐴 ∈ M(𝑛, ℝ), the curve 𝑋(𝑡) = 𝑒𝑡𝐴 is a 𝒞∞ curve that passes through the identity 𝐼 when 𝑡 = 0 and is a solution of the matrix-valued differential equation
𝑑𝑋/𝑑𝑡 (𝑡) = 𝐴 ⋅ 𝑋(𝑡)   for all 𝑡 ∈ ℝ,
with initial value 𝑋(0) = 𝐼𝑛×𝑛 (identity matrix).
(a) Explain why 𝑋(𝑡) satisfies this differential equation.
(b) What is the initial value 𝑋(0) when 𝑡 = 0?
(c) What is its derivative 𝑋′(0) when 𝑡 = 0? At an arbitrary time 𝑡?
(d) Explain by example why the solution to this equation fails to be unique if the initial value has not been specified.
4. What happens if we consider complex matrices 𝐴 ∈ M(𝑛, ℂ) and allow complex-valued solution curves 𝑍(𝑡) = 𝑋(𝑡) + 𝑖𝑌(𝑡) for the differential equation 𝑑𝑍/𝑑𝑡 = 𝐴 ⋅ 𝑍(𝑡) with initial value 𝑍(0) = 𝐼? Is 𝑍(𝑡) = Exp(𝑡𝐴) still a solution to the initial value problem
𝑑𝑋/𝑑𝑡 (𝑡) = 𝐴 ⋅ 𝑋(𝑡)   with 𝑋(0) = 𝐼𝑛×𝑛 ?
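Exercises 3 and 4 can be explored numerically before they are proved. The sketch below is illustrative; the truncated power series used for the matrix exponential is an assumption that is adequate for matrices of small norm.

```python
import numpy as np

def expm(M: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential via a truncated power series (fine for small norms)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # any A in M(n, R) would do
X = lambda t: expm(t * A)                 # candidate solution X(t) = e^{tA}

t, h = 0.8, 1e-6
deriv = (X(t + h) - X(t - h)) / (2 * h)   # central-difference dX/dt
assert np.allclose(deriv, A @ X(t), atol=1e-5)   # X'(t) = A X(t)
assert np.allclose(X(0.0), np.eye(2))            # initial value X(0) = I
```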

(a) Prove that the vector space 𝔤(2) = [𝔤, 𝔤] is a Lie algebra in its own right by showing that the Lie bracket [𝐴, 𝐵] of vectors in 𝔤(2) is always back in 𝔤(2) . Hint. The Lie bracket is bilinear, and [𝑋, 𝑌 ] ∈ 𝔤 for 𝑋, 𝑌 ∈ 𝔤 by definition of Lie algebra.
(b) Prove that the chain of commutator subspaces is decreasing, with 𝔤(𝑘+1) ⊆ 𝔤(𝑘) for 𝑘 = 1, 2, … .
(c) Prove that [𝔤(𝑘) , 𝔤(𝑘) ] ⊆ 𝔤(𝑘) for all 𝑘, so every commutator subspace 𝔤(𝑘) is a Lie subalgebra of 𝔤.

5. In Example 5.84 we defined the 3-dimensional Heisenberg group as a group of matrices in M(3, ℝ), and introduced a systematic way to label its elements by triples (𝑥, 𝑦, 𝑧) ∈ ℝ3 . In this scheme the law of matrix multiplication took the form
(𝑥, 𝑦, 𝑧) ⋅ (𝑥′ , 𝑦′ , 𝑧′ ) = (𝑥 + 𝑥′ , 𝑦 + 𝑦′ , 𝑧 + 𝑧′ + 𝑥𝑦′ ).
(a) Find the triple such that (𝑥′ , 𝑦′ , 𝑧′ ) is the inverse of (𝑥, 𝑦, 𝑧) in this group.
(b) Compute the effect of an inner automorphism 𝑖𝑔 (𝑎) = 𝑔𝑎𝑔−1 when 𝑔 = (𝑥, 𝑦, 𝑧) and 𝑎 = (𝑥′ , 𝑦′ , 𝑧′ ) in 𝐺.

The Killing Form 𝐵(𝑋, 𝑌 ) = Tr(ad𝑋 ad𝑌 ) on a Lie Algebra.6 This bilinear form on finite-dimensional Lie algebras turned out to be crucial in understanding their remarkable algebraic structures. The next example asks you to compute the matrix 𝐵𝔛 of the Killing form for the Lie algebra 𝔰𝔬(3) ⊆ M(3, ℝ) of the group of rotations in 3-dimensional space, using a basis 𝔛 = {𝑋1 = 𝑋, 𝑋2 = 𝑌 , 𝑋3 = 𝑍} whose pattern of structure constants [𝑋𝑖 , 𝑋𝑗 ] = ∑𝑘=1,…,3 𝑐𝑖𝑗(𝑘) 𝑋𝑘 is particularly simple:
[𝑋, 𝑌 ] = 𝑍    [𝑌 , 𝑍] = 𝑋    [𝑍, 𝑋] = 𝑌
These commutation relations were discussed in Example 5.84; for simplicity we have changed the labels on basis matrices from {𝐈, 𝐉, 𝐊} used there to {𝑋, 𝑌 , 𝑍}. Computing the products ad𝑋𝑖 ad𝑋𝑗 of operators on 𝔤 and their traces is less daunting than it might seem if approached with a few basic principles in mind.
• Computing the 3 × 3 matrices [ad𝑋𝑖 ]𝔛 is straightforward.
• If we label 𝐴𝑖 = [ad𝑋𝑖 ]𝔛 , the products 𝐴𝑖 𝐴𝑗 are the 3 × 3 matrices 𝐴𝑖𝑗 = [ad𝑋𝑖 ]𝔛 ⋅ [ad𝑋𝑗 ]𝔛 = [ad𝑋𝑖 ad𝑋𝑗 ]𝔛 , whose traces are the entries 𝐵(𝑋𝑖 , 𝑋𝑗 ) in [𝐵]𝔛 .
• Since 𝐵(𝑋, 𝑌 ) is symmetric (why?), we only need to compute the matrix entries 𝐵(𝑋𝑖 , 𝑋𝑗 ) = Tr(𝐴𝑖 𝐴𝑗 ) for 1 ≤ 𝑖 ≤ 𝑗 ≤ 3. These can be found by evaluating the coefficients in [𝑋𝑖 , [𝑋𝑗 , 𝐻]] = 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍 for 1 ≤ 𝑖 ≤ 𝑗 ≤ 3 and 𝐻 = 𝑋, 𝑌 , 𝑍.
For example, taking 𝑋𝑖 = 𝑋 and 𝑋𝑗 = 𝑋 or 𝑌 , it follows from the commutation relations for 𝑋, 𝑌 , 𝑍 that
[𝑋, [𝑋, 𝑋]] = 0 (since [𝑋, 𝑋] = 0),   [𝑋, [𝑋, 𝑌 ]] = −𝑌   and   [𝑋, [𝑋, 𝑍]] = −𝑍.

6 W. Killing was a contemporary of Lie who was one of the first to attempt a detailed structural analysis of the “simple” Lie algebras over ℂ, which include many of the complex classical groups. He missed some of the possibilities, later filled in by E. Cartan; nevertheless his pioneering work stands as a foundation of Lie theory.

Continuing these calculations we get
𝐴11 = 𝐴1 𝐴1 = ( 0  0  0
               0 −1  0
               0  0 −1 )
and 𝐵(𝑋, 𝑋) = 𝐵(𝑋1 , 𝑋1 ) = Tr(𝐴1 𝐴1 ) = −2. The other entries are determined similarly.
9. Verify that the matrix of the Killing form 𝐵 for the real Lie algebra 𝔤 = 𝔰𝔬(3) of the group of rotations on ℝ3 is
[𝐵]𝔛 = ( −2  0  0
         0 −2  0
         0  0 −2 ).
Note. The Killing form on 𝔰𝔬(3) is symmetric and strictly negative definite, with signature (0, 3) and 𝐵(𝐻, 𝐻) < 0 for all 𝐻 ≠ 0 in 𝔤. Inner products on ℝ3 are bilinear forms with signature (3, 0) and have 𝐵(𝐻, 𝐻) > 0 for all nonzero 𝐻. ○

Sections 5.5 and 5.6. The Exponential Map and Lie Correspondence
1. True/False Questions (“True” if the statement is always true.)
(a) The CBH formula for the multiplication operation in a matrix Lie group reduces to a finite sum if the Lie algebra 𝔤 is nilpotent.
(b) If 𝐴 is a nonzero 𝑛 × 𝑛 complex matrix, solutions 𝑋 ∈ M(𝑛, ℂ) to the matrix equation 𝑒𝑋 = 𝐴 always exist.
(c) For 𝐴 ∈ M(𝑛, ℂ) close enough to the identity matrix 𝐼, the matrix equation 𝑒𝑋 = 𝐴 has a unique solution 𝑋 = Log(𝐴).
(d) If 𝐴 is a nonzero 𝑛 × 𝑛 complex matrix and 𝑋 ∈ M(𝑛, ℂ) is a solution of the matrix equation 𝑒𝑋 = 𝐴, all other solutions have the form 𝑋 + 2𝜋𝑛𝑖𝐼 for 𝑛 ∈ ℤ.
(e) The exponential map Exp ∶ M(𝑛, ℝ) → GL(𝑛, ℝ) maps the set 𝒩𝑛 = {𝐴 ∈ M(𝑛, 𝕂) ∶ 𝐴𝑘 = 0 for some 𝑘 ∈ ℕ} of nilpotent 𝑛 × 𝑛 matrices into the set of unipotent 𝑛 × 𝑛 matrices 𝒰𝑛 = {𝐴 ∈ M(𝑛, 𝕂) ∶ 𝐴 − 𝐼 is nilpotent}.
2. List all real matrices 𝑋 such that 𝑒𝑋 = 𝐼𝑛×𝑛 .
3. Find all nilpotent solutions 𝑋 ∈ M(𝑛, ℂ) of the matrix identities
(a) Exp(𝑋) = ( −1  0  0
               0 −1  0
               0  0 −1 )
(b) Exp(𝑋) = ( 1 1 0 0
              0 1 1 0
              0 0 1 1
              0 0 0 1 )

Hint. In (a), there are no nilpotent solutions 𝑋 (why?). In (b), you could use the (finite) series for Log(𝐴) or work backward from the finite series for Exp(𝑋) = 𝐴. Remember that a finite sum of commuting nilpotent operators is nilpotent.
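The finite series mentioned in the hint can be implemented directly. This sketch (illustrative, not part of the text) truncates both series using the fact that 𝑋𝑛 = 0 for a nilpotent 𝑛 × 𝑛 matrix:

```python
import numpy as np
from math import factorial

def exp_nilpotent(X: np.ndarray) -> np.ndarray:
    """Exp(X) = sum_k X^k / k!, a finite sum when X is nilpotent (X^n = 0)."""
    n = X.shape[0]
    out, P = np.eye(n), np.eye(n)
    for k in range(1, n):
        P = P @ X
        out = out + P / factorial(k)
    return out

def log_unipotent(A: np.ndarray) -> np.ndarray:
    """Log(A) = sum_k (-1)^{k+1} (A - I)^k / k, finite when A - I is nilpotent."""
    n = A.shape[0]
    N = A - np.eye(n)
    out, P = np.zeros((n, n)), np.eye(n)
    for k in range(1, n):
        P = P @ N
        out = out + ((-1) ** (k + 1)) * P / k
    return out

X = np.array([[0.0, 2.0, 5.0],
              [0.0, 0.0, -1.0],
              [0.0, 0.0, 0.0]])   # strictly upper triangular, so X^3 = 0
assert np.allclose(log_unipotent(exp_nilpotent(X)), X)   # Log(Exp(X)) = X
```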

4. The following identities can be proved by working with the scalar power series for Exp and Log.
(i) Log(Exp(𝑋)) = 𝑋   for all 𝑋 with ‖𝑋‖ < √2
(ii) Exp(Log(𝐴)) = 𝐴   for all 𝐴 with ‖𝐴 − 𝐼‖ < 1.

But if 𝑋 is nilpotent (or 𝐴 unipotent), the series for Exp and Log have only finitely many nonzero terms. Are these identities valid for all nilpotent 𝑋 (or unipotent 𝐴)? Explain your reasoning.
5. If 𝐺 and 𝐻 are locally isomorphic matrix Lie groups, explain why their Lie algebras must be isomorphic (so 𝔤 ≅ 𝔥), starting from the definitions of 𝔤, 𝔥 as the tangent spaces TG𝑒 , TH𝑒 .
6. For the 3-dimensional Heisenberg group 𝐺 discussed in Example 5.84, the product 𝐴 ∗ 𝐵 of two typical vectors
𝐴 = 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍

𝐵 = 𝑎′ 𝑋 + 𝑏 ′ 𝑌 + 𝑐 ′ 𝑍

is given by a finite sum.
(a) Explain why all iterated commutators
[𝐻1 , [𝐻2 , … , [𝐻𝑟−1 , 𝐻𝑟 ], … ]]   (𝐻𝑘 ∈ 𝔤)
of length 𝑟 ≥ 3 are all zero.
(b) Write out the full expansion of the Campbell-Hausdorff product 𝐴 ∗ 𝐵 for arbitrary 𝐴 and 𝐵 in the Lie algebra 𝔤 of the 3-dimensional Heisenberg group. (The CBH formula is a finite sum for all nilpotent Lie groups. This is one of their most congenial features.)
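One can test the truncated CBH identity numerically before writing out the expansion in (b). The sketch below assumes the standard realization of the Heisenberg Lie algebra by strictly upper triangular 3 × 3 matrices (as in Example 5.84); it is illustrative only.

```python
import numpy as np

# Assumed basis X, Y, Z of the Heisenberg Lie algebra:
X = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
Y = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
Z = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]], dtype=float)

def expm(M: np.ndarray) -> np.ndarray:
    """Exp via its finite series: M^3 = 0 for strictly upper triangular M."""
    return np.eye(3) + M + M @ M / 2.0

def bracket(A, B):
    return A @ B - B @ A

A = 1.0 * X + (-2.0) * Y + 0.5 * Z
B = 3.0 * X + 0.25 * Y + (-1.0) * Z

# Brackets of length >= 3 vanish here, so the CBH series truncates:
#     Exp(A) Exp(B) = Exp(A + B + (1/2)[A, B]).
assert np.allclose(expm(A) @ expm(B), expm(A + B + bracket(A, B) / 2.0))
```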




Linear Algebra II
Frederick P. Greenleaf and Sophie Marques

This book is the second of two volumes on linear algebra for graduate students in mathematics, the sciences, and economics, who have: a prior undergraduate course in the subject; a basic understanding of matrix algebra; and some proficiency with mathematical proofs. Both volumes have been used for several years in a one-year course sequence, Linear Algebra I and II, offered at New York University’s Courant Institute. The first three chapters of this second volume round out the coverage of traditional linear algebra topics: generalized eigenspaces, further applications of Jordan form, as well as bilinear, quadratic, and multilinear forms. The final two chapters are different, being more or less self-contained accounts of special topics that explore more advanced aspects of modern algebra: tensor fields, manifolds, and vector calculus in Chapter 4 and matrix Lie groups in Chapter 5. The reader can choose to pursue either chapter. Both deal with vast topics in contemporary mathematics. They include historical commentary on how modern views evolved, as well as examples from geometry and the physical sciences in which these topics are important. The book provides a nice and varied selection of exercises; examples are well-crafted and provide a clear understanding of the methods involved.

For additional information and updates on this book, visit www.ams.org/bookpages/cln-30

CLN/30

NEW YORK UNIVERSITY